

This version: 2nd January 2019.
Latest version here.

Revision in progress.

Latest changes: Currently revising Part V (Calculus).

This textbook was first completed in Jul 2016. For 1.5 years afterwards, only minor changes were made.
But starting Feb 2018, I’ll be completely rewriting this textbook. In particular, I’ll be working to (a)
slow down the pace of this textbook; and (b) explain things more clearly and simply. Less importantly,
I’ll also be removing the old (9740) material that is no longer on the current (9758) syllabus and making
trivial formatting changes (to make the book beautifuller).
As with everything I do, please let me know if you spot any errors or have any feedback. Thank you.
i, Contents
Errors? Feedback? Email me!

With your help, I plan to keep improving this textbook.

Please refer to the latest version and also mention the relevant page number(s) (bottom left).


This book is licensed under the Creative Commons license CC-BY-NC-SA 4.0.

You are free to:

• Share — copy and redistribute the material in any medium or format
• Adapt — remix, transform, and build upon the material
The licensor cannot revoke these freedoms as long as you follow the license terms.
Under the following terms:
• Attribution — You must give appropriate credit, provide a link to the license, and
indicate if changes were made. You may do so in any reasonable manner, but not in any
way that suggests the licensor endorses you or your use.
• NonCommercial — You may not use the material for commercial purposes.
• ShareAlike — If you remix, transform, or build upon the material, you must distribute
your contributions under the same license as the original.
• No additional restrictions — You may not apply legal terms or technological measures
that legally restrict others from doing anything the license permits.

You do not have to comply with the license for elements of the material in the public domain
or where your use is permitted by an applicable exception or limitation. No warranties are
given. The license may not give you all of the permissions necessary for your intended use.
For example, other rights such as publicity, privacy, or moral rights may limit how you use
the material.

Author: Choo, Yan Min.

Title: H2 Mathematics Textbook.
ISBN: 978-981-11-0383-4 (e-book).


The first thing to understand is that mathematics is an art.

— Paul Lockhart (2009, A Mathematician’s Lament).

A mathematician, like a painter or a poet, is a maker of patterns. If his
patterns are more permanent than theirs, it is because they are made with
ideas. ... Beauty is the first test: there is no permanent place in the world
for ugly mathematics.

— G.H. Hardy (1940, A Mathematician’s Apology).

Le savant n’étudie pas la nature parce que cela est utile; il l’étudie parce qu’il
y prend plaisir et il y prend plaisir parce qu’elle est belle. Si la nature n’était
pas belle, elle ne vaudrait pas la peine d’être connue, la vie ne vaudrait pas
la peine d’être vécue.
The scientist does not study nature because it is useful to do so. He studies
it because he takes pleasure in it, and he takes pleasure in it because it is
beautiful. If nature were not beautiful it would not be worth knowing, and life
would not be worth living.

— Henri Poincaré (1908, Science and Method [1914 trans.]).

[W]hoever does not love and admire mathematics for its own internal splendours,
knows nothing whatever about it.

— Michael Polanyi (1959, The Study of Man).


Cover C
Version Date i

CC-BY-NC-SA 4.0 License iii

Contents v

About this Book xxv

Tips for the Student xxvii
Miscellaneous Tips for the Student xxxi

Use of Graphing Calculators xxxiii

Preface/Rant xxxv

Part 0. A Few Basics 1

1 Just To Be Clear 3
2 PSLE Review: Division 4
2.1 Long Division . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.2 Dividing By Zero . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

3 Logic 8
3.1 True, False, and Indeterminate Statements . . . . . . . . . . . . . . . . . . 9
3.2 The Conjunction AND and the Disjunction OR . . . . . . . . . . . . . . . 10
3.3 The Negation NOT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3.4 Equivalence ⇐⇒ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
3.5 De Morgan’s Laws: Negating the Conjunction and Disjunction . . . . . . 13
3.6 The Implication P ⇒ Q . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.7 The Converse Q ⇒ P . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
3.8 Affirming the Consequent (or The Fallacy of the Converse) . . . . . . . . 19
3.9 The Negation NOT-(P ⇒ Q) . . . . . . . . . . . . . . . . . . . . . . . 20
3.10 The Contrapositive NOT-Q ⇒ NOT-P . . . . . . . . . . . . . . . . . . 23
3.11 (P ⇒ Q AND Q ⇒ P) ⇐⇒ (P ⇐⇒ Q) . . . . . . . . . . . . . . . 25
3.12 Other Ways to Express P ⇒ Q (Optional) . . . . . . . . . . . . . . . . 26
3.13 The Four Categorical Propositions and Their Negations . . . . . . . . . . 27
3.14 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

4 Sets 32
4.1 The Elements of a Set Can Be Pretty Much Anything . . . . . . . . . . . 33
4.2 In ∈ and Not In ∉ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

4.3 The Order of the Elements Doesn’t Matter . . . . . . . . . . . . . . . . . 36
4.4 n(S) Is the Number of Elements in the Set S . . . . . . . . . . . . . . . . 37
4.5 The Ellipsis “. . . ” Means Continue in the Obvious Fashion . . . . . . . . . 37
4.6 Repeated Elements Don’t Count . . . . . . . . . . . . . . . . . . . . . . . 38
4.7 R Is the Set of Real Numbers . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.8 Z Is the Set of Integers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
4.9 Q Is the Set of Rational Numbers . . . . . . . . . . . . . . . . . . . . . . . 41
4.10 A Taxonomy of Numbers . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
4.11 More Notation: + , − , and 0 . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
4.12 The Empty Set ∅ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
4.13 Intervals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
4.14 Subset Of ⊆ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
4.15 Proper Subset Of ⊂ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
4.16 Union ∪ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
4.17 Intersection ∩ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
4.18 Set Minus ∖ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
4.19 The Universal Set E . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
4.20 The Set Complement A′ . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
4.21 De Morgan’s Laws . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
4.22 Set-Builder Notation (or Set Comprehension) . . . . . . . . . . . . . . . . 56
4.23 Chapter Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60

5 O-Level Review 61
5.1 Some Mathematical Vocabulary . . . . . . . . . . . . . . . . . . . . . . . . 61
5.2 The Absolute Value or Modulus Function . . . . . . . . . . . . . . . . . . 62
5.3 The Factorial n! . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
5.4 Exponents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
5.5 Rationalising the Denominator with a Surd . . . . . . . . . . . . . . . . . 69
5.6 Logarithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
5.7 Polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73


Part I. Functions and Graphs 75

6 Graphs 77
6.1 Ordered Pairs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77
6.2 The Cartesian Plane . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
6.3 A Graph is Any Set of Points . . . . . . . . . . . . . . . . . . . . . . . . . 80
6.4 The Graph of An Equation . . . . . . . . . . . . . . . . . . . . . . . . . . 82
6.5 Graphing with the TI84 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
6.6 The Graph of An Equation with Constraints . . . . . . . . . . . . . . . . 86
6.7 Intercepts and Roots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
6.8 Lines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
6.9 Horizontal, Vertical, and Oblique Lines . . . . . . . . . . . . . . . . . . . . 98
6.10 Finding the Equation of a Line . . . . . . . . . . . . . . . . . . . . . . . . 99
6.11 Perpendicular Lines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
6.12 The Difference between Lines, Line Segments, and Rays . . . . . . . . . . 102
6.13 Asymptotes and Limit Notation . . . . . . . . . . . . . . . . . . . . . . . . 103
6.14 Maximum and Minimum Points . . . . . . . . . . . . . . . . . . . . . . . . 107

7 Reflection and Symmetry 116

7.1 The Reflection of a Point in a Point . . . . . . . . . . . . . . . . . . . . . 116
7.2 The Reflection of a Point in a Graph . . . . . . . . . . . . . . . . . . . . . 118
7.3 Lines of Symmetry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120

8 Solutions and Solution Sets 122

9 O-Level Review: The Quadratic Equation $y = ax^2 + bx + c$ 127

10 Functions 137
10.1 What Functions Aren’t . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
10.2 Notation for Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
10.3 Warning: $f$ and $f(x)$ Refer to Different Things . . . . . . . . . . . . . . . 150
10.4 Real-Valued Functions of a Real Variable . . . . . . . . . . . . . . . . . . 151
10.5 Graphs of Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
10.6 The Range of a Function . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
11 An Introduction to Continuity 163

12 When a Function Is Increasing or Decreasing 168

13 Arithmetic Combinations of Functions 170

14 Inverse Functions 173

14.1 One-to-One or Invertible Functions . . . . . . . . . . . . . . . . . . . . . . 179
14.2 The Graphs of $f$ and $f^{-1}$ Are Reflections in the Line $y = x$ . . . . . . . . . 184
14.3 The Intersection of $f$ and $f^{-1}$ . . . . . . . . . . . . . . . . . . . . . . . . . 190
14.4 Domain Restriction to Create an Invertible Function . . . . . . . . . . . . 193
14.5 Invertibility and Strict Monotonicity . . . . . . . . . . . . . . . . . . . . . 195

15 Composite Functions 196


16 Transformations 202
16.1 $y = f(x) + a$ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 202
16.2 $y = f(x + a)$ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 204
16.3 $y = af(x)$ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206
16.4 $y = f(ax)$ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 208
16.5 Combinations of the Above . . . . . . . . . . . . . . . . . . . . . . . . . . 210
16.6 $y = |f(x)|$ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213
16.7 $y = f(|x|)$ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 214
16.8 $1/f$ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215

17 ln, exp, and e 218

17.1 Euler’s Number . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 223

18 O-Level Review: The Derivative 225

18.1 The Gradient . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
18.2 The Derivative as the Gradient . . . . . . . . . . . . . . . . . . . . . . . . 229
18.3 The Derivative as Rate of Change . . . . . . . . . . . . . . . . . . . . . . 230
18.4 Differentiability vs Continuity . . . . . . . . . . . . . . . . . . . . . . . . . 233
18.5 Rules of Differentiation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237
18.6 The Chain Rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 240
18.7 Increasing and Decreasing . . . . . . . . . . . . . . . . . . . . . . . . . . . 243
18.8 Stationary and Turning Points . . . . . . . . . . . . . . . . . . . . . . . . 244

19 O-Level Review: Trigonometry 252

19.1 Angles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253
19.2 The Radian . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 254
19.3 The Pythagorean Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . 257
19.4 Sine and Cosine: The Right-Triangle Definitions . . . . . . . . . . . . . . 258
19.5 Sine and Cosine: The Unit-Circle Definitions . . . . . . . . . . . . . . . . 262
19.6 Warning about Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 272
19.7 The Inverse Trigonometric Functions . . . . . . . . . . . . . . . . . . . . . 273
19.8 The Area of a Triangle and the Laws of Sines and Cosines . . . . . . . . . 280
20 Elementary Functions 282

21 Polynomial Division 285

21.1 Factorising Polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . . . 288

22 Conic Sections 299

22.1 The Ellipse $x^2 + y^2 = 1$ (The Unit Circle) . . . . . . . . . . . . . . . . . . . 300
22.2 The Ellipse $x^2/a^2 + y^2/b^2 = 1$ . . . . . . . . . . . . . . . . . . . . . . . . . . 301
22.3 The Hyperbola $y = 1/x$ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 303
22.4 The Hyperbola $x^2 - y^2 = 1$ . . . . . . . . . . . . . . . . . . . . . . . . . . . 304
22.5 The Hyperbola $x^2/a^2 - y^2/b^2 = 1$ . . . . . . . . . . . . . . . . . . . . . . . . 305
22.6 The Hyperbola $y^2/b^2 - x^2/a^2 = 1$ . . . . . . . . . . . . . . . . . . . . . . . . 306
22.7 The Hyperbola $y = (bx + c)/(dx + e)$ . . . . . . . . . . . . . . . . . . . . . 307
22.8 The Hyperbola $y = (ax^2 + bx + c)/(dx + e)$ . . . . . . . . . . . . . . . . . . 313

23 Simple Parametric Equations 318

23.1 Eliminating the Parameter t . . . . . . . . . . . . . . . . . . . . . . . . . . 325

24 Solving Inequalities 328

24.1 Multiplying an Inequality by an Unknown Constant . . . . . . . . . . . . 328
24.2 $ax^2 + bx + c > 0$ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 330
24.3 $(ax + b)/(cx + d) > 0$ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 337
24.4 xxx . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 343
24.5 Inequalities Involving the Absolute Value Function . . . . . . . . . . . . . 344
24.6 $(ax^2 + bx + c)/(dx^2 + ex + f) > 0$ . . . . . . . . . . . . . . . . . . . . . . . 353
24.7 Solving Inequalities by Graphical Methods . . . . . . . . . . . . . . . . . . 355

25 Solving Systems of Equations 358

25.1 O-Level Review: Partial Fractions . . . . . . . . . . . . . . . . . . . . . . 367

26 Extraneous Solutions 371

26.1 Squaring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 371
26.2 Multiplying by Zero . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 373
26.3 Removing Logs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 374

Part II. Sequences and Series 376

27 Sequences 378
27.1 Sequences Are Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 380
27.2 Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 382
27.3 Arithmetic Combinations of Sequences . . . . . . . . . . . . . . . . . . . . 383

28 Series 384
28.1 Convergent and Divergent Series . . . . . . . . . . . . . . . . . . . . . . . 385

29 Summation Notation ∑ 389

29.1 Summation Notation for Infinite Series . . . . . . . . . . . . . . . . . . . . 391

30 Arithmetic Sequences and Series 394

30.1 Finite Arithmetic Series . . . . . . . . . . . . . . . . . . . . . . . . . . . . 395
30.2 Infinite Arithmetic Series . . . . . . . . . . . . . . . . . . . . . . . . . . . 396

31 Geometric Sequences and Series 397

31.1 Finite Geometric Sequences and Series . . . . . . . . . . . . . . . . . . . . 398
31.2 Infinite Geometric Sequences and Series . . . . . . . . . . . . . . . . . . . 400

32 Rules of Summation Notation 401

33 The Method of Differences 403


Part III. Vectors 408

34 Introduction to Vectors 410

34.1 The Magnitude or Length of a Vector . . . . . . . . . . . . . . . . . . . . 412
34.2 When Are Two Vectors Identical? . . . . . . . . . . . . . . . . . . . . . . . 413
34.3 A Vector and a Point Are Different Things . . . . . . . . . . . . . . . . . 414
34.4 Two More Ways to Denote a Vector . . . . . . . . . . . . . . . . . . . . . 415
34.5 Position Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 417
34.6 The Zero Vector . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 417
34.7 Displacement Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 418
34.8 Sum and Difference of Points and Vectors . . . . . . . . . . . . . . . . . . 419
34.9 Sum, Additive Inverse, and Difference of Vectors . . . . . . . . . . . . . . 421
34.10 Scalar Multiplication of a Vector . . . . . . . . . . . . . . . . . . . . . . . 424
34.11 When Do Two Vectors Point in the Same Direction? . . . . . . . . . . . . 425
34.12 When Are Two Vectors Parallel? . . . . . . . . . . . . . . . . . . . . . . . 426
34.13 Unit Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 427
34.14 The Standard Basis Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . 429
34.15 Any Vector Is A Linear Combination of Two Other Vectors . . . . . . . . 430
34.16 The Ratio Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 432
35 Lines 434
35.1 Direction Vector . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 434
35.2 Cartesian to Vector Equations . . . . . . . . . . . . . . . . . . . . . . . . . 438
35.3 Pedantic Points to Test/Reinforce Your Understanding . . . . . . . . . . . 443
35.4 Vector to Cartesian Equations . . . . . . . . . . . . . . . . . . . . . . . . . 444
36 The Scalar Product 447
36.1 A Vector’s Scalar Product with Itself . . . . . . . . . . . . . . . . . . . . . 449

37 The Angle Between Two Vectors 450

37.1 The Pythagorean Theorem and Triangle Inequality . . . . . . . . . . . . . 457
37.2 Direction Cosines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 458

38 The Angle Between Two Lines 461

39 Vectors vs Scalars 467
40 The Projection and Rejection Vectors 470

41 Collinearity 474
42 The Vector Product 476
42.1 The Angle between Two Vectors Using the Vector Product . . . . . . . . 479
42.2 The Length of the Rejection Vector . . . . . . . . . . . . . . . . . . . . . . 480

43 The Foot of the Perpendicular From a Point to a Line 481

43.1 The Distance Between a Point and a Line . . . . . . . . . . . . . . . . . . 483

44 Three-Dimensional (3D) Space 488

44.1 Graphs (in 3D) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 490

45 Vectors (in 3D) 492

45.1 The Magnitude or Length of a Vector . . . . . . . . . . . . . . . . . . . . 494
45.2 Sums and Differences of Points and Vectors . . . . . . . . . . . . . . . . . 495
45.3 Sum, Additive Inverse, and Difference of Vectors . . . . . . . . . . . . . . 498
45.4 Scalar Multiplication and When Two Vectors Are Parallel . . . . . . . . . 502
45.5 Unit Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 504
45.6 The Standard Basis Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . 505
45.7 The Ratio Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 506

46 The Scalar Product (in 3D) 507

46.1 The Angle between Two Vectors . . . . . . . . . . . . . . . . . . . . . . . 509
46.2 Direction Cosines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 512

47 The Projection and Rejection Vectors (in 3D) 513

48 Lines (in 3D) 517

48.1 Vector to Cartesian Equations . . . . . . . . . . . . . . . . . . . . . . . . . 520
48.2 Cartesian to Vector Equations . . . . . . . . . . . . . . . . . . . . . . . . . 525
48.3 Parallel and Perpendicular Lines . . . . . . . . . . . . . . . . . . . . . . . 528
48.4 Intersecting Lines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 530
48.5 Skew Lines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 532
48.6 The Angle Between Two Lines . . . . . . . . . . . . . . . . . . . . . . . . 534
48.7 Collinearity (in 3D) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 538
49 The Vector Product (in 3D) 541
49.1 The Right-Hand Rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 545
49.2 The Length of the Vector Product . . . . . . . . . . . . . . . . . . . . . . 546
49.3 The Length of the Rejection Vector . . . . . . . . . . . . . . . . . . . . . . 547

50 The Distance Between a Point and a Line (in 3D) 548

51 Planes: Introduction 553
51.1 The Analogy Between a Plane, a Line, and a Point . . . . . . . . . . . . 557
52 Planes: Formally Defined in Vector Form 559
52.1 The Normal Vector . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 563

53 Planes in Cartesian Form 569

53.1 Finding Points on a Plane . . . . . . . . . . . . . . . . . . . . . . . . . . . 571
53.2 Finding Vectors on a Plane . . . . . . . . . . . . . . . . . . . . . . . . . . 572

54 Planes in Parametric Form 575

54.1 Planes in Parametric Form . . . . . . . . . . . . . . . . . . . . . . . . . . 578
54.2 Parametric to Vector or Cartesian Form . . . . . . . . . . . . . . . . . . . 582

55 Four Ways to Uniquely Determine a Plane 584


56 The Angle between a Line and a Plane 587
56.1 When a Line and a Plane Are Parallel, Perp., or Intersect . . . . . . . . . 589

57 The Angle between Two Planes 592

57.1 When Two Planes Are Parallel, Perp., or Intersect . . . . . . . . . . . . . 593

58 Point-Plane Foot of the Perpendicular and Distance 598

58.1 Formula Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 601

59 Coplanarity 605
59.1 Coplanarity of Lines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 608

Part IV. Complex Numbers 611

60 Complex Numbers: Introduction 613

60.1 The Real and Imaginary Parts of Complex Numbers . . . . . . . . . . . . 616
60.2 Complex Numbers in Ordered Pair Notation . . . . . . . . . . . . . . . . . 617

61 Some Arithmetic of Complex Numbers 618

61.1 Addition and Subtraction . . . . . . . . . . . . . . . . . . . . . . . . . . . 618
61.2 Multiplication . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 619
61.3 Conjugation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 621
61.4 Division . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 623

62 Solving Polynomial Equations 624

62.1 The Fundamental Theorem of Algebra . . . . . . . . . . . . . . . . . . . . 625
62.2 The Complex Conjugate Root Theorem . . . . . . . . . . . . . . . . . . . 627

63 The Argand Diagram 629

64 Complex Numbers in Polar Form 630

64.1 The Argument: An Informal Introduction . . . . . . . . . . . . . . . . . . 631
64.2 The Argument: Formally Defined . . . . . . . . . . . . . . . . . . . . . . . 632
64.3 Complex Numbers in Polar Form . . . . . . . . . . . . . . . . . . . . . . . 634

65 Complex Numbers in Exponential Form 635

65.1 Complex Numbers in Exponential Form . . . . . . . . . . . . . . . . . . . 636

66 More Arithmetic of Complex Numbers 637

66.1 The Reciprocal . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 641
66.2 Division . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 642


Part V. Calculus 645

67 Limits 647
67.1 Limits, Informally Defined . . . . . . . . . . . . . . . . . . . . . . . . . . . 647
67.2 Examples Where The Limit Does Not Exist . . . . . . . . . . . . . . . . . 652
67.3 Rules for Limits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 659

68 Continuity, Revisited 660

68.1 Functions with a Single Discontinuity . . . . . . . . . . . . . . . . . . . . 662
68.2 Functions That Are Discontinuous Everywhere . . . . . . . . . . . . . . . 665
68.3 Functions That Seem Discontinuous But Aren’t . . . . . . . . . . . . . . . 666
68.4 Continuity and Limits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 668
68.5 Every Elementary Function Is Continuous . . . . . . . . . . . . . . . . . . 669
68.6 Continuity at Isolated Points (optional) . . . . . . . . . . . . . . . . . . . 672

69 The Derivative 673

69.1 Differentiable ⇐⇒ Approximately Linear . . . . . . . . . . . . . . . . . . 678
69.2 Continuity vs Differentiability . . . . . . . . . . . . . . . . . . . . . . . . . 680
69.3 The Derivative Is A Function . . . . . . . . . . . . . . . . . . . . . . . . . 681
69.4 The Notation of Lagrange, Leibniz, and Newton . . . . . . . . . . . . . . 682
69.5 Proving Several Rules of Differentiation . . . . . . . . . . . . . . . . . . . 685
69.6 Proving the Product and Quotient Rules . . . . . . . . . . . . . . . . . . . 690
69.7 Proving the Chain Rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . 693
69.8 Are All Elementary Functions Differentiable? . . . . . . . . . . . . . . . . 694
69.9 More About Leibniz’s Notation (optional) . . . . . . . . . . . . . . . . . . 695

70 Some Techniques of Differentiation 698

70.1 The Inverse Function Theorem (IFT) . . . . . . . . . . . . . . . . . . . . . 698
70.2 Implicit Differentiation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 700
70.3 Parametric Differentiation . . . . . . . . . . . . . . . . . . . . . . . . . . . 703

71 The Second and Higher Derivatives 704

71.1 Higher Derivatives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 710
71.2 Smoothness (or Infinite Differentiability) . . . . . . . . . . . . . . . . . . . 714

72 When a Function Is Increasing or Decreasing, Revisited 716

72.1 Extrema and Stationary and Turning Points . . . . . . . . . . . . . . . . . 719
72.2 Interior (and Boundary) Points . . . . . . . . . . . . . . . . . . . . . . . . 723
72.3 The Interior Extremum Theorem (IET) . . . . . . . . . . . . . . . . . . . 725
72.4 The First Derivative Test for Extrema (FDTE) . . . . . . . . . . . . . . . 729
72.5 The Second Derivative Test for Extrema (SDTE) . . . . . . . . . . . . . . 730

73 Concavity 734
73.1 Inflexion Points . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 738
73.2 Stationary and Non-Stationary Points of Inflexion . . . . . . . . . . . . . 741
73.3 The First Derivative Test for Inflexion Points (FDTI) . . . . . . . . . . . 742
73.4 The Second Derivative Test for Inflexion Points (SDTI) . . . . . . . . . . 743


73.5 A Summary of the Types of Points . . . . . . . . . . . . . . . . . . . . . . 746

74 More Techniques of Differentiation 748

74.1 Relating the Graph of $f'$ to That of $f$ . . . . . . . . . . . . . . . . . . . . 748
74.2 Equations of Tangents and Normals . . . . . . . . . . . . . . . . . . . . . 749
74.3 Connected Rates of Change Problems . . . . . . . . . . . . . . . . . . . . 750

75 More Fun With Your TI84 752

75.1 Locate Max and Min Points on Your TI84 . . . . . . . . . . . . . . . . . . 753
75.2 Locate the Derivative at a Point on Your TI84 . . . . . . . . . . . . . . . 755

76 The Maclaurin Series 757

76.1 Power Series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 757
76.2 Analytic Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 759
76.3 Introducing the Maclaurin Series . . . . . . . . . . . . . . . . . . . . . . . 761
76.4 When Can a Function Be Represented by Its Maclaurin Series? . . . . . . 765
76.5 Formally Defining sin and cos . . . . . . . . . . . . . . . . . . . . . . . . . 767
76.6 Revisiting exp (optional) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 769
76.7 Can Every Function Be Represented by Its Maclaurin Series? . . . . . . . 771
76.8 The Five Standard Maclaurin Series . . . . . . . . . . . . . . . . . . . . . 775
76.9 Maclaurin Polynomials as Approximations . . . . . . . . . . . . . . . . . . 777
76.10 Term-by-Term Differentiation . . . . . . . . . . . . . . . . . . . . . . . . . 780
76.11 The Cauchy Product . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 783
76.12 The Composition of Two Analytic Functions . . . . . . . . . . . . . . . . 785
76.13 Repeated Differentiation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 790
76.14 Repeated Implicit Differentiation . . . . . . . . . . . . . . . . . . . . . . . 791
76.15 The Basel Problem (fun, optional) . . . . . . . . . . . . . . . . . . . . . . 794
76.16 The Riemann Hypothesis (fun, optional) . . . . . . . . . . . . . . . . . . . 797

77 Integration 800
77.1 An Important Warning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 804
77.2 A Sketch of How We Can Find the Area under a Curve . . . . . . . . . . 806
77.3 Some Basic Rules of Integration . . . . . . . . . . . . . . . . . . . . . . . . 808
77.4 The First Fundamental Theorem of Calculus (FTC1) . . . . . . . . . . . . 811

78 Antidifferentiation 816
78.1 The Antiderivative Is Not Unique ... . . . . . . . . . . . . . . . . . . . . . 818
78.2 ... But It Is Unique Up to a COI . . . . . . . . . . . . . . . . . . . . . . . 818
78.3 How, Precisely, Should We Use the Antidifferentiation Symbol ∫ ? . . . . 820
78.4 Rules of Antidifferentiation . . . . . . . . . . . . . . . . . . . . . . . . . . 821

79 The Second Fundamental Theorem of Calculus (FTC2) 826

80 More Techniques of Antidifferentiation 828
80.1 Factorisation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 828
80.2 Partial Fractions: Finding $\int \frac{1}{ax^2 + bx + c}\,dx$ where $b^2 - 4ac > 0$ . . . . . . 829
80.3 Building a Divisor of the Denominator . . . . . . . . . . . . . . . . . . . . 831
80.4 More Rules of Antidifferentiation . . . . . . . . . . . . . . . . . . . . . . . 833


80.5 Completing the Square: $\int \frac{1}{ax^2 + bx + c}\,dx$ where $b^2 - 4ac < 0$ . . . . . . . 835
80.6 $\int \frac{1}{\sqrt{ax^2 + bx + c}}\,dx$ in the Special Case where $a < 0$ . . . . . . . . . . . . . 838
80.7 Using Trigonometric Identities . . . . . . . . . . . . . . . . . . . . . . . . 841
80.8 Integration by Parts (IBP) . . . . . . . . . . . . . . . . . . . . . . . . . . . 842
81 The Substitution Rule 846
81.1 $\int [(f' \circ g) \cdot g'] = f \circ g + C$ . . . . . . . . . . . . . . . . . . . . . . . . . . . 851
81.2 $\int f' \exp f = \exp f + C$ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 853
81.3 $\int \frac{f'}{f} = \ln |f| + C$ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 855
81.4 $\int (f)^n \cdot f' = \frac{1}{n+1} (f)^{n+1} + C$ . . . . . . . . . . . . . . . . . . . . . . . . . 858
81.5 Building a Derivative . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 860
81.6 More Challenging Applications of the Substitution Rule . . . . . . . . . . 861
81.7 An Alternative Formula for IBP . . . . . . . . . . . . . . . . . . . . . . . 869
81.8 The Substitution Rule with the TOT and IBP . . . . . . . . . . . . . . . 870
81.9 Finding the Antiderivative of an Inverse Function (optional) . . . . . . . . 871
82 Term-by-Term Integration 874

83 Finding Specific Definite Integrals 876

83.1 Area between a Curve and Lines Parallel to Axes . . . . . . . . . . . . . . 876
83.2 Area between a Curve and a Line . . . . . . . . . . . . . . . . . . . . . . . 878
83.3 Area between Two Curves . . . . . . . . . . . . . . . . . . . . . . . . . . . 879
83.4 Area Below the x-Axis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 880
83.5 Area under a Parametrically-Defined Curve . . . . . . . . . . . . . . . . . 881
83.6 Volume of Rotation About the y- or x-Axis . . . . . . . . . . . . . . . . . 882
84 Types of Integrals (fun, optional) 885

85 Revisiting ln, $\log_b$, and exp (optional) 887

85.1 Revisiting Logarithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 889
85.2 Revisiting the Exponential Function . . . . . . . . . . . . . . . . . . . . . 891

86 Even More Fun with Your TI84 893

87 Differential Equations 894
87.1 dy/dx = f (x) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 894
87.2 dy/dx = f (y) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 896
87.3 d²y/dx² = f (x) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 899
87.4 Word Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 901

xv, Contents

Part VI. Probability and Statistics 904

88 How to Count: Four Principles 906

88.1 How to Count: The Addition Principle . . . . . . . . . . . . . . . . . . . . 907
88.2 How to Count: The Multiplication Principle . . . . . . . . . . . . . . . . . 910
88.3 How to Count: The Inclusion-Exclusion Principle . . . . . . . . . . . . . . 913
88.4 How to Count: The Complements Principle . . . . . . . . . . . . . . . . . 915
89 How to Count: Permutations 916
89.1 Permutations with Repeated Elements . . . . . . . . . . . . . . . . . . . . 919
89.2 Circular Permutations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 924
89.3 Partial Permutations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 927
89.4 Permutations with Restrictions . . . . . . . . . . . . . . . . . . . . . . . . 928
90 How to Count: Combinations 930
90.1 Pascal’s Triangle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 933
90.2 The Combination as Binomial Coefficient . . . . . . . . . . . . . . . . . . 934
90.3 The Number of Subsets of a Set is 2ⁿ . . . . . . . . . . . . . . . . . . . . . 936

91 Probability: Introduction 937

91.1 Mathematical Modelling . . . . . . . . . . . . . . . . . . . . . . . . . . . . 937
91.2 The Experiment as a Model of Scenarios Involving Chance . . . . . . . . . 939
91.3 The Kolmogorov Axioms . . . . . . . . . . . . . . . . . . . . . . . . . . . . 944
91.4 Implications of the Kolmogorov Axioms . . . . . . . . . . . . . . . . . . . 945

92 Probability: Conditional Probability 947

92.1 The Conditional Probability Fallacy (CPF) . . . . . . . . . . . . . . . . . 949
92.2 Two-Boys Problem (Fun, Optional) . . . . . . . . . . . . . . . . . . . . . . 953

93 Probability: Independence 955

93.1 Warning: Not Everything is Independent . . . . . . . . . . . . . . . . . . 958
93.2 Probability: Independence of Multiple Events . . . . . . . . . . . . . . . . 960

94 Fun Probability Puzzles 961

94.1 The Monty Hall Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . 961
94.2 The Birthday Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 964

95 Random Variables: Introduction 965

95.1 A Random Variable vs. Its Observed Values . . . . . . . . . . . . . . . . . 966
95.2 X = k Denotes the Event {s ∈ S ∶ X(s) = k} . . . . . . . . . . . . . . . . . 967
95.3 The Probability Distribution of a Random Variable . . . . . . . . . . . . . 968
95.4 Random Variables Are Simply Functions . . . . . . . . . . . . . . . . . . . 971
96 Random Variables: Independence 973
97 Random Variables: Expectation 976
97.1 The Expected Value of a Constant R.V. is Constant . . . . . . . . . . . . 978


97.2 The Expectation Operator is Linear . . . . . . . . . . . . . . . . . . . . . 980

98 Random Variables: Variance 982

98.1 The Variance of a Constant R.V. is 0 . . . . . . . . . . . . . . . . . . . . . 987
98.2 Standard Deviation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 988
98.3 The Variance Operator is Not Linear . . . . . . . . . . . . . . . . . . . . . 989
98.4 The Definition of the Variance (Optional) . . . . . . . . . . . . . . . . . . 991
99 The Coin-Flips Problem (Fun, Optional) 992
100 The Bernoulli Trial and the Bernoulli Distribution 993
100.1 Mean and Variance of the Bernoulli Random Variable . . . . . . . . . . . 995

101 The Binomial Distribution 996

101.1 Probability Distribution of the Binomial R.V. . . . . . . . . . . . . . . . . 998
101.2 The Mean and Variance of the Binomial Random Variable . . . . . . . . . 999

102 The Continuous Uniform Distribution 1001

102.1 The Continuous Uniform Distribution . . . . . . . . . . . . . . . . . . . . 1002
102.2 The Cumulative Distribution Function (CDF) . . . . . . . . . . . . . . . . 1004
102.3 Important Digression: P (X ≤ k) = P (X < k) . . . . . . . . . . . . . . . . . 1005
102.4 The Probability Density Function (PDF) . . . . . . . . . . . . . . . . . . 1006
103 The Normal Distribution 1007
103.1 The Normal Distribution, in General . . . . . . . . . . . . . . . . . . . . . 1014
103.2 Sum of Independent Normal Random Variables . . . . . . . . . . . . . . . 1023
103.3 The Central Limit Theorem and The Normal Approximation . . . . . . . 1026

104 The CLT is Amazing (Optional) 1028

104.1 The Normal Distribution in Nature . . . . . . . . . . . . . . . . . . . . . . 1028
104.2 Illustrating the Central Limit Theorem (CLT) . . . . . . . . . . . . . . . . 1032
104.3 Why Are So Many Things Normally Distributed? . . . . . . . . . . . . . . 1038
104.4 Don’t Assume That Everything is Normal . . . . . . . . . . . . . . . . . . 1039
105 Statistics: Introduction (Optional) 1045
105.1 Probability vs. Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1045
105.2 Objectivists vs Subjectivists . . . . . . . . . . . . . . . . . . . . . . . . . . 1046

106 Sampling 1048

106.1 Population . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1048
106.2 Population Mean and Population Variance . . . . . . . . . . . . . . . . . . 1049
106.3 Parameter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1050
106.4 Distribution of a Population . . . . . . . . . . . . . . . . . . . . . . . . . . 1051
106.5 A Random Sample . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1052
106.6 Sample Mean and Sample Variance . . . . . . . . . . . . . . . . . . . . . . 1054
106.7 Sample Mean and Sample Variance are Unbiased Estimators . . . . . . . 1060
106.8 The Sample Mean is a Random Variable . . . . . . . . . . . . . . . . . . 1063
106.9 The Distribution of the Sample Mean . . . . . . . . . . . . . . . . . . . . 1064


106.10 Non-Random Samples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1065

107 Null Hypothesis Significance Testing (NHST) 1066

107.1 One-Tailed vs Two-Tailed Tests . . . . . . . . . . . . . . . . . . . . . . . . 1070
107.2 The Abuse of NHST (Optional) . . . . . . . . . . . . . . . . . . . . . . . . 1073
107.3 Common Misinterpretations of the Margin of Error (Optional) . . . . . . 1074
107.4 Critical Region and Critical Value . . . . . . . . . . . . . . . . . . . . . . 1077
107.5 Testing of a Pop. Mean (Small Sample, Normal Distribution, σ² Known) 1079
107.6 Testing of a Pop. Mean (Large Sample, Any Distribution, σ² Known) . . 1081
107.7 Testing of a Pop. Mean (Large Sample, Any Distribution, σ² Unknown) . 1083
107.8 Formulation of Hypotheses . . . . . . . . . . . . . . . . . . . . . . . . . . 1085

108 Correlation and Linear Regression 1086

108.1 Bivariate Data and Scatter Diagrams . . . . . . . . . . . . . . . . . . . . . 1086
108.2 Product Moment Correlation Coefficient (PMCC) . . . . . . . . . . . . . 1088
108.3 Correlation Does Not Imply Causation (Optional) . . . . . . . . . . . . . 1093
108.4 Linear Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1094
108.5 Ordinary Least Squares (OLS) . . . . . . . . . . . . . . . . . . . . . . . . 1096
108.6 TI84 to Calculate the PMCC and the OLS Estimates . . . . . . . . . . . 1101
108.7 Interpolation and Extrapolation . . . . . . . . . . . . . . . . . . . . . . . . 1103
108.8 Transformations to Achieve Linearity . . . . . . . . . . . . . . . . . . . . . 1111
108.9 The Higher the PMCC, the Better the Model? . . . . . . . . . . . . . . . 1115


Part VII. Ten-Year Series 1117

109 Past-Year Questions for Part I. Functions and Graphs 1121

110 Past-Year Questions for Part II. Sequences and Series 1134
111 Past-Year Questions for Part III. Vectors 1144

112 Past-Year Questions for Part IV. Complex Numbers 1152

113 Past-Year Questions for Part V. Calculus 1158

114 Past-Year Questions for Part VI. Prob. and Stats. 1187
115 All Past-Year Questions, Listed and Categorised 1224
115.1 2017 (9758) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1224
115.2 2016 (9740) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1226
115.3 2015 (9740) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1227
115.4 2014 (9740) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1228
115.5 2013 (9740) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1229
115.6 2012 (9740) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1230
115.7 2011 (9740) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1231
115.8 2010 (9740) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1232
115.9 2009 (9740) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1233
115.10 2008 (9740) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1234
115.11 2007 (9740) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1235
115.12 2008 (9233) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1236
115.13 2007 (9233) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1237
115.14 2006 (9233) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1238


Part VIII. Appendices 1250

116 Appendices for Part 0. A Few Basics 1251

116.1 Logic . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1251
116.2 Sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1253
116.3 Division . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1256

117 Appendices for Part I. Functions and Graphs 1257

117.1 Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1257
117.2 The Quadratic Equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1265
117.3 Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1267
117.4 Inverse Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1269
117.5 Transformations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1272
117.6 Trigonometry . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1273
117.7 Polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1274
117.8 Conic Sections . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1275
117.9 Inequalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1280
118 Appendices for Part II. Sequences and Series 1281
118.1 Convergence, Divergence, and All That . . . . . . . . . . . . . . . . . . . 1282

119 Appendices for Part III. Vectors 1285

119.1 Some General Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . 1285
119.2 Some Basic Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1287
119.3 Scalar Product . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1289
119.4 Angles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1291
119.5 The Relationship Between Two Lines . . . . . . . . . . . . . . . . . . . . . 1292
119.6 Lines in 3D Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1293
119.7 Projection Vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1295
119.8 The Vector Product . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1297
119.9 Planes in General . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1299
119.10 Planes in Three-Dimensional Space . . . . . . . . . . . . . . . . . . . . . . 1305
119.11 Four Ways to Uniquely Determine a Plane . . . . . . . . . . . . . . . . . . 1307
119.12 The Relationship Between a Line and a Plane . . . . . . . . . . . . . . . . 1308
119.13 The Relationship Between Two Planes . . . . . . . . . . . . . . . . . . . . 1309
119.14 Distances . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1311
119.15 Point-Plane Distance: Calculus Method . . . . . . . . . . . . . . . . . . . 1312
119.16 The Relationship Between Two Lines in 3D Space . . . . . . . . . . . . . 1316
119.17 A Necessary and Sufficient Condition for Skew Lines . . . . . . . . . . . . 1317
120 Appendices for Part IV. Complex Numbers 1318

121 Appendices for Part V. Calculus 1324

121.1 A Few Useful Results and Terms . . . . . . . . . . . . . . . . . . . . . . . 1325
121.2 Limits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1327
121.3 Infinite Limits and Vertical Asymptotes . . . . . . . . . . . . . . . . . . . 1328
121.4 Limits at Infinity and Horizontal and Oblique Asymptotes . . . . . . . . . 1329
121.5 Rules for Limits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1330
121.6 Continuity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1332
121.7 The Derivative . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1336
121.8 Proving the Chain Rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1338
121.9 When a Function is Increasing or Decreasing . . . . . . . . . . . . . . . . 1341
121.10 The First and Second Derivative Tests . . . . . . . . . . . . . . . . . . . . 1344
121.11 Concavity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1348
121.12 The Maclaurin Series . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1354
121.13 The Riemann Integral . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1355
121.14 Proving the FTC1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1358
121.15 More Techniques of Antidifferentiation . . . . . . . . . . . . . . . . . . . . 1359
121.16 ∫ 1/√(ax² + bx + c) dx . . . . . . . . . . . . . . . . . . . . . . . . . . . 1361
121.17 Revisiting Logarithms and Exponentiation . . . . . . . . . . . . . . . . . . 1363

122 Appendices for Part VI. Probability and Statistics 1367

122.1 How to Count . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1367
122.2 Circular Permutations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1369
122.3 Probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1370
122.4 Random Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1371
122.5 The Normal Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1375
122.6 Sampling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1378
122.7 Null Hypothesis Significance Testing . . . . . . . . . . . . . . . . . . . . . 1380
122.8 Calculating the Margin of Error . . . . . . . . . . . . . . . . . . . . . . . . 1381
122.9 Correlation and Linear Regression . . . . . . . . . . . . . . . . . . . . . . 1383
122.10 Deriving a Linear Model from the Barometric Formula . . . . . . . . . . . 1385


Part IX. Answers to Exercises 1386

123 Part 0 Answers (A Few Basics) 1387

123.1 Ch. 1 Answers (Just To Be Clear) . . . . . . . . . . . . . . . . . . . . . . 1387
123.2 Ch. 2 Answers (PSLE Review: Division) . . . . . . . . . . . . . . . . . . . 1387
123.3 Ch. 3 Answers (Logic) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1388
123.4 Ch. 4 Answers (Sets) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1393
123.5 Ch. 5 Answers (O-Level Review) . . . . . . . . . . . . . . . . . . . . . . . 1396

124 Part I Answers (Functions and Graphs) 1398

124.1 Ch. 6 Answers (Graphs) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1398
124.2 Ch. 7 Answers (Reflection and Symmetry) . . . . . . . . . . . . . . . . . . 1403
124.3 Ch. 8 Answers (Solutions and Solution Sets) . . . . . . . . . . . . . . . . 1403
124.4 Ch. 9 Answers (O-Level Review: The Quadratic Equation) . . . . . . . . 1404
124.5 Ch. 10 Answers (Functions) . . . . . . . . . . . . . . . . . . . . . . . . . . 1405
124.6 Ch. 11 Answers (An Introduction to Continuity) . . . . . . . . . . . . . . 1407
124.7 Ch. 13 Answers (Arithmetic Combinations of Functions) . . . . . . . . . . 1408
124.8 Ch. 14 Answers (Inverse Functions) . . . . . . . . . . . . . . . . . . . . . 1409
124.9 Ch. 15 Answers (Composite Functions) . . . . . . . . . . . . . . . . . . . 1413
124.10 Ch. 16 Answers (Transformations) . . . . . . . . . . . . . . . . . . . . . . 1416
124.11 Ch. 17 Answers (ln, exp, and e) . . . . . . . . . . . . . . . . . . . . . . . . 1421
124.12 Ch. 18 Answers (O-Level Review: The Derivative) . . . . . . . . . . . . . 1422
124.13 Ch. 19 Answers (O-Level Review: Trigonometry) . . . . . . . . . . . . . . 1426
124.14 Ch. 21 Answers (Polynomials) . . . . . . . . . . . . . . . . . . . . . . . . 1431
124.15 Ch. 22 Answers (Conic Sections) . . . . . . . . . . . . . . . . . . . . . . . 1434
124.16 Ch. 23 Answers (Simple Parametric Equations) . . . . . . . . . . . . . . . 1442
124.17 Ch. 24 Answers (Solving Inequalities) . . . . . . . . . . . . . . . . . . . . 1448
124.18 Ch. 25 Answers (Solving Systems of Equations) . . . . . . . . . . . . . . . 1457
124.19 Ch. 26 Answers (Extraneous Solutions) . . . . . . . . . . . . . . . . . . . 1463
125 Part II Answers (Sequences and Series) 1464
125.1 Ch. 27 Answers (Sequences) . . . . . . . . . . . . . . . . . . . . . . . . . . 1464
125.2 Ch. 28 Answers (Series) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1464
125.3 Ch. 29 Answers (Summation Notation Σ) . . . . . . . . . . . . . . . . . . 1465
125.4 Ch. 30 Answers (Arithmetic Sequences and Series) . . . . . . . . . . . . . 1466
125.5 Ch. 31 Answers (Geometric Sequences and Series) . . . . . . . . . . . . . 1466
125.6 Ch. 32 Answers (Rules of Summation Notation) . . . . . . . . . . . . . . 1467
125.7 Ch. 33 Answers (Method of Differences) . . . . . . . . . . . . . . . . . . . 1468
126 Part III Answers (Vectors) 1471
126.1 Ch. 34 Answers (Introduction to Vectors) . . . . . . . . . . . . . . . . . . 1471
126.2 Ch. 35 Answers (Lines) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1474
126.3 Ch. 36 Answers (The Scalar Product) . . . . . . . . . . . . . . . . . . . . 1475
126.4 Ch. 37 Answers (The Angle Between Two Vectors) . . . . . . . . . . . . . 1476
126.5 Ch. 38 Answers (The Angle Between Two Lines) . . . . . . . . . . . . . . 1478
126.6 Ch. 39 Answers (Vectors vs Scalars) . . . . . . . . . . . . . . . . . . . . . 1478
126.7 Ch. 40 Answers (Projection Vectors) . . . . . . . . . . . . . . . . . . . . . 1478
126.8 Ch. 41 Answers (Collinearity) . . . . . . . . . . . . . . . . . . . . . . . . . 1479
126.9 Ch. 42 Answers (The Vector Product) . . . . . . . . . . . . . . . . . . . . 1480
126.10 Ch. 43 Answers (The Foot of the Perpendicular) . . . . . . . . . . . . . . 1481
126.11 Ch. 44 Answers (Three-Dimensional Space) . . . . . . . . . . . . . . . . . 1485
126.12 Ch. 45 Answers (Vectors in 3D) . . . . . . . . . . . . . . . . . . . . . . . . 1485
126.13 Ch. 46 Answers (The Scalar Product in 3D) . . . . . . . . . . . . . . . . . 1487
126.14 Ch. 47 Answers (The Proj. and Rej. Vectors in 3D) . . . . . . . . . . . . 1488
126.15 Ch. 48 Answers (Lines in 3D) . . . . . . . . . . . . . . . . . . . . . . . . . 1489
126.16 Ch. 49 Answers (The Vector Product in 3D) . . . . . . . . . . . . . . . . 1492
126.17 Ch. 50 Answers (The Distance Between a Point and a Line) . . . . . . . 1495
126.18 Ch. 51 Answers (Planes: Introduction) . . . . . . . . . . . . . . . . . . . . 1498
126.19 Ch. 52 Answers (Planes: Formally Defined in Vector Form) . . . . . . . . 1499
126.20 Ch. 53 Answers (Planes in Cartesian Form) . . . . . . . . . . . . . . . . . 1500
126.21 Ch. 54 Answers (Planes in Parametric Form) . . . . . . . . . . . . . . . . 1501
126.22 Ch. 56 Answers (The Angle Between a Line and a Plane) . . . . . . . . . 1504
126.23 Ch. 58 Answers (Point-Plane Foot and Distance) . . . . . . . . . . . . . . 1507
126.24 Ch. 59 Answers (Coplanarity) . . . . . . . . . . . . . . . . . . . . . . . . . 1510

127 Part IV Answers (Complex Numbers) 1512

127.1 Ch. 60 Answers (Complex Numbers: Introduction) . . . . . . . . . . . . . 1512
127.2 Ch. 61 Answers (Some Arithmetic of Complex Numbers) . . . . . . . . . 1512
127.3 Ch. 62 Answers (Solving Polynomial Equations) . . . . . . . . . . . . . . 1514
127.4 Ch. 63 Answers (The Argand Diagram) . . . . . . . . . . . . . . . . . . . 1515
127.5 Ch. 64 Answers (Complex Numbers in Polar Form) . . . . . . . . . . . . . 1516
127.6 Ch. 65 Answers (Complex Numbers in Exponential Form) . . . . . . . . . 1516
127.7 Ch. 66 Answers (More Arithmetic of Complex Numbers) . . . . . . . . . 1517
128 Part V Answers (Calculus) 1520
128.1 Ch. 67 Answers (An Introduction to Limits) . . . . . . . . . . . . . . . . 1520
128.2 Ch. 68 Answers (An Introduction to Continuity, Continued) . . . . . . . . 1521
128.3 Ch. 69 Answers (An Introduction to the Derivative) . . . . . . . . . . . . 1522
128.4 Ch. 71 Answers (The Second and Higher Derivatives) . . . . . . . . . . . 1526
128.5 Ch. 70.2 Answers (Implicit Differentiation) . . . . . . . . . . . . . . . . . 1527
128.6 Ch. ?? Answers (Solving Problems Involving Differentiation) . . . . . . . 1528
128.7 Ch. 72.1 Answers (Stationary, Maximum, Minimum, and Inflexion Points) 1530
128.8 Ch. 76 Answers (Maclaurin Series) . . . . . . . . . . . . . . . . . . . . . . 1536
128.9 Ch. 77 Answers (Integration) . . . . . . . . . . . . . . . . . . . . . . . . . 1541
128.10 Ch. 78 Answers (Antidifferentiation) . . . . . . . . . . . . . . . . . . . . . 1542
128.11 Ch. 79 Answers (FTC2) . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1545
128.12 Ch. 80 Answers (More Techniques of Antidifferentiation) . . . . . . . . . 1546
128.13 Ch. ?? Answers (The Fundamental Theorems of Calculus) . . . . . . . . . 1554
128.14 Ch. 83 Answers (Definite Integrals) . . . . . . . . . . . . . . . . . . . . . . 1555
128.15 Ch. 85 Answers (Revisiting Logarithms) . . . . . . . . . . . . . . . . . . . 1558
128.16 Ch. 87 Answers (Differential Equations) . . . . . . . . . . . . . . . . . . . 1559

129 Part VI Answers (Probability and Statistics) 1563

129.1 Ch. 88 Answers (How to Count: Four Principles) . . . . . . . . . . . . . . 1563
129.2 Ch. 89 Answers (How to Count: Permutations) . . . . . . . . . . . . . . . 1566
129.3 Ch. 90 Answers (How to Count: Combinations) . . . . . . . . . . . . . . . 1568
129.4 Ch. 91 Answers (Probability: Introduction) . . . . . . . . . . . . . . . . . 1571
129.5 Ch. 92 Answers (Conditional Probability) . . . . . . . . . . . . . . . . . . 1575
129.6 Ch. 93 Answers (Probability: Independence) . . . . . . . . . . . . . . . . 1576
129.7 Ch. 95 Answers (Random Variables: Introduction) . . . . . . . . . . . . . 1577
129.8 Ch. 96 Answers (Random Variables: Independence) . . . . . . . . . . . . 1581
129.9 Ch. 97 Answers (Random Variables: Expectation) . . . . . . . . . . . . . 1581
129.10 Ch. 98 Answers (Random Variables: Variance) . . . . . . . . . . . . . . . 1583
129.11 Ch. 99 Answers (The Coin-Flips Problem) . . . . . . . . . . . . . . . . . . 1583
129.12 Ch. 100 Answers (Bernoulli Trial and Distribution) . . . . . . . . . . . . . 1583
129.13 Ch. 101 Answers (Binomial Distribution) . . . . . . . . . . . . . . . . . . 1584
129.14 Ch. 102 Answers (Continuous Uniform Distribution) . . . . . . . . . . . . 1585
129.15 Ch. 103 Answers (Normal Distribution) . . . . . . . . . . . . . . . . . . . 1586
129.16 Ch. 106 Answers (Sampling) . . . . . . . . . . . . . . . . . . . . . . . . . 1592
129.17 Ch. 107 Answers (Null Hypothesis Significance Testing) . . . . . . . . . . 1595
129.18 Ch. 108 Answers (Correlation and Linear Regression) . . . . . . . . . . . 1599
130 Part VII Answers (2006–17 A-Level Exams) 1603
130.1 Ch. 109 Answers (Functions and Graphs) . . . . . . . . . . . . . . . . . . 1603
130.2 Ch. 110 Answers (Sequences and Series) . . . . . . . . . . . . . . . . . . . 1634
130.3 Ch. 111 Answers (Vectors) . . . . . . . . . . . . . . . . . . . . . . . . . . 1652
130.4 Ch. 112 Answers (Complex Numbers) . . . . . . . . . . . . . . . . . . . . 1664
130.5 Ch. 113 Answers (Calculus) . . . . . . . . . . . . . . . . . . . . . . . . . . 1685
130.6 Ch. 114 Answers (Probability and Statistics) . . . . . . . . . . . . . . . . 1736

Index 1767
Abbreviations Used in This Textbook 1771
Singlish Used in This Textbook 1775

Notation Used in the Main Text 1776

Notation Used in the Appendices 1777

YouTube Ad 1778
Tuition Ad 1779


About this Book
This textbook is for H2 Maths students in Singapore (hence the occasional Singlish and
TLAs²). It exactly follows the latest 9758 syllabus.³ (Of course, I hope that anyone else
in the world will also find this useful!)

• FREE! This book is free. But if you paid any money for it, I certainly hope your money
is going to me! This book is free because:
1. It is a shameless advertising vehicle for my awesome tutoring services.
2. The marginal cost of reproducing this book is zero and I am a benevolent maximiser of
social welfare. If you don’t understand what that last sentence means, you should read
my economics textbooks.⁴ (Quick translation: I’m a very nice guy. ☺)

• HELP ME IMPROVE THIS BOOK! Feel free to email me if:

1. There are any errors in this book. Please let me know even if it’s something as trivial
as a spelling mistake, an extra space, a grammatical error, or an incorrect/broken link.
2. You have absolutely any suggestions for improvement.
3. Any part of this book is less than crystal clear.
If at any point in this textbook, you’ve read the same passage a few times, tried to reason
it through, and still find things confusing, then it is a failure on MY part. I have failed to
explain things clearly enough. Please let me know and I will try to rewrite it so that it’s
clearer. (There is also the possibility that I simply made some mistake or typo! So please
let me know if there’s anything confusing!)
I deeply value any feedback, because I’d like to keep improving this textbook
for the benefit of everyone! I am very grateful to all the kind folks who’ve already
written in, allowing me to rid this book of more than a few embarrassing errors.

• LyX rocks!⁵ A big thank you to all the developers and those who have helped to fund
its development.

² Three Letter Abbreviations. In the US for example, abbreviations are viewed by some as dumbing down.
In contrast, in Singapore, the ability to use as many abbreviations as possible and even create one’s own
abbreviations is nearly a mark of intelligence. There shall therefore be very many abbreviations in this
textbook.
³ In 2017, the current 9758 syllabus was examined for the first time and the previous 9740 syllabus for the
last time.
⁴ I’m working on these. You can find half-completed versions on my website.
⁵ This book was written using LyX. LaTeX is the typesetting program used by most economists and
scientists. But LaTeX can be annoying to use. LyX is a user-friendly, GUI, for-dummies version of LaTeX.
With LyX, you can actually clearly see on screen the equations you’re typing, as you’re typing them.
It is quite pointless to try working out one’s maths in LaTeX because you can’t clearly tell what you’re
writing. In contrast, it is perfectly feasible and indeed easy to work out one’s maths in LyX. For example,
if you have countless lines of tedious algebra to do, you can do it in LyX and copy-paste/document every
step of the way. Otherwise you’d probably be doing it with pen and paper, which is messy and which you’ll
probably misplace.
LyX has boosted my productivity by countless hours over the years and you should use LyX too!
• Maths, Math, and Matzz.
This book uses British English. And so the word mathematics shall be abbreviated as maths
(and not math). By the way, Singaporeans used to pronounce maths as matzz (similar to
how they pronounce clothes as klotes). But at some point between 2005 and 2015, perhaps
after watching too much American TV and too many American movies, Singaporeans decided they’d switch to
the American math. Perhaps in 50 years, we’ll all be Amos Yees trying to speak annoying
pseudo-American English. But in the meantime, I’ll continue to say matzz. This is my way
of promoting and preserving Singlish (and also sticking it to the ghost of LKY).


Tips for the Student
• Read maths slowly.
Reading maths is not like reading Harry Potter. Most of Harry Potter is fluff. You can
skip 100 pages of Harry Potter without missing a beat.⁶
In contrast, there is little fluff in maths. Skip or misapprehend one line and it will cost you.
So, go slowly. Dwell upon and carefully consider every sentence in this textbook. Make
sure you completely understand what each statement says and why it is true. Reading
maths is very different from reading most other subject matter.
If you don’t quite understand some material, you might be tempted to move forward anyway.
Don’t. In maths, later material usually builds on earlier material. So, if you simply move
forward, then yea sure, this may save you some time and frustration in the short run, but
it will almost always cost you far more in the long run.
Better then to stop right there. Keep working on it until you “get” it. Help is all around
— ask a friend or a teacher. Feel free to even email me! (I’m always interested to know
what the common points of confusion are and how I can better clear them up.)

• Examples and exercises are your best friends.

A good stock of examples, as large as possible, is indispensable for a thorough
understanding of any concept, and when I want to learn something new, I
make it my first job to build one.

— Paul Halmos (1985, I Want to Be a Mathematician: An Automathography).

Carefully work through all the examples and exercises. Merely moving your eyeballs is
not the same as working. Working means having pencil and paper by your side and going
through each example/exercise word-by-word, line-by-line.
For example, I might say something like “x² − y² = 0. Thus, (x − y)(x + y) = 0.” If it’s not
obvious to you why the first sentence implies the second, stop right there and work on it
until you understand why. And again, if you can’t figure it out yourself, don’t be shy about
asking someone for help. Don’t just let your eyeballs fly over these sentences and pretend
that your brain “gets” it. Such self-deceit will only cost you in the long run.
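As a rough illustration of what “working on it” might look like for the example above, one way to check the claim is to expand the right-hand side yourself. This is only a sketch of one possible route; the final remark (that a zero product forces a zero factor) is an extra step not spelled out in the example.

```latex
\begin{align*}
(x - y)(x + y) &= x^2 + xy - yx - y^2 \\
               &= x^2 - y^2 .
\end{align*}
% So the two expressions are equal for all x and y. In particular:
\[
x^2 - y^2 = 0 \implies (x - y)(x + y) = 0 \implies x = y \ \text{ or } \ x = -y .
\]
```

Writing out even a two-line expansion like this, pencil in hand, is usually enough to turn “I think I see it” into “I see it”.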
The Chinese believe in “eating bitterness” and work for work’s sake. That is not my view.
The exercises in this textbook are not to make you suffer or somehow strengthen your moral
fibre. Instead, as with learning to ride a bike or swim, practice makes perfect. The best
way to learn and master any material is by practising, doing, and occasionally failing. You
may struggle initially and get a few bruises, but eventually, you’ll get good.
I strongly advise that you do every one of the exercises in this textbook. They will help
you learn. They will also serve as a check that you’ve actually “got it”. And if you haven’t,
then well, as mentioned, keep working until you get it or seek help. The seemingly-easy
way out of pretending you’ve got it and flipping to the next page is actually the hard way
out, because it will only cost you more grief in the long run.
⁶ Spoiler: Voldemort dies, Harry Potter lives happily ever after.
• Confused? Good!

Shall I teach you what wisdom means? To know what you know and know
what you do not know — this then is wisdom.

— Confucius (Analects Book II Ch. 17, 1998 trans.).

... he fancies he knows something, although he knows nothing, whereas I, as
I do not know any thing, so I do not fancy I do. In this trifling particular,
then, I appear to be wiser than him, because I do not fancy I know what I do
not know.

— Plato’s Socrates (Apology).

... he discovered that, when he imagined his education was completed, it had
in fact not commenced; and that, although he had been at a public school and
a university, he in fact knew nothing. To be conscious that you are ignorant
is a great step to knowledge.

— Benjamin Disraeli (1845, Sybil).

There are always some students who, when probed, say they are not at all confused and
have no questions to ask. In my experience, these are usually precisely the worst students.
These students have such poor understanding of the material that they do not know
what they do not know.
And so, when you find yourself confused, don’t panic or despair. Your confusion is actually
good news! To be confused is to know that you do not know and thus to have taken your
first step towards wisdom. You now know what you need to work on and what questions
to ask your friends and teachers.
Of course, merely knowing what you do not know is not enough. You need to actually act
on this as well. Be proactive: your education and your life are in your own hands!


• Learning by teaching.

Feynman was once asked by a Caltech faculty member to explain why spin
1/2 particles obey Fermi-Dirac statistics. He gauged his audience perfectly
and said, “I’ll prepare a freshman lecture on it.” But a few days later he
returned and said, “You know, I couldn’t do it. I couldn’t reduce it to
the freshman level. That means we really don’t understand it.”

— Special Preface to Six Easy Pieces (1989).

There is a view in some philosophical circles that anything that can be
understood by people who have not studied philosophy is not profound enough
to be worth saying. To the contrary, I suspect that whatever cannot be said
clearly is probably not being thought clearly either.

— Peter Singer (2016, Ethics in the Real World).

I agree: If you can’t explain something simply, you don’t understand it well.
This saying is useful for the instructor. But it also yields the learner the following useful
corollary (in mathematics, a corollary is a statement that follows readily from another):

An excellent test of whether you actually understand something is

to see if you can explain it simply to yourself or to someone else.

This learning technique may be dubbed learning by teaching and is a perfectly good
one. As the Latin proverb goes, Docendo discimus — by teaching, we learn.
One way to implement this technique8 is for you and your friends to get together and explain
concepts to each other aloud. For this technique to work, you and your friends must
challenge each other and be demanding of each other. Do not be content until you’ve
actually given each other the full and correct explanation of the concept at hand.

This quote or some similar variant is often misattributed to Einstein. But as Einstein himself once said,
“73% of Einstein quotes are misattributed.”
Another way is to hand over the classroom to students. But this is probably too adventurous a move in
Singapore, where the closest thing is probably the officially-sanctioned and of course graded project work
• Lessons from the science of learning.
Unfortunately, the science of learning is still very much in its infancy. But do check out
this short review of the literature: “How We Learn: What Works, What Doesn’t” (2013).
According to this review, the two most effective study/learning techniques are:
1. Self-testing.
2. Distributed practice. (Sometimes also called spaced practice.)
The next three are:
3. Elaborative interrogation.
4. Self-explanation.
5. Interleaved practice.
Two commonly-used but ineffective techniques are (a) highlighting; and (b) rereading.
Again, be warned that the science of learning is very much in its infancy, so you should
take with a large dose of salt any advice (including mine about learning by teaching).
Nonetheless, there’s probably no harm trying out different techniques and seeing what
works for you.


Miscellaneous Tips for the Student
• You get a List of Formulae (MF26) during the A-Level exam.
So there’s no need to memorise all the formulae that are already on this list. (Note though
that your JC may or may not give you List MF26 during your JC common tests and exams.)

• Remember your O-Level Maths & ‘A’ Maths?

You’ve probably forgotten some (or most?) of it, but unfortunately, you are still assumed to
know all of O-Level Maths (2017 syllabus) and “some” (OK, more like a lot) of Additional
Maths (pp. 14–15 of your A-Level syllabus lists what you need to know from ‘A’ Maths).
(To take H2 Maths, most JCs require that you at least passed ‘A’ Maths.)9
Littered around this textbook are occasional “O-Level Reviews” (e.g. Ch. 5 and 19). These
reviews will usually be very quick and hopefully you’ll have no difficulty with them. But if
you do, go back and review your O-Level Maths and ‘A’ Maths!

• Web-based calculators.
Google is probably the quickest for simple calculations. Type in anything into your
browser’s Google search bar and the answer will instantly show up:

Wolfram Alpha is somewhat more advanced (but also slower). Enter sin x for example and
you’ll get graphs, the derivative, the indefinite integral, the Maclaurin series, and a bunch
of other stuff you neither know nor care about. In this textbook, you’ll sometimes see
(usually at the end of an example or an exercise answer) a clickable Wolfram Alpha logo
that will bring you to the relevant computation on Wolfram Alpha.
Symbolab is a much less powerful alternative to Wolfram Alpha. However, it’s perfectly
good for simple algebra and somewhat quicker, so you may sometimes prefer using it.
The Derivative Calculator and the Integral Calculator are probably unbeatable for the specific
purposes of differentiation and integration. Both give step-by-step solutions for anything
you want to differentiate or integrate. As with Wolfram Alpha, you’ll sometimes see clickable
logos that’ll bring you to the relevant computations. (Note that unfortunately,
after clicking, you’ll also have to either click “Go!” or hit Enter.)10
I also made this Collection of Spreadsheets (click “Make a copy”). These are for doing tedious
and repetitive calculations you’ll often encounter in H2 Maths (with vectors, complex
numbers, etc.).11

Some kiasu JCs, like HCI, even require that you got at least a B3 for both Maths & Additional Maths.
The site author informed me that not having a direct link was deliberate and defensive.
As with anything I do, I welcome any feedback on these spreadsheets. (Perhaps in the future I will make
a more attractive version.)
• Other online resources.
There are way too many websites that try to cover primary, secondary, and lower-level
undergraduate maths. Unfortunately, some of them are awful and get things wrong.
Three websites I like (though are probably a bit advanced for JC students) are:

1. The Stack Exchange (SE) family of Q&A websites.12

Just for maths alone, there are three SE sites!13
(a) At Mathematics Stack Exchange, you can ask questions and often get them answered fairly
promptly. Note though that this site is mostly frequented by fairly advanced users of maths
(including many mathematicians), so they can be pretty impatient and quick to downvote
questions they perceive to be “stupid”. Nonetheless, if you make an effort to write
down a carefully-crafted question and show also that you’ve made some effort to look
for an answer (either on your own or online), they can be very helpful.
(b) Mathematics Educators beta focuses on pedagogy (teaching and learning). Unfortunately,
it’s not as active as Mathematics Stack Exchange. Nonetheless, many of the discussions there are
filled with insights — insights that I’ve tried to incorporate into this textbook.
(c) MathOverflow is for research-level mathematics and is thus way beyond anything
that’s of use to us.
In this textbook, you’ll occasionally see the logos of these sites. Click/touch them and you’ll be
sent to related SE discussions.
2. ProofWiki gives succinct and rigorous definitions and proofs. Unfortunately it is very
3. Mathworld.Wolfram is also great, but at times excessively encyclopaedic, at the cost of
clarity and brevity.
And of course, you can find countless free maths textbooks online (some less legal than
others). One wonderful and (mostly) legal resource is the Internet Archive, which has
many old books — including very many that were scanned by Google but which are no
longer available on Google Books.
Two totally illegal14 resources are: Library Genesis for books and Sci-Hub for articles.15
And of course, an old reliable is BitTorrent. (I have, of course, never used any of these
illegal resources, but I hear they are great.)

The flagship SE site is Stack Overflow, where you can ask any programming question and (often) see it
answered amazingly quickly. For computing in general, there are also many other SE sites.
SE sites are like Yahoo! Answers or Quora, but less stupid. The worst SE site is probably Politics beta ,
but even there, the average question or answer is probably better than that on Yahoo! Answers or Quora.
There are also many other wonderful SE sites that you should explore. Unfortunately, despite my
magnificent contributions, Economics beta is not exactly thriving. It seems that economists, unlike
programmers or mathematicians, have learnt all too well that contributing to the public good is folly.
Well, depending on which jurisdiction you live in. Of course, in Singapore, unless told otherwise, you
should assume that everything is illegal.
Note though that these sites are constantly playing whac-a-mole with the fascist authorities and so the
URLs often change. If the links here aren’t correct, please first let me know so I can correct them. Then
simply google to find the current working URLs. For Sci-Hub, the page usually lists the
latest up-and-running URLs.
Use of Graphing Calculators
You are required to know how to use a graphing calculator.16
This textbook will give only a few examples involving graphing calculators.
There is no better way of learning to use it than to play around with it yourself. By the
time you sit down for your A-Level exams, you should have had plenty of practice with it.
You can also use any of the seven calculators in the following list (last updated by SEAB
on Dec 1st, 2017 — PDF).17


The following graphing calculator models are approved for use ONLY in subjects examined at H1, H2 and
H3 Levels of the A-Level curriculum.
Note: All graphing calculators must be reset prior to any examination.

S/N   Calculator Brand         Calculator Model             Approved Period

1     CASIO                    FX-9860G Slim                2008 – 2018
2                              FX-9860GIIs                  2014 – 2022
3     TEXAS INSTRMENTS [sic]   TI-83 PLUS                   2010 – 2018
5                              TI-84 PLUS Silver Edition    2006 – 2020
6                              TI-84 PLUS Pocket SE         2012 – 2021
7                              TI-84 Plus C Silver Edition  2015 – 2018
8                              TI-84 Plus CE                2017 – 2020

This textbook will stick with the TI-84 PLUS Silver Edition,18 which I’ll simply call the
TI84. (My understanding is that most students use a TI calculator and that the five
approved TI calculators are pretty similar.)
I’ll always start each example with the calculator freshly reset.19

Pretty bizarre that in this age of the smartphone, they want you to learn how to use these clunky and
now-useless devices from the ’80s and ’90s. It is the equivalent of learning to program a VCR. (The TI-81
was designed in the 1980s and first sold in 1990. The TI-84 PLUS was first sold in 2004 and represents
only a modest improvement over the original TI-81.)
IMHO it’d be much better to teach you some simple programming or Excel (or whatever spreadsheet
program). “B-b-but ... how would such learning be tested in an exam format?” Ay, there’s the rub. In
the Singapore education system, anything that cannot be “examified” is not worth learning.
Of course, there are some folks over in Texas who don’t mind. Nor do those lucky few MOE teachers
and administrators who get to go on all-expenses-paid “business” trips to Texas to “learn” more about
the calculators.
They forgot a U in INSTRMENTS.
Operating System, Version 2.55MP — available at the TI website.
I’ve never actually bought or owned a graphing calculator. All my screenshots here of the TI84 are
actually from the emulator Wabbitemu. (Yup, you can download Atari or DOS emulators to play
decades-old games; and you can likewise download an emulator for this decades-old piece of junk.)
Exam Tips for Towkays

Use your graphing calculator as much as possible. You are always allowed to use your
graphing calculator.
Some exam questions will explicitly instruct you not to use your calculator, but this just
means that your written answer should not include any hint that you used your calculator.
(Nonetheless, you can still cheat and use your calculator for guidance and to check your
answers.)
Instructions to not use your calculator include:
• “Do not use a calculator in answering this question.”
• “Without using your calculator ...”
• “Use a non-calculator method ...”
• “Find the exact value of ...”

• “Express your answer in terms of √3 or π.”

In Spider-Man: Homecoming (2017), Spider-Man uses a TI calculator to bust out of the
“most secure facility on the Eastern Seaboard” — the Damage Control Deep Storage Vault
(also known as the Department of Damage Control Vault).20

This scene takes place at about 57:30–58:00 of the movie (YouTube clip).
They removed the brand name and also the model name, so I can’t quite tell what model it is. One
website claims it’s a TI-86, while another claims it’s an exact match to the TI-83 PLUS.
Zooming in, it looks like Spider-Man is doing some sort of combinatorics on his notepad. I can’t tell
what exactly’s on the calculator screen. Many of the buttons on the calculator seem to be permanently
depressed, which is weird.
Slowing down, we see that he first punches in 5 8 ( , then a moment later 5 7 (or maybe 2 6 ) —
so it’s probably all just rubbish that he’s punching in. The latter keystrokes do get him out of the place
though. (That’s all for my brilliant movie analysis of the week.)

When you’re very structured almost like a religion ... Uniforms, uniforms,
uniforms ... everybody is the same. Look at structured societies like Singapore
where bad behaviour isn’t tolerated. You are extremely punished. Where
are the creative people? Where are the great artists? Where are the great
musicians? Where are the great singers? Where are the great writers? Where
are the athletes? All the creative elements seem to disappear.

— Steve Wozniak (2011 BBC interview).21

The most dangerous man, to any government, is the man who is able to think
things out for himself.

— H.L. Mencken (1919).

Divide students into two extremes:

• Type 1 students are happy to learn absolutely nothing, so long as they get an A.
• Type 2 would rather learn a lot, even if this means getting a C.
The good Singaporean is taught that pragmatism is the highest virtue (obedience is second).
She is thus also trained to be a Type 1 student (and indeed a Type 1 human being).
If you’re a Type 1 student, then this textbook may not be the best use of your time, though
you may still find the exercises and the TYS questions useful.22 (Below I do nonetheless
give Two Reasons Why Even Type 1 Pragmatists Should Read This Textbook.)
Of course, any careful student of this textbook will be rewarded with an A. But getting an
A is not the goal of this textbook. Instead, the goal of this textbook is to:

Impart genuine understanding.

Let us contrast this with the goal of the Singapore education system, which is to:

Create a docile labour force that generates GDP growth.

At first glance, these two goals do not seem to be in conflict. After all, we’d expect a
student who genuinely understands her H2 Maths to also contribute to GDP growth.
The conflict only arises with the keyword docile. An education system that imparts genuine
understanding tends also to encourage independent thinking and discourage docility.
On the one hand, to maximise GDP growth, Gahmen wants “creative” innovators. On the
other, it doesn’t want too much of a challenge to the status quo (especially politically). Its
goal is thus to turn docile test-taking drones into docile creative innovators.
Unfortunately, the audio at this BBC webpage seems to be broken.
The following resources are provided with the efficient Type 1 student in mind: (a) The H2 Mathematics
CheatSheet, all the formulae you’ll ever need on two sides of an A4 sheet of paper. (b) The H1
Mathematics Textbook, which is written more simply, and which covers a subset of the H2 syllabus.
(c) The H2 Maths Exercise Book (coming “soon”), which teaches you how to mindlessly apply formulae
and give the “correct” answer to every exam question. (d) My totally awesome tuition classes!
Unfortunately, creative innovation is completely incompatible with docility. A populace
trained to avoid the slightest transgression is not one that is capable of producing anything
new. The result is lip service to buzzwords like “creativity” and half-hearted education
reform. Once a decade or so, some technocrat comes up with an inane four-letter campaign
(FLC) like TSLN 1997 and TLLM 2005 that brings us precisely nowhere.23
To this ambivalence and pussy-footing, add (i) the deep-rooted Confucian love of exams
and rote-learning; and (ii) the elitist British educational system we inherited.24 Altogether,
despite superficial appearances to the contrary, we’ve had very little change over the years.
Administrators, teachers, and students alike remain completely fixated with exams.

There is a place for testing. The problem is that as currently constructed, our exams
and education system do not test for genuine understanding. Instead, they merely test if
you’ve mastered the art of mimicry and if you’re able to follow instructions, recipes, and
algorithms. In other words, they test if you are an obedient monkey capable of performing
tricks you’ve practised over and over and over again.
I have a study in mind: Gather all the students who got As for their A-Level maths exams,
5, 10, 15, 20 years after the exam. Get them to do that exact same A-Level exam they
took years ago. Ask them if they remember anything from their JC maths education or
if their JC maths education had any value whatsoever. I suspect that most will get close
to 0, remember absolutely nothing, and consider their JC maths education to have been
completely worthless. If these suspicions are correct, then JC maths education has no value,
except and only as a selection device.
(... Which, of course, is of paramount importance in elitist, social-Darwinist Singapore.
Grades help differentiate the President’s Scholar from the “mere” PSC scholar and the
lowly McDonald’s employee from the dalit cleaner. Grades help differentiate those who should
reproduce from those who shouldn’t.)25

In 1997, Thinking Schools, Learning Nation (or, as was joked, Sinking Schools, Burning Nation). In
2005, Teach Less, Learn More. These campaigns may now be found, alongside such gems as Goal 2010,
in the ash heap of history. The current FLC is probably ESGS (Every School a Good School).
From an April 12, 2013 Straits Times interview with Tharman:
ST: DPM, why do you think we are the way we are?
A: Well, we inherited the British system, which is quite academically biased and in Britain, of course,
quite an elitist system. We also inherited a Chinese education culture, which is also quite academically
oriented, a strong emphasis on values and character education but quite academically oriented and quite
test-oriented. And I think the combination of a British and East Asian educational ethos has created a
particular form of meritocracy which achieved a lot in 40 years. But as we go forward and we think about
the type of inclusive society we want, it’s not just about wages, which we are working on, it’s about how
you view yourself and others at the workplace, wherever you live, how we view fellow Singaporeans, do
you view them as equals, do you do things together. That has to start from young and it has to continue
through life.
One will recall the infamous Graduate Mothers’ Priority Scheme and Small Family Incentive Scheme.
Those two Orwellian/Nazi schemes have since been scrapped.
However, the Social Development Unit (SDU), now rebranded as the Social Development Network (SDN),
lives on. The SDU was established at around the same time (1984) as the aforementioned schemes “to
encourage social interaction and marriage among graduate singles”. I do not know if this still remains
the rebranded SDN’s explicit, written goal, but it would surprise me if this did not still remain at least
its implicit and unwritten goal.
In A Mathematician’s Lament (2002, 2009), Paul Lockhart describes (pre-tertiary) maths
education in the US as being “stupid and boring”, “formulaic”, and “mindless” “pseudo-
maths”.26 The same may be said of maths education in Singapore. But at least the typical
US student has the consolation that only a very small portion of her life will have been
squandered on such “mindless” “pseudo-maths”.
The same cannot be said for the typical Singaporean student. By the time she turns 18, she
will have — just for the single subject of maths alone — clocked many thousands of hours
attending school and tuition classes; doing homework, practice exam questions, assessment
books, and Ten Year Series (TYS); taking common tests, promos, prelims, mid-year exams,
and end-of-year exams; ad infinitum, ad nauseam.
The 8 Confucian East Asian countries27 perform splendidly on international tests like the
triennial Programme for International Student Assessment (PISA). In the 2015 PISA, these
8 countries all ranked in the top 11 (out of 70 countries/regions, see p. xliii). Singapore
did especially well and topped all three categories (science, reading, mathematics).
“What,” the Western educator enquires, “is the magic here?”28 But here there is no magic.
To me, the explanation for why East Asian students do so well on these tests is obvious:

They’re forced to work their butts off.

While the American teen is “wasting” her time on typical, “useless” teenager-ly pursuits,
the Singaporean teen is seated obediently in front of his desk, doing yet another soul-
crushing TYS question. And once in a while, a 10-year-old commits suicide due to poor
exam results.29 Kids the world over commit suicide for a variety of reasons, but only in
East Asia do they regularly do so due to poor exam results.
In South Korea, legislators have even passed (ill-enforced) laws barring hagwons (private
cram schools) from operating past 10 p.m. To the American teen, it is utterly mind-
blowing that (a) anyone would be in school past 10 p.m.; and (b) this practice grew to be
so common that legislators saw fit to pass laws against it. But in East Asia, this isn’t at
all strange.
To me, the fact that East Asian students bust their butts is obviously the single most
important explanation for why they do so well on international tests. Yet strangely, in the
countless papers and books that I’ve come across seeking to explain why some countries do
better than others, this explanation is rarely ever considered.

By the way, Lockhart explains why this is so and what maths really is far more eloquently and clearly than
I ever could. I strongly recommend that every student and instructor of maths read A Mathematician’s
Lament. There are two versions — a 2002 25-page PDF that circulated online and a 2009 book version.
Japan, Korea, Viet Nam, and the five Chinese-majority countries (China, Hong Kong, Macao, Singapore,
and Taiwan).
In the US, “Singapore Math” has acquired something of a mythical status. As with weight loss, Americans
are constantly on the lookout for some magic, painless solution to their mediocre education systems.
For example, in 2001, 10-year-old Lysher Loh jumped to her death from her fifth floor apartment (source).
She “had been disappointed with her mid-year examination results and had found the workload heavy.”
She had also “told her maid Lorna Flores two weeks before her death she did not want to be reincarnated
as a human being because she never wanted to have to do homework again.” In 2016, another 11-year-
old “killed himself over his exam results by jumping from his bedroom window in the 17th-storey flat”
(source). For more of such stories, see this page.
[Figure 2: Average test score (out of 25) by group and treatment: U.S. vs. Shanghai. Bar chart comparing Control and Treatment groups by school and track (Low, Regular, Honors).]

Notes: Average score for students who received no incentives (Control) and for
students who received incentives (Treatment) by school and track.

A related explanation is that East Asians are trained from young to take every test seriously.
In contrast, American kids couldn’t care less about some inconsequential PISA test.30 (Indeed,
one might argue that if made to do an hours-long inconsequential PISA test, the truly
intelligent kid would simply click through it as quickly as possible.)
This explanation has recently received some academic attention. In “Measuring Success in
Education: The Role of Effort on the Test Itself” (2017), experimenters gave a PISA-like
test to students in several schools in the US and Shanghai, China. At each location, the
treatment group was given a financial incentive (i.e. a bribe) to do well, while the control
group was given nothing.
In the US, the treatment group students performed significantly better than the control
group. In contrast, in Shanghai, the treatment group students performed no better.
This suggests that robotic Shanghai students are conditioned to always try their hardest,
whether or not there’s any financial incentive. In contrast, US students may not be trying
particularly hard when the stakes are low or zero (as is the case with PISA), but will try
a little harder when there’s some financial incentive.
Note also that “students learn about the incentives just before taking the test, so any
impact on performance can only operate through increased effort on the test itself rather
than through, for example, better preparation or more studying” (p. 4). The remaining US-
Shanghai performance gap could thus very well be eliminated if the former were incentivised
(through carrot and stick) to prepare and work half as hard as the latter.
East Asians care as much about exams as Americans do professional sports, while Americans care as
much about exams as East Asians do professional sports.
A similar but more recent study is “Taking PISA Seriously: How Accurate Are Low Stakes
Exams?” (2018). One of its conclusions is that “a country can rise up to 15 places in
rankings if its students took the exam seriously.”
Altogether then, there is very little that Western educators can learn from East Asia. The
only lesson is this: If you want your students to do well on tests like the PISA, then:
• Train them from young to take every test seriously; and
• Force them to work their butts off (thereby destroying their childhood and adolescence).
Singapore produces world champion test-takers and gold medallists at the various
International Olympiads. But as currently constituted, the Singapore education system will
never produce a Fields Medallist or a Nobel Laureate. And as Steve Wozniak suggests,
Singapore will never produce a world-beating innovator like an Apple or a
Google. The reason is that unlike taking tests (be it your J1 Promos or your IMO), such
endeavours require more than mere monkey-see-monkey-do mimicry.31
The goal of this textbook is to impart genuine understanding. I suspect that the sincere
pursuit of this goal will do more to promote GDP growth than any of Gahmen’s current
educational policies.
But quite aside from any such instrumental value, I believe that a genuine understanding
of maths (and indeed any other material) is intrinsically valuable. (GDP growth is
lovely, but despite what Gahmen would have you believe, it is not all that makes life
worthwhile.) And heck, learning can even be that three-letter word banished from the
Singapore education system (and apparently also playgrounds and void decks): F-U-N.

These were actual signs posted in Singapore. They became viral and were removed in
June 2013 (New Paper story) and in March 2016 (Straits Times story), respectively.

Here are two excuses I’ve come across for why Singapore has produced no Nobel Laureates: (a) Singapore’s
population is too small; and (b) Singapore was until fairly recently very poor.
But consider Denmark (population 5.7M), Finland (5.5M), and Norway (5.2M), whose populations are
similar to or even smaller than Singapore’s (5.5M) and who were producing Nobel Laureates when they
were far poorer than Singapore is today. We could also point to tiny Saint Lucia (180,000) which has
produced two Nobel Laureates.
I am thus accepting bets for this proposition: “By 2050, no born-and-bred Singaporean will have won a
Fields medal or a Nobel Prize (Peace excluded).” (We’ll need to work out what exactly “born-and-bred”
means, but that can be ironed out.)
Now, what do I mean by “imparting genuine understanding”?
Personal anecdote: As a JC student, I remember being deeply mystified by why the scalar
product had such a simple algebraic definition and yet could at the same time also tell us
about the cosine of the angle between the two vectors. I never figured it out. But this
didn’t matter, because this was simply “yet another formula” that we learnt for the sole
purpose of answering exam questions.32
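One standard way to see the connection is to compute the squared length of u − v in two ways. This is only a sketch for two-dimensional vectors; the notation u = (u₁, u₂), v = (v₁, v₂), and θ for the angle between them is assumed here for illustration.

```latex
\begin{align*}
|\mathbf{u}-\mathbf{v}|^2 &= (u_1-v_1)^2 + (u_2-v_2)^2
  = |\mathbf{u}|^2 + |\mathbf{v}|^2 - 2(u_1v_1 + u_2v_2)
  && \text{(algebra)} \\
|\mathbf{u}-\mathbf{v}|^2 &= |\mathbf{u}|^2 + |\mathbf{v}|^2
  - 2|\mathbf{u}||\mathbf{v}|\cos\theta
  && \text{(cosine rule)}
\end{align*}
```

Comparing the two right-hand sides gives u₁v₁ + u₂v₂ = |u||v| cos θ, which is why the simple algebraic definition also carries information about the angle.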
I remember being confused about the difference between the sample mean, the mean of the
sample mean, the variance of the sample mean, and the sample variance. But this confusion
didn’t matter, because once again, all we needed to do to get an A was to mindlessly apply
formulae and algorithms. Monkey see, monkey do.
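The four quantities above can be told apart with a small simulation. This is a minimal sketch under illustrative assumptions: the data are rolls of a fair six-sided die, each sample has 10 rolls, and the function name simulate and its parameters are hypothetical.

```python
import random
import statistics


def simulate(num_samples=2000, sample_size=10, seed=0):
    """Draw many samples of die rolls and summarise them."""
    rng = random.Random(seed)
    sample_means = []       # one sample mean per sample
    sample_variances = []   # one (unbiased) sample variance per sample
    for _ in range(num_samples):
        sample = [rng.randint(1, 6) for _ in range(sample_size)]
        sample_means.append(statistics.mean(sample))
        sample_variances.append(statistics.variance(sample))
    return {
        # Should be close to the population mean, 3.5.
        "mean_of_sample_means": statistics.mean(sample_means),
        # Should be close to (population variance) / sample_size.
        "variance_of_sample_means": statistics.pvariance(sample_means),
        # Should be close to the population variance, 35/12.
        "mean_of_sample_variances": statistics.mean(sample_variances),
    }


result = simulate()
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

Each sample has its own sample mean and sample variance. The sample mean is itself a random quantity, so it has its own mean and its own (much smaller) variance.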
This textbook is thus partly in response to my unhappy and unsatisfactory experience as a
cog in the Singapore educational system. In other words, this is the textbook I wish I had
had when I was a JC student.
Almost all results are proven. I try to supply the intuition for each result in the simplest
possible terms. Many proofs are relegated to the appendices, but where a proof is especially
simple and beautiful, I encourage the student to savour it by leaving it in the main text.
In the rare instances where proofs are entirely omitted from this book — usually because
they are too advanced — I make sure to clearly state so, lest the student wonder whether
the result is supposed to be obvious (as I often did when I was a JC student).
This textbook follows the Singapore A-Level syllabus.33 And so, a good deal of mindless
formulae is unavoidable. Even so, I try in this textbook to give the student a tiny glimpse
of what maths really is — “the art of explanation”.34 I try to plant a thoughtcrime in the
student’s mind: Maths is not merely another pain to be endured, but can at times be a joy.
And so for example, this textbook explains:
• A bit of intuition behind differentiation, integration, and the Fundamental Theorems of
Calculus. (To get an A, no understanding of these is necessary. Instead, one need merely
know how to “do” differentiation and integration problems.)
• Why the Central Limit Theorem is so amazing. (To get an A, one need merely treat
the CLT as yet another mysterious mathematical trick that helps solve exam questions.
It isn’t necessary to appreciate why it is so amazing, where it might possibly come from, or
what relevance it has to everyday life.)
• A bit of intuition behind the Maclaurin series. (To get an A, it suffices to know how to
mindlessly apply this strange formula that falls out from the sky.)
• Why it is terribly wrong to believe that “a high correlation coefficient means a good
model”. (Yet this is exactly what your A-Level examiners seem to believe. See Ch. 108.9.)

I remember complaining about this to a classmate and he responded, “But that’s how we’ve always been
taught maths what. It’s just a bunch of formulae.” He was of course right.
Today, the intellectually-curious student can easily find the answer on the internet. But at that time
(2001–02), the internet was not quite as developed and so one could not easily find answers online.
Another of my quixotic desires is to change that too. Or as Lockhart says, “[T]hrow the stupid curriculum
and textbooks out the window!”
Lockhart, A Mathematician’s Lament (2002, 2009).
Two Reasons Why Even Type 1 Pragmatists Should Read This Textbook

(1) The A-Level exams now include more curveball or out-of-syllabus questions.
Previously, the A-Level exam questions were always perfectly predictable. If you had no
problem doing past year exam questions, you’d have no problem getting an A.
But starting in 2017 (coinciding with the new and supposedly reduced syllabus), curveball
questions now carry a weight of perhaps 10–20%. For example, in 2017, out of absolutely
nowhere, students were suddenly asked to use something called D’Alembert’s ratio test and
to explain whether a series converges (see Exercise 483).
I can find no official, publicly-available statement announcing (much less explaining) this
change. I have heard only that JC maths teachers were informed by MOE of this change
ahead of time. My guess is that this is the MOE’s highly-creative method of creating
creative students.
In my humble opinion, this change is complete cow manure. It serves only to add further
pressure to the already-miserable Singapore student.
But I will confess that selfishly, I welcome this change because it increases the value of this
textbook. The student who carefully studies this textbook will be rewarded with a true
and deep understanding of all the H2 Maths material and hence be fully prepared to bat
away any curveball.
Take for example Exercise 512 (9758 N2017/I/6). This unfamiliar problem will likely have
come as a shock for many a Singaporean monkey drilled a thousand times over to “do”
computational problems involving 3D geometry. In contrast, any student who bothered to
read this textbook’s Part III even once will have enjoyed solving this problem.

(2) If you’re intending to do more maths in the future (e.g. physics, economics,
engineering), then this textbook will actually save you time in the long run.
Merely doing well in A-Level H2 Maths may give you the false illusion that you’ve actually
learnt or understood the material. Down the road, this may cost you more time.
Another personal anecdote: When I began my undergraduate studies (in a small US college),
I was still the typical kiasu Singaporean monkey trained to believe that life was a
competitive, Social Darwinist race. And so I skipped a whole bunch of first- and second-
year maths classes (Calculus I, Calculus II, Statistics, and Linear Algebra), thinking I had
already covered all the material back in JC.
On paper, I may indeed have covered all this material. But in practice, all I’d learnt in JC
was monkey see, monkey do. I’d learnt enough to do well on the exams, but not enough to
actually understand or use any of the material.
It was only many years later, with the benefit of hindsight, that I began to see how much of
a mistake I had made. Skipping those classes saved me time and put me “ahead of the race”
in the short run. But in the long run, this actually cost me dearly. I would actually have
saved more time by not skipping those seemingly-elementary first- and second-year classes!
(And of course, I would’ve saved even more time by skipping the Singapore education
system, but alas, that wasn’t an option.)
This textbook thus offers the sort of A-Level maths education I wished I had received.35
You’ll be spending two years on H2 Maths anyway. And so, instead of wasting these two
years learning mindless algorithms you’ll forget a month after the A-Level exams, why not
spend this time actually learning and understanding the material that you can go on to
actually use?
And of course, in my completely humble and unbiased opinion, the best way to learn and
understand H2 Maths is by studying this textbook.

I conclude this Preface/Rant by expressing my hope that even if you the instructor or
student do not use this textbook as your primary instructional or learning material, you
will still find it perfectly useful as an authoritative and reliable reference.

P.S. This textbook is far from perfect. To steal a certain neighbourhood school’s motto,
the best is yet to be. I hope to keep improving this textbook, but I can only do so with your
help. So if you have any feedback or spot any errors, please feel free to email
me. (As you can tell, I am pretty merciless about criticising others. So please don’t be shy
about pointing out the many foolish mistakes that are surely still lurking in this textbook.)

Please note that there is no knock or diss here on my JC Maths teachers. My JC Maths teachers and
indeed most of my Singapore teachers were generally pretty good and did the best they could within
the stultifying confines of the Singapore educational system. My critique here applies to the Singapore
educational system. As the hip-hop cliché goes, I don’t hate the player(s); I hate the game.
PISA 2015 Mean Scores
Science Reading Maths Science Reading Maths
Singapore 556 535 564 Lithuania 475 472 478
Japan 538 516 532 Croatia 475 487 464
Estonia 534 519 520 CABA (Argen.) 475 475 456
Taiwan 532 497 542 Iceland 473 482 488
Finland 531 526 511 Israel 467 479 470
Macao 529 509 544 Malta 465 447 479
Canada 528 527 516 Slovak Rep. 461 453 475
Viet Nam 525 487 495 Greece 455 467 454
Hong Kong 523 527 548 Chile 447 459 423
BSJG (China) 518 494 531 Bulgaria 446 432 441
Korea 516 517 524 UAE 437 434 427
NZ 513 509 495 Uruguay 435 437 418
Slovenia 513 505 510 Romania 435 434 444
Australia 510 503 494 Cyprus 433 443 437
UK 509 498 492 Moldova 428 416 420
Germany 509 509 506 Albania 427 405 413
Netherlands 509 503 512 Turkey 425 428 420
Switzerland 506 492 521 Trin. & Tobago 425 427 417
Ireland 503 521 504 Thailand 421 409 415
Belgium 502 499 507 Costa Rica 420 427 400
Denmark 502 500 511 Qatar 418 402 402
Poland 501 506 504 Colombia 416 425 390
Portugal 501 498 492 Mexico 416 423 408
Norway 498 513 502 Montenegro 411 427 418
US 496 497 470 Georgia 411 401 404
Austria 495 485 497 Jordan 409 408 380
France 495 499 493 Indonesia 403 397 386
Sweden 493 500 494 Brazil 401 407 377
Czechia 493 487 492 Peru 397 398 387
Spain 493 496 486 Lebanon 386 347 396
Latvia 490 488 482 Tunisia 386 361 367
Russia 487 495 494 FYROM 384 352 371
Luxembourg 483 481 486 Kosovo 378 347 362
Italy 481 485 490 Algeria 376 350 360
Hungary 477 470 477 Dominican Rep. 332 358 328
Notes: B-S-J-G = Beijing-Shanghai-Jiangsu-Guangdong; CABA = Ciudad Autónoma de Buenos Aires;
FYROM = Former Yugoslav Republic of Macedonia. Source: “PISA 2015 Results in Focus” (PDF), p. 5.


Part 0.
A Few Basics

The glory of [maths] is its complete irrelevance to our lives. That’s why it’s
so fun!

— Paul Lockhart (2002, 2009).

I have never done anything ‘useful’. No discovery of mine has made, or is
likely to make, directly or indirectly, for good or ill, the least difference to the
amenity of the world.

— G.H. Hardy (1940).

1. Just To Be Clear
In this textbook, we’ll stick to these standard conventions:
• Greater than means “strictly greater than” (>). So I won’t bother saying “strictly”,
unless it’s something I want to emphasise.
• Less than means “strictly less than” (<).
• If I want to say greater than or equal to (≥) or smaller than or equal to (≤), I’ll
say exactly that.
• Positive means “greater than zero” (> 0).
• Negative means “less than zero” (< 0).
• Non-negative means “greater than or equal to zero” (≥ 0).
• Non-positive means “less than or equal to zero” (≤ 0).
• Zero is neither positive nor negative. Instead, it is both non-negative and non-positive.36
• Names of some punctuation marks:

Left Right A pair of

Parenthesis ( Parenthesis ) Parentheses ()
Bracket [ Bracket ] Brackets []
Brace { Brace } Braces {}

Remark 1. Some writers refer to (), [], and {} as round, square, and curly brackets
— we’ll avoid these terms. Instead, as stated above, we’ll strictly refer to (), [], and {}
as parentheses, brackets, and braces.37

• The symbols ∵ and ∴ stand for because and therefore, respectively.

• The punctuation mark “ means ditto or “the same as above/before”. Example:

Tokyo is the capital of Japan.

Beijing “ China.
Lima “ Peru.

• The multiplication symbol ⋅ is (sometimes) preferred to × because there is (sometimes)
the slight risk of confusing × with the letter x.

Note though that in France, positif and négatif mean ≥ 0 and ≤ 0, so that 0 is both positif and négatif.
On this, see e.g. Wiktionary.
Note that there is actually another pair of brackets ⟨⟩ called angle brackets. If we’re using angle
brackets, then we’ll want to be careful to distinguish them from [] by referring to the latter as square
brackets. Happily, we won’t be using angle brackets at all in this textbook. And so, we’ll simply call
[] brackets.
2. PSLE Review: Division

Example 1. Consider 9 ÷ 4. We call 9 the dividend and 4 the divisor. We have:

$$\frac{9}{4} = 2\frac{1}{4}.$$
We call q = 2 the quotient — q is the largest integer such that 4q ≤ 9.
We call r = 1 the remainder — r is defined so that 9 = 4q + r or equivalently:

r = 9 − 4q = 9 − 4 × 2 = 1.

Example 2. Consider 17 ÷ 3. We call 17 the dividend and 3 the divisor. We have:

$$\frac{17}{3} = 5\frac{2}{3}.$$
We call q = 5 the quotient — q is the largest integer such that 3q ≤ 17.
We call r = 2 the remainder — r is defined so that 17 = 3q + r or equivalently:

r = 17 − 3q = 17 − 3 × 5 = 2.

The above two examples used the Euclidean division algorithm:

Definition 1. (The Euclidean Division Algorithm.) Let x and d be positive integers.38
Let q be the largest integer that satisfies dq ≤ x. Define r so that:

x = dq + r.

We call x the dividend, d the divisor, q the quotient, and r the remainder.

Note that thus defined, the quotient and remainder are unique.39

Definition 2. Let x and d be positive integers. If x ÷ d leaves no remainder, then we say
that d is a factor of x or that d divides x.

Example 3. Consider 18 ÷ 6. We call 18 the dividend and 6 the divisor.

Following the Euclidean Division Algorithm:
• Let q be the largest integer such that 6q ≤ 18. Then q = 3 and we call q the quotient.
• Define r so that 18 = 6q + r. Then r = 0 and we call r the remainder. Since r = 0, we
can actually more simply say that there is no remainder.
Altogether, we have: $\frac{18}{6} = 3$.
Since 18 ÷ 6 leaves no remainder (r = 0), we say that 6 is a factor of or divides 18.
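
For readers who like to check such things by computer, here is a minimal Python sketch of Definitions 1 and 2. The function names euclidean_division and is_factor are made up for this illustration, and the sketch assumes (as the definitions do) that x and d are positive integers. It finds q by brute force, which is slow but mirrors the wording "the largest integer that satisfies dq ≤ x".

```python
def euclidean_division(x, d):
    """Return (q, r) for positive integers x and d, following Definition 1."""
    if x <= 0 or d <= 0:
        raise ValueError("this sketch only handles positive integers")
    q = 0
    # Step q upwards until the next value would violate d*q <= x.
    while d * (q + 1) <= x:
        q += 1
    r = x - d * q  # r is defined so that x = d*q + r
    return q, r


def is_factor(d, x):
    """Definition 2: d is a factor of x if x ÷ d leaves no remainder."""
    return euclidean_division(x, d)[1] == 0


print(euclidean_division(9, 4))   # (2, 1)
print(euclidean_division(17, 3))  # (5, 2)
print(euclidean_division(18, 6))  # (3, 0)
print(is_factor(6, 18))           # True
```

The three printed pairs agree with Examples 1 to 3. In everyday Python, the built-in divmod(x, d) gives the same pair for positive integers.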
To keep things simple, I narrow this definition to the case where x and d are positive integers.
This is proven in Theorem 33 in the Appendices.
2.1. Long Division
Remember long division from primary school?

Example 4. 87 ÷ 7, by long division:

        10s  1s
         1    2
7  )     8    7      Explanation
         7    0      10 × 7 = 70
         1    7      87 − 70 = 17
         1    4      2 × 7 = 14
              3      17 − 14 = 3

Thus: 87 ÷ 7 = $12 + \frac{3}{7} = 12\frac{3}{7}$.

We call 87 the dividend, 7 the divisor, 12 the quotient, and 3 the remainder.

Example 5. 912 ÷ 17, by long division:

         100s  10s  1s
                5    3
17  )     9     1    2      Explanation
          8     5    0      50 × 17 = 850
                6    2      912 − 850 = 62
                5    1      3 × 17 = 51
                1    1      62 − 51 = 11

Thus: 912 ÷ 17 = $53 + \frac{11}{17} = 53\frac{11}{17}$.

We call 912 the dividend, 17 the divisor, 53 the quotient, and 11 the remainder.
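
The digit-by-digit procedure in the two examples above can be sketched in Python. This is only an illustration: the function name long_division is made up here, and the sketch assumes a positive integer dividend and divisor. It processes the dividend one digit at a time (rather than subtracting 70 or 850 in one go), which amounts to the same arithmetic.

```python
def long_division(dividend, divisor):
    """Digit-by-digit long division; returns (quotient, remainder)."""
    quotient_digits = []
    remainder = 0
    for digit in str(dividend):
        # "Bring down" the next digit of the dividend.
        remainder = remainder * 10 + int(digit)
        # How many times does the divisor fit into what we have so far?
        q_digit = remainder // divisor
        quotient_digits.append(str(q_digit))
        remainder -= q_digit * divisor
    return int("".join(quotient_digits)), remainder


print(long_division(87, 7))    # (12, 3)
print(long_division(912, 17))  # (53, 11)
```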

Exercise 1. Do the long division for 8 057 ÷ 39. Identify the dividend, divisor, quotient,
and remainder. (Answer on p. 1387.)

2.2. Dividing By Zero
Dividing by zero is a common mistake. Students have little trouble avoiding this mistake
if the divisor is obviously a big fat zero. Instead, students usually make this mistake when
the divisor is an unknown constant or variable that might be zero.

Example 6. Solve x (x − 1) = (2x − 2) (x − 1). (That is, find the values of x for which
the equation is true. We call these values of x the solutions to the equation.)
Here’s a wrong solution: “Divide both sides by x − 1 to get x = 2x − 2. So x = 2.” ✗
The correct solution considers two possible cases:
Case 1. If x − 1 = 0, then the equation is true. So, x = 1 is a possible solution.
Case 2. If x − 1 ≠ 0, then we can divide both sides by x − 1 to get x = 2x − 2.
So, x = 2 is another possible solution.
Conclusion. The two possible solutions are x = 1 and x = 2.
Moral of the story. Dividing by zero may cause us to lose perfectly valid solutions. So,
always make sure the divisor is non-zero. If you’re not sure whether it equals zero, then
break up your analysis into two cases, as was done in the above example:
Case 1. The quantity equals zero (and see what happens in this case).
Case 2. The quantity is non-zero (in which case you can go ahead and divide).
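
Example 6 can be sanity-checked by direct substitution. The Python sketch below is not a proof: it merely tries the integers from −10 to 10 (an arbitrary range chosen for this illustration) in both the original equation and the equation obtained after carelessly dividing by x − 1.

```python
def lhs(x):
    return x * (x - 1)


def rhs(x):
    return (2 * x - 2) * (x - 1)


candidates = range(-10, 11)

# Solutions of the original equation x(x - 1) = (2x - 2)(x - 1).
original = [x for x in candidates if lhs(x) == rhs(x)]

# Solutions of x = 2x - 2, the equation left after dividing by x - 1.
after_dividing = [x for x in candidates if x == 2 * x - 2]

print(original)        # [1, 2]
print(after_dividing)  # [2]
```

The solution x = 1 shows up in the first list but not the second, which is exactly the solution lost by dividing by zero.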

Exercise 2. What’s wrong with the following seven-step “proof” that 1 = 0?

1. Let x and y be positive numbers such that x = y.
2. Square both sides: x² = y².
3. Rearrange: x² − y² = 0.
4. Factorise: (x − y) (x + y) = 0.
5. Divide both sides by x − y to get x + y = 0.
6. Since x = y, plug y = x into the above equation to get 2x = 0.
7. Divide both sides by 2x to get 1 = 0. (Answer on p. 1387.)

Exercise 3. Nicholas Saunderson (1682–1739) was the Lucasian Professor of
Mathematics (at the University of Cambridge), a fancy title that’s been held by Isaac
Newton (1642–1727) and Stephen Hawking (1942–2018). Saunderson wrote a textbook
called The Elements of Algebra (1740). On p. 103, he gives the following example:

Example 13. $\frac{42x}{x-2} = \frac{35x}{x-3}$: divide both numerators by x, and you will have
$\frac{42}{x-2} = \frac{35}{x-3}$; therefore $42 = \frac{35x-70}{x-3}$; therefore 42x − 126 =
35x − 70; therefore 42x − 35x − 126 = −70, that is, 7x − 126
= −70; therefore 7x = 126 − 70, that is, 7x = 56; and x = 8.

Can you identify any error(s) made in the above example? (Answer on p. 1387.)

By the way, let’s take this opportunity to clear up a related and popular misconception.
You’ve probably heard someone saying that 1 ÷ 0 = ∞. This is wrong:
$$\frac{1}{0} \neq \infty.$$
Instead, any non-zero number divided by zero is undefined.40 Undefined is the
mathematician’s way of saying:

You haven’t told me what you’re talking about.

So what you’re saying is meaningless.

And so, the following five expressions are all undefined:

$$\frac{1}{0}, \quad \frac{50}{0}, \quad \frac{-17.1}{0}, \quad \frac{\infty}{0}, \quad \frac{-\infty}{0}.$$
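
As a small aside, Python happens to treat these five expressions in the same spirit: it refuses to evaluate them and raises a ZeroDivisionError. The sketch below only illustrates how one programming language behaves; it is not a mathematical argument. The helper name division_by_zero_is_rejected is made up for this illustration.

```python
import math


def division_by_zero_is_rejected(numerator):
    """Return True if Python refuses to compute numerator / 0."""
    try:
        numerator / 0
    except ZeroDivisionError:
        return True
    return False


for numerator in [1, 50, -17.1, math.inf, -math.inf]:
    print(numerator, division_by_zero_is_rejected(numerator))
```

Each of the five lines printed ends with True.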

The special case is 0 ÷ 0, which is indeterminate. This means that 0 ÷ 0 is sometimes undefined, but
can sometimes be defined under certain circumstances.
3. Logic
Big surprise — you’ve secretly been using logic your whole life.
Logic isn’t explicitly on your H2 Maths syllabus.41 But spending an hour or two on logic
pays huge dividends — you’ll learn to reason better, both in maths and in everyday life.
This chapter is thus a brief and gentle introduction to logic. Here we merely present some
of the most basic but also some of the most useful results from logic. (If you truly can’t be
bothered, please at least check out the one-page summary of this chapter on p. 31.)
First, try this appetiser.42

Example 7. The Wason Four-Card Puzzle. In a special deck of cards, each card
has a letter on one side and a number on the other. You are shown these four cards:

A Z 1 8
Betsy the Bimbotic Blonde now comes along and makes the following claim:

“If a card has a vowel on one side, then it has an even number on the other side.”

You suspect that Betsy is wrong. To prove that she’s wrong, which of the above four
cards should you turn over? (The goal is to turn over as few cards as possible.)43

The above puzzle baffles most who are untrained in logical thinking. Right now, that
probably includes you. But by p. 22 of this textbook, you’ll have had some training in
logic and thus be able to solve this puzzle easily.
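
If you’d rather work the puzzle out yourself, skip this. Otherwise, here is a hedged brute-force sketch in Python. It assumes, purely for illustration, that the letters are A to Z and the numbers are the single digits 0 to 9. For each visible face, it asks whether some hidden face could make Betsy’s claim false; only such cards are worth turning over. The names claim_holds and worth_turning are made up here.

```python
LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
NUMBERS = range(10)
VOWELS = "AEIOU"


def claim_holds(letter, number):
    # "If vowel, then even" fails only for a vowel paired with an odd number.
    return not (letter in VOWELS and number % 2 == 1)


def worth_turning(visible):
    """Could the hidden side of this card disprove the claim?"""
    if isinstance(visible, str):
        # A letter is showing, so the hidden side is a number.
        return any(not claim_holds(visible, n) for n in NUMBERS)
    # A number is showing, so the hidden side is a letter.
    return any(not claim_holds(letter, visible) for letter in LETTERS)


cards = ["A", "Z", 1, 8]
print([card for card in cards if worth_turning(card)])  # ['A', 1]
```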

If it were up to me, the H2 Maths syllabus would devote at least a little time to logic. Instead, that time
is spent on learning to compute the volume of the revolution of a curve around the y-axis. Which is a
doggie trick that (1) students will forget two weeks after the final A-Level exam; and (2) is completely
useless unless you’re planning to be an engineer or physicist, in which case it is still completely useless
since down the road, you’ll be learning it again (and probably more properly).
The wording here is a slightly-modified version of Wason (1966, pp. 145–146).
Answer: A and 1. We’ll explain this on p. 22.
3.1. True, False, and Indeterminate Statements

Example 8. Let A, B, C, and D be the following statements.

A: “Germany is in Europe.” ✓ (True)

B: “Germany is in Asia.” ✗ (False)
C: “1 + 1 = 2.” ✓ (True)
D: “1 + 1 = 3.” ✗ (False)

Then statements A and C are true, while statements B and D are false.44
As shorthand, we’ll use the green checkmark ✓ to denote that a statement is true; and a
red crossmark ✗ to denote that it’s false.

Example 9. Let M , N , and O be the following statements.

M : “x > 0.”
N : “x > 1.”
O: “x is a positive number.”

Note that the truth values of M , N , and O depend on the value of x. That is, whether
each statement is true or false depends on the value of x.
So, without being given further information, each of these three statements can neither
be said to be true nor said to be false. Instead, we say that each is indeterminate.
But if we’re told that:
• x = 5, then all three statements are true.
• x = 0.5, then statements M and O are true, while N is false.
• x = −1, then all three statements are false.
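
One way to picture an indeterminate statement is as a function of x that returns True or False only once x is supplied. The Python sketch below is just an illustration of that idea; the dictionary statements and the helper truth_values are names invented here.

```python
statements = {
    "M": lambda x: x > 0,  # "x > 0"
    "N": lambda x: x > 1,  # "x > 1"
    "O": lambda x: x > 0,  # "x is a positive number"
}


def truth_values(x):
    """Evaluate each statement at the given value of x."""
    return {name: check(x) for name, check in statements.items()}


for x in [5, 0.5, -1]:
    print(x, truth_values(x))
```

The three printed lines reproduce the three bullet points above: all true at x = 5, only N false at x = 0.5, and all false at x = −1.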

In this textbook, we will not define the terms statement, true, and false. We’ll take for granted that
“everybody knows” what these terms mean (even if they don’t).
3.2. The Conjunction AND and the Disjunction OR

Example 10. As before, let A, B, C, and D be the following statements:

A: “Germany is in Europe.” ✓
B: “Germany is in Asia.” ✗
C: “1 + 1 = 2.” ✓
D: “1 + 1 = 3.” ✗
Using the logical connective AND (called the conjunction), we can form the following
statements (which we also call conjunctions):

A AND B: “Germany is in Europe AND Germany is in Asia.” ✗

A AND C: “Germany is in Europe AND 1 + 1 = 2.” ✓
B AND D: “Germany is in Asia AND 1 + 1 = 3.” ✗

• The conjunction A AND B is false because B is false.

• The conjunction A AND C is true because both A and C are true.
• The conjunction B AND D is false because B is false. (Indeed, D is also false.)
Using the logical connective OR (called the disjunction) we can form the following
statements (which we also call disjunctions):

A OR B: “Germany is in Europe OR Germany is in Asia.” ✓

A OR C: “Germany is in Europe OR 1 + 1 = 2.” ✓
B OR D: “Germany is in Asia OR 1 + 1 = 3.” ✗

• The disjunction A OR B is true because A is true.

• The disjunction A OR C is true because A is true. (Indeed, C is also true.)
• The disjunction B OR D is false because both B and D are false.

Definition 3. P AND Q, the conjunction of P and Q, is the statement that is

• True if both P and Q are true; and
• False if at least one of P or Q is false.

Definition 4. P OR Q, the disjunction of P and Q, is the statement that is

• True if at least one of P or Q is true; and
• False if both P and Q are false.
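
Definitions 3 and 4 can be laid out as a truth table. The Python sketch below prints one, using Python’s own and and or, which behave like AND and OR on the values True and False. The function names conjunction and disjunction are chosen here for illustration only.

```python
from itertools import product


def conjunction(p, q):
    return p and q


def disjunction(p, q):
    return p or q


print("P     Q     P AND Q  P OR Q")
for p, q in product([True, False], repeat=2):
    print(f"{p!s:5} {q!s:5} {conjunction(p, q)!s:8} {disjunction(p, q)!s:6}")

# The statements of Example 10.
A, B, C, D = True, False, True, False
print(conjunction(A, B), conjunction(A, C), conjunction(B, D))  # False True False
print(disjunction(A, B), disjunction(A, C), disjunction(B, D))  # True True False
```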

Exercise 4. Continue with the above example. Explain if each of the following statements
is true or false. (Answer on p. 1388.)

(a) B AND C. (b) A AND D. (c) C AND D.

(d) B OR C. (e) A OR D. (f) C OR D.


3.3. The Negation NOT
Informally, a statement’s negation is the “opposite” or contradictory statement. Formally:

Definition 5. NOT-P , the negation of P , is the statement that is

• True if P is false; and
• False if P is true.

Example 11. As before, let A, B, C, and D be the following statements:

A: “Germany is in Europe.” ✓
B: “Germany is in Asia.” ✗
C: “1 + 1 = 2.” ✓
D: “1 + 1 = 3.” ✗
Then the negations of statements A, B, C, and D are simply:

NOT-A: “Germany is not in Europe.” ✗

NOT-B: “Germany is not in Asia.” ✓
NOT-C: “1 + 1 ≠ 2.” ✗
NOT-D: “1 + 1 ≠ 3.” ✓

• A and C are true; thus, their negations NOT-A and NOT-C must be false.
• B and D are false; thus, their negations NOT-B and NOT-D must be true.

The negation of the negation simply brings us back to the original statement:

NOT-NOT-A: “Germany is in Europe.” ✓

NOT-NOT-B: “Germany is in Asia.” ✗
NOT-NOT-C: “1 + 1 = 2.” ✓
NOT-NOT-D: “1 + 1 = 3.” ✗
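
The same bookkeeping can be done in a few lines of Python, where not plays the role of NOT. This is merely a sketch; the function name negation and the dictionary truth are invented for the illustration.

```python
def negation(p):
    return not p


# Truth values of statements A, B, C, and D from Example 11.
truth = {"A": True, "B": False, "C": True, "D": False}

for name, value in truth.items():
    print(f"{name}: {value}, NOT-{name}: {negation(value)}, "
          f"NOT-NOT-{name}: {negation(negation(value))}")
```

In every line, the last value equals the first: negating twice brings us back to where we started.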

Exercise 5. Let E: “It’s raining”, F : “The grass is wet”, G: “I’m sleeping”, and H:
“My eyes are shut”.
Write down NOT-E, NOT-F , NOT-G, and NOT-H. (Answer on p. 1388.)

Remark 2. AND, OR, and NOT are our three most basic logical connectives. Using
these three basic connectives, we can build ever more complex statements.


3.4. Equivalence ⇐⇒
The symbol ⇐⇒ is read aloud as “is equivalent to” or “if and only if”.

Definition 6. We say that two statements P and Q are equivalent and write:

P ⇐⇒ Q,

if it is impossible that one is true while the other is false.

Example 12. Let M , N , and O be the following statements:

M : “x > 0.”
N : “x > 1.”
O: “x is a positive number.”
Observe that if M is true, then O is also true. And if M is false, then O is also false. It
is impossible that one is true while the other is false. And so, we say that M and O are
equivalent and write M ⇐⇒ O.
In contrast, it is possible that M is true while N is false — this is the case when x = 0.5.
And so we say that M and N are not equivalent and write M ⇎ N.

Example 13. Let α, β, and γ be the following statements:

α: “x = 3.”
β: “x + 2 = 5.”
γ: “x² = 9.”
Observe that if α is true, then β is true. And if α is false, then β is also false. It is
impossible that one is true while the other is false. And so, we say that α and β are
equivalent and write α ⇐⇒ β.
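
A computer can hunt for a value of x at which one statement is true while the other is false. The Python sketch below does this over a small, arbitrarily chosen list of sample values. Note the limitation: finding such a value proves that two statements are not equivalent, but failing to find one among a few samples proves nothing. All function names here are invented for the illustration.

```python
def find_counterexample(p, q, samples):
    """Return a sample where exactly one of p, q is true, or None if none is found."""
    for x in samples:
        if p(x) != q(x):
            return x
    return None


def stmt_m(x):       # M: "x > 0"
    return x > 0


def stmt_n(x):       # N: "x > 1"
    return x > 1


def stmt_o(x):       # O: "x is a positive number"
    return x > 0


def stmt_alpha(x):   # α: "x = 3"
    return x == 3


def stmt_beta(x):    # β: "x + 2 = 5"
    return x + 2 == 5


samples = [-3, -1, 0, 0.5, 1, 2, 3, 5]
print(find_counterexample(stmt_m, stmt_o, samples))         # None
print(find_counterexample(stmt_m, stmt_n, samples))         # 0.5
print(find_counterexample(stmt_alpha, stmt_beta, samples))  # None
```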

Exercise 6. (a) Are N : “x > 1” and O: “x is a positive number” equivalent?

(b) Are α: “x = 3” and γ: “x² = 9” equivalent? (Answer on p. 1388.)


3.5. De Morgan’s Laws: Negating the Conjunction and Disjunction
The negation of P AND Q is denoted NOT- (P AND Q).

Remark 3. Note the use of the parentheses.

In arithmetic, 2 + 3 × 7 = 23 is different from (2 + 3) × 7 = 35.
Likewise, in logic, NOT- (P AND Q) is different from NOT-P AND Q.
In both arithmetic and logic, we sometimes add parentheses to be clear about the order
of operations. Indeed, just to be extra clear, we sometimes add parentheses even when
strictly speaking, they aren’t necessary.

Fact 1. NOT- (P AND Q) ⇐⇒ (NOT-P OR NOT-Q).

Proof. See p. 1251 in the Appendices.

Example 14. Consider these three statements:

1. A AND B: “Germany is in Europe AND Germany is in Asia.” ✗
2. C AND D: “1 + 1 = 2 AND 1 + 1 = 3.” ✗
3. A AND C: “Germany is in Europe AND 1 + 1 = 2.” ✓
Let’s now consider the negation of each of the above statements:
1. NOT- (A AND B) is true. There are two ways to see this:
• Since A AND B is false, its negation NOT- (A AND B) must be true.
• By the above Fact, NOT- (A AND B) is equivalent to (NOT-A OR NOT-B):
“Germany is not in Europe OR Germany is not in Asia”. Which is true because NOT-B:
“Germany is not in Asia” is true.
2. NOT- (C AND D) is true. There are two ways to see this:
• Since C AND D is false, its negation NOT- (C AND D) must be true.
• By the above Fact, NOT- (C AND D) is equivalent to (NOT-C OR NOT-D): “1 +
1 ≠ 2 OR 1 + 1 ≠ 3”. Which is true because NOT-D: “1 + 1 ≠ 3” is true.
3. NOT- (A AND C) is false. There are two ways to see this:
• Since A AND C is true, its negation NOT- (A AND C) must be false.
• By the above Fact, NOT- (A AND C) is equivalent to (NOT-A OR NOT-C):
“Germany is not in Europe OR 1 + 1 ≠ 2”. Which is false because both NOT-A: “Germany
is not in Europe” and NOT-C: “1 + 1 ≠ 2” are false.

Exercise 7. Continue with the above example. Explain if the negation of each of the
following statements is true. (Answer on p. 1388.)

(a) B AND C. (b) A AND D. (c) B AND D.


The negation of P OR Q is denoted NOT- (P OR Q).

Fact 2. NOT- (P OR Q) ⇐⇒ (NOT-P AND NOT-Q).

Proof. See p. 1252 in the Appendices.

Example 15. Consider these three statements:

1. A OR B: “Germany is in Europe OR Germany is in Asia.” ✓
2. C OR D: “1 + 1 = 2 OR 1 + 1 = 3.” ✓
3. A OR C: “Germany is in Europe OR 1 + 1 = 2.” ✓
Let’s now consider the negation of each of the above statements:
1. NOT- (A OR B) is false. There are two ways to see this:
• Since A OR B is true, its negation NOT- (A OR B) must be false.
• By the above Fact, NOT- (A OR B) is equivalent to (NOT-A AND NOT-B):
“Germany is not in Europe AND Germany is not in Asia”. Which is false because
NOT-A: “Germany is not in Europe” is false.
2. NOT- (C OR D) is false. There are two ways to see this:
• Since C OR D is true, its negation NOT- (C OR D) must be false.
• By the above Fact, NOT- (C OR D) is equivalent to (NOT-C AND NOT-D): “1 +
1 ≠ 2 AND 1 + 1 ≠ 3”. Which is false because NOT-C: “1 + 1 ≠ 2” is false.
3. NOT- (A OR C) is false. There are two ways to see this:
• Since A OR C is true, its negation NOT- (A OR C) must be false.
• By the above Fact, NOT- (A OR C) is equivalent to (NOT-A AND NOT-C):
“Germany is not in Europe AND 1 + 1 ≠ 2”. Which is false because NOT-A: “Germany
is not in Europe” is false. Indeed, NOT-C: “1 + 1 ≠ 2” is also false.

Exercise 8. Is the negation of each statement true? (Answer on p. 1389.)

(a) B OR C. (b) A OR D. (c) B OR D.

Remark 4. Together, Facts 1 and 2 are known as De Morgan’s Laws.
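
Since P and Q can each only be true or false, there are just four cases to check, and a computer can check them all. The Python sketch below does so for both Facts. It is a brute-force check written for illustration (the function name de_morgan_holds is made up), not the proof given in the Appendices.

```python
from itertools import product


def de_morgan_holds():
    """Check Facts 1 and 2 for every combination of truth values of P and Q."""
    for p, q in product([True, False], repeat=2):
        fact_1 = (not (p and q)) == ((not p) or (not q))
        fact_2 = (not (p or q)) == ((not p) and (not q))
        if not (fact_1 and fact_2):
            return False
    return True


print(de_morgan_holds())  # True
```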


3.6. The Implication P ⟹ Q
The statement “P ⟹ Q” is read aloud as “If P, then Q” or “P implies Q”.

Example 16. Let E: “It’s raining” and F : “The grass is wet”.

Then the implication E ⟹ F is the statement:

“If it’s raining, then the grass is wet.”

Or equivalently: “That it’s raining implies that the grass is wet.”

Which is true.

Example 17. Let G: “I’m sleeping” and H: “My eyes are shut”.
Then the implication G ⟹ H is the statement:

“If I’m sleeping, then my eyes are shut.”

Or equivalently: “That I’m sleeping implies that my eyes are shut.”

Which is true.

Example 18. Let M : “x > 0” and N : “x > 1”.

Then the implication M ⟹ N is the statement:

“If x > 0, then x > 1.”

Or equivalently: “That x > 0 implies that x > 1.”

Which is false (counterexample: x = 0.5).

Example 19. Let α: “x = 3.” and γ: “x² = 9.”

Then the implication γ ⟹ α is the statement:

“If x² = 9, then x = 3.”

Or equivalently: “That x² = 9 implies that x = 3.”

Which is false (counterexample: x = −3).

This all seems simple enough. However, you may find the formal definition of P ⟹ Q a
little strange and unintuitive:


Definition 7. Let P and Q be statements. Then the implication P ⟹ Q is defined
as the statement:

NOT-P OR Q.

That is, P ⟹ Q is the statement that is:

• True if P is false OR Q is true; and
• False if P is true AND Q is false.
Given the implication P ⟹ Q, we call P the hypothesis, premise, or antecedent; and
Q the conclusion or consequent.

What confuses students most about the above definition is this: From a false hypothesis,
any conclusion may be drawn! That is:

If P is false, then P ⟹ Q is always true!

Example 20. Consider these three statements:

1. “If goldfish can walk on land, then pigs can fly.”

2. “If there are 17 washing machines on Mars, then I am a billionaire.”

3. “If x > 0 and x < 0, then 2 = 77.”

Pigs cannot fly, I am not a billionaire, and 2 ≠ 77. It may thus seem that all three
statements must be false.
But strangely enough, all three are true! To see why, use the above Definition to rewrite
the three statements as:
1. “Goldfish cannot walk on land OR pigs can fly.” Which is true, because “Goldfish
cannot walk on land” is true.
2. “There aren’t 17 washing machines on Mars OR I am a billionaire.” Which is true,
because “There aren’t 17 washing machines on Mars” is true.

3. “NOT-(x > 0 AND x < 0) OR 2 = 77.” Which is true, because “NOT-(x > 0 AND
x < 0)” is true — it is impossible that x is both more than AND less than 0.
The examples here illustrate that from a false hypothesis, any conclusion may be drawn!
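
The definition of the implication (true if P is false OR Q is true; false if P is true AND Q is false) translates directly into a one-line Python function. The sketch below prints its truth table; the function name implies is chosen here for illustration.

```python
from itertools import product


def implies(p, q):
    """The implication "if P, then Q", taken to mean NOT-P OR Q."""
    return (not p) or q


print("P     Q     P implies Q")
for p, q in product([True, False], repeat=2):
    print(f"{p!s:5} {q!s:5} {implies(p, q)}")

# Example 20: a false hypothesis with a false conclusion.
print(implies(False, False))  # True
```

The only row that comes out False is the one where P is true and Q is false. In particular, both rows with a false P come out True, whatever Q is.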

Exercise 9. Is each statement true? (Answer on p. 1389.)

(a) “If Tin Pei Ling (TPL) is a genius, then the Nazis won World War II (WW2).”
(b) “If TPL is a genius, then the Allies won WW2.”
(c) “If π is rational, then I am the king of the world.”
(d) “If π is rational, then Lee Hsien Loong is Lee Kuan Yew’s son.”


3.7. The Converse Q ⟹ P
Informally, converse = “Flip”. Formally:

Definition 8. Given the implication P ⟹ Q, its converse is the statement Q ⟹ P.

Example 21. Let E: “It’s raining” and F : “The grass is wet”. Then

E ⟹ F: “If it’s raining, then the grass is wet.”

The converse of the implication E ⟹ F is the following implication:

F ⟹ E: “If the grass is wet, then it’s raining.”

Or equivalently: “That the grass is wet implies that it’s raining.”

Note that F ⟹ E is false. One way to prove that a statement is false is by supplying
a counterexample. A counterexample that shows that F ⟹ E is false is any scenario
where the grass is wet even though it isn’t raining. We can easily think of three such
scenarios:
1. The rain just stopped.
2. Someone is watering the grass.
3. A dog is peeing on the grass.

Observe that in the above example, E ⟹ F is true but its converse F ⟹ E is false.
This proves that an implication and its converse are not always equivalent. Let’s
jot this down formally:

Fact 3. Given two statements P and Q, it is not always true that:

(P ⟹ Q) ⇐⇒ (Q ⟹ P).

We have instead the following result, which states that Q ⟹ P is equivalent to
NOT-Q OR P:

Fact 4. (Q ⟹ P) ⇐⇒ (NOT-Q OR P).

Proof. Simply apply Definition 7.
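
Fact 3 can also be seen from a truth table. The Python sketch below, which treats an implication as NOT-(hypothesis) OR (conclusion), lists the combinations of truth values of P and Q at which an implication and its converse disagree. The names implies and differing_rows are invented for this illustration.

```python
from itertools import product


def implies(p, q):
    """The implication "if P, then Q", taken to mean NOT-P OR Q."""
    return (not p) or q


# Rows (P, Q) where "P implies Q" and its converse "Q implies P" disagree.
differing_rows = [
    (p, q)
    for p, q in product([True, False], repeat=2)
    if implies(p, q) != implies(q, p)
]
print(differing_rows)  # [(True, False), (False, True)]
```

So the two statements disagree exactly when one of P and Q is true and the other is false.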


Exercise 10. Let G: “I’m sleeping”; H: “My eyes are shut”; M : “x > 0”; N : “x > 1”;
α: “x = 3”; and γ: “x² = 9”. Explain whether each of the following statements and its
converse are true or false. (Answer on p. 1389.)

(a) G ⟹ H. (b) M ⟹ N. (c) γ ⟹ α.

Exercise 11. Write down the converse of each statement. Then explain whether this
converse is true. (Answer on p. 1389.)
(a) “If Tin Pei Ling (TPL) is a genius, then the Nazis won World War II (WW2).”
(b) “If TPL is a genius, then the Allies won WW2.”
(c) “If π is rational, then I am the king of the world.”
(d) “If π is rational, then Lee Hsien Loong is Lee Kuan Yew’s son.”

Exercise 12. Let A: “Germany is in Europe”, B: “Germany is in Asia”, C: “1 + 1 = 2”,

and D: “1 + 1 = 3”. Explain if each of the following statements and its converse are true
or false. (Answer on p. 1390.)

(a) A ⟹ B. (b) A ⟹ C. (c) A ⟹ D. (d) C ⟹ D.

Exercise 13. Fill in the blanks with (i) must be true; (ii) must be false; or (iii) could be
true or false. (Answer on p. 1390.)

(a) If P is true, then P ⟹ Q _____ and Q ⟹ P _____.

(b) If P is false, then P ⟹ Q _____ and Q ⟹ P _____.
(c) If P ⟹ Q is true, then Q ⟹ P _____.
(d) If P ⟹ Q is false, then Q ⟹ P _____.


3.8. Affirming the Consequent (or The Fallacy of the Converse)
Affirming the consequent or the fallacy of the converse is a common error that
people make in everyday life. Formally, it takes the following form:

1. “P ⟹ Q.”
2. “Q.”
3. “Therefore, P .”

Example 22. Examples of affirming the consequent or the fallacy of the converse:

1. “If it’s raining, then the grass is wet.”

2. “The grass is wet.”
3. “Therefore, it’s raining.”45
That the grass is wet does not imply that it’s raining. The grass may be wet because (a)
it just stopped raining; (b) someone is watering it; or (c) a dog is peeing on it.
1. “If John is undergoing chemotherapy, then John is bald.”
2. “John is bald.”
3. “Therefore, John is undergoing chemotherapy.”
That John is bald does not imply he’s undergoing chemotherapy. John may be bald
because (a) he’s suffering from male-pattern baldness; (b) he likes to keep his scalp clean-
shaven; or a billion other reasons.
1. “If Mary drinks a lot of Coke, then Mary is fat.”
2. “Mary is fat.”
3. “Therefore, Mary drinks a lot of Coke.”
That Mary is fat does not imply she drinks a lot of Coke. Mary may be fat because (a)
she drinks a lot of Pepsi; (b) she eats a lot of junk food; (c) she suffers from low
metabolism; or a billion other reasons.

When spelt out so explicitly, affirming the consequent or the fallacy of the converse
seems rather silly. But unfortunately, people make this error all the time. Hopefully you’ll
now be able to avoid it.
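To see why this form of argument fails, we can search for a case where both premises are true and yet the conclusion is false. The following Python sketch does this by brute force. It assumes that P ⟹ Q means NOT-P OR Q, and the helper name implies is our own.

```python
from itertools import product

def implies(p, q):
    """P => Q, taken to mean NOT-P OR Q."""
    return (not p) or q

# An argument form is valid only if the conclusion is true in every
# case where all the premises are true. Look for cases that break this.
counterexamples = [
    (p, q)
    for p, q in product([True, False], repeat=2)
    if implies(p, q) and q and not p   # premises true, conclusion P false
]

print(counterexamples)   # [(False, True)]
```

The single counterexample is P false and Q true: it is not raining, and yet the grass is wet.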

Exercise 14. Is the following chain of reasoning valid?

1. “If Warren Buffett owns Google, then Warren Buffett is rich.”
2. “Warren Buffett is rich.”
3. “Therefore, Warren Buffett owns Google.” (Answer on p. 1390.)

By the way, such a chain of reasoning is called a syllogism. A syllogism has two or more statements
called premises, followed by a conclusion.
3.9. The Negation NOT- (P Ô⇒ Q)

Example 23. Let I: “x is German” and J: “x is European”. Then we have:

I Ô⇒ J: “If x is German, then x is European.”

Which of the following correctly negates I Ô⇒ J? In other words, which of the following
statements is NOT- (I Ô⇒ J)?
(a) “If x is German, then x is not European.”
(b) “If x is not German, then x is European.”
(c) “Some x is German and not European.”
(d) “Some x is European and not German.”
This is tricky and you should take as long as you need to think about it, before reading
the answer/explanation on the next page. The point of this exercise is to demonstrate to
yourself that it isn’t obvious what the negation of an implication is. (Or if it’s obvious,
it’ll demonstrate that you’re pretty smart.)
(As I’ve repeatedly stressed, do not do the intellectually-lazy thing of skipping ahead.
Give it at least three minutes of honest effort before going to the next page.)

Or don’t.


If you dun care, I oso dun care.

(Example continues on the next page ...)


With the following fact, we’ll find it very easy to negate any implication:

Fact 5. NOT- (P Ô⇒ Q) ⇐⇒ (P AND NOT-Q).

Proof. You are asked to prove this in the next Exercise.

Exercise 15. Prove the above Fact. (Hint in footnote.)46 (Answer on p. 1390.)

(... Example continued from the previous page.)

As before, let I: “x is German” and J: “x is European”.
By the above Fact, NOT- (I Ô⇒ J) is equivalent to:

I AND NOT-J: “x is German AND x is not European”.

That is: I AND NOT-J: “Some x is German AND not European.”

And so, the negation of I Ô⇒ J: “If x is German, then x is European” is:

(c) “Some x is German and not European.”

Exercise 16. Let K: “x is donzer” and L: “x is kiki”. And so, we have:

K Ô⇒ L: “If x is donzer, then x is kiki.”

Which of the following is NOT- (K Ô⇒ L)?

(a) “If x is kiki, then x is not donzer.”
(b) “Some x is kiki and not donzer.”
(c) “If x is donzer, then x is not kiki.”
(d) “Some x is donzer and not kiki.” (Answer on p. 1390.)

Look at the definition of P Ô⇒ Q (Definition 7). What is its negation?
We can now easily solve the Wason Four-Card Puzzle.

Example 7. The Wason Four-Card Puzzle. In a special deck of cards, each card
has a letter on one side and a number on the other. You are shown these four cards:

A Z 1 8
Betsy the Bimbotic Blonde now comes along and makes the following claim:

“If a card has a vowel on one side, then it has an even number on the other side.”

You suspect that Betsy is wrong. To prove that she’s wrong, which of the above four
cards should you turn over? (The goal is to turn over as few cards as possible.)

The answer is that we should turn over A and 1. Here are two explanations:
Solution I. By Fact 5, the negation of P Ô⇒ Q is P AND NOT-Q. Thus, the negation
of Betsy’s claim is:

“A card has a vowel on one side AND an odd number on the other side.”

So, we should turn over any vowels and odd numbers.

In case you weren’t convinced, here’s Solution II, which doesn’t directly use Fact 5. We
can also call this the brute-force case-by-case method:
• An odd number behind A would prove Betsy wrong. So, we should turn over A.
• An odd number behind Z would not prove Betsy wrong. Nor would an even number.
So, we needn’t turn over Z.
• A vowel behind 1 would prove Betsy wrong. So, we should turn over 1.
• A vowel behind 8 would not prove Betsy wrong. Nor would a consonant. So, we
needn’t turn over 8.
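The case-by-case method can also be sketched in Python. The way the cards are represented, the helper names, and the assumption that hidden numbers are single digits are all choices made for this illustration; they are not part of the puzzle.

```python
VOWELS = set("AEIOU")
LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
DIGITS = range(10)   # assumption: hidden numbers are single digits

def refutes(letter, number):
    """A card refutes Betsy's claim if it shows P AND NOT-Q:
    a vowel on one side AND an odd number on the other."""
    return letter in VOWELS and number % 2 == 1

def must_turn(visible):
    """Turn a card over only if some hidden face could refute the claim."""
    if isinstance(visible, str):
        # A letter is showing, so the hidden face is a number.
        return any(refutes(visible, n) for n in DIGITS)
    # A number is showing, so the hidden face is a letter.
    return any(refutes(ch, visible) for ch in LETTERS)

cards = ["A", "Z", 1, 8]
print([card for card in cards if must_turn(card)])   # ['A', 1]
```

Only A and 1 can possibly hide a vowel paired with an odd number, so only these two cards are worth turning over.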


3.10. The Contrapositive NOT-Q Ô⇒ NOT-P
Informally, contrapositive = “Flip and negate both”. Formally:

Definition 9. Given the implication P ⟹ Q, its contrapositive is the statement NOT-Q ⟹ NOT-P .


Example 24. Let I: “x is German” and J: “x is European”.

Consider the implication I Ô⇒ J and its contrapositive NOT-J Ô⇒ NOT-I:

I Ô⇒ J: “If x is German, then x is European.”

NOT-J Ô⇒ NOT-I: “If x is not European, then x is not German.”

Observe that both I Ô⇒ J and its contrapositive NOT-J Ô⇒ NOT-I are true.

Now consider the converse J Ô⇒ I and its contrapositive NOT-I Ô⇒ NOT-J:

J Ô⇒ I: “If x is European, then x is German.”

NOT-I Ô⇒ NOT-J: “If x is not German, then x is not European.”

Observe that both J Ô⇒ I and its contrapositive NOT-I Ô⇒ NOT-J are false.

Example 25. Let E: “It’s raining” and F : “The grass is wet”.

Consider the implication E Ô⇒ F and its contrapositive NOT-F Ô⇒ NOT-E:

E Ô⇒ F : “If it’s raining, then the grass is wet.”

NOT-F Ô⇒ NOT-E: “If the grass is not wet, then it’s not raining.”

Observe that both E Ô⇒ F and its contrapositive NOT-F Ô⇒ NOT-E are true.

Now consider the converse F Ô⇒ E and its contrapositive NOT-E Ô⇒ NOT-F :

F Ô⇒ E: “If the grass is wet, then it’s raining.”

NOT-E Ô⇒ NOT-F : “If it’s not raining, then the grass is not wet.”

Observe that both F ⟹ E and its contrapositive NOT-E ⟹ NOT-F are false.

As the above examples suggest, every implication is equivalent to its contrapositive:

Fact 6. (P Ô⇒ Q) ⇐⇒ (NOT-Q Ô⇒ NOT-P ).

Proof. By Definition 7, P Ô⇒ Q is NOT-P OR Q. And NOT-Q Ô⇒ NOT-P is

Q OR NOT-P . Hence, P Ô⇒ Q and NOT-Q Ô⇒ NOT-P are equivalent.47

Here we make the implicit assumption that the logical connective OR is commutative.
Fact 6 is especially useful on those occasions when it’s hard to prove an implication but
easy to prove its contrapositive:48

Example 26. It’s not obvious how we can prove the following implication:

If x⁴ − x³ + x² ≠ 1, then x ≠ 1.

But proving its contrapositive is easy:

If x = 1, then x⁴ − x³ + x² = 1.

Proof. Simply plug in x = 1 to verify that x⁴ − x³ + x² = 1.

Example 27. It’s not obvious how we can prove the following implication:

If x² is even, then x is even.

But proving its contrapositive is easy:

If x is odd, then x² is odd.

Proof. Let x = 2k + 1 where k is some integer. Then x² = 4k² + 4k + 1 = 2(2k² + 2k) + 1, which is odd.
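As a sanity check (and not a proof), the following Python sketch confirms on a small range of integers that the implication in Example 27 and its contrapositive are both true. The range from −50 to 50 is an arbitrary choice and the helper names are our own.

```python
def is_even(n):
    return n % 2 == 0

def implies(p, q):
    """P => Q, taken to mean NOT-P OR Q."""
    return (not p) or q

xs = range(-50, 51)

# Implication: if x^2 is even, then x is even.
implication_ok = all(implies(is_even(x * x), is_even(x)) for x in xs)

# Contrapositive: if x is odd, then x^2 is odd.
contrapositive_ok = all(
    implies(not is_even(x), not is_even(x * x)) for x in xs
)

# The algebra used in the proof: (2k + 1)^2 = 4k^2 + 4k + 1.
algebra_ok = all((2 * k + 1) ** 2 == 4 * k * k + 4 * k + 1 for k in xs)

print(implication_ok, contrapositive_ok, algebra_ok)   # True True True
```

A check like this can only ever cover finitely many values of x. The proof above is what covers every integer.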

But don’t worry. In H2 Maths, you won’t be required to write any proofs; the above is just
FYI and to illustrate why the contrapositive is useful.

Exercise 17. The statement “If x is German, then x is European” is true. Which of the
following statements is its contrapositive? Which are true? (Answer on p. 1391.)
(a) “If x is European, then x is German.”
(b) “If x is not German, then x is not European.”
(c) “If x is not German, then x is European.”
(d) “If x is not European, then x is not German.”
(e) “If x is not European, then x is German.”

These examples are from .
3.11. (P Ô⇒ Q AND Q Ô⇒ P ) ⇐⇒ (P ⇐⇒ Q)

Example 28. Let:

• I: “x is German.”
• J: “x is European.”
• M : “x > 0.”
• N : “x > 1.”
• O: “x is a positive number.”

To show that two statements are equivalent, we can use Definition 12. So for example:
• M ⇐⇒ O, because it is impossible that one is true while the other is false.
• I ⇎ J (counterexample: if x = Emmanuel Macron, then I is false while J is true).
• M ⇎ N (counterexample: if x = 0.5, then M is true while N is false).

Alternatively, we can use the following fact:

Fact 7. (P Ô⇒ Q AND Q Ô⇒ P ) ⇐⇒ (P ⇐⇒ Q).

Proof. See p. 1252 in the Appendices.

And so, to show that P ⇐⇒ Q, we can show that P ⟹ Q and Q ⟹ P are both true.
And to show that P ⇎ Q, we can show that either P ⟹ Q or Q ⟹ P is false.

Example 29. The implication M ⟹ O (“if x > 0, then x is a positive number”) is
true. So too is the implication O ⟹ M (“if x is a positive number, then x > 0”). And
so, by the above Fact, M ⇐⇒ O.
The implication J ⟹ I (“if x is European, then x is German”) is false. And so, by the
above Fact, I ⇎ J.
The implication M ⟹ N (“if x > 0, then x > 1”) is false. And so, by the above Fact,
M ⇎ N.

Exercise 18. Continue with the last example: Is N ⇐⇒ O true? (Answer on p. 1391.)

Exercise 19. Let X: “John is a Singapore citizen”, Y : “John has a National Registration
Identity Card (NRIC)”, and Z: “John has a pink NRIC”. Are any two of these three
statements equivalent? (Answer on p. 1391.)


3.12. Other Ways to Express P Ô⇒ Q (Optional)
Eskimos supposedly have 50 different words for snow, presumably because snow is so
important and ubiquitous in their lives.49
Similarly, because the implication P Ô⇒ Q is so important and ubiquitous in both
mathematics/logic and everyday life, we have many equivalent ways to express it:

Example 30. Let E: “It is raining” and F : “The grass is wet”. Then all of the following
statements are exactly equivalent:

Maths/Logic               Everyday English

E ⟹ F                     That it’s raining implies that the grass is wet.
E ⟹ F                     It’s raining only if the grass is wet.50
If E, then F.             If it’s raining, then the grass is wet.
If E, F.                  If it’s raining, the grass is wet.
F if E.                   The grass is wet if it’s raining.
F when E.                 The grass is wet when it’s raining.
F follows from E.         That the grass is wet follows from the fact that it’s raining.
E is sufficient for F.    That it’s raining is sufficient for the grass to be wet.
F is necessary for E.     It is necessary that the grass is wet, for it to be raining.

But don’t worry. The above is just FYI.

In this textbook, we’ll avoid using the terms “only if”, “sufficient”, and “necessary”. Instead,
we’ll stick to using the following three (equivalent) statements:

“P Ô⇒ Q”, “P implies Q”, and “If P , then Q”.

Exercise 20. Let G: “I’m sleeping” and H: “My eyes are shut”. Construct the exact
same table as we just did, but for G Ô⇒ H. (Answer on p. 1391)

See this Washington Post story: “There really are 50 Eskimo words for ‘snow’”.
It’s far from obvious, but implies is logically equivalent to only if. And thus, the symbol Ô⇒ can be
read aloud not only as “implies”, but also as “only if”.
3.13. The Four Categorical Propositions and Their Negations
Note first that in mathematics and logic, some means at least one.
The four categorical propositions are:51

1. The universal affirmative (UA): “All S are P” (or “Every S is P”).
2. The universal negative (UN): “No S is P” (or “Every S is NOT-P”).
3. The particular affirmative (PA): “Some S is P.”
4. The particular negative (PN): “Some S is NOT-P.”

In each, we call S the subject and P the predicate.

Example 31. Here follow six examples of each categorical proposition:

The Universal Affirmative (UA) “All S are P ”

1. “All Koreans are Asian.” (Or: “Every Korean is Asian.”)
2. “All Germans are European.” (Or: “Every German is European.”)
3. “All animals are dogs.” (Or: “Every animal is a dog.”)
4. “All mammals are bats.” (Or: “Every mammal is a bat.”)
5. “All Koreans eat dogs.” (Or: “Every Korean eats dogs.”)
6. “All Germans eat bats.” (Or: “Every German eats bats.”)

The Universal Negative (UN) “No S is P ”

7. “No Korean is Asian.” (Or: “Every Korean is not Asian.”)
8. “No German is European.” (Or: “Every German is not European.”)
9. “No animal is a dog.” (Or: “Every animal is not a dog.”)
10. “No mammal is a bat.” (Or: “Every mammal is not a bat.”)
11. “No Korean eats dogs.” (Or: “Every Korean does not eat dogs.”)
12. “No German eats bats.” (Or: “Every German does not eat bats.”)

The six subjects used are: “Korean”, “German”, “animal”, “mammal”, “Korean”, and
“German”.
The six predicates used are: “Asian”, “European”, “a dog”, “a bat”, “eats dogs” (or “a
dog-eater”), and “eats bats” (or “a bat-eater”).
(Example continues on the next page ...)

These are also called the A, E, I, and O propositions.
(... Example continued from the previous page.)

The Particular Affirmative (PA) “Some S is P”

13. “Some Korean is Asian.” (Or: “At least one Korean is Asian.”)
14. “Some German is European.” (Or: “At least one German is European.”)
15. “Some animal is a dog.” (Or: “At least one animal is a dog.”)
16. “Some mammal is a bat.” (Or: “At least one mammal is a bat.”)
17. “Some Korean eats dogs.” (Or: “At least one Korean eats dogs.”)
18. “Some German eats bats.” (Or: “At least one German eats bats.”)

The Particular Negative (PN) “Some S is NOT-P”

19. “Some Korean is not Asian.” (Or: “At least one Korean is not Asian.”)
20. “Some German is not European.” (Or: “At least one German is not European.”)
21. “Some animal is not a dog.” (Or: “At least one animal is not a dog.”)
22. “Some mammal is not a bat.” (Or: “At least one mammal is not a bat.”)
23. “Some Korean does not eat dogs.” (Or: “At least one Korean does not eat dogs.”)
24. “Some German does not eat bats.” (Or: “At least one German does not eat bats.”)

Exercise 21. For the given pair of subject S and predicate P , write down the
corresponding UA, UN, PA, and PN. (Answer on p. 1391.)

(a) Donzer Kiki
(b) Donzer Cancer
(c) Bachelor Married
(d) Bachelor Smoke

Exercise 22. Is each of the following statements always true? If so, explain why. If not,
supply a counterexample. (Answer on p. 1392.)
(a) The UA and UN are negations of each other.
(b) The PA and UN are negations of each other.


In the last exercise, we saw that the UA and the UN are not generally negations of each
other. So, what then is the correct negation of each categorical proposition?

Example 32. Consider this UA: “All Koreans eat dogs.”

To prove that this UA is wrong, we need merely show that there is at least one Korean
who doesn’t eat dogs. Hence, the negation of this UA is a PN:

“Some (i.e. at least one) Korean does not eat dogs”.

Example 33. Consider this UN: “No Korean eats dogs.”

To prove that this UN is wrong, we need merely show that there is at least one Korean
who eats dogs. Hence, the negation of this UN is a PA:

“Some (i.e. at least one) Korean eats dogs”.

The above examples show that in general, the negation of the UA is the PN, while the
negation of the UN is the PA:

Statement                 Negation
UA: “All S are P.”        PN: “Some S is NOT-P.”
UN: “No S is P.”          PA: “Some S is P.”

Example 34. The same examples as before, with their negations:

UA “All S are P ” Negation: PN “Some S is NOT-P ”

1. “All Koreans are Asian.” “Some Korean is not Asian.”
2. “All Germans are European.” “Some German is not European.”
3. “All animals are dogs.” “Some animal is not a dog.”
4. “All mammals are bats.” “Some mammal is not a bat.”
5. “All Koreans eat dogs.” “Some Korean does not eat dogs.”
6. “All Germans eat bats.” “Some German does not eat bats.”

UN “No S is P ” Negation: PA “Some S is P ”

7. “No Korean is Asian.” “Some Korean is Asian.”
8. “No German is European.” “Some German is European.”
9. “No animal is a dog.” “Some animal is a dog.”
10. “No mammal is a bat.” “Some mammal is a bat.”
11. “No Korean eats dogs.” “Some Korean eats dogs.”
12. “No German eats bats.” “Some German eats bats.”
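Python's built-in functions all and any behave like “All S are P” and “Some S is P”, so they give a quick way to see these negations at work. The list of mammals below is a small made-up collection used only for illustration.

```python
# A small, made-up collection of mammals (the subject S).
mammals = ["bat", "dog", "whale"]

def is_bat(animal):
    """The predicate P: 'is a bat'."""
    return animal == "bat"

ua = all(is_bat(m) for m in mammals)        # UA: "All mammals are bats."
un = not any(is_bat(m) for m in mammals)    # UN: "No mammal is a bat."
pa = any(is_bat(m) for m in mammals)        # PA: "Some mammal is a bat."
pn = any(not is_bat(m) for m in mammals)    # PN: "Some mammal is not a bat."

print(ua, un, pa, pn)    # False False True True
print(ua == (not pn))    # True: the UA and the PN are negations
print(un == (not pa))    # True: the UN and the PA are negations
```

Note that here the UA and the UN are both false, so they cannot be negations of each other.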


Exercise 23. Is each of the following statements a UA, UN, PA, or PN? Also, write
down its negation. (Answer on p. 1392.)

(a) “All donzers are kiki.”
(b) “No donzer is kiki.”
(c) “Some donzer is kiki.”
(d) “Some donzer is not kiki.”
(e) “All bachelors are married.”
(f) “No bachelor is married.”
(g) “Some bachelor is married.”
(h) “Some bachelor is not married.”
(i) “All donzers cause cancer.”
(j) “No donzer causes cancer.”
(k) “Some donzer causes cancer.”
(l) “Some donzer does not cause cancer.”
(m) “All bachelors smoke.”
(n) “No bachelor smokes.”
(o) “Some bachelor smokes.”
(p) “Some bachelor does not smoke.”

Exercise 24. While trying to excuse the less-than-perfect play of a basketball player, a
commentator remarks, “Everybody is not LeBron James.” Rewrite this statement into
the form of a categorical proposition. Identify the type of categorical proposition, the
subject, and the predicate.
Write down its negation. Then state if the commentator’s statement is true or false. If
false, what should he have said instead? (Answer on p. 1392.)


3.14. Chapter Summary
• The conjunction P AND Q is true if both P and Q are true.
• The disjunction P OR Q is true if at least one of P and Q is true.
• The negation NOT-P is the statement that’s true if P is false and false if P is true.
• P and Q are equivalent (written P ⇐⇒ Q) if it is impossible that one is true while
the other is false.
• De Morgan’s Laws:
– NOT- (P AND Q) ⇐⇒ (NOT-P OR NOT-Q).
– NOT- (P OR Q) ⇐⇒ (NOT-P AND NOT-Q).
• The implication P Ô⇒ Q:
– Is defined as NOT-P OR Q.
– Is equivalent to its contrapositive NOT-Q ⟹ NOT-P .
– Has the negation P AND NOT-Q.
– Is not generally equivalent to its converse Q Ô⇒ P .
∗ If P Ô⇒ Q is true, then Q Ô⇒ P could be true or false.
∗ If P Ô⇒ Q is false, then Q Ô⇒ P must be true.
• The four categorical propositions and their negations:

Statement Negation
UA: “All S are P .” PN: “Some S is NOT-P .”
UN: “No S is P .” PA: “Some S is P .”

We call S the subject and P the predicate.
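The equivalences in this summary can all be checked by brute force over the four possible combinations of truth values. Here is a minimal Python sketch, in which the helper name implies and the labels in checks are our own.

```python
from itertools import product

def implies(p, q):
    """P => Q, defined as NOT-P OR Q."""
    return (not p) or q

rows = list(product([True, False], repeat=2))

checks = {
    "De Morgan (AND)": all(
        (not (p and q)) == ((not p) or (not q)) for p, q in rows
    ),
    "De Morgan (OR)": all(
        (not (p or q)) == ((not p) and (not q)) for p, q in rows
    ),
    "Contrapositive": all(
        implies(p, q) == implies(not q, not p) for p, q in rows
    ),
    "Negation": all(
        (not implies(p, q)) == (p and not q) for p, q in rows
    ),
    "Converse": all(implies(p, q) == implies(q, p) for p, q in rows),
}

for name, holds in checks.items():
    print(name, holds)
```

Every check prints True except the last one: an implication is not generally equivalent to its converse.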


4. Sets
The set is the basic building block of mathematics. Informally, a set is a “container” or
“box” whose contents we call its elements (or members).

Example 35. Let A = {1, 3, 5} and B = {100, 200}.

[Figure: The set A, drawn as a box containing 1, 3, and 5, and the set B, drawn as a box containing 100 and 200.]

The set A contains three elements, namely the numbers 1, 3, and 5. Informally, it is
a “box” containing the numbers 1, 3, and 5.
The set B contains two elements, namely the numbers 100 and 200. Informally, it is
a “box” containing the numbers 100 and 200.
Note that when we talk about a set, we refer to both the box and the things inside it.

Mathematical punctuation:
• Braces {} — are used to denote the “container”.
• A comma means “and” and is used to separate the elements within a set.

Exercise 25. Write down C, the set of the first 7 positive integers. (Answer on p. 1393.)

Exercise 26. Write down D, the set of even prime numbers. (Answer on p. 1393.)


4.1. The Elements of a Set Can Be Pretty Much Anything
Note that for the A-Levels and hence also in this textbook, the elements of a set will almost
always be numbers. However, in general, they needn’t be numbers; they can be pretty much
anything whatsoever:52

Example 36. Let V be the set of the four largest cities in the US. Then V =
{New York City, Los Angeles, Chicago, Houston}.

Example 37. Let L be the set of suits in the game of bridge. Then L = {♠, ♡, ♢, ♣}.

Example 38. Let E = {3, π2 , The Clementi Mall, Love, the colour green}.

[Figure: The set E, drawn as a box containing 3, π2, The Clementi Mall, a red heart, and a green rectangle.]

The set E contains exactly five elements: two numbers — 3 and π2 ; a shopping centre
— The Clementi Mall; an abstract concept called love (denoted in the figure above by a
red heart); and even the colour green (denoted by a green rectangle).

Example 39. Let S be the set of Singapore citizens. Then S contains about 3.4M
elements,53 including Lee Hsien Loong, Ho Ching, and Chee Soon Juan.

Example 40. Let U be the set of United Nations (UN) member states. Then U contains
exactly 193 elements,54 including Afghanistan, Singapore, and Zimbabwe.

Exercise 27. Write down X, the set of Singapore Prime Ministers (both past and
present). (Answer on p. 1393.)

Actually, there are some restrictions on what can go into a set, but these technicalities are beyond the
scope of the A-Levels.
According to SingStat, the number of Singapore citizens in 2017 was about 3,439,200.
According to this UN webpage, the most recent and 193rd state to join the UN was South Sudan in
A set can even contain other sets:

Example 41. Let F be the set that contains the sets A = {1, 3, 5} and B = {100, 200}.

That is, let: F = {A, B} = {{1, 3, 5} , {100, 200}}.

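[Figure: The set F, drawn as a box containing two smaller boxes (the set A and the set B), next to the set G, drawn as a single box containing the numbers 1, 3, 5, 100, and 200.]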

Now consider instead the set:

G = {1, 3, 5, 100, 200}.

The sets F and G look very similar. So, is G the same set as F ?
Nope. The set F contains exactly two elements, namely the sets A and B.
In contrast, G contains exactly five elements, namely the numbers 1, 3, 5, 100, and 200.
And so, F and G are not the same.
You can think of F as a box that itself contains two boxes, namely A and B, each of
which contains some numbers. In contrast, G is a box that contains no boxes; instead, it
simply contains five numbers.
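The difference between F and G can also be seen in Python. One Python-specific detail: a Python set cannot contain ordinary sets, so in this sketch the inner sets A and B are written as frozensets. This is a quirk of Python and not of mathematics.

```python
# The inner sets must be frozensets to be placed inside another set.
A = frozenset({1, 3, 5})
B = frozenset({100, 200})

F = {A, B}                    # a box containing two boxes
G = {1, 3, 5, 100, 200}       # a box containing five numbers

print(len(F), len(G))    # 2 5
print(F == G)            # False
print(A in F, 1 in F)    # True False
print(1 in G, A in G)    # True False
```

The number 1 is inside the box A, which is inside the box F. But 1 is not itself an element of F.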

The following exercises continue with the above example:

Exercise 28. Let H be the set whose elements are F and G. (Answer on p. 1393.)

(a) How many elements does H have?

(b) Write down H without using the letters A, B, F , or G.

Exercise 29. Let I be the set whose elements are A, B, and G. (Answer on p. 1393.)

(a) How many elements does I have?

(b) Write down I without using the letters A, B, F , or G.
(c) Compare the sets H (from the previous Exercise) and I. Are they the same?


4.2. In ∈ and Not In ∉
Mathematical punctuation:
• ∈ means “is in”.
• ∉ means “is not in”.

Example 42. Let J = {1, 2, 3, 4, 5, 6, 7}. Then 1 ∈ J, 2 ∈ J, 3 ∈ J, etc. You can read these
statements aloud as “1 is in J”, “2 is in J”, “3 is in J”, etc.
We can also write 1, 2, 3 ∈ J (read aloud as “1, 2, and 3 are in J”).
Also, 8 ∉ J, 9 ∉ J, 10 ∉ J, etc. (read aloud as “8 is not in J”, “9 is not in J”, “10 is not
in J”, etc.). We can also write 8, 9, 10 ∉ J (read aloud as “8, 9, and 10 are not in J”).

Example 43. Let K = {Cow, Chicken}.

Then Cow ∈ K reads aloud as “Cow is an element of K”.
Cow, Chicken ∈ {Cow, Chicken} reads aloud as “Cow and Chicken are elements of K.”
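If you know a little Python, the operators in and not in play the same roles as ∈ and ∉. A brief sketch using the sets from the two examples above:

```python
J = {1, 2, 3, 4, 5, 6, 7}
K = {"Cow", "Chicken"}

print(1 in J)                                 # 1 ∈ J
print(8 not in J)                             # 8 ∉ J
print(all(x in J for x in (1, 2, 3)))         # 1, 2, 3 ∈ J
print(all(x not in J for x in (8, 9, 10)))    # 8, 9, 10 ∉ J
print("Cow" in K)                             # Cow ∈ K
```

Each line prints True.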

Exercise 30. Fill in the blanks with either ∈ or ∉. (Answer on p. 1393)

(a) Los Angeles ___ The set of the four largest cities in the US.
(b) Tharman ___ The set of Singapore Prime Ministers (past and present).


4.3. The Order of the Elements Doesn’t Matter

Definition 10. Two sets A and B are equal if every element that is in A is also in B and
every element that is in B is also in A.

One implication of the above definition55 is that the order in which we write out the
elements of a set does not matter:

Example 44. Let A = {2, 4, 6} and B = {6, 2, 4}.

Then A = B because every element that is in A is also in B and every element that is in
B is also in A.

[Figure: The set A, drawn as a box containing 2, 4, and 6, and the set B, drawn as a box containing 6, 2, and 4.]

In fact, {2, 4, 6} = {2, 6, 4} = {4, 2, 6} = {4, 6, 2} = {6, 2, 4} = {6, 4, 2}.

Example 45. Let C = {Cow, Chicken} and D = {Chicken, Cow}.

Then C = D because every element that is in C is also in D and every element that is in
D is also in C.

[Figure: The set C and the set D, each drawn as a box containing a cow and a chicken.]
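Definition 10 translates almost word for word into Python. In the sketch below, the function name sets_equal is our own, and the elements are stored in lists so that the order in which they were written down is preserved.

```python
def sets_equal(A, B):
    """Definition 10: every element of A is in B, and vice versa."""
    return all(x in B for x in A) and all(x in A for x in B)

A = [2, 4, 6]
B = [6, 2, 4]

print(sets_equal(A, B))                                       # True
print(sets_equal(["Cow", "Chicken"], ["Chicken", "Cow"]))     # True
print(sets_equal([2, 4, 6], [2, 4]))                          # False
print({2, 4, 6} == {6, 2, 4})                                 # True
```

The last line shows that Python's built-in sets also ignore the order in which elements are written.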

Exercise 31. Is each of the following pairs of sets equal? (Answer on p. 1393.)

(a) {1, 2, 3} and {3, 2, 1}. (b) {{1} , 2, 3} and {{3} , 2, 1}.

Actually, in set theory, this is not a definition, but an axiom (known as the Axiom of Extensionality).
But here for simplicity, I’ll just call it a definition.
4.4. n(S) Is the Number of Elements in the Set S
Let S be a set. Then the number of elements in S is denoted by:

n (S).

Example 46. n ({2, 4, 6}) = 3.

Example 47. n ({Cow, Chicken}) = 2.

Exercise 32. Let X be the set of Singapore Prime Ministers (past and present). Then
what is n (X)? (Answer on p. 1393.)

Remark 5. Note that most writers denote the number of elements in the set S by ∣S∣.56
But for some reason, your A-Level syllabus (p. 16) instead uses the notation n (S), so
that’s what we’ll have to use too.

4.5. The Ellipsis “. . . ” Means Continue in the Obvious Fashion

Mathematical punctuation:
• The ellipsis “. . . ” means “continue in the obvious fashion”.

Example 48. L is the set of all odd positive integers smaller than 100. So in set notation,
we can write L = {1, 3, 5, 7, 9, 11, . . . , 99}.

Example 49. M is the set of all negative integers greater than −100. So in set notation,
we can write M = {−99, −98, −97, . . . , −2, −1}.

What is “obvious” to you may not be obvious to your reader. So only use the ellipsis when
you’re confident it will be obvious to your reader! And as I did with the sets above, never
be shy to write a few more of the set’s elements (doing so costs you nothing except maybe
a few more seconds and some ink).
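A computer cannot guess what is “obvious”, so in code the pattern behind an ellipsis must be spelt out. Here is how the two sets above might be built in Python with range(start, stop, step), which starts at start and stops just before stop. The variable names simply mirror the examples.

```python
# L = {1, 3, 5, ..., 99}: start at 1, stop before 100, go up in steps of 2.
L = set(range(1, 100, 2))

# M = {-99, -98, ..., -1}: start at -99, stop before 0.
M = set(range(-99, 0))

print(sorted(L)[:6], max(L))    # [1, 3, 5, 7, 9, 11] 99
print(sorted(M)[:3], max(M))    # [-99, -98, -97] -1
```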

Exercise 33. In the above examples, what are n (L) and n (M )? (Answer on p. 1393.)
Exercise 34. Let N be the set of even integers greater than 100 but smaller than 1,000.
Write down N in set notation. (Answer on p. 1393.)

Or cardA. See ISO 80000-2:2009, Item No. 2-5.5.
4.6. Repeated Elements Don’t Count
Another implication of Definition 10 is that repeated elements don’t count (they’re
simply ignored):

Example 50. Let A = {2, 4, 6} and B = {2, 2, 4, 6}.

Then A = B because every element that is in A is also in B and every element that is in
B is also in A.

[Figure: The set A, drawn as a box containing 2, 4, and 6, and the set B, drawn as a box containing 2, 2, 4, and 6.]

In fact, {2, 4, 6} = {2, 2, 4, 6} = {4, 2, 6, 2, 2, 6, 4, 2}.

Moreover, n ({2, 4, 6}) = n ({2, 2, 4, 6}) = n ({4, 2, 6, 2, 2, 6, 4, 2}).

Example 51. {Cow, Chicken} = {Cow, Cow, Chicken} = {Chicken, Cow, Chicken}.

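[Figure: Three boxes joined by equals signs, containing a cow and a chicken; two cows and a chicken; and a chicken, a cow, and a chicken.]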

n ({Cow, Chicken}) = n ({Cow, Cow, Chicken}) = n ({Chicken, Cow, Chicken}) = 2.
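Python's built-in sets follow the same rule, with len playing the role of n. A brief sketch using the sets from the examples above:

```python
A = {2, 4, 6}
B = {2, 2, 4, 6}
C = {4, 2, 6, 2, 2, 6, 4, 2}

print(A == B == C)                         # True
print(len(A), len(B), len(C))              # 3 3 3
print(len({"Cow", "Cow", "Chicken"}))      # 2
```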

Exercise 35. Given below is the set W . Find n (W ). (Answer on p. 1393.)

W = {Apple, Apple, Apple, Banana, Banana, Apple}.

Exercise 36. C is the set of even prime numbers. Find n(C). (Answer on p. 1393.)


4.7. R Is the Set of Real Numbers
So far, we’ve encountered only finite sets, i.e. sets with finitely many elements.
In this and the next two subchapters, we introduce several infinite sets, i.e. sets with
infinitely many elements.
First, we have the set of real numbers (or simply reals):

Definition 11. The set of real numbers is denoted R.

Example 52. 16, −1.87, π ≈ 3.14159, and √2 ≈ 1.41421 are all real numbers.

Now, by the way, what exactly is a real number? This sounds like a “dumb” question, but
is actually a profound one that was satisfactorily resolved only in the late 19th century.
Indeed, this question is a little beyond the scope of the A Levels.
And so for the A Levels, we’ll simply pretend — as we did in secondary school — that
“everyone knows” what real numbers are (even though, as the quotes below suggest, they
actually don’t). We shall not attempt to define or construct the real numbers.

Like most students entering college, mathematicians of the midnineteenth

century thought they understood real numbers. In fact, the real number line
turned out to be much subtler and more complicated than they imagined.

— David M. Bressoud (2008, p. 51).

Few mathematical structures have undergone as many revisions or have been

presented in as many guises as the real numbers. Every generation re-
examines the reals in the light of its values and mathematical objectives.

— Faltin, Metropolis, Ross, and Rota (1975).


4.8. Z Is the Set of Integers
Die ganzen Zahlen hat der liebe Gott gemacht, alles andere ist Menschenwerk.
God made the integers, the rest is the work of man.

— Leopold Kronecker (1886).57

Next, Z is the set of integers:

Definition 12. The set of integers, denoted Z, is:

Z = {. . . , −3, −2, −1, 0, 1, 2, 3, . . . }.

Z is for Zahl, German for number.

Example 53. 16 is an integer, while −1.87, π, and √2 are non-integers.

Note that with an infinite set, we cannot explicitly list out all its elements. And so, when
writing out an infinite set, we’ll sometimes find it helpful to use the ellipsis.
When writing out Z above, we used two ellipses. The first ellipsis says we continue
“leftwards” in the “obvious” fashion, with −4, −5, −6, etc. The second says we continue
“rightwards” in the “obvious” fashion, with 4, 5, 6, etc.

Exercise 37. H is the set of all prime numbers. With the aid of an ellipsis, write down
H in set notation. (Answer on p. 1393.)

It’s not on your syllabus, but a natural number is simply any positive integer:58

Definition 13. The set of natural numbers, denoted N, is:

N = {1, 2, 3, . . . }.

According to Heinrich Weber (in his 1893 obituary for Kronecker), Kronecker made this remark at an
1886 lecture to the Berliner Naturforscher-Versammlung.
There’s actually a little bit of a debate as to whether N should include 0.
4.9. Q Is the Set of Rational Numbers

Definition 14. A rational number (or simply rational) is any real number that can be
expressed as the ratio of two integers.
Any other real number is called an irrational number (or simply irrational).

Definition 15. The set of rational numbers is denoted Q.

Q is for quotient.59

Example 54. 16 ∈ Q because we can express 16 as the ratio of two integers (e.g. 16/1).
−1.87 ∈ Q because we can express −1.87 as the ratio of two integers (e.g. −187/100).
Example 55. √2, π ∉ Q. In words: “√2 and π are not elements of the set of rational
numbers.” Or more simply: “√2 and π are irrational.”

(Note though that this is far from obvious. It takes a little work to prove that √2 is
irrational and even more work to prove that π is irrational.)

As you probably already know from secondary school, any number whose decimal
representation (eventually) recurs is rational. All other numbers are irrational.60

Example 56. 1/3 = 0.33333 ⋅ ⋅ ⋅ = 0.3̅ is rational and sure enough, it has the recurring digit
3. We will use the overbar to denote recurring digit(s).
1/7 = 0.142857142857142857 ⋅ ⋅ ⋅ = 0.1̅4̅2̅8̅5̅7̅ is rational and sure enough, it has the recurring
digits 142857.
Similarly, 16 = 16.000 ⋅ ⋅ ⋅ = 16.0̅ and 1.87 = 1.87000 ⋅ ⋅ ⋅ = 1.870̅ are rational and have the
recurring digit 0. (Of course, when the recurring digit is 0, we usually don’t bother
writing it.)

Example 57. √2 ≈ 1.4142135623 . . . and π ≈ 3.1415926535 . . . are irrational. And sure
enough, their digits never recur.
(But again, this is far from obvious and takes some work to prove.)
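To see why the digits of a ratio of two integers must eventually recur, it helps to carry out the long division step by step. The Python sketch below does this and keeps track of the remainders. The function name recurring_digits is our own, and the sketch handles only fractions between 0 and 1.

```python
from fractions import Fraction

def recurring_digits(numerator, denominator):
    """Long division of numerator/denominator, for 0 <= numerator < denominator.
    Returns (non-recurring digits, recurring digits) after the decimal point."""
    digits = []
    seen = {}                  # remainder -> position where it first appeared
    remainder = numerator
    while remainder not in seen:
        seen[remainder] = len(digits)
        remainder *= 10
        digits.append(str(remainder // denominator))
        remainder %= denominator
    start = seen[remainder]    # the digits recur from this position onwards
    return "".join(digits[:start]), "".join(digits[start:])

print(recurring_digits(1, 3))       # ('', '3')
print(recurring_digits(1, 7))       # ('', '142857')
print(recurring_digits(87, 100))    # ('87', '0')

# Python's Fraction type confirms that -1.87 is the ratio -187/100.
print(Fraction("-1.87") == Fraction(-187, 100))    # True
```

There are only finitely many possible remainders (at most as many as the denominator). So sooner or later a remainder must repeat, and from that point on the digits repeat too.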

Also quoziente in Italian, Quotient in German, and quotient in French.
We prove this in Fact 188 (p. 1254) of the Appendices.
4.10. A Taxonomy of Numbers
Below is a taxonomy of the types of numbers you’ll encounter in this textbook. We’ll study
complex and imaginary numbers only later on in Part IV (quick preview: i = √(−1) is
an imaginary number; indeed, i is the imaginary unit).
Real numbers are either rational or irrational. In turn, rational numbers are either integers
or non-integers.

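[Figure: A taxonomy of numbers. The reals R sit within the complex numbers C. The reals R split into the rationals Q and the irrationals; the rationals Q split into the integers Z and the non-integers.]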

Three miscellaneous remarks:

1. R, Z, and Q are infinite sets (i.e. they contain infinitely many elements).
We may thus write: n (R) = n (Z) = n (Q) = ∞.
2. When handwritten, R is an R with an extra vertical line on the left; Z is a Z with an
extra diagonal line; and Q is a Q with an extra vertical line:

3. Infinity (∞) and negative infinity (−∞) are NOT numbers.

As with real numbers, we shall not attempt to formally define what ∞ or −∞ is. Instead,
we shall simply and informally say that ∞ is the “thing” that is greater than every real
number; while −∞ is the “thing” that is smaller than every real number.
In case that wasn’t clear, let me repeat:

Infinity (∞) and negative infinity (−∞) are NOT numbers.

Remark 6. This textbook uses blackboard bold font and writes ℝ, ℤ, and ℚ. Note
though that some other writers instead use bold font and write 𝐑, 𝐙, and 𝐐. Your
A-Level syllabus and exams use only the former, so that’s what we’ll do too.

Actually, the truth is somewhat more complicated. For example, some writers call ∞ and −∞ extended
real numbers. But in this textbook, I’ll keep it simple and insist that infinity is not a number.
4.11. More Notation: + , − , and 0

To create a new set that contains only the positive elements of the old set, append a
superscript plus sign (+ ) to the name of a set:
1. Z+ = {1, 2, 3, . . . } is the set of all positive integers.
2. Q+ is the set of all positive rational numbers.
3. R+ is the set of all positive real numbers.
To create a new set that contains only the negative elements of the old set, append a
superscript minus sign (− ) to the name of a set:
1. Z− = {−1, −2, −3, . . . } is the set of all negative integers.
2. Q− is the set of all negative rational numbers.
3. R− is the set of all negative real numbers.
(As we’ll learn later, there is no such thing as a positive or negative complex number.
Hence, there are no sets denoted C+ or C− .)
To add the number 0 to a set, append a subscript zero (0 ) to its name:
1. Z+0 = {0, 1, 2, 3, . . . } is the set of all non-negative integers. Z−0 = {0, −1, −2, −3, . . . } is the
set of all non-positive integers.
2. Q+0 is the set of all non-negative rational numbers. Q−0 is the set of all non-positive
rational numbers.
3. R+0 is the set of all non-negative real numbers. R−0 is the set of all non-positive real
numbers.

Exercise 38. Which of the sets introduced above are finite? (Answer on p. 1393.)

Remark 7. The three pieces of notation introduced on this page (+ , − , and 0 ) aren’t terribly
important or widely used. I give them a quick mention only because they’re listed on p.
16 of your A-Level syllabus.


4.12. The Empty Set ∅
A set can contain any number of elements. Indeed, it can even contain
zero elements.
The set that contains zero elements is called the empty set:

Definition 16. The empty set is the set {}. It is often also denoted ∅.

Informally, the empty set {} = ∅ is the “container” with nothing inside.
Hence the name.
Example 58. In 2016, the set of all Singapore Ministers who are younger than 30 is {}
or ∅. This means there is no Singapore Minister who is younger than 30.

Example 59. The set of all even prime numbers greater than 2 is {} or ∅. This means
there is no even prime number that is greater than 2.

Example 60. The set of numbers that are greater than 4 and smaller than 4 is {} or ∅.
This means there is no number that is simultaneously greater than 4 and smaller than 4.

Example 61. The set {∅} is not the same as the set ∅.
{∅} is a set containing a single element, namely the empty set.
Informally, {∅} is a box containing an empty box — it is not empty.
In contrast, ∅ is the empty set.
Informally, it is simply an empty box.
We can also rewrite the two sets as:

{∅} = {{}} and ∅ = {}.

Now it is perhaps clearer that {{}} ≠ {}.

Example 62. The set {∅, 3, {∅}} is the set containing exactly three elements, namely the
empty set, the number 3, and a set containing the empty set.
Take care to note that this set does not contain four elements.
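The "box inside a box" idea in Examples 61 and 62 can be tried out on a computer. Below is a minimal Python sketch (Python is not part of this textbook or the syllabus; the variable names are made up for illustration). It uses frozenset, because an ordinary Python set cannot be placed inside another set.

```python
# frozenset is an immutable set, so it may itself be an element of a set.
empty = frozenset()                  # the empty set {} = ∅
box_in_box = frozenset({empty})      # {∅}: its only element is the empty set

print(len(empty))                    # number of elements in ∅
print(len(box_in_box))               # number of elements in {∅}
print(box_in_box == empty)           # is {∅} the same set as ∅?

# The set {∅, 3, {∅}} from Example 62.
example_62 = frozenset({empty, 3, box_in_box})
print(len(example_62))               # number of elements in {∅, 3, {∅}}
```

Running this should report 0 elements for ∅, 1 element for {∅}, that the two sets are not equal, and 3 elements for the set in Example 62.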

Exercise 39. Tricky: Let S = {{{}} , ∅, {∅} , {}}. What is n (S)? (Answer on p. 1393.)


4.13. Intervals
An interval of real numbers may be written with the aid of parentheses () and brackets []:

Example 63. Let A = (0, 3). Then A is the set of real numbers that are > 0 and < 3. So,
√2 ≈ 1.41 ∈ A, but 0, 3 ∉ A.

[Figure: the interval A on the real number line.]

Example 64. Let B = [0, 3]. Then B is the set of real numbers that are ≥ 0 and ≤ 3. So,
0, √2, 3 ∈ B.

[Figure: the interval B on the real number line.]

Example 65. Let C = (0, 3]. Then C is the set of real numbers that are > 0 and ≤ 3. So,
√2, 3 ∈ C, but 0 ∉ C.

[Figure: the interval C on the real number line.]

Example 66. Let D = [0, 3). Then D is the set of real numbers that are ≥ 0 and < 3.
So, 0, √2 ∈ D, but 3 ∉ D.

[Figure: the interval D on the real number line.]


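Definition 17. Let a and b be real numbers with b ≥ a. Then:
(a) (a, b) denotes the set of real numbers that are > a and < b.
(b) [a, b] denotes the set of real numbers that are ≥ a and ≤ b.
(c) (a, b] denotes the set of real numbers that are > a and ≤ b.
(d) [a, b) denotes the set of real numbers that are ≥ a and < b.
(e) (a, ∞) denotes the set of real numbers that are > a.
(f) [a, ∞) denotes the set of real numbers that are ≥ a.
(g) (−∞, a) denotes the set of real numbers that are < a.
(h) (−∞, a] denotes the set of real numbers that are ≤ a.
(i) (−∞, ∞) denotes the set of all real numbers. That is, (−∞, ∞) = R.
Moreover, each of the above nine sets is called an interval. We call (a, b) an open interval,
[a, b] a closed interval, (a, b] a left half-open interval, and [a, b) a right half-open interval.
We also call a the left endpoint and b the right endpoint.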
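To see how the parentheses and brackets decide which endpoints belong to an interval, here is a small Python sketch. The class Interval and its field names are hypothetical helpers invented for this illustration; they are not standard notation or a standard library feature.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """An interval of reals; each flag says whether that endpoint is included."""
    left: float
    right: float
    left_closed: bool
    right_closed: bool

    def __contains__(self, x: float) -> bool:
        # A bracket means "≥" or "≤"; a parenthesis means ">" or "<".
        above = x >= self.left if self.left_closed else x > self.left
        below = x <= self.right if self.right_closed else x < self.right
        return above and below


A = Interval(0, 3, False, False)                 # (0, 3)
B = Interval(0, 3, True, True)                   # [0, 3]
C = Interval(0, 3, False, True)                  # (0, 3]
D = Interval(0, 3, True, False)                  # [0, 3)
positives = Interval(0, math.inf, False, False)  # (0, ∞)

for name, interval in [("A", A), ("B", B), ("C", C), ("D", D)]:
    print(name, 0 in interval, math.sqrt(2) in interval, 3 in interval)
```

Each printed row shows whether 0, √2, and 3 belong to the interval, which can be compared against Examples 63 to 66.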

Exercise 40. Let X = [1, 1] , Y = (1, 1) , Z = (1, 1.01). Find n (X), n (Y ), n (Z). Express
the set X in another way and the set Y in another two ways. (Answer on p. 1393.)
Exercise 41. Express R, R+ , R+0 , R− , and R−0 in interval notation. (Answer on p. 1394.)

Remark 8. Some writers (usually French-speakers) use reverse bracket notation; that
is, they write ]a, b[, ]a, b], or [a, b[ instead of (a, b), (a, b], or [a, b). We will not do
so in this textbook.

One good argument in favour of reverse bracket notation is that it avoids confusing the open interval
(a, b) with the ordered pair (a, b) (we’ll learn more about ordered pairs in Ch. 6). However, by
and large, the reverse bracket notation remains uncommon, except in continental Europe and especially
France (where it was introduced by the Bourbaki group).
4.14. Subset Of ⊆

Definition 18. If every element of A is an element of B, we say that A is a subset of B

and write:

A ⊆ B.

Otherwise, we say that A is not a subset of B and write A ⊈ B.

Example 67. Let M = {1, 2}, N = {1, 2, 3}, O = {1, 2, 4, 5}, and P = {3, 2, 1}. Then:

• M is a subset of N , O, and P .
We write M ⊆ N , M ⊆ O, and M ⊆ P .
• N is a subset of P , but not of M or O.
We write N ⊆ P , but N ⊈ M and N ⊈ O.
• O is not a subset of M , N , or P .
We write O ⊈ M , O ⊈ N , and O ⊈ P .
• P is a subset of N , but not of M or O.
We write P ⊆ N , but P ⊈ M and P ⊈ O.

Note that N is a subset of P and P is a subset of N . Indeed, the sets N and P are equal.
We have the following fact:

Fact 8. Two sets are equal ⇐⇒ They are subsets of each other.

Proof. In the following chain of reasoning, the first ⇐⇒ simply uses our definition of when
two sets are equal (Definition 10). The second ⇐⇒ simply uses the above definition.

“Two sets A and B are equal.”

⇐⇒ “Every element in A is also in B and every element in B is also in A.”
⇐⇒ “A is a subset of B and B is a subset of A.”
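Definition 18 and Fact 8 can be checked mechanically on the sets of Example 67. The following Python sketch is only an illustration: is_subset is a made-up helper that follows the wording of the definition, and Python's built-in <= operator on sets is shown alongside it.

```python
M, N, O, P = {1, 2}, {1, 2, 3}, {1, 2, 4, 5}, {3, 2, 1}


def is_subset(a, b):
    """Definition 18: every element of a is also an element of b."""
    return all(x in b for x in a)


# M against N, O, and P.
print(is_subset(M, N), is_subset(M, O), is_subset(M, P))

# N against M and O.
print(is_subset(N, M), is_subset(N, O))

# Python's built-in <= on sets performs the same check.
print((M <= N) == is_subset(M, N))

# Fact 8: N and P are subsets of each other, and N and P are equal.
print(is_subset(N, P) and is_subset(P, N), N == P)
```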

Exercise 42. Are Z, Q, and R subsets of each other? (Answer on p. 1394.)

Exercise 43. True or false: “The set of current Singapore Prime Minister(s) is a subset
of the set of current Singapore Minister(s).” (Answer on p. 1394.)
Exercise 44. Let A and B be sets. Explain whether each of the following statements is
true. (If false, give a counterexample.) (Answer on p. 1394.)

(a) A ⊆ B ⟹ A = B. (d) A = B ⟹ A ⊆ B.
(b) B ⊆ A ⟹ A = B. (e) A = B ⇐⇒ A ⊆ B.
(c) A = B ⟹ A ⊆ B. (f) A = B ⇐⇒ B ⊆ A.


4.15. Proper Subset Of ⊂

Definition 19. If A ⊆ B but A ≠ B, we say that A is a proper subset of B and write:

A ⊂ B.

Otherwise, we say that A is not a proper subset of B and write A ⊄ B.

Example 68. Let M = {1, 2}, N = {1, 2, 3}, O = {1, 2, 4, 5}, and P = {3, 2, 1}. Then:

• M is a proper subset of N , O, and P .
We write M ⊂ N , M ⊂ O, and M ⊂ P .
• N is not a proper subset of M , O, or P .
We write N ⊄ M , N ⊄ O, and N ⊄ P .
• O is not a proper subset of M , N , or P .
We write O ⊄ M , O ⊄ N , and O ⊄ P .
• P is not a proper subset of M , N , or O.
We write P ⊄ M , P ⊄ N , and P ⊄ O.

Note that N = P and so by the above Definition, N is not a proper subset of P and P
is not a proper subset of N .
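The difference between "subset" and "proper subset" shows up clearly for the equal sets N and P. Here is a short Python sketch under the same assumptions as before (is_proper_subset is a hypothetical helper written to mirror Definition 19; Python's built-in < on sets does the same job).

```python
M, N, O, P = {1, 2}, {1, 2, 3}, {1, 2, 4, 5}, {3, 2, 1}


def is_proper_subset(a, b):
    """Definition 19: a is a subset of b, but a is not equal to b."""
    return a <= b and a != b


print(is_proper_subset(M, N))   # is M a proper subset of N?
print(N <= P)                   # is N a subset of P?
print(is_proper_subset(N, P))   # is N a proper subset of P?

# Python's built-in < on sets is the proper-subset test.
print((M < N) == is_proper_subset(M, N))
```

Since N = P, the second line should print True while the third prints False.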

Exercise 45. Let S be the set of all squares and R be the set of all rectangles. Is S ⊂ R?
(Answer on p. 1394.)

Exercise 46. Does A ⊆ B imply that A ⊂ B? (Answer on p. 1394.)

Exercise 47. Does A ⊂ B imply that A ⊆ B? (Answer on p. 1394.)

Exercise 48. True or false statement: “If A is a subset of B, then A is either a proper
subset of or is equal to B.” (Answer on p. 1394.)

Remark 9. The A-Level syllabus (p. 16) uses the symbol ⊆ to mean “subset of” and ⊂ to
mean “proper subset of”. So this is what we’ll use in this textbook.
However, confusingly enough, some writers use the symbol ⊂ to mean “subset of” and ⊊
to mean “proper subset of”. We will not follow such practice in this textbook. This is
just FYI, in case you get confused when reading other mathematical texts!


4.16. Union ∪

Definition 20. The union of A and B is the set of elements that are in A OR B and is
denoted A ∪ B.

[Figure: Venn diagram of the set A and the set B. A ∪ B is the yellow region.]

Tip: “U” for Union.

Example 69. Let T = {1, 2}, U = {3, 4}, and V = {1, 2, 3}.
Then T ∪ U = {1, 2, 3, 4}, T ∪ V = {1, 2, 3}, and U ∪ V = {1, 2, 3, 4}. And T ∪ U ∪ V =
{1, 2, 3, 4}.

Exercise 49. Rewrite each set more simply:

(a) [1, 2] ∪ [2, 3].

(b) (−∞, −3) ∪ [−16, 7).
(c) {0} ∪ Z+ . (Answer on p. 1394.)

Exercise 50. Let S be the set of squares and R be the set of rectangles. What is S ∪ R?
(Answer on p. 1394.)
Exercise 51. What is the union of the set of rationals and the set of irrationals? (Answer
on p. 1394.)


4.17. Intersection ∩
[Figure: Venn diagram of the set A and the set B. A ∩ B is the yellow region.]

Definition 21. The intersection of A and B, denoted A ∩ B, is the set of elements that
are in A AND B. Two sets intersect if their intersection contains at least one element.

Equivalently, two sets A and B intersect if their intersection is non-empty, i.e.

A ∩ B ≠ ∅.

Definition 22. We say that two sets are mutually exclusive or disjoint if they do not
intersect, i.e. their intersection is empty:

A ∩ B = ∅.

Example 70. Let T = {1, 2}, U = {3, 4}, and V = {1, 2, 3}.
Then T ∩ U = ∅, T ∩ V = {1, 2}, U ∩ V = {3}, and T ∩ U ∩ V = ∅.

Exercise 52. Rewrite each of the following sets more simply:

(a) (4, 7] ∩ (6, 9).
(b) [1, 2] ∩ [5, 6].
(c) (−∞, −3) ∩ [−16, 7). (Answer on p. 1394.)

Exercise 53. Let S be the set of squares and R be the set of rectangles. What is S ∩ R?
(Answer on p. 1394.)

Exercise 54. What is the intersection of the set of rationals and the set of irrationals?
(Answer on p. 1394.)


4.18. Set Minus ∖
The set minus (sometimes also called set difference) sign ∖ is very convenient. Sadly, it
does not appear in the A-Level syllabus. Nonetheless, it’s worth a quick mention and I’ll
sometimes use it in this textbook. (If you do use it on the exams, you should make a quick
note to the marker that you’re using the set minus notation.)

Definition 23. A set minus B is the set of elements that are in A AND not in B and is
denoted A ∖ B.

[Figure: Venn diagram of the set A and the set B. A ∖ B is the yellow region.]

Example 71. Let T = {1, 2}, U = {3, 4}, and V = {1, 2, 3}.
Then T ∖ U = T , T ∖ V = ∅, and U ∖ V = {4}.
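Examples 69, 70, and 71 all use the same three sets T, U, and V, so the union, intersection, and set minus can be compared side by side. The Python sketch below is an illustration only; it relies on Python's built-in set operators |, &, and -, and on the set method isdisjoint.

```python
T, U, V = {1, 2}, {3, 4}, {1, 2, 3}

# Union (Example 69): elements that are in one set OR the other.
print(T | U, T | V, U | V, T | U | V)

# Intersection (Example 70): elements that are in one set AND the other.
print(T & U, T & V, U & V, T & U & V)

# Two sets are disjoint (mutually exclusive) if their intersection is empty.
print(T.isdisjoint(U), T.isdisjoint(V))

# Set minus (Example 71): elements of the first set that are not in the second.
print(T - U, T - V, U - V)
```

Note that Python prints the empty set as set() rather than {} or ∅.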

Exercise 55. Continue with the above example. Write down V ∖ T and V ∖ U . (Answer
on p. 1394.)

Here are some examples to illustrate why the set minus notation is sometimes very
convenient and allows us to avoid writing ugly monstrosities:

Example 72. Without the set minus sign, we’d write (−∞, 1) ∪ (1, ∞) to denote the set
of all real numbers except 1.
With it, we can write the same set more simply as R ∖ {1}.

Example 73. Without the set minus sign, we’d write ⋅ ⋅ ⋅ ∪ (−3, −2) ∪ (−2, −1) ∪ (−1, 0) ∪
(0, 1) ∪ (1, 2) ∪ (2, 3) ∪ ⋅ ⋅ ⋅ to denote the set of all real numbers that aren’t integers.
With it, we can write the same set more simply as R ∖ Z.


4.19. The Universal Set E
The universal set E (that’s a squiggly E) is the set of “all” elements. Note though that
what we mean by “all” elements depends on the context:

Example 74. In the context of a roll of a die, the universal set might be the set of all
possible outcomes:

E = {1, 2, 3, 4, 5, 6} .

Example 75. In the context of a spin of a European-style roulette wheel, the universal
set might be the set of all possible outcomes:

E = {0, 1, 2, 3, . . . , 36} .

Note that in American-style roulette, there is a 38th possible outcome — double zero 00.
And so in the American context, the universal set might instead be:

E = {00, 0, 1, 2, 3, . . . , 36} .

(It looks like Marina Bay Sands does American-style roulette.)

Example 76. In the context of a game of chess, the universal set might be the set of all
possible outcomes for White:

E = {Win, Lose, Draw} .

Remark 10. I give the universal set notation E a quick mention here only because it
appears on your A-Level syllabus (p. 16). We will rarely (if ever) make use of this piece
of notation in this textbook.


4.20. The Set Complement A′

Definition 24. A′ , the set complement of A, is the set of all elements that are not in A.

Equivalently: A′ = E ∖ A. That is, A′ is the universal set E minus those elements in A.

[Figure: Venn diagram of the set A. A′ is the yellow region.]

Note that as we saw in the last subchapter, what the universal set E is (i.e. what we
mean by “all” elements) depends on the context:

Example 77. Let A = {2, 3}. If the relevant context is the roll of a die, then:

E = {1, 2, 3, 4, 5, 6} and A′ = {1, 2, 3, 4, 5, 6} ∖ A = {1, 4, 5, 6}.

But if instead the relevant context is a roulette wheel spin, then:

E = {0, 1, 2, . . . , 36} and A′ = {0, 1, 2, . . . , 36} ∖ A = {0, 1, 4, 5, 6, . . . , 36}.

And if instead the relevant context is the positive integers smaller than 10, then

E = {1, 2, . . . , 9} and A′ = {1, 2, . . . , 9} ∖ A = {1, 4, 5, 6, 7, 8, 9}.

Example 78. Let B = {2, 4, 6, . . . }. If the relevant context is the positive integers, then:

E = Z+ and B ′ = Z+ ∖ {2, 4, 6, 8, . . . } = {1, 3, 5, 7, . . . }.

That is, B ′ is simply the set of positive odd numbers.

But if instead the relevant context is the set of all integers, then:

E =Z and B ′ = Z ∖ {2, 4, 6, 8, . . . } = Z−0 ∪ {1, 3, 5, 7, . . . }.

That is, B ′ is the set of positive odd numbers and all non-positive integers.

Example 79. Let C = R+ . If the relevant context is all real numbers, then:

E =R and C ′ = R ∖ C = R−0 .

That is, C ′ is simply the set of non-positive reals.

But if instead the relevant context is the set of reals greater than or equal to −1, then:

E = [−1, ∞) and C ′ = [−1, ∞) ∖ C = [−1, 0].

That is, C ′ is the set of reals between −1 and 0 (inclusive).
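Because A′ = E ∖ A, the same set A has different complements in different contexts. The Python sketch below repeats Example 77 for its three universal sets. The function name complement and the variable names are assumptions made for this illustration.

```python
def complement(a, universe):
    """Return A′ = E ∖ A, where `universe` plays the role of the universal set E."""
    if not a <= universe:
        raise ValueError("A must be a subset of the universal set")
    return universe - a


A = {2, 3}
die = set(range(1, 7))          # E = {1, 2, ..., 6}
roulette = set(range(0, 37))    # E = {0, 1, ..., 36}
below_ten = set(range(1, 10))   # E = {1, 2, ..., 9}

print(sorted(complement(A, die)))
print(sorted(complement(A, roulette)))
print(sorted(complement(A, below_ten)))
```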

Remark 11. Just so you know (JSYK), some writers write Aᶜ or Ā instead of A′ .
4.21. De Morgan’s Laws
It turns out there’s a deep connection between logic and set theory. In particular:
• The intersection ∩ corresponds to the logical connective AND (the conjunction).
• The union ∪ corresponds to the logical connective OR (the disjunction).
Earlier in logic, we had De Morgan’s Laws:
• Fact 1: The negation of the conjunction P AND Q is NOT-P OR NOT-Q.
• Fact 2: The negation of the disjunction P OR Q is NOT-P AND NOT-Q.
We now have the following De Morgan’s Laws for set theory:

Fact 9. (P ∩ Q) ′ = P ′ ∪ Q′ .

Proof. See p. 1256 in the Appendices.

[Figures: two Venn diagrams of the set P and the set Q. In the first, (P ∩ Q) ′ = P ′ ∪ Q′ is in yellow. In the second, (P ∪ Q) ′ = P ′ ∩ Q′ is in yellow.]

Fact 10. (P ∪ Q) ′ = P ′ ∩ Q′ .

Proof. See p. 1256 in the Appendices.
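Facts 9 and 10 can also be spot-checked by brute force on a small universal set. The Python sketch below assumes the universal set E = {1, 2, 3, 4} (chosen arbitrarily for illustration) and tests both laws for every pair of subsets P and Q. This is a sanity check on one small example, not a proof.

```python
from itertools import combinations

universe = frozenset(range(1, 5))   # assumed universal set E = {1, 2, 3, 4}


def subsets(s):
    """Return all subsets of s as a list of frozensets."""
    items = list(s)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def complement(a):
    """Return A′ = E ∖ A."""
    return universe - a


all_subsets = subsets(universe)

# Fact 9: (P ∩ Q)′ = P′ ∪ Q′
fact_9_holds = all(complement(p & q) == complement(p) | complement(q)
                   for p in all_subsets for q in all_subsets)

# Fact 10: (P ∪ Q)′ = P′ ∩ Q′
fact_10_holds = all(complement(p | q) == complement(p) & complement(q)
                    for p in all_subsets for q in all_subsets)

print(len(all_subsets), fact_9_holds, fact_10_holds)
```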


Example 80. Suppose that every animal can be classified (i) either as tall or short; and
(ii) either as fat or lean. So for example, the horse is tall and lean, the elephant is tall
and fat, the pig is short and fat, and the mouse is short and lean.
Let P be the set of tall animals and Q be the set of fat animals. Then:
• P ′ is the set of short animals.
• Q′ is the set of lean animals.
• P ′ ∪ Q′ is the set of animals that are short OR lean.
• P ∩ Q is the set of animals that are tall AND fat.
• (P ∩ Q) ′ = P ′ ∪ Q′ is the set of animals that are short OR lean (yellow region below).
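[Figure: Venn diagram of the set P and the set Q, showing the horse (in P only), the elephant (in both P and Q), and the pig (in Q only).]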
• P ′ ∩ Q′ is the set of animals that are short AND lean.
• P ∪ Q is the set of animals that are tall OR fat.
• (P ∪ Q) ′ = P ′ ∩ Q′ is the set of animals that are short AND lean (yellow region below).

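[Figure: Venn diagram of the set P and the set Q, showing the horse (in P only), the elephant (in both P and Q), and the pig (in Q only).]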
4.22. Set-Builder Notation (or Set Comprehension)
Previously, we simply wrote out a set using the method of set enumeration. That is to
say, we simply enumerated (i.e. listed out) all their elements:

Example 81. The set of Singapore PMs (both past and present) is:

S = {Lee Kuan Yew, Goh Chok Tong, Lee Hsien Loong} .

Where the set has too many elements to list out, we can use the ellipsis “. . . ”. But this too
counts as the method of set enumeration:

Example 82. The set of integers greater than 100 is:

T = {101, 102, 103, 104, . . . } .

We now introduce a second method of writing out a set, called set-builder notation or
set comprehension:

Example 83. The set of Singapore PMs (both past and present) is:

S = {x ∶ x has ever been the Singapore PM} .

In set-builder notation, the mathematical punctuation mark colon “∶” means such that.
Following the colon is the property or criterion that x must satisfy in order to be an
element of the set. Hence, S is “the set that contains all elements x such that x has ever
been the Singapore PM”.

Note that the letter or symbol x is simply a placeholder or dummy variable. We
could have replaced x with any other letter or indeed any other symbol and the set would
have been unchanged. For example, we could’ve replaced x with the letter y:

S = {y ∶ y has ever been the Singapore PM} .

Or even with a smiley face ☺:

S = {☺ ∶ ☺ has ever been the Singapore PM} .

Example 84. T = {x ∶ x ∈ Z, x > 100}. (Recall that the comma “,” means AND.)
T is “the set that contains all elements x such that x is an integer AND x > 100”. (This
time, following the colon are two properties or criteria that x must satisfy in order to be
an element of the set.)
We could also have written T as “the set that contains all integers x such that x > 100”:

T = {x ∈ Z ∶ x > 100}.


Remark 12. In maths (as in natural language), there are often many ways of saying the
same thing. Here we’ve written the set of Singapore PMs (past and present) in four
different ways and the set of integers greater than 100 in three different ways.
So, if there are many ways to say the same thing, then how should we say it? Well, in
maths (as in natural language), you should always strive to express yourself as clearly
and simply as possible. This will require using your judgment, experience, and wisdom.

Example 85. Using set enumeration, the set of ASEAN members is:

{Brunei, Cambodia, Indonesia, Laos, Malaysia, Myanmar,
the Philippines, Singapore, Thailand, Vietnam}.

Using set-builder notation or set comprehension, the same set is:

{x ∶ x is a member of ASEAN} .

This is “the set that contains all elements x such that x is a member of ASEAN”.

Example 86. Let Familee be the set of individuals who have ever been members of
Singapore’s Royal or First Family. Now consider:

A = {x ∶ x has ever been the Singapore PM, x ∉ Familee} .

A is “the set that contains all elements x such that x has ever been the Singapore PM
AND x has never been a member of the Familee”. And so:

A = {GCT} .

Example 87. Let B = {x ∶ x is a member of ASEAN, x has fewer than 5,000 islands}.
That is, B is “the set that contains all elements x such that x is a member of ASEAN
AND x has fewer than 5,000 islands”. Then:

B = {Brunei, Cambodia, Laos, Malaysia, Myanmar, Singapore, Thailand, Vietnam} .

Exercise 56. Rewrite the sets S and T using set enumeration. (Answer on p. 1394.)

S = {x ∶ x is a child of Lee Kuan Yew} ,

T = {x ∶ x is a child of Lee Kuan Yew, x has never been the Singapore PM} .

Exercise 57. What is {LKY, GCT, LHL} ∩ Familee? (Answer on p. 1394.)


We’ve used set-builder notation to describe sets of prime ministers or countries. But more
commonly, we’ll be using it to describe sets of numbers.

Example 88. Consider A = {x ∶ x² − 1 = 0}.

A is “the set that contains all elements x such that x² − 1 = 0”.

Hence: A = {−1, 1}.

Example 89. Consider B = {x ∶ x² − 1 = 0, x > 0}.

B is “the set that contains all elements x such that x² − 1 = 0 and x > 0”.

Hence: B = {1}.

We could also have written: B = {x ∈ R+ ∶ x² − 1 = 0}.

This says that B is “the set that contains all positive real numbers x such that x² − 1 = 0”.

Example 90. Consider C = {x ∈ Z ∶ 3.5 < x < 5.5}.

C is “the set that contains all integers x such that 3.5 < x < 5.5”.

Hence: C = {4, 5}.
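Set-builder notation has a close cousin in programming, usually called a set comprehension. The Python sketch below rebuilds the sets A, B, and C of Examples 88 to 90. One loud assumption: a computer cannot search through all real numbers, so the sketch only searches the integers from −10 to 10 (the name candidates is made up for this illustration). That happens to be enough here because every element of these three sets is an integer in that range.

```python
# Assumption: a finite search window of integers stands in for "all x".
candidates = range(-10, 11)

A = {x for x in candidates if x**2 - 1 == 0}            # {x : x² − 1 = 0}
B = {x for x in candidates if x**2 - 1 == 0 and x > 0}  # {x : x² − 1 = 0, x > 0}
C = {x for x in candidates if 3.5 < x < 5.5}            # {x ∈ Z : 3.5 < x < 5.5}

print(sorted(A), sorted(B), sorted(C))
```

As in the mathematical notation, the condition after "if" is the property that x must satisfy, and the "and" plays the role of the comma.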

Example 91. The set of positive reals may be written as:

R+ = (0, ∞) = {x ∈ R ∶ x > 0} .

This is “the set that contains all reals x such that x is greater than 0”.
Similarly, the set of non-negative reals may be written as:

R+0 = [0, ∞) = {x ∈ R ∶ x ≥ 0} .

This is “the set that contains all reals x such that x is greater than or equal to 0”.
Similarly, we have:

Q+ = {x ∈ Q ∶ x > 0}, Q+0 = {x ∈ Q ∶ x ≥ 0},

Z+ = {x ∈ Z ∶ x > 0}, Z+0 = {x ∈ Z ∶ x ≥ 0}.


Example 92. The set of positive even numbers may be written as:

{2, 4, 6, 8, . . . } = {x ∶ x = 2k, k ∈ Z+ } .

This is “the set that contains all elements x such that x equals 2k AND k is a positive
integer”. Notice that here we introduce a second placeholder or dummy variable k to
help us describe the set.
Again, both the letters or symbols x and k could’ve been replaced by any other letters
or symbols and we’d still have the same set. For example:

{2, 4, 6, 8, . . . } = {p ∶ p = 2q, q ∈ Z+ }
= {⋆ ∶ ⋆ = 2⧫, ⧫ ∈ Z+ }
= {☺ ∶ ☺ = 2☹, ☹ ∈ Z+ } .

Of course, it is customary and thus preferable to stick to letters like x, k, p, and q rather
than weird symbols like shapes and faces.
Another way to write down the set of positive even numbers:

{2, 4, 6, 8, . . . } = {x ∶ x/2 ∈ Z+ } .

This is “the set that contains all elements x such that x divided by 2 is a positive integer”.
Actually, there is a simpler way to write the above set without using two placeholder
variables. We can simply write:

{2, 4, 6, 8, . . . } = {2k ∶ k ∈ Z+ } .

In words, this is “the set of all elements 2k such that k is a positive integer”.
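The three descriptions of the positive even numbers in Example 92 can be compared in Python. Since the full set is infinite, this sketch assumes a finite stand-in: k runs over the first five positive integers only, and x over the integers from 1 to 10. The variable names are invented for this illustration.

```python
K = range(1, 6)     # stand-in for Z+: the first five positive integers
X = range(1, 11)    # candidate values of x

# {x : x = 2k, k ∈ Z+}, using two placeholder variables
evens_two_variables = {x for x in X for k in K if x == 2 * k}

# {x : x/2 ∈ Z+}, i.e. x divided by 2 is a positive integer
evens_quotient = {x for x in X if x % 2 == 0 and x // 2 >= 1}

# {2k : k ∈ Z+}, the simpler form with one placeholder variable
evens_direct = {2 * k for k in K}

print(sorted(evens_two_variables))
print(evens_two_variables == evens_quotient == evens_direct)
```

All three comprehensions should produce the same finite set, namely the first five positive even numbers.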

Exercise 58. Rewrite each set in set-builder notation (answer on p. 1394):

(a) R− . (b) Q− . (c) Z− . (d) R−0
(e) Q−0 . (f) Z−0 . (g) (a, b). (h) [a, b].
(i) (a, b]. (j) [a, b). (k) (−∞, −3) ∪ (5, ∞) .

(l) (−∞, 2] ∪ (e, π) ∪ (π, ∞). (m) (−∞, 3) ∩ (0, 7).
(n) The set of negative even numbers.
(o) The set of positive odd numbers.
(p) The set of negative odd numbers.
(q) {π, 4π, 7π, 10π, . . . }.
(r) {−3π, π, 4π, 7π, 10π, . . . }.

Remark 13. Following the A-Level syllabus (p. 16), in set-builder notation, we use the
colon “∶” to mean such that. Note though that some writers use the pipe “∣” instead.


4.23. Chapter Summary
• Informally, a set is a container or box whose objects we call its elements.

Whenever we talk about a set, we refer to both the container and the objects inside.
• ∈ means in and ∉ means not in.
• The ellipsis “. . . ” means continue in the obvious fashion.
• The order of the elements doesn’t matter and repeated elements don’t count:

{1, 2, 3} = {3, 3, 2, 1, 3, 2, 1, 1, 1, 1, 2, 3} .

• R, Z, and Q are the sets of reals, integers, and rationals.

• R+ , R− , R+0 , and R−0 are the sets of positive reals, negative reals, non-negative reals, and
non-positive reals. Similarly, we have:

Q+ , Q− , Q+0 , and Q−0 ; and Z+ , Z− , Z+0 , and Z−0 .

• The empty set {} = ∅ is the set with zero elements.

• Interval notation. Let a < b. Then:
1. (a, b) is the set of reals between a (excluded) and b (excluded).
2. [a, b] is the set of reals between a (included) and b (included).
3. (a, b] is the set of reals between a (excluded) and b (included).
4. [a, b) is the set of reals between a (included) and b (excluded).
• A is a subset of B — denoted A ⊆ B — if every element in A is also in B.
• A is a proper subset of B — denoted A ⊂ B — if A ⊆ B and A ≠ B.
• A ∪ B, the union of A and B, is the set of elements that are in A OR B.
• A ∩ B, the intersection of A and B, is the set of elements that are in A AND B.
• A ∖ B (or A − B), A set minus B, is the set of elements that are in A AND not in B.
• The universal set E is the set of “all” elements (where what we mean by “all” depends
on the context).
• The set complement A′ is the set of elements that are not in A.
• De Morgan’s Laws: (P ∩ Q) ′ = P ′ ∪ Q′ and (P ∪ Q) ′ = P ′ ∩ Q′ .
• Set-builder notation (or set comprehension). The set of objects satisfying the
property P is denoted {x ∶ P (x)}.

Exercise 59. Write each set more simply. (Answer on p. 1395.)

(a) R ∖ Z+ . (b) R ∖ (Q ∪ Z).
(c) [1, 6] ∖ ((3, 5) ∩ (1, 4)). (d) {1, 5, 9, 13, . . . } ∩ {2, 4, 6, 8, . . . }.
(e) {2, 5, 8, 11, . . . } ∩ {2, 4, 6, 8, . . . }. (f) (0, 5] ∩ ([1, 8] ∩ [5, 9)) ′ .

Not on your syllabus: N is the set of natural numbers.
5. O-Level Review

5.1. Some Mathematical Vocabulary

Example 93. Consider the equation 1 + 4 = 2 + 3.

The expression on the equation’s left-hand-side (LHS) is 1 + 4. The expression on
the right-hand-side (RHS) is 2 + 3.

LHS expression        RHS expression
    1 + 4       =         2 + 3.
    ↓   ↓                 ↓   ↓
  Term Term             Term Term

The expression on the LHS contains the terms 1 and 4. The expression on the RHS
contains the terms 2 and 3.
Informally, the difference between an equation and an expression is this:

An equation contains an equals sign; an expression does not.

Example 94. Consider the equation y = 5x + 6.

This equation contains two variables: x and y.
Variable Variable
↑ ↑
y = 5x + 6.
↓ ↓
Coefficient Constant

The coefficient on x is 5.
The constant term or more simply constant is 6. (The constant term is simply any
real number that does not involve any variables.)

Example 95. Consider the inequality 8x² + 4x + 3 > y − 2.

This inequality contains two variables: x and y.

• On the LHS: 8x² has variable x and coefficient 8; 4x has variable x and coefficient 4; 3 is a constant.
• On the RHS: y is a variable; −2 is a constant.

The coefficients on x² and x are 8 and 4. The coefficient on y is 1.

The constant terms on the LHS and RHS are 3 and −2.

When we study complex numbers in Part IV, constants will also include complex numbers.
5.2. The Absolute Value or Modulus Function
The absolute value (or modulus) function, denoted ∣⋅∣, is defined for all x ∈ R by:

       ⎧ x     for x ≥ 0,
∣x∣ = ⎨
       ⎩ −x    for x < 0.

Example 96. ∣5∣ = 5, ∣−5∣ = 5, and ∣0∣ = 0.

Observe that if x ≠ 0, then x/ ∣x∣ gives us the sign of x:

Example 97. 7/∣7∣ = 7/7 = 1 and −7/∣−7∣ = −7/7 = −1.

Fact 11. If x ≠ 0, then:

         ⎧ 1     if x > 0,
x/∣x∣ = ⎨
         ⎩ −1    if x < 0.
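The piecewise definition of ∣x∣ and Fact 11 translate almost word for word into code. The Python sketch below is an illustration only; absolute_value and sign are made-up names (Python already has a built-in abs function that does the first job).

```python
def absolute_value(x):
    """∣x∣ = x for x ≥ 0, and −x for x < 0."""
    return x if x >= 0 else -x


def sign(x):
    """x/∣x∣, which is 1 for x > 0 and −1 for x < 0 (undefined at x = 0)."""
    if x == 0:
        raise ValueError("x/|x| is undefined when x = 0")
    return x / absolute_value(x)


print(absolute_value(5), absolute_value(-5), absolute_value(0))   # Example 96
print(sign(7), sign(-7))                                          # Example 97
```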

5.3. The Factorial n!

Definition 25. Let n ∈ Z+0 . Then n-factorial, denoted n!, is defined by:

      ⎧ 1                    for n = 0,
n! = ⎨
      ⎩ 1 × 2 × ⋅ ⋅ ⋅ × n    for n > 0.

Or equivalently:

      ⎧ 1               for n = 0,
n! = ⎨
      ⎩ (n − 1)! × n    for n > 0.

And so: 0! = 1,
1! = 0! × 1 = 1,
2! = 1! × 2 = 1 × 2,
3! = 2! × 3 = 1 × 2 × 3,
4! = 3! × 4 = 1 × 2 × 3 × 4,
5! = 4! × 5 = 1 × 2 × 3 × 4 × 5,
6! = 5! × 6 = 1 × 2 × 3 × ⋅ ⋅ ⋅ × 6,
⋮ ⋮ ⋮
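The two equivalent definitions of n! correspond to two common ways of writing the same computation: a running product and a recursion. The Python sketch below shows both (the function names are invented for this illustration) and compares them with math.factorial from Python's standard library.

```python
import math


def factorial_product(n):
    """n! as the product 1 × 2 × ... × n, with 0! = 1."""
    if n < 0:
        raise ValueError("n! is defined only for non-negative integers")
    result = 1
    for k in range(1, n + 1):
        result *= k
    return result


def factorial_recursive(n):
    """n! via the recursive definition: 0! = 1 and n! = (n − 1)! × n."""
    if n < 0:
        raise ValueError("n! is defined only for non-negative integers")
    return 1 if n == 0 else factorial_recursive(n - 1) * n


for n in range(7):
    print(n, factorial_product(n), factorial_recursive(n), math.factorial(n))
```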

You may be wondering, “Why is 0! defined as 1?” It turns out this is the definition that
causes us the least overall inconvenience — we’ll appreciate this a little better when we
study combinatorics in Part VI.
A slightly more formal definition of this function is given as Definition 78, after we’ve learnt a little more
about functions.
This latter equivalent definition is an example of a recursive definition.
5.4. Exponents

Definition 26. Let b be a non-zero real number and x be an integer. Then b to the power
of x, denoted b^x, is defined as the following number:

       ⎧ 1                                         for x = 0,
b^x = ⎨ b ⋅ b ⋅ ⋯ ⋅ b  [x times]                  for x > 0,
       ⎩ (1/b) ⋅ (1/b) ⋅ ⋯ ⋅ (1/b)  [∣x∣ times]    for x < 0.

Given the expression b^x, we call b the base and x the exponent.
In the special case where the base is zero, i.e. b = 0, we define:

       ⎧ 0            for x > 0,
0^x = ⎨ 1            for x = 0,
       ⎩ Undefined    for x < 0.

Example 98. By the above Definition, we have:

2^1 = 2, 2^2 = 4, 2^3 = 8, 2^4 = 16, 2^20 = 1 048 576.

We also have:

2^(−1) = 1/2^1 = 1/2 = 0.5, 2^(−2) = 1/2^2 = 1/4 = 0.25, 2^(−3) = 1/2^3 = 1/8 = 0.125,

2^(−4) = 1/2^4 = 1/16 = 0.0625, 2^(−20) = 1/2^20 = 1/1 048 576 = 0.000 000 953 674 316 406 25.

By the above Definition, for any real number b, we have b^0 = 1.

Example 99. 2^0 = 1, (−π)^0 = 1, 1 000 000^0 = 1.


We now examine the special case where the base is zero, i.e. b = 0:


Example 100. By the above Definition:
• If x ∈ Z+ , then 0^x = 0.
So: 0^2 = 0 ⋅ 0 = 0, 0^7 = 0 ⋅ 0 ⋅ 0 ⋅ 0 ⋅ 0 ⋅ 0 ⋅ 0 = 0, 0^55 = 0 ⋅ 0 ⋅ ⋯ ⋅ 0 ⋅ 0 [55 times] = 0.
• If x ∈ Z− , then 0^x = 1/0^∣x∣ = 1/0 is undefined.
So, 0^(−1) and 0^(−50) are undefined.
• If x = 0, then 0^x = 1. That is, 0^0 = 1.

There’s actually nothing “obvious” or “natural” about defining 0^0 = 1. Similar to 0! = 1,
this is merely the definition that will cause us the least inconvenience.

One convenience this definition affords is this: For all b ∈ R, we simply have b^0 = 1.
But note that some writers (including ) argue that we should simply leave 0^0 undefined. But in my
judgment, 0^0 = 1 is probably the definition that will cause us the least inconvenience.
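Definition 26, including the special case of a zero base, can be followed step by step in code. The Python sketch below is an illustration only: integer_power is a made-up name, and it uses Fraction from Python's standard library so that results such as 2^(−20) are exact rather than rounded.

```python
from fractions import Fraction


def integer_power(b, x):
    """b to the power of the integer x, following Definition 26."""
    if b == 0:
        if x > 0:
            return 0
        if x == 0:
            return 1
        raise ZeroDivisionError("0^x is undefined for negative x")
    # Multiply by b (if x > 0) or by 1/b (if x < 0), exactly |x| times.
    factor = Fraction(b) if x > 0 else 1 / Fraction(b)
    result = Fraction(1)
    for _ in range(abs(x)):
        result *= factor
    return result


print(integer_power(2, 20), integer_power(2, -2), integer_power(2, -20))
print(integer_power(7, 0), integer_power(0, 0), integer_power(0, 55))
```

For what it is worth, Python's own power operator makes the same choices for integers: 0 ** 0 evaluates to 1, and 0 ** -1 raises a ZeroDivisionError.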
We next define b^(1/x), in the case where b is positive and x is a non-zero integer:

Definition 27. Let b > 0 and x be a non-zero integer. Then b to the power of 1/x, also
called the xth root of b, is defined to be the number a > 0 that satisfies:

a^x = b.

We write: a = b^(1/x) = ˣ√b.

Remark 14. ˣ√b and b^(1/x) are just two different ways to write exactly the same thing.

Example 101. By the above Definition, we have:

4^(1/2) = ²√4 = √4 = 2, (1/4)^(1/2) = 0.25^(1/2) = ²√(1/4) = ²√0.25 = √0.25 = 0.5 = 1/2,

8^(1/3) = ³√8 = 2, (1/8)^(1/3) = 0.125^(1/3) = ³√(1/8) = ³√0.125 = 0.5 = 1/2,

16^(1/4) = ⁴√16 = 2, (1/16)^(1/4) = 0.0625^(1/4) = ⁴√(1/16) = ⁴√0.0625 = 0.5 = 1/2.

We also have: 1 048 576^(1/20) = ²⁰√1 048 576 = 2.

And: (1/1 048 576)^(1/20) = 0.000 000 953 674 316 406 25^(1/20) = ²⁰√(1/1 048 576)
= ²⁰√0.000 000 953 674 316 406 25 = 0.5 = 1/2.

Remark 15. It is true that b^2 = 25 has two solutions, namely b = 5 and b = −5. However,
it is wrong to write any of the following:

25^(1/2) = ±5. ✗    25^(1/2) = −5. ✗    √25 = ±5. ✗    √25 = −5. ✗

By the above Definition, 25^(1/2) or √25 refers to the positive square root:

25^(1/2) = √25 = 5. ✓

To talk about the negative square root −5, you must (simply) stick a minus sign in front:

−25^(1/2) = −√25 = −5. ✓

In general, whenever √x is defined, we have √x ≥ 0.

Exercise 60. True or false: “If x ∈ R, then √(x^2) = x.” (Answer on p. 1396.)


Note the requirement in Definition 27 that b ∈ R+ . If b < 0, then the above definition does
not apply:

Example 102. For now, we leave (−1)^(1/2) or √−1 undefined, because there is no number
a ∈ R+ such that

a^2 = −1.

(Later on, when we study complex numbers in Part IV, we will define i = (−1)^(1/2) = √−1.)

Note also the requirement in Definition 27 that x ≠ 0. If x = 0, then ⁰√b or b^(1/0) is simply
undefined.

For certain special cases of b^(1/x), we have some special notation or terminology:

• If x = 1, we will not write a = ¹√b. Instead, we will simply write a = b.
• If x = 2, we will not write a = ²√b. Instead, we will simply write a = √b. We will also call
a the square root of b.
• If x = 3, we will call a = ³√b the cube root of b.

We next define b^x, in the case where b is positive and x is rational:

Definition 28. Let b > 0 and x ∈ Q. Suppose x = m/n for some integers m and n. Then
we define:

b^x = (b^(1/n))^m.

Altogether, we have: b^x = b^(m/n) = (b^(1/n))^m = (ⁿ√b)^m.

Example 103. By the above Definition, we have:

2^1.5 = 2^(3/2) = (√2)^3 = (1.414 2 . . . )^3 ≈ 2.828 4,

2^13.83 = 2^(1383/100) = (¹⁰⁰√2)^1383 = (1.006 9 . . . )^1383 ≈ 14 563,

2^2.333... = 2^(7/3) = (³√2)^7 = (1.259 9 . . . )^7 ≈ 5.039 7.
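Definition 28 says: first take the nth root, then raise the result to the power m. The Python sketch below follows that recipe using floating-point arithmetic, so its answers are approximations. The function name rational_power is an assumption made for this illustration, and the sketch uses Python's ** operator to compute the nth root as b ** (1 / n).

```python
import math


def rational_power(b, m, n):
    """Approximate b^(m/n) as (nth root of b)^m, for b > 0 and integers m, n with n ≠ 0."""
    if b <= 0:
        raise ValueError("this definition requires b > 0")
    if n == 0:
        raise ValueError("n must be non-zero")
    root = b ** (1 / n)     # the nth root of b, i.e. b^(1/n)
    return root ** m


print(round(rational_power(2, 3, 2), 4))        # 2^(3/2)
print(round(rational_power(2, 1383, 100)))      # 2^(1383/100)
print(round(rational_power(2, 7, 3), 4))        # 2^(7/3)
```

The three printed values can be compared with the approximations given in Example 103.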

You should find the following Laws of Exponents familiar:


Proposition 1. (Laws of Exponents.) Let a, b > 0 and x, y ∈ R. Then:
(a) b^x b^y = b^(x+y). (b) b^(−x) = 1/b^x. (c) b^x/b^y = b^(x−y). (d) (b^x)^y = b^(xy). (e) (ab)^x = a^x b^x.
b b

Remark 16. Note well the requirement that the bases a and b are positive.
If for example b = −1 < 0, then:

(b^2)^(1/2) = 1^(1/2) = 1, but b^1 = −1,

so that (b^2)^(1/2) ≠ b^1 and Law (d) fails.
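The failure of Law (d) for a negative base is easy to reproduce numerically. In the Python sketch below (an illustration only, with made-up function names), the left side computes (b^2)^(1/2) and the right side computes b^(2 × 1/2) = b^1. The two agree for a positive base such as b = 3, but not for b = −1.

```python
def law_d_left(b):
    """(b^2)^(1/2): square first, then take the positive square root."""
    return (b ** 2) ** 0.5


def law_d_right(b):
    """b^(2 × 1/2) = b^1."""
    return b ** (2 * 0.5)


for b in (3, -1):
    print(b, law_d_left(b), law_d_right(b), law_d_left(b) == law_d_right(b))
```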


Our proof below is only of the special and simple case where the exponents x and y are
positive integers. (The proof where they are instead negative integers is similar.)
You can nonetheless take for granted that the above laws hold for all real x, y.68

Proof. In this proof, $\overset{1}{=}$ denotes the use of Definition 26.

(a) $b^x b^y \overset{1}{=} \underbrace{b \cdot b \cdots b}_{x \text{ times}} \times \underbrace{b \cdot b \cdots b}_{y \text{ times}} = \underbrace{b \cdot b \cdots b}_{x+y \text{ times}} \overset{1}{=} b^{x+y}$.

(b) Since x > 0, we have −x < 0 and ∣−x∣ = x. Now: $b^{-x} \overset{1}{=} \frac{1}{b^{|-x|}} = \frac{1}{b^x}$.

(c) If x ≥ y so that x − y ≥ 0, then:

$b^{x-y} \overset{1}{=} \underbrace{b \cdot b \cdots b}_{x-y \text{ times}} = \underbrace{b \cdot b \cdots b}_{x-y \text{ times}} \times \frac{\overbrace{b \cdot b \cdots b}^{y \text{ times}}}{\underbrace{b \cdot b \cdots b}_{y \text{ times}}} = \frac{\overbrace{b \cdot b \cdots b}^{x \text{ times}}}{\underbrace{b \cdot b \cdots b}_{y \text{ times}}} \overset{1}{=} \frac{b^x}{b^y}$.

If instead x < y so that x − y < 0 and ∣x − y∣ = y − x, then:

$b^{x-y} \overset{1}{=} \frac{1}{\underbrace{b \cdot b \cdots b}_{|x-y| \text{ times}}} = \frac{1}{\underbrace{b \cdot b \cdots b}_{y-x \text{ times}}} = \frac{1}{\underbrace{b \cdot b \cdots b}_{y-x \text{ times}}} \times \frac{\overbrace{b \cdot b \cdots b}^{x \text{ times}}}{\underbrace{b \cdot b \cdots b}_{x \text{ times}}} = \frac{\overbrace{b \cdot b \cdots b}^{x \text{ times}}}{\underbrace{b \cdot b \cdots b}_{y \text{ times}}} \overset{1}{=} \frac{b^x}{b^y}$.

(d) $(b^x)^y \overset{1}{=} (\underbrace{b \cdot b \cdots b}_{x \text{ times}})^y \overset{1}{=} \overbrace{(\underbrace{b \cdot b \cdots b}_{x \text{ times}}) \times \cdots \times (\underbrace{b \cdot b \cdots b}_{x \text{ times}})}^{y \text{ times}} = \underbrace{b \cdot b \cdots b}_{xy \text{ times}} \overset{1}{=} b^{xy}$.

(e) $(ab)^x \overset{1}{=} \underbrace{(ab)(ab) \dots (ab)}_{x \text{ times}} = \underbrace{a \cdot a \cdots a}_{x \text{ times}} \times \underbrace{b \cdot b \cdots b}_{x \text{ times}} \overset{1}{=} a^x b^x$.

Even though we haven’t even defined $b^x$ in the case where x ∉ Q! See Ch. 121.17 for a further discussion.
Exercise 61. Simplify each expression. (Answer on p. 1396.)

(a) $\frac{5^{4x} \cdot 25^{1-x}}{5^{2x+1} + 3 \cdot 25^x + 17 \cdot 5^{2x}}$. (b) $\sqrt{2} \, \frac{8^{x+2} - 34 \cdot 2^{3x}}{\sqrt{8}^{\,2x+1}}$.
Exercise 62. Let b > 0 and x, y ∈ Z+. Explain whether each equation is generally true.

(a) $b^{(x^y)} = b^{xy}$. (b) $(b^x)^y = b^{xy}$. (Answer on p. 1396.)

5.5. Rationalising the Denominator with a Surd

The term surd usually refers to a square root $\sqrt{\cdot}$.

Example 104. $\frac{1}{\sqrt{2}} = \frac{\sqrt{2}}{\sqrt{2} \times \sqrt{2}} = \frac{\sqrt{2}}{2}$.

Recall that $(a + b)(a - b) = a^2 - b^2$.

Definition 29. The conjugate of a + b is a − b. We call a + b and a − b a conjugate pair.

Example 105. $\frac{1}{1 + \sqrt{2}} = \frac{1 - \sqrt{2}}{(1 + \sqrt{2})(1 - \sqrt{2})} = \frac{1 - \sqrt{2}}{1^2 - (\sqrt{2})^2} = \frac{1 - \sqrt{2}}{1 - 2} = \frac{1 - \sqrt{2}}{-1} = \sqrt{2} - 1$.

We take this opportunity to talk about the ± and ∓ notation:

Example 106. We have: $\frac{1}{2 + \sqrt{3}} = \frac{1}{2 + \sqrt{3}} \cdot \frac{2 - \sqrt{3}}{2 - \sqrt{3}} = \frac{2 - \sqrt{3}}{2^2 - 3} = 2 - \sqrt{3}$. ⋆

And: $\frac{1}{2 - \sqrt{3}} = \frac{1}{2 - \sqrt{3}} \cdot \frac{2 + \sqrt{3}}{2 + \sqrt{3}} = \frac{2 + \sqrt{3}}{2^2 - 3} = 2 + \sqrt{3}$. ☇

Using plus-minus ± and minus-plus ∓ notation, we can rewrite the above two lines as:

$\frac{1}{2 \pm \sqrt{3}} = \frac{1}{2 \pm \sqrt{3}} \cdot \frac{2 \mp \sqrt{3}}{2 \mp \sqrt{3}} = \frac{2 \mp \sqrt{3}}{2^2 - 3} = 2 \mp \sqrt{3}$.

The minus-plus notation ∓ indicates the “opposite” of ±. So in this example:
• If ± is +, then ∓ is − and we have ⋆. And
• If ± is −, then ∓ is + and we have ☇.

Remark 17. We only ever use the ∓ notation if we’re also already using the ± notation and have a need to indicate that the signs go the opposite way. So for example, we’d never write, “$x^2 = 17 \implies x = \mp\sqrt{17}$,” because we can simply write, “$x^2 = 17 \implies x = \pm\sqrt{17}$”.
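The conjugate trick of Examples 105 and 106 can be sketched in Python. The function name rationalise and its return convention are assumptions made for illustration: it rewrites $1/(p + \sqrt{q})$ in the form $u + v\sqrt{q}$ by multiplying the numerator and denominator by the conjugate $p - \sqrt{q}$.

```python
import math

def rationalise(p: float, q: float) -> tuple:
    """Rewrite 1 / (p + sqrt(q)) as u + v*sqrt(q) and return (u, v).

    Hypothetical helper: multiplies top and bottom by the conjugate
    p - sqrt(q), so the new denominator is p^2 - q.
    """
    denom = p * p - q  # (p + sqrt q)(p - sqrt q) = p^2 - q
    if denom == 0:
        raise ValueError("p^2 - q must be non-zero")
    return p / denom, -1 / denom

u, v = rationalise(2, 3)   # Example 106: 1/(2 + sqrt 3) = 2 - sqrt 3
print(u, v)                # 2.0 -1.0
u2, v2 = rationalise(1, 2) # Example 105: 1/(1 + sqrt 2) = -1 + sqrt 2
print(u2, v2)              # -1.0 1.0
```

In each case, the returned pair (u, v) matches the coefficients found by hand in the examples above.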

Exercise 63. Let x, y ∈ R, with y ≠ 0. Prove the following. (Answer on p. 1396.)

$\frac{1}{\frac{x}{y} \pm \sqrt{\frac{x^2}{y^2} + 1}} = -\frac{x}{y} \pm \sqrt{\frac{x^2}{y^2} + 1}$.

Exercise 64. Let a, b, c ∈ R with a ≠ 0 and $b^2 - 4ac > 0$. Prove the following.69

$\frac{-b \pm \sqrt{b^2 - 4ac}}{2a} = \frac{2c}{-b \mp \sqrt{b^2 - 4ac}}$. (Answer on p. 1397.)

The LHS is the quadratic formula in the more familiar form. The RHS is also the quadratic formula, but in an alternative and less familiar form.

5.6. Logarithms

Informally, logarithms are simply the inverse of exponents. A bit more formally:

Definition 30. Let b, n ∈ R+ and x ∈ R. If $b^x = n$, then we write $x = \log_b n$ and call x the base b logarithm of n.

Example 107. $2^3 = 8 \iff 3 = \log_2 8$.

Example 108. $3^4 = 81 \iff 4 = \log_3 81$.

Example 109. $4^5 = 1\,024 \iff 5 = \log_4 1\,024$.

Remark 18. Some writers (including your TI84) write log x to mean the base 10 logarithm of x. Still others write log x to mean the natural logarithm of x. (Note that at this point, we still haven’t discussed what the natural logarithm function is. We will do so only in Ch. 17.)

This inconsistent usage makes for a very confused situation.

In this textbook, we will simply stick strictly to the following notation that appears on your H2 Maths syllabus p. 18:

e              base of natural logarithms
e^x, exp x     exponential function of x
log_a x        logarithm to the base a of x
ln x           natural logarithm of x
lg x           logarithm of x to base 10

We will never write log. We will only ever write $\log_b$ and even then fairly rarely.

You should find these Laws of Logarithms familiar:
Proposition 2. (The Laws of Logarithms.) Let a, b, c > 0 and x ∈ R. Then
(a) $\log_b 1 = 0$.
(b) $\log_b b = 1$.
(c) $\log_b b^x = x$.
(d) $b^{\log_b a} = a$.
(e) $c \log_b a = \log_b a^c$.
(f) $\log_b \frac{1}{a} = -\log_b a$. (Logarithm of Reciprocal.)
(g) $\log_b (ac) = \log_b a + \log_b c$. (Sum of Logarithms.)
(h) $\log_b \frac{a}{c} = \log_b a - \log_b c$. (Difference of Logarithms.)
(i) $\log_b a = \frac{\log_c a}{\log_c b}$ (b, c ≠ 1). (Change of Base.)
(j) $\log_{a^b} c = \frac{1}{b} \log_a c$ (a ≠ 1).


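Proof. In this proof, ⟹ denotes the use of Definition 30.

(a) $b^0 = 1 \implies \log_b 1 = 0$.

(b) $b^1 = b \implies \log_b b = 1$.

(c) $b^x = b^x \implies \log_b b^x = x$.

(d) Let $y = \log_b a \implies b^y = a \iff b^{\log_b a} = a$.

(e) The first step here uses Proposition 1(d):

$b^{c \log_b a} = \left(b^{\log_b a}\right)^c \overset{(d)}{=} a^c \implies c \log_b a = \log_b a^c$.

(f) $\log_b \frac{1}{a} = \log_b a^{-1} \overset{(e)}{=} -\log_b a$.

(g) The first step here uses Proposition 1(a):

$b^{\log_b a + \log_b c} = b^{\log_b a} \, b^{\log_b c} \overset{(d)}{=} ac \implies \log_b (ac) = \log_b a + \log_b c$.

(h) $\log_b \frac{a}{c} = \log_b \left(a \cdot \frac{1}{c}\right) \overset{(g)}{=} \log_b a + \log_b \frac{1}{c} \overset{(f)}{=} \log_b a - \log_b c$.

(i) By (d): $b^{\log_b a} = a$.

Apply base c log: $\log_c \left(b^{\log_b a}\right) = \log_c a$.

By (e), we also have: $\log_c \left(b^{\log_b a}\right) = (\log_b a)(\log_c b)$.

Divide by $\log_c b \neq 0$: $\log_b a = \frac{\log_c a}{\log_c b}$.

(j) $\log_{a^b} c \overset{(i)}{=} \frac{\log_a c}{\log_a a^b} \overset{(c)}{=} \frac{\log_a c}{b} = \frac{1}{b} \log_a c$.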
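The Change of Base law (i) is also how one can compute a base b logarithm in practice. The sketch below is only an illustration. The helper name log_base is an assumption, and it takes c = e, so that it can use Python's built-in natural logarithm math.log.

```python
import math

def log_base(b: float, n: float) -> float:
    """Return the base-b logarithm of n using Change of Base (Law (i)) with c = e.

    Hypothetical helper for illustration; requires b > 0, b != 1 and n > 0.
    """
    if b <= 0 or b == 1 or n <= 0:
        raise ValueError("need b > 0, b != 1 and n > 0")
    return math.log(n) / math.log(b)  # log_b n = ln n / ln b

print(log_base(2, 8))     # close to 3 (Example 107)
print(log_base(3, 81))    # close to 4 (Example 108)
print(log_base(4, 1024))  # close to 5 (Example 109)

# Spot checks of Laws (g) and (j) for sample values:
sum_ok = math.isclose(log_base(2, 6 * 7), log_base(2, 6) + log_base(2, 7))
power_base_ok = math.isclose(log_base(2 ** 3, 5), log_base(2, 5) / 3)
print(sum_ok, power_base_ok)
```

The results are floating-point approximations, which is why the checks use math.isclose rather than exact equality.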

Exercise 65. Show that each expression equals 2. (Answer on p. 1397.)

(a) $\log_2 32 + \log_3 \frac{1}{27}$.
(b) $\log_3 45 - \log_9 25$.

(c) $\log_{16} 768 - \log_2 \sqrt[4]{3}$.


5.7. Polynomials

Definition 31. Let $c_0, c_1, \dots, c_n$ be constants, with $c_n \neq 0$. We call the expression $c_n x^n + c_{n-1} x^{n-1} + c_{n-2} x^{n-2} + \dots + c_1 x + c_0$ an nth-degree polynomial (in one variable). We also call:

• Each $c_i x^i$ the ith-degree term or the ith term;
• Each $c_i$ the coefficient on $x^i$ (or the ith-degree coefficient, or the ith coefficient);
• The 0th coefficient $c_0$ the constant term or, more simply, the constant; and
• The following equation a (nth-degree) polynomial equation (in one variable):
$c_n x^n + c_{n-1} x^{n-1} + c_{n-2} x^{n-2} + \dots + c_1 x + c_0 = 0$.

Again, in the above definition, x is merely a dummy or placeholder variable that we can replace with any other symbol, like y, z, ☺, or ☀.

Example 110. The expression 7x − 3 is a 1st-degree or linear polynomial. The equation 7x − 3 = 0 is a 1st-degree polynomial (or linear) equation.

Degree        Term    Coefficient
0th-degree    −3      −3
1st-degree    7x      7

The 0th-degree term is usually simply called the constant term.

Example 111. The expression $3x^2 + 4x - 5$ is a 2nd-degree or quadratic polynomial. The equation $3x^2 + 4x - 5 = 0$ is a 2nd-degree polynomial (or quadratic) equation.

Degree        Term      Coefficient
0th-degree    −5        −5
1st-degree    4x        4
2nd-degree    $3x^2$    3

Example 112. The expression $-5x^3 + 2x + 9$ is a 3rd-degree or cubic polynomial. The equation $-5x^3 + 2x + 9 = 0$ is a 3rd-degree polynomial (or cubic) equation.
When a particular coefficient is 0, we usually don’t bother writing out that term. This is the case here with the 2nd-degree term: written out in full, the polynomial is $-5x^3 + 0x^2 + 2x + 9$.

Degree        Term       Coefficient
0th-degree    9          9
1st-degree    2x         2
2nd-degree    $0x^2$     0
3rd-degree    $-5x^3$    −5

You get the idea. We also have 4th-, 5th-, 6th-, . . . degree (or quartic, quintic, sextic,
. . . ) polynomials and equations.
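One way to make Definition 31 concrete is to store a polynomial as its list of coefficients. The Python sketch below is only an illustration. The names poly_degree and poly_eval are assumptions, and the convention that position i of the list holds the ith coefficient is a choice made here, not something from the syllabus.

```python
def poly_degree(coeffs: list) -> int:
    """Return the degree: the largest i whose coefficient coeffs[i] is non-zero."""
    for i in range(len(coeffs) - 1, -1, -1):
        if coeffs[i] != 0:
            return i
    raise ValueError("all coefficients are zero")

def poly_eval(coeffs: list, x: float) -> float:
    """Evaluate c_n x^n + ... + c_1 x + c_0, where coeffs[i] is c_i (Horner's rule)."""
    result = 0
    for c in reversed(coeffs):  # start from the highest-degree coefficient
        result = result * x + c
    return result

cubic = [9, 2, 0, -5]        # Example 112: -5x^3 + 0x^2 + 2x + 9
print(poly_degree(cubic))    # 3
print(poly_eval(cubic, 2))   # -5*8 + 0*4 + 2*2 + 9 = -27
print(poly_degree([7]))      # 0, the constant 7 of Example 113
```

Note that the 2nd-degree coefficient 0 must be stored explicitly in the list, even though we usually don't write that term out.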

Example 113. Technically, the expression 7 could be regarded as a 0th-degree polynomial, because $7 = 7x^0$. And so in certain contexts, it may be convenient to refer to the expression 7 as such.
But more commonly, we’ll simply call the expression 7 a constant.

In this textbook, we’ll almost always consider only polynomials in one variable. So unless otherwise stated, when we say polynomial, we’ll always mean a polynomial in one variable.

But in case you were wondering, an example of a polynomial in two variables is Ax + Bxy + Cy. In general, the degree of each term in a polynomial is the sum of the exponents on the variables. And the polynomial’s degree is simply the highest such degree. So here, the term Ax has degree 1, Bxy has degree 2, and Cy has degree 1. So this polynomial is of degree 2.
Another example of a polynomial in two variables is $Ax^2 + Bxy + Cy^2 + Dx + Ey + F$. Despite looking more complicated than the previous example, this polynomial also has degree 2 because the greatest sum of exponents on any term is again 2.
And by the way, as Ch. 117.8 (Appendices) discusses, the conic section is, in general, described by this 2nd-degree polynomial equation in two variables:
$Ax^2 + Bxy + Cy^2 + Dx + Ey + F = 0$.
Part I.
Functions and Graphs


[A] large part of mathematics which became useful developed with absolutely
no desire to be useful, and in a situation where nobody could possibly know
in what area it would become useful; and there were no general indications
that it ever would be so. By and large it is uniformly true in mathematics
that there is a time lapse between a mathematical discovery and the moment
when it is useful; and that this lapse of time can be anything from thirty to a
hundred years, in some cases even more ...

This is true for all of science. Successes were largely due to forgetting com-
pletely about what one ultimately wanted, or whether one wanted anything
ultimately; in refusing to investigate things which profit, and in relying solely
on guidance by criteria of intellectual elegance; it was by following this rule
that one actually got ahead in the long run, much better than any strictly
utilitarian course would have permitted.

— John von Neumann (1954).


6. Graphs

6.1. Ordered Pairs

Recall that with sets, the order of the elements didn’t matter:

Example 114. {Cow, Chicken} = {Chicken, Cow}.

Example 115. {−5, 4} = {4, −5}.

We now introduce a new mathematical object called an ordered pair (a, b). Like the
sets {Cow, Chicken} and {−5, 4}, you can think of an ordered pair as a container with two
objects. But unlike sets, with ordered pairs, the order matters (hence the name).71

Definition 32. Given the ordered pair (a, b), we call a its first or x-coordinate and b its
second or y-coordinate.

Two ordered pairs are equal if and only if both their x- and y-coordinates are equal.

Fact 12. (x, y) = (a, b) ⇐⇒ x = a AND y = b.

Proof. See p. 1257 in the Appendices.

Example 116. Consider these two ordered pairs:

(Cow, Chicken) and (Chicken, Cow).

The ordered pair (Cow, Chicken) has x-coordinate Cow and y-coordinate Chicken.
The ordered pair (Chicken, Cow) has x-coordinate Chicken and y-coordinate Cow.
Since these two ordered pairs have different x- and y-coordinates, they are not equal:

(Cow, Chicken) ≠ (Chicken, Cow) ,

This is in contrast to what we saw above with sets, where:

{Cow, Chicken} = {Chicken, Cow} .

To distinguish an ordered pair from a set with two elements, we use parentheses (instead
of braces). Be very clear that the ordered pair (a, b) is a completely different mathematical
object from the set {a, b}:

(Cow, Chicken) ≠ {Cow, Chicken} and (Chicken, Cow) ≠ {Chicken, Cow}.
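Python happens to offer a close analogue of this distinction. The sketch below rests on an assumption of convenience: Python tuples stand in for ordered pairs and Python sets stand in for sets. The variable names are made up for illustration.

```python
# Tuples care about order; sets do not.
pair_1 = ("Cow", "Chicken")
pair_2 = ("Chicken", "Cow")
set_1 = {"Cow", "Chicken"}
set_2 = {"Chicken", "Cow"}

print(pair_1 == pair_2)  # False: the order of the coordinates matters
print(set_1 == set_2)    # True: the order of the elements does not matter
print(pair_1 == set_1)   # False: an ordered pair is a different object from a set

# An ordered pair can repeat an entry, whereas a set collapses duplicates.
print(("Cow", "Cow"))       # ('Cow', 'Cow') still has two coordinates
print(len({"Cow", "Cow"}))  # 1: the set {Cow, Cow} is just {Cow}
```

The analogy is informal: the formal definition of an ordered pair is the one given in the Appendices, not Python's.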

For the formal definition of an ordered pair, see p. 1257 in the Appendices.
Example 117. (−5, 4) and (4, −5) are ordered pairs. (−5, 4) has x-coordinate −5 and
y-coordinate 4, while (4, −5) has x-coordinate 4 and y-coordinate −5. Since these two
ordered pairs have different x- and y-coordinates, they are not equal:

(−5, 4) ≠ (4, −5).

This is in contrast to what we saw above with sets, where {−5, 4} = {4, −5}.
Again, be very clear that the ordered pair (a, b) is a completely different mathematical
object from the set {a, b}:

(−5, 4) ≠ {−5, 4} and (4, −5) ≠ {4, −5}.

An ordered pair (a, b) can have a = b:

Example 118. The following four ordered pairs are distinct:72

(Cow, Chicken), (Chicken, Cow), (Cow, Cow), and (Chicken, Chicken).

The ordered pair (Cow, Cow) has x-coordinate Cow and y-coordinate Cow.
The ordered pair (Chicken, Chicken) has x-coordinate Chicken and y-coordinate Chicken.

Example 119. The following four ordered pairs are distinct:

(1, −5), (−5, 1), (1, 1), and (−5, −5).

The ordered pair (1, 1) has x-coordinate 1 and y-coordinate 1.

The ordered pair (−5, −5) has x-coordinate −5 and y-coordinate −5.

Remark 19. Confusingly, (−5, 4) can denote two entirely different things:

• In Ch. 4.13, we learnt that (−5, 4) denotes the set of real numbers between −5 and 4.
• Here we learn that (−5, 4) denotes the ordered pair with the x-coordinate −5 and the
y-coordinate 4.
This is an unfortunate and confusing situation. But don’t worry.
In the Oxford English Dictionary, no fewer than 645 different meanings are given for the
word run.73 But one rarely has trouble telling from the context which of these is meant
when someone uses the word run.
Likewise, you’ll rarely have trouble telling from the context whether by (−5, 4), the writer
means a set of real numbers or an ordered pair.

The word distinct is just a synonym for not equal.
According to the NYT, the OED editor Peter Gilliver spent nine months working on the word run.
Previously, the word set was said to have the most different meanings, at 430 (see e.g. Guinness Book
of World Records).
6.2. The Cartesian Plane
We will usually be concerned with ordered pairs of real numbers (rather than ordered
pairs of cows and chickens):

Definition 33. A point (in two-dimensional space) is any ordered pair of real numbers.74

By the way, what a point is depends on the context. In one-dimensional space, a point is
simply any real number. Later on in Part III, we’ll also be looking at three-dimensional
space — in that case, a point will be any ordered triple of real numbers. But for now, we’ll
be concerned only with two-dimensional space. And so for now, whenever we say point, it
should be understood that we’re talking about an ordered pair of real numbers.
The cartesian plane75 is the set of all points (i.e. the set of all ordered pairs of real
numbers). Formally:

Definition 34. The cartesian plane is the following set:76

{(x, y) ∶ x, y ∈ R} .
The origin O is the point at which the x- and y-axes intersect.

Definition 35. The origin is the point O = (0, 0).

Example 120. Depicted below is the cartesian plane, centred on the origin O = (0, 0).
Note that the cartesian plane stretches out to ±∞ in both the x- and y-directions.
Also depicted are three points A = (−5, 4), B = (1, 1), C = (2, −3).

[Figure: the cartesian plane with its x- and y-axes, the origin O = (0, 0), and the points A = (−5, 4), B = (1, 1), and C = (2, −3).]

Note that a point is itself a zero-dimensional object.
The cartesian plane (and more generally cartesian geometry) is named after René Descartes
(1596–1650), who’s also the dude who came up with “Cogito ergo sum” (“I think, therefore I am”).
There is some disagreement over whether to capitalise cartesian — see e.g. . Indeed, in
your syllabus, it is capitalised on p. 19 but not on pp. 7–8! (My guess is that while pp. 1–15 were written
by the local Singapore authorities, pp. 16–20 were simply copy-pasted from some standard Cambridge
notation template.) My personal preference is to capitalise cartesian, but it seems that the A-Level
exams do not do so. I shall therefore follow the sacred A-Level exams by not capitalising cartesian.
6.3. A Graph is Any Set of Points
You’re probably used to thinking of a graph (or a curve) as a “drawing”. But formally:

Definition 36. A graph (or curve) is any set of points.

And so equivalently: A graph is any subset of the cartesian plane.

Example 121. Consider G = {(−5, 4) , (1, 1) , (2, −3)}. G is a set that contains three
points. And so by definition, G is also a graph.
We’ve defined graph as a noun (it is a set of points). But at the slight risk of confusion,
we’ll also use graph as the verb meaning to draw a graph. So, we can either say, “The
graph G is drawn below,” or, “G is graphed below”.

[Figure: The graph G = {(−5, 4), (1, 1), (2, −3)} contains three points.]

Example 122. Consider H = {(−5, 4) , (2, −3)}. H is a set that contains two points. And
so by definition, H is also a graph. H is graphed below.

[Figure: The graph H = {(−5, 4), (2, −3)} contains two points.]

Example 123. Consider I = {(1, 1)}. I is a set that contains one point. And so by
definition, I is also a graph. I is graphed below.

[Figure: The graph I = {(1, 1)} contains one point.]



If a set contains at least one element that isn’t a point, then it isn’t a graph:

Example 124. Consider J = {(−5, 4) , Love}.

J is a set containing two elements — the point (−5, 4) and the abstract concept called
Love. Since J contains at least one element that isn’t a point, J is not a graph.

Example 125. Consider K = {(−5, 4) , (1, 1) , 1}.

K is a set containing three elements — the points (−5, 4) and (1, 1), and the number 1.
Since K contains at least one element that isn’t a point, K is not a graph.77

Example 126. Each of the sets R, Q, and Z contains at least one element that isn’t a
point. (Indeed, each contains no points at all and infinitely many elements that are not
points.) Thus, R, Q, and Z are not graphs.
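Definition 36 can be turned into a simple membership test. In the sketch below, the names is_point and is_graph are made up for illustration. A point is modelled as a Python tuple of two real numbers, and a graph as any collection whose elements are all points.

```python
from numbers import Real

def is_point(obj) -> bool:
    """True if obj models a point: an ordered pair (2-tuple) of real numbers."""
    return (
        isinstance(obj, tuple)
        and len(obj) == 2
        # bool is excluded because Python treats True/False as integers
        and all(isinstance(c, Real) and not isinstance(c, bool) for c in obj)
    )

def is_graph(collection) -> bool:
    """True if every element of the collection is a point (Definition 36)."""
    return all(is_point(element) for element in collection)

G = {(-5, 4), (1, 1), (2, -3)}  # Example 121
J = {(-5, 4), "Love"}           # Example 124
K = {(-5, 4), (1, 1), 1}        # Example 125

print(is_graph(G))  # True
print(is_graph(J))  # False: "Love" is not a point
print(is_graph(K))  # False: the number 1 is not a point
```

This only works for finite collections that a computer can list out. It is meant to illustrate the definition, not to replace it.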

You may be used to thinking of a graph as a “drawing”. But you should now think of a
graph as being simply a set of points. A “drawing” of a graph is not the graph itself, but
merely a visual aid.78

Remark 20. Your A-Level exams seem to use the terms graph and curve interchangeably
(i.e. as entirely equivalent synonyms), so that’s what this textbook will do too.

Actually, a number can also be regarded as a point in one-dimensional space. However, as stated earlier,
for now, whenever we say point, we mean a point in two-dimensional space. And so, following this usage,
1 here is not a point.
Albeit a tremendously helpful one. Indeed, analytic or cartesian geometry was one of the major
milestones in the history of mathematics. The idea of combining algebra and geometry is today “obvious”
even to the secondary school student, but it wasn’t always obvious to mathematicians.
6.4. The Graph of An Equation
We’ll be looking mostly at graphs of equations (and shortly, also of functions):

Definition 37. The graph of an equation is the set of points (x, y) for which the equation
is true.

Example 127. Let G be the graph of the equation y = x + 2. Then:

G = {(x, y) ∶ y = x + 2} .

We can say, “Below we’ve graphed the equation y = x + 2” or more simply, “Below is
graphed y = x + 2,” or, “Below is graphed G”.
G is the set of points (x, y) for which the equation y = x + 2 is true. And so, G contains:
• (5, 7), because 7 = 5 + 2.
• (1, 3), because 3 = 1 + 2.

[Figure: The graph of the equation y = x + 2 is the set G = {(x, y) ∶ y = x + 2}, which contains (5, 7) and (1, 3).]

Informally, we say that the equation y = x + 2 describes a line. More specifically, it

describes the line with gradient 1 and which passes through the point (0, 2).
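Since the graph of an equation is the set of points for which the equation is true, checking whether a point belongs to the graph amounts to plugging in its coordinates. A minimal Python sketch for Example 127 follows. The function name in_G is an assumption made for illustration.

```python
def in_G(point: tuple) -> bool:
    """True if the point (x, y) satisfies y = x + 2, i.e. belongs to the graph G."""
    x, y = point
    return y == x + 2

print(in_G((5, 7)))  # True, because 7 = 5 + 2
print(in_G((1, 3)))  # True, because 3 = 1 + 2
print(in_G((1, 4)))  # False, because 4 is not 1 + 2

# G itself has infinitely many points, so a computer can list only a finite sample:
sample = [(x, x + 2) for x in range(-3, 4)]
print(all(in_G(p) for p in sample))  # True
```

The exact comparison with == is reliable for whole numbers. With decimals, rounding errors can creep in, so a tolerance-based comparison may be more appropriate.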


Example 128. Let H be the graph of the equation $y = x^2$. Then:

$H = \{(x, y) : y = x^2\}$.

H is the set of points (x, y) for which the equation $y = x^2$ is true. And so, H contains:

(1, 1), because $1 = 1^2$; and (2, 4), because $4 = 2^2$.

[Figure: The graph of the equation $y = x^2$ is the set $H = \{(x, y) : y = x^2\}$, which contains (1, 1) and (2, 4).]

Informally, we say that the equation $y = x^2$ describes a parabola.

Example 129. Let I be the graph of the equation $x^2 + y^2 = 1$. Then:

$I = \{(x, y) : x^2 + y^2 = 1\}$.

I is the set of points (x, y) for which the equation $x^2 + y^2 = 1$ is true. And so, I contains:
• (1, 0), because $1^2 + 0^2 = 1$.
• $\left(\frac{\sqrt{2}}{2}, \frac{\sqrt{2}}{2}\right)$, because $\left(\frac{\sqrt{2}}{2}\right)^2 + \left(\frac{\sqrt{2}}{2}\right)^2 = 1$.

[Figure: The graph of the equation $x^2 + y^2 = 1$ is the set $I = \{(x, y) : x^2 + y^2 = 1\}$, which contains (1, 0) and $\left(\frac{\sqrt{2}}{2}, \frac{\sqrt{2}}{2}\right)$.]

Informally, we say that the equation $x^2 + y^2 = 1$ describes the unit circle centred on the origin. (By “unit circle”, we mean it has radius 1.) We will study the unit circle in greater detail later.

6.5. Graphing with the TI84
Our first examples of using the TI84:

Example 130. Graph the equation $y = x^2$.

1. Press ON to turn on your calculator.
2. Press Y= to bring up the Y= editor.
3. Press X,T,θ,n to enter “X”; then press x² to enter the squared symbol “²”.
4. Now press GRAPH and the calculator will graph $y = x^2$.

[Screenshots: the calculator screen after each of Steps 1 to 4.]

Example 131. Graph the equation $x^2 + y^2 = 1$.

Unfortunately, you cannot directly input this equation, because the piece of junk that is the TI84 requires that you enter equations with y on the left hand side. And so here, we’ll have to tell the TI84 to graph two separate equations: $y = \sqrt{1 - x^2}$ and $y = -\sqrt{1 - x^2}$.
1. Press ON to turn on your calculator.
2. Press Y= to bring up the Y= editor.
Most buttons on the TI84 have three different functions. Simply pressing a button executes the function that’s printed on the button itself. Pressing the 2ND button and then a button executes the function that’s printed in blue above the button. And pressing the ALPHA button and then a button executes the function that’s printed in green above the button.
3. Press the 2ND button and then the x² button to execute the √ function and enter “√”. Next press 1 − X,T,θ,n x². Altogether you’ve entered $\sqrt{1 - x^2}$.
4. Now press ENTER and the blinking cursor will move down, to the right of “Y2 =”.

[Screenshots: the calculator screen after each of Steps 1 to 4.]
We’ll now enter the second equation.
5. Press the (-) button. (Warning: This is different from the − button. If you use the − button, you will get an error message when you try to generate your graphs later.)

Now repeat what we did in Step 3 above: Press the blue 2ND button and then √ (which corresponds to the x² button) to enter “√”. Next press 1 − X,T,θ,n x² to enter “1 − X²”. Altogether you will have entered $-\sqrt{1 - x^2}$.
6. Now press GRAPH and the calculator will graph both $y = \sqrt{1 - x^2}$ and $y = -\sqrt{1 - x^2}$.
Notice the graphs are very small. To zoom in:
7. Press the ZOOM button to bring up a menu of ZOOM options.
8. Press 2 to select the Zoom In option. Nothing seems to happen. But now press
ENTER and the TI84 will zoom in a little for you.
We expected to see a perfect circle — instead, we get an ellipse. Hm, what’s going on?
The reason is that by default, the x- and y- axes are scaled differently. To set them to
the same scale:
9. Press the ZOOM button again to bring up the ZOOM menu of options. Press 5 to
select the ZSquare option. The TI84 will adjust the x- and y- axes so that they have
the same scale and thus give us a perfect circle.

[Screenshots: the calculator screen after each of Steps 5 to 9.]

P.S. An alternative to Step 5 is to enter “−Y1” instead of “$-\sqrt{1 - X^2}$”. To do so, replace
Step 5 with the following instructions: First press (-) to enter the minus sign, as was
done in Step 5. Next press VARS to bring up the VARS menu. Then press ⟩ to go to
the Y-VARS menu. Now press ENTER to select “1: Function...”. Press ENTER again
to select “1: Y1 ”. Altogether, we will have entered “Y2 = −Y1 ”. Now go to Step 6.
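The same splitting of the circle into an upper and a lower half can be sketched in Python. This is only an illustration of the idea behind Steps 3 and 5, not something the TI84 runs. The function name circle_halves is an assumption.

```python
import math

def circle_halves(x: float):
    """Return the y-values (upper, lower) of the unit circle at x.

    Mirrors the two TI84 equations y = sqrt(1 - x^2) and y = -sqrt(1 - x^2).
    Returns None if |x| > 1, where neither equation has a real value.
    """
    if abs(x) > 1:
        return None
    upper = math.sqrt(1 - x * x)  # plays the role of Y1
    lower = -upper                # plays the role of Y2 = -Y1, as in the P.S.
    return upper, lower

print(circle_halves(0))    # (1.0, -1.0)
print(circle_halves(0.6))  # roughly (0.8, -0.8)
print(circle_halves(2))    # None: no point of the circle has x = 2
```

Each returned pair gives the two points of the circle directly above and below x, so plotting both halves together recovers the whole circle.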

Exercise 66. Graph the following equations. (Answer on p. 1398.)

(a) $y = e^x$. (b) y = 3x + 2. (c) $y = 2x^2 + 1$.

6.6. The Graph of An Equation with Constraints

Example 132. Let G1 be the graph of the equation y = x + 2 with the constraint x ≥ 3.
Equivalently, let G1 be the graph of y = x + 2, x ≥ 3. Then:

G1 = {(x, y) ∶ y = x + 2, x ≥ 3} .

G1 is the set of points (x, y) for which (y = x + 2 AND x ≥ 3) is true. And so, G1 contains:
• (5, 7), because 7 = 5 + 2 AND 5 ≥ 3.
• (3, 5), because 5 = 3 + 2 AND 3 ≥ 3.
G1 does not contain (1, 3), because 1 ≱ 3.

[Figure: the graph, labelled G1 = {(x, y) ∶ y = x + 2, x ≥ 3}, with the points (5, 7) and (3, 5) on it and the point (1, 3) off it.]

Above we labelled our graph as G1 = {(x, y) ∶ y = x + 2, x ≥ 3}. But going forward, we’ll
be a little lazy/sloppy and simply label it as y = x + 2, x ≥ 3 (as done below), with the
understanding that this is the graph that satisfies the labelled equation and constraint.
Nonetheless, you should always bear in mind that a graph is a set of points.
[Figure: the same graph, now labelled y = x + 2, x ≥ 3, with the points (5, 7) and (3, 5) on it and the point (1, 3) off it.]

Example 133. Let G2 be the graph of y = x + 2, x > 3. Then:

G2 = {(x, y) ∶ y = x + 2, x > 3} .

G2 is exactly the same as G1 but with one difference — the constraint (inequality) is now
strict, so that this time, G2 does not contain (3, 5).

[Figure: the graph of y = x + 2, x > 3, which contains (5, 7) but neither (3, 5) nor (1, 3).]
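A constraint simply adds one more condition that a point must satisfy. The Python sketch below contrasts G1 (with x ≥ 3) and G2 (with x > 3). The function names in_G1 and in_G2 are made up for illustration.

```python
def in_G1(point: tuple) -> bool:
    """True if (x, y) satisfies y = x + 2 AND x >= 3 (Example 132)."""
    x, y = point
    return y == x + 2 and x >= 3

def in_G2(point: tuple) -> bool:
    """True if (x, y) satisfies y = x + 2 AND x > 3 (Example 133)."""
    x, y = point
    return y == x + 2 and x > 3

for p in [(5, 7), (3, 5), (1, 3)]:
    print(p, in_G1(p), in_G2(p))
# (5, 7) is in both graphs.
# (3, 5) is in G1 but not in G2, because the inequality x > 3 is strict.
# (1, 3) is in neither graph.
```

The only difference between the two functions is the comparison operator, which mirrors the single difference between G1 and G2.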

Example 134. Let $H_1$ be the graph of $y = x^2$, x ≤ 2. Then:

$H_1 = \{(x, y) : y = x^2, x \leq 2\}$.

$H_1$ is the set of points (x, y) for which ($y = x^2$ AND x ≤ 2) is true. And so, $H_1$ contains:
• (1, 1), because $1^2 = 1$ AND 1 ≤ 2.
• (2, 4), because $2^2 = 4$ AND 2 ≤ 2.
$H_1$ does not contain (3, 9), because 3 ≰ 2.

[Figure: the graph of $y = x^2$, x ≤ 2, which contains (1, 1) and (2, 4) but not (3, 9).]

Example 135. Let $H_2$ be the graph of $y = x^2$, x < 2. Then:

$H_2 = \{(x, y) : y = x^2, x < 2\}$.

$H_2$ is exactly the same as $H_1$ but with one difference: the constraint (inequality) is now strict, so that this time, $H_2$ does not contain (2, 4).

[Figure: the graph of $y = x^2$, x < 2, which contains (1, 1) but neither (2, 4) nor (3, 9).]

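Example 136. Let $I_1$ be the graph of $x^2 + y^2 = 1$, $x \geq -\frac{1}{2}$. Then:

$I_1 = \left\{(x, y) : x^2 + y^2 = 1, x \geq -\frac{1}{2}\right\}$.

$I_1$ is the set of points (x, y) for which ($x^2 + y^2 = 1$ AND $x \geq -\frac{1}{2}$) is true. And so, $I_1$ contains:
• (1, 0), because $1^2 + 0^2 = 1$ AND 1 ≥ −0.5.
• $\left(\frac{\sqrt{2}}{2}, \frac{\sqrt{2}}{2}\right)$, because $\left(\frac{\sqrt{2}}{2}\right)^2 + \left(\frac{\sqrt{2}}{2}\right)^2 = 1$ AND $\frac{\sqrt{2}}{2} \geq -\frac{1}{2}$.
• $\left(-\frac{1}{2}, \frac{\sqrt{3}}{2}\right)$, because $\left(-\frac{1}{2}\right)^2 + \left(\frac{\sqrt{3}}{2}\right)^2 = 1$ AND $-\frac{1}{2} \geq -\frac{1}{2}$.
• $\left(-\frac{1}{2}, -\frac{\sqrt{3}}{2}\right)$, because $\left(-\frac{1}{2}\right)^2 + \left(-\frac{\sqrt{3}}{2}\right)^2 = 1$ AND $-\frac{1}{2} \geq -\frac{1}{2}$.

[Figure: the graph of $x^2 + y^2 = 1$, $x \geq -\frac{1}{2}$, with the points (1, 0), $\left(\frac{\sqrt{2}}{2}, \frac{\sqrt{2}}{2}\right)$, $\left(-\frac{1}{2}, \frac{\sqrt{3}}{2}\right)$, and $\left(-\frac{1}{2}, -\frac{\sqrt{3}}{2}\right)$ marked.]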

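Example 137. Let $I_2$ be the graph of $x^2 + y^2 = 1$, $x > -\frac{1}{2}$. Then:

$I_2 = \left\{(x, y) : x^2 + y^2 = 1, x > -\frac{1}{2}\right\}$.

$I_2$ is exactly the same as $I_1$ but with one difference: the constraint (inequality) is now strict, so that this time, $I_2$ does not contain $\left(-\frac{1}{2}, \frac{\sqrt{3}}{2}\right)$ or $\left(-\frac{1}{2}, -\frac{\sqrt{3}}{2}\right)$.

[Figure: the graph of $x^2 + y^2 = 1$, $x > -\frac{1}{2}$, which contains (1, 0) and $\left(\frac{\sqrt{2}}{2}, \frac{\sqrt{2}}{2}\right)$ but neither $\left(-\frac{1}{2}, \frac{\sqrt{3}}{2}\right)$ nor $\left(-\frac{1}{2}, -\frac{\sqrt{3}}{2}\right)$.]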

More generally, a graph can also be of multiple equations with multiple constraints:

Example 138. Let J be the graph of:

$y = \begin{cases} -x, & \text{for } x \leq 0, \\ x^2, & \text{for } x > 0. \end{cases}$

In words, J is the set of points for which either (y = −x AND x ≤ 0) OR ($y = x^2$ AND x > 0) is true.

[Figure: the graph J of the piecewise equation above.]

We can actually write down J in set-builder notation:

$J = \{(x, y) : y = -x, x \leq 0\} \cup \{(x, y) : y = x^2, x > 0\}$.

But this is cumbersome and difficult to read. And so, we’ll usually simply specify J as was done above.
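The piecewise specification of J in Example 138 translates directly into an if/else. The sketch below is only an illustration. The names j_value and in_J are assumptions.

```python
def j_value(x: float) -> float:
    """Return the y for which (x, y) is in J: y = -x for x <= 0, and y = x^2 for x > 0."""
    if x <= 0:
        return -x
    return x ** 2

def in_J(point: tuple) -> bool:
    """True if (x, y) satisfies (y = -x AND x <= 0) OR (y = x^2 AND x > 0)."""
    x, y = point
    return (y == -x and x <= 0) or (y == x ** 2 and x > 0)

print(j_value(-3))    # 3, from the piece y = -x
print(j_value(2))     # 4, from the piece y = x^2
print(in_J((-3, 3)))  # True
print(in_J((2, 4)))   # True
print(in_J((2, -2)))  # False: for x = 2 > 0, only the piece y = x^2 applies
```

The function in_J follows the “in words” description of J, while j_value follows the brace notation. Both describe the same set of points.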

Example 139. Let K be the graph of:

$y = \begin{cases} x, & \text{for } x \neq 2, \\ 0, & \text{for } x = 2. \end{cases}$

In words, K is the set of points for which either (y = x AND x ≠ 2) OR (y = 0 AND x = 2) is true.

[Figure: the graph K of the piecewise equation above.]
Exercise 67. Graph each of the following. (Answers on pp. 1399–1400.)

(a) $y = e^x$, −1 ≤ x < 2.
(b) y = 3x + 2, −1 ≤ x < 2.
(c) $y = 2x^2 + 1$, −1 ≤ x < 2.
(d) $y = \begin{cases} x + 1, & \text{for } x \leq 0, \\ x - 1, & \text{for } x > 0. \end{cases}$
(e) $y = \begin{cases} x + 1, & \text{for } x < 0, \\ x - 1, & \text{for } x \geq 0. \end{cases}$

6.7. Intercepts and Roots
In secondary school, you learnt about x-intercepts, y-intercepts, and roots. We now
give their formal definitions:

Definition 38. Let G be a graph.

• If the point (0, b) is in G, then we call (0, b) a vertical or y-intercept of G.
• If the point (a, 0) is in G, then we call (a, 0) a horizontal or x-intercept of G.
If, moreover, G is the graph of an equation (or function), then we also call a a root of
that equation (or function).

Example 140. The graph of the equation y = x + 2 has horizontal or x-intercept (−2, 0)
and vertical or y-intercept (0, 2).
We can also more simply say, “The equation y = x+2 has horizontal or x-intercept (−2, 0)
and vertical or y-intercept (0, 2).”
Since the equation y = x + 2 has x-intercept with x-coordinate −2, we say that −2 is a
root of the equation y = x + 2.
[Figure: the graph of y = x + 2, with the x-intercept (−2, 0) and the y-intercept (0, 2) marked.]

Remark 21. Just so you know, some writers (including your TI84) also call a root a zero.
So in the above example, they’d say that the zero of the equation y = x+2 is −2. However,
we will avoid using the term zero in this textbook because it does not appear on your
A-Level exams or syllabus.


Example 141. Consider the equation $y = x^2$. The point (0, 0) is both its (only) x-intercept and its (only) y-intercept. Also, the equation has root 0.

[Figure: the graph of $y = x^2$, with the point (0, 0) marked.]

Example 142. The equation $x^2 + y^2 = 1$ has two x-intercepts (−1, 0) and (1, 0), two y-intercepts (0, −1) and (0, 1), and two roots −1 and 1.

[Figure: the graph of $x^2 + y^2 = 1$, with the points (−1, 0), (1, 0), (0, −1), and (0, 1) marked.]

Example 143. The equation $y = x^2 - 1$ has two x-intercepts (−1, 0) and (1, 0), one y-intercept (0, −1), and two roots −1 and 1.

[Figure: the graph of $y = x^2 - 1$, with the points (−1, 0), (1, 0), and (0, −1) marked.]
Exercise 68. For each of the following equations, write down any x-intercept(s), y-intercept(s), and roots. (Answer on p. 1400.)

(a) y = 2.
(b) $y = x^2 - 4$.
(c) $y = x^2 + 2x + 1$.
(d) $y = x^2 + 2x + 2$.

6.8. Lines
You already know what a line is. Here’s the formal definition:79

Definition 39. A line is the graph of any equation

ax + by + c = 0,

where a, b, c ∈ R and it is not the case that both a and b are zero. The line’s gradient is the
number −a/b (provided b ≠ 0; if b = 0, then we say that the line’s gradient is undefined).

You may find this definition a little puzzling — didn’t we always simply write lines as:

y = dx + e?

It turns out that if b ≠ 0, then ax + by + c = 0 and y = dx + e are equivalent, as we now prove:

ax + by + c = 0 ⇐⇒ by = −ax − c ⇐⇒ y = −(a/b)x − c/b,

where d = −a/b and e = −c/b.

Writing a line as y = dx + e has two advantages — it immediately tells us that:

d is the gradient and e is the y-intercept.

But writing a line as y = dx + e also has one big disadvantage — it can’t describe the case
where b = 0, i.e. vertical lines:

Example 144. The line x + y + 1 = 0 can also be written as y = −x − 1.

In contrast, the vertical line x − 1 = 0 cannot be written in the form y = dx + e.

[Figure: the line x + y + 1 = 0 (equivalently, y = −x − 1) and the vertical line x − 1 = 0.]



Here’s what we’ll do in this textbook: If we know for sure that a line isn’t vertical, then we’ll write it in the form y = dx + e, because of the aforementioned advantages. Otherwise, we’ll write it as ax + by + c = 0.

Footnote 79: Our definition here covers only lines in two-dimensional space. In Part IV (Vectors), we’ll learn of a more general definition (namely Definition 109) of a line that also covers higher-dimensional spaces and which will replace Definition 39.


6.9. Horizontal, Vertical, and Oblique Lines

Definition 40. We say that a graph is horizontal if any two points in that graph have
the same y-coordinate; and vertical if any two points have the same x-coordinate.

Definition 41. We say that a line is oblique (or slanted) if it is neither horizontal nor vertical.

Example 145. The line y = 1 is horizontal, the line x = 2 is vertical, and the line y = x+1
is oblique.


[Figure: a horizontal line, a vertical line, and the oblique line y = x + 1.]

Fact 13. Suppose ax + by + c = 0 describes a line. Then:

(a) The line is horizontal ⇐⇒ a = 0.
(b) The line is vertical ⇐⇒ b = 0.

Proof. See p. 1263 in the Appendices.

Example 146. The line y = 1 is horizontal because the coefficient on x is zero.

The line x = 2 is vertical because the coefficient on y is zero.
The line y = x + 1 is oblique because the coefficients on x and y are both non-zero.
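
Fact 13 translates directly into a small check. The Python sketch below is only an illustration: the function name classify_line is made up here and is not part of this textbook or the syllabus.

```python
def classify_line(a: float, b: float, c: float) -> str:
    """Classify the line ax + by + c = 0 using Fact 13."""
    if a == 0 and b == 0:
        # Definition 39 excludes this case: it does not describe a line.
        raise ValueError("a and b cannot both be zero")
    if a == 0:
        return "horizontal"
    if b == 0:
        return "vertical"
    return "oblique"


# The three lines of Example 146, rewritten in the form ax + by + c = 0:
print(classify_line(0, 1, -1))  # y = 1, i.e. 0x + 1y - 1 = 0
print(classify_line(1, 0, -2))  # x = 2, i.e. 1x + 0y - 2 = 0
print(classify_line(1, -1, 1))  # y = x + 1, i.e. 1x - 1y + 1 = 0
```

Note that c plays no role in the classification, exactly as in Fact 13.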


6.10. Finding the Equation of a Line

Fact 14. The line containing the distinct points (a1 , b1 ) and (a2 , b2 ) is:

(a2 − a1 ) (y − b1 ) = (b2 − b1 ) (x − a1 ) .

Proof. See p. 1261 in the Appendices.

Example 147. The line containing the points (1, 2) and (−1, 3) is:

(−1 − 1) (y − 2) = (3 − 2) (x − 1) or −2y = x − 5 or y = −(1/2)x + 5/2.

The line containing the points (2, 0) and (4, 5) is:

(4 − 2) (y − 0) = (5 − 0) (x − 2) or 2y = 5x − 10 or y = (5/2)x − 5.

[Figure: the line y = −(1/2)x + 5/2 through (−1, 3) and (1, 2), and the line y = (5/2)x − 5 through (2, 0) and (4, 5).]
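
Fact 14 can be turned into a short computation. The sketch below is an illustration only (the function names line_through and gradient_intercept are hypothetical). It rearranges Fact 14 into the form ax + by + c = 0 and, when the line is not vertical, into y = dx + e. It assumes integer coordinates so that exact fractions can be used.

```python
from fractions import Fraction


def line_through(p1, p2):
    """Return (a, b, c) with ax + by + c = 0 for the line through p1 and p2.

    Rearranges Fact 14: (a2 - a1)(y - b1) = (b2 - b1)(x - a1).
    """
    (a1, b1), (a2, b2) = p1, p2
    if (a1, b1) == (a2, b2):
        raise ValueError("the two points must be distinct")
    a = b2 - b1
    b = -(a2 - a1)
    c = (a2 - a1) * b1 - (b2 - b1) * a1
    return a, b, c


def gradient_intercept(a, b, c):
    """Return (d, e) with y = dx + e, or None if the line is vertical (b = 0)."""
    if b == 0:
        return None
    return Fraction(-a, b), Fraction(-c, b)


# The two lines of Example 147:
print(gradient_intercept(*line_through((1, 2), (-1, 3))))  # d = -1/2, e = 5/2
print(gradient_intercept(*line_through((2, 0), (4, 5))))   # d = 5/2, e = -5
```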

Exercise 69. In each of the following, write down the equation of the line that contains
the two given points. (Answer on p. 1401.)
(a) (4, 5) and (7, 9).
(b) (1, 2) and (−1, −3).


Fact 15. The line that contains the point (p, q) and has gradient m is:

y − q = m (x − p) .

Proof. The line contains the distinct points (p, q) and (p + 1, q + m).
And so by Fact 14 then, it may also be described by:

(p + 1 − p) (y − q) = (q + m − q) (x − p) or y − q = m (x − p).

Example 148. The line that contains the point (−1, 2) and has gradient 3 is:

y − 2 = 3 [x − (−1)] or y = 3x + 5.

[Figure: the line y = 3x + 5 through (−1, 2), and the line y = −2x + 19 through (7, 5).]

The line that contains the point (7, 5) and has gradient −2 is:

y − 5 = −2 (x − 7) or y = −2x + 19.
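
Expanding y − q = m (x − p) gives y = mx + (q − mp), so the y-intercept can be read off at once. A minimal Python sketch of this rearrangement (the function name is made up for illustration):

```python
def line_from_point_and_gradient(p, q, m):
    """Return (d, e) such that y = dx + e contains (p, q) and has gradient m.

    From Fact 15: y - q = m(x - p), i.e. y = mx + (q - mp).
    """
    return m, q - m * p


# The two lines of Example 148:
print(line_from_point_and_gradient(-1, 2, 3))   # (3, 5), i.e. y = 3x + 5
print(line_from_point_and_gradient(7, 5, -2))   # (-2, 19), i.e. y = -2x + 19
```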

Exercise 70. In each of the following, write down the equation of the line that contains
the given point and has the given gradient. (Answer on p. 1401.)
(a) (4, 5) and 3.
(b) (1, 2) and −2.


6.11. Perpendicular Lines
For now, let us use the following working definition of when two lines are said to be perpendicular:

Definition 42. Two lines are perpendicular if:

(a) Their gradients are negative reciprocals of each other; or
(b) One line is vertical while the other is horizontal.

In Part IV (Vectors), we will give a more general definition (namely Definition 116) of when
two lines are said to be perpendicular and which will replace the above Definition.
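
As an illustration of the working Definition 42, here is a small Python sketch. The function names and the two sample lines are chosen here for illustration and are not taken from this textbook. Each line is given as a triple (a, b, c) describing ax + by + c = 0, and the sketch assumes each triple really does describe a line.

```python
import math


def gradient(a, b):
    """Gradient -a/b of the line ax + by + c = 0, or None if it is vertical."""
    return None if b == 0 else -a / b


def are_perpendicular(line1, line2):
    """Check Definition 42 for two lines given as (a, b, c) triples."""
    (a1, b1, _), (a2, b2, _) = line1, line2
    m1, m2 = gradient(a1, b1), gradient(a2, b2)
    if m1 is None or m2 is None:
        # Condition (b): one line is vertical, so the other must be horizontal.
        other = m2 if m1 is None else m1
        return other == 0
    # Condition (a): negative reciprocals, i.e. the gradients multiply to -1.
    return math.isclose(m1 * m2, -1.0)


# Hypothetical lines: y = 2x + 1 and y = -x/2 + 3.
print(are_perpendicular((2, -1, 1), (1, 2, -6)))
# A vertical line x = 2 and a horizontal line y = 1.
print(are_perpendicular((1, 0, -2), (0, 1, -1)))
```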

Example 149. XXX

Example 150. XXX

Exercise 71. XXX (Answer on p. 101.)



6.12. The Difference between Lines, Line Segments, and Rays

Example 151. Let A and B be points.

The line AB contains both A and B. It extends “forever” in both directions and thus has infinite length.

In contrast, the line segment AB contains A, B, and every point between, but no other point. It thus has finite length. We’ll use ∣AB∣ to denote the length of the line segment AB.

With lines and line segments, the order in which we write A and B doesn’t matter. The line AB is the same as the line BA. And the line segment AB is the same as the line segment BA. But with rays, the order in which we write A and B does matter.

The ray AB is the “half-infinite line” that starts at A, passes through B, and also contains every point “beyond” B.

In contrast, the ray BA is the “half-infinite line” that starts at B, passes through A, and also contains every point “beyond” A.

The rays AB and BA are distinct. Both have infinite length.

It makes no sense to speak of the length of a line or a ray (because these have infinite length). We can however speak of a line segment’s (finite) length.80

Remark 22. Confusingly, some other writers use ray to mean a (finite) line segment. But
this textbook will strictly reserve the word ray to mean a “half-infinite line”.

Remark 23. According to ISO 80000-2: 2009,81 AB written with an overline may be used to denote the line segment from point A to point B. This notation is convenient and allows us to distinguish between the line AB and the line segment AB (overlined).
However, your A-Level syllabus and exams (see e.g. N2007/I/6) do not seem to use this
notation.82 And so, this textbook shall not do so either.
For us, the only way to tell whether AB is a line or a line segment is to see if we say “the
line AB” or “the line segment AB”. We must thus always be absolutely clear if we’re
talking about a line or a line segment.

Footnote 80: For the formal definitions of a line, a line segment, and a ray, see Definition 222 in the Appendices.
Footnote 81: See Item No. 2-8.4. The 40-something-page PDF costs a mind-blowing 158 Swiss Francs or about S$214 at the ISO store. As always, you may or may not be able to find free versions of this document elsewhere on the interwebz.
Footnote 82: Indeed, they don’t seem to be very careful about distinguishing between lines and line segments.

6.13. Asymptotes and Limit Notation
Informally, an asymptote is a line that a graph “gets ever closer to”.

Example 152. Consider y = 1/(x − 1) + 2. We have:

(1)  lim_{x→1−} y = lim_{x→1−} (1/(x − 1) + 2) = −∞.

For the A-Levels, you need only know, roughly and informally, what the above line (1) says.83 And as you can probably guess, it says:

The limit of y as x approaches 1 from the left is −∞.

Or: As x approaches 1 from the left, y approaches −∞.

Similarly, we have:

(2)  lim_{x→1+} y = lim_{x→1+} (1/(x − 1) + 2) = ∞.

(2) says: The limit of y as x approaches 1 from the right is ∞.

Or: As x approaches 1 from the right, y approaches ∞.

Both (1) and (2) say that x = 1 is a vertical asymptote for the graph of y = 1/(x − 1) + 2.

[Figure: graph of y = 1/(x − 1) + 2 with its vertical asymptote x = 1 and horizontal asymptote y = 2. As x → 1−, y → −∞. As x → 1+, y → ∞. As x → −∞, y → 2−. As x → ∞, y → 2+.]

Next, we also have:

(3)  lim_{x→−∞} y = lim_{x→−∞} (1/(x − 1) + 2) = 2−,

(4)  lim_{x→∞} y = lim_{x→∞} (1/(x − 1) + 2) = 2+.

(3) says: The limit of y as x approaches −∞ is 2.

Or: As x approaches −∞, y approaches 2 from below.

(4) says: The limit of y as x approaches ∞ is 2.

Or: As x approaches ∞, y approaches 2 from above.

Both (3) and (4) say that y = 2 is a horizontal asymptote for the graph of y = 1/(x − 1) + 2.

Footnote 83: Ch. 121.2 (Appendices) formally and precisely defines what (1) means and what asymptotes are.
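
To get a feel for these limits, we can simply plug in numbers. The Python sketch below assumes the equation in Example 152 is y = 1/(x − 1) + 2. It is a numerical illustration only and proves nothing about the limits.

```python
def y(x: float) -> float:
    """The equation of Example 152, assumed to be y = 1/(x - 1) + 2."""
    return 1 / (x - 1) + 2


# Approach x = 1 from the left and from the right.
for h in (0.1, 0.01, 0.001):
    print(f"x = 1 - {h}: y = {y(1 - h):.1f}    x = 1 + {h}: y = {y(1 + h):.1f}")

# Let x grow large in both directions.
for x in (10.0, 1000.0, 100000.0):
    print(f"x = {-x}: y = {y(-x):.6f}    x = {x}: y = {y(x):.6f}")
```

The first loop produces ever larger negative values on the left of 1 and ever larger positive values on the right. The second loop produces values just below 2 for large negative x and just above 2 for large positive x.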

Example 153. Consider y = e^x + 1. We have:

lim_{x→−∞} y = lim_{x→−∞} (e^x + 1) = 1+.

That is: The limit of y as x approaches −∞ is 1.

Or: As x approaches −∞, y approaches 1 from above.

Thus, y = 1 is a horizontal asymptote for the graph of y = e^x + 1.

[Figure: graph of y = e^x + 1 with its horizontal asymptote y = 1. As x → −∞, y → 1+.]


Example 154. Consider y = tan x. We have:

(1)  lim_{x→(π/2)−} y = lim_{x→(π/2)−} tan x = ∞,

(2)  lim_{x→(π/2)+} y = lim_{x→(π/2)+} tan x = −∞.

(1) says: The limit of y as x approaches π/2 from the left is ∞.

Or: As x approaches π/2 from the left, y approaches ∞.

(2) says: The limit of y as x approaches π/2 from the right is −∞.

Or: As x approaches π/2 from the right, y approaches −∞.

Thus, x = π/2 is a vertical asymptote for the graph of y = tan x.

[Figure: graph of y = tan x with the vertical asymptote x = π/2. As x → (π/2)−, y → ∞. As x → (π/2)+, y → −∞.]

And actually, y = tan x has infinitely many vertical asymptotes.

Indeed, for every integer k, the line x = (k + 1/2)π is a vertical asymptote.
We’ll learn more about this when we formally review trigonometric functions in Ch. 19.

Exercise 72. With the aid of formal limit notation, explain why the line x = −π/2 is also
a vertical asymptote for the graph of y = tan x. (Answer on p. 1401.)


Asymptotes need not only be vertical or horizontal — they can also be oblique (slanted).

Example 155. Consider y = 1/(x − 1) + x.

We note in passing that x = 1 is a vertical asymptote (can you explain why?).

Observe that this graph has the oblique asymptote y = x, because:

(1)  lim_{x→−∞} y = lim_{x→−∞} (1/(x − 1) + x) = x−,

(2)  lim_{x→∞} y = lim_{x→∞} (1/(x − 1) + x) = x+.

(1) says: The limit of y as x approaches −∞ is x.

Or: As x approaches −∞, y approaches x from below.

(2) says: The limit of y as x approaches ∞ is x.

Or: As x approaches ∞, y approaches x from above.

Thus, y = x is an oblique asymptote for the graph of y = 1/(x − 1) + x.

[Figure: graph of y = 1/(x − 1) + x with its vertical asymptote x = 1 and oblique asymptote y = x. As x → −∞, y → x−. As x → ∞, y → x+.]

Exercise 73. Graphed on the right is the y

equation: 1
y= .

Identify any asymptotes and explain why

each is an asymptote. (Answer on p. 1401.) x


6.14. Maximum and Minimum Points
In secondary school, you were introduced to the concepts of a maximum point and a
minimum point.
We’ll now go into a little more depth than we did in secondary school. In particular, we’ll
learn to distinguish between:
• Global and local maximum points; and
• Strict and non-strict maximum points.
Informally, a global maximum (point) of a graph is a point that’s at least as high as
any other point (that’s also in the graph).84
And a strict global maximum (point) of a graph is a point that’s (strictly) higher than
any other point (that’s also in the graph).

Example 156. Consider the graph of y = −x². The point D = (0, 0) is a global maximum, because it is at least as high as any other point. Indeed, it is also a strict global maximum, because it is strictly higher than any other point.

[Figure: graph of y = −x², showing the point D = (0, 0).]

Footnote 84: Here in this subchapter, we’ll merely give the informal definitions of the eight types of points introduced. For their formal definitions, see Definition 224 (Appendices).

Example 157. Consider the graph of y = sin x. Let

A = (−3π/2, 1), B = (π/2, 1), and C = (5π/2, 1)

be points. Each of A, B, and C is a global maximum, because each is at least as high as any other point.

[Figure: graph of y = sin x, showing the points A, B, and C.]

Indeed, y = sin x has infinitely many global maxima: for every k ∈ Z, the following point is a global maximum:

((2k + 1/2) π, 1).

Note though that y = sin x has no strict global maximum, because no point is strictly higher than every other point.

Obviously, if a point is strictly higher than any other point, then it must also be at least
as high as any other point. And so, in general:

Fact 16. Every strict global maximum is also a global maximum.

However, the converse is not true. That is, a global maximum need not be strict. In the
above example, each of A, B, and C is a global maximum, but none is a strict global
maximum. Indeed, the graph of y = sin x has no strict global maximum.
We next introduce the concept of a local maximum:


Informally, a local maximum (point) of a graph is a point that’s at least as high as any
“nearby” point (that’s also in the graph).

Example 158. Consider the graph of y = 6x⁵ − 15x⁴ − 10x³ + 30x².

Neither E = (−1, 19) nor F = (1, 11) is a global maximum, because neither is at least as high as every other point. However, each of E and F is a local maximum, because each is at least as high as any “nearby” point.

[Figure: graph of y = 6x⁵ − 15x⁴ − 10x³ + 30x², showing the points E = (−1, 19) and F = (1, 11).]

Informally, a strict local maximum (point) of a graph is a point that’s (strictly) higher
than any “nearby” point (that’s also in the graph).

Example 159. In the last example, the points E and F are also strict local maxima,
because each is (strictly) higher than any “nearby” point.
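
The informal idea of “nearby” can be explored numerically. The Python sketch below samples points close to x = −1 and x = 1 on the graph of y = 6x⁵ − 15x⁴ − 10x³ + 30x². The helper name, the radius 0.1, and the number of samples are arbitrary choices made here for illustration. Sampling can only suggest, not prove, that a point is a strict local maximum.

```python
def f(x):
    """The equation of Example 158."""
    return 6 * x**5 - 15 * x**4 - 10 * x**3 + 30 * x**2


def looks_like_strict_local_max(func, x0, radius=0.1, samples=100):
    """Check that func is strictly lower at sampled points within radius of x0."""
    for i in range(1, samples + 1):
        step = radius * i / samples
        if func(x0 - step) >= func(x0) or func(x0 + step) >= func(x0):
            return False
    return True


print(f(-1), f(1))                          # the heights of E and F
print(looks_like_strict_local_max(f, -1))   # E
print(looks_like_strict_local_max(f, 1))    # F
print(f(3) > f(-1))                         # a higher point exists, so E is not global
```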

Again and obviously, if a point is strictly higher than any “nearby” point, then it must also
be at least as high as any “nearby” point. And so in general:

Fact 17. Every strict local maximum is also a local maximum.

However, the converse is not true. That is, a local maximum need not be strict:

Example 160. Consider the graph of y = 3. The point G = (1, 3) is a local maximum,
because it’s at least as high as any “nearby” point. However, it is not a strict local
maximum, because it is not strictly higher than any “nearby” point.
Actually, every point in y = 3 is a local maximum (though not a strict local maximum)!


[Figure: graph of y = 3, showing the point G = (1, 3).]

Indeed, every point in y = 3 is a global maximum (though not a strict global maximum)!


We can similarly distinguish between:
• Global and local minimum points; and
• Strict and non-strict minimum points.
Informally, a global minimum (point) of a graph is a point that’s at least as low as any
other point (that’s also in the graph).
And a strict global minimum (point) of a graph is a point that’s (strictly) lower than
any other point (that’s also in the graph).

Example 161. Consider the graph of y = x². The point K = (0, 0) is a global minimum, because it is at least as low as any other point. Indeed, it is also a strict global minimum, because it is (strictly) lower than any other point.

[Figure: graph of y = x², showing the point K = (0, 0).]


Example 162. Consider the graph of y = sin x. Let

H = (−5π/2, −1), I = (−π/2, −1), and J = (3π/2, −1)

be points. Each of H, I, and J is a global minimum, because each is at least as low as any other point.

[Figure: graph of y = sin x, showing the points H, I, and J.]

Indeed, y = sin x has infinitely many global minima: for every k ∈ Z, the following point is a global minimum:

((2k − 1/2) π, −1).

Note though that y = sin x has no strict global minimum, because no point is strictly lower than every other point.

Obviously, if a point is strictly lower than any other point, then it must also be at least as
low as any other point. And so in general:

Fact 18. Every strict global minimum is also a global minimum.

However, the converse is not true. That is, a global minimum need not be strict. In
the above example, each of H, I, and J is a global minimum, but none is a strict global
minimum. Indeed, the graph of y = sin x has no strict global minimum.


Informally, a local minimum (point) of a graph is a point that’s at least as low as any
“nearby” point (that’s also in the graph).

Example 163. In the graph of y = 6x⁵ − 15x⁴ − 10x³ + 30x², neither L = (0, 0) nor M = (2, −8) is a global minimum, because neither is at least as low as every other point. However, each is a local minimum, because each is at least as low as any “nearby” point.

[Figure: graph of y = 6x⁵ − 15x⁴ − 10x³ + 30x², showing the points L = (0, 0) and M = (2, −8).]

Informally, a strict local minimum (point) of a graph is a point that’s (strictly) lower
than any “nearby” point (that’s also in the graph).

Example 164. In the last example, the points L and M are also strict local minima,
because each is (strictly) lower than any “nearby” point.

Again and obviously, if a point is strictly lower than any “nearby” point, then it must also
be at least as low as any “nearby” point. And so in general:

Fact 19. Every strict local minimum is also a local minimum.

However, the converse is not true. That is, a local minimum need not be strict:

Example 165. Consider the graph of y = 3. The point G = (1, 3) is a local minimum,
because it’s at least as low as any “nearby” point. However, it is not a strict local
minimum, because it is not strictly lower than any “nearby” point.
Actually, every point in y = 3 is a local minimum (though not a strict local minimum)!


[Figure: graph of y = 3, showing the point G = (1, 3).]

Indeed, every point in y = 3 is a global minimum (though not a strict global minimum)!


The following definition is sometimes convenient:

Definition 43. An extremum (plural: extrema) is any maximum or minimum point.

A strict extremum is any strict maximum or minimum point.

So altogether, we have eight types of extrema. Summary:

(a) A global maximum is at least as high as any other point.
(b) The strict global maximum is higher than any other point.
(c) A local maximum is at least as high as any “nearby” point.
(d) A strict local maximum is higher than any “nearby” point.
(e) A global minimum is at least as low as any other point.
(f) The strict global minimum is lower than any other point.
(g) A local minimum is at least as low as any “nearby” point.
(h) A strict local minimum is lower than any “nearby” point.

Remember turning points? We’ll formally define what these are in Definition 63. For
now, we’ll rely on your intuitive understanding of what turning points are. We’ll also
mention that every turning point is a strict extremum (i.e. either a strict maximum or
minimum), but not every strict extremum is a turning point.

Example 166. The graph of y = x (left) has no extrema.

[Figure: left, the graph of y = x; right, the graph of y = x, x ≥ −1, showing the point I = (−1, −1).]

In contrast, the graph of y = x with the constraint x ≥ −1 (right) has one extremum, namely I = (−1, −1), which is a global minimum, strict global minimum, local minimum, and strict local minimum. Observe though that I is not a turning point.


Example 167. Consider the graph of:

y = −81, for x ≤ −3,
y = 0.75x⁵ − 3.75x⁴ − 5x³ + 30x², for −3 < x < 5,
y = 125, for x ≥ 5.

[Figure: graph of the above equation, showing the points A = (−4, −81), B = (−3, −81), C = (−2, 76), D = (0, 0), E = (2, 44), F = (4, −32), G = (5, 125), and H = (6, 125).]

The following table says that A = (−4, −81) is a local maximum, a global minimum, and a local minimum, but is not any of the other five types of extrema and is not a turning point. You should verify that the table is correct.

                         A  B  C  D  E  F  G  H
Global maximum           -  -  -  -  -  -  ✓  ✓
Strict global maximum    -  -  -  -  -  -  -  -
Local maximum            ✓  -  ✓  -  ✓  -  ✓  ✓
Strict local maximum     -  -  ✓  -  ✓  -  -  -
Global minimum           ✓  ✓  -  -  -  -  -  -
Strict global minimum    -  -  -  -  -  -  -  -
Local minimum            ✓  ✓  -  ✓  -  ✓  -  ✓
Strict local minimum     -  -  -  ✓  -  ✓  -  -
Turning point            -  -  ✓  ✓  ✓  ✓  -  -

Besides the eight points (A–H), does this graph have any other extrema?85

Footnote 85: Yes. In fact, there are infinitely many other extrema. For every p < −3, the point (p, −81) is, like A, a global minimum, local maximum, and local minimum. And for every q > 5, the point (q, 125) is, like H, a global maximum, local maximum, and local minimum.

Example 168. Let G = {A, B, C} be the graph consisting of three isolated86 points:
A = (−5, 4), B = (1, 1), and C = (2, −3).
Interestingly, each of A, B, and C is a strict local maximum of G. This is because each
of A, B, and C has no “nearby” point. It is thus trivially or vacuously true that each of
A, B, and C is strictly higher than any “nearby” point.
By the same token, it is likewise trivially or vacuously true that each of A, B, and C is
a strict local minimum of G.

[Figure: the three points A = (−5, 4), B = (1, 1), and C = (2, −3).]

Note though that there’s only one strict global maximum or global maximum, namely A.
Likewise, there is only one strict global minimum or global minimum, namely C.
Observe also that none of A, B, or C is a turning point.

Exercise 74. Identify each graph’s extrema and turning points. (Answer on p. 1402.)

(a) y = x² + 1. (b) y = x² + 1, −1 ≤ x ≤ 1.
(c) y = cos x. (d) y = cos x, −1 ≤ x ≤ 1.

Exercise 75. Explain whether each statement is true or false. (Answer on p. 1403.)

(a) A global maximum must also be a strict local maximum.

(b) A global maximum must also be a local maximum.
(c) A strict global maximum must also be a global maximum, strict local maximum, and
local maximum.
(d) A global maximum cannot be a local minimum.
Footnote 86: Informally, a point of a graph is said to be isolated if no other point in the same graph is “nearby”. For the formal definition, see Definition 250 (Appendices).

7. Reflection and Symmetry

7.1. The Reflection of a Point in a Point

Example 169. The reflection of the point P = (−2, 0) in the point Q = (0, 1) is the point:

R = (2, 2) .

[Figure: the points P = (−2, 0), Q = (0, 1), and R = (2, 2).]


As the above example shows, intuitively, we know what the reflection of one point P in
another Q is. Let us now try to formalise our intuition.
Let the reflection of P in Q be the point R. Intuitively, we want R to satisfy two properties:

(a) ∣P Q∣ = ∣QR∣. (b) R is on the line P Q.

As we’ll explain on the next page, the following definition of a reflection point “works”,
in the sense that it satisfies the above two properties. Or in other words, it successfully
formalises our intuition about what a reflection should be.

Definition 44. Let P = (a, b) and Q = (c, d) be distinct points. Then the reflection of P
in Q is the point:

R = (2c − a, 2d − b) .

On the next page, we give an informal proof-by-picture that the above definition “works”.87

Footnote 87: For a more formal proof, see Fact 194 in the Appendices.

Informal proof-by-picture. Suppose P = (a, b) and Q = (c, d) are points and we want to find the reflection of P in Q.
To get from point P to point Q, go c − a units right and d − b units up. (If d − b is negative, this means moving down.)
Now, if from point Q, we again go c − a units right and d − b units up, we arrive at point R = (2c − a, 2d − b). And so, “obviously”:

(a) ∣P Q∣ = ∣QR∣. ✓ (b) R is on the line P Q. ✓

[Figure: the points P = (a, b), Q = (c, d), and R = (2c − a, 2d − b), all on one line.]
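
Definition 44 is a one-line formula, so it is easy to try out. A minimal Python sketch (the function name reflect_in_point is made up for illustration), checked against Example 169:

```python
def reflect_in_point(p, q):
    """Reflection of the point p = (a, b) in the point q = (c, d), per Definition 44."""
    (a, b), (c, d) = p, q
    return (2 * c - a, 2 * d - b)


P, Q = (-2, 0), (0, 1)
R = reflect_in_point(P, Q)
print(R)  # (2, 2), as in Example 169

# Q is the midpoint of P and R, which is what properties (a) and (b) amount to.
print(((P[0] + R[0]) / 2, (P[1] + R[1]) / 2))
```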

Exercise 76. What is the reflection of (8, 5) in (−2, 4)? (Answer on p. 1403.)

Remark 24. By the way, the above is an example of how maths often proceeds. We start
with our intuition of what a reflection of a point in a point ought to be. We then try
to write down a definition (in this case Definition 44) that formalises our intuition. We
then verify that our definition does indeed satisfy our intuition.
But as we go along, we may very well discover that our definition suffers from flaws or
contradictions. Or perhaps it is simply not the most convenient definition possible. If so,
we may decide to go back and write down a new definition.


7.2. The Reflection of a Point in a Graph

Example 170. Consider the point P = (−3, 2) and the line y = 2x + 3. You are told that
Q = (−1, 1) is the point in the line that’s closest to P .
Then the reflection of P in the line is simply the reflection of P in Q, which is:

R = (2 × (−1) − (−3) , 2 × 1 − 2) = (1, 0) .

[Figure: the line y = 2x + 3 and the points P = (−3, 2), Q = (−1, 1), and R = (1, 0).]


As before, let us formalise our intuition about what the reflection of a point in a graph is,
by writing down a formal definition:

Definition 45. Let P be a point and G be a graph. Let Q be the point in G that’s
closest to P . Then the reflection of P in G is the reflection of P in the point Q.

Note that the above definition is the general one, where G can be any sort of graph. But
to keep things simple, we will look only at cases where G is a line.
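
When G is a line ax + by + c = 0, the closest point Q can be computed directly. The Python sketch below uses the standard foot-of-the-perpendicular formula, which is not stated in this chapter and is taken here as an assumption. The function names are made up for illustration, and integer inputs are assumed so that exact fractions can be used.

```python
from fractions import Fraction


def closest_point_on_line(p, a, b, c):
    """Point on the line ax + by + c = 0 that is closest to p.

    Uses the standard foot-of-the-perpendicular formula (an assumption here,
    not a result proved in this chapter).
    """
    x0, y0 = p
    t = Fraction(a * x0 + b * y0 + c, a * a + b * b)
    return (x0 - a * t, y0 - b * t)


def reflect_in_line(p, a, b, c):
    """Reflection of p in the line, per Definitions 45 and 44."""
    qx, qy = closest_point_on_line(p, a, b, c)
    return (2 * qx - p[0], 2 * qy - p[1])


# Example 170: the line y = 2x + 3 is 2x - y + 3 = 0.
print(closest_point_on_line((-3, 2), 2, -1, 3))  # Q = (-1, 1)
print(reflect_in_line((-3, 2), 2, -1, 3))        # R = (1, 0)
```

With the line y = x written as x − y = 0, the same function swaps the coordinates of a point. For example, reflect_in_line((-2, 1), 1, -1, 0) gives (1, −2).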
Here are two simple but useful results:


Corollary 1. The reflection of the point (p, q) in the line y = x is the point (q, p).

Proof. See p. 1262 in the Appendices.

Example 171. The reflection of (−2, 1) in y = x is (1, −2). The reflection of (0, 1) in
y = x is (1, 0).

[Figure: the line y = x, showing the points (−2, 1) and (0, 1) and their reflections (1, −2) and (1, 0).]

Corollary 2. The reflection of the point (p, q) in the line y = −x is the point (−q, −p).

Proof. See p. 1262 in the Appendices.

Example 172. The reflection of (−2, 1) in y = −x is (−1, 2). The reflection of (0, 1) in y = −x is (−1, 0).

[Figure: the line y = −x, showing the points (−2, 1) and (0, 1) and their reflections (−1, 2) and (−1, 0).]

Exercise 77. Find the reflections of (3, 2) in y = x and y = −x. (Answer on p. 1403.)


7.3. Lines of Symmetry

Definition 46. Let G and L be graphs. The reflection of G in L is the set of points
obtained by reflecting every point in G in L.

Again, the above definition is the general one, where L can be any sort of graph. But again,
to keep things simple, we will look only at cases where L is a line:

Example 173. The reflection of y = x² in the line y = 2 is y = 4 − x².

[Figure: graphs of y = x² and y = 4 − x², which are reflections of each other in the line y = 2.]

Definition 47. Suppose the graph G is identical88 to its reflection in the line l. Then we say that G is symmetric in l, or that l is a line (or axis) of symmetry for G.

Example 174. The graph of y = x² is symmetric in the line x = 0 (also the y-axis).
Or equivalently: x = 0 is a line or axis of symmetry for y = x².

Footnote 88: The word identical is a synonym for the word equal.

Example 175. The graph of y = 1/x is symmetric in the lines y = x and y = −x.

[Figure: the graph together with the lines y = −x and y = x.]


Example 176. The graph of x² + y² = 1 is symmetric in every line through the origin. For example, it is symmetric in the lines y = 2x and y = −x.

[Figure: graph of x² + y² = 1 together with the lines y = −x and y = 2x.]

Exercise 78. What is the line of symmetry for y = x² + 2x + 2? (Answer on p. 1403.)


8. Solutions and Solution Sets
To solve an equation is to find all its solutions or equivalently, its solution set.

Example 177. Consider the equation x − 1 = 0 (x ∈ R).

It has one solution: 1. The equation’s solution set is {1}.

Example 178. Consider the equation x + 5 = 8 (x ∈ R).

It has one solution: 3. The equation’s solution set is {3}.

The above two examples were easy enough to solve. In the next chapter, we’ll review the solution to the quadratic equation ax² + bx + c = 0. For now, here’s a quick example:

Example 179. Consider the equation x² − 1 = 8 (x ∈ R).

Let’s see how graphs can help us solve this equation.
Graph y = x² − 1 and y = 8. We find that these two graphs intersect at x = −3 and x = 3.

[Figure: graphs of y = x² − 1 and y = 8, intersecting at (−3, 8) and (3, 8).]

Thus, the equation x² − 1 = 8 (x ∈ R) has two solutions: −3 and 3. Equivalently, its solution set is {−3, 3} or {±3}.


We won’t be learning to solve the cubic equation. Nonetheless, here’s a quick example:

Example 180. Consider the equation x³ − 3x = 3x − x² (x ∈ R).

Graph y = x³ − 3x and y = 3x − x². We find that these two graphs intersect at x = −3, x = 0, and x = 2.

[Figure: graphs of y = x³ − 3x and y = 3x − x², intersecting at (−3, −18), (0, 0), and (2, 2).]

Thus, the equation x³ − 3x = 3x − x² (x ∈ R) has three solutions: −3, 0, and 2. Equivalently, its solution set is {−3, 0, 2}.
By the way, don’t worry. We don’t know how to solve cubic equations and we won’t be
learning to do so.
However, we are required to know how to use a graphing calculator to find the solutions
to this and indeed just about any equation. We’ll learn how to do so later.
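
A graphing calculator finds such intersection points numerically. To give a rough idea of how this can be done, here is a Python sketch of the bisection method applied to Example 180. This is only one possible method and is not necessarily what your calculator does. The function names and the three starting intervals (read off the graph) are choices made here for illustration.

```python
def g(x):
    """Difference of the two sides of Example 180: (x^3 - 3x) - (3x - x^2)."""
    return (x**3 - 3 * x) - (3 * x - x**2)


def bisect(func, lo, hi, tol=1e-12):
    """Find a root of func in [lo, hi], given func(lo) and func(hi) differ in sign."""
    flo, fhi = func(lo), func(hi)
    if flo == 0:
        return lo
    if fhi == 0:
        return hi
    if flo * fhi > 0:
        raise ValueError("func must change sign on [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        fmid = func(mid)
        if fmid == 0:
            return mid
        if flo * fmid < 0:
            hi = mid              # the root lies in the left half
        else:
            lo, flo = mid, fmid   # the root lies in the right half
    return (lo + hi) / 2


# One interval around each intersection point seen in the graph.
for lo, hi in [(-4, -2.5), (-1, 1), (1, 3)]:
    print(round(bisect(g, lo, hi), 6))
```

The three printed values are (approximately) the solutions −3, 0, and 2.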


Like the above examples, most equations we’ll encounter will have only finitely many solu-
tions. But this isn’t always the case:

Example 181. Consider the equation sin x = 0 (x ∈ R).

It has infinitely many solutions — for every k ∈ Z, kπ is a solution. The equation’s
solution set is {kπ ∶ k ∈ Z}.

[Figure: graph of y = sin x, showing the x-intercepts (−2π, 0), (−π, 0), (0, 0), (π, 0), and (2π, 0).]


We now look at inequalities. To solve an inequality is to find all its solutions or equival-
ently, its solution set.
Inequalities usually have infinitely many solutions:

Example 182. Solve x − 1 > 0 (x ∈ R).

A perfectly good answer is, “x > 1”.

Another perfectly good answer is, “The solution set of the inequality is (1, ∞).” That is, every real number greater than 1 is a solution.

Example 183. Solve x + 5 ≥ 8 (x ∈ R).

A perfectly good answer is, “x ≥ 3”.

Another perfectly good answer is, “The solution set of the inequality is [3, ∞).” That is, every real number greater than or equal to 3 is a solution.

The above two inequalities were easy enough to solve. In Ch. 24.2, we’ll learn the general solution to the quadratic inequality ax² + bx + c > 0. For now, here’s a quick example:

Example 184. Solve x² − 1 > 8 (x ∈ R).

Again, let’s see how graphs can help us solve this inequality. Graph y = x² − 1 and y = 8. We find that the graph of y = x² − 1 is above that of y = 8 in the light-red regions indicated below (but not including the dotted red lines).
Thus, the solution is x < −3 or x > 3. Equivalently, the solution set is (−∞, −3) ∪ (3, ∞) = R ∖ [−3, 3]. In words, every real number except those between −3 and 3 (inclusive) is a solution.

[Figure: graphs of y = x² − 1 and y = 8, intersecting at (−3, 8) and (3, 8), with the regions x < −3 and x > 3 shaded light red.]


Example 185. Solve x³ − 3x ≥ 3x − x² (x ∈ R).
Graph y = x³ − 3x and y = 3x − x². We find that the graph of y = x³ − 3x is on or above that of y = 3x − x² in the light-red regions (including the solid red lines) indicated below.
Thus, the solution is −3 ≤ x ≤ 0 or x ≥ 2. We may also write that the solution set is [−3, 0] ∪ [2, ∞). In words, every real number between −3 and 0 (inclusive) and every real number greater than or equal to 2 is a solution.

[Figure: graphs of y = x³ − 3x and y = 3x − x², intersecting at (−3, −18), (0, 0), and (2, 2), with the regions −3 ≤ x ≤ 0 and x ≥ 2 shaded light red.]

Here are the formal definitions of a solution and a solution set:

Definition 48. Given an equation (or inequality) involving a single variable, any number
that satisfies the equation (or inequality) is called its solution. And the set of all such
solutions is called its solution set.

For now, we’ll be dealing only with real numbers. And so for now, we’ll be looking only
at real solutions and real solution sets. But just so you know, here’s an example of an
equation with solutions that are not real:

Example 186. The equation x² + 1 = 0 (x ∈ R) has no solution and its solution set is ∅ (the empty set). But this is only because we’ve specified that x ∈ R.
In contrast, the equation x² + 1 = 0 (x ∈ C) has two solutions: −i and i. And its solution set is {−i, i}. We’ll learn more about this in Part IV.


9. O-Level Review: The Quadratic Equation y = ax² + bx + c
Consider the quadratic equation:89

y = ax² + bx + c (a ≠ 0).

(If a = 0, then the equation is not quadratic but linear: y = bx + c.)

Let’s find the roots of the quadratic equation (i.e. the values of x for which y = 0):

0 = ax² + bx + c.

Divide both sides by a ≠ 0: 0 = x² + (b/a)x + c/a.

Add and subtract b²/(4a²): 0 = x² + (b/a)x + b²/(4a²) − b²/(4a²) + c/a.

Complete the square: 0 = (x + b/(2a))² − b²/(4a²) + c/a.

Rearrange: (x + b/(2a))² = b²/(4a²) − c/a = (b² − 4ac)/(4a²).

Take the square root: x + b/(2a) = ±√((b² − 4ac)/(4a²)) = (±√(b² − 4ac)) / (2a).

Rearrange to get the quadratic formula (i.e. the two roots of the quadratic equation):

x = (−b ± √(b² − 4ac)) / (2a).

The quadratic formula is not on the List of Formulae (MF26) and so sadly, you’ll have to
memorise it. Unfortunately, I’ve never come across a good mnemonic that works (for me).
Look for one that works for you. (Lemme know if you think you’ve found a good one!)

Footnote 89: Technically, this is a quadratic equation in two variables. And technically, when we simply speak of the quadratic equation, we’re referring to ax² + bx + c = 0. But here we shall be a little sloppy and also call y = ax² + bx + c “the” quadratic equation.

One mnemonic is to sing the quadratic formula to the tune of Pop Goes the Weasel.90
Arranged by a musical genius so that each syllable matches each note:


x e-quals to ne-ga-tive b,

Plus or mi-nus square root,

b squared mi-nus four a c,

All! O-ver two a.

Definition 49. The discriminant of the quadratic equation y = ax² + bx + c is b² − 4ac.

On the next page, Fact 20 summarises the key features of the quadratic equation and also
reviews some of the concepts we’ve gone through in previous chapters. Six examples follow.

I checked out about ten versions on YouTube and unfortunately I found all of them to be very annoying
and cannot recommend them. Maybe I’ll make one — don’t worry, I won’t be the one singing.
Fact 20. Given the quadratic equation y = ax² + bx + c,
1. The y-intercept is (0, c).
2. The sign of the discriminant determines the number of x-intercepts:
(a) If b² − 4ac > 0, then there are two x-intercepts (i.e. two real roots):

$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}.$

We can factorise the quadratic polynomial:

$ax^2 + bx + c = a\left(x - \frac{-b + \sqrt{b^2 - 4ac}}{2a}\right)\left(x - \frac{-b - \sqrt{b^2 - 4ac}}{2a}\right).$

(b) If b² − 4ac = 0, then there is one x-intercept (i.e. one real root), where the graph
just touches the x-axis:

$x = -\frac{b}{2a}.$

We can factorise the quadratic polynomial:

$ax^2 + bx + c = a\left(x + \frac{b}{2a}\right)^2.$

(c) If b² − 4ac < 0, then there are no x-intercepts (i.e. no real roots). There is also
no way to factorise the quadratic polynomial ax² + bx + c (unless we use complex
numbers).
3. There is one line of symmetry, which is vertical:

$x = -\frac{b}{2a}.$

4. There is one turning point:

$\left(-\frac{b}{2a}, -\frac{b^2}{4a} + c\right).$

5. The sign of a (the coefficient on x²) determines the shape of the graph:

(a) If a > 0, then the graph is ∪-shaped and the turning point is a strict global minimum.
(b) If a < 0, then the graph is ∩-shaped and the turning point is a strict global maximum.

Proof. See p. 1265 in the Appendices.
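
The key features listed in Fact 20 can also be computed mechanically. Below is a minimal Python sketch, assuming the coefficients are integers or fractions so that exact arithmetic is possible. The function name key_features and the dictionary keys are my own (hypothetical) choices.

```python
from fractions import Fraction

def key_features(a, b, c):
    """Key features of y = ax^2 + bx + c (a != 0), computed exactly."""
    a, b, c = Fraction(a), Fraction(b), Fraction(c)
    if a == 0:
        raise ValueError("not a quadratic equation: a must be non-zero")
    disc = b * b - 4 * a * c                 # the discriminant
    if disc > 0:
        num_x_intercepts = 2
    elif disc == 0:
        num_x_intercepts = 1
    else:
        num_x_intercepts = 0
    axis = -b / (2 * a)                      # line of symmetry x = -b/(2a)
    return {
        "y_intercept": (Fraction(0), c),
        "discriminant": disc,
        "num_x_intercepts": num_x_intercepts,
        "line_of_symmetry": axis,
        "turning_point": (axis, c - b * b / (4 * a)),
        "shape": "U-shaped (minimum)" if a > 0 else "n-shaped (maximum)",
    }

for name, value in key_features(1, 3, 1).items():
    print(name, "=", value)
```

The loop prints the features of y = x² + 3x + 1. Fractions are used instead of decimals so that coordinates like −5/4 come out exactly.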

We can distinguish between six cases of the quadratic equation, depending on whether ...

(a) b2 − 4ac ⪋ 0 and (b) a ≶ 0.

Here are six examples to illustrate the six cases:


Example 187. Consider the quadratic equation y = x² + 3x + 1.

[Figure: graph of y = x² + 3x + 1, showing the line of symmetry x = −3/2, the x-intercepts A and C, the turning point B, and the y-intercept D.]

1. There is one y-intercept: D = (0, 1).

2. The equation y = x² + 3x + 1 has two real roots:

$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a} = \frac{-3 \pm \sqrt{3^2 - 4 \times 1 \times 1}}{2 \times 1} = \frac{-3 \pm \sqrt{5}}{2}.$

And so, there are two x-intercepts:

$A = \left(\frac{-3 - \sqrt{5}}{2}, 0\right)$ and $C = \left(\frac{-3 + \sqrt{5}}{2}, 0\right).$

We can factorise the quadratic polynomial:

$x^2 + 3x + 1 = \left(x - \frac{-3 - \sqrt{5}}{2}\right)\left(x - \frac{-3 + \sqrt{5}}{2}\right).$

3. There is one (vertical) line of symmetry: $x = -\frac{b}{2a} = -\frac{3}{2 \times 1} = -\frac{3}{2}.$
4. The one turning point is:

$B = \left(-\frac{b}{2a}, c - \frac{b^2}{4a}\right) = \left(-\frac{3}{2}, 1 - \frac{3^2}{4 \times 1}\right) = \left(-\frac{3}{2}, -\frac{5}{4}\right).$

5. Since the coefficient 1 on x² is positive, the graph is ∪-shaped, with the turning point
being the strict global minimum.


Example 188. Consider the quadratic equation y = x² + 2x + 1.

[Figure: graph of y = x² + 2x + 1, showing the line of symmetry x = −1, the x-intercept A, and the y-intercept B.]

1. There is one y-intercept: B = (0, 1).

2. The equation y = x² + 2x + 1 has one real root:

$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a} = \frac{-2 \pm \sqrt{2^2 - 4 \times 1 \times 1}}{2 \times 1} = \frac{-2 \pm \sqrt{0}}{2} = -1.$

And so, there is one x-intercept, where the graph just touches the x-axis:

A = (−1, 0).

We can factorise the quadratic polynomial:

$x^2 + 2x + 1 = [x - (-1)]^2 = (x + 1)^2.$

3. There is one (vertical) line of symmetry: $x = -\frac{b}{2a} = -\frac{2}{2 \times 1} = -1.$
4. In general, if a quadratic equation has only one real root, then the turning point is
also the x-intercept:

$A = \left(-\frac{b}{2a}, c - \frac{b^2}{4a}\right) = \left(-1, 1 - \frac{2^2}{4 \times 1}\right) = (-1, 0).$

5. Since the coefficient 1 on x² is positive, the graph is ∪-shaped, with the turning point
being the strict global minimum.


Example 189. Consider the quadratic equation y = x² + x + 1.

[Figure: graph of y = x² + x + 1, showing the line of symmetry x = −1/2, the turning point A, and the y-intercept B.]

1. There is one y-intercept: B = (0, 1).

2. The equation y = x² + x + 1 has no real roots:

$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a} = \frac{-1 \pm \sqrt{1^2 - 4 \times 1 \times 1}}{2 \times 1} = \frac{-1 \pm \sqrt{-3}}{2}.$

And so, there are no x-intercepts. We cannot factorise the quadratic polynomial
(unless we use complex numbers).

3. There is one (vertical) line of symmetry: $x = -\frac{b}{2a} = -\frac{1}{2 \times 1} = -\frac{1}{2}.$
4. The one turning point is:

$A = \left(-\frac{b}{2a}, c - \frac{b^2}{4a}\right) = \left(-\frac{1}{2}, 1 - \frac{1^2}{4 \times 1}\right) = \left(-\frac{1}{2}, \frac{3}{4}\right).$

5. Since the coefficient 1 on x² is positive, the graph is ∪-shaped, with the turning point
being the strict global minimum.


Example 190. Consider the quadratic equation y = −x² + 3x − 1.

[Figure: graph of y = −x² + 3x − 1, showing the y-intercept A, the x-intercepts B and D, and the turning point C.]

1. There is one y-intercept: A = (0, −1).

2. The equation y = −x² + 3x − 1 has two real roots:

$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a} = \frac{-3 \pm \sqrt{3^2 - 4 \times (-1) \times (-1)}}{2 \times (-1)} = \frac{-3 \pm \sqrt{5}}{-2} = \frac{3 \mp \sqrt{5}}{2}.$

And so, there are two x-intercepts:

$B = \left(\frac{3 - \sqrt{5}}{2}, 0\right)$ and $D = \left(\frac{3 + \sqrt{5}}{2}, 0\right).$

We can factorise the quadratic polynomial:

$-x^2 + 3x - 1 = -\left(x - \frac{3 - \sqrt{5}}{2}\right)\left(x - \frac{3 + \sqrt{5}}{2}\right).$

3. There is one (vertical) line of symmetry: $x = -\frac{b}{2a} = -\frac{3}{2 \times (-1)} = \frac{3}{2}.$
4. The one turning point is:

$C = \left(-\frac{b}{2a}, c - \frac{b^2}{4a}\right) = \left(\frac{3}{2}, -1 - \frac{3^2}{4 \times (-1)}\right) = \left(\frac{3}{2}, \frac{5}{4}\right).$

5. Since the coefficient −1 on x² is negative, the graph is ∩-shaped, with the turning point
being the strict global maximum.


Example 191. Consider the quadratic equation y = −x² + 2x − 1.

[Figure: graph of y = −x² + 2x − 1, showing the y-intercept A and the x-intercept B.]

1. There is one y-intercept: A = (0, −1).

2. The equation y = −x² + 2x − 1 has one real root:

$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a} = \frac{-2 \pm \sqrt{2^2 - 4 \times (-1) \times (-1)}}{2 \times (-1)} = \frac{-2 \pm \sqrt{0}}{-2} = 1.$

And so, there is one x-intercept, which is also where the graph just touches the x-axis:

B = (1, 0).

We can factorise the quadratic polynomial:

$-x^2 + 2x - 1 = -(x - 1)^2.$

3. There is one (vertical) line of symmetry: $x = -\frac{b}{2a} = -\frac{2}{2 \times (-1)} = 1.$
4. In general, if a quadratic equation has only one real root, then the turning point is
also the x-intercept:

$B = \left(-\frac{b}{2a}, c - \frac{b^2}{4a}\right) = \left(1, -1 - \frac{2^2}{4 \times (-1)}\right) = (1, 0).$

5. Since the coefficient −1 on x² is negative, the graph is ∩-shaped, with the turning point
being the strict global maximum.


Example 192. Consider the quadratic equation y = −x² + x − 1.

[Figure: graph of y = −x² + x − 1, showing the y-intercept A and the turning point B.]

1. There is one y-intercept: A = (0, −1).

2. The equation y = −x² + x − 1 has no real roots:

$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a} = \frac{-1 \pm \sqrt{1^2 - 4 \times (-1) \times (-1)}}{2 \times (-1)} = \frac{-1 \pm \sqrt{-3}}{-2}.$

And so, there are no x-intercepts. We cannot factorise the quadratic polynomial
(unless we use complex numbers).

3. There is one (vertical) line of symmetry: $x = -\frac{b}{2a} = -\frac{1}{2 \times (-1)} = \frac{1}{2}.$
4. The one turning point is:

$B = \left(-\frac{b}{2a}, c - \frac{b^2}{4a}\right) = \left(\frac{1}{2}, -1 - \frac{1^2}{4 \times (-1)}\right) = \left(\frac{1}{2}, -\frac{3}{4}\right).$

5. Since the coefficient −1 on x² is negative, the graph is ∩-shaped, with the turning point
being the strict global maximum.


Our six examples put together in one figure:

[Figure: the graphs of y = x² + x + 1, y = x² + 2x + 1, y = x² + 3x + 1, y = −x² + 3x − 1, y = −x² + 2x − 1, and y = −x² + x − 1, drawn on the same axes.]

Exercise 79. Sketch each graph, identifying any intercepts, lines of symmetry, turning
points, and extrema.

(a) y = 2x² + x + 1.
(b) y = −2x² + x + 1.
(c) y = x² + 4x + 4. (Answers on p. 1404.)


10. Functions
Undoubtedly the most important concept in all of mathematics is that of a
function—in almost every branch of modern mathematics functions turn out
to be the central objects of investigation.

— Michael Spivak (1994, Calculus).

In secondary school, you probably learnt to describe functions like this:

“Let f (x) = 2x be a function.”

Strictly speaking,⁹¹ this description of functions is incorrect and suffers from (at least) two
big problems:
1. It fails to make any mention of the domain and the codomain.
But whenever we specify a function, we must also specify the domain and codomain.⁹²
2. It incorrectly suggests that a function must always be some sort of a “formula”.
But it needn’t be. A function simply maps or assigns every element in the domain to
(exactly) one element in the codomain. There need be nothing logical or formulaic about
how this mapping or assignment is done. We will illustrate this important point with many
examples below.
Here is the correct way to describe functions:⁹³

Definition 50. A function consists of three pieces:

1. A set called the domain;

2. A set called the codomain; and
3. A mapping rule that maps or assigns every element in the domain to (exactly) one
element in the codomain.

Two very simple examples of functions:

⁹¹ Pedagogical note: H2 Maths is equivalent to first- and even second-year university courses in many
countries. (Well, at least on paper. See my Preface/Rant.) My view is that while at earlier levels,
the above incorrect description of a function may have been suitable, at this level, it is no longer so.
Definition 50 is not at all difficult (especially when compared with a lot of the junk that’s already in H2
Maths). At the cost of very little additional pain and time, the student gains a far better understanding
of what functions are and how they work, and thus saves herself more grief in the long run.
⁹² Things become so much simpler if we make it clear from the outset and indeed insist that the very
definition of a function includes the specification of a domain and codomain. Instead we have the
present rigmarole where we ask students to explain which values “to exclude” from the domain, as if we
were doing some ad hoc repair to make the function “work”.
⁹³ Definition 50 will serve us very well. Note though that it is still not quite correct! The formal and
correct definition of a function is that it is a set. See Definition 226 (Appendices).
Example 193. Let f be the function with:

Domain: The set {1, 5, 3}.

Codomain: The set {100, 200}.
Mapping rule: f (1) = 100, f (5) = 100, and f (3) = 200.
f is a well-defined function, because every element in the domain “hits” (exactly) one
element in the codomain — 1 “hits” 100, 5 “hits” 100, and 3 “hits” 200. (Note: “Hits” is
an informal term. The correct and formal term is maps to.)
Observe that there is no apparent logic or formula behind how the mapping rule works.
Why does 1 hit 100, 5 hit 100, and 3 hit 200?
But this doesn’t matter. To qualify as a well-defined function, all we require is that
f maps every element in the domain to (exactly) one element in the codomain. And
it does. Thus, f is a well-defined function. We neither know nor care why 1 hits 100,
5 hits 100, and 3 hits 200.

[Figure: arrow diagram of the function f, from the domain {1, 5, 3} to the codomain {100, 200}.]

Example 194. Let g be the function with:

Domain: The set {1, 5, 3}.

Codomain: The set {100, 200}.
Mapping rule: g (1) = 100, g (5) = 100, and g (3) = 100.

Observe that the element 200 in the codomain is not “hit”.
Nonetheless, g is again a well-defined function, because it satisfies the requirement that
every element in the domain “hits” exactly one element in the codomain. Writing D for
the domain and C for the codomain: 1 ∈ D “hits” 100 ∈ C, 5 ∈ D “hits” 100 ∈ C,
and 3 ∈ D “hits” 100 ∈ C.
There is no requirement that every element in the codomain be “hit”.
And so here, even though 200 ∈ C isn’t “hit”, g is a well-defined function.

[Figure: arrow diagram of the function g, from the domain {1, 5, 3} to the codomain {100, 200}.]


The concept of a function is of great generality. To qualify as a function, all we require is
that every element in the domain be mapped to (exactly) one element in the codomain.
In each of our above examples, both the domain and codomain were sets of real numbers.
But in general, the domain and codomain could be any sets whatsoever. Examples:

Example 195. Let h be the function with:

Domain: The set {Cow, Chicken}.

Codomain: The set {Produces eggs, Guards the home, Produces milk}.
Mapping rule: “Match the animal to its role.”
Here the mapping rule is informally stated in words.
Nonetheless, it is clear enough. If we wanted to, we could write it out formally, like so:

h (Cow) = Produces milk,

h (Chicken) = Produces eggs.

The function h is well-defined, because it satisfies the requirement that every
element in the domain “hits” exactly one element in the codomain.

[Figure: arrow diagram of the function h, from the domain {Cow, Chicken} to the codomain {Produces eggs, Guards the home, Produces milk}.]

In the above example, the mapping rule “makes sense” — we simply map each animal to
its role. In the next example, the mapping rule “makes no sense”. Nonetheless, we have a
perfectly well-defined function all the same:

Example 196. Let i be the function with:

Domain: The set {Cow, Chicken}.

Codomain: The set {Produces eggs, Guards the home, Produces milk}.
Mapping rule: i (Cow) = Produces eggs, i (Chicken) = Guards the home.

This time, the mapping rule “makes no sense”: it maps Cow to Produces eggs and
Chicken to Guards the home.
Nonetheless, this is a well-defined function, because it maps every element in the
domain to (exactly) one element in the codomain.

[Figure: arrow diagram of the function i, from the domain {Cow, Chicken} to the codomain {Produces eggs, Guards the home, Produces milk}.]


Another pair of examples:

Example 197. Let j be the function with:

Domain: The set of UN member states.

Codomain: The set of ISO three-letter country codes.
Mapping rule: “Match the state to its code.”
The domain, codomain, and mapping rule are all informally stated in words. Nonetheless,
they are clear enough.
The domain contains exactly 193 elements, namely Afghanistan, Albania, Algeria, ...,
Zambia, and Zimbabwe (list). The codomain contains exactly 249 elements, namely
ABW, AFG, AGO, AIA, ALA, ..., ZMB, and ZWE (list).⁹⁴
And the mapping rule maps every element in the domain to (exactly) one element in the
codomain. For example:

j (The United States of America) = USA,

j (China) = CHN,
j (Singapore) = SGP.

Altogether then, j is a well-defined function, because it satisfies the requirement that

every element in the domain “hits” exactly one element in the codomain.

Again, a very similar example, but this time the mapping rule “makes no sense”:

Example 198. Let k be the function with:

Domain: The set of UN member states.

Codomain: The set of ISO three-letter country codes.
Mapping rule: For every x, k (x) = SGP.
The function k simply maps every UN member state to the three-letter code SGP. This
is strange and seems to make no sense. Nonetheless, k is a well-defined function, because
it satisfies the requirement that every element in the domain “hits” exactly one element
in the codomain.

So! Specifying a function is jolly simple. Simply write down:

1. The domain (could be any set);
2. The codomain (likewise, could be any set); and
3. The mapping rule, making sure that it maps every element in the domain to (exactly)
one element in the codomain.
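
To make the three pieces concrete, here is a minimal Python sketch for functions with finite domains. The class name Function and its field names are my own (hypothetical), and the mapping rule is written out explicitly as a dictionary, which automatically gives each input exactly one output.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Function:
    domain: frozenset
    codomain: frozenset
    rule: dict          # the mapping rule, written explicitly as input -> output

    def __post_init__(self):
        # Every element of the domain must be mapped (and nothing else) ...
        if set(self.rule) != set(self.domain):
            raise ValueError("the rule must map every element of the domain")
        # ... to an element of the codomain.
        if not set(self.rule.values()) <= set(self.codomain):
            raise ValueError("every output must lie in the codomain")

    def __call__(self, x):
        return self.rule[x]

# Example 193's f and Example 196's i
f = Function(frozenset({1, 5, 3}), frozenset({100, 200}), {1: 100, 5: 100, 3: 200})
i = Function(
    frozenset({"Cow", "Chicken"}),
    frozenset({"Produces eggs", "Guards the home", "Produces milk"}),
    {"Cow": "Produces eggs", "Chicken": "Guards the home"},
)
print(f(5))        # 100
print(i("Cow"))    # Produces eggs
```

Notice that the sketch happily accepts i, even though its mapping rule “makes no sense”. All it checks is that every element in the domain is mapped to an element in the codomain.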

⁹⁴ The reason there are more ISO three-letter country codes than UN members is that not every “country”
is a UN member. For example, Greenland is assigned the ISO code GRL, but is a territory of Denmark
and is not a UN member state. The Holy See (or Vatican City) is a fully sovereign state and is
assigned the ISO code VAT, but is not a UN member state. Taiwan is, for all intents and purposes, a
fully sovereign state and is assigned the ISO code TWN, but is not a UN member state (thanks to a big
bully and evil empire next door).
Now, a function consists of the above three pieces. Hence — and here’s a somewhat subtle
point — two functions are identical (i.e. equal) if and only if they have the same domain,
codomain, and mapping rule.

Example 199. Recall from above the function h:

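[Figure: arrow diagram of the function h, from the domain {Cow, Chicken} to the codomain {Produces eggs, Guards the home, Produces milk}.]

Now consider the very similar-looking function h1 :

[Figure: arrow diagram of the function h1, from the domain {Cow, Chicken} to the codomain {Produces eggs, Produces milk}.]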

The functions h and h1 look very similar. Indeed, they have the same domain and
mapping rule.
However, their codomains are different. And so h and h1 are distinct:

h ≠ h1 .

One might think, “Aiyah, the codomain not very important wat. Both h and h1 map Cow
to Produces milk and Chicken to Produces eggs. So just call them the same function
lah!” But this thinking is wrong.
Again, to reiterate, stress, and emphasise, a function consists of three pieces: the domain,
the codomain, and the mapping rule. Two functions are identical if and only if they have
the same domain, codomain, and mapping rule.


Example 200. Let l be the function with:

Domain: The set {1, 2}.

Codomain: The set {1, 2, 3, 4}.
Mapping rule: For every x, l (x) = 2x.

[Figure: arrow diagram of the function l, from the domain {1, 2} to the codomain {1, 2, 3, 4}.]

Now consider the very similar-looking function l1 , which has:

Domain: The set {1, 2}.

Codomain: The set {1, 2, 4}.
Mapping rule: For every x, l1 (x) = 2x.

[Figure: arrow diagram of the function l1, from the domain {1, 2} to the codomain {1, 2, 4}.]

Again, the functions l and l1 look very similar — they have the same domain and mapping
rule. However, their codomains are different. And so, l and l1 are distinct:

l ≠ l1 .
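
The same comparison can be sketched in Python by storing each function as a (domain, codomain, mapping rule) triple. This representation and the name same_function are my own, purely for illustration.

```python
def same_function(f, g):
    """f and g are (domain, codomain, rule) triples, with the rule as a dict."""
    return f == g    # triples are equal only if all three pieces are equal

# Example 200's l and l1
l = ({1, 2}, {1, 2, 3, 4}, {1: 2, 2: 4})
l1 = ({1, 2}, {1, 2, 4}, {1: 2, 2: 4})

print(l[0] == l1[0])         # same domain?
print(l[2] == l1[2])         # same mapping rule?
print(same_function(l, l1))  # same function?
```

The first two comparisons succeed but the last fails, because the codomains differ.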


10.1. What Functions Aren’t
To better understand what functions are, we’ll now look at what functions aren’t. That is,
we’ll now look at examples of non-functions:

Example 201. It is alleged that f is the function with:

Domain: The set {1, 5, 3}.

Codomain: The set {100, 200}.
Mapping rule: f (1) = 100 and f (3) = 200.

[Figure: arrow diagram of the alleged function f.]

Unfortunately, f is not a function, because f fails to map every element in the domain
to (exactly) one element in the codomain — in particular, f fails to map 5 to any element
in the codomain.


Example 202. It is alleged that g is the function with:

Domain: The set {1, 5, 3}.

Codomain: The set {100, 200}.
Mapping rule: g (1) = 100, g (1) = 200, g (5) = 100, and g (3) = 200.

[Figure: arrow diagram of the alleged function g.]

Unfortunately, g is not a function, because g fails to map every element in the domain
to (exactly) one element in the codomain — in particular, g maps 1 to two elements,
namely 100 and 200.
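
The two ways of failing seen in the last two examples can be detected mechanically when the domain is finite. Here is a minimal Python sketch, in which an alleged mapping rule is given as a list of (input, output) pairs. The function name check_alleged_function and the wording of its messages are my own.

```python
def check_alleged_function(domain, codomain, pairs):
    """Return a list of problems; an empty list means the rule is well-defined."""
    problems = []
    for x in sorted(domain):
        images = {y for (u, y) in pairs if u == x}   # everything x is mapped to
        if len(images) == 0:
            problems.append(f"{x} is not mapped to anything")
        elif len(images) > 1:
            problems.append(f"{x} is mapped to more than one element: {sorted(images)}")
        elif not images <= set(codomain):
            problems.append(f"{x} is mapped outside the codomain")
    return problems

domain, codomain = {1, 5, 3}, {100, 200}
problems_201 = check_alleged_function(domain, codomain, [(1, 100), (3, 200)])
problems_202 = check_alleged_function(
    domain, codomain, [(1, 100), (1, 200), (5, 100), (3, 200)]
)
print(problems_201)
print(problems_202)
```

For Example 201 the sketch reports that 5 is not mapped to anything, and for Example 202 that 1 is mapped to more than one element.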

Example 203. It is alleged that h is the function with:

Domain: The set of ISO three-letter country codes.

Codomain: The set of UN member states.
Mapping rule: “Match the code to the corresponding state.”
Unfortunately, h is not a function, because h fails to map every element in the domain
to (exactly) one element in the codomain.
For example, h fails to map the ISO three-letter country code VAT to any element in
the codomain. This is because VAT corresponds to the Holy See (or the Vatican City),
which is not a UN member state and thus not an element of the codomain.

Example 204. It is alleged that i is the function with:

Domain: The set of real numbers R.

Codomain: The set of real numbers R.

Mapping rule: For all x, i (x) = √x.
Unfortunately, i is not a function, because i fails to map every element in the domain
to (exactly) one element in the codomain.

For example, i fails to map −1 in the domain to any element in the codomain. This is
because √−1 is not a real number.


10.2. Notation for Functions
Let’s denote the domain and codomain of the function f by Domain(f ) and Codomain(f ).

Example 205. Let f be the function with:

Domain: The set {3, 4}.

Codomain: The set {5, 6, 7, 8}.
Mapping rule: “Double it.”

[Figure: arrow diagram of the function f, from the domain {3, 4} to the codomain {5, 6, 7, 8}.]

We have: Domain(f ) = {3, 4},

Codomain(f ) = {5, 6, 7, 8}.

There is usually more than one way to write down a mapping rule. The mapping
rule in the above example was written informally as “Double it”. But if we wanted to, it
could’ve been written more formally as:

“For every x ∈ Domain(f ), we have f (x) = 2x.”

We could also have written the mapping rule explicitly, stating what each element in the
domain is to be mapped to:

“f (3) = 6, f (4) = 8.”

The mathematical punctuation mark ↦ means maps to. And so, here’s yet another way
to write the mapping rule:

“f ∶ 3 ↦ 6 and f ∶ 4 ↦ 8.”

So, altogether, in this example alone, we’ve given four different but entirely equivalent ways
of writing out the mapping rule. You can choose to write the mapping rule however you
like. What’s important is that you make clear how the mapping rule maps each element in
the domain to (exactly) one element in the codomain. If you haven’t made it sufficiently
clear, then you have failed to communicate to others what your function is and your function
is not well-defined.
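
As a small illustration that these are merely different ways of writing the same mapping rule, here is a minimal Python sketch comparing the formula with the explicit list. The names double_it and explicit are my own.

```python
domain = {3, 4}

def double_it(x):
    """The mapping rule written as a formula: f(x) = 2x."""
    return 2 * x

# The same mapping rule written out explicitly: f(3) = 6, f(4) = 8.
explicit = {3: 6, 4: 8}

# The two descriptions agree at every element of the domain,
# so they describe the same mapping rule.
agree = all(double_it(x) == explicit[x] for x in domain)
print(agree)
```

What matters is not how the rule is written, but what each element in the domain is mapped to.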
Here are eight ways to say aloud f (4) = 8 or f ∶ 4 ↦ 8:

“f of 4 is 8.” “f of 4 equals 8.” “f of 4 is equal to 8.”

“f maps 4 to 8.” “The value of f at 4 is 8.” “f evaluated at 4 is 8.”

“The value of f when applied to 4 is 8.” “The image of 4 under f is 8.”

If x ∉ Domain(f ), then f (x) is simply undefined. So in the last example, 0 ∉ Domain(f );
and thus, f (0) is simply undefined.

Example 206. Let g be the function with:

Domain: The set of positive integers Z+.

Codomain: The set of real numbers R.
Mapping rule: “Double it.”
Then we have, for example:

g(1) = 2, g(2) = 4, and g(3) = 6.

Or equivalently: g ∶ 1 ↦ 2, g ∶ 2 ↦ 4, and g ∶ 3 ↦ 6.

Since −1, 1.5, π, 0 ∉ Domain(g), the values g(−1), g(1.5), g(π), and g (0) are undefined.
Likewise, since Cow, Chicken ∉ Domain(g), g (Cow) and g (Chicken) are undefined.

The mapping rule was given informally above. We could’ve written it more formally as:

For every x ∈ Z+, we have g (x) = 2x.

With the aid of an ellipsis, we could also have written it down explicitly:⁹⁵

“g(1) = 2, g(2) = 4, g(3) = 6, g(4) = 8, . . .”

Or equivalently: g ∶ 1 ↦ 2, g ∶ 2 ↦ 4, g ∶ 3 ↦ 6, g ∶ 4 ↦ 8, . . .
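
Here is a minimal Python sketch of what “undefined outside the domain” looks like in practice, taking the domain of g to be the positive integers. Raising a ValueError is simply my (assumed) way of signalling “undefined”.

```python
def g(x):
    """Example 206's g: domain = positive integers, mapping rule = 'Double it'."""
    # bool is excluded because Python treats True as the integer 1.
    in_domain = isinstance(x, int) and not isinstance(x, bool) and x > 0
    if not in_domain:
        raise ValueError(f"g({x!r}) is undefined: {x!r} is not in the domain")
    return 2 * x

print(g(1), g(2), g(3))
for x in (-1, 1.5, 0, "Cow"):
    try:
        g(x)
    except ValueError as err:
        print(err)
```

Note that 2 × 1.5 = 3 is a perfectly good number, but g(1.5) is still undefined, because 1.5 is not in the domain of g.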

⁹⁵ But in general, this may not be possible.
So far, we’ve written down functions with the aid of tables and/or figures. But going
forward, we’ll want to write down functions more concisely and without the aid of
tables or figures.
Here are nine entirely equivalent and formal ways to write down the function g from the
last example:
1. Let g be the function that maps every element x in the domain Z+ to the element 2x in
the codomain R.
2. Let g be the function that has domain Z+, codomain R, and maps every x ∈ Z+ to 2x ∈ R.
3. Let g ∶ Z+ → R be the function defined by g (x) = 2x.
4. Let g ∶ Z+ → R be the function defined by g ∶ x ↦ 2x.
5. Let g ∶ Z+ → R be defined by g (x) = 2x.
6. Let g ∶ Z+ → R be defined by g ∶ x ↦ 2x.
7. Define g ∶ Z+ → R by g (x) = 2x.
8. Define g ∶ Z+ → R by g ∶ x ↦ 2x.
9. Define g ∶ Z+ → R by x ↦ 2x.
In Statements 3–9, the domain comes after the colon, the codomain after the → arrow,
and the mapping rule at the end of the statement.
The general version of Statement 9 is:

Define f ∶ A → B by x ↦ f (x).

In words, the function f is defined to have domain A, codomain B, and mapping rule
x ↦ f (x). (Note that once again, x here is merely a dummy or placeholder variable, that
could’ve been replaced by any other symbol like y, z, ☺, or ☀.)
Three examples on the next page:


Example 207. Define h ∶ R+ → R by h (x) = 2x.
The domain of h is R+ (the set of positive real numbers). The codomain is R (the set of
real numbers). Expressed informally in words, the mapping rule is “Double it”.

We have, for example: h(1) = 2, h(2.3) = 4.6, and h(3) = 6.

Or equivalently: h ∶ 1 ↦ 2, h ∶ 2.3 ↦ 4.6, and h ∶ 3 ↦ 6.

Observe that −1, 0, Cow, Chicken ∉ R+ = Domain(h). Thus, the following are all undefined:

h (−1), h (0), h (Cow), and h (Chicken).

Example 208. Define i ∶ Z → R by i (x) = x².

The domain of i is Z (the set of integers). The codomain is R (the set of real numbers).
Expressed informally in words, the mapping rule is “Square it”.

We have, for example: i(1) = 1, i(7) = 49, and i(−6) = 36.

Or equivalently: i ∶ 1 ↦ 1, i ∶ 7 ↦ 49, and i ∶ −6 ↦ 36.

Observe that 1.5, π, Cow, Chicken ∉ Z = Domain(i). Thus, the following are all undefined:

i (1.5), i (π), i (Cow), and i (Chicken).

Example 209. Define j ∶ R+ → R by j (x) = x².

The domain of j is R+ (the set of positive real numbers). The codomain is R (the set of
real numbers). Expressed informally in words, the mapping rule is “Square it”.

We have, for example: j(1) = 1, j(2.2) = 4.84, and j(6) = 36.

Or equivalently: j ∶ 1 ↦ 1, j ∶ 2.2 ↦ 4.84, and j ∶ 6 ↦ 36.

Observe that −1, −3.2, Cow, Chicken ∉ R+ = Domain(j). Thus, the following are all undefined:

j (−1), j (−3.2), j (Cow), and j (Chicken).

(By the way, in the above examples, is g = h? And is i = j?)⁹⁶

⁹⁶ The functions g and h have different domains and so g ≠ h. Likewise, the functions i and j have different
domains and so i ≠ j.
In the above examples, we actually cheated a little. Or rather, we took it for granted that
the specified mapping rule applied to all elements in the domain. If we wanted to be extra
careful (or pedantic), then we should “really” have written:

Example 210. Define g ∶ Z+ → R by g (x) = 2x for all x ∈ Z+.

Example 211. Define h ∶ R+ → R by h (x) = 2x for all x ∈ R+.

Example 212. Define i ∶ Z → R by i (x) = x² for all x ∈ Z.

Example 213. Define j ∶ R+ → R by j (x) = x² for all x ∈ R+.

This is because we can have piecewise functions like the following, where there are
different mapping rules for different elements in the domain:

Example 214. Define k ∶ Z → R by:

$k(x) = \begin{cases} 2x, & \text{for } x \le 5, \\ x + 1, & \text{otherwise.} \end{cases}$

In words, the function k doubles integers that are less than or equal to 5, but adds one
to those that are greater than 5.

We have, for example: k(−10) = −20, k (0) = 0, and k(4) = 8;

but: k(7) = 8, k(15) = 16, and k(100) = 101.

Example 215. Define l ∶ R+ → R by:

$l(x) = \begin{cases} x^2, & \text{for } x \le 5, \\ x^3, & \text{otherwise.} \end{cases}$

In words, the function l squares positive real numbers that are less than or equal to 5,
but cubes those that are greater than 5.

We have, for example: l(1) = 1, l(1.1) = 1.21, and l(4) = 16;

but: l(5.3) = 148.877, l(7) = 343, and l(9) = 729.
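
Piecewise functions translate naturally into code with an if/else. Here is a minimal Python sketch of k and l from the last two examples. The domain checks (raising ValueError outside the domain) are my own addition.

```python
def k(x):
    """k : Z -> R. Doubles integers up to 5, adds one to larger integers."""
    if not isinstance(x, int):
        raise ValueError("k is defined only on the integers")
    return 2 * x if x <= 5 else x + 1

def l(x):
    """l : R+ -> R. Squares positive reals up to 5, cubes larger ones."""
    if not x > 0:
        raise ValueError("l is defined only on the positive real numbers")
    return x ** 2 if x <= 5 else x ** 3

print(k(-10), k(0), k(4), k(7), k(15), k(100))
print(l(1), l(4), l(7), l(9))
```

Each input falls into exactly one of the two cases, so each input still has exactly one output, as a function requires.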


10.3. Warning: f and f (x) Refer to Different Things
If you cannot say what you mean, Your Majesty, you will never mean what
you say and a gentleman should always mean what he says.

— The Last Emperor (1987).

A common mistake is to believe that f (x) denotes a function. But this is wrong.

f and f (x) refer to two different things.

f denotes a function. f (x) denotes the value of f at x.

For the next two examples, let S be the set of human beings.

Example 216. Let h ∶ S → R be the height function. That is, h gives each human being’s
height (rounded to the nearest centimetre).
Then we have, for example, h (Joseph Schooling) = 184.

h (Joseph Schooling) is not a function.

Instead, h (Joseph Schooling) is the number 184.

It would be silly to say that h (Joseph Schooling) is a function, because it isn’t —

h (Joseph Schooling) is the number 184.
In general, for any human being x, her height (rounded to the nearest cm) is h (x). Again,
h (x) is not a function — it’s x’s height (rounded to the nearest cm), which is a number.

Example 217. Let w ∶ S → R be the weight function. That is, w gives each human
being’s weight (rounded to the nearest kilogram).
Then we have, for example, w (Joseph Schooling) = 74.

w (Joseph Schooling) is not a function.

Instead, w (Joseph Schooling) is the number 74.

It would be silly to say that w (Joseph Schooling) is a function, because it isn’t —

w (Joseph Schooling) is the number 74.
In general, for any human being x, her weight (rounded to the nearest kg) is w (x). Again,
w (x) is not a function — it’s x’s weight (rounded to the nearest kg), which is a number.

This may seem like an excessively pedantic distinction. But maths is precise and pedantic.
In maths, we are gentlemen (and ladies) who say what we mean and mean what we say.
There is no room for ambiguity or alternative interpretations.
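
Programming languages make the same distinction. In the minimal Python sketch below, h is a function, while h("Joseph Schooling") is a number. The dictionary heights_cm is my own stand-in for the height data and contains only the one value given in Example 216.

```python
heights_cm = {"Joseph Schooling": 184}   # the only value given in Example 216

def h(person):
    """The height function: maps a person to a height in cm."""
    return heights_cm[person]

print(callable(h))                        # h is a function
print(h("Joseph Schooling"))              # h(Joseph Schooling) is a number
print(callable(h("Joseph Schooling")))    # and a number is not a function
```

The first and third lines print True and False respectively: h can be applied to something, but 184 cannot.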


10.4. Real-Valued Functions of a Real Variable

Example 218. Consider the functions f ∶ [1, 5] → R, g ∶ (−3, 3) → R, h ∶ R → R₀⁺ (where R₀⁺ is
the set of non-negative real numbers, so that h (0) = 0 lies in the codomain), and

i ∶ (−2, −1) → R+ , each defined by the mapping rule x ↦ x².
• A function of a real variable is one whose domain contains only real numbers.
And so here, each of f , g, h, and i is a function of a real variable.
• A real-valued function is one whose codomain contains only real numbers.
And so here, each of f , g, h, and i is a real-valued function.
• We shall call any real-valued function of a real variable a nice function.
And so here, each of f , g, h, and i is a nice function.

Example 219. Define j ∶ {Cow, Chicken} → Z by j (Cow) = 0 and j (Chicken) = 1; and

k ∶ {Cow, Chicken} → {0, 1, 2} by k (Cow) = 0 and k (Chicken) = 1.
Each of the functions j and k has a codomain that contains only real numbers. Hence,
each of j and k is a real-valued function.
However, each of j and k’s domains contains elements that are not real numbers. So, j
and k are not functions of a real variable and are thus not nice functions either.

Example 220. Define l ∶ Z → {Cow, Chicken} by:

$l(x) = \begin{cases} \text{Cow}, & \text{if } x \text{ is an odd integer}, \\ \text{Chicken}, & \text{otherwise.} \end{cases}$

And define m ∶ {0, 1, 2} → {Cow, Chicken} by:

$m(x) = \begin{cases} \text{Cow}, & \text{if } x \text{ is an odd integer}, \\ \text{Chicken}, & \text{otherwise.} \end{cases}$

Each of the functions l and m has a domain that contains only real numbers. Hence,
each of l and m is a function of a real variable.
However, each of l and m’s codomains contains elements that are not real numbers. So,
l and m are not real-valued functions and are thus not nice functions either.
By the way, in this and the previous examples, is j = k, and is l = m?⁹⁷

In H2 Maths, we’ll usually encounter only nice functions. So, we’ll often encounter functions
like f , g, h, and i, but not functions like j, k, l, or m.

Remark 25. The term nice function is not standard and is used in this textbook for
brevity’s sake (so we don’t have to keep saying “a real-valued function of a real variable”).

⁹⁷ No to both: j and k have different codomains, while l and m have different domains.
10.5. Graphs of Functions
Let f be a nice function. Then the graph of f is simply the graph of the equation y = f (x)
with the constraint x ∈ Domain(f ). A little more formally:

Definition 51. Given a nice function f , its graph is the following set of points:⁹⁸

{(x, y) ∶ x ∈ Domain(f ), y = f (x)} .

Example 221. Define f ∶ R → R by f (x) = 2x.

The graph of f is the set

{(x, y) ∶ x ∈ R, y = 2x}.

[Figure: graph of f.]

By the way, strictly speaking, we should say that the point (3, 6) is on the graph of f .
That is, we should explicitly state both the x- and y-coordinates of any point we are
talking about.
However, since the x-coordinate is sufficient for identifying a point on the graph of a
function, we will sometimes be lazy and say things like:

“The point x = 3 is on the graph of f .”

Or even: “The point 3 is on the graph of f .”

⁹⁸ Actually, going by the formal definition of a function (Definition 226 in the Appendices), there is no
difference between a function and its graph. A function is its graph. See the discussion in the Appendices.
Example 222. Define f1 ∶ R+ → R by f1 (x) = 2x.

The graph of f1 is the set

{(x, y) ∶ x ∈ R+ , y = 2x}.

[Figure: graph of f1.]

Note that the graph of f1 does not contain the point (0, 0), because 0 ∉ R+ .

Example 223. Define f2 ∶ Z → R by f2 (x) = 2x.

The graph of f2 is the set

{(x, y) ∶ x ∈ Z, y = 2x}.

[Figure: graph of f2.]

The graph of f2 is simply the set of isolated points:

{. . . , (−3, −6) , (−2, −4) , (−1, −2) , (0, 0) , (1, 2) , (2, 4) , (3, 6) , . . . }


Example 224. Define g ∶ R → R by g (x) = x².

The graph of g is the set

{(x, y) ∶ x ∈ R, y = x²}.

[Figure: graph of g.]

Example 225. Define g1 ∶ R+ → R by g1 (x) = x².

The graph of g1 is the set

{(x, y) ∶ x ∈ R+ , y = x²}.

[Figure: graph of g1.]

Note that again, the graph of g1 does not contain the point (0, 0), because 0 ∉ R+ .


Example 226. Define g2 ∶ Z → R by g2 (x) = x².

The graph of g2 is the set

{(x, y) ∶ x ∈ Z, y = x²}.

[Figure: graph of g2.]

The graph of g2 is simply the set of isolated points:

{. . . , (−3, 9) , (−2, 4) , (−1, 1) , (0, 0) , (1, 1) , (2, 4) , (3, 9) , . . . }
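
Since a graph is just a set of points, it can be built directly as a set in Python. Here is a minimal sketch for f2 and g2, restricted to the integers from −3 to 3 (a computer cannot list infinitely many points). The helper name graph and the variable window are my own.

```python
def graph(f, domain_points):
    """The set of points (x, f(x)) for x in the given (finite) part of the domain."""
    return {(x, f(x)) for x in domain_points}

window = range(-3, 4)                       # the integers -3, -2, ..., 3
graph_f2 = graph(lambda x: 2 * x, window)   # f2(x) = 2x
graph_g2 = graph(lambda x: x ** 2, window)  # g2(x) = x^2

print(sorted(graph_f2))
print(sorted(graph_g2))
```

Each printed list contains seven isolated points, one for each integer in the window.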

We next look at the graphs of piecewise functions:


Example 227. Define h ∶ R → R by

⎪x2 for x ≤ 2,
h (x) = ⎨

⎪ for x > 2.


Example 228. Define i ∶ R → R by

⎪x for x ∉ Z,
i (x) = ⎨

⎪ for x ∈ Z.



Example 229. Define j ∶ R → R by:⁹⁹

$j(x) = \begin{cases} 0, & \text{for } x < 0, \\ 1, & \text{for } x \ge 0. \end{cases}$



Exercise 80. Fill in the blanks. (Answer on p. 1405.)

A function consists of ____ pieces: namely, the ____, the ____, and the ____.
Exercise 81. Fill in the blanks with “can be any set”; “must be R”; or “must be a subset
of R”. (Answer on p. 1405.)
In general, the domain ______; and the codomain _____.

Exercise 82. Fill in the blanks with “at least one element”; “every element”; or “exactly
one element”. (Answer on p. 1405.)
A function maps _____ in its domain to _____ in its codomain.
Exercise 83. What do we call a function whose ...

(a) Domain contains only real numbers?

(b) Codomain contains only real numbers? (Answer on p. 1405.)

Exercise 84. Evaluate each function at 1. (Answer on p. 1405.)

Define: a∶R→R by a (x) = x + 1;
b ∶ [−1, 1] → R by b (x) = 17x;
c ∶ Z+ → R by c (x) = 3x ;
d ∶ Z− → R by d (x) = 3x ;
e∶R→R by e (x) = 17.

By the way, this function is sometimes called the Heaviside function.
Exercise 85. (i) Verify that the functions given below are well-defined. (ii) Which (if
any) of them are equivalent? (Answer on p. 1405.)

Function Domain Codomain Mapping rule

a {1, 2} {1, 2, 3, 4} “Double it”
b {1, 2, 3} {1, 2, 3, 4, 5, 6} “Double it”
c {x ∈ Z+ ∶ x < 4} {1, 2, 3, 4, 5, 6} “Double it”
d [0, 4) ∩ Z (−3, 6] ∩ Z+0 “Double it”

Exercise 86. Define f ∶ R+ → R by “round it off to the nearest integer (half-integers are
rounded up)”. (Answer on p. 1405.)
(a) What are f (3), f (π), f (3.5), f (3.88), and f (0)?

Would f still be well-defined if we changed:

(b) The domain to R and the codomain to Z?
(c) The domain to Z and the codomain to R?

Exercise 87. Let A = {Lion, Eagle} and B = {Fat, Tall}. Can we construct a well-defined
function using A as the domain and B as the codomain? (Answer on p. 1405.)

Exercise 88. Explain whether each of the following alleged functions is in fact well-
defined. (Answer on p. 1405.)
(a) Define a ∶ {Cow, Chicken, Dog} → {Produces eggs, Guards the home, Produces milk}
by “match the animal to its role”.
(b) Define b ∶ {Cow, Chicken, Dog} → {Produces eggs, Produces milk} by “match the
animal to its role”.
(c) Let c have the set of UN member states as its domain, the set of cities as its codomain,
and the mapping rule “match the state to its most splendid city”.
(d) Let d have the set of UN member states as its domain, the set of cities as its codomain,
and the mapping rule “match the state to a city with over 10M people”.


Exercise 89. Below are 17 alleged functions named a through q, with a given domain,
codomain, and mapping rule (the last being given informally in words).

(i) Explain whether each alleged function is in fact well-defined.

And if it is,
(ii) Write down a statement to formally define it. (Answer on p. 1406.)

Function Domain Codomain Mapping rule

a {5, 6, 7} Z “Double it”
b {5, 6, 7} Z+ “Double it”
c {5, 6, 7} Z− “Double it”
d {5.4, 6, 7} Z “Double it”
e {5.5, 6, 7} Z “Double it”
f {3} {3, 4} “Any larger number”
g {3, 3.1} {3, 4} “Any larger number”
h {0, 3} {3, 4} “Any larger number”
i {3, 4} {3, 4} “Any larger number”
j {2, 4} {3, 4} “Any smaller number”
k {1} {1} “Stay the same”
l {1} {1, 2} “Stay the same”
m {1, 2} {1} “Stay the same”
n R R “Take the square root”
o R R “Take the reciprocal”
p R [0, 1] “Add one”
q [0, 1] R “Add one”

Exercise 90. Continuing with the above exercise, how can we change the domains of n
and o so that n and o become well-defined?100 (Answer on p. 1407.)

The sharp student may have noticed that one trivial answer here is to simply change the domain to
the empty set ∅. Then it is trivially or vacuously true that every element in the domain is mapped
to exactly one element in the codomain. If you’ve noticed and are bothered by this, please change the
question to “What are the largest subsets of R to which the domains of n and o can be changed, so that
they become well defined?”
10.6. The Range of a Function
Informally, the range is the set of elements in the codomain that are “hit”.101 Formally:

Definition 52. The range of a function f is the set:

Rangef = {f (x) ∶ x ∈ Domainf } .

Example 230. Define f ∶ [0, 1] → R by f (x) = x + 1. Then

• Codomainf = R (the set of reals).
• Rangef = [1, 2] (the set of reals between 1 and 2, inclusive).
The range of f is “smaller” than (i.e. is a proper subset of) its codomain.

In general, the range is not the same thing as the codomain.102 This is because in
general, not every element in the codomain need be hit by the function.
Instead, in general, the range is a subset of the codomain. (Indeed, it is typically a proper
subset of the codomain; in other words, it is typically “smaller” than the codomain.)
Because this is such a common point of confusion, let me repeat:

♡ The range is not the same thing as the codomain. ♡

Example 231. Define g ∶ {2, 3} → R by g (x) = x + 1. Then

• Codomain(g) = R (the set of reals).
• Range(g) = {3, 4} (the set containing two numbers).
The range of g is “smaller” than (i.e. is a proper subset of) its codomain.
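
For a function with a finite domain, Definition 52 can be applied directly by evaluating the function at every element of the domain. Below is a minimal Python sketch of Example 231 (the names g, domain_g, and range_g are illustrative assumptions).

```python
# Sketch of Example 231: g has domain {2, 3} and mapping rule x -> x + 1.

def g(x):
    return x + 1

domain_g = {2, 3}
range_g = {g(x) for x in domain_g}   # Range(g) = {g(x) : x in Domain(g)}

print(range_g)
print(5 in range_g)   # 5 is in the codomain R, but is it "hit"?
```

The number 5 is in the codomain R but is not in the range, which illustrates why the two sets must not be confused.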

Example 232. Define h ∶ R → R by h (x) = eˣ. Then

• Codomain(h) = R (the set of reals).
• Range(h) = R+ (the set of positive reals).
The range of h is “smaller” than (i.e. is a proper subset of) its codomain.

If D is the domain of the function f , then we can also call the range of f the image of D under f and
denote it f (D).
Unfortunately and very confusingly, for a minority of writers, the term range is synonymous with
codomain. In the A-Level syllabus and exams and in this textbook, we will follow majority practice by
insisting that range is not the same thing as codomain.
Sometimes, every element in the codomain is “hit” — in such cases, the range is equal to
the codomain. Examples:

Example 233. Define i ∶ R → R by i (x) = x + 1. Then

• Codomain(i) = R (the set of reals).
Every element in the codomain R is “hit”. And so, the range is the same as the codomain:
• Range(i) = R.
The range of i is equal to its codomain.

Example 234. Define j ∶ R+ → R by j (x) = ln x. Then

• Codomain(j) = R (the set of reals).

Every element in the codomain R is “hit”. And so, the range is the same as the codomain:
• Range(j) = R.
The range of j is equal to its codomain.

Example 235. Define k ∶ [0, 1] → [0, 1] by k (x) = x². Then

• Codomain (k) = [0, 1] (the set of reals between 0 and 1, inclusive).
Every element in the codomain [0, 1] is “hit”. And so, the range is the same as the codomain:
• Range (k) = [0, 1].
The range of k is equal to its codomain.


Exercise 91. Find the range of each function. Which (if any) of these functions has a
range that’s identical to its codomain? (Answer on p. 1407.)

(a) Define a ∶ R+0 → R by a (x) = x.
(b) Define b ∶ Z → R by b (x) = x².
(c) Define c ∶ Z → Z by c (x) = x².
(d) Define d ∶ Z → Z by d (x) = x + 1.
(e) Define e ∶ Z → R by e (x) = x + 1.
(f) The function f



The domain The codomain

(g) The function g



The domain The codomain

Exercise 92. Let f be a function. Then which of the following must be true?
(a) Range(f ) ⊆ Domain(f ).
(b) Range(f ) ⊆ Codomain(f ).
(c) Range(f ) ⊂ Domain(f ).
(d) Range(f ) ⊂ Codomain(f ).
(e) Range(f ) = Domain(f ).
(f) Range(f ) = Codomain(f ). (Answer on p. 1407.)


11. An Introduction to Continuity
Informally, a function is continuous if you can draw its graph without lifting your pencil.103
Most functions we’ll encounter in A-Level maths will be continuous everywhere. This
includes all polynomial functions.

Example 236. The function f ∶ R → R defined by f (x) = x⁵ − x² − 1 is continuous everywhere, because you can draw its entire graph without lifting your pencil.

[Figure: graph of f.]

Example 237. The sine function sin is continuous everywhere, because you can draw its entire graph without lifting your pencil.

[Figure: graph of sin.]

For the formal definition of continuity, see Ch. 121.6 in the Appendices.
Example 238. The exponential function exp is continuous everywhere, because you can draw its entire graph without lifting your pencil.

[Figure: graph of exp.]

Example 239. The absolute value function ∣⋅∣ is continuous everywhere, because you can draw its entire graph without lifting your pencil.

[Figure: graph of ∣⋅∣.]

As stated, most functions we’ll encounter are continuous everywhere. However, we’ll oc-
casionally encounter functions that aren’t continuous everywhere. For example, functions
with vertical asymptotes:


Example 240. The tan function is not continuous everywhere, because to draw its entire
graph, you must lift your pencil.
Note though that tan is continuous on the interval (−π/2, π/2), because you can draw this particular portion of the graph without lifting your pencil.

[Figure: graph of tan.]

Indeed, tan is continuous on each interval ((k − 1/2) π, (k + 1/2) π), for k ∈ Z.
And so, we can actually say that the tan function is continuous everywhere except at each x = (k + 1/2) π, for k ∈ Z.
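
To see numerically why the pencil must be lifted at x = π/2, we can evaluate tan just to the left and just to the right of π/2. Below is a minimal Python sketch (the function name tan_either_side and the chosen step sizes are illustrative assumptions, and floating-point values are only approximations).

```python
import math

def tan_either_side(eps: float) -> tuple[float, float]:
    """Return tan just to the left and just to the right of pi/2."""
    half_pi = math.pi / 2
    return math.tan(half_pi - eps), math.tan(half_pi + eps)

# As eps shrinks, the left value grows very large and positive,
# while the right value grows very large and negative.
for eps in (1e-1, 1e-3, 1e-6):
    left, right = tan_either_side(eps)
    print(f"eps = {eps:g}: left = {left:.1f}, right = {right:.1f}")
```

The two values move further apart as eps shrinks, so the two portions of the graph on either side of π/2 cannot be joined in one stroke.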

Example 241. Define h ∶ R ∖ {0} → R by h (x) = . y
Then h is not continuous everywhere, because to draw its entire graph, you must lift your pencil.
Note though that h is continuous on R−, because you can draw this portion of the graph without lifting your pencil.
Similarly, h is also continuous on R+, because you can draw this portion of the graph without lifting your pencil.
And so, we can actually say that h is continuous everywhere except at x = 0.

[Figure: graph of h.]


Also, functions with “holes” in them are not continuous:

Example 242. Define i ∶ R → R by

⎪x for x ≠ 1,
i (x) = ⎨

⎪ for x = 1.
Then i is not continuous everywhere, because to draw its entire graph, you must lift your pencil.

Note though that i is continuous on (−∞, 1) because you can draw this portion of the graph without lifting your pencil.
Similarly, i is continuous on (1, ∞) because you can draw this portion of the graph without lifting your pencil.
And so, we can actually say that i is continuous everywhere except at x = 1.

To repeat, most functions we’ll encounter in A-Level maths will be continuous everywhere.
Indeed, all functions we’ll encounter will be continuous everywhere except possibly at a
set of isolated points. Informally, a point in a set is said to be isolated if it isn’t close
to any other point in the set.104 For example, tan is continuous everywhere except on
{(k + 1/2) π ∶ k ∈ Z}.

For the formal definition, see Definition 250 (Appendices).
This isn’t something you need to know, but just to illustrate, here’s a somewhat exotic
function whose domain is R but which is nowhere-continuous.

Example 243. The Dirichlet function105 d ∶ R → R is defined by

         ⎧ 1 for x ∈ Q,
d (x) = ⎨
         ⎩ 0 for x ∉ Q.
Then d is not continuous everywhere, because to draw its entire graph, you must lift
your pencil.
In fact, it’s worse than that — d is nowhere-continuous! Informally, this means you
can’t draw more than one point of the graph without lifting your pencil.106

[Figure: graph of the Dirichlet function d. It contains the point (x, 1) for every x ∈ Q and the point (x, 0) for every x ∉ Q.]

Or the characteristic function of the rationals. Named after the German mathematician Peter
Gustav Lejeune Dirichlet (1805–59).
We will explain in a little more detail why this is so in Part V (Calculus), Ch. 68.
12. When a Function Is Increasing or Decreasing
Informally, we know what it means for a function to be increasing, decreasing, strictly
increasing, and/or strictly decreasing:

Example 244. Consider the function f ∶ R → R defined by f (x) = x2 .

It is decreasing on R−0 , increasing on R+0 , strictly decreasing on R− , and strictly increasing
on R+ .
It is both decreasing and increasing at 0.

Figure to be
inserted here.

Note: At x = 0, f is both decreasing and increasing, but neither strictly decreasing nor
strictly increasing. This follows from the formal definitions (below).

Formal definitions:

Definition 53. Let f be a nice function. Given a set of points S ⊆ Domainf , we say
that f is:

(a) Increasing on S if for any x1 , x2 ∈ S with x2 > x1 , we have f (x2 ) ≥ f (x1 );

(b) Strictly increasing on S if for any x1 , x2 ∈ S with x2 > x1 , we have f (x2 ) > f (x1 );

(c) Decreasing on S if for any x1 , x2 ∈ S with x2 > x1 , we have f (x2 ) ≤ f (x1 );

(d) Strictly decreasing on S if for any x1 , x2 ∈ S with x2 > x1 , we have f (x2 ) < f (x1 ).

If f is (strictly) increasing/decreasing on its domain, then we simply say that f is a
(strictly) increasing/decreasing function.

We will find it convenient to have a word that describes a function that’s either increasing
or decreasing:

Definition 54. If a function is increasing or decreasing (on a set), then we say that it is
monotonic (on that set).
If a function is strictly increasing or strictly decreasing (on a set), then we say that it is
strictly monotonic (on that set).
If a function is (strictly) monotonic on its domain, then we simply say that it is a (strictly)
monotonic function.

Obviously, strict monotonicity implies monotonicity. That is:

• If a function is strictly increasing on some set, then it is also increasing on that set; and
• If a function is strictly decreasing on some set, then it is also decreasing on that set.
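
Definition 53 can be tested on a finite sample of points. Below is a minimal Python sketch (the names is_increasing, square, and constant are illustrative assumptions). A finite sample can show that a function is not increasing on a set, but passing the check does not prove that the function is increasing on an infinite set.

```python
from itertools import combinations

def is_increasing(f, points, strict=False):
    """Check Definition 53 (a), or (b) if strict=True, on finitely many points."""
    ordered = sorted(set(points))
    for x1, x2 in combinations(ordered, 2):   # every pair with x2 > x1
        if strict and not f(x2) > f(x1):
            return False
        if not strict and not f(x2) >= f(x1):
            return False
    return True

def square(x):
    return x ** 2

def constant(x):
    return 17

print(is_increasing(square, [0, 0.5, 1, 2, 3], strict=True))   # sample from the non-negative reals
print(is_increasing(square, [-2, -1, 0, 1, 2]))                # sample that includes negative reals
print(is_increasing(constant, [1, 2, 3]))                      # increasing ...
print(is_increasing(constant, [1, 2, 3], strict=True))         # ... but not strictly
```

The constant function shows that the converse of the two bullet points fails: a function can be increasing on a set without being strictly increasing on it.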

Exercise 93. We will review trigonometric functions in Ch. 19. For now, here is the
graph of the sine function sin ∶ R → R:

Figure to be
inserted here.

Write down the sets on which sin is (a) increasing; (b) decreasing; (c) strictly increasing;
and/or (d) strictly decreasing. What are the points at which sin is (e) increasing but
not strictly increasing; and (f) decreasing but not strictly decreasing? (Answer on p.
A93. For every integer k, sin is:
(a) Increasing on [−π/2 + 2kπ, π/2 + 2kπ];
(b) Decreasing on [π/2 + 2kπ, 3π/2 + 2kπ];
(c) Strictly increasing on (−π/2 + 2kπ, π/2 + 2kπ);
(d) Strictly decreasing on (π/2 + 2kπ, 3π/2 + 2kπ);
(e) Increasing but not strictly increasing at each point x = 2kπ + ;
(f) Decreasing but not strictly decreasing at each point x = 2kπ + .

Figure to be
inserted here.


13. Arithmetic Combinations of Functions

Example 245. Define f, g ∶ R → R by f (x) = 7x + 5 and g (x) = x³. Let k = 2.

Then we can also define the sum, difference, product, constant multiple, and quotient functions:

(f + g) ∶ R → R by (f + g) (x) = 7x + 5 + x³.
(f − g) ∶ R → R by (f − g) (x) = 7x + 5 − x³.
(f ⋅ g) ∶ R → R by (f ⋅ g) (x) = (7x + 5) x³.
(kf ) ∶ R → R by (kf ) (x) = 2 (7x + 5).
(f /g) ∶ R ∖ {0} → R by (f /g) (x) = (7x + 5) /x³.

Evaluating each of these five functions at 1, we have:

(f + g) (1) = 7(1) + 5 + 1³ = 13.
(f − g) (1) = 7(1) + 5 − 1³ = 11.
(f ⋅ g) (1) = (7 × 1 + 5) ⋅ 1³ = 12.
(kf ) (1) = 2 [7 (1) + 5] = 24.
(f /g) (1) = (7 × 1 + 5) /1³ = 12.
By the way, the parentheses help to make clear that f + g is a single function. It would
be confusing and unclear if we wrote f + g(1) instead of (f + g) (1).

A word about the domain. For kf , the domain is simply Domainf . For each of f + g,
f − g, and f ⋅ g, the domain is simply Domainf ∩ Domaing (i.e. the set of numbers that
are in both the domains of f and g).
But for the quotient function f /g, it’s a little trickier figuring out what the domain is.
Observe that g (0) = 0. And so, in order for f /g to be well-defined, we must restrict the
domain by removing the element 0. Otherwise, (f /g) (0) would be undefined and f /g
would fail to be a well-defined function. Thus, the domain of f /g is:

Domainf ∩ Domaing ∖ {x ∶ g (x) = 0} = R ∖ {0} .

In words, the domain of f /g is the set of numbers x that are in both the domains of f
and g, excluding those for which g (x) equals zero.
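
The five combinations in this example can be sketched in Python as follows (the function names f_plus_g, f_minus_g, f_times_g, k_times_f, and f_over_g are illustrative assumptions). The quotient rejects any x with g(x) = 0, mirroring the restricted domain R ∖ {0}.

```python
# Sketch of Example 245: f(x) = 7x + 5, g(x) = x^3, k = 2.

def f(x):
    return 7 * x + 5

def g(x):
    return x ** 3

k = 2

def f_plus_g(x):
    return f(x) + g(x)

def f_minus_g(x):
    return f(x) - g(x)

def f_times_g(x):
    return f(x) * g(x)

def k_times_f(x):
    return k * f(x)

def f_over_g(x):
    if g(x) == 0:   # such x are removed from the domain of f/g
        raise ValueError("x is not in the domain of f/g")
    return f(x) / g(x)

print(f_plus_g(1), f_minus_g(1), f_times_g(1), k_times_f(1), f_over_g(1))
```

Calling f_over_g(0) raises an error rather than returning a value, which is the programming analogue of saying that 0 is not in the domain of f /g.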


Example 246. Define h, i ∶ [−1, ∞) → R by h (x) = x + 1 and i (x) = √(x + 1). Let l = 5.
Then we can also define:

(h + i) ∶ [−1, ∞) → R by (h + i) (x) = x + 1 + √(x + 1).
(h − i) ∶ [−1, ∞) → R by (h − i) (x) = x + 1 − √(x + 1).
(h ⋅ i) ∶ [−1, ∞) → R by (h ⋅ i) (x) = (x + 1) ⋅ √(x + 1).
(lh) ∶ [−1, ∞) → R by (lh) (x) = 5 (x + 1).
(h/i) ∶ (−1, ∞) → R by (h/i) (x) = (x + 1) /√(x + 1) = √(x + 1).

Evaluating each of these five functions at 1, we have:

(h + i) (1) = 1 + 1 + √(1 + 1) = 2 + √2.
(h − i) (1) = 1 + 1 − √(1 + 1) = 2 − √2.
(h ⋅ i) (1) = (1 + 1) ⋅ √(1 + 1) = 2√2.
(lh) (1) = 5 (1 + 1) = 10.
(h/i) (1) = √(1 + 1) = √2.
Again, for lh, the domain is simply Domainh. And for each of h + i, h − i, and h ⋅ i, the
domain is simply Domainh ∩ Domaini (the set of numbers that are in both the domains
of h and i).
But again, for the quotient function h/i, it’s a little trickier figuring out what the domain
is. Observe that i(−1) = 0. And so, in order for h/i to be well-defined, we must restrict
the domain by removing the element −1. Otherwise, (h/i) (−1) would be undefined and
h/i would fail to be a well-defined function. Thus, the domain of h/i is:

Domainh ∩ Domaini ∖ {x ∶ i (x) = 0} = [−1, ∞) ∖ {−1} = (−1, ∞) .

In words, the domain of h/i is the set of numbers x that are in both the domains of h
and i, excluding those for which i (x) equals zero.107
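
The domain rule for the quotient can be sketched in Python, taking h (x) = x + 1 and i (x) = √(x + 1), as the quotient (x + 1) /√(x + 1) in this example indicates. The names in_domain_of_h_over_i and h_over_i are illustrative assumptions.

```python
import math

def h(x):
    return x + 1

def i(x):
    return math.sqrt(x + 1)

def in_domain_of_h_over_i(x):
    """x must be in both domains, [-1, infinity), and must satisfy i(x) != 0."""
    return x >= -1 and i(x) != 0

def h_over_i(x):
    if not in_domain_of_h_over_i(x):
        raise ValueError("x is not in the domain of h/i")
    return h(x) / i(x)   # divide first; do not simplify the formula beforehand

print(h_over_i(1))                  # approximately the square root of 2
print(in_domain_of_h_over_i(-1))    # i(-1) = 0, so -1 is excluded
```

The sketch computes h(x) divided by i(x) directly, rather than using a simplified formula, which is why −1 is excluded even though √(x + 1) itself is defined there.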

Here the sharp student may wonder, “But isn’t √(x + 1) perfectly well-defined for all x ∈ [−1, ∞)? So
couldn’t the domain of h/i instead be [−1, ∞)?” Great point. The thing is, we really want to think
of (h/i) (x) as being equal to h (x) divided by i (x) (without doing any simplification beforehand and
without thinking of h/i as being simply the “formula” √(x + 1)). So, if h (x) /i (x) is undefined (as it is
when i (x) = 0), then (h/i) (x) should also be undefined.
Formal definitions of the five functions:

Definition 55. Let f and g be nice functions and k ∈ R. Then the sum, difference,
product, constant multiple, and quotient functions — denoted f + g, f − g, f ⋅ g, kf , and
f /g — have codomain R and domain and mapping rule as given below:

Function   Domain                                          Mapping rule

f + g      Domain(f ) ∩ Domain(g)                          (f + g) (x) = f (x) + g (x)
f − g      Domain(f ) ∩ Domain(g)                          (f − g) (x) = f (x) − g (x)
f ⋅ g      Domain(f ) ∩ Domain(g)                          (f ⋅ g) (x) = f (x) g (x)
kf         Domain(f )                                      (kf ) (x) = kf (x)
f /g       Domain(f ) ∩ Domain(g) ∖ {x ∶ g (x) = 0}        (f /g) (x) = f (x) /g (x)

Remark 26. As we’ll learn shortly, f g refers to a function that’s entirely different from
f ⋅ g. So take great care to write f ⋅ g if that’s what you mean.108

Exercise 94. Let k = 2 and l = 5. Define:

f ∶ R → R by f (x) = 7x + 5, g ∶ R → R by g (x) = x³,

h ∶ [−1, ∞) → R by h (x) = x + 1; and i ∶ [−1, ∞) → R by i (x) = √(x + 1).

Evaluate each of the following:

(a) (f + g) (2). (f) (h + i) (2).
(b) (g − f ) (1). (g) (i − h) (1).
(c) (g ⋅ f ) (2). (h) (i ⋅ h) (2).
(d) (kg) (1). (i) (li) (1).
(e) (g/f ) (1). (j) (i/h) (1).
(k) Write down the domains of the functions f + h, f − h, f ⋅ h, and f /h. Then formally
define these four functions. (Answer on p. 1408)

Unfortunately and very confusingly, a minority of writers do use f g to mean f ⋅ g. In the A-Level
syllabus and exams and in this textbook, we will follow majority practice by insisting that f g ≠ f ⋅ g.
14. Inverse Functions

Example 247. Define f ∶ {Cow, Chicken} → {Produces eggs, Produces milk} by “match
the animal to its role”.
[Figure: arrow diagram of the function f, from the domain {Cow, Chicken} to the codomain {Produces eggs, Produces milk}.]

Then f ’s inverse function is the function f −1 ∶ {Produces eggs, Produces milk} →

{Cow, Chicken}, defined by “match the role to the corresponding animal”.

[Figure: arrow diagram of the function f −1, from the domain {Produces eggs, Produces milk} to the codomain {Cow, Chicken}.]

Here’s the three-step procedure to get the inverse function f −1 :

1. Set Domain (f −1 ) = Range(f ) = {Produces eggs, Produces milk}.
2. Set Codomain (f −1 ) = Domain(f ) = {Cow, Chicken}.
3. Use the fact that f −1 (f (x)) = x to “invert” the mapping rule.
So here, we simply “invert” the mapping rule from “match the animal to its role” to
“match the role to the corresponding animal”. (This corresponds to inverting the ↦
arrows in the above figure.)

Given the function f , its inverse function (or simply inverse) is denoted f −1 and satisfies:

If f (x) = y, then f −1 (y) = x.

Or equivalently: f −1 (f (x)) = x.

Definition 56 will formally define the inverse function. But first, more examples:
Example 248. Define g ∶ {Cow, Dog, Chicken} → {Produces eggs, Guards the home,
Produces milk} by “match the animal to its role”.

[Figure: arrow diagram of the function g, from the domain {Cow, Dog, Chicken} to the codomain {Produces eggs, Guards the home, Produces milk}.]

Three-step procedure to get the inverse function g −1 :

1. Set Domain (g −1 ) = Range(g) = {Produces eggs, Guards the home, Produces milk}.
2. Set Codomain (g −1 ) = Domain(g) = {Cow, Dog, Chicken}.
3. Use the fact that g −1 (g (x)) = x to “invert” the mapping rule.
So again, here we simply “invert” the mapping rule from “match the animal to its role”
to “match the role to the corresponding animal”. (This corresponds to inverting the ↦
arrows in the above figure.)
Thus, g’s inverse function is g −1 ∶ {Produces eggs, Guards the home, Produces milk} →
{Cow, Dog, Chicken} defined by “match the role to the corresponding animal”.

[Figure: arrow diagram of the function g −1, from the domain {Produces eggs, Guards the home, Produces milk} to the codomain {Cow, Dog, Chicken}.]
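
For a function given by a finite list of arrows, “inverting the arrows” can be sketched in Python with a dictionary. This is only an illustration: the names g and g_inverse are assumptions, and the pairing of each animal with its role is the obvious one suggested by the mapping rule.

```python
# Sketch of Example 248: each key is an element of the domain,
# and each value is the element of the codomain it is mapped to.
g = {
    "Cow": "Produces milk",
    "Dog": "Guards the home",
    "Chicken": "Produces eggs",
}

# Invert every arrow: animal -> role becomes role -> animal.
g_inverse = {role: animal for animal, role in g.items()}

print(g_inverse["Produces milk"])

# Check that the inverse undoes g for every animal in the domain.
for animal in g:
    print(animal, g_inverse[g[animal]] == animal)
```

The domain of the inverse is the set of keys of g_inverse, which is exactly the set of values (the range) of g.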

In each of the above examples, the original function’s range was identical to its codomain.
And thus, the inverse function’s domain was simply the original function’s codomain.
However, in general, this need not be the case:


Example 249. Define h ∶ {Cow, Chicken} → {Produces eggs, Guards the home, Produces
milk} by “match the animal to its role”.

[Figure: arrow diagram of the function h, from the domain {Cow, Chicken} to the codomain {Produces eggs, Guards the home, Produces milk}. No arrow points to “Guards the home”.]

Three-step procedure to get the inverse function h−1 :

1. Set Domain (h−1 ) = Range(h) = {Produces eggs, Produces milk}.
Observe that this time, Range(h) = Domain (h−1 ) is different from Codomain(h). (And
so in the figure below, “Guards the home” is now no longer in the set on the right.)
2. Set Codomain (h−1 ) = Domain(h) = {Cow, Chicken}.

3. Use the fact that h−1 (h (x)) = x to “invert” the mapping rule.
So again, here we simply “invert” the mapping rule from “match the animal to its role”
to “match the role to the corresponding animal”. (This corresponds to inverting the ↦
arrows in the above figure.)
Thus, the inverse function of h is h−1 ∶ {Produces eggs, Produces milk} → {Cow, Chicken}
defined by “match the role to the corresponding animal”.

[Figure: arrow diagram of the function h−1, from the domain {Produces eggs, Produces milk} to the codomain {Cow, Chicken}.]

More examples, but now involving nice functions:


Example 250. Define i ∶ [0, 1] → R by i (x) = x + 1.
Three-step procedure to get the inverse function i−1 :
1. Set Domain (i−1 ) = Range(i) = [1, 2].
2. Set Codomain (i−1 ) = Domain(i) = [0, 1].
3. Use the fact that i−1 (i (x)) = x to “invert” the mapping rule.

i−1 (i (x)) = x ⇐⇒ i−1 (x + 1) = x

⇐⇒ i−1 (y) = x (Let y = i (x) = x + 1.)
⇐⇒ i−1 (y) = y − 1 (Do the algebra: x = y − 1.)

Thus, we define i−1 ∶ [1, 2] → [0, 1] by i−1 (y) = y − 1.109

Let’s verify that this inverse function “works”, i.e. i−1 (i (x)) = x, for some values of x:

i−1 (i (0)) = i−1 (0 + 1) = i−1 (1) = 1 − 1 = 0. ✓

i−1 (i(0.3)) = i−1 (0.3 + 1) = i−1 (1.3) = 1.3 − 1 = 0.3. ✓
i−1 (i(0.8)) = i−1 (0.8 + 1) = i−1 (1.8) = 1.8 − 1 = 0.8. ✓
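
The same verification can be sketched in Python. The names i and i_inverse are illustrative assumptions, and exact fractions are used instead of decimals so that rounding errors do not spoil the equality checks.

```python
from fractions import Fraction

def i(x):
    return x + 1       # mapping rule of i, with domain [0, 1]

def i_inverse(y):
    return y - 1       # mapping rule of the inverse, with domain [1, 2]

# 0, 0.3, and 0.8 written as exact fractions.
for x in (Fraction(0), Fraction(3, 10), Fraction(8, 10)):
    y = i(x)
    print(x, "->", y, "->", i_inverse(y), i_inverse(y) == x)
```

Each line of output traces one value of x to i(x) and back again.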

Example 251. Define j ∶ [0, 1] → R by j (x) = 2x.

Three-step procedure to get the inverse function j −1 :
1. Set Domain (j −1 ) = Range(j) = [0, 2].
2. Set Codomain (j −1 ) = Domain(j) = [0, 1].
3. Use the fact that j −1 (j (x)) = x to “invert” the mapping rule:

j −1 (j (x)) = x ⇐⇒ j −1 (2x) = x
⇐⇒ j −1 (y) = x (Let y = j (x) = 2x.)
⇐⇒ j −1 (y) = y/2 (Do the algebra: x = y/2.)

Thus, we define j −1 ∶ [0, 2] → [0, 1] by j −1 (y) = y/2.

Let’s verify that this inverse function “works”, i.e. j −1 (j (x)) = x, for some values of x:

j −1 (j (0)) = j −1 (2 × 0) = j −1 (0) = 0/2 = 0. ✓

j −1 (j(0.3)) = j −1 (2 × 0.3) = j −1 (0.6) = 0.6/2 = 0.3. ✓
j −1 (j(0.8)) = j −1 (2 × 0.8) = j −1 (1.6) = 1.6/2 = 0.8. ✓

By the way, note that once again, y here is merely a dummy or placeholder variable that we use for i−1 .
We could’ve replaced y with any other symbol like w, z, or ☀. (We would however avoid using x
because this is the dummy or placeholder variable that we already used for i.)
Example 252. Define k ∶ R ∖ {0} → R by k (x) = 1/x.
Three-step procedure to get the inverse function k −1 :
1. Set Domain (k −1 ) = Range(k) = R ∖ {0}.
2. Set Codomain (k −1 ) = Domain(k) = R ∖ {0}.
3. Use the fact that k −1 (k (x)) = x to “invert” the mapping rule:

k −1 (k (x)) = x
⇐⇒ k −1 (1/x) = x
⇐⇒ k −1 (y) = x (Let y = k (x) = 1/x.)
⇐⇒ k −1 (y) = 1/y (Do the algebra: x = 1/y.)

Thus, we define k −1 ∶ R ∖ {0} → R ∖ {0} by k −1 (y) = 1/y.
Let’s verify that this inverse function “works”, i.e. k −1 (k (x)) = x, for some values of x:

k −1 (k(1)) = k −1 (1/1) = k −1 (1) = 1/1 = 1. ✓

k −1 (k(0.3)) = k −1 (1/0.3) = k −1 (10/3) = 1/ (10/3) = 0.3. ✓

k −1 (k(0.8)) = k −1 (1/0.8) = k −1 (5/4) = 1/ (5/4) = 0.8. ✓

By the way, are the functions k and k −1 equal?110

They have identical domains and mapping rules. However, they have different codomains and are
therefore not equal.
Example 253. Define l ∶ R+0 → R by l (x) = x².
Three-step procedure to get the inverse function l−1 :
1. Set Domain (l−1 ) = Range(l) = R+0 .
2. Set Codomain (l−1 ) = Domain(l) = R+0 .
3. Use the fact that l−1 (l (x)) = x to “invert” the mapping rule:

l−1 (l (x)) = x
⇐⇒ l−1 (x²) = x
⇐⇒ l−1 (y) = x (Let y = l (x) = x².)
⇐⇒ l−1 (y) = √y (Do the algebra: x = ±√y.)

Note that in the last step here, we discard −√y, because the codomain of l−1 is the set
of non-negative real numbers.

Thus, we define l−1 ∶ R+0 → R+0 by l−1 (y) = √y.
Let’s verify that this inverse function “works”, i.e. l−1 (l (x)) = x holds, for some values
of x:

l−1 (l(1)) = l−1 (1²) = l−1 (1) = √1 = 1. ✓

l−1 (l(3)) = l−1 (3²) = l−1 (9) = √9 = 3. ✓

l−1 (l(8)) = l−1 (8²) = l−1 (64) = √64 = 8. ✓

Here’s the formal definition of an inverse function:

Definition 56. Let f be a function. Then its inverse function (or simply inverse),
denoted f −1 , has the following domain, codomain, and mapping rule:

Domain: Range(f ).
Codomain: Domain(f ).
Mapping rule: If y = f (x), then f −1 (y) = x.

As we’ll see on the next page, the inverse function f −1 isn’t always well-defined. In such
cases, we say that “the inverse function does not exist” or more simply, “the function has
no inverse”.

Exercise 95. Find the inverse of each function.

(a) a∶ R →R defined by a (x) = 5x.
(b) b∶ R →R defined by b (x) = x³.
(c) c∶ R+ →R defined by c (x) = ln x.
(d) d∶ R+ →R defined by d (x) = 1/x². (Answer on p. 1409.)


14.1. One-to-One or Invertible Functions
In each of our above examples and exercises, the given function always had an inverse.
However, this is not generally true — the inverse function doesn’t always exist:

Example 254. Define f ∶ {Cow, Chicken} → {Yum, Yuck} by f (Cow) = Yum and
f (Chicken) = Yum.

[Figure: arrow diagram of the function f, with arrows from both Cow and Chicken to Yum.]

Say we try to get the inverse function f −1 through the usual three-step procedure:
1. Set Domain (f −1 ) = Range(f ) = {Yum}. ✓
2. Set Codomain (f −1 ) = Domain(f ) = {Cow, Chicken}. ✓
So far so good. But in the third step, we run into trouble:
3. If we try to “invert” the mapping rule by inverting the ↦ arrows, we get the following:

[Figure: the “function” f −1, with arrows from Yum to both Cow and Chicken.]

But as you should know very well by now, this “function” f −1 isn’t well-defined, because
it maps the element Yum in its domain to more than one element in its codomain.
And so in this case, we say that “the inverse function f −1 does not exist” or more simply,
“the function f has no inverse”.


Example 255. Define g ∶ {0, 1, 2} → {0, 1, 2, 3} by g (0) = 1, g(1) = 2, and g(2) = 1.

[Figure: arrow diagram of the function g, with arrows 0 ↦ 1, 1 ↦ 2, and 2 ↦ 1.]

Say we try to get the inverse function g −1 through the usual three-step procedure:
1. Set Domain (g −1 ) = Range(g) = {1, 2}. ✓
2. Set Codomain (g −1 ) = Domain(g) = {0, 1, 2}. ✓
So far so good. But in the third step, we run into trouble:
3. If we try to “invert” the mapping rule by inverting the ↦ arrows, we get the following:

[Figure: the “function” g −1, with arrows from 1 to both 0 and 2, and from 2 to 1.]

But again, this “function” g −1 is not well-defined, because it maps the element 1 in its
domain to more than one element in its codomain.
And so in this case, we say that “the inverse function g −1 does not exist” or more simply,
“the function g has no inverse”.


The reason f and g do not have inverses is that in each case, an element in the codomain
is “hit” more than once:
• f “hits” Yum twice — we have f (Cow) = Yum and f (Chicken) = Yum.
• g “hits” 1 twice — we have g (0) = 1 and g (2) = 1.
And so, when we try to construct the inverse function, we run into a fatal ambiguity:
• Should it be f −1 (Yum) = Cow or f −1 (Yum) = Chicken?
• Should it be g −1 (1) = 0 or g −1 (1) = 2?
This fatal ambiguity means that in each case, we are unable to construct a well-defined
inverse function. And so we simply say that the inverse does not exist.
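
The ambiguity described above can be detected mechanically when a function is given as a finite list of arrows. Below is a minimal Python sketch (the name try_invert is an illustrative assumption). It returns None as soon as some element of the codomain is “hit” a second time.

```python
def try_invert(mapping):
    """Return the inverse as a dict, or None if some value is hit more than once."""
    inverse = {}
    for x, y in mapping.items():
        if y in inverse:      # y was already hit by another element of the domain
            return None       # ambiguity: no well-defined inverse
        inverse[y] = x
    return inverse

f = {"Cow": "Yum", "Chicken": "Yum"}   # the function f of Example 254
g = {0: 1, 1: 2, 2: 1}                 # the function g of Example 255

print(try_invert(f))
print(try_invert(g))
print(try_invert({0: 1, 1: 2}))        # a function whose values are all distinct
```

For f and g the sketch reports that no inverse exists, while the last function is inverted without trouble.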
The foregoing discussion motivates us to give a special name to functions whose codomain’s
elements are “hit” at most once:

Definition 57. A function f is one-to-one or invertible if given any x1 , x2 ∈ Domainf

with x1 ≠ x2 , we have f (x1 ) ≠ f (x2 ).

In words, a function is one-to-one or invertible if any two distinct elements in its domain
correspond to two distinct elements in the codomain.
Here are two equivalent ways to rewrite the above definition — a function f is one-to-one
or invertible if:
• For every y ∈ Range(f ), there is exactly one x ∈ Domain(f ) such that f (x) = y.
• f (x1 ) = f (x2 ) implies x1 = x2 . (This is the contrapositive of the above definition.)
Graphically and informally, we can also use the horizontal line test (HLT):

A function is one-to-one or invertible if and only if

no horizontal line intersects its graph more than once.

The name one-to-one is apt: every element in the range is “hit” by exactly one
element in the domain. In contrast, a function that isn’t one-to-one is many-to-one,
where at least one element in the codomain is “hit” by more than one element in the
domain. (By the way, is it possible that a function is one-to-many?111 )
The name invertible is also apt, because as our examples above illustrate, a function has
a well-defined inverse if and only if it is invertible. Let’s formally jot this down as a result:

Fact 21. A function has a well-defined inverse if and only if it is invertible.

Proof. See p. 1269 in the Appendices.

Nope. A one-to-many function would be one that maps an element in the domain to more than one
element in the codomain — but this violates our cardinal requirement that a function maps each element
in the domain to (exactly) one element in the codomain.
Remark 27. There is actually a third name for one-to-one or invertible functions — they’re
also called injective functions (or simply injections). But we won’t use this third name
in this textbook.

To show that a function f is not invertible, simply find a counterexample. That is, simply
find some x1 ≠ x2 such that f (x1 ) = f (x2 ):

Example 256. Consider the function h ∶ R → R defined by h (x) = x² + 2x.

The element 8 in Codomain(h) is hit twice — we have h (−4) = 8 and h (2) = 8. Thus, h
is not invertible and has no inverse. (Indeed, except for −1 ∈ Range(h) which is hit once,
every y ∈ Range(h) is hit twice.)

[Figure: graph of h. HLT: The horizontal line y = 8 intersects the graph of h twice, at x = −4 and x = 2.]
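
A counterexample of this kind can be searched for by brute force. Below is a minimal Python sketch for h (x) = x² + 2x (the names preimages and collisions, and the choice of the integer window −6, ..., 4, are illustrative assumptions). It groups the inputs by their output and keeps the outputs that are hit more than once.

```python
from collections import defaultdict

def h(x):
    return x ** 2 + 2 * x

# Group the integers -6, ..., 4 by the value that h assigns to them.
preimages = defaultdict(list)
for x in range(-6, 5):
    preimages[h(x)].append(x)

# Keep only the outputs that are hit by more than one input.
collisions = {y: xs for y, xs in preimages.items() if len(xs) > 1}

print(collisions[8])        # the two inputs that h maps to 8
print(-1 in collisions)     # -1 is hit only once, by x = -1
```

Finding a single collision is enough to conclude that h is not invertible.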

Example 257. Consider the sine function sin.

The element 1 in Codomain(sin) is hit infinitely many times: we have, for example,
sin (−3π/2) = 1 and sin (π/2) = 1. Thus, sin is not invertible and has no inverse. (Indeed, as
you can probably tell, every y ∈ Range(sin) is hit infinitely many times.)

[Figure: graph of sin. HLT: The horizontal line y = 1 intersects the graph of sin infinitely many times.]


Example 258. Consider the absolute value or modulus function ∣⋅∣.
The element 2 in Codomain(∣⋅∣) is hit twice — we have ∣−2∣ = 2 and ∣2∣ = 2. Thus, ∣⋅∣ is not
invertible and has no inverse. (Indeed, except for 0 ∈ Range(∣⋅∣) which is hit once, every
y ∈ Range(∣⋅∣) is hit twice.)

[Figure: graph of ∣⋅∣. HLT: The horizontal line y = 2 intersects the graph of ∣⋅∣ twice, at x = −2 and x = 2.]

To show that a function f is invertible, prove that (i) x1 ≠ x2 ⟹ f (x1 ) ≠ f (x2 ). Or
equivalently, prove the contrapositive (ii) f (x1 ) = f (x2 ) ⟹ x1 = x2 .

Example 259. Define i ∶ R → R by i (x) = 3x.

Here are two proofs that i is invertible:
(i) Let x1 > x2 . Then 3x1 > 3x2 and so i (x1 ) ≠ i (x2 ).
(ii) Let i (x1 ) = i (x2 ) or 3x1 = 3x2 . Then x1 = x2 .

The inverse function is i−1 ∶ R → R defined by i−1 (y) = y/3.


Example 260. Define j ∶ R+0 → R by j (x) = x².

Here are two proofs that j is invertible:
(i) Let x1 > x2 ≥ 0. Then x1² > x2² and so j (x1 ) ≠ j (x2 ).
(ii) Let j (x1 ) = j (x2 ) or x1² = x2². Then since x1 , x2 ≥ 0, we have x1 = x2 .

The inverse function is j −1 ∶ R+0 → R+0 defined by j −1 (y) = √y.

Exercise 96. Determine if each function is invertible. And if it is, write down its inverse.
(a) a ∶ R → R defined by a (x) = x² − 1.
(b) b ∶ R₀⁺ → R defined by b (x) = x² − 1.
(c) c ∶ R → [−1, ∞) defined by c (x) = x² − 1. (Answer on p. 1409.)

14.2. The Graphs of f and f −1 Are Reflections in the Line y = x

Fact 22. Let f be an invertible function and f −1 be its inverse. Then f and f −1 are
reflections of each other in the line y = x.

Proof. See p. 1269 in the Appendices.

Example 261. The function f ∶ R → R defined by f (x) = x + 1 is invertible (you should

be able to verify this yourself). Thus, its inverse exists and we can find it as usual:
1. Domain (f −1 ) = Range(f ) = R.
2. Codomain (f −1 ) = Domain(f ) = R.
3. Use the fact that f −1 (f (x)) = x to “invert” the mapping rule:

f −1 (f (x)) = x
⇐⇒ f −1 (x + 1) = x
⇐⇒ f −1 (y) = x (Let y = f (x) = x + 1.)
⇐⇒ f −1 (y) = y − 1 (Do the algebra: x = y − 1.)

Thus, the inverse function is f⁻¹ ∶ R → R defined by f⁻¹ (x) = x − 1. Below are graphed f
and f⁻¹. Observe that they are reflections of each other in the line y = x.

[Figure: graphs of f and f⁻¹.]

Example 262. The function g ∶ R → R defined by g (x) = 2x is invertible. Thus, its
inverse exists and we can find it as usual:
1. Set Domain (g −1 ) = Range(g) = R.
2. Set Codomain (g −1 ) = Domain(g) = R.
3. Use the fact that g −1 (g (x)) = x to “invert” the mapping rule:

g⁻¹ (g (x)) = x
⇐⇒ g⁻¹ (2x) = x
⇐⇒ g⁻¹ (y) = x (Let y = g (x) = 2x.)
⇐⇒ g⁻¹ (y) = y/2 (Do the algebra: x = y/2.)

Thus, the inverse function is g⁻¹ ∶ R → R defined by g⁻¹ (x) = x/2. Below are graphed g
and g⁻¹. Observe that they are reflections of each other in the line y = x.

[Figure: graphs of g and g⁻¹.]

Example 263. The function h ∶ R ∖ {0} → R defined by h (x) = 1/x is invertible. Thus, its
inverse exists and we can find it as usual:
1. Set Domain (h⁻¹) = Range(h) = R ∖ {0}.
2. Set Codomain (h⁻¹) = Domain(h) = R ∖ {0}.
3. Use the fact that h⁻¹ (h (x)) = x to “invert” the mapping rule:

h⁻¹ (h (x)) = x
⇐⇒ h⁻¹ (1/x) = x
⇐⇒ h⁻¹ (y) = x (Let y = h (x) = 1/x.)
⇐⇒ h⁻¹ (y) = 1/y (Do the algebra: x = 1/y.)

Thus, the inverse function is h⁻¹ ∶ R ∖ {0} → R ∖ {0} defined by h⁻¹ (x) = 1/x. Below are
graphed h and h⁻¹. Observe that they are reflections of each other in the line y = x.
Indeed, they look exactly the same. (Is h = h⁻¹?¹¹²)

[Figure: graphs of h and h⁻¹, which coincide.]

¹¹² Nope. They have the same domains and mapping rules. But they have different codomains and are
thus different.
Example 264. The function i ∶ R₀⁺ → R defined by i (x) = x² is invertible. Thus, its
inverse exists and we can find it as usual:

1. Set Domain (i⁻¹) = Range(i) = R₀⁺.
2. Set Codomain (i⁻¹) = Domain(i) = R₀⁺.
3. Use the fact that i⁻¹ (i (x)) = x to “invert” the mapping rule:

i⁻¹ (i (x)) = x
⇐⇒ i⁻¹ (x²) = x
⇐⇒ i⁻¹ (y) = x (Let y = i (x) = x².)
⇐⇒ i⁻¹ (y) = √y (Do the algebra: x = ±√y.)

Note that in the last step here, we discard −√y, because the codomain of i⁻¹ is the set
of non-negative real numbers.

Thus, the inverse function is i⁻¹ ∶ R₀⁺ → R₀⁺ defined by i⁻¹ (x) = √x. Below are graphed i
and i⁻¹. Observe that they are reflections of each other in the line y = x.

[Figure: graphs of i and i⁻¹.]

Fact 22 is particularly useful when we are unable to write down the mapping rule of the
inverse function:

Example 265. The function j ∶ R → R defined by j (x) = x³ + x is invertible. (Can you
verify this?)¹¹³ Thus, the inverse function j⁻¹ ∶ R → R exists.
But unfortunately, not having learnt to solve cubic equations, we don’t know how to write
down the mapping rule of the inverse function j −1 ∶ R → R.
Nonetheless, even though we have no idea what the mapping rule of j −1 is, if we already
have the graph of j, we can use Fact 22 to sketch the graph of j −1 .
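
Even without a mapping rule, values of the inverse in Example 265 can be approximated numerically. The following is a minimal sketch in Python, not a method from this textbook: `j_inverse` is a hypothetical helper that uses bisection, which works here because j is strictly increasing.

```python
def j(x):
    """Mapping rule of j from Example 265: j(x) = x^3 + x."""
    return x**3 + x

def j_inverse(y, tol=1e-12):
    """Approximate the inverse of j at y by bisection (hypothetical helper).

    Since j is strictly increasing, j(x) = y has exactly one solution.
    """
    # j(x) has the same sign as x and |j(x)| >= |x|,
    # so the solution lies in the interval [-|y| - 1, |y| + 1].
    lo, hi = -abs(y) - 1.0, abs(y) + 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if j(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Each pair (y, x) printed is (approximately) a point on the graph of the inverse;
# swapping the coordinates gives the point (x, y) on the graph of j.
for y in [-10, -2, 0, 2, 10]:
    print(y, round(j_inverse(y), 6))
```

Plotting such pairs reproduces the reflection described by Fact 22: every point (x, y) on j becomes the point (y, x) on its inverse.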

[Figure: graphs of j and j⁻¹, with the line y = x.]

In the above example, it’s actually possible to write down the inverse function’s mapping
rule.¹¹⁴ We just don’t know how to, because we haven’t learnt to solve cubic equations.
In the next example, it is impossible to do so. Nonetheless, we can again use Fact 22 to
sketch the graph of the inverse function.

¹¹³ Let x₂ > x₁. Then x₂³ + x₂ > x₁³ + x₁. We’ve just proven that for any x₁ ≠ x₂, we must have j (x₁) ≠ j (x₂).
Thus, j is invertible.
¹¹⁴ In case you were wondering, it’s j⁻¹ (x) = ∛(x/2 + √(x²/4 + 1/27)) + ∛(x/2 − √(x²/4 + 1/27)).
Example 266. The function k ∶ R → R defined by k (x) = x⁵ + x is invertible. (Can you
verify this?)¹¹⁵ Thus, the inverse function k⁻¹ ∶ R → R exists.
But unfortunately, it is impossible¹¹⁶ to write down the mapping rule of the inverse
function k⁻¹ ∶ R → R.
But again, even though we cannot write down the mapping rule of k⁻¹, if we already have
the graph of k, we can use Fact 22 to sketch the graph of k⁻¹.

[Figure: graphs of k and k⁻¹.]

Exercise 97. Write down the inverse of each function. Then graph both the function
and its inverse. (Answers on pp. 1410–1411.)
(a) f ∶ (0, 1] → R defined by f (x) = x + 1.
(b) g ∶ (0, 1] → R defined by g (x) = 2x.
(c) h ∶ (0, 1] → R defined by h (x) = 1/x.
(d) i ∶ (0, 1] → R defined by i (x) = x².

¹¹⁵ Let x₂ > x₁. Then x₂⁵ + x₂ > x₁⁵ + x₁. We’ve just proven that for any x₁ ≠ x₂, we must have k (x₁) ≠ k (x₂).
Thus, k is invertible.
¹¹⁶ Abel’s impossibility theorem says there is no general algebraic solution for polynomial equations of
degree 5 and above. That is, unlike the quadratic formula, which gives us algebraic expressions for the
quadratic equation’s two roots, it is impossible to write down a similar formula for polynomial equations
of degree 5 and above. One implication of this is that it is impossible to write down k⁻¹ here as an
algebraic expression.
14.3. The Intersection of f and f −1
By Fact 22, a function f and its inverse f −1 are reflections of each other in the line y = x.
Observe then that if f intersects the line y = x at some point, then f −1 must also intersect
y = x at the very same point. Thus, any point at which f intersects y = x is also a point at
which f and f −1 intersect.

Example 267. Define f ∶ R → R by f (x) = 2x − 1.

Then f ’s inverse f⁻¹ ∶ R → R is defined by:

f⁻¹ (x) = x/2 + 1/2.

The graph of f intersects the line y = x at the point (1, 1). And so, f and f⁻¹ should also
intersect at this point.

[Figure: graphs of f and f⁻¹ and the 45° line y = x, all meeting at (1, 1).]

Example 268. Define g ∶ R₀⁺ → R₀⁺ by g (x) = x².
Then g’s inverse is g⁻¹ ∶ R₀⁺ → R₀⁺ defined by g⁻¹ (x) = √x.

The graph of g intersects the line y = x at the points (0, 0) and (1, 1). And so, g and g⁻¹
should also intersect at these points.

[Figure: graphs of g and g⁻¹ and the 45° line y = x, meeting at (0, 0) and (1, 1).]

A formal statement of our above observation:

Fact 23. Suppose the function f has inverse f −1 . Then any point at which f intersects
the line y = x is also a point at which f intersects f −1 .

Or equivalently and in more concise notation:

f (x) = x ⟹ f (x) = f⁻¹ (x).

Now, consider the converse of Fact 23. That is, consider the following statement:

“Any point at which f intersects f⁻¹ must be on the line y = x.”

Or equivalently: “f (x) = f⁻¹ (x) ⟹ f (x) = x.”

The above statement sounds perfectly plausible. But unfortunately, it is false. (Those
writing your A-Level exams have assumed it to be true at least twice in the recent past.)¹¹⁷

Example 269. Define f ∶ R ∖ {0} → R ∖ {0} by f (x) = 1/x.
Then f ’s inverse is f⁻¹ ∶ R ∖ {0} → R ∖ {0}, also defined by:

f⁻¹ (x) = 1/x.

Observe that interestingly, here f is its own inverse.¹¹⁸ That is, f = f⁻¹. And so, f and f⁻¹
share infinitely many intersection points.
However, only (−1, −1) and (1, 1) are on the line y = x. Every other intersection point is not.
For example, f and f⁻¹ intersect at the point (3, 1/3), but this point is not on y = x.

[Figure: graph of f = f⁻¹ and the 45° line y = x, with the points (−1, −1), (1, 1), and (3, 1/3) marked.]

Example 270. One might object that the function in the last counterexample was unusual
because it was

(a) not continuous everywhere; and
(b) its own inverse.

Consider then the function g ∶ R → R defined by g (x) = −x³. It (a) is continuous everywhere;
and (b) isn’t its own inverse.
Its inverse g⁻¹ ∶ R → R is defined by:

g⁻¹ (x) = −∛x.

Now, observe that g and g⁻¹ intersect at three points: only (0, 0) is on the line y = x, while
the other two, (−1, 1) and (1, −1), are not.

[Figure: graphs of g and g⁻¹ and the 45° line y = x, with the points (−1, 1), (0, 0), and (1, −1) marked.]
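
The three intersection points in Example 270 can be checked numerically. This is only a sketch in Python; the helper names `g` and `g_inv` are chosen here for illustration, and the cube root is written with `math.copysign` so that it also works for negative inputs.

```python
import math

def g(x):
    """g from Example 270: g(x) = -x^3."""
    return -x**3

def g_inv(x):
    """Inverse of g: minus the real cube root of x."""
    return -math.copysign(abs(x) ** (1 / 3), x)

for x in [-1, 0, 1]:
    on_both = math.isclose(g(x), g_inv(x), abs_tol=1e-12)   # do g and its inverse agree at x?
    on_line = math.isclose(g(x), x, abs_tol=1e-12)          # is the point on y = x?
    print((x, g(x)), "intersection:", on_both, "on y = x:", on_line)
```

All three points are reported as intersections, but only (0, 0) is reported as lying on y = x.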

¹¹⁷ Exercises 467(iii) (N2011/II/3) and 475(iii) (N2008/II/4). (Indeed, this subchapter was inspired by
those two questions.) One set of published TYS answers baldly claims, “As y = f (x) is a reflection of
y = f⁻¹ (x) about the line y = x, the point of intersection of the two curves must meet on y = x.”
¹¹⁸ If you like big words, a function that’s its own inverse is called an involution.
Two Results That Come Kinda Close (Optional)

Our last two examples show that the following statement is false:

“Any point at which f intersects f⁻¹ must be on the line y = x.” ✗

Or equivalently: “f (x) = f⁻¹ (x) ⇏ f (x) = x.”

Nonetheless, here are two results that come kinda close.

The first result says that if a function is continuous on an interval and intersects its inverse
at least once, then at least one of these intersection points is on the line y = x.

Fact 24. Let D be an interval. Let f ∶ D → R be a continuous and invertible function.
Suppose f and its inverse f⁻¹ intersect at least once. Then at least one of these
intersection points is on the line y = x.

Proof. See p. 1270 in the Appendices.

The invertible functions examined in our last four examples were all continuous on an
interval. And so sure enough, in each case, the function intersected its inverse on the line
y = x at least once.

The second result says that by adding the peculiar assumption that f and f −1 intersect at
an even number of points, we can obtain the stronger result that all of the intersection
points are on the line y = x.

Fact 25. Let D be an interval. Let f ∶ D → R be a continuous and invertible function.

Suppose f and its inverse f −1 intersect at an even (and positive) number of points. Then
all of these intersection points are on the line y = x.

Proof. See p. 1270 in the Appendices.

Fact 25 is illustrated by Example 268, but not by the other three of the last four examples
examined, because the hypotheses of Fact 25 do not apply in those three examples (can
you explain why?).¹¹⁹

¹¹⁹ In the three examples, the function intersects its inverse once, infinitely many times, and thrice
(respectively). And so in each case, we violate the hypothesis in Fact 25 that they intersect an even
number of times.
14.4. Domain Restriction to Create an Invertible Function
We saw that some functions were not invertible. And so, for such functions, the inverse
function simply does not exist.
Nonetheless, it turns out we can always¹²⁰ restrict the domain of a non-invertible function
to create a brand new function that is invertible.

Example 271. The function f ∶ R → R defined by f (x) = x² is not invertible.

But by restricting its domain to R⁺, we can get the brand new function g ∶ R⁺ → R
defined by g (x) = x². The function g is invertible and has the inverse function
g⁻¹ ∶ R⁺ → R⁺ defined by g⁻¹ (x) = √x.

Alternatively, we can restrict the domain of f to R⁻ and get the brand new function
h ∶ R⁻ → R defined by h (x) = x². The function h is invertible and has the inverse function
h⁻¹ ∶ R⁺ → R⁻ defined by h⁻¹ (x) = −√x.

[Figure: graph of f, which is not invertible; graphs of g and g⁻¹, and of h and h⁻¹.]


¹²⁰ We can always simply restrict the domain to be the empty set! The function thus formed would have
an empty domain and an empty range. It would thus be vacuously true that this function is invertible
(because no element in its range is hit more than once).
As the above example illustrates, there is usually more than one way to restrict the domain
of a non-invertible function to create an invertible function.

Example 272. The absolute value function ∣⋅∣ is not invertible.

But by restricting its domain to R⁺, we can get the brand new function i ∶ R⁺ → R
defined by i (x) = x. The function i is invertible and, indeed, shares the same domain and
mapping rule as its inverse function i⁻¹.

Alternatively, we can restrict the domain of ∣⋅∣ to R⁻ and get the brand new function
j ∶ R⁻ → R defined by j (x) = −x. The function j is invertible and has the inverse function
j⁻¹ ∶ R⁺ → R⁻ defined by j⁻¹ (x) = −x.

[Figure: graph of ∣⋅∣, which is not invertible; graphs of i and i⁻¹, and of j and j⁻¹.]

Exercise 98. Define f ∶ R ∖ {1} → R by f (x) = 1/(x − 1)². (Answer on p. 1412.)
(a) Prove that f is not invertible.
Let g be the function created by restricting the domain of f to (1, ∞).
(b) Prove that g is invertible, then write down the inverse function g −1 .
Let h be the function created by restricting the domain of f to (−∞, 1).
(c) Prove that h is invertible, then write down the inverse function h−1 .


14.5. Invertibility and Strict Monotonicity
Recall that a function is (strictly) monotonic if it is (strictly) increasing or decreasing.

Fact 26. If a function is strictly monotonic, then it is also invertible.

Proof. If f is strictly monotonic, then for any distinct x1 , x2 ∈ Domain(f ), we have f (x1 ) ≠
f (x2 ). And so, by Definition 57, f is invertible.

Example 273. XXX

Example 274. XXX

Unfortunately, the converse of Fact 26 is false:

Example 275. XXX

However, if we make the additional assumptions that the function is continuous and has
an interval as its domain, then the converse of Fact 26 is true:

Proposition 3. Suppose D is an interval and f ∶ D → R is a continuous function. Then:

f is invertible ⟹ f is strictly monotonic (on D).

Proof. See p. 1269 in the Appendices.

Combining Fact 26 and Proposition 3, we have:

Corollary 3. Suppose D is an interval and f ∶ D → R is a continuous function. Then:

f is invertible ⇐⇒ f is strictly monotonic (on D).

Example 276. XXX

Example 277. XXX

Fact 26 tells us that any strictly monotonic function has an inverse. The following result
tells us a little more:

Proposition 4. If a function is strictly increasing, then so too is its inverse.

Similarly, if a function is strictly decreasing, then so too is its inverse.

Proof. See p. 1269 in the Appendices.

Example 278. XXX

Example 279. XXX

15. Composite Functions
Earlier we learned that given two functions f and g, we can construct the product function
f ⋅ g (read aloud as “f times g”).
We now learn to construct the composite function f g (read aloud simply as “f g”).
Despite looking very similar, the composite function f g is entirely different from the product
function f ⋅ g.

Example 280. Define f, g ∶ R → R by f (x) = 2x and g (x) = x + 1. Then the composite

function f g ∶ R → R is defined by:

(f g) (x) = f (g (x)) = f (x + 1) = 2 (x + 1) = 2x + 2.
So for example: (f g) (0) = 2 × 0 + 2 = 2.

In contrast, f ⋅ g ∶ R → R is defined by:

(f ⋅ g) (x) = f (x) g (x) = (2x) (x + 1) = 2x² + 2x.

So for example: (f ⋅ g) (0) = 2 × 0² + 2 × 0 = 0.

Note that f ⋅ g = g ⋅ f , but f g ≠ gf . The composite function gf ∶ R → R is defined by:

(gf ) (x) = g (f (x)) = g (2x) = 2x + 1.

So for example: (gf ) (0) = 2 × 0 + 1 = 1 ≠ (f g) (0) = 2.

Note also that for the composite function f g, we first apply the function g, then apply
the function f . So for example, to compute, say f g(7), we first compute g(7) = 7 + 1 = 8,
then compute f (g(7)) = f (8) = 2 ⋅ 8 = 16.
Conversely, for the composite function gf , we first apply the function f , then apply the
function g. So for example, to compute, say gf (7), we first compute f (7) = 2 ⋅ 7 = 14,
then compute g (f (7)) = g (14) = 14 + 1 = 15.
(A common mistake is to instinctively go from left to right. So with f g, one might
mistakenly apply f before g. And with gf , one might mistakenly apply g before f .)
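
The order of application is easy to see in code. Here is a small sketch in Python (the helper `compose` is a name made up for this illustration), using the functions f and g of Example 280:

```python
def compose(outer, inner):
    """Return the composite 'outer inner': apply inner first, then outer."""
    return lambda x: outer(inner(x))

f = lambda x: 2 * x      # f(x) = 2x
g = lambda x: x + 1      # g(x) = x + 1

fg = compose(f, g)       # (fg)(x) = f(g(x)) = 2x + 2
gf = compose(g, f)       # (gf)(x) = g(f(x)) = 2x + 1

print(fg(7), gf(7))      # 16 15
print(f(7) * g(7))       # the product (f . g)(7) = 14 * 8 = 112
```

Note that `compose(f, g)` calls `g` first, even though `f` is written first. This matches the convention for f g.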

Example 281. Define h, i ∶ R → R by h (x) = x² − 1 and i (x) = x/2. Then the composite
function hi ∶ R → R is defined by:

(hi) (x) = h (i (x)) = h (x/2) = x²/4 − 1.

So for example: (hi) (0) = 0²/4 − 1 = −1.

In contrast, h ⋅ i ∶ R → R is defined by:

(h ⋅ i) (x) = h (x) i (x) = (x² − 1) (x/2) = x³/2 − x/2.

So for example: (h ⋅ i) (0) = 0³/2 − 0/2 = 0.

Again, note that h ⋅ i = i ⋅ h, but hi ≠ ih. The composite function ih ∶ R → R is defined by:

(ih) (x) = i (h (x)) = i (x² − 1) = x²/2 − 1/2.

So for example: (ih) (0) = 0²/2 − 1/2 = −1/2 ≠ (hi) (0) = −1.

Formal definition of a composite function:

Definition 58. Let f and g be functions with Range(g) ⊆ Domain(f ). Then the com-
posite function f g is defined to have:

Domain: Domain(g);
Codomain: Codomain(f ); and
Mapping rule: (f g) (x) = f (g (x)).

Remark 28. The composite function f g may also be written as f ○ g (read aloud as “f
circle g”). We use this alternative piece of notation whenever we want to be extra careful
about distinguishing f ○ g from f ⋅ g.

The condition Range(g) ⊆ Domain(f ) is important. It ensures that for any x ∈ Domain(g),
we have g (x) ∈ Domain(f ) and hence that f (g (x)) is well-defined.
If this condition fails, then we simply say that the composite function f g does not exist
or is undefined:


Example 282. Define f ∶ R+ → R by f (x) = ln x and g ∶ R → R by g (x) = x + 1.
Observe Range(g) = R is not a subset of Domain(f ) = R+ . And so for example,

(f g) (−5) = f (g (−5)) = f (−5 + 1) = f (−4)

would be undefined. Hence, the composite function f g simply does not exist.

The rest of this example is explicitly excluded from your syllabus.¹²¹ Nonetheless, spending
a minute or two reading it will earn you a better understanding of composite functions.
Recall that given a non-invertible function, we can restrict its domain to create a brand
new function that is invertible.
Here we can similarly restrict the domain of g to create a brand new function h, so that
the composite function f h exists.
For example, we can restrict the domain of g to (−1, ∞) and get the brand new function
h ∶ (−1, ∞) → R defined by h (x) = x + 1. Now Range(h) = R+ is a subset of Domain(f ) =
R+ . We thus have the composite function f h ∶ (−1, ∞) → R defined by:

(f h) (x) = f (h (x)) = f (x + 1) = ln (x + 1) .
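
The role of the condition Range(g) ⊆ Domain(f ) in Example 282 can also be seen by trying the computation. The sketch below is in Python and is only an illustration: it relies on the fact that `math.log` raises a `ValueError` for non-positive inputs, and the function `h` models the restricted domain with an explicit check (an assumption of this sketch, not something Python does for us).

```python
import math

def f(x):
    """f(x) = ln x, with Domain(f) = R+ (math.log rejects x <= 0)."""
    return math.log(x)

def g(x):
    """g(x) = x + 1, with Domain(g) = R."""
    return x + 1

def h(x):
    """h is g with its domain restricted to (-1, infinity)."""
    if x <= -1:
        raise ValueError("x is outside Domain(h) = (-1, infinity)")
    return x + 1

try:
    f(g(-5))                 # g(-5) = -4 is not in Domain(f)
except ValueError:
    print("f(g(-5)) is undefined, so fg does not exist")

print(f(h(0)))               # ln(0 + 1) = 0.0
```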

Example 283. Define i ∶ R₀⁺ → R by i (x) = √x and j ∶ R → R by j (x) = x − 3.
Observe Range(j) = R is not a subset of Domain(i) = R₀⁺. And so for example,

(ij) (−5) = i (j (−5)) = i (−5 − 3) = i (−8)

would be undefined. Hence, the composite function ij simply does not exist.

Again, the rest of this example is explicitly excluded from your syllabus.
Again, we can restrict the domain of j to create a brand new function k, so that the
composite function ik exists.
For example, we can restrict the domain of j to [3, ∞) and get the brand new function
k ∶ [3, ∞) → R defined by k (x) = x − 3. Now Range(k) = R₀⁺ is a subset of Domain(i) = R₀⁺.
We thus have the composite function ik ∶ [3, ∞) → R defined by:

(ik) (x) = i (k (x)) = i (x − 3) = √(x − 3).

¹²¹ See p. 5 of your syllabus.
We can use a single function to build a composite function.

Example 284. Define f ∶ R → R by f (x) = 2x. Since Range(f ) = R ⊆ Domain(f ) = R,

the composite function f f ∶ R → R exists and is defined by:

(f f ) (x) = f (f (x)) = f (2x) = 4x.

So for example: (f f ) (1) = 4 and (f f ) (3) = 12.

The composite function f f is usually simply written as f². So, the line above can also
be written as:

f² (1) = 4 and f² (3) = 12.

Since Range (f²) = R ⊆ Domain(f ) = R, we can also define the composite function f f²
or f³. We have f³ ∶ R → R defined by:

f³ (x) = 8x.
So for example: f³ (1) = 8 and f³ (3) = 24.

We also have f⁴, f⁵, etc.:

f⁴ ∶ R → R defined by f⁴ (x) = 16x,

f⁵ ∶ R → R defined by f⁵ (x) = 32x, and so on.

Remark 29. The Singapore-Cambridge A-Level exams and syllabus use f² to mean the
composite function f f , f³ to mean f f², f⁴ to mean f f³, etc.
Later on in Part V (Calculus), we’ll use a similar-looking but totally different piece of
notation. We’ll use f ′, f ′′, f ′′′, f⁽⁴⁾, etc. to denote “the first derivative of”, “the second
derivative of”, “the third derivative of”, “the fourth derivative of”, etc. And in general,
f⁽ⁿ⁾ means “the nth derivative of”.
Take care not to confuse the composite function fⁿ with the nth derivative f⁽ⁿ⁾.

Example 285. Define g ∶ R → R by g (x) = 1 − x/2.
Observe that Range(g) = R ⊆ Domain(g) = R. Thus, the composite function g² ∶ R → R
exists and is defined by:

g² (x) = g (g (x)) = g (1 − x/2) = 1 − (1 − x/2)/2 = 1/2 + x/4.

So for example: g² (1) = 3/4 and g² (3) = 5/4.

Observe that Range (g²) = R ⊆ Domain(g) = R. Thus, we can also define g³ ∶ R → R by:

g³ (x) = g (1/2 + x/4) = 1 − (1/2 + x/4)/2 = 3/4 − x/8.

So for example: g³ (1) = 5/8 and g³ (3) = 3/8.

This example is continued in the next exercise.
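
Repeated composition is also easy to check with exact arithmetic. The following Python sketch is an illustration only: `power` is a hypothetical helper that builds the n-fold composite, and `Fraction` is used so that the values come out as exact fractions rather than decimals.

```python
from fractions import Fraction

def g(x):
    """g from Example 285: g(x) = 1 - x/2."""
    return 1 - Fraction(x) / 2

def power(func, n):
    """Return the n-fold composite of func (hypothetical helper), for n >= 1."""
    def composite(x):
        for _ in range(n):
            x = func(x)
        return x
    return composite

g2, g3 = power(g, 2), power(g, 3)
print(g2(1), g2(3))   # 3/4 5/4
print(g3(1), g3(3))   # 5/8 3/8
```

The same helper can be used to check your answers to the next exercise, for example with `power(g, 4)`.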

Exercise 99. Continue with the above example.

(a) Let n = 4.
(i) Explain whether the composite function gⁿ exists. If it does exist:
(ii) Write down the function gⁿ; and
(iii) Evaluate gⁿ (1) and gⁿ (3).
(b) Repeat part (a), but now let n = 5.
(c) Repeat part (a), but now let n = 6.
Part (d) is a little harder, but is also the sort of curveball that the A-Level examiners
like to throw in to make you squirm:
(d) Let n be a positive integer. Write down the function gⁿ. (You need not prove that
gⁿ exists nor that the function gⁿ you’ve written down is the correct one.) Hence,
prove that for any x ∈ R, we have:

lim (n → ∞) gⁿ (x) = 2/3. (Answer on p. 1413.)

Big hint for part (d): The Jacobsthal numbers are 1, 1, 3, 5, 11, 21, 43, . . . , where each
new number equals the sum of the number before it and twice the number before that.
So for example, 11 = 5 + 2 × 3 and 21 = 11 + 2 × 5. You are told that the nth Jacobsthal
number is given by

(2ⁿ − (−1)ⁿ) / 3.

Exercise 100. For each of the following pairs of functions f and g, determine if the
composite functions f g and gf exist. If they do, write them down and compute f g(1),
f g(2), gf (1), and gf (2). (Answer on p. 1414.)

(a) Define f, g ∶ R → R by f (x) = eˣ and g (x) = x² + 1.

(b) Define f, g ∶ R ∖ {0} → R by f (x) = 1/x and g (x) = 1/(2x).
(c) Define f ∶ R ∖ {0} → R by f (x) = 1/x and g ∶ R → R by g (x) = x² + 1.
(d) Define f ∶ R ∖ {0} → R by f (x) = 1/x and g ∶ R → R by g (x) = x² − 1.
Exercise 101. For each of the following functions f , determine if the composite function
f² exists. If it does, write it down and compute f² (1) and f² (2).
(a) Define f ∶ R → R by f (x) = eˣ.
(b) Define f ∶ R → R by f (x) = 3x + 2.
(c) Define f ∶ R → R by f (x) = 2x² + 1.
(d) Define f ∶ R⁺ → R by f (x) = ln x. (Answer on p. 1415.)

16. Transformations

16.1. y = f (x) + a
The graph of y = f (x) + a is simply the graph of f translated (or shifted) upwards by a
units. (Note that if a < 0, then we have a negative upward shift, i.e. a downward shift.)

Example 286. The black graph below is the function f ∶ R → R defined by f (x) = x³ − 1.
The graph of y = f (x) + 2 = x³ + 1 is simply that of f translated upwards by 2 units.

[Figure: graphs of f , y = f (x) + 2, and y = f (x) − 3, with y-intercepts (0, −1), (0, 1), and (0, −4).]

The graph of y = f (x) − 3 = x³ − 4 is simply that of f translated downwards by 3 units.

When a graph is translated upwards or downwards, any y-intercepts, lines of symmetry,
turning points, and asymptotes simply shift along with it.¹²²
In this example, f has y-intercept (0, −1). And so, y = f (x) + 2 and y = f (x) − 3 simply
have y-intercepts (0, 1) and (0, −4).
(Important: Note that in general, when a graph shifts upwards or downwards, we can’t
say anything about how the x-intercepts shift.)

¹²² This assertion is more formally stated and proven as Fact 196 in the Appendices.
Example 287. The black graph below is the function g ∶ R ∖ {0} → R defined by g (x) = 1/x.
The graph of y = g (x) + 1 = 1/x + 1 is simply that of g translated upwards by 1 unit.
Since g has horizontal asymptote y = 0 (the x-axis), y = g (x) + 1 has horizontal asymptote
y = 1.

[Figure: graphs of g and y = g (x) + 1, with their lines of symmetry. x = 0 is a vertical asymptote for both g and y = g (x) + 1.]

Note that with an upward or downward shift, any vertical asymptotes remain unchanged,
because a vertical line translated upwards or downwards is simply the same vertical line.
And so here, both g and y = g (x) + 1 have the vertical asymptote x = 0 (the y-axis).
The two lines of symmetry for g are y = x and y = −x. Thus, the two lines of symmetry
for y = g (x) + 1 are simply the same, but translated upwards by 1 unit — y = x + 1 and
y = −x + 1.


16.2. y = f (x + a)
The graph of y = f (x + a) is simply the graph of f translated leftwards by a units. (Note
that if a < 0, then we have a negative leftward shift, i.e. a rightward shift.)
Why leftwards (and not rightwards as one might expect)? The reason is that in order for
f (x1 ) and f (x2 + a) to “hit” the same value, it must be that x2 = x1 − a. That is, x2 must
be a units to the left of x1 .

Example 288. The black graph below is the function f ∶ R → R defined by f (x) = x³ − 1.
Also graphed are these two equations:

y = f (x + 2) = (x + 2)³ − 1 = x³ + 6x² + 12x + 7,

y = f (x − 1) = (x − 1)³ − 1 = x³ − 3x² + 3x − 2.

The first equation is simply f translated leftwards by 2 units. The second is simply f
translated rightwards by 1 unit.
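
If you’d like to double-check the two expansions in Example 288 without redoing the algebra, here is a quick sketch in Python (the names `shift_left` and `shift_right` are made up for this check). It compares each expanded polynomial against f evaluated at the shifted input:

```python
def f(x):
    """f from Example 288: f(x) = x^3 - 1."""
    return x**3 - 1

shift_left = lambda x: x**3 + 6*x**2 + 12*x + 7    # claimed expansion of f(x + 2)
shift_right = lambda x: x**3 - 3*x**2 + 3*x - 2    # claimed expansion of f(x - 1)

# Integer arithmetic is exact, so == is a fair comparison here.
for x in range(-5, 6):
    assert f(x + 2) == shift_left(x)
    assert f(x - 1) == shift_right(x)

# The x-intercept (1, 0) of f moves to (-1, 0) and (2, 0) respectively.
print(f(1), shift_left(-1), shift_right(2))   # 0 0 0
```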



[Figure: graphs of y = f (x + 2), f , and y = f (x − 1), with x-intercepts (−1, 0), (1, 0), and (2, 0).]

When a graph is translated leftwards or rightwards, any x-intercepts, lines of symmetry,
turning points, and asymptotes simply shift along with it.¹²³
In this example, f has x-intercept (1, 0). And so, y = f (x + 2) and y = f (x − 1) simply
have x-intercepts (−1, 0) and (2, 0).
(Important: Note that in general, when a graph shifts leftwards or rightwards, we can’t
say anything about how the y-intercepts shift.)

¹²³ This assertion is more formally stated and proven as Fact 196 in the Appendices.
Example 289. The black graph below is the function g ∶ R ∖ {0} → R defined by g (x) = 1/x.
The graph of y = g (x + 1) is simply that of g translated leftwards by 1 unit.
Since g has vertical asymptote x = 0 (the y-axis), y = g (x + 1) has vertical asymptote
x = −1.

[Figure: graphs of g and y = g (x + 1), with the vertical asymptote x = −1 and the lines y = x, y = −x, y = x + 1, and y = −x − 1. y = 0 is a horizontal asymptote for both g and y = g (x + 1).]

Note that with a leftward or rightward shift, any horizontal asymptotes remain unchanged,
because a horizontal line translated leftwards or rightwards is simply the same horizontal
line. And so here, both g and y = g (x + 1) have the horizontal asymptote y = 0 (the x-axis).
The two lines of symmetry for g are y = x and y = −x. Thus, the two lines of symmetry
for y = g (x + 1) are simply the same, but translated leftwards by 1 unit: y = x + 1 and
y = − (x + 1) = −x − 1.

16.3. y = af (x)
Let a > 0. The graph of y = af (x) is simply that of f stretched vertically (outwards from
the x-axis) by a factor of a. (If a < 1, then the graph is compressed rather than stretched.)

Example 290. The black graph below is the function f ∶ R → R defined by f (x) = x³ − 1.
The graph of y = 2f (x) = 2x³ − 2 is simply that of f stretched vertically (outwards from
the x-axis) by a factor of 2.

[Figure: graphs of y = f (x), y = 2f (x) (stretch 2×), and y = 0.5f (x) (compress 2×), with y-intercepts (0, −1), (0, −2), and (0, −1/2), and common x-intercept (1, 0).]

The graph of y = 0.5f (x) = 0.5x³ − 0.5 is simply that of f compressed vertically (inwards
towards the x-axis) by a factor of 2.
When a graph is stretched vertically, any y-intercepts, lines of symmetry, turning
points, and asymptotes stretch along with it.¹²⁴
In this example, f has y-intercept (0, −1). And so, y = 2f (x) and y = 0.5f (x) simply
have y-intercepts (0, −2) and (0, −0.5).
Under a vertical stretch, any x-intercepts remain unchanged. Here, all three graphs have
the same x-intercept (1, 0).

¹²⁴ This assertion is more formally stated and proven as Fact 196 in the Appendices.
To get y = −af (x) (where a > 0), first reflect f in the x-axis to get y = −f (x), then stretch
vertically by a factor of a.

Example 291. The black graph below is the function f ∶ R → R defined by f (x) = x³ − 1.
Suppose we want to graph y = −2f (x).
To do so, first reflect f in the x-axis to get y = −f (x) = −x³ + 1.

[Figure: graphs of f , y = −f (x), and y = −2f (x), with common x-intercept (1, 0).]

Then stretch vertically by a factor of 2, to get y = −2f (x) = −2x³ + 2.

16.4. y = f (ax)
Let a > 0. The graph of y = f (ax) is simply that of f compressed horizontally (inwards
towards the y-axis) by a factor of a.
Why compressed inwards (and not stretched outwards as one might expect)? The reason
is that in order for f (x1 ) and f (ax2 ) to “hit” the same value, we must have x2 = x1 /a.

Example 292. The black graph below is the function f ∶ R → R defined by f (x) = x³ − 1.
The graph of y = f (2x) = (2x)³ − 1 = 8x³ − 1 is simply that of f compressed horizontally
(inwards towards the y-axis) by a factor of 2.

[Figure: graphs of y = f (2x) (compress 2×), f , and y = f (x/2) (stretch 2×), with x-intercepts (1/2, 0), (1, 0), and (2, 0), and common y-intercept (0, −1).]

The graph of y = f (0.5x) = (0.5x)³ − 1 = 0.125x³ − 1 is simply that of f stretched
horizontally (outwards from the y-axis) by a factor of 2.

When a graph is stretched horizontally, any x-intercepts, lines of symmetry, turning
points, and asymptotes stretch along with it.¹²⁵
In this example, f has x-intercept (1, 0). And so, y = f (2x) and y = f (0.5x) simply have
x-intercepts (0.5, 0) and (2, 0).
Under a horizontal stretch, any y-intercepts remain unchanged. Here, all three graphs
have the same y-intercept (0, −1).

¹²⁵ This assertion is more formally stated and proven as Fact 196 in the Appendices.
To get y = f (−ax) (where a > 0), first reflect f in the y-axis to get y = f (−x), then compress
horizontally by a factor of a.

Example 293. The black graph below is the function f ∶ R → R defined by f (x) = x³ − 1.
Say we want to graph y = f (−2x).
To do so, first reflect f in the y-axis to get y = f (−x) = (−x)³ − 1 = −x³ − 1.

[Figure: graphs of f , y = f (−x), and y = f (−2x) (compress 2×), with x-intercepts (1, 0), (−1, 0), and (−1/2, 0), and common y-intercept (0, −1).]

Then compress horizontally by a factor of 2, to get y = f (−2x) = (−2x)³ − 1 = −8x³ − 1.

16.5. Combinations of the Above

Example 294. Some unknown function f is graphed below. You are told only that
points A, B, and C are (approximately) (−1.4, 0), (0.8, −1.1), and (1.4, 0).
Armed only with this knowledge, we will try to graph four equations: y = 2f (x) + 1,
y = 2f (x + 1), y = f (2x) + 1, and y = f (2x + 1).

[Figure: graph of f , with A ≈ (−1.4, 0), B ≈ (0.8, −1.1), and C ≈ (1.4, 0).]

First stretch f vertically by a factor of 2 to get y = 2f (x), then translate upwards by 1
unit to get y = 2f (x) + 1.

[Figure: graphs of y = 2f (x) and y = 2f (x) + 1.]

First translate f leftwards by 1 unit to get y = f (x + 1), then stretch vertically by a
factor of 2 to get y = 2f (x + 1).

[Figure: graphs of y = f (x + 1) and y = 2f (x + 1).]

First compress f horizontally by a factor of 2 to get y = f (2x), then translate upwards
by 1 unit to get y = f (2x) + 1.

[Figure: graphs of y = f (2x) and y = f (2x) + 1.]

First translate f leftwards by 1 unit to get y = f (x + 1), then compress horizontally by a
factor of 2 to get y = f (2x + 1).

[Figure: graphs of y = f (x + 1) and y = f (2x + 1).]


Fact 27. Let a, b > 0 and c, d ∈ R. Let f be a nice function. Then to get the graph of
y = af (bx + c) + d, follow these steps:
1. Translate leftwards by c units, to get y = f (x + c).
2. Compress horizontally (inwards towards y-axis) by a factor of b, to get y = f (bx + c).
3. Stretch vertically (outwards from x-axis) by a factor of a, to get y = af (bx + c).
4. Translate upwards by d units, to get y = af (bx + c) + d.

Proof. See p. 1272 in the Appendices.
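
The four steps of Fact 27 can be mimicked in code by treating each transformation as an operation that turns one function into another. The Python sketch below is only an illustration: the helper names, the sample function f (x) = x³ − 1, and the constants a, b, c, d are all choices made here, not part of the Fact.

```python
def translate_left(func, c):
    """y = func(x + c): shift the graph left by c units."""
    return lambda x: func(x + c)

def compress_horizontally(func, b):
    """y = func(b x): compress towards the y-axis by a factor of b."""
    return lambda x: func(b * x)

def stretch_vertically(func, a):
    """y = a func(x): stretch away from the x-axis by a factor of a."""
    return lambda x: a * func(x)

def translate_up(func, d):
    """y = func(x) + d: shift the graph up by d units."""
    return lambda x: func(x) + d

f = lambda x: x**3 - 1          # sample "nice" function (an assumption)
a, b, c, d = 2, 3, 1, 5         # arbitrary sample constants

# Steps 1 to 4 of Fact 27, applied in order.
steps = translate_up(stretch_vertically(compress_horizontally(translate_left(f, c), b), a), d)
direct = lambda x: a * f(b * x + c) + d

print(all(steps(x) == direct(x) for x in range(-10, 11)))   # True

# Swapping steps 1 and 2 gives y = f(b(x + c)) = f(bx + bc) instead.
swapped = translate_left(compress_horizontally(f, b), c)
print(swapped(0) == f(b * 0 + c))                            # False for these constants
```

The second check shows why the order of steps 1 and 2 matters: compressing first and translating afterwards produces f (bx + bc), not f (bx + c).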

Exercise 102. Use f from the last example to graph these equations. (Hint: You can
make use of what was already shown in the above example.)
(a) y = −2f (x) − 1. (b) y = 2f (−x) + 1. (Answers on p. 1416.)
(c) y = −2f (x + 1). (d) y = 2f (−x + 1). (Answers on p. 1417.)
(e) y = −f (2x) + 1. (f) y = f (−2x) + 1. (Answers on p. 1418.)
(g) y = −f (2x + 1). (h) y = f (−2x + 1). (Answers on p. 1419.)


16.6. y = ∣f (x)∣
• Where f ≥ 0 (i.e. above the x-axis), the graphs of f and y = ∣f (x)∣ coincide.
• But where f < 0 (i.e. below the x-axis), they’re reflections of each other in the x-axis.

Example 295. The black graph below is the function f ∶ R → R defined by f (x) = x³ − 1.
y = ∣f (x)∣ is the red dotted graph.

[Figure: graphs of f and y = ∣f (x)∣. Where f < 0, the two graphs are reflections of each other in the x-axis. Where f ≥ 0, the two graphs coincide.]

Example 296. The black graph below is the function g ∶ R ∖ {0} → R defined by g (x) =
y = ∣g (x)∣ is the red dotted graph.

[Figure: graphs of g and y = ∣g (x)∣. Where g ≥ 0, the two graphs coincide. Where g < 0, they’re reflections of each other in the x-axis.]

16.7. y = f (∣x∣)
1. Where x ≥ 0 (i.e. the portion to the right of the y-axis), f and y = f (∣x∣) coincide.
2. Now take this right portion and reflect it in the y-axis to get the left portion of y = f (∣x∣).

Example 297. The black graph below is the function f ∶ R → R defined by f (x) = x³ − 1.
y = f (∣x∣) is the red dotted graph.

[Figure: graphs of f and y = f (∣x∣). 1. The right portions coincide. 2. To get the left portion, simply reflect the right portion in the y-axis.]

Example 298. The black graph below is the function g ∶ R ∖ {0} → R defined by g (x) =
y = g (∣x∣) is the red dotted graph.

[Figure: graphs of g and y = g (∣x∣). 1. The right portions coincide. 2. To get the left portion, simply reflect the right portion in the y-axis.]

Here are three crude but helpful rules for sketching the graph of 1/f :
1. Small becomes big.
2. Big becomes small.
3. Intersect at f (x) = ±1.

Example 299. The black graph below is the function f ∶ R → R defined by f (x) = x³ − 1.
To graph 1/f , we make use of the above three rules:
1. Small becomes big.
In particular: Where f → 0− , we have 1/f → −∞; and where f → 0+ , we have 1/f → ∞.
So, x = 1 (where f (x) = 0) is a vertical asymptote of 1/f .
2. Big becomes small.
In particular: Where f → −∞, we have 1/f → 0− ; and where f → ∞, we have 1/f → 0+ .
So, y = 0 is a horizontal asymptote of 1/f .
3. Intersect at f (x) = ±1.

We have f (0) = −1 and f (∛2) = 1. So, the graphs of f and 1/f intersect at (0, −1) and (∛2, 1).
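
The third rule can be checked with a few lines of Python. This is only an illustrative sketch (the names f, reciprocal_f, and cube_root_of_2 are choices made here): at any x where f (x) equals 1 or −1, the value of 1/f (x) is the same number, so the two graphs meet there.

```python
def f(x):
    return x**3 - 1


def reciprocal_f(x):
    return 1 / f(x)


cube_root_of_2 = 2 ** (1 / 3)

# At x = 0 we have f(x) = -1, and at x = 2^(1/3) we have f(x) = 1 (up to rounding).
for x in (0.0, cube_root_of_2):
    print(x, f(x), reciprocal_f(x))

# "Small becomes big": just to the right of x = 1, f is small and positive, so 1/f is large.
print(f(1.001), reciprocal_f(1.001))
```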


[Figure: graphs of f and 1/f , showing the horizontal asymptote y = 0, the vertical asymptote x = 1, and the intersection points (0, −1) and (∛2, 1).]

Example 300. The black graph below is the function g ∶ R → R defined by g (x) = x² + 2.
To graph 1/g, we follow the same three rules:
1. Small becomes big.
Not applicable here, because g (x) ≥ 2 for all x and so g never gets close to 0.
2. Big becomes small.
Where g → ∞, we have 1/g → 0+ . So, y = 0 is a horizontal asymptote of 1/g.
3. Intersect at g (x) = ±1.
There is no x for which g (x) = ±1. Thus, the graphs of g and 1/g do not intersect at all.

[Figure: graphs of g, with y-intercept (0, 2), and 1/g, with y-intercept (0, 0.5) and horizontal asymptote y = 0.]

Exercise 103. Some unknown function f is graphed below. You are told only that points
A, B, and C are (approximately) (−1.4, 0), (0.8, −1.1), and (1.4, 0). Graph the following equations:

(a) y = ∣2f (2x)∣.

(b) y = f (∣x − 1∣) + 2. (Answer on p. 1420.)

A ≈ (−1.4, 0) C ≈ (1.4, 0)

B ≈ (0.8, −1.1)

Exercise 104. Describe a sequence of transformations that would transform the graph of y = 1/x onto y = 3 − 1/(5x − 2). (Answer on p. 1421.)

17. ln, exp, and e
In secondary school, your teachers probably went in this order:
1. First introduce Euler’s number e = 2.718 281 828 459 . . .
2. Then define the natural logarithm ln as the logarithm with base e.
Here we’ll go the other way round. We’ll:
1. First define the natural logarithm function ln.
2. Then define Euler’s number e as being the number such that ln e = 1.
This may seem strange but will pay off when we study calculus in Part V.

Definition of the Natural Logarithm Function (informal)

The natural logarithm function ln ∶ R+ → R is informally defined by:

ln t is the area under y = 1/x, between x = 1 and x = t.

[Figure: graph of y = 1/x, with the area between x = 1 and x = t shaded. ln t is defined as this shaded area.]

The above definition is considered (slightly) informal because the mapping rule is described
using geometry. After we’ve learnt about the definite integral in Part V (Calculus), we will
give a formal definition of the natural logarithm function (see Definition 184).

Remark 30. In Singapore and some other Britishy bits of the world, ln is usually read
aloud as lawn. In the US, it’s usually read out loud as “el en”.
The notation ln was probably first published in Steinhauser (1875, p. 277), where it stood
for the Latin Logarithmus naturalis.


Example 301. We have ln 4 = 1.386 . . . and ln 5 = 1.609 . . .

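[Figure: two graphs of y = 1/x. In the first, the shaded area between x = 1 and x = 4 is ln 4 = 1.386 . . . In the second, the shaded area between x = 1 and x = 5 is ln 5 = 1.609 . . .]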

Note that right now, we don’t yet know how to calculate ln 4 or ln 5.

One possibility is to draw an extremely precise and large graph of y = 1/x on some graph
paper, then slowly count up the squares. This sounds like a ridiculous idea, but as we’ll
learn in Part V, this isn’t too far from describing how integral calculus is done.
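
The “counting squares” idea can be imitated on a computer. The Python sketch below is only an illustration (the name ln_by_area and the choice of 100 000 thin strips are assumptions made here, not how ln is formally defined). It adds up the areas of many thin rectangular strips under y = 1/x between x = 1 and x = t.

```python
def ln_by_area(t, strips=100_000):
    """Approximate ln t as the area under y = 1/x between x = 1 and x = t."""
    if t <= 0:
        raise ValueError("ln t is undefined for t <= 0")
    width = (t - 1) / strips  # width of each thin strip
    total = 0.0
    for i in range(strips):
        midpoint = 1 + (i + 0.5) * width  # x-coordinate at the middle of strip i
        total += (1 / midpoint) * width   # height of strip times its width
    return total


print(ln_by_area(4))  # close to 1.386
print(ln_by_area(5))  # close to 1.609
```

Using more strips gives a better approximation, at the cost of more additions.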


Example 302. By definition, ln 1 is the area under y = 1/x, between x = 1 and x = 1.
Hence, ln 1 = 0.

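[Figure: graph of y = 1/x. ln 1 = 0 because there’s no area!]

Note that if 0 < x < 1, then ln x is negative (informally, the area is counted as negative because x lies to the left of 1):

Example 303. ln 0.9 = −0.105 . . .

[Figure: graph of y = 1/x, with the area between x = 0.9 and x = 1 shaded.]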

Example 304. ln 0.5 = −0.693 . . .

[Figure: graph of y = 1/x, with the area between x = 0.5 and x = 1 shaded.]

Note that if x ≤ 0, then ln x is simply undefined:

Example 305. ln 0 and ln (−2.5) are undefined.

Below is the graph of ln. As we saw above:

• ln 1 = 0 — thus, ln has x-intercept (1, 0).
• For x ∈ (0, 1), ln x < 0.
• For x ≤ 0, ln x is simply undefined.


[Figure: graph of ln. As x → 0+ , ln x → −∞. Thus, ln has the vertical asymptote x = 0 (also the y-axis).]

Observe that for x > 0, the graph of y = 1/x is strictly positive. Since ln is defined as the
area under the graph of y = 1/x, it follows that ln is strictly increasing. And since ln is
strictly increasing, by the Horizontal Line Test (see p. 181), ln is invertible. That is, ln has
an inverse function.
Now, we don’t know what exactly this inverse function is, but we know it exists. So, let’s
simply give it a name — the exponential function.

Definition 59. The exponential function, denoted exp, is the inverse of the natural
logarithm function.

Since Range (ln) = R, the above Definition says that exp has:

Domain: R,
Codomain: R+ ,
Mapping rule: ln y = x ⇐⇒ exp x = y.

Example 306. Since exp is the inverse of ln, we know from the earlier examples that:

• exp 1.386 ≈ 4, exp 1.609 ≈ 5, and exp 0 = 1.

• exp (−0.105) ≈ 0.9 and exp (−0.693) ≈ 0.5.
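
Because ln is strictly increasing, the mapping rule ln y = x ⇐⇒ exp x = y can be turned into a crude search for exp x. The Python sketch below is an illustration only (the names ln_by_area and exp_by_bisection, the strip count, and the tolerance are all assumptions made here). It approximates ln by adding up thin strips under y = 1/x, and then repeatedly halves an interval that must contain the y for which ln y = x.

```python
def ln_by_area(t, strips=10_000):
    """Approximate ln t as the area under y = 1/x between x = 1 and x = t."""
    width = (t - 1) / strips
    return sum(width / (1 + (i + 0.5) * width) for i in range(strips))


def exp_by_bisection(x, tolerance=1e-9):
    """Approximate exp x as the y > 0 for which ln y = x."""
    low, high = 1.0, 1.0
    while ln_by_area(high) < x:  # push high rightwards until ln(high) >= x
        high *= 2
    while ln_by_area(low) > x:   # push low leftwards until ln(low) <= x
        low /= 2
    while high - low > tolerance:
        middle = (low + high) / 2
        if ln_by_area(middle) < x:
            low = middle   # the answer lies to the right of middle
        else:
            high = middle  # the answer lies to the left of middle
    return (low + high) / 2


for x in (1.386, 1.609, 0, -0.105, -0.693):
    print(x, exp_by_bisection(x))
```

The printed values should be close to 4, 5, 1, 0.9, and 0.5, in line with Example 306.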

Since exp is the inverse of ln, it must be the reflection of ln in y = x:


[Figure: graphs of ln and exp, which are reflections of each other in the line y = x.]

We next introduce Euler’s number e.

17.1. Euler’s Number

Definition 60. Euler’s number, denoted e, is defined as the number that satisfies:

ln e = 1 or equivalently, e = exp 1.

Remark 31. Euler is pronounced oiler (and not yuler).

And so by definition of ln, Euler’s number¹²⁶ e is the number such that the area under y = 1/x, between x = 1 and x = e
is equal to 1.
Right now we have no idea how to compute e. But I’ll cheat by telling you that:

e = exp 1 ≈ 2.718 281 828 459 . . .

[Figure: graph of y = 1/x. We define e so that the shaded area between x = 1 and x = e equals 1.]

We also have the following familiar result:

Fact 28. For every real number x, we have:

eˣ = exp x.

Proof. See p. 1366 in the Appendices.

The above Fact justifies why we can and will often write eˣ in place of exp x.

Remark 32. Note that while interesting, Euler’s number e is by itself not particularly
important. What’s really important are the natural logarithm and exponential functions.¹²⁷
For this reason, we will in this textbook often write exp x rather than eˣ. This is to
remind you that this expression is not simply a number e raised to the power of x, but is
the value of the exponential function at x.

¹²⁶ It was Leonhard Euler (1707–1783) himself who first used the letter e to denote this number. Presumably
he did not do this to honour himself. Calling e Euler’s number is simply an honour conferred by
posterity. (Confusingly, there is also another number called Euler’s constant γ ≈ 0.577 215 664 . . . But
fortunately we will not encounter γ in A-Level maths.)
¹²⁷ Indeed, one mathematician Walter Rudin (1966 [1987]) goes so far as to call the exponential function
“the most important function in mathematics”.

We also have these two lovely results:

Theorem 1. e = 1/0! + 1/1! + 1/2! + 1/3! + . . .

Proof. Happily, we’ll learn to prove this in Part V (Calculus) — see p. 769.

Theorem 2. $e = \lim_{n \to \infty} \left(1 + \frac{1}{n}\right)^n$.

Proof. Happily, we’ll learn to prove this in Part V (Calculus) — see p. 770.

Although we can’t prove either of the above theorems right now, we can nonetheless numerically “verify” that they are at least plausible:

To “verify” Theorem 1, define f ∶ Z₀⁺ → R by f (n) = 1/0! + 1/1! + ⋅ ⋅ ⋅ + 1/n!.
Then write:

f (0) = 1/0! = 1/1 = 1.
f (1) = f (0) + 1/1! = 1 + 1/1 = 2.
f (2) = f (1) + 1/2! = 2 + 1/2 = 2.5.
f (3) = f (2) + 1/3! = 2.5 + 1/6 = 2.666 . . .
f (4) = f (3) + 1/4! = 2.666 . . . + 1/24 = 2.708 3 . . .
f (5) = f (4) + 1/5! = 2.708 3 . . . + 1/120 = 2.716 6 . . .
f (6) = f (5) + 1/6! = 2.716 6 . . . + 1/720 = 2.718 05 . . .

We see that f rapidly converges towards e = 2.718 281 828 459 . . . By f (6), we have e correct
to three decimal places. Lovely.

Similarly, to “verify” Theorem 2, define g ∶ Z+ → R by g(n) = (1 + 1/n)ⁿ.
Then write:

g(1) = (1 + 1/1)¹ = 2.
g(2) = (1 + 1/2)² = 2.25.
g(3) = (1 + 1/3)³ = 2.370 370 . . .
g(4) = (1 + 1/4)⁴ = 2.441 406 . . .
g(5) = (1 + 1/5)⁵ = 2.488 32.
g(10) = (1 + 1/10)¹⁰ = 2.593 742 . . .
g(100) = (1 + 1/100)¹⁰⁰ = 2.704 813 . . .
g(1 000) = (1 + 1/1 000)¹⁰⁰⁰ = 2.716 923 . . .
g(10⁶) = (1 + 1/10⁶)^(10⁶) = 2.718 280 . . .
g(10⁹) = (1 + 1/10⁹)^(10⁹) = 2.718 281 . . .

We see that g also converges towards e = 2.718 281 828 459 . . . , though much less rapidly
than f . Even g(100) gets e correct only to one decimal place. And even g (10⁶) gets e
correct to only five decimal places.
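
The two numerical “verifications” above are easy to reproduce. The following Python sketch is illustrative only (f and g mirror the functions just defined, and the particular values of n printed are a choice made here). It uses only the standard library.

```python
from math import factorial


def f(n):
    """Partial sum 1/0! + 1/1! + ... + 1/n! from Theorem 1."""
    return sum(1 / factorial(k) for k in range(n + 1))


def g(n):
    """The expression (1 + 1/n)^n from Theorem 2."""
    return (1 + 1 / n) ** n


for n in range(7):
    print("f", n, f(n))

for n in (1, 2, 3, 4, 5, 10, 100, 1000, 10**6):
    print("g", n, g(n))
```

Very large n (say 10⁹) is left out on purpose: ordinary floating-point arithmetic rounds 1 + 1/n before raising it to the power n, so the printed digits become less trustworthy.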
18. O-Level Review: The Derivative

18.1. The Gradient

Informally, the gradient or slope of a line is:
• “Rise over Run”.
• ∆y/∆x or “Change in y over change in x”. (∆ is the upper-case Greek letter delta.)
• The answer to the question: “If we move 1 unit to the right while remaining along the
line, by how many units will we move up?”
As you know, the gradient of a line is constant:

Example 307. The gradient of the line y = 2x + 1 at every point is:

∆y/∆x = 2/1 = 2.

[Figure: the line y = 2x + 1, with ∆x = 1 and ∆y = 2 marked.]

Example 308. The gradient of the line y = −x − 1 at every point is:
∆y/∆x = −1/1 = −1.

[Figure: the line y = −x − 1, with ∆x = 1 and ∆y = −1 marked.]

Remark 33. Slope is a perfectly good synonym for gradient. However, your A-Level
syllabus and exams do not use the word slope. And so we’ll stick to using only the word gradient.

In general, the gradient or derivative of a graph at a point is the gradient of the tangent
line at that point. Informally, a tangent line (from the Latin for touching line) is a line
that just touches the graph at that point.

Example 309. Graphed below is the equation y = x² + 1.

The red line is tangent to the graph at the point (−2, 5). It can be shown that this line’s
gradient is −4. Therefore, the graph’s gradient at this point is −4.
The blue line is tangent to the graph at the point (0, 1). It can be shown that this line’s
gradient is 0. Therefore, the graph’s gradient at this point is 0.
The green line is tangent to the graph at the point (3, 10). It can be shown that this
line’s gradient is 6. Therefore, the graph’s gradient at this point is 6.

[Figure: graph of y = x² + 1, with tangent lines at (−2, 5) (gradient −4), (0, 1) (gradient 0), and (3, 10) (gradient 6).]

Example 310. Consider the graph of y = x³ + 1.
The red line is tangent to the graph at the point (−2, −7). It can be shown that this
line’s gradient is 12. Therefore, the graph’s gradient at this point is 12.
The blue line is tangent to the graph at the point (0, 1). It can be shown that this line’s
gradient is 0. Therefore, the graph’s gradient at this point is 0.
The green line is tangent to the graph at the point (3, 28). It can be shown that this
line’s gradient is 27. Therefore, the graph’s gradient at this point is 27.

[Figure: graph of y = x³ + 1, with tangent lines at (−2, −7) (gradient 12), (0, 1) (gradient 0), and (3, 28) (gradient 27).]

18.2. The Derivative as the Gradient
Informally, the derivative — variously denoted as dy/dx, f ′ , and f˙ — is the gradient. A
little more formally, it is the function that tells us what a graph’s gradient is at each point.
We’ll learn more about this in Part V. But for now, some quick examples:

Example 311. Define f ∶ R → R by f (x) = 5x. The graph of f is simply the graph of the
equation y = 5x. We don’t yet know how, but it’s possible to show that at every point,
this graph’s gradient is equal to 5:

dy/dx = f ′ (x) = 5 for all x.

So, for example, at x = −2, 0, 3, and indeed any other point, the graph’s gradient is 5.

Example 312. Define g ∶ R → R by g (x) = x² + 1. The graph of g is simply the graph
of y = x² + 1. We don’t yet know how, but it’s possible to show that at every point, this
graph’s gradient is equal to 2x:

dy/dx = g ′ (x) = 2x.

So, for example:

$$\left.\frac{dy}{dx}\right|_{x=-2} = g'(-2) = 2 \cdot (-2) = -4, \quad \left.\frac{dy}{dx}\right|_{x=0} = g'(0) = 2 \cdot 0 = 0, \quad \left.\frac{dy}{dx}\right|_{x=3} = g'(3) = 2 \cdot 3 = 6.$$

That is, at the points x = −2, 0, and 3, the graph’s gradient is −4, 0, and 6, respectively.
Formally, $\left.\frac{dy}{dx}\right|_{x=a} = g'(a)$ is read aloud as “the derivative of y with respect to x, evaluated
at a” or “the derivative of g at a”.
A little less formally, we can simply read it as “the gradient at a”.

Example 313. Define h ∶ R → R by h (x) = x³ + 1. The graph of h is simply the graph
of y = x³ + 1. We don’t yet know how, but it’s possible to show that at every point, this
graph’s gradient is equal to 3x²:

dy/dx = h′ (x) = 3x².

So, for example:

$$\left.\frac{dy}{dx}\right|_{x=-2} = h'(-2) = 3 \cdot (-2)^2 = 12, \quad \left.\frac{dy}{dx}\right|_{x=0} = h'(0) = 3 \cdot 0^2 = 0, \quad \left.\frac{dy}{dx}\right|_{x=3} = h'(3) = 3 \cdot 3^2 = 27.$$

That is, at the points x = −2, 0, and 3, the graph’s gradient is 12, 0, and 27, respectively.
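
Although we can’t yet derive these gradients, we can at least check them numerically. The Python sketch below is an illustration under stated assumptions: it estimates the gradient at a point by taking two nearby points on the graph and computing “rise over run” between them (the name gradient_at and the step size are choices made here, not this textbook’s method).

```python
def gradient_at(func, a, step=1e-6):
    """Estimate the gradient of func at a using two nearby points ("rise over run")."""
    rise = func(a + step) - func(a - step)
    run = 2 * step
    return rise / run


def g(x):
    return x**2 + 1


def h(x):
    return x**3 + 1


for a in (-2, 0, 3):
    # Compare the estimates with 2x (for g) and 3x^2 (for h).
    print(a, gradient_at(g, a), 2 * a, gradient_at(h, a), 3 * a**2)
```

The estimates should agree with 2x and 3x² to several decimal places.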


18.3. The Derivative as Rate of Change
Recall that dx/dt may be interpreted as:
• The rate of change of x with respect to t; and
• The change in x resulting from a small unit change in t.

Example 314. A particle P travels along a line. Its eastward displacement x (metres)
from the point O at time t (seconds) is given by:

x (t) = t³ − 6t² + 9t.

[Figure: number line showing P ’s position x (metres) east of O, marked from 0 to 4.]

• At t = 0, P starts x = 0³ − 6 ⋅ 0² + 9 ⋅ 0 = 0 metres east of O. In other words, it is at O.
And during t ∈ [0, 1), P travels eastwards away from O.
• At t = 1, P stops and is x = 1³ − 6 ⋅ 1² + 9 ⋅ 1 = 4 metres east of O.
• During t ∈ (1, 3), it travels westwards, i.e. back towards O. For example, at t = 2, P is
x = 2³ − 6 ⋅ 2² + 9 ⋅ 2 = 2 metres east of O.
• At t = 3, P has returned to O (x = 3³ − 6 ⋅ 3² + 9 ⋅ 3 = 0) and stops.
• During t > 3, P once again travels eastwards away from O. For example, at t = 4, P is
x = 4³ − 6 ⋅ 4² + 9 ⋅ 4 = 4 metres east of O. And at t = 10 (not depicted in the figures), P
is x = 10³ − 6 ⋅ 10² + 9 ⋅ 10 = 490 metres east of O.

[Figure: graph of x (m) against t (s), for t from 0 to 4.]

We define velocity v (metres per second) to be the rate of change of displacement with
respect to (w.r.t.) time. In other words, velocity is the (first) derivative of displacement
w.r.t. time:
v = dx/dt.
Using rules of differentiation that we’ll review in Ch. 18.5, it is possible to work out that
P ’s (eastward) velocity is given by:

v (t) = 3t2 − 12t + 9.

[Figure: graph of v (m s⁻¹) against t (s), for t from 0 to 4.]


At each instant of time, v tells us what P ’s velocity is — in other words, the rate at
which x is changing per “infinitesimally small” unit of time t. If v > 0, then P is travelling
eastwards. And if v < 0, then P is travelling westwards.
From the above graph, we can tell that:
• During t ∈ [0, 1), P travels eastwards.
• At t = 1, P stops.
• During t ∈ (1, 3), P travels westwards.
• At t = 3, P stops.
• During t > 3, P travels eastwards.
We define acceleration a (metres per second per second) to be the rate of change of
velocity w.r.t. time. In other words, acceleration is the (first) derivative of velocity w.r.t.
time and thus the second derivative of displacement w.r.t. time:
$$a = \frac{dv}{dt} = \frac{d^2 x}{dt^2}.$$
Again, it is possible to work out that P ’s (eastward) acceleration is given by:

a (t) = 6t − 12.

[Figure: graph of a (m s⁻²) against t (s), for t from 0 to 4.]

At each instant of time, a tells us what P ’s acceleration is — in other words, the rate
at which v is changing per “infinitesimally small” unit of time t. If a > 0, then P ’s
eastwards velocity is increasing (or equivalently, its westwards velocity is decreasing).
And if a < 0, then P ’s eastwards velocity is decreasing (or equivalently, its westwards
velocity is increasing).
From the above graph, we can tell that:
• During t ∈ [0, 2), we have a < 0 and so P ’s eastwards velocity is decreasing (or equivalently, its
westwards velocity is increasing).
• During t > 2, we have a > 0 and so P ’s eastwards velocity is increasing (or equivalently, its
westwards velocity is decreasing).
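
To see displacement, velocity, and acceleration side by side, we can tabulate the three formulas from this example. The Python sketch below is illustrative only (the function names and the choice of sample times are assumptions made here).

```python
def displacement(t):
    return t**3 - 6 * t**2 + 9 * t  # x(t), in metres east of O


def velocity(t):
    return 3 * t**2 - 12 * t + 9    # v(t), in metres per second


def acceleration(t):
    return 6 * t - 12               # a(t), in metres per second per second


def direction(t):
    """Describe P's direction of travel at time t from the sign of v(t)."""
    v = velocity(t)
    if v > 0:
        return "eastwards"
    if v < 0:
        return "westwards"
    return "stopped"


for t in (0, 0.5, 1, 2, 3, 4):
    print(t, displacement(t), velocity(t), acceleration(t), direction(t))
```

At t = 1 and t = 3 the velocity is zero (P stops), and the acceleration changes sign at t = 2.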


18.4. Differentiability vs Continuity
In Ch. 11, we learnt that informally, continuity means you can draw the graph without
lifting your pencil. We now introduce the concept of differentiability, which turns out
to be a stronger condition than continuity. Informally, a function is differentiable if its
graph is “smooth” and, in particular, doesn’t have “kinks”.¹²⁸

Example 315. The function f ∶ R → R defined by f (x) = x⁵ − x² − 1 is continuous
everywhere, because you can draw its entire graph without lifting your pencil.

The function f is also differentiable everywhere, because it is “smooth” everywhere and
has no “kinks”.

[Figure: graph of f . f is differentiable everywhere because its graph is “smooth” and has no “kinks”.]

Example 316. The sine function sin is both continuous everywhere and differentiable everywhere.

[Figure: graph of sin. sin is differentiable everywhere because its graph is “smooth” and has no “kinks”.]

¹²⁸ We will formally define the concept of differentiability in Part V (Calculus).

Example 317. The exponential function exp is both continuous everywhere and differentiable everywhere.

[Figure: graph of exp. exp is differentiable everywhere because its graph is “smooth” and has no “kinks”.]

The above examples may leave you wondering, “So aren’t continuity and differentiability
just the same thing?” Well, it turns out that every differentiable function must also
be continuous.¹²⁹ (This is exactly what we meant when we said that differentiability is a
stronger condition than continuity.)
However, the converse is false — not every continuous function is differentiable; that is,
continuity does not imply differentiability. Differentiable functions are thus a subset of
continuous functions. Beautiful Venn diagram drawn by an artistic genius:



Let’s now look at some examples of functions that are continuous but not differentiable.
The classic example is the absolute value function:

¹²⁹ This assertion is formally stated and proven as Theorem 19 in the Appendices.
Example 318. The absolute value function ∣⋅∣ is continuous everywhere, because you can
draw its entire graph without lifting your pencil.
However, it is not differentiable everywhere, because it is not “smooth” everywhere. In
particular, it has a “kink” at x = 0.
We can say though that the absolute value function is differentiable everywhere except
at x = 0. Or equivalently, it is differentiable on R ∖ {0}.

[Figure: graph of ∣⋅∣. ∣⋅∣ is not differentiable everywhere because it has a kink at x = 0. ∣⋅∣ is differentiable on R ∖ {0}.]

Example 319. Define the piecewise function g ∶ R → R by:

$$g(x) = \begin{cases} -x, & \text{for } x \le 0, \\ x^2, & \text{for } x > 0. \end{cases}$$

The function g is continuous everywhere, because you can draw its entire graph without
lifting your pencil.
However, it is not differentiable everywhere because, like ∣⋅∣, it has a “kink” at x = 0.
Nonetheless, again, we can say that g is differentiable everywhere except at x = 0. Or
equivalently, g is differentiable on R ∖ {0}.

[Figure: graph of g. g is not differentiable everywhere because it has a kink at x = 0. g is differentiable on R ∖ {0}.]

Any function that’s differentiable must also be continuous. Therefore, by the contrapositive,
any function that’s non-continuous must also be non-differentiable:

Example 320. The tangent function tan is not continuous everywhere. Thus, it is not
differentiable everywhere either.

[Figure: graph of tan, with the interval (−π/2, π/2) marked. tan is differentiable on (−π/2, π/2).]

It is however both (i) continuous and (ii) differentiable on the interval (−π/2, π/2). This is
because on this interval, its graph (i) can be drawn without lifting your pencil; and (ii) is “smooth”
and has no “kinks”.
Indeed, tan is both continuous and differentiable on every interval:

((k − 1/2) π, (k + 1/2) π), for k ∈ Z.
2 2

Happily, most functions we’ll encounter in A-Level maths will be both continuous and
differentiable. There are however exceptions, as we’ve seen here and in Ch. 11.

Exercise 105. Graphed below are three functions f , g, and h. State if each is continuous
everywhere and/or differentiable everywhere. If not, state the set of points on which each
function is continuous or differentiable. (Answer on p. 1422.)

[Figure: graphs of the three functions f , g, and h.]

18.5. Rules of Differentiation
Below, we informally state several Rules of Differentiation that you should find familiar.¹³⁰
In Part V (Calculus), we will explain a little more where these rules come from. For
now, you need merely “know” these rules and how to use them to “solve” differentiation problems.

Rules of Differentiation (informal)

Let c be a constant and x, y, and z be variables. Then:

Constant Rule: $\frac{d}{dx} c \overset{C}{=} 0$

Constant Factor Rule: $\frac{d}{dx}(cy) \overset{F}{=} c\,\frac{dy}{dx}$

Power Rule: $\frac{d}{dx} x^c \overset{P}{=} c x^{c-1}$

Sum and Difference Rules: $\frac{d}{dx}(y \pm z) \overset{\pm}{=} \frac{dy}{dx} \pm \frac{dz}{dx}$

Product Rule: $\frac{d}{dx}(yz) \overset{\times}{=} z\,\frac{dy}{dx} + y\,\frac{dz}{dx}$

Quotient Rule: $\frac{d}{dx}\frac{y}{z} \overset{\div}{=} \frac{z\,\frac{dy}{dx} - y\,\frac{dz}{dx}}{z^2}$

Sine: $\frac{d}{dx} \sin x = \cos x$

Cosine: $\frac{d}{dx} \cos x = -\sin x$

Natural Logarithm: $\frac{d}{dx} \ln x = \frac{1}{x}$

Exponential: $\frac{d}{dx} e^x = e^x$

You’re probably wondering where the Chain Rule is. Don’t worry, it is the topic of the
next subchapter.

Here’s a common mnemonic for the Quotient Rule:

Lo-D-Hi minus Hi-D-Lo,

Cross over and square the Lo.

¹³⁰ For a formal statement of these Rules of Differentiation, see Proposition 20 (Appendices).

Example 321. $\frac{d}{dx} 5 \overset{C}{=} 0$. (Constant Rule)

Example 322. $\frac{d}{dx} 500 \overset{C}{=} 0$. (Constant Rule)

Example 323. $\frac{d}{dx} (-200) \overset{C}{=} 0$. (Constant Rule)

Example 324. $\frac{d}{dx}(5x^3) \overset{F}{=} 5\left(\frac{d}{dx}x^3\right) = 5(3x^2) = 15x^2$. (Constant Factor Rule)

Example 325. $\frac{d}{dx}(500x^{0.3}) \overset{F}{=} 500\left(\frac{d}{dx}x^{0.3}\right) = 500(0.3x^{-0.7}) = 150x^{-0.7}$. (CFR)

Example 326. $\frac{d}{dx}(-200x^{-1}) \overset{F}{=} -200\left(\frac{d}{dx}x^{-1}\right) = -200(-x^{-2}) = 200x^{-2}$. (CFR)

Example 327. $\frac{d}{dx} x^3 \overset{P}{=} 3x^2$. (Power Rule)

Example 328. $\frac{d}{dx} x^{0.3} \overset{P}{=} 0.3x^{-0.7}$. (Power Rule)

Example 329. $\frac{d}{dx} x^{-1} \overset{P}{=} -x^{-2}$. (Power Rule)

Example 330. Sum and Difference Rules:

$$\frac{d}{dx}(x^3 + 500x^{0.3}) \overset{\pm}{=} \frac{d}{dx}x^3 + \frac{d}{dx}(500x^{0.3}) = 3x^2 + 150x^{-0.7}.$$

$$\frac{d}{dx}(x^3 - 500x^{0.3}) \overset{\pm}{=} \frac{d}{dx}x^3 - \frac{d}{dx}(500x^{0.3}) = 3x^2 - 150x^{-0.7}.$$

Example 331. Product Rule and Sine:

$$\frac{d}{dx}(x^3 \sin x) \overset{\times}{=} 3x^2 \sin x + x^3 \cos x.$$

Example 332. Quotient Rule and Cosine:

$$\frac{d}{dx}\frac{x^3}{\cos x} \overset{\div}{=} \frac{3x^2 \cos x - x^3(-\sin x)}{\cos^2 x} = \frac{x^2}{\cos^2 x}(3\cos x + x \sin x).$$

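Example 333. Combination of various rules:

$$\begin{aligned}
\frac{d}{dx}\frac{e^{2x}}{\ln x^2} &= \frac{d}{dx}\frac{e^x e^x}{2\ln x} \overset{F}{=} \frac{1}{2}\,\frac{d}{dx}\frac{e^x e^x}{\ln x} \\
&= \frac{1}{2}\cdot\frac{\ln x\,\frac{d}{dx}(e^x e^x) - (e^x e^x)\,\frac{d}{dx}\ln x}{(\ln x)^2} \\
&= \frac{1}{2}\cdot\frac{\ln x\,(e^x e^x + e^x e^x) - (e^x e^x)\,\frac{1}{x}}{(\ln x)^2} \\
&= \frac{e^{2x}}{2}\cdot\frac{2\ln x - \frac{1}{x}}{(\ln x)^2} = \frac{e^{2x}}{\ln x}\left(1 - \frac{1}{2x\ln x}\right).
\end{aligned}$$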
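
A long computation like Example 333 is worth sanity-checking. The Python sketch below is illustrative only (the names y, claimed_derivative, and estimated_gradient, the test point x = 2, and the step size are all assumptions made here). It compares the final formula against a “rise over run” estimate taken between two nearby points.

```python
import math


def y(x):
    """The function from Example 333: e^(2x) / ln(x^2), for x > 1."""
    return math.exp(2 * x) / math.log(x**2)


def claimed_derivative(x):
    """The final answer of Example 333."""
    return math.exp(2 * x) / math.log(x) * (1 - 1 / (2 * x * math.log(x)))


def estimated_gradient(func, x, step=1e-6):
    """Estimate the gradient of func at x using two nearby points."""
    return (func(x + step) - func(x - step)) / (2 * step)


x0 = 2.0
print(claimed_derivative(x0), estimated_gradient(y, x0))
```

The two printed numbers should agree to several decimal places. This does not prove the formula, but a mismatch would signal an algebra slip.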

Exercise 106. Find $\frac{dy}{dx}$ and $\left.\frac{dy}{dx}\right|_{x=\dots}$ for each of the following. (Answer on p. 1422.)
(a) y = x².
(b) y = 3x⁵ − 4x² + 7x − 2.
(c) y = (x² + 3x + 4) (3x⁵ − 4x² + 7x − 2).
Exercise 107. For each of the following, find dy/dx without using the chain rule.
(a) y = eˣ ln x.
(b) y = x² eˣ ln x.
sin x
(c) y = .
(d) y = tan x, given that tan x = sin x / cos x and sin² x + cos² x = 1.
(e) y = 1/z, where z is a variable that can be expressed in terms of x. (Leave your answer
in terms of z and dz/dx.)
Use (e) to solve (f), (g) and (h):
(f) y = cosec x, where cosec x = 1/ sin x.
(g) y = sec x, where sec x = 1/ cos x.
(h) y = cot x, where cot x = 1/ tan x. (Answers on p. 1422.)

18.6. The Chain Rule
You will, of course, recall the Chain Rule from secondary school. Here we will merely
state it informally, as we did in secondary school:¹³¹

Chain Rule (informal)

Let x, y, and z be variables. Then:

$$\frac{dz}{dx} = \frac{dz}{dy} \cdot \frac{dy}{dx}.$$

One mnemonic is to think of the derivatives on the RHS as fractions — in which case,
the dy’s get cancelled out and we’re left with dz/dx. (In Part V, we’ll explain why this is
merely a mnemonic and why it is wrong to think of derivatives as fractions.)
The chain rule has the following informal interpretation:

(The change in z caused by a small unit change in x) = (The change in z caused by a small unit change in y) × (The change in y caused by a small unit change in x).

This interpretation makes (common)sense:

Example 334. Say that if I add to a cup of water 1 g of Milo, its water volume increases
by 2 cm³. And if the cup’s water volume increases by 1 cm³, its water level rises by 0.3 cm.
Then by common sense, if I add 1 g of Milo, its water level should rise by 2 × 0.3 = 0.6 cm.
Let’s now rewrite the above common-sense observations more formally:
Let x be the mass (g) of Milo in a cup of water, y be the total volume (cm³) of water in
the cup, and z be the water level (cm) in the cup.
• When x increases by 1 g, y increases by 2 cm³.

Formally: dy/dx = 2 cm³ g⁻¹.
• When y increases by 1 cm³, z increases by 0.3 cm.

Formally: dz/dy = 0.3 cm cm⁻³ = 0.3 cm⁻².
• And so, by the chain rule, when x increases by 1 g, z increases by 2 × 0.3 = 0.6 cm.

Formally: dz/dx = dz/dy ⋅ dy/dx = 0.3 cm⁻² × 2 cm³ g⁻¹ = 0.6 cm g⁻¹.

¹³¹ For a formal statement and proof of the Chain Rule, see Theorem XXX (Appendices).

Examples of how to “use” the chain rule:

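Example 335. Let $y = e^{\sin x}$. Then:

$$\frac{dy}{dx} = \frac{d\,e^{\sin x}}{dx} \overset{\text{Ch}}{=} \frac{d\,e^{\sin x}}{d \sin x}\cdot\frac{d \sin x}{dx} = e^{\sin x} \cos x.$$

Example 336. Let $y = \sqrt{4x - 1}$. Then:

$$\frac{dy}{dx} = \frac{d\sqrt{4x - 1}}{dx} \overset{\text{Ch}}{=} \frac{d\sqrt{4x - 1}}{d(4x - 1)}\cdot\frac{d(4x - 1)}{dx} = \frac{1}{2\sqrt{4x - 1}} \times 4 = \frac{2}{\sqrt{4x - 1}}.$$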

A slightly more complicated example, where we use the Chain Rule more than once:

Example 337. Let $y = [\sin(2x - 3) + \cos(5 - 2x)]^3$. Then:

$$\begin{aligned}
\frac{dy}{dx} &= \frac{d\,[\sin(2x - 3) + \cos(5 - 2x)]^3}{dx} \\
&\overset{\text{Ch}}{=} \frac{d\,[\sin(2x - 3) + \cos(5 - 2x)]^3}{d[\sin(2x - 3) + \cos(5 - 2x)]}\cdot\frac{d[\sin(2x - 3) + \cos(5 - 2x)]}{dx} \\
&= 3[\sin(2x - 3) + \cos(5 - 2x)]^2\left[\frac{d\sin(2x - 3)}{dx} + \frac{d\cos(5 - 2x)}{dx}\right] \\
&\overset{\text{Ch}}{=} 3[\sin(2x - 3) + \cos(5 - 2x)]^2\left[\frac{d\sin(2x - 3)}{d(2x - 3)}\cdot\frac{d(2x - 3)}{dx} + \frac{d\cos(5 - 2x)}{d(5 - 2x)}\cdot\frac{d(5 - 2x)}{dx}\right] \\
&= 3[\sin(2x - 3) + \cos(5 - 2x)]^2\,[\cos(2x - 3) \times 2 + (-\sin(5 - 2x)) \times (-2)] \\
&= 6[\sin(2x - 3) + \cos(5 - 2x)]^2\,[\cos(2x - 3) + \sin(5 - 2x)].
\end{aligned}$$
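
The three Chain Rule answers above can be checked in the same rough numerical way. The Python sketch below is illustrative only (the dictionary examples, the test point x = 1, and the step size are assumptions made here). Each entry pairs a function with the derivative claimed for it, and the loop compares that claim with a “rise over run” estimate.

```python
import math


def estimated_gradient(func, x, step=1e-6):
    """Estimate the gradient of func at x using two nearby points."""
    return (func(x + step) - func(x - step)) / (2 * step)


def bracket(x):
    """The expression sin(2x - 3) + cos(5 - 2x) from Example 337."""
    return math.sin(2 * x - 3) + math.cos(5 - 2 * x)


# Each entry: (the function y, the derivative dy/dx claimed in the example).
examples = {
    "Example 335": (
        lambda x: math.exp(math.sin(x)),
        lambda x: math.exp(math.sin(x)) * math.cos(x),
    ),
    "Example 336": (
        lambda x: math.sqrt(4 * x - 1),
        lambda x: 2 / math.sqrt(4 * x - 1),
    ),
    "Example 337": (
        lambda x: bracket(x) ** 3,
        lambda x: 6 * bracket(x) ** 2 * (math.cos(2 * x - 3) + math.sin(5 - 2 * x)),
    ),
}

x0 = 1.0
for name, (y, claimed) in examples.items():
    print(name, claimed(x0), estimated_gradient(y, x0))
```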



dy dy RRRR
Exercise 108. Find and for each of the following, (Answer on p. 1423.)

(a) y = 1 + [x − ln (x + 1)] . (b) y = sin

2 x
1 + [x − ln (x + 1)]

Exercise 109. Let F, m, v, t, and p denote force, mass, velocity, time, and momentum.
Momentum is defined as the product of mass and velocity.
(a) Newton’s Second Law of Motion states that the rate of change of momentum (of
an object) is equal to the force applied (to that object). Write down this law in
mathematical notation.
(b) Acceleration a is defined as the rate of change of velocity. Explain why Newton’s
Second Law simplifies to F = ma if mass is constant. (Answer on p. 124.12.)

Exercise 110. In Part V (Calculus), Fact 157, we will formally state and prove that:

$$\frac{d}{dx}\ln x \overset{1}{=} \frac{1}{x}.$$

But assuming $\overset{1}{=}$ is true, we can quite easily prove that $\frac{d}{dx}\exp x = \exp x$:
(a) Use the Chain Rule to write down an expression for $\frac{d}{dx}\ln(\exp x)$.
(b) What do you observe about the expression ln (exp x)? Use this observation to write
down another expression for $\frac{d}{dx}\ln(\exp x)$.
(c) Then conclude that $\frac{d}{dx}\exp x = \exp x$. (Answer on p. 1424.)

Remark 34. By the way, you needn’t mug the following derivatives because they are on
your List MF26. (We’ll review the inverse trigonometric functions sin−1 , cos−1 , and tan−1
in Ch. 19.7.)

f (x)        f ′ (x)
sin⁻¹ x      1/√(1 − x²)
cos⁻¹ x      −1/√(1 − x²)
tan⁻¹ x      1/(1 + x²)
cosec x      −cosec x cot x
sec x        sec x tan x

18.7. Increasing and Decreasing

Definition 61. We say that a function is:

• Increasing (at a certain point) if its derivative (at that point) is positive; and
• Decreasing (at a certain point) if its derivative (at that point) is negative.

Example 338. The function f ∶ R → R defined by f (x) = x² is decreasing on R− and
increasing on R+ . It is neither decreasing nor increasing at x = 0.

The function g ∶ R → R defined by g (x) = −x² is increasing on R− and decreasing on R+ .
It is neither increasing nor decreasing at x = 0.

[Figure: graphs of f and g. f ′ < 0 on R− , f ′ = 0 at x = 0, and f ′ > 0 on R+ . g ′ > 0 on R− , g ′ = 0 at x = 0, and g ′ < 0 on R+ .]

Example 339. The function h ∶ R → R defined by h (x) = x is increasing everywhere.

[Figure: graph of h, with h′ > 0 everywhere.]

18.8. Stationary and Turning Points
A stationary point (of a function) is simply any point at which that function’s derivative
equals zero:

Definition 62. A point x is a stationary point of a function f if f ′ (x) = 0.

Informally and intuitively, a turning point (of a function) is where the graph (of that
function) “turns”. Formally:

Definition 63. A point x is a turning point of a function f if it is both a stationary
point and a strict extremum (i.e. a strict maximum or minimum point) of f .

Example 340. Consider f ∶ R → R defined by f (x) = x².

Then K = (0, 0) is a stationary point because the derivative (or gradient) of f at K is zero:

f ′ (0) = 0.

Moreover, K is also a turning point because it is both a stationary point and a strict
extremum (in particular, it is a strict local minimum).

Remark 35. The term turning point is rarely or never used by mathematicians. How-
ever, it appears on your O- and A-Level syllabuses and exams. We shall therefore have to
use it. Definition 63 is merely my attempt to formally define what I believe your A-Level
examiners mean by this term.

Example 341. Consider g ∶ R → R defined by g (x) = −x².
Then D = (0, 0) is a stationary point because the derivative (or gradient) of g at D is zero:

g ′ (0) = 0.

Moreover, D is also a turning point because it is both a stationary point and a strict
extremum (in particular, it is a strict local maximum).

By definition, every turning point is a stationary point. However, the converse is false —
that is, a stationary point need not be a turning point.



Example 342. Define f ∶ R → R by f (x) = x³.
The origin (0, 0) is a stationary point of f , because the derivative at that point is zero.
However, (0, 0) is not a turning point because it is not a strict extremum.

[Figure: graph of f . (0, 0) is a stationary point but not a turning point.]

As we’ll learn in Part V (Calculus), (0, 0) is actually an example of an inflexion point.

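Example 343. Define j ∶ R → R by:

$$j(x) = \begin{cases} -81, & \text{for } x \le -3, \\ 0.75x^5 - 3.75x^4 - 5x^3 + 30x^2, & \text{for } -3 < x < 5, \\ 125, & \text{for } x \ge 5. \end{cases}$$

[Figure: graph of j, with the points A = (−4, −81), B = (−3, −81), C = (−2, 76), D = (0, 0), E = (2, 44), F = (4, −32), G = (5, 125), and H = (6, 125) marked.]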

The derivative (or gradient) of j at each of the points A, C, D, E, F , and H is zero.

Hence, each of these points is a stationary point.
Observe that moreover, C and E are strict local maximum points, while D and F are strict
local minimum points. Hence, the stationary points C, D, E, and F are also turning
points. In contrast, A and H, which are not strict extrema, are not turning points.
The derivative of j at each of the points B and G is not equal to zero. Indeed, as
we’ll learn later on in Part V (Calculus), the derivative of j at each of those points does
not even exist! Hence, neither B nor G is a stationary point. Since B and G are not
stationary points, they are not turning points either.
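
The claims about C, D, E, and F can be checked by differentiating the middle piece of j with the Constant Factor, Power, and Sum and Difference Rules, which gives 3.75x⁴ − 15x³ − 15x² + 60x for −3 < x < 5. The Python sketch below is illustrative only (the names j_middle and j_middle_gradient are choices made here).

```python
def j_middle(x):
    """The middle piece of j, valid for -3 < x < 5."""
    return 0.75 * x**5 - 3.75 * x**4 - 5 * x**3 + 30 * x**2


def j_middle_gradient(x):
    """Derivative of the middle piece, found with the Rules of Differentiation."""
    return 3.75 * x**4 - 15 * x**3 - 15 * x**2 + 60 * x


# C, D, E, and F have x-coordinates -2, 0, 2, and 4.
for x in (-2, 0, 2, 4):
    print(x, j_middle(x), j_middle_gradient(x))

# Approaching B (x = -3) and G (x = 5) from inside the middle piece, the gradient is far
# from 0, whereas the constant pieces on the other side have gradient 0: hence the "kinks".
print(j_middle_gradient(-3), j_middle_gradient(5))
```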
(By the way, is j continuous everywhere? Differentiable everywhere?¹³²)

¹³² j is continuous everywhere, but not differentiable everywhere. It is differentiable everywhere except at
B and G.

Example 344. The sine function has infinitely many stationary and turning points. For
every integer k, the point $I_k = ((0.5 + k)\pi, (-1)^k)$ is both a stationary and a turning point.

[Figure: graph of sin, with the points $I_{-3} = (-\frac{5\pi}{2}, -1)$, $I_{-2} = (-\frac{3\pi}{2}, 1)$, $I_{-1} = (-\frac{\pi}{2}, -1)$, $I_0 = (\frac{\pi}{2}, 1)$, $I_1 = (\frac{3\pi}{2}, -1)$, and $I_2 = (\frac{5\pi}{2}, 1)$ marked.]

Note that at every even integer k, the point $I_k = ((0.5 + k)\pi, 1)$ is a strict local maximum.
While at every odd integer l, the point $I_l = ((0.5 + l)\pi, -1)$ is a strict local minimum.

Now, here’s a subtle but important point. Every turning point is either a strict local
maximum or minimum. But the converse is false — that is, a strict local maximum or
minimum need not be a turning point.


Example 345. The function i ∶ [−1, ∞) → R defined by i (x) = x has a strict local
minimum at I = (−1, −1). (Indeed, I is also a strict global minimum.)


[Figure: graph of i, with the endpoint I = (−1, −1) labelled.]

However, the derivative of i at I is not equal to zero. Indeed, the derivative of i at I does
not even exist. Hence, I is not a stationary point and cannot be a turning point either.


Example 346. Let A = (−5, 4), B = (1, 1), and C = (2, −3) be points. The graph
G = {A, B, C} simply consists of these three isolated points.
As explained earlier in Example 168, each of A, B, and C is both a strict local maximum
and a strict local minimum of G.

[Figure: the three isolated points A = (−5, 4), B = (1, 1), and C = (2, −3).]

However, the graph G is nowhere-differentiable. And so, it can have neither stationary points nor turning points.

You may recall from secondary school that there were also things called inflexion points. Don’t worry, we haven’t covered these yet and will do so in Part V (Calculus).


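Exercise 111. For each of the following functions, identify all turning points. (Answer on p. 1424.)
(a) $f \colon \mathbb{R} \to \mathbb{R}$ defined by $f(x) = x^2 + 1$.
(b) $g \colon [-1, 1] \to \mathbb{R}$ defined by $g(x) = x^2 + 1$.
(c) $h \colon \mathbb{R} \to \mathbb{R}$ defined by $h(x) = \cos x$.
(d) $i \colon [-1, 1] \to \mathbb{R}$ defined by $i(x) = \cos x$.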
Exercise 112. Explain whether each of the points A–H in the graph below is a turning
and/or stationary point. (Answer on p. 1424.)



[Figure: a graph with the labelled points A to H.]
Assume the graph is “smooth” everywhere except at two points.

19. O-Level Review: Trigonometry
Let A and B be two points on a circle. Then the arc AB is the set of points between A and B (inclusive). [133]

Example 347. The arc AB contains every point between A and B (inclusive). The arc CD contains every point between C and D (inclusive).
Note that like a line, an arc is a one-dimensional object. (In contrast, a point is a zero-dimensional object, an area is a two-dimensional object, and a volume is a three-dimensional object.)
Let O be the centre of the circle. Note that an arc does not include the radii. So here, the arc AB doesn’t include the line segments OA and OB. Similarly, the arc CD doesn’t include the line segments OC and OD.

[Figure: a circle with the labelled points A, B, C, and D.]

As with line segments, ∣AB∣ and ∣CD∣ will denote the lengths of the arcs AB and CD.

Example 348. Let EF be an arc. If the line EF passes through the centre of the circle O (i.e. if the line segment EF is also the circle’s diameter), then we may also call the arc EF a semicircle.
We see then that the semicircle is a special case of the arc.
More generally, an arc is simply any subset of the circumference.

[Figure: a circle with centre O and the labelled point E.]

Remark 36. Just so you know, some writers use the word arc to refer to any “smooth”
curve. This shall not be the practice of this textbook. In this textbook, we will strictly
reserve the word arc to mean a subset of the circumference of a circle.

[133] For the formal definition of an arc, see Definition 228 in the Appendices.
19.1. Angles
Let a and b be rays. Informally, the angle between a and b is the anti-clockwise rotation a
must undergo to coincide with b.

Example 349. Let A, B, C, and D be points on a circle with centre O.
The angle α between the rays OB and OA is the anti-clockwise rotation OB must undergo to coincide with OA. We will also say that α is the angle subtended by the arc AB.
Similarly, the angle β between the rays OD and OC is the anti-clockwise rotation OD must undergo to coincide with OC. We will also say that β is the angle subtended by the arc CD.

[Figure: a circle with centre O, the points A, B, C, and D, and the angles α and β.]

We now explain why angles are periodic, with period 360○ .

In the above example, rotating the ray OB anti-clockwise (a.c.w.) by α produces the ray OA. Now, observe that we also get the ray OA if we rotate the ray OB a.c.w. by α + 360°; or by α + 720°; or by α + 1080°; etc. Since each of these rotations (of the ray OB) produces the same result (the ray OA), let us simply regard all of these rotations as being equal. That is:

$$\alpha = \alpha + 360^\circ = \alpha + 720^\circ = \alpha + 1080^\circ = \dots \tag{1}$$

Now, going the other way, observe that we also get the ray OA if we rotate the ray OB clockwise (c.w.) by 360° − α; or by 720° − α; or by 1080° − α; etc. But:

A 360° − α c.w. rotation is equivalent to a −(360° − α) = α − 360° a.c.w. rotation.
A 720° − α c.w. rotation is equivalent to a −(720° − α) = α − 720° a.c.w. rotation.
A 1080° − α c.w. rotation is equivalent to a −(1080° − α) = α − 1080° a.c.w. rotation.

And so, by the same reasoning as before, let us simply regard the angles α, α − 360°, α − 720°, α − 1080°, etc. as being equal. That is:

$$\alpha = \alpha - 360^\circ = \alpha - 720^\circ = \alpha - 1080^\circ = \dots \tag{2}$$

Together, (1) and (2) say that:

$$\alpha = \alpha \pm 360^\circ = \alpha \pm 720^\circ = \alpha \pm 1080^\circ = \dots$$

That is, for any integer k, the angles α and α + k ⋅ 360° are equal. This is what we mean when we say that angles are periodic, with period 360°.
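The idea that α and α + k ⋅ 360° are the same angle can be sketched with the remainder operator. The function names `principal_degrees` and `equal_angles` below are made up for illustration, and the exact equality test is only reliable for whole-number degrees.

```python
def principal_degrees(angle: float) -> float:
    """Reduce an angle in degrees to the equal angle in [0, 360)."""
    return angle % 360


def equal_angles(a: float, b: float) -> bool:
    """True if a and b differ by an integer multiple of 360 degrees."""
    return (a - b) % 360 == 0


alpha = 30
for k in (-3, -2, -1, 0, 1, 2, 3):
    rotated = alpha + k * 360
    # Every rotated angle reduces to 30 and counts as equal to alpha.
    print(rotated, principal_degrees(rotated), equal_angles(alpha, rotated))
```

Python's `%` returns a non-negative remainder for a positive modulus, which is why negative angles such as −330 also reduce to 30.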
19.2. The Radian
In primary and secondary school, we used the degree (°) to measure angles. But from here on out, we’ll use the radian instead. [134]

Definition 64. Let AB be an arc of a circle of radius r. Then the magnitude, in radians (rad), of the angle α subtended by the arc is defined to be the following number:

$$\alpha = \frac{|AB|}{r}.$$

Refer to the circle on the left. It has centre O and radius r. And so, by the above Definition:

$$\alpha = \frac{|AB|}{r} \quad\text{and}\quad \beta = \frac{|CD|}{r}.$$

[Figure: two circles, each of radius r. The circle on the right is labelled “The full angle is 2π rad.”]

Now refer to the circle on the right. By the above Definition, the full angle 360° equals 2π radians, because this is the ratio of the circle’s entire circumference 2πr to its radius r:

$$\text{Full angle} = 360^\circ = \frac{2\pi r}{r}\ \text{rad} = 2\pi\ \text{rad}. \tag{1}$$

[134] Indeed, the radian is the SI unit for angles.
By Definition 64, the angle subtended by an arc whose length equals r has magnitude 1 rad. To figure out what 1 rad is in degrees, simply [135] divide the equation 2π rad = 360° (above) by 2π:

$$1\ \text{rad} = \frac{2\pi\ \text{rad}}{2\pi} = \frac{360^\circ}{2\pi} \approx 57.3^\circ.$$

[Figure: a circle of radius r in which the arc AB has length r and subtends an angle of 1 rad.]

By the way, observe that by definition, the radian is the ratio of two lengths. Thus, it is actually a “unitless” unit or a “pure number”. And so, going forward, we will not even bother writing the unit “rad” (as we’ve been doing so far).
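The conversion between degrees and radians follows directly from 2π rad = 360°. Here is a minimal sketch; the function names `rad_to_deg` and `deg_to_rad` are our own, and Python's standard `math.degrees` and `math.radians` do the same job.

```python
import math


def rad_to_deg(radians: float) -> float:
    """Convert radians to degrees using 2*pi rad = 360 degrees."""
    return radians * 360 / (2 * math.pi)


def deg_to_rad(degrees: float) -> float:
    """Convert degrees to radians using 360 degrees = 2*pi rad."""
    return degrees * 2 * math.pi / 360


print(round(rad_to_deg(1), 1))  # one radian in degrees, to 1 decimal place
for degrees in (0, 30, 45, 60, 90, 180, 360):
    # Each angle in radians, expressed as a multiple of pi.
    print(degrees, deg_to_rad(degrees) / math.pi)
```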

Refer to the circle on the left. The straight angle 180° is that subtended by the semicircle and equals π, because this is the ratio of a semicircle’s length πr to the radius r.

[Figure: two circles. Left: the straight angle is π. Right: the right angle is π/2.]

Now refer to the circle on the right. The right angle 90° is one-quarter of a full angle and thus equal to π/2. By convention, we depict the right angle as a square and any other angle as a sector of a circle.

[135] This computation requires that we know the value (or at least the approximate value) of π.
Altogether, we have seven names for angles, depending on their magnitude:

Definition 65. We call the angle θ:

1. The zero angle if θ = 0.
2. An acute angle if θ ∈ (0, π/2) = (0○ , 90○ ).
3. The right angle if θ = π/2 = 90○ .
4. An obtuse angle if θ ∈ (π/2, π) = (90○ , 180○ ).
5. The straight angle if θ = π = 180○ .
6. A reflex angle if θ ∈ (π, 2π) = (180○ , 360○ ).
7. The full angle if θ = 2π = 360○ .

Definition 66. We call a triangle:

1. An acute triangle if its largest angle is acute.
2. A right triangle if its largest angle is right.
3. An obtuse triangle if its largest angle is obtuse.

[Figure: an acute triangle, a right triangle, and an obtuse triangle.]

Radian-degree conversion table:

Radian(s)   0    π/6   π/4   π/3   π/2   π      2π
Degree(s)   0°   30°   45°   60°   90°   180°   360°

Here are a few convenient terms that we may use in this textbook:

Definition 67. Let α and β be angles. We say that α and β are:

• Complementary if α + β = π/2;
• Supplementary if α + β = π; and
• Explementary or conjugate if α + β = 2π.

Remark 37. By convention, angles are usually denoted by lower-case Greek letters α
(alpha), β (beta), γ (gamma), and θ (theta). Your List of Formulae (MF26, p. 3) also
uses upper-case Latin letters like A, B, P , and Q, and so we’ll use those too.


19.3. The Pythagorean Theorem

Theorem 3. Let ABC be a right triangle with hypotenuse AB. Then

$$|AC|^2 + |BC|^2 = |AB|^2.$$

Proof. Let D be the point on AB that is the base of the perpendicular from the point C.

Observe that the three triangles ABC, ACD, and CBD are similar. Thus:

$$\frac{|AC|}{|AB|} = \frac{|AD|}{|AC|} \quad\text{and}\quad \frac{|BC|}{|AB|} = \frac{|BD|}{|BC|}.$$

Rearrange to get:

$$|AC|^2 = |AB|\,|AD| \quad\text{and}\quad |BC|^2 = |AB|\,|BD|.$$

Now add these two equations together to get:

$$\begin{aligned} |AC|^2 + |BC|^2 &= |AB|\,|AD| + |AB|\,|BD| \\ &= |AB|\,(|AD| + |BD|) \\ &= |AB|\,|AB| = |AB|^2. \end{aligned}$$
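To see the steps of this proof on concrete numbers, the sketch below uses a 3-4-5 triangle with the right angle at C. The coordinates are a hypothetical choice made for illustration, and `math.dist` needs Python 3.8 or later.

```python
import math

# Hypothetical right triangle with the right angle at C (a 3-4-5 triangle).
A, B, C = (3.0, 0.0), (0.0, 4.0), (0.0, 0.0)

# D is the foot of the perpendicular from C onto the line AB,
# found by projecting the vector AC onto the vector AB.
t = ((C[0] - A[0]) * (B[0] - A[0]) + (C[1] - A[1]) * (B[1] - A[1])) / math.dist(A, B) ** 2
D = (A[0] + t * (B[0] - A[0]), A[1] + t * (B[1] - A[1]))

AB, AC, BC = math.dist(A, B), math.dist(A, C), math.dist(B, C)
AD, BD = math.dist(A, D), math.dist(B, D)

print(AC**2, AB * AD)        # |AC|^2 and |AB||AD|
print(BC**2, AB * BD)        # |BC|^2 and |AB||BD|
print(AC**2 + BC**2, AB**2)  # the two sides of the theorem
```

Each printed pair should agree up to floating-point rounding, mirroring the two rearranged equations and their sum.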


19.4. Sine and Cosine: The Right-Triangle Definitions
Sine and cosine, denoted sin and cos, are the two basic trigonometric (or circular) functions.
In the right triangle below, we’ve labelled angle θ and sides of lengths o, a, and h (for “Opposite”, “Adjacent”, and “Hypotenuse”).

[Figure: a right triangle with angle θ and sides of lengths o, a, and h.]

Here now are the right-triangle definitions of sine and cosine you’ll recall from secondary school:

$$\sin\theta = \frac{o}{h} \quad\text{and}\quad \cos\theta = \frac{a}{h}.$$

We then use the sine and cosine functions to define another four trigonometric functions:

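Definition 68. Define:

The tangent function $\tan \colon \mathbb{R} \setminus \{(k + \tfrac{1}{2})\pi : k \in \mathbb{Z}\} \to \mathbb{R}$ by $\tan\theta = \dfrac{\sin\theta}{\cos\theta}$.
The cosecant function $\operatorname{cosec} \colon \mathbb{R} \setminus \{k\pi : k \in \mathbb{Z}\} \to \mathbb{R}$ by $\operatorname{cosec}\theta = \dfrac{1}{\sin\theta}$.
The secant function $\sec \colon \mathbb{R} \setminus \{(k + \tfrac{1}{2})\pi : k \in \mathbb{Z}\} \to \mathbb{R}$ by $\sec\theta = \dfrac{1}{\cos\theta}$.
The cotangent function $\cot \colon \mathbb{R} \setminus \{k\pi : k \in \mathbb{Z}\} \to \mathbb{R}$ by $\cot\theta = \dfrac{\cos\theta}{\sin\theta}$.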

Remark 38. Your A-Level syllabus and exams denote the cosecant function by cosec and
so we’ll do that too. Be aware though that many writers instead denote it by csc.

(For now, don’t worry about the complicated-looking domains of the trigonometric functions
— we’ll discuss them in a moment.)
The above Definition together with the right-triangle definitions of sine and cosine give us:

$$\tan\theta = \frac{o}{a}, \quad \operatorname{cosec}\theta = \frac{h}{o}, \quad \sec\theta = \frac{h}{a}, \quad \cot\theta = \frac{a}{o}.$$
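The four derived functions can be sketched directly from Definition 68 and compared against the side ratios of a right triangle. Python's `math` module has no `cosec`, `sec`, or `cot`, so the three helpers below are our own definitions, and the 3-4-5 triangle is a hypothetical example.

```python
import math


def cosec(theta: float) -> float:
    return 1 / math.sin(theta)


def sec(theta: float) -> float:
    return 1 / math.cos(theta)


def cot(theta: float) -> float:
    return math.cos(theta) / math.sin(theta)


# Hypothetical 3-4-5 right triangle: opposite o, adjacent a, hypotenuse h.
o, a, h = 3.0, 4.0, 5.0
theta = math.atan2(o, a)  # the acute angle opposite the side of length o

# Each entry pairs the function value with the matching side ratio.
ratios = {
    "sin": (math.sin(theta), o / h),
    "cos": (math.cos(theta), a / h),
    "tan": (math.tan(theta), o / a),
    "cosec": (cosec(theta), h / o),
    "sec": (sec(theta), h / a),
    "cot": (cot(theta), a / o),
}
for name, (value, ratio) in ratios.items():
    print(name, value, ratio)
```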


In the past, Singaporean students were taught the Hokkien mnemonic “big leg woman”:

大 跤 嫂
tuā kha só

$$\text{Tangent} = \frac{\text{Opposite}}{\text{Adjacent}}, \quad \text{Cosine} = \frac{\text{Adjacent}}{\text{Hypotenuse}}, \quad \text{Sine} = \frac{\text{Opposite}}{\text{Hypotenuse}}.$$

But these days, we probably prefer a more angmoh mnemonic like:

Soccer tour: SOH-CAH-TOA. [136]

Some useful and basic identities you should mug: [137]

Fact 29. Let $x \in \mathbb{R}$. Then:

(a) $\sin^2 x + \cos^2 x = 1$.
(b) $1 + \tan^2 x = \sec^2 x$, for $x \notin \{(k + \tfrac{1}{2})\pi : k \in \mathbb{Z}\}$.
(c) $1 + \cot^2 x = \operatorname{cosec}^2 x$, for $x \notin \{k\pi : k \in \mathbb{Z}\}$.

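Proof. [138] By definition, we have:

$$\sin x = \frac{o}{h}, \quad \cos x = \frac{a}{h}, \quad \tan x = \frac{o}{a}, \quad \operatorname{cosec} x = \frac{h}{o}, \quad \sec x = \frac{h}{a}, \quad \cot x = \frac{a}{o}.$$

By the Pythagorean Theorem, we have: $o^2 + a^2 = h^2$. Now write (each step marked P uses the Pythagorean Theorem):

$$\text{(a)}\quad \sin^2 x + \cos^2 x = \frac{o^2}{h^2} + \frac{a^2}{h^2} = \frac{o^2 + a^2}{h^2} \overset{P}{=} \frac{h^2}{h^2} = 1.$$

$$\text{(b)}\quad 1 + \tan^2 x = 1 + \frac{o^2}{a^2} = \frac{a^2 + o^2}{a^2} \overset{P}{=} \frac{h^2}{a^2} = \sec^2 x.$$

$$\text{(c)}\quad 1 + \cot^2 x = 1 + \frac{a^2}{o^2} = \frac{o^2 + a^2}{o^2} \overset{P}{=} \frac{h^2}{o^2} = \operatorname{cosec}^2 x.$$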
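A numerical spot-check is no substitute for the proof, but it can catch a misremembered identity. The sketch below measures how far each side of Fact 29 is from the other at a few sample values. The function name `identity_gaps` and the sample list are our own choices.

```python
import math


def identity_gaps(x: float) -> tuple:
    """Left side minus right side for Fact 29 (a), (b), and (c) at x."""
    s, c = math.sin(x), math.cos(x)
    gap_a = s**2 + c**2 - 1
    gap_b = 1 + (s / c) ** 2 - (1 / c) ** 2  # 1 + tan^2 x - sec^2 x
    gap_c = 1 + (c / s) ** 2 - (1 / s) ** 2  # 1 + cot^2 x - cosec^2 x
    return gap_a, gap_b, gap_c


samples = [0.3, 1.0, 2.0, -2.5, 4.0]  # none is an integer multiple of pi/2
for x in samples:
    print(x, identity_gaps(x))
```

All three gaps should be zero up to floating-point rounding. Note that the samples include angles that are not acute, which the right-triangle proof above does not cover.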
[136] Credit to Tom J at
[137] To mug is to study hard and, especially, to engage in rote learning or memorisation. I consider mug to be Singlish and so italicise it.
It seems though that the phrase mug up may have originated in Britain. To my knowledge, this phrase isn’t in current usage in Britain. I have however come across South Asians using mug up in this sense. (In efficient Singlish though, the preposition up is simply dropped.)
[138] Our proof here is actually not quite general, because it implicitly refers to the right triangle and thus implicitly assumes that x is acute.
It’s not difficult to remember these particular values of sin, cos, and tan:

Fact 30.

x        0      π/6    π/4    π/3    π/2
sin x    √0/2   √1/2   √2/2   √3/2   √4/2
cos x    √4/2   √3/2   √2/2   √1/2   √0/2
tan x    0      1/√3   1      √3     N.A.

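Proof. The results for x = 0 and x = π/2 are “obvious” and require no proof.
Here we merely give an informal proof-by-picture that:

$$\text{(a) } \sin\frac{\pi}{6} = \frac{1}{2}, \quad \text{(b) } \sin\frac{\pi}{3} = \frac{\sqrt{3}}{2}, \quad\text{and}\quad \text{(c) } \sin\frac{\pi}{4} = \frac{\sqrt{2}}{2}.$$

The equilateral triangle ABC has sides $|AB| = |AC| = |BC| = 1$ and angles $\angle ABC = \angle BAC = \angle ACB = \pi/3$. Let AD be the perpendicular bisector of BC. [139]

[Figure: the equilateral triangle ABC with sides of length 1, where D is the midpoint of BC and |BD| = 0.5.]

(a) Then $|BD| = 0.5$ and $\angle BAD = \pi/6$. [140] And so by definition of sin, we have:

$$\sin\angle BAD = \frac{|BD|}{|AB|}, \quad\text{or}\quad \sin\frac{\pi}{6} = \frac{0.5}{1} = \frac{1}{2}.$$

(b) By the Pythagorean Theorem, $|AD|^2 = |AB|^2 - |BD|^2 = 1^2 - 0.5^2 = 3/4$. Thus, $|AD| = \sqrt{3}/2$. And so, by definition of sin, we have:

$$\sin\angle ABD = \frac{|AD|}{|AB|}, \quad\text{or}\quad \sin\frac{\pi}{3} = \frac{\sqrt{3}/2}{1} = \frac{\sqrt{3}}{2}.$$

(c) The isosceles right triangle DEF has the following legs and angles:

$$|DF| = |EF| = 1 \quad\text{and}\quad \angle DEF = \angle EDF = \frac{\pi}{4}.$$

[Figure: the isosceles right triangle DEF with legs of length 1 and the right angle at F.]

By the Pythagorean Theorem, we have:

$$|DE|^2 = |EF|^2 + |DF|^2 = 1^2 + 1^2 = 2.$$

Thus, $|DE| = \sqrt{2}$. And so:

$$\sin\angle DEF = \frac{|DF|}{|DE|}, \quad\text{or}\quad \sin\frac{\pi}{4} = \frac{1}{\sqrt{2}} = \frac{\sqrt{2}}{2}.$$

To complete the proof, we use $\cos\theta = \sqrt{1 - \sin^2\theta}$ and $\tan\theta = \dfrac{\sin\theta}{\cos\theta}$ and write:

$$\cos\frac{\pi}{6} = \sqrt{1 - \sin^2\frac{\pi}{6}} = \sqrt{1 - \left(\frac{1}{2}\right)^2} = \frac{\sqrt{3}}{2}, \qquad \tan\frac{\pi}{6} = \frac{\sin(\pi/6)}{\cos(\pi/6)} = \frac{1/2}{\sqrt{3}/2} = \frac{1}{\sqrt{3}} = \frac{\sqrt{3}}{3}.$$

$$\cos\frac{\pi}{3} = \sqrt{1 - \sin^2\frac{\pi}{3}} = \sqrt{1 - \left(\frac{\sqrt{3}}{2}\right)^2} = \frac{1}{2}, \qquad \tan\frac{\pi}{3} = \frac{\sin(\pi/3)}{\cos(\pi/3)} = \frac{\sqrt{3}/2}{1/2} = \sqrt{3}.$$

$$\cos\frac{\pi}{4} = \sqrt{1 - \sin^2\frac{\pi}{4}} = \sqrt{1 - \left(\frac{\sqrt{2}}{2}\right)^2} = \frac{\sqrt{2}}{2}, \qquad \tan\frac{\pi}{4} = \frac{\sin(\pi/4)}{\cos(\pi/4)} = \frac{\sqrt{2}/2}{\sqrt{2}/2} = 1.$$

[139] We’ll simply assume without proof that AD is perpendicular to BC and bisects the line segment BC.
[140] Here again we’ll simply assume without proof that AD bisects ∠BAC.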
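The pattern in Fact 30 (the numerators run √0, √1, √2, √3, √4 for sine and in reverse for cosine) is easy to confirm numerically. This is only a sketch that compares against Python's floating-point `math.sin` and `math.cos`, so the differences are tiny rather than exactly zero.

```python
import math

angles = [0, math.pi / 6, math.pi / 4, math.pi / 3, math.pi / 2]
for n, x in enumerate(angles):
    sin_pattern = math.sqrt(n) / 2      # sqrt(0)/2, sqrt(1)/2, ..., sqrt(4)/2
    cos_pattern = math.sqrt(4 - n) / 2  # the same list in reverse
    # Differences between the library values and the pattern.
    print(n, math.sin(x) - sin_pattern, math.cos(x) - cos_pattern)

# The three finite, non-zero tangent values from the table.
print(math.tan(math.pi / 6), 1 / math.sqrt(3))
print(math.tan(math.pi / 4), 1.0)
print(math.tan(math.pi / 3), math.sqrt(3))
```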

Fun Fact

According to one historian of mathematics, [141] sine means bosom or breast:

The English word “sine” comes from a series of mistranslations of the Sanskrit jyā-ardha (chord-half). Āryabhata frequently abbreviated this term to jyā or its synonym jı̄vā. When some of the Hindu works were later translated into Arabic, the word was simply transcribed phonetically into an otherwise meaningless Arabic word jiba. But since Arabic is written without vowels, later writers interpreted the consonants jb as jaib, which means bosom or breast. In the twelfth century, when an Arabic trigonometry work was translated into Latin, the translator used the equivalent Latin word sinus, which also meant bosom, and by extension, fold (as in a toga over a breast), or a bay or gulf. This Latin word has now become our English “sine”.

[141] Victor Katz, A History of Mathematics: An Introduction, (2009, p. 253).
19.5. Sine and Cosine: The Unit-Circle Definitions
Unfortunately, the right-triangle definitions of sine and cosine suffer from a slight problem: they “work” only if θ is an angle in a right triangle, i.e. only if θ ∈ [0, π/2].
In order for sin and cos to also “work” more generally, that is, for any θ ∈ R, we must turn to the unit-circle definitions.
We first divide the cartesian plane into four Quadrants called I, II, III, and IV. Quadrant I is where x and y are positive, then go anti-clockwise.

[Figure: the cartesian plane divided into Quadrants I, II, III, and IV.]

The positive x-axis is a ray that starts at the origin. For convenience, let’s call it x+.
Let α be any angle. Rotate x+ anticlockwise by α to produce the ray a.
Let $A = (A_x, A_y)$ be the point at which a intersects the unit circle centred on the origin.

[Figure: the unit circle, showing the ray x+, the ray a, the angle α, and the points O, $A = (A_x, A_y)$, and Ā.]

Here now are the unit-circle definitions of sine and cosine:

sin α = Ay and cos α = Ax

That is, sine and cosine are simply given by the y- and x-coordinates of the point A.


We now show that at least in Quadrant I, the right-triangle and unit-circle definitions coincide. To see why, let Ā (read aloud as “A-bar”) be the base of the perpendicular dropped from A to the x-axis. Consider the right triangle OĀA. It has “Hypotenuse” of length $|OA| = 1$, “Opposite” of length $|A\bar{A}| = A_y$, and “Adjacent” of length $|O\bar{A}| = A_x$.
And so, by the right-triangle definitions of sine and cosine, we have:

$$\sin\alpha = \frac{o}{h} = \frac{|A\bar{A}|}{|OA|} = \frac{A_y}{1} = A_y \quad\text{and}\quad \cos\alpha = \frac{a}{h} = \frac{|O\bar{A}|}{|OA|} = \frac{A_x}{1} = A_x,$$

which indeed coincides with the unit-circle definitions given above.

In the other Quadrants, the right-triangle definitions will continue to give the “correct”
magnitudes of sine and cosine. However, they may now give the “wrong” signs. (“Correct”
and “wrong” here are as determined by the unit-circle definitions.)
Consider for example the angle β ∈ (π, 3π/2) in Quadrant III. As before, rotate x+ anticlockwise by β to produce the ray b. Let $B = (B_x, B_y)$ be the point at which b intersects the unit circle centred on the origin.

[Figure: the unit circle, showing the ray x+, the ray b, and the points O, $B = (B_x, B_y)$, and B̄.]

By the unit-circle definitions, we have:

$$\sin\beta = B_y < 0 \quad\text{and}\quad \cos\beta = B_x < 0.$$

In contrast, looking at the right triangle OB̄B, the right-triangle definitions give:

$$\sin\beta = \frac{o}{h} = \frac{|B\bar{B}|}{|OB|} = \frac{|B_y|}{1} = |B_y| > 0 \quad\text{and}\quad \cos\beta = \frac{a}{h} = \frac{|O\bar{B}|}{|OB|} = \frac{|B_x|}{1} = |B_x| > 0.$$

The magnitudes are “correct” but the signs are “wrong”.
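The Quadrant III discussion can be illustrated with one concrete point. The sketch below picks a hypothetical point B = (−0.6, −0.8) on the unit circle, recovers the angle β with `math.atan2`, and compares the unit-circle values, the right-triangle values, and the library's sine and cosine.

```python
import math

# Hypothetical point on the unit circle in Quadrant III.
Bx, By = -0.6, -0.8

# Angle measured anti-clockwise from the positive x-axis, reduced to [0, 2*pi).
beta = math.atan2(By, Bx) % (2 * math.pi)

unit_circle = (By, Bx)               # (sin, cos) by the unit-circle definitions
right_triangle = (abs(By), abs(Bx))  # (sin, cos) from the triangle's side lengths
library = (math.sin(beta), math.cos(beta))

print(beta)
print(unit_circle, right_triangle, library)
```

The library values match the unit-circle definitions, while the right-triangle values have the same magnitudes with positive signs.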
Henceforth, we will abandon the right-triangle definitions and treat only the unit-circle
definitions of sine and cosine as the “correct” ones. (You may nonetheless still find right-
triangle-based mnemonics like TOA-CAH-SOH helpful.)

Remark 39. Note though that the unit-circle definitions are considered informal, because
they rely on drawings. One way to formally define the sine and cosine functions is to
use their power series — something we’ll learn about in Part V. Another way is to use
Euler’s identity — something we’ll learn about in Part IV.
For A-Level maths, our above unit-circle definitions will be more than good enough. But
if you’re interested, this textbook’s “official” formal definitions of sine and cosine are
given in Definitions 177 and 178 in the Appendices.

Summary of the sign of each of sine, cosine, and tangent, in each of the four Quadrants:

Quadrant sin cos tan

I + + +
II + − −
III − − +
IV − + −

Or just remember: All Students Talk Cock (ASTC).

[Figure: the four Quadrants. In I, all three are positive. In II, sine is positive. In III, tangent is positive. In IV, cosine is positive.]
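The sign table can be regenerated by sampling one angle strictly inside each Quadrant. The sample angles and the helper `sign` below are illustrative choices.

```python
import math


def sign(value: float) -> str:
    return "+" if value > 0 else "-"


# One sample angle strictly inside each Quadrant.
samples = {
    "I": math.pi / 4,
    "II": 3 * math.pi / 4,
    "III": 5 * math.pi / 4,
    "IV": 7 * math.pi / 4,
}
# For each Quadrant, the signs of (sin, cos, tan) at the sample angle.
table = {
    quadrant: (sign(math.sin(t)), sign(math.cos(t)), sign(math.tan(t)))
    for quadrant, t in samples.items()
}
for quadrant, signs in table.items():
    print(quadrant, *signs)
```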


Graphs of sin, cos, and tan:

[Figure: graphs of sin, cos, and tan for x between −2π and 2π.]

Assorted remarks and observations in bullet form:

• My mnemonic for the above graphs:
Sine is normal and so starts at the origin. Cosine is weird and so starts above it. Tangent
is crazy and goes through the origin.
In Ch. 16.8, we learnt to use the graph of f to graph 1/f . Here’s more practice:

Exercise 113. Graph the functions cosec, sec, and cot. (Answer on the next page.)

[Figure: graphs of cosec, sec, and cot.]

• The trigonometric functions are periodic, with period 2π (or 360○ ).

As mentioned earlier, angles are periodic. Every 2π brings us one full circle.
It follows then that the trigonometric functions are also periodic with period 2π. That is,
every 2π, they “repeat”. For example:

$$\sin 1 = \sin(1 \pm 2\pi) = \sin(1 \pm 4\pi) = \dots \approx 0.841,$$

$$\cos 1 = \cos(1 \pm 2\pi) = \cos(1 \pm 4\pi) = \dots \approx 0.540.$$

More generally, if we shift any of the six graphs to the left or right by an integer multiple
of 2π, we get the exact same graphs. Algebraically:

sin θ = sin (θ + 2π) = sin (θ − 2π) , cos θ = cos (θ + 2π) = cos (θ − 2π) ,
sec θ = sec (θ + 2π) = sec (θ − 2π) , cosecθ = cosec (θ + 2π) = cosec (θ − 2π) .
• tan and cot don’t just have period 2π — they also have period π.
That is, after every π, they “repeat”. If we shift their graphs to the left or right by an
integer multiple of π, we get the exact same graphs. Algebraically:

tan θ = tan (θ + π) = tan (θ − π) , cot θ = cot (θ + π) = cot (θ − π) .
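Periodicity is straightforward to observe numerically. The sketch below shifts θ = 1 by multiples of 2π for sin and cos, and by multiples of π for tan and cot, rounding to three decimal places to hide floating-point noise.

```python
import math

theta = 1.0

# sin and cos repeat after every shift of 2*pi.
for k in range(-2, 3):
    shifted = theta + 2 * k * math.pi
    print(k, round(math.sin(shifted), 3), round(math.cos(shifted), 3))

# tan and cot already repeat after a shift of pi.
for k in range(-2, 3):
    shifted = theta + k * math.pi
    print(k, round(math.tan(shifted), 3), round(1 / math.tan(shifted), 3))
```

A shift of only π does not work for sin and cos. It flips their signs instead.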


• The domain and range of each function:

Domain Range
sin R [−1, 1]
cos R [−1, 1]
tan R ∖ {(k + 0.5) π ∶ k ∈ Z} R
cosec R ∖ {kπ ∶ k ∈ Z} R ∖ (−1, 1)
sec R ∖ {(k + 0.5) π ∶ k ∈ Z} R ∖ (−1, 1)
cot R ∖ {kπ ∶ k ∈ Z} R

Recall [142] that the domain of f /g must exclude any values of x for which g (x) = 0. This is
exactly what we must do here to get the domains of tan, cosec, sec, and cot:
For any integer k, sin (kπ) = 0. And so, for cosec = 1/ sin and cot = cos / sin, we must
exclude {kπ ∶ k ∈ Z} from the domain.
And for any integer k, cos ((k + 0.5) π) = 0. And so, for sec = 1/ cos and tan = sin / cos, we
must exclude {(k + 0.5) π ∶ k ∈ Z} from the domain.
• sin translated leftwards by π/2 is cos:

$$\sin\left(\theta + \frac{\pi}{2}\right) = \cos\theta.$$

Equivalently, cos translated rightwards by π/2 is sin:

$$\cos\left(\theta - \frac{\pi}{2}\right) = \sin\theta.$$

• sin and cos translated left- or rightwards by π are their own reflections in the x-axis:

sin (θ + π) = − sin θ and cos (θ + π) = − cos θ.

• cos is symmetric in the y-axis: cos θ = cos (−θ).

• The reflection of sin in the y-axis is its reflection in the x-axis:

sin (−θ) = − sin θ.

The same is true of tan: tan (−θ) = − tan θ.
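The translation and reflection properties in the last few bullets can be spot-checked at a handful of angles. This is a sketch only. The helper `close`, its tolerance, and the sample angles are our own choices.

```python
import math


def close(a: float, b: float) -> bool:
    """Compare two floats, allowing for rounding error near zero."""
    return math.isclose(a, b, abs_tol=1e-12)


thetas = [-2.0, -0.5, 0.0, 0.7, 3.0]
half_pi = math.pi / 2

# Each property is checked at every sample angle.
checks = {
    "sin(t + pi/2) = cos t": all(close(math.sin(t + half_pi), math.cos(t)) for t in thetas),
    "cos(t - pi/2) = sin t": all(close(math.cos(t - half_pi), math.sin(t)) for t in thetas),
    "sin(t + pi) = -sin t": all(close(math.sin(t + math.pi), -math.sin(t)) for t in thetas),
    "cos(t + pi) = -cos t": all(close(math.cos(t + math.pi), -math.cos(t)) for t in thetas),
    "cos(-t) = cos t": all(close(math.cos(-t), math.cos(t)) for t in thetas),
    "sin(-t) = -sin t": all(close(math.sin(-t), -math.sin(t)) for t in thetas),
    "tan(-t) = -tan t": all(close(math.tan(-t), -math.tan(t)) for t in thetas),
}
for name, holds in checks.items():
    print(name, holds)
```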

[142] Ch. 13.
Fact 31 lists a whole bunch of useful trigonometric identities.

Fact 31. Let $A, B, P, Q \in \mathbb{R}$. Then:

The Addition and Subtraction Formulae

(a) $\sin(A \pm B) = \sin A \cos B \pm \cos A \sin B$,
(b) $\cos(A \pm B) = \cos A \cos B \mp \sin A \sin B$,
(c) $\tan(A \pm B) = \dfrac{\tan A \pm \tan B}{1 \mp \tan A \tan B}$.

The Double Angle Formulae

(d) $\sin 2A = 2 \sin A \cos A$,
(e) $\cos 2A = \cos^2 A - \sin^2 A = 2\cos^2 A - 1 = 1 - 2\sin^2 A$,
(f) $\tan 2A = \dfrac{2 \tan A}{1 - \tan^2 A}$.

Sum-to-Product or Product-to-Sum Formulae

(g) $\sin P + \sin Q = 2 \sin\dfrac{P+Q}{2} \cos\dfrac{P-Q}{2}$,
(h) $\sin P - \sin Q = 2 \cos\dfrac{P+Q}{2} \sin\dfrac{P-Q}{2}$,
(i) $\cos P + \cos Q = 2 \cos\dfrac{P+Q}{2} \cos\dfrac{P-Q}{2}$,
(j) $\cos P - \cos Q = -2 \sin\dfrac{P+Q}{2} \sin\dfrac{P-Q}{2}$.

For (c), we require also that [143]

$$A, B, A \pm B \notin \{(k + 0.5)\pi : k \in \mathbb{Z}\}.$$

Thankfully, you needn’t mug the above because they’re on List of Formulae, MF26 (p. 3).
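When copying a formula from MF26, a quick numerical check can catch a sign slip. The sketch below evaluates both sides of each formula in Fact 31 at arbitrary sample values. The values of A, B, P, and Q are our own and were chosen so that every tangent is defined.

```python
from math import sin, cos, tan, isclose

A, B, P, Q = 0.4, 1.1, 2.3, -0.9  # arbitrary sample values
half_sum, half_diff = (P + Q) / 2, (P - Q) / 2

# Each entry is (left-hand side, right-hand side) of one formula.
formulae = {
    "(a) +": (sin(A + B), sin(A) * cos(B) + cos(A) * sin(B)),
    "(a) -": (sin(A - B), sin(A) * cos(B) - cos(A) * sin(B)),
    "(b) +": (cos(A + B), cos(A) * cos(B) - sin(A) * sin(B)),
    "(b) -": (cos(A - B), cos(A) * cos(B) + sin(A) * sin(B)),
    "(c) +": (tan(A + B), (tan(A) + tan(B)) / (1 - tan(A) * tan(B))),
    "(c) -": (tan(A - B), (tan(A) - tan(B)) / (1 + tan(A) * tan(B))),
    "(d)": (sin(2 * A), 2 * sin(A) * cos(A)),
    "(e)": (cos(2 * A), cos(A) ** 2 - sin(A) ** 2),
    "(f)": (tan(2 * A), 2 * tan(A) / (1 - tan(A) ** 2)),
    "(g)": (sin(P) + sin(Q), 2 * sin(half_sum) * cos(half_diff)),
    "(h)": (sin(P) - sin(Q), 2 * cos(half_sum) * sin(half_diff)),
    "(i)": (cos(P) + cos(Q), 2 * cos(half_sum) * cos(half_diff)),
    "(j)": (cos(P) - cos(Q), -2 * sin(half_sum) * sin(half_diff)),
}
for label, (lhs, rhs) in formulae.items():
    print(label, isclose(lhs, rhs, abs_tol=1e-12))
```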

Exam Tip for Towkays

Whenever you see a question with trigonometric functions, put MF26 (p. 3) next to you!

[143] These additional conditions ensure that cos A, cos B, and cos (A ± B) are non-zero, and hence that tan A, tan B, and tan (A ± B) are well-defined. Otherwise, (c) is false, because an undefined mathematical object cannot be said to be equal to anything (indeed, not even to another undefined object)!
Here we’ll give only an informal proof-by-picture [144] of the sine and cosine Addition Formulae. [145] This proof-by-picture covers only the special case where A, B, and A + B are acute. The subsequent exercises then ask you to prove the remaining formulae.

Proof. The figure below is constructed so that $|PR| = 1$. [146]

[Figure: the rectangle PQSU containing the right triangle PRT, with R on QS and T on SU. Labelled lengths: |PT| = cos B, |RT| = sin B, |PU| = cos A cos B, |TU| = sin A cos B, |ST| = cos A sin B, |RS| = sin A sin B, |PQ| = sin (A + B), and |QR| = cos (A + B).]

1. $|PT| = \cos B$. Thus:

$$|PU| = \cos A \cos B \quad\text{and}\quad |TU| = \sin A \cos B.$$

2. ∠PRQ and ∠RPU are alternate angles (or “Z-angles”). Hence, ∠PRQ = ∠RPU = A + B. Thus:

$$|PQ| = \sin(A + B) \quad\text{and}\quad |QR| = \cos(A + B).$$

3. ∠RTS is complementary to ∠PTU, [147] which is in turn complementary to ∠TPU. Hence, ∠RTS = ∠TPU = A. Moreover, $|RT| = \sin B$. Thus:

$$|ST| = \cos A \sin B \quad\text{and}\quad |RS| = \sin A \sin B.$$

We now have:

$$|PQ| = |UT| + |TS| \quad\text{or}\quad \sin(A + B) = \sin A \cos B + \cos A \sin B,$$

$$|QR| = |PU| - |RS| \quad\text{or}\quad \cos(A + B) = \cos A \cos B - \sin A \sin B.$$

[144] Credit to Blue .
[145] My mnemonic from earlier also kinda works here: sine is normal, while cosine is weird.
[146] Construction details. Let A, B, A + B ∈ (0, π/2). Let PU be the horizontal ray that starts at P. Rotate PU anticlockwise by A to get the ray PT. Now rotate PT anticlockwise by B to get the ray PR. Pick R so that |PR| = 1. Pick T so that it is the perpendicular drop of R onto the ray PT. Now construct the rectangle PQSU that completely contains the right triangle PRT, with R on the line segment QS and T on the line segment SU.
[147] I.e. ∠RTS + ∠PTU = π/2 (see Definition 67).
Exercise 114. Use the figure [148] to prove the Subtraction Formulae for Sine and Cosine in the special case where A is acute and B < A. (Answer on p. 1426.)

$$\sin(A - B) = \sin A \cos B - \cos A \sin B,$$

$$\cos(A - B) = \cos A \cos B + \sin A \sin B.$$

[Figure: the rectangle PQSU with the points R and T, and the angles A − B and B at P.]

Exercise 115. Prove the Addition and Subtraction Formulae for Tangent (below). You may assume all terms are well-defined and that the corresponding Formulae for Sine and Cosine hold for all A, B ∈ R. (Answer on p. 1427.)

$$\tan(A \pm B) = \frac{\tan A \pm \tan B}{1 \mp \tan A \tan B}.$$

Exercise 116. Prove the following Double-Angle Formulae: (Answers on p. 1427.)

(a) $\sin 2A = 2 \sin A \cos A$.
(b) $\cos 2A = 2\cos^2 A - 1 = 1 - 2\sin^2 A$.
(c) $\tan 2A = \dfrac{2 \tan A}{1 - \tan^2 A}$.

Hint: Use the Addition Formulae.

Remark 40. The Double-Angle Formulae are on List MF26, so strictly speaking, you
needn’t memorise them. Nonetheless, it’s a good idea to have them committed to memory
so you can solve problems that much more quickly.

Exercise 117. Prove the following Triple-Angle Formulae: (Answers on p. 1427.)

(a) $\sin 3A = 3 \sin A - 4\sin^3 A$.
(b) $\cos 3A = 4\cos^3 A - 3 \cos A$.
(c) $\tan 3A = \dfrac{3 \tan A - \tan^3 A}{1 - 3\tan^2 A}$.

Hint: Use the Addition and Double-Angle Formulae.

Remark 41. The Triple-Angle Formulae are not on List MF26. So if they ever come up
on exams, you’ll want to be able to either derive them from scratch or recall them.
For cos 3A = 4 cos3 A − 3 cos A, there’s the Hokkien mnemonic “$1.30 = $4.30 − $3”:

箍 三 等於 四箍 三 減 三 箍 。
khoo sam sì khoo sam sam khoo .
$1.30 = $4.30 − $3 .
[148] The construction of this figure is very similar to before, except now we start with the vertical ray PQ, rotate it clockwise by A − B to get the ray PR, then rotate PR clockwise by B to get PT.
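Exercise 118. Prove the following Half-Angle Formulae. (Answer on p. 1428.)

$$\sin\frac{A}{2} = \begin{cases} \sqrt{\dfrac{1 - \cos A}{2}}, & \text{for } \dfrac{A}{2} \text{ in Quadrant I or II}, \\[2ex] -\sqrt{\dfrac{1 - \cos A}{2}}, & \text{for } \dfrac{A}{2} \text{ in Quadrant III or IV}, \end{cases}$$

$$\cos\frac{A}{2} = \begin{cases} \sqrt{\dfrac{1 + \cos A}{2}}, & \text{for } \dfrac{A}{2} \text{ in Quadrant I or IV}, \\[2ex] -\sqrt{\dfrac{1 + \cos A}{2}}, & \text{for } \dfrac{A}{2} \text{ in Quadrant II or III}. \end{cases}$$

Hint: $\cos A = \cos\left(\dfrac{A}{2} + \dfrac{A}{2}\right)$.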

Exercise 119. Prove the following Sum to Product or Product to Sum Formulae. (Answers on p. 1429.)

$$\sin P + \sin Q = 2 \sin\frac{P+Q}{2} \cos\frac{P-Q}{2}, \qquad \cos P + \cos Q = 2 \cos\frac{P+Q}{2} \cos\frac{P-Q}{2},$$

$$\sin P - \sin Q = 2 \cos\frac{P+Q}{2} \sin\frac{P-Q}{2}, \qquad \cos P - \cos Q = -2 \sin\frac{P+Q}{2} \sin\frac{P-Q}{2}.$$

Hint: $P = \dfrac{P+Q}{2} + \dfrac{P-Q}{2}$ and $Q = \dfrac{P+Q}{2} - \dfrac{P-Q}{2}$.

Fun Fact

The above S2P or P2S Formulae are also known as the Prosthaphaeresis Formulae.
Sounds cheem, but that’s just the combination of the Greek words for addition and
subtraction — prosthesis and aphaeresis. So yea, something you can totally use to impress
your friends and family.

The P2S Formulae will be particularly useful when we do integration, because they allow
us to rewrite an otherwise-difficult-to-integrate product into an easy-to-integrate sum.

Exercise 120. Rewrite each expression using the P2S Formulae: (Answer on p. 1429.)

(a) sin 2x cos 5x. (b) cos 2x sin 5x.

(c) cos 2x cos 5x. (d) sin 2x sin 5x.


19.6. Warning about Notation
In Ch. 15, we learnt that:
• $f^2$ denotes the composite function $f \circ f$;
• $f^3$ denotes the composite function $f \circ f^2$;
• Etc.
We’d thus expect that $\sin^2$ denotes the composite function $\sin \circ \sin$. That is, we’d expect:

“$\sin^2 x = \sin(\sin x)$.”

But this is not the case! Very confusingly, $\sin^2$ denotes the function $\sin \cdot \sin$. That is:

$$\sin^2 x = (\sin x)^2 = (\sin x)(\sin x).$$

Example 350. $\sin^2\dfrac{\pi}{2} \ne \sin\left(\sin\dfrac{\pi}{2}\right)$ because:

$$\sin^2\frac{\pi}{2} = \left(\sin\frac{\pi}{2}\right)^2 = (1)^2 = 1, \quad\text{but}\quad \sin\left(\sin\frac{\pi}{2}\right) = \sin 1 \approx 0.841.$$

And in general, for any positive integer n, $\sin^n$ does not denote $\sin \circ \sin \circ \dots \circ \sin$. That is:

$$\sin^n x \ne \sin(\sin(\sin(\dots(\sin x)))).$$

Instead, $\sin^n$ denotes $\sin \cdot \sin \cdot \dots \cdot \sin$:

$$\sin^n x = (\sin x)^n = (\sin x)(\sin x) \dots (\sin x).$$

(The same is true of the other five trigonometric functions.)

This is yet another annoying and confusing bit of notation you’ll have to learn to live with.
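In code the two readings cannot be confused, because a power and a composition are written differently. The short sketch below computes both for x = π/2, as in Example 350. The variable names are our own.

```python
import math

x = math.pi / 2

power = math.sin(x) ** 2             # what sin^2 x actually means: (sin x)^2
composite = math.sin(math.sin(x))    # what the f^2 convention would suggest: sin(sin x)

print(power, round(composite, 3))
```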


19.7. The Inverse Trigonometric Functions
Say we want to construct the inverses of sin, cos, and tan.
Unfortunately, none of them is invertible. The horizontal line y = 1 intersects each graph
more than once; and so by the horizontal line test (HLT), none is invertible.

[Figure: graphs of sin, cos, and tan.]

And so, if we want to construct the inverses of sin, cos, and tan, we’ll have to first restrict
their domains:

Exercise 121. Use the HLT to show that the following domain restrictions will create invertible functions. Then write down their inverses. (Answer below.)

        Original domain              Restrict domain to
sin     R                            [−π/2, π/2]
cos     R                            [0, π]
tan     R ∖ {(k + 1/2) π ∶ k ∈ Z}    (−π/2, π/2)

sinR ∶ [−π/2, π/2] → R by sinR x = sin x.
cosR ∶ [0, π] → R by cosR x = cos x.
tanR ∶ (−π/2, π/2) → R by tanR x = tan x.

Below are the graphs of sinR, cosR, and tanR. Clearly, by the HLT, each is invertible.

[Figure: graphs of sinR, cosR, and tanR.]

We can now define the three inverse trigonometric functions:

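Definition 69. The inverse trigonometric functions arcsine, arccosine, and arctangent are denoted $\sin^{-1}$, $\cos^{-1}$, and $\tan^{-1}$ and are defined as follows:

Define $\sin^{-1} \colon [-1, 1] \to [-\frac{\pi}{2}, \frac{\pi}{2}]$ by: if $y = \sin x$ for $x \in [-\frac{\pi}{2}, \frac{\pi}{2}]$, then $\sin^{-1} y = x$.
Define $\cos^{-1} \colon [-1, 1] \to [0, \pi]$ by: if $y = \cos x$ for $x \in [0, \pi]$, then $\cos^{-1} y = x$.
Define $\tan^{-1} \colon \mathbb{R} \to (-\frac{\pi}{2}, \frac{\pi}{2})$ by: if $y = \tan x$ for $x \in (-\frac{\pi}{2}, \frac{\pi}{2})$, then $\tan^{-1} y = x$.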

Note that the domain restrictions given here are somewhat arbitrary. For example, with sin, we could
equally well have chosen to restrict the domain to [π/2, 3π/2] instead. Nonetheless, by convention, these
domain restrictions are standard and so they are what we’ll use.
The three inverse trigonometric functions are graphed below. Observations:
• $\sin^{-1}$ and $\cos^{-1}$ both have domain [−1, 1].
$\sin^{-1}$ has endpoints (−1, −π/2) and (1, π/2), while $\cos^{-1}$ has endpoints (−1, π) and (1, 0).
• In contrast, $\tan^{-1}$ has domain R and no endpoints.

[Figure: graphs of $\sin^{-1}$, $\cos^{-1}$, and $\tan^{-1}$.]

Note that each function has a range that’s equal to its codomain. Each inverse trigonometric function’s range (or equivalently codomain) is also called its set of principal values. So, we have the following table (which also appears on List MF26, p. 3):

                   sin^−1         cos^−1    tan^−1
Principal values   [−π/2, π/2]    [0, π]    (−π/2, π/2)
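Python's `math.asin`, `math.acos`, and `math.atan` return principal values in the same ranges as the table above. The sketch below prints the extreme values and then shows what happens when an angle outside the principal range (here x = 2, our own choice) is passed through a function and its inverse.

```python
import math

print(math.asin(-1), math.asin(1))      # endpoints of [-pi/2, pi/2]
print(math.acos(1), math.acos(-1))      # endpoints of [0, pi]
print(math.atan(-1e9), math.atan(1e9))  # close to the ends of (-pi/2, pi/2)

x = 2.0  # outside the principal values of sin^-1 and tan^-1, inside those of cos^-1
print(math.asin(math.sin(x)), math.pi - x)  # arcsine returns pi - 2, not 2
print(math.atan(math.tan(x)), x - math.pi)  # arctangent returns 2 - pi, not 2
print(math.acos(math.cos(x)), x)            # arccosine does return 2
```

So applying an inverse trigonometric function after the original function recovers x only when x is already a principal value.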

Remark 42. The previous subchapter noted that $\sin^2 = \sin \cdot \sin$. This is confusing and contradicts our earlier use of $f^2$ to denote the composite function $f \circ f$.
Here, to add to our confusion, $\sin^{-1}$ doesn’t mean $1/\sin$, as would be logical given that $\sin^2 = \sin \cdot \sin$. Instead, $\sin^{-1}$ denotes the inverse sine or arcsine function!
This tremendously confusing notation is one reason why many writers prefer to denote the three inverse trigonometric functions by arcsin, arccos, and arctan.
However and unfortunately, your A-Level exams and syllabus insist on using the notation $\sin^{-1}$, $\cos^{-1}$, and $\tan^{-1}$, and so that’s what we’ll have to do too.

We now give formulae for the compositions of a trigonometric function with an inverse
trigonometric function. You don’t need to know these formulae, but their proofs are fairly
simple. Moreover, they’ll often come in handy, as we’ll see in the following Fact.

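Lemma 1. (a) If $x \in \mathbb{R}$, then $\cos(\tan^{-1} x) = \dfrac{1}{\sqrt{1 + x^2}}$ and $\sin(\tan^{-1} x) = \dfrac{x}{\sqrt{1 + x^2}}$.

(b) If $x \in [-1, 1]$, then $\sin(\cos^{-1} x) = \sqrt{1 - x^2} = \cos(\sin^{-1} x)$.

Let’s first give an informal proof-by-picture:

[Figure (a): a right triangle with angle $\tan^{-1} x$, opposite side x, adjacent side 1, and hypotenuse $\sqrt{1 + x^2}$.]

$$\sin(\tan^{-1} x) = \frac{O}{H} = \frac{x}{\sqrt{1 + x^2}}, \qquad \cos(\tan^{-1} x) = \frac{A}{H} = \frac{1}{\sqrt{1 + x^2}}.$$

[Figure (b): a right triangle with hypotenuse 1, angles $\cos^{-1} x$ and $\sin^{-1} x$, and legs x and $\sqrt{1 - x^2}$.]

$$\sin(\cos^{-1} x) = \frac{O}{H} = \frac{\sqrt{1 - x^2}}{1}, \qquad \cos(\sin^{-1} x) = \frac{A}{H} = \frac{\sqrt{1 - x^2}}{1}.$$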

Formal proof:

Proof. (a) Let $y = \tan^{-1} x$. Then $x = \tan y$, so that $1 + x^2 = 1 + \tan^2 y = \sec^2 y$.

Taking square roots, [150] we have $\sqrt{1 + x^2} = \sec y = \sec(\tan^{-1} x)$.

Hence, $\cos(\tan^{-1} x) = \dfrac{1}{\sec(\tan^{-1} x)} = \dfrac{1}{\sqrt{1 + x^2}}$. ✓

Next, $\sin^2 y = 1 - \cos^2 y = 1 - \dfrac{1}{1 + x^2} = \dfrac{x^2}{1 + x^2}$. Taking square roots, [151] we have $\sin y = \dfrac{x}{\sqrt{1 + x^2}}$ or $\sin(\tan^{-1} x) = \dfrac{x}{\sqrt{1 + x^2}}$. ✓

(b) Let $y = \cos^{-1} x$. Then $x = \cos y$ and $1 - x^2 = 1 - \cos^2 y = \sin^2 y$. Taking square roots, [152] we have $\sin y = \sqrt{1 - x^2}$ or $\sin(\cos^{-1} x) = \sqrt{1 - x^2}$.

Let $z = \sin^{-1} x$. Then $x = \sin z$ and $1 - x^2 = 1 - \sin^2 z = \cos^2 z$. Taking square roots, [153] we have $\cos z = \sqrt{1 - x^2}$ or $\cos(\sin^{-1} x) = \sqrt{1 - x^2}$.

[150] Here we’ve actually omitted a step. Recall that if $a = \sqrt{b^2}$, then $a = |b|$. And so here, taking square roots of the equation $1 + x^2 = \sec^2 y$, we should instead get $\sqrt{1 + x^2} = |\sec y| = |\sec(\tan^{-1} x)|$. We next observe that $\tan^{-1} x \in (-\frac{\pi}{2}, \frac{\pi}{2})$ and hence $\sec(\tan^{-1} x) > 0$; thus, we can get rid of the absolute value sign.
We have the following obvious result that is immediate from the definitions of the inverse
trigonometric functions:

Fact 32. (a) If $x \in \mathbb{R}$, then $\tan(\tan^{-1} x) = x$.

(b) If $x \in [-1, 1]$, then $\sin(\sin^{-1} x) = x$ and $\cos(\cos^{-1} x) = x$.

The following Corollary is nearly immediate from the above Lemma and Fact:

Corollary 4. If x ∈ (−1, 1), then $\tan(\sin^{-1} x) = \dfrac{x}{\sqrt{1-x^2}}$ and, provided x ≠ 0, $\tan(\cos^{-1} x) = \dfrac{\sqrt{1-x^2}}{x}$.

For an informal proof-by-picture, refer to that for Lemma 1(b).

Proof. By tan = sin / cos and the above Lemma and Fact, we have:

$$\tan(\sin^{-1} x) = \frac{\sin(\sin^{-1} x)}{\cos(\sin^{-1} x)} = \frac{x}{\sqrt{1-x^2}} \quad\text{and}\quad \tan(\cos^{-1} x) = \frac{\sin(\cos^{-1} x)}{\cos(\cos^{-1} x)} = \frac{\sqrt{1-x^2}}{x}.$$

151. Again, here we’ve omitted a step. Taking square roots of $\sin^2 y = \dfrac{x^2}{1+x^2}$, we should instead have $|\sin y| = \dfrac{|x|}{\sqrt{1+x^2}}$ or $|\sin(\tan^{-1} x)| = \dfrac{|x|}{\sqrt{1+x^2}}$. To get rid of these absolute value signs, we observe that:
• x ≥ 0 ⇐⇒ $\tan^{-1} x \in [0, \pi/2)$ and hence $\sin(\tan^{-1} x) \ge 0$.
• x < 0 ⇐⇒ $\tan^{-1} x \in (-\pi/2, 0)$ and hence $\sin(\tan^{-1} x) < 0$.
The above two observations show that x always has the same sign as $\sin(\tan^{-1} x)$. Thus, we can get rid of the absolute value signs.
152. Again, here we’ve omitted a step. Taking square roots of $1 - x^2 = \sin^2 y$, we should instead have $|\sin y| = \sqrt{1-x^2}$ or $|\sin(\cos^{-1} x)| = \sqrt{1-x^2}$. We next observe that $\cos^{-1} x \in [0, \pi]$ and hence $\sin(\cos^{-1} x) \ge 0$; thus, we can get rid of the absolute value sign.
153. Again, here we’ve omitted a step. Taking square roots of $1 - x^2 = \cos^2 z$, we should instead have $|\cos z| = \sqrt{1-x^2}$ or $|\cos(\sin^{-1} x)| = \sqrt{1-x^2}$. We next observe that $\sin^{-1} x \in [-\pi/2, \pi/2]$ and hence $\cos(\sin^{-1} x) \ge 0$; thus, we can get rid of the absolute value sign.
The next Fact is a result you’re supposed to have mastered in secondary school. Sadly, it
is not on List MF26, which means you’ll have to mug it.

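Fact 33. (Harmonic Addition.) Let a, b, θ ∈ R with a > 0. Suppose:

$$R = \sqrt{a^2+b^2} \quad\text{and}\quad \alpha = \tan^{-1}\frac{b}{a}.$$

Then: (a) $R\sin(\theta+\alpha) = a\sin\theta + b\cos\theta$. (c) $R\sin(\theta-\alpha) = a\sin\theta - b\cos\theta$.

(b) $R\cos(\theta+\alpha) = a\cos\theta - b\sin\theta$. (d) $R\cos(\theta-\alpha) = a\cos\theta + b\sin\theta$.

Proof. First, use the above Lemma to write (the last step in each line uses $\sqrt{a^2} = a$, which holds because a > 0):

$$\sin\alpha = \sin\left(\tan^{-1}\frac{b}{a}\right) = \frac{b/a}{\sqrt{1+(b/a)^2}} = \frac{b/a}{\sqrt{(a^2+b^2)/a^2}} = \frac{b}{\sqrt{a^2+b^2}}$$

$$\text{and:}\quad \cos\alpha = \cos\left(\tan^{-1}\frac{b}{a}\right) = \frac{1}{\sqrt{1+(b/a)^2}} = \frac{1}{\sqrt{(a^2+b^2)/a^2}} = \frac{a}{\sqrt{a^2+b^2}}.$$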

Below we first use the Addition and Subtraction Formulae, then the expressions for sin α and cos α just derived. In each case, the surd $\sqrt{a^2+b^2}$ nicely cancels out:

(a) $R\sin(\theta+\alpha) = \sqrt{a^2+b^2}\,(\cos\alpha\sin\theta + \sin\alpha\cos\theta) = a\sin\theta + b\cos\theta$,

(b) $R\cos(\theta+\alpha) = \sqrt{a^2+b^2}\,(\cos\alpha\cos\theta - \sin\alpha\sin\theta) = a\cos\theta - b\sin\theta$,

(c) $R\sin(\theta-\alpha) = \sqrt{a^2+b^2}\,(\cos\alpha\sin\theta - \sin\alpha\cos\theta) = a\sin\theta - b\cos\theta$,

(d) $R\cos(\theta-\alpha) = \sqrt{a^2+b^2}\,(\cos\alpha\cos\theta + \sin\alpha\sin\theta) = a\cos\theta + b\sin\theta$.
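
Harmonic Addition is easy to check numerically. The sketch below is an illustration only: the helper name `harmonic_form` is hypothetical, and it assumes a > 0 (with a < 0, the step $\sqrt{a^2} = a$ in the proof would fail).

```python
import math


def harmonic_form(a, b):
    """Return (R, alpha) with R = sqrt(a^2 + b^2) and alpha = arctan(b / a).

    Assumption of this sketch: a > 0.
    """
    if a <= 0:
        raise ValueError("this sketch assumes a > 0")
    return math.hypot(a, b), math.atan(b / a)


def check_part_a(a, b, theta):
    """Compare R sin(theta + alpha) with a sin(theta) + b cos(theta)."""
    R, alpha = harmonic_form(a, b)
    lhs = R * math.sin(theta + alpha)
    rhs = a * math.sin(theta) + b * math.cos(theta)
    return math.isclose(lhs, rhs, abs_tol=1e-9)


if __name__ == "__main__":
    R, alpha = harmonic_form(3, 4)  # 3 sin(theta) + 4 cos(theta)
    print(R, alpha)
    print(all(check_part_a(3, 4, t) for t in (-2.0, 0.0, 0.7, 3.1)))
```

Parts (b) to (d) can be checked the same way by swapping in the corresponding pair of expressions.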

On the next page are another two (inverse) trigonometric identities that will be used in
Part III (Vectors):


Fact 34. Let x ∈ [−1, 1]. Then $\cos^{-1} x + \cos^{-1}(-x) = \pi$.

Proof. For a formal proof, see Exercise 122. Here follows an informal proof-by-picture:
Two right triangles with hypotenuse and base of lengths 1 and x are drawn below.
1. The red angle equals $\cos^{-1} x$.
2. By symmetry, the green angle is equal to the red angle.
3. The blue angle is the supplement of the green angle and equals $\cos^{-1}(-x)$.

Hence, $\cos^{-1}(-x) = \pi - \cos^{-1} x$.
Rearranging, $\cos^{-1} x + \cos^{-1}(-x) = \pi$.

[Figure: two right triangles with hypotenuse 1 and bases x and −x; angle 1 (red) is $\cos^{-1} x$, angle 2 (green) is its mirror image, and angle 3 (blue) is $\cos^{-1}(-x)$.]

Fact 35. Let x ∈ [−1, 1]. Then $\sin^{-1} x + \cos^{-1} x = \pi/2$.

Proof. For a formal proof, see Exercise 123. Here is an informal proof-by-picture:
Two right triangles with hypotenuse and base of lengths 1 and x are drawn below.
1. The red angle equals $\cos^{-1} x$.
2. The blue angle is complementary to the red angle and is also equal to $\sin^{-1} x$ (because “Opp” is of length x).
Hence, $\sin^{-1} x = \pi/2 - \cos^{-1} x$.
Rearranging, $\cos^{-1} x + \sin^{-1} x = \pi/2$.

[Figure: a right triangle with hypotenuse 1 and base x; angle 1 (red) is $\cos^{-1} x$ and angle 2 (blue) is $\sin^{-1} x$.]

Exercise 122. This Exercise guides you through a proof of Fact 34. Recall (p. 19.5)
that cosine reflected in the x-axis is cosine translated rightwards by π. That is:

− cos θ = cos (θ − π) .

(a) Explain why we also have: − cos θ = cos (π − θ).

(b) Plug in θ = cos−1 x into the above equation and simplify the LHS.
(c) Now complete the proof by applying cos−1 . (Answer on p. 1430.)

Exercise 123. This Exercise guides you through a proof of Fact 35. Recall (p. 19.5) that cosine translated rightwards by π/2 is sine. That is:

$$\sin\theta = \cos\left(\theta - \frac{\pi}{2}\right).$$

(a) Explain why we also have: $\sin\theta = \cos\left(\dfrac{\pi}{2} - \theta\right)$.

(b) Plug in $\theta = \sin^{-1} x$ into the above equation and simplify the LHS.

(c) Now complete the proof by applying $\cos^{-1}$. (Answer on p. 1430.)

19.8. The Area of a Triangle and the Laws of Sines and Cosines

Proposition 5. Let △ABC have angles A, B, and C, and sides a, b, and c (opposite A, B, and C, respectively). Then:

(a) The area of △ABC is: $\dfrac{1}{2} ab \sin C$.

(b) The Law of Sines: $\dfrac{\sin A}{a} = \dfrac{\sin B}{b} = \dfrac{\sin C}{c}$.

(c) The Law of Cosines: $c^2 = a^2 + b^2 - 2ab\cos C$.

Proof. Let D be the point on AC that is the base of the perpendicular from B. Then sin C =
∣BD∣ /a and cos C = ∣CD∣ /a. Thus, ∣BD∣ = a sin C, ∣CD∣ = a cos C, and ∣AD∣ = b − a cos C.

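[Figure: Triangle ABC with base AC of length b. The perpendicular from B meets AC at D, with |BD| = a sin C, |AD| = b − a cos C, and |DC| = a cos C. Side AB has length c and side BC has length a.]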

(a) △ABC has base b and height a sin C. Hence, its area is 0.5ab sin C.
(b) By symmetry, the triangle has area 0.5ab sin C = 0.5bc sin A = 0.5ac sin B.
Divide by 0.5abc to get: sin A/a = sin B/b = sin C/c.
(c) Consider the triangle ABD. It has hypotenuse of length c and legs of lengths a sin C
and b − a cos C. Now use the Pythagorean Theorem and the identity sin2 C + cos2 C = 1:

$$\begin{aligned} c^2 &= (a\sin C)^2 + (b - a\cos C)^2 = a^2\sin^2 C + b^2 - 2ab\cos C + a^2\cos^2 C \\ &= a^2\left(\sin^2 C + \cos^2 C\right) + b^2 - 2ab\cos C = a^2 + b^2 - 2ab\cos C. \end{aligned}$$
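
To see parts (a) and (c) of Proposition 5 in action, here is a minimal sketch. The function names `triangle_area` and `third_side` are our own, and angles are assumed to be in radians.

```python
import math


def triangle_area(a, b, C):
    """Area of a triangle from two sides a, b and the included angle C (radians)."""
    return 0.5 * a * b * math.sin(C)


def third_side(a, b, C):
    """Length of the side opposite angle C, by the Law of Cosines."""
    return math.sqrt(a * a + b * b - 2 * a * b * math.cos(C))


if __name__ == "__main__":
    # A 3-4-5 right triangle: the right angle C lies between the sides 3 and 4.
    print(triangle_area(3, 4, math.pi / 2))
    print(third_side(3, 4, math.pi / 2))
```

With C = π/2 the Law of Cosines reduces to the Pythagorean Theorem, which makes the 3-4-5 triangle a convenient test case.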


The following “obviosity” can be proven using the Law of Cosines:

Corollary 5. The length of any one side of a triangle is always less than the sum of the
lengths of the other two sides.

Proof. Consider a triangle with sides of lengths a, b, c > 0, where C ∈ (0, π) is the angle opposite the side of length c. By the Law of Cosines:

$$\begin{aligned} c^2 &= a^2 + b^2 - 2ab\cos C \\ &= a^2 + b^2 - 2ab + 2ab - 2ab\cos C \\ &= (a-b)^2 + 2ab\,(1 - \cos C) \\ &> (a-b)^2, \end{aligned}$$

where the last inequality follows because a, b > 0 and cos C < 1.
The inequality $c^2 > (a-b)^2$ is equivalent to c > ∣a − b∣, which implies c > a − b or a < b + c. This proves that the length a is less than the sum of the lengths b and c.

We can likewise prove that b < a + c and c < a + b.

The following result, known as the Triangle Inequality, is secretly the same as the above
result (hence the name):

Fact 36. (The Triangle Inequality.) Let x, y ∈ R. Then ∣x + y∣ ≤ ∣x∣ + ∣y∣.

Proof. We have − ∣x∣ ≤ x ≤ ∣x∣ and − ∣y∣ ≤ y ≤ ∣y∣.

Adding up, we have − (∣x∣ + ∣y∣) ≤ x + y ≤ ∣x∣ + ∣y∣.
Hence, ∣x + y∣ ≤ ∣x∣ + ∣y∣.


20. Elementary Functions
We first review some of the functions we’ve encountered so far:

Definition 70. A polynomial function is any nice function f defined by $f(x) = a_0 + a_1 x + a_2 x^2 + \dots + a_n x^n$, where $a_0, a_1, \dots, a_n$ are any real numbers.

The constant and identity functions are, of course, special instances of polynomial functions:

Definition 71. An identity function is any nice function f defined by f (x) = x.154

Definition 72. A constant function is any nice function f defined by f (x) = c, where
c ∈ R.
A special case of a constant function is a zero function:

Definition 73. A zero function is any nice function f defined by f (x) = 0.

Definition 74. A power function is any nice function f defined by $f(x) = x^k$, where k is any real number.
We shall not formally define what an algebraic function is. Instead, we shall merely note
in passing that the set of algebraic functions includes all polynomial and power functions
(but also more besides).

Definition 75. The functions sin, cos, tan, cosec, sec, and cot are called trigonometric
(or circular) functions.155

Definition 76. The functions sin−1 , cos−1 , and tan−1 are called inverse trigonometric (or
circular) functions.

In Ch. 17, we also defined:

• The natural logarithm function ln ∶ R+ → R; and
• Its inverse the exponential function exp ∶ R → R+ .

All of the above functions are elementary functions. Also, any arithmetic combination
(Ch. 13) or composition (Ch. 15) of two elementary functions is an elementary function.

154. The identity mapping is the mapping x ↦ x. And so, an identity function may also be defined as any nice function with the identity mapping.
155. On p. 18 of your H2 Maths syllabus, these six functions are simply called the circular functions. However, the term trigonometric functions is probably more common.
Definition 77. An elementary function is:

a polynomial function, a trigonometric function, an inverse trigonometric function,
a natural logarithm function, an exponential function, a power function,
any arithmetic combination of two elementary functions, or any composition of two elementary functions.

Nearly every function you’ll ever encounter in H2 Maths is elementary. Through arithmetic combinations and compositions, we can build functions that “look” ever more complicated but are nonetheless elementary:

Example 351. Consider the function f ∶ R+ → R defined by:

f (x) = 1 + 2x + 3 sin (cos (1 + x3 − ln x) ).


The function f “looks” complicated, but is nonetheless elementary.

Now consider the absolute value function:

Definition 78. The absolute value function ∣⋅∣ ∶ R → R is defined by:

$$|x| = \begin{cases} x & \text{for } x \ge 0, \\ -x & \text{for } x < 0. \end{cases}$$

The absolute value function doesn’t seem to fall under our above Definition of elementary
functions. But observe we can rewrite the mapping rule of ∣⋅∣ more simply as:

$$|x| = \sqrt{x^2}.$$

(If you are puzzled by the above equation, recall that for any y ≥ 0, we have $\sqrt{y} \ge 0$; see Remark 15.)

Now define the functions f ∶ R → R and g ∶ [0, ∞) → R by $f(x) = x^2$ and $g(x) = \sqrt{x}$. Both f and g are elementary. Moreover, ∣⋅∣ is the composition of g and f (first square, then take the square root):

$$|x| = g(f(x)) = (g \circ f)(x).$$

And so by Definition 77, ∣⋅∣ is an elementary function.


Fun Fact

Most functions we’ll encounter in H2 Maths are elementary.

Probably the only non-elementary function we’ll spend any time on is the normal distribution’s cumulative distribution function. We’ll learn about this in Part VI (Probability and Statistics).
It is possible to prove that the derivative (if it exists) of any elementary function is also
elementary. However and importantly, the integrals of many elementary functions are
not. We’ll learn a little about this in Part V (Calculus).

Remark 43. Note that there is no single standard definition of the term elementary
function. The above definition is merely this book’s.156
This term is nonetheless introduced because it is a convenient one for referring to nearly
all functions that maths students at this level will encounter.

156. And ProofWiki’s.
21. Polynomial Division
Ch. 2 reviewed division. We’ll now look at polynomial division.

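Example 352. Consider the expression (2x + 1) ÷ x. We have:

$$\underbrace{2x+1}_{p(x)} = \underbrace{2}_{q(x)} \cdot \underbrace{x}_{d(x)} + \underbrace{1}_{r(x)} \quad\text{or}\quad \frac{\overbrace{2x+1}^{p(x)}}{\underbrace{x}_{d(x)}} = \underbrace{2}_{q(x)} + \frac{\overbrace{1}^{r(x)}}{\underbrace{x}_{d(x)}}.$$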

The four polynomials labelled above have the same four names as before:

The dividend The divisor The quotient The remainder

p (x) = 2x + 1 d (x) = x q (x) = 2 r (x) = 1

In the above example, it was kinda obvious that the quotient had to be q (x) = 2. In the
next example, it’s a little less obvious and we’ll have to use long division:

Example 353. Consider (x2 + 3)÷(x − 1). The dividend is p (x) = x2 +3 and the divisor
is d (x) = x − 1. This time, it’s not so obvious what the quotient q (x) should be. But it
turns out that just like with (simple) division, here long division can help us.
In (simple) long division, going from right to left, the columns were 1s, 10s, 100s, etc.
Here in polynomial long division, going from right to left, they’re the constant x0 term,
the linear x1 term, the squared x2 term, etc.

Terms:        x²      x¹      x⁰
                      x       +1       Explanation
x − 1 )       x²     +0x      +3
              x²     −x                x ⋅ (x − 1) = x² − x
                      x       +3       (x² + 3) − (x² − x) = x + 3
                      x       −1       1 ⋅ (x − 1) = x − 1
                               4       (x + 3) − (x − 1) = 4

The quotient is q (x) = x + 1, while the remainder is r (x) = 4. Altogether, we have:

$$\underbrace{x^2+3}_{p(x)} = \underbrace{(x-1)}_{d(x)} \cdot \underbrace{(x+1)}_{q(x)} + \underbrace{4}_{r(x)} \quad\text{or}\quad \frac{\overbrace{x^2+3}^{p(x)}}{\underbrace{x-1}_{d(x)}} = \underbrace{x+1}_{q(x)} + \frac{\overbrace{4}^{r(x)}}{\underbrace{x-1}_{d(x)}}.$$

Example 354. Consider (3x2 + x − 4) ÷ (2x − 3). The dividend is p (x) = 3x2 + x − 4 and
the divisor is d (x) = 2x − 3. As before, let’s do the long division:

Terms:         x²        x¹         x⁰
                       (3/2)x     + 11/4      Explanation
2x − 3 )       3x²      + x        − 4
               3x²    − (9/2)x                (3/2)x ⋅ (2x − 3) = 3x² − (9/2)x
                       (11/2)x     − 4        (3x² + x − 4) − (3x² − (9/2)x) = (11/2)x − 4
                       (11/2)x    − 33/4      (11/4) ⋅ (2x − 3) = (11/2)x − 33/4
                                   17/4       ((11/2)x − 4) − ((11/2)x − 33/4) = 17/4

Thus the quotient and remainder are:

$$q(x) = \frac{3}{2}x + \frac{11}{4} \quad\text{and}\quad r(x) = \frac{17}{4}.$$

We have:

$$\underbrace{3x^2 + x - 4}_{p(x)} = \underbrace{(2x-3)}_{d(x)} \cdot \underbrace{\left(\frac{3}{2}x + \frac{11}{4}\right)}_{q(x)} + \underbrace{\frac{17}{4}}_{r(x)}.$$

Or:

$$\frac{\overbrace{3x^2+x-4}^{p(x)}}{\underbrace{2x-3}_{d(x)}} = \underbrace{\frac{3}{2}x + \frac{11}{4}}_{q(x)} + \frac{\overbrace{17/4}^{r(x)}}{\underbrace{2x-3}_{d(x)}}.$$

Example 355. Consider (4x3 + 2x2 + 1) ÷ (2x2 − x − 1). The dividend is p (x) = 4x3 +
2x2 + 1 and the divisor is d (x) = 2x2 − x − 1. As before, let’s do the long division:

Terms:               x³      x²      x¹      x⁰
                                     2x      +2      Explanation
2x² − x − 1 )        4x³    +2x²    +0x      +1
                     4x³    −2x²    −2x              2x ⋅ (2x² − x − 1) = 4x³ − 2x² − 2x
                             4x²    +2x      +1      (4x³ + 2x² + 1) − (4x³ − 2x² − 2x) = 4x² + 2x + 1
                             4x²    −2x      −2      2 ⋅ (2x² − x − 1) = 4x² − 2x − 2
                                     4x      +3      (4x² + 2x + 1) − (4x² − 2x − 2) = 4x + 3

Thus the quotient and remainder are:

q (x) = 2x + 2 and r (x) = 4x + 3.

We have:

$$\underbrace{4x^3 + 2x^2 + 1}_{p(x)} = \underbrace{(2x^2 - x - 1)}_{d(x)} \cdot \underbrace{(2x+2)}_{q(x)} + \underbrace{4x+3}_{r(x)}.$$

Or:

$$\frac{\overbrace{4x^3+2x^2+1}^{p(x)}}{\underbrace{2x^2-x-1}_{d(x)}} = \underbrace{2x+2}_{q(x)} + \frac{\overbrace{4x+3}^{r(x)}}{\underbrace{2x^2-x-1}_{d(x)}}.$$

Exercise 124. For each expression, do the long division and identify the dividend, divisor,
quotient, and remainder. (Answer on p. 1431.)

(a) $\dfrac{16x+3}{5x-2}$. (b) $\dfrac{4x^2-3x+1}{x+5}$. (c) $\dfrac{x^2+x+3}{-x^2-2x+1}$.

The above examples and exercises suggest the following theorem and definition:

Theorem 4. (Euclidean Division Theorem for Polynomials.) Let p (x) and d (x)
be P - and D-degree polynomials in x with D < P . Then there exists a unique polynomial
q (x) of degree P − D such that r (x) = p (x) − d (x)q (x) has degree less than D.

Proof. See p. 1274 in the Appendices.

Definition 79. Given polynomials p (x) and d (x) and the expression p (x) ÷ d (x), we
call p (x) the dividend, d (x) the divisor, the unique polynomial q (x) given in the above
theorem the quotient, and r (x) = p (x) − d (x)q(x) the remainder.
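
The long-division procedure used in Examples 353 to 355 can be sketched in code. This is only an illustration under our own conventions: the name `poly_divmod` is hypothetical, and a polynomial is represented as a list of coefficients from the highest-degree term down to the constant term (so x² + 3 is `[1, 0, 3]`). Exact fractions are used so that quotients like 3/2 and 11/4 come out exactly.

```python
from fractions import Fraction


def poly_divmod(p, d):
    """Divide polynomial p by polynomial d.

    Coefficients are listed from the highest degree down to the constant.
    Returns (q, r) as coefficient lists with p = d*q + r and deg r < deg d.
    """
    if not d or d[0] == 0:
        raise ValueError("divisor must have a non-zero leading coefficient")
    d = [Fraction(c) for c in d]
    r = [Fraction(c) for c in p]
    q = []
    while len(r) >= len(d):
        factor = r[0] / d[0]           # next term of the quotient
        q.append(factor)
        for i in range(len(d)):        # subtract factor * d from the leading terms
            r[i] -= factor * d[i]
        r.pop(0)                       # the leading term is now zero; drop it
    return q, r


if __name__ == "__main__":
    print(poly_divmod([1, 0, 3], [1, -1]))         # Example 353
    print(poly_divmod([3, 1, -4], [2, -3]))        # Example 354
    print(poly_divmod([4, 2, 0, 1], [2, -1, -1]))  # Example 355
```

Each pass of the loop corresponds to one pair of rows in the long-division tables: pick the next quotient term, multiply it by the divisor, and subtract.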


21.1. Factorising Polynomials

Definition 80. Let p (x) and d (x) be polynomials. If there exists a polynomial q (x)
such that p (x) = d (x) q (x), then we say that d (x) is a factor for p (x) or divides p (x).

Example 356. We can factorise the quadratic polynomial $p(x) = x^2 + 4x + 3$:

p (x) = (x + 1) (x + 3) .

We call x + 1 and x + 3 factors of the polynomial p (x).

By the way, if d (x) is a factor of the polynomial p (x), then so too is kd (x), for any
k ≠ 0. So here, since x + 1 is a factor of p (x), so too are 2x + 2, 7x + 7, and 0.5x + 0.5.

We now state and prove the Remainder Theorem and its corollary,157 the Factor Theorem. The Factor Theorem is especially useful for factorising polynomials.

Theorem 5. Let a be a constant and p (x) be a polynomial. Then:

(a) p (x) divided by x − a has remainder p (a). (The Remainder Theorem)

(b) x − a is a factor for p (x) if and only if p (a) = 0. (The Factor Theorem)

Proof. (a) By Definition 79, when we divide p (x) by x−a, the quotient q (x) and remainder
r (x) are given by:

p (x) = (x − a) q (x) + r (x) .

Note that r (x) is a polynomial whose degree is less than 1 and which is thus a constant.
And so, let us simply write r in place of r (x).
Now simply plug in x = a to get:

p (a) = (a − a) q (a) + r = r.

In words, the remainder r equals p (a).

(b) We just showed that p (x) divided by x − a leaves a remainder of p (a). By Definition
80, x − a is a factor for p (x) if and only if the remainder is zero. Altogether, x − a is a factor
for p (x) if and only if p (a) = 0.

The Factor Theorem is equivalent to the following statement:

x − a is a factor for p (x) if and only if a is a root of p (x).

157. In mathematics, a corollary is a statement that follows readily from another.
Some examples and exercises to illustrate the Remainder Theorem (RT):

Example 357. Consider $p(x) = x^2 - 5x + 1$ (a quadratic polynomial). By the RT, p (x) divided by x − 3 leaves a remainder of $p(3) = 3^2 - 5(3) + 1 = -5$.
We could’ve figured this out through long division (below), but clearly the RT is a lot quicker.

              Squared    Linear    Constant
                           x         −2
x − 3 )         x²        −5x        +1
                x²        −3x
                          −2x        +1
                          −2x        +6
                                     −5

Example 358. Consider $p(x) = 17x^5 - 5x^4 + x^2 + 1$ (a quintic polynomial). By the RT, p (x) divided by x − 1 leaves a remainder of $p(1) = 17 \cdot 1^5 - 5 \cdot 1^4 + 1^2 + 1 = 14$.
We could’ve figured this out through long division (I didn’t bother but you can try this
as an exercise), but clearly the RT is a lot quicker.
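
By the RT, finding the remainder on division by x − a is just a matter of evaluating p (a). A minimal sketch of that evaluation is below. It uses Horner's scheme, a standard evaluation trick that is not discussed in this text; the function name `remainder_by_rt` and the coefficient-list convention (highest degree first) are our own assumptions.

```python
def remainder_by_rt(coeffs, a):
    """Remainder of p(x) divided by (x - a), computed as p(a).

    coeffs lists the coefficients of p from the highest degree down to the
    constant; missing powers must be entered as 0.
    """
    value = 0
    for c in coeffs:
        value = value * a + c  # Horner's scheme: one multiply and one add per term
    return value


if __name__ == "__main__":
    print(remainder_by_rt([1, -5, 1], 3))            # Example 357
    print(remainder_by_rt([17, -5, 0, 1, 0, 1], 1))  # Example 358
```

Note the zeros in the second call: 17x⁵ − 5x⁴ + x² + 1 has no x³ or x term, so those coefficients must be written out explicitly.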

Historically, the RT rarely featured on the A-Level exams ... Which means, of course, that
it made a sudden appearance in 2017 just to screw students over.158 So yea, it’s another
thing you’ll want to remember. An exercise to help with that:

Exercise 125. Find the remainder when we divide:

(a) $2x^3 + 7x^2 - 3x + 5$ by x − 3.
(b) $-2x^4 + 3x^2 - 7x - 1$ by x + 2. (Answer on p. 1432.)

For H2 Maths, the Remainder Theorem will have little use (except as a means for screwing
students over). Instead, its corollary — the Factor Theorem (FT) — will be more useful
for factorising polynomials.
The FT tells us that p (a) = 0 if and only if x − a is a factor for p (x). And so, by guess-and-
checking numbers a for which p (a) = 0, we can factorise p (x). Let’s call this the Factor
Theorem guess-and-check method (FTGACM). Examples:

158. See Exercise 446(a) (9758 N2017/I/5).
Example 359. Factorise $p(x) = x^2 - 3x + 2$ (a quadratic polynomial).
Let’s try the FTGACM by plugging in the number 1:

$p(1) = 1^2 - 3 \cdot 1 + 2 = 0$. ✓

Wah! So “lucky”! Success on the very first try! By the FT, x − 1 is a factor for p (x).
Since p (x) is quadratic, its other factor must be a linear or 1st-degree polynomial, i.e. of
the form ax + b. To find this other factor, we could continue trying the FTGACM. But
we won’t do that.
Instead, we’ll divide $x^2 - 3x + 2$ by x − 1. In fact, here let’s learn another method for dividing polynomials. Write:

$$p(x) = x^2 - 3x + 2 \overset{1}{=} (x-1)(ax+b) \overset{2}{=} ax^2 + (b-a)x - b.$$

By comparing the coefficients on the squared and constant terms, we see that a = 1 and b = −2. Thus, the other factor must be ax + b = x − 2. We have:

$$p(x) = x^2 - 3x + 2 = (x-1)(x-2).$$

P.S. It was actually unnecessary to write $\overset{2}{=}$ above. Because just from $\overset{1}{=}$ alone, you can easily tell from the coefficients on the squared and constant terms that a = 1 and b = −2.

Example 360. Factorise $p(x) = 3x^2 + 5x + 2$.

Let’s try the FTGACM by plugging in the number 1:

$p(1) = 3 \cdot 1^2 + 5 \cdot 1 + 2 = 10 \ne 0$. ✗

Aiyah, sian. Doesn’t work — by the FT, x − 1 is not a factor for p (x).
By the way, here’s a towkay time-saving tip: We don’t need to compute the exact value
of p(1) to see that since every term is positive, p(1) is clearly positive and non-zero.
Let’s keep trying the FTGACM. This time, we try −1:

$p(-1) = 3 \cdot (-1)^2 + 5 \cdot (-1) + 2 = 0$. ✓


Yay, works! By the FT, x + 1 is a factor for p (x).

As in the previous example, to find the other factor ax + b, let’s divide 3x2 + 5x + 2 by
x + 1 by writing:

p (x) = 3x2 + 5x + 2 = (x + 1) (ax + b) .

From the coefficients on the squared and constant terms, we see that a = 3 and b = 2.
Thus, the other factor must be ax + b = 3x + 2. We have:

p (x) = 3x2 + 5x + 2 = (x + 1) (3x + 2) .


Actually, to factorise quadratic polynomials, there are quicker methods than the FTGACM:
1. The quadratic formula (Ch. 9).
If $b^2 - 4ac \ge 0$, then the roots of the quadratic equation $ax^2 + bx + c = 0$ are:

$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}.$$

We can use these roots and the FT to factorise the quadratic polynomial easily:
• If $b^2 - 4ac > 0$, then $ax^2 + bx + c = a\left(x - \dfrac{-b - \sqrt{b^2-4ac}}{2a}\right)\left(x - \dfrac{-b + \sqrt{b^2-4ac}}{2a}\right)$.
• If $b^2 - 4ac = 0$ (so that the surds disappear), then:
$$ax^2 + bx + c = a\left(x - \frac{-b}{2a}\right)\left(x - \frac{-b}{2a}\right) = a\left(x + \frac{b}{2a}\right)^2.$$
• If $b^2 - 4ac < 0$, then the quadratic polynomial cannot be factorised.
We revisit the last two examples:

Example 361. $x^2 - 3x + 2$ has discriminant $b^2 - 4ac = (-3)^2 - 4(1)(2) = 1$. Thus:

$$x^2 - 3x + 2 = \left(x - \frac{3 - \sqrt{1}}{2}\right)\left(x - \frac{3 + \sqrt{1}}{2}\right) = (x-1)(x-2).$$

Example 362. $3x^2 + 5x + 2$ has discriminant $b^2 - 4ac = 5^2 - 4(3)(2) = 1$. Thus:

$$3x^2 + 5x + 2 = 3\left(x - \frac{-5 - \sqrt{1}}{2 \cdot 3}\right)\left(x - \frac{-5 + \sqrt{1}}{2 \cdot 3}\right) = 3\,(x+1)\left(x + \frac{2}{3}\right) = (x+1)(3x+2).$$

Two more examples:

Example 363. $4x^2 - 12x + 9$ has discriminant $b^2 - 4ac = (-12)^2 - 4(4)(9) = 0$. Thus:

$$4x^2 - 12x + 9 = 4\left(x + \frac{-12}{2 \times 4}\right)^2 = 4\left(x - \frac{3}{2}\right)^2 = (2x-3)^2.$$

Example 364. $3x^2 - 2x + 1$ has discriminant $b^2 - 4ac = (-2)^2 - 4(3)(1) = -8 < 0$. Thus, $3x^2 - 2x + 1$ cannot be factorised.
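
The three discriminant cases can be put into a short sketch. The function name `factorise_quadratic` and its return convention are our own assumptions: it returns the triple (a, r1, r2) meaning a(x − r1)(x − r2), or `None` when the discriminant is negative. It works in floating point, so roots such as −2/3 are only approximate.

```python
import math


def factorise_quadratic(a, b, c):
    """Factorise ax^2 + bx + c as a(x - r1)(x - r2) using the quadratic formula.

    Returns (a, r1, r2) with r1 <= r2 when a > 0, or None if the discriminant
    is negative (no factorisation with real numbers).
    """
    if a == 0:
        raise ValueError("not a quadratic: a must be non-zero")
    disc = b * b - 4 * a * c
    if disc < 0:
        return None
    surd = math.sqrt(disc)
    return a, (-b - surd) / (2 * a), (-b + surd) / (2 * a)


if __name__ == "__main__":
    print(factorise_quadratic(1, -3, 2))   # Example 361
    print(factorise_quadratic(3, 5, 2))    # Example 362
    print(factorise_quadratic(4, -12, 9))  # Example 363 (repeated root)
    print(factorise_quadratic(3, -2, 1))   # Example 364
```

When the discriminant is zero, the two returned roots coincide, matching the perfect-square case above.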


2. The secondary school guess-and-check method (SSGACM).
The examiners are usually nice and give polynomials whose factors have integer coefficients.
So, it is usually not hard to guess what these integer coefficients are. Indeed, this is probably
what you did back in secondary school. Examples:

Example 365. Factorise $3x^2 + 5x + 2$.

We write: $3x^2 + 5x + 2 = (ax + b)(cx + d)$.

If your examiners are nice, a, b, c, and d should all be integers. We already know that ac = 3 and bd = 2. So, let’s try something like a = 3, c = 1, b = 1, and d = 2:

$(3x + 1)(x + 2) = 3x^2 + 7x + 2$. ✗

Aiyah, sian. Doesn’t work. Neh’mine. Try again by switching 1 and 2:

$(3x + 2)(x + 1) = 3x^2 + 5x + 2$. ✓ Yay! Done!

Example 366. Factorise $6x^2 - 11x - 35$.

We write: $6x^2 - 11x - 35 = (ax + b)(cx + d)$.

If your examiners are nice, a, b, c, and d should all be integers. We already know that ac = 6 and bd = −35. So, let’s try something like a = 3, c = 2, b = 7, and d = −5:

$(3x + 7)(2x - 5) = 6x^2 - x - 35$. ✗

Aiyah, sian. Doesn’t work. Neh’mine. Try again by switching 3 and 2:

$(2x + 7)(3x - 5) = 6x^2 + 11x - 35$. ✗

Aiyah, sian. Still doesn’t work. Neh’mine. Try again by switching the signs:

$(2x - 7)(3x + 5) = 6x^2 - 11x - 35$. ✓ Yay! Done!

With quadratic (i.e. degree-2) polynomials, the quadratic formula or SSGACM will usually
be quicker than the FTGACM.
But for polynomials of degree 3 or higher, we do not know of any formula159 and the
SSGACM may be hopeless. And so it is really only with higher-degree polynomials that
the FTGACM comes in handy:
159. There are actually formulae for factorising cubic and quartic polynomials, but we haven’t learnt these.
Example 367. Factorise p (x) = 15x3 − 17x2 − 22x + 24.
If we want to try the SSGACM, we’d write:

15x3 − 17x2 − 22x + 24 = (ax + b) (cx + d) (ex + f ) .

We observe that ace = 15 and bdf = 24. We could try out different numbers, but there
are just way too many possibilities and we’d probably take way too long.
Better then to try the FTGACM. To do so, we plug in the number 1:

$p(1) = 15 \cdot 1^3 - 17 \cdot 1^2 - 22 \cdot 1 + 24 = 0$. ✓

Wah! So “lucky”! Success on the very first try! By the FT, x − 1 is a factor for p (x).
Since p (x) is cubic, p (x) divided by x − 1 gives us a quadratic polynomial ax2 + bx + c,
which we can find by writing:

p (x) = 15x3 − 17x2 − 22x + 24 = (x − 1) (ax2 + bx + c) .

The coefficients on the cubed and constant terms are a = 15 and −c = 24 (or c = −24).
The coefficients on the squared term are −a + b = −17; and so, b = −2. Thus:

ax2 + bx + c = 15x2 − 2x − 24.

Let’s now factorise $15x^2 - 2x - 24$. Here, let’s just use the quadratic formula. We have $b^2 - 4ac = (-2)^2 - 4(15)(-24) = 4 + 1440 = 1444 > 0$. Moreover, $\sqrt{1444} = 38$. Thus:

$$15x^2 - 2x - 24 = 15\left(x - \frac{2 - 38}{30}\right)\left(x - \frac{2 + 38}{30}\right) = 15\left(x + \frac{6}{5}\right)\left(x - \frac{4}{3}\right).$$

Altogether, we have:

$$15x^3 - 17x^2 - 22x + 24 = 15\,(x-1)\left(x + \frac{6}{5}\right)\left(x - \frac{4}{3}\right) = (x-1)(5x+6)(3x-4).$$

But even with higher-degree polynomials, it may sometimes be quicker to use the SSGACM
(than the FTGACM), especially if the coefficients aren’t too big or are equal to zero:

Example 368. Factorise p (x) = x3 − 7x − 6. We try the SSGACM by writing:

x3 − 7x − 6 = (ax + b) (cx + d) (ex + f ) .

Since ace = 1 and bdf = −6, why not try a = c = e = 1 and b = 2, d = −3, and f = 1:

$(x + 2)(x - 3)(x + 1) = (x^2 - x - 6)(x + 1) = x^3 - x^2 - 6x + x^2 - x - 6 = x^3 - 7x - 6$. ✓

Wah! So “lucky”! Success on the very first try! We’re done!

As we’ve seen, factorising polynomials takes wisdom, intuition, and some luck. You’ll have
to learn to judge which tool will get you the answer most quickly.


We now learn another tool for factorising polynomials: the Intermediate Value Theorem (IVT). We first describe the IVT informally using an example:

Example 369. Let f be a continuous function. Suppose we have most of f ’s graph, but
are missing the interval (1, 3). Say we know that f (1) = −2 and f (3) = 2. What then can
we say about the missing portion of the graph?

[Figure: The graph of f, with the points (1, −2) and (3, 2) marked. We have most of the graph of f, but are missing the interval (1, 3).]

Since f is continuous, it must be that we can draw its entire graph without lifting our
pencil. In particular, we can connect the dots (1, −2) and (3, 2) without lifting our
pencil. But obviously, the only way to do so is to have our pencil “go through” every
value between −2 and 2. Hence, f must take on every value between −2 and 2.

And yup, that’s all the IVT says: if f is continuous on the interval [a, b], then f must “hit” every value between f (a) and f (b) in the interval (a, b). A bit more formally:

Theorem 6. (The Intermediate Value Theorem.) If f is continuous on the interval [a, b], then for every y strictly between f (a) and f (b), there exists x ∈ (a, b) such that y = f (x).

Proof. Omitted — see e.g. Tao (Analysis I, 2016, pp. 238–9).

We now illustrate how the IVT can help us factorise polynomials.


Example 370. Factorise p (x) = 2x2 + 9x − 5.
Note that with this quadratic polynomial, it’s probably quicker to use the quadratic
formula or the SSGACM.
But here, just to illustrate how the IVT can be used, we’ll start instead with the FTGACM. We plug in the number 1:

$p(1) = 2 \cdot 1^2 + 9 \cdot 1 - 5 = 6 \ne 0$. ✗

Aiyah, sian. Doesn’t work — by the FT, x − 1 is not a factor for p (x).
We could continue trying the FTGACM. But here let’s first enlist the help of the IVT.
Observation: p (0) = −5. What good is that observation?
Well, since p (0) < 0 < p (1), the IVT says there must be some 0 < c < 1 such that p (c) = 0.
So let’s continue trying the FTGACM by plugging in the number 0.5:

$p(0.5) = 2 \cdot 0.5^2 + 9 \cdot 0.5 - 5 = 0$. ✓

Yay, works! By the FT, $x - \frac{1}{2}$ is a factor for p (x).
As stated earlier, if $x - \frac{1}{2}$ is a factor for p (x), then so too is $2\left(x - \frac{1}{2}\right) = 2x - 1$. So write:

$$p(x) = 2x^2 + 9x - 5 = (2x - 1)(ax + b).$$

The coefficients on the squared and constant terms are 2a = 2 and −b = −5. And so, a = 1
and b = 5. Altogether, we have:

p (x) = 2x2 + 9x − 5 = (2x − 1) (x + 5) .
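
The way the IVT narrows down where a root must lie can be automated by repeatedly halving the interval. This is the bisection method, a standard technique that this text does not itself discuss; the sketch below, including the name `bisect_root` and its tolerance parameters, is our own illustration.

```python
def bisect_root(f, lo, hi, tol=1e-12, max_iter=200):
    """Find x in [lo, hi] with f(x) = 0, assuming f is continuous and
    f(lo), f(hi) have opposite signs (so the IVT guarantees a root)."""
    f_lo, f_hi = f(lo), f(hi)
    if f_lo == 0:
        return lo
    if f_hi == 0:
        return hi
    if f_lo * f_hi > 0:
        raise ValueError("f(lo) and f(hi) must have opposite signs")
    for _ in range(max_iter):
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if f_mid == 0 or (hi - lo) / 2 < tol:
            return mid
        if f_lo * f_mid < 0:
            hi = mid                  # sign change is in the left half
        else:
            lo, f_lo = mid, f_mid     # sign change is in the right half
    return (lo + hi) / 2


if __name__ == "__main__":
    p = lambda x: 2 * x**2 + 9 * x - 5
    print(bisect_root(p, 0, 1))  # p(0) < 0 < p(1), as in Example 370
```

For this particular polynomial the very first midpoint, 0.5, is already a root, which mirrors the guess made in Example 370. In general, bisection only produces an approximation, so any candidate root it suggests should still be confirmed exactly with the FT.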

For the A-Levels, you will routinely have to factorise quadratic polynomials. You may
sometimes also have to factorise cubic polynomials.
It’s unusual that they ask you to factorise polynomials of a higher order. And if they do,
the friendly folks at the MOE will usually be nice enough to give you a little help.160
In the next example, we factorise a quartic polynomial using what we’ve learnt. It’s long
and tedious, but conceptually not any harder than what we’ve already done:

160. See e.g. Exercise 552(b) (9740 N2010/II/1).
Example 371. Factorise p (x) = 6x4 + 13x3 − 29x2 − 52x + 20 (a quartic polynomial).
We’ll start by trying the FTGACM. We plug in the number 1:

$p(1) = 6 \cdot 1^4 + 13 \cdot 1^3 - 29 \cdot 1^2 - 52 \cdot 1 + 20 < 0$. ✗
(Again, we can tell p(1) < 0 even without computing its exact value.)

Aiyah, sian. Doesn’t work — by the FT, x − 1 is not a factor for p (x).
But now, observe that p (0) = 20 > 0 > p(1). And so, the IVT says there must be some
value 0 < q < 1 such that p (q) = 0. So, let’s stick with the FTGACM, but now try 1/2:

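$p\left(\frac{1}{2}\right) = 6\left(\frac{1}{2}\right)^4 + 13\left(\frac{1}{2}\right)^3 - 29\left(\frac{1}{2}\right)^2 - 52\left(\frac{1}{2}\right) + 20 = \frac{6}{16} + \frac{13}{8} - \frac{29}{4} - \frac{52}{2} + 20 < 0$. ✗
(Again, we can tell p (1/2) is negative, even without computing its exact value.)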

Aiyah, sian. Still doesn’t work — by the FT, x − 1/2 is not a factor for p (x).
But again, since p (0) = 20 > 0 > p (1/2), the IVT says that there must be some value
0 < r < 1/2 such that p (r) = 0. So, let’s stick with the FTGACM, but now try 1/3:

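$$\begin{aligned} p\left(\tfrac{1}{3}\right) &= 6\left(\tfrac{1}{3}\right)^4 + 13\left(\tfrac{1}{3}\right)^3 - 29\left(\tfrac{1}{3}\right)^2 - 52\left(\tfrac{1}{3}\right) + 20 \\ &= \frac{6}{81} + \frac{13}{27} - \frac{29}{9} - \frac{52}{3} + 20 = \frac{2}{27} + \frac{13}{27} - \frac{29}{9} + \frac{8}{3} = \frac{15}{27} - \frac{5}{9} = 0. \end{aligned}$$ ✓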

Yay, works! By the FT, x − 1/3 is a factor for 6x4 + 13x3 − 29x2 − 52x + 20.
If x − 1/3 is a factor, so too is 3 (x − 1/3) = 3x − 1. So write:

p (x) = 6x4 + 13x3 − 29x2 − 52x + 20 = (3x − 1) (ax3 + bx2 + cx + d) .

The coefficients on the 4th-degree and constant terms are 3a = 6 and −d = 20. And so,
a = 2 and d = −20. Next, to find b and c, examine the coefficients on the cubed and linear
terms, which are 3b − a = 13 and 3d − c = −52. And so, b = 5 and c = −8. Thus:

ax3 + bx2 + cx + d = 2x3 + 5x2 − 8x − 20.

We must now factorise 2x3 + 5x2 − 8x − 20. Once again, let’s try the FTGACM.
By the way, here’s an additional trick to help you factorise polynomials. When trying
the FTGACM, you should try numbers that are factors of the constant term. So in this
case, the constant term is −20 = −2 × 2 × 5. So why not we try plugging in 2:

$2 \cdot 2^3 + 5 \cdot 2^2 - 8 \cdot 2 - 20 = 16 + 20 - 16 - 20 = 0$. ✓

Yay, works! By the FT, x − 2 is a factor for 2x3 + 5x2 − 8x − 20.


Now write: 2x3 + 5x2 − 8x − 20 = (x − 2) (ex2 + f x + g).

The coefficients on the cubed and constant terms are e = 2 and −2g = −20. And so, g = 10.
To find f , look at the coefficients on the squared terms, which are f − 2e = 5. And so,
f = 9. Thus, ex2 + f x + g = 2x2 + 9x + 10.
To factorise 2x2 + 9x + 10, we use the SSGACM. Let’s try:

$(2x + 5)(x + 2) = 2x^2 + 9x + 10$. ✓

Wah! So lucky! Success on the very first try! And now, at long last, we’re done:

p (x) = 6x4 + 13x3 − 29x2 − 52x + 20 = (3x − 1) (x − 2) (2x + 5) (x + 2) .
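
The FTGACM can be made systematic. The sketch below extends the “try factors of the constant term” trick using the Rational Root Theorem, a standard result not stated in this text: for a polynomial with integer coefficients, any rational root (in lowest terms) is a factor of the constant term divided by a factor of the leading coefficient. The function names are hypothetical, and the sketch assumes integer coefficients with non-zero leading and constant terms.

```python
from fractions import Fraction


def evaluate(coeffs, x):
    """Evaluate a polynomial (coefficients from highest degree down) at x."""
    value = 0
    for c in coeffs:
        value = value * x + c
    return value


def divisors(n):
    """Positive divisors of a non-zero integer n."""
    n = abs(n)
    return [k for k in range(1, n + 1) if n % k == 0]


def rational_roots(coeffs):
    """Guess-and-check every candidate +-p/q, where p divides the constant
    term and q divides the leading coefficient; return the roots, sorted."""
    lead, const = coeffs[0], coeffs[-1]
    candidates = {Fraction(sign * p, q)
                  for p in divisors(const)
                  for q in divisors(lead)
                  for sign in (1, -1)}
    return sorted(c for c in candidates if evaluate(coeffs, c) == 0)


if __name__ == "__main__":
    print(rational_roots([6, 13, -29, -52, 20]))  # Example 371
    print(rational_roots([15, -17, -22, 24]))     # Example 367
```

By the FT, each root a found this way gives a factor x − a. Polynomials such as x⁴ + x + 1 return an empty list, since they have no rational roots at all.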

All of our examples have so far involved an n-degree polynomial that can be fully factorised into n linear (or degree-1) factors. But this need not always be the case:

Example 372. Consider the quartic polynomial $x^4 - 1$. We can write:

$$x^4 - 1 = (x^2 + 1)(x + 1)(x - 1).$$

$x^2 + 1$ has negative discriminant and so cannot be further factorised.161

Example 373. Consider the quartic polynomial $x^4 + 5x^2 + 4$. We can write:

$$x^4 + 5x^2 + 4 = (x^2 + 1)(x^2 + 4).$$

Both $x^2 + 1$ and $x^2 + 4$ have negative discriminants and so cannot be further factorised.162

Example 374. The quartic polynomial $x^4 + x + 1$ has no real roots and hence no linear factors at all.163

These last three examples issue two important warnings:

• Do not assume that an n-degree polynomial can be fully factorised into n linear factors.
• Indeed, do not even assume that an n-degree polynomial has any linear factors!

161. Actually, with complex numbers, further factorisation is possible. We can write $x^2 + 1 = (x + i)(x - i)$ and thus $x^4 - 1 = (x + i)(x - i)(x + 1)(x - 1)$. Indeed, as we’ll see later, the Fundamental Theorem of Algebra (Theorem 11) guarantees that with the aid of complex numbers, any nth-degree polynomial can be factorised into n linear factors.
162. Again, with complex numbers, we can actually write $x^2 + 1 = (x + i)(x - i)$ and $x^2 + 4 = (x + 2i)(x - 2i)$. Thus, $x^4 + 5x^2 + 4 = (x + i)(x - i)(x + 2i)(x - 2i)$.
163. Again, with complex numbers, it is actually possible to factorise $x^4 + x + 1$ into four linear factors.
Exercise 126. Factorise the following polynomials. (Answer on p. 1432.)

(a) $2x^2 - 5x - 3$. (b) $7x^2 - 19x - 6$.

(c) $6x^2 + x - 1$. (d) $2x^3 - x^2 - 17x - 14$.

Exercise 127. Let $p(x) = ax^4 + bx^3 - 31x^2 + 3x + 3$, where a and b are constants. You are told that (i) p (x) divided by x − 1 leaves a remainder of 5; and (ii) p (0.5) = 0.

(a) Find a and b.

You are now also told that (iii) p (−1/3) < 0.
(b) Factorise p (x). (Answer on p. 1433.)


22. Conic Sections
In this chapter, we’ll study the graphs of the following eight equations. All eight are
examples of conic sections.164
We first study the unit circle and ellipse centred on the origin:

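$x^2 + y^2 = 1$, $\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1$.

We then study six types of hyperbolae:

$y = \frac{1}{x}$, $x^2 - y^2 = 1$, $\frac{x^2}{a^2} - \frac{y^2}{b^2} = 1$,

$\frac{y^2}{b^2} - \frac{x^2}{a^2} = 1$, $y = \frac{bx + c}{dx + e}$, $y = \frac{ax^2 + bx + c}{dx + e}$.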

Fun Fact

The Greek word hyperbola is closely related to the English word hyperbole, which means
an exaggeration or overstatement.

By the way, we’ve actually already studied one example of a conic section: the
graph of the quadratic equation $y = ax^2 + bx + c$, which is a type of conic section called the parabola.

For why these are called conic sections, see Ch. 117.8 (Appendices).
22.1. The Ellipse $x^2 + y^2 = 1$ (The Unit Circle)
Let’s do an O-Level review of why the equation $x^2 + y^2 = 1$ describes the unit circle (i.e.
radius 1) centred on the origin.
Consider any point $A = (A_x, A_y)$ on the unit circle. We can use it to form a right triangle,
with base $A_x$, height $A_y$, and hypotenuse 1. By the Pythagorean Theorem, $A_x^2 + A_y^2 = 1^2 = 1$.
This proves that every point on the unit circle satisfies the equation $x^2 + y^2 = 1$.

[Figure: the unit circle $x^2 + y^2 = 1$, with the point $A = (A_x, A_y)$ on it, the x-intercepts (−1, 0) and (1, 0), the strict global maximum (0, 1), and the strict global minimum (0, −1).]


Here are some characteristics of the graph of $x^2 + y^2 = 1$:

1. Intercepts. The y-intercepts are (0, −1) and (0, 1). The x-intercepts are (−1, 0) and
(1, 0).
2. Turning points. By observation, there are two turning points — (0, 1) is a strict global
maximum and (0, −1) is a strict global minimum.
3. Asymptotes. By observation, there are no asymptotes.
4. Symmetry. By observation, every line that passes through the origin is a line of
symmetry. (And no other line is a line of symmetry.)


22.2. The Ellipse $\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1$
Two transformations165 will get us from $x^2 + y^2 = 1$ to $(x/a)^2 + (y/b)^2 = 1$:

1. First stretch $x^2 + y^2 = 1$ horizontally, outwards from the y-axis, by a factor of a, to get

$(x/a)^2 + y^2 = 1$.

2. Then stretch $(x/a)^2 + y^2 = 1$ vertically, outwards from the x-axis, by a factor of b, to get

$(x/a)^2 + (y/b)^2 = 1$.

Thus, $(x/a)^2 + (y/b)^2 = 1$ is simply the unit circle stretched horizontally and vertically by
factors of a and b. We call this “elongated” or “imperfect” circle an ellipse. Note that this
ellipse remains centred on the origin (0, 0).
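To see the two stretches in action, the sketch below takes a few points known to lie on the unit circle, stretches each horizontally by a and vertically by b, and checks that the result satisfies the ellipse equation. The values a = 3 and b = 2 and the chosen points are illustrative assumptions only. Exact fractions are used so that the checks involve no rounding.

```python
from fractions import Fraction as F

# A few points on the unit circle x^2 + y^2 = 1 (built from Pythagorean triples).
circle_points = [
    (F(1), F(0)),
    (F(0), F(1)),
    (F(3, 5), F(4, 5)),
    (F(-5, 13), F(12, 13)),
    (F(8, 17), F(-15, 17)),
]


def stretch(point, a, b):
    """Stretch horizontally by a factor of a and vertically by a factor of b."""
    x, y = point
    return a * x, b * y


def on_ellipse(point, a, b):
    """Check whether the point satisfies (x/a)^2 + (y/b)^2 = 1."""
    x, y = point
    return (x / a) ** 2 + (y / b) ** 2 == 1


a, b = 3, 2  # illustrative stretch factors
ellipse_points = [stretch(p, a, b) for p in circle_points]
for p in ellipse_points:
    print(p, on_ellipse(p, a, b))
```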

[Figure: the ellipse $(x/a)^2 + (y/b)^2 = 1$, with the x-intercepts (−a, 0) and (a, 0), the strict global maximum (0, b), the strict global minimum (0, −b), and the two lines of symmetry (the x-axis and the y-axis).]

1. Intercepts. The y-intercepts are (0, −b) and (0, b). The x-intercepts are (−a, 0) and
(a, 0).
2. Turning points. By observation, there are two turning points — (0, b) is a strict global
maximum and (0, −b) is a strict global minimum.
3. Asymptotes. By observation, there are no asymptotes.
4. Symmetry. By observation, if a ≠ b, then there are only two lines of symmetry, namely
y = 0 (the x-axis) and x = 0 (the y-axis). (Note that if a = b, then this ellipse is in fact a
circle and there are again infinitely many lines of symmetry.)

Read Ch. 16 if you haven’t already.
Exercise 128. Graph the equation below (a, b, c, d ∈ R and a, b ≠ 0). Label any turning
points, asymptotes, lines of symmetry, and intercepts. (Hint in footnote.)166

$\frac{(x + c)^2}{a^2} + \frac{(y + d)^2}{b^2} = 1$. (Answer on p. 1434.)

The rest of this chapter will look at six examples of hyperbolae. Our first and also the
simplest example of a hyperbola is y = 1/x:

To find the y-intercepts, plug in x = 0. To find the x-intercepts, plug in y = 0.
22.3. The Hyperbola $y = \frac{1}{x}$
All hyperbolae we’ll study will share some common features:
1. There’ll be two branches — y = 1/x has a bottom-left branch and a top-right branch.
2. There may or may not be x- and y-intercepts — y = 1/x has neither.
3. There may or may not be turning points — y = 1/x has none.
4. There’ll be two asymptotes — y = 1/x has horizontal asymptote y = 0, because as
x → −∞, y → 0− and as x → ∞, y → 0+ . Also, y = 1/x has the vertical asymptote x = 0,
because as x → 0− , y → −∞ and as x → 0+ , y → ∞.
A rectangular hyperbola is any hyperbola whose two asymptotes are perpendicular
— thus, y = 1/x is an example of a rectangular hyperbola.
5. The hyperbola’s centre is the point at which the two asymptotes intersect167 — y = 1/x
has centre (0, 0).
6. There’ll be two lines of symmetry — each (a) passes through the centre; and (b)
bisects an angle formed by the two asymptotes.
y = 1/x has two lines of symmetry: y = x and y = −x. Observe that indeed, each (a)
passes through the centre; and (b) bisects an angle formed by the two asymptotes.
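The claim that y = x and y = −x are lines of symmetry of y = 1/x can be checked directly: reflecting a point (x, y) in y = x gives (y, x), and reflecting it in y = −x gives (−y, −x). The Python sketch below (function names are illustrative) reflects a few sample points of the graph and confirms that each image still lies on the graph.

```python
from fractions import Fraction as F


def on_graph(point):
    """Check whether the point lies on y = 1/x."""
    x, y = point
    return x != 0 and y == 1 / x


def reflect_in_y_equals_x(point):
    x, y = point
    return y, x


def reflect_in_y_equals_minus_x(point):
    x, y = point
    return -y, -x


# Sample points on both branches of y = 1/x.
points = [(x, 1 / x) for x in (F(-3), F(-1, 2), F(1, 4), F(2))]
for p in points:
    print(p, reflect_in_y_equals_x(p), reflect_in_y_equals_minus_x(p))
```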

[Figure: the hyperbola y = 1/x, with centre (0, 0), horizontal asymptote y = 0, vertical asymptote x = 0, and lines of symmetry y = x and y = −x.]

For simplicity, this shall be this textbook’s definition of a hyperbola’s centre.
Note though that in the usual and proper study of conic sections, the centre is instead defined as the
midpoint of the line segment connecting the two foci. That the two asymptotes intersect at the centre
is then a result rather than a definition. However, in H2 Maths, there is no mention of foci and so I
thought it better to simply define the centre as the intersection point of the two asymptotes.
22.4. The Hyperbola $x^2 - y^2 = 1$
Consider the equation $x^2 - y^2 = 1$. Notice that no x ∈ (−1, 1) satisfies this equation, because $y^2 = x^2 - 1$ would then be negative. Hence,
this graph contains no points for which x ∈ (−1, 1).

[Figure: the hyperbola $x^2 - y^2 = 1$, with x-intercepts (−1, 0) and (1, 0), centre (0, 0), oblique asymptotes y = x and y = −x, and the two lines of symmetry (the x-axis and the y-axis).]

1. There are two branches — one on the left and another on the right.
2. Intercepts. The x-intercepts are (−1, 0) and (1, 0). There are no y-intercepts.
3. There are no turning points.
4. As x → −∞, y → ±x. And as x → ∞, y → ±x. So, $x^2 - y^2 = 1$ has two oblique
asymptotes y = ±x.
Since the two asymptotes y = x and y = −x are perpendicular, this is again a rectangular
hyperbola. (In fact, we call this an “east-west” rectangular hyperbola.)
5. The hyperbola’s centre is (0, 0).
6. The two lines of symmetry are y = 0 (the x-axis) and x = 0 (the y-axis). Observe
that each (a) passes through the centre; and (b) bisects an angle formed by the two
asymptotes.


22.5. The Hyperbola $\frac{x^2}{a^2} - \frac{y^2}{b^2} = 1$
Consider the equation $(x/a)^2 - (y/b)^2 = 1$. Notice that again, no x ∈ (−a, a) satisfies this
equation. Hence, this graph again contains no points for which x ∈ (−a, a).
Two transformations will get us from $x^2 - y^2 = 1$ to $(x/a)^2 - (y/b)^2 = 1$:

1. Stretch $x^2 - y^2 = 1$ horizontally, outwards from the y-axis, by a factor of a, to get

$(x/a)^2 - y^2 = 1$.

2. Stretch $(x/a)^2 - y^2 = 1$ vertically, outwards from the x-axis, by a factor of b, to get

$(x/a)^2 - (y/b)^2 = 1$.

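[Figure: the hyperbola $(x/a)^2 - (y/b)^2 = 1$, with x-intercepts (−a, 0) and (a, 0), centre (0, 0), oblique asymptotes y = bx/a and y = −bx/a, and the two lines of symmetry (the x-axis and the y-axis).]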

1. There are two branches — one on the left and another on the right.
2. Intercepts. The x-intercepts are (−a, 0) and (a, 0). There are no y-intercepts.
3. There are no turning points.
4. As x → −∞, y → ±bx/a. And as x → ∞, y → ±bx/a. So, $(x/a)^2 - (y/b)^2 = 1$ has two
oblique asymptotes y = ±bx/a.

The two asymptotes y = bx/a and y = −bx/a have slopes whose product is $-b^2/a^2$, so they are perpendicular only if |a| = |b|. Thus, this is a rectangular hyperbola only if |a| = |b|. (Either way, this is again an “east-west” hyperbola.)
5. The hyperbola’s centre is (0, 0).
6. The two lines of symmetry are y = 0 (the x-axis) and x = 0 (the y-axis). Observe
that each (a) passes through the centre; and (b) bisects an angle formed by the two
asymptotes.
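The asymptote claim can be illustrated numerically. The sketch below, with illustrative values a = 2 and b = 3, measures the vertical gap between the line y = (b/a)x and the upper half of the right branch as x grows, and also computes the product of the two asymptote slopes (two lines are perpendicular exactly when this product is −1).

```python
import math


def upper_branch_y(x, a, b):
    """The y >= 0 solution of (x/a)^2 - (y/b)^2 = 1, defined for |x| >= a."""
    return b * math.sqrt((x / a) ** 2 - 1)


a, b = 2.0, 3.0  # illustrative values
gaps = []
for x in (10.0, 100.0, 1000.0):
    gap = (b / a) * x - upper_branch_y(x, a, b)
    gaps.append(gap)
    print(x, gap)  # the gap to the line y = (b/a)x shrinks as x grows

slope_product = (b / a) * (-b / a)  # equals -b^2/a^2
print(slope_product)
```

With these values the slope product is not −1, so this particular hyperbola is not rectangular.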


22.6. The Hyperbola $\frac{y^2}{b^2} - \frac{x^2}{a^2} = 1$
Take the equation from the last subchapter, but switch a and b, so that we have the equation
$x^2/b^2 - y^2/a^2 = 1$. Note that this is also an east-west hyperbola, but with
x-intercepts (±b, 0) instead of (±a, 0).
To get from $x^2/b^2 - y^2/a^2 = 1$ to $y^2/b^2 - x^2/a^2 = 1$, apply any one of these transformations:

• Reflect in the line y = x.
• Rotate π/2 clockwise.
• Rotate π/2 anticlockwise.

Notice that no y ∈ (−b, b) satisfies $y^2/b^2 - x^2/a^2 = 1$. Hence, this graph contains no points for
which y ∈ (−b, b).

[Figure: the hyperbola $y^2/b^2 - x^2/a^2 = 1$, with centre (0, 0), oblique asymptotes y = bx/a and y = −bx/a, and the two lines of symmetry (the x-axis and the y-axis).]

1. There are two branches — one above and another below.

2. Intercepts. The y-intercepts are (0, −b) and (0, b). There are no x-intercepts.
3. The two turning points are (0, b) and (0, −b) — the former is a strict local minimum,
while the latter is a strict local maximum.
4. As x → −∞, y → ±bx/a. And as x → ∞, y → ±bx/a. So, $y^2/b^2 - x^2/a^2 = 1$ has two
oblique asymptotes y = ±bx/a.
The two asymptotes y = bx/a and y = −bx/a have slopes whose product is $-b^2/a^2$, so they are perpendicular only if |a| = |b|. Thus, this is a rectangular hyperbola only if |a| = |b|. (Either way, we call this a “north-south” hyperbola.)
5. The hyperbola’s centre is (0, 0).
6. The two lines of symmetry are y = 0 (the x-axis) and x = 0 (the y-axis). Each (a)
passes through the centre; and (b) bisects an angle formed by the two asymptotes.


22.7. The Hyperbola $y = \frac{bx + c}{dx + e}$
In the next subchapter, we’ll study the graph of:

$y = \frac{ax^2 + bx + c}{dx + e}$.

But to warm up, let’s first study the simpler case where a = 0. That is, let’s first study:

$y = \frac{bx + c}{dx + e}$.

We’ll assume that d ≠ 0 and cd − be ≠ 0. This is because:

• If d = 0, then this is simply a linear equation; and
• If cd − be = 0, then as we’ll show below, this is simply the horizontal line y = b/d.


Example 375. Consider $y = \frac{2x + 1}{x + 1}$. Do the long division:

$y = \frac{2x + 1}{x + 1} = 2 - \frac{1}{x + 1}$.

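[Figure: the graph of $y = (2x + 1)/(x + 1)$, with vertical asymptote x = −1, centre (−1, 2), y-intercept (0, 1), x-intercept (−1/2, 0), and lines of symmetry y = x + 3 and y = −x + 1.]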

1. There are two branches — one on the top-left and another on the bottom-right.
2. Intercepts. Plug in x = 0 to get y = (2 ⋅ 0 + 1) / (0 + 1) = 1 — the y-intercept is (0, 1).
Plug in y = 0 to get 2x + 1 = 0 or x = −1/2 — the x-intercept is (−1/2, 0).
3. There are no turning points.
4. As x → −1− , y → ∞. And as x → −1+ , y → −∞. So, y = (2x + 1) / (x + 1) has vertical
asymptote x = −1. (Not coincidentally, this is the x-value for which x + 1 = 0.)
As x → −∞, y → 2+ . And as x → ∞, y → 2− . So, y = (2x + 1) / (x + 1) has horizontal
asymptote y = 2. (Not coincidentally, this is the quotient in the above long division.)
Since the two asymptotes y = 2 and x = −1 are perpendicular, this is again a rectan-
gular hyperbola.
5. The hyperbola’s centre (the point at which the two asymptotes intersect) is (−1, 2).
(The centre’s coordinates are simply given by the vertical and horizontal asymptotes.)
6. The two lines of symmetry are y = x + 3 and y = −x + 1.
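The long division in this example, and the behaviour near the horizontal asymptote, can be double-checked with exact arithmetic. This is only a sketch; the function names and the sample values of x are illustrative.

```python
from fractions import Fraction as F


def y_original(x):
    """y = (2x + 1) / (x + 1)."""
    return (2 * x + 1) / (x + 1)


def y_divided(x):
    """The same function after long division: y = 2 - 1 / (x + 1)."""
    return 2 - 1 / (x + 1)


samples = [F(-5), F(-3, 2), F(-1, 2), F(0), F(7, 3), F(100)]
agree = all(y_original(x) == y_divided(x) for x in samples)
print(agree)

# Far to the right, y is just below 2. Far to the left, y is just above 2.
far_right = y_original(F(10**6))
far_left = y_original(F(-10**6))
print(float(far_right), float(far_left))
```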


In general, given the hyperbola y = (bx + c) / (dx + e), here’s how to find the intercepts,
asymptotes, and centre:
• Intercepts.
Plug in x = 0 to get y = (b ⋅ 0 + c) / (d ⋅ 0 + e) = c/e and thus the y-intercept (0, c/e). (Note
that if e = 0, then c/e is undefined and there is no y-intercept.)
Plug in y = 0 to get bx + c = 0 or x = −c/b. And thus, the x-intercept is (−c/b, 0). (Note that
if b = 0, then −c/b is undefined and there is no x-intercept.)
• Asymptotes.
The value of x for which dx + e = 0 is x = −e/d. Thus, the vertical asymptote is x = −e/d.
To find the horizontal asymptote, do the long division:

$\frac{bx + c}{dx + e} = \frac{b}{d} + \frac{c - be/d}{dx + e} = \frac{b}{d} + \frac{cd - be}{d^2} \cdot \frac{1}{x + e/d}$.

The quotient b/d gives us the horizontal asymptote y = b/d.

Note that since we always have two perpendicular asymptotes (one horizontal and another
vertical), y = (bx + c) / (dx + e) is a rectangular hyperbola.
(By the way, note that if cd−be = 0, then this hyperbola is simply the horizontal line y = b/d.
This is why above we imposed the condition that cd − be ≠ 0.)
• We’ve defined the centre to be the point at which the two asymptotes intersect. And so,
the centre is simply (−e/d, b/d). (These coordinates are simply given by the vertical and
horizontal asymptotes.)
• The two lines of symmetry are: y = x + (b + e)/d and y = −x + (b − e)/d.
You need not know where the above two lines of symmetry come from (but see the proof of
Fact 37 if you’re interested). Indeed, you need not even mug them. All you need remember
is that the two lines of symmetry:
(a) Pass through the centre; and
(b) Have slope ±1.
(b) implies that the two lines of symmetry may be written as y = x + α and y = −x + β,
where α and β are constants you can easily find. Examples:


Example 376. Consider $y = \frac{7x + 3}{2x + 4}$. Do the long division:

$y = \frac{7x + 3}{2x + 4} = \frac{7}{2} - \frac{11}{2x + 4}$.

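[Figure: the graph of $y = (7x + 3)/(2x + 4)$, with vertical asymptote x = −2, horizontal asymptote y = 7/2, centre (−2, 7/2), y-intercept (0, 3/4), x-intercept (−3/7, 0), and lines of symmetry y = x + 11/2 and y = −x + 3/2.]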

1. There are two branches — one on the top-left and another on the bottom-right.
2. Intercepts. Plug in x = 0 to get y = 3/4. So, the y-intercept is (0, 3/4).
Plug in y = 0 to get 7x + 3 = 0 or x = −3/7. So, the x-intercept is (−3/7, 0).
3. There are no turning points.
4. Asymptotes. The value of x that makes the denominator 0 is −2 — hence, the
vertical asymptote is x = −2.
The quotient in the long division is 7/2 — hence, the horizontal asymptote is y = 7/2.
Since the two asymptotes x = −2 and y = 7/2 are perpendicular, this is again a rect-
angular hyperbola.
5. The hyperbola’s centre (the point at which the two asymptotes intersect) is (−2, 7/2).
(These coordinates are given by the vertical and horizontal asymptotes.)
6. The two lines of symmetry may be written as y = x + α and y = −x + β and pass
through the centre (−2, 7/2). Plugging in the numbers, we find that α = 11/2 and
β = 3/2. Thus, the two lines of symmetry are y = x + 11/2 and y = −x + 3/2.


Example 377. Consider $y = \frac{-5x + 1}{3x + 2}$. Do the long division:

$y = \frac{-5x + 1}{3x + 2} = -\frac{5}{3} + \frac{13/3}{3x + 2}$.

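[Figure: the graph of $y = (-5x + 1)/(3x + 2)$, with vertical asymptote x = −2/3, horizontal asymptote y = −5/3, centre (−2/3, −5/3), y-intercept (0, 1/2), x-intercept (1/5, 0), and lines of symmetry y = x − 1 and y = −x − 7/3.]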

1. There are two branches, one on the bottom-left and another on the top-right.
2. Intercepts. Plug in x = 0 to get y = 1/2. So, the y-intercept is (0, 1/2).
Plug in y = 0 to get −5x + 1 = 0 or x = 1/5. So, the x-intercept is (1/5, 0).
3. There are no turning points.
4. Asymptotes. The value of x that makes the denominator 0 is −2/3 — hence, the
vertical asymptote is x = −2/3.
The quotient in the long division is −5/3 — hence, the horizontal asymptote is y = −5/3.
Since the two asymptotes x = −2/3 and y = −5/3 are perpendicular, this is again a
rectangular hyperbola.
5. The hyperbola’s centre (the point at which the two asymptotes intersect) is
(−2/3, −5/3). (These coordinates are given by the vertical and horizontal asymptotes.)
6. The two lines of symmetry may be written as y = x + α and y = −x + β and pass
through the centre (−2/3, −5/3). Plugging in the numbers, we find that α = −1 and
β = −7/3. Thus, the two lines of symmetry are y = x − 1 and y = −x − 7/3.


The following Fact summarises the features of the hyperbola $y = \frac{bx + c}{dx + e}$:
Fact 37. Let b, c, d, e ∈ R with d ≠ 0 and cd − be ≠ 0. Consider the graph of $y = \frac{bx + c}{dx + e}$.

(a) Intercepts. If e ≠ 0, then there is one y-intercept (0, c/e). (If e = 0, then there are
no y-intercepts.) And if b ≠ 0, then there is one x-intercept (−c/b, 0). (If b = 0, then
there are no x-intercepts.)
(b) There are no turning points.
(c) There is the horizontal asymptote y = b/d and the vertical asymptote x = −e/d.
(The asymptotes are perpendicular and so, this is a rectangular hyperbola.)
(d) The hyperbola’s centre is (−e/d, b/d).
(e) The two lines of symmetry are y = x + (b + e)/d and y = −x + (b − e)/d.

Proof. We proved (a), (c), and (d) above. For (b) and (e), see p. 1277 (Appendices).
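The features listed in Fact 37 can be computed mechanically. The sketch below is one possible way to do so in Python; the function name `hyperbola_features` and the dictionary keys are hypothetical. The constants α and β are found exactly as in the examples, by requiring that the lines y = x + α and y = −x + β pass through the centre.

```python
from fractions import Fraction as F


def hyperbola_features(b, c, d, e):
    """Features of y = (bx + c) / (dx + e), assuming d != 0 and cd - be != 0."""
    b, c, d, e = (F(v) for v in (b, c, d, e))
    if d == 0 or c * d - b * e == 0:
        raise ValueError("not a hyperbola: need d != 0 and cd - be != 0")
    centre = (-e / d, b / d)
    return {
        "y_intercept": (F(0), c / e) if e != 0 else None,
        "x_intercept": (-c / b, F(0)) if b != 0 else None,
        "vertical_asymptote": -e / d,    # the line x = -e/d
        "horizontal_asymptote": b / d,   # the line y = b/d
        "centre": centre,
        "alpha": centre[1] - centre[0],  # line of symmetry y = x + alpha
        "beta": centre[1] + centre[0],   # line of symmetry y = -x + beta
    }


for coeffs in [(2, 1, 1, 1), (7, 3, 2, 4), (-5, 1, 3, 2)]:  # Examples 375 to 377
    print(coeffs, hyperbola_features(*coeffs))
```

Comparing the printed values with Examples 375 to 377 is a useful way to check one's own sketches.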

Exam Tip for Towkays

For the hyperbola y = (bx + c) / (dx + e), you should know how to find (a) the x- and
y-intercepts; and (c) the horizontal and vertical asymptotes.
You’re not required to know what (d) the hyperbola’s centre is, but since this is simply
the intersection of the two asymptotes (which you already know how to find), you might
as well know how to find it, since it’ll help you sketch better graphs.
You’re also not required to know how to find (e) the equations of the two lines of symmetry.
But as we’ve shown in the above examples, it is not very difficult to figure out
their equations. It is certainly not very difficult for you to at least sketch them.
After you are done with Part V (Calculus), there is a small possibility that you are
required to prove that (b) this hyperbola has no turning points. (And so you may or may
not be interested in reading the proof of (b) in the Appendices.)

Exercise 129. Graph and describe the features of the following equations.

3x + 2
(a) y = . (Answer on p. 1436.)
(b) y = . (Answer on p. 1437.)
−2x + 1
(c) $y = \frac{-3x + 1}{2x + 3}$. (Answer on p. 1438.)


22.8. The Hyperbola $y = \frac{ax^2 + bx + c}{dx + e}$
We now study the equation $y = (ax^2 + bx + c)/(dx + e)$. We’ll assume that a ≠ 0, d ≠ 0, and
either c ≠ 0 or e ≠ 0. This is because:
• If a = 0, then this is simply the equation we studied in the last subchapter.
• If d = 0, then the equation is quadratic and we’ve already studied that.
• If c = 0 = e, then the equation is linear and we’ve already studied that.

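Example 378. Consider $y = (x^2 + 1)/x$. Do the long division:

$y = \frac{x^2 + 1}{x} = x + \frac{1}{x}$.

[Figure: the graph of $y = (x^2 + 1)/x$, with vertical asymptote x = 0, oblique asymptote y = x, centre (0, 0), turning points (−1, −2) and (1, 2), and lines of symmetry $y = (1 + \sqrt{2})x$ and $y = (1 - \sqrt{2})x$.]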


1. There are two branches — one on the bottom-left and another on the top-right.
2. Intercepts. If we plug in x = 0, then y is undefined. Thus, there are no y-intercepts.
And if we plug in y = 0, then x2 + 1 = 0, an equation for which there are no (real)
solutions. Thus, there are no x-intercepts.
3. There are two turning points: (−1, −2) is a strict local maximum and (1, 2) is a strict
local minimum.
4. Asymptotes. The value of x that makes the denominator 0 is 0 — hence, the ver-
tical asymptote is x = 0. The quotient in the long division is x — hence, the oblique
asymptote is y = x. (By the way, here for the first time, the asymptotes here are not
perpendicular and so, this is a non-rectangular hyperbola.)
5. The hyperbola’s centre is (0, 0). Recall that the centre is simply the point at which
the two asymptotes intersect. In this example, the intersection of the asymptotes x = 0
and y = x is (0, 0).

6. The two lines of symmetry are $y = (1 \pm \sqrt{2})x$.
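One way to convince oneself of the two lines of symmetry in this example is to reflect points of the graph in each line and check that the images still lie on the graph. The sketch below does this numerically. It uses the standard formula for reflecting a point in a line through the origin that makes angle θ with the x-axis, which is not derived in this chapter; the function names and sample points are illustrative.

```python
import math


def f(x):
    """y = (x^2 + 1) / x, written as x + 1/x."""
    return x + 1 / x


def reflect(point, slope):
    """Reflect a point in the line y = slope * x (a line through the origin)."""
    x, y = point
    theta = math.atan(slope)
    c, s = math.cos(2 * theta), math.sin(2 * theta)
    return c * x + s * y, s * x - c * y


slopes = (1 + math.sqrt(2), 1 - math.sqrt(2))
checks = []
for slope in slopes:
    for x in (-3.0, -0.5, 0.25, 2.0):
        rx, ry = reflect((x, f(x)), slope)
        # The reflected point should again satisfy y = x + 1/x.
        checks.append(math.isclose(ry, f(rx), rel_tol=1e-9, abs_tol=1e-9))
print(all(checks))
```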


Here are the features of the hyperbola $y = (ax^2 + bx + c)/(dx + e)$.
• Intercepts. Plug in x = 0 to get y = c/e and thus the y-intercept (0, c/e). (Note that
if e = 0, then c/e is undefined and there is no y-intercept. This was the case in the last
example.) The x-intercepts are given by the values of x for which $ax^2 + bx + c = 0$:

$\left( \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}, 0 \right)$.

Note that if $b^2 - 4ac < 0$, then there are no x-intercepts (this was the case in the last
example). And if $b^2 - 4ac = 0$, then there is exactly one x-intercept, namely (−b/(2a), 0).
• Asymptotes. The value of x that makes the denominator dx + e zero is x = −e/d and
gives us the vertical asymptote x = −e/d. To find the other (oblique) asymptote, do
the long division. The first step subtracts $ax^2 + aex/d$ from the numerator, leaving $(bd - ae)x/d + c$. The second step subtracts $(bd - ae)x/d + (bde - ae^2)/d^2$, leaving the remainder $(cd^2 + ae^2 - bde)/d^2$. Hence:

$y = \underbrace{\frac{a}{d}x + \frac{bd - ae}{d^2}}_{\text{Quotient}} + \underbrace{\frac{cd^2 + ae^2 - bde}{d^2}}_{\text{Remainder}} \cdot \frac{1}{dx + e}$.

The quotient gives us the oblique asymptote $y = ax/d + (bd - ae)/d^2$. (Since the asymptotes
are not perpendicular, this hyperbola is not rectangular.)
• The centre is the point at which the two asymptotes intersect. Its x-coordinate is given
by the vertical asymptote x = −e/d. For its y-coordinate, plug x = −e/d into the equation
of the oblique asymptote:

$y = \frac{a}{d}\left(-\frac{e}{d}\right) + \frac{bd - ae}{d^2} = \frac{-ae + bd - ae}{d^2} = \frac{bd - 2ae}{d^2}$.

Thus, the centre is: $\left(-\frac{e}{d}, \frac{bd - 2ae}{d^2}\right)$.
• You need not know how to find the equations of the lines of symmetry.
You should however know how to roughly sketch them. So, just remember that they (a)
pass through the centre; and (b) bisect the angles formed by the two asymptotes.
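The formulas above for the asymptotes and centre can be collected into a short Python sketch. The function name `oblique_hyperbola_features` and the dictionary keys are hypothetical. Exact fractions are used so that the results can be compared directly with hand calculations.

```python
from fractions import Fraction as F


def oblique_hyperbola_features(a, b, c, d, e):
    """Asymptotes and centre of y = (ax^2 + bx + c) / (dx + e), with a, d != 0."""
    a, b, c, d, e = (F(v) for v in (a, b, c, d, e))
    if a == 0 or d == 0:
        raise ValueError("need a != 0 and d != 0")
    slope = a / d                                         # quotient: coefficient of x
    intercept = (b * d - a * e) / d**2                    # quotient: constant term
    remainder = (c * d**2 + a * e**2 - b * d * e) / d**2  # remainder of the division
    vertical = -e / d                                     # vertical asymptote x = -e/d
    centre = (vertical, slope * vertical + intercept)     # where the asymptotes meet
    return {
        "vertical_asymptote": vertical,
        "oblique_asymptote": (slope, intercept),  # the line y = slope * x + intercept
        "remainder": remainder,
        "centre": centre,
    }


# y = (x^2 + 1) / x, as in Example 378
print(oblique_hyperbola_features(1, 0, 1, 1, 0))
```

For instance, applied to y = (x^2 + 3x + 1)/(x + 1), the same function should give the oblique asymptote y = x + 2 and the centre (−1, 1).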


Example 379. Consider $y = \frac{x^2 + 3x + 1}{x + 1}$. Do the long division:

$y = \frac{x^2 + 3x + 1}{x + 1} = x + 2 - \frac{1}{x + 1}$.

1. There are two branches — one on the left and another on the right.
2. Intercepts. Plug in x = 0 to get y = 1/1 = 1. Thus, the y-intercept is (0, 1). Plug in
y = 0 to get $x^2 + 3x + 1 = 0$. Thus, the two x-intercepts are $(0.5(-3 \pm \sqrt{5}), 0)$.
3. There are no turning points.
4. Asymptotes. The value of x that makes the denominator 0 is −1 — hence, the
vertical asymptote is x = −1. The quotient in the long division is x + 2 — hence, the
oblique asymptote is y = x + 2.
5. The centre’s x-coordinate is given by the vertical asymptote x = −1. For its y-
coordinate, plug x = −1 into the oblique asymptote to get y = −1 + 2 = 1. Hence, the
centre is (−1, 1).
6. The two lines of symmetry are $y = (1 \pm \sqrt{2})x + 2 \pm \sqrt{2}$.

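[Figure: the graph of $y = (x^2 + 3x + 1)/(x + 1)$, with vertical asymptote x = −1, oblique asymptote y = x + 2, centre (−1, 1), y-intercept (0, 1), x-intercepts $((-3 - \sqrt{5})/2, 0)$ and $((-3 + \sqrt{5})/2, 0)$, and lines of symmetry $y = (1 + \sqrt{2})x + 2 + \sqrt{2}$ and $y = (1 - \sqrt{2})x + 2 - \sqrt{2}$.]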


Example 380. Consider $y = \frac{2x^2 + 2x + 1}{-x + 1}$. Do the long division:

$y = \frac{2x^2 + 2x + 1}{-x + 1} = -2x - 4 + \frac{5}{-x + 1}$.

1. There are two branches — one on the top-left and another on the bottom-right.
2. Intercepts. Plug in x = 0 to get y = 1/1 = 1. Thus, the y-intercept is (0, 1).
Plug in y = 0 to get 2x2 + 2x + 1 = 0, an equation for which there are no (real) solutions.
Thus, there are no x-intercepts.
3. The two turning points are $(1 \pm 0.5\sqrt{10}, -6 \mp 2\sqrt{10})$.
4. Asymptotes. The value of x that makes the denominator 0 is 1 — hence, the vertical
asymptote is x = 1. The “quotient” in the long division is −2x − 4 — hence, the oblique
asymptote is y = −2x − 4.
5. The centre’s x-coordinate is given by the vertical asymptote x = 1. For its y-
coordinate, plug x = 1 into the oblique asymptote to get y = −2 (1) − 4 = −6. Hence,
the centre is (1, −6).
6. The two lines of symmetry are $y = (-2 \pm \sqrt{5})x - 4 \mp \sqrt{5}$.
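The turning points in this example can be checked numerically without any calculus theory: at a turning point, the slope of the graph should be approximately zero. The sketch below estimates the slope with a central difference. The step size h is an arbitrary illustrative choice. It also confirms that x = 1 + √10/2 pairs with y = −6 − 2√10, and x = 1 − √10/2 with y = −6 + 2√10.

```python
import math


def f(x):
    """y = (2x^2 + 2x + 1) / (-x + 1)."""
    return (2 * x**2 + 2 * x + 1) / (-x + 1)


def slope_estimate(x, h=1e-6):
    """Central-difference estimate of the slope of the graph at x."""
    return (f(x + h) - f(x - h)) / (2 * h)


root10 = math.sqrt(10)
x_right = 1 + root10 / 2
x_left = 1 - root10 / 2

for x in (x_left, x_right):
    print(x, f(x), slope_estimate(x))
```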

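[Figure: the graph of $y = (2x^2 + 2x + 1)/(-x + 1)$, with vertical asymptote x = 1, oblique asymptote y = −2x − 4, centre (1, −6), y-intercept (0, 1), the two turning points, and lines of symmetry $y = (-2 + \sqrt{5})x - 4 - \sqrt{5}$ and $y = (-2 - \sqrt{5})x - 4 + \sqrt{5}$.]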


Given the hyperbola

$y = \frac{ax^2 + bx + c}{dx + e}$,

you now know how to find its intercepts, asymptotes, and centre.168
And after we’ve done Calculus (Part V), you’ll also be able to find the turning points.
To repeat, you do not need to know how to find the equations of the two lines of sym-
metry. However, you should at least be able to roughly sketch them.

Exercise 130. Graph the equations below. Label any intercepts, asymptotes, and centre.
Roughly indicate or sketch any turning points and lines of symmetry.

x2 + 2x + 1
(a) y = . (Answer on p. 1439.)
−x2 + x − 1
(b) y = . (Answer on p. 1440.)
2x2 − 2x − 1
(c) y = . (Answer on p. 1441.)

Fact 198 in the Appendices summarises the features of this hyperbola.
23. Simple Parametric Equations
We can sometimes describe a graph (i.e. a set of points) using an equation. We can
sometimes also describe a graph using parametric equations:

Example 381. We can describe the unit circle centred on the origin with the
equation $x^2 + y^2 = 1$. This graph (set of points) is the set $S = \{(x, y) : x^2 + y^2 = 1\}$.
Recall169 that $\sin^2 t + \cos^2 t = 1$. And so, observe that by letting x = cos t and y = sin t, we
have $x^2 + y^2 = 1$. We thus have a second method for writing down the set S:

$S = \{(x, y) : x = \cos t, y = \sin t, t \geq 0\}$.

We call the variable t a parameter — hence the name parametric equations. In

words, S is the set of points such that x = cos t, y = sin t, and t ≥ 0.
As t increases from 0 to 2π, we trace out, anti-clockwise, the unit circle:

t = 0 ⟹ (x, y) = (1, 0),                  t = π ⟹ (x, y) = (−1, 0),
t = π/4 ⟹ (x, y) = (√2/2, √2/2),         t = 5π/4 ⟹ (x, y) = (−√2/2, −√2/2),
t = π/2 ⟹ (x, y) = (0, 1),                t = 3π/2 ⟹ (x, y) = (0, −1),
t = 3π/4 ⟹ (x, y) = (−√2/2, √2/2),       t = 7π/4 ⟹ (x, y) = (√2/2, −√2/2).
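The table of positions above can be reproduced with a few lines of Python. This is only a sketch; the function name `position` is illustrative. It also checks that each computed point satisfies x^2 + y^2 = 1 up to floating-point rounding.

```python
import math


def position(t):
    """Position (x, y) = (cos t, sin t) on the unit circle at parameter value t."""
    return math.cos(t), math.sin(t)


times = [k * math.pi / 4 for k in range(8)]  # t = 0, pi/4, ..., 7pi/4
table = [(t, *position(t)) for t in times]
for t, x, y in table:
    print(f"t = {t / math.pi:.2f}*pi  ->  (x, y) = ({x:+.4f}, {y:+.4f})")
```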

[Figure: the unit circle S traced anti-clockwise, with arrows indicating the direction of travel and with the particle’s position, velocity, and acceleration labelled at t = 0, t = 1, and t = 5π/4.]

(Example continues on the next page ...)

Fact 29.
(... Example continued from the previous page.)
One nice interpretation is that our parametric equations describe the motion of some
particle P in the plane as time t progresses. P ’s rightward and upward displace-
ments (metres) away from the origin are given by x and y.
As time progresses, P moves anti-clockwise along a unit circle centred on the origin O.
At any instant of time t, P ’s position is (cos t, sin t).
• At (the instant of) time t = 0, P is at (x, y) = (cos 0, sin 0) = (1, 0). Thus, P ’s starting
position is 1 m to the right of O.
• At time t = 1, P is at (x, y) = (cos 1, sin 1) ≈ (0.54, 0.84). Thus, after 1 s, P is 0.54 m
right and 0.84 m above O.
• At time t = 5π/4, P is at $(x, y) = (\cos(5\pi/4), \sin(5\pi/4)) = (-\sqrt{2}/2, -\sqrt{2}/2) \approx (-0.71, -0.71)$.
Thus, after 5π/4 s, P is √2/2 m left of and √2/2 m below O.
Every 2π s, P travels one full circle and returns to its starting position (1, 0).
Under this interpretation, we can also calculate the particle’s velocity at each instant in
time. We will decompose the particle’s velocity into the x- and y-components.
Its velocity in the x-direction is the rate of change of displacement in the x-direction with
respect to time t, i.e. the (first) derivative of x w.r.t. t:

$v_x = \frac{dx}{dt} = -\sin t$.
And its velocity in the y-direction is the rate of change of displacement in the y-direction
with respect to time t, i.e. the (first) derivative of y w.r.t. t:

$v_y = \frac{dy}{dt} = \cos t$.
Altogether, $(v_x, v_y) = (-\sin t, \cos t)$. And so:
• At time t = 0, the particle P has velocity (vx , vy ) = (− sin 0, cos 0) = (0, 1) — it is
moving upwards at 1 m s−1 (and not moving rightwards at all).
• At time t = 1, P has velocity (vx , vy ) = (− sin 1, cos 1) ≈ (−0.84, 0.54) — it is moving
leftwards at 0.84 m s−1 and upwards at 0.54 m s−1 .
• At time t = 5π/4, P has velocity

$(v_x, v_y) = \left(-\sin\frac{5\pi}{4}, \cos\frac{5\pi}{4}\right) = \left(\frac{\sqrt{2}}{2}, -\frac{\sqrt{2}}{2}\right) \approx (0.71, -0.71)$.

At time t = 5π/4, P is moving rightwards at √2/2 m s$^{-1}$ and downwards at √2/2 m s$^{-1}$.
(Example continues on the next page ...)


(... Example continued from the previous page.)
We can similarly calculate the particle’s acceleration at each instant in time. We will
decompose the particle’s acceleration into the x- and y-components.
Its acceleration in the x-direction is the rate of change of velocity in the x-direction with
respect to time t or, in other words, the second derivative of x w.r.t. t:

$a_x = \frac{dv_x}{dt} = \frac{d^2x}{dt^2} = -\cos t$.
And its acceleration in the y-direction is the rate of change of velocity in the y-direction
with respect to time t or, in other words, the second derivative of y w.r.t. t:

$a_y = \frac{dv_y}{dt} = \frac{d^2y}{dt^2} = -\sin t$.
Altogether, (ax , ay ) = (− cos t, − sin t). And so:
• At time t = 0, the particle P has acceleration (ax , ay ) = (− cos 0, − sin 0) = (−1, 0) — it
is accelerating leftwards at 1 m s−2 (and not upwards at all).
• At time t = 1, P has acceleration (ax , ay ) = (− cos 1, − sin 1) ≈ (−0.54, −0.84) — it is
accelerating leftwards at 0.54 m s−2 and downwards at 0.84 m s−2 . Note that at t = 1,
we have vy = 0.54 > 0 but ay = −0.84 < 0 — this means P is still moving upwards, but
this upwards movement is slowing down.
• At time t = 5π/4, P has acceleration
$(a_x, a_y) = (-\cos(5\pi/4), -\sin(5\pi/4)) = (\sqrt{2}/2, \sqrt{2}/2) \approx (0.71, 0.71)$.

At time t = 5π/4, P is accelerating rightwards at √2/2 m s$^{-2}$ and also upwards at √2/2 m s$^{-2}$.
Note that at t = 5π/4, we have $v_y = -\sqrt{2}/2 < 0$ but $a_y = \sqrt{2}/2 > 0$. This means P is still
moving downwards, but this downwards movement is slowing down.170
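For readers who would like to check the velocity and acceleration formulas of this example numerically, here is a minimal Python sketch. It compares the stated formulas against central-difference estimates of the derivatives. The function names and the step size h are illustrative choices, not part of the syllabus.

```python
import math


def position(t):
    return math.cos(t), math.sin(t)


def velocity(t):
    return -math.sin(t), math.cos(t)


def acceleration(t):
    return -math.cos(t), -math.sin(t)


def finite_difference(func, t, h=1e-5):
    """Central-difference estimate of the derivative of a pair-valued function."""
    (x0, y0), (x1, y1) = func(t - h), func(t + h)
    return (x1 - x0) / (2 * h), (y1 - y0) / (2 * h)


for t in (0.0, 1.0, 5 * math.pi / 4):
    print(t, position(t), velocity(t), acceleration(t))
    print("  estimated velocity:    ", finite_difference(position, t))
    print("  estimated acceleration:", finite_difference(velocity, t))
```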

Exercise 131. Particle Q travels on the same plane as P (from the above example). Q’s
position is described by {(x, y) ∶ x = sin t, y = cos t, t ≥ 0}. (Answer on p. 1442.)

(a) What is Q’s starting position (i.e. at t = 0)?

(b) Is Q travelling clockwise or anticlockwise?
(c) Where will P and Q be at t = 665π?
(d) At what times t are P and Q at the exact same position?
(e) What are Q’s velocity and acceleration in the x- and y-directions at each instant t?

We will revisit this example when we study vectors in Part III. There, we will show that P ’s direction of
movement is always tangent to the circle and its direction of acceleration is always towards the centre.
Moreover, the magnitudes of the particle’s (overall) velocity and acceleration are always constant.
Example 382. Let a, b > 0. The equation $x^2/a^2 + y^2/b^2 = 1$ describes an ellipse centred
on the origin, with x-intercepts (±a, 0) and y-intercepts (0, ±b). This graph is the set:
$U = \{(x, y) : x^2/a^2 + y^2/b^2 = 1\}$.
Observe that by letting x = a cos t and y = b sin t, we have:

$\frac{x^2}{a^2} + \frac{y^2}{b^2} = \frac{a^2 \cos^2 t}{a^2} + \frac{b^2 \sin^2 t}{b^2} = \cos^2 t + \sin^2 t = 1$.
We can thus use parametric equations to rewrite the set U :

U = {(x, y) ∶ x = a cos t, y = b sin t, t ≥ 0}.

We apply the same interpretation as before: t is time (seconds) and the parametric
equations describe the position (metres from the origin O) of some particle R. Observe
that like P , R is travelling anticlockwise.

[Figure: the ellipse $U = \{(x, y) : x^2/a^2 + y^2/b^2 = 1\} = \{(x, y) : x = a\cos t, y = b\sin t, t \geq 0\}$, with x-intercepts (−a, 0) and (a, 0) and with b marked on the y-axis.]


Exercise 132. Continue with the above example. (Answer on p. 1442.)

(a) What are R’s velocity and acceleration in the x- and y-directions at each instant t?
(b) At each of the following times, state R’s position. State also its velocity and acceler-
ation in the x- and y- directions. Describe all of these in words to a layperson, with
reference to the figure in the above example.

(i) t = π/4. (ii) t = π/2. (iii) t = 2π.

(c) Above we specified that a, b > 0. How does R’s starting position and direction of
travel (clockwise or anticlockwise) change if:

(i) a, b < 0? (ii) a > 0, b < 0? (iii) a < 0, b > 0?


Example 383. The equation $x^2 - y^2 = 1$ describes the “east-west” hyperbola centred
on the origin, with x-intercepts (±1, 0). Its graph is the set: $W = \{(x, y) : x^2 - y^2 = 1\}$.
Recall171 the trigonometric identity $\sec^2 t - \tan^2 t = 1$. And so, if we let x = sec t and
y = tan t, then $x^2 - y^2 = 1$. This thus gives us another way to write down the set W:

$W = \{(x, y) : x = \sec t, y = \tan t, t \geq 0\}$.

We can again impose a similar interpretation: t is time (measured in seconds) and the
parametric equations describe the motion (in metres from the origin O) of a particle A.

[Figure: the hyperbola $W = \{(x, y) : x^2 - y^2 = 1\} = \{(x, y) : x = \sec t, y = \tan t, t \geq 0\}$, with arrows indicating the direction of travel. At t = 0, (x, y) = (1, 0), $(v_x, v_y) = (0, 1)$, and $(a_x, a_y) = (1, 0)$. During t ∈ [0, 0.5π), A is moving northeast. An instant after t = 0.5π, A magically reappears “near” “bottom-left infinity”. During t ∈ (0.5π, 1.5π), A moves upwards along the left branch.]

At each instant t, A’s velocity in the x- and y-directions is given by:

$(v_x, v_y) = \left(\frac{dx}{dt}, \frac{dy}{dt}\right) = \left(\frac{d \sec t}{dt}, \frac{d \tan t}{dt}\right) = (\sec t \tan t, \sec^2 t)$.
(Example continues on the next page ...)

Exercise 133. Continuing with the above example, write down A’s acceleration in the
x- and y-directions at the instant t. (Answer on the next page.)

Fact 29(b).
(... Example continued from the previous page.)
$a_x = \frac{dv_x}{dt} = \frac{d}{dt}(\sec t \tan t) = \sec t \tan t \cdot \tan t + \sec t \cdot \sec^2 t$ (Product Rule)
$= \sec t (\tan^2 t + \sec^2 t) = \sec t (2\sec^2 t - 1)$,

$a_y = \frac{dv_y}{dt} = \frac{d}{dt}(\sec^2 t) = 2 \sec t \cdot \sec t \tan t = 2\sec^2 t \tan t$.

Thus: $(a_x, a_y) = \left(\frac{d^2x}{dt^2}, \frac{d^2y}{dt^2}\right) = \left(\frac{dv_x}{dt}, \frac{dv_y}{dt}\right) = (\sec t (2\sec^2 t - 1), 2\sec^2 t \tan t)$.

Note that at t = 0.5π, 1.5π, 2.5π, . . . , both sec t and tan t are undefined. And so, we’ll
say that at these instants in time, the particle A’s position, velocity, and acceleration are
simply undefined.
Observe that interestingly, vy = sec² t > 0 for all t (for which sec t is well-defined). Hence,
the particle A is always moving upwards (except during the aforementioned instants in
time when its velocity is undefined).

At t = 0, we have: (x, y) = (sec 0, tan 0) = (1, 0),

(vx, vy) = (sec 0 tan 0, sec² 0) = (0, 1),

(ax, ay) = (sec 0 (2 sec² 0 − 1), 2 sec² 0 tan 0) = (1, 0).

So, A starts at the midpoint of the right branch of the hyperbola, is moving upwards at
1 m s⁻¹, and is accelerating rightwards at 1 m s⁻².
During t ∈ [0, π/2), the particle A is moving northeast. As t → π/2, it “flies off” towards
the “top-right infinity” (∞, ∞) and:

x, y, vx , vy , ax , ay → ∞.

An instant after t = π/2, A magically reappears “near” “bottom-left infinity” (−∞, −∞).
During t ∈ (0.5π, 1.5π), the particle travels upwards along the left branch of the hyperbola.
And again, as t → 1.5π, it “flies off” towards “top-left infinity” (−∞, ∞) and we have:

x, vx , ax → −∞ and y, vy , ay → ∞.
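The formulas in this example can be sanity-checked numerically. The sketch below is only an illustration (the function names are made up for it): it evaluates the stated position, velocity, and acceleration formulas and compares them against finite-difference approximations of the derivatives.

```python
import math

def position(t):
    """Position (x, y) = (sec t, tan t) of particle A at time t."""
    return (1 / math.cos(t), math.tan(t))

def velocity(t):
    """Velocity (vx, vy) = (sec t tan t, sec^2 t)."""
    sec = 1 / math.cos(t)
    return (sec * math.tan(t), sec ** 2)

def acceleration(t):
    """Acceleration (ax, ay) = (sec t (2 sec^2 t - 1), 2 sec^2 t tan t)."""
    sec = 1 / math.cos(t)
    return (sec * (2 * sec ** 2 - 1), 2 * sec ** 2 * math.tan(t))

def central_difference(f, t, h=1e-5):
    """Approximate the derivative of a vector-valued function f at t."""
    forward, backward = f(t + h), f(t - h)
    return tuple((p - q) / (2 * h) for p, q in zip(forward, backward))

# Times chosen away from 0.5*pi, 1.5*pi, ... where sec t and tan t are undefined.
for t in (0.0, 1.0, 2.0, 4.0):
    x, y = position(t)
    print(f"t = {t}: x^2 - y^2 = {x * x - y * y:.6f}")
    print("  velocity formula:     ", velocity(t))
    print("  finite-difference est.:", central_difference(position, t))
```

At each sampled time, x² − y² should print as 1 (up to rounding) and the two velocity lines should agree to several decimal places.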

Exercise 134. Continue with the above example. (Answer on p. 1443.)

(a) What happens to particle A an instant after t = 1.5π?
(b) Describe A’s movement during t ∈ (1.5π, 2.5π).


Exercise 135. The set {(x, y) ∶ x = tan t, y = sec t, t ≥ 0} describes particle B’s position in
the plane. (Answer on p. 1444.)
(a) Rewrite the equations x = tan t and y = sec t into a single equation that does not
contain the parameter t.
(b) What is particle B’s starting position (i.e. at t = 0)?
(c) Compute dx/dt. And hence, conclude that the particle B always moves _____.
(d) Describe qualitatively what happens during each of the following time intervals:

(i) t ∈ [0, π/2). (ii) t ∈ (π/2, 3π/2). (iii) t ∈ (3π/2, 5π/2).
(e) Marked below are the positions of the particle B at six different instants in time
t = 0, 1, 2, 3, 4, and 5. However, we do not know which position corresponds to which
instant in time. Without using a calculator, match each of the six positions to
the corresponding instant in time. (Hint: 0.5π ≈ 1.57, π ≈ 3.14, 1.5π ≈ 4.71, and
2π ≈ 6.28.)

[Figure: graph of {(x, y) ∶ x = tan t, y = sec t, t ≥ 0}, with B’s positions marked (e.g. Ba, Bb). Arrows indicate direction of travel.]


23.1. Eliminating the Parameter t
Above, we rewrote a single (cartesian) equation into a pair of (parametric) equations and
in the process introduced the parameter t. Now we’ll go in reverse — we’ll rewrite a pair of
(parametric) equations into a single (cartesian) equation and in the process eliminate the
parameter t.

Example 384. Consider the set S = {(x, y) ∶ x = t² − 4t, y = t − 1, t ≥ 0}.

As usual, we interpret this set as describing the motion of a particle P in the plane, where
t is time (seconds), while x and y are P ’s rightward and upward displacements (metres)
from the origin. With a little algebra, we can combine the two equations x = t² − 4t (1) and
y = t − 1 (2) into a single equation and eliminate the parameter t:

• First rewrite (2) as t = y + 1 (3).

• Then plug (3) into (1) to get x = (y + 1)² − 4 (y + 1) = y² − 2y − 3.

• Observe also that t ≥ 0 ⇐⇒ y ≥ 0 − 1 = −1.

Altogether, we can rewrite the set S as S = {(x, y) ∶ x = y² − 2y − 3, y ≥ −1}.
Note that here we have a quadratic equation. Our quadratic equations are usually in the
variable x, but in this case, it is in the variable y.
The set S is the black graph below and does not include the grey portion. At t = 0, P
starts at the position (x, y) = (0, −1). P never travels along the grey portion.

[Figure: graph of S (black), with the portion P does not travel along in grey. P always moves upwards at 1 m s⁻¹. At t = 0: (x, y) = (0, −1), (vx, vy) = (−4, 1), (ax, ay) = (2, 0). During t ∈ [0, 2), P moves leftwards. At t = 2: (x, y) = (−4, 1), (vx, vy) = (0, 1), (ax, ay) = (2, 0). After t = 2, P moves rightwards. At t = 2 + √5: (x, y) = (1, 1 + √5), (vx, vy) = (2√5, 1), (ax, ay) = (2, 0).]

Here are P ’s velocity and acceleration, decomposed into the x- and y-directions:

vx = dx/dt = 2t − 4, vy = dy/dt = 1, ax = dvx/dt = d²x/dt² = 2, ay = dvy/dt = d²y/dt² = 0.

In the y- or upwards-direction, P moves at a constant velocity of 1 m s⁻¹ and does not
accelerate.

In the x- or rightwards-direction, P moves at velocity 2t − 4 m s−1 and accelerates at a
constant rate of 2 m s−2 . In particular:
• When t < 2, we have 2t − 4 < 0 and so P is moving leftwards.
• At t = 2, the particle is at the leftmost point of the parabola and its velocity in the
x-direction is 0 m s−1 .
• And when t > 2, we have 2t − 4 > 0 and so P is moving rightwards.
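The elimination of t in this example can be checked by sampling a few times and confirming that each position also satisfies the cartesian equation. This is a minimal sketch; the function names are illustrative only.

```python
def particle_p(t):
    """Position of P at time t >= 0: x = t^2 - 4t, y = t - 1."""
    return (t * t - 4 * t, t - 1)

def cartesian_x(y):
    """x in terms of y after eliminating t: x = y^2 - 2y - 3."""
    return y * y - 2 * y - 3

def horizontal_direction(t):
    """Direction of P's horizontal motion, read off from vx = 2t - 4."""
    vx = 2 * t - 4
    if vx < 0:
        return "leftwards"
    if vx > 0:
        return "rightwards"
    return "momentarily not moving horizontally"

for t in (0, 1, 2, 3, 5):
    x, y = particle_p(t)
    on_curve = (x == cartesian_x(y)) and (y >= -1)
    print(f"t = {t}: (x, y) = ({x}, {y}), on curve: {on_curve}, {horizontal_direction(t)}")
```

Integer times are used so that the comparison x == cartesian_x(y) is exact; with arbitrary floating-point times a small tolerance would be more appropriate.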

Example 385. Consider the set U = {(x, y) ∶ x = 2 cos t − 4, y = 3 sin t + 1, t ≥ 0}.

Again, we interpret this set as describing the motion of a particle Q in the plane. With
a little algebra, we can combine the two equations x = 2 cos t − 4 (1) and y = 3 sin t + 1 (2)
into a single equation and eliminate the parameter t:

• First rewrite (1) as cos t = (x + 4)/2.
• Next rewrite (2) as sin t = (y − 1)/3.
• Now recall172 the trigonometric identity sin² t + cos² t = 1.
Thus, we can rewrite the set U as:

U = {(x, y) ∶ ((x + 4)/2)² + ((y − 1)/3)² = 1} .

This describes an ellipse centered on (−4, 1).

Note that since x and y can be expressed in terms of trigonometric functions, we know
that x and y will be periodic. That is, the particle Q will keep repeating its movement
along a certain path. And so, we need not include any constraints for x and y.
Note also that by eliminating the parameter t, we’ve lost one piece of information —
namely Q’s starting position.

172 Fact 29(a).

[Figure: the ellipse U, centred on (−4, 1). At t = 0, (x, y) = (−2, 1), (vx, vy) = (0, 3), (ax, ay) = (−2, 0).]

Here are Q’s velocity and acceleration, decomposed into the x- and y-directions:

vx = dx/dt = −2 sin t, vy = dy/dt = 3 cos t, ax = dvx/dt = d²x/dt² = −2 cos t, ay = dvy/dt = d²y/dt² = −3 sin t.

At t = 0, Q’s starting position and velocity are:

(x, y) = (2 cos 0 − 4, 3 sin 0 + 1) = (−2, 1) and (vx , vy ) = (−2 sin 0, 3 cos 0) = (0, 3).

So, it starts at the rightmost point of the ellipse and is moving upwards at 3 m s−1 . Thus,
its direction of travel is anticlockwise around the ellipse. Every 2π s, Q completes one
full revolution around the ellipse.
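As a rough numerical check of this example, the sketch below confirms that sampled positions of Q lie on the ellipse and that Q turns anticlockwise about the centre (−4, 1). The helper names, and the cross-product test for the turning direction, are choices made for this illustration rather than anything used elsewhere in this textbook.

```python
import math

def particle_q(t):
    """Position of Q at time t: x = 2 cos t - 4, y = 3 sin t + 1."""
    return (2 * math.cos(t) - 4, 3 * math.sin(t) + 1)

def velocity_q(t):
    """Velocity of Q at time t: (vx, vy) = (-2 sin t, 3 cos t)."""
    return (-2 * math.sin(t), 3 * math.cos(t))

def ellipse_lhs(x, y):
    """Left-hand side of ((x + 4)/2)^2 + ((y - 1)/3)^2 = 1."""
    return ((x + 4) / 2) ** 2 + ((y - 1) / 3) ** 2

def turning(t):
    """Cross product of (position relative to the centre) with velocity.

    A positive value means Q is turning anticlockwise about (-4, 1).
    """
    x, y = particle_q(t)
    vx, vy = velocity_q(t)
    return (x + 4) * vy - (y - 1) * vx

for t in (0.0, 1.0, 2.5, 4.0, 6.0):
    x, y = particle_q(t)
    print(f"t = {t}: ellipse LHS = {ellipse_lhs(x, y):.6f}, turning = {turning(t):.6f}")
```

The turning value works out to 6 cos² t + 6 sin² t = 6 at every t, which is consistent with the anticlockwise direction of travel found above.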

Exercise 136. The sets A, B, and C below describe the positions (metres) of particles
A, B, and C at time t (seconds), relative to the origin. For each set:
(i) Rewrite the set so that the parameter t is eliminated.
(ii) Sketch the graph.
(iii) Describe the particle’s position and velocity as time progresses.

(a) A = {(x, y) ∶ x = t − 1, y = ln (t + 1) , t ≥ 0} (Answer on p. 1445.)

(b) B = {(x, y) ∶ x = , y = t2 + 1, t ≥ 0}. (Answer on p. 1446.)
(c) C = {(x, y) ∶ x = 2 sin t − 1, y = 3 cos2 t, t ≥ 0}. (Answer on p. 1447.)


24. Solving Inequalities

24.1. Multiplying an Inequality by an Unknown Constant

Given an equation a = b, we can always multiply by any unknown constant x and the
equality will still be preserved:

ax = bx. ✓

This however is not generally true with inequalities:

Example 386. Consider the inequality 2 > 1.

(a) It is preserved if we multiply it by a positive constant like 3:

2×3>1×3 or 6 > 3.

(b) It is reversed (or flipped) if we multiply it by a negative constant like −3:

2 × (−3) < 1 × (−3) or −6 < −3.

(c) It becomes an equality if we multiply it by 0:

2×0=1×0 or 0 = 0.

The above seems obvious. But a common mistake is to multiply an inequality by some
unknown constant x and expect it to be preserved:

Example 387. Let x ∈ R. Beng reasons, “We know that 8 > 5. Therefore 8x > 5x.”
Beng’s reasoning is wrong.
(a) If x > 0, then yea, he happens to be correct and 8x > 5x. ✓
(b) If x = 0, then 8x = 5x = 0 and he’s wrong. ✗
(c) If x < 0, then 8x < 5x and he’s again wrong. ✗

In general:

Fact 38. Let a, b, x ∈ R. Suppose a > b.

(a) If x > 0, then ax > bx. (Positive multiplication preserves order)

(b) If x < 0, then ax < bx. (Negative multiplication reverses order)
(c) If x = 0, then ax = bx = 0.

Proof. Omitted.173
These results may seem “obvious”, but to properly prove them, we need to first define the notion of
The above result also holds if (i) the inequalities are reversed; or (ii) the strict inequalities
(i.e. > and <) are replaced with weak ones (i.e. ≥ and ≤).
In general, we may have to break our analysis into two (or three) cases, depending on
whether x is positive, negative, or zero. Example:

Example 388. Let a, b, x ∈ R with x ≠ 0. Suppose we are told that:

a/x > b/x. (1)

What can we say about a and b?

Beng reasons, “Multiply (1) by x to conclude that a > b.”
As usual, Beng’s reasoning is wrong. We must break our analysis down into two cases:

(a) If x > 0, then yea, he happens to be correct: we can indeed multiply (1) by x to
conclude that a > b.

(b) But if x < 0, then multiplying (1) by x reverses the inequality, so that a < b.

So, the correct solution is this:

Given (1), if x > 0, then a > b; but if x < 0, then a < b.

By the way, why did I specify that x ≠ 0?174

174 If x = 0, then a/x and b/x would be undefined and (1) could not possibly hold.
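The case analysis in Fact 38 can be written out as a small function. This is only a sketch to make the three cases concrete; the function name is made up for the illustration.

```python
def compare_after_scaling(a, b, x):
    """Given a > b, return the symbol relating a*x and b*x (the cases of Fact 38)."""
    if not a > b:
        raise ValueError("this sketch assumes a > b")
    if x > 0:
        return ">"   # positive multiplication preserves order
    if x < 0:
        return "<"   # negative multiplication reverses order
    return "="       # multiplying by zero gives 0 = 0

for x in (3, 0, -3):
    symbol = compare_after_scaling(8, 5, x)
    print(f"x = {x}: 8x = {8 * x} {symbol} 5x = {5 * x}")
```

Running it with 8 > 5 reproduces the three cases of Example 387.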


24.2. ax² + bx + c > 0
This subchapter also doubles as yet another review of quadratic equations.

Example 389. Solve −x² + 3x − 1 > 0.

Since a < 0, this is a ∩-shaped quadratic.
And since the discriminant b² − 4ac = 3² − 4 (−1) (−1) = 5 is positive, the graph intersects
the x-axis at two points, given simply by the two roots of the quadratic, which are:

x = (−3 ± √5)/(−2) = (3 ∓ √5)/2.

Altogether then, −x² + 3x − 1 > 0 is true between those two roots. That is, the given
inequality has solution set:

((3 − √5)/2, (3 + √5)/2).

[Figure: graph of y = −x² + 3x − 1; x ∈ ((3 − √5)/2, (3 + √5)/2) solves −x² + 3x − 1 > 0.]


Example 390. Solve x² + 3x + 1 > 0.
Since a > 0, this is a ∪-shaped quadratic.
And since the discriminant b² − 4ac = 3² − 4(1)(1) = 5 is positive, the graph intersects the
x-axis at two points, given simply by the two roots of the quadratic, which are:

x = (−3 ± √5)/2.

Altogether then, x² + 3x + 1 > 0 is true “outside” those two roots. That is, the given inequality
has solution set:

R ∖ [(−3 − √5)/2, (−3 + √5)/2].

[Figure: graph of y = x² + 3x + 1, with x-intercepts (−3 − √5)/2 and (−3 + √5)/2.]


Example 391. Solve −x² + 2x − 1 > 0.
Since a < 0, this is a ∩-shaped quadratic.
And since the discriminant b² − 4ac = 2² − 4(−1)(−1) = 0 is zero, the graph just touches
the x-axis at:

x = −b/(2a) = −2/(2(−1)) = 1.

Altogether then, there are no values of x for which −x² + 2x − 1 > 0 is true. Equivalently,
the inequality has solution set ∅.

[Figure: graph of y = −x² + 2x − 1 (there are no values of x for which −x² + 2x − 1 > 0).]

Note that 1 does not belong to the solution set of this inequality. This is because at x = 1,
we have −x² + 2x − 1 = −(1)² + 2 ⋅ 1 − 1 = 0. And so, at x = 1, it is not true that −x² + 2x − 1 > 0.


Example 392. Solve x² + 2x + 1 > 0.
Since a > 0, this is a ∪-shaped quadratic.
And since the discriminant b² − 4ac = 2² − 4(1)(1) = 0 is zero, the graph just touches the
x-axis at:

x = −b/(2a) = −2/(2(1)) = −1.

Altogether then, x² + 2x + 1 > 0 for all values of x except −1. Equivalently, the inequality
has solution set R ∖ {−1}.

[Figure: graph of y = x² + 2x + 1, which touches the x-axis at x = −1; x ∈ R ∖ {−1} solves x² + 2x + 1 > 0.]

Note that −1 does not belong to the solution set of this inequality. This is because at x = −1,
we have x² + 2x + 1 = (−1)² + 2 ⋅ (−1) + 1 = 0. And so, at x = −1, it is not true that x² + 2x + 1 > 0.


Example 393. Solve −x² + x − 1 > 0.
Since a < 0, this is a ∩-shaped quadratic.
And since the discriminant b² − 4ac = 1² − 4(−1)(−1) = −3 is negative, the graph doesn’t
touch the x-axis at all and is completely below the x-axis.

Altogether then, there are no values of x for which −x² + x − 1 > 0 holds. Equivalently, the
inequality has solution set ∅.

[Figure: graph of y = −x² + x − 1 (there are no values of x for which −x² + x − 1 > 0).]


Example 394. Solve x² + x + 1 > 0.
Since a > 0, this is a ∪-shaped quadratic.
And since the discriminant b² − 4ac = 1² − 4(1)(1) = −3 is negative, the graph doesn’t
touch the x-axis at all and is completely above the x-axis.
Altogether then, x² + x + 1 > 0 holds for all values of x. Equivalently, the inequality has
solution set R.

[Figure: graph of y = x² + x + 1; x ∈ R solves x² + x + 1 > 0.]

Here is the general solution for ax² + bx + c > 0 (x ∈ R):

Fact 39. Let a, b, c be constants and x ∈ R. Let r1 and r2 be the two roots given by the
quadratic formula. That is, let:

r1 = (−b − √(b² − 4ac))/(2a) and r2 = (−b + √(b² − 4ac))/(2a).

Then here are the solution sets for ax² + bx + c > 0, in the six different cases:

             (a) b² − 4ac > 0    (b) b² − 4ac = 0    (c) b² − 4ac < 0

(1) a > 0    R ∖ [r1, r2]        R ∖ {−b/2a}         R
(2) a < 0    (r2, r1)            ∅                   ∅


Proof. Case 1. If a > 0, then this is a ∪-shaped quadratic.
(a) If b² − 4ac > 0, then y = ax² + bx + c intersects the x-axis at x = r1, r2. Thus, ax² + bx + c > 0
for all x “outside” those two roots, i.e. for all x except x ∈ [r1, r2].
(b) If b² − 4ac = 0, then y = ax² + bx + c just touches the x-axis at x = −b/2a. Thus,
ax² + bx + c > 0 for all x except x = −b/2a.
(c) If b² − 4ac < 0, then y = ax² + bx + c is completely above the x-axis and thus ax² + bx + c > 0
for all x.

[Figure: sketches of y = ax² + bx + c for Case 1 (a > 0, ∪-shaped quadratic) and Case 2 (a < 0, ∩-shaped quadratic), each drawn for b² − 4ac > 0, b² − 4ac = 0, and b² − 4ac < 0.]

Case 2. If a < 0, then this is a ∩-shaped quadratic.

(a) If b² − 4ac > 0, then y = ax² + bx + c intersects the x-axis at x = r1, r2. Thus, ax² + bx + c > 0
for all x “between” those two roots, i.e. for all x ∈ (r2, r1).175
(b) If b² − 4ac = 0, then y = ax² + bx + c just touches the x-axis at x = −b/2a. Thus,
ax² + bx + c > 0 for no x.
(c) If b² − 4ac < 0, then y = ax² + bx + c is completely below the x-axis and thus ax² + bx + c > 0
for no x.
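The six cases in the proof above translate directly into a short function. The sketch below is one possible way to encode them; the function name and the tuple-based return format are assumptions made for this illustration.

```python
import math

def solve_quadratic_gt_zero(a, b, c):
    """Describe the solution set of a*x^2 + b*x + c > 0, assuming a != 0.

    Returns one of:
      ("outside", lo, hi)   meaning R \\ [lo, hi]
      ("between", lo, hi)   meaning (lo, hi)
      ("all except", p)     meaning R \\ {p}
      ("all",)              meaning R
      ("empty",)            meaning the empty set
    """
    if a == 0:
        raise ValueError("not a quadratic: a must be non-zero")
    disc = b * b - 4 * a * c
    if disc > 0:
        root = math.sqrt(disc)
        # Sort the two roots so that lo < hi regardless of the sign of a.
        lo, hi = sorted(((-b - root) / (2 * a), (-b + root) / (2 * a)))
        return ("outside", lo, hi) if a > 0 else ("between", lo, hi)
    if disc == 0:
        return ("all except", -b / (2 * a)) if a > 0 else ("empty",)
    return ("all",) if a > 0 else ("empty",)

# The six inequalities of Examples 389 to 394.
for coefficients in [(-1, 3, -1), (1, 3, 1), (-1, 2, -1), (1, 2, 1), (-1, 1, -1), (1, 1, 1)]:
    print(coefficients, solve_quadratic_gt_zero(*coefficients))
```

Sorting the roots sidesteps the question of whether r1 or r2 is the smaller one, which depends on the sign of a.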

Exercise 137. Solve each inequality. (Answer on p. 1452.)

(a) −3x² + x − 5 > 0. (b) x² − 2x − 1 > 0.

175 Note that r2 < r1 because a < 0.
24.3. (ax + b)/(cx + d) > 0
Let N/D be a fraction where N is the numerator and D is the denominator. Then:

N/D > 0 ⇐⇒ N and D have the same signs (i.e. N, D > 0 OR N, D < 0).

Example 395. 4/7 > 0 because both N = 4 and D = 7 are positive.

Example 396. (−5)/(−3) > 0 because both N = −5 and D = −3 are negative.

Conversely: N/D < 0 ⇐⇒ N and D have opposite signs (i.e. N > 0 > D OR N < 0 < D).

Example 397. −9/2 < 0 because N = −9 < 0 but D = 2 > 0.

Example 398. 1/(−8) < 0 because N = 1 > 0 but D = −8 < 0.

Example 399. Consider 1/x < 1 (1). Let’s first solve this inequality the wrong way:

1. Multiply both sides by x to get 1 < x.

2. Hence, x > 1, or equivalently, the solution set for (1) is (1, ∞). ✗

Again, the mistake here is to multiply both sides by the unknown quantity x and expect
the inequality to be preserved. But it may not be, because x may be negative.
Here’s the correct solution. First, rewrite the given inequality into SF:

1/x < 1 ⇐⇒ 1 − 1/x > 0 ⇐⇒ (x − 1)/x = N/D > 0. (2)

Observe that D = 0 ⇐⇒ x = 0 and N = 0 ⇐⇒ x = 1.
Let us call the points at which either the denominator or numerator equals zero the zero
points. So here, the zero points are 0 and 1.
We now consider the sign of N/D, in each of the three possible cases:
(a) If x < 0, then N < 0 and D < 0, so that N/D > 0.
(b) If 0 < x < 1, then N < 0 and D > 0, so that N/D < 0.
(c) If x > 1, then N > 0 and D > 0, so that N/D > 0.
The above observations may be compactly summarised in the following diagram.

   +     −     +
      0     1

For convenience (and lack of a better name), let’s call the above the sign diagram.

We conclude: (1) holds ⇐⇒ (2) holds ⇐⇒ x < 0 OR x > 1.

Equivalently, the solution set for (1) is (−∞, 0) ∪ (1, ∞).

Example 400. Solve (5x + 4)/(−2x + 1) > 0.
We note that the numerator and denominator equal zero at −4/5 and 1/2. And so, we
have the sign diagram below, because:
(a) If x < −4/5, then N < 0 and D > 0, so that N/D < 0.
(b) If −4/5 < x < 1/2, then N > 0 and D > 0, so that N/D > 0.
(c) If x > 1/2, then N > 0 and D < 0, so that N/D < 0.

   −        +        −
      −4/5     1/2

Hence, −4/5 < x < 1/2.
Equivalently, the solution set is: (−4/5, 1/2).

[Figure: graph of y = (5x + 4)/(−2x + 1), with the x-values −4/5 and 1/2 marked.]


Example 401. Solve N/D = …/(3x + 2) > 0.
We note that the numerator and denominator equal zero at −3 and −2/3. And so, we
have the sign diagram below, because:
(a) If x < −3, then N < 0 and D < 0, so that N /D > 0.
(b) If −3 < x < −2/3, then N < 0 and D > 0, so that N /D < 0.
(c) If x > −2/3, then N > 0 and D > 0, so that N /D > 0.

+ − +
−3 −2/3

Hence, x < −3 OR x > −2/3.

Equivalently, the solution set is: (−∞, −3) ∪ (−2/3, ∞).

[Figure: graph of the given fraction, with the x-values −3 and −2/3 marked.]


Example 402. Solve N/D = (4x − 1)/(x + 2) > 0.
We note that the numerator and denominator equal zero at 1/4 and −2. And so, we have
the sign diagram below, because:
(a) If x < −2, then N < 0 and D < 0, so that N /D > 0.
(b) If −2 < x < 1/4, then N < 0 and D > 0, so that N /D < 0.
(c) If x > 1/4, then N > 0 and D > 0, so that N /D > 0.

+ − +

−2 1/4

Hence, x < −2 OR x > 1/4.

Equivalently, the solution set is: (−∞, −2) ∪ (1/4, ∞).

[Figure: graph of y = (4x − 1)/(x + 2), with the x-values −2 and 1/4 marked.]
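The sign-diagram method used in the preceding examples can be mimicked in code: find the two zero points, pick one test point in each of the three intervals, and record the sign of N/D there. The sketch below assumes a fraction of the form (ax + b)/(cx + d) with a, c ≠ 0 and two distinct zero points; the function names are illustrative only.

```python
from fractions import Fraction

def sign(value):
    """Return '+', '-', or '0' according to the sign of value."""
    return "+" if value > 0 else "-" if value < 0 else "0"

def sign_diagram(a, b, c, d):
    """Signs of (a*x + b)/(c*x + d) on the three intervals cut out by its zero points.

    Assumes a and c are non-zero integers and that the two zero points are distinct.
    """
    zero_points = sorted({Fraction(-b, a), Fraction(-d, c)})
    if len(zero_points) != 2:
        raise ValueError("this sketch assumes two distinct zero points")
    left, right = zero_points
    # One test point to the left of, between, and to the right of the zero points.
    test_points = [left - 1, (left + right) / 2, right + 1]
    signs = [sign((a * x + b) / (c * x + d)) for x in test_points]
    return zero_points, signs

# (x - 1)/x, (5x + 4)/(-2x + 1), and (4x - 1)/(x + 2).
for coefficients in [(1, -1, 1, 0), (5, 4, -2, 1), (4, -1, 1, 2)]:
    points, signs = sign_diagram(*coefficients)
    print(coefficients, [str(p) for p in points], signs)
```

Exact fractions are used so that the zero points print as −4/5 and 1/2 rather than as rounded decimals.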

Exercise 138. Solve each inequality. (Answers on p. 1449.)

(a) (x − 1)/(−4) > 0. (b) (−1)/(−4) > 0. (c) 1/(−4) > 0. (d) (2x + 1)/(3x + 2) > 0. (e) (−3x − 18)/(9x − 14) > 0.


Example 403. Solve (3x − 2)/(−5x + 1) < 3.
First rewrite the given inequality into SF:

3 − (3x − 2)/(−5x + 1) > 0 ⇐⇒ (−15x + 3 − (3x − 2))/(−5x + 1) > 0 ⇐⇒ (−18x + 5)/(−5x + 1) > 0.

Now as usual, write:

(−18x + 5)/(−5x + 1) > 0 ⇐⇒ (−18x + 5, −5x + 1 > 0 OR −18x + 5, −5x + 1 < 0).

Observe that:
• −18x + 5, −5x + 1 > 0 ⇐⇒ (x < 5/18 AND x < 1/5) ⇐⇒ x < 1/5.
• −18x + 5, −5x + 1 < 0 ⇐⇒ (x > 5/18 AND x > 1/5) ⇐⇒ x > 5/18.
Altogether then, (3x − 2)/(−5x + 1) < 3 ⇐⇒ (−18x + 5)/(−5x + 1) > 0 ⇐⇒ x ∈ R ∖ [1/5, 5/18].

[Figure: graph of y = (3x − 2)/(−5x + 1), with the x-values 1/5 and 5/18 marked.]

Exercise 139. Solve each inequality. (Answer on p. 1451.)

(a) (2x + 3)/(−x + 7) < 9. (b) (−4x + 2)/(x + 1) > 13.


24.4. xxx
Now, dividing an inequality by an unknown constant x is equivalent to multiplying it by
1/x. And so, we must be equally careful when dividing an inequality by an unknown
constant x:

Example 404. Solve xe^x ≥ xe (x ∈ R). (1)

Beng reasons, “Divide (1) by x to get e^x ≥ e. Which is true for all x ≥ 1. Hence, the solution
set for (1) is [1, ∞).” ✗

Again, Beng’s mistake is to divide (1) by x and assume the inequality will be preserved.

By the way, when given any inequality, I recommend first rewriting it in the following
Standardised Form (SF), where STUFF is on the LHS of the inequality and 0 is on
the RHS, like this:

Standardised Form: STUFF ⋛ 0.

(The symbol ⋛ stands for any inequality, i.e. any of >, ≥, <, or ≤.)
Strictly speaking, it isn’t necessary to rewrite inequalities into SF. But if you make it a
habit to always do so, you’ll be less likely to make a careless mistake.
So, here let’s rewrite (1) into SF, then factorise:

xe^x − xe ≥ 0 ⇐⇒ xe (e^(x−1) − 1) ≥ 0 ⇐⇒ x (e^(x−1) − 1) ≥ 0.

In the last step, we divided by the positive constant e. This is equivalent to multiplying
by the positive constant 1/e and hence by Fact 38 preserves the inequality.
Let a = x (e^(x−1) − 1). We observe that:

a = 0 ⇐⇒ x = 0 OR x = 1.


24.5. Inequalities Involving the Absolute Value Function
You’re required to solve inequalities involving the absolute value function ∣⋅∣. So here are a
few useful and non-obvious facts:

Fact 40. Let b ≥ 0. Then

(a) ∣x∣ < b ⇐⇒ −b < x < b.
(b) ∣x∣ ≤ b ⇐⇒ −b ≤ x ≤ b.
(c) ∣x∣ > b ⇐⇒ (x > b OR x < −b).
(d) ∣x∣ ≥ b ⇐⇒ (x ≥ b OR x ≤ −b).

Proof. See p. 1280 in the Appendices.

Example 405. ∣x∣ < 5 ⇐⇒ −5 < x < 5. The solution set is (−5, 5).

Example 406. ∣x∣ ≤ 3 ⇐⇒ −3 ≤ x ≤ 3. The solution set is [−3, 3].

Example 407. ∣x∣ > 7 ⇐⇒ (x > 7 OR x < −7). The solution set is (−∞, −7) ∪ (7, ∞).

Example 408. ∣x∣ ≥ 1 ⇐⇒ (x ≥ 1 OR x ≤ −1). The solution set is (−∞, −1] ∪ [1, ∞).

A more general version of the above result is the following:

Fact 41. Let a ∈ R, b ≥ 0. Then

(a) ∣x − a∣ < b ⇐⇒ a − b < x < a + b.
(b) ∣x − a∣ ≤ b ⇐⇒ a − b ≤ x ≤ a + b.
(c) ∣x − a∣ > b ⇐⇒ (x > a + b OR x < a − b).
(d) ∣x − a∣ ≥ b ⇐⇒ (x ≥ a + b OR x ≤ a − b).

Proof. See p. 1280 in the Appendices.

Example 409. Solve ∣x − 1∣ < 5.


∣x − 1∣ < 5 ⇐⇒ −5 < x − 1 < 5 ⇐⇒ −4 < x < 6.


Equivalently, we may say that the solution set for < is (−4, 6).

Example 410. Solve ∣x + 4∣ ≤ 3.


∣x + 4∣ ≤ 3 ⇐⇒ −3 ≤ x + 4 ≤ 3 ⇐⇒ −7 ≤ x ≤ −1.

Equivalently, we may say that the solution set for ≤ is [−7, −1].


Example 411. Solve ∣x − 2∣ > 7.

∣x − 2∣ > 7 ⇐⇒ x − 2 > 7 OR x − 2 < −7 ⇐⇒ x > 9 OR x < −5.


Equivalently, we may say that the solution set for > is (−∞, −5) ∪ (9, ∞).

Example 412. Solve ∣x + 1∣ ≥ 1.

∣x + 1∣ ≥ 1 ⇐⇒ x + 1 ≥ 1 OR x + 1 ≤ −1 ⇐⇒ x ≥ 0 OR x ≤ −2.

Equivalently, we may say that the solution set for ≥ is (−∞, −2] ∪ [0, ∞).
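Fact 41 is mechanical enough to write as a small function. The sketch below returns the solution set of ∣x − a∣ ⋛ b as a string in interval notation; the function name and the output format are assumptions made for this illustration.

```python
def solve_abs_inequality(a, b, relation):
    """Solve |x - a| <relation> b for b >= 0, following the four cases of Fact 41.

    relation is one of "<", "<=", ">", ">=".
    """
    if b < 0:
        raise ValueError("Fact 41 assumes b >= 0")
    lo, hi = a - b, a + b
    if relation == "<":
        return f"({lo}, {hi})"
    if relation == "<=":
        return f"[{lo}, {hi}]"
    if relation == ">":
        return f"(-inf, {lo}) U ({hi}, inf)"
    if relation == ">=":
        return f"(-inf, {lo}] U [{hi}, inf)"
    raise ValueError("relation must be one of <, <=, >, >=")

print(solve_abs_inequality(1, 5, "<"))    # |x - 1| < 5
print(solve_abs_inequality(-4, 3, "<="))  # |x + 4| <= 3, i.e. a = -4
print(solve_abs_inequality(2, 7, ">"))    # |x - 2| > 7
```

Note that ∣x + 4∣ must first be read as ∣x − (−4)∣, so that a = −4.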


There’s actually a nice geometric interpretation of Fact 41:

(a) ∣x − a∣ < b means that the distance between x and a is less than b.

∣x − a∣ < b
( )
a − b a x a + b

(b) ∣x − a∣ ≤ b means that the distance between x and a is no greater than b.

∣x − a∣ ≤ b
[ ]
a  − b a x a  + b

(c) ∣x − a∣ > b means that the distance between x and a is more than b.

∣x − a∣ > b
) (
a  − b a a  + b x

(d) ∣x − a∣ ≥ b means that the distance between x and a is no less than b.

∣x − a∣ ≥ b
] [
a  − b a a  + b x

It will often be easier to solve inequalities with the aid of a graph:


Example 413. Solve ∣x − 4∣ ≤ 2x.

We will first solve ≤ with the aid of graphs. To do so, first sketch the graphs of y = ∣x − 4∣

and y = 2x. To sketch the former, simply recall from Ch. 16.6 that the graph of y = ∣f (x)∣
is the same as that of f , but with any portion below the x-axis reflected in the x-axis.

[Figure: graphs of y = 2x and y = ∣x − 4∣, which intersect at the point P.]


“Clearly”, ≤ holds ⇐⇒ x is to the right of the intersection point P . Our goal then is to

find P .
From the graph, we see that at P , we have x − 4 < 0 and thus ∣x − 4∣ = − (x − 4) = 4 − x.
And so, P is given by:

4 − x = 2x or x = 4/3.

Hence, x ≥ 4/3. Or equivalently, the solution set for ≤ is [4/3, ∞).

We now solve ≤ again but this time a little more rigorously and without the aid of graphs.

We consider two cases. If (a) x < 4, then:

∣x − 4∣ ≤ 2x ⇐⇒ 4 − x ≤ 2x ⇐⇒ 4 ≤ 3x ⇐⇒ 4/3 ≤ x.

If (b) x ≥ 4, then:

∣x − 4∣ ≤ 2x ⇐⇒ x − 4 ≤ 2x ⇐⇒ −4 ≤ x,


Example 414. Solve ∣2x − 4∣ > x.
Sketching the graphs of y = ∣2x − 4∣ and y = x, we see that ∣2x − 4∣ > x ⇐⇒ x is to the
left or right of the two intersection points P and Q. So, let’s find P and Q.

[Figure: graphs of y = ∣2x − 4∣ and y = x, which intersect at P (x = 4/3) and Q (x = 4).]

At P , we have 2x − 4 < 0 and thus ∣2x − 4∣ = − (2x − 4) = 4 − 2x. And so, P is given by:

4 − 2x = x or x = 4/3.

At Q, we have 2x − 4 > 0 and thus ∣2x − 4∣ = 2x − 4. And so, Q is given by:

2x − 4 = x or x = 4.

Conclusion: x < 4/3 or x > 4 solves the inequality. In other words, the solution set is:

(−∞, 4/3) ∪ (4, ∞) = R ∖ [4/3, 4].


Example 415. Solve ∣3x − 4∣ ≥ 2x + 2.
Sketching the graphs of y = ∣3x − 4∣ and y = 2x + 2, we see that ∣3x − 4∣ ≥ 2x + 2 ⇐⇒ x is
to the left or right of the two intersection points P and Q. So, let’s find P and Q.

[Figure: graphs of y = 2x + 2 and y = ∣3x − 4∣, which intersect at P (x = 2/5) and Q (x = 6).]

At P , we have 3x − 4 < 0 and thus ∣3x − 4∣ = − (3x − 4) = 4 − 3x. And so, P is given by:

4 − 3x = 2x + 2 or x = 2/5.

At Q, we have 3x − 4 > 0 and thus ∣3x − 4∣ = 3x − 4. And so, Q is given by:

3x − 4 = 2x + 2 or x = 6.

Conclusion: x ≤ 2/5 or x ≥ 6 solves the inequality. In other words, the solution set is:

(−∞, 2/5] ∪ [6, ∞) = R ∖ (2/5, 6).


Example 416. Solve ∣x + 1∣ ≥ ∣2x − 1∣.
Sketching the graphs of y = ∣x + 1∣ and y = ∣2x − 1∣, we see that ∣x + 1∣ ≥ ∣2x − 1∣ ⇐⇒ x is
between the two intersection points P and Q. So, let’s find P and Q.

[Figure: graphs of y = ∣2x − 1∣ and y = ∣x + 1∣, which intersect at P (x = 0) and Q (x = 2).]

At P , we have x + 1 > 0 and 2x − 1 < 0. Thus, ∣x + 1∣ = x + 1 and ∣2x − 1∣ = 1 − 2x. And so,
P is given by:

x + 1 = 1 − 2x or x = 0.

At Q, we have x + 1 > 0 and 2x − 1 > 0. Thus, ∣x + 1∣ = x + 1 and ∣2x − 1∣ = 2x − 1. And so,

Q is given by:

x + 1 = 2x − 1 or x = 2.

Conclusion: x ∈ [0, 2] solves the inequality. In other words, the solution set is [0, 2].
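A graph-based solution like those above can be double-checked by brute force: test the inequality at evenly spaced points and see which ones satisfy it. The sketch below is such a check, not a proof; the function name, the grid range, and the step size are arbitrary choices made for this illustration.

```python
def satisfying_grid_points(condition, start, stop, step=0.25):
    """Return the grid points start, start + step, ..., stop at which condition holds."""
    count = int(round((stop - start) / step))
    grid = [start + i * step for i in range(count + 1)]
    return [x for x in grid if condition(x)]

# |x + 1| >= |2x - 1|, checked on the grid -3, -2.75, ..., 5.
hits = satisfying_grid_points(lambda x: abs(x + 1) >= abs(2 * x - 1), -3, 5)
print(min(hits), max(hits))

# |2x - 4| > x, checked on the grid -3, -2.75, ..., 6.
hits = satisfying_grid_points(lambda x: abs(2 * x - 4) > x, -3, 6)
print([x for x in hits if x > 0])
```

A step of 0.25 is used because multiples of 0.25 are represented exactly in floating point, so the boundary points x = 0 and x = 2 are tested without rounding error.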

Exercise 140. Solve each inequality. (Answer on p. 1448.)

(a) ∣x − 4∣ ≤ 71.
(b) ∣5 − x∣ > 13.
(c) ∣−3x + 2∣ − 4 ≥ x − 1.
(d) ∣x + 6∣ > 2 ∣2x − 1∣.


Each of the following three equations is not generally true.

∣a/b∣ = ∣a∣/b. ✗     ∣a/b∣ = a/∣b∣. ✗     ∣a/b∣ = a/b. ✗

Example 417. ∣6/(−2)∣ ≠ ∣6∣/(−2) because ∣6/(−2)∣ = ∣−3∣ = 3 but ∣6∣/(−2) = 6/(−2) = −3.

Example 418. ∣(−6)/2∣ ≠ (−6)/∣2∣ because ∣(−6)/2∣ = ∣−3∣ = 3 but (−6)/∣2∣ = (−6)/2 = −3.

Example 419. ∣(−6)/2∣ ≠ (−6)/2 because ∣(−6)/2∣ = ∣−3∣ = 3 but (−6)/2 = −3.

However, the following is always true:

Fact 42. If a, b ∈ R with b ≠ 0, then ∣a/b∣ = ∣a∣/∣b∣.

Proof. If a = 0, then this is obviously true. So suppose a ≠ 0. Two cases:

Case 1. If a and b have the same signs, then either ∣a∣ / ∣b∣ = a/b or ∣a∣ / ∣b∣ = (−a) / (−b) =
a/b. Moreover, a/b > 0 so that ∣a/b∣ = a/b.
Case 2. If a and b have opposite signs, then either ∣a∣ / ∣b∣ = (−a) /b or ∣a∣ / ∣b∣ = a/ (−b).
Moreover, a/b < 0 so that ∣a/b∣ = −a/b.

If the above proof doesn’t convince you, hopefully the following examples do:

Example 420. ∣6/2∣ = ∣6∣/∣2∣ because ∣6∣/∣2∣ = 6/2 = 3 and ∣6/2∣ = ∣3∣ = 3.

Example 421. ∣(−6)/2∣ = ∣−6∣/∣2∣ because ∣−6∣/∣2∣ = 6/2 = 3 and ∣(−6)/2∣ = ∣−3∣ = 3.

Example 422. ∣6/(−2)∣ = ∣6∣/∣−2∣ because ∣6∣/∣−2∣ = 6/2 = 3 and ∣6/(−2)∣ = ∣−3∣ = 3.

Example 423. ∣(−6)/(−2)∣ = ∣−6∣/∣−2∣ because ∣−6∣/∣−2∣ = 6/2 = 3 and ∣(−6)/(−2)∣ = ∣3∣ = 3.

We also have the following similar result:

Fact 43. If a, b ∈ R, then ∣ab∣ = ∣a∣ ∣b∣.

Proof. There are four possible cases, depending on the signs of a and b. We will show that
the conclusion holds in each case:
1. If a, b ≥ 0, then ∣ab∣ = ab = ∣a∣ ∣b∣.
2. If a, b < 0, then ∣ab∣ = ab = ∣a∣ ∣b∣.
3. If a ≥ 0 and b < 0, then ∣ab∣ = −ab = ∣a∣ ∣b∣.
4. If a < 0 and b ≥ 0, then ∣ab∣ = −ab = ∣a∣ ∣b∣.
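Facts 42 and 43 can also be spot-checked on a handful of values of each sign. The sketch below does this with exact fractions; it is a sanity check on sample values, not a substitute for the proofs, and the helper names are made up for the illustration.

```python
from fractions import Fraction

def quotient_rule_holds(a, b):
    """Check Fact 42 for one pair: |a/b| == |a|/|b|, where b != 0."""
    a, b = Fraction(a), Fraction(b)
    return abs(a / b) == abs(a) / abs(b)

def product_rule_holds(a, b):
    """Check Fact 43 for one pair: |ab| == |a||b|."""
    a, b = Fraction(a), Fraction(b)
    return abs(a * b) == abs(a) * abs(b)

samples = [-6, -2, 0, 2, 6]
print(all(product_rule_holds(a, b) for a in samples for b in samples))
print(all(quotient_rule_holds(a, b) for a in samples for b in samples if b != 0))

# By contrast, |a/b| = |a|/b is not generally true, e.g. for a = 6 and b = -2.
print(abs(Fraction(6, -2)) == Fraction(abs(6), -2))
```

The first two lines print True, while the last prints False because the left-hand side is 3 and the right-hand side is −3.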


24.6. (ax² + bx + c)/(dx² + ex + f) > 0
According to your syllabus, you need to solve the above inequality in those cases where
both numerator and denominator are either always positive or factorisable.
Let’s first look at those cases where either expression is always positive:

Fact 44. Consider the inequality: (ax² + bx + c)/(dx² + ex + f) > 0. (1)

(a) If ax² + bx + c is always positive, then (1) is equivalent to dx² + ex + f > 0.

(b) If dx² + ex + f is always positive, then (1) is equivalent to ax² + bx + c > 0.

Proof. (a) Divide both sides by ax² + bx + c to get 1/(dx² + ex + f) > 0 or dx² + ex + f > 0.
(b) Multiply both sides by dx² + ex + f to get ax² + bx + c > 0.

By the way, recall (Ch. 9) that a quadratic expression is always positive if and only if
its coefficient on x² is positive (so that the graph is ∪-shaped) AND the discriminant is
negative (so that the graph is everywhere above the x-axis).

Example 424. Consider (x² + x + 1)/(3x² − 2x − 5) > 0.
The numerator is always positive, because the coefficient on x² is positive and the
discriminant 1² − 4 (1) (1) = −3 is negative. And so by Fact 44(a), the given inequality is
simply equivalent to 3x² − 2x − 5 > 0. This is a ∪-shaped quadratic with discriminant
(−2)² − 4 (3) (−5) = 64. Hence, 3x² − 2x − 5 > 0 “outside” the two roots, which are:

x = (2 ± √64)/(2 ⋅ 3) = −6/6, 10/6 = −1, 5/3.

Hence, the given inequality’s solution set is R ∖ [−1, 5/3] or (−∞, −1) ∪ (5/3, ∞).

Example 425. Consider (−x² + 7x + 1)/(2x² − x + 1) > 0.
The denominator is always positive, because the coefficient on x² is positive and the
discriminant (−1)² − 4 (2) (1) = −7 is negative. And so by Fact 44(b), the given inequality
is simply equivalent to −x² + 7x + 1 > 0. This is a ∩-shaped quadratic with discriminant
7² − 4 (−1) (1) = 53. Hence, −x² + 7x + 1 > 0 “between” the two roots, which are:

x = (−7 ± √53)/(2 ⋅ (−1)) = (7 ∓ √53)/2.

Hence, the given inequality’s solution set is ((7 − √53)/2, (7 + √53)/2).


We next look at the case where both ax² + bx + c and dx² + ex + f are factorisable.

Example 426. Solve (x² + 3x + 2)/(2x² − 7x + 6) > 0.
First consider the numerator N = x² + 3x + 2. Observe that x² + 3x + 2 = (x + 1) (x + 2).
And so y = x² + 3x + 2, which is a ∪-shaped quadratic, intersects the x-axis at x = −2, −1.
Thus, N > 0 if x ∈ R ∖ [−2, −1], while N < 0 if x ∈ (−2, −1).
Next consider the denominator D = 2x² − 7x + 6. Observe that 2x² − 7x + 6 = (2x − 3) (x − 2).
And so y = 2x² − 7x + 6, which is a ∪-shaped quadratic, intersects the x-axis at x = 1.5, 2.
Thus, D > 0 if x ∈ R ∖ [1.5, 2], while D < 0 if x ∈ (1.5, 2).
The given inequality is true if N, D > 0 OR N, D < 0. We have:

N, D > 0 ⇐⇒ x ∈ R ∖ [−2, −1] AND x ∈ R ∖ [1.5, 2] ⇐⇒ x ∈ R ∖ ([−2, −1] ∪ [1.5, 2]).

N, D < 0 ⇐⇒ x ∈ (−2, −1) AND x ∈ (1.5, 2) ⇐⇒ x ∈ (−2, −1) ∩ (1.5, 2) = ∅.

Hence, the given inequality’s solution set is:

R ∖ ([−2, −1] ∪ [1.5, 2]) or (−∞, −2) ∪ (−1, 1.5) ∪ (2, ∞).

Example 427. Solve (−x² + 5x − 4)/(3x² − 2x − 5) > 0.
First consider the numerator N = −x² + 5x − 4. Observe that −x² + 5x − 4 = − (x − 1) (x − 4).
And so y = −x² + 5x − 4, which is a ∩-shaped quadratic, intersects the x-axis at x = 1, 4.
Thus, N > 0 if x ∈ (1, 4), while N < 0 if x ∈ R ∖ [1, 4].
Next consider the denominator D = 3x² − 2x − 5. Observe that 3x² − 2x − 5 = (3x − 5) (x + 1).
And so y = 3x² − 2x − 5, which is a ∪-shaped quadratic, intersects the x-axis at x = −1, 5/3.
Thus, D > 0 if x ∈ R ∖ [−1, 5/3], while D < 0 if x ∈ (−1, 5/3).
The given inequality is true if N, D > 0 OR N, D < 0. We have:

N, D > 0 ⇐⇒ x ∈ (1, 4) AND x ∈ R ∖ [−1, 5/3] ⇐⇒ x ∈ (1, 4) ∖ [−1, 5/3] = (5/3, 4).

N, D < 0 ⇐⇒ x ∈ R ∖ [1, 4] AND x ∈ (−1, 5/3) ⇐⇒ x ∈ (−1, 5/3) ∖ [1, 4] = (−1, 1).

Hence, the given inequality’s solution set is (−1, 1) ∪ (5/3, 4).
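A claimed solution set for an inequality of this kind can be tested at sample points: at each point, compute whether N/D > 0 and compare with whether the point lies in the claimed set. The sketch below does this for the two examples above, using exact fractions; the helper name and the choice of sample points are assumptions made for this illustration.

```python
from fractions import Fraction

def mismatches(numerator, denominator, claimed, points):
    """Return the sample points at which `claimed` disagrees with N/D > 0."""
    bad = []
    for x in points:
        d = denominator(x)
        # Where D = 0 the fraction is undefined, so x cannot be a solution.
        holds = d != 0 and numerator(x) / d > 0
        if holds != claimed(x):
            bad.append(x)
    return bad

# Sample points -3, -2.75, ..., 6 as exact fractions.
points = [Fraction(i, 4) for i in range(-12, 25)]

# (x^2 + 3x + 2)/(2x^2 - 7x + 6) > 0 with claimed set (-inf, -2) U (-1, 1.5) U (2, inf).
print(mismatches(lambda x: x * x + 3 * x + 2,
                 lambda x: 2 * x * x - 7 * x + 6,
                 lambda x: x < -2 or -1 < x < Fraction(3, 2) or x > 2,
                 points))

# (-x^2 + 5x - 4)/(3x^2 - 2x - 5) > 0 with claimed set (-1, 1) U (5/3, 4).
print(mismatches(lambda x: -x * x + 5 * x - 4,
                 lambda x: 3 * x * x - 2 * x - 5,
                 lambda x: -1 < x < 1 or Fraction(5, 3) < x < 4,
                 points))
```

An empty list means the claimed set agrees with the inequality at every sampled point, which is evidence for the answer but not a proof.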

Exercise 141. Solve each inequality. (Answers on pp. 1453, 1454, and 1455.)

(a) (x² + 2x + 1)/(x² − 3x + 2) > 0. (b) (x² − 1)/(x² − 4) > 0. (c) (x² − 3x − 18)/(−x² + 9x − 14) > 0.


24.7. Solving Inequalities by Graphical Methods

Example 428. Solve x > sin (0.5πx).

As usual, first rewrite the inequality into SF: x − sin (0.5πx) > 0.
Then graph y = x−sin (0.5πx) on your TI84. Our goal is to find the roots of this equation,
i.e. the values of x for which x − sin (0.5πx) = 0.


1. Press ON to turn on your TI84.

2. Press Y= to bring up the Y= editor.
3. Press X,T,θ,n − SIN 0 . 5 . Next press 2ND and then ∧ to enter π. Now press
X,T,θ,n ) and altogether you will have entered “x − sin (0.5πx)”.
4. Now press GRAPH and the TI84 will graph y = x − sin (0.5πx).
It looks like there may be some x-intercepts close to the origin. Let’s zoom in to see better.
5. Press the ZOOM button to bring up a menu of ZOOM options.
6. Press 2 to select the Zoom In option. Nothing seems to happen. But now press
ENTER and the TI84 will zoom in a little for you.
It looks like there are 3 horizontal intercepts. To find out what precisely they are, we’ll
use the TI84’s “zero” option.
7. Press 2ND and then TRACE to bring up the CALC (CALCULATE) menu.
8. Press 2 to select the zero function. This brings you back to the graph, with a cursor
flashing at the origin. Also, the TI84 prompts you with the question: “Left Bound?”

Recall176 that zero is another word for root. So what TI84’s zero function will do here is
find the roots of the given equation (i.e. the values of x for which y = 0). Those of you
accustomed to newfangled inventions like the world wide web and the wireless telephone
will probably be expecting that the TI84 simply and immediately tells you what all the
roots are. But alas, the TI84 is an ancient device, which means there’s plenty more work
you must do to find the three roots.
To find a root, you must first specify a “Left Bound” and a “Right Bound” for x. The
TI84 will then check to see if there are any values of x for which y = 0 between those
two bounds.
9. Using the ⟨ and ⟩ arrow keys, move the blinking cursor until it is where you want
your first “Left Bound” to be. For me, I have placed it a little to the left of where I
believe the leftmost horizontal intercept to be.
10. Press ENTER and you will have just entered your first “Left Bound”.
TI84 now prompts you with the question: “Right Bound?”.
11. So now just repeat. Using the ⟨ and ⟩ arrow keys, move the blinking cursor until it
is where you want your first “Right Bound” to be. For me, I have placed it a little to
the right of where I believe the leftmost horizontal intercept is.
12. Again press ENTER and you will have just entered your first “Right Bound”.
TI84 now asks you: “Guess?” This is just asking if you want to proceed and get TI84 to
work out where the horizontal intercept is. So go ahead and:
13. Press ENTER . TI84 now informs you that there is a “Zero” at “x = −1”, “y = 0” and
places the blinking cursor at that point. So, x = −1 is the first root we’ve found.
To find the other two roots, “simply” repeat steps 7 through 13 — two more times. You
should find that the other two roots are x = 0 and x = 1. Altogether, the three roots are
x = −1, 0, 1. Based on these and what the graph looks like, we conclude:

x > sin (0.5πx) ⇐⇒ x − sin (0.5πx) > 0 ⇐⇒ x ∈ (−1, 0) ∪ (1, ∞).
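The "Left Bound", "Right Bound" idea can be imitated in a few lines of Python. The sketch below uses bisection, which is not necessarily the algorithm the TI84 uses internally. The helper name `zero` and the three pairs of bounds are assumptions chosen for illustration by reading them off the graph.

```python
import math

def f(x):
    # Left-hand side of the equation x - sin(0.5*pi*x) = 0.
    return x - math.sin(0.5 * math.pi * x)

def zero(func, left, right, tol=1e-10):
    """Bisection: find a root of func between left and right (hypothetical helper)."""
    f_left, f_right = func(left), func(right)
    if f_left * f_right > 0:
        raise ValueError("func must change sign between the two bounds")
    while right - left > tol:
        mid = (left + right) / 2
        f_mid = func(mid)
        if f_mid == 0:
            return mid
        if f_left * f_mid < 0:
            right = mid                  # the root lies in [left, mid]
        else:
            left, f_left = mid, f_mid    # the root lies in [mid, right]
    return (left + right) / 2

# One (left bound, right bound) pair per horizontal intercept seen on the graph.
roots = [zero(f, -1.5, -0.5), zero(f, -0.3, 0.4), zero(f, 0.5, 1.5)]
print(roots)
```

As with the calculator, each call needs two bounds between which the graph crosses the horizontal axis; the three values found should be very close to −1, 0, and 1.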

Remark 21.
Example 429. Solve x > e + ln x.
As usual, first rewrite the inequality into SF: x − e − ln x > 0.
1. Graph y = x − e − ln x on your TI84 (precise instructions omitted).
We see that there’s clearly an x-intercept at around x ∈ (4, 5). (Note that by default,
each of the little tick marks shown on your TI84 marks 1 unit.)
2. Zoom in (precise instructions omitted).
Now we see that there’s probably also an x-intercept near the origin. But unfortu-
nately, now we can no longer see the other x-intercept. To fix this:
3. Press WINDOW to bring up the WINDOW menu.
We will adjust Xmin and Xmax:
4. Press 0 . We have adjusted Xmin to 0. Next:
5. Press ENTER 5 . We have adjusted Xmax to 5.
6. Now press GRAPH . We can now see the portion of the graph between x = 0 and
x = 5.
7. To find the two roots, “simply” go through the steps described in the previous example
twice (precise instructions omitted). You should find that the two roots are x ≈
0.0708, 4.139.
Based on these roots and what the graph looks like, we conclude that the inequality’s
solution set is (0, 0.0708 . . . ) ∪ (4.138 . . . , ∞) = R+ ∖ [0.0708 . . . , 4.138 . . . ].
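A rough numerical cross-check is sketched below. It scans x from 0.01 to 10 in steps of 0.01 for sign changes of x − e − ln x, then refines each sign change by bisection. The scanning range, the step size, and the helper name `bisect` are assumptions made for illustration.

```python
import math

def g(x):
    # x - e - ln x, defined only for x > 0.
    return x - math.e - math.log(x)

def bisect(func, left, right, tol=1e-10):
    """Refine a root of func known to lie between left and right."""
    f_left = func(left)
    while right - left > tol:
        mid = (left + right) / 2
        f_mid = func(mid)
        if f_left * f_mid <= 0:
            right = mid
        else:
            left, f_left = mid, f_mid
    return (left + right) / 2

# Scan x = 0.01, 0.02, ..., 10 and refine wherever g changes sign.
xs = [k / 100 for k in range(1, 1001)]
roots = [bisect(g, a, b) for a, b in zip(xs, xs[1:]) if g(a) * g(b) < 0]
print([round(r, 4) for r in roots])
```

A scan like this plays the role of adjusting the window: a root very close to the origin is easy to miss on a graph but shows up as a sign change on a fine enough grid.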

[Screenshots: the TI84 display after each of Steps 1 to 7.]

Exercise 142. Use a GC to solve each inequality: (Answers on p. 1456.)

(a) $x^3 - x^2 + x - 1 > e^x$. (b) $\sqrt{x} > \cos x$. (c) $\dfrac{1}{1 - x^2} > x^3 + \sin x$.


25. Solving Systems of Equations
Actually, you already learnt to solve systems of equations in primary and secondary
school. Here’s a PSLE-style question:

Exercise 143. When Apu was 40 years old, Beng was twice as old as Caleb. Today, Apu
is twice as old as Beng and Caleb is 28 years old. What are the ages of Apu and Beng
today? (Assume that a person’s age is always an integer and fixed between January 1st
and December 31st each year.) (Answer on p. 1457.)

Here’s an O-Level-style question that requires a little trigonometry:

Exercise 144. Planes A and B leave the same point at 12 p.m. Plane A travels northeast
at a constant speed of 100 km h−1 . Plane B travels south at a constant speed of 200 km h−1 .

[Figure: from a common starting point, Plane A travels northeast at 100 km h−1 and Plane B travels south at 200 km h−1.]

At 3 p.m., both planes make an instant turn and start flying directly towards each other
at the same speed. At what time will the two planes collide? (Answer on p. 1458.)


In Ch. 8, we learnt what a solution and a solution set were, in the case of an equation
involving only one variable. We now introduce a system of equations, which is a set of
equations involving more than one variable.177

Example 430. Consider this system of equations (which is simply a set of two equations):

$$y = x + 1 \quad (1) \qquad \text{and} \qquad y = -x + 3 \quad (2) \qquad (x, y \in \mathbb{R}).$$

You already learnt to solve this system of equations in secondary school. Simply plug (2)
into (1) to get (3), then do some simple algebra:

$$-x + 3 = x + 1 \quad (3) \iff 2 = 2x \iff x = 1.$$

Plugging x = 1 back into (1) gives y = 2.
Thus, this system of equations has one solution: (1, 2) and its solution set is {(1, 2)}.

[Figure: the lines y = −x + 3 and y = x + 1 intersect at (1, 2).]

To make it extra clear that the ordered pair’s first co-ordinate is x and second is y, we
can also write, “This system of equations has one solution: (x, y) = (1, 2).”

In Ch. 8, we also studied inequalities involving one variable. Fortunately, for H2 Maths, we will not be
studying systems of inequalities (i.e. sets of inequalities involving more than one variable). We will
instead only be studying systems of equations.
Example 431. Consider this system of equations:

$$y = \frac{x^2}{2} - \frac{3}{2} \quad (1) \qquad \text{and} \qquad y = x \quad (2) \qquad (x, y \in \mathbb{R}).$$

Again, from secondary school, you already know how to solve this system of equations.
Simply plug (2) into (1) to get (3), then do some simple algebra:

$$x = \frac{x^2}{2} - \frac{3}{2} \quad (3) \iff 0 = x^2 - 2x - 3 = (x - 3)(x + 1).$$

So x = 3 or x = −1 and, by (2), y = x in each case.
Thus, this system of equations has two solutions: (3, 3) and (−1, −1) and its solution
set is {(3, 3) , (−1, −1)}.

[Figure: the parabola $y = \frac{x^2}{2} - \frac{3}{2}$ and the line y = x intersect at (3, 3) and (−1, −1).]

To be extra clear, we can also write, “This system of equations has two solutions:
(x, y) = (3, 3) , (−1, −1).”


A system of equations can have no solutions.

Example 432. Consider this system of equations: y = ln x and y = x (x, y ∈ R).

The graph of y = ln x lies below the graph of y = x everywhere. Hence, no ordered pair (x, y) satisfies the above
system of equations. In other words, there is no solution and the solution set is ∅.

[Figure: graphs of y = ln x and y = x.]

Example 433. Consider this system of equations: $y = x^2$ and y = −1 (x, y ∈ R).

The graph of $y = x^2$ lies above the graph of y = −1 everywhere. Hence, no ordered pair (x, y) satisfies the above
system of equations. In other words, there is no solution and the solution set is ∅.178

[Figure: graphs of $y = x^2$ and y = −1.]

But as we’ll learn in Part IV, if this system of equations is rewritten so that x, y ∈ C, then it actually
has two (complex) solutions — (−i, −1) and (i, −1) — and the solution set {(−i, −1) , (i, −1)}.
A system of equations can have infinitely many solutions.

Example 434. Consider this system of equations:

y = x and 2y = 2x (x, y ∈ R).

Observe that these two equations are really the same. And so, the above system of equations
is satisfied by every point (x, y) for which y = x. Its solution set is {(x, y) ∶ y = x}.

[Figure: graph of the line 2y = 2x, which is the same line as y = x.]

Example 435. Consider this system of equations:

y = cos x and y = 1 (x, y ∈ R).

The above system of equations is satisfied by every point (x, y) for which x = 2kπ and
y = 1, where k is any integer. Its solution set is {(x, y) ∶ x = 2kπ, y = 1, k ∈ Z}.

[Figure: the curve y = cos x touches the line y = 1 at the points (−2π, 1), (0, 1), (2π, 1), and so on.]


Our above examples involved only two variables. Here are some examples of systems of
equations with more than two variables:

Example 436. The following system of equations is a set of three equations, involving
three variables:

$$y = 3x - 2 \quad (1), \qquad z = 7 - y \quad (2), \qquad \text{and} \qquad x = y + z \quad (3) \qquad (x, y, z \in \mathbb{R}).$$

Again, you already learnt to solve this system of equations in secondary school. Simply
plug (1) and (2) into (3) to get:

$$x = 3x - 2 + 7 - y = 3x + 5 - y \iff y = 2x + 5 \quad (4).$$

Next plug (4) into (1) to get: 2x + 5 = 3x − 2 ⇐⇒ 7 = x.

Now plug x = 7 into (1) to get y = 19. Then plug y = 19 into (2) to get z = −12.

We conclude that this system of equations has one solution: (7, 19, −12). Its solution
set is {(7, 19, −12)}.
To be extra clear that the ordered triple’s first co-ordinate is x, second is y, and third is
z, we can also write, “This system of equations has one solution: (x, y, z) = (7, 19, −12).”
By the way, (7, 19, −12) is our first example of an ordered triple. This is exactly
analogous to an ordered pair, except that now there are three coordinates. As you can
imagine, we also have ordered quadruples, ordered quintuples, and more generally
ordered n-tuples.179
When we study vectors in Part III, we’ll learn that it’s actually possible to graph the
above system of equations. (Spoiler: Our graph will be in 3-dimensional space.)
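By Definition-style reasoning, an ordered triple is a solution exactly when it satisfies every equation in the system. The short sketch below checks this by direct substitution; the function name `satisfies_system` and the second test triple are made up for illustration.

```python
def satisfies_system(x, y, z):
    # The three equations of Example 436, checked one by one.
    return y == 3 * x - 2 and z == 7 - y and x == y + z

print(satisfies_system(7, 19, -12))   # the ordered triple found in Example 436
print(satisfies_system(1, 1, 6))      # satisfies the first two equations only
```

The second triple shows why all equations must be checked: it passes the first two but fails x = y + z, so it is not a solution of the system.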

Example 437. Here is a system of two equations that involves three variables:

$$x = yz \quad (1) \qquad \text{and} \qquad z = 0 \quad (2) \qquad (x, y, z \in \mathbb{R}).$$

Equation (2) immediately tells us that any solution must have z = 0. Plugging (2) into (1), we then also
have x = 0. There are no restrictions on what y can be.

Altogether then, there are infinitely many solutions: the solution set is
{(x, y, z) ∶ x = 0, y ∈ R, z = 0} or more simply {(0, y, 0) ∶ y ∈ R}. That is, for any real
number y, the ordered triple (0, y, 0) solves the above system of equations.

In the context of a system of equations, here are the formal definitions of a solution and
the solution set:

Definition 81. Given a system of equations involving n variables, we call any ordered
n-tuple that satisfies the system of equations a solution. And we call the set of all such
ordered n-tuples its solution set.

See Definition 218 in the Appendices for the formal definition of an n-tuple.
Here’s a very typical example from the A-Level exams:

Example 438. Let a, b, c, x, y ∈ R. Suppose the equation $y = ax^2 + bx + c$ has solutions
(0, 1), (2, 3), and (4, 5). Then what are a, b, and c?
Here things may seem a little confusing. We speak of the equation $y = ax^2 + bx + c$ and
its solutions, with x and y being the variables.
Yet what we really want to do is to solve the following system of equations, where a, b,
and c are the variables and we’ve plugged in the given values of x and y:

$$1 = a \cdot 0^2 + b \cdot 0 + c = c, \quad (1)$$

$$3 = a \cdot 2^2 + b \cdot 2 + c = 4a + 2b + c, \quad (2)$$

$$5 = a \cdot 4^2 + b \cdot 4 + c = 16a + 4b + c. \quad (3)$$

As usual, this is a simple matter of algebra. Plug (1) into (2): 3 = 4a + 2b + 1 ⇐⇒ 1 − 2a = b (4).

Plug (1) and (4) into (3): 5 = 16a + 4 (1 − 2a) + 1 = 8a + 5 ⇐⇒ 8a = 0.

Hence, a = 0, b = 1, and c = 1. The above system of equations has the solution (a, b, c) =
(0, 1, 1) and the solution set {(0, 1, 1)}.
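The same substitution can be mechanised. The sketch below builds the three linear equations in a, b, and c from the given points and solves them by Gauss-Jordan elimination with exact fractions. The helper `solve_linear` is a hypothetical name and assumes the system has exactly one solution.

```python
from fractions import Fraction

def solve_linear(matrix, rhs):
    """Gauss-Jordan elimination with exact fractions (hypothetical helper).

    Assumes the system has exactly one solution."""
    n = len(rhs)
    rows = [[Fraction(v) for v in row] + [Fraction(b)] for row, b in zip(matrix, rhs)]
    for col in range(n):
        # Find a row with a non-zero entry in this column and move it up.
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        # Scale the pivot row so that its leading entry is 1.
        rows[col] = [v / rows[col][col] for v in rows[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = rows[r][col]
                rows[r] = [v - factor * p for v, p in zip(rows[r], rows[col])]
    return [row[-1] for row in rows]

points = [(0, 1), (2, 3), (4, 5)]
matrix = [[x * x, x, 1] for x, _ in points]   # coefficients of a, b, c
rhs = [y for _, y in points]
a, b, c = solve_linear(matrix, rhs)
print(a, b, c)
```

Swapping in a different list of three points (with distinct x-values) gives the coefficients of the quadratic through those points.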

Remark 44. Note that historically, the A-Level exams have never mentioned the concept
of a solution set.

Find a, b, and c in each of the following (a, b, c, x, y ∈ R).

Exercise 145. The equation $y = ax^2 + bx + c$ has solutions (x, y) = (1, 2) , (3, 5) , (6, 9).
(Answer on p. 1459.)
Exercise 146. The equation $y = ax^2 + bx + c$ has the solution (x, y) = (−1, 2) and the
strict global minimum point (0, 0). (Answer on p. 1459.)


You need to know how to use your graphing calculator to solve systems of equations. This
is particularly handy when we don’t know how to solve the given system of equations:

Example 439. Consider the following system of equations.

$y = x^4 - x^3 - 5$ and y = ln x (x > 0, y ∈ R).

We’ll solve this system using two methods.

Method 1. Graph both equations, then find the intersection points:
1. Enter the two equations (precise instructions omitted).
2. Press GRAPH to graph the two equations.
3. Press 2ND and TRACE to bring up the CALC (CALCULATE) menu.
4. Now press 5 to select the “intersect” function.
Your TI84 now asks, “First curve?” So now, simply:
5. Press ENTER to tell the TI84 that “$y_1 = x^4 - x^3 - 5$” is indeed our first curve.
The TI84 now asks you, “Second curve?” This time:
6. Use the left and right arrow keys ⟨ and ⟩ to move the cursor to roughly where you
think there is an intersection point. For me, I’ve moved it to (x, y) ≈ (1.702, 0.532).
7. Now hit ENTER . The TI84 will now ask you “Guess?”.
8. So hit ENTER again. After working furiously for a moment, the TI84 will inform you
that the intersection point is (x, y) ≈ (1.866, 0.624).
It looks, though, as if there might be another intersection point near x = 0. And so,
repeating the above steps but now moving the cursor leftwards (precise instructions and
screenshots omitted), you should find that there is another intersection point at (x, y) ≈
(0.00674, −5). Thus, the solutions are (1.865 . . . , 0.623 . . . ) and (0.00674 . . . , −5); and the
solution set is {(1.865 . . . , 0.623 . . . ) , (0.00674 . . . , −5)}.

[Screenshots: the TI84 display after each of Steps 1 to 8.]

Method 2. Combine the two equations into the single equation $y = x^4 - x^3 - 5 - \ln x$.
Graph it, then find the x-intercepts. Brief instructions:
1. Graph $y = x^4 - x^3 - 5 - \ln x$.
It looks like there’s an x-intercept near 2.
2. Using the “zero” function, we find that it’s at x ≈ 1.866.
It’s not obvious, but it looks like there might be another x-intercept near the origin.180
3. Using the “zero” function, we find there is indeed another x-intercept at x ≈ 0.00674.
With Method 2, we must then plug these two values of x into either of the original equations to find the corresponding values of y.
As before, we conclude: the solutions are (1.865 . . . , 0.623 . . . ) and (0.00674 . . . , −5); and
the solution set is {(1.865 . . . , 0.623 . . . ) , (0.00674 . . . , −5)}.
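Method 2 can be sketched in Python as follows: find the roots of the combined function by bisection, then plug each root into y = ln x. The two bracketing intervals are assumptions read off the graph, and `bisect` is a hypothetical helper, not a calculator function.

```python
import math

def h(x):
    # Difference of the two curves: x^4 - x^3 - 5 - ln x (defined for x > 0).
    return x**4 - x**3 - 5 - math.log(x)

def bisect(func, left, right, tol=1e-12):
    """Refine a root of func known to lie between left and right."""
    f_left = func(left)
    while right - left > tol:
        mid = (left + right) / 2
        f_mid = func(mid)
        if f_left * f_mid <= 0:
            right = mid
        else:
            left, f_left = mid, f_mid
    return (left + right) / 2

# Brackets read off the graph (assumed): h changes sign on each interval.
solutions = []
for left, right in [(0.001, 0.1), (1.5, 2.0)]:
    x = bisect(h, left, right)
    solutions.append((x, math.log(x)))   # plug x into y = ln x
print(solutions)
```

The first bracket sits very close to the origin, which is where the second intersection point is easy to overlook on the graph.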

[Screenshots: the TI84 display after each of Steps 1 to 3.]

Exercise 147. Use your GC to solve each system of equations.

(a) $y = \dfrac{1}{1 + \sqrt{x}}$, $y = x^5 - x^3 + 2$ (x, y ∈ R, x ≥ 0).
(b) $y = \dfrac{1}{1 - x^2}$, $y = x^3 + \sin x$ (x, y ∈ R, x ≠ ±1).
(c) $x^2 + y^2 = 1$, $y = \sin x$ (x, y ∈ R). (Answers on pp. 1459–1460.)

Indeed, I foolishly missed this second x-intercept in earlier versions of this textbook!
25.1. O-Level Review: Partial Fractions

Example 440. Consider the expression $\dfrac{1}{x^2 + 3x + 2}$.
We can rewrite or decompose this expression into partial fractions. First, observe
that $x^2 + 3x + 2 = (x + 1)(x + 2)$. Next, write:

$$\frac{1}{x^2 + 3x + 2} = \frac{1}{(x + 1)(x + 2)} = \frac{A}{x + 1} + \frac{B}{x + 2} = \frac{A(x + 2) + B(x + 1)}{(x + 1)(x + 2)} = \frac{(A + B)x + 2A + B}{(x + 1)(x + 2)}.$$

Comparing coefficients on the linear and constant terms, we have A + B = 0 and 2A + B = 1.
Hence, A = 1 and B = −1. Altogether then, we can decompose our original expression
into the following partial fractions:

$$\frac{1}{x^2 + 3x + 2} = \frac{1}{x + 1} - \frac{1}{x + 2}.$$

Example 441. Consider $\dfrac{5x - 3}{x^2 + 3x + 2}$. Again, $x^2 + 3x + 2 = (x + 1)(x + 2)$ and so write:

$$\frac{5x - 3}{x^2 + 3x + 2} = \frac{A}{x + 1} + \frac{B}{x + 2} = \frac{A(x + 2) + B(x + 1)}{(x + 1)(x + 2)} = \frac{(A + B)x + 2A + B}{(x + 1)(x + 2)}.$$

Comparing coefficients on the linear and constant terms, we have A + B = 5 and 2A + B = −3.
Hence, A = −8 and B = 13. Altogether then, we have:

$$\frac{5x - 3}{x^2 + 3x + 2} = -\frac{8}{x + 1} + \frac{13}{x + 2}.$$

Example 442. Consider $\dfrac{9x - 5}{-x^2 + 5x - 6}$.

First, observe that $-x^2 + 5x - 6 = -(x - 3)(x - 2)$. Next write:

$$\frac{9x - 5}{-x^2 + 5x - 6} = \frac{A}{-(x - 3)} + \frac{B}{x - 2} = \frac{A(x - 2) - B(x - 3)}{-(x - 3)(x - 2)} = \frac{(A - B)x - 2A + 3B}{-(x - 3)(x - 2)}.$$

Comparing coefficients on the linear and constant terms, we have A − B = 9 and −2A + 3B = −5.
Hence, A = 22 and B = 13. Altogether then, we have:

$$\frac{9x - 5}{-x^2 + 5x - 6} = \frac{22}{-(x - 3)} + \frac{13}{x - 2} = \frac{22}{3 - x} + \frac{13}{x - 2}.$$
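One way to double-check a decomposition is to compare both sides at many values of x using exact arithmetic. The sketch below does this for Examples 440 to 442, using the values of A and B found there. The helper name `agree` and the choice of sample points are assumptions; agreement at sample points is a sanity check rather than a proof.

```python
from fractions import Fraction

def agree(original, decomposed, samples):
    """True if both expressions give the same exact value at every sample x."""
    return all(original(x) == decomposed(x) for x in samples)

# Non-integer sample points, so that no denominator (zeros at -2, -1, 2, 3) vanishes.
samples = [Fraction(n, 7) for n in range(-30, 31) if n % 7 != 0]

checks = {
    "Example 440": agree(lambda x: 1 / (x**2 + 3*x + 2),
                         lambda x: 1 / (x + 1) - 1 / (x + 2), samples),
    "Example 441": agree(lambda x: (5*x - 3) / (x**2 + 3*x + 2),
                         lambda x: -8 / (x + 1) + 13 / (x + 2), samples),
    "Example 442": agree(lambda x: (9*x - 5) / (-x**2 + 5*x - 6),
                         lambda x: 22 / (3 - x) + 13 / (x - 2), samples),
}
print(checks)
```

A single wrong sign or coefficient in a decomposition makes the corresponding check come out `False`, which is a quick way to catch slips in the algebra.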


More generally, as per p. 14 of your A-Level syllabus, you are required to know how to
decompose partial fractions in “cases where the denominator is no more complicated than”:
$(ax + b)(cx + d)$, $(ax + b)(cx + d)^2$, or $(ax + b)(x^2 + c^2)$.
The lovely folks at MOE have kindly included the following on your List MF26:

Partial fractions decomposition

Non-repeated linear factors:

$$\frac{px + q}{(ax + b)(cx + d)} = \frac{A}{ax + b} + \frac{B}{cx + d}$$

Repeated linear factors:

$$\frac{px^2 + qx + r}{(ax + b)(cx + d)^2} = \frac{A}{ax + b} + \frac{B}{cx + d} + \frac{C}{(cx + d)^2}$$

Non-repeated quadratic factor:

$$\frac{px^2 + qx + r}{(ax + b)(x^2 + c^2)} = \frac{A}{ax + b} + \frac{Bx + C}{x^2 + c^2}$$

Two examples of “repeated linear factors”:

Example 443. Consider $\dfrac{x^2 + x + 1}{(x + 1)(x - 1)^2}$. Write:

$$\begin{aligned}
\frac{x^2 + x + 1}{(x + 1)(x - 1)^2} &= \frac{A}{x + 1} + \frac{B}{x - 1} + \frac{C}{(x - 1)^2} \\
&= \frac{A(x - 1)^2 + B(x + 1)(x - 1) + C(x + 1)}{(x + 1)(x - 1)^2} \\
&= \frac{A(x^2 - 2x + 1) + B(x^2 - 1) + C(x + 1)}{(x + 1)(x - 1)^2} \\
&= \frac{(A + B)x^2 + (-2A + C)x + A - B + C}{(x + 1)(x - 1)^2}.
\end{aligned}$$

Comparing coefficients, we have A + B = 1, −2A + C = 1, A − B + C = 1.

Summing these three equations, we get 2C = 3 or C = 3/2. Thus, A = 1/4 and B = 3/4.
Altogether, we have:

$$\frac{x^2 + x + 1}{(x + 1)(x - 1)^2} = \frac{1}{4(x + 1)} + \frac{3}{4(x - 1)} + \frac{3}{2(x - 1)^2}.$$


Example 444. Consider $\dfrac{3x^2 - x + 1}{(4x - 1)(x + 2)^2}$. Write:

$$\begin{aligned}
\frac{3x^2 - x + 1}{(4x - 1)(x + 2)^2} &= \frac{A}{4x - 1} + \frac{B}{x + 2} + \frac{C}{(x + 2)^2} \\
&= \frac{A(x + 2)^2 + B(4x - 1)(x + 2) + C(4x - 1)}{(4x - 1)(x + 2)^2} \\
&= \frac{A(x^2 + 4x + 4) + B(4x^2 + 7x - 2) + C(4x - 1)}{(4x - 1)(x + 2)^2} \\
&= \frac{(A + 4B)x^2 + (4A + 7B + 4C)x + 4A - 2B - C}{(4x - 1)(x + 2)^2}.
\end{aligned}$$

Comparing coefficients, we have A + 4B = 3 (1), 4A + 7B + 4C = −1 (2), and 4A − 2B − C = 1 (3).

From (1), we have A = 3 − 4B (4). Next, (2) minus (3) yields 9B + 5C = −2 or C = − (2 + 9B) /5 (5).

Now plug (4) and (5) into (3) to get 4 (3 − 4B) − 2B + (2 + 9B) /5 = 1 or B = 19/27.

And now we can also get A = 5/27 and C = −5/3. Altogether then, we have:

$$\frac{3x^2 - x + 1}{(4x - 1)(x + 2)^2} = \frac{5}{27(4x - 1)} + \frac{19}{27(x + 2)} - \frac{5}{3(x + 2)^2}.$$

Two examples of “non-repeated quadratic factors”:

Example 445. Consider $\dfrac{2x^2 + x + 1}{(5x - 1)(x^2 + 1)}$. Write:

$$\begin{aligned}
\frac{2x^2 + x + 1}{(5x - 1)(x^2 + 1)} &= \frac{A}{5x - 1} + \frac{Bx + C}{x^2 + 1} = \frac{A(x^2 + 1) + (Bx + C)(5x - 1)}{(5x - 1)(x^2 + 1)} \\
&= \frac{Ax^2 + A + 5Bx^2 + (5C - B)x - C}{(5x - 1)(x^2 + 1)} \\
&= \frac{(A + 5B)x^2 + (5C - B)x + A - C}{(5x - 1)(x^2 + 1)}.
\end{aligned}$$

Comparing coefficients, we have A + 5B = 2 (1), 5C − B = 1 (2), and A − C = 1 (3).

From (1), we have A = 2 − 5B (4). From (2), we have C = (B + 1) /5 (5). Plug (4) and (5) into (3) to get
2 − 5B − (B + 1) /5 = 1 or B = 2/13.
And now we can also get A = 16/13 and C = 3/13. Altogether then, we have:

$$\frac{2x^2 + x + 1}{(5x - 1)(x^2 + 1)} = \frac{16}{13(5x - 1)} + \frac{2x + 3}{13(x^2 + 1)}.$$


Example 446. Consider $\dfrac{4x^2 - 3x + 2}{(2x + 7)(x^2 + 9)}$. Write:

$$\begin{aligned}
\frac{4x^2 - 3x + 2}{(2x + 7)(x^2 + 9)} &= \frac{A}{2x + 7} + \frac{Bx + C}{x^2 + 9} = \frac{A(x^2 + 9) + (Bx + C)(2x + 7)}{(2x + 7)(x^2 + 9)} \\
&= \frac{Ax^2 + 9A + 2Bx^2 + (7B + 2C)x + 7C}{(2x + 7)(x^2 + 9)} \\
&= \frac{(A + 2B)x^2 + (7B + 2C)x + 9A + 7C}{(2x + 7)(x^2 + 9)}.
\end{aligned}$$

Comparing coefficients, we have A + 2B = 4 (1), 7B + 2C = −3 (2), and 9A + 7C = 2 (3).

From (1), we have A = 4 − 2B (4). From (2), we have C = − (7B + 3) /2 (5). Plug (4) and (5) into (3) to
get 9 (4 − 2B) − 7 (7B + 3) /2 = 2 or B = 47/85.

And now we can also get A = 246/85 and C = −292/85. Altogether then, we have:

$$\frac{4x^2 - 3x + 2}{(2x + 7)(x^2 + 9)} = \frac{246}{85(2x + 7)} + \frac{47x - 292}{85(x^2 + 9)}.$$
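The three equations obtained by comparing coefficients in Example 446 form a small linear system, so the tedious algebra can be cross-checked by machine. The sketch below uses Cramer's rule with exact fractions; this is a different technique from the substitution used in the example, and the helper names `det3` and `cramer3` are made up for illustration.

```python
from fractions import Fraction

def det3(m):
    # Determinant of a 3x3 matrix by cofactor expansion along the first row.
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def cramer3(m, rhs):
    """Solve a 3x3 linear system by Cramer's rule (assumes a non-zero determinant)."""
    d = Fraction(det3(m))
    solution = []
    for col in range(3):
        # Replace one column of the coefficient matrix by the right-hand side.
        replaced = [row[:col] + [rhs[r]] + row[col + 1:] for r, row in enumerate(m)]
        solution.append(det3(replaced) / d)
    return solution

# Example 446: A + 2B = 4, 7B + 2C = -3, 9A + 7C = 2.
A, B, C = cramer3([[1, 2, 0], [0, 7, 2], [9, 0, 7]], [4, -3, 2])
print(A, B, C)
```

Using `Fraction` keeps answers such as 246/85 exact, so they can be compared directly with the hand-computed values.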

Towkay Tip

As you can tell, partial fractions aren’t difficult — just a bunch of tedious algebra. So
the important thing is to go slowly and be really careful. Check and double-check that
you’ve got everything exactly correct at each step of the way. This will save you time
and marks, as compared to trying to do the algebra quickly and making a mistake.

Exercise 148. Decompose each of the following into partial fractions. (Hint: You may
need to factorise the denominators; read Ch. 21.1 if you haven’t already.)

(a) $\dfrac{8}{x^2 + x - 6}$. (b) $\dfrac{17x - 5}{3x^2 - 8x - 3}$. (c) $\dfrac{2x^2 - x + 7}{x^3 - x^2 - x + 1}$.

(d) $\dfrac{-3x^2 + 5}{x^3 - 2x^2 + 4x - 8}$. (Answers on pp. 1461–1462.)


26. Extraneous Solutions

26.1. Squaring

Example 447. We are asked to solve $x = \sqrt{2 - x}$ (x ≤ 2).

To do so, we try these steps:

1. Square both sides: $x^2 = 2 - x$.
2. Rearrange and factorise: $x^2 + x - 2 = (x - 1)(x + 2) = 0$.
3. Conclude: x = 1 or x = −2.

Observe that x = 1 does indeed solve the original equation: $1 = \sqrt{2 - 1} = \sqrt{1} = 1$.
But x = −2 does not: $-2 \neq \sqrt{2 - (-2)} = \sqrt{4} = 2$.
So, the above steps are wrong because they produce the extraneous solution x = −2.

Where was this extraneous solution introduced?

It turns out that this extraneous solution was introduced in Step 1, where we applied the
squaring operation.
To see this more clearly, let us be more explicit about our above chain of reasoning. In
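particular, let us use the logical operators ⇐⇒ (“is equivalent to”) and ⟹ (“implies”):

$$\begin{aligned}
& x = \sqrt{2 - x} \\
\overset{1}{\implies}\ & x^2 = 2 - x \\
\overset{2}{\iff}\ & (x - 1)(x + 2) = 0 \\
\overset{3}{\iff}\ & x = 1 \text{ or } x = -2.
\end{aligned}$$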

We now see clearly how Step 1 differs from Steps 2 and 3. Step 1 is a “ ⟹ ” statement,
while Steps 2 and 3 are “ ⇐⇒ ” statements. Or in plainer English, the squaring
operation in Step 1 is an irreversible operation.
It is generally true that: $a = b \implies a^2 = b^2$.
However, the converse is false. That is: $a^2 = b^2 \;\not\!\!\!\implies a = b$.
For example, $(-1)^2 = 1^2$, but −1 ≠ 1. So, squaring is an example of an irreversible
operation; it produces an “ ⟹ ” statement and not a “ ⇐⇒ ” statement.

Thus, all our above chain of reasoning does is to produce the following (true) implication:

$$x = \sqrt{2 - x} \implies x = 1 \text{ or } x = -2.$$

(If you’re puzzled as to why the above implication is true, review Ch. 3.2.)

We must now check, on a case-by-case basis, whether each of x = 1 and x = −2 solves
$x = \sqrt{2 - x}$. And when we do check, we find that x = 1 does, while x = −2 does not and is
thus an extraneous solution that must be discarded.
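The case-by-case check can be written out as a short Python sketch: take the candidates produced by the squared equation and keep only those that satisfy the original equation. The function name `solves_original` is an assumption made for illustration.

```python
import math

def solves_original(x):
    # The original equation x = sqrt(2 - x), which needs 2 - x >= 0.
    return 2 - x >= 0 and math.isclose(x, math.sqrt(2 - x))

candidates = [1, -2]          # roots of the squared equation x^2 = 2 - x
solutions = [x for x in candidates if solves_original(x)]
extraneous = [x for x in candidates if not solves_original(x)]
print(solutions, extraneous)
```

The candidate list comes from the irreversible step; the filter plays the role of the final check that the chain of reasoning leaves for us to do.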


The following informal Theorem is our moral of the story. It describes how, when, and
why extraneous solutions may arise:

Extraneous Solutions Theorem (very informal)

If our chain of reasoning contains only ⇐⇒ ’s, then all is well. However, if it contains
even one ⟹ (i.e. an irreversible step), then extraneous solutions may arise and we
must be careful to check for them.

Note the emphasis on the word may. Extraneous solutions may arise but might not.
The operation of squaring is merely one example of when extraneous solutions may be
introduced. Two other examples of operations that may do likewise are those of multiply-
ing by zero and removing logarithms:


26.2. Multiplying by Zero

Example 448. We are asked181 to solve:

$$\frac{x^2 - 3x}{x^2 - 1} + 2 + \frac{1}{x - 1} = 0.$$

We try these steps:

1. Multiply by $x^2 - 1$: $x^2 - 3x + 2(x^2 - 1) + (x + 1) = 0$.
2. Rearrange and factorise: $3x^2 - 2x - 1 = (x - 1)(3x + 1) = 0$.
3. Conclude: x = 1 or x = −1/3.

Exercise: Verify that x = −1/3 solves the original equation, while x = 1 does not.182

So, x = 1 is an extraneous solution and the above steps are wrong. Where lies the error?

Again, to clearly detect the error, let us write out our chain of reasoning more explicitly
with the aid of the logical relations ⟹ and ⇐⇒ :

$$\begin{aligned}
& \frac{x^2 - 3x}{x^2 - 1} + 2 + \frac{1}{x - 1} = 0 \\
\overset{1}{\implies}\ & x^2 - 3x + 2(x^2 - 1) + (x + 1) = 0 \\
\overset{2}{\iff}\ & 3x^2 - 2x - 1 = (x - 1)(3x + 1) = 0 \\
\overset{3}{\iff}\ & x = 1 \text{ or } x = -1/3.
\end{aligned}$$

Now, why is Step 1 an irreversible operation? The reason is that we’re multiplying by
some unknown quantity $x^2 - 1$ which might be equal to zero.

And multiplying by zero is an irreversible operation. In general:

$$y = z \implies 0 \cdot y = 0 \cdot z, \quad \text{but} \quad 0 \cdot y = 0 \cdot z \;\not\!\!\!\implies y = z.$$

For example: $1 = 1 \implies 0 \cdot 1 = 0 \cdot 1$, but $0 \cdot 2 = 0 \cdot 3 \;\not\!\!\!\implies 2 = 3$.

So, our above chain of reasoning yields the following (true) implication:

$$\frac{x^2 - 3x}{x^2 - 1} + 2 + \frac{1}{x - 1} = 0 \implies x = 1 \text{ or } x = -1/3.$$

We must then check, case-by-case, whether each of our final solutions actually solves the
original equation. In this example, we find that x = −1/3 solves the original equation, while x = 1 does not
and is an extraneous solution that must be discarded.
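The check can again be sketched in Python. Exact fractions are used so that "equals zero" is tested without rounding, and a division by zero is treated as "this candidate does not solve the equation". The names `lhs` and `solves_original` are assumptions made for illustration.

```python
from fractions import Fraction

def lhs(x):
    # Left-hand side of the original equation; a denominator is zero at x = 1 and x = -1.
    return (x**2 - 3*x) / (x**2 - 1) + 2 + 1 / (x - 1)

def solves_original(x):
    try:
        return lhs(x) == 0
    except ZeroDivisionError:
        return False      # some term is undefined, so x cannot be a solution

candidates = [Fraction(1), Fraction(-1, 3)]   # from (x - 1)(3x + 1) = 0
print([solves_original(x) for x in candidates])
```

The `ZeroDivisionError` branch is exactly the situation described above: the candidate x = 1 makes the multiplier x² − 1 equal to zero.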

The problem of multiplying by zero is thus dual to the problem of dividing by zero,
which we discussed earlier in Ch. 2.2. Dividing by zero may cause us to lose (valid)
solutions; while multiplying by zero may introduce extraneous (or invalid) solutions.
181 This example was stolen from Manning (1970) “A History of Extraneous Solutions”, p. 170.
182 Plug x = −1/3 into the original equation:

$$\frac{(-1/3)^2 - 3(-1/3)}{(-1/3)^2 - 1} + 2 + \frac{1}{(-1/3) - 1} = \frac{1/9 + 1}{1/9 - 1} + 2 - \frac{3}{4} = -\frac{5}{4} + 2 - \frac{3}{4} = 0.$$

In contrast, x = 1 does not solve the original equation because x − 1 = 0, so that some terms in it are undefined.
b a a


26.3. Removing Logs

Example 449. We are asked to solve log x + log (x + 1) = log (2x + 2). We try these steps:

1. Use a Logarithm Law: $\log x + \log(x + 1) = \log(x^2 + x) = \log(2x + 2)$.
2. Remove logs: $x^2 + x = 2x + 2$.
3. Rearrange and factorise: $x^2 - x - 2 = (x + 1)(x - 2) = 0$.
4. Conclude: x = −1 or x = 2.

Exercise: Verify that x = 2 solves the original equation but x = −1 does not.183

So, x = −1 is an extraneous solution and the above steps are wrong. Where lies the error?

Again, to clearly detect the error, let us write out our chain of reasoning more explicitly:

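$$\begin{aligned}
& \log x + \log(x + 1) = \log(2x + 2) \\
\overset{1}{\iff}\ & \log x + \log(x + 1) = \log(x^2 + x) = \log(2x + 2) \\
\overset{2}{\implies}\ & x^2 + x = 2x + 2 \\
\overset{3}{\iff}\ & x^2 - x - 2 = (x + 1)(x - 2) = 0 \\
\overset{4}{\iff}\ & x = -1 \text{ or } x = 2.
\end{aligned}$$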

This time, Step 2 is the irreversible step. In general, we have:

$$\log a = \log b \implies a = b, \quad \text{but} \quad a = b \;\not\!\!\!\implies \log a = \log b.$$

If a = b is non-positive, then log a and log b are undefined, so that log a = log b is necessarily false.
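In Python, `math.log` raises a `ValueError` for a non-positive argument, which mirrors "log a is undefined". The sketch below uses this to test the two candidates x = −1 and x = 2 against the original equation. Natural logarithms are used here as an assumption; the base does not affect which candidates pass. The function name `solves_original` is hypothetical.

```python
import math

def solves_original(x):
    try:
        left = math.log(x) + math.log(x + 1)
        right = math.log(2 * x + 2)
    except ValueError:
        return False      # a logarithm of a non-positive number is undefined
    return math.isclose(left, right)

candidates = [-1, 2]      # roots of x^2 + x = 2x + 2
print([solves_original(x) for x in candidates])
```

Only the candidate for which every logarithm in the original equation is defined can survive the check.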


Here in this brief chapter, we have merely examined three examples of operations by which
extraneous solutions may be introduced, namely squaring, multiplying by zero, and
removing logarithms. These are not exhaustive and you will likely encounter more of
such operations as your maths education progresses.
The important thing is to remember how and why extraneous solutions arise. In particular,
you should remember and understand the informal theorem given on p. 372.

183 Plug x = 2 into the original equation: log 2 + log (2 + 1) = log 2 + log 3 = log 6 = log (2 ⋅ 2 + 2).
In contrast, x = −1 does not solve the original equation because log (−1) is not even defined.


Exercise 149. Suppose we are given that x ∈ [0, 2π) and sin x + cos x = 1. We attempt
to solve this equation with the steps below. Identify any errors in these steps and give the correct
solution.184 (Answer on p. 1463.)

1. Square both sides: $(\sin x + \cos x)^2 = \sin^2 x + \cos^2 x + 2 \sin x \cos x = 1^2 = 1$.
2. Apply the identity $\sin^2 x + \cos^2 x = 1$ to get: 2 sin x cos x = 0.
3. Hence: sin x = 0 or cos x = 0.
4. Conclude: x = 0 or x = π or x = π/2 or x = 3π/2.

Exercise 150. (Tricky.)185 Suppose we are given that x ∈ R and $x^2\sqrt{x} + x^2 + \sqrt{x} + 1 = 0$.

We have the following, seemingly-watertight chain of reasoning:

$$\begin{aligned}
& x^2\sqrt{x} + x^2 + \sqrt{x} + 1 = 0 \\
\iff\ & (x^2 + 1)(\sqrt{x} + 1) = 0 \\
\iff\ & x^2 + 1 = 0 \text{ or } \sqrt{x} + 1 = 0 \\
\iff\ & \text{N.A. or } \sqrt{x} = -1 \\
\iff\ & x = 1.
\end{aligned}$$

Verify that x = 1 does not solve the original equation. Then identify any errors in the above chain of reasoning
and give the correct solution. (Answer on p. 1463.)

Exercise 151. (Even trickier.)186 Suppose we are given that x ∈ R and $x^2 + x + 1 = 0$. We
have the following, seemingly-watertight chain of reasoning:

$x^2 + x + 1 = 0$. (1)

⇐⇒ Rearrange: $x^2 = -x - 1$. (2)

⇐⇒ Since x = 0 doesn’t solve (1) anyway, we know that x ≠ 0.
So, we can divide (2) by x to get: $x = -1 - \dfrac{1}{x}$. (3)

⇐⇒ Now plug (3) into (1) to get: $x^2 + \left(-1 - \dfrac{1}{x}\right) + 1 = 0$. (4)

⇐⇒ x = 1. (5)

Verify that x = 1 solves (4) but not (1). Then identify any errors in the above chain of
reasoning and give the correct solution. (Answer on p. 1463.)

Stolen from Sullivan (Precalculus, 10e, 2017, p. 519), hat tip to .
Adapted from .
Stolen from .
Part II.
Sequences and Series


[T]here is nothing as dreamy and poetic, nothing as radical, subversive, and
psychedelic, as mathematics.

— Paul Lockhart (2002, 2009).


27. Sequences
Informally, a sequence is simply an ordered list of real numbers.187

Example 450. The Fibonacci sequence is:

(1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, . . . )

We call the numbers in a sequence its terms.

So the Fibonacci sequence’s 1st term is 1, 2nd is 1, 3rd is 2, etc.
The first two terms of the Fibonacci sequence are fixed as 1. Each subsequent term is
then simply the sum of the previous two terms. So:

the 3rd term is 1+1 = 2; the 7th term is 5 + 8 = 13;

the 4th term is 1+2 = 3; the 8th term is 8 + 13 = 21;
the 5th term is 2+3 = 5; the 9th term is 13 + 21 = 34;
the 6th term is 3+5 = 8; etc.

Letting f (n) denote the nth term, the Fibonacci sequence may be defined by:

$$f(n) = \begin{cases} 1 & \text{for } n = 1, 2, \\ f(n - 2) + f(n - 1) & \text{for } n \geq 3. \end{cases}$$
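The two-case definition translates almost line for line into code. The sketch below lists the first few terms; the function name `fibonacci_terms` is an assumption made for illustration.

```python
def fibonacci_terms(count):
    """Return the first `count` terms of the Fibonacci sequence as a list."""
    terms = []
    for n in range(1, count + 1):
        if n <= 2:
            terms.append(1)                       # f(1) = f(2) = 1
        else:
            terms.append(terms[-2] + terms[-1])   # f(n) = f(n - 2) + f(n - 1)
    return terms

print(fibonacci_terms(11))
```

Calling `fibonacci_terms(11)` reproduces the eleven terms listed at the start of Example 450.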

Example 451. The sequence of square numbers is:

(1, 4, 9, 16, 25, 36, 49, 64, . . . )

This sequence’s 1st term is 1, 2nd is 4, 3rd term is 9, etc.

Letting s (n) denote the nth term, this sequence may be defined by $s(n) = n^2$.

Example 452. The sequence of triangular numbers is:

(1, 3, 6, 10, 15, 21, 28, 36, . . . )

This sequence’s 1st term is 1, 2nd is 3, 3rd term is 6, etc.

Letting t (n) denote the nth term, this sequence may be defined by t(n) = 1 + 2 + ⋅ ⋅ ⋅ + n.

Remark 45. For clarity, it is wise and indeed customary to enclose a sequence in paren-
theses.188 And so, even though your A-Level exams do not, I will insist on doing so.

In H2 Maths, we’ll look only at sequences of real numbers. But in general, the objects in a sequence
need not be real numbers and could be any objects whatsoever.
Note though that some writers prefer using braces {} or angle brackets ⟨⟩.
The above sequences were infinite. But of course, sequences can also be finite:

Example 453. The finite sequence of the first six Fibonacci numbers is:

(1, 1, 2, 3, 5, 8) .

We call this a finite sequence of length 6.

Example 454. The finite sequence of the first seven square numbers is:

(1, 4, 9, 16, 25, 36, 49) .

We call this a finite sequence of length 7.

Example 455. The finite sequence of the first four triangular numbers is:

(1, 3, 6, 10) .

We call this a finite sequence of length 4.

Remark 46. We’ll generally be more interested in infinite sequences than finite sequences.
And so, when we simply say sequence, it may be assumed that we’re talking about an
infinite sequence. And when we want to talk about a finite sequence, we’ll clearly and
explicitly include the word finite.


27.1. Sequences Are Functions
A little more formally, sequences are functions:

Definition 82. A finite sequence of length k is a function with domain {1, 2, . . . , k}.

Definition 83. An (infinite) sequence is a function with domain Z+ = {1, 2, 3, . . . }.

Remark 47. For H2 Maths, the objects in a sequence will always be real numbers. But
in general, they can be anything whatsoever.189

We now formally rewrite our earlier examples of sequences as functions:

Example 456. Formally, the Fibonacci sequence is the function f ∶ Z+ → R defined by:

$$f(n) = \begin{cases} 1 & \text{for } n = 1, 2, \\ f(n - 2) + f(n - 1) & \text{for } n \geq 3. \end{cases}$$

Example 457. Formally, the sequence of square numbers is the function s ∶ Z+ → R
defined by $s(n) = n^2$.

Example 458. Formally, the sequence of triangular numbers is the function t ∶ Z+ → R

defined by t(n) = 1 + 2 + ⋅ ⋅ ⋅ + n.

Example 459. Formally, the finite sequence of the first six Fibonacci numbers is the
function $f_6$ ∶ {1, 2, 3, 4, 5, 6} → R defined by:

$$f_6(n) = \begin{cases} 1 & \text{for } n = 1, 2, \\ f_6(n - 2) + f_6(n - 1) & \text{for } n = 3, 4, 5, 6. \end{cases}$$

Example 460. Formally, the finite sequence of the first seven square numbers is the
function $s_7$ ∶ {1, 2, 3, 4, 5, 6, 7} → R defined by $s_7(n) = n^2$.

Example 461. Formally, the finite sequence of the first four triangular numbers is the
function $t_4$ ∶ {1, 2, 3, 4} → R defined by $t_4(n) = 1 + 2 + \dots + n$.

Remark 48. As repeatedly emphasised, the letter x we often use with functions is merely
a dummy or placeholder variable that can be replaced by any other.
Indeed, in the context of sequences, we’ll often prefer using n rather than x as our dummy variable.

More examples:

In other words, for H2 Maths, the codomain of a sequence (which is a function) will always be R. But
in general, it can be any set whatsoever.
Example 462. The function e ∶ Z+ → R with mapping rule e(n) = 2n is the sequence of
(positive) even numbers (2, 4, 6, 8, 10, 12, . . . ).

Example 463. The function g ∶ Z+ → R with mapping rule $g(n) = 2n^2 - 3n + 3$ is the
following sequence: (2, 5, 12, 23, 38, 57, 80, 107, 138, 173, . . . ).
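Since a sequence is just a function on the positive integers, its terms can be listed by evaluating that function at n = 1, 2, 3, and so on. The sketch below does this for the mapping rules of Examples 462 and 463; the helper name `first_terms` is an assumption made for illustration.

```python
def e(n):
    return 2 * n                  # Example 462: the positive even numbers

def g(n):
    return 2 * n**2 - 3 * n + 3   # Example 463

def first_terms(sequence, count):
    """List the values of a sequence at n = 1, 2, ..., count."""
    return [sequence(n) for n in range(1, count + 1)]

print(first_terms(e, 6))
print(first_terms(g, 10))
```

Only finitely many terms can ever be printed, of course; the infinite sequence itself is the function, not any particular list of its values.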

Recall that a function need not “follow any formula” or “make any sense”. The same is
true of sequences (since sequences are simply functions):

Example 464. The function h ∶ {1, 2, 3, 4} → {Cow, Chicken} is defined by:

h(1) = Cow, h(2) = Cow, h(3) = Chicken, and h(4) = Cow.

Although h does not seem to “make any sense”, it is a (perfectly) well-defined function.
Indeed, the function h is also the following finite sequence of length 4:

(Cow, Cow, Chicken, Cow).

Although this sequence does not seem to “make any sense”, it is a (perfectly) well-defined
finite sequence of length 4, simply because it is a function with domain {1, 2, 3, 4}.

Example 465. The function j ∶ {1, 2, 3} → {↑, ↓, →, ←, Punch, Kick} is defined by:

j(1) = ↓, j(2) = →, and j(3) = Punch

Although j does not seem to “make any sense”, it is a (perfectly) well-defined function.
Indeed, the function j is also the following finite sequence of length 3:

(↓, →, Punch).

Although this sequence does not seem to “make any sense”, it is a (perfectly) well-defined
finite sequence of length 3, simply because it is a function with domain {1, 2, 3}.

Exercise 152. Formally rewrite each sequence as a function. (Answer on p. 1464.)

(a) (1, 4, 9, 16, 25, 36, 49, 64, 81, 100).
(b) (2, 5, 8, 11, 14, 17, 20, . . . , 299).
(c) (1, 8, 27, 64, 125, 216, 343).
(d) (2, 6, 6, 12, 10, 18, 14, 24, 18, 30, 22, 36, 26, 42, . . . ).
(e) (5, 0, 99).
(f) (1, 2, 6, 24, 120, 720, 5 040, 40 320, . . . ).
(g) (1, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 1, . . . ).


27.2. Notation
Let (a1 , a2 , . . . , ak ) be a finite sequence of length k. For convenience, this sequence can also
be denoted as any of the following:

$$(a_n)_{n=1}^{k} \quad\text{or}\quad (a_n)_{n=1,2,\dots,k} \quad\text{or}\quad (a_n)_{n\in\{1,2,\dots,k\}} \quad\text{or}\quad (a_n)_{1\le n\le k}.$$


Example 466. Let s1 = 1, s2 = 4, s3 = 9, s4 = 16, and s5 = 25. Then we can denote the
finite sequence of the first five square numbers by:

$$(s_1, s_2, s_3, s_4, s_5) = (s_n)_{n=1}^{5} = (s_n)_{n=1,2,3,4,5} = (s_n)_{n\in\{1,2,3,4,5\}} = (s_n)_{1\le n\le 5}.$$


Again, s and n are merely dummy or placeholder variables. And here, n in particular may
also be called an index variable, because it indexes or indicates which term in the
sequence we’re referring to.
We could replace s or n with any other symbol, like ☺ or ⋆. And so, we could rewrite the
above example as:

Example 467. Let $☺_1 = 1$, $☺_2 = 4$, $☺_3 = 9$, $☺_4 = 16$, and $☺_5 = 25$. Then we can also
denote the finite sequence of the first five square numbers by:

$$(☺_1, ☺_2, ☺_3, ☺_4, ☺_5) = (☺_⋆)_{⋆=1}^{5} = (☺_⋆)_{⋆=1,2,3,4,5} = (☺_⋆)_{⋆\in\{1,2,3,4,5\}} = (☺_⋆)_{1\le ⋆\le 5}.$$

Of course, it’s a bit strange to use symbols like ☺ or ⋆. The point here is simply to illustrate
that once again, these are mere symbols that can be replaced by any other. We’ll usually
stick to using boring symbols like letters from the Latin alphabet.

Next, let (a1 , a2 , . . . ) be an (infinite) sequence. For convenience, this sequence can also be
denoted as any of the following:

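$$(a_n)_{n=1}^{\infty} \quad\text{or}\quad (a_n)_{n=1,2,\dots} \quad\text{or}\quad (a_n)_{n\in\mathbb{Z}^+} \quad\text{or}\quad (a_n)_{n\in\{1,2,\dots\}} \quad\text{or}\quad (a_n).$$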

Example 468. Let t1 = 1, t2 = 3, t3 = 6, t4 = 10, t5 = 15, etc. Then we can denote the
(infinite) sequence of triangular numbers by:

$$(t_1, t_2, \dots) = (t_n)_{n=1}^{\infty} = (t_n)_{n=1,2,\dots} = (t_n)_{n\in\mathbb{Z}^+} = (t_n)_{n\in\{1,2,\dots\}} = (t_n).$$

Again, we can replace t and n with any other symbols:

Example 469. Let $☺_1 = 1$, $☺_2 = 3$, $☺_3 = 6$, $☺_4 = 10$, $☺_5 = 15$, etc. Then we can also
denote the (infinite) sequence of triangular numbers by:

$$(☺_1, ☺_2, \dots) = (☺_⋆)_{⋆=1}^{\infty} = (☺_⋆)_{⋆=1,2,\dots} = (☺_⋆)_{⋆\in\mathbb{Z}^+} = (☺_⋆)_{⋆\in\{1,2,\dots\}} = (☺_⋆).$$


27.3. Arithmetic Combinations of Sequences
In Ch. 13, we learnt to create arithmetic combinations of functions. Since sequences are
functions, we can likewise create arithmetic combinations of sequences:

Example 470. Let (an ) = (1, 1, 2, 3, 5, 8, 13, 21, 34, . . . ) be the Fibonacci sequence. Let
(bn ) = (2, 4, 6, 8, 10, 12, 14, 16, 18, . . . ) be the sequence of even numbers. Let k = 10. Then:

(an + bn ) = (3, 5, 8, 11, 15, 20, 27, 37, 52, . . . ),

(an − bn ) = (−1, −3, −4, −5, −5, −4, −1, 5, 16, . . . ),

(an bn ) = (2, 4, 12, 24, 50, 96, 182, 336, 612, . . . ),

$$\left(\frac{a_n}{b_n}\right) = \left(\frac{1}{2}, \frac{1}{4}, \frac{1}{3}, \frac{3}{8}, \frac{1}{2}, \frac{2}{3}, \frac{13}{14}, \frac{21}{16}, \frac{17}{9}, \dots\right),$$

(kan ) = (10, 10, 20, 30, 50, 80, 130, 210, 340, . . . ),

(kbn ) = (20, 40, 60, 80, 100, 120, 140, 160, 180, . . . ).
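
Arithmetic combinations of sequences are computed term by term, which is easy to check by machine. Here is a small Python sketch of Example 470 using only the first nine terms. The variable names (total, difference, and so on) are our own choices, and Fraction is used so that the quotients stay exact.

```python
from fractions import Fraction

a = [1, 1, 2, 3, 5, 8, 13, 21, 34]     # first nine Fibonacci numbers
b = [2, 4, 6, 8, 10, 12, 14, 16, 18]   # first nine positive even numbers
k = 10

total = [x + y for x, y in zip(a, b)]            # (a_n + b_n)
difference = [x - y for x, y in zip(a, b)]       # (a_n - b_n)
product = [x * y for x, y in zip(a, b)]          # (a_n b_n)
quotient = [Fraction(x, y) for x, y in zip(a, b)]  # (a_n / b_n), in lowest terms
scaled_a = [k * x for x in a]                    # (k a_n)
scaled_b = [k * y for y in b]                    # (k b_n)

print(total)       # [3, 5, 8, 11, 15, 20, 27, 37, 52]
print(difference)  # [-1, -3, -4, -5, -5, -4, -1, 5, 16]
print(product)     # [2, 4, 12, 24, 50, 96, 182, 336, 612]
print([str(q) for q in quotient])
```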

Exercise 153. Let (cn ) be the sequence of negative odd numbers, (dn ) the sequence of
cube numbers, and k = 2. Write out the first five terms of (a) (cn ); and (b) (dn ). Then
write out the first five terms of each of the following. (Answer on p. 1464.)

(c) (cn + dn ) (d) (cn − bn ) (e) (cn dn )

(f) (cn /dn ) (g) (kcn ) (h) (kdn )



28. Series

Definition 84. Given a finite sequence (an )n∈{1,2,...,k} , its series is the expression

a1 + a2 + a3 + ⋅ ⋅ ⋅ + ak .

And the sum of this series is the number that equals the above expression.

Example 471. Consider the finite sequence of the first five square numbers:
(1, 4, 9, 16, 25). Its series is the expression 1 + 4 + 9 + 16 + 25, while the sum of this
series is the number 55.

Example 472. Consider the finite sequence of the first six even numbers:
(2, 4, 6, 8, 10, 12). Its series is the expression 2 + 4 + 6 + 8 + 10 + 12, while the sum of
this series is the number 42.

It may seem strange and unnecessary to distinguish between a series and its sum. Aren’t
they exactly the same thing?
It turns out that expressions like a1 + a2 + a3 + ⋅ ⋅ ⋅ + ak play an important role in maths. And
so, we want to reserve a special name for the expression itself, in order to distinguish it
from the number that is the sum of the series.

Example 473. Given the sequence (1, 3, 5, 7), we might be specifically interested in the
expression 1 + 3 + 5 + 7, rather than just the number 16.
It is thus convenient to have separate names for them — we call the expression 1+3+5+7
the series and the number 16 the sum (of the series).


28.1. Convergent and Divergent Series

Definition 85. Given an (infinite) sequence (an ), its series is the expression

a1 + a2 + a3 + . . .

As we saw on the previous page, every finite series has a well-defined sum — simply add
up all the numbers!
With an infinite series, things get a little trickier. It may sometimes be that an infinite
series diverges and its limit does not exist.

Example 474. Let (1, 1, 1, 1, 1, . . . ) be the (infinite) sequence that consists solely of 1s.
Its series is the expression 1 + 1 + 1 + 1 + 1 + . . .
“Clearly”, this expression is not equal to any number. And so formally, we say that this
series diverges and that its limit does not exist.
Also, observe that this expression “grows ever larger”. As shorthand, we write:190

1 + 1 + 1 + 1 + 1 + ⋅ ⋅ ⋅ = ∞.

Remark 49. As was the case with sequences, we’ll generally be more interested in infinite
series than finite series. And so, when we simply say series, it may safely be assumed that
we’re talking about an infinite series. And when we want to talk about a finite series,
we’ll clearly and explicitly include the word finite.

Example 475. Let (2, 4, 6, 8, 10, . . . ) be the sequence of (positive) even numbers.
Its series is the expression 2 + 4 + 6 + 8 + 10 + . . .
“Clearly”, this expression is not equal to any number. And so formally, we say that this
series diverges and that its limit does not exist.
Also, observe that this expression “grows ever larger”. As shorthand, we write:

2 + 4 + 6 + 8 + 10 + ⋅ ⋅ ⋅ = ∞.

Pedantic point: this “equation” is not really an equation. Instead, it is merely shorthand for the
following statement:

“The expression 1 + 1 + 1 + 1 + 1 + . . . grows ever larger.”

“Grows ever larger” is, in turn, a vague and informal phrase that we clarify only in Ch. 118.1 of the
Appendices.
We just looked at two examples of series that diverge. We now look at examples of series
that converge to some limit:

Example 476. Consider the zero sequence (0, 0, 0, 0, 0, . . . ).

Its series is the expression 0 + 0 + 0 + 0 + 0 + . . .
This series converges to 0. We call 0 its limit. And as shorthand, we write:

0 + 0 + 0 + 0 + 0 + ⋅ ⋅ ⋅ = 0.

Note that what was called the sum (in the previous context of finite series) is now called
the limit (in the current context of infinite series).

Example 477. Consider the sequence $\left(\frac{1}{2}, \frac{1}{4}, \frac{1}{8}, \frac{1}{16}, \frac{1}{32}, \dots\right)$.
Its series is the expression $\frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \frac{1}{16} + \frac{1}{32} + \dots$
As we’ll soon learn, it turns out that:

$$\frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \frac{1}{16} + \frac{1}{32} + \dots = 1.$$

That is, this series converges to 1. (And we call 1 the limit of this series.)
Here we should remark that whenever we are dealing with infinite series, we must be very
careful. Here the = sign in the above equation is not the usual one. Instead, the above
equation is merely shorthand for “the expression $\frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \frac{1}{16} + \frac{1}{32} + \dots$ converges
to the number 1”. In H2 Maths, we shall simply count on your intuitive and imprecise
understanding of what the phrase converges to means, but you should know that it does
actually have a clear and precise meaning (on this, see Ch. 118.1 in the Appendices).

Example 478. Consider the sequence of reciprocals of squares $\left(\frac{1}{1^2}, \frac{1}{2^2}, \frac{1}{3^2}, \dots\right)$.
Its series is the expression $\frac{1}{1^2} + \frac{1}{2^2} + \frac{1}{3^2} + \dots = 1 + \frac{1}{4} + \frac{1}{9} + \dots$
It’s not at all obvious, but it turns out that:

$$\frac{1}{1^2} + \frac{1}{2^2} + \frac{1}{3^2} + \dots = 1 + \frac{1}{4} + \frac{1}{9} + \dots = \frac{\pi^2}{6}.$$

That is, this series converges to π²/6. (And we call π²/6 the limit of this series.)
The problem of finding the sum of this series is among the more famous problems in the
history of mathematics and is known as the Basel Problem. We will revisit this problem
in Ch. XXX.


Now, when does a series converge or diverge? Or equivalently, when does its limit exist?
The precise definitions of these terms are beyond the scope of H2 Maths.191 For H2 Maths,
you need only know — roughly and intuitively — what convergence, divergence, and
limits are.
It turns out that what exactly these terms mean is no simple matter. On the next page are
two fun examples to give you a glimpse of the difficulties involved:

Example 479. Consider Grandi’s series:192

1 − 1 + 1 − 1 + 1 − 1 + ...

Does this series converge or diverge? Or equivalently, does its limit exist?
Remarkably, we can “prove” that this series is equal to 0, 1, and 1/2.
• To “prove” that it equals 0, pair off the terms like so:

$$1 - 1 + 1 - 1 + 1 - 1 + \dots = \underbrace{(1 - 1)}_{0} + \underbrace{(1 - 1)}_{0} + \underbrace{(1 - 1)}_{0} + \dots = 0 + 0 + 0 + \dots = 0.$$

• To “prove” that it equals 1, pair off the terms after the first, like so:

$$1 - 1 + 1 - 1 + 1 - 1 + \dots = 1 + \underbrace{(-1 + 1)}_{0} + \underbrace{(-1 + 1)}_{0} + \underbrace{(-1 + 1)}_{0} + \dots = 1 + 0 + 0 + 0 + \dots = 1.$$

• To “prove” that it equals 1/2, let S = 1 − 1 + 1 − 1 + 1 − 1 + . . . , then “show” that 1 − S = S:

1 − S = 1 − (1 − 1 + 1 − 1 + 1 − 1 + . . . ) = 1 − 1 + 1 − 1 + 1 − 1 + ⋅ ⋅ ⋅ = S.

Since 1 − S = S, simple algebra yields 1 = 2S or S = 1/2.

We’ve just “proven” that the expression 1 − 1 + 1 − 1 + 1 − 1 + . . . equals 0, 1, and 1/2.
Well, which is it then? It turns out that the series 1 − 1 + 1 − 1 + 1 − 1 + . . . is not equal to
0, 1, or 1/2. Instead, it is divergent and its limit does not exist.193
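
One way to get a feel for why Grandi’s series has no limit is to look at its partial sums, i.e. the running totals after 1 term, 2 terms, 3 terms, and so on. (Convergence is usually defined in terms of these partial sums, though the precise definition is left to the Appendices.) The following Python sketch is only an illustration and the variable names are our own.

```python
from itertools import accumulate, cycle, islice

# First ten terms of Grandi's series: 1, -1, 1, -1, ...
terms = list(islice(cycle([1, -1]), 10))

# Running totals: 1, 1 - 1, 1 - 1 + 1, ...
partial_sums = list(accumulate(terms))

print(terms)         # [1, -1, 1, -1, 1, -1, 1, -1, 1, -1]
print(partial_sums)  # [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
```

The running totals hop between 1 and 0 forever and never settle on a single number, which is the informal reason the series is divergent.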

Here is an example of a series whose convergence is unknown:

But if you’re interested, see the brief discussion in Ch. 118.1 (Appendices).
The Italian priest-mathematician Luigi Guido Grandi (1671–1742) thought he had proved that the sum
of this series equalled both 0 and 1/2, and thus that “God could create the world out of nothing”.
We prove this in Example 1224 in the Appendices.
Example 480. Consider the following series:
$$\frac{1}{2} - \frac{2}{3} + \frac{3}{5} - \frac{4}{7} + \frac{5}{11} - \frac{6}{13} + \frac{7}{17} - \frac{8}{19} + \frac{9}{23} - \frac{10}{29} + \dots$$
The terms are fractions, with the numerators being the (positive) integers and the denom-
inators being the prime numbers. This series looks “simple” enough. But remarkably,
mathematicians still do not know whether it converges or diverges!194

This problem is listed as equation (8) on this Wolfram Mathworld page and as Problem E7 in Richard
Guy’s Unsolved Problems in Number Theory (3e, p. 316), where it is attributed to Paul Erdős.
29. Summation Notation ∑

The symbol Σ is the upper-case Greek letter sigma.195

An enlarged version of that symbol is the symbol ∑, which is read aloud as “sum”. We
use the symbol ∑ to write down series more compactly, in what is called summation or
sigma notation.

Example 481. $\sum_{n=1}^{5} n^2 = 1^2 + 2^2 + 3^2 + 4^2 + 5^2 = 1 + 4 + 9 + 16 + 25$.

• The variable n is the dummy, placeholder, or index variable. (We can replace it with
any other symbol without changing the meaning of the above sentence.)
• The integer below ∑ is the starting point. So here, 1 tells us to start counting (the
index variable n) from n = 1.
• The integer above ∑ is the stopping point. So here, 5 tells us to stop at n = 5.
• The expression to the right of ∑ describes the nth term to be added up. So here, the
nth term to be added up is $n^2$.
Altogether then, $\sum_{n=1}^{5} n^2$ tells us to add up the terms $1^2$, $2^2$, $3^2$, $4^2$, and $5^2$.

Example 482. $\sum_{n=1}^{3} (2n + 3) = (2 \cdot 1 + 3) + (2 \cdot 2 + 3) + (2 \cdot 3 + 3) = 5 + 7 + 9 = 21$.

Example 483. $\sum_{n=1}^{4} n = 1 + 2 + 3 + 4 = 10$.

Example 484. $\sum_{n=1}^{6} 2n = 2 \cdot 1 + 2 \cdot 2 + 2 \cdot 3 + 2 \cdot 4 + 2 \cdot 5 + 2 \cdot 6 = 2 + 4 + 6 + 8 + 10 + 12 = 42$.

Example 485. $\sum_{n=1}^{7} 2^n = 2^1 + 2^2 + 2^3 + 2^4 + 2^5 + 2^6 + 2^7 = 2 + 4 + 8 + 16 + 32 + 64 + 128 = 254$.

Example 486. $\sum_{n=1}^{5} 1 = 1 + 1 + 1 + 1 + 1 = 5$. Here each term to be added up is simply the
constant 1. And so, $\sum_{n=1}^{5} 1$ is simply the sum of five 1s.

Example 487. $\sum_{n=1}^{3} (10 - 2n) = (10 - 2 \cdot 1) + (10 - 2 \cdot 2) + (10 - 2 \cdot 3) = 8 + 6 + 4$.

The symbol σ is the lower-case Greek letter sigma.
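Example 488. $\sum_{n=1}^{4} \frac{1}{n} = \frac{1}{1} + \frac{1}{2} + \frac{1}{3} + \frac{1}{4}$.

Example 489. $\sum_{n=1}^{4} \frac{1}{(n+1)^2} = \frac{1}{(1+1)^2} + \frac{1}{(2+1)^2} + \frac{1}{(3+1)^2} + \frac{1}{(4+1)^2} = \frac{1}{4} + \frac{1}{9} + \frac{1}{16} + \frac{1}{25}$.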

Example 490. Let x ∈ R. Then:

$$\sum_{n=1}^{5} x^{n-1} = x^{1-1} + x^{2-1} + x^{3-1} + x^{4-1} + x^{5-1} = x^0 + x^1 + x^2 + x^3 + x^4 = 1 + x + x^2 + x^3 + x^4.$$

It’s nice to have 1 as the starting point and that’s what we’ll usually do. But there’s no
reason why the starting point must always be 1. Examples:

Example 491. Starting point 3:

$$\sum_{n=3}^{5} n^3 = 3^3 + 4^3 + 5^3 = 27 + 64 + 125 = 216.$$

Example 492. Starting point 0:

$$\sum_{n=0}^{4} \sqrt{n} = \sqrt{0} + \sqrt{1} + \sqrt{2} + \sqrt{3} + \sqrt{4} = 0 + 1 + \sqrt{2} + \sqrt{3} + 2.$$

The starting point can even be negative:

Example 493. Starting point −2:

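$$\sum_{n=-2}^{3} \frac{n}{4} = \frac{-2}{4} + \frac{-1}{4} + \frac{0}{4} + \frac{1}{4} + \frac{2}{4} + \frac{3}{4} = -\frac{1}{2} - \frac{1}{4} + 0 + \frac{1}{4} + \frac{1}{2} + \frac{3}{4} = \frac{3}{4}.$$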

Exercise 154. Rewrite each series in summation notation. (Answer on p. 1465.)

(a) 1 + 2 + 6 + 24 + 120 + 720 + 5 040.
(b) 2 + 5 + 8 + 11 + 14 + 17 + 20 + 23.
(c) 1/2 + 1 + 3/2 + 2 + 5/2 + 3 + 7/2.
(d) 8 + 7 + 6 + 5 + 4 + 3.

Exercise 155. Redo the above exercise, but now with starting point 0. (Answer on p.

Exercise 156. Find the sum of each series. (Observe that here the dummy or index
variables are not the usual n. Instead, they are i, ⋆, and x.) (Answer on p. 1465.)

$$\text{(a) } \sum_{i=1}^{4} (2 - i). \qquad \text{(b) } \sum_{⋆=16}^{17} (4⋆ + 5). \qquad \text{(c) } \sum_{x=31}^{33} (x - 3).$$


29.1. Summation Notation for Infinite Series

Example 494. In the following series, the “n = 1” below the ∑ symbol indicates that it
has starting point 1.

$$\sum_{n=1}^{\infty} \frac{1}{2^n} = \frac{1}{2^1} + \frac{1}{2^2} + \frac{1}{2^3} + \dots = \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \dots$$

The ∞ above the ∑ symbol indicates that there is no stopping point. This is thus an
infinite series.
As already mentioned and as we’ll soon learn, this series converges to 1. We may write:

$$\sum_{n=1}^{\infty} \frac{1}{2^n} = 1.$$

If the context makes it crystal clear what the starting and stopping points are, we will
sometimes be lazy/sloppy and omit them. And so here, we may also write:

$$\sum_{n=1}^{\infty} \frac{1}{2^n} = \sum \frac{1}{2^n} = 1.$$

Example 495. $\sum_{n=1}^{\infty} n = \sum n = 1 + 2 + 3 + \dots$

“Clearly”, this series diverges. We write: $\sum_{n=1}^{\infty} n = \sum n = \infty$.

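Example 496. $\sum_{n=1}^{\infty} \frac{1}{n^2} = \sum \frac{1}{n^2} = \frac{1}{1^2} + \frac{1}{2^2} + \frac{1}{3^2} + \dots = 1 + \frac{1}{4} + \frac{1}{9} + \dots$

As mentioned in Example 478, this series converges to π²/6: $\sum_{n=1}^{\infty} \frac{1}{n^2} = \sum \frac{1}{n^2} = \frac{\pi^2}{6}$.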

Example 497. $\sum_{n=1}^{\infty} nx^2 = \sum nx^2 = 1x^2 + 2x^2 + 3x^2 + 4x^2 + \dots$

This infinite series has starting point 1 and each term to be added up is $nx^2$.

Thus, $\sum nx^2$ is the sum of infinitely many terms, namely $x^2$, $2x^2$, $3x^2$, $4x^2$, . . .
By the way, this series diverges for all x ≠ 0. That is, for all x ≠ 0, we have:

$$\sum_{n=1}^{\infty} nx^2 = \sum nx^2 = \infty.$$

And “obviously”, if x = 0, then this series converges to 0:

$$\sum_{n=1}^{\infty} n \cdot 0^2 = \sum n \cdot 0^2 = 0.$$


Example 498. Consider the harmonic series:

$$\sum_{n=1}^{\infty} \frac{1}{n} = \sum \frac{1}{n} = \frac{1}{1} + \frac{1}{2} + \frac{1}{3} + \dots$$

We have:

$$\sum_{n=1}^{1} \frac{1}{n} = \frac{1}{1} = 1, \qquad \sum_{n=1}^{2} \frac{1}{n} = \frac{1}{1} + \frac{1}{2} = 1.5, \qquad \sum_{n=1}^{3} \frac{1}{n} = \frac{1}{1} + \frac{1}{2} + \frac{1}{3} = 1.83\ldots,$$

$$\sum_{n=1}^{100} \frac{1}{n} = 5.187\ldots, \qquad \sum_{n=1}^{200} \frac{1}{n} = 5.878\ldots, \qquad \sum_{n=1}^{300} \frac{1}{n} = 6.282\ldots,$$

$$\sum_{n=1}^{1\,000} \frac{1}{n} = 7.485\ldots, \qquad \sum_{n=1}^{10^4} \frac{1}{n} = 9.787\ldots, \qquad \sum_{n=1}^{10^5} \frac{1}{n} = 12.090\ldots.$$

Does the harmonic series converge or diverge? From the above, it’s not obvious.
It turns out that it diverges. Here’s a heuristic (i.e. not 100% rigorous) proof:
First, consider the series $1 + \frac{1}{2} + \frac{1}{2} + \frac{1}{2} + \dots$. “Clearly”, this series diverges.
As we show below, this series is “smaller than” the harmonic series.196
Thus, the harmonic series must “clearly” also diverge:

$$\begin{aligned}
1 + \frac{1}{2} + \frac{1}{2} + \frac{1}{2} + \frac{1}{2} + \dots
&= \frac{1}{1} + \frac{1}{2} + \underbrace{\frac{1}{4} + \frac{1}{4}}_{1/2} + \underbrace{\frac{1}{8} + \frac{1}{8} + \frac{1}{8} + \frac{1}{8}}_{1/2} + \underbrace{\frac{1}{16} + \frac{1}{16} + \frac{1}{16} + \frac{1}{16} + \frac{1}{16} + \frac{1}{16} + \frac{1}{16} + \frac{1}{16}}_{1/2} + \dots \\
&< \frac{1}{1} + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} + \frac{1}{5} + \frac{1}{6} + \frac{1}{7} + \frac{1}{8} + \frac{1}{9} + \frac{1}{10} + \frac{1}{11} + \frac{1}{12} + \frac{1}{13} + \frac{1}{14} + \frac{1}{15} + \frac{1}{16} + \dots \\
&= \sum_{n=1}^{\infty} \frac{1}{n}.
\end{aligned}$$
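
The partial sums quoted in this example are easy to reproduce numerically. The Python sketch below is only an illustration (the function name harmonic is our own). It also checks the grouping idea behind the heuristic proof: the sum of the first $2^m$ terms is at least $1 + m/2$, and $1 + m/2$ can be made as large as we like.

```python
import math


def harmonic(k):
    """Partial sum 1/1 + 1/2 + ... + 1/k, summed carefully in floating point."""
    return math.fsum(1 / n for n in range(1, k + 1))


for k in (1, 2, 3, 100, 200, 300, 1000):
    print(k, harmonic(k))

# Grouping bound from the heuristic proof: the first 2**m terms sum to at least 1 + m/2.
for m in range(0, 11):
    assert harmonic(2 ** m) >= 1 + m / 2
```

Numerical evidence like this cannot prove divergence, since the partial sums grow extremely slowly. It merely agrees with the argument above.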

Again, here we must be careful to define what we mean by one infinite series being “smaller than”
another.
As before, it’s nice to have 1 as the starting point, but this need not always be so:

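Example 499. $\sum_{n=0}^{\infty} \frac{1}{2^n} = \frac{1}{2^0} + \frac{1}{2^1} + \frac{1}{2^2} + \frac{1}{2^3} + \dots = 1 + \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \dots$

By the way, since $\sum_{n=1}^{\infty} 1/2^n = 1$, it “clearly” follows that:

$$\sum_{n=0}^{\infty} \frac{1}{2^n} = 1 + \sum_{n=1}^{\infty} \frac{1}{2^n} = 1 + 1 = 2.$$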

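Example 500. $\sum_{n=-2}^{\infty} \frac{1}{n+5} = \frac{1}{-2+5} + \frac{1}{-1+5} + \frac{1}{0+5} + \frac{1}{1+5} + \dots = \frac{1}{3} + \frac{1}{4} + \frac{1}{5} + \frac{1}{6} + \dots$

By the way, since the harmonic series diverges (i.e. $\sum 1/n = \infty$), we have:

$$\sum_{n=-2}^{\infty} \frac{1}{n+5} = \frac{1}{3} + \frac{1}{4} + \frac{1}{5} + \frac{1}{6} + \dots = -\frac{1}{1} - \frac{1}{2} + \sum_{n=1}^{\infty} \frac{1}{n} = \infty.$$

That is, since the harmonic series diverges, this series also diverges.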

Exercise 157. Rewrite each series in summation notation. (Answer on p. 1465.)

(a) 1 + 2 + 6 + 24 + 120 + 720 + 5 040 + . . . .
(b) 2 + 5 + 8 + 11 + 14 + 17 + 20 + 23 + . . .
(c) 1/2 + 1 + 3/2 + 2 + 5/2 + 3 + 7/2 + . . .
(d) 8 + 7 + 6 + 5 + 4 + 3 + . . .

Exercise 158. Redo the above exercise, but now with starting point 0. (Answer on p.


30. Arithmetic Sequences and Series

Definition 86. An arithmetic sequence (or progression) is a sequence where the difference
between any two consecutive terms is constant. This constant difference, denoted d, is
called the common difference.

Example 501. Below are six arithmetic sequences. Those on the left are finite while
those on the right are infinite.

$$(a_n)_{n=1}^{5} = (1, 3, 5, 7, 9), \qquad (a_n) = (1, 3, 5, 7, 9, \dots),$$

$$(b_n)_{n=1}^{7} = (4, 7, 10, 13, 16, 19, 22), \qquad (b_n) = (4, 7, 10, 13, 16, 19, 22, \dots),$$

$$(c_n)_{n=1}^{3} = (0, \pi, 2\pi), \qquad (c_n) = (0, \pi, 2\pi, \dots).$$

In each sequence, the difference between any two consecutive terms is a constant.
In $(a_n)_{n=1}^{5}$ and $(a_n)$, the common difference is $d = 2$.
In $(b_n)_{n=1}^{7}$ and $(b_n)$, the common difference is $d = 3$.
In $(c_n)_{n=1}^{3}$ and $(c_n)$, the common difference is $d = \pi$.

Definition 87. Given an arithmetic sequence, its series is called an arithmetic series.

Example 502. In the above example, we gave six arithmetic sequences. Here are the six
corresponding arithmetic series:

1 + 3 + 5 + 7 + 9, 1 + 3 + 5 + 7 + 9 + ...,
4 + 7 + 10 + 13 + 16 + 19 + 22, 4 + 7 + 10 + 13 + 16 + 19 + 22 + . . . ,
0 + π + 2π, 0 + π + 2π + . . .

Fact 45. Let $(a_n)_{n=1}^{k}$ be a finite arithmetic sequence with common difference $d = a_2 - a_1$. Then:

(a) The nth term is: $a_n = a_1 + (n - 1)d$.

(b) The number of terms (if d ≠ 0) is: $k = \frac{a_k - a_1}{d} + 1$.

(c) $\sum_{n=1}^{k} a_n = \sum_{n=1}^{k} [a_1 + (n - 1)d]$.

Proof. (a) The common difference between any two consecutive terms is d. Since an is
(n − 1) terms “after” a1 , we must have an = a1 + (n − 1) d.
(b) By (a), $a_k = a_1 + (k - 1)d$. Rearranging, we have: $k = \frac{a_k - a_1}{d} + 1$.
(c) This is an immediate consequence of (a).


30.1. Finite Arithmetic Series

Example 503. You’ve probably heard the apocryphal story197 about an eight-year-old
Gauss adding up the numbers from 1 to 100 in an instant. The trick is to pair the first
number with the last, the second with the second last, etc., then multiply. Like this:

$$1 + 2 + 3 + 4 + \dots + 100 = \overbrace{\underbrace{(1 + 100)}_{101} + \underbrace{(2 + 99)}_{101} + \underbrace{(3 + 98)}_{101} + \dots + \underbrace{(50 + 51)}_{101}}^{50 \text{ pairs}} = 101 \times 50 = 5050.$$

In general, there is a simple formula for the sum of a finite arithmetic series:

$$(\text{First Term} + \text{Last Term}) \times \frac{\text{Number of Terms}}{2}.$$

A bit more formally:

Fact 46. Let $(a_n)_{n=1}^{k}$ be a finite arithmetic sequence. Then:

$$\sum_{n=1}^{k} a_n = (a_1 + a_k)\,\frac{k}{2}.$$

Our proof of this formula is simply a formalisation of Gauss’s apocryphal idea:

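Proof. First, suppose k is even. Then:

$$a_1 + a_2 + \dots + a_k = (a_1 + a_k) + (a_2 + a_{k-1}) + \dots + (a_{k/2} + a_{k/2+1}).$$

Note that RHS has k/2 pairs of terms.
Next, every pair has the same sum, because stepping one term forward from $a_1$ adds d while
stepping one term back from $a_k$ subtracts d:

$$a_1 + a_k = a_2 + a_{k-1} = a_3 + a_{k-2} = \dots = a_{k/2} + a_{k/2+1}.$$

Hence:

$$\sum_{n=1}^{k} a_n = (a_1 + a_k)\,\frac{k}{2}. \quad ✓$$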

For the remainder of this proof (the case where k is odd), which is slightly trickier/messier,
see p. 1281 (Appendices).

The American Scientist writer Brian Hayes has investigated the provenance of this story. According to
him, the earliest known printing of this story was in “a 1906 pamphlet authored by Franz Mathé”.
Example 504. Consider the arithmetic sequence $(a_n)_{n=1}^{k} = (7, 17, 27, 37, \dots, 837)$.

The first and last terms are a1 = 7 and ak = 837. The common difference is 10.
So, by Fact 45(b), the total number of terms is k = (837 − 7) /10 + 1 = 83 + 1 = 84.
And now by Fact 46:

$$\sum_{n=1}^{84} a_n = (a_1 + a_k)\,\frac{k}{2} = (7 + 837)\,\frac{84}{2} = 35\,448.$$

Example 505. Consider the arithmetic sequence $(b_n)_{n=1}^{k} = (1, 5, 9, 13, 17, \dots, 393)$.

The first and last terms are b1 = 1 and bk = 393. The common difference is 4.
So, by Fact 45(b), the total number of terms is k = (393 − 1) /4 + 1 = 98 + 1 = 99.
And now by Fact 46:

$$\sum_{n=1}^{99} b_n = (b_1 + b_k)\,\frac{k}{2} = (1 + 393)\,\frac{99}{2} = 19\,503.$$
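
The two-step recipe used in Examples 504 and 505 (count the terms with Fact 45(b), then apply Fact 46) can be written as a short Python sketch and compared against a brute-force sum. The function name arithmetic_series is hypothetical, and the sketch assumes integer terms with a non-zero integer common difference.

```python
def arithmetic_series(first, last, d):
    """Return (number of terms, sum) for first, first + d, ..., last (integers, d != 0)."""
    if d == 0 or (last - first) % d != 0:
        raise ValueError("last must be reachable from first in steps of d")
    k = (last - first) // d + 1        # Fact 45(b)
    total = (first + last) * k // 2    # Fact 46; (first + last) * k is always even here
    return k, total


print(arithmetic_series(7, 837, 10))   # (84, 35448)
print(arithmetic_series(1, 393, 4))    # (99, 19503)
print(arithmetic_series(1, 100, 1))    # (100, 5050), the Gauss story

# Brute-force check: simply add up every term.
assert sum(range(7, 838, 10)) == 35448
assert sum(range(1, 394, 4)) == 19503
```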

Exercise 159. Rewrite each series in summation notation, then compute its sum.
(a) 2 + 7 + 12 + 17 + 22 + 27 + 32 + ⋅ ⋅ ⋅ + 997.
(b) 3 + 20 + 37 + 54 + 71 + ⋅ ⋅ ⋅ + 1 703.
(c) 81 + 89 + 97 + 105 + 113 + ⋅ ⋅ ⋅ + 8 081. (Answer on p. 1466.)

30.2. Infinite Arithmetic Series

“Clearly”, the infinite arithmetic series 4 + 7 + 10 + 13 + 16 + . . . does not converge. And
more generally:

Fact 47. Other than the zero series, every (infinite) arithmetic series diverges.

Proof. See p. 1283 in the Appendices.


31. Geometric Sequences and Series

Definition 88. A geometric sequence (or progression) is a sequence where the ratio
between any two consecutive terms is constant. This constant ratio, denoted r, is called
the common ratio.

Example 506. Below are six geometric sequences. Those on the left are finite while
those on the right are infinite.

$$(a_n)_{n=1}^{5} = (1, 2, 4, 8, 16), \qquad (a_n) = (1, 2, 4, 8, 16, \dots),$$

$$(b_n)_{n=1}^{7} = \left(1, \frac{1}{2}, \frac{1}{4}, \frac{1}{8}, \frac{1}{16}, \frac{1}{32}, \frac{1}{64}\right), \qquad (b_n) = \left(1, \frac{1}{2}, \frac{1}{4}, \frac{1}{8}, \frac{1}{16}, \frac{1}{32}, \frac{1}{64}, \dots\right),$$

$$(c_n)_{n=1}^{3} = (7, 7\pi, 7\pi^2), \qquad (c_n) = (7, 7\pi, 7\pi^2, \dots).$$

In each sequence, the ratio between any two consecutive terms is a constant.
In $(a_n)_{n=1}^{5}$ and $(a_n)$, the common ratio is $r = 2$.
In $(b_n)_{n=1}^{7}$ and $(b_n)$, the common ratio is $r = 1/2$.
In $(c_n)_{n=1}^{3}$ and $(c_n)$, the common ratio is $r = \pi$.

Definition 89. Given a geometric sequence, its series is called a geometric series.

Example 507. In the above example, we gave six geometric sequences. Here are the six
corresponding geometric series:

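$$1 + 2 + 4 + 8 + 16, \qquad 1 + 2 + 4 + 8 + 16 + \dots,$$

$$1 + \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \frac{1}{16} + \frac{1}{32} + \frac{1}{64}, \qquad 1 + \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \frac{1}{16} + \frac{1}{32} + \frac{1}{64} + \dots,$$

$$7 + 7\pi + 7\pi^2, \qquad 7 + 7\pi + 7\pi^2 + \dots$$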

Fact 48. Let $(a_n)_{n=1}^{k}$ be a finite geometric sequence with common ratio $r = a_2/a_1$. Then:

(a) The nth term is $a_n = a_1 r^{n-1}$.

(b) The number of terms is $k = \log_r (a_k/a_1) + 1$.

(c) $\sum_{n=1}^{k} a_n = \sum_{n=1}^{k} (a_1 r^{n-1})$.

Proof. (a) The common ratio between any two consecutive terms is r. Since $a_n$ is (n − 1)
terms “after” $a_1$, we must have $a_n = a_1 r^{n-1}$.
(b) By (a), $a_k = a_1 r^{k-1}$, or $a_k/a_1 = r^{k-1}$, or $\log_r (a_k/a_1) = k - 1$, or $k = \log_r (a_k/a_1) + 1$.
(c) This is an immediate consequence of (a).


31.1. Finite Geometric Sequences and Series
It turns out that just like with finite arithmetic series, there’s a nice formula for the sum
of a finite geometric series. In words:

$$\text{First Term} \times \frac{1 - \text{Common Ratio}^{\text{Number of Terms}}}{1 - \text{Common Ratio}}.$$

A bit more formally:

Fact 49. Let a, r ∈ R with r ≠ 1. Let $S = a + ar + ar^2 + ar^3 + \dots + ar^{k-1}$. Then:

$$S = a\,\frac{1 - r^k}{1 - r}.$$

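Proof. Multiply S by r:

$$rS = ar + ar^2 + ar^3 + \dots + ar^{k-1} + ar^k.$$

Now subtract rS from S. Every term from $ar$ through $ar^{k-1}$ appears in both S and rS and so
cancels out, leaving only the first term of S and the last term of rS:

$$S - rS = a - ar^k, \quad\text{or}\quad (1 - r)S = a(1 - r^k).$$

Since r ≠ 1, we may divide both sides by 1 − r to get $S = a\,\frac{1 - r^k}{1 - r}$.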

Remark 50. By the way, the mass cancellation trick used in the above proof is called the
method of differences (which is the topic of Ch. 33).

Example 508. Consider the geometric series $1 + 2 + 2^2 + 2^3 + \dots + 2^{20}$.

The first term is 1, the common ratio is 2, and there are 21 terms. Hence:

$$1 + 2 + 2^2 + 2^3 + \dots + 2^{20} = 1 \cdot \frac{1 - 2^{21}}{1 - 2} = \frac{2^{21} - 1}{1} = 2\,097\,152 - 1 = 2\,097\,151.$$

Example 509. Consider the geometric series $1 + \frac{1}{2} + \frac{1}{2^2} + \frac{1}{2^3} + \dots + \frac{1}{2^{20}}$.
The first term is 1, the common ratio is 1/2, and there are 21 terms. Hence:

$$1 + \frac{1}{2} + \frac{1}{2^2} + \frac{1}{2^3} + \dots + \frac{1}{2^{20}} = 1 \cdot \frac{1 - 0.5^{21}}{1 - 0.5} = \frac{1 - 0.5^{21}}{0.5} = \frac{2^{21} - 1}{2^{20}} = \frac{2\,097\,151}{1\,048\,576}.$$
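
As a sanity check on the formula for the sum of a finite geometric series, the Python sketch below compares it with a term-by-term sum for Examples 508 and 509. The function names are our own, and exact fractions are used so that no rounding error creeps in.

```python
from fractions import Fraction


def geometric_sum(a, r, k):
    """Formula a(1 - r^k)/(1 - r) for a + ar + ... + ar^(k-1), with r != 1."""
    r = Fraction(r)
    if r == 1:
        raise ValueError("the formula requires r != 1")
    return a * (1 - r ** k) / (1 - r)


def brute_force(a, r, k):
    """Add up the k terms a, ar, ..., ar^(k-1) one by one."""
    r = Fraction(r)
    return sum(a * r ** n for n in range(k))


print(geometric_sum(1, 2, 21))                 # 2097151
print(geometric_sum(1, Fraction(1, 2), 21))    # 2097151/1048576

assert geometric_sum(1, 2, 21) == brute_force(1, 2, 21)
assert geometric_sum(1, Fraction(1, 2), 21) == brute_force(1, Fraction(1, 2), 21)
```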


Corollary 6 gives another formula for the sum of a finite geometric series.

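Corollary 6. Let $(a_n)_{n=1}^{k}$ be a finite geometric sequence with $r = a_2/a_1$. Then:

$$\sum_{n=1}^{k} a_n = \frac{a_1 - r a_k}{1 - r}.$$

In words:

$$\frac{\text{First Term} - \text{Common Ratio} \times \text{Last Term}}{1 - \text{Common Ratio}}.$$

Proof. The first step uses Fact 49 and the last step uses Fact 48(a):

$$\sum_{n=1}^{k} a_n = a_1\,\frac{1 - r^k}{1 - r} = \frac{a_1 - a_1 r^k}{1 - r} = \frac{a_1 - r a_1 r^{k-1}}{1 - r} = \frac{a_1 - r a_k}{1 - r}.$$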

Example 510. Consider the geometric series 1 + 2 + 4 + 8 + 16 + ⋅ ⋅ ⋅ + 1 024.

The first and last terms are 1 and 1 024, and the common ratio is r = 2. Hence:

$$1 + 2 + 4 + 8 + 16 + \dots + 1\,024 = \frac{1 - 2 \cdot 1\,024}{1 - 2} = \frac{1 - 2\,048}{-1} = 2\,047.$$

Example 511. Consider the geometric series 4 + 12 + 36 + 108 + ⋅ ⋅ ⋅ + 8 748.

The first and last terms are 4 and 8 748, and the common ratio is 3. Hence:

$$4 + 12 + 36 + 108 + \dots + 8\,748 = \frac{4 - 3 \cdot 8\,748}{1 - 3} = \frac{4 - 26\,244}{-2} = 13\,120.$$

Exercise 160. Rewrite each series in summation notation, then compute its sum.
(a) 7 + 14 + 28 + 56 + 112 + 224 + 448 + 896.
(b) 20 + 10 + 5 + 5/2 + 5/4 + 5/8.
(c) 1 + 1/3 + 1/9 + 1/27 + 1/81 + 1/243. (Answers on p. 1466.)


31.2. Infinite Geometric Sequences and Series
Perhaps surprisingly, an infinite geometric series converges whenever ∣r∣ < 1. Moreover,
there’s a nice and simple formula for the limit:

$$\frac{\text{First Term}}{1 - \text{Common Ratio}}.$$

A bit more formally:

Fact 50. Let a, r ∈ R. If ∣r∣ < 1, then:

$$a + ar + ar^2 + ar^3 + \dots = \frac{a}{1 - r}.$$

Proof. Let: $S_k = a + ar + ar^2 + ar^3 + \dots + ar^{k-1}$.

And so:198 $\lim_{k\to\infty} S_k = a + ar + ar^2 + ar^3 + \dots$

By Fact 49: $S_k = a\,\frac{1 - r^k}{1 - r}$.

Now, since ∣r∣ < 1: $\lim_{k\to\infty} r^k = 0$.

And thus: $\lim_{k\to\infty} S_k = \frac{a}{1 - r}$.

If ∣r∣ ≥ 1, then the limit does not exist:

Fact 51. If a ≠ 0 and ∣r∣ ≥ 1, then $a + ar + ar^2 + ar^3 + \dots$ diverges.

Proof. See p. 1283 in the Appendices.
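
To see Facts 50 and 51 in action, we can watch the partial sums $S_k$ numerically. The Python sketch below is purely illustrative (the function name partial_sum is our own) and uses floating-point numbers, so the values shown are approximations rather than proofs.

```python
def partial_sum(a, r, k):
    """S_k = a + ar + ar^2 + ... + ar^(k-1), computed term by term."""
    return sum(a * r ** n for n in range(k))


# |r| < 1: the partial sums approach a / (1 - r).
a, r = 0.5, 0.5
limit = a / (1 - r)   # equals 1.0 for the series 1/2 + 1/4 + 1/8 + ...
for k in (1, 5, 10, 20, 50):
    print(k, partial_sum(a, r, k), limit - partial_sum(a, r, k))

# |r| >= 1: the partial sums do not settle down.
for k in (1, 5, 10, 20):
    print(k, partial_sum(1, 2, k))   # 1, 31, 1023, 1048575
```

In the first loop the gap to the limit shrinks by a factor of r with every extra term. In the second, the partial sums keep growing without bound.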

Exercise 161. Rewrite each series in summation notation, then compute its sum.
(a) 6 + 9/2 + 27/8 + . . . .
(b) 20 + 10 + 5 + . . .
(c) 1 + 1/3 + 1/9 + . . . (Answers on p. 1466.)

Our proof here hand-waves a little because it implicitly assumes certain “obvious” results about limits
that aren’t mentioned in the main text (but are proved only in Ch. XXX of the Appendices).
32. Rules of Summation Notation

Fact 52. Let an , bn ∈ R for all n, c ∈ R, and k, l ∈ Z+ . Then:199

(a) $\sum_{n=1}^{k} c = ck$. (Constant Rule)

(b) $\sum_{n=1}^{k} (ca_n) = c \sum_{n=1}^{k} a_n$. (Constant Factor Rule)

(c) $\sum_{n=1}^{k} (a_n + b_n) = \sum_{n=1}^{k} a_n + \sum_{n=1}^{k} b_n$. (Sum Rule)

(d) $\sum_{n=1}^{k} (a_n - b_n) = \sum_{n=1}^{k} a_n - \sum_{n=1}^{k} b_n$. (Difference Rule)

(e) $\sum_{n=1}^{k+l} a_n = \sum_{n=1}^{k} a_n + \sum_{n=k+1}^{k+l} a_n$. (Breakup Rule)

(f) $\sum_{n=k+1}^{k+l} a_n = \sum_{n=1}^{l} a_{k+n}$. (Change of start and stop points)

Proof. (a) $\sum_{n=1}^{k} c = \overbrace{c + c + \dots + c}^{k \text{ times}} = ck$.

(b) $\sum_{n=1}^{k} (ca_n) = ca_1 + ca_2 + \dots + ca_k = c(a_1 + a_2 + \dots + a_k) = c \sum_{n=1}^{k} a_n$.

(c) $\sum_{n=1}^{k} (a_n + b_n) = (a_1 + b_1) + (a_2 + b_2) + \dots + (a_k + b_k) = (a_1 + a_2 + \dots + a_k) + (b_1 + b_2 + \dots + b_k) = \sum_{n=1}^{k} a_n + \sum_{n=1}^{k} b_n$.

(d) $\sum_{n=1}^{k} (a_n - b_n) = (a_1 - b_1) + (a_2 - b_2) + \dots + (a_k - b_k) = (a_1 + a_2 + \dots + a_k) - (b_1 + b_2 + \dots + b_k) = \sum_{n=1}^{k} a_n - \sum_{n=1}^{k} b_n$.

(e) $\sum_{n=1}^{k+l} a_n = a_1 + a_2 + \dots + a_k + a_{k+1} + a_{k+2} + \dots + a_{k+l} = (a_1 + a_2 + \dots + a_k) + (a_{k+1} + a_{k+2} + \dots + a_{k+l}) = \sum_{n=1}^{k} a_n + \sum_{n=k+1}^{k+l} a_n$.

(f) $\sum_{n=k+1}^{k+l} a_n = a_{k+1} + a_{k+2} + a_{k+3} + \dots + a_{k+l} = \sum_{n=1}^{l} a_{k+n}$.
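
Each rule in Fact 52 can be spot-checked on concrete numbers. The Python sketch below does so for two arbitrary sample sequences. The values of a, b, c, k, and l are made up for this illustration, and S is a hypothetical helper. A check like this is of course no substitute for the proof above.

```python
# Two arbitrary sample sequences (illustrative values only); a[0] plays the role of a_1.
a = [3, 1, 4, 1, 5, 9, 2, 6]
b = [2, 7, 1, 8, 2, 8, 1, 8]
c, k, l = 5, 3, 5   # chosen so that k + l = 8 = len(a)


def S(term, start, stop):
    """Return the sum of term(n) for n = start, ..., stop."""
    return sum(term(n) for n in range(start, stop + 1))


def a_(n):
    return a[n - 1]   # 1-indexed access, matching the text


def b_(n):
    return b[n - 1]


checks = {
    "(a) Constant Rule": S(lambda n: c, 1, k) == c * k,
    "(b) Constant Factor Rule": S(lambda n: c * a_(n), 1, k) == c * S(a_, 1, k),
    "(c) Sum Rule": S(lambda n: a_(n) + b_(n), 1, k) == S(a_, 1, k) + S(b_, 1, k),
    "(d) Difference Rule": S(lambda n: a_(n) - b_(n), 1, k) == S(a_, 1, k) - S(b_, 1, k),
    "(e) Breakup Rule": S(a_, 1, k + l) == S(a_, 1, k) + S(a_, k + 1, k + l),
    "(f) Change of start and stop": S(a_, k + 1, k + l) == S(lambda n: a_(k + n), 1, l),
}
for name, holds in checks.items():
    print(name, holds)
```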

More of such identities available here: .
Exercise 162. Evaluate each of the following (x ∈ R). (Answer on p. 1467.)

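$$\text{(a) } \sum_{n=5}^{100} 1. \qquad \text{(b) } \sum_{n=5}^{100} n. \qquad \text{(c) } \sum_{n=5}^{100} (n + 1). \qquad \text{(d) } \sum_{n=5}^{100} (3n + 2). \qquad \text{(e) } \sum_{n=5}^{100} nx.$$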

Exercise 163. Let x ≠ 1 and $S_k = \sum_{n=1}^{k} nx^{n-1}$. Prove the following.

$$\text{(a) } S_5 = \frac{1 - 6x^5 + 5x^6}{(1 - x)^2}. \qquad \text{(b) } S_k = \frac{1 - (k + 1)x^k + kx^{k+1}}{(1 - x)^2}.$$

(Hint: Consider $S_k - xS_k$.) (Answer on p. 1467.)


33. The Method of Differences

Example 512. Consider $\frac{1}{1 \times 2} + \frac{1}{2 \times 3} + \frac{1}{3 \times 4} + \dots + \frac{1}{1\,000 \times 1\,001}$.
It seems like a tall task to evaluate the sum of this series. But using partial fractions,
it’s actually ezy-wheezy. First, rewrite this series in summation notation:

$$\frac{1}{1 \times 2} + \frac{1}{2 \times 3} + \frac{1}{3 \times 4} + \dots + \frac{1}{1\,000 \times 1\,001} = \sum_{n=1}^{1\,000} \frac{1}{n(n + 1)}.$$

Next, take the nth term and do the partial fractions decomposition:

$$\frac{1}{n(n + 1)} = \frac{A}{n} + \frac{B}{n + 1} = \frac{A(n + 1) + Bn}{n(n + 1)} = \frac{(A + B)n + A}{n(n + 1)}.$$

Comparing coefficients, we have A = 1, A + B = 0, and hence B = −1.

Hence:

$$\frac{1}{n(n + 1)} = \frac{1}{n} - \frac{1}{n + 1}.$$

And:

$$\begin{aligned}
\sum_{n=1}^{1\,000} \frac{1}{n(n + 1)} &= \frac{1}{1 \times 2} + \frac{1}{2 \times 3} + \frac{1}{3 \times 4} + \dots + \frac{1}{1\,000 \times 1\,001} \\
&= \frac{1}{1} - \frac{1}{2} + \frac{1}{2} - \frac{1}{3} + \frac{1}{3} - \frac{1}{4} + \frac{1}{4} - \dots - \frac{1}{1\,000} + \frac{1}{1\,000} - \frac{1}{1\,001} \\
&= 1 - \frac{1}{1\,001} = \frac{1\,000}{1\,001}.
\end{aligned}$$

Observe that in the second line, every term with denominator 2 through 1 000 is happily
cancelled out. Your syllabus calls this process of mass cancellation the method of
differences. (Some other writers instead call this telescoping.)200
More generally, we have:

$$\sum_{n=1}^{k} \frac{1}{n(n + 1)} = \frac{1}{1 \times 2} + \frac{1}{2 \times 3} + \frac{1}{3 \times 4} + \dots + \frac{1}{k(k + 1)} = 1 - \frac{1}{k + 1}.$$

We can thus easily show that the corresponding infinite series converges:

$$\frac{1}{1 \times 2} + \frac{1}{2 \times 3} + \frac{1}{3 \times 4} + \dots = \lim_{k\to\infty} \sum_{n=1}^{k} \frac{1}{n(n + 1)} = \lim_{k\to\infty} \left(1 - \frac{1}{k + 1}\right) = 1.$$
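
The mass cancellation in this example can be confirmed by brute force. The Python sketch below adds up all 1 000 terms using exact fractions and compares the result with the closed form $1 - \frac{1}{k+1}$. The function names are our own.

```python
from fractions import Fraction


def partial_sum(k):
    """Add up 1/(n(n + 1)) for n = 1, ..., k, exactly."""
    return sum(Fraction(1, n * (n + 1)) for n in range(1, k + 1))


def closed_form(k):
    """What is left after the cancellation: 1 - 1/(k + 1)."""
    return 1 - Fraction(1, k + 1)


print(partial_sum(1000))                       # 1000/1001
print(partial_sum(1000) == closed_form(1000))  # True

# The partial sums creep up towards 1 as k grows.
for k in (1, 10, 100, 1000):
    print(k, float(partial_sum(k)))
```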

ProofWiki says that this “arises from the obvious physical analogy with the folding up of a telescope”.
1 1 1 1
Example 513. Consider + + + ⋅⋅⋅ + .
1×2×3 2×3×4 3×4×5 1 000 × 1 001 × 1 002
1 000
First rewrite this series in summation notation: ∑ .
n=1 n (n + 1) (n + 2)
Next, take the nth term and do the partial fractions decomposition:

1 A (n + 1) (n + 2) + Bn (n + 2) + Cn (n + 1)
= + + =
n (n + 1) (n + 2) n n + 1 n + 2 n (n + 1) (n + 2)

(A + B + C) n2 + (3A + 2B + C) n + 2A
= .
n (n + 1) (n + 2)

Comparing coefficients, we have: A + B + C = 0, 3A + 2B + C = 0, and 2A = 1 or A = 0.5.

1 2

= minus = yields 2A + B = 0 or B = −1. And now = yields C = 0.5.

2 1 1

1 0.5 1 0.5
Altogether then: = − + .
n (n + 1) (n + 2) n n+1 n+2

1 1 1 1
And so: + + + ⋅⋅⋅ +
1×2×3 2×3×4 3×4×5 1 000 × 1 001 × 1 002
0.5 1 0.5 0.5 1 0.5 0.5 1 0.5 0.5 1 0.5
= − + + − + + − + + ⋅⋅⋅ + − + .
1 2 3 2 3 4 3 4 5 1 000 1 001 1 002

Observe that the terms with denominator 3 cancel out nicely. And the same will happen
to all terms with denominator 3 through 1 000.
We will then be left only with terms that have denominators 1, 2, 1 001, and 1 002:
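$$\sum_{n=1}^{1000} \frac{1}{n(n+1)(n+2)} = \frac{0.5}{1} - \frac{1}{2} + \frac{0.5}{2} + \frac{0.5}{1001} - \frac{1}{1001} + \frac{0.5}{1002}$$
$$= \frac{1}{4} - \frac{0.5}{1001} + \frac{0.5}{1002} = \frac{1}{4} - \frac{1}{2\cdot 1001\cdot 1002} = 0.249\ldots$$
More generally, we have:
$$\sum_{n=1}^{k} \frac{1}{n(n+1)(n+2)} = \frac{0.5}{1} - \frac{1}{2} + \frac{0.5}{2} - \frac{0.5}{k+1} + \frac{0.5}{k+2} = \frac{1}{4} - \frac{1}{2(k+1)} + \frac{1}{2(k+2)}.$$

And hence, the sum of the corresponding infinite series converges to 1/4:

$$\frac{1}{1\times 2\times 3} + \frac{1}{2\times 3\times 4} + \dots = \lim_{k\to\infty} \sum_{n=1}^{k} \frac{1}{n(n+1)(n+2)} = \lim_{k\to\infty}\left(\frac{1}{4} - \frac{1}{2(k+1)} + \frac{1}{2(k+2)}\right) = \frac{1}{4}.$$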
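The partial fractions decomposition and the closed form in Example 513 can also be checked numerically. The following Python sketch is only an illustration (the names `term`, `decomposed`, `partial_sum`, and `closed_form` are hypothetical). It confirms that the decomposition 0.5/n − 1/(n + 1) + 0.5/(n + 2) matches the original term and that the sum of the first k terms equals 1/4 − 1/(2(k + 1)) + 1/(2(k + 2)).

```python
from fractions import Fraction

HALF = Fraction(1, 2)


def term(n: int) -> Fraction:
    """The nth term 1/(n(n+1)(n+2))."""
    return Fraction(1, n * (n + 1) * (n + 2))


def decomposed(n: int) -> Fraction:
    """The same term written as 0.5/n - 1/(n+1) + 0.5/(n+2)."""
    return HALF / n - Fraction(1, n + 1) + HALF / (n + 2)


def partial_sum(k: int) -> Fraction:
    """Add up the first k terms directly."""
    return sum((term(n) for n in range(1, k + 1)), Fraction(0))


def closed_form(k: int) -> Fraction:
    """The closed form 1/4 - 1/(2(k+1)) + 1/(2(k+2))."""
    return Fraction(1, 4) - Fraction(1, 2 * (k + 1)) + Fraction(1, 2 * (k + 2))


# The decomposition agrees with the original term for each n tested.
assert all(term(n) == decomposed(n) for n in range(1, 50))
# The brute-force sum agrees with the closed form.
assert partial_sum(1000) == closed_form(1000)
print(float(partial_sum(1000)))
```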


Above we used partial fractions. We will next use surd rationalisation.

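Example 514. Consider $\frac{1}{\sqrt{1}+\sqrt{2}} + \frac{1}{\sqrt{2}+\sqrt{3}} + \frac{1}{\sqrt{3}+\sqrt{4}} + \dots + \frac{1}{\sqrt{9\,999}+\sqrt{10\,000}}$.
Again, first rewrite this series in summation notation:
$$\frac{1}{\sqrt{1}+\sqrt{2}} + \frac{1}{\sqrt{2}+\sqrt{3}} + \frac{1}{\sqrt{3}+\sqrt{4}} + \dots + \frac{1}{\sqrt{9\,999}+\sqrt{10\,000}} = \sum_{n=1}^{9\,999} \frac{1}{\sqrt{n}+\sqrt{n+1}}.$$

Next, take the nth term and do the surd rationalisation:

$$\frac{1}{\sqrt{n}+\sqrt{n+1}} = \frac{1}{\sqrt{n}+\sqrt{n+1}} \cdot \frac{\sqrt{n}-\sqrt{n+1}}{\sqrt{n}-\sqrt{n+1}} = \frac{\sqrt{n}-\sqrt{n+1}}{n-(n+1)} = \frac{\sqrt{n}-\sqrt{n+1}}{-1} = \sqrt{n+1}-\sqrt{n}.$$

And so:
$$\frac{1}{\sqrt{1}+\sqrt{2}} + \frac{1}{\sqrt{2}+\sqrt{3}} + \frac{1}{\sqrt{3}+\sqrt{4}} + \dots + \frac{1}{\sqrt{9\,999}+\sqrt{10\,000}}$$
$$= \sqrt{2}-\sqrt{1} + \sqrt{3}-\sqrt{2} + \sqrt{4}-\sqrt{3} + \dots + \sqrt{10\,000}-\sqrt{9\,999}.$$

Every term except $-\sqrt{1}$ and $\sqrt{10\,000}$ cancels with one of its neighbours, so that we’re left with:
$$\sum_{n=1}^{9\,999} \frac{1}{\sqrt{n}+\sqrt{n+1}} = \sqrt{10\,000} - \sqrt{1} = 100 - 1 = 99.$$

More generally, we have:

$$\sum_{n=1}^{k} \frac{1}{\sqrt{n}+\sqrt{n+1}} = \frac{1}{\sqrt{1}+\sqrt{2}} + \frac{1}{\sqrt{2}+\sqrt{3}} + \frac{1}{\sqrt{3}+\sqrt{4}} + \dots + \frac{1}{\sqrt{k}+\sqrt{k+1}} = \sqrt{k+1} - 1.$$

“Clearly” then, the corresponding infinite series diverges:

$$\sum_{n=1}^{\infty} \frac{1}{\sqrt{n}+\sqrt{n+1}} = \lim_{k\to\infty} \sum_{n=1}^{k} \frac{1}{\sqrt{n}+\sqrt{n+1}} = \lim_{k\to\infty} \left(\sqrt{k+1} - 1\right) = \infty.$$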
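Here is a rough numerical check of Example 514 in Python. It is a sketch only: the function name `surd_sum` is made up, and because square roots are computed in floating point, the comparison is approximate rather than exact.

```python
import math


def surd_sum(k: int) -> float:
    """Add up 1/(sqrt(n) + sqrt(n+1)) for n = 1, ..., k in floating point."""
    return math.fsum(1 / (math.sqrt(n) + math.sqrt(n + 1)) for n in range(1, k + 1))


# The brute-force sum should agree with sqrt(k+1) - 1 up to rounding error.
for k in (3, 99, 9_999):
    assert math.isclose(surd_sum(k), math.sqrt(k + 1) - 1, rel_tol=1e-9)

print(surd_sum(9_999))  # approximately 99
```

Trying larger and larger values of k shows the partial sums growing without bound, in line with the divergence claimed above.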


The next example again uses partial fractions. But this time we’ll also use the formula
for the sum of an arithmetic series:

Example 515. Consider $\frac{1}{1} + \frac{1}{1+2} + \frac{1}{1+2+3} + \dots + \frac{1}{1+2+3+\dots+1000}$.
Again, first rewrite the series in summation notation:
$$\frac{1}{1} + \frac{1}{1+2} + \frac{1}{1+2+3} + \dots + \frac{1}{1+2+3+\dots+1000} = \sum_{n=1}^{1000} \frac{1}{1+\dots+n}.$$

Next, use the formula for the sum of an arithmetic series to rewrite the nth term:

$$1 + \dots + n = \frac{n(n+1)}{2} \implies \frac{1}{1+\dots+n} = \frac{2}{n(n+1)}.$$

Now do the partial fractions decomposition:

$$\frac{2}{n(n+1)} = \frac{2}{n} - \frac{2}{n+1}.$$

And so altogether, we have:

$$\sum_{n=1}^{1000} \frac{1}{1+\dots+n} = \frac{1}{1} + \frac{1}{1+2} + \frac{1}{1+2+3} + \dots + \frac{1}{1+2+3+\dots+1000}$$
$$= \frac{2}{1} - \frac{2}{2} + \frac{2}{2} - \frac{2}{3} + \frac{2}{3} - \frac{2}{4} + \dots + \frac{2}{1000} - \frac{2}{1001} = \frac{2}{1} - \frac{2}{1001} = \frac{2000}{1001}.$$
More generally, we have:
$$\sum_{n=1}^{k} \frac{1}{1+\dots+n} = \frac{1}{1} + \frac{1}{1+2} + \frac{1}{1+2+3} + \dots + \frac{1}{1+2+3+\dots+k} = 2 - \frac{2}{k+1}.$$

And hence, the sum of the corresponding infinite series converges to 2:

$$\frac{1}{1} + \frac{1}{1+2} + \frac{1}{1+2+3} + \dots = \lim_{k\to\infty} \sum_{n=1}^{k} \frac{1}{1+\dots+n} = \lim_{k\to\infty}\left(2 - \frac{2}{k+1}\right) = 2.$$


We can also use the method of differences to compute the sum of squares:

Example 516. Consider $\sum_{n=1}^{k} n^2 = 1^2 + 2^2 + 3^2 + \dots + k^2$.

We will prove that: $\sum_{n=1}^{k} n^2 = \frac{k(k+1)(2k+1)}{6}$.

First, observe that $(n+1)^3 - n^3 = 3n^2 + 3n + 1$. Hence:

$$\sum_{n=1}^{k} \left[(n+1)^3 - n^3\right] = \sum_{n=1}^{k} \left(3n^2 + 3n + 1\right) = 3\sum_{n=1}^{k} n^2 + 3\sum_{n=1}^{k} n + \sum_{n=1}^{k} 1 = 3\sum_{n=1}^{k} n^2 + 3\,\frac{k(k+1)}{2} + k. \quad (1)$$

On the other hand, observe also that:

$$\sum_{n=1}^{k} \left[(n+1)^3 - n^3\right] = 2^3 - 1^3 + 3^3 - 2^3 + 4^3 - 3^3 + \dots + (k+1)^3 - k^3 = (k+1)^3 - 1^3 = k^3 + 3k^2 + 3k. \quad (2)$$

Putting (1) and (2) together, we get: $3\sum_{n=1}^{k} n^2 + 3\,\frac{k(k+1)}{2} + k = k^3 + 3k^2 + 3k$.

Rearranging:
$$\sum_{n=1}^{k} n^2 = \frac{k^3 + 3k^2 + 3k - 3k(k+1)/2 - k}{3} = \frac{2k^3 + 3k^2 + k}{6} = \frac{k(k+1)(2k+1)}{6}.$$
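As a sanity check on Example 516, the short Python sketch below compares a brute-force sum of squares with the formula k(k + 1)(2k + 1)/6, and also checks the telescoping identity used in the proof. The function names are hypothetical and the code is only an illustration.

```python
def sum_of_squares(k: int) -> int:
    """Add up 1^2 + 2^2 + ... + k^2 directly."""
    return sum(n * n for n in range(1, k + 1))


def closed_form(k: int) -> int:
    """The formula k(k+1)(2k+1)/6 (the product is always divisible by 6)."""
    return k * (k + 1) * (2 * k + 1) // 6


def telescoped_cubes(k: int) -> int:
    """Add up (n+1)^3 - n^3 for n = 1, ..., k directly."""
    return sum((n + 1) ** 3 - n ** 3 for n in range(1, k + 1))


for k in range(1, 200):
    assert sum_of_squares(k) == closed_form(k)
    # Method of differences: only the first and last cubes survive.
    assert telescoped_cubes(k) == (k + 1) ** 3 - 1

print(sum_of_squares(100))  # 338350
```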

Exercise 164. Rewrite each series in summation notation and find its sum. Next, write
down its sum in the case where the series has k terms instead. Finally, determine if the
corresponding infinite series converges. If it does, find its limit. (Answer on p. 1468.)

(a) $\frac{1}{3} + \frac{1}{8} + \frac{1}{15} + \frac{1}{24} + \frac{1}{35} + \dots + \frac{1}{999\,999}$. (Hint in footnote 201.)
(b) $\lg\frac{1}{2} + \lg\frac{2}{3} + \lg\frac{3}{4} + \dots + \lg\frac{999}{1000}$. (lg is the base-10 log.)
(c) $\frac{1}{2\sqrt{1}+1\sqrt{2}} + \frac{1}{3\sqrt{2}+2\sqrt{3}} + \dots + \frac{1}{100\sqrt{99}+99\sqrt{100}}$. (Hint in footnote 202.)
(d) $1^3 + 2^3 + 3^3 + \dots + 100^3$. (Hint in footnote 203.)

[Footnote 201] Hint: Think about the square numbers.
[Footnote 202] Do the surd rationalisation. Then persevere with the algebra and things will work out nicely.
[Footnote 203] Consider $(n+1)^4 - n^4$ and mimic the last example (be warned that the algebra will be more painful).

Part III.


The cultural problem is a self-perpetuating monster: students learn about
math from their teachers, and teachers learn about it from their teachers,
so this lack of understanding and appreciation for mathematics in our cul-
ture replicates itself indefinitely. Worse, the perpetuation of this “pseudo-
mathematics,” this emphasis on the accurate yet mindless manipulation of
symbols, creates its own culture and its own set of values. Those who have
become adept at it derive a great deal of self-esteem from their success. The
last thing they want to hear is that math is really about raw creativity and
aesthetic sensitivity. Many a graduate student has come to grief when they
discover, after a decade of being told they were “good at math,” that in fact
they have no real mathematical talent and are just very good at following
directions. Math is not about following directions, it’s about making new
directions.

— Paul Lockhart (2002, 2009).


34. Introduction to Vectors
The Latin word vector means carrier. You may recall from biology that mosquitoes
are vectors, because they carry diseases to humans. Similarly, in mathematics, a vector
“carries” us from one point to another.
Example 517. Let A = (−1, −3) and B = (2, 1) be points. Then AB = (3, 4) is the
vector that “carries” us 3 units east and 4 units north from the point A to the point B.
The vector AB has tail A and head B — remember, a vector goes from tail to head.
(Just remember: arrowhead — the arrowhead is where the vector’s head is.)

[Figure: the points A = (−1, −3), B = (2, 1), C = (−1, 0), D = (2, 4), E = (4, 0), and F = (0.5, 2) plotted on the plane.]

Let C = (−1, 0), D = (2, 4), E = (4, 0), and F = (0.5, 2) be points. Then:
The vector CD = (3, 4) “carries” us 3 units east and 4 units north from the tail C to the
head D.
The vector CE = (5, 0) “carries” us 5 units east and 0 units north from the tail C to the
head E.
The vector CF = (1.5, 2) “carries” us 1.5 units east and 2 units north from the tail C to
the head F .


Formally, a vector in 2D space is an ordered pair of real numbers:

Definition 90. Given the points A = (a1 , a2 ) and B = (b1 , b2 ), the vector from A to B,
denoted AB, is the ordered pair of real numbers:
AB = (b1 − a1 , b2 − a2 ) .

(Later on when we look at three-dimensional (3D) space, vectors will instead be ordered
triples of real numbers.)

A vector is often contrasted with a scalar, which is simply any real number:

Definition 91. A scalar is any real number.

We will now contrast a vector, a scalar, and a point.

A vector is a two-dimensional (2D) mathematical object with the properties of:

magnitude (or length) and direction.

In contrast, a scalar is a one-dimensional object with only the property of:

magnitude.

And a point is a zero-dimensional object (with neither magnitude nor direction).

Example 518. You may recall from physics that velocity is a vector quantity, while
speed is a scalar quantity. In particular, speed is the magnitude of velocity.
We’ll have more to say about this in Ch. 39.


34.1. The Magnitude or Length of a Vector
The magnitude or length of a vector is simply given by the Pythagorean Theorem:

Definition 92. Given the vector (p, q), its magnitude (or length), denoted ∣(p, q)∣, is:

$$|(p, q)| = \sqrt{p^2 + q^2}.$$

Example 519. The magnitude or length of the vector $\overrightarrow{AB}$ = (3, 4) is:

$$|\overrightarrow{AB}| = |(3, 4)| = \sqrt{3^2 + 4^2} = 5.$$

[Figure: the points A to F as in Example 517, with a right triangle of legs 3 and 4 drawn on AB to show that its length is 5.]

Similarly, the magnitudes of $\overrightarrow{CD}$ = (3, 4), $\overrightarrow{CE}$ = (5, 0), and $\overrightarrow{CF}$ = (1.5, 2) are:

$$|\overrightarrow{CD}| = |(3, 4)| = \sqrt{3^2 + 4^2} = 5,$$
$$|\overrightarrow{CE}| = |(5, 0)| = \sqrt{5^2 + 0^2} = 5,$$
$$|\overrightarrow{CF}| = |(1.5, 2)| = \sqrt{1.5^2 + 2^2} = 2.5.$$

Remark 51. In more general contexts, norm is another synonym for magnitude (or
length). But we shan’t use this term in this textbook.
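For readers who like to compute, here is a minimal Python sketch of Definition 92. Representing a vector as a plain tuple `(p, q)` and calling the function `magnitude` are choices made only for this illustration.

```python
import math


def magnitude(v: tuple[float, float]) -> float:
    """Return the length of the 2D vector v = (p, q), i.e. sqrt(p^2 + q^2)."""
    p, q = v
    return math.sqrt(p * p + q * q)


# The vectors from Example 519.
for name, v in [("AB", (3, 4)), ("CD", (3, 4)), ("CE", (5, 0)), ("CF", (1.5, 2))]:
    print(name, magnitude(v))
```

Python's standard library also offers `math.hypot(p, q)`, which computes the same quantity.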


34.2. When Are Two Vectors Identical?
Informally, two vectors (p, q) and (r, s) are identical or equal if and only if they have the
same direction and magnitude. And formally:

(p, q) = (r, s) ⇐⇒ p = r, q = s.

Example 520. Consider the vectors $\overrightarrow{AB}$ = (3, 4) and $\overrightarrow{CD}$ = (3, 4). Both have length 5. Also, both point in the same direction. [Footnote 204]
Informally, $\overrightarrow{AB} = \overrightarrow{CD}$ because $\overrightarrow{AB}$ and $\overrightarrow{CD}$ have the same length and direction.
A little more formally, $\overrightarrow{AB} = \overrightarrow{CD}$ because 3 = 3 and 4 = 4.
Note that when determining whether two vectors are identical, their tail and head do not matter. In the above example, $\overrightarrow{AB} = \overrightarrow{CD}$ even though they don’t have the same tail or head.

[Figure: the points A to F as in Example 517, with the vectors AB and CD drawn as arrows of length 5.]

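Example 521. The vectors $\overrightarrow{CD}$ = (3, 4) and $\overrightarrow{CF}$ = (1.5, 2) point in the same direction.
However, $\overrightarrow{CD} \neq \overrightarrow{CF}$ because they have different lengths: $|\overrightarrow{CD}| = 5$, while $|\overrightarrow{CF}| = 2.5$.
More formally, $\overrightarrow{CD} \neq \overrightarrow{CF}$ because (3, 4) ≠ (1.5, 2).

Example 522. Each of $\overrightarrow{AB}$ = (3, 4), $\overrightarrow{CD}$ = (3, 4), and $\overrightarrow{CE}$ = (5, 0) has length 5. However, $\overrightarrow{CE}$ “obviously” points in a different direction from $\overrightarrow{AB}$ and $\overrightarrow{CD}$.

Thus: $\overrightarrow{CE} \neq \overrightarrow{AB}$ and $\overrightarrow{CE} \neq \overrightarrow{CD}$.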

[Footnote 204] Below, Definition 103 will formally define what it means for two vectors to “point in the same direction”.

34.3. A Vector and a Point Are Different Things
Here’s another important point. Although a vector and a point can each be described by
an ordered pair of real numbers, they are entirely different mathematical objects.
To repeat, a vector is a two-dimensional object endowed with the properties of length
and direction. In contrast, a point is a zero-dimensional object, with neither length
nor direction. Example:

Example 523. Let A = (−1, −3), B = (2, 1), and G = (3, 4) be points.
Then AB = (3, 4) is the vector that carries us 3 units east and 4 units north from A to
B. It is a two-dimensional object endowed with the properties of length and direction.
In contrast, the point G = (3, 4), although also described by an ordered pair of real
numbers, is a zero-dimensional object, with neither length nor direction.

[Figure: the points A = (−1, −3), B = (2, 1), and G = (3, 4), with the vector AB drawn as an arrow from A to B.]

AB = (3, 4) is a vector, while G = (3, 4) is a point. They are completely different
mathematical objects. Do not confuse them.

So far in this textbook, we’ve used the notation (p, q) to mean three entirely different
things. The notation (p, q) can mean:
(a) The set of real numbers between p and q (excluding p and q);
(b) The point with x-coordinate p and y-coordinate q; or
(c) The vector that “carries” us p units east and q units north.
As argued in Remark 19, in theory, this could be very confusing. But in practice, it isn’t.


34.4. Two More Ways to Denote a Vector
The vector (p, q) can also be written in two other ways. First, we can write it “vertically”:

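$$(p, q) = \begin{pmatrix} p \\ q \end{pmatrix}.$$

Example 524. The vectors $\overrightarrow{AB}$ = (3, 4), $\overrightarrow{CD}$ = (3, 4), $\overrightarrow{CE}$ = (5, 0), $\overrightarrow{CF}$ = (1.5, 2) may
also be written:

$$\overrightarrow{AB} = \begin{pmatrix} 3 \\ 4 \end{pmatrix}, \quad \overrightarrow{CD} = \begin{pmatrix} 3 \\ 4 \end{pmatrix}, \quad \overrightarrow{CE} = \begin{pmatrix} 5 \\ 0 \end{pmatrix}, \quad \overrightarrow{CF} = \begin{pmatrix} 1.5 \\ 2 \end{pmatrix}.$$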

As we’ll see shortly, we’ll be doing a lot of addition and multiplication with vectors. And
so, this “vertical” notation for vectors is very useful, because it literally helps us see better.
But in print, I’ll often prefer using the (a, b) notation, simply because it takes up less space.
Second, we can denote a vector by a single, lower-case, bold-font letter:

(p, q) = u.

Example 525. The vectors $\overrightarrow{AB}$ = (3, 4), $\overrightarrow{CD}$ = (3, 4), $\overrightarrow{CE}$ = (5, 0), $\overrightarrow{CF}$ = (1.5, 2) may
also be written:

$$\overrightarrow{AB} = \mathbf{a}, \quad \overrightarrow{CD} = \mathbf{b}, \quad \overrightarrow{CE} = \mathbf{c}, \quad \overrightarrow{CF} = \mathbf{d}.$$

(The choice of letters is somewhat arbitrary. For an obvious reason, v is a favourite.)

We’ll often use the bold-font letter notation in print. However, it’s hard to hand-write in
bold font, so you can write $\vec{u}$ and $\vec{v}$ in place of u and v.
Our first exercises:

Exercise 165. Let A = (−1, −3), B = (2, 1), and G = (3, 4) be points.
Consider the five vectors $\overrightarrow{AG}$, $\overrightarrow{BA}$, $\overrightarrow{BG}$, $\overrightarrow{GA}$, $\overrightarrow{GB}$. Write down each in three different
ways. What is each vector’s tail, head, and length? How many units does each vector
carry us in the x- and y-directions? (Answer on p. 1471.)

Exercise 166. Provide a counterexample to show that the following is not always true:

∣u + v∣ = ∣u∣ + ∣v∣. (Answer on p. 1471.)


Remark 52. Yet another way to denote the vector v = (v1 , v2 ) is with brackets (and no
comma): v = [ v1 v2 ].

This is just so you know. We will not be using this bracket notation for vectors in this
textbook, nor do your H2 Maths syllabus and exams.


34.5. Position Vectors
Given a point A, its position vector OA is simply the vector that carries us from the
origin O to the point A:
Definition 93. Given a point A, its position vector is the vector OA .

So, given a point A = (a1 , a2 ), its position vector is:

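$$\overrightarrow{OA} = \mathbf{a} = (a_1, a_2) = \begin{pmatrix} a_1 \\ a_2 \end{pmatrix}.$$

Example 526. The points A = (−1, −3), B = (2, 1), and C = (−1, 0) have position vectors:

$$\overrightarrow{OA} = \mathbf{a} = (-1, -3) = \begin{pmatrix} -1 \\ -3 \end{pmatrix}, \quad \overrightarrow{OB} = \mathbf{b} = (2, 1) = \begin{pmatrix} 2 \\ 1 \end{pmatrix}, \quad \text{and} \quad \overrightarrow{OC} = \mathbf{c} = (-1, 0) = \begin{pmatrix} -1 \\ 0 \end{pmatrix}.$$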
Again, take care not to confuse a point with its position vector. Although A and OA
may both be denoted by (−1, −3), they are different mathematical objects — the former
is a point while the latter is a vector.

34.6. The Zero Vector

Informally, the zero vector is the vector that carries us nowhere. Formally:

Definition 94. The zero vector, denoted 0, is the origin’s position vector.

And so, the zero vector is: $\mathbf{0} = \overrightarrow{OO} = (0, 0) = \begin{pmatrix} 0 \\ 0 \end{pmatrix}$.

• Once again, do not confuse the point O = (0, 0) with its position vector 0.
• Given any point P, the vector that carries us from P to P carries us precisely
nowhere. Hence, $\overrightarrow{PP} = \mathbf{0}$.

The following result says that every vector has non-negative length; and moreover, the only
vector with length 0 is the zero vector:

Fact 53. Suppose v is a vector. Then ∣v∣ ≥ 0. Moreover, ∣v∣ = 0 ⇐⇒ v = 0.

Proof. Let v = (v1 , v2 ). Then $|\mathbf{v}| = \sqrt{v_1^2 + v_2^2} \geq 0$ and ∣v∣ = 0 ⇐⇒ v = (0, 0) = 0. [Footnote 205]

[Footnote 205] For a proof of this result in the general n-dimensional case, see p. 1287 in the Appendices.

34.7. Displacement Vectors

Definition 95. Suppose a moving object starts at point A and ends at point B. Then
we call AB its displacement vector.

So, if a moving object starts at A = (a1 , a2 ) and ends at B = (b1 , b2 ), then regardless of the
path taken by the object, we say that its displacement vector is:
AB = (b1 − a1 , b2 − a2 ).

Example 527. A particle starts at the point A = (−1, 0), travels rightwards along the black arc, and comes to a stop at the point B = (2, 3). So, this particle’s displacement vector is $\overrightarrow{AB}$ = (3, 3).

[Figure: the points A = (−1, 0) and B = (2, 3), the arc travelled by the particle, and the displacement vector AB = (3, 3).]

Example 528. A particle starts at the point A = (1, 0), travels anti-clockwise around the unit circle centred on the origin, completes one full circle, and comes to a stop at the point A. So, this particle’s displacement vector is $\overrightarrow{AA}$ = (0, 0) = 0 (not depicted).

[Figure: the unit circle centred on the origin, with the point A = (1, 0).]


34.8. Sum and Difference of Points and Vectors
In this subchapter, we’ll learn that:
1. Point + Point = Undefined.
2. Point − Point = Vector.
3. Point + Vector = Point.
4. Point − Vector = Point.

[Figure: a vector v drawn from the point A to the point B, illustrating that B − A = v, A + v = B, and B − v = A.]

1. Point + Point = Undefined.

If A and B are points, then there is no such thing as A + B.

Example 529. Let A = (1, 2) and B = (5, 0) be points. Then the sum A+B is undefined.
It makes no sense to talk about the sum of two points.

The analogy is to points or locations in the real world:

Example 530. Consider Athens and Berlin, two points or locations. The sum Athens +
Berlin is undefined. It makes no sense to talk about the sum of two points or locations.

2. Point − Point = Vector.

Definition 96. Given two points A and B, the difference B − A is the vector AB.

And so, given the points A = (a1 , a2 ) and B = (b1 , b2 ), their difference is:
B − A = AB = (b1 − a1 , b2 − a2 ) .
Example 531. Let A = (1, 2) and B = (5, 0). Then B − A is the vector AB:
B − A = AB = (5, 0) − (1, 2) = (4, −2) .
Similarly, the difference A − B is defined to be the vector BA:
A − B = BA = (1, 2) − (5, 0) = (−4, 2) .

We can continue with the same Athens-Berlin analogy:

Example 532. The vector “Berlin − Athens” is the journey from Athens to Berlin:

Berlin − Athens = (−500, 900) .

That is, the journey from Athens to Berlin carries us 500 km west and 900 km north. [Footnote 206]
Similarly, the vector “Athens − Berlin” is the reverse journey from Berlin to Athens:

Athens − Berlin = (500, −900) .

That is, the journey from Berlin to Athens carries us 500 km east and 900 km south.

[Footnote 206] These numbers are made up. The actual journey is longer (Google Maps).

3. Point + Vector = Point.

Definition 97. Given a point A = (a1 , a2 ) and a vector v = (v1 , v2 ), their sum A + v is
the following point:

A + v = (a1 + v1 , a2 + v2 ) .

Equivalently, if the vector v’s tail is at the point A, then its head is at the point A + v.

Example 533. Let A = (1, 2) and v = (4, 4). Then their sum is the point (5, 6):

A + v = (1, 2) + (4, 4) = (5, 6) .

Example 534. Let v = (−500, 900). Then:

Athens + v = Athens + (−500, 900) = Berlin.

Starting from Athens, travelling 500 km west and 900 km north brings us to Berlin.

4. Point − Vector = Point.

Definition 98. Given a point B = (b1 , b2 ) and a vector v = (v1 , v2 ), their difference B − v
is the following point:

B − v = (b1 − v1 , b2 − v2 ) .

Equivalently, if the vector v’s head is at the point B, then its tail is at the point B − v.

Example 535. Let A = (1, 2) and v = (4, 4). Then their difference is the following point:

A − v = (1, 2) − (4, 4) = (−3, −2) .

Example 536. Let v = (−500, 900). Then:

Berlin − v = Berlin − (−500, 900) = Athens.

If we end up in Berlin after travelling 500 km west and 900 km north, then we must have
started in Athens.
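The four rules of this subchapter can be mirrored in code. The Python sketch below is only an illustration under my own modelling assumptions: the class names `Point` and `Vector` are hypothetical, and I have chosen to make Point + Point raise an error, which is one way of expressing "undefined".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vector:
    x: float
    y: float


@dataclass(frozen=True)
class Point:
    x: float
    y: float

    def __sub__(self, other):
        if isinstance(other, Point):   # Rule 2: Point - Point = Vector
            return Vector(self.x - other.x, self.y - other.y)
        if isinstance(other, Vector):  # Rule 4: Point - Vector = Point
            return Point(self.x - other.x, self.y - other.y)
        return NotImplemented

    def __add__(self, other):
        if isinstance(other, Vector):  # Rule 3: Point + Vector = Point
            return Point(self.x + other.x, self.y + other.y)
        return NotImplemented          # Rule 1: Point + Point is undefined


A, B, v = Point(1, 2), Point(5, 0), Vector(4, 4)
print(B - A)  # the vector from A to B (Example 531)
print(A + v)  # the point reached from A via v (Example 533)
print(A - v)  # the point from which v leads to A (Example 535)
```

With this design, evaluating `A + B` raises a `TypeError`, because neither operand knows how to add two points.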

Exercise 167. Consider the vector (4, −3). (Answer on p. 1471.)

(a) If its tail is (0, 0), then what is its head?
(b) If its head is (0, 0), then what is its tail?
(c) If its tail is (5, 2), then what is its head?
(d) If its head is (5, 2), then what is its tail?


34.9. Sum, Additive Inverse, and Difference of Vectors
In this subchapter, we’ll learn that:

1. Vector + Vector = Vector.

2. − Vector = Vector.
3. Vector − Vector = Vector.
1. Vector + Vector= Vector.

Definition 99. Let u = (u1 , u2 ) and v = (v1 , v2 ) be vectors. Then their sum, denoted
u + v, is the vector u + v = (u1 + v1 , u2 + v2 ).

Place v’s tail at u’s head. Then u + v is the vector from u’s tail to v’s head:


Example 537. The sum of u = (−1, 3) and v = (4, 4) is the vector (3, 7):

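$$\mathbf{u} + \mathbf{v} = \begin{pmatrix} -1 \\ 3 \end{pmatrix} + \begin{pmatrix} 4 \\ 4 \end{pmatrix} = \begin{pmatrix} 3 \\ 7 \end{pmatrix}.$$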

2. − Vector= Vector (Additive inverse).


Informally, the vector −u is the vector u flipped in the opposite direction. Formally:

Definition 100. The additive inverse of the vector u = (u1 , u2 ) is the vector:

−u = (−u1 , −u2 ) .

Example 538. The additive inverses of u = (−1, 3) and v = (4, 4) are:

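$$-\mathbf{u} = -\begin{pmatrix} -1 \\ 3 \end{pmatrix} = \begin{pmatrix} 1 \\ -3 \end{pmatrix} \quad \text{and} \quad -\mathbf{v} = -\begin{pmatrix} 4 \\ 4 \end{pmatrix} = \begin{pmatrix} -4 \\ -4 \end{pmatrix}.$$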


3. Vector − Vector= Vector.

Definition 101. Given two vectors u and v, the difference u − v is the sum of u and the
additive inverse of v. That is:

u − v = u + (−v).

Fact 54. If u = (u1 , u2 ) and v = (v1 , v2 ) are vectors, then:

u − v = (u1 − v1 , u2 − v2 ) .

Proof. By Definition 100, −v = (−v1 , −v2 ). And so by Definition 99,

u − v = u + (−v) = (u1 − v1 , u2 − v2 ) .

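To get u − v, first flip v to get −v, then add −v to u:

[Figure: the vectors u and v; v is flipped to give −v, which is then placed at the head of u to give u − v.]

Or equivalently, place the heads of u and v at the same point. Then u − v is the vector
from the tail of u to the tail of v:

[Figure: the vectors u and v with their heads at the same point, and u − v drawn from the tail of u to the tail of v.]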

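Example 539. Let u = (−1, 3) and v = (4, 4). Then the difference u − v is defined to be
the vector (−5, −1):

$$\mathbf{u} - \mathbf{v} = \begin{pmatrix} -1 \\ 3 \end{pmatrix} - \begin{pmatrix} 4 \\ 4 \end{pmatrix} = \begin{pmatrix} -5 \\ -1 \end{pmatrix}.$$

Similarly, the difference v − u is defined to be the vector (5, 1):

$$\mathbf{v} - \mathbf{u} = \begin{pmatrix} 4 \\ 4 \end{pmatrix} - \begin{pmatrix} -1 \\ 3 \end{pmatrix} = \begin{pmatrix} 5 \\ 1 \end{pmatrix}.$$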
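Definitions 99 to 101 translate almost word for word into code. Here is a minimal Python sketch, assuming vectors are stored as plain tuples; the function names `add`, `neg`, and `sub` are made up for this illustration.

```python
Vec = tuple[float, float]


def add(u: Vec, v: Vec) -> Vec:
    """Definition 99: add component by component."""
    return (u[0] + v[0], u[1] + v[1])


def neg(u: Vec) -> Vec:
    """Definition 100: the additive inverse flips the sign of each component."""
    return (-u[0], -u[1])


def sub(u: Vec, v: Vec) -> Vec:
    """Definition 101: u - v is u plus the additive inverse of v."""
    return add(u, neg(v))


u, v = (-1, 3), (4, 4)
print(add(u, v))  # (3, 7)
print(sub(u, v))  # (-5, -1)
print(sub(v, u))  # (5, 1)
```

Writing `sub` in terms of `add` and `neg` follows the definition directly; Fact 54 says this gives the same answer as subtracting component by component.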


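In the previous subchapter, we defined $B - A = \overrightarrow{AB}$. We’ll now prove that $\overrightarrow{OB} - \overrightarrow{OA} = \overrightarrow{AB}$:

Fact 55. If A and B are points, then $\overrightarrow{OB} - \overrightarrow{OA} = \overrightarrow{AB}$.

Proof. Let A = (a1 , a2 ) and B = (b1 , b2 ). By Definition 90,

$$\overrightarrow{AB} = (b_1 - a_1, b_2 - a_2), \quad \overrightarrow{OA} = (a_1, a_2), \quad \text{and} \quad \overrightarrow{OB} = (b_1, b_2).$$

By Fact 54, $\overrightarrow{OB} - \overrightarrow{OA} = (b_1 - a_1, b_2 - a_2)$. Thus, $\overrightarrow{OB} - \overrightarrow{OA} = \overrightarrow{AB}$.

More generally:

Fact 56. If A, B, and C are points, then $\overrightarrow{AB} - \overrightarrow{AC} = \overrightarrow{CB}$ and $\overrightarrow{AB} + \overrightarrow{BC} = \overrightarrow{AC}$.

Proof. Let A = (a1 , a2 ), B = (b1 , b2 ), and C = (c1 , c2 ) be points. Then by Definition 96:

$$\overrightarrow{AB} = (b_1 - a_1, b_2 - a_2), \quad \overrightarrow{AC} = (c_1 - a_1, c_2 - a_2), \quad \text{and} \quad \overrightarrow{CB} = (b_1 - c_1, b_2 - c_2).$$

And now by Fact 54,

$$\overrightarrow{AB} - \overrightarrow{AC} = (b_1 - a_1, b_2 - a_2) - (c_1 - a_1, c_2 - a_2) = (b_1 - c_1, b_2 - c_2) = \overrightarrow{CB}. \quad \checkmark$$

Observing that $-\overrightarrow{CB} = \overrightarrow{BC}$ and rearranging, we also have $\overrightarrow{AB} + \overrightarrow{BC} = \overrightarrow{AC}$. ✓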

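Example 540. Let A = (−1, 2) and B = (3, −1). Then:

$$\overrightarrow{AB} = \overrightarrow{OB} - \overrightarrow{OA} = \begin{pmatrix} 3 \\ -1 \end{pmatrix} - \begin{pmatrix} -1 \\ 2 \end{pmatrix} = \begin{pmatrix} 4 \\ -3 \end{pmatrix}.$$

Example 541. Let C = (−1, 1) and D = (3, −2). Then:

$$\overrightarrow{CD} = \overrightarrow{OD} - \overrightarrow{OC} = \begin{pmatrix} 3 \\ -2 \end{pmatrix} - \begin{pmatrix} -1 \\ 1 \end{pmatrix} = \begin{pmatrix} 4 \\ -3 \end{pmatrix}.$$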

Exercise 168. Express each of the following vectors more simply: (Answer on p. 1471)

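$$\overrightarrow{AC} + \overrightarrow{CB}, \quad \overrightarrow{DC} + \overrightarrow{CA}, \quad \overrightarrow{BD} + \overrightarrow{DA},$$
$$\overrightarrow{AD} - \overrightarrow{CD}, \quad -\overrightarrow{DC} - \overrightarrow{BD}, \quad \overrightarrow{BD} + \overrightarrow{DB}.$$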


34.10. Scalar Multiplication of a Vector
Scalar multiplication of a vector works in the “obvious” fashion:

Definition 102. Given the vector v = (v1 , v2 ) and the scalar c ∈ R, the vector cv is:

cv = (cv1 , cv2 ) .

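If c > 0, then cv is simply the vector that points in the same direction as v, but has c times
the length. If c < 0, then cv points in the exact opposite direction and has ∣c∣ times the
length. (And if c = 0, then cv is the zero vector.)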


Fact 57. If v is a vector and c ∈ R, then ∣cv∣ = ∣c∣ ∣v∣.

Proof. See Exercise 169.

Exercise 169. Let v = (v1 , v2 ). (Answer on p. 1471.)

(a) Write out cv.

(b) Now prove Fact 57. (Hint: $\sqrt{x^2 y} = |x| \sqrt{y}$.)

Exercise 170. Let A = (1, −3), B = (2, 0), and C = (5, −1). (Answer on p. 1472.)
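(a) Write down $\overrightarrow{AB}$, $\overrightarrow{AC}$, $\overrightarrow{BC}$, $2\overrightarrow{AB}$, $3\overrightarrow{AC}$, and $4\overrightarrow{BC}$.
(b) Verify that $|2\overrightarrow{AB}| = 2|\overrightarrow{AB}|$, $|3\overrightarrow{AC}| = 3|\overrightarrow{AC}|$, and $|4\overrightarrow{BC}| = 4|\overrightarrow{BC}|$.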


34.11. When Do Two Vectors Point in the Same Direction?
Two non-zero vectors u and v are said to point in:
(a) The same direction if they are positive scalar multiples of each other;
(b) Exact opposite directions if they are negative scalar multiples of each other; and
(c) Different directions if they are not scalar multiples of each other.
A little more formally:

Definition 103. Two non-zero vectors u and v are said to point in:
(a) The same direction if u = kv for some k > 0;
(b) Exact opposite directions if u = kv for some k < 0; and
(c) Different directions if u ≠ kv for any k.

Example 542. Let a = (2, 0), b = (1, 0), c = (−3, 0), and d = (1, 1).
The vectors a and b point in the same direction because a = 2b.
The vectors a and c point in the exact opposite directions because c = −1.5a.
The vectors a and d point in different directions because a ≠ kd for any k.

Exercise 171. Continuing with the above example, explain if b points in the same, exact
opposite, or different direction from each of c and d. (Answer on p. 1472.)

Remark 53. Note the special case of the zero vector 0 = (0, 0) — it does not point in the
same, exact opposite, or different direction as any other vector.


34.12. When Are Two Vectors Parallel?
Two non-zero vectors u and v are parallel if they point in the same or the exact opposite
directions and non-parallel otherwise. [Footnote 207]

Definition 104. Two non-zero vectors u and v are parallel if u = kv for some k and
non-parallel otherwise.

(So, non-parallel and point in different directions are synonyms.)

As shorthand, we write a ∥ b if a and b are parallel and a ∥/ b if they are not.

Example 543. The vectors a = (2, 0), b = (1, 0), c = (−3, 0) are parallel. And so as
shorthand, we may write a ∥ b, a ∥ c, and b ∥ c.
The vector d = (1, 1) is not parallel to a, b, or c. Equivalently, d = (1, 1) points in a
different direction from a, b, and c. And so as shorthand, we may write d ∥/ a, d ∥/ b,
and d ∥/ c.

Remark 54. Again, note the special case of the zero vector 0 = (0, 0) — it is neither
parallel nor non-parallel to any other vector.
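Definitions 103 and 104 can be turned into a small test. The Python sketch below is an illustration only, and it uses a shortcut that has not been introduced in this textbook: for non-zero 2D vectors, u = kv for some k exactly when u1 v2 − u2 v1 = 0. The function name `direction_relation` and its return strings are made up.

```python
def direction_relation(u: tuple[float, float], v: tuple[float, float]) -> str:
    """Classify two non-zero 2D vectors as pointing in the 'same',
    'opposite', or 'different' directions (Definition 103)."""
    if u == (0, 0) or v == (0, 0):
        raise ValueError("the zero vector has no direction relation (Remark 53)")
    # Assumption: u = kv for some k if and only if u1*v2 - u2*v1 = 0.
    if u[0] * v[1] - u[1] * v[0] != 0:
        return "different"
    # Here u = kv; recover the sign of k from a non-zero component of v.
    i = 0 if v[0] != 0 else 1
    k = u[i] / v[i]
    return "same" if k > 0 else "opposite"


def are_parallel(u: tuple[float, float], v: tuple[float, float]) -> bool:
    """Definition 104: parallel means same or exact opposite directions."""
    return direction_relation(u, v) != "different"


a, b, c, d = (2, 0), (1, 0), (-3, 0), (1, 1)
print(direction_relation(a, b), direction_relation(a, c), direction_relation(a, d))
```

The exact comparison with 0 is fine for integer components like those in Example 542; with floating-point components, a small tolerance would be needed instead.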

[Footnote 207] Just so you know, some writers call two vectors that point in exact opposite directions anti-parallel.
In contrast, we call them parallel. We will not use the term anti-parallel in this textbook.

34.13. Unit Vectors

Definition 105. A unit vector is any vector of length 1.

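Example 544. The vectors (1, 0), (0, 1), and $(\sqrt{2}/2, \sqrt{2}/2)$ are unit vectors:

$$|(1, 0)| = \sqrt{1^2 + 0^2} = 1, \quad \checkmark$$
$$|(0, 1)| = \sqrt{0^2 + 1^2} = 1, \quad \checkmark$$
$$\left|\left(\frac{\sqrt{2}}{2}, \frac{\sqrt{2}}{2}\right)\right| = \sqrt{\left(\frac{\sqrt{2}}{2}\right)^2 + \left(\frac{\sqrt{2}}{2}\right)^2} = \sqrt{\frac{2}{4} + \frac{2}{4}} = 1. \quad \checkmark$$

Example 545. The vectors (1, 1) and (−1, −1) are not unit vectors:

$$|(1, 1)| = \sqrt{1^2 + 1^2} = \sqrt{2} \neq 1,$$
$$|(-1, -1)| = \sqrt{(-1)^2 + (-1)^2} = \sqrt{2} \neq 1.$$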

Given a vector v, its unit vector, denoted v̂, is the vector that points in the same direction,
but has length 1. Formally:

Definition 106. Given a non-zero vector v, its unit vector (or the unit vector in its
direction) is:
$$\hat{\mathbf{v}} = \frac{1}{|\mathbf{v}|} \mathbf{v}.$$

It is easy to verify that thus defined, any vector’s unit vector has length 1:

Fact 58. Given any non-zero vector, its unit vector has length 1.

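Proof. By Fact 57, $|\hat{\mathbf{v}}| = \left|\frac{1}{|\mathbf{v}|} \mathbf{v}\right| = \left|\frac{1}{|\mathbf{v}|}\right| |\mathbf{v}| = \frac{1}{|\mathbf{v}|} |\mathbf{v}| = 1$.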

Fact 59. Let v be a vector with unit vector v̂. If c ∈ R, then the vector cv̂ has length ∣c∣.

Proof. See Exercise 172.

Fact 60. Let a and b be non-zero vectors. Then:

(a) â = b̂ ⇐⇒ a and b point in the same direction.
(b) â = −b̂ ⇐⇒ a and b point in exact opposite directions.
(c) â = ±b̂ ⇐⇒ a and b are parallel.
(d) â ≠ ±b̂ ⇐⇒ a and b are non-parallel.

Proof. See p. 1287 in the Appendices.
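Definition 106 is easy to try out numerically. Here is a minimal Python sketch, with vectors stored as tuples and a hypothetical function name `unit_vector`. Because it uses floating-point arithmetic, the length of the result is 1 only up to rounding error.

```python
import math


def unit_vector(v: tuple[float, float]) -> tuple[float, float]:
    """Return v scaled by 1/|v| (Definition 106). Only for non-zero v."""
    length = math.hypot(v[0], v[1])
    if length == 0:
        raise ValueError("the zero vector has no unit vector")
    return (v[0] / length, v[1] / length)


for v in [(3, 4), (1, 1), (-1, -1), (5, 0)]:
    v_hat = unit_vector(v)
    # Fact 58: the unit vector has length 1 (up to rounding error here).
    assert math.isclose(math.hypot(*v_hat), 1.0)
    print(v, v_hat)
```

Comparing `unit_vector(a)` with `unit_vector(b)` gives a practical way to apply Fact 60(a), again subject to rounding.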


Exercise 172. Prove Fact 59. (Answer on p. 1472.)
Exercise 173. Let A = (1, −3), B = (2, 0), and C = (5, −1) be points. What are the unit
vectors of the following six vectors? (Answer on p. 1472.)

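$$\overrightarrow{AB}, \quad \overrightarrow{AC}, \quad \overrightarrow{BC}, \quad 2\overrightarrow{AB}, \quad 3\overrightarrow{AC}, \quad 4\overrightarrow{BC}.$$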

Remark 55. Note that some writers also call û the normalised vector of u, but we shall
not do so.


34.14. The Standard Basis Vectors

[Figure: the standard basis vectors i = (1, 0) and j = (0, 1) drawn from the origin along the positive x- and y-axes.]

The standard basis vectors i = (1, 0) and j = (0, 1) are simply the unit vectors that point
in the directions of the positive x- and y-axes. Formally:

Definition 107. The standard basis vectors (in 2D space) are i = (1, 0) and j = (0, 1).

It turns out that any vector can be written as the linear combination (i.e. weighted
sum) of i’s and j’s:

Example 546. Let A = (−2, −1), B = (1, 2), and C = (5, −2) be points. Their position
vectors can be written as linear combinations (i.e. weighted sums) of i’s and j’s:

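$$\overrightarrow{OA} = \begin{pmatrix} -2 \\ -1 \end{pmatrix} = -2\mathbf{i} - \mathbf{j} = -2\begin{pmatrix} 1 \\ 0 \end{pmatrix} - \begin{pmatrix} 0 \\ 1 \end{pmatrix},$$
$$\overrightarrow{OB} = \begin{pmatrix} 1 \\ 2 \end{pmatrix} = \mathbf{i} + 2\mathbf{j} = \begin{pmatrix} 1 \\ 0 \end{pmatrix} + 2\begin{pmatrix} 0 \\ 1 \end{pmatrix},$$
$$\overrightarrow{OC} = \begin{pmatrix} 5 \\ -2 \end{pmatrix} = 5\mathbf{i} - 2\mathbf{j} = 5\begin{pmatrix} 1 \\ 0 \end{pmatrix} - 2\begin{pmatrix} 0 \\ 1 \end{pmatrix}.$$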


34.15. Any Vector Is A Linear Combination of Two Other Vectors
We just learnt that any vector can be written as the linear combination (i.e. weighted
sum) of the standard basis vectors i and j. It turns out that more generally, we can do the
same using any two vectors, so long as they are non-parallel:

Fact 61. Let a, b, and c be vectors. If a ∥/ b, then there are α, β ∈ R such that:

c = αa + βb.

Proof. The next page gives a heuristic proof. For a formal proof, see p. 1288 (Appendices).

Example 547. Consider the vectors a = (1, 2) and b = (3, 4). Since a ∥/ b, by Fact 61,
any vector can be expressed as the linear combination of a and b.
Consider for example the vector u = (2, 2). We will find α, β ∈ R such that u = αa + βb.
To do so, first write:

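$$\mathbf{u} = \begin{pmatrix} 2 \\ 2 \end{pmatrix} = \alpha\mathbf{a} + \beta\mathbf{b} = \alpha\begin{pmatrix} 1 \\ 2 \end{pmatrix} + \beta\begin{pmatrix} 3 \\ 4 \end{pmatrix}.$$

Write out the above vector equation as the following two cartesian equations:

$$2 = 1\alpha + 3\beta \quad (1) \qquad \text{and} \qquad 2 = 2\alpha + 4\beta. \quad (2)$$

Now solve this system of (two) equations: (2) minus 2 × (1) yields −2β = −2 or β = 1, so that
α = −1. Thus:

$$\mathbf{u} = \begin{pmatrix} 2 \\ 2 \end{pmatrix} = -\begin{pmatrix} 1 \\ 2 \end{pmatrix} + \begin{pmatrix} 3 \\ 4 \end{pmatrix} = -\mathbf{a} + \mathbf{b}.$$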
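The two cartesian equations in Example 547 can also be solved by computer. The Python sketch below is an illustration only. Instead of the elimination used above, it uses Cramer's rule (a different but equivalent technique that this textbook has not introduced), it assumes integer components so that exact fractions can be used, and the function name `linear_combination` is made up.

```python
from fractions import Fraction


def linear_combination(a, b, c):
    """Find (alpha, beta) with c = alpha*a + beta*b, for 2D vectors with
    integer components. Requires a and b to be non-parallel (Fact 61)."""
    det = a[0] * b[1] - a[1] * b[0]
    if det == 0:
        raise ValueError("a and b are parallel (or zero), so Fact 61 does not apply")
    # Cramer's rule for the system  a1*alpha + b1*beta = c1,
    #                               a2*alpha + b2*beta = c2.
    alpha = Fraction(c[0] * b[1] - c[1] * b[0], det)
    beta = Fraction(a[0] * c[1] - a[1] * c[0], det)
    return alpha, beta


alpha, beta = linear_combination((1, 2), (3, 4), (2, 2))
print(alpha, beta)  # -1 1
```

Calling the function with parallel vectors, such as a = (1, 1) and b = (2, 2), raises an error, which matches the requirement in Fact 61 that a and b be non-parallel.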

Exercise 174. Let a = (1, 2) and b = (3, 4). Express each of the vectors v = (3, 2) and
w = (−1, 0) as the linear combination of a and b. (Answer on p. 1472.)
Exercise 175. Explain why any vector can be written as a linear combination of the
vectors a = (1, 3) and b = (7, 5). Then express each of the vectors i = (1, 0), j = (0, 1),
and d = (1, 1) as the linear combination of a and b. (Answer on p. 1473.)

Remark 56. For a somewhat recent application of Fact 61 in the A-Level exams, see
N2013/I/6(i) (Exercise 521).


The condition that a ∥/ b is important. If a ∥ b, then Fact 61 does not apply:

Example 548. Consider the vectors a = (1, 1) and b = (2, 2). Since a ∥ b, Fact 61 does
not apply.
For example, we cannot write v = (1, 2) as a linear combination of a and b.
(Indeed, this can only be done for vectors that are themselves parallel to a and b.)

Example 549. The vectors c = (3, 1) and d = (−3, −1) point in exact opposite directions.
Since c ∥ d, Fact 61 does not apply.
For example, we cannot write v = (1, 2) as a linear combination of c and d.
(Indeed, this can only be done for vectors that are themselves parallel to c and d.)

Here is a heuristic proof-by-picture of Fact 61.

Let a, b, and v be non-zero vectors, with a ∥/ b.
Place a’s tail at v’s tail. Then also place b’s head at v’s head.

[Figure: the vector v = αa + βb, with a placed at v’s tail and b placed at v’s head.]



As the above figure suggests, “obviously”, we can always find real numbers α and β so that
the head of αa and the tail of βb coincide. In other words, there are real numbers α and
β such that v = αa + βb.


34.16. The Ratio Theorem

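Theorem 7. Let A and B be points with position vectors a and b. Let P be the point
that divides the line segment AB in the ratio λ ∶ µ. Then P’s position vector is:
$$\mathbf{p} = \frac{\mu\mathbf{a} + \lambda\mathbf{b}}{\lambda + \mu}.$$

[Figure: the line segment AB, with the point P dividing it in the ratio λ ∶ µ. The points A, B, and P have position vectors a, b, and (µa + λb)/(λ + µ).]

Proof. By Fact 55, $\overrightarrow{AP} = \mathbf{p} - \mathbf{a}$ (1) and $\overrightarrow{AB} = \mathbf{b} - \mathbf{a}$.
Now observe that $\overrightarrow{AP}$ points in the same direction as $\overrightarrow{AB}$, but has λ/ (λ + µ) times the
length. Thus:
$$\overrightarrow{AP} = \frac{\lambda}{\lambda + \mu} \overrightarrow{AB} = \frac{\lambda}{\lambda + \mu} (\mathbf{b} - \mathbf{a}). \quad (2)$$

Putting (1) and (2) together, we have:

$$\mathbf{p} = \mathbf{a} + \overrightarrow{AP} = \mathbf{a} + \frac{\lambda}{\lambda + \mu} (\mathbf{b} - \mathbf{a}) = \frac{(\lambda + \mu)\mathbf{a} + \lambda(\mathbf{b} - \mathbf{a})}{\lambda + \mu} = \frac{\mu\mathbf{a} + \lambda\mathbf{b}}{\lambda + \mu}.$$

By the way, you do not need to mug the Ratio Theorem because the following is printed
on p. 4 of List MF26:

The point dividing AB in the ratio λ : µ has position vector $\frac{\mu\mathbf{a} + \lambda\mathbf{b}}{\lambda + \mu}$.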


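Example 550. Let A = (3, 4) and B = (−1, 2) be points. Let P be the point that divides the
line segment AB in the ratio 3 ∶ 2.
By the Ratio Theorem:
$$\mathbf{p} = \frac{2\mathbf{a} + 3\mathbf{b}}{3 + 2} = \frac{2}{5}\begin{pmatrix} 3 \\ 4 \end{pmatrix} + \frac{3}{5}\begin{pmatrix} -1 \\ 2 \end{pmatrix} = \frac{1}{5}\begin{pmatrix} 3 \\ 14 \end{pmatrix}.$$
And hence: P = (0.6, 2.8).

[Figure: the points A = (3, 4), B = (−1, 2), and P = (0.6, 2.8), with P dividing AB in the ratio 3 ∶ 2.]

Example 551. Let C = (8, 3) and D = (2, −6) be points. Let Q be the point that divides
the line segment CD in the ratio 3 ∶ 7. By the Ratio Theorem:

$$\mathbf{q} = \frac{7\mathbf{c} + 3\mathbf{d}}{3 + 7} = \frac{7}{10}\begin{pmatrix} 8 \\ 3 \end{pmatrix} + \frac{3}{10}\begin{pmatrix} 2 \\ -6 \end{pmatrix} = \frac{1}{10}\begin{pmatrix} 62 \\ 3 \end{pmatrix} \quad \text{and hence} \quad Q = (6.2, 0.3).$$

[Figure: the points C = (8, 3), D = (2, −6), and Q = (6.2, 0.3).]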
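The Ratio Theorem is a one-line formula, so it is easy to check the two examples above by computer. The following Python sketch is only an illustration: the function name `divide_segment` is hypothetical, and it assumes integer coordinates and integer ratio parts so that exact fractions can be used.

```python
from fractions import Fraction


def divide_segment(a, b, lam, mu):
    """Position vector of the point dividing AB in the ratio lam : mu,
    computed as (mu*a + lam*b) / (lam + mu), component by component."""
    total = lam + mu
    return tuple(Fraction(mu * ai + lam * bi, total) for ai, bi in zip(a, b))


p = divide_segment((3, 4), (-1, 2), 3, 2)   # Example 550
q = divide_segment((8, 3), (2, -6), 3, 7)   # Example 551
print([float(x) for x in p])  # [0.6, 2.8]
print([float(x) for x in q])  # [6.2, 0.3]
```

Note that the weight µ goes with a and the weight λ goes with b, which is easy to get the wrong way round.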

Exercise 176. Let A = (1, 2), B = (3, 4), C = (1, 4), D = (2, 3), E = (−1, 2), F = (3, −4)
be points. Find the points P , Q, and R which divide the line segments AB, CD, and
EF in the ratios 5 ∶ 6, 5 ∶ 1, and 2 ∶ 3, respectively. (Answer on p. 1473.)


35. Lines
Recall [Footnote 208] that if l is a line, then l may be written as:

l = {(x, y) ∶ ax + by + c = 0} ,

where at least one of a or b is non-zero.

More simply, we can say that the line l is described by the following cartesian equation: [Footnote 209]

ax + by + c = 0.

In this chapter, we’ll learn a second method for describing lines, namely vector equations.
We’ll start by introducing the concept of a line’s direction vector:

35.1. Direction Vector

Informally, a direction vector of a line is any vector that’s parallel to the line. Formally:
Definition 108. Given any two distinct points A and B on a line, we call the vector AB
a direction vector of the line.

Example 552. Consider the line l described by the cartesian equation y = 2x + 3.

It contains the points:
A = (−1.5, 0) and B = (0, 3).
Hence, a direction vector of l is:
$\overrightarrow{AB}$ = B − A = (0, 3) − (−1.5, 0) = (1.5, 3).

l also contains the points:

C = (−1, 1) and D = (1, 5).

Hence, another direction vector of l is:
$\overrightarrow{CD}$ = D − C = (1, 5) − (−1, 1) = (2, 4).

[Figure: the line l, with the direction vectors AB = (1.5, 3) and CD = (2, 4) drawn along it.]

[208] Ch. 6.8.
[209] Some writers also call this a scalar equation, but we shan’t do so.

As the above example suggests, direction vectors are not unique. If v is a direction vector
of a line, then so too is any vector that’s parallel to v.
But no other vector is a direction vector of the line. That is, if u ∦ v, then u is not a
direction vector of the line. (And so, although the direction vector v isn’t unique, we can
say that it is unique up to non-zero scalar multiplication.)
Altogether then, if a line has direction vector v, then its direction vectors are exactly those
that are parallel to v. Formally:

Fact 62. Let u and v be vectors. Suppose v is a line’s direction vector. Then:

u is also that line’s direction vector ⇐⇒ u ∥ v.

Proof. See p. 1288 in the Appendices.

Example 553. The line l described by y = 2x + 3 has direction vectors:

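$$\overrightarrow{AB} = \begin{pmatrix} 1.5 \\ 3 \end{pmatrix} \quad\text{and}\quad \overrightarrow{CD} = \begin{pmatrix} 2 \\ 4 \end{pmatrix}.$$

We can easily verify that $\overrightarrow{CD} \parallel \overrightarrow{AB}$:

$$\overrightarrow{CD} = \begin{pmatrix} 2 \\ 4 \end{pmatrix} = \frac{4}{3}\begin{pmatrix} 1.5 \\ 3 \end{pmatrix} = \frac{4}{3}\overrightarrow{AB}.$$

The following are parallel to $\overrightarrow{AB}$ and $\overrightarrow{CD}$ and are thus also direction vectors of l:

$$-5\begin{pmatrix} 2 \\ 4 \end{pmatrix} = \begin{pmatrix} -10 \\ -20 \end{pmatrix}, \quad \pi\begin{pmatrix} 2 \\ 4 \end{pmatrix} = \begin{pmatrix} 2\pi \\ 4\pi \end{pmatrix}, \quad\text{and}\quad 17\begin{pmatrix} 2 \\ 4 \end{pmatrix} = \begin{pmatrix} 34 \\ 68 \end{pmatrix}.$$

In contrast, the following are not:

$$\begin{pmatrix} 1 \\ 1 \end{pmatrix}, \quad \begin{pmatrix} 2 \\ 3 \end{pmatrix}, \quad \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \quad \begin{pmatrix} 0 \\ 1 \end{pmatrix}, \quad\text{and}\quad \mathbf{0} = \begin{pmatrix} 0 \\ 0 \end{pmatrix}.$$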


Fact 63. If a line is described by the cartesian equation ax + by + c = 0 or by = −ax − c,
then it has direction vector (b, −a).

Proof. Let D = (p, q) be any point on the line. Since D is on the line, it satisfies the line’s
cartesian equation — that is:

ap + bq + c = 0.

Now consider the point E = (p + b, q − a). We now show that E also satisfies the line’s
cartesian equation and is thus also on the line:

a (p + b) + b (q − a) + c = ap + ab + bq − ab + c = ap + bq + c = 0. ✓

Since D and E are both points on the line, by Definition 108, the line has direction vector:
$\overrightarrow{DE}$ = E − D = (p + b, q − a) − (p, q) = (b, −a).

Example 554. Consider the line described by 5x − 2y + 3 = 0 or y = 2.5x + 1.5. By the
above Fact, it has direction vector (−2, −5).
Since (1, 2.5) ∥ (−2, −5), by Fact 62, it also has direction vector (1, 2.5). (As we’ll see on
the next page, not coincidentally, this line’s gradient is also 2.5.)

Next, the line described by 3y − 1 = 0 or y = 1/3 has direction vector (3, 0).
And the line described by −2x + 1 = 0 or x = 0.5 has direction vector (0, 2).

[Figure: the lines 5x − 2y + 3 = 0, 3y − 1 = 0, and −2x + 1 = 0, with the direction vectors (−2, −5), (1, 2.5), (3, 0), and (0, 2).]
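Fact 63 and the parallel check of Fact 62 can be sketched in Python. Both function names are hypothetical helpers. The parallel test u1·v2 − u2·v1 = 0 is an assumption of this sketch (it is not stated in the text); for non-zero 2D vectors it is equivalent to u being a non-zero scalar multiple of v.

```python
def direction_vector(a, b):
    """A direction vector (b, -a) of the line ax + by + c = 0 (Fact 63)."""
    if a == 0 and b == 0:
        raise ValueError("at least one of a or b must be non-zero")
    return (b, -a)


def are_parallel(u, v):
    """True if the non-zero 2D vectors u and v are parallel.

    Assumption: uses the test u1*v2 - u2*v1 == 0, which for non-zero
    vectors is equivalent to u = k*v for some non-zero k.
    """
    if u == (0, 0) or v == (0, 0):
        return False  # in this sketch, the zero vector is treated as not parallel
    return u[0] * v[1] - u[1] * v[0] == 0


d = direction_vector(5, -2)           # the line 5x - 2y + 3 = 0
print(d)                              # (-2, -5)
print(are_parallel((1, 2.5), d))      # True
print(are_parallel((2, 3), (2, 4)))   # False
```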


The following result follows readily from Fact 63:

Corollary 7. (a) A horizontal line has direction vector (1, 0).

(b) A vertical line has direction vector (0, 1).
(c) A line with gradient m has direction vector (1, m).

Proof. Suppose the line is described by ax + by + c = 0. Then by Fact 63, the line has
direction vector (b, −a).
(a) If the line is horizontal, then by Fact 13, a = 0. And so, the line has direction vector
(b, 0). Since (1, 0) ∥ (b, 0), by Fact 62, the line also has direction vector (1, 0).
(b) Similarly, if the line is vertical, then by Fact 13, b = 0. And so, the line has direction
vector (0, −a). Since (0, 1) ∥ (0, −a), by Fact 62, the line also has direction vector (0, 1).
(c) Since the line has gradient m, it is not vertical, so b ≠ 0 and its gradient is −a/b = m.
But (b, −a) ∥ (1, −a/b) = (1, m). And so by Fact 62, the line also has direction vector (1, m).

Example 555. The horizontal line y = −1 has direction vector (1, 0).
The vertical line x = 2 has direction vector (0, 1).
The oblique line y = x + 1 has gradient 1 and thus direction vector (1, 1).


[Figure: the lines y = −1, x = 2, and y = x + 1, with the direction vectors (1, 0), (0, 1), and (1, 1).]

Exercise 177. For each line, write down a direction vector. (Answer on p. 1474.)

(a) 3x − y + 2 = 0. (b) 7y = −2x − 3. (c) y = π. (d) x = −5.


35.2. Cartesian to Vector Equations

Example 556. Consider the line l = {(x, y) ∶ 3x − y + 2 = 0}. It contains exactly those
points (x, y) that satisfy the cartesian equation 3x − y + 2 = 0 or y = 3x + 2. More simply,
we may say that this cartesian equation describes l.
It turns out that we can also describe l using a vector equation.
To do so, first observe that l contains the point P = (0, 2). Also, it has gradient 3 and
thus direction vector v = (1, 3). Since l is a straight line, it must also contain the points:

P + 1v = (0, 2) + 1(1, 3) = (1, 5) and P − 1v = (0, 2) − 1(1, 3) = (−1, −1).

Indeed, l contains exactly those points R that can be expressed as P + λv = (0, 2) + λ(1, 3)
for some real number λ. That is:

l = {R ∶ R = (0, 2) + λ(1, 3) (λ ∈ R)}.

More simply, we may say that the vector equation R = (0, 2) + λ(1, 3) (λ ∈ R), which we label (1), describes l.

Equivalently, l contains exactly those points R whose position vector r may be expressed
as p + λv = (0, 2) + λ(1, 3) for some real number λ. That is:

l = {R ∶ r = (0, 2) + λ(1, 3) (λ ∈ R)}.

Again, we may more simply say that the vector equation r = (0, 2) + λ(1, 3) (λ ∈ R), which we label (2), describes l.

(By the way, (1) and (2) are subtly different; more on this in Ch. 35.3.)

As in Ch. 23 (Simple Parametric Equations), λ is a parameter. Here “λ ∈ R” says that

λ takes on every value in R. And as λ varies, we get different points of the line.
For example, λ = −1, λ = 0, and λ = 1 produce the following points:

(−1, −1) = (0, 2)−1(1, 3), (0, 2) = (0, 2) + 0(1, 3), and (1, 5) = (0, 2) + 1(1, 3).

[Figure: the line l, described by 3x − y + 2 = 0 or r = (0, 2) + λ(1, 3) (λ ∈ R), with the points (−1, −1) (λ = −1), (0, 2) (λ = 0), and (1, 5) (λ = 1) and the direction vector (1, 3).]

Of course, l also contains infinitely many other points — each distinct value of λ ∈ R
produces a distinct point on l.
We noted in Fact 62 that a line’s direction vector is unique up to non-zero scalar
multiplication. A direction vector of l is v = (1, 3), but so too is any non-zero scalar
multiple of v. And so, here are three more ways to write l:

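$$l = \left\{ (x, y) : \mathbf{r} = \begin{pmatrix} 0 \\ 2 \end{pmatrix} + \lambda \begin{pmatrix} 100 \\ 300 \end{pmatrix} \ (\lambda \in \mathbb{R}) \right\}$$

$$= \left\{ (x, y) : \mathbf{r} = \begin{pmatrix} -1 \\ -1 \end{pmatrix} + \lambda \begin{pmatrix} -100 \\ -300 \end{pmatrix} \ (\lambda \in \mathbb{R}) \right\}$$

$$= \left\{ (x, y) : \mathbf{r} = \begin{pmatrix} 1 \\ 5 \end{pmatrix} + \lambda \begin{pmatrix} 1.5 \\ 4.5 \end{pmatrix} \ (\lambda \in \mathbb{R}) \right\}.$$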

More generally, for any k ≠ 0 and u = k (1, 3), we may write:

l = {(x, y) ∶ r = (0, 2) + λu (λ ∈ R)} .

The foregoing discussion suggests the following general Definition of a line.

Definition 109. A line is any set of points that can be written as:
{R ∶ $\overrightarrow{OR}$ = p + λv (λ ∈ R)},

where p and v ≠ 0 are some vectors.

The above Definition says that a line contains exactly those points R whose position vector
$\overrightarrow{OR}$ = r may be expressed as:
$\overrightarrow{OR}$ = r = p + λv = (p1 , p2 ) + λ(v1 , v2 )
for some real number λ.

Equivalently, a line contains exactly those points R that may be expressed as:

R = (p1 , p2 ) + λ(v1 , v2 )
for some real number λ.

To repeat, here are what the vectors p and v and the number λ mean:
• p = (p1 , p2 ) is the position vector of some point on the line;
• v = (v1 , v2 ) is a direction vector of the line; and
• The parameter λ takes on every value in R; each distinct value produces a distinct
point on the line.
Note that Definition 109 is perfectly consistent with our earlier definition of a line (Definition
39). The difference is that Definition 39 “works” only in 2D space. In contrast, Definition
109 is more general: it “works” in 2D space and, as we’ll see, also in 3D space.[210]

[210] Indeed, it also “works” in any n-dimensional space.

And so, to write down a line’s vector equation, we need simply find any point on the line
and any direction vector of the line. More examples to illustrate how this works:

Example 557. The line l is described by y = 1 − x.

It contains the point (0, 1). It has gradient −1 and hence, by Corollary 7, the direction
vector (1, −1). Thus, we can also describe the line l by:

R = (0, 1) + λ(1, −1) (λ ∈ R), (1)
or r = (0, 1) + λ(1, −1) (λ ∈ R). (2)

Both (1) and (2) say the same thing:

• (1) says that l contains exactly those points R that may be written as (0, 1) + λ(1, −1),
for some real number λ.
• (2) says that l contains exactly those points R whose position vector r = $\overrightarrow{OR}$ may be
written as (0, 1) + λ(1, −1), for some real number λ.

[Figure: the line l, described by y = 1 − x or r = (0, 1) + λ(1, −1) (λ ∈ R), with the points (−1, 2) (λ = −1), (0, 1) (λ = 0), and (1, 0) (λ = 1) and the direction vector (1, −1).]

As the parameter λ takes on different values in R, we get different points of l. So for

example, λ = −1, λ = 0, and λ = 1 produce the following points:

(−1, 2) = (0, 1) − 1(1, −1),

(0, 1) = (0, 1) + 0(1, −1),
(1, 0) = (0, 1) + 1(1, −1).
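The way the parameter λ generates points of a line can be sketched in Python. The name `point_on_line` is a hypothetical helper; the sketch reproduces the three points of this example and checks each against the cartesian equation y = 1 − x.

```python
def point_on_line(p, v, lam):
    """The point p + lam*v on the line r = p + lam*v."""
    if all(vi == 0 for vi in v):
        raise ValueError("direction vector must be non-zero")
    return tuple(pi + lam * vi for pi, vi in zip(p, v))


# Example 557: y = 1 - x, with p = (0, 1) and v = (1, -1).
points = [point_on_line((0, 1), (1, -1), lam) for lam in (-1, 0, 1)]
print(points)  # [(-1, 2), (0, 1), (1, 0)]

# Every generated point should satisfy the cartesian equation y = 1 - x.
print(all(y == 1 - x for x, y in points))  # True
```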


Example 558. The line l is described by y = 3.
It contains the point (0, 3). Since it’s horizontal, by Corollary 7, it has direction vector
(1, 0). Hence, we can also describe the line l by:

R = (0, 3) + λ(1, 0) (λ ∈ R), (1)
or r = (0, 3) + λ(1, 0) (λ ∈ R). (2)

[Figure: the line l, with the points (−1, 3) (λ = −1), (0, 3) (λ = 0), and (1, 3) (λ = 1) and the direction vector (1, 0).]

As λ varies, we get different points of l. So for example, λ = −1, λ = 0, and λ = 1 produce:

(−1, 3) = (0, 3) − 1(1, 0),

(0, 3) = (0, 3) + 0(1, 0),
(1, 3) = (0, 3) + 1(1, 0).

Example 559. The line l is described by:

x = −1.

It contains the point (−1, 0). Since it’s vertical, by Corollary 7, it has direction vector (0, 1).
Hence, we can also describe the line l by:

R = (−1, 0) + λ(0, 1) (λ ∈ R), (1)
or r = (−1, 0) + λ(0, 1) (λ ∈ R). (2)

As λ varies, we get different points of l. So for example, λ = −1, λ = 0, and λ = 1 produce:

(−1, −1) = (−1, 0) − 1(0, 1),
(−1, 0) = (−1, 0) + 0(0, 1),
(−1, 1) = (−1, 0) + 1(0, 1).

[Figure: the line l, with the points (−1, −1) (λ = −1), (−1, 0) (λ = 0), and (−1, 1) (λ = 1) and the direction vector (0, 1).]


Exercise 178. Each of the following (cartesian) equations describes a line. Rewrite each
into a vector equation. Also write down the points corresponding to when your parameter
takes on the values −1, 0, and 1. (Answer on p. 1474.)

(a) −5x + y + 1 = 0. (b) x − 2y − 1 = 0. (c) y − 4 = 0. (d) x − 4 = 0.

Exercise 179. In Definition 109 (of a line), we impose the restriction that a line’s direction
vector v must be non-zero. By considering what the line becomes if v is the zero
vector, explain why we impose this restriction. (Answer on p. 1474.)

Remark 57. Here we repeat our earlier warning. A line {(x, y) ∶ ax + by + c = 0} is a set of
points. But for the sake of convenience, we often simply say that the line may be described
by the cartesian equation ax + by + c = 0. And if we’re especially lazy or sloppy, we
might even say that the line is the equation ax + by + c = 0 (even though strictly speaking,
this is wrong because a line is not an equation — it is a set).
Here likewise, a line {R = (x, y) ∶ r = p + λv, λ ∈ R} is a set of points. But for the sake
of convenience, we will often simply say that the line may be described by the vector
equation r = p + λv (λ ∈ R). And if we’re especially lazy or sloppy, we might even say
that the line is the equation r = p + λv (even though again, strictly speaking, this is
wrong because a line is not an equation; it is a set).


35.3. Pedantic Points to Test/Reinforce Your Understanding
As we’ve seen above, a line l may be described using either of the following equations:

R = P + λv (λ ∈ R), (1)

where R and P are Points and λv is a Vector.

Or: r = p + λv (λ ∈ R), (2)

where r, p, and λv are all Vectors.

Here are three pedantic points that can serve as a useful test of your understanding:
Pedantic Point #1. (1) is consistent with what we learnt earlier (in Ch. 34.9):

Point = Point + Vector.

(2) is also consistent with what we learnt earlier:

Vector = Vector + Vector.

So, both vector equations (1) and (2) are perfectly correct ways to describe the exact same line.
The difference is that (1) does so “more directly” than does (2). Because, to repeat:

• (1) says that l contains exactly those points R that may be written as P + λv, for some
real number λ.
• (2) says that l contains exactly those points R whose position vector r = $\overrightarrow{OR}$ may be
written as p + λv, for some real number λ.

Pedantic Point #2. What would be wrong and unacceptable is the following:

R = p + λv (λ ∈ R). (3) ✗

As we learnt earlier (Ch. 34.9), Vector + Vector = Vector. But the LHS of (3) is a Point
while its RHS is a Vector. Thus, (3) is false.

Similarly, the following is also wrong and unacceptable:

r = P + λv (λ ∈ R). (4) ✗

As we also learnt earlier, Point + Vector = Point. But the LHS of (4) is a Vector while its
RHS is a Point. Thus, (4) is false.


Pedantic Point #3. A line is a set of points and not a set of vectors. So, take care to
note that the line l contains the points R = (x, y) and P = (p1 , p2 ) — it does not contain
the vectors r = (x, y) and p = (p1 , p2 ).


35.4. Vector to Cartesian Equations
Suppose a line is described by the following vector equation:

r = p + λv (λ ∈ R).

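Or equivalently:

$$\begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} p_1 \\ p_2 \end{pmatrix} + \lambda \begin{pmatrix} v_1 \\ v_2 \end{pmatrix} \quad (\lambda \in \mathbb{R}).$$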

Then given any point (x, y) on this line, there must be some real number λ such that:

x = p1 + λv1 and y = p2 + λv2 .

We say that the line may be described by the above pair of cartesian equations.
Hm ... but aren’t we supposed to be able to describe a line with just one cartesian equation?
Well, if we’d like, we can do some easy algebra to eliminate the parameter λ:

Example 560. The line l is described by:

r = (1, 2) + λ(1, 1) (λ ∈ R).

We can also describe l by the following pair of cartesian equations:

x = 1 + λ ⋅ 1 (1) and y = 2 + λ ⋅ 1. (2)

We can eliminate the parameter λ through simple algebra. (2) minus (1) yields:

y − x = 1 or y = x + 1.

[Figure: the line l, described by y = x + 1 or r = (1, 2) + λ(1, 1).]

Example 561. The line l is described by:

r = (0, 0) + λ(4, 5) (λ ∈ R).

We can also describe l by the following pair of cartesian equations:

x = 0 + λ ⋅ 4 (1) and y = 0 + λ ⋅ 5. (2)

(2) minus $\frac{5}{4}$ × (1) yields:

$$y - \frac{5}{4}x = 0 \quad\text{or}\quad y = \frac{5}{4}x.$$

[Figure: the line l, described by y = (5/4)x or r = (0, 0) + λ(4, 5).]


Example 562. The line l is described by:

r = (3, 1) + λ(0, 2) (λ ∈ R).

We can also describe l by the following pair of cartesian equations:

x = 3 + λ ⋅ 0 = 3 (1) and y = 1 + λ ⋅ 2. (2)

Observe that in this example, we cannot use algebra to eliminate λ. It turns out that this is
actually a vertical line.
As λ varies, the value of x is fixed at x = 3, while y varies along with λ. And so, instead
of doing any algebra, we’ll simply discard (2). The above pair of cartesian equations then
reduces to the single equation:

x = 3.

[Figure: the vertical line l, described by x = 3 or r = (3, 1) + λ(0, 2).]

Example 563. The line l is described by:

r = (−1, 2) + λ(−1, 0) (λ ∈ R).

We can also describe l by the following pair of cartesian equations:

x = −1 + λ ⋅ (−1) (1) and y = 2 + λ ⋅ 0 = 2. (2)

Observe that in this example, we cannot use algebra to eliminate λ. It turns out that this is
actually a horizontal line.
As λ varies, the value of y is fixed at y = 2, while x varies along with λ. And so, instead
of doing any algebra, we’ll simply discard (1). The above pair of cartesian equations then
reduces to the single equation:

y = 2.

[Figure: the horizontal line l, described by y = 2 or r = (−1, 2) + λ(−1, 0).]

Exercise 180. Each of the following vector equations describes a line. Rewrite each into
cartesian equation form. (Answer on p. 1474.)

(a) r = (−1, 3) + λ (1, −2) (λ ∈ R). (b) r = (5, 6) + λ (7, 8) (λ ∈ R).

(c) r = (0, −3) + λ (3, 0) (λ ∈ R). (d) r = (1, 1) + λ (0, 2) (λ ∈ R).


In general:

Fact 64. Let l be the line described by r = (p1 , p2 ) + λ(v1 , v2 ) (λ ∈ R).

(a) If v1 , v2 ≠ 0, then l can be described by:

$$\frac{x - p_1}{v_1} = \frac{y - p_2}{v_2} \quad\text{or, rearranging:}\quad y = \frac{v_2}{v_1}x + p_2 - \frac{v_2}{v_1}p_1.$$

(b) If v1 = 0, then l is vertical and can be described by x = p1 .
(c) If v2 = 0, then l is horizontal and can be described by y = p2 .

Proof. First, write:

x = p1 + λv1 (1) and y = p2 + λv2 . (2)

Then v1 × (2) minus v2 × (1) yields:

v1 y − v2 x = v1 p2 + λv1 v2 − v2 p1 − λv1 v2 = v1 p2 − v2 p1 .

Rearranging:

v2 (x − p1 ) = v1 (y − p2 ). (3)

(a) If v1 , v2 ≠ 0, then (3) divided by v1 v2 yields $\dfrac{x - p_1}{v_1} = \dfrac{y - p_2}{v_2}$.
(b) If v1 = 0, then v2 ≠ 0 (because v ≠ 0), and so (3) becomes x = p1 .
(c) If v2 = 0, then v1 ≠ 0 (because v ≠ 0), and so (3) becomes y = p2 .


Armed with Fact 64, we now revisit the last four examples.

Example 564. The line r = (1, 2) + λ(1, 1) (λ ∈ R) may be described by:

$$\frac{x - 1}{1} = \frac{y - 2}{1} \quad\text{or, rearranging:}\quad y = x + 1.$$

Example 565. The line r = (0, 0) + λ(4, 5) (λ ∈ R) may be described by:

$$\frac{x - 0}{4} = \frac{y - 0}{5} \quad\text{or, rearranging:}\quad y = \frac{5}{4}x.$$

Example 566. The line r = (3, 1) + λ(0, 2) (λ ∈ R) may be described by x = 3.

Example 567. The line r = (−1, 2) + λ(−1, 0) (λ ∈ R) may be described by y = 2.
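The conversion behind Fact 64 can be sketched in Python. The function name `vector_to_cartesian` and the choice to return coefficients (a, b, c) of ax + by + c = 0 are assumptions made for this sketch; the coefficients come from rearranging v2(x − p1) = v1(y − p2), which handles the vertical and horizontal cases without any division.

```python
def vector_to_cartesian(p, v):
    """Coefficients (a, b, c) of ax + by + c = 0 for the line r = p + lam*v.

    Rearranges v2*(x - p1) = v1*(y - p2) into
    v2*x - v1*y + (v1*p2 - v2*p1) = 0.
    """
    p1, p2 = p
    v1, v2 = v
    if v1 == 0 and v2 == 0:
        raise ValueError("direction vector must be non-zero")
    return (v2, -v1, v1 * p2 - v2 * p1)


print(vector_to_cartesian((1, 2), (1, 1)))    # (1, -1, 1):  x - y + 1 = 0, i.e. y = x + 1
print(vector_to_cartesian((0, 0), (4, 5)))    # (5, -4, 0):  5x - 4y = 0,  i.e. y = (5/4)x
print(vector_to_cartesian((3, 1), (0, 2)))    # (2, 0, -6):  2x - 6 = 0,   i.e. x = 3
print(vector_to_cartesian((-1, 2), (-1, 0)))  # (0, 1, -2):  y - 2 = 0,    i.e. y = 2
```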

Exercise 181. Use Fact 64 to redo Exercise 180. (Answer on p. 1474.)


36. The Scalar Product

Definition 110. Given vectors u = (u1 , u2 ) and v = (v1 , v2 ), their scalar product, denoted
u ⋅ v, is the number:

u ⋅ v = u1 v1 + u2 v2 .

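Example 568. Let $\mathbf{u} = \begin{pmatrix} 5 \\ -3 \end{pmatrix}$, $\mathbf{v} = \begin{pmatrix} 2 \\ 1 \end{pmatrix}$, $\mathbf{w} = \begin{pmatrix} -4 \\ 0 \end{pmatrix}$, and $\mathbf{x} = \begin{pmatrix} 8 \\ 7 \end{pmatrix}$. Then:

$$\mathbf{u} \cdot \mathbf{v} = \begin{pmatrix} 5 \\ -3 \end{pmatrix} \cdot \begin{pmatrix} 2 \\ 1 \end{pmatrix} = 5 \cdot 2 + (-3) \cdot 1 = 10 - 3 = 7,$$

$$\mathbf{u} \cdot \mathbf{w} = \begin{pmatrix} 5 \\ -3 \end{pmatrix} \cdot \begin{pmatrix} -4 \\ 0 \end{pmatrix} = 5 \cdot (-4) + (-3) \cdot 0 = -20 + 0 = -20,$$

$$\mathbf{u} \cdot \mathbf{x} = \begin{pmatrix} 5 \\ -3 \end{pmatrix} \cdot \begin{pmatrix} 8 \\ 7 \end{pmatrix} = 5 \cdot 8 + (-3) \cdot 7 = 40 - 21 = 19,$$

$$\mathbf{v} \cdot \mathbf{w} = \begin{pmatrix} 2 \\ 1 \end{pmatrix} \cdot \begin{pmatrix} -4 \\ 0 \end{pmatrix} = 2 \cdot (-4) + 1 \cdot 0 = -8 + 0 = -8.$$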

The scalar product is itself simply a scalar (i.e. a real number). Hence the name.

Remark 58. The scalar product is also called the dot product or the inner product.
But it appears that your A-Level exams and syllabus do not use these terms. And so
neither shall we. We will stick strictly to the term scalar product.

Right now, the scalar product may seem like a totally random and useless thing, but as
we’ll soon learn, it is plenty useful. Let us first learn about a few of its properties.

Recall from primary school that ordinary multiplication is commutative:

Example 569. 3 × 5 = 15 and 5 × 3 = 15.

Moreover, ordinary multiplication is distributive (over addition):

Example 570. 3 × (5 + 11) = 3 × 5 + 3 × 11 and 18 × (7 − 31) = 18 × 7 + 18 × (−31).

It turns out that the scalar product is likewise commutative and distributive:


Fact 65. Let a, b, and c be vectors. Then:
(a) a ⋅ b = b ⋅ a. (Commutativity)
(b) a ⋅ (b + c) = a ⋅ b + a ⋅ c. (Distributivity over addition)

The fact that the scalar product is both commutative and distributive is a simple
consequence of the fact that multiplication is itself commutative and distributive.[211]

Proof. Let[212] a = (a1 , a2 ), b = (b1 , b2 ), and c = (c1 , c2 ). Then:

(a) a ⋅ b = a1 b1 + a2 b2 = b1 a1 + b2 a2 = b ⋅ a.

(b) a ⋅ (b + c) = a1 (b1 + c1 ) + a2 (b2 + c2 ) = a1 b1 + a2 b2 + a1 c1 + a2 c2 = a ⋅ b + a ⋅ c.

Example 571. Continue to let u = (5, −3), v = (2, 1), w = (−4, 0), and x = (8, 7).
The scalar product is commutative:

v ⋅ u = u ⋅ v = 7, w ⋅ u = u ⋅ w = −20, x ⋅ u = u ⋅ x = 19.

It is also distributive over addition:

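$$\mathbf{u} \cdot (\mathbf{v} + \mathbf{w}) = \begin{pmatrix} 5 \\ -3 \end{pmatrix} \cdot \begin{pmatrix} 2 - 4 \\ 1 + 0 \end{pmatrix} = -10 - 3 = -13 = 7 - 20 = \mathbf{u} \cdot \mathbf{v} + \mathbf{u} \cdot \mathbf{w},$$

$$(\mathbf{u} + \mathbf{v}) \cdot \mathbf{w} = \begin{pmatrix} 5 + 2 \\ -3 + 1 \end{pmatrix} \cdot \begin{pmatrix} -4 \\ 0 \end{pmatrix} = -28 + 0 = -28 = -20 - 8 = \mathbf{u} \cdot \mathbf{w} + \mathbf{v} \cdot \mathbf{w}.$$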

Here’s another “obvious” property of the scalar product:

Fact 66. Let a and b be vectors and c ∈ R be a scalar. Then:

(ca) ⋅ b = c (a ⋅ b).

Proof. Let[213] a = (a1 , a2 ) and b = (b1 , b2 ). So, ca = (ca1 , ca2 ). Thus:

(ca) ⋅ b = (ca1 ) b1 + (ca2 ) b2 = c (a1 b1 + a2 b2 ) = c (a ⋅ b) .
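The scalar product and the properties in Facts 65 and 66 can be checked numerically with a short Python sketch. The helper names `dot`, `add`, and `scale` are hypothetical; the vectors are the u, v, and w of Example 568.

```python
def dot(u, v):
    """Scalar product u . v = u1*v1 + u2*v2 (works for any equal length)."""
    return sum(ui * vi for ui, vi in zip(u, v))


def add(u, v):
    """Component-wise vector sum u + v."""
    return tuple(ui + vi for ui, vi in zip(u, v))


def scale(c, u):
    """Scalar multiple c*u."""
    return tuple(c * ui for ui in u)


u, v, w = (5, -3), (2, 1), (-4, 0)

print(dot(u, v), dot(v, u))                      # 7 7      (commutativity)
print(dot(u, add(v, w)), dot(u, v) + dot(u, w))  # -13 -13  (distributivity)
print(dot(scale(3, u), v), 3 * dot(u, v))        # 21 21    (Fact 66 with c = 3)
```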

Exercise 182. Let v = (2, 1), w = (−4, 0), and x = (8, 7). Above we already computed
v ⋅ w = −8. Now also compute the following: (Answer on p. 1475.)

v ⋅ x, w ⋅ x, w ⋅ v, x ⋅ v, x ⋅ w, w ⋅ (x + v), (2v) ⋅ x, and w ⋅ (2x).

[211] The latter is, in turn, a fact we will simply take for granted in this textbook.
[212] This proof covers only the two-dimensional case. For a more general proof, see p. 1289 (Appendices).
[213] This proof covers only the two-dimensional case. For a more general proof, see p. 1289 (Appendices).

36.1. A Vector’s Scalar Product with Itself
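It turns out that a vector’s length is the square root of its scalar product with itself:

Fact 67. Let v be a vector. Then $|\mathbf{v}| = \sqrt{\mathbf{v} \cdot \mathbf{v}}$ and $|\mathbf{v}|^2 = \mathbf{v} \cdot \mathbf{v}$.

Proof. By Definition 92, $|\mathbf{v}| = \sqrt{v_1^2 + v_2^2}$. By Definition 110, $\mathbf{v} \cdot \mathbf{v} = v_1 v_1 + v_2 v_2 = v_1^2 + v_2^2$.

Hence, $|\mathbf{v}| = \sqrt{\mathbf{v} \cdot \mathbf{v}}$ and $|\mathbf{v}|^2 = \mathbf{v} \cdot \mathbf{v}$.

Exercise 183. Let u = (5, −3), v = (2, 1), w = (−4, 0), and x = (8, 7).
The lengths of the vectors are:

$$|\mathbf{u}| = \sqrt{5^2 + (-3)^2} = \sqrt{34},$$
$$|\mathbf{v}| = \sqrt{2^2 + 1^2} = \sqrt{5},$$
$$|\mathbf{w}| = \sqrt{(-4)^2 + 0^2} = 4,$$
$$|\mathbf{x}| = \sqrt{8^2 + 7^2} = \sqrt{113}.$$

And the square roots of the scalar product of each vector with itself are:

$$\sqrt{\mathbf{u} \cdot \mathbf{u}} = \sqrt{(5, -3) \cdot (5, -3)} = \sqrt{25 + 9} = \sqrt{34},$$
$$\sqrt{\mathbf{v} \cdot \mathbf{v}} = \sqrt{(2, 1) \cdot (2, 1)} = \sqrt{4 + 1} = \sqrt{5},$$
$$\sqrt{\mathbf{w} \cdot \mathbf{w}} = \sqrt{(-4, 0) \cdot (-4, 0)} = \sqrt{16 + 0} = 4,$$
$$\sqrt{\mathbf{x} \cdot \mathbf{x}} = \sqrt{(8, 7) \cdot (8, 7)} = \sqrt{64 + 49} = \sqrt{113}.$$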
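Fact 67 can be illustrated with a brief Python sketch that computes a length as the square root of a vector’s scalar product with itself and compares it with the standard library’s `math.hypot`. The helper names `dot` and `length` are hypothetical.

```python
import math


def dot(u, v):
    """Scalar product u . v."""
    return sum(ui * vi for ui, vi in zip(u, v))


def length(v):
    """Length of v, computed as the square root of v . v (Fact 67)."""
    return math.sqrt(dot(v, v))


for vec in [(5, -3), (2, 1), (-4, 0), (8, 7)]:
    # The two columns should agree up to floating-point rounding.
    print(vec, length(vec), math.hypot(*vec))
```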

Exercise 184. You are given the vectors a = (−2, 3), b = (7, 1), and c = (5, −4). Verify
that the length of each vector is equal to the square root of each vector’s scalar product
with itself. (Answer on p. 1475.)


37. The Angle Between Two Vectors
In the figure on the right, the tails of the vectors u and v
are placed at the same point.
Informally, the angle between u and v is the “amount”
by which we must rotate u so that it points in the same
direction as v.
Note though that we can either rotate u anticlockwise by
α or clockwise by 2π − α. So, we have an ambiguity here
— which is the angle between u and v? Is it α or 2π − α?
To resolve this ambiguity, we will simply define the angle between two vectors so that it is
always the smaller of these two angles. In other words, we’ll define the angle between
two vectors so that it’s always between 0 and π.
And so, in the above figure, the angle between u and v is α (and not 2π − α).
In contrast, in the figure below, the angle between w and x is β (and not 2π − β).

[Figure: the vectors w and x, with the angle β between them and the reflex angle 2π − β.]

We now give our formal Definition of the angle between two vectors. Be warned that
it comes seemingly outta nowhere. But don’t worry, Exercise 185 (next page) will help you
understand where this Definition comes from.

Definition 111. The angle between two non-zero vectors u and v is the number:

$$\cos^{-1} \frac{\mathbf{u} \cdot \mathbf{v}}{|\mathbf{u}|\,|\mathbf{v}|}.$$

Recall[214] that the range of cos⁻¹ is [0, π]. And so, by the above Definition, the angle
between two vectors is indeed always between 0 and π.

[214] P. 275.

Exercise 185. Let u and v be vectors and θ be the angle between them.

This Exercise will help you understand why we define $\theta = \cos^{-1} \dfrac{\mathbf{u} \cdot \mathbf{v}}{|\mathbf{u}|\,|\mathbf{v}|}$.

(a) Write down the vector that corresponds to the third side of the above triangle.
(b) Write down the lengths of the triangle’s three sides in terms of u and v.
(c) The Law of Cosines (Proposition 5) states that if a triangle has sides of lengths a, b,
and c and has angle C opposite the side of length c, then:

c² = a² + b² − 2ab cos C.

Use the Law of Cosines to write down an equation involving θ, u, and v.

(d) Use distributivity (Fact 65) to prove that (u − v) ⋅ (u − v) = u ⋅ u + v ⋅ v − 2u ⋅ v.
(e) Now take the equation you wrote down in (c), do the algebra (hint: use Fact 67),
and hence show that:

$$\theta = \cos^{-1} \frac{\mathbf{u} \cdot \mathbf{v}}{|\mathbf{u}|\,|\mathbf{v}|}. \quad \text{(Answer on p. 1476.)}$$

A simple rearrangement of Definition 111 produces the following result:

Fact 68. If u and v are two non-zero vectors and θ is the angle between them, then:

u ⋅ v = ∣u∣ ∣v∣ cos θ.

Example 572. Let θ be the angle between the vectors i = (1, 0) and u = (1, 1). Since i
points east, while u points north-east, we know from primary school trigonometry that
θ = π/4.
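Let’s verify that this is consistent with Definition 111:

$$\cos^{-1} \frac{\mathbf{i} \cdot \mathbf{u}}{|\mathbf{i}|\,|\mathbf{u}|} = \cos^{-1} \frac{(1, 0) \cdot (1, 1)}{|(1, 0)|\,|(1, 1)|} = \cos^{-1} \frac{1 \cdot 1 + 0 \cdot 1}{\sqrt{1^2 + 0^2}\sqrt{1^2 + 1^2}} = \cos^{-1} \frac{1}{\sqrt{2}} = \frac{\pi}{4}. \ ✓$$

[Figure: the vectors i = (1, 0) and u = (1, 1), with the angle π/4 between them.]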


Example 573. Let θ be the angle between the vectors i = (1, 0) and j = (0, 1). Since i
points east, while j points north, we know from primary school trigonometry that θ = π/2.
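Let’s verify that this is consistent with Definition 111:

$$\cos^{-1} \frac{\mathbf{i} \cdot \mathbf{j}}{|\mathbf{i}|\,|\mathbf{j}|} = \cos^{-1} \frac{(1, 0) \cdot (0, 1)}{|(1, 0)|\,|(0, 1)|} = \cos^{-1} \frac{1 \cdot 0 + 0 \cdot 1}{\sqrt{1^2 + 0^2}\sqrt{0^2 + 1^2}} = \cos^{-1} 0 = \frac{\pi}{2}. \ ✓$$

[Figure: the vectors i = (1, 0) and j = (0, 1), with the angle π/2 between them.]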

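Example 574. Consider the vectors v = (3, 2) and w = (−1, −4).

By Definition 111, the angle between them is:

$$\cos^{-1} \frac{\mathbf{v} \cdot \mathbf{w}}{|\mathbf{v}|\,|\mathbf{w}|} = \cos^{-1} \frac{(3, 2) \cdot (-1, -4)}{|(3, 2)|\,|(-1, -4)|} = \cos^{-1} \frac{3 \times (-1) + 2 \times (-4)}{\sqrt{3^2 + 2^2}\sqrt{(-1)^2 + (-4)^2}}$$

$$= \cos^{-1} \left( \frac{-3 - 8}{\sqrt{13}\sqrt{17}} \right) = \cos^{-1} \left( \frac{-11}{\sqrt{221}} \right) \approx 2.404.$$

[Figure: the vectors v = (3, 2) and w = (−1, −4).]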

By the way, here’s a possible concern. We’ve defined the angle between two vectors as:

$$\cos^{-1} \frac{\mathbf{u} \cdot \mathbf{v}}{|\mathbf{u}|\,|\mathbf{v}|}.$$

But recall[215] that the domain of arccosine is [−1, 1]. So, how can we be sure that the above
expression is always well-defined? In other words, how can we be sure that:

$$-1 \le \frac{\mathbf{u} \cdot \mathbf{v}}{|\mathbf{u}|\,|\mathbf{v}|} \le 1?$$

Fortunately, we can, thanks to Cauchy’s Inequality:[216]

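Fact 69. (Cauchy’s Inequality.) Let u and v be non-zero vectors. Then:

$$-1 \le \frac{\mathbf{u} \cdot \mathbf{v}}{|\mathbf{u}|\,|\mathbf{v}|} \le 1.$$

Equivalently:

$$-|\mathbf{u}|\,|\mathbf{v}| \le \mathbf{u} \cdot \mathbf{v} \le |\mathbf{u}|\,|\mathbf{v}| \quad\text{or}\quad (\mathbf{u} \cdot \mathbf{v})^2 \le |\mathbf{u}|^2\,|\mathbf{v}|^2.$$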

Proof. See p. 1289 in the Appendices.

[215] P. 275.
[216] Also known as the Cauchy-Schwarz Inequality in its more general form.

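Fact 70. Let u and v be non-zero vectors and θ be the angle between them. Then:

(a) $\dfrac{\mathbf{u} \cdot \mathbf{v}}{|\mathbf{u}|\,|\mathbf{v}|} = 1 \iff \theta = 0$.
(b) $\dfrac{\mathbf{u} \cdot \mathbf{v}}{|\mathbf{u}|\,|\mathbf{v}|} \in (0, 1) \iff \theta \in \left(0, \dfrac{\pi}{2}\right)$.
(c) $\dfrac{\mathbf{u} \cdot \mathbf{v}}{|\mathbf{u}|\,|\mathbf{v}|} = 0 \iff \theta = \dfrac{\pi}{2}$.
(d) $\dfrac{\mathbf{u} \cdot \mathbf{v}}{|\mathbf{u}|\,|\mathbf{v}|} \in (-1, 0) \iff \theta \in \left(\dfrac{\pi}{2}, \pi\right)$.
(e) $\dfrac{\mathbf{u} \cdot \mathbf{v}}{|\mathbf{u}|\,|\mathbf{v}|} = -1 \iff \theta = \pi$.

And thus:

(i) $\mathbf{u} \cdot \mathbf{v} > 0 \iff \theta \in \left[0, \dfrac{\pi}{2}\right)$.
(ii) $\mathbf{u} \cdot \mathbf{v} = 0 \iff \theta = \dfrac{\pi}{2}$.
(iii) $\mathbf{u} \cdot \mathbf{v} < 0 \iff \theta \in \left(\dfrac{\pi}{2}, \pi\right]$.

Proof. To prove this, simply apply Definition 111.
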

Fact 70(ii) motivates the following Definition:

Definition 112. Two non-zero vectors u and v are perpendicular (or normal or orthogonal)
if u ⋅ v = 0 and non-perpendicular if u ⋅ v ≠ 0.

As shorthand, we write u ⊥ v if two non-zero vectors u and v are perpendicular; and u ⊥̸ v
if they aren’t.

Remark 59. Again, note the special case of the zero vector 0 = (0, 0) — it is neither
perpendicular nor non-perpendicular to any other vector.


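Fact 71. Let u and v be non-zero vectors. Then:

(a) $\dfrac{\mathbf{u} \cdot \mathbf{v}}{|\mathbf{u}|\,|\mathbf{v}|} = 1 \iff$ u and v point in the same direction.
(b) $\dfrac{\mathbf{u} \cdot \mathbf{v}}{|\mathbf{u}|\,|\mathbf{v}|} = -1 \iff$ u and v point in exact opposite directions.
(c) $\dfrac{\mathbf{u} \cdot \mathbf{v}}{|\mathbf{u}|\,|\mathbf{v}|} = \pm 1 \iff$ u and v are parallel.
(d) $\dfrac{\mathbf{u} \cdot \mathbf{v}}{|\mathbf{u}|\,|\mathbf{v}|} \in (-1, 1) \iff$ u and v point in different directions.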

Proof. “Obviously”, (c) and (d) simply follow from (a) and (b). For the proof of (a) and
(b), see p. 1290 in the Appendices.

More examples:

Example 575. Let θ be the angle between the vectors u = (1, −3) and v = (−2, 4). Then:

$$\theta = \cos^{-1} \frac{\mathbf{u} \cdot \mathbf{v}}{|\mathbf{u}|\,|\mathbf{v}|} = \cos^{-1} \frac{-2 - 12}{\sqrt{1^2 + (-3)^2}\sqrt{(-2)^2 + 4^2}} = \cos^{-1} \frac{-14}{\sqrt{10}\sqrt{20}} = \cos^{-1} \frac{-14}{10\sqrt{2}} \approx 3.000.$$

So, u and v are neither perpendicular nor parallel; instead, they point in different
directions. Moreover, the angle between them is obtuse.

[Figure: the vectors u = (1, −3) and v = (−2, 4), with the angle θ ≈ 3.000 between them.]


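Example 576. Let θ be the angle between the vectors u = (1, −2) and v = (−2, 4). Then:

$$\theta = \cos^{-1} \frac{\mathbf{u} \cdot \mathbf{v}}{|\mathbf{u}|\,|\mathbf{v}|} = \cos^{-1} \frac{-2 - 8}{\sqrt{1^2 + (-2)^2}\sqrt{(-2)^2 + 4^2}} = \cos^{-1} \frac{-10}{\sqrt{5}\sqrt{20}} = \cos^{-1} (-1) = \pi.$$

So, u and v are parallel; more specifically, they point in exact opposite directions.
Moreover, the angle between them is straight.

[Figure: the vectors u = (1, −2) and v = (−2, 4), pointing in exact opposite directions.]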

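Example 577. Let θ be the angle between the vectors u = (3, −1) and v = (1, 3). Then:

$$\theta = \cos^{-1} \frac{\mathbf{u} \cdot \mathbf{v}}{|\mathbf{u}|\,|\mathbf{v}|} = \cos^{-1} \frac{3 - 3}{\sqrt{3^2 + (-1)^2}\sqrt{1^2 + 3^2}} = \cos^{-1} 0 = \frac{\pi}{2}.$$

So, u and v are perpendicular and the angle between them is right.

[Figure: the perpendicular vectors u = (3, −1) and v = (1, 3).]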

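Example 578. Let θ be the angle between the vectors u = (1, −2) and v = (2, −4). Then:

$$\theta = \cos^{-1} \frac{\mathbf{u} \cdot \mathbf{v}}{|\mathbf{u}|\,|\mathbf{v}|} = \cos^{-1} \frac{2 + 8}{\sqrt{1^2 + (-2)^2}\sqrt{2^2 + (-4)^2}} = \cos^{-1} \frac{10}{\sqrt{5}\sqrt{20}} = \cos^{-1} 1 = 0.$$

So, u and v are parallel; more specifically, they point in the same direction. Moreover, the
angle between them is zero.

[Figure: the vectors u = (1, −2) and v = (2, −4), pointing in the same direction (θ = 0).]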


We can use Fact 70(i)–(iii) to quickly check if the angle between two vectors is acute (or
zero), right, or obtuse (or straight):

Example 579. Even without doing any precise calculations or drawing any graphs, we
can quickly see that:

• The angle between the vectors (817, −2) and (39, −55) is acute, because “clearly”:

817 ⋅ 39 + (−2) ⋅ (−55) > 0.

• The vectors $(\sqrt{79300}, -470)$ and $(47, \sqrt{793})$ are perpendicular, because “clearly”:

$$\sqrt{79300} \cdot 47 + (-470) \cdot \sqrt{793} = 0.$$

• If k < 0, then the angle between (67, k) and (−485, 32) is obtuse, because “clearly”:

67 ⋅ (−485) + 32k < 0.
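The sign test used in this example can be sketched in Python. The function name `classify_angle` and its return strings are hypothetical choices for this sketch; it only inspects the sign of the scalar product, so it cannot by itself tell an acute angle from a zero one, or an obtuse angle from a straight one. Integer inputs are used so that the comparison with zero is exact.

```python
def classify_angle(u, v):
    """Classify the angle between non-zero vectors by the sign of u . v."""
    if not any(u) or not any(v):
        raise ValueError("both vectors must be non-zero")
    d = sum(ui * vi for ui, vi in zip(u, v))
    if d > 0:
        return "acute or zero"
    if d == 0:
        return "right"
    return "obtuse or straight"


print(classify_angle((817, -2), (39, -55)))   # acute or zero
print(classify_angle((3, -1), (1, 3)))        # right
print(classify_angle((67, -1), (-485, 32)))   # obtuse or straight (here k = -1 < 0)
```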

Exercise 186. In each of the following, find the angle between u and v. Is it zero, acute,
right, obtuse, or straight? Are u and v perpendicular or parallel? Do they point in the
same, exact opposite, or different directions? (Answers on p. 1476.)

(a) u = (2, 0) and v = (0, 17). (b) u = (5, 0) and v = (−3, 0).

(c) u = (1, 0) and v = (1, 3). (d) u = (2, −3) and v = (1, 2).


37.1. The Pythagorean Theorem and Triangle Inequality
Remember the Pythagorean Theorem (Theorem 3)? Here it is again, but this time in
the language of vectors:

Theorem 8. (Pythagorean Theorem.) If u ⊥ v, then $|\mathbf{u} + \mathbf{v}|^2 = |\mathbf{u}|^2 + |\mathbf{v}|^2$.

Proof. See Exercise 187.

Exercise 187. Use Facts 65 and 67 to show that $|\mathbf{u} + \mathbf{v}|^2 = |\mathbf{u}|^2 + 2\mathbf{u} \cdot \mathbf{v} + |\mathbf{v}|^2$. Then use
u ⊥ v to complete the proof of the above Theorem. (Answer on p. 1477.)

Remember the Triangle Inequality (Fact 36)? Here it is again, but this time in the
language of vectors:

Fact 72. (Triangle Inequality.) If u and v are vectors, then ∣u + v∣ ≤ ∣u∣ + ∣v∣.

Proof. See Exercise 188.

Exercise 188. In Exercise 187, we already showed that:

$$|\mathbf{u} + \mathbf{v}|^2 = |\mathbf{u}|^2 + 2\mathbf{u} \cdot \mathbf{v} + |\mathbf{v}|^2.$$

To prove Fact 72, first apply Cauchy’s Inequality (Fact 69) to the above equation; then
complete the square and take square roots. (Answer on p. 1477.)
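Before attempting the proofs, it can help to see the statements hold numerically. The Python sketch below is only a sanity check on specific vectors (chosen here for illustration), not a proof; the helper names `dot`, `add`, and `length` are hypothetical.

```python
import math


def dot(u, v):
    """Scalar product u . v."""
    return sum(ui * vi for ui, vi in zip(u, v))


def add(u, v):
    """Component-wise vector sum u + v."""
    return tuple(ui + vi for ui, vi in zip(u, v))


def length(v):
    """Length of v, via Fact 67."""
    return math.sqrt(dot(v, v))


u, v = (3, -1), (1, 3)          # perpendicular, since u . v = 0
s = add(u, v)
print(dot(s, s), dot(u, u) + 2 * dot(u, v) + dot(v, v))  # 20 20
print(dot(s, s) == dot(u, u) + dot(v, v))                # True (Pythagorean Theorem)

a, b = (1, -3), (-2, 4)         # not perpendicular
print(length(add(a, b)) <= length(a) + length(b))        # True (Triangle Inequality)
```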


37.2. Direction Cosines

Definition 113. The x- and y-direction cosines of the vector v = (v1 , v2 ) are the numbers:

$$\frac{v_1}{|\mathbf{v}|} \quad\text{and}\quad \frac{v_2}{|\mathbf{v}|}.$$

Observe that the unit vector of v = (v1 , v2 ) is:

$$\hat{\mathbf{v}} = \left( \frac{v_1}{|\mathbf{v}|}, \frac{v_2}{|\mathbf{v}|} \right).$$

And so, equivalently, v’s x- and y-direction cosines are the x- and y-coordinates of its unit
vector.
We now explain why the x- and y-direction cosines are so named. Place the tail of v =
(v1 , v2 ) at the origin. Let α be the angle between v and the positive x-axis. Similarly, let
β be the angle between v and the positive y-axis.[217]

[Figure: the vector v with its tail at the origin, its unit vector v̂ = (a, b), and the angles α and β that v makes with the positive x- and y-axes.]

Let v̂ = (a, b) be v’s unit vector. It has length 1 and forms the hypotenuse of two right
triangles. From the lower-right triangle, we have a = cos α.
Similarly, from the upper-left triangle, we have b = cos β.
This explains why v’s unit vector’s x- and y-coordinates are also its x- and y-direction
cosines.

[217] More formally, α is the angle between v and i = (1, 0), while β is the angle between v and j = (0, 1).

We can state and prove what was just said a bit more formally:

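Fact 73. Let v = (v1 , v2 ) be a non-zero vector. Let α and β be the angles between v and
each of i and j. Then:

$$\cos \alpha = \frac{v_1}{|\mathbf{v}|} \quad\text{and}\quad \cos \beta = \frac{v_2}{|\mathbf{v}|}.$$

Proof. Since α is the angle between v and i, by Definition 111, we have:

$$\cos \alpha = \frac{\mathbf{v} \cdot \mathbf{i}}{|\mathbf{v}|\,|\mathbf{i}|} = \frac{v_1 \cdot 1 + v_2 \cdot 0}{|\mathbf{v}| \cdot 1} = \frac{v_1}{|\mathbf{v}|}.$$

Similarly, since β is the angle between v and j, we have:

$$\cos \beta = \frac{\mathbf{v} \cdot \mathbf{j}}{|\mathbf{v}|\,|\mathbf{j}|} = \frac{v_1 \cdot 0 + v_2 \cdot 1}{|\mathbf{v}| \cdot 1} = \frac{v_2}{|\mathbf{v}|}.$$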

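Example 580. Consider the vector v = (3, 2). Its x- and y-direction cosines are:

$$\frac{v_1}{|\mathbf{v}|} = \frac{3}{\sqrt{3^2 + 2^2}} = \frac{3}{\sqrt{13}} \quad\text{and}\quad \frac{v_2}{|\mathbf{v}|} = \frac{2}{\sqrt{3^2 + 2^2}} = \frac{2}{\sqrt{13}}.$$

Which means, of course, that its unit vector is $\hat{\mathbf{v}} = \left( \dfrac{3}{\sqrt{13}}, \dfrac{2}{\sqrt{13}} \right)$.

[Figure: the vector v = (3, 2) and its unit vector v̂, whose x- and y-coordinates are 3/√13 and 2/√13.]

Let α and β be the angles it makes with the positive x- and y-axes. Then:

$$\alpha = \cos^{-1} \frac{3}{\sqrt{13}} \approx 0.588 \quad\text{and}\quad \beta = \cos^{-1} \frac{2}{\sqrt{13}} \approx 0.983.$$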


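Example 581. Consider the vector v = (−2, −1). Its x- and y-direction cosines are:

$$\frac{v_1}{|\mathbf{v}|} = \frac{-2}{\sqrt{(-2)^2 + (-1)^2}} = \frac{-2}{\sqrt{5}} \quad\text{and}\quad \frac{v_2}{|\mathbf{v}|} = \frac{-1}{\sqrt{(-2)^2 + (-1)^2}} = \frac{-1}{\sqrt{5}}.$$

Which means, of course, that its unit vector is $\hat{\mathbf{v}} = \left( \dfrac{-2}{\sqrt{5}}, \dfrac{-1}{\sqrt{5}} \right)$.

[Figure: the vector v = (−2, −1) and its unit vector v̂, whose x- and y-coordinates are −2/√5 and −1/√5.]

Let α and β be the angles it makes with the positive x- and y-axes. Then:

$$\alpha = \cos^{-1} \frac{-2}{\sqrt{5}} \approx 2.678 \quad\text{and}\quad \beta = \cos^{-1} \frac{-1}{\sqrt{5}} \approx 2.034.$$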
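The direction cosines of the last two examples can be reproduced with a short Python sketch. The function name `direction_cosines` is a hypothetical helper; it simply returns the coordinates of the unit vector, from which the angles α and β follow by arccosine.

```python
import math


def direction_cosines(v):
    """The x- and y-direction cosines of v, i.e. the coordinates of its unit vector."""
    norm = math.hypot(*v)
    if norm == 0:
        raise ValueError("the zero vector has no direction cosines")
    return tuple(vi / norm for vi in v)


cx, cy = direction_cosines((3, 2))
print(round(math.acos(cx), 3), round(math.acos(cy), 3))   # 0.588 0.983

dx, dy = direction_cosines((-2, -1))
print(round(math.acos(dx), 3), round(math.acos(dy), 3))   # 2.678 2.034
```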

Exercise 189. Find each vector’s x- and y-direction cosines. Then write down its unit
vector. (Answer on p. 1477.)

(a) (1, 3). (b) (4, 2). (c) (−1, 2).


38. The Angle Between Two Lines
(Below, Corollary 8 will give the “formula” for the angle between two lines. If you’re a
well-trained, mindless Singaporean monkey who cares only about “knowing” the “formula”
without understanding where it comes from, you can skip straight to Corollary 8.)
“Obviously”, any two non-parallel lines l1 and l2 in 2D space must intersect at exactly one
point. And at this intersection point, two angles are formed. Let’s call the smaller angle α,
so that the larger angle is β = π − α.
Now, we have a potential ambiguity: When we talk about the angle between l1 and l2 , are we
talking about the smaller angle α or the larger angle β?
To resolve this ambiguity, this textbook will adopt the convention that the angle between two
lines is the smaller one. So, by this convention, the angle between l1 and l2 shall be α (and
not β).
Note that by our adopted convention, the angle between two lines will always be acute (i.e.
between 0 and π/2) and never obtuse.

[Figure: the lines l1 and l2 intersecting, with the angles α and β = π − α marked.]

We now work towards a formal definition of the angle between two lines.

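Example 582. Given the lines l1 and l2, we pick for each the direction vectors u and v.

[Figure: the lines l1 and l2, their direction vectors u and v, and the angle α between them.]

Observe that:

$$\alpha = \text{Angle between } l_1 \text{ and } l_2 = \text{Angle between } \mathbf{u} \text{ and } \mathbf{v}.$$

That is, α is the angle between the two lines; moreover, it is also the angle between the two vectors.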

The above Example suggests the following “Definition” for the angle between two lines.
Given two lines l1 and l2 , pick for each any direction vectors u and v. Then define:

$$\text{Angle between } l_1 \text{ and } l_2 = \text{Angle between } \mathbf{u} \text{ and } \mathbf{v}.$$

This “Definition” works well in the above Example, but only because the angle between u
and v happens to be acute.
Unfortunately and as the next example illustrates, this “Definition” doesn’t work so well if
the angle between the two chosen direction vectors is instead obtuse:


Example 583. We continue with the same two lines as before. We continue to pick the direction vector u for the line l1. But this time, we pick the direction vector w for the line l2.

[Figure: the lines l1 and l2, with direction vectors u and w.]

The angle between the two lines remains the acute angle α.

However, the angle between the two chosen direction vectors is now the obtuse angle β.

And so, the above “Definition” fails because:

$$\alpha = \text{Angle between } l_1 \text{ and } l_2 \neq \text{Angle between } \mathbf{u} \text{ and } \mathbf{w} = \beta.$$

In this case, the angle between the two lines, α, is actually the supplement of the angle
between the chosen two direction vectors, β. That is:

α = π − β.

The following Definition of the non-obtuse angle between two vectors will prove useful:

Definition 114. Let α denote the non-obtuse angle between the vectors u and v. Let β
be the angle between u and v. Then:

$$\alpha = \begin{cases} \beta & \text{if } \beta \text{ is not obtuse}, \\ \pi - \beta & \text{if } \beta \text{ is obtuse}. \end{cases}$$

Example 584. The angle between the vectors c and d is β, which is obtuse. And so,
the non-obtuse angle between them is α = π − β.

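[Figure: the vectors c and d, with the obtuse angle β between them; and the vectors e and f, with the acute angle γ between them.]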

The angle between the vectors e and f is γ, which is acute. And so, the non-obtuse angle
between them is also γ.

We are now ready to write down our formal Definition of the angle between two lines:


Definition 115. Given two lines, pick for each any direction vector. We call the non-
obtuse angle between these two vectors the angle between the two lines.

Example 585. Given the lines l1 and l2 , we pick the direction vectors u and v.
The angle between u and v is α, which is acute. And so, the non-obtuse angle between
u and v is also α. Thus, by Definition 115, the angle between the two lines is α.

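[Figure: the lines l1 and l2 with direction vectors u and v; and the lines l3 and l4 with direction vectors w and x and the angle γ.]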

Given the lines l3 and l4 , we pick the direction vectors w and x.

The angle between w and x is β, which is obtuse. And so, the non-obtuse angle between
w and x is γ = π − β. Thus, by Definition 115, the angle between the two lines is γ.

We just wrote down the Definition of the angle between two lines. We now work towards
Corollary 8, which will give us our “formula” for the angle between two lines.
Recall that by Definition 111, the angle between u and v is $\cos^{-1}\dfrac{\mathbf{u}\cdot\mathbf{v}}{|\mathbf{u}|\,|\mathbf{v}|}$.
It turns out that we can get the non-obtuse angle between u and v simply by slapping ∣⋅∣
(the absolute value function) onto the numerator:

Fact 74. The non-obtuse angle between two non-zero vectors u and v is:

$$\cos^{-1}\frac{|\mathbf{u}\cdot\mathbf{v}|}{|\mathbf{u}|\,|\mathbf{v}|}.$$

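Proof. Let θ be the angle between u and v. Suppose 0 ≤ θ ≤ π/2. Then u ⋅ v ≥ 0. And so, by Definition 114, the non-obtuse angle between u and v is:

$$\theta = \cos^{-1}\frac{\mathbf{u}\cdot\mathbf{v}}{|\mathbf{u}|\,|\mathbf{v}|} = \cos^{-1}\frac{|\mathbf{u}\cdot\mathbf{v}|}{|\mathbf{u}|\,|\mathbf{v}|}. \quad\checkmark$$

Suppose instead θ > π/2. Then u ⋅ v < 0. And so, by Definition 114 and the trigonometric identity π − cos⁻¹ x = cos⁻¹(−x) (Fact 34), the non-obtuse angle between u and v is:

$$\pi - \theta = \pi - \cos^{-1}\frac{\mathbf{u}\cdot\mathbf{v}}{|\mathbf{u}|\,|\mathbf{v}|} = \cos^{-1}\frac{-\mathbf{u}\cdot\mathbf{v}}{|\mathbf{u}|\,|\mathbf{v}|} = \cos^{-1}\frac{|\mathbf{u}\cdot\mathbf{v}|}{|\mathbf{u}|\,|\mathbf{v}|}. \quad\checkmark$$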


From Fact 74, the following “formula” for the angle between two lines is immediate:

Corollary 8. The angle between two lines with direction vectors u and v is:

$$\cos^{-1}\frac{|\mathbf{u}\cdot\mathbf{v}|}{|\mathbf{u}|\,|\mathbf{v}|}.$$

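Example 586. Two lines l1 and l2, with direction vectors v1 = (2, 1) and v2 = (1, 3), are described by:

$$\mathbf{r} = (1, 3) + \lambda(2, 1) \quad (\lambda \in \mathbb{R}),$$

$$\mathbf{r} = (-1, -1) + \lambda(1, 3) \quad (\lambda \in \mathbb{R}).$$

[Figure: the lines l1 and l2, with direction vectors v1 = (2, 1) and v2 = (1, 3).]

By Corollary 8, the angle between the two lines is:

$$\cos^{-1}\frac{|\mathbf{v}_1\cdot\mathbf{v}_2|}{|\mathbf{v}_1|\,|\mathbf{v}_2|} = \cos^{-1}\frac{|(2,1)\cdot(1,3)|}{|(2,1)|\,|(1,3)|} = \cos^{-1}\frac{|5|}{\sqrt{5}\sqrt{10}} = \cos^{-1}\frac{1}{\sqrt{2}} = \frac{\pi}{4}.$$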

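Example 587. Two lines l1 and l2, with direction vectors v1 = (−2, 3) and v2 = (3, 1), are described by:

$$\mathbf{r} = (0, 0) + \lambda(-2, 3) \quad\text{and}\quad \mathbf{r} = (1, 0) + \lambda(3, 1) \quad (\lambda \in \mathbb{R}).$$

By Corollary 8, the angle between the two lines is:

$$\cos^{-1}\frac{|\mathbf{v}_1\cdot\mathbf{v}_2|}{|\mathbf{v}_1|\,|\mathbf{v}_2|} = \cos^{-1}\frac{|(-2,3)\cdot(3,1)|}{|(-2,3)|\,|(3,1)|} = \cos^{-1}\frac{|-3|}{\sqrt{13}\sqrt{10}} \approx 1.305.$$

[Figure: the lines l1 and l2, with direction vectors v1 = (−2, 3) and v2 = (3, 1), meeting at an angle of about 1.305.]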
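
Corollary 8 is easy to turn into a few lines of code. The following is a minimal Python sketch, not anything official: the function name `angle_between_lines` is made up for illustration, and the clamping step is our own precaution against floating-point rounding.

```python
import math

def angle_between_lines(u, v):
    """Non-obtuse angle (in radians) between two lines with direction vectors u and v."""
    dot = u[0] * v[0] + u[1] * v[1]
    cos_angle = abs(dot) / (math.hypot(*u) * math.hypot(*v))
    # Rounding can push the ratio very slightly above 1, which acos would reject.
    return math.acos(min(1.0, cos_angle))

print(angle_between_lines((2, 1), (1, 3)))   # Example 586: approximately pi/4
print(angle_between_lines((-2, 3), (3, 1)))  # Example 587: approximately 1.305
```

Because of the absolute value in the numerator, replacing either direction vector by its negative leaves the answer unchanged, exactly as Definition 115 requires.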


Definition 116. Two lines are (a) parallel if they have parallel direction vectors; and
(b) perpendicular if they have perpendicular direction vectors.

Corollary 9. Suppose θ is the angle between two lines. (a) If θ = 0, then the two lines
are parallel. And (b) if θ = π/2, then they are perpendicular.

Proof. See p. 1291 in the Appendices.

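Example 588. Two lines l1 and l2, with direction vectors v1 = (3, 3) and v2 = (−1, −1), are described by:

$$\mathbf{r} = (2, -2) + \lambda(3, 3) \quad\text{and}\quad \mathbf{r} = (1, 1) + \lambda(-1, -1) \quad (\lambda \in \mathbb{R}).$$

By Corollary 8, the angle between l1 and l2 is:

$$\cos^{-1}\frac{|\mathbf{v}_1\cdot\mathbf{v}_2|}{|\mathbf{v}_1|\,|\mathbf{v}_2|} = \cos^{-1}\frac{|(3,3)\cdot(-1,-1)|}{|(3,3)|\,|(-1,-1)|} = \cos^{-1}\frac{|-6|}{\sqrt{18}\sqrt{2}} = \cos^{-1} 1 = 0.$$
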
By Corollary 9, the lines l1 and l2 are parallel.

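[Figure: left panel, the lines l1 and l2 with direction vectors v1 = (3, 3) and v2 = (−1, −1); right panel, the lines l3 and l4 with direction vectors v3 = (−1, 2) and v4 = (6, 3), meeting at the angle π/2.]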

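Two lines l3 and l4, with direction vectors v3 = (−1, 2) and v4 = (6, 3), are described by:

$$\mathbf{r} = (0, 3) + \lambda(-1, 2) \quad\text{and}\quad \mathbf{r} = (-1, 1) + \lambda(6, 3) \quad (\lambda \in \mathbb{R}).$$

By Corollary 8, the angle between l3 and l4 is:

$$\cos^{-1}\frac{|\mathbf{v}_3\cdot\mathbf{v}_4|}{|\mathbf{v}_3|\,|\mathbf{v}_4|} = \cos^{-1}\frac{|(-1,2)\cdot(6,3)|}{|(-1,2)|\,|(6,3)|} = \cos^{-1}\frac{|0|}{\sqrt{5}\sqrt{45}} = \cos^{-1} 0 = \frac{\pi}{2}.$$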

By Corollary 9, the lines l3 and l4 are perpendicular.

Let’s write down the “obvious” formal definition of when a line and a vector are parallel or perpendicular:

Definition 117. A line and a vector are (a) parallel if the line has a direction vector
that’s parallel to the given vector; and (b) perpendicular if the line has a direction vector
that’s perpendicular to the given vector.

Here are two “obvious” Facts you may recall from primary school:

Fact 75. If two lines are:

(a) Identical, then they are also parallel.
(b) Distinct and parallel, then they do not intersect.
(c) Distinct, then they share at most one intersection point.

Proof. See p. 1292 in the Appendices.

Fact 76. If two lines (in 2D space) are distinct and non-parallel, then they must share
exactly one intersection point.

Proof. See p. 1292 in the Appendices.

Example 589. The lines l1 and l2 are identical. And indeed, they are also parallel.

The lines l1 and l3 are distinct and parallel. And indeed they do not intersect.

The lines l1 and l4 are distinct and non-parallel. And indeed they share exactly one intersection point.

[Figure: the identical lines l1 = l2, the parallel line l3, and the non-parallel line l4.]

Exercise 190. Find the angle between each given pair of lines. State if they are parallel
or perpendicular. (Answer on p. 1478.)
(a) r = (−1, 2) + λ(−1, 1) and r = (0, 0) + λ(2, −3) (λ ∈ R).
(b) r = (−1, 2) + λ(1, 5) and r = (0, 0) + λ(8, 1) (λ ∈ R).
(c) r = (−1, 2) + λ(2, 6) and r = (0, 0) + λ(3, 2) (λ ∈ R).

Remark 60. Fact 76 applies only to 2D space. As we’ll learn later, in 3D space, two lines
can be distinct and non-parallel, and yet not intersect. (We call such lines skew lines.)
In contrast, Fact 75 applies more generally to higher dimensions, including in 3D space.


39. Vectors vs Scalars
We now illustrate the difference between vectors and scalars by revisiting Example 381 from
Ch. 23 (Simple Parametric Equations):

Example 590. A moving particle has position vector s given by:

s(t) = (sx (t), sy (t)) = (cos t, sin t) (t ≥ 0).

The particle’s position vector s is a function of time t. For brevity of notation, we will
often be lazy/sloppy and omit “(t)”. That is, we will often instead simply write:

s = (sx , sy ) = (cos t, sin t) (t ≥ 0).

At time t, the particle’s x- and y-coordinates are sx = cos t and sy = sin t, respectively. In
other words, at time t, the particle is cos t m east and sin t m north of the origin.
As time t progresses from 0 to 2π seconds, we trace out, anti-clockwise, the unit circle:

t = 0 ⟹ (sx, sy) = (1, 0),                 t = π ⟹ (sx, sy) = (−1, 0),
t = π/4 ⟹ (sx, sy) = (√2/2, √2/2),         t = 5π/4 ⟹ (sx, sy) = (−√2/2, −√2/2),
t = π/2 ⟹ (sx, sy) = (0, 1),               t = 3π/2 ⟹ (sx, sy) = (0, −1),
t = 3π/4 ⟹ (sx, sy) = (−√2/2, √2/2),       t = 7π/4 ⟹ (sx, sy) = (√2/2, −√2/2).

[Figure: the unit circle, traced anti-clockwise. Arrows indicate the instantaneous direction of v; the direction of a and F is also marked. Three instants are annotated:
At t = 0: s = (sx, sy) = (1 m, 0 m), v = (vx, vy) = (0 m s−1, 1 m s−1), v = 1 m s−1.
At t = 1: s = (sx, sy) ≈ (0.54 m, 0.84 m), v = (vx, vy) ≈ (−0.84 m s−1, 0.54 m s−1), v = 1 m s−1.
At t = 5π/4: s = (sx, sy) = (−√2/2 m, −√2/2 m), v = (vx, vy) = (√2/2 m s−1, −√2/2 m s−1), v = 1 m s−1.]

We next define the particle’s velocity vector v(t) to be the first derivative of s with
respect to time t:
$$\mathbf{v}(t) = (v_x(t), v_y(t)) = \frac{d\mathbf{s}}{dt} = \left(\frac{ds_x}{dt}, \frac{ds_y}{dt}\right) = \left(\frac{d\cos t}{dt}, \frac{d\sin t}{dt}\right) = (-\sin t, \cos t).$$

So, at time t, the particle is travelling eastwards at vx = − sin t m s−1 (or equivalently,
westwards at sin t m s−1 ) and northwards at vy = cos t m s−1 .
The magnitude of the particle’s velocity vector is denoted v and is called its speed:

$$v = |\mathbf{v}| = \sqrt{v_x^2 + v_y^2}.$$

Recalling [218] the trigonometric identity sin² t + cos² t = 1, we have:

$$v = |\mathbf{v}| = \sqrt{v_x^2 + v_y^2} = \sqrt{\sin^2 t + \cos^2 t} = \sqrt{1} = 1.$$

Aha! So, interestingly, the particle travels at the constant speed of 1 m s−1 . That is, at
every instant in time t, it is moving 1 m s−1 in its direction of travel.
We now prove that the particle always moves in a direction tangent to the circle.
In other words, its direction of travel is always perpendicular to its position vector.
To do so, we need simply prove that v ⋅ s = 0 for all t:

$$\mathbf{v}\cdot\mathbf{s} = (-\sin t, \cos t)\cdot(\cos t, \sin t) = -\sin t\cos t + \cos t\sin t = 0. \quad\checkmark$$
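
The two claims just proved (constant speed, and velocity perpendicular to position) can also be spot-checked numerically. Here is a minimal Python sketch; the helper names `position`, `velocity`, and `dot` are ours, and a handful of sample times is of course no substitute for the proof above.

```python
import math

def position(t):
    """Position vector s(t) = (cos t, sin t)."""
    return (math.cos(t), math.sin(t))

def velocity(t):
    """Velocity vector v(t) = (-sin t, cos t), the derivative of s(t)."""
    return (-math.sin(t), math.cos(t))

def dot(p, q):
    """Scalar product of two 2D vectors."""
    return p[0] * q[0] + p[1] * q[1]

for t in (0.0, 1.0, 5 * math.pi / 4):
    s, v = position(t), velocity(t)
    speed = math.hypot(*v)  # |v|
    print(f"t = {t:.3f}: speed = {speed:.6f}, v.s = {dot(v, s):.6f}")
```

At each sample time, the printed speed should be 1 and the printed scalar product should be 0 (up to rounding).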

Velocity is a vector: it has both magnitude and direction. In contrast, speed is a scalar: it has only magnitude.

[218] Fact 29(a).

Similarly, the particle’s acceleration vector is defined as the first derivative of the
velocity vector (or, equivalently, the second derivative of the position vector):

$$\mathbf{a}(t) = (a_x(t), a_y(t)) = \frac{d\mathbf{v}}{dt} = \frac{d^2\mathbf{s}}{dt^2} = \left(\frac{dv_x}{dt}, \frac{dv_y}{dt}\right) = \left(\frac{d(-\sin t)}{dt}, \frac{d\cos t}{dt}\right) = (-\cos t, -\sin t).$$

So, at time t, the particle is accelerating eastwards at ax = − cos t m s−2 and northwards
at ay = − sin t m s−2. Or equivalently, it is accelerating westwards at cos t m s−2 and southwards
at sin t m s−2. (Note that m s−2 is an abbreviation for metre per second per second.)
The magnitude of the particle’s acceleration vector is denoted a:
$$a = |\mathbf{a}| = \sqrt{a_x^2 + a_y^2} = \sqrt{(-\cos t)^2 + (-\sin t)^2} = \sqrt{1} = 1.$$

Aha! So, interestingly, the particle accelerates at the constant rate of 1 m s−2 . That is, at
every instant in time t, it is accelerating 1 m s−2 in its direction of acceleration.
(Note that for velocity, we gave its magnitude the special name of speed. But in contrast, the magnitude of acceleration has no special name. We simply call it the magnitude of acceleration.)

Above we proved that the particle’s direction of movement is always tangent to the circle. Here we can similarly prove that its direction of acceleration is always towards the centre of the circle. (Or equivalently, the acceleration vector points in exactly the opposite direction to the position vector.)

To prove this, we need simply observe that the acceleration vector a = (− cos t, − sin t) and the position vector s = (cos t, sin t) point in exactly opposite directions. [219]

Suppose the particle’s mass is m = 1 kg. Recall from physics Newton’s Second Law: [220]

$$\mathbf{F} = m\mathbf{a} \quad\text{or}\quad \overbrace{\text{Force}}^{\text{Vector}} = \overbrace{\text{Mass}}^{\text{Scalar}} \times \overbrace{\text{Acceleration}}^{\text{Vector}}.$$

Note that mass is a scalar quantity. Hence, force, being a product of a scalar and a
vector, is itself a vector quantity.
The force vector points in the same direction as the acceleration vector (i.e. towards the
centre of the circle). Moreover, it has constant magnitude:

∣F∣ = ∣ma∣ = ∣m∣ ∣a∣ = (1 kg) × (1 m s−2 ) = 1 kg m s−2 = 1 N,

where N stands for newton, the SI unit of force (1 N = 1 kg m s−2).
Physicists call such a force (which results in circular movement) a centripetal force.

[219] Note though that it would be wrong to write a = −s. This is because acceleration is measured in m s−2, while position is measured in m.
[220] See e.g. Exercise 109.

40. The Projection and Rejection Vectors
Let a and b be vectors.
The projection of a on b, denoted projb a, is the vector that is: [221]

• Parallel to b; and
• Perpendicular to a − projb a (the vector depicted in blue).

[Figure: the vectors a and b with the angle θ between them, together with projb a (along b) and a − projb a (in blue).]

Let us work towards a formal definition of projb a.

First, since projb a ∥ b, by Definition 104, we must have projb a = k b̂ for some k ≠ 0.
Next, the length of projb a is ∣k∣. But what is k?
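We observe that in the above figure, a right triangle is formed. This right triangle’s hypotenuse corresponds to the vector a, while its base corresponds to the vector projb a. Let θ be the angle between these two vectors.

Then by our right-triangle definition of cosine, we have:

$$\cos\theta = \frac{\text{“Adjacent”}}{\text{“Hypotenuse”}} = \frac{|\operatorname{proj}_{\mathbf{b}}\mathbf{a}|}{|\mathbf{a}|} \overset{1}{=} \frac{|k|}{|\mathbf{a}|}.$$

But by Definition 111, we also have:

$$\cos\theta = \frac{\mathbf{a}\cdot\mathbf{b}}{|\mathbf{a}|\,|\mathbf{b}|} \overset{2}{=} \frac{\mathbf{a}\cdot\hat{\mathbf{b}}}{|\mathbf{a}|}.$$

Putting together $\overset{1}{=}$ and $\overset{2}{=}$, we have:

$$\frac{|k|}{|\mathbf{a}|} = \frac{\mathbf{a}\cdot\hat{\mathbf{b}}}{|\mathbf{a}|} \quad\text{or}\quad |k| = \mathbf{a}\cdot\hat{\mathbf{b}}.$$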

The above discussion motivates the following definition of the projection vector:

Definition 118. Let a and b be vectors. Then the projection of a on b, denoted projb a,
is the following vector:

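$$\operatorname{proj}_{\mathbf{b}}\mathbf{a} = (\mathbf{a}\cdot\hat{\mathbf{b}})\,\hat{\mathbf{b}} \quad \left(\text{or equivalently, } \operatorname{proj}_{\mathbf{b}}\mathbf{a} = \frac{\mathbf{a}\cdot\mathbf{b}}{|\mathbf{b}|^2}\,\mathbf{b}\right).$$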

For convenience, let’s call the “blue vector” the rejection vector and denote it rejb a:

Definition 119. Let a and b be non-zero vectors. Then the rejection of a on b, denoted
rejb a, is the following vector:

rejb a = a − projb a.
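
Definitions 118 and 119 translate directly into code. The following is a minimal Python sketch, offered only as an illustration: the function names `proj` and `rej` are our own, and we assume that b is non-zero (otherwise the division below fails).

```python
def dot(a, b):
    """Scalar product of two 2D vectors."""
    return a[0] * b[0] + a[1] * b[1]

def proj(a, b):
    """Projection of a on b, computed as ((a.b) / (b.b)) b. Assumes b is non-zero."""
    k = dot(a, b) / dot(b, b)
    return (k * b[0], k * b[1])

def rej(a, b):
    """Rejection of a on b, i.e. a minus the projection of a on b."""
    p = proj(a, b)
    return (a[0] - p[0], a[1] - p[1])

a, b = (3, 2), (1, 1)
p, r = proj(a, b), rej(a, b)
print(p, r)       # (2.5, 2.5) (0.5, -0.5)
print(dot(r, b))  # 0.0
```

The last line illustrates the defining property of the projection: what is left over after subtracting it from a is perpendicular to b.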

[221] These two properties are to hold so long as projb a and a − projb a are non-zero.

The following result is “obvious” from our above two Definitions:

Fact 77. Let a and b be vectors, and projb a = (a ⋅ b̂) b̂.

(a) If a ⋅ b̂ > 0, then projb a is a positive scalar multiple of b.

(b) If a ⋅ b̂ < 0, then projb a is a negative scalar multiple of b.
(c) If a ⋅ b̂ = 0, then projb a = 0 and rejb a = a.

Here’s what the above result says geometrically. Let θ be the angle between a and b. Then:
(a) If θ is acute, then projb a points in the same direction as b.
(b) If θ is obtuse, then projb a points in the exact opposite direction as b.
(c) If θ is right (i.e. if a ⊥ b), then projb a = 0 and rejb a = a.

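[Figure, three panels. (a) θ acute: projb a points in the same direction as b, and rejb a = a − projb a. (b) θ obtuse: projb a points in the opposite direction to b. (c) θ = π/2: projb a = 0 and rejb a = a.]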

Above we already argued that the length of projb a must be ∣a ⋅ b̂∣. We now formally prove
that this is so:

Fact 78. Let a and b be vectors. Then:

∣projb a∣ = ∣a ⋅ b̂∣ .

Proof. By Definition 118 and Fact 57:

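$$|\operatorname{proj}_{\mathbf{b}}\mathbf{a}| = |(\mathbf{a}\cdot\hat{\mathbf{b}})\,\hat{\mathbf{b}}| = |\mathbf{a}\cdot\hat{\mathbf{b}}|\,|\hat{\mathbf{b}}| = |\mathbf{a}\cdot\hat{\mathbf{b}}|\cdot 1 = |\mathbf{a}\cdot\hat{\mathbf{b}}|. \quad\checkmark$$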


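Example 591. Let u = (3, 2) and v = (1, 1) be vectors and projv u be the projection of u on v. Then:

$$|\operatorname{proj}_{\mathbf{v}}\mathbf{u}| = |\mathbf{u}\cdot\hat{\mathbf{v}}| = \left|\frac{(3,2)\cdot(1,1)}{|(1,1)|}\right| = \frac{3+2}{\sqrt{2}} = \frac{5}{\sqrt{2}}.$$

[Figure: the vectors u = (3, 2) and v = (1, 1), and the projection projv u.]

Now consider w = (−2, −2), which points in exactly the opposite direction from v and has twice the length. It turns out that the projection of u on w, projw u, is identical to projv u.

[Figure: the vectors u = (3, 2) and w = (−2, −2), and the projection projw u.]

We can easily verify that ∣projw u∣ = ∣projv u∣:

$$|\operatorname{proj}_{\mathbf{w}}\mathbf{u}| = |\mathbf{u}\cdot\hat{\mathbf{w}}| = \left|\frac{(3,2)\cdot(-2,-2)}{|(-2,-2)|}\right| = \left|\frac{-6-4}{\sqrt{8}}\right| = \left|\frac{-10}{\sqrt{8}}\right| = \frac{5}{\sqrt{2}}. \quad\checkmark$$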

As the above example suggests, if v and w are parallel, then the projections of any vector
u on v and w are identical. Formally:

Fact 79. Let u, v, and w be vectors. If v ∥ w, then:

projv u = projw u.

Proof. If v ∥ w, then by Fact 60, v̂ = ±ŵ. And so:

projv u = (u ⋅ v̂) v̂ = [u ⋅ (±ŵ)] (±ŵ) = (u ⋅ ŵ) ŵ = projw u.


Example 592. Let a = (−6, 1) and b = (2, 0) be vectors and projb a be the projection of
a on b. Then:

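$$|\operatorname{proj}_{\mathbf{b}}\mathbf{a}| = |\mathbf{a}\cdot\hat{\mathbf{b}}| = \left|\frac{(-6,1)\cdot(2,0)}{|(2,0)|}\right| = \left|\frac{-12+0}{2}\right| = \left|\frac{-12}{2}\right| = 6.$$

[Figure: the vectors a = (−6, 1) and b = (2, 0), and the projection projb a.]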

Now consider c = (3, 0) — it points in the same direction as b, but is half again as long.
By Fact 79, the projection of a on c is the same as that of a on b. That is:

projc a = projb a.

We can easily verify that ∣projc a∣ = ∣projb a∣ = 6:

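$$|\operatorname{proj}_{\mathbf{c}}\mathbf{a}| = |\mathbf{a}\cdot\hat{\mathbf{c}}| = \left|\frac{(-6,1)\cdot(3,0)}{|(3,0)|}\right| = \left|\frac{-18+0}{3}\right| = \left|\frac{-18}{3}\right| = 6.$$

[Figure: the vectors a = (−6, 1) and c = (3, 0), and the projection projc a.]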

Fact 79 can help us simplify some calculations:

Example 593. Let u = (5, −7) and v = (51√347, 68√347). What is ∣projv u∣ (i.e. the
length of the projection of u onto v)?
Here it seems that the calculations will be pretty tedious. But observe that v is a multiple
of w = (3, 4). Thus, v ∥ w. And so, by Fact 79:

projv u = projw u.

Hence, instead of computing ∣projv u∣, we can simply compute ∣projw u∣:

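$$|\operatorname{proj}_{\mathbf{v}}\mathbf{u}| = |\operatorname{proj}_{\mathbf{w}}\mathbf{u}| = |\mathbf{u}\cdot\hat{\mathbf{w}}| = \left|\frac{\mathbf{u}\cdot\mathbf{w}}{|\mathbf{w}|}\right| = \left|\frac{(5,-7)\cdot(3,4)}{|(3,4)|}\right| = \left|\frac{15-28}{\sqrt{3^2+4^2}}\right| = \left|\frac{-13}{5}\right| = 2.6.$$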
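
If you doubt that the shortcut really gives the same answer as the tedious calculation, you can let a computer do the tedious calculation. Here is a minimal Python sketch (the function name `proj_length` is our own); it computes ∣u ⋅ v̂∣ for both v and w.

```python
import math

def proj_length(u, v):
    """Length of the projection of u on v, i.e. |u . v-hat|. Assumes v is non-zero."""
    return abs(u[0] * v[0] + u[1] * v[1]) / math.hypot(*v)

u = (5, -7)
v = (51 * math.sqrt(347), 68 * math.sqrt(347))
w = (3, 4)

print(proj_length(u, w))  # 2.6
print(proj_length(u, v))  # also 2.6, up to floating-point rounding
```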

Exercise 191. Find the lengths of the projections of: (Answer on p. 1478.)

(a) (1, 0) on (33, 33); and (b) (33, 33) on (1, 0).

(c) Hence conclude if the following statement is true or false:

“Given any vectors a and b, ∣projb a∣ = ∣proja b∣.”


41. Collinearity

Definition 120. Two or more points are collinear if some line contains all of them.

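Example 594. The points A and B are collinear.

[Figure: the points A and B, the vector $\overrightarrow{AB}$, and the line $\mathbf{r} = \overrightarrow{OA} + \lambda\overrightarrow{AB}$ (λ ∈ R) through both points.]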

“Obviously”, any two points must be collinear. Indeed, given any two points, there is a
unique line that contains both of them:

Fact 80. Suppose A and B are distinct points. Then the unique line that contains both A and B is described by:

$$\mathbf{r} = \overrightarrow{OA} + \lambda\overrightarrow{AB} \quad (\lambda \in \mathbb{R}).$$

Proof. First, plug in λ = 0 and λ = 1 to verify that the given line contains A and B.

Next, this line is unique because any line that contains both A and B must have direction vector $\overrightarrow{AB}$ and must thus be described by:

$$\mathbf{r} = \overrightarrow{OA} + \lambda\overrightarrow{AB} \quad (\lambda \in \mathbb{R}).$$

In contrast, three distinct points can be collinear but will not generally be:

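Example 595. The points A, B, and C are collinear:

[Figure: A, B, and C are collinear. All three lie on the line $\mathbf{r} = \mathbf{a} + \lambda\overrightarrow{AB}$ (λ ∈ R).]

In contrast, the points D, E, and F are not:

[Figure: D, E, and F are not collinear. The line $\mathbf{r} = \mathbf{d} + \lambda\overrightarrow{DE}$ (λ ∈ R) contains D and E but not F.]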


Here is one possible procedure for checking whether three points are collinear:
1. First use Fact 80 to write down the unique line that contains two of the three points.
2. Then check whether this line also contains the third point.

Example 596. Let A = (1, 2), B = (4, 5), and C = (7, 8) be points.
To check if they are collinear:
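1. First write down the unique line that contains both A and B:

$$\mathbf{r} = \overrightarrow{OA} + \lambda\overrightarrow{AB} = (1, 2) + \lambda(3, 3) \quad (\lambda \in \mathbb{R}).$$

2. If this line also contains C, then there exists λ̂ such that:

$$C = \begin{pmatrix} 7 \\ 8 \end{pmatrix} = \begin{pmatrix} 1 \\ 2 \end{pmatrix} + \hat{\lambda}\begin{pmatrix} 3 \\ 3 \end{pmatrix} \quad\text{or}\quad \begin{aligned} 7 &= 1 + 3\hat{\lambda}, \\ 8 &= 2 + 3\hat{\lambda}. \end{aligned}$$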

As you can verify, λ̂ = 2 solves the above vector equation (or system of two equations).
Hence, our line also contains C. Thus, A, B, and C are collinear.

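Example 597. Let D = (1, 0), E = (0, 1), and F = (0, 0) be points. To check if they are collinear:

1. First write down a line that contains both D and E:

$$\mathbf{r} = \overrightarrow{OD} + \lambda\overrightarrow{DE} = (1, 0) + \lambda(-1, 1) \quad (\lambda \in \mathbb{R}).$$

2. If this line also contains F, then there exists λ̂ such that:

$$F = \begin{pmatrix} 0 \\ 0 \end{pmatrix} = \begin{pmatrix} 1 \\ 0 \end{pmatrix} + \hat{\lambda}\begin{pmatrix} -1 \\ 1 \end{pmatrix} \quad\text{or}\quad \begin{aligned} 0 &\overset{1}{=} 1 - 1\hat{\lambda}, \\ 0 &\overset{2}{=} 0 + 1\hat{\lambda}. \end{aligned}$$

From $\overset{1}{=}$, we have λ̂ = 1. But this contradicts $\overset{2}{=}$, which requires λ̂ = 0. This contradiction means that there is no solution to the above vector equation (or system of two equations).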

Hence, our line does not contain F . Thus, D, E, and F are not collinear.
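
The two-step procedure above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name `on_line_through` is invented, the points are assumed to have integer (or rational) coordinates so that exact `Fraction` arithmetic can be used, and A and B are assumed to be distinct.

```python
from fractions import Fraction

def on_line_through(a, b, c):
    """Return the λ with c = a + λ(b - a), or None if no such λ exists.

    Assumes a and b are distinct points with integer (or rational) coordinates.
    """
    d = (b[0] - a[0], b[1] - a[1])  # direction vector AB
    e = (c[0] - a[0], c[1] - a[1])  # the vector AC
    # Step 1: solve for λ using a coordinate in which AB is non-zero.
    i = 0 if d[0] != 0 else 1
    lam = Fraction(e[i], d[i])
    # Step 2: check that the same λ also satisfies the other equation.
    j = 1 - i
    return lam if lam * d[j] == e[j] else None

print(on_line_through((1, 2), (4, 5), (7, 8)))  # Example 596: 2
print(on_line_through((1, 0), (0, 1), (0, 0)))  # Example 597: None
```

A result of `None` means that the three points are not collinear; any other result is the value of λ̂ at which the line through A and B passes through C.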

Exercise 192. In each of the following, three points A, B, and C are given. Determine
if they are collinear.

(a) A = (3, 1), B = (1, 6), and C = (0, −1).

(b) A = (1, 2), B = (0, 0), and C = (3, 6). (Answer on p. 1479.)


42. The Vector Product

Definition 121. Let u = (u1 , u2 ) and v = (v1 , v2 ) be vectors. Their vector product,
denoted u × v, is the number:

u × v = u1 v2 − u2 v1 .

Remark 61. The vector product is also called the cross product. But your A-Level
exams and syllabus do not seem to use this term and so neither shall we. We will stick
strictly to the term vector product.

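Example 598. Let u = (5, −3), v = (2, 1), w = (−4, 0), and x = (8, 7). Then:

$$\mathbf{u}\times\mathbf{v} = \begin{pmatrix} 5 \\ -3 \end{pmatrix} \times \begin{pmatrix} 2 \\ 1 \end{pmatrix} = 5\cdot 1 - (-3)\cdot 2 = 5 + 6 = 11,$$

$$\mathbf{u}\times\mathbf{w} = \begin{pmatrix} 5 \\ -3 \end{pmatrix} \times \begin{pmatrix} -4 \\ 0 \end{pmatrix} = 5\cdot 0 - (-3)\cdot(-4) = 0 - 12 = -12,$$

$$\mathbf{u}\times\mathbf{x} = \begin{pmatrix} 5 \\ -3 \end{pmatrix} \times \begin{pmatrix} 8 \\ 7 \end{pmatrix} = 5\cdot 7 - (-3)\cdot 8 = 35 + 24 = 59,$$

$$\mathbf{v}\times\mathbf{w} = \begin{pmatrix} 2 \\ 1 \end{pmatrix} \times \begin{pmatrix} -4 \\ 0 \end{pmatrix} = 2\cdot 0 - 1\cdot(-4) = 0 + 4 = 4.$$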

We now discuss three properties of the vector product.

Recall that ordinary multiplication and the scalar product are both distributive (over
addition) and commutative. It turns out that the vector product is also distributive:

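Example 599. Continuing with the above example:

$$\mathbf{u}\times(\mathbf{v}+\mathbf{w}) = \begin{pmatrix} 5 \\ -3 \end{pmatrix} \times \begin{pmatrix} 2-4 \\ 1+0 \end{pmatrix} = 5\cdot 1 - (-3)\cdot(-2) = -1 = 11 + (-12) = \mathbf{u}\times\mathbf{v} + \mathbf{u}\times\mathbf{w}.$$

$$(\mathbf{u}+\mathbf{v})\times\mathbf{w} = \begin{pmatrix} 5+2 \\ -3+1 \end{pmatrix} \times \begin{pmatrix} -4 \\ 0 \end{pmatrix} = 7\cdot 0 - (-2)\cdot(-4) = -8 = -12 + 4 = \mathbf{u}\times\mathbf{w} + \mathbf{v}\times\mathbf{w}.$$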


However, the vector product is not commutative. Instead, it is anti-commutative:

a × b = −b × a.

Example 600. Let u = (5, −3), v = (2, 1), and w = (−4, 0). We already showed that:

u × v = 11 and u × w = −12.

We now show that v × u = −11 and w × u = 12:

v × u = (2, 1) × (5, −3) = 2 ⋅ (−3) − 1 ⋅ 5 = −6 − 5 = −11.

w × u = (−4, 0) × (5, −3) = (−4) ⋅ (−3) − 0 ⋅ 5 = 12 − 0 = 12.

The third property is that a vector’s vector product with itself is 0:

Example 601. Continuing with the above example, we have:

u × u = (5, −3) × (5, −3) = 5 ⋅ (−3) − (−3) ⋅ 5 = −15 + 15 = 0.

v × v = (2, 1) × (2, 1) = 2 ⋅ 1 − 1 ⋅ 2 = 2 − 2 = 0.

w × w = (−4, 0) × (−4, 0) = (−4) ⋅ 0 − 0 ⋅ (−4) = 0 − 0 = 0.

In summary:

Fact 81. Let a, b, and c be vectors. Then:

(a) a × (b + c) = a × b + a × c. (Distributivity over addition)
(b) a × b = −b × a. (Anti-commutativity)
(c) a × a = 0. (Self vector product equals zero)

Proof. See Exercise ??.
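
The three properties in Fact 81 are easy to spot-check by computer. The following minimal Python sketch uses the vectors from Examples 598 to 601; the function names `cross` and `add` are our own, and checking a few vectors is of course not a proof.

```python
def cross(a, b):
    """2D vector product a × b = a1*b2 - a2*b1 (Definition 121)."""
    return a[0] * b[1] - a[1] * b[0]

def add(a, b):
    """Sum of two 2D vectors."""
    return (a[0] + b[0], a[1] + b[1])

u, v, w = (5, -3), (2, 1), (-4, 0)

print(cross(u, v), cross(u, w))                           # 11 -12
print(cross(u, add(v, w)) == cross(u, v) + cross(u, w))   # True (distributivity)
print(cross(v, u) == -cross(u, v))                        # True (anti-commutativity)
print(cross(u, u))                                        # 0
```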


From Fact 81(c), we have the following result:

Corollary 10. If a ∥ b, then a × b = 0.

Proof. If a ∥ b, then there exists c ≠ 0 such that b = ca. Thus:

a × b = a × (ca) = c (a × a) = c ⋅ 0 = 0.

Example 602. Let a = (1, 2) and b = (−2, −4). Since a ∥ b, by Corollary 10, a × b = 0.
The converse of Corollary 10 is also true, but is harder to prove:

Fact 82. Let a and b be non-zero vectors. If a × b = 0, then a ∥ b.

Proof. See p. 1297 in the Appendices.

Example 603. Let a = (3, −1) and b = (2, k), where k is some unknown constant. We
are now told that a × b = 0. What is k?
Well, by Fact 82, a × b = 0 implies that a ∥ b. That is, b is a multiple of a.
Hence, 2/3 = k/ (−1). And so, k = −2/3.

The following result is simply Corollary 10 and Fact 82 combined:

Corollary 11. Suppose a and b are non-zero vectors. Then:

a×b=0 ⇐⇒ a ∥ b.

Here’s another “obvious” property of the vector product:

Fact 83. Suppose a and b are vectors and c ∈ R is a scalar. Then:

(ca) × b = c (a × b).

Proof. Let a = (a1 , a2 ) and b = (b1 , b2 ). Then ca = (ca1 , ca2 ) and:

(ca) × b = (ca1 ) ⋅ b2 − (ca2 ) b1 = c (a1 b2 − a2 b1 ) = c (a × b) .

Exercise 193. Let a = (1, −2), b = (3, 0), and c = (4, 1). Compute a × b, a × c, b × c,
b × a, c × a, c × b, and a × (b + c). (Answer on p. 1480.)


42.1. The Angle between Two Vectors Using the Vector Product
Recall [222] that with the scalar product, we had a ⋅ b = ∣a∣ ∣b∣ cos θ.
It turns out that with the vector product, we have a very similar result:

Fact 84. Let θ be the angle between the vectors a and b. Then:

∣a × b∣ = ∣a∣ ∣b∣ sin θ.

Proof. See Exercise 194.
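
Before attempting the proof, you may like to convince yourself numerically that Fact 84 is at least plausible. Here is a minimal Python sketch; the function names are our own, the angle is computed via Definition 111, and the clamping step is merely a precaution against rounding.

```python
import math

def cross(a, b):
    """2D vector product a × b = a1*b2 - a2*b1."""
    return a[0] * b[1] - a[1] * b[0]

def angle(a, b):
    """Angle between the non-zero vectors a and b, as in Definition 111."""
    cos_theta = (a[0] * b[0] + a[1] * b[1]) / (math.hypot(*a) * math.hypot(*b))
    return math.acos(max(-1.0, min(1.0, cos_theta)))

def both_sides(a, b):
    """Return (|a × b|, |a| |b| sin θ) so that the two sides can be compared."""
    lhs = abs(cross(a, b))
    rhs = math.hypot(*a) * math.hypot(*b) * math.sin(angle(a, b))
    return lhs, rhs

for a, b in [((5, -3), (2, 1)), ((5, -3), (-4, 0)), ((2, 1), (-4, 0))]:
    lhs, rhs = both_sides(a, b)
    print(a, b, lhs, round(rhs, 9))
```

In each printed row, the last two numbers should agree.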

Exercise 194. Let θ be the angle between the vectors a = (a1 , a2 ) and b = (b1 , b2 ).
(a) Express ∣a∣, ∣b∣, ∣a × b∣, and cos θ in terms of a1 , a2 , b1 , and b2 . (You do not need to
expand the squared terms.)
(b) Since θ ∈ [0, π], what can you say about the sign of sin θ? (That is, is sin θ positive,
negative, non-positive, or non-negative?)
(c) Now use a trigonometric identity to express sin θ in terms of cos θ. (Hint: You should
find that there are two possibilities. Use what you found in (b) to explain why you
can discard one of these possibilities.)
(d) Plug the expression you wrote down for cos θ in (a) into what you found in (c).
(e) Prove [223] that $(a_1^2 + a_2^2)(b_1^2 + b_2^2) - (a_1 b_1 + a_2 b_2)^2 = (a_1 b_2 - a_2 b_1)^2$. (Hint: Simply expand the terms and do the algebra.)

(f) Use (a) and (d) to express ∣a∣ ∣b∣ sin θ in terms of a1 , a2 , b1 , and b2 . Then use (e) to
prove that ∣a × b∣ = ∣a∣ ∣b∣ sin θ. (Answer on p. 1480.)

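Fact 84 immediately yields the following result:

Corollary 12. If θ ∈ [0, π/2] is the angle between the non-zero vectors u and v, then:

$$\theta = \sin^{-1}\frac{|\mathbf{u}\times\mathbf{v}|}{|\mathbf{u}|\,|\mathbf{v}|}.$$

(If instead θ is obtuse, then the right-hand side equals π − θ rather than θ, because sin⁻¹ takes values only in [−π/2, π/2].)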

Corollary 12 is thus the sine or vector product analogue of Definition 111.

Note though that we won’t be using Corollary 12 to compute the angle between two vectors.
This is because, as we’ll see shortly, it’s generally easier to compute the scalar product than
the vector product. And so, it’s easier to just use Definition 111.

[222] Fact 68.
[223] By the way, this is simply an instance of Lagrange’s Identity.

42.2. The Length of the Rejection Vector
The vector product will mostly be useful only when we look at 3D space. Nonetheless, even
in 2D space, it has the following use:

Fact 85. Let a and b be vectors. Then:

∣rejb a∣ = ∣a × b̂∣ .

[Figure: the vectors a and b, together with projb a (along b) and rejb a = a − projb a.]

Fact 85 is thus the vector product analogue of Fact 78.

Proof. By the right-triangle definition of sine: [224]

$$\sin\theta = \frac{\text{Opposite}}{\text{Hypotenuse}} = \frac{|\operatorname{rej}_{\mathbf{b}}\mathbf{a}|}{|\mathbf{a}|}.$$

Below, we first rearrange, then use ∣b̂∣ = 1 and Fact 84:

$$|\operatorname{rej}_{\mathbf{b}}\mathbf{a}| = |\mathbf{a}|\sin\theta = |\mathbf{a}|\,|\hat{\mathbf{b}}|\sin\theta = |\mathbf{a}\times\hat{\mathbf{b}}|. \quad\checkmark$$

As we’ll see next, Fact 85 will help us compute the distance between a point and a line.

[224] For a proof that makes no mention of the sine function, see p. 1296 in the Appendices.

43. The Foot of the Perpendicular From a Point to a Line

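Definition 122. Let A be a point that isn’t on the line l. The foot of the perpendicular from A to l is the point B on l such that AB ⊥ l.

[Figure: the point A, the line l with direction vector v, a point P on l, the projection $\operatorname{proj}_{\mathbf{v}}\overrightarrow{PA}$, and the foot B of the perpendicular from A to l.]

Suppose the line l has direction vector v. Pick any point P on l. Consider $\operatorname{proj}_{\mathbf{v}}\overrightarrow{PA}$, the projection of $\overrightarrow{PA}$ on v.

Observe that $B = P + \operatorname{proj}_{\mathbf{v}}\overrightarrow{PA}$ is a foot of the perpendicular from A to l. Moreover, it is unique. [225] Formally:
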
Fact 86. Suppose l is the line described by r = OP + λv (λ ∈ R) and A is a point that
isn’t on l. Then the unique foot of the perpendicular from A to l is the following point:
P + projv P A.

Proof. See Exercise 196.

Example 604. Let A = (1, 2) be a point and l be the line described by:

r = OP + λv = (0, 1) + λ(9, 1) (λ ∈ R).

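Compute $\overrightarrow{PA} = (1, 2) - (0, 1) = (1, 1)$ and:

$$\operatorname{proj}_{\mathbf{v}}\overrightarrow{PA} = \operatorname{proj}_{(9,1)}(1, 1) = \frac{(1,1)\cdot(9,1)}{9^2+1^2}\begin{pmatrix} 9 \\ 1 \end{pmatrix} = \frac{9+1}{82}(9, 1) = \frac{5}{41}(9, 1).$$

By Fact 86, the foot of the perpendicular from A to l is:

$$B = P + \operatorname{proj}_{\mathbf{v}}\overrightarrow{PA} = (0, 1) + \frac{5}{41}(9, 1) = \frac{1}{41}(45, 46).$$

[Figure: the point A = (1, 2), the line l through P = (0, 1) with direction vector v = (9, 1), and the foot B of the perpendicular.]

By the way, let’s call this Method 1 or the Formula Method for finding the foot of the perpendicular (so named because we simply plug in the formula given in Fact 86). Below, we’ll also see two more methods for finding the same.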
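
The Formula Method is mechanical enough to hand to a computer. Here is a minimal Python sketch of Fact 86, offered as an illustration only: the function name `foot_of_perpendicular` is invented, and we assume integer (or rational) coordinates so that `Fraction` gives exact answers.

```python
from fractions import Fraction

def foot_of_perpendicular(a, p, v):
    """Foot of the perpendicular from the point a to the line r = p + λv.

    Implements B = P + proj_v(PA). Assumes v is non-zero and that all
    coordinates are integers (or rationals), so the arithmetic is exact.
    """
    pa = (a[0] - p[0], a[1] - p[1])  # the vector PA
    # proj_v(PA) = k v, where k = (PA . v) / (v . v)
    k = Fraction(pa[0] * v[0] + pa[1] * v[1], v[0] * v[0] + v[1] * v[1])
    return (p[0] + k * v[0], p[1] + k * v[1])

b = foot_of_perpendicular((1, 2), (0, 1), (9, 1))
print(b)  # (Fraction(45, 41), Fraction(46, 41))
```

This agrees with the point B = (45/41, 46/41) found by hand above.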

Exercise 195. Find the feet of the perpendiculars from the points A = (−1, 0) and
B = (3, 2) to the line described by r = OP + λv = (2, −3) + λ (5, 1) (λ ∈ R). (Answer on p.

[225] Hence justifying the use of the definite article in Definition 122.

Exercise 196. This Exercise guides you through a proof of Fact 86. Let l be the line described by $\mathbf{r} = \overrightarrow{OP} + \lambda\mathbf{v}$ (λ ∈ R), A be a point that isn’t on l, and $B = P + \operatorname{proj}_{\mathbf{v}}\overrightarrow{PA}$.
To prove that B is a foot of the perpendicular from A to l, we must prove that (a) B is on l; and (b) $\overrightarrow{AB} \perp l$:
(a) To prove that B is on l, follow these two steps:
    (i) Explain why $\operatorname{proj}_{\mathbf{v}}\overrightarrow{PA}$ can be written as a scalar multiple of v. That is, explain why $\operatorname{proj}_{\mathbf{v}}\overrightarrow{PA} = \lambda\mathbf{v}$ for some λ ∈ R.
    (ii) Hence explain why B satisfies l’s vector equation and is thus on l.
(b) Next, to prove that $\overrightarrow{AB} \perp l$, follow these steps:
    (i) Show that $\overrightarrow{AB} = -\operatorname{rej}_{\mathbf{v}}\overrightarrow{PA}$. (Hint: Definition 119.)
    (ii) Explain why $\operatorname{rej}_{\mathbf{v}}\overrightarrow{PA} \perp \mathbf{v}$.
    (iii) Hence explain why $\overrightarrow{AB} \perp l$.
(c) To prove that B is the unique foot of the perpendicular from A to l, let C ≠ B be a point on l. We will show that $\overrightarrow{AC} \not\perp l$ and hence that C cannot also be a foot of the perpendicular from A to l:
    (i) First explain why $\overrightarrow{AB}\cdot\overrightarrow{BC} = 0$.
    (ii) Now prove that $\overrightarrow{AC} \not\perp \overrightarrow{BC}$ and hence that $\overrightarrow{AC} \not\perp l$. (Answer on p. 1481.)


43.1. The Distance Between a Point and a Line
We define the distance between a point and a line as the minimum distance between them:

Definition 123. Let A be a point and l be a line. Suppose B is the point on l that’s
closest to A. Then the distance between A and l is ∣AB∣.

Remark 62. In the trivial case where A is on l, the point on l that’s closest to A is A
itself. And so, the distance between A and l is ∣AA∣ = ∣0∣ = 0 (as we’d expect).

Fact 87 states what you may find intuitively “obvious”: If B is the foot of the perpendicular from a point A to a line l, then B is also the point on the line that’s closest to A.

[Figure: a point A and the line l. B, the foot of the perpendicular, is also the closest point.]

Fact 87. If B is the foot of the perpendicular from a point A to a line l, then B is also the point on l that’s closest to A.

Proof. Let C ≠ B be any other point on l. Observe that ABC forms a right triangle with hypotenuse AC.

By the Pythagorean Theorem, the leg AB must be shorter than the hypotenuse AC. That is, ∣AB∣ < ∣AC∣. We’ve just shown that B is closer to A than any other point on l. Thus, B is the point on l that’s closest to A.

From the above Definition and Fact, the following Corollary is immediate:

Corollary 13. Suppose l is a line, A is a point, and B is the foot of the perpendicular
from A to l. Then the distance between A and l is ∣AB∣.

Example 605. As in Example 604, let A = (1, 2) be a point and l be the line described by r = OP + λv = (0, 1) + λ(9, 1) (λ ∈ R). Previously, using Method 1 (Formula Method), we already found that the foot of the perpendicular from A to l is $B = \frac{1}{41}(45, 46)$.

Now:

$$\overrightarrow{AB} = B - A = \frac{1}{41}(45, 46) - (1, 2) = \frac{1}{41}(4, -36) = \frac{4}{41}(1, -9).$$

And so, by Corollary 13, the distance between A and l is $|\overrightarrow{AB}| = \frac{4}{41}\sqrt{82}$.
We will next introduce two more methods for finding B and hence the distance between
A and l. In each method, we begin by letting B = (0, 1) + λ̃ (9, 1) be the foot of the
perpendicular from A to l, where λ̃ is some unknown to be found.
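Method 2 (Perpendicular Method). First write down $\overrightarrow{AB}$:

$$\overrightarrow{AB} = B - A = (0, 1) + \tilde{\lambda}(9, 1) - (1, 2) = (9\tilde{\lambda} - 1, \tilde{\lambda} - 1).$$

Since B is the foot of the perpendicular, we have $\overrightarrow{AB} \perp l$ or $\overrightarrow{AB} \perp \mathbf{v}$ or $\overrightarrow{AB}\cdot(9, 1) = 0$:

$$0 = \overrightarrow{AB}\cdot(9, 1) = (9\tilde{\lambda} - 1, \tilde{\lambda} - 1)\cdot(9, 1) = 9(9\tilde{\lambda} - 1) + (\tilde{\lambda} - 1) = 82\tilde{\lambda} - 10.$$

Rearranging, λ̃ = 10/82 = 5/41 and so:

$$B = (0, 1) + \tilde{\lambda}(9, 1) = (0, 1) + \frac{5}{41}(9, 1) = \frac{1}{41}(45, 46).$$
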
Lovely — this is the same as what we found in Method 1. And now, if we’d like, we can
calculate ∣AB∣ (the distance between A and l) as we did before.
Method 3 (Calculus Method). Let R be a generic point on l. Then AR =
(9λ − 1, λ − 1) and the distance between A and R is:
√ √
∣AR∣ = (9λ − 1) + (λ − 1) = 82λ2 − 20λ + 2.
2 2

Since B is the point on l that’s closest to A, the value of λ that minimises ∣AR∣ must
Ð→ √
be λ̃. Our goal then is to determine when ∣AR∣ = 82λ2 − 20λ + 2 is minimised — or
equivalently, when 82λ2 − 20λ + 2 is minimised.
To do so, we’ll use calculus. We’ll learn more about how this calculus thing works in
Part V, but for now we’ll rely on what you may (or may not) remember from secondary
school. First differentiate 82λ² − 20λ + 2 with respect to λ:

d/dλ (82λ² − 20λ + 2) = 164λ − 20.

And so, by the First Order Condition (FOC), we have:

(164λ − 20) ∣λ=λ̃ = 0 or 164λ̃ − 20 = 0 or λ̃ = 20/164 = 5/41.
Lovely — this is the same as what we found in Method 2. And now, if we’d like, we can
find B and ∣AB∣ as we did before.
By the way, instead of explicitly using calculus, an alternative is to use what we learnt
earlier in Part I. Recall226 that quadratic expressions are minimised at “−b/2a”. Hence:

λ̃ = “−b/2a” = −(−20)/(2 ⋅ 82) = 5/41.

This alternative is probably quicker (provided of course you can recall the “−b/2a” thing).
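
As a check on Example 605, here is a minimal Python sketch of Method 2 (Perpendicular Method). The function name foot_of_perpendicular and the use of fractions.Fraction for exact arithmetic are illustrative assumptions, not something taken from this textbook.

```python
from fractions import Fraction

def foot_of_perpendicular(a, p, v):
    """Foot of the perpendicular from point a to the line r = p + t*v (2D).

    Solves (p + t*v - a) . v = 0 for t, as in Method 2.
    Assumes integer coordinates so that Fraction gives exact results.
    """
    pa = (a[0] - p[0], a[1] - p[1])                 # vector from P to A
    t = Fraction(pa[0] * v[0] + pa[1] * v[1],       # t = (PA . v) / (v . v)
                 v[0] * v[0] + v[1] * v[1])
    return t, (p[0] + t * v[0], p[1] + t * v[1])

t, b = foot_of_perpendicular((1, 2), (0, 1), (9, 1))
print(t, b)  # t = 5/41 and B = (45/41, 46/41), matching Example 605
```

Exact fractions are used so that the output can be compared directly with the hand computation.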

Fact 20 (whose proof, by the way, actually uses calculus).
The following is a “formula” for the distance between a point and a line.
Corollary 14. Suppose A is a point, l is the line described by r = OP + λv (λ ∈ R), and
d is the distance between A and l. Then:
d = ∣PA × v̂∣.

Proof. See Exercise 197.

Example 606. We continue with the point A = (1, 2) and the line l described by r =
OP + λv = (0, 1) + λ (9, 1) (λ ∈ R).
By Corollary 14, the distance between A and l is:

∣PA × v̂∣ = ∣(1, 1) × (9, 1)/√(9² + 1²)∣ = ∣(1 ⋅ 1 − 1 ⋅ 9)/√82∣ = 8/√82 = 4√(2/41) = (4/41)√82.

Happily, this coincides with what we found before.
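
The formula in Corollary 14 can also be checked numerically. The sketch below assumes the 2D cross product (a1, a2) × (b1, b2) = a1 b2 − a2 b1, which is how Example 606 computes it; the function name is hypothetical.

```python
import math

def distance_point_line_2d(a, p, v):
    """Distance from point a to the line r = p + t*v, computed as |PA x v| / |v|."""
    pa = (a[0] - p[0], a[1] - p[1])           # vector from P to A
    cross = pa[0] * v[1] - pa[1] * v[0]       # 2D cross product (a scalar)
    return abs(cross) / math.hypot(v[0], v[1])

d = distance_point_line_2d((1, 2), (0, 1), (9, 1))
print(d)  # approximately 0.8834, i.e. 8/sqrt(82)
```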

Exercise 197. Let A be a point, l be the line described by r = OP + λv (λ ∈ R), and d
be the distance between A and l. (Answer on p. 1481.)

(a) Prove Corollary 14 in the trivial case where the point A is on the line l.
In the rest of this exercise (or proof), we will suppose that A is not on l. Let B be the
foot of the perpendicular from A to l.
(b) What is the relationship between d and AB?
(c) Express AB in terms of rejv PA. (Hint: Refer to the figure on p. 481.)
(d) Now use a result from the previous chapter to complete the proof of Corollary 14.

And now a brand new example where we illustrate all three methods:
Example 607. Let A = (−1, 0) be a point, l be the line described by r = OP + λv =
(3, 2) + λ(5, 1) (λ ∈ R), and B be the foot of the perpendicular from A to l.
Method 1 (Formula Method). First compute PA = (−1, 0) − (3, 2) = (−4, −2) and:

projv PA = proj(5,1) (−4, −2) = [(−4, −2) ⋅ (5, 1)/(5² + 1²)] (5, 1) = [(−20 − 2)/26] (5, 1) = (−11/13)(5, 1).

So: B = P + projv PA = (3, 2) + (−11/13)(5, 1) = (1/13)(−16, 15).

And: ∣AB∣ = ∣PA × v̂∣ = ∣(−4, −2) × (5, 1)/√(5² + 1²)∣ = ∣(−4 − (−10))/√26∣ = 6/√26 = 3√26/13.

Method 2 (Perpendicular Method). Let B = (3, 2) + λ̃ (5, 1). Write:
AB = B − A = (3, 2) + λ̃ (5, 1) − (−1, 0) = (5λ̃ + 4, λ̃ + 2).

Since AB ⊥ l, we have AB ⊥ v or:
0 = AB ⋅ (5, 1) = (5λ̃ + 4, λ̃ + 2) ⋅ (5, 1) = 5 (5λ̃ + 4) + (λ̃ + 2) = 26λ̃ + 22.

Rearranging, λ̃ = −22/26 = −11/13 and so:

B = (3, 2) + λ̃ (5, 1) = (3, 2) − (11/13)(5, 1) = (1/13)(−16, 15).

And now, the distance between A and l is:

∣AB∣ = ∣B − A∣ = ∣(1/13)(−16, 15) − (−1, 0)∣ = ∣(1/13)(−3, 15)∣ = (3/13) ∣(−1, 5)∣ = 3√26/13.

Lovely — these coincide with what we found using Method 1.

Method 3 (Calculus Method). Let R be a generic point on l. Then:

AR = (5λ + 4, λ + 2) and ∣AR∣ = √((5λ + 4)² + (λ + 2)²) = √(26λ² + 44λ + 20).

Differentiate: d/dλ (26λ² + 44λ + 20) = 52λ + 44.

FOC: (52λ + 44) ∣λ=λ̃ = 0 or λ̃ = −44/52 = −11/13.

Alternatively, we could simply have used “−b/2a”: λ̃ = −44/(2 ⋅ 26) = −44/52 = −11/13.
And now, we can find B and ∣AB∣ as we did in Method 2.
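
The “−b/2a” shortcut can be sketched in Python as follows. The function name closest_parameter and its return format are assumptions made for illustration; the quadratic it builds is the expansion of ∣AR∣² with R = P + tv.

```python
from fractions import Fraction

def closest_parameter(a, p, v):
    """Value of t minimising |AR|^2, where R = p + t*v (2D, integer inputs).

    |AR|^2 expands to qa*t^2 + qb*t + qc, which is minimised at t = -qb/(2*qa).
    """
    ap = (p[0] - a[0], p[1] - a[1])               # vector from A to P
    qa = v[0] ** 2 + v[1] ** 2                    # coefficient of t^2
    qb = 2 * (ap[0] * v[0] + ap[1] * v[1])        # coefficient of t
    qc = ap[0] ** 2 + ap[1] ** 2                  # constant term
    return (qa, qb, qc), Fraction(-qb, 2 * qa)

coeffs, t = closest_parameter((-1, 0), (3, 2), (5, 1))
print(coeffs, t)  # coefficients (26, 44, 20) and t = -11/13, as in Example 607
```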

Exercise 198. In each of the following, a point A and line l are given. Let B be the foot
of the perpendicular from A to l. Find B and also the distance between A and l.
     The point A    The line l                                Answer on p.
(a)  (7, 3)         r = (8, 3) + λ (9, 3)                     1482.
(b)  (8, 0)         Contains the points (4, 4) and (6, 11)    1483.
(c)  (8, 5)         r = (8, 4) + λ (5, 6)                     1484.


So far, we’ve looked only at two-dimensional (2D) space
(or the cartesian plane). In the remainder of Part III, we’ll
look instead at three-dimensional (3D) space.
I will also often make use of Paul Seeburger’s CalcPlot3D.227
Whenever you see a tiny version of this icon:

click/touch it228 and you’ll be brought to the relevant 3D

graph, where you can pan, zoom, rotate, etc., so as to get a
better sense of what the 3D graph looks like.

I’ve examined dozens of 3D graphing software and all things considered (user-friendliness, accessibility,
features, etc.), this is the best 3D graphing web app I’ve found so far. Please let me know if you know
of any other better software/app. (I was gonna use GeoGebra, but it had too many critical flaws.)
Note that you’ll be routed through first. The reason is that the CalcPlot3D links are
often thousands of characters long and were confusing my computer.
44. Three-Dimensional (3D) Space

In 2D space, we had ordered pairs. In 3D space, we’ll instead have ordered triples:229

Definition 124. Given an ordered triple (a, b, c), we call a its first or x-coordinate, b its
second or y-coordinate, and c its third or z-coordinate.

Example 608. The ordered triple (Cow, Chicken, Dog) has x-coordinate Cow, y-
coordinate Chicken, and z-coordinate Dog. As with ordered pairs, the order of the
coordinates matters. So for example:

(Cow, Chicken, Dog) ≠ (Cow, Dog, Chicken) ≠ (Chicken, Cow, Dog) .

In contrast, with a set of three elements, order doesn’t matter:

{Cow, Chicken, Dog} = {Cow, Dog, Chicken} = {Chicken, Cow, Dog} .

Example 609. The ordered triple (2, 5, −π) has x-coordinate 2, y-coordinate 5, and
z-coordinate −π. Again, order matters, so that for example:

(2, 5, −π) ≠ (2, −π, 5) ≠ (5, 2, −π) .

Again, in contrast, with a set of three elements, order doesn’t matter:

{2, 5, −π} = {2, −π, 5} = {5, 2, −π} .

In 2D space, a point was simply any ordered pair of real numbers. Now in 3D space:

Definition 125. A point is any ordered triple of real numbers.

Example 610. The ordered triple (Cow, Chicken, Dog) is not a point because at least
one of its coordinates is not a real number. (Indeed, all three aren’t.)

Example 611. The ordered triple (2, 5, −π) is a point because all three of its coordinates
are real numbers.

For the formal definition of an ordered triple (and n-tuple), see Definition 218 (Appendices).
In 2D space (the cartesian plane), we could depict points (ordered pairs of real numbers)
by drawing on a piece of paper. The x-axis went right and the y-axis up.

[Figure: the point A = (2, 1) in 2D space, with x- and y-axes.]

In 3D space, we can again depict points (now ordered triples of real numbers) by drawing
on a piece of paper. Again, the x-axis goes right and the y-axis up. But now, we also have
the z-axis, which “comes out of the paper towards your face” and is perpendicular to both
the x- and y-axes.

[Figure: the point A = (a1 , a2 , a3 ) and the origin O = (0, 0, 0) in 3D space, with x-, y-, and z-axes.]

We say that this coordinate system follows the right-hand rule. To see why, have the
palm of your right hand face you. Fold your ring and pinky fingers. Have your thumb point
right, your index finger up, and your middle finger towards your face. Then these three
fingers correspond to the x-, y-, and z-axes. (Try it!)
(If instead the z-axis “goes into the paper away from your face”, then our coordinate system
would instead follow the left-hand rule. Can you explain why?)
In 2D space, the origin was the point O = (0, 0) (Definition 35) and was where the x- and
y-axes intersected. And the generic point A = (a1 , a2 ) was a1 units to the right and a2 units
above the origin.
Analogously, in 3D space:

Definition 126. The origin is the point O = (0, 0, 0).

In 3D space, the origin is where the x-, y-, and z-axes intersect. And relative to the origin,
the generic point A = (a1 , a2 , a3 ) is a1 units right, a2 units up, and a3 units “out (towards
your face)”.


44.1. Graphs (in 3D)
In 2D space, a graph was any set of points, where points were ordered pairs of real numbers
(Definition 36). And the graph of an equation was the set of points (x, y) for which the
equation was true (Definition 37).
Likewise, in 3D space, a graph remains any set of points, the only difference being that
points are now ordered triples of real numbers. And:

Definition 127. The graph of an equation (or system of equations) is the set of points
(x, y, z) for which the equation (or system of equations) is true.

Shortly, we’ll learn about the equations (and systems of equations) used to describe planes
(and lines). For now, here are two quick examples:

Example 612. Consider the equation:

x + y + z = 1.

It turns out that this equation describes a plane q in 3D space. (We’ll learn more
about this in Ch. 51.)
Specifically, q contains exactly those points (x, y, z) that satisfy x + y + z = 1.
So for example, it contains the points A = (1, 0, 0), B = (0, 1, 0), and C = (0, 0, 1).

[Figure: the plane q, passing through A = (1, 0, 0), B = (0, 1, 0), and C = (0, 0, 1).]

A little more formally, the plane q is a set:

q = {(x, y, z) ∶ x ∈ R, y ∈ R, z ∈ R, x + y + z = 1}.

In words, q is the set of ordered triples (x, y, z) such that x, y, and z are real numbers
that satisfy x + y + z = 1.
As with ordered pairs, we will generally be looking only at ordered triples of real numbers,
i.e. points. And so, we shall be a little lazy/sloppy and not bother mentioning that x, y,
and z are real numbers. That is, we’ll usually more simply write:

q = {(x, y, z) ∶ x + y + z = 1} .

In words, q is the set of points (x, y, z) such that x, y, and z satisfy x + y + z = 1.


Example 613. Consider the following system of (two) equations:

x=y and y = z.

Or equivalently and more simply: x = y = z.

It turns out that this system of (two) equations describes a line l in 3D space. (We’ll
learn more about this in Ch. 48.) The line l contains exactly those points that can be
written as (λ, λ, λ), for some real number λ.
So, for example, it contains the points (1, 1, 1), O = (0, 0, 0), and (−1, −1, −1).

[Figure: the line l, passing through (−1, −1, −1) and (1, 1, 1).]

A little more formally, the line l is a set:

l = {(x, y, z) ∶ x = y = z}.

In words, l is the set of ordered triples (x, y, z) such that x, y, and z are real numbers
that satisfy x = y = z.
As noted above, we can also write:

l = {(λ, λ, λ) ∶ λ ∈ R} = {λ (1, 1, 1) ∶ λ ∈ R} .

In words, l is the set of points that can be written as (λ, λ, λ) or λ (1, 1, 1) for some real
number λ.
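
To make Definition 127 concrete, here is a small Python sketch that tests whether a point belongs to the graphs in Examples 612 and 613. The function names are hypothetical, and the exact equality tests are only reliable for integer (or other exact) coordinates.

```python
def on_plane_q(point):
    """True if (x, y, z) satisfies x + y + z = 1 (the plane q of Example 612)."""
    x, y, z = point
    return x + y + z == 1

def on_line_l(point):
    """True if (x, y, z) satisfies x = y = z (the line l of Example 613)."""
    x, y, z = point
    return x == y == z

print([on_plane_q(pt) for pt in [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)]])
print([on_line_l(pt) for pt in [(1, 1, 1), (0, 0, 0), (-1, -1, -1), (1, 0, 0)]])
```

In each list, only the last point fails the test.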


45. Vectors (in 3D)
We now give the basic definitions and results concerning vectors in 3D space. Everything
we learnt about vectors in 2D space finds its analogy in 3D space. (Indeed, we’ll simply
reproduce verbatim many of the definitions and results from before).
Most of the time, the analogy is obvious. We will therefore go fairly briskly in this chapter.

Definition 128. Given the points A = (a1 , a2 , a3 ) and B = (b1 , b2 , b3 ), the vector from A
to B is AB = (b1 − a1 , b2 − a2 , b3 − a3 ).

Example 614. The vector from the point A = (1, 5, 0) to the point B = (−2, 6, 3) is:

                  ⎛ −3 ⎞
AB = (−3, 1, 3) = ⎜  1 ⎟ = u.
                  ⎝  3 ⎠

Observe that there are, again, at least four ways to denote a single vector.

[Figure: the vector AB = (−3, 1, 3) from A = (1, 5, 0) to B = (−2, 6, 3).]

Definition 91. A scalar is any real number.

We may again contrast vectors with scalars: in 3D space, vectors are three-dimensional
objects, while scalars are one-dimensional.
Definition 93. Given a point A, its position vector is the vector OA .
And so, the point A = (a1 , a2 , a3 ) has position vector OA = a = (a1 , a2 , a3 ).
Example 615. The point A = (1, 5, 0) has position vector OA = a = (1, 5, 0).

Once again, do not confuse a point (a zero-dimensional object) with a vector (now a
three-dimensional object).

Definition 94. The zero vector, denoted 0, is the origin’s position vector.
And so, the zero vector (in 3D space) is 0 = OO = (0, 0, 0).

Definition 95. Suppose a moving object starts at point A and ends at point B. Then
we call AB its displacement vector.

And so, if a moving object starts at A = (a1 , a2 , a3 ) and ends at B = (b1 , b2 , b3 ), then its
displacement vector is AB = (b1 − a1 , b2 − a2 , b3 − a3 ).

Exercise 199. Let A = (2, 5, 8) and B = (0, 1, 1) be points.

(a) What is the vector from A to B?
(b) What are the position vectors of A and B?
(c) If a particle starts at A, travels to B, then travels back to A and stops there, then
what is its displacement vector? (Answer on p. 1485.)


45.1. The Magnitude or Length of a Vector

In 2D space, the length of a vector u = (u1 , u2 ) was defined as ∣u∣ = √(u1² + u2²) (Def. 92).
Our Definition of a vector’s length in 3D space is very much analogous:

Definition 129. Given the vector u = (u1 , u2 , u3 ), its magnitude or length, denoted ∣u∣,
is the number:

∣u∣ = √(u1² + u2² + u3²).

Example 616. If u = (1, 2, 3), then the length of u is ∣u∣ = √(1² + 2² + 3²) = √14.

To see why the above definition makes sense, pick any point A = (a1 , a2 , a3 ). It is natural
to define the length of the vector OA to be the length of the line segment OA, i.e. ∣OA∣.
Our goal then is to find ∣OA∣. To do so, we first consider the point B = (0, a2 , a3 ).
Observe that the line segment OB is the hypotenuse of the right triangle ODB. And so,
by the Pythagorean Theorem:

∣OB∣ = √(a2² + a3²).

[Figure: the points O = (0, 0, 0), B = (0, a2 , a3 ), and A = (a1 , a2 , a3 ) in 3D space.]

Now, observe that the line segment OA is the hypotenuse of the right triangle OBA.
Moreover, ∣BA∣ = a1 . And so, again by the Pythagorean Theorem:

∣OA∣ = √(∣BA∣² + ∣OB∣²) = √(a1² + (√(a2² + a3²))²) = √(a1² + a2² + a3²).

This completes our explanation of why the above Definition makes sense.
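
The two-step argument above can be mirrored in a short Python sketch. The function names are hypothetical; magnitude_two_step follows the explanation (first ∣OB∣, then ∣OA∣), while magnitude applies Definition 129 directly.

```python
import math

def magnitude(u):
    """Length of the 3D vector u = (u1, u2, u3), as in Definition 129."""
    return math.sqrt(u[0] ** 2 + u[1] ** 2 + u[2] ** 2)

def magnitude_two_step(a):
    """Same length, found by applying the Pythagorean Theorem twice."""
    ob = math.sqrt(a[1] ** 2 + a[2] ** 2)     # |OB| with B = (0, a2, a3)
    return math.sqrt(a[0] ** 2 + ob ** 2)     # |OA| from |BA| and |OB|

print(magnitude((1, 2, 3)), magnitude_two_step((1, 2, 3)), math.sqrt(14))
```

All three printed values should agree up to floating-point rounding.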

As before, the length of every vector must be non-negative. Moreover, a vector has zero
length if and only if it is the zero vector:

Fact 53. Suppose v is a vector. Then ∣v∣ ≥ 0. Moreover, ∣v∣ = 0 ⇐⇒ v = 0.

Exercise 200. Let A = (2, 5, 8) and B = (0, 1, 1) be points. What is the length of the
vector from A to B? (Answer on p. 1485.)


45.2. Sums and Differences of Points and Vectors
As before: 1. Point + Point = Undefined.
2. Point − Point = Vector.
3. Point + Vector = Point.
4. Point − Vector = Point.

Definition 96. Given two points A and B, the difference B − A is the vector AB.

And so, given the points A = (a1 , a2 , a3 ) and B = (b1 , b2 , b3 ), their difference is:
B − A = AB = (b1 − a1 , b2 − a2 , b3 − a3 ) .

Example 617. Given the points A = (1, 5, 0) and B = (−2, 6, 3), their difference is the
vector B − A = AB = (−2 − 1, 6 − 5, 3 − 0) = (−3, 1, 3).

[Figure: the difference between the points A = (1, 5, 0) and B = (−2, 6, 3), i.e. the
vector B − A = AB = (−3, 1, 3).]

Definition 130. Given a point A = (a1 , a2 , a3 ) and a vector v = (v1 , v2 , v3 ), their sum
A + v is the following point:

A + v = (a1 + v1 , a2 + v2 , a3 + v3 ).

Example 618. Given the point A = (1, 5, 0) and the vector v = (−3, 1, 3), their sum is
the point A + v = (1 − 3, 5 + 1, 0 + 3) = (−2, 6, 3).

[Figure: the sum of the point A = (1, 5, 0) and the vector v = (−3, 1, 3), i.e. the
point A + v = (−2, 6, 3).]

Definition 131. Given a point B = (b1 , b2 , b3 ) and a vector v = (v1 , v2 , v3 ), their difference
B − v is the following point:

B − v = (b1 − v1 , b2 − v2 , b3 − v3 ) .

Example 619. Given the point B = (−2, 6, 3) and the vector v = (−3, 1, 3), their difference
is the point B − v = (−2 − (−3) , 6 − 1, 3 − 3) = (1, 5, 0).

[Figure: the difference between the point B = (−2, 6, 3) and the vector v = (−3, 1, 3),
i.e. the point B − v = (1, 5, 0).]
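
The point and vector operations of this subchapter can be sketched in Python as below. This is only an illustration: points and vectors are both stored as plain tuples, so the distinction between them lives in the (hypothetical) function names rather than in the data.

```python
def point_minus_point(b, a):
    """B - A: the vector from A to B."""
    return tuple(bi - ai for ai, bi in zip(a, b))

def point_plus_vector(a, v):
    """A + v: the point reached from A by moving along v."""
    return tuple(ai + vi for ai, vi in zip(a, v))

def point_minus_vector(b, v):
    """B - v: the point reached from B by moving along -v."""
    return tuple(bi - vi for bi, vi in zip(b, v))

A, B = (1, 5, 0), (-2, 6, 3)
v = point_minus_point(B, A)
print(v, point_plus_vector(A, v), point_minus_vector(B, v))
```

This reproduces Examples 617 to 619: the vector (−3, 1, 3), then the points B and A.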


Exercise 201. Let A = (1, 2, 3), B = (−1, 0, 7), and C = (5, −2, 3) be points. What are
(a) A + B; (b) A − B; (c) A + (B + C); and (d) A + (B − C)? (Answer on p. 1485.)


45.3. Sum, Additive Inverse, and Difference of Vectors
As before: 1. Vector + Vector = Vector.
2. − Vector = Vector.
3. Vector − Vector = Vector.

Definition 132. Let u = (u1 , u2 , u3 ) and v = (v1 , v2 , v3 ) be vectors. Then their sum,
denoted u + v, is the vector u + v = (u1 + v1 , u2 + v2 , u3 + v3 ).

Example 620. Given the vectors u = (1, 2, 3) and v = (−1, 0, 1), their sum is the vector
u + v = (1 − 1, 2 + 0, 3 + 1) = (0, 2, 4).

[Figure: the sum of two vectors, u = (1, 2, 3), v = (−1, 0, 1), and u + v = (0, 2, 4).]

Place the tail of v at the head of u. Then the sum u + v is the vector from the tail of u
to the head of v.


Definition 133. The additive inverse of u = (u1 , u2 , u3 ) is the vector:

−u = (−u1 , −u2 , −u3 ).

Example 621. The additive inverse of u = (1, 2, 3) is the vector −u = (−1, −2, −3).

[Figure: the additive inverse, u = (1, 2, 3) and −u = (−1, −2, −3).]

Flip u in the exact opposite direction to get its additive inverse −u.

Definition 101. Given two vectors u and v, the difference u − v is the sum of u and the
additive inverse of v. That is:

u − v = u + (−v).

Fact 88. If u = (u1 , u2 , u3 ) and v = (v1 , v2 , v3 ) are vectors, then:

u − v = (u1 − v1 , u2 − v2 , u3 − v3 ) .

Proof. By Definition 133, −v = (−v1 , −v2 , −v3 ). And so by Definition 101,

u − v = u + (−v) = (u1 − v1 , u2 − v2 , u3 − v3 ) .

Example 622. Given the vectors u = (1, 2, 3) and v = (−1, 0, 1), their difference is the
vector u − v = (1 − (−1) , 2 − 0, 3 − 1) = (2, 2, 2).

[Figure: the difference of two vectors, u = (1, 2, 3), v = (−1, 0, 1), and u − v = (2, 2, 2).]

Place the heads of u and v at the same point. Then the difference u − v is the vector
from the tail of u to that of v.


Exercise 202. Let u = (1, 2, 3), v = (−1, 0, 7), and w = (5, −2, 3) be vectors. What are
(a) u + v; (b) u − v; (c) u + (v + w); and (d) u + (v − w)? (Answer on p. 1485.)

Fact 55. If A and B are points, then OB − OA = AB.

Proof. Let A = (a1 , a2 , a3 ) and B = (b1 , b2 , b3 ). Then:

AB = (b1 − a1 , b2 − a2 , b3 − a3 ), OA = (a1 , a2 , a3 ), and OB = (b1 , b2 , b3 ).
By Fact 88, OB − OA = (b1 − a1 , b2 − a2 , b3 − a3 ). Thus, AB = OB − OA.

Example 623. Let A = (1, 5, 0) and B = (−2, 6, 3) be points.

Then AB = B − A = (−2, 6, 3) − (1, 5, 0) = (−3, 1, 3), OA = (1, 5, 0), and OB = (−2, 6, 3).
And indeed, OB − OA = (−2, 6, 3) − (1, 5, 0) = (−3, 1, 3) = AB. ✓

[Figure: the points A = (1, 5, 0) and B = (−2, 6, 3), their position vectors OA and OB,
and the vector OB − OA = AB = (−3, 1, 3).]

Fact 56. If A, B, and C are points, then AB − AC = CB and AB + BC = AC.

Proof. Let A = (a1 , a2 , a3 ), B = (b1 , b2 , b3 ), and C = (c1 , c2 , c3 ). Then:

AB = (b1 − a1 , b2 − a2 , b3 − a3 ),
AC = (c1 − a1 , c2 − a2 , c3 − a3 ),
CB = (b1 − c1 , b2 − c2 , b3 − c3 ).

And: AB − AC = (b1 − a1 , b2 − a2 , b3 − a3 ) − (c1 − a1 , c2 − a2 , c3 − a3 )
             = (b1 − c1 , b2 − c2 , b3 − c3 ) = CB. ✓

Observing that −CB = BC and rearranging, we also have AB + BC = AC. ✓

Example 624. Let A = (1, 5, 0), B = (−2, 6, 3), and C = (4, −2, 1) be points.
Then AB = B − A = (−2, 6, 3) − (1, 5, 0) = (−3, 1, 3), AC = C − A = (4, −2, 1) − (1, 5, 0) =
(3, −7, 1), and BC = C − B = (4, −2, 1) − (−2, 6, 3) = (6, −8, −2).
And indeed, AB − AC = (−3, 1, 3) − (3, −7, 1) = (−6, 8, 2) = −BC = CB. ✓
Also, AB + BC = (−3, 1, 3) + (6, −8, −2) = (3, −7, 1) = AC. ✓

[Figure: the points A = (1, 5, 0), B = (−2, 6, 3), and C = (4, −2, 1), with the vectors
AB = (−3, 1, 3), BC = (6, −8, −2), and AC = (3, −7, 1).]
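
Fact 56 can be spot-checked on Example 624 with a few lines of Python. The helper names vec, add, and sub are assumptions made for this sketch.

```python
def vec(a, b):
    """The vector from point a to point b."""
    return tuple(bi - ai for ai, bi in zip(a, b))

def add(u, v):
    return tuple(ui + vi for ui, vi in zip(u, v))

def sub(u, v):
    return tuple(ui - vi for ui, vi in zip(u, v))

A, B, C = (1, 5, 0), (-2, 6, 3), (4, -2, 1)
AB, AC, BC, CB = vec(A, B), vec(A, C), vec(B, C), vec(C, B)
print(sub(AB, AC) == CB, add(AB, BC) == AC)  # True True
```

A check on one example is of course not a proof; it only illustrates the two identities.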

Exercise 203. Let A = (5, −1, 0), B = (3, 6, −5), and C = (2, 2, 3) be points. Find AB,
AC, and BC; and show that AB − AC = CB and AB + BC = AC. (Answer on p. 1485.)

45.4. Scalar Multiplication and When Two Vectors Are Parallel

Definition 134. Let v = (v1 , v2 , v3 ) be a vector and c ∈ R be a scalar. Then:

cv = (cv1 , cv2 , cv3 ).

Fact 57. If v is a vector and c ∈ R, then ∣cv∣ = ∣c∣ ∣v∣.

Example 625. Let v = (−1, 0, 1) be a vector. Then 2v = (−2, 0, 2) and −3v = (3, 0, −3).

[Figure: the vectors v = (−1, 0, 1), 2v = (−2, 0, 2), and −3v = (3, 0, −3).]

∣v∣ = √((−1)² + 0² + 1²) = √2

And so by Fact 57, we have:

∣2v∣ = 2√2 and ∣−3v∣ = 3√2.
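
Here is a minimal Python sketch of scalar multiplication that also checks Fact 57 on Example 625. The function names scale and magnitude are illustrative assumptions.

```python
import math

def scale(c, v):
    """The scalar multiple cv."""
    return tuple(c * vi for vi in v)

def magnitude(v):
    return math.sqrt(sum(vi ** 2 for vi in v))

v = (-1, 0, 1)
for c in (2, -3):
    cv = scale(c, v)
    # Fact 57 says the last two printed numbers should agree.
    print(c, cv, magnitude(cv), abs(c) * magnitude(v))
```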


Definition 103. Two non-zero vectors u and v are said to point in:
(a) The same direction if u = kv for some k > 0;
(b) Exact opposite directions if u = kv for some k < 0; and
(c) Different directions if u ≠ kv for any k.

Definition 104. Two non-zero vectors u and v are parallel if u = kv for some k and
non-parallel otherwise.

Example 626. Let u = (1, 0, 1). Then u points in:

• The same direction as v = (3, 0, 3) because v = 3u.
• The exact opposite direction as w = (−2, 0, −2) because w = −2u.
• A different direction from x = (5, 1, 0) because x ≠ ku for any k.

[Figure: the vectors u = (1, 0, 1), v = (3, 0, 3), w = (−2, 0, −2), and x = (5, 1, 0).]

So, u is parallel to both v and w, but not to x. As shorthand, we may write:

u ∥ v, w and u ∦ x.

Remark 63. Again, note the special case of the zero vector 0 = (0, 0, 0). It does not point
in the same, exact opposite, or different direction as any other vector. Also, it is neither
parallel nor non-parallel to any other vector.
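
Definitions 103 and 104 can be turned into a small Python test, sketched below. The function name direction_relation is hypothetical. The parallel test used here (all pairwise products u_i v_j and u_j v_i agree) is an assumption of this sketch: it is equivalent to u = kv for non-zero vectors, but it is not how the text phrases the definition. Exact comparison means integer coordinates are assumed.

```python
def direction_relation(u, v):
    """Classify two non-zero 3D vectors as in Definition 103."""
    if not any(u) or not any(v):
        raise ValueError("the zero vector is excluded (see Remark 63)")
    # For non-zero u and v: u = k*v for some k  <=>  u_i*v_j == u_j*v_i for all i < j.
    parallel = all(u[i] * v[j] == u[j] * v[i]
                   for i in range(3) for j in range(i + 1, 3))
    if not parallel:
        return "different"
    # If u = k*v, then u . v = k*|v|^2, so the sign of u . v is the sign of k.
    dot = sum(ui * vi for ui, vi in zip(u, v))
    return "same" if dot > 0 else "exact opposite"

u = (1, 0, 1)
for other in [(3, 0, 3), (-2, 0, -2), (5, 1, 0)]:
    print(other, direction_relation(u, other))
```

Run on Example 626, this should report "same", "exact opposite", and "different" respectively.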

Exercise 204. Continue to let u = (1, 0, 1), v = (3, 0, 3), w = (−2, 0, −2), and x = (5, 1, 0).
State if each of the following pairs of vectors point in the same, exact opposite, or different
directions; and also if they are parallel. (Answer on p. 1485.)

(a) v and w. (b) v and x. (c) w and x. (d) u and 0.


45.5. Unit Vectors
The following definitions and results about unit vectors are exactly the same as before:

Definition 105. A unit vector is any vector of length 1.

Definition 106. Given a non-zero vector v, its unit vector (or the unit vector in its
direction) is:
v̂ = (1/∣v∣) v.

Fact 58. Given any non-zero vector, its unit vector has length 1.

Fact 59. Let v be a vector with unit vector v̂. If c ∈ R, then the vector cv̂ has length ∣c∣.

Fact 60. Let a and b be non-zero vectors. Then:

(a) â = b̂ ⇐⇒ a and b point in the same direction.
(b) â = −b̂ ⇐⇒ a and b point in exact opposite directions.
(c) â = ±b̂ ⇐⇒ a and b are parallel.
(d) â ≠ ±b̂ ⇐⇒ a and b are non-parallel.
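
A unit vector is easy to compute numerically. The sketch below uses a vector chosen for illustration, (2, −1, 2), which has length 3; the function names are assumptions.

```python
import math

def magnitude(v):
    return math.sqrt(sum(vi ** 2 for vi in v))

def unit(v):
    """Unit vector of a non-zero vector v, i.e. v scaled by 1/|v|."""
    length = magnitude(v)
    if length == 0:
        raise ValueError("the zero vector has no unit vector")
    return tuple(vi / length for vi in v)

v_hat = unit((2, -1, 2))                               # |v| = 3
print(v_hat, magnitude(v_hat))                         # length is 1 (Fact 58)
print(magnitude(tuple(-5 * x for x in v_hat)))         # length is |-5| = 5 (Fact 59)
```

Because floating-point arithmetic is used, the printed lengths may differ from 1 and 5 by a tiny rounding error.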

Exercise 205. Find the length and unit vector of each vector. (Answer on p. 1486.)

(a) a = (1, 2, 3). (b) b = (4, 5, 6). (c) a − b.

(d) 2a. (e) 3b. (f) −4 (a − b).


45.6. The Standard Basis Vectors
Recall230 that in 2D space, the (two) standard basis vectors were the unit vectors that
point in the directions of the positive x- and y-axes:

i = (1, 0) and j = (0, 1).

Analogously, in 3D space, the (three) standard basis vectors are the three unit vectors
that point in the directions of the positive x-, y-, and z-axes:

Definition 135. The standard basis vectors are:

i = (1, 0, 0), j = (0, 1, 0), and k = (0, 0, 1).

[Figure: the standard basis vectors i = (1, 0, 0), j = (0, 1, 0), and k = (0, 0, 1).]

Not surprisingly, every vector can be written as the linear combination of i, j, and k:231

Example 627. Let u = (7, 5, 3). Then u = 7i + 5j + 3k.

Exercise 206. Write each of the vectors v = (9, 0, −1) and w = (−7, 3, 5) as a linear
combination of the standard basis vectors. (Answer on p. 1486.)

Ch. 34.14.
You may also recall (Fact 61) that in 2D space, every vector can be written as the linear combination
of two non-parallel vectors. It turns out that there is an analogous result in 3D space.
For this result, we first define what it means for three (or more) vectors to be linearly independent:

Definition 136. Three (or more) non-zero vectors are linearly independent if none of them can
be written as a linear combination of the others.
We then have the following Fact (proof omitted). This Fact is definitely out of the H2 Maths syllabus
and isn’t something you need worry about.
Fact 89. Every vector can be written as the linear combination of three linearly independent vectors.


45.7. The Ratio Theorem
The Ratio Theorem is exactly the same as before and now reproduced:

Theorem 7. Let A and B be points with position vectors a and b. Let P be the point
that divides the line segment AB in the ratio λ ∶ µ. Then P ’s position vector is:

(µa + λb)/(λ + µ).

[Figure: the line segment AB, where A has position vector a, B has position vector b, and
the point P has position vector (µa + λb)/(λ + µ).]
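
The Ratio Theorem translates directly into code. In this sketch the function name divide_segment and the sample points are chosen for illustration, and Fraction keeps the arithmetic exact for integer inputs.

```python
from fractions import Fraction

def divide_segment(a, b, lam, mu):
    """Position vector of the point dividing AB in the ratio lam : mu."""
    return tuple(Fraction(mu * ai + lam * bi, lam + mu) for ai, bi in zip(a, b))

p = divide_segment((0, 0, 0), (4, 8, 12), 1, 3)
print(p)  # one quarter of the way from A to B, i.e. the point (1, 2, 3)

m = divide_segment((1, 5, 0), (-2, 6, 3), 1, 1)
print(m)  # the ratio 1 : 1 gives the midpoint (-1/2, 11/2, 3/2)
```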

Exercise 207. Let A = (1, 2, 3) and B = (4, 5, 6) be points. Find the point that divides
the line segment AB in the ratio 2 ∶ 3. (Answer on p. 1486.)

Let’s end this chapter with three more exercises:

Exercise 208. Fill in the blanks. (Answers on p. 1486.)

(a) Informally, a vector is an “arrow” with two properties: ____ and ____.
(b) A point and a vector are entirely different objects and should not be confused.
Nonetheless, each can be described by ____.
(c) Let A = (a1 , a2 , a3 ) be a point and a = (a1 , a2 , a3 ) be a vector. We say that a is A’s ____.
(d) The vector a = (a1 , a2 , a3 ) carries us from the ____ to the point A = (a1 , a2 , a3 ).
Exercise 209. Let A = (a1 , a2 , a3 ) be a point. Write down the vector from the origin to
A in every possible way. (Answers on p. 1486.)
Exercise 210. Let A = (a1 , a2 , a3 ) and B = (b1 , b2 , b3 ) be points. What are A + B, A + OB,
OA + OB, OA − OB, OA − BA? (Answers on p. 1486.)

46. The Scalar Product (in 3D)
In 2D space, we defined the scalar product of u = (u1 , u2 ) and v = (v1 , v2 ) to be:

u ⋅ v = u1 v1 + u2 v2 .

We define the scalar product in 3D space analogously:

Definition 137. Let u = (u1 , u2 , u3 ) and v = (v1 , v2 , v3 ) be vectors. Their scalar product
u ⋅ v is the number:

u ⋅ v = u1 v1 + u2 v2 + u3 v3 .

Example 628. Let u = (5, −3, 1), v = (2, 1, −2), and w = (0, −4, 3). Then:

u ⋅ v = (5, −3, 1) ⋅ (2, 1, −2) = 10 − 3 − 2 = 5.

u ⋅ w = (5, −3, 1) ⋅ (0, −4, 3) = 0 + 12 + 3 = 15.

v ⋅ w = (2, 1, −2) ⋅ (0, −4, 3) = 0 − 4 − 6 = −10.
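
The scalar product of Definition 137 is a one-line function in Python. The name dot is an assumption of this sketch; the vectors are those of Example 628.

```python
def dot(u, v):
    """Scalar product u . v = u1*v1 + u2*v2 + u3*v3."""
    return sum(ui * vi for ui, vi in zip(u, v))

u, v, w = (5, -3, 1), (2, 1, -2), (0, -4, 3)
print(dot(u, v), dot(u, w), dot(v, w))  # 5 15 -10

# Commutativity and distributivity, checked on this one example only.
v_plus_w = tuple(vi + wi for vi, wi in zip(v, w))
print(dot(v, u) == dot(u, v), dot(u, v_plus_w) == dot(u, v) + dot(u, w))
```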

Recall232 that in 2D space, the scalar product was both commutative and distributive
over addition. The same remains true of the scalar product in 3D space:

Fact 90. Let a, b, and c be vectors. Then:

(a) a ⋅ b = b ⋅ a. (Commutativity)
(b) a ⋅ (b + c) = a ⋅ b + a ⋅ c. (Distributivity over addition)

Proof. Let233 a = (a1 , a2 , a3 ), b = (b1 , b2 , b3 ), and c = (c1 , c2 , c3 ). Then:

(a) a ⋅ b = a1 b1 + a2 b2 + a3 b3 = b1 a1 + b2 a2 + b3 a3 = b ⋅ a.

(b) a ⋅ (b + c) = a1 (b1 + c1 ) + a2 (b2 + c2 ) + a3 (b3 + c3 )

= a1 b1 + a2 b2 + a3 b3 + a1 c1 + a2 c2 + a3 c3 = a ⋅ b + a ⋅ c.

Fact 65.
Our proof here covers only the 3D case. For a more general proof, see p. 1289 (Appendices).
Example 629. Continue to let u = (5, −3, 1), v = (2, 1, −2), and w = (0, −4, 3).
To illustrate commutativity, we can easily verify that:

v ⋅ u = u ⋅ v = 5 and w ⋅ u = u ⋅ w = 15.

And to illustrate distributivity, we compute the following:

u ⋅ (v + w) = (5, −3, 1) ⋅ (2 + 0, 1 − 4, −2 + 3) = 10 + 9 + 1 = 20 = 5 + 15 = u ⋅ v + u ⋅ w. ✓

(u + v) ⋅ w = (5 + 2, −3 + 1, 1 − 2) ⋅ (0, −4, 3) = 0 + 8 − 3 = 5 = 15 + (−10) = u ⋅ w + v ⋅ w. ✓

And again, a vector’s length is the square root of its scalar product with itself:

Fact 67. Suppose v is a vector. Then ∣v∣² = v ⋅ v and ∣v∣ = √(v ⋅ v).

Proof. By Definition 129, ∣v∣ = √(v1² + v2² + v3²). By Definition 137, v ⋅ v = v1 v1 + v2 v2 + v3 v3 =
v1² + v2² + v3². Hence, ∣v∣² = v ⋅ v and ∣v∣ = √(v ⋅ v).

Exercise 211. Compute (1, 2, 3)⋅(4, 5, 6) and (−2, 4, −6)⋅(1, −2, 3). (Answer on p. 1487.)


46.1. The Angle between Two Vectors
Place the tails of two vectors at the same point. Then as before, the angle between
these two vectors is, informally, simply the (smaller) “amount” by which we must rotate
one of the two vectors so that both point in the same direction.

Example 630. As shown in the figure below, a, b, c, and d are vectors. The angle
between a and b is α, while that between c and d is β.


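[Figure: the vectors a and b with the angle α between them, and the vectors c and d with the angle β between them.]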

(Observe that here it so happens that α is acute, while β is obtuse.)

And formally, we’ll use the exact same Definition as before:

Definition 111. The angle between two non-zero vectors u and v is the number:

cos⁻¹ [(u ⋅ v)/(∣u∣ ∣v∣)].

And as before, a simple rearrangement of Definition 111 yields:

Fact 68. If u and v are two non-zero vectors and θ is the angle between them, then:

u ⋅ v = ∣u∣ ∣v∣ cos θ.


Example 631. The angle between the vectors (1, 3, −2) and (0, 2, 1) is:

cos⁻¹ [(1, 3, −2) ⋅ (0, 2, 1)/(∣(1, 3, −2)∣ ∣(0, 2, 1)∣)] = cos⁻¹ [4/(√14 √5)] ≈ 1.072.

[Figure: the vectors (1, 3, −2) and (0, 2, 1).]

Example 632. The angle between (8, 5, 0) and (−2, −3, 5) is:

cos⁻¹ [(8, 5, 0) ⋅ (−2, −3, 5)/(∣(8, 5, 0)∣ ∣(−2, −3, 5)∣)] = cos⁻¹ [−31/(√89 √38)] ≈ 2.133.

[Figure: the vectors (8, 5, 0) and (−2, −3, 5), with the angle 2.133 between them.]
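
Definition 111 can be evaluated numerically as in the following sketch. The function name angle_between is an assumption; the clamping step is a floating-point precaution added here and is not part of the definition.

```python
import math

def angle_between(u, v):
    """Angle (in radians) between two non-zero vectors, per Definition 111."""
    dot = sum(ui * vi for ui, vi in zip(u, v))
    norm_u = math.sqrt(sum(ui ** 2 for ui in u))
    norm_v = math.sqrt(sum(vi ** 2 for vi in v))
    cos_theta = dot / (norm_u * norm_v)
    cos_theta = max(-1.0, min(1.0, cos_theta))   # guard against rounding just outside [-1, 1]
    return math.acos(cos_theta)

print(angle_between((1, 3, -2), (0, 2, 1)))      # approximately 1.072 (Example 631)
print(angle_between((8, 5, 0), (-2, -3, 5)))     # approximately 2.133 (Example 632)
```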

The following results and Definition are reproduced verbatim from before:

Fact 69. (Cauchy’s Inequality.) Let u and v be non-zero vectors. Then:

−1 ≤ (u ⋅ v)/(∣u∣ ∣v∣) ≤ 1.

Equivalently: −∣u∣ ∣v∣ ≤ u ⋅ v ≤ ∣u∣ ∣v∣ or (u ⋅ v)² ≤ ∣u∣² ∣v∣².

Fact 70. Let u and v be non-zero vectors and θ be the angle between them. Then:
(a) (u ⋅ v)/(∣u∣ ∣v∣) = 1 ⇐⇒ θ = 0.
(b) (u ⋅ v)/(∣u∣ ∣v∣) ∈ (0, 1) ⇐⇒ θ ∈ (0, π/2).
(c) (u ⋅ v)/(∣u∣ ∣v∣) = 0 ⇐⇒ θ = π/2.
(d) (u ⋅ v)/(∣u∣ ∣v∣) ∈ (−1, 0) ⇐⇒ θ ∈ (π/2, π).
(e) (u ⋅ v)/(∣u∣ ∣v∣) = −1 ⇐⇒ θ = π.

And thus:
(i) u ⋅ v > 0 ⇐⇒ θ ∈ [0, π/2).
(ii) u ⋅ v = 0 ⇐⇒ θ = π/2.
(iii) u ⋅ v < 0 ⇐⇒ θ ∈ (π/2, π].

Definition 112. Two non-zero vectors u and v are perpendicular (or normal or ortho-
gonal) if u ⋅ v = 0 and non-perpendicular if u ⋅ v ≠ 0.

Fact 71. Let u and v be non-zero vectors. Then:

(a) (u ⋅ v)/(∣u∣ ∣v∣) = 1 ⇐⇒ u and v point in the same direction.
(b) (u ⋅ v)/(∣u∣ ∣v∣) = −1 ⇐⇒ u and v point in exact opposite directions.
(c) (u ⋅ v)/(∣u∣ ∣v∣) = ±1 ⇐⇒ u and v are parallel.
(d) (u ⋅ v)/(∣u∣ ∣v∣) ∈ (−1, 1) ⇐⇒ u and v point in different directions.

Theorem 8. (Pythagorean Theorem.) If u ⊥ v, then ∣u + v∣² = ∣u∣² + ∣v∣².

Fact 72. (Triangle Inequality.) If u and v are vectors, then ∣u + v∣ ≤ ∣u∣ + ∣v∣.

Exercise 212. Find the angle between each pair of vectors. Also, state whether each
pair of vectors is parallel or perpendicular. (Answer on p. 1487.)

(a) a = (1, 2, 3) and b = (4, 5, 6) (b) u = (−2, 4, −6) and v = (1, −2, 3)


46.2. Direction Cosines
In 2D space, we defined the x- and y-direction cosines of the vector u = (u1 , u2 ) to be u1 / ∣u∣
and u2 / ∣u∣. We shall define direction cosines in 3D space analogously:

Definition 138. The x-, y-, and z-direction cosines of the vector v = (v1 , v2 , v3 ) are:

v1 /∣v∣, v2 /∣v∣, and v3 /∣v∣.

Again, the direction cosines are so named because each direction cosine is equal to the
cosine of the angle the given vector makes with each (positive) axis:

Fact 91. Let v = (v1 , v2 , v3 ) be a non-zero vector. Let α, β, and γ be the angles between
v and each of i, j, and k. Then:

cos α = v1 /∣v∣, cos β = v2 /∣v∣, and cos γ = v3 /∣v∣.

Proof. Since α is the angle between v and i, we have:

cos α = (v ⋅ i)/(∣v∣ ∣i∣) = (v1 ⋅ 1 + v2 ⋅ 0 + v3 ⋅ 0)/(∣v∣ ⋅ 1) = v1 /∣v∣. ✓

Similarly, since β is the angle between v and j, we have:

cos β = (v ⋅ j)/(∣v∣ ∣j∣) = (v1 ⋅ 0 + v2 ⋅ 1 + v3 ⋅ 0)/(∣v∣ ⋅ 1) = v2 /∣v∣. ✓

And since γ is the angle between v and k, we have:

cos γ = (v ⋅ k)/(∣v∣ ∣k∣) = (v1 ⋅ 0 + v2 ⋅ 0 + v3 ⋅ 1)/(∣v∣ ⋅ 1) = v3 /∣v∣. ✓

Example 633. Let v = (2, 3, 2). Compute: ∣v∣ = √(2² + 3² + 2²) = √17.
So v’s x-, y-, and z-direction cosines are 2/√17, 3/√17, and 2/√17.
And the angles it makes with the positive x-, y-, and z-axes are:

cos⁻¹ (2/√17) ≈ 1.064, cos⁻¹ (3/√17) ≈ 0.756, and cos⁻¹ (2/√17) ≈ 1.064.
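
Direction cosines and the corresponding angles can be computed as in this sketch, which reuses the vector of Example 633. The function name direction_cosines is hypothetical.

```python
import math

def direction_cosines(v):
    """The x-, y-, and z-direction cosines of a non-zero vector v (Definition 138)."""
    length = math.sqrt(sum(vi ** 2 for vi in v))
    return tuple(vi / length for vi in v)

cosines = direction_cosines((2, 3, 2))
angles = tuple(math.acos(c) for c in cosines)
print(cosines)
print(angles)  # approximately 1.064, 0.756, and 1.064 radians
```

Since the direction cosines are the components of the unit vector, their squares sum to 1, which gives a quick sanity check.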

Exercise 213. For each of the following vectors, write down its unit vector and x-, y-,
and z-direction cosines. Then compute also the angles each makes with the positive x-,
y-, and z-axes. (Answer on p. 1487.)

(a) (1, 3, −2). (b) (4, 2, −3). (c) (−1, 2, −4).


47. The Projection and Rejection Vectors (in 3D)
Our definitions and results about the projection and rejection vectors carry over from
2D space in the “obvious” fashion. Here are the same Definitions reproduced:

Definition 118. Let a and b be vectors. Then the projection of a on b, denoted projb a,
is the following vector:

projb a = (a ⋅ b̂) b̂ (or equivalently, projb a = [(a ⋅ b)/∣b∣²] b).

Definition 119. Let a and b be non-zero vectors. Then the rejection of a on b, denoted
rejb a, is the following vector:

rejb a = a − projb a.

As before, the two key properties are that:

projb a ∥ b and rejb a ⊥ projb a, b.

[Figure: the 2D picture showing a, b, projb a, rejb a = a − projb a, and the angle θ between a and b.]

The 2D figure above is simply reproduced from before. Here’s a figure depicting the projection and rejection vectors in 3D:

[Figure: projb a and rejb a = a − projb a in 3D.]

Again, these two properties must hold provided projb a and rejb a are both non-zero.

Example 634. Let a = (5, −2, 3) and b = (0, 1, 2). Then b̂ = (0, 1, 2)/√5 and:

projb a = (a ⋅ b̂) b̂ = [(5, −2, 3) ⋅ (0, 1, 2)/5] (0, 1, 2) = [(0 − 2 + 6)/5] (0, 1, 2) = (4/5)(0, 1, 2) = (0, 0.8, 1.6),

rejb a = a − projb a = (5, −2, 3) − (0, 0.8, 1.6) = (5, −2.8, 1.4).

[Figure: a = (5, −2, 3), b = (0, 1, 2), projb a, rejb a = a − projb a, and the angle θ between a and b.]

We can easily verify that projb a = kb for some k and hence that projb a ∥ b:

projb a = (0, 0.8, 1.6) = 0.8 (0, 1, 2) = 0.8b. ✓

We can also verify that rejb a ⋅ b = 0 and hence that rejb a ⊥ projb a, b:

rejb a ⋅ b = (5, −2.8, 1.4) ⋅ (0, 1, 2) = 0 − 2.8 + 2.8 = 0. ✓
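
The same computation can be sketched in Python. The helper names dot, proj, and rej are ours. To keep the arithmetic exact, this sketch assumes integer coordinates and uses the equivalent formula projb a = ((a ⋅ b)/(b ⋅ b)) b from Definition 118:

```python
from fractions import Fraction

def dot(a, b):
    """Scalar product of two vectors."""
    return sum(x * y for x, y in zip(a, b))

def proj(a, b):
    """Projection of a on b, computed as ((a . b)/(b . b)) b; b must be non-zero."""
    bb = dot(b, b)
    if bb == 0:
        raise ValueError("cannot project onto the zero vector")
    k = Fraction(dot(a, b), bb)  # exact scalar multiple of b (integer inputs assumed)
    return tuple(k * x for x in b)

def rej(a, b):
    """Rejection of a on b: a minus its projection on b."""
    return tuple(x - y for x, y in zip(a, proj(a, b)))

# The vectors of Example 634.
a, b = (5, -2, 3), (0, 1, 2)
p, r = proj(a, b), rej(a, b)
print([float(x) for x in p])  # the projection, as decimals
print([float(x) for x in r])  # the rejection, as decimals
print(dot(r, b))              # 0 exactly when the rejection is perpendicular to b
```

Working with Fraction rather than decimals means the final perpendicularity check is exact rather than subject to rounding.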


As before, the length of the projection vector is given by the scalar product:

Fact 78. Let a and b be vectors. Then:

∣projb a∣ = ∣a ⋅ b̂∣ .

Example 635. Continue to let a = (5, −2, 3) and b = (0, 1, 2). We already found:

b̂ = (0, 1, 2)/√5 and projb a = (0, 0.8, 1.6).

Now: ∣projb a∣ = ∣0.8 (0, 1, 2)∣ = 0.8 √(0² + 1² + 2²) = 0.8√5.

Also: ∣a ⋅ b̂∣ = ∣(5, −2, 3) ⋅ (0, 1, 2)/√5∣ = 4/√5 = 4√5/5 = 0.8√5.

And so, it is indeed true that: ∣projb a∣ = ∣a ⋅ b̂∣.

As before, the sign of a ⋅ b tells us whether projb a points in the same or exact opposite
direction as b:

Fact 77. Let a and b be vectors, and projb a = (a ⋅ b̂) b̂.

(a) If a ⋅ b̂ > 0, then projb a is a positive scalar multiple of b.

(b) If a ⋅ b̂ < 0, then projb a is a negative scalar multiple of b.
(c) If a ⋅ b̂ = 0, then projb a = 0 and rejb a = a.

Example 636. Let a = (2, 5, −1), b = (1, 1, 4), and c = (−2, −2, 1). Then:
(a) a ⋅ b = 2 + 5 − 4 = 3 > 0, so that the angle between a and b is acute and projb a points in the same direction as b.
(b) a ⋅ c = −4 − 10 − 1 = −15 < 0, so that the angle between a and c is obtuse and projc a points in the exact opposite direction as c.
(c) b ⋅ c = −2 − 2 + 4 = 0, so that the angle between b and c is right and projc b = 0. Moreover, rejc b = b.

[Figure: a = (2, 5, −1), b = (1, 1, 4), and c = (−2, −2, 1), with projb a, rejb a, projc a, and rejc a.]

Exercise 214. Continuing with the above example, find projb a, rejb a, projc a, and rejc a.
Then verify that rejb a ⊥ b and rejc a ⊥ c. (Answer on p. 1488.)


As before, if v ∥ w, then the projections of any vector on each of v and w are identical:

Fact 79. Let u, v, and w be vectors. If v ∥ w, then:

projv u = projw u.

Example 637. Let u = (2, 5, −1), v = (1, −2, 1), and w = (−2, 4, −2). Since v ∥ w, by the above Fact, it should be that projv u = projw u, as we now verify:

projv u = (u ⋅ v̂) v̂ = [(2, 5, −1) ⋅ (1, −2, 1)/(1² + (−2)² + 1²)] (1, −2, 1) = [(2 − 10 − 1)/6] (1, −2, 1) = −(3/2)(1, −2, 1).

projw u = (u ⋅ ŵ) ŵ = [(2, 5, −1) ⋅ (−2, 4, −2)/((−2)² + 4² + (−2)²)] (−2, 4, −2) = [(−4 + 20 + 2)/24] (−2, 4, −2) = −(3/2)(1, −2, 1).

[Figure: u = (2, 5, −1), v = (1, −2, 1), and w = (−2, 4, −2), with projv u = projw u and rejv u = rejw u.]

Exercise 215. Given a = (1, 2, 3) and b = (4, 5, 6), find projb a, rejb a, ∣projb a∣, and
∣rejb a∣. Verify that projb a ∥ b and rejb a ⊥ b. Does projb a point in the same or exact
opposite direction as b? (Answer on p. 1488.)


48. Lines (in 3D)
Our definition of a line in 3D space is the general one given earlier and now reproduced:

Definition 109. A line is any set of points that can be written as:
{R ∶ OR = p + λv (λ ∈ R)} ,

where p and v ≠ 0 are some vectors.

As before, the above Definition says that a line contains exactly those points R whose position vector OR = r may be expressed as:

OR = r = p + λv = (p1, p2, p3) + λ(v1, v2, v3) for some real number λ.

Equivalently, a line contains exactly those points R that may be expressed as:

R = (p1, p2, p3) + λ(v1, v2, v3) for some real number λ.

As before, here are what the vectors p and v and the number λ mean:
• p = (p1 , p2 , p3 ) is the position vector of some point on the line;
• v = (v1 , v2 , v3 ) is a direction vector of the line; and
• The parameter λ takes on every value in R; each distinct value produces a distinct
point on the line.

Direction vectors are defined exactly as before:

Definition 108. Given any two distinct points A and B on a line, we call the vector
AB a direction vector of the line.

And as before, direction vectors are unique up to non-zero scalar multiplication. In
other words, if a line has direction vector v, then that line’s direction vectors are exactly
those that are parallel to v. Formally:

Fact 62. Let u and v be vectors. Suppose v is a line’s direction vector. Then:

u is also that line’s direction vector ⇐⇒ u ∥ v.


Example 638. Let l be the line described by the vector equation:

r = OP + λv = (1, 2, 3) + λ(0, 1, 1) (λ ∈ R).

The line l contains the point P = (1, 2, 3) and has direction vector v = (0, 1, 1).
As the parameter λ varies, we get different points of l. So for example, when λ takes on
the values 0, 1, and −1, we get the following three position vectors (and thus points):

(1, 2, 3) = (1, 2, 3) + 0(0, 1, 1), (1, 3, 4) = (1, 2, 3) + 1(0, 1, 1), and (1, 1, 2) = (1, 2, 3) − 1(0, 1, 1).

[Figure: the line l through (1, 1, 2) (λ = −1), (1, 2, 3) (λ = 0), and (1, 3, 4) (λ = 1), with direction vector (0, 1, 1).]

Note that the direction vector v = (0, 1, 1) has x-coordinate 0. Informally, one implication
of this is that the line doesn’t “move” in the direction of the x-axis.
A little more formally, the line is perpendicular to the x-axis. Indeed, we can easily verify
that v is perpendicular to the first standard basis vector i = (1, 0, 0):

v ⋅ i = (0, 1, 1) ⋅ (1, 0, 0) = 0 ⋅ 1 + 1 ⋅ 0 + 1 ⋅ 0 = 0.
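
To see how the parameter λ generates points, here is a small Python sketch of the line in Example 638. The function name point_on_line is ours:

```python
def point_on_line(p, v, lam):
    """Return the point whose position vector is p + lam * v."""
    return tuple(pi + lam * vi for pi, vi in zip(p, v))

# The line l of Example 638: r = (1, 2, 3) + λ(0, 1, 1).
p, v = (1, 2, 3), (0, 1, 1)
for lam in (0, 1, -1):
    print(lam, point_on_line(p, v, lam))
```

Whatever value of λ we try, the x-coordinate of the resulting point stays at 1, which is the sense in which this line doesn’t “move” in the direction of the x-axis.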


Example 639. Let l be the line described by the vector equation:

r = OP + λv = (0, 0, 0) + λ(1, 0, 0) (λ ∈ R).

The line l contains the point P = (0, 0, 0) and has direction vector v = (1, 0, 0).
As the parameter λ varies, we get different points of l. So for example, when λ takes on
the values 0, 1, and −1, we get the following three position vectors (and thus points):

(0, 0, 0) = (0, 0, 0) + 0(1, 0, 0), (1, 0, 0) = (0, 0, 0) + 1(1, 0, 0), and (−1, 0, 0) = (0, 0, 0) − 1(1, 0, 0).

[Figure: the line l along the x-axis through (−1, 0, 0) (λ = −1), (0, 0, 0), and (1, 0, 0) (λ = 1), with direction vector (1, 0, 0).]

Note that the direction vector v = (1, 0, 0) has y- and z-coordinates 0. Again, this means
that the line is perpendicular to the y- and z-axes — we can easily verify that v is
perpendicular to both j = (0, 1, 0) and k = (0, 0, 1).
Indeed, this line actually coincides with the x-axis — it passes through the origin (0, 0, 0)
and its direction vector is parallel to i.


48.1. Vector to Cartesian Equations
Suppose a line l is described by the following vector equation:

r = (p1 , p2 , p3 ) + λ(v1 , v2 , v3 ) (λ ∈ R).

Then for each point (x, y, z) on l, there is some real number λ such that:

x = p1 + λv1 , y = p2 + λv2 , and z = p3 + λv3 .

These three cartesian equations also describe l.

In 2D space, we could eliminate the parameter λ and thus reduce the two cartesian equations
to one. Here in 3D space, we can also eliminate the parameter λ, but this time we’ll merely
reduce the three cartesian equations to two.
We start with three examples where none of the direction vector’s coordinates are zero:

Example 640. Let l be the line described by the following vector equation:

r = (1, 2, 3) + λ(4, 5, 6) (λ ∈ R).

That is, let: l = {R ∶ r = (1, 2, 3) + λ(4, 5, 6)}.

In words, l is the set of points R whose position vector can be written as (1, 2, 3) + λ(4, 5, 6),
for some real number λ.

Write out the vector equation as the following three cartesian equations:

x = 1 + 4λ (1), y = 2 + 5λ (2), z = 3 + 6λ (3).

Rearrange each equation so that λ is on one side:

λ = (x − 1)/4, λ = (y − 2)/5, λ = (z − 3)/6.

Eliminating λ leaves the following two cartesian equations that describe the line l:

(x − 1)/4 = (y − 2)/5 = (z − 3)/6.

And so, we may also write:

l = {(x, y, z) ∶ (x − 1)/4 = (y − 2)/5 = (z − 3)/6}.

In words, l is the set of points (x, y, z) that satisfy the equations (x − 1)/4 = (y − 2)/5 = (z − 3)/6.


Example 641. Let l0 be the line described by the following vector equation:

r = (−2, 5, 0) + λ(1, 5, −2) (λ ∈ R).

Write out the three cartesian equations:

x = −2 + λ (1), y = 5 + 5λ (2), z = 0 − 2λ (3).

Rearrange each equation so that λ is on one side:

λ = (x + 2)/1, λ = (y − 5)/5, λ = z/(−2).

Thus, l0 may also be described by the following two cartesian equations:

(x + 2)/1 = (y − 5)/5 = z/(−2).

Example 642. Let l1 be the line described by the following vector equation:

r = (0, 0, 0) + λ(2, 3, 5) (λ ∈ R).

Write out the three cartesian equations:

x = 0 + 2λ (1), y = 0 + 3λ (2), z = 0 + 5λ (3).

Rearrange each equation so that λ is on one side:

λ = x/2, λ = y/3, λ = z/5.

Thus, l1 may also be described by the following two cartesian equations:

x/2 = y/3 = z/5.


We now look at examples where exactly one of the direction vector’s coordinates is zero:

Example 643. The line l2 is described by r = (1, 2, 3) + λ(0, 5, 6) (λ ∈ R).

Write out the three cartesian equations:

x = 1 + 0λ = 1 (1), y = 2 + 5λ (2), z = 3 + 6λ (3).

Observe that the direction vector (0, 5, 6) has x-coordinate 0. And so, l2 must be perpendicular to the x-axis. Indeed, the x-coordinate of every point on l2 is fixed as x = 1.

Leave (1) alone. But as before, rearrange (2) and (3) so that λ is on one side:

λ = (y − 2)/5 and λ = (z − 3)/6.

Altogether then, l2 may also be described by the following two cartesian equations:

x = 1 and (y − 2)/5 = (z − 3)/6.

Example 644. The line l3 is described by r = (1, 2, 3) + λ(4, 0, 6) (λ ∈ R).

Write out the three cartesian equations:

x = 1 + 4λ (1), y = 2 + 0λ = 2 (2), z = 3 + 6λ (3).

Observe that the direction vector (4, 0, 6) has y-coordinate 0. And so, l3 must be perpendicular to the y-axis. Indeed, the y-coordinate of every point on l3 is fixed as y = 2.

Leave (2) alone. But as before, rearrange (1) and (3) so that λ is on one side:

λ = (x − 1)/4 and λ = (z − 3)/6.

Thus, l3 may also be described by the following two cartesian equations:

y = 2 and (x − 1)/4 = (z − 3)/6.

Example 645. The line l4 is described by r = (1, 2, 3) + λ(4, 5, 0) (λ ∈ R).

Write out the three cartesian equations:

x = 1 + 4λ (1), y = 2 + 5λ (2), z = 3 + 0λ = 3 (3).

Observe that the direction vector (4, 5, 0) has z-coordinate 0. And so, l4 must be perpendicular to the z-axis. Indeed, the z-coordinate of every point on l4 is fixed as z = 3.

Leave (3) alone. But as before, rearrange (1) and (2) so that λ is on one side:

λ = (x − 1)/4 and λ = (y − 2)/5.

Thus, l4 may also be described by the following two cartesian equations:

z = 3 and (x − 1)/4 = (y − 2)/5.


We now look at examples where exactly two of the direction vector’s coordinates are zero:

Example 646. The line l5 is described by r = (1, 2, 3) + λ(0, 0, 6) (λ ∈ R).

Write out the three cartesian equations:

x = 1 + 0λ = 1 (1), y = 2 + 0λ = 2 (2), z = 3 + 6λ (3).

Observe that the direction vector (0, 0, 6) has x- and y-coordinates 0. And so, l5 must be
perpendicular to both the x- and y-axes.
Indeed, the x- and y-coordinates of every point of l5 are fixed as x = 1 and y = 2.

On the other hand, z is free to vary along with λ. Unlike in any of our previous examples,
there is no restriction on what z can be. And so we call z the free variable.
And so here in this example, there is actually no algebra to be done. We simply discard
(3) and say that l5 may be described by the following two cartesian equations:

x = 1 and y = 2.

Example 647. The line l6 is described by r = (1, 2, 3) + λ(0, 5, 0) (λ ∈ R).

Write out the three cartesian equations:

x = 1 + 0λ = 1 (1), y = 2 + 5λ (2), z = 3 + 0λ = 3 (3).

Observe that the direction vector (0, 5, 0) has x- and z-coordinates 0. And so, l6 must be
perpendicular to both the x- and z-axes.
Indeed, the x- and z-coordinates of every point of l6 are fixed as x = 1 and z = 3.

In this example, the free variable is y. We simply discard (2) and say that l6 may be
described by the following two cartesian equations:

x = 1 and z = 3.

Example 648. The line l7 is described by r = (1, 2, 3) + λ(4, 0, 0) (λ ∈ R).

Write out the three cartesian equations:

x = 1 + 4λ (1), y = 2 + 0λ = 2 (2), z = 3 + 0λ = 3 (3).

Observe that the direction vector (4, 0, 0) has y- and z-coordinates 0. And so, l7 must be
perpendicular to both the y- and z-axes.
Indeed, the y- and z-coordinates of every point of l7 are fixed as y = 2 and z = 3.

In this example, the free variable is x. We simply discard (1) and say that l7 may be
described by the following two cartesian equations:

y = 2 and z = 3.


In total, we have seven possible cases, depending on which of the direction vector’s
coordinates are zero. These seven cases are summarised in the following Fact (and were
illustrated by the above examples).

Fact 92. Suppose the line l is described by r = (p1 , p2 , p3 ) + λ(v1 , v2 , v3 ) (λ ∈ R).

(1) If v1, v2, v3 ≠ 0, then l can be described by:

(x − p1)/v1 = (y − p2)/v2 = (z − p3)/v3.

(2) If v1 = 0 and v2, v3 ≠ 0, then l is perpendicular to the x-axis and can be described by:

x = p1 and (y − p2)/v2 = (z − p3)/v3.

(3) If v2 = 0 and v1, v3 ≠ 0, then l is perpendicular to the y-axis and can be described by:

y = p2 and (x − p1)/v1 = (z − p3)/v3.

(4) If v3 = 0 and v1, v2 ≠ 0, then l is perpendicular to the z-axis and can be described by:

z = p3 and (x − p1)/v1 = (y − p2)/v2.

(5) If v1, v2 = 0, then l is perpendicular to the x- and y-axes and can be described by:

x = p1 and y = p2.

(6) If v1, v3 = 0, then l is perpendicular to the x- and z-axes and can be described by:

x = p1 and z = p3.

(7) If v2, v3 = 0, then l is perpendicular to the y- and z-axes and can be described by:

y = p2 and z = p3.

Proof. See p. 1293 in the Appendices.
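
The seven cases of Fact 92 follow one pattern: each zero coordinate of the direction vector fixes a variable, and the non-zero coordinates (if there are at least two) are chained together. The Python sketch below illustrates that pattern. The function name cartesian_equations and the plain-string output format are our own choices; a negative p-coordinate will print as, for example, (x - -2)/1 rather than (x + 2)/1:

```python
def cartesian_equations(p, v):
    """Return the cartesian equations of the line r = p + λv as a list of strings."""
    if all(c == 0 for c in v):
        raise ValueError("a direction vector must be non-zero")
    names = ("x", "y", "z")
    # A zero direction coordinate fixes that variable at the point's coordinate.
    fixed = [f"{n} = {pi}" for n, pi, vi in zip(names, p, v) if vi == 0]
    # Each non-zero direction coordinate contributes one fraction equal to λ.
    ratios = [f"({n} - {pi})/{vi}" for n, pi, vi in zip(names, p, v) if vi != 0]
    # A single fraction on its own says nothing: that variable is the free variable.
    chain = [" = ".join(ratios)] if len(ratios) >= 2 else []
    return fixed + chain

print(cartesian_equations((1, 2, 3), (4, 5, 6)))  # case (1), as in Example 640
print(cartesian_equations((1, 2, 3), (0, 5, 6)))  # case (2), as in Example 643
print(cartesian_equations((1, 2, 3), (0, 0, 6)))  # case (5), as in Example 646
```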

Exercise 216. Each vector equation below describes a line. Rewrite each into cartesian
form. Also, state if each line is perpendicular to any axes. (Answer on p. 1489.)

(a) r = (−1, 1, 1) + λ(3, −2, 1) (λ ∈ R). (b) r = (5, 6, 1) + λ(7, 8, 1) (λ ∈ R).
(c) r = (0, −3, 1) + λ(3, 0, 1) (λ ∈ R). (d) r = (9, 9, 9) + λ(1, 0, 0) (λ ∈ R).
(e) r = (0, 0, 0) + λ(4, 8, 5) (λ ∈ R). (f) r = (1, 3, 5) + λ(0, −4, 0) (λ ∈ R).


48.2. Cartesian to Vector Equations
In the last subchapter, we started with a line’s vector equation, then wrote down its
cartesian equations. We’ll now go the other way round.
To write down a line’s vector equation, all we need are a point that’s on the line and a
direction vector of the line.

Example 649. Let l be the line described by the following cartesian equations:

(3x − 9)/6 = (2y − 8)/2 = (z − 1)/3.

First, rewrite the above cartesian equations so that the coefficients on x, y, and z are all
1. This is easily done by dividing the numerator and denominator of each fraction by the
variable’s coefficient:

(x − 3)/2 = (y − 4)/1 = (z − 1)/3.
Reading off, the line l contains the point (3, 4, 1) and has direction vector (2, 1, 3). So, it
can also be described by the following vector equation:

r = (3, 4, 1) + λ(2, 1, 3) (λ ∈ R).

Example 650. Let l1 be the line described by (−x + 7)/(−5) = (0.5y + 1)/0.3 = z − 2.

Rewrite the above as: (x − 7)/5 = (y + 2)/0.6 = (z − 2)/1.
Reading off, l1 contains the point (7, −2, 2) and has direction vector (5, 0.6, 1). So, it can
also be described by:

r = (7, −2, 2) + λ(5, 0.6, 1) (λ ∈ R).

Example 651. Let l2 be the line described by 5x/2 = (y − 12)/6 = (3z − 15)/9.

Rewrite the above as: (x − 0)/0.4 = (y − 12)/6 = (z − 5)/3.
Reading off, l2 contains the point (0, 12, 5) and has direction vector (0.4, 6, 3). So, it can
also be described by:

r = (0, 12, 5) + λ(0.4, 6, 3) (λ ∈ R).
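
The “reading off” step in the last three examples can be sketched in Python. Here each fraction (a·t + b)/c, for t = x, y, z, is given as a triple (a, b, c). This representation and the function name read_off are our own, and the sketch assumes integer entries and covers only the case where none of the direction vector’s coordinates are zero:

```python
from fractions import Fraction

def read_off(terms):
    """Given (a, b, c) for each fraction (a*t + b)/c, return (point, direction)."""
    point, direction = [], []
    for a, b, c in terms:
        if a == 0 or c == 0:
            raise ValueError("each fraction needs a non-zero coefficient and denominator")
        # (a*t + b)/c = (t - (-b/a))/(c/a): divide top and bottom by the coefficient a.
        point.append(Fraction(-b, a))
        direction.append(Fraction(c, a))
    return tuple(point), tuple(direction)

# Example 649: (3x − 9)/6 = (2y − 8)/2 = (z − 1)/3.
point, direction = read_off([(3, -9, 6), (2, -8, 2), (1, -1, 3)])
print(*point)      # coordinates of a point on the line
print(*direction)  # coordinates of a direction vector
```

The result should match the point and direction vector read off by hand in Example 649.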


Three examples where exactly one of the direction vector’s coordinates is zero:

Example 652. Let l3 be the line described by:

x = 17 (1) and (5y − 12)/100 = (3z − 15)/9 (2).

By (1), every point on l3 has x-coordinate 17. Thus, l3 is perpendicular to the x-axis.

Rewrite (2) as: (y − 2.4)/20 = (z − 5)/3.
Reading off, l3 contains the point (17, 2.4, 5) and has direction vector (0, 20, 3). So, it
can also be described by:

r = (17, 2.4, 5) + λ(0, 20, 3) (λ ∈ R).

Example 653. Let l4 be the line described by:

y = −2 (1) and −x/3 = (z + 10)/(−5) (2).

By (1), every point on l4 has y-coordinate −2. Thus, l4 is perpendicular to the y-axis.

Rewrite (2) as: x/(−3) = (z − (−10))/(−5).
Reading off, l4 contains the point (0, −2, −10) and has direction vector (−3, 0, −5). So, it
can also be described by:

r = (0, −2, −10) + λ(−3, 0, −5) (λ ∈ R).

Example 654. Let l5 be the line described by:

4z = 3 (1) and (7x − 6)/35 = (2y + 10)/18 (2).

By (1), every point on l5 has z-coordinate 3/4. Thus, l5 is perpendicular to the z-axis.

Rewrite (2) as: (x − 6/7)/5 = (y − (−5))/9.
Reading off, l5 contains the point (6/7, −5, 3/4) and has direction vector (5, 9, 0). So, it can
also be described by:

r = (6/7, −5, 3/4) + λ(5, 9, 0) (λ ∈ R).


Three examples where exactly two of the direction vector’s coordinates are zero:

Example 655. Let l6 be the line described by x = 5 and z = 9.

Every point on l6 has x- and z-coordinates 5 and 9. Hence, the direction vector must
have 0 as its x- and z-coordinates. (Equivalently, this line must be perpendicular to both
the x- and z-axes.)
The free variable is y. Altogether then, l6 contains exactly those points (5, y, 9), for all
real numbers y. For example, it contains the points (5, 0, 9) and (5, −100, 9). Hence, for
any non-zero k, (0, k, 0) is a direction vector of l6 .
For simplicity, we pick (0, 1, 0) as our direction vector and describe l6 by:

r = (5, 0, 9) + λ(0, 1, 0) (λ ∈ R).

Example 656. Let l7 be the line described by 3x = −12 and y = 0.

Every point on l7 has x- and y-coordinates −4 and 0. Hence, the direction vector must
have 0 as its x- and y-coordinates. (Equivalently, this line must be perpendicular to both
the x- and y-axes.)
The free variable is z. Altogether then, l7 contains exactly those points (−4, 0, z), for all
real numbers z. For example, it contains the points (−4, 0, 0) and (−4, 0, π). Hence, for
any non-zero k, (0, 0, k) is a direction vector of l7 .
For simplicity, we pick (0, 0, 1) as our direction vector and describe l7 by:

r = (−4, 0, 0) + λ(0, 0, 1) (λ ∈ R).

Example 657. Let l8 be the line described by y = −11 and −4z = 52.
Every point on l8 has y- and z-coordinates −11 and −13. Hence, the direction vector
must have 0 as its y- and z-coordinates. (Equivalently, this line must be perpendicular
to both the y- and z-axes.)
The free variable is x. Altogether then, l8 contains exactly those points (x, −11, −13), for
all real numbers x. For example, it contains the points (0, −11, −13) and (√2, −11, −13).
Hence, for any non-zero k, (k, 0, 0) is a direction vector of l8 .
For simplicity, we pick (1, 0, 0) as our direction vector and describe l8 by:

r = (0, −11, −13) + λ(1, 0, 0) (λ ∈ R).

Exercise 217. Each pair of cartesian equations below describes a line. Rewrite each into
vector form. State if each is perpendicular to any axes. (Answer on p. 1489.)

(a) (7x − 2)/5 = (0.3y − 5)/7 = 8z/7. (d) 3y = 11 and (x − 3)/2 = (5z − 2)/7.
(b) 2x = 3y = 5z. (e) [left-hand side lost in extraction] = 13 and 2z = 1.
(c) 17x − 4 = (3y − 1)/[denominator lost in extraction] = 3z. (f) 13x + 5 = 0 and y = 5z − 2.


48.3. Parallel and Perpendicular Lines
Our definition of when two lines in 3D space are parallel or perpendicular is exactly the
same as before and now reproduced:

Definition 116. Two lines are (a) parallel if they have parallel direction vectors; and
(b) perpendicular if they have perpendicular direction vectors.

Example 658. Suppose two lines are described by:

r = (1, 0, 1) + λ (1, 2, 3) and r = (5, 0, 9) + µ (−2, −4, −6) (λ, µ ∈ R).

Since (1, 2, 3) ∥ (−2, −4, −6), by the above Definition, the two lines are parallel.

[Figure: the parallel lines r = (1, 0, 1) + λ(1, 2, 3) (λ ∈ R) and r = (5, 0, 9) + µ(−2, −4, −6) (µ ∈ R).]

Example 659. Suppose two lines are described by:

r = (5, −1, 4) + λ (8, 2, −1) and r = (3, 1, 6) + µ (1, −2, 4) (λ, µ ∈ R).

We have (8, 2, −1) ⋅ (1, −2, 4) = 8 − 4 − 4 = 0. So, (8, 2, −1) ⊥ (1, −2, 4) and by the above
Definition, the two lines are perpendicular.

Example 660. Suppose two lines are described by:

r = (3, 2, −1) + λ (1, 0, 0) and r = (0, 0, 0) + µ (1, 1, 0) (λ, µ ∈ R).

Since (1, 0, 0) is neither parallel nor perpendicular to (1, 1, 0), the two lines are neither parallel nor perpendicular.


The following Fact remains true in 3D space:

Fact 75. If two lines are:

(a) Identical, then they are also parallel.
(b) Distinct and parallel, then they do not intersect.
(c) Distinct, then they share at most one intersection point.

Example 661. Continuing with Example 658, two lines are described by:

r = (1, 0, 1) + λ (1, 2, 3) and r = (5, 0, 9) + µ (−2, −4, −6) (λ, µ ∈ R).

Since (1, 2, 3) ∥ (−2, −4, −6), the two lines are parallel.
Observe that the point (1, 0, 1) is on the first line (plug in λ = 0), but not on the second. (To see this, observe that the only point on the second line with x-coordinate 1 corresponds to µ = 2, and that point is (1, −8, −3).) Thus, the two lines are distinct.
Since they are parallel and distinct, by Fact 75(b), they do not intersect.

[Figure: the two parallel lines, with the point (1, 0, 1) on the first line.]

Example 662. Suppose two lines are described by:

r = (3, 6, 9) + λ (1, 2, 3) and r = (0, 0, 0) + µ (−2, −4, −6) (λ, µ ∈ R).

Since (1, 2, 3) ∥ (−2, −4, −6), by the above Definition, the two lines are parallel.
Observe that the point (3, 6, 9) is on both lines (plug in λ = 0 and µ = −1.5).
Since the two lines are parallel and do intersect, by Fact 75(b), they cannot be distinct.
Equivalently, they must be identical.
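
Checks like those in Examples 658 to 662 can be sketched in Python. The function names are ours. Note one assumption: to test u ∥ v without division, this sketch checks that ui vj = uj vi for every pair of coordinates, which for non-zero vectors is equivalent to one being a scalar multiple of the other. That test is our choice, not a method given in the text:

```python
def is_parallel(u, v):
    """True if u and v are scalar multiples of each other (or one of them is zero)."""
    return all(u[i] * v[j] == u[j] * v[i] for i in range(3) for j in range(i + 1, 3))

def is_perpendicular(u, v):
    """True if the scalar product of u and v is zero."""
    return sum(a * b for a, b in zip(u, v)) == 0

def same_line(p, u, q, v):
    """True if r = p + λu and r = q + µv describe the same line."""
    pq = tuple(b - a for a, b in zip(p, q))
    # Identical lines have parallel direction vectors, and q lies on the first
    # line, which happens exactly when q - p is zero or parallel to u.
    return is_parallel(u, v) and is_parallel(pq, u)

print(is_parallel((1, 2, 3), (-2, -4, -6)))                       # Example 658
print(is_perpendicular((8, 2, -1), (1, -2, 4)))                   # Example 659
print(same_line((1, 0, 1), (1, 2, 3), (5, 0, 9), (-2, -4, -6)))   # Example 661
print(same_line((3, 6, 9), (1, 2, 3), (0, 0, 0), (-2, -4, -6)))   # Example 662
```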

We will next learn how to determine whether two lines in 3D space intersect and if they
do, how to find their intersection point.


48.4. Intersecting Lines
Fact 75(c) says that two distinct lines share at most one intersection point. Here are two
examples of when two distinct lines in 3D space intersect and how we go about finding their
(only) intersection point.

Example 663. Suppose two lines are described by:

r = (0, 0, 0) + λ(1, −1, −2) and r = (1, 1, 1) + µ(2, 0, −1) (λ, µ ∈ R).

These two lines are not parallel and hence distinct. And so, by Fact 75(c), they share at
most one intersection point.
Suppose they intersect. Then there must be real numbers λ̂ and µ̂ such that:

(0, 0, 0) + λ̂(1, −1, −2) = (1, 1, 1) + µ̂(2, 0, −1), or

λ̂ = 1 + 2µ̂ (1), −λ̂ = 1 (2), −2λ̂ = 1 − µ̂ (3).

From (2), λ̂ = −1. Plug this into (1) to get µ̂ = −1. You can verify that these values of λ̂ and
µ̂ also satisfy (3). Hence, the two lines do indeed intersect.

To find their intersection point, plug λ̂ = −1 or µ̂ = −1 into either line’s vector equation:

(0, 0, 0) + (−1)(1, −1, −2) = (1, 1, 1) + (−1)(2, 0, −1) = (−1, 1, 2).

[Figure: the two lines intersecting at (−1, 1, 2).]

Example 664. Suppose two lines are described by:

r = (1, 2, 3) + λ(1, 1, 1) and r = (0, 1, 2) + µ(2, 1, 4) (λ, µ ∈ R).

These two lines are not parallel and hence distinct. And so, by Fact 75(c), they share at
most one intersection point.
Suppose they intersect. Then there must be real numbers λ̂ and µ̂ such that:

(1, 2, 3) + λ̂(1, 1, 1) = (0, 1, 2) + µ̂(2, 1, 4), or

1 + λ̂ = 2µ̂ (1), 2 + λ̂ = 1 + µ̂ (2), 3 + λ̂ = 2 + 4µ̂ (3).

(1) minus (2) yields −1 = µ̂ − 1 or µ̂ = 0. Plug this into (1) to get λ̂ = −1. You can verify that
these values of λ̂ and µ̂ also satisfy (3). Hence, the two lines intersect.

To find their intersection point, plug λ̂ = −1 or µ̂ = 0 into either line’s vector equation:

(1, 2, 3) + (−1)(1, 1, 1) = (0, 1, 2) + 0(2, 1, 4) = (0, 1, 2).

[Figure: the two lines intersecting at (0, 1, 2).]
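
The procedure in the last two examples (solve two of the three equations for λ̂ and µ̂, then check the third) can be sketched in Python. The function name intersection is ours. Instead of eliminating by hand, this sketch solves the chosen pair of equations by Cramer’s rule, and it assumes integer coordinates and non-parallel lines:

```python
from fractions import Fraction

def intersection(p, u, q, v):
    """Return the point where r = p + λu meets r = q + µv, or None if they don't meet.

    Assumes the two lines are not parallel.
    """
    d = [qi - pi for pi, qi in zip(p, q)]
    for i, j in ((0, 1), (0, 2), (1, 2)):
        # Equations i and j read: λ u[i] − µ v[i] = d[i] and λ u[j] − µ v[j] = d[j].
        det = v[i] * u[j] - u[i] * v[j]
        if det != 0:
            lam = Fraction(v[i] * d[j] - d[i] * v[j], det)
            mu = Fraction(u[i] * d[j] - u[j] * d[i], det)
            first = tuple(pk + lam * uk for pk, uk in zip(p, u))
            second = tuple(qk + mu * vk for qk, vk in zip(q, v))
            # Equations i and j hold by construction; comparing the full points
            # also checks the remaining third equation.
            return first if first == second else None
    raise ValueError("direction vectors are parallel (or zero)")

print(*intersection((0, 0, 0), (1, -1, -2), (1, 1, 1), (2, 0, -1)))  # Example 663
print(*intersection((1, 2, 3), (1, 1, 1), (0, 1, 2), (2, 1, 4)))     # Example 664
```

If the third equation fails, the function returns None, which for non-parallel lines means that the two lines do not intersect.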


48.5. Skew Lines
Recall that in 2D space, two non-parallel lines must intersect. In contrast, in 3D space,
two non-parallel lines need not intersect. We have a special name for such lines:

Definition 139. Two lines are said to be skew if they are not parallel and do not intersect.

Example 665. Suppose two lines are described by:

r = (0, 0, 0) + λ(1, 2, 3) and r = (1, 1, 2) + µ(4, 5, 6) (λ, µ ∈ R).

These two lines are not parallel and hence distinct. And so by Fact 75(c), they share at
most one intersection point.
To check if they intersect, suppose there are real numbers λ̂ and µ̂ such that:

(0, 0, 0) + λ̂(1, 2, 3) = (1, 1, 2) + µ̂(4, 5, 6) (⋆), or

λ̂ = 1 + 4µ̂ (1), 2λ̂ = 1 + 5µ̂ (2), 3λ̂ = 2 + 6µ̂ (3).

Now, 2×(1) minus (2) yields 0 = 1 + 3µ̂ or µ̂ = −1/3. Plug this back into (1) to get λ̂ = −1/3.
But these values of λ̂ and µ̂ contradict (3): the left-hand side is 3λ̂ = −1, while the right-hand side is 2 + 6µ̂ = 0.

This contradiction means that there are no real numbers λ̂ and µ̂ such that (⋆) holds. In
other words, the two lines do not intersect. And since they are not parallel either, by
the above Definition, they are skew.

[Figure: the skew lines r = (0, 0, 0) + λ(1, 2, 3) (λ ∈ R) and r = (1, 1, 2) + µ(4, 5, 6) (µ ∈ R).]

Fact 76.
Example 666. Suppose two lines are described by:

r = (1, 3, 3) + λ(1, −1, −2) and r = (1, 0, 1) + µ(2, 1, 3) (λ, µ ∈ R).

These two lines are not parallel and hence distinct. And so by Fact 75(c), they share at
most one intersection point.
To check if they intersect, suppose there are real numbers λ̂ and µ̂ such that:

(1, 3, 3) + λ̂(1, −1, −2) = (1, 0, 1) + µ̂(2, 1, 3) (⋆), or

1 + λ̂ = 1 + 2µ̂ (1), 3 − λ̂ = µ̂ (2), 3 − 2λ̂ = 1 + 3µ̂ (3).

Now, (1) minus 2×(2) yields 3λ̂ − 5 = 1 or λ̂ = 2. Plug this back into (1) to get µ̂ = 1.
But these values of λ̂ and µ̂ contradict (3): the left-hand side is 3 − 2λ̂ = −1, while the right-hand side is 1 + 3µ̂ = 4.

And so again, the two lines do not intersect. And since they are not parallel either,
they are skew.

[Figure: the skew lines r = (1, 3, 3) + λ(1, −1, −2) (λ ∈ R) and r = (1, 0, 1) + µ(2, 1, 3) (µ ∈ R).]
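
For a quick cross-check of such conclusions, here is a Python sketch that classifies a pair of lines as parallel, intersecting, or skew. It does not follow the elimination method of the examples. Instead it uses a determinant test, which is our own substitution: two non-parallel lines r = p + λu and r = q + µv meet exactly when q − p can be written as λu − µv, that is, when the 3×3 determinant with rows q − p, u, and v is zero. The function names are ours:

```python
def det3(a, b, c):
    """Determinant of the 3x3 matrix with rows a, b, c."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - a[1] * (b[0] * c[2] - b[2] * c[0])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))

def is_parallel(u, v):
    """True if the non-zero vectors u and v are scalar multiples of each other."""
    return all(u[i] * v[j] == u[j] * v[i] for i in range(3) for j in range(i + 1, 3))

def classify(p, u, q, v):
    """Classify the lines r = p + λu and r = q + µv."""
    if is_parallel(u, v):
        return "parallel"  # this includes the case of identical lines
    d = tuple(qi - pi for pi, qi in zip(p, q))
    # Non-parallel lines meet exactly when q - p, u, v are linearly dependent.
    return "intersecting" if det3(d, u, v) == 0 else "skew"

print(classify((0, 0, 0), (1, 2, 3), (1, 1, 2), (4, 5, 6)))     # Example 665
print(classify((1, 3, 3), (1, -1, -2), (1, 0, 1), (2, 1, 3)))   # Example 666
print(classify((0, 0, 0), (1, -1, -2), (1, 1, 1), (2, 0, -1)))  # Example 663
```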


48.6. The Angle Between Two Lines

Example 667. As in 2D space, two intersecting lines in 3D space form two angles α and β = π − α at their intersection point. We define the smaller of these two angles to be the angle between the two lines. So in the figure on the right, the angle between the two lines is α and not β.

[Figure (right): two intersecting lines forming the angles α and β = π − α.]

In the figure on the left, the black and solid red lines do not intersect. But even so, we will still find it useful to talk about the angle between them.
To do so, translate the red line upwards so that it intersects the black line. As usual, two
angles α and β = π − α are formed at the intersection point. We then define the angle
between the black and solid red lines to be the smaller of these two angles, namely α.

[Figure (left): a black line and a solid red line that do not intersect, with a translated copy of the red line meeting the black line at the angles α and β = π − α.]

Our formal definition of the angle between two lines is reproduced from before:

Definition 115. Given two lines, pick for each any direction vector. We call the non-
obtuse angle between these two vectors the angle between the two lines.

And so, we have the same results as before:

Corollary 8. The angle between two lines with direction vectors u and v is:

cos⁻¹(∣u ⋅ v∣/(∣u∣ ∣v∣)).

Corollary 9. Suppose θ is the angle between two lines. (a) If θ = 0, then the two lines
are parallel. And (b) if θ = π/2, then they are perpendicular.


First, two examples where we find the angle between two intersecting lines:

Example 668. Two lines are described by:

r = (3, 3, 3) + λ (1, 2, 3) and r = (3, 3, 3) + µ (4, 5, 6) (λ, µ ∈ R).

Observe that these two lines intersect at (3, 3, 3). (And so, they aren’t skew.)
The angle between these two lines is the non-obtuse angle between their direction vectors and is given by Corollary 8:

cos⁻¹(∣(1, 2, 3) ⋅ (4, 5, 6)∣/(∣(1, 2, 3)∣ ∣(4, 5, 6)∣)) = cos⁻¹(∣4 + 10 + 18∣/(√(1² + 2² + 3²) √(4² + 5² + 6²))) = cos⁻¹(32/(√14 √77)) ≈ 0.226.

[Figure: the lines r = (3, 3, 3) + λ(1, 2, 3) (λ ∈ R) and r = (3, 3, 3) + µ(4, 5, 6) (µ ∈ R), meeting at (3, 3, 3).]
This angle is neither zero nor right. And so by Corollary 9, the two lines are neither
parallel nor perpendicular.

Example 669. Two lines are described by:

r = (0, 0, 0) + λ (1, 0, 1) and r = (6, 1, 3) + µ (5, 1, 2) (λ, µ ∈ R).

Observe that these two lines intersect at (1, 0, 1). (And so, they are not skew.)
Again, the angle between these two lines is given by Corollary 8:

cos⁻¹(∣(1, 0, 1) ⋅ (5, 1, 2)∣/(∣(1, 0, 1)∣ ∣(5, 1, 2)∣)) = cos⁻¹(∣5 + 0 + 2∣/(√(1² + 0² + 1²) √(5² + 1² + 2²))) = cos⁻¹(7/(√2 √30)) ≈ 0.442.

This angle is neither zero nor right. And so by Corollary 9, the two lines are neither
parallel nor perpendicular.

[Figure: the lines r = (0, 0, 0) + λ(1, 0, 1) (λ ∈ R) and r = (6, 1, 3) + µ(5, 1, 2) (µ ∈ R), meeting at (1, 0, 1) at an angle of about 0.442.]
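
Corollary 8 translates directly into a few lines of Python. In this sketch the function name angle_between_lines is ours, the result is in radians, and the cosine is clamped at 1 as a guard against floating-point rounding (a precaution of ours, not part of the Corollary):

```python
import math

def angle_between_lines(u, v):
    """Angle (in radians) between two lines with direction vectors u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Taking the absolute value picks out the non-obtuse angle; the clamp
    # guards against a cosine that rounds to slightly above 1.
    cosine = min(1.0, abs(dot) / (norm_u * norm_v))
    return math.acos(cosine)

print(round(angle_between_lines((1, 2, 3), (4, 5, 6)), 3))  # Example 668
print(round(angle_between_lines((1, 0, 1), (5, 1, 2)), 3))  # Example 669
```

Because only the direction vectors enter the formula, the same function applies whether or not the two lines actually intersect.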


And now, two examples where we find the angle between two non-intersecting lines.

Example 670. Two lines are described by:

r = (1, 2, 2) + λ(1, 2, 1) and r = (0, 0, 0) + µ(0, 3, −2) (λ, µ ∈ R).

If they intersect, then there are real numbers λ̂ and µ̂ such that:

(1, 2, 2) + λ̂(1, 2, 1) = (0, 0, 0) + µ̂(0, 3, −2) (⋆), or

1 + λ̂ = 0 (1), 2 + 2λ̂ = 3µ̂ (2), 2 + λ̂ = −2µ̂ (3).

From (1), λ̂ = −1. Plug this into (2) to get µ̂ = 0. But now, these values of λ̂ and µ̂ contradict
(3). Hence, the two lines do not intersect.

Even though the two lines do not intersect, we will still find it useful to talk about the
angle between them. This we can compute as usual:

cos⁻¹(∣(1, 2, 1) ⋅ (0, 3, −2)∣/(∣(1, 2, 1)∣ ∣(0, 3, −2)∣)) = cos⁻¹(∣0 + 6 − 2∣/(√(1² + 2² + 1²) √(0² + 3² + (−2)²))) = cos⁻¹(4/(√6 √13)) ≈ 1.101.

This angle is neither zero nor right. And so by Corollary 9, the two lines are neither
parallel nor perpendicular.
Since the two lines do not intersect and are not parallel, they are skew.

[Figure: the lines r = (1, 2, 2) + λ(1, 2, 1) (λ ∈ R) and r = (0, 0, 0) + µ(0, 3, −2) (µ ∈ R), with the angle 1.101 marked.]

The two lines do not intersect. Nonetheless, we can always translate one of the two lines
so that they intersect. In the above figure, we’ve translated the black line so that it
intersects the red line at the origin.


Example 671. Two lines are described by:

r = (0, 1, 2) + λ (9, 1, 3) and r = (4, 5, 6) + µ (3, 2, 1) (λ, µ ∈ R).

If they intersect, then there are real numbers λ̂ and µ̂ such that:

(0, 1, 2) + λ̂(9, 1, 3) = (4, 5, 6) + µ̂(3, 2, 1) (⋆), or

9λ̂ = 4 + 3µ̂ (1), 1 + λ̂ = 5 + 2µ̂ (2), 2 + 3λ̂ = 6 + µ̂ (3).

(1) minus 3×(3) yields −6 = −14, which is a contradiction. Hence, the two lines do not
intersect. Nonetheless, we can as usual compute the angle between them:

cos⁻¹(∣(9, 1, 3) ⋅ (3, 2, 1)∣/(∣(9, 1, 3)∣ ∣(3, 2, 1)∣)) = cos⁻¹(∣27 + 2 + 3∣/(√(9² + 1² + 3²) √(3² + 2² + 1²))) = cos⁻¹(32/(√91 √14)) ≈ 0.459.

This angle is neither zero nor right. And so by Corollary 9, the two lines are neither
parallel nor perpendicular. Since they do not intersect either, they are skew.

[Figure: the lines r = (0, 1, 2) + λ(9, 1, 3) (λ ∈ R) and r = (4, 5, 6) + µ(3, 2, 1) (µ ∈ R), with the angle 0.459 marked at the point (0, 1, 2).]

The two lines do not intersect. Nonetheless, we can always translate one of the two
lines so that they intersect. In the above figure, we’ve translated the red line so that it
intersects the black line at the point (0, 1, 2).

Exercise 218. Each of (a)–(d) gives a pair of lines in vector form. Find any inter-
section points and the angle between the two lines. State if the two lines are parallel,
perpendicular, identical, or skew. (Answer on p. 1490.)
(a) r= (0, 1, 1) +λ (1, −1, 1) and r= (1, 3, 3) +µ (0, 0, 2).
(b) r= (−1, 2, 3) +λ (0, 1, 0) and r= (0, 0, 0) +µ (8, −3, 5).
(c) r= (7, 3, 4) +λ (8, 3, 4) and r= (9, 3, 7) +µ (3, −4, −3).
(d) r= (0, 0, 1) +λ (1, 2, 1) and r= (1, 0, 0) +µ (−3, −6, −3).


48.7. Collinearity (in 3D)
Our Definition of collinearity is the same as before:

Definition 120. Two or more points are collinear if some line contains all of them.

Fact 80 is reproduced from before and says that any two points are always collinear:

Fact 80. Suppose A and B are distinct points. Then the unique line that contains both
A and B is described by:
r = OA + λAB (λ ∈ R).

Example 672. Any two points A and B are collinear.

[Figure: the points A and B, the vector AB, and the line r = OA + λAB (λ ∈ R).]

And as before, three distinct points can be collinear but will not generally be:

Example 673. The points A, B, and C are collinear:

[Figure: A, B, and C are collinear. All three lie on the line r = a + λAB (λ ∈ R).]


In contrast, the points D, E, and F are not collinear.

[Figure: D, E, and F are not collinear. The line r = d + λDE (λ ∈ R) does not contain F.]


We’ll use the exact same procedure to check whether three points are collinear:
1. First use Fact 80 to write down the unique line that contains two of the three points.
2. Then check whether this line also contains the third point.
Two examples:


Example 674. Let A = (1, 2, 3), B = (4, 5, 6), and C = (7, 8, 9) be points.
To check if they are collinear:
1. First write down the unique line that contains both A and B:

r = OA + λAB = (1, 2, 3) + λ(3, 3, 3) (λ ∈ R).

2. If this line also contains C, then there exists λ̂ such that:

C = (7, 8, 9) = (1, 2, 3) + λ̂(3, 3, 3),

or equivalently:

7 = 1 + 3λ̂,
8 = 2 + 3λ̂,
9 = 3 + 3λ̂.

As you can verify, λ̂ = 2 solves the above vector equation (or system of three equations).
Thus, our line also contains C.
We conclude that A, B, and C are collinear.

[Figure: the collinear points A = (1, 2, 3), B = (4, 5, 6), and C = (7, 8, 9) on the line r = (1, 2, 3) + λ(3, 3, 3) (λ ∈ R).]


Example 675. Let D = (1, 0, 0), E = (0, 1, 0), and F = (0, 0, 1) be points. To check if
they are collinear:
1. First write down a line that contains both D and E:

r = OD + λDE = (1, 0, 0) + λ(−1, 1, 0) (λ ∈ R).

2. If this line also contains F, then there exists λ̂ such that:

F = (0, 0, 1) = (1, 0, 0) + λ̂(−1, 1, 0),

or equivalently:

0 = 1 − 1λ̂, (1)
0 = 0 + 1λ̂, (2)
1 = 0 + 0λ̂. (3)

From (1), we have λ̂ = 1. But this contradicts (2), which requires λ̂ = 0. (Equation (3)
also fails for every λ̂.) This contradiction means that there is no solution to the above
vector equation (or system of three equations). Thus, the line we wrote down above does
not contain F.
We conclude that D, E, and F are not collinear.

[Figure: the points D = (1, 0, 0), E = (0, 1, 0), and F = (0, 0, 1), and the line r = (1, 0, 0) + λ(−1, 1, 0) (λ ∈ R), which contains D and E but not F.]
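
The two-step procedure above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function name collinear is hypothetical, the coordinates are integers, and A and B are distinct (as Fact 80 requires).

```python
from fractions import Fraction


def collinear(a, b, c):
    """Check whether C lies on the unique line through the distinct points A and B."""
    ab = tuple(y - x for x, y in zip(a, b))   # the vector AB
    ac = tuple(y - x for x, y in zip(a, c))   # the vector AC = OC - OA
    # Step 1: the line is r = OA + lam*AB.
    # Step 2: pick lam from a coordinate where AB is non-zero, then test all three.
    k = next(i for i, t in enumerate(ab) if t != 0)
    lam = Fraction(ac[k], ab[k])
    return all(lam * t == s for t, s in zip(ab, ac))


print(collinear((1, 2, 3), (4, 5, 6), (7, 8, 9)))   # True  (Example 674, lam = 2)
print(collinear((1, 0, 0), (0, 1, 0), (0, 0, 1)))   # False (Example 675)
```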

Exercise 219. Determine if A, B, and C are collinear. (Answer on p. 1491.)

(a) A = (3, 1, 2), B = (1, 6, 5), and C = (0, −1, 0).
(b) A = (1, 2, 4), B = (0, 0, 1), and C = (3, 6, 10).


49. The Vector Product (in 3D)
In 2D space, the vector product was simply a scalar (real number). In contrast, in 3D
space, it is a vector (hence the name).
As we’ll see later, we’ll often have the need to find a vector that’s perpendicular to two
other vectors. It is this need that motivates the concept of the vector product (in 3D).
Let a = (a1 , a2 , a3 ) and b = (b1 , b2 , b3 ) be vectors. Can we find some vector c = (c1 , c2 , c3 )
that’s perpendicular to both a and b?
Well, if c ⊥ a, b, then a ⋅ c = 0 and b ⋅ c = 0. Or:

(a1 , a2 , a3 ) ⋅ (c1 , c2 , c3 ) = a1 c1 + a2 c2 + a3 c3 = 0, (1)

(b1 , b2 , b3 ) ⋅ (c1 , c2 , c3 ) = b1 c1 + b2 c2 + b3 c3 = 0. (2)

Our goal is to find c that solves (1) and (2). Observe that b3 × (1) minus a3 × (2) yields
(the two a3 b3 c3 terms cancel):

0 = a1 b3 c1 + a2 b3 c2 + a3 b3 c3 − a3 b1 c1 − a3 b2 c2 − a3 b3 c3
= c2 (a2 b3 − a3 b2 ) − c1 (a3 b1 − a1 b3 ). (3)

Now, notice that c1 = a2 b3 − a3 b2 (4) and c2 = a3 b1 − a1 b3 (5) solve (3).

Next, get the corresponding value of c3 by plugging (4) and (5) into (1). The two a1 a2 b3
terms cancel, and we then divide through by a3 (assuming a3 ≠ 0):

0 = a1 (a2 b3 − a3 b2 ) + a2 (a3 b1 − a1 b3 ) + a3 c3 = −a1 a3 b2 + a2 a3 b1 + a3 c3 , or c3 = a1 b2 − a2 b1 .

Hence, a vector that solves (1) and (2) (i.e. is perpendicular to both a and b) is:

c = (c1 , c2 , c3 ) = (a2 b3 − a3 b2 , a3 b1 − a1 b3 , a1 b2 − a2 b1 ).

We will simply use the above as our Definition of the vector product:²³⁶

Definition 140. Let a = (a1 , a2 , a3 ) and b = (b1 , b2 , b3 ) be vectors. Then their vector
product, denoted a × b, is the following vector:

a × b = (a2 b3 − a3 b2 , a3 b1 − a1 b3 , a1 b2 − a2 b1 ).

By the way, no need to mug Definition 140, because it’s already on List MF26 (p. 4).

²³⁶ Pedagogical note: In earlier versions of this textbook (i.e. before the revisions of 2018), I started with
the geometric definition of the vector product. I have now decided to go the other way round; that
is, I now take the more standard and formalistic approach of defining the vector product analytically.

Example 676. The vector product of a = (1, 2, 3) and b = (4, 5, 6) is:

a × b = (1, 2, 3) × (4, 5, 6) = (2 ⋅ 6 − 3 ⋅ 5, 3 ⋅ 4 − 1 ⋅ 6, 1 ⋅ 5 − 2 ⋅ 4) = (−3, 6, −3).

From our above discussion, we already know that a × b ⊥ a, b. Indeed, this was the
geometric property that motivated our definition of the vector product. Nonetheless, as
an exercise, let’s go ahead and verify that (a × b) ⋅ a = 0 and (a × b) ⋅ b = 0:

(a × b) ⋅ a = (−3, 6, −3) ⋅ (1, 2, 3) = −3 + 12 − 9 = 0, ✓

(a × b) ⋅ b = (−3, 6, −3) ⋅ (4, 5, 6) = −12 + 30 − 18 = 0. ✓

[Figure: the vectors a = (1, 2, 3), b = (4, 5, 6), and a × b = (−3, 6, −3).]

Example 677. The vector product of u = (1, 0, −1) and v = (3, −1, 0) is:

u × v = (1, 0, −1) × (3, −1, 0) = (0 ⋅ 0 − (−1) ⋅ (−1), −1 ⋅ 3 − 1 ⋅ 0, 1 ⋅ (−1) − 0 ⋅ 3) = (−1, −3, −1).

Again, let’s verify that u × v ⊥ u, v:

(u × v) ⋅ u = (−1, −3, −1) ⋅ (1, 0, −1) = −1 + 0 + 1 = 0, ✓

(u × v) ⋅ v = (−1, −3, −1) ⋅ (3, −1, 0) = −3 + 3 + 0 = 0. ✓

[Figure: the vectors u = (1, 0, −1), v = (3, −1, 0), and u × v = (−1, −3, −1).]
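
Definition 140 translates directly into code. The following is a minimal Python sketch (the function names cross and dot are hypothetical) that recomputes Examples 676 and 677 and checks perpendicularity via the scalar product.

```python
def cross(a, b):
    """Vector product of two 3D vectors, written exactly as in Definition 140."""
    a1, a2, a3 = a
    b1, b2, b3 = b
    return (a2 * b3 - a3 * b2, a3 * b1 - a1 * b3, a1 * b2 - a2 * b1)


def dot(u, v):
    """Scalar product of two vectors."""
    return sum(x * y for x, y in zip(u, v))


for a, b in [((1, 2, 3), (4, 5, 6)), ((1, 0, -1), (3, -1, 0))]:
    n = cross(a, b)
    # Both scalar products should be 0, since a × b ⊥ a, b.
    print(n, dot(n, a), dot(n, b))
# (-3, 6, -3) 0 0
# (-1, -3, -1) 0 0
```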


Here’s the formal statement of the vector product’s key geometric property:

Fact 93. Suppose a and b are vectors with a × b ≠ 0. Then a × b ⊥ a, b.

Proof. See Exercise 220(c).

Exercise 220. Let u = (0, 1, 2), v = (3, 4, 5), w = (−1, −2, −3), and x = (1, 0, 5).
(a) Find u × v and verify that u × v ⊥ u, v.
(b) Find w × x and verify that w × x ⊥ w, x. (Answer on p. 1492.)
(c) Let a = (a1 , a2 , a3 ) and b = (b1 , b2 , b3 ). Prove that (a × b) ⋅ a = 0 and (a × b) ⋅ b = 0.

All of our results about the vector product in 2D space continue to hold in 3D space and
are now reproduced. First, it remains true that the vector product is distributive and
anti-commutative. Moreover, the vector product of a vector with itself is zero:

Fact 81. Let a, b, and c be vectors. Then:

(a) a × (b + c) = a × b + a × c. (Distributivity over addition)
(b) a × b = −b × a. (Anti-commutativity)
(c) a × a = 0. (Self vector product equals zero)

Proof. See Exercise 221(a), (b), and (c).

Next, we have the following “obvious” property reproduced from before:

Fact 83. Suppose a and b are vectors and c ∈ R is a scalar. Then:

(ca) × b = c (a × b).

Proof. See Exercise 221(d).

Exercise 221. Suppose a = (a1 , a2 , a3 ), b = (b1 , b2 , b3 ), c = (c1 , c2 , c3 ), and d ∈ R.

(a) Prove that a × (b + c) = a × b + a × c.
(b) First verify that (4, 5, 6)×(1, 2, 3) = − (1, 2, 3)×(4, 5, 6). Then prove that a×b = −b×a.
(c) Prove that a × a = 0.
(d) Let d ∈ R. Prove that (da) × b = d (a × b). (Answer on p. 1493.)
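
Before attempting the proofs in Exercise 221, it can be reassuring to spot-check Facts 81 and 83 on specific numbers. The Python sketch below does only that; a numerical check on one set of vectors is of course not a proof. The names cross, add, scale, and check_facts are hypothetical.

```python
def cross(a, b):
    """Vector product as in Definition 140."""
    a1, a2, a3 = a
    b1, b2, b3 = b
    return (a2 * b3 - a3 * b2, a3 * b1 - a1 * b3, a1 * b2 - a2 * b1)


def add(a, b):
    return tuple(x + y for x, y in zip(a, b))


def scale(k, a):
    return tuple(k * x for x in a)


def check_facts(a, b, c, k):
    """Spot-check Fact 81(a)-(c) and Fact 83 for the given vectors and scalar."""
    return {
        "distributive": cross(a, add(b, c)) == add(cross(a, b), cross(a, c)),
        "anti-commutative": cross(a, b) == scale(-1, cross(b, a)),
        "self product is zero": cross(a, a) == (0, 0, 0),
        "scalar factors out": cross(scale(k, a), b) == scale(k, cross(a, b)),
    }


print(check_facts((1, 2, 3), (4, 5, 6), (-1, 3, -5), 7))
```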

Also, from Fact 81(c), we again have the following result. The proof is exactly the same as
before and is simply reproduced:

Corollary 10. If a ∥ b, then a × b = 0.

Proof. If a ∥ b, then there exists c ≠ 0 such that b = ca. Thus:

a × b = a × (ca) = c (a × a) = c ⋅ 0 = 0.


And again, we have the converse of Corollary 10:

Fact 82. Let a and b be non-zero vectors. If a × b = 0, then a ∥ b.

Proof. See p. 1297 in the Appendices.

So again, together, Corollary 10 and Fact 82 yield:

Corollary 11. Suppose a and b are non-zero vectors. Then:

a×b=0 ⇐⇒ a ∥ b.

Example 678. Let s = (1, 2, 3) and t = (2, 4, 6) be vectors. Since s ∥ t, by Corollary 11,
we must have s × t = 0. We can easily verify that this is so:

s × t = (1, 2, 3) × (2, 4, 6) = (2 ⋅ 6 − 3 ⋅ 4, 3 ⋅ 2 − 1 ⋅ 6, 1 ⋅ 4 − 2 ⋅ 2) = (0, 0, 0).

Example 679. The vector product of c = (−1, 3, −5) and d = (2, −4, 6) is:

c × d = (−1, 3, −5) × (2, −4, 6) = (−2, −4, −2).

You can verify that c × d ⊥ c, d.

Note that c and d nearly, but do not exactly, point in opposite directions. If they pointed
in exactly opposite directions (and were thus parallel), then by Corollary 11, their vector
product would have to be the zero vector, i.e. c × d = 0, which isn’t the case here.

[Figure: the vectors c = (−1, 3, −5), d = (2, −4, 6), and c × d = (−2, −4, −2).]
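
Corollary 11 gives a simple computational test for parallelism. Here is a minimal Python sketch of that test; the name is_parallel is hypothetical, and the function assumes (as the Corollary does) that both vectors are non-zero.

```python
def cross(a, b):
    """Vector product as in Definition 140."""
    a1, a2, a3 = a
    b1, b2, b3 = b
    return (a2 * b3 - a3 * b2, a3 * b1 - a1 * b3, a1 * b2 - a2 * b1)


def is_parallel(a, b):
    """For non-zero vectors a and b: a ∥ b exactly when a × b is the zero vector."""
    return cross(a, b) == (0, 0, 0)


print(is_parallel((1, 2, 3), (2, 4, 6)))      # True  (Example 678)
print(is_parallel((-1, 3, -5), (2, -4, 6)))   # False (Example 679)
```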

Let a and b be non-parallel vectors. Then a vector is parallel to a × b if and only if it is
perpendicular to both a and b:

Fact 94. Suppose a, b, and c are vectors, with a ∦ b. Then:

c ∥ a × b ⇐⇒ c ⊥ a, b.

Proof. See p. 1297 in the Appendices.

49.1. The Right-Hand Rule
Given two non-parallel vectors a and b, there are exactly two (unit) vectors that are
perpendicular to both a and b. One is (the unit vector of) our vector product a × b. The
other is (the unit vector of) the vector that points in the exact opposite direction; this,
of course, is simply −a × b.

The vector product a × b is defined so that it satisfies the right-hand rule. To see why,
have the palm of your right hand face you. Fold your ring and pinky fingers. Have your
thumb point right, your index finger up, and your middle finger towards your face. Then
these three fingers correspond to the vectors a, b, and a × b. (Try it!)

[Figure: the vectors a, b, a × b, and −a × b.]

In contrast, −a × b, the other vector that’s perpendicular to a and b, satisfies the left-hand
rule. (Can you explain why?)

Remark 64. We have:

a × b = (a2 b3 − a3 b2 , a3 b1 − a1 b3 , a1 b2 − a2 b1 ) and
−a × b = (a3 b2 − a2 b3 , a1 b3 − a3 b1 , a2 b1 − a1 b2 ).

Why is it that one of these two arbitrary-looking vectors satisfies the right-hand rule,
while the other satisfies the left-hand rule? That this is so is not at all obvious and is
beyond the scope of this textbook.²³⁷

Fun Fact

Why do we use the right-hand rule rather than the left-hand rule? One possible
explanation might be that the right-handed majority is, as usual, being tyrannical.
But more likely, this is simply an arbitrary convention, not unlike how most of the
world drives on the right, while a minority drives on the left.²³⁸
Indeed, according to one writer:

Until 1965, the Soviet Union used the left-hand rule, logically reasoning that
the left-hand rule is more convenient because a right-handed person can
simultaneously write while performing cross products.

²³⁷ The short answer is that (i) we earlier adopted the convention that our coordinate system obeys the
right-hand rule; and (ii) (a, b, a × b) is positively oriented with respect to (i, j, k) (what exactly
positively oriented means is the bit that’s beyond the scope of this textbook). Had we instead adopted
the convention that our coordinate system obeys the left-hand rule, then as currently defined, our vector
product a × b would also obey the left-hand rule.
²³⁸ This left-driving minority includes Japan, the UK, and former British colonies like Singapore and
49.2. The Length of the Vector Product
As before, the vector product a × b has length ∣a∣ ∣b∣ sin θ. Formally:

Fact 84. Let θ be the angle between the vectors a and b. Then:

∣a × b∣ = ∣a∣ ∣b∣ sin θ.

Proof. Exercise 222 guides you through a proof of this Fact.

Exercise 222. Let θ be the angle between the vectors a = (a1 , a2 , a3 ) and b = (b1 , b2 , b3 ).
(a) Express ∣a∣, ∣b∣, ∣a × b∣, and cos θ in terms of a1 , a2 , a3 , b1 , b2 , and b3 . (You need not
expand the squared terms.)
(b) Since θ ∈ [0, π], what can you say about the sign of sin θ? (That is, is sin θ positive,
negative, non-positive, or non-negative?)
(c) Now use a trigonometric identity to express sin θ in terms of cos θ. (Hint: You should
find that there are two possibilities. Use what you found in (b) to explain why you can
discard one of these possibilities.)
(d) Plug the expression you wrote down for cos θ in (a) into what you found in (c).
(e) Prove the following algebraic identity.²³⁹ (Hint: Fully expand each of LHS and RHS.
Then conclude that LHS = RHS.)

(a1² + a2² + a3²) (b1² + b2² + b3²) − (a1 b1 + a2 b2 + a3 b3 )²

= (a2 b3 − a3 b2 )² + (a3 b1 − a1 b3 )² + (a1 b2 − a2 b1 )².

(f) Use (a) and (d) to express ∣a∣ ∣b∣ sin θ in terms of a1 , a2 , a3 , b1 , b2 , and b3 . Then use
(e) to prove that:

∣a × b∣ = ∣a∣ ∣b∣ sin θ. (Answer on p. 1494.)
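
As a sanity check (not a proof), Fact 84 and the identity in part (e) can be tested numerically. The Python sketch below uses the vectors from Example 676; the helper names are hypothetical.

```python
import math


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def cross(a, b):
    a1, a2, a3 = a
    b1, b2, b3 = b
    return (a2 * b3 - a3 * b2, a3 * b1 - a1 * b3, a1 * b2 - a2 * b1)


def norm(u):
    return math.sqrt(dot(u, u))


a, b = (1, 2, 3), (4, 5, 6)
n = cross(a, b)

# The identity in part (e), checked exactly in integers: 14 * 77 - 32**2 = 54.
identity_holds = dot(a, a) * dot(b, b) - dot(a, b) ** 2 == dot(n, n)

# Fact 84, checked in floating point.
theta = math.acos(dot(a, b) / (norm(a) * norm(b)))
lengths_agree = math.isclose(norm(n), norm(a) * norm(b) * math.sin(theta), rel_tol=1e-9)

print(identity_holds, lengths_agree)   # True True
```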

²³⁹ By the way, this is again simply an instance of Lagrange’s Identity.

49.3. The Length of the Rejection Vector
In 2D space, the vector product was a scalar (real number). In contrast, in 3D space, it is
a vector (hence the name).
Nonetheless and perhaps surprisingly, Fact 85 — which says the rejection vector’s length
is given by the vector product — remains true and is now reproduced:

Fact 85. Let a and b be vectors. Then:

∣rejb a∣ = ∣a × b̂∣ .

Example 680. The points A = (1, 5, −2), B = (2, 3, 1), and C = (2, 7, −1) form a right
triangle. Compute: AB = (1, −2, 3), AC = (1, 2, 1), and BC = (0, 4, −2).
The lengths of the line segments AB and AC are simply:

∣AB∣ = √(1² + (−2)² + 3²) = √14 and ∣AC∣ = √(1² + 2² + 1²) = √6.

As an exercise, let’s verify that Fact 85 “works”. Observe that:

AB = −rej_AC BC and AC = rej_AB BC.

And so, by Fact 85:

∣AB∣ = ∣BC × (AC/∣AC∣)∣ = ∣BC × AC∣ / ∣AC∣ = ∣(8, −2, −4)∣ / ∣(1, 2, 1)∣
= √(8² + (−2)² + (−4)²) / √(1² + 2² + 1²) = √84 / √6 = √14. ✓

∣AC∣ = ∣BC × (AB/∣AB∣)∣ = ∣BC × AB∣ / ∣AB∣ = ∣(8, −2, −4)∣ / ∣(1, −2, 3)∣
= √(8² + (−2)² + (−4)²) / √(1² + (−2)² + 3²) = √84 / √14 = √6. ✓

[Figure: the right triangle with vertices A = (1, 5, −2), B = (2, 3, 1), and C = (2, 7, −1), together with the vectors AB = (1, −2, 3), AC = (1, 2, 1), and BC = (0, 4, −2).]
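
The verification in Example 680 can be repeated in code by computing the rejection vector directly and comparing squared lengths. This is only a sketch: the helper names are hypothetical, and squared lengths are compared so that everything stays in exact fractions.

```python
from fractions import Fraction


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def sub(p, q):
    """The vector p - q."""
    return tuple(x - y for x, y in zip(p, q))


def cross(a, b):
    a1, a2, a3 = a
    b1, b2, b3 = b
    return (a2 * b3 - a3 * b2, a3 * b1 - a1 * b3, a1 * b2 - a2 * b1)


def rejection(a, b):
    """rej_b a = a - proj_b a (assumes b is non-zero)."""
    t = Fraction(dot(a, b), dot(b, b))
    return tuple(x - t * y for x, y in zip(a, b))


A, B, C = (1, 5, -2), (2, 3, 1), (2, 7, -1)
AB, AC, BC = sub(B, A), sub(C, A), sub(C, B)

rej = rejection(BC, AC)                   # should equal -AB
lhs = dot(rej, rej)                       # |rej_AC BC|^2
n = cross(BC, AC)
rhs = Fraction(dot(n, n), dot(AC, AC))    # |BC x AC|^2 / |AC|^2

print(lhs == rhs == 14)   # True, i.e. both lengths equal sqrt(14)
```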


50. The Distance Between a Point and a Line (in 3D)
Our Definitions and results concerning the foot of the perpendicular and the distance
between a point and a line are exactly the same as before. We now reproduce them
verbatim from Ch. 43:

Definition 122. Let A be a point that isn’t on the line l. The foot of the perpendicular
from A to l is the point B on l such that AB ⊥ l.

[Figure: the point A, the line l, and the foot B of the perpendicular from A to l.]

Again, we can make use of the projection vector to find the foot of the perpendicular:
Fact 86. Suppose l is the line described by r = OP + λv (λ ∈ R) and A is a point that
isn’t on l. Then the unique foot of the perpendicular from A to l is the following point:
P + proj_v PA.

Again, the distance between a point and a line is the minimum distance between them:

Definition 123. Let A be a point and l be a line. Suppose B is the point on l that’s
closest to A. Then the distance between A and l is ∣AB∣.

Again, the foot of the perpendicular is also the closest point:

Fact 87. If B is the foot of the perpendicular from a point A to a line l, then B is also
the point on l that’s closest to A.

Corollary 13. Suppose l is a line, A is a point, and B is the foot of the perpendicular
from A to l. Then the distance between A and l is ∣AB∣.

Again, we can use what we learnt about the rejection vector to find the distance between
a point and a line:
Corollary 14. Suppose A is a point, l is the line described by r = OP + λv (λ ∈ R), and
d is the distance between A and l. Then:
d = ∣PA × v̂∣.

We’ll use the exact same three methods as before to find the foot of the perpendicular
and the distance between a point and a line (in 3D space). Here are two examples:

Example 681. Let A = (1, 2, 3) be a point and l be the line described by:

r = OP + λv = (0, 1, 2) + λ(9, 1, 3) (λ ∈ R).

Method 1 (Formula Method). First, PA = (1, 2, 3) − (0, 1, 2) = (1, 1, 1). So:

PB = proj_v PA = proj_(9,1,3) (1, 1, 1) = [(1, 1, 1) ⋅ (9, 1, 3) / (9² + 1² + 3²)] (9, 1, 3)
= [(9 + 1 + 3)/91] (9, 1, 3) = (13/91)(9, 1, 3) = (1/7)(9, 1, 3).

And so by Fact 86, the foot of the perpendicular from A to l is:

B = P + proj_v PA = (0, 1, 2) + (1/7)(9, 1, 3) = (1/7)(9, 8, 17).

[Figure: the points A = (1, 2, 3) and P = (0, 1, 2), and the vector (9, 1, 3).]

By Corollary 14, the distance between A and l is:

∣BA∣ = ∣PA × v̂∣ = ∣(1, 1, 1) × (9, 1, 3)/√(9² + 1² + 3²)∣ = ∣(1 ⋅ 3 − 1 ⋅ 1, 1 ⋅ 9 − 1 ⋅ 3, 1 ⋅ 1 − 1 ⋅ 9)/√91∣
= ∣(2, 6, −8)/√91∣ = 2 ∣(1, 3, −4)/√91∣ = 2√(1² + 3² + (−4)²)/√91 = 2√26/√91 = 2√(26/91) = 2√(2/7).

Method 2 (Perpendicular Method). Let B = (0, 1, 2) + λ̃ (9, 1, 3). Write down AB:
AB = B − A = (0, 1, 2) + λ̃ (9, 1, 3) − (1, 2, 3) = (9λ̃ − 1, λ̃ − 1, 3λ̃ − 1).

Since AB ⊥ l, we have AB ⊥ v or:

0 = AB ⋅ (9, 1, 3) = (9λ̃ − 1, λ̃ − 1, 3λ̃ − 1) ⋅ (9, 1, 3) = 9 (9λ̃ − 1) + (λ̃ − 1) + 3 (3λ̃ − 1) = 91λ̃ − 13.

Rearranging, λ̃ = 13/91 = 1/7 and so:

B = (0, 1, 2) + (1/7)(9, 1, 3) = (1/7)(9, 8, 17).

Happily, this is the same as what we found in Method 1. And now:

AB = B − A = (1/7)(9, 8, 17) − (1, 2, 3) = (1/7)(2, −6, −4) = (2/7)(1, −3, −2).

Thus, the distance between A and l is:

∣AB∣ = ∣(2/7)(1, −3, −2)∣ = (2/7)√(1² + (−3)² + (−2)²) = (2/7)√14 = 2√(2/7).

Method 3 (or the Calculus Method). Let R be a generic point on l, so that AR =
(9λ − 1, λ − 1, 3λ − 1) and the distance between A and R is:

∣AR∣ = √((9λ − 1)² + (λ − 1)² + (3λ − 1)²) = √(91λ² − 26λ + 3).

Again, first differentiate the expression 91λ² − 26λ + 3 with respect to λ:

d/dλ (91λ² − 26λ + 3) = 182λ − 26.

Then by the First Order Condition (FOC), we have:

(182λ − 26) ∣λ=λ̃ = 0 or λ̃ = 26/182 = 1/7.

Happily, this is the same as what we found in Method 2. And now, as before, we can find
B and ∣AB∣. Alternatively, we could simply have found λ̃ by using “−b/2a”:

λ̃ = “−b/2a” = −(−26)/(2 ⋅ 91) = 1/7.
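
The Formula Method lends itself to a short computation. The Python sketch below follows Fact 86 and returns the foot of the perpendicular together with the squared distance (kept squared so that the arithmetic stays exact). The name foot_and_distance_sq is hypothetical, and integer coordinates are assumed.

```python
from fractions import Fraction


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def foot_and_distance_sq(a, p, v):
    """Foot B of the perpendicular from A to the line r = OP + lam*v, and |AB|^2."""
    pa = tuple(x - y for x, y in zip(a, p))         # the vector PA
    t = Fraction(dot(pa, v), dot(v, v))             # proj_v PA = t * v
    b = tuple(pi + t * vi for pi, vi in zip(p, v))  # B = P + proj_v PA (Fact 86)
    ab = tuple(x - y for x, y in zip(b, a))
    return b, dot(ab, ab)


b, d_sq = foot_and_distance_sq((1, 2, 3), (0, 1, 2), (9, 1, 3))
print(b == (Fraction(9, 7), Fraction(8, 7), Fraction(17, 7)))   # True
print(d_sq == Fraction(8, 7))                                   # True: (2*sqrt(2/7))^2 = 8/7
```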


Example 682. Let A = (−1, 0, 1) be a point and l be the line described by:

r = OP + λv = (3, 2, 1) + λ(5, 1, 2) (λ ∈ R).

Method 1 (Formula Method). First, PA = (−1, 0, 1) − (3, 2, 1) = (−4, −2, 0). So:

PB = proj_v PA = proj_(5,1,2) (−4, −2, 0) = [(−4, −2, 0) ⋅ (5, 1, 2) / (5² + 1² + 2²)] (5, 1, 2)
= [(−20 − 2 + 0)/30] (5, 1, 2) = −(22/30)(5, 1, 2) = −(11/15)(5, 1, 2).

And so by Fact 86, the foot of the perpendicular from A to l is:

B = P + proj_v PA = (3, 2, 1) − (11/15)(5, 1, 2) = (1/15)(−10, 19, −7).

[Figure: the points A = (−1, 0, 1), P = (3, 2, 1), and B, and the vector (5, 1, 2).]

By Corollary 14, the distance between A and l is:

∣BA∣ = ∣PA × v̂∣ = ∣(−4, −2, 0) × (5, 1, 2)/√(5² + 1² + 2²)∣ = ∣(−4, 8, 6)/√30∣ = √(58/15).

Method 2 (Perpendicular Method). Let B = (3, 2, 1) + λ̃ (5, 1, 2). Write down AB:
AB = B − A = (3, 2, 1) + λ̃ (5, 1, 2) − (−1, 0, 1) = (5λ̃ + 4, λ̃ + 2, 2λ̃).

Since AB ⊥ l, we have AB ⊥ v or:

0 = (5λ̃ + 4, λ̃ + 2, 2λ̃) ⋅ (5, 1, 2) = 5 (5λ̃ + 4) + (λ̃ + 2) + 2 (2λ̃) = 30λ̃ + 22.

Rearranging, λ̃ = −22/30 = −11/15 and so:

B = (3, 2, 1) − (11/15)(5, 1, 2) = (1/15)(−10, 19, −7).

Happily, this is the same as what we found in Method 1. And now:

AB = B − A = (1/15)(−10, 19, −7) − (−1, 0, 1) = (1/15)(5, 19, −22).

Thus, the distance between A and l is:

∣AB∣ = ∣(1/15)(5, 19, −22)∣ = (1/15)√(5² + 19² + (−22)²) = √870/15 = √(58/15).

Method 3 (or the Calculus Method). Let R be a generic point on l, so that AR =
(5λ + 4, λ + 2, 2λ) and the distance between A and R is:

∣AR∣ = √((5λ + 4)² + (λ + 2)² + (2λ)²) = √(30λ² + 44λ + 20).

Again, first differentiate the expression 30λ² + 44λ + 20 with respect to λ:

d/dλ (30λ² + 44λ + 20) = 60λ + 44.

Then by the First Order Condition (FOC), we have:

(60λ + 44) ∣λ=λ̃ = 0 or λ̃ = −44/60 = −11/15.

Happily, this is the same as what we found in Method 2. And now, as before, we can find
B and ∣AB∣. Alternatively, we could simply have found λ̃ by using “−b/2a”:

λ̃ = “−b/2a” = −44/(2 ⋅ 30) = −11/15.
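
The Calculus Method amounts to minimising a quadratic in λ. Writing AR = (P − A) + λv, the squared distance is aλ² + bλ + c with a = v ⋅ v, b = 2 (P − A) ⋅ v, and c = ∣P − A∣². The Python sketch below builds these coefficients and applies “−b/2a”. The function name is hypothetical, and integer coordinates are assumed.

```python
from fractions import Fraction


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def minimise_squared_distance(a_pt, p, v):
    """Coefficients of |AR|^2 = a*lam^2 + b*lam + c, its minimiser, and its minimum."""
    w = tuple(x - y for x, y in zip(p, a_pt))   # P - A, so that AR = w + lam*v
    a, b, c = dot(v, v), 2 * dot(w, v), dot(w, w)
    lam = Fraction(-b, 2 * a)                   # "-b/2a"
    minimum = c - Fraction(b * b, 4 * a)        # value of the quadratic at lam
    return (a, b, c), lam, minimum


coeffs, lam, d_sq = minimise_squared_distance((-1, 0, 1), (3, 2, 1), (5, 1, 2))
print(coeffs)                       # (30, 44, 20), matching 30λ² + 44λ + 20
print(lam == Fraction(-11, 15))     # True
print(d_sq == Fraction(58, 15))     # True, so the distance is sqrt(58/15)
```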

Exercise 223. For each of the following, use all three methods you just learnt to find
the foot of the perpendicular from A to l; and the distance between A and l.
The point A The line l Answer on p.
(a) (7, 3, 4) r = (8, 3, 4) + λ (9, 3, 7) 1495.
(b) (8, 0, 2) Contains the points (4, 4, 3) and (6, 11, 5) 1496.
(c) (8, 5, 9) r = (8, 4, 5) + λ (5, 6, 0) 1497.


51. Planes: Introduction
The remainder of Part III (Vectors) will be devoted to the study of planes.
Informally, a plane is a “flat 2D surface” (not unlike a piece of paper). A bit more formally,
a plane is, like a line, simply a set of points.
Let’s start by taking a quick look at some examples of planes.

Example 683. Consider the plane q described by the cartesian equation x = 1.

It contains exactly those points whose x-coordinate is 1. So, it contains A = (1, 0, 0),
B = (1, 1, 1), C = (1, 3, 1), and every other point with x-coordinate 1.

In contrast, it does not contain (2, 0, 0), (π, 1, 1), (√2, 3, 1), or any other point whose
x-coordinate isn’t 1.

[Figure: the plane q described by x = 1, with the points A = (1, 0, 0), B = (1, 1, 1), and C = (1, 3, 1).]

Formally, the plane q is a set of points:

q = {(x, y, z) ∶ x = 1} .

In words, q is the set containing exactly those points (x, y, z) whose x-coordinate is 1.
You should take a moment to convince yourself that the plane q, which is the set of points
whose x-coordinates are 1, does indeed form a “flat 2D surface”.


Example 684. Three planes are described by the following cartesian equations:

x = 1, x = 3, and x = 5.

Later on, we will learn what it means for two planes to be parallel and how to calculate
the distance between two planes. But for now, we merely assert that “obviously”:
• The three planes are parallel.
• The distance between the first and second planes is 2.
• The distance between the second and third planes is also 2.

[Figure: the three parallel planes x = 1, x = 3, and x = 5.]


Exercise 224. The planes q1 and q2 are described by y = 2 and z = 3.

(a) Sketch the graphs of both planes in a single figure.

Then find two points that are on:

(b) q1 but not q2 ;
(c) q2 but not q1 ; and
(d) Both q1 and q2 . (Answer on p. 1498.)


More examples:

Example 685. Consider the plane q described by the cartesian equation y = 2x.
It is the set of points (x, y, z) that satisfies the equation y = 2x. Formally:

q = {(x, y, z) ∶ y = 2x}.

So for example, it contains:

• The origin O = (0, 0, 0) because 0 = 2 ⋅ 0;
• The point A = (1, 2, 3) because 2 = 2 ⋅ 1; and
• The point B = (−1, −2, 0) because −2 = 2 ⋅ (−1).
In contrast, it does not contain:
• The point C = (1, 0, 3) because 0 ≠ 2 ⋅ 1; or
• The point D = (−1, 2, 0) because 2 ≠ 2 ⋅ (−1).

[Figure: the plane y = 2x, the lines y = 2x, z = 0 and y = 2x, z = 3, and the points A = (1, 2, 3), B = (−1, −2, 0), C = (1, 0, 3), and D = (−1, 2, 0).]

Also, the plane q contains the lines y = 2x, z = 0 and y = 2x, z = 3.

Indeed, for every k ∈ R, the line y = 2x, z = k is contained in the plane q.²⁴⁰

²⁴⁰ Here’s a proof of this assertion. Consider the line y = 2x, z = k. Let P be any point on the line. Observe
that P obviously satisfies the plane’s equation y = 2x. Thus, P ∈ q. We have just shown that any
arbitrary point P on the line is also on q. Therefore, q contains the line.

Example 686. Consider the plane q described by the cartesian equation x + y = z.
It is the set of points (x, y, z) that satisfies the equation x + y = z. Formally:

q = {(x, y, z) ∶ x + y = z}.

So for example, it contains:

• The origin O = (0, 0, 0) because 0+0 = 0;
• The point A = (1, 2, 3) because 1+2 = 3; and
• The point B = (−1, 1, 0) because −1 + 1 = 0.
In contrast, it does not contain:
• The point C = (1, 0, 3) because 1+0 ≠ 3; or
• The point D = (−1, 2, 0) because −1 + 2 ≠ 0.

[Figure: the plane x + y = z, the origin O, and the points A = (1, 2, 3), B = (−1, 1, 0), C = (1, 0, 3), and D = (−1, 2, 0).]

Note that as depicted, D is “behind” the plane.

As the above examples suggest, it turns out that in general, any plane q is simply the
graph of the following cartesian equation:

ax + by + cz = d,

where a, b, c, d ∈ R (and at least one of a, b, or c is non-zero).

In other words, the plane q is the set of points (x, y, z) that satisfy ax + by + cz = d.
Formally, we’d write the plane q as a set of points:

q = {(x, y, z) ∶ ax + by + cz = d} .

In the coming chapters, we will explain why a plane may be described by the above cartesian
equation. We will also learn what the vector (a, b, c) and the number d mean geometrically.
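
Since a plane is just the set of points satisfying ax + by + cz = d, testing whether a point is on a plane is a one-line computation. Here is a minimal Python sketch using the planes of Examples 685 and 686; the function name on_plane is hypothetical, and each equation is first rearranged into the form ax + by + cz = d.

```python
def on_plane(point, coeffs, d):
    """True if (x, y, z) satisfies a*x + b*y + c*z = d, where coeffs = (a, b, c)."""
    return sum(k * t for k, t in zip(coeffs, point)) == d


# Example 685: y = 2x, rearranged as 2x - y + 0z = 0.
print([on_plane(p, (2, -1, 0), 0) for p in [(0, 0, 0), (1, 2, 3), (-1, -2, 0)]])
# [True, True, True]
print([on_plane(p, (2, -1, 0), 0) for p in [(1, 0, 3), (-1, 2, 0)]])
# [False, False]

# Example 686: x + y = z, rearranged as x + y - z = 0.
print([on_plane(p, (1, 1, -1), 0) for p in [(0, 0, 0), (1, 2, 3), (-1, 1, 0)]])
# [True, True, True]
```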


51.1. The Analogy Between a Plane, a Line, and a Point
In 3D space:

A plane is a two-dimensional object and can be described by one equation.
A line is a one-dimensional object and can be described by two equations.
A point is a zero-dimensional object and can be described by three equations.

Example 687. We are given 3D space, the set of all points.

We first impose the equation or constraint x = 4. That is, we keep only those points (in
3D space) that satisfy the equation x = 4 and “throw away” all other points. This leaves
us with the plane q. We say that the plane q is described by one equation:

x = 4.

“3 − 1 = 2”: By imposing 1 constraint on 3D space, we end up with a 2D object.

[Figure: the plane q: x = 4; the line l: x = 4, y = 5; and the point P = (4, 5, 6): x = 4, y = 5, z = 6.]

Now suppose we also impose the constraint y = 5. That is, we take the plane q, but keep
only those points on q that satisfy the equation y = 5 (and “throw away” all other points).
This gives us the line l. We say that the line l is described by two equations:

x=4 and y = 5.

“3 − 2 = 1”: By imposing 2 constraints on 3D space, we end up with a 1D object.

Finally, we impose a third constraint z = 6. That is, we take the line l, but keep only
those points on l that satisfy the equation z = 6 (and “throw away” all other points).
This gives us the point P . Hence, the point P is described by three equations:

x = 4, y = 5, and z = 6.

“3 − 3 = 0”: By imposing 3 constraints on 3D space, we end up with a 0D object.


We observe that each additional equation (or constraint) “chops off” a dimension. This
observation also generalises to spaces of other dimensions. For example, in 2D space:

“2 − 1 = 1”: A line is a 1D object that can be described by one equation.
“2 − 2 = 0”: A point is a 0D object that can be described by two equations.

Example 688. In this example, we return to 2D space.

We first impose the equation or constraint x = 4. That is, we keep only those points (in
2D space) that satisfy the equation x = 4 and “throw away” all other points. This leaves
us with the line l. We say that the line l is described by one equation:

x = 4.

“2 − 1 = 1”: By imposing 1 constraint on 2D space, we end up with a 1D object.

[Figure: the line l: x = 4 and the point P = (4, 5): x = 4, y = 5.]

Now suppose we also impose the constraint y = 5. That is, we take the line l, but keep
only those points on l that satisfy the equation y = 5 (and “throw away” all other points).
This gives us the point P = (4, 5). We say that the point P is described by two equations:

x=4 and y = 5.

“2 − 2 = 0”: By imposing 2 constraints on 2D space, we end up with a 0D object.

Similarly, in 4D space (not in H2 Maths):

“4 − 1 = 3”: One equation describes a 3D space.
“4 − 2 = 2”: A plane is a 2D object that can be described by two equations.
“4 − 3 = 1”: A line is a 1D object that can be described by three equations.
“4 − 4 = 0”: A point is a 0D object that can be described by four equations.

And so on and so forth in all higher-dimensional spaces as well.

52. Planes: Formally Defined in Vector Form

Example 689. Let q be the plane that contains the points A = (1, 0, 0), B = (0, 1, 0),
and C = (0, 0, 1). Informally, a plane is a “flat surface”.
And since it is a “flat surface”, there must be some vector that is perpendicular to it.
We will call any such vector a normal vector of the plane.²⁴¹
To find a normal vector of q, all we need do is pick any two non-parallel vectors on q and
compute their vector product.
Let’s pick, say, AB = (−1, 1, 0) and AC = (−1, 0, 1). Their vector product (which we’ll
also denote n) is:

AB × AC = (−1, 1, 0) × (−1, 0, 1) = (1, 1, 1) = n.

[Figure: the plane q, the points A, B, and C, the vectors AB = (−1, 1, 0) and AC = (−1, 0, 1), and the normal vector n = (1, 1, 1).]

As we learnt in Ch. 49, the vector product n = (1, 1, 1) must be perpendicular to both
AB and AC. It turns out that n is also perpendicular to every vector on q (we’ll formally
state and prove this as Fact 101 below). And so, we call n a normal vector of q.
Now, let R denote a generic point on q. Then the vector AR is on q. Which means:

AR ⊥ n = (1, 1, 1), or equivalently, AR ⋅ (1, 1, 1) = 0.

Let’s manipulate this last equation a little:

AR ⋅ (1, 1, 1) = 0
⇐⇒ (OR − OA) ⋅ (1, 1, 1) = 0 (∵ AR = OR − OA)
⇐⇒ OR ⋅ (1, 1, 1) − OA ⋅ (1, 1, 1) = 0 (Distributivity)
⇐⇒ OR ⋅ (1, 1, 1) = OA ⋅ (1, 1, 1)
⇐⇒ OR ⋅ (1, 1, 1) = (1, 0, 0) ⋅ (1, 1, 1) = 1.
We say that this last equation OR ⋅ (1, 1, 1) = 1 is a vector equation that describes the
plane q. To be a bit more formal and precise, we’d write q as the following set:
q = {R ∶ OR ⋅ (1, 1, 1) = 1} .

In words, q is the set containing exactly those points R that satisfy OR ⋅ (1, 1, 1) = 1.
²⁴¹ We’ll formally define what a normal vector of a plane is in Definition 143.

If we let r denote the position vector of the generic point R, then here is another vector
equation that also describes q:

r ⋅ (1, 1, 1) = 1.

Again, to be a bit more formal and precise, we’d write:

q = {R ∶ r ⋅ (1, 1, 1) = 1}.

In words, q is the set containing exactly those points R that satisfy r ⋅ (1, 1, 1) = 1.
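
The construction in this example (take two vectors on the plane, form their vector product to get n, then compute d = OA ⋅ n) can be sketched in Python as follows. The name plane_through is hypothetical, and the sketch assumes that the three given points are not collinear, so that n is non-zero.

```python
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def sub(p, q):
    """The vector p - q."""
    return tuple(x - y for x, y in zip(p, q))


def cross(a, b):
    a1, a2, a3 = a
    b1, b2, b3 = b
    return (a2 * b3 - a3 * b2, a3 * b1 - a1 * b3, a1 * b2 - a2 * b1)


def plane_through(a, b, c):
    """Return (n, d) such that the plane through A, B, C is described by r . n = d."""
    n = cross(sub(b, a), sub(c, a))   # n = AB x AC
    return n, dot(a, n)               # d = OA . n


n, d = plane_through((1, 0, 0), (0, 1, 0), (0, 0, 1))
print(n, d)                                                     # (1, 1, 1) 1
print([dot(p, n) == d for p in [(1, 0, 0), (0, 1, 0), (0, 0, 1)]])  # [True, True, True]
```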

The above discussion motivates the following formal definition of a plane:

Definition 141. A plane is any set of points that can be written as:
{R ∶ OR ⋅ n = d} or {R ∶ r ⋅ n = d},

where n is some non-zero vector and d ∈ R.

In words, a plane is the set containing exactly those points R that satisfy:
OR ⋅ n = d or r ⋅ n = d.

A little less formally, we will simply say that a plane is described by either of the above
vector equations.
By the way, in the above example, we spoke of vectors being on a plane. It’s probably a
good idea to formally and precisely define what this means:
Definition 142. A vector v is on a plane q if there are points S, T ∈ q such that v = ST .

And if v is on q, then for the sake of convenience, we will sometimes be sloppy and say
that q contains v.²⁴²

²⁴² I say that this is sloppy because strictly speaking, it is wrong to say that a plane q contains a vector v.
A plane contains points and not vectors. Nonetheless, for the sake of convenience, we will often simply
(and incorrectly) say that a plane contains a vector.

Example 690. Consider the plane q = {R ∶ OR ⋅ (−3, 0, 2) = −5}.
It contains the point A = (1, 0, −1) because A satisfies the plane’s vector equation:
OA ⋅ (−3, 0, 2) = (1, 0, −1) ⋅ (−3, 0, 2) = −3 + 0 − 2 = −5. ✓

It also contains the point B = (3, 1, 2):

OB ⋅ (−3, 0, 2) = (3, 1, 2) ⋅ (−3, 0, 2) = −9 + 0 + 4 = −5. ✓

In contrast, it does not contain the point C = (9, 1, 1):

OC ⋅ (−3, 0, 2) = (9, 1, 1) ⋅ (−3, 0, 2) = −27 + 0 + 2 = −25. ✗

[Figure: the plane q, the points A = (1, 0, −1), B = (3, 1, 2), and C = (9, 1, 1), and the vector (−3, 0, 2).]

Since q contains the points A and B, the vector AB is on q. (If we were being sloppy,
we’d instead say that q contains the vector AB.)
Now, are the vectors AC and BC on q? As you may have guessed, the answer is, “No,
they are not.” But to prove this, we’ll have to wait until Fact 100 below.²⁴³
Exercise 225. Consider the plane q = {R ∶ OR ⋅ (−5, 7, 3) = −1}. Does q contain the
points A = (5, −3, 1), B = (1, −2, 6), and C = (−2, 2, −3)? (Answer on p. 1499.)

²⁴³ Here is an incorrect “proof”: “C is not on q. Therefore, the vectors AC and BC are not on q.” This proof
is incorrect because in order to prove that, say, AC is not on q, we need to prove that given any two
points P and Q on q, AC ≠ PQ. The mere observation that C is not on q does not suffice.

Our first result about planes is simple and intuitively “obvious”:

Fact 95. If a plane contains two distinct points, then it also contains the line through
those two points.

[Figure: the plane q, the points A and B, and the line AB. Since q contains A and B, it also contains the line AB.]

Proof. See p. 1299 in the Appendices.²⁴⁴

Example 691. In the last example, we verified that the plane q = {R ∶ OR ⋅ (−3, 0, 2) = −5}
contains the points A = (1, 0, −1) and B = (3, 1, 2). By the above Fact then, q also contains
the line AB. That is, any point on the line AB is also on the plane q.

Example 692. The plane q is described by r ⋅ (4, 1, 5) = 0 and the line l is described by
r = OA + λv = (−1, −1, 1) + λ (5, 0, −4) (λ ∈ R).
It turns out that the plane q contains the line l. Here are two ways to show this:
Method 1. Observe that l contains the points A = (−1, −1, 1) and B = A + v =
(−1, −1, 1) + (5, 0, −4) = (4, −1, −3).
But the plane q also contains A and B (as you should be able to verify). And so, by the
above Fact, q contains the line AB, which is also the line l.
Method 2. Let R = (−1, −1, 1) + λ (5, 0, −4) be a generic point on the line l. We show
that R satisfies q’s vector equation:

OR ⋅ (4, 1, 5) = [(−1, −1, 1) + λ (5, 0, −4)] ⋅ (4, 1, 5)
= 4 (−1 + 5λ) + (−1) + 5 (1 − 4λ)
= −4 + 20λ − 1 + 5 − 20λ = 0.

We’ve just shown that q contains any point R on l. That is, q contains l.
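As a rough sketch of Method 1, the Python below checks whether two distinct points of a line lie on a plane and then appeals to Fact 95. The function names are hypothetical, the direction vector is assumed to be non-zero, and coordinates are assumed to be integers so that `==` is exact.

```python
def dot(u, v):
    """Scalar (dot) product of two 3D vectors given as tuples."""
    return sum(ui * vi for ui, vi in zip(u, v))


def add(u, v):
    """Componentwise sum of two 3D tuples."""
    return tuple(ui + vi for ui, vi in zip(u, v))


def plane_contains_line(n, d, a, v):
    """Method 1: the plane r . n = d contains the line r = a + lambda * v
    if it contains the two distinct points A = a and B = a + v (Fact 95).
    Assumes v is a non-zero vector."""
    b = add(a, v)
    return dot(a, n) == d and dot(b, n) == d


# The plane and line from Example 692.
print(plane_contains_line((4, 1, 5), 0, (-1, -1, 1), (5, 0, -4)))  # True
# Moving the starting point off the plane breaks containment.
print(plane_contains_line((4, 1, 5), 0, (0, 0, 1), (5, 0, -4)))    # False
```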

Exercise 226. Suppose the plane q is described by r ⋅ (4, −3, 2) = −10, while the line l is
described by r = (7, 3, 1) + λ (3, 6, −2). Determine if q contains l. (Answer on p. 1499.)

Next up, we’ll examine the normal vector in greater detail.

I have relegated many of the proofs in this Chapter to the Appendices even though they are not difficult
and would usually have been in the main text. However, this Chapter contains an inordinate number
of results and I decided to do so lest the student feel overwhelmed.
52.1. The Normal Vector
A plane’s normal vector is perpendicular to every vector on that plane. Formally:

Definition 143. A normal vector of a plane is a vector that’s perpendicular to every
vector on that plane.

More simply, instead of saying, “n is a normal vector of the plane q,” we’ll also say, “n is
normal to q”. And as shorthand, we’ll write n ⊥ q.
Not surprisingly, the vector n used in Definition 141 of the plane is a normal vector:
Fact 96. If q = {R ∶ OR ⋅ n = d} is a plane, then n ⊥ q.

Proof. See p. 1299 in the Appendices.

Example 693. The plane q is described by r ⋅ n = d.

By the above Fact, n ⊥ q. That is, the vector n is perpendicular to every vector on q.
So for example, suppose the vectors u, v, and w are on q. Then n ⊥ u, v, w.

[Figure: a plane q with the vectors u, v, and w on q and the normal vector n.]

Example 694. Note that a plane’s normal vector is not unique. Consider the
plane q described by r ⋅ (1, 1, 1) = 1. Let:

m = (2, 2, 2),
u = (−1.5, −1.5, −1.5),
v = (√5, √5, √5).

The vectors m = 2n, u = −1.5n, and v = √5 n are parallel to n = (1, 1, 1).
And so, they are “obviously” also normal vectors of q.

[Figure: the plane q with the normal vectors n and u.]

As the above example suggests, if n is a normal vector of the plane q, then “obviously”, so
too is any vector m that’s parallel to n. Formally:

Fact 97. Let q be a plane and n and m be vectors. Suppose n ⊥ q. Then:

m ∥ n ⟹ m ⊥ q.

Proof. See p. 1299 in the Appendices.


It turns out that the converse of Fact 97 is also true. That is, suppose n is a normal vector
of the plane q. If m is also a normal vector of q, then m ∥ n. Formally:

Theorem 9. Let q be a plane and n and m be vectors. If n ⊥ q, then:

m ⊥ q ⟹ m ∥ n.

Proof. See p. 1302 in the Appendices.

Fact 97 says that a normal vector n of a plane q is not unique — any vector parallel to
n is also a normal vector of q. Theorem 9 then says the converse: only vectors that are
parallel to n are normal vectors of q. Putting these two results together, we have:

Corollary 15. Let q be a plane and n and m be vectors. If n ⊥ q, then:

m⊥q ⇐⇒ m ∥ n.

In other words, if n is a plane’s normal vector, then that plane’s normal vectors are exactly
those which are parallel to n.

Example 695. Consider again the plane q described by:

r ⋅ (1, 1, 1) = 1.

By ⟸ of Corollary 15, every vector that’s parallel to n = (1, 1, 1) is also normal to q.
For example, the following vectors are normal to q because they are parallel to (1, 1, 1):

u = (2, 2, 2),
v = (−2, −2, −2),
w = (√5, √5, √5).

Conversely (⟹ of Corollary 15), every normal vector of q must be parallel to (1, 1, 1).
And so for example, the vectors a = (1, 2, 3), b = (0, −1, 2), c = (1, 1, 0.9), and 0 are not
parallel to (1, 1, 1) and are thus not normal to q.
The vector d = (1, −1, 0) is on the plane q and, as depicted, d ⊥ n, u, v, but d is not
perpendicular to a or b.

[Figure: the plane q with the vectors n = (1, 1, 1), u = (2, 2, 2), v = (−2, −2, −2), a = (1, 2, 3), b = (0, −1, 2), and d = (1, −1, 0).]

Exercise 227. The plane q is described by r ⋅ (1, −1, 1) = −2. Determine if a = (2, −2, 2),
b = (2, 2, −2), and c = (−√2, √2, −√2) are normal vectors of q. (Answer on p. 1499.)


Suppose the plane q can be described by r⋅n = d. If m = kn for some k ≠ 0, then “obviously”,
q can also be described by:

r ⋅ m = kd.

For future reference, let’s jot this down as a formal result:

Fact 98. Suppose q is a plane with q = {R ∶ r ⋅ n = d}.

If m = kn for some k ≠ 0, then we also have q = {R ∶ r ⋅ m = kd}.

Example 696. The plane q described by r ⋅ (1, 2, 3) = 4 can also be described by:

r ⋅ (2, 4, 6) = 8, r ⋅ (−1, −2, −3) = −4, or r ⋅ (√5, 2√5, 3√5) = 4√5.

Example 697. Let q be the plane that contains the points A = (9, 0, 5), B = (−2, 1, 1),
and C = (3, −5, 4).
Let us first write down two vectors on q:

AB = (−11, 1, −4),
AC = (−6, −5, −1).

Hence, a normal vector of q is:

n = AB × AC = (−21, 13, 61).

[Figure: the plane q through the points A, B, and C, with the vectors AB and AC on q and the normal vector n = (−21, 13, 61).]

Compute d = OA ⋅ n = (9, 0, 5) ⋅ (−21, 13, 61) = −189 + 0 + 305 = 116.
Thus, q may be described by r ⋅ (−21, 13, 61) = 116.
Another normal vector of q is 2(−21, 13, 61) = (−42, 26, 122). And so, q may also be
described by r ⋅ (−42, 26, 122) = 2 ⋅ 116 = 232.
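The procedure in the example above (two vectors on the plane, their vector product, then d) can be sketched in Python as follows. The helper names are illustrative only, and the sketch assumes the three given points are not collinear.

```python
def sub(u, v):
    """Componentwise difference u - v of two 3D tuples."""
    return tuple(ui - vi for ui, vi in zip(u, v))


def dot(u, v):
    """Scalar (dot) product of two 3D vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def cross(u, v):
    """Vector (cross) product of two 3D vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def plane_through(a, b, c):
    """Return (n, d) such that the plane through the points a, b, c
    is described by r . n = d. Assumes a, b, c are not collinear."""
    n = cross(sub(b, a), sub(c, a))
    return n, dot(a, n)


n, d = plane_through((9, 0, 5), (-2, 1, 1), (3, -5, 4))
print(n, d)  # (-21, 13, 61) 116
```

By Fact 98, multiplying both `n` and `d` by the same non-zero constant gives another valid description of the same plane.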

Exercise 228. The plane q contains the points A = (1, −1, 2), B = (−2, 3, 0), and C =
(0, −1, 1). (Answer on p. 1499.)
(a) Find a normal vector of q.
(b) Hence write down a vector equation that describes the plane q.
(c) Write down another normal vector of q.
(d) Hence write down another vector equation that describes the plane q.


Now, suppose q is a plane with normal vector n. Definition 143 says that if a vector v is
on q, then it must also be perpendicular to n.
It turns out that the converse is also true. That is, if v is perpendicular to n, then it must
be on q. Formally:

Fact 99. Let q be a plane with normal vector n. Suppose v is a vector. Then:

v ⊥ n ⟹ v is on q.

Proof. See p. 1301 in the Appendices.

Putting Definition 143 and Fact 99 together, a plane’s vectors are exactly those that are
perpendicular to its normal vector:

Corollary 16. Let v be a vector and q be a plane with normal vector n. Then:

v⊥n ⇐⇒ v is on q.

Example 698. Let q be the plane described by r ⋅ (5, 1, 6) = −3. As you should be able
to verify, it contains the points A = (−1, 2, 0), B = (−2, 1, 1), and C = (3, 0, −3).
Since A, B, and C are on q, so too are the vectors AB = (−1, −1, 1) and AC = (4, −2, −3).
Hence, both AB and AC should be perpendicular to the normal vector n = (5, 1, 6). Let’s
verify that this is so:
AB ⋅ n = (−1, −1, 1) ⋅ (5, 1, 6) = −5 − 1 + 6 = 0. ✓
AC ⋅ n = (4, −2, −3) ⋅ (5, 1, 6) = 20 − 2 − 18 = 0. ✓

Now, consider the vector u = (−3, 3, 2). Is it on the plane q?

Well, it’s not obvious. We could try to find two points S and T on q such that u = ST.
But this could be slow and laborious.
A quicker method would be to use Corollary 16. That is, simply check if u ⊥ n:

u ⋅ n = (−3, 3, 2) ⋅ (5, 1, 6) = −15 + 3 + 12 = 0. ✓

So yup, u is on q.245
Let’s now consider the vector v = (−7, 2, 3). Is it on q? Again, simply check if v ⊥ n.

v ⋅ n = (−7, 2, 3) ⋅ (5, 1, 6) = −35 + 2 + 18 = −15 ≠ 0. ✗

So nope, v is not on q.
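The perpendicularity test of Corollary 16 is easy to sketch in Python. The names below are hypothetical, and exact integer arithmetic is assumed. The last lines also cross-check against Definition 142 by building the point D = A + u and confirming that it lies on q.

```python
def dot(u, v):
    """Scalar (dot) product of two 3D vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def vector_is_on_plane(v, n):
    """Corollary 16: v is on a plane with normal vector n
    if and only if v is perpendicular to n."""
    return dot(v, n) == 0


# The plane r . (5, 1, 6) = -3 from Example 698.
n, d = (5, 1, 6), -3
A = (-1, 2, 0)
u = (-3, 3, 2)
v = (-7, 2, 3)

print(vector_is_on_plane(u, n))  # True
print(vector_is_on_plane(v, n))  # False, since v . n = -15

# Cross-check with Definition 142: D = A + u should be a point on q.
D = tuple(ai + ui for ai, ui in zip(A, u))
print(D, dot(D, n) == d)  # (-4, 5, 2) True
```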

Exercise 229. Let q be the plane described by r ⋅ (8, −2, 1) = 5. Are the vectors a =
(3, 7, −5), b = (1, 6, 4), and c = (3, 10, 1) on q? (Answer on p. 1499.)

For the doubtful reader, let D = A + u = (−1, 2, 0) + (−3, 3, 2) = (−4, 5, 2). We can verify that D ∈ q. And
now, since A, D ∈ q, by Definition 142, the vector AD = u is on q.
Suppose the plane q contains the point P. Then q contains exactly those points R for which
the vector PR is on q. Formally:

Fact 100. Let q be a plane and P and R be points. Suppose P ∈ q. Then:

R ∈ q ⇐⇒ The vector PR is on q.

Proof. See p. 1300 in the Appendices.

Example 699. The plane q = {R ∶ OR ⋅ (0, −2, 3) = 1} has normal vector n = (0, −2, 3)
and contains the point A = (0, 1, 1). The point B is such that AB = (1, 3, 2).
Here are two methods for showing that B ∈ q.
Method 1. First find B = A + AB = (0, 1, 1) + (1, 3, 2) = (1, 4, 3). Then show that B
satisfies the plane’s vector equation:
OB ⋅ n = (1, 4, 3) ⋅ (0, −2, 3) = 0 − 8 + 9 = 1. ✓
Method 2. Simply check if AB ⊥ n:
AB ⋅ n = (1, 3, 2) ⋅ (0, −2, 3) = 0 − 6 + 6 = 0. ✓
Yup, AB ⊥ n. So by Corollary 16, the vector AB is on q. And so by Fact 100, B ∈ q.

As promised earlier, we now revisit Example 690:

Example 690. We already showed that the plane q = {R ∶ OR ⋅ (−3, 0, 2) = −5} contains
the points A = (1, 0, −1) and B = (3, 1, 2), but not the point C = (9, 1, 1). We also
concluded that the vector AB is on q (because q contains A and B).
However, we were unable to say if the vectors AC and BC are on q. But now, with Fact
100, we know that they are not: A ∈ q but C ∉ q, so the vector AC is not on q. Likewise,
B ∈ q but C ∉ q, so the vector BC is not on q.

Exercise 230. The plane q is described by r ⋅ (7, −1, 3) = 19 and A = (1, 4, −1) is a point.
The point B is such that AB = (7, 3, −2). Is the point B on q? (Answer on p. 1499.)


Given two (non-parallel) vectors on a plane, their vector product is normal to the plane:

Fact 101. If a and b are non-parallel vectors on a plane q, then a × b ⊥ q.

Proof. See p. 1305 in the Appendices.

Now, suppose a and b are (non-parallel) vectors on a plane q. If v ⊥ q, then by
Definition 143, v ⊥ a, b.
It turns out that the converse is also true. That is, if v ⊥ a, b, then v ⊥ q. This is
because if v ⊥ a, b, then by Fact 94, v ∥ a × b. But by Fact 101, a × b ⊥ q. And so
by Corollary 15, v ⊥ q.
Let’s jot this down as a formal result:
Corollary 17. Let a and b be non-parallel vectors on a plane q. Then:
v ⊥ a, b ⟹ v ⊥ q.
And so, together with Definition 143:
v ⊥ q ⇐⇒ v ⊥ a, b.

Corollary 18. Suppose a and b are non-parallel vectors on the plane q. Then

c⊥a×b ⇐⇒ c is on q.

Proof. By Fact 101, a × b ⊥ q. And so by Corollary 16, c ⊥ a × b ⇐⇒ c is on q.

Example 700. The plane q contains the points A = (0, 0, 1), B = (4, 2, 0), and C =
(−5, 0, 4).
Aisha claims that v = (6, −7, 10) is normal to q. Let’s check if she’s correct:
First, write down two non-parallel vectors on q. Two obvious candidates are:
AB = (4, 2, −1) and AC = (−5, 0, 3).
Then check if v ⊥ AB, AC:
v ⋅ AB = (6, −7, 10) ⋅ (4, 2, −1) = 24 − 14 − 10 = 0, ✓
v ⋅ AC = (6, −7, 10) ⋅ (−5, 0, 3) = −30 + 0 + 30 = 0. ✓
Since AB ∦ AC and v ⊥ AB, AC, by Corollary 17, v ⊥ q and Aisha is correct.
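Here is one possible Python sketch of the check in the example above, based on Corollary 17. The function name is made up for illustration. The sketch uses the vector product only to confirm that the two vectors on the plane are non-parallel, and it assumes exact (integer) coordinates.

```python
def sub(u, v):
    """Componentwise difference u - v of two 3D tuples."""
    return tuple(ui - vi for ui, vi in zip(u, v))


def dot(u, v):
    """Scalar (dot) product of two 3D vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def cross(u, v):
    """Vector (cross) product of two 3D vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def is_normal_to_plane_through(v, a, b, c):
    """Corollary 17: v is normal to the plane through the points a, b, c
    if and only if v is perpendicular to both AB and AC."""
    ab, ac = sub(b, a), sub(c, a)
    if cross(ab, ac) == (0, 0, 0):
        raise ValueError("AB and AC are parallel, so Corollary 17 does not apply")
    return dot(v, ab) == 0 and dot(v, ac) == 0


A, B, C = (0, 0, 1), (4, 2, 0), (-5, 0, 4)
print(is_normal_to_plane_through((6, -7, 10), A, B, C))  # True
print(is_normal_to_plane_through((1, 0, 0), A, B, C))    # False
```

The explicit parallel check matters: if the two vectors are parallel (as can happen in exercises), being perpendicular to both does not guarantee being normal to the plane.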

Exercise 231. The vectors a = (1, −1, 1) and b = (−2, 2, −2) are on the plane q. Is
n = (0, 1, 2) a normal vector of q? What about m = (1, 3, 2)? (Answer on p. 1499.)


53. Planes in Cartesian Form

Example 701. Consider the plane q described by:

r ⋅ n = r ⋅ (1, 2, 3) = 4. (1)

The plane q contains those points R = (x, y, z) whose position vector r = (x, y, z) satisfies (1).

We can also rewrite (1) as: (x, y, z) ⋅ (1, 2, 3) = 4, where r = (x, y, z) and n = (1, 2, 3).

Notice we simply have: (x, y, z) ⋅ (1, 2, 3) = x + 2y + 3z.

Hence, q is also described by the following cartesian equation:

x + 2y + 3z = 4.

In general, suppose q is the plane described by the following vector equation:

r ⋅ n = (x, y, z) ⋅ (a, b, c) = d.

Then q can also be described by the following cartesian equation:

ax + by + cz = d.


Fact 102. Let (a, b, c) be a non-zero vector and d ∈ R. Then:

{(x, y, z) ∶ (x, y, z) ⋅ (a, b, c) = d} = {(x, y, z) ∶ ax + by + cz = d}.

Example 702. The plane described by r ⋅ (5, 0, −1) = 3 may also be described by:

5x − z = 3.

Example 703. The plane described by r ⋅ (−1, 7, 2) = 0 may also be described by:

−x + 7y + 2z = 0.

Example 704. The plane described by 5x + 6y + 7z = 8 may also be described by:

r ⋅ (5, 6, 7) = 8.

Example 705. The plane described by y = 5 may also be described by:

r ⋅ (0, 1, 0) = 5.
Fact 103. The plane described by r ⋅ (a, b, c) = d contains the origin if and only if d = 0.

Proof. The origin is the point (x, y, z) = (0, 0, 0) and satisfies the equation r ⋅ (a, b, c) = d or
ax + by + cz = d if and only if d = 0. Thus, the plane described by r ⋅ (a, b, c) = d contains the
origin if and only if d = 0.

Example 706. Consider the plane q described by:

ax + by + cz = 0 or r ⋅ (a, b, c) = 0.

Even if we don’t know what a, b, and c are, we know that q contains the origin.

Example 707. Consider the plane q described by:

ax + by + cz = 8 or r ⋅ (a, b, c) = 8.

Even if we don’t know what a, b, and c are, we know that q does not contain the origin.

Exercise 232. Each of the following is a plane given in vector form. Rewrite each in
cartesian form and state if each contains the origin. (Answer on p. 1500.)

(a) r ⋅ (1, 2, 3) = 17. (b) r ⋅ (−1, 0, −2) = 0. (c) r ⋅ (0, −2, 5) = −3.

Exercise 233. Each of the following is a plane given in cartesian form. Rewrite each in
vector form and state if each contains the origin. (Answer on p. 1500.)

(a) x + 5 = 17y + z. (b) y + 1 = 0. (c) x + z = y − 2.


53.1. Finding Points on a Plane

Example 708. Consider the plane described in vector or cartesian form by:

r ⋅ (1, 2, 3) = 4 or x + 2y + 3z = 4.

Given a plane’s cartesian equation, we can easily use trial-and-error to find points on that
plane: Simply try out values of x, y, and z that satisfy the cartesian equation. (Tip: As
always, zero is our friend.)
So for example, the following points are on the given plane (as you should verify yourself):

A = (4, 0, 0), B = (0, 2, 0), and C = (1, 0, 1).

In contrast, the following points are not on the given plane, because they do not satisfy
x + 2y + 3z = 4 (as you should verify yourself):

D = (1, 0, 0), E = (0, 1, 1), and F = (−3, 4, 5).

Example 709. Consider the plane described in vector or cartesian form by:

r ⋅ (3, 1, 1) = −4 or 3x + y + z = −4.

The following points are on the given plane:

A = (0, −4, 0), B = (0, 0, −4), and C = (−1, −1, 0).

The following are not:

D = (1, 0, 0), E = (0, 1, 1), and F = (−3, 4, 5).

Example 710. Consider the plane described in vector or cartesian form by:

r ⋅ (−5, 1, 0) = 1 or −5x + y = 1.

The following points are on the given plane:

A = (0, 1, 0), B = (0, 1, 1), and C = (−1, −4, 0).

The following are not:

D = (1, 0, 0), E = (0, 2, 1), and F = (−3, 4, 5).

Exercise 234. Below are given three planes in vector form. First rewrite each plane in
cartesian form. Then find three points that are on each plane and another three points
that are not. (Answer on p. 1500.)

(a) r ⋅ (0, 0, 1) = 32. (b) r ⋅ (5, 3, 1) = −2. (c) r ⋅ (1, −2, 3) = 0.


53.2. Finding Vectors on a Plane

Example 711. The plane q is described by:

r ⋅ (1, 2, 3) = 4 or x + 2y + 3z = 4.
Its normal vector is: n = (a, b, c) = (1, 2, 3).
Recall (Corollary 16) that a vector is on q if and only if it is perpendicular to n. So,
to find vectors on q, we need simply find vectors that are perpendicular to n — that is,
vectors whose scalar product with n is zero.
We will now construct one such vector u = (u1, u2, u3). That is, we’ll pick values of u1,
u2, and u3 so that u ⋅ n = 0.
As always, zero is our friend. Let’s start by picking u3 = 0, so that:

u = (u1 , u2 , 0).

We’ll now play a simple little trick. Suppose we set u2 = −a and u1 = b:

u = (b, −a, 0) = (2, −1, 0).

Then observe that things will nicely cancel out:

u ⋅ n = (b, −a, 0) ⋅ (a, b, c) = ba − ab + 0 = 0.

Et voilà! By construction, u is perpendicular to n and is thus on q.

Using the same method, we can easily construct two more vectors that are also on q:

v = (c, 0, −a) = (3, 0, −1) and w = (0, c, −b) = (0, 3, −2).

(As you can easily verify, v ⋅ n = 0 and w ⋅ n = 0.)

And of course, the additive inverses of u, v, and w are also on q:

−u = (−2, 1, 0), −v = (−3, 0, 1), and −w = (0, −3, 2).

The above method is formally stated as Fact 104 below.

Even without the above method, we can easily find vectors that are on q. For example,
it is not difficult to see that d = (1, 1, −1) is also on q because:

d ⋅ n = (1, 1, −1) ⋅ (1, 2, 3) = 1 + 2 − 3 = 0.

It is equally easy to show that a vector is not on q. For example, e = (3, 2, 1) and
f = (1, −1, 1) are not on q because e ⋅ n ≠ 0 and f ⋅ n ≠ 0 (as you can verify).


Fact 104. If a plane has normal vector (a, b, c), then the following vectors are on the
plane:

(b, −a, 0), (c, 0, −a), (0, c, −b), (−b, a, 0), (−c, 0, a), and (0, −c, b).


Example 712. The plane q is described by r ⋅ (3, 1, 1) = −4 or 3x + y + z = −4. It has
normal vector n = (a, b, c) = (3, 1, 1). And so by Fact 104, the following vectors are on q:

(b, −a, 0) = (1, −3, 0), (c, 0, −a) = (1, 0, −3), (0, c, −b) = (0, 1, −1),
(−b, a, 0) = (−1, 3, 0), (−c, 0, a) = (−1, 0, 3), (0, −c, b) = (0, −1, 1).

We can also easily find other vectors on q. For example, d = (−1, 1, 2) is on q because:

d ⋅ n = (−1, 1, 2) ⋅ (3, 1, 1) = −3 + 1 + 2 = 0.

In contrast, e = (3, 2, 1) and f = (1, −1, 1) are not on q because e ⋅ n ≠ 0 and f ⋅ n ≠ 0.
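The six vectors listed in Fact 104 can be generated mechanically. The short Python sketch below (with a made-up function name) does so and confirms that each is perpendicular to the normal vector. Note that when two components of the normal vector are zero, some of the generated vectors will be the zero vector.

```python
def dot(u, v):
    """Scalar (dot) product of two 3D vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))


def fact_104_vectors(n):
    """Return the six vectors of Fact 104 for a plane with normal vector n."""
    a, b, c = n
    base = [(b, -a, 0), (c, 0, -a), (0, c, -b)]
    negated = [tuple(-x for x in v) for v in base]
    return base + negated


n = (3, 1, 1)  # normal vector of the plane in Example 712
vectors = fact_104_vectors(n)
print(vectors)
# [(1, -3, 0), (1, 0, -3), (0, 1, -1), (-1, 3, 0), (-1, 0, 3), (0, -1, 1)]
print(all(dot(v, n) == 0 for v in vectors))  # True
```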

Example 713. The plane q is described by r ⋅ (−5, 1, 0) = 1 or −5x + y = 1. It has normal
vector n = (a, b, c) = (−5, 1, 0). And so by Fact 104, the following vectors are on q:

(b, −a, 0) = (1, 5, 0), (c, 0, −a) = (0, 0, 5), (0, c, −b) = (0, 0, −1),
(−b, a, 0) = (−1, −5, 0), (−c, 0, a) = (0, 0, −5), (0, −c, b) = (0, 0, 1).

Actually, here we can make another useful and important observation. Notice that n’s
z-coordinate is 0. And so, if a vector u = (u1, u2, u3) is perpendicular to n (and is hence
on q), then so too is the vector (u1, u2, λ) for any value of λ.
So for example, since (1, 5, 0) is on q, so too are the following vectors:

(1, 5, 0), (1, 5, 1), (1, 5, −2), (1, 5, 999), (1, 5, π), etc.

Also, for any value of λ , the vector (0, 0, λ) must be perpendicular to n. In particular,
the standard basis vector k = (0, 0, 1) is perpendicular to n and is thus also on q.

Example 714. The plane q is described by r ⋅ (0, 0, 1) = 5 or z = 5. It has normal vector
n = (a, b, c) = (0, 0, 1). Notice that here n’s x- and y-coordinates are both 0.
Following the observation made in the previous example, if a vector u = (u1, u2, u3) is
perpendicular to n (and is hence on q), then so too is the vector (λ, µ, u3) for any values
of λ and µ.
Here we can also make a new observation. Since only the z-coordinate of n is non-zero,
if u = (u1, u2, u3) ⊥ n or u ⋅ n = 0, it must be that u3 = 0.
Altogether then, the vectors that are perpendicular to n (and are hence on q) are exactly
those vectors that can be written as (λ, µ, 0) for some real numbers λ and µ. So for
example, the vectors (9, 0, 0), (0, 5, 0), and (−π, 2, 0) are perpendicular to n (and are
thus on q).

In contrast, the vectors (9, 0, 1), (0, 5, 2), and (−π, 2, 3) are not.

Exercise 235. Find three non-parallel vectors on each plane: (a) r ⋅ (1, −2, 3) = 0; (b)
r ⋅ (5, 3, 1) = −2; (c) r ⋅ (1, 0, 4) = 5; (d) r ⋅ (0, 7, 0) = 32. (Answer on p. 1500.)


54. Planes in Parametric Form
In the last two chapters, we learnt to describe planes in vector and cartesian form. In
this chapter, we’ll learn to describe planes in parametric form.
To do so, we first introduce an important result.
Recall (Fact 61) that in 2D space, if a and b are non-parallel vectors, then any vector c
can be written as the linear combination (LC) of a and b. That is, given any vector c
(in 2D space), there exist real numbers λ and µ such that c can be written as:

c = λa + µb.

It turns out that the same is true of vectors on a plane in 3D space. That is, in 3D space,
a vector is on a plane if and only if it can be written as a LC of two non-parallel vectors on
that plane. Or equivalently, the vectors on a plane q are exactly those that can be written
as a LC of any two non-parallel vectors on q. Formally:

Theorem 10. Let q be a plane and a and b be non-parallel vectors on q. Suppose c is a
non-zero vector. Then:

c is a vector on q ⇐⇒ There exist λ, µ ∈ R such that c = λa + µb.

Remark 65. Take care to note that Theorem 10 is an if and only if ( ⇐⇒ ) statement
which says two things. First, ⟸ says:

“If a vector can be written as a LC of a and b, then it is on q.”

Or equivalently: “Every vector that’s a LC of a and b is on q.”

Second, the converse ⟹ says:

“If a vector is on q, then it can be written as a LC of a and b.”

Or equivalently: “Every vector on q is a LC of a and b.”

Proof. First note that since a and b are non-parallel vectors on q, by Fact 101, a × b ⊥ q.
We first prove ⟸. Suppose there exist λ, µ ∈ R such that c = λa + µb. We show that
a × b ⊥ c, so that by Fact 99, c is also a vector on the plane:

(a × b) ⋅ c = (a × b) ⋅ (λa + µb) = (a × b) ⋅ (λa) + (a × b) ⋅ (µb)

= λ (a × b) ⋅ a + µ (a × b) ⋅ b = 0 + 0 = 0. ✓

The proof of ⟹ is harder and thus relegated to p. 1305 (Appendices).

As we’ll see in Ch. 54.1, Theorem 10 will allow us to describe planes in parametric form.
But first, let’s better acquaint ourselves with Theorem 10 with some examples:


Example 715. The plane q described by x + y + z = 1 has normal vector n = (1, 1, 1).
Consider the vectors u = (1, −1, 0) and v = (1, 0, −1). They are both perpendicular
to n (verify this!). And so by Corollary 16, they are on q.
Observe moreover that u ∦ v. And so by Theorem 10, the vectors on q are exactly those
that can be written as a LC of u and v.
For example, the vector w = (0, 1, −1) is perpendicular to n and is on q. And so by
Theorem 10, we should be able to write w as a LC of u and v, as indeed we can:

w = (0, 1, −1) = (1, 0, −1) − (1, −1, 0) = v − u.

[Figure: the plane q with the vectors u = (1, −1, 0), −u, v = (1, 0, −1), and w = (0, 1, −1) on q, the normal vector n = (1, 1, 1), and the vector a = (1, 2, 3), which is not on q.]

In contrast, the vector a = (1, 2, 3) is not perpendicular to n and so is not on q. And so
by Theorem 10, we cannot write a as a LC of u and v. To verify that this is so, write
a = λu + µv, or:

(1, 2, 3) = λ (1, −1, 0) + µ (1, 0, −1),

or:

1 = λ + µ, (1)
2 = −λ, (2)
3 = −µ. (3)

(1) plus (2) yields 3 = µ, which contradicts (3). So, there are no numbers λ and µ such that
a = λu + µv. Or equivalently, a cannot be written as a LC of u and v.
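The search for λ and µ in the example above can be automated. The Python sketch below is one possible approach (not the book's method): it solves two of the three component equations by Cramer's rule, then checks whether the solution also satisfies the third. The function name is hypothetical, integer inputs are assumed, and `fractions.Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction
from itertools import combinations


def linear_combination(c, u, v):
    """Return (lam, mu) with c = lam*u + mu*v, or None if no such numbers
    exist. Assumes u and v are non-parallel vectors with integer entries."""
    for i, j in combinations(range(3), 2):
        det = u[i] * v[j] - u[j] * v[i]
        if det != 0:
            # Solve rows i and j by Cramer's rule.
            lam = Fraction(c[i] * v[j] - c[j] * v[i], det)
            mu = Fraction(u[i] * c[j] - u[j] * c[i], det)
            # The candidate must satisfy all three component equations.
            ok = all(lam * u[k] + mu * v[k] == c[k] for k in range(3))
            return (lam, mu) if ok else None
    return None  # only reached if u and v are parallel


u, v = (1, -1, 0), (1, 0, -1)
print(linear_combination((0, 1, -1), u, v))  # (Fraction(-1, 1), Fraction(1, 1)), i.e. w = -u + v
print(linear_combination((1, 2, 3), u, v))   # None: a is not on q
```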

Example 716. The plane q described by x − y = 5 has normal vector n = (1, −1, 0).
The vectors u = (0, 0, 1) and v = (1, 1, 0) are perpendicular to n and are thus on q.
Moreover, u ∦ v. And so by Theorem 10, the vectors on q are exactly those that can be
written as a LC of u and v.
For example, the vector w = (1, 1, 1) is perpendicular to n and is on q. And so by Theorem
10, we should be able to write w as a LC of u and v, as indeed we can:

w = (1, 1, 1) = (0, 0, 1) + (1, 1, 0) = u + v.

In contrast, the vector a = (0, 1, 0) is not perpendicular to n and is not on q. And so by
Theorem 10, we cannot write a as a LC of u and v. To verify that this is so, write
a = λu + µv, or:

(0, 1, 0) = λ (0, 0, 1) + µ (1, 1, 0),

or:

0 = µ, (1)
1 = µ, (2)
0 = λ. (3)

Clearly, (1) immediately contradicts (2). So, there are no numbers λ and µ such that
a = λu + µv. Or equivalently, a cannot be written as a LC of u and v.


Example 717. The plane q described by z = 7 has normal vector n = (0, 0, 1).
The vectors u = (1, 0, 0) and v = (0, 1, 0) are perpendicular to n and are thus on q.
Moreover, u ∦ v. And so by Theorem 10, the vectors on q are exactly those that can be
written as a LC of u and v.
For example, the vector w = (1, 2, 0) is perpendicular to n and is on q. And so by Theorem
10, we should be able to write w as a LC of u and v, as indeed we can:

w = (1, 2, 0) = (1, 0, 0) + 2 (0, 1, 0) = u + 2v.

In contrast, the vector a = (0, 1, 1) is not perpendicular to n and is not on q. And so by
Theorem 10, we cannot write a as a LC of u and v. To verify that this is so, write
a = λu + µv, or:

(0, 1, 1) = λ (1, 0, 0) + µ (0, 1, 0),

or:

0 = λ, (1)
1 = µ, (2)
1 = 0. (3)

Clearly, (3) is an immediate self-contradiction. So, there are no numbers λ and µ such that
a = λu + µv. Or equivalently, a cannot be written as a LC of u and v.

Exercise 236. Here are three planes:

(a) r ⋅ (1, 2, 3) = 4. (b) r ⋅ (1, 0, −1) = 0. (c) r ⋅ (9, 1, 1) = −5.

For each plane:

(i) Write down the corresponding cartesian equation.
(ii) Write down two non-parallel vectors on the plane (call them u and v). (You should
explain why u and v are non-parallel.)
(iii) Write down another vector w on the plane. Explain why we can express w as a LC
of u and v. Then do so.
(iv) Explain whether it is possible to express the vector (1, 1, 1) as a LC of u and v.
And if it is possible, do so. (Answer on p. 1501.)


54.1. Planes in Parametric Form
Suppose q is a plane and P is a point on q. Then Fact 100 says that a point R is on the
plane q if and only if the vector PR is on q. In formal notation:

R ∈ q ⇐⇒ PR is on q. (1)

Next, suppose a and b are non-parallel vectors on q. Then Theorem 10 says that a vector
v is on the plane q if and only if it is a LC of a and b. In formal notation:

v is on q ⇐⇒ There exist λ, µ ∈ R such that v = λa + µb. (2)

Now, combining (1) and (2), we have:

R ∈ q ⇐⇒ There exist λ, µ ∈ R such that PR = λa + µb.

In words: A point R is on the plane q if and only if the vector PR is a LC of a and b.
Or equivalently: The plane q contains exactly those points R for which the vector PR is a
LC of a and b. That is:

q = {R ∶ PR = λa + µb (λ, µ ∈ R)}. (3)

We previously learnt how to describe planes in vector and cartesian forms. (3) now gives
us a third way to describe planes and is called the parametric form of a plane.
Let’s write down or summarise the above discussion as a formal result:

Fact 105. Let q be a plane, P be a point, and a and b be non-parallel vectors. Suppose
P, a, and b are on q. Then:
R ∈ q ⇐⇒ There exist λ, µ ∈ R such that PR = λa + µb.
Or equivalently:
q = {R ∶ PR = λa + µb (λ, µ ∈ R)}.

Observe that PR = OR − OP. And so:

PR = λa + µb ⇐⇒ OR = OP + λa + µb.

Thus, the above Fact can also be rewritten as:

Corollary 19. Let q be a plane, P be a point, and a and b be non-parallel vectors.
Suppose P, a, and b are on q. Then:
R ∈ q ⇐⇒ There exist λ, µ ∈ R such that OR = OP + λa + µb.
Or equivalently:
q = {R ∶ OR = OP + λa + µb (λ, µ ∈ R)}.


As before, instead of fully writing out the plane q in set notation (as was done in Fact 105
and Corollary 19), we can simply say that q is described by the following parametric equation:

OR = OP + λa + µb (λ, µ ∈ R),

where OR and OP are (position) vectors. Or equivalently:

R = P + λa + µb (λ, µ ∈ R),

where R and P are points. This last equation gives us a nice, informal geometric
interpretation. A point R is on the plane q if:

R can be reached from P through a LC of “steps” in the directions of a and b.

Example 718. Let q be the plane described by r ⋅ (1, 1, 1) = 1 or x + y + z = 1.

It contains the point P = (1, 0, 0) and the non-parallel vectors a = (−1, 1, 0) and
b = (−1, 0, 1). And so by Corollary 19,

q = {R ∶ OR = OP + λa + µb (λ, µ ∈ R)}
= {R ∶ OR = (1, 0, 0) + λ (−1, 1, 0) + µ (−1, 0, 1) = (1 − λ − µ, λ, µ) (λ, µ ∈ R)}.

More simply, we say that q is described by the parametric equation:

OR = (1, 0, 0) + λ (−1, 1, 0) + µ (−1, 0, 1) = (1 − λ − µ, λ, µ) (λ, µ ∈ R).

As the parameters λ and µ vary, we get different points on q. For example, (λ, µ) =
(3, 5) produces the point:

(1, 0, 0) + 3 (−1, 1, 0) + 5 (−1, 0, 1) = (−7, 3, 5).

Starting from the point P, we can reach the point (−7, 3, 5) by taking 3 “steps” in the
direction of a, then 5 “steps” in the direction of b.

[Figure: the plane q, showing the point P = (1, 0, 0), the vectors 3a = 3 (−1, 1, 0) and 5b = 5 (−1, 0, 1), and the point (−7, 3, 5).]

Similarly, (λ, µ) = (0, 0) produces:

(1, 0, 0) + 0 (−1, 1, 0) + 0 (−1, 0, 1) = (1, 0, 0).

Starting from the point P, we are already at the point (1, 0, 0).


Example 719. Let q be the plane described by z = 3.
It contains the point P = (0, 0, 3) and the non-parallel vectors a = (1, 0, 0) and b = (0, 1, 0).
And so by Corollary 19,

q = {R ∶ OR = OP + λa + µb (λ, µ ∈ R)}
= {R ∶ OR = (0, 0, 3) + λ (1, 0, 0) + µ (0, 1, 0) = (λ, µ, 3) (λ, µ ∈ R)}.

More simply, we may say that q is described by the parametric equation:

OR = (0, 0, 3) + λ (1, 0, 0) + µ (0, 1, 0) = (λ, µ, 3) (λ, µ ∈ R).

As the parameters λ and µ vary, we get different points on the plane q. For example,
(λ, µ) = (5, −1) produces the point:

(0, 0, 3) + 5(1, 0, 0) + (−1)(0, 1, 0) = (5, −1, 3).

Starting from the point P, we can reach the point (5, −1, 3) by taking 1 “step” in the
direction opposite to b, then 5 “steps” in the direction of a.

[Figure: the plane q, showing the point P = (0, 0, 3), the vectors −b = −(0, 1, 0) and 5a = 5 (1, 0, 0), and the point (5, −1, 3).]

As another example, (λ, µ) = (0, 1) produces the point (not depicted):

(0, 0, 3) + 0(1, 0, 0) + 1(0, 1, 0) = (0, 1, 3).

Starting from the point P , we can reach the point (0, 1, 3) by taking 1 “step” in the
direction of b.


Example 720. Let q be the plane described by −x + 3y − 5z = 7.
It contains the point P = (−7, 0, 0) and the non-parallel vectors a = (3, 1, 0) and
b = (5, 0, −1). And so by Corollary 19,

q = {R ∶ OR = OP + λa + µb (λ, µ ∈ R)}
= {R ∶ OR = (−7, 0, 0) + λ (3, 1, 0) + µ (5, 0, −1) = (−7 + 3λ + 5µ, λ, −µ) (λ, µ ∈ R)}.

More simply, we may say that q is described by the parametric equation:

OR = (−7, 0, 0) + λ (3, 1, 0) + µ (5, 0, −1) = (−7 + 3λ + 5µ, λ, −µ) (λ, µ ∈ R).

As the parameters λ and µ vary, we get different points on the plane q. For example,
(λ, µ) = (1, 2) produces the point:

(−7, 0, 0) + 1(3, 1, 0) + 2 (5, 0, −1) = (6, 1, −2).

Starting from the point P, we can reach the point (6, 1, −2) by taking 1 “step” in the
direction of a, then 2 “steps” in the direction of b.

[Figure: the plane q, showing the point P = (−7, 0, 0), the vectors a = (3, 1, 0) and 2b = 2 (5, 0, −1), and the point (6, 1, −2).]

As another example, (λ, µ) = (−1, 3) produces the point (not depicted):

(−7, 0, 0) + (−1)(3, 1, 0) + 3 (5, 0, −1) = (5, −1, −3).

Starting from the point P, we can reach the point (5, −1, −3) by taking 1 “step” in the
direction opposite to a and 3 “steps” in the direction of b.
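To see the parametric form at work, the Python sketch below generates points R = P + λa + µb for a few parameter values and confirms that each satisfies the cartesian equation. The function name is illustrative only, and the values of P, a, and b are those of Example 720.

```python
def point_on_plane(p, a, b, lam, mu):
    """Return the point R = P + lam*a + mu*b as a tuple."""
    return tuple(pi + lam * ai + mu * bi for pi, ai, bi in zip(p, a, b))


# The plane -x + 3y - 5z = 7 from Example 720.
P, a, b = (-7, 0, 0), (3, 1, 0), (5, 0, -1)

for lam, mu in [(1, 2), (-1, 3), (0, 0)]:
    x, y, z = point_on_plane(P, a, b, lam, mu)
    # Every generated point should satisfy the cartesian equation.
    print((lam, mu), (x, y, z), -x + 3 * y - 5 * z == 7)
# (1, 2) (6, 1, -2) True
# (-1, 3) (5, -1, -3) True
# (0, 0) (-7, 0, 0) True
```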

Exercise 237. Below are three planes given in vector form. Rewrite each into both
cartesian and parametric forms. (Answer on p. 1502.)

(a) r ⋅ (−1, 2, 5) = 5. (b) r ⋅ (0, 0, 1) = 0. (c) r ⋅ (1, −3, 5) = −2.


54.2. Parametric to Vector or Cartesian Form
Given a plane in parametric form, rewriting it into vector or cartesian form is easy:

Example 721. We are given a plane in parametric form:

r = (7, 3, 4) + λ (8, 3, 4) + µ (9, 3, 7) = (7 + 8λ + 9µ, 3 + 3λ + 3µ, 4 + 4λ + 7µ) (λ, µ ∈ R).

This plane contains the vectors (8, 3, 4) and (9, 3, 7). And so, a normal vector of this
plane is the vector product of these two vectors:

(8, 3, 4) × (9, 3, 7) = (9, −20, −3).

The plane contains the point (7, 3, 4). Since (7, 3, 4) ⋅ (9, −20, −3) = 63 − 60 − 12 = −9, the
plane may be described by:

r ⋅ (9, −20, −3) = −9 or 9x − 20y − 3z = −9.

Example 722. We are given a plane in parametric form:

r = (17 + 3λ − 2µ, 2µ − 2, 5λ) (λ, µ ∈ R).

First, rewrite the above as:

r = (17, −2, 0) + λ (3, 0, 5) + µ (−2, 2, 0) (λ, µ ∈ R).

This plane contains the vectors (3, 0, 5) and (−2, 2, 0). And so, a normal vector is:

(3, 0, 5) × (−2, 2, 0) = (−10, −10, 6).

Observe that (−10, −10, 6) ∥ (5, 5, −3). Hence, by Fact 97, (5, 5, −3) is also a normal
vector of the plane. (It’s nice to “simplify” the normal vector as much as possible — this
will usually make our subsequent calculations slightly easier.)
The plane contains the point (17, −2, 0). Since (17, −2, 0) ⋅ (5, 5, −3) = 85 − 10 + 0 = 75,
the plane may be described by:

r ⋅ (5, 5, −3) = 75 or 5x + 5y − 3z = 75.


Example 723. We are given a plane in parametric form:

r = (λ − µ − 2, 14 + 5λ + 3µ, 5 + µ + 7λ) (λ, µ ∈ R).

First, rewrite the above as:

r = (−2, 14, 5) + λ (1, 5, 7) + µ (−1, 3, 1) (λ, µ ∈ R).

This plane contains the vectors (1, 5, 7) and (−1, 3, 1). And so, a normal vector is:

(1, 5, 7) × (−1, 3, 1) = (−16, −8, 8).

Observe that (−16, −8, 8) ∥ (2, 1, −1). So, (2, 1, −1) is also a normal vector.
The plane contains the point (−2, 14, 5). Since (−2, 14, 5) ⋅ (2, 1, −1) = −4 + 14 − 5 = 5, the
plane may be described by:

r ⋅ (2, 1, −1) = 5 or 2x + y − z = 5.
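The recipe used in the last three examples can be sketched in a few lines of Python. This is only an illustrative sketch: the function names cross, dot, and parametric_to_vector_form are my own choices and do not come from any library.

```python
def cross(u, v):
    """Vector (cross) product of two 3D vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dot(u, v):
    """Scalar (dot) product of two 3D vectors."""
    return sum(a * b for a, b in zip(u, v))


def parametric_to_vector_form(point, a, b):
    """Given the plane r = point + λa + µb, return (n, d) with r ⋅ n = d."""
    n = cross(a, b)    # a normal vector of the plane
    d = dot(point, n)  # the given point must satisfy r ⋅ n = d
    return n, d


# Example 721: r = (7, 3, 4) + λ(8, 3, 4) + µ(9, 3, 7)
n, d = parametric_to_vector_form((7, 3, 4), (8, 3, 4), (9, 3, 7))
print(n, d)  # (9, -20, -3) -9
```

The sketch does not "simplify" the normal vector. For Example 723 it returns n = (−16, −8, 8) and d = −40, which describes the same plane as r ⋅ (2, 1, −1) = 5 (divide both sides by −8).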

Exercise 238. Below are three planes given in parametric form. Rewrite each into both
vector and cartesian form. (Answer on p. 1502.)
(a) r = (1, 2, 3) + λ (4, 5, 6) + µ (7, 8, 9) (λ, µ ∈ R).
(b) r = (λ − µ, 4λ + 5, 0) (λ, µ ∈ R).
(c) r = (1 + µ, 1 + λ, λ + µ) (λ, µ ∈ R).


55. Four Ways to Uniquely Determine a Plane
Recall that there are two ways to uniquely determine a line. A line can be uniquely
determined by (a) two distinct points; or (b) a point and a vector.
Similarly, there are Four Ways to uniquely determine a plane. A plane can be uniquely
determined by:[246]
1. A point and a normal vector;
2. A point and two vectors (that aren’t parallel);
3. Two points and a vector (that isn’t parallel to the vector between the two points); or
4. Three points (that aren’t collinear).
We’ll give two examples of each of the Four Ways. These examples will also serve as a
summary of what we’ve learnt so far about planes.
First, two examples where we’re given a point and a normal vector of the plane:

Example 724. A plane contains the point (1, 2, 3) and has normal vector (1, 1, 0).
Compute (1, 2, 3) ⋅ (1, 1, 0) = 1 + 2 + 0 = 3. Thus, this plane may be described in vector or
cartesian form by:

r ⋅ (1, 1, 0) = 3 or x + y = 3.

This plane also contains the non-parallel vectors (1, −1, 0) and (0, 0, 1). Thus, it may be
described in parametric form by:

r = (1, 2, 3) + λ (1, −1, 0) + µ (0, 0, 1) = (1 + λ, 2 − λ, 3 + µ) (λ, µ ∈ R).

Example 725. A plane contains the point (0, 0, 1) and has normal vector (2, −1, 1).
Compute (0, 0, 1) ⋅ (2, −1, 1) = 0 + 0 + 1 = 1. Thus, this plane may be described in vector
or cartesian form by:

r ⋅ (2, −1, 1) = 1 or 2x − y + z = 1.

This plane also contains the non-parallel vectors (1, 2, 0) and (0, 1, 1). Thus, it may be
described in parametric form by:

r = (0, 0, 1) + λ (1, 2, 0) + µ (0, 1, 1) = (λ, 2λ + µ, 1 + µ) (λ, µ ∈ R).

Second, two examples where we’re given one point and two vectors (that aren’t parallel):

[246] This is an assertion that we formally prove only in Ch. 119.11 (Appendices).
Example 726. A plane contains the point (1, 2, 3) and the non-parallel vectors (5, 4, 3)
and (1, −1, 2). It may thus be described by the following parametric equation:

r = (1, 2, 3) + λ (5, 4, 3) + µ (1, −1, 2) = (1 + 5λ + µ, 2 + 4λ − µ, 3 + 3λ + 2µ) (λ, µ ∈ R).

This plane has normal vector (5, 4, 3) × (1, −1, 2) = (11, −7, −9).
Compute (1, 2, 3) ⋅ (11, −7, −9) = 11 − 14 − 27 = −30. Thus, this plane may be described in
vector or cartesian form by:
r ⋅ (11, −7, −9) = −30 or 11x − 7y − 9z = −30.

Example 727. A plane contains the point (5, 0, 1) and the non-parallel vectors (1, 1, 8)
and (1, 0, 1). It may thus be described by the following parametric equation:

r = (5, 0, 1) + λ (1, 1, 8) + µ (1, 0, 1) = (5 + λ + µ, λ, 1 + 8λ + µ) (λ, µ ∈ R).

This plane has normal vector (1, 1, 8) × (1, 0, 1) = (1, 7, −1).

Compute (5, 0, 1) ⋅ (1, 7, −1) = 5 + 0 − 1 = 4. Thus, this plane may be described in vector
or cartesian form by:
r ⋅ (1, 7, −1) = 4 or x + 7y − z = 4.

Third, two examples where we’re given two points and a vector (that isn’t parallel to
the vector between the two points):

Example 728. A plane contains the points (0, 0, 3) and (1, 4, 5), and the vector (3, 2, 1).
The vector between the two points is (1, 4, 5) − (0, 0, 3) = (1, 4, 2) and isn’t parallel to the
vector (3, 2, 1). Thus, this plane may be described by the following parametric equation:

r = (0, 0, 3) + λ (3, 2, 1) + µ (1, 4, 2) = (3λ + µ, 2λ + 4µ, 3 + λ + 2µ) (λ, µ ∈ R).

This plane has normal vector (3, 2, 1) × (1, 4, 2) = (0, −5, 10).
It thus also has normal vector (0, −1, 2).
Compute (0, 0, 3) ⋅ (0, −1, 2) = 0 + 0 + 6 = 6. Thus, this plane may be described in vector
or cartesian form by:
r ⋅ (0, −1, 2) = 6 or −y + 2z = 6.
Example 729. A plane contains the points (8, −2, 0) and (3, 6, 9), and the vector (0, 1, 1).
The vector between the two points is (3, 6, 9) − (8, −2, 0) = (−5, 8, 9) and isn’t parallel to
the vector (0, 1, 1). Thus, this plane may be described in parametric form as:

r = (8, −2, 0) + λ (0, 1, 1) + µ (−5, 8, 9) = (8 − 5µ, −2 + λ + 8µ, λ + 9µ) (λ, µ ∈ R).

This plane has normal vector (0, 1, 1) × (−5, 8, 9) = (1, −5, 5).
Compute (8, −2, 0)⋅(1, −5, 5) = 8+10+0 = 18. Thus, this plane may be described in vector
and cartesian forms by r ⋅ (1, −5, 5) = 18 and x − 5y + 5z = 18.

Fourth and lastly, two examples where we’re given three points (that aren’t collinear):

Example 730. A plane contains the points (1, 2, 3), (4, 5, 8), and (2, 3, 5).
The vector between the first two points is (4, 5, 8) − (1, 2, 3) = (3, 3, 5), while that between
the first and last is (2, 3, 5) − (1, 2, 3) = (1, 1, 2). Since (3, 3, 5) ∥/ (1, 1, 2), this plane may
be described in parametric form as:

r = (1, 2, 3) + λ (3, 3, 5) + µ (1, 1, 2) = (1 + 3λ + µ, 2 + 3λ + µ, 3 + 5λ + 2µ) (λ, µ ∈ R).

This plane has normal vector (3, 3, 5) × (1, 1, 2) = (1, −1, 0).
Compute (1, 2, 3) ⋅ (1, −1, 0) = 1 − 2 + 0 = −1. Thus, this plane may be described in vector
and cartesian forms by r ⋅ (1, −1, 0) = −1 and x − y = −1.

Example 731. A plane contains the points (1, 0, 0), (0, 1, 0), and (0, 0, 1).
The vector between the first two points is (0, 1, 0) − (1, 0, 0) = (−1, 1, 0), while that between
the first and last is (0, 0, 1) − (1, 0, 0) = (−1, 0, 1). Since (−1, 1, 0) ∥/ (−1, 0, 1), this plane
may be described in parametric form as:

r = (1, 0, 0) + λ (−1, 1, 0) + µ (−1, 0, 1) = (1 − λ − µ, λ, µ) (λ, µ ∈ R).

This plane has normal vector (−1, 1, 0) × (−1, 0, 1) = (1, 1, 1).

Compute (1, 0, 0) ⋅ (1, 1, 1) = 1 + 0 + 0 = 1. Thus, this plane may be described in vector
and cartesian forms by r ⋅ (1, 1, 1) = 1 and x + y + z = 1.
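The "three points" case can also be checked numerically. Below is a minimal Python sketch under my own naming (sub, cross, dot, and plane_through_points are illustrative helpers, not library functions). It forms the two vectors from the first point, takes their vector product as a normal vector, and refuses collinear points.

```python
def sub(p, q):
    """The vector from the point q to the point p."""
    return tuple(a - b for a, b in zip(p, q))


def cross(u, v):
    """Vector (cross) product of two 3D vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dot(u, v):
    """Scalar (dot) product of two 3D vectors."""
    return sum(a * b for a, b in zip(u, v))


def plane_through_points(p1, p2, p3):
    """Return (n, d) such that the plane through p1, p2, p3 is r ⋅ n = d."""
    a = sub(p2, p1)  # vector between the first two points
    b = sub(p3, p1)  # vector between the first and last points
    n = cross(a, b)
    if n == (0, 0, 0):
        # a and b are parallel, so the three points are collinear
        raise ValueError("collinear points do not determine a unique plane")
    return n, dot(p1, n)


print(plane_through_points((1, 2, 3), (4, 5, 8), (2, 3, 5)))  # Example 730
print(plane_through_points((1, 0, 0), (0, 1, 0), (0, 0, 1)))  # Example 731
```

The parametric form needs no computation beyond a and b: it is simply r = p1 + λa + µb.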

Exercise 239. In each of the following, three points are given. Describe the plane that
contains all three points in vector, cartesian, and parametric form.
(a) (7, 3, 4), (8, 3, 4), and (9, 3, 7). (c) (8, 5, 9), (8, 4, 5), an
(b) (8, 0, 2), (4, 4, 3), and (2, 7, 2). (Answe


56. The Angle between a Line and a Plane
What is the angle between a line and a plane?
Suppose the line l has direction vector v, the plane q has normal vector n, and the non-
obtuse angle between v and n is θ.
Let A denote the angle between the line l and the plane q.
[Figure: two cases (Case 1 and Case 2), each showing the line l with direction vector v, the plane q with normal vector n, the angle θ between v and n, and the angle A = 0.5π − θ between l and q.]

As the above figure suggests, A is simply the complement of θ. That is:

A = π/2 − θ.

Recall (Fact 74) that θ, the non-obtuse angle between v and n, is given by:

θ = cos⁻¹ (∣v ⋅ n∣ / (∣v∣ ∣n∣)).

Thus: A = π/2 − θ = π/2 − cos⁻¹ (∣v ⋅ n∣ / (∣v∣ ∣n∣)).

Recall also the following trigonometric identity (Fact 35):

sin⁻¹ x + cos⁻¹ x = π/2.

And so, we have: A = π/2 − cos⁻¹ (∣v ⋅ n∣ / (∣v∣ ∣n∣)) = sin⁻¹ (∣v ⋅ n∣ / (∣v∣ ∣n∣)).

The above discussion motivates the following Definition:

Definition 144. Suppose a line has direction vector v and a plane has normal vector n.
Then the angle between the line and the plane is the following number:

sin⁻¹ (∣v ⋅ n∣ / (∣v∣ ∣n∣)).

Observe that to compute the angle between a line and a plane, all we need are a direction
vector of the line and a normal vector of the plane. Three examples:


Example 732. The angle between a line with direction vector (9, 1, 3) and a plane
with normal vector (1, 1, 1) is:

sin⁻¹ (∣(9, 1, 3) ⋅ (1, 1, 1)∣ / (∣(9, 1, 3)∣ ∣(1, 1, 1)∣))
= sin⁻¹ (∣9 + 1 + 3∣ / (√(9² + 1² + 3²) √(1² + 1² + 1²)))
= sin⁻¹ (13 / (√91 √3)) ≈ 0.906.

[Figure: the line and the plane with normal vector n = (1, 1, 1), drawn on x-, y-, and z-axes; the angle between them is marked 0.906.]

Example 733. The angle between a line with direction vector (1, 0, 1) and a plane
with normal vector (−1, −1, 0) is:

sin⁻¹ (∣(1, 0, 1) ⋅ (−1, −1, 0)∣ / (∣(1, 0, 1)∣ ∣(−1, −1, 0)∣))
= sin⁻¹ (∣−1 + 0 + 0∣ / (√(1² + 0² + 1²) √((−1)² + (−1)² + 0²)))
= sin⁻¹ (∣−1∣ / (√2 √2)) = sin⁻¹ (1/2) = π/6.

[Figure: the line and the plane with normal vector n = (−1, −1, 0), drawn on y- and z-axes; the angle between them is marked π/6.]

Example 734. The angle between a line with direction vector (1, 0, 1) and a plane with
normal vector (0, 1, 0) is:

sin⁻¹ (∣(1, 0, 1) ⋅ (0, 1, 0)∣ / (∣(1, 0, 1)∣ ∣(0, 1, 0)∣)) = sin⁻¹ (∣0 + 0 + 0∣ / (√(1² + 0² + 1²) √(0² + 1² + 0²))) = sin⁻¹ (∣0∣ / (√2 √1)) = sin⁻¹ 0 = 0.
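To check answers like these on a computer, Definition 144 translates directly into a few lines of Python. This is a sketch only; the name angle_line_plane is my own, and the clamp to 1.0 is an added safeguard against floating-point rounding that is not part of the definition.

```python
import math


def angle_line_plane(v, n):
    """Angle (in radians) between a line with direction vector v
    and a plane with normal vector n, as in Definition 144."""
    abs_dot = abs(sum(a * b for a, b in zip(v, n)))
    len_v = math.sqrt(sum(a * a for a in v))
    len_n = math.sqrt(sum(a * a for a in n))
    # Rounding could push the ratio a hair above 1, where asin is undefined.
    ratio = min(1.0, abs_dot / (len_v * len_n))
    return math.asin(ratio)


print(round(angle_line_plane((9, 1, 3), (1, 1, 1)), 3))  # Example 732
print(angle_line_plane((1, 0, 1), (-1, -1, 0)))          # Example 733, about π/6
print(angle_line_plane((1, 0, 1), (0, 1, 0)))            # Example 734
```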

Exercise 240. Find the angle between the given line and plane. (Answer on p. 1504.)
(a) Line: r = (−1, 2, 3) + λ (−1, 1, 0) (λ ∈ R).
Plane: r ⋅ (3, 4, 5) = 0.

(b) Line: Contains the points (−1, 2, 3) and (−1, 4, 9).

Plane: r = (2, 0, 0) + λ (−3, 1, 0) + µ (0, 5, −3) (λ, µ ∈ R).

(c) Line: Contains the points (−1, 2, 3) and (0, 11, 11).
Plane: Contains the points (1.5, 0, 0) and (0, 0, 1.5) and the vector (4, −1, 0).


56.1. When a Line and a Plane Are Parallel, Perp., or Intersect

Definition 145. Let θ be the angle between a line and a plane. The line and plane are
said to be (a) parallel if θ = 0; and (b) perpendicular if θ = π/2.

As usual, if a line l and a plane q are parallel, then as shorthand we’ll write l ∥ q. And if
they’re perpendicular, we’ll write l ⊥ q.
“Obviously”, a line (e.g. l1 ) is parallel to a plane if and only if the line’s direction vector is
perpendicular to the plane’s normal vector.
Similarly, a line (e.g. l2 ) is perpendicular to a plane if and only if the line’s direction vector is
parallel to the plane’s normal vector.

[Figure: the plane with normal vector n, a line l1 parallel to the plane, and a line l2 perpendicular to the plane.]

Let’s state and prove these obviosities formally:

Fact 106. Suppose the line l has direction vector v and the plane q has normal vector n.
Then (a) l ∥ q ⇐⇒ v ⊥ n; and (b) l ⊥ q ⇐⇒ v ∥ n.

Proof. Let θ be the angle between l and q. By Definitions 145, 144, and 112, and Fact 71:

(a) l ∥ q ⇐⇒ θ = 0 ⇐⇒ sin⁻¹ (∣v ⋅ n∣ / (∣v∣ ∣n∣)) = 0 ⇐⇒ v ⋅ n = 0 ⇐⇒ v ⊥ n.

(b) l ⊥ q ⇐⇒ θ = π/2 ⇐⇒ sin⁻¹ (∣v ⋅ n∣ / (∣v∣ ∣n∣)) = π/2 ⇐⇒ ∣v ⋅ n∣ / (∣v∣ ∣n∣) = 1 ⇐⇒ v ∥ n.

It will be nice if we can speak of a vector and a plane being parallel or perpendicular:

Definition 146. Let q be a plane with normal vector n and v be a vector. Then q and
v are said to be (a) parallel if v ⊥ n; and (b) perpendicular if v ∥ n.

The following result is intuitively “obvious”:

Fact 107. Given a line and a plane, there are three possibilities. The line and plane are:
(a) Parallel and do not intersect at all.
(b) Parallel and the line lies entirely on the plane.
(c) Non-parallel and intersect at exactly one point.

Proof. See p. 1308 in the Appendices.

Example 735. The three lines l1 , l2 , and l3 , and plane q
illustrate all three possibilities given in Fact 107:
1. l1 is parallel to q and does not intersect q.
2. l2 is parallel to q and lies entirely on q.
3. l3 isn’t parallel to q and intersects q at exactly one point.

[Figure: the plane q and the lines l1 , l2 , and l3 .]


From Fact 107, the following Corollary is immediate:

Corollary 20. If a line and plane are parallel, then they intersect if and only if the line
lies completely on the plane.

Example 736. The line l and the plane q are described by:

r = (3, 5, 5) + λ (9, 1, 3) (λ ∈ R) and r ⋅ (1, 1, 1) = 3.

l has direction vector v = (9, 1, 3), while q has normal vector n = (1, 1, 1).
Observe that v ⊥/ n, because v ⋅ n = (9, 1, 3) ⋅ (1, 1, 1) ≠ 0. And so, by Fact 106, l ∥/ q.
Since l ∥/ q, by Fact 107, they share exactly one intersection point.

[Figure: the line l meets the plane q at the point (1/13)(−51, 55, 35).]

To find this intersection point, simply plug in a generic point of l into the equation for q:

[(3, 5, 5) + λ̂ (9, 1, 3)] ⋅ (1, 1, 1) = 3 ⇐⇒ 13 + 13λ̂ = 3 ⇐⇒ λ̂ = −10/13.

Hence, l and q intersect at: (3, 5, 5) + λ̂ (9, 1, 3) = (3, 5, 5) − (10/13)(9, 1, 3) = (1/13)(−51, 55, 35).

Example 737. The line l and the plane q are described by:

r = (3, 5, 5) + λ (9, 1, 3) (λ ∈ R) and r ⋅ (1, 0, −3) = −6.

l has direction vector v = (9, 1, 3), while q has normal vector n = (1, 0, −3).
Observe that v ⋅ n = (9, 1, 3) ⋅ (1, 0, −3) = 0, so that v ⊥ n and hence l ∥ q.
Since l ∥ q, by Fact 107, there are two possibilities: either l lies entirely on q; or l and q don’t intersect at all.

[Figure: the line l and the plane q.]

Here’s a trick to find out which of these two possibilities holds. Observe that the line l
contains the point (3, 5, 5).[247] So, simply check if this point is also on the plane q:

(3, 5, 5) ⋅ (1, 0, −3) = 3 + 0 − 15 = −12 ≠ −6.

Since this point does not satisfy q’s vector equation, it is not on q. So, it cannot be that
l lies entirely on q. That leaves only one possibility: l and q do not intersect at all.

[247] To see this, simply plug λ = 0 into the line’s vector equation.
Example 738. The line l and the plane q are described by:

r = (3, 5, 3) + λ (9, 1, 3) (λ ∈ R) and r ⋅ (1, 0, −3) = −6.

Again, l has direction vector v = (9, 1, 3), while q has normal vector n = (1, 0, −3). And
so again, l ∥ q.
And again, two possibilities: either l lies entirely on q or they don’t intersect at all.
To find out which it is, we use the same trick as before. Observe that the point (3, 5, 3)
is on l. Let’s check if this point is also on the plane q:

(3, 5, 3) ⋅ (1, 0, −3) = 3 + 0 − 9 = −6.

Yup, this time it is. Since l and q are parallel and share an intersection point, it must be
that l lies entirely on q.

[Figure: the line l lies entirely on the plane q.]
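The reasoning in Examples 736 to 738 follows one fixed procedure, which the following Python sketch spells out. The function name line_plane and the wording of the returned labels are my own; exact fractions are used so that answers such as −10/13 are not rounded.

```python
from fractions import Fraction


def dot(u, v):
    """Scalar (dot) product of two 3D vectors."""
    return sum(a * b for a, b in zip(u, v))


def line_plane(p, v, n, d):
    """Classify the line r = p + λv against the plane r ⋅ n = d.

    Returns a pair (description, intersection point or None)."""
    if dot(v, n) == 0:
        # v ⊥ n, so the line and plane are parallel (Fact 106).
        # Test whether the known point p of the line is on the plane.
        if dot(p, n) == d:
            return "parallel, line lies on plane", None
        return "parallel, no intersection", None
    # Otherwise solve (p + λv) ⋅ n = d for λ.
    lam = Fraction(d - dot(p, n), dot(v, n))
    point = tuple(pi + lam * vi for pi, vi in zip(p, v))
    return "one intersection point", point


print(line_plane((3, 5, 5), (9, 1, 3), (1, 1, 1), 3))    # Example 736
print(line_plane((3, 5, 5), (9, 1, 3), (1, 0, -3), -6))  # Example 737
print(line_plane((3, 5, 3), (9, 1, 3), (1, 0, -3), -6))  # Example 738
```

The sketch assumes integer (or Fraction) inputs. With floating-point inputs, the exact test dot(v, n) == 0 would need a tolerance instead.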

Exercise 241. In each of the following, a line l and a plane q are given. For each,
determine which of the three possibilities given in Fact 107 holds. And if the line and
plane intersect, find their intersection point. (Answer on p. 1504.)
(a) l: r = (4, 5, 6) + λ (2, 3, 5) (λ ∈ R).
q: r ⋅ (−10, 0, 4) = −26.

(b) l: Contains the points (5, 5, 6) and (3, 2, 1).

q: r = (3, 0, 1) + λ (2, 0, 5) + µ (2, 1, 5) (λ, µ ∈ R).

(c) l: Contains the points (4, 5, 6) and (6, 8, 11).

q: Contains the points (2, 0, −2) and (2, 1, −2) and the vector (3, 0, 10).


57. The Angle between Two Planes

Definition 147. The angle between two planes is the non-obtuse angle between their
normal vectors.

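Example 739. The planes q1 and q2 have normal vectors n1 and n2 .
Let θ be the (non-obtuse) angle between the normal vectors n1 and n2 .
Then θ is also the angle between q1 and q2 .

[Figure: the planes q1 and q2 with normal vectors n1 and n2 ; the angle θ is marked both between n1 and n2 and between q1 and q2 .]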

By Fact 74, the non-obtuse angle between u and v is cos⁻¹ (∣u ⋅ v∣ / (∣u∣ ∣v∣)).
And so, we have the following “formula” for the angle between two planes:

Fact 108. The angle between two planes with normal vectors u and v is:

cos⁻¹ (∣u ⋅ v∣ / (∣u∣ ∣v∣)).

Example 740. Two planes are described by r ⋅ (2, 1, 3) = 26 and r ⋅ (−3, 0, 5) = −25. The
angle between them is the (non-obtuse) angle between their normal vectors:

θ = cos⁻¹ (∣(2, 1, 3) ⋅ (−3, 0, 5)∣ / (∣(2, 1, 3)∣ ∣(−3, 0, 5)∣)) = cos⁻¹ (∣9∣ / (√14 √34)) ≈ 1.146.

Example 741. Two planes are described by r ⋅ (1, 1, 1) = 12 and r ⋅ (−1, −1, 0) = −1. The
angle between them is the (non-obtuse) angle between their normal vectors:

θ = cos⁻¹ (∣(1, 1, 1) ⋅ (−1, −1, 0)∣ / (∣(1, 1, 1)∣ ∣(−1, −1, 0)∣)) = cos⁻¹ (∣−2∣ / (√3 √2)) = cos⁻¹ √(2/3) ≈ 0.615.
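Fact 108 is just as easy to evaluate numerically. A short Python sketch follows, with angle_between_planes as my own illustrative name; as a precaution that is not part of Fact 108, the ratio is clamped to 1.0 to guard against floating-point rounding.

```python
import math


def angle_between_planes(u, v):
    """Angle (in radians) between two planes with normal vectors u and v."""
    abs_dot = abs(sum(a * b for a, b in zip(u, v)))
    len_u = math.sqrt(sum(a * a for a in u))
    len_v = math.sqrt(sum(a * a for a in v))
    # Guard against a ratio marginally above 1 caused by rounding.
    return math.acos(min(1.0, abs_dot / (len_u * len_v)))


print(round(angle_between_planes((2, 1, 3), (-3, 0, 5)), 3))    # Example 740
print(round(angle_between_planes((1, 1, 1), (-1, -1, 0)), 3))   # Example 741
```

Note that only the normal vectors matter: the constants on the right-hand sides of the planes' equations play no role in the angle.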

Exercise 242. Find the angle between the given planes. (Answers on p. 1505.)
(a) r ⋅ (−1, −2, −3) = 1 and r ⋅ (3, 4, 5) = 2.
(b) One plane contains the vectors (1, −1, 0) and (3, 5, −1). The other contains the vectors
(0, 1, 0) and (10, 2, 3).
(c) One plane contains the points (1, 1, 0), (3, 0, 0), and (0, 0, 1). The other contains the
points (1, −1, 0), (1, 0, −1), and (0, 3, 1).


57.1. When Two Planes Are Parallel, Perp., or Intersect

Definition 148. Let θ be the angle between two planes. We say that the two planes are:

(a) Parallel if θ = 0; and (b) Perpendicular if θ = π/2.

If the planes q and r are parallel, then as shorthand, we’ll write q ∥ r. And if they’re
perpendicular, we’ll write q ⊥ r.
“Obviously”, two planes are parallel (or perpendicular) if and only if their normal vectors
are parallel (or perpendicular):

Fact 109. Let q and r be planes with normal vectors u and v. Then:

(a) q ∥ r ⇐⇒ u ∥ v; and (b) q ⊥ r ⇐⇒ u ⊥ v.

Proof. See p. 1309 in the Appendices.

Example 742. The planes q1 , q2 , and q3 have normal vectors v1 , v2 , and v3 . We have:
• v1 ⊥ v2 and q1 ⊥ q2 .
• v1 ⊥ v3 and q1 ⊥ q3 .
• v2 ∥ v3 and q2 ∥ q3 .

[Figure: the planes q1 and q2 with their normal vectors v1 and v2 .]


The following result is intuitively “obvious”:

Fact 110. If two planes are parallel, then they are either identical or do not intersect.

Proof. See p. 1309 in the Appendices.

Example 743. The planes q1 and q2 are described by r ⋅ (3, −3, −1) = 1 and r ⋅ (−6, 6, 2) = 5.
They are parallel because their normal vectors are parallel: (3, −3, −1) ∥ (−6, 6, 2).
Now pick any point on q1 , for example (0, 0, −1). Check if this point is on q2 :

(0, 0, −1) ⋅ (−6, 6, 2) = 0 + 0 − 2 = −2 ≠ 5.

It isn’t. Since the two planes are not identical, by Fact 110, they do not intersect at all.

[Figure: the parallel planes q1 and q2 .]


Example 744. The planes q1 and q2 are described by r ⋅ (1, 0, 2) = −3 and r ⋅ (−2, 0, −4) = 6.
They are parallel because their normal vectors are parallel: (1, 0, 2) ∥ (−2, 0, −4).
Now pick any point on q1 , for example (−1, 0, −1). Check if this point is on q2 :

(−1, 0, −1) ⋅ (−2, 0, −4) = 2 + 0 + 4 = 6.

Yup, it is. Since the two planes share at least one
intersection point, by Fact 110, they are identical.

[Figure: the identical planes q1 = q2 .]

Here’s another “obvious” result:

Fact 111. If two planes are not parallel, then they must intersect.

Proof. See p. 1310 in the Appendices.

Example 745. The planes q1 and q2 aren’t parallel. So by Fact 111, they must intersect.


[Figure: the planes q1 and q2 and their intersection line.]

In fact, and as should be intuitively “obvious”, q1 and q2 must intersect along a line.

As the above example suggests, two non-parallel planes must intersect along a line.
Now, what do we know about this intersection line? Well, “obviously”, its direction vector
is parallel to both planes and is thus also perpendicular to both planes’ normal vectors.
Let n and m be two planes’ normal vectors. By Fact 94, the only vectors perpendicular to
both n and m are those parallel to n × m.
Hence, their intersection line must have direction vector n × m. Altogether then:

Fact 112. Suppose two non-parallel planes have normal vectors n and m. Then their
intersection is a line with direction vector n × m.

Proof. See p. 1310 in the Appendices.


Example 746. The planes q1 and q2 are described by:

r ⋅ n1 = r ⋅ (−1, 2, −3) = 4 and r ⋅ n2 = r ⋅ (5, −6, 7) = 0.

Clearly, n1 ∥/ n2 . And so by Fact 112, q1 and q2 must intersect along a line with direction
vector n1 × n2 = (−1, 2, −3) × (5, −6, 7) = (−4, −8, −4) or (1, 2, 1).
Recall that to fully describe a line, we need a direction vector and a point. We already
have a direction vector. Let us now find some point P that is on the intersection line.
To do so, first write out the two planes’ cartesian equations:

−x + 2y − 3z = 4 and 5x − 6y + 7z = 0.

The solutions to the above system of (two) equations gives us the two planes’ intersection
points. Note that with three variables and two equations, this system of equations has
infinitely many solutions and hence infinitely many intersection points. And of
course, the set of all these intersection points is the intersection line.
Here’s a simple trick to find any one such intersection point. As always, zero is our
friend. So, let’s look for an intersection point whose x-coordinate is zero. In other words,
let’s simply plug x = 0 into the above equations to get:

2y − 3z = 4 (1) and −6y + 7z = 0 (2).

And now, we can easily solve this system of (two) equations (with two variables): (2) plus
3 × (1) yields −2z = 12 or z = −6. Hence, y = −7.

Altogether then, an intersection point shared by q1 and q2 is P = (0, −7, −6). And thus,
their intersection line may be described by:

r = (0, −7, −6) + λ (1, 2, 1) (λ ∈ R).

[Figure: the planes with normal vectors n1 = (−1, 2, −3) and n2 = (5, −6, 7), the point P = (0, −7, −6), and the intersection line, which has direction vector (1, 2, 1).]


Together, Facts 110 and 112 say the following:

Corollary 21. Given two planes, there are exactly three possibilities. They are:
(a) Identical and thus also parallel;
(b) Parallel and do not intersect at all; or
(c) Non-parallel and intersect along a line.
And hence, two distinct planes intersect if and only if they are not parallel.

Proof. If two distinct planes are parallel, then by Fact 110, they do not intersect at all.
And if they aren’t parallel, then by Fact 112, they intersect along a line.

Example 747. The planes q1 and q2 are described by:

r ⋅ n1 = r ⋅ (−3, 7, 1) = 2 and r ⋅ n2 = r ⋅ (1, 2, 1) = 0.

Clearly, n1 ∥/ n2 . And so by Fact 112, they must intersect along a line with direction
vector (−3, 7, 1) × (1, 2, 1) = (5, 4, −13).
To find an intersection point, write out the two planes’ cartesian equations:

−3x + 7y + z = 2 and x + 2y + z = 0.

Again, there’ll be infinitely many intersection points. To find one, use the same trick as
before. Plug x = 0 into the above equations to get:

7y + z = 2 (1) and 2y + z = 0 (2).

Solving, we have y = 0.4 and z = −0.8.

Hence, an intersection point shared by q1 and q2 is P = (0, 0.4, −0.8). And thus, their
intersection line is described by:

r = (0, 0.4, −0.8) + λ (5, 4, −13) (λ ∈ R).

[Figure: the planes q1 and q2 with normal vectors n1 = (−3, 7, 1) and n2 = (1, 2, 1), the point P = (0, 0.4, −0.8), and the intersection line, which has direction vector (5, 4, −13).]


Example 748. The planes q1 and q2 are described by:

r ⋅ n1 = r ⋅ (0, 4, 5) = 0 and r ⋅ n2 = r ⋅ (3, 4, 5) = 1.

Clearly, n1 ∥/ n2 . And so by Fact 112, they must intersect along a line with direction
vector (0, 4, 5) × (3, 4, 5) = (0, 15, −12) or (0, 5, −4).
To find an intersection point, write out the two planes’ cartesian equations:

4y + 5z = 0 and 3x + 4y + 5z = 1.

Again, there’ll be infinitely many intersection points. It turns out that this time, our
“plug in x = 0 trick” won’t work so nicely. Let’s try it anyway and see what happens:

4y + 5z = 0 (1) and 4y + 5z = 1 (2),

which are clearly contradictory. What this contradiction means is that the two planes
do not share any intersection point whose x-coordinate is 0.
No big deal. Instead of plugging in x = 0, let’s try y = 0 instead:

5z = 0 (3) and 3x + 5z = 1 (4).

Solving, we have z = 0 and x = 1/3. Hence, an intersection point shared by q1 and q2 is
P = (1/3, 0, 0). And thus, their intersection line is described by:

r = (1/3, 0, 0) + λ (0, 5, −4) (λ ∈ R).

[Figure: the planes with normal vectors n1 = (0, 4, 5) and n2 = (3, 4, 5), the point P = (1/3, 0, 0), and the intersection line, which has direction vector (0, 5, −4).]
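The "plug in zero" trick from the last three examples, including the fallback when x = 0 fails, can be sketched in Python as follows. The function names are my own, and solving the remaining two equations by Cramer's rule is my choice for the sketch (the examples above solve them by hand).

```python
from fractions import Fraction


def cross(u, v):
    """Vector (cross) product of two 3D vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def intersection_line(n1, d1, n2, d2):
    """Intersect the planes r ⋅ n1 = d1 and r ⋅ n2 = d2.

    Returns (point, direction), or None if the planes are parallel."""
    direction = cross(n1, n2)
    if direction == (0, 0, 0):
        return None  # parallel normal vectors, so parallel planes
    # Try x = 0, then y = 0, then z = 0, and solve for the other two.
    for zero in range(3):
        i, j = [k for k in range(3) if k != zero]
        det = n1[i] * n2[j] - n1[j] * n2[i]
        if det == 0:
            continue  # no intersection point with this coordinate equal to 0
        point = [Fraction(0)] * 3
        point[i] = Fraction(d1 * n2[j] - n1[j] * d2, det)
        point[j] = Fraction(n1[i] * d2 - d1 * n2[i], det)
        return tuple(point), direction


print(intersection_line((-1, 2, -3), 4, (5, -6, 7), 0))  # Example 746
print(intersection_line((0, 4, 5), 0, (3, 4, 5), 1))     # Example 748
```

The returned direction vector is n1 × n2 itself, not "simplified". For Example 746 that is (−4, −8, −4), which is parallel to (1, 2, 1).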

Exercise 243. In each of the following, a pair of planes q1 and q2 is given. For each
pair, determine which of the three possibilities in Corollary 21 holds. If the two planes
intersect, describe the set of intersection points. (Answer on p. 1505.)

(a) r ⋅ (4, 9, 3) = 61 and r ⋅ (1, 1, 2) = 19.

(b) q1 contains the point (1, 3, −2) and the vectors (1, −1, 0) and (1, −1, 1); while q2
contains the point (2, 3, 5) and the vectors (6, −1, 0) and (8, 0, −1).
(c) q1 contains the points (1, 1, 6), (7, 7, 0), and (5, 3, 3); while q2 contains the points
(7, 3, 1), (5, 5, 1), and (3, 5, 2).
(d) q1 contains the points (5, 3, 2), (1, 5, 3), and (10, 0, 1); while q2 contains the points
(5, −1, 4), (8, 8, −2), and (3, 5, 2).
(e) r ⋅ (7, 1, 1) = 42 and r ⋅ (1, 1, 2) = 6.
(f) r ⋅ (0, 1, 3) = 0 and r ⋅ (−1, 1, 3) = 2.


58. Point-Plane Foot of the Perpendicular and Distance
Earlier,[248] we learnt to find:
(a) The foot of the perpendicular from a point to a line; and
(b) The distance between a point and a line.
In this chapter, we’ll analogously learn to find:
(a) The foot of the perpendicular from a point to a plane; and
(b) The distance between a point and a plane.
As before, we start by defining the foot of the perpendicular from a point to a plane:

Definition 149. Let A be a point that isn’t on the plane q. The foot of the perpendicular from
A to q is the point B on q such that AB ⊥ q.
Equivalently, if q has normal vector n, then the foot of the perpendicular from A to q is the
point B on q such that AB ∥ n.

[Figure: the plane q with normal vector n, the point A, and the foot of the perpendicular B.]

Earlier, we learnt that the foot of the perpendicular from a point to a line is unique. It is
analogously true that the foot of the perpendicular from a point to a plane is unique:[249]

Fact 113. There is at most one foot of the perpendicular from a point to a plane.

Proof. See Exercise 244.

Exercise 244. This Exercise[250] guides you through a proof of Fact 113. Let B be a foot
of the perpendicular from a point A to a plane q. Let C ≠ B be a point on q. We’ll show
that AC ⊥/ q and hence that C cannot also be a foot of the perpendicular from A to q.
(Below, AB, BC, and AC denote the vectors from A to B, from B to C, and from A to C.)
(a) Explain why AB ⋅ BC = 0.
(b) Now prove that AC ⊥/ BC and hence that AC ⊥/ q. (Answer on p. 1507.)

As before, we define the distance between a point and a plane to be the minimum distance
between them:

Definition 150. Let A be a point and q be a plane. Suppose B is the point on q that’s
closest to A. Then the distance between A and q is ∣AB∣.

Remark 66. In the trivial case where A is on q, the point on q that’s closest to A is A
itself. And so, by Definition 150, the distance between A and q is ∣AA∣ = ∣0∣ = 0. Which
is just what we’d expect.
[248] Chs. 43 and 50.
[249] Hence justifying the use of the definite article the in Definition 149.
[250] It is also very similar to Exercise 196(c).
Fact 114 is the analogue of Fact 87 and is again intuitively “obvious”:

[Figure: the plane q and the point A; the foot of the perpendicular B is also the closest point.]

Fact 114. If B is the foot of the perpendicular from a point A to a plane q, then B is
also the point on q that’s closest to A.

Proof. Let C ≠ B be any other point on q. Since AB ⊥ q and BC lies on q, ABC forms a right
triangle with hypotenuse AC.
By the Pythagorean Theorem, the leg AB is shorter
than the hypotenuse AC. That is, ∣AB∣ < ∣AC∣. We’ve
just shown that B is closer to A than any other point on
q. Thus, B is the point on q that’s closest to A.

[Figure: the right triangle ABC, with B and C on the plane q.]

The following Corollary is immediate from Fact 114 and Definition 150:

Corollary 22. If B is the foot of the perpendicular from a point A to a plane q, then the
distance between A and q is ∣AB∣.

Example 749. Let A = (1, 2, 3) be a point, q be the plane described by r ⋅ n = r ⋅ (1, 1, 1) = 3,
and B be the foot of the perpendicular from A to q. We shall find B and also ∣AB∣ (the
distance between A and q) using what I’ll call the Perpendicular Method:[252]

[Figure: the plane q with normal vector n = (1, 1, 1), the point A = (1, 2, 3), and the foot B, with AB = −n.]

By Definition 149, AB ∥ n. So, there exists k ≠ 0 such that:

AB = kn or B − A = kn or B = A + kn = (1, 2, 3) + k(1, 1, 1).

Let us find k. Since B ∈ q, we have:

OB ⋅ (1, 1, 1) = 3 or [(1, 2, 3) + k(1, 1, 1)] ⋅ (1, 1, 1) = 3 or 6 + 3k = 3.

So, k = −1 and: B = A + kn = A −1n = (1, 2, 3) −1(1, 1, 1) = (0, 1, 2).

And the distance between A and q is:

∣AB∣ = ∣kn∣ = ∣k∣ ∣n∣ = ∣−1∣ ∣(1, 1, 1)∣ = √3.

[251] Note that the proof of Fact 114 is actually exactly identical to that of Fact 87.
[252] This is exactly analogous to Method 2 (Perpendicular Method) in Chs. 43 and 50.
Example 750. Let A = (−1, 0, 1) be a point, q be the plane described by r ⋅ n = r ⋅ (0, 2, 5) = 1,
and B be the foot of the perpendicular from A to q.

[Figure (not to scale): the plane q with normal vector n = (0, 2, 5), the point A = (−1, 0, 1), and the foot B, with AB = −(4/29) n.]

Perpendicular Method. By Definition 149, AB ∥ n. So, there exists k ≠ 0 such that:

B = A + kn = (−1, 0, 1) + k(0, 2, 5).

Since B ∈ q, we have:
OB ⋅ (0, 2, 5) = 1 or [(−1, 0, 1) + k(0, 2, 5)] ⋅ (0, 2, 5) = 1 or 5 + 29k = 1.

So, k = −4/29 and:

B = A + kn = A − (4/29) n = (−1, 0, 1) − (4/29)(0, 2, 5) = (−1, −8/29, 9/29).

And the distance between A and q is:

∣AB∣ = ∣kn∣ = ∣k∣ ∣n∣ = ∣−4/29∣ ∣(0, 2, 5)∣ = (4/29) ∣(0, 2, 5)∣ = (4/29) √29 = 4/√29.

Exercise 245. For each of the following, let B be the foot of the perpendicular from A
to q. Use the Perpendicular Method to find B and the distance between A and q.
The point A The plane q
(a) (7, 3, 4) r⋅ (9, 3, 7) = 109.
(b) (8, 0, 2) r⋅ (2, 7, 2) = 42.
(c) (8, 5, 9) r⋅ (5, 6, 0) = 64. (Answer on p. 1507.)

Next up, we’ll learn a second method, called the Formula Method, for finding the foot
of the perpendicular from a point to a plane and the distance between a point and a plane:


58.1. Formula Method
Fact 115 gives us the Formula Method for finding the foot of the perpendicular from a
point to a plane and the distance between a point and a plane:

Fact 115. Suppose q is a plane described by r ⋅ n = d, A is a point that isn’t on the plane
q, and B is the foot of the perpendicular from A to q.

Let: k = (d − OA ⋅ n) / ∣n∣².

Then: (a) B = A + kn; and (b) ∣AB∣ = ∣k∣ ∣n∣.

Proof. See Exercise 246.

Observe that if n = (a, b, c) and A = (x, y, z), then in the above result, we also have:

k = (d − (ax + by + cz)) / (a² + b² + c²).

You may find this latter formula for k easier to remember.
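If you'd like to check your answers by computer, Fact 115 can be sketched in Python as below. The name foot_and_distance is my own. The sketch keeps k and B as exact fractions and returns the distance as a floating-point number.

```python
from fractions import Fraction
import math


def foot_and_distance(A, n, d):
    """For the plane r ⋅ n = d and the point A, return (k, B, distance),
    where B = A + kn is the foot of the perpendicular from A to the plane."""
    n_squared = sum(c * c for c in n)           # a² + b² + c²
    OA_dot_n = sum(a * c for a, c in zip(A, n))  # ax + by + cz
    k = Fraction(d - OA_dot_n, n_squared)
    B = tuple(a + k * c for a, c in zip(A, n))
    distance = abs(k) * math.sqrt(n_squared)     # ∣k∣ ∣n∣
    return k, B, distance


# The plane r ⋅ (1, 1, 1) = 3 and the point A = (1, 2, 3):
print(foot_and_distance((1, 2, 3), (1, 1, 1), 3))
# The plane r ⋅ (0, 2, 5) = 1 and the point A = (−1, 0, 1):
print(foot_and_distance((-1, 0, 1), (0, 2, 5), 1))
```

For the first call, k = −1 and B = (0, 1, 2). For the second, k = −4/29 and B = (−1, −8/29, 9/29).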

Exercise 246. Prove Fact 115(a) and (b). (Answer on p. 1508.)

(Hint: Carefully and exactly mimic what was done in the last two examples.)

We now redo the last two examples, but now using Fact 115:

Example 751. Let A = (1, 2, 3) be a point, q be the plane described by r⋅n = r⋅(1, 1, 1) = 3,
and B be the foot of the perpendicular from A to q.
Formula Method. First compute ∣n∣ = √(1² + 1² + 1²) = √3. Then compute:

k = (d − OA ⋅ n) / ∣n∣² = (3 − (1, 2, 3) ⋅ (1, 1, 1)) / 3 = (3 − (1 + 2 + 3)) / 3 = −3/3 = −1.

And now by Fact 115, we have B = A + kn = (1, 2, 3) − 1(1, 1, 1) = (0, 1, 2).

And: ∣AB∣ = ∣k∣ ∣n∣ = 1 ⋅ √3 = √3.

Happily, these are the same as what we found with the Perpendicular Method earlier.

Remark 67. I recommend sticking with and using the Perpendicular Method rather
than the Formula Method. Two reasons for this: the Perpendicular Method (a) is
easier to remember; and (b) helps you understand what’s going on.
In contrast, with the Formula Method, one is liable to simply and mindlessly plug in
formulae without understanding what is going on. Disaster then strikes if one is unable
to recall these formulae.


Example 752. Let A = (−1, 0, 1) be a point, q be the plane described by r⋅n = r⋅(0, 2, 5) =
1, and B be the foot of the perpendicular from A to q.
Formula Method. First compute ∣n∣ = √(0² + 2² + 5²) = √29. Then compute:

k = (d − OA ⋅ n) / ∣n∣² = (1 − (−1, 0, 1) ⋅ (0, 2, 5)) / 29 = (1 − (0 + 0 + 5)) / 29 = −4/29.

And now by Fact 115, we have B = A + kn = (−1, 0, 1) − (4/29)(0, 2, 5) = (−1, −8/29, 9/29).

And: ∣AB∣ = ∣k∣ ∣n∣ = (4/29) ⋅ √29 = 4/√29.

Happily, these are the same as what we found with the Perpendicular Method earlier.

Exercise 247. Redo Ex. 245 using the Formula Method. (Answer on p. 1508.)

The next result is not one that students could reasonably have been expected to know.
Which means, of course, that it made a sudden appearance in 2017 (Exercise 512).

Corollary 23. Suppose a plane is described by r ⋅ n = d. Then the distance between the
plane and the origin is:

∣d∣ / ∣n∣.

Hence, if n is a unit vector, then the distance between the plane and the origin is ∣d∣.

Proof. See Exercise 248.

Exercise 248. Prove Corollary 23. (Hint: Use Fact 115). (Answer on p. 1509.)

We again revisit the same two examples:

Example 753. Let q be the plane described by r ⋅ n = r ⋅ (1, 1, 1) = 3. Then by Corollary
23, the distance between q and the origin is:

∣d∣ / ∣n∣ = 3/√3 = √3.

Example 754. Let q be the plane described by r ⋅ n = r ⋅ (0, 2, 5) = 1. Then by Corollary
23, the distance between q and the origin is:

∣d∣ / ∣n∣ = 1/√29.


We end this chapter with two new examples that review what we’ve learnt in this chapter:

Example 755. Let q be the plane described by r ⋅ n = r ⋅ (0, 5, 1) = 7, A = (−1, 2, 5) be a
point, and B be the foot of the perpendicular from A to q.
Perpendicular Method. Write B = A + kn = (−1, 2, 5) + k(0, 5, 1). Since B ∈ q, we have:
OB ⋅ (0, 5, 1) = 7 or [(−1, 2, 5) + k(0, 5, 1)] ⋅ (0, 5, 1) = 7 or 15 + 26k = 7.

So, k = −8/26 = −4/13. Thus:

B = A − (4/13) n = (−1, 2, 5) − (4/13)(0, 5, 1) = (1/13)(−13, 6, 61).

And the distance between A and q is:

∣AB∣ = ∣k∣ ∣n∣ = (4/13) ⋅ √26 = 4√2/√13.

By Corollary 23, the distance between the origin and the given plane is:

∣d∣ / ∣n∣ = ∣7∣/√26 = 7/√26.

Formula Method. First compute ∣n∣ = √(0² + 5² + 1²) = √26. Then compute:

k = (d − OA ⋅ n) / ∣n∣² = (7 − (−1, 2, 5) ⋅ (0, 5, 1)) / 26 = (7 − 15) / 26 = −4/13.

And now by Fact 115, we have:

B = A + kn = (−1, 2, 5) − (4/13)(0, 5, 1) = (1/13)(−13, 6, 61).

And as before, we can compute ∣AB∣ = ∣k∣ ∣n∣ = 4√2/√13.


Example 756. Let q be the plane described by r ⋅ n = r ⋅ (1, 2, 3) = 32, A = (0, 0, 0) be a
point, and B be the foot of the perpendicular from A to q.
Perpendicular Method. Write B = A + kn = (0, 0, 0) + k(1, 2, 3). Since B ∈ q, we have:

$$\overrightarrow{OB} \cdot (1, 2, 3) = 32 \quad\text{or}\quad [(0, 0, 0) + k(1, 2, 3)] \cdot (1, 2, 3) = 32 \quad\text{or}\quad 14k = 32.$$

So, k = 32/14 = 16/7 and:

$$B = A + \frac{16}{7}\mathbf{n} = (0, 0, 0) + \frac{16}{7}(1, 2, 3) = \frac{16}{7}(1, 2, 3).$$

The distance between A and q is:

$$|\overrightarrow{AB}| = |k||\mathbf{n}| = \frac{16}{7} \cdot \sqrt{14} = \frac{16\sqrt{2}}{\sqrt{7}}.$$

[Figure: the point A = (0, 0, 0), the normal vector n = (1, 2, 3), and the vector AB = (16/7)n.]

Formula Method. First compute $|\mathbf{n}| = \sqrt{1^2 + 2^2 + 3^2} = \sqrt{14}$. Then compute:

$$k = \frac{d - \overrightarrow{OA} \cdot \mathbf{n}}{|\mathbf{n}|^2} = \frac{32 - (0, 0, 0) \cdot (1, 2, 3)}{14} = \frac{32 - 0}{14} = \frac{16}{7}.$$

And now by Fact 115, $B = A + k\mathbf{n} = (0, 0, 0) + \frac{16}{7}(1, 2, 3) = \frac{16}{7}(1, 2, 3)$.
And as before, we can compute $|\overrightarrow{AB}| = |k||\mathbf{n}| = 16\sqrt{2}/\sqrt{7}$.
Of course, since A = O, this number is also the distance between the origin and q.

Exercise 249. Let S = (−1, 0, 7) and T = (3, 2, 1) be points and q be the plane described
by r ⋅ (5, −3, 1) = 0. Use both methods you’ve learnt in this chapter to find (a) the feet of
the perpendiculars from S and T to q. Then find (b) the distances from q to the points
S and T ; and also (c) the distance between q and the origin. (Answer on p. 1509.)

Remark 68. It turns out that there is also a third method, called the Calculus Method,
for finding the foot of the perpendicular and the distance between a point and a plane.
This is very similar to, and not much more difficult than, what we did earlier in Ch. 43 and 50
with a point and a line.
However, because this involves multivariate calculus and is not on your syllabus, I have
decided to relegate this discussion to Ch. 119.15 (Appendices).


59. Coplanarity

Definition 151. Two or more points are coplanar if some plane contains all of them.

“Obviously”, given any line, there is a plane that contains this line.
Recall[253] that any two points are collinear. And so “obviously”, they must also be coplanar.
Recall[254] that three non-collinear points uniquely determine a plane. Thus, any three points
must also be coplanar.
In contrast, four points need not be coplanar. Given four distinct points, we’ll use these
steps to check if they’re coplanar:
1. Write down the plane that contains three of the points.
2. Then check whether this plane also contains the fourth point.

Example 757. Let A = (1, 0, 0), B = (0, 1, 0), C = (0, 0, 1), and D = (1, 1, −1) be points.
To check if they are coplanar, we’ll write down the plane q that contains A, B, and C.
We’ll then check if D ∈ q.
The non-parallel vectors $\overrightarrow{AB} = (-1, 1, 0)$ and $\overrightarrow{AC} = (-1, 0, 1)$ are on q.
Method 1 (Vector Form). q has normal vector $\overrightarrow{AB} \times \overrightarrow{AC} = (1, 1, 1)$.
Compute $\overrightarrow{OA} \cdot (1, 1, 1) = 1 + 0 + 0 = 1$. Hence, q may be described by r ⋅ (1, 1, 1) = 1.
We now verify that D ∈ q and hence that the four points are coplanar:

$$\overrightarrow{OD} \cdot (1, 1, 1) = (1, 1, -1) \cdot (1, 1, 1) = 1 + 1 - 1 = 1. \quad \text{✓}$$

Method 2 (Parametric Form). We can describe q in parametric form as:

$$\mathbf{r} = \overrightarrow{OA} + \lambda\overrightarrow{AB} + \mu\overrightarrow{AC} = \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} + \lambda \begin{pmatrix} -1 \\ 1 \\ 0 \end{pmatrix} + \mu \begin{pmatrix} -1 \\ 0 \\ 1 \end{pmatrix} = \begin{pmatrix} 1 - \lambda - \mu \\ \lambda \\ \mu \end{pmatrix} \quad (\lambda, \mu \in \mathbb{R}).$$

Now check if D ∈ q by plugging D into the above parametric equation:

$$\begin{pmatrix} 1 \\ 1 \\ -1 \end{pmatrix} = \begin{pmatrix} 1 - \lambda - \mu \\ \lambda \\ \mu \end{pmatrix} \quad\text{or}\quad \begin{aligned} 1 &= 1 - \lambda - \mu, & (1) \\ 1 &= \lambda, & (2) \\ -1 &= \mu. & (3) \end{aligned}$$

From (2) and (3), λ = 1 and µ = −1. These values of λ and µ satisfy (1). Hence, D ∈ q and the
four points are coplanar.

[253] Fact 80.
[254] Ch. 55.
Example 758. Let A = (2, 3, 5), B = (8, −1, 0), C = (0, 1, 0), and D = (−3, −2, −1) be
points. To check if they are coplanar, we’ll write down the plane q that contains A, B,
and C. We’ll then check if D ∈ q.
The non-parallel vectors $\overrightarrow{AB} = (6, -4, -5)$ and $\overrightarrow{BC} = (-8, 2, 0)$ are on q.
Method 1. q has normal vector $\overrightarrow{AB} \times \overrightarrow{BC} = (10, 40, -20)$ or (1, 4, −2).
Compute $\overrightarrow{OC} \cdot (1, 4, -2) = 0 + 4 + 0 = 4$. Hence, q may be described by r ⋅ (1, 4, −2) = 4.
We now show that D ∉ q and hence that the four points are not coplanar:

$$\overrightarrow{OD} \cdot (1, 4, -2) = (-3, -2, -1) \cdot (1, 4, -2) = -3 - 8 + 2 = -9 \neq 4. \quad \text{✗}$$

Method 2. We can describe q in parametric form as:

$$\mathbf{r} = \overrightarrow{OC} + \lambda\overrightarrow{AB} + \mu\overrightarrow{BC} = \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} + \lambda \begin{pmatrix} 6 \\ -4 \\ -5 \end{pmatrix} + \mu \begin{pmatrix} -8 \\ 2 \\ 0 \end{pmatrix} = \begin{pmatrix} 6\lambda - 8\mu \\ 1 - 4\lambda + 2\mu \\ -5\lambda \end{pmatrix} \quad (\lambda, \mu \in \mathbb{R}).$$

Now check if D ∈ q by plugging D into the above parametric equation:

$$\begin{pmatrix} -3 \\ -2 \\ -1 \end{pmatrix} = \begin{pmatrix} 6\lambda - 8\mu \\ 1 - 4\lambda + 2\mu \\ -5\lambda \end{pmatrix} \quad\text{or}\quad \begin{aligned} -3 &= 6\lambda - 8\mu, & (1) \\ -2 &= 1 - 4\lambda + 2\mu, & (2) \\ -1 &= -5\lambda. & (3) \end{aligned}$$

From (3), λ = 0.2. And so from (1), µ = 21/40. These values of λ and µ contradict (2). Hence,
D ∉ q and the four points are not coplanar.
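
Method 1 in the last two examples can be sketched in Python. This is a minimal sketch with made-up helper names; it takes AB × AC as the normal vector (Example 758 used AB × BC instead, which gives a parallel normal) and works with integers so that the comparison is exact.

```python
def sub(p, q):
    """Vector from q to p."""
    return tuple(x - y for x, y in zip(p, q))

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def coplanar(a, b, c, d):
    """Check whether d lies on the plane r . n = OA . n through a, b, c."""
    n = cross(sub(b, a), sub(c, a))  # normal vector AB x AC
    return dot(n, d) == dot(n, a)

print(coplanar((1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, -1)))     # Example 757
print(coplanar((2, 3, 5), (8, -1, 0), (0, 1, 0), (-3, -2, -1)))  # Example 758
```

If a, b, and c happen to be collinear, the normal vector is the zero vector and the function returns True for any d.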

Here’s an occasionally useful shortcut:

Fact 116. If three of four points are collinear, then the four points are coplanar.

Proof. Given four points A, B, C, and D, suppose A, B, and C are collinear. Let q be the
plane that contains A, B, and D. By Fact 95, q contains the line AB and hence also the
point C. Thus, q contains all four points.

Example 759. Let A = (1, 2, 3), B = (4, 5, 6), C = (10, 11, 12), and D = (9, 1, 7) be points.
The line AB is described by r = (1, 2, 3) + λ (3, 3, 3) (λ ∈ R). By picking λ = 3, we see that
AB also contains the point C. Hence, A, B, and C are collinear.
And so by Fact 116, A, B, C, and D are coplanar. (Indeed, given any point E, it is
similarly true that the four points A, B, C, and E must be coplanar.)
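
Here is one way to check collinearity by computer. Note that this sketch does not substitute into the line's equation as Example 759 does; it instead tests whether AB × AC is the zero vector, which is an equivalent condition. The function names are made up for illustration.

```python
def sub(p, q):
    return tuple(x - y for x, y in zip(p, q))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def collinear(a, b, c):
    """Three points are collinear exactly when AB x AC is the zero vector."""
    return cross(sub(b, a), sub(c, a)) == (0, 0, 0)

print(collinear((1, 2, 3), (4, 5, 6), (10, 11, 12)))  # A, B, C of Example 759
print(collinear((1, 2, 3), (4, 5, 6), (9, 1, 7)))     # A, B, D of Example 759
```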

Note though that the converse of Fact 116 is false (see Exercise 250).


Exercise 250. In Example 757, we already showed that the points A = (1, 0, 0), B =
(0, 1, 0), C = (0, 0, 1), and D = (1, 1, −1) are coplanar. Now show that no three of these
four points are collinear. (You will thus have produced a counterexample to the converse
of Fact 116.) (Answer on p. 1510.)

Exercise 251. In each of the following, determine if the four points given are coplanar.
If they are, write down the plane that contains all four points. (Answer on p. 1510.)
(a) A = (0, 1, 5), B = (−3, −1, 1), C = (2, 7, 5), and D = (6, 6, 1).
(b) A = (−1, 3, −5), B = (0, 0, 0), C = (6, 1, −2), and D = (4, 7, −12).
(c) A = (0, 1, 2), B = (1, 2, 3), C = (2, 3, 4), and D = (19, 0, −5).


59.1. Coplanarity of Lines

Definition 152. Two or more lines are coplanar if some plane contains all of them.

Example 760. The lines l1 and l2 are coplanar because the plane q1 contains both of
them. Similarly, l2 and l3 are coplanar because the plane q2 contains both of them.



In contrast, l1 and l3 are not coplanar because no plane contains both of them.


Fact 117. If two lines are identical, then they are also parallel and coplanar.

Proof. Let the two (identical) lines be described by r = OP + λu (λ ∈ R).
They have parallel direction vectors. And so by Definition 116, they are parallel.
Let v be any vector that is not parallel to u. Then the plane r =
OP + λu + µv (λ, µ ∈ R) contains both lines (to verify this, simply let µ = 0).

The following Fact says that in the case of two distinct lines, there are three possibilities:
Fact 118. Suppose l1 and l2 are distinct lines described by $\mathbf{r} = \overrightarrow{OP} + \lambda\mathbf{u}$ and $\mathbf{r} = \overrightarrow{OQ} + \lambda\mathbf{v}$
(λ ∈ R). Then the three possibilities are that l1 and l2 are:
(a) Parallel and do not intersect; moreover, the unique plane that contains l1 and l2 is
described by $\mathbf{r} = \overrightarrow{OP} + \lambda\mathbf{u} + \mu\overrightarrow{PQ}$ (λ, µ ∈ R).
(b) Non-parallel and share exactly one intersection point; moreover, the unique plane
that contains l1 and l2 is described by $\mathbf{r} = \overrightarrow{OP} + \lambda\mathbf{u} + \mu\mathbf{v}$ (λ, µ ∈ R).
(c) Skew (i.e. neither parallel nor intersecting) and are not coplanar.

Proof. See p. 1316 in the Appendices.


Example 761. We can illustrate Fact 118 using the last example:
(a) The lines l2 and l3 are parallel, coplanar, and do not intersect.
(b) The lines l1 and l2 are non-parallel, coplanar, and share exactly one intersection point.
(c) The lines l1 and l3 are skew (i.e. aren’t parallel & don’t intersect) and not coplanar.

[Figure: the lines l1, l2, and l3 and the planes q1 and q2 from Example 760.]

Fact 118 yields the following Corollary:

Corollary 24. Two lines are coplanar if and only if they are not skew.

Proof. (⟹) Suppose two lines are coplanar. Either they are parallel or not. If they are
parallel, then by Definition 139, they are not skew. And if they are not parallel, then by
Fact 118(b), they intersect and again by Definition 139 are not skew.
(⟸) Fact 117 already proved that two identical lines are coplanar. So suppose two
distinct lines are not skew. Then by Definition 139, they either intersect or are parallel. If
they intersect, then by Fact 118(b), they are coplanar. And if they are parallel, then by
Fact 118(a), they are again coplanar.

Remark 69. In this textbook, we define two lines to be skew if they are not parallel and
do not intersect (Definition 139). Corollary 24 then follows as a result.
However, some writers take the opposite route — they first define two lines to be skew
if they are not coplanar. That is, they use Corollary 24 as their definition of skew lines.
They then prove that our Definition 139 follows as a result.

Example 762. Two distinct lines are described by:

r = (8, 1, 1) + λ (3, 6, 9) and r = (4, 5, 6) + λ (1, 2, 3) (λ ∈ R).

Since (3, 6, 9) ∥ (1, 2, 3), the two lines are parallel. And so by Fact
118(a), they are coplanar and do not intersect.
Compute (4, 5, 6) − (8, 1, 1) = (−4, 4, 5). By Fact 118(a), the (unique)
plane that contains both lines is:

r = (8, 1, 1) + λ(1, 2, 3) + µ (−4, 4, 5) (λ, µ ∈ R).


Example 763. Two distinct lines are described by:
r = (0, 0, 0) + λ (0, 1, 0) and r = (4, 17, 0) + λ (1, 0, 0) (λ ∈ R).

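Since (0, 1, 0) ∦ (1, 0, 0), the two lines are not parallel. And so, there
are the two possibilities given by Fact 118(b) and (c). To check if they
intersect, write:

$$\begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix} + \hat{\lambda} \begin{pmatrix} 0 \\ 1 \\ 0 \end{pmatrix} = \begin{pmatrix} 4 \\ 17 \\ 0 \end{pmatrix} + \hat{\mu} \begin{pmatrix} 1 \\ 0 \\ 0 \end{pmatrix} \quad\text{or}\quad \begin{aligned} 0 &= 4 + \hat{\mu}, \\ \hat{\lambda} &= 17, \\ 0 &= 0. \end{aligned}$$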

Solving, we have λ̂ = 17 and µ̂ = −4. Thus, the two lines intersect at the following point:

(0, 0, 0) + 17 (0, 1, 0) = (4, 17, 0) − 4 (1, 0, 0) = (0, 17, 0) .

And so by Fact 118, the two lines are also coplanar and the (unique) plane that contains
them can be described by r = (0, 0, 0) + λ (0, 1, 0) + µ (1, 0, 0) (λ, µ ∈ R).

Example 764. Two distinct lines are described by:

r = (0, 1, 2) + λ (9, 1, 3) and r = (4, 5, 6) + λ (3, 2, 1) (λ ∈ R).

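Since (9, 1, 3) ∦ (3, 2, 1), the two lines are not parallel. And so again,
there are the two possibilities given by Fact 118(b) and (c). To check if
they intersect, write:

$$\begin{pmatrix} 0 \\ 1 \\ 2 \end{pmatrix} + \hat{\lambda} \begin{pmatrix} 9 \\ 1 \\ 3 \end{pmatrix} = \begin{pmatrix} 4 \\ 5 \\ 6 \end{pmatrix} + \hat{\mu} \begin{pmatrix} 3 \\ 2 \\ 1 \end{pmatrix} \quad\text{or}\quad \begin{aligned} 9\hat{\lambda} &= 4 + 3\hat{\mu}, & (1) \\ 1 + \hat{\lambda} &= 5 + 2\hat{\mu}, & (2) \\ 2 + 3\hat{\lambda} &= 6 + \hat{\mu}. & (3) \end{aligned}$$

(1) minus 3 × (3) yields −6 = −14, a contradiction. So, the two lines do not intersect. And
so by Fact 118, they are skew and not coplanar.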
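
The three cases of Fact 118 can also be told apart by computer. The sketch below does not solve simultaneous equations as the examples above do. It uses a different but equivalent test: two non-parallel lines are coplanar exactly when PQ ⋅ (u × v) = 0. All function names are made up for illustration.

```python
def sub(p, q):
    return tuple(x - y for x, y in zip(p, q))

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def classify_lines(p, u, q, v):
    """Classify the lines r = p + lambda*u and r = q + lambda*v."""
    pq = sub(q, p)
    if cross(u, v) == (0, 0, 0):  # direction vectors are parallel
        return "identical" if cross(pq, u) == (0, 0, 0) else "parallel"
    # Non-parallel lines are coplanar (and so intersect) iff PQ . (u x v) = 0.
    return "intersecting" if dot(pq, cross(u, v)) == 0 else "skew"

print(classify_lines((8, 1, 1), (3, 6, 9), (4, 5, 6), (1, 2, 3)))   # Example 762
print(classify_lines((0, 0, 0), (0, 1, 0), (4, 17, 0), (1, 0, 0)))  # Example 763
print(classify_lines((0, 1, 2), (9, 1, 3), (4, 5, 6), (3, 2, 1)))   # Example 764
```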

Exercise 252. Determine whether each pair of lines is parallel, intersecting, or coplanar. Also,
find any intersection points and any plane that contains both lines. (Answer on p. 1511.)

      l1                                l2
(a)   r = (8, 1, 5) + λ (3, 2, 1)       r = (1, 2, 3) + λ (5, 6, 7)       (λ ∈ R)
(b)   r = (0, 0, 6) + λ (3, 9, 0)       r = (1, 1, 1) + λ (1, 3, 0)       (λ ∈ R)
(c)   r = (6, 5, 5) + λ (1, 0, 1)       r = (9, 3, 6) + λ (0, 1, 1)       (λ ∈ R)
(d)   r = (−1, 3, 8) + λ (−5, 0, 1)     r = (9, 3, 6) + λ (10, 0, 2)      (λ ∈ R)


Part IV.
Complex Numbers


What I cannot create, I do not understand.
Know how to solve every problem that has been solved.

— Richard Feynman’s blackboard at the time of his death (1988).


60. Complex Numbers: Introduction
Here’s a brief and simplistic motivation of complex numbers:
1. Solve x − 1 = 0. Easy; the solution is a natural number: x = 1.
2. To solve x + 1 = 0, we must invent negative numbers. The solution is x = −1.

3. To solve $x^2 = 2$, we must invent irrational numbers. The solution is $x = \pm\sqrt{2}$.
4. To solve $x^2 = -1$, we must invent the imaginary unit:

Definition 153. The imaginary unit, denoted i, is the number that satisfies $i^2 = -1$.

Or equivalently: $i = \sqrt{-1}$. And so, the solution to the equation $x^2 = -1$ is x = ±i.
In this textbook, we will blithely and naïvely assume that the “usual” rules of arithmetic
also apply to the complex numbers.
Any non-zero, real multiple of the imaginary unit is called a purely imaginary number:

Definition 154. A purely imaginary number is any ib, where b ∈ R with b ≠ 0.

We specify that b ≠ 0 because 0i = 0 is not purely imaginary, but real.

Example 765. The following numbers are purely imaginary:

$$i + i = 2i = 2\sqrt{-1}, \qquad -i = -\sqrt{-1}, \qquad i\pi = \pi\sqrt{-1}.$$

The number i is both purely imaginary and the imaginary unit.

The sum of any real number (including zero) and a purely imaginary number is called an
imaginary number:

Definition 155. An imaginary number is any a + ib, where a, b ∈ R with b ≠ 0.

Again, we specify that b ≠ 0 because a + 0i is not imaginary, but real.

Example 766. The number 3 + 2i is imaginary, but not purely imaginary.

In contrast, 2i is both imaginary and purely imaginary. (Every purely imaginary number
is also an imaginary number.)

Any imaginary number that isn’t purely imaginary is an “impure” imaginary number:

Definition 156. An “impure” imaginary number is any a+ib, where a, b ∈ R with a, b ≠ 0.

Example 767. The number 3 + 2i is imaginary and also “impure” imaginary.

A complex number is simply any real or imaginary number.

Definition 157. A complex number is any a + ib, where a, b ∈ R.


More examples of the different types of numbers:

Example 768. The following numbers are complex, imaginary, and “impure” imaginary:

3 − 2i, −5 + 13i, 2+ 5i, −13 − i.

However, they are neither real nor purely imaginary.

Example 769. The following numbers are complex, imaginary, and purely imaginary:

2i, − 5i, (47 − π) i.

However, they are neither real nor “impure” imaginary.

Example 770. The number i is complex, imaginary, purely imaginary, and the imaginary
unit. However, i is neither real nor “impure” imaginary.

Example 771. The following numbers are both complex and real:

−20, 5, 3π, −100, 0.

However, they are not imaginary, purely imaginary, or “impure” imaginary.
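
Definitions 153 to 157 amount to a small decision procedure on the pair (a, b). The sketch below (with a made-up function name and label strings) lists every label from this textbook's terminology that applies to a + ib:

```python
def classify(a, b):
    """Return the labels (in this textbook's terminology) that apply to a + ib."""
    labels = ["complex"]                 # every a + ib with real a, b is complex
    if b == 0:
        labels.append("real")
    else:
        labels.append("imaginary")
        labels.append("purely imaginary" if a == 0 else "impure imaginary")
        if a == 0 and b == 1:
            labels.append("imaginary unit")
    return labels

print(classify(3, -2))  # 3 - 2i
print(classify(0, 2))   # 2i
print(classify(0, 1))   # i
print(classify(0, 0))   # 0
```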

Remark 70. Be aware that some other writers define purely imaginary numbers to include
the number 0. This is not the practice of this textbook.[255] In this textbook, the number
0 is not purely imaginary because 0 = 0 + 0i, but by Definition 154, a purely imaginary
number a + ib must have b ≠ 0.

R denotes the set of real numbers. Similarly, C denotes the set of complex numbers:

Definition 158. The set of all complex numbers, denoted C, is {a + ib ∶ a, b ∈ R}.

Every real number is a complex number, but not every complex number is a real number.
Or equivalently, the set of real numbers is a proper subset of the set of complex numbers:

Fact 119. R ⊂ C.

Proof. Every a ∈ R can be written as a + 0i and so by Definition 158, we have a ∈ C. Since

every element of R is an element of C, by Definition 18, we have R ⊆ C.
Now, observe that 3 + 7i ∈ C but 3 + 7i ∉ R. Thus, R ≠ C.
Altogether then, since R ⊆ C and R ≠ C, by Definition 19, R ⊂ C.

[255] This alternative practice used by other writers is sometimes convenient as one learns more about complex
numbers. However, because this textbook doesn’t go far, it is simpler and more natural to simply
continue calling 0 a real number, as we’ve done all our lives, and not now also call it a
purely imaginary number.
We now reproduce our taxonomy of numbers from p. 42, but also flesh it out a little with
what we’ve just learnt (adding three boxes at the bottom):

Complex C (a + ib, a, b ∈ R)
    Real R (a + ib, b = 0)
        Rationals Q
            Integers Z
            Non-integers
        Irrationals
    Imaginary (a + ib, b ≠ 0)
        “Impure” imaginary (a + ib, a ≠ 0, b ≠ 0)
        Purely imaginary (a + ib, a = 0, b ≠ 0)
            Imaginary unit i (a + ib, a = 0, b = 1)

Now, here’s an important remark:

There is no such thing as a positive or negative complex number.

To fully appreciate why this is so is beyond the scope of the A-Levels.[256] But for now, here
is a very simple example just to illustrate the point.

Example 772. The numbers 1, −1, i, and −i are all complex.

We say that 1 is a positive real number and −1 is a negative real number.
However, there is no such thing as a positive or negative complex number. So:
(a) i is neither a positive nor a negative complex number.
(b) −i is neither a positive nor a negative complex number.
(c) 1 is neither a positive nor a negative complex number.
(d) −1 is neither a positive nor a negative complex number.

Exercise 253. Fill in the rest of the following table. (Answer on p. 1512.)
√ √
Is this number ... 9 − 2i 3i 0 4 4 + 2i i 3
Complex? ✓
Real? ✗
Imaginary? ✓
Purely imaginary? ✗
“Impure” imaginary? ✓
The imaginary unit? ✗

[256] But if you’re interested, see (especially ajotaxte’s answer).
60.1. The Real and Imaginary Parts of Complex Numbers

Definition 159. Given a complex number z = a + ib, its real part is Re z = a and its
imaginary part is Im z = b.

Example 773. Let z = 3 + 2i. Then Re z = Re (3 + 2i) = 3 and Im z = Im (3 + 2i) = 2.

Example 774. Let w = 7. Then Re w = Re 7 = 7 and Im w = Im 7 = 0.

Example 775. Let ω = 19i. Then Re ω = Re (19i) = 0 and Im ω = Im (19i) = 19.

Remark 71. Your A-Level examiners like using the symbols z, w, and ω (lower-case Greek
letter omega) to denote complex numbers and so that’s what we’ll try to do too.

“Obviously”, two complex numbers z and w are equal if and only if:

Re z = Re w and Im z = Im w.

Example 776. Let z = 3 + ib and w = a − 17i. Suppose z = w. Then:

a=3 and b = −17.

Exercise 254. The complex numbers a, b, c, and d are given below. Exactly two are
identical — find them. (Answer on p. 1512.)
√ √
1 2 3 1 3
a= √ − b = √ − √ i, c = sin − sin i, d= − cos (− ) i.
π π π
2 2 2 2 3 3 2 4

Remark 72. Your A-Level examiners seem to follow the convention of writing a + ib or
x + iy rather than a + bi or x + yi. That is, the imaginary unit i is written before a variable
like b or y. And so that’s also what we’ll do too. (The logic seems to be that constants
come before variables and the imaginary unit i is a constant.)
Note though that we will still write 1 + 2i or −3 − 4i rather than 1 + i2 or −3 − i4. That is,
any constants like 1 or 4 will still be written before the imaginary unit i.


60.2. Complex Numbers in Ordered Pair Notation
It is also often convenient to write complex numbers in ordered pair notation, with the
first term being the real part and the second term being the imaginary.

Example 777. z = 3 + 2i = (3, 2).

Example 778. w = 7 = (7, 0).

Example 779. ω = 19i = (0, 19).

In general, given a complex number z, we can also write:

z = Re z + i Im z = (Re z, Im z).

Definition 160. Given a complex number z = a + ib, its real part is Re z = a and its
imaginary part is Im z = b.

Exercise 255. Rewrite each number in ordered pair notation. (Answer on p. 1512.)

(a) z = 33 (1 + ei) (b) w = (237 + π) − ( 2 − 3) i (c) Reω = p, Imω = q.


61. Some Arithmetic of Complex Numbers
As previously stated, in this textbook, we will blithely and naïvely assume that the “usual”
rules[257] of arithmetic also apply to the complex numbers. In which case, addition and
subtraction are especially simple:

61.1. Addition and Subtraction

Example 780. Let z = −2 + i = (−2, 1) and w = 3i = (0, 3). Then:

z + w = −2 + 4i and z − w = −2 − 2i.

Or: z + w = (−2 + 0, 1 + 3) = (−2, 4) and z − w = (−2 − 0, 1 − 3) = (−2, −2).

Example 781. Let z = 7 − i = (7, −1) and w = 2 + 5i = (2, 5). Then:

z + w = 9 + 4i and z − w = 5 − 6i.

Or: z + w = (7 + 2, −1 + 5) = (9, 4) and z − w = (7 − 2, −1 − 5) = (5, −6).

In general:

Fact 120. Suppose z = a + ib = (a, b) and w = c + id = (c, d). Then:

(a) z + w = a + c + i (b + d) = (a + c, b + d); and
(b) z − w = a − c + i (b − d) = (a − c, b − d).
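
Fact 120 says that addition and subtraction work component by component in ordered pair notation. Here is a minimal Python sketch of this (the function names are made up), compared against Python's built-in `complex` type:

```python
def add(z, w):
    """(a, b) + (c, d) = (a + c, b + d)."""
    return (z[0] + w[0], z[1] + w[1])

def subtract(z, w):
    """(a, b) - (c, d) = (a - c, b - d)."""
    return (z[0] - w[0], z[1] - w[1])

z, w = (7, -1), (2, 5)             # Example 781: z = 7 - i, w = 2 + 5i
print(add(z, w), subtract(z, w))
print(complex(*z) + complex(*w))   # the same sum using the built-in type
```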

Exercise 256. For each, compute z + w and z − w. (Answer on p. 1512.)

(a) z = −5 + 2i, w = 7 + 3i. (b) z = 3 − i, w = 11 + 2i. (c) z = 1 + 2i, w = 3 − 2i.

[257] In this textbook, we’ve been neither clear nor explicit about what these rules are. We have simply
assumed that everyone, including you the student, “knows” what they are.
61.2. Multiplication
Here are the powers of i:

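$$i^1 = i, \qquad i^2 = i \times i = -1, \qquad i^3 = i \times i^2 = -i, \qquad i^4 = i \times i^3 = 1,$$

$$i^5 = i \times i^4 = i, \qquad i^6 = i \times i^5 = -1, \qquad i^7 = i \times i^6 = -i, \qquad i^8 = i \times i^7 = 1,$$

$$i^9 = i \times i^8 = i, \qquad i^{10} = i \times i^9 = -1, \qquad i^{11} = i \times i^{10} = -i, \qquad i^{12} = i \times i^{11} = 1,$$

$$\vdots$$

Observe that $i^4 = 1$. And so, the cycle repeats after every fourth power.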
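
Because the cycle has length four, any integer power of i depends only on the remainder of the exponent when divided by 4. A short Python sketch (the function name is made up; Python writes the imaginary unit as `1j`):

```python
def power_of_i(n):
    """Return i**n for an integer n, using the four-step cycle 1, i, -1, -i."""
    return (1, 1j, -1, -1j)[n % 4]

for n in (9, 10, 11, 12):
    print(n, power_of_i(n))
```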
The “usual” rules of multiplication hold:

Example 782. Let z = i and w = 1 + i. Then $zw = i(1 + i) = i(1) + i^2 = i - 1$.

Example 783. Let z = −2 + i and w = 3i. Then:

$$zw = (-2 + i)(3i) = (-2)(3i) + i(3i) = -6i + 3i^2 = -3 - 6i.$$

Google does the basic arithmetic of complex numbers as well as Wolfram Alpha, but
much more quickly. So here in Part IV, whenever you see the logo, click on it and
you’ll be brought to the relevant computation done by Google.

Example 784. Let z = 2 − i and w = −1 + i. Then:

$$zw = (2 - i)(-1 + i) = -2 + 2i + i - i^2 = -1 + 3i.$$

Example 785. Let z = 3 + 2i and w = −7 + 4i. Then:

$$zw = (3 + 2i)(-7 + 4i) = -21 + 12i - 14i + 8i^2 = -29 - 2i.$$

In general:

Fact 121. If z = a + ib = (a, b) and w = c + id = (c, d), then

zw = ac − bd + i (ad + bc) = (ac − bd, ad + bc) .

Proof. See Exercise 258.
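
Fact 121 translates directly into code. The following is a minimal sketch in ordered pair notation (the function name is made up), checked against two of the examples above:

```python
def multiply(z, w):
    """(a, b)(c, d) = (ac - bd, ad + bc)."""
    a, b = z
    c, d = w
    return (a * c - b * d, a * d + b * c)

print(multiply((-2, 1), (0, 3)))   # Example 783
print(multiply((3, 2), (-7, 4)))   # Example 785
```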

Recall that $(x + y)^2 = x^2 + 2xy + y^2$ and $(x + y)^3 = x^3 + 3x^2 y + 3xy^2 + y^3$. Hence:

$$(a + ib)^2 = a^2 + 2a(ib) + (ib)^2 = a^2 + 2iab - b^2,$$

$$(a + ib)^3 = a^3 + 3a^2(ib) + 3a(ib)^2 + (ib)^3 = a^3 + 3ia^2 b - 3ab^2 - ib^3.$$


Let’s jot these down formally:

Fact 122. (a) $(a + ib)^2 = a^2 + 2iab - b^2$.

(b) $(a + ib)^3 = a^3 + 3ia^2 b - 3ab^2 - ib^3$.

Example 786. Let z = 3 + 2i. To compute $z^2$, we can use Fact 122(a):

$$z^2 = (3 + 2i)^2 = 3^2 + 2 \cdot 3 \cdot 2i - 2^2 = 9 + 12i - 4 = 5 + 12i.$$

Instead of using Fact 122(a), we could do the usual multiplication:

$$z^2 = (3 + 2i)(3 + 2i) = 9 + 6i + 6i - 4 = 5 + 12i.$$

And to compute $z^3$, we can use Fact 122(b):

$$z^3 = (3 + 2i)^3 = 3^3 + 3 \cdot 3^2 \cdot 2i - 3 \cdot 3 \cdot 2^2 - 2^3 i = 27 + 54i - 36 - 8i = -9 + 46i.$$

Again, instead of using Fact 122(b), we could do the usual multiplication:

$$z^3 = z^2 z = (5 + 12i)(3 + 2i) = 15 + 10i + 36i - 24 = -9 + 46i.$$

Exercise 257. For each, compute $zw$, $z^2$, and $z^3$. (Answer on p. 1512.)

(a) z = −5 + 2i, w = 7 + 3i. (b) z = 3 − i, w = 11 + 2i. (c) z = 1 + 2i, w = 3 − 2i.

Exercise 258. Prove Fact 121. (Answer on p. 1513.)

Exercise 259. Suppose we are given that 2 + i solves $az^3 + bz^2 + 3z - 1 = 0$. Then what

are a and b? (Answer on p. 1513.)


61.3. Conjugation

Example 787. Let z = 1 + i. Then the (complex) conjugate of z is $z^* = 1 - i$.

We call z = 1 + i and $z^* = 1 - i$ a (complex) conjugate pair.

Definition 161. Given the complex number z = a + ib, its (complex) conjugate is $z^* = a - ib$.
Also, z = a + ib and $z^* = a - ib$ are called a (complex) conjugate pair.

Example 788. The conjugate of $w = -5 - (17 + \sqrt{2})i$ is $w^* = -5 + (17 + \sqrt{2})i$; and w and
$w^*$ are called a conjugate pair.

Example 789. The conjugate of ω = 10 is $\omega^* = 10$.

Example 790. The conjugate of a = 2i is $a^* = -2i$.

“Obviously”, the conjugate of the conjugate of a complex number z is z itself:

Fact 123. $(z^*)^* = z$.

Example 791. Let z = 1 + i. Then $z^* = 1 - i$ and $(z^*)^* = 1 + i = z$.

Example 792. Let w = −5 − 17i. Then $w^* = -5 + 17i$ and $(w^*)^* = -5 - 17i = w$.

Example 793. Let ω = 10. Then $\omega^* = 10$ and $(\omega^*)^* = 10 = \omega$.

Example 794. Let a = 2i. Then $a^* = -2i$ and $(a^*)^* = 2i = a$.

Recall[258] that a + b and a − b were called a conjugate pair because:

$$(a + b)(a - b) = a^2 - b^2. \qquad (1)$$

Recall also that when a denominator contained a surd, we could often use (1) to rationalise
(“make rational”) the denominator:

Example 795.
$$\frac{3}{1 + \sqrt{5}} = \frac{3}{1 + \sqrt{5}} \cdot \frac{1 - \sqrt{5}}{1 - \sqrt{5}} = \frac{3(1 - \sqrt{5})}{1^2 - (\sqrt{5})^2} = \frac{3(1 - \sqrt{5})}{-4} = \frac{3}{4}(\sqrt{5} - 1).$$

Here we can play a similar trick. Observe that if z = a + ib, then:

$$zz^* = (a + ib)(a - ib) = a^2 - (ib)^2 = a^2 - i^2 b^2 = a^2 + b^2. \qquad (2)$$

We can often use (2) to realise (“make real”) a denominator that contains a complex number:

[258] Ch. 5.5.
Example 796. Let z = 1 + i. Consider the reciprocal of z:

$$\frac{1}{z} = \frac{1}{1 + i}.$$

In general, it is easier to deal with “simpler” denominators. We might thus like to rid
the above denominator of any complex numbers.
Here’s how we can do so by using the conjugate $z^*$. We simply multiply by $z^*/z^* = 1$:

$$\frac{1}{z} = \frac{1}{z} \cdot \frac{z^*}{z^*} = \frac{1}{1 + i} \cdot \frac{1 - i}{1 - i} = \frac{1 - i}{1^2 + 1^2} = \frac{1 - i}{2} = \frac{1}{2} - \frac{1}{2}i.$$

Fact 124. If z = a + ib = (a, b), then:

$$\text{(a) } zz^* = a^2 + b^2 = (a^2 + b^2, 0); \text{ and} \qquad \text{(b) } \frac{1}{z} = \frac{z^*}{|z|^2} = \frac{1}{a^2 + b^2}(a, -b).$$

Proof. See Exercise 261.
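
Fact 124(b) gives a recipe for the reciprocal that is easy to sketch in Python. The function names below are made up, the inputs are assumed to be integers, and exact fractions are used so that results match the hand computations in this section.

```python
from fractions import Fraction

def conjugate(z):
    """(a, b)* = (a, -b)."""
    a, b = z
    return (a, -b)

def reciprocal(z):
    """1/z = z* / (a^2 + b^2), for a non-zero z = (a, b) with integer a, b."""
    a, b = z
    m = a * a + b * b          # z z* = a^2 + b^2
    c, d = conjugate(z)
    return (Fraction(c, m), Fraction(d, m))

print(reciprocal((1, 1)))      # Example 796: z = 1 + i
print(reciprocal((-3, 5)))     # z = -3 + 5i
```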

A few more examples:

Example 797. Let z = −3 + 5i = (−3, 5). Then $z^* = -3 - 5i = (-3, -5)$ and:

$$\frac{1}{z} = \frac{z^*}{3^2 + 5^2} = \frac{z^*}{34} = \frac{-3 - 5i}{34} = -\frac{3}{34} - \frac{5}{34}i = \frac{1}{34}(-3, -5).$$

Example 798. Let w = 1 − i = (1, −1). Then $w^* = 1 + i = (1, 1)$ and:

$$\frac{1}{w} = \frac{w^*}{1^2 + 1^2} = \frac{w^*}{2} = \frac{1 + i}{2} = \frac{1}{2} + \frac{1}{2}i = \frac{1}{2}(1, 1).$$

Example 799. Let ω = 1 + i = (1, 1). Then $\omega^* = 1 - i = (1, -1)$ and:

$$\frac{1}{\omega} = \frac{1}{1^2 + 1^2}\omega^* = \frac{1 - i}{2} = \frac{1}{2} - \frac{1}{2}i = \frac{1}{2}(1, -1).$$

Exercise 260. Write down each number’s conjugate and reciprocal in the form a + ib.
(a) z = −5 + 2i. (b) z = 3 − i. (c) z = 1 + 2i. (Answer on p. 1513.)

Exercise 261. Prove Fact 124. (Answer on p. 1513.)

Remark 73. Just so you know, most writers denote the conjugate of z by $\bar{z}$ (z with a bar on top). However,
your A-Level examiners use $z^*$ and so that’s what we’ll do too.


61.4. Division
We can now divide one complex number by another:

Fact 125. If z = a + ib = (a, b) and w = c + id = (c, d) with w ≠ 0, then:

$$\frac{z}{w} = \frac{zw^*}{|w|^2} = \frac{1}{c^2 + d^2}(ac + bd, bc - ad).$$

Proof. $\dfrac{z}{w} = \dfrac{z}{w} \cdot \dfrac{w^*}{w^*} = \dfrac{zw^*}{c^2 + d^2}$.

Example 800. Let z = −2 + i and w = 3i. Then:

$$\frac{z}{w} = \frac{-2 + i}{3i} = \frac{zw^*}{0^2 + 3^2} = \frac{(-2 + i)(-3i)}{9} = \frac{3 + 6i}{9} = \frac{1}{3} + \frac{2}{3}i.$$

Example 801. Let z = 3 + i and w = 1 − i. Then:

$$\frac{z}{w} = \frac{3 + i}{1 - i} = \frac{zw^*}{1^2 + 1^2} = \frac{(3 + i)(1 + i)}{2} = \frac{2 + 4i}{2} = 1 + 2i.$$

Example 802. Let z = 1 + i and w = 3 − 2i. Then:

$$\frac{z}{w} = \frac{1 + i}{3 - 2i} = \frac{zw^*}{3^2 + 2^2} = \frac{(1 + i)(3 + 2i)}{13} = \frac{1 + 5i}{13} = \frac{1}{13} + \frac{5}{13}i.$$

Example 803. Let z = 2 − i and w = −1 + i. Then:

$$\frac{z}{w} = \frac{2 - i}{-1 + i} = \frac{zw^*}{1^2 + 1^2} = \frac{(2 - i)(-1 - i)}{2} = \frac{-3 - i}{2} = -\frac{3}{2} - \frac{1}{2}i.$$

Example 804. Let z = 3 + 2i and w = −7 + 4i. Then:

$$\frac{z}{w} = \frac{3 + 2i}{-7 + 4i} = \frac{zw^*}{7^2 + 4^2} = \frac{(3 + 2i)(-7 - 4i)}{65} = \frac{-13 - 26i}{65} = -\frac{1}{5} - \frac{2}{5}i.$$

Example 805. Let z = −3 + 6i and w = 2 + iπ. Then:

$$\frac{z}{w} = \frac{-3 + 6i}{2 + i\pi} = \frac{zw^*}{2^2 + \pi^2} = \frac{(-3 + 6i)(2 - i\pi)}{4 + \pi^2} = \frac{-6 + 3i\pi + 12i + 6\pi}{4 + \pi^2} = 6\,\frac{\pi - 1}{4 + \pi^2} + 3\,\frac{\pi + 4}{4 + \pi^2}i.$$
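
The formula in Fact 125 can be sketched in Python as follows. The function name is made up, the inputs are assumed to be integer pairs, and exact fractions are used so that the output can be compared with the examples above.

```python
from fractions import Fraction

def divide(z, w):
    """(a, b) / (c, d) = (ac + bd, bc - ad) / (c^2 + d^2), for w = (c, d) != 0."""
    a, b = z
    c, d = w
    m = c * c + d * d
    if m == 0:
        raise ZeroDivisionError("cannot divide by the complex number 0")
    return (Fraction(a * c + b * d, m), Fraction(b * c - a * d, m))

print(divide((3, 1), (1, -1)))   # Example 801
print(divide((3, 2), (-7, 4)))   # Example 804
```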

Exercise 262. For each, find z/w in the form a + ib. (Answer on p. 1513.)

(a) z = 1 + 3i, w = −i. (b) z = 2 − 3i, w = 1 + i. (c) z = 2 − πi, w
(d) z = 11 + 2i, w = i. (e) z = −3, w = 2 + i. (d) z = 7 − 2i, w


62. Solving Polynomial Equations
Recall (Ch. 9) that if the quadratic equation $ax^2 + bx + c = 0$ has non-negative discriminant
(i.e. $b^2 - 4ac \geq 0$), then it has two real roots, which are given by:

$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}. \qquad (1)$$

Now that we’ve learnt a little about complex numbers, we can more simply say that
regardless of the sign of the discriminant:

Fact 126. Every quadratic equation has two complex roots given by (1).

Proof. See Theorem 11 below (the Fundamental Theorem of Algebra).

Example 806. Consider the quadratic equation $x^2 - 2x + 2 = 0$.

Its discriminant is negative: $b^2 - 4ac = (-2)^2 - 4(1)(2) = -4 < 0$.
Nonetheless, like every quadratic equation, it has two complex roots:

$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a} = \frac{2 \pm \sqrt{-4}}{2} = 1 \pm \frac{\sqrt{4} \times \sqrt{-1}}{2} = 1 \pm \frac{2i}{2} = 1 \pm i.$$

In this case, both roots are imaginary.

Example 807. The quadratic equation $x^2 - 3x + 2 = 0$ has positive discriminant: $b^2 - 4ac =
(-3)^2 - 4(1)(2) = 1 > 0$. Thus, both of its complex roots are real:

$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a} = \frac{3 \pm \sqrt{1}}{2} = 1, 2.$$

Example 808. The quadratic equation $x^2 - 2x + 1 = 0$ has discriminant zero: $b^2 - 4ac =
(-2)^2 - 4(1)(1) = 0$. Its roots are given by:

$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a} = \frac{2 \pm \sqrt{0}}{2} = 1.$$

Hmm ... this time there’s only one root, namely 1. Doesn’t this contradict Fact 126?
Well, here we’ll cheat a little, by calling 1 a repeated or double root of the quadratic
equation $x^2 - 2x + 1 = 0$. You can think of this as a sort of accounting trick to ensure that
Fact 126 (and later on also Theorem 11) are “correct”.[259]
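
The quadratic formula works unchanged when the discriminant is negative, provided the square root is allowed to be complex. Here is a small Python sketch using the standard library's `cmath.sqrt` (the function name `quadratic_roots` is made up, and floating-point arithmetic is used, so results are approximate in general):

```python
import cmath

def quadratic_roots(a, b, c):
    """Both roots of a x^2 + b x + c = 0 (a != 0), using a complex square root."""
    s = cmath.sqrt(b * b - 4 * a * c)
    return ((-b + s) / (2 * a), (-b - s) / (2 * a))

print(quadratic_roots(1, -2, 2))   # Example 806: two imaginary roots
print(quadratic_roots(1, -3, 2))   # Example 807: two real roots
print(quadratic_roots(1, -2, 1))   # Example 808: a repeated root
```

The repeated root in the last case shows up as the same value returned twice.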

Exercise 263. Solve each equation. (Answer on p. 1514.)

(a) $x^2 + x + 1 = 0$. (b) $x^2 + 2x + 2 = 0$. (c) $3x^2 + 3x + 1 = 0$.

[259] This is a somewhat simplistic explanation. Repeated or multiple roots actually have greater significance
than merely ensuring the veracity of Fact 126 or Theorem 11.
62.1. The Fundamental Theorem of Algebra
By Fact 126, every quadratic equation has two (possibly repeated) complex roots. It turns
out this is more generally true: Every nth-degree polynomial equation in one variable has
n (possibly repeated) roots. This is the Fundamental Theorem of Algebra (FTA):

Theorem 11. (The Fundamental Theorem of Algebra.) Suppose $a_0 \neq 0$. Then the
following equation has n (possibly repeated) roots:

$$a_0 x^n + a_1 x^{n-1} + a_2 x^{n-2} + \dots + a_{n-1} x + a_n = 0.$$

Proof. Omitted. See e.g. Schilling, Lankham, & Nachtergaele (2016, Ch. 3).

Example 809. The 2nd-degree polynomial (or quadratic) equation $x^2 - 1 = 0$ has two
roots, namely 1 and −1.

Example 810. The 2nd-degree polynomial (or quadratic) equation $x^2 + 1 = 0$ has two
roots, namely i and −i.

Example 811. By the FTA, the 3rd-degree polynomial equation (or cubic equation)
$x^3 - 8 = 0$ has three roots. Let’s find them using what we learnt in Ch. 21.1.
Observe that $2^3 - 8 = 0$. So one root is 2 and x − 2 is a factor of $x^3 - 8$.
To find the other two factors, write:

$$x^3 - 8 = (x - 2)(ax^2 + bx + c) = ax^3 + (b - 2a)x^2 + ?x - 2c.$$

Comparing coefficients, we have a = 1, b = 2, and c = 4. (Note that ? stands for a coefficient
we don’t bother to calculate because it isn’t necessary. We have three unknowns a, b,
and c; and so, it is only necessary to compute three of these coefficients.) Thus:

$$x^3 - 8 = (x - 2)(x^2 + 2x + 4).$$

We can then further factorise $x^2 + 2x + 4$ using the usual quadratic formula:

$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a} = \frac{-2 \pm \sqrt{2^2 - 4(1)(4)}}{2 \cdot 1} = -1 \pm \sqrt{3}i.$$

Altogether then, the three roots of the 3rd-degree polynomial equation $x^3 - 8 = 0$ are:

$$2, \qquad -1 + \sqrt{3}i, \qquad -1 - \sqrt{3}i.$$

And here is the cubic polynomial $x^3 - 8$ factorised into its three linear factors:

$$x^3 - 8 = (x - 2)(x + 1 - \sqrt{3}i)(x + 1 + \sqrt{3}i).$$
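
The step of dividing out a known factor x − r can also be done by computer. The sketch below uses synthetic division, which is a different technique from the comparison of coefficients used in Example 811 but produces the same quotient. The function name `deflate` is made up; coefficients are listed from the highest power down.

```python
def deflate(coeffs, r):
    """Divide a polynomial by (x - r); return (quotient coefficients, remainder)."""
    out = [coeffs[0]]
    for c in coeffs[1:]:
        out.append(c + r * out[-1])   # bring down the next coefficient, add r times the previous
    return out[:-1], out[-1]

# x^3 - 8 divided by (x - 2):
quotient, remainder = deflate([1, 0, 0, -8], 2)
print(quotient, remainder)
```

A remainder of zero confirms that r is a root, and the quotient coefficients are those of the remaining quadratic factor.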

Exercise 264. Verify that $-1 \pm \sqrt{3}i$ solve $x^3 - 8 = 0$. (Answer on p. 1514.)


As noted earlier, there may sometimes be repeated or multiple roots:

Example 812. The 2nd-degree polynomial (or quadratic) equation

$$x^2 - 2x + 1 = (x - 1)^2 = 0$$

has two repeated or multiple roots, namely 1 and 1.

Example 813. The 3rd-degree polynomial (or cubic) equation

$$x^3 - 6x^2 + 12x - 8 = (x - 2)^3 = 0$$

has three repeated or multiple roots, namely 2, 2, and 2.

The FTA can be useful even if we have no idea how to solve an equation.

Example 814. We may have no idea how to solve the 17th-degree polynomial equation

$$x^{17} + 3x^4 - 2x + 1 = 0.$$

Nonetheless, the FTA gives us a useful piece of information, namely that this equation
must have 17 roots or solutions (though some may possibly be repeated).

Exercise 265. Solve $x^3 + 64 = 0$. (Answer on p. 1514.)

Exercise 266. You’re given that 1 solves both of the equations given below. Find the
other roots of each equation. (Answer on p. 1514.)

(a) $x^3 + x^2 - 2 = 0$. (b) $x^4 - x^2 - 2x + 2 = 0$.


62.2. The Complex Conjugate Root Theorem

Example 815. The equation x2 − 2x + 2 = 0 has roots 1 + i and 1 − i.

Example 816. The equation $7x^2 + x + 1 = 0$ has roots $-\frac{1}{14} + \frac{3\sqrt{3}}{14} i$ and $-\frac{1}{14} - \frac{3\sqrt{3}}{14} i$.

The above examples suggest that if $z = p + iq$ solves the quadratic equation $ax^2 + bx + c = 0$ (with real coefficients a, b, and c), then so too does its conjugate $z^* = p - iq$. It turns out that this is generally true of any polynomial equation, provided the coefficients are real:

Theorem 12. (Complex Conjugate Root Theorem.) Suppose $c_0, c_1, \dots, c_n \in \mathbb{R}$. If $z = a + ib$ solves $c_n x^n + c_{n-1} x^{n-1} + c_{n-2} x^{n-2} + \dots + c_1 x + c_0 = 0$, then so does $z^* = a - ib$.

Proof. See p. 1319 in the Appendices.

Example 817. If given that z = 2 − i solves x3 − x2 − 7x + 15 = 0, then by Theorem 12, we

know that the conjugate z ∗ = 2 + i also solves the same equation.

Example 818. If given that both i and 0.5i solve 4x4 + 5x2 + 1 = 0, then by Theorem 12,
we know that their conjugates −i and −0.5i also solve the same equation.

Example 819. If −1 + 2i is a root of x3 − 3x2 − 5x − 25 = 0, then what are the other two?
Well, by Theorem 12, we know that −1 − 2i must be another root.
So, both $x - (-1 + 2i) = x + 1 - 2i$ and $x - (-1 - 2i) = x + 1 + 2i$ are factors of $x^3 - 3x^2 - 5x - 25$. Their product is:

$$(x + 1 - 2i)(x + 1 + 2i) = (x + 1)^2 - (2i)^2 = x^2 + 2x + 5.$$

Now write: x3 − 3x2 − 5x − 25 = (x2 + 2x + 5) (ax + b) = ax3 +?x2 +?x + 5b.

Comparing coefficients, we have a = 1 and b = −5. Thus:

x3 − 3x2 − 5x − 25 = (x2 + 2x + 5) (x − 5) = (x + 1 − 2i) (x + 1 + 2i) (x − 5) .

Altogether then, the three roots of the cubic equation x3 − 3x2 − 5x − 25 = 0 are:

−1 + 2i, −1 − 2i, 5.
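The three roots just found can be checked by direct substitution. The sketch below evaluates the cubic with Horner's method; the helper name `horner` and the dictionary keys are illustrative assumptions, not notation from this textbook.

```python
def horner(coeffs, x):
    """Evaluate a polynomial at x; coeffs are listed from the highest power down."""
    result = 0
    for c in coeffs:
        result = result * x + c
    return result

cubic = [1, -3, -5, -25]  # x^3 - 3x^2 - 5x - 25

z = complex(-1, 2)
values = {
    "z": horner(cubic, z),                      # the given root -1 + 2i
    "conjugate": horner(cubic, z.conjugate()),  # its conjugate -1 - 2i
    "five": horner(cubic, 5),                   # the real root 5
}
print(values)
```

All three values should be zero, in line with Theorem 12 and the factorisation above.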

Example 820. Let p, q ∈ R. If 3 + 2i solves x2 + px + q = 0, then what are p and q?

Well, by Theorem 12, 3 − 2i also solves this equation. Thus:

$$x^2 + px + q = [x - (3 + 2i)][x - (3 - 2i)] = (x - 3)^2 - (2i)^2 = x^2 - 6x + 13.$$

Comparing coefficients, we have p = −6 and q = 13.


Example 821. We’re given that i solves x4 + x3 − 5x2 + x − 6 = 0. What are the other
three roots?
Well, by Theorem 12, we know that −i must be another root.
So, both x − i and x + i are factors of x4 + x3 − 5x2 + x − 6.

Compute: (x − i) (x + i) = x2 − i2 = x2 + 1.

Now: x4 + x3 − 5x2 + x − 6 = (x2 + 1) (ax2 + bx + c) = ax4 + bx3 +?x2 +?x + c.

Comparing coefficients, we have a = 1, b = 1, and c = −6. Thus:

x4 + x3 − 5x2 + x − 6 = (x2 + 1) (x2 + x − 6) .

By the quadratic formula or otherwise, we have:

x2 + x − 6 = (x + 3) (x − 2) .

Altogether then: x4 + x3 − 5x2 + x − 6 = (x − i) (x + i) (x + 3) (x − 2).

And the four roots of the quartic equation x4 + x3 − 5x2 + x − 6 = 0 are:

i, −i, −3, 2.

By the way, the condition that all coefficients c0 , c1 , . . . , cn in the polynomial equation are
real is important. If this condition is violated, then the Theorem’s conclusion may not hold:

Example 822. We are given that $-2 + i$ solves:

$$x^2 - (5 + 4i)x + (-17 + i) = 0. \tag{1}$$

Observe that not all of the coefficients in (1) are real. And so, Theorem 12’s conclusion may not hold.

And indeed, it does not. As you should verify yourself, the conjugate $-2 - i$ does not solve (1). Instead, the other solution to (1) is $7 + 3i$.
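One way to carry out the verification suggested in Example 822 is numerically. This is a minimal sketch; the function name `q` is an illustrative assumption.

```python
def q(x):
    """Left-hand side of x^2 - (5 + 4i)x + (-17 + i) = 0."""
    return x**2 - (5 + 4j) * x + (-17 + 1j)

root = -2 + 1j
print(q(root))              # the given root -2 + i
print(q(root.conjugate()))  # its conjugate -2 - i
print(q(7 + 3j))            # the other root 7 + 3i
```

The first and third values should be zero, while the second should not, which illustrates why Theorem 12 requires real coefficients.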

Exercise 267. You’re given that 2 − 3i solves both of the equations below. Find the
other roots of each equation. (Answer on p. 1515.)

(a) x4 − 6x3 + 18x2 − 14x − 39 = 0. (b) −2x4 + 21x3 − 93x2 + 229x − 195 = 0.

Exercise 268. Suppose 1 − i solves:

x2 + px + q = 0 (p, q ∈ R).

Then what are p and q? (Answer on p. 1515.)


63. The Argand Diagram
Since secondary school, we’ve known that ordered pairs of real numbers can be depicted
geometrically as points on the plane.
We just learnt that the complex number z = a + ib may also be written as an ordered pair:

z = (a, b) .

And so, complex numbers can also be depicted geometrically as points on the plane. This
time, the real axis is the horizontal or x-axis, while the imaginary axis is the vertical or
y-axis. We call this plane the complex plane or Argand diagram.

Example 823. Here is an Argand diagram that depicts eight complex numbers.

[Figure: Argand diagram showing the points 3i = (0, 3), −3 + 2i = (−3, 2), 1 + i = (1, 1), 0 = (0, 0), −3 = (−3, 0), 2 = (2, 0), 1 − 3i = (1, −3), and −4i = (0, −4).]

The “impure” imaginary numbers −3 + 2i = (−3, 2), 1 + i = (1, 1), and 1 − 3i = (1, −3) are not on either axis.
The purely imaginary numbers 3i = (0, 3) and −4i = (0, −4) are on the y-axis.
The real numbers −3 = (−3, 0), 0 = (0, 0), and 2 = (2, 0) are on the x-axis.

Exercise 269. Depict the complex numbers 2, −1, 2i, 1 + 2i, and −1 − 3i on a single
Argand diagram. (Answer on p. 1515.)

Remark 74. Both of the following mathematical objects can be depicted on a plane.
• The set of complex numbers or the complex plane C = {a + ib ∶ a, b ∈ R}; and
• The set of ordered pairs of real numbers or the cartesian plane
{(x, y) ∶ x, y ∈ R}.

However, you should be aware that the complex plane and the cartesian plane are different mathematical objects. Don’t worry, you need merely be aware that they are different; you needn’t know what exactly the differences are.


64. Complex Numbers in Polar Form
So far, we’ve written complex numbers in cartesian form,[260] i.e. as either:

z = a + ib or z = (a, b).

In this chapter, we’ll learn to write down a complex number in polar form. (And in the
next chapter, we’ll learn how to do so in exponential form.)
To write down a complex number z = a + ib = (a, b) in cartesian form, we need two pieces of information:

its real part Re z = a; and its imaginary part Im z = b.

To write down a complex number z in polar form, we likewise need two pieces of information:

its modulus, denoted ∣z∣; and its argument, denoted arg z.

Informally, the modulus of a complex number is the magnitude or length of its position
vector. Formally:

Definition 162. Given a complex number $z = a + ib$, its modulus, denoted $|z|$, is the following number:

$$|z| = \sqrt{a^2 + b^2}.$$

Example 824. Let:

$$z = 3 + 2i = (3, 2), \quad w = -2i = (0, -2), \quad \omega = -2 - 2i = (-2, -2).$$

Then:

$$|z| = \sqrt{3^2 + 2^2} = \sqrt{13},$$
$$|w| = \sqrt{0^2 + (-2)^2} = 2,$$
$$|\omega| = \sqrt{(-2)^2 + (-2)^2} = 2\sqrt{2}.$$

[Figure: Argand diagram showing z = 3 + 2i with ∣z∣ = √13, w = −2i with ∣w∣ = 2, and ω = −2 − 2i with ∣ω∣ = 2√2.]
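The moduli in Example 824 can be reproduced directly from Definition 162. In the sketch below, the function name `modulus` is an illustrative assumption; Python's built-in `abs` computes the same quantity for complex numbers and is printed alongside for comparison.

```python
import math

def modulus(z):
    """Modulus of z = a + ib, computed as sqrt(a^2 + b^2) (Definition 162)."""
    return math.sqrt(z.real**2 + z.imag**2)

z, w, omega = 3 + 2j, -2j, -2 - 2j
for number in (z, w, omega):
    # abs() is Python's built-in modulus; the two columns should agree.
    print(number, modulus(number), abs(number))
```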

Exercise 270. Compute the moduli of the following numbers: (Answer on p. 1516.)

2, −1, 2i, 1 + 2i, −1 − 3i.

[260] Also known as standard or rectangular form.
64.1. The Argument: An Informal Introduction
Informally, a complex number’s argument is the angle that number’s position vector makes
with the positive x-axis:

Example 825. Let $z = 3 + 2i = (3, 2)$, $w = -2 = (-2, 0)$, and $\omega = -2 - 2i = (-2, -2)$. Then:

$$\arg z = \tan^{-1}\frac{2}{3} \approx 0.588, \quad \arg w = \pi, \quad \text{and} \quad \arg \omega = -\frac{3\pi}{4}.$$

[Figure: Argand diagram showing z = 3 + 2i with ∣z∣ = √13 and arg z ≈ 0.588; w = −2 with ∣w∣ = 2 and arg w = π; and ω = −2 − 2i with ∣ω∣ = 2√2 and arg ω = −3π/4.]

Notice that the angles that give us arg z and arg w are measured anti-clockwise from the positive x-axis. In contrast, the angle that gives us arg ω is measured clockwise from the positive x-axis.
We will adopt the following informal rule:
• If the complex number a is on or above the x-axis (i.e. Im a ≥ 0), then the angle that gives us arg a is measured anti-clockwise from the positive x-axis.
• But if a is strictly below the x-axis (i.e. Im a < 0), then the angle that gives us arg a is measured clockwise from the positive x-axis, and arg a is the negative of that angle.
And thus:

• If a is on or above the x-axis, then arg a (the angle measured anti-clockwise from the positive x-axis) must be in the interval [0, π].
• And if a is below the x-axis, then arg a (the negative of the angle measured clockwise from the positive x-axis) must be in the interval (−π, 0).
Altogether then, for any non-zero complex number a, we always have:

arg a ∈ (−π, π].

Or equivalently, the range or set of principal values of the argument function is:

Range (arg) = (−π, π].

Remark 75. The argument of the complex number 0, i.e. arg 0, is undefined. For all other
z, we have arg z ∈ (−π, π].

Exercise 271. Find the arguments of 2, −1, 2i, 1 + 2i, and −1 − 3i. (Answer on p. 1516.)

Exercise 272. Depict the complex numbers z = 2 − i and w = −3 + 2i on a single Argand

diagram. Then find their moduli and arguments. (Answer on p. 1516.)


64.2. The Argument: Formally Defined
We now work towards a formal definition of the argument. We’ll do so using what we learnt about vectors.
Let $z = (a, b)$ and $w = (c, d)$ be non-zero complex numbers, where z is on or above the x-axis (i.e. b ≥ 0), while w is below it (i.e. d < 0).
Let $\mathbf{z} = (a, b)$ and $\mathbf{w} = (c, d)$ be their position vectors on the Argand diagram. Let $\mathbf{i} = (1, 0)$ be the unit vector that points in the direction of the positive x-axis.
Let θ be the angle between $\mathbf{z} = (a, b)$ and $\mathbf{i} = (1, 0)$; and µ be the angle between $\mathbf{w} = (c, d)$ and $\mathbf{i} = (1, 0)$. Following our informal discussion on the previous page, we “want”:

arg z = θ and arg w = −µ.

[Figure: Argand diagram showing z = a + ib above the x-axis with arg z = θ, w = c + id below the x-axis with arg w = −µ, and the unit vector i = (1, 0).]

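Recall (Definition 111) that θ, the angle between $\mathbf{z}$ and $\mathbf{i}$, is given by:

$$\theta = \cos^{-1}\frac{\mathbf{z}\cdot\mathbf{i}}{|\mathbf{z}||\mathbf{i}|} = \cos^{-1}\frac{(a, b)\cdot(1, 0)}{|(a, b)||(1, 0)|} = \cos^{-1}\frac{a + 0}{\sqrt{a^2 + b^2}\cdot 1} = \cos^{-1}\frac{a}{\sqrt{a^2 + b^2}} = \cos^{-1}\frac{a}{|z|}.$$

Similarly, µ, the angle between $\mathbf{w}$ and $\mathbf{i}$, is given by:

$$\mu = \cos^{-1}\frac{\mathbf{w}\cdot\mathbf{i}}{|\mathbf{w}||\mathbf{i}|} = \cos^{-1}\frac{(c, d)\cdot(1, 0)}{|(c, d)||(1, 0)|} = \cos^{-1}\frac{c + 0}{\sqrt{c^2 + d^2}\cdot 1} = \cos^{-1}\frac{c}{\sqrt{c^2 + d^2}} = \cos^{-1}\frac{c}{|w|}.$$

Thus:

$$\arg z = \theta = \cos^{-1}\frac{a}{|z|} \quad \text{and} \quad \arg w = -\mu = -\cos^{-1}\frac{c}{|w|}.$$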

We are led to the following formal definition of a complex number’s argument:

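Definition 163. Given a non-zero complex number $z = a + ib$, its argument, denoted $\arg z$, is the following number:

$$\arg z = \begin{cases} \cos^{-1}\dfrac{a}{\sqrt{a^2 + b^2}} & \text{if } b \ge 0, \\[2ex] -\cos^{-1}\dfrac{a}{\sqrt{a^2 + b^2}} & \text{if } b < 0. \end{cases}$$

Or equivalently:

$$\arg z = \begin{cases} \cos^{-1}\dfrac{\operatorname{Re} z}{|z|} & \text{if } \operatorname{Im} z \ge 0, \\[2ex] -\cos^{-1}\dfrac{\operatorname{Re} z}{|z|} & \text{if } \operatorname{Im} z < 0. \end{cases}$$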
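Definition 163 translates almost line by line into code. The following is a sketch under that reading; the function name `arg` is an illustrative assumption. For comparison, the standard-library function `cmath.phase` also returns a principal argument.

```python
import cmath
import math

def arg(z):
    """Principal argument of a non-zero complex number, following Definition 163."""
    if z == 0:
        raise ValueError("arg 0 is undefined")
    angle = math.acos(z.real / abs(z))
    # Negate the arccosine only when z is strictly below the x-axis.
    return angle if z.imag >= 0 else -angle

for z in (3 + 2j, -2 + 0j, -2 - 2j):
    print(z, arg(z), cmath.phase(z))
```

The two columns should agree for these inputs. One caveat: `cmath.phase` distinguishes a negative zero imaginary part and may return −π for such a number, whereas the definition above always gives π for a negative real number.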


Observe that in accord with Remark 75, the above Definition leaves arg 0 undefined.

Example 826. Let $z = 3 + 2i = (3, 2)$, $w = -2 = (-2, 0)$, and $\omega = -2 - 2i = (-2, -2)$.
We have $|z| = \sqrt{13}$, $|w| = 2$, and $|\omega| = 2\sqrt{2}$. And so by Definition 163:

$$\arg z = \cos^{-1}\frac{\operatorname{Re} z}{|z|} = \cos^{-1}\frac{3}{\sqrt{13}} \approx 0.588,$$
$$\arg w = \cos^{-1}\frac{\operatorname{Re} w}{|w|} = \cos^{-1}\frac{-2}{2} = \pi,$$
$$\arg \omega = -\cos^{-1}\frac{\operatorname{Re} \omega}{|\omega|} = -\cos^{-1}\frac{-2}{2\sqrt{2}} = -\frac{3\pi}{4}.$$

[Figure: Argand diagram showing z = 3 + 2i with ∣z∣ = √13 and arg z ≈ 0.588; w = −2 with ∣w∣ = 2 and arg w = π; and ω = −2 − 2i with ∣ω∣ = 2√2 and arg ω = −3π/4.]

Take note that there is a negative sign before arccosine for ω (because ω is below the x-axis). In contrast, there isn’t for either z or w (because they are on or above the x-axis).

The following result is immediate from what we’ve learnt about the Argand diagram and
Definition 163:

Fact 127. Let z be a non-zero complex number. Then:

(a) z is purely imaginary ⇐⇒ z is on the y-axis ⇐⇒ $\arg z = \pm\frac{\pi}{2}$.

(b) z is a positive real number ⇐⇒ z is on the positive x-axis ⇐⇒ arg z = 0.
(c) z is a negative real number ⇐⇒ z is on the negative x-axis ⇐⇒ arg z = π.

Exercise 273. Find each number’s argument, but this time using Definition 163. (Check
that your answers are the same as before.) (Answer on p. 1516.)

2, −1, 2i, 1 + 2i, −1 − 3i, z = 2 − i, w = −3 + 2i.

Remark 76. In this chapter, we’ve learnt two methods for computing the argument of a complex number. The first method, covered in the previous subchapter, may be called the “look at the graph and use arctangent” method. The second method, covered in this subchapter, simply uses Definition 163. I personally do not find the second method difficult to remember and so going forward in this textbook, that’s what I’ll be using. But you should use whichever you think is easier for you.


64.3. Complex Numbers in Polar Form
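Suppose z = a + ib = (a, b) is a complex number with a, b > 0. Let θ = arg z and r = ∣z∣.

[Figure: Argand diagram showing z = a + ib = (a, b) in the first quadrant, with a right-angled triangle of horizontal side a, vertical side b, hypotenuse r, and angle θ at the origin.]

We have:

$$\cos\theta = \frac{A}{H} = \frac{a}{r} \quad \text{and} \quad \sin\theta = \frac{O}{H} = \frac{b}{r}.$$

Rearranging: a = r cos θ and b = r sin θ.
Thus, we may also write z in polar form as:

z = a + ib = r cos θ + ir sin θ = r (cos θ + i sin θ).

Let’s jot this down as a formal result: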

Fact 128. Let z be a non-zero complex number with ∣z∣ = r and arg z = θ. Then:

z = r (cos θ + i sin θ) .

Proof. We already proved this result above, but only in the case where both a and b are
positive. For a complete proof, see p. 1319 in the Appendices.

Example 827. Let $z = 5 - 2i = (5, -2)$. Then:

$$r = |z| = \sqrt{5^2 + (-2)^2} = \sqrt{29} \quad \text{and} \quad \theta = \arg z = -\cos^{-1}\frac{5}{\sqrt{29}} \approx -0.381.$$

So, z may also be written in polar form as:

$$z = r(\cos\theta + i\sin\theta) \approx \sqrt{29}\,[\cos(-0.381) + i\sin(-0.381)].$$

Example 828. Let $z = 1 + 3i = (1, 3)$. Then:

$$r = |z| = \sqrt{1^2 + 3^2} = \sqrt{10} \quad \text{and} \quad \theta = \arg z = \cos^{-1}\frac{1}{\sqrt{10}} \approx 1.249.$$

Thus: $z = r(\cos\theta + i\sin\theta) \approx \sqrt{10}\,(\cos 1.249 + i\sin 1.249)$.

Example 829. Let $z = -4 + 7i = (-4, 7)$. Then:

$$r = |z| = \sqrt{(-4)^2 + 7^2} = \sqrt{65} \quad \text{and} \quad \theta = \arg z = \cos^{-1}\frac{-4}{\sqrt{65}} \approx 2.090.$$

Thus: $z = r(\cos\theta + i\sin\theta) \approx \sqrt{65}\,(\cos 2.090 + i\sin 2.090)$.
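The conversions in the last three examples can be cross-checked with Python's standard `cmath` module. This is a sketch for checking arithmetic, not a method used elsewhere in this textbook: `cmath.polar` returns the pair (modulus, principal argument) and `cmath.rect` converts back to cartesian form. The variable names are illustrative assumptions.

```python
import cmath

numbers = [5 - 2j, 1 + 3j, -4 + 7j]
polar_forms = [cmath.polar(z) for z in numbers]

for z, (r, theta) in zip(numbers, polar_forms):
    # cmath.rect rebuilds r (cos theta + i sin theta) as a cartesian number.
    print(z, r, round(theta, 3), cmath.rect(r, theta))
```

The rebuilt numbers should match the originals up to rounding error.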

Exercise 274. Rewrite each complex number in polar form. (Hint: We already computed
their moduli and arguments in earlier exercises.) (Answer on p. 1516.)

2, −1, 2i, 1 + 2i, −1 − 3i, z = 2 − i, w = −3 + 2i.

65. Complex Numbers in Exponential Form
In this chapter, we introduce Euler’s Formula, then use Euler’s Formula to write down
complex numbers in exponential form.

Theorem 13. (Euler’s Formula) If θ ∈ R, then $e^{i\theta} = \cos\theta + i\sin\theta$.

Proof. See p. 1320 in the Appendices.

Richard Feynman called the above “the most remarkable formula in mathematics”.[261]
Now, plug θ = π into Euler’s Formula to get:

$$e^{i\pi} = \cos\pi + i\sin\pi = -1 + 0 = -1.$$

Rearrangement yields Euler’s Identity:

Corollary 25. (Euler’s Identity) $e^{i\pi} + 1 = 0$.

Euler’s Identity is one of the most extraordinary and beautiful equations in all of mathematics. It links together five of the most fundamental mathematical constants:

e, i, π, 1, and 0.
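Euler's Formula and Euler's Identity can both be checked numerically. In this sketch, the angle 0.75 is an arbitrary choice made only for illustration, and the variable names are assumptions.

```python
import cmath
import math

theta = 0.75  # an arbitrary real angle, chosen only for illustration
lhs = cmath.exp(1j * theta)                       # e^(i*theta)
rhs = complex(math.cos(theta), math.sin(theta))   # cos(theta) + i sin(theta)
print(abs(lhs - rhs))

# Euler's Identity: e^(i*pi) + 1 should vanish up to rounding error.
identity_residual = abs(cmath.exp(1j * math.pi) + 1)
print(identity_residual)
```

Both printed values should be zero or extremely small; any tiny non-zero value comes from floating-point rounding (π itself is stored only approximately).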

Fun Fact

Leonhard Euler (1707–83) was a stud. There are so many mathematical results and
objects named after him that there is even a Wikipedia entry listing the things named
after him!
This can sometimes result in confusion. For example, what we call Euler’s Formula is
called Euler’s Identity by others and vice versa.
As another example, Euler’s number e = 2.718 . . . is different from Euler’s constant
γ = 0.577 . . .
Even if we count only Euler’s output after he turned blind at around age 60, his output
was of such quality and quantity that has been matched by few other mathematicians in

[261] The Feynman Lectures on Physics (1964, p. 22–10).
65.1. Complex Numbers in Exponential Form
Let z be a non-zero complex number with r = ∣z∣ and θ = arg z. Then by Fact 128:

$$z = r(\cos\theta + i\sin\theta). \tag{1}$$

By Euler’s Formula:

$$\cos\theta + i\sin\theta = e^{i\theta}. \tag{2}$$

Now plug (2) into (1) and we will have written z down in exponential form:

$$z = re^{i\theta}.$$

For future reference, let’s jot this down as a formal result:

Fact 129. Let z be a non-zero complex number. Suppose r = ∣z∣ and θ = arg z. Then:

$$z = re^{i\theta}.$$

Example 830. Let $z = 5 - 2i = (5, -2)$. Then:

$$r = |z| = \sqrt{5^2 + (-2)^2} = \sqrt{29} \quad \text{and} \quad \theta = \arg z = -\cos^{-1}\frac{5}{\sqrt{29}} \approx -0.381.$$

So, we may also write z in exponential form as:

$$z = re^{i\theta} \approx \sqrt{29}\,e^{-0.381i}.$$

Example 831. Let $z = 1 + 3i = (1, 3)$. Then:

$$r = |z| = \sqrt{1^2 + 3^2} = \sqrt{10} \quad \text{and} \quad \theta = \arg z = \cos^{-1}\frac{1}{\sqrt{10}} \approx 1.249.$$

Thus: $z = re^{i\theta} \approx \sqrt{10}\,e^{1.249i}$.

Example 832. Let $z = -4 + 7i = (-4, 7)$. Then:

$$r = |z| = \sqrt{(-4)^2 + 7^2} = \sqrt{65} \quad \text{and} \quad \theta = \arg z = \cos^{-1}\frac{-4}{\sqrt{65}} \approx 2.090.$$

Thus: $z = re^{i\theta} \approx \sqrt{65}\,e^{2.090i}$.

Exercise 275. Rewrite each complex number in exponential form. (Hint: We already computed their moduli and arguments in earlier exercises.) (Answer on p. 1516.)

2, −1, 2i, 1 + 2i, −1 − 3i, z = 2 − i, w = −3 + 2i.

Remark 77. Your A-Level examiners do not seem to use the term exponential form. What
we call exponential form is simply called polar form by them. (And what we call polar
form is also called polar form by them.)


66. More Arithmetic of Complex Numbers
Now that we know how to write complex numbers in polar and exponential forms, the
arithmetic of complex numbers becomes even easier. We start with more multiplication:

Fact 130. Let z and w be non-zero complex numbers. Then:

(a) $|zw| = |z||w|$; and (b) $\arg(zw) = \arg z + \arg w + 2k\pi$,

where in (b):

$$k = \begin{cases} -1, & \text{if } \arg z + \arg w > \pi, \\ 0, & \text{if } \arg z + \arg w \in (-\pi, \pi], \\ 1, & \text{if } \arg z + \arg w \le -\pi. \end{cases}$$

Proof. For (a), see Exercise 277. For (b), see p. 1321 (Appendices).

The additional term 2kπ in Fact 130(b) is to ensure that arg (zw) ∈ (−π, π], as is required
by the definition of the argument. A few examples will make this clear:

Example 833. We are given:

$$z = 5 - 2i = (5, -2) \approx \sqrt{29}\,[\cos(-0.381) + i\sin(-0.381)] = \sqrt{29}\,e^{-0.381i},$$
$$w = 1 + 3i = (1, 3) \approx \sqrt{10}\,(\cos 1.249 + i\sin 1.249) = \sqrt{10}\,e^{1.249i}.$$

By Fact 130: (a) $|zw| = \sqrt{29}\sqrt{10} = \sqrt{290}$; and

(b) $\arg(zw) = \arg z + \arg w + 2k\pi \approx -0.381 + 1.249 + 0\pi \approx 0.869$.

Now, how did I know to choose k = 0 here? Well, by definition, the argument of any complex number is in the interval (−π, π]. And so, when applying Fact 130(b), we always simply choose k to be such that arg (zw) ∈ (−π, π]. Here we already have arg z + arg w ≈ −0.381 + 1.249 ≈ 0.869 ∈ (−π, π]. And so, we simply choose k = 0.
With (a) and (b), we can write zw down in both polar and exponential forms:

$$zw \approx \sqrt{290}\,(\cos 0.869 + i\sin 0.869) = \sqrt{290}\,e^{0.869i}.$$

To write zw down in cartesian form, we can use ∣zw∣ and arg (zw) to compute:

$$\operatorname{Re}(zw) \approx \sqrt{290}\cos 0.869 \approx 10.994; \quad \text{and} \quad \operatorname{Im}(zw) \approx \sqrt{290}\sin 0.869 \approx 13.005.$$

Of course, since the real and imaginary parts of both z and w are all integers, so too
must be the real and imaginary parts of zw. And so we have in fact Re (zw) = 11 and
Im (zw) = 13. Thus, zw = 11 + 13i.
Alternatively, we can do the usual multiplication, which yields us the exact value of zw:

zw = (5 − 2i) (1 + 3i) = 5 + 15i − 2i + 6 = 11 + 13i.


Example 834. We are given:

$$z = -4 + 7i = (-4, 7) \approx \sqrt{65}\,(\cos 2.090 + i\sin 2.090) = \sqrt{65}\,e^{2.090i},$$
$$w = 1 + 9i = (1, 9) \approx \sqrt{82}\,(\cos 1.460 + i\sin 1.460) = \sqrt{82}\,e^{1.460i}.$$

By Fact 130: (a) $|zw| = \sqrt{65}\sqrt{82}$; and

(b) $\arg(zw) = \arg z + \arg w + 2k\pi \approx 2.090 + 1.460 - 2\pi \approx -2.733$.

Since arg z + arg w ≈ 2.090 + 1.460 > π, we choose k = −1.
With (a) and (b), we can write zw down in both polar and exponential forms:

$$zw \approx \sqrt{65}\sqrt{82}\,[\cos(-2.733) + i\sin(-2.733)] = \sqrt{65}\sqrt{82}\,e^{-2.733i}.$$

To write zw down in cartesian form, we can use ∣zw∣ and arg (zw) to compute:

$$\operatorname{Re}(zw) \approx \sqrt{65}\sqrt{82}\cos(-2.733) \approx -67; \quad \text{and} \quad \operatorname{Im}(zw) \approx \sqrt{65}\sqrt{82}\sin(-2.733) \approx -29.$$

Thus: zw = −67 − 29i.

Alternatively, we can do the usual multiplication, which yields us the exact value of zw:

zw = (−4 + 7i) (1 + 9i) = −4 − 36i + 7i − 63 = −67 − 29i.

Example 835. We are given:

$$z = -2 - i = (-2, -1) \approx \sqrt{5}\,[\cos(-2.678) + i\sin(-2.678)] = \sqrt{5}\,e^{-2.678i},$$
$$w = 1 - 3i = (1, -3) \approx \sqrt{10}\,[\cos(-1.249) + i\sin(-1.249)] = \sqrt{10}\,e^{-1.249i}.$$

By Fact 130: (a) $|zw| = \sqrt{5}\sqrt{10} = 5\sqrt{2}$; and

(b) $\arg(zw) = \arg z + \arg w + 2k\pi \approx -2.678 - 1.249 + 2\pi \approx 2.356$.

Since arg z + arg w ≈ −2.678 − 1.249 ≤ −π, we choose k = 1.
With (a) and (b), we can write zw down in both polar and exponential forms:

$$zw \approx 5\sqrt{2}\,(\cos 2.356 + i\sin 2.356) = 5\sqrt{2}\,e^{2.356i}.$$

To write zw down in cartesian form, we can use ∣zw∣ and arg (zw) to compute:

$$\operatorname{Re}(zw) \approx 5\sqrt{2}\cos 2.356 \approx -5; \quad \text{and} \quad \operatorname{Im}(zw) \approx 5\sqrt{2}\sin 2.356 \approx 5.$$

Thus: zw = −5 + 5i.

Alternatively, we can do the usual multiplication, which yields us the exact value of zw:

zw = (−2 − i) (1 − 3i) = −2 + 6i − i − 3 = −5 + 5i.
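The choice of k in the last three examples can be mechanised. The sketch below follows Fact 130(b) literally; the function name `arg_of_product` is an illustrative assumption, and `cmath.phase` is used to obtain each principal argument.

```python
import cmath
import math

def arg_of_product(z, w):
    """arg(zw) via Fact 130(b): add the arguments, then shift by 2k*pi."""
    total = cmath.phase(z) + cmath.phase(w)
    if total > math.pi:
        k = -1
    elif total <= -math.pi:
        k = 1
    else:
        k = 0
    return total + 2 * k * math.pi, k

# The pairs (z, w) from Examples 833, 834, and 835.
pairs = [(5 - 2j, 1 + 3j), (-4 + 7j, 1 + 9j), (-2 - 1j, 1 - 3j)]
for z, w in pairs:
    angle, k = arg_of_product(z, w)
    # The last two columns should agree: Fact 130(b) versus direct computation.
    print(z, w, k, round(angle, 3), round(cmath.phase(z * w), 3))
```

Computing `cmath.phase(z * w)` directly is simpler in practice; the point of the sketch is only to show where the term 2kπ comes from.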


Multiplying a complex number by a positive real number leaves the argument unchanged:

Corollary 26. Suppose z is a complex number and a > 0. Then arg (az) = arg z.

Proof. By Fact 130, arg (az) = arg a + arg z + 2kπ = arg z + 2kπ = arg z, where we choose
k = 0 because arg z ∈ (−π, π].

Example 836. Let z = 7 − 9i, so that 5z = 35 − 45i. Then arg z = arg (5z) ≈ −0.910.

In contrast, multiplying z by −1 changes the argument by either −π or +π:

Corollary 27. (a) If arg z > 0, then arg (−z) = arg z − π.

(b) If arg z ≤ 0, then arg (−z) = arg z + π.

Proof. By Fact 130(b), arg (−z) = arg (−1 ⋅ z) = arg (−1) + arg z + 2kπ = π + arg z + 2kπ.
(a) If arg z > 0, then k = −1 and thus: arg (−z) = π + arg z − 2π = arg z − π.

(b) If arg z ≤ 0, then k = 0 and thus: arg (−z) = π + arg z + 0π = arg z + π.


Example 837. Let z = 7 − 9i, so that −z = −7 + 9i. Since arg z ≈ −0.910 ≤ 0, by Corollary 27, we have arg (−z) = arg z + π ≈ −0.910 + π ≈ 2.232.

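Example 838. Let $a = 1$, $b = 1 + i$, $c = i$, and $d = 1 - \sqrt{3}\,i$. Then:

$$\arg a = \arg 1 = 0 \le 0,$$
$$\arg b = \arg(1 + i) = \frac{\pi}{4} > 0,$$
$$\arg c = \arg i = \frac{\pi}{2} > 0,$$
$$\arg d = \arg(1 - \sqrt{3}\,i) = -\frac{\pi}{3} \le 0.$$

And so by Corollary 27:

$$\arg(-a) = \arg(-1) = \arg a + \pi = \pi,$$
$$\arg(-b) = \arg(-1 - i) = \arg b - \pi = -\frac{3\pi}{4},$$
$$\arg(-c) = \arg(-i) = \arg c - \pi = -\frac{\pi}{2},$$
$$\arg(-d) = \arg(-1 + \sqrt{3}\,i) = \arg d + \pi = \frac{2\pi}{3}.$$

[Figure: Argand diagram showing a = 1, b = 1 + i, c = i, d = 1 − √3 i, and their negatives −a = −1, −b = −1 − i, −c = −i, −d = −1 + √3 i.]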


Combining the last two Corollaries yields the following result, which says that multiplying a complex number by a negative real number changes the argument by either −π or +π:

Corollary 28. Suppose z is a complex number and a > 0.

(a) If arg z > 0, then arg (−az) = arg (−z) = arg z − π.
(b) If arg z ≤ 0, then arg (−az) = arg (−z) = arg z + π.

Example 839. Let z = 7 − 9i, so that −5z = −35 + 45i. Since arg z ≈ −0.910 ≤ 0, by Corollary 28, we have arg (−5z) = arg z + π ≈ −0.910 + π ≈ 2.232.

Example 840. Let $a = 1$, $b = 1 + i$, $c = i$, and $d = 1 - \sqrt{3}\,i$. Then:

$$\arg a = \arg 1 = 0 \le 0, \qquad \arg c = \arg i = \frac{\pi}{2} > 0,$$
$$\arg b = \arg(1 + i) = \frac{\pi}{4} > 0, \qquad \arg d = \arg(1 - \sqrt{3}\,i) = -\frac{\pi}{3} \le 0.$$

And so, by Corollary 28:

$$\arg(-3a) = \arg(-3) = \arg(-a) = \arg a + \pi = \pi,$$
$$\arg(-3b) = \arg(-3 - 3i) = \arg(-b) = \arg b - \pi = -\frac{3\pi}{4},$$
$$\arg(-3c) = \arg(-3i) = \arg(-c) = \arg c - \pi = -\frac{\pi}{2},$$
$$\arg(-3d) = \arg(-3 + 3\sqrt{3}\,i) = \arg(-d) = \arg d + \pi = \frac{2\pi}{3}.$$

Exercise 276. For each, find ∣zw∣, arg (zw), ∣−2zw∣, and arg (−2zw). Then express both
zw and −2zw in polar, exponential, and cartesian forms. (Answer on p. 1517.)
(a) z = 1, w = −3. (b) z = 2i, w = 1 + 2i. (c) z = −1 − 3i, w
(d) z = −2 + 5i, w = i. (e) z = −1 − i, w = −1 − 2i. (f) z = −5 − 3i, w

Exercise 277. This Exercise guides you through a proof of Fact 130(a). Let r = ∣z∣,
θ = arg z, s = ∣w∣, and φ = arg w. (Answer on p. 1518.)

(a) Express z and w in polar form.

(b) Expand zw. Then use a trigonometric identity to show that:

zw = rs [cos (θ + φ) + i sin (θ + φ)] .

(c) Now show that ∣zw∣ = ∣z∣ ∣w∣.


66.1. The Reciprocal

Fact 131. Suppose w is a non-zero complex number. Then:

(a) $\left|\dfrac{1}{w}\right| = \dfrac{1}{|w|}$.

If moreover w is not a negative real number, then:

(b) $\arg\dfrac{1}{w} = -\arg w$.

Proof. See p. 1322 in the Appendices.

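Example 841. We are given:

$$z = 5 - 2i = (5, -2) \approx \sqrt{29}\,[\cos(-0.381) + i\sin(-0.381)] = \sqrt{29}\,e^{-0.381i},$$
$$w = 1 + 3i = (1, 3) \approx \sqrt{10}\,(\cos 1.249 + i\sin 1.249) = \sqrt{10}\,e^{1.249i}.$$

By Fact 131, (a) $\left|\frac{1}{z}\right| = \frac{1}{\sqrt{29}}$ and (b) $\arg\frac{1}{z} = -\arg z \approx 0.381$. Also, (a) $\left|\frac{1}{w}\right| = \frac{1}{\sqrt{10}}$ and (b) $\arg\frac{1}{w} = -\arg w \approx -1.249$. Thus:

$$\frac{1}{z} \approx \frac{1}{\sqrt{29}}\,(\cos 0.381 + i\sin 0.381) \approx \frac{1}{\sqrt{29}}\,e^{0.381i},$$
$$\frac{1}{w} \approx \frac{1}{\sqrt{10}}\,[\cos(-1.249) + i\sin(-1.249)] \approx \frac{1}{\sqrt{10}}\,e^{-1.249i}.$$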

As stated, Fact 131(b) does not hold in the special case where w < 0. In this special case, we instead simply have:

$$\arg\frac{1}{w} = \arg w = \pi.$$

Example 842. Let ω = −5, so that 1/ω = −1/5. Then arg ω = π and arg (1/ω) = π. So Fact 131(b) does not hold; that is, arg (1/ω) ≠ − arg ω. Instead, we have:

$$\arg\frac{1}{\omega} = \arg\omega = \pi.$$
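Fact 131, together with the special case for negative real numbers, can be summarised in a short function. This is a sketch only; the name `reciprocal_polar` is an illustrative assumption.

```python
import cmath
import math

def reciprocal_polar(w):
    """Modulus and argument of 1/w, computed from those of w (Fact 131)."""
    r, theta = cmath.polar(w)
    # For a negative real w, arg w = pi and arg(1/w) is also pi.
    new_theta = theta if (w.imag == 0 and w.real < 0) else -theta
    return 1 / r, new_theta

# Compare against computing 1/w directly for the numbers in Example 841.
for w in (5 - 2j, 1 + 3j):
    print(w, reciprocal_polar(w), cmath.polar(1 / w))

# The special case of Example 842: w is a negative real number.
print(reciprocal_polar(-5 + 0j))
```

The direct comparison is deliberately skipped for −5: floating-point division can produce a negative zero imaginary part, in which case `cmath.polar` may report −π rather than π.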

Exercise 278. Find the moduli and arguments of each number and its reciprocal. Then
write down the latter in exponential, polar, and cartesian form. (Answer on p. 1518.)
(a) z = 1. (b) w = 2i. (c) z = −17. (d) w = −8i.
(e) z = −2 + 5i. (f) w = −1 − i. (g) z = 1 − 3i. (h) w = 3 + 4i.


66.2. Division

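Fact 132. Let z and w be non-zero complex numbers. Then:

(a) $\left|\dfrac{z}{w}\right| = \dfrac{|z|}{|w|}$; and (b) $\arg\dfrac{z}{w} = \arg z - \arg w + 2k\pi$,

where in (b):

$$k = \begin{cases} -1, & \text{if } \arg z - \arg w > \pi, \\ 0, & \text{if } \arg z - \arg w \in (-\pi, \pi], \\ 1, & \text{if } \arg z - \arg w \le -\pi. \end{cases}$$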

Proof. For (a), see Exercise 280. For (b), see p. 1323 (Appendices).

Example 843. We are given:

$$z = 5 - 2i = (5, -2) \approx \sqrt{29}\,[\cos(-0.381) + i\sin(-0.381)] = \sqrt{29}\,e^{-0.381i},$$
$$w = 1 + 3i = (1, 3) \approx \sqrt{10}\,(\cos 1.249 + i\sin 1.249) = \sqrt{10}\,e^{1.249i}.$$

By Fact 132, we have:

(a) $\left|\dfrac{z}{w}\right| = \dfrac{\sqrt{29}}{\sqrt{10}} = \sqrt{2.9}$,

(b) $\arg\dfrac{z}{w} = \arg z - \arg w + 2k\pi \approx -0.381 - 1.249 + 0 \approx -1.630$,

where we chose k = 0 because arg z − arg w ∈ (−π, π].
With (a) and (b), we can write z/w down in both polar and exponential forms:

$$\frac{z}{w} \approx \sqrt{2.9}\,[\cos(-1.630) + i\sin(-1.630)] = \sqrt{2.9}\,e^{-1.630i}.$$

To write z/w down in cartesian form, we can use ∣z/w∣ and arg (z/w) to compute:

$$\operatorname{Re}\frac{z}{w} \approx \sqrt{2.9}\cos(-1.630) \approx -0.101; \quad \text{and} \quad \operatorname{Im}\frac{z}{w} \approx \sqrt{2.9}\sin(-1.630) \approx -1.700.$$

Thus: $\dfrac{z}{w} = -0.1 - 1.7i$.

Alternatively, we can simply do the usual division, which yields us the exact value of z/w:

$$\frac{z}{w} = \frac{5 - 2i}{1 + 3i} = \frac{5 - 2i}{1 + 3i}\cdot\frac{1 - 3i}{1 - 3i} = \frac{5 - 15i - 2i - 6}{1^2 + 3^2} = \frac{-1 - 17i}{10} = -0.1 - 1.7i.$$


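Example 844. We are given:

$$z = -4 + 7i = (-4, 7) \approx \sqrt{65}\,(\cos 2.090 + i\sin 2.090) = \sqrt{65}\,e^{2.090i},$$
$$w = 1 + 9i = (1, 9) \approx \sqrt{82}\,(\cos 1.460 + i\sin 1.460) = \sqrt{82}\,e^{1.460i}.$$

By Fact 132:

(a) $\left|\dfrac{z}{w}\right| = \dfrac{\sqrt{65}}{\sqrt{82}}$,

(b) $\arg\dfrac{z}{w} = \arg z - \arg w + 2k\pi \approx 2.090 - 1.460 + 0 \approx 0.630$,

where we chose k = 0 because arg z − arg w ∈ (−π, π].

With (a) and (b), we can write z/w down in both polar and exponential forms:

$$\frac{z}{w} \approx \frac{\sqrt{65}}{\sqrt{82}}\,(\cos 0.630 + i\sin 0.630) = \frac{\sqrt{65}}{\sqrt{82}}\,e^{0.630i}.$$

To write z/w down in cartesian form, we can use ∣z/w∣ and arg (z/w) to compute:

$$\operatorname{Re}\frac{z}{w} \approx \frac{\sqrt{65}}{\sqrt{82}}\cos 0.630 \approx 0.719; \quad \text{and} \quad \operatorname{Im}\frac{z}{w} \approx \frac{\sqrt{65}}{\sqrt{82}}\sin 0.630 \approx 0.525.$$

Thus: $\dfrac{z}{w} \approx 0.719 + 0.525i$.

Alternatively, we can simply do the usual division, which yields us the exact value of z/w:

$$\frac{z}{w} = \frac{-4 + 7i}{1 + 9i} = \frac{-4 + 7i}{1 + 9i}\cdot\frac{1 - 9i}{1 - 9i} = \frac{-4 + 36i + 7i + 63}{1^2 + 9^2} = \frac{59 + 43i}{82} = \frac{59}{82} + \frac{43}{82}i.$$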


Example 845. We are given:

$$z = -2 - i = (-2, -1) \approx \sqrt{5}\,[\cos(-2.678) + i\sin(-2.678)] = \sqrt{5}\,e^{-2.678i},$$
$$w = 1 - 3i = (1, -3) \approx \sqrt{10}\,[\cos(-1.249) + i\sin(-1.249)] = \sqrt{10}\,e^{-1.249i}.$$

By Fact 132:

(a) $\left|\dfrac{z}{w}\right| = \dfrac{\sqrt{5}}{\sqrt{10}} = \dfrac{1}{\sqrt{2}}$,

(b) $\arg\dfrac{z}{w} = \arg z - \arg w + 2k\pi \approx -2.678 + 1.249 + 0 \approx -1.429$,

where we chose k = 0 because arg z − arg w ∈ (−π, π].

With (a) and (b), we can write z/w down in both polar and exponential forms:

$$\frac{z}{w} \approx \frac{1}{\sqrt{2}}\,[\cos(-1.429) + i\sin(-1.429)] = \frac{1}{\sqrt{2}}\,e^{-1.429i}.$$

To write z/w down in cartesian form, we can use ∣z/w∣ and arg (z/w) to compute:

$$\operatorname{Re}\frac{z}{w} \approx \frac{1}{\sqrt{2}}\cos(-1.429) \approx 0.100; \quad \text{and} \quad \operatorname{Im}\frac{z}{w} \approx \frac{1}{\sqrt{2}}\sin(-1.429) \approx -0.700.$$

Thus: $\dfrac{z}{w} = 0.1 - 0.7i$.

Alternatively, we can simply do the usual division, which yields us the exact value of z/w:

$$\frac{z}{w} = \frac{-2 - i}{1 - 3i} = \frac{-2 - i}{1 - 3i}\cdot\frac{1 + 3i}{1 + 3i} = \frac{-2 - 6i - i + 3}{1^2 + 3^2} = \frac{1 - 7i}{10} = 0.1 - 0.7i.$$
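Fact 132 can be checked against direct division for the three examples above. In this sketch, the function name `arg_of_quotient` is an illustrative assumption; `cmath.rect` turns a modulus and an argument back into a cartesian number.

```python
import cmath
import math

def arg_of_quotient(z, w):
    """arg(z/w) via Fact 132(b): subtract the arguments, then shift by 2k*pi."""
    diff = cmath.phase(z) - cmath.phase(w)
    k = -1 if diff > math.pi else (1 if diff <= -math.pi else 0)
    return diff + 2 * k * math.pi

# The pairs (z, w) from Examples 843, 844, and 845.
for z, w in [(5 - 2j, 1 + 3j), (-4 + 7j, 1 + 9j), (-2 - 1j, 1 - 3j)]:
    modulus = abs(z) / abs(w)                 # Fact 132(a)
    angle = arg_of_quotient(z, w)             # Fact 132(b)
    print(cmath.rect(modulus, angle), z / w)  # the two values should agree
```

All three examples happen to use k = 0. A pair such as z = −2 + i and w = −2 − i would need k = −1, since the difference of their arguments exceeds π.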

Exercise 279. For each, find ∣z/w∣ and arg (z/w). Then express z/w in polar, exponen-
tial, and cartesian forms. (Answer on p. 1519.)
(a) z = 1, w = −3. (b) z = 2i, w = 1 + 2i. (c) z = −1 − 3i, w
(d) z = −2 + 5i, w = i. (e) z = −1 − i, w = −1 − 2i. (f) z = −5 − 3i, w

Exercise 280. Use Facts 130 and 131 to prove Fact 132(a). (Answer on p. 1519.)


Part V.

Revision in progress (Jan 2019).

And hence messy at the moment.
Appy polly loggies for any inconvenience caused.


The calculus was the first achievement of modern mathematics, and it is
difficult to overestimate its importance.

— John von Neumann, “The Mathematician” (1947).

My mathematical tutors had never shown me any reason to suppose the Calculus anything but a tissue of fallacies.

— Bertrand Russell, Autobiography (1951).

Zudem ist es ein Irrtum zu glauben, daß die Strenge in der Beweisführung
die Feindin der Einfachheit wäre. An zahlreichen Beispielen finden wir im
Gegenteil bestätigt, daß die strenge Methode auch zugleich die einfachere
und leichter faßliche ist.
Besides it is an error to believe that rigor in the proof is the enemy of simplicity. On the contrary we find it confirmed by numerous examples that the rigorous method is at the same time the simpler and the more easily comprehended.

— David Hilbert (1900, [1902 trans.]).


67. Limits
The idea of limits isn’t on your syllabus.[262] But it is fundamental to calculus. And it really isn’t all that difficult, especially if presented in informal and intuitive terms (as I have tried to do here). It is therefore well worthwhile spending just a little time on the idea of limits, just so things become that much clearer.[263]

Remark 78. To the unconvinced Type 1 Pragmatist thinking of skipping this chapter: Think again. In recent years, your A-Level examiners have seen fit to screw students over with curveball, totally-out-of-the-syllabus questions[264] involving limits. See especially Exercise 483(c) (N2017/I/9). So yea, this chapter’s probably worth a quick read.

67.1. Limits, Informally Defined

Suppose f is a function. We are told that:

The limit of f at a is L.

What does this mean? Here are two informal definitions or interpretations:[265]

For all values of x that are “close” but not equal to a,

f (x) is “close” (or possibly even equal) to L.

Or: By making x “sufficiently close” but not equal to a,

f (x) can be made as “close” as we like to L.

[262] Ignoring the Central Limit Theorem, the word limit appears on your syllabus only once (on p. 9), almost in passing, and solely in relation to the definite integral.
[263] To keep things simple, we discuss only functional limits (and not sequential limits).
[264] We already discussed this “phenomenon” in my Preface/Rant (see p. xli).
[265] We will formally define the above statement only in the Appendices (see Definition 252).
Example 846. Consider the function f ∶ R → R defined by $f(x) = x^2 - 1$.

Figure to be
inserted here.

Observe that for all values of x that are “close” but not equal to 2,
f (x) is “close” (or possibly even equal) to 3.

Or: By making x “sufficiently close” but not equal to 2,

f (x) can be made as “close” as we like to 3.

And so, we say that: The limit of f at 2 is 3.

The statement the limit of f at a is L can be written more formally (and concisely) as:

$$\lim_{x \to a} f(x) = L.$$

Here is another, exactly equivalent way to say the limit of f at a is L:

As x approaches a, f (x) approaches L.

Or equivalently and more concisely: As x → a, f (x) → L.

Altogether then, we have four, equivalent ways to say the same thing:
1. The limit of f at a is L.
2. $\lim_{x \to a} f(x) = L$.
3. As x approaches a, f (x) approaches L.
4. As x → a, f (x) → L.

Example 847. Continue to define f ∶ R → R by $f(x) = x^2 - 1$.

The following four statements are equivalent and true:
1. The limit of f at 2 is 3.
2. $\lim_{x \to 2} f(x) = 3$.
3. As x approaches 2, f (x) approaches 3.
4. As x → 2, f (x) → 3.

To repeat, we say that the limit of f at a is L if:

For all values of x that are “close” but not equal to a,

f (x) is “close” (or possibly even equal) to L.

The condition “not equal to” is subtle and requires emphasis. When considering the limit
of a function g at a, we do not care about g (a), the value of the function at a. We only
care about the values of x that are “close” to a. Example:
Example 848. Define the function g ∶ R → R by:

$$g(x) = \begin{cases} x^2 - 1 & \text{for } x \ne 2, \\ 0 & \text{for } x = 2. \end{cases}$$

The function g is very similar to the function f, except that now there is a “hole” in the curve, with g (2) = 0. As we’ll learn later, this is an example of a removable discontinuity.
Figure to be
inserted here.

Nonetheless and perhaps surprisingly, it remains true that:

$$\lim_{x \to 2} g(x) = 3.$$

To see why, observe that the following statements remain true:

For all values of x that are “close” but not equal to 2,

g (x) is “close” (or possibly even equal) to 3.

Or: By making x “sufficiently close” but not equal to 2,

g (x) can be made as “close” as we like to 3.

And so, the following four (equivalent) statements are again true:
1. The limit of g at 2 is 3.
2. $\lim_{x \to 2} g(x) = 3$.
3. As x approaches 2, g (x) approaches 3.
4. As x → 2, g (x) → 3.
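A numerical table can make the informal definition concrete. The sketch below evaluates g from Example 848 at inputs that get closer and closer to 2 without ever equalling 2; the step sizes are arbitrary choices made for illustration.

```python
def g(x):
    """Example 848: x^2 - 1 everywhere except at x = 2, where g(2) = 0."""
    return 0 if x == 2 else x**2 - 1

# Inputs approaching 2 from both sides, never equal to 2.
for step in (0.1, 0.01, 0.001, 0.0001):
    print(2 - step, g(2 - step), 2 + step, g(2 + step))

print(g(2))  # the value at 2 itself plays no role in the limit
```

The printed values of g near 2 should get closer and closer to 3, even though g (2) = 0. A table like this is only suggestive and does not prove that the limit is 3.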

Actually, the condition “not equal to” goes even further. When we consider the limit of
a function h at a point a, we don’t even care if h (a) is undefined! Example:


Example 849. Define the function h ∶ R ∖ {2} → R by $h(x) = x^2 - 1$.
The function h is very similar to g, except that h (2) is simply left undefined.

Figure to be
inserted here.

Nonetheless and perhaps surprisingly, it remains true that:

$$\lim_{x \to 2} h(x) = 3.$$

To see why, observe that the following statements remain true:

For all values of x that are “close” but not equal to 2,

h (x) is “close” (or possibly even equal) to 3.

Or: By making x “sufficiently close” but not equal to 2,

h (x) can be made as “close” as we like to 3.

And so, the following four (equivalent) statements are again true:
1. The limit of h at 2 is 3.
2. $\lim_{x \to 2} h(x) = 3$.
3. As x approaches 2, h (x) approaches 3.
4. As x → 2, h (x) → 3.


Exercise 281. The function i ∶ R → R is graphed below. Find the limits of i at 0, 1, 2,
and 3. (Answer on p. 651.)

Figure to be
inserted here.

Exercise 282. Define j ∶ R → R by:

⎪x, for x ≠ 0,
j (x) = ⎨

⎪ for x = 0.
Graph j. Then find lim j (x), lim j (x), lim j (x), and lim j (x). (Answer on p. 651.)
x→ x→ x→ x→

To better understand limits, we next look at examples where the limit does not exist:


67.2. Examples Where The Limit Does Not Exist

Example 850. Define j ∶ R → R by:

$$j(x) = \begin{cases} -1 & \text{for } x < 0, \\ 1 & \text{for } x \geq 0. \end{cases}$$



Now, consider the limit of j at 0:

$$\lim_{x \to 0} j(x).$$

This limit cannot be −1, because for values of x that are “close” to but more than 0, we
have j (x) = 1. Hence:

$\lim_{x \to 0} j(x) \neq -1$, or equivalently: For some values of x that are “close” but
not equal to 0, j (x) is not “close” to −1.

Similarly, this limit cannot be 1, because for values of x that are “close” to but less than
0, we have j (x) = −1. Hence:

$\lim_{x \to 0} j(x) \neq 1$, or equivalently: For some values of x that are “close” but
not equal to 0, j (x) is not “close” to 1.

More generally, the limit cannot be any real number L. For any real number L, we have:

$\lim_{x \to 0} j(x) \neq L$, or equivalently: For some values of x that are “close” but
not equal to 0, j (x) is not “close” to L.

Since there is no real number that equals $\lim_{x \to 0} j(x)$, we simply say that $\lim_{x \to 0} j(x)$ does not
exist.
As we’ll learn later, x = 0 is an example of a jump discontinuity.
Your H2 Maths syllabus does not mention the concepts of a left-hand limit and a
right-hand limit. But they are simple and can aid your understanding. In the above
example:
• The left-hand limit of j at 0 is −1; and
• The right-hand limit of j at 0 is 1.
In formal notation, we’d write:

$$\lim_{x \to 0^-} j(x) = -1 \quad \text{and} \quad \lim_{x \to 0^+} j(x) = 1.$$

Given our above informal definitions of the limit, it is not difficult to write down the
following informal definitions of left- and right-hand limits.
We say that the left-hand limit of f at a is L if:

For all values of x that are “close” to but less than a,

f (x) is “close” (or possibly even equal) to L.

Or: By making x “sufficiently close” but less than a,

f (x) can be made as “close” as we like to L.

Similarly, the right-hand limit of f at a is L if:

For all values of x that are “close” to but greater than a,

f (x) is “close” (or possibly even equal) to L.

Or: By making x “sufficiently close” but greater than a,

f (x) can be made as “close” as we like to L.

Not surprisingly, the limit of f at a is L ⇐⇒ the left- and right-hand limits of f at a are
also L:

Fact 133. Suppose D ⊆ R and f ∶ D → R. Then:

$$\lim_{x \to a} f(x) = L \iff \lim_{x \to a^-} f(x) = L = \lim_{x \to a^+} f(x).$$

Proof. It turns out that this result is an immediate consequence of our definitions of the
limit (Definition 252), the left-hand limit (Definition 253), and the right-hand limit
(Definition 254). However, these formal definitions are given only in the Appendices.

In the last example, the left- and right-hand limits of j at 0 do exist:

$$\lim_{x \to 0^-} j(x) = -1 \quad \text{and} \quad \lim_{x \to 0^+} j(x) = 1.$$

However, they are not equal. And so, by Fact 133, the limit of j at 0 does not exist.
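The two one-sided limits can also be seen numerically. The sketch below assumes, as the surrounding text describes, that j takes the value −1 to the left of 0 and the value 1 from 0 onwards. The helper `one_sided_values` is a name introduced here purely for illustration.

```python
def j(x):
    """Example 850 as described in the text: -1 left of 0, 1 from 0 onwards."""
    return -1 if x < 0 else 1


def one_sided_values(func, a, side, steps=8):
    """Sample func at a + side * 10^-n for n = 1..steps.

    side = -1 approaches a from the left, side = +1 from the right.
    """
    return [func(a + side * 10.0 ** -n) for n in range(1, steps + 1)]


left = one_sided_values(j, 0, -1)
right = one_sided_values(j, 0, +1)
print("from the left: ", left)
print("from the right:", right)
```

Every sample taken from the left equals −1 and every sample taken from the right equals 1, so no single number L can be “close” to both. This matches the conclusion drawn from Fact 133.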


Example 851. Define k ∶ R ∖ {0} → R by k (x) = .

Figure to be
inserted here.

Then the limit of k at 0 does not exist. This is because there is no real number L such
that:

For all values of x that are “close” but not equal to 0,

k (x) is “close” (or possibly even equal) to L.

Nonetheless, we are allowed to say the limit of k at 0 is ∞ and write:

$$\lim_{x \to 0} k(x) = \infty.$$

At this point you may feel confused. We have two seemingly contradictory statements.
• “The limit of k at 0 does not exist.” (Or: “$\lim_{x \to 0} k(x)$ does not exist.”)
• “The limit of k at 0 is infinity.” (Or: “$\lim_{x \to 0} k(x) = \infty$.”)

But strangely enough, both of the above statements are true. How can this be?
The key here is to recall what we emphasised on p. 42: ∞ is not a number. Instead,
it is merely a symbol that is sometimes convenient for helping us say specific things.
We have written:

“The limit of k at 0 is infinity.” Or:

$$\lim_{x \to 0} k(x) \overset{1}{=} \infty.$$

It is important to understand that $\overset{1}{=}$ is simply shorthand for the following informal
statement:

As x “approaches” 0, k (x) “grows” without bound from above.266

Importantly, $\overset{1}{=}$ does not say that $\lim_{x \to 0} k(x)$ is equal or identical to some object called ∞.
Indeed, in writing $\overset{1}{=}$, we do not even commit to the existence of an object called ∞.

A more precise statement of what $\overset{1}{=}$ means is given in Ch. 121.3 (Appendices).


Example 852. Define m ∶ R ∖ {0} → R by m (x) = .

Figure to be
inserted here.

Again, the limit of m at 0 does not exist. This is because there is no real number L
such that:

For all values of x that are “close” but not equal to 0,

m (x) is “close” (or possibly even equal) to L.

It is likewise true that the left- and right-hand limits of m at 0 do not exist. This is
because there is no real number L such that:

For all values of x that are “close” but less than 0, m (x) is “close”
(or possibly even equal) to L;

or

For all values of x that are “close” but greater than 0, m (x) is “close”
(or possibly even equal) to L.

At the same time, we may write:

$$\lim_{x \to 0^-} m(x) \overset{1}{=} -\infty \quad \text{and} \quad \lim_{x \to 0^+} m(x) \overset{2}{=} \infty.$$

Again, neither $\overset{1}{=}$ nor $\overset{2}{=}$ acknowledges the existence of an object called −∞ or ∞. Instead,
informally, each simply says the following:

1. The left-hand limit of m at 0 is −∞. Or:
As x “approaches” 0 “from the left”, m (x) “grows” without bound from below.
2. The right-hand limit of m at 0 is ∞. Or:
As x “approaches” 0 “from the right”, m (x) “grows” without bound from above.
Note though that unlike the previous example, here we may not write either of the
following statements:

$$\lim_{x \to 0} m(x) = \infty \quad \text{and} \quad \lim_{x \to 0} m(x) = -\infty.$$


Example 853. Recall that the natural logarithm function ln has domain R+ .

Figure to be
inserted here.

Observe that ln is undefined “near” −1. And so, the limit of ln at −1, or $\lim_{x \to -1} \ln x$, does
not exist.
One key motivation for having the concept of limits is that it can help us understand
how a function behaves “near” a point.
And so, if a function is undefined “near” a point, then there is nothing to understand. In
which case, we shall simply say that the limit does not exist at that point.
So, here for example, the function ln is undefined “near” −1. Thus, we shall simply say
that $\lim_{x \to -1} \ln x$ does not exist.
Indeed, if a is any negative number, then the limit $\lim_{x \to a} \ln x$ likewise does not exist. The
reason is that ln is undefined “near” any negative number a.

By the way, what is the limit of ln at 0? Following our previous examples, we observe
that the right-hand limit of ln at 0 is −∞ and we may write:

$$\lim_{x \to 0^+} \ln x = -\infty.$$

Or informally, as x “approaches” 0 “from the right”, ln x “grows” without bound from
below.

Note that here we may also write:

$$\lim_{x \to 0} \ln x = -\infty.$$

Informally, the reason is that for all values of x that are in the domain of ln, as x
“approaches” 0, it is indeed the case that ln x “grows” without bound from below.
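The behaviour of ln described in this example can be sampled with Python's standard `math.log`. This is only a rough sketch: the helper names below are introduced here for illustration, and a handful of samples proves nothing by itself.

```python
import math


def ln_from_the_right(steps=8):
    """Return (x, ln x) pairs for x = 10^-1, 10^-2, ..., 10^-steps."""
    return [(10.0 ** -n, math.log(10.0 ** -n)) for n in range(1, steps + 1)]


def ln_is_defined_at(x):
    """True if math.log accepts x; it raises ValueError outside R+."""
    try:
        math.log(x)
        return True
    except ValueError:
        return False


for x, y in ln_from_the_right():
    print(f"x = {x:.0e}: ln x = {y:.4f}")
print("ln defined at -1?    ", ln_is_defined_at(-1))
print("ln defined at -0.999?", ln_is_defined_at(-0.999))
```

As x shrinks towards 0 from the right, the sampled values of ln x keep decreasing. At −1 and at points near −1, there are simply no values to sample, which is the informal reason for saying that the limit there does not exist.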

We now revisit the Dirichlet function (previously examined on p. 167):


Example 854. The Dirichlet function d ∶ R → R is defined by:

$$d(x) = \begin{cases} 1 & \text{for } x \in \mathbb{Q}, \\ 0 & \text{for } x \notin \mathbb{Q}. \end{cases}$$

The graph of d contains the point (x, 1) for every x ∈ Q.

The graph of d contains the point (x, 0) for every x ∉ Q.

Observe that for any a ∈ R, $\lim_{x \to a} d(x)$ does not exist.

To see why, consider any rational number, say b = 2. Since b = 2 is rational, we have
d (b) = 1.
However, for values of x “near” b = 2, there is no number L that d (x) stays “close” to.
Instead, “near” b = 2, d (x) takes on the values 0 and 1 “infinitely often”. And so, we
simply say that $\lim_{x \to 2} d(x)$ does not exist.

Similarly, consider any irrational number, say $c = \sqrt{2}$. Since $c = \sqrt{2}$ is irrational, we have
d (c) = 0.

However, for values of x “near” $c = \sqrt{2}$, there is no number L that d (x) stays “close” to.
Instead, “near” $c = \sqrt{2}$, d (x) takes on the values 0 and 1 “infinitely often”. And so, we
simply say that $\lim_{x \to \sqrt{2}} d(x)$ does not exist.

Our next example is even stranger:


Example 855. Define f ∶ R → R by:

$$f(x) = \begin{cases} \sin \dfrac{1}{x} & \text{for } x \neq 0, \\ 0 & \text{for } x = 0. \end{cases}$$

This is a very strange function indeed. Like sin, f takes on values between −1 and 1.
But as x gets “closer” to 0, f (x) fluctuates ever more rapidly between −1 and 1. Indeed,
when we’re very “close” to 0, it’s impossible to accurately depict the graph of f .



Observe that for all values of x that are “close to” but not equal to 0, there is no number
L that f (x) is “close to”. Instead, when x is “close to” 0, f (x) takes on every value in
[−1, 1] “infinitely often”! And so, “near” 0, there is no number L that f (x) can be said
to be “close to”. In other words:

$$\lim_{x \to 0} f(x) \text{ does not exist.}$$
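One way to see the wild behaviour near 0 is to evaluate f at two specially chosen sequences of points, both shrinking towards 0. The sketch below is illustrative only. The sequences $x_n = 1/(\pi/2 + 2\pi n)$ and $x_n = 1/(3\pi/2 + 2\pi n)$ are choices made here (they are where $\sin(1/x)$ equals 1 and −1 respectively), and the function names are likewise introduced for this illustration.

```python
import math


def f(x):
    """Example 855: sin(1/x) for x != 0, and 0 at x = 0."""
    return math.sin(1 / x) if x != 0 else 0.0


def peak_points(n_max):
    """Points x_n = 1 / (pi/2 + 2*pi*n), where sin(1/x_n) = 1."""
    return [1 / (math.pi / 2 + 2 * math.pi * n) for n in range(1, n_max + 1)]


def trough_points(n_max):
    """Points x_n = 1 / (3*pi/2 + 2*pi*n), where sin(1/x_n) = -1."""
    return [1 / (3 * math.pi / 2 + 2 * math.pi * n) for n in range(1, n_max + 1)]


for x in peak_points(50)[-3:]:
    print(f"x = {x:.6f}: f(x) = {f(x):.6f}")
for x in trough_points(50)[-3:]:
    print(f"x = {x:.6f}: f(x) = {f(x):.6f}")
```

Both sequences get as close to 0 as we like, yet f stays at (approximately, up to rounding) 1 along the first and −1 along the second. So no single number L can be the limit of f at 0.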


Exercise 283. Consider the function f ∶ R → R defined by:

⎪1 for x ≤ 0,
f (x) = ⎨

⎪ for x > 0.
What are $\lim_{x \to -5} f(x)$, $\lim_{x \to 0} f(x)$, and $\lim_{x \to 5} f(x)$? (Answer on p. 1528.)


67.3. Rules for Limits
Happily, the algebraic Rules for Limits are the simple and “obvious” ones you’d expect.267

Theorem 14. (Rules for Limits) Suppose k, L, M ∈ R, $\lim_{x \to a} f(x) = L$, and $\lim_{x \to a} g(x) = M$. Then:
(a) $\lim_{x \to a} [k f(x)] = kL$ (Constant Factor Rule)
(b) $\lim_{x \to a} [f(x) \pm g(x)] = L \pm M$ (Sum and Difference Rules)
(c) $\lim_{x \to a} [f(x) g(x)] = LM$ (Product Rule)
(d) $\lim_{x \to a} \dfrac{1}{g(x)} = \dfrac{1}{M}$ (provided M ≠ 0) (Reciprocal Rule)
(e) $\lim_{x \to a} \dfrac{f(x)}{g(x)} = \dfrac{L}{M}$ (provided M ≠ 0) (Quotient Rule)
(f) $\lim_{x \to a} k = k$ (Constant Rule)
(g) $\lim_{x \to a} x^k = a^k$ (Power Rule)

Proof. See p. 1330.
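As a small illustration of how these Rules combine, here is one worked computation. The function $3x^2 + 5$ and the point 2 are chosen here purely as an example and do not come from the text. Each step cites the Rule from Theorem 14 that it appears to rely on.

```latex
\begin{align*}
\lim_{x \to 2} \left( 3x^2 + 5 \right)
  &= \lim_{x \to 2} 3x^2 + \lim_{x \to 2} 5 && \text{(Sum and Difference Rules)} \\
  &= 3 \lim_{x \to 2} x^2 + \lim_{x \to 2} 5 && \text{(Constant Factor Rule)} \\
  &= 3 \cdot 2^2 + 5 && \text{(Power and Constant Rules)} \\
  &= 17.
\end{align*}
```

Strictly, the first two steps are only justified once we know that the limits on the right-hand side exist, which the Power and Constant Rules in the third step confirm.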

Example 856. XXX

Remark 79. As stated, limits are not on your H2 Maths syllabus. And so, a fortiori,
neither are the above Rules for Limits.
Nonetheless, since the above Rules are so simple, “obvious”, and easy to remember, it is
probably not much of a cognitive burden on you to include them here. We will, moreover,
find the above Rules useful when learning how to compute derivatives shortly.

Example 857. XXX

Example 858. XXX

Exercise 284. XXX (Answer on p. 659.)


These Rules are most definitely not on your H2 Maths syllabus. But since they are so simple and
“obvious”, there’s probably no pedagogical harm in listing them without proof.
68. Continuity, Revisited
In Ch. 11, we already briefly discussed the concept of continuity. Recall that informally, a
continuous function is one whose entire graph can be drawn without lifting your pencil.
We now revisit the concept of continuity, now that we have an intuitive grasp of the idea of
limits. In particular, we can now use limits to write down a formal definition of continuity:

Definition 164. Let f be a nice function268 with domain D. Let a ∈ D. We say that f
is continuous at a if either of the following Conditions holds:
1. $\lim_{x \to a} f(x) = f(a)$; or
2. a is an isolated point of D.

The above Definition gives two Conditions under which f is said to be continuous at a.
Condition 1 is the important one that we’ll focus on. In words, it says that the limit of f
at a equals the value of f at a.
Condition 2 is less important. You can think of it as an annoying technicality that we’ll
briefly discuss only in Ch. 68.6 (optional).
We’ll illustrate continuity using examples from the previous chapter:

Example 859. Define f ∶ R → R by $f(x) = x^2 - 1$.

Figure to be
inserted here.

Then f is continuous at 1, 0, and −1 because:

$$\lim_{x \to 1} f(x) = 0 = 1^2 - 1 = f(1),$$

$$\lim_{x \to 0} f(x) = -1 = 0^2 - 1 = f(0),$$

$$\lim_{x \to -1} f(x) = 0 = (-1)^2 - 1 = f(-1).$$


Definition 165. A function is continuous on a set if it is continuous at every point in

that set. A function is continuous if it is continuous on its domain.

So, f ∶ D → R is a continuous function if for every non-isolated point a in D, we have:

$$\lim_{x \to a} f(x) = f(a).$$

Recall that a nice function is simply this textbook’s term for a real-valued function of a real variable.
Example 860. Consider again the function f ∶ R → R defined by $f(x) = x^2 - 1$.
Above we saw that f is continuous at 1, 0, and −1, because:

$$\lim_{x \to 1} f(x) = f(1), \quad \lim_{x \to 0} f(x) = f(0), \quad \lim_{x \to -1} f(x) = f(-1).$$

It turns out that for every a ∈ Domain f = R, it is also true that:

$$\lim_{x \to a} f(x) = f(a).$$

Hence, f is continuous at every point a in its domain. And thus, f is a continuous
function.

Example 861. XXX

Example 862. XXX


68.1. Functions with a Single Discontinuity

Example 863. Define the function g ∶ R → R by:

$$g(x) = \begin{cases} x^2 - 1 & \text{for } x \neq 2, \\ 0 & \text{for } x = 2. \end{cases}$$

Figure to be
inserted here.

The function g is almost identical to the function f from the previous example.
It is again the case that g is continuous at 1, 0, and −1, because:

$$\lim_{x \to 1} g(x) = 0 = g(1), \quad \lim_{x \to 0} g(x) = -1 = g(0), \quad \lim_{x \to -1} g(x) = 0 = g(-1).$$

However, g is not continuous at 2 because:

$$\lim_{x \to 2} g(x) = 3, \text{ while } g(2) = 0, \text{ so that } \lim_{x \to 2} g(x) \neq g(2).$$

It turns out that g is continuous at every point in its domain R except at 2.

In fact, we say that g is discontinuous (or has a discontinuity) at 2. Indeed, this is
an example of a removable discontinuity.
Because g has a single discontinuity (namely at 2), it is not a continuous function — one
single discontinuity disqualifies a function from being called continuous.
Nonetheless, we can say that g is continuous on (−∞, 2) and also on (2, ∞). Equivalently,
g is continuous everywhere except at 2.
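The comparison “limit versus value” in this example can be mimicked numerically. The sketch below is a crude heuristic, not a test of continuity in any rigorous sense: `estimate_limit` and `looks_continuous_at` are hypothetical helper names introduced here, and the step size and tolerance are arbitrary choices.

```python
def g(x):
    """Example 863: g(x) = x^2 - 1 for x != 2, and g(2) = 0."""
    return 0.0 if x == 2 else x * x - 1


def estimate_limit(func, a, h=1e-7):
    """Crude two-sided estimate: average of func just left and just right of a."""
    return (func(a - h) + func(a + h)) / 2


def looks_continuous_at(func, a, tol=1e-5):
    """Heuristic check of Condition 1: is the estimated limit close to func(a)?"""
    return abs(estimate_limit(func, a) - func(a)) < tol


for a in (1, 0, -1, 2):
    print(f"a = {a}: estimated limit = {estimate_limit(g, a):.6f}, "
          f"g(a) = {g(a):.6f}, looks continuous: {looks_continuous_at(g, a)}")
```

At 1, 0, and −1 the estimated limit agrees with the value of g, while at 2 the estimate is near 3 but g (2) = 0. A check like this can be fooled by badly behaved functions, so it should be read only as an illustration of the definition.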

Definition 166. Let f be a nice function and a be a non-isolated point in its domain.
We say that f is discontinuous at a (or has a discontinuity at a) if:

$$\lim_{x \to a} f(x) \neq f(a).$$


In general, there are three types of discontinuities:270

• A removable discontinuity;
• A jump discontinuity; and
• An essential (or infinite) discontinuity.

These are formally defined in Definition 262 (Appendices).
In our last example, the function g had a removable discontinuity at 2. Informally, a
removable discontinuity is simply where we have a “little hole” at a point — patch that
“little hole” and the function becomes continuous at that point.
Two more examples of functions with a removable discontinuity:

Example 864. XXX

Example 865. XXX

We now turn to examine examples of jump discontinuities:

Example 866. Define j ∶ R → R by:

$$j(x) = \begin{cases} -1 & \text{for } x < 0, \\ 1 & \text{for } x \geq 0. \end{cases}$$

Then j has a discontinuity at 0, because $\lim_{x \to 0} j(x)$ does not exist.

Indeed, we say that this is an example of a jump discontinuity. Informally, a jump
discontinuity is where the function “jumps”.
Jump discontinuities are a little “worse” than a removable discontinuity in the informal
sense that they cannot be “fixed” by simply patching some “little hole”. Instead, in this
particular example, we can “fix” the discontinuity only by shifting either the left half of
j upwards or the right half downwards.
By the way, note that just like g, the function j is continuous everywhere except at a
single point (namely 0). And so, like g, j is not a continuous function.

Example 867. XXX

Example 868. XXX

Any other discontinuity (that is, any discontinuity that isn’t a removable or a jump
discontinuity) is called an essential (or infinite) discontinuity. (So this is really
simply a catch-all category for “everything else”.)

Example 869. We revisit the function f ∶ R → R defined by:

$$f(x) = \begin{cases} \sin \dfrac{1}{x} & \text{for } x \neq 0, \\ 0 & \text{for } x = 0. \end{cases}$$

As discussed in the last chapter:

$$\lim_{x \to 0} f(x) \text{ does not exist.}$$

Thus, f is discontinuous at 0.
It turns out that this discontinuity at 0 is neither a removable discontinuity nor a jump
discontinuity. And so, we call it an essential (or infinite) discontinuity.

It turns out that again, f is continuous everywhere except at a single point (namely 0).
That is, f is continuous on (−∞, 0) ∪ (0, ∞), but discontinuous at 0. And so, f is not a
continuous function.

Example 870. XXX

Example 871. XXX


68.2. Functions That Are Discontinuous Everywhere
Each of our discontinuous functions so far has featured one discontinuity. We now look at
functions that are discontinuous at every point in their domain:

Example 872. We revisit the Dirichlet function d.

Observe that d is defined everywhere on R. However, as discussed in the last chapter, for
every a ∈ R, $\lim_{x \to a} d(x)$ does not exist.
Hence, d is discontinuous everywhere. Equivalently, d is discontinuous at every point in
R.

The Dirichlet function d ∶ R → R is defined by:

$$d(x) = \begin{cases} 1 & \text{for } x \in \mathbb{Q}, \\ 0 & \text{for } x \notin \mathbb{Q}. \end{cases}$$

The graph of d contains the point (x, 1) for every x ∈ Q.

The graph of d contains the point (x, 0) for every x ∉ Q.

Using the formal definitions of the three types of discontinuities (given in the Appendices),
we can prove that none of these discontinuities is a removable or a jump discontinuity.
Hence, each is an essential (or infinite) discontinuity. Thus, we may say that d is
essentially (or infinitely) discontinuous everywhere. Or equivalently, d has an essential
(or infinite) discontinuity at every point a ∈ R.

Example 873. XXX

Example 874. XXX


68.3. Functions That Seem Discontinuous But Aren’t
Now, here’s a subtle point. Observe that in Definition 166 of a discontinuity, we require
that a ∈ Domain f in order for f to be discontinuous at a point a. If a ∉ Domain f , then it
is, by definition, impossible that f is discontinuous at a.
In each of the following examples, we are tempted to say that the function has at least one
discontinuity. It turns out though that each function is continuous (i.e. continuous on its
entire domain)!

Example 875. Define the function h ∶ R ∖ {2} → R by $h(x) = x^2 - 1$.

We are tempted to say that h is discontinuous at 2. However, 2 is not in the domain of
h. And so, h can neither be said to be continuous nor discontinuous at 2.

Figure to be
inserted here.

Indeed, perhaps surprisingly, h is continuous at every point in its domain (we prove this
formally on p. XXX in the Appendices). Thus, h is a continuous function!
Observe that this example proves that our informal definition of continuity (“we can draw
the entire graph without lifting our pencil”) is actually not quite correct!271 Although h
is continuous, we are unable to draw its entire graph without lifting our pencil.

Example 876. As noted in the previous chapter, $\lim_{x \to -1} \ln x$ does not exist or is undefined.

Figure to be
inserted here.

However, as also noted, the natural logarithm function ln has domain R+ . So, −1 is not
in its domain. Equivalently, ln is not defined at −1.
Thus, ln is neither continuous nor discontinuous at −1.
Indeed, perhaps surprisingly, ln is continuous at every point in its domain (we prove this
formally on p. XXX in the Appendices). Thus, ln is a continuous function!

To be precise, the condition given in our informal definition is sufficient but not necessary for continuity.
Example 877. XXX

Example 878. XXX

Exercise 285. XXX (Answer on p. 667.)



68.4. Continuity and Limits

Example 879. Suppose we are asked to evaluate $\lim_{x \to 0} \sin x^2$.

We are tempted to proceed as follows:

$$\lim_{x \to 0} \sin x^2 = \sin \left( \lim_{x \to 0} x^2 \right) = \sin 0 = 0.$$

It turns out that the above is correct. However, the step taken at the first equals sign
requires justification. Why is it that we can simply “move” the limit in?

One of the (many) nice things about continuity is the following result. When taking
the limit of a composite function, we can “move” the limit in if the “outer” function is
continuous. This justifies the first equals sign in the above example: since sin is
continuous, we can “move” lim in.

Fact 134. Suppose $\lim_{x \to a} g(x) = b$. If f is continuous at b, then:

$$\lim_{x \to a} f(g(x)) = f\left( \lim_{x \to a} g(x) \right).$$

Proof. See p. 1335 in the Appendices.
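To see why the continuity of the “outer” function matters, it may help to compare two cases numerically. In the sketch below, the first case is Example 879's sin x². The second case reuses the function g from Example 848 (which is discontinuous at 2) as the outer function, with a hypothetical inner function x ↦ x + 2 chosen here only so that its limit at 0 is 2.

```python
import math


def g(x):
    """Example 848's function: x^2 - 1 for x != 2, and g(2) = 0."""
    return 0.0 if x == 2 else x * x - 1


def shift(x):
    """Hypothetical inner function x + 2; its limit at 0 is 2."""
    return x + 2


xs = [10.0 ** -n for n in range(1, 7)]

# Outer function sin is continuous at 0: the values approach sin(0).
sin_values = [math.sin(x ** 2) for x in xs]

# Outer function g is not continuous at 2: the values approach 3, not g(2).
g_values = [g(shift(x)) for x in xs]

print("sin(x^2) near 0:", sin_values[-1], " vs sin(0) =", math.sin(0.0))
print("g(x + 2) near 0:", g_values[-1], " vs g(2) =", g(shift(0)))
```

In the first case, “moving the limit in” gives the right answer. In the second, the values of g (x + 2) settle near 3, while g evaluated at the inner limit is g (2) = 0, so the conclusion of Fact 134 fails once its continuity hypothesis is dropped.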

Of course, in H2 Maths, most functions we’ll encounter are continuous, so that the above
result usually applies.

Exercise 286. XXX (Answer on p. 668.)



68.5. Every Elementary Function Is Continuous
As stated in Ch. 20, most functions we’ll encounter in H2 Maths are elementary.
It turns out that happily, every elementary function is continuous. This is formally
stated as Theorem 18 at the end of this chapter. Our task in this subchapter is to work
towards Theorem 18.
First, as intuition would suggest, the composition of two continuous functions is also
continuous. A little more precisely and formally:

Theorem 15. Let f and g be nice functions and a ∈ Domain g. If g is continuous at a

and f is continuous at g (a), then the composite function f g is continuous at a.

Proof. See p. 1332 in the Appendices.

Example 880. XXX

Example 881. XXX

Intuition would also suggest that continuity is preserved272 under the four basic arithmetic
operations and scalar multiplication:

Theorem 16. Suppose the nice functions f and g are continuous at a. Then (a) f ±g and
(b) f ⋅ g are also continuous at a. (c) If moreover g (a) ≠ 0, then f /g is also continuous
at a. (d) If c ∈ R, then cf is also continuous at a.

Proof. See p. 1333 in the Appendices.

Example 882. XXX

Example 883. XXX

“Obviously”, any constant function is continuous.

Fact 135. Suppose c ∈ R and f is a nice function defined by f (x) = c. Then f is
continuous.


Proof. See p. 1332 in the Appendices.

“Obviously”, any identity function is continuous:

Fact 136. Suppose f is a nice function defined by f (x) = x. Then f is continuous.

Proof. See p. 1332 in the Appendices.

Or closed.
Using the above four results, we can quite easily prove that all polynomial functions
are continuous:

Fact 137. Let $c_0, c_1, \dots, c_n$ be constants. Suppose f is a nice function defined by
$f(x) = c_0 + c_1 x + c_2 x^2 + \dots + c_n x^n$. Then f is continuous.

Proof. Let g and h be functions with the same domain as f . Let g and h have mapping
rules $g(x) = x$ and $h(x) = c_0$.
By Fact 135, h is continuous. By Fact 136, g is continuous.
By Theorem 16(b), $g^2 = g \cdot g$ is also continuous. That is, the nice function that has the same
domain as f and the mapping rule $x \mapsto x^2$ is also continuous.
Similarly, for any n = 1, 2, 3, . . . , the repeated application of Theorem 16(b) shows that
$g^n = g \cdot g \cdots g$ is also continuous.
Now, observe that the mapping rule of f may also be rewritten as:

$$f(x) = h(x) + c_1 g(x) + c_2 g^2(x) + \dots + c_n g^n(x).$$

Since f may be written as the arithmetic combination of n + 1 continuous functions, by
Theorem 16, f is also continuous.
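The idea that a polynomial is just sums, products, and constant multiples of the identity function can be mirrored in code. This is a minimal sketch under assumptions made here: the helper name `polynomial`, the sample coefficients, and the point 2 are all illustrative choices.

```python
def polynomial(coeffs):
    """Return the function x -> c0 + c1*x + ... + cn*x^n.

    The value is built using only sums, products, and constant multiples
    of the identity function x -> x.
    """
    def f(x):
        total = 0.0
        power = 1.0              # x^0
        for c in coeffs:
            total += c * power   # add the term c_k * x^k
            power *= x           # x^(k+1) = x^k * x
        return total
    return f


p = polynomial([1, -2, 0, 3])    # 1 - 2x + 3x^3 (hypothetical example)
a = 2.0
for n in range(1, 7):
    h = 10.0 ** -n
    print(f"h = {h:.0e}: p(a + h) - p(a) = {p(a + h) - p(a):.8f}")
```

As h shrinks, p (a + h) gets closer to p (a), which is what continuity at a predicts. The numbers are of course only consistent with Fact 137 and do not replace its proof.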

The sine, cosine, and natural logarithm functions are continuous:

Fact 138. The functions sin and cos are continuous.

Proof. See p. 76.10 in Part V (Calculus).

Fact 139. The function ln is continuous.

Proof. See Ch. 85.

The inverse of a continuous function defined on an interval is also continuous.

A bit more precisely:

Theorem 17. Let D be an interval and f ∶ D → R be continuous. If f is invertible, then
its inverse $f^{-1}$ is also continuous.

Proof. See p. 1335 in the Appendices.

Example 884. XXX

Example 885. XXX

Note the requirement in Theorem 17 that Domain f be an interval. If f is continuous and
invertible but its domain is not an interval, then its inverse $f^{-1}$ may not be continuous:

Example 886. XXX

Using the last three results, we have:

Corollary 29. The functions $\sin^{-1}$, $\cos^{-1}$, and exp are continuous.

Proof. Observe that:

• $\sin^{-1}$ is the inverse of sin restricted to the interval $\left[ -\frac{\pi}{2}, \frac{\pi}{2} \right]$.
• $\cos^{-1}$ is the inverse of cos restricted to the interval [0, π].
• exp is the inverse of ln, which is defined on the interval (0, ∞).
And so by Facts 138 and 139 and Theorem 17, $\sin^{-1}$, $\cos^{-1}$, and exp are continuous.
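The role of the restricted intervals in this proof can be checked with Python's standard `math` module, where `math.asin`, `math.acos`, and `math.exp` play the roles of the three inverses. The round-trip helpers below are names introduced here for illustration only.

```python
import math


def round_trip_sin(t):
    """Apply sin, then its inverse on [-pi/2, pi/2] (math.asin)."""
    return math.asin(math.sin(t))


def round_trip_cos(t):
    """Apply cos, then its inverse on [0, pi] (math.acos)."""
    return math.acos(math.cos(t))


def round_trip_ln(y):
    """Apply ln, then its inverse exp."""
    return math.exp(math.log(y))


print(round_trip_sin(1.0))   # 1.0 lies inside [-pi/2, pi/2]
print(round_trip_sin(2.0))   # 2.0 lies outside, so the input is not recovered
print(round_trip_cos(2.0))   # 2.0 lies inside [0, pi]
print(round_trip_ln(5.0))
```

Inputs inside the restricted interval are recovered (up to rounding), while 2.0 is sent by the sine round trip to π − 2 instead. This is why the proof speaks of sin and cos restricted to an interval on which they are invertible.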

Corollary 30. The functions tan, cosec, sec, cot, and $\tan^{-1}$ are continuous.

Proof. See Exercise 287.

In summary:

Theorem 18. Every elementary function is continuous.

Copy-pasting Definition 77 of an elementary function, the above Theorem says that each
of the following functions is continuous:

a polynomial function, a trigonometric function, an inverse trigonometric function,
a natural logarithm function, an exponential function, a power function,
any arithmetic combination of two elementary functions, or any composition of two
elementary functions.

Exercise 287. Prove Corollary 30. (You should carefully specify any definitions and
results used at each step of the way.) (Answer on p. 1521.)


68.6. Continuity at Isolated Points (optional)
We now briefly discuss Condition 2 of Definition 164 (of continuity). This Condition says
that by definition, if a is an isolated point of Domain f , then f is continuous at a.
Recall (p. 166) that informally, an isolated point of a set is one that isn’t “close” to any
other point in the set.273
It turns out that somewhat strangely — but for sensible reasons274 — we want to say that
a function is continuous at all of its domain’s isolated points. And so, we simply include
in our definition of continuity any isolated points.

Example 887. XXX

Once again, this example demonstrates that our informal definition of continuity (“we
can draw its entire graph without lifting our pencil”) is actually not quite correct. The
function f is continuous but we cannot draw its entire graph without lifting our pencil.

Example 888. XXX

In an ideal universe, Condition 2 would be unnecessary because it would already have been
implied by Condition 1 of Definition 164, namely $\lim_{x \to a} f(x) = f(a)$. Unfortunately, if a is an
isolated point of D, then by our formal definition of limits (see Ch. 121.2), the limit of
f at a does not exist, so that $\lim_{x \to a} f(x) = f(a)$ is necessarily false. Hence the annoying,
additional need for Condition 2.

For the formal definition, see Definition 250 (Appendices).
Here are two. First, in general, we want to be able to say that the restriction of a continuous function
to a subset of its domain is also continuous. So, say f ∶ X → Y is continuous. Let S ⊆ X be a set of
isolated points in X. Now consider the function g ∶ S → Y defined by x ↦ f (x). We want to say that g
is also continuous.
Second, under more general definitions of continuity (e.g. the topological one where the pre-image of
every open set is open), a function is continuous at its isolated points. So here in our more specialised
setting, we want to also include in our definition of continuity any isolated points.
69. The Derivative
Differentiation, or the problem of finding the derivative, is the problem of finding the
gradient of a curve.
Graphed below is some function f ∶ R → R.
Consider the point A = (a, f (a)). Let l be the tangent line to the graph of f at A.
Now, how might we find the gradient of the line l? Unsure of how to proceed, we try a
series of approximations.
1. We first pick some point B = (b, f (b)) on f .
Consider the line AB. We have:

$$\text{AB’s gradient} = \frac{\text{Rise}}{\text{Run}} = \frac{f(b) - f(a)}{b - a}.$$
Clearly, our actual tangent line l is steeper than the line AB. Nonetheless, AB’s gradient
serves as our first crude estimate of l’s gradient.
Now, can we improve on this crude estimate? Sure. Simply:
2. Pick some point C = (c, f (c)) that’s also on f but which is closer to A than B.
Consider the line AC. It is a little steeper than AB but still not as steep as l.
We have:

$$\text{AC’s gradient} = \frac{\text{Rise}}{\text{Run}} = \frac{f(c) - f(a)}{c - a}.$$
The gradient of AC now serves as our second and slightly-improved estimate of l’s gradient.

Figure: the graph of f , with a, c, and b marked on the x-axis and f (a), f (c), and f (b) marked on the vertical axis.

We can keep repeating the above procedure to get ever-improved estimates of l’s gradient:
• Pick a point D = (d, f (d)) that’s on f but which is closer to A than C. The gradient of
the line AD serves as our third and slightly-improved estimate of l’s gradient.
• Pick a point E = (e, f (e)) that’s on f but which is closer to A than D. The gradient of
the line AE serves as our fourth and slightly-improved estimate of l’s gradient.
• Etc.
The above suggests that the gradient of the line l at the point A can be written as:

$$\lim_{x \to a} \frac{f(x) - f(a)}{x - a}.$$

We are thus motivated to write down the following formal definition of the derivative.275

Definition 167. Let D be an interval, f ∶ D → R, and a ∈ D. Consider the following
limit:

$$\lim_{x \to a} \frac{f(x) - f(a)}{x - a}.$$

If this limit exists (i.e. is equal to a real number), then we say that f is differentiable at
a, call this limit the derivative of f at a, and denote it by $f'(a)$, $\dfrac{df}{dx}(a)$, $\left. \dfrac{df}{dx} \right|_{x=a}$, or $\dot{f}(a)$.
If this limit doesn’t exist, then we say that f is not differentiable at a.
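The “series of approximations” that motivated this definition can be carried out numerically. The sketch below is an illustration only: `secant_gradient` is a helper name introduced here, and the cubing function and the point a = 1 are hypothetical choices (its derivative at 1 is known to be 3, a standard fact not derived in this section).

```python
def secant_gradient(func, a, x):
    """Gradient of the line through (a, func(a)) and (x, func(x)), for x != a."""
    return (func(x) - func(a)) / (x - a)


def cube(x):
    """Hypothetical example function, chosen only for illustration."""
    return x ** 3


a = 1.0
for n in range(1, 7):
    x = a + 10.0 ** -n
    print(f"x = {x:.6f}: secant gradient = {secant_gradient(cube, a, x):.6f}")
```

As the second point moves closer to a, the secant gradients settle towards 3. Very small steps eventually suffer from floating-point rounding, so a table like this suggests the derivative but does not establish it.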

Remark 80. The lines AB, AC, AD, and AE in our above discussion are sometimes called
secant lines. We may thus consider the tangent line l to be the limit of these secant
lines.
This is just so you know: the term secant lines does not appear on your H2 Maths
syllabus or exams.

Using the Rules for Limits (Theorem 14), we can compute derivatives:

Technical/pedagogical note: Strictly speaking, it is unnecessary to assume that the function’s domain
is an interval. What matters is that the point a is a limit point of D (see Definition 251). However, by
imposing this (strong) assumption, every point in D is a limit point, thus allowing us to avoid having
to speak of limit points in the main text. Besides, most (all?) functions encountered in H2 Maths are
defined on intervals (or unions thereof). So all things considered, this assumption is mostly harmless.
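Example 889. Define f ∶ R → R by f (x) = ∣x∣.
Using the above Rules, we can prove that the derivative of f at 2 is 1, i.e. that f ′ (2) = 1:

$$\begin{aligned}
\lim_{x \to 2} \frac{f(x) - f(2)}{x - 2} &= \lim_{x \to 2} \frac{|x| - |2|}{x - 2} && \text{(Simply plug in)} \\
&= \lim_{x \to 2} \frac{x - 2}{x - 2} && \text{(For all } x \text{ “near” } 2,\ x \geq 0 \text{ and hence } |x| = x\text{)} \\
&= \lim_{x \to 2} 1 && \text{(Note that } x \neq 2\text{)} \\
&= 1 && \text{(Constant Rule).}
\end{aligned}$$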

Since the derivative of f at 2 exists, we say that f is differentiable at 2.

Figure to be
inserted here.

We can similarly show that f is likewise differentiable at any a > 0. In particular, we can
prove that the derivative of f at any a > 0 is also 1. In other words, we can prove that
f ′ (a) = 1 for any a > 0:

$$\begin{aligned}
\lim_{x \to a} \frac{f(x) - f(a)}{x - a} &= \lim_{x \to a} \frac{|x| - |a|}{x - a} && \text{(Simply plug in)} \\
&= \lim_{x \to a} \frac{x - a}{x - a} && \text{(For all } x \text{ “near” } a > 0,\ x \geq 0 \text{ and hence } |x| = x\text{)} \\
&= \lim_{x \to a} 1 && \text{(Note that } x \neq a\text{)} \\
&= 1 && \text{(Constant Rule).}
\end{aligned}$$

We will continue with this example below. But first, Exercise 288.

Exercise 288. Continue to define f ∶ R → R by f (x) = ∣x∣. Prove that f is differentiable

at: (Answer on p. 1522.)
(a) −3, with f ′ (−3) = −1;
(b) Any a < 0, with f ′ (a) = −1 for any a < 0.


Example 890. Continue to define f ∶ R → R by f (x) = ∣x∣.
Above, we showed that f is differentiable at any a ≠ 0.
In contrast, f is not differentiable at 0.
Informally, this is because “near” 0, there is no single number that the gradient of f stays
“near” to. To the left, the gradient is −1; while to the right, it is 1.
A little more formally, we may observe that the left- and right-hand limits at 0 exist but
are not equal:

$$\lim_{x \to 0^-} \frac{f(x) - f(0)}{x - 0} = -1 \quad \text{and} \quad \lim_{x \to 0^+} \frac{f(x) - f(0)}{x - 0} = 1.$$

And so by Fact 133, the following limit does not exist:

$$\lim_{x \to 0} \frac{f(x) - f(0)}{x - 0}.$$

Altogether then, f is differentiable everywhere except at 0. Equivalently, f is differentiable
on R ∖ {0}.
Since f fails to be differentiable at even a single point, it is, by Definition 168, not a
differentiable function.
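The mismatch between the two one-sided limits at 0 shows up directly if we compute difference quotients of the absolute value function on each side. In this sketch, `left_and_right_quotients` is a helper name introduced here, and Python's built-in `abs` stands in for f.

```python
def left_and_right_quotients(func, a, steps=6):
    """Difference quotients of func at a, using x = a - h and x = a + h."""
    left, right = [], []
    for n in range(1, steps + 1):
        h = 10.0 ** -n
        left.append((func(a - h) - func(a)) / ((a - h) - a))
        right.append((func(a + h) - func(a)) / ((a + h) - a))
    return left, right


left0, right0 = left_and_right_quotients(abs, 0.0)
left2, right2 = left_and_right_quotients(abs, 2.0)
print("at 0, from the left: ", left0)
print("at 0, from the right:", right0)
print("at 2, from the left: ", left2)
print("at 2, from the right:", right2)
```

At 0, every quotient from the left is −1 and every quotient from the right is 1, so there is no single number for them to approach. At 2, both sides give 1, in line with f ′ (2) = 1.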


Example 891. Define g ∶ R → R by g (x) = x2 .

Figure to be
inserted here.

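Using the Rules for Limits, we can prove that the derivative of g at 2 is 4, i.e. g ′ (2) = 4:

$$\begin{aligned}
\lim_{x \to 2} \frac{g(x) - g(2)}{x - 2} &= \lim_{x \to 2} \frac{x^2 - 2^2}{x - 2} && \text{(Simply plug in)} \\
&= \lim_{x \to 2} \frac{(x - 2)(x + 2)}{x - 2} \\
&= \lim_{x \to 2} (x + 2) && \text{(Note that } x \neq 2\text{)} \\
&= \lim_{x \to 2} x + \lim_{x \to 2} 2 && \text{(Sum and Difference Rules)} \\
&= 2 + 2 = 4 && \text{(Power and Constant Rules).}
\end{aligned}$$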

Since the derivative of g at 2 exists, we say that g is differentiable at 2.

Exercise 289 continues with this example.

Exercise 289. Continue to define g ∶ R → R by g (x) = x2 . By mimicking the above

example, determine if each of the following derivatives exists and if it does, find it.
(Answer on p. 1522.)
(a) The derivative of g at −3.
(b) The derivative of g at 0.
(c) The derivative of g at any a ∈ R.


69.1. Differentiable ⇐⇒ Approximately Linear
Informally, Newton’s Linear Approximation states that:

A function f is differentiable at a point a ⇐⇒ f is approximately linear at a.

Or more informally:
“Near” a (or, when we “zoom in” to a), f “looks” like a straight line.

Example 892. The graph of sin doesn’t “look” like a straight line anywhere.

Figure to be
inserted here.

However, if we pick any point, say x = 0, and zoom in, then the graph does “look”
increasingly like a straight line. And indeed, sin is differentiable at 0.

Example 893. Consider the absolute value function. No matter how far we zoom in at
the point 0, it never looks like a straight line.

Figure to be
inserted here.

And indeed, the absolute value function is not differentiable at 0.

Remark 81. Note that instead of approximately linear, some writers say locally linear.

[Footnote to Remark 81] Some writers object to the phrase locally linear. The chief objection seems to be that given some property X in mathematics, to say that f is locally X at a is to say that f actually satisfies property X, at least on some neighbourhood of a. However, when f is differentiable at a, there need not be any neighbourhood of a on which f is actually linear.

Newton’s Linear Approximation is formally stated as Proposition 19 in the Appendices.

Nonetheless, here we can provide a little heuristic justification of why it works. By definition of the derivative, we have:
\[
f'(a) = \lim_{x \to a} \frac{f(x) - f(a)}{x - a}.
\]
Let’s imagine we can rewrite the above equation as:

\[
\text{“} f'(a) \approx \frac{f(x) - f(a)}{x - a} \text{”}.
\]
Then rearranging, we have:

“f (x) ≈ f (a) + f ′ (a) (x − a)”.
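
The approximation can be tried out on sin at a = 0, using sin′(0) = cos 0 = 1. This is a rough Python sketch only; the function name linear_approximation is our own invention.

```python
import math

def linear_approximation(f, df_at_a, a):
    """Return the function x -> f(a) + f'(a) (x - a)."""
    return lambda x: f(a) + df_at_a * (x - a)

# sin at a = 0, using sin'(0) = cos(0) = 1.
approx = linear_approximation(math.sin, math.cos(0.0), 0.0)

for x in (0.5, 0.1, 0.01):
    error = abs(math.sin(x) - approx(x))
    # The error shrinks faster than x itself, so error / x also shrinks.
    print(x, error, error / x)
```

The closer x is to 0, the smaller the error becomes even relative to the distance x − 0, which is one way to make “looks like a straight line” a little more precise.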

Figure to be
inserted here.


69.2. Continuity vs Differentiability
Informally, both continuity and differentiability tell us how “smooth” a function is.
Informally, a function is:
• Continuous if its graph contains no “holes” or “jumps” anywhere and can be drawn
without lifting your pencil.
• Differentiable if it is continuous and moreover, contains no “kinks” or other “abrupt
turns” in its graph.
It turns out that differentiability implies continuity. (We may thus say that differentiability is a stronger condition than continuity.) Formally:

Theorem 19. If f is differentiable at a, then it is also continuous at a.

Proof. See p. 1336 in the Appendices.

However, the converse is false. That is, a function may be continuous without also being differentiable:

Example 894. The function f is both continuous and differentiable.

[Figure: graphs of the functions f, g, and h, with the points A and B marked.]

The function g is continuous but not differentiable. In particular, it is differentiable at every point except A.
The function h is neither continuous nor differentiable. In particular, it is continuous and
differentiable at every point except B.


69.3. The Derivative Is A Function
Given a function f , its derivative is the function denoted f ′ and which gives us the gradient
of the tangent line to f at each point.
We now give the formal definition:

Definition 168. If f is differentiable at every point in a set S, then we say that f is differentiable on S.
Let T be the set of points on which f is differentiable. Then the derivative of f is the real-valued function denoted f′, df/dx, or ḟ, with domain T and mapping rule:
\[
a \mapsto \lim_{x \to a} \frac{f(x) - f(a)}{x - a}.
\]
If f is differentiable on its domain (or equivalently, if T is the whole of the domain of f), then we simply call f a differentiable function.

So, given any point a ∈ S, f ′ (a) gives us the gradient of the tangent line to f at that point.

Example 895. XXX

Example 896. XXX

Let us stress, emphasise, and repeat: the derivative f′ is itself also a function. In particular, it is the function whose:
• Domain is the set of points at which the derivative of f exists; and
• Mapping rule is \( a \mapsto \lim_{x \to a} \frac{f(x) - f(a)}{x - a} \).
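
The idea that the derivative is itself a function can be mimicked in code by writing a function that takes a function and returns another function. The Python sketch below is only an approximation under our own assumptions: it uses a symmetric difference quotient with a small fixed step h rather than a true limit, and the name numerical_derivative is hypothetical.

```python
def numerical_derivative(f, h=1e-6):
    """Return a function that approximates f' (not the exact limit)."""
    def f_prime(a):
        # Symmetric difference quotient with a small, fixed step h.
        return (f(a + h) - f(a - h)) / (2 * h)
    return f_prime

def square(x):
    return x ** 2

square_prime = numerical_derivative(square)    # this is itself a function
print(square_prime(2.0), square_prime(-3.0))   # approximately 4 and -6

# Caveat: the approximation returns a number even where no derivative exists.
print(numerical_derivative(abs)(0.0))          # 0.0, although |x| has no derivative at 0
```

The last line is a warning rather than a feature: a numerical approximation cannot tell us the domain of f′, so the set T in Definition 168 still has to be worked out by hand.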
Exercise 290. XXX (Answer on p. 681.)


69.4. The Notation of Lagrange, Leibniz, and Newton
Given a nice function f , its derivative is a function and may be denoted in at least three
different ways:

\begin{align*}
\text{The derivative of } f &= f' && \text{(Lagrange’s notation)} \\
&= \frac{df}{dx} && \text{(Leibniz’s notation)} \\
&= \dot{f}. && \text{(Newton’s notation)}
\end{align*}

Suppose that a ∈ Domain f. Then the following limit (if it exists) is simply a number and is called the derivative of f at a:
\[
\lim_{x \to a} \frac{f(x) - f(a)}{x - a}.
\]
There are, again, at least three different ways to denote the derivative of f at a:
\begin{align*}
\lim_{x \to a} \frac{f(x) - f(a)}{x - a} &= f'(a) && \text{(Lagrange’s notation)} \\
&= \left. \frac{df}{dx} \right|_{x=a} = \frac{df}{dx}(a) && \text{(Leibniz’s notation)} \\
&= \dot{f}(a). && \text{(Newton’s notation)}
\end{align*}
Of course, Newton’s notation is very similar to Lagrange’s: instead of the prime symbol to the right of the name of the function f, Newton uses a dot over f.

Example 897. XXX

Example 898. XXX

Remark 82. The notations of Lagrange and Leibniz are widely used. Newton’s is not. Indeed, Newton’s notation does not appear in any of your recent years’ A-Level exams and we shall not use it in this textbook.
Nonetheless, Newton’s notation is sometimes used in physics (especially when the independent variable is time). Moreover, it appears on p. 18 of your syllabus. So, it’s probably worth knowing about.

One convenience of Leibniz’s notation is that it allows us to interpret d/dx as the differentiation operator or function. That is, d/dx is itself a function that maps a function (e.g. f) to another (e.g. f′).277

[Footnote 277] Operator and function are synonyms. However, if a function maps functions to other functions, then we tend to call this function an operator. Here for example, d/dx maps a function f to another function f′ and so we call it an operator.
Example 899. The statement “d/dx x² = 2x” is simply shorthand for:

The derivative of a function with mapping rule x ↦ x² is a function with mapping rule x ↦ 2x.

Or:
The function or operator d/dx maps the function with mapping rule x ↦ x² to the function with mapping rule x ↦ 2x.

Example 900. The statement “d/dx f = g” is simply shorthand for:

The derivative of the function named f is the function named g.

Or:
The function or operator d/dx maps the function named f to the function named g.

Example 901. The statement “d/dx (f ⋅ g) = f′ ⋅ g + f ⋅ g′” (this is the Product Rule) is simply shorthand for:

The derivative of the function f ⋅ g is a function with mapping rule:

x ↦ f′(x) ⋅ g(x) + f(x) ⋅ g′(x).

Or:
The function or operator d/dx maps the function with mapping rule x ↦ (f ⋅ g)(x) to the function with mapping rule x ↦ f′(x) ⋅ g(x) + f(x) ⋅ g′(x).
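
For polynomials, the operator d/dx can be written down exactly as a function that maps one function to another. In the Python sketch below, a polynomial c₀ + c₁x + c₂x² + … is stored as the list [c0, c1, c2, ...]. This representation and the names d_dx, poly_mul, and poly_add are our own assumptions, chosen purely for illustration.

```python
def d_dx(coeffs):
    """Map the polynomial c0 + c1 x + c2 x^2 + ... to its derivative."""
    return [k * c for k, c in enumerate(coeffs)][1:]

def poly_mul(p, q):
    """Coefficients of the product of two polynomials."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def poly_add(p, q):
    """Coefficients of the sum of two polynomials."""
    n = max(len(p), len(q))
    p = p + [0] * (n - len(p))
    q = q + [0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]

x_squared = [0, 0, 1]
print(d_dx(x_squared))   # [0, 2], i.e. x -> 2x

# The Product Rule statement, checked on f(x) = 1 + 2x and g(x) = 3x^2.
f, g = [1, 2], [0, 0, 3]
print(d_dx(poly_mul(f, g)))                                      # [0, 6, 18]
print(poly_add(poly_mul(d_dx(f), g), poly_mul(f, d_dx(g))))      # [0, 6, 18]
```

Here d_dx takes a whole function (as a list of coefficients) and returns a whole function, which is exactly the sense in which d/dx is an operator.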


Fun Fact

Isaac Newton (1643–1727) was one of the greatest physicists ever and also one of the greatest mathematicians ever. It is not surprising then that one writer ranked him the second-most influential person in history (and the only one among the top six who was a non-religious figure).278
Gottfried Wilhelm von Leibniz279 (1646–1716) was likewise a first-rate genius and a polymath. Indeed, he is sometimes called “the last man to know everything”, the rationale being that:

Since his time the growth of knowledge has resulted in, and indeed necessitated, specialization. The horizon for the individual is now restricted, for few can hope to attain proficiency in more than one subject.

— A. L. Leigh Silver (1962).

Newton and Leibniz are often dubbed the “inventors” of the calculus. Indeed, their
dispute over who “invented” calculus is perhaps history’s most famous academic dispute.
(Even history’s greatest geniuses are not above some petty bickering.)280 But as has been
well said by the historian of mathematics Carl B. Boyer (1949):

Few new branches of mathematics are the work of single individuals.

[Footnote 278] Michael Hart, in The 100: A Ranking of the Most Influential Persons in History (1978, 1992). In case you’re wondering, Muhammad was ranked first and Jesus third. Full rankings plus summary here and book here.
[Footnote 279] Sometimes spelt Leibnitz.
[Footnote 280] Jason Socrates Bardi gives a popular account of this dispute in The Calculus Wars: Newton, Leibniz, and the Greatest Mathematical Clash of All Time (2007).

69.5. Proving Several Rules of Differentiation
In Ch. 18.5, we gave several (informal) Rules of Differentiation.281 We now reproduce
these verbatim:

Rules of Differentiation (informal)

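Let c be a constant and x, y, and z be variables. Then:

Constant Rule: \( \frac{d}{dx} c \overset{C}{=} 0 \)
Constant Factor Rule: \( \frac{d}{dx} (cy) \overset{F}{=} c \frac{dy}{dx} \)
Power Rule: \( \frac{d}{dx} x^c \overset{P}{=} c x^{c-1} \)
Sum and Difference Rules: \( \frac{d}{dx} (y \pm z) \overset{\pm}{=} \frac{dy}{dx} \pm \frac{dz}{dx} \)
Product Rule: \( \frac{d}{dx} (yz) \overset{\times}{=} z \frac{dy}{dx} + y \frac{dz}{dx} \)
Quotient Rule: \( \frac{d}{dx} \frac{y}{z} \overset{\div}{=} \frac{z \frac{dy}{dx} - y \frac{dz}{dx}}{z^2} \)
Sine: \( \frac{d}{dx} \sin x = \cos x \)
Cosine: \( \frac{d}{dx} \cos x = -\sin x \)
Natural logarithm: \( \frac{d}{dx} \ln x = \frac{1}{x} \)
Exponential: \( \frac{d}{dx} e^x = e^x \)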

We now have a better understanding of what derivatives are. In particular, we understand that:

A derivative is itself a function.

And so, in this and the next two subchapters, we shall formally and properly restate and
prove several of the above Rules of Differentiation. These proofs are not on the syllabus,
so the Type 1 pragmatist can choose to skip them.
We start with the simplest, the Constant Rule, which says that the derivative of a
constant function is a zero function:

Fact 140. Let D be an interval and f ∶ D → R be a function. If f is defined by f (x) = c

for some c ∈ R, then f ’s derivative is the function f ′ ∶ D → R defined by:

f ′ (x) = 0.
(Constant Rule)

[Footnote 281] For a formal statement of these Rules, see Theorem 20 (Appendices).

Proof. For any a ∈ D, we have:
\[
f'(a) = \lim_{x \to a} \frac{f(x) - f(a)}{x - a} = \lim_{x \to a} \frac{c - c}{x - a} = \lim_{x \to a} 0 = 0,
\]

where the last step uses the Constant Rule for Limits (see Theorem 14).
We’ve just shown that for any a ∈ D, the derivative of f at a is 0. Thus, the derivative of
f is the function f ′ ∶ D → R defined by f ′ (x) = 0.

By the way, the above result’s converse is also true. That is, a function (defined on an interval) whose derivative is a zero function is itself a constant function:

Proposition 6. Let D be an interval and f ∶ D → R be a function. If f ’s derivative is the

function f ′ ∶ D → R defined by f ′ (x) = 0, then f is defined by f (x) = c for some c ∈ R.

Proof. See p. 1342 in the Appendices.

We next state and prove the Constant Factor Rule, which says that the derivative of a constant multiple of a function is the same constant multiple of that function’s derivative:

Fact 141. Let D be an interval and c ∈ R. Suppose f ∶ D → R is a differentiable function. Then the derivative of the function cf is the function (cf)′ ∶ D → R defined by:
\[
(cf)'(x) = c f'(x). \qquad \text{(Constant Factor Rule)}
\]

Proof. For any a ∈ D, we have:
\begin{align*}
(cf)'(a) &= \lim_{x \to a} \frac{cf(x) - cf(a)}{x - a} = \lim_{x \to a} \left[ c \, \frac{f(x) - f(a)}{x - a} \right] \\
&\overset{1}{=} \left[ \lim_{x \to a} c \right] \left[ \lim_{x \to a} \frac{f(x) - f(a)}{x - a} \right] \overset{2}{=} c \left[ \lim_{x \to a} \frac{f(x) - f(a)}{x - a} \right] \overset{3}{=} c f'(a),
\end{align*}
where \(\overset{1}{=}\) and \(\overset{2}{=}\) use the Product and Constant Rules for Limits (see Theorem 14), while \(\overset{3}{=}\) uses the definition of the derivative.

We’ve just shown that for any a ∈ D, the derivative of cf at a is cf′(a). Thus, the derivative of cf is the function (cf)′ ∶ D → R defined by (cf)′(x) = cf′(x).

The Sum and Difference Rules say that the derivative of the sum (or difference)
of two functions is the sum (or difference) of their derivatives:

Fact 142. Let D be an interval. Suppose f, g ∶ D → R are differentiable functions. Then the derivative of the function f ± g is the function (f ± g)′ ∶ D → R defined by:
\[
(f \pm g)'(x) \overset{\pm}{=} f'(x) \pm g'(x). \qquad \text{(Sum and Difference Rules)}
\]

Proof. For any a ∈ D, we have:
\begin{align*}
(f \pm g)'(a) &= \lim_{x \to a} \frac{(f \pm g)(x) - (f \pm g)(a)}{x - a} = \lim_{x \to a} \frac{f(x) \pm g(x) - [f(a) \pm g(a)]}{x - a} \\
&\overset{4}{=} \lim_{x \to a} \frac{f(x) - f(a)}{x - a} \pm \lim_{x \to a} \frac{g(x) - g(a)}{x - a} \overset{5}{=} f'(a) \pm g'(a),
\end{align*}
where \(\overset{4}{=}\) uses the Sum and Difference Rules for Limits (see Theorem 14), while \(\overset{5}{=}\) uses the definition of the derivative.

We’ve just shown that for any a ∈ D, the derivative of f ± g at a is f′(a) ± g′(a). Thus, the derivative of f ± g is the function (f ± g)′ ∶ D → R defined by (f ± g)′(x) = f′(x) ± g′(x).

Next is the Power Rule:

Fact 143. Let D be an interval, c ∈ R, and f ∶ D → R be defined by f(x) = x^c. Then the derivative of f is the function f′ ∶ D → R defined by:
\[
f'(x) = c x^{c-1}. \qquad \text{(Power Rule)}
\]

A complete proof of the Power Rule is beyond the scope of H2 Maths and this textbook.
Nonetheless and very excitingly, we will now learn to prove the Power Rule in the special
case where the exponent c is a non-negative integer.
Of course, we’ve already proven the Power Rule in the simplest case where the exponent is
0 — this is simply the Constant Rule. And so, let us start with the next simplest case —
the case where the exponent is 1 — and work our way up:

Example 902. Let D be an interval. Define f ∶ D → R by f (x) = x.

For any a ∈ D, we have:
\[
f'(a) = \lim_{x \to a} \frac{f(x) - f(a)}{x - a} = \lim_{x \to a} \frac{x - a}{x - a} = \lim_{x \to a} 1 = 1.
\]

We’ve just shown that for any a ∈ D, the derivative of f at a is 1.

Thus, the derivative of f is the function f ′ ∶ D → R defined by f ′ (x) = 1.


Example 903. Let D be an interval. Define g ∶ D → R by g(x) = x².
In Exercise 289(c), we showed that for any a ∈ D, the derivative of g at a exists and is equal to 2a. For convenience, we reproduce these steps:
\begin{align*}
\lim_{x \to a} \frac{g(x) - g(a)}{x - a} &= \lim_{x \to a} \frac{x^2 - a^2}{x - a} && \text{(Simply plug in)} \\
&= \lim_{x \to a} \frac{(x - a)(x + a)}{x - a} \\
&= \lim_{x \to a} (x + a) && \text{(Note that } x \neq a\text{)} \\
&= \lim_{x \to a} x + \lim_{x \to a} a && \text{(Sum and Difference Rules)} \\
&= a + a = 2a && \text{(Power and Constant Rules).}
\end{align*}

We’ve just shown that for any a ∈ D, the derivative of g at a is 2a.

Thus, the derivative of g is the function g′ ∶ D → R defined by g′(x) = 2x.

Example 904. Let D be an interval. Define h ∶ D → R by h(x) = x³.

As a hint, we are told that: x³ − a³ = (x − a)(x² + ax + a²).

Using this hint, we can find the derivative of h. For any a ∈ D, we have:
\begin{align*}
\lim_{x \to a} \frac{h(x) - h(a)}{x - a} &= \lim_{x \to a} \frac{x^3 - a^3}{x - a} && \text{(Simply plug in)} \\
&= \lim_{x \to a} \frac{(x - a)(x^2 + ax + a^2)}{x - a} \\
&= \lim_{x \to a} (x^2 + ax + a^2) && \text{(Note that } x \neq a\text{)} \\
&\overset{\pm, F}{=} \lim_{x \to a} x^2 + a \lim_{x \to a} x + \lim_{x \to a} a^2 && \text{(Sum, Difference, and Constant Factor Rules)} \\
&\overset{P, C}{=} a^2 + a \cdot a + a^2 = 3a^2 && \text{(Power and Constant Rules).}
\end{align*}

What we’ve just shown is that for any a ∈ D, the derivative of h at a is 3a².
Thus, the derivative of h is the function h′ ∶ D → R defined by h′(x) = 3x².

Using the following information, we can go on finding the derivatives of higher integer powers:
\begin{align*}
x^2 - a^2 &= (x - a)(x + a), \\
x^3 - a^3 &= (x - a)(x^2 + ax + a^2), \\
x^4 - a^4 &\overset{\star}{=} (x - a)(x^3 + ax^2 + a^2 x + a^3).
\end{align*}
More generally, for higher integers n, we have:
\[
x^n - a^n \overset{\circ}{=} (x - a)(x^{n-1} + x^{n-2} a + x^{n-3} a^2 + \dots + x a^{n-2} + a^{n-1}).
\]
You will use \(\overset{\star}{=}\) and \(\overset{\circ}{=}\) to solve the following two Exercises:
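
Before relying on a factorisation like the general one for xⁿ − aⁿ, it can be reassuring to test it on many small integers. The Python sketch below does this by brute force; the name cofactor for the second bracket is our own. Passing such a check is evidence, not a proof.

```python
def cofactor(x, a, n):
    """x^(n-1) + x^(n-2) a + ... + x a^(n-2) + a^(n-1): n terms in all."""
    return sum(x ** (n - 1 - k) * a ** k for k in range(n))

# Check x^n - a^n == (x - a) * cofactor(x, a, n) on a grid of integers.
for n in range(2, 7):
    for x in range(-3, 4):
        for a in range(-3, 4):
            assert x ** n - a ** n == (x - a) * cofactor(x, a, n)

print(cofactor(2, 3, 4))   # 8 + 12 + 18 + 27 = 65, and 2^4 - 3^4 = -65 = (2 - 3) * 65
```

Note that the second bracket has exactly n terms and that its last term is a to the power n − 1.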

Exercise 291. Find the derivative of the function i ∶ R → R defined by i(x) = x⁴. (Answer on p. 1523.)
Exercise 292. Let c be a positive integer and define f ∶ R → R by f(x) = x^c. Find the derivative of f (you will thus have proven the Power Rule in the special case where the exponent is a positive integer). (Answer on p. 1523.)

Remark 83. As stated above, a complete proof of the Power Rule is beyond the scope of
H2 Maths and this textbook. In Exercise 292, we merely proved the Power Rule in the
special case where c is a positive integer.
For a somewhat more general proof of the Power Rule, see p. 1337 (Appendices).

Exercise 293. What’s wrong with the following “proof” that 1 = 0? (Answer on p.

1. Define f ∶ R → R by f(x) = x² − 3x.
2. Then the derivative of f is the function f′ ∶ R → R defined by f′(x) = 2x − 3.
3. Observe that f(2) = 2² − 3 ⋅ 2 = −2.
4. Hence, f′(2) = (−2)′ = 0.
5. But from Step 2, we also have f′(2) = 2 ⋅ 2 − 3 = 1.
6. Thus, by Steps 4 and 5, we have 1 = 0.


69.6. Proving the Product and Quotient Rules
We now formally state and prove the Product and Quotient Rules.

Theorem 20. (The Product and Quotient Rules) Let D be an interval. Suppose f, g ∶ D → R are differentiable functions. Then:

(a) The derivative of the function f ⋅ g is the function (f ⋅ g)′ ∶ D → R defined by:
\[
(f \cdot g)'(x) = f(x) g'(x) + f'(x) g(x). \qquad \text{(Product Rule)}
\]
Or equivalently and more succinctly:
\[
(f \cdot g)' = f \cdot g' + f' \cdot g. \qquad \text{(Product Rule)}
\]

(b) The derivative of the function f/g is the function (f/g)′ ∶ D ∖ {x ∶ g(x) = 0} → R defined by:
\[
\left( \frac{f}{g} \right)'(a) \overset{\div}{=} \frac{g(a) f'(a) - f(a) g'(a)}{[g(a)]^2}. \qquad \text{(Quotient Rule)}
\]
Or equivalently and more succinctly:
\[
\left( \frac{f}{g} \right)' \overset{\div}{=} \frac{g f' - f g'}{g^2}. \qquad \text{(Quotient Rule)}
\]

Remark 84. Note the exclusion from the domain of (f/g)′ of any points at which g(x) = 0. If you cannot remember or do not understand why this is necessary, go back and review Ch. 13.

We reproduce the following common mnemonic for the Quotient Rule:

Lo-D-Hi minus Hi-D-Lo,

Cross over and square the Lo.

Fun Fact

The Product Rule (for differentiation) is sometimes named after Leibniz.

But Leibniz initially got it wrong! He initially guessed — as one might — that analogous
to the Sum and Difference Rules, the derivative of the product is equal to the product of
the derivatives. That is, he guessed that:

(f g)′ = f′ g′.

He quickly realised though that this was wrong and arrived at the correct Product Rule.282

[Footnote 282] For more about this story, see Google Books and .

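Proof. (a) (Product Rule) For any a ∈ D, we have:
\begin{align*}
(f \cdot g)'(a) &= \lim_{x \to a} \frac{(f \cdot g)(x) - (f \cdot g)(a)}{x - a} \\
&= \lim_{x \to a} \frac{f(x) g(x) - f(a) g(a)}{x - a} \\
&= \lim_{x \to a} \frac{f(x) g(x) - f(x) g(a) + f(x) g(a) - f(a) g(a)}{x - a} && \text{(Plus Zero Trick)} \\
&= \lim_{x \to a} \left[ f(x) \frac{g(x) - g(a)}{x - a} \right] + \lim_{x \to a} \left[ \frac{f(x) - f(a)}{x - a} g(a) \right] \\
&= \lim_{x \to a} f(x) \lim_{x \to a} \frac{g(x) - g(a)}{x - a} + g(a) \lim_{x \to a} \frac{f(x) - f(a)}{x - a} \\
&= f(a) g'(a) + f'(a) g(a).
\end{align*}
The last step uses the fact that f is differentiable and hence continuous at a (Theorem 19), so that the limit of f(x) as x approaches a is f(a).

We’ve just shown that for any a ∈ Domain (f ⋅ g), the derivative of f ⋅ g at a is f(a) g′(a) + f′(a) g(a). Thus, the derivative of f ⋅ g is the function (f ⋅ g)′ ∶ D → R defined by (f ⋅ g)′(x) = f(x) g′(x) + f′(x) g(x).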
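(b) (Quotient Rule) For any a ∈ D ∖ {x ∶ g(x) = 0}, we have:
\begin{align*}
\lim_{x \to a} \frac{(f/g)(x) - (f/g)(a)}{x - a} &= \lim_{x \to a} \frac{f(x)/g(x) - f(a)/g(a)}{x - a} \\
&= \lim_{x \to a} \left[ \frac{1}{g(a)} \frac{1}{g(x)} \frac{f(x) g(a) - f(a) g(x)}{x - a} \right] \\
&= \lim_{x \to a} \left[ \frac{1}{g(a)} \frac{1}{g(x)} \frac{f(x) g(a) - f(x) g(x) + f(x) g(x) - f(a) g(x)}{x - a} \right] && \text{(Plus Zero Trick)} \\
&= \lim_{x \to a} \left\{ \frac{1}{g(a)} \frac{1}{g(x)} \left[ g(x) \frac{f(x) - f(a)}{x - a} - f(x) \frac{g(x) - g(a)}{x - a} \right] \right\} \\
&\overset{\star}{=} \lim_{x \to a} \frac{1}{g(a)} \lim_{x \to a} \frac{1}{g(x)} \left[ \lim_{x \to a} g(x) \lim_{x \to a} \frac{f(x) - f(a)}{x - a} - \lim_{x \to a} f(x) \lim_{x \to a} \frac{g(x) - g(a)}{x - a} \right],
\end{align*}
where the last step uses the Product and Difference Rules for Limits (see Theorem 14).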
The proof of the Quotient Rule will be completed in Exercise 294.
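
Neither rule is proven by a numerical check, but a check can catch a misremembered sign. The Python sketch below compares each rule against a symmetric difference quotient for the hypothetical choices f = sin and g = exp at a = 0.7 (our own assumptions; exp is convenient because it is never zero). It relies on the informal rules that the derivative of sin is cos and the derivative of exp is exp.

```python
import math

def derivative_at(f, a, h=1e-6):
    """Symmetric difference quotient: an approximation, not the exact limit."""
    return (f(a + h) - f(a - h)) / (2 * h)

f, f_prime = math.sin, math.cos
g, g_prime = math.exp, math.exp   # exp is never zero, so f/g is defined everywhere
a = 0.7

product_numeric = derivative_at(lambda x: f(x) * g(x), a)
product_rule = f(a) * g_prime(a) + f_prime(a) * g(a)

quotient_numeric = derivative_at(lambda x: f(x) / g(x), a)
quotient_rule = (g(a) * f_prime(a) - f(a) * g_prime(a)) / g(a) ** 2

print(abs(product_numeric - product_rule))    # very small
print(abs(quotient_numeric - quotient_rule))  # very small
```

Swapping the two terms in the numerator of the Quotient Rule (a common slip) would make the second difference large, which is the practical use of such a check.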


Exercise 294. We now continue with the proof of the Quotient Rule. (Answer on p.
Let’s now examine what each term in expression ⋆ is. (Fill in the blanks.)
(a) Observe that 1/g(a) is a _____. And so, by the _____ (result), we have:
\[
\lim_{x \to a} \frac{1}{g(a)} \overset{1}{=} \; ?
\]

(b) Since f and g are differentiable at a, by _____ (result), they are also _____ at a. And so, by the definition of _____, we have:
\[
\lim_{x \to a} f(x) \overset{2}{=} \; ? \quad \text{and} \quad \lim_{x \to a} g(x) \overset{3}{=} \; ?
\]

(c) Since g is _____ at a and g(a) ≠ 0, by _____ (result), the reciprocal function 1/g is also _____ at a. And so, again by the definition of _____, we have:
\[
\lim_{x \to a} \frac{1}{g(x)} \overset{4}{=} \; ?
\]

(d) By definition of the _____, we have:
\[
\lim_{x \to a} \frac{f(x) - f(a)}{x - a} \overset{5}{=} \; ? \quad \text{and} \quad \lim_{x \to a} \frac{g(x) - g(a)}{x - a} \overset{6}{=} \; ?
\]

(e) Now return to expression ⋆ and plug in the equations \(\overset{1}{=}\) through \(\overset{6}{=}\) to get:
\[
\lim_{x \to a} \frac{(f/g)(x) - (f/g)(a)}{x - a} \overset{\star}{=} \lim_{x \to a} \frac{1}{g(a)} \lim_{x \to a} \frac{1}{g(x)} \left[ \lim_{x \to a} g(x) \lim_{x \to a} \frac{f(x) - f(a)}{x - a} - \lim_{x \to a} f(x) \lim_{x \to a} \frac{g(x) - g(a)}{x - a} \right]
\]


(f) Complete the proof by writing down the usual last two sentences.


69.7. Proving the Chain Rule
The Chain Rule tells us about the derivative of the composition of two functions.
We reproduce from Ch. 18.6 the following informal statement of the Chain Rule:

Chain Rule (informal)

Let x, y, and z be variables. Then:

\[
\frac{dz}{dx} = \frac{dz}{dy} \cdot \frac{dy}{dx}.
\]

We now formally state the Chain Rule:

Theorem 21. (Chain Rule) Let a ∈ R and f and g be nice functions. Suppose g′(a) and f′(g(a)) exist. Then:
\[
(f \circ g)'(a) = f'(g(a)) \, g'(a).
\]

Proof. We are tempted to simply write down the following “proof”:
\begin{align*}
(f \circ g)'(a) &= \lim_{x \to a} \frac{f(g(x)) - f(g(a))}{x - a} \overset{1}{=} \lim_{x \to a} \left[ \frac{f(g(x)) - f(g(a))}{g(x) - g(a)} \cdot \frac{g(x) - g(a)}{x - a} \right] && \text{(Times One Trick)} \\
&= \lim_{x \to a} \frac{f(g(x)) - f(g(a))}{g(x) - g(a)} \cdot \lim_{x \to a} \frac{g(x) - g(a)}{x - a} = f'(g(a)) \, g'(a).
\end{align*}

Unfortunately, the above “proof” contains two fatal flaws. First, in \(\overset{1}{=}\), there is the possibility that g(x) = g(a) for some values of x that are “near” a. If so, then \(\overset{1}{=}\) commits the cardinal sin of (possibly) dividing by zero. Second, the following step requires additional justification:
\[
\lim_{x \to a} \frac{f(g(x)) - f(g(a))}{g(x) - g(a)} \overset{\star}{=} f'(g(a)).
\]
Nonetheless, these two flaws may be regarded as mere blemishes or technicalities that can
be easily addressed — if you’re interested in the gory details, see p. 1338 (Appendices).
Though flawed, the above “proof” should give you an idea of why the Chain Rule “works”.
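
The first flaw is easy to reproduce on a computer. In the Python sketch below (all names and the choices f = sin, g(t) = t² are our own, purely illustrative), the “Times One Trick” product works fine for g(t) = t², but breaks down for a constant g, where g(x) = g(a) for every x.

```python
import math

def naive_chain_quotient(f, g, a, x):
    """The 'Times One Trick' product; fails whenever g(x) == g(a)."""
    outer = (f(g(x)) - f(g(a))) / (g(x) - g(a))
    inner = (g(x) - g(a)) / (x - a)
    return outer * inner

def direct_quotient(f, g, a, x):
    """The difference quotient of the composite function, computed directly."""
    return (f(g(x)) - f(g(a))) / (x - a)

a, x = 1.0, 1.001
print(naive_chain_quotient(math.sin, lambda t: t ** 2, a, x))  # close to 2 cos(1)

def constant(t):
    return 5.0

print(direct_quotient(math.sin, constant, a, x))               # 0.0
try:
    naive_chain_quotient(math.sin, constant, a, x)
except ZeroDivisionError:
    print("division by zero: g(x) == g(a)")
```

The Chain Rule itself still gives the right answer for the constant g (namely 0); it is only this particular “proof” that fails there.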


69.8. Are All Elementary Functions Differentiable?
We saw in Ch. 68.5 that all elementary functions are continuous.
Is the following, analogous claim likewise true?

“All elementary functions are differentiable.”

Unfortunately, the above claim is false:

Example 905. Consider the elementary function f ∶ R₀⁺ → R defined by f(x) = √x.

Figure to be
inserted here.

Unfortunately, f is not a differentiable function, because it fails to be differentiable at one point in its domain, namely at 0. In other words, the derivative of f at 0 does not exist. (Informally, the reason for this is that the tangent line to f at 0 is vertical.)
The derivative of f is the function f′ ∶ R⁺ → R defined by f′(x) = 1/(2√x).
Note the subtle difference in the domains of f and f′. (The latter excludes the point 0.)
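
We can watch the derivative of the square root function fail to exist at 0 by computing difference quotients there. This is a small Python sketch with our own helper name and step sizes.

```python
import math

def difference_quotient(f, a, x):
    """Gradient of the line through (a, f(a)) and (x, f(x)), for x != a."""
    return (f(x) - f(a)) / (x - a)

# Since sqrt(h) / h = 1 / sqrt(h), the quotient at 0 grows without bound as h shrinks.
for k in range(2, 11, 2):
    h = 10.0 ** -k
    print(h, difference_quotient(math.sqrt, 0.0, h))   # roughly 10 ** (k / 2)
```

The quotients do not settle near any real number, which matches the informal picture of a vertical tangent line at 0.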

Nonetheless, the above claim is “almost” true. Recall that an elementary function is:

a polynomial function, a trigonometric function, an inverse trigonometric function, a natural logarithm function, an exponential function, a power function, any arithmetic combination of two elementary functions, or any composition of two elementary functions.

It turns out that the only bad apple is the power function x ↦ x^c and even then only in the special case where c < 1. In such cases, the derivative at certain points may have denominator 0 and thus be undefined. And so, we have the following informal theorem telling us that with this small exception, all elementary functions are differentiable:

Informal Theorem on Differentiability of Elementary Functions

All elementary functions are differentiable, except possibly those involving the power function x ↦ x^c where c < 1.


69.9. More About Leibniz’s Notation (optional)
This subchapter briefly discusses the motivation behind Leibniz’s notation. As always, the goal of this discussion is to improve your understanding.
Let y be a nice function that is defined at a ∈ D. Denote:

∆x = x − a and ∆y = y (x) − y (a).

Then for x ≠ a, we can write:
\[
\frac{\Delta y}{\Delta x} = \frac{y(x) - y(a)}{x - a}.
\]
Thus, if the limit of Δy/Δx as x approaches a exists, then it is also the derivative of y at a:
\[
\lim_{x \to a} \frac{\Delta y}{\Delta x} = \lim_{x \to a} \frac{y(x) - y(a)}{x - a}.
\]
We then introduce the following (familiar) piece of notation:
\[
\frac{dy}{dx} = \lim_{x \to a} \frac{\Delta y}{\Delta x}.
\]
It must be stressed, emphasised, and repeated that dy/dx is a single expression. Do not think of it as a fraction with numerator dy and denominator dx.
However, to better understand where the above notation comes from, let us note that Leibniz held a view that was contrary to the modern and standard one. In particular, to Leibniz:
• dx denoted an “infinitesimal change in x”;
• dy denoted the corresponding “infinitesimal change in y”; and
• dy/dx was really a fraction with numerator dy and denominator dx.
Unfortunately, Leibniz’s notion of “infinitesimals” (or Newton’s of “fluxions”) was rather vague, imprecise, and non-rigorous. An “infinitesimal” was smaller than any quantity and yet not zero. The most famous (and most poetic) critique is probably George Berkeley’s:
what are these same evanescent Increments? They are neither finite Quantities nor Quantities infinitely small, nor yet nothing. May we not call them the Ghosts of departed Quantities?
So, in the 19th century, mathematicians embarked on a project to put calculus on a firmer footing. In particular, they sought to rid mathematics of those ill-defined “infinitesimals”. Eventually, they came up with the formal notion of limits. Thereafter, Leibniz’s “infinitesimals” or Newton’s “fluxions” were banished from maths.283

[Footnote 283] But would later be resurrected by Abraham Robinson in the 1960s with his non-standard analysis.

In Ch. 67, you learnt a little about the idea of limits. Under our modern notion of limits:
It is wrong to think of dy/dx as a fraction with numerator dy and denominator dx.
So simply put, Leibniz was wrong to think of the derivative as a fraction.284 And
you should be very careful not to think of the derivative as a fraction, even though it looks
very much like one.
Instead, dy/dx denotes a function; in particular, it is the derivative of the function y.285
Indeed, the operator d/dx is itself a function! It maps a function, for example y, to another function denoted dy/dx and which we call the derivative of y. Thus, d/dx is itself a function whose domain and codomain are both sets of functions!

Now, if Leibniz was wrong to think of the derivative as a fraction, then why are we still
using his notation? The main reason is that it is highly intuitive.
Leibniz’s notation reminds us that calculus is the study of continuous changes. For
example, it also allows us to quickly grasp the intuition behind such results as the Chain
Rule, which we stated informally as:
\[
\frac{dz}{dx} = \frac{dz}{dy} \times \frac{dy}{dx}.
\]
It is tempting to naïvely interpret the expressions in the above equation as fractions, naïvely
apply simple algebra, naïvely cancel out the dy’s, so that the equation looks correct by
primary-school algebra:

\[
\text{“} \frac{dz}{dx} = \frac{dz}{dy} \times \frac{dy}{dx} \text{.”}
\]

But the correct informal interpretation (easily seen when written in Leibniz’s notation) is this:

(The change in z due to a small unit change in x) = (The change in z due to a small unit change in y) × (The change in y due to a small unit change in x)

Another example is the Inverse Function Theorem, which may informally be stated as:
\[
\frac{dy}{dx} = \frac{1}{\frac{dx}{dy}}.
\]
Again, the naïve interpretation would be that dy/dx and dx/dy are fractions, so that the above equation again looks correct by primary-school algebra.

[Footnote 284] The traditional historiographical view is that Leibniz and Newton were less than completely rigorous and sometimes committed logical fallacies. Note however that not everyone agrees. For example, Katz and Sherry (2012) argue that, “Leibniz’s system for differential calculus was free of logical fallacies.”
[Footnote 285] Actually, the variable x is superfluous! Indeed, Euler simply wrote Dy to denote the derivative of the function y. Happily, this fourth (!) piece of notation for the derivative does not appear in your A-Level syllabus or exams and so we shall say no more about it.

But again, the correct informal interpretation (easily seen when written in Leibniz’s notation) is this:

(The change in y due to a small unit change in x) = 1 ÷ (The change in x due to a small unit change in y)

Stack Exchange has numerous discussions on why we use Leibniz’s notation even though it
is arguably “wrong”.286 I recommend reading the top answer here:

Fun Fact

Here is what Alan Turing wrote about the Leibniz notation in his recently-discovered
wartime notebooks:

The Leibniz notation I find extremely difficult to understand in spite of
it having been the one I understood best once! It certainly implies that some

relation between x and y has been laid down e.g.

y = x2 + 3x

— Alan Turing (c. 1944).287

[Footnote 286] “Wrong” is in scare quotes here because, of course, notation can’t be “wrong” any more than any convention (such as driving on the left side of the road) can be “wrong”.
[Footnote 287] Discussion of what Turing might have meant by this remark: .

70. Some Techniques of Differentiation

70.1. The Inverse Function Theorem (IFT)

Informally, the Inverse Function Theorem (IFT) simply says that at points at which dx/dy ≠ 0, we have:288
\[
\frac{dy}{dx} = \frac{1}{\frac{dx}{dy}}.
\]

Here is one possible informal interpretation of the IFT:

(The change in y due to a small unit change in x) = 1 ÷ (The change in x due to a small unit change in y)

Example 906. Let x be the mass of Milo powder in a cup of water and y be the volume of water in the cup.
Suppose that adding another 1 g of Milo to the cup of water increases the volume of water by 2 cm³. Then we may write:
\[
\frac{dy}{dx} = 2 \text{ cm}^3 \text{ g}^{-1}.
\]
By the IFT (or common sense), we also have:
\[
\frac{dx}{dy} = \frac{1}{2} \text{ g cm}^{-3}.
\]

That is, if we had instead wanted to increase the volume of water by 1 cm³, we should instead have added 1/2 g of Milo.

Here’s a more typical use of the IFT:

Example 907. Consider the sine function.

Let x ∈ [−π/2, π/2]. Let y = sin x. Suppose we wish to find dx/dy in terms of x.
Method #1 (longer method using Fact 144). y = sin x ⟹ x = sin⁻¹ y. So
\[
\frac{dx}{dy} = \frac{d}{dy} \sin^{-1} y = \frac{1}{\sqrt{1 - y^2}} = \frac{1}{\sqrt{1 - \sin^2 x}} = \frac{1}{\cos x}.
\]
Method #2 (quicker method using the IFT).
\[
\frac{dy}{dx} = \cos x \implies \frac{dx}{dy} = \frac{1}{\cos x}.
\]
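
As a sanity check on both methods, we can estimate dy/dx and dx/dy numerically and confirm that they multiply to (approximately) 1. The following Python sketch uses a symmetric difference quotient and the sample point x = 0.4; both choices are our own assumptions.

```python
import math

def derivative_at(f, a, h=1e-6):
    """Symmetric difference quotient: an approximation, not the exact limit."""
    return (f(a + h) - f(a - h)) / (2 * h)

x = 0.4                                # any point strictly between -pi/2 and pi/2
y = math.sin(x)
dy_dx = derivative_at(math.sin, x)     # approximately cos x
dx_dy = derivative_at(math.asin, y)    # approximately 1 / sqrt(1 - y^2)

print(dy_dx * dx_dy)                   # approximately 1
print(dx_dy, 1 / math.cos(x))          # approximately equal
```

At the endpoints x = ±π/2 we have cos x = 0, so dx/dy is undefined there and the check would break down.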

[Footnote 288] For a formal statement of the IFT, see Theorem 34 (Appendices).

Exercise 295. Suppose x²y + sin x = 0. Find dy/dx. Hence write down dx/dy. (You may leave your answers expressed in terms of x and y.) (Answer on p. 1528.)


70.2. Implicit Differentiation

Example 908. Consider the equation \( x^2 + y^2 \overset{1}{=} 1 \). What is y′, dy/dx, or ẏ?
Method 1. First express y in terms of x:
\[
y \overset{2}{=} \pm \sqrt{1 - x^2}.
\]
(Note that strictly/pedantically speaking, here we actually have two functions. One is y₁ ∶ [−1, 1] → R defined by y₁(x) = √(1 − x²). The other is y₂ ∶ [−1, 1] → R defined by y₂(x) = −√(1 − x²).)
Now apply the d/dx operator to \(\overset{2}{=}\):
\[
\frac{dy}{dx} = \frac{d}{dx} \left( \pm \sqrt{1 - x^2} \right) = \pm \frac{-2x}{2 \sqrt{1 - x^2}} = \pm \frac{-x}{\sqrt{1 - x^2}} = \frac{\mp x}{\sqrt{1 - x^2}}.
\]
(Strictly/pedantically speaking, we have two derivatives. One is y₁′ ∶ (−1, 1) → R defined by y₁′(x) = −x/√(1 − x²). The other is y₂′ ∶ (−1, 1) → R defined by y₂′(x) = x/√(1 − x²). Note that the functions y₁ and y₂ are not differentiable because each fails to be differentiable at the points −1 and 1.)
Method 2 (Implicit Differentiation). Directly apply the d/dx operator to \(\overset{1}{=}\):
\[
\frac{d}{dx} (x^2 + y^2) = \frac{d}{dx} 1 \implies 2x + 2y \frac{dy}{dx} = 0 \iff \frac{dy}{dx} \overset{3}{=} -\frac{x}{y} \quad \text{(for } y \neq 0\text{)}.
\]
We can leave \(\overset{3}{=}\) as our final answer.

Alternatively, we can plug \(\overset{2}{=}\) into \(\overset{3}{=}\) to get the same answer as in Method 1:
\[
\frac{dy}{dx} = -\frac{x}{\pm \sqrt{1 - x^2}} = \frac{\mp x}{\sqrt{1 - x^2}}.
\]
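
The formula dy/dx = −x/y can be checked numerically on both semicircles. The Python sketch below uses the sample point x = 0.6 (so that y = ±0.8) and a symmetric difference quotient; the point and the helper names are our own assumptions.

```python
import math

def derivative_at(f, a, h=1e-6):
    """Symmetric difference quotient: an approximation, not the exact limit."""
    return (f(a + h) - f(a - h)) / (2 * h)

def y1(x):
    return math.sqrt(1 - x ** 2)    # upper semicircle

def y2(x):
    return -math.sqrt(1 - x ** 2)   # lower semicircle

x = 0.6
for y in (y1, y2):
    # Numerical gradient versus the implicit-differentiation formula -x / y.
    print(derivative_at(y, x), -x / y(x))
```

On the upper semicircle both numbers are about −0.75 and on the lower semicircle both are about 0.75, so the single formula −x/y covers both functions at once.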

In the above example, Method 2 (implicit differentiation) was not obviously superior
to Method 1. However, it is sometimes difficult (or impossible) to express y in terms of
x. In such cases, implicit differentiation is the clear winner:


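Example 909. Consider the equation \( x^2 \sqrt{y} + \frac{y}{\cos x} \overset{1}{=} 1 \). What is y′(0), dy/dx∣ₓ₌₀, dy/dx (0), or ẏ(0)?
In this example, it’s difficult to express y in terms of x. But this doesn’t matter, because we can use implicit differentiation: simply apply the d/dx operator to \(\overset{1}{=}\):
\[
\frac{d}{dx} \left( x^2 \sqrt{y} + \frac{y}{\cos x} \right) = \frac{d}{dx} 1 \implies 2x \sqrt{y} + x^2 \frac{1}{2 \sqrt{y}} \frac{dy}{dx} + \frac{\cos x \frac{dy}{dx} - y (-\sin x)}{\cos^2 x} = 0.
\]

Now plug in x = 0:
\[
2 \cdot 0 \sqrt{y} + 0^2 \frac{1}{2 \sqrt{y}} \left. \frac{dy}{dx} \right|_{x=0} + \frac{\cos 0 \left. \frac{dy}{dx} \right|_{x=0} - y (-\sin 0)}{\cos^2 0} = 0 \iff \left. \frac{dy}{dx} \right|_{x=0} = 0.
\]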

In this textbook, we will simply take for granted that implicit differentiation “works”,
without explaining why.289

[Footnote 289] But if you’re interested, here’s a quick and incomplete explanation: Suppose we have an equation involving x and y, but have no idea how to explicitly express y in terms of x. Then the Implicit Function Theorem says that even if we have no idea how, it is actually possible to express y as a function of x and moreover, we can speak of the derivative of y (with respect to the variable x). Unfortunately, the Implicit Function Theorem requires a little knowledge of multivariate calculus and partial derivatives and so we shall omit any discussion of it from this textbook altogether. But if you’re interested, see Wikipedia or Krantz & Parks (1993, The Implicit Function Theorem: History, Theory, and Applications).

Happily, the next Fact appears on List MF26, so no need to mug:

Fact 144.
(a) \( \frac{d}{dx} \sec x = \sec x \tan x \).
(b) \( \frac{d}{dx} \sin^{-1} x = \frac{1}{\sqrt{1 - x^2}} \).
(c) \( \frac{d}{dx} \cos^{-1} x = \frac{-1}{\sqrt{1 - x^2}} \).
(d) \( \frac{d}{dx} \tan^{-1} x = \frac{1}{1 + x^2} \).

Proof. You are asked to prove (a), (c), and (d) in Exercise 296. Here we prove only (b).
(b) Let y = sin⁻¹ x ∈ [−π/2, π/2]. Then \( x \overset{1}{=} \sin y \). Apply d/dx to \(\overset{1}{=}\) to get \( 1 \overset{2}{=} \cos y \frac{dy}{dx} \).

From the identity sin² y + cos² y = 1, we have \( \cos y \overset{3}{=} \pm \sqrt{1 - x^2} \). But since y ∈ [−π/2, π/2], we know that cos y ≥ 0. And so in \(\overset{3}{=}\), we may simply discard the negative value to get \( \cos y \overset{4}{=} \sqrt{1 - x^2} \).

Now plug \(\overset{4}{=}\) into \(\overset{2}{=}\) to get:
\[
1 = \sqrt{1 - x^2} \, \frac{dy}{dx} \quad \text{or} \quad \frac{dy}{dx} = \frac{1}{\sqrt{1 - x^2}}.
\]

Exercise 296. Prove Fact 144(a), (c), and (d). (Answer on p. 1527.)


70.3. Parametric Differentiation

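Parametric Differentiation Rule (informal)

Let x, y, and t be variables. Then:

$$\frac{dy}{dx} \overset{b}{=} \frac{dy}{dt} \div \frac{dx}{dt}.$$

“Proof”. Use (a) the Chain Rule and (b) the IFT:

$$\frac{dy}{dx} \overset{a}{=} \frac{dy}{dt}\,\frac{dt}{dx} \overset{b}{=} \frac{dy}{dt} \div \frac{dx}{dt}.$$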

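Example 910. Let $x = t^5 + t$ and $y = t^6 - t$. Find $\left.\dfrac{dy}{dx}\right|_{t=0}$.

$$\frac{dy}{dx} = \frac{dy}{dt} \div \frac{dx}{dt} = \frac{6t^5 - 1}{5t^4 + 1}.$$

So $\left.\dfrac{dy}{dx}\right|_{t=0} = \dfrac{-1}{1} = -1$. It would be much more difficult (perhaps even impossible) if instead we first tried to express y in terms of x, then compute $\dfrac{dy}{dx}$.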
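Here is a small Python sketch of the Parametric Differentiation Rule, assuming x = t⁵ + t and y = t⁶ − t as in the example above. It compares the rule’s answer with a crude numerical slope (change in y divided by change in x over a tiny step in t). The function names are made up for this illustration.

```python
def x_of(t):
    return t**5 + t

def y_of(t):
    return t**6 - t

def dydx_rule(t):
    """dy/dx = (dy/dt) / (dx/dt), with both derivatives computed by hand."""
    dy_dt = 6 * t**5 - 1
    dx_dt = 5 * t**4 + 1
    return dy_dt / dx_dt

def dydx_numeric(t, h=1e-4):
    """Slope of the chord between the points at t - h and t + h."""
    return (y_of(t + h) - y_of(t - h)) / (x_of(t + h) - x_of(t - h))

for t in (0.0, 1.0):
    print(t, dydx_rule(t), dydx_numeric(t))
```

Note that dx/dt = 5t⁴ + 1 is never zero here, so the division is always safe. For other parametric curves you would need to watch out for values of t where dx/dt = 0.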

Exercise 297. Let $x = \cos t + t^2$ and $y = e^t - t^3$. Find $\dfrac{dy}{dx}$. (Answer on p. 1528.)


71. The Second and Higher Derivatives
Informally, the second derivative is simply the derivative of the (first) derivative. Just
like the (first) derivative, the second derivative is itself a function.

Example 911. Define $f : \mathbb{R} \to \mathbb{R}$ by $f(x) = x^3$.

The (first) derivative of f is the function $f' : \mathbb{R} \to \mathbb{R}$ defined by $f'(x) = 3x^2$. The (first) derivative of f at −1 is the number $f'(-1) = 3(-1)^2 = 3$.

Figure to be
inserted here.

The second derivative of f is the function f ′′ ∶ R → R defined by f ′′ (x) = 6x. The second
derivative of f at −1 is the number f ′′ (−1) = 6 (−1) = −6.
Observe that f ′′ is itself:

the (first) derivative of f ′ ; and a function.

Formal definitions concerning the second derivative:


Definition 169. Let D be an interval, $f : D \to \mathbb{R}$ be a differentiable function with derivative $f' : D \to \mathbb{R}$, and $a \in D$. Consider the following limit:

$$\lim_{x \to a} \frac{f'(x) - f'(a)}{x - a}.$$

If this limit exists (i.e. is equal to a real number), then we say that f is twice differentiable at a, call this limit the second derivative of f at a, and denote it by $f''(a)$, $\dfrac{d^2 f}{dx^2}(a)$, $\left.\dfrac{d^2 f}{dx^2}\right|_{x=a}$, or $\ddot{f}(a)$.
If this limit doesn’t exist, then we say that f is not twice differentiable at a.
If f is twice differentiable at every point in a set S, then we say that f is twice differentiable on S.
Let T be the set of points on which f is twice differentiable. Then the second derivative of f is the real-valued function denoted $f''$, $\dfrac{d^2 f}{dx^2}$, or $\ddot{f}$, with domain T and mapping rule:

$$x \mapsto \lim_{a \to x} \frac{f'(x) - f'(a)}{x - a}.$$

If f is twice differentiable on its domain (or equivalently, if T = D), then we simply call f a twice-differentiable function.

For comparison, we now reproduce Definitions 167 and 168 (concerning the first derivative). As you can tell, they are, mutatis mutandis, nearly identical to the above Definition (which concerns the second derivative):

Definition 167. Let D be an interval, $f : D \to \mathbb{R}$, and $a \in D$. Consider the following limit:

$$\lim_{x \to a} \frac{f(x) - f(a)}{x - a}.$$

If this limit exists (i.e. is equal to a real number), then we say that f is differentiable at a, call this limit the derivative of f at a, and denote it by $f'(a)$, $\dfrac{df}{dx}(a)$, $\left.\dfrac{df}{dx}\right|_{x=a}$, or $\dot{f}(a)$.
If this limit doesn’t exist, then we say that f is not differentiable at a.


Definition 168. If f is differentiable at every point in a set S, then we say that f is differentiable on S.
Let T be the set of points on which f is differentiable. Then the derivative of f is the real-valued function denoted $f'$, $\dfrac{df}{dx}$, or $\dot{f}$, with domain T and mapping rule:

$$x \mapsto \lim_{a \to x} \frac{f(x) - f(a)}{x - a}.$$

If f is differentiable on its domain (or equivalently, if T = D), then we simply call f a differentiable function.

Again, we have three different ways to denote the second derivative:

The second derivative of f = $f''$ (Lagrange’s notation)
= $\dfrac{d^2 f}{dx^2}$ (Leibniz’s notation)
= $\ddot{f}$. (Newton’s notation)

Under Leibniz’s notation, the differentiation operator is denoted $\frac{d}{dx}$, while the repeated application of this operator is denoted $\frac{d^2}{dx^2}$. Thus, the second derivative of f is denoted:[291]

$\dfrac{d^2 f}{dx^2}$ and not $\dfrac{df^2}{dx^2}$.

[291] Given our notation for the second derivative, the expression $\frac{df^2}{dx^2}$ is confusing and should be avoided. However, if used, it would denote the derivative of the composite function $f^2 = f \circ f$ with respect to the variable $x^2$.
Example 912. Define $g : \mathbb{R} \to \mathbb{R}$ by $g(x) = x^3 + 2x$.
The (first) derivative of g may be denoted $g'$, $\dfrac{dg}{dx}$, or $\dot{g}$. It has R as both its domain and codomain, and is defined by:

$$g'(x) = \frac{dg}{dx}(x) = \dot{g}(x) = 3x^2 + 2.$$

Plugging in −1, we find that the (first) derivative of g at −1 is the number:

$g'(-1) = 5$ or $\dfrac{dg}{dx}(-1) = 5$ or $\left.\dfrac{dg}{dx}\right|_{x=-1} = 5$ or $\dot{g}(-1) = 5$.

We can also read any of the above four statements aloud as “the (first) derivative of g evaluated at −1”.

Figure to be
inserted here.

The second derivative of g may be denoted $g''$, $\dfrac{d^2 g}{dx^2}$, or $\ddot{g}$. It has R as both its domain and codomain, and is defined by:

$$g''(x) = \frac{d^2 g}{dx^2}(x) = \ddot{g}(x) = 6x.$$

Plugging in −1, we find that the second derivative of g at −1 is the number:

$g''(-1) = -6$ or $\dfrac{d^2 g}{dx^2}(-1) = -6$ or $\left.\dfrac{d^2 g}{dx^2}\right|_{x=-1} = -6$ or $\ddot{g}(-1) = -6$.

We can also read any of the above four statements aloud as “the second derivative of g evaluated at −1”.

A function that is twice differentiable must also be differentiable. However, the converse need not be true. That is, a function that is differentiable need not also be twice differentiable:

Example 913. Define $h : \mathbb{R}_0^+ \to \mathbb{R}$ by $h(x) = x^{3/2}$.
The (first) derivative of h may be denoted $h'$, $\dfrac{dh}{dx}$, or $\dot{h}$. It has domain $\mathbb{R}_0^+$, codomain R, and is defined by:

$$h'(x) = \frac{dh}{dx}(x) = \dot{h}(x) = \frac{3}{2}x^{1/2}.$$

For example, the (first) derivative of h at 4 is:

$h'(4) = 3$ or $\dfrac{dh}{dx}(4) = 3$ or $\left.\dfrac{dh}{dx}\right|_{x=4} = 3$ or $\dot{h}(4) = 3$.

We see that h is a differentiable function because it is differentiable at every point in its domain $\mathbb{R}_0^+$.

Figure to be
inserted here.

The second derivative of h may be denoted $h''$, $\dfrac{d^2 h}{dx^2}$, or $\ddot{h}$. It has domain $\mathbb{R}^+$, codomain R, and is defined by:

$$h''(x) = \frac{d^2 h}{dx^2}(x) = \ddot{h}(x) = \frac{3}{4}x^{-1/2}.$$

For example, the second derivative of h at 4 is:

$h''(4) = \dfrac{3}{8}$ or $\dfrac{d^2 h}{dx^2}(4) = \dfrac{3}{8}$ or $\left.\dfrac{d^2 h}{dx^2}\right|_{x=4} = \dfrac{3}{8}$ or $\ddot{h}(4) = \dfrac{3}{8}$.

Note that h is not a twice-differentiable function because it is not twice differentiable at every point in its domain $\mathbb{R}_0^+$. In particular, h is not twice differentiable at the point $0 \in \mathbb{R}_0^+$. Hence, h is an example of a function that is differentiable but not twice differentiable.
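To see numerically why the second derivative of x ↦ x^(3/2) fails to exist at 0, we can look at the difference quotient of the first derivative there. The sketch below assumes h′(x) = (3/2)√x; the names `h_prime` and `second_difference_quotient` are made up for this illustration.

```python
import math

def h_prime(x):
    """First derivative of h(x) = x**1.5 on the non-negative reals."""
    return 1.5 * math.sqrt(x)

def second_difference_quotient(x):
    """(h'(x) - h'(0)) / (x - 0); its limit as x -> 0+ would be h''(0)."""
    return (h_prime(x) - h_prime(0.0)) / (x - 0.0)

# Evaluate at x = 1e-2, 1e-4, ..., 1e-10.
quotients = [second_difference_quotient(10.0 ** -k) for k in range(2, 12, 2)]
print(quotients)
```

The quotients are roughly 15, 150, 1500, and so on: they grow without bound as x shrinks towards 0, so the limit is not a real number. (A handful of sample values is of course only suggestive, not a proof.)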


Exercise 298. Find each function’s first and second derivatives. State if each function is
differentiable and/or twice-differentiable. Evaluate each function’s derivative and second
derivative at 1. (Answer on p. 1526.)

(a) x
(b) x
(c) x
Exercise 299. Explain whether each of the following statements is true or false. (As usual, one way to show that a statement is false is to provide a counterexample.) (Answer on p. 1526.)

(a) “f is twice differentiable ⇐⇒ f is differentiable.”

(b) “f is twice differentiable ⇐⇒ f is differentiable and its derivative f′ is also differentiable.”

Exercise 300. Let D be an interval. Suppose f ∶ D → R is a differentiable function.

Then what can we say about the domains of f ′ and f ′′ ? (Answer on p. 1526.)


71.1. Higher Derivatives
Informally, the third derivative is the derivative of the second derivative, the fourth is that
of the third, etc. And in general, the nth derivative is the derivative of the (n − 1)th
derivative. Again, we stress that every derivative is a function.
As before, we have three ways to denote the third derivative:

The third derivative of f = $f''' = f^{(3)}$ (Lagrange’s notation)
= $\dfrac{d^3 f}{dx^3}$ (Leibniz’s notation)
= $\dddot{f} = \overset{3}{\dot{f}}$. (Newton’s notation)

As you can tell, with higher derivatives, Lagrange’s prime notation and Newton’s dot notation will start getting cumbersome. And so, in general, for the nth derivative (with n ≥ 4), instead of writing n primes or dots, we’ll write:

The nth derivative of f = $f^{(n)}$ (Lagrange’s notation)
= $\dfrac{d^n f}{dx^n}$ (Leibniz’s notation)
= $\overset{n}{\dot{f}}$. (Newton’s notation)


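Example 914. Define $f : \mathbb{R} \to \mathbb{R}$ by $x \mapsto x^5$. We can easily find all of f’s derivatives:

Derivative   Denoted                                                      Domain   Codomain   Mapping
First        $f'$, $\frac{df}{dx}$, or $\dot{f}$                          R        R          $x \mapsto 5x^4$
Second       $f''$, $\frac{d^2 f}{dx^2}$, or $\ddot{f}$                   R        R          $x \mapsto 20x^3$
Third        $f'''$, $\frac{d^3 f}{dx^3}$, or $\dddot{f}$                 R        R          $x \mapsto 60x^2$
Fourth       $f^{(4)}$, $\frac{d^4 f}{dx^4}$, or $\overset{4}{\dot{f}}$   R        R          $x \mapsto 120x$
Fifth        $f^{(5)}$, $\frac{d^5 f}{dx^5}$, or $\overset{5}{\dot{f}}$   R        R          $x \mapsto 120$
Sixth        $f^{(6)}$, $\frac{d^6 f}{dx^6}$, or $\overset{6}{\dot{f}}$   R        R          $x \mapsto 0$
Seventh      $f^{(7)}$, $\frac{d^7 f}{dx^7}$, or $\overset{7}{\dot{f}}$   R        R          $x \mapsto 0$

Clearly, for any n ≥ 6, the nth derivative is given by:

nth          $f^{(n)}$, $\frac{d^n f}{dx^n}$, or $\overset{n}{\dot{f}}$   R        R          $x \mapsto 0$

Confusion may arise when trying to distinguish between $f^n$, $f^{(n)}$, and $(f)^n$: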



Example 915. Define $f : \mathbb{R} \to \mathbb{R}$ by $f(x) = x^3$.
Now consider the three functions $f^4$, $f^{(4)}$, and $(f)^4$. These three functions are different from each other!

The function $f^4 : \mathbb{R} \to \mathbb{R}$ is the composite function $f \circ f \circ f \circ f$ and is defined by:

$$f^4(x) = \left\{\left[\left(x^3\right)^3\right]^3\right\}^3 = \left[\left(x^3\right)^3\right]^9 = \left(x^3\right)^{27} = x^{81}.$$

The function $f^{(4)} : \mathbb{R} \to \mathbb{R}$ is the fourth derivative of f and is defined by:

$$f^{(4)}(x) = \frac{d}{dx}\left\{\frac{d}{dx}\left[\frac{d}{dx}\left(\frac{d}{dx}x^3\right)\right]\right\} = \frac{d}{dx}\left\{\frac{d}{dx}\left[\frac{d}{dx}\left(3x^2\right)\right]\right\} = \frac{d}{dx}\left[\frac{d}{dx}(6x)\right] = \frac{d}{dx}6 = 0.$$

The function $(f)^4 : \mathbb{R} \to \mathbb{R}$ is the function f raised to the power of four and is defined by:

$$(f)^4(x) = [f(x)]^4 = \left(x^3\right)^4 = x^{12}.$$

If we evaluate each of these three functions at 2, we get wildly different values:

$$f^4(2) = 2^{81} \approx 2.418 \times 10^{24}, \qquad f^{(4)}(2) = 0, \qquad (f)^4(2) = 2^{12} = 4\,096.$$

(Note that since a trillion is $10^{12}$, the number $f^4(2)$ is approximately 2.418 trillion trillion.)

Remark 85. The notation $(f)^n$ to mean a function raised to the nth power is not commonly used. Indeed, it does not appear on your H2 Maths syllabus or exams. We will try to avoid using it, but as we’ll see later, it sometimes comes in handy.
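The difference between the three notations can also be seen by computing each one directly. The Python sketch below assumes f(x) = x³ and evaluates everything at 2 using exact integer arithmetic. The function names, and the idea of storing a monomial as a (coefficient, exponent) pair, are choices made purely for this illustration.

```python
def f(x):
    return x**3

def compose_four_times(x):
    """The composite f o f o f o f, i.e. 'f to the 4' as a composition."""
    return f(f(f(f(x))))

def fourth_power(x):
    """The function f raised to the power of four, i.e. [f(x)]**4."""
    return f(x) ** 4

def differentiate_monomial(coefficient, exponent):
    """d/dx of coefficient * x**exponent, returned as (coefficient, exponent)."""
    if exponent == 0:
        return (0, 0)
    return (coefficient * exponent, exponent - 1)

def fourth_derivative(x):
    """Differentiate x**3 four times, then evaluate at x."""
    term = (1, 3)  # represents 1 * x**3
    for _ in range(4):
        term = differentiate_monomial(*term)
    coefficient, exponent = term
    return coefficient * x**exponent

print(compose_four_times(2), fourth_derivative(2), fourth_power(2))
```

Because Python integers have arbitrary precision, the composite value is computed exactly rather than approximately.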

Formal definitions concerning the nth derivative:


Definition 170. Let n ≥ 3 be an integer, D be an interval, $f : D \to \mathbb{R}$ be an (n − 1)-times differentiable function with (n − 1)th derivative $f^{(n-1)} : D \to \mathbb{R}$, and $a \in D$. Consider the following limit:

$$\lim_{x \to a} \frac{f^{(n-1)}(x) - f^{(n-1)}(a)}{x - a}.$$

If this limit exists (i.e. is equal to a real number), then we say that f is n-times differentiable at a, call this limit the nth derivative of f at a, and denote it by $f^{(n)}(a)$, $\dfrac{d^n f}{dx^n}(a)$, $\left.\dfrac{d^n f}{dx^n}\right|_{x=a}$, or $\overset{n}{\dot{f}}(a)$.
If this limit doesn’t exist, then we say that f is not n-times differentiable at a.
If f is n-times differentiable at every point in a set S, then we say that f is n-times differentiable on S.
Let T be the set of points on which f is n-times differentiable. Then the nth derivative of f is the real-valued function denoted $f^{(n)}$, $\dfrac{d^n f}{dx^n}$, or $\overset{n}{\dot{f}}$, with domain T and mapping rule:

$$x \mapsto \lim_{a \to x} \frac{f^{(n-1)}(x) - f^{(n-1)}(a)}{x - a}.$$

If f is n-times differentiable on its domain (or equivalently, if T = D), then we simply call f an n-times-differentiable function.

Remark 86. The above Definition is an example of a recursive definition.

Exercise 301. In Lagrange’s notation, why do we denote the nth derivative of f with
parentheses? That is, why do we denote the nth derivative of f by f (n) rather than more
simply f n ? (Answer on p. 1526.)

Exercise 302. Let $g : \mathbb{R} \to \mathbb{R}$ be defined by $g(x) = x^4 - x^3 + x^2 - x + 1$. Write down all of

its derivatives. Evaluate all of these derivatives at 1. Write your answers in Lagrange’s,
Leibniz’s, and Newton’s notation. (Answer on p. 1526.)
Exercise 303. Define $h : \mathbb{R} \to \mathbb{R}$ by $h(x) = x^2 + 1$. Evaluate each of the functions $h'$, $h^2$, and $(h)^2$ at 1. (Answer on p. 1526.)


71.2. Smoothness (or Infinite Differentiability)
As you can imagine, we also have “infinitely-differentiable” functions or what are more
simply called smooth functions:

Definition 171. Suppose that for every positive integer n, the function f is n-times-
differentiable at a. Then we say that f is smooth (or infinitely differentiable) at a.
We say that a function is smooth on a set S if it is smooth at every point in S.
A smooth function is one that’s smooth on its domain.

Most functions you’ll encounter in the A-Levels are smooth. This includes, for example, all
polynomial functions:

Example 916. Consider again $f : \mathbb{R} \to \mathbb{R}$ defined by $x \mapsto x^5$. We showed that every one of f’s derivatives exists and has R as both its domain and codomain.
The first five derivatives are defined by:

$f'(x) = 5x^4$, $f''(x) = 20x^3$, $f'''(x) = 60x^2$, $f^{(4)}(x) = 120x$, $f^{(5)}(x) = 120$.

For any n ≥ 6, the nth derivative is defined by $f^{(n)}(x) = 0$.

Hence, for any positive integer n, the function f is n-times differentiable at every point a ∈ R. Thus, by Definition 171, f is smooth (or infinitely differentiable).

Example 917. Consider again $g : \mathbb{R} \to \mathbb{R}$ defined by $g(x) = x^4 - x^3 + x^2 - x + 1$.

Example 918. Consider $i : \mathbb{R} \to \mathbb{R}$ defined by $x \mapsto x^5 - x^4 + x^3 - x^2 + x - 1$. We have, for all x ∈ R,

$i'(x) = 5x^4 - 4x^3 + 3x^2 - 2x + 1$,
$i''(x) = 20x^3 - 12x^2 + 6x - 2$,
$i'''(x) = 60x^2 - 24x + 6$,
$i^{(4)}(x) = 120x - 24$,
$i^{(5)}(x) = 120$,
$i^{(6)}(x) = i^{(7)}(x) = i^{(8)}(x) = \dots = 0$.

The function i is smooth, with its sixth and higher-order derivatives all having domain R, codomain R, and mapping rule x ↦ 0.
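Repeatedly differentiating a polynomial is mechanical enough to hand to a computer. In the sketch below, a polynomial is stored as a list of coefficients, with position k holding the coefficient of x^k; this representation and the function names are assumptions made for the illustration. It is applied to the polynomial x⁵ − x⁴ + x³ − x² + x − 1.

```python
def differentiate(coeffs):
    """coeffs[k] is the coefficient of x**k; return the derivative's coefficients."""
    derivative = [k * c for k, c in enumerate(coeffs)][1:]
    return derivative or [0]  # the derivative of a constant is the zero polynomial

def successive_derivatives(coeffs, count):
    """Return the 1st, 2nd, ..., count-th derivatives as coefficient lists."""
    result = []
    for _ in range(count):
        coeffs = differentiate(coeffs)
        result.append(coeffs)
    return result

# x^5 - x^4 + x^3 - x^2 + x - 1, listed from the constant term upwards.
I_COEFFS = [-1, 1, -1, 1, -1, 1]

derivs = successive_derivatives(I_COEFFS, 7)
for order, d in enumerate(derivs, start=1):
    print(order, d)
```

Each differentiation shortens the list by one entry, which is one way to see why a degree-n polynomial has zero derivatives from order n + 1 onwards.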

As the above examples suggest, for any non-negative integer n, any nth-degree polynomial
function is smooth. Moreover, for any k ≥ n + 1, its kth derivative has the mapping x ↦ 0.
The exponential function is also smooth:

Example 919. Let j be the exponential function. Then for all x ∈ R, we have:

j ′ (x) = j ′′ (x) = j ′′′ (x) = j (4) (x) = ⋅ ⋅ ⋅ = ex .

The function j is smooth, with its every derivative being the exponential function.


As we saw earlier, with one small exception, all elementary functions are differentiable. It turns out that here we have a similar “Theorem”:

Informal Theorem on Smoothness of Elementary Functions

All elementary functions are smooth, except possibly those involving the power function $x \mapsto x^c$.

Exercise 304. Find all the derivatives of each function. Which are smooth? (Answer on p. 715.)
(a) x
(b) x



72. When a Function Is Increasing or Decreasing, Revisited
We reproduce from Ch. 12 the following definitions:

Definition 53. Let f be a nice function. Given a set of points S ⊆ Domain f, we say that f is:

(a) Increasing on S if for any $x_1, x_2 \in S$ with $x_2 > x_1$, we have $f(x_2) \ge f(x_1)$;

(b) Strictly increasing on S if for any $x_1, x_2 \in S$ with $x_2 > x_1$, we have $f(x_2) > f(x_1)$;

(c) Decreasing on S if for any $x_1, x_2 \in S$ with $x_2 > x_1$, we have $f(x_2) \le f(x_1)$;

(d) Strictly decreasing on S if for any $x_1, x_2 \in S$ with $x_2 > x_1$, we have $f(x_2) < f(x_1)$.

If f is (strictly) increasing/decreasing on its domain, then we simply say that f is a (strictly) increasing/decreasing function.

The problem of finding a derivative is the problem of finding a curve’s gradient. And so, not surprisingly, the derivative is intimately related to whether a function is increasing or decreasing; we have what is sometimes known as the Increasing/Decreasing Test:

Fact 145. (Increasing/Decreasing Test [IDT]) Let a < b. Suppose f is differentiable on (a, b). Then:

(a) $f'(x) \ge 0$ for every $x \in (a, b)$ ⇐⇒ f is increasing on (a, b).
(b) $f'(x) > 0$ for every $x \in (a, b)$ Ô⇒ f is strictly increasing on (a, b).
(c) $f'(x) \le 0$ for every $x \in (a, b)$ ⇐⇒ f is decreasing on (a, b).
(d) $f'(x) < 0$ for every $x \in (a, b)$ Ô⇒ f is strictly decreasing on (a, b).

Proof. See p. 1341 in the Appendices.


Example 920. Let $f : \mathbb{R} \to \mathbb{R}$ be the function defined by $f(x) = x^2$.
The derivative of f is the function $f' : \mathbb{R} \to \mathbb{R}$ defined by $f'(x) = 2x$.
Now observe that:
(a) $f'(x) \ge 0$ for all $x \ge 0$. And so by the IDT, f is increasing on $\mathbb{R}_0^+$.
(b) $f'(x) > 0$ for all $x > 0$. And so by the IDT, f is strictly increasing on $\mathbb{R}^+$.
(c) $f'(x) \le 0$ for all $x \le 0$. And so by the IDT, f is decreasing on $\mathbb{R}_0^-$.
(d) $f'(x) < 0$ for all $x < 0$. And so by the IDT, f is strictly decreasing on $\mathbb{R}^-$.

Figure to be
inserted here.

Note also that at x = 0, we have f ′ (x) = f ′ (0) = 2 ⋅ 0 = 0, so that f is both increasing and
decreasing at 0. (However, f is neither strictly increasing nor strictly decreasing at 0.)

Take care to note that in (b) and (d) of the IDT, we cannot replace the Ô⇒ ’s with
⇐⇒ ’s.

Example 921. Define $f : \mathbb{R} \to \mathbb{R}$ by $f(x) = x^3$.

The derivative of f is the function $f' : \mathbb{R} \to \mathbb{R}$ defined by $f'(x) = 3x^2$.
Observe that f is strictly increasing on R because if b > a, then $f(b) = b^3 > a^3 = f(a)$.
However, it is not true that f′ > 0 on R. In particular, $f'(0) = 3 \cdot 0^2 = 0$.
This shows that the converse of (b) of the IDT is false.
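The same point can be illustrated numerically. The sketch below samples the cubing function on a grid of points from −3 to 3 (an arbitrary choice) and checks two things: that the function values strictly increase along the grid, and whether the derivative is positive at every grid point. Sampling can only illustrate the claim; it proves nothing about points off the grid.

```python
def f(x):
    return x**3

def f_prime(x):
    return 3 * x**2

# Grid points -3.0, -2.9, ..., 2.9, 3.0 (includes 0.0 exactly).
xs = [k / 10 for k in range(-30, 31)]

strictly_increasing = all(f(b) > f(a) for a, b in zip(xs, xs[1:]))
derivative_positive_everywhere = all(f_prime(x) > 0 for x in xs)

print(strictly_increasing, derivative_positive_everywhere, f_prime(0.0))
```

The first check passes while the second fails (because of the grid point 0), which mirrors why the implication in (b) of the IDT cannot be reversed.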

Figure to be
inserted here.


Exercise 305. Define $g : \mathbb{R} \to \mathbb{R}$ by $g(x) = -x^3$. (Answer on p. 718.)
(a) Find the derivative of g.
(b) Identify the subsets of R on which g is (strictly) increasing and/or decreasing.
(c) Of which false claim can g serve as a counterexample?
Exercise 306. For each function, find its derivative. Identify the subsets of R on which
the derivative takes on different signs. Identify also the subsets of R on which the function
is (strictly) increasing and/or decreasing. Compare your findings to Fact 145. (Answer
on p. 718.)

(a) x
(b) x

A305(a) The derivative of g is the function $g' : \mathbb{R} \to \mathbb{R}$ defined by $g'(x) = -3x^2$.

(b) On all of R, g is strictly decreasing because if b > a, then $g(b) = -b^3 < -a^3 = g(a)$.
(c) This shows that the converse of (d) of the IDT is false.


72.1. Extrema and Stationary and Turning Points
We already discussed extrema (i.e. maximum and minimum points) and stationary
and turning points in some detail in Ch. 6.14. Now that we’ve learnt a little about
differentiation, we’ll revisit and study them in greater depth.
We reproduce from Ch. 6.14 the following informal definitions of the eight types of extrema:

(a) A global maximum is at least as high as any other point.
(b) The strict global maximum is higher than any other point.
(c) A local maximum is at least as high as any “nearby” point.
(d) A strict local maximum is higher than any “nearby” point.
(e) A global minimum is at least as low as any other point.
(f) The strict global minimum is lower than any other point.
(g) A local minimum is at least as low as any “nearby” point.
(h) A strict local minimum is lower than any “nearby” point.

We also reproduce from Ch. 18.8 the following definitions of stationary and turning points:

Definition 62. A point x is a stationary point of a function f if f ′ (x) = 0.

Definition 63. A point x is a turning point of a function f if it is both a stationary

point and a strict extremum (i.e. a strict maximum or minimum point) of f .

Example 922. XXX

When searching for stationary points, we will often write:

f ′ (a) = 0.

Because the above equation appears so often, it is sometimes given the special name of
the First Order Condition (FOC). (It could equally well be called the Stationary
Point Condition, but for some reason that name hasn’t caught on.)

Example 923. XXX

Example 924. XXX

Example 925. XXX

For the formal definitions, see Definition 224 (Appendices).
Example 926. Consider again the function $h : \mathbb{R} \to \mathbb{R}$ defined by $x \mapsto 6x^5 - 15x^4 - 10x^3 + 30x^2$. (Graph reproduced below for convenience.)
x = ±1 are maximum points. However, they are not global maximum points. Indeed, h has no global maximum point because $\lim_{x \to \infty} h(x) = \infty$ (“as x increases without bound, h(x) also increases without bound”). In other words, there is no x such that h(x) ≥ h(a) for all a ∈ R.
Similarly, x = 0, 2 are minimum points. However, they are not global minimum points. Indeed, h has no global minimum point because $\lim_{x \to -\infty} h(x) = -\infty$ (“as x decreases without bound, h(x) also decreases without bound”). In other words, there is no x such that h(x) ≤ h(a) for all a ∈ R.

[Figure: graph of h, with the maximum points x = ±1 and the minimum points x = 0, 2 labelled.]

We next restrict the domain of h in two ways to create two new functions i and j:


Example 927. Graphed below (left) is the function $i : [-1.5, 2.5] \to \mathbb{R}$ defined by $x \mapsto 6x^5 - 15x^4 - 10x^3 + 30x^2$.
i has three maximum points in total, namely ±1, 2.5. However, only 2.5 is a global maximum point of i because only i(2.5) ≥ i(x) for all x ∈ [−1.5, 2.5]. Of course, it is also a strict global maximum point because i(2.5) > i(x) for all other x ∈ [−1.5, 2.5].
i has three minimum points in total, namely −1.5, 0, 2. However, only −1.5 is a global minimum point of i because only i(−1.5) ≤ i(x) for all x ∈ [−1.5, 2.5]. Of course, it is also a strict global minimum point because i(−1.5) < i(x) for all other x ∈ [−1.5, 2.5].

[Figure: graphs of i (left) and j (right). For i: x = ±1 are maximum points; x = 2.5 is a maximum and global maximum point; x = 0, 2 are minimum points; x = −1.5 is a minimum and global minimum point. For j: x = −1 is a maximum and global maximum point; x = 1, 2.2 are maximum points; x = −1.2, 0 are minimum points; x = 2 is a minimum and global minimum point.]

Also graphed above (right) is the function $j : [-1.2, 2.2] \to \mathbb{R}$ defined by $x \mapsto 6x^5 - 15x^4 - 10x^3 + 30x^2$.
Again, there are three maximum points in total, namely ±1, 2.2. However, only −1 is a global maximum point of j because only j(−1) ≥ j(x) for all x ∈ [−1.2, 2.2]. Of course, it is also a strict global maximum point because j(−1) > j(x) for all other x ∈ [−1.2, 2.2].
And again, there are three minimum points in total, namely −1.2, 0, 2. However, only 2 is a global minimum point of j because only j(2) ≤ j(x) for all x ∈ [−1.2, 2.2]. Of course, it is also a strict global minimum point because j(2) < j(x) for all other x ∈ [−1.2, 2.2].
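The claims about global extrema in the above example can be checked by simply evaluating the polynomial at a short list of candidate points. The sketch below assumes that the only candidates on a closed interval are the two endpoints and any stationary points inside it. The stationary points −1, 0, 1, 2 come from my own factorisation of the derivative as 30x(x + 1)(x − 1)(x − 2), which is not stated in the text; the function names are also made up for this illustration.

```python
def h(x):
    return 6 * x**5 - 15 * x**4 - 10 * x**3 + 30 * x**2

# Roots of h'(x) = 30x(x + 1)(x - 1)(x - 2), worked out by hand (an assumption).
STATIONARY = [-1, 0, 1, 2]

def global_extrema(left, right):
    """Return (global max point, global min point) of h on [left, right]."""
    candidates = [left, right] + [s for s in STATIONARY if left < s < right]
    return max(candidates, key=h), min(candidates, key=h)

print(global_extrema(-1.5, 2.5))  # the domain of i
print(global_extrema(-1.2, 2.2))  # the domain of j
```

Notice how shrinking the domain slightly moves both global extrema from the endpoints to interior stationary points.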

Exercise 307. (Answer on p. 1530.) For each of the following functions, write down,
if any of these exist, the (i) maximum points, (ii) minimum points, (iii) strict maximum
points, (iv) strict minimum points, (v) global maximum points, (vi) global minimum
points, (vii) strict global maximum points, (viii) strict global minimum points; and also
all the corresponding values of the function at these points.
(a) f ∶ R → R defined by x ↦ 100.
(b) $g : \mathbb{R} \to \mathbb{R}$ defined by $x \mapsto x^2$.
(c) $h : [1, 2] \to \mathbb{R}$ defined by $x \mapsto x^2$.


Example 928. Graphed below is the function $f : [-1.5, 0.5] \to \mathbb{R}$ defined by $x \mapsto x^5 + 2x^4 + x^3$. Five points are labelled. The table below classifies each point.
D is a stationary point but not a turning point. (As we shall learn in Chapter 73, D is an example of an inflexion point.)
A is a minimum point and E is a maximum point. But neither is a turning point.

                     A   B   C   D   E
Max                      ✓           ✓
Min                  ✓       ✓
Strict max               ✓           ✓
Strict min           ✓       ✓
Global max                           ✓
Global min           ✓
Strict global max                    ✓
Strict global min    ✓
Stationary               ✓   ✓   ✓
Turning                  ✓   ✓

Exercise 308. Is each of the following statements true or false? To show that a statement is false, simply give a counterexample from the above example. If it is true, explain why. (Answer on p. 1531.)
(a) Every maximum point or minimum point is a stationary point.
(b) Every maximum point or minimum point is a turning point.
(c) Every stationary point is a maximum point or minimum point.
(d) Every turning point is a maximum point or minimum point.
(e) Every turning point is a stationary point.
(f) Every stationary point is a turning point.

72.2. Interior (and Boundary) Points
Let S be a set and x ∈ S. Informally, x is a boundary point (of S) if it is at the “edge” of S. Otherwise, it is an interior point.[293]

Example 929. The set S = [0, 1] has two boundary points, namely 0 and 1.
Every other point in S is an interior point. So for example, the points 0.2, 0.5, and 0.785
are interior points of the set S.

Given a set S, its boundary $B_S$ is the set of its boundary points. And its interior $I_S$ is the set of its interior points.

Example 930. Continuing with the set S = [0, 1], its boundary is $B_S = \{0, 1\}$ and its interior is $I_S = (0, 1)$.

Figure to be
inserted here.

Of course, the union of any set’s boundary and interior is equal to the set. So here, we have:

$S = B_S \cup I_S$.

Example 931. Consider the set T = (0, 1).

It has no boundary points and so its boundary is empty: $B_T = \emptyset$.

Figure to be
inserted here.

Since the boundary is empty, every point in T is an interior point of T. Thus, $I_T = T = (0, 1)$. (That is, T is its own interior.)

We next look at an example of a set in two-dimensional space.

[293] For the formal definition, see Definition 263 (Appendices).
Example 932. Consider the set $U = \{(x, y) : x^2 + y^2 \le 1\}$.
Its boundary is the set $B_U = \{(x, y) : x^2 + y^2 = 1\}$.

Figure to be
inserted here.

Its interior is the set $I_U = \{(x, y) : x^2 + y^2 < 1\}$.

Example 933. Consider the set $V = \left[0, \frac{1}{2}\right) \cup \left(\frac{1}{2}, 1\right] = [0, 1] \setminus \left\{\frac{1}{2}\right\}$.
Its boundary points are 0 and 1. Thus, its boundary is the set $B_V = \{0, 1\}$.

Figure to be
inserted here.

Every other point is an interior point. Thus, V’s interior is the set $I_V = \left(0, \frac{1}{2}\right) \cup \left(\frac{1}{2}, 1\right) = (0, 1) \setminus \left\{\frac{1}{2}\right\}$.

Remark 87. Interior and boundary points are not on your H2 Maths syllabus. How-
ever, as I hope the above examples have shown, they are very simple concepts. They
are thus well worth knowing because they will give you a better and more correct under-
standing of the material that follows, in particular the Interior Extremum Theorem
and the various First and Second Derivative Tests.


72.3. The Interior Extremum Theorem (IET)
In secondary school, to find extrema, you may have used the following Flawed Procedure:

The Flawed Secondary School Procedure for Finding Extrema (FSSPFE)

1. Compute the derivative.

2. Find the stationary points.
3. Conclude: the stationary points found in Step 2 are also the function’s extrema.

The Flawed Procedure will often work. However (and as we’ll illustrate below with several
examples), it sometimes fails.
To better understand how, why, and when the Flawed Procedure works, we now formally
introduce the result that justifies it. This result is called the Interior Extremum The-
orem (IET) and says that every interior extremum at which the derivative exists
is a stationary point.

Theorem 22. (Interior Extremum Theorem.) Let a < b, $f : (a, b) \to \mathbb{R}$ be differentiable, and c ∈ (a, b). If c is an extremum (of f), then $f'(c) = 0$.

Proof. See p. 1342 in the Appendices.

The sentence before the formal statement of the IET contains two subtle but important technicalities that are overlooked by the FSSPFE and which we will discuss below: the extremum must be an interior point, and the derivative must exist at that point.
Here’s a quick and simple example to illustrate the IET:


Example 934. Below is graphed the function $f : \mathbb{R} \to \mathbb{R}$ defined by $f(x) = -(x - 1)^2$.

The point 1 is an extremum (more specifically, a local maximum point) of f.

Moreover, it is in the interior of the domain of f. And so, by the IET, the derivative of f at 1 should equal 0. We can easily verify that this is correct:
The derivative of f is $f' : \mathbb{R} \to \mathbb{R}$ defined by $f'(x) = -2(x - 1)$. And so:

$f'(1) = -2(1 - 1) = 0$. ✓

In secondary school, you will have gone through the intuition for why the IET works. We
now briskly go through it again:
In order for 1 to be a maximum point of f , it must be that “just” to its left, f is increasing;
and “just” to its right, f is decreasing. In other words, “just” to the left of 1, f ′ (x) ≥ 0.
And “just” to the right of 1, f ′ (x) ≤ 0. Moreover, at the maximum point, f must be
both increasing and decreasing. Thus, f ′ (1) = 0 — the gradient of f at the maximum
point must be 0.

Figure to be
inserted here.

Also graphed above is the function $g : \mathbb{R} \to \mathbb{R}$ defined by $g(x) = (x - 1)^2$.


Remark 88. The IET is also called Fermat’s Theorem. But a bit like Euler, Fermat was a stud whose name is attached to many results and theorems (including most famously Fermat’s Last Theorem). So, to avoid confusion, we’ll call it the IET instead of Fermat’s Theorem.

Exercise 309. Refer to the above Example. Explain the intuition for why $g'(1) = 0$. (Answer on p. 1531.)
Exercise 310. True or false: “Let $f : D \to \mathbb{R}$ be a differentiable function. If c is a maximum or minimum point AND in the interior of D, then c is a turning point.” (Answer on p. 1531.)

Here’s an example to illustrate why the FSSPFE doesn’t always work:


Example 935. Below is graphed the function $f : [-1.5, 0.5] \to \mathbb{R}$ defined by $f(x) = x^5 + 2x^4 + x^3$. Five points A–E are labelled. The following table states what types of extrema each point is:

                         A   B   C   D   E
Global maximum                           ✓
Strict global maximum                    ✓
Local maximum                ✓           ✓
Strict local maximum         ✓           ✓
Global minimum           ✓
Strict global minimum    ✓
Local minimum            ✓       ✓
Strict local minimum     ✓       ✓
Turning point                ✓   ✓

Note the point D is not a local maximum because there are points “nearby” (in particular, to the right) that are higher than D.
Similarly, it is not a local minimum because there are points “nearby” (in particular, to the left) that are lower than D.
Altogether then, f has four extrema, namely A, B, C, and E.
Let’s see if the FSSPFE correctly identifies these four extrema:
According to the FSSPFE, here’s what we’d do.
First, find the derivative’s mapping rule:

$$f'(x) = 5x^4 + 8x^3 + 3x^2 = x^2\left(5x^2 + 8x + 3\right) = x^2(5x + 3)(x + 1).$$

Next, find any stationary points, in what we call the First Order Condition (FOC):

$$f'(x) = 0 \iff x = -\frac{3}{5}, -1, 0.$$

Now conclude that f’s extrema (i.e. maximum or minimum points) are at $-\frac{3}{5}$, −1, and 0.

So, the FSSPFE correctly identifies the points $B = (-1, f(-1))$ and $C = \left(-\frac{3}{5}, f\left(-\frac{3}{5}\right)\right)$ as extrema. However, it makes two mistakes. First, it wrongly declares the stationary point 0 (the point D) to be an extremum. Second, it misses the extrema A and E, which lie at the boundary points −1.5 and 0.5 of the domain and are not stationary points.

Figure to be
inserted here.


We now give a Correct Procedure for Finding Extrema (CPFE):

A Correct Procedure for Finding Extrema (CPFE)

Given a nice function f ,

1. Find all boundary points (i.e. x that are on the “edge” of Domainf ).
2. Find all points at which the derivative does not exist (i.e. x where f ′ (x) does
not exist).
3. Find all stationary points (i.e. x where f ′ (x) = 0).
4. Investigate whether each of the above points is an extremum.

Below are the three ways by which the CPFE rectifies the FSSPFE. The first two concern
the “two subtle but important technicalities” we mentioned earlier:
1. A boundary point may be an extremum but not a stationary point. Hence, it may be
overlooked by the FSSPFE.
2. Similarly, a point at which the derivative does not exist may be an extremum but will, by
definition, not be a stationary point. Hence, it may also be overlooked by the FSSPFE.
3. The FSSPFE suggests or incorrectly assumes that every stationary point is an extremum.
But this is false. Loosely, the IET says that every extremum is a stationary point, but
not the converse. And so, Step 4 of the CPFE demands that you investigate whether
each stationary point you found is actually an extremum.
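The four steps of the CPFE can be sketched in Python. The example below assumes the function x⁵ + 2x⁴ + x³ on the domain [−1.5, 0.5], with stationary points −1, −3/5, and 0 found by hand. Step 4 is done very crudely here, by comparing the function’s value at each candidate with its values a tiny distance `eps` away; both that shortcut and the names used are assumptions made for this illustration, not a rigorous test.

```python
def f(x):
    return x**5 + 2 * x**4 + x**3

LEFT, RIGHT = -1.5, 0.5
BOUNDARY = [LEFT, RIGHT]          # Step 1: boundary points of the domain
NON_DIFFERENTIABLE = []           # Step 2: a polynomial has no such points
STATIONARY = [-1.0, -0.6, 0.0]    # Step 3: roots of x^2 (5x + 3)(x + 1)

def classify(c, eps=1e-3):
    """Step 4 (crude): compare f(c) with f at the neighbours of c inside the domain."""
    neighbours = [x for x in (c - eps, c + eps) if LEFT <= x <= RIGHT]
    if all(f(x) < f(c) for x in neighbours):
        return "local maximum"
    if all(f(x) > f(c) for x in neighbours):
        return "local minimum"
    return "not an extremum"

results = {c: classify(c) for c in BOUNDARY + NON_DIFFERENTIABLE + STATIONARY}
for point, verdict in sorted(results.items()):
    print(point, verdict)
```

The boundary points show up as extrema even though they are not stationary, and the stationary point 0 is rejected, which is exactly what the first and third corrections above are about.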

Example 936. XXX

Example 937. XXX

Exercise 311. Find all extrema of each function. (Answer on p. 1532.)

(a) f ∶ R → R defined by f (x) = x.
(b) g ∶ [0, 1] → R defined by g (x) = x.
(c) $h : \mathbb{R} \to \mathbb{R}$ defined by $h(x) = x^4 - 2x^2$.


72.4. The First Derivative Test for Extrema (FDTE)
The IET provides us with a Correct Procedure for finding extrema. But of course, we are
not content with merely knowing what a function’s extrema are. Instead, we want also to
know if each extremum is a maximum or a minimum point.
One method for doing so is the First Derivative Test for Extrema (FDTE), whose
intuition we’ve already discussed in the previous subchapter:

The First Derivative Test for Extrema (FDTE) (informal)

Let a < b, f ∶ (a, b) → R be differentiable, and c ∈ (a, b).

(a) If f ′ is non-negative on c’s “immediate left” and non-positive on c’s “immediate
right”, then c is a local maximum (of f ).
(b) If f ′ is positive on c’s “immediate left” and negative on c’s “immediate right”, then
c is a strict local maximum.
(c) If f ′ is non-positive on c’s “immediate left” and non-negative on c’s “immediate
right”, then c is a local minimum.
(d) If f ′ is negative on c’s “immediate left” and positive on c’s “immediate right”, then
c is a strict local minimum.

Remark 89. The above statement of the FDTE is informal only in that the phrases
“immediate left” and “immediate right” haven’t been precisely defined. The FDTE is
formally stated (and proven) as Proposition 20 in the Appendices.
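Parts (b) and (d) of the FDTE can be mimicked numerically by checking the sign of the derivative a tiny distance `eps` to the left and right of a point. This is only a rough stand-in for “immediate left” and “immediate right” (a single sample on each side proves nothing), and the function names are made up. The derivative used is that of x⁵ + 2x⁴ + x³ from Example 935.

```python
def f_prime(x):
    """Derivative of x^5 + 2x^4 + x^3, in factorised form."""
    return x**2 * (5 * x + 3) * (x + 1)

def first_derivative_test(fp, c, eps=1e-3):
    """Apply (b) and (d) of the FDTE using one sample point on each side of c."""
    left, right = fp(c - eps), fp(c + eps)
    if left > 0 and right < 0:
        return "strict local maximum"
    if left < 0 and right > 0:
        return "strict local minimum"
    return "no conclusion from (b) or (d)"

for c in (-1.0, -0.6, 0.0):
    print(c, first_derivative_test(f_prime, c))
```

At 0 the derivative is positive on both sides, so neither (b) nor (d) applies.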

Example 938. XXX

Example 939. XXX

Exercise 312. XXX (Answer on p. 729.)


Remark 90. One is tempted to assume that the converses of each of (a)–(d) in the First Derivative Test are true. Unfortunately, they are not!
For example, one is tempted to assume that if c is a local maximum of f, then f′ is non-negative on c’s “immediate left” and non-positive on c’s “immediate right”. This however is false. For a counterexample, see Example 1233 (Appendices).

729, Contents

72.5. The Second Derivative Test for Extrema (SDTE)
If f is twice differentiable, then we may prefer to use the Second Derivative Test for
Extrema (SDTE) instead of the FDTE.

Proposition 7. (Second Derivative Test for Extrema) Let f be a function that is

twice differentiable at c. Suppose f ′ (c) = 0.
(a) If f ′′ (c) < 0, then c is a strict local maximum (of f ).
(b) If f ′′ (c) > 0, then c is a strict local minimum.
(c) If f ′′ (c) = 0, then c could be a maximum point, minimum point, inflexion point, or
something else altogether (so that informally, we say that the SDTE is inconclusive).

Proof. For the proofs of (a) and (b), see p. 1347 (Appendices).
Below we will give a partial proof of (c).

Example 940. XXX

Example 941. XXX

By providing two examples, we now prove that, as asserted by (c) of the SDTE, if f ′′ (a) = 0,
then a could be a maximum or a minimum point:

730, Contents

Example 942. Graphed below are the two functions f, g ∶ R → R defined by:

f (x) = x4 and g (x) = −x4 .

We can easily verify that 0 is a stationary point of each of f and g:

$$f'(0) = 4x^3 \big|_{x=0} = 4 \cdot 0^3 = 0,$$
$$g'(0) = -4x^3 \big|_{x=0} = -4 \cdot 0^3 = 0.$$

Figure to be
inserted here.

We can also easily verify that the second derivative of each of f and g at 0 is zero:

$$f''(0) = 12x^2 \big|_{x=0} = 12 \cdot 0^2 = 0,$$
$$g''(0) = -12x^2 \big|_{x=0} = -12 \cdot 0^2 = 0.$$

However, as is evident from the graph, 0 is a minimum point of f and a maximum point
of g.

We are not done proving (c) of the SDTE because it remains to be proven that if f ′′ (a) = 0,
then a could be an inflexion point or “something else altogether”. In Ch. 73.4 (after
we’ve introduced the concept of inflexion points), we will furnish two such examples.

731, Contents

Now that we’ve learnt about the SDTE, we can reproduce and add a little more to our
Correct Procedure for finding extrema (Ch. 72.3):

A Correct Procedure for Finding Extrema (CPFE)

Given a nice function f ,

1. Find all boundary points (i.e. x that are on the “edge” of Domainf ).
2. Find all points at which the derivative does not exist (i.e. x where f ′ (x) does
not exist).
3. Find all stationary points (i.e. x where f ′ (x) = 0).
4. Investigate whether each of the above points is an extremum.
In particular, for each stationary point a at which f ′′ (a) exists:
• If f ′′ (a) < 0, then by the SDTE, a is a strict maximum point (and hence also a
maximum turning point).
• If f ′′ (a) > 0, then by the SDTE, a is a strict minimum point (and hence also a
minimum turning point).
• If f ′′ (a) = 0, then the SDTE is inconclusive and we must determine the nature of a
by some other method (e.g. sketch the graph).

Example 943. XXX

Example 944. XXX

Note that similar to the FDTE, the converses of (a) and (b) in the SDTE are false. That
is, given a strict maximum (or minimum) a of a twice-differentiable function f , it need not
be that f ′′ (a) < 0 (or f ′′ (a) > 0). Instead, it could be that f ′′ (a) = 0. Example:

Example 945. Define f ∶ R → R by f (x) = x4 .

Figure to be
inserted here.

Then 0 is a strict minimum of f . However, it is false that f ′′ (0) > 0. Instead, we have:

f ′ (x) = 4x3 , f ′′ (x) = 12x2 , and hence f ′′ (0) = 12 ⋅ 02 = 0.

732, Contents

Example 928 (revisited). Consider f ∶ [−3/2, 1/2] → R defined by $x \mapsto x^5 + 2x^4 + x^3$.
1. Identify all the stationary points. $f'(x) = 5x^4 + 8x^3 + 3x^2 = x^2(5x^2 + 8x + 3) = 0 \iff x = 0$ or $x = -1, -0.6$ (quadratic formula).
(a) $f''(x) = 20x^3 + 24x^2 + 6x = 2x(10x^2 + 12x + 3)$.
(b) $f''(-0.6) > 0 \implies -0.6$ is a minimum point. $f''(-1) < 0 \implies -1$ is a maximum
point. But $f''(0) = 0$, so the SDTE tells us nothing. From a graph sketch, we see
that 0 is an inflexion point.
2. The only two non-interior points are −1.5 and 0.5. Again by sketching the graph, we
see that −1.5 is a minimum point and 0.5 is a maximum point.
Altogether, we conclude that there are two maximum points (−1 and 0.5) and two
minimum points (−0.6 and −1.5).
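
The second-derivative checks in the example above can be reproduced with a few lines of code. This is a minimal sketch rather than a replacement for the graph sketch: the helper names `f_second` and `sdte` are hypothetical, and the stationary points are simply copied from the example instead of being solved for.

```python
def f(x):
    return x**5 + 2 * x**4 + x**3


def f_second(x):
    # f''(x) = 20x^3 + 24x^2 + 6x, as computed in the example.
    return 20 * x**3 + 24 * x**2 + 6 * x


def sdte(second_derivative_at_point):
    """Apply the SDTE at a stationary point, given the value of f'' there."""
    if second_derivative_at_point < 0:
        return "strict local maximum"
    if second_derivative_at_point > 0:
        return "strict local minimum"
    return "inconclusive"


# Stationary points taken from the example (roots of f'(x) = 0).
for a in (-1.0, -0.6, 0.0):
    print(a, f(a), sdte(f_second(a)))

# Boundary points of the domain [-3/2, 1/2], listed so that the values of f
# there can be compared with the values at nearby points.
for a in (-1.5, 0.5):
    print(a, f(a))
```

At 0 the sketch reports “inconclusive”, which is exactly the situation in which the example falls back on a graph sketch.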

Exercise 313. Find the extrema of each function. (Answer on p. 1534.)

(a) g ∶ R → R defined by $x \mapsto x^8 + x^7 - x^6$.

(b) h ∶ (−π/2, π/2) → R defined by x ↦ tan x.

(c) i ∶ [0, 2π] → R defined by x ↦ sin x + cos x.

733, Contents

73. Concavity

Example 946. Graphed below is f ∶ R → R defined by f (x) = x3 .

Figure to be
inserted here.

We say that f is concave on $\mathbb{R}_0^- = (-\infty, 0]$, but convex on $\mathbb{R}_0^+ = [0, \infty)$.
Here are our informal definitions of concavity and convexity:[294]
• f is concave on an interval if its slope is decreasing on that interval.
• f is convex on an interval if its slope is increasing on that interval.
• f is strictly concave on an interval if its slope is strictly decreasing on that interval.
• f is strictly convex on an interval if its slope is strictly increasing on that interval.
Here is another characterisation of concavity and convexity:
• f is concave on $\mathbb{R}_0^-$ because if we pick any two points in that interval, say A and B,
then no point of the line segment AB is above the graph of f .
• f is convex on $\mathbb{R}_0^+$ because if we pick any two points in that interval, say C and D,
then no point of the line segment CD is below the graph of f .
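
The line-segment characterisation above lends itself to a quick numerical check. The following is a sketch only: the helper `chord_gaps`, the sample count, and the particular endpoints are hypothetical choices, and checking finitely many sample points on one chord illustrates the characterisation without proving concavity or convexity.

```python
def chord_gaps(f, p, q, samples=101):
    """Return (chord height - graph height) at evenly spaced points of [p, q]."""
    gaps = []
    for i in range(samples):
        s = i / (samples - 1)
        x = (1 - s) * p + s * q
        chord = (1 - s) * f(p) + s * f(q)
        gaps.append(chord - f(x))
    return gaps


def cube(x):
    return x**3


# Hypothetical endpoints chosen for illustration.
concave_side = chord_gaps(cube, -2.0, -0.5)   # both endpoints in (-inf, 0]
convex_side = chord_gaps(cube, 0.5, 2.0)      # both endpoints in [0, inf)

tolerance = 1e-12  # allows for floating-point rounding at the endpoints
print(all(gap <= tolerance for gap in concave_side))   # chord never above graph
print(all(gap >= -tolerance for gap in convex_side))   # chord never below graph
```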

For the formal Definitions, see Ch. 121.11 (Appendices).
734, Contents
Example 947. Graphed below are the function g ∶ R → R defined by $g(x) = -x^2$ and the
exponential function.

Figure to be
inserted here.

Observe that exp is convex on its entire domain R, while g is concave on its entire domain R.
Two mnemonics (for distinguishing between concave and convex):

Concave looks like a cave opening, while $e^x$ is convex.

In secondary school, we learnt that informally, a linear function is one whose graph is a
straight line.
It turns out that more formally, we can characterise the property of linearity as follows:[295]

Linear ⇐⇒ Concave and convex.

Example 948. Graphed below is the function h ∶ R → R defined by h (x) = x.

Figure to be
inserted here.

“Obviously”, h is linear everywhere.

Less obviously, h is also concave everywhere. Because XXX
Similarly, h is also convex everywhere. Because XXX

The following result is immediate from our informal definitions of concavity and convexity:

For a formal definition of linearity, see Definition 265 (Appendices).
735, Contents
Proposition 8. (First Derivative Test for Concavity [FDTC]) Let D be an interval
and f ∶ D → R be a differentiable function. Then:
(a) f ′ is decreasing (on D) ⇐⇒ f is concave (on D).
(b) f ′ is strictly decreasing (on D) ⇐⇒ f is strictly concave (on D).
(c) f ′ is increasing (on D) ⇐⇒ f is convex (on D).
(d) f ′ is strictly increasing (on D) ⇐⇒ f is strictly convex (on D).

Proof. See p. 1350 in the Appendices.

Example 949. XXX

Example 950. XXX

The following result is immediate from the FDTC with the IDT:

Corollary 31. Let D be an interval and f ∶ D → R be a twice-differentiable function.

(a) f ′′ ≤ 0 (on D) ⇐⇒ f is concave (on D).
(b) f ′′ < 0 (on D) ⟹ f is strictly concave (on D).
(c) f ′′ ≥ 0 (on D) ⇐⇒ f is convex (on D).
(d) f ′′ > 0 (on D) ⟹ f is strictly convex (on D).

Example 951. XXX

Example 952. XXX

Once again, note the one-way ⟹’s in the SDTC (these are simply inherited from the
IDT). For example, the converse of (b) is false. Given a twice-differentiable and strictly
concave function f , it need not be that f ′′ (x) < 0 for all x.

Example 953. Consider f ∶ R → R defined by f (x) = −x4 .

In Exercise XXX, you are asked to verify that f is strictly concave on R.
Observe though that f ′′ is not strictly negative on R. Compute:

$f'(x) = -4x^3$, $f''(x) = -12x^2$, and hence $f''(0) = 0 \not< 0$.

And of course, if a twice-differentiable function g is strictly convex on an interval, then it need not be
that g ′′ is strictly positive on that interval: (XXX use this as an exercise?)

Exercise 314. XXX (Answer on p. 736.)


736, Contents

Remark 91. Instead of concave, some writers say concave downwards.
And instead of convex, some writers say concave upwards.
This textbook will use only the terms concave and convex (which are more commonly
used) and never concave downwards or upwards.

737, Contents

73.1. Inflexion Points
Informally, an inflexion point (of a function) is a point at which the function goes from
being (a) strictly concave to strictly convex; or (b) strictly convex to strictly concave. A
bit more formally:

Definition of an Inflexion Point (informal)

Let a < b and f ∶ (a, b) → R be a continuous function. We call c ∈ (a, b) an inflexion point
of f if either of the following statements is true:
(a) f is strictly concave on c’s “immediate left” and strictly convex on c’s “immediate right”; or
(b) f is strictly convex on c’s “immediate left” and strictly concave on c’s “immediate right”.

Remark 92. The above Definition is informal only in that the phrases “immediate left”
and “immediate right” have not been precisely defined. For the formal definition of an
inflexion point, see Definition 266 (Appendices).

Remark 93. It’s usually spelt inflection rather than inflexion.[296] But the latter spelling
is what appears on your A-Level syllabus and so that’s what we’ll do too.

According to Google Ngram, inflection is the more common spelling (even when we restrict attention
to British English). It seems that inflexion is, like connexion, an archaic spelling.
738, Contents
Example 954. Consider the function f ∶ R → R defined by f (x) = x3 .
Observe that f is strictly concave on $\mathbb{R}^- = (-\infty, 0)$ and strictly convex on $\mathbb{R}^+ = (0, \infty)$.
Hence, 0 is an inflexion point of f .

Figure to be
inserted here.

One simple test for inflexion points is the Tangent Line Test (TLT). If c is an inflexion
point of f , then it must pass the TLT:[297]

The tangent line is strictly above (or below) f on the “immediate left” of c
and strictly below (or above) f on the “immediate right” of c.

We can easily verify that the inflexion point 0 passes the TLT:
• On the “immediate left” of 0, the tangent line at 0 is strictly above f ; and
• On the “immediate right” of 0, it is strictly below f .
Note though that the converse is false! Any inflexion point must pass the TLT, but not
every point that passes the TLT must be an inflexion point! See Remark 152 (Appendices).

Note that confusingly, there is no single universally-accepted definition of inflexion points.

Our above Definition of an inflexion point contains several instances of the word strictly.
Some writers though drop this modifier.

For the formal definition, see Definition 266 (Appendices).
For a formal statement of the TLT, see Fact 224 (Appendices).
739, Contents
Example 955. Define g ∶ R → R by:

$$g(x) = \begin{cases} x^2 & \text{for } x \le 0, \\ \ldots & \text{for } x > 0. \end{cases}$$

Observe that g is continuous, strictly convex on $\mathbb{R}^-$, and concave on $\mathbb{R}^+$. (Of course, g is
in fact linear on $\mathbb{R}^+$, so that it is both concave and convex on $\mathbb{R}^+$.)

Figure to be
inserted here.

Under our above Definition of an inflexion point, we do not consider 0 an inflexion point
of the function g.
Confusingly, however, some other writers do!

Example 956. Define h ∶ R → R by h (x) = 0.

Observe that h is continuous, convex on $\mathbb{R}^-$, and concave on $\mathbb{R}^+$. (Of course, h is in fact
linear everywhere, so that it is both concave and convex everywhere.)

Figure to be
inserted here.

Under our above Definition of an inflexion point, we do not consider 0 an inflexion point
of the function h.
Confusingly, however, some other writers do!

Unfortunately, your H2 Maths syllabus is not clear on which definition is to be used. We

can therefore only hope that on your exams, no ambiguous cases like the functions g or h
ever appear in connection to inflexion points.

740, Contents

73.2. Stationary and Non-Stationary Points of Inflexion
Inflexion points can be either stationary or non-stationary. Here are the “obvious” definitions:

Definition 172. An inflexion point that is also a stationary point is called a stationary
point of inflexion; otherwise, it is called a non-stationary point of inflexion.

Your H2 Maths syllabus explicitly excludes non-stationary points of inflexion. That is,
happily enough, all points of inflexion you’ll encounter will also be stationary, i.e. where
the first derivative equals zero.
Nonetheless, one is tempted to assume that “every inflexion point must also be a stationary
point”. This is false:

Example 957. Consider the function f ∶ R → R defined by $f(x) = x^3 + x$. It is graphed
below and is twice differentiable.
The first derivative of f is f ′ ∶ R → R defined by $f'(x) = 3x^2 + 1$.
The second derivative of f is f ′′ ∶ R → R defined by $f''(x) = 6x$.
We claim now that 0 is an inflexion point. To see why, observe that f is concave “just”
to the left of 0 and convex “just” to the right.
We can also check that 0 passes the TLT: The tangent line is above the graph of f “just”
to the left of 0 and below “just” to the right of 0.

Figure to be
inserted here.

Now, note that $f'(0) = 3 \cdot 0^2 + 1 = 1 \neq 0$. Thus, 0 is not a stationary point.

Altogether then, 0 is a non-stationary point of inflexion.
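
The three observations in this example (non-zero first derivative, sign change of the second derivative, and the Tangent Line Test) can be checked numerically. This is a sketch only: the step `h` is a hypothetical stand-in for “just” to the left or right of 0, and the helper names are made up for illustration.

```python
def f(x):
    return x**3 + x


def f_first(x):
    return 3 * x**2 + 1


def f_second(x):
    return 6 * x


def tangent_at_zero(x):
    # Tangent line at 0: y = f(0) + f'(0) * (x - 0), which here is y = x.
    return f(0) + f_first(0) * x


h = 1e-3  # hypothetical "just to the left/right" distance

print("f'(0) =", f_first(0))                      # non-zero, so 0 is not stationary
print("f'' left/right:", f_second(-h), f_second(h))
print("tangent - f, left: ", tangent_at_zero(-h) - f(-h))   # positive: tangent above
print("tangent - f, right:", tangent_at_zero(h) - f(h))     # negative: tangent below
```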

741, Contents

73.3. The First Derivative Test for Inflexion Points (FDTI)

The First Derivative Test for Inflexion Points (FDTI, informal)

Let a < b, f ∶ (a, b) → R be a differentiable function whose derivative is continuous, and

c ∈ (a, b) be a stationary point of f . If c is also an inflexion point of f , then f ′ is either
strictly positive or strictly negative at all points “near” c (other than c itself, where f ′ (c) = 0).

Remark 94. Again, the above statement of the FDTI is informal only in that we haven’t
precisely defined the term “near”. For a formal statement (and proof) of the FDTI, see
Fact 223 in the Appendices.

Example 958. XXX

Example 959. XXX

Remark 95. Unfortunately, the converse of the FDTI is false. That is, it may be that f ′
is strictly positive (or negative) at all points “near” c, but c is not an inflexion point.
For such a counterexample, see Example 1234 in the Appendices.

742, Contents

73.4. The Second Derivative Test for Inflexion Points (SDTI)
Suppose f ′ (c) = 0. Recall (c) of the SDTE, which said that if f ′′ (c) = 0, then c could be a
maximum point, minimum point, inflexion point, or something else altogether.
In Ch. 72.5, we already gave examples showing that if f ′ (c) = 0 and f ′′ (c) = 0, then c
can be a maximum point or a minimum point. We will now give examples showing
that if f ′ (c) = 0 and f ′′ (c) = 0, then c can be an inflexion point or “something else
altogether”:

Example 960. Graphed below is the function f ∶ R → R defined by $f(x) = x^3$.

We can easily verify that 0 is a stationary point of f :

$$f'(0) = 3x^2 \big|_{x=0} = 3 \cdot 0^2 = 0.$$

Figure to be
inserted here.

We can also easily verify that the second derivative of f at 0 is zero:

$$f''(0) = 6x \big|_{x=0} = 6 \cdot 0 = 0.$$


We have f ′ (0) = 0 and f ′′ (0) = 0. On the other hand, as verified in previous subchapters,
0 is an inflexion point of f .

743, Contents

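Example 961. Graphed below is the function g ∶ R → R defined by:

$$g(x) = \begin{cases} x^5 \sin\dfrac{1}{x}, & \text{for } x \neq 0, \\ 0, & \text{for } x = 0. \end{cases}$$

We shall not do so, but it is possible to prove that g is twice differentiable, with the first
and second derivatives defined by:

$$g'(x) = \begin{cases} 5x^4 \sin\dfrac{1}{x} - x^3 \cos\dfrac{1}{x}, & \text{for } x \neq 0, \\ 0, & \text{for } x = 0, \end{cases}$$

$$g''(x) = \begin{cases} 20x^3 \sin\dfrac{1}{x} - 8x^2 \cos\dfrac{1}{x} - x \sin\dfrac{1}{x}, & \text{for } x \neq 0, \\ 0, & \text{for } x = 0. \end{cases}$$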

Figure to be
inserted here.

We have g ′ (0) = 0 and g ′′ (0) = 0.

Observe that near 0, g fluctuates “infinitely often” between negative and positive values.
Thus, 0 can be neither a maximum nor a minimum point.
Moreover, it is false that g is concave (or convex) on the set of points “just” to the left
(or right) of 0. Thus, 0 cannot be an inflexion point.
So, 0 must be “something else altogether”, i.e. not a maximum point, minimum
point, or an inflexion point.
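
To see the “infinitely often” sign changes concretely, we can evaluate g at points where sin(1/x) equals ±1. The sketch below assumes that g (x) = x^5 sin(1/x) for x ≠ 0 and g (0) = 0, which is the form consistent with the derivatives stated in the example; the sample points x_k = 1/(π/2 + kπ) and the range of k are hypothetical choices for illustration.

```python
import math


def g(x):
    # Assumed form of g: x^5 * sin(1/x) away from 0, and 0 at 0.
    return x**5 * math.sin(1 / x) if x != 0 else 0.0


# At x_k = 1 / (pi/2 + k*pi) we have sin(1/x_k) = (-1)^k, so g(x_k)
# alternates in sign while x_k shrinks towards 0.
points = [1 / (math.pi / 2 + k * math.pi) for k in range(10, 16)]
values = [g(x) for x in points]

for x, value in zip(points, values):
    print(f"x = {x:.5f}, g(x) = {value:+.3e}")
```

Taking larger and larger k gives points as close to 0 as we like, with g positive at some and negative at others, which is why 0 cannot be a maximum or a minimum point.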

It is a little disappointing that we have (c) of the SDTE. But happily, we do have the
following partial converse, which says that if c is an inflexion point of a twice-differentiable
function f , then f ′′ (c) = 0:

Fact 146. (Second Derivative Test for Inflexion Points [SDTI]) Let a < b. Suppose
f ∶ (a, b) → R is twice differentiable and has inflexion point c ∈ (a, b). Then:
(a) c is a strict extremum of the first derivative f ′ ; and
(b) f ′′ (c) = 0.

Proof. For a proof of (a), see p. 1351 in the Appendices.

But assuming (a) is true, we can easily prove (b). Since c is an extremum of f ′ and is an
interior point of Domain f ′ , by the IET, f ′′ (c) = 0.

744, Contents

Example 962. XXX

Example 963. XXX

745, Contents

73.5. A Summary of the Types of Points
The five special types of points you need to know for H2 Maths are maximum, minimum,
stationary, turning, and inflexion points.
In the case of a twice-differentiable function defined on an interval, we have the following
Venn diagram.[298]

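(Venn diagram: within the set of All Points, overlapping regions for Inflexion Points, Stationary Points, Maximum Points, and Minimum Points, with the resulting regions labelled a to l.)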

Remark 96. The above Venn diagram is for reference only. It would be foolish to try
to mug it. Instead, it is much easier to simply remember what (strict) maximum and
minimum points, stationary points, inflexion points, and turning points are.

There are, in total, 12 types of points. A point of type:

• a is a non-stationary point of inflexion;
• b is a stationary point of inflexion;
• c is a stationary point that isn’t an inflexion point or an extremum;
• d is a non-strict maximum point;
• e is a strict maximum point;
• f is a strict maximum point that is also a stationary point;
• g is a (non-strict) maximum and (non-strict) minimum point that is also a stationary point;
• h is a strict minimum point that is also a stationary point;
• i is a strict minimum point;
• j is a non-strict minimum point;
• k is a (non-strict) maximum and (non-strict) minimum point; and
• l is “none of the above”.

It turns out that in general, there is the awkward possibility of an inflexion point being an extremum (see
Example 1235). Fortunately, we can eliminate this awkward possibility by imposing the requirement
that our function is twice differentiable (see 225).
746, Contents
Exercise 315. Which of types a through l are turning points? (Answer on p. 747.)
Exercise 316. Below is the graph of a twice-differentiable function f , with X points
marked. What type (a through l, see above) is each point? (Answer on p. 747.)

Figure to be
inserted here.

A315. By Definition 63, a turning point is a stationary point and a strict extremum.
Therefore, points of types f and h are turning points.

Figure to be
inserted here.

747, Contents

74. More Techniques of Differentiation

74.1. Relating the Graph of f ′ to That of f

Your H2 Maths syllabus (p. 8) includes:
• relating the graph of f ′ to that of f .
So, that’s what we’ll do in this chapter. We start with a very simple example.

Example 964. XXX

Example 965. XXX

Example 966. XXX

Exercise 317. XXX (Answer on p. 748.)


748, Contents

74.2. Equations of Tangents and Normals
Your H2 Maths syllabus includes:
• finding equations of tangents and normals to curves, including cases where the curve is
defined implicitly or parametrically.
So, that’s what we’ll do in this chapter.
Recall from Ch. 6 (and also secondary school) the following fact and Definition:

Fact 15. The line that contains the point (p, q) and has gradient m is:

y − q = m (x − p) .

Definition 42. Two lines are perpendicular if:

(a) Their gradients are negative reciprocals of each other; or
(b) One line is vertical while the other is horizontal.

Example 967. The curve C has parametric equations $x = t^5 + t$ and $y = t^6 - t$, t ∈ R.

Consider the normal line at the point where t = 0. Find any point(s) at which the normal
line intersects the curve C again.
First, note that t = 0 ⟹ (x, y) = (0, 0). Next,
$$\frac{dy}{dx}\bigg|_{t=0} = \left(\frac{dy}{dt} \div \frac{dx}{dt}\right)\bigg|_{t=0} = \frac{6t^5 - 1}{5t^4 + 1}\bigg|_{t=0} = -1.$$
So the tangent line at the point t = 0 or (0, 0) has gradient −1. Thus, the normal line at
this point has gradient 1. Its equation is thus y − 0 = 1(x − 0) or more simply y = x.
The points where this normal line intersects the curve are thus given by the system of
equations y = x, $x = t^5 + t$, and $y = t^6 - t$. Putting these together, we have $t^5 + t = t^6 - t \iff t(t^5 - t^4 - 2) = 0$. So t = 0 or t ≈ 1.45 (calculator). (We know by the Fundamental
Theorem of Algebra that there must be six roots altogether; in this case, only two are
real, while the other four are complex.)
So the normal line intersects the curve C again at the point where t ≈ 1.45 or where
(x, y) ≈ (7.88, 7.88).
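
The “(calculator)” step in the example above can be reproduced with a simple bisection search. This is a sketch under stated assumptions: the helper `bisect_root` and the starting bracket [1, 2] are hypothetical choices (the bracket works because the quintic is negative at 1 and positive at 2), and any other root-finder would do equally well.

```python
def quintic(t):
    # The non-zero intersections satisfy t^5 - t^4 - 2 = 0.
    return t**5 - t**4 - 2


def bisect_root(func, lo, hi, tol=1e-12):
    """Find a root of func in [lo, hi], assuming func changes sign there."""
    if func(lo) * func(hi) > 0:
        raise ValueError("func must change sign on [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if func(lo) * func(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2


t_star = bisect_root(quintic, 1.0, 2.0)   # bracket chosen by inspection
x_star = t_star**5 + t_star
y_star = t_star**6 - t_star

print(f"t = {t_star:.4f}, (x, y) = ({x_star:.4f}, {y_star:.4f})")
```

Since the point lies on the line y = x, the two printed coordinates should agree up to rounding.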

Exercise 318. A curve C is described by the pair of parametric equations $x = t^5 + t$ and
$y = t^4 - t$. Find the tangent lines to the curve at the points where t = 0 and t = 1. Find
the intersection point of these two tangent lines. (Answer on p. 1528.)

749, Contents

74.3. Connected Rates of Change Problems

Example 968. We unload sand onto a flat surface at a steady rate of 0.01 m^3 s^−1. Assume
the unloaded sand always forms a perfect cone whose height and base diameter are always
equal.
Let’s find the rate at which the base area of the cone is increasing, at the instant t = 20 s.
First, recall that a cone with base radius r and height h has volume
$$V = \frac{1}{3}\pi r^2 h.$$
Since the base diameter equals the height (or h = 2r), we can rewrite this as
$$V = \frac{2}{3}\pi r^3.$$
Now differentiate the above equation with respect to t, to get
$$\frac{dV}{dt} = 2\pi r^2 \frac{dr}{dt}.$$
Let $A = \pi r^2$ be the base area. The rate at which the base area is increasing is
$$\frac{dA}{dt} = 2\pi r \frac{dr}{dt} = \frac{dV}{dt} \div r.$$
The volume of the sand is always increasing at a rate 0.01 m^3 s^−1. That is:
$$\frac{dV}{dt} = 0.01 \text{ m}^3\,\text{s}^{-1}.$$
At t = 20, $V\big|_{t=20} = 20 \times 0.01 = 0.2 \text{ m}^3$. Hence, $r\big|_{t=20} = \left(\frac{3V}{2\pi}\right)^{1/3}\bigg|_{t=20} = \left(\frac{0.3}{\pi}\right)^{1/3}$ m. Altogether then,
$$\frac{dA}{dt}\bigg|_{t=20} = 0.01 \div \left(\frac{0.3}{\pi}\right)^{1/3} \approx 0.0219 \text{ m}^2\,\text{s}^{-1}.$$
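
As a sanity check on the example above, we can compute the rate in two ways: from the formula dA/dt = (dV/dt) ÷ r, and from a finite-difference approximation of the base area as a function of time. This is only a sketch; the function names and the step size are hypothetical, and it assumes (as the example implicitly does) that there is no sand at t = 0.

```python
import math

dV_dt = 0.01  # m^3 per second, the steady unloading rate


def radius(t):
    # From V = (2/3) * pi * r^3 with V = dV_dt * t (assumes V = 0 at t = 0).
    return (3 * dV_dt * t / (2 * math.pi)) ** (1 / 3)


def base_area(t):
    return math.pi * radius(t) ** 2


t = 20.0

# Rate from the formula derived in the example.
rate_formula = dV_dt / radius(t)

# Rate from a central difference of the base area (step is an assumption).
step = 1e-4
rate_numeric = (base_area(t + step) - base_area(t - step)) / (2 * step)

print(f"formula: {rate_formula:.4f} m^2/s, numeric: {rate_numeric:.4f} m^2/s")
```

The two values agreeing is some evidence that the chain of differentiations in the example was carried out correctly.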

750, Contents

Exercise 319. (Answer on p. 1535.) Illustrated below is a cone with lateral l, base
radius r, and height h. You are given that such a cone has total external surface area
(excluding the base) $\pi r l$ and volume $\frac{1}{3}\pi r^2 h$.

A manufacturer wishes to manufacture a cone whose volume is fixed at 1 m^3 and whose

total external surface area (excluding the base) is minimised. Find out what its height
should be. (You can follow the steps below.)
(a) Express r in terms of h.
(b) Use the Pythagorean Theorem to express l in terms of r and h. Hence express l solely
in terms of h.
(c) Now express the total external surface area A (excludes the base) solely in terms of h.
(d) Show that $\dfrac{dA}{dh} = \dfrac{3}{2} \cdot \dfrac{\pi - \frac{6}{h^3}}{A}$. Hence conclude that the only stationary point is $h = \left(\dfrac{6}{\pi}\right)^{1/3}$.
(e) Use the quotient rule to show that

$$\frac{d^2A}{dh^2} = \frac{9}{4} \cdot \frac{\frac{12}{h^4}A^2 - \left(\pi - \frac{6}{h^3}\right)^2}{A^3}.$$

(f) Consider the numerator of $\dfrac{d^2A}{dh^2}$. Replace $A^2$ using the expression for A that you found
in (c). Now fully expand this numerator. Observe that it is a quadratic and prove that
it is always positive.
(g) Hence conclude that the stationary point we found is indeed the global minimum.

751, Contents

75. More Fun With Your TI84
As part of your training as an obedient monkey, your H2 Maths syllabus includes:
• Locating maximum and minimum points using a graphing calculator; and
• Finding the approximate value of a derivative at a given point using a graphing calculator.
So, that’s what we’ll do in this chapter.

752, Contents

75.1. Locate Max and Min Points on Your TI84

Example 969. Define f ∶ [0, 2] → R by $x \mapsto x - \sin(0.5\pi x)$. We can easily find the
minimum point of f analytically:
$$\frac{df}{dx} = 1 - \frac{\pi}{2}\cos\left(\frac{\pi}{2}x\right) = 0 \iff \cos\left(\frac{\pi}{2}x\right) = \frac{2}{\pi} \iff x = \frac{2}{\pi}\cos^{-1}\frac{2}{\pi} \approx 0.560664181.$$
But as an exercise, let’s find it using our TI84.

(Calculator screenshots after Steps 1 to 6.)

1. Press ON to turn on your calculator.

2. Press Y= to bring up the Y= editor.
3. Press X,T,θ,n − SIN 0 . 5 . To enter “π”, press the blue 2ND button and then π
(which corresponds to the ∧ button). Now press X,T,θ,n ) and altogether you will
have entered “x − sin(0.5πx)”.
4. Now press GRAPH and the calculator will graph y = x − sin(0.5πx).

Note that in the question given, the domain is actually [0, 2], but we didn’t bother
telling the calculator this. So the calculator just went ahead and graphed the equation
y = x − sin(0.5πx) for all possible real values of x and y.
No big deal, all we need to do is to zoom in to the region where 0 ≤ x ≤ 2.
5. Press the ZOOM button to bring up a menu of ZOOM options.
6. Press 2 to select the Zoom In option. Using the ⟨ and ⟩ arrow keys, move the cursor
to where X = 1.0638298, Y = 0. Now press ENTER and the TI will zoom in a little,
centred on the point X = 1.0638298, Y = 0.
(Example continues on the next page ...)

753, Contents

(... Example continued from the previous page.)
It looks like starting at x = 0, the function is decreasing, then hits a minimum point, then
keeps increasing. Our goal now is to find out what that minimum point is.

(Calculator screenshots after Steps 7 to 13.)

7. Press the blue 2ND button and then CALC (which corresponds to the TRACE
button). This brings up the CALCULATE menu.
8. Press 3 to select the “minimum” option. This brings you back to the graph, with a
cursor flashing. Also, the TI84 prompts you with the question: “Left Bound?”
TI84’s MINIMUM function works by you first choosing a “Left Bound” and a “Right
Bound” for x. TI84 will then look for the minimum point within your chosen bounds.
9. Using the ⟨ and ⟩ arrow keys, move the blinking cursor until it is where you want
your “Left Bound” to be. For me, I have placed it a little to the left of where I
believe the minimum point to be.
10. Press ENTER and you will have just entered your “Left Bound”.
TI84 now prompts you with the question: “Right Bound?”.
11. So now just repeat. Using the ⟨ and ⟩ arrow keys, move the blinking cursor until it
is where you want your “Right Bound” to be. For me, I have placed it a little to
the right of where I believe the minimum point to be.
12. Again press ENTER and you will have just entered your “Right Bound”.
TI84 now asks you: “Guess?” This is just asking if you want to proceed and get TI84 to
work out where the minimum point is. So go ahead and:
13. Press ENTER . TI84 now informs you that there is a “Minimum” at “X = .56066485”,
“Y = −.2105137” and places the cursor at precisely that point. This is our desired
minimum point.
(Notice there’s a slight error, because the TI84 uses slightly-imprecise numerical methods.
Analytically, we found that the minimum point was x ≈ 0.560664181, while the TI84
claims it is “X = .56066485”.)
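
For comparison, here is how a bounded numerical search for the same minimum might look in code. This is a sketch of one standard technique (golden-section search) and makes no claim about the method the TI84 actually uses internally; the function name `golden_section_min`, the tolerance, and the bounds 0 and 2 are choices made for illustration.

```python
import math


def f(x):
    return x - math.sin(0.5 * math.pi * x)


def golden_section_min(func, left, right, tol=1e-10):
    """Shrink [left, right] around the minimum of a unimodal function."""
    inv_phi = (math.sqrt(5) - 1) / 2   # about 0.618
    a, b = left, right
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if func(c) < func(d):
            b = d      # the minimum lies in [a, d]
        else:
            a = c      # the minimum lies in [c, b]
    return (a + b) / 2


# "Left Bound" 0 and "Right Bound" 2 are simply the ends of the domain.
x_numeric = golden_section_min(f, 0.0, 2.0)
x_exact = (2 / math.pi) * math.acos(2 / math.pi)

print(f"numeric: {x_numeric:.9f}, analytic: {x_exact:.9f}, f: {f(x_numeric):.7f}")
```

Like the calculator, this search returns only an approximation, so small discrepancies in the last few digits are to be expected.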

754, Contents

75.2. Locate the Derivative at a Point on Your TI84
This example will also illustrate how to graph parametric equations on the TI84.

Example 970. The curve C has parametric equations $x = t^5 + t$ and $y = t^6 - t$, t ∈ R.

We’ll find $\frac{dy}{dx}\big|_{t=1}$ using our TI84, even though this is easily found analytically:
$$\frac{dy}{dx}\bigg|_{t=1} = \frac{6t^5 - 1}{5t^4 + 1}\bigg|_{t=1} = \frac{5}{6}.$$

(Calculator screenshots after Steps 1 to 8.)

1. Press ON to turn on your calculator.

2. Press MODE to bring up a menu of settings that you can play with. In this example,
all we want is to plot a curve based on parametric equations. So:
3. Using the arrow keys, move the blinking cursor to the word PAR (short for parametric)
and press ENTER .
4. Now as usual, we’ll input the equations of our curve. To do so, press Y= to bring up
the Y= editor. Notice that this screen looks a little different from usual, because we
are now under the parametric setting.
5. Press X,T,θ,n ∧ 5 + X,T,θ,n and altogether you will have entered “T^5 + T” in the
first line.
6. Now press ENTER to go to the second line.
7. Press X,T,θ,n ∧ 6 − X,T,θ,n and altogether you will have entered “T^6 − T” in the
second line.
8. Now press GRAPH and the calculator will graph the given pair of parametric equations.

Notice that strangely enough, the graph seems to be empty for the region where x < 0. But
clearly there are values for which x < 0; for example, t = −1.1 ⟹ (x, y) ≈ (−2.71, 2.87).
So why isn’t the TI84 graphing this?
(Example continues on the next page ...)

755, Contents

(... Example continued from the previous page.)
The reason is that by default, the TI84 graphs only the region where 0 ≤ t ≤ 2π (at
least this is so for my particular calculator). We can easily adjust this:
9. Press the WINDOW button to bring up a menu of WINDOW options.
10. Using the arrow keys, the number pad, and the ENTER key as is appropriate, change
Tmin and Tmax to your desired values. In my case, I decided somewhat randomly to
enter Tmin = −10 and Tmax = 10.
11. Then press GRAPH again and the calculator will graph the given pair of parametric
equations, now for the region Tmin ≤ t ≤ Tmax, where Tmin and Tmax are whatever
you chose.

(Calculator screenshots after Steps 9 to 15.)

Actually, the last few steps were really not necessary, if all we wanted was to find $\frac{dy}{dx}\big|_{t=1}$,
as we do now:
12. Press the blue 2ND button and then CALC (which corresponds to the TRACE
button). This brings up the CALCULATE menu, which once again looks a little
different under the current parametric setting.
13. Press 2 to select the “dy/dx” option. This brings you back to the graph.
Nothing seems to be happening. But now, simply ...
14. Press 1 and now the bottom left of the screen changes to display “T = 1”.
15. Hit ENTER . What you’ve just done is to ask the calculator to calculate dy/dx at the
point where t = 1. The calculator tells you that “dy/dx = .83333528”.
Again, there’s a slight error: the exact correct answer is $\frac{dy}{dx} = \frac{5}{6} = 0.8333\ldots$, so again
the TI84 is a tiny bit off.
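
A numerical derivative of this kind can be imitated with central differences. The sketch below is only an illustration of why such answers come out slightly off; it is not a description of the TI84’s internal method, and the helper name `central_difference` and the step size are hypothetical.

```python
def x_of(t):
    return t**5 + t


def y_of(t):
    return t**6 - t


def central_difference(func, t, step=1e-5):
    """Approximate func'(t) by the slope over [t - step, t + step]."""
    return (func(t + step) - func(t - step)) / (2 * step)


# dy/dx = (dy/dt) / (dx/dt), each estimated numerically at t = 1.
dy_dx = central_difference(y_of, 1.0) / central_difference(x_of, 1.0)

print(dy_dx, 5 / 6)
```

The estimate depends on the step size: too large a step gives a poor slope, while too small a step amplifies rounding error.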

756, Contents

76. The Maclaurin Series

76.1. Power Series

We reproduce from Ch. 5.7 the following definition of polynomials:

Definition 31. Let $c_0, c_1, \dots, c_n$ be constants, with $c_n \neq 0$. We call the expression

$c_n x^n + c_{n-1} x^{n-1} + c_{n-2} x^{n-2} + \dots + c_1 x + c_0$ an nth-degree polynomial (in one variable). We
also call:
• Each $c_i x^i$ the ith-degree term or the ith term;
• Each $c_i$ the coefficient on $x^i$ (or the ith-degree coefficient, or the ith coefficient);
• The 0th coefficient $c_0$ the constant term or, more simply, the constant; and
• The following equation a (nth-degree) polynomial equation (in one variable):
$$c_n x^n + c_{n-1} x^{n-1} + c_{n-2} x^{n-2} + \dots + c_1 x + c_0 = 0.$$

Quick examples:

Example 971. The expression $7x - 3$ is a 1st-degree (or linear) polynomial.

The expression $3x^2 + 4x - 5$ is a 2nd-degree (or quadratic) polynomial.
The expression $-5x^3 + 2x + 9$ is a 3rd-degree (or cubic) polynomial.
The expression $18 + 5x - x^2 + x^4$ is a 4th-degree (or quartic) polynomial. Etc.
In general,[299] if $c_0, c_1, \dots, c_n$ are constants with $c_n \neq 0$, then the following expression is an
nth-degree polynomial in one variable:

$$c_0 + c_1 x + c_2 x^2 + \dots + c_n x^n.$$

You can easily imagine what an “infinite-degree polynomial” is. Except we don’t call it
that. Instead, we call it a power series:

Definition 173. Given constants $c_0, c_1, c_2, \dots$, we call the following expression a power
series:

$$\sum_{n=0}^{\infty} c_n x^n = c_0 + c_1 x + c_2 x^2 + \dots$$

We also call:
• Each $c_n x^n$ the nth-degree term or the nth term;
• Each $c_n$ the coefficient on $x^n$ (or the nth-degree coefficient, or the nth coefficient); and
• $c_0$ the constant term or, more simply, the constant.

See Definition 31.
757, Contents
Example 972. The expression $1 + 2x + 3x^2 + 4x^3 + 5x^4 + 6x^5 + \dots$ is a power series.
For any non-negative integer n, this power series’s nth coefficient is 1 + n. And so, using
summation notation, we may also write this power series as:

$$1 + 2x + 3x^2 + 4x^3 + 5x^4 + 6x^5 + \dots = \sum_{n=0}^{\infty} (1 + n)\, x^n.$$

Observe that a power series is, by definition, an infinite series (see Ch. 28.1).
As emphasised in Part II (Sequences and Series), we must be very careful when dealing
with infinite series. The = sign in the above equation is not the usual one; instead, it
means converges to, which has a very clear, precise, and technical meaning that you
are not required to know for H2 Maths.[300]

Example 973. In the power series below, the nth coefficient is $(-1)^n$.

$$\sum_{n=0}^{\infty} (-1)^n x^n = 1 - x + x^2 - x^3 + x^4 - x^5 + \dots$$


Remark 97. Definition 173 mostly parallels Definition 31. The only exception is that we
do not call the following a power equation:

$$\sum_{i=0}^{\infty} c_i x^i = c_0 + c_1 x + c_2 x^2 + \dots = 0.$$

The reason is that the term power equation is rarely used in mathematics and when
it is, it’s usually for rather different purposes. So, in this textbook, we will
never use the term power equation.

But see Ch. 118.1 (Appendices) if you’re interested.
76.2. Analytic Functions
A function is analytic if it can be represented by a power series. A bit more formally:

Definition 174. Let D be an open interval and $f\colon D \to \mathbb{R}$ be a function. Suppose there
exist constants $c_0, c_1, c_2, \dots$ such that:

$$f(x) = \sum_{n=0}^{\infty} c_n x^n = c_0 + c_1 x + c_2 x^2 + \dots \quad \text{for every } x \in D.$$

Then we call f an analytic function and say that f can be represented by the power series
$\sum_{n=0}^{\infty} c_n x^n$.

Two simple examples:

Example 974. Define $f\colon (-1, 1) \to \mathbb{R}$ by:

$$f(x) = \frac{1}{1 - x}.$$

Using what we learnt about geometric series, we have:

$$f(x) = \frac{1}{1 - x} = \sum_{n=0}^{\infty} x^n = 1 + x + x^2 + x^3 + \dots \quad \text{for every } x \in (-1, 1).$$

And so, we say that f is analytic and can be represented by the following power series:

$$\sum_{n=0}^{\infty} x^n = 1 + x + x^2 + x^3 + \dots$$

Since f can be represented by the above power series, it would have been exactly equivalent
if we had defined $f\colon (-1, 1) \to \mathbb{R}$ not by $f(x) = 1/(1 - x)$, but instead by:

$$f(x) = 1 + x + x^2 + x^3 + \dots$$


Example 975. Define $g\colon (-1/2, 1/2) \to \mathbb{R}$ by:

$$g(x) = \frac{1}{1 + 2x}.$$

Using what we learnt about geometric series, we have:

$$g(x) = \frac{1}{1 + 2x} = \sum_{n=0}^{\infty} (-2)^n x^n = 1 - 2x + 4x^2 - 8x^3 + \dots \quad \text{for every } x \in \left(-\tfrac{1}{2}, \tfrac{1}{2}\right).$$

And so, we say that g is analytic and can be represented by the following power series:

$$\sum_{n=0}^{\infty} (-2)^n x^n = 1 - 2x + 4x^2 - 8x^3 + \dots$$

Since g can be represented by the above power series, it would have been exactly equivalent
if we had defined $g\colon (-1/2, 1/2) \to \mathbb{R}$ not by $g(x) = 1/(1 + 2x)$, but instead by:

$$g(x) = 1 - 2x + 4x^2 - 8x^3 + \dots$$
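The two examples above can be checked numerically. The following is only a rough sketch: the helper name `partial_sum`, the test point x = 0.3, and the cut-off of 60 terms are arbitrary choices made for illustration. A finite partial sum can suggest convergence but of course never proves it.

```python
def partial_sum(coeff, x, terms):
    """Sum coeff(n) * x**n for n = 0, 1, ..., terms - 1."""
    return sum(coeff(n) * x**n for n in range(terms))

x = 0.3  # lies in both (-1, 1) and (-1/2, 1/2)

f_series = partial_sum(lambda n: 1, x, 60)          # 1 + x + x^2 + ...
g_series = partial_sum(lambda n: (-2) ** n, x, 60)  # 1 - 2x + 4x^2 - ...

print(f_series, 1 / (1 - x))      # the two numbers should nearly agree
print(g_series, 1 / (1 + 2 * x))  # likewise
```

Trying a value such as x = 0.6 for g (outside its domain) shows the partial sums growing instead of settling.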

Most functions we’ll encounter in H2 Maths are analytic (at least when we restrict their
domain suitably).
Analytic functions are “well-behaved” in that they possess certain properties that make
them particularly easy to deal with. For example, analytic functions are smooth (i.e.
infinitely differentiable).301
A rare example of a function that’s commonly encountered in H2 Maths but which is not
analytic is the absolute value function ∣⋅∣. But even so, ∣⋅∣ fails to be analytic only on open
intervals containing 0 and is analytic if we restrict its domain to any other open interval.

Every analytic function is smooth. But the converse is not true — there are smooth functions that are
not analytic (for one, see Example 986).
76.3. Introducing the Maclaurin Series
In this subchapter, we’ll simply learn to mechanically compute something called the Mac-
laurin coefficients and Maclaurin series (expansion) without understanding what
they do.
In the next subchapter, we’ll then learn that certain functions can be represented by their
Maclaurin series and that this is tremendously convenient.


Example 976. Define $f\colon (-1, 1) \to \mathbb{R}$ by $f(x) = \dfrac{1}{1 - x}$.
We now compute the Maclaurin coefficients and Maclaurin series (expansion) for
f using the following four steps.
Step 1. Compute all of f ’s derivatives:

$$f'(x) = \frac{1}{(1 - x)^2}, \quad f''(x) = \frac{2}{(1 - x)^3}, \quad f'''(x) = \frac{3 \cdot 2}{(1 - x)^4},$$

$$f^{(4)}(x) = \frac{4 \cdot 3 \cdot 2}{(1 - x)^5}, \quad \dots, \quad f^{(n)}(x) = \frac{n!}{(1 - x)^{n+1}}.$$

Step 2. Evaluate each of f , f ′ , f ′′ , etc. at 0:

$$f(0) = \frac{1}{1 - 0} = 1 = 0!, \quad f'(0) = \frac{1}{(1 - 0)^2} = 1 = 1!, \quad f''(0) = \frac{2}{(1 - 0)^3} = 2 = 2!,$$

$$f^{(3)}(0) = \frac{3 \cdot 2}{(1 - 0)^4} = 3!, \quad \dots, \quad f^{(n)}(0) = \frac{n!}{(1 - 0)^{n+1}} = n!.$$

By the way, note that $f^{(0)} = f$. That is, we define the zeroth derivative of a function to
be the function itself.
Step 3. For each non-negative integer n = 0, 1, 2, . . . , define the nth Maclaurin coefficient
for f to be the following number:

$$m_n = \frac{f^{(n)}(0)}{n!}.$$

And so, the 0th, 1st, 2nd, 3rd, and nth Maclaurin coefficients for f are:

$$m_0 = \frac{f(0)}{0!} = \frac{0!}{0!} = 1, \quad m_1 = \frac{f'(0)}{1!} = \frac{1!}{1!} = 1, \quad m_2 = \frac{f''(0)}{2!} = \frac{2!}{2!} = 1,$$

$$m_3 = \frac{f'''(0)}{3!} = \frac{3!}{3!} = 1, \quad \dots, \quad m_n = \frac{f^{(n)}(0)}{n!} = \frac{n!}{n!} = 1.$$
Here it so happens that every Maclaurin coefficient for f is simply 1. As we’ll see, this
will not generally be the case.
Step 4. We now define the Maclaurin series (expansion) for f to be the power series
whose coefficients are simply the above Maclaurin coefficients:

$$M(x) = \sum_{n=0}^{\infty} m_n x^n = m_0 + m_1 x + m_2 x^2 + \dots = 1 + 1x + 1x^2 + \dots = 1 + x + x^2 + \dots$$

Now, note that all we’ve done is to write down an infinite series M (x) called the Maclaurin
series (and which happens also to be a power series). We have not actually shown that
this series M (x) converges to or is “equal” to any number.
Remark 98. This textbook302 shall treat the terms Maclaurin series and Maclaurin
series expansion as synonyms. That is, the word expansion is optional — and indeed,
we will usually drop it.

Example 977. Consider the sine function sin ∶ R → R.

We shall compute the Maclaurin coefficients and Maclaurin series (expansion) for sin
using the same four steps as before.
Step 1. Compute all of sin’s derivatives.
Unsure of how to proceed, we try writing down the first few derivatives:

$$\sin' x = \cos x, \quad \sin'' x = -\sin x, \quad \sin^{(3)} x = -\cos x, \quad \sin^{(4)} x = \sin x, \quad \sin^{(5)} x = \cos x.$$

We observe a cycle after every four derivatives. And so, we have:

$$\sin^{(n)} x = \begin{cases} \sin x & \text{for } n = 0, 4, 8, \dots, \\ \cos x & \text{for } n = 1, 5, 9, \dots, \\ -\sin x & \text{for } n = 2, 6, 10, \dots, \\ -\cos x & \text{for } n = 3, 7, 11, \dots \end{cases}$$

Step 2. Evaluate each of sin, sin′, sin′′, etc. at 0:

$$\sin^{(n)} 0 = \begin{cases} 0 & \text{for } n = 0, 4, 8, \dots, \\ 1 & \text{for } n = 1, 5, 9, \dots, \\ 0 & \text{for } n = 2, 6, 10, \dots, \\ -1 & \text{for } n = 3, 7, 11, \dots \end{cases}$$

Step 3. So, for each n = 0, 1, 2, . . . , the nth Maclaurin coefficient for sin is:

$$m_n = \frac{\sin^{(n)}(0)}{n!} = \begin{cases} 0/n! = 0 & \text{for } n = 0, 4, 8, \dots, \\ 1/n! & \text{for } n = 1, 5, 9, \dots, \\ 0/n! = 0 & \text{for } n = 2, 6, 10, \dots, \\ -1/n! & \text{for } n = 3, 7, 11, \dots \end{cases}$$

Step 4. The Maclaurin series for sin is:

$$M(x) = \sum_{n=0}^{\infty} m_n x^n = \sum_{n=0}^{\infty} \frac{\sin^{(n)}(0)}{n!} x^n = \frac{x}{1!} - \frac{x^3}{3!} + \frac{x^5}{5!} - \dots = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \dots$$
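Steps 2 and 3 for sin are mechanical enough to hand to a computer. Here is a minimal sketch that uses exact fractions; the function names `sin_derivative_at_zero` and `maclaurin_coefficient` are made up for this illustration and are not standard library functions.

```python
from fractions import Fraction
from math import factorial

def sin_derivative_at_zero(n):
    """Value of the nth derivative of sin at 0, using the cycle 0, 1, 0, -1."""
    return (0, 1, 0, -1)[n % 4]

def maclaurin_coefficient(nth_derivative_at_zero, n):
    """m_n = f^(n)(0) / n!, kept exact with Fraction."""
    return Fraction(nth_derivative_at_zero(n), factorial(n))

# The 0th to 7th Maclaurin coefficients for sin.
coeffs = [maclaurin_coefficient(sin_derivative_at_zero, n) for n in range(8)]
print(coeffs)
```

Swapping in a different rule for the derivatives at 0 (for instance, one that always returns 1, as for exp) gives the coefficients of another Maclaurin series with no other changes.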

Formal definitions:

And from what I observe, also on your A-Level exams.
Definition 175. If the function f is n-times differentiable at 0, then the nth Maclaurin
coefficient for f is denoted $m_n$ and is defined to be the following number:

$$m_n = \frac{f^{(n)}(0)}{n!} \quad \text{for } n = 0, 1, 2, \dots$$

If f is smooth (i.e. infinitely differentiable) at 0, then the Maclaurin series (expansion)
for f is the following power series:

$$M(x) = \sum_{n=0}^{\infty} m_n x^n = \sum_{n=0}^{\infty} \frac{f^{(n)}(0)}{n!} x^n = f(0) + f'(0)\,x + \frac{f''(0)}{2!} x^2 + \frac{f^{(3)}(0)}{3!} x^3 + \dots + \frac{f^{(n)}(0)}{n!} x^n + \dots$$

Remark 99. No need to mug the above definition because the formula for M (x) appears on List MF26.

Example 978. Consider the exponential function $\exp\colon \mathbb{R} \to \mathbb{R}^+$.

Step 1. Observe that each of exp’s derivatives is equal to itself. That is:

$$\exp^{(n)} x = \exp x \quad \text{for every } n = 0, 1, 2, \dots$$

Step 2. Evaluating exp and each of its derivatives at 0, we have:

$$\exp^{(n)} 0 = \exp 0 = 1 \quad \text{for every } n = 0, 1, 2, \dots$$

Step 3. So, for each n = 0, 1, 2, . . . , the nth Maclaurin coefficient for exp is:

$$m_n = \frac{\exp^{(n)}(0)}{n!} = \frac{1}{n!}.$$

Step 4. Thus, the Maclaurin series for exp is:

$$M(x) = \sum_{n=0}^{\infty} m_n x^n = \sum_{n=0}^{\infty} \frac{\exp^{(n)}(0)}{n!} x^n = \frac{1}{0!} + \frac{1}{1!} x + \frac{1}{2!} x^2 + \frac{1}{3!} x^3 + \dots = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \dots$$

Exercise 320. Find the Maclaurin series for each function. (Answer on p. 1536.)
(a) $f\colon (-1, 1) \to \mathbb{R}$ defined by $f(x) = (1 + x)^k$, where k is any real number.

(b) cos
(c) g ∶ (−1, 1] → R defined by g (x) = ln (1 + x).

Remark 100. Happily, your H2 Maths syllabus (p. 9) explicitly excludes “derivation of
the general term of the series”. I take this to mean that they promise never to ask you
to derive the general nth term of a Maclaurin series. (But of course, who knows if they’ll
actually keep this promise.)

Remark 101. Your H2 Maths syllabus and exams make no mention of the Taylor series.
But just so you know, the Maclaurin series is simply a special case of the Taylor series.
Specifically, the Maclaurin series for f is the Taylor series for f about 0.


76.4. When Can a Function Be Represented by Its Maclaurin Series?
In the last subchapter, we learnt how to compute a function’s Maclaurin series. However,
we did not establish any relationship whatsoever between a function and its Maclaurin series.
In this subchapter, we now learn that there is such a relationship. In particular, certain
functions can be represented by their Maclaurin series.
To understand what this means, let’s first do a quick review of the concept of convergence
from Part II (Sequences and Series). Recall that the following two statements are equivalent:

M (a) converges to f (a) ⇐⇒ M (a) = f (a).

Note that in the main text of Part II, we did not formally define what it means for an
infinite series to converge. Instead, as stated on p. 387, in H2 Maths, we will simply
count on your rough and intuitive understanding of what convergence means. Two quick
examples to illustrate convergence (and its antonym divergence):

Example 979. The series $\frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \frac{1}{16} + \frac{1}{32} + \dots$ converges to 1. Or equivalently:

$$\frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \frac{1}{16} + \frac{1}{32} + \dots = 1.$$

Example 980. The series 1 + 1 + 1 + 1 + . . . does not converge. Equivalently, it diverges.

Definition 176. Let f be a function that is smooth at 0 and M be its Maclaurin series.
Suppose that for every x ∈ Domainf , we have:

M (x) = f (x).

Or equivalently, suppose that M converges to f everywhere.303 Then we say that f can

be represented by its Maclaurin series or, more simply, that M represents f .

It turns out that very happily, most functions encountered in H2 Maths can be represented
by their Maclaurin series (at least when the domain is suitably restricted):

By everywhere, we mean at every point in the domain of f .
Example 981. Define $f\colon (-1, 1) \to \mathbb{R}$ by:

$$f(x) = \frac{1}{1 - x}.$$

In the previous subchapter, we found that the Maclaurin series for f is:

$$M(x) = 1 + x + x^2 + \dots$$

It is possible (but beyond the scope of H2 Maths) to prove that M (x) converges to f (x)
for all x ∈ Domain f = (−1, 1). Or equivalently, that:

$$M(x) = f(x) \quad \text{for all } x \in \operatorname{Domain} f = (-1, 1).$$

And so, by Definition 176, we say that f can be represented by its Maclaurin series.
Thus, it would’ve been exactly equivalent if we had defined the function f not by
$f(x) = 1/(1 - x)$, but instead by its Maclaurin series expansion. That is, it would’ve been
exactly equivalent if the first sentence of this example had been replaced with the following sentence:

Define $f\colon (-1, 1) \to \mathbb{R}$ by $f(x) = 1 + x + x^2 + x^3 + \dots$


76.5. Formally Defining sin and cos

Example 982. Consider the sine function sin ∶ R → R.

In Ch. 76.3, we found that the Maclaurin series for sin is:

$$M(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \dots$$

It is possible (but beyond the scope of H2 Maths) to prove that M (x) converges to sin x
for all x ∈ Domain sin = R. Or equivalently, that:

$$M(x) = \sin x \quad \text{for all } x \in \operatorname{Domain} \sin = \mathbb{R}.$$

And so, by Definition 176, we say that sin can be represented by its Maclaurin series.

In Ch. 19, we defined sin by means of the right-triangle and unit-circle definitions. These
definitions are, however, considered somewhat informal. Since sin may be represented by
its Maclaurin series, why don’t we simply use that as our formal definition? Here then is this
textbook’s official formal definition of sin:

Definition 177. The sine function $\sin\colon \mathbb{R} \to \mathbb{R}$ is defined by:

$$\sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \dots = \sum_{n=0}^{\infty} (-1)^n \frac{x^{2n+1}}{(2n+1)!}.$$

Remark 102. Note that in this textbook, we have not justified why the above definition is
valid. That is, we have not justified why for all x ∈ R, the expression
$x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \dots$
converges to some real number. We will simply take for granted that the above definition
is valid.
We will also take for granted that the above definition is in agreement with our informal
right-triangle and unit-circle definitions of sine.
These same remarks apply to the definition of the cosine function below.

We can similarly work our way towards this textbook’s formal definition of cos:


Example 983. Consider the cosine function $\cos\colon \mathbb{R} \to \mathbb{R}$.
In Ch. 76.3 (Exercise 320(b)), we found that the Maclaurin series for cos is:

$$M(x) = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \dots$$

It is possible (but beyond the scope of H2 Maths) to prove that M (x) converges to cos x
for all x ∈ Domain cos = R. Or equivalently, that:

$$M(x) = \cos x \quad \text{for all } x \in \operatorname{Domain} \cos = \mathbb{R}.$$

And so, by Definition 176, we say that cos can be represented by its Maclaurin series.

In Ch. 19, we defined cos by means of the right-triangle and unit-circle definitions. These
definitions are, however, considered somewhat informal. Since cos may be represented by its
Maclaurin series, why don’t we simply use that as our formal definition? Here then is this
textbook’s official formal definition of cos:

Definition 178. The cosine function $\cos\colon \mathbb{R} \to \mathbb{R}$ is defined by:

$$\cos x = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \dots = \sum_{n=0}^{\infty} (-1)^n \frac{x^{2n}}{(2n)!}.$$

Remark 103. Definitions 177 and 178 just given are called the power series definitions
of sine and cosine and shall be this textbook’s official formal definitions of these two
functions.
Note though that there are other ways to formally define sine and cosine. One is to use
the following exponential definitions:

$$\sin x = \frac{\exp(ix) - \exp(-ix)}{2i} \quad \text{and} \quad \cos x = \frac{\exp(ix) + \exp(-ix)}{2}.$$

Another is to use the differential equation definitions: $y'' = -y$, with initial conditions
$y(0) = 0$, $y'(0) = 1$ for sine and $y(0) = 1$, $y'(0) = 0$ for cosine.
This Remark is JSYK. For H2 Maths, you needn’t worry too much about any of these
formal definitions. Instead, you can simply stick to the unit-circle definitions of sine and
cosine.

76.6. Revisiting exp (optional)

Example 984. Consider the exponential function exp ∶ R → R+ .

In Ch. 76.3, we found that the Maclaurin series for exp is:

$$M(x) = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \dots$$

It is possible (but beyond the scope of H2 Maths) to prove that M (x) converges to exp x
for all x ∈ Domain exp = R. Or equivalently, that:

$$M(x) = \exp x \quad \text{for all } x \in \operatorname{Domain} \exp = \mathbb{R}.$$

And so, by Definition 176, we say that exp can be represented by its Maclaurin series.

In Ch. 17, Definition 59 formally defined the exponential function exp to be the inverse
of the natural logarithm function ln.304 But since exp can be represented by its Maclaurin
series, we also have the following alternative definition of exp:

Definition 179. (Alternative Definition of the Exponential Function) The exponential
function $\exp\colon \mathbb{R} \to \mathbb{R}^+$ is defined by:

$$\exp x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \dots = \sum_{n=0}^{\infty} \frac{x^n}{n!}.$$

Remark 104. Definition 179 is JSYK. Definition 59 remains this textbook’s official formal
definition of the exponential function.

In Ch. 17, we gave two results about Euler’s number e. We can now prove both of them.
The first is especially easy:

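Theorem 1. $e = \dfrac{1}{0!} + \dfrac{1}{1!} + \dfrac{1}{2!} + \dfrac{1}{3!} + \dots$

Proof. From the Maclaurin series of the exponential function, we have:

$$\exp x = \frac{1}{0!} + \frac{x}{1!} + \frac{x^2}{2!} + \frac{x^3}{3!} + \dots$$

By Fact 28, e = exp 1. And so, plugging x = 1 into the above equation, we have:

$$e = \exp 1 = \frac{1}{0!} + \frac{1}{1!} + \frac{1^2}{2!} + \frac{1^3}{3!} + \dots = 1 + 1 + \frac{1}{2!} + \frac{1}{3!} + \dots$$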
The second result about e is a little harder to prove:

And the natural logarithm function $\ln\colon \mathbb{R}^+ \to \mathbb{R}$ was, in turn, defined by $\ln x = \int_1^x \frac{1}{t}\,dt$. We’ll have
more to say about this in Ch. XXX.
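Theorem 2. $e = \lim_{n \to \infty} \left(1 + \dfrac{1}{n}\right)^n$.

Proof. Take the RHS and apply ln:

$$\begin{aligned}
\ln\left[\lim_{n \to \infty} \left(1 + \frac{1}{n}\right)^n\right] &= \lim_{n \to \infty} \left[\ln \left(1 + \frac{1}{n}\right)^n\right] && \text{(Fact 134 and continuity of ln)} \\
&= \lim_{n \to \infty} \left[n \ln \left(1 + \frac{1}{n}\right)\right] && \text{(Laws of Logarithms)} \\
&= \lim_{n \to \infty} \frac{\ln\left(1 + \frac{1}{n}\right)}{1/n} \\
&= \lim_{n \to \infty} \frac{\ln\left(1 + \frac{1}{n}\right) - \ln 1}{1 + 1/n - 1} && (\because \ln 1 = 0) \\
&= \lim_{x \to 1} \frac{\ln x - \ln 1}{x - 1} && \left(\text{substitute } x = 1 + \frac{1}{n}\right) \\
&= \ln'(1) && \text{(Definition of the derivative)} \\
&= \frac{1}{1} = 1. && \text{(Rules of Differentiation)}
\end{aligned}$$

We’ve just proven that $\ln\left[\lim_{n \to \infty} \left(1 + \frac{1}{n}\right)^n\right] = 1$. Now apply exp to get:

$$\lim_{n \to \infty} \left(1 + \frac{1}{n}\right)^n = \exp 1 = e,$$

where the last step uses Fact 28.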
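Theorems 1 and 2 give two different recipes for e, and it can be instructive to compare them numerically. The sketch below is only an illustration: the function names and the choices of 15 terms and n = 10^6 are arbitrary, and floating-point arithmetic is used throughout.

```python
import math

def e_by_series(terms):
    """Partial sum 1/0! + 1/1! + ... + 1/(terms - 1)!, as in Theorem 1."""
    return sum(1 / math.factorial(k) for k in range(terms))

def e_by_limit(n):
    """The expression (1 + 1/n)**n from Theorem 2, for one finite n."""
    return (1 + 1 / n) ** n

print(e_by_series(15))    # just 15 terms of the series
print(e_by_limit(10**6))  # n = one million in the limit expression
print(math.e)             # library value, for comparison
```

In this experiment the series gets much closer to e with far less work, which is one practical reason the series form is so convenient.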


76.7. Can Every Function Be Represented by Its Maclaurin Series?
As the above examples suggest, most functions we’ll encounter in H2 Maths can be repres-
ented by their Maclaurin series.
Note though that not all functions can be represented by their Maclaurin series.
While it is always possible to mechanically compute the Maclaurin series for any function
that is smooth at 0, it is not necessarily true that this Maclaurin series represents the
function. That is, it is not necessarily true that this Maclaurin series converges to the
function at every point in that function’s domain. Simple example to illustrate:

Example 985. We previously defined $f\colon (-1, 1) \to \mathbb{R}$ by:

$$f(x) = \frac{1}{1 - x}.$$

Now suppose we also define $g\colon (-3, 1) \to \mathbb{R}$ by:

$$g(x) = \frac{1}{1 - x}.$$

Observe that f and g have different domains but are otherwise identical.
The function f is smooth at 0 and, as found earlier, its Maclaurin series $M_f$ is given by:

$$M_f(x) = \sum_{n=0}^{\infty} m_n x^n = 1 + x + x^2 + \dots$$

Observe that g is also smooth at 0. And so, we can go through the exact same four steps
to find the Maclaurin series $M_g$ for g. Not surprisingly, we will find that $M_g$ is exactly
the same as $M_f$. That is, $M_g$ is given by:

$$M_g(x) = \sum_{n=0}^{\infty} m_n x^n = 1 + x + x^2 + \dots$$

However, it is no longer true that $M_g(x)$ converges to g (x) everywhere. That is, it is no
longer true that $M_g(x) = g(x)$ for every x ∈ Domain g = (−3, 1).
Take for example −2 ∈ Domain g = (−3, 1). We have:

$$M_g(-2) = \sum_{n=0}^{\infty} m_n (-2)^n = 1 + (-2) + (-2)^2 + \dots, \qquad g(-2) = \frac{1}{1 - (-2)} = \frac{1}{3}.$$

Clearly, $M_g(-2) \neq g(-2)$. Indeed, $M_g(-2)$ does not even converge. Hence, g cannot be
represented by its Maclaurin series.
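To see the failure at x = −2 concretely, one can list the partial sums of 1 + x + x^2 + . . . The sketch below is illustrative only (the helper name `geometric_partial_sums` is made up), and it contrasts x = −2 with a point inside (−1, 1).

```python
def geometric_partial_sums(x, count):
    """Return the first `count` partial sums of 1 + x + x^2 + ..."""
    sums, total, power = [], 0, 1
    for _ in range(count):
        total += power      # add the current term x^k
        sums.append(total)
        power *= x          # prepare the next term x^(k+1)
    return sums

print(geometric_partial_sums(-2, 8))   # 1, -1, 3, -5, ...: swings ever more wildly
print(geometric_partial_sums(0.5, 8))  # creeps up towards 1 / (1 - 0.5) = 2
```

The partial sums at −2 never settle near g (−2) = 1/3, which is what "does not converge" looks like in practice.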

In the above counterexample, g could not be represented by its Maclaurin series because
there were values of x ∈ Domaing for which Mg (x) did not converge. One might thus
wonder if the following “result” is true:
“Suppose a function has a Maclaurin series that converges everywhere.
Then this function may be represented by its Maclaurin series.”


Unfortunately, the above “result” is false, as the following (classic) counterexample shows:

Example 986. Define $h\colon \mathbb{R} \to \mathbb{R}$ by:

$$h(x) = \begin{cases} \exp\left(-\dfrac{1}{x^2}\right) & \text{for } x \neq 0, \\ 0 & \text{for } x = 0. \end{cases}$$

It is possible to show that h is smooth on R. For example, for x ≠ 0, its first two derivatives are:

$$h'(x) = \exp\left(-\frac{1}{x^2}\right) \frac{2}{x^3} \quad \text{and}$$

$$h''(x) = \exp\left(-\frac{1}{x^2}\right) \frac{4}{x^6} - 3 \exp\left(-\frac{1}{x^2}\right) \frac{2}{x^4} = \exp\left(-\frac{1}{x^2}\right) \frac{4 - 6x^2}{x^6}.$$

It is possible to show305 that every derivative of h evaluated at 0 is 0. That is:

$$h^{(0)}(0) = 0, \quad h'(0) = 0, \quad h''(0) = 0, \quad \dots, \quad h^{(n)}(0) = 0 \quad \text{for every } n \in \mathbb{Z}_0^+.$$

Thus, the Maclaurin series for h is simply:

$$M(x) = \frac{0}{0!} + \frac{0}{1!} x + \frac{0}{2!} x^2 + \dots = 0.$$

Observe that M (0) = 0 = h (0). So, the Maclaurin series does converge to h at the point 0.
However, for every x ≠ 0, the Maclaurin series converges to 0, i.e. M (x) = 0. But for
every x ≠ 0, h (x) ≠ 0. And so, for every x ≠ 0, M (x) converges but not to h (x), i.e.
M (x) ≠ h (x).
This counterexample shows that it is perfectly possible for a function’s Maclaurin series
to converge everywhere and yet not represent the function anywhere except at one point.
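A quick numerical look makes the gap between h and its Maclaurin series visible. This is a rough sketch only: `h` and `M` below simply transcribe the definitions in Example 986, and the sample points are arbitrary.

```python
import math

def h(x):
    """The function from Example 986: exp(-1/x^2) for x != 0, and 0 at x = 0."""
    return math.exp(-1 / x**2) if x != 0 else 0.0

def M(x):
    """Its Maclaurin series: every coefficient is 0, so the sum is 0 for all x."""
    return 0.0

for x in (0.0, 0.1, 0.5, 1.0, 2.0):
    print(x, h(x), M(x))
```

Near 0 the value h (x) is positive but astonishingly small, which hints at why every derivative at 0 vanishes even though h is not the zero function.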

Our earlier examples suggest that most functions we’ll encounter in H2 Maths can be
represented by their Maclaurin series.
In contrast, the two examples we’ve just looked at show that not every function can be
represented by its Maclaurin series.
At this point, a natural question to ask is this:

Which functions can be represented by their Maclaurin series?

Theorem 23 gives one possible answer:

Theorem 23. Every analytic function whose domain includes 0 can be represented by its
Maclaurin series.

Proof. Let f be an analytic function whose domain includes 0.

Since f is analytic, by Definition 174, there exist constants m0 , m1 , m2 , . . . such that:
This can be done with the aid of L’Hôpital’s rule and induction. See e.g. this page.

$$f(x) = \sum_{n=0}^{\infty} m_n x^n = m_0 + m_1 x + m_2 x^2 + m_3 x^3 + \dots \quad \text{for every } x \in \operatorname{Domain} f.$$


Compute all of f ’s derivatives:

$$\begin{aligned}
f'(x) &= m_1 + 2 m_2 x + 3 m_3 x^2 + 4 m_4 x^3 + \dots, \\
f''(x) &= (2 \cdot 1)\, m_2 + (3 \cdot 2)\, m_3 x + (4 \cdot 3)\, m_4 x^2 + \dots, \\
f'''(x) &= (3 \cdot 2 \cdot 1)\, m_3 + (4 \cdot 3 \cdot 2)\, m_4 x + (5 \cdot 4 \cdot 3)\, m_5 x^2 + \dots, \\
f^{(4)}(x) &= (4 \cdot 3 \cdot 2 \cdot 1)\, m_4 + (5 \cdot 4 \cdot 3 \cdot 2)\, m_5 x + (6 \cdot 5 \cdot 4 \cdot 3)\, m_6 x^2 + \dots, \\
&\;\;\vdots \\
f^{(n)}(x) &= n!\, m_n + \frac{(n + 1)!}{1!}\, m_{n+1} x + \frac{(n + 2)!}{2!}\, m_{n+2} x^2 + \dots
\end{aligned}$$

Evaluate each of f , f ′ , f ′′ , etc. at 0:

$$f(0) = m_0, \quad f'(0) = m_1, \quad f''(0) = 2 m_2, \quad f^{(3)}(0) = 3!\, m_3, \quad f^{(4)}(0) = 4!\, m_4, \quad \dots, \quad f^{(n)}(0) = n!\, m_n.$$

Rearranging, we find that for each $n \in \mathbb{Z}_0^+$, we have:

$$m_n = \frac{f^{(n)}(0)}{n!}.$$

We’ve just shown that each $m_n$ is simply the nth Maclaurin coefficient for f . Equivalently,
the power series representation of f given at the start of this proof is the Maclaurin series
for f . And so, by Definitions 175 and 176, f can be represented by its Maclaurin series.

Let us illustrate Theorem 23 with a few examples:

Example 987. In H2 Maths, we do not learn to prove that exp is analytic.

But suppose we are simply told to assume that exp is analytic.
Then we can happily apply Theorem 23.
That is, we can go through the usual rigmarole of finding the Maclaurin series M for exp.

Having found that $M(x) = 1 + x + \dfrac{x^2}{2!} + \dfrac{x^3}{3!} + \dots$, we can then assert that:

$$\exp x = M(x) = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \dots \quad \text{for all } x \in \mathbb{R}.$$


Example 988. Define $f\colon (-1, 1) \to \mathbb{R}$ by $f(x) = \dfrac{1}{(1 + x)^2}$.
In H2 Maths, we do not learn to prove that f is analytic.
But suppose we are simply told to assume that f is analytic.
Then we can happily apply Theorem 23.
That is, we can go through the usual rigmarole of finding the Maclaurin series M for f .

Having found that $M(x) = 1 - 2x + 3x^2 - 4x^3 + \dots$, we can then assert that:

$$\frac{1}{(1 + x)^2} = M(x) = 1 - 2x + 3x^2 - 4x^3 + \dots \quad \text{for all } x \in (-1, 1).$$

Theorem 23 is very wonderful. However, it simply leads to another, harder question:

Which functions are analytic?

In H2 Maths and also in this textbook, we will not even attempt to answer the above
question. Instead, we will simply and blithely assume that most functions we encounter
are analytic (at least when the domain is suitably restricted). Which means that for most
functions, we can simply and blithely apply Theorem 23 and thus assume that they can
indeed be represented by their Maclaurin series.
This is wonderful, but you should be aware that this also means there are important holes
in your understanding of how and when the Maclaurin series works. These holes will be
patched as you progress beyond H2 Maths.

76.8. The Five Standard Maclaurin Series
The following appears on List MF26 so no need to mug.

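Maclaurin expansion:

$$f(x) = f(0) + x f'(0) + \frac{x^2}{2!} f''(0) + \dots + \frac{x^n}{n!} f^{(n)}(0) + \dots$$

$$(1 + x)^n = 1 + nx + \frac{n(n - 1)}{2!} x^2 + \dots + \frac{n(n - 1) \cdots (n - r + 1)}{r!} x^r + \dots \quad (|x| < 1)$$

$$e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \dots + \frac{x^r}{r!} + \dots \quad (\text{all } x)$$

$$\sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \dots + \frac{(-1)^r x^{2r+1}}{(2r + 1)!} + \dots \quad (\text{all } x)$$

$$\cos x = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \dots + \frac{(-1)^r x^{2r}}{(2r)!} + \dots \quad (\text{all } x)$$

$$\ln(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \dots + \frac{(-1)^{r+1} x^r}{r} + \dots \quad (-1 < x \le 1)$$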

Your H2 Maths syllabus (p. 9) and exams306 call the five specific Maclaurin series listed
above the standard series. In previous subchapters, we already learnt how to derive all
five of these standard series.
Your H2 Maths syllabus (p. 9) includes:
• range of values of x for which a standard series converges.

In List MF26, these ranges of values are given on the right (in parentheses).

For the exponential, sine, and cosine functions, the range of values is “all x”. What this
means is that for every x ∈ R, the following are true:

$$\exp x = 1 + x + \frac{x^2}{2!} + \dots$$

$$\sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \dots$$

$$\cos x = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \dots$$
In contrast, for the first and fifth standard series, we have restrictions.
The first series for $(1 + x)^n$ has the restriction “∣x∣ < 1”. This means that the corresponding
Maclaurin series M (x) converges only for x ∈ (−1, 1). That is, for every x ∈ (−1, 1), we have:

$$(1 + x)^n = 1 + nx + \frac{n(n - 1)}{2!} x^2 + \frac{n(n - 1)(n - 2)}{3!} x^3 + \dots$$

In contrast, the above equation is false for any x ∉ (−1, 1). For example, suppose x = 2 and n = 1.5. Then the
LHS is:

$$(1 + x)^n = (1 + 2)^{1.5} = 3^{1.5} \approx 5.196.$$

But if we try to compute the RHS, we get:

$$1 + nx + \frac{n(n - 1)}{2!} x^2 + \frac{n(n - 1)(n - 2)}{3!} x^3 + \dots = 1 + 1.5 \cdot 2 + \frac{1.5\,(1.5 - 1)}{2!}\, 2^2 + \frac{1.5\,(1.5 - 1)(1.5 - 2)}{3!}\, 2^3 + \dots,$$

which diverges. Thus, the equation is false for x = 2 and n = 1.5.

So, to repeat, for any x ∉ (−1, 1), the corresponding Maclaurin series will not converge and
the equation is false.

See Exercise 564 (N2017/I/1).

In H2 Maths and this textbook, we shall not explain why the Maclaurin series for $(1 + x)^n$
converges only on (−1, 1). Instead, this is simply something you must “know” by rote.
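Although we do not explain why the range is (−1, 1), a small experiment shows the difference between a point inside and a point outside it. This is a minimal sketch: the helper `binomial_series_terms`, the exponent 1.5, and the cut-off of 40 terms are illustrative assumptions, and each coefficient is built from the previous one to avoid large factorials.

```python
def binomial_series_terms(n, x, count):
    """First `count` terms of 1 + n x + n(n-1)/2! x^2 + ... for a real exponent n."""
    terms, coefficient = [], 1.0
    for r in range(count):
        terms.append(coefficient * x**r)
        coefficient *= (n - r) / (r + 1)  # next coefficient n(n-1)...(n-r) / (r+1)!
    return terms

inside = binomial_series_terms(1.5, 0.5, 40)   # x = 0.5 lies in (-1, 1)
outside = binomial_series_terms(1.5, 2, 40)    # x = 2 does not

print(sum(inside), 1.5 ** 1.5)   # the partial sum nearly matches (1 + 0.5)^1.5
print(outside[:4])               # 1, 3, 1.5, -0.5 as in the text above
print(abs(outside[-1]))          # later terms are enormous, so no convergence
```

For a series to converge its terms must shrink towards 0, so the exploding terms at x = 2 are already enough to rule out convergence there.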

Remark 105. The term binomial series (which was on the old 9740 syllabus but is no
longer on the current 9758 syllabus) is simply the Maclaurin series for $(1 + x)^n$.

Similarly, the fifth series for ln (1 + x) has the restriction “−1 < x ≤ 1”. This means that
the corresponding Maclaurin series M (x) converges only for x ∈ (−1, 1]. That is, for every
x ∈ (−1, 1], we have:

$$\ln(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \frac{x^5}{5} - \frac{x^6}{6} + \dots$$

In contrast, the above equation is false for any x ∉ (−1, 1]. For example, suppose x = 2. Then the LHS is:

$$\ln(1 + x) = \ln(1 + 2) = \ln 3 \approx 1.099.$$

But if we try to compute the RHS, we get:

$$x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \frac{x^5}{5} - \frac{x^6}{6} + \dots = 2 - \frac{2^2}{2} + \frac{2^3}{3} - \frac{2^4}{4} + \frac{2^5}{5} - \frac{2^6}{6} + \dots,$$

which diverges. Thus, the equation is false for x = 2.

So, to repeat, for any x ∉ (−1, 1], the corresponding Maclaurin series will not converge and
the equation is false.

Again, in H2 Maths and this textbook, we shall not explain why the Maclaurin series for
ln (1 + x) converges only on (−1, 1]. Instead, this is simply something you must “know” by
rote.

76.9. Maclaurin Polynomials as Approximations
Given a function f , its nth Maclaurin polynomial is simply its Maclaurin series up to
and including the xn term. A bit more formally:

Definition 180. Let f be a function that is n-times differentiable at 0. Then the nth
Maclaurin polynomial for f is:

$$M_n(x) = \sum_{i=0}^{n} m_i x^i = f(0) + f'(0)\,x + \frac{f''(0)}{2!} x^2 + \frac{f^{(3)}(0)}{3!} x^3 + \dots + \frac{f^{(n)}(0)}{n!} x^n.$$

Not surprisingly, if a function can be represented by its Maclaurin series, then it can also be
approximated by its Maclaurin polynomials. This is one useful application of the Maclaurin
series.
Example 989. Consider the function $g\colon (-1, 1] \to \mathbb{R}$ defined by $g(x) = \ln(1 + x)$.

In Exercise 320(c), we showed that g can be represented by its Maclaurin series. That is,
for all x ∈ Domain g = (−1, 1], we have:

$$g(x) = \ln(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \dots$$

By the above definition, the 0th, 1st, 2nd, 3rd, 4th, and 5th Maclaurin polynomials
for g are the following polynomials:

$$\begin{aligned}
M_0(x) &= 0, \\
M_1(x) &= x, \\
M_2(x) &= x - \frac{x^2}{2}, \\
M_3(x) &= x - \frac{x^2}{2} + \frac{x^3}{3}, \\
M_4(x) &= x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4}, \\
M_5(x) &= x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \frac{x^5}{5}.
\end{aligned}$$

Figure to be
inserted here.

Observe that these first six Maclaurin polynomials for g serve as ever-improved approx-
imations of the function g.
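Until the figure is available, a table of numbers can stand in for it. The sketch below evaluates the six Maclaurin polynomials for g at a single point; the function name `maclaurin_poly_ln1p` and the sample point x = 0.5 are arbitrary choices for illustration.

```python
import math

def maclaurin_poly_ln1p(n, x):
    """nth Maclaurin polynomial of ln(1 + x): x - x^2/2 + ... up to the x^n term."""
    return sum((-1) ** (i + 1) * x**i / i for i in range(1, n + 1))

x = 0.5
# Absolute error of M_0(x), M_1(x), ..., M_5(x) against the library value of ln(1 + x).
errors = [abs(maclaurin_poly_ln1p(n, x) - math.log(1 + x)) for n in range(6)]

for n, err in enumerate(errors):
    print(n, maclaurin_poly_ln1p(n, x), err)
```

At this particular point each extra term reduces the error, in line with the observation above. Other points in the domain may behave less tidily.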
Example 990. Consider the sine function $\sin\colon \mathbb{R} \to \mathbb{R}$.
We already showed that sin can be represented by its Maclaurin series. That is, for every
x ∈ Domain sin = R, we have:

$$\sin x = M(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \dots$$

The 0th, 1st, 2nd, 3rd, 4th, and 5th Maclaurin polynomials for sin are:

$$\begin{aligned}
M_0(x) &= 0, \\
M_1(x) &= x, \\
M_2(x) &= x, \\
M_3(x) &= x - \frac{x^3}{3!}, \\
M_4(x) &= x - \frac{x^3}{3!}, \\
M_5(x) &= x - \frac{x^3}{3!} + \frac{x^5}{5!}.
\end{aligned}$$

Figure to be
inserted here.

Observe that if x is small (close to zero), then $\frac{x^3}{3!}$ is small and $\frac{x^5}{5!}$ is even smaller. Indeed,
each Maclaurin coefficient grows smaller.
And so, if x is small, even low-order Maclaurin polynomials will serve as “good” approximations
of sine.
Indeed, if x is very small (i.e. very close to zero), then we may simply assert that:

$$\sin x \approx x.$$

We call the above the small-angle approximation for sine.


In Exercises 321(c) and 329, we’ll also learn of the small-angle approximation for
cosine and tangent.
By the way, one might reasonably think that a higher-degree Maclaurin polynomial is
always a better approximation than a lower-degree Maclaurin polynomial. Unfortunately,
this is not generally true, especially if x is far from zero.
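As an example, let x = 10, so that sin x = sin 10 ≈ −0.544.
Evaluating the 5th Maclaurin polynomial at 10, we have:

$$M_5(10) = 10 - \frac{10^3}{3!} + \frac{10^5}{5!} \approx 676.667,$$

which is a far worse approximation of sin 10 than even $M_1(10) = 10$.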
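The comparison at x = 10 is easy to reproduce. The following sketch (with a made-up helper `maclaurin_poly_sin`) evaluates the 1st, 3rd, and 5th Maclaurin polynomials for sin at a small x and at x = 10.

```python
import math

def maclaurin_poly_sin(n, x):
    """nth Maclaurin polynomial of sin: the odd-power terms x^k / k! with k <= n."""
    return sum((-1) ** (k // 2) * x**k / math.factorial(k)
               for k in range(1, n + 1, 2))

for x in (0.1, 10):
    approximations = [maclaurin_poly_sin(n, x) for n in (1, 3, 5)]
    print(x, math.sin(x), approximations)
```

Near 0 the higher-degree polynomials are better. At 10 they are far worse, until enough terms are included for the factorials in the denominators to take over.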
Exercise 321. For each function, sketch its graph; then find and sketch (on the same
graph) its 0th, 1st, 2nd, and 3rd Maclaurin polynomials. (Answer on p. 1538.)
(a) exp
(b) $f\colon (-1, 1) \to \mathbb{R}$ defined by $f(x) = (1 + x)^n$, where n is any real number.

(c) cos
The small-angle approximation for cosine is given by the 2nd Maclaurin polynomial.
Write it down.

Remark 106. The term Maclaurin polynomial is not used in your H2 Maths syllabus or
exams. However, it is sufficiently convenient that I have decided nonetheless to introduce
it in this textbook.


76.10. Term-by-Term Differentiation

Example 991. Define $f\colon (-1, 1) \to \mathbb{R}$ by:

$$f(x) = \frac{1}{1 - x}.$$

We know that f may be represented by its Maclaurin series. That is:

$$f(x) = \frac{1}{1 - x} = 1 + x + x^2 + x^3 + \dots = \sum_{n=0}^{\infty} x^n \quad \text{for every } x \in (-1, 1).$$

We aren’t sure if such a step is legal, but suppose we try differentiating the above Maclaurin
series term by term. Then we’d get the following expression:

$$1 + 2x + 3x^2 + \dots = \sum_{n=1}^{\infty} n x^{n-1}.$$

Can we now say that f is differentiable with derivative defined by:

$$x \mapsto 1 + 2x + 3x^2 + \dots = \sum_{n=1}^{\infty} n x^{n-1}?$$

The following exercise continues with this example.

Exercise 322. We continue with the above example. (Answer on p. 1539.)

(a) Find the derivative of f .

(b) Find the Maclaurin series of the derivative you found in (a).
(c) Comment on your findings.

The above example and exercise suggest that if f can be represented by a power series
M , then f is differentiable and, moreover, the derivative f ′ can be obtained by simply
differentiating M term by term. It turns out that happily enough, this is true! Formally:

Theorem 24. Let D be an open interval and $f\colon D \to \mathbb{R}$. Suppose that:

$$f(x) = c_0 + c_1 x + c_2 x^2 + c_3 x^3 + \dots = \sum_{n=0}^{\infty} c_n x^n \quad \text{for every } x \in D.$$

Then f is differentiable and its derivative $f'\colon D \to \mathbb{R}$ is defined by:

$$f'(x) = c_1 + 2 c_2 x + 3 c_3 x^2 + \dots = \sum_{n=1}^{\infty} n c_n x^{n-1} \quad \text{for every } x \in D.$$

Proof. Omitted — see e.g. Abbott (2015, Theorem 6.5.7).
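In terms of coefficients, term-by-term differentiation simply sends the list c0, c1, c2, c3, . . . to c1, 2c2, 3c3, . . . The sketch below applies this rule to the first few coefficients of two familiar series. The function name `differentiate` is made up for illustration, and only finitely many coefficients are handled, so this says nothing about convergence.

```python
from fractions import Fraction
from math import factorial

def differentiate(coeffs):
    """Map [c0, c1, c2, ...] to [c1, 2*c2, 3*c3, ...] (term-by-term derivative)."""
    return [n * c for n, c in enumerate(coeffs)][1:]

# Leading coefficients of 1 + x + x^2 + ... and of the power series for sin.
geometric = [Fraction(1)] * 8
sine = [Fraction((0, 1, 0, -1)[n % 4], factorial(n)) for n in range(8)]

print(differentiate(geometric))  # 1, 2, 3, ...: the coefficients of 1 + 2x + 3x^2 + ...
print(differentiate(sine))       # 1, 0, -1/2, 0, 1/24, ...: the coefficients for cos
```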

We now illustrate the above Theorem by proving that the derivatives of sin and cos are,
respectively, cos and − sin.

Fact 147. The derivative of sin ∶ R → R is cos ∶ R → R.

Proof. For every x ∈ R, we have $\sin x = x - \dfrac{x^3}{3!} + \dfrac{x^5}{5!} - \dots$
By the above Theorem then, sin is differentiable and its derivative $\sin'\colon \mathbb{R} \to \mathbb{R}$ is defined by:

$$\sin' x = 1 - \frac{3x^2}{3!} + \frac{5x^4}{5!} - \dots = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \dots$$
Observing that this last expression is simply the power series expansion of cos, we conclude
that the derivative of sin is cos.

Fact 148. The derivative of cos ∶ R → R is − sin ∶ R → R.

Proof. For every x ∈ R, we have $\cos x = 1 - \dfrac{x^2}{2!} + \dfrac{x^4}{4!} - \dfrac{x^6}{6!} + \dots$
By the above Theorem then, cos is differentiable and its derivative $\cos'\colon \mathbb{R} \to \mathbb{R}$ is defined by:

$$\cos' x = 0 - \frac{2x}{2!} + \frac{4x^3}{4!} - \frac{6x^5}{6!} + \dots = -x + \frac{x^3}{3!} - \frac{x^5}{5!} + \dots$$
Observing that this last expression is simply the power series expansion of − sin, we conclude
that the derivative of cos is − sin.

We’ve just shown that sin and cos are differentiable. By Theorem 19 then, they are also
continuous. We have thus proven the following result that was stated long ago in Ch. 68.5
and now reproduced:

Fact 138. The sine and cosine functions are continuous.

Exercise 323. XXX (Answer on p. 781.)



Your H2 Maths syllabus (p. 9) includes:
• derivation of the first few terms of the Maclaurin series by
– repeated differentiation, e.g. sec x
– repeated implicit differentiation, e.g. $y^3 + y^2 + y = x^2 - 2x$,
– using standard series e.g. ex cos 2x, ln ( )
1 − 2x
So, in the remainder of the present chapter on the Maclaurin
series, we’ll cover the above.


76.11. The Cauchy Product
We know how to find the product of two (finite) polynomials:

Example 992. Let $p(x) = x + 5$ and $q(x) = 2x^2 - 3$. Then:

$$p(x)\,q(x) = (x + 5)(2x^2 - 3) = 2x^3 + 10x^2 - 3x - 15.$$

Recall that a power series is simply an “infinite polynomial”. And so, given two power
series, we can also (naïvely) multiply them together as if they were finite polynomials:

Example 993. Suppose we have the following two power series:

$$A = 1 + 2x + 3x^2 + 4x^3 + \dots \quad\text{and}\quad B = 1 - 2x^2 + 4x^4 - 6x^6 + \dots$$

Let us now simply and naïvely multiply these two power series together as if they were finite polynomials. To do so, write:

$$AB = (1 + 2x + 3x^2 + 4x^3 + \dots)(1 - 2x^2 + 4x^4 - 6x^6 + \dots) = c_0 + c_1 x + c_2 x^2 + c_3 x^3 + \dots$$

Happily and as already noted, for H2 Maths, you’ll only ever be asked to find the “first few terms”.
So here, let us find only $c_0$, $c_1$, $c_2$, and $c_3$. “Clearly”, we have:

$$\begin{aligned}
c_0 &= 1 \times 1 = 1, \\
c_1 &= 2 \times 1 = 2, \\
c_2 &= 1 \times (-2) + 3 \times 1 = 1, \\
c_3 &= 2 \times (-2) + 4 \times 1 = 0.
\end{aligned}$$

Thus:

$$AB = (1 + 2x + 3x^2 + 4x^3 + \dots)(1 - 2x^2 + 4x^4 - 6x^6 + \dots) = 1 + 2x + x^2 + 0x^3 + \dots$$

We call this last expression obtained the Cauchy product307 of the two power series A
and B.
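
The coefficient-matching in the example above is mechanical: the coefficient on x^n collects every product of an x^k term from A with an x^(n−k) term from B. The following is a minimal sketch of that rule, assuming each series is stored as a plain list of its first few coefficients. The function name cauchy_product is our own choice for this illustration.

```python
def cauchy_product(a, b, terms):
    """Return the first `terms` coefficients of the Cauchy product of a and b.

    a[k] and b[k] are the coefficients of x**k. Both lists must contain at
    least `terms` entries.
    """
    return [sum(a[k] * b[n - k] for k in range(n + 1)) for n in range(terms)]


A = [1, 2, 3, 4]    # 1 + 2x + 3x^2 + 4x^3 + ...
B = [1, 0, -2, 0]   # 1 + 0x - 2x^2 + 0x^3 + ...

print(cauchy_product(A, B, 4))  # [1, 2, 1, 0]
```

Only the first four coefficients of each series are needed, because a term of degree 4 or higher in either factor cannot contribute to any term of degree 3 or lower in the product.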

Example 994. XXX

Exercise 324. Let $A = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \dots$, $B = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \dots$, and $C = 1 + x + x^2 + x^3 + \dots$
Write down the Cauchy products AB, AC, and BC, up to and including the $x^3$ term.
(Answer on p. 783.)

For the formal definition of the Cauchy product, see Definition 267 (Appendices).
The following result says that we can simply multiply two analytic functions together in
the “obvious” fashion:

Informal Theorem: The Product of Two Analytic Functions Is Analytic308

Suppose f and g are functions that can be represented by the power series A and B. Then
f ⋅ g can also be represented by the Cauchy product C = AB.

Example 995. Define f ∶ R → R by f (x) = sin x cos x.

We know that sin and cos can be represented by their Maclaurin series:

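$$\sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \dots = 0 + 1x + 0x^2 - \frac{1}{6}x^3 + 0x^4 + \frac{1}{120}x^5 - \dots \quad\text{for all } x \in \mathbb{R},$$

$$\cos x = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \dots = 1 + 0x - \frac{1}{2}x^2 + 0x^3 + \frac{1}{24}x^4 + 0x^5 - \dots \quad\text{for all } x \in \mathbb{R}.$$

In Exercise 324, we already found that the Cauchy product of the above two power series is:

$$\left(x - \frac{x^3}{3!} + \frac{x^5}{5!} - \dots\right)\left(1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \dots\right) = x - \frac{2}{3}x^3 + \dots$$

And so, by Theorem 37, $x - \frac{2}{3}x^3 + \dots$ is the Maclaurin series representation of f. That is:

$$f(x) = \sin x \cos x = x - \frac{2}{3}x^3 + \dots \quad\text{for all } x \in \mathbb{R}.$$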
Another method for finding the representation of f is by doing what we did in earlier
subchapters. You are asked to do so in Exercise 325.

Exercise 325. Continue to define f ∶ R → R by f (x) = sin x cos x. Use Theorem 23 to find the Maclaurin series representation of f, up to and including the $x^3$ term. (Answer on p. 1539.)
Exercise 326. Define g ∶ R → R by g (x) = sin x exp x. Using both methods you’ve learnt (i.e. Theorems 23 and 37), find the Maclaurin series representation of g, up to and including the $x^3$ term. (Answer on p. 784.)
Exercise 327. Again, define f ∶ (−1, 1) → R by f (x) = (1 + x) and g ∶ R → R by $g(x) = e^x$. Again, find the Maclaurin series representation of h = f ⋅ g up to and including the $x^2$ term, but this time using Theorem 37. (Answer on p. 784.)

This result is formally stated as Theorem 37 (Appendices).
76.12. The Composition of Two Analytic Functions
The following informal result says that the composition of two analytic functions is analytic.
Moreover, the composition can be obtained in the “obvious” fashion — that is, by simply
“plugging” one power series into the other.

Informal Theorem: The Composition of Two Analytic Functions Is Analytic

Let f and g be functions for which the composite function f ○ g is well-defined. Suppose f and g can be represented by the power series:

$$\sum_{n=0}^{\infty} a_n x^n \quad\text{and}\quad \sum_{n=0}^{\infty} b_n x^n.$$

Then f ○ g can be represented by the following power series:

$$\sum_{n=0}^{\infty} a_n \left( \sum_{m=0}^{\infty} b_m x^m \right)^n.$$

This result is formally stated as Theorem 38 (Appendices).
Example 996. Define f ∶ (−1, 1) → R by $f(y) = \frac{1}{1+y}$ and g ∶ $\left(-\frac{1}{2}, \frac{1}{2}\right)$ → R by $g(x) = 2x$.

Observe that Range g = (−1, 1) ⊆ Domain f so that the composite function f g ∶ $\left(-\frac{1}{2}, \frac{1}{2}\right)$ → R is well-defined, with:

$$(fg)(x) = f(g(x)) = \frac{1}{1 + 2x}.$$

Assuming f and g are both analytic, by Theorem 38, so too is f g.
The power (and also Maclaurin) series representation of f is:

$$f(y) = \frac{1}{1+y} = 1 - y + y^2 - y^3 + \dots \quad\text{for } y \in (-1, 1). \tag{1}$$

Similar to the previous subchapter, we will use two methods to find the power (and also Maclaurin) series representation of f g.
Method 1. By Theorem 38, simply plug y = g (x) into (1):

$$(fg)(x) = f(g(x)) = \frac{1}{1 + g(x)} = \frac{1}{1 + 2x} = 1 - 2x + (2x)^2 - (2x)^3 + \dots = 1 - 2x + 4x^2 - 8x^3 + \dots$$

for x ∈ Domain g = $\left(-\frac{1}{2}, \frac{1}{2}\right)$.

Method 2 (Theorem 23). The first few derivatives of f g are:

$$(fg)'(x) = \frac{-2}{(1 + 2x)^2}, \qquad (fg)''(x) = \frac{8}{(1 + 2x)^3}, \qquad (fg)'''(x) = \frac{-48}{(1 + 2x)^4}.$$

Evaluate each of f g, (f g)′, (f g)′′, and (f g)′′′ at 0:

$$\begin{aligned}
(fg)(0) &= \frac{1}{1 + 2 \cdot 0} = 1, \\
(fg)'(0) &= \frac{-2}{(1 + 2 \cdot 0)^2} = -2, \\
(fg)''(0) &= \frac{8}{(1 + 2 \cdot 0)^3} = 8, \\
(fg)'''(0) &= \frac{-48}{(1 + 2 \cdot 0)^4} = -48.
\end{aligned}$$

Thus, f g may be represented by its Maclaurin series, which is:

$$(fg)(x) = \frac{1}{0!} + \frac{-2}{1!}x + \frac{8}{2!}x^2 + \frac{-48}{3!}x^3 + \dots = 1 - 2x + 4x^2 - 8x^3 + \dots$$

Example 997. Find the Maclaurin series expansion of i ∶ R → R defined by $i(x) = e^{x^2}$.

Observe that if we define h ∶ R → R by $h(x) = x^2$, then i = exp ○h. That is, i may be written as the composition of the exponential function and h.
Assuming exp and h are both analytic, by Theorem 38, so too is their composition i = exp ○h.
The power (and also Maclaurin) series expansion of exp is:

$$\exp y = 1 + y + \frac{y^2}{2!} + \frac{y^3}{3!} + \dots \quad\text{for } y \in \mathbb{R}. \tag{1}$$

As before, we can use two methods to find the power (and also Maclaurin) series expansion of i.
Method 1 (Theorem 38). Simply plug y = h (x) into (1):

$$i(x) = (\exp \circ h)(x) = \exp(h(x)) = e^{x^2} = 1 + x^2 + \frac{(x^2)^2}{2!} + \frac{(x^2)^3}{3!} + \dots = 1 + x^2 + \frac{x^4}{2!} + \frac{x^6}{3!} + \dots$$

for x ∈ R.

Method 2 (Theorem 23). Compute the first few derivatives of i:

$$\begin{aligned}
i'(x) &= 2x e^{x^2} = 2x \cdot i(x), \\
i''(x) &= 2i(x) + 2x \cdot i'(x), \\
i'''(x) &= 2i'(x) + 2i'(x) + 2x \cdot i''(x) = 4i'(x) + 2x \cdot i''(x), \\
i^{(4)}(x) &= 4i''(x) + 2i''(x) + 2x \cdot i^{(3)}(x) = 6i''(x) + 2x \cdot i^{(3)}(x).
\end{aligned}$$

Evaluate each of i, i′, i′′, i′′′, and $i^{(4)}$ at 0:

$$\begin{aligned}
i(0) &= e^0 = 1, \\
i'(0) &= 2 \cdot 0 \cdot i(0) = 0, \\
i''(0) &= 2i(0) + 2 \cdot 0 \cdot i'(0) = 2 + 0 = 2, \\
i'''(0) &= 4i'(0) + 2 \cdot 0 \cdot i''(0) = 4 \cdot 0 + 0 = 0, \\
i^{(4)}(0) &= 6i''(0) + 2 \cdot 0 \cdot i^{(3)}(0) = 6 \cdot 2 + 0 = 12.
\end{aligned}$$

Thus, i may be represented by its Maclaurin series, which is:

$$i(x) = e^{x^2} = \frac{1}{0!} + \frac{0}{1!}x + \frac{2}{2!}x^2 + \frac{0}{3!}x^3 + \frac{12}{4!}x^4 + \dots = 1 + x^2 + \frac{x^4}{2!} + \dots \quad\text{for } x \in \mathbb{R}.$$

We can even “plug” one Maclaurin series into another. To illustrate, here is a conceptually-
simple (if tedious) example:


Example 998. Define f ∶ (−1, 1) → R by $f(y) = \frac{1}{1+y}$ and g ∶ R → R by $g(x) = \frac{1}{2}\sin x$.
Let h = f g. (Can you explain why h is well defined?)310
Then h ∶ R → R is defined by:

$$h(x) = (fg)(x) = f(g(x)) = \frac{1}{1 + \frac{1}{2}\sin x}.$$

Assuming f and g are both analytic, by Theorem 38, so too is their composition h.
The power (and also Maclaurin) series expansion of f is:

$$f(y) = \frac{1}{1+y} = 1 - y + y^2 - y^3 + \dots \quad\text{for all } y \in (-1, 1). \tag{1}$$

The power (and also Maclaurin) series expansion of g is:

$$g(x) = \frac{1}{2}\sin x = \frac{1}{2}\left(x - \frac{x^3}{3!} + \frac{x^5}{5!} - \dots\right) \quad\text{for all } x \in \mathbb{R}. \tag{2}$$

Method 1 (Theorem 38). Simply plug (2) into (1):

$$h(x) = \frac{1}{1 + \frac{1}{2}\sin x} = 1 - \left[\frac{1}{2}\left(x - \frac{x^3}{3!} + \frac{x^5}{5!} - \dots\right)\right] + \left[\frac{1}{2}\left(x - \frac{x^3}{3!} + \frac{x^5}{5!} - \dots\right)\right]^2 - \left[\frac{1}{2}\left(x - \frac{x^3}{3!} + \frac{x^5}{5!} - \dots\right)\right]^3 + \dots$$

for x ∈ R.

The RHS expression looks like a complete nightmare. But if we’re merely asked to write out the Maclaurin series of h up to and including the $x^3$ term, then things aren’t too bad. Examine the RHS expression one term at a time and simply discard anything that’s above degree 3. If we do so, we get:

$$h(x) = \frac{1}{1 + \frac{1}{2}\sin x} = 1 - \left(\frac{x}{2} - \frac{x^3}{12}\right) + \left(\frac{x^2}{4}\right) - \left(\frac{x^3}{8}\right) + \dots = 1 - \frac{x}{2} + \frac{x^2}{4} - \frac{x^3}{24} + \dots \quad\text{for } x \in \mathbb{R}.$$
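
“Discard anything that’s above degree 3” is easy to get wrong by hand, so here is a minimal sketch that does the same truncated substitution with exact fractions. The names DEGREE, multiply, and compose are invented for this illustration. The sketch also assumes the inner series has zero constant term (true for ½ sin x), which is what lets us safely ignore every power of the inner series beyond the third.

```python
from fractions import Fraction

DEGREE = 3  # keep terms up to and including x^3


def multiply(p, q):
    """Multiply two truncated series, discarding terms above x^DEGREE."""
    out = [Fraction(0)] * (DEGREE + 1)
    for i, p_i in enumerate(p):
        for j, q_j in enumerate(q):
            if i + j <= DEGREE:
                out[i + j] += p_i * q_j
    return out


def compose(outer, inner):
    """Substitute the series `inner` into the series `outer`.

    Assumes inner[0] == 0, so inner**n has no terms below x^n and the
    first DEGREE + 1 coefficients of `outer` are all we need.
    """
    result = [Fraction(0)] * (DEGREE + 1)
    power = [Fraction(1)] + [Fraction(0)] * DEGREE  # inner**0 = 1
    for a_n in outer:
        result = [r + a_n * p for r, p in zip(result, power)]
        power = multiply(power, inner)
    return result


f = [Fraction(1), Fraction(-1), Fraction(1), Fraction(-1)]        # 1 - y + y^2 - y^3
g = [Fraction(0), Fraction(1, 2), Fraction(0), Fraction(-1, 12)]  # x/2 - x^3/12

h = compose(f, g)
print(h)  # coefficients of 1, x, x^2, x^3
```

The resulting coefficients are 1, −1/2, 1/4, and −1/24, in agreement with the expansion obtained by hand.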

Method 2 (Theorem 23). Compute the first few derivatives of h:

$$h'(x) = \frac{-1}{\left(1 + \frac{1}{2}\sin x\right)^2}\left(\frac{1}{2}\cos x\right) = -\frac{1}{2}[h(x)]^2 \cos x,$$

$$h''(x) = -\frac{1}{2}\left\{2h(x)h'(x)\cos x - [h(x)]^2 \sin x\right\} = h(x)\left[\frac{1}{2}h(x)\sin x - h'(x)\cos x\right],$$

$$h'''(x) = h'(x)\left[\frac{1}{2}h(x)\sin x - h'(x)\cos x\right] + h(x)\left[\frac{1}{2}h'(x)\sin x + \frac{1}{2}h(x)\cos x - h''(x)\cos x + h'(x)\sin x\right].$$

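Evaluate each of h, h′, h′′, and h′′′ at 0:

$$\begin{aligned}
h(0) &= \frac{1}{1 + \frac{1}{2}\sin 0} = 1, \\
h'(0) &= -\frac{1}{2}[h(0)]^2 \cos 0 = -\frac{1}{2}, \\
h''(0) &= h(0)\left[\frac{1}{2}h(0)\sin 0 - h'(0)\cos 0\right] = \frac{1}{2}, \\
h'''(0) &= h'(0)\left[0 - h'(0)\right] + h(0)\left[0 + \frac{1}{2}h(0) - h''(0) + 0\right] = -\frac{1}{4} + 0 = -\frac{1}{4}.
\end{aligned}$$

Thus, the Maclaurin series of h, up to and including the $x^3$ term, is:

$$h(x) = \frac{1}{0!} + \frac{-1/2}{1!}x + \frac{1/2}{2!}x^2 + \frac{-1/4}{3!}x^3 + \dots = 1 - \frac{x}{2} + \frac{x^2}{4} - \frac{x^3}{24} + \dots$$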
Exercise 328. The function f is defined by f (x) = sin [ln (1 + x)]. Using both methods
you’ve learnt (i.e. Theorems 38 and 23), write down the Maclaurin series representation
of f , up to and including the x3 term. (As always, don’t forget to state the range of
values for which the Maclaurin series converges.) (Answer on p. 1540.)


76.13. Repeated Differentiation
Your syllabus includes:
• derivation of the first few terms of the Maclaurin series by repeated differentiation.
So that’s what we’ll do here:

Example 999. In a typical A-Level exam question, you might be asked to find the Maclaurin series expansion of the secant function sec up to and including the $x^4$ term.
To do so, first write down the first four derivatives of sec:

$$\begin{aligned}
\sec' x &= \sec x \tan x, \\
\sec'' x &= \sec x \tan^2 x + \sec^3 x, \\
\sec''' x &= \sec x \tan^3 x + 2\tan x \sec^3 x + 3\sec^3 x \tan x, \\
\sec^{(4)} x &= \sec x \tan^4 x + 3\sec^3 x \tan^2 x + 2\sec^5 x + 6\tan^2 x \sec^3 x + 9\sec^3 x \tan^2 x + 3\sec^5 x.
\end{aligned}$$

Observe that sec 0 = 1 and tan 0 = 0. So, we can easily evaluate each of sec, sec′, sec′′, sec′′′, and $\sec^{(4)}$ at 0:

$$\begin{aligned}
\sec 0 &= 1, \\
\sec' 0 &= 0, \\
\sec'' 0 &= 0 + 1 = 1, \\
\sec''' 0 &= 0 + 0 + 0 = 0, \\
\sec^{(4)} 0 &= 0 + 0 + 2 + 0 + 0 + 3 = 5.
\end{aligned}$$

Thus, the Maclaurin series expansion of sec, up to and including the $x^4$ term, is:

$$\sec x = \frac{1}{0!} + \frac{0}{1!}x + \frac{1}{2!}x^2 + \frac{0}{3!}x^3 + \frac{5}{4!}x^4 + \dots = 1 + \frac{1}{2}x^2 + \frac{5}{24}x^4 + \dots \tag{1}$$

Remark 107. The method of repeated differentiation is nice, but does have one important drawback: it does not tell us about the range of values on which the computed Maclaurin series converges.

For example, it turns out that the Maclaurin series of sec converges only on $\left(-\frac{\pi}{2}, \frac{\pi}{2}\right)$. That is, the above expansion of sec x holds only for $x \in \left(-\frac{\pi}{2}, \frac{\pi}{2}\right)$ and not for any other x. This is an important piece of information that we are unable to find using only the method of repeated differentiation.

Exercise 329. Find the Maclaurin series expansion of the tangent function tan, up to
and including the x5 term. The small-angle approximation for tan is given by the
2nd Maclaurin polynomial — write it down. (Answer on p. 790.)


76.14. Repeated Implicit Differentiation
Your syllabus includes:
• derivation of the first few terms of the Maclaurin series by repeated implicit differentiation.
So that’s what we’ll do here:

Example 1000. Suppose the function f satisfies the following equation:

$$x[f(x)]^2 + e = e^{f(x)}. \tag{0}$$

Using repeated implicit differentiation, we can find the first few terms of the Maclaurin series of f.
Differentiating once with respect to x, we have:

$$[f(x)]^2 + 2x \cdot f(x) f'(x) = e^{f(x)} f'(x). \tag{1}$$

Differentiating a second time with respect to x, we have:

$$2f(x)f'(x) + 2f(x)f'(x) + 2x\left\{[f'(x)]^2 + f(x)f''(x)\right\} = e^{f(x)}[f'(x)]^2 + e^{f(x)}f''(x). \tag{2}$$

Plug x = 0 into (0):

$$0 \cdot [f(0)]^2 + e = 0 + e = e = e^{f(0)} \implies f(0) = 1. \tag{3}$$

Next, plugging x = 0 and (3) into (1), we have:

$$1 + 0 = 1 = e^1 f'(0) \implies f'(0) = \frac{1}{e}. \tag{4}$$

Next, plugging x = 0, (3), and (4) into (2), we have:

$$2 \cdot 1 \cdot \frac{1}{e} + 2 \cdot 1 \cdot \frac{1}{e} + 0 = \frac{4}{e} = e \cdot \left(\frac{1}{e}\right)^2 + e f''(0) = \frac{1}{e} + e f''(0) \quad\text{or}\quad f''(0) = \frac{3}{e^2}.$$

Thus, the Maclaurin series expansion of f, up to and including the $x^2$ term, is:

$$f(x) = \frac{f(0)}{0!} + \frac{f'(0)}{1!}x + \frac{f''(0)}{2!}x^2 + \dots = 1 + \frac{1}{e}x + \frac{3}{2e^2}x^2 + \dots$$
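
With this much algebra, a quick numerical sanity check is reassuring. The sketch below solves the defining equation x[f(x)]² + e = e^{f(x)} for f(x) at a small value of x, using Newton’s method started at f(0) = 1, and compares the answer with the quadratic 1 + x/e + 3x²/(2e²). Newton’s method is our own choice of solver here, and the names f_implicit and maclaurin_approx are invented for this illustration. Agreement near 0 supports the coefficients but says nothing about where the full series converges.

```python
import math


def f_implicit(x, tol=1e-14):
    """Solve x*y**2 + e = exp(y) for y near 1 using Newton's method."""
    y = 1.0  # starting guess: f(0) = 1
    for _ in range(50):
        residual = math.exp(y) - x * y * y - math.e
        slope = math.exp(y) - 2 * x * y  # derivative of residual w.r.t. y
        step = residual / slope
        y -= step
        if abs(step) < tol:
            break
    return y


def maclaurin_approx(x):
    """The quadratic Maclaurin approximation found in the example above."""
    return 1 + x / math.e + 3 * x ** 2 / (2 * math.e ** 2)


x = 0.01
print(f_implicit(x), maclaurin_approx(x))
```

For x = 0.01 the two printed values should agree to about six decimal places, with the small gap accounted for by the omitted x³ and higher terms.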

Remark 108. Again, the method of repeated implicit differentiation is nice but fails to tell us about the range of values on which the computed Maclaurin series converges.
In the above example, this method allowed us to find the Maclaurin series expansion of f, but did not tell us the range of values on which this series converges.


Exam Tips for Towkays

Comparing the new 9758 syllabus (first examined 2017) with the old 9740 syllabus (last
examined 2017), we have mostly subtractions and rarely any additions. One of the rare
additions is this subchapter’s topic. My suspicion is therefore that it will soon show up.
(Note that it didn’t appear on the 2017 9758 A-Level exams.)
Although very tedious, there is conceptually nothing difficult about repeated implicit differentiation: it’s just a whole bunch of differentiation and algebra. So, just make sure you go slowly and carefully. Ensure that everything is correct at each step of the way.

The following example is literally the one given in your syllabus:


Example 1001. Suppose the function g satisfies the following equation:

$$[g(x)]^3 + [g(x)]^2 + g(x) = x^2 - 2x. \tag{0}$$

Using repeated implicit differentiation, we can find the first few terms of the Maclaurin series of g.
Differentiating once with respect to x, we have:

$$3[g(x)]^2 g'(x) + 2g(x)g'(x) + g'(x) = 2x - 2,$$

$$g'(x)\left\{3[g(x)]^2 + 2g(x) + 1\right\} = 2x - 2. \tag{1}$$

Differentiating a second time with respect to x, we have:

$$g''(x)\left\{3[g(x)]^2 + 2g(x) + 1\right\} + g'(x)\left[6g(x)g'(x) + 2g'(x)\right] = 2. \tag{2}$$

Plugging x = 0 into (0), we have:

$$[g(0)]^3 + [g(0)]^2 + g(0) = 0^2 - 2 \cdot 0 = 0 \implies g(0)\left\{[g(0)]^2 + g(0) + 1\right\} = 0. \tag{3}$$

Observe that the expression $[g(0)]^2 + g(0) + 1$ is a quadratic polynomial in g (0) and has negative discriminant $1^2 - 4 \cdot 1 \cdot 1 = -3 < 0$. Thus, the only (real) solution to (3) is:

$$g(0) = 0. \tag{4}$$

Plugging x = 0 and (4) into (1), we have:

$$g'(0)\left(3 \cdot 0^2 + 2 \cdot 0 + 1\right) = 2 \cdot 0 - 2 \implies g'(0) = -2. \tag{5}$$

Plugging x = 0, (4), and (5) into (2), we have:

$$g''(0)\left(3 \cdot 0^2 + 2 \cdot 0 + 1\right) - 2\left[6 \cdot 0 \cdot (-2) + 2 \cdot (-2)\right] = 2 \quad\text{or}\quad g''(0) + 8 = 2 \quad\text{or}\quad g''(0) = -6. \tag{6}$$

Altogether then, using (4), (5), and (6), the Maclaurin series expansion of g, up to and including the $x^2$ term, is:

$$g(x) = \frac{g(0)}{0!} + \frac{g'(0)}{1!}x + \frac{g''(0)}{2!}x^2 + \dots = \frac{0}{0!} + \frac{-2}{1!}x + \frac{-6}{2!}x^2 + \dots = -2x - 3x^2 + \dots$$
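
Here too a numerical check is cheap. For each x, the left-hand side y³ + y² + y is strictly increasing in y (its derivative 3y² + 2y + 1 has negative discriminant and so is always positive), so the equation has exactly one real solution and simple bisection will find it. The sketch below, with the made-up name g_implicit and an arbitrary search interval of [−10, 10], compares that solution with −2x − 3x² at a small x.

```python
def g_implicit(x):
    """Solve y**3 + y**2 + y = x**2 - 2*x for y by bisection.

    The left-hand side is strictly increasing in y, so there is exactly
    one real root. The bracket [-10, 10] is an assumption that is ample
    for the small values of x used here.
    """
    target = x * x - 2 * x
    lo, hi = -10.0, 10.0
    for _ in range(200):
        mid = (lo + hi) / 2
        if mid ** 3 + mid ** 2 + mid < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def maclaurin_approx(x):
    """The quadratic Maclaurin approximation found in the example above."""
    return -2 * x - 3 * x ** 2


x = 0.01
print(g_implicit(x), maclaurin_approx(x))
```

At x = 0.01 the two values should differ only in about the sixth decimal place, which is the size one would expect from the omitted x³ term.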

Exercise 330. XXX (Answer on p. 793.)



76.15. The Basel Problem (fun, optional)
The Basel Problem is the problem of finding the following sum of series:311

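$$\sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{1}{1^2} + \frac{1}{2^2} + \frac{1}{3^2} + \dots$$

And here is the solution:

Fact 149. $\displaystyle\sum_{n=1}^{\infty} \frac{1}{n^2} = \frac{1}{1^2} + \frac{1}{2^2} + \frac{1}{3^2} + \dots = \frac{\pi^2}{6}.$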

Below we will go through Euler’s remarkable “solution” of the Basel Problem. But first, a
very brief history:
The Basel Problem was first posed in 1650 by Pietro Mengoli (1626–86),312 but only became
more widely known in 1689 when Jacob Bernoulli (1655–1705) published one of his Treatises
on Infinite Series. The Basel Problem is named after the city of publication (and also
Bernoulli’s residence).313 Bernoulli wrote:
... when the numbers are pure squares, as in the series 1/1 + 1/4 + 1/9 + 1/16 + 1/25 &c., it is more difficult than one would have expected, which is noteworthy.
If someone should succeed in finding what till now withstood our efforts and
communicate it to us, we would be much obliged to them.314315
We already briefly discussed this in Example 478.
Mengoli wrote:
Ab huius fractionum dispositionis contemplatione faliciter expeditus, ad aliam pro-
grediebar dispositionem, in qua singula unitates numeris quadratis denominantur.
Hac speculatio fructus quidem laboris rependit, nondum tamen effecta est solvendo,
sed ingenij ditioris postulat adminiculum, ut pracisam dispositionis, quam mihi-
metipsi proposui, summam valeat reportare.
An English translation of the above paragraph is credited to Emanuele Delucchi and found in Loya
Having concluded with satisfaction my consideration of those arrangements of frac-
tions, I shall move on to those other arrangements that have the unit as numerator,
and square numbers as denominators. The work devoted to this consideration has
bore some fruit — the question itself still awaiting solution — but it [the work] re-
quires the support of a richer mind, in order to lead to the evaluation of the precise
sum of the arrangement [of fractions] that I have set myself as a task.

Basel was also where the illustrious Bernoulli family resided. The Bernoullis left their mark everywhere. Just to give a few examples, in physics, we have Bernoulli’s Principle, named after Daniel
Bernoulli (1700–82). In economics, Daniel is usually credited with coming up with the concept of ex-
pected utility. L’Hôpital’s Rule should really be Johann Bernoulli’s (1667–1748) Rule. As we’ll learn
later, in probability, we have Bernoulli random variables, also named after Jacob.
This English translation was taken from Lagarias (2013, p. 13), who in turn credits Jordan Bell. The
original Latin passage can be found in Bernoulli’s posthumously published Ars Conjectandi (1713, p.
quando sunt puri Quadrati, ut in serie 1/1 + 1/4 + 1/9 + 1/16 + 1/25 &c. difficilior est,
quam quis expectaverit, summae pervestigatio, quam tamen finitam esse, ex altera,
qua manifesto minor est, colligimus: Si quis inveniat nobisque communicet, quod
The Basel Problem was first solved by Leonhard Euler (1707–83) in 1734, albeit somewhat
heuristically.316 By heuristically, we mean that Euler’s solution would not meet modern
standards of rigour. Nonetheless, we will now go through his solution, because it is delight-
fully simple and illustrates one use of the Maclaurin series.

Euler’s “Solution” to the Basel Problem

Start with the Maclaurin series for sin:

$$\sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \dots$$

Divide by x:

$$\frac{\sin x}{x} = 1 - \frac{x^2}{3!} + \frac{x^4}{5!} - \dots = 1 - \frac{1}{6}x^2 + \frac{1}{120}x^4 - \dots \tag{1}$$

Now consider for example the quadratic polynomial $1 - 4x + 3x^2$. It has constant term 1 and roots $r_1 = 1/3$ and $r_2 = 1$. And so, we can write:

$$1 - 4x + 3x^2 = (1 - 3x)(1 - x) = \left(1 - \frac{x}{1/3}\right)\left(1 - \frac{x}{1}\right) = \left(1 - \frac{x}{r_1}\right)\left(1 - \frac{x}{r_2}\right).$$

It turns out that in general, if p (x) is an nth-degree polynomial with constant term 1 and roots $r_1, r_2, \dots, r_n$, then:

$$p(x) = \left(1 - \frac{x}{r_1}\right)\left(1 - \frac{x}{r_2}\right) \dots \left(1 - \frac{x}{r_n}\right). \tag{2}$$

(Can you prove this?)317

Now, here’s the bit of Euler’s argument that’s heuristic and not quite rigorous. From (1), we see that $\frac{\sin x}{x}$ may be written as an “infinite polynomial” (or more correctly, a power series) with constant term 1. And so here, Euler made an audacious leap of logic. He supposed that $\frac{\sin x}{x}$ could, like any finite polynomial with constant term 1, be written like (2). If so, then since $\frac{\sin x}{x}$ has roots ±π, ±2π, ±3π, . . . , we’d have:

$$\frac{\sin x}{x} = \left(1 - \frac{x}{\pi}\right)\left(1 - \frac{x}{-\pi}\right)\left(1 - \frac{x}{2\pi}\right)\left(1 - \frac{x}{-2\pi}\right)\left(1 - \frac{x}{3\pi}\right)\left(1 - \frac{x}{-3\pi}\right) \dots \tag{3}$$
industriam nostram elusit hactenus, magnas de nobis gratias feret.

Euler formally presented this result in 1735 (in St. Petersburg) and in 1740 published it as “De Summis
Serierum Reciprocarum” (PDF). The latter has been translated into English by Jordan Bell (2005,
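By the Factor Theorem:

$$r_i \text{ is a root of } p(x) \iff (x - r_i) \text{ is a factor of } p(x) \iff \left(\frac{x}{r_i} - 1\right) \text{ or } \left(1 - \frac{x}{r_i}\right) \text{ is a factor of } p(x).$$

Thus, $p(x) = c\left(1 - \frac{x}{r_1}\right)\left(1 - \frac{x}{r_2}\right) \dots \left(1 - \frac{x}{r_n}\right)$ for some constant c. Since p (x) has constant term 1, it must be that c = 1 and hence the factorisation (2) holds.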

Now, it so happens that the infinite-product formula (3) for $\frac{\sin x}{x}$ is true. However, Euler failed to prove this and indeed, it would be more than a century later before this could be rigorously proven.318

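But if we blithely assume that the infinite-product formula (3) is true (as Euler did), then we can continue with a little bit of algebra:

$$\begin{aligned}
\frac{\sin x}{x} &= \left(1 - \frac{x}{\pi}\right)\left(1 - \frac{x}{-\pi}\right)\left(1 - \frac{x}{2\pi}\right)\left(1 - \frac{x}{-2\pi}\right)\left(1 - \frac{x}{3\pi}\right)\left(1 - \frac{x}{-3\pi}\right) \dots \\
&= \left(1 - \frac{x^2}{\pi^2}\right)\left(1 - \frac{x^2}{4\pi^2}\right)\left(1 - \frac{x^2}{9\pi^2}\right) \dots \\
&= 1 + \left(-\frac{1}{\pi^2} - \frac{1}{4\pi^2} - \frac{1}{9\pi^2} - \dots\right)x^2 + \dots
\end{aligned} \tag{4}$$

(To get the last step, observe that $-\frac{x^2}{\pi^2} \cdot 1 \cdot 1 \cdot 1 \cdots = -\frac{1}{\pi^2}x^2$, $-\frac{x^2}{4\pi^2} \cdot 1 \cdot 1 \cdot 1 \cdots = -\frac{1}{4\pi^2}x^2$, etc.)
And now, compare the two expansions (1) and (4) of $\frac{\sin x}{x}$. In particular, compare the coefficients on $x^2$:

$$-\frac{1}{6} = -\frac{1}{\pi^2} - \frac{1}{4\pi^2} - \frac{1}{9\pi^2} - \dots$$

Rearranging:

$$1 + \frac{1}{4} + \frac{1}{9} + \dots = \frac{1}{1^2} + \frac{1}{2^2} + \frac{1}{3^2} + \dots = \frac{\pi^2}{6}.$$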
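
Neither of Euler’s two key claims is proven above, but both are easy to probe numerically. The sketch below (with made-up function names) computes a partial sum of 1/n² and compares it with π²/6, and also compares a partial product of the factors (1 − x²/(n²π²)) with sin x / x at x = 1. These are finite computations, so they can only suggest the results, not establish them.

```python
import math


def basel_partial_sum(terms):
    """Sum of 1/n^2 for n = 1, ..., terms."""
    return sum(1 / (n * n) for n in range(1, terms + 1))


def sinc_product(x, factors):
    """Product of (1 - x^2 / (n^2 pi^2)) for n = 1, ..., factors.

    Each factor combines the pair (1 - x/(n pi)) (1 - x/(-n pi)).
    """
    product = 1.0
    for n in range(1, factors + 1):
        product *= 1 - x * x / (n * n * math.pi ** 2)
    return product


print(basel_partial_sum(100_000), math.pi ** 2 / 6)
print(sinc_product(1.0, 10_000), math.sin(1.0) / 1.0)
```

In each printed pair, the two numbers should agree to roughly four or five decimal places. Convergence is slow: the partial sum of the first N terms falls short of π²/6 by about 1/N.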

Remark 109. The only defect in the above “solution” is that, as discussed above, the infinite-product formula (3) has not been rigorously proven and requires additional justification.

The Weierstrass Factorization Theorem. According to Turner (2013), this was first proven in 1876.
76.16. The Riemann Hypothesis (fun, optional)
For a complex number s whose real part is greater than 1 (i.e. Re s > 1), the Riemann zeta function, denoted ζ, is defined by:

$$\zeta(s) = \sum_{n=1}^{\infty} \frac{1}{n^s}.$$

And so for example:

$$\zeta(2) = \frac{1}{1^2} + \frac{1}{2^2} + \frac{1}{3^2} + \dots$$

Observe then that the Basel Problem is simply the problem of finding ζ (2). By comparing the coefficients on $x^2$ in the two expansions (1) and (4) of $\frac{\sin x}{x}$ in the previous subchapter, we found that:

$$\zeta(2) = \frac{\pi^2}{6}.$$

It turns out that by similarly comparing the coefficients on $x^4$ in (1) and (4), we can, with somewhat more work and algebra, find ζ (4):

$$\zeta(4) = \frac{1}{1^4} + \frac{1}{2^4} + \frac{1}{3^4} + \dots = \frac{\pi^4}{90}.$$
We can similarly find ζ (s) for any positive even integer s. Indeed, in his 1740 paper, Euler
gave the following:
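$$\begin{aligned}
\zeta(6) &= \frac{1}{1^6} + \frac{1}{2^6} + \frac{1}{3^6} + \dots = \frac{\pi^6}{945}, \\
\zeta(8) &= \frac{1}{1^8} + \frac{1}{2^8} + \frac{1}{3^8} + \dots = \frac{\pi^8}{9\,450}, \\
\zeta(10) &= \frac{1}{1^{10}} + \frac{1}{2^{10}} + \frac{1}{3^{10}} + \dots = \frac{\pi^{10}}{93\,555}, \\
\zeta(12) &= \frac{1}{1^{12}} + \frac{1}{2^{12}} + \frac{1}{3^{12}} + \dots = \frac{691\pi^{12}}{638\,512\,875}.
\end{aligned}$$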
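
These closed forms look implausible enough that it is worth checking them against brute-force partial sums. Here is a minimal sketch that does so. The names CLOSED_FORMS and zeta_partial are our own, and the cut-off of 10 000 terms is an arbitrary choice.

```python
import math

# The values of zeta at even integers quoted above.
CLOSED_FORMS = {
    2: math.pi ** 2 / 6,
    4: math.pi ** 4 / 90,
    6: math.pi ** 6 / 945,
    8: math.pi ** 8 / 9450,
    10: math.pi ** 10 / 93555,
    12: 691 * math.pi ** 12 / 638512875,
}


def zeta_partial(s, terms=10_000):
    """Sum of 1/n^s for n = 1, ..., terms.

    The smallest terms are added first to limit floating-point rounding.
    """
    return sum(1 / n ** s for n in range(terms, 0, -1))


for s, closed_form in CLOSED_FORMS.items():
    print(s, zeta_partial(s), closed_form)
```

For s = 4 and above, the partial sum and the closed form should agree to many decimal places. For s = 2 the agreement is only to about four decimal places, because that series converges much more slowly.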
We can now state the Riemann Hypothesis, which is considered by many mathematicians to
be the most important unsolved problem in mathematics. It is one of the seven Millennium
Prize Problems,319 each of which carries a prize of US$1M.
Above we defined the Riemann zeta function ζ only for s such that Re s > 1. It turns out though that ζ actually has domain C ∖ {1}, that is, all complex numbers except 1. To define ζ on other complex numbers, we need to use something called an analytic continuation, which is somewhat beyond the scope of H2 Maths and which we shan’t go into.
But assuming we have defined ζ (s) for every s ∈ C ∖ {1}, we can then state the Riemann Hypothesis, which is simply the following claim:
The real part of any (non-trivial) root of ζ is 1/2.
Issued by the Clay Mathematics Institute in 2000.
It turns out that if s is a negative even integer, then ζ (s) = 0. So, the trivial roots of ζ are
the negative even integers.
We can thus also state the Riemann Hypothesis as:

If s is not a negative even integer and ζ (s) = 0, then Re s = 1/2.
In September 2018, the renowned 89-year-old British mathematician Michael Atiyah claimed
to have solved the Riemann Hypothesis. There was much initial scepticism.320 Time will
tell if he’s actually correct.
As of late 2018, only one of the seven Millennium Prize Problems — the Poincaré Con-
jecture — has been officially solved. It was solved in 2003 by the Russian mathematician
Grigori Perelman (b. 1966). Perelman was officially awarded the US$1M prize in 2010, but
rejected it, stating:
I’m not interested in money or fame. I don’t want to be on display like
an animal in a zoo. I’m not a hero of mathematics. I’m not even that
successful; that is why I don’t want to have everybody looking at me.321

See e.g. this Science Magazine story.
See e.g. this 2010 BBC story.
So far in Part V, we’ve been looking at differential calculus. In the remainder of Part V, we’ll look instead at integral calculus.


77. Integration

Differentiation, or the problem of finding the derivative, is the problem of finding the gradient of a curve.

Integration, or the problem of finding the definite integral, is the problem of finding the area under a curve.

Example 1002. Graphed below is the function f ∶ [0, 9] → R defined by $f(x) = \sqrt{x} + 1$.

Figure to be
inserted here.

The definite integral of f from 0 to 1 is the number equal to the red area and may be denoted:

$$\int_0^1 f(x)\,dx \quad\text{or}\quad \int_0^1 f\,dx \quad\text{or}\quad \int_0^1 f.$$

The definite integral of f from 2 to 4 is the number equal to the blue area and may be denoted:

$$\int_2^4 f(x)\,dx \quad\text{or}\quad \int_2^4 f\,dx \quad\text{or}\quad \int_2^4 f.$$

How can we compute the red or blue areas? Right now, we have no idea. But in the next
subchapter, we’ll revisit this question.


Example 1003. Graphed below is the function g ∶ R → R defined by g (x) = x − 2.

Figure to be
inserted here.

The definite integral of g from 0 to 1 is the number equal to the red area and may be denoted:

$$\int_0^1 g(x)\,dx \quad\text{or}\quad \int_0^1 g\,dx \quad\text{or}\quad \int_0^1 g.$$

The definite integral of g from 2 to 4 is the number equal to the blue area and may be denoted:

$$\int_2^4 g(x)\,dx \quad\text{or}\quad \int_2^4 g\,dx \quad\text{or}\quad \int_2^4 g.$$

Thanks to primary-school geometry, in this example, we know how to compute the red and blue areas:

$$\int_0^1 g(x)\,dx = \int_0^1 g\,dx = \int_0^1 g = \frac{\text{Base} \times \text{Height}}{2} = \frac{1 \times 1}{2} = \frac{1}{2},$$

$$\int_2^4 g(x)\,dx = \int_2^4 g\,dx = \int_2^4 g = \frac{\text{Base} \times \text{Height}}{2} = \frac{2 \times 2}{2} = 2.$$

You are probably most familiar with the following piece of notation:

$$\int_a^b f(x)\,dx.$$

We call:
• The symbol ∫ the integral sign (it is simply an elongated S);
• The numbers a and b the lower and upper limits of integration;
• The function f to be integrated the integrand; and
• The symbol dx the differential of the variable x — it tells us that the independent
variable is denoted x.
Notice though that, as usual, x is merely a dummy variable that can be replaced with
any other symbol. When describing the definite integral of f from a to b, what matters are
the function f and the lower and upper limits a and b. The symbol we use to denote the
independent variable doesn’t really matter — it is customarily x but could be any other
symbol like y, t, u, or even ☺.
And so, the “(x)” and even “dx” are somewhat superfluous. We could equally well denote the definite integral $\int_a^b f(x)\,dx$ as:

$$\int_a^b f\,dx \quad\text{or}\quad \int_a^b f.$$

Nonetheless, as we’ll see later when we’re dealing with more than one variable, the “(x)”
and “dx” can help us avoid confusion.
As we saw earlier (Ch. 69.4), in differential calculus, there are (at least) three commonly-
used types of notation:322

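$$\begin{aligned}
\text{The derivative of } f &= f' && \text{(Lagrange’s notation)} \\
&= \frac{df}{dx} && \text{(Leibniz’s notation)} \\
&= \dot{f}. && \text{(Newton’s notation)}
\end{aligned}$$

In contrast, in integral calculus, Leibniz’s notation is the only one that is still commonly used:

$$\int_a^b f(x)\,dx \quad\text{or}\quad \int_a^b f\,dx \quad\text{or}\quad \int_a^b f.$$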

Fun Fact

Newton denoted the derivative of f with a “dot above” — f˙.

To denote the integral of f , he similarly placed a vertical line above — f . But as just
mentioned, Newton’s notation for integration is rarely (never?) used.

We will not give a formal definition of the definite integral in the main text of this
textbook.323 Nonetheless, just to provide a little clarity and precision, here’s an informal
definition anyway:

Informal Definition of the Definite Integral

Let a, b ∈ R with a < b and f ∶ [a, b] → R be a continuous function. Then the definite
integral of f from a to b is denoted:

$$\int_a^b f(x)\,dx \quad\text{or}\quad \int_a^b f\,dx \quad\text{or}\quad \int_a^b f;$$

and is the area bounded by f , the x-axis, and the vertical lines x = a and x = b.
As already mentioned, we call the symbol ∫ the integral sign; the numbers a and b the
lower and upper limits of integration; the function f to be integrated the integrand; and
the symbol dx the differential of the variable x.
In n. 285, we also mentioned a fourth type of notation due to Euler.
But if you’re interested, see Ch. 121.13 (Appendices).
The above definition is considered informal because we haven’t formally defined what the
“area” bounded by a curve and three straight lines is, or how we can compute it. In the
next subchapter, we will make a sketch of how this might be done.
By the way, in the above definition, we define the definite integral of f from a to b only in the case where a < b. We will find it convenient to also define the definite integral in those cases where (a) the two limits are equal; and (b) the upper limit is smaller than the lower limit:

Definition 181. Let a, b ∈ R with a < b and f ∶ [a, b] → R be a continuous function. Suppose c ∈ [a, b]. Then:
(a) The definite integral of f from c to c is denoted $\int_c^c f$ and is defined to be equal to zero:

$$\int_c^c f = 0.$$

(b) The definite integral of f from b to a is denoted $\int_b^a f$ and is defined to be the additive inverse of $\int_a^b f$:

$$\int_b^a f = -\int_a^b f.$$


77.1. An Important Warning
In O Level A Maths, you may have been taught that integration is, by definition, simply
the inverse of differentiation.324 This approach is incorrect, confusing, and detrimental to
your understanding of why the Fundamental Theorems of Calculus have any substance.325
To repeat:

Differentiation is the problem of finding the gradient of a curve.

Integration is the problem of finding the area under a curve.

And so, there is, a priori,326 no relationship whatsoever between differentiation and integ-
ration. There is, a priori, no reason to believe that the gradient of a curve has anything to
do with the area under that same curve.
That there is a relationship is established only with the two Fundamental Theorems of
Calculus (FTCs), which tell us that, very surprisingly:

Integration and differentiation are inverse operations.

This, it must be stressed, is a very surprising result. There is no reason to have expected
that the gradient of a curve is somehow related to the area under the curve — much less
that these two operations are inverses of each other.
We will now work our way towards the first Fundamental Theorem of Calculus (FTC1).
Don’t worry, we’ll omit most technical details. The goal here is merely to provide you
with some intuition and hence a better understanding of why the FTCs work and why
differentiation and integration turn out to be inverses.

Indeed, on your 4047 A Maths syllabus (and again on your H2 Maths syllabus), the very first mention
of integration states, “integration as the reverse of differentiation”.
According to Abbott (2015, pp. 215–6):
Historically, the concept of integration was defined as the inverse process of differ-
entiation. ... A very interesting shift in emphasis occurred around 1850 in the work
of Cauchy, and soon after in the work of Bernhard Riemann. The idea was to com-
pletely divorce integration from the derivative and instead use the notion of “area
under the curve” as a starting point for building a rigorous definition of the integral.
The latter, modern approach is the one this textbook shall follow, not least for pedagogical reasons.
See also this MathEducators.SE discussion.
A priori is just a fancy Latin phrase for beforehand.
77.2. A Sketch of How We Can Find the Area under a Curve

Example 1004. Define f ∶ [0, 9] → R by $f(x) = \sqrt{x} + 1$.

Figure to be
inserted here.

Consider the definite integral of f from 0 to 4. This quantity corresponds to the green area and may be denoted:

$$\int_0^4 f(x)\,dx \quad\text{or}\quad \int_0^4 f\,dx \quad\text{or}\quad \int_0^4 f.$$

We shall now try to compute this quantity.

Unsure of how to proceed, we start by trying a crude approximation.

Consider the rectangle with base 4 and height $f(0) = \sqrt{0} + 1 = 1$. If we denote its area by $L_1$, then we have:

$$L_1 = 4 \times f(0) = 4.$$

Next, consider the rectangle with base 4 and height $f(4) = \sqrt{4} + 1 = 3$. If we denote its area by $U_1$, then we have:

$$U_1 = 4 \times f(4) = 12.$$

Evidently, the green area is somewhere between these two quantities. That is:

$$L_1 \le \int_0^4 f \le U_1 \quad\text{or}\quad 4 \le \int_0^4 f \le 12.$$

In other words, $L_1 = 4$ and $U_1 = 12$ serve as lower and upper bounds for what the green area can be.
Can we do better than this? Sure. One obvious possibility is to use more rectangles.

Construct two rectangles, each with base 2, but one with height f (0) = 1 and the other with height $f(2) = \sqrt{2} + 1$. Let us call the total area of these two rectangles the lower sum and denote it by $L_2$. Then we have:

$$L_2 = 2 \times [f(0) + f(2)] = 4 + 2\sqrt{2} \approx 6.828.$$

Next, construct two rectangles, each with base 2, but one with height $f(2) = \sqrt{2} + 1$ and the other with height f (4) = 3. Let us call the total area of these two rectangles the upper sum and denote it by $U_2$. Then we have:

$$U_2 = 2 \times [f(2) + f(4)] = 8 + 2\sqrt{2} \approx 10.828.$$
Exercise 331. Continuing with the above example, let each of $L_4$ and $U_4$ be the total area of four rectangles, where $L_4$ and $U_4$ serve as lower and upper bounds of $\int_0^4 f$. Find $L_4$ and $U_4$. Are they improvements over $L_2$ and $U_2$?
Repeat all of the above, but now for L8 and U8 . (Answer on p. 807.)
A331. xxx

Figure to be
inserted here.

Sketched in the above example and exercise is the main idea underlying integration:

The area under a curve may be approximated by infinitely many, infinitely thin rectangles.

More precisely, integration (or the procedure of computing the area under a curve) uses
these four steps:
1. We first divide the area under the curve into n thin rectangles, with each rectangle lying
entirely below the curve. We call the sum of these rectangles’ areas the lower sum Ln .
2. We again divide the area under the curve into n thin rectangles, but this time each
rectangle lies entirely above the curve. We call the sum of these rectangles’ areas the
upper sum Un .
3. Observe that for every n, we have:

$L_n \le \text{Area} = \int f \le U_n$.

4. By letting n → ∞, we have:

$\lim_{n\to\infty} L_n = \text{Area}$ and $\lim_{n\to\infty} U_n = \text{Area}$.

Or: $\lim_{n\to\infty} L_n = \text{Area} = \lim_{n\to\infty} U_n$.
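The four steps above can be tried out numerically. The sketch below assumes the example's function is f(x) = √x + 1 on [0, 4] (our reading of the example) and that f is increasing, so that left endpoints give the lower sum and right endpoints give the upper sum. The helper name `lower_upper_sums` is ours, for illustration only.

```python
import math

def f(x):
    # Assumed example function: f(x) = sqrt(x) + 1.
    return math.sqrt(x) + 1

def lower_upper_sums(func, a, b, n):
    """Return (L_n, U_n) for an INCREASING function func on [a, b].

    Each of the n strips has the same width. Because func is increasing,
    its smallest value on a strip is at the left endpoint and its largest
    value is at the right endpoint.
    """
    width = (b - a) / n
    lower = sum(func(a + i * width) for i in range(n)) * width
    upper = sum(func(a + (i + 1) * width) for i in range(n)) * width
    return lower, upper

# As n grows, the two sums squeeze towards the same number (the area).
for n in (1, 2, 4, 8, 1000):
    L, U = lower_upper_sums(f, 0, 4, n)
    print(n, round(L, 3), round(U, 3))
```

For a function that is not increasing, the lowest and highest points of each strip would have to be found some other way, which is one of the technical details glossed over here.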

Of course, to properly, rigorously, and precisely define integration (or the procedure of
computing the area under a curve), there are some technical details that need to be filled
in. For example, how exactly are the lower and upper sums Ln and Un defined?
But for H2 Maths, we needn’t worry about these technical details and the above explanation will more than suffice. (But see Ch. 121.13 in the Appendices if you’re interested.) For a recent A-Level exam question that requests an explanation of integration, see Exercise 575 (N2015-I-3).
77.3. Some Basic Rules of Integration

Theorem 25. Let a, b, c, d, e ∈ R with a < c < b. Suppose f, g ∶ [a, b] → R are continuous functions. Then:

(a) $\int_a^b (f \pm g) = \int_a^b f \pm \int_a^b g$. (Sum and Difference Rules)

(b) $\int_a^b f = \int_a^c f + \int_c^b f$. (Adjacent Intervals Rule)

(c) $\int_a^b (df) = d \int_a^b f$. (Constant Factor Rule)

(d) $\int_a^b d = (b - a)\,d$. (Constant Rule)

(e) If f ≥ g on [a, b], then $\int_a^b f \ge \int_a^b g$. (Comparison Rule I)

(f) If d ≤ f ≤ e on [a, b], then $(b - a)\,d \le \int_a^b f \le (b - a)\,e$. (Comparison Rule II)


Proof. For the formal proofs of (a)–(?), see p. 1356 in the Appendices. Here we give some
informal proofs:
(a) Consider the area under the graph obtained by taking the sum (or difference) of f and
g. This area must be equal to the sum (or difference) of the areas under the graphs of f
and g.

Figure to be
inserted here.

(b) The area under the graph of f from a to c is $\int_a^c f$. The area from c to b is $\int_c^b f$. And so “obviously”, the area from a to b, or $\int_a^b f$, is the sum of those first two quantities.

Figure to be
inserted here.

(c) Stretch f outwards from the x-axis by a factor d. The area under the graph thus obtained must be d times the area under the graph of f.

Figure to be
inserted here.

(d) “Clearly”, $\int_a^b d$ is simply the area of a rectangle with base b − a and height d. So:

$\int_a^b d = (b - a)\,d$.

Figure to be
inserted here.

(e) If f is everywhere on or above g, then the area under f must be no less than that under g.

Figure to be
inserted here.

(f) The numbers d and e serve as lower and upper bounds for f on the relevant interval [a, b]. And so “obviously”, $\int_a^b f$, the area under the graph of f from a to b, is bounded from below and above by the areas of the rectangles with base b − a and heights d and e.

Figure to be
inserted here.

It is actually not difficult to formally prove (f) and you are asked to do so in Exercise 332.


Exercise 332. Prove Theorem 25(f) Comparison Rule II by following these steps. Define
the functions F, G ∶ [a, b] → R by F (x) = d and G (x) = e. (Answer on p. 810.)
(a) What are $\int_a^b F$ and $\int_a^b G$? (Hint: Use the Constant Rule.)
(b) What can we say about $\int_a^b f$, $\int_a^b F$, and $\int_a^b G$? (Hint: Use Comparison Rule I.)
Hence complete the proof of Comparison Rule II.

Exercise 333. Let f ∶ [a, b] → R be a continuous function. Suppose f ≥ 0 on [a, b]. Prove
that $\int_a^b f \ge 0$. (Hint: Define F ∶ [a, b] → R by F (x) = 0.)
(Answer on p. 810.)

A332(a) By the Constant Rule, $\int_a^b F = (b - a)\,d$ and $\int_a^b G = (b - a)\,e$.

(b) By Comparison Rule I, we have $\int_a^b F \le \int_a^b f \le \int_a^b G$. Hence, $(b - a)\,d \le \int_a^b f \le (b - a)\,e$.

A333. Following the hint, we define F ∶ [a, b] → R by F (x) = 0. By the Constant Rule, $\int_a^b F = (b - a) \cdot 0 = 0$. Since f ≥ F = 0 on [a, b], by Comparison Rule I, $\int_a^b f \ge \int_a^b F = 0$.


77.4. The First Fundamental Theorem of Calculus (FTC1)
In the previous subchapter, we sketched the main idea underlying integration:

The area under a curve may be approximated

by infinitely many, infinitely thin rectangles.

However, we have not actually explained how we can solve the following problem:

How do we actually find the area of these “infinitely many, infinitely thin rectangles”? (△)

Instead of tackling the above problem directly, we will now, somewhat strangely, take an indirect approach. We will instead try to answer the following question:

What is the derivative of a definite integral? (Q)

Now, at this point, we have no idea how to find a definite integral. And so, the above
question seems akin to asking someone who has no idea where Singapore is to locate the
Nonetheless and somewhat surprisingly, it turns out that the seemingly-indirect question (Q) is easier to answer than (△) and will enable us to find definite integrals.
Indeed, the answer to (Q) is precisely the First Fundamental Theorem of Calculus (FTC1)! It is this:

The derivative of a definite integral is the function itself! (A)

We will now try to work towards understanding why (A), which is an informal statement of the FTC1, might be true.
We begin by noting that (Q) and (A) are a little imprecise when they speak of “the derivative of a definite integral”. We defined a definite integral to be a number. But we know that only functions can have derivatives and so it makes no sense to speak of the derivative of a number. So let us now define a function based on definite integrals:

Definition 182. Given the continuous function f ∶ [a, c] → R and any b ∈ [a, c], we define a new function g ∶ [b, c] → R, called the definite integral of f from b, by:

$g(x) = \int_b^x f(t)\,dt$.

In words, the function g takes each x ∈ [b, c] and maps it to the number that is equal to the area under f, bounded by the x-axis and the vertical lines at b and x.


Example 1005. Define f ∶ [−2, 5] → R by $f(x) = x^2$.
Then the definite integral of f from −2 is the function g ∶ [−2, 5] → R defined by:

$g(x) = \int_{-2}^x f(t)\,dt$.



We have, for example:

Figure to be
inserted here.

Similarly, the definite integral of f from 0 is the function h ∶ [0, 5] → R defined by:

$h(x) = \int_0^x f(t)\,dt$.


We have, for example:

Figure to be
inserted here.

Remark 110. As usual, take care to note that t in the above Definition is simply a dummy variable that can be replaced by any other symbol.
Also, the equation in the above Definition could have been written more simply as:

$g(x) = \int_b^x f\,dt$, or $g(x) = \int_b^x f$.


Example 1006. As before, define f ∶ [0, 9] → R by $f(x) = \sqrt{x} + 1$.

Figure to be
inserted here.

Let g be the definite integral of f from 0. That is, define g ∶ [0, 9] → R by:

$g(x) = \int_0^x f$.

Then we have, for example:

$g(2) = \int_0^2 f$, $g(4) = \int_0^4 f$, $g(7) = \int_0^7 f$.

We now ask:

What is the derivative of g?

To answer this question, let us return to one of the possible intuitive interpretations of
the derivative:

The derivative addresses the question,

“Given a small unit change in the independent variable,
by how much does the value of the function change?”

We now try to find g′(4), the derivative of g at 4. To do so, suppose x is a quantity slightly greater than 4.
The areas of the [COLOR XXX?] and [COLOR XXX?] rectangles depicted below are:

(x − 4) f (4) and (x − 4) f (x)

Observe that the thin [COLOR XXX?] area is g (x) − g (4) and is bounded by the above
two areas. That is:

(x − 4) f (4) ≤ g (x) − g (4) ≤ (x − 4) f (x) .

Figure to be
inserted here.

Dividing through by x − 4 > 0, we get:

$f(4) \le \frac{g(x) - g(4)}{x - 4} \le f(x)$.

As x → 4, the right-hand side f (x) approaches f (4) (because f is continuous), so the middle term is squeezed towards f (4). This suggests that g′(4) = f (4).


Theorem 26. (First Fundamental Theorem of Calculus [FTC1]) Let f ∶ [a, b] → R be continuous. Suppose g ∶ [a, b] → R is defined by:

$g(x) = \int_a^x f$.

Then g′ = f.

Proof. See p. 1358 in the Appendices.
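
Although the proof is left to the Appendices, we can at least check the FTC1 numerically. The sketch below is only an illustration under our own assumptions: it takes f(t) = √t + 1 (our reading of the earlier example's function), approximates g by the midpoint rule, and estimates g′ by a central difference. The names `g` and `slope_of_g` are ours.

```python
import math

def f(t):
    # Assumed example function: f(t) = sqrt(t) + 1 on [0, 9].
    return math.sqrt(t) + 1

def g(x, steps=20_000):
    """Approximate g(x) = integral of f from 0 to x using the midpoint rule."""
    width = x / steps
    return sum(f((i + 0.5) * width) for i in range(steps)) * width

def slope_of_g(x, h=1e-3):
    """Central-difference estimate of the derivative g'(x)."""
    return (g(x + h) - g(x - h)) / (2 * h)

# If the FTC1 holds, the last two columns should (nearly) agree.
for x in (2, 4, 7):
    print(x, slope_of_g(x), f(x))
```

This is of course no substitute for the proof: it merely shows that, at a few sample points, the slope of the "area so far" function matches the height of f.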

Remark 111. The FTC1 establishes that differentiation and integration are inverse
operations. Again, we must stress, emphasise, and repeat that this is a genuinely
surprising result and should not be taken for granted. In particular, we should not
assume that integration is by definition the inverse of differentiation. Instead, we should
be acutely aware that this is a surprising finding established only by the FTC1.

We can illustrate the FTC1 using a familiar example from physics:

Definition 182.
This step is formally justified by the Order Limit Theorem (Appendices).
Example 1007. A car is moving. Below we graph its velocity v (m s⁻¹) as a function of time t (s).

Figure to be
inserted here.

Recall that the distance d (m) travelled by the car is the area under the graph.
• For example, after 5 s, the distance travelled by the car is $\int_0^5 v\,dt$.
• And after 8 s, the distance travelled by the car is $\int_0^8 v\,dt$.
• In general, after x s, the distance travelled by the car is $\int_0^x v\,dt$.

But we already know that:

The derivative of distance w.r.t. time is velocity.

That is: d′ = v.

What we’ve just shown is precisely the FTC1:

The derivative of the area under a function’s graph is the function itself.

So far, we haven’t actually computed the area under any curve. We shall do so in Ch. 79,
where the Second Fundamental Theorem of Calculus (FTC2) is introduced.


78. Antidifferentiation
Antidifferentiation is simply the operation that is the inverse of differentiation.

Definition 183. Let F and f be functions. Suppose the derivative of F is f (i.e. F ′ = f ).

Then we call F an antiderivative (or primitive or indefinite integral) of f and write:

F = ∫ f.

Thus, the following two statements are exactly equivalent:

F =∫ f ⇐⇒ F′ = f.

Example 1008. Define the functions f, F ∶ R → R by:

$f(x) = 2x$ and $F(x) = x^2$.

Observe that the derivative of F is f and we may write F′ = f.

And so, by Definition 183, F is an antiderivative of f and we may write F = ∫ f.
Earlier, with Leibniz’s differentiation notation, we wrote things like:

$\frac{d}{dx} x^2 \overset{1}{=} 2x$.

We understood that $\overset{1}{=}$ was simply shorthand for the following, more long-winded statement:

Given a function with mapping rule $x \mapsto x^2$, its derivative is a function with mapping rule $x \mapsto 2x$.

Here likewise, we shall also write things like:

$\int 2x\,dx \overset{2}{=} x^2$,

with the understanding that $\overset{2}{=}$ is simply shorthand for the following, more long-winded statement:

Given a function with mapping rule $x \mapsto 2x$, one of its antiderivatives is a function with mapping rule $x \mapsto x^2$.

Remark 112. In this textbook, we will treat the terms antiderivative, primitive, and
indefinite integral as synonyms. We will also treat the terms antidifferentiation
and indefinite integration as synonyms.

Some writers choose to maintain a very slight and subtle distinction between these three terms. See for
Example 1009. XXX

Example 1010. XXX

Remark 113. Again, let us stress, emphasise, and repeat that a priori, there is no relationship whatsoever between the definite integral $\int_a^b f$ and the antiderivative or indefinite integral $\int f$.

The symbol $\int_a^b f$ denotes the area under the graph of f, between a and b. In contrast, the symbol $\int f$ denotes any function whose derivative happens to be the function f.
It is only through the two FTCs that we establish that, surprisingly enough:
• There is a relationship between integration (or definite integration) and antidifferentiation (or indefinite integration),
• And moreover, the two turn out to be the “same thing”.
One reason why students are often confused into thinking that there is some obvious, definitional relationship between the definite and indefinite integrals is that they have almost identical names and notation. And so, to reduce this source of confusion, I will often prefer to use the terms antiderivative and antidifferentiation instead of indefinite integral and indefinite integration. This helps to constantly remind students that antidifferentiation (or indefinite integration) is, by definition, the inverse of differentiation and, a priori, has nothing to do with integration.

example Hagen von Eitzen’s answer at . My view is that even if such a distinction were useful, it is
so subtle as to be more confusing than clarifying (especially at this introductory level). And so, this
textbook shall simply treat these three terms as synonyms.
78.1. The Antiderivative Is Not Unique ...

Example 1011. Define the four functions f, F, G, H ∶ R → R by:

$f(x) = 2x$, $F(x) = x^2$, $G(x) = x^2 + 5$, $H(x) = x^2 - 9$.

Observe that the derivative of each of F , G, and H is f . That is:

F′ = f, G′ = f , H′ = f .

And so, F , G, and H are all antiderivatives of f :

F = ∫ f, G = ∫ f, H = ∫ f.

The above example shows that the antiderivative is not unique. In general:

Fact 150. Suppose the function f ∶ D → R has the antiderivative F . Let C ∈ R. If

G ∶ D → R is defined by G (x) = F (x) + C, then G is also an antiderivative of f .

Proof. For all x ∈ D, we have F ′ (x) = f (x) and hence also G′ (x) = F ′ (x)+C ′ = F ′ (x)+0 =
f (x). We have just shown that the derivative of G is f and thus that G is also an
antiderivative of f .

We call C the constant of integration (COI).
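
As a quick sanity check of Example 1011 and Fact 150, the sketch below estimates the derivatives of F, G, and H numerically and compares them with f. The helper `derivative` is our own (a central-difference estimate), not something defined in this textbook.

```python
def derivative(func, x, h=1e-5):
    """Central-difference estimate of func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)

def f(x):
    return 2 * x

# The three antiderivatives from Example 1011: they differ only by a constant.
antiderivatives = {
    "F": lambda x: x**2,
    "G": lambda x: x**2 + 5,
    "H": lambda x: x**2 - 9,
}

for name, func in antiderivatives.items():
    for x in (-3.0, 0.0, 1.5):
        # Adding a constant shifts the graph up or down but leaves the slope alone.
        print(name, x, derivative(func, x), f(x))
```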

78.2. ... But It Is Unique Up to a COI

It turns out that the converse of Fact 150 is also true. That is, although antiderivatives are
not unique, they are unique up to a constant (of integration):

Fact 151. Let D be an interval. Suppose the function f ∶ D → R has the antiderivative F .
If G ∶ D → R is also an antiderivative of f , then G may be defined by G (x) = F (x) + C,
where C is some real number.

Proof. Define H ∶ D → R by H (x) = G (x) − F (x).

For all x ∈ D, we have F′(x) = G′(x) = f (x) and hence H′(x) = G′(x) − F′(x) = 0.
By Proposition 6 then, H is constant on D. That is, there exists C ∈ R such that for all x ∈ D, H (x) = G (x) − F (x) = C or equivalently G (x) = F (x) + C.

For future reference, let’s combine the above two results:


Corollary 32. Suppose D is an interval, f, G ∶ D → R are functions, and F is an
antiderivative of f . Then:

G is an antiderivative of f ⇐⇒ There exists C ∈ R such that

G (x) = F (x) + C for all x ∈ D.

Example 1012. Consider the function f ∶ R → R defined by f (x) = 2x.

We know that the function F ∶ R → R defined by $F(x) = x^2$ is an antiderivative of f.
That is:

F′ = f, or equivalently F = ∫ f.

Suppose we are told that the function G ∶ R → R is also an antiderivative of f , i.e.

G = ∫ f . Then by Corollary 32, there exists C ∈ R such that:

$G(x) = F(x) + C = x^2 + C$ for all x ∈ R.

Example 1013. Let f be a function. Suppose F is an antiderivative of f and F has

mapping rule:

F (x) = sin (ex −3x+5) for all x.

Suppose G is also an antiderivative of f. Then even without any more information, we know that there exists C ∈ R such that:

G (x) = F (x) + C = sin (ex −3x+5) + C for all x.

Exercise 334. Define f ∶ R → R by f (x) = x − 3. (Answer on p. 1542.)

(a) Find three antiderivatives of f .
(b) Suppose you’re told that some function A ∶ R → R is also an antiderivative of f .
Then how must the function A be related to each of the three functions you found
in (a)?
Exercise 335. Define the functions f, F, G ∶ R → R by $f(x) = 4 \sin 4x$, $F(x) = -\cos 4x$, and $G(x) = 8 \sin^2 x \cos^2 x$.
(a) Show that F and G are both antiderivatives of f .
(b) The functions F and G seem to be very different. Yet both are antiderivatives of f .
Why does this not contradict our assertion that “antiderivatives are unique up to a
constant”? (Answer on p. 1542.)


78.3. How, Precisely, Should We Use the Antidifferentiation Symbol ∫ ?

On p. 682, we carefully and pedantically explained how, precisely, we should use Leibniz’s differentiation notation, in particular the symbol $\frac{d}{dx}$.
We now do likewise for Leibniz’s antidifferentiation notation, in particular the symbol ∫ .

Example 1014. Consider the statement “$\int 2x\,dx = x^2 + C$”.

It is simply shorthand for the following precise but long-winded statement:
“Consider a function with mapping rule $x \mapsto 2x$. Its antiderivatives are exactly those functions with mapping rule $x \mapsto x^2 + C$ for any real number C.”

Example 1015. Consider the statement “∫ sin x dx = − cos x + C”.

It is simply shorthand for the following precise but long-winded statement:
“Consider a function with mapping rule x ↦ sin x. Its antiderivatives are exactly those functions with mapping rule x ↦ − cos x + C for any real number C.”

In general, any of the following three (equivalent) statements:

∫ f (x) dx = g (x) + C or ∫ f dx = g + C or ∫ f =g+C

is simply shorthand for the following precise but long-winded statement:

“Given a function with the mapping x ↦ f (x), its antiderivatives are exactly those functions with the mapping x ↦ g (x) + C, where C is some real number.”


78.4. Rules of Antidifferentiation
We should first mention that every continuous function has an antiderivative:

Theorem 27. If f ∶ [a, b] → R is continuous, then f has an antiderivative.

Proof. Define g ∶ [a, b] → R by $g(x) = \int_a^x f$.

By the FTC1, g ′ = f . So, g is an antiderivative of f .

We first give quick examples to illustrate several Rules of Antidifferentiation. We then

state these formally as Theorem 28 on the next page:

Example 1016. (Constant Rule) $\int 5\,dx = 5x + C$, where as usual, C denotes the COI.

Example 1017. (Power Rule) $\int x^{17}\,dx = \frac{1}{18} x^{18} + C$.

Example 1018. (Power Rule) $\int x^{-4}\,dx = -\frac{1}{3} x^{-3} + C = -\frac{1}{3x^3} + C$.

Example 1019. (Reciprocal Rule) $\int \frac{1}{x}\,dx = \ln |x| + C$.

Example 1020. (Exponential function) $\int \exp x\,dx = \exp x + C$.

Example 1021. (Sine) $\int \sin x\,dx = -\cos x + C$.

Example 1022. (Cosine) $\int \cos x\,dx = \sin x + C$.

Example 1023. (Sum Rule) $\int (\sin x + \exp x)\,dx = -\cos x + \exp x + C$.

Example 1024. (Difference Rule) $\int \left(\frac{1}{x} - \cos x\right) dx = \ln |x| - \sin x + C$.

Example 1025. (Constant Factor Rule) $\int 5x^{-4}\,dx = -\frac{5}{3x^3} + C$.

Example 1026. (LPC Rule) $\int \cos (2x + 3)\,dx = \frac{1}{2} \sin (2x + 3) + C$.

Example 1027. (LPC Rule) $\int \exp (2x + 3)\,dx = \frac{1}{2} \exp (2x + 3) + C$.

Example 1028. (LPC Rule) $\int (2x + 3)^{17}\,dx = \frac{1}{36} (2x + 3)^{18} + C$.


Theorem 28. (Rules of Antidifferentiation) Let a, b, k ∈ R with a < b and f, g ∶ [a, b] → R be continuous functions. Then:

(a) $\int k\,dx = kx + C$, (Constant Rule)

(b) $\int x^k\,dx = \frac{x^{k+1}}{k+1} + C$ (k ≠ −1 and x ≠ 0 if k < 0), (Power Rule)

(c) $\int \frac{1}{x}\,dx = \ln |x| + C$ (x ≠ 0), (Reciprocal Rule)

(d) $\int e^x\,dx = e^x + C$, (Exponential Rule)

(e) $\int \sin x\,dx = -\cos x + C$, (Sine)

(f) $\int \cos x\,dx = \sin x + C$, (Cosine)

(g) $\int (f \pm g) = \int f \pm \int g$, (Sum and Difference Rules)

(h) $\int kf = k \int f$, (Constant Factor Rule)

(i) $\int f(ax + b)\,dx = \frac{1}{a} \left(\int f\right)(ax + b)$ (a ≠ 0),

where, in each case, C denotes the constant of integration.

Remark 114. For lack of a better name, I shall call the last Rule the Linear Polynomial
Composition (LPC) Rule.
When written out formally, it looks complicated. But as illustrated by the last three
examples, it’s jolly simple and you will already have seen plenty of it in secondary school.
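
To see the LPC Rule in action, here is a minimal numerical sketch. It assumes the rule reads "if F is an antiderivative of f, then x ↦ F(ax + b)/a is an antiderivative of x ↦ f(ax + b), provided a ≠ 0". The helper names `derivative` and `lpc_antiderivative` are ours, introduced only for this illustration.

```python
import math

def derivative(func, x, h=1e-6):
    """Central-difference estimate of func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)

def lpc_antiderivative(F, a, b):
    """Given an antiderivative F of f, return x -> F(a*x + b) / a.

    By the LPC Rule (as we read it), this is an antiderivative of
    x -> f(a*x + b). The coefficient a must be non-zero.
    """
    if a == 0:
        raise ValueError("the LPC Rule needs a != 0")
    return lambda x: F(a * x + b) / a

# sin is an antiderivative of cos, so this should antidifferentiate cos(2x + 3).
candidate = lpc_antiderivative(math.sin, 2, 3)
for x in (-1.0, 0.0, 0.7):
    print(x, derivative(candidate, x), math.cos(2 * x + 3))
```

The division by a is exactly what undoes the extra factor of a that the Chain Rule produces when we differentiate F(ax + b).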

Remark 115. Just to be perfectly clear, let us stress, emphasise, and repeat what we
already said in the last subchapter.
Take for example the Constant Rule, which states:

∫ k dx = kx + C.

The above equation is simply shorthand for the following precise but long-winded statement:
Suppose a function has mapping rule x ↦ k (where k ∈ R). Then this function’s antiderivatives are exactly those functions whose mapping rule is x ↦ kx + C.

Proof. Here we will prove (or rather verify) only the (c) Reciprocal Rule. (You are asked to verify the remaining Rules in Exercise 336.)
In general, to verify that $\int f(x)\,dx = F(x) + C$, it suffices to verify that $\frac{d}{dx} (F(x) + C) = f(x)$.
(c) So, to verify that $\int \frac{1}{x}\,dx = \ln |x| + C$ (x ≠ 0), it suffices to verify that $\frac{d}{dx} (\ln |x| + C) = x^{-1}$ (for x ≠ 0).

We have: $\ln |x| + C = \begin{cases} \ln x + C & \text{for } x > 0, \\ \ln (-x) + C & \text{for } x < 0. \end{cases}$

Thus: $\frac{d}{dx} (\ln |x| + C) = \begin{cases} \dfrac{1}{x} & \text{for } x > 0, \\ \dfrac{-1}{-x} = \dfrac{1}{x} & \text{for } x < 0. \end{cases}$

That’s all there is to verifying that $\frac{d}{dx} (\ln |x| + C) = x^{-1}$ and hence also that $\int \frac{1}{x}\,dx = \ln |x| + C$ (for x ≠ 0)!

Remark 116. In the Reciprocal Rule, there is, annoyingly enough, an absolute value
sign. Take care to always include it. And no, it is not OK to simply drop it. For why
this isn’t OK, see the Remark following the answer to Exercise 337(c) or this discussion:

Remark 117. By Theorem 28(a), we have $\int 1\,dx \overset{1}{=} x + C$.

We will often write $\int 1\,dx$ more simply as $\int dx$. Thus, $\overset{1}{=}$ may also be rewritten as:

$\int dx = x + C$.


Exercise 336. Verify the remaining rules in Theorem 28. (Answer on p. 1542.)
Exercise 337. Write out the precise but long-winded statement for each of the rules in Theorem 28. You needn’t do (a), which was already done in the above Remark. (Answer on p. 1542.)

Exercise 338. Compare the Constant Factor Rule in Theorem 28 (Rules of Antidifferentiation) and the Constant Factor Rule in Theorem 25 (Rules of Integration). Aren’t these exactly the same thing? If not, explain why. (Answer on p. 1543.)

Exercise 339. Let a, b, c, d ∈ R be constants. Find the following antiderivatives (or

indefinite integrals). (Answer on p. 1543.)

(a) $\int (ax + b)\,dx$.

(b) $\int (ax^2 + bx + c)\,dx$.

(c) $\int (ax^3 + bx^2 + cx + d)\,dx$.

(d) $\int (ax + b)^c\,dx$ for c ≠ −1.

(e) $\int \frac{1}{ax + b}\,dx$.

(f) $\int [a \sin (bx + c) + d]\,dx$.

(g) $\int [a \exp (bx + c) + d]\,dx$.

(h) $\int [a \cos (bx + c) + d]\,dx$.


79. The Second Fundamental Theorem of Calculus (FTC2)
We are now, at long last, ready to compute actual areas under actual curves. To do so,
we will use the Second Fundamental Theorem of Calculus (FTC2). Informally, the
FTC2 states that:

$\int_a^b f = g(b) - g(a)$,
where g is any antiderivative of f .

That is, when asked to find the definite integral of f from a to b (i.e. the area under the
graph of f between a and b), we need merely follow these steps:
1. Find any antiderivative g of f .
2. Plug the lower and upper limits of the definite integral into g.
3. The difference g (b) − g (a) is our desired area.
Below we will formally state and prove the FTC2. But first, some examples to illustrate
how it works:

Example 1029. Define f ∶ R → R by $f(x) = x^2$. Suppose we are told to find $\int_0^3 f$, that is, the area under the graph of f, between 0 and 3.

Figure to be
inserted here.

To do so, we will follow the steps given above.

1. Find any antiderivative g of f .
An antiderivative of f is g ∶ R → R defined by $g(x) = \frac{1}{3} x^3$.
2. Plug the lower and upper limits of the definite integral into g.
We get $g(3) = \frac{1}{3} \cdot 3^3 = 9$ and $g(0) = \frac{1}{3} \cdot 0^3 = 0$.
3. The difference g (b) − g (a) is our desired area.

Thus: $\int_0^3 f = g(3) - g(0) = 9 - 0 = 9$.
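
We can compare the FTC2 answer in Example 1029 against a brute-force approximation that uses many thin rectangles. The sketch below is our own illustration; the helper `midpoint_sum` is not part of the textbook, and it takes each rectangle's height at the midpoint of its base.

```python
def midpoint_sum(func, a, b, n=100_000):
    """Approximate the area under func on [a, b] with n thin rectangles."""
    width = (b - a) / n
    return sum(func(a + (i + 0.5) * width) for i in range(n)) * width

def f(x):
    return x**2

def g(x):
    # An antiderivative of f, as in Example 1029.
    return x**3 / 3

area_by_rectangles = midpoint_sum(f, 0, 3)
area_by_ftc2 = g(3) - g(0)
print(area_by_rectangles, area_by_ftc2)
```

The first number needs a hundred thousand rectangles. The second needs only two evaluations of g, which is the whole point of the FTC2.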

Example 1030. XXX

Example 1031. XXX

Theorem 29. (Second Fundamental Theorem of Calculus, or FTC2) Let a, b ∈ R
with a < b. Suppose f ∶ [a, b] → R is a continuous function with antiderivative g. Then:

$\int_a^b f = g(b) - g(a)$.

Proof. Define the function h ∶ [a, b] → R by $h(x) = \int_a^x f$.

By the FTC1, h is an antiderivative of f.
Since g is also an antiderivative of f, by Corollary 32, there exists C ∈ R such that for all x ∈ [a, b]:
h (x) = g (x) + C.
And now we have:

$g(b) - g(a) = [g(b) + C] - [g(a) + C] = h(b) - h(a) = \int_a^b f - \int_a^a f = \int_a^b f - 0 = \int_a^b f$.

Exercise 340. Evaluate each definite integral. (Answer on p. 827.)

(a) xxx
(b) xxx

A340(a) xxx
(b) xxx


80. More Techniques of Antidifferentiation
We take the available space here to stress, emphasise, and repeat:

Antidifferentiation is the inverse of differentiation.

Integration is the problem of finding the area under a curve.

A priori, there is no reason to believe that integration and antidifferentiation have anything to do with each other.

It is only through the FTCs that we establish that, surprisingly enough, integration and antidifferentiation are the “same thing”.

(We put “same thing” in scare quotes because this is a rather imprecise assertion that is
made precise only by the formal statements of the FTCs.)

80.1. Factorisation

Example 1032. Find $\int \frac{1}{x^2 + 2x + 1}\,dx$ (for x ≠ −1).

Looks tricky. But observe that $x^2 + 2x + 1 = (x + 1)^2$. And so, using also the Power and LPC Rules (Theorem 28), we have:

$\int \frac{1}{x^2 + 2x + 1}\,dx = \int \frac{1}{(x + 1)^2}\,dx = -\frac{1}{x + 1} + C$.

Example 1033. Find $\int \frac{1}{x^3 + 3x^2 + 3x + 1}\,dx$ (for x ≠ −1).

We observe that $x^3 + 3x^2 + 3x + 1 = (x + 1)^3$. And so:

$\int \frac{1}{x^3 + 3x^2 + 3x + 1}\,dx = \int \frac{1}{(x + 1)^3}\,dx = -\frac{1}{2} \cdot \frac{1}{(x + 1)^2} + C$.

Exercise 341. Find each antiderivative. (Answer on p. 1546.)

(a) $\int \frac{1}{4x^2 - 4x + 1}\,dx$ (for x ≠ 0.5).

(b) $\int \frac{1}{9x^2 + 30x + 25}\,dx$ (for x ≠ −5/3).


80.2. Partial Fractions: Finding $\int \frac{1}{ax^2 + bx + c}\,dx$ where $b^2 - 4ac > 0$

In the last subchapter, we learnt to find the following antiderivative in those cases where $b^2 - 4ac = 0$ and $ax^2 + bx + c$ is thus a perfect square:

$\int \frac{1}{ax^2 + bx + c}\,dx$.

We now learn to find the above antiderivative, but in those cases where $b^2 - 4ac > 0$ so that $ax^2 + bx + c$ is still factorisable but no longer a perfect square. So, this is really just more factorisation, but this time we’ll also make use of partial fractions.

Example 1034. Find $\int \frac{1}{x^2 - 1}\,dx$ (for x ≠ ±1).

Here partial fractions (see Ch. 25.1) will come in handy. Observing that $x^2 - 1 = (x + 1)(x - 1)$, we write:

$\frac{1}{x^2 - 1} = \frac{A}{x + 1} + \frac{B}{x - 1} = \frac{A(x - 1) + B(x + 1)}{(x + 1)(x - 1)} = \frac{(A + B)x - A + B}{x^2 - 1}$.

Comparing coefficients, we have A + B = 0 and −A + B = 1. Solving, we have A = −1/2 and B = 1/2. Thus:

$\int \frac{1}{x^2 - 1}\,dx = \int \left(\frac{-1/2}{x + 1} + \frac{1/2}{x - 1}\right) dx$
$= \int \frac{-1/2}{x + 1}\,dx + \int \frac{1/2}{x - 1}\,dx$ (Sum Rule)
$= -\frac{1}{2} \int \frac{1}{x + 1}\,dx + \frac{1}{2} \int \frac{1}{x - 1}\,dx$ (Constant Factor Rule)
$\overset{\star}{=} -\frac{1}{2} \ln |x + 1| + \frac{1}{2} \ln |x - 1| + C$ (Reciprocal and LPC Rules)
$= \frac{1}{2} (\ln |x - 1| - \ln |x + 1|) + C$
$= \frac{1}{2} \ln \frac{|x - 1|}{|x + 1|} + C$ (Law of Logarithm)
$= \frac{1}{2} \ln \left|\frac{x - 1}{x + 1}\right| + C$. (Fact 42)

Note: We could’ve just left our answer at $\overset{\star}{=}$; the last three steps are nice but aren’t necessary.


Example 1035. Find $\int \frac{1}{x^2 + x - 6}\,dx$ (for x ≠ −3, 2).

Observing that $x^2 + x - 6 = (x + 3)(x - 2)$, we write:

$\frac{1}{x^2 + x - 6} = \frac{A}{x + 3} + \frac{B}{x - 2} = \frac{A(x - 2) + B(x + 3)}{(x + 3)(x - 2)} = \frac{(A + B)x + 3B - 2A}{x^2 + x - 6}$.

Comparing coefficients, we have A + B = 0 and 3B − 2A = 1. Solving, we have A = −1/5 and B = 1/5. Thus:

$\int \frac{1}{x^2 + x - 6}\,dx = \int \left(\frac{-1/5}{x + 3} + \frac{1/5}{x - 2}\right) dx$
$= \int \frac{-1/5}{x + 3}\,dx + \int \frac{1/5}{x - 2}\,dx$ (Sum Rule)
$= -\frac{1}{5} \int \frac{1}{x + 3}\,dx + \frac{1}{5} \int \frac{1}{x - 2}\,dx$ (Constant Factor Rule)
$\overset{\star}{=} -\frac{1}{5} \ln |x + 3| + \frac{1}{5} \ln |x - 2| + C$ (Reciprocal and LPC Rules)
$= \frac{1}{5} (\ln |x - 2| - \ln |x + 3|) + C$
$= \frac{1}{5} \ln \frac{|x - 2|}{|x + 3|} + C$ (Law of Logarithm)
$= \frac{1}{5} \ln \left|\frac{x - 2}{x + 3}\right| + C$. (Fact 42)

Again, $\overset{\star}{=}$ would’ve sufficed as our answer.
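
One way to gain confidence in answers like those of Examples 1034 and 1035 is to differentiate them numerically and compare with the original integrand. The sketch below does this at a few sample points, taking C = 0. The function names are ours and the central-difference helper `derivative` is an illustrative assumption, not something from the textbook.

```python
import math

def derivative(func, x, h=1e-6):
    """Central-difference estimate of func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)

def answer_1034(x):
    # Final answer of Example 1034 with C = 0.
    return 0.5 * math.log(abs((x - 1) / (x + 1)))

def answer_1035(x):
    # Final answer of Example 1035 with C = 0.
    return 0.2 * math.log(abs((x - 2) / (x + 3)))

checks = [
    (answer_1034, lambda x: 1 / (x**2 - 1), [-3.0, 0.5, 4.0]),
    (answer_1035, lambda x: 1 / (x**2 + x - 6), [-5.0, 0.0, 3.5]),
]

# The sample points avoid the excluded values (x = ±1 and x = −3, 2).
for answer, integrand, points in checks:
    for x in points:
        print(x, derivative(answer, x), integrand(x))
```

Note that the sample points deliberately include values on both sides of each excluded point, which is where the absolute value sign earns its keep.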

Exercise 342. Find each antiderivative. (Answer on p. 1546.)

(a) $\int \frac{1}{5x^2 - 2x - 3}\,dx$ (for x ≠ −0.6, 1).

(b) $\int \frac{1}{x^2 - a^2}\,dx$ (for x ≠ ±a).

(c) $\int \frac{1}{a^2 - x^2}\,dx$ (for x ≠ ±a).


80.3. Building a Divisor of the Denominator

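Example 1036. Find $\int \frac{x}{x^2 + 2x + 1}\,dx$ (for x ≠ −1).

First, observe that $x^2 + 2x + 1 = (x + 1)^2$. Then write:

$\int \frac{x}{x^2 + 2x + 1}\,dx = \int \frac{x}{(x + 1)^2}\,dx$
$= \int \frac{x + 1 - 1}{(x + 1)^2}\,dx$ (Plus Zero Trick)
$= \int \left[\frac{x + 1}{(x + 1)^2} - \frac{1}{(x + 1)^2}\right] dx$
$= \int \left[\frac{1}{x + 1} - \frac{1}{(x + 1)^2}\right] dx$
$= \int \frac{1}{x + 1}\,dx - \int \frac{1}{(x + 1)^2}\,dx$ (Difference Rule)
$= \ln |x + 1| + \frac{1}{x + 1} + C$. (Reciprocal, Power, and LPC Rules)
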

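Example 1037. Find $\int \frac{x}{x^3 - 3x^2 + 3x - 1}\,dx$ (for x ≠ 1).

First, observe that $x^3 - 3x^2 + 3x - 1 = (x - 1)^3$. Then write:

$\int \frac{x}{x^3 - 3x^2 + 3x - 1}\,dx = \int \frac{x}{(x - 1)^3}\,dx$
$= \int \frac{x - 1 + 1}{(x - 1)^3}\,dx$ (Plus Zero Trick)
$= \int \left[\frac{x - 1}{(x - 1)^3} + \frac{1}{(x - 1)^3}\right] dx$
$= \int \left[\frac{1}{(x - 1)^2} + \frac{1}{(x - 1)^3}\right] dx$
$= \int \frac{1}{(x - 1)^2}\,dx + \int \frac{1}{(x - 1)^3}\,dx$ (Sum Rule)
$= -\frac{1}{x - 1} - \frac{1}{2} \cdot \frac{1}{(x - 1)^2} + C$ (Power and LPC Rules)
$= -\frac{2x - 1}{2 (x - 1)^2} + C$.

The last step is nice but not necessary.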

Exercise 343. Find each antiderivative. (Answer on p. 1547.)

(a) $\int \frac{7x + 2}{4x^2 - 4x + 1}\,dx$ (for x ≠ 0.5).

(b) $\int \frac{7x + 2}{x^2 + x - 6}\,dx$ (for x ≠ −3, 2). (You may use Example 1035.)

(c) $\int \frac{7x + 2}{5x^2 - 2x - 3}\,dx$ (for x ≠ −0.6, 1). (You may use your answer from Exercise 342(a).)


80.4. More Rules of Antidifferentiation
The following appear on List MF26, which means no need to mug:

Proposition 9. (More Rules of Antidifferentiation) Suppose a ≠ 0. Then:

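(a) $\int \frac{1}{x^2 + a^2}\,dx = \frac{1}{a} \tan^{-1} \frac{x}{a} + C$,

(b) $\int \frac{1}{\sqrt{a^2 - x^2}}\,dx = \sin^{-1} \frac{x}{|a|} + C$, for |x| < |a|,

(c) $\int \frac{1}{x^2 - a^2}\,dx = \frac{1}{2a} \ln \left|\frac{x - a}{x + a}\right| + C$, for x ≠ ±a,

(d) $\int \frac{1}{a^2 - x^2}\,dx = \frac{1}{2a} \ln \left|\frac{a + x}{a - x}\right| + C$, for x ≠ ±a,

(e) $\int \tan x\,dx = \ln |\sec x| + C$, for x not an odd multiple of $\frac{\pi}{2}$,

(f) $\int \cot x\,dx = \ln |\sin x| + C$, for x not a multiple of π,

(g) $\int \operatorname{cosec} x\,dx = -\ln |\operatorname{cosec} x + \cot x| + C$, for x not a multiple of π,

(h) $\int \sec x\,dx = \ln |\sec x + \tan x| + C$, for x not an odd multiple of $\frac{\pi}{2}$,

where in each case, C is the constant of integration.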
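
Before proving any of these, it can be reassuring to check a few numerically. The sketch below does so for the four trigonometric rules (e)–(h), taking C = 0: it differentiates each claimed antiderivative by a central difference and compares the result with the integrand. The dictionary `rules` and the helper `derivative` are our own names, used only for this illustration.

```python
import math

def derivative(func, x, h=1e-6):
    """Central-difference estimate of func'(x)."""
    return (func(x + h) - func(x - h)) / (2 * h)

def sec(x):
    return 1 / math.cos(x)

def cosec(x):
    return 1 / math.sin(x)

def cot(x):
    return math.cos(x) / math.sin(x)

# Each entry pairs (claimed antiderivative with C = 0, integrand).
rules = {
    "(e) tan": (lambda x: math.log(abs(sec(x))), math.tan),
    "(f) cot": (lambda x: math.log(abs(math.sin(x))), cot),
    "(g) cosec": (lambda x: -math.log(abs(cosec(x) + cot(x))), cosec),
    "(h) sec": (lambda x: math.log(abs(sec(x) + math.tan(x))), sec),
}

# Sample points chosen to avoid every multiple of pi/2.
points = [0.4, 2.0, -1.0, 4.0]

for name, (antiderivative, integrand) in rules.items():
    for x in points:
        print(name, x, derivative(antiderivative, x), integrand(x))
```

A numerical check at a handful of points proves nothing, of course, but a mismatch would immediately tell us that a sign or an absolute value has gone astray.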

Remark 118. Our versions of (b)–(h) are slightly more general than those given in List MF26.

$\int (\cos x^2)\,2x\,dx = \sin x^2 + C$, where $\cos = f'$, $x^2 = g$, $2x = g'$, and $\sin = f$.

Remark 119. As we already saw when doing Exercise 342, (d) is really just (c) with a
negative sign stuck in front.

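Proof. Here we will prove (or rather verify) only (a) and (b). We already verified (c) and (d) in Exercise 342. You are asked to verify (e)–(h) in Exercise 344.

(a) By Fact 144, $\frac{d}{dx} \tan^{-1} x = \frac{1}{x^2 + 1}$. And so we have:

$\frac{d}{dx} \left[\frac{1}{a} \tan^{-1} \frac{x}{a} + C\right] = \frac{1}{a} \cdot \frac{1}{\left(\frac{x}{a}\right)^2 + 1} \cdot \frac{1}{a} = \frac{1}{x^2 + a^2}$.

(b) By Fact 144, $\frac{d}{dx} \sin^{-1} x = \frac{1}{\sqrt{1 - x^2}}$ (for |x| < 1). Hence, for $\left|\frac{x}{|a|}\right| < 1$ or |x| < |a|, we have:

For a > 0: $\frac{d}{dx} \left(\sin^{-1} \frac{x}{|a|} + C\right) = \frac{1}{\sqrt{1 - \left(\frac{x}{a}\right)^2}} \cdot \frac{1}{a} = \frac{\sqrt{a^2}}{\sqrt{a^2 - x^2}} \cdot \frac{1}{a} = \frac{a}{\sqrt{a^2 - x^2}} \cdot \frac{1}{a} = \frac{1}{\sqrt{a^2 - x^2}}$.

For a < 0: $\frac{d}{dx} \left(\sin^{-1} \frac{x}{|a|} + C\right) = \frac{1}{\sqrt{1 - \left(\frac{x}{-a}\right)^2}} \cdot \frac{1}{-a} = \frac{\sqrt{a^2}}{\sqrt{a^2 - x^2}} \cdot \left(-\frac{1}{a}\right) = \frac{-a}{\sqrt{a^2 - x^2}} \cdot \left(-\frac{1}{a}\right) = \frac{1}{\sqrt{a^2 - x^2}}$.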

Exercise 344. Verify Proposition 9(e)–(h). (Hint: In each, you will have to examine
two cases, similar to (b) above.) (Answers on p. 1547)


80.5. Completing the Square: $\int \frac{1}{ax^2 + bx + c}\,dx$ where $b^2 - 4ac < 0$

Consider the following antiderivative:

$\int \frac{1}{ax^2 + bx + c}\,dx$.

In Chs. 80.1 and 80.2, we already learnt to find the above antiderivative, in those cases where $b^2 - 4ac \ge 0$.
But if $b^2 - 4ac < 0$, then those earlier techniques will not work. Instead, we will have to learn a new technique. This is to complete the square so that we can make use of Proposition 9(a):

$\int \frac{1}{x^2 + a^2}\,dx \overset{\star}{=} \frac{1}{a} \tan^{-1} \frac{x}{a} + C$.

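Example 1038. Find $\int \frac{1}{x^2 + x + 1}\,dx$.

Observe that: $x^2 + x + 1 = \left(x + \frac{1}{2}\right)^2 + \frac{3}{4}$.

Hence: $\int \frac{1}{x^2 + x + 1}\,dx = \int \frac{1}{\left(x + \frac{1}{2}\right)^2 + \frac{3}{4}}\,dx$.

Now, let $x + 1/2$ and $\sqrt{3/4}$ take the places of “x” and “a” in $\overset{\star}{=}$. Then:

$\int \frac{1}{x^2 + x + 1}\,dx = \int \frac{1}{\left(x + \frac{1}{2}\right)^2 + \frac{3}{4}}\,dx$
$\overset{\star}{=} \frac{1}{\sqrt{3/4}} \tan^{-1} \frac{x + \frac{1}{2}}{\sqrt{3/4}} + C$
$= \frac{2}{\sqrt{3}} \tan^{-1} \frac{2x + 1}{\sqrt{3}} + C$.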

For how to complete the square, see Ch. ??. In general, we have:

$ax^2 + bx + c = a \left(x + \frac{b}{2a}\right)^2 + c - \frac{b^2}{4a}$.

But rather than try to memorise the above formula, it’s probably easier to try to understand and thus easily “see” how you can complete the square in each case.
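
For checking your own working, the completing-the-square formula above can be sketched in a few lines. The function name `complete_the_square` is ours, and it uses exact fractions so that results like 3/4 are not blurred by rounding.

```python
from fractions import Fraction

def complete_the_square(a, b, c):
    """Return (p, q) such that a*x**2 + b*x + c == a*(x + p)**2 + q.

    This is the general formula with p = b/(2a) and q = c - b**2/(4a).
    """
    a, b, c = Fraction(a), Fraction(b), Fraction(c)
    if a == 0:
        raise ValueError("a must be non-zero for a quadratic")
    p = b / (2 * a)
    q = c - b**2 / (4 * a)
    return p, q

# x^2 + x + 1 = (x + 1/2)^2 + 3/4
print(complete_the_square(1, 1, 1))
# 2x^2 + 3x + 5 = 2(x + 3/4)^2 + 31/8
print(complete_the_square(2, 3, 5))
```

Note that for 2x² + 3x + 5 the function keeps the leading coefficient 2 outside the square. Dividing through by 2 first, as is often convenient, would instead give x² + 1.5x + 2.5 = (x + 3/4)² + 31/16.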


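Example 1039. Find $\int \frac{1}{2x^2 + 3x + 5}\,dx$.

Let’s first rewrite the integrand so that the leading coefficient is 1 and stick any constants in front:

$\int \frac{1}{2x^2 + 3x + 5}\,dx = \frac{1}{2} \int \frac{1}{x^2 + 1.5x + 2.5}\,dx$.

Complete the square: $x^2 + 1.5x + 2.5 = \left(x + \frac{3}{4}\right)^2 + \frac{31}{16}$.

Let $x + 3/4$ and $\sqrt{31/16}$ take the places of “x” and “a” in $\overset{\star}{=}$. Then:

$\int \frac{1}{2x^2 + 3x + 5}\,dx = \frac{1}{2} \int \frac{1}{\left(x + \frac{3}{4}\right)^2 + \frac{31}{16}}\,dx$
$\overset{\star}{=} \frac{1}{2} \cdot \frac{1}{\sqrt{31/16}} \tan^{-1} \frac{x + 3/4}{\sqrt{31/16}} + C$
$= \frac{2}{\sqrt{31}} \tan^{-1} \frac{4x + 3}{\sqrt{31}} + C$.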

Remark 120. In Ch. 81, we will learn another technique for finding the antiderivative
covered in this subchapter.

In this subchapter and also Chs. 80.1 and 80.2, we’ve learnt to find the antiderivative
of the reciprocal of any quadratic polynomial. We summarise these results in Fact
152, which looks intimidating, but is really just what we’ve been doing, except with actual
numbers in place of a, b, and c.

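Fact 152. Suppose a, b, c ∈ R with a ≠ 0 and $d = \sqrt{|b^2 - 4ac|}$. Then:

$\int \frac{1}{ax^2 + bx + c}\,dx = \begin{cases} \dfrac{1}{d} \ln \left|\dfrac{x + \frac{b - d}{2a}}{x + \frac{b + d}{2a}}\right| + C & \text{for } b^2 - 4ac > 0, \\ -\dfrac{1}{a \left(x + \frac{b}{2a}\right)} + C & \text{for } b^2 - 4ac = 0, \\ \dfrac{2}{d} \tan^{-1} \dfrac{2ax + b}{d} + C & \text{for } b^2 - 4ac < 0. \end{cases}$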

Proof. See p. 1359 in the Appendices.


Remark 121. Fact 152 is for your reference only. Rather than try to memorise this result,
it is probably wiser to understand and know how to perform the steps that lead to it.

Exercise 345. Find each antiderivative. (Answer on p. 837.)

(a) xxx
(b) xxx



80.6. $\int \frac{1}{\sqrt{ax^2 + bx + c}}\,dx$ in the Special Case where a < 0

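Example 1040. Find $\int \frac{1}{\sqrt{-x^2 + x + 1}}\,dx$.

Complete the square: $-x^2 + x + 1 = \frac{5}{4} - \left(x - \frac{1}{2}\right)^2 = \left(\frac{\sqrt{5}}{2}\right)^2 - \left(x - \frac{1}{2}\right)^2$.

By Proposition 9(b), we have:

$\int \frac{1}{\sqrt{d^2 - y^2}}\,dy = \sin^{-1} \frac{y}{|d|} + C$.

By letting $\sqrt{5}/2$ and $x - 1/2$ take the places of d and y, we have (note that here we secretly use the LPC Rule):

$\int \frac{1}{\sqrt{-x^2 + x + 1}}\,dx = \int \frac{1}{\sqrt{\left(\frac{\sqrt{5}}{2}\right)^2 - \left(x - \frac{1}{2}\right)^2}}\,dx$
$= \sin^{-1} \frac{x - \frac{1}{2}}{\left|\sqrt{5}/2\right|} + C$
$= \sin^{-1} \frac{2x - 1}{\sqrt{5}} + C$.
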
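Example 1041. Find $\int \frac{1}{\sqrt{-2x^2 + 3x + 5}}\,dx$.

Complete the square:

$$-2x^2 + 3x + 5 = 2\left(-x^2 + \frac{3}{2}x + \frac{5}{2}\right) = 2\left[\frac{49}{16} - \left(x - \frac{3}{4}\right)^2\right] = 2\left[\left(\frac{7}{4}\right)^2 - \left(x - \frac{3}{4}\right)^2\right].$$

By Proposition 9(b), we have:

$$\int \frac{1}{\sqrt{d^2 - y^2}}\,dy = \sin^{-1}\frac{y}{|d|} + C.$$

By letting 7/4 and x − 3/4 take the places of d and y in Proposition 9(b), we have:

$$\begin{aligned}
\int \frac{1}{\sqrt{-2x^2 + 3x + 5}}\,dx &= \frac{1}{\sqrt{2}}\int \frac{1}{\sqrt{\left(\frac{7}{4}\right)^2 - \left(x - \frac{3}{4}\right)^2}}\,dx \\
&= \frac{1}{\sqrt{2}}\sin^{-1}\frac{x - 3/4}{|7/4|} + C \\
&= \frac{1}{\sqrt{2}}\sin^{-1}\frac{4x - 3}{7} + C.
\end{aligned}$$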

Exercise 346. Find each antiderivative. (Answer on p. 839.)

(a) $\int \frac{1}{\sqrt{-3x^2 + x + 6}}\,dx$.
(b) $\int \frac{1}{\sqrt{-7x^2 - x + 2}}\,dx$.

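A346(a) Complete the square:

$$-3x^2 + x + 6 = 3\left[-x^2 + \frac{1}{3}x + 2\right] = 3\left[\frac{73}{36} - \left(x - \frac{1}{6}\right)^2\right] = 3\left[\left(\frac{\sqrt{73}}{6}\right)^2 - \left(x - \frac{1}{6}\right)^2\right].$$

So:

$$\begin{aligned}
\int \frac{1}{\sqrt{-3x^2 + x + 6}}\,dx &= \frac{1}{\sqrt{3}}\int \frac{1}{\sqrt{\left(\frac{\sqrt{73}}{6}\right)^2 - \left(x - \frac{1}{6}\right)^2}}\,dx \\
&= \frac{1}{\sqrt{3}}\sin^{-1}\frac{x - 1/6}{\sqrt{73}/6} + C \\
&= \frac{1}{\sqrt{3}}\sin^{-1}\frac{6x - 1}{\sqrt{73}} + C.
\end{aligned}$$

(b) Complete the square:

$$-7x^2 - x + 2 = 7\left[-x^2 - \frac{1}{7}x + \frac{2}{7}\right] = 7\left[\frac{57}{196} - \left(x + \frac{1}{14}\right)^2\right] = 7\left[\left(\frac{\sqrt{57}}{14}\right)^2 - \left(x + \frac{1}{14}\right)^2\right].$$

So:

$$\begin{aligned}
\int \frac{1}{\sqrt{-7x^2 - x + 2}}\,dx &= \frac{1}{\sqrt{7}}\int \frac{1}{\sqrt{\left(\frac{\sqrt{57}}{14}\right)^2 - \left(x + \frac{1}{14}\right)^2}}\,dx \\
&= \frac{1}{\sqrt{7}}\sin^{-1}\frac{x + 1/14}{\sqrt{57}/14} + C \\
&= \frac{1}{\sqrt{7}}\sin^{-1}\frac{14x + 1}{\sqrt{57}} + C.
\end{aligned}$$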

For your reference, here is the general formula:

Fact 153. Suppose $a, b, c \in \mathbb{R}$ with $a < 0$. If $ax^2 + bx + c > 0$, then:

$$\int \frac{1}{\sqrt{ax^2 + bx + c}}\,dx = \frac{1}{\sqrt{|a|}}\sin^{-1}\frac{-2ax - b}{\sqrt{b^2 - 4ac}} + C.$$

Proof. See p. 1361 in the Appendices.

You can verify that the above general formula “works” for the above examples and exercises.
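One way to do this verification by machine rather than by hand is sketched below: code up the general formula of Fact 153, then check numerically that its derivative is $1/\sqrt{ax^2 + bx + c}$ for the quadratics from Examples 1040 and 1041 and Exercise 346. The names `fact_153` and `numerical_derivative` and the sample points are assumptions of mine, chosen so that each quadratic is positive there.

```python
import math

def fact_153(a, b, c):
    """Antiderivative (with C = 0) of 1/sqrt(a x^2 + b x + c), assuming a < 0."""
    disc = b * b - 4 * a * c
    return lambda x: math.asin((-2 * a * x - b) / math.sqrt(disc)) / math.sqrt(abs(a))

def numerical_derivative(F, x, h=1e-5):
    # Central difference approximation to F'(x).
    return (F(x + h) - F(x - h)) / (2 * h)

# (a, b, c, sample x) for Example 1040, Example 1041, Exercise 346(a), 346(b).
cases = [(-1, 1, 1, 0.5), (-2, 3, 5, 1.0), (-3, 1, 6, 0.0), (-7, -1, 2, 0.1)]
for a, b, c, x in cases:
    F = fact_153(a, b, c)
    assert abs(numerical_derivative(F, x) - 1 / math.sqrt(a * x * x + b * x + c)) < 1e-7

# The general formula reproduces the worked answer of Example 1041.
x = 1.0
assert abs(fact_153(-2, 3, 5)(x) - math.asin((4 * x - 3) / 7) / math.sqrt(2)) < 1e-12
```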

Remark 122. In this subchapter, we’ve learnt to find the antiderivative
$\int \frac{1}{\sqrt{ax^2 + bx + c}}\,dx$, but only in the special case where a < 0.
Happily, your H2 Maths syllabus does not include finding this antiderivative in the case
where a > 0.332

But see Fact 227 (Appendices) if you’re interested.
80.7. Using Trigonometric Identities
The following Rules of Antidifferentiation are explicitly listed on your H2 Maths syllabus,
but sadly do not appear on List MF26. Which means you’ll have to know how to derive them:

Proposition 10. If $x \in \mathbb{R}$, then:

(a) $\int \sin^2 x\,dx = \frac{1}{2}x - \frac{\sin 2x}{4} + C$,

(b) $\int \cos^2 x\,dx = \frac{1}{2}x + \frac{\sin 2x}{4} + C$.

If $x \in \mathbb{R}$ and x is not an odd multiple of $\frac{\pi}{2}$, then:

(c) $\int \tan^2 x\,dx = \tan x - x + C$.

If $x, m, n \in \mathbb{R}$, $m + n \neq 0$, and $m - n \neq 0$, then:

(d) $\int \sin mx \cos nx\,dx = -\frac{1}{2}\left[\frac{\cos (m-n)x}{m-n} + \frac{\cos (m+n)x}{m+n}\right] + C$,

(e) $\int \sin mx \sin nx\,dx = \frac{1}{2}\left[\frac{\sin (m-n)x}{m-n} - \frac{\sin (m+n)x}{m+n}\right] + C$,

(f) $\int \cos mx \cos nx\,dx = \frac{1}{2}\left[\frac{\sin (m-n)x}{m-n} + \frac{\sin (m+n)x}{m+n}\right] + C$.

(In each, C is, as usual, the constant of integration.)

Proof. (a) Use the identity $\cos 2x = 1 - 2\sin^2 x$ (see Exam Tip below):

$$\int \sin^2 x\,dx = \int \frac{1 - \cos 2x}{2}\,dx = \frac{1}{2}x - \frac{\sin 2x}{4} + C.$$
You are asked to prove (b)–(f) in Exercise 347.
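Before attempting those proofs, it can be reassuring to check Proposition 10(d)–(f) numerically. The sketch below does so for one arbitrary pair of values m = 3 and n = 1.5 (my choice, satisfying m + n ≠ 0 and m − n ≠ 0). The function names are hypothetical, and a numerical check is of course no substitute for the proof.

```python
import math

M, N = 3.0, 1.5  # sample values with M + N != 0 and M - N != 0

def F_d(x):  # claimed antiderivative of sin(Mx) cos(Nx), with C = 0
    return -0.5 * (math.cos((M - N) * x) / (M - N) + math.cos((M + N) * x) / (M + N))

def F_e(x):  # claimed antiderivative of sin(Mx) sin(Nx), with C = 0
    return 0.5 * (math.sin((M - N) * x) / (M - N) - math.sin((M + N) * x) / (M + N))

def F_f(x):  # claimed antiderivative of cos(Mx) cos(Nx), with C = 0
    return 0.5 * (math.sin((M - N) * x) / (M - N) + math.sin((M + N) * x) / (M + N))

def numerical_derivative(F, x, h=1e-5):
    # Central difference approximation to F'(x).
    return (F(x + h) - F(x - h)) / (2 * h)

for x in (-1.0, 0.4, 2.3):
    assert abs(numerical_derivative(F_d, x) - math.sin(M * x) * math.cos(N * x)) < 1e-7
    assert abs(numerical_derivative(F_e, x) - math.sin(M * x) * math.sin(N * x)) < 1e-7
    assert abs(numerical_derivative(F_f, x) - math.cos(M * x) * math.cos(N * x)) < 1e-7
```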

Exam Tip for Towkays

Whenever you see a question with trigonometric functions, put MF26 (p. 3) next to you!

Exercise 347. Prove Proposition 10(b)–(f). (Answer on p. 1548.)


80.8. Integration by Parts (IBP)
Integration by Parts (IBP) is simply the inverse of the Product Rule (for differentiation).
Suppose u and v are differentiable functions.333 Then:

By the Product Rule: $(u \cdot v)' = u' \cdot v + u \cdot v'$.

Equivalently: $u \cdot v = \int (u' \cdot v + u \cdot v') = \int u' \cdot v + \int u \cdot v'$.

Rearranging: $\int u \cdot v' = u \cdot v - \int u' \cdot v$.

This last equation is our Integration by Parts (IBP) formula. For future reference, let’s jot
it down as a formal result:

Theorem 30. (Integration by Parts) If u and v are differentiable functions, then:

$$\int u \cdot v' = u \cdot v - \int u' \cdot v.$$

Example 1042. Find $\int x e^x$.

To use IBP, we must first decide: Which of x or $e^x$ is u and which is v′?
Let us choose $v' = e^x$, so that $v = e^x$. And now:

$$\int \overbrace{x}^{u}\,\overbrace{e^x}^{v'} = \overbrace{x}^{u}\,\overbrace{e^x}^{v} - \int \overbrace{1}^{u'}\,\overbrace{e^x}^{v} = xe^x - e^x + C = e^x(x - 1) + C.$$

Example 1043. Find $\int x \sin x$.

To use IBP, we must first decide: Which of x or sin x is u and which is v′?
Let us choose $v' = \sin x$, so that $v = -\cos x$. And now:

$$\int \overbrace{x}^{u}\,\overbrace{\sin x}^{v'} = \overbrace{x}^{u}\,\overbrace{(-\cos x)}^{v} - \int \overbrace{1}^{u'}\,\overbrace{(-\cos x)}^{v} = -x\cos x + \sin x + C.$$
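Here is a small sketch that cross-checks the two IBP answers above against a direct numerical integration. It uses the composite Simpson’s rule, a standard numerical method that is not part of this textbook; the function `simpson` and the choice of 1000 subintervals are my own assumptions.

```python
import math

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

# Example 1042: an antiderivative of x e^x is e^x (x - 1).
F1 = lambda x: math.exp(x) * (x - 1)
assert abs(simpson(lambda x: x * math.exp(x), 0.0, 1.0) - (F1(1.0) - F1(0.0))) < 1e-9

# Example 1043: an antiderivative of x sin x is -x cos x + sin x.
F2 = lambda x: -x * math.cos(x) + math.sin(x)
assert abs(simpson(lambda x: x * math.sin(x), 0.0, math.pi) - (F2(math.pi) - F2(0.0))) < 1e-9
```

In both cases the area computed numerically agrees with the difference of the antiderivative at the endpoints.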

It turns out that to choose v′, we can use the mnemonic and rule of thumb dETAIL. That
is, choose the derivative v′ in this order:

Exponential, Trig., Algebraic, Inverse trig., Logarithmic.

The above rule of thumb (usually) works because it is easiest to find an antiderivative of
an Exponential function and hardest to find one of a Logarithmic function.334
More examples:

Following common practice, I use the functions u and v (rather than say f and g) when discussing IBP.
Kasube (1983) first gave it as LIATE. By reversing the letters and adding the letter d in front, we get
the actual English word dETAIL, which is probably easier to remember.
Example 1044. Find $\int \sin x \cos x$.
An easy way to find this antiderivative is to recall that $\sin x \cos x = \frac{1}{2}\sin 2x$ and hence:

$$\int \sin x \cos x = \frac{1}{2}\int \sin 2x \overset{1}{=} -\frac{1}{4}\cos 2x + C.$$

But here as an exercise, let’s try using IBP to find this antiderivative.
This time the dETAIL rule of thumb doesn’t help us because we have two trigonometric
functions. So let’s just choose $v' = \cos x$, so that $v = \sin x$ and:

$$\int \overbrace{\sin x}^{u}\,\overbrace{\cos x}^{v'} = \overbrace{\sin x}^{u}\,\overbrace{\sin x}^{v} - \int \overbrace{\cos x}^{u'}\,\overbrace{\sin x}^{v}.$$

Rearranging, we have:

$$2\int \sin x \cos x = \sin^2 x + \hat{C} \quad\text{or}\quad \int \sin x \cos x \overset{2}{=} \frac{1}{2}\sin^2 x + \bar{C}.$$

Can you explain why $\overset{1}{=}$ and $\overset{2}{=}$ are consistent with each other?335

Sometimes we need to apply IBP more than once:

Example 1045. Find $\int x^2 e^x$.

By dETAIL, we should choose $v' = e^x$, so that $v = e^x$ and:

$$\int \overbrace{x^2}^{u}\,\overbrace{e^x}^{v'} \overset{1}{=} \overbrace{x^2}^{u}\,\overbrace{e^x}^{v} - \int \overbrace{2x}^{u'}\,\overbrace{e^x}^{v}.$$

At this point, we would apply IBP a second time. But we already did this in an earlier
example and found that $\int xe^x \overset{2}{=} e^x(x - 1)$. So let’s just plug $\overset{2}{=}$ into $\overset{1}{=}$:

$$\int x^2 e^x = x^2 e^x - 2e^x(x - 1) + C = e^x\left(x^2 - 2x + 2\right) + C.$$

Sometimes we can use IBP together with the Times One Trick:

Example 1046. To find $\int \ln x\,dx$, the Times One Trick and IBP work wonders:

$$\int \ln x\,dx = \int \ln x \cdot 1\,dx = (\ln x)\,x - \int \frac{1}{x}\,x\,dx = x\ln x - \int 1\,dx = x\ln x - x + C.$$

Recall that $\cos 2x = \cos^2 x - \sin^2 x = 1 - 2\sin^2 x$. Hence, $-\frac{1}{4}\cos 2x = \frac{1}{2}\sin^2 x - \frac{1}{4}$. The additional term
“$-\frac{1}{4}$” is not a problem once we also recall that indefinite integrals can differ by up to a constant.
Remark 123. Unfortunately, I have not found any good mnemonic for the IBP formula
(Theorem 30). But perhaps this is all for the best, since this may occasionally force you
to derive it from the Product Rule (and hence understand where it comes from):

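$$\begin{aligned}
(u \cdot v)' &= u' \cdot v + u \cdot v' \\
u \cdot v &= \int u' \cdot v + \int u \cdot v' \\
\int u \cdot v' &\overset{1}{=} u \cdot v - \int u' \cdot v.
\end{aligned}$$

In Ch. 81.7, we will show that the IBP formula $\overset{1}{=}$ can also be rewritten as:

$$\int u\,dv = u \cdot v - \int v\,du.$$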

A mnemonic that does work well with this alternative formula is “ultraviolet voodoo”.

Exercise 348. Find each antiderivative. (Answer on p. 844.)

(a) $\int x^3 e^x\,dx$. (You may use what we found in Example 1045.)

(b) $\int x^2 \sin x\,dx$.

Exercise 349. Starting with $\int \frac{1}{x}\,dx$, Aisha arrives at the conclusion that 0 = 1:

“$\int \frac{1}{x}\,dx = \int \frac{1}{x} \cdot 1\,dx = \frac{1}{x} \cdot x - \int \frac{-1}{x^2}\,x\,dx = 1 + \int \frac{1}{x}\,dx.$

“Now subtract $\int \frac{1}{x}\,dx$ from both sides to get 0 = 1.”

Identify Aisha’s error. (Answer on p. 845.)

Exercise 350. Aisha now starts with the definite integral $\int_1^2 \frac{1}{x}\,dx$ and again arrives at
the conclusion that 0 = 1:

“$\int_1^2 \frac{1}{x}\,dx = \int_1^2 \frac{1}{x} \cdot 1\,dx = \left[\frac{1}{x} \cdot x - \int \frac{-1}{x^2}\,x\,dx\right]_1^2 = \left[1 + \int \frac{1}{x}\,dx\right]_1^2 = 1 + \int_1^2 \frac{1}{x}\,dx.$

“Now subtract $\int_1^2 \frac{1}{x}\,dx$ from both sides to get 0 = 1.”

Identify Aisha’s error. (Answer on p. 845.)

A348(a) By dETAIL, we should choose $v' = e^x$, so that $v = e^x$ and:

$$\int x^3 e^x\,dx \overset{1}{=} x^3 e^x - \int 3x^2 e^x\,dx.$$

In Example 1045, we already found that $\int x^2 e^x \overset{2}{=} e^x\left(x^2 - 2x + 2\right) + \hat{C}$. Plug $\overset{2}{=}$ into $\overset{1}{=}$ to
get:

$$\int x^3 e^x\,dx = x^3 e^x - 3e^x\left(x^2 - 2x + 2\right) + C = e^x\left(x^3 - 3x^2 + 6x - 6\right) + C.$$

(b) By dETAIL, we should choose $v' = \sin x$, so that $v = -\cos x$ and:

$$\int x^2 \sin x = x^2(-\cos x) - \int 2x \cdot (-\cos x) \overset{1}{=} -x^2\cos x + 2\int x\cos x.$$

Now apply IBP a second time, this time choosing cos x as our new “v′”:

$$\int x\cos x = x\sin x - \int 1 \cdot \sin x \overset{2}{=} x\sin x + \cos x + \hat{C}.$$

Plugging $\overset{2}{=}$ into $\overset{1}{=}$, we get:

$$\int x^2 \sin x = -x^2\cos x + 2\int x\cos x = -x^2\cos x + 2\left(x\sin x + \cos x\right) + C.$$

A349. The first sentence is correct. In particular, it is correct to write:

$$\int \frac{1}{x}\,dx = 1 + \int \frac{1}{x}\,dx.$$

Recall that indefinite integrals are unique, but only up to a constant of integration. And
so, more generally, given any continuous function f, we may write:

$$\int f(x)\,dx = C + \int f(x)\,dx, \quad\text{for any } C \in \mathbb{R}.$$

Aisha’s error is in the second sentence: with indefinite integrals, the $\int f(x)\,dx$ on the
LHS may differ from the $\int f(x)\,dx$ on the RHS by a constant and so we cannot simply
cancel them out.
A350. This time, the second sentence is correct. Any definite integral, such as $\int_1^2 \frac{1}{x}\,dx$,
is simply a number. It is therefore perfectly legitimate to cancel out $\int_1^2 \frac{1}{x}\,dx$ from both
sides of an equation.
This time, the error lies in the last step of the first sentence. In particular, the following
equation is false:

“$\left[1 + \int \frac{1}{x}\,dx\right]_1^2 = 1 + \int_1^2 \frac{1}{x}\,dx.$”

We should instead write:

$$\left[1 + \int \frac{1}{x}\,dx\right]_1^2 = \left[1 + \int \frac{1}{x}\,dx\right]_{x=2} - \left[1 + \int \frac{1}{x}\,dx\right]_{x=1} = \left[\int \frac{1}{x}\,dx\right]_{x=2} - \left[\int \frac{1}{x}\,dx\right]_{x=1} = \int_1^2 \frac{1}{x}\,dx.$$


81. The Substitution Rule
The Substitution Rule is simply another antidifferentiation technique, but we’ll give it
its own chapter.
We give several simple examples, then briefly discuss how and why the Substitution Rule
works.
Example 1047. Find $\int \cot x\,dx$ (where $x \neq k\pi$ for any $k \in \mathbb{Z}$).

First, observe that: $\cot x = \frac{\cos x}{\sin x}$ and $\frac{d}{dx}\sin x = \cos x$.

The above observations suggest that we use the following substitution:

$$u \overset{1}{=} \sin x.$$

From $\overset{1}{=}$, we also have: $\frac{du}{dx} \overset{2}{=} \cos x$.

And now we will plug in $\overset{1}{=}$ and $\overset{2}{=}$:

$$\int \cot x\,dx = \int \frac{\cos x}{\sin x}\,dx \overset{1}{=} \int \frac{\cos x}{u}\,dx \overset{2}{=} \int \frac{1}{u}\frac{du}{dx}\,dx.$$

So far, so normal. But we’ll now do something strange: namely, take the last expression
and simply “cancel out the dx’s”:

$$\int \frac{1}{u}\frac{du}{dx}\,dx \overset{s}{=} \int \frac{1}{u}\,du.$$

We call this step where we “cancel out the dx’s” an application of the Substitution Rule
and will denote it by $\overset{s}{=}$.

And now:

$$\int \frac{1}{u}\,du = \ln|u| + C \overset{1}{=} \ln|\sin x| + C,$$

where we must always remember to plug back the initial substitution $u \overset{1}{=} \sin x$ to get rid
of u.


Example 1048. Find $\int 2x\cos x^2\,dx$.

Observing that $\frac{d}{dx}x^2 = 2x$, we use the substitution $u \overset{1}{=} x^2$, so that we also have $\frac{du}{dx} \overset{2}{=} 2x$.
And now:

$$\begin{aligned}
\int 2x\cos x^2\,dx &\overset{1}{=} \int 2x\cos u\,dx \overset{2}{=} \int \frac{du}{dx}\cos u\,dx \\
&\overset{s}{=} \int \cos u\,du = \sin u + C \overset{1}{=} \sin x^2 + C.
\end{aligned}$$

At $\overset{s}{=}$, we apply the Substitution Rule and simply “cancel out the dx’s”. And as always,
the final step is to plug back the initial substitution $u \overset{1}{=} x^2$ to get rid of u.

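Example 1049. Find $\int (2x + 1)\sin\left(x^2 + x\right)dx$.

Observing that $\frac{d}{dx}\left(x^2 + x\right) = 2x + 1$, we use the substitution $u \overset{1}{=} x^2 + x$, so that we also
have $\frac{du}{dx} \overset{2}{=} 2x + 1$.
And now:

$$\begin{aligned}
\int (2x + 1)\sin\left(x^2 + x\right)dx &\overset{1}{=} \int (2x + 1)\sin u\,dx \overset{2}{=} \int \frac{du}{dx}\sin u\,dx \\
&\overset{s}{=} \int \sin u\,du = -\cos u + C \overset{1}{=} -\cos\left(x^2 + x\right) + C.
\end{aligned}$$

At $\overset{s}{=}$, we apply the Substitution Rule and simply “cancel out the dx’s”. And as always,
the final step is to plug back the initial substitution $u \overset{1}{=} x^2 + x$ to get rid of u.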


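Example 1050. Find $\int \left(x^3 + 2x^2\right)\left(3x^2 + 4x\right)dx$.
One method would be to expand the integrand, then antidifferentiate term by term:

$$\begin{aligned}
\int \left(x^3 + 2x^2\right)\left(3x^2 + 4x\right)dx &= \int 3x^5 + 4x^4 + 6x^4 + 8x^3\,dx \\
&= \frac{1}{2}x^6 + \frac{4}{5}x^5 + \frac{6}{5}x^5 + 2x^4 + C = \frac{1}{2}x^6 + 2x^5 + 2x^4 + C.
\end{aligned}$$

Another method is to observe that $\frac{d}{dx}\left(x^3 + 2x^2\right) = 3x^2 + 4x$, which suggests the substitution
$u \overset{1}{=} x^3 + 2x^2$. With this substitution, we have $\frac{du}{dx} \overset{2}{=} 3x^2 + 4x$ and:

$$\begin{aligned}
\int \left(x^3 + 2x^2\right)\left(3x^2 + 4x\right)dx &\overset{1}{=} \int u\left(3x^2 + 4x\right)dx \overset{2}{=} \int u\frac{du}{dx}\,dx \\
&\overset{s}{=} \int u\,du = \frac{1}{2}u^2 + C \overset{1}{=} \frac{1}{2}\left(x^3 + 2x^2\right)^2 + C.
\end{aligned}$$

At $\overset{s}{=}$, we apply the Substitution Rule and simply “cancel out the dx’s”. And as always,
the final step is to plug back the initial substitution $u \overset{1}{=} x^3 + 2x^2$ to get rid of u.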
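The two methods in Example 1050 should of course give the same polynomial. Here is a minimal sketch that confirms this with exact arithmetic, representing each polynomial as a list of coefficients (lowest degree first). The helpers `poly_mul` and `poly_integrate` are my own names, not anything from the syllabus.

```python
from fractions import Fraction

def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists (lowest degree first)."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def poly_integrate(p):
    """Antidifferentiate term by term, taking the constant of integration as 0."""
    return [Fraction(0)] + [Fraction(c, k + 1) for k, c in enumerate(p)]

u = [0, 0, 2, 1]   # x^3 + 2x^2
du = [0, 4, 3]     # 3x^2 + 4x, the derivative of u

# Method 1: expand the integrand, then antidifferentiate term by term.
method_1 = poly_integrate(poly_mul(u, du))
# Method 2: the substitution u = x^3 + 2x^2 gives u^2 / 2.
method_2 = [Fraction(c, 2) for c in poly_mul(u, u)]

assert method_1 == method_2  # both are x^6/2 + 2x^5 + 2x^4
```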

Remark 124. The Substitution Rule is also called integration by substitution,
change of variables, substitution by parts, or u-substitution.

Informally, the Substitution Rule says that we can simply “cancel out the dx’s”:

$$\frac{du}{dx}\,dx \text{ “=” } du. \qquad (☺)$$

Here we must repeat our earlier warning that $\frac{du}{dx}$ is not a fraction and we should not think
of the dx’s as numbers. So, (☺) is best thought of as being merely a convenient and informal
mnemonic for the Substitution Rule. Formally, we are not doing anything like cancelling
out the dx’s.
As it turns out, the Substitution Rule is simply the inverse of the Chain Rule. To
see why, let us first state and prove the following result, which is simply the Chain Rule
inverted. We will then explain why this result gives us the Substitution Rule:

Proposition 11. Let $a, b \in \mathbb{R}$ with $a < b$, $f : [a, b] \to \mathbb{R}$ be a differentiable function, and g
be a differentiable function for which the composite function $f \circ g$ exists. Then:

$$\int \left[(f' \circ g) \cdot g'\right] = f \circ g + C. \qquad \text{(Substitution Rule)}$$

Proof. By the Chain Rule, the derivative of $f \circ g$ is:

$$(f \circ g)' = (f' \circ g) \cdot g'.$$

And so equivalently, an antiderivative of $(f' \circ g) \cdot g'$ is $f \circ g$:

$$\int \left[(f' \circ g) \cdot g'\right] = f \circ g + C,$$

where as usual, C is the COI.

It is not at all obvious why the equation $\int \left[(f' \circ g) \cdot g'\right] = f \circ g$ gives us the Substitution
Rule.

To see why, first observe that we can rewrite LHS of this equation in more familiar notation:

$$\int \left[(f' \circ g) \cdot g'\right] = \int f'(g(x))\,g'(x)\,dx \overset{1}{=} \int f'(g(x))\frac{dg}{dx}\,dx.$$

Next, by definition $f = \int f'$. And so, we can rewrite RHS of the equation as:

$$f \circ g = f(g(x)) \overset{2}{=} \left.\int f'(t)\,dt\,\right|_{t=g(x)}.$$

Putting $\overset{1}{=}$ and $\overset{2}{=}$ together, we can rewrite the equation as:

$$\int f'(g(x))\frac{dg}{dx}\,dx \overset{\circ}{=} \left.\int f'(t)\,dt\,\right|_{t=g(x)}.$$

This last equation $\overset{\circ}{=}$ is recognisable as the Substitution Rule. By (informally) “cancelling
the dx’s”, the LHS of $\overset{\circ}{=}$ (informally) becomes336 $\int f'(g(x))\,dg$. This last expression seems
to say:

“Find the antiderivative of f′, then plug in g(x).”

But this is exactly what the RHS of $\overset{\circ}{=}$ says.

Exercise 351. XXX (Answer on p. 849.)


This is informal because we have not at any point in this textbook defined what an expression like
$\int f'(g(x))\,dg$ might mean.


Remark 125. Many calculus textbooks337 incorrectly present the Substitution Rule as:

$$\int f(g(x))\,g'(x)\,dx \overset{1}{=} \int f(u)\,du.$$

The above equation says that the functions $(f \circ g) \cdot g'$ and f have the same antiderivatives,
which is clearly false!
This observation was already made by David Gale in his 1994 article “Teaching Integration
by Substitution”. I can do no better than reproduce his remarks:

Of course the equation is false. The expression $\int f(x)\,dx$ stands for
antiderivative, as in a table of integrals, and the variable, be it x, t, u
or anything else is a dummy. Clearly the antiderivatives on the left and
right above are not equal. What the books mean, no doubt, is that if you
substitute g (x) for u after taking the antiderivative on the right you get
the antiderivative on the left. I expect some readers will say I am being
pedantic or that there is no need to be so rigorous at the freshman level,
but I think this kind of lapse is symptomatic of a rather strange set of
standards and perhaps it sheds light on why none of the books proves the
inverse substitution theorem. It is because none of them formulates it.

Without changing it too much, here are two corrected versions of $\overset{1}{=}$:

$$\int f(g(x))\,g'(x)\,dx \overset{2}{=} \left.\int f(u)\,du\,\right|_{u=g(x)} \quad\text{or}\quad \int (f \circ g) \cdot g' = \left(\int f\right) \circ g.$$

Or alternatively, if $F = \int f$, then:

$$\int f(g(x))\,g'(x)\,dx \overset{3}{=} F(g(x)) + C \quad\text{or}\quad \int (f \circ g) \cdot g' = F \circ g.$$

Some textbooks that make this error are: Stewart (Single Variable Calculus, 2011, p. 331); Thomas
and Finney (Calculus and Analytic Geometry, 1998, p. 294) — see also Hass, Heil, and Weir (Thomas’
Calculus, 2018, p. 291). Also, ProofWiki (retrieved 2018-10-06-1058).
Some textbooks that do not make this error are: Apostol (Calculus: Volume I, 1967, p. 212); Larson
and Edwards (Calculus of a Single Variable, 2016, p. 296).
81.1. $\int \left[(f' \circ g) \cdot g'\right] = f \circ g + C$

We now repeat our earlier derivation of the Substitution Rule. By the Chain Rule:

$$(f \circ g)' = (f' \circ g) \cdot g'.$$

And so equivalently:

$$\int \left[(f' \circ g) \cdot g'\right] \overset{1}{=} f \circ g + C.$$

As we saw earlier, by manipulating $\overset{1}{=}$ a little, we can make it look more like the Substitution
Rule.
Now, observe that by recognising that an integrand is of the form $(f' \circ g) \cdot g'$, we can
immediately write down its antiderivative and so skip the step of making any substitution.

Example 1051. In Example 1048, we found $\int \left(\cos x^2\right)2x\,dx$ by using the substitution
$u = x^2$.
But we can actually skip this substitution altogether. To do so, observe that:

$$\big(\underbrace{x^2}_{g}\big)' = \underbrace{2x}_{g'}.$$

So:

$$\int \underbrace{\cos}_{f'}\big(\underbrace{x^2}_{g}\big)\underbrace{2x}_{g'}\,dx = \underbrace{\sin}_{f}\,\underbrace{x^2}_{g} + C.$$

What we’ve done here is secretly equivalent to what we did earlier, but just much quicker.

Example 1052. In Example 1049, we found $\int \sin\left(x^2 + x\right)(2x + 1)\,dx$ by using the substitution
$u = x^2 + x$.
But we can actually skip this substitution altogether. To do so, observe that:

$$\big(\underbrace{x^2 + x}_{g}\big)' = \underbrace{2x + 1}_{g'}.$$

So:

$$\int \underbrace{\sin}_{f'}\big(\underbrace{x^2 + x}_{g}\big)\underbrace{(2x + 1)}_{g'}\,dx = \underbrace{-\cos}_{f}\big(\underbrace{x^2 + x}_{g}\big) + C.$$

Again, what we’ve done here is secretly equivalent to what we did earlier, but just much
quicker.


Exercise 352. XXX (Answer on p. 852.)


81.2. $\int f' \exp f = \exp f + C$

Example 1053. Find $\int e^{\sin x}\cos x\,dx$.

Observing that $\frac{d}{dx}\sin x = \cos x$, we try the substitution $u \overset{1}{=} \sin x$, so that $\frac{du}{dx} \overset{2}{=} \cos x$ and:

$$\int e^{\sin x}\cos x\,dx \overset{1}{=} \int e^u\cos x\,dx \overset{2}{=} \int e^u\frac{du}{dx}\,dx \overset{s}{=} \int e^u\,du = e^u + C \overset{1}{=} e^{\sin x} + C,$$

where as usual, the last step is to plug back the original substitution $u \overset{1}{=} \sin x$ to get rid
of u.

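Example 1054. Find $\int \frac{e^{\sqrt{x}}}{\sqrt{x}}\,dx$.

Observing that $\frac{d}{dx}\sqrt{x} = \frac{1}{2\sqrt{x}}$, we try the substitution $u \overset{1}{=} \sqrt{x}$, so that $\frac{du}{dx} = \frac{1}{2\sqrt{x}}$ or
$\frac{1}{\sqrt{x}} \overset{2}{=} 2\frac{du}{dx}$ and:

$$\int \frac{e^{\sqrt{x}}}{\sqrt{x}}\,dx \overset{1}{=} \int \frac{e^u}{\sqrt{x}}\,dx \overset{2}{=} 2\int e^u\frac{du}{dx}\,dx \overset{s}{=} 2\int e^u\,du = 2e^u + C \overset{1}{=} 2e^{\sqrt{x}} + C,$$

where as usual, the last step is to plug back the original substitution $u \overset{1}{=} \sqrt{x}$ to get rid
of u.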

In general:

Fact 154. Suppose f is a differentiable function. Then:

$$\int f' \exp f = \exp f + C.$$

Proof. By the Chain Rule: $(\exp f + C)' = (\exp f)\,f'$.

And so, by recognising that an integrand is of the form $f' \exp f$, we can simply skip the
substitution altogether and thus solve the problem more quickly. We now revisit the above
two examples:

Example 1055. Find $\int e^{\sin x}\cos x\,dx$.

Observing that $\frac{d}{dx}\sin x = \cos x$, we immediately conclude:

$$\int e^{\sin x}\cos x\,dx = e^{\sin x} + C.$$


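Example 1056. Find $\int \frac{e^{\sqrt{x}}}{\sqrt{x}}\,dx$.

Observing that $\frac{d}{dx}\sqrt{x} = \frac{1}{2\sqrt{x}}$, we immediately conclude:

$$\int \frac{e^{\sqrt{x}}}{\sqrt{x}}\,dx = 2\int e^{\sqrt{x}}\cdot\frac{1}{2\sqrt{x}}\,dx = 2e^{\sqrt{x}} + C.$$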

Exercise 353. XXX (Answer on p. 854.)



81.3. $\int \frac{f'}{f} = \ln|f| + C$

Example 1057. Earlier in Example 1047, we found $\int \cot x\,dx$ (for x not an integer multiple of
π) by using the substitution $u = \sin x$.

We can skip this substitution altogether by recognising that:

$$\cot x = \frac{\cos x}{\sin x} \quad\text{and}\quad \sin' x = \cos x.$$

And hence:

$$\int \cot x\,dx = \int \frac{\cos x}{\sin x}\,dx = \int \frac{\sin' x}{\sin x}\,dx = \ln|\sin x| + C.$$

Again, what we’ve done here is secretly equivalent to what we did earlier, but just much
quicker.

In general:

Fact 155. Suppose f is a differentiable function. Then:

$$\int \frac{f'}{f} = \ln|f| + C \quad (\text{for } f \neq 0).$$

Proof. By the Chain Rule: $\frac{d}{dx}\left(\ln|f| + C\right) = \frac{f'}{f}$.

And so, by recognising that an integrand is of the form $\frac{f'}{f}$, we can simply skip the
substitution altogether and thus solve the problem more quickly.


Example 1058. Find $\int \frac{2x + \cos x}{x^2 + \sin x}\,dx$.

Let us first solve this problem using the substitution $u \overset{1}{=} x^2 + \sin x$.

This substitution gives us $\frac{du}{dx} \overset{2}{=} 2x + \cos x$ and:

$$\begin{aligned}
\int \frac{2x + \cos x}{x^2 + \sin x}\,dx &\overset{1}{=} \int \frac{2x + \cos x}{u}\,dx \overset{2}{=} \int \frac{1}{u}\frac{du}{dx}\,dx \overset{s}{=} \int \frac{1}{u}\,du \\
&= \ln|u| + C \overset{1}{=} \ln\left|x^2 + \sin x\right| + C,
\end{aligned}$$

where as usual, the last step is to plug back the original substitution $u \overset{1}{=} x^2 + \sin x$ to
get rid of u.
Let us now redo the problem without explicitly making the substitution. Observe that
the derivative of the integrand’s denominator is its numerator:

$$\left(x^2 + \sin x\right)' = 2x + \cos x.$$

And hence:

$$\int \frac{2x + \cos x}{x^2 + \sin x}\,dx = \int \frac{\left(x^2 + \sin x\right)'}{x^2 + \sin x}\,dx = \ln\left|x^2 + \sin x\right| + C.$$

In this example, the second solution is secretly equivalent to the first, but just much
quicker.

We will now prove one of the Rules of Antidifferentiation from Proposition 9:

Example 1059. Find $\int \tan x\,dx$.

By definition, $\tan = \frac{\sin}{\cos}$.
We also know that $\cos' = -\sin$.

So: $\tan = \frac{\sin}{\cos} = -\frac{-\sin}{\cos} = -\frac{\cos'}{\cos}$.

Thus:

$$\int \tan x\,dx = -\int \frac{-\sin x}{\cos x}\,dx = -\int \frac{\cos'(x)}{\cos x}\,dx \overset{\star}{=} -\ln|\cos x| + C = \ln|\sec x| + C.$$


Exercise 354. Find each antiderivative. (Answer on p. 857.)

(a) $\int \frac{3x^2 + \sin x}{x^3 - \cos x}\,dx$.

(b) $\int \frac{\sin 2x}{\sin^2 x + 1}\,dx$.

(c) $\int \frac{10x + \cos x}{5x^2 + \sin x}\,dx$.

(d) $\int \frac{3x^2 + 2x + 1}{x^3 + x^2 + x + 1}\,dx$.

A354(a) Since $\left(x^3 - \cos x\right)' = 3x^2 + \sin x$, we have:

$$\int \frac{3x^2 + \sin x}{x^3 - \cos x}\,dx = \ln\left|x^3 - \cos x\right| + C.$$

(b) Since $\left(\sin^2 x + 1\right)' = 2\sin x\cos x = \sin 2x$, we have:

$$\int \frac{\sin 2x}{\sin^2 x + 1}\,dx = \ln\left|\sin^2 x + 1\right| + C = \ln\left(\sin^2 x + 1\right) + C.$$

Note that we can remove the absolute value sign because $\sin^2 x + 1 > 0$ for all $x \in \mathbb{R}$.

(c) Since $\left(5x^2 + \sin x\right)' = 10x + \cos x$, we have:

$$\int \frac{10x + \cos x}{5x^2 + \sin x}\,dx = \ln\left|5x^2 + \sin x\right| + C.$$

(d) Since $\left(x^3 + x^2 + x + 1\right)' = 3x^2 + 2x + 1$, we have:

$$\int \frac{3x^2 + 2x + 1}{x^3 + x^2 + x + 1}\,dx = \ln\left|x^3 + x^2 + x + 1\right| + C.$$


81.4. $\int (f)^n \cdot f' = \frac{1}{n+1}(f)^{n+1} + C$

By the Chain Rule, $\frac{d}{dx}(f)^2 = 2f \cdot f'$. Hence:

$$\int f \cdot f' = \frac{1}{2}(f)^2 + C.$$

Example 1060. Earlier in Example 1050, one method by which we found
$\int \left(x^3 + 2x^2\right)\left(3x^2 + 4x\right)dx$ was to use the substitution $u = x^3 + 2x^2$.

If we recognise that $\left(x^3 + 2x^2\right)' = 3x^2 + 4x$ and hence that the integrand is of the form
$f \cdot f'$, then we can skip the substitution altogether and simply write:

$$\int \left(x^3 + 2x^2\right)\left(3x^2 + 4x\right)dx = \int \left(x^3 + 2x^2\right)\left(x^3 + 2x^2\right)'dx = \frac{1}{2}\left(x^3 + 2x^2\right)^2 + C.$$

Similarly, by the Chain Rule, $\frac{d}{dx}(f)^3 = 3(f)^2 \cdot f'$. Hence:

$$\int (f)^2 \cdot f' = \frac{1}{3}(f)^3 + C.$$

Example 1061. Find $\int \left(x^3 + 2x^2\right)^2\left(3x^2 + 4x\right)dx$. Again, one method is to expand
the integrand then antidifferentiate term by term. Another is to use the substitution
$u = x^3 + 2x^2$.

The best method of all is to simply recognise that $\left(x^3 + 2x^2\right)' = 3x^2 + 4x$ and hence that
the integrand is of the form $(f)^2 \cdot f'$. We can then skip the substitution altogether and
simply write:

$$\int \left(x^3 + 2x^2\right)^2\left(3x^2 + 4x\right)dx = \int \left(x^3 + 2x^2\right)^2\left(x^3 + 2x^2\right)'dx = \frac{1}{3}\left(x^3 + 2x^2\right)^3 + C.$$

In general, by the Chain Rule, for any $n \neq -1$, we have:

$$\frac{d}{dx}(f)^{n+1} = (n + 1)(f)^n \cdot f'.$$

Fact 156. Let $n \neq -1$ be a real number. Suppose f is a differentiable function. Then:

$$\int (f)^n f' = \frac{1}{n+1}(f)^{n+1} + C.$$
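Fact 156 holds for any real n ≠ −1, not just positive integers. As a rough illustration, the sketch below checks it numerically for the sample function $f(x) = x^2 + 1$ (my own choice, so that f > 0 and non-integer powers make sense) with n = 2.5 and n = −0.5.

```python
def f(x):
    return x * x + 1          # sample function, positive everywhere

def f_prime(x):
    return 2 * x

def antiderivative(n):
    # Fact 156 with C = 0: (f)^(n+1) / (n + 1), valid for n != -1.
    return lambda x: f(x) ** (n + 1) / (n + 1)

def numerical_derivative(F, x, h=1e-5):
    # Central difference approximation to F'(x).
    return (F(x + h) - F(x - h)) / (2 * h)

for n in (2.5, -0.5):
    F = antiderivative(n)
    for x in (-1.2, 0.7):
        # Differentiating the claimed antiderivative recovers (f)^n * f'.
        assert abs(numerical_derivative(F, x) - f(x) ** n * f_prime(x)) < 1e-6
```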


Example 1062. Find $\int \left(x^3 + 5x^2 - 3x + 2\right)^{50}\left(3x^2 + 10x - 3\right)dx$.

One method is to fully expand the integrand to get a 152nd-degree polynomial, then
integrate this polynomial term-by-term. This is doable, but absurdly tedious.
A better method is to observe that:

$$\left(x^3 + 5x^2 - 3x + 2\right)' = 3x^2 + 10x - 3.$$

This suggests that we should use the substitution $u \overset{1}{=} x^3 + 5x^2 - 3x + 2$.
We have:

$$\frac{du}{dx} \overset{2}{=} 3x^2 + 10x - 3.$$

And now:

$$\begin{aligned}
\int \left(x^3 + 5x^2 - 3x + 2\right)^{50}\left(3x^2 + 10x - 3\right)dx &\overset{1}{=} \int u^{50}\left(3x^2 + 10x - 3\right)dx \\
&\overset{2}{=} \int u^{50}\frac{du}{dx}\,dx \\
&\overset{s}{=} \int u^{50}\,du \\
&= \frac{1}{51}u^{51} + C \\
&\overset{1}{=} \frac{1}{51}\left(x^3 + 5x^2 - 3x + 2\right)^{51} + C,
\end{aligned}$$

where as usual, the last step is to plug back the original substitution $u \overset{1}{=} x^3 + 5x^2 - 3x + 2$
to get rid of u.


81.5. Building a Derivative
Here we’ll continue using the ln trick. Sometimes though, the ln Trick may not be so
obvious and may require you to do a bit of work to get it set up first. We call this building
a derivative of the denominator.

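Example 1063. Find $\int \frac{x}{x^2 + x + 1}\,dx$.

This time, we have: $\left(x^2 + x + 1\right)' = 2x + 1$.

And so, it’s not obvious how we can apply the ln trick.
But what we can do is to rewrite the integral:

$$\int \frac{x}{x^2 + x + 1}\,dx = \frac{1}{2}\int \frac{2x}{x^2 + x + 1}\,dx = \frac{1}{2}\int \frac{2x + 1 - 1}{x^2 + x + 1}\,dx = \frac{1}{2}\left(\int \frac{2x + 1}{x^2 + x + 1}\,dx - \int \frac{1}{x^2 + x + 1}\,dx\right).$$

And now, we can apply the ln trick to the first term:338

$$\int \frac{2x + 1}{x^2 + x + 1}\,dx = \ln\left(x^2 + x + 1\right) + \hat{C}.$$

(Note that in this last step, we can use parentheses instead of the absolute value sign
because $x^2 + x + 1 > 0$ for all $x \in \mathbb{R}$.)
For the second term, we can use what we learnt in Ch. 80.5:

$$\int \frac{1}{x^2 + x + 1}\,dx = \int \frac{1}{\left(x + \frac{1}{2}\right)^2 + \frac{3}{4}}\,dx = \int \frac{1}{\left(x + \frac{1}{2}\right)^2 + \left(\frac{\sqrt{3}}{2}\right)^2}\,dx = \frac{2}{\sqrt{3}}\tan^{-1}\frac{2x + 1}{\sqrt{3}} + \bar{C}.$$

Altogether then:

$$\int \frac{x}{x^2 + x + 1}\,dx = \frac{1}{2}\left(\int \frac{2x + 1}{x^2 + x + 1}\,dx - \int \frac{1}{x^2 + x + 1}\,dx\right) = \frac{1}{2}\ln\left(x^2 + x + 1\right) - \frac{1}{\sqrt{3}}\tan^{-1}\frac{2x + 1}{\sqrt{3}} + C.$$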
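With two terms and several constants floating around, it’s easy to slip up in Example 1063. A minimal sketch of a numerical check follows: it differentiates the final answer by central differences and compares the result with the integrand $\frac{x}{x^2 + x + 1}$. The helper name and sample points are mine.

```python
import math

def integrand(x):
    return x / (x * x + x + 1)

def antiderivative(x):
    # Final answer of Example 1063, with C = 0.
    root3 = math.sqrt(3)
    return 0.5 * math.log(x * x + x + 1) - math.atan((2 * x + 1) / root3) / root3

def numerical_derivative(F, x, h=1e-5):
    # Central difference approximation to F'(x).
    return (F(x + h) - F(x - h)) / (2 * h)

for x in (-3.0, -0.5, 0.0, 2.0):
    assert abs(numerical_derivative(antiderivative, x) - integrand(x)) < 1e-8
```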


81.6. More Challenging Applications of the Substitution Rule
Your H2 Maths syllabus includes:
• integration by a given substitution.
I interpret this to mean that they promise to always give you the needed substitution on
your A-Level exams and that you don’t ever need to figure out what substitution to make
yourself. As far as I can tell, they’ve kept this promise so far (but who knows if they will
continue to do so).
Note though that your H2 Maths syllabus also includes, separately:
• integration of $f'(x)\left[f(x)\right]^n$ (including n = −1), $f'(x)\,e^{f(x)}$.

That is, you may not be given the substitution to make when faced with integrands of the
form $f'(x)\left[f(x)\right]^n$ or $f'(x)\,e^{f(x)}$. But this is no problem, since we’ve already thoroughly
covered these in previous subchapters.

In this subchapter, we’ll look at harder problems where you will, on the A-Level exams,
(probably) be given the necessary substitution.


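Example 1064. Find $\int \sqrt{1 - x^2}\,dx$ using the substitution $u \overset{1}{=} \sin^{-1} x \in \left[-\frac{\pi}{2}, \frac{\pi}{2}\right]$.

Rearranging $\overset{1}{=}$, we have $x \overset{2}{=} \sin u$ and $\frac{dx}{du} \overset{3}{=} \cos u$.

Also, $\sqrt{1 - x^2} \overset{4}{=} \cos u$ because:

$$\begin{aligned}
\sqrt{1 - x^2} &\overset{2}{=} \sqrt{1 - \sin^2 u} \\
&= \sqrt{\cos^2 u} && (\because \sin^2 u + \cos^2 u = 1) \\
&= |\cos u| && (\because \sqrt{y^2} = |y|) \\
&= \cos u. && \text{Since } u \in \left[-\tfrac{\pi}{2}, \tfrac{\pi}{2}\right],\ \cos u \geq 0.
\end{aligned}$$

By the Inverse Function Theorem, we have $\frac{dx}{du}\frac{du}{dx} \overset{\star}{=} 1$. We will use this in conjunction
with the Times One Trick:

$$\begin{aligned}
\int \sqrt{1 - x^2}\,dx &\overset{2}{=} \int \sqrt{1 - \sin^2 u}\,dx \overset{4}{=} \int \cos u\,dx \\
&\overset{\star}{=} \int \cos u\,\frac{dx}{du}\frac{du}{dx}\,dx && \text{(Times One Trick)} \\
&\overset{3}{=} \int \cos^2 u\,\frac{du}{dx}\,dx \\
&\overset{s}{=} \int \cos^2 u\,du \\
&= \int \frac{\cos 2u + 1}{2}\,du && \text{(Double Angle Formula)} \\
&= \frac{\sin 2u}{4} + \frac{u}{2} + C \\
&= \frac{\sin u\cos u}{2} + \frac{u}{2} + C \\
&\overset{1,2}{=} \frac{x\cos u}{2} + \frac{\sin^{-1} x}{2} + C \\
&\overset{4}{=} \frac{x\sqrt{1 - x^2}}{2} + \frac{\sin^{-1} x}{2} + C.
\end{aligned}$$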
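Here is a quick sanity check of Example 1064 with a geometric flavour. The graph of $y = \sqrt{1 - x^2}$ for 0 ≤ x ≤ 1 is a quarter of the unit circle, so the antiderivative evaluated between 0 and 1 should give the area π/4. The sketch below checks this, plus the derivative at a few interior points; the helper names are my own.

```python
import math

def antiderivative(x):
    # Final answer of Example 1064, with C = 0 (valid for -1 <= x <= 1).
    return x * math.sqrt(1 - x * x) / 2 + math.asin(x) / 2

def numerical_derivative(F, x, h=1e-5):
    # Central difference approximation to F'(x).
    return (F(x + h) - F(x - h)) / (2 * h)

# Area under the quarter circle from x = 0 to x = 1 is pi / 4.
assert abs((antiderivative(1.0) - antiderivative(0.0)) - math.pi / 4) < 1e-12

# The derivative of the antiderivative recovers sqrt(1 - x^2).
for x in (-0.6, 0.0, 0.3):
    assert abs(numerical_derivative(antiderivative, x) - math.sqrt(1 - x * x)) < 1e-8
```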


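Example 1065. Find $\int \frac{x^2}{\sqrt[3]{1 + 2x}}\,dx$ using the substitution $u \overset{1}{=} (1 + 2x)^{1/3}$.

Rearranging $\overset{1}{=}$, we have $x \overset{2}{=} \frac{1}{2}\left(u^3 - 1\right)$ and $\frac{dx}{du} \overset{3}{=} \frac{3}{2}u^2$.

$$\begin{aligned}
\int \frac{x^2}{\sqrt[3]{1 + 2x}}\,dx &\overset{1}{=} \int \frac{x^2}{u}\,dx \\
&\overset{2}{=} \int \frac{\left(u^3 - 1\right)^2}{4u}\,dx \\
&= \int \frac{\left(u^3 - 1\right)^2}{4u}\frac{dx}{du}\frac{du}{dx}\,dx && \text{(Times One Trick)} \\
&\overset{3}{=} \int \frac{\left(u^3 - 1\right)^2}{4u}\cdot\frac{3}{2}u^2\,\frac{du}{dx}\,dx \\
&= \frac{3}{8}\int u\left(u^3 - 1\right)^2\frac{du}{dx}\,dx \\
&\overset{s}{=} \frac{3}{8}\int u\left(u^3 - 1\right)^2 du \\
&= \frac{3}{8}\int u\left(u^6 - 2u^3 + 1\right)du \\
&= \frac{3}{8}\int u^7 - 2u^4 + u\,du \\
&= \frac{3}{8}\left(\frac{u^8}{8} - \frac{2u^5}{5} + \frac{u^2}{2}\right) + C \\
&= \frac{3}{8}u^2\left(\frac{u^6}{8} - \frac{2u^3}{5} + \frac{1}{2}\right) + C \\
&\overset{1}{=} \frac{3}{8}(1 + 2x)^{2/3}\left[\frac{(1 + 2x)^2}{8} - \frac{2(1 + 2x)}{5} + \frac{1}{2}\right] + C \\
&= \frac{3}{320}(1 + 2x)^{2/3}\left(20x^2 - 12x + 9\right) + C.
\end{aligned}$$

Note that the last step is nice but not necessary.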
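Long chains of algebra like the one in Example 1065 are where slips tend to creep in, so a numerical check is worthwhile. The sketch below takes the integrand to be $x^2/(1 + 2x)^{1/3}$ (consistent with the given substitution $u = (1 + 2x)^{1/3}$) and checks that the simplified answer, with leading constant 3/320, differentiates back to it. The sample points satisfy x > −1/2 so that the fractional powers are real; the helper names are mine.

```python
def integrand(x):
    return x * x / (1 + 2 * x) ** (1 / 3)

def antiderivative(x):
    # Simplified final answer, with C = 0.
    return 3 / 320 * (1 + 2 * x) ** (2 / 3) * (20 * x * x - 12 * x + 9)

def unsimplified(x):
    # Second-last line of the working, before the "nice but not necessary" step.
    w = 1 + 2 * x
    return 3 / 8 * w ** (2 / 3) * (w * w / 8 - 2 * w / 5 + 1 / 2)

def numerical_derivative(F, x, h=1e-5):
    # Central difference approximation to F'(x).
    return (F(x + h) - F(x - h)) / (2 * h)

for x in (0.0, 1.0, 3.5):
    assert abs(antiderivative(x) - unsimplified(x)) < 1e-9
    assert abs(numerical_derivative(antiderivative, x) - integrand(x)) < 1e-7
```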


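Example 1066. Find $\int \frac{1}{1 + 3\cos^2 x}\,dx$ using the substitution $u \overset{1}{=} \tan x$.

As usual, we have: $\frac{du}{dx} \overset{2}{=} \sec^2 x$.

$$\begin{aligned}
\int \frac{1}{1 + 3\cos^2 x}\,dx &= \int \frac{\sec^2 x}{\sec^2 x + 3}\,dx && (\text{Multiply N and D by } \sec^2 x) \\
&\overset{2}{=} \int \frac{1}{\sec^2 x + 3}\frac{du}{dx}\,dx \\
&\overset{s}{=} \int \frac{1}{\sec^2 x + 3}\,du \\
&= \int \frac{1}{\tan^2 x + 4}\,du && (\because \tan^2 x + 1 = \sec^2 x) \\
&\overset{1}{=} \int \frac{1}{u^2 + 4}\,du \\
&= \frac{1}{4}\int \frac{1}{\left(\frac{u}{2}\right)^2 + 1}\,du \\
&= \frac{1}{4}\left(2\tan^{-1}\frac{u}{2}\right) + C \\
&= \frac{1}{2}\tan^{-1}\frac{u}{2} + C \\
&\overset{1}{=} \frac{1}{2}\tan^{-1}\frac{\tan x}{2} + C.
\end{aligned}$$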


The following two exercises are a little harder than what is typical on the A-Level exams.
They also provide a (rather roundabout) method of finding formulae for sin ○ sec−1 and
cos ○ tan−1 .
Exercise 355. Consider the function $f : (1, \infty) \to \mathbb{R}$ defined by: (Answer on p. 1549.)

$$f(x) = \frac{1}{x^2\sqrt{x^2 - 1}}.$$

(a) Show that $\int f\,dx = \sin\left(\sec^{-1} x\right) + C_1$ by using the substitution $x = \sec u$, where
$u \in \left(0, \frac{\pi}{2}\right)$. (Hint 1: The chosen values of u ensure that tan u > 0.)

(b) Now show that $\int f\,dx = \frac{\sqrt{x^2 - 1}}{x} + C_2$ by using the substitution $u = 1 - \frac{1}{x^2} \in (0, 1)$.

(c) Use (a) and (b) to prove that we have:

$$\sin\left(\sec^{-1} x\right) = \frac{\sqrt{x^2 - 1}}{x} \quad (\text{at least for } x > 1).$$

(Hint 2: How are two antiderivatives of the same function related? Hint 3: Plug in x = 2.)

Exercise 356. Consider the function $g : \mathbb{R} \to \mathbb{R}$ defined by: (Answer on p. 128.12.)

$$g(x) = \frac{x^3}{\left(x^2 + 1\right)^{3/2}}.$$

(a) Show that $\int g\,dx = \cos\left(\tan^{-1} x\right) + \frac{1}{\cos\left(\tan^{-1} x\right)} + C_1$ by using the substitution $x =$
$\tan u$, where $u \in \left(-\frac{\pi}{2}, \frac{\pi}{2}\right)$. (Hint: These values of u ensure that sec u > 0.)

(b) Now show that $\int g\,dx = \frac{1}{\sqrt{x^2 + 1}} + \sqrt{x^2 + 1} + C_2$ by using the substitution $u = x^2 + 1$.

(c) Prove that if $\frac{1}{a} + a = \frac{1}{b} + b$ $(a, b \neq 0)$, then either $a = b$ or $a = \frac{1}{b}$.

(d) Use (a), (b), and (c) to prove that we have:

$$\cos\left(\tan^{-1} x\right) = \frac{1}{\sqrt{x^2 + 1}} \quad (\text{for all } x).$$


Example 1067. Find $\int \exp(\sin x)\cos x\,dx$.
Earlier in Example ??, we used the given substitution $u = \sin x$.
But we can actually skip this substitution altogether by recognising that:

$$\underbrace{\sin}_{g}{}' = \underbrace{\cos}_{g'} \quad\text{and}\quad \underbrace{\exp}_{f}{}' = \underbrace{\exp}_{f'}.$$

And thus:

$$\int \exp(\sin x)\cos x\,dx = \int \underbrace{\exp}_{f'}\big(\underbrace{\sin x}_{g}\big)\underbrace{\cos x}_{g'}\,dx = \underbrace{\exp}_{f}\big(\underbrace{\sin x}_{g}\big) + C.$$

This procedure is secretly equivalent to what we did in Example ??, but just much
quicker.

Like the ln trick, we can similarly observe that:

$$\frac{d}{dx}\left[f(x)\right]^2 = 2f(x)f'(x), \qquad \frac{d}{dx}\frac{1}{f(x)} = -\frac{f'(x)}{\left[f(x)\right]^2}, \qquad \frac{d}{dx}e^{f(x)} = e^{f(x)}f'(x).$$

And hence:

$$\int 2f(x)f'(x)\,dx = \left[f(x)\right]^2 + C, \qquad \int -\frac{f'(x)}{\left[f(x)\right]^2}\,dx = \frac{1}{f(x)} + C, \qquad \int e^{f(x)}f'(x)\,dx = e^{f(x)} + C.$$


Exercise 357. Find ∫ cot x dx (for x not a multiple of π; this is also from Proposition
9). (Answer on p. 1549.)
Exercise 358. Throughout this question, assume x is not an odd multiple of π/2.

(a) By first multiplying the integrand sec x by $\frac{\sec x + \tan x}{\sec x + \tan x}$, then considering the derivative
of $\sec x + \tan x$, prove Proposition 9(h):

$$\int \sec x\,dx \overset{1}{=} \ln\left|\sec x + \tan x\right| + C.$$

In (b)–(e), we will prove also that:

$$\int \sec x\,dx \overset{2}{=} \frac{1}{2}\ln\frac{1 + \sin x}{1 - \sin x} + C.$$

(b) Show that $\sec x = \frac{\cos x}{1 - \sin^2 x}$.

(c) Show that $\frac{1}{1 - \sin^2 x} = \frac{A}{1 + \sin x} + \frac{B}{1 - \sin x}$, where A and B are constants to be found.

(d) By plugging in (b) and (c), then considering the derivatives of the denominators,
prove $\overset{2}{=}$.

(e) By considering $2\ln\left|\sec x + \tan x\right|$ or otherwise, show that $\overset{1}{=}$ and $\overset{2}{=}$ are equivalent.

(f) Prove that $\tan\left(\theta + \frac{\pi}{4}\right) = \sec 2\theta + \tan 2\theta$. (This may be hard; see hints in footnote.)339

(g) Hence conclude that:

$$\int \sec x\,dx \overset{3}{=} \ln\left|\tan\left(\frac{x}{2} + \frac{\pi}{4}\right)\right| + C.$$

(Answer on p. 867.)

According to Rickey and Tuchinsky (1980), historically, the problem of solving the integral
∫ sec x dx was “one of the outstanding open problems of the mid-seventeenth century”
and was closely related to the construction of the Mercator map projection. In 1645,
Isaac Barrow (Newton’s teacher) gave the first “intelligible” solution.
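Footnote 339. Use the following steps:
$\tan(A + B) \overset{4}{=} \frac{\tan A + \tan B}{1 - \tan A\tan B}$, $\tan\frac{\pi}{4} \overset{5}{=} 1$, $\tan\theta \overset{6}{=} \frac{\sin\theta}{\cos\theta}$, multiply by $\frac{\cos\theta + \sin\theta}{\cos\theta + \sin\theta}$, $\sin^2\theta + \cos^2\theta \overset{7}{=} 1$,
$2\sin\theta\cos\theta \overset{8}{=} \sin 2\theta$, $\cos^2\theta - \sin^2\theta \overset{9}{=} \cos 2\theta$.

A358(a) $\int \sec x\,dx = \int \sec x\,\frac{\sec x + \tan x}{\sec x + \tan x}\,dx = \int \frac{\sec^2 x + \sec x\tan x}{\sec x + \tan x}\,dx.$

Observe that $\left(\sec x + \tan x\right)' = \sec^2 x + \sec x\tan x$. Thus:

$$\int \sec x\,dx = \int \frac{\sec^2 x + \sec x\tan x}{\sec x + \tan x}\,dx = \ln\left|\sec x + \tan x\right| + C.$$

(b) $\sec x = \frac{1}{\cos x} = \frac{\cos x}{\cos^2 x} = \frac{\cos x}{1 - \sin^2 x}$.

(c) $\frac{1}{1 - \sin^2 x} = \frac{1}{(1 + \sin x)(1 - \sin x)} = \frac{A}{1 + \sin x} + \frac{B}{1 - \sin x} = \frac{A + B + (B - A)\sin x}{(1 + \sin x)(1 - \sin x)}$.

Comparing coefficients, we have A + B = 1 and B − A = 0. Thus, A = B = 1/2.

(d) $\int \sec x\,dx \overset{b}{=} \int \frac{\cos x}{1 - \sin^2 x}\,dx \overset{c}{=} \frac{1}{2}\int \frac{\cos x}{1 + \sin x} + \frac{\cos x}{1 - \sin x}\,dx.$

Observe that $\frac{d}{dx}(1 + \sin x) = \cos x$ and $\frac{d}{dx}(1 - \sin x) = -\cos x$. Thus:

$$\begin{aligned}
\int \sec x\,dx &= \frac{1}{2}\int \frac{\cos x}{1 + \sin x} + \frac{\cos x}{1 - \sin x}\,dx = \frac{1}{2}\left[\int \frac{\cos x}{1 + \sin x}\,dx - \int \frac{-\cos x}{1 - \sin x}\,dx\right] \\
&\overset{\star}{=} \frac{1}{2}\left[\ln(1 + \sin x) - \ln(1 - \sin x)\right] + C = \frac{1}{2}\ln\frac{1 + \sin x}{1 - \sin x} + C.
\end{aligned}$$

Note that since x is not an odd multiple of π/2, we have −1 < sin x < 1 so that 1 + sin x > 0
and 1 − sin x > 0. Hence, at $\overset{\star}{=}$, we can use parentheses instead of the absolute value sign.

(e)

$$\begin{aligned}
2\ln\left|\sec x + \tan x\right| &= 2\ln\left|\frac{1}{\cos x} + \frac{\sin x}{\cos x}\right| = 2\ln\left|\frac{1 + \sin x}{\cos x}\right| = \ln\left|\frac{1 + \sin x}{\cos x}\right|^2 \\
&= \ln\frac{(1 + \sin x)(1 + \sin x)}{\cos^2 x} = \ln\frac{(1 + \sin x)(1 + \sin x)}{1 - \sin^2 x} = \ln\frac{1 + \sin x}{1 - \sin x}.
\end{aligned}$$

Hence, $\ln\left|\sec x + \tan x\right| = \frac{1}{2}\ln\frac{1 + \sin x}{1 - \sin x}$ and $\overset{1}{=}$ and $\overset{2}{=}$ are equivalent.

(f) Following the hints, we have:

$$\begin{aligned}
\tan\left(\theta + \frac{\pi}{4}\right) &\overset{4}{=} \frac{\tan\theta + \tan\frac{\pi}{4}}{1 - \tan\theta\tan\frac{\pi}{4}} \overset{5}{=} \frac{\tan\theta + 1}{1 - \tan\theta} \overset{6}{=} \frac{\frac{\sin\theta}{\cos\theta} + 1}{1 - \frac{\sin\theta}{\cos\theta}} \\
&= \frac{\sin\theta + \cos\theta}{\cos\theta - \sin\theta} = \frac{\sin\theta + \cos\theta}{\cos\theta - \sin\theta} \times \frac{\cos\theta + \sin\theta}{\cos\theta + \sin\theta} \\
&= \frac{\sin^2\theta + \cos^2\theta + 2\sin\theta\cos\theta}{\cos^2\theta - \sin^2\theta} \overset{7}{=} \frac{1 + 2\sin\theta\cos\theta}{\cos^2\theta - \sin^2\theta} \\
&\overset{8}{=} \frac{1 + \sin 2\theta}{\cos^2\theta - \sin^2\theta} \overset{9}{=} \frac{1 + \sin 2\theta}{\cos 2\theta} = \sec 2\theta + \tan 2\theta.
\end{aligned}$$

(g) By (f), we have $\tan\left(\frac{x}{2} + \frac{\pi}{4}\right) = \sec x + \tan x$.

Plugging this last equation into $\overset{1}{=}$, we get $\overset{3}{=}$.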

868, Contents

81.7. An Alternative Formula for IBP

Exercise 359. Recall that the formula for Integration by Parts is:

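$$\int u \cdot v' \,dx = u \cdot v - \int u' \cdot v \,dx.$$

Using the Substitution Rule, show that the above formula may also be written as:

$$\int u \,dv = u \cdot v - \int v \,du.$$ (Answer on p. 869.)

A359. We have:

$$\int u \cdot v' \,dx = \int u \cdot \frac{dv}{dx} \,dx = \int u \,dv \qquad \text{and} \qquad \int u' \cdot v \,dx = \int \frac{du}{dx} \cdot v \,dx = \int v \,du.$$

Hence:

$$\int u \cdot v' \,dx = u \cdot v - \int u' \cdot v \,dx \iff \int u \,dv = u \cdot v - \int v \,du.$$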

The above alternative formula for IBP is really just the exact same thing but under a
different guise. We revisit our first two examples from Ch. 80.8:

Example 1068. Find ∫ xex dx.

As before, we use the dETAIL rule of thumb:

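$$\int \overbrace{x}^{u} \, \overbrace{e^x \,dx}^{dv} = \overbrace{x}^{u} \, \overbrace{e^x}^{v} - \int \overbrace{e^x}^{v} \, \overbrace{1 \,dx}^{du} = xe^x - e^x + C.$$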

Example 1069. Find ∫ x sin x dx.

As before, we use the dETAIL rule of thumb:

$$\int \overbrace{x}^{u} \, \overbrace{\sin x \,dx}^{dv} = \overbrace{x}^{u} \, \overbrace{(-\cos x)}^{v} - \int \overbrace{(-\cos x)}^{v} \, \overbrace{1 \,dx}^{du} = -x\cos x + \sin x + C.$$
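The two antiderivatives above can be sanity-checked numerically: differentiating each answer should return the original integrand. The sketch below is only an illustration and not part of the text; the helper names `derivative`, `F1`, `F2` and `max_error` are made up here, and the derivative is merely estimated by a central difference.

```python
import math

def derivative(f, x, h=1e-5):
    """Central-difference estimate of f'(x) (an approximation, not exact)."""
    return (f(x + h) - f(x - h)) / (2 * h)

# Antiderivatives found by IBP in Examples 1068 and 1069 (taking C = 0).
def F1(x):
    return x * math.exp(x) - math.exp(x)

def F2(x):
    return -x * math.cos(x) + math.sin(x)

def max_error(F, f, points):
    """Largest gap between the estimated F'(x) and the integrand f(x)."""
    return max(abs(derivative(F, x) - f(x)) for x in points)

points = [-2.0, -0.5, 0.0, 0.7, 1.5, 3.0]
err1 = max_error(F1, lambda x: x * math.exp(x), points)
err2 = max_error(F2, lambda x: x * math.sin(x), points)
print(err1, err2)  # both should be tiny
```

A check like this cannot prove that an antiderivative is correct, but it quickly catches sign errors of the kind that are easy to make with IBP.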

Remark 126. As mentioned in Remark 123, if you’re comfortable with or even prefer the
above alternative formula for IBP, then a mnemonic that goes well with it is “ultraviolet

869, Contents

81.8. The Substitution Rule with the TOT and IBP
We can use the Substitution Rule and IBP to find the antiderivatives of the inverse trigo-
nometric functions:

Example 1070. To find ∫ tan⁻¹ x, we can use the (1) Times One Trick; (2) IBP; and
the (3) Substitution Rule:

$$\int \tan^{-1} x \overset{1}{=} \int 1 \cdot \tan^{-1} x \overset{2}{=} x \tan^{-1} x - \int x \cdot \frac{1}{1 + x^2} \overset{3}{=} x \tan^{-1} x - \frac{1}{2} \ln\left(1 + x^2\right) + C.$$

Note that in the last step, we can use parentheses instead of the absolute value sign
because 1 + x2 > 0 for all x ∈ R.

Exercise 360. Find each antiderivative. (Answer on p. 870.)

(a) ∫ sin−1 x

(b) ∫ cos−1 x

A360. As in the above example, we’ll use the (1) Times One Trick; (2) IBP; and the (3)
Substitution Rule:
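(a) $$\int \sin^{-1} x \overset{1}{=} \int 1 \cdot \sin^{-1} x \,dx \overset{2}{=} x \sin^{-1} x - \int x \cdot \frac{1}{\sqrt{1 - x^2}} \overset{3}{=} x \sin^{-1} x + \sqrt{1 - x^2} + C.$$

(b) $$\int \cos^{-1} x \overset{1}{=} \int 1 \cdot \cos^{-1} x \,dx \overset{2}{=} x \cos^{-1} x - \int x \cdot \frac{-1}{\sqrt{1 - x^2}} \overset{3}{=} x \cos^{-1} x - \sqrt{1 - x^2} + C.$$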

870, Contents

81.9. Finding the Antiderivative of an Inverse Function (optional)

Suppose f is an invertible function and we’ve already figured out what ∫ f⁻¹ is. Then we
have the following lovely formula³⁴⁰ for ∫ f:

Proposition 12. Let a, b ∈ R with a < b. Suppose f ∶ [a, b] → R is continuous and
invertible. Then:

$$\int f(x) = x f(x) - \left. \int f^{-1}(y) \,\right|_{y = f(x)} + C.$$

Proof. We’ll use the (1) Times One Trick; (2) IBP; and the (3) Substitution Rule again.
In addition, we’ll use (4) x = f⁻¹(f(x)):

$$\int f(x) \overset{1}{=} \int 1 \cdot f(x) \overset{2}{=} x f(x) - \int x f'(x) \overset{4}{=} x f(x) - \int \left[ f^{-1}(f(x)) \cdot f'(x) \right] \overset{3}{=} x f(x) - \left. \int f^{-1}(y) \,\right|_{y = f(x)}.$$

Example 1071. We know that ∫ exp x = exp x + C.

By definition, ln is the inverse of exp. And so by Proposition 12, we have:

$$\int \ln x = x \ln x - \left. \int \exp y \,\right|_{y = \ln x} + C = x \ln x - \left. \exp y \,\right|_{y = \ln x} + C = x \ln x - \exp(\ln x) + C = x \ln x - x + C.$$

Reassuringly, this is the same as what we found earlier in Example 1046. (Of course, in
Example 1046, we were secretly using a few of the tricks that we also just used in the
proof of Proposition 12.)

Somewhat surprisingly, according to Wikipedia, this formula was first published only in 1905.
871, Contents
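Example 1072. We know that ∫ tan x = ln ∣sec x∣ + C.
And so by Proposition 12, we have:

$$\int \tan^{-1} x \overset{1}{=} x \tan^{-1} x - \left. \int \tan y \,\right|_{y = \tan^{-1} x} + C = x \tan^{-1} x - \left. \ln\left|\sec y\right| \,\right|_{y = \tan^{-1} x} + C$$

$$= x \tan^{-1} x - \ln\left|\sec\left(\tan^{-1} x\right)\right| + C = x \tan^{-1} x - \ln\left[\sec\left(\tan^{-1} x\right)\right] + C.$$

Note that we can remove the absolute value sign because $\tan^{-1} x \in \left(-\dfrac{\pi}{2}, \dfrac{\pi}{2}\right)$ and thus $\sec\left(\tan^{-1} x\right) > 0$.

But in Example 1070, we showed that:

$$\int \tan^{-1} x \overset{2}{=} x \tan^{-1} x - \frac{1}{2} \ln\left(1 + x^2\right) + \hat{C}.$$

It would appear that $\overset{1}{=}$ and $\overset{2}{=}$ are inconsistent. But by Lemma 1,

$$\sec\left(\tan^{-1} x\right) = \frac{1}{\cos\left(\tan^{-1} x\right)} = \sqrt{1 + x^2}.$$

And so, $\overset{1}{=}$ and $\overset{2}{=}$ are consistent because:

$$\frac{1}{2} \ln\left(1 + x^2\right) = \ln \sqrt{1 + x^2} = \ln\left[\sec\left(\tan^{-1} x\right)\right].$$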

Example 1073. Consider the function k ∶ R → R defined by $k(x) = x^5 + x$.

This is simply a polynomial function whose antiderivative we can easily find:

$$\int k = \int x^5 + x \,dx = \frac{1}{6} x^6 + \frac{1}{2} x^2 + C.$$

As mentioned in Example 266, the function k is invertible. That is, the function k⁻¹ ∶
R → R exists. However, it is impossible to write down k⁻¹(x) as an algebraic expression.
Nonetheless, with the aid of Proposition 12, it is actually possible to write down the
antiderivative of k⁻¹ in terms of k⁻¹:

$$\int k^{-1}(x) = x k^{-1}(x) - \left. \int k(y) \,\right|_{y = k^{-1}(x)} + C = x k^{-1}(x) - \left[ \frac{1}{6} y^6 + \frac{1}{2} y^2 \right]_{y = k^{-1}(x)} + C$$

$$= x k^{-1}(x) - \frac{1}{6} \left[k^{-1}(x)\right]^6 - \frac{1}{2} \left[k^{-1}(x)\right]^2 + C.$$
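Although k⁻¹ has no algebraic formula, it can be evaluated numerically, which gives a way to test the formula from Proposition 12. The following is a minimal sketch and not part of the text: `k_inverse` (bisection), `simpson` (Simpson's rule) and `G` are illustrative helpers, and the interval [0, 2] is an arbitrary choice.

```python
def k(y):
    return y**5 + y

def k_inverse(x, lo=-10.0, hi=10.0):
    """Solve k(y) = x by bisection; valid because k is strictly increasing.
    The bracket [-10, 10] is an assumption that covers |x| <= 100010."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if k(mid) < x:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def G(x):
    """Antiderivative of k^{-1} suggested by Proposition 12 (with C = 0)."""
    y = k_inverse(x)
    return x * y - y**6 / 6 - y**2 / 2

a, b = 0.0, 2.0
numeric = simpson(k_inverse, a, b)   # integrate k^{-1} directly
closed = G(b) - G(a)                 # evaluate the formula at the endpoints
print(numeric, closed)               # the two values should agree closely
```

Since k(1) = 2, we have k⁻¹(2) = 1, so the formula gives G(2) − G(0) = 2 − 1/6 − 1/2 = 4/3, which the numerical integral should reproduce.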

Exercise 361. Use Proposition 12 to find (a) ∫ sin−1 ; and (b) ∫ cos−1 . Explain, using
Lemma 1, why your answers here are consistent with those for Exercise 360. (Answer on
p. 872.)

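A361(a) $$\int \sin^{-1} x = x \sin^{-1} x - \left. \int \sin y \,\right|_{y = \sin^{-1} x} + C = x \sin^{-1} x + \left. \cos y \,\right|_{y = \sin^{-1} x} + C = x \sin^{-1} x + \cos\left(\sin^{-1} x\right) + C.$$

872, Contents

(b) $$\int \cos^{-1} x = x \cos^{-1} x - \left. \int \cos y \,\right|_{y = \cos^{-1} x} + C = x \cos^{-1} x - \left. \sin y \,\right|_{y = \cos^{-1} x} + C = x \cos^{-1} x - \sin\left(\cos^{-1} x\right) + C.$$

By Lemma 1, $\cos\left(\sin^{-1} x\right) = \sqrt{1 - x^2} = \sin\left(\cos^{-1} x\right)$, so that our answers here are consistent with those for
Exercise 360.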

873, Contents

82. Term-by-Term Integration
This chapter is the integration analogue of Ch. 76.10 (Term-by-Term Differentiation).

Consider the integral: ∫ a1 + a2 + a3 dx.

Observe that the integrand a1 + a2 + a3 is the sum of three terms. And so by the Sum Rule,
our integral may be written as the sum of three integrals:

∫ a1 + a2 + a3 dx = ∫ a1 dx + ∫ a2 dx + ∫ a3 dx.

More generally:

∫ a1 + a2 + ⋅ ⋅ ⋅ + an dx = ∫ a1 dx + ∫ a2 dx + ⋅ ⋅ ⋅ + ∫ an dx.

That is, we can always interchange finite summation and integration:

n n
∫ (∑ ai ) dx = ∑ (∫ ai dx) .
i=1 i=1

Now, suppose instead that our integrand is the infinite series:

∑ ai = a1 + a2 + a3 + . . .

Is it also always true that:

∫ a1 + a2 + a3 + . . . dx = ∫ a1 dx + ∫ a2 dx + ∫ a3 dx + . . .?

Or equivalently, can we always interchange infinite summation and integration:

∞ ∞
∫ (∑ ai ) dx = ∑ (∫ ai dx)?
i=1 i=1

It turns out that the answer is, “No, we cannot always do so.” See Exercise 363 for a counterexample.

Interchanging ∑ and ∫³⁴¹ is valid only under certain technical conditions.
Nonetheless, in H2 Maths, we shall simply and blithely assume that these conditions are
usually (or even always) met, so that we can interchange ∑ and ∫ pretty much whenever
we like (even when we don’t actually know if this is valid).³⁴²
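To see the interchange at work in a case where it is valid, consider the geometric series 1 + x + x² + ⋯ = 1/(1 − x) for ∣x∣ < 1. This illustration is not taken from the text; it relies on the standard fact that a power series may be integrated term by term inside its interval of convergence. The upper limit 0.5 and the cutoff of 60 terms are arbitrary choices.

```python
import math

upper = 0.5   # integrate from 0 to 0.5, well inside |x| < 1
terms = 60    # number of series terms kept (the tail is negligible here)

# Integrate each term x**i from 0 to upper, giving upper**(i + 1) / (i + 1),
# and only then add up the results.
term_by_term = sum(upper ** (i + 1) / (i + 1) for i in range(terms))

# Integrate the sum 1 / (1 - x) directly: the antiderivative is -ln(1 - x).
direct = -math.log(1 - upper)

print(term_by_term, direct)  # both should be close to ln 2
```

Agreement here says nothing about series for which the technical conditions fail, which is exactly why the interchange should not be taken for granted.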

Example 1074. XXX

³⁴¹ Or more broadly, interchanging lim and ∫.
³⁴² Hence producing yet another mindless cookbook procedure that can be tested on exams. This textbook
takes the view that the student should understand everything she does, rather than simply reproduce
“recipes” and “formulae”. It seems that the MOE and your A-Level examiners have the opposite view.
874, Contents
Example 1075. XXX

For two more examples, see Exercises 583(iii) (N2014/I/8) and 599(ii)(a) (N2011/I/4).

Exercise 362. XXX (Answer on p. 875.)

Exercise 363. XXX (Answer on p. 875.)


875, Contents

83. Finding Specific Definite Integrals
Your H2 Maths syllabus explicitly includes:
• Finding the area of a region bounded by a curve and lines parallel to the coordinate
axes, between a curve and a line, or between two curves;
• Area below the x-axis;
• Finding the area under a curve defined parametrically;
• Finding the volume of revolution about the x- or y-axis; and
• Finding the approximate value of a definite integral using a graphing calculator.
So, these fascinating topics are what we’ll cover in this chapter.

83.1. Area between a Curve and Lines Parallel to Axes

Example 1076. Find the area bounded by the curve y = x2 and the lines y = 1 and y = 2.
It’s always helpful to make a quick sketch:

Figure to be
inserted here.

Here are two methods for finding the requested area A.

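Method 1. The rectangle A + B + C + D has area $2 \times 2\sqrt{2} = 4\sqrt{2}$.
The area of C is 2. Each of B and D has area:

$$\int_{-\sqrt{2}}^{-1} x^2 \,dx = \left[ \frac{x^3}{3} \right]_{-\sqrt{2}}^{-1} = -\frac{1}{3} - \left( -\frac{2\sqrt{2}}{3} \right) = \frac{2\sqrt{2} - 1}{3}.$$

Hence, the area of A is:

$$4\sqrt{2} - \left( 2 + 2 \times \frac{2\sqrt{2} - 1}{3} \right) = \frac{4}{3} \left( 2\sqrt{2} - 1 \right).$$

Method 2. The right branch of the parabola $y = x^2$ has equation $x = \sqrt{y}$. Hence, the
right half of A has area:

$$\int_{y=1}^{y=2} x \,dy = \int_{y=1}^{y=2} \sqrt{y} \,dy = \frac{2}{3} \left[ y^{3/2} \right]_1^2 = \frac{2}{3} \left( 2\sqrt{2} - 1 \right).$$

Thus, the area of A is $2 \times \dfrac{2}{3} \left( 2\sqrt{2} - 1 \right) = \dfrac{4}{3} \left( 2\sqrt{2} - 1 \right)$.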
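Both methods can be reproduced numerically as a cross-check. The sketch below is an illustration only (not the author's method): `simpson` is a hand-written Simpson's rule, and the variable names are made up here.

```python
import math

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

root2 = math.sqrt(2)

# Method 1: big rectangle minus C (area 2) minus B and D (equal by symmetry).
area_B = simpson(lambda x: x**2, -root2, -1.0)
method1 = 4 * root2 - (2 + 2 * area_B)

# Method 2: integrate the full width 2 * sqrt(y) with respect to y.
method2 = simpson(lambda y: 2 * math.sqrt(y), 1.0, 2.0)

exact = 4 / 3 * (2 * root2 - 1)
print(method1, method2, exact)  # all three should agree closely
```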
876, Contents
Exercise 364. Find the exact area bounded by the curve y = x3 , the horizontal lines
y = 1 and y = 2, and the vertical axis. (Answer on p. 1555.)

877, Contents

83.2. Area between a Curve and a Line

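Example 1077. Find the area bounded by the curve $y \overset{1}{=} x^2$ and the line $y \overset{2}{=} x + 1$.

As usual, a quick sketch first:

Figure to be
inserted here.

Our sketch suggests that we find the intersection points of the line and the curve. To do
so, combine $\overset{1}{=}$ and $\overset{2}{=}$:

$$x^2 - x - 1 = 0.$$

By the quadratic formula, the intersection points are:

$$x = \frac{1 \pm \sqrt{(-1)^2 - 4(1)(-1)}}{2(1)} = \frac{1 \pm \sqrt{5}}{2}.$$

Hence, the requested area is:

$$\int_{(1-\sqrt{5})/2}^{(1+\sqrt{5})/2} x + 1 - x^2 \,dx = \left[ \frac{x^2}{2} + x - \frac{x^3}{3} \right]_{(1-\sqrt{5})/2}^{(1+\sqrt{5})/2}$$

$$= \left[ \frac{(1+\sqrt{5})^2}{2^3} + \frac{1+\sqrt{5}}{2} - \frac{(1+\sqrt{5})^3}{3 \cdot 2^3} \right] - \left[ \frac{(1-\sqrt{5})^2}{2^3} + \frac{1-\sqrt{5}}{2} - \frac{(1-\sqrt{5})^3}{3 \cdot 2^3} \right]$$

$$= \left[ \frac{6+2\sqrt{5}}{8} + \frac{1+\sqrt{5}}{2} - \frac{16+8\sqrt{5}}{24} \right] - \left[ \frac{6-2\sqrt{5}}{8} + \frac{1-\sqrt{5}}{2} - \frac{16-8\sqrt{5}}{24} \right]$$

$$= \left[ \frac{3+\sqrt{5}}{4} + \frac{1+\sqrt{5}}{2} - \frac{2+\sqrt{5}}{3} \right] - \left[ \frac{3-\sqrt{5}}{4} + \frac{1-\sqrt{5}}{2} - \frac{2-\sqrt{5}}{3} \right] = \frac{7+5\sqrt{5}}{12} - \frac{7-5\sqrt{5}}{12} = \frac{5\sqrt{5}}{6}.$$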
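Because the algebra above is long, a numerical check of the final value is reassuring. This is a minimal sketch, not part of the text; `simpson` is an illustrative helper (Simpson's rule happens to be exact for cubics, up to rounding).

```python
import math

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

# Intersection points of y = x^2 and y = x + 1.
left = (1 - math.sqrt(5)) / 2
right = (1 + math.sqrt(5)) / 2

# Integrate (line) - (curve), since the line lies above the parabola here.
area_numeric = simpson(lambda x: (x + 1) - x**2, left, right)
area_exact = 5 * math.sqrt(5) / 6
print(area_numeric, area_exact)  # should agree closely
```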

Exercise 365. Find the exact area bounded by the curve y = sin x and the line y = 0.5,
for x ∈ (0, π/2).(Answer on p. 1556.)

878, Contents

83.3. Area between Two Curves

Example 1078. Find the area bounded by the curves $y \overset{1}{=} x^2 - 2x - 1$ and $y \overset{2}{=} 1 - x^2$.

Figure to be
inserted here.

Our sketch suggests that we find the intersection points of the two curves. To do
so, combine $\overset{1}{=}$ and $\overset{2}{=}$:

$$2x^2 - 2x - 2 = 0.$$

By the quadratic formula, the intersection points are:

$$x = \frac{2 \pm \sqrt{(-2)^2 - 4(2)(-2)}}{2(2)} = \frac{2 \pm \sqrt{20}}{4} = \frac{1 \pm \sqrt{5}}{2}.$$

Hence, the requested area is:

$$A = \int_{0.5(1-\sqrt{5})}^{0.5(1+\sqrt{5})} 1 - x^2 - \left( x^2 - 2x - 1 \right) dx = 2 \int_{0.5(1-\sqrt{5})}^{0.5(1+\sqrt{5})} 1 - x^2 + x \,dx$$

$$= 2 \left[ x - \frac{x^3}{3} + \frac{x^2}{2} \right]_{0.5(1-\sqrt{5})}^{0.5(1+\sqrt{5})} = \frac{5\sqrt{5}}{3}.$$

Exercise 366. Find the exact area bounded by the curves y = 2 − x2 and y = x2 + 1. (Answer
on p. 1556.)

879, Contents

83.4. Area Below the x-Axis

Example 1079. Find the area bounded by the curve $y = x^2 - 4$ and the x-axis.

Figure to be
inserted here.

As stated earlier, the definite integral gives us the signed area. So if the curve is under
the x-axis (as is the case here), then the computed area will be negative:

$$\int_{-2}^{2} x^2 - 4 \,dx = \left[ \frac{x^3}{3} - 4x \right]_{-2}^{2} = \left( \frac{8}{3} - 8 \right) - \left( \frac{-8}{3} + 8 \right) = \frac{-32}{3}.$$

But of course, an area is simply a magnitude, so we’ll take the absolute value and conclude
that the requested area is $\dfrac{32}{3}$.

Exercise 367. Find the exact area bounded by x4 − 16 and the x-axis. (Answer on p.

880, Contents

83.5. Area under a Parametrically-Defined Curve

Example 1080. Consider the curve described by the equations $x = t^3 - 2$ and $y = 4 - t^5$.

Find the exact area bounded by the curve, the lines x = −2 and x = −1, and the horizontal axis.
It helps to graph this curve on your graphing calculator:

Note that x = −1 ⇐⇒ t = 1, x = −2 ⇐⇒ t = 0, and $dx/dt = 3t^2$. So the area can be
computed as:

$$\int_{x=-2}^{x=-1} y \,dx = \int_{x=-2}^{x=-1} 4 - t^5 \,dx = \int_{t=0}^{t=1} \left( 4 - t^5 \right) 3t^2 \,dt = \left[ 4t^3 - \frac{3t^8}{8} \right]_0^1 = 4 - \frac{3}{8} = \frac{29}{8}.$$
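The change of variable from x to t can be mirrored in code: integrate y(t) · dx/dt over the matching range of t. This is a sketch for illustration, not part of the text; `simpson`, `y` and `dx_dt` are names chosen here.

```python
def simpson(f, a, b, n=1000):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def y(t):
    return 4 - t**5

def dx_dt(t):
    return 3 * t**2          # derivative of x = t^3 - 2

# x runs from -2 to -1 exactly when t runs from 0 to 1.
area = simpson(lambda t: y(t) * dx_dt(t), 0.0, 1.0)
print(area)  # should be close to 4 - 3/8
```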

Exercise 368. Consider the curve described by the equations x = t2 + 2t and y = t3 − 1.

Find the exact area bounded by the curve, the lines y = 1 and y = 2, and the vertical axis.
(Answer on p. 1557.)

881, Contents

83.6. Volume of Rotation About the y- or x-Axis

Example 1081. Consider the line y = 1. Rotate it about the x-axis to form an (infinite)
3D cylinder. Now consider the finite portion of the cylinder between x = 1 and x = 2. By
a primary school formula, its volume is Base Area × Height = $\pi \cdot 1^2 \times (2 - 1) = \pi$.

Figure to be
inserted here.

We can also compute this same volume using integration. The intuition is that we’re
adding up infinitely many infinitely thin circle-shaped slices, laid on their sides, from
x = 1 to x = 2 (left to right). The face of each of these circles has area $\pi y^2$. In this
particular example, y is constant (simply 1). Thus, the total volume is

$$\int_1^2 \pi y^2 \,dx = \int_1^2 \pi \,dx = \left[ \pi x \right]_1^2 = \pi.$$

882, Contents

Example 1082. Rotate the line y = 3x about the x-axis to form an infinite double cone.
Consider the finite portion of the cone between x = 0 and x = 2. By the formula for the
volume of a cone, we know its volume is $\dfrac{1}{3} \pi r^2 h = \dfrac{1}{3} \pi 6^2 \times 2 = 24\pi$.

Figure to be
inserted here.

We can also compute this same volume using integration. Again, the intuition is that
we’re adding up infinitely many infinitely thin circle-shaped slices, from x = 0 to x = 2.
Again, the face of each of these circles has area $\pi y^2$. In this particular example, y = 3x.
Thus, the total volume is

$$\int_0^2 \pi y^2 \,dx = \int_0^2 \pi (3x)^2 \,dx = 9\pi \left[ \frac{x^3}{3} \right]_0^2 = 24\pi.$$

Now consider instead the finite portion of the cone between x = 3 and x = 5. This looks
like a pedestal tilted sideways (not illustrated). We can easily compute its volume using
integration:

$$\int_3^5 \pi y^2 \,dx = \int_3^5 \pi (3x)^2 \,dx = 9\pi \left[ \frac{x^3}{3} \right]_3^5 = 294\pi.$$

Computing its volume using geometric formulae is possible, if slightly more tedious. The
finite portion of the cone between x = 0 and x = 3 has volume $V_1 = \dfrac{1}{3} \pi r^2 h = \dfrac{1}{3} \pi 9^2 \times 3 = 81\pi$. The
finite portion of the cone between x = 0 and x = 5 has volume $V_2 = \dfrac{1}{3} \pi r^2 h = \dfrac{1}{3} \pi 15^2 \times 5 = 375\pi$.
Hence, the desired volume is $V = V_2 - V_1 = 375\pi - 81\pi = 294\pi$.

883, Contents

We can just as easily find the volume of rotation about the y-axis.

Example 1083. Consider the curve $y = x^2$. Find its volume of rotation about the y-axis,
from y = 0 to y = 5.

Figure to be
inserted here.

In this case, there are no familiar geometric formulae we can apply. So we really just have
to compute this volume using integration. Again, the intuition is that we’re adding
up infinitely many infinitely thin circle-shaped slices, but this time these circle-shaped
slices are stacked from bottom to top, from y = 0 to y = 5. The face of each of these
circles has area $\pi x^2$, where in this particular example, $x^2 = y$. Thus, the total volume is

$$\int_0^5 \pi x^2 \,dy = \int_0^5 \pi y \,dy = \pi \left[ \frac{y^2}{2} \right]_0^5 = 12.5\pi.$$
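The slicing idea in Examples 1082 and 1083 translates directly into a small numerical routine: integrate π × (radius)² along the axis of rotation. The sketch below is illustrative only; `simpson` and `disk_volume` are names made up here rather than anything from the text.

```python
import math

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def disk_volume(radius, a, b):
    """Add up thin circular slices of area pi * radius(s)^2 for s from a to b."""
    return simpson(lambda s: math.pi * radius(s) ** 2, a, b)

cone_0_2 = disk_volume(lambda x: 3 * x, 0, 2)         # Example 1082, x from 0 to 2
cone_3_5 = disk_volume(lambda x: 3 * x, 3, 5)         # Example 1082, x from 3 to 5
bowl_0_5 = disk_volume(lambda y: math.sqrt(y), 0, 5)  # Example 1083, radius x = sqrt(y)

# Dividing by pi makes the results easy to compare with 24, 294 and 12.5.
print(cone_0_2 / math.pi, cone_3_5 / math.pi, bowl_0_5 / math.pi)
```

For rotation about the y-axis, the only change is that the radius is expressed as a function of y and the limits are y-values.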

Example 1084. XXX

Exercise 369. Compute the volume of rotation of y = sin x about the x-axis from x = 0
to x = π. (Answer on p. 1557.)

884, Contents

84. Types of Integrals (fun, optional)
You might be wondering why we call it the Riemann integral. Does this mean there are
other types of integrals? The answer is yes.
Calculus had already been “invented” by Newton and Leibniz in the 17th century. However,
they and their immediate intellectual heirs were not always perfectly rigorous. As Courant
and Robbins (1941) put it:
In a veritable orgy of intuitive guesswork, of cogent reasoning interwoven
with nonsensical mysticism, with a blind confidence in the superhuman
power of formal procedure, they conquered a mathematical world of im-
mense riches.
It was only during the Age of Rigour (roughly the 19th century) that calculus would be
put on a firmer footing.
The Cauchy integral (1823) and Darboux integral (1875) are very similar to the
Riemann integral (1854, 1867). In each case, the key idea is simply the ancient one
of approximating the area under the curve with shapes like rectangles.
In fact, our above discussion of the “Riemann integral” is actually closer to Darboux than
to Riemann and might therefore have been better called the Darboux integral. However,
because the three are so similar and because Riemann is usually considered to be the first
person to give a rigorous formulation of the integral, his name is also the one most closely
associated with this method.
In 1894, Thomas Joannes Stieltjes (1856–94) made an important generalisation to the
Riemann integral, producing what we now call the Stieltjes or Riemann-Stieltjes in-
tegral. Put simplistically, the difference is this: The Riemann integral allows us to integrate
only with respect to a variable; in contrast, the Riemann-Stieltjes integral allows us to also
do so with respect to a function.
In 1904, Henri Lebesgue (1875–1941) introduced the Lebesgue integral. Put simplistically,
the difference is this: The Riemann integral measures vertical strips of area; in contrast,
the Lebesgue integral measures horizontal strips of area:

Figure to be
inserted here.

Or as Lebesgue himself put it:

I have to pay a certain sum, which I have collected in my pocket. I take
the bills and coins out of my pocket and give them to the creditor in the
order I find them until I have reached the total sum. This is the Riemann
integral. But I can proceed differently. After I have taken all the money
out of my pocket I order the bills and coins according to identical values
and then I pay the several heaps one after the other to the creditor. This
is my integral.
885, Contents
It turns out that for (important) technical reasons, the Lebesgue integral is superior to
the Riemann integral and is thus what mathematicians use. For example, while all con-
tinuous functions are Riemann-integrable, many discontinuous ones are not. The Le-
besgue integral greatly remedies this — any function that is Riemann-integrable is also
Lebesgue-integrable; there are, moreover, many functions that are Lebesgue-integrable but
not Riemann-integrable.
Unfortunately, the Lebesgue integral is also less intuitive and takes more work to under-
stand. It is thus not typically introduced until upper undergraduate or later levels. For most
purposes, the Riemann integral is “good enough” and is what’s taught in most introductory
calculus courses (such as this).
A relative latecomer to the party is the gauge or Henstock-Kurzweil integral, the theory
of which was developed only from the late 1950s. The proponents of the gauge integral
argue that it is simpler and more useful than the Lebesgue integral, but its detractors
disagree. See e.g. this discussion. Whatever the truth may be, the gauge integral is rarely
used, except perhaps by some hipster mathematicians.

886, Contents

85. Revisiting ln, logb, and exp (optional)
In Ch. 17, the natural logarithm function ln was informally defined as giving us an
area under a graph.
Now that we’ve learnt about the definite integral, we can — at long last — write down its
formal definition:

Definition 184. The natural logarithm (function) ln ∶ R+ → R is defined by:

$$\ln x = \int_1^x \frac{1}{t} \,dt.$$

Remark 127. In the above equation, students are often confused by the presence of the
two variables x and t. Both are dummy variables that could be replaced by any other
symbol. So for example, the following three equations are exactly equivalent:

$$\ln x = \int_1^x \frac{1}{t} \,dt, \qquad \ln \star = \int_1^{\star} \frac{1}{\circ} \,d\circ, \qquad \ln ☺ = \int_1^{☺} \frac{1}{\blacksquare} \,d\blacksquare.$$

However, we must be careful not to mix up x and t. The dummy variable x is used to
tell us about the mapping rule of ln, while the dummy variable t is used to tell us about
the definite integral.

With the above definition and the FTC1, the following result is immediate:

Fact 157. The derivative of ln ∶ R+ → R is the function f ∶ R+ → R defined by $f(x) = \dfrac{1}{x}$.

Proof. For any x ∈ R+, we have, by Definition 184:

$$\ln x = \int_1^x \frac{1}{t} \,dt.$$

And so, by the FTC1, $\ln' x = \dfrac{1}{x}$. The claim follows.
Since ln is differentiable, by Theorem 19, it is also continuous. Hence the following result
that was given earlier in Ch. 68.5 and now reproduced:

Fact 139. The function ln is continuous.

With the above definition, it is not difficult to prove some basic properties of the natural
logarithm function:

887, Contents

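Fact 158. Suppose x, y > 0 and n ∈ R. Then:

(a) $\ln 1 = 0$.
(b) $\ln(xy) = \ln x + \ln y$.
(c) $\ln \dfrac{1}{x} = -\ln x$.
(d) $\ln \dfrac{x}{y} = \ln x - \ln y$.

Proof. (a) Simply plug x = 1 into Definition 184, then apply Definition 181(a):

$$\ln 1 = \int_1^1 \frac{1}{t} \,dt = 0.$$

(b) We will play a little trick. Differentiate both sides with respect to x to get:

$$\frac{d}{dx} \ln(xy) = \frac{1}{xy} \left( y + x \frac{dy}{dx} \right) \overset{1}{=} \frac{1}{x} + \frac{1}{y} \frac{dy}{dx} \qquad \text{and} \qquad \frac{d}{dx} \left( \ln x + \ln y \right) \overset{2}{=} \frac{1}{x} + \frac{1}{y} \frac{dy}{dx}.$$

Or equivalently:

$$\int \left( \frac{1}{x} + \frac{1}{y} \frac{dy}{dx} \right) dx \overset{1}{=} \ln(xy) \qquad \text{and} \qquad \int \left( \frac{1}{x} + \frac{1}{y} \frac{dy}{dx} \right) dx \overset{2}{=} \ln x + \ln y.$$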

By Corollary 32 then, ln (xy) and ln x + ln y differ by at most a constant:

ln (xy) = ln x + ln y + C.

Plugging in x = 1, we have ln y = ln 1 + ln y + C = ln y + C. Hence, C = 0. And thus,

ln (xy) = ln x + ln y.

(c) See Exercise 370(a).

(d) follows from (b) and (c).
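Definition 184 can be turned into a small computation, which lets us watch the properties in Fact 158 emerge from the integral alone. This is only a sketch for illustration: `simpson` and `ln_via_integral` are hypothetical helpers, and the test values x = 2, y = 3 are arbitrary.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson's rule with n (even) subintervals.
    If b < a, the step is negative and the result is the signed integral."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def ln_via_integral(x):
    """ln x computed as the signed area under 1/t from t = 1 to t = x."""
    return simpson(lambda t: 1.0 / t, 1.0, x)

x, y = 2.0, 3.0
lhs = ln_via_integral(x * y)                    # ln(xy)
rhs = ln_via_integral(x) + ln_via_integral(y)   # ln x + ln y
reciprocal = ln_via_integral(1 / x)             # ln(1/x), should equal -ln x

print(lhs, rhs, math.log(x * y))
print(reciprocal, -ln_via_integral(x))
```

Nothing in `ln_via_integral` knows about logarithm laws; the agreement comes purely from the shape of the curve 1/t.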

Exercise 370. Prove Fact 158(c) (use the same trick as in the proof of Fact 158(b)).
(Answer on p. 1558.)

Fact 159. Suppose x > 0 and n ∈ R. Then ln xn = n ln x.

Proof. See p. 1363 in the Appendices.

888, Contents

85.1. Revisiting Logarithms
In Ch. 5.6, Definition 30, we defined the general logarithm to be the inverse of the
exponentiation operation:

x = logb n ⇐⇒ bx = n.

Now that we have formally defined the natural logarithm, let us now give a more formal
definition of logarithms that replaces the above definition:

Definition 185. Let b, n > 0 with b ≠ 1. Then the base b logarithm of n is denoted $\log_b n$
and is defined to be the following number:

$$\log_b n = \frac{\ln n}{\ln b}.$$
In Ch. 5.6, we gave informal proofs of the Laws of Logarithms. With the above Definitions
and results, we can now prove these rigorously and indeed more easily:

Proposition 2. (The Laws of Logarithms.) Let a, b, c > 0 and x ∈ R. Then

(a) $\log_b 1 = 0$.
(b) $\log_b b = 1$.
(c) $\log_b b^x = x$.
(d) $b^{\log_b a} = a$.
(e) $c \log_b a = \log_b a^c$.
(f) $\log_b \dfrac{1}{a} = -\log_b a$. (Logarithm of Reciprocal.)
(g) $\log_b (ac) = \log_b a + \log_b c$. (Sum of Logarithms.)
(h) $\log_b \dfrac{a}{c} = \log_b a - \log_b c$. (Difference of Logarithms.)
(i) $\log_b a = \dfrac{\log_c a}{\log_c b}$ (b, c ≠ 1). (Change of Base.)
(j) $\log_{a^b} c = \dfrac{1}{b} \log_a c$ (a ≠ 1).

Proof. Below, $\overset{\star}{=}$ indicates the use of Definition 185.

(a) $\log_b 1 \overset{\star}{=} \dfrac{\ln 1}{\ln b} = 0$.

(b) $\log_b b \overset{\star}{=} \dfrac{\ln b}{\ln b} = 1$.

(c) $\log_b b^x \overset{\star}{=} \dfrac{\ln b^x}{\ln b} = \dfrac{x \ln b}{\ln b} = x$. (The middle step uses Fact 159.)

(d) $\log_b a \overset{\star}{=} \dfrac{\ln a}{\ln b}$. Rearranging: $(\log_b a) \ln b = \ln a$.
By Fact 159: $\ln b^{\log_b a} = \ln a$.
Now apply exp: $\exp\left( \ln b^{\log_b a} \right) \overset{1}{=} \exp(\ln a)$. Since exp is, by definition, the inverse of ln, $\overset{1}{=}$
becomes $b^{\log_b a} = a$.

889, Contents

(e)–(j) See Exercise 371.

Exercise 371. Prove Proposition 2(e)–(j). (Answer on p. 1558.)

890, Contents

85.2. Revisiting the Exponential Function
We reproduce from Ch. 17 the formal definition of the exponential function:

Definition 59. The exponential function, denoted exp, is the inverse of the natural
logarithm function.

We now prove certain basic properties about the exponential function:

Fact 160. Suppose x, y ∈ R. Then:

(a) $\exp 0 = 1$.
(b) $\exp 1 = e$.
(c) $\exp(x + y) = (\exp x)(\exp y)$.
(d) $\exp(-x) = \dfrac{1}{\exp x}$.
(e) $\exp(x - y) = \dfrac{\exp x}{\exp y}$.

Proof. (a) By Fact 158(a), ln 1 = 0. So, exp 0 = 1.

(b) This is actually not a result but our definition of Euler’s number — see Ch. 60.
(c) Let a = exp x and b = exp y. Then by definition, x = ln a and y = ln b. And now:

$$\exp(x + y) = \exp(\ln a + \ln b) \overset{\star}{=} \exp\left[ \ln(ab) \right] = ab = (\exp x)(\exp y),$$

where $\overset{\star}{=}$ uses Fact 158(b).

(d) $\left[ \exp(-x) \right] (\exp x) \overset{(c)}{=} \exp(-x + x) = \exp 0 \overset{(a)}{=} 1$. Rearranging (note that exp x > 0 for all
x ∈ R), we have:

$$\exp(-x) = \frac{1}{\exp x}.$$

(e) is immediate from (c) and (d).

We can now also easily prove that the derivative of the exponential function is itself :

Fact 161. exp′ = exp.

Proof. By the Chain Rule, $\left[ \ln(\exp x) \right]' \overset{1}{=} \dfrac{1}{\exp x} \exp' x$.
But observing that ln (exp x) = x, we also have $\left[ \ln(\exp x) \right]' \overset{2}{=} 1$.
Putting $\overset{1}{=}$ and $\overset{2}{=}$ together and rearranging yields the result.

The exponential function (and constant multiples thereof) are the only functions
that are their own derivative. Formally:

891, Contents

Proposition 13. Let f ∶ D → R be a function. If f = f ′, then there exists some c ∈ R
such that:

$$f(x) = ce^x \text{ for every } x \in D.$$

Proof. Define g ∶ D → R by $g(x) = f(x) e^{-x}$. Then:

$$g'(x) = f'(x) e^{-x} - f(x) e^{-x} = 0.$$

By Proposition 6, a function whose derivative is a zero function is itself a constant function.

So here, g is a constant function. That is, there exists some c ∈ R such that:
g (x) = c for every x ∈ D.

Equivalently: $f(x) e^{-x} = c$ or $f(x) = ce^x$.

892, Contents

86. Even More Fun with Your TI84

Example 1085. Use your TI84 to find the approximate area bounded by the curve
$y = e^{\sin x}$ and the horizontal axis, between x = 1 and x = 2.

(Screenshots: the TI84 screen after each of Steps 1 to 10.)

1. Press ON to turn on your calculator.

2. Press Y= .
3. Press the blue 2ND button and then $e^x$ (which corresponds to the LN button). Then
press SIN X,T,θ,n ) ) and altogether you will have entered $e^{\sin x}$.
4. Now press GRAPH and the calculator will graph the given equation.
5. Press the blue 2ND button and then CALC (which corresponds to the TRACE
button), to bring up the CALCULATE menu.
6. Press 7 to select the “∫ f (x) dx” option. This brings you back to the graph.
7. The TI84 is now prompting you for “Lower Limit?” Simply press 1 .
8. Now press ENTER and you will have told the TI84 that your lower limit is x = 1.
9. The TI84 is now similarly prompting you for “Upper Limit?” Simply press 2 .
10. Now press ENTER and you will have told the TI84 that your upper limit is x = 2.
The TI84 also informs you that “∫ f (x) dx = 2.60466115”. This is our desired area
(which is now also kindly shaded in black by our TI84.)
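The same approximation can be reproduced away from the calculator. The sketch below uses Simpson's rule as one possible numerical method; this is an assumption made for illustration and says nothing about the algorithm the TI84 itself uses. The helper name `simpson` is made up here.

```python
import math

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

# Area under y = e^(sin x) between x = 1 and x = 2.
area = simpson(lambda x: math.exp(math.sin(x)), 1.0, 2.0)
print(area)  # compare with the TI84 reading of about 2.60466
```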

893, Contents

87. Differential Equations
87.1. $\dfrac{dy}{dx} = f(x)$
Recall that the following two statements are equivalent:

$$\frac{dy}{dx} \overset{1}{=} f(x) \iff y \overset{2}{=} \int f(x) \,dx.$$

Going from right to left ($\overset{2}{=}$ to $\overset{1}{=}$), we apply the differentiation operator $\dfrac{d}{dx}$.
Going from left to right ($\overset{1}{=}$ to $\overset{2}{=}$), we apply the antidifferentiation operator ∫ dx.

Example 1086. Solve the differential equation $\dfrac{dy}{dx} \overset{1}{=} x^2$.
Apply the antidifferentiation operator ∫ dx:

$$y \overset{2}{=} \int x^2 \,dx = \frac{x^3}{3} + C.$$

We call $\overset{2}{=}$ the general solution to the given differential equation $\overset{1}{=}$. It is general because
the constant of integration C is free to vary, so that there are many possible solutions.
Now, suppose we are told also that:

If x = 0, then y = 1, or $(x, y) \overset{3}{=} (0, 1)$.

This additional piece of information $\overset{3}{=}$ is often called an initial condition. Here’s why.

It might be that y is the number of bats in a cave and x is time. Then the initial condition
tells us that at time x = 0 (i.e. “initially”), the number of bats in the cave is y = 1. And
over time, the number of bats in the cave changes according to the differential equation $\overset{1}{=}$.

Plugging the initial condition $\overset{3}{=}$ into the general solution $\overset{2}{=}$, we find:

$$1 = \frac{0^3}{3} + C = C.$$

Hence, C = 1. And so:

$$y \overset{4}{=} \frac{x^3}{3} + 1.$$

We call $\overset{4}{=}$ the particular solution to the differential equation $\dfrac{dy}{dx} \overset{1}{=} x^2$ with initial
condition (x, y) = (0, 1).
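A particular solution can be checked by stepping the differential equation forward numerically from the initial condition. The sketch below uses the classical fourth-order Runge-Kutta method, which is a standard numerical technique and not something introduced in this textbook; the function name `rk4` and the end point x = 2 are choices made for illustration.

```python
def rk4(f, x0, y0, x_end, steps=1000):
    """Classical 4th-order Runge-Kutta for dy/dx = f(x, y), starting at (x0, y0)."""
    h = (x_end - x0) / steps
    x, y = x0, y0
    for _ in range(steps):
        k1 = f(x, y)
        k2 = f(x + h / 2, y + h * k1 / 2)
        k3 = f(x + h / 2, y + h * k2 / 2)
        k4 = f(x + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        x += h
    return y

# dy/dx = x^2 with initial condition (x, y) = (0, 1), followed up to x = 2.
y_numeric = rk4(lambda x, y: x**2, 0.0, 1.0, 2.0)

# Particular solution y = x^3 / 3 + 1, evaluated at x = 2.
y_formula = 2.0**3 / 3 + 1

print(y_numeric, y_formula)  # should agree closely
```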

894, Contents

Example 1087. Solve $\dfrac{dy}{dx} = \sin x$.
y = ∫ sin x dx = − cos x + C, where as usual C is the constant of integration. Again, this
is the general solution to the given differential equation.
If we are given the initial condition that x = 0 ⟹ y = 1, then we can write 1 = − cos 0 + C
and find that C = 2. We thus have that y = − cos x + 2. This is the particular solution
to the given differential equation (with given initial condition).

For future reference, here is the formal result that justifies the procedure used in the above
examples:

Fact 162. The general solution to the differential equation $\dfrac{dy}{dx} = f(x)$ is:

y = ∫ f (x) dx.

Exercise 372. Find the general solution of $\dfrac{dy}{dx} = e^x \sin x$. Find also the particular solution,
if given also the initial condition x = 0 ⟹ y = 1. (Answer on p.

895, Contents

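87.2. $\dfrac{dy}{dx} = f(y)$
Example 1088. Solve $\dfrac{dy}{dx} \overset{1}{=} y^2$ (y ≠ 0).

Rearrange: $\dfrac{dx}{dy} = \dfrac{1}{y^2}$.

Apply ∫ dy: $x \overset{2}{=} \displaystyle\int \frac{1}{y^2} \,dy = \frac{-1}{y} + C$ or $y \overset{3}{=} \dfrac{1}{C - x}$.

Hence, $\overset{2}{=}$ or $\overset{3}{=}$ is the general solution to $\overset{1}{=}$.

We are now also given the initial condition $(x, y) \overset{4}{=} (0, 1)$. Plugging $\overset{4}{=}$ into $\overset{3}{=}$ (or $\overset{2}{=}$):

$$1 = \frac{1}{C - 0} \iff C = 1.$$

Thus, the particular solution to $\dfrac{dy}{dx} \overset{1}{=} y^2$ (y ≠ 0) with initial condition (x, y) = (0, 1) is:

$$y = \frac{1}{1 - x}.$$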

896, Contents

Example 1089. Solve $\dfrac{dy}{dx} = \sin y$.
Rearrange to get $\dfrac{dx}{dy} = \dfrac{1}{\sin y} = \operatorname{cosec} y$ (for y not an integer multiple of π). Hence, by
Proposition 9, $x = \displaystyle\int \operatorname{cosec} y \,dy = -\ln\left| \operatorname{cosec} y + \cot y \right| + C$ (for y not an integer multiple of
π). This is the general solution for the given differential equation.
Unfortunately, without more information, it is impossible to write y as a function of x,
because for each given value of x, there are multiple possible values of y, as we now show,
by manipulating that last equation:

$$x = -\ln\left( \operatorname{cosec} y + \cot y \right) + C$$

$$\iff e^{C - x} = \operatorname{cosec} y + \cot y = \frac{1 + \cos y}{\sin y} = \frac{2\cos^2(y/2)}{2\sin(y/2)\cos(y/2)} = \frac{\cos(y/2)}{\sin(y/2)} = \cot\frac{y}{2}$$

$$\iff y = 2\left( \cot^{-1} e^{C - x} + m\pi \right), \text{ for any } m \in \mathbb{Z}.$$

That is, for each given value of x, there are infinitely many possible values of y (one for
each integer m).
But now suppose we have the initial condition x = 3 ⟹ y = π/2. In this case, we have

$$3 = -\ln\left| \operatorname{cosec}\frac{\pi}{2} + \cot\frac{\pi}{2} \right| + C = -\ln\left| 1 \right| + C = C,$$

so that C = 3. We may write $y = 2\left( \cot^{-1} e^{3 - x} + m\pi \right)$. Moreover, plugging in the same
values for x and y, we see that

$$\frac{\pi}{2} = 2\left( \cot^{-1} e^{3 - 3} + m\pi \right) = \frac{\pi}{2} + 2m\pi.$$

Hence, m = 0 and $y = 2\cot^{-1} e^{3 - x}$. This is the particular solution to the given differential
equation (with given initial condition).
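The particular solution y = 2 cot⁻¹ e^(3−x) can be tested by solving dy/dx = sin y numerically from the initial condition (x, y) = (3, π/2). This sketch again relies on the classical Runge-Kutta method, which is outside this textbook; `rk4` is an illustrative helper. Python has no built-in cot⁻¹, so the code uses the identity cot⁻¹ z = tan⁻¹(1/z) for z > 0.

```python
import math

def rk4(f, x0, y0, x_end, steps=2000):
    """Classical 4th-order Runge-Kutta for dy/dx = f(x, y), starting at (x0, y0)."""
    h = (x_end - x0) / steps
    x, y = x0, y0
    for _ in range(steps):
        k1 = f(x, y)
        k2 = f(x + h / 2, y + h * k1 / 2)
        k3 = f(x + h / 2, y + h * k2 / 2)
        k4 = f(x + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        x += h
    return y

def particular_solution(x):
    """y = 2 * arccot(e^(3 - x)), written as 2 * atan(e^(x - 3))."""
    return 2 * math.atan(math.exp(x - 3))

# Follow dy/dx = sin y from (3, pi/2) up to x = 5 and compare.
y_numeric = rk4(lambda x, y: math.sin(y), 3.0, math.pi / 2, 5.0)
y_formula = particular_solution(5.0)

print(y_numeric, y_formula)  # should agree closely
```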

We now justify why the procedure used in the above examples works:
So, suppose we’re given the following differential equation:

$$\frac{dy}{dx} = f(y) \text{ with } f(y) \neq 0.$$

By the Inverse Function Theorem:

$$\frac{dx}{dy} = \frac{1}{\frac{dy}{dx}} = \frac{1}{f(y)}.$$

And now, applying the antidifferentiation operator ∫ dy:

$$x = \int \frac{1}{f(y)} \,dy.$$

897, Contents

Fact 163. The general solution to the differential equation $\dfrac{dy}{dx} = f(y)$ with f (y) ≠ 0 is:

$$x = \int \frac{1}{f(y)} \,dy.$$

Exercise 373. Find the general solution of $\dfrac{dy}{dx} = y^2 + 1$. Find also the particular solution,
given also the initial condition x = 0 ⟹ y = 1. (Answer on p. 1559.)

898, Contents

87.3. $\dfrac{d^2y}{dx^2} = f(x)$
Given the differential equation $\dfrac{d^2y}{dx^2} \overset{1}{=} f(x)$, apply the antidifferentiation operator ∫ dx
once to get:

$$\int \frac{d^2y}{dx^2} \,dx = \int f(x) \,dx \qquad \text{or} \qquad \frac{dy}{dx} = \int f(x) \,dx.$$

Now apply the antidifferentiation operator ∫ dx a second time to get:³⁴³

$$\int \frac{dy}{dx} \,dx = \int \left( \int f(x) \,dx \right) dx \qquad \text{or} \qquad y = \int \left( \int f(x) \,dx \right) dx.$$

In summary:

Fact 164. The general solution to the differential equation $\dfrac{d^2y}{dx^2} = f(x)$ is:

$$y = \int \left( \int f(x) \,dx \right) dx.$$

³⁴³ Strictly speaking, the parentheses around the inner integral are not necessary.
899, Contents
Example 1090. Solve $\dfrac{d^2y}{dx^2} \overset{1}{=} x^2$.
Apply the antidifferentiation operator ∫ dx once:

$$\int \frac{d^2y}{dx^2} \,dx = \frac{dy}{dx} = \int x^2 \,dx = \frac{x^3}{3} + C_1.$$

Apply the antidifferentiation operator ∫ dx a second time:

$$\int \frac{dy}{dx} \,dx = y \overset{2}{=} \int \frac{x^3}{3} + C_1 \,dx = \frac{x^4}{12} + C_1 x + C_2.$$

Hence, $\overset{2}{=}$ is the general solution to the differential equation $\overset{1}{=}$.

We are now also given the initial conditions $(x, y) \overset{3}{=} (0, 1)$ and $(x, y) \overset{4}{=} (1, 2)$. Plug $\overset{3}{=}$ and
$\overset{4}{=}$ into $\overset{2}{=}$ to get:

$$1 = 0 + 0 + C_2 \qquad \text{and} \qquad 2 = \frac{1}{12} + C_1 + C_2.$$

Solving the above system of equations, we have $C_2 = 1$ and $C_1 = 11/12$.
Thus, the particular solution to the differential equation $\dfrac{d^2y}{dx^2} \overset{1}{=} x^2$ with initial
conditions $(x, y) \overset{3}{=} (0, 1)$ and $(x, y) \overset{4}{=} (1, 2)$ is:

$$y = \frac{x^4}{12} + \frac{11}{12} x + 1.$$

Example 1091. Solve $\dfrac{d^2y}{dx^2} = \sin x$.
First, $\dfrac{dy}{dx} = \displaystyle\int \sin x \,dx = -\cos x + C_1$. Next, $y = \displaystyle\int -\cos x + C_1 \,dx = -\sin x + C_1 x + C_2$. This is
the general solution to the given differential equation.
If given the additional pieces of information that x = 0 ⟹ y = 1 and x = π ⟹ y = 2,
then we have

$$1 = -\sin 0 + 0 C_1 + C_2 \implies C_2 = 1,$$
$$2 = -\sin \pi + \pi C_1 + 1 \implies C_1 = \frac{1}{\pi}.$$

Hence $y = -\sin x + \dfrac{x}{\pi} + 1$ is the particular solution.

Exercise 374. Find the general solution of $\dfrac{d^2y}{dx^2} = e^x \sin x$. Find also the particular
solution, given also that x = 0 ⟹ y = 1. (Answer on p. 1560.)

900, Contents

87.4. Word Problems
The H2 Maths syllabus includes:
• Formulating a differential equation from a problem situation; and
• Interpreting a differential equation and its solution in terms of a problem situation.
So these are what we’ll cover in this subchapter.

Example 1092. A plate of bacteria grows at a rate that is inversely proportional to the number of bacteria. Express the number of bacteria as a function of time.
Let x be the number of bacteria. Let t be time. We are given that x grows at a rate inversely proportional to x. In other words, $\dfrac{dx}{dt} = \dfrac{k}{x}$, for some constant k ∈ R. Rearranging, we have $\dfrac{dt}{dx} = \dfrac{x}{k}$. Thus,

$$t = \int \frac{x}{k}\,dx = \frac{x^2}{2k} + C.$$

Further rearranging, we have $x = \pm\sqrt{2k(t - C)}$, where of course the negative root may be rejected (the number of bacteria cannot be negative). Hence, $x = \sqrt{2k(t - C)}$.
Suppose we are also given that t = 0 ⟹ x = 1 and t = 1 ⟹ x = 2. Then we have

$$1 \overset{a}{=} \sqrt{2k(-C)} \quad\text{and}\quad 2 \overset{b}{=} \sqrt{2k(1 - C)}.$$

From $\overset{a}{=}$, we have C = −1/(2k). Plug this into $\overset{b}{=}$ and we have 4 = 2k(1 + 1/(2k)) = 2k + 1 or k = 3/2.

Hence C = −1/3. Altogether then, the particular solution is $x = \sqrt{3t + 1}$.
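Here is a small Python sketch that checks the particular solution x = √(3t + 1) numerically. It relies on the observation that "rate inversely proportional to x" means the product x · dx/dt should be the same constant at every t. The names `x` and `derivative` and the step size `h` are assumptions made for this illustration, and the derivative is only a finite-difference estimate.

```python
import math


def x(t):
    """Particular solution from the example: x = sqrt(3t + 1)."""
    return math.sqrt(3 * t + 1)


def derivative(f, t, h=1e-6):
    """Central-difference estimate of f'(t); h is an assumed step size."""
    return (f(t + h) - f(t - h)) / (2 * h)


# The two given conditions: x(0) should be 1 and x(1) should be 2.
print(x(0), x(1))

# If dx/dt is inversely proportional to x, then x * dx/dt is constant.
for t in (0.0, 1.0, 5.0, 20.0):
    print(t, x(t) * derivative(x, t))
```

The second column should show (approximately) the same value on every line, which is the constant of proportionality.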


Exercise 375. Follow these steps to find the escape velocity (of an object from Earth).
(Answer on p. 1561.)
(a) The law of gravitation states that the force of attraction F between two point masses
M and m is proportional to the product of their masses and inversely proportional to the
square of the distance r between them. Write down this law in the form of an equation.
Your answer should contain a constant, which you should name G (this is the gravitational constant).
Momentum is defined as the product of mass m and velocity v. Newton’s Second Law of
Motion states that force is the rate of change of momentum.
(b) (i) Write down Newton’s Second Law in the form of an equation.
(ii) Assume that mass m is constant. Explain why $F = m\dfrac{dv}{dt}$.
Now suppose M and m are, respectively, the masses of the Earth and a small ball. Assume:
• The Earth is a perfect sphere with radius R m.
• You can treat the Earth as a single point with its mass concentrated at the centre of
the sphere. Thus, the initial distance between the Earth’s centre of mass and the ball
is R + x m.
• Upwards (away from the Earth) is the positive direction and downwards (towards the
centre of the Earth) is the negative direction.
• The Earth is immobile.
• There is no air resistance or any other form of friction.
(c) The small ball is initially held at rest, x m above the surface of the Earth. It is then
released. Let v be the velocity of the ball. Explain why $\dfrac{GM}{r^2} = -\dfrac{dv}{dt}$. (In particular,
explain why there is a negative sign.)
From the equation in (c), we may write:

$$\int_{R+x}^{R} \frac{GM}{r^2}\,dr = \int_{R+x}^{R} -\frac{dv}{dt}\,dr.$$

Let $v_s$ be the velocity at which the ball hits the surface of the Earth.
(d) (i) Show that the LHS of the above equation is equal to $GM\left(-\dfrac{1}{R} + \dfrac{1}{R+x}\right)$.
(ii) Show that the RHS of the above equation is equal to $-\dfrac{v_s^2}{2}$. (Hint 1: Use integration by substitution. Hint 2: What is $\dfrac{dr}{dt}$?)
(iii) Hence show that $v_s = -\sqrt{2GM\left(\dfrac{1}{R} - \dfrac{1}{R+x}\right)}$. Again, explain why $v_s$ is negative.
Suppose instead that the small ball is initially at rest on the surface of the Earth. It is
then propelled upwards at a velocity V.
(e) Explain why the ball will reach a maximum height of x m, where $V = \sqrt{2GM\left(\dfrac{1}{R} - \dfrac{1}{R+x}\right)}$, before falling back down to the Earth.
(f) The escape velocity $v_e$ is the velocity with which we must propel the ball upwards
(from its initial resting position on the surface of the Earth) so that it never falls back down. Explain why $v_e = \sqrt{\dfrac{2GM}{R}}$.
(g) Given that $G = 6.674 \times 10^{-11}$ m³ kg⁻¹ s⁻², $M = 5.972 \times 10^{24}$ kg, and R = 6,371 km,
compute $v_e$ (express your answer in km s⁻¹, correct to 4 significant figures).


Part VI.
Probability and Statistics

Exam Tips for Towkays

Probability and Statistics takes up 60 (out of 200) points and hence 30% of your A-Level


Two holes are better than one. Any mouse will tell you that.

— Willy Wonka (in Charlie and the Great Glass Elevator, 1972).


88. How to Count: Four Principles
How many arrangements or permutations are there of the three letters in CAT? For
example, one possible permutation of CAT is TCA.
To solve this problem, one possible method is the method of enumeration. That is,
simply list out (enumerate) all the possible permutations:

ACT, ATC, CAT, CTA, TAC, TCA.


We see that there are 6 possible permutations.

Enumeration works well enough when we have just three letters, as in CAT. Indeed, enu-
meration is sometimes the quickest method.
In contrast, the 13 letters in the word UNPREDICTABLY have 6,227,020,800 possible
permutations. So enumeration is probably not practical.
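To make the contrast concrete, here is a minimal Python sketch of the method of enumeration, using the standard library's `itertools.permutations`. The variable name `cat` is just my choice. For the 13-letter word, the sketch does not attempt to list anything; it only computes the count 13 × 12 × ⋯ × 1 with `math.factorial`.

```python
from itertools import permutations
from math import factorial

# Enumerate every arrangement of the three letters in CAT.
cat = ["".join(p) for p in permutations("CAT")]
print(cat)
print(len(cat))  # 6

# Listing all arrangements of 13 distinct letters is impractical,
# but counting them is easy: 13 x 12 x ... x 1.
print(factorial(len("UNPREDICTABLY")))  # 6227020800
```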
To help us count more efficiently, we’ll learn about four basic principles of counting:
1. The Addition Principle (AP);
2. The Multiplication Principle (MP);
3. The Inclusion-Exclusion Principle (IEP); and
4. The Complements Principle (CP).


88.1. How to Count: The Addition Principle
The addition principle (AP) is very simple.

Example 1093. For lunch today, I can either go to the food court or the hawker centre.
At the food court, I have 2 choices: ramen or briyani. At the hawker centre, I have 3
choices: bak chor mee, nasi lemak, or kway teow.
Altogether then, I have 2 + 3 = 5 choices of what to eat for lunch today.

Here’s an informal statement of the AP:[344]

The Addition Principle (AP). I have to choose a destination, out of two possible
areas. At area #1, there are p possible destinations to choose from. At area #2, there
are q possible destinations to choose from.
The Addition Principle (AP) simply states that I have, in total, p + q different choices.

(Just so you know, the AP is sometimes also called the Second Principle of Counting
or the Rule of Sum or the Disjunctive Rule.)
Of course, the AP generalises to cases where there are more than just 2 “areas”. It may
seem a little silly, but just to illustrate, let’s use the AP to tackle the CAT problem:

[344] See section 122.1 in the Appendices (optional) for a more precise statement of the AP.
Example 1094. Problem: How many permutations are there of the letters in the word CAT?
We can divide the possibilities into three cases:
Case #1. First letter is an A. Then the next two letters are either CT or TC: 2 possibilities.
Case #2. First letter is a C. Then the next two letters are either AT or TA: 2 possibilities.
Case #3. First letter is a T. Then the next two letters are either AC or CA: 2 possibilities.
Altogether then, by the AP, there are 2 + 2 + 2 = 6 possibilities. That is, there are 6
possible permutations of the letters in CAT. These are illustrated in the tree diagram below.

[Figure: tree diagram of the 6 permutations of CAT, branching on the first letter.]

The next exercise is very simple and just to illustrate again the AP.
Exercise 376. Without retracing your steps, how many ways are there to get from the
Starting Point to the River (see figure below)? (Answer on p. 1563.)

Exercise 377. How many permutations are there of the letters in the word DEED?
Illustrate your answer with a tree diagram similar to that given in the CAT example
above. (Answer on p. 1563.)


88.2. How to Count: The Multiplication Principle

Example 1095. For lunch today, I can either have prata or horfun. For dinner tonight,
I can have McDonald’s, KFC, or Pizza Hut.
Enumeration shows that I have a total of 6 possible choices for my two meals today:

(Prata, McDonald’s), (Prata, KFC), (Prata, Pizza Hut),

(Horfun, McDonald’s), (Horfun, KFC), (Horfun, Pizza Hut).

Alternatively, we can use the Multiplication Principle (MP). I have 2 choices for lunch
and 3 choices for dinner. Hence, for my two meals today, I have in total 2 × 3 = 6 possible choices.

Here’s an informal statement of the MP:[345]

The Multiplication Principle (MP). I have to choose two destinations, one from
each of two possible areas. At area #1, there are p possible destinations to choose from.
At area #2, there are q possible destinations to choose from.
The Multiplication Principle (MP) simply states that I have, in total, p × q different choices.

(The MP is sometimes also called the Fundamental or First Principle of Counting
or the Rule of Product or the Sequential Rule.)
Of course, the MP generalises to cases where there are more than just 2 “areas”. Here’s an
example where we have to make 3 decisions:

[345] See section 122.1 in the Appendices (optional) for a more precise statement of the MP.
Example 1096. For breakfast tomorrow, I can have shark’s fin or bird’s nest (2 choices).
For lunch, I can have black pepper crab or curry fishhead (2 choices). For dinner, I can
have an apple, a banana, or a carrot (3 choices). By the MP, for tomorrow’s meals, I have
a total of 2 × 2 × 3 = 12 possible choices. We can enumerate these (I’ll use abbreviations):

(SF, BPC, A), (SF, BPC, B), (SF, BPC, C), (SF, CF, A),

(SF, CF, B), (SF, CF, C), (BN, BPC, A), (BN, BPC, B),

(BN, BPC, C), (BN, CF, A), (BN, CF, B), (BN, CF, C).
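The enumeration above can be reproduced with `itertools.product`, which forms every combination of one item per list. This is a minimal sketch; the list names `breakfast`, `lunch`, `dinner`, and `menus` are my own, and the abbreviations are the ones used in the example.

```python
from itertools import product

breakfast = ["SF", "BN"]    # shark's fin, bird's nest
lunch = ["BPC", "CF"]       # black pepper crab, curry fishhead
dinner = ["A", "B", "C"]    # apple, banana, carrot

# One choice from each list, in every possible combination.
menus = list(product(breakfast, lunch, dinner))
for menu in menus:
    print(menu)

# Multiplication Principle: the count is the product of the list sizes.
print(len(menus), len(breakfast) * len(lunch) * len(dinner))  # 12 12
```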

Example 1097. Problem: How many four-letter words can be formed using the letters
in the 26-letter alphabet?
Let’s rephrase this problem so that it is clearly in the framework of the MP. We have 4
blank spaces to be filled:

_ _ _ _.
1 2 3 4

These 4 blank spaces correspond to 4 decisions to be made. Decision #1: What letter
to put in the first blank space? Decision #2: What letter to put in the second blank
space? Decision #3: What letter to put in the third blank space? Decision #4: What
letter to put in the fourth blank space?
How many choices have we for each decision?
For Decision #1, we can put A, B, C, ..., or Z. So we have 26 choices for Decision #1.
For Decision #2, we can again put A, B, C, ..., or Z. So we again have 26 choices for
Decision #2.
We likewise have 26 choices for Decision #3 and also 26 choices for Decision #4.
Altogether then, by the MP, there are $26 \times 26 \times 26 \times 26 = 26^4 = 456{,}976$ ways to make our
four decisions.
Solution: There are $26^4 = 456{,}976$ possible four-letter words that can be formed using the
26-letter alphabet.


Example 1098. One 18-sided die has the numbers 1 through 18 printed on each of its
sides. Another six-sided die has the letters A, B, C, D, E, and F printed on each of its
sides. We roll the two dice. How many distinct possible outcomes are there?
Again, let’s rephrase this problem in the framework of the MP. Consider 2 blank spaces:

_ _.
1 2

These 2 blank spaces correspond to 2 decisions to be made. Decision #1: What number
to put in the first blank space? Decision #2: What letter to put in the second blank space?
Again we ask: How many choices have we for each decision?
For Decision #1, we can put 1, 2, 3, ..., or 18. So we have 18 choices for Decision #1.
For Decision #2, we can put A, B, C, D, E, or F. So we have 6 choices for Decision #2.
Altogether then, by the MP, there are 18 × 6 = 108 ways to make our two decisions. In
other words, there are 108 possible outcomes from rolling these two dice.
(If necessary, it is tedious but not difficult to enumerate them: 1A, 1B, 1C, 1D, 1E, 1F,
2A, 2B, ..., 17E, 17F, 18A, 18B, 18C, 18D, 18E, and 18F.)
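The tedious enumeration is easy to hand to a computer. The sketch below lists the dice outcomes with `itertools.product` and, for comparison, also computes the four-letter-word count from Example 1097. The variable names `numbers`, `letters`, and `outcomes` are my own.

```python
from itertools import product
from string import ascii_uppercase

numbers = range(1, 19)   # faces of the 18-sided die
letters = "ABCDEF"       # faces of the six-sided die

# Every (number, letter) pair, written as in the text: 1A, 1B, ..., 18F.
outcomes = [f"{n}{letter}" for n, letter in product(numbers, letters)]
print(outcomes[:6], "...", outcomes[-1])
print(len(outcomes))  # 108

# Four decisions with 26 choices each (Example 1097).
print(len(ascii_uppercase) ** 4)  # 456976
```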

Exercise 378. A club has a shortlist of 3 men for president, 5 animals for vice-president,
and 10 women for club mascot. How many possible ways are there to choose the president,
the vice-president, and the mascot? (Answer on p. 1564.)
Exercise 379. (Answer on p. 1564.) The highly-stimulating game of 4D consists of
selecting a four-digit number, between 0000 and 9999 (so there are 10,000 possible numbers).
Your mother tells you to go to the nearest gambling den (also known as a Singapore Pools
outlet) to buy any three numbers, subject to these two conditions:
• The four digits in each number are distinct.
• Each four-digit number is distinct.
How many possible ways are there to fulfil your mother’s request?


88.3. How to Count: The Inclusion-Exclusion Principle
The Inclusion-Exclusion Principle (IEP) is another very simple principle.

Example 1099. For lunch today, I can either go to the food court or the hawker centre.
At the food court, I have 4 choices of cuisine: Chinese, Indian, Malay, and Western. At
the hawker centre, I have 3 choices of cuisine: Chinese, Malay, and Thai.
There are 2 choices of cuisine that are common to both the food court and the hawker
centre (Chinese and Malay).
And so by the Inclusion-Exclusion Principle (IEP), I have in total 4 + 3 − 2 = 5 choices of
cuisine. The Venn diagram below illustrates.
Why do we subtract 2? If we simply added the 4 choices available at the food court
to the 3 available at the hawker centre, then we’d double-count the Chinese and Malay
cuisines, which are available at both the food court and the hawker centre. And so we
must subtract the 2 cuisines that are at both locations.

Example 1100. Problem: How many integers between 1 and 20 are divisible by 2 or 5?
There are 10 integers divisible by 2, namely 2, 4, 6, 8, 10, 12, 14, 16, 18, and 20.
There are 4 integers divisible by 5, namely 5, 10, 15, and 20.
There are 2 integers divisible by BOTH 2 and 5, namely 10 and 20.
Hence, by the IEP, there are 10 + 4 − 2 = 12 integers that are divisible by either 2 or 5.
(These are namely 2, 4, 5, 6, 8, 10, 12, 14, 15, 16, 18, and 20.)
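Python's built-in sets make the IEP easy to check for this example. In the sketch below (the names `div2`, `div5`, `both`, and `either` are my own), `&` gives the overlap and `|` gives the union.

```python
# Integers from 1 to 20 that are divisible by 2, and by 5.
div2 = {n for n in range(1, 21) if n % 2 == 0}
div5 = {n for n in range(1, 21) if n % 5 == 0}

both = div2 & div5     # divisible by both 2 and 5
either = div2 | div5   # divisible by 2 or 5 (or both)

print(len(div2), len(div5), len(both))  # 10 4 2
print(sorted(either))

# Inclusion-Exclusion Principle: add the two counts, subtract the overlap.
print(len(div2) + len(div5) - len(both), len(either))  # 12 12
```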

Here’s an informal statement of the IEP:[346]

[346] See section 122.1 in the Appendices (optional) for a more precise statement of the IEP.
The Inclusion-Exclusion Principle (IEP). I have to choose a destination, out of two
possible areas. At area #1, there are p possible destinations to choose from. At area #2,
there are q possible destinations to choose from. Areas #1 and #2 overlap — they have
r destinations in common.
The IEP simply states that I have, in total, p + q − r different choices.

Exercise 380. (Answer on p. 1565.) The food court has 4 types of cuisine: Chinese, In-
donesian, Korean, and Western. The hawker centre has 3: Chinese, Malay, and Western.
A restaurant has 3: Chinese, Japanese, and Malay.
In total, how many different types of cuisine are there? Illustrate your answer with a
Venn diagram.


88.4. How to Count: The Complements Principle
The Complements Principle (CP) is another very simple principle.

Example 1101. The food court has 4 types of cuisine: Chinese, Malay, Indian, and Other.
I’m at the food court but don’t feel like eating Malay or Chinese. So by the Complements
Principle (CP), I have 4 − 2 = 2 possible choices of cuisine (Indian and Other).

Here’s an informal statement of the CP:[347]

The Complements Principle (CP). There are p possible destinations. I must choose
one. I rule out q of the possible destinations.
The Complements Principle says that I am left with p − q possible choices.
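As a tiny illustration of the CP, here is the food court example written with Python sets. The names `cuisines`, `ruled_out`, and `remaining` are my own; the set difference `-` removes the ruled-out options.

```python
cuisines = {"Chinese", "Malay", "Indian", "Other"}
ruled_out = {"Malay", "Chinese"}

# Complements Principle: what is left after ruling some options out.
remaining = cuisines - ruled_out
print(sorted(remaining))                             # ['Indian', 'Other']
print(len(cuisines) - len(ruled_out), len(remaining))  # 2 2
```

Note that the subtraction p − q is only valid here because every ruled-out option is actually one of the p available options.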

Exercise 381. There are 10 Southeast Asian countries, of which 3 (Brunei, Indonesia,
and the Philippines) are not on the mainland. How many mainland Southeast Asian
countries are there that a European tourist can visit? (Answer on p. 1565.)

[347] See section 122.1 in the Appendices (optional) for a more precise statement of the CP.
89. How to Count: Permutations
In this chapter, we’ll use the MP to generate several more methods of counting.
But first, some notation you should find familiar from secondary school:

Definition 186. Let $n \in \mathbb{Z}_0^+$. Then n-factorial, denoted n!, is defined by n! = n × (n − 1) × ⋯ × 1 for n ≥ 1 and 0! = 1.

Example 1102. 0! = 1, 1! = 1, 2! = 2 × 1 = 2, 3! = 3 × 2 × 1 = 6, 4! = 4 × 3 × 2 × 1 = 24,
5! = 5 × 4 × 3 × 2 × 1 = 120.

Exercise 382. Compute 6!, 7!, and 8!. (Answer on p. 1566.)

We now revisit the CAT problem, using the MP:

Example 1103. Problem: How many permutations (or arrangements) are there of the
three letters in the word CAT?
Let’s rephrase this problem in the framework of the MP. Consider three blank spaces:

_ _ _.
1 2 3

These 3 blank spaces correspond to 3 decisions to be made. Decision #1: What letter to
put in the first blank space? Decision #2: What letter to put in the second blank space?
Decision #3: What letter to put in the third blank space?
Again we ask: How many choices have we for each decision?
For Decision #1, we can put C, A, or T. So we have 3 choices for Decision #1.
Having already used up a letter in Decision #1, we are left with two letters. So we have
2 choices for Decision #2.
Having already used up a letter in Decision #1 and another in Decision #2, we are left
with just one letter. So we have only 1 choice for Decision #3.
Altogether then, by the MP, there are 3 × 2 × 1 = 3! = 6 possible ways of making our
decisions. This is also the number of ways there are to arrange the three letters in the
word CAT.

Let’s now try the UNPREDICTABLY problem.


Example 1104. Problem: How many permutations are there of the 13 letters in the word UNPREDICTABLY?
Again, let’s rephrase this problem in the framework of the MP. Consider 13 blank spaces:

_ _ _ _ _ _ _ _ _ _ _ _ _.
1 2 3 4 5 6 7 8 9 10 11 12 13

These 13 blank spaces correspond to 13 decisions to be made. Decision #1: What letter
to put in the first blank space? Decision #2: What letter to put in the second blank
space? ... Decision #13: What letter to put in the 13th blank space?
Again we ask: How many choices have we for each decision?
First an important note: In the word UNPREDICTABLY, no letter is repeated. (Indeed,
UNPREDICTABLY is the longest “common” English word without any repeated letters.)
For Decision #1, we can put U, N, P, R, E, D, I, C, T, A, B, L, or Y. So we have 13
choices for Decision #1.
For Decision #2, having already used up a letter in Decision #1, we are left with 12
letters. So we have 12 choices for Decision #2.
For Decision #3, having already used up a letter in Decision #1 and another letter in
Decision #2, we are left with 11 letters. So we have 11 choices for Decision #3.
⋮

For Decision #13, having already used up a letter in Decision #1, another in Decision
#2, another in Decision #3, ..., and another in Decision #12, we are left with one letter.
So we have 1 choice for Decision #13.
Altogether then, by the MP, there are 13 × 12 × ⋯ × 2 × 1 = 13! = 6,227,020,800 possible
ways of making our decisions. This is also the number of ways there are to arrange the
13 letters in the word UNPREDICTABLY.


The next fact simply summarises what should already be obvious from the above examples:

Fact 165. There are n! possible permutations of n distinct objects.

Here is an informal proof of the above fact.[348]

Consider n empty spaces. We are to fill them with the n distinct objects.

_ _ _ . . . _.
1 2 3 n

For space #1, we have n possible choices. For space #2, we have n − 1 possible choices
(because one object was already placed in space #1). ... And finally for space #n, we have
only 1 object left and thus only 1 choice. By the MP then, there are n × (n − 1) × ⋅ ⋅ ⋅ × 1 = n!
possible ways of filling in these n spaces with the n distinct objects.

Example 1105. The word COWDUNG has seven distinct letters. Hence, there are
7! = 5040 permutations of the letters in the word COWDUNG.
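For small n, we can check the n! count by brute force. The sketch below (the helper name `count_by_listing` is my own) generates every ordering of n distinct objects and compares the tally with `math.factorial`. It is a spot check for a few values of n, not a proof.

```python
from itertools import permutations
from math import factorial


def count_by_listing(n):
    """Count the orderings of n distinct objects by generating all of them."""
    return sum(1 for _ in permutations(range(n)))


for n in range(1, 8):
    print(n, count_by_listing(n), factorial(n))

# COWDUNG has seven distinct letters, so it has 7! arrangements.
print(factorial(len(set("COWDUNG"))))  # 5040
```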

[348] This is informal because, amongst other omissions, we haven’t yet given a precise definition of the term
89.1. Permutations with Repeated Elements
In the previous section, we saw that there are 3! permutations of the three letters in the
word CAT and 13! permutations of the 13 letters in the word UNPREDICTABLY. We
made an important note: In each of these words, there was no repeated letter.
We now consider permutations of a set where some elements are repeated.

Example 1106. How many permutations are there of the three letters in the word SEE?
A naïve application of the MP would suggest that the answer is 3! = 6. This is wrong.
Enumeration shows that there are only 3 possible permutations:

SEE, ESE, EES.


To see why a naïve application of the MP fails, set up the problem in the framework of
the MP. Consider 3 blank spaces:

_ _ _.
1 2 3

These 3 blank spaces correspond to 3 decisions to be made. Decision #1: What letter
to put in the first blank space? Decision #2: What letter to put in the second blank
space? Decision #3: What letter to put in the third blank space?
Again we ask: How many choices have we for each decision?
For Decision #1, we can put E or S. So we have 2 choices for Decision #1.
But now the number of choices available for Decision #2 depends on what we chose
for Decision #1! (If we chose E in Decision #1, then we again have 2 choices for
Decision #2. But if instead we chose S in Decision #1, then we now have only 1 choice
for Decision #2.) This violates the implicit but important assumption in the MP that
the number of choices available in one decision is independent of the choice made in the
other decision. Hence, the MP does not directly apply.
The reason SEE has only 3 possible permutations (instead of 3! = 6) is that it contains a
repeated element, namely E. But why would this make any difference?
To understand why, let’s rename the second E as Ê, so that the word SEE is now trans-
formed into a new word SEÊ. From the three letters of this new word, we’d again have
3! = 6 possible permutations:

SEÊ, SÊE, ESÊ, ÊSE, EÊS, ÊES.

Restricting attention to the two letters EÊ, we see that there are 2! = 2 ways to permute
these two letters. Hence, any single permutation (in the case where we do not distin-
guish between the two E’s) corresponds to 2 possible permutations (in the case where
we do). The figure below illustrates how the 3 permutations of SEE correspond to the 6
permutations in SEÊ.

Hence, when we do not distinguish between the two E’s, there are only half as many
possible permutations.

We next consider permutations of SASS.

Example 1107. How many permutations are there of the four letters in the word SASS?
The answer is 4!/3! = 4. Let’s see why.
If we distinguish between the three S’s, perhaps by calling them S, Ŝ, and S̄, then we’d
have 4! = 24 possible permutations of the letters in the word SAŜS̄.
But amongst the three S’s themselves, we have 3! = 6 possible permutations: SŜS̄, SS̄Ŝ,
ŜSS̄, S̄SŜ, ŜS̄S, and S̄ŜS. So distinguishing between the three S’s increases by 6-fold the
number of possible permutations. Working backwards, the word SASS thus has one-sixth
as many permutations as SAŜS̄. That is, SASS has 4!/3! = 4 possible permutations.
The figure below illustrates how the 4 possible permutations of SASS correspond to the
24 possible permutations of SAŜS̄.


Example 1108. How many permutations are there of the four letters in the word DEED?
Answer: $\dfrac{4!}{2!\,2!}$.
In the numerator, the 4! corresponds to the total of 4 letters. In the denominator, the 2!
corresponds to the 2 D’s and the 2! corresponds to the 2 E’s. Where do these numbers
come from?
Let x be the number of permutations of DEED (i.e. x is our desired answer).
If we distinguish between the two D’s, then we’d increase by 2!-fold the number of possible
permutations, to x⋅2!. If, in addition, we distinguish between the 2 E’s, then we’d increase
again by 2!-fold the number of possible permutations, to x ⋅ 2! ⋅ 2!. But we know that if
all 4 letters are distinct, then there are 4! possible permutations. Therefore,

x ⋅ 2! ⋅ 2! = 4!

Rearrangement yields the answer:

$$x = \frac{4!}{2!\,2!} = 6.$$
You can go back and check that this answer is consistent with our answer for Exercise
377 (above).

We next consider permutations of ASSESSES.


Example 1109. Problem: How many permutations are there of the eight letters in the word ASSESSES?
Answer: $\dfrac{8!}{2!\,5!}$.
In the numerator, the 8! corresponds to the total of 8 letters. In the denominator, the 2!
corresponds to the 2 E’s and the 5! corresponds to the 5 S’s. Where do these come from?
Let y be the number of permutations of ASSESSES (i.e. y is our desired answer).
If we distinguish between the two E’s, then we’d increase by 2!-fold the number of possible
permutations, to y ⋅2!. If, in addition, we distinguish between the 5 S’s, then we’d increase
again by 5!-fold the number of possible permutations, to y ⋅ 2! ⋅ 5!. But we know that if
all 8 letters are distinct, then there are 8! possible permutations. Therefore,

y ⋅ 2! ⋅ 5! = 8!

Rearrangement yields the answer:

$$y = \frac{8!}{2!\,5!} = 168.$$


In general,

Fact 166. Consider n objects, only k of which are distinct. Let $r_1, r_2, \dots, r_k$ be the
numbers of times the 1st, 2nd, . . . , and kth distinct objects appear. (So $r_1 + r_2 + \dots + r_k = n$.)
Then the number of possible ways to permute these n objects is

$$\frac{n!}{r_1!\,r_2! \cdots r_k!}.$$
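Here is a short Python sketch of the formula in the above Fact, together with a brute-force cross-check for short words. The function names `count_arrangements` and `brute_force` are my own. The brute-force version generates all n! orderings and keeps only the distinct ones, so it is only practical for small n.

```python
from collections import Counter
from itertools import permutations
from math import factorial


def count_arrangements(word):
    """n! / (r1! r2! ... rk!), where the r's count how often each letter appears."""
    total = factorial(len(word))
    for r in Counter(word).values():
        total //= factorial(r)  # each division is exact
    return total


def brute_force(word):
    """Generate all orderings and count the distinct ones (small words only)."""
    return len(set(permutations(word)))


for w in ("SEE", "SASS", "DEED", "ASSESSES"):
    print(w, count_arrangements(w), brute_force(w))
```

For the four words above, both columns should agree (3, 4, 6, and 168 respectively).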

More examples:

Example 1110. How many permutations are there of the six letters in the word BANANA?
We have three distinct letters: B, A, and N. The letter B appears 1 time. The letter
A appears 3 times. The letter N appears 2 times. Hence, by the above Fact, the number
of possible permutations of these 6 letters is

$$\frac{6!}{1!\,3!\,2!} = 60.$$

Of course, 1! is simply equal to 1. So for the denominator, we shall usually not bother to
write out any 1!. So we will normally instead write that the number of permutations of
BANANA is

$$\frac{6!}{3!\,2!} = 60.$$

Example 1111. How many permutations are there of the 11 letters in the word MISSISSIPPI?
We have four distinct letters: M, I, S, and P. The letter M appears 1 time. The letter
I appears 4 times. The letter S appears 4 times. The letter P appears 2 times. Hence,
by the above Fact, the number of possible permutations of these 11 letters is

$$\frac{11!}{1!\,4!\,4!\,2!} = 34{,}650.$$

Exercise 383. There are 3 identical white tiles and 4 identical black tiles. How many
ways are there of arranging these 7 tiles in a row? (Answer on p. 1566.)


89.2. Circular Permutations
Informal Definition. Two circular permutations are equivalent if one can be trans-
formed into another by means of a rotation.

Example 1112. There are 3! = 6 (linear) permutations of CAT. That is, there are 3! = 6
possible ways to fill them into these 3 linearly-arranged spaces:

1 2 3

In contrast, there are only 2! = 2 circular permutations of CAT. That is, there are only
2! = 2 possible ways to fill them into these 3 circularly-arranged spaces:

Let’s see why there are only 2 circular permutations of CAT.


The three seemingly-different arrangements above are considered to be the same circular
permutation. This is because any arrangement is simply a rotation of another. Take the
left red arrangement, rotate it clockwise by one-third of a circle to get the middle green
arrangement. Repeat the rotation to get the right blue arrangement.
The second and only other circular arrangement of CAT is shown below. Again, these
three seemingly-different arrangements are considered to be the same circular permuta-
tion. This is because any arrangement is simply a rotation of another. Take the left
black arrangement, rotate it clockwise by one-third of a circle to get the middle pink
arrangement. Repeat the rotation to get the right orange arrangement.
Note importantly, that the arrangement (or three arrangements) below cannot be rotated
to get the arrangement (or three arrangements) above. Hence, the arrangement below is
indeed distinct from the arrangement above.

It turns out that in general, if we have n distinct objects, there are (n − 1)! ways to
arrange them in a circle. So here there are only (3 − 1)! = 2! = 2 ways to arrange CAT in
a circle.

In general:

Fact 167. n distinct objects have (n − 1)! circular permutations.

Proof. Given n distinct objects, any 1 circular permutation can be rotated n times to obtain
n distinct (linear) permutations. Hence, there are n times as many (linear) permutations
as there are circular permutations.
But we already know that there are n! (linear) permutations of n distinct objects. Hence,
there are n!/n = (n − 1)! circular permutations of n distinct objects.
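The rotation argument in the proof can be imitated in code. In the sketch below (the function name `circular_permutations` is my own), every linear arrangement is rotated until one fixed object comes first; two arrangements that are rotations of each other then collapse to the same tuple. This assumes the objects are all distinct, as in the Fact.

```python
from itertools import permutations
from math import factorial


def circular_permutations(items):
    """Count arrangements in a circle, treating rotations as the same.

    Assumes all items are distinct.
    """
    items = list(items)
    first = items[0]
    seen = set()
    for p in permutations(items):
        i = p.index(first)
        seen.add(p[i:] + p[:i])  # rotate so that `first` leads
    return len(seen)


print(circular_permutations("CAT"))  # 2
for n in range(1, 7):
    print(n, circular_permutations(range(n)), factorial(n - 1))
```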

Exercise 384. How many ways are there to seat 10 people in a circle? (Answer on p.

Note that if there are repeated objects, then the problem is considerably more difficult. See
Ch. 122.2 in the Appendices for a brief discussion.


89.3. Partial Permutations

Example 1113. Using the 26-letter alphabet, how many 3-letter words can we form that
have no repeated letters? This, of course, is simply the problem of filling in these 3 empty
spaces using 26 distinct elements. For space #1, we have 26 possible choices. For space
#2, we have 25. And for space #3, we have 24.

1 2 3

By the MP then, the number of ways to fill the three spaces is 26 × 25 × 24. This is also
the number of three-letter words with no repeated letters.

Problems like the above example crop up often enough to motivate a new piece of notation:

Definition 187. Let n, k be positive integers with n ≥ k. Then P(n, k), read aloud as n
permute k, is defined by

$$P(n, k) = \frac{n!}{(n-k)!}.$$

P (n, k) answers the following question: “Given n distinct objects and k spaces (where
k ≤ n), how many ways are there to fill the k spaces?”
Just so you know, P(n, k) is also variously denoted $nPk$, $P^n_k$, ${}^nP_k$, etc., but we’ll stick solely
with P(n, k) in this textbook.
Example 1113 (continued from above). The number of 3-letter words without repeated
letters is simply P(26, 3) = 26!/23! = 26 × 25 × 24.
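A minimal Python sketch of P(n, k) follows. The function name `P` mirrors the textbook's notation and is my own definition; `math.perm` is the standard library's equivalent (available from Python 3.8). The last line cross-checks a small case by listing the ordered selections with `itertools.permutations`.

```python
from itertools import permutations
from math import factorial, perm


def P(n, k):
    """Number of ways to fill k ordered spaces from n distinct objects."""
    return factorial(n) // factorial(n - k)


print(P(26, 3), 26 * 25 * 24, perm(26, 3))  # 15600 15600 15600

# Brute-force check on a small case: ordered pairs from 5 letters.
print(P(5, 2), sum(1 for _ in permutations("ABCDE", 2)))  # 20 20
```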

Example 1114. Problem: Using the 22-letter Phoenician alphabet, how many 4-letter
words can we form that have no repeated letters?
This, of course, is simply the problem of filling in these 4 empty spaces using 22 distinct
elements. So the answer is P(22, 4) = 22!/18! = 22 × 21 × 20 × 19 words.

Exercise 385. Out of a committee of 11 members, how many ways are there to choose
a president and a vice-president? (Answer on p. 1566.)


89.4. Permutations with Restrictions

Example 1115. At a dance party, there are 7 heterosexual married couples (and thus
14 people in total). Problem #1. How many ways are there of arranging them in a
line, with the restriction that every person is next to his or her partner?
Think of there as being 7 units (each unit being a couple). There are 7! ways to arrange
these 7 units in a line. Within each unit, there are 2 possible arrangements. Hence, in
total, there are $7! \times 2^7$ possible arrangements.
Problem #2. Repeat the above problem, but now for a circle, rather than a line.
There are 6! ways to arrange the 7 units in a circle. Within each unit, there are 2 possible
arrangements. Hence, in total, there are $6! \times 2^7$ possible arrangements.
Problem #3. How many ways are there of arranging them in a circle, with the restric-
tion that every man is to the right of his wife?
There are 6! ways to arrange the 7 units in a circle. Within each unit, there is only 1
possible arrangement. Hence, in total, there are 6! possible arrangements.
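With 14 people, brute force is out of reach (14! orderings), but the "units" argument for Problem #1 can be spot-checked on smaller parties. The sketch below is my own illustration: `couples_in_line` and the labels `"H"`/`"W"` are hypothetical names, and it simply tests every ordering of n couples to see whether each person stands next to their partner.

```python
from itertools import permutations
from math import factorial


def couples_in_line(n):
    """Brute-force count of line-ups of n couples with every person beside their partner."""
    people = [(couple, spouse) for couple in range(n) for spouse in "HW"]
    count = 0
    for line in permutations(people):
        # Person i is fine if a neighbour (position i-1 or i+1) is from the same couple.
        ok = all(
            any(0 <= j < len(line) and line[j][0] == line[i][0] for j in (i - 1, i + 1))
            for i in range(len(line))
        )
        count += ok
    return count


# Compare with the formula n! x 2^n from the "units" argument.
for n in (1, 2, 3):
    print(n, couples_in_line(n), factorial(n) * 2**n)
```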

Example 1116. (I assume you’re familiar with the standard 52-card deck.)

Problem #1. Using a standard 52-card deck, how many ways are there of arranging
any 3 cards in a line, with the restriction that no two cards of the same suit are next to
each other?
This is the problem of filling in 3 spaces with 52 distinct objects. For space #1, we have
52 possible choices.

_ _ _.
1 2 3

For space #2, having picked a card of suit X for space #1, we must pick a card from some
other suit Y. And so there are only 39 possible choices (we have three suits available —
that’s 3 × 13 = 39).
For space #3, having picked a card of suit Y for space #2, we must pick a card from
some other suit Z. Note that suit Z can be the same as suit X. And so there are 38
possible choices (we have three suits available, less the card used for space #1 — that’s
3 × 13 − 1 = 38).
Altogether then, there are 52 × 39 × 38 possible arrangements.
Problem #2. Repeat the above problem, but now for a circle, rather than a line.
One subtle thing is that, in addition to space #1 being of a different suit from space #2
and space #2 being of a different suit from space #3, we must also have that space #3
is of a different suit from space #1. Thus, there are 52 × 39 × 26 possible ways to fill in
these three spaces, if they were in a line.
Since they are instead in a circle, there are 52 × 39 × 26 ÷ 3 possible ways to arrange three
cards in a circle, with the condition that no two cards of the same suit are next to each other.
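Both card counts are small enough to verify by brute force. The sketch below is my own illustration (the names `deck`, `line`, and `ring` are hypothetical): it walks through every ordered choice of 3 distinct cards, counts those that satisfy the line restriction, and separately counts those that also satisfy the extra circle restriction. Dividing the latter by 3 accounts for the 3 rotations of each circular arrangement.

```python
from itertools import permutations

# A 52-card deck as (rank, suit) pairs.
deck = [(rank, suit) for suit in "SHDC" for rank in range(13)]

line = 0  # neighbours in a line have different suits
ring = 0  # additionally, the 3rd and 1st cards have different suits
for a, b, c in permutations(deck, 3):
    if a[1] != b[1] and b[1] != c[1]:
        line += 1
        if c[1] != a[1]:
            ring += 1

print(line, 52 * 39 * 38)             # 77064 77064
print(ring // 3, 52 * 39 * 26 // 3)   # 17576 17576
```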

Exercise 386. (Answer on p. 1566.) There are 4 brothers and 3 sisters. In how many
ways can they be arranged ...
(a) in a line, without any 2 brothers being next to each other?
(b) in a line, without any 2 sisters being next to each other?
(c) in a circle, without any 2 brothers being next to each other?
(d) in a circle, without any 2 sisters being next to each other?


90. How to Count: Combinations
P (n, k) is the number of ways we can fill k (ordered) spaces using n distinct objects.
In contrast, C(n, k) is the number of ways of choosing k out of n distinct objects. Equival-
ently, it is the same problem of filling k spaces using n distinct objects, except that now
order does not matter.

Example 1117. Suppose we have a committee of 13 members and wish to select a
president and a vice-president. This is equivalent to the problem of filling in 2 spaces,
given 13 distinct objects.

_ _
1 2

The answer is thus simply P(13, 2) = 13 × 12.

Suppose instead that we want to choose two co-presidents. How many ways are there of
doing so?
This is simply the same problem as before — again we want to fill in 2 spaces, given 13
distinct objects. The only difference now is that the order of the 2 chosen objects
does not matter. So the answer must be that there are P (13, 2)/2! ways of choosing
the two co-presidents.

Example 1118. How many ways are there of choosing 5 cards out of a standard 52-card
deck?

_ _ _ _ _
1 2 3 4 5

First, how many ways are there to fill 5 spaces using 52 distinct objects (where order
matters)? Answer: P(52, 5) = 52 × 51 × 50 × 49 × 48 = 311,875,200.
And so if we don’t care about order, we must adjust this number by dividing by 5! to get
P(52, 5)/5! = 2,598,960. So the answer is that to choose 5 cards out of a 52-card deck,
there are 2,598,960 ways.
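The two examples above can be checked numerically. This is a minimal sketch, assuming Python 3.8 or later (for `math.perm` and `math.comb`); the variable names are illustrative only.

```python
import math
from itertools import combinations, permutations

# Example 1117: 13 committee members, 2 positions.
members = range(13)
ordered_pairs = len(list(permutations(members, 2)))    # president, vice-president
unordered_pairs = len(list(combinations(members, 2)))  # two co-presidents

# Example 1118: 5 cards out of 52.
ordered_hands = math.perm(52, 5)
unordered_hands = ordered_hands // math.factorial(5)

print(ordered_pairs, unordered_pairs)
print(ordered_hands, unordered_hands, math.comb(52, 5))
```

The enumeration for the committee lists every pair explicitly, while the card counts use the formulas directly, since listing all 5-card hands would be slow.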

The above examples suggest that, in general, to choose k out of n given distinct objects,
there are P (n, k)/k! possible ways. This motivates the following definition:

Definition 188. Let n, k be positive integers with n ≥ k. Then C(n, k), read aloud as n
choose k, is defined by
$$C(n, k) = \frac{P(n, k)}{k!} = \frac{n!}{(n - k)!\,k!}.$$

It turns out that C(n, k) appears so often in maths that it has many alternative notations;
one of the most common is $\binom{n}{k}$.
“n choose k” also has several names, such as the combination, the combinatorial
number, and even the binomial coefficient. Shortly, we’ll see why the name binomial
coefficient makes sense.
Exercise 387 gives an alternate expression for C(n, k) which you’ll often find very useful.
Exercise 387. (Answer on p. 1568.) Show that:
$$C(n, k) = \frac{n \times (n - 1) \times (n - 2) \times \dots \times (n - k + 1)}{k!}.$$
Exercise 388. Compute C(4, 2), C(6, 4), and C(7, 3). (Answer on p. 1568.)
Exercise 389. We wish to form a basketball team, consisting of 1 centre, 2 forwards,
and 2 guards. We have available 3 centres, 7 forwards, and 5 guards. How many ways
are there of forming a team? (Answer on p. 1568.)

Here’s a nice symmetry property:

Fact 168. (Symmetry.) C(n, k) = C(n, n − k).

Proof. Choosing k out of n objects is the same as choosing which n − k out of n objects to
leave out.

Example 1119. We have a group of 100 men. 70 are needed for a task. The number of
ways to choose these 70 men is:
$$C(100, 70) = \frac{100!}{30!\,70!}.$$
This is the same as the number of ways to choose the 30 men that will not be used for
the task:
$$C(100, 30) = \frac{100!}{70!\,30!}.$$
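A quick numerical check of the symmetry property, as a sketch (it assumes Python 3.8+ for `math.comb`; the variable names are made up for illustration):

```python
import math

chosen = math.comb(100, 70)    # ways to pick the 70 men who do the task
left_out = math.comb(100, 30)  # ways to pick the 30 men who do not

print(chosen == left_out)

# The same symmetry holds across a whole range of small cases.
symmetric = all(
    math.comb(n, k) == math.comb(n, n - k)
    for n in range(1, 30)
    for k in range(n + 1)
)
print(symmetric)
```

Of course, checking finitely many cases is not a proof; the argument in the proof of Fact 168 is what covers every n and k.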

90.1. Pascal’s Triangle
Pascal’s Triangle consists of a triangle of numbers. If we adopt the convention that the
topmost row is row 0 and the leftmost term of each row is the 0th term, then the nth row,
kth term is the number C(n, k):

1
1 1
1 2 1
1 3 3 1
1 4 6 4 1
1 5 10 10 5 1
1 6 15 20 15 6 1
1 7 21 35 35 21 7 1

It turns out that beautifully enough, each term is equal to the sum of the two terms above
it. The next exercise asks you to verify several instances of this:

Exercise 390. Verify the following: (a) C(1, 0)+C(1, 1) = C(2, 1); (b) C(4, 2)+C(4, 3) =
C(5, 3); (c) C(17, 2) + C(17, 3) = C(18, 3). (Answer on p. 1568.)

Fact 169. (Pascal’s Rule/Identity/Relation.) C(n + 1, k) = C(n, k) + C(n, k − 1).

Proof. C(n + 1, k) is the number of ways of choosing k out of n + 1 distinct objects.

Suppose we do not choose the last object, i.e. the (n + 1)th object. Then we have to choose
our k objects out of the first n objects. There are C(n, k) ways of doing so.
Suppose we do choose the last object. Then we have to choose another k − 1 objects, out
of the first n objects. There are C(n, k − 1) ways of doing so.
Altogether then, by the Addition Principle, there are C(n, k) + C(n, k − 1) ways of choosing
k out of n + 1 distinct objects.
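Pascal’s Rule gives a way to build the triangle using additions only. The sketch below does this and then compares every entry against C(n, k); the function name `pascal_rows` is made up for illustration, and `math.comb` assumes Python 3.8+.

```python
import math

def pascal_rows(num_rows):
    """Build rows 0 .. num_rows - 1 using only Pascal's rule (no factorials)."""
    rows = [[1]]
    for _ in range(num_rows - 1):
        prev = rows[-1]
        # Interior terms are sums of the two terms above; the ends are always 1.
        middle = [prev[k - 1] + prev[k] for k in range(1, len(prev))]
        rows.append([1] + middle + [1])
    return rows

rows = pascal_rows(8)
for row in rows:
    print(row)

# Each entry should agree with C(n, k) computed from the factorial formula.
matches = all(
    rows[n][k] == math.comb(n, k)
    for n in range(len(rows))
    for k in range(n + 1)
)
print(matches)
```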

90.2. The Combination as Binomial Coefficient
[L]a mathématique est l’art de donner le même nom à des choses différentes.
[M]athematics is the art of giving the same name to different things.

— Henri Poincaré (1908, Science and Method, [1914 trans.])

Poincaré’s quote is especially true in combinatorics. In this section, we’ll learn why C (n, k)
can be called the combination and also the binomial coefficient.
Verify for yourself that the following equations are true:

$$
\begin{aligned}
(1 + x)^0 &= 1, \\
(1 + x)^1 &= 1 + x, \\
(1 + x)^2 &= 1 + 2x + x^2, \\
(1 + x)^3 &= 1 + 3x + 3x^2 + x^3, \\
(1 + x)^4 &= 1 + 4x + 6x^2 + 4x^3 + x^4, \\
(1 + x)^5 &= 1 + 5x + 10x^2 + 10x^3 + 5x^4 + x^5, \\
(1 + x)^6 &= 1 + 6x + 15x^2 + 20x^3 + 15x^4 + 6x^5 + x^6, \\
(1 + x)^7 &= 1 + 7x + 21x^2 + 35x^3 + 35x^4 + 21x^5 + 7x^6 + x^7.
\end{aligned}
$$

Each of the expressions on the RHS is called a binomial series. Each can also be called
the binomial expansion of $(1 + x)^n$.
Notice anything interesting? No? Try this exercise:

Exercise 391. Compute $\binom{7}{0}$, $\binom{7}{1}$, $\binom{7}{2}$, $\binom{7}{3}$, $\binom{7}{4}$, $\binom{7}{5}$, $\binom{7}{6}$, $\binom{7}{7}$. Compare
these to the coefficients of the binomial expansion of $(1 + x)^7$. What do you notice?
(Answer on p. 1569.)

It turns out that somewhat surprisingly, the coefficients of the binomial expansion of
$(1 + x)^n$ are simply $\binom{n}{0}, \binom{n}{1}, \dots, \binom{n}{n}$. As an additional exercise, you should verify for
yourself that this is also true for n = 0 through n = 6.
There are several ways to explain why the combinatorial numbers also happen to be the
binomial coefficients. Here we’ll give only the combinatorial explanation:

Consider $(1 + x)^2$. Expanding, we have

$$(1 + x)^2 = (1 + x)(1 + x) = 1 \cdot 1 + 1 \cdot x + x \cdot 1 + x \cdot x.$$

Consider the 4 terms on the right.

• For 1 ⋅ 1, we “chose” 1 from the first (1 + x) and 1 from the second (1 + x).
  From the two (1 + x)’s in the product, there is C(2, 0) = 1 way to choose 0 of the x’s.
• For 1 ⋅ x, we “chose” 1 from the first (1 + x) and x from the second (1 + x).
  For x ⋅ 1, we “chose” x from the first (1 + x) and 1 from the second (1 + x).
  From the two (1 + x)’s in the product, there are C(2, 1) = 2 ways to choose 1 of the x’s.
• Finally, for x ⋅ x, we “chose” x from the first (1 + x) and x from the second (1 + x).
  From the two (1 + x)’s in the product, there is C(2, 2) = 1 way to choose 2 of the x’s.

Altogether then, the coefficient on $x^0$ is C(2, 0) (“choose 0 of the x’s”), that on $x^1$ is C(2, 1)
(“choose 1 of the x’s”), and that on $x^2$ is C(2, 2) (“choose 2 of the x’s”). That is:

$$(1 + x)^2 = C(2, 0)x^0 + C(2, 1)x^1 + C(2, 2)x^2 = 1 + 2x + x^2.$$
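The “choosing” argument can be mimicked directly on a computer: from each of the n factors (1 + x), pick either 1 or x, and tally how many x’s were picked. This is only a sketch; the function name `expansion_coefficients` is made up for illustration.

```python
import math
from collections import Counter
from itertools import product

def expansion_coefficients(n):
    """Coefficients of (1 + x)^n, found by picking '1' or 'x' from each factor."""
    counts = Counter()
    for picks in product("1x", repeat=n):
        # The number of x's picked is the power of x that this term contributes to.
        counts[picks.count("x")] += 1
    return [counts[power] for power in range(n + 1)]

print(expansion_coefficients(2))
print(expansion_coefficients(7))
```

The tally for each power of x is exactly the number of ways to choose that many of the factors to contribute an x, which is why the combinatorial numbers appear.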

Exercise 392. (Answer on p. 1569.) Mimicking what was just done above, explain why

$$(1 + x)^3 = C(3, 0)x^0 + C(3, 1)x^1 + C(3, 2)x^2 + C(3, 3)x^3.$$

More generally, we have

Fact 170. Let $n \in \mathbb{Z}^+$. Then

$$(x + y)^n = \sum_{i=0}^{n} \binom{n}{i} x^{n-i} y^i = \binom{n}{0} x^n y^0 + \binom{n}{1} x^{n-1} y^1 + \binom{n}{2} x^{n-2} y^2 + \dots + \binom{n}{n} x^0 y^n.$$

90.3. The Number of Subsets of a Set is $2^n$
By plugging x = 1, y = 1 into the last fact, we see that $(1 + 1)^n = 2^n$ is the sum of the terms
in the nth row of Pascal’s triangle:

Fact 171. Let $n \in \mathbb{Z}^+$. Then

$$2^n = \sum_{i=0}^{n} \binom{n}{i} = \binom{n}{0} + \binom{n}{1} + \binom{n}{2} + \dots + \binom{n}{n}.$$

There’s a nice combinatorial interpretation of the above fact (Poincaré’s quote at work).
Consider the set S = {A, B}. S has $2^2 = 4$ subsets: ∅ = {}, {A}, {B}, and S = {A, B}.
Now consider the set T = {A, B, C}. T has $2^3 = 8$ subsets: ∅ = {}, {A}, {B}, {C}, {A, B},
{A, C}, {B, C}, and T = {A, B, C}.
In general, if a set has n elements, how many subsets does it have? We can couch this in
the framework of the Multiplication Principle: this is really a sequence of n decisions of
whether or not to include each element in the subset. There are 2 choices for each decision.
Thus, there are $2^n$ choices altogether. In other words, using a set of n elements, we can
form $2^n$ subsets.
But of course, this must in turn be equal to the sum of the following:
• C(n, 0) ways to form subsets with 0 elements;
• C(n, 1) ways to form subsets with 1 element;
• C(n, 2) ways to form subsets with 2 elements;
• ⋮
• C(n, n) ways to form subsets with n elements.

Hence,

$$2^n = \binom{n}{0} + \binom{n}{1} + \binom{n}{2} + \dots + \binom{n}{n}.$$
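Here is a small sketch that lists the subsets of a set, grouped by size, and checks that the group sizes add up to $2^n$. The function name `subsets_by_size` is made up for illustration, and each subset is represented as a tuple.

```python
import math
from itertools import combinations

def subsets_by_size(elements):
    """Return a list whose k-th entry lists all subsets with exactly k elements."""
    return [list(combinations(elements, k)) for k in range(len(elements) + 1)]

T = ["A", "B", "C"]
grouped = subsets_by_size(T)
sizes = [len(group) for group in grouped]

print(grouped)
print(sizes)
print(sum(sizes), 2 ** len(T))
```

For T = {A, B, C}, the group sizes are the terms of row 3 of Pascal’s triangle, and their sum is the total number of subsets.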

Exercise 393. Verify that $2^7 = \binom{7}{0} + \binom{7}{1} + \binom{7}{2} + \dots + \binom{7}{7}$. (Answer on p. 1569.)

Exercise 394. Using what you’ve learnt, write down $(3 + x)^4$. (Answer on p. 1570.)

Exercise 395. (Answer on p. 1570.) (a) The Tan family has 4 sons and the Wong
family has 3 daughters. Using the sons and daughters from these two families, how many
ways are there of forming 2 heterosexual couples?
(b) The Lee family has 6 sons and the Ho family has 9 daughters. Using the sons and
daughters from these two families, how many ways are there of forming 5 heterosexual
couples?
91. Probability: Introduction

91.1. Mathematical Modelling

All models are wrong, but some are useful.

— George Box (1979).

Whenever we use maths in a real-world scenario, we have some mathematical model in
mind. Here’s a very simple example just to illustrate:

Example 1120. We want to know how much material to purchase, in order to build a
fence around a field. We might go through these steps:
1. Formulate a mathematical model: Our field is the shape of a rectangle, with length
100 m and breadth 50 m.
2. Analyse: The rectangle has perimeter 100 + 50 + 100 + 50 = 300 m.
3. Apply the results of our analysis: We need to buy enough material to build a
300-metre long fence.

The figure below depicts how mathematical modelling works.

Starting with some real-world scenario, we go through these steps:

1. Formulate a mathematical model.
That is, describe the real-world scenario in mathematical language and concepts.
This first step is arguably the most important. It is often subjective — not everyone will
agree that your mathematical model is the most appropriate for the scenario at hand.
To use the above example, the field may not be a perfect rectangle, so some may object
to your description of the field as a rectangle. Nonetheless, you may decide that all things
considered, the rectangle is a good mathematical model.
2. Analyse the model.
This involves using maths and the rules of logic. (A-Level maths exams tend to be mostly
concerned with this second step.)

In the above example, this second step simply involved computing the perimeter of the
rectangle — 100 + 50 + 100 + 50 = 300 m. Of course, for the A-Levels, you can expect the
analysis to be more challenging than this.
Note that this second step, in contrast to the first, is supposed to be completely watertight,
non-subjective, and with no room for disagreement. After all, hardly anyone reasonable
could disagree that a perfect rectangle with length 100 m and breadth 50 m has perimeter
300 m.
3. Apply your results.
Now apply the results of your analysis to the real-world scenario.
In the above example, pretend you’re a mathematical consultant hired by the fence-builder.
Then your final report might simply say, “We recommend the purchase of 300 m worth of
fence material.”
This third and last step is, like the first, subjective and open to debate. It involves your
interpretation of what the results of your analysis mean (in the real world) and your re-
commendation of what actions to take.
For example, you find that the fence will have perimeter 300 m and thus recommend that
300 m of fence material be purchased. However, someone else, looking at the same result,
might point out that the corners of the fence require additional or special material; she
might thus make a slightly different recommendation.

Secretly, we’ve always been using mathematical modelling; we just haven’t always been
terribly explicit about it. The foregoing discussion was placed here because, with probability
and statistical models, we want to be especially clear that we are doing mathematical
modelling.
91.2. The Experiment as a Model of Scenarios Involving Chance
Real-world scenarios often involve chance. We can model such scenarios mathematically.
For this purpose, we’ll use a mathematical object named the experiment, typically
denoted E.
An experiment E = (S, Σ, P) is an ordered triple composed of three objects, called the
sample space S, the event space Σ (upper-case sigma), and the probability function
P, where
• The sample space S is simply the set of possible outcomes.
• An event is simply any set of possible outcomes. In turn, the event space Σ is simply
the set of all events.
• The probability function P simply assigns to each event some probability between 0
and 1. This probability is interpreted as the likelihood of that particular event occurring.

An experiment is often instead called a probability triple or probability space or (probability)
measure space.
Previously, in the only ordered triples we encountered, the three terms were always simply real numbers.
Here however, the first two terms are sets and the third is a function. Nonetheless, this is all the same
an ordered triple, albeit a more complicated one.
Example 1121. We model a coin-flip with the experiment E = (S, Σ, P). What are the
sample space S, the event space Σ, and the probability function P?
1. S = {H, T }.

The sample space is simply the set of possible outcomes.

The choice of the sample space belongs to Step #1 (Formulate a mathematical model)
in the process of mathematical modelling. It is subjective and open to disagreement.
For example, John (another scientist) might argue that the coin sometimes lands exactly
on its edge. This is exceedingly unlikely but nonetheless possible — one empirical estimate
is that the US 5-cent coin has probability 1 in 6000 of landing on its edge when flipped
(source). So John might denote this third possible outcome X and his sample space
would instead be S = {H, T, X}.
2. Event space Σ = {∅, {H}, {T }, {H, T }}.
An event is simply any subset of S. In other words, an event is simply some set of
possible outcomes. So here, {H} is an event. So too is {T }. But there are also two other
events, namely ∅ = {} (this is the event that never occurs) and S = {H, T } (this is the
event that always occurs).
The event space is simply the set of events. In other words, the event space is the set
of all subsets of S.
As we saw in Ch. 90.3, given any finite set S, there are $2^{|S|}$ possible subsets of S. In
general, given a finite sample space S, the corresponding event space Σ always simply
contains $2^{|S|}$ events. And so here, since there are 2 possible outcomes, there are, altogether,
$2^2 = 4$ possible events.
If the real-world outcome of the coin flip is Heads, then our interpretation (in terms of
our model) is that “the events {H} and {H, T } occur”. If the real-world outcome of the
coin flip is Tails, then our interpretation (in terms of our model) is that “the events {T }
and {H, T } occur”.
The event ∅ never occurs, whatever the real-world outcome is. And the event S = {H, T }
always occurs, whatever the real-world outcome is.
The mathematical modeller is free to select the sample space S she deems most ap-
propriate. However, once she has selected the sample space S, the event space Σ is
automatically determined by the rules of maths. There is no room for interpretation.
Hence, the selection of the event space Σ belongs to Step #2 (Analysis) in the process of
mathematical modelling.
So likewise, John, who chooses S = {H, T, X} as his sample space,
has no freedom to choose his event space Σ. It is automatically Σ =
{∅, {H}, {T }, {X}, {H, T }, {H, X}, {T, X}, S} (consists of 8 elements).
3. Probability function P ∶ Σ → R.
The probability function simply assigns to each event a number (between 0 and 1)
called a probability. So here, if heads and tails are “equally likely” (or the coin is
“unbiased” or “fair”), then it makes sense to assign

P (∅) = 0, P ({H}) = P ({T }) = 0.5, P(S) = 1.

The mathematical modeller has no freedom over the domain Σ and codomain R of the
probability function. However, she does have freedom to choose the mapping rule she
deems most appropriate. Hence, the act of choosing the mapping rule belongs to Step
#1 (Formulation) in the process of mathematical modelling.
So here, if told that heads and tails are “equally likely” (or that the coin is “unbiased”
or “fair”), the mathematical modeller would naturally choose to assign probability 0.5 to
each of the events {H} and {T }.
John, who chooses S = {H, T, X} as his sample space, might instead assign probability
1/6000 to the event {X} and probability 5999/12000 to each of the events {H} and {T }.
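To make the three parts of the experiment concrete, here is a sketch of both the fair-coin model and John’s model. The helper names `event_space` and `make_probability` are made up for illustration; events are represented as frozensets of outcomes, and probabilities as exact fractions.

```python
from fractions import Fraction
from itertools import chain, combinations

def event_space(sample_space):
    """All subsets of the sample space, each represented as a frozenset."""
    outcomes = list(sample_space)
    subsets = chain.from_iterable(
        combinations(outcomes, k) for k in range(len(outcomes) + 1)
    )
    return [frozenset(s) for s in subsets]

def make_probability(weights):
    """Turn a probability for each outcome into a function defined on events."""
    return lambda event: sum((weights[o] for o in event), Fraction(0))

# The fair-coin model from the example.
fair_events = event_space("HT")
P_fair = make_probability({"H": Fraction(1, 2), "T": Fraction(1, 2)})

# John's model, with the extra outcome X (landing on the edge).
john_events = event_space("HTX")
P_john = make_probability({
    "H": Fraction(5999, 12000),
    "T": Fraction(5999, 12000),
    "X": Fraction(1, 6000),
})

print(len(fair_events), len(john_events))
print(P_fair(frozenset("HT")), P_john(frozenset("HTX")))
```

Notice that the modeller’s only real choices are the sample space and the outcome weights; the event space is computed from the sample space with no further input.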

Remark 128. It is correct and proper to write P ({H}) = P ({T }) = 0.5. It is incorrect
and improper to write P (H) = P (T ) = 0.5. This is because the function P is of events
(sets of outcomes) and NOT of outcomes themselves.
Nonetheless, we will often allow ourselves to be sloppy and write the “incorrect and
improper” P (H) = P (T ) = 0.5. This is because the notation P ({H}) = P ({T }) = 0.5 can
get rather messy. But you should always remember, even as you write P (H) = P (T ) = 0.5,
that this is technically incorrect.

Example 1122. A real-world die-roll can be modelled by an experiment E = (S, Σ, P), where:
1. S = {1, 2, 3, 4, 5, 6}.

2. Event space:
Σ = {∅, {1} , {2} , . . . , {6} , {1, 2} , {1, 3} , . . . , {5, 6} , {1, 2, 3} , {1, 2, 4} , . . . , {4, 5, 6} , . . . . . . , S}

There are 6 possible outcomes and thus $2^6 = 64$ possible events. The event space, given
above, is simply the set of all possible events.
If the real-world outcome of the die roll is 3, then the interpretation (in terms of our
model) is that the following 32 events occur: {3}, {1, 3}, {2, 3}, . . . , {1, 2, 3}, {1, 3, 4},
. . . , S = {1, 2, 3, 4, 5, 6}. (These are simply the events that contain the outcome 3.)
Similarly, if the real-world outcome of the die roll is 5, then the interpretation is that 32
events occur. You should be able to list all 32 of these events on your own.
3. Probability function P ∶ Σ → R.
If the die is “unbiased” or “fair”, then it makes sense to assign
$$P(\{1\}) = P(\{2\}) = P(\{3\}) = P(\{4\}) = P(\{5\}) = P(\{6\}) = \frac{1}{6}.$$
What about the other 58 events? It makes sense to assign, for example, $P(\{1, 3, 5, 6\}) = \frac{4}{6}$.
In general, the mapping rule of the probability function can be fully specified as: For any
event A ∈ Σ,
$$P(A) = \frac{|A|}{|S|} = \frac{|A|}{6}.$$

In words, given any event A, its probability P(A) is simply the number of elements it
contains, divided by 6.
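The die-roll experiment is small enough to write out in full on a computer. In this sketch, the names `S`, `Sigma`, and `P` mirror the notation of the example, and events are represented as frozensets.

```python
from fractions import Fraction
from itertools import combinations

S = frozenset(range(1, 7))

# The event space is the set of all subsets of S.
Sigma = [
    frozenset(subset)
    for k in range(len(S) + 1)
    for subset in combinations(sorted(S), k)
]

def P(event):
    """Fair die: the probability of an event is its size divided by 6."""
    return Fraction(len(event), len(S))

# If the die shows 3, the events that "occur" are exactly those containing 3.
events_containing_3 = [event for event in Sigma if 3 in event]

print(len(Sigma))
print(len(events_containing_3))
print(P(frozenset({1, 3, 5, 6})))
```

Note that `Fraction` prints probabilities in lowest terms, so 4/6 will be displayed as 2/3.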

Here’s the formal definition of an experiment:

Definition 189. An experiment is an ordered triple (S, Σ, P), where

• S, the sample space, is simply any set (interpreted as the set of possible outcomes in
a real-world scenario involving chance).
• Σ, the event space, is the set of possible events.
• P, the probability function, has domain Σ, codomain R, and must satisfy the three
Kolmogorov axioms (to be discussed below in Definition 190).

Given any event A ∈ Σ, the number P(A) is called the probability of A.

For the probability function P, the mathematical modeller is free to choose the mapping
rule she deems most appropriate. The only restriction is that P satisfies three axioms,
called the Kolmogorov Axioms, to be discussed in the next section.

Exercise 396. (Answers on pp. 1571, 1572, and 1573.) Consider each of the following
real-world scenarios.
(a) You pick, at random, a card from a standard 52-card deck.
(b) You flip two fair coins.
(c) You roll two fair dice.

Model each of the above real-world scenarios as an experiment, by following steps (i) - (iv):

(i) Write down the appropriate sample space S.

(ii) How many possible events are there? Hence, how many elements does the event space
Σ contain? If it is not too tedious, write out Σ in full.
(iii) What are the domain and codomain of the probability function P? Write down the
probabilities of any three events. Given any event A ∈ Σ, what is P(A)?

(iv) In each scenario, explain briefly how John, another scientist, might justify choosing
a different sample space, event space, and probability function.

91.3. The Kolmogorov Axioms
An axiom (or postulate) is a statement that is simply accepted as being true, without
justification or proof.

Example 1123. Euclid’s parallel axiom says that “Two non-parallel lines in the plane
eventually intersect”. Historically, this axiom was accepted as a “self-evident truth”,
without need for justification or proof.
However, in the 19th century, mathematicians discovered “non-Euclidean geometries”, in
which the parallel axiom did not hold. These turned out to have significant implications
for maths, philosophy, and physics.

The above example illustrates that an axiom is not an eternal and immutable truth. Instead,
it is merely a statement that some mathematicians tentatively accept as being true. Having
listed a bunch of axioms, mathematicians then study their implications.
In probability theory, we impose three axioms on the probability function. These can be
thought of as restrictions on what the probability function looks like. Informally:
1. Probabilities can’t be negative.
2. The probability that some outcome occurs is 1.
3. The probability that one of two disjoint events occurs is the sum of their individual
probabilities.

Definition 190. We say that a function P satisfies the three Kolmogorov axioms if:
1. Non-Negativity Axiom. For any event E ⊆ S, we have P(E) ≥ 0.
2. Normalisation Axiom. P(S) = 1.
3. Additivity Axiom. Given any two disjoint events $E_1, E_2 \subseteq S$, we have
$P(E_1 \cup E_2) = P(E_1) + P(E_2)$.

In case you’ve forgotten, two sets are disjoint if they have no elements in common.
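For a finite sample space, the three axioms can be checked mechanically. The sketch below is an illustration only: the names `power_set`, `satisfies_kolmogorov`, `fair`, and `squared` are made up here, and `squared` is a deliberately bad rule invented to show what a failure looks like.

```python
from fractions import Fraction
from itertools import combinations

def power_set(sample_space):
    """All subsets of a finite set, each represented as a frozenset."""
    items = sorted(sample_space)
    return [
        frozenset(subset)
        for k in range(len(items) + 1)
        for subset in combinations(items, k)
    ]

def satisfies_kolmogorov(sample_space, P):
    """Check the three axioms for a function P on all subsets of a finite set."""
    events = power_set(sample_space)
    non_negative = all(P(E) >= 0 for E in events)
    normalised = P(frozenset(sample_space)) == 1
    additive = all(
        P(E1 | E2) == P(E1) + P(E2)
        for E1 in events
        for E2 in events
        if not (E1 & E2)  # only disjoint pairs are constrained
    )
    return non_negative and normalised and additive

S = {1, 2, 3, 4, 5, 6}
fair = lambda E: Fraction(len(E), 6)           # the fair-die rule
squared = lambda E: Fraction(len(E) ** 2, 36)  # a made-up rule that is not additive

print(satisfies_kolmogorov(S, fair))
print(satisfies_kolmogorov(S, squared))
```

The made-up rule `squared` is non-negative and assigns probability 1 to S, yet it fails additivity: it gives {1, 2} probability 4/36, whereas {1} and {2} each get 1/36.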

91.4. Implications of the Kolmogorov Axioms
Obviously, P(∅) = 0 (the probability that the empty event occurs is 0). Previously, you’ve
probably taken this and other “obvious” properties for granted. Now we’ll prove that they
follow from the Kolmogorov axioms.
Recall that given any set A, its complement $A^c$ (sometimes also denoted A′) is defined to
be “everything else”. More precisely, $A^c$ is the set of all elements that are not in A.

Proposition 14. Let P be a probability function and A, B be events. Then P satisfies

the following properties:

1. Complements. $P(A) = 1 - P(A^c)$.

2. Probability of Empty Event is Zero. P(∅) = 0.
3. Monotonicity. If B ⊆ A, then P(B) ≤ P(A).
4. Probabilities Are At Most One. P(A) ≤ 1.
5. Inclusion-Exclusion. P(A ∪ B) = P(A) + P(B) − P(A ∩ B).

You may recognise that the Complements and the Inclusion-Exclusion properties are ana-
logous to the CP and IEP from counting.

Proof. 1. Complements. By definition, A and $A^c$ are disjoint. And so by the Additivity
Axiom, $P(A) + P(A^c) = P(A \cup A^c)$.
Also by definition, $A \cup A^c = S$. And so $P(A \cup A^c) = P(S)$.
By the Normalisation Axiom, P(S) = 1.
Altogether then, $P(A) + P(A^c) = P(A \cup A^c) = P(S) = 1$. Rearranging, $P(A) = 1 - P(A^c)$, as
desired.
The remainder of the proof is continued on p. 1370 in the Appendices.
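As a sanity check (not a proof), we can confirm that all five properties hold in one particular model, the fair die of Example 1122. The variable names below are made up for illustration.

```python
from fractions import Fraction
from itertools import combinations

S = frozenset(range(1, 7))
events = [
    frozenset(subset)
    for k in range(len(S) + 1)
    for subset in combinations(sorted(S), k)
]

def P(event):
    """Fair die: the probability of an event is its size divided by 6."""
    return Fraction(len(event), len(S))

complements = all(P(A) == 1 - P(S - A) for A in events)
empty_is_zero = P(frozenset()) == 0
monotonic = all(P(B) <= P(A) for A in events for B in events if B <= A)
at_most_one = all(P(A) <= 1 for A in events)
inclusion_exclusion = all(
    P(A | B) == P(A) + P(B) - P(A & B) for A in events for B in events
)

print(complements, empty_is_zero, monotonic, at_most_one, inclusion_exclusion)
```

Here `S - A` is the complement of A within the sample space, and `B <= A` tests whether B is a subset of A.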

Venn diagrams are helpful for illustrating probabilities. Those below help to illustrate
four of the above five properties.

Exercise 397. Prove each of the following properties and illustrate with a Venn diagram:
(a) “If two events A and B are mutually exclusive, then P(A ∩ B) = 0.” (b) “Let A, B,
and C be events. Then $P(A \cup B \cup C) = P(A) + P(A^c \cap B) + P(A^c \cap B^c \cap C)$.” (Answer on
p. 1574.)

92. Probability: Conditional Probability

Example 1124. Flip three fair coins. Model this as an experiment E = (S, Σ, P), where

• The sample space is S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}.

• The event space Σ has $2^8 = 256$ elements.
• The probability function P ∶ Σ → R has mapping rule:
$$P(HHH) = P(HHT) = \dots = P(TTT) = \frac{1}{8},$$
and more generally, for any event A ∈ Σ, $P(A) = \frac{|A|}{8}$.
Problem: Suppose there is at least 1 tail. Find the probability that there are at least 2
heads.
There are 7 possible outcomes where there is at least 1 tail: HHT, HTH, HTT, THH,
THT, TTH, and TTT. Each is equally likely to occur. Of these, 3 outcomes involve
at least 2 heads (HHT, HTH, and THH). Thus, given there is at least 1 tail, the
probability that there are at least 2 heads is simply 3/7.
The above analysis was somewhat informal. Here is a more formal analysis.
Let A be the event that there are at least 2 heads: A = {HHT, HTH, THH, HHH}.
Let B be the event that there is at least 1 tail: B = {HHT, HTH, HTT, THH, THT, TTH, TTT}.
A ∩ B is thus the event that there are at least 2 heads and at least 1 tail:
A ∩ B = {HHT, HTH, THH}.
Our problem is equivalent to finding P(A∣B), the conditional probability of A
given B, which is given by:
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{3/8}{7/8} = \frac{3}{7}.$$
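The same computation can be carried out by listing the 8 outcomes and counting. A minimal sketch, with outcomes written as strings such as "HHT" and with variable names chosen to mirror the example:

```python
from fractions import Fraction
from itertools import product

S = ["".join(flips) for flips in product("HT", repeat=3)]

def P(event):
    """Three fair coins: all 8 outcomes are equally likely."""
    return Fraction(len(event), len(S))

A = {s for s in S if s.count("H") >= 2}  # at least 2 heads
B = {s for s in S if s.count("T") >= 1}  # at least 1 tail

conditional = P(A & B) / P(B)
print(sorted(A & B))
print(conditional)
```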

Example 1125. Let P be a probability function and A, B ∈ Σ be events.
• P(A) = 0.5 (the probability that A occurs is 0.5).
• P(B) = 0.6 (the probability that B occurs is 0.6).
• P(A ∩ B) = 0.2 (the probability that both A and B occur is 0.2).
Hence, given that B has occurred, the probability that A has also occurred is simply
0.2/0.6 = 1/3. (The information that P(A) = 0.5 is irrelevant.) Formally:
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{0.2}{0.6} = \frac{1}{3}.$$

The foregoing examples motivate the following definition:

Definition 191. Let P be a probability function and A, B ∈ Σ be events. Then the
conditional probability of A given B is denoted P(A∣B) and is defined by:
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)}.$$

Exercise 398. Roll two dice. Given that the sum of the two dice rolls is 8, what is the
probability that we rolled at least one even number? (Answer on p. 1575.)

92.1. The Conditional Probability Fallacy (CPF)

Definition 192. The conditional probability fallacy (CPF) is the mistaken belief that

P (A∣B) = P (B∣A)

is always true.

Informally, the CPF is the fallacy of leaping from

“If A, then probably B” to “Since B, then probably A.”

But in general, it is not true that P (A∣B) = P (B∣A). Instead:

Fact 172. (a) If P(A) < P(B), then P (A∣B) < P (B∣A).
(b) If P(A) > P(B), then P (A∣B) > P (B∣A).
(c) If P(A) = P(B), then P (A∣B) = P (B∣A).

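Proof. By definition, $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$ and $P(B \mid A) = \frac{P(B \cap A)}{P(A)}$.
Thus, $P(A \mid B) = \frac{P(A)}{P(B)} P(B \mid A)$. And so,
$$
\begin{aligned}
P(A) < P(B) &\implies P(A \mid B) < P(B \mid A), \\
P(A) > P(B) &\implies P(A \mid B) > P(B \mid A), \\
P(A) = P(B) &\implies P(A \mid B) = P(B \mid A).
\end{aligned}
$$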
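To see Fact 172 at work, we can plug in the numbers from Example 1125. This is only a sketch with illustrative variable names. Note that the strict inequalities rely on P(A ∩ B) being positive, as it is here.

```python
from fractions import Fraction

# Numbers from Example 1125.
p_A = Fraction(5, 10)
p_B = Fraction(6, 10)
p_A_and_B = Fraction(2, 10)

p_A_given_B = p_A_and_B / p_B
p_B_given_A = p_A_and_B / p_A

print(p_A_given_B, p_B_given_A)
# The identity used in the proof: P(A|B) = [P(A)/P(B)] P(B|A).
print(p_A_given_B == (p_A / p_B) * p_B_given_A)
# Case (a) of the fact: P(A) < P(B) goes together with P(A|B) < P(B|A).
print(p_A < p_B and p_A_given_B < p_B_given_A)
```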

The CPF is also known as the confusion of the inverse or the inverse fallacy. In
different contexts, it is also known variously as the base-rate fallacy, false-positive
fallacy, or prosecutor’s fallacy.

Example 1126. Suppose the following statement is true: “If Mary has Ebola, then Mary
will probably vomit today.” Formally, we might write P (Vomit∣Ebola) = 0.99.
Mary vomits today. One might then reason, “Since P (Vomit∣Ebola) = 0.99, by the CPF,
we also have P (Ebola∣Vomit) = 0.99. Thus, Mary probably has Ebola.”
Formally, this reasoning is flawed because P(Vomit) is probably much larger than
P(Ebola). Thus, P (Vomit∣Ebola) is probably much larger than P (Ebola∣Vomit).
Informally, the reasoning is flawed because:
• Ebola is extremely rare, so it is extremely unlikely that Mary has Ebola in the first
place.
• Besides Ebola, there are many other alternative explanations for why Mary might have
vomited. For example, she might have had motion sickness or food poisoning.

Example 1127. Sally buys a 4D ticket every week. One day, she wins the first prize.
To her astonishment, she wins the first prize again the following week.
Her jealous cousin Ah Kow makes a police report, based on the following reasoning:
“Without cheating, the probability that Sally wins the first prize two weeks in a row is
1 in 100 million. Given that she did win first prize two weeks in a row, the probability
that she didn’t cheat must likewise be 1 in 100 million. In other words, there is almost
no chance that Sally didn’t cheat.”
Let’s rephrase Ah Kow’s reasoning more formally. Let A and B be the events “Sally
wins the first prize two weeks in a row” and “Sally didn’t cheat”, respectively. We
know that P (A∣B) = 0.00000001. By the CPF, we have P (A∣B) = P (B∣A). Hence,
P (B∣A) = 0.00000001. Equivalently, there is probability 0.99999999 that Sally cheated.
Formally, this reasoning is flawed because P(B) is probably much larger than P (A).
Thus, P (B∣A) is probably much larger than P (A∣B).
Informally, the reasoning is flawed because:
• Cheating in 4D is extremely rare (and difficult), so it is extremely unlikely that Sally
cheated in the first place.
• Besides cheating, there are many other alternative explanations for why there exists
an individual who won first prize two weeks in a row.
One important alternative explanation is that so many individuals buy 4D tickets regularly
that there will invariably be someone as lucky as Sally. Suppose that only 100,000
Singaporeans (less than 2% of Singapore’s population) buy one 4D number every week.
Then we’d expect that about once every 20 years, one of these 100,000 Singaporeans
will have the fortune of winning the first prize on consecutive weeks. Rare, but hardly
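The “once every 20 years” figure can be reproduced with a rough back-of-envelope calculation. The sketch below rests on assumptions that the example does not spell out: exactly one draw per week, 52 weeks per year, and a first-prize probability of 1/10,000 per ticket (which is consistent with Ah Kow’s figure of 1 in 100 million for two consecutive wins).

```python
from fractions import Fraction

# Assumptions for this sketch: one draw per week, 52 weeks per year, and each
# ticket wins first prize with probability 1/10,000.
p_win = Fraction(1, 10_000)
p_two_in_a_row = p_win ** 2    # 1 in 100 million, as in Ah Kow's argument

players = 100_000
starts_per_year = 52           # each week can begin a two-week streak

expected_streaks_per_year = players * starts_per_year * p_two_in_a_row
years_between_streaks = 1 / expected_streaks_per_year

print(p_two_in_a_row)
print(float(years_between_streaks))
```

Under these assumptions, the expected wait comes out at a little under 20 years, in line with the figure quoted above.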

The next example uses concrete numbers to illustrate how large the discrepancy between
P (A∣B) and P (B∣A) can be.

Example 1128. A randomly-chosen person is given a free smallpox screening. We know
that 1 out of every 1,000,000 people has smallpox. The test is very accurate: If you have
smallpox, it correctly tells you so 99% of the time. (Equivalently, it gives a false negative
only 1% of the time.) And if you don’t have smallpox, it also correctly tells you so 99% of
the time. (Equivalently, it gives a false positive only 1% of the time.)
Formally, let S, +, and − denote the events “the randomly-chosen person has smallpox”,
“the test returns positive”, and “the test returns negative”. Then
$$
\begin{aligned}
P(S) &= \frac{1}{1000000}, & P(S^C) &= \frac{999999}{1000000}, \\
P(+ \mid S) &= 0.99, & P(- \mid S) &= 0.01, \\
P(- \mid S^C) &= 0.99, & P(+ \mid S^C) &= 0.01.
\end{aligned}
$$

The test result returns positive (i.e. it says that the randomly-chosen person has small-
pox). What is the probability that this person actually has smallpox?
In words, it is easy to confuse “the probability of a positive test result conditional on
having smallpox” with “the probability of having smallpox conditional on a positive
test result”. Formally, this is the CPF. One starts with P (+∣S) = 0.99 and confusedly
concludes that P (S∣+) = 0.99 — this person almost certainly has smallpox.
In fact, as we now show, despite testing positive, the person is very unlikely to have
smallpox. The correct answer is $P(S \mid +) \approx \frac{1}{10{,}000}$! In the steps below, each $\overset{*}{=}$ simply
uses the definition of conditional probability (Definition 191):

$$
\begin{aligned}
P(S \mid +) &\overset{*}{=} \frac{P(S \cap +)}{P(+)} \overset{*}{=} \frac{P(S)\,P(+ \mid S)}{P(+)} = \frac{P(S)\,P(+ \mid S)}{P(+ \cap S) + P(+ \cap S^C)} \\
&\overset{*}{=} \frac{P(S)\,P(+ \mid S)}{P(S)\,P(+ \mid S) + P(S^C)\,P(+ \mid S^C)} \\
&= \frac{\frac{1}{1000000} \cdot 0.99}{\frac{1}{1000000} \cdot 0.99 + \frac{999999}{1000000} \cdot 0.01} = 0.00009899029 \approx \frac{1}{10{,}000}.
\end{aligned}
$$

This example illustrates how far astray the CPF can lead one.
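The computation in the smallpox example can be reproduced with exact fractions. This is a minimal sketch; the variable names are made up to mirror the notation of the example.

```python
from fractions import Fraction

p_S = Fraction(1, 1_000_000)          # P(S): prevalence of smallpox
p_pos_given_S = Fraction(99, 100)     # P(+|S): true-positive rate
p_pos_given_not_S = Fraction(1, 100)  # P(+|S^C): false-positive rate

# P(+) = P(+ ∩ S) + P(+ ∩ S^C), splitting on whether the person has smallpox.
p_pos = p_S * p_pos_given_S + (1 - p_S) * p_pos_given_not_S

p_S_given_pos = p_S * p_pos_given_S / p_pos

print(p_S_given_pos)
print(float(p_S_given_pos))
```

Try changing `p_S` to see how strongly the answer depends on how rare the disease is; the rarer the disease, the more the false positives swamp the true positives.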

Now an actual, real-world example:

Example 1129. The British mother who murdered her two babies. In 1996,
Sally Clark’s first-born died suddenly within a few weeks of birth. In 1998, the same
happened to her second child. Clark was then arrested on suspicion of murdering her
babies.
At her trial, an “expert” witness claimed that in an affluent, non-smoking family such
as Sally Clark’s, the probability of an infant suddenly dying with no explanation was
1/8543. Hence, he concluded, the probability of two sudden infant deaths in the same
family was $(1/8543)^2$ or approximately 1 in 73 million.

The “expert” then committed the CPF. He argued that since
$$P(\text{Two babies suddenly die} \mid \text{Mother did not murder babies}) = \frac{1}{73{,}000{,}000},$$
it therefore follows that
$$P(\text{Mother did not murder babies} \mid \text{Two babies suddenly die}) = \frac{1}{73{,}000{,}000}.$$
This erroneous reasoning led to Sally Clark being convicted for murdering her two babies.
(Some of you may have noticed that the “expert” actually also made another mistake.
But we’ll examine this only in the next chapter.)

It turns out that it is not only laypersons and court prosecutors who commit the CPF. As we’ll see
later, even academic researchers often commit the CPF when interpreting
the results of a null hypothesis significance test (Chapter 107).

Exercise 399. (Answer on p. 1575.) At a murder scene, a sample of a blood stain is

collected. Its DNA is analysed and compared to a database of DNA profiles. A match
with one John Brown is found. Say there is only a 1 in 10 million chance that two random
individuals have a DNA match.
Does this mean that there is probability 1 in 10 million that the DNA match with John
Brown is merely a coincidence, and thus a near-certainty that the blood stain is really
his? Explain why or why not, with reference to the following conditional probabilities:

P (DNA match∣Blood stain is not John Brown’s) ,

P (Blood stain is not John Brown’s∣DNA match) .


92.2. Two-Boys Problem (Fun, Optional)
This is a famous puzzle, first popularised by Martin Gardner in 1959.

Example 1130. Consider all the families in the world that have two children, of whom
at least one is a boy. Randomly pick one of these families. What is the probability that
both children in this family are boys?

Think about it (set aside this book) before reading the answer below.

We already know that one child is a boy. So intuition might suggest that “obviously”,

P (Both boys) = P (The other child is a boy) = 0.5.

Intuition would be wrong. Intuition goes astray by failing to recognise that there are three equally likely
ways that a family with two children can have at least one boy: BB, BG, or GB. The answer is in fact

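\[
\begin{aligned}
P(BB \mid \text{At least one boy}) &= \frac{P(BB \cap \text{“At least one boy”})}{P(\text{At least one boy})} = \frac{P(BB)}{P(\text{At least one boy})} \\
&= \frac{P(BB)}{P(BB) + P(BG) + P(GB)} = \frac{\frac{1}{4}}{\frac{1}{4} + \frac{1}{4} + \frac{1}{4}} = \frac{1}{3}.
\end{aligned}
\]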
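If you prefer counting to algebra, the same answer can be found by listing the families. The sketch below assumes (as the example implicitly does) that each child is equally likely to be a boy or a girl, independently of the other; the variable names are my own.

```python
from fractions import Fraction
from itertools import product

# All (older child, younger child) combinations: BB, BG, GB, GG.
# Assumption: these four are equally likely.
families = list(product("BG", repeat=2))

# Keep only the families with at least one boy, then count those with two boys.
at_least_one_boy = [f for f in families if "B" in f]
both_boys = [f for f in at_least_one_boy if f == ("B", "B")]

p_two_boys = Fraction(len(both_boys), len(at_least_one_boy))
print(p_two_boys)
```

Three families survive the “at least one boy” filter and only one of them is BB, which gives 1/3.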

In 2010, the following variant of the above Martin Gardner problem was presented.


Example 1131. Consider all the families in the world that have two children, of whom
at least one is a boy born on a Tuesday. Randomly pick one of these families. What is
the probability that both children in this family are boys?
Those familiar with the previous problem might think, “Well, this is exactly the same as
the two-boys problem, except with an obviously-irrelevant bit of information about the
boy being born on a Tuesday. So the answer must be the same as before: 1/3.”
It turns out though that, surprisingly, the Tuesday bit of information makes a big
difference. The answer is 13/27 ≈ 0.481. This is much closer to 0.5 than to 1/3!
Consider all the “two-child, at-least-one-boy-born-on-a-Tuesday” families in the world.
The four mutually-exclusive possibilities are

           Child #1                   Child #2                 Probability

B_T B      Boy born on Tuesday        Boy (born on any day)    P(B_T B) = 1/14 ⋅ 1/2 = 7/196
B_T G      Boy born on Tuesday        Girl                     P(B_T G) = 1/14 ⋅ 1/2 = 7/196
B_N B_T    Boy not born on Tuesday    Boy born on Tuesday      P(B_N B_T) = 6/14 ⋅ 1/14 = 6/196
G B_T      Girl                       Boy born on Tuesday      P(G B_T) = 1/2 ⋅ 1/14 = 7/196

Altogether then, amongst two-child families with at least one boy born on a Tuesday, the
proportion that have two boys is

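\[
\begin{aligned}
\frac{P(BB \cap \text{“At least one Tuesday boy”})}{P(\text{At least one Tuesday boy})}
&= \frac{P(\text{Both boys, at least one of whom born on Tuesday})}{P(\text{At least one Tuesday boy})} \\
&= \frac{P(B_T B) + P(B_N B_T)}{P(B_T B) + P(B_T G) + P(B_N B_T) + P(G B_T)} \\
&= \frac{\frac{7}{196} + \frac{6}{196}}{\frac{7}{196} + \frac{7}{196} + \frac{6}{196} + \frac{7}{196}} = \frac{13}{27}.
\end{aligned}
\]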
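The answer 13/27 is surprising enough that it is worth checking by brute force. The sketch below assumes each child is equally likely to be any of the 14 (sex, day-of-week) types, independently of the other child, so that all 196 two-child families are equally likely. Labelling Tuesday as day 1 is arbitrary, and the function names are my own.

```python
from fractions import Fraction
from itertools import product

TUESDAY = 1  # arbitrary label for Tuesday among days 0..6

children = list(product("BG", range(7)))       # 14 (sex, day) types
families = list(product(children, repeat=2))   # 196 (child #1, child #2) families

def has_tuesday_boy(family):
    return any(child == ("B", TUESDAY) for child in family)

def both_boys(family):
    return all(sex == "B" for sex, _day in family)

# Condition on "at least one boy born on a Tuesday", then count two-boy families.
conditioned = [f for f in families if has_tuesday_boy(f)]
favourable = [f for f in conditioned if both_boys(f)]

answer = Fraction(len(favourable), len(conditioned))
print(len(favourable), len(conditioned), answer)
```

Of the 196 families, 27 have at least one Tuesday boy, and 13 of those have two boys.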


93. Probability: Independence
Informally, two events A and B are independent if the probability that both occur is
simply the product of the probabilities that each occurs. Independence is thus analogous
to the MP from counting. Formally:

Definition 193. Two events A, B ∈ Σ are independent if

P(A ∩ B) = P(A)P(B).

There is a second, equivalent perspective of independence. Informally, two events A and B

are independent if the probability that A occurs is independent of whether B has occurred.

Fact 173. Suppose P(B) ≠ 0. Then A, B are independent events ⇐⇒ P(A∣B) = P(A).

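Proof. By definition of conditional probabilities, P(A∣B) = P(A ∩ B)/P(B) (call this equality 1). If A and B
are independent, then by definition of independence, P(A ∩ B) = P(A)P(B) (equality 2). Plugging equality 2
into equality 1, we have P(A∣B) = P(A), as desired. Conversely, if P(A∣B) = P(A), then equality 1 gives
P(A ∩ B) = P(A∣B)P(B) = P(A)P(B), so that A and B are independent.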



Example 1132. Flip two fair coins. Model this with the usual experiment, where
• S = {HH, HT, T H, T T },
• Σ contains 2⁴ = 16 elements, and
• P ({HH}) = P ({HT }) = P ({T H}) = P ({T T }) = 1/4.
Let H1 be the event that the first coin flip is Heads — that is, H1 = {HH, HT }. Analog-
ously define T1 , H2 , and T2 .
The intuitive idea of independence is easy to grasp. If we say that the two coin flips are
independent, what we mean is that the following four conditions are true:
1. H1 and H2 are independent. (The probability that the second flip is heads is independent
of whether the first flip is heads.)
2. H1 and T2 are independent. (The probability that the second flip is tails is independent
of whether the first flip is heads.)
3. T1 and H2 are independent. (The probability that the second flip is heads is independent
of whether the first flip is tails.)
4. T1 and T2 are independent. (The probability that the second flip is tails is independent
of whether the first flip is tails.)
We can verify each of these four conditions in turn:
1. P (H1 ∩ H2 ) = P({HH}) = 0.25 and P (H1 ) P (H2 ) = P({HH, HT })P({HH, T H}) = 0.5 ⋅ 0.5 = 0.25.
2. P (H1 ∩ T2 ) = P({HT }) = 0.25 and P (H1 ) P (T2 ) = P({HH, HT })P({HT, T T }) = 0.5 ⋅ 0.5 = 0.25.
3. P (T1 ∩ H2 ) = P({T H}) = 0.25 and P (T1 ) P (H2 ) = P({T H, T T })P({HH, T H}) = 0.5 ⋅ 0.5 = 0.25.
4. P (T1 ∩ T2 ) = P({T T }) = 0.25 and P (T1 ) P (T2 ) = P({T H, T T })P({HT, T T }) = 0.5 ⋅ 0.5 = 0.25.
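The four checks above are mechanical, so a computer can do them for us. Here is a minimal sketch in which events are Python sets of outcomes and P simply counts outcomes (valid only because the four outcomes are equally likely). The helper names `P` and `independent` are my own.

```python
from fractions import Fraction
from itertools import product

# Sample space for two coin flips: HH, HT, TH, TT (assumed equally likely).
S = ["".join(flips) for flips in product("HT", repeat=2)]

def P(event):
    """Probability of an event (a set of outcomes) when outcomes are equally likely."""
    return Fraction(len(event), len(S))

H1 = {s for s in S if s[0] == "H"}  # first flip is heads
T1 = {s for s in S if s[0] == "T"}  # first flip is tails
H2 = {s for s in S if s[1] == "H"}  # second flip is heads
T2 = {s for s in S if s[1] == "T"}  # second flip is tails

def independent(A, B):
    """Definition 193: A and B are independent if P(A ∩ B) = P(A)P(B)."""
    return P(A & B) == P(A) * P(B)

pairs = {"H1,H2": (H1, H2), "H1,T2": (H1, T2), "T1,H2": (T1, H2), "T1,T2": (T1, T2)}
results = {name: independent(A, B) for name, (A, B) in pairs.items()}
print(results)
```

All four pairs pass the test, in agreement with the hand calculation.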

Example 1133. Flip a fair coin and roll a fair die. This can be modelled by an experi-
ment, where

• S = {H1, H2, H3, H4, H5, H6, T 1, T 2, T 3, T 4, T 5, T 6} .

• Σ consists of 2¹² events.
• P(A) = ∣A∣/12, for any event A ∈ Σ.
Now consider the event “Heads” E1 = {H1, H2, H3, H4, H5, H6}, and the event “Roll an
odd number” E2 = {H1, H3, H5, T1, T3, T5}. These two events E1 and E2 are independent,
as we now verify:

\[ P(E_1 \mid E_2) = \frac{P(E_1 \cap E_2)}{P(E_2)} = \frac{3/12}{6/12} = \frac{1}{2} = P(E_1). \]

More broadly, we can even say that the coin flip and die roll are independent. Informally,
this means that the outcome of the coin flip has no influence on the outcome of the die
roll, and vice versa.

The idea of independence is a little tricky to illustrate on a Venn diagram. I’ll try anyway.


Example 1134. The Venn diagram below illustrates a sample space with 100 equally
likely outcomes (represented by 100 small squares). The event A is highlighted in red.
The event B is highlighted in blue.
P(A) = 0.2 (A is made of 20 small squares). P(B) = 0.1 (B is made of 10 small squares).
The event A ∩ B, coloured in green, is made of 2 small squares, so P(A ∩ B) = 0.02.
We compute
\[ P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{0.02}{0.1} = 0.2. \]

We observe that P(A) = 0.2 = P(A∣B). And so by Fact 173, we conclude that the events
A and B are independent.

Exercise 400. Symmetry of Independence. In Fact 173, we showed that “A, B

independent ⇐⇒ P(A∣B) = P(A)”. Now prove that “A, B are independent events ⇐⇒
P(B∣A) = P(B).” (Answer on p. 1576.)
Exercise 401. (Answer on p. 1576.) An example of a transitive relation is equality:
If A = B and B = C, then A = C. Another example is ≤: If A ≤ B and B ≤ C, then A ≤ C.
In contrast, independence is not transitive, as this exercise will demonstrate. That
is, even if A and B are independent, and B and C are independent, it may not be that
A and C are also independent.
Flip two fair coins. Let H1 be the event that the first coin flip is heads, H2 be the event
that the second is heads, and T1 be the event that the first flip is tails. Show that:
(a) H1 and H2 are independent.
(b) H2 and T1 are independent.
(c) H1 and T1 are not independent.


93.1. Warning: Not Everything is Independent
The idea of independence is intuitively easy to grasp. Indeed, so much so that students
often assume that “everything is independent”. This is a mistake. Unless you’re explicitly
told, NEVER assume that two events are independent.
Here are two examples where the assumption of independence is plausible:

Example 1135. The event “coin-flip #1 is heads” and the event “coin-flip #2 is heads”
are probably independent.

Example 1136. The event “die-roll #1 is 3” and the event “die-roll #2 is 6” are probably
independent.

Here are two examples where the assumption of independence is not plausible:

Example 1137. The event “Google’s share price rises today” is probably not independent
of the event “Apple’s share price rises today”.

Example 1138. The event “it rains in Singapore today” is probably not independent of
the event “it rains in Kuala Lumpur today”.

Nonetheless, the assumption of independence is frequently — and incorrectly — made even

when it is implausible. One reason is that the maths is easy if we assume independence —
we can simply multiply probabilities together.
We now revisit the Sally Clark case. Previously, we saw that the court’s “expert” witness
committed the CPF. Now, we’ll see that he also made a second mistake: that of assuming
independence.

Example 1139. The “expert” witness claimed that in an affluent, non-smoking family
such as Sally Clark’s, the probability of an infant suddenly dying with no explanation
was 1/8543. Hence, he concluded, the probability of two sudden infant deaths in the
same family was (1/8543)² or approximately 1 in 73 million.

Can you spot the error in the reasoning?

By simply multiplying together probabilities, the “expert” implicitly assumed that the two
events — “sudden death of baby #1” and “sudden death of baby #2” — are independent.
But as any doctor will tell you, if your family has a history of heart attack, diabetes, or
pretty much any other ailment, then you may be at higher risk (than the average person)
of suffering the same.
And so, it may well be that in any given year, a random person has probability 0.001
of dying of a heart attack. It does not however follow that in any given year, a random
family has probability 0.001² = 0.000001 of two deaths by heart attack.
Similarly, it may be that if one baby in a family has already suddenly died, a second baby
is at higher risk (than the average baby) of suddenly dying.


Exercise 402. (Answer on p. 1576.) Say the probability that a randomly-chosen person
is or was an NBA player is one in a million. (This is probably about right, since there’ve
only ever been 4,000 or so NBA players, since the late 1940s.)
The Barry family had four players in the NBA — the father Rick Barry and three of his
four sons Jon, Brent, and Drew. (The oldest son Scooter didn’t make the NBA but was
still good enough to play professionally in other basketball leagues around the world.)
A journalist concludes that the probability of a Barry family ever occurring is

\[ \left( \frac{1}{1{,}000{,}000} \right)^4 = \frac{1}{1{,}000{,}000{,}000{,}000{,}000{,}000{,}000{,}000}. \]
This is equal to the probability of buying a 4D number on six consecutive weeks, and
winning first prize every time. Is the journalist correct?


93.2. Probability: Independence of Multiple Events

Definition 194. Let P be a probability function and A, B, C ∈ Σ be events.

A, B, C are pairwise independent if all three of the following conditions are true:

P(A ∩ B) = P(A)P(B),
P(B ∩ C) = P(B)P(C),
P(A ∩ C) = P(A)P(C).

A, B, C are independent if in addition to the above three conditions being true, it is also
true that

P(A ∩ B ∩ C) = P(A)P(B)P(C).
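To make Definition 194 concrete, here is a small sketch that checks both sets of conditions for events given as sets of equally likely outcomes. The example used (three fair coin flips, with the events “flip i is heads”) is my own and is not from the text; the function names are likewise illustrative.

```python
from fractions import Fraction
from itertools import combinations, product

# Three fair coin flips: 8 equally likely outcomes such as "HTH".
S = ["".join(flips) for flips in product("HT", repeat=3)]

def P(event):
    return Fraction(len(event), len(S))

def pairwise_independent(A, B, C):
    """The first three conditions of Definition 194."""
    return all(P(E & F) == P(E) * P(F) for E, F in combinations([A, B, C], 2))

def independent(A, B, C):
    """All four conditions of Definition 194."""
    return pairwise_independent(A, B, C) and P(A & B & C) == P(A) * P(B) * P(C)

# Events "flip 1 is heads", "flip 2 is heads", "flip 3 is heads".
H1, H2, H3 = ({s for s in S if s[i] == "H"} for i in range(3))

print(pairwise_independent(H1, H2, H3), independent(H1, H2, H3))
```

For these three events, all four conditions hold. Note that the code has to test the fourth condition separately: it is not implied by the first three.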

It is tempting to believe that pairwise independence implies independence. That is, if the
first three conditions listed above are true, then so is the fourth. Alas, this is false, as the
next exercise demonstrates:

Exercise 403. (Pairwise independence does not imply independence.) (Answer
on p. 1576.)
Flip two fair coins. Let H1 be the event that the first coin flip is heads, T2 be the event
that the second is tails, and X be the event that the two coin flips are different. Show
that:
(a) These three events are pairwise independent.
(b) These three events are not independent.


94. Fun Probability Puzzles

94.1. The Monty Hall Problem

The Monty Hall Problem is probably the world’s most famous probability puzzle. It takes
less than a minute to state. Yet its counter-intuitive answer confuses nearly everyone.
You’re at a gameshow. There are three boxes, labelled #1, #2, and #3. One box contains
one year’s worth of a Singapore minister’s salary. The other two are empty.
You are asked to pick one box (but you are not allowed to open it yet).
The host, who knows where the minister’s salary is, opens one of the other two boxes, to
reveal that it is empty. Important: The host is not allowed to open the box that contains
the minister’s salary; he must always open a box that is empty.
You’re now given a choice: Stay (with your original choice) or switch (to the other unopened
box). What should you do?
To illustrate:

Example 1140. Say you pick Box #2. The host then opens an empty Box #1. You’re
now given a choice: Stay (with Box #2) or switch (to Box #3). Which do you choose?

[Figure: three boxes. Box #1: opened by the host, empty. Box #2: your original choice. Box #3: should you switch?]

Take as long as you need to think about this problem, before turning to the
next page for the answer.


A magazine columnist named Marilyn vos Savant³⁵³ gave the correct answer:

Yes; you should switch. The first door has a 1/3 chance
of winning, but the second door has a 2/3 chance.

Here are two informal explanations:

1. The probability that the minister’s salary is in the box you picked is 1/3. The probability
that the minister’s salary is in either of the other two boxes is 2/3. Of the other two boxes,
the gameshow host (who knows where the salary is) helps you eliminate one of them. So
the remaining unopened box still has probability 2/3 of containing the minister’s salary.
2. Imagine instead that there are 100 boxes, of which one contains the minister’s salary
and the others are empty. You pick one. Of the remaining 99, the gameshow host opens
98. You are again given the choice: Should you stay or switch? In this more extreme
version of the game, it is perhaps more obvious that your originally-picked box has only
probability 1/100 of containing the minister’s salary, while the only other unopened box
has probability 99/100 of the same. Therefore, you should switch.
Here’s a more formal explanation using the method of enumeration:
3. Say you originally pick Box #1. There are three possible cases, each occurring with
probability 1/3:

Case   Box #1              Box #2              Box #3              Host opens

A      Minister’s salary   Empty               Empty               Box #2 or Box #3
B      Empty               Minister’s salary   Empty               Box #3
C      Empty               Empty               Minister’s salary   Box #2

Not switching wins you the minister’s salary only in Case A (1/3 probability).
Switching wins you the minister’s salary in Cases B and C (2/3 probability).

³⁵³ Marilyn vos Savant was, briefly, in the Guinness Book of Records as the person with the world’s highest
IQ, until Guinness retired this category because IQ tests were considered to be too unreliable.
Even with the above explanations, some of you may remain unconvinced. Don’t worry, you
are not alone. After Marilyn’s initial response, 10,000 readers sent in letters telling her she
was wrong. Some were from Professors of Mathematics and PhDs. A few examples:³⁵⁴

As a professional mathematician, I’m very concerned with the general
public’s lack of mathematical skills. Please help by confessing your error
and in the future being more careful.

There is enough mathematical illiteracy in this country, and we don’t need
the world’s highest IQ propagating more. Shame!

Maybe women look at math problems differently than men.

Unfortunately for the above letter writers, Marilyn was correct and they were wrong.
The best way to convince the sceptical is through simulations — try this Google spreadsheet.
Or if you don’t trust computers, do an actual experiment:

Class Activity

Form pairs. One person is the gameshow host and the other is the contestant. The host
decides where the prize is (Box #1, #2, or #3). The contestant then picks a box. The
host then tells the contestant which one of the other two boxes is empty. The contestant
then decides whether to stay or switch.
Repeat as many times as you have time for. Record the proportion of times that the
contestant should have switched. You should find that this proportion is about 2/3.
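If you'd rather let a computer play the game many times, the following is a minimal simulation sketch. It assumes the prize and the contestant's first pick are each chosen uniformly at random, and that the host picks at random when he has two empty boxes to choose from. The function names `play` and `win_rate`, the number of trials, and the seed are all my own choices.

```python
import random

def play(switch, rng):
    """Play one round; return True if the contestant ends up with the prize."""
    boxes = [1, 2, 3]
    prize = rng.choice(boxes)
    pick = rng.choice(boxes)
    # The host opens an empty box that is not the contestant's pick.
    opened = rng.choice([b for b in boxes if b != pick and b != prize])
    if switch:
        # Switch to the only box that is neither the original pick nor opened.
        pick = next(b for b in boxes if b != pick and b != opened)
    return pick == prize

def win_rate(switch, trials=100_000, seed=0):
    """Estimate the probability of winning under the given strategy."""
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(trials)) / trials

print("stay:  ", win_rate(switch=False))
print("switch:", win_rate(switch=True))
```

The estimates should come out near 1/3 for staying and near 2/3 for switching.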

³⁵⁴ You can read more of these letters at her website.
94.2. The Birthday Problem

Example 1141. (The birthday problem.) What is the smallest number n of people
in a room, such that it is more likely than not, that at least 2 people in the room share
the same birthday?³⁵⁵
Fix person #1’s birthday. Then
• The probability that person #2’s birthday is different (from person #1) is 364/365.
• The probability that person #3’s birthday is different (from persons #1 and #2) is
363/365.
• The probability that person #4’s birthday is different (from persons #1, #2, and #3)
is 362/365.
• ... ...
• The probability that person #n’s birthday is different (from persons #1 through #n−1)
is (366 − n)/365.
Altogether, the probability that no 2 persons share the same birthday is

\[ \frac{364}{365} \times \frac{363}{365} \times \frac{362}{365} \times \dots \times \frac{366 - n}{365}. \]

Hence, the probability that at least 2 persons share the same birthday is

\[ 1 - \frac{364}{365} \times \frac{363}{365} \times \frac{362}{365} \times \dots \times \frac{366 - n}{365}. \]
The smallest integer n for which the above probability is at least 0.5 is 23. That is,
perhaps surprisingly, with just 23 people, it is more likely than not that at least 2 persons
share a birthday.
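The claim that n = 23 is the smallest such number is easy to check numerically. The sketch below evaluates the product above directly. It assumes 365 equally likely birthdays (no leap years) and independent birthdays; the function name `p_shared` is my own.

```python
def p_shared(n, days=365):
    """Probability that at least 2 of n people share a birthday."""
    p_all_different = 1.0
    for k in range(n):
        # Person k+1 must avoid the k birthdays already taken.
        p_all_different *= (days - k) / days
    return 1 - p_all_different

# Smallest n for which a shared birthday is more likely than not.
smallest = next(n for n in range(1, 366) if p_shared(n) > 0.5)
print(smallest, p_shared(22), p_shared(23))
```

With 22 people the probability is still below 0.5; with 23 it is just above.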


95. Random Variables: Introduction
Informally, a random variable X is a function that assigns a real number X(s) (you can think of
this as a “numerical code”) to each possible outcome s. We call any such real number an
observed value of X.

Example 1142. Model a fair coin-flip with the usual experiment E = (S, Σ, P), where

• S = {H, T }.
• Σ = {∅, {H} , {T } , S}.
• P ∶ Σ → R is defined by P (∅) = 0, P ({H}) = P ({T }) = 0.5, and P(S) = 1.
Let X ∶ S → R be the random variable that indicates whether the coin-flip is heads.
That is, the observed value of X is X(H) = 1 if the outcome is heads and X(T ) = 0 if
the outcome is tails.


Definition 195. Let E = (S, Σ, P) be an experiment. A random variable X (on the

experiment E) is any function with domain S and codomain R.
Given any random variable X and any outcome s ∈ S, we call X(s) the observed (or
realised) value of the random variable X. We often denote a generic observed value X(s)
by the lower-case letter x.


95.1. A Random Variable vs. Its Observed Values
Students often confuse a random variable with an observed value of the random variable.
This confusion is, of course, simply the confusion between a function and the value taken
by the function.
Example 1142 (continued from above). X is a function with domain S and codomain
R. X is therefore a random variable.
If the outcome of the coin-flip is heads, we do not say that X is 1. Instead, we say that
the observed value of X is 1.
If the outcome of the coin-flip is tails, we do not say that X is 0. Instead, we say that
the observed value of X is 0.

Remember: A random variable X is a function that can take on many possible real
number values. Each such value x = X(s) is called an observed value of X.


95.2. X = k Denotes the Event {s ∈ S ∶ X(s) = k}

Definition 196. Given a random variable X ∶ S → R, the notation “X = k” denotes the

event {s ∈ S ∶ X(s) = k}.

The notations “X ≥ k”, “X > k”, “X ≤ k”, “X < k”, “a ≤ X ≤ b”, etc. are similarly defined.
Example 1142 (continued from above). X(H) = 1 and X(T ) = 0. So we can write:

X = 1 denotes the event {s ∈ S ∶ X(s) = 1} = {H} ,

X = 0 denotes the event {s ∈ S ∶ X(s) = 0} = {T } .

Moreover, P ({H}) = 0.5 and P ({T }) = 0.5. So we can also write:

P(X = 1) = 0.5 and P(X = 0) = 0.5.

Now let’s try some other arbitrary number like 13.71. Notice there is no outcome s such
that X(s) = 13.71. Thus:

X = 13.71 denotes the event {s ∈ S ∶ X(s) = 13.71} = ∅, and P(X = 13.71) = 0.

Indeed, for any k ≠ 0, 1, there is no outcome s such that X(s) = k. Thus:

X = k denotes the event {s ∈ S ∶ X(s) = k} = ∅, and P(X = k) = 0.

Since P (∅) = 0, we also have P(X = k) = P (∅) = 0, for any k ≠ 0, 1.

Define Y ∶ S → R by Y (H) = 15.5, Y (T ) = 15.5. Y is an example of a constant random
variable. We may write:

Y = 15.5 denotes the event {s ∈ S ∶ Y (s) = 15.5} = {H, T } , and P(Y = 15.5) = 1.

Moreover, for any k ≠ 15.5,

Y = k denotes the event {s ∈ S ∶ Y (s) = k} = ∅, and P(Y = k) = 0.
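Since a random variable is just a function and “X = k” is just a set of outcomes, both translate directly into code. In the sketch below, the helper names `event` and `P` are my own, and P counts outcomes (which is valid here because H and T are equally likely).

```python
from fractions import Fraction

S = {"H", "T"}  # sample space of a fair coin-flip

def X(s):
    """Indicator of heads: X(H) = 1, X(T) = 0."""
    return 1 if s == "H" else 0

def Y(s):
    """A constant random variable: Y(s) = 15.5 for every outcome s."""
    return 15.5

def event(rv, k):
    """The event "rv = k", i.e. the set {s in S : rv(s) = k}."""
    return {s for s in S if rv(s) == k}

def P(E):
    """Probability of an event when the outcomes are equally likely."""
    return Fraction(len(E), len(S))

print(event(X, 1), P(event(X, 1)))          # the event {H}
print(event(X, 13.71), P(event(X, 13.71)))  # the empty event
print(event(Y, 15.5), P(event(Y, 15.5)))    # the whole sample space
```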


95.3. The Probability Distribution of a Random Variable
We call a complete specification of P (X = k) for all values of k the probability distribu-
tion (or probability law or probability mass function) of X. In the above example,
we gave the probability distributions of both X and Y .
More examples of random variables and their probability distributions:

Example 1143. Flip two fair coins. Model this with the usual experiment, where S =
{HH, HT, T H, T T }.
Let X ∶ S → R indicate whether the two coin flips are the same and Y ∶ S → R count the
number of heads. That is,

X(HH) = 1, X(HT ) = 0, X(T H) = 0, X(T T ) = 1,

Y (HH) = 2, Y (HT ) = 1, Y (T H) = 1, Y (T T ) = 0.


P(X = 0) = 0.5, P(X = 1) = 0.5, and P(X = k) = 0, for any k ≠ 0, 1.

P(Y = 0) = 0.25, P(Y = 1) = 0.5, P(Y = 2) = 0.25, and P(Y = k) = 0, for any k ≠ 0, 1, 2.
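The probability distributions in this example can be tabulated by running through the sample space and counting. A minimal sketch, assuming the four outcomes are equally likely (the helper name `pmf` is my own):

```python
from collections import Counter
from fractions import Fraction

S = ["HH", "HT", "TH", "TT"]  # two fair coin flips

def X(s):
    """1 if the two flips are the same, 0 otherwise."""
    return 1 if s[0] == s[1] else 0

def Y(s):
    """Number of heads."""
    return s.count("H")

def pmf(rv):
    """Map each value k in the range of rv to P(rv = k)."""
    counts = Counter(rv(s) for s in S)
    return {k: Fraction(c, len(S)) for k, c in counts.items()}

print(pmf(X))
print(pmf(Y))
```

Any k that does not appear as a key has P(X = k) = 0 (or P(Y = k) = 0), exactly as in the text.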

Another example:


Example 1144. Pick a random card from the standard 52-card deck. Model this with
the usual experiment, where

S = {A♠, K♠, . . . , 2♠, A♥, K♥, . . . , 2♥, A♦, K♦, . . . , 2♦, A♣, K♣, . . . , 2♣} .

X ∶ S → R is the High Card Point count (used in the game of bridge). I.e.,

X(A of any suit) = 4, X(K of any suit) = 3, X(Q of any suit) = 2,

X(J of any suit) = 1, X(Any other card) = 0.

\[ P(X = 0) = \frac{36}{52}, \quad P(X = 1) = \frac{4}{52}, \quad P(X = 2) = \frac{4}{52}, \quad P(X = 3) = \frac{4}{52}, \quad P(X = 4) = \frac{4}{52}, \]

and P(X = k) = 0, for any k ≠ 0, 1, 2, 3, 4.
Y ∶ S → R indicates whether the picked card is a spade (♠). I.e.,

Y (Any ♠) = 1, Y (Any other card) = 0.

\[ P(Y = 0) = \frac{39}{52}, \quad P(Y = 1) = \frac{13}{52}, \quad P(Y = k) = 0 \text{ for any } k \neq 0, 1. \]


Example 1145. Roll two fair dice. Model this with the usual experiment, where

S = {⚀⚀, ⚀⚁, . . . , ⚀⚅, ⚁⚀, ⚁⚁, . . . , ⚁⚅, . . . , ⚅⚀, ⚅⚁, . . . , ⚅⚅} .


X ∶ S → R is the sum of the two dice. And so for example,

X(⚁⚄) = 7 and X(⚀⚃) = 5.

The table below says that P (X = 2) = 1/36, because there is only one way the event X = 2
can occur. And P (X = 3) = 2/36, because there are two ways the event X = 3 can occur.
You are asked to complete the table in the next exercise.

k      s such that X(s) = k      P (X = k)

2      ⚀⚀                        1/36
3      ⚀⚁, ⚁⚀                    2/36
⋮      ⋮                         ⋮

Exercise 404. (Continuation of the above example.) (Answer on p. 1577.) (a) Complete
the above table.
Consider the event E, described in words as “the sum of the two dice is at least 10”.
(b) Write down the event E in terms of X.
(c) Calculate P(E).


95.4. Random Variables Are Simply Functions

Example 1145 (continued from above). Continue with the same roll-two-fair-
dice example, with X again being the random variable that is the sum of the two dice.
We had

X(⚁⚄) = 7 and X(⚀⚃) = 5.

Let Y ∶ S → R be the product of the two dice. And so for example,

Y(⚁⚄) = 10 and Y(⚀⚃) = 4.

Remember: random variables are simply functions. And thus, we can manipulate random
variables just like we manipulate any functions.
So for example, consider the function X + Y ∶ S → R. It is also a random variable. We
have

(X + Y)(⚁⚄) = 17 and (X + Y)(⚀⚃) = 9.

Similarly, consider the function XY ∶ S → R. It is also a random variable. We have

(XY)(⚁⚄) = 70 and (XY)(⚀⚃) = 20.

Finally, consider the function 4X − 5Y ∶ S → R. It is also a random variable. We have

(4X − 5Y)(⚁⚄) = −22 and (4X − 5Y)(⚀⚃) = 0.
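The point that random variables are simply functions can be made quite literally in code. In the sketch below an outcome is written as a pair such as (2, 5). The two outcomes used, (2, 5) and (1, 4), are chosen because they reproduce the values quoted in the example (sum 7 with product 10, and sum 5 with product 4). The names `X_plus_Y`, `XY` and `lin` are my own.

```python
def X(s):
    """Sum of the two dice."""
    return s[0] + s[1]

def Y(s):
    """Product of the two dice."""
    return s[0] * s[1]

# New random variables built from X and Y are again just functions on S.
def X_plus_Y(s):
    return X(s) + Y(s)

def XY(s):
    return X(s) * Y(s)

def lin(s):
    """The random variable 4X - 5Y."""
    return 4 * X(s) - 5 * Y(s)

for s in [(2, 5), (1, 4)]:
    print(s, X(s), Y(s), X_plus_Y(s), XY(s), lin(s))
```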


Exercise 405. Continue with the above roll-two-fair-dice example. Let P ∶ S → R be the
greater of the two dice. Let Q ∶ S → R be the difference of the two dice. Evaluate the
functions P , Q, and P Q at and . (Answer on p. 1578.)

Exercise 406. (Answer on p. 1578.) Model a fair die-roll with the usual experiment
E = (S, Σ, P). Define the function X ∶ S → R by the mapping rule X(1) = 1, X(2) = 2,
X(3) = 3, X(4) = 4, X(5) = 5, and X(6) = 6.
Is X a random variable on E? Why or why not?
If X is indeed a random variable on E, then write down also P(X = k), for all possible k.
Exercise 407. For each of the following real-world scenarios, write down, in precise
mathematical notation (i) the experiment E = (S, Σ, P); (ii) what the random variable
X is; and (iii) P(X = k), for all possible k. (Answers on pp. 1578 and 1579.)
(a) Flip 4 (fair) coins. Let the random variable X be a count of the number of heads.
(b) Roll 3 (fair) dice. Let the random variable X be the sum of the three dice. (Tedious.)


96. Random Variables: Independence

Definition 197. Given random variables X ∶ S → R and Y ∶ S → R, the notation

“X = x, Y = y” denotes the event {s ∈ S ∶ X(s) = x, Y (s) = y}.

Example 1146. Flip two fair coins. Model this with the usual experiment where S =
{HH, HT, T H, T T }.
Let X ∶ S → R indicate whether the two coin flips were the same and Y ∶ S → R count
the number of heads. That is,

X(HH) = 1, X(HT ) = 0, X(T H) = 0, X(T T ) = 1,

and Y (HH) = 2, Y (HT ) = 1, Y (T H) = 1, Y (T T ) = 0.

Then X = 0, Y = 0 is the event that the two coin flips were not the same AND the number
of heads was 0. By observation, this event is the empty set. Thus, P (X = 0, Y = 0) =
P (∅) = 0.
X = 1, Y = 0 is the event that the two coin flips were the same AND the number of heads
was 0. By observation, this event is {T T }. Thus, P (X = 1, Y = 0) = P ({T T }) = 0.25.
Exercise: Verify for yourself that

P (X = 0, Y = 1) = 0.5, P (X = 1, Y = 1) = 0,

P (X = 0, Y = 2) = 0, P (X = 1, Y = 2) = 0.25.


Informally, two random variables are independent if knowing the value of one does not
tell us anything about the value of the other.
Example 1146 (continued from above). Flip two fair coins. We say the two coin-flips
are independent. Informally, the outcome of one doesn’t affect the other. Knowing that
the first coin-flip is heads tells us nothing about the second coin-flip.
A little more formally, let A and B be the random variables indicating whether the first
and second coin-flip are heads (respectively). That is, A = 1 if the first coin-flip is heads
and A = 0 otherwise; and B = 1 if the second coin-flip is heads and B = 0 otherwise. Then
the informal statement “the two coin-flips are independent” may be translated into the
formal statement “the random variables A and B are independent”.
Informally, knowing the observed value of A tells us nothing about whether B = 0 or
B = 1. (And vice versa.)


Definition 198. Given random variables X ∶ S → R and Y ∶ S → R, we say that X and

Y are independent if for all x, y,

P (X = x, Y = y) = P(X = x)P(Y = y).

Let’s restate the above definition more explicitly. Suppose X can take on values x1 , x2 , . . . , xn
and Y can take on values y1 , y2 , . . . , ym . Then to say that X and Y are independent is to
say that all of the following n × m pairs of events are independent:

X = x1, Y = y1,     X = x1, Y = y2,     . . .     X = x1, Y = ym,
X = x2, Y = y1,     X = x2, Y = y2,     . . .     X = x2, Y = ym,
⋮                   ⋮                   . . .     ⋮
X = xn, Y = y1,     X = xn, Y = y2,     . . .     X = xn, Y = ym.

Independence between two random variables is thus equivalent to independence between
many pairs of events.


Example 1146 (continued from above). We now verify, in more formal and precise
language, that “the two coin-flips are indeed independent”.
Again, A and B are the random variables indicating whether the first and second coin-flips
are heads (respectively).
We now verify that indeed, P (A = a, B = b) = P(A = a)P(B = b) for all possible values of
a and b:

                P (A = a, B = b)      P(A = a)P(B = b)
a = 0, b = 0    P ({T T }) = 0.25     P ({T H, T T }) P ({HT, T T }) = 0.5 × 0.5 = 0.25  ✓
a = 1, b = 0    P ({HT }) = 0.25      P ({HH, HT }) P ({HT, T T }) = 0.5 × 0.5 = 0.25  ✓
a = 0, b = 1    P ({T H}) = 0.25      P ({T H, T T }) P ({HH, T H}) = 0.5 × 0.5 = 0.25  ✓
a = 1, b = 1    P ({HH}) = 0.25       P ({HH, HT }) P ({HH, T H}) = 0.5 × 0.5 = 0.25  ✓
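The table above checks Definition 198 by hand. The same check can be sketched in Python, again assuming four equally likely outcomes. It is enough to test the values that each random variable actually takes, because for any other value both sides of the equation are 0. The helper names are my own.

```python
from fractions import Fraction

S = ["HH", "HT", "TH", "TT"]  # two fair coin flips

def P(event):
    return Fraction(len(event), len(S))

def A(s):
    """1 if the first flip is heads, 0 otherwise."""
    return 1 if s[0] == "H" else 0

def B(s):
    """1 if the second flip is heads, 0 otherwise."""
    return 1 if s[1] == "H" else 0

def prob(X, x):
    """P(X = x)."""
    return P({s for s in S if X(s) == x})

def joint(X, x, Y, y):
    """P(X = x, Y = y)."""
    return P({s for s in S if X(s) == x and Y(s) == y})

def independent(X, Y):
    """Definition 198, checked over every pair of values in the two ranges."""
    xs = {X(s) for s in S}
    ys = {Y(s) for s in S}
    return all(joint(X, x, Y, y) == prob(X, x) * prob(Y, y) for x in xs for y in ys)

print(independent(A, B))
```

As a sanity check, `independent(A, A)` returns False: a random variable that is not constant is never independent of itself.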

Exercise 408. Flip two fair coins. Let X ∶ S → R indicate whether the two coin flips
were the same and Y ∶ S → R count the number of heads. Are X and Y independent
random variables? (Answer on p. 1581.)

Earlier we warned against blithely assuming that any two events are independent. Here we
can repeat this warning: Unless explicitly told (or you have a good reason), do not assume
that two random variables are independent.
The assumption of independence is a strong one. There are many scenarios where it is
plausible. For example, the flips of two coins are probably independent. The rolls of two
dice are probably independent.
There are, however, also many scenarios where it is not plausible. Today’s changes in
the share prices of Google and Apple are probably not independent. Today’s rainfall in
Singapore and in Kuala Lumpur are probably not independent.
Nonetheless, the assumption of independence is frequently — and incorrectly — made even
when it is implausible. The reason is that the maths is easy if we assume independence —
we can simply multiply probabilities together. Unfortunately, incorrectly assuming inde-
pendence has sometimes had tragic consequences, as we saw in the Sally Clark case.


97. Random Variables: Expectation

Example 1147. Let X be the outcome of a fair die roll.

What is the expected value (or the mean) of X? In other words, on average, what’s
the expected outcome of a fair die roll?
Note that X takes on a value 1 with probability 1/6. Similarly, it takes on a value 2 with
probability 1/6. Etc. Hence, the expected value of X, denoted E [X] is given by:
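\[ E[X] = \frac{1}{6} \cdot 1 + \frac{1}{6} \cdot 2 + \frac{1}{6} \cdot 3 + \frac{1}{6} \cdot 4 + \frac{1}{6} \cdot 5 + \frac{1}{6} \cdot 6 = \frac{1 + 2 + 3 + 4 + 5 + 6}{6} = \frac{21}{6} = 3.5. \]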
E [X] is thus simply a weighted average of the possible values of X, where the weights
are the probability weights.

We’ll use the following slightly-incorrect definition of a discrete random variable:³⁵⁶

Slightly-Incorrect Definition. A random variable is discrete if its range is finite.

That is, a random variable is discrete if it takes on finitely many possible values.
We can now formally define the expected value of a discrete random variable:

Definition 199. Let E = (S, Σ, P) be an experiment. Then the corresponding expectation

operator, denoted E, is the function that maps any discrete random variable X ∶ S → R
to a real number, according to the mapping rule

\[ E[X] = \sum_{k \in \operatorname{Range}(X)} P(X = k) \cdot k. \]

We call E [X] the expected value (or mean) of X. We often write µX = E [X] or even
µ = E [X] (if it is clear from the context that we’re talking about the mean of X).

Example 1148. Let X be the outcome of a fair die roll. The range of X is Range(X) =
{1, 2, 3, 4, 5, 6}. So

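\[
\begin{aligned}
E[X] &= \sum_{k \in \operatorname{Range}(X)} P(X = k) \cdot k \\
&= P(X = 1) \cdot 1 + P(X = 2) \cdot 2 + P(X = 3) \cdot 3 + P(X = 4) \cdot 4 + P(X = 5) \cdot 5 + P(X = 6) \cdot 6 \\
&= \frac{1}{6} \cdot 1 + \frac{1}{6} \cdot 2 + \frac{1}{6} \cdot 3 + \frac{1}{6} \cdot 4 + \frac{1}{6} \cdot 5 + \frac{1}{6} \cdot 6 = 3.5.
\end{aligned}
\]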
6 6 6 6 6 6

³⁵⁶ The correct definition is this: A random variable is discrete if its range is finite or countably-infinite.
I avoid giving this correct definition because this would require explaining what “countably-infinite” means.
Example 1149. Let Y be the sum of two fair die-rolls.
The range of Y is Range(Y ) = {2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12}. In Exercise 404, we worked
out that P (Y = 2) = 1/36, P (Y = 3) = 2/36, etc. Thus:

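\[
\begin{aligned}
E[Y] &= \sum_{k \in \operatorname{Range}(Y)} P(Y = k) \cdot k \\
&= P(Y = 2) \cdot 2 + P(Y = 3) \cdot 3 + P(Y = 4) \cdot 4 + P(Y = 5) \cdot 5 + \dots + P(Y = 12) \cdot 12 \\
&= \frac{1}{36} \cdot 2 + \frac{2}{36} \cdot 3 + \frac{3}{36} \cdot 4 + \frac{4}{36} \cdot 5 + \frac{5}{36} \cdot 6 + \frac{6}{36} \cdot 7 + \frac{5}{36} \cdot 8 + \frac{4}{36} \cdot 9 + \frac{3}{36} \cdot 10 + \frac{2}{36} \cdot 11 + \frac{1}{36} \cdot 12 \\
&= \frac{2 + 6 + 12 + 20 + 30 + 42 + 40 + 36 + 30 + 22 + 12}{36} = \frac{252}{36} = 7.
\end{aligned}
\]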
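The weighted average in Definition 199 is short to write in code. The sketch below builds the probability distribution of the sum of fair dice by counting equally likely outcomes and then applies the definition. The function names `pmf_of_sum` and `expectation` are my own.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

def pmf_of_sum(n_dice):
    """Map each possible total k of n_dice fair dice to P(total = k)."""
    outcomes = list(product(range(1, 7), repeat=n_dice))
    counts = Counter(sum(outcome) for outcome in outcomes)
    return {k: Fraction(c, len(outcomes)) for k, c in counts.items()}

def expectation(pmf):
    """E[X] = sum over k in Range(X) of P(X = k) * k."""
    return sum(p * k for k, p in pmf.items())

print(expectation(pmf_of_sum(1)))  # one die
print(expectation(pmf_of_sum(2)))  # sum of two dice
```

This gives 7/2 for one die and 7 for the sum of two dice, matching the two examples above.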

Example 1150. Flip two fair coins and roll two fair dice. Let X be the number of
heads and Y be the number of sixes.
Problem: What is E[X + Y ]?
As it turns out, it is generally true that E[X + Y ] = E [X] + E [Y ] (as we’ll see in the next
section). So if we knew this, then the problem would be very easy:
1 4
E[X + Y ] = E [X] + E [Y ] = 1 + = .
3 3
But as an exercise, let’s pretend we don’t know that E[X + Y ] = E [X] + E [Y ]. We thus
have to work out E[X + Y ] the hard way:
First, note that Range(X + Y ) = {0, 1, 2, 3, 4}. P (X + Y = 0) is the probability of 0 heads
and 0 sixes. And P (X + Y = 1) is the probability of 1 head and 0 sixes OR 0 heads and
1 six. We can compute:
1 1 5 5 25
P (X + Y = 0) = ⋅ ⋅ ⋅ = ,
2 2 6 6 144

⎛ 2 ⎞ 1 1 5 5 1 1 ⎛ 2 ⎞ 5 1 50 10 60
P (X + Y = 1) = ⋅ ⋅ ⋅ + ⋅ = + = .
⎝ 1 ⎠ 2 2 6 6 2 2 ⎝ 1 ⎠ 6 6 144 144 72

You are asked to complete the rest of this problem in the exercise below.

Exercise 409. Complete the above example by following these steps: (a) Compute
P (X + Y = 2). (b) Compute P (X + Y = 3). (c) Compute P (X + Y = 4). (d) Now com-
pute E[X + Y ]. (Answer on p. 1581.)


97.1. The Expected Value of a Constant R.V. is Constant

Example 1151. Let 5 be a constant random variable on some experiment E = (S, Σ, P).
That is, 5 ∶ S → R is the function defined by s ↦ 5. (Note that the symbol 5 does double
duty by denoting both a function and a real number.) Then not surprisingly,

Function Number
↓ ↓
E [5] = 5 .

That is, on average, we expect the random variable 5 to take on the value 5.

We can easily prove the above observation:

Fact 174. If the constant random variable c maps every outcome to the number c, then
E[c] = c.

Proof. The PMF of the constant random variable c is given by P (c = c) = 1 and P (c = k) = 0

for any k ≠ c. Hence, E [c] = P (c = c) ⋅ c = 1 ⋅ c = c.


Exercise 410. In the game of 4D, you pay $1 to pick any four-digit number between
0000 and 9999 (there are thus 10, 000 possible choices). There are two variants of the 4D
game — “big” and “small”. The prize structures are as given below. Let X be the prize
received from a $1 stake in the “big” game and Y be the prize received from a $1 stake
in the “small” game. (Answer on p. 1582.)
(a) Write down the range of X and the range of Y .
(b) Write down the probability distributions of X and Y .
(c) Hence find E [X] and E [Y ].
(d) Which game — “big” or “small” — is expected to lose you less money?

(Source: Singapore Pools, “Rules for the 4-D Game”, Version 1.11, 17/11/15. PDF.)


97.2. The Expectation Operator is Linear

Definition 200. Let f ∶ A → B be a function, x, y ∈ A, and k ∈ R. We say that f is a

linear transformation if it satisfies the following two conditions:
(a) Additivity: f (x + y) = f (x) + f (y); and
(b) Homogeneity of degree 1: f (kx) = kf (x).

Example 1152. The summation operator ∑ is an example of a linear transformation.

Because it satisfies both additivity and homogeneity of degree 1:
\[ \sum_{i=1}^{n} (a_i + b_i) = \sum_{i=1}^{n} a_i + \sum_{i=1}^{n} b_i \quad \text{and} \quad \sum_{i=1}^{n} (k a_i) = k \sum_{i=1}^{n} a_i. \]

Example 1153. The differentiation operator is an example of a linear transformation.
Because it satisfies both additivity and homogeneity of degree 1:
\[ \frac{d}{dx} \left( f(x) + g(x) \right) = \frac{d}{dx} f(x) + \frac{d}{dx} g(x) \quad \text{and} \quad \frac{d}{dx} \left( k f(x) \right) = k \frac{d}{dx} f(x). \]

A common mistake made by students is to believe that “everything is linear”.

Here are two examples of operators that are not linear transformations.

Example 1154. The square-root operator √⋅ is not a linear transformation. In general,
we do not have

\[ \sqrt{x + y} = \sqrt{x} + \sqrt{y} \quad \text{or} \quad \sqrt{kx} = k \sqrt{x}. \]

Example 1155. The square operator (⋅)² is not a linear transformation. In general, we
do not have

\[ (x + y)^2 = x^2 + y^2 \quad \text{or} \quad (kx)^2 = k x^2. \]


It turns out that the expectation operator is a linear transformation.

Proposition 15. The expectation operator E is linear. That is, if X and Y are random
variables and c is a constant, then
(a) Additivity: E[X + Y ] = E [X] + E [Y ],
(b) Homogeneity of degree 1: E[cX] = cE [X].

Proof. Optional, see p. 1371 in the Appendices.

The linearity of the expectation operator is a powerful property, especially because it is

true even if independence is not satisfied.

Example 1156. I stake $100 on each of two different 4D numbers for Saturday’s drawing
(“big” game). (So that’s $200 total.)
Let X and Y be my winnings (excluding my original stake) from the first and second
numbers (respectively). Now, X and Y are certainly not independent because for ex-
ample, if my first number wins first prize, then my second number cannot possibly also
win first prize.
Nonetheless, despite X and Y not being independent, the linearity of the expectation
operator tells us that

E [X + Y ] = E [X] + E [Y ] = $65.90 + $65.90 = $131.80.
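To see additivity at work without independence, here is a small Python sketch. It uses a made-up toy lottery, not the real 4D prize structure: one winning number is drawn uniformly from 0 to 9, a winning ticket pays $5, and I hold tickets 3 and 7. All of these numbers and names are assumptions chosen purely for illustration.

```python
from fractions import Fraction

# Hypothetical toy lottery (NOT the real 4D rules): one winning number is
# drawn uniformly from 0..9 and a winning ticket pays $5.
outcomes = range(10)
prob = Fraction(1, 10)
PRIZE = 5


def x(s):
    """Winnings from my first ticket (number 3) when s is drawn."""
    return PRIZE if s == 3 else 0


def y(s):
    """Winnings from my second ticket (number 7) when s is drawn."""
    return PRIZE if s == 7 else 0


e_x = sum(prob * x(s) for s in outcomes)
e_y = sum(prob * y(s) for s in outcomes)
e_sum = sum(prob * (x(s) + y(s)) for s in outcomes)

# X and Y are dependent: both tickets cannot win at once.
p_both = sum(prob for s in outcomes if x(s) > 0 and y(s) > 0)
p_x_wins = sum(prob for s in outcomes if x(s) > 0)
p_y_wins = sum(prob for s in outcomes if y(s) > 0)

print(e_x, e_y, e_sum)              # 1/2 1/2 1
print(p_both, p_x_wins * p_y_wins)  # 0 1/100
```

The second printed line shows that P(both win) differs from P(first wins) ⋅ P(second wins), so the two winnings are not independent. The first line shows that E[X + Y ] = E [X] + E [Y ] holds all the same.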


98. Random Variables: Variance

Example 1157. Consider a random variable X that is equally likely to take on one of 5
possible values: 0, 1, 2, 3, 4. Its mean is
\[ \mu_X = \sum_{k} P(X = k) \cdot k = \frac{1}{5} \cdot 0 + \frac{1}{5} \cdot 1 + \frac{1}{5} \cdot 2 + \frac{1}{5} \cdot 3 + \frac{1}{5} \cdot 4 = 2. \]
Now consider another random variable Y that is equally likely to take on one of 5 possible
values: −8, −3, 2, 7, 12. Coincidentally, its mean is the same:
\[ \mu_Y = \sum_{k} P(Y = k) \cdot k = \frac{1}{5} \cdot (-8) + \frac{1}{5} \cdot (-3) + \frac{1}{5} \cdot 2 + \frac{1}{5} \cdot 7 + \frac{1}{5} \cdot 12 = 2. \]
The random variables X and Y share the same mean. However, there is an obvious
difference: Y is “more spread out”.

What, precisely, do we mean when we say that one random variable is “more spread out”
than another?
Our goal in this section is to invent a measure of “spread-outness”. We’ll call this the
variance and denote the variance of any random variable X by Var [X].
It’s not at all obvious how the variance should be defined. One possibility is to define the
variance as the weighted average of the deviations from the mean.
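Example 1157 (continued from above). (Our first proposed definition of variance.)
For X, the weighted average of the deviations from the mean is

\[
\begin{aligned}
\operatorname{Var}[X] &= \sum_{k} P(X = k) \cdot (k - \mu) \\
&= \frac{1}{5} \cdot (0 - \mu) + \frac{1}{5} \cdot (1 - \mu) + \frac{1}{5} \cdot (2 - \mu) + \frac{1}{5} \cdot (3 - \mu) + \frac{1}{5} \cdot (4 - \mu) \\
&= \frac{1}{5} \cdot (0 - 2) + \frac{1}{5} \cdot (1 - 2) + \frac{1}{5} \cdot (2 - 2) + \frac{1}{5} \cdot (3 - 2) + \frac{1}{5} \cdot (4 - 2) \\
&= -\frac{2}{5} - \frac{1}{5} + 0 + \frac{1}{5} + \frac{2}{5} = 0.
\end{aligned}
\]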
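Hmm. This works out to be 0. Is that just a weird coincidence? Let’s try the same for Y :

\[
\begin{aligned}
\operatorname{Var}[Y] &= \sum_{k} P(Y = k) \cdot (k - \mu) \\
&= \frac{1}{5} \cdot (-8 - \mu) + \frac{1}{5} \cdot (-3 - \mu) + \frac{1}{5} \cdot (2 - \mu) + \frac{1}{5} \cdot (7 - \mu) + \frac{1}{5} \cdot (12 - \mu) \\
&= \frac{1}{5} \cdot (-8 - 2) + \frac{1}{5} \cdot (-3 - 2) + \frac{1}{5} \cdot (2 - 2) + \frac{1}{5} \cdot (7 - 2) + \frac{1}{5} \cdot (12 - 2) \\
&= -2 - 1 + 0 + 1 + 2 = 0.
\end{aligned}
\]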

Hmm. Again it works out to be 0.

This is no mere coincidence. It turns out that ∑ P(X = k) ⋅ (k − µ) is always equal to 0.


This is because

\[
\begin{aligned}
\sum_{k} P(X = k) \cdot (k - \mu) &= \underbrace{\sum_{k} P(X = k) \cdot k}_{= \mu} - \sum_{k} P(X = k) \cdot \mu \\
&= \mu - \mu \underbrace{\sum_{k} P(X = k)}_{= 1} = 0.
\end{aligned}
\]


So our first proposed definition of the variance — the weighted average of the deviations
from the mean — is always equal to 0. Intuitively, the reason is that the negative deviations
(corresponding to those values below the mean) exactly cancel out the positive deviations
(corresponding to those values above the mean).
This proposed definition is thus quite useless. We cannot use it to say things like Y is
“more spread out” than X.
This suggests a second approach: define the variance to be the weighted average of the
absolute deviations from the mean.
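Example 1157 (continued from above). (Our second proposed definition of variance.)
For X, the weighted average of the absolute deviations from the mean is

\[
\begin{aligned}
\operatorname{Var}[X] &= \sum_{k} P(X = k) \cdot |k - \mu| \\
&= \frac{1}{5} \cdot |0 - \mu| + \frac{1}{5} \cdot |1 - \mu| + \frac{1}{5} \cdot |2 - \mu| + \frac{1}{5} \cdot |3 - \mu| + \frac{1}{5} \cdot |4 - \mu| \\
&= \frac{1}{5} \cdot |0 - 2| + \frac{1}{5} \cdot |1 - 2| + \frac{1}{5} \cdot |2 - 2| + \frac{1}{5} \cdot |3 - 2| + \frac{1}{5} \cdot |4 - 2| \\
&= \frac{2}{5} + \frac{1}{5} + 0 + \frac{1}{5} + \frac{2}{5} = \frac{6}{5}.
\end{aligned}
\]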
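And now let’s work out the same for Y :

\[
\begin{aligned}
\operatorname{Var}[Y] &= \sum_{k} P(Y = k) \cdot |k - \mu| \\
&= \frac{1}{5} \cdot |-8 - \mu| + \frac{1}{5} \cdot |-3 - \mu| + \frac{1}{5} \cdot |2 - \mu| + \frac{1}{5} \cdot |7 - \mu| + \frac{1}{5} \cdot |12 - \mu| \\
&= \frac{1}{5} \cdot |-8 - 2| + \frac{1}{5} \cdot |-3 - 2| + \frac{1}{5} \cdot |2 - 2| + \frac{1}{5} \cdot |7 - 2| + \frac{1}{5} \cdot |12 - 2| \\
&= 2 + 1 + 0 + 1 + 2 = 6.
\end{aligned}
\]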

Wonderful! So we can now use this second proposed definition of the variance to say
things like “Y is more spread out than X”.

This second proposed definition seems perfectly satisfactory. Yet for some bizarre reason,
we won’t use it! Instead, we’ll define the variance to be the weighted average of the
squared deviations from the mean.


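Example 1157 (continued from above). (The actual definition of variance.)
For X, the weighted average of the squared deviations from the mean is

\[
\begin{aligned}
\operatorname{Var}[X] &= \sum_{k} P(X = k) \cdot (k - \mu)^2 \\
&= \frac{1}{5} \cdot (0 - \mu)^2 + \frac{1}{5} \cdot (1 - \mu)^2 + \frac{1}{5} \cdot (2 - \mu)^2 + \frac{1}{5} \cdot (3 - \mu)^2 + \frac{1}{5} \cdot (4 - \mu)^2 \\
&= \frac{1}{5} \cdot (0 - 2)^2 + \frac{1}{5} \cdot (1 - 2)^2 + \frac{1}{5} \cdot (2 - 2)^2 + \frac{1}{5} \cdot (3 - 2)^2 + \frac{1}{5} \cdot (4 - 2)^2 \\
&= \frac{4}{5} + \frac{1}{5} + 0 + \frac{1}{5} + \frac{4}{5} = 2.
\end{aligned}
\]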
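And now let’s work out the same for Y :

\[
\begin{aligned}
\operatorname{Var}[Y] &= \sum_{k} P(Y = k) \cdot (k - \mu)^2 \\
&= \frac{1}{5} \cdot (-8 - \mu)^2 + \frac{1}{5} \cdot (-3 - \mu)^2 + \frac{1}{5} \cdot (2 - \mu)^2 + \frac{1}{5} \cdot (7 - \mu)^2 + \frac{1}{5} \cdot (12 - \mu)^2 \\
&= \frac{1}{5} \cdot (-8 - 2)^2 + \frac{1}{5} \cdot (-3 - 2)^2 + \frac{1}{5} \cdot (2 - 2)^2 + \frac{1}{5} \cdot (7 - 2)^2 + \frac{1}{5} \cdot (12 - 2)^2 \\
&= 20 + 5 + 0 + 5 + 20 = 50.
\end{aligned}
\]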
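The three candidate measures of spread tried above can be compared side by side. The following Python sketch is only an illustration: the function names are my own, and each distribution is stored as a dictionary mapping a value k to its probability.

```python
from fractions import Fraction


def mean(pmf):
    return sum(p * k for k, p in pmf.items())


def mean_deviation(pmf):
    """First proposal: weighted average of the deviations from the mean."""
    mu = mean(pmf)
    return sum(p * (k - mu) for k, p in pmf.items())


def mean_absolute_deviation(pmf):
    """Second proposal: weighted average of the absolute deviations."""
    mu = mean(pmf)
    return sum(p * abs(k - mu) for k, p in pmf.items())


def variance(pmf):
    """Actual definition: weighted average of the squared deviations."""
    mu = mean(pmf)
    return sum(p * (k - mu) ** 2 for k, p in pmf.items())


fifth = Fraction(1, 5)
X = {k: fifth for k in (0, 1, 2, 3, 4)}
Y = {k: fifth for k in (-8, -3, 2, 7, 12)}

for name, pmf in (("X", X), ("Y", Y)):
    print(name, mean_deviation(pmf), mean_absolute_deviation(pmf), variance(pmf))
# X 0 6/5 2
# Y 0 6 50
```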


Definition 201. Let µ = E [X]. Then the variance operator is denoted Var and is the
function that maps each random variable X to a real number, given by the mapping rule

\[ \operatorname{Var}[X] = E\left[ (X - \mu)^2 \right]. \]

We call Var [X] the variance of X. This is often also instead written as σ_X² or even more
simply as σ² (if it is clear from the context that we’re talking about the variance of X).

So to calculate the variance, we do this: Consider all the possible values that X can take.
Take the difference between these values and the mean of X. Square them. Then take the
probability-weighted average of these squared numbers.
More examples:


Example 1158. Let the random variable X be the outcome of the roll of a fair die. We
already know that µ = 3.5. Hence,

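\[
\begin{aligned}
\operatorname{Var}[X] &= E\left[ (X - \mu)^2 \right] = E\left[ (X - 3.5)^2 \right] \\
&= P(X = 1) \cdot (1 - 3.5)^2 + P(X = 2) \cdot (2 - 3.5)^2 + \dots + P(X = 6) \cdot (6 - 3.5)^2 \\
&= \frac{1}{6} \left( 2.5^2 + 1.5^2 + 0.5^2 + 0.5^2 + 1.5^2 + 2.5^2 \right) = \frac{35}{12} \approx 2.92.
\end{aligned}
\]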
So the variance of the die roll is ≈ 2.92. This means that the expected squared
deviation of X from its mean µ = 3.5 is ≈ 2.92.

Example 1159. Roll two fair dice. Let the random variable Y be the sum of the two
dice. We already know from Example 1149 that µ = 7. So, using also our findings from
Exercise 404,

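\[
\begin{aligned}
\operatorname{Var}[Y] &= E\left[ (Y - \mu)^2 \right] = E\left[ (Y - 7)^2 \right] \\
&= P(Y = 2) \cdot (2 - 7)^2 + P(Y = 3) \cdot (3 - 7)^2 + \dots + P(Y = 12) \cdot (12 - 7)^2 \\
&= \frac{1}{36} \cdot 5^2 + \frac{2}{36} \cdot 4^2 + \frac{3}{36} \cdot 3^2 + \frac{4}{36} \cdot 2^2 + \frac{5}{36} \cdot 1^2 + \frac{6}{36} \cdot 0^2 + \frac{5}{36} \cdot 1^2 \\
&\quad + \frac{4}{36} \cdot 2^2 + \frac{3}{36} \cdot 3^2 + \frac{2}{36} \cdot 4^2 + \frac{1}{36} \cdot 5^2 \\
&= \frac{2 (25 + 32 + 27 + 16 + 5)}{36} = \frac{210}{36} = \frac{70}{12} \approx 5.83.
\end{aligned}
\]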
So the variance of the sum of two dice is ≈ 5.83. This means that on average, the
square of the deviation of Y from its mean µ = 7 is ≈ 5.83.

As the above examples suggest, calculating the variance can be tedious. Fortunately, there
is a shortcut:


Fact 175. Let X be a random variable with mean µ. Then Var [X] = E [X²] − µ².

Proof. Using the definition of variance, the linearity of the expectation operator (Proposi-
tion 15), and the fact that µ is a constant, we have

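\[
\begin{aligned}
\operatorname{Var}[X] &= E\left[ (X - \mu)^2 \right] = E\left[ X^2 + \mu^2 - 2X\mu \right] = E[X^2] + E[\mu^2] - 2E[X\mu] \\
&= E[X^2] + \mu^2 - 2\mu E[X] = E[X^2] + \mu^2 - 2\mu \cdot \mu = E[X^2] - \mu^2.
\end{aligned}
\]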

We now redo the previous two examples using this shortcut:

Example 1158 (continued from above). Let the random variable X be the outcome
of the roll of a fair die. We already know that µ = 3.5. So compute
\[ E[X^2] = P(X = 1) \cdot 1^2 + P(X = 2) \cdot 2^2 + \dots + P(X = 6) \cdot 6^2 = \frac{1}{6} \left( 1^2 + 2^2 + \dots + 6^2 \right) = \frac{91}{6}. \]

Hence,

\[ \operatorname{Var}[X] = E[X^2] - \mu^2 = \frac{91}{6} - 3.5^2 = \frac{182}{12} - \frac{147}{12} = \frac{35}{12}. \]

Example 1159 (continued from above). Let the random variable Y be the sum of
two rolled dice. We already know from Example 1149 that µ = 7. So, using also our
findings from Exercise 404,

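\[
\begin{aligned}
E[Y^2] &= P(Y = 2) \cdot 2^2 + P(Y = 3) \cdot 3^2 + \dots + P(Y = 12) \cdot 12^2 \\
&= \frac{1}{36} \cdot 2^2 + \frac{2}{36} \cdot 3^2 + \frac{3}{36} \cdot 4^2 + \dots + \frac{1}{36} \cdot 12^2 \\
&= \frac{4 + 18 + 48 + 100 + 180 + 294 + 320 + 324 + 300 + 242 + 144}{36} = \frac{1974}{36} = \frac{658}{12}.
\end{aligned}
\]

Hence,

\[ \operatorname{Var}[Y] = E[Y^2] - \mu^2 = \frac{658}{12} - 7^2 = \frac{658}{12} - \frac{588}{12} = \frac{70}{12}. \]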
This is still tedious, but arguably quicker than before.
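As a check on Fact 175, the following Python sketch computes the variance of the sum of two fair dice twice: once from the definition and once from the shortcut. The variable names are illustrative assumptions. Exact fractions are used, so the two results can be compared for exact equality.

```python
from fractions import Fraction
from itertools import product

# PMF of the sum of two fair dice, built from the 36 equally likely pairs.
pmf = {}
for a, b in product(range(1, 7), repeat=2):
    pmf[a + b] = pmf.get(a + b, 0) + Fraction(1, 36)

mu = sum(p * k for k, p in pmf.items())

# Definition: probability-weighted average of the squared deviations.
var_by_definition = sum(p * (k - mu) ** 2 for k, p in pmf.items())

# Shortcut (Fact 175): E[Y^2] - mu^2.
second_moment = sum(p * k ** 2 for k, p in pmf.items())
var_by_shortcut = second_moment - mu ** 2

print(mu, second_moment, var_by_definition, var_by_shortcut)  # 7 329/6 35/6 35/6
```

Note that 329/6 = 658/12 and 35/6 = 70/12, in agreement with the example above.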

Exercise 411. Let the random variable Z be the sum of three rolled dice. Find Var [Z].
(Answer on p. 1583.)


98.1. The Variance of a Constant R.V. is 0
A constant random variable cannot vary. So not surprisingly, the variance of a constant
random variable is 0.

Fact 176. Let c be a constant random variable (i.e. it maps every outcome to the real
number c). Then

V[c] = 0.

Proof. Use Fact 175: Var [c] = E [c²] − (E [c])² = c² − c² = 0.



98.2. Standard Deviation
Let X be a random variable. Then E [X] has the same unit of measure as X. In contrast,
Var [X] uses the squared unit.

Example 1160. There are 100 dumbbells in a gym, of which 30 have weight 5 kg and
the remaining 70 have weight 10 kg. Let X be the weight of a randomly-chosen dumbbell.
Then the mean of X is

E [X] = µ = 0.3 × 5 kg + 0.7 × 10 kg = 8.5 kg.

And the variance of X is

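\[
\begin{aligned}
\operatorname{Var}[X] &= 0.3 \times (5\ \mathrm{kg} - 8.5\ \mathrm{kg})^2 + 0.7 \times (10\ \mathrm{kg} - 8.5\ \mathrm{kg})^2 \\
&= 0.3 \times 12.25\ \mathrm{kg}^2 + 0.7 \times 2.25\ \mathrm{kg}^2 = 5.25\ \mathrm{kg}^2.
\end{aligned}
\]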

To get a measure of “spread” that uses the original unit of measure, we simply take the
square root of the variance. This measure of spread is called the standard deviation.

Definition 202. Let X be a random variable and Var [X] be its variance. Then the
standard deviation of X is defined as

\[ \mathrm{SD}[X] = \sqrt{\operatorname{Var}[X]}. \]

The variance of a random variable X is often denoted σ_X² or even more simply as σ² (if it
is clear from the context that we’re talking about the variance of X).
Correspondingly, the standard deviation of X is often denoted σ_X or σ.
Example 1160 (continued from above). We calculated the variance of X to be
Var [X] = σ² = 5.25 kg².

Hence, the standard deviation of X is simply σ = √5.25 kg ≈ 2.29 kg.
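The dumbbell computation can be reproduced with a few lines of Python. This is only a sketch: the variable names (with the units written into them as a reminder) are my own choice, and ordinary floating-point numbers are used, so the results are rounded before printing.

```python
import math

# Weights in kg and their probabilities, as in the dumbbell example.
pmf = {5: 0.3, 10: 0.7}

mean_kg = sum(p * w for w, p in pmf.items())                   # unit: kg
var_kg2 = sum(p * (w - mean_kg) ** 2 for w, p in pmf.items())  # unit: kg^2
sd_kg = math.sqrt(var_kg2)                                     # unit: kg again

print(round(mean_kg, 2), round(var_kg2, 2), round(sd_kg, 2))  # 8.5 5.25 2.29
```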

Exercise 412. There are 100 rulers in a bookstore, of which 35 have length 20 cm and
the remaining 65 have length 30 cm. Let Y be the length of a randomly-chosen
ruler. Find the mean, variance, and standard deviation of Y . (Be sure to include
the units of measurement.) (Answer on p. 1583.)


98.3. The Variance Operator is Not Linear
The variance operator is not linear. However, given independence, the variance operator
does satisfy additivity and homogeneity of degree 2.

Proposition 16. Let X and Y be independent random variables and c be a constant. Then

(a) Additivity: Var[X + Y ] = Var [X] + Var [Y ],
(b) Homogeneity of degree 2: Var[cX] = c² Var [X].

Proof. Optional, see p. 1372 in the Appendices.

With the above, it becomes much easier than before to find the variance of the sum of 2
dice, 3 dice, or indeed n dice.


Example 1161. Let X be the outcome of a fair die-roll. We showed earlier that Var [X] = 35/12.
Now roll two fair dice. Let X1 and X2 be the respective outcomes. Let Y be the sum of
the two dice (i.e. Y = X1 + X2 ). Assuming independence, we have

\[ \operatorname{Var}[Y] = \operatorname{Var}[X_1 + X_2] = \operatorname{Var}[X_1] + \operatorname{Var}[X_2] = \frac{35}{12} + \frac{35}{12} = \frac{35}{6}. \]

Compare this quick computation to the work we did in Example 1159!
Now roll three fair dice. Let X3 , X4 , and X5 be the respective outcomes. Let Z be the
sum of the three dice (i.e. Z = X3 + X4 + X5 ). Again, assuming independence, we have

\[ \operatorname{Var}[Z] = \operatorname{Var}[X_3 + X_4 + X_5] = \operatorname{Var}[X_3] + \operatorname{Var}[X_4] + \operatorname{Var}[X_5] = 3 \cdot \frac{35}{12} = \frac{35}{4}. \]

Again, compare this quick computation to the work you had to do in Exercise 411!
Now, let A be double the outcome of a die roll (i.e. A = 2X). Note importantly that
A ≠ Y . Y is the sum of two independent die rolls. In contrast, A is double the outcome
of a single die roll. Indeed, by Proposition 16, we see that

\[ \operatorname{Var}[A] = \operatorname{Var}[2X] = 4 \operatorname{Var}[X] = \frac{35}{3} \neq \frac{35}{6} = \operatorname{Var}[Y]. \]

Similarly, let B be triple the outcome of a die roll (i.e. B = 3X). Note importantly that
B ≠ Z. Z is the sum of three independent die rolls. In contrast, B is triple the outcome
of a single die roll. Indeed, by Proposition 16, we see that

\[ \operatorname{Var}[B] = \operatorname{Var}[3X] = 9 \operatorname{Var}[X] = \frac{105}{4} \neq \frac{35}{4} = \operatorname{Var}[Z]. \]
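The difference between "the sum of two independent rolls" and "double a single roll" can be confirmed by brute-force enumeration. The Python sketch below is illustrative only. The helpers `variance` and `pmf_of` are names I have made up, and `pmf_of` assumes that the values passed to it are equally likely.

```python
from fractions import Fraction
from itertools import product


def variance(pmf):
    mu = sum(p * k for k, p in pmf.items())
    return sum(p * (k - mu) ** 2 for k, p in pmf.items())


def pmf_of(values):
    """PMF of a collection of equally likely values (repeats are tallied)."""
    values = list(values)
    pmf = {}
    for v in values:
        pmf[v] = pmf.get(v, 0) + Fraction(1, len(values))
    return pmf


faces = range(1, 7)
var_x = variance(pmf_of(faces))                                       # one roll
var_y = variance(pmf_of(a + b for a, b in product(faces, repeat=2)))  # two rolls, summed
var_a = variance(pmf_of(2 * a for a in faces))                        # one roll, doubled

print(var_x, var_y, var_a)  # 35/12 35/6 35/3
```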

Exercise 413. The weight of a fish in a pond is a random variable with mean µ kg and
variance σ 2 kg2 . (Include the units of measurement in your answers.) (Answer on p.
(a) If two fish are caught and the weights of these fish are independent of each other,
what are the mean and variance of the total weight of the two fish?
(b) If one fish is caught and an exact clone is made of it, what are the mean and variance
of the total weight of the fish and its clone?
(c) If two fish are caught and the weights of these fish are not independent of each other,
what are the mean and variance of the total weight of the two fish?


98.4. The Definition of the Variance (Optional)
Why is the variance defined as the weighted average of squared deviations from the mean?
1. First, we tried defining the variance as the weighted average of deviations from the mean,
i.e. Var [X] = E [X − µ]. But this was no good, because this quantity would always be
equal to 0.[357]
2. Next, we tried defining the variance as the weighted average of absolute deviations from
the mean, i.e. Var [X] = E [∣X − µ∣]. This seemed to work well enough. Yet for some
bizarre reason, we choose not to use this definition.
3. Instead, we choose to use this definition:

\[ \operatorname{Var}[X] = E\left[ (X - \mu)^2 \right]. \]

Why do we prefer using squared (rather than absolute) deviations as our definition of
variance? The conventional view is that the squared deviations definition is superior to
the absolute deviations definition (but see Gorard (2005) and Taleb (2014) for dissenting
views). Here are some reasons for believing the squared deviations definition to be superior:
• The maths works out more nicely. For example:
– The algebra is easier when dealing with squares than with absolute values.
– Differentiation is easier (observe that x² is differentiable everywhere, but ∣x∣ is not differentiable at 0).
– Variances are additive: If X and Y are independent, then Var [X + Y ] = Var [X] +
Var [Y ]. In contrast, if we use the definition Var [X] = E [∣X − µ∣], then variances are
no longer additive.
• Tradition (inertia).
– A century or two ago, some Europeans preferred using squared to absolute deviations.
And so we’re stuck with using this.

[357] This is easily proven: E [X − µ] = E [X] − E [µ] = µ − µ = 0.
99. The Coin-Flips Problem (Fun, Optional)
Here’s another example of a probability problem that can be stated very simply, yet has
counter-intuitive results.

Example 1162. Keep flipping a fair coin until you get a sequence of HH (two heads in
a row). Let X be the number of flips taken.
Now, keep flipping a fair coin until you get a sequence of HT . Let Y be the number of
flips taken.
Which is larger µX = E [X] or µY = E [Y ]?
Intuition might suggest that “obviously”, µX = µY . Intuition would be wrong. It turns
out that, surprisingly enough, µX = 6 and µY = 4!

Example 1163. Now suppose we flip a fair coin 10, 001 times. This gives us a sequence
of 10, 000 pairs of consecutive coin-flips.
For example, if the 10, 001 coin-flips are HHTHT . . . , then the first four pairs of consec-
utive coin-flips are HH, HT, TH, and HT .
Let A be the proportion of the 10,000 pairs of consecutive coin-flips that are HH. Let B be the
proportion of the 10,000 pairs of consecutive coin-flips that are HT .
Which is larger µA = E [A] or µB = E [B]?
In the previous example, we saw that it took, on average, 6 flips before getting HH and
4 flips before getting HT . So “obviously”, we’d expect a smaller proportion to be HH’s.
That is, µA < µB .
Sadly, we would again be wrong! It turns out that µA = µB = 1/4! This Google spreadsheet
simulates 10, 001 coin-flips and calculates A and B.
If you’re interested, the results given in the above two examples are formally proven in Fact
229 in the Appendices.
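If you would like to see these results for yourself, here is a short simulation sketch in Python. It is an illustration only and proves nothing. The function name `flips_until`, the number of trials, and the seed are my own choices, and the printed numbers are merely estimates that vary with the seed.

```python
import random


def flips_until(pattern, rng):
    """Flip a fair coin until the most recent flips spell `pattern`; return the flip count."""
    history = ""
    while not history.endswith(pattern):
        history += rng.choice("HT")
    return len(history)


rng = random.Random(0)  # fixed seed so that reruns give the same estimates
trials = 20_000

# Example 1162: average number of flips needed to see HH, and to see HT.
avg_hh = sum(flips_until("HH", rng) for _ in range(trials)) / trials
avg_ht = sum(flips_until("HT", rng) for _ in range(trials)) / trials

# Example 1163: 10,001 flips give 10,000 pairs of consecutive flips.
flips = "".join(rng.choice("HT") for _ in range(10_001))
pairs = [flips[i:i + 2] for i in range(10_000)]
prop_hh = pairs.count("HH") / len(pairs)
prop_ht = pairs.count("HT") / len(pairs)

print(avg_hh, avg_ht)    # estimates of 6 and 4
print(prop_hh, prop_ht)  # both are estimates of 1/4
```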


100. The Bernoulli Trial and the Bernoulli Distribution
A Bernoulli trial is an experiment (S, Σ, P) with exactly two possible outcomes. A coin flip is an example of a Bernoulli trial.

Example 1164. Flip a coin. We can model this with a Bernoulli trial with probability
of success (heads) 0.5:
• Sample space S = {T, H},
• Event space Σ = {∅, {T }, {H}, S},
• Probability function P({T }) = 0.5 and P({H}) = 0.5.

The corresponding Bernoulli random variable is simply the random variable X ∶ S →

R defined by X ({T }) = 0 and X ({H}) = 1. Its probability distribution is given by
P (X = 0) = 0.5 and P(X = 1) = 0.5.


Definition 203. A Bernoulli trial with probability of success p is an experiment (S, Σ, P) where:

• S = {0, 1}. (The sample space contains 2 elements.)
• Σ = {∅, {0}, {1}, S}.
• P ∶ Σ → R is defined by P({0}) = 1 − p and P({1}) = p. (And as usual P (∅) = 0 and
P (S) = 1.)

The corresponding Bernoulli random variable is simply the random variable X ∶ S →

R defined by X ({0}) = 0 and X ({1}) = 1. Its probability distribution is given by
P (X = 0) = 1 − p and P(X = 1) = p.

Note that we can denote the two elements of the sample space with any symbols. We could
use 0 — standing for failure — and 1 — standing for success. Or we could use T and H,
as was done in the example above.

Example 1165. On any given day, our refrigerator at home has probability 0.001 of
breaking down. We can model this with a Bernoulli trial with probability of success 0.001:
• Sample space S = {0, 1},
• Event space Σ = {∅, {0}, {1}, S},
• Probability function P({0}) = 0.999 and P({1}) = 0.001.

The corresponding Bernoulli random variable is simply the random variable T ∶ S → R

defined by T ({0}) = 0 and T ({1}) = 1.
Its probability distribution is given by P (T = 0) = 0.999 and P(T = 1) = 0.001. In words,
the probability of no failure is 0.999 and the probability of a failure is 0.001.


Example 1166. 90% of H2 Maths students pass their H2 Maths A-Level exams. We
randomly pick a H2 Maths student and see if she passes her H2 Maths A-Level exam.
We can model this with a Bernoulli trial with probability of success 0.9:
• Sample space S = {F, P },
• Event space Σ = {∅, {F }, {P }, S},
• Probability function P({F }) = 0.1 and P({P }) = 0.9.
The corresponding Bernoulli random variable is simply the random variable Y ∶ S →
R defined by Y ({F }) = 0 and Y ({P }) = 1. Its probability distribution is given by
P (Y = 0) = 0.1 and P(Y = 1) = 0.9.

The following two statements are equivalent:

1. T is a Bernoulli random variable with probability of success p.
2. The random variable T has Bernoulli distribution with probability of success p.


100.1. Mean and Variance of the Bernoulli Random Variable

Fact 177. A Bernoulli random variable T with probability of success p has mean p and
variance p(1 − p).

Proof. E[T ] = P (T = 0) ⋅ 0 + P (T = 1) ⋅ 1 = (1 − p) ⋅ 0 + p ⋅ 1 = p.
For the variance, first compute

\[ E[T^2] = P(T = 0) \cdot 0^2 + P(T = 1) \cdot 1^2 = (1 - p) \cdot 0 + p \cdot 1^2 = p. \]

Hence, Var [T ] = E [T²] − (E[T ])² = p − p² = p(1 − p).



101. The Binomial Distribution
Informally, the binomial random variable simply counts the number of successes in a
sequence of n identical, but independent Bernoulli trials.

Example 1167. Flip 3 fair coins. Let X be the number of heads.

X is an example of a binomial random variable with parameters 3 and 1/2.
X can take on values 0, 1, 2, or 3 (corresponding to the number of heads).
The probability distribution of X is given by:

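\[ P(X = 0) = \binom{3}{0} \left( \frac{1}{2} \right)^0 \left( \frac{1}{2} \right)^3 = \frac{1}{8}, \qquad P(X = 1) = \binom{3}{1} \left( \frac{1}{2} \right)^1 \left( \frac{1}{2} \right)^2 = \frac{3}{8}, \]

\[ P(X = 2) = \binom{3}{2} \left( \frac{1}{2} \right)^2 \left( \frac{1}{2} \right)^1 = \frac{3}{8}, \qquad P(X = 3) = \binom{3}{3} \left( \frac{1}{2} \right)^3 \left( \frac{1}{2} \right)^0 = \frac{1}{8}. \]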


Definition 204. Let T1 , T2 , . . . , Tn be n identical, but independent Bernoulli random

variables, each with probability of success p. Then the binomial random variable X with
parameters n and p is defined as:

X = T1 + T2 + ⋅ ⋅ ⋅ + Tn .

The following three statements are entirely equivalent:

1. X is a binomial random variable with parameters n and p.
2. The random variable X has the binomial distribution with parameters n and p.
3. X ∼ B(n, p).


Example 1168. 90% of H2 Maths students pass their A-Level exams.
Let Y be the number of passes among two randomly-chosen students. Then Y is a
binomial random variable with parameters 2 and 0.9. Its probability distribution is given by:

\[ P(Y = 0) = \binom{2}{0} 0.9^0 \, 0.1^2 = 0.01, \]

\[ P(Y = 1) = \binom{2}{1} 0.9^1 \, 0.1^1 = 0.18, \]

\[ P(Y = 2) = \binom{2}{2} 0.9^2 \, 0.1^0 = 0.81. \]

In words, the probability that both fail is 0.01, the probability that exactly one passes is
0.18, and the probability that both pass is 0.81.


101.1. Probability Distribution of the Binomial R.V.
Let X ∼ B (n, p). What is P(X = k)?
Observe that P(X = k) is simply the probability that in a sequence of n independent
Bernoulli trials, each with probability of success p, there are exactly k successes.
First consider instead the probability that in a sequence of n trials, the first k trials are
successes and the remaining n − k are failures. We know that the probability of a success
is p and the probability of a failure is 1 − p. Hence, by the Multiplication Principle, this
probability is simply p^k (1 − p)^(n−k).
The above is the probability of k successes and n − k failures, but where exactly the first k
trials are successes and exactly the last n − k trials are failures. But we don’t care about
where the successes are. We only care that there are k successes. And there are C(n, k)
ways to have exactly k successes in n trials. Thus,

\[ P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}. \]

In summary:

Fact 178. Let X ∼ B(n, p). Then for any k = 0, 1, . . . , n,

\[ P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}. \]

Example 1169. Let X be the number of heads when 10 fair coins are flipped.
Then X ∼ B(10, 0.5). And the probability that exactly 8 coins are heads is:

\[ P(X = 8) = \binom{10}{8} 0.5^8 \, 0.5^2 = \frac{45}{1024} \approx 0.0439. \]

Example 1170. 90% of H2 Maths students pass their A-Level exams.

Let Y be the number of passes among 20 randomly-chosen students. Then Y ∼ B(20, 0.9).
And the probability that at least 18 pass is

P(Y ≥ 18) = P(Y = 18) + P(Y = 19) + P(Y = 20)

\[ = \binom{20}{18} 0.9^{18} \, 0.1^2 + \binom{20}{19} 0.9^{19} \, 0.1^1 + \binom{20}{20} 0.9^{20} \, 0.1^0 \approx 0.677. \]
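The two computations above follow directly from Fact 178 and are easy to reproduce. Below is a minimal Python sketch. The function name `binomial_pmf` is my own, while `math.comb` is the standard-library function for C(n, k) (available from Python 3.8 onwards).

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p): C(n, k) * p^k * (1 - p)^(n - k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Exactly 8 heads in 10 fair coin-flips.
p_eight_heads = binomial_pmf(8, 10, 0.5)

# At least 18 passes among 20 students, each passing with probability 0.9.
p_at_least_18 = sum(binomial_pmf(k, 20, 0.9) for k in (18, 19, 20))

print(round(p_eight_heads, 4))  # 0.0439
print(round(p_at_least_18, 3))  # 0.677
```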


101.2. The Mean and Variance of the Binomial Random Variable

Example 1171. Problem: Three machines each have, independently, probability 0.3 of
failure. What is the expected number of failures? What is the variance of the number of
failures?
Solution: Let Z ∼ B(3, 0.3) be the number of failures. Then

\[ P(Z = 1) = \binom{3}{1} 0.3^1 \, 0.7^2, \qquad P(Z = 2) = \binom{3}{2} 0.3^2 \, 0.7^1, \qquad P(Z = 3) = \binom{3}{3} 0.3^3 \, 0.7^0. \]

Hence,

\[
\begin{aligned}
E[Z] &= P(Z = 1) \cdot 1 + P(Z = 2) \cdot 2 + P(Z = 3) \cdot 3 \\
&= \binom{3}{1} 0.3^1 \, 0.7^2 \cdot 1 + \binom{3}{2} 0.3^2 \, 0.7^1 \cdot 2 + \binom{3}{3} 0.3^3 \, 0.7^0 \cdot 3 \\
&= 0.441 + 0.378 + 0.081 = 0.9.
\end{aligned}
\]

That is, the expected number of failures is 0.9.

Now,

\[
\begin{aligned}
E[Z^2] &= P(Z = 1) \cdot 1^2 + P(Z = 2) \cdot 2^2 + P(Z = 3) \cdot 3^2 \\
&= \binom{3}{1} 0.3^1 \, 0.7^2 \cdot 1^2 + \binom{3}{2} 0.3^2 \, 0.7^1 \cdot 2^2 + \binom{3}{3} 0.3^3 \, 0.7^0 \cdot 3^2 \\
&= 0.441 + 0.756 + 0.243 = 1.44.
\end{aligned}
\]

Hence,

\[ \operatorname{Var}[Z] = E[Z^2] - (E[Z])^2 = 1.44 - 0.9^2 = 0.63. \]


That is, the variance of the number of failures is 0.63.

It turns out though that there is a much quicker formula for finding the mean and variance
of any binomial random variable.


Fact 179. If X ∼ B(n, p), then E [X] = np and Var [X] = np(1 − p).

(You can verify that these formulae work for the last example: n = 3, p = 0.3, and thus
E [Z] = np = 0.9 and Var [Z] = np(1 − p) = 0.63.)

Proof. Let T1 , T2 , . . . , Tn be identical, but independent Bernoulli random variables with

parameter p. Then X = T1 + T2 + ⋅ ⋅ ⋅ + Tn . Hence,

E [X] = E [T1 + T2 + ⋅ ⋅ ⋅ + Tn ] = E [T1 ] + E [T2 ] + ⋅ ⋅ ⋅ + E [Tn ] = p + p + ⋅ ⋅ ⋅ + p = np.

V [X] = V [T1 + T2 + ⋅ ⋅ ⋅ + Tn ] = V [T1 ] + V [T2 ] + ⋅ ⋅ ⋅ + V [Tn ]

= p(1 − p) + p(1 − p) + ⋅ ⋅ ⋅ + p(1 − p) = np(1 − p).
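Fact 179 can also be checked numerically for any particular n and p by computing the mean and variance directly from the probability distribution. The sketch below does this for the three-machines example. The function name is my own, and p is passed as an exact fraction so that the comparison with np and np(1 − p) involves no rounding.

```python
from fractions import Fraction
from math import comb


def binomial_mean_and_variance(n, p):
    """Compute E[X] and Var[X] for X ~ B(n, p) directly from the PMF."""
    pmf = {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}
    mean = sum(prob * k for k, prob in pmf.items())
    second_moment = sum(prob * k**2 for k, prob in pmf.items())
    return mean, second_moment - mean**2  # shortcut formula for the variance


n, p = 3, Fraction(3, 10)
mean, var = binomial_mean_and_variance(n, p)

print(mean, var)               # 9/10 63/100
print(n * p, n * p * (1 - p))  # 9/10 63/100
```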

Exercise 414. (Answer on p. 1584.) Plane engine #1 contains 20 components, each

of which has probability 0.01 of failure. Plane engine #2 contains 35 components, each
of which has probability 0.005 of failure. The probability that any component fails is
independent of whether any other component has failed.
An engine fails if and only if at least 2 of its components fail. What is the probability
that both engines fail?


102. The Continuous Uniform Distribution
So far, all examples of random variables we’ve seen have been discrete. For example, the
binomial random variable X ∼ B (n, p) is discrete, because Range (X) = {0, 1, 2, . . . , n} is finite.
We’ll now look at continuous random variables. Informally, a random variable Y is con-
tinuous if its range takes on a continuum of values.
For H2 Maths, you need only learn about one continuous random variable: the normal
random variable (subject of the next chapter).
Nonetheless, we’ll first look at another continuous random variable that is not in the syl-
labus. This is the continuous uniform random variable. It is much simpler than
the normal random variable and can thus help build up your intuition of how continuous
random variables work.


102.1. The Continuous Uniform Distribution
A line measuring exactly 1 metre in length is drawn on the floor. It is about to rain. Let
X be the position of the first rain-drop that hits the line. X is measured as the distance
(in metres) from the left-most point of the line.
So for example, if the first rain-drop hits the left-most point of the line, then x = 0. If it
hits the exact midpoint of the line, then x = 0.5. And if it hits the right-most point, then
x = 1.
Assume we can measure X to infinite precision.
Then, assuming the first rain-drop is equally likely to hit any point of the line, we can
model X as a continuous uniform random variable on [0, 1]. This says that
• The range of X is [0, 1] (the first rain-drop can hit any point along the line); and
• X is equally likely to take on any value in the interval [0, 1] (the first rain-drop is equally
likely to hit any point along the line).
The following three statements are entirely equivalent:
1. X is a continuous uniform random variable on [0, 1].
2. X is a random variable with the continuous uniform distribution on [0, 1].
3. X ∼ U [0, 1].
Recall that previously with any discrete random variable Y , we could find its probability
distribution. That is, we could find P (Y = k) (the probability that Y takes on the value
k). For example, if Y ∼ B (3, 0.5) modelled the number of heads in three coin-flips, then
the probability that there was exactly one head was

\[ P(Y = 1) = \binom{3}{1} 0.5^1 \, 0.5^2 = \frac{3}{8}. \]
Now, in contrast, for any continuous random variable X, strangely enough, there is
zero probability that X takes on any particular value! For example, if X ∼ U [0, 1], then
P (X = 0.37) = 0. That is, there is zero probability that X takes on the value of 0.37!
At first glance, this may seem strange.
But remember: There are infinitely many real numbers in the interval [0, 1]. So it makes
sense to say that the probability of X taking on any particular value is zero.[358]
So for any continuous random variable X, it is pointless to try to write down P (X = k) for
different possible values of k, because P (X = k) is always equal to zero (regardless of what
k is). Instead, we shall try to write down P (a ≤ X ≤ b), for different possible values of a
and b.
[358] But strangely enough, zero probability is not the same thing as impossible. For example, we’d
say that
• There is zero probability, but it is not impossible that X ∼ U [0, 1] takes on the value 0.37.
• There is zero probability and it is impossible that X ∼ U [0, 1] takes on the value 1.2.
(Actually, rather than use the word “impossible”, mathematicians prefer saying “almost never”, which
has a precise definition.)

Now, if X ∼ U [0, 1], then the probability that X takes on values between 0.3 and 0.7 is
simply 0.7 − 0.3 = 0.4. That is,

P (0.3 ≤ X ≤ 0.7) = 0.7 − 0.3 = 0.4.

Similarly, the probability that X takes on values between 0.16 and 0.35 is simply 0.35−0.16 =
0.19. That is,

P (0.16 ≤ X ≤ 0.35) = 0.35 − 0.16 = 0.19.

The above observations suggest that it may be useful to define a new concept, called the
cumulative distribution function.


102.2. The Cumulative Distribution Function (CDF)
The CDF simply tells us the probability that X takes on values less than or equal to k, for
every k ∈ R. Formally:

Definition 205. The cumulative distribution function (CDF) of a random variable X is
the function FX ∶ R → R given by the mapping rule

FX (k) = P (X ≤ k) .

It turns out that every random variable can be uniquely defined by giving its
CDF. For example, the continuous uniform random variable is formally defined thus:

Definition 206. X is the continuous uniform random variable on [0, 1] if its CDF FX ∶
R → R is defined by

$$F_X(k) = \begin{cases} 0, & \text{if } k < 0, \\ k, & \text{if } k \in [0, 1], \\ 1, & \text{if } k > 1. \end{cases}$$
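
As a minimal sketch, Definition 206 can be written as a short Python function, and interval probabilities can then be read off as differences FX(b) − FX(a). The function names `uniform_cdf` and `prob_between` are ours, chosen only for illustration.

```python
def uniform_cdf(k):
    """CDF of the continuous uniform random variable on [0, 1] (Definition 206)."""
    if k < 0:
        return 0.0
    if k <= 1:
        return float(k)
    return 1.0


def prob_between(a, b):
    """P(a <= X <= b) for X ~ U[0, 1], computed as F_X(b) - F_X(a)."""
    return uniform_cdf(b) - uniform_cdf(a)


print(round(prob_between(0.3, 0.7), 10))    # 0.4
print(round(prob_between(0.16, 0.35), 10))  # 0.19
print(uniform_cdf(-2), uniform_cdf(1.2))    # 0.0 1.0
```

The rounding is only there to hide floating-point noise; the two printed values match the probabilities 0.4 and 0.19 computed by hand earlier.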

Armed with the concept of the CDF, the formal definition of a continuous random variable
can be simply stated:

Definition 207. A random variable X is continuous if its CDF FX is continuous.

We can now summarise the three possible types of random variables.
1. Discrete random variables. A random variable is discrete if its range is finite.359
Examples: Bernoulli, binomial.
2. Continuous random variables. A random variable is continuous if its CDF is con-
tinuous. Examples: Continuous uniform, normal.
3. Other random variables. There are random variables that are neither discrete nor
continuous. But you will not study any of these for the A-Levels.
Note that every random variable (discrete, continuous, or otherwise) has a cumulative
distribution function (CDF).

359 Or countably-infinite.

102.3. Important Digression: P (X ≤ k) = P (X < k)
For any continuous random variable X, we have

P (X ≤ k) = P (X < k) .

That is, whether an inequality is strict makes no difference. The reason is that by the third
Kolmogorov axiom (additivity),

P (X ≤ k) = P (X < k) + P (X = k) = P (X < k) + 0 = P (X < k) .

Thus, for continuous random variables, it doesn’t matter whether inequalities are strict or not.

Example 1172. Let X ∼ U [0, 1]. Then

P (0.2 ≤ X ≤ 0.5) = P (0.2 < X ≤ 0.5) = P (0.2 ≤ X < 0.5) = P (0.2 < X < 0.5) .


102.4. The Probability Density Function (PDF)
The PDF is simply defined as the derivative of the CDF.360

Definition 208. Let X be a random variable whose CDF FX is differentiable. Then the
probability density function (PDF) of X is the function fX ∶ R → R defined by

$$f_X(k) = F_X'(k).$$

The PDF has an intuitive interpretation. The area under the PDF between points a and
b is equal to P (a ≤ X ≤ b). This, of course, is simply a consequence of the Fundamental
Theorems of Calculus:

$$\int_a^b f_X(k)\,dk = \int_a^b \frac{d}{dk} F_X(k)\,dk = F_X(b) - F_X(a) = P(X \le b) - P(X \le a) = P(a \le X \le b).$$
The PDF of X ∼ U[0, 1] (graphed below) is simply the function fX ∶ R → R defined by

fX (k) = 1, if k ∈ [0, 1], and fX (k) = 0, otherwise.

For any a ≤ b, the area under the PDF between a and b is precisely P (a ≤ X ≤ b). For
example, there is probability 0.25 (red area) that X takes on values between 0.5 and 0.75.
There is probability 0.1 (blue area) that X takes on values between 0.2 and 0.3.
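
To see the "area under the PDF" interpretation at work, here is a rough numerical sketch. It approximates the area with the midpoint rule, which is our choice of method for illustration; the names `uniform_pdf` and `area_under` are likewise ours.

```python
def uniform_pdf(k):
    """PDF of X ~ U[0, 1]: equal to 1 on [0, 1] and 0 elsewhere."""
    return 1.0 if 0 <= k <= 1 else 0.0


def area_under(pdf, a, b, steps=10_000):
    """Approximate the integral of pdf from a to b using the midpoint rule."""
    width = (b - a) / steps
    return sum(pdf(a + (i + 0.5) * width) for i in range(steps)) * width


print(round(area_under(uniform_pdf, 0.5, 0.75), 6))  # 0.25
print(round(area_under(uniform_pdf, 0.2, 0.3), 6))   # 0.1
```

Because the uniform PDF is flat, each area is simply a rectangle of height 1, which is why the answers agree with b − a.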

Exercise 415. The continuous uniform random variable Y ∼ U[3, 5] is equally likely to
take on values between 3 and 5, inclusive. (a) Write down its CDF FY . (b) Write down
and graph its PDF fY . (c) Compute, and also illustrate on your graph, the quantities
P (3.1 ≤ Y ≤ 4.6) and P (4.8 ≤ Y ≤ 4.9). (Answer on p. 1585.)

360 Note that although every random variable has a CDF, not every random variable has a PDF. In
particular, if the random variable’s CDF is not differentiable, then by our definition here, the random
variable does not have a PDF.

103. The Normal Distribution
The standard normal (or Gaussian) random variable (SNRV) is very important. In
fact, it is so important that we usually reserve the letter Z for it, and the Greek letters φ
and Φ (lower- and upper-case phi) for its PDF and CDF.
The following three statements are entirely equivalent:
1. Z is a SNRV.
2. Z is a random variable with the standard normal distribution.
3. Z ∼ N (0, 1).
Here’s the formal definition:

Definition 209. Z is called a standard normal random variable (SNRV) if its PDF
φ ∶ R → R is defined by:
$$\phi(a) = \frac{1}{\sqrt{2\pi}} e^{-0.5a^2}.$$

For the A-Levels, you need not remember this complicated-looking PDF. Nor need you
understand where it comes from.
The normal PDF is often also referred to as the bell curve, due to its resemblance to a
bell (kinda).

As with the continuous uniform, for any a ≤ b, the area under the normal PDF between
a and b gives us precisely P (a ≤ Z ≤ b). For example, the area between 0.5 and 0.75 is
P (0.5 ≤ Z ≤ 0.75) ≈ 0.0819, and the area between 0.2 and 0.3 is P (0.2 ≤ Z ≤ 0.3) ≈ 0.0387.


As usual, the CDF Φ ∶ R → R is defined by:

$$\Phi(a) = P(Z \le a) = \int_{-\infty}^{a} \phi(x)\,dx = \int_{-\infty}^{a} \frac{1}{\sqrt{2\pi}} e^{-0.5x^2}\,dx.$$
Unfortunately, this last integral has no simpler expression (mathematicians would say that
it has no “closed-form expression”). Instead, as we’ll soon see, we have to use the so-called
Z-tables (or a graphing calculator) to look up values of Φ(k).
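
Although Φ has no closed-form expression, it can be evaluated numerically. As an illustrative sketch (not something the A-Levels require), Python's standard library provides the error function `math.erf`, and the substitution x = √2 t in the integral above gives the identity Φ(a) = ½[1 + erf(a/√2)]. The function name `Phi` is ours.

```python
import math


def Phi(a):
    """Standard normal CDF, via the identity Phi(a) = (1 + erf(a / sqrt(2))) / 2."""
    return 0.5 * (1.0 + math.erf(a / math.sqrt(2.0)))


print(round(Phi(0), 4))      # 0.5
print(round(Phi(2.51), 5))   # 0.99396
print(round(Phi(-2.51), 4))  # 0.006
```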
The next fact summarises the properties of the normal distribution. Some of these proper-
ties are illustrated in the figure that follows.

Fact 180. Let Z ∼ N(0, 1) and φ and Φ be its PDF and CDF.
1. Φ(∞) = 1. (As with any random variable, the area under the entire PDF is 1.)
2. φ (a) > 0, for all a ∈ R. (The PDF is positive everywhere. This has a surprising
implication: however large a is, there is always some non-zero probability that Z ≥ a.)
3. E [Z] = 0. (The mean of Z is 0.)
4. The PDF φ reaches a global maximum at the mean 0. (In fact, we can go ahead and
compute $\phi(0) = \frac{1}{\sqrt{2\pi}} \approx 0.399$.)

5. Var [Z] = 1. (The variance of Z is 1.)
6. P (Z ≤ a) = P (Z < a). (We’ve already discussed this earlier. It makes no difference
whether the inequality is strict. This is because P(Z = a) = 0.)
7. The PDF φ is symmetric about the mean. This has several implications:
(a) P (Z ≥ a) = P (Z ≤ −a) = Φ(−a).
(b) Since P (Z ≥ a) = 1 − P (Z ≤ a) = 1 − Φ (a), it follows that Φ(−a) = 1 − Φ (a) or,
equivalently, Φ (a) = 1 − Φ(−a).
(c) Φ (0) = 1 − Φ (0) = 0.5.
8. P (−1 ≤ Z ≤ 1) = Φ (1) − Φ (−1) ≈ 0.6827. (There is probability 0.6827 that Z takes on
values within 1 standard deviation of the mean.)
9. P (−2 ≤ Z ≤ 2) = Φ (2) − Φ (−2) ≈ 0.9545. (There is probability 0.9545 that Z takes on
values within 2 standard deviations of the mean.)
10. P (−3 ≤ Z ≤ 3) = Φ (3) − Φ (−3) ≈ 0.9973. (There is probability 0.9973 that Z takes on
values within 3 standard deviations of the mean.)
11. The PDF φ has two points of inflexion, namely at ±1. (The points of inflexion are one
standard deviation away from the mean.)

Proof. Optional, see p. 1375 in the Appendices.
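
For readers who would like a quick check of items 4 and 11 without turning to the Appendices, here is a short sketch (ours, and not necessarily the argument given there) that differentiates φ directly:

$$\phi'(a) = -a\,\phi(a), \qquad \phi''(a) = -\phi(a) - a\,\phi'(a) = (a^2 - 1)\,\phi(a).$$

Since φ (a) > 0 for all a, we have φ′(a) = 0 only at a = 0, with φ′ positive to the left of 0 and negative to the right, so φ attains its global maximum at 0. Likewise, φ″(a) = 0 only at a = ±1, and φ″ changes sign at each of these points, so they are the two points of inflexion.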

[Figure: graph of the standard normal PDF φ, with the horizontal axis running from −4 to 4.]


Example 1173. Let’s use the TI84 to find Φ(2.51).
1. Press the blue 2ND button and then DISTR (which corresponds to the VARS button).
This brings up the DISTR menu.
2. Press 2 to select the “normalcdf” option.
The TI84 is now asking for your lower and upper bounds. Since Φ(2.51) = Φ(2.51) − Φ(−∞),
your lower bound is −∞ and your upper bound is 2.51.
3. But there’s no way to enter −∞ on your TI84. So instead, you’ll enter −10^99, which is
simply a very large negative number. To do so, press (-) , the blue 2ND button, EE
(which corresponds to the , button), and then 9 9 . (Don’t press ENTER yet!)
4. Now to enter your upper bound. First press , (this simply demarcates your lower and
upper bounds). Then enter the upper bound 2.51 by pressing 2 . 5 1 . Then press
ENTER . Your TI84 says that the answer is Φ(2.51) ≈ 0.99396.

[TI84 screenshots: After Step 1. After Step 2. After Step 3. After Step 4.]


Example 1174. To find Φ(−2.51), Φ(1.372), and P (−4 ≤ Z ≤ 4), the steps are very
similar. So for each, I’ll simply give the screenshot from the TI84:

[TI84 screenshots: Φ(−2.51), Φ(1.372), and P (−4 ≤ Z ≤ 4).]


Example 1175. We’ll find Φ(2.51), Φ(−2.51), Φ(1.372), and P (−4 ≤ Z ≤ 4) using Z-tables.
Refer to the Z-tables on p. 1013. (These are the exact same tables that appear on the
List of Formulae (MF26).)
• To find Φ(2.51), look at the row labelled 2.5 and the column labelled 1 — read off the
number 0.9940. We thus have Φ(2.51) = 0.9940.
• To find Φ(−2.51), note that the table does not explicitly give values of Φ (z), if z < 0.
But we can exploit the fact that the standard normal is symmetric about the mean
µ = 0. This fact implies that Φ(−z) = 1−Φ (z). Hence, Φ(−2.51) = 1−Φ(2.51) = 0.0060.
• To find Φ(1.372), first look at the row labelled 1.3 and the column labelled 7 — read
off the number 0.9147. This tells us that Φ(1.37) = 0.9147. Now look at the right
end of the table (where it says “ADD”). Since the third decimal place of 1.372 is 2,
we look under the column labelled 2, which tells us to ADD 3 (that is, 0.0003). Thus, Φ(1.372) =
0.9147 + 0.0003 = 0.9150.
• To find P (−4 ≤ Z ≤ 4), the Z-tables printed are actually useless, because they only go
to 2.99. So you can just write P (−4 ≤ Z ≤ 4) ≈ 1.
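
If you have Python to hand, the four table look-ups above can be cross-checked in a few lines. This is only a sketch: it uses `statistics.NormalDist` from the standard library (available from Python 3.8), and the variable name `Z` is ours.

```python
from statistics import NormalDist

Z = NormalDist()  # the standard normal: mean 0, standard deviation 1

print(f"{Z.cdf(2.51):.4f}")            # 0.9940
print(f"{Z.cdf(-2.51):.4f}")           # 0.0060
print(f"{Z.cdf(1.372):.4f}")           # 0.9150
print(f"{Z.cdf(4) - Z.cdf(-4):.4f}")   # 0.9999
```

To four decimal places, the first three values agree with what we read off the Z-tables, and the last confirms that P (−4 ≤ Z ≤ 4) is very nearly 1.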

Exercise 416. Using both the Z-tables and your graphing calculator, find the following:
(a) P (Z ≥ 1.8). (b) P (−0.351 < Z < 1.2). (Answer on p. 1586.)


If Z has a normal distribution with mean 0 and variance 1 then, for each value of z, the table gives
the value of Φ(z), where
Φ(z) = P(Z ≤ z).
For negative values of z use Φ(−z) = 1 − Φ(z).

z 0 1 2 3 4 5 6 7 8 9 ADD 1 2 3 4 5 6 7 8 9
0.0 0.5000 0.5040 0.5080 0.5120 0.5160 0.5199 0.5239 0.5279 0.5319 0.5359 4 8 12 16 20 24 28 32 36
0.1 0.5398 0.5438 0.5478 0.5517 0.5557 0.5596 0.5636 0.5675 0.5714 0.5753 4 8 12 16 20 24 28 32 36
0.2 0.5793 0.5832 0.5871 0.5910 0.5948 0.5987 0.6026 0.6064 0.6103 0.6141 4 8 12 15 19 23 27 31 35
0.3 0.6179 0.6217 0.6255 0.6293 0.6331 0.6368 0.6406 0.6443 0.6480 0.6517 4 7 11 15 19 22 26 30 34
0.4 0.6554 0.6591 0.6628 0.6664 0.6700 0.6736 0.6772 0.6808 0.6844 0.6879 4 7 11 14 18 22 25 29 32
0.5 0.6915 0.6950 0.6985 0.7019 0.7054 0.7088 0.7123 0.7157 0.7190 0.7224 3 7 10 14 17 20 24 27 31
0.6 0.7257 0.7291 0.7324 0.7357 0.7389 0.7422 0.7454 0.7486 0.7517 0.7549 3 7 10 13 16 19 23 26 29
0.7 0.7580 0.7611 0.7642 0.7673 0.7704 0.7734 0.7764 0.7794 0.7823 0.7852 3 6 9 12 15 18 21 24 27
0.8 0.7881 0.7910 0.7939 0.7967 0.7995 0.8023 0.8051 0.8078 0.8106 0.8133 3 5 8 11 14 16 19 22 25
0.9 0.8159 0.8186 0.8212 0.8238 0.8264 0.8289 0.8315 0.8340 0.8365 0.8389 3 5 8 10 13 15 18 20 23
1.0 0.8413 0.8438 0.8461 0.8485 0.8508 0.8531 0.8554 0.8577 0.8599 0.8621 2 5 7 9 12 14 16 19 21
1.1 0.8643 0.8665 0.8686 0.8708 0.8729 0.8749 0.8770 0.8790 0.8810 0.8830 2 4 6 8 10 12 14 16 18
1.2 0.8849 0.8869 0.8888 0.8907 0.8925 0.8944 0.8962 0.8980 0.8997 0.9015 2 4 6 7 9 11 13 15 17
1.3 0.9032 0.9049 0.9066 0.9082 0.9099 0.9115 0.9131 0.9147 0.9162 0.9177 2 3 5 6 8 10 11 13 14
1.4 0.9192 0.9207 0.9222 0.9236 0.9251 0.9265 0.9279 0.9292 0.9306 0.9319 1 3 4 6 7 8 10 11 13
1.5 0.9332 0.9345 0.9357 0.9370 0.9382 0.9394 0.9406 0.9418 0.9429 0.9441 1 2 4 5 6 7 8 10 11
1.6 0.9452 0.9463 0.9474 0.9484 0.9495 0.9505 0.9515 0.9525 0.9535 0.9545 1 2 3 4 5 6 7 8 9
1.7 0.9554 0.9564 0.9573 0.9582 0.9591 0.9599 0.9608 0.9616 0.9625 0.9633 1 2 3 4 4 5 6 7 8
1.8 0.9641 0.9649 0.9656 0.9664 0.9671 0.9678 0.9686 0.9693 0.9699 0.9706 1 1 2 3 4 4 5 6 6
1.9 0.9713 0.9719 0.9726 0.9732 0.9738 0.9744 0.9750 0.9756 0.9761 0.9767 1 1 2 2 3 4 4 5 5
2.0 0.9772 0.9778 0.9783 0.9788 0.9793 0.9798 0.9803 0.9808 0.9812 0.9817 0 1 1 2 2 3 3 4 4
2.1 0.9821 0.9826 0.9830 0.9834 0.9838 0.9842 0.9846 0.9850 0.9854 0.9857 0 1 1 2 2 2 3 3 4
2.2 0.9861 0.9864 0.9868 0.9871 0.9875 0.9878 0.9881 0.9884 0.9887 0.9890 0 1 1 1 2 2 2 3 3
2.3 0.9893 0.9896 0.9898 0.9901 0.9904 0.9906 0.9909 0.9911 0.9913 0.9916 0 1 1 1 1 2 2 2 2
2.4 0.9918 0.9920 0.9922 0.9925 0.9927 0.9929 0.9931 0.9932 0.9934 0.9936 0 0 1 1 1 1 1 2 2
2.5 0.9938 0.9940 0.9941 0.9943 0.9945 0.9946 0.9948 0.9949 0.9951 0.9952 0 0 0 1 1 1 1 1 1
2.6 0.9953 0.9955 0.9956 0.9957 0.9959 0.9960 0.9961 0.9962 0.9963 0.9964 0 0 0 0 1 1 1 1 1
2.7 0.9965 0.9966 0.9967 0.9968 0.9969 0.9970 0.9971 0.9972 0.9973 0.9974 0 0 0 0 0 1 1 1 1
2.8 0.9974 0.9975 0.9976 0.9977 0.9977 0.9978 0.9979 0.9979 0.9980 0.9981 0 0 0 0 0 0 0 1 1
2.9 0.9981 0.9982 0.9982 0.9983 0.9984 0.9984 0.9985 0.9985 0.9986 0.9986 0 0 0 0 0 0 0 0 0

Critical values for the normal distribution

If Z has a normal distribution with mean 0 and variance 1 then, for each value of p, the table
gives the value of z such that
P(Z ≤ z) = p.

p 0.75 0.90 0.95 0.975 0.99 0.995 0.9975 0.999 0.9995

z 0.674 1.282 1.645 1.960 2.326 2.576 2.807 3.090 3.291
103.1. The Normal Distribution, in General
Let Z ∼ N(0, 1) be the SNRV and σ, µ ∈ R be constants.
Consider σZ + µ, itself a random variable. We know that since E [Z] = 0 and Var [Z] = 1,
it follows that

E [σZ + µ] = σE [Z] + µ = µ and Var [σZ + µ] = σ² Var [Z] = σ².

It turns out that σZ + µ is a normal random variable with mean µ and variance σ 2 :

Definition 210. X is called a normal random variable with mean µ and variance σ 2 if
its PDF fX ∶ R → R is defined by:
$$f_X(a) = \frac{1}{\sigma\sqrt{2\pi}} e^{-0.5\left(\frac{a-\mu}{\sigma}\right)^2}.$$

Once again, for the A-Levels, you need not remember this complicated-looking PDF. Nor
need you understand where it comes from.
The following three statements are entirely equivalent:
1. X is a normal random variable with mean µ and variance σ 2 .
2. X is a random variable with normal distribution of mean µ and variance σ 2 .
3. X ∼ N (µ, σ 2 ).


Example 1176. The normal random variables A ∼ N(−1, 1), B ∼ N(1, 1), and C ∼ N(2, 1)
have variance 1 (just like the SNRV), but non-zero means. Their PDFs are graphed below.
(Included for reference is the standard normal PDF in black.)
We see that the effect of increasing the mean µ is to move the graph of the PDF rightwards.
And decreasing the mean moves it leftwards.


Example 1177. The normal random variables D ∼ N(0, 0.1), E ∼ N(0, 2), and F ∼
N(0, 3) have mean 0 (just like the SNRV), but non-unit variances. Their PDFs are
graphed below. (Included for reference is the standard normal PDF in black.)
The effect of changing the variance σ 2 is this:
• The larger the variance, the “fatter” the “tails” of the PDF and the shorter the peak.
• Conversely, the smaller the variance, the “thinner” the “tails” of the PDF and the
taller the peak.


Example 1178. The normal random variables G ∼ N(−1, 0.1), H ∼ N(1, 2), and I ∼
N(2, 3) have non-zero means and non-unit variances. Their PDFs are graphed below.
(Included for reference is the standard normal PDF in black.)

Exercise 417. Let X ∼ N(µ, σ 2 ). Verify that if µ = 0 and σ 2 = 1, then for all a ∈ R, we
have fX (a) = φ (a). What can you conclude? (Answer on p. 1587.)


In general, normality is preserved under linear transformations:

Fact 181. Let X ∼ N (µ, σ 2 ) and a, b ∈ R be constants. Then aX + b ∼ N (aµ + b, a2 σ 2 ).

Proof. Optional, see p. 1377 in the Appendices.

Thus, we can easily transform any normal random variable into the SNRV:

Corollary 33. If X ∼ N (µ, σ 2 ), then $\frac{X - \mu}{\sigma} = Z \sim N(0, 1)$. Equivalently, X = σZ + µ.

(Just to be clear, two random variables are identical if their CDFs are identical.)

Proof. The next exercise asks you to prove this corollary.

Exercise 418. Using Fact 181, prove that if X ∼ N (µ, σ 2 ), then $\frac{X - \mu}{\sigma} = Z \sim N(0, 1)$.
(Answer on p. 1587.)

The above corollary gives us an alternative method for computing probabilities associated
with normal random variables. In general, if X ∼ N (µ, σ 2 ), then
$$P(X \le c) = P(\sigma Z + \mu \le c) = P\left(Z \le \frac{c-\mu}{\sigma}\right) = \Phi\left(\frac{c-\mu}{\sigma}\right).$$
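
The standardisation step can be sketched in code as follows. The function name `normal_cdf_via_z` and the numbers used (X ∼ N(10, 4) and c = 13) are made up purely for illustration; `statistics.NormalDist` is part of Python's standard library (3.8 or later).

```python
import math
from statistics import NormalDist


def normal_cdf_via_z(c, mu, variance):
    """P(X <= c) for X ~ N(mu, variance), computed as Phi((c - mu) / sigma)."""
    sigma = math.sqrt(variance)
    z = (c - mu) / sigma        # standardise: convert c into a z-value
    return NormalDist().cdf(z)  # Phi(z), the standard normal CDF


# Illustrative numbers (ours): X ~ N(10, 4), so sigma = 2 and z = (13 - 10) / 2 = 1.5.
p = normal_cdf_via_z(13, mu=10, variance=4)
print(f"{p:.4f}")  # 0.9332

# The same probability, with the mean and standard deviation supplied directly.
print(f"{NormalDist(mu=10, sigma=2).cdf(13):.4f}")  # 0.9332
```

The value 0.9332 is also what the Z-tables give for Φ(1.5).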


The properties that we listed for the SNRV also apply, with only a few modifications, to
any NRV. I highlight any differences in red. The figure that follows illustrates.

Fact 182. Let X ∼ N (µ, σ 2 ) and let fX and FX be the PDF and CDF of X.
1. FX (∞) = 1. (The area under the entire PDF is 1. This, of course, is true of any random
variable.)
2. fX (a) > 0, for all a ∈ R. (The PDF is positive everywhere. This has the surprising
implication that no matter how large a is, there is always some non-zero probability
that X ≥ a.)
3. E [X] = µ. (The mean of X is µ.)
4. The PDF fX reaches a global maximum at the mean µ. (In fact, we can go ahead and
compute $f_X(\mu) = \frac{1}{\sigma\sqrt{2\pi}} \approx \frac{0.399}{\sigma}$.)
5. Var [X] = σ 2 . (The variance of X is σ 2 .)
6. P (X ≤ a) = P (X < a). (We’ve already discussed this earlier. It makes no difference
whether the inequality is strict. This is because P(X = a) = 0.)
7. The PDF fX is symmetric about the mean. This has several implications:
(a) P (X ≥ µ + a) = P (X ≤ µ − a) = FX (µ − a).
(b) Since P (X ≥ µ + a) = 1 − P (X ≤ µ + a) = 1 − FX (µ + a), it follows that FX (µ − a) =
1 − FX (µ + a) or, equivalently, FX (µ + a) = 1 − FX (µ − a).
(c) FX (µ) = 1 − FX (µ) = 0.5.
8. P (µ − σ ≤ X ≤ µ + σ) = Φ (1) − Φ (−1) ≈ 0.6827. (There is probability 0.6827 that X
takes on values within 1 standard deviation of the mean.)
9. P (µ − 2σ ≤ X ≤ µ + 2σ) = Φ (2) − Φ (−2) ≈ 0.9545. (There is probability 0.9545 that X
takes on values within 2 standard deviations of the mean.)
10. P (µ − 3σ ≤ X ≤ µ + 3σ) = Φ (3) − Φ (−3) ≈ 0.9973. (There is probability 0.9973 that X
takes on values within 3 standard deviations of the mean.)
11. The PDF fX has two points of inflexion, namely at µ ± σ. (The points of inflexion are
one standard deviation away from the mean.)

Proof. See the next exercise.

Exercise 419. Prove all of the properties listed in Fact 182. (Hint: Use Corollary 33 to
convert X into the SNRV. Then simply apply Fact 180.) (Answer on p. 1588.)

Example 1179. Let G ∼ N(−1, 0.1), H ∼ N(1, 2), and I ∼ N(2, 3). We’ll find P (G < 2)
using our TI84. The first few steps are similar to before:
1. Press the blue 2ND button and then VARS (which corresponds to the DISTR button).
This brings up the DISTR menu.
2. Press 2 to select the “normalcdf” option.
3. Enter the lower bound −1099 by pressing (-) , the blue 2ND button, EE (which cor-
responds to the , button), and then 9 9 . (Don’t press ENTER yet!)
4. Enter the upper bound 2 by pressing , and 2 . (Don’t press ENTER yet!!).

After Step 1. After Step 2. After Step 3. After Step 4.

Previously, we didn’t bother telling the TI84 our mean µ and standard deviation σ.
And so by default, if we pressed ENTER at this point, the TI84 simply assumed that
we wanted the SNRV Z ∼ N(0, 1). Now we’ll tell the TI84 what µ and σ are:
5. First enter the mean µ = −1. Press , (-) 1 .
6. Now enter the standard deviation σ = √0.1 (and not the variance). Press , √ 0
. 1 ) . Finally, press ENTER . The TI84 says that P (G < 2) ≈ 1.

After Step 5. After Step 6.

Finding P (H < 2), P (I < 2), P (−1 < G < 1), P (−1 < H < 1), and P (−1 < I < 1) is similar:

P (H < 2) and P (I < 2) P (−1 < G < 1) P (−1 < H < 1) P (−1 < I < 1)

Since I has mean µ = 2, we should have exactly P (I < 2) = 0.5. So here the TI84 has
actually made a small error in reporting instead that P (I < 2) ≈ 0.5000000005.
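
For comparison, here is a rough Python analogue of the calculator steps above. It is a sketch only: it uses `statistics.NormalDist` from the standard library (Python 3.8+), which, like the TI84, expects the standard deviation rather than the variance. The helper name `prob_between` is ours.

```python
from statistics import NormalDist

# NormalDist takes the standard deviation sigma, NOT the variance.
G = NormalDist(mu=-1, sigma=0.1 ** 0.5)
H = NormalDist(mu=1, sigma=2 ** 0.5)
I = NormalDist(mu=2, sigma=3 ** 0.5)


def prob_between(dist, lower, upper):
    """P(lower < X < upper), the analogue of normalcdf(lower, upper, mu, sigma)."""
    return dist.cdf(upper) - dist.cdf(lower)


print(f"{G.cdf(2):.4f}")  # 1.0000
print(f"{I.cdf(2):.4f}")  # 0.5000
print(f"{H.cdf(2):.4f}")
for name, dist in (("G", G), ("H", H), ("I", I)):
    print(name, f"{prob_between(dist, -1, 1):.4f}")
```

There is no need for a stand-in for −∞ here, because `cdf(2)` already gives P (X ≤ 2) directly.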


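Example 1180. We now redo the previous example, but use Z-tables:

$$P(G < 2) = P\left(Z < \frac{2 - \mu_G}{\sigma_G} = \frac{2 - (-1)}{\sqrt{0.1}} \approx 9.4868\right) = \Phi(9.4868) \approx 1,$$

$$P(H < 2) = P\left(Z < \frac{2 - \mu_H}{\sigma_H} = \frac{2 - 1}{\sqrt{2}} \approx 0.7071\right) = \Phi(0.7071) \approx 0.7601,$$

$$P(I < 2) = P\left(Z < \frac{2 - \mu_I}{\sigma_I} = \frac{2 - 2}{\sqrt{3}} = 0\right) = \Phi(0) = 0.5,$$

$$\begin{aligned} P(-1 < G < 1) &= P\left(0 = \frac{-1 - (-1)}{\sqrt{0.1}} < Z < \frac{1 - (-1)}{\sqrt{0.1}} \approx 6.3246\right) \\ &= \Phi(6.3246) - \Phi(0) \approx 1 - \Phi(0) = 0.5, \end{aligned}$$

$$\begin{aligned} P(-1 < H < 1) &= P\left(-1.4142 \approx \frac{-1 - 1}{\sqrt{2}} < Z < \frac{1 - 1}{\sqrt{2}} = 0\right) \\ &= \Phi(0) - \Phi(-1.4142) = 0.5 - [1 - \Phi(1.4142)] \\ &= \Phi(1.4142) - 0.5 \approx 0.9213 - 0.5 = 0.4213, \end{aligned}$$

$$\begin{aligned} P(-1 < I < 1) &= P\left(-1.7321 \approx \frac{-1 - 2}{\sqrt{3}} < Z < \frac{1 - 2}{\sqrt{3}} \approx -0.5774\right) \\ &= \Phi(-0.5774) - \Phi(-1.7321) = [1 - \Phi(0.5774)] - [1 - \Phi(1.7321)] \\ &= \Phi(1.7321) - \Phi(0.5774) \approx 0.9584 - 0.7182 = 0.2402. \end{aligned}$$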

Exercise 420. Let X ∼ N(2.14, 5) and Y ∼ N(−0.33, 2). Using both the Z-tables
and your graphing calculator, find the following: (a) P (X ≥ 1) and P (Y ≥ 1). (b)
P (−2 ≤ X ≤ −1.5) and P (−2 ≤ Y ≤ −1.5). (Answer on p. 1589.)


103.2. Sum of Independent Normal Random Variables

Theorem 31. If X and Y are independent normal random variables, then X + Y is also
a normal random variable. Moreover, X − Y is also a normal random variable.

Proof. Omitted.

We already knew from before that E [X ± Y ] = E [X] ± E [Y ]. Moreover, if X and Y are

independent, then Var [X ± Y ] = Var [X] + Var [Y ]. Thus, the above theorem implies:

Corollary 34. Let $X \sim N(\mu_X, \sigma_X^2)$ and $Y \sim N(\mu_Y, \sigma_Y^2)$ be independent and a, b ∈ R
be constants. Then $X + Y \sim N(\mu_X + \mu_Y, \sigma_X^2 + \sigma_Y^2)$ and more generally,
$aX + bY \sim N(a\mu_X + b\mu_Y, a^2\sigma_X^2 + b^2\sigma_Y^2)$.
Moreover, $X - Y \sim N(\mu_X - \mu_Y, \sigma_X^2 + \sigma_Y^2)$ and more generally,
$aX - bY \sim N(a\mu_X - b\mu_Y, a^2\sigma_X^2 + b^2\sigma_Y^2)$.



Example 1181. The weight (in kg) of a sumo wrestler is modelled by X ∼ N (200, 50).
Assume that the weight of each sumo wrestler is independent of the weight of any other
sumo wrestler.
We randomly choose two sumo wrestlers.
(a) What is the probability that their total weight is greater than 405 kg?
(b) What is the probability that one is more than 10% heavier than the other?

(a) Let X1 ∼ N (200, 50) and X2 ∼ N (200, 50) be the weight of the first and second sumo
wrestler. Then X1 + X2 ∼ N (400, 100). Thus,

$$P(X_1 + X_2 > 405) = P\left(Z > \frac{405 - 400}{\sqrt{100}}\right) = P(Z > 0.5) = 1 - \Phi(0.5) \approx 1 - 0.6915 = 0.3085.$$

(b) Our goal is to find p = P (X1 > 1.1X2 ) + P (X2 > 1.1X1 ). This is the probability that
the first sumo wrestler is more than 10% heavier than the second, plus the probability
that the second is more than 10% heavier than the first. Of course, by symmetry, these
two probabilities are equal. Thus, p = 2 × P (X1 > 1.1X2 ). Now,

P (X1 > 1.1X2 ) = P (X1 − 1.1X2 > 0) .

But X1 − 1.1X2 ∼ N (200 − 1.1 ⋅ 200, 50 + 1.1² ⋅ 50) = N (−20, 110.5). Thus,

$$P(X_1 > 1.1X_2) = P(X_1 - 1.1X_2 > 0) = P\left(Z > \frac{0 - (-20)}{\sqrt{110.5}}\right) \approx P(Z > 1.9026) = 1 - \Phi(1.9026) \approx 1 - 0.9714 = 0.0286.$$

Altogether then, p = 2P (X1 > 1.1X2 ) = 2 × 0.0286 = 0.0572.
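
As a cross-check on both parts, here is a sketch using Python's `statistics.NormalDist` (standard library, Python 3.8+). Its `+` and `-` operators treat the two operands as independent normal random variables, and multiplying by a constant scales the mean and the standard deviation, which is exactly what Corollary 34 describes. The variable names are ours.

```python
from statistics import NormalDist

# Each wrestler's weight is N(200, 50): variance 50, so sigma = sqrt(50).
X1 = NormalDist(mu=200, sigma=50 ** 0.5)
X2 = NormalDist(mu=200, sigma=50 ** 0.5)

# (a) Total weight of two independent wrestlers.
total = X1 + X2
p_a = 1 - total.cdf(405)

# (b) X1 - 1.1 * X2, then double the probability by symmetry.
diff = X1 - X2 * 1.1
p_b = 2 * (1 - diff.cdf(0))

print(round(total.mean, 4), round(total.variance, 4))  # 400.0 100.0
print(round(diff.mean, 4), round(diff.variance, 4))    # -20.0 110.5
print(f"{p_a:.4f}")                                    # 0.3085
print(f"{p_b:.4f}")
```

The value printed for (b) may differ from 0.0572 in the last decimal place, because the working above rounds Φ(1.9026) to four decimal places before doubling.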


Example 1182. The weight (in kg) of a caught fish is modelled by X ∼ N (1, 0.4). The
weight (in kg) of a caught shrimp is modelled by Y ∼ N (0.1, 0.1). Assume that the
weights of any caught fish and shrimp are independent.
(a) What is the probability that the total weight of 4 caught fish and 50 caught shrimp
is greater than 10 kg?
(b) What is the probability that a caught fish weighs more than 9 times as much as a
caught shrimp?
(a) Let S be the total weight of 4 caught fish and 50 caught shrimp. Note, importantly,
that it would be wrong to write S = 4X + 50Y , because 4X + 50Y would be 4 times the
weight of a single caught fish, plus 50 times the weight of a single caught shrimp.
In contrast, we want S to be the sum of the weights of 4 independent fish and 50 independent
shrimp. Thus, we should instead write S = X1 + X2 + X3 + X4 + Y1 + Y2 + ⋅ ⋅ ⋅ + Y50 , where
• X1 ∼ N (1, 0.4), X2 ∼ N (1, 0.4), X3 ∼ N (1, 0.4), and X4 ∼ N (1, 0.4) are the weights of
each caught fish.
• Y1 ∼ N (0.1, 0.1), Y2 ∼ N (0.1, 0.1), . . . , and Y50 ∼ N (0.1, 0.1) are the weights of each
caught shrimp.
Now, S ∼ N (4 × 1 + 50 × 0.1, 4 × 0.4 + 50 × 0.1) = N (9, 6.6).
(Note by the way that in contrast, 4X + 50Y ∼ N (9, 4² × 0.4 + 50² × 0.1) = N (9, 256.4),
which has a rather different variance!)
Thus, P (S > 10) ≈ 0.3485 (calculator).

(b) P (X > 9Y ) = P (X − 9Y > 0). But X − 9Y ∼ N (1 − 9 × 0.1, 0.4 + 9² × 0.1) = N (0.1, 8.5).
Thus, P (X − 9Y > 0) ≈ 0.5137 (calculator).
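
The difference between "the sum of n independent copies" and "n times a single copy" is easy to get wrong, so here is a small sketch that computes both for part (a), and also re-does part (b). The helper names `sum_of_copies` and `scaled_copy` are ours, for illustration only.

```python
from statistics import NormalDist


def sum_of_copies(mu, variance, n):
    """Mean and variance of X_1 + ... + X_n for independent X_i ~ N(mu, variance)."""
    return n * mu, n * variance


def scaled_copy(mu, variance, c):
    """Mean and variance of c * X for a single X ~ N(mu, variance)."""
    return c * mu, c ** 2 * variance


# Correct model: 4 independent fish plus 50 independent shrimp.
fish_mu, fish_var = sum_of_copies(1, 0.4, 4)
shrimp_mu, shrimp_var = sum_of_copies(0.1, 0.1, 50)
S = NormalDist(fish_mu + shrimp_mu, (fish_var + shrimp_var) ** 0.5)

# Wrong model: 4X + 50Y has the same mean but a far larger variance.
wrong_var = scaled_copy(1, 0.4, 4)[1] + scaled_copy(0.1, 0.1, 50)[1]

print(round(S.mean, 4), round(S.variance, 4))  # 9.0 6.6
print(round(wrong_var, 4))                     # 256.4
print(f"{1 - S.cdf(10):.4f}")                  # 0.3485

# Part (b): X - 9Y ~ N(0.1, 8.5), using the scaled_copy rule for 9Y.
D = NormalDist(1 - 9 * 0.1, (0.4 + scaled_copy(0.1, 0.1, 9)[1]) ** 0.5)
print(f"{1 - D.cdf(0):.4f}")                   # 0.5137
```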

Exercise 421. (Answer on p. 1590.) Water and electricity usage are billed, respectively,
at $2 per 1,000 litres (l) and $0.30 per kilowatt-hour (kWh). Assume that each month,
the amount of water used by Ahmad (and his family) at their HDB flat is normally
distributed with mean 25,000 l and variance 64,000,000 l². Similarly, the amount of
electricity they use is normally distributed with mean 200 kWh and variance 10,000
kWh².
Assume that monthly water usage and electricity usage are independent.
(a) Find the probability that their total water and electricity utility bill in any given
month exceeds $100.
(b) Find the probability that their total water and electricity utility bill in any given year
exceeds $1,000.
Suppose instead that electricity usage is billed at $x per kWh.
(c) Then what is the maximum value of x such that the probability that the total
utility bill in a given month exceeds $100 is 0.1 or less?


103.3. The Central Limit Theorem and The Normal
Suppose we have n independent random variables, each identically-distributed with mean
µ ∈ R and variance σ 2 ∈ R. Then informally, the Central Limit Theorem (CLT) says that if
n is “large enough”, then their sum (which is also a random variable) has the approximate
distribution N (nµ, nσ 2 ). Formally:

Theorem 32. (The Central Limit Theorem.) Let X1 , X2 , . . . , Xn be random variables.
Suppose (i) they are independent; and (ii) they are identically-distributed, with
mean µ ∈ R and variance σ 2 ∈ R.

Then the sum ∑ Xi = X1 + X2 + ⋅ ⋅ ⋅ + Xn converges in distribution to N (nµ, nσ 2 ).



Proof. The proof is a little advanced and thus entirely omitted from this book.

What does it mean for one random variable to “converge in distribution” to another? This
is a little beyond the scope of the A-Levels, but informally, this means that as n → ∞,
the random variable ∑ Xi becomes “ever more” like the random variable with distribution
N (nµ, nσ 2 ).

One big use of the CLT is this:

If n is “large enough”, then the sum of n independent,

identically-distributed random variables can be
approximated by a normal distribution.

How large is “large enough”? The most common rule-of-thumb is that n ≥ 30 is “large
enough”, so that’s what we’ll use in this book, even though this is somewhat arbitrary.
Indeed, if the original distribution from which the random variables are drawn is not
“nice enough”, then n ≥ 30 may not be “large enough”. (Informally, a distribution is “nice
enough” if it is — among other things — fairly symmetric, fairly unimodal, and not too
You can safely assume that all distributions you’ll ever encounter in the A-Levels are “nice
enough”, so that the n ≥ 30 rule-of-thumb works. But whenever you use the CLT normal
approximation, you should clearly state that you are assuming the distribution is “nice enough”.


Example 1183. Let X be the random variable that is the sum of 100 rolls of a fair
die. From our earlier work, we know that each die roll has mean 3.5 and variance 35/12.
Problem: Find P(X ≥ 360) and P(X > 360).
The CLT says that since n = 100 ≥ 30 is large enough and the distribution is “nice
enough” (we are assuming this), the random variable X can be approximated by the
normal random variable Y ∼ N (100 × 3.5, 100 × 35/12) = N (350, 3500/12).
Now, in using Y as an approximation for X, we might be tempted to simply write

P(X ≥ 360) ≈ P(Y ≥ 360) and P(X > 360) ≈ P(Y > 360).

Note however that X is a discrete random variable, so that P(X ≥ 360) ≠ P(X > 360).
More specifically,

P(X ≥ 360) = P(X = 360) + P(X > 360).

In contrast, Y is a continuous random variable, so that P(Y ≥ 360) = P(Y > 360). Hence,
if we simply use the approximations P(X ≥ 360) ≈ P(Y ≥ 360) and P(X > 360) ≈ P(Y >
360), then implicitly we’d be saying that P(X = 360) = 0, which is blatantly false.
To correct for this, we perform the so-called continuity correction. This says that we’ll
instead use the approximations

P(X ≥ 360) ≈ P(Y ≥ 359.5) and P(X > 360) ≈ P(Y ≥ 360.5).

Thus, P(X ≥ 360) ≈ P(Y ≥ 359.5) ≈ 0.2890 (calculator) and P(X > 360) ≈ P(Y ≥ 360.5) ≈ 0.2693 (calculator).

Continuity Correction. If X is a discrete random variable that is to be approximated

by a continuous random variable Y , then
• P (X ≥ k) ≈ P (Y ≥ k − 0.5),
• P (X ≤ k) ≈ P (Y ≤ k + 0.5),
• P (X > k) ≈ P (Y > k + 0.5),
• P (X < k) ≈ P (Y < k − 0.5).

Note that if the random variable to be approximated is itself continuous, then there is no
need to perform the continuity correction. This is illustrated in Exercise 423 below.
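
To see why the continuity correction helps, the sketch below works out the exact distribution of the sum of 100 fair die rolls (by repeatedly adding one die at a time) and compares it with the normal approximation from Example 1183, with and without the correction. This brute-force approach is ours, for illustration only; in the exam you would use the normal approximation. The function name `dice_sum_pmf` is hypothetical.

```python
from statistics import NormalDist


def dice_sum_pmf(n_rolls):
    """Exact distribution of the sum of n_rolls fair dice, as a dict {total: probability}."""
    pmf = {0: 1.0}
    for _ in range(n_rolls):
        new_pmf = {}
        for total, prob in pmf.items():
            for face in range(1, 7):
                new_pmf[total + face] = new_pmf.get(total + face, 0.0) + prob / 6
        pmf = new_pmf
    return pmf


pmf = dice_sum_pmf(100)
exact_ge = sum(p for s, p in pmf.items() if s >= 360)  # exact P(X >= 360)
exact_gt = sum(p for s, p in pmf.items() if s > 360)   # exact P(X > 360)

Y = NormalDist(mu=350, sigma=(3500 / 12) ** 0.5)
approx_ge = 1 - Y.cdf(359.5)  # continuity-corrected P(X >= 360)
approx_gt = 1 - Y.cdf(360.5)  # continuity-corrected P(X > 360)
naive = 1 - Y.cdf(360)        # no correction: the same for ">=" and ">"

print(f"exact:     {exact_ge:.4f} {exact_gt:.4f}")
print(f"corrected: {approx_ge:.4f} {approx_gt:.4f}")  # corrected: 0.2890 0.2693
print(f"naive:     {naive:.4f}")
```

The uncorrected value falls between the two corrected ones, so it cannot be a good approximation to both P(X ≥ 360) and P(X > 360) at once.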

Exercise 422. Let X be the random variable that is the sum of 30 rolls of a fair die.
Find P(100 ≤ X ≤ 110). (Answer on p. 1591.)
Exercise 423. The weight of each Coco-Pop is independently- and identically-distributed
with mean 0.1 g and variance 0.004 g². A box of Coco-Pops has exactly 5,000 Coco-Pops.
It is labelled as having a net weight of 500 g. Find the probability that the actual
net weight of the Coco-Pops in this box is less than or equal to 499 g. (Answer on p.


104. The CLT is Amazing (Optional)
The Fundamental Theorems of Calculus and the CLT are the most profound and amazing
results you’ll learn in H2 Maths. This chapter briefly explains why the CLT is so amazing
and why the normal distribution is ubiquitous.

104.1. The Normal Distribution in Nature

The normal distribution is ubiquitous in nature. The classic example is human height.361

Example 1184. Below is a histogram of the heights of the 4,060 NBA players who ever
played in an NBA game (through the end of the 2016 season). (Heights are reported in
feet and a whole number of inches, where 1 in = 2.54 cm and 1 ft = 12 in, so that 1 ft =
30.48 cm.) The histogram has 28 bins and (arguably) looks normal (bell-shaped).
The width of each bin is 1 inch. For example, the red bin says 410 players have had
reported heights of 6 ft 7 in (approx. 200 cm). The pink (leftmost) bin is barely visible
and says only 1 player has had a reported height of 5 ft 3 in (approx. 160 cm). The
blue (rightmost) bin is also barely visible and says that only 2 players have had reported
heights of 7 ft 7 in (approx. 231 cm). The average or mean height is approx. 6 ft 6 in
(approx. 198 cm).

361 Data: Excel spreadsheet. Source: (retrieved June 15th, 2016). Caveats: (1)
For some reason, out of the 4060 players in that database at the time of retrieval, there was exactly
one player (George Karl) whose height was not listed. lists George Karl’s height as 6 ft
2 in, so that is what I have used for his height. (2) By NBA, I actually mean the BAA (1946-1949),
the NBA (1949-present), and the ABA (1967-1976), combined. (3) As is well-known among basketball
fans, the listed heights of NBA players are not accurate and can sometimes be off by as much as 2 to 3
inches (5 to 7.5 cm). (See this recent Wall Street Journal article.)
Manute Bol (approx 231 cm) and Muggsy Bogues (approx 160
cm) were briefly on the same team. (YouTube highlights.)

An infamous example of the normal distribution concerns human intelligence:


Example 1185. The 1994 book The Bell Curve: Intelligence and Class Structure in
American Life was named after the observation that the Intelligence Quotient (IQ) seems
to be normally-distributed. This observation was neither new nor controversial (though
some scholars would dispute the usefulness of IQ measures).

What made the book especially controversial were its claims that intelligence was largely
heritable and that black Americans had lower intelligence than whites. The figure above
is taken from p. 279 of the book. It suggests that
• Black IQ is normally distributed, with a mean of around 80.
• White IQ is normally distributed, with a mean of around 105.

Another example — the Galton box:


Example 1186. (Galton box.) Small beans are released from the top, through a
narrow passage. There are numerous pegs that tend to randomly divert the path of the beans.
At the bottom of the box, there are many different narrow slots of equal width, into
which the beans can drop. The beans will tend to form a bell shape at the bottom.

Beans just released. Pegs divert beans. Beans fill slots.

(Source: YouTube.)


Many things in nature seem to be normally distributed. Why?

We will try to answer this question, but only after we’ve illustrated how the Central Limit
Theorem works.


104.2. Illustrating the Central Limit Theorem (CLT)

Example 1187. Flip many fair coins. Model each with the Bernoulli random variables
T1, T2, T3, ..., each with probability of success (heads) 0.5.
Let Xn = T1 + T2 + ⋅ ⋅ ⋅ + Tn ∼ B(n, 0.5).
Below are the histograms of the distributions of X1, X2, ..., and X6. X1 has probability
0.5 of taking on each of the values 0 and 1. X2 has probability 0.5 of taking on the value
1, and probability 0.25 of taking on each of the values 0 and 2. Etc.

(Example continues on the next page ...)

(... Example continued from the previous page ...)
On this and the next page are histograms of the distributions of X7, X8, X9, X10, X20,
X30, X40, X50, and X100. Observe that as n grows, the shape of the probability distribution
of Xn looks ever more bell-shaped. This is exactly what the CLT says.

(Example continues on the next page ...)

(... Example continued from the previous page ...)

The CLT says the following:
1. Draw a sufficiently-large number of independent and identical random variables from
ANY distribution.
2. Add them up to get another random variable S.
3. The probability distribution of S will look normal.

I should say nearly any distribution. For the classical CLT to apply, the variance must be finite.
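
For coin flips, the three steps above can be carried out exactly rather than by drawing pictures. Here is a minimal Python sketch (the function names convolve and sum_distribution are my own and purely illustrative). It builds the distribution of a sum of n independent Bernoulli random variables by repeatedly combining the running distribution with one more coin.

```python
def convolve(p, q):
    """Return the pmf of A + B, where A and B are independent.

    p[i] is P(A = i) and q[j] is P(B = j), for values 0, 1, 2, ...
    """
    out = [0.0] * (len(p) + len(q) - 1)
    for i, p_i in enumerate(p):
        for j, q_j in enumerate(q):
            out[i + j] += p_i * q_j
    return out


def sum_distribution(n, success=0.5):
    """Return the pmf of the sum of n independent Bernoulli(success) variables."""
    bernoulli = [1 - success, success]
    dist = [1.0]  # the empty sum equals 0 with probability 1
    for _ in range(n):
        dist = convolve(dist, bernoulli)
    return dist


# Distribution of X2 (two fair coins): values 0, 1, 2.
print(sum_distribution(2))

# For larger n, print the pmf to see the bell shape emerge.
for value, prob in enumerate(sum_distribution(10)):
    print(value, round(prob, 4))
```

Passing success=0.9 instead gives the biased-coin sums, so the same sketch covers both fair and skewed starting distributions.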
What makes the CLT particularly amazing is that it works with ANY distribution.
To illustrate, next up is an example where the original distribution is highly skewed and
does not look at all bell-curved. Nonetheless, the CLT still works out nicely.
Example 1188. Flip many biased coins, each with probability 0.9 of heads. Model
each with the Bernoulli random variables Y1, Y2, Y3, ..., each with probability of success
(heads) 0.9.
Let Sn = Y1 + Y2 + ⋅ ⋅ ⋅ + Yn be the number of heads in the first n coin-flips. (By the way,
Sn ∼ B(n, 0.9).)
On this and the next page are the histograms of the distributions of S1, S2, ..., and S10.
S1 has probability 0.1 of taking on the value 0 and 0.9 of taking on the value 1. S2 has
probability 0.01 of taking on the value 0, 0.18 of taking on the value 1, and 0.81 of taking
on the value 2. S3 has probability 0.001 of taking on the value 0, 0.027 of taking on the
value 1, 0.243 of taking on the value 2, and 0.729 of taking on the value 3. S4 has probability
0.0001 of taking on the value 0, 0.0036 of taking on the value 1, 0.0486 of taking on the
value 2, 0.2916 of taking on the value 3, and 0.6561 of taking on the value 4. Etc.

(Example continues on the next page ...)

(... Example continued from the previous page ...)

It certainly does not look like the distribution of Sn is becoming more bell-shaped.
Well, let’s see.
(Example continues on the next page ...)
(... Example continued from the previous page ...)
Below are the histograms of the distributions of S20, S30, S40, S50, and S100. Remarkably
enough, as n grows, the shape of the probability distribution of Sn looks ever more
bell-shaped. As promised by the CLT.
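
One way to put a number on "looks ever more bell-shaped" is skewness, which is 0 for any perfectly symmetric distribution such as the normal. Skewness is not otherwise used in this textbook, so treat what follows as a rough sketch. It computes the skewness of B(n, 0.9) directly from its probabilities, using helper names of my own choosing.

```python
from math import comb


def binomial_pmf(n, p):
    """Return [P(S = 0), P(S = 1), ..., P(S = n)] for S ~ B(n, p)."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


def skewness(pmf):
    """Third central moment divided by the variance to the power 1.5."""
    mean = sum(k * pk for k, pk in enumerate(pmf))
    var = sum((k - mean) ** 2 * pk for k, pk in enumerate(pmf))
    third = sum((k - mean) ** 3 * pk for k, pk in enumerate(pmf))
    return third / var**1.5


# The skewness moves towards 0 (the value for a symmetric bell) as n grows.
for n in (1, 2, 10, 20, 30, 40, 50, 100):
    print(n, round(skewness(binomial_pmf(n, 0.9)), 3))
```

The printed values are negative (the long tail is on the left) and shrink towards 0 as n grows, which matches what the histograms show.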

104.3. Why Are So Many Things Normally Distributed?
We now return to the question posed earlier:

Many things in nature seem to be normally distributed. Why?

This is a deep question. The standard quick answer is this:

If S is the sum of a very large number of independent random variables,

then by the CLT S is (approximately) normally distributed.

Examples to illustrate:

Example 1189. Assume that human height is entirely determined by 1000 independent
genes (assume all human beings have these 1000 genes).
Assume that each of these 1000 genes is associated with an independent random variable
X1, X2, ..., X1000, each identically distributed with mean µX and variance σX². Assume
also that human height is simply equal to the sum of these random variables. That is, a
human being’s height is simply given by H = X1 + X2 + ⋅ ⋅ ⋅ + X1000.
Then the CLT says that since n = 1000 is “large enough”, H will be approximately
normally distributed, with mean 1000µX and variance 1000σX². Amongst the world’s 7.4
billion people, there will be some very short people and some very tall people, but most
people will be near the mean height 1000µX.

Example 1190. Ah Kow’s Mooncake Factory manufactures mooncakes. Each mooncake
is supposed to weigh exactly 185 g, if the standard recipe is followed with absolute
precision. However, the exact weight of each mooncake will usually vary, due to myriad factors,
such as whether the baker was paying attention, how much water the baker added, how
long the mooncake was left in the oven, the room temperature that day, etc.
Assume there are 300 independent factors that determine the exact weight of a mooncake.
Assume that each of these 300 factors is associated with an independent random variable
Y1, Y2, ..., Y300, each identically distributed with mean µY and variance σY². Assume also
that the weight of a mooncake is simply given by W = Y1 + Y2 + ⋅ ⋅ ⋅ + Y300.
Then the CLT says that since n = 300 is “large enough”, W will be approximately normally
distributed, with mean 300µY and variance 300σY². Amongst the millions of mooncakes
produced, there will be some very light mooncakes and some very heavy mooncakes, but
most mooncakes will be near the mean weight 300µY.
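
To see this in action, here is a small simulation sketch of the mooncake story. The example does not say how each factor is distributed, so I simply assume (purely for illustration) that each Yi is uniform on the interval from 0 to 2 × 185/300, which makes µY = 185/300. If W is approximately normal, then about 68% of simulated weights should fall within one standard deviation of 300µY.

```python
import random

N_FACTORS = 300
TARGET_WEIGHT = 185.0
# Hypothetical assumption: each factor is Uniform(0, UPPER), so its mean is 185/300.
UPPER = 2 * TARGET_WEIGHT / N_FACTORS


def simulate_weights(n_cakes, seed=0):
    """Simulate n_cakes mooncake weights, each the sum of 300 independent factors."""
    rng = random.Random(seed)
    return [
        sum(rng.uniform(0, UPPER) for _ in range(N_FACTORS))
        for _ in range(n_cakes)
    ]


weights = simulate_weights(2000)

# Theoretical mean 300 * mu_Y and standard deviation sqrt(300 * sigma_Y^2),
# where a Uniform(0, b) variable has mean b/2 and variance b^2/12.
theory_mean = N_FACTORS * UPPER / 2
theory_sd = (N_FACTORS * UPPER**2 / 12) ** 0.5

sample_mean = sum(weights) / len(weights)
share_within_1sd = sum(abs(w - theory_mean) <= theory_sd for w in weights) / len(weights)

print("theoretical mean:", theory_mean, "simulated mean:", round(sample_mean, 2))
print("share within 1 standard deviation:", round(share_within_1sd, 3))
```

Swapping the uniform distribution for any other distribution with finite variance should give a similar picture; that is the point of the CLT.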


104.4. Don’t Assume That Everything is Normal
Mathematical modellers often assume that “everything is normal”. There are three justi-
fications for this:
1. We have strong empirical evidence that many things in nature are normally-distributed.
2. We have a strong theoretical reason (the CLT) for why this might be so.
3. The normal distribution is easy to handle (because e.g. the maths is easy, compared to
some other distributions).
However, many things are not normally-distributed. It is thus a mistake to assume that
“everything is normal”.
One example of a common but non-normal distribution found in nature is the Pareto
distribution. We’ll skip the formal details. Informally, it is called the Pareto Principle or
the 80-20 Rule and businesspersons say things like:

“80% of your output is produced by 20% of your employees.”

“80% of your sales come from 20% of your clients.”

It is believed the Pareto distribution is a good description of many aspects of human

performance (though apparently not of height or IQ). By the way, it was named after
Vilfredo Pareto, who in 1896 found that approximately 80% of the land in Italy was owned
by 20% of the population.
Let’s see if the points scored in the NBA resemble the Pareto distribution.363 In particular,
is it the case that 20% of NBA players have scored 80% of the points?

Source: Caveats: (1) The data were retrieved on June 15th, 2016, so the
points scored are between 1946 and that date. (2) By NBA, I actually mean the BAA (1946-1949), the
NBA (1949-present), and the ABA (1967-1976), combined. Data: Excel spreadsheet.
Example 1191. Below is a histogram of the total points scored by each of 4,060 NBA
players. Clearly, the total points scored by each player is not normally distributed.
The histogram has 20 bins of equal width. The leftmost bin says that 2,615 players
scored 0 to 1919 points. The rightmost bin says that only 2 players scored 36,468 to
38,387 points.
The grand total number of points ever scored in the NBA is 11,565,923. Of these,
8,424,242 (or 72.8%) were scored by the top 20% of players (812 players). So it appears that
the 80-20 Rule is a reasonably good description of the distribution of total points scored by players!
In contrast, the normal distribution is obviously not a good description.
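
The arithmetic behind the 72.8% figure is easy to reproduce. The sketch below first checks the two numbers quoted in the example, then defines a small helper (top_share, a name of my own) that computes the share held by the top 20% for any list of totals. The list toy_points is made up purely for illustration and is not the NBA data.

```python
def top_share(values, fraction=0.2):
    """Share of the grand total contributed by the top `fraction` of values."""
    ranked = sorted(values, reverse=True)
    n_top = round(len(ranked) * fraction)
    return sum(ranked[:n_top]) / sum(ranked)


# Figures quoted in the example.
n_players = 4060
n_top_players = round(0.2 * n_players)   # number of players in the top 20%
share = 8_424_242 / 11_565_923           # share of points scored by them
print(n_top_players, round(100 * share, 1))

# Made-up totals for 10 hypothetical players (NOT real data).
toy_points = [100, 50, 10, 5, 5, 4, 3, 2, 1, 0]
print(round(top_share(toy_points), 3))
```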

It’s fairly obvious to anyone who bothers graphing the data that “points scored in the
NBA” is not normally-distributed. There are however instances where this is less obvious.
One is thus more likely to mistakenly assume a normal distribution. A famous and tragic
example of this is given by the financial markets.


Example 1192. The Dow Jones Industrial Average (DJIA) is one of the world’s leading
stock market indices. It is a weighted average of the share prices of 30 of the largest US
companies (e.g. Apple, Coca-Cola, McDonald’s).
Trading starts in the morning and closes in the afternoon (right now, the trading hours
are 9:30 am to 4 pm). The next graph is of the daily closing values for the past 30 years.

[Figure: DJIA (1,000s), Daily Close, 16/06/1986 - 15/06/2016. Red vertical lines indicate
the first trading day of each year.]

Let qi be the % change in closing value on day i, as compared to day i − 1. For example,
on June 14th, 2016, the DJIA closed at 17,674.82. On June 15th, 2016, it closed at
17,640.17, 34.65 points lower than the previous day’s close. Thus,
\[ q_{20160615} = \frac{17640.17 - 17674.82}{17674.82} = \frac{-34.65}{17674.82} \approx -0.20\%. \]

(Example continues on the next page ...)


(... Example continued from the previous page.)
The graph here is of q, on 36,044 consecutive trading days (over 131 years). In black
are those days when the DJIA rose; in red are when it fell.
Can you spot the single largest one-day fall in the DJIA? (We’ll talk about this singular
day shortly.)

[Figure: % Daily Change in DJIA (q), 17/02/1885 - 15/06/2016. Green vertical lines mark
the first trading day of years ending in 0.]

(Example continues on the next page ...)


(... Example continued from the previous page.)
The graph here is also of q, but in the form of a histogram. Each bin has width 0.1%
(except the leftmost and rightmost bins). For example, on 2,204 days (out of 36,044),
q ∈ (−0.1%, 0%] (the DJIA fell by between 0.1% and 0%).
On 78 days, q ≤ −5% (the DJIA fell by more than 5%). On 70 days, q > 5% (the DJIA
rose by more than 5%).

It seems reasonable to say that q is normally-distributed (at least if we ignore the leftmost
and rightmost bins).
(Example continues on the next page ...)


(... Example continued from the previous page.)
The sample mean and standard deviation (from the 36,044 observations) are µ ≈ 0.023%
and σ ≈ 1.064%. So let’s suppose q were normally-distributed with mean µ and variance σ².
If so, then we’d predict (as per the properties of the normal distribution) that:
1. 0.6827 of the time, q is within 1 standard deviation of the mean, i.e. q ∈
(−1.04%, 1.09%).
2. 0.954 of the time, q is within 2 standard deviations of the mean, i.e. q ∈
(−2.10%, 2.15%).
As it actually turned out,
1. 0.7965 of the time (28,709 out of 36,044 days), q was within 1 standard deviation of
the mean. (A bit off, but not too bad.)
2. 0.9536 of the time (34,373 out of 36,044 days), q was within 2 standard deviations of
the mean. (Almost exactly correct!)
In addition to the above “evidence”, we might make the following theoretical argument:
Share prices are affected by a myriad random and arguably-independent factors. Hence,
by the CLT, we’d expect share prices (and thus q as well) to be normally-distributed.
Unfortunately, modelling q as a normal random variable would be a disastrous mistake,
especially when it comes to predicting rare events. If q is indeed normally distributed,
then we’d predict that: The DJIA rises or falls by more than ...
1. ... 5% less than once every 2,000 years.
2. ... 7% less than once every 100 million years.
3. ... 10% less than once every 480,000,000 billion years. (For comparison, the universe
is estimated to be 13.8 billion years old.)
And so, during the 131 years for which we have data, it should have been very unlikely
that the DJIA ever rose or fell by more than 5%.
But as it actually turned out, during these 131 years, the DJIA rose or fell by more than
1. ... 5% 148 times.
2. ... 7% 40 times.
3. ... 10% 10 times.
(Data: Excel spreadsheet.)
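
For the curious, here is a rough sketch of how such predictions can be computed under the (mistaken) normal model, using only Python's standard library. It takes µ = 0.023% and σ = 1.064% from the example and computes the expected number of days, out of 36,044, on which the DJIA would rise or fall by more than a given threshold. Converting these into "once every so many years" would also require an assumption about the number of trading days per year, which I have not made here.

```python
from math import erfc, sqrt

MU = 0.023       # sample mean of q, in %
SIGMA = 1.064    # sample standard deviation of q, in %
N_DAYS = 36_044  # number of trading days in the data


def upper_tail(z):
    """P(Z > z) for a standard normal random variable Z."""
    return 0.5 * erfc(z / sqrt(2))


def prob_move_exceeds(threshold):
    """P(q > threshold) + P(q < -threshold), if q were N(MU, SIGMA^2)."""
    rise = upper_tail((threshold - MU) / SIGMA)
    fall = upper_tail((threshold + MU) / SIGMA)  # by symmetry of the normal
    return rise + fall


# Observed counts are those reported in the example.
for threshold, observed in ((5, 148), (7, 40), (10, 10)):
    expected = N_DAYS * prob_move_exceeds(threshold)
    print(f"more than {threshold}%: expected {expected:.3g} days, observed {observed}")
```

Under the normal model, even the 5% threshold is expected to be crossed on well under one day in the entire sample, which is wildly at odds with the 148 days actually observed.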


105. Statistics: Introduction (Optional)

105.1. Probability vs. Statistics

Probability | Statistics
Given a known model, what can we say about the data we’ll observe? | Given observed data, what can we say about the model?

Example 1193. Let p be the probability of a coin-flip turning up heads. (p is an example

of a parameter.) Flip a coin thrice.
Probability question: “Suppose we know that p = 1/2. Then what can we say about
the probability of observing three heads (i.e. P (HHH))?” (For most probabilists, this
is a simple question with a straightforward answer: P (HHH) = 1/8.)
Statistics question: “Suppose we observe HHH. Then what can we say about p?”
(Different statisticians will give different answers.)
In the real world, we will almost never know what p “truly” is. Instead, we usually only
have some limited data observations (such as observing HHH).
Probability is about making heroic assumptions about what p is, in order to draw infer-
ences about what the observed data will look like.
In contrast, statistics is about using limited, observed data to draw statistical infer-
ences about the model and its parameters.
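
The two directions can be illustrated with a few lines of Python. This is only a sketch: the function name prob_all_heads is mine, and tabulating P(HHH) for several candidate values of p is just one of many ways a statistician might begin to think about the statistics question.

```python
from fractions import Fraction


def prob_all_heads(p, flips=3):
    """Probability that `flips` independent coin-flips all turn up heads."""
    return p**flips


# Probability question: p is known, so P(HHH) follows directly.
print(prob_all_heads(Fraction(1, 2)))

# Statistics question: HHH was observed, but p is unknown.
# How probable would HHH have been under various candidate values of p?
for p in (Fraction(1, 4), Fraction(1, 2), Fraction(3, 4), Fraction(1)):
    print("p =", p, "gives P(HHH) =", prob_all_heads(p))
```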


105.2. Objectivists vs Subjectivists

Example 1194. Ann and Bob are two infinitely intelligent persons. Ann believes that
the probability of rain tomorrow is 0.2 and Bob believes that it is 0.6.
• Objectivist view: There is some single, “correct” probability p of rain tomorrow.
Perhaps no one (except some Supreme Being up above) will ever know what exactly p
is. But in any case, we can say that exactly one of the following must be true:
1. Ann is correct (and Bob is wrong);
2. Bob is correct (and Ann is wrong); or
3. Both Ann and Bob are wrong.

• Subjectivist view: A probability is not some objective, rational thing that exists
outside the mind of any human being. There is no “correct” probability. Instead, a
probability is merely

the degree of belief in the occurrence of an event attributed by a given person at

a given instant and with a given set of information. (De Finetti, infra, pp. 3-4.)

Thus, Ann and Bob can legitimately disagree about the probability of rain tomorrow,
without either being wrong. After all, the numbers 0.2 and 0.6 are merely their personal,
subjective degrees of belief in the likelihood of rain tomorrow.

Bruno de Finetti (1906–1985) was perhaps the most famous and extreme subjectivist ever.
In the preface to his Theory of Probability (1970)364, he wrote:
My thesis, paradoxically, and a little provocatively, but nonetheless genuinely,
is simply this:

PROBABILITY DOES NOT EXIST.

The abandonment of superstitious beliefs about the existence of Phlogiston,
the Cosmic Ether, Absolute Space and Time, ..., or Fairies and Witches
was an essential step along the road to scientific thinking. Probability, too,
if regarded as something endowed with some kind of objective existence, is
no less a misleading misconception, an illusory attempt to exteriorize or
materialize our true probabilistic beliefs.
In this textbook (and also in H2 Maths), we will be strict objectivists. The main practical
implications of being an objectivist are illustrated in the following examples:

Originally published in 1970 in Italian as Teoria delle probabilità. The link is to a recent 2017 English
Example 1195. Judge Ann says the murder suspect is probably innocent. Judge Bob
says the suspect is probably guilty.
Objectivist interpretation: Ann and Bob cannot both be correct. The suspect is
either innocent (with probability 1) or guilty (with probability 1).
In fact, we can go even further and say that both Ann and Bob are talking nonsense. It is
nonsensical to say things like the suspect is “probably” innocent (or “probably” guilty),
because the suspect either is innocent or not.
Subjectivist interpretation: Ann and Bob are perfectly well-entitled to their beliefs.
Moreover, it is perfectly meaningful to say things like the suspect is “probably” innocent
(or “probably” guilty). Ann and Bob do not know for sure whether the suspect is innocent
or guilty. They are thereby perfectly well-entitled to speak probabilistically about the
innocence or guilt of the suspect.

Example 1196. We flip a coin 100 times and get 100 heads.
Given these observed data (100 heads out of 100 flips), what can we say (what statistical
inference can we make) about whether or not the coin is fair?
Subjectivist answer: The coin is probably not fair. (This is perhaps the answer that
most laypersons would give.)
Objectivist answer: The coin either is fair (with probability 1) or isn’t fair (with
probability 1). Subjectivist statements like the coin is “probably” not fair are nonsensical.

Most untrained laypersons are innately subjectivist. Yet in this book (and also for the
A-Levels), you’ll be trained to think like strict objectivists.
Note though that it is not the case that one school of thought is correct and the other
wrong. Both the objectivist and subjectivist schools of thought have merit. The growing
consensus amongst statisticians is to take the best of both worlds.
Nonetheless, in this textbook, we learn only the objectivist interpretation. Not because it
is necessarily superior, but rather because
1. The maths is easier.
2. Tradition: For most of the 20th century, the objectivist interpretation was favoured.


106. Sampling

106.1. Population

Definition 211. A population is any ordered set (i.e. vector) of objects we’re interested in.

A population can be finite or infinite. But to keep things simple, we’ll look at examples
where it is finite.

Example 1197. The two candidates for the 2016 Bukit Batok SMC By-Election are Dr.
Chee Soon Juan and PAP Guy. It is the night of the election and voting has just closed.

Our objects-of-interest are the 23,570 valid ballots cast. (A ballot is simply a piece of
paper on which a vote is recorded. The words ballot and vote are often used interchangeably.)
Arrange the ballots in any arbitrary order. Let v1 = 1 if the first ballot is in favour of Dr.
Chee and v1 = 0 otherwise. Similarly and more generally, for any i = 2, 3, ..., 23570, let
vi = 1 if the ith ballot is in favour of Dr. Chee and vi = 0 otherwise.
Our population here is simply the ordered set P = (v1 , v2 , . . . , v23570 ). So in this example,
the population is simply an ordered set of 1s and 0s.


106.2. Population Mean and Population Variance
The population mean µ is simply the average across all population values. The population
variance σ² is a measure of the variation across all population values. Formally:365

Definition 212. Given a finite population P = (v1, v2, ..., vk), the population mean µ
and population variance σ² are defined by

\[ \mu = \frac{\sum_{i=1}^{k} v_i}{k} = \frac{v_1 + v_2 + \dots + v_k}{k} \quad\text{and}\quad \sigma^2 = \frac{\sum_{i=1}^{k} (v_i - \mu)^2}{k} = \frac{(v_1 - \mu)^2 + (v_2 - \mu)^2 + \dots + (v_k - \mu)^2}{k}. \]

Example 1197 (continued from above). Suppose that of the 23,570 votes, 9,142
were for Dr. Chee and the remaining against. So the vector (v1, v2, ..., v23570) contains
9,142 1s and 14,428 0s.
Then the population mean is
\[ \mu = \frac{v_1 + v_2 + \dots + v_{23570}}{23570} = \frac{9142 \times 1 + 14428 \times 0}{23570} = \frac{9142}{23570} \approx 0.3879. \]
In this particular example, the population values are binary (either 0 or 1). And so
we have a nice alternative interpretation: the population mean is also the population
proportion. In this case, it is the proportion of the population who voted for Dr. Chee.
So here the proportion of votes for Dr. Chee is about 0.3879.
The population variance is
\[ \sigma^2 = \frac{(v_1 - \mu)^2 + (v_2 - \mu)^2 + \dots + (v_{23570} - \mu)^2}{23570} = \frac{9142 \left(1 - \frac{9142}{23570}\right)^2 + 14428 \left(0 - \frac{9142}{23570}\right)^2}{23570} \approx 0.2374. \]
As usual, the variance tells us about the degree to which the vi’s vary. Of course, in
this example, we already know that the vi’s can take on only two values: 0 and 1. So
the variance isn’t terribly interesting or informative in this example. In particular, it
doesn’t tell us anything more that the population mean didn’t already tell us (indeed, it
can be shown that in this example, σ² = µ − µ²).
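
These calculations are easy to check by computer. The following sketch (with function names of my own choosing) builds the population of 9,142 ones and 14,428 zeros and applies the definitions of the population mean and population variance directly, using exact fractions so that the identity σ² = µ − µ² can be verified without rounding error.

```python
from fractions import Fraction

# The population: 9,142 votes for Dr. Chee (1s) and 14,428 votes against (0s).
population = [1] * 9142 + [0] * 14428


def population_mean(values):
    """Average across all population values."""
    return Fraction(sum(values), len(values))


def population_variance(values):
    """Average squared deviation from the population mean (denominator k)."""
    mu = population_mean(values)
    return sum((v - mu) ** 2 for v in values) / len(values)


mu = population_mean(population)
sigma2 = population_variance(population)
print(round(float(mu), 4), round(float(sigma2), 4))
print(sigma2 == mu - mu**2)
```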

In the case of an infinite population, the definitions of µ and σ² must be adjusted slightly, but the
intuition is the same.
106.3. Parameter
Informally, a parameter is some number we’re interested in and which may be calculated
based on the population.
Example 1197 (continued from above). A parameter we might be interested in
is the population mean µ; this is also the proportion of votes in favour of Dr. Chee.
(Another parameter we might be interested in is the population variance σ², but let’s
ignore that for now.)
Voting has just closed. In a few hours’ time (after the vote-counting is done), we will
know what exactly µ is. But right now, we still don’t know what µ is.
Suppose we are impatient and want to know right away what µ might be. In other
words, suppose we want to get an estimate of the true value of µ. What are some
possible methods of getting a quick estimate of µ?
One possibility is to observe a random sample of 100 votes and count the proportion of
these 100 votes that are in favour of Dr. Chee. So for example, say we do this and observe
that 39 out of the 100 votes are for Dr. Chee. That is, we find that the observed sample
mean (which in this context can also be called the observed sample proportion) is
0.39. Then we might conclude:

Based on this observed random sample of 100 votes, we estimate that µ is 0.39.

The layperson might be content with this. But the statistician digs a little deeper and
asks questions such as:
• How do we know if this estimate is “good”?
• What are the criteria to determine whether an estimate is “good”?
We’ll now try to address, if only to a limited extent, these questions. But to do so, we
must first precisely define terms like sample and estimate.


106.4. Distribution of a Population
Informally,366 the distribution of a population tells us
1. The range of possible values taken on by the objects in the population; and
2. The proportion of the population that takes on each possible value.

Example 1197 (continued from above). The population is P = (v1, v2, ..., v23570),
the ordered set of 23,570 ballots. Suppose that of these, 9,142 are votes for Dr. Chee
(hence recorded as 1s) and the remaining 14,428 are for PAP Guy (hence recorded as 0s).
Then the distribution of the population can informally be described in words as:
• A proportion 9142/23570 of the population are 1s, and
• A proportion 14428/23570 of the population are 0s.

Example 1198. The population is P = (3, 4, 7, 7, 2, 3).

Then the distribution of the population can informally be described in words as:
• A proportion 1/6 of the population are 2s;
• A proportion 2/6 of the population are 3s;
• A proportion 1/6 of the population are 4s; and
• A proportion 2/6 of the population are 7s.
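
The informal description above amounts to counting how often each value appears and dividing by the population size. A minimal sketch, using a function name of my own:

```python
from collections import Counter
from fractions import Fraction


def population_distribution(population):
    """Map each distinct value to the proportion of the population taking it."""
    counts = Counter(population)
    size = len(population)
    return {value: Fraction(count, size) for value, count in sorted(counts.items())}


distribution = population_distribution((3, 4, 7, 7, 2, 3))
for value, proportion in distribution.items():
    # Fractions print in lowest terms, so 2/6 appears as 1/3.
    print(value, proportion)
```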

Formally, we’d define the population distribution as a function. Indeed, some writers define the popu-
lation itself as the distribution function.
106.5. A Random Sample
Informally, to observe a random sample of size n, we follow this procedure: Imagine the
23,570 ballots are in a single big bag.
1. Randomly pull out one ballot. Record the vote (either we write x1 = 1, if the vote was
for Dr. Chee, or we write x1 = 0, if it wasn’t).
2. Put this ballot back in (this second step is why we call it sampling with replacement).
3. Repeat the above n times in total, so as to record down the values of x1 , x2 , . . . , xn .
We call (x1 , x2 , . . . , xn ) an observed random sample of size n. Note that this is an
ordered set (or vector) of numbers. Formally:

Definition 213. Let P be a population. Then the random vector (i.e. ordered set of
random variables) (X1, X2, ..., Xn) is a random sample of size n from the population P if

• X1, X2, ..., Xn are independent; and

• X1, X2, ..., Xn are identically-distributed, with the same distribution as P.

As always, we must be careful to distinguish between a function and a value taken on by

the function. This table summarises.

Function | Value taken by the function
f is a function | f(x) is a possible value taken on by the function
X is a random variable | x is a possible observed value of the random variable
(X1, X2, ..., Xn) is a random sample | (x1, x2, ..., xn) is a possible observed random sample

An example to illustrate:


Example 1197 (continued from above). To repeat, the distribution of the population
P = (v1, v2, ..., v23570) can informally be described in words as:
• 9142/23570 of the population were 1s; and
• 14428/23570 of the population were 0s.
Let X1, X2, and X3 be independent random variables, each with the same distribution
as the population. That is, for each i = 1, 2, 3,
\[ P(X_i = 0) = \frac{14428}{23570} \quad\text{and}\quad P(X_i = 1) = \frac{9142}{23570}. \]
The ordered set (or vector) (X1 , X2 , X3 ) is a random sample of size 3.
An example of an observed random sample of size 3 might be (x1 , x2 , x3 ) = (1, 1, 0)
— this would be where we randomly sample 3 ballots (with replacement) and find that
the first two are votes for Dr. Chee but the third is not.
Another example of an observed random sample of size 3 might be (x1 , x2 , x3 ) = (0, 0, 0)
— this would be where we randomly sample 3 ballots (with replacement) and find that
none of the three are for Dr. Chee.
As another example, (X1 , X2 , X3 , X4 , X5 ) is a random sample of size 5.
An example of an observed random sample of size 5 might be (x1 , x2 , x3 , x4 , x5 ) =
(0, 1, 0, 1, 0) — this would be where we randomly sample 5 ballots (with replacement)
and find that only the second and fourth are votes for Dr. Chee.
Another example of an observed random sample of size 5 might be (x1 , x2 , x3 , x4 , x5 ) =
(1, 1, 0, 1, 1) — this would be where we randomly sample 5 ballots (with replacement)
and find that only the third is not a vote for Dr. Chee.
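
The bag-of-ballots procedure (pull one out, record it, put it back, repeat) can be sketched in Python as follows. The function name and the seed 2016 are arbitrary choices of mine. Each run with a different seed gives a possibly different observed random sample, which is precisely why the random sample itself is a vector of random variables.

```python
import random

# The population: 9,142 ballots for Dr. Chee (1s) and 14,428 against (0s).
population = [1] * 9142 + [0] * 14428


def observe_random_sample(population, n, rng):
    """Draw n ballots one at a time, WITH replacement, recording each value."""
    return tuple(rng.choice(population) for _ in range(n))


rng = random.Random(2016)  # arbitrary seed, only so that reruns are repeatable
observed_sample = observe_random_sample(population, 5, rng)
print(observed_sample)
```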

In this textbook, we’ll be very careful to distinguish between a random sample (which is
a vector of random variables) and an observed random sample (which is a vector of real
numbers). This may be contrary to the practice of your teachers or indeed even the A-Level exams.


106.6. Sample Mean and Sample Variance

Definition 214. Let (X1, X2, ..., Xn) be a random sample of size n. Then the corresponding
sample mean X̄ and the sample variance S² are the random variables defined by
\[ \bar{X} = \frac{X_1 + X_2 + \dots + X_n}{n}, \]
\[ S^2 = \frac{(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + \dots + (X_n - \bar{X})^2}{n - 1} = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}. \]

(The List of Formulae (MF26) will contain the observed sample variance.)
Note that strangely enough, the denominator of S 2 is n − 1, rather than n as one might
expect. As we’ll see later, there is a good reason for this.
By the way, there are two other formulae for calculating the sample variance:

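Fact 183. Let S = (X1, X2, ..., Xn) be a random sample of size n. Let X̄ be the sample
mean and S² be the sample variance. Let a ∈ R be a constant. Then
\[ \text{(a)}\quad S^2 = \frac{\sum_{i=1}^{n} X_i^2 - \frac{\left(\sum_{i=1}^{n} X_i\right)^2}{n}}{n - 1} \quad\text{and}\quad \text{(b)}\quad S^2 = \frac{\sum_{i=1}^{n} (X_i - a)^2 - \frac{\left[\sum_{i=1}^{n} (X_i - a)\right]^2}{n}}{n - 1}. \]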

(The List of Formulae (MF26) has a but not b.)

Proof. Optional, see p. 1378 in the Appendices.
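
This is not a proof, but the two formulae can be checked numerically against the definition of the sample variance. In the sketch below, the observed sample (0, 1, 0, 0, 1) and the constant a = 170 are arbitrary choices of mine, and exact fractions are used so that the three results can be compared with ==.

```python
from fractions import Fraction


def s2_definition(xs):
    """Observed sample variance straight from the definition."""
    n = len(xs)
    x_bar = Fraction(sum(xs), n)
    return sum((x - x_bar) ** 2 for x in xs) / (n - 1)


def s2_formula_a(xs):
    """Formula (a): uses the sum of squares and the square of the sum."""
    n = len(xs)
    return (sum(x * x for x in xs) - Fraction(sum(xs) ** 2, n)) / (n - 1)


def s2_formula_b(xs, a):
    """Formula (b): same as (a), but applied to the shifted values x - a."""
    n = len(xs)
    shifted = [x - a for x in xs]
    return (sum(d * d for d in shifted) - Fraction(sum(shifted) ** 2, n)) / (n - 1)


xs = (0, 1, 0, 0, 1)
print(s2_definition(xs), s2_formula_a(xs), s2_formula_b(xs, 170))
```

Formula (b) is handy when the observations are large numbers clustered around some convenient value a, since subtracting a first keeps the intermediate sums small.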


Once again, it is important to distinguish between
• The sample mean X̄ (a random variable) vs. the observed sample mean x̄ (a real
number).
• The sample variance S² (a random variable) vs. the observed sample variance s²
(a real number).

Example 1197 (continued from above). Let (X1, X2, X3) be a random sample of
size 3. The corresponding sample mean X̄ and sample variance S² are these random
variables:
\[ \bar{X} = \frac{X_1 + X_2 + X_3}{3}, \qquad S^2 = \frac{(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + (X_3 - \bar{X})^2}{3 - 1}. \]
Suppose our observed random sample of size 3 is (1, 0, 0). Then the corresponding
observed sample mean x̄ and observed sample variance s² are these real numbers:
\[ \bar{x} = \frac{x_1 + x_2 + x_3}{n} = \frac{1 + 0 + 0}{3} = \frac{1}{3}, \]
\[ s^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + (x_3 - \bar{x})^2}{n - 1} = \frac{\left(1 - \frac{1}{3}\right)^2 + \left(0 - \frac{1}{3}\right)^2 + \left(0 - \frac{1}{3}\right)^2}{3 - 1} = \frac{1}{3}. \]
Let (X1, X2, X3, X4, X5) be a random sample of size 5. The corresponding sample mean
X̄ and sample variance S² are these random variables:
\[ \bar{X} = \frac{X_1 + X_2 + X_3 + X_4 + X_5}{5}, \qquad S^2 = \frac{(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + \dots + (X_5 - \bar{X})^2}{5 - 1}. \]
Suppose our observed random sample of size 5 is (0, 1, 0, 0, 1). Then the corresponding
observed sample mean x̄ and observed sample variance s² are these real numbers:
\[ \bar{x} = \frac{x_1 + x_2 + x_3 + x_4 + x_5}{n} = \frac{0 + 1 + 0 + 0 + 1}{5} = \frac{2}{5} = 0.4, \]
\[ s^2 = \frac{(x_1 - \bar{x})^2 + (x_2 - \bar{x})^2 + (x_3 - \bar{x})^2 + (x_4 - \bar{x})^2 + (x_5 - \bar{x})^2}{n - 1} = \frac{\left(0 - \frac{2}{5}\right)^2 + \left(1 - \frac{2}{5}\right)^2 + \left(0 - \frac{2}{5}\right)^2 + \left(0 - \frac{2}{5}\right)^2 + \left(1 - \frac{2}{5}\right)^2}{5 - 1} = 0.3. \]
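
Python's built-in statistics module can be used to recompute the observed sample mean and observed sample variance for the two observed samples in this example. Its variance function divides by n − 1, matching Definition 214. The wrapper name below is my own, and the observations are converted to exact fractions so that no rounding is involved.

```python
from fractions import Fraction
from statistics import mean, variance


def observed_mean_and_variance(observed_sample):
    """Return (x-bar, s^2), where s^2 uses the n - 1 denominator."""
    data = [Fraction(x) for x in observed_sample]
    return mean(data), variance(data)


for observed_sample in ((1, 0, 0), (0, 1, 0, 0, 1)):
    x_bar, s2 = observed_mean_and_variance(observed_sample)
    print(observed_sample, "mean:", x_bar, "variance:", s2)
```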


We call a random variable an estimator if it is used to generate estimates (“guesses”)
for some parameter. Example:
Example 1197 (continued from above). It is the night of the election and polling
has just closed. We still do not know the true proportion µ that voted for Dr. Chee.
We decide to get a random sample of size 3: (X1 , X2 , X3 ). The corresponding sample
mean X̄3 = (X1 + X2 + X3 ) /3 shall be an estimator for µ. (Informally, an estimator is
a method for generating “guesses” for some unknown parameter, in this case µ.)
This estimator is used to generate estimates (“guesses”) for µ. For every observed
random sample, the estimator generates an estimate.
Suppose our observed random sample of size 3 is (1, 0, 0). We calculate the corresponding
observed sample mean to be x̄ = 1/3. We say that x̄ = 1/3 is an estimate for µ.
(By the way, unless we are extremely lucky, it is highly unlikely that the true value of
the unknown parameter µ is precisely 1/3. After all, 1/3 is merely an estimate obtained
from a single observed random sample of size 3.)
Suppose instead that our observed random sample of size 3 were (0, 1, 1). Then the
corresponding observed sample mean would be x̄ = 2/3. We’d instead say that x̄ = 2/3 is
our estimate for µ.
There is also more than one estimator we can use. For example, suppose instead that
we decide to get a random sample of size 5: (X1 , X2 , X3 , X4 , X5 ). We shall instead use
the corresponding sample mean X̄ = (X1 + X2 + X3 + X4 + X5 ) /5 as our estimator for µ.
And so, for example, suppose our observed random sample of size 5 is (0, 1, 0, 0, 1). Then
the corresponding observed sample mean is x̄ = 0.4, and x̄ = 0.4 would be our estimate for µ.
Now, are these estimators and estimates “good” or “reliable”? How much should we
trust them? These are questions that we’ll address in the next section.

1056, Contents

A different example:

Example 1199. Suppose we wish to find the average height µ (in cm) of an adult male.
As a practical matter, it would be quite difficult to locate and record the height of every
adult male in the world. So instead, what we might do is to randomly pick 4 adult males
and record their heights. This gives us a random sample (H1 , H2 , H3 , H4 ) of heights. The
corresponding sample mean is the random variable H̄ = (H1 + H2 + H3 + H4 ) /4. H̄ shall
serve as our estimator for µ.
Suppose our observed random sample is (h1 , h2 , h3 , h4 ) = (178, 165, 182, 175).
Then the corresponding observed sample mean is
$$\bar{h} = \frac{h_1 + h_2 + h_3 + h_4}{n} = \frac{178 + 165 + 182 + 175}{4} = 175.$$

Thus, h̄ = 175 serves as an estimate (or “guess”) of the true average male height µ.
Again, are the estimator H̄ and estimate h̄ = 175 “good” or “reliable”? How much
should we trust them? These are questions that we’ll address in the next section.

1057, Contents

Example 1200. Let X be the random variable that is the height (in cm) of an adult
female Singaporean. Our parameters-of-interest are the true population mean µ and true
population variance σ 2 of X. We wish to generate estimates for µ and σ 2 .
To this end, we get a random sample of size 8: (X1 , X2 , . . . , X8 ). The corresponding
sample mean X̄ = (X1 + X2 + ⋅ ⋅ ⋅ + X8 ) /8 will serve as our estimator for µ. And the
corresponding sample variance S² = ∑ (Xi − X̄)² /(8 − 1) will serve as our estimator for σ².
(a) Suppose our observed random sample is such that

$$\sum_{i=1}^{8} x_i = 1{,}320 \quad \text{and} \quad \sum_{i=1}^{8} x_i^2 = 218{,}360.$$

Then the observed sample mean x̄ and the observed sample variance s² are

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{1320}{8} = 165,$$

$$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1} = \frac{218360 - \frac{1320^2}{8}}{7} = 80.$$

And our estimates for µ and σ² are, respectively, 165 cm and 80 cm².
(b) Suppose instead our observed random sample is such that

$$\sum_{i=1}^{8} (x_i - 160) = 72 \quad \text{and} \quad \sum_{i=1}^{8} (x_i - 160)^2 = 1{,}560.$$

Then the observed sample mean x̄ and the observed sample variance s² are

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{\sum_{i=1}^{n} (x_i - 160 + 160)}{n} = \frac{\sum_{i=1}^{n} (x_i - 160)}{n} + 160 = \frac{72}{8} + 160 = 169,$$

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - 160)^2 - \frac{\left[\sum_{i=1}^{n} (x_i - 160)\right]^2}{n}}{n - 1} = \frac{1560 - \frac{72^2}{8}}{7} \approx 130.3.$$

And our estimates for µ and σ² are, respectively, 169 cm and 130.3 cm².
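
Both parts of this example use only the sample size and two sums, so the arithmetic is easy to sketch in Python. The function name mean_and_variance_from_sums and its shift parameter are hypothetical names chosen for illustration. The sketch relies on the fact that subtracting a constant from every observation changes the mean by that constant but leaves the variance unchanged.

```python
from fractions import Fraction

def mean_and_variance_from_sums(n, sum_x, sum_x_squared, shift=0):
    """Observed sample mean and variance from summary sums.

    sum_x is the sum of (x_i - shift) and sum_x_squared is the sum of
    (x_i - shift)^2. With shift = 0 these are the plain sums.
    """
    mean = Fraction(sum_x, n) + shift
    variance = (sum_x_squared - Fraction(sum_x ** 2, n)) / (n - 1)
    return mean, variance

# Part (a): plain sums.
print(mean_and_variance_from_sums(8, 1320, 218360))
# Part (b): sums of deviations from 160.
mean_b, var_b = mean_and_variance_from_sums(8, 72, 1560, shift=160)
print(mean_b, float(var_b))
```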

1058, Contents

Exercise 424. Calculate the observed sample mean and variance for the following ob-
served random sample of size 7: (3, 14, 2, 8, 8, 6, 0). (Answer on p. 1592.)

Exercise 425. (Answer on p. 1592.) Let X be the random variable that is the weight
(in kg) of an American. Suppose we are interested in estimating the true population mean
µ and variance σ 2 of X. We get an observed random sample of size 10: (x1 , x2 , . . . , x10 ).
(a) Suppose you are told that $\sum_{i=1}^{10} x_i = 1{,}885$ and $\sum_{i=1}^{10} x_i^2 = 378{,}265$. Find the observed
sample mean x̄ and observed sample variance s².
(b) Suppose you are instead told that $\sum_{i=1}^{10} (x_i - 50) = 1{,}885$ and $\sum_{i=1}^{10} (x_i - 50)^2 = 378{,}265$.
Find the observed sample mean x̄ and observed sample variance s².

1059, Contents

106.7. Sample Mean and Sample Variance are Unbiased Estimators
Earlier we asked: How do we decide if an estimator and the estimates it generates are
“good”? How do we know whether to trust any given estimate?
For H2 Maths, we’ll learn only about one (important) criterion for deciding whether an
estimator is “good”. This is unbiasedness. Informally, an estimator is unbiased if on
average, the estimator “gets it right”. Formally:

Definition 215. Let X be a random variable and θ ∈ R be a parameter (i.e. just some
real number). We say that X is an unbiased estimator for θ if

E [X] = θ.

If x is an estimate generated by an unbiased estimator X, then we call x an unbiased estimate for θ.


The next proposition says that the sample mean X̄ is an unbiased estimator for the
population mean µ; and the sample variance S 2 is an unbiased estimator for the
population variance σ 2 .

Proposition 17. Let (X1 , X2 , . . . , Xn ) be a random sample of size n drawn from a dis-
tribution with population mean µ and population variance σ 2 . Let X̄ be the sample mean
and S 2 be the sample variance. Then
(a) E [X̄] = µ. And
(b) E [S 2 ] = σ 2 .

Proof. You are asked to prove (a) in Exercise 427. For the proof of (b), see p. 1379 in the
Appendices (optional).

Proposition 17(b) is the reason why, strangely enough, we define the sample variance with
n − 1 in the denominator:

$$S^2 = \frac{(X_1 - \bar{X})^2 + (X_2 - \bar{X})^2 + \dots + (X_n - \bar{X})^2}{n - 1}.$$

As defined, S 2 is an unbiased estimator for the population variance σ 2 . This, then, is the
reason why we define it like this.
Some writers call S 2 the unbiased sample variance, but we shall not bother doing so. We’ll
simply call S 2 the sample variance.

1060, Contents

Example 1197 (continued from above). (Chee Soon Juan election.)
Suppose two observed random samples of size 3 are (x1 , x2 , x3 ) = (1, 0, 0) and (x1 , x2 , x3 ) =
(1, 0, 1). The corresponding observed sample means are x̄1 = 1/3 and x̄2 = 2/3. These are
two possible estimates (“guesses”) of the true population proportion µ.
Unless we’re extremely lucky, it’s unlikely that either of these two estimates is exactly
correct. Nonetheless, what the above unbiasedness proposition tells us is this:
Suppose the unknown population mean is µ = 0.39. We draw the following 10 observed
random samples of size 3 (table below). For each sample i, we calculate the corresponding
observed sample mean x̄i .

Sample i   x1   x2   x3   x̄i
    1       1    0    1   2/3
    2       0    0    0   0
    3       0    1    0   1/3
    4       1    0    0   1/3
    5       0    1    1   2/3
    6       1    0    0   1/3
    7       0    0    0   0
    8       0    0    0   0
    9       0    0    1   1/3
   10       1    1    0   2/3

Note that every estimate x̄i is wrong. Indeed, since the sample mean X̄i can only take
on values 0, 1/3, 2/3, or 1, the estimates can never possibly be equal to the true µ = 0.39.
Nonetheless, what the above proposition says informally is that on average, the estimate
gets it correct. Formally, E [X̄] = µ = 0.39.
For a demonstration that you can play around with, try this Google spreadsheet.
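
For readers who prefer code to spreadsheets, here is a minimal sketch that checks unbiasedness exactly for this example. It assumes each vote is an independent draw that equals 1 with probability µ = 0.39 and 0 otherwise. It lists all 8 possible samples of size 3, weights each by its probability, and adds up. The function name expected_mean_and_variance is a hypothetical choice.

```python
from fractions import Fraction
from itertools import product

def expected_mean_and_variance(n, p):
    """Exact E[sample mean] and E[sample variance] for n independent
    draws that each equal 1 with probability p and 0 otherwise."""
    e_mean = Fraction(0)
    e_var = Fraction(0)
    for sample in product((0, 1), repeat=n):
        ones = sum(sample)
        prob = p ** ones * (1 - p) ** (n - ones)  # probability of this sample
        mean = Fraction(ones, n)
        var = sum((x - mean) ** 2 for x in sample) / (n - 1)
        e_mean += prob * mean
        e_var += prob * var
    return e_mean, e_var

p = Fraction(39, 100)
e_mean, e_var = expected_mean_and_variance(3, p)
print(e_mean, p)            # expected sample mean vs population mean
print(e_var, p * (1 - p))   # expected sample variance vs population variance
```

The population variance of such a 0-or-1 random variable is p(1 − p), which is what the second printed line compares against.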

1061, Contents

Exercise 426. (Answer on p. 1592.) We are interested in the weight (in kg) of Singa-
poreans. We have an observed random sample of size 5: (32, 88, 67, 75, 56).
(a) Find unbiased estimates for the population mean µ and variance σ 2 of the weights of
Singaporeans. (State any assumptions you make.)
(b) What is the average weight of a Singaporean?
Exercise 427. Prove that E [X̄] = µ. (This is part (a) of Proposition 17). (Answer on
p. 1593.)

Exercise 428. Suppose we flip a coin 10 times. The first 7 flips are heads and the next
3 are tails. Let 1 denote heads and 0 denote tails. (Answer on p. 1593.)
(a) Write down, in formal notation, our observed random sample, the observed sample
mean, and observed sample variance.
(b) Are these observed sample mean and variance unbiased estimates for the true popu-
lation mean and variance?
(c) Can we conclude that this is a biased coin (i.e. the true population mean is not 0.5)?

1062, Contents

106.8. The Sample Mean is a Random Variable
This section is just to repeat, stress, and emphasise that the sample mean X̄ is itself a
random variable. This is an important point.
Indeed, the sample mean X̄ is both (i) a random variable; and (ii) an estimator. In
contrast, an observed sample mean x̄ is both (i) a real number; and (ii) an estimate.
We’ve shown that E [X̄] = µ. This equation can be interpreted in two equivalent ways:
• The expected value of the sample mean equals the population mean µ.
• The sample mean is an unbiased estimator for the population mean µ.
We now give the variance of the sample mean. It turns out to be equal to the population
variance σ 2 , divided by the sample size n.

Fact 184. Var [X̄] = σ²/n.

Proof. You are asked to prove this fact in Exercise 429 .

Exercise 429. Prove Fact 184. (Hint: Note that X̄ = (X1 + X2 + ⋅ ⋅ ⋅ + Xn ) /n and X1 , X2 ,
. . . , Xn are independent.) (Answer on p. 1593.)

Exercise 430. For each of the following terms, give a formal definition and an intuitive
explanation. (State whether each term is a random variable or a real number.) For
simplicity, you may assume that the finite population is given by P = (x1 , x2 , . . . , xk ).
(Answer on p. 1594.)
(a) The population mean.
(b) The population variance.
(c) The sample mean.
(d) The sample variance.
(e) The mean of the sample mean.
(f) The variance of the sample mean.
(g) The mean of the sample variance.
(h) The observed sample mean.
(i) The observed sample variance.

1063, Contents

106.9. The Distribution of the Sample Mean

Fact 185. Let X1 , X2 , . . . , Xn ∼ N (µ, σ²) be independent random variables. Then

$$\bar{X}_n = \frac{X_1 + X_2 + \dots + X_n}{n} \sim N\left(\mu, \frac{\sigma^2}{n}\right).$$

Proof. Corollary 34 tells us that the sum of normal random variables is itself a normal
random variable. So X1 + X2 + ⋅ ⋅ ⋅ + Xn is a normal random variable.
Fact 181 tells us that a linear transformation of a normal random variable is itself a normal
random variable. So X̄n = (X1 + X2 + ⋅ ⋅ ⋅ + Xn ) /n is a normal random variable.
In the previous sections, we already showed that X̄n has mean µ and variance σ²/n.
Altogether then, X̄n ∼ N (µ, σ²/n).
Now, suppose instead X1 , X2 , . . . , Xn are not normally-distributed. Surprisingly, a similar
result still holds, thanks to the CLT. Informally, draw X1 , X2 , . . . , Xn from any distribution.
Then thanks to the CLT, it will still be the case that — provided n is “large enough” —
X̄n is (approximately) normally-distributed. Formally:

Fact 186. Let X1 , X2 , . . . , Xn be independent random variables, each identically-distributed with mean µ ∈ R and variance σ² ∈ R. Let

$$\bar{X}_n = \frac{X_1 + X_2 + \dots + X_n}{n}.$$

Then

$$\lim_{n \to \infty} \bar{X}_n \sim N\left(\mu, \frac{\sigma^2}{n}\right).$$

Proof. The CLT says that if n is “large enough”, then X1 +X2 +⋅ ⋅ ⋅+Xn is well-approximated
by the normal distribution N (nµ, nσ 2 ).
And so it follows from Fact 181 (a linear transformation of a normal random variable is
itself a normal random variable) that X̄ = (X1 + X2 + ⋅ ⋅ ⋅ + Xn ) /n is well-approximated by
the normal distribution N (µ, σ²/n).
In the next chapter, we’ll make greater use of the two results given in this section.
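
To get a feel for how good the normal approximation is, here is a rough sketch. It assumes, purely for illustration, that each Xi equals 1 with probability 0.3 and 0 otherwise, so that the exact distribution of the sum is binomial. It then compares the exact probability P (X̄n ≤ 0.35) with the value given by N (µ, σ²/n) for three sample sizes. The function names are hypothetical.

```python
from math import comb, erf, sqrt

def normal_cdf(x, mean, variance):
    """P(Z <= x) for Z ~ N(mean, variance)."""
    return 0.5 * (1 + erf((x - mean) / sqrt(2 * variance)))

def exact_cdf_of_mean(k, n, p):
    """P(sample mean <= k/n) = P(X1 + ... + Xn <= k) for 0-or-1 draws."""
    return sum(comb(n, j) * p ** j * (1 - p) ** (n - j) for j in range(k + 1))

def gap(n, k, p):
    """Absolute difference between the exact and approximate probabilities."""
    approx = normal_cdf(k / n, p, p * (1 - p) / n)
    return abs(exact_cdf_of_mean(k, n, p) - approx)

p = 0.3
for n, k in [(20, 7), (100, 35), (400, 140)]:  # k/n = 0.35 in each case
    print(n, round(exact_cdf_of_mean(k, n, p), 4), round(gap(n, k, p), 4))
```

In this illustration the gap shrinks as n grows, which is the sense in which n must be “large enough”.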

1064, Contents

106.10. Non-Random Samples
Some examples to illustrate the concept of a non-random sample:

Example 1201. Suppose we’re interested in the average height of a Singaporean. The
only way to know this for sure is to survey every single Singaporean. This, however, is
not practical.
Instead, we have only the resources to survey 100 individuals. We decide to go to a
basketball court and measure the heights of 100 people there. We thereby gather an
observed sample of size 100: (x1 , x2 , . . . , x100 ). We find that the average individual’s
height is x̄ = ∑ xi /100 = 179 cm.
Is x̄ = 179 cm an unbiased estimate of the average Singaporean’s height? Intuitively, we
know that the answer is obviously no.
The reason is that our observed sample of size 100 was non-random. We picked a basket-
ball court, where the individuals are overwhelmingly (i) male; and (ii) taller than average.
Our estimate x̄ = 179 cm is thus probably biased upwards.

Example 1202. Suppose we’re interested in what the average Singaporean family spends
on food each month. The only way to know this for sure is to survey every single family
in Singapore. This, however, is not practical.
Instead, we have only the resources to survey 100 families. We decide to go to Sixth
Avenue and randomly ask 100 families living there what they reckon they spend on food
each month. We thereby gather an observed sample of size 100: (x1 , x2 , . . . , x100 ). We
find that the average family spends x̄ = ∑ xi /100 = $2,700 on food each month.
Is x̄ = $2,700 an unbiased estimate of the average monthly spending on food by a Singaporean family? Intuitively, we know that the answer is obviously no.
The reason is that our observed sample of size 100 was non-random. We picked an
unusually affluent neighbourhood. Our estimate x̄ = $2,700 is thus probably biased upwards.

1065, Contents

107. Null Hypothesis Significance Testing (NHST)
Here’s a quick sketch of how Null Hypothesis Significance Testing (NHST) works:

Example 1203. A piece of equipment has probability θ of breaking down. We have

many pieces of the same type of equipment. Assume the rates of breakdown across the
pieces of equipment are identical and independent.

1. Write down a null hypothesis H0 . In this case, it might be “H0 : θ = 0.6”.

2. Write down an alternative hypothesis HA . In this case, it might be “HA : θ < 0.6”.

(This is a one-tailed test — to be explained shortly.)

3. Observe a random sample. For example, we might have an observed random sample
of size 5, where only the fourth piece of equipment breaks down. And so we’d write
(x1 , x2 , x3 , x4 , x5 ) = (0, 0, 0, 1, 0).
4. Write down a test statistic. In this case, an obvious test statistic is the sample
number of failures T = X1 + X2 + X3 + X4 + X5 . Our observed test statistic is thus
t = x1 + x2 + x3 + x4 + x5 = 0 + 0 + 0 + 1 + 0 = 1.
5. Now ask, how likely is it that — if H0 were true — our test statistic would have been
“at least as extreme as” that actually observed? That is, what is the probability
P (Observe data as extreme as that observed∣H0 )?

The above probability is called the p-value of the observed sample.

In this case, the p-value is the probability of observing a random sample where 1 or fewer
pieces of equipment broke down, assuming H0 ∶ θ = 0.6 were true. That is,

p = P (T ≤ t = 1∣H0 ) .

Now, remember that T is a random variable. In fact, it’s a binomial random variable.
Assuming H0 to be true, we have T ∼ B (n, θ) = B (5, 0.6). Thus,

$$p = P(T \le 1 \mid H_0) = P(T = 0 \mid H_0) + P(T = 1 \mid H_0) = \binom{5}{0} 0.6^0 \, 0.4^5 + \binom{5}{1} 0.6^1 \, 0.4^4 = 0.08704.$$

This says that if H0 were true, then the probability of observing a test statistic as extreme
as the one we actually observed is only 0.08704. We might interpret this relatively small
p-value as casting doubt on or providing evidence against H0 .
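
The p-value in this example is a short binomial sum, which can be sketched in Python as follows. The function names binomial_pmf and lower_tail_p_value are hypothetical. The lower tail is used because HA here is θ < 0.6.

```python
from math import comb

def binomial_pmf(k, n, theta):
    """P(T = k) when T ~ B(n, theta)."""
    return comb(n, k) * theta ** k * (1 - theta) ** (n - k)

def lower_tail_p_value(t, n, theta):
    """P(T <= t | H0), where H0 says the breakdown probability is theta."""
    return sum(binomial_pmf(k, n, theta) for k in range(t + 1))

p = lower_tail_p_value(1, 5, 0.6)
print(round(p, 5))  # approximately 0.08704
```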

1066, Contents

Here is the full list of the ingredients that go into NHST.

Null Hypothesis Significance Testing (NHST)

1. Null hypothesis H0 (e.g. “this equipment has probability 0.6 of breaking down”).
2. Alternative hypothesis HA (e.g. “this equipment has probability less than 0.6 of
breaking down”). The test is either one-tailed or two-tailed, depending on HA .
3. A random sample of size n: (X1 , X2 , . . . , Xn ).
4. A test statistic T (which simply maps each observed random sample to a real number).
5. The p-value of the observed sample. This is the probability that — assuming H0 were
true — T takes on values that are at least “as extreme as” the actual observed test
statistic t.
6. The significance level α. This is a pre-selected threshold, usually chosen to be some
small value. The conventional significance levels are α = 0.1, α = 0.05, or α = 0.01.
We then conclude qualitatively that:
• A small p-value casts doubt on or provides evidence against H0 .
• A large p-value fails to cast doubt on or provide evidence against H0 .
In particular, if p < α, then we say that we reject H0 at the significance level α. And
if p ≥ α, then we say that we fail to reject H0 at the significance level α.

Note importantly that to reject H0 (at some significance level α) does NOT mean that H0
is false and HA is true. Similarly, failure to reject H0 does NOT mean that H0 is true and
HA is false. More on this below.
Another example of NHST, now slightly more formally and carefully presented.

1067, Contents

Example 1048. (Dr. Chee election example.) Our parameter of interest is µ, the
proportion of votes for Dr. Chee. We guess that Dr. Chee won only 30% of the votes.
We might thus write down two competing hypotheses:

H0 ∶ µ = 0.3,
HA ∶ µ > 0.3.

We call H0 the null hypothesis and HA the alternative hypothesis.

We pre-select α = 0.05 as our significance level. This is the arbitrary threshold at which
we’ll say we reject (or fail to reject) H0 .
We gather a random sample of 100 votes: (X1 , X2 , . . . , X100 ). Our test statistic is the
number of votes in favour of Dr. Chee, given by

T = X1 + X2 + ⋅ ⋅ ⋅ + X100 .

Suppose that in our observed random sample (x1 , x2 , . . . , x100 ), we find that 39 are in
favour of Dr. Chee. Our observed test statistic is thus t = 39.
We now ask: What is the probability that — assuming H0 were true — T takes on
values that are at least “as extreme as” the actual observed test statistic t? That is, what
is the p-value of the observed sample?
Now, assuming H0 were true, T is a binomial random variable with parameters 100 and
0.3. That is, T ∼ B (n, p) = B (100, 0.3). So:

$$\begin{aligned} p = P(T \ge 39 \mid H_0) &= P(T = 39 \mid H_0) + P(T = 40 \mid H_0) + \dots + P(T = 100 \mid H_0) \\ &= \binom{100}{39} 0.3^{39} \, 0.7^{61} + \binom{100}{40} 0.3^{40} \, 0.7^{60} + \dots + \binom{100}{100} 0.3^{100} \, 0.7^{0} \approx 0.03398. \end{aligned}$$

The small p-value casts doubt on or provides evidence against H0 .

And since p ≈ 0.03398 < α = 0.05, we can also say that we reject H0 at the α = 0.05
significance level.
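
A sum with 62 terms is tedious by hand, so here is a minimal sketch of the same test in Python. The function name upper_tail_p_value is hypothetical. The upper tail is used because HA here is µ > 0.3.

```python
from math import comb

def upper_tail_p_value(t, n, mu):
    """P(T >= t | H0), where T ~ B(n, mu) under H0."""
    return sum(comb(n, k) * mu ** k * (1 - mu) ** (n - k)
               for k in range(t, n + 1))

alpha = 0.05                           # pre-selected significance level
p = upper_tail_p_value(39, 100, 0.3)   # observed test statistic t = 39
decision = "reject H0" if p < alpha else "fail to reject H0"
print(p, decision)
```

Note that the code reports only whether p < α. It says nothing about whether H0 is true or false.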

1068, Contents

Let θ be the parameter we’re interested in. Under the objectivist interpretation, the value
of θ may be unknown, but it is fixed. This has two consequences:
1. We never speak probabilistically about θ, because θ is a fixed number. For example, we
never say “θ is probably less than 0.6” or “θ has probability 0.8 of being between 0.4
and 0.7”. Such statements are nonsensical.
2. The null hypothesis, which is always written as an equality (e.g. “H0 ∶ θ = 0.6”), is
almost certainly false. After all, θ can (usually) take on a continuum of values. So do
NOT interpret “we fail to reject H0 ” to mean “H0 is true”. This is because H0
is almost certainly false.
When performing NHST, we will assiduously avoid saying things like “H0 is true”, “H0 is
false”, “HA is true”, or “HA is false”. Instead, we will stick strictly to saying either “we
reject H0 at the significance level α” or “we fail to reject H0 at the significance level α”.
Each of these two statements has a very precise meaning. The first says that p < α. The
second says that p ≥ α. Nothing more and nothing less.

Exercise 431. We flip a coin 20 times and get 17 heads. Test, at the 5% significance
level, whether the coin is biased towards heads. (Answer on p. 1595.)

1069, Contents

107.1. One-Tailed vs Two-Tailed Tests
In the previous section, all the NHST we did were one-tailed tests.³⁶⁷ For example, in
the NHST done for Dr. Chee, we had

H0 ∶ µ = 0.3,
HA ∶ µ > 0.3.

This was a one-tailed test because the alternative hypothesis HA was that µ was to the
right of 0.3.
If instead we changed the alternative hypothesis to:

H0 ∶ µ = 0.3,
HA ∶ µ ≠ 0.3.

Then this would be called a two-tailed test, because the alternative hypothesis HA is that
µ is either to the left or to the right of 0.3.
We now repeat the examples done in the previous section, but with HA tweaked so that we
instead have two-tailed tests. The difference is that the p-value is calculated differently.

³⁶⁷ By the way, the more common convention is to say “one-tailed” and “two-tailed” tests, rather than
“one-tail” and “two-tail” tests, as is the norm in Singapore (similar to those “Close for break” signs you
sometimes see). But after some consultation with my grammatical experts, I have been told that both
are equally correct.
1070, Contents
Example 1203 (equipment breakdown).
Everything is as before, except that we now change the alternative hypothesis:

H0 ∶ θ = 0.6,
HA ∶ θ ≠ 0.6.

Say we observe the same random sample as before: (x1 , x2 , x3 , x4 , x5 ) = (0, 0, 0, 1, 0).
Again our test statistic is the sample number of failures T = X1 + X2 + X3 + X4 + X5 . And
so again our observed test statistic is t = x1 + x2 + x3 + x4 + x5 = 0 + 0 + 0 + 1 + 0 = 1.
The difference now is how the p-value (of the observed sample) is calculated. In words,
the p-value gives the likelihood that our test statistic is “at least as extreme as” that
actually observed — assuming H0 were true.
Previously, under a one-tailed test, we interpreted “our test statistic is at least as extreme
as that actually observed” to mean the event T ≤ t = 1.
Now that we’re doing a two-tailed test, we’ll instead interpret the same phrase to mean
both the event T ≤ t = 1 and the event that T is as far away on the other side of
E [T ∣H0 ] = 3. The second event is, specifically, T ≥ 5. Altogether then, the p-value is
given by

$$\begin{aligned} p &= P(T \le 1 \text{ or } T \ge 5 \mid H_0) \\ &= P(T = 0 \mid H_0) + P(T = 1 \mid H_0) + P(T = 5 \mid H_0) \\ &= \binom{5}{0} 0.6^0 \, 0.4^5 + \binom{5}{1} 0.6^1 \, 0.4^4 + \binom{5}{5} 0.6^5 \, 0.4^0 = 0.1648. \end{aligned}$$

Since p = 0.1648 ≥ α = 0.1, we say that we fail to reject H0 at the α = 0.1 significance level.
Observe that previously, under the one-tailed test, we could reject H0 at the α = 0.1
significance level, because there p = 0.08704. Now, in contrast, under the two-tailed test,
we fail to reject H0 at the same significance level.

In general, all else equal, the p-value for an observed random sample is greater under a
two-tailed test than under a one-tailed test. Thus, under a two-tailed test, we are less
likely to reject H0 .

1071, Contents

Example 1048 (Dr. Chee election). We change the alternative hypothesis:

H0 ∶µ = 0.3,
HA ∶µ ≠ 0.3.

Say we observe the same random sample as before: (x1 , x2 , . . . , x100 ), in which 39 votes
were in favour of Dr. Chee. So again our observed test statistic is t = x1 +x2 +⋅ ⋅ ⋅+x100 = 39.
The difference now is how the p-value (of the observed sample) is calculated. In words,
the p-value gives the likelihood that our test statistic is “at least as extreme as” that
actually observed — assuming H0 were true.
Previously, under a one-tailed test, we interpreted “our test statistic is at least as extreme
as that actually observed” to mean the event T ≥ t = 39.
Now that we’re doing a two-tailed test, we’ll instead interpret the same phrase to mean
both the event T ≥ t = 39 and the event that T is as far away on the other side of
E [T ∣H0 ] = 30. The second event is, specifically, T ≤ 21. Altogether then, the p-value is
given by

$$\begin{aligned} p &= P(T \le 21 \text{ or } T \ge 39 \mid H_0) = 1 - P(22 \le T \le 38 \mid H_0) \\ &= 1 - \left[ P(T = 22 \mid H_0) + P(T = 23 \mid H_0) + \dots + P(T = 38 \mid H_0) \right] \\ &= 1 - \left[ \binom{100}{22} 0.3^{22} \, 0.7^{78} + \binom{100}{23} 0.3^{23} \, 0.7^{77} + \dots + \binom{100}{38} 0.3^{38} \, 0.7^{62} \right] \approx 0.06281. \end{aligned}$$
Since p = 0.06281 ≥ α = 0.05, we say that we fail to reject H0 at the α = 0.05
significance level.
Again observe that previously, under the one-tailed test, we could reject H0 at the α = 0.05
significance level, because there p = 0.03398. Now, in contrast, under the two-tailed test,
we fail to reject H0 at the same significance level.
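
The two-tailed calculations in this section can be sketched as follows. The code follows the rule used in the text: a value k counts as “at least as extreme” if it is at least as far from E [T ∣H0 ] = nθ as the observed t, on either side. The function names are hypothetical, and the tiny tolerance 1e-9 is our own addition to guard against floating-point rounding when comparing distances.

```python
from math import comb

def binomial_pmf(k, n, theta):
    """P(T = k) when T ~ B(n, theta)."""
    return comb(n, k) * theta ** k * (1 - theta) ** (n - k)

def two_tailed_p_value(t, n, theta):
    """Sum P(T = k | H0) over all k at least as far from n * theta as t is."""
    expected = n * theta
    distance = abs(t - expected)
    return sum(binomial_pmf(k, n, theta)
               for k in range(n + 1)
               if abs(k - expected) >= distance - 1e-9)

print(two_tailed_p_value(1, 5, 0.6))     # equipment example: T <= 1 or T >= 5
print(two_tailed_p_value(39, 100, 0.3))  # election example: T <= 21 or T >= 39
```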

Exercise 432. We flip a coin 20 times and get 17 heads. Test, at the 5% significance
level, whether the coin is biased. (Answer on p. 1595.)

1072, Contents

107.2. The Abuse of NHST (Optional)
NHST is popular because it gives a simplistic, formulaic cookbook procedure. Moreover,
its conclusion appears to be binary: either we reject H0 or we fail to reject H0 .
However, NHST is widely misunderstood, misinterpreted, and misused even within scientific
communities. It has long been heavily criticised. In March 2016, the American Statistical
Association even issued an official policy statement on how NHST should be used!
Here I discuss only the most important, commonly-made error.
We may write the p-value as

p = P (D∣H0 ) ,

where D stands for the observed data and H0 stands for the null hypothesis. The p-value
answers the following question: assuming H0 were true, what’s the probability that we’d
get data “at least as extreme” as those actually observed (D)?
Say we get a p-value of 0.03. We should then say simply that
• The small p-value casts doubt on or provides evidence against H0 .
• If the pre-selected significance level was α = 0.05, then we may say that we reject H0
at the 5% significance level.
However, instead of merely saying the above, some researchers may instead conclude that:

H0 is true with probability 0.03.

Do you see the error here? The researcher has gone from the finding that p = P (D∣H0 ) = 0.03
to the conclusion that P (H0 ∣D) = 0.03. This is precisely the Conditional Probability Fallacy
(CPF), which we discussed at length in subsection 92.1.
The error is the same as leaping from “A lottery ticket buyer who doesn’t cheat has a small
probability q of winning” to “Jane bought a lottery ticket and won. Therefore, there is only
probability q that she didn’t cheat.”
The p-value is NOT the probability that H0 is true.³⁶⁸ Instead, it is the probability that
— assuming H0 were true — we would have gotten data “at least as extreme” as those
actually observed. This is an important difference. But it is also a subtle one, which is why
even researchers get confused.

³⁶⁸ Indeed, under the objectivist view, such a statement is nonsensical anyway, because H0 is either true
or not true; it makes no sense to talk probabilistically about whether H0 is true.
1073, Contents
107.3. Common Misinterpretations of the Margin of Error
The sampling error or margin of error is often misinterpreted by laypersons (and

Example 1204. On the night of the 2016 Bukit Batok SMC By-Election, the Elections
Department announced³⁶⁹ that based on a sample count of 900 ballots,
• Dr. Chee had won 39% of the votes.
• These sample counts have a confidence level of 95%, with a ±4% margin of error.
What does the above gobbledygook mean? Let µ be the true proportion of votes won by
Dr. Chee. Let X̄ be the sample proportion and x̄ be the observed sample proportion.
It’s clear enough what the 39% means — they randomly counted 900 ballots and found
(after accounting for any spoilt votes) that x̄ = 39% were in favour of Dr. Chee.
What’s less clear is what the 95% confidence level and ±4% margin of error mean.
Here are three possible interpretations of what is meant. Only one is correct.
1. “With probability 0.95, µ ∈ (x̄ − 0.04, x̄ + 0.04) = (0.35, 0.43).”
2. “With probability 0.95, X̄ ∈ (x̄ − 0.04, x̄ + 0.04) = (0.35, 0.43).”
Equivalently, suppose we repeatedly observe many random samples of size 900. Then we
should find that in 0.95 of these observed random samples, the observed sample mean is
between 0.35 and 0.43.
3. “With probability 0.95, X̄ ∈ (µ − 0.04, µ + 0.04).”
We have no idea what µ is. All we can say is that with probability 0.95, the sample mean
X̄ of votes for Dr. Chee is between µ − 0.04 and µ + 0.04.
Equivalently, suppose we repeatedly observe many random samples of size 900. Then we
should find that in 0.95 of these observed random samples, the observed sample mean is
between µ − 0.04 and µ + 0.04.
Take a moment to understand what each of the above interpretations say. Then decide
which you think is the correct interpretation, before turning to the next page.
(Example continues on the next page ...)

1074, Contents

(... Example continued from the previous page.)
Interpretation #1 — “with probability 0.95, µ ∈ (x̄ − 0.04, x̄ + 0.04) = (0.35, 0.43)” — is
perhaps the one most commonly made by laypersons.³⁷⁰ It makes two errors:
1. It is nonsensical to speak probabilistically about the proportion µ of votes
won by Dr. Chee. µ is some fixed number. So either µ is in the interval (0.35, 0.43),
or it isn’t. It makes no sense to speak probabilistically about whether µ is in that interval.
2. The margin of error is applicable to the true proportion µ and not to the
observed sample proportion x̄ = 0.39.
Some “authorities” often attempt³⁷¹ to correct Interpretation #1 by offering Interpretation #2:
“with probability 0.95, X̄ ∈ (x̄ − 0.04, x̄ + 0.04) = (0.35, 0.43)”. However,
Interpretation #2 is still wrong, because it still makes the second of the above two errors.
Unfortunately, the correct interpretation is also the one that says the least. It is Inter-
pretation #3 — “with probability 0.95, X̄ ∈ (µ − 0.04, µ + 0.04)”.
This interpretation says merely that if we were somehow able to repeatedly observe
random samples of size 900, then we’d find that 0.95 of the corresponding observed
sample means will be in (µ − 0.04, µ + 0.04). Which isn’t saying much, because first of all,
we have only one observed random sample; we do not get to repeatedly observe random
samples. Secondly, this still doesn’t tell us much about µ, which is what we’re really
interested in.
The correct interpretation (Interpretation #3) is the least interesting interpretation. Per-
haps this explains why journalists often prefer to give an incorrect interpretation.

See section 122.8 in the Appendices for a discussion of where the Elections Department’s
±4% margin of error comes from.

1075, Contents

Journalists often try to explain what the confidence level and margin of error mean — they
almost always get it wrong.

Example 1205. On the night of the 2016 Bukit Batok SMC By-Election, a website
called wrote:
“Based on the sample count of 100 votes,³⁷² it was revealed at 9.26pm that the SDP
Sec-Gen received 39 percent of votes. In other words, Chee would score 35 per cent in
the worst case scenario and 43 per cent in the best case scenario.”
This is the most absurd misinterpretation of the margin of error I have ever seen.³⁷³
Let’s see what the correct worst- and best-case scenarios are.
Suppose that in the observed random sample of 900 votes, exactly 39% or 0.39×900 = 351
were votes for Dr. Chee and the remaining 549 were for PAP Guy. Then:
• Worst-case scenario: The observed random sample of 900 votes happened to contain
exactly all of the votes in favour of Dr. Chee. That is, Dr. Chee won only 351 votes
and PAP Guy won the remaining 23,570 − 351 = 23,219 votes. So the correct worst-case
scenario is that Dr. Chee won ≈ 1.5% of the votes.
• Best-case scenario: The observed random sample of 900 votes happened to contain
exactly all of the votes in favour of PAP Guy. That is, PAP Guy won only 549 votes
and Dr. Chee won the remaining 23,570 − 549 = 23,021 votes. So the correct best-case
scenario is that Dr. Chee won ≈ 97.7% of the votes.
These worst- and best-case scenarios are admittedly unlikely. Nonetheless, they are
possible scenarios all the same. The journalist’s purported worst- and best-case scenarios
are completely wrong.
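
The arithmetic behind these two scenarios is simple enough to sketch in a few lines of Python. The function name extreme_vote_shares is hypothetical. The total of 23,570 votes is the figure used in the example.

```python
def extreme_vote_shares(total_votes, sample_size, sample_for_candidate):
    """Smallest and largest vote shares consistent with the observed sample."""
    sample_against = sample_size - sample_for_candidate
    # Worst case: every vote outside the sample went against the candidate.
    worst = sample_for_candidate / total_votes
    # Best case: every vote outside the sample went to the candidate.
    best = (total_votes - sample_against) / total_votes
    return worst, best

worst, best = extreme_vote_shares(23570, 900, 351)
print(f"worst case: {worst:.1%}, best case: {best:.1%}")
```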

³⁷² By the way, even this basic fact was wrong. The sample count was not 100 votes. Instead, it was 900
votes, consisting of 100 votes from each of 9 polling stations.
Moreover, the journalist failed to report the confidence level of 95%, either because he
didn’t know what it meant or because he didn’t think it important. But it is important. It is pointless
to inform the reader about the margin of error without also specifying the confidence level.
³⁷³ You can find several misinterpretations of the margin of error collected in this academic paper. None is
as absurdly bad as the error committed here.
1076, Contents
107.4. Critical Region and Critical Value
Informally, the critical region is the set of values of the observed test statistic t for which
we would reject the null hypothesis. The critical region is thus sometimes also called the
rejection region.
And the critical value(s) is (are) the exact value(s) of the observed test statistic t at
which we are just able to reject the null hypothesis.
Example 1048. (Dr. Chee election.) Say that as before, we have a one-tailed test
where the two competing hypotheses are:

H0 ∶ µ = 0.3,
HA ∶ µ > 0.3.

Say that as before, we choose α = 0.05 as our significance level.

Say that as before, in our observed random sample of 100 votes, 39 are in favour of Dr.
Chee, so that our observed test statistic is t = 39.
We calculated that the corresponding p-value is 0.03398 and so we were able to reject H0
at the α = 0.05 significance level.
We now calculate the critical region and the critical value. We can calculate that
if t = 38, then the corresponding p-value is ≈ 0.053 (you should verify this for yourself).
And so we would be unable to reject H0 .
We thus conclude that the critical value is 39, because this is the value of t at which we
are just able to reject H0 .
And the critical region is the set {39, 40, 41, . . . , 100}. These are the values at which we’d
be able to reject H0 at the α = 0.05 significance level.
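
The search for the critical value can also be done by brute force. Below is a minimal Python sketch, assuming (as in the example) that under H0 the test statistic T is binomial with n = 100 and success probability 0.3. The names `upper_tail`, `critical_value` and `critical_region` are illustrative choices, not notation used elsewhere in this book.

```python
from math import comb

def upper_tail(t, n=100, p0=0.3):
    """One-tailed p-value P(T >= t), where T ~ Binomial(n, p0) under H0."""
    return sum(comb(n, k) * p0**k * (1 - p0)**(n - k) for k in range(t, n + 1))

alpha = 0.05

# The critical value is the smallest t whose p-value is at most alpha.
critical_value = next(t for t in range(101) if upper_tail(t) <= alpha)

# The critical region is every value of t from the critical value upwards.
critical_region = list(range(critical_value, 101))

print(critical_value)                 # the critical value
print(upper_tail(38), upper_tail(39)) # p-values just below and at the critical value
```

Printing `upper_tail(38)` and `upper_tail(39)` lets you check the two p-values quoted in the example.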


Same example as above, but now two-tailed:
Example 1048. (Dr. Chee election.)
Say that as before, we have a two-tailed test where the two competing hypotheses are:

H0 ∶ µ = 0.3,
HA ∶ µ ≠ 0.3.

The significance level is again α = 0.05. Again, the observed random sample of 100 votes
contains 39 in favour of Dr. Chee, so that our observed test statistic is t = 39.
We calculated that the corresponding p-value is 0.06281 and so we failed to reject H0 at
the α = 0.05 significance level.
We calculate that if t = 40, then the corresponding p-value is ≈ 0.03745 (you should verify
this for yourself). Thus, the critical values are 20 and 40, because these are the values of
t at which we are just able to reject H0 .
The critical region is the set {0, 1, . . . , 20, 40, 41, . . . , 100}. These are the values at which
we’d be able to reject H0 at the α = 0.05 significance level.
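
The two-tailed critical region can be found in the same brute-force way. The sketch below assumes that "at least as extreme" means "at least as far from the expected count of 30 as the observed t", which is the reading that reproduces the p-values quoted above. The function names are illustrative.

```python
from math import comb

N, P0, ALPHA = 100, 0.3, 0.05
EXPECTED = 30  # n * p0, the expected number of votes for Dr. Chee under H0

def pmf(k):
    """P(T = k), where T ~ Binomial(N, P0) under H0."""
    return comb(N, k) * P0**k * (1 - P0)**(N - k)

def two_tailed_p(t):
    """Probability that T is at least as far from EXPECTED as t is (assumed definition)."""
    distance = abs(t - EXPECTED)
    return sum(pmf(k) for k in range(N + 1) if abs(k - EXPECTED) >= distance)

# Every value of t at which we would reject H0.
critical_region = [t for t in range(N + 1) if two_tailed_p(t) <= ALPHA]

# The critical values are the innermost points of the two tails.
lower_critical = max(t for t in critical_region if t < EXPECTED)
upper_critical = min(t for t in critical_region if t > EXPECTED)
print(lower_critical, upper_critical)
```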

Exercise 433. (Answer on p. 1596.) We flip a coin 20 times. What are the critical
region and critical value(s) in
(a) A test, at the 5% significance level, of whether the coin is biased towards heads.
(b) A test, at the 5% significance level, of whether the coin is biased.


107.5. Testing of a Population Mean
(Small Sample, Normal Distribution, σ² Known)

Example 1206. The weight (in mg) of a grain of sand is X ∼ N (µ, 9). Our unknown
parameter of interest is the true population mean µ (i.e. the true average weight of a
grain of sand). Our “guess” is that µ = 5. We thus write down two competing hypotheses:

H0 ∶ µ = 5,
HA ∶ µ ≠ 5.

(Note that this is a two-sided test.)

We take a random sample of size 4 — (X1 , X2 , X3 , X4 ). Our test statistic is the sample
mean X̄ = (X1 + X2 + X3 + X4 ) /4.
Our observed random sample is (x1 , x2 , x3 , x4 ) = (3, 9, 11, 7). That is, we randomly pick
four grains of sand that happen to have weights 3, 9, 11, and 7 mg. Then the observed
test statistic is
$$\bar{x} = \frac{3 + 9 + 11 + 7}{4} = 7.5.$$
The p-value is the probability that the test statistic X̄ takes on values “at least as extreme
as” our observed test statistic x̄ = 7.5, assuming H0 ∶ µ = 5 were true. Note that if H0
were true, then X̄ ∼ N (µ, σ²/n) = N (5, 9/4). Thus, the p-value is given by:

$$\begin{aligned}
p &= P(\bar{X} \geq 7.5 \text{ or } \bar{X} \leq 2.5 \mid H_0) = P(\bar{X} \geq 7.5 \mid H_0) + P(\bar{X} \leq 2.5 \mid H_0) \\
&= P\left(Z \geq \frac{7.5 - 5}{\sqrt{9/4}}\right) + P\left(Z \leq \frac{2.5 - 5}{\sqrt{9/4}}\right) \approx 0.04779 + 0.04779 = 0.09558.
\end{aligned}$$

Thus, we reject H0 at the α = 0.1 significance level. However, we would fail to reject H0
at the α = 0.05 significance level.
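
As a check on the arithmetic, here is a short Python sketch of the same two-sided Z-test. It uses the standard identity P(|Z| ≥ z) = erfc(z/√2) for a standard normal Z. The function name `two_sided_z_p_value` is an illustrative choice.

```python
from math import erfc, sqrt

def two_sided_z_p_value(xbar, mu0, variance, n):
    """Two-sided p-value for H0: mu = mu0 when the population variance is known."""
    z = (xbar - mu0) / sqrt(variance / n)
    # For a standard normal Z, P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    return erfc(abs(z) / sqrt(2))

sample = [3, 9, 11, 7]            # observed weights (mg) of the four grains
xbar = sum(sample) / len(sample)  # observed sample mean
p = two_sided_z_p_value(xbar, mu0=5, variance=9, n=len(sample))

print(xbar, p)
print("reject at 0.10:", p < 0.10, "| reject at 0.05:", p < 0.05)
```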


The table below summarises the tests to use for the population mean, in different
circumstances. In this section, we learnt how to handle the first case (any sample size,
normal distribution, σ² known). The following sections will deal with the other three cases.

Sample size   Distribution   σ²        Test
Any           Normal         Known     Z-test: (X̄ − µ)/(σ/√n) ∼ N(0, 1).
Large         Any            Known     Z-test: (X̄ − µ)/(σ/√n) ∼ N(0, 1).
Large         Any            Unknown   Z-test: (X̄ − µ)/(s/√n) ∼ N(0, 1).
Small         Normal         Unknown   t-test: (X̄ − µ)/(s/√n) ∼ t with n − 1 degrees of freedom.374
Small         Non-normal     Either    Not in A-Levels.

Exercise 434. The Singapore daily high temperature (in °C) can be modelled by
X ∼ N (µ, 8). Our unknown parameter of interest is the true population mean µ (i.e.
the true average daily high temperature). Your friend guesses that µ = 34. You gather
the following data on daily high temperatures, of 10 randomly-chosen days in 2015:
(35, 35, 31, 32, 33, 34, 31, 34, 35, 34). Test your friend’s hypothesis, at the α = 0.05 signific-
ance level. (Be sure to write down your null and alternative hypotheses.) (Answer on p.


107.6. Testing of a Population Mean
(Large Sample, Any Distribution, σ² Known)
We’ll recycle the same example from the previous section. Before, we knew that X was
normally distributed. Now the big difference is that we have absolutely no idea what
distribution X comes from!
To compensate, we require also that our random sample is “large enough”, so that the
CLT-approximation can be used.

Example 1207. The weight (in mg) of a grain of sand is X ∼ (µ, 9). (This says simply
that X is distributed with mean µ and variance 9.) Our unknown parameter of interest
is the true population mean µ (i.e. the true average weight of a grain of sand). Again,
we “guess” that µ = 5. Again, we write down:

H0 ∶ µ = 5,
HA ∶ µ ≠ 5.

(Note that this is, again, a two-sided test.)

This time, we’ll take a random sample of size 100 — (X1 , X2 , . . . , X100 ). Again, our test
statistic is the sample mean X̄ = (X1 + X2 + ⋅ ⋅ ⋅ + X100 ) /100.
Recall the magic of the CLT. Even if we have absolutely no idea what distribution X
is drawn from, provided n is sufficiently large, X̄ is approximately normally distributed.
So here, since the sample is large (n = 100 ≥ 20), by the CLT, we know that X̄ has,
approximately, the normal distribution N (µ, σ²/n). So, if H0 were true, then we have,
approximately, X̄ ∼ N (µ, σ²/n) = N (5, 9/100).
Say the observed test statistic we get is:
$$\bar{x} = \frac{x_1 + x_2 + \dots + x_{100}}{100} = 5.6.$$
Again, the p-value is the probability that our test statistic X̄ takes on values “at least as
extreme as” our observed test statistic x̄ = 5.6, assuming H0 ∶ µ = 5 were true. Thus, the
p-value is given by

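$$\begin{aligned}
p &= P(\bar{X} \geq 5.6 \text{ or } \bar{X} \leq 4.4 \mid H_0) = P(\bar{X} \geq 5.6 \mid H_0) + P(\bar{X} \leq 4.4 \mid H_0) \\
&\overset{\text{CLT}}{\approx} P\left(Z \geq \frac{5.6 - \mu}{\sigma/\sqrt{n}}\right) + P\left(Z \leq \frac{4.4 - \mu}{\sigma/\sqrt{n}}\right) = P\left(Z \geq \frac{5.6 - 5}{\sqrt{9/100}}\right) + P\left(Z \leq \frac{4.4 - 5}{\sqrt{9/100}}\right) \\
&= P(Z \geq 2) + P(Z \leq -2) \approx 0.0455.
\end{aligned}$$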

Thus, we reject H0 at the α = 0.05 significance level.

Exercise 435. The Singapore daily high temperature (in °C) can be modelled by X ∼
(µ, 8). Our unknown parameter of interest is the true population mean µ (i.e. the true
average daily high temperature). Your friend guesses that µ = 34. You gather the data
on daily high temperatures, of 100 randomly-chosen days in 2015 and find the observed
sample average temperature to be 33.4 °C. Test your friend’s hypothesis, at the α = 0.05
significance level. (Be sure to write down your null and alternative hypotheses. Also,
clearly state where you use the CLT.) (Answer on p. 1597.)


107.7. Testing of a Population Mean
(Large Sample, Any Distribution, σ² Unknown)
We’ll recycle the same example from the previous section. Again, we have absolutely no
idea what distribution X comes from. And again, the random sample is large enough, so
that the CLT can be used.
But now, σ² is unknown. This turns out to be no big deal. We can simply replace σ²
with the observed unbiased sample variance s², and do the same thing as before.

Example 1208. The weight (in mg) of a grain of sand is X ∼ (µ, σ²). (This says simply
that X is distributed with mean µ and variance σ².) Our unknown parameter of interest
is the true population mean µ (i.e. the true average weight of a grain of sand). Again,
we “guess” that µ = 5. Again, we write down

H0 ∶ µ = 5,
HA ∶ µ ≠ 5.

(Note that this is, again, a two-sided test.)

Again, we take a random sample of size 100 — (X1 , X2 , . . . , X100 ). Again, our test statistic
is the sample mean X̄ = (X1 + X2 + ⋅ ⋅ ⋅ + X100 ) /100.
Again, since the sample is large (n = 100 ≥ 20), by the CLT, we know that X̄ has,
approximately, the normal distribution N (µ, σ²/n). So, if H0 were true, then we have,
approximately, X̄ ∼ N (µ, σ²/n) = N (5, σ²/100). Since the sample variance S² is an
unbiased estimator for σ², it is plausible that we also have, approximately,
X̄ ∼ N (5, s²/100), where s² is the observed sample variance.
Say the observed sample mean and observed sample variance we get are:

$$\bar{x} = \frac{x_1 + x_2 + \dots + x_{100}}{100} = 5.6 \quad \text{and} \quad s^2 = \frac{\sum_{i=1}^{100} (x_i - \bar{x})^2}{n - 1} = 8.$$
Again, the p-value is the probability that our test statistic X̄ takes on values “at least as
extreme as” our observed test statistic x̄ = 5.6, assuming H0 ∶ µ = 5 were true. Thus, the
p-value is given by

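$$\begin{aligned}
p &= P(\bar{X} \geq 5.6 \text{ or } \bar{X} \leq 4.4 \mid H_0) = P(\bar{X} \geq 5.6 \mid H_0) + P(\bar{X} \leq 4.4 \mid H_0) \\
&\overset{\text{CLT}}{\approx} P\left(Z \geq \frac{5.6 - \mu}{s/\sqrt{n}}\right) + P\left(Z \leq \frac{4.4 - \mu}{s/\sqrt{n}}\right) = P\left(Z \geq \frac{5.6 - 5}{\sqrt{8/100}}\right) + P\left(Z \leq \frac{4.4 - 5}{\sqrt{8/100}}\right) \\
&\approx P(Z \geq 2.1213) + P(Z \leq -2.1213) \approx 0.03389.
\end{aligned}$$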

Thus, we reject H0 at the α = 0.05 significance level.
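
To see that the unknown-variance case really is "the same thing as before", the Python sketch below runs the large-sample Z-test twice: once with the known variance σ² = 9 from the previous section, and once with the observed sample variance s² = 8 used in the calculation above. `NormalDist` is from Python's standard `statistics` module; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

n, mu0, xbar = 100, 5, 5.6   # sample size, hypothesised mean, observed sample mean
standard_normal = NormalDist()  # N(0, 1)

p_values = {}
for label, variance in [("sigma^2 known (= 9)", 9), ("s^2 observed (= 8)", 8)]:
    # By the CLT, the standardised sample mean is approximately N(0, 1) under H0.
    z = (xbar - mu0) / sqrt(variance / n)
    p_values[label] = 2 * (1 - standard_normal.cdf(abs(z)))

for label, p in p_values.items():
    print(label, round(p, 5), "reject H0 at 0.05:", p < 0.05)
```

The only thing that changes between the two runs is the variance plugged into the standard error.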

Exercise 436. The Singapore daily high temperature (in °C) can be modelled by X ∼
(µ, σ²). Our unknown parameter of interest is the true population mean µ (i.e. the true
average daily high temperature). Your friend guesses that µ = 34. You gather the data
on daily high temperatures, of 100 randomly-chosen days in 2015. Your observed sample
mean temperature is 33.4 °C and your observed sample variance is 11.2 °C². Test your
friend’s hypothesis, at the α = 0.05 significance level. (Be sure to write down your null
and alternative hypotheses. Also, clearly state where you use the CLT.) (Answer on p.


107.8. Formulation of Hypotheses

Example 1209. We flip a coin 100 times. We get 100 heads. What can we say about
the coin?
This is an open-ended question, to which there can be many different answers. Here’s
the answer we’re taught to give for H2 Maths:
Let µ be the probability that a coin-flip is heads. We formulate a pair of competing
hypotheses:

H0 ∶ µ = 0.5,
HA ∶ µ ≠ 0.5.

Our test statistic T is the number of heads (out of 100 coin-flips). Our observed test
statistic t is 100. The corresponding p-value (note that this is a two-tailed test) is

$$\begin{aligned}
P(T \geq 100 \text{ or } T \leq 0 \mid H_0) &= P(T = 0 \mid H_0) + P(T = 100 \mid H_0) \\
&= \binom{100}{0} 0.5^{0} \, 0.5^{100} + \binom{100}{100} 0.5^{100} \, 0.5^{0} \approx 1.578 \times 10^{-30}.
\end{aligned}$$

The tiny p-value may be interpreted as casting doubt on or providing evidence against H0.

We note also that we can easily reject H0 at any of the conventional significance levels
(α = 0.1, α = 0.05, or α = 0.01).

Exercise 437. (Answer on p. 1598.) We observe the weights (in kg) of a random sample
of 50 Singaporeans: (x1 , x2 , . . . , x50 ). We observe that ∑ xi /50 = 68 and ∑ xi² /50 = 5000.
A friend claims that the average American is heavier than the average Singaporean. It is
known that the average American weighs 75 kg. Is your friend correct? If you make any
assumptions or approximations, make clear exactly where you do so. (Hint: Use Fact


108. Correlation and Linear Regression

108.1. Bivariate Data and Scatter Diagrams

In this chapter, we’ll be interested in the relationship between two sets of data.

Example 1210. We measure the heights and weights of 10 adult male Singaporeans.
Their heights (in cm) and weights (in kg) are given in this table:

i 1 2 3 4 5 6 7 8 9 10
hi (cm) 182 165 173 155 178 174 169 160 150 190
wi (kg) 81 70 71 53 72 75 69 60 44 80

We call (hi , wi ) observation i. So for example, observation 5 is (178, 72) and observation
9 is (150, 44).
We can plot a scatter diagram of these 10 persons’ weights (vertical axis) against their
heights (horizontal).

[Figure: scatter diagram of weight (kg) against height (cm), with a black dotted line of best fit.]

The black dotted line is called a line of best fit. Shortly (section 108.4), we’ll learn
how to construct this line of best fit.
The more closely the data points in the above scatter diagram lie to a straight line, the
more strongly linearly-correlated are weight and height. So here with these particular
data, the linear correlation between weight and height seems strong. In the next section,
we’ll learn about the product moment correlation coefficient, which is a way to
precisely quantify the degree to which two sets of data are linearly-correlated.
Because the line of best fit is upward-sloping, we can also say that the linear correlation
is positive.


Example 1211. We have data from the Clementi weather station for the daily high
temperature (in °C) and daily rainfall (in mm) on 361 days in 2015. (Strangely, data
were missing for four days, namely Feb 10–13.)

i         1     2     3     4     ...   361
ti (°C)   27.3  29.5  31.1  32    ...   30.2
pi (mm)   0     0.2   0     0     ...   12.4

We can again plot a scatter diagram of rainfall against temperature.

[Figure: scatter diagram of rainfall (mm) against temperature (degrees Celsius), with a black dotted line of best fit.]

Again, the black dotted line is a line of best fit. The data points do not seem close to
this line. Thus, it seems that the linear correlation between temperature and rainfall is
weak. The line of best fit is downward-sloping and so we say that the linear correlation is
negative.
Exercise 438. (Answer on p. 1599.) The table below shows the prices charged (p) and
the number of haircuts (q) given by 5 different barbers, during June 2016.
Draw a scatter diagram with price on the horizontal axis. Plot also what you think looks
like a line of best fit.

i 1 2 3 4 5
pi ($) 8 9 4 10 8
qi 300 250 1000 400 400


108.2. Product Moment Correlation Coefficient (PMCC)
In the previous section, we used a scatter diagram to determine if there was a plausible
linear relationship between two sets of data. This, though, was a very crude method.
A more precise measure of the degree to which two sets of data are linearly correlated is
called the product moment correlation coefficient (PMCC). Formally:

Definition 216. Let (x1 , x2 , . . . , xn ) and (y1 , y2 , . . . , yn ) be two ordered sets of real num-
bers. The product moment correlation coefficient (PMCC) is the following real number:

$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \, \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}.$$

Properties of the PMCC.

1. −1 ≤ r ≤ 1. (Surprisingly, this can be proven using vectors: Fact 231 in the Appendices.)
2. We say the linear correlation is positive if r > 0 and negative if r < 0.
3. If r = 1, the linear correlation is positive and perfect.

4. If r = −1, the linear correlation is negative and perfect.

5. If r is close to 1, the linear correlation is very strong.


6. If r is close to −1, the linear correlation is very strong.

7. If r is close to 0, the linear correlation is very weak.

8. r is merely a measure of linear correlation and nothing else. Two variables may be very
closely related but not linearly-correlated. For example, data generated by the quadratic
model yi = xi² may have a very low r.
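
Property 8 is easy to see numerically. The Python sketch below applies Definition 216 to some made-up data: seven x-values placed symmetrically about 0, with y = x² exactly. The data and the function name `pmcc` are hypothetical, chosen only for illustration.

```python
from math import sqrt

def pmcc(xs, ys):
    """Product moment correlation coefficient, computed as in Definition 216."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

xs = [-3, -2, -1, 0, 1, 2, 3]   # hypothetical data, symmetric about 0
ys = [x ** 2 for x in xs]       # y is completely determined by x
r = pmcc(xs, ys)
print(r)
```

Here y is a perfect function of x, yet the positive and negative products in the numerator cancel, so r carries no hint of the relationship.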


Example 1210 (continued from above). This is the height and weight example
revisited. For convenience, we reproduce the data and scatter diagram:

i 1 2 3 4 5 6 7 8 9 10
hi (cm) 182 165 173 155 178 174 169 160 150 190
wi (kg) 81 70 71 53 72 75 69 60 44 80

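[Figure: scatter diagram of weight (kg) against height (cm), with the line of best fit.]

$$\bar{h} = \frac{182 + 165 + 173 + 155 + 178 + 174 + 169 + 160 + 150 + 190}{10} = 169.6,$$

$$\bar{w} = \frac{81 + 70 + 71 + 53 + 72 + 75 + 69 + 60 + 44 + 80}{10} = 67.5,$$

$$\sum (h_i - \bar{h})(w_i - \bar{w}) = (182 - \bar{h})(81 - \bar{w}) + \dots + (190 - \bar{h})(80 - \bar{w}) = 1237,$$

$$\sqrt{\sum (h_i - \bar{h})^2} = \sqrt{(182 - 169.6)^2 + \dots + (190 - 169.6)^2} \approx 37.180640,$$

$$\sqrt{\sum (w_i - \bar{w})^2} = \sqrt{(81 - 67.5)^2 + \dots + (80 - 67.5)^2} \approx 35.418922,$$

$$\implies r = \frac{\sum_{i=1}^{n} (h_i - \bar{h})(w_i - \bar{w})}{\sqrt{\sum_{i=1}^{n} (h_i - \bar{h})^2} \, \sqrt{\sum_{i=1}^{n} (w_i - \bar{w})^2}} \approx 0.9393.$$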

As expected, r > 0 (the linear correlation is positive or, equivalently, the line of best fit
is upward-sloping). Moreover, r is close to 1 (the linear correlation is very strong).
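
The intermediate quantities above can be reproduced with a few lines of Python. This is only a sketch for checking the arithmetic; the variable names (`s_hw`, `root_s_hh`, `root_s_ww`) are illustrative.

```python
from math import sqrt

heights = [182, 165, 173, 155, 178, 174, 169, 160, 150, 190]  # cm
weights = [81, 70, 71, 53, 72, 75, 69, 60, 44, 80]            # kg

n = len(heights)
h_bar = sum(heights) / n
w_bar = sum(weights) / n

# Numerator of r: sum of products of deviations from the means.
s_hw = sum((h - h_bar) * (w - w_bar) for h, w in zip(heights, weights))

# The two square roots in the denominator of r.
root_s_hh = sqrt(sum((h - h_bar) ** 2 for h in heights))
root_s_ww = sqrt(sum((w - w_bar) ** 2 for w in weights))

r = s_hw / (root_s_hh * root_s_ww)
print(h_bar, w_bar, s_hw, root_s_hh, root_s_ww, r)
```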


Example 1211 (continued from above). This is the temperature and rainfall example
revisited. For convenience, we reproduce the data and scatter diagram:

i         1     2     3     4     ...   361
ti (°C)   27.3  29.5  31.1  32    ...   30.2
pi (mm)   0     0.2   0     0     ...   12.4

We can again plot a scatter diagram of rainfall against temperature.

[Figure: scatter diagram of rainfall (mm) against temperature (degrees Celsius), with the line of best fit.]

$$\bar{t} = \frac{27.3 + 29.5 + 31.1 + 32 + \dots + 30.2}{361} \approx 31.5, \qquad \bar{p} = \frac{0 + 0.2 + 0 + 0 + \dots + 12.4}{361} \approx 5.0.$$

$$\begin{aligned}
\implies r &= \frac{\sum_{i=1}^{n} (t_i - \bar{t})(p_i - \bar{p})}{\sqrt{\sum_{i=1}^{n} (t_i - \bar{t})^2} \, \sqrt{\sum_{i=1}^{n} (p_i - \bar{p})^2}} \\
&= \frac{(27.3 - 31.5)(0 - 5.0) + \dots + (30.2 - 31.5)(12.4 - 5.0)}{\sqrt{(27.3 - 31.5)^2 + \dots + (30.2 - 31.5)^2} \, \sqrt{(0 - 5.0)^2 + \dots + (12.4 - 5.0)^2}} \\
&\approx -0.1623.
\end{aligned}$$

As expected, r < 0 (the linear correlation is negative or, equivalently, the line of best fit
is downward-sloping). Moreover, r is fairly close to 0 (the linear correlation is weak).


Exercise 439. Compute the PMCC between p and q, using the data below. (Answer on
p. 1599.)

i 1 2 3 4 5
pi ($) 8 9 4 10 8
qi 300 250 1000 400 400


108.3. Correlation Does Not Imply Causation (Optional)
Correlation does not imply causation. This saying has now become a cliché. Doesn’t make
it any less true.
Below is an amusing but spurious correlation (source):

[Figure: line chart for the years 1999 to 2009, titled “US spending on science, space, and technology correlates with Suicides by hanging, strangulation and suffocation”. One axis shows US spending on science ($15 billion to $30 billion); the other shows hanging suicides (4,000 to 10,000).]

The PMCC is r ≈ 0.99789126. So the two sets of data are almost perfectly linearly-
correlated. But of course, this doesn’t mean that spending on science causes suicides
or that suicides cause spending on science. More likely, the correlation is simply spurious.
A comic from xkcd:


108.4. Linear Regression

Example 1210 (continued from above). We suspect that the heights and weights of
adult male Singaporeans are linearly-correlated. We thus write down this linear model:

w = a + bh.

Recall the quote: “All models are wrong, but some are useful.” The model w = a + bh is
unlikely to be exactly correct. But hopefully it will be useful.
We treat a and b as unknown parameters (do you expect b to be positive or negative?).
Our goal is to try to get estimates for a and b, from an observed random sample of height
and weight data.
We recycle the data from earlier. These, along with the scatter diagram, are reproduced
for convenience.

i 1 2 3 4 5 6 7 8 9 10
hi (cm) 182 165 173 155 178 174 169 160 150 190
wi (kg) 81 70 71 53 72 75 69 60 44 80

[Figure: scatter diagram of weight (kg) against height (cm), with three candidate lines of best fit.]

The basic idea of linear regression is this: Find the line that “best fits” the given data.
Drawn in the figure above are three plausible candidates for the “line of best fit”. But
there can only be one line of best fit. Which is it?
At the end of the day, we’ll choose the black dotted line as “the” line of best fit. But why?
This will be answered in the next section.


Example 1211 (continued from above). We suspect that daily rainfall and daily high
temperatures for 2015 were linearly-correlated. We thus write down this linear model:

p = a + bt.

Again, our goal is to get estimates for the unknown parameters a and b (do you expect
b to be positive or negative?).
We gather the following data (recycled from before):

i         1     2     3     4     ...   361
ti (°C)   27.3  29.5  31.1  32    ...   30.2
pi (mm)   0     0.2   0     0     ...   12.4

We can again plot a scatter diagram of rainfall against temperature.

[Figure: scatter diagram of rainfall (mm) against temperature (degrees Celsius), with several candidate lines of best fit.]

Again, drawn in the figure above are several plausible candidates for the “line of best fit”.
It turns out that the black dotted line will be “the” line of best fit.


108.5. Ordinary Least Squares (OLS)
There are different methods for determining “the” line of best fit. Each method will give a
different line of best fit.
The method we’ll learn in H2 Maths is the most basic and most standard method. It is
called the method of ordinary least squares (OLS).
Let’s assume there is some true linear model, which may be written as y = a+bx. As always,
we stick to the objectivist interpretation. The parameters a and b have some true, fixed
values. However, they are unknown (and may forever be unknown).
Nonetheless, we’ll try to do our best and get estimates for a and b. These estimates will be
denoted â and b̂. And our line of best fit will then be y = â + b̂x.
How do we find this line of best fit? Intuitively, this will be the line to which the data
points are “as close as possible”. But there are many ways to define the term “as close
as possible”. For example, we could try to minimise the sum of the distances between the
points and the line. But we shall not do this.
Instead, we’ll use the method of OLS:
1. Measure the vertical distance of each data point (xi , yi ) from the line. This is called the
residual and is denoted ûi .
2. Our goal is to find the line y = â + b̂x that minimises ∑ ûi². This quantity is called the
Sum of Squared Residuals (SSR).


Example 1210 (height and weight example revisited). Our candidate line of best
fit is w = â + b̂h = 65 + 0h = 65. This is a horizontal line, which simply “predicts” that
everyone’s weight is always 65 kg, regardless of their height. (This is a somewhat silly
candidate line of best fit. Not surprisingly, this is not the actual line of best fit.)

[Figure: scatter diagram of weight (kg) against height (cm), with the candidate line of best fit w = 65.]

i 1 2 3 4 5 6 7 8 9 10
hi (cm) 182 165 173 155 178 174 169 160 150 190
wi (kg) 81 70 71 53 72 75 69 60 44 80
ŵi (kg) 65 65 65 65 65 65 65 65 65 65
ûi = wi − ŵi (kg) 16 5 6 −12 7 10 4 −5 −21 15

The second last row of the above table gives, for each person with height hi , the cor-
responding predicted weight ŵi (as per our candidate line of best fit). The residual ûi
(last row) is then defined as the vertical distance between the data point and the weight
predicted by the candidate line of best fit.
The SSR is ∑ ûi² = 16² + 5² + 6² + (−12)² + 7² + 10² + 4² + (−5)² + (−21)² + 15² = 1317.
Can we do better than this? That is, can we find another candidate line of best fit whose
SSR is smaller than 1317?
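
One way to explore this question is to compute the SSR of any candidate line w = a + bh directly. Below is a minimal Python sketch; the function name `ssr` and the second candidate (the horizontal line through the mean weight, 67.5 kg) are illustrative choices, not taken from the example.

```python
heights = [182, 165, 173, 155, 178, 174, 169, 160, 150, 190]  # cm
weights = [81, 70, 71, 53, 72, 75, 69, 60, 44, 80]            # kg

def ssr(a, b):
    """Sum of squared residuals of the candidate line w = a + b*h."""
    return sum((w - (a + b * h)) ** 2 for h, w in zip(heights, weights))

print(ssr(65, 0))     # the horizontal candidate line w = 65 from the example
print(ssr(67.5, 0))   # another horizontal line, through the mean weight
```

Even staying with horizontal lines, moving the line from 65 kg to the mean weight already lowers the SSR, so the candidate w = 65 cannot be the line of best fit.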


The following fact gives two formulae for b̂, the gradient of the line of best fit. Formula (i)
is printed in the List of Formulae (MF26) you get during exams, but formula (ii) is not.

Fact 187. Let (x1 , x2 , . . . , xn ) and (y1 , y2 , . . . , yn ) be two ordered sets of data. The OLS
regression line of y on x is y − ȳ = b̂ (x − x̄), where

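$$\text{(i)} \quad \hat{b} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},$$

$$\text{(ii)} \quad \hat{b} = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sum x_i^2 - n\bar{x}^2}.$$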

Moreover, the regression line can also be written in the form y = â + b̂x, where b̂ is as
given above and â = ȳ − b̂x̄.

Proof. We want to find â and b̂ such that the line y = â + b̂x has the smallest SSR possible.
The residual ûi is defined as the vertical distance between (xi , yi ) and the line y = â + b̂x.
That is,

$$\hat{u}_i = y_i - \hat{y}_i = y_i - (\hat{a} + \hat{b} x_i).$$

Thus, the SSR is $\sum \hat{u}_i^2 = \sum [y_i - (\hat{a} + \hat{b} x_i)]^2$.

We wish to minimise the SSR, by choosing appropriate values of â and b̂. This involves the
following pair of first order conditions:375

$$\frac{\partial}{\partial \hat{a}} \sum \hat{u}_i^2 = 0, \qquad \frac{\partial}{\partial \hat{b}} \sum \hat{u}_i^2 = 0.$$
The remainder of the proof simply involves taking derivatives and doing the algebra, and
is continued on p. 1384 in the Appendices.

Remark 129. Whenever we simply say regression line or line of best fit, it may safely
be assumed that we are talking about the OLS regression line.

375 There’s a bit of hand-waving here.
Example 1210 (height and weight example revisited). We already calculated

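$$\bar{h} = 169.6, \quad \bar{w} = 67.5, \quad \sum_{i=1}^{n} (h_i - \bar{h})^2 = 1382.4, \quad \sum_{i=1}^{n} (h_i - \bar{h})(w_i - \bar{w}) = 1237.$$

$$\text{So, } \hat{b} = \frac{\sum_{i=1}^{n} (h_i - \bar{h})(w_i - \bar{w})}{\sum_{i=1}^{n} (h_i - \bar{h})^2} = \frac{1237}{1382.4} \approx 0.8948.$$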

Thus, the regression line is w − 67.5 = 0.8948 (h − 169.6) or w = â + b̂h = −84.26 + 0.8948h.

[Figure: scatter diagram of weight (kg) against height (cm), with the regression line.]

i 1 2 3 4 5 6 7 8 9 10
hi (cm) 182 165 173 155 178 174 169 160 150 190
wi (kg) 81 70 71 53 72 75 69 60 44 80
ŵi (kg) 78.6 63.4 70.5 54.4 75.0 71.4 67.0 58.9 50.0 85.8
ûi = wi − ŵi (kg) 2.4 6.6 0.5 −1.4 −3.0 3.6 2.0 1.1 −6.0 −5.8

The SSR for the actual line of best fit is ∑ ûi² = 2.4² + ⋅ ⋅ ⋅ + (−5.8)² ≈ 147.6. This is much
better than the SSR of 1317 that we found for the previous candidate line of best fit,
which was simply a horizontal line.
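
The whole calculation can be sketched in Python. The code below computes b̂ using both formula (i) and formula (ii) of Fact 187 (they should agree), then â = w̄ − b̂h̄, and finally the SSR of the resulting line. Variable names such as `b_hat_i` and `b_hat_ii` are illustrative.

```python
heights = [182, 165, 173, 155, 178, 174, 169, 160, 150, 190]  # cm
weights = [81, 70, 71, 53, 72, 75, 69, 60, 44, 80]            # kg

n = len(heights)
h_bar = sum(heights) / n
w_bar = sum(weights) / n

# Formula (i): deviations from the means.
b_hat_i = (sum((h - h_bar) * (w - w_bar) for h, w in zip(heights, weights))
           / sum((h - h_bar) ** 2 for h in heights))

# Formula (ii): raw sums, with the means subtracted at the end.
b_hat_ii = ((sum(h * w for h, w in zip(heights, weights)) - n * h_bar * w_bar)
            / (sum(h ** 2 for h in heights) - n * h_bar ** 2))

# Intercept of the regression line, from a_hat = w_bar - b_hat * h_bar.
a_hat = w_bar - b_hat_i * h_bar

# Sum of squared residuals of the fitted line w = a_hat + b_hat * h.
ssr = sum((w - (a_hat + b_hat_i * h)) ** 2 for h, w in zip(heights, weights))

print(b_hat_i, b_hat_ii, a_hat, ssr)
```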


Exercise 440. (a) Find the regression line of q on p, using the data below. (b) Com-
plete the table. (c) Draw the scatter diagram, including the regression line and the
corresponding residuals. (d) Compute the SSR. (Answer on p. 1600.)

i 1 2 3 4 5
pi ($) 8 9 4 10 8
qi 300 250 1000 400 400
ûi = qi − q̂i


108.6. TI84 to Calculate the PMCC and the OLS Estimates

Example 1212. We’ll find the PMCC and the regression line for these data:

i 1 2 3 4 5
xi 1 7 3 11 8
yi 14 5 6 4 4

1. Press ON to turn on your calculator.

2. Press the blue 2ND button and then CATALOG (which corresponds to the 0 button).
This brings up the CATALOG menu.
3. Using the down arrow key, scroll down until the cursor is on DiagnosticOn.

4. Press ENTER once. And press ENTER a second time. The TI84 now says “DONE”,
telling you that the Diagnostic option has been turned on.
The above steps need only be performed once. Unless of course you’ve just reset your
calculator (as is required before each exam). In which case you have to go through the
above steps again.

[Screenshots: the TI84 display after each of Steps 1 to 4.]

5. Press STAT to bring up the STAT menu.

6. Press 1 to select the “1:Edit” option.
7. The TI84 now prompts you to enter data under the column titled “L1”. This is where
you should enter the data for x, using the numeric pad and the ENTER key as is
appropriate. (I omit from this step the exact buttons you should press.)
8. After entering the last entry, press the right arrow key ⟩ to go to column L2. So enter
the data for y, again using the numeric pad and the ENTER key as is appropriate.

[Screenshots: the TI84 display after each of Steps 5 to 8.]

9. Now press STAT to again bring up the STAT menu.
10. Press the right arrow key ⟩ to go to the CALC submenu.
11. Press 4 to select the “4:LinReg(ax+b)” option.
12. To tell the TI84 to go ahead and do the calculations, simply press ENTER .
The TI84 tells you that the PMCC is r = −.8147656398. The equation of the regression
line of y on x is y = ax + b = −.859375x + 11.75625.
(Be careful to note that the TI84 uses the symbol “a” for the coefficient for x, whereas
in the List of Formulae (MF26), they use b instead. Don’t get these mixed up!)

[Screenshots: the TI84 display after each of Steps 9 to 12.]

Exercise 441. Using your TI84, find the PMCC between q and p, and also find the
regression line of q on p (see data below). Verify that your answer for this exercise is the
same as those in the last two exercises. (Answer on p. 1601.)

i 1 2 3 4 5
pi ($) 8 9 4 10 8
qi 300 250 1000 400 400


108.7. Interpolation and Extrapolation

Given any value of x, we call the corresponding ŷ = b̂ (x − x̄) + ȳ the fitted value or the
predicted value. One use of the regression line is that it can help us predict (or “guess”)
the value of y, even for x for which we have no data.
Example 1210 (height and weight example revisited). Say we want to guess
the weight of an adult male Singaporean who is 185 cm tall. Using our regression line,
we predict that his weight is ŵ_{h=185} = 0.8948 × 185 − 84.26 ≈ 81.3 kg. This is called
interpolation, because we are predicting the weight of a person whose height is between
two of our observations.
Say instead we want to guess the weight of an adult male Singaporean who is 210 cm tall.
Using our regression line, we predict that his weight is ŵ_{h=210} = 0.8948 × 210 − 84.26 ≈ 103.6
kg. This is called extrapolation, because we are predicting the weight of a person whose
height is beyond our rightmost observation.

i         1     2     3     4     5     6     7     8     9     10    -     -
hi (cm)   182   165   173   155   178   174   169   160   150   190   185   210
wi (kg)   81    70    71    53    72    75    69    60    44    80    -     -
ŵi (kg)   78.6  63.4  70.5  54.4  75.0  71.4  67.0  58.9  50.0  85.8  81.3  103.6

[Figure: scatter diagram of weight (kg) against height (cm), with the regression line extended to heights of up to 215 cm.]
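
The two predictions can be sketched in Python as follows. The function `predict_weight` is a hypothetical helper that uses the rounded estimates â = −84.26 and b̂ = 0.8948 from the example, and labels a prediction as interpolation if the height lies within the range of observed heights and as extrapolation otherwise.

```python
heights = [182, 165, 173, 155, 178, 174, 169, 160, 150, 190]  # observed heights (cm)
A_HAT, B_HAT = -84.26, 0.8948  # rounded OLS estimates from the example

def predict_weight(h):
    """Return the fitted weight (kg) at height h and the kind of prediction."""
    inside = min(heights) <= h <= max(heights)
    kind = "interpolation" if inside else "extrapolation"
    return A_HAT + B_HAT * h, kind

for h in (185, 210):
    w_hat, kind = predict_weight(h)
    print(h, round(w_hat, 1), kind)
```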


For the A-Level exams, you are supposed to mindlessly and formulaically say that “Extra-
polation is less reliable than interpolation”, because

The former predicts what’s beyond the known observations; the

latter predicts what’s between two known observations.

This, though, is not a very satisfying explanation for why extrapolation is “less reliable”
than interpolation. It merely leads to another question: “Why should a prediction be more
reliable if done between two known observations, than if done to the right of the right-most
observation (or to the left of the left-most observation)?”
We won’t give an adequate answer to this latter question. Instead, we’ll simply give a
bunch of examples to illustrate the dangers of extrapolation:

Example 1213. A man on a diet weighs 115 kg in Week #1. Here’s a chart of his weight

The OLS line of best fit suggests that he has been losing about 0.5 kg a week.
He forgot to record his weight on Week #6. By interpolation, we “predict” that his
weight that week was 112.5 kg. This is probably a reliable guess.
By extrapolation, we predict that his weight on Week #201 will be 15 kg. This guess is
obviously absurd. It requires that he keeps losing 0.5 kg a week for nearly 4 years.


Example 1214. A growing boy is 160 cm tall in Month #1. Here’s a chart of his growth.

The OLS line of best fit suggests that he has been growing by about 1 cm a month.
He forgot to record his height in Month #6. By interpolation, we “predict” that his
height that month was 165 cm. This is probably a reliable guess.
By extrapolation, we predict that his height in Month #101 will be 260 cm. This guess is
obviously absurd. It requires that he keep growing by 1 cm a month for the next 8-plus years.


Here are three colourful examples of the dangers of extrapolation from other contexts.

Example 1215. Russell’s Chicken (Problems of Philosophy, 1912, Google Books link):
The man who has fed the chicken every day throughout its life at last wrings its neck
instead, showing that more refined views as to the uniformity of nature would have been
useful to the chicken. ... The mere fact that something has happened a certain number
of times causes animals and men to expect that it will happen again. Thus our instincts
certainly cause us to believe the sun will rise to-morrow, but we may be in no better a
position than the chicken which unexpectedly has its neck wrung.

Example 1216. The Fermat numbers are

$F_0 = 2^{2^0} + 1 = 3$,

$F_1 = 2^{2^1} + 1 = 5$,

$F_2 = 2^{2^2} + 1 = 17$,

$F_3 = 2^{2^3} + 1 = 257$,

$F_4 = 2^{2^4} + 1 = 65537$.

Remarkably, the first five Fermat numbers are all prime. This observation led Fermat to
conjecture (guess) in the 17th century that all Fermat numbers are prime. This was an
act of extrapolation.
Unfortunately, Fermat’s act of extrapolation was wrong. About a century later, Euler
showed that $F_5 = 2^{2^5} + 1 = 4294967297 = 641 \times 6700417$ is composite (not prime).

Today, the Fermat numbers $F_5, F_6, \dots, F_{32}$ are all known to be composite. Indeed,
it was shown in 1964 that $F_{32}$ is composite. Over half a century later, it is not yet
known if $F_{33} = 2^{2^{33}} + 1$ is prime or composite. $F_{33}$ is an unimaginably huge number, with
2,585,827,973 digits.
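
The claims in this example are easy to check by computer. The Python sketch below is only an illustration (the helper names are my own choice). It uses trial division, which is fine for the first five Fermat numbers but hopeless for anything like $F_{33}$, and it gets the digit count of $F_{33}$ from logarithms rather than by writing the number out.

```python
import math

def fermat(n):
    """Return the n-th Fermat number F_n = 2**(2**n) + 1."""
    return 2 ** (2 ** n) + 1

def is_prime(m):
    """Trial division; adequate for small m such as F_0, ..., F_4."""
    if m < 2:
        return False
    for d in range(2, math.isqrt(m) + 1):
        if m % d == 0:
            return False
    return True

first_five = [fermat(n) for n in range(5)]
all_prime = all(is_prime(f) for f in first_five)

# Euler's factorisation of F_5.
f5_factors_ok = fermat(5) == 641 * 6700417

# 2**N has floor(N * log10(2)) + 1 digits, and adding 1 to a power of two
# never creates an extra digit, so this is also the digit count of F_33.
digits_f33 = math.floor(2 ** 33 * math.log10(2)) + 1

print(first_five, all_prime, f5_factors_ok, digits_f33)
```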

1106, Contents

Example 1217. On Ah Beng’s first day at school, he learns in Chinese class that the
Chinese character for the number 1 is written as a single horizontal stroke.
On his second day at school, he learns that the Chinese character for the number 2 is
written as two horizontal strokes.
On his third day at school, he learns that the Chinese character for the number 3 is
written as three horizontal strokes.

The Chinese characters for 1, 2, and 3 are, respectively: 一, 二, 三.

After his third day at school, Ah Beng decides he’ll skip at least the next few Chinese
classes, because he thinks he knows how to write the Chinese characters for the numbers 4
and above. 4 simply consists of four horizontal strokes; 5 simply consists of five horizontal
strokes; etc. Unfortunately, Ah Beng’s act of extrapolation is wrong.
The characters for the numbers 4 through 10 look instead like this:

4: 四   5: 五   6: 六   7: 七   8: 八   9: 九   10: 十

1107, Contents

On the other hand, here are two historical examples of extrapolation that, to everyone’s
surprise, have held up remarkably well (at least to date).

Example 1218. Moore’s Law. In 1965, Gordon Moore observed that the number of
components that could be crammed onto each integrated circuit doubled every year. He
predicted that this rate of progress would continue at least through 1975.
In 1975, he adjusted his prediction to a more modest rate of doubling every two years.
Thus far, this latter prediction has held up remarkably well. The following is from Nature:

For the past five decades, the number of transistors per microprocessor
chip — a rough measure of processing power — has doubled about every
two years, in step with Moore’s law (top). Chips also increased their ‘clock
speed’, or rate of executing instructions, until 2004, when speeds were
capped to limit heat. As computers increase in power and shrink in size, a
new class of machines has emerged roughly every ten years (bottom).

[Figure: transistors per chip and clock speeds (MHz), 1960 to 2016, on a logarithmic scale.]

Unfortunately, as stated in the same Nature article, it “has become increasingly obvious
to everyone involved” that “Moore’s law ... is nearing its end”.
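
To get a feel for what such doubling rules imply, here is a small Python sketch of the compound growth. The function name and the 50-year horizon (standing in for “five decades”) are illustrative assumptions, not figures taken from the Nature article.

```python
def growth_factor(years, doubling_period_years):
    """Factor by which a quantity grows if it doubles once per period."""
    return 2 ** (years / doubling_period_years)

# Moore's 1975 prediction: doubling every two years, over five decades.
print(growth_factor(50, 2))   # equals 2**25

# Moore's 1965 observation: doubling every year, from 1965 through 1975.
print(growth_factor(10, 1))   # equals 2**10
```

A factor of 2**25 is more than 33 million, which is why sustained exponential extrapolations so rarely survive for long.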

1108, Contents

Example 1219. Augustine’s Law. In 1983, Norman Augustine observed that the cost
of a tactical aircraft grows four-fold every ten years. (Google Books.)



This is considerably quicker than the rate at which the annual US defense budget and
US Gross National Product (GNP) grow. Extrapolating, he concluded:
• In 2054, the entire annual US defense budget will be spent on a single aircraft.
• Early in the 22nd century, the entire US GNP will be spent on a single aircraft.
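
“Four-fold every ten years” can be converted into an equivalent yearly growth rate, which makes it easier to compare with budget or GNP growth. The Python sketch below does only this conversion. The function name is my own, and the 1983 to 2054 multiple is a naive compounding illustration, not a figure from Augustine’s book.

```python
def annual_growth_rate(factor, years):
    """Constant yearly multiplier that compounds to `factor` over `years`."""
    return factor ** (1 / years)

rate = annual_growth_rate(4, 10)
print(f"{(rate - 1) * 100:.1f}% per year")

# Naive cost multiple between 1983 and 2054 if that pace were sustained.
print(rate ** (2054 - 1983))
```

Any quantity growing at roughly 15% a year must eventually overtake one that grows more slowly, which is all that Augustine’s extrapolation relies on.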

1109, Contents

(... Example continued from the previous page.)
These seemingly-absurd conclusions were written at least partly in jest.
Except so far they have been right on track. In a 2010 Economist article, Augustine was
quoted as saying, “We are right on target. Unfortunately nothing has changed.” That
article also presented an updated version of Augustine’s Law.
The latest F-35 fighter program is estimated to cost the US Department of Defense
US$1.124 trillion. To be fair, that estimate is the cost of the entire program over its
projected 60-year lifespan (through 2070) — this includes R&D, the purchase of over
2,000 F-35s, and operating costs. But still, US$1.124 trillion is a mind-blowing figure.376

Exercise 442. Using the data below, “predict” how many haircuts were sold in June
2016 by (a) a barber who charged $7 per haircut; and (b) a barber who charged $200 per
haircut. Which prediction is an act of interpolation and which is an act of extrapolation?
Which prediction do you think is more reliable? (Answer on p. 1601.)

i 1 2 3 4 5
pi ($) 8 9 4 10 8
qi 300 250 1000 400 400

1110, Contents

108.8. Transformations to Achieve Linearity
Two variables may have a relationship, but not a linear one. Here we consider cases where
the relationship is quadratic, reciprocal, or logarithmic.

Example 1220. Quadratic. Consider the following data. There is a very strong, but
not perfect degree of linear correlation between x and y (r ≈ 0.950). The observations are
very close to, but are not exactly on the OLS line of best fit.

Perhaps we can do better by transforming the data. We’ll do a quadratic transformation:

let $z_i = x_i^2$. Then we have the following data.

The degree of linear correlation between z and y is near perfect (r ≈ 0.995). The
observations also lie closer to the line of best fit than before.
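
The effect of a quadratic transformation can be reproduced in a few lines of Python. The data below are hypothetical (they are NOT the data of this example, which appear only in the charts): y is constructed to be exactly quadratic in x, so that the improvement in the PMCC after transforming is unmistakable. The helper `pmcc` is my own naming.

```python
import math

def pmcc(xs, ys):
    """Product moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: y is exactly 3 + 2x^2.
x = [1, 2, 3, 4, 5, 6]
y = [3 + 2 * xi ** 2 for xi in x]

# Quadratic transformation z_i = x_i^2.
z = [xi ** 2 for xi in x]

r_before = pmcc(x, y)  # strong, but below 1
r_after = pmcc(z, y)   # equal to 1, up to rounding error
print(r_before, r_after)
```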

1111, Contents

Example 1221. Reciprocal. Consider the following data. There seems to be a moderate
degree of linear correlation between x and y (r ≈ −0.603). The observations are fairly
close to the OLS line of best fit.

Perhaps we can do better by transforming the data. We’ll do a reciprocal transformation:

let zi = 1/xi . Then we have the following data and scatter diagram.

The degree of linear correlation between z and y is much stronger (r ≈ 0.899). The
observations also lie closer to the line of best fit.

1112, Contents

Example 1222. Logarithmic. Consider the following data. There seems to be a fairly
strong degree of linear correlation between x and y (r ≈ 0.873). The observations are
fairly close to, but are not exactly on the OLS line of best fit.

Perhaps we can do better by transforming the data. We’ll do a logarithmic transformation:

let zi = ln xi . Then we have the following data and scatter diagram.

The degree of linear correlation between z and y is much stronger (r ≈ 0.978). The
observations also lie closer to the line of best fit.
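
The reciprocal and logarithmic transformations work the same way in code: transform x, then recompute the PMCC. The sketch below again uses hypothetical data (NOT the data of Examples 1221 and 1222), constructed so that y is exactly linear in 1/x in one case and exactly linear in ln x in the other.

```python
import math

def pmcc(xs, ys):
    """Product moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 4, 5, 10]

# Hypothetical reciprocal relationship: y = 10 / x.
y_recip = [10 / xi for xi in x]
z_recip = [1 / xi for xi in x]            # z_i = 1 / x_i
r_recip_before = pmcc(x, y_recip)         # negative and well short of -1
r_recip_after = pmcc(z_recip, y_recip)    # 1, up to rounding error

# Hypothetical logarithmic relationship: y = 3 + 2 ln x.
y_log = [3 + 2 * math.log(xi) for xi in x]
z_log = [math.log(xi) for xi in x]        # z_i = ln x_i
r_log_before = pmcc(x, y_log)             # positive, below 1
r_log_after = pmcc(z_log, y_log)          # 1, up to rounding error

print(r_recip_before, r_recip_after, r_log_before, r_log_after)
```

Note that the sign of r can flip under a reciprocal transformation, just as it does in Example 1221, because 1/x decreases as x increases.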

1113, Contents

Exercise 443. You are given the following data. (Answer on p. 1602.)

i 1 2 3 4 5
xi 1 2 3 4 5
yi 10.59 10.54 27.30 33.84 56.6

(a) Plot the above data in a scatter diagram and find the PMCC.
(b) Apply an appropriate transformation to x. Plot the transformed data in a scatter
diagram and find the PMCC.

1114, Contents

108.9. The Higher the PMCC, the Better the Model?
There are no routine statistical questions, only questionable statistical routines.

— J.M. Hammersley377

It’s much more interesting to live not knowing than to have answers which
might be wrong.

— Richard Feynman (1981 interview).

The A-Level examiners378 want you to say, mindlessly and formulaically, that

All else equal, a model with a higher PMCC

is better than a model with a lower PMCC.

Regurgitating the above sentence will earn you your full mark. But in fact, without the
“all else equal” clause, it is nonsense. And since it is almost never true that “all else is
equal”, it is almost always nonsense.
In every introductory course or text on statistics, one is told that the PMCC is merely
a relatively-unimportant consideration, in deciding between models. Yet somehow, the
A-Level examiners seem to consider the PMCC an all-important consideration.
Here’s a quick example to illustrate.

Example 1223. (From the 2015 exam — see Exercise 668 below.) In an experiment the
following information was gathered about air pressure P , measured in inches of mercury,
at different heights above sea level h, measured in feet.

h 2000 5000 10000 15000 20000 25000 30000 35000 40000 45000
P 27.8 24.9 20.6 16.9 13.8 11.1 8.89 7.04 5.52 4.28

The exam first asks us to find the PMCCs between (a) h and P; (b) ln h and P; and (c) $\sqrt{h}$
and P. The answers are (a) ra ≈ −0.980731; (b) rb ≈ −0.974800; and (c) rc ≈ −0.998638.
The A-Level exam then says, “Using the most appropriate case ..., find the equation
which best models air pressure at different heights.” The “correct” answer is that (c)
$P = a + b\sqrt{h}$ is the “most appropriate” model, simply because the PMCC there is the
largest in magnitude.
(Example continues on the next page ...)

See 9740 N2015/II/10(iii), N2014/II/8(b)(ii), N2012/II/8(v), N2011/II/8(iii), N2010/II/10(iii), and
N2008/II/8(i). These are given in this textbook as Exercises 668, 674, 689, 695, 705, and 717.
1115, Contents
(... Example continued from the previous page.)
But this is utter nonsense. One does not conclude that one model is “more appropriate”
than another simply because its PMCC is 0.018 larger. Small measurement errors
or plain bad luck could easily explain these tiny differences in PMCCs.
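
The three PMCCs quoted in this example can be reproduced from the table of h and P. The Python sketch below is one way to do so. The helper `pmcc` is my own naming, and the printed values should agree with ra, rb, and rc above to the precision shown there.

```python
import math

def pmcc(xs, ys):
    """Product moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Data from the exam question: height (feet) and pressure (inches of mercury).
h = [2000, 5000, 10000, 15000, 20000, 25000, 30000, 35000, 40000, 45000]
P = [27.8, 24.9, 20.6, 16.9, 13.8, 11.1, 8.89, 7.04, 5.52, 4.28]

r_a = pmcc(h, P)                             # (a) h and P
r_b = pmcc([math.log(v) for v in h], P)      # (b) ln h and P
r_c = pmcc([math.sqrt(v) for v in h], P)     # (c) sqrt(h) and P

for label, r in [("h", r_a), ("ln h", r_b), ("sqrt h", r_c)]:
    print(f"PMCC of {label} and P: {r:.6f}")
```

All three values are within about 0.024 of one another, which is the point being made: the data alone barely distinguish between the candidate models.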
Moreover, even if one model has r = 0.9 and another has r = 0.4, it does not automatically
follow that the first model is “more appropriate” than the second. In deciding which
statistical model to use, there are very many considerations, of which the PMCC is a
relatively-unimportant one.
In my view, the correct answer should have been this:

We have far too little information to make any conclusions.

Sadly, in the Singapore education system, what I consider to be the correct answer would
not have gotten you any marks. Instead, one is taught that there must always be one
single, simplistic, formulaic, definitive, “correct” answer. This is a convenient substitute
for thinking.
As it turns out, the “most correct” linear model — based on the actual barometric formula
(see subsection 122.10 in the Appendices) — is actually the following:

$$\ln P = a + b \ln\left(1 + \frac{L}{T} h\right).$$

The constants L = −0.0065 kelvin per metre (K m⁻¹) and T = 288.15 kelvin (K) are,
respectively, the standard temperature lapse rate (up to 11,000 m above sea level) and
the standard temperature (at sea level).
The PMCC for the above model is rd ≈ 0.999998, which is “better” than the cases
examined above. (See this Google spreadsheet for the data and calculations.)
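
A sketch of how rd might be computed is given below. One assumption is mine and is not stated in the text: since L is given per metre, the heights (in feet) are first converted to metres using 1 ft = 0.3048 m. The linked spreadsheet may have been set up differently, so treat the printed value as indicative only.

```python
import math

def pmcc(xs, ys):
    """Product moment correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

FEET_TO_METRES = 0.3048  # assumption: heights are converted from feet to metres
L = -0.0065              # standard temperature lapse rate (K per metre)
T = 288.15               # standard temperature at sea level (K)

h_feet = [2000, 5000, 10000, 15000, 20000, 25000, 30000, 35000, 40000, 45000]
P = [27.8, 24.9, 20.6, 16.9, 13.8, 11.1, 8.89, 7.04, 5.52, 4.28]

# Transformed variables of the model ln P = a + b ln(1 + (L / T) h).
x = [math.log(1 + (L / T) * hf * FEET_TO_METRES) for hf in h_feet]
y = [math.log(p) for p in P]

r_d = pmcc(x, y)
print(f"PMCC of ln(1 + (L/T) h) and ln P: {r_d:.6f}")
```

Unlike the PMCC of h and P, this PMCC is not invariant to the units of h, because h enters through 1 + (L/T)h. That is why the unit conversion matters here.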
But again, the PMCC is merely one relatively-unimportant consideration. Our
conclusion that this last model is superior to the model $P = a + b\sqrt{h}$ is based not on the
fact that rd is 0.001 larger than rc.
Instead, we are confident in this model because it was derived from physical theories. In
contrast, the model $P = a + b\sqrt{h}$ (or indeed any of the other models suggested above)
is completely arbitrary and has no theoretical justification. Hence, even if the model
$P = a + b\sqrt{h}$ had a PMCC of 1, we’d still prefer this last model.

1116, Contents

Part VII.
Ten-Year Series

Answers for Part VI (Probability and Statistics) 2016 and 2017 questions
will be written “soon”.

1117, Contents

As stated on p. 4 of your H2 Maths syllabus, your A-Level exam will consist of two papers
of 3 hours each.
• Paper 1 [total 100 points]: Pure Mathematics, 3 hours, 10–12 questions.
• Paper 2 [total 100 points]: 3 hours
– Section A [total 40 points]: Pure Mathematics, 4–5 questions.
– Section B [total 60 points]: Probability and Statistics, 6–8 questions.
Each paper contains 100 points. So, you have an average of 1.8 min = 108 s to spend on
each point and each point is 0.5% of the maximum score of 200.

For more practice, try the TYS questions for H1 Maths (in my H1 Maths
Textbook). They’re very similar!
This part lists all the questions from the 2006–2017 A-Level exams, sorted into the six
different parts and in reverse chronological order.
In the older exams, they had the habit of not distinctly numbering different parts within
the same question as parts (i), (ii), etc. So I have sometimes taken the liberty of adding or
modifying such numbers.

1118, Contents

Many questions are out-of-
syllabus and can be skipped.
(They’re printed in light grey).379

Happily, the present 9758 syllabus (first examined in 2017) is considerably lighter than the previous
9740 syllabus (last examined in 2017), which was in turn lighter than the previous 9233 syllabus (last
examined in 2008). Thus, many past-year questions printed here are no longer in the current 9758
syllabus and you can skip them. Answers have been provided anyway and you’re perfectly welcome to
try them.
1119, Contents
The following appears on the cover page of each of your 9758 A-Level exam papers.
Write your Centre number, index number and name on the work you hand in.
Write in dark blue or black pen on both sides of the paper.
You may use an HB pencil for any diagrams or graphs.
Do not use staples, paper clips, glue or correction fluid.

Answer all the questions.

Give non-exact numerical answers correct to 3 significant figures, or 1 decimal place in the
case of angles in degrees, unless a different level of accuracy is specified in the question.
You are expected to use an approved graphing calculator.
Unsupported answers from a graphing calculator are allowed unless a question specifically
states otherwise.
Where unsupported answers from a graphing calculator are not allowed in a question,
you are required to present the mathematical steps using mathematical notations and not
calculator commands.
You are reminded of the need for clear presentation in your answers.

At the end of the examination, fasten all your work securely together.
The number of marks is given in brackets [ ] at the end of each question or part question.

1120, Contents

109. Past-Year Questions for Part I. Functions and Graphs
Exercise 444. (9758 N2017/I/2.) (Answer on p. 1603.)
(i) On the same axes, sketch the graphs of y = and y = b ∣x − a∣, where a and b are
positive constants. [2]
(ii) Hence, or otherwise, solve the inequality < b ∣x − a∣. [4]

Exercise 445. (9758 N2017/I/4.) (Answer on p. 1604.)

4x + 9
A curve C has equation y = .
(i) Show that the gradient of C is negative for all points on C. [3]

(ii) By expressing the equation of C in the form y = a +

, where a and b are constants,
write down the equations of the asymptotes of C. [3]

(iii) Describe a pair of transformations which transforms the graph of C on to the graph
of y = . [2]

Exercise 446. (9758 N2017/I/5.) (Answer on p. 1604.)

When the polynomial $x^3 + ax^2 + bx + c$ is divided by (x − 1), (x − 2) and (x − 3), the remainders
are 8, 12 and 25 respectively.
(i) Find the values of a, b and c. [4]
A curve has equation y = f(x), where $f(x) = x^3 + ax^2 + bx + c$, with the values of a, b and
c found in part (i).
(ii) Show that the gradient of the curve is always positive. Hence explain why the equation
f (x) = 0 has only one real root and find this root. [3]
(iii) Find the x-coordinates of the points where the tangent to the curve is parallel to the
line y = 2x − 3. [3]

Exercise 447. (9758 N2017/II/1.) (Answer on p. 1605.)

A curve C has parametric equations

x= , y = 2t.
(i) The line y = 2x cuts C at the points A and B. Find the exact length of AB. [3]
(ii) The tangent at the point P ( , 2p) on C meets the x-axis at D and the y-axis at E.
The point F is the midpoint of DE. Find a cartesian equation of the curve traced by
F as p varies. [5]
1121, Contents
Exercise 448. (9758 N2017/II/3.) (Answer on p. 1605.)
(a) The curve y = f (x) cuts the axes at (a, 0) and (0, b). It is given that f −1 (x) exists.
State, if it is possible to do so, the coordinates of the points where the following curves
cut the axes.
(i) y = f (2x).
(ii) y = f (x − 1).
(iii) y = f (2x − 1).
(iv) y = f −1 (x). [4]
(b) The function g is defined by

g ∶x↦1− , where x ∈ R, x ≠ a.
(i) State the value of a and explain why this value has to be excluded from the
domain of g. [2]
(ii) Find $g^2(x)$ and $g^{-1}(x)$, giving your answers in simplified form.
(iii) Find the values of b such that $g^2(b) = g^{-1}(b)$. [2]

Exercise 449. (9740 N2016/I/1.) (Answer on p. 1605.)

4x2 + 4x − 14
Express − (x + 3) as a single simplified fraction. [2]
Hence, without using a calculator, solve the inequality

4x2 + 4x − 14
< (x + 3). [3]

Exercise 450. (9740 N2016/I/3.) (Answer on p. 1606.)

The curve $y = x^4$ is transformed onto the curve with equation y = f(x). The turning
point on $y = x^4$ corresponds to the point with coordinates (a, b) on y = f(x). The curve
y = f(x) also passes through the point with coordinates (0, c). Given that f(x) has the
form $k(x - l)^4 + m$ and that a, b and c are positive constants with c > b, express k, l and
m in terms of a, b and c. [2]
By sketching the curve y = f(x), or otherwise, sketch the curve $y = \frac{1}{f(x)}$. State, in terms
of a, b and c, the coordinates of any points where $y = \frac{1}{f(x)}$ crosses the axes and of any
turning points. [4]
1122, Contents
Exercise 451. (9740 N2016/I/10.) (Answer on p. 1606.)

(a) The function f is given by $f : x \mapsto 1 + \sqrt{x}$, for x ∈ R, x ≥ 0.
(i) Find $f^{-1}(x)$ and state the domain of $f^{-1}$. [3]
(ii) Show that if f f(x) = x then $x^3 - 4x^2 + 4x - 1 = 0$. Hence find the value of x for which
f f(x) = x. Explain why this value of x satisfies the equation $f(x) = f^{-1}(x)$. [5]
(b) The function g, with domain the set of non-negative integers, is given by

$$g(n) = \begin{cases} 1 & \text{for } n = 0, \\ 2 + g\left(\tfrac{1}{2} n\right) & \text{for } n \text{ even}, \\ 1 + g(n - 1) & \text{for } n \text{ odd}. \end{cases}$$

(i) Find g(4), g(7), and g(12). [3]

(ii) Does g have an inverse? Justify your answer. [2]

Exercise 452. (9740 N2015/I/1.) (Answer on p. 1608.)

A curve C has equation

y= + bx + c,
where a, b and c are constants. It is given that C passes through the points with coordinates
(1.6, −2.4) and (−0.7, 3.6), and that the gradient of C is 2 at the point where x = 1.
(i) Find the values of a, b and c, giving your answers correct to 3 decimal places. [4]
(ii) Find the x-coordinate of the point where C crosses the x-axis, giving your answer
correct to 3 decimal places. [2]
(iii) One asymptote of C is the line with equation x = 0. Write down the equation of the
other asymptote of C. [1]

Exercise 453. (9740 N2015/I/2.) (Answer on p. 1608.)

(i) Sketch the curve with equation y = ∣ ∣, stating the equations of the asymptotes.
On the same diagram, sketch the line with equation y = x + 2. [3]
(ii) Solve the inequality ∣ ∣ < x + 2. [3]
1123, Contents
Exercise 454. (9740 N2015/I/5.) (Answer on p. 1609.)
(i) State a sequence of transformations that will transform the curve with equation $y = x^2$
on to the curve with equation $y = 0.25(x - 3)^2$. [2]

A curve has equation y = f(x) where

$$f(x) = \begin{cases} 1 & \text{for } 0 \le x \le 1, \\ 0.25(x - 3)^2 & \text{for } 1 < x \le 3, \\ 0 & \text{otherwise}. \end{cases}$$
(ii) Sketch the curve for −1 ≤ x ≤ 4. [3]
(iii) On a separate diagram, sketch the curve with equation y = 1 + f (0.5x), for −1 ≤ x ≤ 4.

Exercise 455. (9740 N2015/II/3.) (Answer on p. 1610.)

(a) The function f is defined by

f ∶x→ , x ∈ R, x > 1.
1 − x2
(i) Show that f has an inverse. [2]
(ii) Find f −1 (x) and state the domain of f −1 . [3]
(b) The function g is defined by

g∶x→ , x ∈ R, x ≠ ±1.
1 − x2

Find algebraically the range of g, giving your answer in terms of 3 as simply as
possible. [5]

Exercise 456. (9740 N2014/I/1.) (Answer on p. 1611.)

The function f is defined by

y= , x ∈ R, x ≠ 1, x ≠ 0.
(i) Show that f 2 (x) = f −1 (x). [4]
(ii) Find f 3 (x) in simplified form. [1]

1124, Contents

Exercise 457. (9740 N2014/I/4.) (Answer on p. 1611.)
The diagram shows the curve y = f (x). The curve crosses the x-axis at the points A, B
and C, and has a maximum turning point at D where it crosses the y-axis. The coordinates
of A, B, C and D are (−a, 0), (b, 0), (c, 0) and (0, d) respectively, where a, b, c and d are
positive constants.


[Diagram: the curve y = f(x), with the origin O and the x-axis marked.]

(i) Sketch the curve $y^2 = f(x)$, stating, in terms of a, b, c and d, the coordinates of any
turning points and of the points where the curve crosses the x-axis. [4]
(ii) What can be said about the tangents to the curve $y^2 = f(x)$ at the points where it
crosses the x-axis? [1]

Exercise 458. (9740 N2014/II/1.) (Answer on p. 1612.)

A curve C has parametric equations $x = 3t^2$, $y = 6t$.
(i) Find the value of t at the point on C where the tangent has gradient 0.4. [3]
(ii) The tangent at the point $P(3p^2, 6p)$ on C meets the y-axis at the point D. Find the
cartesian equation of the locus of the mid-point of PD as p varies. [4]

Remark 130. For (ii), assume also that p ≠ 0.380

Exercise 459. (9740 N2013/I/2.) (Answer on p. 1612.)

It is given that

x2 + x + 1
y= , x ∈ R, x ≠ 1.
Without using a calculator, find the set of values that y can take. [5]

The question writers may have overlooked the fact that if p = 0, then P = (0, 0) and the tangent at P
is vertical, so that D could be any point on the y-axis.
1125, Contents
Exercise 460. (9740 N2013/I/3.) (Answer on p. 1613.)
(i) Sketch the curve with equation
2x − 1

stating the equations of any asymptotes and the coordinates of the points where the
curve crosses the axes. [4]
(ii) Solve the inequality

< 1. [1]
2x − 1

Exercise 461. (9740 N2013/II/1.) (Answer on p. 1614.)

Functions f and g are defined by

f ∶x ↦ , x ∈ R, x ≠ 1,
g ∶ x ↦ 1 − 2x, x ∈ R.

(i) Explain why the composite function f g does not exist. [2]
(ii) Find an expression for gf (x) and hence, or otherwise, find (gf ) −1 (5). [4]

Exercise 462. (9740 N2012/I/1.) (Answer on p. 1614.)

A cinema sells tickets at three different prices, depending on the age of the customer. The
age categories are under 16 years, between 16 and 65 years, and over 65 years. Three groups
of people, A, B, and C, go to the cinema on the same day. The numbers in each category
for each group, together with the total cost of the tickets for each group, are given in the
following table.

Group Under 16 years Between 16 and 65 years Over 65 years Total cost
A 9 6 4 $162.03
B 7 5 3 $128.36
C 10 4 5 $158.50

Write down and solve equations to find the cost of a ticket for each of the age categories.[4]
1126, Contents
Exercise 463. (9740 N2012/I/7.) (Answer on p. 1614.)
A function f is said to be self-inverse if f (x) = f −1 (x) for all x in the domain of f . The
function g is defined by

g∶x↦ , x ∈ R, x ≠ 1.
where k is a constant, k ≠ −1.
(i) Show that g is self-inverse. [2]
(ii) Given that k > 0, sketch the curve y = g (x), stating the equations of any asymptotes
and the coordinates of any points where the curve crosses the x- and y-axes. [3]
(iii) State the equation of one line of symmetry of the curve in part (ii), and describe fully
a sequence of transformations which would transform the curve y = onto this curve.

Exercise 464. (9740 N2012/II/3.) (Answer on p. 1616.)

It is given that $f(x) = x^3 + x^2 - 2x - 4$.
(i) Sketch the graph of y = f (x). [1]
(ii) Find the integer solution of the equation f (x) = 4, and prove algebraically that there
are no other real solutions. [3]
(iii) State the integer solution of the equation $(x + 3)^3 + (x + 3)^2 - 2(x + 3) - 4 = 4$. [1]
(iv) Sketch the graph of y = ∣f (x)∣. [1]
(v) Write down two different cubic equations which between them give the roots of the
equation ∣f (x)∣ = 4. Hence find all the roots of this equation. [4]

Exercise 465. (9740 N2011/I/1.) (Answer on p. 1616.)

Without using a calculator, solve the inequality

$\frac{x^2 + x + 1}{x^2 + x - 2} < 0$. [4]

Exercise 466. (9740 N2011/I/2.) (Answer on p. 1617.)

It is given that $f(x) = ax^2 + bx + c$, where a, b, and c are constants.
(i) Given that the curve with equation y = f(x) passes through the points with coordinates
(−1.5, 4.5), (2.1, 3.2) and (3.4, 4.1), find the values of a, b, and c. Give your
answers correct to 3 decimal places. [3]
(ii) Find the set of values of x for which f (x) is an increasing function.
1127, Contents
Exercise 467. (9740 N2011/II/3.) (Answer on p. 1617.)
The function f is defined by

$f : x \mapsto \ln(2x + 1) + 3$, x ∈ R, $x > -\frac{1}{2}$.
(i) Find f −1 (x) and write down the domain and range of f −1 . [4]
(ii) Sketch on the same diagram the graphs of y = f (x) and y = f −1 (x) giving the equa-
tions of any asymptotes and the exact coordinates of any points where the curves
cross the x- and y-axes. [4]
(iii) Explain why the x-coordinates of the points of intersection of the curves in part (ii)
satisfy the equation ln (2x + 1) = x − 3, and find the values of these x-coordinates,
correct to 4 significant figures. [3]

Exercise 468. (9740 N2010/I/5.) (Answer on p. 1619.)

The curve with equation $y = x^3$ is transformed by a translation of 2 units in the positive
x-direction, followed by a stretch with scale factor 0.5 parallel to the y-axis, followed by a
translation of 6 units in the negative y-direction.
(i) Find the equation of the new curve in the form y = f (x) and the exact coordinates of
the points where this curve crosses the x- and y-axes. Sketch the new curve. [5]
(ii) On the same diagram, sketch the graph of y = f (x), stating the exact coordinates

of the points where the graph crosses the x- and y-axes. [3]

Remark 131. In the above question’s first sentence, the second step
is a little ambiguous. Say we transform the black circle on the right
“by a stretch with scale factor 0.5 parallel to the y-axis”. Then do
we get (a) the red ellipse; or (b) the blue ellipse? Perhaps this was
clear in the mind of whoever wrote this question, but it isn’t to
me and probably not to others either. In my answer, I shall assume that
(a) the stretch is outwards from the y-axis.

1128, Contents

Exercise 469. (9740 N2010/II/4.) (Answer on p. 1620.)
The function f is defined as follows.

f ∶x↦ , for x ∈ R, x ≠ −1, x ≠ 1.
x2 − 1
(i) Sketch the graph of y = f (x). [1]
(ii) If the domain of f is further restricted to x ≥ k, state with a reason the least value of
k for which the function f −1 exists. [2]

In the rest of the question, the domain of f is x ∈ R, x ≠ −1, x ≠ 1, as originally defined.

The function g is defined as follows.

g∶x↦ , for x ∈ R, x ≠ 2, x ≠ 3, x ≠ 4.

(x − 3)
(iii) Show that f g (x) = . [2]
(4 − x) (x − 2)
(iv) Solve the inequality f g (x) > 0. [3]
(v) Find the range of f g. [3]

Exercise 470. (9740 N2009/I/1.) (Answer on p. 1623.)

(i) The first three terms of a sequence are given by u1 = 10, u2 = 6, u3 = 5. Given that un
is a quadratic polynomial in n, find un in terms of n. [4]
(ii) Find the set of values of n for which un is greater than 100. [2]

Exercise 471. (9740 N2009/I/6.) (Answer on p. 1623.)

The curve $C_1$ has equation $y = \frac{x - 2}{x + 2}$. The curve $C_2$ has equation $\frac{x^2}{6} + \frac{y^2}{3} = 1$.
(i) Sketch $C_1$ and $C_2$ on the same diagram, stating the exact coordinates of any points
of intersection with the axes and the equations of any asymptotes. [4]
(ii) Show algebraically that the x-coordinates of the points of intersection of $C_1$ and $C_2$
satisfy the equation $2(x - 2)^2 = (x + 2)^2 (6 - x^2)$.
(iii) Use your calculator to find these x-coordinates. [2]
1129, Contents
Exercise 472. (9740 N2009/II/3.) (Answer on p. 1624.)
The function f is defined by

$f : x \mapsto \frac{ax}{bx - a}$ for x ∈ R, $x \neq \frac{a}{b}$,

where a and b are non-zero constants.
(i) Find $f^{-1}(x)$. Hence or otherwise find $f^2(x)$ and state the range of $f^2$. [5]
(ii) The function g is defined by g ∶ x ↦ for all real non-zero x. State whether the
composite function f g exists, justifying your answer. [2]
(iii) Solve the equation $f^{-1}(x) = x$. [3]

Exercise 473. (9740 N2008/I/9.) (Answer on p. 1625.)

It is given that
$$f(x) = \frac{ax + b}{cx + d}$$

for non-zero constants a, b, c, and d.

(i) Given that ad − bc ≠ 0, show by differentiation that the graph of y = f(x) has no
turning points. [3]
(ii) What can be said about the graph of y = f(x) when ad − bc = 0? [2]
(iii) Deduce from part (i) that the graph of
$$y = \frac{3x - 7}{2x + 1}$$
has a positive gradient at all points of the graph. [1]
(iv) On separate diagrams, draw sketches of the graphs of
(a) $y = \frac{3x - 7}{2x + 1}$,
(b) $y^2 = \frac{3x - 7}{2x + 1}$,
including the coordinates of the points where the graphs cross the axes and the equa-
tions of any asymptotes. [5]
1130, Contents
Exercise 474. (9233 N2008/I/14.) (Answer on p. 1626.)
Sketch, on separate diagrams, the curves

(i) $y = \frac{x}{x^2 - 1}$, stating the equations of the asymptotes, [4]
(ii) $y^2 = \frac{x}{x^2 - 1}$, making clear the form of the curve at the origin. [3]
(iii) Show that the x-coordinates of the points of intersection of the curves $y = \frac{x}{x^2 - 1}$ and
$y = e^x$ satisfy the equation $x^2 = 1 + xe^{-x}$. [1]

(iv) Use the iterative formula $x_{n+1} = \sqrt{1 + x_n e^{-x_n}}$, together with a suitable initial value $x_1$,
to find the positive root of this equation correct to 2 decimal places. [2]

Exercise 475. (9740 N2008/II/4.) (Answer on p. 1627.)

The function f is defined by $f : x \mapsto (x - 4)^2 + 1$ for x ∈ R, x > 4.
(i) Sketch the graph of y = f (x). Your sketch should indicate the position of the graph
in relation to the origin. [2]
(ii) Find f −1 (x), stating the domain of f −1 . [3]
(iii) On the same diagram as in part (i), sketch the graph of y = f −1 (x). [1]
(iv) Write down the equation of the line in which the graph of y = f (x) must be reflected
in order to obtain the graph of y = f −1 (x), and hence find the exact solution of the
equation f (x) = f −1 (x). [5]

Exercise 476. (9740 N2007/I/1.) (Answer on p. 1629.)

Show that $\frac{2x^2 - x - 19}{x^2 + 3x + 2} - 1 = \frac{x^2 - 4x - 21}{x^2 + 3x + 2}$. [1]
Hence, without using a calculator, solve the inequality

$\frac{2x^2 - x - 19}{x^2 + 3x + 2} > 1$. [4]
1131, Contents
Exercise 477. (9740 N2007/I/2.) (Answer on p. 1629.)
Functions f and g are defined by

f ∶x ↦ for x ∈ R, x ≠ 3,
g ∶ x ↦ x2 for x ∈ R.

(i) Only one of the composite functions f g and gf exists. Give a definition (including
the domain) of the composite that exists, and explain why the other composite does
not exist. [3]
(ii) Find f −1 (x) and state the domain of f −1 . [3]

Exercise 478. (9740 N2007/I/5.) (Answer on p. 1630.)

Show that the equation $y = \frac{2x + 7}{x + 2}$ can be written as $y = A + \frac{B}{x + 2}$, where A and B are
constants to be found. Hence state a sequence of transformations which transform the
graph of $y = \frac{1}{x}$ to the graph of $y = \frac{2x + 7}{x + 2}$. [4]
Sketch the graph of $y = \frac{2x + 7}{x + 2}$, giving the equations of any asymptotes and the coordinates
of any points of intersection with the x- and y-axes. [3]

Exercise 479. (9740 N2007/II/1.) (Answer on p. 1631.)

Four friends buy three different kinds of fruit in the market. When they get home they
cannot remember the individual prices per kilogram, but three of them can remember the
total amount that they each paid. The weights of fruit and the total amounts paid are
shown in the following table.

Suresh Fandi Cindy Lee Lian

Pineapples (kg) 1.15 1.20 2.15 1.30
Mangoes (kg) 0.60 0.45 0.90 0.25
Lychees (kg) 0.55 0.30 0.65 0.50
Total amount paid in $ 8.28 6.84 13.05

Assuming that, for each variety of fruit, the price per kilogram paid by each of the friends
is the same, calculate the total amount that Lee Lian paid. [6]
1132, Contents
Exercise 480. (9233 N2007/II/4.) (Answer on p. 1631.)
The function f is defined by

4x + 1
f ∶x↦ , x ∈ R, x ≠ 3.
(i) State the equations of the two asymptotes of the graph of y = f (x). [2]
(ii) Sketch the graph of y = f (x), showing its asymptotes and stating the coordinates of
the points of intersection with the axes. [3]
(iii) Find an expression for f −1 (x) and state the domain of f −1 . [3]

Exercise 481. (9233 N2006/I/3.) (Answer on p. 1632.)

Functions f and g are defined by

f ∶ x ↦ 5x + 3, x > 0,
g∶x↦ , x > 0.
(i) Find, in a similar form, f g, g 2 and g 35 . [3]
[Note: g 2 denotes gg.]
(ii) Express h in terms of one or both f and g, where

h ∶ x ↦ 25x + 18, x > 0. [1]

Exercise 482. (9233 N2006/II/1.) (Answer on p. 1633.)

Solve the inequality

≤ 1. [5]
x2 − 9

1133, Contents

110. Past-Year Questions for Part II. Sequences and Series
Exercise 483. (9758 N2017/I/9.) (Answer on p. 1634.)
(a) A sequence of numbers $u_1, u_2, u_3, \dots$ has a sum $S_n$ where $S_n = \sum_{r=1}^{n} u_r$. It is given that
$S_n = An^2 + Bn$, where A and B are non-zero constants.

(i) Find an expression for $u_n$ in terms of A, B and n. Simplify your answer. [3]
(ii) It is also given that the tenth term is 48 and the seventeenth term is 90. Find A
and B. [2]
(b) Show that $r^2 (r + 1)^2 - (r - 1)^2 r^2 = kr^3$, where k is a constant to be determined. Use
this result to find a simplified expression for $\sum_{r=1}^{n} r^3$. [4]

(c) D’Alembert’s ratio test states that a series of the form $\sum_{r=0}^{\infty} a_r$ converges when
$\lim_{n \to \infty} \left| \frac{a_{n+1}}{a_n} \right| < 1$, and diverges when $\lim_{n \to \infty} \left| \frac{a_{n+1}}{a_n} \right| > 1$. When $\lim_{n \to \infty} \left| \frac{a_{n+1}}{a_n} \right| = 1$, the test is inconclusive.
Using the test, explain why the series $\sum_{r=0}^{\infty} \frac{x^r}{r!}$ converges for all real values of x and state
the sum to infinity of this series, in terms of x. [4]

Exercise 484. (9758 N2017/II/2.) (Answer on p. 1634.)

An arithmetic progression has first term 3. The sum of the first 13 terms of the progression
is 156.
(i) Find the common difference. [2]
A geometric progression has first term 3 and common ratio r. The sum of the first 13 terms
of the progression is 156.
(ii) Show that $r^{13} - 52r + 51 = 0$. Show that the common ratio cannot be 1 even though
$r = 1$ is a root of this equation. Find the possible values of the common ratio. [4]
(iii) It is given that the common ratio of the geometric progression is positive, and that
the nth term of this geometric progression is more than 100 times the nth term of the
arithmetic progression. Write down an inequality, and hence find the smallest possible
value of n. [3]
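The equation in part (ii) has no convenient closed-form roots other than r = 1, so a numerical method is the natural tool. The following is a minimal sketch using bisection. The bracketing intervals were picked by checking the sign of the polynomial and are assumptions of this example, as are the names `bisect`, `r_pos` and `r_neg`.

```python
def bisect(f, lo, hi, tol=1e-12):
    """Find a root of f in [lo, hi], assuming f(lo) and f(hi) have opposite signs."""
    flo = f(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        fmid = f(mid)
        if (flo < 0) == (fmid < 0):
            lo, flo = mid, fmid   # root lies in the upper half
        else:
            hi = mid              # root lies in the lower half
    return (lo + hi) / 2

def f(r):
    return r**13 - 52 * r + 51

# Brackets chosen by inspecting the sign of f at the endpoints.
r_pos = bisect(f, 1.1, 1.5)
r_neg = bisect(f, -2.0, -1.1)

# Part (i): 13/2 * (2*3 + 12*d) = 156 gives the common difference below.
d = 1.5

# Part (iii): smallest n with GP term > 100 * AP term, using the positive ratio.
n = 1
while 3 * r_pos ** (n - 1) <= 100 * (3 + (n - 1) * d):
    n += 1
print(round(r_pos, 2), round(r_neg, 2), n)
```

In an exam the same values would come from a graphing calculator. The loop in part (iii) simply mirrors scanning a table of values.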
Exercise 485. (9740 N2016/I/4.) (Answer on p. 1635.)
An arithmetic series has first term a and common difference d, where a and d are non-zero.
A geometric series has first term b and common ratio r, where b and r are non-zero. It is
given that the 4th, 9th and 12th terms of the arithmetic series are equal to the 5th, 8th
and 15th terms of the geometric series respectively.
(i) Show that r satisfies the equation $5r^{10} - 8r^3 + 3 = 0$. Given that $|r| < 1$, solve this
equation, giving your answer correct to 2 decimal places. [4]
(ii) Using this value of r, find, in terms of b and n, the sum of the terms of the geometric
series after, but not including, the nth term, simplifying your answer. [3]
Exercise 486. (9740 N2016/I/6.) (Answer on p. 1635.)
(i) Prove by the method of mathematical induction that

$\sum_{r=1}^{n} r (r^2 + 1) = \frac{1}{4} n (n+1) (n^2 + n + 2)$.

(ii) A sequence $u_0, u_1, u_2, \dots$ is given by

$u_0 = 2$ and $u_n = u_{n-1} + n^3 + n$ for $n \ge 1$.

Find $u_1$, $u_2$, and $u_3$. [2]

(iii) By considering $\sum_{r=1}^{n} (u_r - u_{r-1})$, find a formula for $u_n$ in terms of n. [3]
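A brute-force check is no substitute for the induction proof, but it is a quick way to catch algebra slips. The sketch below compares both sides of the identity in part (i) for small n and generates the sequence in part (ii) directly from its recurrence. The function names `lhs`, `rhs` and `u` are illustrative only.

```python
def lhs(n):
    """Left-hand side: sum of r(r^2 + 1) for r = 1..n."""
    return sum(r * (r * r + 1) for r in range(1, n + 1))

def rhs(n):
    """Right-hand side: n(n+1)(n^2+n+2)/4 (the product is always divisible by 4)."""
    return n * (n + 1) * (n * n + n + 2) // 4

def u(n):
    """u_n computed straight from u_0 = 2, u_k = u_{k-1} + k^3 + k."""
    value = 2
    for k in range(1, n + 1):
        value += k**3 + k
    return value

print(all(lhs(n) == rhs(n) for n in range(1, 50)))
print([u(n) for n in range(1, 4)])
```

Because the summand r(r² + 1) equals r³ + r, the telescoping sum in part (iii) links the two parts, which is why `u(n) - 2` can be compared with `rhs(n)`.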

Exercise 487. (9740 N2015/I/8.) (Answer on p. 1636.)

Two athletes are to run 20 km by running 50 laps around a circular track of length 400 m.
They aim to complete the distance in between 1.5 hours and 1.75 hours inclusive.
(i) Athlete A runs the first lap in T seconds and each subsequent lap takes 2 seconds
longer than the previous lap. Find the set of values of T which will enable A to
complete the distance within the required time interval. [4]
(ii) Athlete B runs the first lap in t seconds and the time for each subsequent lap is 2%
more than the time for the previous lap. Find the set of values of t which will enable
B to complete the distance within the required time interval. [4]
(iii) Assuming each athlete completes the 20 km run in exactly 1.5 hours, find the difference
in the athletes’ times for their 50th laps, giving your answer to the nearest second.[3]
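Athlete A's lap times form an arithmetic progression and athlete B's a geometric progression, so each total time is linear in the first-lap time. The sketch below uses that fact to turn the 1.5 to 1.75 hour window into bounds on T and t. The names `total_a`, `total_b`, `LOW` and `HIGH` are assumptions of this example.

```python
LAPS = 50
LOW, HIGH = 1.5 * 3600, 1.75 * 3600   # allowed total time, in seconds

def total_a(T):
    """AP sum: first lap T seconds, each later lap 2 seconds longer."""
    return LAPS / 2 * (2 * T + (LAPS - 1) * 2)

def total_b(t):
    """GP sum: first lap t seconds, each later lap 2% longer."""
    return t * (1.02**LAPS - 1) / (1.02 - 1)

# Both totals are linear in the first-lap time, so the bounds follow by division.
T_min = (LOW - total_a(0)) / LAPS
T_max = (HIGH - total_a(0)) / LAPS
t_min = LOW / total_b(1)
t_max = HIGH / total_b(1)

# Part (iii): 50th-lap times when each total is exactly 1.5 hours.
lap50_a = T_min + (LAPS - 1) * 2
lap50_b = t_min * 1.02 ** (LAPS - 1)
print(T_min, T_max, round(t_min, 1), round(t_max, 1), round(lap50_b - lap50_a))
```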

Exercise 488. (9740 N2015/II/4.) (Answer on p. 1636.)

(a) Prove by the method of mathematical induction that

$1 \times 3 \times 6 + 2 \times 4 \times 7 + 3 \times 5 \times 8 + \dots + n (n+2) (n+5) = \frac{1}{12} n (n+1) (3n^2 + 31n + 74)$. [6]
(b) (i) Show that 2 can be expressed as , where A and B are
4r + 8r + 3 2r + 1 2r + 3
constants to be determined. [1]
The sum ∑ is denoted by Sn .
r=1 4r + 8r + 3

(ii) Find an expression for Sn in terms of n. [4]

(iii) Find the smallest value of n for which Sn is within 10 of the sum to infinity.[3]
Exercise 489. (9740 N2014/I/6.) (Answer on p. 1637.)
(a) A sequence $p_1, p_2, p_3, \dots$ is given by

$p_1 = 1$ and $p_{n+1} = 4p_n - 7$ for $n \ge 1$.

(i) Use the method of mathematical induction to prove that

$p_n = \frac{7 - 4^n}{3}$. [5]
(ii) Find $\sum_{r=1}^{n} p_r$. [3]

(b) The sum Sn of the first n terms of a sequence u1 , u2 , u3 , . . . is given by

Sn = 1 − .
(n + 1)!
(i) Give a reason why the series ∑ ur converges, and write down the value of the sum
to infinity. [2]
(ii) Find a formula for un in simplified form. [2]

Exercise 490. (9740 N2014/II/3.) (Answer on p. 1638.)

In a training exercise, athletes run from a starting point O to and from a series of points
A1 , A2 , A3 , . . . , increasingly far away in a straight line. In the exercise, athletes start at O
and run stage 1 from O to A1 and back to O, then stage 2 from O to A2 and back to O,
and so on.

O 4 m A1 4 m A2 4 m A3 4 m A4 4 m A5 4 m A6 4 m A7 4 m A8

(i) In Version 1 of the exercise (above), the distances between adjacent points are all 4 m.
(a) Find the distance run by an athlete who completes the first 10 stages of Version
1 of the exercise. [2]
(b) Write down an expression for the distance run by an athlete who completes n
stages of Version 1. Hence find the least number of stages that the athlete needs
to complete to run at least 5 km. [4]
(ii) In Version 2 of the exercise (below), the distances between the points are such that
$OA_1 = 4$ m, $A_1 A_2 = 4$ m, $A_2 A_3 = 8$ m and $A_n A_{n+1} = 2 A_{n-1} A_n$. Write down an expression
for the distance run by an athlete who completes n stages of Version 2. Hence find the
distance from O, and the direction of travel, of the athlete after he has run exactly
10 km using Version 2. [5]

O 4 m A1 4 m A2 8 m A3 16 m A4
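The two versions of the exercise can also be explored by direct simulation, which is a useful cross-check on the closed-form expressions. The sketch below assumes that stage k means running from O to A_k and back, as described in the question. The function names are made up for illustration.

```python
def stage_length(k, version):
    """Out-and-back length of stage k, in metres."""
    if version == 1:
        out = 4 * k                # OA_k = 4k
    else:
        out = 4 * 2 ** (k - 1)     # OA_k = 4, 8, 16, ...
    return 2 * out

def total(n, version):
    """Total distance after completing the first n stages."""
    return sum(stage_length(k, version) for k in range(1, n + 1))

def stages_needed(target, version):
    """Least number of completed stages whose total is at least `target`."""
    n = 0
    while total(n, version) < target:
        n += 1
    return n

def position_after(distance, version):
    """Return (distance from O, direction) after running `distance` metres."""
    k = 1
    while distance >= stage_length(k, version):
        distance -= stage_length(k, version)
        k += 1
    half = stage_length(k, version) // 2
    if distance <= half:
        return distance, "away from O"
    return 2 * half - distance, "towards O"

print(total(10, 1), stages_needed(5000, 1), position_after(10000, 2))
```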


Exercise 491. (9740 N2013/I/7.) (Answer on p. 1639.)
A gardener is cutting off pieces of string from a long roll of string. The first piece he cuts
off is 128 cm long and each successive piece is 2/3 as long as the preceding piece.
(i) The length of the nth piece of string cut off is p cm. Show that ln p = (An + B) ln 2 +
(Cn + D) ln 3, for constants A, B, C and D to be determined. [3]
(ii) Show that the total length of string cut off can never be greater than 384 cm. [2]
(iii) How many pieces must be cut off before the total length cut off is greater than 380 cm?
You must show sufficient working to justify your answer. [4]

Remark 132. The wording of (iii) is a little ambiguous. Is the desired answer (a) the
maximum number of pieces one can cut off before the total length cut off is greater
than 380 cm? Or is it (b) the minimum number of pieces one can cut off in order for
the total length cut off to be greater than 380 cm? (Of course, the latter is simply one
more than the former.)
In my answer, I shall assume (b).

Exercise 492. (9740 N2013/I/9.) (Answer on p. 1639.)

(i) Prove by the method of mathematical induction that

$\sum_{r=1}^{n} r (2r^2 + 1) = \frac{1}{2} n (n+1) (n^2 + n + 1)$.

(ii) It is given that $f(r) = 2r^3 + 3r^2 + r + 24$. Show that $f(r) - f(r-1) = ar^2$, for a constant
a to be determined. Hence find a formula for $\sum_{r=1}^{n} r^2$, fully factorizing your answer. [5]
(iii) Find $\sum_{r=1}^{n} f(r)$. (You should not simplify your answer.) [3]

Exercise 493. (9740 N2012/I/3.) (Answer on p. 1640.)

A sequence u1 , u2 , u3 , . . . is given by

$u_1 = 2$ and $u_{n+1} = \frac{3u_n - 1}{6}$ for $n \ge 1$.
(i) Find the exact values of u2 and u3 . [2]
(ii) It is given that un → l as n → ∞. Showing your working, find the exact value of l. [2]
(iii) For this value of l, use the method of mathematical induction to prove that

$u_n = \frac{14}{3} \left( \frac{1}{2} \right)^n + l$. [4]
Exercise 494. (9740 N2012/II/4.) (Answer on p. 1641.)
On 1 January 2001 Mrs A put $100 into a bank account, and on the first day of each
subsequent month she put in $10 more than in the previous month. Thus on 1 February
she put $110 into the account and on 1 March she put $120 into the account, and so on.
The account pays no interest.
(i) On what date did the value of Mrs A’s account first become greater than $5000? [5]
On 1 January 2001 Mr B put $100 into a savings account, and on the first day of each
subsequent month he put another $100 into the account. The interest rate was 0.5% per
month, so that on the last day of each month the amount in the account on that day was
increased by 0.5%.
(ii) Use the formula for the sum of a geometric progression to find an expression for the
value of Mr B’s account on the last day of the nth month (where January 2001 was
the 1st month, February 2001 was the 2nd month, and so on). Hence find in which
month the value of Mr B’s account first became greater than $5000. [5]
(iii) Mr B wanted the value of his account to be $5000 on 2 December 2003. What interest
rate per month, applied from January 2001, would achieve this? [3]
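Both accounts can be simulated month by month, which gives an independent check on the arithmetic- and geometric-progression formulas. In the sketch below, month 1 is January 2001, deposits are made on the first day of a month, and Mr B's interest is added on the last day. The function names are illustrative assumptions.

```python
def mrs_a_months(target=5000):
    """Number of monthly deposits needed for Mrs A's total to exceed `target`."""
    balance, deposit, month = 0, 100, 0
    while balance <= target:
        month += 1
        balance += deposit
        deposit += 10          # each deposit is $10 more than the last
    return month

def mr_b_value(n, rate=0.005):
    """Mr B's balance on the last day of month n."""
    balance = 0.0
    for _ in range(n):
        balance = (balance + 100) * (1 + rate)   # deposit, then interest
    return balance

def mr_b_months(target=5000):
    """First month whose end-of-month balance exceeds `target`."""
    n = 1
    while mr_b_value(n) <= target:
        n += 1
    return n

print(mrs_a_months(), mr_b_months())
# The simulated balance agrees with the geometric-progression sum:
print(abs(mr_b_value(12) - 20100 * (1.005**12 - 1)) < 1e-6)
```

The month counts still have to be converted back to calendar dates by hand, counting from January 2001 as month 1.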

Exercise 495. (9740 N2011/I/6.) (Answer on p. 1642.)

(i) Using the formulae for sin(A ± B), prove that

$\sin \left( r + \frac{1}{2} \right) \theta - \sin \left( r - \frac{1}{2} \right) \theta = 2 \cos r\theta \, \sin \frac{1}{2} \theta$. [2]

Remark 133. Now assume also that θ is not an even integer multiple of π.[381]

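(ii) Hence find a formula for $\sum_{r=1}^{n} \cos r\theta$ in terms of $\sin \left( n + \frac{1}{2} \right) \theta$ and $\sin \frac{1}{2} \theta$. [3]
(iii) Prove by the method of mathematical induction that
$\sum_{r=1}^{n} \sin r\theta = \frac{\cos \frac{1}{2} \theta - \cos \left( n + \frac{1}{2} \right) \theta}{2 \sin \frac{1}{2} \theta}$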

for all positive integers n. [6]

[381] Otherwise some of the formulae that follow have 0 as denominators and are thus undefined.
Exercise 496. (9740 N2011/I/9.) (Answer on p. 1643.)
(i) A company is drilling for oil. Using machine A, the depth drilled on the first day is
256 metres. On each subsequent day, the depth drilled is 7 metres less than on the
previous day. Drilling continues daily up to and including the day when a depth of
less than 10 metres is drilled. What depth is drilled on the 10th day, and what is the
total depth when drilling is completed? [6]
(ii) Using machine B, the depth drilled on the first day is also 256 metres. On each
subsequent day, the depth drilled in of the depth drilled on the previous day. How
many days does it take for the depth drilled to exceed 99% of the theoretical maximum

total depth? [4]

Exercise 497. (9740 N2010/I/3.) (Answer on p. 1644.)

The sum Sn of the first n terms of a sequence u1 , u2 , u3 , . . . is given by

Sn = n (2n + c),

where c is a constant.
(i) Find un in terms of c and n. [3]
(ii) Find a recurrence relation of the form un+1 = f (un ). [2]

Exercise 498. (9740 N2010/II/2.) (Answer on p. 1644.)

(i) Prove by mathematical induction that
$\sum_{r=1}^{n} r (r+2) = \frac{1}{6} n (n+1) (2n+7)$. [5]
(ii) (a) Prove by the method of differences that
$\sum_{r=1}^{n} \frac{1}{r (r+2)} = \frac{3}{4} - \frac{1}{2 (n+1)} - \frac{1}{2 (n+2)}$. [4]

(b) Explain why $\sum_{r=1}^{\infty} \frac{1}{r (r+2)}$ is a convergent series, and state the value of the sum to
infinity. [2]
Exercise 499. (9740 N2009/I/3.) (Answer on p. 1645.)
(i) Show that
$\frac{1}{n-1} - \frac{2}{n} + \frac{1}{n+1} = \frac{A}{n^3 - n}$,

where A is a constant to be found. [2]

(ii) Hence find ∑ 3 . (There is no need to express your answer as a single algebraic
r=2 r − r
fraction.) [3]
(iii) Give a reason why the series ∑ 3 converges, and write down its value. [2]
r=2 r − r

Exercise 500. (9740 N2009/I/5.) (Answer on p. 1645.)

(i) Use the method of mathematical induction to prove that
$\sum_{r=1}^{n} r^2 = \frac{1}{6} n (n+1) (2n+1)$. [4]
(ii) Find ∑ r2 , giving your answer in fully factorized form. [4]

Exercise 501. (9740 N2009/I/8.) (Answer on p. 1646.)

Two musical instruments, A and B, consist of metal bars of decreasing lengths.
(i) The first bar of instrument A has length 20 cm and the lengths of the bars form a
geometric progression. The 25th bar has length 5 cm. Show that the total length of
all the bars must be less than 357 cm, no matter how many bars there are. [4]
Instrument B consists of only 25 bars which are identical to the first 25 bars of instrument A.
(ii) Find the total length, L cm, of all the bars of instrument B and the length of the 13th
bar. [3]
(iii) Unfortunately the manufacturer misunderstands the instructions and constructs
instrument B wrongly, so that the lengths of the bars are in arithmetic progression with
common difference d cm. If the total length of the 25 bars is still L cm and the length
of the 25th bar is still 5 cm, find the value of d and the length of the longest bar. [4]
Exercise 502. (9740 N2008/I/2.) (Answer on p. 1646.)
The nth term of a sequence is given by

un = n (2n + 1),

for n ≥ 1. The sum of the first n terms is denoted by Sn . Use the method of mathematical
induction to show that
$S_n = \frac{1}{6} n (n+1) (4n+5)$
for all positive integers n. [5]

Exercise 503. (9740 N2008/I/10.) (Answer on p. 1646.)

(i) A student saves $10 on 1 January 2009. On the first day of each subsequent month
she saves $3 more than in the previous month, so that she saves $13 on 1 February
2009, $16 on 1 March 2009, and so on. On what date will she first have saved over
$2 000 in total? [5]
(ii) A second student puts $10 on 1 January 2009 into a bank account which pays
compound interest at a rate of 2% per month on the last day of each month. She puts a
further $10 into the account on the first day of each subsequent month.
(a) How much compound interest has her original $10 earned at the end of 2 years? [2]
(b) How much in total is in the account at the end of 2 years? [3]
(c) After how many complete months will the total in the account first exceed $2 000?

Exercise 504. (9233 N2008/II/2.) (Answer on p. 1647.)

An arithmetic progression and a geometric progression each have first term .
The sum of their second terms is $\frac{1}{2}$ and the sum of their third terms is $\frac{1}{8}$. Given that the
geometric progression is convergent, find its sum to infinity. [6]
Exercise 505. (9740 N2007/I/9.) (Answer on p. 1648.)

[Diagram: the x-axis with the origin O and the two roots α and β marked.]

The diagram shows the graph of $y = e^x - 3x$. The two roots of the equation $e^x - 3x = 0$ are
denoted by α and β, where α < β.
(i) Find the values of α and β, each correct to 3 decimal places. [2]
A sequence of real numbers $x_1, x_2, x_3, \dots$ satisfies the recurrence relation

$x_{n+1} = \frac{1}{3} e^{x_n}$, for $n \ge 1$.
(ii) Prove algebraically that, if the sequence converges, then it converges to either α or β.
(iii) Use a calculator to determine the behaviour of the sequence for each of the cases
x1 = 0, x1 = 1, x1 = 2. [3]
(iv) By considering $x_{n+1} - x_n$, prove that
$x_{n+1} < x_n$ if $\alpha < x_n < \beta$,
$x_{n+1} > x_n$ if $x_n < \alpha$ or $x_n > \beta$. [2]

(v) State briefly how the results in part (iv) relate to the behaviours determined in (iii). [2]
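Parts (i) and (iii) are calculator work, and the sketch below shows one way to reproduce them. It assumes the recurrence is x_{n+1} = e^{x_n}/3, which is the form whose fixed points are the roots of e^x − 3x = 0. The bisection brackets, the iteration count and the overflow cap are arbitrary choices made for this example.

```python
import math

def f(x):
    return math.exp(x) - 3 * x

def bisect(g, lo, hi, tol=1e-12):
    """Find a root of g in [lo, hi], assuming a sign change over the interval."""
    glo = g(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        gmid = g(mid)
        if (glo < 0) == (gmid < 0):
            lo, glo = mid, gmid
        else:
            hi = mid
    return (lo + hi) / 2

alpha = bisect(f, 0.0, 1.0)   # f(0) > 0 > f(1)
beta = bisect(f, 1.0, 2.0)    # f(1) < 0 < f(2)

def iterate(x1, steps=200, cap=50.0):
    """Apply x -> exp(x)/3 repeatedly; report infinity once x exceeds `cap`."""
    x = x1
    for _ in range(steps):
        if x > cap:
            return math.inf   # the sequence is growing without bound
        x = math.exp(x) / 3
    return x

print(round(alpha, 3), round(beta, 3))
print([iterate(x1) for x1 in (0, 1, 2)])
```

Starting values below β settle at α, while a starting value above β grows without bound, which matches the inequalities in part (iv).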

Exercise 506. (9740 N2007/I/10.) (Answer on p. 1648.)

A geometric series has common ratio r, and an arithmetic series has first term a and
common difference d, where a and d are non-zero. The first three terms of the geometric
series are equal to the first, fourth and sixth terms respectively of the arithmetic series.
(i) Show that $3r^2 - 5r + 2 = 0$. [4]
(ii) Deduce that the geometric series is convergent and find, in terms of a, the sum to
infinity. [5]
(iii) The sum of the first n terms of the arithmetic series is denoted by S. Given that a > 0,
find the set of possible values of n for which S exceeds 4a. [5]
Exercise 507. (9740 N2007/II/2.) (Answer on p. 1649.)
A sequence $u_1, u_2, u_3, \dots$ is such that $u_1 = 1$ and

$u_{n+1} = u_n - \frac{2n+1}{n^2 (n+1)^2}$, for all $n \ge 1$.
(i) Use the method of mathematical induction to prove that $u_n = \frac{1}{n^2}$. [4]
2n + 1
(ii) Hence find ∑ . [2]
n=1 n (n + 1)
2 2

(iii) Give a reason why the series in part (ii) is convergent and state the sum to infinity.
2n − 1
(iv) Use your answer to part (ii) to find ∑ . [2]
n=2 n (n + 1)
2 2

Exercise 508. (9233 N2007/I/14.) (Answer on p. 1650.)

Use the method of mathematical induction to prove the following result.

$\sum_{r=1}^{n} \sin rx = \frac{\cos \frac{1}{2} x - \cos \left( n + \frac{1}{2} \right) x}{2 \sin \frac{1}{2} x}$, where $0 < x < 2\pi$. [6]

Exercise 509. (9233 N2007/II/1.) Find ∑ 3r+2 . [3] (Answer on p. 1650.)

Exercise 510. (9233 N2006/I/1.) (Answer on p. 1650.)

The sum Sn of the first n terms of a geometric progression is given by Sn = 6 − n−1 . Find
the first term and the common ratio. [4]

Exercise 511. (9233 N2006/I/11.) (Answer on p. 1650.)

(i) Prove by induction that
$\sum_{r=1}^{n} r^3 = \frac{1}{4} n^2 (n+1)^2$. [4]

(ii) Deduce that $2^3 + 4^3 + 6^3 + \dots + (2n)^3 = 2n^2 (n+1)^2$. [1]

(iii) Hence or otherwise find
$\sum (2r - 1)^3$,

simplifying your answer. [4]


111. Past-Year Questions for Part III. Vectors
Exercise 512. (9758 N2017/I/6.) (Answer on p. 1652.)

(i) Interpret geometrically the vector equation r = a + tb, where a and b are constant
vectors and t is a parameter. [2]
(ii) Interpret geometrically the vector equation r ⋅ n = d, where n is a constant unit vector
and d is a constant scalar, stating what d represents. [3]
(iii) Given that b ⋅ n ≠ 0, solve the equations r = a + tb and r ⋅ n = d to find r in terms of
a, b, n and d. Interpret the solution geometrically. [3]

Remark 134. This question should have clearly stated if this was meant to be in the
context of two- or three-dimensional space.[382] I shall assume the latter.

Exercise 513. (9758 N2017/I/10.) (Answer on p. 1652.)

Electrical engineers are installing electricity cables on a building site. Points (x, y, z) are
defined relative to a main switching site at (0, 0, 0), where units are metres. Cables are laid
in straight lines and the widths of cables can be neglected.
An existing cable C starts at the main switching site and goes in the direction $\begin{pmatrix} 3 \\ 1 \\ -2 \end{pmatrix}$. A
new cable is installed which passes through points P (1, 2, −1) and Q (5, 7, a).
(i) Find the value of a for which C and the new cable will meet. [4]
To ensure that the cables do not meet, the engineers use a = −3. The engineers wish to
connect each of the points P and Q to a point R on C.
(ii) The engineers wish to reduce the length of cable required and believe in order to do
this that angle PRQ should be 90°. Show that this is not possible. [4]
(iii) The engineers discover that the ground between P and R is difficult to drill through
and now decide to make the length of PR as small as possible. Find the coordinates
of R in this case and the exact minimum length. [5]

Exercise 514. (9740 N2016/I/5.) (Answer on p. 1653.)

The vectors u and v are given by u = 2i − j + 2k and v = ai + bk, where a and b are constants.
(i) Find (u + v) × (u − v) in terms of a and b. [2]
(ii) Given that the i- and k-components of the answer to part (i) are equal, express
(u + v) × (u − v) in terms of a only. Hence find, in an exact form, the possible values
of a for which (u + v) × (u − v) is a unit vector. [4]
(iii) Given instead that (u + v) ⋅ (u − v) = 0, find the numerical value of ∣v∣. [2]

[382] This is because in the context of two-dimensional space, the vector equation r ⋅ n = d describes a line.
Exercise 515. (9740 N2016/I/11.) (Answer on p. 1654.)
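The plane p has equation $\mathbf{r} = \begin{pmatrix} 1 \\ -3 \\ 2 \end{pmatrix} + \lambda \begin{pmatrix} 1 \\ 2 \\ 0 \end{pmatrix} + \mu \begin{pmatrix} a \\ 4 \\ -2 \end{pmatrix}$, and the line l has equation
$\mathbf{r} = \begin{pmatrix} a-1 \\ a \\ a+1 \end{pmatrix} + t \begin{pmatrix} -2 \\ 1 \\ 2 \end{pmatrix}$, where a is a constant and λ, µ and t are parameters.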

(i) In the case where a = 0,

(a) show that l is perpendicular to p and find the values of λ, µ and t which give the
coordinates of the point at which l and p intersect, [5]
(b) find the cartesian equations of the planes such that the perpendicular distance
from each plane to p is 12. [5]
(ii) Find the value of a such that l and p do not meet in a unique point. [3]

Exercise 516. (9740 N2015/I/7.) (Answer on p. 1655.)

Referred to the origin O, points A and B have position vectors a and b respectively. Point
C lies on OA, between O and A, such that OC ∶ CA = 3 ∶ 2. Point D lies on OB, between
O and B, such that OD ∶ DB = 5 ∶ 6.
(i) Find the position vectors $\overrightarrow{OC}$ and $\overrightarrow{OD}$, giving your answers in terms of a and b. [2]
(ii) Show that the vector equation of the line BC can be written as r = λa + (1 − λ) b,
where λ is a parameter. Find in a similar form the vector equation of the line AD in
terms of a parameter µ. [3]
(iii) Find, in terms of a and b, the position vector of the point E where the lines BC and
AD meet and find the ratio AE ∶ ED. [5]

Exercise 517. (9740 N2015/II/2.) (Answer on p. 1656.)

The line L has equation r = i − 2j − 4k + λ (2i + 3j − 6k).
(i) Find the acute angle between L and the x-axis. [2]
The point P has position vector 2i + 5j − 6k.

(ii) Find the points on L which are a distance of $\sqrt{33}$ from P. Hence or otherwise find
the point on L which is closest to P . [5]
(iii) Find a cartesian equation of the plane that includes the line L and the point P . [3]

Exercise 518. (9740 N2014/I/3.) (Answer on p. 1656.)

(i) Given that a × b = 0, what can be deduced about the vectors a and b? [2]
(ii) Find a unit vector n such that n × (i + 2j − 2k) = 0. [2]
(iii) Find the cosine of the acute angle between i + 2j − 2k and the z-axis. [1]


Exercise 519. (9740 N2014/I/9.) (Answer on p. 1656.)
Planes p and q are perpendicular. Plane p has equation x + 2y − 3z = 12. Plane q contains
the line l with equation
x−1 y+1 z−3
= =
2 4
The point A on l has coordinates (1, −1, 3) .
(i) Find a cartesian equation of q. [4]
(ii) Find a vector equation of the line m where p and q meet. [4]
(iii) B is a general point on m. Find an expression for the square of the distance AB.
Hence, or otherwise, find the coordinates of the point on m which is nearest to A. [5]

Exercise 520. (9740 N2013/I/1.) (Answer on p. 1656.)

Planes p, q and r have equations x − 2z = 4, 2x − 2y + z = 6 and 5x − 4y + µz = −9 respectively,
where µ is a constant.
(i) Given that µ = 3, find the coordinates of the point of intersection of p, q and r. [2]
(ii) Given instead that µ = 0, describe the relationship between p, q and r. [3]

Exercise 521. (9740 N2013/I/6.) (Answer on p. 1657.)

The origin O and the points A, B and C lie in the same plane, where $\overrightarrow{OA} = \mathbf{a}$, $\overrightarrow{OB} = \mathbf{b}$ and
$\overrightarrow{OC} = \mathbf{c}$ (see diagram).
(i) Explain why c can be expressed as c = λa + µb, for constants λ and µ. [1]
The point N is on AC such that AN ∶ NC = 3 ∶ 4.
(ii) Write down the position vector of N in terms of a and c. [1]
(iii) It is given that the area of triangle ONC is equal to the area of triangle OMC, where
M is the mid-point of OB. By finding the areas of these triangles in terms of a and
b, find λ in terms of µ in the case where λ and µ are both positive. [5]


Exercise 522. (9740 N2013/II/4.) (Answer on p. 1657.)
The planes p1 and p2 have equations r ⋅ (2, −2, 1) = 1 and r ⋅ (−6, 3, 2) = −1 respectively, and
meet in the line l.
(i) Find the acute angle between p1 and p2 . [3]
(ii) Find a vector equation for l. [4]
(iii) The point A (4, 3, c) is equidistant from the planes p1 and p2 . Calculate the two possible
values of c. [6]
Exercise 523. (9740 N2012/I/5.) (Answer on p. 1658.)
Referred to the origin O, the points A and B have position vectors a and b such that
a = i − j + k and b = i + 2j. The point C has position vector c given by c = λa + µb, where
λ and µ are positive constants.

(i) Given that the area of triangle OAC is $\sqrt{126}$, find µ. [4]

(ii) Given instead that µ = 4 and that $OC = 5\sqrt{3}$, find the possible coordinates of C. [4]
Exercise 524. (9740 N2012/I/9.) (Answer on p. 1658.)

(i) Find a vector equation of the line through the points A and B with position vectors
7i + 8j + 9k and −i − 8j + k respectively. [3]
(ii) The perpendicular to this line from the point C with position vector i + 8j + 3k meets
the line at the point N . Find the position vector of N and the ratio AN ∶ N B. [5]
(iii) Find a cartesian equation of the line which is a reflection of the line AC in the line
AB. [4]
Exercise 525. (9740 N2011/I/7.) (Answer on p. 1659.)

Referred to the origin O, the points A and B are such that $\overrightarrow{OA} = \mathbf{a}$ and $\overrightarrow{OB} = \mathbf{b}$. The point
P on OA is such that OP ∶ PA = 1 ∶ 2, and the point Q on OB is such that OQ ∶ QB = 3 ∶ 2.
The mid-point of PQ is M (see diagram).
(i) Find $\overrightarrow{OM}$ in terms of a and b and show that the area of triangle OMP can be written
as k ∣a × b∣, where k is a constant to be found. [6]
(ii) The vectors a and b are now given by a = 2pi − 6pj + 3pk and b = i + j − 2k, where p is
a positive constant. Given that a is a unit vector,
(a) find the exact value of p, [2]
(b) give a geometrical interpretation of ∣a ⋅ b∣, [1]
(c) evaluate a × b. [2]


Exercise 526. (9740 N2011/I/11.) (Answer on p. 1659.)
The plane p passes through the points with coordinates (4, −1, −3), (−2, −5, 2) and (4, −3, −2).
(i) Find a cartesian equation of p. [4]
The lines l1 and l2 have, respectively, equations

$\frac{x-1}{2} = \frac{y-2}{-4} = \frac{z+3}{1}$ and $\frac{x+2}{1} = \frac{y-1}{5} = \frac{z-3}{k}$,
where k is a constant. It is given that l1 and l2 intersect.
(ii) Find the value of k. [4]
(iii) Show that l1 lies in p and find the coordinates of the point at which l2 intersects p. [4]
(iv) Find the acute angle between l2 and p. [3]

Exercise 527. (9740 N2010/I/1.) (Answer on p. 1660.)

The position vectors a and b are given by

a = 2pi + 3pj + 6pk and b = i − 2j + 2k,

where p > 0. It is given that ∣a∣ = ∣b∣.

(i) Find the exact value of p. [2]
(ii) Show that (a + b) ⋅ (a − b) = 0. [3]

Exercise 528. (9740 N2010/I/10.) (Answer on p. 1660.)

The line l and plane p have, respectively, equations

$\frac{x-10}{-3} = \frac{y+1}{6} = \frac{z+3}{9}$ and $x - 2y - 3z = 0$.
(i) Show that l is perpendicular to p. [2]
(ii) Find the coordinates of the point of intersection of l and p. [4]
(iii) Show that the point A with coordinates (−2, 23, 33) lies on l. Find the coordinates of
the point B which is the mirror image of A in p. [3]
(iv) Find the area of triangle OAB, where O is the origin, giving your answer to the nearest
whole number. [3]
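The steps in this question (perpendicularity, intersection, reflection and area) can be checked with a few lines of exact arithmetic. The sketch below writes l as (10, −1, −3) + t(−3, 6, 9), read off from the cartesian equation. The helper names and the use of a line parameter to reflect A are choices made for this illustration, relying on the fact that l is perpendicular to p.

```python
import math
from fractions import Fraction as F

point = (F(10), F(-1), F(-3))      # a point on l
direction = (F(-3), F(6), F(9))    # direction vector of l
normal = (F(1), F(-2), F(-3))      # normal of p: x - 2y - 3z = 0

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def on_line(t):
    return tuple(p + t * d for p, d in zip(point, direction))

# (i) l is perpendicular to p exactly when its direction is parallel to the normal.
is_perpendicular = cross(direction, normal) == (0, 0, 0)

# (ii) l meets p where normal . (point + t * direction) = 0.
t_hit = -dot(normal, point) / dot(normal, direction)
foot = on_line(t_hit)

# (iii) A lies on l; because l is perpendicular to p, its mirror image B is the
# point of l whose parameter is reflected about t_hit.
A = (F(-2), F(23), F(33))
t_A = (A[0] - point[0]) / direction[0]
B = on_line(2 * t_hit - t_A)

# (iv) Area of triangle OAB is half the magnitude of OA x OB.
n = cross(A, B)
area = math.sqrt(float(dot(n, n))) / 2
print(is_perpendicular, foot, B, round(area))
```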

Exercise 529. (9740 N2009/I/10.) (Answer on p. 1660.)

The planes p1 and p2 have equations r ⋅ (2, 1, 3) = 1 and r ⋅ (−1, 2, 1) = 2 respectively, and
meet in a line l.
(i) Find the acute angle between p1 and p2 . [3]
(ii) Find a vector equation of l. [4]
(iii) The plane p3 has equation 2x + y + 3z − 1 + k (−x + 2y + z − 2) = 0. Explain why l lies
in p3 for any constant k. Hence, or otherwise, find a cartesian equation of the plane
in which both l and the point (2, 3, 4) lie. [5]
Exercise 530. (9740 N2009/II/2.) (Answer on p. 1661.)
Relative to the origin O, two points A and B have position vectors given by a = 14i+14j+14k
and b = 11i − 13j + 2k respectively.
(i) The point P divides the line AB in the ratio 2 ∶ 1. Find the coordinates of P . [2]
(ii) Show that AB and OP are perpendicular. [2]
(iii) The vector c is a unit vector in the direction of OP . Write c as a column vector, and
give the geometrical meaning of ∣a ⋅ c∣. [2]
(iv) Find a × p, where p is the vector OP , and give the geometrical meaning of ∣a × p∣.
Hence write down the area of the triangle OAP . [4]

Exercise 531. (9740 N2008/I/3.) (Answer on p. 1661.)

Points O, A, B are such that $\overrightarrow{OA}$ = i + 4j − 3k and $\overrightarrow{OB}$ = 5i − j, and the point P is such that
OAPB is a parallelogram.
(i) Find $\overrightarrow{OP}$. [1]
(ii) Find the size of angle AOB. [3]
(iii) Find the exact area of the parallelogram OAPB. [2]
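All three parts rest on the standard dot- and cross-product formulas, so they are easy to check numerically. The sketch below is one possible check. The helper functions `dot`, `cross` and `norm` are written out by hand so that the example has no dependencies.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def norm(u):
    return math.sqrt(dot(u, u))

OA = (1, 4, -3)
OB = (5, -1, 0)

# (i) In parallelogram OAPB the diagonal from O is OA + OB.
OP = tuple(a + b for a, b in zip(OA, OB))

# (ii) Angle AOB from cos(theta) = OA . OB / (|OA| |OB|), in degrees.
angle_AOB = math.degrees(math.acos(dot(OA, OB) / (norm(OA) * norm(OB))))

# (iii) Area of the parallelogram is |OA x OB|.
area = norm(cross(OA, OB))
print(OP, round(angle_AOB, 1), area)
```

The printed area is a decimal. For the exact value asked for in part (iii), the components of the cross product still need to be simplified by hand.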

Exercise 532. (9740 N2008/I/11.) (Answer on p. 1661.)

The equations of three planes p1 , p2 , p3 are

2x − 5y + 3z = 3,
3x + 2y − 5z = −5,
5x + λy + 17z = µ,

respectively, where λ and µ are constants. When λ = −20.9 and µ = 16.6, find the coordinates
of the point at which these planes meet. [2]
The planes p1 and p2 intersect in a line l.
(i) Find a vector equation of l. [4]
(ii) Given that all three planes meet in the line l, find λ and µ. [3]
(iii) Given instead that the three planes have no points in common, what can be said about
the values of λ and µ? [2]
(iv) Find the cartesian equation of the plane which contains l and the point (1, −1, 3). [4]
Exercise 533. (9233 N2008/I/11.) (Answer on p. 1662.)
The cartesian equations of two lines are

$\frac{x}{1} = \frac{y+2}{2} = \frac{z-5}{-1}$ and $\frac{x-1}{-1} = \frac{y+3}{-3} = \frac{z-4}{1}$.
(i) Show that the lines intersect and state the point of intersection. [5]
(ii) Find the acute angle between the lines. [4]

Exercise 534. (9740 N2007/I/6.) (Answer on p. 1662.)

Referred to the origin O, the position vectors of the points A and B are i − j + 2k and
2i + 4j + k respectively.
(i) Show that OA is perpendicular to OB. [2]
(ii) Find the position vector of the point M on the line segment AB such that AM ∶ M B =
1 ∶ 2. [3]
(iii) The point C has position vector −4i + 2j + 2k. Use a vector product to find the exact
area of triangle OAC. [4]

Exercise 535. (9740 N2007/I/8.) (Answer on p. 1662.)

The line l passes through the points A and B with coordinates (1, 2, 4) and (−2, 3, 1)
respectively. The plane p has equation 3x − y + 2z = 17. Find
(i) the coordinates of the point of intersection of l and p, [5]
(ii) the acute angle between l and p, [3]
(iii) the perpendicular distance from A to p. [3]

Exercise 536. (9233 N2007/I/7.) (Answer on p. 1662.)

The point P is the foot of the perpendicular from the point A (1, 3, −2) to the line given by
$\frac{x+3}{2} = \frac{y-8}{2} = \frac{z-3}{3}$.
Find the coordinates of P , and hence find the length of AP . [7]
Exercise 537. (9233 N2007/II/2.) (Answer on p. 1662.)
Referred to an origin O the position vectors of two points A and B are 4i + j + 3k and i − 3j + 4k
respectively. Two other points, C and D, are given by $\overrightarrow{OC} = 0.25\,\overrightarrow{OA}$ and $\overrightarrow{OD} = 0.75\,\overrightarrow{OB}$.
(i) Find a vector equation for the line AD. [2]
(ii) Find the position vector of the point of intersection of AD and BC. [5]

Exercise 538. (9233 N2006/I/14.) (Answer on p. 1663.)

The points A, B, C and D have position vectors i − 2j + 5k, i + 3j, 10i + j + 2k and −2i + 4j + 5k
respectively, with respect to an origin O. The point P on AB is such that AP ∶ PB = λ ∶ 1 − λ
and the point Q on CD is such that CQ ∶ QD = µ ∶ 1 − µ. Find $\overrightarrow{OP}$ and $\overrightarrow{OQ}$ in terms of λ
and µ respectively. [3]
Given that PQ is perpendicular to both AB and CD,
(i) show that $\overrightarrow{PQ}$ = i + 2j + 2k, [7]
(ii) find the area of triangle ABQ. [2]


112. Past-Year Questions for Part IV. Complex Numbers
Exercise 539. (9758 N2017/I/8.) (Answer on p. 1664.)
Do not use a calculator in answering this question.
(a) Find the roots of the equation $z^2 (1 - i) - 2z + (5 + 5i) = 0$, giving your answers in
cartesian form a + ib. [3]
(b) (i) Given that $\omega = 1 - i$, find $\omega^2$, $\omega^3$ and $\omega^4$ in cartesian form. Given also that

$\omega^4 + p\omega^3 + 39\omega^2 + q\omega + 58 = 0$,

where p and q are real, find p and q. [4]

(ii) Using the values of p and q in part (b)(i), express $\omega^4 + p\omega^3 + 39\omega^2 + q\omega + 58$ as the
product of two quadratic factors. [3]

Exercise 540. (9740 N2016/I/7.) (Answer on p. 1664.)

Do not use a calculator in answering this question.
(a) Verify that −1 + 5i is a root of the equation $w^2 + (-1 - 8i) w + (-17 + 7i) = 0$. Hence, or
otherwise, find the second root of the equation in cartesian form, p + iq, showing your
working. [5]
(b) The equation $z^3 - 5z^2 + 16z + k = 0$, where k is a real constant, has a root z = 1 + ai,
where a is a positive real constant. Find the values of a and k, showing your working.
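Although part (a) must be answered without a calculator, a finished answer can be checked afterwards with Python's built-in complex numbers. The sketch below is only such a check. It uses the fact that the roots of w² + bw + c = 0 sum to −b, and the names `p`, `root1` and `root2` are illustrative.

```python
def p(w):
    """The quadratic from part (a), evaluated at a complex number w."""
    return w * w + (-1 - 8j) * w + (-17 + 7j)

root1 = -1 + 5j

# For w^2 + bw + c = 0 the roots sum to -b, so the other root is -b - root1.
b = -1 - 8j
root2 = -b - root1

print(p(root1), root2, p(root2))
# The product of the roots should equal the constant term c = -17 + 7i.
print(root1 * root2)
```

All the values involved are small integers, so the floating-point arithmetic here is exact and the residuals print as zero.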

Exercise 541. (9740 N2016/II/4.) (Answer on p. 1665.)

(a) Two loci in the Argand diagram are given by the equations

∣z − 3 − i∣ = 1 and arg z = α, where tan α = 0.4.

The complex numbers z1 and z2 , where ∣z1 ∣ < ∣z2 ∣, correspond to the points of intersection
of these loci.
(i) Draw an Argand diagram to show both loci, and mark the points represented by
z1 and z2 . [2]
(ii) Find the two values of z which represent points on ∣z − 3 − i∣ = 1 such that ∣z − z1 ∣ =
∣z − z2 ∣. [4]
(b) (i) The complex number 2 − 2i is denoted by w. By writing w in polar form $re^{i\theta}$,
where r > 0 and −π < θ ≤ π, find exactly all the cube roots of w in polar form. [3]
(ii) Find the smallest positive whole number value of n such that $\arg (w^* w^n) = \pi$. [3]
Exercise 542. (9740 N2015/I/9.) (Answer on p. 1665.)
(a) The complex number w is such that w = a + ib, where a and b are non-zero real numbers.
The complex conjugate of w is denoted by $w^*$. Given that $\frac{w^2}{w^*}$ is purely imaginary, find
the possible values of w in terms of a. [5]

(b) The complex number z is such that $z^5 = -32i$.

(i) Find the modulus and argument of each of the possible values of z. [4]
(ii) Two of these values are z1 and z2 , where 0.5π < arg z1 < π and −π < arg z2 <
−0.5π. Find the exact value of arg (z1 − z2 ) in terms of π and show that ∣z1 − z2 ∣ =
4 sin (0.2π). [4]

Exercise 543. (9740 N2014/I/5.) (Answer on p. 1666.)

It is given that z = 1 + 2i.
(i) Without using a calculator, find the values of z 2 and in cartesian form x + iy,
showing your working. [4]
(ii) The real numbers p and q are such that pz 2 + 3 is real. Find, in terms of p, the value
of q and the value of pz + 3 .
2 q

Exercise 544. (9740 N2014/II/4.) (Answer on p. 1666.)

(a) The complex number z satisfies ∣z + 5 − i∣ = 4.
(i) On an Argand diagram show the locus of z. [2]
(ii) The complex number z also satisfies ∣z − 6i∣ = ∣z + 10 + 4i∣. Find exactly the possible
values of z, giving your answers in the form x + iy. [4]

(b) It is given that w = 3 − i.
(i) Without using a calculator, find an exact expression for w6 . Give your answer in
the form reiθ , where r > 0 and 0 ≤ θ < 2π. [3]
(ii) Without using a calculator, find the three smallest positive whole number values
of n for which ∗ is a real number. [4]

Exercise 545. (9740 N2013/I/4.) (Answer on p. 1667.)

The complex number w is given by 1 + 2i.
(i) Find w3 in the form x + iy, showing your working. [2]
(ii) Given that w is a root of the equation az 3 + 5z 2 + 17z + b = 0, find the values of the
real numbers a and b. [3]
(iii) Using these values of a and b, find all the roots of this equation in exact form. [3]
Exercise 546. (9740 N2013/I/8.) (Answer on p. 1668.)
The complex number z is given by z = reiθ , where r > 0 and 0 ≤ θ ≤ 0.5π.

(i) Given that w = (1 − i 3) z, find ∣w∣ in terms of r and arg w in terms of θ. [2]
(ii) Given that r has a fixed value, draw an Argand diagram to show the locus of z as θ
varies. On the same diagram, show the corresponding locus of w. You should identify
the modulus and argument of the endpoints of each locus. [4]
(iii) Given that arg (z¹⁰/w²) = π, find θ. [3]

Exercise 547. (9740 N2012/I/6.) (Answer on p. 1669.)

Do not use a calculator in answering this question.
The complex number z is given by z = 1 + ic, where c is a non-zero real number.
(i) Find z 3 in the form x + iy. [2]
(ii) Given that z 3 is real, find the possible values of z. [2]
(iii) For the value of z found in part (ii) for which c < 0, find the smallest positive integer n
such that ∣z n ∣ > 1000. State the modulus and argument of z n when n takes this value.

Exercise 548. (9740 N2012/II/2.) (Answer on p. 1669.)

The complex number z satisfies the equation ∣z − (7 − 3i)∣ = 4.
(i) Sketch an Argand diagram to illustrate this equation. [2]
(ii) Given that ∣z∣ is as small as possible,
(a) find the exact value of ∣z∣, [2]
(b) hence find an exact expression for z, in the form x + iy. [2]
(iii) It is given instead that −π < arg z ≤ π and that ∣arg z∣ is as large as possible. Find the
value of arg z in radians, correct to 4 significant figures. [3]

Exercise 549. (9740 N2011/I/10.) (Answer on p. 1670.)

Do not use a graphic calculator in answering this question.
(i) The roots of the equation z 2 = −8i are z1 and z2 . Find z1 and z2 in cartesian form
x + iy, showing your working. [4]
(ii) Hence, or otherwise, find in cartesian form the roots w1 and w2 of the equation w2 +
4w + (4 + 2i) = 0. [3]
(iii) Using a single Argand diagram, sketch the loci
(a) ∣z − z1 ∣ = ∣z − z2 ∣, [1]
(b) ∣z − w1 ∣ = ∣z − w2 ∣, [1]
(iv) Give a reason why there are no points which lie on both of these loci. [1]
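
A quick numerical check of parts (i) and (ii) can be sketched in Python. This is only a sanity check on one's answers, not the calculator-free working the question asks for. The variable and function names are my own, and the rewriting of the quadratic as (w + 2)² = −2i is my own working rather than something stated in the question.

```python
import cmath

# Part (i): the two square roots of -8i.
z1 = cmath.sqrt(-8j)      # principal square root
z2 = -z1                  # the other root

# Part (ii): w^2 + 4w + (4 + 2i) = 0 can be rewritten as (w + 2)^2 = -2i
# (my own working), so w = -2 plus or minus a square root of -2i.
s = cmath.sqrt(-2j)
w1 = -2 + s
w2 = -2 - s


def residual(w):
    """Left-hand side of the quadratic in part (ii); zero at a root."""
    return w**2 + 4*w + (4 + 2j)


print(z1, z2)
print(w1, w2)
```

If both residuals are zero up to rounding, the cartesian forms printed above are worth comparing against your hand working.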


Exercise 550. (9740 N2011/II/1.) (Answer on p. 1671.)
The complex number z satisfies ∣z − 2 − 5i∣ ≤ 3.
(i) On an Argand diagram, sketch the region in which the point representing z can lie.[3]
(ii) Find exactly the maximum and minimum possible values of ∣z∣. [2]
(iii) It is given that 0 ≤ arg z ≤ . With this extra information, find the maximum value of
∣z − 6 − i∣. Label the point(s) that correspond to this maximum value on your diagram
with the letter P . [3]

Exercise 551. (9740 N2010/I/8.) (Answer on p. 1672.)

The complex numbers z1 and z2 are given by 1 + i 3 and −1 − i respectively.
(i) Express each of z1 and z2 in polar form r (cos θ + i sin θ), where r > 0 and −π < θ ≤ π.
Give r and θ in exact form. [2]
(ii) Find the complex conjugate of in exact polar form. [3]
(iii) On a single Argand diagram, sketch the loci
(a) ∣z − z1 ∣ = 2,
(b) arg(z − z2 ) =
. [4]
(iv) Find where the locus ∣z − z1 ∣ = 2 meets the positive real axis. [2]

Exercise 552. (9740 N2010/II/1.) (Answer on p. 1673.)

(i) Solve the equation x2 − 6x + 34 = 0. [2]
(ii) One root of the equation x4 + 4x3 + x2 + ax + b = 0, where a and b are real, is x = −2 + i.
Find the values of a and b and the other roots. [5]

Exercise 553. (9740 N2009/I/9.) (Answer on p. 1674.)

(i) Solve the equation

z 7 − (1 + i) = 0,

giving the roots in the form reiα , where r > 0 and −π < α ≤ π. [5]
(ii) Show the roots on an Argand diagram. [2]
(iii) The roots represented by z1 and z2 are such that 0 < arg z1 < arg z2 <
. Explain why
the locus of all points z such that ∣z − z1 ∣ = ∣z − z2 ∣ passes through the origin. Draw
this locus on your Argand diagram and find its exact cartesian equation. [5]


Exercise 554. (9740 N2008/I/8.) (Answer on p. 1675.)
A graphic calculator is not to be used in answering this question.

(i) It is given that z1 = 1 + 3i. Find the value of z1³, showing clearly how you obtain your
answer. [3]

(ii) Given that 1 + 3i is a root of the equation 2z 3 + az 2 + bz + 4 = 0, find the values of
the real numbers a and b. [4]
(iii) For these values a and b, solve the equation in part (ii), and show all the roots on an
Argand diagram. [4]

Exercise 555. (9740 N2008/II/3.) (Answer on p. 1676.)

(a) The complex number w has modulus r and argument θ, where 0 < θ < , and w∗
denotes the conjugate of w. State the modulus and argument of p, where p = ∗ . [2]
Given that p is real and positive, find the possible values of θ. [2]
(b) The complex number z satisfies the relations ∣z∣ ≤ 6 and ∣z∣ = ∣z − 8 − 6i∣.
(i) Illustrate both of these relations on a single Argand diagram. [3]
(ii) Find the greatest and least possible values of arg z, giving your answers in radians
correct to 3 decimal places. [4]

Exercise 556. (9233 N2008/I/9.) (Answer on p. 1677.)

In an Argand diagram, the point P represents the complex number z. Clearly labelling
any relevant points, draw three separate diagrams to show the locus of P in each of the
following cases.
(i) ∣z + 2i∣ = 2. [2]
(ii) ∣z − 2 − i∣ = ∣z − i∣. [2]
(iii) π/6 ≤ arg(z + 1 − 3i) ≤ π/3.

Exercise 557. (9233 N2008/II/3.) (Answer on p. 1679.)

(i) Verify that w = 1 − i satisfies the equation w2 = −2i and write down the other root of
this equation. [3]
(ii) Use the quadratic formula to solve the equation z 2 − (3 + 5i) z − 4 (1 − 2i) = 0. [4]

Exercise 558. (9740 N2007/I/3.) (Answer on p. 1679.)

(a) Sketch, on an Argand diagram, the locus of points representing the complex number z
such that ∣z + 2 − 3i∣ = √13. [3]
(b) The complex number w is such that ww∗ +2w = 3+4i, where w∗ is the complex conjugate
of w. Find w in the form a + ib, where a and b are real. [4]


Exercise 559. (9740 N2007/I/7.) (Answer on p. 1680.)
The polynomial P (z) has real coefficients. The equation P (z) = 0 has a root reiθ , where
r > 0 and 0 < θ < π.
(i) Write down a second root in terms of r and θ, and hence show that a quadratic factor
of P (z) is z 2 − 2rz cos θ + r2 . [3]
(ii) Solve the equation z⁶ = −64, expressing the solutions in the form re^(iθ), where r > 0 and
−π < θ ≤ π. [4]
(iii) Hence, or otherwise, express z 6 + 64 as the product of three quadratic factors with real
coefficients, giving each factor in non-trigonometrical form. [3]
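
The link between parts (i), (ii) and (iii) can be checked numerically with a short Python sketch. It assumes the six roots have modulus 2 and arguments (2k + 1)π/6 (my own working), pairs each root in the upper half-plane with its conjugate using the quadratic from part (i), and compares the product of the three quadratics with z⁶ + 64 at a few test points. All names are my own.

```python
import cmath
import math

# The six roots of z^6 = -64 = 64 e^{i*pi}: modulus 2, arguments (2k + 1) pi / 6.
roots = [2 * cmath.exp(1j * (2*k + 1) * math.pi / 6) for k in range(-3, 3)]


def quadratic_factor(root):
    """Coefficients of z^2 - 2 r cos(theta) z + r^2 for the root r e^{i theta}."""
    r, theta = cmath.polar(root)
    return (1.0, -2 * r * math.cos(theta), r**2)


# One factor per conjugate pair: keep the roots with positive imaginary part.
factors = [quadratic_factor(z) for z in roots if z.imag > 0]


def product_at(z):
    """Value at z of the product of the three quadratic factors."""
    value = 1
    for a, b, c in factors:
        value *= a*z*z + b*z + c
    return value


for z in (0.3, 1 + 2j, -1.5j):
    print(z, product_at(z), z**6 + 64)
```

The printed coefficients are decimals, so part (iii) still needs the exact (surd) values from your own working.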

Exercise 560. (9233 N2007/I/9.) (Answer on p. 1681.)

(i) The equation az 4 + bz 3 + cz 2 + dz + e = 0 has a root z = ki, where k is real and non-zero.
Given that the coefficients a, b, c, d and e are real, show that ad2 + b2 e = bcd. [5]
(ii) Verify that this condition is satisfied for the equation z 4 + 3z 3 + 13z 2 + 27z + 36 = 0 and
hence find two roots of this equation which are of the form z = ki, where k is real. [3]
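
For part (ii), the arithmetic can be checked with a few lines of Python. The sketch below assumes (my own working, not stated in the question) that the odd-power terms vanish at z = ki exactly when k² = d/b. The helper name `poly` is hypothetical.

```python
def poly(z, coeffs):
    """Evaluate a polynomial with coefficients [a, b, c, d, e] at z (Horner's rule)."""
    value = 0
    for coeff in coeffs:
        value = value * z + coeff
    return value


a, b, c, d, e = 1, 3, 13, 27, 36

# The condition from part (i).
condition_holds = (a * d**2 + b**2 * e == b * c * d)

# Assumption: the imaginary part of the quartic at z = ki is k (d - b k^2),
# so a non-zero k needs k^2 = d / b.
k = (d / b) ** 0.5
candidates = [complex(0, k), complex(0, -k)]
values = [poly(z, [a, b, c, d, e]) for z in candidates]

print(condition_holds, k, values)
```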

Exercise 561. (9233 N2007/II/5.) (Answer on p. 1681.)

(i) Illustrate, on an Argand diagram, the locus of a point P representing the complex
number z, where arg (z − 2i) = .
(ii) Illustrate, using the same Argand diagram, the locus of a point Q representing the
complex number z, where ∣z − 4∣ = ∣z + 2∣. [2]
(iii) Hence find the exact value of z such that arg (z − 2i) = and ∣z − 4∣ = ∣z + 2∣, giving
your answer in the form a + ib. [3]

(iv) Show that, in this case, zz ∗ = 8 + 4 3. [2]

Exercise 562. (9233 N2006/I/5.) (Answer on p. 1682.)

The complex number z satisfies ∣z + 4 − 4i∣ = 3.
(i) Describe, with the aid of a sketch, the locus of the point which represents z in an
Argand diagram. [3]
(ii) Find the least possible value of ∣z − i∣. [2]

Exercise 563. (9233 N2006/I/6.) (Answer on p. 1683.)

(i) Show that the equation z 4 − 2z 3 + 6z 2 − 8z + 8 = 0 has a root of the form ki where k is
real. [3]
(ii) Hence solve the equation z 4 − 2z 3 + 6z 2 − 8z + 8 = 0. [3]


113. Past-Year Questions for Part V. Calculus
Exercise 564. (9758 N2017/I/1.) (Answer on p. 1685.)
Using standard series from the List of Formulae (MF26), expand e^(2x) ln (1 + ax) as far as
the term in x³, where a is a non-zero constant. Hence find the value of a for which there is
no term in x². [4]

Exercise 565. (9758 N2017/I/3.) (Answer on p. 1685.)

Do not use a calculator in answering this question.
A curve C has equation y 2 − 2xy + 5x2 − 10 = 0.
(i) Find the exact x-coordinates of the stationary points of C. [4]
(ii) For the stationary point with x > 0, determine whether it is a maximum or minimum.

Exercise 566. (9758 N2017/I/7.) (Answer on p. 1686.)

It is given that f (x) = sin 2mx + sin 2nx, where m and n are positive integers and m ≠ n.

(i) Find ∫ sin 2mx sin 2nx dx. [3]

(ii) Find ∫ (f (x)) dx.


Exercise 567. (9758 N2017/I/11.) (Answer on p. 1686.)

Sir Isaac Newton was a famous scientist renowned for his work on the laws of motion. One
law states that, for an object falling vertically in a vacuum, the rate of change of velocity,
v m s−1 , with respect to time, t seconds, is a constant, c.
(i) (a) Write down a differential equation relating v, t and c. [1]
(b) Initially the velocity of the object is 4 m s−1 and, after a further 2.5 s, the velocity
of the object is 29 m s−1 . Find v in terms of t and state the value of c. [3]

For an object falling vertically through the atmosphere, the rate of change of velocity is
less than that for an object falling in a vacuum. The new rate of change of v is modelled
as the difference between the value of c found in part (i)(b) and an amount proportional
to the velocity v, with a constant of proportionality k.
(ii) Given that in this case the initial velocity is zero, find v in terms of t and k. [5]
For an object falling through the atmosphere, the ‘terminal velocity’ is the value approached
by the velocity after a long time.
(iii) A falling object has initial velocity zero and terminal velocity 40 m s−1 . Find how long
it takes the object to reach 90% of its terminal velocity. [4]
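
The model in parts (ii) and (iii) can be explored numerically. The sketch below is a rough check under my own reading of the question: the rate of change of v is c − kv, the terminal velocity is where this rate is zero (so it equals c/k), and the closed-form time used for comparison comes from my own working. The step size and function name are arbitrary choices.

```python
import math

# Part (i)(b): constant acceleration from the two velocity readings.
c = (29 - 4) / 2.5            # m s^-2

# Part (iii): dv/dt = c - k v levels off where c - k v = 0, so the
# terminal velocity is c / k. A terminal velocity of 40 m/s fixes k.
k = c / 40


def time_to_reach(target, dt=1e-4):
    """Step dv/dt = c - k v forward from v = 0 (midpoint method) until v >= target."""
    t, v = 0.0, 0.0
    while v < target:
        v_mid = v + 0.5 * dt * (c - k * v)
        v += dt * (c - k * v_mid)
        t += dt
    return t


t_numeric = time_to_reach(0.9 * 40)
# Comparison value from v = (c/k)(1 - e^{-kt}), which is my own working.
t_closed = math.log(10) / k
print(t_numeric, t_closed)
```

If the two printed times agree closely, the closed-form expression for v is probably right.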
Exercise 568. (9758 N2017/II/4.) (Answer on p. 1687.)
(a) A flat novelty plate for serving food on is made in the shape of the region enclosed by
the curve y = x2 − 6x + 5 and the line 2y = x − 1. Find the area of the plate. [4]

(b) A curved container has a flat circular top. The shape of the container is formed by

rotating the part of the curve x =
, where a is a constant greater than 1, between
a − y2
the points (0, 0) and ( , 1) through 2π radians about the y-axis.
(i) Find the volume of the container, giving your answer as a single fraction in terms
of a and π. [4]

(ii) Another curved container with a flat circular top is formed in the same way from

the curve x = (0, (
and the points 0) and , 1). It has a volume that is
b − y2 b−1
four times as great as the container in part (i). Find an expression for b in terms

of a. [3]

Exercise 569. (9740 N2016/I/2.) (Answer on p. 1689.)

(i) Use your calculator to find the gradient of the curve y = 2cos x at the points where x = 0
and x = π. [2]
(ii) Find the equations of the tangents to this curve at the points where x = 0 and x = π
and find the coordinates of the point where these tangents meet. [3]

Exercise 570. (9740 N2016/I/8.) (Answer on p. 1689.)

It is given that y = f (x), where f (x) = tan (ax + b) for constants a and b.
(i) Show that f ′ (x) = a + ay 2 . Use this result to find f ′′ (x) and f ′′′ (x) in terms of a and
y. [5]
(ii) In the case where b = π, use your results from part (i) to find the Maclaurin series
for f (x) in terms of a, up to and including the term in x3 . [3]
(iii) Find the first two non-zero terms in the Maclaurin series for tan 2x. [3]
Exercise 571. (9740 N2016/I/9.) (Answer on p. 1690.)
A stone is held on the surface of a pond and released. The stone falls vertically through the
water and the distance, x metres, that the stone has fallen in time t seconds is measured.
It is given that x = 0 and dx/dt = 0 when t = 0.
(i) The motion of the stone is modelled by the differential equation

d²x/dt² + 2 dx/dt = 10.

(a) By substituting y = dx/dt, show that the differential equation can be written as
dy/dt = 10 − 2y. [1]
(b) Find y in terms of t and hence find x in terms of t. [6]
(ii) A second model for the motion of the stone is suggested, given by the differential
equation

d²x/dt² = 10 − 5 sin (t/2).

Find x in terms of t for this model. [3]
(iii) The pond is 5 metres deep. For each of these models, find the time the stone takes to
reach the bottom of the pond, giving your answers correct to 2 decimal places. [2]

Exercise 572. (9740 N2016/II/1.) (Answer on p. 1691.)

Water is poured at a rate of 0.1 m³ per minute into a container in the form of an open
cone. The semi-vertical angle of the cone is α, where tan α = 0.5. At time t minutes after
the start, the radius of the water surface is r m (see diagram). Find the rate of increase of
the depth of water when the volume of water in the container is 3 m³. [7]
[The volume of a cone of base radius r and height h is given by V = (1/3)πr²h.]

[Diagram: an open cone with semi-vertical angle α and water surface of radius r m, with
water poured in at 0.1 m³ per minute.]
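
A related-rates answer like this one is easy to sanity-check numerically. The Python sketch below assumes that the water itself forms a cone with the same semi-vertical angle, so that r = h tan α at depth h. That assumption, and all the names used, are mine. The derivative dV/dh is also checked against a finite difference.

```python
import math

TAN_ALPHA = 0.5      # tan of the semi-vertical angle, from the question
RATE_IN = 0.1        # m^3 per minute
VOLUME = 3.0         # m^3


def volume(h):
    """Volume of water of depth h, using r = h tan(alpha) in V = (1/3) pi r^2 h."""
    r = h * TAN_ALPHA
    return math.pi * r**2 * h / 3


# Depth at which the volume is 3 m^3, from V = pi tan^2(alpha) h^3 / 3.
h = (3 * VOLUME / (math.pi * TAN_ALPHA**2)) ** (1 / 3)

# Chain rule: dV/dt = (dV/dh)(dh/dt), with dV/dh = pi tan^2(alpha) h^2.
dV_dh = math.pi * TAN_ALPHA**2 * h**2
dh_dt = RATE_IN / dV_dh

# Independent check of dV/dh by a central finite difference.
eps = 1e-6
dV_dh_numeric = (volume(h + eps) - volume(h - eps)) / (2 * eps)
print(h, dh_dt)
```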


Exercise 573. (9740 N2016/II/2.) (Answer on p. 1691.)

(a) (i) Find ∫ x² cos nx dx, where n is a positive integer. [3]

(ii) Hence find ∫_π^{2π} x² cos nx dx, giving your answers in the form a π/n², where the
possible values of a are to be determined. [2]

(b) The region bounded by the curve y = , the x-axis and the lines x = 0 and x = 2
x x
9 − x2
is rotated through 2π radians about the x-axis. Use the substitution u = 9 − x2 to find

the exact volume of the solid obtained, simplifying your answer. [5]

Exercise 574. (9740 N2016/II/3.) (Answer on p. 1692.)

A curve D has parametric equations

x = t − cos t, y = 1 − cos t, for 0 ≤ t ≤ 2π.

(i) Sketch the graph of D. Give in exact form the coordinates of the points where D
meets the x-axis, and also give in exact form the coordinates of the maximum point
on the curve. [4]
(ii) Find, in terms of a, the area under D for 0 ≤ t ≤ a, where a is a positive constant less
than 2π. [3]
The normal to D at the point where t = π cuts the x-axis at E and the y-axis at F .
(iii) Find the exact area of triangle OEF , where O is the origin. [4]

Exercise 575. (9740 N2015/I/3.) (Answer on p. 1693.)

(i) Given that f is a continuous function, explain, with the aid of a sketch, why the value of

lim_{n→∞} (1/n) [f (1/n) + f (2/n) + ⋅ ⋅ ⋅ + f (n/n)]

is ∫_0^1 f (x) dx. [2]
(ii) Hence evaluate lim_{n→∞} (1/n) ((∛1 + ∛2 + ⋅ ⋅ ⋅ + ∛n) / ∛n). [3]
Exercise 576. (9740 N2015/I/4.) (Answer on p. 1694.)
A piece of wire of fixed length d m is cut into two parts. One part is bent into the shape
of a rectangle with sides of length x m and y m. The other part is bent into the shape of
a semicircle, including its diameter. The radius of the semicircle is x m. Show that the
maximum value of the total area of the two shapes can be expressed as kd2 m2 , where k is
a constant to be found.

Exercise 577. (9740 N2015/I/6.) (Answer on p. 1694.)

(i) Write down the first three non-zero terms in the Maclaurin series for ln (1 + 2x), where
−1/2 < x ≤ 1/2, simplifying the coefficients. [2]
(ii) It is given that the three terms found in part (i) are equal to the first three terms in the
series expansion of ax (1 + bx)^c for small x. Find the exact values of the constants a, b
and c and use these values to find the coefficient of x⁴ in the expansion of ax (1 + bx)^c,
giving your answer as a simplified rational number. [6]

Exercise 578. (9740 N2015/I/10.) (Answer on p. 1695.)

With origin O, the curves with equations y = sin x and y = cos x, where 0 ≤ x ≤ π/2, meet
at the point P with coordinates (π/4, √2/2). The area of the region bounded by the curves
and the x-axis is A1 and the area of the region bounded by the curves and the y-axis is A2
(see diagram).

[Diagram: the curves y = sin x and y = cos x meeting at P, with the regions A1 and A2
marked.]

(i) Show that A1/A2 = √2. [4]
(ii) The region bounded by y = sin x between O and P, the line y = √2/2 and the y-axis is
rotated about the y-axis through 360°. Show that the volume of the solid formed is
given by

π ∫_0^{√2/2} (sin⁻¹ y)² dy.

(iii) Show that the substitution y = sin u transforms the integral in (ii) to π ∫_a^b u² cos u du,
for limits a and b to be determined. Hence find the exact volume. [6]


Exercise 579. (9740 N2015/I/11.) (Answer on p. 1696.)
A curve C has parametric equations

x = sin3 θ, y = 3 sin2 θ cos θ, for 0 ≤ θ ≤ π.
(i) Show that dy/dx = 2 cot θ − tan θ. [3]

(ii) Show that C has a turning point when tan θ = k, where k is an integer to be
determined. Find, in non-trigonometric form, the exact coordinates of the turning
point and explain why it is a maximum. [6]
(iii) Show that the area of the region bounded by C and the x-axis is given by

∫0 9 sin4 θ cos2 θ dθ.

Use your calculator to find the area, giving your answer correct to 3 decimal places.

The line with equation y = ax, where a is a positive constant, meets C at the origin and at
the point P .
(iv) Show that tan θ = at P . Find the exact value of a such that the line passes through
the maximum point of C. [3]

Exercise 580. (9740 N2015/II/1.) (Answer on p. 1697.)

As a tree grows, the rate of increase of its height, h m, with respect to time, t years after
planting, is modelled by the differential equation

dh 1 1
= 16 − h.
dt 10 2
The tree is planted as a seedling of negligible height, so that h = 0 when t = 0.
(i) State the maximum height of the tree, according to this model. [1]
(ii) Find an expression for t in terms of h, and hence find the time the tree takes to reach
half its maximum height. [5]

Exercise 581. (9740 N2014/I/2.) (Answer on p. 1697.)

The curve C has equation x²y + xy² + 54 = 0. Without using a calculator, find the coordinates
of the point on C at which the gradient is −1, showing that there is only one such point. [6]
Exercise 582. (9740 N2014/I/7.) (Answer on p. 1698.)
It is given that f (x) = x6 − 3x4 − 7. The diagram shows the curve with equation y = f (x)
and the line with equation y = −7, for x ≥ 0. The curve crosses the positive x-axis at x = α,
and the curve and the line meet where x = 0 and x = β.

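[Diagram: the curve y = f (x) for x ≥ 0, crossing the x-axis at x = α and meeting the line
y = −7 at the point (β, −7).]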

(i) Find the value of α, giving your answer correct to 3 decimal places, and find the exact
value of β. [2]
(ii) Evaluate ∫ f (x) dx, giving your answer correct to 3 decimal places.


(iii) Find, in terms of √3, the area of the finite region bounded by the curve and the line,
for x ≥ 0. [3]
(iv) Show that f (x) = f (−x). What can be said about the six roots of the equation
f (x) = 0? [4]

Exercise 583. (9740 N2014/I/8.) (Answer on p. 1699.)

It is given that f (x) = 1/√(9 − x²), where −3 < x < 3.
(i) Write down ∫ f (x) dx. [1]
(ii) Find the binomial expansion for f (x), up to and including the term in x6 . Give the
coefficients as exact fractions in their simplest form. [4]

Remark 135. For (ii), replace binomial expansion (no longer on the 9758 syllabus) with
Maclaurin expansion.

(iii) Hence, or otherwise, find the first four non-zero terms of the Maclaurin series for
sin−1 . Give the coefficients as exact fractions in their simplest form.
Exercise 584. (9740 N2014/I/10.) (Answer on p. 1700.)
The mass, x grams, of a certain substance present in a chemical reaction at time t minutes
satisfies the differential equation
dx/dt = k (1 + x − x²),

where 0 ≤ x ≤ 1/2 and k is a constant. It is given that x = 1/2 and dx/dt = −1/4 when t = 0.
(i) Show that k = −1/5. [1]
(ii) By first expressing 1 + x − x2 in completed square form, find t in terms of x. [5]
(iii) Hence find
(a) the exact time taken for the mass of the substance present in the chemical reaction
to become half of its initial value, [1]
(b) the time taken for there to be none of the substance present in the chemical
reaction, giving your answer correct to 3 decimal places. [1]
(iv) Express the solution of the differential equation in the form x = f (t) and sketch the
part of the curve with this equation which is relevant in this context. [5]

Exercise 585. (9740 N2014/I/11.) (Answer on p. 1701.)

[It is given that the volume of a sphere of radius r is (4/3)πr³ and the volume of a circular
cone with base radius r and height h is (1/3)πr²h.]
A toy manufacturer makes a toy which consists of a hemisphere of radius r cm joined to a
circular cone of base radius r cm and height h cm (see diagram). The manufacturer
determines that the length of the slant edge of the cone must be 4 cm and that the total
volume of the toy, V cm³, should be as large as possible.

[Diagram: a hemisphere of radius r joined to a cone of height h with slant edge 4.]

(i) Find a formula for V in terms of r. Given that r = r1 is the value of r which gives the
maximum value of V, show that r1 satisfies the equation 45r⁴ − 768r² + 1024 = 0. [6]
(ii) Find the two solutions to the equation in part (i) for which r > 0, giving your answers
correct to 3 decimal places. [2]
(iii) Show that one of the solutions found in part (ii) does not give a stationary value of
V . Hence write down the value of r1 and find the corresponding value of h. [3]
(iv) Sketch the graph showing the volume of the toy as the radius of the hemisphere varies.


Exercise 586. (9740 N2014/II/2.) (Answer on p. 1702.)
Using partial fractions, find
∫_0^2 (9x² + x − 13) / ((2x − 5) (x² + 9)) dx.

Give your answer in the form a ln b + c tan−1 d, where a, b, c and d are rational numbers to
be determined. [9]
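
A definite integral like this can be checked against a numerical estimate. The Python sketch below uses composite Simpson's rule on the integrand as I read it, (9x² + x − 13) / ((2x − 5)(x² + 9)) on [0, 2], and compares the result with a candidate closed form from my own partial-fractions working. Treat that candidate as an assumption to be tested, not as the official answer.

```python
import math


def integrand(x):
    return (9*x**2 + x - 13) / ((2*x - 5) * (x**2 + 9))


def partial_fractions(x):
    """Candidate decomposition (my own working): 3/(2x - 5) + (3x + 8)/(x^2 + 9)."""
    return 3 / (2*x - 5) + (3*x + 8) / (x**2 + 9)


def simpson(f, a, b, n=2000):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3


numeric = simpson(integrand, 0.0, 2.0)

# Candidate answer in the form a ln b + c arctan d, from integrating the
# decomposition above term by term.
candidate = 1.5 * math.log(13 / 45) + (8 / 3) * math.atan(2 / 3)
print(numeric, candidate)
```

The integrand is undefined at x = 2.5, which lies outside [0, 2], so the numerical estimate is well behaved here.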

Exercise 587. (9740 N2013/I/5.) (Answer on p. 1702.)

It is given that

⎪ x2
⎪ 1− 2 for − a ≤ x ≤ a,
f (x) = ⎨


⎪ for a < x < 2a,

and that f (x + 3a) = f (x) for all real values of x, where a is a real constant.
(i) Sketch the graph of y = f (x) for −4a ≤ x ≤ 6a. [3]


(ii) Use the substitution x = a sin θ to find the exact value of ∫ 1 f (x) dx in terms of a
2 a

a 2
and π. [5]

Exercise 588. (9740 N2013/I/10.) (Answer on p. 1703.)

The variables x, y and z are connected by the following differential equations.

dz/dx = 3 − 2z (A)
dy/dx = z (B)
(i) Given that z < , solve equation (A) to find z in terms of x. [4]
(ii) Hence find y in terms of x. [2]
(iii) Use the result in part (ii) to show that

d²y/dx² = a dy/dx + b,

for constants a and b to be determined. [3]

(iv) The result in part (ii) represents a family of curves. Some members of the family are
straight lines. Write down the equations of two of these lines. On a single diagram,
sketch one of your lines together with a non-linear member of the family of curves
that has your line as an asymptote. [4]


Exercise 589. (9740 N2013/I/11.) (Answer on p. 1704.)
A curve C has parametric equations

x = 3t2 , y = 2t3 .
(i) Find the equation of the tangent to C at the point with parameter t. [3]
(ii) Points P and Q on C have parameters p and q respectively. The tangent at P meets
the tangent at Q at the point R. Show that the x-coordinate of R is p2 + pq + q 2 , and
find the y-coordinate of R in terms of p and q. Given that pq = −1, show that R lies
on the curve with equation x = y 2 + 1. [5]


A curve L has equation x = y 2 + 1. The diagram shows the parts of C and L for which y ≥ 0.
The curves C and L touch at the point M .
(iii) Show that 4t6 − 3t2 + 1 = 0 at M . Hence, or otherwise, find the exact coordinates of
M. [3]
(iv) Find the exact value of the area of the shaded region bounded by C and L for which
y ≥ 0. [6]


Exercise 590. (9740 N2013/II/2.) (Answer on p. 1705.)
Fig. 1 shows a piece of card, ABC, in the form of an equilateral triangle of side a. A
kite shape is cut from each corner, to give the shape shown in Fig. 2. The remaining card
shown in Fig. 2 is folded along the dotted lines, to form the open triangular prism of height
x shown in Fig. 3.

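[Figures: Fig. 1 shows the triangular card ABC; Fig. 2 shows the card with the kites
removed and the fold lines dotted; Fig. 3 shows the folded prism of height x.]

(i) Show that the volume V of the prism is given by V = (1/4) x√3 (a − 2x√3)². [3]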
(ii) Use differentiation to find, in terms of a, the maximum value of V , proving that it is
a maximum. [6]

Remark 136. For (ii), assume also that a is a fixed constant. Otherwise, V has no
maximum because we can simply let both a and x grow without bound.

Exercise 591. (9740 N2013/II/3.) (Answer on p. 1706.)

(i) Given that f (x) = ln (1 + 2 sin x), find f (0), f ′ (0), f ′′ (0) and f ′′′ (0). Hence write
down the first three non-zero terms in the Maclaurin series for f (x). [7]
(ii) The first two non-zero terms in the Maclaurin series for f (x) are equal to the first
two non-zero terms in the series expansion of eax sin nx. Using appropriate expansions
from the List of Formulae (MF15), find the constants a and n. Hence find the third
non-zero term of the series expansion of eax sin nx for these values of a and n. [5]

Exercise 592. (9740 N2012/I/2.) (Answer on p. 1706.)

(i) Find ∫ dx. [2]
1 + x4
(ii) Use the substitution u = x2 to find ∫
dx. [3]
1 + x4
1 2
(iii) Evaluate ∫ ( ) dx, giving the answer correct to 3 decimal places.
0 1 + x4
Exercise 593. (9740 N2012/I/4.) (Answer on p. 1707.)
In the triangle ABC, AB = 1, angle BAC = θ radians and angle ABC = 3π/4 radians (see
diagram).

(i) Show that AC = 1/(cos θ − sin θ). [4]
(ii) Given that θ is a sufficiently small angle, show that

AC ≈ 1 + aθ + bθ2 ,

for constants a and b to be determined. [4]

Exercise 594. (9740 N2012/I/8.) (Answer on p. 1707.)

The curve C has equation

x − y = (x + y)².

It is given that C has only one turning point.

(i) Show that 1 + dy/dx = 2/(2x + 2y + 1). [4]
(ii) Hence, or otherwise, show that d²y/dx² = −(1 + dy/dx)³. [3]
(iii) Hence, state, with a reason, whether the turning point is a maximum or a minimum.


Exercise 595. (9740 N2012/I/10.) (Answer on p. 1708.)
[It is given that a sphere of radius r has surface area 4πr² and volume (4/3)πr³.]
A model of a concert hall is made up of three parts. The roof is modelled by the curved
surface of a hemisphere of radius r cm. The walls are modelled by the curved surface of a
cylinder of radius r cm and height h cm. The floor is modelled by a circular disc of radius
r cm. The three parts are joined together as shown in the diagram. The model is made of
material of negligible thickness.

[Diagram: a hemispherical roof of radius r on top of a cylinder of radius r and height h.]

(i) It is given that the volume of the model is a fixed value k cm3 , and the external surface
area is a minimum. Use differentiation to find the values of r and h in terms of k.
Simplify your answers. [7]
(ii) It is given instead that the volume of the model is 200 cm3 and its external surface
area is 180 cm2 . Show that there are two possible values of r. Given also that r < h,
find the value of r and the value of h. [5]

Exercise 596. (9740 N2012/I/11.) (Answer on p. 1709.)

A curve C has parametric equations

x = θ − sin θ, y = 1 − cos θ,

where 0 ≤ θ ≤ 2π.
(i) Show that dy/dx = cot (θ/2) and find the gradient of C at the point where θ = π. What can
be said about the tangents to C as θ → 0 and θ → 2π? [5]
(ii) Sketch C, showing clearly the features of the curve at the points where θ = 0, π and
2π. [3]
(iii) Without using a calculator, find the exact area of the region bounded by C and the
x-axis. [5]
(iv) A point P on C has parameter p, where 0 < p < π. Show that the normal to C at P
crosses the x-axis at the point with coordinates (p, 0). [3]
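
Parts (iii) and (iv) both lend themselves to a numerical check. The Python sketch below estimates the area as the integral of y (dx/dθ) over 0 ≤ θ ≤ 2π using the midpoint rule, and computes the x-intercept of the normal at a sample parameter value. The function names and the sample value p = 1 are my own choices. The comparison value 3π is from my own working, so treat it as a candidate answer only.

```python
import math


def area_under_curve(n=100_000):
    """Midpoint-rule estimate of the integral of y (dx/dtheta) over 0..2pi."""
    h = 2 * math.pi / n
    total = 0.0
    for i in range(n):
        theta = (i + 0.5) * h
        y = 1 - math.cos(theta)
        dx_dtheta = 1 - math.cos(theta)
        total += y * dx_dtheta
    return total * h


def normal_x_intercept(p):
    """x-intercept of the normal at parameter p, using dy/dx = cot(p/2)."""
    x_p, y_p = p - math.sin(p), 1 - math.cos(p)
    normal_slope = -math.tan(p / 2)      # negative reciprocal of cot(p/2)
    # On the normal, 0 - y_p = normal_slope * (x - x_p).
    return x_p - y_p / normal_slope


area = area_under_curve()
print(area, 3 * math.pi)
print(normal_x_intercept(1.0))
```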
Exercise 597. (9740 N2012/II/1.) (Answer on p. 1710.)
(a) Find the general solution of the differential equation d²y/dx² = 16 − 9x², giving your answer
in the form y = f (x). [3]
(b) Given that u and t are related by du/dt = 16 − 9u², and that u = 1 when t = 0, find t in
terms of u, simplifying your answer. [5]

Exercise 598. (9740 N2011/I/3.) (Answer on p. 1710.)

The parametric equations of a curve are x = t2 , y = .
(i) Find the equation of the tangent to the curve at the point (p2 , ), simplifying your
answer. [2]
(ii) Hence find the coordinates of the points Q and R where this tangent meets the x- and
y-axes respectively. [2]
(iii) Find a cartesian equation of the locus of the mid-point of QR as p varies.³⁸³ [3]

Exercise 599. (9740 N2011/I/4.) (Answer on p. 1710.)

(i) Use the first three non-zero terms of the Maclaurin series for cos x to find the Maclaurin
series for g (x), where g (x) = cos6 x, up to and including the term in x4 . [3]
(ii) (a) Use your answer to part (i) to give an approximation for ∫ g (x) dx in terms of

a, and evaluate this approximation in the case where a = .

(b) Use your calculator to find an accurate value for ∫ g (x) dx. Why is the
approximation in part (ii) (a) not very good?

Exercise 600. (9740 N2011/I/5.) (Answer on p. 1711.)

It is given that f (x) = 2 − x.
(i) On separate diagrams, sketch the graphs of y = f (∣x∣) and y = ∣f (x)∣, giving the
coordinates of any points where the graphs meet the x- and y-axes. You should label
the graphs clearly. [3]
(ii) State the set of values of x for which f (∣x∣) = ∣f (x)∣. [1]
(iii) Find the exact value of the constant a for which ∫ f (∣x∣) dx = ∫ ∣f (x)∣ dx.
−1 1
³⁸³ The word locus has been removed from your H2 Maths syllabus. But here this word doesn’t really
mean much. We’ll just delete these three words without changing the question’s meaning.
Exercise 601. (9740 N2011/I/8.) (Answer on p. 1712.)
(i) Find ∫ (100 − v 2 ) dv.
(ii) A stone is dropped from a stationary balloon. It leaves the balloon with zero speed,
and t seconds later its speed v metres per second satisfies the differential equation

dv/dt = 10 − 0.1v².
(iii) Find t in terms of v. Hence find the exact time the stone takes to reach a speed of 5
metres per second. [5]
(a) Find the speed of the stone after 1 second. [3]
(b) What happens to the speed of the stone for large values of t? [2]

Exercise 602. (9740 N2011/II/2.) (Answer on p. 1712.)

The diagram shows a rectangular piece of cardboard ABCD of sides n metres and 2n metres,
where n is a positive constant. A square of side x metres is removed from each corner of
ABCD. The remaining shape is now folded along PQ, QR, RS and SP to form an open
rectangular box of height x metres.

[Diagram: rectangle ABCD with a square of side x cut from each corner, leaving the inner
rectangle PQRS.]

(i) Show that the volume V cubic metres of the box is given by V = 2n2 x − 6nx2 + 4x3 .[3]
(ii) Without using a calculator, find in surd form the value of x that gives a stationary
value of V , and explain why there is only one answer. [6]

Exercise 603. (9740 N2011/II/4.) (Answer on p. 1713.)

(a) (i) Obtain a formula for ∫ x2 e−2x dx in terms of n, where n > 0.


(ii) Hence evaluate ∫ x2 e−2x dx. [1]
[You may assume that ne−2n and n2 e−2n → 0 as n → ∞.]

(b) The region bounded by the curve y = , the axis and the lines x = 0 and x = 1 is
+1 x2
rotated through 2π radians about the x-axis. Use the substitution x = tan θ to show
that the volume of the solid obtained is given by 16π ∫ sin2 θ dθ, and evaluate this
integral exactly. [6]


Exercise 604. (9740 N2010/I/2.) (Answer on p. 1713.)
(i) Find the first three terms of the Maclaurin series for e^x (1 + sin 2x). [You may use
standard results given in the List of Formulae (MF15).] [3]
(ii) It is given that the first two terms of this series are equal to the first two terms in the
series expansion, in ascending powers of x, of (1 + (4/3)x)ⁿ. Find n and show that the
third terms in each of these series are equal. [3]

Exercise 605. (9740 N2010/I/4.) (Answer on p. 1713.)

(i) Given that x² − y² + 2xy + 4 = 0, find dy/dx in terms of x and y. [4]
(ii) For the curve with equation x2 − y 2 + 2xy + 4 = 0, find the coordinates of each point at
which the tangent is parallel to the x-axis. [4]

Exercise 606. (9740 N2010/I/6.) (Answer on p. 1714.)

The diagram shows the curve with equation y = x3 − 3x + 1 and the line with equation y = 1.
The curve crosses the x-axis at x = α, x = β and x = γ and has turning points x = −1 and
x = 1.

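[Diagram: the curve y = x³ − 3x + 1 crossing the x-axis at x = α, β and γ, with turning
points at x = −1 and x = 1, and the line y = 1.]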

(i) Find the values of β and γ, giving your answers correct to 3 decimal places. [2]
(ii) Find the area of the region bounded by the curve and the x-axis between x = β and
x = γ. [2]
(iii) Use a non-calculator method to find the area of the region bounded by the curve and
the line. [4]
(iv) Find the set of values of $k$ for which the equation $x^3 - 3x + 1 = k$ has three real distinct roots. [2]
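
Part (i) expects a calculator. As a rough illustration of what the calculator is doing, here is a minimal Python sketch (not part of the exam question) that locates β and γ by bisection. The function names and the bracketing intervals [0, 1] and [1, 2] are my own choices, read off the turning points at x = −1 and x = 1.

```python
def f(x):
    """The cubic from the exercise: y = x^3 - 3x + 1."""
    return x**3 - 3 * x + 1


def bisect(func, lo, hi, tol=1e-10):
    """Find a root of func in [lo, hi], assuming a sign change on the interval."""
    f_lo = func(lo)
    if f_lo * func(hi) > 0:
        raise ValueError("no sign change on the interval")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = func(mid)
        if f_lo * f_mid <= 0:
            hi = mid                 # root lies in the left half
        else:
            lo, f_lo = mid, f_mid    # root lies in the right half
    return (lo + hi) / 2


beta = bisect(f, 0, 1)    # f(0) = 1 > 0 and f(1) = -1 < 0
gamma = bisect(f, 1, 2)   # f(1) = -1 < 0 and f(2) = 3 > 0
print(round(beta, 3), round(gamma, 3))
```

Bisection is slow but needs nothing beyond a sign change, which makes it a safe first choice here.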


Exercise 607. (9740 N2010/I/7.) (Answer on p. 1714.)
(i) A bottle containing liquid is taken from a refrigerator and placed in a room where the temperature is a constant 20 °C. As the liquid warms up, the rate of increase of its temperature θ °C after time t minutes is proportional to the temperature difference (20 − θ) °C. Initially the temperature of the liquid is 10 °C and the rate of increase of the temperature is 1 °C per minute. By setting up and solving a differential equation, show that $\theta = 20 - 10e^{-t/10}$.
(ii) Find the time it takes the liquid to reach a temperature of 15 °C, and state what happens to θ for large values of t. Sketch a graph of θ against t. [4]
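
The following minimal Python sketch (not part of the exam question) checks numerically that θ = 20 − 10e^(−t/10) satisfies the stated conditions. It assumes the differential equation is dθ/dt = (20 − θ)/10, where the constant 1/10 comes from the initial rate of 1 °C per minute at θ = 10. The names `theta`, `rate` and `t_15` are my own.

```python
import math


def theta(t):
    """Proposed solution: temperature in degrees C after t minutes."""
    return 20 - 10 * math.exp(-t / 10)


def rate(t, h=1e-6):
    """Central-difference estimate of d(theta)/dt at time t."""
    return (theta(t + h) - theta(t - h)) / (2 * h)


# Initial conditions: theta(0) = 10 and an initial rate of 1 degree per minute.
print(theta(0), rate(0))

# The differential equation should hold at every time, not just at t = 0.
for t in (0, 5, 20):
    print(t, rate(t), (20 - theta(t)) / 10)

# Time to reach 15 degrees: solve 10 e^(-t/10) = 5, i.e. t = 10 ln 2.
t_15 = 10 * math.log(2)
print(t_15, theta(t_15))
```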

Exercise 608. (9740 N2010/I/9.) (Answer on p. 1715.)

A company requires a box made of cardboard of negligible thickness to hold 300 cm³ of powder when full. The length of the box is 3x cm, the width is x cm and the height is y cm. The lid has depth ky cm, where 0 < k ≤ 1 (see diagram).

[Diagram: the box and the lid shown separately, each with length 3x and width x.]

(i) Use differentiation to find, in terms of k, the value of x which gives a minimum total
external surface area of the box and the lid. [6]

Remark 137. My interpretation: “Total external surface area” refers to that when the box and lid are kept separate, as depicted. Also, each of the box’s and lid’s external surface areas consists of five rectangles. Further, I assume that k is constant.
(ii) Find also the ratio of the height to the width, $\frac{y}{x}$, in this case, simplifying your answer.
(iii) Find the values between which $\frac{y}{x}$ must lie. [2]
(iv) Find the value of k for which the box has square ends. [2]

Remark 138. I interpret “ends” to mean those rectangles with dimensions x ⋅ y.

Exercise 609. (9740 N2010/I/11.) (Answer on p. 1716.)
A curve C has parametric equations

$x = t + \frac{1}{t}$, $y = t - \frac{1}{t}$.

(i) The point P on the curve has parameter p. Show that the equation of the tangent at P is

$(p^2 + 1)x - (p^2 - 1)y = 4p$. [4]

(ii) The tangent at P meets the line y = x at the point A and the line y = −x at the point
B. Show that the area of triangle OAB is independent of p, where O is the origin.[4]
(iii) Find a cartesian equation of C. Sketch C, giving the coordinates of any points where
C crosses the x- and y-axes and the equations of any asymptotes. [4]

Exercise 610. (9740 N2010/II/3.) (Answer on p. 1717.)

(i) Given that $y = x\sqrt{x + 2}$, find $\frac{dy}{dx}$, expressing your answer as a single algebraic fraction. Hence, show that there is only one value of $x$ for which the curve $y = x\sqrt{x + 2}$ has a turning point, and state this value. [5]
(ii) A curve has equation $y^2 = x^2 (x + 2)$.
(a) Find exactly the possible values of the gradient at the point where $x = 0$. [2]
(b) Sketch the curve $y^2 = x^2 (x + 2)$.

(iii) On a separate diagram sketch the graph of $y = f'(x)$, where $f(x) = x\sqrt{x + 2}$. State the equations of any asymptotes. [2]

Exercise 611. (9740 N2009/I/2.) (Answer on p. 1717.)

Find the exact value of p such that
1 1 1
∫0 = ∫ √
dx dx. [5]
4 − x2 0 1 − p2 x 2

Exercise 612. (9740 N2009/I/4.) (Answer on p. 1718.)

It is given that

$$f(x) = \begin{cases} 7 - x^2 & \text{for } 0 < x \le 2, \\ 2x - 1 & \text{for } 2 < x \le 4, \end{cases}$$

and that f (x) = f (x + 4) for all real values of x.

(i) Evaluate f (27) + f (45). [2]
(ii) Sketch the graph of y = f (x) for −7 ≤ x ≤ 10. [3]
(iii) Find ∫ f (x) dx. [3]
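
For part (i), the key step is reducing any x to the base interval (0, 4] using the period 4. A minimal Python sketch of that reduction (not part of the exam question; the function name `f` simply mirrors the question) is:

```python
def f(x):
    """Periodic extension of the piecewise rule, with period 4."""
    r = x % 4          # Python's % gives a remainder in [0, 4)
    if r == 0:
        r = 4          # the base interval is (0, 4], so 0 maps to 4
    if r <= 2:
        return 7 - r**2
    return 2 * r - 1


# 27 = 6*4 + 3 and 45 = 11*4 + 1, so f(27) = f(3) and f(45) = f(1).
print(f(27), f(45), f(27) + f(45))
```

The only subtle case is a multiple of 4, which belongs to the second branch because the base interval is open at 0 and closed at 4.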
Exercise 613. (9740 N2009/I/7.) (Answer on p. 1718.)

(i) Given that $f(x) = e^{\cos x}$, find $f(0)$, $f'(0)$ and $f''(0)$. Hence write down the first two non-zero terms in the Maclaurin series for $f(x)$. Give the coefficients in terms of $e$. [5]

(ii) Given that the first two non-zero terms in the Maclaurin series for $f(x)$ are equal to the first two non-zero terms in the series expansion of $\frac{1}{a + bx^2}$, where $a$ and $b$ are constants, find $a$ and $b$ in terms of $e$. [4]

Exercise 614. (9740 N2009/I/11.) (Answer on p. 1719.)

The curve C has equation $y = f(x)$, where $f(x) = xe^{-x^2}$.

(i) Sketch the curve C. [2]

(ii) Find the exact coordinates of the turning points on the curve. [4]
(iii) Use the substitution $u = x^2$ to find $\int_0^n f(x)\,dx$, for $n > 0$. Hence find the area of the region between the curve and the positive $x$-axis. [4]
(iv) Find the exact value of ∫ ∣f (x)∣ dx. [2]
(v) Find the volume of revolution when the region bounded by the curve, the lines x = 0,
x = 1 and the x-axis is rotated completely about the x-axis. Give your answer correct
to 3 significant figures. [2]

Exercise 615. (9740 N2009/II/1.) (Answer on p. 1720.)

The curve C has parametric equations

$x = t^2 + 4t$, $y = t^3 + t^2$.
(i) Sketch the curve for −2 ≤ t ≤ 1. [1]

The tangent to the curve at the point P where t = 2 is denoted by l.

(ii) Find the cartesian equation of l. [3]
(iii) The tangent l meets C again at the point Q. Use a non-calculator method to find the
coordinates of Q. [4]
Exercise 616. (9740 N2009/II/4.) (Answer on p. 1721.)
Two scientists are investigating the change of a certain population of size n thousand at
time t years.
(i) One scientist suggests that $n$ and $t$ are related by the differential equation $\frac{d^2 n}{dt^2} = 10 - 6t$. Find the general solution of this differential equation. Sketch three members of the family of solution curves, given that $n = 100$ when $t = 0$. [5]

(ii) The other scientist suggests that $n$ and $t$ are related by the differential equation $\frac{dn}{dt} = 3 - 0.02n$. Find $n$ in terms of $t$, given again that $n = 100$ when $t = 0$. Explain in simple terms what will eventually happen to the population using this model. [7]

Exercise 617. (9740 N2008/I/1.) (Answer on p. 1722.)

The diagram shows the curve with equation $y = x^2$. The area of the region bounded by the curve, the lines $x = 1$, $x = 2$ and the $x$-axis is equal to the area of the region bounded by the curve, the lines $y = a$, $y = 4$ and the $y$-axis, where $a < 4$. Find the value of $a$. [4]

[Diagram: the curve $y = x^2$, with O, 1 and 2 marked on the $x$-axis.]

Exercise 618. (9740 N2008/I/4.) (Answer on p. 1722.)

(i) Find the general solution of the differential equation

$\frac{dy}{dx} = \frac{3x}{x^2 + 1}$. [2]
(ii) Find the particular solution of the differential equation for which y = 2 when x = 0.[1]
(iii) What can you say about the gradient of every solution curve as x → ±∞? [1]
(iv) Sketch, on a single diagram, the graph of the solution found in part (ii), together with
2 other members of the family of solution curves. [3]
Exercise 619. (9740 N2008/I/5.) (Answer on p. 1722.)
(i) Find the exact value of ∫
dx. [3]
0 1 + 9x2
(ii) Find, in terms of n and e, ∫ xn ln x dx, where n ≠ −1. [4]

Exercise 620. (9740 N2008/I/6.) (Answer on p. 1723.)

(a) In the triangle ABC, AB = 1, BC = 3 and angle ABC = θ radians. Given that θ is a sufficiently small angle, show that

$AC \approx (4 + 3\theta^2)^{1/2} \approx a + b\theta^2$,

for constants a and b to be determined. [5]

(b) Given that f (x) = tan (2x + ), find f (0), f ′ (0) and f ′′ (0). Hence find the first 3
terms in the Maclaurin series of f (x). [5]

Exercise 621. (9740 N2008/I/7.) (Answer on p. 1723.)

A new flower-bed is being designed for a large garden. The flower-bed will occupy a rectangle x m by y m together with a semicircle of diameter x m, as shown in the diagram. A low wall will be built around the flower-bed. The time needed to build the wall will be 3 hours per metre for the straight parts and 9 hours per metre for the semicircular part. Given that a total time of 180 hours is taken to build the wall, find, using differentiation, the values of x and y which give a flower-bed of maximum area. [10]

Exercise 622. (9740 N2008/II/1.) Let $f(x) = e^x \sin x$. (Answer on p. 1724.)

(i) Sketch the graph for $y = f(x)$ for $-3 \le x \le 3$. [2]
(ii) Find the series expansion of $f(x)$ in ascending powers of $x$, up to and including the term in $x^3$. [3]
Denote the answer to part (ii) by g (x).
(iii) On the same diagram as in part (i), sketch the graph of y = g (x). Label the two
graphs clearly. [1]
(iv) Find, for −3 ≤ x ≤ 3, the set of values of x for which the value of g (x) is within ±0.5
of the value of f (x). [3]
Exercise 623. (9740 N2008/II/2.) (Answer on p. 1724.)

The diagram shows the curve C with equation $y^2 = x\sqrt{1 - x}$. The region enclosed by C is denoted by R.

[Diagram: the curve C, with O, 0.5 and 1 marked on the $x$-axis.]


(i) Write down an integral that gives the area of R, and evaluate this integral numerically.
(ii) The part of R above the x-axis is rotated through 2π radians about the x-axis. By
using the substitution u = 1 − x, or otherwise, find the exact value of the volume
obtained. [3]
(iii) Find the exact x-coordinate of the maximum point of C. [3]

Exercise 624. (9233 N2008/I/2.) (Answer on p. 1725.)

Find the constants a and b such that, when x is small,

$\frac{\cos 2x}{\sqrt{1 + x^2}} \approx a + bx^2$. [4]

Exercise 625. (9233 N2008/I/3.) Show that (Answer on p. 1725.)

$\int_0^1 xe^{-2x}\,dx = \frac{1}{4} - \frac{3}{4}e^{-2}$.
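
A numerical check is no substitute for the proof asked for, but it is a quick way to catch a slip in the integration by parts. The minimal Python sketch below (not part of the exam question) assumes the integrand is x·e^(−2x) on [0, 1], which is the reading consistent with the stated right-hand side of 1/4 − (3/4)e^(−2). The helper `simpson` is my own and uses the composite Simpson's rule.

```python
import math


def integrand(x):
    # Assumed integrand: x * e^(-2x).
    return x * math.exp(-2 * x)


def simpson(func, a, b, n=1000):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        # Odd-indexed interior points get weight 4, even-indexed get weight 2.
        total += (4 if i % 2 else 2) * func(a + i * h)
    return total * h / 3


numeric = simpson(integrand, 0.0, 1.0)
exact = 0.25 - 0.75 * math.exp(-2)
print(numeric, exact)
```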


Exercise 626. (9233 N2008/I/4.) (Answer on p. 1725.)
Use the substitution t = ln x to find the value of
e3 1
∫e dx. [6]
x (ln x)

Exercise 627. (9233 N2008/I/6.) (Answer on p. 1725.)

(i) Given that 0 < a < b, sketch the graph of y = ∣x − a∣ for −b ≤ x ≤ b. [3]

(ii) Find ∫ ∣x − a∣ dx.


Exercise 628. (9233 N2008/I/8.) (Answer on p. 1725.)

Find the exact value of a for which

∞ 1
3 1
∫a = ∫ √
dx dx. [5]
4 + x2 1
2 1 − x2

Exercise 629. (9233 N2008/I/10.) (Answer on p. 1726.)

(i) Prove that the substitution $y = xz$ reduces the differential equation $xy\frac{dy}{dx} = x^2 + y^2$ to $xz\frac{dz}{dx} = 1$. [3]

(ii) Hence find the solution of the differential equation $xy\frac{dy}{dx} = x^2 + y^2$ for which $y = 6$ when $x = 2$. [5]

Exercise 630. (9233 N2008/I/13.) (Answer on p. 1726.)

A curve is defined by the parametric equations

$x = \cos^3 t$, $y = \sin^3 t$, for $0 < t < \pi$.
(i) Show that the equation of the normal to the curve at the point $P(\cos^3 t, \sin^3 t)$ is

$x\cos t - y\sin t = \cos^4 t - \sin^4 t$. [5]

(ii) Prove the identity $\cos^4 t - \sin^4 t \equiv \cos 2t$. [2]

(iii) The normal at P meets the $x$-axis at A and the $y$-axis at B. Show that the length of AB can be expressed in the form $k\cot 2t$, where $k$ is a constant to be found. [5]


Exercise 631. (9233 N2008/I/14 EITHER.) (Answer on p. 1727.)
It is required to prove the statement:
$1 + 2x + 3x^2 + \cdots + nx^{n-1} = \frac{1 - (n + 1)x^n + nx^{n+1}}{(1 - x)^2}$.

(i) Use mathematical induction to prove the statement for all positive integers n. [6]
(ii) By considering the expression obtained by integrating each term on the left hand side,
prove the statement without using mathematical induction. [6]
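
Before attempting either proof, it can be reassuring to confirm the statement on a few cases. The minimal Python sketch below (not part of the exam question) compares both sides exactly using rational arithmetic for several values of x ≠ 1 and n = 1, ..., 10. The function names `lhs` and `rhs` and the sample values of x are my own choices.

```python
from fractions import Fraction


def lhs(x, n):
    """Left-hand side: 1 + 2x + 3x^2 + ... + n x^(n-1)."""
    return sum(k * x ** (k - 1) for k in range(1, n + 1))


def rhs(x, n):
    """Right-hand side: (1 - (n+1)x^n + n x^(n+1)) / (1 - x)^2, valid for x != 1."""
    return (1 - (n + 1) * x**n + n * x ** (n + 1)) / (1 - x) ** 2


# Exact comparison, so there is no floating-point tolerance to worry about.
for x in (Fraction(1, 3), Fraction(-2, 5), Fraction(7, 2)):
    for n in range(1, 11):
        assert lhs(x, n) == rhs(x, n)

print(lhs(Fraction(1, 2), 3), rhs(Fraction(1, 2), 3))
```

Agreement on finitely many cases proves nothing, of course. It only suggests the formula has been copied correctly.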

Exercise 632. (9233 N2008/II/1.) (Answer on p. 1727.)

Use the formulae for cos (A + B) and cos (A − B), with A = 5x and B = x, to show that
2 sin 5x sin x can be written as cos px − cos qx, where p and q are positive integers. [2]
Hence find the exact value of

∫0 sin 5x sin x dx. [3]

Exercise 633. (9233 N2008/II/5.) (Answer on p. 1728.)

(i) Show that the derivative of the function

ln (1 + x) −
is never negative. [5]
(ii) Hence show that ln (1 + x) ≥ when x ≥ 0. [3]

Remark 139. In (i), it is claimed that “ln (1 + x) − ” is a function. But it is not. It
is simply an expression.
As repeatedly stressed in Ch. 10, to specify a function, we must state its domain, codo-
main, and mapping rule. It turns out that the failure to do so here has important
consequences (see footnote in answer).

Exercise 634. (9740 N2007/I/4.) (Answer on p. 1728.)

The current I in an electric circuit at time t satisfies the differential equation

$4\frac{dI}{dt} = 2 - 3I$.
(i) Find I in terms of t, given that I = 2 when t = 0. [6]
(ii) State what happens to the current in this circuit for large values of t. [1]


Exercise 635. (9740 N2007/I/11.) (Answer on p. 1729.)
A curve has parametric equations

$x = \cos^2 t$, $y = \sin^3 t$, for $0 \le t \le \pi$.
(i) Sketch the curve. [2]
(ii) The tangent to the curve at the point $(\cos^2\theta, \sin^3\theta)$, where $0 < \theta < \pi$, meets the $x$- and $y$-axes at Q and R respectively. The origin is denoted by O. Show that the area of △OQR is

$\frac{1}{12}\sin\theta\,(3\cos^2\theta + 2\sin^2\theta)^2$.

(iii) Show that the area under the curve for $0 \le t \le 0.5\pi$ is $2\int_0^{\pi/2} \cos t \sin^4 t\,dt$, and use the substitution $\sin t = u$ to find this area. [5]

Exercise 636. (9740 N2007/II/3.) (Answer on p. 1730.)

(i) By successively differentiating $(1 + x)^n$, find the Maclaurin series for $(1 + x)^n$, up to and including the term in $x^3$. [4]
(ii) Obtain the expansion of (4 − x) 2 (1 + 2x2 ) 2 up to and including the term in x3 .
(iii) Find the set of values of x for which the expansion in part (ii) is valid. [2]

Exercise 637. (9740 N2007/II/4.) (Answer on p. 1730.)

5 5
3π 3π
(i) Find the exact value of ∫ sin x dx. Hence find the exact value of ∫
cos2 x dx.[6]
0 0
(ii) The region R is bounded by the curve y = x2 sin x, the line x = π and the part of the
x-axis between 0 and π. Find
(a) the exact area of R, [5]
(b) the numerical value of the volume of revolution formed when R is rotated com-
pletely about the x-axis, giving your answer correct to 3 decimal places. [2]

Exercise 638. (9233 N2007/I/2.) (Answer on p. 1731.)

Find the first negative coefficient in the expansion of (4 + 3x) 2 in a series of ascending

powers of x, where ∣x∣ < . Give your answer as a fraction in its lowest terms. [3]
Exercise 639. (9233 N2007/I/3.) (Answer on p. 1731.)
The region bounded by the curve $y = \frac{1}{\sqrt{1 + 4x^2}}$, the $x$-axis and the lines $x = \frac{1}{2}$ and $x = \frac{1}{2}\sqrt{3}$ is rotated through 4 right angles about the $x$-axis to form a solid of revolution of volume $V$. Find the exact value of $V$, giving your answer in the form $k\pi^2$. [5]

Exercise 640. (9233 N2007/I/8.) (Answer on p. 1731.)

(i) Use the substitution $t = \sin u$ to show that

$\int \frac{(\sin^{-1} t)\cos\left[(\sin^{-1} t)^2\right]}{\sqrt{1 - t^2}}\,dt$

simplifies to $\int u\cos u^2\,du$. [3]

Remark 140. For (i), let us specify also that $u \in \left(-\frac{\pi}{2}, \frac{\pi}{2}\right)$ so that (a) $t = \sin u \in (-1, 1)$; (b) the integrand is well defined for all $t$; and (c) $\cos u \in (0, 1]$.

(sin−1 t) cos [(sin−1 t) ]

(ii) Hence evaluate ∫ √ dt. [4]
0 1 − t2

Exercise 641. (9233 N2007/I/10.) (Answer on p. 1731.)

(i) By sketching the graphs of y = cos x and y = sin x, or otherwise, solve the inequality

cos x > sin x

for 0 ≤ x ≤ 2π. [3]

(ii) Evaluate ∫ ∣cos x − sin x∣ dx. [5]

Exercise 642. (9233 N2007/I/11.) (Answer on p. 1732.)

Use partial fractions to evaluate

$\int_1^4 \frac{5x + 4}{(x - 5)(x^2 + 4)}\,dx$,

giving your answer in the form $-\ln a$, where $a$ is a positive integer. [9]

Exercise 643. (9233 N2007/I/13.) (Answer on p. 1732.)
In this question, the result $\frac{d}{dx}(\sec x) = \sec x \tan x$ may be quoted without proof.
Given that $y = \ln(\sec x)$, show that

(i) $\frac{d^3 y}{dx^3} = 2\frac{d^2 y}{dx^2}\frac{dy}{dx}$, [3]
(ii) the value of $\frac{d^4 y}{dx^4}$ when $x = 0$ is 2. [4]
(iii) Write down the Maclaurin series for $\ln(\sec x)$ up to and including the term in $x^4$. [2]
(iv) By substituting $x = \frac{1}{4}\pi$, show that $\ln 2 \approx \frac{\pi^2}{16} + \frac{\pi^4}{1536}$. [3]
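
To see how good the approximation in (iv) is, the minimal Python sketch below (not part of the exam question) compares it with the true value of ln 2. It takes the series in (iii) to be x²/2 + x⁴/12, which is my own working rather than something printed in the question, and uses the fact that ln sec(π/4) = ln √2 = (1/2) ln 2.

```python
import math


def ln_sec_series(x):
    """Assumed truncated Maclaurin series for ln(sec x): x^2/2 + x^4/12."""
    return x**2 / 2 + x**4 / 12


x = math.pi / 4
approx_ln2 = 2 * ln_sec_series(x)                 # since ln sec(pi/4) = (1/2) ln 2
stated = math.pi**2 / 16 + math.pi**4 / 1536      # the form given in part (iv)

print(approx_ln2, stated, math.log(2))
print("absolute error:", abs(approx_ln2 - math.log(2)))
```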

Exercise 644. (9233 N2007/I/14.) (Answer on p. 1733.)

A family of curves is given by $x^2 - y^2 = Ax$, where $A$ is an arbitrary constant.

Remark 141. Assume also that $x, y > 0$.[384]

(i) Show that

$\frac{dy}{dx} = \frac{x^2 + y^2}{2xy}$. [4]

A second, related family of curves is given by the differential equation

$\frac{dy}{dx} = -\frac{2xy}{x^2 + y^2}$.

(ii) By substituting $y = vx$, where $v$ is a function of $x$, show that, for the second family of curves,

$x\frac{dv}{dx} = -\frac{3v + v^3}{1 + v^2}$. [4]

(iii) Hence show that the second family of curves is given by

$3x^2 y + y^3 = C$,

where $C$ is an arbitrary constant. [4]

[384] In my opinion, there are two reasons why this simplifying assumption should have been included. First, if x or y equals zero, then $\frac{dy}{dx}$ is undefined. Second, if x or y is negative, then there is a little additional hassle we’ll have to deal with in (iii), in particular with regard to ln ∣⋅∣. My suspicion is that on this particular exam, these issues were simply glossed over.
Exercise 645. (9233 N2006/I/7.) (Answer on p. 1733.)
A hollow cone of semi-vertical angle 45° is held with its axis vertical and vertex downwards (see diagram). At the beginning of an experiment, it is filled with 390 cm³ of liquid. The liquid runs out through a small hole at the vertex at a constant rate of 2 cm³ s⁻¹. Find the rate at which the depth of the liquid is decreasing 3 minutes after the start of the experiment. [6]

Remark 142. A cone’s volume is Height × 1/3 Base Area.

Exercise 646. (9233 N2006/I/8.) (Answer on p. 1734.)

Find the coordinates of the points on the curve

$3x^2 + xy + y^2 = 33$

at which the tangent is parallel to the x-axis. [7]

Exercise 647. (9233 N2006/I/9.) (Answer on p. 1734.)

(i) Use the derivative of $\cos\theta$ to show that $\frac{d}{d\theta}(\sec\theta) = \sec\theta\tan\theta$. [2]

(ii) Use the substitution $x = \sec\theta - 1$ to find the exact value of

$\int_{\sqrt{2} - 1}^{1} \frac{1}{(x + 1)\sqrt{x^2 + 2x}}\,dx$. [6]

Remark 143. For (ii), assume also that θ ∈ [0, π/2) ∪ [π, 3π/2), so that tan θ ≥ 0.

Exercise 648. (9233 N2006/I/12.) (Answer on p. 1734.)

(i) Express

$f(x) = \frac{1 + x - 2x^2}{(2 - x)(1 + x^2)}$

in partial fractions. [4]

(ii) Expand $f(x)$ in ascending powers of $x$, up to and including the term in $x^2$. [5]
(iii) State the set of values of x for which the expansion is valid. [1]


Exercise 649. (9233 N2006/I/14.) (Answer on p. 1735.)
A curve has parametric equations $x = ct$, $y = \frac{c}{t}$, where $c$ is a positive constant. Three points $P\left(cp, \frac{c}{p}\right)$, $Q\left(cq, \frac{c}{q}\right)$, $R\left(cr, \frac{c}{r}\right)$ on the curve are shown in the diagram.

[Diagram: the curve with the points P, Q and R marked.]

(i) Prove that the gradient of QR is $-\frac{1}{qr}$. [2]
(ii) Given that the line through P perpendicular to QR meets the curve at $V\left(cv, \frac{c}{v}\right)$, find $v$ in terms of $p$, $q$ and $r$. [2]
(iii) Find the gradient of the normal at P. [3]
(iv) The normal at P meets the curve again at $S\left(cs, \frac{c}{s}\right)$. Show that $s = -\frac{1}{p^3}$.

(v) Given that angle QPR is 90°, prove that QR is parallel to the normal at P. [3]

Exercise 650. (9233 N2006/II/2.) (Answer on p. 1735.)

(i) Given that $z = \frac{x}{(x^2 + 32)^{1/2}}$, show that $\frac{dz}{dx} = \frac{32}{(x^2 + 32)^{3/2}}$. [3]
(ii) Find the exact value of the area of the region bounded by the curve y = ,
(x2 + 32)

the x-axis and the lines x = 2 and x = 7. [3]


114. Past-Year Questions for Part VI. Prob. and Stats.
Exercise 651. (9758 N2017/II/5.) (Answer on p. 1736.)
A bag contains 6 red counters and 3 yellow counters. In a game, Lee removes counters
at random from the bag, one at a time, until he has taken out 2 red counters. The total
number of counters Lee removes from the bag is denoted by T .
(i) Find P (T = t) for all possible values of t. [3]
(ii) Find E (T ) and Var (T ). [2]
Lee plays this game 15 times.
(iii) Find the probability that Lee has to take at least 4 counters out of the bag in at least 5 of his 15 games. [2]
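
The event T = t means exactly one red among the first t − 1 counters, followed by a red on draw t. A minimal Python sketch of that counting argument (not part of the exam question; the names `p_T`, `dist`, `mean` and `var` are my own) uses exact fractions so that the distribution can be checked to sum to 1:

```python
from fractions import Fraction
from math import comb

RED, YELLOW = 6, 3
TOTAL = RED + YELLOW


def p_T(t):
    """P(T = t): one red and t-2 yellows in the first t-1 draws, then a red."""
    yellows_first = t - 2
    if t < 2 or yellows_first > YELLOW:
        return Fraction(0)
    one_red_first = Fraction(comb(RED, 1) * comb(YELLOW, yellows_first),
                             comb(TOTAL, t - 1))
    red_on_draw_t = Fraction(RED - 1, TOTAL - (t - 1))
    return one_red_first * red_on_draw_t


dist = {t: p_T(t) for t in range(2, 6)}   # T can only be 2, 3, 4 or 5
mean = sum(t * p for t, p in dist.items())
var = sum(t * t * p for t, p in dist.items()) - mean**2

# Last part: X = number of games (out of 15) needing at least 4 counters.
p4 = dist[4] + dist[5]
p_at_least_5_games = 1 - sum(comb(15, k) * p4**k * (1 - p4) ** (15 - k)
                             for k in range(5))

print(dist, mean, var, float(p_at_least_5_games))
```

The last line treats the 15 games as independent trials with the same success probability, which is the binomial assumption implicit in the question.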

Exercise 652. (9758 N2017/II/6.) (Answer on p. 1736.)

A children’s game is played with 20 cards, consisting of 5 sets of 4 cards. Each set consists
of a father, mother, daughter and son from the same family. The family names are Red,
Blue, Green, Yellow and Orange. So, for example, the Red family cards are father Red,
mother Red, daughter Red and son Red.
The 20 cards are arranged in a row.
(i) In how many different ways can the 20 cards be arranged so that the 4 cards in each
family set are next to each other? [2]
(ii) In how many different ways can the cards be arranged so that all five father cards are
next to each other, all four Red family cards are next to each other and all four Blue
family cards are next to each other? [3]
The cards are now arranged at random in a circle.
(iii) Find the probability that no two father cards are next to each other. [4]

Exercise 653. (9758 N2017/II/7.) (Answer on p. 1736.)

The production manager of a food manufacturing company wishes to take a random sample
of a certain type of biscuit bar from the thousands produced one day at his factory, for
quality control purposes. He wishes to check that the mean mass of the bars is 32 grams,
as stated on the packets.
(i) State what it means for a sample to be random in this context. [1]
The masses, x grams, of a random sample of 40 biscuit bars are summarised as follows.

$n = 40$, $\sum(x - 32) = -7.7$, $\sum(x - 32)^2 = 11.05$.

(ii) Calculate unbiased estimates of the population mean and variance of the mass of
biscuit bars. [2]
(iii) Test, at the 1% level of significance, the claim that the mean mass of biscuit bars is
32 grams. You should state your hypotheses and define any symbols you use. [5]
(iv) Explain why there is no need for the production manager to know anything about the
population distribution of the masses of the biscuit bars. [2]
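
Parts (ii) and (iii) work from summary statistics coded about 32. The minimal Python sketch below (not part of the exam question) shows the arithmetic under two assumptions of mine: that the test is two-tailed (H0: μ = 32 against H1: μ ≠ 32), and that with n = 40 the sample mean is treated as approximately normal via the Central Limit Theorem. The variable names are my own.

```python
import math

n = 40
sum_d = -7.7     # sum of (x - 32)
sum_d2 = 11.05   # sum of (x - 32)^2

# Unbiased estimates: coding by a constant shifts the mean but not the variance.
mean_est = 32 + sum_d / n
var_est = (sum_d2 - sum_d**2 / n) / (n - 1)

# Test statistic under H0: mu = 32, using the estimated variance.
z = (mean_est - 32) / math.sqrt(var_est / n)

# Two-tailed p-value from the standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
p_value = math.erfc(abs(z) / math.sqrt(2))

print(mean_est, var_est, z, p_value)
print("reject H0 at 1%" if p_value < 0.01 else "do not reject H0 at 1%")
```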


Exercise 654. (9758 N2017/II/8.) (Answer on p. 1736.)
(a) Draw separate scatter diagrams, each with 8 points, all in the first quadrant, which represent the situation where the product moment correlation coefficient between variables x and y is
(i) −1,
(ii) 0,
(iii) between 0.5 and 0.9. [3]
(b) An investigation into the effect of a fertiliser on yields of corn found that the amount
of fertiliser applied, x, resulted in the average yields of corn, y, given below, where x
and y are measured in suitable units.

x 0 40 80 120 160 200

y 70 104 118 119 126 129
(i) Draw a scatter diagram for these values. State which of the following equations,
where a and b are positive constants, provides the most accurate model of the
relationship between x and y.
(A) y = ax2 + b. (B) y = 2 + b.

(C) y = a ln 2x + b. (D) y = a x + b. [2]
(ii) Using the model you chose in part (i), write down the equation for the relationship
between x and y, giving the numerical values of the coefficients. State the product
moment correlation coefficient for this model. [3]
(iii) Give two reasons why it would be reasonable to use your model to estimate the
value of y when x = 189. [2]


Exercise 655. (9758 N2017/II/9.) (Answer on p. 1736.)
On average 8% of a certain brand of kitchen lights are faulty. The lights are sold in boxes
of 12.
(i) State, in context, two assumptions needed for the number of faulty lights in a box to
be well modelled by a binomial distribution. [2]
Assume now that the number of faulty lights in a box has a binomial distribution.
(ii) Find the probability that a box of 12 of these kitchen lights contains at least 1 faulty
light. [1]
The boxes are packed into cartons. Each carton contains 20 boxes.
(iii) Find the probability that each box in one randomly selected carton contains at least
one faulty light. [1]
(iv) Find the probability that there are at least 20 faulty lights in a randomly selected
carton. [2]
(v) Explain why the answer to part (iv) is greater than the answer to part (iii). [1]
The manufacturer introduces a quick test to check if lights are faulty. Lights identified
as faulty are discarded. If a light is faulty there is a 95% chance that the quick test will
correctly identify the light as faulty. If the light is not faulty, there is a 6% chance that the
quick test will incorrectly identify the light as faulty.
(vi) Find the probability that a light identified as faulty by the quick test is not faulty.[3]
(vii) Find the probability that the quick test correctly identifies lights as faulty or not
faulty. [1]
(viii) Discuss briefly whether the quick test is worthwhile. [1]
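
The numerical parts of this exercise are routine but easy to mis-key. The minimal Python sketch below (not part of the exam question) sets them out, assuming independence between lights and between boxes as the binomial model requires. All variable names are my own.

```python
from math import comb

p = 0.08   # probability that a light is faulty

# (ii) At least one faulty light in a box of 12.
p_box = 1 - (1 - p) ** 12

# (iii) Every one of the 20 boxes in a carton has at least one faulty light.
p_every_box = p_box**20

# (iv) At least 20 faulty lights among the 20 * 12 = 240 lights in a carton.
n_carton = 240
p_at_least_20 = 1 - sum(comb(n_carton, k) * p**k * (1 - p) ** (n_carton - k)
                        for k in range(20))

# (vi) Bayes' theorem: P(not faulty | identified as faulty).
p_flag_if_faulty = 0.95
p_flag_if_ok = 0.06
p_flag = p * p_flag_if_faulty + (1 - p) * p_flag_if_ok
p_ok_given_flag = (1 - p) * p_flag_if_ok / p_flag

# (vii) The test is correct if it flags a faulty light or passes a good one.
p_correct = p * p_flag_if_faulty + (1 - p) * (1 - p_flag_if_ok)

print(p_box, p_every_box, p_at_least_20, p_ok_given_flag, p_correct)
```

The comparison in part (v) is visible directly in the output: the event in (iii) is one very particular way of getting at least 20 faulty lights in a carton.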

Exercise 656. (9758 N2017/II/10.) (Answer on p. 1736.)

A small component for a machine is made from two metal spheres joined by a short metal bar. The masses in grams of the spheres have the distribution $N(20, 0.5^2)$.
(i) Find the probability that the mass of a randomly selected sphere is more than 20.2
grams. [1]

In order to protect them from rusting, the spheres are given a coating which increases the
mass of each sphere by 10%.
(ii) Find the probability that the mass of a coated sphere is between 21.5 and 22.45 grams.
State the distribution you use and its parameters. [3]
(iii) The masses of the metal bars are normally distributed such that 60% of them have a
mass greater than 12.2 grams and 25% of them have a mass less than 12 grams. Find
the mean and standard deviation of the masses of metal bars. [4]
(iv) The probability that the total mass of a component, consisting of two randomly chosen
coated spheres and one randomly chosen bar, is more than k grams is 0.75. Find k,
stating the parameters of any distribution you use. [4]
Exercise 657. (9740 N2016/II/5.) (Answer on p. 1736.)
In a game of chance, a player has to spin a fair spinner. The spinner has 7 sections and an
arrow which has an equal chance of coming to rest over any of the 7 sections. The spinner
has 1 section labelled R, 2 sections labelled B and 4 sections labelled Y (see diagram).



The player then has to throw one of three fair six-sided dice, coloured red, blue or yellow.
If the spinner comes to rest over R the red die is thrown, if the spinner comes to rest over
B the blue die is thrown and if the spinner comes to rest over Y the yellow die is thrown.
The yellow die has one face with ∗ on it, the blue die has two faces with ∗ on it and the
red die has three faces with ∗ on it. The player wins the game if the die thrown comes to
rest with a face showing ∗ uppermost.
(i) Find the probability that a player wins a game. [2]
(ii) Given that a player wins a game, find the probability that the spinner came to rest
over B. [1]
(iii) Find the probability that a player wins 3 consecutive games, each time throwing a die
of a different colour. [2]
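
This is a two-stage experiment, so the law of total probability and Bayes' theorem do all the work. A minimal Python sketch (not part of the exam question; the dictionary names `spin` and `star` are my own) with exact fractions:

```python
from fractions import Fraction
from math import factorial, prod

# Stage 1: where the spinner stops (1 R, 2 B and 4 Y sections out of 7).
spin = {"R": Fraction(1, 7), "B": Fraction(2, 7), "Y": Fraction(4, 7)}
# Stage 2: chance that the die of that colour shows a star face.
star = {"R": Fraction(3, 6), "B": Fraction(2, 6), "Y": Fraction(1, 6)}

# (i) Law of total probability.
p_win = sum(spin[c] * star[c] for c in spin)

# (ii) Bayes' theorem: P(spinner on B | win).
p_B_given_win = spin["B"] * star["B"] / p_win

# (iii) Three wins, one with each colour, in any of the 3! possible orders.
p_three = factorial(3) * prod(spin[c] * star[c] for c in spin)

print(p_win, p_B_given_win, p_three)
```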


Exercise 658. (9740 N2016/II/6.) (Answer on p. 1736.) The number of employees of a
company, classified by department and gender, is shown below.

Production Development Administration Finance

Male 2345 1013 237 344
Female 867 679 591 523

(i) The directors wish to survey a sample of 100 of the employees. This sample is to be
a stratified sample, based on department and gender.
(a) How many males should be in the sample? [1]
(b) How many females from the Development department should be in the sample?

The Managing Director knows that, some years ago, the mean age of employees was 37
years. He believes that the mean age of employees now is less than 37 years.
(ii) State why the stratified sample from part (i) should not be used for a hypothesis test
of the Managing Director’s belief. [1]
The Company Secretary obtains a suitable sample of 80 employees in order to carry out a
hypothesis test of the Managing Director’s belief that the mean age of the employees now
is less than 37 years. You are given that the population variance of the ages is 140 years².
(iii) Write down appropriate hypotheses to test the Managing Director’s belief. You are
given that the result of the test, using a 5% significance level, is that the Managing
Director’s belief should be accepted. Determine the set of possible values of the mean
age of the sample of employees. [4]
(iv) You are given instead that the mean age of the sample of employees is 35.2 years, and
that the result of a test at the α% significance level is that the Managing Director’s
belief should not be accepted. Find the set of possible values of α. [3]

Exercise 659. (9740 N2016/II/7.) (Answer on p. 1736.)

The management board of a company consists of 6 men and 4 women. A chairperson, a
secretary and a treasurer are chosen from the 10 members of the board. Find the number
of ways the chairperson, the secretary and the treasurer can be chosen so that
(i) they are all women, [1]
(ii) at least one is a woman and at least one is a man. [3]
The 10 members of the board sit at random around a table. Find the probability that
(iii) the chairperson, the secretary and the treasurer sit in three adjacent places, [3]
(iv) the chairperson, the secretary and the treasurer are all separated from each other by
at least one other person. [3]


Exercise 660. (9740 N2016/II/8.) (Answer on p. 1736.)
A website about electric motors gives information about the percentage efficiency of motors
depending on their power, measured in horsepower. Xian has copied the following table for
a particular type of electric motor, but he has copied one of the efficiency values wrongly.

Power, x 1 1.5 2 3 5 7.5 10 20 30 40 50

Efficiency, y 72.5 82.5 84.0 87.4 87.5 88.5 89.5 90.2 91.0 91.7 92.4

(i) Plot a scatter diagram on graph paper for these values, labelling the axes, using a
scale of 2 cm to represent 10% efficiency on the y-axis and an appropriate scale for the
x-axis. On your diagram, circle the point that Xian has copied wrongly. [2]

For parts (ii), (iii) and (iv) of this question you should exclude the point for which Xian
has copied the efficiency value wrongly.
(ii) Explain from your scatter diagram why the relationship between x and y should not
be modelled by an equation of the form y = ax + b. [1]
(iii) Suppose that the relationship between x and y is modelled by an equation of the form
y = + d, where c and d are constants. State with a reason whether each of c and d
is positive or negative. [2]
(iv) Find the product moment correlation coefficient and the constants c and d for the
model in part (iii). [3]
(v) Use the model y = + d, with the values of c and d found in part (iv), to estimate the
efficiency value (y) that Xian has copied wrongly. Give two reasons why you would
expect this estimate to be reliable. [3]

Exercise 661. (9740 N2016/II/9.) (Answer on p. 1736.)

(a) The random variable X has distribution $N(15, a^2)$ and P(10 < X < 20) = 0.5. Find the value of a. [2]
(b) The random variable Y has distribution B(4, p) and P(Y = 1) + P(Y = 2) = 0.5. Show that $4p^4 - 12p^2 + 8p = 1$ and hence find the possible values of p. [4]
(c) On a television quiz show contestants have to select the right answer from one of
three alternatives. George decides to do this entirely by guesswork. Use a suitable
approximation, which should be stated, to find the probability that George guesses at
least 30 questions right out of 100. [4]


Exercise 662. (9740 N2016/II/10.) (Answer on p. 1736.)
Mia owns a field. Various types of weed are found in Mia’s field.
(i) State, in this context, two conditions that must be met for the numbers of a particular
type of weed in Mia’s field to be well modelled by a Poisson distribution. [2]
For the remainder of this question assume that these conditions are met.
There is an average of 1.5 dandelion plants (a type of weed) per m² in Mia’s field.
(ii) Find the probability that in 1 m² of Mia’s field there are at least 2 dandelion plants.
(iii) Find the probability that in 4 m² of Mia’s field there are at most 3 dandelion plants.
(iv) Use a suitable approximation, which should be stated, to find the probability that the number of dandelion plants in an 80 m² area of Mia’s field is between 110 and 140 inclusive. [4]
The distribution of daisies (another type of weed) per m² in Mia’s field can be modelled by Po(λ). The probability that the number of daisies in a 1 m² area of the field is less than or equal to 2 is the same as the probability that the number of daisies in a 2 m² area of the field is more than 2.
(v) Write down an equation in λ and solve it to find λ. [4]
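
Parts (ii), (iii) and (v) all rest on the Poisson cumulative probability and on the fact that the mean scales with the area. The minimal Python sketch below (not part of the exam question) computes the first two directly and solves the equation in (v) by bisection. The helper names and the bracketing interval [0.5, 5] are my own choices.

```python
import math


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Po(lam)."""
    return sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(k + 1))


# (ii) 1 m^2 has mean 1.5; (iii) 4 m^2 has mean 4 * 1.5 = 6.
p_at_least_2 = 1 - poisson_cdf(1, 1.5)
p_at_most_3 = poisson_cdf(3, 4 * 1.5)


def h(lam):
    """(v): P(X <= 2) with mean lam, minus P(Y > 2) with mean 2*lam."""
    return poisson_cdf(2, lam) - (1 - poisson_cdf(2, 2 * lam))


# h is decreasing in lam, with h(0.5) > 0 > h(5), so bisection finds the unique root.
lo, hi = 0.5, 5.0
while hi - lo > 1e-10:
    mid = (lo + hi) / 2
    if h(mid) > 0:
        lo = mid
    else:
        hi = mid
lam_daisy = (lo + hi) / 2

print(p_at_least_2, p_at_most_3, lam_daisy)
```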

Exercise 663. (9740 N2015/II/5.) (Answer on p. 1736.)

The manager of a busy supermarket wishes to conduct a survey of the opinions of customers
of different ages about different types of cola drink.
(i) Give a reason why the manager would not be able to use stratified sampling. [1]
(ii) Explain briefly how the manager could carry out a survey using quota sampling. [2]
(iii) Give one reason why quota sampling would not necessarily provide a sample which is
representative of the customers of the supermarket. [1]

Exercise 664. (9740 N2015/II/6.) (Answer on p. 1736.)

‘Droppers’ are small sweets that are made in a variety of colours. Droppers are sold in
packets and the colours of the sweets in a packet are independent of each other. On
average, 25% of Droppers are red.
(i) A small packet of Droppers contains 10 sweets. Find the probability that there are at
least 4 red sweets in a small packet. [2]
A large packet of Droppers contains 100 sweets.
(ii) Use a suitable approximation, which should be stated, to find the probability that a
large packet contains at least 30 red sweets. [3]
(iii) Yip buys 15 large packets of Droppers. Find the probability that no more than 3 of
these packets contain at least 30 red sweets. [2]
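
A rough numerical check for this exercise might look like the sketch below. Note the assumption: it uses the exact binomial distribution throughout, whereas part (ii) asks for a stated approximation, so expect your approximate answer to differ slightly. The function names binom_pmf and binom_tail are hypothetical helpers.

```python
import math

def binom_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def binom_tail(k, n, p):
    """P(X >= k) for X ~ B(n, p)."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))

# Part (i): at least 4 red sweets among 10, with P(red) = 0.25.
p_small = binom_tail(4, 10, 0.25)

# Part (ii), exact value for comparison with your approximation.
p_large_exact = binom_tail(30, 100, 0.25)

# Part (iii): at most 3 of 15 large packets have at least 30 red sweets.
p_yip = 1 - binom_tail(4, 15, p_large_exact)

print(round(p_small, 4), round(p_large_exact, 4), round(p_yip, 4))
```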
Exercise 665. (9740 N2015/II/7.) (Answer on p. 1737.)
The average number of errors per page for a certain daily newspaper is being investigated.
(i) State, in context, two assumptions that need to be made for the number of errors per
page to be well modelled by a Poisson distribution. [2]
Assume that the number of errors per page has the distribution Po(1.3).
(ii) Find the probability that, on one day, there are more than 10 errors altogether on the
first 6 pages. [3]
(iii) The probability that there are fewer than 2 errors altogether on the first n pages of
the newspaper is less than 0.05. Write down an inequality in terms of n to represent
this information, and hence find the least possible value of n. [2]

Exercise 666. (9740 N2015/II/8.) (Answer on p. 1737.)

A market stall sells pineapples which have masses that are normally distributed. The stall
owner claims the mean mass of the pineapples is at least 0.9 kg. Nur buys a random selection
of 8 pineapples from the stall. The 8 pineapples have masses, in kg, as follows.

0.80 1.000 0.82 0.85 0.93 0.96 0.81 0.89

(i) Find unbiased estimates of the population mean and variance of the mass of pineapples.
(ii) Test the stall owner’s claim at the 10% level of significance. [7]

Exercise 667. (9740 N2015/II/9.) (Answer on p. 1737.)

For events A, B and C it is given that P (A) = 0.45, P (B) = 0.4, P (C) = 0.3 and
P (A ∩ B ∩ C) = 0.1. It is also given that events A and B are independent, and that
events A and C are independent.
(i) Find P (B∣A). [1]
(ii) Given also that events B and C are independent, find P (A′ ∩ B ′ ∩ C ′ ). [3]
(iii) Given instead that events B and C are not independent, find the greatest and least
possible values of P (A′ ∩ B ′ ∩ C ′ ). [4]
Exercise 668. (9740 N2015/II/10.) (Answer on p. 1738.)
In an experiment the following information was gathered about air pressure P , measured
in inches of mercury, at different heights above sea-level h, measured in feet.

h    2000  5000  10 000  15 000  20 000  25 000  30 000  35 000  40 000  45 000
P    27.8  24.9  20.6    16.9    13.8    11.1    8.89    7.04    5.52    4.28

(i) Draw a scatter diagram for these values, labelling the axes. [1]
(ii) Find, correct to 4 decimal places, the product moment correlation coefficient between
(a) h and P ,
(b) ln h and P ,

(c) h and P . [3]
(iii) Using the most appropriate case from part (ii), find the equation which best models
air pressure at different heights. [3]
(iv) Given that 1 metre = 3.28 feet, re-write your equation from part (iii) so that it can
be used to estimate the air pressure when the height is given in metres. [2]

Exercise 669. (9740 N2015/II/11.) (Answer on p. 1739.)

This question is about arrangements of all eight letters in the word CABBAGES.
(i) Find the number of different arrangements of the eight letters that can be made. [2]
(ii) Write down the number of these arrangements in which the letters are not in
alphabetical order. [1]
(iii) Find the number of different arrangements that can be made with both the A’s
together and both the B’s together. [2]
(iv) Find the number of different arrangements that can be made with no two adjacent
letters the same. [4]
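
Counting questions like this one are small enough to check by brute force. The sketch below simply enumerates every distinct arrangement of CABBAGES and counts those satisfying each condition. It is a check on your reasoning, not a substitute for it, and the variable names are my own.

```python
from itertools import permutations

# A set keeps only distinct arrangements (the repeated A's and B's
# would otherwise be counted more than once).
arrangements = {"".join(p) for p in permutations("CABBAGES")}

total = len(arrangements)                                   # part (i)
not_alphabetical = total - 1                                # part (ii): only AABBCEGS is in order
both_pairs_together = sum(
    1 for w in arrangements if "AA" in w and "BB" in w)     # part (iii)
no_adjacent_same = sum(
    1 for w in arrangements if "AA" not in w and "BB" not in w)  # part (iv)

print(total, not_alphabetical, both_pairs_together, no_adjacent_same)
```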

Exercise 670. (9740 N2015/II/12.) (Answer on p. 1739.)

In this question you should state clearly the values of the parameters of any normal
distribution you use.
The masses in grams of apples have the distribution N (300, 20²) and the masses in grams
of pears have the distribution N (200, 15²). A certain recipe requires 5 apples and 8 pears.
(i) Find the probability that the total mass of 5 randomly chosen apples is more than
1600 grams. [2]
(ii) Find the probability that the total mass of 5 randomly chosen apples is more than the
total mass of 8 randomly chosen pears. [3]
The recipe requires the apples and pears to be prepared by peeling them and removing the
cores. This process reduces the mass of each apple by 15% and the mass of each pear by
(iii) Find the probability that the total mass, after preparation, of 5 randomly chosen
apples and 8 randomly chosen pears is less than 2750 grams. [4]
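
Parts (i) and (ii) rest on the fact that a sum (or difference) of independent normal random variables is again normal, with the means added (or subtracted) and the variances added. Here is a minimal sketch of that calculation, assuming all the fruit masses are independent. The names apples_total and difference are my own.

```python
from statistics import NormalDist

# Total mass of 5 apples: mean 5 * 300, variance 5 * 20^2.
apples_total = NormalDist(mu=5 * 300, sigma=(5 * 20**2) ** 0.5)
p_i = 1 - apples_total.cdf(1600)            # part (i)

# (5 apples) - (8 pears): means subtract, variances add.
difference = NormalDist(mu=5 * 300 - 8 * 200,
                        sigma=(5 * 20**2 + 8 * 15**2) ** 0.5)
p_ii = 1 - difference.cdf(0)                # part (ii)

print(round(p_i, 4), round(p_ii, 4))
```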
Exercise 671. (9740 N2014/II/5.) (Answer on p. 1740.)
An Internet retailer has compiled a list of 10 000 regular customers and wishes to carry out
a survey of customer opinions involving 5% of its customers.
(i) Describe how the marketing manager could choose customers for this survey using
systematic sampling. [2]
(ii) Give one advantage and one disadvantage of systematic sampling in this context. [2]

Exercise 672. (9740 N2014/II/6.) (Answer on p. 1740.)

A team in a particular sport consists of 1 goalkeeper, 4 defenders, 2 midfielders and 4
attackers. A certain club has 3 goalkeepers, 8 defenders, 5 midfielders and 6 attackers.
(i) How many different teams can be formed by the club? [2]
One of the midfielders in the club is the brother of one of the attackers in the club.
(ii) How many different teams can be formed which include exactly one of the two
brothers? [3]
The two brothers leave the club. The club manager decides that one of the remaining
midfielders can play either as a midfielder or as a defender.
(iii) How many different teams can now be formed by the club? [3]

Exercise 673. (9740 N2014/II/7.) (Answer on p. 1740.)

Yan is carrying out an experiment with a fair 6-sided die and a biased 6-sided die, each
numbered from 1 to 6.
(i) Yan rolls the fair die 10 times. Find the probability that it shows a 6 exactly thrice.
(ii) Yan now rolls the fair die 60 times. Use a suitable approximate distribution, which
should be stated, to find the probability that the die shows a 6 between 5 and 8 times,
inclusive. [3]
The probability that the biased die shows a 6 is .
(iii) Yan rolls the biased die 60 times. Use a suitable approximate distribution, which
should be stated, to find the probability that the biased die shows a 6 between 5 and
8 times, inclusive. [3]
Exercise 674. (9740 N2014/II/8.) (Answer on p. 1741.)
(a) Sketch a scatter diagram that might be expected when x and y are related approximately
by y = px² + t in each of the cases (i) and (ii) below. In each case your diagram should
include 6 points, approximately equally spaced with respect to x, and with all x-values
(i) p and t are both positive.
(ii) p is negative and t is positive. [2]
(b) The age in months (m) and prices in dollars (P ) of a random sample of ten used cars
of a certain model are given in the table.

m    11       20       28      36      40      47      58      62      68      75
P    112 800  102 600  76 500  72 000  72 000  69 000  65 800  57 000  50 600  47 600

It is thought that the price after m months can be modelled by one of the formulae

P = am + b, P = c ln m + d,

where a, b, c and d are constants.

(i) Find, correct to 4 decimal places, the value of the product moment correlation
coefficient between
(A) m and P ; and
(B) ln m and P . [2]
(ii) Explain which of P = am + b and P = c ln m + d is the better model and find the
equation of a suitable regression line for this model. [3]
(iii) Use the equation of your regression line to estimate the price of a car that is 50
months old. [1]

Exercise 675. (9740 N2014/II/9.) (Answer on p. 1741.)

The number of minutes that the 0815 bus arrives late at my local bus stop has a normal
distribution; the mean number of minutes the bus is late has been 4.3. A new company
takes over the service, claiming that punctuality will be improved. After the new company
takes over, a random sample of 10 days is taken and the number of minutes that the bus
is late is recorded. The sample mean is t̄ minutes and the sample variance is k² minutes².
A test is to be carried out at the 10% level of significance to determine whether the mean
number of minutes late has been reduced.
(i) State appropriate hypotheses for the test, defining any symbols that you use. [2]
(ii) Given that k² = 3.2, find the set of values of t̄ for which the result of the test would
be that the null hypothesis is not rejected. [4]
(iii) Given instead that t̄ = 4.0, find the set of values of k² for which the result of the test
would be to reject the null hypothesis. [3]

Exercise 676. (9740 N2014/II/10.) (Answer on p. 1742.)
A game has three sets of ten symbols, and one symbol from each set is randomly chosen to
be displayed on each turn. The symbols are as follows.

Set 1 + + + + × × × ◯ ◯ ⋆
Set 2 + + + × ◯ ◯ ◯ ◯ ⋆ ⋆
Set 3 + + × × × × ◯ ◯ ◯ ⋆

For example, if a + symbol is chosen from set 1, a ◯ symbol is chosen from set 2 and a ⋆
symbol is chosen from set 3, the display would be +◯⋆.
(i) Find the probability that, on one turn,
(a) ⋆ ⋆ ⋆ is displayed, [1]
(b) at least one ⋆ symbol is displayed, [2]
(c) two × symbols and one + symbol are displayed in any order. [3]
(ii) Given that exactly one of the symbols displayed is ⋆, find the probability that the
other two symbols are + and ◯. [4]
Exercise 677. (9740 N2014/II/11.) (Answer on p. 1742.)
An art dealer sells both original paintings and prints. (Prints are copies of paintings.) It
is to be assumed that his sales of originals per week can be modelled by the distribution
Po(2) and his sales of prints per week can be modelled by the independent distribution
(i) Find the probability that, in a randomly chosen week,
(a) the art dealer sells more than 8 prints, [2]
(b) the art dealer sells a total of fewer than 15 prints and originals combined. [2]
(ii) The probability that the art dealer sells fewer than 3 originals in a period of n weeks
is less than 0.01. Express this information as an inequality in n, and hence find the
smallest possible integer value of n. [5]
(iii) Using a suitable approximation, which should be stated, find the probability that the
art dealer sells more than 550 prints in a year (52 weeks). [3]
(iv) Give two reasons in context why the assumptions made at the start of this question
may not be valid. [2]
Exercise 678. (9740 N2013/II/5.) (Answer on p. 1743.)
A large multi-national company has 100 000 employees based in several different countries.
To celebrate the 90th anniversary of the founding of the company, the Chief Executive
wishes to invite a representative sample of 90 employees to a party, to be held at the
company’s Headquarters in Singapore.
(i) Explain how random sampling could be carried out to choose the 90 employees.
Explain briefly why this may not provide the representative sample that the Chief
Executive wants. [2]
(ii) Name a more appropriate sampling method, and explain how it can be carried out to
provide the representative sample that the Chief Executive wants. [2]

Exercise 679. (9740 N2013/II/6.) (Answer on p. 1743.)
The continuous random variable Y has the distribution N (µ, σ²). It is known that P (Y < 2a) =
0.95 and P (Y < a) = 0.25. Express µ in the form ka, where k is a constant to be determined.

Exercise 680. (9740 N2013/II/7.) (Answer on p. 1743.)

On average one in 20 packets of a breakfast cereal contains a free gift. Jack buys n packets
from a supermarket. The number of these packets containing a free gift is the random
variable F .

(i) State, in context, two assumptions needed for F to be well modelled by a binomial
distribution. [2]
Assume now that F has a binomial distribution.
(ii) Given that n = 20, find P (F = 1). [1]
(iii) Given instead that n = 60, use a suitable approximation to find the probability that
F is at least 5. State the parameter(s) of the distribution that you use. [3]

Exercise 681. (9740 N2013/II/8.) (Answer on p. 1743.)

For events A and B it is given that P (A) = 0.7, P (B∣A′ ) = 0.8 and P (A∣B ′ ) = 0.88. Find

(i) P (B ∩ A′ ), [1]
(ii) P (A′ ∩ B ′ ), [2]
(iii) P (A ∩ B). [3]

Exercise 682. (9740 N2013/II/9.) (Answer on p. 1743.)

A motoring magazine editor believes that the figures quoted by car manufacturers for
distances travelled per litre of fuel are too high. He carries out a survey into this by asking
for information from readers. For a certain model of car, 8 readers reply with the following
data, measured in km per litre.

14.0 12.5 11.0 11.0 12.5 12.6 15.6 13.2

(i) Calculate unbiased estimates of the population mean and variance. [2]
The manufacturer claims that this model of car will travel 13.8 km per litre on average. It
is given that the distances travelled per litre for cars of this model are normally distributed.
(ii) Stating a necessary assumption, carry out a t-test of the magazine editor’s belief at
the 5% significance level. [5]

Exercise 683. (9740 N2013/II/10.) (Answer on p. 1744.)
(i) Sketch a scatter diagram that might be expected when x and y are related approximately
as given in each of the cases (A), (B) and (C) below. In each case your diagram
should include 6 points, approximately equally spaced with respect to x, and with all
x- and y-values positive. The letters a, b, c, d, e and f represent constants.
(A) y = a + bx², where a is positive and b is negative,
(B) y = c + d ln x, where c is positive and d is negative,
(C) y = e + , where e is positive and f is negative.
A motoring website gives the following information about the distance travelled, y km, by
a certain type of car at different speeds, x km h⁻¹, on a fixed amount of fuel.

Speed, x 88 96 104 112 120 128

Distance, y 148 147 144 138 126 107

(ii) Draw the scatter diagram for these values, labelling the axes. [1]
(iii) Explain which of the three cases in part (i) is the most appropriate for modelling
these values, and calculate the product moment correlation coefficient for this case. [2]
(iv) It is required to estimate the distance travelled at a speed of 110 km h⁻¹. Use the case
that you identified in part (iii) to find the equation of a suitable regression line, and
use your equation to find the required estimate. [3]

Exercise 684. (9740 N2013/II/11.) (Answer on p. 1745.)

A machine is used to generate codes consisting of three letters followed by two digits. Each
of the three letters generated is equally likely to be any of the twenty-six letters of the
alphabet A–Z. Each of the two digits generated is equally likely to be any of the nine digits
1–9. The digit 0 is not used. Find the probability that a randomly chosen code has
(i) three different letters and two different digits, [2]
(ii) the second digit higher than the first digit, [2]
(iii) exactly two letters the same or two digits the same, but not both, [4]
(iv) exactly one vowel (A, E, I, O or U) and exactly one even digit. [4]

Exercise 685. (9740 N2013/II/12.) (Answer on p. 1745.)
A company has two departments and each department records the number of employees
absent through illness each day. Over a long period of time it is found that the average
numbers absent on a day are 1.2 for the Administration Department and 2.7 for the
Manufacturing Department.
(i) State, in this context, two conditions that must be met for the numbers of absences to
be well modelled by Poisson distributions. Explain why each of your two conditions
may not be met. [3]
For the remainder of this question assume that these conditions are met. You should assume
also that absences in the two departments are independent of each other.
(ii) Find the smallest number of days for which the probability that no employee is absent
through illness from the Administration Department is less than 0.01. [2]
Each employee absent on a day represents one ‘day of absence’. So, one employee absent
for 3 days contributes 3 days of absence, and 5 employees absent on 1 day contribute 5 days
of absence.
(iii) Find the probability that, in a 5-day period, the total number of days of absence in
the two departments is more than 20. [3]
(iv) Use a suitable approximation, which should be stated together with its parameter(s),
to find the probability that, in a 60-day period, the total number of days of absence
in the two departments is between 200 and 250 inclusive. [4]

Exercise 686. (9740 N2012/II/5.) (Answer on p. 1746.)

The probability that a hospital patient has a particular disease is 0.001. A test for the
disease has probability p of giving a positive result when the patient has the disease, and
equal probability p of giving a negative result when the patient does not have the disease.
A patient is given the test.
(i) Given that p = 0.995, find the probability that
(a) the result of the test is positive, [2]
(b) the patient has the disease given that the result of the test is positive. [2]
(ii) It is given instead that there is a probability of 0.75 that the patient has the disease
given that the result of the test is positive. Find the value of p, giving your answer
correct to 6 decimal places. [3]
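
This exercise is a direct application of the law of total probability and Bayes' theorem, and the arithmetic is easy to get wrong by a factor of ten. The following is a sketch for checking your working. The function names are my own, and the closed form used for part (ii) comes from rearranging the Bayes formula by hand, so do verify that step yourself.

```python
prior = 0.001  # P(patient has the disease)

def p_positive(p):
    """P(test is positive): true positives plus false positives."""
    return prior * p + (1 - prior) * (1 - p)

def p_disease_given_positive(p):
    """Bayes' theorem: P(disease | positive)."""
    return prior * p / p_positive(p)

# Part (i), with p = 0.995.
part_a = p_positive(0.995)
part_b = p_disease_given_positive(0.995)

# Part (ii): solve prior*p / (prior*p + (1-prior)*(1-p)) = 0.75 for p.
target = 0.75
p_solved = target * (1 - prior) / (prior * (1 - target) + target * (1 - prior))

print(round(part_a, 5), round(part_b, 5), round(p_solved, 6))
```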

Exercise 687. (9740 N2012/II/6.) (Answer on p. 1746.)
On a remote island a zoologist measures the tail lengths of a random sample of 20 squirrels.
In a species of squirrel known to her, the tail lengths have mean 14.0 cm. She carries out
a test, at the 5% significance level, of whether squirrels on the island have the same mean
tail length as the species known to her. She assumes that the tail lengths of squirrels on
the island are normally distributed with standard deviation 3.8 cm.
(i) State appropriate hypotheses for the test. [1]
The sample mean tail length is denoted by x̄ cm.
(ii) Use an algebraic method to calculate the set of values of x̄ for which the null hypothesis
would not be rejected. (Answers obtained by trial and improvement from a calculator
will obtain no marks.) [3]
(iii) State the conclusion of the test in the case where x̄ = 15.8. [2]

Exercise 688. (9740 N2012/II/7.) (Answer on p. 1746.)

A group of fifteen people consists of one pair of sisters, one set of three brothers and ten
other people. The fifteen people are arranged randomly in a line.
(i) Find the probability that the sisters are next to each other. [2]
(ii) Find the probability that the brothers are not all next to each other. [2]
(iii) Find the probability that the sisters are next to each other and the brothers are all
next to each other. [2]
(iv) Find the probability that either the sisters are next to each other or the brothers are
all next to each other or both. [2]
Instead the fifteen people are arranged randomly in a circle.
(v) Find the probability that the sisters are next to each other. [1]

Exercise 689. (9740 N2012/II/8.) (Answer on p. 1747.)
Amy is revising for a mathematics examination and takes a different practice paper each
week. Her marks, y% in week x, are as follows.

Week x 1 2 3 4 5 6
Percentage mark y 38 63 67 75 71 82

(i) Draw a scatter diagram showing these marks. [1]

(ii) Suggest a possible reason why one of the marks does not seem to follow the trend. [1]
(iii) It is desired to predict Amy’s marks on future papers. Explain why, in this context,
neither a linear nor a quadratic model is likely to be appropriate. [2]
It is decided to fit a model of the form ln(L − y) = a + bx, where L is a suitable constant.
The product moment correlation coefficient between x and ln(L − y) is denoted by r. The
following table gives values of r for some possible values of L.

L        91        92            93
r     (blank)   −0.929 944   −0.929 918

(iv) Calculate the value of r for L = 91, giving your answer correct to 6 decimal places. [1]
(v) Use the table and your answer to part (iv) to suggest with a reason which of 91, 92
or 93 is the most appropriate value for L. [1]
(vi) Using the value for L, calculate the values of a and b, and use them to predict the
week in which Amy will obtain her first mark of at least 90%. [4]
(vii) Give an interpretation, in context, of the value of L. [1]

Exercise 690. (9740 N2012/II/9.) (Answer on p. 1748.)

In an opinion poll before an election, a sample of 30 voters is obtained.
(i) The number of voters in the sample who support the Alliance Party is denoted by
A. State, in context, what must be assumed for A to be well modelled by a binomial
distribution. [2]
Assume now that A has the distribution B (30, p).
(ii) Given that p = 0.15, find P (A = 3 or 4). [2]
(iii) Given instead that p = 0.55, explain whether it is possible to approximate the
distribution of A with
(a) a normal distribution,
(b) a Poisson distribution. [3]

(iv) For an unknown value of p it is given that P (A = 15) = 0.06864 correct to 5 decimal
places. Show that p satisfies an equation of the form p(1−p) = k, where k is a constant
to be determined. Hence find the value of p to a suitable degree of accuracy, given
that p < 0.5. [5]
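
For part (iv), since P(A = 15) = C(30, 15) p¹⁵(1 − p)¹⁵, the constant k is a fifteenth root, and p then solves a quadratic. A minimal sketch of that computation follows, assuming you take the smaller root because p < 0.5. The variable names are my own.

```python
import math

# P(A = 15) = C(30, 15) * [p(1 - p)]^15 = 0.06864, so
# p(1 - p) = k with k = (0.06864 / C(30, 15))^(1/15).
k = (0.06864 / math.comb(30, 15)) ** (1 / 15)

# p^2 - p + k = 0; take the smaller root since p < 0.5.
p = (1 - math.sqrt(1 - 4 * k)) / 2

print(round(k, 4), round(p, 2))
```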

Exercise 691. (9740 N2012/II/10.) (Answer on p. 1748.)
Gold coins are found scattered throughout an archaeological site.
(i) State two conditions needed for the number of gold coins found in a randomly chosen
region of area 1 square metre to be well modelled by a Poisson distribution. [2]
Assume that the number of gold coins in 1 square metre has the distribution Po(0.8).
(ii) Find the probability that in 1 square metre there are at least 3 gold coins. [1]
(iii) It is given that the probability that 1 gold coin is found in x square metres is 0.2.
Write down an equation for x, and solve it numerically given that x < 1. [2]
(iv) Use a suitable approximation to find the probability that in 100 square metres there
are at least 90 gold coins. State the parameter(s) of the distribution that you use. [3]
Pottery shards are also found scattered throughout the site. The number of pottery shards
in 1 square metre is an independent random variable with the distribution Po(3). Use
suitable approximations, whose parameters should be stated, to find
(v) the probability that in 50 square metres the total number of gold coins and pottery
shards is at least 200, [4]
(vi) the probability that in 50 square metres there are at least 3 times as many pottery
shards as gold coins. [3]

Exercise 692. (9740 N2011/II/5.) (Answer on p. 1749.)

The continuous random variable X has the distribution N (µ, σ²). It is known that
P (X < 40.0) = 0.05 and P (X < 70.0) = 0.975. Calculate the values of µ and σ. [4]

Exercise 693. (9740 N2011/II/6.) (Answer on p. 1749.)

It is desired to interview residents of a city suburb about the types of shop to be opened
in a new shopping mall. In particular it is necessary to interview a representative range of
(i) Explain how a quota sample might be carried out in this context. [2]
(ii) Explain a disadvantage of quota sampling in the context of your answer to part (i).
(iii) State the name of a method of sampling that would not have this disadvantage, and
explain whether it would be realistic to use this method in this context. [2]
Exercise 694. (9740 N2011/II/7.) (Answer on p. 1749.)
When I try to contact (by telephone) any of my friends in the evening, I know that on
average the probability that I succeed is 0.7. On one evening I attempt to contact a fixed
number, n, of different friends. If I do not succeed with a particular friend, I do not
attempt to contact that friend again that evening. The number of friends whom I succeed
in contacting is the random variable R.
(i) State, in the context of this question, two assumptions needed to model R by a
binomial distribution. [2]
(ii) Explain why one of the assumptions stated in part (i) may not hold in this context.
Assume now that these assumptions do in fact hold.
(iii) Given that n = 8, find the probability that R is at least 6. [1]
(iv) Given that n = 40, use an appropriate approximation to find P (R < 25). State the
parameters of the distribution you use. [4]

Exercise 695. (9740 N2011/II/8.) (Answer on p. 1750.)

(i) Sketch a scatter diagram that might be expected for the case when x and y are related
approximately by y = a + bx², where a is positive and b is negative. Your diagram should
include 5 points, approximately equally spaced with respect to x, and with all x- and
y-values positive. [1]

The table gives the values of seven observations of bivariate data, x and y.

x 2.0 2.5 3.0 3.5 4.0 4.5 5.0

y 18.8 16.9 14.5 11.7 8.6 4.9 0.8

(ii) Calculate the value of the product moment correlation coefficient, and explain why
its value does not necessarily mean that the best model for the relationship between
x and y is y = c + dx. [2]
(iii) Explain how to use the values obtained by calculating product moment correlation
coefficients to decide, for this data, whether y = a + bx² or y = c + dx is the better
model. [1]
(iv) It is desired to use the data in the table to estimate the value of y for which x = 3.2.
Find the equation of the least-squares regression line of y on x². Use your equation
to calculate the desired estimate. [3]

Exercise 696. (9740 N2011/II/9.) (Answer on p. 1750.)
Camera lenses are made by two companies, A and B. 60% of all lenses are made by A and
the remaining 40% by B. 5% of the lenses made by A are faulty. 7% of the lenses made by
B are faulty. (Author’s remark: Assume that there are infinitely many lenses.)
(i) One lens is selected at random. Find the probability that
(a) it is faulty, [2]
(b) it was made by A, given that it is faulty. [1]
(ii) Two lenses are selected at random. Find the probability that
(a) exactly one of them is faulty, [2]
(b) both were made by A, given that exactly one is faulty. [3]

Exercise 697. (9740 N2011/II/10.) (Answer on p. 1750.)

In a factory, the time in minutes for an employee to install an electronic component is a
normally distributed continuous random variable T . The standard deviation of T is 5.0
and under ordinary conditions the expected value of T is 38.0. After background music is
introduced into the factory, a sample of n components is taken and the mean time taken
for randomly chosen employees to install them is found to be t̄ minutes. A test is carried
out, at the 5% significance level, to determine whether the mean time taken to install a
component has been reduced.
(i) State appropriate hypotheses for the test, defining any symbols you use. [2]
(ii) Given that n = 50, state the set of values of t̄ for which the result of the test would be
to reject the null hypothesis. [3]
(iii) It is given instead that t̄ = 37.1 and the result of the test is that the null hypothesis is
not rejected. Obtain an inequality involving n, and hence find the set of values that
n can take. [4]

Exercise 698. (9740 N2011/II/11.) (Answer on p. 1751.)

A committee of 10 people is chosen at random from a group consisting of 18 women and
12 men. The number of women on the committee is denoted by R.
(i) Find the probability that R = 4. [3]
(ii) The most probable number of women on the committee is denoted by r. By using the
fact that P (R = r) > P (R = r + 1), show that r satisfies the inequality

(r + 1)!(17 − r)!(9 − r)!(r + 3)! > r!(18 − r)!(10 − r)!(r + 2)!

and use this inequality to find the value of r. [5]
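
The number of women on the committee follows what is usually called a hypergeometric distribution, so both parts can be checked by direct counting. Here is a minimal sketch. The function name p_women and its parameter names are my own, and finding the mode by trying every value of r is only a check, not the algebraic argument the question asks for.

```python
import math

def p_women(r, women=18, men=12, size=10):
    """P(R = r): choose r of the women and (size - r) of the men."""
    return (math.comb(women, r) * math.comb(men, size - r)
            / math.comb(women + men, size))

# Part (i).
p_four = p_women(4)

# Part (ii): the most probable number of women, found by trying r = 0, ..., 10.
mode = max(range(0, 11), key=p_women)

print(round(p_four, 4), mode)
```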

Exercise 699. (9740 N2011/II/12.) (Answer on p. 1752.)
The number of people joining an airport check-in queue in a period of 1 minute is a random
variable with the distribution Po(1.2).
(i) Find the probability that, in a period of 4 minutes, at least 8 people join the queue.
(ii) The probability that no more than 1 person joins the queue in a period of t seconds
is 0.7. Find an equation for t. Hence find the value of t, giving your answer correct
to the nearest whole number. [4]
(iii) The number of people leaving the same queue in a period of 1 minute is a random
variable with the distribution Po(1.8). At 0930 on a certain morning there are 35
people in the queue. Use appropriate approximations to find the probability that
by 0945 there are at least 24 people in the queue, stating the parameters of any
distributions that you use. (You may assume that the queue does not become empty
during this period.) [5]
(iv) Explain why a Poisson model would probably not be valid if applied to a time period
of several hours. [1]
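
The equation in part (ii) has no closed-form solution, so it must be solved numerically (on a graphing calculator in the exam). As an illustration, here is a sketch using bisection, assuming the mean number of arrivals in t seconds is 1.2t/60. The function name p_at_most_one and the search interval [0, 600] are my own choices.

```python
import math

def p_at_most_one(t_seconds):
    """P(at most 1 arrival in t seconds), arrivals ~ Po(1.2 * t / 60)."""
    lam = 1.2 * t_seconds / 60
    return math.exp(-lam) * (1 + lam)

# p_at_most_one decreases from 1 as t grows, so bisection on [0, 600] works.
lo, hi = 0.0, 600.0
for _ in range(60):
    mid = (lo + hi) / 2
    if p_at_most_one(mid) > 0.7:
        lo = mid      # probability still too high: need a longer period
    else:
        hi = mid
t = (lo + hi) / 2

print(round(t))
```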

Exercise 700. (9740 N2010/II/5.) (Answer on p. 1752.)

At an international athletics competition, it is desired to sample 1% of the spectators to
find their opinions of the catering facilities.
(i) Give a reason why it would be difficult to use a stratified sample. [1]
(ii) Explain how a systematic sample could be carried out. [2]

Exercise 701. (9740 N2010/II/6.) (Answer on p. 1752.)

The time required by an employee to complete a task is a normally distributed random
variable. Over a long period it is known that the mean time required is 42.0 minutes.
Background music is introduced in the workplace, and afterwards the time required, t
minutes, is measured for a random sample of 11 employees. The results are summarised as

n = 11, ∑ t = 454.3, ∑ t² = 18 779.43.

(i) Find unbiased estimates of the population mean and variance.
(ii) Test, at the 10% significance level, whether there has been a change in the mean time
required by an employee to complete the task. [7]
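
When the data come as summary totals like this, the unbiased estimates follow from t̄ = ∑t / n and s² = (∑t² − (∑t)²/n) / (n − 1). The sketch below applies these formulae and also computes the test statistic for part (ii). It stops short of the p-value, which you'd get from your calculator's t-test. The variable names are my own.

```python
n = 11
sum_t = 454.3
sum_t2 = 18779.43

# Unbiased estimates of the population mean and variance.
mean = sum_t / n
var = (sum_t2 - sum_t**2 / n) / (n - 1)

# Test statistic for H0: mu = 42.0 (to be compared with t on n - 1 = 10 d.f.).
t_stat = (mean - 42.0) / (var / n) ** 0.5

print(round(mean, 2), round(var, 3), round(t_stat, 3))
```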

Exercise 702. (9740 N2010/II/7.) (Answer on p. 1753.)
For events A and B it is given that P (A) = 0.7, P (B) = 0.6 and P (A∣B ′ ) = 0.8. Find
(i) P (A ∩ B ′ ), [2]
(ii) P (A ∪ B), [2]
(iii) P (B ′ ∣A). [2]
For a third event C, it is given that P(C) = 0.5 and that A and C are independent.
(iv) Find P (A′ ∩ C). [2]
(v) Hence state an inequality satisfied by P (A′ ∩ B ∩ C). [1] (See footnote 385.)

Exercise 703. (9740 N2010/II/8.) (Answer on p. 1753.)

The digits 1, 2, 3, 4 and 5 are arranged randomly to form a five-digit number. No digit is
repeated. Find the probability that
(i) the number is greater than 30 000, [1]
(ii) the last two digits are both even, [2]
(iii) the number is greater than 30 000 and odd. [4]

Exercise 704. (9740 N2010/II/9.) (Answer on p. 1753.)

In this question you should state clearly the values of the parameters of any normal
distribution you use.
Over a three-month period Ken makes X minutes of peak-rate telephone calls and Y
minutes of cheap-rate calls. X and Y are independent random variables with the
distributions N (180, 30²) and N (400, 60²) respectively.
(i) Find the probability that, over a three-month period, the number of minutes of cheap-
rate calls made by Ken is more than twice the number of minutes of peak-rate calls.
Peak-rate calls cost $0.12 per minute and cheap-rate calls cost $0.05 per minute.
(ii) Find the probability that, over a three-month period, the total cost of Ken’s calls is
greater than $45. [3]
(iii) Find the probability that the total cost of Ken’s peak-rate calls over two independent
three-month periods is greater than $45. [3]

Footnote 385 (to Exercise 702(v)): This question is terribly vague. A trivial but perfectly
correct answer would be P (A′ ∩ B ∩ C) ≥ 0, but I suspect that any smart aleck who wrote this
didn’t get the mark.
Exercise 705. (9740 N2010/II/10.) (Answer on p. 1753.)
A car is placed in a wind tunnel and the drag force F for different wind speeds v, in
appropriate units, is recorded. The results are shown in the table.

v 0 4 8 12 16 20 24 28 32 36
F 0 2.5 5.1 8.8 11.2 13.6 17.6 22.0 27.8 33.9

(i) Draw the scatter diagram for these values, labelling the axes clearly. [2]
It is thought that the drag force F can be modelled by one of the formulae

F = a + bv or F = c + dv²

where a, b, c and d are constants.

(ii) Find, correct to 4 decimal places, the value of the product moment correlation
coefficient between
(a) v and F ,
(b) v² and F . [2]
(iii) Use your answers to parts (i) and (ii) to explain which of F = a + bv or F = c + dv² is
the better model. [1]
(iv) It is required to estimate the value of v for which F = 26.0. Find the equation of a
suitable regression line, and use it to find the required estimate. Explain why neither
the model F = a + bv nor the model F = c + dv² should be used. [4] (See footnote 386.)

Exercise 706. (9740 N2010/II/11.) (Answer on p. 1754.)

In this question you should state clearly all distributions that you use, together with the
values of the appropriate parameters.
The number of telephone calls received by a call centre in one minute is a random variable
with distribution Po(3).
(i) Find the probability that exactly 8 calls are received in a randomly chosen period of
4 minutes. [2]
(ii) Find the length of time, to the nearest second, for which the probability that no calls
are received is 0.2. [3]
(iii) Use a suitable approximation to find the probability that, on a randomly chosen
working day of 12 hours, more than 2200 calls are received. [4]
A working day of 12 hours on which more than 2200 calls are received is said to be ‘busy’.
(iv) Find the probability that, in six randomly chosen working days, exactly two are busy. [2]
(v) Use a suitable approximation to find the probability that, in 30 randomly chosen
working days of 12 hours, fewer than 10 are busy. [4]
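
Parts (i) and (ii) can be sanity-checked with a few lines of code. This is only a sketch: it assumes that calls in non-overlapping intervals are independent, so that the number of calls in t minutes is Po(3t). The helper name `poisson_pmf` is my own.

```python
from math import exp, factorial, log

def poisson_pmf(k, mean):
    """P(X = k) for X ~ Po(mean)."""
    return exp(-mean) * mean ** k / factorial(k)

rate_per_minute = 3

# (i) Calls in 4 minutes ~ Po(12).
p_eight_calls = poisson_pmf(8, 4 * rate_per_minute)

# (ii) P(no calls in t minutes) = exp(-3t) = 0.2, so t = ln(5) / 3 minutes.
t_seconds = 60 * log(5) / rate_per_minute

print(round(p_eight_calls, 4), round(t_seconds))
```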

I have changed the wording of this sentence slightly.
1209, Contents
Exercise 707. (9740 N2009/II/5.) (Answer on p. 1755.)
A cinema manager wishes to take a survey of opinions of cinema-goers. Describe how a
quota sample of size 100 might be obtained, and state one disadvantage of quota sampling. [3]

Exercise 708. (9740 N2009/II/6.) (Answer on p. 1755.)

The table gives the world record time, in seconds above 3 minutes 30 seconds, for running
1 mile as at 1st January in various years.

Year, x 1930 1940 1950 1960 1970 1980 1990 2000

Time, t 40.4 36.4 31.3 24.5 21.1 19.0 16.3 13.1

(i) Draw a scatter diagram to illustrate the data. [2]

(ii) Comment on whether a linear model would be appropriate, referring both to the
scatter diagram and the context of the question. [2]
(iii) Explain why in this context a quadratic model would probably not be appropriate for
long-term predictions. [1]
(iv) Fit a model of the form ln t = a + bx to the data and use it to predict the world record
time as at 1st January 2010. Comment on the reliability of your prediction. [3]

Exercise 709. (9740 N2009/II/7.) (Answer on p. 1755.)

A company buys p% of its electronic components from supplier A and the remaining
(100 − p)% from supplier B. The probability that a randomly chosen component supplied
by A is faulty is 0.05. The probability that a randomly chosen component supplied
by B is faulty is 0.03.
(i) Given that p = 25, find the probability that a randomly chosen component is faulty. [2]
(ii) For a general value of p, the probability that a randomly chosen component that is
faulty was supplied by A is denoted by f(p). Show that f(p) = 0.05p / (0.02p + 3). Prove by
differentiation that f is an increasing function for 0 ≤ p ≤ 100, and explain what this
statement means in the context of the question. [6]
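
A numerical check of both parts, as a sketch only. The function names `p_faulty` and `f` are my own. By Bayes' rule, f(p) is the share of faulty components that come from A, which simplifies to 0.05p / (0.02p + 3). The loop below merely confirms that f increases on a grid of values; it is not a substitute for the proof by differentiation that the question asks for.

```python
def p_faulty(p):
    """P(component is faulty) when p% of components come from supplier A."""
    return (p / 100) * 0.05 + ((100 - p) / 100) * 0.03

def f(p):
    """P(supplied by A | faulty), by Bayes' rule."""
    return (p / 100) * 0.05 / p_faulty(p)

print(p_faulty(25))

values = [f(p) for p in range(0, 101)]
is_increasing = all(a < b for a, b in zip(values, values[1:]))
print(is_increasing)
```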

Exercise 710. (9740 N2009/II/8.) (Answer on p. 1756.)

Find the number of ways in which the letters of the word ELEVATED can be arranged if
(i) there are no restrictions, [1]
(ii) T and D must not be next to one another, [2]
(iii) consonants (L, V, T, D) and vowels (E, A) must alternate, [3]
(iv) between any two Es there must be at least 2 other letters. [3]
1210, Contents
Exercise 711. (9740 N2009/II/9.) (Answer on p. 1756.)
The thickness in cm of a mechanics textbook is a random variable with the distribution
N(2.5, 0.1²).
(i) The mean thickness of n randomly chosen mechanics textbooks is denoted by M̄ cm.
Given that P (M̄ > 2.53) = 0.0668, find the value of n. [3]
The thickness in cm of a statistics textbook is a random variable with the distribution
N(2.0, 0.08²).
(ii) Calculate the probability that 21 mechanics textbooks and 24 statistics textbooks will
fit into a bookshelf of length 1 m. State clearly the mean and variance of any normal
distribution you use in your calculation. [3]
(iii) Calculate the probability that the total thickness of 4 statistics textbooks is less than
three times the thickness of 1 mechanics textbook. State clearly the mean and variance
of any normal distribution you use in your calculation. [3]
(iv) State an assumption needed for your calculation in parts (ii) and (iii). [1]
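
For part (i), a short sketch of the calculation, reading the mechanics distribution as N(2.5, 0.1²) (that is, standard deviation 0.1 cm). The sample mean of n books then has standard deviation 0.1/√n, and the variable names below are my own.

```python
from statistics import NormalDist

mu = 2.5      # mean thickness of one mechanics textbook (cm)
sigma = 0.1   # standard deviation of one mechanics textbook (cm)

# Find z with P(Z > z) = 0.0668, then solve (2.53 - mu) / (sigma / sqrt(n)) = z for n.
z = NormalDist().inv_cdf(1 - 0.0668)
n_estimate = (z * sigma / (2.53 - mu)) ** 2
print(round(z, 3), round(n_estimate))
```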

Exercise 712. (9740 N2009/II/10.) (Answer on p. 1756.)

A company supplies sugar in small packets. The mass of sugar in one packet is denoted by
X grams. The masses of a random sample of 9 packets are summarised by

∑ x = 86.4, ∑ x² = 835.92.
(i) Calculate unbiased estimates of the mean and variance of X. [2]
The mean mass of sugar in a packet is claimed to be 10 grams. The company directors
want to know whether the sample indicates that this claim is incorrect.
(ii) Stating a necessary assumption, carry out a t-test at the 5% significance level. Explain
why the Central Limit Theorem does not apply in this context. [7]
(iii) Suppose now that the population variance of X is known, and that the assumption
made in part (ii) is still valid. What change would there be in carrying out the test? [1]
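
The arithmetic behind parts (i) and (ii) can be sketched as follows. The helper name `unbiased_estimates` is my own, and the same function would work for any question that gives n, ∑ x and ∑ x².

```python
from math import sqrt

def unbiased_estimates(n, sum_x, sum_x2):
    """Unbiased estimates of the population mean and variance from summary data."""
    mean = sum_x / n
    variance = (sum_x2 - sum_x ** 2 / n) / (n - 1)
    return mean, variance

n = 9
x_bar, s2 = unbiased_estimates(n, 86.4, 835.92)

# Test statistic for H0: mu = 10, to be compared with the t-distribution
# with n - 1 = 8 degrees of freedom.
t_stat = (x_bar - 10) / sqrt(s2 / n)
print(round(x_bar, 2), round(s2, 2), round(t_stat, 3))
```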

1211, Contents

Exercise 713. (9740 N2009/II/11.) (Answer on p. 1757.)
A fixed number, n, of cars is observed and the number of those cars that are red is denoted
by R.
(i) State, in context, two assumptions needed for R to be well modelled by a binomial
distribution. [2]
Assume now that R has the distribution B (n, p).
(ii) Given that n = 20 and p = 0.15, find P (4 ≤ R < 8). [2]
(iii) Given that n = 240 and p = 0.3, find P (R < 60) using a suitable approximation, which
should be clearly stated. [3]
(iv) Given that n = 240 and p = 0.02, find P (R = 3) using a suitable approximation, giving
your answer correct to 4 decimal places and explaining why the approximation is
appropriate in this case. [3]
(v) Given that n = 20 and P (R = 0 or 1) = 0.2, write down an equation for the value of p,
and find this value numerically. [2]

Exercise 714. (9740 N2008/II/5.) (Answer on p. 1757.)

A school has 950 pupils.
(i) A sample of 50 pupils is to be chosen to take part in a survey. Describe how the
sample could be chosen using systematic sampling. [2]
The purpose of the survey is to investigate pupils’ opinions about the sports facilities
available at the school.
(ii) Give a reason why a stratified sample might be preferable in this context. [2]

Exercise 715. (9740 N2008/II/6.) (Answer on p. 1758.)

In mineral water from a certain source, the mass of calcium, X mg, in a one-litre bottle is
a normally distributed random variable with mean µ. Based on observations over a long
period, it is known that µ = 78. Following a period of extreme weather, 15 randomly chosen
bottles of the water were analysed. The masses of calcium in the bottles are summarised by

∑ x = 1 026.0, ∑ x² = 77 265.90.

Test, at the 5% significance level, whether the mean mass of calcium in a bottle has changed. [6]
1212, Contents
Exercise 716. (9740 N2008/II/7.) (Answer on p. 1758.)
A computer game simulates a tennis match between two players, A and B. The match
consists of at most three sets. Each set is won by either A or B, and the match is won by
the first player to win two sets.
The simulation uses the following rules.
• The probability that A wins the first set is 0.6.
• For each set after the first, the conditional probability that A wins that set, given that
A won the preceding set, is 0.7.
• For each set after the first, the conditional probability that B wins that set, given that
B won the preceding set, is 0.8.
Calculate the probability that
(i) A wins the second set, [2]
(ii) A wins the match, [3]
(iii) B won the first set, given that A wins the match. [3]
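
One way to check the three answers is to write the probability of any sequence of set winners as a product along the probability tree. This is only a sketch, and the helper name `sequence_probability` is my own.

```python
from itertools import product

def sequence_probability(sets):
    """Probability of a given sequence of set winners, e.g. ('A', 'B', 'A')."""
    prob = 0.6 if sets[0] == 'A' else 0.4
    for prev, cur in zip(sets, sets[1:]):
        # Probability that the winner of the previous set also wins this set.
        stay = 0.7 if prev == 'A' else 0.8
        prob *= stay if cur == prev else 1 - stay
    return prob

# (i) A wins the second set, whoever won the first.
p_a_wins_second = sum(
    sequence_probability(s) for s in product('AB', repeat=2) if s[1] == 'A'
)

# (ii) The match stops as soon as one player has won two sets.
a_wins = [('A', 'A'), ('A', 'B', 'A'), ('B', 'A', 'A')]
p_a_wins_match = sum(sequence_probability(s) for s in a_wins)

# (iii) Conditional probability that B won the first set, given that A wins the match.
p_b_first_given_a = sequence_probability(('B', 'A', 'A')) / p_a_wins_match

print(p_a_wins_second, p_a_wins_match, p_b_first_given_a)
```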

Exercise 717. (9740 N2008/II/8.) (Answer on p. 1758.)

A certain metal discolours when exposed to air. To protect the metal against discolouring,
it is treated with a chemical. In an experiment, different quantities, x ml, of the chemical
were applied to standard samples of the metal, and the times, t hours, for the metal to
discolour were measured. The results are given in the table.

x 1.2 2.0 2.7 3.8 4.8 5.6 6.9

t 2.2 4.5 5.8 7.3 7.6 9.0 9.9

(i) Calculate the product moment correlation coefficient between x and t, and explain
whether your answer suggests that a linear model is appropriate. [3]
(ii) Draw a scatter diagram for the data. [1]
One of the values t appears to be incorrect.
(iii) Indicate the corresponding point on your diagram by labelling it P , and explain why
the scatter diagram for the remaining points may be consistent with a model of the
form t = a + b ln x. [2]
(iv) Omitting P, calculate least squares estimates of a and b for the model t = a + b ln x. [2]
(v) Estimate the value of t at the value of x corresponding to P . [1]
(vi) Comment on the use of the model in part (iv) in predicting the value of t when x = 8.0. [1]

1213, Contents

Exercise 718. (9740 N2008/II/9.) (Answer on p. 1759.)
A shop sells two types of piano, ‘grand’ and ‘upright’. The mean number of grand pianos
sold in a week is 1.8.
(i) Use a Poisson distribution to find the probability that in a given week at least 4 grand
pianos are sold. [2]
The mean number of upright pianos sold in a week is 2.6. The sales of the two types of
piano are independent.
(ii) Use a Poisson distribution to find the probability that in a given week the total number
of pianos sold is exactly 4. [2]
(iii) Use a normal approximation to the Poisson distribution to find the probability that
the number of grand pianos sold in a year of 50 weeks is less than 80. [4]
(iv) Explain why the Poisson distribution may not be a good model for the number of
grand pianos sold in a year. [2]

Exercise 719. (9740 N2008/II/10.) (Answer on p. 1759.)

A group of diplomats is to be chosen to represent three islands, K, L and M . The group
is to consist of 8 diplomats and is chosen from a set of 12 diplomats consisting of 3 from
K, 4 from L and 5 from M . Find the number of ways in which the group can be chosen if
it includes
(i) 2 diplomats from K, 3 from L and 3 from M , [2]
(ii) diplomats from L and M only, [2]
(iii) at least 4 diplomats from M , [2]
(iv) at least 1 diplomat from each island. [4]

Exercise 720. (9740 N2008/II/11.) (Answer on p. 1760.)

The random variable X has the distribution N(50, 8²). Given that X1 and X2 are two
independent observations of X, find
1. P (X1 + X2 > 120), [2]
2. P (X1 > X2 + 15). [3]
The random variable Y is related to X by the formula Y = aX + b, where a and b are
constants with a > 0.
3. Given that P (Y < 74) = P (Y > 146) = 0.0668, find the values of E(Y ) and Var(Y ), and
hence find the values of a and b. [7]

Exercise 721. (9233 N2008/I/1.) (Answer on p. 1760.)

On a bookshelf there are 15 different books; 6 have red covers, 5 have blue covers and 4
have green covers. All the red books are to be kept together, all the blue books are to be
kept together and all the green books are to be kept together. In how many ways can the
15 books be arranged on the bookshelf? [3]
1214, Contents
Exercise 722. (9233 N2008/II/23.) (Answer on p. 1760.)
The events A, B and C are such that P(A) = 0.2, P(C) = 0.4, P(A ∪ B) = 0.4 and
P(B ∩ C) = 0.1. Given that A and B are independent, find P(B) and show that B and C
are also independent. [4]

Exercise 723. (9233 N2008/II/26.) (Answer on p. 1760.)

The number of times that an office photocopying machine breaks down in a week follows a
Poisson distribution with mean 3. Find the probability that
(i) the machine will break down more than twice in a given week, [2]
(ii) the machine will break down at most three times in a period of four weeks. [3]
(iii) Use a suitable approximation to find the probability that the machine will break down
more than 50 times in a period of 16 weeks. [4]

Exercise 724. (9233 N2008/II/27.) (Answer on p. 1760.)

The masses of a certain type of electronic component produced by a machine are normally
distributed with mean 32.40 g. The machine is adjusted and a sample of 80 components
is now taken and is found to have a mean mass 32.00 g. The unbiased estimate of the
population variance, calculated from this sample, is 2.892 g².
(i) Test at the 5% significance level whether this indicates a change in the mean. [5]
(ii) Explain what you understand by the phrase ‘at the 5% significance’ in the context of
this question. [2]
(iii) Find the least level of significance at which this sample would indicate a decrease in
the population mean. [3]

Exercise 725. (9233 N2008/II/29.) (Answer on p. 1761.)

Mr Sim and Mr Lee work in the same office and are expected to arrive by 9 a.m. each day.
Both men drive to work.
(i) The time taken for Mr Sim’s journey follows a normal distribution with mean 50
minutes and standard deviation 4 minutes. Given that he regularly leaves home at
8.05 a.m., find the probability that he will be late no more than once in a working
week of 5 days. [5]
(ii) Mr Lee’s journey time follows a normal distribution with mean 40 minutes and standard
deviation 5 minutes. Mr Lee leaves home at 8.10 a.m. each day. Find the
probability that Mr Sim will arrive at work before Mr Lee on any particular day. [5]
(iii) Find the probability that in a working week of 5 days, Mr Sim arrives at work before
Mr Lee on at least 3 days. [2]
1215, Contents
Exercise 726. (9233 N2008/II/30.) (Answer on p. 1761.)
(i) The masses of valves produced by a machine are normally distributed with mean µ
and standard deviation σ. 12% of the valves have mass less than 86.50 g and 20% have
mass more than 92.25 g. Find µ and σ. [4]
(ii) The setting of the machine is adjusted so that the mean mass of the valves produced
is unchanged, but the standard deviation is reduced. Given that 80% of the valves
now have a mass within 2 g of the mean, find the new standard deviation. [3]
(iii) After the machine has been adjusted, a random sample of n valves is taken. Find the
smallest value of n such that the probability that the sample mean exceeds µ by at
least 0.50 g is at most 0.1. [5]

Exercise 727. (9740 N2007/II/5.) (Answer on p. 1761.)

(i) Give a real-life example of a situation in which quota sampling could be used. Explain
why quota sampling would be appropriate in this situation, and describe briefly any
disadvantage that quota sampling has. [4]
(ii) Explain briefly whether it would be possible to use stratified sampling in the situation
you have described in part (i). [1]

Exercise 728. (9740 N2007/II/6.) (Answer on p. 1762.)

In a large population, 24% have a particular gene A, and 0.3% have gene B. Find the
probability that, in a random sample of 10 people from the population, at most 4 have
gene A. [2]
A random sample of 1000 people is taken from the population. Using appropriate
approximations, find
(i) the probability that between 230 and 260 inclusive have gene A, [3]
(ii) the probability that at least 2 but fewer than 5 have gene B. [2]

Exercise 729. (9740 N2007/II/7.) (Answer on p. 1762.)

A large number of students in a college have completed a geography project. The time,
x hours, taken by a student to complete the project is noted for a random sample of 150
students. The results are summarised by

∑ x = 4626, ∑ x² = 147691.
(i) Find unbiased estimates of the population mean and variance. [2]
(ii) Test, at the 5% significance level, whether the population mean time for a student to
complete the project exceeds 30 hours. [4]
(iii) State, giving a valid reason, whether any assumptions about the population are needed
in order for the test to be valid. [1]

1216, Contents

Exercise 730. (9740 N2007/II/8.) (Answer on p. 1762.)
Chickens and turkeys are sold by weight. The masses, in kg, of chickens and turkeys are
modelled as having independent normal distributions with means and standard deviations
as shown in the table.

Mean Mass Standard Deviation

Chickens 2.2 0.5
Turkeys 10.5 2.1

Chickens are sold at $3 per kg and turkeys at $5 per kg.

(i) Find the probability that a randomly chosen chicken has a selling price exceeding $7. [2]
(ii) Find the probability of the event that both a randomly chosen chicken has a selling
price exceeding $7 and a randomly chosen turkey has a selling price exceeding $55. [3]
(iii) Find the probability that the total selling price of a randomly chosen chicken and a
randomly chosen turkey is more than $62. [4]
(iv) Explain why the answer to part (iii) is greater than the answer to part (ii). [1]

Exercise 731. (9740 N2007/II/9.) (Answer on p. 1763.)

A group of 12 people consists of 6 married couples.
(i) The group stand in a line.
(a) Find the number of different possible orders. [1]
(b) Find the number of different possible orders in which each man stands next to his
wife. [3]
(ii) The group stand in a circle.
(a) Find the number of different possible arrangements. [1]
(b) Find the number of different possible arrangements if men and women alternate. [2]
(c) Find the number of different possible arrangements if each man stands next to
his wife and men and women alternate. [2]

1217, Contents

Exercise 732. (9740 N2007/II/10.) (Answer on p. 1763.)
A player throws three darts at a target. The probability that he is successful in hitting the
target with his first throw is . For each of his second and third throws, the probability of
success is
• twice the probability of success on the preceding throw if that throw was successful,
• the same as the probability of success on the preceding throw if that throw was unsuccessful.
Construct a probability tree showing this information. [3]
Find
(i) the probability that all three throws are successful, [2]
(ii) the probability that at least two throws are successful, [2]
(iii) the probability that the third throw is successful given that exactly two of the three
throws are successful. [4]

1218, Contents

Exercise 733. (9740 N2007/II/11.) (Answer on p. 1763.)
Research is being carried out into how the concentration of a drug in the bloodstream varies
with time, measured from when the drug is given. Observations at successive times give
the data shown in the following table.

Time (t minutes) 15 30 60 90 120 150 180 240 300

Concentration (x micrograms per litre) 82 65 43 37 22 19 12 6 2

It is given that the value of the product moment correlation coefficient for this data is
−0.912, correct to 3 decimal places. The scatter diagram for the data is shown below.

[Scatter diagram: x (micrograms per litre) against t (minutes).]

(i) Calculate the equation of the regression line of x on t. [2]

(ii) Calculate the corresponding estimated value of x when t = 300, and comment on the
suitability of the linear model. [2]

The variable y is defined by y = ln x. For the variables y and t,

(iii) calculate the product moment correlation coefficient and comment on its value, [2]
(iv) calculate the equation of the appropriate regression line. [3]
(v) Use a regression line to give the best estimate that you can of the time when the drug
concentration is 15 micrograms per litre. [2]

1219, Contents

Exercise 734. (9233 N2007/I/4.) (Answer on p. 1764.)
The diagram shows two straight lines, ABCD and AEF GHIJ, which intersect at A.
Triangles are to be drawn using three of the points A, B, C, D, E, F, G, H, I, J as
vertices.

[Diagram omitted.]

(i) How many different triangles can be drawn which have the point A as one of the
vertices? [1]
(ii) How many different triangles in total can be drawn? [4]

Exercise 735. (9233 N2007/II/23.) (Answer on p. 1764.)

(i) A random sample of size 100 is taken from a population with mean 30 and standard
deviation 5. Find an approximate value for the probability that the sample mean lies
between 29.2 and 30.8. [6]
(ii) Giving a reason, state whether it is necessary to make any assumptions about the
distribution of the population. [1]

1220, Contents

Exercise 736. (9233 N2007/II/25.) (Answer on p. 1764.)
The numbers of men and women studying Chemistry, Physics and Biology at a college are
given in the following table.

Chemistry Physics Biology

Men 12 16 32
Women 8 12 20

One of these students is chosen at random by a researcher. Events M, W, C and B are
defined as follows.
M: the student chosen is a man.
W: the student chosen is a woman.
C: the student chosen is studying Chemistry.
B: the student chosen is studying Biology.
Find
(i) P(W ∣ B). [1]
(ii) P(B ∣ W). [1]
(iii) P(B ∪ W). [2]
State, with a reason in each case, whether W and B are independent, and whether M and
C are mutually exclusive. [4]
Exercise 737. (9233 N2007/II/26.) (Answer on p. 1764.)
At a fire station, each call-out is classified as either genuine or false. Call-outs occur at
random times. On average, there are two genuine call-outs in a week, and one false call-out
in a two-week period.
(i) Calculate the probability that there are fewer than 6 genuine call-outs in a randomly
chosen two-week period. [?]
(ii) Using a suitable approximation, calculate the probability that the total number of
call-outs in a randomly chosen six-week period exceeds 19. [?]

Exercise 738. (9233 N2007/II/27.) (Answer on p. 1765.)

An oil mixture is produced by mixing L litres of light oil with H litres of heavy oil. The
random variables L and H are independent normal variables. The expected value of L is 5
and its standard deviation is 0.1. The expected value of H is 3 and its standard deviation
is 0.05.
(i) Find the probability that the volume of the mixture lies between 7.9 litres and 8.2
litres. [6]
The density of light oil is 0.74 kilograms per litre, and the density of heavy oil is 0.86
kilograms per litre.
(ii) Find the probability that the mass of the mixture lies between 6.1 kg and 6.2 kg. [6]
[Density is defined by Density = Mass / Volume.]
1221, Contents
Exercise 739. (9233 N2006/I/4.) (Answer on p. 1765.)
A box contains 8 balls, of which 3 are identical (and so are indistinguishable from one
another) and the other 5 are different from each other.³⁸⁷ 3 balls are to be picked out of the
box; the order in which they are picked out does not matter. Find the number of different
possible selections of 3 balls. [4]

Exercise 740. (9233 N2006/II/23.) (Answer on p. 1765.)

Two fair dice, one red and the other green, are thrown.
A is the event: The score on the red die is divisible by 3.
B is the event: The sum of two scores is 9.
(i) Justifying your conclusion, determine whether A and B are independent. [3]
(ii) Find P (A ∪ B). [2]

Exercise 741. (9233 N2006/II/25.) (Answer on p. 1765.)

The mass of vegetables in a randomly chosen bag has a normal distribution. The mass of
the contents of a bag is supposed to be 10 kg. A random sample of 80 bags is taken and
the mass of the contents of each bag, x grams, is measured. The data are summarised by

∑(x − 10000) = −2510, ∑(x − 10000)² = 2010203.

(i) Test, at the 5% significance level, whether the mean mass of the contents of a bag is
less than 10 kg. [7]
(ii) Explain, in the context of the question, the meaning of ‘at the 5% significance level’.

Exercise 742. (9233 N2006/II/26.) (Answer on p. 1765.)

In a weather model, severe floods are assumed to occur at random intervals, but at an
average rate of 2 per 100 years.
(i) Using this model, find the probability that, in a randomly chosen 200-year period,
there is exactly one severe flood in the first 100 years and exactly one severe flood in
the second 100 years. [3]
(ii) Using the same model, and a suitable approximation, find the probability that there
are more than 25 severe floods in 1000 years. [5]

Assume also that each of the latter 5 balls is different from each of the first 3.
1222, Contents
Exercise 743. (9233 N2006/II/28.) (Answer on p. 1766.)
Observations are made of the speeds of cars on a particular stretch of road during daylight
hours. It is found that, on average, 1 in 80 cars is travelling at a speed exceeding 125 km h⁻¹,
and 1 in 10 is travelling at a speed less than 40 km h⁻¹.
(i) Assuming a normal distribution, find the mean and the standard deviation of this
distribution. [4]
(ii) A random sample of 10 cars is to be taken. Find the probability that at least 7 will
be travelling at a speed in excess of 40 km h⁻¹. [3]
(iii) A random sample of 100 cars is to be taken. Using a suitable approximation, find the
probability that at most 8 cars will be travelling at a speed less than 40 km h⁻¹. [3]

1223, Contents

115. All Past-Year Questions, Listed and Categorised

115.1. 2017 (9758)

There are in total 200 points, so each point accounts for 0.5% of your final A-Level grade.
Below I list points for curveball questions³⁸⁸ in red.
The four-digit numbers are the page numbers for the Questions and Answers.

See Preface/Rant — p. xli.
1224, Contents
Paper I: Pure Mathematics [100]
Q A Part Topics Points
1 1158 1685 Calc. Maclaurin 4=4
2 1121 1603 F&G graphs, absolute value 2+4=6
3 1158 1685 Calc. differentiation, stationary points, turning points, maximum, minimum 4+3=7
4 1121 1604 F&G conic sections, differentiation, asymptotes, transformations 3+3+2=8
5 1121 1604 F&G factorisation, Remainder Theorem, differentiation, quadratic 4 + 3 + 3 = 10
6 1144 1652 Vectors vector equations, lines, planes 2+3+3=8
7 1158 1686 Calc. integration, trigonometry 3+5=8
8 1152 1664 Complex quadratic, factorisation 3 + 4 + 3 = 10
9 1134 1634 S&S summation, limits, Maclaurin = 13
10 1144 1652 Vectors vector equations, lines, scalar product, quadratic 4 + 4 + 5 = 13
11 1158 1686 Calc. differential equations = 13
Paper II, Section A: Pure Mathematics [40]
1 1121 1605 F&G parametric, differentiation, equations, points 3+5=8
2 1134 1634 S&S arithmetic progression, geometric progression 2+4+3=9
3 1122 1605 F&G inverse, composite functions, transformations, conic sections 4 + 2 + 4 + 2 = 12
4 1159 1687 Calc. graph, quadratic, integration, volume 4 + 4 + 3 = 11
Section B: Probability and Statistics [60]
5 1187 1736 P&S 3+2+2=7
6 1187 1736 P&S 2+3+4=9
7 1187 1736 P&S = 10
8 1188 1736 P&S = 10
9 1189 1736 P&S = 12
10 1189 1736 P&S = 12
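
As a quick check of the claim that there are 200 points in total (so that each point is worth 0.5%), the per-question totals in the 2017 table can be added up. This is a sketch only; the list names are my own and the numbers are read off the Points column above.

```python
# Points per question in the 2017 (9758) papers, read off the table above.
paper_1 = [4, 6, 7, 8, 10, 8, 8, 10, 13, 13, 13]
paper_2_section_a = [8, 9, 12, 11]
paper_2_section_b = [7, 9, 10, 10, 12, 12]

total_points = sum(paper_1) + sum(paper_2_section_a) + sum(paper_2_section_b)
percent_per_point = 100 / total_points
print(sum(paper_1), sum(paper_2_section_a), sum(paper_2_section_b))
print(total_points, percent_per_point)
```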

1225, Contents

115.2. 2016 (9740)

Paper I: Pure Mathematics [100]

Q A Part Topics Points
1 1122 1605 F&G inequalities 2+3=5
2 1159 1689 Calc. differentiation, calculator 2+3=5
3 1122 1606 2+4=6
4 1134 1635 4+3=7
5 1144 1653 2+4+2=8
6 1135 1635 5 + 2 + 3 = 10
7 1152 1664 5 + 5 = 10
8 1159 1689 5 + 3 + 3 = 11
9 1160 1690 1 + 6 + 3 + 2 = 12
10 1123 1606 3 + 5 + 3 + 2 = 13
11 1145 1654 5 + 5 + 3 = 13
Paper II, Section A: Pure Mathematics [40]
1 1160 1691 7=7
2 1161 1691 3 + 2 + 5 = 10
3 1161 1692 4 + 3 + 4 = 11
4 1152 1665 2 + 4 + 3 + 3 = 12
Section B: Probability and Statistics [60]
5 1190 1736 P&S 2+1+2=5
6 1191 1736 P&S 1 + 1 + 1 + 4 + 3 = 10
7 1191 1736 P&S 1 + 3 + 3 + 3 = 10
8 1192 1736 P&S 2 + 1 + 2 + 3 + 3 = 11
9 1192 1736 P&S 2 + 4 + 4 = 10
10 1193 1736 P&S 2 + 2 + 2 + 4 + 4 = 14

1226, Contents

115.3. 2015 (9740)

Paper I: Pure Mathematics [100]

Q A Part Topics Points
1 1123 1608 4+2+1=7
2 1123 1608 3+3=6
3 1161 1693 2+3=5
4 1162 1694 6=6
5 1124 1609 2+3+2=7
6 1162 1694 2+6=8
7 1145 1655 2 + 3 + 5 = 10
8 1135 1636 4 + 4 + 3 = 11
9 1153 1665 5 + 4 + 4 = 13
10 1162 1695 4 + 2 + 6 = 12
11 1163 1696 3 + 6 + 3 + 3 = 15
Paper II, Section A: Pure Mathematics [40]
1 1163 1697 1+5=6
2 1145 1656 2 + 5 + 3 = 10
3 1124 1610 2 + 3 + 5 = 10
4 1135 1636 6 + 1 + 4 + 3 = 14
Section B: Probability and Statistics [60]
5 1193 1736 P&S 1+2+1=4
6 1193 1736 P&S 2+3+2=7
7 1194 1737 P&S 2+3+2=7
8 1194 1737 P&S 7=7
9 1194 1737 P&S 1+3+4=8
10 1195 1738 P&S 1+3+3+2=9
11 1195 1739 P&S 2+1+2+4=9
12 1195 1739 P&S 2+3+4=9

1227, Contents

115.4. 2014 (9740)

Paper I: Pure Mathematics [100]

Q A Part Topics Points
1 1124 1611 4+1=5
2 1163 1697 6=6
3 1145 1656 2+2+1=5
4 1125 1611 4+1=5
5 1153 1666 4+3=7
6 1136 1637 5 + 3 + 2 + 2 = 12
7 1164 1698 2 + 2 + 3 + 4 = 11
8 1164 1699 1+4+4=9
9 1146 1656 4 + 4 + 5 = 13
10 1165 1700 1 + 5 + 1 + 1 + 5 = 13
11 1165 1701 6 + 2 + 3 + 3 = 14
Paper II, Section A: Pure Mathematics [40]
1 1125 1612 3+4=7
2 1166 1702 9=9
3 1136 1638 2 + 4 + 5 = 11
4 1153 1666 2 + 4 + 3 + 4 = 13
Section B: Probability and Statistics [60]
5 1196 1740 P&S 2+2=4
6 1196 1740 P&S 2+3+3=8
7 1196 1740 P&S 1+3+3=7
8 1197 1741 P&S 2+2+3+1=8
9 1197 1741 P&S 2+4+3=9
10 1198 1742 P&S 1 + 2 + 3 + 4 = 10
11 1198 1742 P&S 2 + 2 + 5 + 3 + 2 = 14

1228, Contents

115.5. 2013 (9740)

Paper I: Pure Mathematics [100]

Q A Part Topics Points
1 1146 1656 2+3=5
2 1125 1612 5=5
3 1126 1613 4+2=6
4 1153 1667 2+3+3=8
5 1166 1702 3+5=8
6 1146 1657 1+1+5=7
7 1137 1639 3+2+4=9
8 1154 1668 2+4+3=9
9 1137 1639 5 + 5 + 3 = 13
10 1166 1703 4 + 2 + 3 + 4 = 13
11 1167 1704 3 + 5 + 3 + 6 = 17
Paper II, Section A: Pure Mathematics [40]
1 1126 1614 2+4=6
2 1168 1705 3+6=9
3 1168 1706 7 + 5 = 12
4 1147 1657 3 + 4 + 6 = 13
Section B: Probability and Statistics [60]
5 1198 1743 P&S 2+2=4
6 1199 1743 P&S 4=4
7 1199 1743 P&S 2+1+3=6
8 1199 1743 P&S 1+2+3=6
9 1199 1743 P&S 2+5=7
10 1200 1744 P&S 3+1+2+3=9
11 1200 1745 P&S 2 + 2 + 4 + 4 = 12
12 1201 1745 P&S 3 + 2 + 3 + 4 = 12

1229, Contents

115.6. 2012 (9740)

Paper I: Pure Mathematics [100]

Q A Part Topics Points
1 1126 1614 4=4
2 1168 1706 2+3+1=6
3 1137 1640 2+2+4=8
4 1169 1707 4+4=8
5 1147 1658 4+4=8
6 1154 1669 2+2+4=8
7 1127 1614 2+3+4=9
8 1169 1707 4+3+2=9
9 1147 1658 3 + 5 + 4 = 12
10 1170 1708 7 + 5 = 12
11 1170 1709 5 + 3 + 5 + 3 = 16
Paper II, Section A: Pure Mathematics [40]
1 1171 1710 3+5=8
2 1154 1669 2+2+2+3=9
3 1127 1616 1 + 3 + 1 + 1 + 4 = 10
4 1138 1641 5 + 5 + 3 = 13
Section B: Probability and Statistics [60]
5 1201 1746 P&S 2+2+3=7
6 1202 1746 P&S 1+3+2=6
7 1202 1746 P&S 2+2+2+2+1=9
8 1203 1747 P&S 1 + 1 + 2 + 1 + 1 + 4 + 1 = 11
9 1203 1748 P&S 2 + 2 + 3 + 5 = 12
10 1204 1748 P&S 2 + 1 + 2 + 3 + 4 + 3 = 15

1230, Contents

115.7. 2011 (9740)

Paper I: Pure Mathematics [100]

Q A Part Topics Points
1 1127 1616 4=4
2 1127 1617 3+2=5
3 1171 1710 2+2+3=7
4 1171 1710 3+3+2=8
5 1171 1711 3+1+3=7
6 1138 1642 2 + 3 + 6 = 11
7 1147 1659 6 + 2 + 1 + 2 = 11
8 1172 1712 2 + 5 + 3 + 2 = 12
9 1139 1643 6 + 4 = 10
10 1154 1670 4 + 3 + 1 + 1 + 1 = 10
11 1148 1659 4 + 4 + 4 + 3 = 15
Paper II, Section A: Pure Mathematics [40]
1 1155 1671 3+2+3=8
2 1172 1712 3+6=9
3 1128 1617 4 + 4 + 3 = 11
4 1172 1713 5 + 1 + 6 = 12
Section B: Probability and Statistics [60]
5 1204 1749 P&S 4=4
6 1204 1749 P&S 2+1+2=5
7 1205 1749 P&S 2+1+1+4=8
8 1205 1750 P&S 1+2+1+3=7
9 1206 1750 P&S 2+1+2+3=8
10 1206 1750 P&S 2+3+4=9
11 1206 1751 P&S 3+5=8
12 1207 1752 P&S 1 + 4 + 5 + 1 = 11

1231, Contents

115.8. 2010 (9740)

Paper I: Pure Mathematics [100]

Q A Part Topics Points
1 1148 1660 2+3=5
2 1173 1713 3+3=6
3 1139 1644 3+2=5
4 1173 1713 4+4=8
5 1128 1619 5+3=8
6 1173 1714 2 + 2 + 4 + 2 = 10
7 1174 1714 7 + 4 = 11
8 1155 1672 2 + 3 + 4 + 2 = 11
9 1174 1715 6 + 2 + 2 + 2 = 12
10 1148 1660 2 + 4 + 3 + 3 = 12
11 1175 1716 4 + 4 + 4 = 12
Paper II, Section A: Pure Mathematics [40]
1 1155 1673 2+5=7
2 1139 1644 5 + 4 + 2 = 11
3 1175 1717 5 + 2 + 2 + 2 = 11
4 1129 1620 1 + 2 + 2 + 3 + 3 = 11
Section B: Probability and Statistics [60]
5 1207 1752 P&S 1+2=3
6 1207 1752 P&S 7=7
7 1208 1753 P&S 2+2+2+2+1=9
8 1208 1753 P&S 1+2+4=7
9 1208 1753 P&S 4 + 3 + 3 = 10
10 1209 1753 P&S 2+2+1+4=9
11 1209 1754 P&S 2 + 3 + 4 + 2 + 4 = 15

1232, Contents

115.9. 2009 (9740)

Paper I: Pure Mathematics [100]

Q A Part Topics Points
1 1129 1623 4+2=6
2 1175 1717 5=5
3 1140 1645 2+3+2=7
4 1175 1718 2+3+3=8
5 1140 1645 4+4=8
6 1129 1623 4+2+2=8
7 1176 1718 5+4=9
8 1140 1646 4 + 3 + 4 = 11
9 1155 1674 5 + 2 + 5 = 12
10 1148 1660 3 + 4 + 5 = 12
11 1176 1719 2 + 4 + 4 + 2 + 2 = 14
Paper II, Section A: Pure Mathematics [40]
1 1176 1720 1+3+4=8
2 1149 1661 2 + 2 + 2 + 4 = 10
3 1130 1624 5 + 2 + 3 = 10
4 1177 1721 5 + 7 = 12
Section B: Probability and Statistics [60]
5 1210 1755 P&S 3=3
6 1210 1755 P&S 2+2+1+3=8
7 1210 1755 P&S 2+6=8
8 1210 1756 P&S 1+2+3+3=9
9 1211 1756 P&S 3 + 3 + 3 + 1 = 10
10 1211 1756 P&S 2 + 7 + 1 = 10
11 1212 1757 P&S 2 + 2 + 3 + 3 + 2 = 12

1233, Contents

115.10. 2008 (9740)

Paper I: Pure Mathematics [100]

Q A Part Topics Points
1 1177 1722 4=4
2 1141 1646 5=5
3 1149 1661 1+3+2=6
4 1177 1722 2+1+1+3=7
5 1178 1722 3+4=7
6 1178 1723 5 + 5 = 10
7 1178 1723 10 = 10
8 1156 1675 3 + 4 + 4 = 11
9 1130 1625 3 + 2 + 1 + 5 = 11
10 1141 1646 5 + 2 + 3 + 4 = 14
11 1149 1661 2 + 4 + 3 + 2 + 4 = 15
Paper II, Section A: Pure Mathematics [40]
1 1178 1724 2+3+1+3=9
2 1179 1724 3+3+3=9
3 1156 1676 2 + 2 + 3 + 4 = 11
4 1131 1627 2 + 3 + 1 + 5 = 11
Section B: Probability and Statistics [60]
5 1212 1757 P&S 2+2=4
6 1212 1758 P&S 6=6
7 1213 1758 P&S 2+3+3=8
8 1213 1758 P&S 3 + 1 + 2 + 2 + 1 + 1 = 10
9 1214 1759 P&S 2 + 2 + 4 + 2 = 10
10 1214 1759 P&S 2 + 2 + 2 + 4 = 10
11 1214 1760 P&S 2 + 3 + 7 = 12

1234, Contents

115.11. 2007 (9740)

Paper I: Pure Mathematics [100]

Q A Part Topics Points
1 1131 1629 1+4=5
2 1132 1629 3+3=6
3 1156 1679 3+4=7
4 1181 1728 6+1=7
5 1132 1630 4+3=7
6 1150 1662 2+3+4=9
7 1157 1680 3 + 4 + 3 = 10
8 1150 1662 5 + 3 + 3 = 11
9 1142 1648 2 + 2 + 3 + 2 + 2 = 11
10 1142 1648 4 + 5 + 5 = 14
11 1182 1729 2 + 6 + 5 = 13
Paper II, Section A: Pure Mathematics [40]
1 1132 1631 6=6
2 1143 1649 4 + 2 + 2 + 2 = 10
3 1182 1730 4 + 5 + 2 = 11
4 1182 1730 6 + 5 + 2 = 13
Section B: Probability and Statistics [60]
5 1216 1761 P&S 4+1=5
6 1216 1762 P&S 2+3+2=7
7 1216 1762 P&S 2+4+1=7
8 1217 1762 P&S 2 + 3 + 4 + 1 = 10
9 1217 1763 P&S 1+3+1+2+2=9
10 1218 1763 P&S 3 + 2 + 2 + 4 = 11
11 1219 1763 P&S 2 + 2 + 2 + 3 + 2 = 11

1235, Contents

115.12. 2008 (9233)
The 9233 syllabus was significantly heftier. In particular, in addition to “Pure Mathematics”
and “Probability and Statistics”, there was also “Particle Mechanics”. This textbook
has omitted the questions on Particle Mechanics and any other questions that would
be outside the 9740 syllabus. (This explains why there seem to be some missing questions.)
The format of the papers was also somewhat more complicated. The last question of Paper
1 was an either-or question (i.e. examinees had a choice of doing one of two questions
given). Paper 2 contained four sections, of which only Section A (Pure Mathematics) was
mandatory and examinees had to choose to do one of Sections B, C, or D. And again, the
last question of each of these four sections was an either-or question.
Note also that Permutations and Combinations fell under “Pure Mathematics”.

Paper 1: Pure Mathematics

Q A Part Topics Points
1 1214 1760 3=3
2 1179 1725 4+5=9
3 1179 1725 5=5
4 1180 1725 4=4
6 1180 1725 3+2=5
8 1180 1725 5=5
9 1156 1677 2+2+4=8
10 1180 1726 3+5=8
11 1150 1662 5+4=9
13 1180 1726 5 + 2 + 5 = 12
14 1181 1727 6 + 6 = 12
14 1131 1626 4 + 3 + 1 + 2 = 10
Paper 2, Section A: Pure Mathematics
1 1181 1727 3=3
2 1141 1647 6=6
3 1156 1679 3+4=7
5 633 1728 5+3=8
Paper 2, Sections B–D: Probability and Statistics
23 1215 1760 P&S 4=4
26 1215 1760 P&S 2+3+4=9
27 1215 1760 P&S 5 + 2 + 3 = 10
29 1215 1761 P&S 5 + 5 + 2 = 12
30 1216 1761 P&S 4 + 3 + 5 = 12

1236, Contents

115.13. 2007 (9233)

Paper 1: Pure Mathematics

Q A Part Topics Points
2 1182 1731 3=3
3 1183 1731 5=5
4 1220 1764 1+4=5
7 1150 1662 7=7
8 1183 1731 3+4=7
9 1157 1681 5+3=8
10 1183 1731 3+5=8
11 1183 1732 9=9
13 1184 1732 3 + 4 + 2 + 3 = 12
14 1143 1650 6=6
14 1184 1733 4 + 4 + 4 = 12
Paper 2, Section A: Pure Mathematics
1 1143 1650 3=3
2 1151 1662 2+5=7
4 1133 1631 2+3+3=8
5 1157 1681 3 + 2 + 3 + 2 = 10
Paper 2, Sections B–D: Probability and Statistics
23 1220 1764 P&S 6+1=7
25 1221 1764 P&S 1+1+2+4=8
26 1221 1764 P&S ?=?
27 1221 1765 P&S 6 + 6 = 12

1237, Contents

115.14. 2006 (9233)

Paper 1: Pure Mathematics

Q A Part Topics Points
1 1143 1650 4=4
3 1133 1632 3+1=4
4 1222 1765 4=4
5 1157 1682 3+2=5
6 1157 1683 3+3=6
7 1185 1733 6=6
8 1185 1734 7=7
9 1185 1734 2+6=8
11 1143 1650 4+1+4=9
14 1186 1735 2 + 2 + 3 + 2 + 3 = 12
14 1151 1663 7+2=9
Paper 2, Section A: Pure Mathematics
1 1133 1633 5=5
2 1186 1735 3+3=6
Paper 2, Sections B–D: Probability and Statistics
23 1222 1765 P&S 3+2=5
25 1222 1765 P&S 7+1=8
26 1222 1765 P&S 3+5=8
28 1223 1766 P&S 4 + 3 + 3 = 10

1238, Contents

H1 Maths 2017 Questions
Exercise 744. (8865 N2017/1.) (Answer on p. 1244.)
Find algebraically the set of values of k for which

$x^2 + (k - 4)x - (k - 7) > 0$

for all real values of x. [4]

Exercise 745. (8865 N2017/2.) (Answer on p. 1244.)

(i) Differentiate √ with respect to x. [2]
5x − 2
(2x2 − 1)
(ii) Find ∫ dx. [4]

Exercise 746. (8865 N2017/3.) (Answer on p. 1244.)




The diagram shows a sign in the shape of a rectangle ABCD with two semicircles, one
attached to AB and one attached to CD. The length of AB is 2x cm and the total perimeter
of the sign is 10 cm.
(i) Show that the area of the sign is $x(10 - \pi x)$ cm². [3]
The area of the sign is to be as large as possible.
(ii) Use a non-calculator method to find the maximum value of this area, giving your answer
in terms of π. Justify that this is the maximum value. [4]

Exercise 747. (8865 N2017/4.) (Answer on p. 1244.)

The equation of a curve is y = ln (4x − 5).
(i) Sketch the curve, stating the equations of any asymptotes. [2]
(ii) Find the equation of the tangent to the curve at the point where x = 2.5, giving your
answer in the form ax + by = c, where a and b are integers and c is in exact form. [4]
This tangent meets the x-axis at P and the y-axis at Q.
(iii) Find the length of P Q, giving your answer correct to 3 decimal places. [4]

1239, Contents

Exercise 748. (8865 N2017/5.) (Answer on p. 1244.)
A company produces three types of sports shoes: Supers, Runners and Walkers. The man-
ufacturing cost of a pair of Supers is twice the manufacturing cost of a pair of Walkers. The
total manufacturing cost of 10 pairs of Runners is $50 more than the total manufacturing
cost of 7 pairs of Supers. The total manufacturing cost of 2 pairs of Supers, 6 pairs of
Runners and 4 pairs of Walkers is $481.
(i) By writing down three linear equations, find the manufacturing cost of a pair of
Runners. [5]
An economist advises the company on how to increase their profit.
[Profit = selling price − manufacturing cost.]
In a simple model, the economist suggests that the selling price of all sports shoes should
be $80 a pair.
(ii) Find the profit from the sale of 100 pairs of each of Supers, Runners and Walkers. [2]
The company is trialling a new type of sports shoes, Extremes. The economist predicts
that the profit $P will be related to the manufacturing cost ($x) by the equation

$P = 7\sqrt{x} - 0.9x.$
(iii) Sketch the graph of P against x, stating the coordinates of the intersections with the
x-axis. [2]
(iv) Use your calculator to estimate the maximum value of P . State also the value of x
for which this maximum value occurs. [2]
(v) Given that the manufacturing cost of a pair of Extremes is $55, find the selling price.
(vi) If the manufacturing cost of a pair of Extremes increased to $65, would you advise
the company to produce Extremes? Justify your answer. [1]

Exercise 749. (8865 N2017/6.) (Answer on p. 1244.)

As part of an assessment of the health of people in a particular country, the heights of
a large number of adult males have been recorded. The results show that 20% of them
have a height less than 1.6 m and 30% of them have a height greater than 1.75 m. Assuming
that the heights of adult males are normally distributed, find the mean and variance of the
distribution. [4]

Exercise 750. (8865 N2017/7.) (Answer on p. 1244.)

Printers in a busy office produce large numbers of documents each week. The ink cartridges
used in the printers often need replacing. The probability that an ink cartridge will last for
one week or more is 0.7, independently of all other cartridges. The cartridges are supplied
in boxes of 8. A box is selected at random.
(i) Find the probability that exactly 5 of the cartridges in the box will last for one week
or more. [1]
(ii) Find the probability that at least half of the cartridges will last for less than one week.

1240, Contents

The office has 6 boxes of ink cartridges in stock.
(iii) Find the probability that, for at most 2 of the boxes, at least half of the cartridges
will last less than one week. [2]

Exercise 751. (8865 N2017/8.) (Answer on p. 1244.)

A code consists of 6 characters. The first 3 characters of the code consist of 3 digits chosen
from {1, 2, 3, 4, 5, 6}. The last 3 characters of the code consist of 3 letters chosen from
{A, B, C, D, E, F, G, H}.
(i) How many codes can be formed if repetitions are not allowed? [1]
Now suppose that repetitions are allowed.
(ii) Find the probability that a code chosen at random
(a) contains the digit 5 exactly once and the letter H exactly twice, [3]
(b) has 2 as its first character or H as its sixth character, but not both. [3]

Exercise 752. (8865 N2017/9.) (Answer on p. 1244.)

A computer manufacturing company employs a large number of workers on the production
line. The owner encourages his employees to stay with the company by giving them increases
in their earnings to reward their length of service. The weekly earnings, y hundred dollars,
of a random sample of 8 employees from the production line who have been with the
company for x years are given in the following table.

Employee A B C D E F G H
x 14 16 11 24 36 28 22 40
y 4.9 5.5 5.2 6.5 9.7 7.5 6.2 9.8

(i) Give a sketch of the scatter diagram of the data. [2]

(ii) Find the product moment correlation coefficient and comment on its value in the
context of the data. [2]
(iii) Find the equation of the regression line of y on x in the form y = ax + b, giving the
values of a and b correct to 3 significant figures. Sketch this line on your scatter
diagram. [2]

Sue has been employed by the company for 2 years and she earns 190 dollars per week.
(iv) Use the equation of your regression line to calculate an estimate of the weekly earnings
for employees on the production line who have been with the company for 2 years. [1]
Sue concludes that she should be earning more.
(v) Give two reasons why her conclusion might not be justified. [2]

Exercise 753. (8865 N2017/10.) (Answer on p. 1244.)

Bottles of a certain type of juice are said to contain 0.6 litres. A random sample of 50
bottles is taken and the volumes of juice (in litres) in the bottles are measured. The
unbiased estimates for the population mean and variance are 0.568 and 0.01528. The
population mean volume is denoted by µ. The null hypothesis µ = 0.6 is to be tested
against the alternative hypothesis µ < 0.6.
(i) Find the p-value of the test and state the meaning of this p-value in this context. [2]
(ii) State, giving a reason, whether it is necessary to assume a normal distribution for this
test to be valid. [1]
A second random sample of bottles of this juice is taken. The sample size is 110 and the
volumes, y, are summarised by

$\sum y = 70.4, \quad \sum y^2 = 49.42.$
(iii) Find unbiased estimates for the population mean and variance using this second
sample. [3]
(iv) Using this second sample, test at the 5% significance level, whether there is evidence
that the population mean volume of juice differs from 0.6 litres. [4]

Exercise 754. (8865 N2017/11.) (Answer on p. 1244.)

[Venn diagram: circles labelled Marketing, Economics and Finance. Region counts recovered from the figure: 20, 16, 11, 12, 12.]

Marketing, Economics and Finance are three subjects offered at a business college. The
numbers of students studying different combinations of these subjects are shown in the
above Venn diagram. Every student studies at least one of these subjects. The number
who study all three subjects is x. One of the students is chosen at random.
• M is the event that the student studies Marketing.
• E is the event that the student studies Economics.
• F is the event that the student studies Finance.
(i) Write down expressions for P (M ) and P (E) in terms of x. [2]
(ii) Given that events M and E are independent, find the value of x. [3]
(iii) Find P (M ∪ F ′ ). [1]
(iv) Explain, in the context of this question, what is meant by P (F ∣M ), and find its value.
Three students are chosen at random, without replacement.
(v) Find the probability that each studies exactly two of these three subjects. [3]
Exercise 755. (8865 N2017/12.) (Answer on p. 1244.)
There are bus and train services between the towns of Ayton and Beeton. The journey
times, in minutes, by bus and by train have independent normal distributions. The means
and standard deviations of these distributions are shown in the following table.

Mean Standard deviation

Bus 45 4
Train 42 3

(i) Find the probability that a randomly chosen bus journey takes less than 48 minutes.
(ii) Find the probability that two randomly chosen bus journeys each take more than 48
minutes. [2]
(iii) The probability that the total time for two randomly chosen bus journeys is more
than 96 minutes is denoted by p. Without calculating its value, explain why p will be
greater than your answer to part (ii). [1]
Lan lives in Ayton and works in Beeton. Three days a week he travels from home to work
by bus and two days a week he travels from home to work by train.
(iv) Find the probability that for 3 randomly chosen bus journeys and 2 randomly chosen
train journeys, Lan’s total journey time is more than 210 minutes. [4]
Journeys are charged by the time taken. For bus journeys the charge is $0.12 per minute
and for train journeys the charge is $0.15 per minute.
Let B represent the cost of one journey from Ayton to Beeton by bus.
Let T represent the cost of one journey from Ayton to Beeton by train.
(v) Find P(3B − 2T < 3) and explain, in the context of this question, what your answer
represents. [5]

1243, Contents

H1 Maths 2017 Answers
To be written.
A744 (8865 N2017/1). XXX
A745 (8865 N2017/2). XXX
A746 (8865 N2017/3). XXX
A747 (8865 N2017/4). XXX
A748 (8865 N2017/5). XXX
A749 (8865 N2017/6). XXX
A750 (8865 N2017/7). XXX
A751 (8865 N2017/8). XXX
A752 (8865 N2017/9). XXX
A753 (8865 N2017/10). XXX
A754 (8865 N2017/11). XXX
A755 (8865 N2017/12). XXX

1244, Contents

H1 Maths 2016 Questions
Exercise 756. (8864 N2016/1.) (Answer on p. 1249.)
(i) $2\ln(3x^2 + 4)$, [2]
(ii) 2. [2]
2 (1 − 3x)

Exercise 757. (8864 N2016/2.) (Answer on p. 1249.) Do not use a calculator in
answering this question.
Use the substitution $u = e^x$ to solve the inequality $2e^{2x} \geq 9 - 3e^x$, giving your answer in
logarithmic form. [5]

Exercise 758. (8864 N2016/3.) (Answer on p. 1249.)

The curve C has equation $y = e^{-x} - x^2$.
(i) Sketch the graph of C. [1]
(ii) Find the numerical value of the gradient of C at the point where x = 0.5. [1]
(iii) Find the equation of the normal to C at the point where x = 0.5. Give your answer
in the form y = mx + c, with m and c correct to 3 significant figures. [3]

(iv) Find $\int (e^{-x} - x^2)\,dx$, where k > 0. Give your answer in terms of k.

Exercise 759. (8864 N2016/4.) (Answer on p. 1249.)

The curve C has equation $y = 1 + 6x - 3x^2 - 4x^3$.
(i) Find $\frac{dy}{dx}$. Hence find the coordinates of the stationary points on the curve. [4]
(ii) Use a non-calculator method to determine the nature of each of the stationary points.
(iii) Sketch the graph of C, stating the coordinates of any points where the curve crosses
the x-axis. [2]
(iv) Find the numerical value of the area under the curve C between x = 0.5 and x = 1. [1]

Exercise 760. (8864 N2016/5.) (Answer on p. 1249.)




The diagram shows a V-shape which is formed by removing the equilateral triangle DEF,
in which DE = y cm, from an equilateral triangle ABC, in which AB = 2x cm. The points
E and F are on BC such that BE = FC. The area of the V-shape ABEDFCA is $2\sqrt{3}$ cm².
(i) Show that $4x^2 - y^2 = 8$. [3]
(ii) Given that the perimeter of ABEDFCA is 10 cm, find the values of x and y. [6]

Exercise 761. (8864 N2016/6.) (Answer on p. 1249.)

A music store manager intends to carry out a survey to investigate how much money students
at a local college spend on music each year. There are 1260 male students and 1140 female
students at the college.
(i) Describe how to obtain a stratified sample of 80 students to take part in the survey.
(ii) State, in this context, one advantage that stratified sampling has compared to simple
random sampling. [1]
The amount of money, in hundreds of dollars, spent by a student in a year is X. For a
simple random sample of 80 students, it is found that $\sum x = 312$ and $\sum x^2 = 1328$.
(iii) Calculate unbiased estimates for the population mean and variance of X. [3]

Exercise 762. (8864 N2016/7.) (Answer on p. 1249.)

The events A and B are such that P(A) = 0.6, P(B) = 0.25 and P(A ∩ B) = 0.05.
(i) Draw a Venn diagram to represent this situation, showing the probability in each of
the four regions. [3]
(ii) Find the probability that
(a) at least one of A and B occurs, [1]
(b) exactly one of A and B occurs. [1]
(iii) Find P (A∣B ′ ). [2]

Exercise 763. (8864 N2016/8.) (Answer on p. 1249.)


Two boxes, A and B, contain balls of different colours. Box A contains 5 blue balls, 3 red
balls and 2 green balls. Box B contains 4 blue balls and 2 green balls. One of the boxes is
selected at random. Two balls are then chosen at random, without replacement, from the
selected box. Find the probability that
(i) both balls are red, [2]
(ii) the two balls are of different colours, [4]
(iii) both balls are red, given that they are the same colour. [3]

Exercise 764. (8864 N2016/9.) (Answer on p. 1249.)

Watch batteries are supplied to a shop in packs of 8. The probability that any randomly
chosen battery has a lifetime of less than two years is 0.6, independently of all other batteries.
(i) For a single pack of batteries, find the probability that
(a) all of the batteries have a lifetime of less than two years, [1]
(b) at least half of the batteries have a lifetime of less than two years. [2]
(ii) For any 4 packs of batteries, find the probability that, for no more than 2 of the packs,
at least half of the batteries have a lifetime of less than two years. [2]
(iii) A customer buys 10 packs of these batteries. Use a suitable approximation to estimate
the probability that at least 40 of these batteries have a lifetime of less than two years.
State the mean and variance of the distribution that you use. [4]

Exercise 765. (8864 N2016/10.) (Answer on p. 1249.)

A scientist claims that the mean top speed of cheetahs, in km/h, is 95. The top speed of
each cheetah in a random sample of 40 cheetahs is recorded and the mean is found to be
96.3. It is known that the top speeds of cheetahs are normally distributed with standard
deviation 4.1.
(i) Test the scientist’s claim at the 5% significance level. [4]
The scientist now decides to test the claim that the mean top speed of cheetahs, in km/h,
is greater than 95. He takes a second random sample of 40 cheetahs and records their top
speeds. Using a 5% significance level, he finds that the mean top speed of cheetahs is not
greater than 95.
(ii) Find the set of values within which the mean top speed of this second sample must
lie. [5]

Exercise 766. (8864 N2016/11.) (Answer on p. 1249.)

Members of an athletics club are training for a ‘Swim-Run’ charity event, in which each
athlete has to complete a 1000-metre swim followed by a 1000-metre run. The times, x
minutes, to swim 1000 metres and the times, y minutes, to run 1000 metres, for a random
sample of 8 members of the club, are given in the following table.


Athlete A B C D E F G H
x 15.0 16.1 18.2 16.3 17.2 18.1 15.6 16.3
y 2.5 2.7 3.3 3.0 3.5 3.4 2.7 2.8

(i) Give a sketch of the scatter diagram for the data, as shown on your calculator. [2]
(ii) Find the product moment correlation coefficient. [1]
(iii) Find the equation of the regression line of y on x in the form y = mx + c, giving the
values of m and c correct to 3 significant figures. [1]
(iv) Calculate an estimate of the time taken to run 1000 metres by an athlete who swims
1000 metres in 16.9 minutes. State two reasons why you would expect this to be a
reliable estimate. [3]
The times taken by a new member of the club to swim 1000 metres and run 1000 metres
are 18.4 minutes and 2.6 minutes respectively.
(v) Calculate the new product moment correlation coefficient when the times for the new
member are included. [1]
(vi) State, with a reason, which of your answers to parts (ii) and (v) is more likely to
represent the correlation between swimming and running times for all members of the
club. [1]

Exercise 767. (8864 N2016/12.) (Answer on p. 1249.)

Shortbread biscuits of a certain brand are sold in boxes containing 12 biscuits. The masses,
in grams, of the individual biscuits and of the empty boxes have independent normal dis-
tributions with means and standard deviations as shown in the following table.

Mean Standard deviation

Individual biscuit 20 1.1
Empty box 5 0.8

(i) Find the probability that the mass of an individual biscuit is less than 19 grams. [2]
(ii) Find the probability that the total mass of a box containing 12 biscuits is more than
248 grams. State the mean and variance of the distribution that you use. [4]

The cost of producing biscuits is 0.6 cents per gram and the cost of producing empty boxes
is 0.2 cents per gram.
(iii) Find the probability that the total cost of producing a box containing 12 biscuits is
between 142 cents and 149 cents. State the mean and variance of the distribution that
you use. [5]

1248, Contents

H1 Maths 2016 Answers
To be written.
A756 (8864 N2016/1). XXX
A757 (8864 N2016/2). XXX
A758 (8864 N2016/3). XXX
A759 (8864 N2016/4). XXX
A760 (8864 N2016/5). XXX
A761 (8864 N2016/6). XXX
A762 (8864 N2016/7). XXX
A763 (8864 N2016/8). XXX
A764 (8864 N2016/9). XXX
A765 (8864 N2016/10). XXX
A766 (8864 N2016/11). XXX
A767 (8864 N2016/12). XXX

1249, Contents

Part VIII.

The main text above has not always been complete, precise, or rigorous. In these appen-
dices, I go some way towards filling in these gaps. In particular, I give formal definitions,
statements of claims, and proofs of claims.
Where there is a trade-off between generality of a result and the simplicity of its proof, I
usually favour the latter.

1250, Contents

116. Appendices for Part 0. A Few Basics

116.1. Logic

Fact 1. NOT- (P AND Q) ⇐⇒ (NOT-P OR NOT-Q).

Proof. To prove that two statements are equivalent, we must show that in every possible
case, it is impossible that one is true while the other is false.
We will examine the four possible cases, depending on whether P and Q are true or false:
Case 1. Both P and Q are true. Then:
• P AND Q is true; and so, NOT- (P AND Q) is false.
• Both NOT-P and NOT-Q are false; and so, (NOT-P OR NOT-Q) is false.
Case 2. P is true while Q is false. Then:
• P AND Q is false; and so, NOT- (P AND Q) is true.
• NOT-Q is true; and so, (NOT-P OR NOT-Q) is true.
Case 3. P is false while Q is true. Then:
• P AND Q is false; and so, NOT- (P AND Q) is true.
• NOT-P is true; and so, (NOT-P OR NOT-Q) is true.
Case 4. Both P and Q are false. Then:
• P AND Q is false; and so, NOT- (P AND Q) is true.
• NOT-P is true; and so, (NOT-P OR NOT-Q) is true.

We can use a truth table to present the above case-by-case analysis more tidily and clearly
(“1” = true and “0” = false). In the truth table below, we present the same four cases in
the same order:


P   Q   NOT-P   NOT-Q   P AND Q   NOT-(P AND Q)   NOT-P OR NOT-Q
1   1   0       0       1         0               0
1   0   0       1       0         1               1
0   1   1       0       0         1               1
0   0   1       1       0         1               1

From the truth table, we can quickly tell that across all four possible cases, NOT-(P AND Q)
and (NOT-P OR NOT-Q) always have the same truth values. Thus, the two statements
are equivalent.

1251, Contents

Fact 2. NOT- (P OR Q) ⇐⇒ (NOT-P AND NOT-Q).

Proof. The truth table below shows that NOT-(P OR Q) and (NOT-P AND NOT-Q)
always have the same truth values and are thus equivalent.


P   Q   NOT-P   NOT-Q   P OR Q   NOT-(P OR Q)   NOT-P AND NOT-Q
1   1   0       0       1        0              0
1   0   0       1       1        0              0
0   1   1       0       1        0              0
0   0   1       1       0        1              1

Fact 7. (P ⟹ Q AND Q ⟹ P) ⇐⇒ (P ⇐⇒ Q).

Proof. The truth table below shows that (P ⟹ Q AND Q ⟹ P) and P ⇐⇒ Q
always have the same truth values and are thus equivalent.

P   Q   P ⟹ Q   Q ⟹ P   P ⟹ Q AND Q ⟹ P   P ⇐⇒ Q
1   1   1       1       1                  1
1   0   0       1       0                  0
0   1   1       0       0                  0
0   0   1       1       1                  1
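
The truth-table method used for the three Facts above can also be checked mechanically. The following is a minimal Python sketch, not part of the original text: the helper names `implies` and `equivalent` are assumptions introduced for illustration. It enumerates all four truth assignments of P and Q and compares the two sides of each claimed equivalence.

```python
from itertools import product


def implies(p, q):
    """Material implication: false only when p is true and q is false."""
    return (not p) or q


def equivalent(f, g):
    """Return True if f(P, Q) and g(P, Q) agree in all four possible cases."""
    return all(f(p, q) == g(p, q) for p, q in product([True, False], repeat=2))


# Fact 1: NOT-(P AND Q) <=> (NOT-P OR NOT-Q)
fact_1 = equivalent(lambda p, q: not (p and q), lambda p, q: (not p) or (not q))
# Fact 2: NOT-(P OR Q) <=> (NOT-P AND NOT-Q)
fact_2 = equivalent(lambda p, q: not (p or q), lambda p, q: (not p) and (not q))
# Fact 7: (P => Q AND Q => P) <=> (P <=> Q)
fact_7 = equivalent(lambda p, q: implies(p, q) and implies(q, p), lambda p, q: p == q)

print(fact_1, fact_2, fact_7)
```

Each call to `equivalent` plays the role of comparing two columns of a truth table row by row.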

1252, Contents

116.2. Sets
The set is what mathematicians call a primitive notion. That is, sets are left undefined
(though they do have to satisfy certain axioms). But having summoned out of the void this
single undefined object called the set, we can then go on to define every other mathematical
object based on the set. The set is thus the single Lego block out of which all of mathematics
is built. Every mathematical object can be defined solely in terms of sets. The idea is to
have just one undefined object, then define everything else based on this single undefined object.
And so for example, in conventional set theory, we first define the number 0 to be the
empty set. We then define the number 1 as the set that contains 0; the number 2 as the
set that contains 0 and 1; the number 3 as the set that contains 0, 1, and 2; etc.

0 = {} = ∅.
1 = {0} = {{}} = {∅} .
2 = {0, 1} = {{} , {{}}} = {∅, {∅}} .
3 = {0, 1, 2} = {{} , {{}} , {{} , {{}}}} = {∅, {∅} , {∅, {∅}}} .
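
As a rough illustration of this construction, here is a short Python sketch that builds each number as the set of all smaller numbers. It uses `frozenset` because Python sets can only contain hashable elements; the function name `von_neumann` is an assumption made for this example.

```python
def von_neumann(n):
    """Return the set that represents n: the set of all numbers smaller than n."""
    numbers = []
    for _ in range(n + 1):
        # The next number is the set containing every number built so far.
        numbers.append(frozenset(numbers))
    return numbers[n]


zero = von_neumann(0)
one = von_neumann(1)
two = von_neumann(2)
three = von_neumann(3)

print(zero == frozenset())                  # 0 is the empty set
print(three == frozenset({zero, one, two}))  # 3 = {0, 1, 2}
print(len(three))                            # the set for n has n elements
```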

As another example, perhaps surprisingly the function is also defined to be a set. We’ll
see this shortly in Ch. 117.3 below.

1253, Contents

Fact 188. A real number is rational ⇐⇒ Its digits eventually recur.

We first give a proof sketch to expose the main ideas of the proof.
To prove ⟸, consider for example $x = 8.344\,\overline{571\,93} = 8.344\,571\,935\,719\,357\,193\ldots$, where
the digits 57 193 eventually recur. Now consider 99 999 000x. We have:

99 999 000x = 100 000 000x − 1 000x

= 834 457 193.571 935 719 3 ⋅ ⋅ ⋅ − 8 344.571 935 719 357 193 . . .
= 834 457 193 − 8 344 = 834 448 849.

We’ve just shown that x = 834 448 849/99 999 000. So x is the ratio of two integers and is
thus rational.
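
The subtraction trick in this example can be sketched in Python using exact rational arithmetic. This is only an illustrative sketch: the function name `recurring_to_fraction` and its string-based interface are assumptions, and it handles non-negative numbers only.

```python
from fractions import Fraction


def recurring_to_fraction(integer_part, non_recurring, recurring):
    """Convert a recurring decimal, given as digit strings, to an exact fraction.

    With m non-recurring and k recurring digits after the decimal point,
    10^(m+k) x - 10^m x is an integer, so x is that integer over 10^(m+k) - 10^m.
    """
    m, k = len(non_recurring), len(recurring)
    big = int(integer_part + non_recurring + recurring)  # integer part of 10^(m+k) x
    small = int(integer_part + non_recurring)            # integer part of 10^m x
    return Fraction(big - small, 10 ** (m + k) - 10 ** m)


x = recurring_to_fraction("8", "344", "57193")
print(x == Fraction(834448849, 99999000))
print(recurring_to_fraction("1", "", "285714"))
```

Note that `Fraction` reduces to lowest terms automatically, so the printed value may look different from the unreduced ratio while being equal to it.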
To prove ⟹, consider for example $9/7 = 1.\overline{285\,714} = 1.285\,714\,285\,714\ldots$
Long division (the remainder at each step is highlighted in blue):

Line      Working          Explanation
   1      1. 2 8 5 7 1 4   (quotient)
   2  7 ) 9
   3      7                1 × 7 = 7
   4      2  0             9 − 7 = 2
   5      1  4             2 × 7 = 14
   6         6 0           20 − 14 = 6
   7         5 6           8 × 7 = 56
   8           4 0         60 − 56 = 4
   9           3 5         5 × 7 = 35
  10             5 0       40 − 35 = 5
  11             4 9       7 × 7 = 49
  12               1 0     50 − 49 = 1
  13                 7     1 × 7 = 7
  14                 3 0   10 − 7 = 3
  15                 2 8   4 × 7 = 28
  16                   2   30 − 28 = 2

At line 16, the division isn’t complete. But observe that the remainder at line 16 is the
same as in line 4 — namely, 2. And so clearly, the process will simply repeat. Lines 16
through 28 of the long division will look exactly the same as lines 4 through 16. Lines 28
through 40 will again look the same. Etc. The digits 285 714 will thus recur.
The key insight here is that there are only finitely many possible remainder values (namely,
0, 1, . . . , 6 when the divisor is 7). And thus, the remainder must eventually repeat (or hit zero).
This completes the proof sketch.
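
The long-division argument can be turned into a small program that records each remainder and stops as soon as one repeats. The sketch below is an illustration under stated assumptions: the function name `decimal_expansion` is hypothetical, and a and b are taken to be positive integers.

```python
def decimal_expansion(a, b):
    """Long division of a by b, for positive integers a and b.

    Returns (integer part, non-recurring digits, recurring digits),
    with the two digit groups given as strings.
    """
    q0, r = divmod(a, b)
    digits = []
    seen = {}  # remainder -> position of the digit that this remainder produces
    while r != 0 and r not in seen:
        seen[r] = len(digits)
        q, r = divmod(10 * r, b)  # one step of long division
        digits.append(str(q))
    if r == 0:
        # The division terminated, so no digits recur.
        return q0, "".join(digits), ""
    start = seen[r]  # the remainder has repeated, so the digits repeat from here
    return q0, "".join(digits[:start]), "".join(digits[start:])


print(decimal_expansion(9, 7))
print(decimal_expansion(1, 4))
print(decimal_expansion(1, 6))
```

Because each remainder lies in {0, 1, . . . , b − 1}, the loop runs at most b times before it either hits zero or meets a remainder it has seen before.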
On the next page is a “proper” proof of the above Fact:

1254, Contents

Proof. To prove ⟸, let x be a real number whose digits eventually recur. Write:

$$x = x_n x_{n-1} \ldots x_1 x_0 \,.\, x_{-1} x_{-2} \ldots x_{-m}\, r_1 r_2 \ldots r_k\, r_1 r_2 \ldots r_k \ldots r_1 r_2 \ldots r_k \ldots,$$

where each $x_i$ and $r_i$ is a digit (0, 1, . . . , 9) in the decimal representation of x and $r_1 r_2 \ldots r_k$
is the eventually-recurring portion. Now consider $10^{m+k}x$ and $10^m x$:

$$10^{m+k}x = x_n x_{n-1} \ldots x_1 x_0 x_{-1} x_{-2} \ldots x_{-m} r_1 r_2 \ldots r_k \,.\, r_1 r_2 \ldots r_k \ldots r_1 r_2 \ldots r_k \ldots, \text{ and}$$

$$10^m x = x_n x_{n-1} \ldots x_1 x_0 x_{-1} x_{-2} \ldots x_{-m} \,.\, r_1 r_2 \ldots r_k\, r_1 r_2 \ldots r_k \ldots r_1 r_2 \ldots r_k \ldots.$$

Observe that $A = 10^{m+k}x - 10^m x = (10^{m+k} - 10^m)x$ is an integer. And clearly, $10^{m+k} - 10^m$
is also an integer. Thus, x can be expressed as the ratio of two integers:

$$x = \frac{A}{10^{m+k} - 10^m}.$$

To prove ⟹, let a and b be integers with b > 0. We’ll prove that the digits in a/b must eventually
recur.
Let $q_0$ be the largest integer such that $a = q_0 b + r_0$ and $r_0$ is a non-negative integer (so that
when we divide a by b, $q_0$ is the quotient and $r_0$ is the remainder).
For each i = 1, 2, 3, . . . , let $q_i$ be the largest integer such that $10r_{i-1} = q_i b + r_i$ and $r_i$ is a
non-negative integer (so that at each step of the long division, $q_i$ is the quotient and $r_i$ is
the remainder).

$$\frac{a}{b} = q_0 + 10^{-1}q_1 + 10^{-2}q_2 + 10^{-3}q_3 + \cdots = q_0.q_1 q_2 q_3 \ldots$$

Note that since $r_i \in \{0, 1, 2, \ldots, b - 1\}$, the remainder must eventually repeat. That is, there
must exist t > s such that $r_s = r_t$.
And since $r_s = r_t$, by definition of $q_i$, we have:

$$q_{t+1} = q_{s+1},\quad q_{t+2} = q_{s+2},\quad \ldots,\quad q_{2t-s} = q_t.$$

We also have: $r_s = r_t = r_{2t-s} = r_{3t-2s} = \ldots$

Altogether: $q_{s+1} q_{s+2} \ldots q_t = q_{t+1} q_{t+2} \ldots q_{2t-s} = q_{2t-s+1} q_{2t-s+2} \ldots q_{3t-2s} = \ldots$

And so we can write a/b as follows, with the recurring digits $q_{s+1} q_{s+2} \ldots q_t$:

$$\frac{a}{b} = q_0.q_1 q_2 q_3 \ldots q_s\, q_{s+1} q_{s+2} \ldots q_t\, q_{s+1} q_{s+2} \ldots q_t\, q_{s+1} q_{s+2} \ldots q_t \ldots$$


1255, Contents

Proof of De Morgan’s Laws

We can actually reuse our earlier proofs (of De Morgan’s Laws from logic). But here as an
exercise, let’s prove these two laws using the set theory notation we’ve just learnt.

Fact 9. (P ∩ Q)′ = P′ ∪ Q′.

Proof. The chain of reasoning below will show that x ∈ (P ∩ Q)′ ⇐⇒ x ∈ P′ ∪ Q′. That
is, every element that’s in (P ∩ Q)′ is also in P′ ∪ Q′ and vice versa. And so by Definition
10, (P ∩ Q)′ = P′ ∪ Q′.

x ∈ (P ∩ Q) ′
⇐⇒ x∉P ∩Q
⇐⇒ x ∉ P OR x ∉ Q
⇐⇒ x ∈ P ′ OR x ∈ Q′
⇐⇒ x ∈ P ′ ∪ Q′ .

Fact 10. (P ∪ Q) ′ = P ′ ∩ Q′ .

Proof. The chain of reasoning below will show that x ∈ (P ∪ Q)′ ⇐⇒ x ∈ P′ ∩ Q′. That
is, every element that’s in (P ∪ Q)′ is also in P′ ∩ Q′ and vice versa. And so by Definition
10, (P ∪ Q)′ = P′ ∩ Q′.

x ∈ (P ∪ Q) ′
⇐⇒ x∉P ∪Q
⇐⇒ x ∉ P AND x ∉ Q
⇐⇒ x ∈ P ′ AND x ∈ Q′
⇐⇒ x ∈ P ′ ∩ Q′ .
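
For a concrete sanity check of the two set identities above, here is a small Python sketch using built-in sets. The universal set and the particular sets P and Q are arbitrary choices made for illustration, and `complement` is a helper name assumed for this example.

```python
universe = set(range(10))  # assumed universal set for this illustration
P = {1, 2, 3, 4}
Q = {3, 4, 5, 6}


def complement(s):
    """Complement of s relative to the assumed universal set."""
    return universe - s


# (P ∩ Q)' = P' ∪ Q'
law_intersection = complement(P & Q) == (complement(P) | complement(Q))
# (P ∪ Q)' = P' ∩ Q'
law_union = complement(P | Q) == (complement(P) & complement(Q))

print(law_intersection, law_union)
```

A check on one example is of course not a proof. The chains of equivalences above are what establish the identities for all sets.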

116.3. Division

Theorem 33. (The Euclidean Division Theorem.) Let x and d be positive
integers. Then there exists a unique pair of integers q and r such that:

x = dq + r and 0 ≤ r < d.

Proof. Definition 1 constructs q and r and thereby proves existence.

For uniqueness, let θ and ρ be integers such that x = dθ + ρ. If θ = q, then ρ = r. And if θ ≠ q, then
ρ = r + d(q − θ), which is at least d (if θ < q) or negative (if θ > q), so that ρ ∉ [0, d).
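
To see existence and uniqueness in action, the following Python sketch computes the quotient and remainder with the built-in `divmod` and then searches by brute force for every pair that satisfies the two conditions. The function name `all_valid_pairs` is an assumption introduced for this example, and x and d are taken to be positive integers.

```python
def all_valid_pairs(x, d):
    """List every (q, r) with x = d*q + r and 0 <= r < d, trying q = 0, 1, ..., x."""
    return [(q, x - d * q) for q in range(x + 1) if 0 <= x - d * q < d]


x, d = 17, 5
q, r = divmod(x, d)  # built-in quotient and remainder
print(q, r)
print(all_valid_pairs(x, d))  # exactly one pair is found
```

The strict inequality r < d matters for uniqueness. For example, 10 = 5 × 2 + 0 and 10 = 5 × 1 + 5, so allowing r = d would give two different pairs.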

1256, Contents

117. Appendices for Part I. Functions and Graphs

117.1. Graphs
As mentioned on p. 1253, every mathematical object can be defined in terms of sets. The
ordered pair is directly defined as a set.

Definition 217. The ordered pair (x, y) is defined by:

(x, y) = {{x} , {x, y}} .

There’s actually more than one way we can define an ordered pair (see footnote 389). What’s important
is that our definition correctly captures the idea that (x, y) = (a, b) ⇐⇒ x = a and y = b.
Our definition above does this:

Fact 12. (x, y) = (a, b) ⇐⇒ x = a AND y = b.

Proof. ⟸ is trivial: If x = a AND y = b, then (x, y) = {{x} , {x, y}} = {{a} , {a, b}} = (a, b).
For ⟹, we’ll prove the contrapositive: If x ≠ a OR y ≠ b, then (x, y) ≠ (a, b).
Suppose x ≠ a. Then {x} ≠ {a}. Also, {x} ≠ {a, b}, because a is an element of {a, b} but not of {x}.
So, {x} is an element of (x, y) but not of (a, b), and thus (x, y) ≠ (a, b).
Now suppose x = a and y ≠ b, so that (x, y) = {{a} , {a, y}}.
Case 1. If y ≠ a, then y is an element of {a, y} but not of {a} or {a, b}. So, {a, y} is an
element of (x, y) but not of (a, b).
Case 2. If y = a, then b ≠ a and (x, y) = {{a}}. So, {a, b} is an element of (a, b) but not of (x, y).
In both cases, (x, y) ≠ (a, b).
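
The defining property of the ordered pair can also be checked by brute force on a small range of values. The Python sketch below models the set {{x}, {x, y}} with nested `frozenset` objects; the function name `pair` is an assumption made for this illustration.

```python
def pair(x, y):
    """The ordered pair (x, y) modelled as the set {{x}, {x, y}}."""
    return frozenset({frozenset({x}), frozenset({x, y})})


print(pair(1, 2) == pair(1, 2))
print(pair(1, 2) == pair(2, 1))  # order matters, unlike for the plain set {1, 2}
print(pair(1, 1) == frozenset({frozenset({1})}))  # (x, x) collapses to {{x}}

# Check (x, y) = (a, b) <=> x = a AND y = b for all values in a small range.
values = range(3)
property_holds = all(
    (pair(x, y) == pair(a, b)) == (x == a and y == b)
    for x in values for y in values for a in values for b in values
)
print(property_holds)
```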

We then define the ordered triple (x, y, z) to be the ordered pair ((x, y) , z). Similarly,
we define the ordered quadruple (x1 , x2 , x3 , x4 ) to be the ordered pair ((x1 , x2 , x3 ) , x4 ).
Etc. In general, here is how the ordered n-tuple is defined:

Definition 218. Let n ≥ 3. The ordered n-tuple (x1 , x2 , . . . , xn ) is defined recursively by:

(x1 , x2 , . . . , xn ) = ((x1 , x2 , . . . , xn−1 ) , xn ) .

Remark 144. Many writers simply call it a tuple instead of an ordered n-tuple.

We can easily prove the analogue of Fact 12 for ordered n-tuples:

Footnote 389: The above definition is by Kuratowski (1921, p. 171) and is today the one that’s usually used.
Fact 189. Two ordered n-tuples (x1 , x2 , . . . , xn ) and (y1 , y2 , . . . , yn ) are identical if and
only if xi = yi for all i = 1, 2, . . . , n.

Proof. ⟸ is trivial. To prove ⟹, let n ≥ 3. By Definition 218:

(x1 , x2 , . . . , xn ) = (y1 , y2 , . . . , yn ) ⇐⇒ ((x1 , x2 , . . . xn−1 ) , xn ) = ((y1 , y2 , . . . , yn−1 ) , yn ).

By Fact 12, this last equation is true if and only if:

(x1 , x2 , . . . xn−1 ) = (y1 , y2 , . . . , yn−1 ) AND xn = yn .

Applying the above recursively, we find that for all i ≥ 3, xi = yi . We also find that
(x1 , x2 ) = (y1 , y2 ), whereupon by Fact 12, we also have that x1 = y1 and x2 = y2 .

Definition 219. Given $x_1, x_2, \ldots, x_n \in \mathbb{R}$, $(x_1, x_2, \ldots, x_n)$ is called a point in
n-dimensional space (see footnote 390).

In the rest of this subchapter, everything will be in the context of the cartesian plane
R2 = {(x, y) ∶ x, y ∈ R}.

Definition 220. The distance between the points $(x_1, y_1)$ and $(x_2, y_2)$ is:

$$\sqrt{(x_2 - x_1)^2 + (y_2 - y_1)^2}.$$

Definition 221. If the distance between two points A and B is shorter than that between
A and C, we say that B is closer to A than C.

Fact 190. There is exactly one point on the line ax + by + c = 0 that is closest to (p, q).

Proof. If b = 0, then every point in the line is of the form (−c/a, y). Clearly then, the unique
closest point is simply (−c/a, q).
So suppose b ≠ 0. Then y = (−a/b) x − c/b.
Let d denote the distance between an arbitrary point (x, y) on the line and the point (p, q).
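$$\begin{aligned}
d &= \sqrt{(x-p)^2 + (y-q)^2} = \sqrt{(x-p)^2 + \left(-\frac{a}{b}x - \frac{c}{b} - q\right)^2} \\
&= \sqrt{x^2 + p^2 - 2px + \frac{a^2}{b^2}x^2 + \frac{c^2}{b^2} + q^2 + 2\frac{ac}{b^2}x + 2\frac{aq}{b}x + 2\frac{cq}{b}} \\
&= \sqrt{\left(1 + \frac{a^2}{b^2}\right)x^2 + 2\left(\frac{ac}{b^2} + \frac{aq}{b} - p\right)x + p^2 + \frac{c^2}{b^2} + q^2 + 2\frac{cq}{b}}.
\end{aligned}$$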

The quadratic expression inside the surd has a positive coefficient on x², which means it has
a strict global and thus unique minimum (see Ch. 9). Since the square root function is strictly
increasing, d itself has a unique minimum. That is, there is exactly one point on the line that
is closest to (p, q).
By the way, though embedded in n-dimensional space, a point is itself a zero-dimensional object.
Corollary 35. The point on the line ax + by + c = 0 that is closest to the point (p, q) is:

$$\left(p - a\,\frac{ap + bq + c}{a^2 + b^2},\ q - b\,\frac{ap + bq + c}{a^2 + b^2}\right).$$

Proof. If b = 0, then as stated in the previous proof, the closest point is (−c/a, q), which the
reader can verify is indeed equal to the point claimed by the present corollary.
So, suppose b ≠ 0. We know from Ch. 9 that if m > 0, then mx2 + nx + o achieves a strict
global minimum at x = −n/2m. And so, continuing with our previous proof, the value of x
that minimises d is:

$$x = -\frac{2\left(\frac{ac}{b^2} + \frac{aq}{b} - p\right)}{2\left(1 + \frac{a^2}{b^2}\right)} = \frac{b^2 p - ac - abq}{a^2 + b^2} = \frac{a^2 p + b^2 p - ac - abq - a^2 p}{a^2 + b^2} = p - a\,\frac{ap + bq + c}{a^2 + b^2}.$$

And the corresponding y-coordinate is:

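$$\begin{aligned}
y &= -\frac{a}{b}x - \frac{c}{b} = -\frac{ax + c}{b} = -\frac{a\,\frac{b^2 p - ac - abq}{a^2 + b^2} + c}{b} = -\frac{a\left(b^2 p - abq\right) - a^2 c + a^2 c + b^2 c}{b\left(a^2 + b^2\right)} \\
&= -\frac{a\left(b^2 p - abq\right) + b^2 c}{b\left(a^2 + b^2\right)} = \frac{a^2 q - abp - bc}{a^2 + b^2} = \frac{a^2 q + b^2 q - b^2 q - abp - bc}{a^2 + b^2} = q - b\,\frac{ap + bq + c}{a^2 + b^2}.
\end{aligned}$$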

Definition 123. Let A be a point and l be a line. Suppose B is the point on l that’s
closest to A. Then the distance between A and l is ∣AB∣.

Corollary 36. The distance between a point (p, q) and a line ax + by + c = 0 is:

$$\frac{|ap + bq + c|}{\sqrt{a^2 + b^2}}.$$

Proof. By Corollary 35, the point on the line that is closest to (p, q) is:

$$\left(p - a\,\frac{ap + bq + c}{a^2 + b^2},\ q - b\,\frac{ap + bq + c}{a^2 + b^2}\right).$$

By Definition 220, the distance between this point and (p, q) is:

$$\begin{aligned}
d &= \sqrt{\left[p - \left(p - a\,\frac{ap + bq + c}{a^2 + b^2}\right)\right]^2 + \left[q - \left(q - b\,\frac{ap + bq + c}{a^2 + b^2}\right)\right]^2} \\
&= \sqrt{\left(a\,\frac{ap + bq + c}{a^2 + b^2}\right)^2 + \left(b\,\frac{ap + bq + c}{a^2 + b^2}\right)^2} = \left|\frac{ap + bq + c}{a^2 + b^2}\right| \sqrt{a^2 + b^2} = \frac{|ap + bq + c|}{\sqrt{a^2 + b^2}}.
\end{aligned}$$

In Ch. 119.14, an Appendix in the Appendices for Part III (Vectors), we’ll prove the above
two Corollaries again using different methods.
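As a quick numerical illustration of Corollaries 35 and 36, here is a minimal Python sketch. The function names `closest_point` and `distance_to_line` are assumptions of this example; it also assumes integer (or rational) inputs so that `Fraction` keeps the closest point exact.

```python
from fractions import Fraction
from math import sqrt

def closest_point(a, b, c, p, q):
    """Point on ax + by + c = 0 closest to (p, q), following Corollary 35."""
    k = Fraction(a * p + b * q + c, a * a + b * b)
    return (p - a * k, q - b * k)

def distance_to_line(a, b, c, p, q):
    """Distance between (p, q) and ax + by + c = 0, following Corollary 36."""
    return abs(a * p + b * q + c) / sqrt(a * a + b * b)

# The line x + y - 2 = 0 and the origin.
foot = closest_point(1, 1, -2, 0, 0)
print(foot)                               # the point (1, 1), stored as Fractions
print(distance_to_line(1, 1, -2, 0, 0))   # roughly 1.41421, i.e. the square root of 2
```

The closest point (1, 1) lies on the line, and its distance from the origin agrees with the value returned by `distance_to_line`.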


Fact 191. Two distinct points can be contained by at most one line.

Proof. Suppose the lines ax + by + c = 0 and dx + ey + f = 0 contain the distinct points (p, q)
and (r, s). Then:

$$ap + bq + c = 0 \ \ (1), \qquad ar + bs + c = 0 \ \ (2), \qquad dp + eq + f = 0 \ \ (3), \qquad dr + es + f = 0 \ \ (4).$$

(1) minus (2) and (3) minus (4) yield:

$$a(p - r) = b(s - q) \ \ (5) \qquad \text{and} \qquad d(p - r) = e(s - q) \ \ (6).$$

If p = r, then given that (p, q) ≠ (r, s), we have s ≠ q. Since p − r = 0 and s − q ≠ 0, (5) and (6)
imply that b = e = 0. That is, the coefficient on y for each of our two lines is zero. Hence
our two lines are simply x = p and x = r. But of course, p = r and so the two lines are
identical.

So suppose instead that p ≠ r. Then (5) and (6) may be rewritten as:

$$a = b\,\frac{s - q}{p - r} \ \ (7) \qquad \text{and} \qquad d = e\,\frac{s - q}{p - r} \ \ (8).$$

Since at least one of a or b must be non-zero, (7) implies that b must be non-zero (if b = 0,
then (7) would give a = 0 also). Similarly, since at least one of d or e must be non-zero, (8)
implies that e must be non-zero. Now use (7) and (8) to rewrite (1) and (3), after dividing
by b and e respectively, as:

$$\frac{s - q}{p - r}\,p + q + \frac{c}{b} = 0 \ \ (9) \qquad \text{and} \qquad \frac{s - q}{p - r}\,p + q + \frac{f}{e} = 0 \ \ (10).$$

And now, (10) − (9) yields c/b = f /e. Call this equation (11).

We now show that a point (t, u) is in the line ax + by + c = 0 if and only if it is also in the
line dx + ey + f = 0. We will thus have shown that the two lines are identical:

$$\begin{aligned}
at + bu + c = 0 &\overset{(7)}{\iff} b\,\frac{s - q}{p - r}\,t + bu + c = 0 \iff b\left(\frac{s - q}{p - r}\,t + u + \frac{c}{b}\right) = 0 \iff \frac{s - q}{p - r}\,t + u + \frac{c}{b} = 0 \\
&\overset{(11)}{\iff} \frac{s - q}{p - r}\,t + u + \frac{f}{e} = 0 \iff e\left(\frac{s - q}{p - r}\,t + u + \frac{f}{e}\right) = 0 \overset{(8)}{\iff} dt + eu + f = 0.
\end{aligned}$$

Fact 192. Given two distinct points (p, q) and (r, s), the unique line that contains both
points is (q − s) x + (r − p) y + ps − qr = 0.391

Proof. We need merely plug in and verify that the given line contains (p, q) and (r, s):

(q − s) p + (r − p) q + ps − qr = 0. ✓        (q − s) r + (r − p) s + ps − qr = 0. ✓

Our previous fact then states that this is the unique line that contains both points.

Note that if r ≠ p (the line isn’t vertical), then this line may be written as:

$$y = \frac{s - q}{r - p}\,x + \frac{qr - ps}{r - p}.$$

And if r = p (the line is vertical), then it may be written as:

$$x = \frac{qr - ps}{q - s}.$$
Fact 14. The line containing the distinct points (a1 , b1 ) and (a2 , b2 ) is:

(a2 − a1 ) (y − b1 ) = (b2 − b1 ) (x − a1 ) .

Proof. By Fact 192, the unique line that contains both points is (b1 − b2 ) x + (a2 − a1 ) y +
a1 b2 − b1 a2 = 0. Rearranging, (a2 − a1 ) (y − b1 ) = (b2 − b1 ) (x − a1 ).
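The formula of Fact 192 is easy to check by machine. The following sketch uses hypothetical helper names (`line_through`, `on_line`) and integer coordinates so that the membership test is exact.

```python
def line_through(p, q, r, s):
    """Coefficients (a, b, c) of the line ax + by + c = 0 through (p, q) and (r, s)."""
    if (p, q) == (r, s):
        raise ValueError("the two points must be distinct")
    return (q - s, r - p, p * s - q * r)

def on_line(coeffs, x, y):
    """True if (x, y) satisfies ax + by + c = 0."""
    a, b, c = coeffs
    return a * x + b * y + c == 0

line = line_through(1, 2, 3, 6)
print(line)                    # (-4, 2, 0), i.e. -4x + 2y = 0 or y = 2x
print(on_line(line, 1, 2), on_line(line, 3, 6))
```

The same function handles a vertical line: `line_through(5, 0, 5, 7)` gives `(-7, 0, 35)`, which is the line x = 5.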

Definition 222. Let A = (p, q) and B = (r, s) be distinct points with p < r or q < s. Then
1. The line AB is the graph of the equation (q − s) x + (r − p) y + ps − qr = 0;
2. The line segment AB is the graph of the equation (q − s) x + (r − p) y + ps − qr = 0 with
the constraint x ∈ [p, r]; and
3. The ray AB is the graph of the equation (q − s) x + (r − p) y + ps − qr = 0 with the
constraint x ≥ p if p < r and the constraint y ≥ q if q < s.

Definition 223. Let ax + by + c = 0 be a line. If b ≠ 0, then the gradient of this line is

−a/b. If b = 0, then the gradient of this line is undefined.

Definition 42. Two lines are perpendicular if:

(a) Their gradients are negative reciprocals of each other; or
(b) One line is vertical while the other is horizontal.

Fact 193. Let A be a point that is not on the line l. Let B be the point on l that is closest
to A. Then l is perpendicular to the line m which contains both A and B.

Proof. Let A = (p, q), B = (r, s), and l be given by ax + by + c = 0. Then by Corollary 35,

$$B = (r, s) = \left(p - a\,\frac{ap + bq + c}{a^2 + b^2},\ q - b\,\frac{ap + bq + c}{a^2 + b^2}\right).$$

And so by Fact 192, line m is given by:

$$b\,\frac{ap + bq + c}{a^2 + b^2}\,x - a\,\frac{ap + bq + c}{a^2 + b^2}\,y + p\left(q - b\,\frac{ap + bq + c}{a^2 + b^2}\right) - q\left(p - a\,\frac{ap + bq + c}{a^2 + b^2}\right) = 0.$$

Note that since A is not on l, we have ap + bq + c ≠ 0.
If a = 0, then l is horizontal and m is vertical, so that the two lines are perpendicular.
If instead b = 0, then l is vertical and m is horizontal, so that again, the two lines are
perpendicular.
So suppose a ≠ 0 ≠ b. Then the gradient of the line m is:

$$-b\,\frac{ap + bq + c}{a^2 + b^2} \div \left(-a\,\frac{ap + bq + c}{a^2 + b^2}\right) = \frac{b}{a}.$$

This is the negative reciprocal of the gradient −a/b of the line l, so that the two lines are
perpendicular.


Fact 194. If z is the reflection of x in y, then:
(a) y is equidistant from x and z; and
(b) z is on the same line as x and y.

Proof. Let x = (p, q) and y = (r, s), so that by Definition 44, z = (2r − p, 2s − q).
(a) The distance between z and y is:
$$\sqrt{[(2r - p) - r]^2 + [(2s - q) - s]^2} = \sqrt{(r - p)^2 + (s - q)^2} = \sqrt{(p - r)^2 + (q - s)^2},$$

which is the distance between x and y.

(b) The line containing x = (p, q) and y = (r, s) is: (q − s) x + (r − p) y + ps − qr = 0.
We can easily verify that this line contains z = (2r − p, 2s − q):

(q − s) (2r − p) + (r − p) (2s − q) + ps − qr = 0. ✓

Fact 195. The reflection of the point (p, q) in the line ax + by + c = 0 is the point

$$\left(p - 2a\,\frac{ap + bq + c}{a^2 + b^2},\ q - 2b\,\frac{ap + bq + c}{a^2 + b^2}\right).$$

Proof. By Corollary 35, the point on the line ax + by + c = 0 that is closest to (p, q) is:

$$\left(p - a\,\frac{ap + bq + c}{a^2 + b^2},\ q - b\,\frac{ap + bq + c}{a^2 + b^2}\right).$$

By Definitions 44 and 45, the reflection of (p, q) in ax + by + c = 0 is:

$$\left(2\left(p - a\,\frac{ap + bq + c}{a^2 + b^2}\right) - p,\ 2\left(q - b\,\frac{ap + bq + c}{a^2 + b^2}\right) - q\right) = \left(p - 2a\,\frac{ap + bq + c}{a^2 + b^2},\ q - 2b\,\frac{ap + bq + c}{a^2 + b^2}\right).$$

Corollary 1. The reflection of the point (p, q) in the line y = x is the point (q, p).

Proof. Simply use Fact 195, with a = 1, b = −1, and c = 0:

$$\left(p - 2a\,\frac{ap + bq + c}{a^2 + b^2},\ q - 2b\,\frac{ap + bq + c}{a^2 + b^2}\right) = \left(p - 2\,\frac{p - q}{1 + 1},\ q + 2\,\frac{p - q}{1 + 1}\right) = (q, p).$$

Corollary 2. The reflection of the point (p, q) in the line y = −x is the point (−q, −p).

Proof. Simply use Fact 195, with a = 1, b = 1, and c = 0:

$$\left(p - 2a\,\frac{ap + bq + c}{a^2 + b^2},\ q - 2b\,\frac{ap + bq + c}{a^2 + b^2}\right) = \left(p - 2\,\frac{p + q}{1 + 1},\ q - 2\,\frac{p + q}{1 + 1}\right) = (-q, -p).$$
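The reflection formula of Fact 195 and its two corollaries can be spot-checked with exact arithmetic. This is a sketch only; the function name `reflect` is an assumption of this example, and inputs are assumed to be integers or `Fraction`s.

```python
from fractions import Fraction

def reflect(a, b, c, p, q):
    """Reflection of (p, q) in the line ax + by + c = 0, following Fact 195."""
    k = Fraction(a * p + b * q + c, a * a + b * b)
    return (p - 2 * a * k, q - 2 * b * k)

# The line y = x is x - y = 0, so a = 1, b = -1, c = 0 (Corollary 1).
print(reflect(1, -1, 0, 3, 5))   # the point (5, 3)
# The line y = -x is x + y = 0, so a = 1, b = 1, c = 0 (Corollary 2).
print(reflect(1, 1, 0, 3, 5))    # the point (-5, -3)
```

Reflecting twice in the same line should return the original point, which gives a further sanity check on the formula.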


Fact 13. Suppose ax + by + c = 0 describes a line. Then:
(a) The line is horizontal ⇐⇒ a = 0.
(b) The line is vertical ⇐⇒ b = 0.

Proof. (a) Let (x1 , y1 ) and (x2 , y2 ) be points on the line with x1 ≠ x2 . Since both are points
on the line, we have:
ax1 + by1 + c = 0 and ax2 + by2 + c = 0.

First, suppose a = 0. Then by1 + c = 0 and by2 + c = 0, so that y1 = y2 = −c/b. We have just
shown that any two arbitrary points on the line have the same y-coordinate. And so by
Definition 40, the line is horizontal.
Next, suppose instead that a ≠ 0. If b = 0, then (x1 , y1 + 1) is also a point on the line, so
that the line is not horizontal (because it contains two points with different y-coordinates).
And if b ≠ 0, then:
ax1 + c ax2 + c
y1 = − and y2 = − .
b b
But since x1 ≠ x2 and a ≠ 0, it follows that y1 ≠ y2 , so that again the line is not horizontal
(again because it contains two points with different y-coordinates).
(b) The proof of (b) is very similar and thus omitted.


In the following definition of the eight types of extrema, we sometimes refer to the
ε-neighbourhood of a point P . Informally, given a positive number ε, the ε-neighbourhood
of a point P is simply the set of points whose distance from P is less than ε and is denoted
Nε (P ).392

Definition 224. Let P = (a, b) be a point in the graph G. We say that P is:
1. A global maximum (point) of G if for all (x, y) ∈ G, we have b ≥ y.
2. The strict global maximum (point) of G if for all (x, y) ∈ G, we have b > y.
3. A local maximum (point) of G if there exists ε > 0 such that:

For all (x, y) ∈ Nε (P ) ∩ G, we have b ≥ y.

4. A strict local maximum (point) of G if there exists ε > 0 such that:

For all (x, y) ∈ Nε (P ) ∩ G, we have b > y.

5. A global minimum (point) of G if for all (x, y) ∈ G, we have b ≤ y.

6. The strict global minimum (point) of G if for all (x, y) ∈ G, we have b < y.
7. A local minimum (point) of G if there exists ε > 0 such that:

For all (x, y) ∈ Nε (P ) ∩ G, we have b ≤ y.

8. A strict local minimum (point) of G if there exists ε > 0 such that:

For all (x, y) ∈ Nε (P ) ∩ G, we have b < y.

Note: There can be at most one strict global maximum and at most one strict global
minimum — hence the use of the definite article the. In contrast, there can be more than
one of each of the other types of extreme points — hence the use of the indefinite article a.

For the formal definition, see Definition 249 in the Appendices for Calculus.
117.2. The Quadratic Equation

Fact 20. Given the quadratic equation y = ax2 + bx + c,

1. The y-intercept is (0, c).
2. The sign of the discriminant determines the number of x-intercepts:
(a) If b² − 4ac > 0, then there are two x-intercepts (i.e. two real roots):

$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}.$$

We can factorise the quadratic polynomial:

$$ax^2 + bx + c = a\left(x - \frac{-b + \sqrt{b^2 - 4ac}}{2a}\right)\left(x - \frac{-b - \sqrt{b^2 - 4ac}}{2a}\right).$$

(b) If b² − 4ac = 0, then there is one x-intercept (i.e. one real root), where the graph
just touches the x-axis:

$$x = -\frac{b}{2a}.$$

We can factorise the quadratic polynomial:

$$ax^2 + bx + c = a\left(x + \frac{b}{2a}\right)^2.$$

(c) If b² − 4ac < 0, then there are no x-intercepts (i.e. no real roots). There is also
no way to factorise the quadratic polynomial ax² + bx + c (unless we use complex
numbers).
3. There is one line of symmetry, which is vertical:

$$x = -\frac{b}{2a}.$$

4. There is one turning point:

$$\left(-\frac{b}{2a},\ -\frac{b^2}{4a} + c\right).$$

5. The sign of a (the coefficient on x²) determines the shape of the graph:

(a) If a > 0, then the graph is ∪-shaped and the turning point is a strict global minimum.
(b) If a < 0, then the graph is ∩-shaped and the turning point is a strict global maximum.


Proof. (1) It is trivial to verify that (0, c) satisfies y = ax2 + bx + c.
(2) We showed above that the two roots of the quadratic equation are:

$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}.$$

If b² − 4ac > 0, then both roots are real.
If b² − 4ac = 0, then there is only one real (but repeated) root.
If b² − 4ac < 0, then there are two complex roots.
(3) Let G be the graph of the quadratic equation y = ax² + bx + c. The reflection of any
point (x, y) in the vertical line x = −b/(2a) is the point (−b/a − x, y).
We now prove that the reflection of G in this vertical line is itself. To do so, we prove that:

$$(x, y) \in G \iff \left(-\frac{b}{a} - x,\ y\right) \in G.$$

And to do so, we write:

$$\begin{aligned}
a\left(-\frac{b}{a} - x\right)^2 + b\left(-\frac{b}{a} - x\right) + c &= a\left(\frac{b^2}{a^2} + x^2 + \frac{2b}{a}x\right) + b\left(-\frac{b}{a} - x\right) + c \\
&= ax^2 + 2bx - bx + c = ax^2 + bx + c.
\end{aligned}$$

(4) and (5) Differentiate the quadratic equation y = ax2 + bx + c with respect to x to get:

y ′ (x) = 2ax + b.

Setting this equal to zero, we obtain the stationary point x̂ = −b/(2a).

(a) Suppose a > 0. Then for any x < x̂, we have y ′ (x) < 0; and for any x > x̂, we have
y ′ (x) > 0. Hence, the graph is ∪-shaped. Moreover, x̂ must be a strict global minimum.
(b) Suppose a < 0. Then for any x < x̂, we have y ′ (x) > 0; and for any x > x̂, we have
y ′ (x) < 0. Hence, the graph is ∩-shaped. Moreover, x̂ must be a strict global maximum.
In either case, since x̂ is both a stationary point and a strict local extremum, it is by
Definition 63 also a turning point. Plugging x̂ into the quadratic equation, we have the full
coordinates of this point:
$$\left(-\frac{b}{2a},\ -\frac{b^2}{4a} + c\right).$$
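The features listed in Fact 20 can be computed directly from the coefficients. The sketch below is illustrative only; the function name `quadratic_features` and the dictionary keys are assumptions of this example, and floating-point square roots are used for the intercepts.

```python
from math import sqrt

def quadratic_features(a, b, c):
    """Summarise the graph of y = ax^2 + bx + c (a != 0), following Fact 20."""
    if a == 0:
        raise ValueError("a must be non-zero")
    disc = b * b - 4 * a * c
    if disc > 0:
        roots = sorted([(-b - sqrt(disc)) / (2 * a), (-b + sqrt(disc)) / (2 * a)])
    elif disc == 0:
        roots = [-b / (2 * a)]
    else:
        roots = []                       # no x-intercepts
    return {
        "y_intercept": (0, c),
        "discriminant": disc,
        "x_intercepts": roots,
        "line_of_symmetry": -b / (2 * a),                   # the line x = -b/(2a)
        "turning_point": (-b / (2 * a), -b * b / (4 * a) + c),
        "turning_point_type": "strict global minimum" if a > 0 else "strict global maximum",
    }

info = quadratic_features(1, -3, 2)      # y = x^2 - 3x + 2 = (x - 1)(x - 2)
print(info["x_intercepts"], info["turning_point"])
```

For y = x² − 3x + 2, the x-intercepts are at 1 and 2 and the turning point is (1.5, −0.25), a strict global minimum since a > 0.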


117.3. Functions
The cartesian product of two sets A and B is the set of ordered pairs (x, y) with x ∈ A
and y ∈ B. Formally:

Definition 225. Given two sets A and B, their cartesian product, denoted A × B, is:

A × B = {(x, y) ∶ x ∈ A, y ∈ B} .

Definition 226. A function f with domain A and codomain B is any subset of A × B

that satisfies the following property:

For every x ∈ A, there is exactly one y ∈ B such that (x, y) ∈ f .

Observe that thus defined, a function is simply a set of points. And so, a function and
what we called its graph in the main text above are really one and the same thing.393
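For finite sets, Definition 226 can be checked mechanically. The following sketch uses a hypothetical helper `is_function` and represents a candidate function as a Python set of pairs.

```python
def is_function(pairs, domain, codomain):
    """True if `pairs` is a function with the given domain and codomain (Definition 226)."""
    pairs = set(pairs)
    # The candidate must be a subset of the cartesian product domain x codomain.
    if any(x not in domain or y not in codomain for x, y in pairs):
        return False
    # Every x in the domain must appear as a first coordinate exactly once.
    return all(sum(1 for u, _ in pairs if u == x) == 1 for x in domain)

A, B = {1, 2, 3}, {"a", "b"}
print(is_function({(1, "a"), (2, "a"), (3, "b")}, A, B))             # True
print(is_function({(1, "a"), (1, "b"), (2, "a"), (3, "b")}, A, B))   # False: 1 has two images
print(is_function({(1, "a"), (2, "a")}, A, B))                       # False: 3 has no image
```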

Fact 196. Suppose the graph of f has x-intercept (p, 0), y-intercept (0, q), line of symmetry
αx + by + c = 0, turning point (r, s), and asymptote α̂x + b̂y + ĉ = 0. Then:

(a) The graph of y = f (x) + a has y-intercept (0, q + a), line of symmetry αx + b (y − a) + c =
0, turning point (r, s + a), and asymptote α̂x + b̂ (y − a) + ĉ = 0.
(b) The graph of y = f (x + a) has x-intercept (p − a, 0), line of symmetry α (x + a) + by + c =
0, turning point (r − a, s), and asymptote α̂ (x + a) + b̂y + ĉ = 0.

Proof. We prove only (a); the proof of (b) is very similar and is thus omitted.
(a) Intercept. If q = f (0), then q + a = f (0) + a. In other words, if (0, q) satisfies y = f (x),
then (0, q + a) satisfies y = f (x) + a. Thus, (0, q + a) is a y-intercept for the graph of
y = f (x) + a.
Line of symmetry. Let (x1 , y1 ) be any point in the graph of y = f (x) + a. Its reflection
in the line αx + b (y − a) + c = 0 is:
$$S = (S_x, S_y) = \left(x_1 - 2\alpha\,\frac{\alpha x_1 + b y_1 + c - ba}{\alpha^2 + b^2},\ y_1 - 2b\,\frac{\alpha x_1 + b y_1 + c - ba}{\alpha^2 + b^2}\right).$$

Our goal is to show that S satisfies y = f (x) + a, i.e. that S is in the graph of y = f (x) + a. We
will thus have shown that αx + b (y − a) + c = 0 is a line of symmetry for y = f (x) + a.
Now, note that (x1 , y1 − a) is in the graph of y = f (x). So too is its reflection in the line
αx + by + c = 0, which is:
$$T = (T_x, T_y) = \left(x_1 - 2\alpha\,\frac{\alpha x_1 + b(y_1 - a) + c}{\alpha^2 + b^2},\ y_1 - a - 2b\,\frac{\alpha x_1 + b(y_1 - a) + c}{\alpha^2 + b^2}\right).$$

Observe that Ty = f (Tx ), Tx = Sx , and Sy = Ty + a.

Hence, Sy = Ty + a = f (Tx ) + a = f (Sx ) + a. We have just shown that S satisfies y = f (x) + a.

As mentioned in footnote 98.
Turning point. To be written.
Asymptote. To be written.


117.4. Inverse Functions

Fact 21. A function has a well-defined inverse if and only if it is invertible.

Proof. Suppose f is not invertible. Then by the definition of an invertible function (Definition
57), there are x1 , x2 ∈ Domainf such that x1 ≠ x2 and f (x1 ) = f (x2 ). Let y = f (x1 ) =
f (x2 ). Our definition of an inverse function (Definition 56) now fails to clearly specify
whether f −1 maps y to x1 and/or x2 . In other words, f −1 is not well-defined.
Now suppose f is invertible. Then f −1 does indeed map every element in its domain to
exactly one element in its codomain, as per the mapping rule given in Definition 56:

f (x) = y ⇒ f −1 (y) = x.

Fact 22. Let f be an invertible function and f −1 be its inverse. Then f and f −1 are
reflections of each other in the line y = x.

Proof. By definition, f (a) = b ⇐⇒ f −1 (b) = a. Hence, (a, b) ∈ f ⇐⇒ (b, a) ∈ f −1 . But

as we showed in Corollary 1, the reflection of the point (a, b) in the line y = x is the point
(b, a). And so the reflection of f in the line y = x is precisely f −1 .
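For a function given as a finite set of points, Fact 22 says the inverse is obtained by swapping the coordinates of every point. A minimal sketch, with the helper name `inverse` assumed for this example:

```python
def inverse(pairs):
    """Inverse of a finite function given as a set of (x, y) pairs.

    Each point (a, b) is reflected in the line y = x to give (b, a).
    Raises ValueError if the function is not invertible.
    """
    swapped = {(y, x) for x, y in pairs}
    firsts = [y for y, _ in swapped]
    if len(firsts) != len(set(firsts)):
        raise ValueError("not invertible: two inputs share an output")
    return swapped

f = {(1, 2), (2, 4), (3, 8)}
print(inverse(f) == {(2, 1), (4, 2), (8, 3)})   # True
print(inverse(inverse(f)) == f)                 # True
```

Applying `inverse` to `{(-1, 1), (1, 1)}` raises an error, because the two inputs −1 and 1 share the output 1.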

Proposition 3. Suppose D is an interval and f ∶ D → R is a continuous function. Then:

f is invertible ⇒ f is strictly monotonic (on D).

Proof. (By contradiction.) Suppose f is invertible, but neither strictly increasing nor
strictly decreasing on D.
Since f is neither strictly increasing nor strictly decreasing, there exist394 x1 , x2 , x3 ∈ D
with x1 < x2 < x3 such that f (x2 ) ≤ min {f (x1 ) , f (x3 )} or f (x2 ) ≥ max {f (x1 ) , f (x3 )}.
Actually, since f is invertible, these last two weak inequalities may be replaced by strict
inequalities.
If f (x2 ) < min {f (x1 ) , f (x3 )} = a, then pick any y ∈ (f (x2 ) , a). By the Intermediate
Value Theorem (Theorem 6), there exist x4 ∈ (x1 , x2 ) and x5 ∈ (x2 , x3 ) such that f (x4 ) = y
and f (x5 ) = y, so that f is not invertible and we have our desired contradiction.
The case where f (x2 ) > max {f (x1 ) , f (x3 )} = b is similarly handled.

Proposition 4. If a function is strictly increasing, then so too is its inverse.

Similarly, if a function is strictly decreasing, then so too is its inverse.

Proof. Suppose f is strictly increasing. Pick any distinct y1 , y2 ∈ Rangef = Domainf −1 , with
y1 < y2 . Since f is strictly increasing, there exist distinct x1 , x2 ∈ Domainf with x1 < x2
such that f (x1 ) = y1 and f (x2 ) = y2 . Thus, f −1 (y1 ) = x1 < f −1 (y2 ) = x2 and f −1 is also
strictly increasing.
The case where f is strictly decreasing is similar and thus omitted.

Footnote 394: We should actually have mentioned that in the case where D is empty or contains
a single point, then the Proposition is “obviously” or vacuously true. So in our proof, we
shall assume that D contains more than one point.

Fact 24. Let D be an interval. Let f ∶ D → R be a continuous and invertible func-

tion. Suppose f and its inverse f −1 intersect at least once. Then at least one of these
intersection points is on the line y = x.

Proof. Pick any intersection point (a, b).

If (a, b) is on the line y = x, then we are done. So, suppose it is not, i.e. a ≠ b, so that either
a > b or a < b.
Since (a, b) ∈ f −1 , by Fact 22, (b, a) ∈ f .
Now define g (x) = f (x) − x. By Theorem 16, g is also continuous on the closed interval
with endpoints a and b.
• If a < b, then g (a) = f (a) − a = b − a > 0, while g (b) = f (b) − b = a − b < 0.
• If a > b, then g (a) = f (a) − a = b − a < 0, while g (b) = f (b) − b = a − b > 0.
Either way, by the Intermediate Value Theorem (Theorem 6), there exists c strictly between
a and b such that g (c) = 0 or equivalently, f (c) = c.
Thus, (c, c) ∈ f . By Fact 22 then, (c, c) ∈ f −1 . Thus, f and f −1 also intersect at (c, c),
which is on the line y = x.

Fact 25. Let D be an interval. Let f ∶ D → R be a continuous and invertible function.

Suppose f and its inverse f −1 intersect at an even (and positive) number of points. Then
all of these intersection points are on the line y = x.

Proof. Suppose for contradiction that (a, b) ∈ f, f −1 with a ≠ b. That is, suppose there
exists a shared intersection point (a, b) that is not on the line y = x. Then by Fact 22,

(a, b) ∈ f ⇒ (b, a) ∈ f −1 and (a, b) ∈ f −1 ⇒ (b, a) ∈ f .

Thus, f and f −1 also intersect at the point (b, a), which isn’t on the line y = x. We’ve just
shown that any shared intersection points that aren’t on the line y = x must come in pairs.
Since the total number of intersection points is even, any shared intersection points that
are on the line y = x must also come in pairs.
By Fact 24, at least one of f and f −1 ’s intersection points is on the line y = x. Thus, there
exist at least two points on the line y = x at which f and f −1 intersect. Call them (c, c)
and (d, d) with c < d.
Now, since f is continuous on an interval, by Proposition 3, it is either strictly increasing
or strictly decreasing.
Since c < d and f (c) = c < f (d) = d, f is strictly increasing. (⋆)
Since a ≠ b, we have either a > b or a < b.
• If a > b, then f (a) = b < f (b) = a, so that f is strictly decreasing, contradicting (⋆).
• If a < b, then f (a) = b > f (b) = a, so that f is strictly decreasing, contradicting (⋆).

117.5. Transformations
We first formally define what a translation, a stretch, and a compression are.

Definition 227. Let G and H be graphs and a > 0. We say that H is G . . .

1. Translated a units . . .
(a) Downwards if (p, q) ∈ G ⇐⇒ (p, q − a) ∈ H.
(b) Upwards if (p, q) ∈ G ⇐⇒ (p, q + a) ∈ H.
(c) Rightwards if (p, q) ∈ G ⇐⇒ (p + a, q) ∈ H.
(d) Leftwards if (p, q) ∈ G ⇐⇒ (p − a, q) ∈ H.
2. Stretched by a factor of a, outwards from the . . .
(a) x-axis if (p, q) ∈ G ⇐⇒ (p, aq) ∈ H.
(b) y-axis if (p, q) ∈ G ⇐⇒ (ap, q) ∈ H.
3. Compressed by a factor of a, inwards towards the . . .
(a) x-axis if (p, q) ∈ G ⇐⇒ (p, q/a) ∈ H.
(b) y-axis if (p, q) ∈ G ⇐⇒ (p/a, q) ∈ H.

Fact 27. Let a, b > 0 and c, d ∈ R. Let f be a nice function. Then to get the graph of
y = af (bx + c) + d, follow these steps:
1. Translate leftwards by c units, to get y = f (x + c).
2. Compress horizontally (inwards towards y-axis) by a factor of b, to get y = f (bx + c).
3. Stretch vertically (outwards from x-axis) by a factor of a, to get y = af (bx + c).
4. Translate upwards by d units, to get y = af (bx + c) + d.

Proof. In each case, we need merely verify that Definition 227 is met:
(1) Observe that q = f (p) ⇐⇒ q = f (p − c + c). Hence, the point (p, q) is in the graph
of f ⇐⇒ (p − c, q) is in the graph of y = f (x + c). And so, by Definition 227(1)(d), the
graph of y = f (x + c) is the graph of f translated leftwards by c units.
(2) Observe that q = f (p + c) ⇐⇒ q = f (bp/b + c). Hence, the point (p, q) is in the
graph of y = f (x + c) ⇐⇒ (p/b, q) is in the graph of y = f (bx + c). And so, by Definition
227(3)(b), the graph of y = f (bx + c) is the graph of y = f (x + c) compressed by a factor of
b, inwards towards the y-axis.
(3) Observe that q = f (bp + c) ⇐⇒ aq = af (bp + c). Hence, the point (p, q) is in the
graph of y = f (bx + c) ⇐⇒ (p, aq) is in the graph of y = af (bx + c). And so, by Definition
227(2)(a), the graph of y = af (bx + c) is the graph of y = f (bx + c) stretched by a factor of
a, outwards from the x-axis.
(4) Observe that q = af (bp + c) ⇐⇒ q + d = af (bp + c) + d. Hence, the point (p, q) is
in the graph of y = af (bx + c) ⇐⇒ (p, q + d) is in the graph of y = af (bx + c) + d. And
so, by Definition 227(1)(b), the graph of y = af (bx + c) + d is the graph of y = af (bx + c)
translated upwards by d units.
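The four steps of Fact 27 can be traced on individual points. In the sketch below, the names `transform_point` and `lands_on_target` are assumptions of this example, and f (x) = x² is merely a convenient test function; each step moves a point exactly as Definition 227 prescribes.

```python
def transform_point(point, a, b, c, d):
    """Apply the four steps of Fact 27 to one point (p, q) of the graph of f."""
    p, q = point
    p = p - c      # 1. translate leftwards by c units
    p = p / b      # 2. compress inwards towards the y-axis by a factor of b
    q = a * q      # 3. stretch outwards from the x-axis by a factor of a
    q = q + d      # 4. translate upwards by d units
    return (p, q)

def lands_on_target(f, x, a, b, c, d, tol=1e-12):
    """Check that the image of (x, f(x)) satisfies y = a f(bx + c) + d."""
    p, q = transform_point((x, f(x)), a, b, c, d)
    return abs(q - (a * f(b * p + c) + d)) < tol

square = lambda x: x ** 2
print(all(lands_on_target(square, x, 3, 2, 1, 4) for x in [-2, -1, 0, 0.5, 3]))
```

Every sampled point of y = x² is carried onto the graph of y = 3 (2x + 1)² + 4, as the proof predicts.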


117.6. Trigonometry

Definition 228. Let A = (p, q) and B = (r, s) be distinct points on the circle
(x − a)² + (y − b)² = c, with p < r or q < s. Then the arc AB is the following set:

$$\left\{(x, y) : (x - a)^2 + (y - b)^2 = c,\ x \in [p, r],\ y \in [q, s]\right\}.$$

Here are this textbook’s official, formal definitions of sine and cosine, reproduced from
Ch. 76.4.

Definition 177. The sine function sin ∶ R → R is defined by:

$$\sin x = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \dots = \sum_{n=0}^{\infty} (-1)^n \frac{x^{2n+1}}{(2n+1)!}.$$

Definition 178. The cosine function cos ∶ R → R is defined by:

$$\cos x = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \dots = \sum_{n=0}^{\infty} (-1)^n \frac{x^{2n}}{(2n)!}.$$

With “some” work, it is possible to prove that under Definitions 177 and 178, all results
stated in the main text continue to hold.
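Partial sums of the two series give a feel for how quickly they converge. This sketch compares them against Python's built-in `math.sin` and `math.cos`; the function names `sin_series` and `cos_series` and the default of 20 terms are assumptions of this example, and it is a numerical illustration rather than a proof.

```python
from math import factorial, sin, cos

def sin_series(x, terms=20):
    """Partial sum of the series in Definition 177."""
    return sum((-1) ** n * x ** (2 * n + 1) / factorial(2 * n + 1) for n in range(terms))

def cos_series(x, terms=20):
    """Partial sum of the series in Definition 178."""
    return sum((-1) ** n * x ** (2 * n) / factorial(2 * n) for n in range(terms))

x = 1.0
print(abs(sin_series(x) - sin(x)) < 1e-12)                  # True
print(abs(cos_series(x) - cos(x)) < 1e-12)                  # True
print(abs(sin_series(2.0) ** 2 + cos_series(2.0) ** 2 - 1) < 1e-12)
```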

Fact 197. Let x ∈ [−1, 1]. Then sin (cos−1 x) = √(1 − x²).

Proof. Let θ = cos−1 x ∈ [0, π]. Then cos θ = x and sin θ ≥ 0.

Now, from the identity sin² θ + cos² θ = 1, we have sin θ = ±√(1 − cos² θ) = ±√(1 − x²). Since
sin θ ≥ 0, we can discard the negative value. Thus, sin (cos−1 x) = sin θ = √(1 − x²), as desired.


117.7. Polynomials

Theorem 4. (Euclidean Division Theorem for Polynomials.) Let p (x) and d (x)
be P - and D-degree polynomials in x with D < P . Then there exists a unique polynomial
q (x) of degree P − D such that r (x) = p (x) − d (x)q (x) has degree less than D.

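Proof. Let $p(x) = \sum_{i=0}^{P} p_i x^i$ and $d(x) = \sum_{i=0}^{D} d_i x^i$. Now construct $q(x) = \sum_{i=0}^{P-D} q_i x^i$, with:

$$\begin{aligned}
q_{P-D} &= \frac{p_P}{d_D}, \\
q_{P-D-1} &= \frac{p_{P-1} - q_{P-D}\,d_{D-1}}{d_D} = \frac{p_{P-1} - \frac{p_P}{d_D}\,d_{D-1}}{d_D}, \\
q_{P-D-2} &= \frac{p_{P-2} - \left(q_{P-D-1}\,d_{D-1} + q_{P-D}\,d_{D-2}\right)}{d_D}, \\
&\ \ \vdots \\
q_0 &= \frac{p_D - \left(q_1 d_{D-1} + \dots + q_D d_0\right)}{d_D}.
\end{aligned}$$

(Here and below, any coefficient whose index falls outside the range of its polynomial is taken to be zero.)

By construction, we have:

$$\begin{aligned}
d(x)\,q(x) &= \sum_{j=0}^{P} \left(\sum_{k=0}^{j} q_k d_{j-k}\right) x^j \\
&= q_{P-D}\,d_D\,x^P + \left(q_{P-D-1}\,d_D + q_{P-D}\,d_{D-1}\right) x^{P-1} + \dots \\
&\qquad \dots + \left(q_0 d_D + q_1 d_{D-1} + \dots + q_D d_0\right) x^D + \sum_{j=0}^{D-1} \left(\sum_{k=0}^{j} q_k d_{j-k}\right) x^j \\
&= p_P x^P + p_{P-1} x^{P-1} + \dots + p_D x^D + \sum_{j=0}^{D-1} \left(\sum_{k=0}^{j} q_k d_{j-k}\right) x^j.
\end{aligned}$$

Define:

$$\begin{aligned}
r(x) &= p(x) - d(x)\,q(x) \\
&= \sum_{i=0}^{P} p_i x^i - \left[p_P x^P + p_{P-1} x^{P-1} + \dots + p_D x^D + \sum_{j=0}^{D-1} \left(\sum_{k=0}^{j} q_k d_{j-k}\right) x^j\right] \\
&= \sum_{i=0}^{D-1} p_i x^i - \sum_{j=0}^{D-1} \left(\sum_{k=0}^{j} q_k d_{j-k}\right) x^j = \sum_{j=0}^{D-1} \left(p_j - \sum_{k=0}^{j} q_k d_{j-k}\right) x^j.
\end{aligned}$$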

By construction, p (x) = d (x) q (x) + r (x), q (x) is of degree P − D, while r (x) is of degree
less than D. This completes the proof of existence.
To prove uniqueness, let $s(x) = \sum_{i=0}^{P-D} s_i x^i$ be a (P − D)-degree polynomial such that
p (x) − s (x) d (x) is a polynomial of degree below D. If sP −D ≠ qP −D , then p (x) − s (x) d (x)
is of degree P ≥ D, so we must have sP −D = qP −D . Similarly, if sP −D−1 ≠ qP −D−1 , then
p (x) − s (x) d (x) is of degree P − 1 ≥ D, so we must have sP −D−1 = qP −D−1 . Etc. Altogether,
we conclude that s (x) = q (x), so that q (x) is unique.
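The construction in the existence proof is ordinary polynomial long division, working down from the leading coefficient. Here is a minimal sketch using exact rational arithmetic; the function name `poly_divmod` and the convention of listing coefficients from the constant term upwards are assumptions of this example.

```python
from fractions import Fraction

def poly_divmod(p, d):
    """Divide p(x) by d(x); coefficients are listed from the constant term upwards.

    Returns (q, r) with p = d*q + r and r of degree less than that of d.
    """
    p = [Fraction(x) for x in p]
    d = [Fraction(x) for x in d]
    P, D = len(p) - 1, len(d) - 1
    if d[-1] == 0 or D > P:
        raise ValueError("need a non-zero leading coefficient and deg d <= deg p")
    r = p[:]                              # running remainder
    q = [Fraction(0)] * (P - D + 1)
    for i in range(P - D, -1, -1):        # from q_{P-D} down to q_0
        q[i] = r[i + D] / d[D]
        for j in range(D + 1):
            r[i + j] -= q[i] * d[j]       # subtract q_i x^i d(x) from the remainder
    return q, r[:D]

# (x^3 - 2x^2 - 4) divided by (x - 3) gives x^2 + x + 3 with remainder 5.
q, r = poly_divmod([-4, 0, -2, 1], [-3, 1])
print(q, r)
```

At each step the leading coefficient of the running remainder is cancelled, which is exactly how each coefficient of q (x) is chosen in the proof.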


117.8. Conic Sections
For a very brief introduction to conic sections, either watch this video395 or read the fol-
lowing. (Or both.)
Take a vertical line and an oblique line. Rotate the oblique line about the vertical line to
form an infinite double cone. We call the vertical line the axis and the oblique line the
generator. The midpoint of the cone (or the point where the axis and generator meet) is
called the vertex.
Now take a two-dimensional cartesian plane and slice the double cone from all conceivable
positions and at all conceivable angles. The intersection of the plane and the outer surface
of the double cone then forms curves which we call conic sections.







We have three types of conic sections:

1. An ellipse if the plane is less steep than the generator. A special case is the circle
which is obtained if the plane is perpendicular to the axis.
2. A parabola if the plane is exactly as steep as the generator.
3. A hyperbola if the plane is steeper than the generator.

Not made by me and narrated by a female Indian robot, but awesome nonetheless.
The ellipse and the parabola are formed from only one half of the double cone. In contrast,
the hyperbola is formed from both halves — it thus has two branches.396
We shall not do so, but it is possible to prove that in general, a conic section is the graph
of the equation

Ax2 + Bxy + Cy 2 + Dx + Ey + F = 0.

We call B 2 − 4AC the discriminant, because it is possible to show that:

1. If B 2 − 4AC < 0, then we have an ellipse.
2. If B 2 − 4AC = 0, then we have a parabola. (The quadratic equation is an example
of a parabola.)
3. If B 2 − 4AC > 0, then we have a hyperbola.
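The discriminant test above translates directly into code. This sketch assumes a non-degenerate conic (the degenerate cases are not detected), and the function name `classify_conic` is hypothetical.

```python
def classify_conic(A, B, C):
    """Classify Ax^2 + Bxy + Cy^2 + Dx + Ey + F = 0 by the sign of B^2 - 4AC.

    Assumes the conic is non-degenerate; D, E and F do not affect the type.
    """
    disc = B * B - 4 * A * C
    if disc < 0:
        return "ellipse"
    if disc == 0:
        return "parabola"
    return "hyperbola"

print(classify_conic(1, 0, 1))   # x^2 + y^2 - 1 = 0 is an ellipse (a circle)
print(classify_conic(1, 0, 0))   # x^2 - y = 0 is a parabola
print(classify_conic(0, 1, 0))   # xy - 1 = 0 is a hyperbola
```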

In general, for any k ∈ R, y = k/x is symmetric in the lines y = x and y = −x:

Lemma 2. Let k ∈ R. Then y = k/x is symmetric in the lines y = x and y = −x.

Proof. Recall (Corollary 1) that the reflection of the point (p, q) in the line y = x is the
point (q, p). But:

$$p = \frac{k}{q} \iff q = \frac{k}{p}.$$

Thus, y = x is a line of symmetry for y = k/x.

Similarly, recall (Corollary 2) that the reflection of the point (p, q) in the line y = −x is the
point (−q, −p). But:

$$p = \frac{k}{q} \iff -q = \frac{k}{-p}.$$

Thus, y = −x is a line of symmetry for y = k/x.

There are also four types of degenerate conic sections. In each case, the plane cuts through (or
contains) the vertex. We have:
1. A point if the plane is less steep than the generator (this is the degenerate ellipse).
2. A single straight line if the plane is exactly as steep as the generator (this is the degenerate
parabola).
3. A pair of intersecting lines if the plane is steeper than the generator (this is the degenerate
hyperbola).
Now, suppose the generator, which is usually oblique, is now instead parallel to the axis. Then we get
a degenerate cone that is the cylinder. Now, any plane that is perpendicular to the cylinder’s base
yields:
4. A pair of vertical lines. (This is considered a degenerate parabola, because the plane is exactly as
steep as the generator).
Fact 37. Let b, c, d, e ∈ R with d ≠ 0 and cd − be ≠ 0. Consider the graph of

$$y = \frac{bx + c}{dx + e}.$$

(a) Intercepts. If e ≠ 0, then there is one y-intercept (0, c/e). (If e = 0, then there are
no y-intercepts.) And if b ≠ 0, then there is one x-intercept (−c/b, 0). (If b = 0, then
there are no x-intercepts.)
(b) There are no turning points.
(c) There is the horizontal asymptote y = b/d and the vertical asymptote x = −e/d.
(The asymptotes are perpendicular and so, this is a rectangular hyperbola.)
(d) The hyperbola’s centre is (−e/d, b/d).
(e) The two lines of symmetry are y = x + (b + e) /d and y = −x + (b − e) /d.

Proof. We already proved (a), (c), and (d) in the main text. We now prove (b) and (e).

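$$\text{(b)} \quad \frac{dy}{dx} = \frac{d}{dx}\left(\frac{b}{d} + \frac{cd - be}{d^2}\,\frac{1}{x + e/d}\right) = \frac{d}{dx}\,\frac{b}{d} + \frac{d}{dx}\left(\frac{cd - be}{d^2}\,\frac{1}{x + e/d}\right) = \frac{cd - be}{d^2} \cdot \frac{-1}{(x + e/d)^2}.$$

By assumption, cd − be ≠ 0. Thus, dy/dx ≠ 0. And hence, by Definition ??, this graph has
no turning points.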
(e) By Lemma 2, the following graph is symmetric in y = x and y = −x:
$$y = \frac{cd - be}{d^2}\,\frac{1}{x}.$$

Now shift this graph leftwards by e/d units to get the graph of:

$$y = \frac{cd - be}{d^2}\,\frac{1}{x + e/d},$$

which, by Fact 27, has lines of symmetry y = x + e/d and y = − (x + e/d).
Now shift this last graph upwards by b/d units to get the graph of:

$$y = \frac{b}{d} + \frac{(cd - be)/d^2}{x + e/d} = \frac{bx + c}{dx + e},$$

which, by Fact 27, has the claimed lines of symmetry:

$$y = x + \frac{b + e}{d} \qquad \text{and} \qquad y = -x + \frac{b - e}{d}.$$
It remains to be shown that these are the only two lines of symmetry. If there were a third
distinct line of symmetry, then there would be more than two asymptotes. But this is not
the case. Thus, there can be at most two distinct lines of symmetry.
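The two lines of symmetry derived in the proof, y = x + (b + e) /d and y = −x + (b − e) /d, can be spot-checked with exact arithmetic by reflecting sample points of the graph and confirming that the images are again on the graph. This is a sketch only; the helper names `reflect`, `on_graph` and `symmetric_in_both_lines` are assumptions of this example, as is the restriction to integer coefficients.

```python
from fractions import Fraction

def reflect(a, b, c, p, q):
    """Reflection of (p, q) in the line ax + by + c = 0 (Fact 195)."""
    k = Fraction(a * p + b * q + c, a * a + b * b)
    return (p - 2 * a * k, q - 2 * b * k)

def on_graph(x, y, b, c, d, e):
    """True if (x, y) lies on y = (bx + c)/(dx + e)."""
    return d * x + e != 0 and y * (d * x + e) == b * x + c

def symmetric_in_both_lines(b, c, d, e, xs):
    """Reflect sample points of the graph in both lines and test membership."""
    # y = x + (b+e)/d is dx - dy + (b+e) = 0; y = -x + (b-e)/d is dx + dy + (e-b) = 0.
    lines = [(d, -d, b + e), (d, d, e - b)]
    for x in xs:
        if d * x + e == 0:
            continue                      # skip the vertical asymptote
        y = Fraction(b * x + c, d * x + e)
        for line in lines:
            rx, ry = reflect(*line, x, y)
            if not on_graph(rx, ry, b, c, d, e):
                return False
    return True

print(symmetric_in_both_lines(2, 1, 1, 3, range(-10, 11)))   # y = (2x + 1)/(x + 3)
```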


Fact 198. Consider the graph of

$$y = \frac{ax^2 + bx + c}{dx + e} \qquad (a, d, ce \neq 0).$$

(a) Intercepts. If e ≠ 0, then there is one y-intercept (0, c/e). (If e = 0, then there are
no y-intercepts.)
If b² − 4ac > 0, then the two x-intercepts are:

$$\left(\frac{-b \pm \sqrt{b^2 - 4ac}}{2a},\ 0\right).$$

If b² − 4ac = 0, then there is only one x-intercept (−b/2a, 0).

And if b² − 4ac < 0, then there are no x-intercepts.
(b) The two turning points are

$$\left(\frac{-e \pm \sqrt{(ae^2 + cd^2 - bde)/a}}{d},\ \frac{bd - 2ae \pm 2\sqrt{a\,(ae^2 + cd^2 - bde)}}{d^2}\right).$$

If ad > 0, then the turning point on the left is a strict local maximum and the one on
the right is a strict local minimum. And if ad < 0, then the one on the left is a strict
local minimum and the one on the right is a strict local maximum.
(c) There are two asymptotes, one oblique and one vertical:

$$y = \frac{a}{d}\,x + \frac{bd - ae}{d^2} \qquad \text{and} \qquad x = -\frac{e}{d}.$$

(Note that since the asymptotes are not perpendicular, this is not a rectangular
hyperbola.)
(d) The hyperbola’s centre is

$$\left(-\frac{e}{d},\ \frac{bd - 2ae}{d^2}\right).$$

(e) The two lines of symmetry are:

$$y = \frac{a \pm \sqrt{a^2 + d^2}}{d}\,x + \frac{bd - ae \pm e\sqrt{a^2 + d^2}}{d^2}.$$


Proof. We already proved (a), (c), and (d) above. Here we prove only (b) and (e).
$$\text{(b)} \quad \frac{dy}{dx} = \frac{d}{dx}\left(\frac{ax^2 + bx + c}{dx + e}\right) = \frac{(dx + e)(2ax + b) - (ax^2 + bx + c)\,d}{(dx + e)^2} = \frac{adx^2 + 2aex + be - cd}{(dx + e)^2}.$$

Hence, dy/dx = 0 ⇐⇒ adx² + 2aex + be − cd = 0 ⇐⇒

$$x = \frac{-2ae \pm \sqrt{4a^2 e^2 - 4ad\,(be - cd)}}{2ad} = \frac{-e \pm \sqrt{(ae^2 + cd^2 - bde)/a}}{d}. \qquad (\star)$$

So, if (ae² + cd² − bde) /a < 0, then there are no stationary points.
If (ae² + cd² − bde) /a = 0, then dy/dx = 0 at x = −e/d. But there is no point in y =
(ax² + bx + c) / (dx + e) at which x = −e/d. And so here, there is no stationary point.
If (ae² + cd² − bde) /a > 0, then there are two stationary points, given by (⋆). Plugging these
values of x into y = (ax² + bx + c) / (dx + e) and doing the algebra (omitted), we can find
the y-values and thus conclude the two stationary points are:

$$P, Q = \left(\frac{-e \pm \sqrt{(ae^2 + cd^2 - bde)/a}}{d},\ \frac{bd - 2ae \pm 2\sqrt{a\,(ae^2 + cd^2 - bde)}}{d^2}\right),$$

with Q being to the left of P .

Observe the numerator of dy/dx is a quadratic expression with coefficient ad on the squared
term. Hence, if ad > 0, then this quadratic is ∪-shaped, so that Q is a strict local maximum
and P is a strict local minimum. Conversely, if ad < 0, then the quadratic is ∩-shaped, so
that Q is a strict local minimum and P is a strict local maximum.
(e) Let (p, q) be a point in the hyperbola, i.e. it satisfies $y = (ax^2 + bx + c)/(dx + e)$. Use
Fact 195 to write down the reflections of the point (p, q) in the lines:

$y = \frac{a \pm \sqrt{a^2 + d^2}}{d}x + \frac{bd - ae \pm e\sqrt{a^2 + d^2}}{d^2}$.
Through an insane amount of algebra (omitted), it is possible to show that these reflection
points also satisfy y = (ax2 + bx + c) / (dx + e), thus proving that this hyperbola is indeed
symmetric in the above lines.397
It remains to be shown that these are the only two lines of symmetry. If there were a third
distinct line of symmetry, then there would be more than two asymptotes. But this is not
the case. Thus, there can be at most two distinct lines of symmetry.

This painful, brute-force method is not exactly the “proper” or “usual” way to find the lines of symmetry
(for which see my “forthcoming” H2 Further Mathematics Textbook), but does avoid having to use other
facts about conic sections that we haven’t discussed.
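
As a quick numerical sanity check of Fact 198(b), the sketch below computes the x-coordinates of the turning points from the formula, finds the y-coordinates by direct substitution, and compares them with the formula's y-values. The function name `turning_points` and the sample curve y = (x² + 1)/x are illustrative choices, not part of the text, and the check is only for one curve with a > 0.

```python
import math

def turning_points(a, b, c, d, e):
    """Turning points of y = (ax^2 + bx + c)/(dx + e), x-values from Fact 198(b)."""
    k = a * e**2 + c * d**2 - b * d * e
    if k / a <= 0:
        return []  # no stationary points in this case
    xs = [(-e + sign * math.sqrt(k / a)) / d for sign in (1, -1)]
    # y-values by direct substitution into the original function
    return [(x, (a * x * x + b * x + c) / (d * x + e)) for x in xs]

# Sample curve (an assumption for illustration): y = (x^2 + 1)/x
a, b, c, d, e = 1, 0, 1, 1, 0
pts = turning_points(a, b, c, d, e)

# y-values predicted by the formula in Fact 198(b)
k = a * e**2 + c * d**2 - b * d * e
formula_ys = [(b * d - 2 * a * e + sign * 2 * math.sqrt(a * k)) / d**2 for sign in (1, -1)]
print(pts, formula_ys)
```

For this curve the two turning points come out as (1, 2) and (−1, −2), in agreement with the formula.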
117.9. Inequalities

Fact 40. Let b ≥ 0. Then:

(a) ∣x∣ < b ⇐⇒ −b < x < b.
(b) ∣x∣ ≤ b ⇐⇒ −b ≤ x ≤ b.
(c) ∣x∣ > b ⇐⇒ (x > b OR x < −b).
(d) ∣x∣ ≥ b ⇐⇒ (x ≥ b OR x ≤ −b).

Proof. (a) That −b < x < b ⟹ ∣x∣ < b is obvious.

To prove that ∣x∣ < b ⟹ −b < x < b, consider the contrapositive: if x ≥ b or x ≤ −b, then
clearly ∣x∣ ≥ b.
(b) Similar to (a).
(c) Simply note that ∣x∣ > b is the negation of ∣x∣ ≤ b and (x > b OR x < −b) is the negation
of −b ≤ x ≤ b. And since by (b), ∣x∣ ≤ b ⇐⇒ −b ≤ x ≤ b, it follows that ∣x∣ > b ⇐⇒
(x > b OR x < −b).
(d) Similar to (c).

Fact 41. Let a ∈ R, b ≥ 0. Then:

(a) ∣x − a∣ < b ⇐⇒ a − b < x < a + b.
(b) ∣x − a∣ ≤ b ⇐⇒ a − b ≤ x ≤ a + b.
(c) ∣x − a∣ > b ⇐⇒ (x > a + b OR x < a − b).
(d) ∣x − a∣ ≥ b ⇐⇒ (x ≥ a + b OR x ≤ a − b).

Proof. (a) By Fact 40(a), ∣x − a∣ < b ⇐⇒ −b < x − a < b ⇐⇒ a − b < x < a + b.

(b) Similar.
(c) By Fact 40(c), ∣x − a∣ > b ⇐⇒ (x − a > b OR x − a < −b) ⇐⇒ (x > a + b OR x < a − b).
(d) Similar.
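
The four equivalences of Fact 41 can also be spot-checked by brute force. The sketch below is only an illustration (the helper name `fact_41_holds` and the grid of test values are assumptions); it tests every combination of x, a, and b ≥ 0 on a small grid of exact fractions.

```python
from fractions import Fraction

def fact_41_holds(x, a, b):
    """Check all four equivalences of Fact 41 at one triple (x, a, b) with b >= 0."""
    dist = abs(x - a)
    return (
        (dist < b) == (a - b < x < a + b)
        and (dist <= b) == (a - b <= x <= a + b)
        and (dist > b) == (x > a + b or x < a - b)
        and (dist >= b) == (x >= a + b or x <= a - b)
    )

# Grid of halves from -4 to 4 (an arbitrary illustrative choice)
grid = [Fraction(n, 2) for n in range(-8, 9)]
all_ok = all(
    fact_41_holds(x, a, b)
    for x in grid for a in grid for b in grid if b >= 0
)
print(all_ok)
```

Setting a = 0 in the same check covers Fact 40.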


118. Appendices for Part II. Sequences and Series

Fact 46. Let $(a_n)_{n=1}^{k}$ be a finite arithmetic sequence. Then:

$\sum_{n=1}^{k} a_n = \frac{k}{2}(a_1 + a_k)$.

Proof. (... Proof continued from p. 395.)

Now, suppose instead k is odd. Then:
$a_1 + a_2 + \dots + a_k = (a_1 + a_k) + (a_2 + a_{k-1}) + \dots + (a_{(k-1)/2} + a_{(k+3)/2}) + a_{(k+1)/2}$.
Note that the RHS of the above equation has one term on its own at the end and, before
that, (k − 1)/2 pairs of terms.
Next: $a_1 + a_k = a_2 + a_{k-1} = a_3 + a_{k-2} = \dots = a_{(k+1)/2 - 1} + a_{(k+1)/2 + 1}$.
Let d = a2 − a1 be the constant difference. Then by Fact 45, we have:

$a_{(k+1)/2} = a_1 + \left(\frac{k+1}{2} - 1\right)d = a_1 + \frac{k-1}{2}d$ and $a_k = a_1 + (k - 1)d$.

Hence: $\frac{a_1 + a_k}{2} = \frac{a_1 + a_1 + (k - 1)d}{2} = a_1 + \frac{k-1}{2}d = a_{(k+1)/2}$.

Thus: $\sum_{n=1}^{k} a_n = \frac{k-1}{2}(a_1 + a_k) + a_{(k+1)/2} = \frac{k-1}{2}(a_1 + a_k) + \frac{a_1 + a_k}{2} = \frac{k}{2}(a_1 + a_k)$.

Corollary 37. Let $(a_n)_{n=1}^{k}$ be a finite arithmetic sequence with d = a2 − a1. Then:

$\sum_{n=1}^{k} a_n = ka_1 + \frac{k(k-1)}{2}d$.

Proof. Use Fact 46, then Fact 45:

$\sum_{n=1}^{k} a_n = \frac{k}{2}(a_1 + a_k) = \frac{k}{2}[a_1 + a_1 + (k - 1)d] = ka_1 + \frac{k(k-1)}{2}d$.
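
A small numerical illustration of Fact 46 and Corollary 37 follows. It is a sketch only: the function names and the sample sequence (first term 3, common difference 4, seven terms) are made up for the example.

```python
from fractions import Fraction

def arithmetic_terms(a1, d, k):
    """The first k terms a_1, ..., a_k of an arithmetic sequence."""
    return [a1 + (n - 1) * d for n in range(1, k + 1)]

def sum_fact_46(a1, d, k):
    """(k/2)(a_1 + a_k), as in Fact 46."""
    terms = arithmetic_terms(a1, d, k)
    return Fraction(k, 2) * (terms[0] + terms[-1])

def sum_corollary_37(a1, d, k):
    """k*a_1 + (k(k-1)/2)*d, as in Corollary 37."""
    return k * a1 + Fraction(k * (k - 1), 2) * d

terms = arithmetic_terms(3, 4, 7)  # 3, 7, 11, 15, 19, 23, 27
direct = sum(terms)
print(direct, sum_fact_46(3, 4, 7), sum_corollary_37(3, 4, 7))  # 105 105 105
```

Here k = 7 is odd, which is the case handled in the proof above; an even k works the same way.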


118.1. Convergence, Divergence, and All That
This subchapter is terse and is provided only for the sake of completeness. In particular,
this subchapter aims simply to furnish proofs of several results in the main text.
Informally, a sequence is said to converge if its terms “eventually” get “arbitrarily” close
to some limit L ∈ R. Formally:

Definition 229. Let (an ) be a (real and infinite) sequence. Let L ∈ R. Suppose that for
all ε > 0, there exists N such that for all n ≥ N , we have:

∣L − an ∣ < ε .

Then we say that the sequence (an ) is convergent and that its limit exists. Moreover, we
say the sequence converges to L and call L its limit.
That the sequence (an ) converges to L may be written as:

$a_n \to L$ or $\lim_{n \to \infty} a_n = L$.

A sequence that is not convergent is called divergent.

Definition 230. Given the series a1 + a2 + a3 + . . . , we call the finite series a1 + a2 + ⋅ ⋅ ⋅ + an
its nth partial sum.

A convergent series is then simply one whose partial sums converge:

Definition 231. Let a1 + a2 + a3 + . . . be a series and sn = a1 + a2 + ⋅ ⋅ ⋅ + an be the
corresponding nth partial sum. Consider the sequence (sn ) = (s1 , s2 , s3 , . . . ).
If the sequence (sn ) converges to some real number L, then we say that the series a1 +a2 +
a3 + . . . is convergent and that its limit exists. Moreover, we say that the series converges
to L and call L its limit.
A series that is not convergent is called divergent.

Example 1224. We can now prove that Grandi’s series 1 − 1 + 1 − 1 + 1 − 1 + . . . diverges.

The corresponding sequence of partial sums is (sn ) = (1, 0, 1, 0, 1, 0, . . . ).
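Pick ε = 0.4 and let L ∈ R. Suppose that for some k, we have ∣L − sk∣ < ε. Then:

$|L - s_{k+1}| = |s_k - s_{k+1} + L - s_k| \geq |s_k - s_{k+1}| - |L - s_k| = 1 - |L - s_k| > 1 - \varepsilon > \varepsilon$.

So, whatever L is, no two consecutive partial sums are both within ε of L. We have just
shown that (sn ) diverges. Hence, Grandi's series diverges.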
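
To see the oscillation concretely, the following sketch lists the first ten partial sums of Grandi's series and the gaps between consecutive partial sums. The variable names and the cutoff of ten terms are arbitrary choices for illustration.

```python
from itertools import accumulate, cycle, islice

# First ten terms of Grandi's series: 1, -1, 1, -1, ...
terms = list(islice(cycle([1, -1]), 10))

# Partial sums s_1, s_2, ..., s_10
partial_sums = list(accumulate(terms))
print(partial_sums)  # [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]

# Consecutive partial sums always differ by exactly 1, so they cannot
# both lie within 0.4 of any single candidate limit L.
gaps = [abs(s - t) for s, t in zip(partial_sums, partial_sums[1:])]
print(set(gaps))  # {1}
```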

We now introduce the Reverse Triangle Inequality.

Fact 199. (The Reverse Triangle Inequality.) Let x, y ∈ R. Then ∣x − y∣ ≥ ∣∣x∣ − ∣y∣∣.

Most proofs of the Reverse Triangle Inequality make use of the Triangle Inequality, but
here's a fun one that doesn't:398

Proof. The given inequality ∣x − y∣ ≥ ∣∣x∣ − ∣y∣∣ is equivalent to $(x - y)^2 \geq (|x| - |y|)^2$ (1), or:

$x^2 + y^2 - 2xy \geq |x|^2 + |y|^2 - 2|x||y| = x^2 + y^2 - 2|x||y|$ or $-2xy \geq -2|x||y|$ or $xy \leq |x||y|$. (2)

Since (2) is always true, so too is (1).

Fact 47. Other than the zero series, every (infinite) arithmetic series diverges.

Proof. Let (an ) be a non-zero arithmetic series with d = a2 − a1 and $s_n = \sum_{i=1}^{n} a_i$. We have:

$s_{k+1} = (\text{First term} + \text{Last term}) \times \frac{\text{Number of terms}}{2} = \frac{k+1}{2}(a_1 + a_{k+1}) = \frac{k+1}{2}[a_1 + (a_1 + kd)] = (k+1)a_1 + \frac{k(k+1)}{2}d$.
Case 1. Suppose d = 0.
Since the given arithmetic series is non-zero, we must have a1 ≠ 0. Pick ε = ∣a1∣/2 and let
L ∈ R. Suppose that for some k, we have ∣L − sk∣ < ε. Then:

$|L - s_{k+1}| = |L - s_k - a_1| \geq \big| |L - s_k| - |a_1| \big| > \frac{|a_1|}{2} = \varepsilon$,

where ≥ uses the Reverse Triangle Inequality and > uses ∣L − sk∣ < ε = ∣a1∣/2.

And so, (sn ) diverges.

Case 2. Now suppose instead d ≠ 0.
If d > 0, then let j be the smallest integer such that aj = a1 + (j − 1) d > 0. And if d < 0,
then let j be the smallest integer such that aj = a1 + (j − 1) d < 0.
Pick ε = ∣d∣/2 and let L ∈ R. Suppose that for some k > j, we have ∣L − sk∣ < ε. Then
∣ak+1∣ = ∣ak + d∣ > ∣aj + d∣ > ∣d∣ and:

$|L - s_{k+1}| = |L - s_k - a_{k+1}| \geq \big| |L - s_k| - |a_{k+1}| \big| > \frac{|d|}{2} = \varepsilon$,

where ≥ uses the Reverse Triangle Inequality and > uses ∣L − sk∣ < ε = ∣d∣/2 and ∣ak+1∣ > ∣d∣.

And so, (sn ) diverges.

Fact 51. If ∣r∣ ≥ 1, then a1 + a1 r + a1 r2 + a1 r3 + . . . diverges.

Proof. Let sn be the nth partial sum of the given series.

Pick ε = ∣a1∣/2 and let L ∈ R. Suppose that for some k, we have ∣L − sk∣ < ε. Then:

$|L - s_{k+1}| = |L - s_k - a_1 r^k| \geq \big| |L - s_k| - |a_1 r^k| \big| > |a_1|/2 = \varepsilon$,

where ≥ uses the Reverse Triangle Inequality and > uses ∣L − sk∣ < ε = ∣a1∣/2 and $|a_1 r^k| \geq |a_1|$.

Taken from .


119. Appendices for Part III. Vectors
Note that much of the discussion in these Appendices can be handled more easily with the
machinery and terminology of linear algebra. But in these Appendices, I shall avoid the
explicit use of linear algebra because it isn’t in H2 Maths.

119.1. Some General Definitions

Above we gave the definitions of various terms like vector, length, and unit vector in
R2 or R3 . We now give their general definitions in Rn :

Definition 232. A point in Rn is any ordered n-tuple of real numbers.

Definition 233. The point (0, 0, . . . , 0) in Rn is called the origin and is denoted O.

Definition 234. Let A = (a1 , a2 , . . . , an ) and B = (b1 , b2 , . . . , bn ) be points in Rn . Then

the vector from A to B, denoted AB, is the ordered n-tuple:399
AB = (b1 − a1 , b2 − a2 , . . . , bn − an ) ∈ Rn .

Definition 235. Given the vector u = (u1 , u2 , . . . , un ) ∈ Rn , its length (or norm or
magnitude), denoted ∣u∣, is the number:

$|u| = \sqrt{u_1^2 + u_2^2 + \dots + u_n^2}$.

Definition 236. Given u = (u1 , u2 , . . . , un ) and c ∈ R, the vector cu is:

cu = (cu1 , cu2 , . . . , cun ) .

Definition 237. Given the points A = (a1 , a2 , . . . , an ) and B = (b1 , b2 , . . . , bn ), the differ-
ence B − A is defined as the vector AB. That is:
B − A = AB = (b1 − a1 , b2 − a2 , . . . , bn − an ) .

Definition 238. Given a point A = (a1 , a2 , . . . , an ) and a vector v = (v1 , v2 , . . . , vn ), their

sum A + v is the following point:

A + v = (a1 + v1 , a2 + v2 , . . . , an + vn ) .

Definition 239. Given a point B = (b1 , b2 , . . . , bn ) and a vector v = (v1 , v2 , . . . , vn ), the

difference B − v is defined as the following point:

B − v = (b1 − v1 , b2 − v2 , . . . , bn − vn ) .

Note that technically, the set Rn that contains the points A and B is different from the set Rn that
contains the vector AB. The former is a Euclidean space, while the latter is a vector space. This
though is somewhat beyond the scope of H2 Maths and so let’s not worry any further about this.
Definition 240. Given the vectors u = (u1 , u2 , . . . , un ) and v = (v1 , v2 , . . . , vn ), their sum,
denoted u + v, is the vector:

u + v = (u1 + v1 , u2 + v2 , . . . , un + vn ) .

Definition 241. Given the vector u = (u1 , u2 , . . . , un ), its additive inverse, denoted −u,
is the vector:

−u = (−u1 , −u2 , . . . , −un ) .

Definition 242. Given the vectors u = (u1 , u2 , . . . , un ) and v = (v1 , v2 , . . . , vn ), the
difference u − v is defined as the sum of u and −v.

Fact 200. If u = (u1 , u2 , . . . , un ) and v = (v1 , v2 , . . . , vn ) are vectors, then:

u − v = (u1 − v1 , u2 − v2 , . . . , un − vn ) .

Proof. By Definition 241, −v = (−v1 , −v2 , . . . , −vn ). And now by Definition 242,

u − v = u + (−v) = (u1 − v1 , u2 − v2 , . . . , un − vn ) .

Fact 201. Suppose A, B, and C are points. Then $\overrightarrow{AB} - \overrightarrow{AC} = \overrightarrow{CB}$.

Proof. Let A = (a1 , a2 , . . . , an ), B = (b1 , b2 , . . . , bn ), and C = (c1 , c2 , . . . , cn ). Then by
Definition 237, $\overrightarrow{AB}$ = (b1 − a1 , b2 − a2 , . . . , bn − an ), $\overrightarrow{AC}$ = (c1 − a1 , c2 − a2 , . . . , cn − an ), and
$\overrightarrow{CB}$ = (b1 − c1 , b2 − c2 , . . . , bn − cn ). And now by Fact 200,

$\overrightarrow{AB} - \overrightarrow{AC}$ = (b1 − a1 , b2 − a2 , . . . , bn − an ) − (c1 − a1 , c2 − a2 , . . . , cn − an )
= (b1 − c1 , b2 − c2 , . . . , bn − cn ) = $\overrightarrow{CB}$.

Definition 243. Given the non-zero vector u = (u1 , u2 , . . . , un ) ∈ Rn , the unit vector in
its direction (or its unit vector) is denoted û and is defined by:

$\hat{u} = \frac{1}{|u|} u$.

So, given the vector u, its unit vector û is simply the vector that points in the same direction
but has length 1.


119.2. Some Basic Results

Fact 202. Let a = (a1 , a2 , . . . , an ) be a vector and c ∈ R. Then ∣ca∣ = ∣c∣ ∣a∣.

Proof. By Definitions 236 and 235:

$|ca| = |c(a_1, a_2, \dots, a_n)| = |(ca_1, ca_2, \dots, ca_n)| = \sqrt{\sum_{i=1}^{n} (ca_i)^2} = \sqrt{\sum_{i=1}^{n} c^2 a_i^2} = \sqrt{c^2 \sum_{i=1}^{n} a_i^2} = |c| \sqrt{\sum_{i=1}^{n} a_i^2} = |c||a|$.

Fact 60. Let a and b be non-zero vectors. Then:

(a) â = b̂ ⇐⇒ a and b point in the same direction.
(b) â = −b̂ ⇐⇒ a and b point in exact opposite directions.
(c) â = ±b̂ ⇐⇒ a and b are parallel.
(d) â ≠ ±b̂ ⇐⇒ a and b are non-parallel.

Proof. (a) Suppose â = b̂. Then a/ ∣a∣ = b/ ∣b∣ or a = (∣a∣ / ∣b∣) b. Since a = kb for some
k > 0, by Definition 103, they point in the same direction.
Now suppose instead that a and b point in the same direction. Then by Definition 103,
there exists k > 0 such that a = kb. Thus:
$\hat{a} = \frac{1}{|a|}a = \frac{1}{|kb|}kb = \frac{1}{|k||b|}kb = \frac{1}{|b|}b = \hat{b}$.

(b) is similar and thus omitted. (c) and (d) follow from (a) and (b).

Fact 53. Suppose v is a vector. Then ∣v∣ ≥ 0. Moreover, ∣v∣ = 0 ⇐⇒ v = 0.

Proof. Let v = (v1 , v2 , . . . , vn ). Then $|v| = \sqrt{\sum v_i^2} \geq 0$, with ∣v∣ = 0 ⇐⇒ v = 0.


The following result holds only in two-dimensional space:

Fact 61. Let a, b, and c be vectors. If a ∥/ b, then there are α, β ∈ R such that:

c = αa + βb.

Proof. Let a = (a1 , a2 ) and b = (b1 , b2 ). Let c = (c1 , c2 ) be any vector.

Suppose a1 = 0. Then a2 ≠ 0 (because a ≠ 0) and b1 ≠ 0 (because a ∦ b).

Now pick: $\alpha = \frac{b_1 c_2 - b_2 c_1}{a_2 b_1}$ and $\beta = \frac{c_1}{b_1}$.

Then: $\alpha a + \beta b = \frac{b_1 c_2 - b_2 c_1}{a_2 b_1}\begin{pmatrix} 0 \\ a_2 \end{pmatrix} + \frac{c_1}{b_1}\begin{pmatrix} b_1 \\ b_2 \end{pmatrix} = \begin{pmatrix} 0 + c_1 \\ \frac{b_1 c_2 - b_2 c_1}{a_2 b_1}a_2 + \frac{c_1 b_2}{b_1} \end{pmatrix} = \begin{pmatrix} c_1 \\ c_2 \end{pmatrix} = c$.

The cases where a2 = 0, b1 = 0, or b2 = 0 are similarly handled.

Now suppose a1 , a2 , b1 , b2 ≠ 0. Since a ∦ b, we have a1 /a2 ≠ b1 /b2 and thus a1 b2 − a2 b1 ≠ 0.

Now pick: $\alpha = \frac{b_2 c_1 - b_1 c_2}{a_1 b_2 - a_2 b_1}$ and $\beta = \frac{a_1 c_2 - a_2 c_1}{a_1 b_2 - a_2 b_1}$.

Then: $\alpha a + \beta b = \frac{b_2 c_1 - b_1 c_2}{a_1 b_2 - a_2 b_1}\begin{pmatrix} a_1 \\ a_2 \end{pmatrix} + \frac{a_1 c_2 - a_2 c_1}{a_1 b_2 - a_2 b_1}\begin{pmatrix} b_1 \\ b_2 \end{pmatrix}$

$= \begin{pmatrix} \frac{a_1 b_2 c_1 - a_1 b_1 c_2 + a_1 b_1 c_2 - a_2 b_1 c_1}{a_1 b_2 - a_2 b_1} \\ \frac{a_2 b_2 c_1 - a_2 b_1 c_2 + a_1 b_2 c_2 - a_2 b_2 c_1}{a_1 b_2 - a_2 b_1} \end{pmatrix} = \begin{pmatrix} c_1 \\ c_2 \end{pmatrix} = c$.
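
The coefficients α and β chosen in the proof can be computed mechanically. The sketch below is illustrative only (the function name `linear_combination` and the sample vectors are assumptions); it uses the general-case formulas, which also reduce to the a1 = 0 formulas whenever a1 b2 − a2 b1 ≠ 0.

```python
from fractions import Fraction

def linear_combination(a, b, c):
    """Return (alpha, beta) with c = alpha*a + beta*b for non-parallel 2D vectors a, b."""
    a1, a2 = a
    b1, b2 = b
    c1, c2 = c
    det = a1 * b2 - a2 * b1
    if det == 0:
        raise ValueError("a and b are parallel (or one of them is zero)")
    alpha = Fraction(b2 * c1 - b1 * c2, det)
    beta = Fraction(a1 * c2 - a2 * c1, det)
    return alpha, beta

# Sample vectors with integer entries (chosen for illustration)
a, b, c = (1, 2), (3, 1), (5, 5)
alpha, beta = linear_combination(a, b, c)
recovered = (alpha * a[0] + beta * b[0], alpha * a[1] + beta * b[1])
print(alpha, beta, recovered)
```

For these vectors, α = 2 and β = 1, and 2a + b does equal c.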

Fact 62. Let u and v be vectors. Suppose v is a line’s direction vector. Then:

u is also that line’s direction vector ⇐⇒ u ∥ v.

Proof. Suppose l is the line described by R = P + λv (λ ∈ R), where P ∈ Rn is some point.

Let A and B be distinct points on l. Since A, B ∈ l, there are distinct real numbers α and
β such that A = P + αv and B = P + βv.
Thus, AB = (β − α) v, where β − α ≠ 0. We have just proven that any direction vector of l
must be a non-zero scalar multiple of the vector v.
We next prove that any non-zero scalar multiple of the vector v must be a direction vector
of l. Let k ≠ 0. Let C = A + kv = P + (α + k) v. Observe that C ∈ l. Thus, kv = AC is a
direction vector of l.


119.3. Scalar Product

Definition 244. Let u = (u1 , u2 , . . . , un ) and v = (v1 , v2 , . . . , vn ) be vectors. Then their
scalar product u ⋅ v is the number $\sum_{i=1}^{n} u_i v_i$.

Fact 65. Let a, b, and c be vectors. Then:

(a) a ⋅ b = b ⋅ a. (Commutativity)
(b) a ⋅ (b + c) = a ⋅ b + a ⋅ c. (Distributivity over addition)

Proof. Let a = (a1 , a2 , . . . , an ), b = (b1 , b2 , . . . , bn ), and c = (c1 , c2 , . . . cn ).

n n
Then: a ⋅ b = ∑ ai bi = ∑ bi ai = b ⋅ a.
i=1 i=1
n n n
And: a ⋅ (b + c) = ∑ ai (bi + ci ) = ∑ ai bi + ∑ ai ci = a ⋅ b + a ⋅ c.
i=1 i=1 i=1

Fact 66. Let a and b be vectors and c ∈ R be a scalar. Then:

(ca) ⋅ b = c (a ⋅ b).

Proof. Let a = (a1 , a2 , . . . , an ) and b = (b1 , b2 , . . . , bn ). By Fact 52(b), we have:

n n
(ca) ⋅ b = ∑ (cai ) bi = c ∑ ai bi = c (a ⋅ b).
i=1 i=1
A vector’s length is the square root of its scalar product with itself:

Fact 67. Suppose v is a vector. Then $|v|^2 = v \cdot v$ and $|v| = \sqrt{v \cdot v}$.

Proof. By Def. 235, $|v| = \sqrt{\sum v_i^2}$. By Def. 244, $v \cdot v = \sum v_i v_i = \sum v_i^2$. Thus, $|v| = \sqrt{v \cdot v}$.

Fact 69. (Cauchy's Inequality.) Let u and v be non-zero vectors. Then:

$-1 \leq \frac{u \cdot v}{|u||v|} \leq 1$.

Equivalently: $-|u||v| \leq u \cdot v \leq |u||v|$ or $(u \cdot v)^2 \leq |u|^2 |v|^2$.

Proof. Let x ∈ R and $S = |u + xv|^2$. Write:

$S = |u + xv|^2 = (u + xv) \cdot (u + xv) = u \cdot u + 2(u \cdot v)x + (v \cdot v)x^2 = |u|^2 + 2(u \cdot v)x + |v|^2 x^2$.

Observe that S is a quadratic expression in x. Moreover, S ≥ 0 for all x. So, its discriminant
must be non-positive. That is, $(2u \cdot v)^2 - 4|v|^2 |u|^2 \leq 0$.

Rearranging, we have $\frac{(u \cdot v)^2}{|u|^2 |v|^2} \leq 1$. Then take square roots to get: $-1 \leq \frac{u \cdot v}{|u||v|} \leq 1$.
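
Here is a small numerical illustration of Cauchy's Inequality and of the discriminant used in its proof. The helper names and the two sample vectors are illustrative assumptions only.

```python
import math

def dot(u, v):
    """Scalar product of two vectors of the same dimension."""
    return sum(ui * vi for ui, vi in zip(u, v))

def norm(u):
    """Length of a vector, as the square root of its scalar product with itself."""
    return math.sqrt(dot(u, u))

u, v = (1, 2, 2), (3, 0, 4)          # |u| = 3, |v| = 5, u . v = 11
ratio = dot(u, v) / (norm(u) * norm(v))

# Discriminant of S = |u|^2 + 2(u . v)x + |v|^2 x^2, computed exactly with integers
discriminant = (2 * dot(u, v)) ** 2 - 4 * dot(v, v) * dot(u, u)
print(ratio, discriminant)
```

The ratio is 11/15, which lies in [−1, 1], and the discriminant 484 − 900 is negative, as the proof requires for non-parallel vectors.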


Fact 71. Let u and v be non-zero vectors. Then:
(a) $\frac{u \cdot v}{|u||v|} = 1$ ⇐⇒ u and v point in the same direction.
(b) $\frac{u \cdot v}{|u||v|} = -1$ ⇐⇒ u and v point in exact opposite directions.
(c) $\frac{u \cdot v}{|u||v|} = \pm 1$ ⇐⇒ u and v are parallel.
(d) $\frac{u \cdot v}{|u||v|} \in (-1, 1)$ ⇐⇒ u and v point in different directions.

Proof. We first prove ⟸ of (a). If u and v point in the same direction, then by Definition
103, there exists k > 0 such that u = kv and so:

$\frac{u \cdot v}{|u||v|} = \frac{(kv) \cdot v}{|kv||v|} = \frac{k(v \cdot v)}{|k||v||v|} = \frac{k|v||v|}{k|v||v|} = 1$.

The proof of ⟸ of (b) is very similar. If u and v point in the exact opposite directions,
then by Definition 103, there exists k < 0 such that u = kv and so:

$\frac{u \cdot v}{|u||v|} = \frac{(kv) \cdot v}{|kv||v|} = \frac{k(v \cdot v)}{|k||v||v|} = \frac{k|v||v|}{-k|v||v|} = -1$.

In the remainder of this proof, we prove ⟹ of (a) and (b).

We first show that if u ∦ v, then u ⋅ v ≠ ± ∣u∣ ∣v∣. We'll use the same idea that was used in
the proof of Cauchy's Inequality:
If u ∦ v, then u ≠ xv for all x ∈ R. So, u − xv ≠ 0 and ∣u − xv∣ > 0. Thus:

$S = |u - xv|^2 = (u - xv) \cdot (u - xv) = |u|^2 - 2(u \cdot v)x + |v|^2 x^2 > 0$.

Observe that S is a quadratic expression in x, with positive coefficient on $x^2$. And since
S > 0 for all x, its discriminant must be negative:

$(-2u \cdot v)^2 - 4|v|^2 |u|^2 < 0$.

Rearranging, we get $(u \cdot v)^2 < |u|^2 |v|^2$ and thus u ⋅ v ≠ ± ∣u∣ ∣v∣.

We have just shown that if u ∦ v, then u ⋅ v ≠ ± ∣u∣ ∣v∣. And so, by the contrapositive, if
u ⋅ v = ± ∣u∣ ∣v∣, then u ∥ v.
Now, if u ⋅ v ≠ ∣u∣ ∣v∣, then by ⟸ of (a), u and v do not point in the same direction.
Thus, if u ⋅ v = − ∣u∣ ∣v∣, then u and v must point in the exact opposite directions. ✓
Similarly, if u ⋅ v ≠ − ∣u∣ ∣v∣, then by ⟸ of (b), u and v do not point in the exact opposite
directions. Thus, if u ⋅ v = ∣u∣ ∣v∣, then u and v must point in the same direction. ✓


119.4. Angles

Definition 245. The standard basis vector in the ith direction (or ith standard basis
vector), denoted ei , is the vector whose ith coordinate is 1 and other coordinates are 0.

Definition 246. The ith-direction cosine of the vector v = (v1 , v2 , . . . , vn ) is the number $\frac{v_i}{|v|}$.

Fact 203. Suppose θ is the angle between a vector v = (v1 , v2 , . . . , vn ) and ei . Then:

$\cos\theta = \frac{v_i}{|v|}$.

Proof. By Definition 111: $\cos\theta = \frac{v \cdot e_i}{|v||e_i|} = \frac{v_i \cdot 1 + \sum_{j \neq i} v_j \cdot 0}{|v| \cdot 1} = \frac{v_i}{|v|}$.

The following Fact says that given two lines, we can choose any direction vector for each
and the calculated angle between the two chosen direction vectors will be fixed:

Fact 204. If one line has direction vectors u1 and v1 , while another has u2 and v2 , then:

$\frac{|u_1 \cdot u_2|}{|u_1||u_2|} = \frac{|v_1 \cdot v_2|}{|v_1||v_2|}$.

Proof. There exist non-zero real numbers λ and µ such that u1 = λv1 and u2 = µv2 . Thus:

$\frac{|u_1 \cdot u_2|}{|u_1||u_2|} = \frac{|(\lambda v_1) \cdot (\mu v_2)|}{|\lambda v_1||\mu v_2|} = \frac{|\lambda||\mu||v_1 \cdot v_2|}{|\lambda||\mu||v_1||v_2|} = \frac{|v_1 \cdot v_2|}{|v_1||v_2|}$.

Corollary 9. Suppose θ is the angle between two lines. (a) If θ = 0, then the two lines
are parallel. And (b) if θ = π/2, then they are perpendicular.

Proof. Suppose the two lines are l1 and l2 , with direction vectors u and v.
(a) If θ = 0, then by Corollary 8:

$\cos^{-1}\frac{|u \cdot v|}{|u||v|} = 0$ or $\frac{|u \cdot v|}{|u||v|} = \cos 0 = 1$ or $u \cdot v = \pm|u||v|$.

And so by Fact 71, u ∥ v.
(b) If θ = π/2, then by Corollary 8:

$\cos^{-1}\frac{|u \cdot v|}{|u||v|} = \frac{\pi}{2}$ or $\frac{|u \cdot v|}{|u||v|} = \cos\frac{\pi}{2} = 0$ or $u \cdot v = 0$.

And so by Definition 112, u ⊥ v.


119.5. The Relationship Between Two Lines

Fact 75. If two lines are:

(a) Identical, then they are also parallel.
(b) Distinct and parallel, then they do not intersect.
(c) Distinct, then they share at most one intersection point.

Proof. (a) If two lines are identical, then they also share a direction vector, so that by
Definition 116, they are parallel.
(b) Suppose two lines are parallel. Then by Definition 116 and Fact 62, they share some
direction vector u.
Suppose also that they intersect at some point S. Then both lines can be described by
r = OS + λu and are identical.
Thus, if two parallel lines are distinct, then they cannot intersect.
(c) We already showed that two distinct and parallel lines do not intersect. We now show
that two distinct and non-parallel lines share at most one intersection point.
Suppose for contradiction that two distinct lines share two distinct intersection points P
and Q. Then $\overrightarrow{PQ}$ is a direction vector of both lines. Thus, both lines can be described by
$r = \overrightarrow{OP} + \lambda\overrightarrow{PQ}$ and must thus be identical.

Fact 76. If two lines (in 2D space) are distinct and non-parallel, then they must share
exactly one intersection point.

Proof. Suppose two lines are described by:

r = (p1 , p2 ) + λ (u1 , u2 ) and r = (q1 , q2 ) + µ (v1 , v2 ) (λ, µ ∈ R).

By definition of a line, at least one of u1 or u2 must be non-zero. So, suppose without loss
of generality that u1 ≠ 0.
• If v1 = 0, then v2 ≠ 0 and so u1 v2 − u2 v1 = u1 v2 ≠ 0.
• If v1 ≠ 0, then u2 /u1 ≠ v2 /v1 (because the two lines are not parallel) and so u1 v2 −u2 v1 ≠ 0.
Thus, u1 v2 − u2 v1 ≠ 0. The reader can verify (through rather tedious algebra) that the two
lines intersect at the following parameter values:
$\hat{\mu} = \frac{u_1(p_2 - q_2) + u_2(q_1 - p_1)}{u_1 v_2 - u_2 v_1}$ and $\hat{\lambda} = \frac{q_1 + \hat{\mu}v_1 - p_1}{u_1}$.
This intersection point is also unique because by Fact 75, two lines can share at most one
intersection point.
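
The parameter values in the proof of Fact 76 translate directly into a short computation. This is a sketch under the proof's own assumptions (u1 ≠ 0 and non-parallel direction vectors); the function name `intersect_2d` and the sample lines are illustrative.

```python
from fractions import Fraction

def intersect_2d(p, u, q, v):
    """Intersection of r = p + lam*u and r = q + mu*v, assuming u[0] != 0 and u not parallel to v."""
    (p1, p2), (u1, u2), (q1, q2), (v1, v2) = p, u, q, v
    det = u1 * v2 - u2 * v1
    if u1 == 0 or det == 0:
        raise ValueError("need u1 != 0 and non-parallel direction vectors")
    mu = Fraction(u1 * (p2 - q2) + u2 * (q1 - p1), det)
    lam = (q1 + mu * v1 - p1) / u1
    point = (p1 + lam * u1, p2 + lam * u2)
    return lam, mu, point

# Sample lines: r = (0, 0) + lam*(1, 1) and r = (0, 2) + mu*(1, -1)
lam, mu, point = intersect_2d((0, 0), (1, 1), (0, 2), (1, -1))
on_second_line = (0 + mu * 1, 2 + mu * (-1))
print(lam, mu, point, on_second_line)
```

Both parameter values equal 1 here, and both lines pass through (1, 1) at those values.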


119.6. Lines in 3D Space

Fact 92. Suppose the line l is described by r = (p1 , p2 , p3 ) + λ(v1 , v2 , v3 ) (λ ∈ R).

(1) If v1 , v2 , v3 ≠ 0, then l can be described by:

x − p1 y − p2 z − p3
= = .
v1 v2 v3
(2) If v1 = 0 and v2 , v3 ≠ 0, then l is perpendicular to the x-axis and can be described by:

y − p2 z − p 3
x = p1 and = .
v2 v3
(3) If v2 = 0 and v1 , v3 ≠ 0, then l is perpendicular to the y-axis and can be described by:

x − p1 z − p 3
y = p2 and = .
v1 v3
(4) If v3 = 0 and v1 , v2 ≠ 0, then l is perpendicular to the z-axis and can be described by:

x − p1 y − p2
z = p3 and = .
v1 v2
(5) If v1 , v2 = 0, then l is perpendicular to the x- and y-axes and can be described by:

x = p1 and y = p2 .

(6) If v1 , v3 = 0, then l is perpendicular to the x- and z-axes and can be described by:

x = p1 and z = p3 .

(7) If v2 , v3 = 0, then l is perpendicular to the y- and z-axes and can be described by:

y = p2 and z = p3 .

Proof. Write: $x = p_1 + \lambda v_1$ (1), $y = p_2 + \lambda v_2$ (2), and $z = p_3 + \lambda v_3$ (3).

Now, v1 × (2) minus v2 × (1) yields:

$v_1 y - v_2 x = v_1(p_2 + \lambda v_2) - v_2(p_1 + \lambda v_1) = v_1 p_2 - v_2 p_1$ or $v_2(x - p_1) = v_1(y - p_2)$. (4)

Similarly, v2 × (3) minus v3 × (2) and v1 × (3) minus v3 × (1) yield:

$v_2(z - p_3) = v_3(y - p_2)$ (5) and $v_1(z - p_3) = v_3(x - p_1)$. (6)

(1) If v1 , v2 , v3 ≠ 0, then divide (4) by v1 v2 and (5) by v2 v3 to get:

$\frac{x - p_1}{v_1} = \frac{y - p_2}{v_2}$ and $\frac{z - p_3}{v_3} = \frac{y - p_2}{v_2}$.

(2) If v1 = 0 and v2 , v3 ≠ 0, then (4) becomes x = p1 and we can divide (5) by v2 v3 to get:

$\frac{z - p_3}{v_3} = \frac{y - p_2}{v_2}$.

Since (0, v2 , v3 ) ⋅ i = 0, l is perpendicular to the x-axis.

(3) If v2 = 0 and v1 , v3 ≠ 0, then (5) becomes y = p2 and we can divide (6) by v1 v3 to get:

$\frac{z - p_3}{v_3} = \frac{x - p_1}{v_1}$.

Since (v1 , 0, v3 ) ⋅ j = 0, l is perpendicular to the y-axis.

(4) If v3 = 0 and v1 , v2 ≠ 0, then (6) becomes z = p3 and we can divide (4) by v1 v2 to get:

$\frac{x - p_1}{v_1} = \frac{y - p_2}{v_2}$.

Since (v1 , v2 , 0) ⋅ k = 0, l is perpendicular to the z-axis.

(5) If v1 , v2 = 0, then v3 ≠ 0 and so (6) and (5) become:

x = p1 and y = p2 .

Since (0, 0, v3 ) ⋅ i = 0 and (0, 0, v3 ) ⋅ j = 0, l is perpendicular to both the x- and y-axes.

(6) If v1 , v3 = 0, then v2 ≠ 0 and so (4) and (5) become:

x = p1 and z = p3 .

Since (0, v2 , 0) ⋅ i = 0 and (0, v2 , 0) ⋅ k = 0, l is perpendicular to both the x- and z-axes.

(7) If v2 , v3 = 0, then v1 ≠ 0 and so (4) and (6) become:

y = p2 and z = p3 .

Since (v1 , 0, 0) ⋅ j = 0 and (v1 , 0, 0) ⋅ k = 0, l is perpendicular to both the y- and z-axes.


119.7. Projection Vectors

Fact 205. If u and v are non-zero vectors, then (u ⋅ v̂) v̂ is the unique vector such that:

(a) (u ⋅ v̂) v̂ ∥ v; and (b) u − (u ⋅ v̂) v̂ ⊥ v.

Proof. We first verify that (u ⋅ v̂) v̂ satisfies (a) and (b).

(a) Let θ be the angle between (u ⋅ v̂) v̂ and v. Then:

$\cos\theta = \frac{[(u \cdot \hat{v})\hat{v}] \cdot v}{|(u \cdot \hat{v})\hat{v}||v|} = \frac{(u \cdot \hat{v})(\hat{v} \cdot v)}{|u \cdot \hat{v}||\hat{v}||v|} = \frac{u \cdot \hat{v}}{|u \cdot \hat{v}|} = \pm 1$.

Thus, θ = 0 or θ = π, and so (u ⋅ v̂) v̂ ∥ v.

(b) By Definition 112, u − (u ⋅ v̂) v̂ ⊥ v ⇐⇒ [u − (u ⋅ v̂) v̂] ⋅ v = 0. But this last equation
is true, as we now verify:

$[u - (u \cdot \hat{v})\hat{v}] \cdot v = u \cdot v - (u \cdot \hat{v})(\hat{v} \cdot v) = u \cdot v - \left(u \cdot \frac{v}{|v|}\right)\left(\frac{v}{|v|} \cdot v\right)$
$= u \cdot v - \frac{1}{|v|^2}(u \cdot v)(v \cdot v) = u \cdot v - \frac{1}{|v|^2}(u \cdot v)|v|^2 = 0$.

We now show that (u ⋅ v̂) v̂ is the unique vector that satisfies (a) and (b).
Suppose w is also a vector that satisfies (a) and (b). That is:
(a) w ∥ v; and (b) u − w ⊥ v.
Since w ∥ v, there exists λ ≠ 0 such that w = λv. And since u − w ⊥ v, we have:

$(u - \lambda v) \cdot v = 0$ or $u \cdot v - \lambda v \cdot v = 0$ or $u \cdot v = \lambda v \cdot v$ or $\lambda = \frac{u \cdot v}{v \cdot v}$.

Altogether then: $w = \lambda v = \frac{u \cdot v}{v \cdot v} v = (u \cdot \hat{v})\hat{v}$.
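
The projection vector can be computed without square roots by using the equivalent form ((u ⋅ v)/(v ⋅ v)) v from the end of the proof. The sketch below is illustrative only; the function names and sample vectors are assumptions.

```python
from fractions import Fraction

def dot(u, v):
    """Scalar product of two vectors of the same dimension."""
    return sum(ui * vi for ui, vi in zip(u, v))

def projection(u, v):
    """(u . v_hat) v_hat, computed as ((u . v)/(v . v)) v to keep the arithmetic exact."""
    scale = Fraction(dot(u, v), dot(v, v))
    return tuple(scale * vi for vi in v)

u, v = (3, 4, 0), (1, 0, 1)
w = projection(u, v)
rejection = tuple(ui - wi for ui, wi in zip(u, w))

# Property (a): w is a scalar multiple of v. Property (b): u - w is perpendicular to v.
print(w, dot(rejection, v))
```

Here w = (3/2, 0, 3/2) = (3/2)v, and the scalar product of u − w with v is 0.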

Fact 206. (Lagrange's Identity.)

(a) $(a_1^2 + a_2^2)(b_1^2 + b_2^2) - (a_1 b_1 + a_2 b_2)^2 = (a_1 b_2 - a_2 b_1)^2$.

(b) $(a_1^2 + a_2^2 + a_3^2)(b_1^2 + b_2^2 + b_3^2) - (a_1 b_1 + a_2 b_2 + a_3 b_3)^2 = (a_1 b_2 - a_2 b_1)^2 + (a_3 b_1 - a_1 b_3)^2 + (a_2 b_3 - a_3 b_2)^2$.

Proof. We proved (a) in Exercise 194(e) and (b) in Exercise 222(e).
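
Lagrange's Identity is a polynomial identity, so it can be spot-checked exactly on integer inputs. The following sketch (function name and grid are illustrative assumptions) evaluates the difference between the two sides of (b) on a small grid; setting a3 = b3 = 0 recovers (a).

```python
from itertools import product

def lagrange_gap_3d(a, b):
    """LHS minus RHS of Lagrange's Identity (b) for 3D vectors a and b."""
    a1, a2, a3 = a
    b1, b2, b3 = b
    lhs = (a1**2 + a2**2 + a3**2) * (b1**2 + b2**2 + b3**2) - (a1*b1 + a2*b2 + a3*b3) ** 2
    rhs = (a1*b2 - a2*b1) ** 2 + (a3*b1 - a1*b3) ** 2 + (a2*b3 - a3*b2) ** 2
    return lhs - rhs

# All integer vectors with entries in {-2, ..., 2}
grid = range(-2, 3)
gaps = {
    lagrange_gap_3d(a, b)
    for a in product(grid, repeat=3)
    for b in product(grid, repeat=3)
}
print(gaps)  # {0}
```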


Fact 85. Let a and b be vectors. Then:

∣rejb a∣ = ∣a × b̂∣ .

Proof. By the Pythagorean Theorem (Theorem 8):

$|\operatorname{rej}_b a|^2 = |a|^2 - |\operatorname{proj}_b a|^2$.

We will prove this claim twice, once in the 2D case and again in the 3D case. In each case,
we will use Lagrange's Identity (LI).
2D case. Let a = (a1 , a2 ) and b = (b1 , b2 ). Then:

$|\operatorname{rej}_b a| = \sqrt{|a|^2 - |\operatorname{proj}_b a|^2} = \sqrt{a_1^2 + a_2^2 - \frac{(a_1 b_1 + a_2 b_2)^2}{b_1^2 + b_2^2}}$

$= \sqrt{\frac{(a_1^2 + a_2^2)(b_1^2 + b_2^2) - (a_1 b_1 + a_2 b_2)^2}{b_1^2 + b_2^2}}$

$= \sqrt{\frac{(a_1 b_2 - a_2 b_1)^2}{b_1^2 + b_2^2}} = \frac{|a_1 b_2 - a_2 b_1|}{\sqrt{b_1^2 + b_2^2}} = |a \times \hat{b}|$.

3D case. Let a = (a1 , a2 , a3 ) and b = (b1 , b2 , b3 ). Then:

$|\operatorname{rej}_b a| = \sqrt{|a|^2 - |\operatorname{proj}_b a|^2} = \sqrt{a_1^2 + a_2^2 + a_3^2 - \frac{(a_1 b_1 + a_2 b_2 + a_3 b_3)^2}{b_1^2 + b_2^2 + b_3^2}}$

$= \sqrt{\frac{(a_1^2 + a_2^2 + a_3^2)(b_1^2 + b_2^2 + b_3^2) - (a_1 b_1 + a_2 b_2 + a_3 b_3)^2}{b_1^2 + b_2^2 + b_3^2}}$

$= \frac{\sqrt{(a_2 b_3 - a_3 b_2)^2 + (a_3 b_1 - a_1 b_3)^2 + (a_1 b_2 - a_2 b_1)^2}}{\sqrt{b_1^2 + b_2^2 + b_3^2}} = \frac{|a \times b|}{|b|} = |a \times \hat{b}|$.


119.8. The Vector Product

Fact 82. Let a and b be non-zero vectors. If a × b = 0, then a ∥ b.

Proof. Let θ be the angle between a and b. If a × b = 0, then by Fact 84, ∣a × b∣ = ∣a∣ ∣b∣ sin θ =
0. Since ∣a∣ ≠ 0 and ∣b∣ ≠ 0, we have sin θ = 0 and thus a ∥ b.

Fact 94. Suppose a, b, and c are vectors, with a ∥/ b. Then:

c∥a×b ⇐⇒ c ⊥ a, b.

Proof. ⟸ follows from Fact 207 (below).

For ⟹, suppose c ∥ a × b. Then there exists k ≠ 0 such that c = k (a × b). So,
c ⋅ a = [k (a × b)] ⋅ a = k (a × b) ⋅ a = 0 and thus c ⊥ a. We can similarly show that c ⋅ b = 0
and thus c ⊥ b.

Fact 207. Suppose a, b, c, and d are non-zero vectors with a ∥/ b. If c ⊥ a, b and

d ⊥ a, b, then c ∥ d.

Proof. Let a = (a1 , a2 , a3 ), b = (b1 , b2 , b3 ), c = (c1 , c2 , c3 ), and d = (d1 , d2 , d3 ).

Since c ⊥ a, b, we have c ⋅ a = 0 and c ⋅ b = 0. Or:

$c_1 a_1 + c_2 a_2 + c_3 a_3 = 0$ (1) and $c_1 b_1 + c_2 b_2 + c_3 b_3 = 0$. (2)

Let x = a2 b3 − a3 b2 , y = a3 b1 − a1 b3 , and z = a1 b2 − a2 b1 , so that a × b = (x, y, z).

Now, b1 × (1) minus a1 × (2) yields:

$0 = b_1(c_2 a_2 + c_3 a_3) - a_1(c_2 b_2 + c_3 b_3) = -c_2 z + c_3 y$ or $c_2 z = c_3 y$. (3)

Similarly, b2 × (1) minus a2 × (2) and b3 × (1) minus a3 × (2) yield:

$c_1 z = c_3 x$ (4) and $c_1 y = c_2 x$. (5)

Since d ⊥ a, b, we have, similarly:

$d_2 z = d_3 y$ (6), $d_1 z = d_3 x$ (7), and $d_1 y = d_2 x$. (8)

We will break down the remainder of the proof into four cases, depending on whether any
of x, y, and z are zero. We will show that wherever no contradiction arises, c can be written
as a non-zero scalar multiple of d, so that c ∥ d.
Case 1. All of x, y, and z are zero.
Then a × b = 0, so that by Fact 82, a ∥ b, contradicting our assumption that a ∥/ b.

Case 2. Exactly two of x, y, and z are zero.

Suppose x = y = 0 and z ≠ 0. Then from (3), (4), (6), and (7), we have c1 = c2 = 0 and d1 = d2 = 0.

And now, c can be written as a non-zero scalar multiple of d:

$c = \begin{pmatrix} c_1 \\ c_2 \\ c_3 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ c_3 \end{pmatrix} = \frac{c_3}{d_3}\begin{pmatrix} 0 \\ 0 \\ d_3 \end{pmatrix} = \frac{c_3}{d_3}\begin{pmatrix} d_1 \\ d_2 \\ d_3 \end{pmatrix} = \frac{c_3}{d_3} d$.

Case 3. Exactly one of x, y, and z is zero.

Suppose x = 0 and y, z ≠ 0. Then from (4) and (7), we have c1 = 0 and d1 = 0.

Note that c2 ≠ 0, because otherwise, from (3), we have c3 = 0 and now c = 0, contradicting
our assumption that c is non-zero. Similarly, c3 , d2 , d3 ≠ 0.

And now (3) divided by (6) yields: $\frac{c_2}{d_2} = \frac{c_3}{d_3}$.

And so again, c can be written as a non-zero scalar multiple of d:

$c = \begin{pmatrix} c_1 \\ c_2 \\ c_3 \end{pmatrix} = \begin{pmatrix} 0 \\ c_2 \\ c_3 \end{pmatrix} = \frac{c_3}{d_3}\begin{pmatrix} 0 \\ d_2 \\ d_3 \end{pmatrix} = \frac{c_3}{d_3}\begin{pmatrix} d_1 \\ d_2 \\ d_3 \end{pmatrix} = \frac{c_3}{d_3} d$.

Case 4. None of x, y, and z is zero.

Then (3) divided by (6) yields: $\frac{c_2}{d_2} = \frac{c_3}{d_3}$.

Similarly, (4) divided by (7) yields: $\frac{c_1}{d_1} = \frac{c_3}{d_3}$.

Thus: $\frac{c_1}{d_1} = \frac{c_2}{d_2} = \frac{c_3}{d_3}$.

And now, c can be written as a non-zero scalar multiple of d:

$c = \begin{pmatrix} c_1 \\ c_2 \\ c_3 \end{pmatrix} = \frac{c_3}{d_3}\begin{pmatrix} d_1 \\ d_2 \\ d_3 \end{pmatrix} = \frac{c_3}{d_3} d$.
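
Facts 94 and 207 can be illustrated with a concrete pair of vectors. In the sketch below (helper names and sample vectors are assumptions for illustration), n = a × b is perpendicular to both a and b, and a second vector c that is also perpendicular to both turns out to be parallel to n, which is checked by showing that c × n = 0.

```python
def cross(a, b):
    """Vector product of two 3D vectors."""
    a1, a2, a3 = a
    b1, b2, b3 = b
    return (a2 * b3 - a3 * b2, a3 * b1 - a1 * b3, a1 * b2 - a2 * b1)

def dot(a, b):
    """Scalar product of two vectors of the same dimension."""
    return sum(ai * bi for ai, bi in zip(a, b))

a, b = (1, 2, 3), (4, 5, 6)   # non-parallel
n = cross(a, b)               # (-3, 6, -3)
c = (1, -2, 1)                # another vector perpendicular to both a and b

print(n, dot(n, a), dot(n, b))   # n is perpendicular to a and b
print(dot(c, a), dot(c, b))      # so is c
print(cross(c, n))               # zero vector, so c is parallel to n
```

Here c = −(1/3) n, consistent with Fact 207.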


119.9. Planes in General
The definition of a plane, reproduced:

Definition 141. A plane is any set of points that can be written as:
{R ∶ OR ⋅ n = d} or {R ∶ r ⋅ n = d},

where n is some non-zero vector and d ∈ R.

Fact 95. If a plane contains two distinct points, then it also contains the line through
those two points.

Proof. Suppose the points are A and B, so that the line AB may be described by
$r = \overrightarrow{OA} + \lambda\overrightarrow{AB}$ (λ ∈ R).
Suppose the plane q can be described by r ⋅ n = d.
We will prove that any point on the line AB is also on the plane q. (We will thus have
shown that q contains the line AB.) To do so, we need merely verify that the generic point
$r = \overrightarrow{OA} + \lambda\overrightarrow{AB}$ on the line AB satisfies q's vector equation:

$r \cdot n = (\overrightarrow{OA} + \lambda\overrightarrow{AB}) \cdot n = [\overrightarrow{OA} + \lambda(\overrightarrow{OB} - \overrightarrow{OA})] \cdot n$
$= \overrightarrow{OA} \cdot n + \lambda\overrightarrow{OB} \cdot n - \lambda\overrightarrow{OA} \cdot n = d + \lambda d - \lambda d = d$. ✓

Fact 96. If q = {R ∶ OR ⋅ n = d} is a plane, then n ⊥ q.

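Proof. Let v be any vector on q. Then there are points A, B ∈ q such that $v = \overrightarrow{AB}$. Since
A, B ∈ q, we have $\overrightarrow{OA} \cdot n = d$ and $\overrightarrow{OB} \cdot n = d$.

And thus: $n \cdot v = n \cdot \overrightarrow{AB} = n \cdot (\overrightarrow{OB} - \overrightarrow{OA}) = n \cdot \overrightarrow{OB} - n \cdot \overrightarrow{OA} = d - d = 0$.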

We have just proven that given any vector v on q, we have n ⋅ v = 0 or equivalently n ⊥ v.

And so by Definition 143, n is a normal vector of q (i.e. n ⊥ q).

Fact 97. Let q be a plane and n and m be vectors. Suppose n ⊥ q. Then:

m ∥ n ⟹ m ⊥ q.

Proof. By Definition 143, n ⊥ q ⇐⇒ n ⋅ v = 0 for every vector v on q.

By Definition 104, m ∥ n ⇐⇒ m = kn for some k ≠ 0.
And so, for any vector v on q, we have:

m ⋅ v = (kn) ⋅ v = k (n ⋅ v) = k ⋅ 0 = 0.
Thus, by Definition 143, m ⊥ q.


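Fact 100. Let q be a plane and P and R be points. Suppose P ∈ q. Then:

R ∈ q ⇐⇒ The vector $\overrightarrow{PR}$ is on q.

Proof. Let n be a normal vector of q. Since $\overrightarrow{OR} = \overrightarrow{OP} + \overrightarrow{PR}$, we have:

$\overrightarrow{OR} \cdot n = (\overrightarrow{OP} + \overrightarrow{PR}) \cdot n = \overrightarrow{OP} \cdot n + \overrightarrow{PR} \cdot n$. (1)

And now: R ∈ q
⇐⇒ $\overrightarrow{OR} \cdot n = \overrightarrow{OP} \cdot n$ (Definition 141)
⇐⇒ $\overrightarrow{PR} \cdot n = 0$ (by (1))
⇐⇒ $\overrightarrow{PR} \perp n$ (Definition 112)
⇐⇒ $\overrightarrow{PR}$ is on q. (Fact 99)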

How to go back and forth between a plane’s vector and cartesian forms:

Fact 208. Let n = (n1 , n2 , . . . , nk ) be a non-zero vector. Then:

$\left\{ \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_k \end{pmatrix} : \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_k \end{pmatrix} \cdot \begin{pmatrix} n_1 \\ n_2 \\ \vdots \\ n_k \end{pmatrix} = d \right\} = \left\{ \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_k \end{pmatrix} : \sum_{i=1}^{k} n_i x_i = d \right\}$.

Proof. By Definition 244, $(x_1, x_2, \dots, x_k) \cdot (n_1, n_2, \dots, n_k) = \sum_{i=1}^{k} n_i x_i$.

Remark 145. Note that Definition 141 actually serves as the general definition of the
(k − 1)-dimensional hyperplane in Rk . In general, the (k − 1)-dimensional hyperplane in Rk
has cartesian equation $\sum_{i=1}^{k} n_i x_i = d$, where n = (n1 , n2 , . . . , nk ).

And so the hyperplane in R3 is the “flat” two-dimensional plane with cartesian equation
ax + by + cz = d. While the hyperplane in R2 is the one-dimensional line with cartesian
equation ax + by = c.

Here is the general Definition of a two-dimensional plane in Rn :

Definition 247. A two-dimensional plane in Rn is any set that can be written as:

$\{R : \overrightarrow{OR} = \overrightarrow{OA} + \lambda u + \mu v \ (\lambda, \mu \in \mathbb{R})\}$,

for some point A and some non-parallel vectors u and v.


Fact 209. In 3D space, Definitions 141 and 247 are equivalent.

Proof. Corollary 19 shows that Definition 141 implies Definition 247.

We now show the converse. Let $q = \{R : \overrightarrow{OR} = \overrightarrow{OA} + \lambda u + \mu v \ (\lambda, \mu \in \mathbb{R})\}$, for some point A and some non-parallel vectors u and v.

Let n = u × v (which is non-zero because u ∦ v), $d = \overrightarrow{OA} \cdot n$, and $s = \{R : \overrightarrow{OR} \cdot n = d\}$. We will show that (a) P ∈ q ⟹ P ∈ s; and then (b) P ∈ s ⟹ P ∈ q. We will thus have shown that s = q.

(a) Suppose P ∈ q, i.e. $\overrightarrow{OP} = \overrightarrow{OA} + \alpha u + \beta v$ for some α, β ∈ ℝ. Then:

$$\overrightarrow{OP} \cdot n = \overrightarrow{OA} \cdot n + \alpha u \cdot n + \beta v \cdot n = d + 0 + 0 = d.$$

Thus, P ∈ s.

(b) Now suppose P ∈ s, i.e. $\overrightarrow{OP} \cdot n = d$. Rearranging, $\overrightarrow{OP} \cdot n = \overrightarrow{OA} \cdot n$ or $\overrightarrow{AP} \cdot n = 0$. And so by Corollary 18, $\overrightarrow{AP}$ is on q. By Theorem 10 then, $\overrightarrow{AP}$ can be written as a linear combination of u and v. That is, there exist real numbers α and β such that $\overrightarrow{AP} = \alpha u + \beta v$.

Rearranging, $\overrightarrow{OP} = \overrightarrow{OA} + \alpha u + \beta v$. Thus, P ∈ q.

Fact 99. Let q be a plane with normal vector n. Suppose v is a vector. Then:

v ⊥ n ⟹ v is on q.

Proof. Let $n = (n_1, n_2, \dots, n_k) \perp q$ and suppose v ⊥ n. Our goal is to show that v is on q.

Suppose q is described by r ⋅ n = d. Since n ≠ 0, pick any $n_i \neq 0$. Let P be the point whose ith coordinate is $d/n_i$ and other coordinates are 0. Then P ∈ q because:

$$\overrightarrow{OP} \cdot n = \frac{d}{n_i} n_i + \sum_{j \neq i} 0 \cdot n_j = d.$$

Next, let Q = P + v. Then we also have Q ∈ q because:

$$\overrightarrow{OQ} \cdot n = (\overrightarrow{OP} + v) \cdot n = \overrightarrow{OP} \cdot n + v \cdot n = d + 0 = d.$$

Since P, Q ∈ q and $v = \overrightarrow{PQ}$, v is a vector on the plane.

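Fact 210. Let m be a vector and q be a plane. Suppose m ⊥ q. Then there exists e ∈ ℝ such that $\overrightarrow{OR} \cdot m = e$ for all R ∈ q.

Proof. Let A, R ∈ q and $e = \overrightarrow{OA} \cdot m$. On the one hand, $\overrightarrow{AR} \cdot m = 0$ (because m ⊥ q). On the other, $\overrightarrow{AR} \cdot m = (\overrightarrow{OR} - \overrightarrow{OA}) \cdot m = \overrightarrow{OR} \cdot m - \overrightarrow{OA} \cdot m$. Thus, $\overrightarrow{OR} \cdot m = \overrightarrow{OA} \cdot m = e$.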


Suppose n ⊥ q. By Fact 97, m ∥ n ⟹ m ⊥ q. The converse is also true (see footnote 400):

Theorem 9. Let q be a plane and n and m be vectors. If n ⊥ q, then:

m ⊥ q ⟹ m ∥ n.

Proof. Let q = {R ∶ OR ⋅ n = d}, n = (n1 , n2 , . . . , nk ), and m = (m1 , m2 , . . . , mk ). Suppose
m ⊥ q. By Fact 210, there exists some e ∈ R such that for all R ∈ q, OR ⋅ m = e.

We will use Lemmata 3 and 4 to prove Theorem 9:

Lemma 3. $n_i = 0 \iff m_i = 0$.

Proof of Lemma 3. Suppose $n_i = 0$. Since n ≠ 0, there exists some j for which $n_j \neq 0$.

Let w ∈ ℝ. Let $S_w$ be the point whose ith coordinate is w, jth coordinate is $d/n_j$, and other coordinates are 0. Then $S_w \in q$ because:

$$\overrightarrow{OS_w} \cdot n = w n_i + \frac{d}{n_j} n_j + \sum_{l \notin \{i,j\}} 0 \cdot n_l = 0 + d + 0 = d.$$

Since $S_w \in q$, we have $\overrightarrow{OS_w} \cdot m = e$ or:

$$\overrightarrow{OS_w} \cdot m = w m_i + \frac{d}{n_j} m_j + \sum_{l \notin \{i,j\}} 0 \cdot m_l = w m_i + \frac{d}{n_j} m_j + 0 = w m_i + \frac{d}{n_j} m_j = e. \tag{2}$$

Since (2) holds for all w ∈ ℝ, it must be that $m_i = 0$.

The proof that $m_i = 0 \implies n_i = 0$ is similar and thus omitted.

Lemma 4. $n_i \neq 0 \implies m_i d/n_i = e$.

Proof of Lemma 4. Suppose $n_i \neq 0$. Let Q be the point whose ith coordinate is $d/n_i$ and other coordinates are 0. Then Q ∈ q because:

$$\overrightarrow{OQ} \cdot n = \frac{d}{n_i} n_i + \sum_{l \neq i} 0 \cdot n_l = d + 0 = d.$$

Since Q ∈ q, we have $\overrightarrow{OQ} \cdot m = e$ or:

$$\overrightarrow{OQ} \cdot m = \frac{d}{n_i} m_i + \sum_{l \neq i} 0 \cdot m_l = \frac{d}{n_i} m_i + 0 = \frac{d}{n_i} m_i = e.$$

We’ve completed our proofs of Lemmata 3 and 4. We now continue with our proof of Theorem 9.

Footnote 400. Note that this is really just a restatement of a fundamental result from linear algebra. The proof is rather long, but uses only material and language we’ve already covered in this textbook.
We will show that whether (a) d ≠ 0; or (b) d = 0, we have m ∥ n.

(a) Suppose d ≠ 0. By Lemma 3, if $n_i = 0$, then $m_i = 0$, so that $m_i = n_i (e/d)$.
And by Lemma 4, if $n_i \neq 0$, then $m_i = n_i (e/d)$.
Hence, for all i ∈ {1, 2, . . . , k}, $m_i = n_i (e/d)$. Thus, we may write:

$$m = \frac{e}{d} n. \tag{4}$$

Since m ≠ 0, it must be that e ≠ 0. So, (4) shows that m ∥ n.

(b) Suppose d = 0. Since n ≠ 0, there is some j for which $n_j \neq 0$. Write:

$$m_j = \frac{m_j}{n_j} n_j. \tag{5}$$

By Lemma 3, $n_j \neq 0 \implies m_j \neq 0$.
By Lemma 4, $m_j d/n_j = e$. Since d = 0, we have:

$$e = 0. \tag{6}$$

By Lemma 3, for i such that $n_i = 0$, we have $m_i = 0$. And so, for such i:

$$m_i = \frac{m_j}{n_j} n_i. \tag{7}$$

Now consider any s ≠ j such that $n_s \neq 0$. Let T be the point whose sth coordinate is $n_j$, jth coordinate is $-n_s$, and other coordinates are 0. Then T ∈ q because:

$$\overrightarrow{OT} \cdot n = n_j n_s + (-n_s) n_j + \sum_{l \notin \{s,j\}} 0 \cdot n_l = n_j n_s - n_s n_j + 0 = 0.$$

Since T ∈ q, we have, by (6), $\overrightarrow{OT} \cdot m = e = 0$ or:

$$\overrightarrow{OT} \cdot m = n_j m_s + (-n_s) m_j + \sum_{l \notin \{s,j\}} 0 \cdot m_l = n_j m_s - n_s m_j + 0 = 0. \tag{8}$$

Thus, for any s ≠ j such that $n_s \neq 0$, we can rearrange (8) to write:

$$m_s = \frac{m_j}{n_j} n_s. \tag{9}$$

Altogether then, (5), (7), and (9) show that for all i ∈ {1, 2, . . . , k}, we have $m_i = (m_j/n_j) n_i$. Hence, we may write:

$$m = \frac{m_j}{n_j} n. \tag{10}$$

Since $m_j \neq 0$, (10) proves that m ∥ n.



Fact 211. The unique plane that contains the point A and has normal vector n is:
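$$\{R : \overrightarrow{OR} \cdot n = \overrightarrow{OA} \cdot n\}.$$

Proof. The given plane contains A, because $\overrightarrow{OA} \cdot n = \overrightarrow{OA} \cdot n$.

Let v be a vector on the plane. Then there exist points P and Q on the plane such that $v = \overrightarrow{PQ} = \overrightarrow{OQ} - \overrightarrow{OP}$. Now:

$$v \cdot n = (\overrightarrow{OQ} - \overrightarrow{OP}) \cdot n = \overrightarrow{OQ} \cdot n - \overrightarrow{OP} \cdot n = \overrightarrow{OA} \cdot n - \overrightarrow{OA} \cdot n = 0.$$

Thus, v ⊥ n. We’ve just shown that the given plane has normal vector n.

We now prove uniqueness. Suppose the plane $\{R : \overrightarrow{OR} \cdot m = d\}$ contains A and has normal vector n. Then by Theorem 9, m = kn for some k ≠ 0. Thus, $d = \overrightarrow{OA} \cdot m = \overrightarrow{OA} \cdot (kn) = k \, \overrightarrow{OA} \cdot n$.

And now:

$$\{R : \overrightarrow{OR} \cdot m = d\} = \{R : \overrightarrow{OR} \cdot kn = k \, \overrightarrow{OA} \cdot n\} = \{R : \overrightarrow{OR} \cdot n = \overrightarrow{OA} \cdot n\}.$$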

How to go back and forth between a (hyper)plane’s cartesian and parametric forms:

Fact 212. Let $x = (x_1, x_2, \dots, x_k)$ be a generic vector. Let $n = (n_1, n_2, \dots, n_k)$ be a non-zero vector. Without loss of generality, suppose $n_1 \neq 0$. Let $v_1 = (d/n_1, 0, 0, \dots, 0)$. And for each i ∈ {2, . . . , k}, let $v_i$ be the vector whose 1st coordinate is $-n_i/n_1$, ith coordinate is 1, and remaining coordinates are 0. Suppose:

$$S = \left\{x : \sum_{i=1}^{k} n_i x_i = d\right\} \quad \text{and} \quad T = \left\{x : x = v_1 + \sum_{i=2}^{k} \lambda_i v_i \ (\lambda_2, \dots, \lambda_k \in \mathbb{R})\right\}.$$

Then S = T.

Proof. We will show that (a) a ∈ S ⟹ a ∈ T; and (b) a ∈ T ⟹ a ∈ S.

(a) Suppose $a = (a_1, a_2, \dots, a_k) \in S$. Then $\sum_{i=1}^{k} n_i a_i = d$ or $a_1 = \left(d - \sum_{i=2}^{k} a_i n_i\right)/n_1$.

For each i = 2, 3, . . . , k, let $\lambda_i = a_i$. Then we have:

$$v_1 + \sum_{i=2}^{k} \lambda_i v_i = \left(\frac{d}{n_1} - \sum_{i=2}^{k} a_i \frac{n_i}{n_1}, \ a_2, a_3, \dots, a_k\right) = a.$$

We’ve just shown that a ∈ T.

(b) Now suppose a ∈ T. Then, since each $v_i$ has ith coordinate 1, we have $a = \left(\frac{d}{n_1} - \sum_{i=2}^{k} \lambda_i \frac{n_i}{n_1}, \ \lambda_2, \lambda_3, \dots, \lambda_k\right)$ for some $\lambda_2, \dots, \lambda_k \in \mathbb{R}$. And now:

$$\sum_{i=1}^{k} n_i a_i = n_1 \left(\frac{d}{n_1} - \sum_{i=2}^{k} \lambda_i \frac{n_i}{n_1}\right) + \sum_{i=2}^{k} n_i \lambda_i = d.$$

We’ve just shown that a ∈ S.
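The construction in Fact 212 can be sketched in code. The following is only an illustration, assuming integer inputs with a non-zero first normal component; the function name `cartesian_to_parametric` is hypothetical and not part of this textbook.

```python
from fractions import Fraction


def dot(u, v):
    """Scalar product of two vectors of equal length."""
    return sum(x * y for x, y in zip(u, v))


def cartesian_to_parametric(n, d):
    """Return (v1, directions) for the hyperplane sum(n_i x_i) = d.

    Follows the construction in Fact 212 and assumes n[0] != 0.
    """
    if n[0] == 0:
        raise ValueError("this sketch assumes the first component of n is non-zero")
    k = len(n)
    # v1 = (d/n1, 0, ..., 0) is one point on the hyperplane.
    v1 = tuple([Fraction(d, n[0])] + [Fraction(0)] * (k - 1))
    directions = []
    for i in range(1, k):
        v = [Fraction(0)] * k
        v[0] = Fraction(-n[i], n[0])  # 1st coordinate is -n_i/n_1
        v[i] = Fraction(1)            # ith coordinate is 1
        directions.append(tuple(v))
    return v1, directions


# Usage: the plane x + 2y + 3z = 32.
v1, dirs = cartesian_to_parametric((1, 2, 3), 32)
point = tuple(p + a + b for p, a, b in zip(v1, dirs[0], dirs[1]))
print(v1, dirs, point)
```

Each direction vector is perpendicular to n by construction, which is why every point of T satisfies the cartesian equation.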

The results given in this subchapter were general. In contrast, the results in the next
subchapter will apply only to planes in R3 .


119.10. Planes in Three-Dimensional Space

Fact 101. If a and b are non-parallel vectors on a plane q, then a × b ⊥ q.

Proof. Let $a = (a_1, a_2, a_3)$, $b = (b_1, b_2, b_3)$, and $n = (n_1, n_2, n_3) \perp q$, so that a, b ⊥ n, or:

$$a_1 n_1 + a_2 n_2 + a_3 n_3 = 0, \tag{1}$$
$$b_1 n_1 + b_2 n_2 + b_3 n_3 = 0. \tag{2}$$

Now, $b_2 \times (1)$ minus $a_2 \times (2)$ yields:

$$0 = b_2 (a_1 n_1 + a_3 n_3) - a_2 (b_1 n_1 + b_3 n_3) = (a_1 b_2 - a_2 b_1) n_1 - (a_2 b_3 - a_3 b_2) n_3.$$

Similarly, $b_1 \times (1)$ minus $a_1 \times (2)$ and $b_3 \times (1)$ minus $a_3 \times (2)$ yield:

$$(a_3 b_1 - a_1 b_3) n_3 - (a_1 b_2 - a_2 b_1) n_2 = 0 \quad \text{and} \quad (a_2 b_3 - a_3 b_2) n_2 - (a_3 b_1 - a_1 b_3) n_1 = 0.$$

Hence:

$$(a \times b) \times n = \begin{pmatrix} a_2 b_3 - a_3 b_2 \\ a_3 b_1 - a_1 b_3 \\ a_1 b_2 - a_2 b_1 \end{pmatrix} \times \begin{pmatrix} n_1 \\ n_2 \\ n_3 \end{pmatrix} = \begin{pmatrix} (a_3 b_1 - a_1 b_3) n_3 - (a_1 b_2 - a_2 b_1) n_2 \\ (a_1 b_2 - a_2 b_1) n_1 - (a_2 b_3 - a_3 b_2) n_3 \\ (a_2 b_3 - a_3 b_2) n_2 - (a_3 b_1 - a_1 b_3) n_1 \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix} = 0.$$

Since a × b ≠ 0, n ≠ 0, and (a × b) × n = 0, by Fact 82, a × b ∥ n. And so by Fact 97, a × b ⊥ q.

Theorem 10. Let q be a plane and a and b be non-parallel vectors on q. Suppose c is a

non-zero vector. Then:

c is a vector on q ⇐⇒ There exist λ, µ ∈ R such that c = λa + µb.

Proof. In the main text, we already proved ⟸. Here we prove ⟹ (see footnote 401).

Let $n = (n_1, n_2, n_3)$ be the plane’s normal vector. Let $a = (a_1, a_2, a_3)$, $b = (b_1, b_2, b_3)$, $c = (c_1, c_2, c_3)$, so that a ⋅ n = 0, b ⋅ n = 0, c ⋅ n = 0, and also a ≠ kb for all k ≠ 0. Write:

$$a_1 n_1 + a_2 n_2 + a_3 n_3 = 0, \tag{1}$$
$$b_1 n_1 + b_2 n_2 + b_3 n_3 = 0, \tag{2}$$
$$c_1 n_1 + c_2 n_2 + c_3 n_3 = 0. \tag{3}$$

Since n ≠ 0, suppose WLOG that $n_3 \neq 0$. Now rewrite (1), (2), and (3) as:

$$a_3 = -\frac{a_1 n_1 + a_2 n_2}{n_3}, \quad b_3 = -\frac{b_1 n_1 + b_2 n_2}{n_3}, \quad \text{and} \quad c_3 = -\frac{c_1 n_1 + c_2 n_2}{n_3}.$$

We will use Lemmata 5 and 6 to prove Theorem 10:

Lemma 5. (a) $a_1$ and $a_2$ are not both zero. (b) $b_1$ and $b_2$ are not both zero.

Proof of Lemma 5. (a) If $a_1 = a_2 = 0$, then $a_3 \neq 0$ (because a ≠ 0) and $a_1 n_1 + a_2 n_2 + a_3 n_3 = 0 + 0 + a_3 n_3 \neq 0$, contradicting (1). The proof of (b) is similar.

Footnote 401. Note that again, this long proof is just a fundamental result from linear algebra (applied to the 3D case), but written using only material and language we’ve introduced in this textbook.

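Lemma 6. $a_1 b_2 - a_2 b_1 \neq 0$.

Proof of Lemma 6. Suppose for contradiction that:

$$a_1 b_2 - a_2 b_1 = 0. \tag{4}$$

We will show that whether (a) $a_1 \neq 0$ or (b) $a_1 = 0$, a contradiction arises and hence $a_1 b_2 - a_2 b_1 \neq 0$.

(a) If $a_1 \neq 0$, then rearranging (4), we have:

$$b_2 = \frac{a_2 b_1}{a_1}. \tag{5}$$

If $b_1 = 0$, then $b_2 = 0$, but this contradicts Lemma 5. So, $b_1 \neq 0$.

We now show that a = kb for some k ≠ 0, contradicting our assumption that a ∦ b. Using $a_3 = -(a_1 n_1 + a_2 n_2)/n_3$, then (5), then $b_3 = -(b_1 n_1 + b_2 n_2)/n_3$:

$$\begin{aligned}
a = (a_1, a_2, a_3) &= \left(a_1, a_2, -\frac{a_1 n_1 + a_2 n_2}{n_3}\right) = \frac{a_1}{b_1}\left(b_1, \frac{a_2 b_1}{a_1}, -\frac{b_1 n_1 + (a_2 b_1/a_1) n_2}{n_3}\right) \\
&= \frac{a_1}{b_1}\left(b_1, b_2, -\frac{b_1 n_1 + b_2 n_2}{n_3}\right) = \frac{a_1}{b_1}(b_1, b_2, b_3) = \frac{a_1}{b_1} b.
\end{aligned}$$

(b) If $a_1 = 0$, then by Lemma 5, $a_2 \neq 0$ and the same contradictions as in (a) arise.

The proofs of Lemmata 5 and 6 are complete. We now resume our proof of Theorem 10.

Pick:

$$\mu = \frac{a_1 c_2 - a_2 c_1}{a_1 b_2 - a_2 b_1} \tag{6}$$

$$\lambda = \begin{cases} (c_1 - \mu b_1)/a_1 & \text{if } a_1 \neq 0, \\ (c_2 - \mu b_2)/a_2 & \text{if } a_1 = 0. \end{cases} \tag{7}$$

We now verify that λa + µb = c, or equivalently, that:

$$\lambda a_1 + \mu b_1 = c_1, \tag{8}$$
$$\lambda a_2 + \mu b_2 = c_2, \tag{9}$$
$$\lambda a_3 + \mu b_3 = c_3. \tag{10}$$

We now show that if (8) and (9) hold, then so too does (10):

$$\lambda a_3 + \mu b_3 = -\lambda \frac{a_1 n_1 + a_2 n_2}{n_3} - \mu \frac{b_1 n_1 + b_2 n_2}{n_3} = -\frac{(\lambda a_1 + \mu b_1) n_1 + (\lambda a_2 + \mu b_2) n_2}{n_3} \overset{(8),(9)}{=} -\frac{c_1 n_1 + c_2 n_2}{n_3} = c_3.$$

It thus suffices to show that (8) and (9) hold. And we now do so, in each of two cases:

(i) Suppose $a_1 \neq 0$. Then (8) holds by (7): $\lambda a_1 + \mu b_1 = c_1 - \mu b_1 + \mu b_1 = c_1$. And so too does (9):

$$\lambda a_2 + \mu b_2 \overset{(7)}{=} (c_1 - \mu b_1) \frac{a_2}{a_1} + \mu b_2 = \frac{a_2 c_1}{a_1} + \mu \frac{a_1 b_2 - a_2 b_1}{a_1} \overset{(6)}{=} \frac{a_2 c_1}{a_1} + \frac{a_1 c_2 - a_2 c_1}{a_1} = c_2.$$

(ii) Suppose $a_1 = 0$. Then (9) holds by (7): $\lambda a_2 + \mu b_2 = c_2 - \mu b_2 + \mu b_2 = c_2$. And so too does (8):

$$\lambda a_1 + \mu b_1 \overset{(7)}{=} (c_2 - \mu b_2) \frac{a_1}{a_2} + \mu b_1 = \frac{a_1 c_2}{a_2} + \mu \frac{a_2 b_1 - a_1 b_2}{a_2} \overset{(6)}{=} \frac{a_1 c_2}{a_2} + \frac{a_2 c_1 - a_1 c_2}{a_2} = c_1.$$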


119.11. Four Ways to Uniquely Determine a Plane
As discussed in Ch. 55, there are Four Ways to uniquely determine a plane:
1. A point and a normal vector;
2. A point and two vectors (that aren’t parallel);
3. Two points and a vector (that isn’t parallel to the vector between the two points); or
4. Three points (that aren’t collinear).
Fact 211 already proved that (1) a point and a normal vector uniquely determine
a plane.
Corollary 19 already proved that (2) a point and two non-parallel vectors uniquely
determine a plane.
(3) is immediate from Corollary 19:
Fact 213. Let A and B be distinct points and $u \nparallel \overrightarrow{AB}$. Then the unique plane that contains A, B, and u is:

$$\{R : R = A + \lambda u + \mu \overrightarrow{AB} \ (\lambda, \mu \in \mathbb{R})\}.$$

(4) is nearly immediate from Corollary 19:

Fact 214. Let A, B, and C be points that are not collinear. Then the plane that contains all three points is:

$$\{R : R = A + \lambda \overrightarrow{AB} + \mu \overrightarrow{AC} \ (\lambda, \mu \in \mathbb{R})\}.$$

Proof. Since A, B, and C are not collinear, $\overrightarrow{AB} \nparallel \overrightarrow{AC}$. Now apply Corollary 19.
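As a rough illustration of way (4), the sketch below takes three non-collinear points in 3D, forms the two vectors AB and AC used in Fact 214, and then uses their vector product as a normal vector to get a vector equation r ⋅ n = d. The function name `plane_through` is hypothetical and integer coordinates are assumed.

```python
def cross(u, v):
    """Vector product of two 3D vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dot(u, v):
    """Scalar product of two vectors of equal length."""
    return sum(x * y for x, y in zip(u, v))


def plane_through(A, B, C):
    """Return (n, d) such that the plane through A, B, C is r . n = d."""
    AB = tuple(b - a for a, b in zip(A, B))
    AC = tuple(c - a for a, c in zip(A, C))
    n = cross(AB, AC)
    if n == (0, 0, 0):
        # AB and AC are parallel, so the three points are collinear.
        raise ValueError("points are collinear and do not determine a plane")
    return n, dot(A, n)


# Usage: the plane through (1, 0, 0), (0, 1, 0), and (0, 0, 1).
n, d = plane_through((1, 0, 0), (0, 1, 0), (0, 0, 1))
print(n, d)
```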


119.12. The Relationship Between a Line and a Plane

Fact 107. Given a line and a plane, there are three possibilities. The line and plane are:
(a) Parallel and do not intersect at all.
(b) Parallel and the line lies entirely on the plane.
(c) Non-parallel and intersect at exactly one point.

Proof. Describe the line l and plane q by:

$$r = p + \lambda v, \tag{1}$$
$$r \cdot n = d. \tag{2}$$

To find any points at which l and q intersect, plug (1) into (2) to get:

$$(p + \lambda v) \cdot n = d \iff p \cdot n + \lambda v \cdot n = d \quad \text{or} \quad \lambda v \cdot n = d - p \cdot n. \tag{3}$$

Thus, the intersection points of l and q correspond to the values of λ for which (3) holds.

Suppose l ∥ q. Then by Fact 106, v ⋅ n = 0 and (3) becomes p ⋅ n = d.

(a) If p ⋅ n ≠ d, then (3) does not hold for any value of λ. So, l and q do not intersect.
(b) If p ⋅ n = d, then (3) holds for all values of λ. So, l lies completely on q.
(c) Now suppose instead that l ∦ q. Then by Fact 106, v ⋅ n ≠ 0. And so, we can rearrange (3) to get:

$$\lambda = \frac{d - p \cdot n}{v \cdot n}.$$

This shows that there is only one value of λ at which the line and plane intersect. And this unique intersection point is given by:

$$p + \lambda v = p + \frac{d - p \cdot n}{v \cdot n} v.$$
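The three cases in the proof of Fact 107 translate directly into a short routine. This is a minimal sketch assuming integer inputs; the function name `line_plane` and the returned strings are illustrative choices only.

```python
from fractions import Fraction


def dot(u, v):
    """Scalar product of two vectors of equal length."""
    return sum(x * y for x, y in zip(u, v))


def line_plane(p, v, n, d):
    """Relate the line r = p + lambda*v to the plane r . n = d."""
    vn = dot(v, n)
    pn = dot(p, n)
    if vn == 0:
        # Parallel: cases (a) and (b) of Fact 107.
        return "line lies on plane" if pn == d else "no intersection"
    # Non-parallel: case (c), with lambda = (d - p.n) / (v.n).
    lam = Fraction(d - pn, vn)
    return tuple(pi + lam * vi for pi, vi in zip(p, v))


# Usage with the plane r . (1, 1, 1) = 3.
print(line_plane((0, 0, 0), (1, 1, 1), (1, 1, 1), 3))   # a single point
print(line_plane((0, 0, 0), (1, -1, 0), (1, 1, 1), 3))  # parallel, off the plane
print(line_plane((3, 0, 0), (1, -1, 0), (1, 1, 1), 3))  # parallel, on the plane
```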


119.13. The Relationship Between Two Planes

Fact 109. Let q and r be planes with normal vectors u and v. Then:

(a) q ∥ r ⇐⇒ u ∥ v; and (b) q ⊥ r ⇐⇒ u ⊥ v.

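Proof. Let θ be the angle between q and r. By Definition 148 and Facts 108 and 71:

$$\text{(a)} \quad q \parallel r \iff \theta = \cos^{-1} \frac{|u \cdot v|}{|u| |v|} = 0 \iff \frac{|u \cdot v|}{|u| |v|} = \cos 0 = 1 \iff u \parallel v.$$

$$\text{(b)} \quad q \perp r \iff \theta = \cos^{-1} \frac{|u \cdot v|}{|u| |v|} = \frac{\pi}{2} \iff \frac{|u \cdot v|}{|u| |v|} = \cos \frac{\pi}{2} = 0 \iff u \cdot v = 0 \iff u \perp v.$$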

Fact 110. If two planes are parallel, then they are either identical or do not intersect.

Proof. Suppose two planes are parallel. Then they share some normal vector n.

Suppose they are described by $\overrightarrow{OR} \cdot n = d_1$ and $\overrightarrow{OR} \cdot n = d_2$. If $d_1 = d_2$, then they are identical.

So suppose $d_1 \neq d_2$. If the point P is on the first plane, then $\overrightarrow{OP} \cdot n = d_1 \neq d_2$, so that P is not on the second plane. Thus, the two planes do not intersect.

Lemma 7. Let $n = (n_1, n_2, \dots, n_k)$ and $m = (m_1, m_2, \dots, m_k)$ be vectors. If n ∦ m, then there exist i and j such that $n_i m_j - n_j m_i \neq 0$.

Proof. Suppose for contradiction that:

$$n_i m_j - n_j m_i = 0 \quad \text{for all } i, j. \tag{1}$$

Pick any s such that $n_s \neq 0$. Then by (1), we have $n_i m_s - n_s m_i = 0$ for all i. Rearranging, $(m_s/n_s) n_i = m_i$ for all i. Thus, $m = (m_s/n_s) n$, contradicting n ∦ m.



Fact 111. If two planes are not parallel, then they must intersect.

Proof. Let the two planes be described by $\overrightarrow{OR} \cdot n = d$ and $\overrightarrow{OR} \cdot m = e$, where $n = (n_1, n_2, \dots, n_k)$, $m = (m_1, m_2, \dots, m_k)$, and n ∦ m. By Lemma 7, there exist i and j such that $n_i m_j - n_j m_i \neq 0$. And since $n_i m_j - n_j m_i \neq 0$, at least one of $n_i$ or $n_j$ must be non-zero.

Suppose without loss of generality that $n_i \neq 0$. Let $P = (p_1, p_2, \dots, p_k)$ be the point with:

$$p_j = \frac{e n_i - d m_i}{n_i m_j - n_j m_i}, \quad p_i = \frac{d - p_j n_j}{n_i}, \quad \text{and} \quad p_l = 0 \text{ for all } l \notin \{i, j\}.$$

We now verify that both planes contain the point P:

$$\overrightarrow{OP} \cdot n = \sum_{l} p_l n_l = p_i n_i + p_j n_j + \sum_{l \notin \{i,j\}} p_l n_l = \frac{d - p_j n_j}{n_i} n_i + p_j n_j + 0 = d - p_j n_j + p_j n_j = d, \quad ✓$$

$$\overrightarrow{OP} \cdot m = \sum_{l} p_l m_l = p_i m_i + p_j m_j + \sum_{l \notin \{i,j\}} p_l m_l = \frac{d - p_j n_j}{n_i} m_i + p_j m_j + 0 = \frac{d m_i + p_j (n_i m_j - n_j m_i)}{n_i} = \frac{d m_i + e n_i - d m_i}{n_i} = e. \quad ✓$$

Fact 112. Suppose two non-parallel planes have normal vectors n and m. Then their
intersection is a line with direction vector n × m.

Proof. Here in the Appendices we’ll actually go a little further by fully specifying the line
along which the two planes intersect.
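Let the two planes $q_1$ and $q_2$ be described by $\overrightarrow{OR} \cdot n = d$ and $\overrightarrow{OR} \cdot m = e$. Let P be the point constructed in the proof of Fact 111.

Then, we claim, $q_1$ and $q_2$ intersect at the line described by:

$$r = \overrightarrow{OP} + \lambda \, n \times m \quad (\lambda \in \mathbb{R}).$$

To prove this claim, we first verify that $q_1$ and $q_2$ contain the above line. To do so, plug the generic point of the above line into each plane’s vector equation:

$$(\overrightarrow{OP} + \lambda \, n \times m) \cdot n = \overrightarrow{OP} \cdot n + (\lambda \, n \times m) \cdot n = d + 0 = d, \quad ✓$$

$$(\overrightarrow{OP} + \lambda \, n \times m) \cdot m = \overrightarrow{OP} \cdot m + (\lambda \, n \times m) \cdot m = e + 0 = e. \quad ✓$$

Next, let $S \in q_1 \cap q_2$ with S ≠ P. We will prove that S is on the given line.

Since $P, S \in q_1 \cap q_2$, we have $\overrightarrow{PS} \perp n, m$. And so by Fact 94, $\overrightarrow{PS} \parallel n \times m$. That is, $\overrightarrow{PS} = \lambda \, n \times m$ for some λ ∈ ℝ.

Rearranging, we have $\overrightarrow{OS} = \overrightarrow{OP} + \lambda \, n \times m$. Thus, S is on the given line. ✓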
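The proofs of Facts 111 and 112 are constructive, so they can be followed step by step in code. The sketch below assumes two planes in 3D with integer normal vectors and constants; the function name `intersection_line` is hypothetical. It finds indices i and j as in Lemma 7, builds the point P from Fact 111, and returns it together with the direction vector n × m.

```python
from fractions import Fraction


def cross(u, v):
    """Vector product of two 3D vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dot(u, v):
    """Scalar product of two vectors of equal length."""
    return sum(x * y for x, y in zip(u, v))


def intersection_line(n, d, m, e):
    """Return (P, direction) for the line where r.n = d meets r.m = e."""
    direction = cross(n, m)
    if direction == (0, 0, 0):
        raise ValueError("normal vectors are parallel: no unique intersection line")
    for i in range(3):
        for j in range(3):
            det = n[i] * m[j] - n[j] * m[i]
            if det != 0 and n[i] != 0:
                # The point P from the proof of Fact 111.
                p = [Fraction(0)] * 3
                p[j] = Fraction(e * n[i] - d * m[i], det)
                p[i] = (d - p[j] * n[j]) / n[i]
                return tuple(p), direction
    raise AssertionError("unreachable when the normal vectors are non-parallel")


# Usage: the planes x + y + z = 3 and x - y = 0.
P, direction = intersection_line((1, 1, 1), 3, (1, -1, 0), 0)
print(P, direction)
```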


119.14. Distances
As mentioned in Remark 145, in 2D space, the hyperplane r ⋅ n = d describes a line. So, Fact 115, which applies generally to n-dimensional space, can also be applied to 2D space to prove the following two results, which are reproduced from our Appendices for Part I (Functions and Graphs).

Corollary 35. The point on the line ax + by + c = 0 that is closest to the point (p, q) is:

$$\left(p - a \frac{ap + bq + c}{a^2 + b^2}, \ q - b \frac{ap + bq + c}{a^2 + b^2}\right).$$

Proof. Replace n, d, and A in Fact 115 with (a, b), −c, and (p, q) to get:

$$k = \frac{d - \overrightarrow{OA} \cdot n}{|n|^2} = \frac{-c - (p, q) \cdot (a, b)}{a^2 + b^2} = \frac{-c - ap - bq}{a^2 + b^2} = -\frac{ap + bq + c}{a^2 + b^2}.$$

So, by Facts 115 and 114, the point on the given line that’s closest to the given point is:

$$B = A + kn = (p, q) - \frac{ap + bq + c}{a^2 + b^2} (a, b) = \left(p - a \frac{ap + bq + c}{a^2 + b^2}, \ q - b \frac{ap + bq + c}{a^2 + b^2}\right).$$

Corollary 36. The distance between a point (p, q) and a line ax + by + c = 0 is:

$$\frac{|ap + bq + c|}{\sqrt{a^2 + b^2}}.$$

Proof. Continue with the above proof and apply Fact 115:

$$|\overrightarrow{AB}| = |k| |n| = \left| -\frac{ap + bq + c}{a^2 + b^2} \right| \sqrt{a^2 + b^2} = \frac{|ap + bq + c|}{\sqrt{a^2 + b^2}}.$$
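Corollaries 35 and 36 are easy to check numerically. The sketch below assumes integer coefficients a, b, c and an integer point (p, q); the function names are illustrative only.

```python
from fractions import Fraction
from math import sqrt


def closest_point_on_line(a, b, c, p, q):
    """Point on ax + by + c = 0 closest to (p, q), as in Corollary 35."""
    t = Fraction(a * p + b * q + c, a * a + b * b)
    return (p - a * t, q - b * t)


def distance_to_line(a, b, c, p, q):
    """Distance from (p, q) to ax + by + c = 0, as in Corollary 36."""
    return abs(a * p + b * q + c) / sqrt(a * a + b * b)


# Usage: the line 3x + 4y - 25 = 0 and the origin.
B = closest_point_on_line(3, 4, -25, 0, 0)
print(B, distance_to_line(3, 4, -25, 0, 0))
```

In this usage example the closest point lies on the line, and its distance from the origin agrees with the value given by the distance formula.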


119.15. Point-Plane Distance: Calculus Method

Example 1225. The plane q is described by r ⋅ (1, 1, 1) = 3 and A = (1, 2, 3) is a point.

In Ch. 58, we already showed, using the Perpendicular and Formula Methods, that:
• The foot of the perpendicular from A to q is B = (0, 1, 2); and
• The distance between A and q is $|\overrightarrow{AB}| = \sqrt{3}$.
Let’s now use the Calculus Method to show the same. We first describe the plane q in parametric form:

r = (0, 0, 3) + λ (1, −1, 0) + µ (0, 1, −1) (λ, µ ∈ R).

Let R be a generic point on q. Then the distance between A and R is:

$$|\overrightarrow{AR}| = \sqrt{(\lambda - 1)^2 + (-\lambda + \mu - 2)^2 + (-\mu)^2} = \sqrt{2\lambda^2 + 2\mu^2 + 2\lambda - 4\mu - 2\mu\lambda + 5}.$$

The values of λ and µ that minimise $|\overrightarrow{AR}|$ correspond to the point B. Our goal then is to find these values. We’ll do so using calculus. This will be very similar to what we did in Chs. 43 and 50, the difference being that we’ll take two derivatives, one w.r.t. λ and one w.r.t. µ. Moreover, these derivatives are a little different in that they are partial derivatives. When taking a partial derivative with respect to one variable, we treat any other variable as a constant. So:

$$\frac{\partial}{\partial \lambda} (2\lambda^2 + 2\mu^2 + 2\lambda - 4\mu - 2\mu\lambda + 5) = 4\lambda + 2 - 2\mu,$$

$$\frac{\partial}{\partial \mu} (2\lambda^2 + 2\mu^2 + 2\lambda - 4\mu - 2\mu\lambda + 5) = 4\mu - 4 - 2\lambda.$$

The First Order Conditions are:

$$(4\lambda + 2 - 2\mu)\big|_{\lambda = \tilde\lambda, \mu = \tilde\mu} = 0, \tag{1}$$
$$(4\mu - 4 - 2\lambda)\big|_{\lambda = \tilde\lambda, \mu = \tilde\mu} = 0. \tag{2}$$

(2) plus 2 × (1) yields $\tilde\lambda = 0$ and $\tilde\mu = 1$. Hence:

$$B = (0, 0, 3) + \tilde\lambda (1, -1, 0) + \tilde\mu (0, 1, -1) = (0, 0, 3) + 0 (1, -1, 0) + 1 (0, 1, -1) = (0, 1, 2).$$

And:

$$|\overrightarrow{AB}| = \sqrt{2\tilde\lambda^2 + 2\tilde\mu^2 + 2\tilde\lambda - 4\tilde\mu - 2\tilde\mu\tilde\lambda + 5} = \sqrt{0 + 2 + 0 - 4 - 0 + 5} = \sqrt{3}.$$

Happily, these results are the same as before.

Note that this textbook does not explain why the above method works. We will merely
note that the intuition for why it works is similar to that given in Part V (Calculus).


Example 1226. The plane q is described by r ⋅ (0, 2, 5) = 1 and A = (−1, 0, 1) is a point.
Again, first describe the plane q in parametric form:

r = (0, 0.5, 0) + λ (0, 5, −2) + µ (1, 5, −2) (λ, µ ∈ R).

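Let R be a generic point on q. Then the distance between A and R is:

$$|\overrightarrow{AR}| = \sqrt{(\mu + 1)^2 + (5\lambda + 5\mu + 0.5)^2 + (-2\lambda - 2\mu - 1)^2} = \sqrt{29\lambda^2 + 30\mu^2 + 9\lambda + 11\mu + 58\lambda\mu + 9/4}.$$

And now, we again take the (partial) derivatives:

$$\frac{\partial}{\partial \lambda} (29\lambda^2 + 30\mu^2 + 9\lambda + 11\mu + 58\lambda\mu + 9/4) = 58\lambda + 9 + 58\mu,$$

$$\frac{\partial}{\partial \mu} (29\lambda^2 + 30\mu^2 + 9\lambda + 11\mu + 58\lambda\mu + 9/4) = 60\mu + 11 + 58\lambda.$$

The First Order Conditions are:

$$(58\lambda + 9 + 58\mu)\big|_{\lambda = \tilde\lambda, \mu = \tilde\mu} = 0, \tag{1}$$
$$(60\mu + 11 + 58\lambda)\big|_{\lambda = \tilde\lambda, \mu = \tilde\mu} = 0. \tag{2}$$

(2) minus (1) yields $2\tilde\mu + 2 = 0$ or $\tilde\mu = -1$ and $\tilde\lambda = 49/58$. Hence:

$$B = (0, 0.5, 0) + \tilde\lambda (0, 5, -2) + \tilde\mu (1, 5, -2) = (0, 0.5, 0) + \frac{49}{58} (0, 5, -2) - 1 (1, 5, -2) = \left(-1, -\frac{8}{29}, \frac{9}{29}\right).$$

And:

$$|\overrightarrow{AB}| = \sqrt{29\tilde\lambda^2 + 30\tilde\mu^2 + 9\tilde\lambda + 11\tilde\mu + 58\tilde\lambda\tilde\mu + \frac{9}{4}} = \sqrt{29 \left(\frac{49}{58}\right)^2 + 30 + 9 \left(\frac{49}{58}\right) - 11 - 58 \left(\frac{49}{58}\right) + \frac{9}{4}} = \frac{4}{\sqrt{29}}.$$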

Happily, these results are the same as what we found in Ch. 58.


Example 1227. The plane q is described by r ⋅ (0, 5, 1) = 7 and A = (−1, 2, 5) is a point.
Again, first describe the plane q in parametric form:

r = (0, 1, 2) + λ (0, 1, −5) + µ (1, 1, −5) (λ, µ ∈ R).

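Let R be a generic point on q. Then the distance between A and R is:

$$|\overrightarrow{AR}| = \sqrt{(\mu + 1)^2 + (\lambda + \mu - 1)^2 + (-5\lambda - 5\mu - 3)^2} = \sqrt{26\lambda^2 + 27\mu^2 + 28\lambda + 30\mu + 52\lambda\mu + 11}.$$

And now, we again take the (partial) derivatives:

$$\frac{\partial}{\partial \lambda} (26\lambda^2 + 27\mu^2 + 28\lambda + 30\mu + 52\lambda\mu + 11) = 52\lambda + 28 + 52\mu,$$

$$\frac{\partial}{\partial \mu} (26\lambda^2 + 27\mu^2 + 28\lambda + 30\mu + 52\lambda\mu + 11) = 54\mu + 30 + 52\lambda.$$

The First Order Conditions are:

$$(52\lambda + 28 + 52\mu)\big|_{\lambda = \tilde\lambda, \mu = \tilde\mu} = 0, \tag{1}$$
$$(54\mu + 30 + 52\lambda)\big|_{\lambda = \tilde\lambda, \mu = \tilde\mu} = 0. \tag{2}$$

(2) minus (1) yields $2\tilde\mu + 2 = 0$ or $\tilde\mu = -1$ and $\tilde\lambda = 6/13$. Hence:

$$B = (0, 1, 2) + \tilde\lambda (0, 1, -5) + \tilde\mu (1, 1, -5) = (0, 1, 2) + \frac{6}{13} (0, 1, -5) - 1 (1, 1, -5) = \left(-1, \frac{6}{13}, \frac{61}{13}\right).$$

And:

$$|\overrightarrow{AB}| = \sqrt{26\tilde\lambda^2 + 27\tilde\mu^2 + 28\tilde\lambda + 30\tilde\mu + 52\tilde\lambda\tilde\mu + 11} = \sqrt{26 \left(\frac{6}{13}\right)^2 + 27 + 28 \left(\frac{6}{13}\right) - 30 - 52 \left(\frac{6}{13}\right) + 11} = \sqrt{\frac{32}{13}} = 4 \sqrt{\frac{2}{13}}.$$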

Happily, these results are the same as what we found in Ch. 58.


Example 1228. The plane q is described by r ⋅ (1, 2, 3) = 32 and A = (0, 0, 0) is a point.
Again, first describe the plane q in parametric form:

r = (0, 16, 0) + λ (2, −1, 0) + µ (3, 0, −1) (λ, µ ∈ R).

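Let R be a generic point on q. Then the distance between A and R is:

$$|\overrightarrow{AR}| = \sqrt{(2\lambda + 3\mu)^2 + (-\lambda + 16)^2 + (-\mu)^2} = \sqrt{5\lambda^2 + 10\mu^2 - 32\lambda + 12\lambda\mu + 256}.$$

And now, we again take the (partial) derivatives:

$$\frac{\partial}{\partial \lambda} (5\lambda^2 + 10\mu^2 - 32\lambda + 12\lambda\mu + 256) = 10\lambda - 32 + 12\mu,$$

$$\frac{\partial}{\partial \mu} (5\lambda^2 + 10\mu^2 - 32\lambda + 12\lambda\mu + 256) = 20\mu + 12\lambda.$$

The First Order Conditions are:

$$(10\lambda - 32 + 12\mu)\big|_{\lambda = \tilde\lambda, \mu = \tilde\mu} = 0, \tag{1}$$
$$(20\mu + 12\lambda)\big|_{\lambda = \tilde\lambda, \mu = \tilde\mu} = 0. \tag{2}$$

5 × (1) minus 3 × (2) yields $-160 + 14\tilde\lambda = 0$ or $\tilde\lambda = 80/7$ and $\tilde\mu = -48/7$. Hence:

$$B = (0, 16, 0) + \tilde\lambda (2, -1, 0) + \tilde\mu (3, 0, -1) = (0, 16, 0) + \frac{80}{7} (2, -1, 0) - \frac{48}{7} (3, 0, -1) = \frac{16}{7} (1, 2, 3).$$

And:

$$|\overrightarrow{AB}| = \sqrt{5\tilde\lambda^2 + 10\tilde\mu^2 - 32\tilde\lambda + 12\tilde\lambda\tilde\mu + 256} = \sqrt{5 \left(\frac{80}{7}\right)^2 + 10 \left(\frac{48}{7}\right)^2 - 32 \left(\frac{80}{7}\right) - 12 \left(\frac{80}{7}\right) \left(\frac{48}{7}\right) + 256} = 16 \sqrt{\frac{2}{7}}.$$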

Happily, these results are the same as what we found in Ch. 58.
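The four examples above can be cross-checked with a short script. This is only a sketch under an assumption not spelled out in the text: for a plane r = r0 + λu + µv, setting both partial derivatives of the squared distance to A to zero gives the linear system (u ⋅ u)λ + (u ⋅ v)µ = (A − r0) ⋅ u and (u ⋅ v)λ + (v ⋅ v)µ = (A − r0) ⋅ v, which the code solves by Cramer's rule. The function name `foot_of_perpendicular` is hypothetical.

```python
from fractions import Fraction


def dot(u, v):
    """Scalar product of two vectors of equal length."""
    return sum(x * y for x, y in zip(u, v))


def foot_of_perpendicular(r0, u, v, A):
    """Solve the First Order Conditions for the plane r = r0 + lam*u + mu*v.

    Returns (lam, mu, B, squared distance from A to B).
    """
    w = tuple(a - r for a, r in zip(A, r0))
    uu, uv, vv = dot(u, u), dot(u, v), dot(v, v)
    wu, wv = dot(w, u), dot(w, v)
    det = uu * vv - uv * uv  # non-zero when u and v are non-parallel
    lam = Fraction(wu * vv - wv * uv, det)
    mu = Fraction(uu * wv - uv * wu, det)
    B = tuple(r + lam * x + mu * y for r, x, y in zip(r0, u, v))
    dist_sq = sum((b - a) ** 2 for a, b in zip(A, B))
    return lam, mu, B, dist_sq


# Usage: Example 1225.
print(foot_of_perpendicular((0, 0, 3), (1, -1, 0), (0, 1, -1), (1, 2, 3)))
```

Working with the squared distance avoids square roots, so exact fractions can be used throughout; the distance itself is the square root of the last returned value.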


119.16. The Relationship Between Two Lines in 3D Space

Fact 118. Suppose $l_1$ and $l_2$ are distinct lines described by $r = \overrightarrow{OP} + \lambda u$ and $r = \overrightarrow{OQ} + \lambda v$ (λ ∈ ℝ). Then the three possibilities are that $l_1$ and $l_2$ are:
(a) Parallel and do not intersect; moreover, the unique plane that contains $l_1$ and $l_2$ is described by $r = \overrightarrow{OP} + \lambda u + \mu \overrightarrow{PQ}$ (λ, µ ∈ ℝ).
(b) Non-parallel and share exactly one intersection point; moreover, the unique plane that contains $l_1$ and $l_2$ is described by $r = \overrightarrow{OP} + \lambda u + \mu v$ (λ, µ ∈ ℝ).
(c) Skew (i.e. neither parallel nor intersecting) and are not coplanar.

Proof. Let $q_a$ and $q_b$ be the planes given in (a) and (b).

(a) Suppose $l_1 \parallel l_2$. Then by Fact 75, they do not intersect. And so, $u \nparallel \overrightarrow{PQ}$.
By Fact 213, $q_a$ is the unique plane that contains P, Q, and u.
By plugging µ = 0 into $q_a$’s parametric equation, we see that $q_a$ contains $l_1$.
Next, observe that since u ∥ v, $l_2$ can also be described by $r = \overrightarrow{OQ} + \lambda u$ (λ ∈ ℝ). And now by plugging µ = 1 into $q_a$’s parametric equation, we see that $q_a$ also contains $l_2$.

In the remainder of this proof, we’ll suppose instead that $l_1 \nparallel l_2$. That is, u ∦ v. Then by Corollary 19, $q_b$ is the unique plane that contains P, u, and v. Thus, $q_b$ is the only possible plane that contains both $l_1$ and $l_2$.
By plugging µ = 0 into $q_b$’s parametric equation, we see that $q_b$ contains $l_1$.
By Fact 75, $l_1 \nparallel l_2$ implies that $l_1$ and $l_2$ share at most one intersection point.

(b) If $l_1$ and $l_2$ share an intersection point S, then $S \in l_1$ and so S is on $q_b$. Thus, $q_b$ can also be described by:

$$r = \overrightarrow{OS} + \lambda u + \mu v \quad (\lambda, \mu \in \mathbb{R}). \tag{3}$$

Since $S \in l_2$, the line $l_2$ can also be described by $r = \overrightarrow{OS} + \mu v$ (µ ∈ ℝ). By plugging λ = 0 into (3), we see that this plane also contains $l_2$. And so indeed, $q_b$ is the unique plane that contains both $l_1$ and $l_2$.

(c) Now suppose that $q_b$ contains $l_2$. Then there exist $\hat\lambda$ and $\hat\mu$ such that:

$$\overrightarrow{OQ} = \overrightarrow{OP} + \hat\lambda u + \hat\mu v.$$

Now consider the point $T = Q - \hat\mu v$. Clearly, $T \in l_2$. Moreover, $T \in l_1$ because:

$$\overrightarrow{OT} = \overrightarrow{OQ} - \hat\mu v = \overrightarrow{OP} + \hat\lambda u + \hat\mu v - \hat\mu v = \overrightarrow{OP} + \hat\lambda u.$$

We’ve just shown that if $q_b$ contains $l_2$, then $l_1$ and $l_2$ intersect. And so, by the contrapositive, if $l_1$ and $l_2$ do not intersect (and are thus skew), then $q_b$ does not contain $l_2$. Since $q_b$ is the only possible plane that contains both lines, the two lines are, indeed, not coplanar.


119.17. A Necessary and Sufficient Condition for Skew Lines

Fact 215. Let $l_1$ and $l_2$ be the lines described by $r = \overrightarrow{OP} + \lambda u$ and $r = \overrightarrow{OQ} + \lambda v$ (λ ∈ ℝ). Then:

$l_1$ and $l_2$ are skew ⟺ $\overrightarrow{PQ} \cdot (u \times v) \neq 0$.

Proof. (⟸) If $\overrightarrow{PQ} \cdot (u \times v) \neq 0$, then u × v ≠ 0, so that by Corollary 11, u ∦ v and $l_1 \nparallel l_2$.

Suppose for contradiction that $l_1$ and $l_2$ intersect at some point S. Then there are numbers α and β such that:

$$S = P + \alpha u = Q + \beta v \quad \text{or} \quad \overrightarrow{PQ} = \alpha u - \beta v.$$

And so:

$$\overrightarrow{PQ} \cdot (u \times v) = \alpha u \cdot (u \times v) - \beta v \cdot (u \times v) = 0 - 0 = 0.$$

But this contradicts $\overrightarrow{PQ} \cdot (u \times v) \neq 0$. So, $l_1$ and $l_2$ do not intersect.

Since $l_1$ and $l_2$ are non-parallel and do not intersect, they are skew.

(⟹) We prove the contrapositive. Suppose $\overrightarrow{PQ} \cdot (u \times v) = 0$.

If $\overrightarrow{PQ} = 0$, then P = Q, so that $l_1$ and $l_2$ intersect and are not skew. And if u × v = 0, then by Corollary 11, $l_1$ and $l_2$ are parallel and are again not skew.

So, suppose $\overrightarrow{PQ}, u \times v \neq 0$. Then by Corollary 11, u ∦ v.

Also, $\overrightarrow{PQ} \perp (u \times v)$. By Corollary 18 then, $\overrightarrow{PQ}$ lies on the same plane as u and v. And so by Theorem 10, there exist α and β such that:

$$\overrightarrow{PQ} = \alpha u + \beta v.$$

Now, let q be the plane described by:

$$\{R : \overrightarrow{OR} = \overrightarrow{OP} + \lambda u + \mu v \ (\lambda, \mu \in \mathbb{R})\}.$$

Clearly, q contains $l_1$ (to see this, set µ = 0). It also contains the point Q, because:

$$\overrightarrow{OQ} = \overrightarrow{OP} + \overrightarrow{PQ} = \overrightarrow{OP} + \alpha u + \beta v.$$

Hence, q also contains $l_2$ (to see this, set λ = α).

We’ve just shown that $l_1$ and $l_2$ are coplanar. And so by Corollary 24, they aren’t skew.
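Fact 215 gives a one-line test for skewness. The sketch below is an illustration with a hypothetical function name `are_skew`; it assumes exact (e.g. integer) coordinates so that the comparison with zero is reliable.

```python
def cross(u, v):
    """Vector product of two 3D vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dot(u, v):
    """Scalar product of two vectors of equal length."""
    return sum(x * y for x, y in zip(u, v))


def are_skew(P, u, Q, v):
    """True if r = OP + lam*u and r = OQ + lam*v are skew (Fact 215)."""
    PQ = tuple(q - p for p, q in zip(P, Q))
    return dot(PQ, cross(u, v)) != 0


# Usage: the x-axis, and a line parallel to the y-axis passing through (0, 0, 1).
print(are_skew((0, 0, 0), (1, 0, 0), (0, 0, 1), (0, 1, 0)))
```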


120. Appendices for Part IV. Complex Numbers

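Fact 216. Suppose a ∈ ℝ and b > 0. Then:

(a) The two square roots of a + ib (i.e. solutions to $x^2 = a + ib$) are:

$$\pm \frac{\sqrt{2}}{2} \left(\sqrt{\sqrt{a^2 + b^2} + a} + i \sqrt{\sqrt{a^2 + b^2} - a}\right).$$

(b) And the two square roots of a − ib (i.e. solutions to $x^2 = a - ib$) are:

$$\pm \frac{\sqrt{2}}{2} \left(\sqrt{\sqrt{a^2 + b^2} + a} - i \sqrt{\sqrt{a^2 + b^2} - a}\right).$$

Proof. (a)

$$\begin{aligned}
\left[\pm \frac{\sqrt{2}}{2} \left(\sqrt{\sqrt{a^2 + b^2} + a} + i \sqrt{\sqrt{a^2 + b^2} - a}\right)\right]^2 &= \frac{1}{2} \left[\sqrt{a^2 + b^2} + a - \left(\sqrt{a^2 + b^2} - a\right) + 2i \sqrt{\left(\sqrt{a^2 + b^2} + a\right) \left(\sqrt{a^2 + b^2} - a\right)}\right] \\
&= \frac{1}{2} \left(2a + 2i \sqrt{a^2 + b^2 - a^2}\right) = a + i \sqrt{b^2} = a + ib.
\end{aligned}$$

(b)

$$\begin{aligned}
\left[\pm \frac{\sqrt{2}}{2} \left(\sqrt{\sqrt{a^2 + b^2} + a} - i \sqrt{\sqrt{a^2 + b^2} - a}\right)\right]^2 &= \frac{1}{2} \left[\sqrt{a^2 + b^2} + a - \left(\sqrt{a^2 + b^2} - a\right) - 2i \sqrt{\left(\sqrt{a^2 + b^2} + a\right) \left(\sqrt{a^2 + b^2} - a\right)}\right] \\
&= \frac{1}{2} \left(2a - 2i \sqrt{a^2 + b^2 - a^2}\right) = a - i \sqrt{b^2} = a - ib.
\end{aligned}$$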

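Fact 217. Suppose a, b ∈ ℝ with b ≠ 0. Then the two square roots of a + bi are:

$$\pm \frac{\sqrt{2}}{2} \left(\sqrt{\sqrt{a^2 + b^2} + a} + i \frac{b}{|b|} \sqrt{\sqrt{a^2 + b^2} - a}\right).$$

Proof.

$$\begin{aligned}
\left[\pm \frac{\sqrt{2}}{2} \left(\sqrt{\sqrt{a^2 + b^2} + a} + i \frac{b}{|b|} \sqrt{\sqrt{a^2 + b^2} - a}\right)\right]^2 &= \frac{1}{2} \left[\sqrt{a^2 + b^2} + a - \left(\sqrt{a^2 + b^2} - a\right) + 2i \frac{b}{|b|} \sqrt{\left(\sqrt{a^2 + b^2} + a\right) \left(\sqrt{a^2 + b^2} - a\right)}\right] \\
&= \frac{1}{2} \left(2a + 2i \frac{b}{|b|} \sqrt{a^2 + b^2 - a^2}\right) = a + i \frac{b}{|b|} |b| = a + ib.
\end{aligned}$$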

Lemma 8. If p, q ∈ ℂ, then (a) $(p + q)^* = p^* + q^*$; and (b) $(pq)^* = p^* q^*$.

Proof. Let p = (a, b) and q = (c, d). Then:

$$\text{(a)} \quad (p + q)^* = (a + c, b + d)^* = (a + c, -b - d) = (a, -b) + (c, -d) = p^* + q^*.$$

$$\text{(b)} \quad (pq)^* = [(a, b)(c, d)]^* = (ac - bd, ad + bc)^* = (ac - bd, -ad - bc) = (a, -b)(c, -d) = p^* q^*.$$
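Theorem 12. (Complex Conjugate Root Theorem.) Suppose $c_0, c_1, \dots, c_n \in \mathbb{R}$. If z = a + ib solves $c_n x^n + c_{n-1} x^{n-1} + c_{n-2} x^{n-2} + \dots + c_1 x + c_0 = 0$, then so does $z^* = a - ib$.

Proof. We are given that a + ib solves $\sum_{k=0}^{n} c_k x^k = 0$. Or equivalently:

$$\sum_{k=0}^{n} c_k (a + ib)^k = 0. \tag{1}$$

Taking conjugates of both sides of (1), we have:

$$\left[\sum_{k=0}^{n} c_k (a + ib)^k\right]^* = 0^* = 0. \tag{2}$$

We now apply Lemma 8(a) and (b), together with the fact that each $c_k$ is real (so that $c_k^* = c_k$), to show that:

$$\left[\sum_{k=0}^{n} c_k (a + ib)^k\right]^* = \sum_{k=0}^{n} c_k (a - ib)^k. \tag{3}$$

Here is the computation:

$$\begin{aligned}
\left[\sum_{k=0}^{n} c_k (a + ib)^k\right]^* &\overset{(a)}{=} \sum_{k=0}^{n} \left[c_k (a + ib)^k\right]^* \overset{(b)}{=} \sum_{k=0}^{n} c_k^* \left[(a + ib)^k\right]^* \\
&= \sum_{k=0}^{n} c_k \left[(a + ib)^k\right]^* \overset{(b)}{=} \sum_{k=0}^{n} c_k \left[(a + ib)^*\right]^k = \sum_{k=0}^{n} c_k (a - ib)^k.
\end{aligned}$$

Together, (2) and (3) show that $\sum_{k=0}^{n} c_k (a - ib)^k = 0$. Hence, a − ib also solves $\sum_{k=0}^{n} c_k x^k = 0$.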
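The key step (3) in the above proof says that, for a polynomial with real coefficients, evaluating at the conjugate gives the conjugate of the value. Here is a small sketch that checks this on examples; the helper name `poly` is hypothetical, and coefficients are listed from the highest power down to the constant term.

```python
def poly(coeffs, x):
    """Evaluate c_n x^n + ... + c_1 x + c_0 by Horner's rule.

    coeffs lists the real coefficients from c_n down to c_0.
    """
    result = 0
    for c in coeffs:
        result = result * x + c
    return result


# Usage: x^2 - 2x + 5 = 0 is solved by 1 + 2i, and hence also by 1 - 2i.
z = complex(1, 2)
print(poly([1, -2, 5], z), poly([1, -2, 5], z.conjugate()))
```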

Fact 128. Let z be a non-zero complex number with ∣z∣ = r and arg z = θ. Then:

z = r (cos θ + i sin θ).

Proof. Let z = a + ib. Then by Definitions 162 and 163, we have:

$$r = |z| = \sqrt{a^2 + b^2}, \tag{1}$$

$$\theta = \arg z = \begin{cases} \cos^{-1} \dfrac{a}{|z|} = \cos^{-1} \dfrac{a}{\sqrt{a^2 + b^2}} & \text{if } b \geq 0, \\[2ex] -\cos^{-1} \dfrac{a}{|z|} = -\cos^{-1} \dfrac{a}{\sqrt{a^2 + b^2}} & \text{if } b < 0. \end{cases} \tag{2}$$

Rearranging (2):

$$a = \begin{cases} r \cos \theta, & \text{if } b \geq 0, \\ r \cos(-\theta) = r \cos \theta, & \text{if } b < 0. \end{cases}$$

So:

$$a = r \cos \theta. \tag{3}$$

Plugging (3) into (1), we have:

$$r = \sqrt{r^2 \cos^2 \theta + b^2} \iff r^2 = r^2 \cos^2 \theta + b^2 \iff b^2 = r^2 \sin^2 \theta \iff b = \pm r \sin \theta. \tag{4}$$

Observe that b ≥ 0 ⟺ sin θ ≥ 0 and b < 0 ⟺ sin θ < 0. That is, sin θ has the same sign as b. Hence, we can discard the negative value in (4) to get:

$$b = r \sin \theta. \tag{5}$$

And now by (3) and (5), the result follows: z = a + ib = r (cos θ + i sin θ).


Theorem 13. (Euler’s Formula) If θ ∈ ℝ, then $e^{i\theta} = \cos \theta + i \sin \theta$.

Proof. Define the function f ∶ ℝ → ℂ by $\theta \mapsto e^{-i\theta} (\cos \theta + i \sin \theta)$. Then (see footnote 402):

$$\begin{aligned}
f'(\theta) &= (-i) e^{-i\theta} (\cos \theta + i \sin \theta) + e^{-i\theta} (-\sin \theta + i \cos \theta) \\
&= e^{-i\theta} (-i \cos \theta + \sin \theta) + e^{-i\theta} (-\sin \theta + i \cos \theta) = 0.
\end{aligned}$$

But by Theorem XXX, the only functions whose derivatives are zero are constant functions. Thus, $e^{-i\theta} (\cos \theta + i \sin \theta) = C$ for some constant C.

To find what C is, plug in θ = 0 to get: $C = e^{-i \cdot 0} (\cos 0 + i \sin 0) = 1 \cdot (1 + 0) = 1$.

Thus, $e^{-i\theta} (\cos \theta + i \sin \theta) = 1$. Rearranging, $e^{i\theta} = \cos \theta + i \sin \theta$.

Lemma 9. Let x ∈ [kπ, (k + 1) π], where k ∈ ℤ. Then:

$$\cos^{-1} (\cos x) = \begin{cases} x - k\pi & \text{if } k \text{ is even}, \\ (k + 1) \pi - x & \text{if } k \text{ is odd}. \end{cases}$$

Proof. First, note that x − kπ and (k + 1) π − x are both in [0, π].

Moreover, if y ∈ [0, π], then $\cos^{-1} (\cos y) = y$.

(a) If k is even, then cos (kπ) = 1 and sin (kπ) = 0, so that:

$$\cos (x - k\pi) = \cos x \cos (k\pi) + \sin x \sin (k\pi) = \cos x.$$

And so: $\cos^{-1} (\cos x) = \cos^{-1} [\cos (x - k\pi)] = x - k\pi$. ✓

(b) If k is odd, then k + 1 is even, so that cos [(k + 1) π] = 1 and sin [(k + 1) π] = 0. Hence:

$$\cos [(k + 1) \pi - x] = \cos [(k + 1) \pi] \cos x + \sin [(k + 1) \pi] \sin x = \cos x.$$

And so: $\cos^{-1} (\cos x) = \cos^{-1} [\cos ((k + 1) \pi - x)] = (k + 1) \pi - x$. ✓

Footnote 402. We’re actually cheating a little with this proof here, because we haven’t explained how the derivatives of complex-valued functions work. We simply assume that they work “fairly similarly”.
Fact 130. Let z and w be non-zero complex numbers. Then:

(a) ∣zw∣ = ∣z∣ ∣w∣; and (b) arg (zw) = arg z + arg w + 2kπ,

where in (b):

$$k = \begin{cases} -1, & \text{if } \arg z + \arg w > \pi, \\ 0, & \text{if } \arg z + \arg w \in (-\pi, \pi], \\ 1, & \text{if } \arg z + \arg w \leq -\pi. \end{cases}$$

Proof. We already proved (a) in Exercise 277. We now prove (b).

As in (a), let r = ∣z∣, s = ∣w∣, θ = arg z, and φ = arg w.
Note that θ, φ ∈ (−π, π] and so θ + φ ∈ (−2π, 2π].

Case 1. Suppose sin (θ + φ) ≥ 0. Then by Definition 163:

$$\arg (zw) = \cos^{-1} \frac{rs \cos (\theta + \varphi)}{rs} = \cos^{-1} [\cos (\theta + \varphi)].$$

Also, since sin (θ + φ) ≥ 0, we have θ + φ ∈ (−2π, −π] ∪ [0, π] ∪ {2π}.

Case 1a. If θ + φ ∈ (−2π, −π], then by Lemma 9:

$$\arg (zw) = \cos^{-1} [\cos (\theta + \varphi)] = \theta + \varphi + 2\pi.$$

Case 1b. If θ + φ ∈ [0, π], then by Lemma 9:

$$\arg (zw) = \cos^{-1} [\cos (\theta + \varphi)] = \theta + \varphi.$$

Case 1c. If θ + φ = 2π, then:

$$\arg (zw) = \cos^{-1} (\cos 2\pi) = 0 = \theta + \varphi - 2\pi.$$

Case 2. Suppose sin (θ + φ) < 0. Then by Definition 163:

$$\arg (zw) = -\cos^{-1} \frac{rs \cos (\theta + \varphi)}{rs} = -\cos^{-1} [\cos (\theta + \varphi)].$$

Also, since sin (θ + φ) < 0, we have θ + φ ∈ (−π, 0) ∪ (π, 2π).

Case 2a. If θ + φ ∈ (−π, 0), then by Lemma 9:

$$\arg (zw) = -\cos^{-1} [\cos (\theta + \varphi)] = -[-(\theta + \varphi)] = \theta + \varphi.$$

Case 2b. If θ + φ ∈ (π, 2π), then by Lemma 9:

$$\arg (zw) = -\cos^{-1} [\cos (\theta + \varphi)] = -[2\pi - (\theta + \varphi)] = \theta + \varphi - 2\pi.$$

We’ve just shown that arg (zw) = arg z + arg w + 2kπ, with:

$$k = \begin{cases} -1, & \text{if } \arg z + \arg w > \pi, \\ 0, & \text{if } \arg z + \arg w \in (-\pi, \pi], \\ 1, & \text{if } \arg z + \arg w \leq -\pi. \end{cases}$$
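Fact 130(b) can be checked numerically with Python's standard `cmath.phase`, which returns an argument in [−π, π]. This is only a floating-point sketch: the function name `arg_product` is hypothetical, and inputs whose argument sum lies very close to ±π may be affected by rounding.

```python
import cmath
import math


def arg_product(z, w):
    """Compute arg(zw) as arg z + arg w + 2k*pi, with k chosen as in Fact 130."""
    total = cmath.phase(z) + cmath.phase(w)
    if total > math.pi:
        k = -1
    elif total <= -math.pi:
        k = 1
    else:
        k = 0
    return total + 2 * k * math.pi


# Usage: both arguments are 3*pi/4, so their sum exceeds pi and k = -1.
z = complex(-1, 1)
w = complex(-1, 1)
print(arg_product(z, w), cmath.phase(z * w))
```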
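Fact 131. Suppose w is a non-zero complex number. Then:

$$\text{(a)} \quad \left| \frac{1}{w} \right| = \frac{1}{|w|}.$$

If moreover w is not a negative real number, then:

$$\text{(b)} \quad \arg \frac{1}{w} = -\arg w.$$

Proof. Let w = a + ib ≠ 0. By Fact 124(b):

$$\frac{1}{w} = \left(\frac{a}{a^2 + b^2}, -\frac{b}{a^2 + b^2}\right).$$

(a) By Definition 162, $|w| = \sqrt{a^2 + b^2}$ and:

$$\left| \frac{1}{w} \right| = \sqrt{\left(\frac{a}{a^2 + b^2}\right)^2 + \left(-\frac{b}{a^2 + b^2}\right)^2} = \frac{\sqrt{a^2 + b^2}}{a^2 + b^2} = \frac{1}{\sqrt{a^2 + b^2}} = \frac{1}{|w|}. \quad ✓$$

(b) By Definition 163:

$$\arg w = \begin{cases} \cos^{-1} \dfrac{a}{\sqrt{a^2 + b^2}} & \text{if } b \geq 0, \\[2ex] -\cos^{-1} \dfrac{a}{\sqrt{a^2 + b^2}} & \text{if } b < 0. \end{cases}$$

And:

$$\arg \frac{1}{w} = \begin{cases} \cos^{-1} \dfrac{a/(a^2 + b^2)}{1/\sqrt{a^2 + b^2}} = \cos^{-1} \dfrac{a}{\sqrt{a^2 + b^2}}, & \text{if } \dfrac{-b}{a^2 + b^2} \geq 0 \text{ or } b \leq 0, \\[3ex] -\cos^{-1} \dfrac{a/(a^2 + b^2)}{1/\sqrt{a^2 + b^2}} = -\cos^{-1} \dfrac{a}{\sqrt{a^2 + b^2}}, & \text{if } \dfrac{-b}{a^2 + b^2} < 0 \text{ or } b > 0. \end{cases}$$

Thus, if b < 0, then:

$$\arg \frac{1}{w} = \cos^{-1} \frac{a}{\sqrt{a^2 + b^2}} = -\arg w. \quad ✓$$

And if b > 0, then:

$$\arg \frac{1}{w} = -\cos^{-1} \frac{a}{\sqrt{a^2 + b^2}} = -\arg w. \quad ✓$$

If b = 0, a > 0, then arg w = arg a = 0 and $\arg \frac{1}{w} = \arg \frac{1}{a} = 0$ so that indeed:

$$\arg \frac{1}{w} = -\arg w. \quad ✓$$

And in the exceptional case where b = 0, a < 0, we have $\arg \frac{1}{w} = \pi = \arg w$.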
1322, Contents
Fact 132. Let z and w be non-zero complex numbers. Then:

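\[
\text{(a)} \quad \left|\frac{z}{w}\right| = \frac{|z|}{|w|}; \quad \text{and} \quad \text{(b)} \quad \arg\frac{z}{w} = \arg z - \arg w + 2k\pi,
\]

where in (b):

\[
k = \begin{cases}
-1, & \text{if } \arg z - \arg w > \pi, \\
0, & \text{if } \arg z - \arg w \in (-\pi, \pi], \\
1, & \text{if } \arg z - \arg w \le -\pi.
\end{cases}
\]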

Proof. We already proved (a) in Exercise 280. We now prove (b):

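Suppose w is not a negative real number. Then by Facts 130 and 131:

\[
\arg\frac{z}{w} = \arg\left(z \cdot \frac{1}{w}\right) = \arg z + \arg\frac{1}{w} + 2k\pi = \arg z - \arg w + 2k\pi,
\]

where:

\[
k = \begin{cases}
-1, & \text{if } \arg z + \arg\frac{1}{w} = \arg z - \arg w > \pi, \\
0, & \text{if } \arg z + \arg\frac{1}{w} = \arg z - \arg w \in (-\pi, \pi], \\
1, & \text{if } \arg z + \arg\frac{1}{w} = \arg z - \arg w \le -\pi.
\end{cases}
\]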


Now suppose instead that w is a negative real number. Then arg w = π. Moreover, z/w is a positive real multiple of −z, so that arg (z/w) = arg (−z).
And now by Corollary 28:

\[
\arg\frac{z}{w} = \arg(-z) = \begin{cases}
\arg z - \pi = \arg z - \arg w + 2(0)\pi, & \text{if } \arg z > 0 \text{ (so that } \arg z - \arg w \in (-\pi, \pi]\text{)}, \\
\arg z + \pi = \arg z - \arg w + 2(1)\pi, & \text{if } \arg z \le 0 \text{ (so that } \arg z - \arg w \le -\pi\text{)}.
\end{cases}
\]

121. Appendices for Part V. Calculus

Revision in progress (Jan 2019).

And hence messy at the moment.
Appy polly loggies for any inconvenience caused.


121.1. A Few Useful Results and Terms
We shall assume, without proof, the following two intuitively-“obvious” results.
First, we can always find another real number between any two real numbers:

Fact 218. If a < b, then there exists c ∈ R such that a < c < b.

Second, given any real number, we can always find a bigger natural number:

Fact 219. (Archimedean Property) If x ∈ R, then there exists n ∈ N such that n > x.

Figure to be
inserted here.

Definition 248. Let x ∈ R and ε > 0. Then the ε-neighbourhood of x, denoted Nε (x), is:

Nε (x) = (x − ε, x + ε).

The left ε-neighbourhood of x, denoted N−ε (x), is:

N−ε (x) = (x − ε, x).

The right ε-neighbourhood of x, denoted N+ε (x), is:

N+ε (x) = (x, x + ε).

And the deleted (or punctured) ε-neighbourhood of x, denoted Nε (x), is:

Nε (x) = N−ε (x) ∪ N+ε (x) = (x − ε, x) ∪ (x, x + ε) = Nε (x) ∖ {x}.

Remark 146. The Nε (x) notation is fairly standard, though other writers may use the letters B or V instead of N.
Unfortunately, there is no standard notation for deleted neighbourhoods, which is why here in these Appendices I’ve come up with Nε (x), which is not at all standard. (Velleman (2016, Calculus: A Rigorous First Course) uses the equally non-standard notation x → a≠.)

We can also speak more generally of the ε-neighbourhood of a point in any n-dimensional
space. For example, in two-dimensional space, we have the following definition:

Definition 249. Let a, b ∈ R and ε > 0. Then the ε-neighbourhood of the point P = (a, b),
denoted Nε (P ), is:

\[
N_\varepsilon(P) = \left\{(x, y) : \sqrt{(x-a)^2 + (y-b)^2} < \varepsilon\right\}.
\]

And the deleted (or punctured) ε-neighbourhood of P , denoted Nε (P ), is:

Nε (P ) = Nε (P ) ∖ {P }.

Informally, an isolated point in a set S is one that isn’t “close” to any other point in S.

Definition 250. Let S be a set of real numbers. We call x an isolated point of S if x ∈ S

and there exists some ε > 0 such that the deleted ε-neighbourhood of x contains no point of S, i.e. (Nε (x) ∖ {x}) ∩ S is empty.

Informally, a limit point x of a set S is a point for which we can always find another point in S that’s
“arbitrarily” close to x. Formally:

Definition 251. Let S be a set of real numbers. We call x a limit point of S if for every
ε > 0, the ε-neighbourhood of x intersects S at some point other than x. Equivalently, for every ε > 0:

(Nε (x) ∖ {x}) ∩ S ≠ ∅.

Note importantly that a limit point x of S may or may not be in the set S.

Remark 147. Some writers treat the terms limit point, cluster point, and accumula-
tion point as synonyms. But unfortunately and very confusingly, other writers assign
different meanings to these three terms. Fortunately, in these appendices, we will only
mention limit points. We will never again mention cluster or accumulation points.


121.2. Limits
In the main text, we gave the following informal definition of the phrase “the limit of f at
a is L ∈ R”:

For all values of x that are “close” but not equal to a,

f (x) is “close” (or possibly even equal) to L.

We now formalise the above idea:

Definition 252. Let D ⊆ R, f ∶ D → R, and a be a limit point of D. We say that the

limit of f at a is L ∈ R and write $\lim_{x \to a} f(x) = L$ if:

For every ε > 0, there exists δ > 0 such that

x ∈ D ∩ (Nδ (a) ∖ {a}) implies f (x) ∈ Nε (L).

The above definition is called the ε-δ definition and is usually credited to Bernard Bolzano
(1781–1848) and Augustin-Louis Cauchy (1789–1857).
Take note of the subtle but important requirement that a be a limit point of D. If a is not
a limit point of D, then lim f (x) is simply undefined (or does not exist).
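To make the roles of ε and δ concrete, here is a small numerical sketch. It is an illustration only, since a finite sample cannot prove a limit. It assumes the made-up example f(x) = 2x + 1 at a = 3 with L = 7, for which ∣f(x) − 7∣ = 2∣x − 3∣ and hence δ = ε/2 works. The names `delta_for` and `epsilon_delta_holds` are hypothetical helpers.

```python
def f(x: float) -> float:
    """Illustrative function f(x) = 2x + 1, whose limit at a = 3 is L = 7."""
    return 2 * x + 1


def delta_for(eps: float) -> float:
    """Since |f(x) - 7| = 2|x - 3|, the choice delta = eps / 2 works."""
    return eps / 2


def epsilon_delta_holds(eps: float, a: float = 3.0, L: float = 7.0, samples: int = 1000) -> bool:
    """Spot-check the implication on sample points x with 0 < |x - a| < delta."""
    delta = delta_for(eps)
    for i in range(1, samples + 1):
        offset = delta * i / (samples + 1)  # strictly between 0 and delta
        for x in (a - offset, a + offset):
            if not abs(f(x) - L) < eps:
                return False
    return True


for eps in (1.0, 0.1, 1e-3, 1e-6):
    print(eps, delta_for(eps), epsilon_delta_holds(eps))
```

Note that the sample points never include x = a itself, mirroring the requirement that x be close but not equal to a.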
The following result says that if it exists, the limit must be unique:

Fact 220. Let f be a nice function. If $\lim_{x \to a} f(x) = L_1$ and $\lim_{x \to a} f(x) = L_2$, then L1 = L2 .

Proof. Suppose for contradiction that L1 ≠ L2 . Then pick ε = ∣L1 − L2 ∣ /2. Observe that
Nε (L1 ) ∩ Nε (L2 ) = ∅.
Let D = Domainf . By Definition 252, there exist δ1 , δ2 > 0 such that x ∈ D ∩ Nmin{δ1 ,δ2 } (a) with x ≠ a
implies f (x) ∈ Nε (L1 ) AND f (x) ∈ Nε (L2 ). Since a is a limit point of D, at least one such x exists. But since Nε (L1 ) ∩ Nε (L2 ) = ∅, we have a contradiction.

To define left- and right-hand limits, simply tweak Definition 252:

Definition 253. Let f be a nice function with domain D. Let a be a limit point of D.
We say that the left-hand limit of f at a is L ∈ R and write $\lim_{x \to a^-} f(x) = L$ if:

For every ε > 0, there exists δ > 0 such that

x ∈ D ∩ N−δ (a) implies f (x) ∈ Nε (L).

Definition 254. Let f be a nice function with domain D. Let a be a limit point of D.
We say that the right-hand limit of f at a is L ∈ R and write $\lim_{x \to a^+} f(x) = L$ if:

For every ε > 0, there exists δ > 0 such that

x ∈ D ∩ N+δ (a) implies f (x) ∈ Nε (L).


121.3. Infinite Limits and Vertical Asymptotes

Definition 255. Let D ⊆ R, f ∶ D → R, and a be a limit point of D.

(a) We say that the left-hand limit of f at a is ∞ and write $\lim_{x \to a^-} f(x) = \infty$ if:

For every N ∈ R, there exists δ > 0 such that

x ∈ D ∩ N−δ (a) implies f (x) > N .

(b) We say that the right-hand limit of f at a is ∞ and write $\lim_{x \to a^+} f(x) = \infty$ if:

For every N ∈ R, there exists δ > 0 such that

x ∈ D ∩ N+δ (a) implies f (x) > N .

(c) We say that the limit of f at a is ∞ and write $\lim_{x \to a} f(x) = \infty$ if:

For every N ∈ R, there exists δ > 0 such that

x ∈ D ∩ (Nδ (a) ∖ {a}) implies f (x) > N .

(d) We say that the left-hand limit of f at a is −∞ and write $\lim_{x \to a^-} f(x) = -\infty$ if:

For every N ∈ R, there exists δ > 0 such that

x ∈ D ∩ N−δ (a) implies f (x) < N .

(e) We say that the right-hand limit of f at a is −∞ and write $\lim_{x \to a^+} f(x) = -\infty$ if:

For every N ∈ R, there exists δ > 0 such that

x ∈ D ∩ N+δ (a) implies f (x) < N .

(f) We say that the limit of f at a is −∞ and write $\lim_{x \to a} f(x) = -\infty$ if:

For every N ∈ R, there exists δ > 0 such that

x ∈ D ∩ (Nδ (a) ∖ {a}) implies f (x) < N .

At long last, we can finally define vertical asymptotes:

Definition 256. The line x = a is a vertical asymptote of a function f if the left- or

right-hand limit of f at a is ±∞.
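As a hedged illustration of an infinite limit, consider the made-up example f(x) = 1/x² on D = R ∖ {0}, which has the vertical asymptote x = 0. Given any bound N > 0, the choice δ = 1/√N works, because 0 < ∣x∣ < δ gives 1/x² > N. The sketch below spot-checks this on sample points. The helper names are hypothetical, and a finite check is of course not a proof.

```python
import math


def f(x: float) -> float:
    """Illustrative function f(x) = 1/x^2, defined for x != 0."""
    return 1 / x**2


def delta_for(N: float) -> float:
    """For N > 0, delta = 1/sqrt(N) works; for N <= 0, any delta does."""
    return 1.0 if N <= 0 else 1 / math.sqrt(N)


def exceeds_bound(N: float, samples: int = 1000) -> bool:
    """Spot-check that 0 < |x| < delta implies f(x) > N."""
    delta = delta_for(N)
    for i in range(1, samples + 1):
        offset = delta * i / (samples + 1)  # strictly between 0 and delta
        for x in (-offset, offset):
            if not f(x) > N:
                return False
    return True


for N in (-5.0, 10.0, 1e6):
    print(N, delta_for(N), exceeds_bound(N))
```

Both the left- and right-hand sample points are checked, so this example illustrates parts (a), (b), and (c) of Definition 255 at once.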


121.4. Limits at Infinity and Horizontal and Oblique Asymptotes

Definition 257. Let D ⊆ R and f ∶ D → R.

(a) We say that the limit of f as x approaches ∞ is L ∈ R and write $\lim_{x \to \infty} f(x) = L$ if:

For every ε > 0, there exists N ∈ R such

that x > N (and x ∈ D) implies f (x) ∈ Nε (L).

(b) We say that the limit of f as x approaches −∞ is L ∈ R and write $\lim_{x \to -\infty} f(x) = L$ if:

For every ε > 0, there exists N ∈ R such

that x < N (and x ∈ D) implies f (x) ∈ Nε (L).

Again, at long last, we can finally define horizontal asymptotes:

Definition 258. The line y = L is a horizontal asymptote of a function f if the limit of

f as x approaches ±∞ is L.

We can similarly define oblique asymptotes:

Definition 259. Let D ⊆ R and f ∶ D → R.

(a) We say that the limit of f as x approaches ∞ is ax + b and write $\lim_{x \to \infty} f(x) = ax + b$ if:

For every ε > 0, there exists N ∈ R such

that x > N (and x ∈ D) implies f (x) ∈ Nε (ax + b).

(b) We say that the limit of f as x approaches −∞ is ax + b and write $\lim_{x \to -\infty} f(x) = ax + b$ if:

For every ε > 0, there exists N ∈ R such

that x < N (and x ∈ D) implies f (x) ∈ Nε (ax + b).

Definition 260. The line y = ax + b is an oblique asymptote of a function f if a ≠ 0 and

the limit of f as x approaches ±∞ is ax + b.
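Here is a small numerical sketch of Definitions 259 and 260. It assumes the made-up example f(x) = x + 1 + 1/x, whose graph approaches the line y = x + 1 as x grows: the gap is ∣f(x) − (x + 1)∣ = 1/x, so N = 1/ε works. The names `threshold_for` and `stays_within` are hypothetical, and sampling finitely many points is only a sanity check.

```python
def f(x: float) -> float:
    """Illustrative function f(x) = x + 1 + 1/x, for x > 0."""
    return x + 1 + 1 / x


def line(x: float) -> float:
    """The candidate oblique asymptote y = x + 1 (so a = 1, b = 1)."""
    return x + 1


def threshold_for(eps: float) -> float:
    """Since |f(x) - (x + 1)| = 1/x, the choice N = 1/eps works."""
    return 1 / eps


def stays_within(eps: float, samples: int = 50) -> bool:
    """Spot-check that x > N implies |f(x) - (ax + b)| < eps."""
    N = threshold_for(eps)
    for i in range(1, samples + 1):
        x = N * (1 + i / 10)  # sample points strictly to the right of N
        if not abs(f(x) - line(x)) < eps:
            return False
    return True


for eps in (0.5, 1e-2, 1e-4):
    print(eps, threshold_for(eps), stays_within(eps))
```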


121.5. Rules for Limits

Definition 261. Given a set S ⊆ R, the largest and smallest numbers in S (if they exist)
are denoted by max S and min S.

Example 1229. Let S = {1, 2, 3}. Then max S = 3 and min S = 1.

Example 1230. Let T = [0, 1]. Then max T = 1 and min T = 0.

Example 1231. Let U = (0, 1). Then neither max U nor min U exists because the set U
has no largest or smallest number.

Theorem 14. (Rules for Limits) Suppose k, L, M ∈ R, $\lim_{x \to a} f(x) = L$, and $\lim_{x \to a} g(x) = M$. Then:
(a) $\lim_{x \to a} [k f(x)] = kL$ (Constant Factor Rule)
(b) $\lim_{x \to a} [f(x) \pm g(x)] = L \pm M$ (Sum and Difference Rules)
(c) $\lim_{x \to a} [f(x) g(x)] = LM$ (Product Rule)
(d) $\lim_{x \to a} \frac{1}{g(x)} = \frac{1}{M}$ (provided M ≠ 0) (Reciprocal Rule)
(e) $\lim_{x \to a} \frac{f(x)}{g(x)} = \frac{L}{M}$ (provided M ≠ 0) (Quotient Rule)
(f) $\lim_{x \to a} k = k$ (Constant Rule)
(g) $\lim_{x \to a} x^k = a^k$ (Power Rule)

Proof. Fix ε > 0. For each Rule, we shall show that there exists some δ > 0 such that if
x ∈ D ∩ Nδ (a), then the value of the given function at x is less than ε away from the
purported limit.
First, note that the statements $\lim_{x \to a} f(x) = L$ and $\lim_{x \to a} g(x) = M$ say the following:

☺ For every εf > 0, there exists δf > 0 such that x ∈ D ∩ Nδf (a) implies ∣f (x) − L∣ < εf .
☀ For every εg > 0, there exists δg > 0 such that x ∈ D ∩ Nδg (a) implies ∣g (x) − M ∣ < εg .

(a) If k = 0, then kf (x) = 0 = kL for every x and there is nothing to prove. Otherwise, pick εf = ε/ ∣k∣ and let δf be as given by ☺. Pick δ = δf .

Suppose x ∈ D ∩ Nδ (a). Then ∣f (x) − L∣ < εf and thus:

∣kf (x) − kL∣ = ∣k∣ ∣f (x) − L∣ < ∣k∣ εf = ∣k∣ ε/ ∣k∣ = ε.

(b) Pick εf = ε/2 and εg = ε/2, and let δf and δg be as given by ☺ and ☀. Pick δ =
min {δf , δg }.
Suppose x ∈ D ∩ Nδ (a). Then ∣f (x) − L∣ < εf , ∣g (x) − M ∣ < εg and thus:

∣f (x) ± g (x) − (L ± M )∣ ≤ ∣f (x) − L∣ + ∣g (x) − M ∣ < εf + εg = ε,

where ≤ denotes the use of the Triangle Inequality.

(c) Pick εf = 0.5ε/ ∣M ∣ (or εf = 1 if M = 0). Let N = max {∣L − εf ∣, ∣L + εf ∣}. Pick εg = 0.5ε/N . Let δf and δg be
as given by ☺ and ☀. Pick δ = min {δf , δg }.
Suppose x ∈ D ∩ Nδ (a). Then ∣f (x) − L∣ < εf , ∣g (x) − M ∣ < εg , and also ∣f (x)∣ < N . Thus:

\[
\begin{aligned}
|f(x)g(x) - LM| &= |f(x)g(x) - f(x)M + f(x)M - LM| \\
&\le |f(x)g(x) - f(x)M| + |f(x)M - LM| \\
&= |f(x)|\,|g(x) - M| + |f(x) - L|\,|M| \\
&\le |f(x)|\,\varepsilon_g + 0.5\varepsilon < N\varepsilon_g + 0.5\varepsilon = 0.5\varepsilon + 0.5\varepsilon = \varepsilon.
\end{aligned}
\]

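(d) Let N be an integer large enough that N > ε/ ∣M ∣, N > ε², and, with εg as defined next, ∣M ∣ − εg > ε/N + εg . (Such an N exists because ε/N and εg both shrink towards 0 as N grows.)
Pick:

\[
\varepsilon_g = \frac{\varepsilon^3}{N(N - \varepsilon^2)} > 0.
\]

Observe that (we’ll use this below):

\[
\frac{\varepsilon_g N^2}{\varepsilon^2 + \varepsilon_g \varepsilon N}
= \frac{\dfrac{\varepsilon^3}{N(N-\varepsilon^2)} N^2}{\varepsilon^2 + \dfrac{\varepsilon^3}{N(N-\varepsilon^2)} \varepsilon N}
= \frac{\varepsilon^3 N}{\varepsilon^2 (N - \varepsilon^2) + \varepsilon^4}
= \frac{\varepsilon N}{N - \varepsilon^2 + \varepsilon^2}
\overset{1}{=} \varepsilon.
\]

Let δg be as given by ☀. Pick δ = δg .

Suppose x ∈ D ∩ Nδ (a), with g (x) ≠ 0. Then ∣g (x) − M ∣ < εg , ∣g (x)∣ > ∣M ∣ − εg > ε/N + εg ,
and thus:

\[
\left|\frac{1}{g(x)} - \frac{1}{M}\right| = \left|\frac{M - g(x)}{g(x) M}\right| = \frac{|g(x) - M|}{|g(x)|\,|M|} < \frac{\varepsilon_g}{(\varepsilon/N + \varepsilon_g)\,\varepsilon/N} = \frac{\varepsilon_g N^2}{\varepsilon^2 + \varepsilon_g \varepsilon N} \overset{1}{=} \varepsilon.
\]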

(e) follows from (c) and (d).

(f) For any x, ∣k − k∣ = 0 < ε.

(g) Let a > 0. Pick δ = min {ε/(2 exp k), a/2}. Suppose x ∈ D ∩ Nδ (a). Note that since δ ≤ a/2,
we have x > 0. And now:

\[
|x^k - a^k| \overset{1}{=} |\exp(k \ln x) - \exp(k \ln a)| \overset{2}{=} |(\exp k)[\exp(\ln x) - \exp(\ln a)]| \overset{3}{=} |(\exp k)(x - a)| < \varepsilon.
\]

Above, $\overset{1}{=}$ uses Definition 271, while $\overset{2}{=}$ and $\overset{3}{=}$ use Fact 160(c) and Definition 59. This
completes the proof of the Power Rule in the case where a > 0.
It remains to be proven that the Power Rule holds in the case where a ≤ 0. Unfortunately,
such a proof must be omitted altogether from this textbook, for reasons that were already
discussed in Remark 156.


121.6. Continuity
As mentioned in the main text, discontinuities may be classified as removable, jump,
or essential (or infinite). Here now are the formal definitions of these three types of discontinuity:

Definition 262. Suppose the function f has a discontinuity at a. Then we call this:
(a) A removable discontinuity if $\lim_{x \to a} f(x)$ exists;
(b) A jump discontinuity if both $\lim_{x \to a^-} f(x)$ and $\lim_{x \to a^+} f(x)$ exist but $\lim_{x \to a^-} f(x) \ne \lim_{x \to a^+} f(x)$;
(c) An essential (or infinite) discontinuity if it isn’t a removable or a jump discontinuity.

Fact 135. Suppose c ∈ R and f is a nice function defined by f (x) = c. Then f is continuous.

Proof. Let ε > 0 and a ∈ Domainf . Pick any δ > 0. Then for all x ∈ Nδ (a), we have:

∣f (x) − f (a)∣ = ∣c − c∣ = 0 < ε.

Fact 136. Suppose f is a nice function defined by f (x) = x. Then f is continuous.

Proof. Let ε > 0 and a ∈ Domainf . Pick δ = ε. Then for all x ∈ Nδ (a), we have:

∣f (x) − f (a)∣ = ∣x − a∣ < δ = ε.

Theorem 15. Let f and g be nice functions and a ∈ Domaing. If g is continuous at a

and f is continuous at g (a), then the composite function f g is continuous at a.

Proof. Let ε > 0. Since f is continuous at b = g (a), there exists δ̂ > 0 such that for every
y ∈ Domainf ∩ Nδ̂ (b), we have ∣f (y) − f (b)∣ < ε.
Since g is continuous at a, there exists δ > 0 such that for every x ∈ Domaing ∩ Nδ (a),
we have ∣g (x) − g (a)∣ < δ̂, i.e. g (x) ∈ Nδ̂ (b).
Hence, for every x ∈ Nδ (a) at which f g is defined, we have ∣(f g) (x) − (f g) (a)∣ < ε.

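Fact 221. Any nice function with the mapping x ↦ 1/x is continuous.

Proof. Let D ⊆ R ∖ {0}, f ∶ D → R be defined by f (x) = 1/x, a ∈ D, and ε ∈ (0, 1/∣a∣).

If a > 0, then pick:

\[
\delta = \frac{a^2 \varepsilon}{1 + a\varepsilon} < a^2 \varepsilon < a.
\]

Then for every x ∈ D ∩ Nδ (a), we have:

\[
\frac{1}{a + \delta} < \left|\frac{1}{x}\right| < \frac{1}{a - \delta} = \frac{1}{a - \frac{a^2 \varepsilon}{1 + a\varepsilon}} = \frac{1 + a\varepsilon}{a} = \frac{1}{a} + \varepsilon.
\]

And so:

\[
\begin{aligned}
|f(x) - f(a)| &= \left|\frac{1}{x} - \frac{1}{a}\right| = \left|\frac{a - x}{ax}\right| = \left|\frac{1}{a}\right| \left|\frac{1}{x}\right| |x - a| \\
&< \frac{1}{a} \cdot \frac{1 + a\varepsilon}{a} |x - a| < \frac{1}{a} \cdot \frac{1 + a\varepsilon}{a}\, \delta = \frac{1}{a} \cdot \frac{1 + a\varepsilon}{a} \cdot \frac{a^2 \varepsilon}{1 + a\varepsilon} = \varepsilon.
\end{aligned}
\]

The case where a < 0 is similarly handled.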
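The choice δ = a²ε/(1 + aε) in the above proof can be spot-checked numerically. This sketch assumes a > 0 and samples points in the δ-neighbourhood of a. The function names are made up for the illustration, and a finite sample is a sanity check, not a proof.

```python
def delta_for(a: float, eps: float) -> float:
    """The delta used in the proof of Fact 221, for a > 0."""
    return a * a * eps / (1 + a * eps)


def reciprocal_stays_close(a: float, eps: float, samples: int = 1000) -> bool:
    """Spot-check that |x - a| < delta implies |1/x - 1/a| < eps."""
    delta = delta_for(a, eps)
    for i in range(-samples, samples + 1):
        x = a + delta * i / (samples + 1)  # points with |x - a| < delta
        if not abs(1 / x - 1 / a) < eps:
            return False
    return True


for a, eps in ((2.0, 0.1), (0.5, 1.0), (10.0, 0.01)):
    print(a, eps, delta_for(a, eps), reciprocal_stays_close(a, eps))
```

Since δ < a, every sample point x is positive, so the division 1/x is always defined.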

Theorem 16. Suppose the nice functions f and g are continuous at a. Then (a) f ±g and
(b) f ⋅ g are also continuous at a. (c) If moreover g (a) ≠ 0, then f /g is also continuous
at a. (d) If c ∈ R, then cf is also continuous at a.

Proof. Let ε > 0.

(a) By the continuity of f and g at a, there exists δ1 > 0 such that for every x ∈ Nδ1 (a) ∩
Domainf ∩ Domaing, we have: f (x) ∈ Nε/2 (f (a)), g (x) ∈ Nε/2 (g (a)), so that:

∣f (x) ± g (x) − [f (a) ± g (a)]∣ ≤ ∣f (x) − f (a)∣ + ∣g (x) − g (a)∣ < ε/2 + ε/2 = ε.

Hence, f ± g are continuous at a.

(b) Let:

\[
q = \frac{\varepsilon}{2\left(|f(a)| + 1\right)} \quad \text{and} \quad p = \frac{\varepsilon}{2\left[q + |g(a)|\right]}.
\]
By the continuity of f and g at a, there exists δ2 > 0 such that for every x ∈ Nδ2 (a) ∩
Domainf ∩ Domaing, we have: f (x) ∈ Np (f (a)), g (x) ∈ Nq (g (a)), so that:

∣f (x) g (x) − f (a) g (a)∣ = ∣f (x) g (x) − f (a) g (x) + f (a) g (x) − f (a) g (a)∣

≤ ∣f (x) g (x) − f (a) g (x)∣ + ∣f (a) g (x) − f (a) g (a)∣

= ∣[f (x) − f (a)] g (x)∣ + ∣f (a) [g (x) − g (a)]∣

= ∣f (x) − f (a)∣ ∣g (x)∣ + ∣f (a)∣ ∣g (x) − g (a)∣

= ∣f (x) − f (a)∣ ∣g (x) − g (a) + g (a)∣ + ∣f (a)∣ ∣g (x) − g (a)∣

≤ ∣f (x) − f (a)∣ [∣g (x) − g (a)∣ + ∣g (a)∣] + ∣f (a)∣ ∣g (x) − g (a)∣

< p [q + ∣g (a)∣] + ∣f (a)∣ q

= ε/2 + ∣f (a)∣ ε/[2 (∣f (a)∣ + 1)] < ε/2 + ε/2 = ε.


(c) From Fact 221 and Theorem 15, 1/g is continuous at a. The result then follows by also
using (b).
(d) Apply Theorem 15 and (b).

Remark 148. The usual, short proof of (b) relies on other hard-fought results about
sequential limits. However, we have not discussed sequential limits at all in this textbook
and so we cannot use those results.

Proposition 18. If a monotonic function has an interval as its range, then it is continuous.

Proof. Let D ⊆ R, f ∶ D → R be an increasing function, and E = Rangef be an interval.

(The proof of the case where f is decreasing is similar and thus omitted.)
Let ε > 0. Pick any a ∈ D. (If no such a exists, then D is empty and f is continuous.)
We will prove that in each of four possible cases, f is continuous at a. (Since a ∈ D is
arbitrarily chosen, we will thus have proven that f is continuous.)
Case 1. Suppose there exist b and c such that f (b) ∈ (f (a) , f (a) + ε) and f (c) ∈
(f (a) − ε, f (a)).
Since f is increasing, b > a. Let δ = b − a. Since f is increasing, for all x ∈ (a, b) ∩ D (where b = a + δ), we have f (a) ≤ f (x) ≤ f (b),
so that ∣f (x) − f (a)∣ < ε. ☺
Since f is increasing, a > c. Let δ̂ = a − c. Since f is increasing, for all x ∈ (c, a) ∩ D (where c = a − δ̂), we
have f (c) ≤ f (x) ≤ f (a), so that ∣f (x) − f (a)∣ < ε. ⋆
By ☺ and ⋆, we have thus found δ̄ = min {δ, δ̂} > 0 such that for every x ∈ Nδ̄ (a) ∩ D, we have
∣f (x) − f (a)∣ < ε. Thus, f is continuous at a.
Case 2. Suppose there exists b such that f (b) ∈ (f (a) , f (a) + ε), but there does not exist c such that
f (c) ∈ (f (a) − ε, f (a)).
Repeat line ☺ here.
Since there does not exist c such that f (c) ∈ (f (a) − ε, f (a)), it must be that f (a) =
min Rangef . So, for all x ∈ (a − δ, a) ∩ D, we have f (x) = f (a) and again ∣f (x) − f (a)∣ =
0 < ε. ☹
By ☺ and ☹, we have thus found δ > 0 such that for every x ∈ Nδ (a) ∩ D, we have ∣f (x) − f (a)∣ <
ε. Thus, f is continuous at a.
Case 3. Suppose there exists c such that f (c) ∈ (f (a) − ε, f (a)), but there does not exist b such that
f (b) ∈ (f (a) , f (a) + ε).
Repeat line ⋆ here.
Since there does not exist b such that f (b) ∈ (f (a) , f (a) + ε), it must be that f (a) =
max Rangef . So, for all x ∈ (a, a + δ̂) ∩ D, we have f (x) = f (a) and again ∣f (x) − f (a)∣ =
0 < ε. △
By ⋆ and △, we have thus found δ̂ > 0 such that for every x ∈ Nδ̂ (a) ∩ D, we have ∣f (x) − f (a)∣ <
ε. Thus, f is continuous at a.
Case 4. Suppose there do not exist b or c such that f (b) ∈ (f (a) , f (a) + ε) or f (c) ∈
(f (a) − ε, f (a)).
Then since Rangef is an interval, it must be that Rangef consists of the single point f (a).
So, f is a constant function and is continuous (Fact 135).

Theorem 17. Let D be an interval and f ∶ D → R be continuous. If f is invertible, then

its inverse f −1 is also continuous.

Proof. Since f is invertible and continuous on an interval, by Proposition 3, it is strictly

monotonic. And so by Proposition 4, the inverse f −1 is also strictly monotonic.
Since Rangef −1 = Domainf = D is an interval, by Proposition 18, f −1 is continuous.

Fact 134. Suppose $\lim_{x \to a} g(x) = b$. If f is continuous at b, then:

\[
\lim_{x \to a} f(g(x)) = f\left(\lim_{x \to a} g(x)\right).
\]

Proof. Let ε > 0 and $c = f(b) = f\left(\lim_{x \to a} g(x)\right)$.

Since f is continuous at b, there exists δ̂ > 0 such that for every y ∈ Domainf ∩ Nδ̂ (b), we
have f (y) ∈ Nε (c).
Since $\lim_{x \to a} g(x) = b$, there exists δ > 0 such that for every x ∈ Nδ (a), we have g (x) ∈ Nδ̂ (b).
Hence, for every x ∈ Nδ (a), we have g (x) ∈ Nδ̂ (b) and thus also f (g (x)) ∈ Nε (c). We have
just proven that:

\[
\lim_{x \to a} f(g(x)) = c = f\left(\lim_{x \to a} g(x)\right).
\]

Remark 149. Fact 134 is also true if a is replaced by ±∞. (The proof is very similar.)


121.7. The Derivative
Differentiability implies continuity:

Theorem 19. If f is differentiable at a, then it is also continuous at a.

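Proof. If f is differentiable at a, then the following limit exists:

\[
f'(a) = \lim_{x \to a} \frac{f(x) - f(a)}{x - a}.
\]

Using the Rules for Limits (Theorem 14), we show that $\lim_{x \to a} f(x) - f(a) = 0$:

\[
\begin{aligned}
\lim_{x \to a} f(x) - f(a) &\overset{C}{=} \lim_{x \to a} f(x) - \lim_{x \to a} f(a) \overset{\pm}{=} \lim_{x \to a} [f(x) - f(a)] \overset{1}{=} \lim_{x \to a} \left[\frac{f(x) - f(a)}{x - a}(x - a)\right] \\
&\overset{\times}{=} \lim_{x \to a} \frac{f(x) - f(a)}{x - a} \cdot \lim_{x \to a} (x - a) = f'(a) \cdot 0 = 0.
\end{aligned}
\]

(Here C, ±, and × mark the uses of the Constant, Sum and Difference, and Product Rules. Note that $\overset{1}{=}$ is OK because x ≠ a.)

Hence, $\lim_{x \to a} f(x) = f(a)$. By Definition 164 (of continuity) then, f is continuous at a.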

Proposition 19. (Newton’s Linear Approximation) Let f be a nice function, a ∈
Domainf , and L ∈ R. Then f ′ (a) = L if and only if:

For every ε > 0, there exists δ > 0 such that for every x ∈ Nδ (a) ∩ Domainf with x ≠ a, we have

∣f (x) − [f (a) + L (x − a)]∣ < ε ∣x − a∣.

Remark 150. Newton’s Linear Approximation gives us Newton’s Method (or the
Newton-Raphson Method) for finding the roots of a function. (Newton’s Method was
formerly on the old 9233 syllabus but was removed when the 9740 syllabus was intro-
duced in 2007.)
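To sketch the connection: replacing f(x) by its linear approximation f(a) + f′(a)(x − a) and solving for the x at which this equals zero gives x = a − f(a)/f′(a). Repeating this step is Newton's Method. The code below is a minimal sketch of that standard iteration, not something taken from the syllabus. The function name `newton`, the tolerance, and the iteration cap are all assumptions made for the illustration.

```python
from typing import Callable


def newton(
    f: Callable[[float], float],
    f_prime: Callable[[float], float],
    x0: float,
    tol: float = 1e-12,
    max_iter: int = 50,
) -> float:
    """Repeatedly jump to the root of the tangent-line approximation at x."""
    x = x0
    for _ in range(max_iter):
        slope = f_prime(x)
        if slope == 0:
            raise ZeroDivisionError("horizontal tangent line: no next iterate")
        x_next = x - f(x) / slope  # root of f(x) + f'(x)(t - x) = 0 in t
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    raise RuntimeError("did not converge within max_iter iterations")


# Example: the positive root of x^2 - 2, starting from x0 = 1.
root = newton(lambda x: x * x - 2, lambda x: 2 * x, x0=1.0)
print(root)
```

The method is not guaranteed to converge for every function and starting point, which is why the sketch raises an error rather than returning a possibly meaningless value.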

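Proof (of Proposition 19). We have the following chain of equivalences:

\[
f'(a) = L \iff \lim_{x \to a} \frac{f(x) - f(a)}{x - a} = L
\]

⇐⇒ For every ε > 0, there exists δ > 0 such that x ∈ Nδ (a) ∩ Domainf (with x ≠ a) implies

\[
\left|\frac{f(x) - f(a)}{x - a} - L\right| < \varepsilon
\]

⇐⇒ For every ε > 0, there exists δ > 0 such that x ∈ Nδ (a) ∩ Domainf (with x ≠ a) implies

\[
\left|f(x) - [f(a) + L(x - a)]\right| < |x - a|\,\varepsilon.
\]

(The last step multiplies both sides of the inequality by ∣x − a∣ > 0.)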

In the main text (p. 687), we proved the Power Rule only in the special case where the
exponent c is a non-negative integer. We now prove it also in the case where the base x is
positive and c is any real exponent.


Fact 143. Let D be an interval, c ∈ R, and f ∶ D → R be defined by $f(x) = x^c$. Then the
derivative of f is the function f ′ ∶ D → R defined by:

\[
f'(x) = c x^{c-1}. \qquad \text{(Power Rule)}
\]

Proof. Below, $\overset{\star}{=}$ indicates the use of Definition 271, which is the general definition of
exponentiation given in Ch. 121.17.
For x > 0, we have:

\[
f'(x) = (x^c)' \overset{\star}{=} [\exp(c \ln x)]' = [\exp(c \ln x)]\,(c \ln x)' = c \exp(c \ln x)\,(\ln x)' = c \exp(c \ln x)\,\frac{1}{x} \overset{\star}{=} c x^c \frac{1}{x} = c x^{c-1}.
\]
We’ve just proven that the Power Rule of Differentiation holds in the case where x > 0.
It remains to be proven that the Power Rule also holds in the case where x ≤ 0. Unfor-
tunately, this proof shall be omitted altogether from this textbook, for reasons that were
already discussed in Remark 156.
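For positive x, the Power Rule can be sanity-checked numerically with non-integer exponents. The sketch below assumes a central-difference quotient as the numerical derivative and defines x^c as exp(c ln x), in the spirit of Definition 271. The helper names and the step size h are made up for this illustration.

```python
import math


def power(x: float, c: float) -> float:
    """x^c for x > 0, computed as exp(c ln x)."""
    return math.exp(c * math.log(x))


def numeric_derivative_of_power(x: float, c: float, h: float = 1e-6) -> float:
    """Central-difference estimate of the derivative of t -> t^c at t = x."""
    return (power(x + h, c) - power(x - h, c)) / (2 * h)


def power_rule(x: float, c: float) -> float:
    """The derivative predicted by the Power Rule: c * x^(c - 1)."""
    return c * power(x, c - 1)


for c in (math.pi, 0.5, -1.3):
    x = 2.0
    print(c, numeric_derivative_of_power(x, c), power_rule(x, c))
```

The two printed columns should agree to several decimal places, which is what the Power Rule predicts; the small remaining gap comes from the finite step size and floating-point rounding.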


121.8. Proving the Chain Rule
In Ch. 69.7 of the main text, we gave a “proof” of the Chain Rule that contained two flaws.
The first flaw is fairly straightforward and nothing more need be said.
Let us now elaborate a little more about the second flaw, which was that the following step
requires additional justification:
\[
\lim_{x \to a} \frac{f(g(x)) - f(g(a))}{g(x) - g(a)} \overset{\star}{=} f'(g(a)).
\]

To see why the above step requires additional justification, let b = g (a). We were given
that f ′ (g (a)) = f ′ (b) exists. By Definition 168 (of the derivative), this merely means that:

\[
f'(g(a)) = f'(b) = \lim_{y \to b} \frac{f(y) - f(b)}{y - b} = \lim_{y \to g(a)} \frac{f(y) - f(g(a))}{y - g(a)}.
\]

We now need to justify that this last expression is in fact equal to the LHS of $\overset{\star}{=}$, i.e. that:

\[
\lim_{y \to g(a)} \frac{f(y) - f(g(a))}{y - g(a)} = \lim_{x \to a} \frac{f(g(x)) - f(g(a))}{g(x) - g(a)}.
\]

The proof below fixes both of these flaws.

Theorem 21. (Chain Rule) Let a ∈ R and f and g be nice functions. Suppose g ′ (a)
and f ′ (g (a)) exist. Then:

(f g) ′ (a) = f ′ (g (a)) g ′ (a) .

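Proof. Let D be the set of points for which the composite function f g is well-defined. Let
b = g (a).
Define h ∶ D → R by:

\[
h(x) \overset{1}{=} \begin{cases}
\dfrac{f(g(x)) - f(b)}{g(x) - b} & \text{for } g(x) \ne b, \\[2ex]
f'(b) & \text{for } g(x) = b.
\end{cases}
\]

Then for every x ∈ D with x ≠ a:

\[
h(x)\,\frac{g(x) - b}{x - a} = \begin{cases}
\dfrac{f(g(x)) - f(b)}{g(x) - b} \cdot \dfrac{g(x) - b}{x - a} = \dfrac{f(g(x)) - f(b)}{x - a} & \text{for } g(x) \ne b, \\[2ex]
f'(b) \cdot 0 = 0 = \dfrac{f(g(x)) - f(b)}{x - a} & \text{for } g(x) = b.
\end{cases}
\]

That is:

\[
h(x)\,\frac{g(x) - b}{x - a} \overset{2}{=} \frac{f(g(x)) - f(b)}{x - a}.
\]

Below we will use $\overset{2}{=}$.

We now turn to prove that $\lim_{x \to a} h(x) \overset{3}{=} f'(b)$.

Let ε > 0. Since f ′ (b) exists, there exists λ > 0 so that for every y ∈ Domainf ∩ Nλ (b) with y ≠ b, we have:

\[
\left|\frac{f(y) - f(b)}{y - b} - f'(b)\right| < \varepsilon.
\]

By the continuity of g at a (which follows from the existence of g ′ (a) and Theorem 19), there exists δ > 0 such that for every x ∈ D ∩ Nδ (a), we have
g (x) ∈ Nλ (b) and thus also:

\[
|h(x) - f'(b)| = \begin{cases}
\left|\dfrac{f(g(x)) - f(b)}{g(x) - b} - f'(b)\right| < \varepsilon & \text{for } g(x) \ne b, \\[2ex]
|f'(b) - f'(b)| = 0 < \varepsilon & \text{for } g(x) = b.
\end{cases}
\]

We have just proven $\overset{3}{=}$, because we have just shown that for any ε > 0, we can find δ > 0
such that for every x ∈ D ∩ Nδ (a), we have ∣h (x) − f ′ (b)∣ < ε.

We now prove the Chain Rule using $\overset{2}{=}$ and $\overset{3}{=}$:

\[
\begin{aligned}
(fg)'(a) &= \lim_{x \to a} \frac{f(g(x)) - f(g(a))}{x - a} \overset{2}{=} \lim_{x \to a} \left[h(x)\,\frac{g(x) - g(a)}{x - a}\right] \\
&= \lim_{x \to a} h(x) \cdot \lim_{x \to a} \frac{g(x) - g(a)}{x - a} \overset{3}{=} f'(g(a))\,g'(a).
\end{aligned}
\]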
A simple rearrangement of the Chain Rule yields the Parametric Differentiation Rule:

Corollary 38. (Parametric Differentiation Rule) Let f ∶ D → R and g ∶ E → R

(D, E ⊆ R) be functions with Rangeg ⊆ D. Suppose g ′ (a) and f ′ (g (a)) exist, with g ′ (a) ≠
0. Then:
\[
f'(g(a)) = \frac{(fg)'(a)}{g'(a)}.
\]

It’s not at all obvious why the above result corresponds to our Parametric Differentiation
Rule. To see why, let y and x take the places of f and g. Then we have:

\[
\underbrace{f'(g)}_{\frac{dy}{dx}} = \underbrace{(fg)'}_{\frac{dy}{dt}} \div \underbrace{g'}_{\frac{dx}{dt}}.
\]

(As usual, t is a dummy variable that can be replaced by any other symbol.)

Here is a weak version of the Inverse Function Theorem (IFT). It is weak in the sense
that it employs strong assumptions, so that the result is nearly immediate from Corollary 38.


Theorem 34. (Inverse Function Theorem) Let f be a nice function that is defined
at a. Suppose f −1 is defined on (a − ε, a + ε) for some ε > 0. Suppose also that f ′ (a) and
(f −1 ) ′ (f (a)) exist, with f ′ (a) ≠ 0. Then:

\[
(f^{-1})'(f(a)) = \frac{1}{f'(a)}.
\]

Proof. By the Parametric Differentiation Rule (Corollary 38), and because (f −1 f ) (x) = x has derivative 1:

\[
(f^{-1})'(f(a)) = \frac{(f^{-1}f)'(a)}{f'(a)} = \frac{1}{f'(a)}.
\]


121.9. When a Function is Increasing or Decreasing
Suppose we have a differentiable function f .

Figure to be
inserted here.

Pick any two points A and B on f . Let m be the gradient of the line AB.
It is intuitively plausible that there exists some point C (on f ) between A and B such that
the gradient of the tangent line at C equals m. This result is known as the Mean Value Theorem (MVT):
Theorem 35. (Mean Value Theorem) Let a < b. Suppose f is continuous on [a, b]
and differentiable on (a, b). Then there exists c ∈ (a, b) such that:

\[
f'(c) = \frac{f(b) - f(a)}{b - a}.
\]

Proof. Omitted — see e.g. ProofWiki.
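Although the proof is omitted, the statement is easy to illustrate numerically. The sketch below uses the made-up example f(x) = x³ on [0, 2], where the gradient of the line AB is (8 − 0)/(2 − 0) = 4, and searches for a point c with f′(c) = 4 by bisection. Bisection is merely a convenient assumption here: it works only because f′(x) − 4 happens to change sign between the two endpoints. The name `mean_value_point` is hypothetical.

```python
from typing import Callable


def mean_value_point(
    f_prime: Callable[[float], float],
    slope: float,
    lo: float,
    hi: float,
    tol: float = 1e-12,
) -> float:
    """Bisection search for c in [lo, hi] with f_prime(c) = slope."""
    def gap(c: float) -> float:
        return f_prime(c) - slope

    if gap(lo) * gap(hi) > 0:
        raise ValueError("f' - slope does not change sign on [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if gap(lo) * gap(mid) <= 0:
            hi = mid  # a sign change (or a zero) lies in [lo, mid]
        else:
            lo = mid  # otherwise it lies in [mid, hi]
    return (lo + hi) / 2


a, b = 0.0, 2.0
slope = (b**3 - a**3) / (b - a)                      # gradient of the line AB
c = mean_value_point(lambda x: 3 * x * x, slope, a, b)
print(slope, c)
```

For this example, solving 3c² = 4 by hand gives c = 2/√3, which lies strictly between 0 and 2, as the theorem promises.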

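Fact 145. (Increasing/Decreasing Test [IDT]) Let a < b. Suppose f is differentiable
on (a, b). Then:

(a) f ′ (x) ≥ 0 for every x ∈ (a, b) ⇐⇒ f is increasing on (a, b).
(b) f ′ (x) > 0 for every x ∈ (a, b) Ô⇒ f is strictly increasing on (a, b).
(c) f ′ (x) ≤ 0 for every x ∈ (a, b) ⇐⇒ f is decreasing on (a, b).
(d) f ′ (x) < 0 for every x ∈ (a, b) Ô⇒ f is strictly decreasing on (a, b).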

Proof. ( Ô⇒ )(a) Let c, d ∈ (a, b) with d > c. By the Mean Value Theorem, there exists
e ∈ (c, d) such that:

\[
f'(e) = \frac{f(d) - f(c)}{d - c}.
\]

Since f ′ (e) ≥ 0 and d > c, we have f (d) ≥ f (c).
The proof of (b) is identical, except that we replace the two ≥s with >s.
The proofs of (c) and (d) are similar to (a) and (b) and thus omitted.
( ⇐Ô )(a) Suppose f is increasing on (a, b). (Note that this includes the possibility that f is strictly increasing.) Then for any distinct c, d ∈ (a, b), we have:

d − c > 0 Ô⇒ f (d) − f (c) ≥ 0 and d − c < 0 Ô⇒ f (d) − f (c) ≤ 0.

Hence:

\[
\frac{f(d) - f(c)}{d - c} \overset{1}{\ge} 0.
\]

Now, let e ∈ (a, b) and consider the derivative of f at e, which exists because f is differentiable on (a, b):

\[
f'(e) = \lim_{x \to e} \frac{f(x) - f(e)}{x - e}.
\]

We now show that f ′ (e) ≥ 0. Suppose for contradiction that f ′ (e) = L < 0. Pick δ = ∣L∣ /2.
Then by definition of the limit, there exists some ε ∈ (0, min {e − a, b − e}) such that for
every x ∈ Nε (e) with x ≠ e, we have:

\[
\frac{f(x) - f(e)}{x - e} \in N_\delta(L) \subseteq \mathbb{R}^-.
\]

Which implies that:

\[
\frac{f(x) - f(e)}{x - e} < 0, \text{ contradicting } \overset{1}{\ge}.
\]

The proof of (c) is similar and thus omitted.

By the way, with the MVT, we can easily prove that a function whose derivative is a
zero function is itself a constant function:

Proposition 6. Let D be an interval and f ∶ D → R be a function. If f ’s derivative

is the function f ′ ∶ D → R defined by f ′ (x) = 0, then f is defined by f (x) = c for some
c ∈ R.

Proof. Pick any a, b ∈ D with a < b. Then by the MVT, for some x ∈ (a, b), we have:

\[
f'(x) = \frac{f(b) - f(a)}{b - a}.
\]

But since f ′ (x) = 0 for all x ∈ D, we must have f (b) = f (a). Since a and b were arbitrarily
chosen points in D, f is constant on D.

Definition 263. Given a set S ⊆ R, x ∈ S is called an interior point of S if there exists

ε > 0 such that Nε (x) ⊆ S. Otherwise, x is a boundary point of S.

Theorem 22. (Interior Extremum Theorem.) Let a < b, f ∶ (a, b) → R be differen-

tiable, and c ∈ (a, b). If c is an extremum (of f ), then f ′ (c) = 0.

Proof. Suppose f ′ (c) > 0. That is:

\[
\lim_{x \to c} \frac{f(x) - f(c)}{x - c} > 0.
\]

Then there exists δ > 0 such that for every x ∈ Nδ (c) with x ≠ c, we have:

\[
\frac{f(x) - f(c)}{x - c} > 0.
\]

That is, f (x) > f (c) ⇐⇒ x > c and f (x) < f (c) ⇐⇒ x < c. So by definition, c is neither
a maximum nor a minimum point.
If f ′ (c) < 0, we arrive at the same conclusion.
Altogether then, by the contrapositive, if c is an extremum, then f ′ (c) = 0.

The Extreme Value Theorem is the intuitively-plausible result that if a function is
continuous on a closed interval, then it attains both its minimum and maximum on that interval:

Theorem 36. (Extreme Value Theorem) If f is continuous on [a, b], then there exist
m, M ∈ [a, b] such that for every x ∈ [a, b], we have:

f (m) ≤ f (x) ≤ f (M ) .

Proof. Omitted — see e.g. Wikipedia.


121.10. The First and Second Derivative Tests

Lemma 10. Suppose f is continuous on (a − ε, a].

(a) If f is increasing on (a − ε, a), then f (a) ≥ f (x) for all x ∈ (a − ε, a).

(b) If f is strictly increasing on (a − ε, a), then f (a) > f (x) for all x ∈ (a − ε, a).
(c) If f is decreasing on (a − ε, a), then f (a) ≤ f (x) for all x ∈ (a − ε, a).
(d) If f is strictly decreasing on (a − ε, a), then f (a) < f (x) for all x ∈ (a − ε, a).

Proof. (a) We will prove the contrapositive. Suppose there exists b ∈ (a − ε, a) such that
f (a) < f (b). Let δ = [f (b) − f (a)] /2. By continuity, there exists ε̂ ∈ (0, ε) such that for
all x ∈ (a − ε̂, a), we have f (x) ∈ Nδ (f (a)). Observe that since f (b) ∉ Nδ (f (a)), we must
have b ≤ a − ε̂. Observe that f (b) > f (x) for all x ∈ (a − ε̂, a). So, f is not increasing on
(a − ε, a).
(b) If f is strictly increasing on (a − ε, a), then by (a), f (a) ≥ f (x) for all x ∈ (a − ε, a).

Suppose for contradiction that there exists b ∈ (a − ε, a) such that f (a) = f (b). Since f is
strictly increasing on (a − ε, a), for any c ∈ (b, a), we have f (c) > f (b) = f (a), contradicting f (a) ≥ f (c) from (a).

The proofs of (c) and (d) are similar to (a) and (b) and thus omitted.

The following result is nearly immediate from our definitions of the terms (strict) local
maximum, (strict) local minimum, (strictly) increasing, and (strictly) decreasing:

Lemma 11. Let D be an interval, f ∶ D → R be continuous, and a be an interior point

of D.
(a) If there exists ε > 0 such that f is increasing on (a − ε, a) and decreasing on (a, a + ε),
then a is a local maximum of f .
(b) If there exists ε > 0 such that f is strictly increasing on (a − ε, a) and strictly de-
creasing on (a, a + ε), then a is a strict local maximum of f .
(c) If there exists ε > 0 such that f is decreasing on (a − ε, a) and increasing on (a, a + ε),
then a is a local minimum of f .
(d) If there exists ε > 0 such that f is strictly decreasing on (a − ε, a) and strictly increasing on (a, a + ε), then a is a strict local minimum of f .

Proof. (a) By Lemma 10, f (a) ≥ f (x) for all x ∈ (a − ε, a).

Similarly, f (a) ≥ f (x) for all x ∈ (a, a + ε).
Altogether then, f (a) ≥ f (x) for all x ∈ (a − ε, a + ε). Hence, by Definition 224, a is a local
maximum of f .
The proofs of (b), (c), and (d) are similar and thus omitted.

It is tempting to assume that the converses of (a)–(d) in the above Lemma are also true.
Unfortunately, they are not:
Example 1232. Define f ∶ R → R by:

\[
f(x) = \begin{cases}
2 & \text{for } x = 0, \\
1 & \text{for } x \in \mathbb{Q} \setminus \{0\}, \\
0 & \text{for } x \notin \mathbb{Q}.
\end{cases}
\]

Figure to be
inserted here.

Observe that 0 is a strict local (and also global) maximum of f . However, f is not
strictly increasing for any ε-neighbourhood to the left of 0; and similarly, f is not strictly
decreasing for any ε-neighbourhood to the right of 0.
Thus, the converse of (a) of the above Lemma is false.

Indeed, even if we assume continuity or even differentiability, the converses of the above
Lemma remain false:


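Example 1233. Define g ∶ R → R by (footnote 406):

\[
g(x) = \begin{cases}
x^4 \left(2 + \sin\dfrac{1}{x}\right) & \text{for } x \ne 0, \\[2ex]
0 & \text{for } x = 0.
\end{cases}
\]

Observe that 0 is a strict local (and also global) minimum of g.

Also, g is differentiable (and thus continuous) everywhere, as we now show. For x ≠ 0,
we have:

\[
g'(x) = 4x^3 \left(2 + \sin\frac{1}{x}\right) + x^4 \left(\cos\frac{1}{x}\right) \frac{-1}{x^2} = x^2 \left[4x \left(2 + \sin\frac{1}{x}\right) - \cos\frac{1}{x}\right].
\]

For x = 0, we have:

\[
g'(0) = \lim_{x \to 0} \frac{g(x) - g(0)}{x - 0} = \lim_{x \to 0} \frac{g(x)}{x} = \lim_{x \to 0} \left[x^3 \left(2 + \sin\frac{1}{x}\right)\right] = 0.
\]

Altogether then:

\[
g'(x) = \begin{cases}
x^2 \left[4x \left(2 + \sin\dfrac{1}{x}\right) - \cos\dfrac{1}{x}\right] & \text{for } x \ne 0, \\[2ex]
0 & \text{for } x = 0.
\end{cases}
\]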

Observe that for any ε > 0, we can find some (a, b) ⊆ (0, ε) such that g ′ (x) < 0 for all
x ∈ (a, b) and thus that g is (strictly) decreasing on (a, b) ⊆ R+ .
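The above observation can be made concrete. At the points x = 1/(2πn), we have sin(1/x) = 0 and cos(1/x) = 1, so that g′(x) = x²(8x − 1), which is negative whenever x < 1/8. The sketch below, whose helper names are made up for the illustration, finds such a point inside (0, ε) for a few values of ε and evaluates g′ there. It assumes the formula for g′ given in this example.

```python
import math


def g_prime(x: float) -> float:
    """The derivative of g(x) = x^4 (2 + sin(1/x)), with g'(0) = 0."""
    if x == 0:
        return 0.0
    return x**2 * (4 * x * (2 + math.sin(1 / x)) - math.cos(1 / x))


def negative_slope_point(eps: float) -> float:
    """A point x = 1/(2*pi*n) in (0, eps) that is also smaller than 1/8."""
    bound = min(eps, 1 / 8)
    n = math.floor(1 / (2 * math.pi * bound)) + 1  # ensures 1/(2*pi*n) < bound
    return 1 / (2 * math.pi * n)


for eps in (1.0, 1e-3, 1e-6):
    x = negative_slope_point(eps)
    print(eps, x, g_prime(x))
```

Each printed value of g′ is negative even though x is to the right of the strict minimum at 0. By continuity of g′ away from 0, g′ stays negative on a small interval around each such x, which is the interval (a, b) described above.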

Figure to be
inserted here.

In the main text, we gave an informal statement of the First Derivative Test (FDT). We
now give a formal statement thereof:

[406] This example was stolen from Gelbaum and Olmsted, Counterexamples in Analysis (1964, p. 36).
Proposition 20. (First Derivative Test for Extrema) Let a < b, f ∶ (a, b) → R be
differentiable, and c ∈ (a, b).
(a) If there exists ε > 0 such that f ′ is non-negative on (c − ε, c) and non-positive on
(c, c + ε), then c is a local maximum (of f ).
(b) If there exists ε > 0 such that f ′ is positive on (c − ε, c) and negative on (c, c + ε),
then c is a strict local maximum (of f ).
(c) If there exists ε > 0 such that f ′ is non-positive on (c − ε, c) and non-negative on
(c, c + ε), then c is a local minimum (of f ).
(d) If there exists ε > 0 such that f ′ is negative on (c − ε, c) and positive on (c, c + ε),
then c is a strict local minimum (of f ).

Proof. (a) By Fact 145, f is increasing on (c − ε, c) and decreasing on (c, c + ε). By Lemma
11 then, c is a local maximum of f .
The proofs of (b), (c), and (d) are similar and thus omitted.

Remark 151. As illustrated in Example 1233, the converses of (a)–(d) are false.

Proposition 7. (Second Derivative Test for Extrema) Let f be a function that is
twice differentiable at c. Suppose f ′ (c) = 0.
(a) If f ′′ (c) < 0, then c is a strict local maximum (of f ).
(b) If f ′′ (c) > 0, then c is a strict local minimum.
(c) If f ′′ (c) = 0, then c could be a maximum point, minimum point, inflexion point, or
something else altogether (so that informally, we say that the SDTE is inconclusive).

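Proof. (a) Suppose f ′′ (c) < 0. Then, since f ′ (c) = 0:

\[
0 > f''(c) = \lim_{x \to c} \frac{f'(x) - f'(c)}{x - c} = \lim_{x \to c} \frac{f'(x)}{x - c}.
\]

Since f is twice-differentiable at c, f ′ is continuous at c. And so, the above inequality
implies that there exists δ > 0 such that f ′ (x) / (x − c) < 0 for all x ∈ (c − δ, c) ∪ (c, c + δ).
Since x − c < 0 on (c − δ, c) and x − c > 0 on (c, c + δ), we have f ′ > 0 on (c − δ, c) and
f ′ < 0 on (c, c + δ). By the FDT then, c is a strict local maximum of f .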
The proof of (b) is similar and thus omitted.
We already proved (c) (with examples) in the main text.


121.11. Concavity
Proposition 8 would serve perfectly well as our definition of concavity and convexity,
except that it requires the function to be differentiable. Here are more general definitions
that do not require differentiability:

Definition 264. Let D be an interval and f ∶ D → R. We say that f is concave if for

every x1 , x2 ∈ D and every α ∈ [0, 1], we have:

f (αx1 + (1 − α) x2 ) ≥ αf (x1 ) + (1 − α) f (x2 ) .


We say that f is strictly concave if ≥ can be replaced by >.


We say that f is convex if for every x1 , x2 ∈ D and every α ∈ [0, 1], we have:

f (αx1 + (1 − α) x2 ) ≤ αf (x1 ) + (1 − α) f (x2 ) .


We say that f is strictly convex if ≤ can be replaced by <.
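
To make Definition 264 concrete, here is a small Python sketch that tests the defining inequality on a finite grid of points x1, x2 and weights α. Passing such a test on a grid is only evidence, not a proof, of concavity. The helper name satisfies_concavity and the tolerance tol (which absorbs floating-point rounding) are assumptions made for this illustration.

```python
import math
from typing import Callable, Sequence

def satisfies_concavity(f: Callable[[float], float],
                        points: Sequence[float],
                        alphas: Sequence[float],
                        tol: float = 1e-12) -> bool:
    """Check f(a*x1 + (1-a)*x2) >= a*f(x1) + (1-a)*f(x2) on a finite grid."""
    for x1 in points:
        for x2 in points:
            for a in alphas:
                lhs = f(a * x1 + (1 - a) * x2)
                rhs = a * f(x1) + (1 - a) * f(x2)
                if lhs < rhs - tol:
                    return False  # the concavity inequality fails here
    return True

points = [i / 10 for i in range(41)]   # grid on the interval [0, 4]
alphas = [j / 20 for j in range(21)]   # weights in [0, 1]

sqrt_is_concave = satisfies_concavity(math.sqrt, points, alphas)
square_is_concave = satisfies_concavity(lambda x: x * x, points, alphas)
# f is convex exactly when -f is concave, so we can reuse the same helper.
square_is_convex = satisfies_concavity(lambda x: -(x * x), points, alphas)

print(sqrt_is_concave, square_is_concave, square_is_convex)
```

On this grid, the square root passes the concavity test, while x ↦ x² fails it but passes the convexity test.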


Definition 265. Let D be an interval and f ∶ D → R. We say that f is linear if for every
x1 , x2 ∈ D and every α ∈ [0, 1], we have:

f (αx1 + (1 − α) x2 ) = αf (x1 ) + (1 − α) f (x2 ) .

It is immediate from the above definitions that “linearity ⇐⇒ concavity and convexity”:

Fact 222. Suppose D is an interval and f ∶ D → R. Then:

f is linear ⇐⇒ f is both concave and convex.

The following Lemma characterises concavity and can serve as an alternative definition:

Lemma 12. Let D be an interval and f ∶ D → R.
(a) f is concave ⇐⇒ For any x1 , x2 , x3 ∈ D with x1 < x2 < x3 , we have:

\[
\frac{f(x_2) - f(x_1)}{x_2 - x_1} \geq \frac{f(x_3) - f(x_2)}{x_3 - x_2}.
\]

Also, (a) remains true if we replace concave with strictly concave and ≥ with >.

(b) f is convex ⇐⇒ For any x1 , x2 , x3 ∈ D with x1 < x2 < x3 , we have:

\[
\frac{f(x_2) - f(x_1)}{x_2 - x_1} \leq \frac{f(x_3) - f(x_2)}{x_3 - x_2}.
\]

Also, (b) remains true if we replace convex with strictly convex and ≤ with <.

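Proof. (a) Pick any x1 , x2 , x3 ∈ D with x1 < x2 < x3 . Let α = (x3 − x2 ) / (x3 − x1 ).
Observe that α ∈ [0, 1]. Moreover:
\[
\alpha x_1 + (1 - \alpha) x_3 = \frac{x_3 - x_2}{x_3 - x_1} x_1 + \frac{x_2 - x_1}{x_3 - x_1} x_3 = x_2.
\]

And so, the inequality in Definition 264 (of concavity), applied to x1 , x3 , and this α, reads:

\begin{align*}
& f(x_2) \geq \frac{x_3 - x_2}{x_3 - x_1} f(x_1) + \frac{x_2 - x_1}{x_3 - x_1} f(x_3) \\
\iff{} & (x_3 - x_1) f(x_2) \geq (x_3 - x_2) f(x_1) + (x_2 - x_1) f(x_3) \\
\iff{} & x_3 [f(x_2) - f(x_1)] \geq x_2 [f(x_3) - f(x_1)] - x_1 [f(x_3) - f(x_2)] \\
\iff{} & (x_3 - x_2) [f(x_2) - f(x_1)] \geq x_2 [f(x_3) - f(x_1)] - x_1 [f(x_3) - f(x_2)] - x_2 [f(x_2) - f(x_1)] \\
& \qquad = x_2 f(x_3) - x_1 [f(x_3) - f(x_2)] - x_2 f(x_2) \\
& \qquad = (x_2 - x_1) [f(x_3) - f(x_2)] \\
\iff{} & \frac{f(x_2) - f(x_1)}{x_2 - x_1} \geq \frac{f(x_3) - f(x_2)}{x_3 - x_2}.
\end{align*}

Conversely, any x1 < x3 in D and any α ∈ (0, 1) arise in this way from x2 = αx1 + (1 − α) x3 ,
while the inequality in Definition 264 holds trivially if α ∈ {0, 1} or x1 = x3 .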
For the proof of the strict case, simply replace the weak inequalities in the above proof with
strict ones.
The proof of (b) is similar and thus omitted.


Proposition 8. (First Derivative Test for Concavity [FDTC]) Let D be an interval
and f ∶ D → R be a differentiable function. Then:
(a) f ′ is decreasing (on D) ⇐⇒ f is concave (on D).
(b) “ strictly decreasing ⇐⇒ “ strictly concave.
(c) “ increasing ⇐⇒ “ convex.
(d) “ strictly increasing ⇐⇒ “ strictly convex.

Proof. (a) Below, the first ⇐⇒ uses Lemma 12.

f is concave on D

\[
\iff \text{For any } x_1, x_2, x_3 \in D \text{ with } x_1 < x_2 < x_3, \text{ we have } \frac{f(x_2) - f(x_1)}{x_2 - x_1} \geq \frac{f(x_3) - f(x_2)}{x_3 - x_2}
\]

\[
\iff \text{For all } a, b \in D \text{ with } a > b, \text{ we have } f'(a) = \lim_{x \to a} \frac{f(x) - f(a)}{x - a} \leq \lim_{x \to b} \frac{f(x) - f(b)}{x - b} = f'(b).
\]
The proof of (b) is identical to that of (a), except that we replace concave with strictly
concave and the two weak inequalities with strict ones.
The proofs of (c) and (d) are similar to (a) and (b) and thus omitted.

Definition 266. Let a < b and f ∶ (a, b) → R be a continuous function. We call c ∈ (a, b)
an inflexion point of f if there exists ε > 0 such that either statement (a) or (b) is true:
(a) f is strictly concave on (c − ε, c) and strictly convex on (c, c + ε).
(b) f is strictly convex on (c − ε, c) and strictly concave on (c, c + ε).

Fact 223. (The First Derivative Test for Inflexion Points [FDTI]) Let a < b,
f ∶ (a, b) → R be a differentiable function whose derivative is continuous, and c ∈ (a, b) be
a stationary point of f . If c is also an inflexion point of f , then there exists ε > 0 such
that f ′ is either strictly positive or strictly negative on (c − ε, c) ∪ (c, c + ε).

Proof. Since c is a stationary point, f ′ (c) = 0 (Definition 62).

Since c is an inflexion point, by Definition 266, there exists ε > 0 such that either (a) f is
strictly concave on (c − ε, c) and strictly convex on (c, c + ε); or (b) f is strictly convex on
(c − ε, c) and strictly concave on (c, c + ε).
If (a), then by Proposition 8, f ′ is strictly decreasing on (c − ε, c) and strictly increasing
on (c, c + ε). And so, by the continuity of f ′ and Lemma 10, f ′ (x) > f ′ (c) = 0 for all
x ∈ (c − ε, c) ∪ (c, c + ε).
Similarly, if (b), then by Proposition 8, f ′ is strictly increasing on (c − ε, c) and strictly
decreasing on (c, c + ε). And so, by the continuity of f ′ and Lemma 10, f ′ (x) < f ′ (c) = 0
for all x ∈ (c − ε, c) ∪ (c, c + ε).

As noted in Remark 95, the converse of the FDTI is false:


Example 1234. Define f ∶ R → R by:[407]

\[
f(x) = \begin{cases}
x^3 + x^4 \sin\frac{1}{x} & \text{for } x \neq 0, \\
0 & \text{for } x = 0.
\end{cases}
\]
It is possible to verify that f is differentiable, f ′ is continuous, and 0 is a stationary point,
so that the hypotheses in the first sentence of Fact 223 are satisfied.
On the other hand, it is also possible to verify that there exists ε > 0 such that f ′ > 0 on
(−ε, 0) ∪ (0, ε), but 0 is not an inflexion point of f , so that the converse of Fact 223 is false.
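
The verification left to the reader in Example 1234 can be outlined as follows. This is a sketch under the stated bounds (0 < |x| < 1/2 for the sign of f ′, and k a large positive integer for the sign changes of f ′′), not a complete proof.

```latex
% Derivative away from 0, and its sign for 0 < |x| < 1/2:
\[
f'(x) = 3x^2 + 4x^3 \sin\frac{1}{x} - x^2 \cos\frac{1}{x}
      = x^2 \left( 3 + 4x \sin\frac{1}{x} - \cos\frac{1}{x} \right)
      \geq x^2 \left( 2 - 4|x| \right) > 0 .
\]
% Derivative at 0, and continuity of f' at 0:
\[
f'(0) = \lim_{x \to 0} \frac{f(x) - f(0)}{x - 0}
      = \lim_{x \to 0} \left( x^2 + x^3 \sin\frac{1}{x} \right) = 0 ,
\qquad
|f'(x)| \leq x^2 \left( 4 + 4|x| \right) \to 0 \text{ as } x \to 0 .
\]
% Second derivative away from 0:
\[
f''(x) = 6x + 12x^2 \sin\frac{1}{x} - 6x \cos\frac{1}{x} - \sin\frac{1}{x} .
\]
% At x = 1/((2k + 1/2)\pi) we have sin(1/x) = 1 and cos(1/x) = 0, so
% f''(x) = 6x + 12x^2 - 1 < 0 once k is large (x is small).
% At x = 1/((2k + 3/2)\pi) we have sin(1/x) = -1 and cos(1/x) = 0, so
% f''(x) = 6x - 12x^2 + 1 > 0 once k is large.
% Hence f'' changes sign infinitely often in every interval (0, \varepsilon),
% so f' is monotone on no such interval.
```

By Proposition 8, f is therefore neither strictly concave nor strictly convex on (0, ε) for any ε > 0, and so 0 cannot be an inflexion point of f.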

Fact 146. (Second Derivative Test for Inflexion Points [SDTI]) Let a < b. Suppose
f ∶ (a, b) → R is twice differentiable and has inflexion point c ∈ (a, b). Then:
(a) c is a strict extremum of the first derivative f ′ ; and
(b) f ′′ (c) = 0.

Proof. (a) By our definition of an inflexion point, there exists ε > 0 such that f is either
(i) strictly concave on (c − ε, c) and strictly convex on (c, c + ε); or (ii) strictly convex on
(c − ε, c) and strictly concave on (c, c + ε).
If (i), then by Proposition 8, f ′ is strictly decreasing on (c − ε, c) and strictly increasing on
(c, c + ε). By Lemma 11 then, c is a strict local minimum of f ′ .
If (ii), then by Proposition 8, f ′ is strictly increasing on (c − ε, c) and strictly decreasing
on (c, c + ε). By Lemma 11 then, c is a strict local maximum of f ′ .
(b) was already proven in the main text.

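Fact 224. (Tangent Line Test) Let a < b, f ∶ (a, b) → R be differentiable, and c ∈ (a, b).
If c is an inflexion point of f , then there exists ε > 0 such that one of the following
statements is true for every x1 ∈ (c − ε, c) and x2 ∈ (c, c + ε):

\[
\text{(a)} \quad f'(c) \leq \frac{f(x_1) - f(c)}{x_1 - c} \quad \text{and} \quad f'(c) \leq \frac{f(x_2) - f(c)}{x_2 - c}.
\]

\[
\text{(b)} \quad f'(c) \geq \frac{f(x_1) - f(c)}{x_1 - c} \quad \text{and} \quad f'(c) \geq \frac{f(x_2) - f(c)}{x_2 - c}.
\]

(Note that x1 − c < 0, so that multiplying the first inequality in each statement by x1 − c
reverses its direction.)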

Let’s parse the above formal statement of the TLT a little.

First, observe that the tangent line at c has equation y − f (c) = f ′ (c) (x − c).
And so in words, (a) is the condition that the tangent line at c is above the graph of f just
to the left of c, but below it just to the right of c.
Similarly, (b) is the condition that the tangent line at c is below the graph of f just to the
left of c, but above it just to the right of c.

[407] This example was stolen from Kouba (1995), “Can We Use the First Derivative to Determine Inflection
Proof. Following Definition 266, suppose there exists ε > 0 such that f is strictly concave
on (c − ε, c), but convex on (c, c + ε). (The other cases where the words strictly, concave,
and convex are permuted are similar and thus omitted.)
Then by Proposition 8, f ′ is strictly decreasing on (c − ε, c) and f ′ is increasing on
(c, c + ε). By the Mean Value Theorem, each difference quotient below equals f ′ at some
point strictly between c and x1 (or x2 ). And so, it is true that for every x1 ∈ (c − ε, c) and
x2 ∈ (c, c + ε), we have:

\[
\text{(a)} \quad f'(c) < \frac{f(x_1) - f(c)}{x_1 - c} \quad \text{and} \quad f'(c) \leq \frac{f(x_2) - f(c)}{x_2 - c}.
\]

Remark 152. Note that the converse of the TLT is false. That is, every inflexion point
satisfies the TLT, but not every point that satisfies the TLT is an inflexion point. In
other words, the TLT is necessary but not sufficient.
It turns out that this remains true even if the given function is smooth! See the answer
given by zhw at .

There is the awkward possibility that an inflexion point may be an extremum:

Example 1235. Define f ∶ R → R by:

\[
f(x) = \begin{cases}
x^2 & \text{for } x \leq 0, \\
\sqrt{x} & \text{for } x > 0.
\end{cases}
\]

Figure to be
inserted here.

Clearly, 0 is a minimum (and indeed a strict global minimum) of f .

Observe that f is continuous, strictly convex on R− , and strictly concave on R+ . And so
by Definition 266, 0 is an inflexion point of f .
We thus have, awkwardly enough, a point that is both an inflexion point and a minimum.
We note also that f is smooth everywhere except at 0, where it is not differentiable.

By imposing twice differentiability, we eliminate the above awkward possibility:

Fact 225. Let a < b. Suppose f ∶ (a, b) → R is twice differentiable. If c ∈ (a, b) is an
inflexion point of f , then c is not an extremum of f .

Proof. Suppose for contradiction that c is also an extremum of f . Then by the IET,
f ′ (c) = 0.

Since c ∈ (a, b) is an inflexion point of f , by Fact 146, c is a strict extremum of the first
derivative f ′ .
Suppose c is a strict local maximum of f ′ (the case where c is instead a strict local minimum
is similar and thus omitted). Then there exists some ε > 0 such that f ′ (x) < f ′ (c) = 0 for
all x ∈ (c − ε, c) ∪ (c, c + ε). Fact 145 then implies that f is strictly decreasing on (c − ε, c)
and (c, c + ε). And so, by the continuity of f and Lemma 11, c is a strict minimum of f on
(c − ε, c] and a strict maximum on [c, c + ε). Thus, c cannot be an extremum of f , which
is our desired contradiction.


121.12. The Maclaurin Series

Definition 267. Given the two series A = ∑_{n=0}^{∞} a_n and B = ∑_{n=0}^{∞} b_n , their Cauchy product is
the series:
\[
C = AB = \sum_{n=0}^{\infty} c_n, \quad \text{where } c_n = \sum_{i=0}^{n} a_i b_{n-i}.
\]

Definition 268. The series ∑_{n=0}^{∞} a_n is absolutely convergent if ∑_{n=0}^{∞} ∣a_n∣ converges.

Definition 269. The series ∑ an is conditionally convergent if it is convergent but not
absolutely convergent.
Theorem 37. Let C = ∑_{n=0}^{∞} c_n be the Cauchy product of A = ∑_{n=0}^{∞} a_n and B = ∑_{n=0}^{∞} b_n . Suppose
A and B are both convergent, with at least one being absolutely convergent. Then C
converges, and its sum is the product of the sums of A and B.

Proof. Omitted — see e.g. Rudin (1976, p. 74, Theorem 3.50) or Apostol (1974, p. 204,
Theorem 8.46).

Remark 153. The version of Theorem 37 given in the main text (p. 784) is slightly
incorrect because it omits the additional condition that at least one of the two series is
absolutely convergent.
If A and B are both merely conditionally convergent, then C may not converge.
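
To illustrate the last remark, here is a short Python sketch using a standard example that is not taken from this text: a_n = b_n = (−1)^n / √(n + 1). This series converges (by the alternating series test) but not absolutely. The terms c_n of its Cauchy product with itself satisfy ∣c_n∣ ≥ 1, so they do not tend to 0 and the Cauchy product cannot converge. The names cauchy_product_terms and N are assumptions made for this illustration.

```python
import math

def cauchy_product_terms(a, b):
    """Return c_0, ..., c_{N-1}, where c_n = sum_{i=0}^{n} a_i * b_{n-i}."""
    n_terms = min(len(a), len(b))
    return [sum(a[i] * b[n - i] for i in range(n + 1)) for n in range(n_terms)]

N = 200
# a_n = (-1)^n / sqrt(n + 1): conditionally (but not absolutely) convergent.
a = [(-1) ** n / math.sqrt(n + 1) for n in range(N)]

c = cauchy_product_terms(a, a)

# If the Cauchy product converged, its terms would have to tend to 0.
smallest_term = min(abs(term) for term in c)
print(smallest_term)
```

Each summand a_i a_{n−i} has the same sign (−1)^n and absolute value at least 2/(n + 2), which is why ∣c_n∣ ≥ 2 (n + 1) / (n + 2) ≥ 1.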

Theorem 38. Let f ∶ (−c, c) → R and g ∶ (−d, d) → R be functions. Suppose:

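\[
f(x) = \sum_{n=0}^{\infty} a_n x^n \text{ for all } x \in (-c, c)
\quad \text{and} \quad
g(x) = \sum_{n=0}^{\infty} b_n x^n \text{ for all } x \in (-d, d).
\]

Then for any z ∈ (−d, d) such that ∑ b_n z^n converges absolutely to some number in (−c, c),
we have:
\[
(f \circ g)(z) = \sum_{n=0}^{\infty} a_n \left( \sum_{m=0}^{\infty} b_m z^m \right)^n.
\]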

Proof. Omitted — see e.g. Apostol (1974, p. 238, Theorem 9.25: “Substitution Theorem”)
or Lang (1999, p. 66, Theorem 3.10: “Composition of Series”).


121.13. The Riemann Integral
For simplicity, our definition of Riemann integration here restricts attention to continuous
functions:

Definition 270. Let a, b ∈ R with a < b, f ∶ [a, b] → R be a continuous function, and
n ∈ Z+ .
For each i = 1, 2, . . . , n, let Pi be the following closed interval:
\[
P_i = \left[ a + (i - 1) \frac{b - a}{n}, \; a + i \frac{b - a}{n} \right].
\]

Next,[408] for each i, let li and ui be the minimum and maximum values of f in Pi .
Then the lower and upper n-sums of f from a to b are denoted Ln and Un and are defined
by:

\[
L_n = \frac{b - a}{n} \sum_{i=1}^{n} l_i \quad \text{and} \quad U_n = \frac{b - a}{n} \sum_{i=1}^{n} u_i.
\]

The lower and upper integrals of f from a to b are denoted L and U and are defined by:

\[
L = \lim_{n \to \infty} L_n \quad \text{and} \quad U = \lim_{n \to \infty} U_n,
\]

provided these limits exist.

Suppose L = U . Then we say that f is Riemann-integrable. We also define ∫_a^b f , the
definite integral of f from a to b, by:

\[
\int_a^b f = L = U.
\]
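
The lower and upper n-sums of Definition 270 can be sketched in Python as follows. To keep the code short, this sketch assumes that f is increasing on [a, b], so that the minimum and maximum of f on each Pi are attained at the left and right endpoints of Pi. The function name lower_upper_sums is an assumption made for this illustration.

```python
from typing import Callable, Tuple

def lower_upper_sums(f: Callable[[float], float],
                     a: float, b: float, n: int) -> Tuple[float, float]:
    """Lower and upper n-sums of f from a to b, ASSUMING f is increasing on [a, b]."""
    width = (b - a) / n
    # For an increasing f, l_i = f(left endpoint of P_i) and u_i = f(right endpoint of P_i).
    lower = width * sum(f(a + (i - 1) * width) for i in range(1, n + 1))
    upper = width * sum(f(a + i * width) for i in range(1, n + 1))
    return lower, upper

def square(x: float) -> float:
    return x * x

# x^2 is increasing on [0, 1]; its definite integral from 0 to 1 is 1/3.
sums = {n: lower_upper_sums(square, 0.0, 1.0, n) for n in (10, 100, 1000)}
for n, (lower, upper) in sums.items():
    print(n, lower, upper)
```

As n grows, the lower sums increase towards 1/3 and the upper sums decrease towards 1/3; here the gap Un − Ln equals 1/n.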

The following result says that a continuous function on a closed (and bounded) interval is
Riemann-integrable:

Theorem 39. Let a, b ∈ R with a < b. If f ∶ [a, b] → R is continuous, then f is Riemann-integrable.

Proof. Omitted; see e.g. ProofWiki.

[408] The existence of li and ui is given by the Extreme Value Theorem (Theorem 36).
Theorem 40. (Order Limit Theorem) Let D be an interval, f ∶ D → R be a continuous
function, a be an interior point of D, and p ∈ R.
(a) If f (x) ≤ p for all x < a, then f (a) ≤ p.
(b) If f (x) < p for all x > a, then f (a) ≤ p.
(c) If f (x) ≥ p for all x < a, then f (a) ≥ p.
(d) If f (x) > p for all x > a, then f (a) ≥ p.

Proof. (a) Suppose for contradiction that f (a) > p. Let δ = (f (a) − p) / 2.
Then by the continuity of f , there exists ε > 0 such that for every x ∈ (a − ε, a + ε), we
have f (x) ∈ (f (a) − δ, f (a) + δ) and hence f (x) > p. This contradicts our assumption that
f (x) ≤ p for all x < a.
Given (a) and its mirror image for x > a, (b), which has the same conclusion but uses a
stronger assumption, is a fortiori true.
The proofs of (c) and (d) are similar to those of (a) and (b) and thus omitted.

Theorem 25. Let a, b, c, d, e ∈ R with a < c < b. Suppose f, g ∶ [a, b] → R are continuous
functions. Then:

(a) ∫_a^b (f ± g) = ∫_a^b f ± ∫_a^b g. (Sum and Difference Rules)

(b) ∫_a^b f = ∫_a^c f + ∫_c^b f . (Adjacent Intervals Rule)

(c) ∫_a^b (df ) = d ∫_a^b f . (Constant Factor Rule)

(d) ∫_a^b d = (b − a) d. (Constant Rule)

(e) If f ≥ g on [a, b], then ∫_a^b f ≥ ∫_a^b g. (Comparison Rule I)

(f) If d ≤ f ≤ e on [a, b], then (b − a) d ≤ ∫_a^b f ≤ (b − a) e. (Comparison Rule II)

Proof. (a) XXX To be written.

(b) xxx
(c) xxx
(d) Define h ∶ [a, b] → R by h (x) = d.
Fix a positive integer n. For any i = 1, 2, . . . , n, we have li = d and ui = d (see Definition
270).
So, the lower and upper n-sums of h from a to b are:
\[
L_n = \frac{b - a}{n} \sum_{i=1}^{n} l_i = \frac{b - a}{n} nd = (b - a) d
\quad \text{and} \quad
U_n = \frac{b - a}{n} \sum_{i=1}^{n} u_i = \frac{b - a}{n} nd = (b - a) d.
\]
Hence, the lower and upper integrals of h from a to b are:
\[
L = \lim_{n \to \infty} L_n = (b - a) d \quad \text{and} \quad U = \lim_{n \to \infty} U_n = (b - a) d.
\]

Since L = U , we have:

\[
\int_a^b d = \int_a^b h = L = U = (b - a) d.
\]

(e) xxx
(f) xxx


121.14. Proving the FTC1

Theorem 26. (First Fundamental Theorem of Calculus [FTC1]) Let f ∶ [a, b] → R
be continuous. Suppose g ∶ [a, b] → R is defined by:

\[
g(x) = \int_a^x f.
\]

Then g ′ = f .

Proof. Fix c ∈ [a, b] and let d ∈ [a, b] with d ≠ c. We have:

\[
\frac{g(d) - g(c)}{d - c} = \frac{\int_a^d f - \int_a^c f}{d - c} = \frac{\int_c^d f}{d - c},
\]

where the last step uses the Adjacent Intervals Rule (Theorem 25).
Let y = f (c) (note that this is a constant). Let S = [g (d) − g (c)] / (d − c) − y.
Observe that:

\[
S = \frac{g(d) - g(c)}{d - c} - y = \frac{\int_c^d f}{d - c} - y \, \frac{d - c}{d - c} = \frac{1}{d - c} \int_c^d \left[ f(x) - y \right] \mathrm{d}x,
\]

where the last step uses the Constant Rule (Theorem 25).
Let δ > 0. By the continuity of f , there exists ε > 0 such that for all x ∈ (c − ε, c + ε), we
have:
\[
-\delta < f(x) - f(c) = f(x) - y < \delta.
\]

By Comparison Rule II (Theorem 25) then, for all d ∈ (c − ε, c + ε), we have:

\[
-\delta < \frac{1}{d - c} \int_c^d \left[ f(x) - y \right] \mathrm{d}x = S < \delta.
\]

We have just shown that for any δ > 0, there exists ε > 0 such that for all d ∈ (c − ε, c + ε),
we have:

S ∈ Nδ (0) .

Thus:
\[
\lim_{d \to c} S = 0.
\]

That is:
\[
\lim_{d \to c} \left[ \frac{g(d) - g(c)}{d - c} - y \right] = 0.
\]

But:
\[
\lim_{d \to c} \left[ \frac{g(d) - g(c)}{d - c} - y \right] = \lim_{d \to c} \frac{g(d) - g(c)}{d - c} - \lim_{d \to c} y = g'(c) - y = g'(c) - f(c).
\]

Hence, g ′ (c) = f (c). Since c was chosen arbitrarily from [a, b], we have g ′ = f .
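
The difference quotient in the above proof can also be explored numerically. The following Python sketch approximates g (x) = ∫_a^x f with a midpoint rule (an approximation chosen here for simplicity, not the construction of Definition 270) and compares the quotient [g (d) − g (c)] / (d − c) with f (c) as d approaches c. The choices f = cos, a = 0, and c = 0.5, as well as the names integral and difference_quotient, are assumptions made for this illustration.

```python
import math
from typing import Callable

def integral(f: Callable[[float], float], lo: float, hi: float, steps: int = 2000) -> float:
    """Midpoint-rule approximation of the definite integral of f from lo to hi."""
    width = (hi - lo) / steps
    return width * sum(f(lo + (i + 0.5) * width) for i in range(steps))

def difference_quotient(f: Callable[[float], float], a: float, c: float, d: float) -> float:
    """(g(d) - g(c)) / (d - c), where g(x) is the integral of f from a to x."""
    return (integral(f, a, d) - integral(f, a, c)) / (d - c)

a, c = 0.0, 0.5
gaps = [1e-1, 1e-2, 1e-3]  # values of d - c, shrinking towards 0

# Distance between the difference quotient and f(c) for each gap.
errors = [abs(difference_quotient(math.cos, a, c, c + gap) - math.cos(c)) for gap in gaps]
for gap, error in zip(gaps, errors):
    print(gap, error)
```

The printed errors shrink as d approaches c, in line with the conclusion g ′ (c) = f (c).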


121.15. More Techniques of Antidifferentiation

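Fact 152. Suppose a, b, c ∈ R with a ≠ 0 and d = √∣b² − 4ac∣. Then:

\[
\int \frac{1}{ax^2 + bx + c} \, \mathrm{d}x = \begin{cases}
\dfrac{1}{d} \ln \left| \dfrac{x + \frac{b - d}{2a}}{x + \frac{b + d}{2a}} \right| + C & \text{for } b^2 - 4ac > 0, \\[2ex]
-\dfrac{1}{a} \cdot \dfrac{1}{x + \frac{b}{2a}} + C & \text{for } b^2 - 4ac = 0, \\[2ex]
\dfrac{2}{d} \tan^{-1} \dfrac{2ax + b}{d} + C & \text{for } b^2 - 4ac < 0.
\end{cases}
\]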

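Proof. We start by observing that:

\[
ax^2 + bx + c = a \left( x + \frac{b}{2a} \right)^2 + c - \frac{b^2}{4a} = a \left[ \left( x + \frac{b}{2a} \right)^2 - \frac{b^2 - 4ac}{4a^2} \right] = \begin{cases}
a \left[ \left( x + \frac{b}{2a} \right)^2 - \left( \frac{d}{2a} \right)^2 \right] & \text{for } b^2 - 4ac > 0, \\[1ex]
a \left( x + \frac{b}{2a} \right)^2 & \text{for } b^2 - 4ac = 0, \\[1ex]
a \left[ \left( x + \frac{b}{2a} \right)^2 + \left( \frac{d}{2a} \right)^2 \right] & \text{for } b^2 - 4ac < 0.
\end{cases}
\]

We now examine each of the three cases.

(a) Suppose b² − 4ac > 0. Then for some constants A and B, we have:

\[
\frac{1}{ax^2 + bx + c} = \frac{1}{a \left( x + \frac{b + d}{2a} \right) \left( x + \frac{b - d}{2a} \right)} = \frac{1}{a} \left( \frac{A}{x + \frac{b + d}{2a}} + \frac{B}{x + \frac{b - d}{2a}} \right) = \frac{1}{a} \cdot \frac{(A + B) x + A \frac{b - d}{2a} + B \frac{b + d}{2a}}{\left( x + \frac{b + d}{2a} \right) \left( x + \frac{b - d}{2a} \right)}.
\]

Comparing coefficients, we have:

\[
A + B = 0 \tag{2}
\]

and:

\[
A \frac{b - d}{2a} + B \frac{b + d}{2a} = \frac{1}{2a} \left[ (A + B) b + (B - A) d \right] = \frac{d}{2a} (B - A) = 1. \tag{3}
\]

The sum of (3) (rearranged as B − A = 2a/d) and (2) yields:

\[
2B = \frac{2a}{d} \quad \text{or} \quad B = \frac{a}{d} \quad \text{and} \quad A = -\frac{a}{d}.
\]

Thus:

\begin{align*}
\int \frac{1}{ax^2 + bx + c} \, \mathrm{d}x
&= \frac{1}{a} \left( \int \frac{A}{x + \frac{b + d}{2a}} \, \mathrm{d}x + \int \frac{B}{x + \frac{b - d}{2a}} \, \mathrm{d}x \right)
= \int \frac{-1/d}{x + \frac{b + d}{2a}} \, \mathrm{d}x + \int \frac{1/d}{x + \frac{b - d}{2a}} \, \mathrm{d}x \\
&= -\frac{1}{d} \ln \left| x + \frac{b + d}{2a} \right| + \frac{1}{d} \ln \left| x + \frac{b - d}{2a} \right| + C
= \frac{1}{d} \ln \left| \frac{x + \frac{b - d}{2a}}{x + \frac{b + d}{2a}} \right| + C.
\end{align*}

(b) Suppose b² − 4ac = 0. Then:
\[
\int \frac{1}{ax^2 + bx + c} \, \mathrm{d}x = \int \frac{1}{a \left( x + \frac{b}{2a} \right)^2} \, \mathrm{d}x = -\frac{1}{a} \cdot \frac{1}{x + \frac{b}{2a}} + C = -\frac{2}{2ax + b} + C.
\]

(c) Suppose b² − 4ac < 0. Then:

\[
\int \frac{1}{ax^2 + bx + c} \, \mathrm{d}x = \frac{1}{a} \int \frac{1}{\left( x + \frac{b}{2a} \right)^2 + \left( \frac{d}{2a} \right)^2} \, \mathrm{d}x \overset{4}{=} \frac{1}{a} \left( \frac{2a}{d} \tan^{-1} \frac{2ax + b}{d} \right) + C = \frac{2}{d} \tan^{-1} \frac{2ax + b}{d} + C,
\]

where the step marked 4 uses Proposition 9(a).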



121.16. ∫ 1/√(ax² + bx + c) dx

Fact 153. Suppose a, b, c ∈ R with a < 0. If ax² + bx + c > 0, then:

\[
\int \frac{1}{\sqrt{ax^2 + bx + c}} \, \mathrm{d}x = \frac{1}{\sqrt{|a|}} \sin^{-1} \frac{-2ax - b}{\sqrt{b^2 - 4ac}} + C.
\]

Remark 154. Note that if a < 0 and ax² + bx + c > 0, then we must have b² − 4ac > 0.
Also, it is possible to verify that (−2ax − b) / √(b² − 4ac) ∈ [−1, 1] = Domain sin⁻¹.

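Proof. Note that since a < 0, we have ∣a∣ = −a.

Complete the square:
\[
ax^2 + bx + c = a \left[ \left( x + \frac{b}{2a} \right)^2 - \frac{b^2 - 4ac}{4a^2} \right] = |a| \left[ \frac{b^2 - 4ac}{4a^2} - \left( x + \frac{b}{2a} \right)^2 \right] = |a| \left[ \left( \frac{\sqrt{b^2 - 4ac}}{2|a|} \right)^2 - \left( x + \frac{b}{2a} \right)^2 \right].
\]
By Proposition 9(b), we have:
\[
\int \frac{1}{\sqrt{d^2 - y^2}} \, \mathrm{d}y = \sin^{-1} \frac{y}{|d|} + C.
\]

And now, as in the above examples, writing d = √(b² − 4ac) / (2∣a∣) and y = x + b/(2a), we have:

\begin{align*}
\int \frac{1}{\sqrt{ax^2 + bx + c}} \, \mathrm{d}x
&= \frac{1}{\sqrt{|a|}} \int \frac{1}{\sqrt{d^2 - y^2}} \, \mathrm{d}x = \frac{1}{\sqrt{|a|}} \sin^{-1} \frac{y}{|d|} + C \\
&= \frac{1}{\sqrt{|a|}} \sin^{-1} \frac{x + \frac{b}{2a}}{\left| \frac{\sqrt{b^2 - 4ac}}{2|a|} \right|} + C
= \frac{1}{\sqrt{|a|}} \sin^{-1} \frac{x + \frac{b}{2a}}{\frac{\sqrt{b^2 - 4ac}}{2|a|}} + C
= \frac{1}{\sqrt{|a|}} \sin^{-1} \frac{-2ax - b}{\sqrt{b^2 - 4ac}} + C.
\end{align*}

In the second line, we can remove the absolute value sign because the expression inside is positive.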

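Fact 226. Let a ≠ 0. If x² + a > 0, then:

\[
\int \frac{1}{\sqrt{x^2 + a}} \, \mathrm{d}x = \ln \left| \sqrt{x^2 + a} + x \right| + C.
\]

Proof. (The substitution below requires a > 0, so that √a is a real number.)
Let u = tan⁻¹ (x/√a) ∈ (−π/2, π/2). So, √a tan u = x and dx/du = √a sec² u. And now:

\begin{align*}
\int \frac{1}{\sqrt{x^2 + a}} \, \mathrm{d}x
&= \int \frac{1}{\sqrt{a \tan^2 u + a}} \, \mathrm{d}x = \frac{1}{\sqrt{a}} \int \frac{1}{\sqrt{\tan^2 u + 1}} \, \mathrm{d}x = \frac{1}{\sqrt{a}} \int \frac{1}{|\sec u|} \, \mathrm{d}x = \frac{1}{\sqrt{a}} \int \frac{1}{\sec u} \, \mathrm{d}x \\
&= \frac{1}{\sqrt{a}} \int \frac{1}{\sec u} \frac{\mathrm{d}x}{\mathrm{d}u} \, \mathrm{d}u = \frac{1}{\sqrt{a}} \int \frac{1}{\sec u} \sqrt{a} \sec^2 u \, \mathrm{d}u = \int \sec u \, \mathrm{d}u = \ln \left| \sec u + \tan u \right| + \hat{C} \\
&= \ln \left| \sec \left( \tan^{-1} \frac{x}{\sqrt{a}} \right) + \frac{x}{\sqrt{a}} \right| + \hat{C} = \ln \left| \sqrt{\frac{x^2}{a} + 1} + \frac{x}{\sqrt{a}} \right| + \hat{C} = \ln \left| \frac{\sqrt{x^2 + a} + x}{\sqrt{a}} \right| + \hat{C} \\
&= \ln \left| \sqrt{x^2 + a} + x \right| - \ln \sqrt{a} + \hat{C} = \ln \left| \sqrt{x^2 + a} + x \right| + C.
\end{align*}

In the first line, we can remove the absolute value sign because sec u > 0 for u ∈ (−π/2, π/2).
In the second line, the first step is integration by substitution.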

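Fact 227. Suppose a, b, c ∈ R with a > 0 and b² − 4ac ≠ 0. If ax² + bx + c > 0, then:

\[
\int \frac{1}{\sqrt{ax^2 + bx + c}} \, \mathrm{d}x = \frac{1}{\sqrt{a}} \ln \left| \sqrt{x^2 + \frac{b}{a} x + \frac{c}{a}} + x + \frac{b}{2a} \right| + C.
\]

Proof. Completing the square, we have:

\[
ax^2 + bx + c = a \left[ \left( x + \frac{b}{2a} \right)^2 + \frac{4ac - b^2}{4a^2} \right].
\]

Hence:

\begin{align*}
\int \frac{1}{\sqrt{ax^2 + bx + c}} \, \mathrm{d}x
&= \frac{1}{\sqrt{a}} \int \frac{1}{\sqrt{\left( x + \frac{b}{2a} \right)^2 + \frac{4ac - b^2}{4a^2}}} \, \mathrm{d}x
= \frac{1}{\sqrt{a}} \ln \left| \sqrt{\left( x + \frac{b}{2a} \right)^2 + \frac{4ac - b^2}{4a^2}} + x + \frac{b}{2a} \right| + C \\
&= \frac{1}{\sqrt{a}} \ln \left| \sqrt{x^2 + \frac{b}{a} x + \frac{c}{a}} + x + \frac{b}{2a} \right| + C,
\end{align*}

where the second step uses Fact 226 (with x + b/(2a) in place of x and (4ac − b²)/(4a²) in place of a).

Of course, if b² − 4ac = 0, then (for x > −b/(2a)) we'd simply have:

\[
\int \frac{1}{\sqrt{ax^2 + bx + c}} \, \mathrm{d}x = \frac{1}{\sqrt{a}} \int \frac{1}{\sqrt{\left( x + \frac{b}{2a} \right)^2}} \, \mathrm{d}x = \frac{1}{\sqrt{a}} \int \frac{1}{\left| x + \frac{b}{2a} \right|} \, \mathrm{d}x = \frac{1}{\sqrt{a}} \ln \left| x + \frac{b}{2a} \right| + C.
\]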

Remark 155. We could also have simply verified the above results (i.e. simply show that
the derivative of the RHS is equal to the integrand on the LHS), but that would’ve been
less enlightening.
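
For readers who would nonetheless like such a check, here is a rough numerical version in Python. It differentiates each claimed antiderivative by a central difference and compares the result with the integrand at one sample point. This is a sanity check at a handful of points under the stated conditions on a, b, c, not a proof. The helper names (numeric_derivative, fact_152, and so on) and the sample coefficients are assumptions made for this illustration.

```python
import math

def numeric_derivative(F, x, h=1e-6):
    """Central-difference approximation of F'(x)."""
    return (F(x + h) - F(x - h)) / (2 * h)

def fact_152(a, b, c):
    """Claimed antiderivative of 1/(ax^2 + bx + c), with d = sqrt(|b^2 - 4ac|)."""
    disc = b * b - 4 * a * c
    d = math.sqrt(abs(disc))
    if disc > 0:
        return lambda x: math.log(abs((x + (b - d) / (2 * a)) / (x + (b + d) / (2 * a)))) / d
    if disc == 0:
        return lambda x: -2 / (2 * a * x + b)
    return lambda x: 2 / d * math.atan((2 * a * x + b) / d)

def fact_153(a, b, c):
    """Claimed antiderivative of 1/sqrt(ax^2 + bx + c) when a < 0."""
    return lambda x: math.asin((-2 * a * x - b) / math.sqrt(b * b - 4 * a * c)) / math.sqrt(abs(a))

def fact_226(a):
    """Claimed antiderivative of 1/sqrt(x^2 + a)."""
    return lambda x: math.log(abs(math.sqrt(x * x + a) + x))

def fact_227(a, b, c):
    """Claimed antiderivative of 1/sqrt(ax^2 + bx + c) when a > 0 and b^2 - 4ac != 0."""
    return lambda x: math.log(abs(math.sqrt(x * x + b / a * x + c / a) + x + b / (2 * a))) / math.sqrt(a)

# (claimed antiderivative, integrand, sample point)
cases = [
    (fact_152(1, 0, -1), lambda x: 1 / (x * x - 1), 3.0),
    (fact_152(1, 2, 1), lambda x: 1 / (x * x + 2 * x + 1), 0.5),
    (fact_152(1, 0, 1), lambda x: 1 / (x * x + 1), 0.7),
    (fact_153(-1, 0, 1), lambda x: 1 / math.sqrt(1 - x * x), 0.3),
    (fact_226(2), lambda x: 1 / math.sqrt(x * x + 2), 1.5),
    (fact_227(2, 3, 5), lambda x: 1 / math.sqrt(2 * x * x + 3 * x + 5), 1.0),
]

worst_error = max(abs(numeric_derivative(F, x) - f(x)) for F, f, x in cases)
print(worst_error)
```

A small worst_error indicates that, at these sample points, each right-hand side does differentiate to the corresponding integrand.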


121.17. Revisiting Logarithms and Exponentiation

Fact 159.
Proof. By our general definition of exponents (Definition 271), x^n = exp (n ln x). So:

ln x^n = ln [exp (n ln x)] = n ln x,

where the last step uses the fact that exp and ln are inverses.


In Ch. 5.4, we only defined what b^x means in two cases:

1. x ∈ Z, so for example, we know that 5^2 = 25, (−5)^3 = −125, and 4.5^(−2) = 4/81; or
2. b ≥ 0 and x ∈ Q, so for example, we know that 5^2.6 = (¹⁰√5)^26.

We did not define what b^x means in two other cases, namely:

3. x ∉ Q, so for example, we don’t know what 5^√3 is; or
4. b < 0 and x ∉ Z, so for example, we don’t know what (−5)^x is when x is not an integer.
And so, we’re actually cheating when we take for granted that the Laws of Exponents
(Proposition 1) hold for all positive bases and real exponents. For example, we haven’t
defined what 5^√3 and 5^(−√3) are, yet we cavalierly take for granted that:

\[
5^{\sqrt{3}} \cdot 5^{-\sqrt{3}} = 5^{\sqrt{3} + (-\sqrt{3})} = 5^0 = 1.
\]

Let us think about what b^x might mean if x ∉ Q.

Consider 5^√3. Say we know that √3 ≈ 1.732 050 8 . . . and also that:

\begin{align*}
5^{1} &= 5, \\
5^{1.7} &= 5^{\frac{17}{10}} = \left( \sqrt[10]{5} \right)^{17} \approx (1.174\,619\ldots)^{17} \approx 15.425\ldots \\
5^{1.73} &= 5^{\frac{173}{100}} = \left( \sqrt[100]{5} \right)^{173} \approx (1.016\,224\,6\ldots)^{173} \approx 16.188\ldots \\
5^{1.732} &= 5^{\frac{1\,732}{1\,000}} = \left( \sqrt[1\,000]{5} \right)^{1\,732} \approx (1.001\,610\,73\ldots)^{1\,732} \approx 16.241\ldots \\
5^{1.732\,0} &= 5^{\frac{17\,320}{10\,000}} = \left( \sqrt[10\,000]{5} \right)^{17\,320} \approx (1.000\,160\,957\ldots)^{17\,320} \approx 16.241\ldots \\
5^{1.732\,05} &= 5^{\frac{173\,205}{100\,000}} = \left( \sqrt[100\,000]{5} \right)^{173\,205} \approx (1.000\,016\,094\,5\ldots)^{173\,205} \approx 16.242\ldots \\
5^{1.732\,050} &= 5^{\frac{1\,732\,050}{1\,000\,000}} = \left( \sqrt[1\,000\,000]{5} \right)^{1\,732\,050} \approx (1.000\,001\,609\,4\ldots)^{1\,732\,050} \approx 16.242\ldots \\
5^{1.732\,050\,8} &= 5^{\frac{17\,320\,508}{10\,000\,000}} = \left( \sqrt[10\,000\,000]{5} \right)^{17\,320\,508} \approx (1.000\,000\,160\,944\ldots)^{17\,320\,508} \approx 16.242\ldots
\end{align*}

And so informally, we might say that 5^√3 ≈ 16.242 4 . . .

A little more formally, we might say that 5^√3 is the limit of the following sequence:

5^1 , 5^1.7 , 5^1.73 , 5^1.732 , 5^1.732 0 , 5^1.732 05 , 5^1.732 050 , 5^1.732 050 8 , . . .
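
The above sequence can be reproduced with a few lines of Python. This sketch truncates √3 to j decimal places using exact integer arithmetic (math.isqrt) and then raises 5 to each truncated exponent using floating-point arithmetic, so the printed values are themselves approximations. The helper name truncated_exponents is an assumption made for this illustration.

```python
import math

def truncated_exponents(radicand: int, digits: int):
    """Return (numerator, denominator) pairs for sqrt(radicand) truncated to 0..digits decimal places."""
    # isqrt(radicand * 10^(2j)) is the floor of sqrt(radicand) * 10^j, computed exactly.
    return [(math.isqrt(radicand * 10 ** (2 * j)), 10 ** j) for j in range(digits + 1)]

exponents = truncated_exponents(3, 7)          # 1, 17/10, 173/100, ..., 17320508/10000000
powers = [5 ** (num / den) for num, den in exponents]
target = 5 ** math.sqrt(3)                     # Python's own value of 5^sqrt(3)

for (num, den), value in zip(exponents, powers):
    print(f"5^({num}/{den}) = {value:.6f}")
print(f"5^sqrt(3)   = {target:.6f}")
```

The printed powers increase towards Python's value of 5^√3, matching the informal limit described above.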


And so, following the above discussion, one possible approach for defining b^x in the case
where x ∉ Q might go like this:
• Assume (prove) there is a sequence of rational numbers (x0 , x1 , . . . ) that converges to x.
• Use that sequence to form the sequence (b^x0 , b^x1 , . . . ).
• Assume (prove) this latter sequence converges to some number y. That is:
(b^x0 , b^x1 , . . . ) → y.
• Now define b^x to be equal to y.
This seems like a perfectly sensible approach. But strangely, it is not the approach we
will take. Instead, somewhat surprisingly, we’ll define exponents based on the natural
logarithm and exponential functions:[409]

Definition 271. Let b > 0 and x ∈ R. Then b raised to the power of x, denoted b^x , is
defined as the following number:

b^x = exp (x ln b) .

Given the expression b^x , we call b the base and x the exponent.

In the special case where the base is zero, i.e. b = 0, we define:

\[
0^x = \begin{cases}
0 & \text{for } x > 0, \\
1 & \text{for } x = 0, \\
\text{Undefined} & \text{for } x < 0.
\end{cases}
\]

Remark 156. The above definition is fairly general, but still fails to cover the case where
the base is negative, i.e. b < 0. To cover also that case, we’d have to learn a bit more
about complex numbers and in particular the complex natural logarithm function ln
(whose domain is every complex number except 0).
It turns out that with the complex natural logarithm function, we can define b^x when
b < 0 exactly as we did above:

b^x = exp (x ln b) .

This though is of course quite a bit beyond H2 Maths and so we shall discuss this no
further.
Observe that in the special case where b = 0, the above general definition of exponentiation
coincides with Definition 26, which covered only the special case where b ∈ Z.
The following Proposition shows that for b > 0 and any x ∈ R, Definitions 271 and 185 imply
and therefore supersede the four definitions of exponents and logarithms given in Ch. 5.4:

Proposition 21. If b > 0 and x ∈ R, then Definition 271 implies Definitions (a) 26; (b)
27; (c) 28; and (d) 30.
[409] Which were in turn formally defined as Definitions 184 and 59 in the main text.
Proof. Below, the steps marked ⋆ and ○ indicate the use of Definition 271 and the fact that exp and ln are
each other’s inverse.
(a) We will show that in each of three possible cases, Definition 271 implies Definition 26.
Case 1. If x = 0, then by Fact 160(a):

\[
b^0 \overset{\star}{=} \exp(0 \cdot \ln b) = \exp 0 = 1.
\]

Case 2. If x > 0, then by Fact 160(c):

\[
b^x \overset{\star}{=} \exp(x \ln b) = \exp(\underbrace{\ln b + \ln b + \dots + \ln b}_{x \text{ times}}) = \underbrace{\exp(\ln b) \exp(\ln b) \dots \exp(\ln b)}_{x \text{ times}} \overset{\circ}{=} \underbrace{b \cdot b \cdot \dots \cdot b}_{x \text{ times}}.
\]

Case 3. If x < 0, then by Facts 158(c) and 160(c):

\[
b^x \overset{\star}{=} \exp(x \ln b) = \exp(\underbrace{-\ln b - \ln b - \dots - \ln b}_{|x| \text{ times}}) = \exp\Big(\underbrace{\ln \frac{1}{b} + \ln \frac{1}{b} + \dots + \ln \frac{1}{b}}_{|x| \text{ times}}\Big) = \underbrace{\exp\Big(\ln \frac{1}{b}\Big) \dots \exp\Big(\ln \frac{1}{b}\Big)}_{|x| \text{ times}} \overset{\circ}{=} \underbrace{\frac{1}{b} \cdot \frac{1}{b} \cdot \dots \cdot \frac{1}{b}}_{|x| \text{ times}}.
\]

(b) We will show that (b^(1/x))^x = b, so that Definition 27 holds:

\[
\left( b^{\frac{1}{x}} \right)^x \overset{\star}{=} \left[ \exp\left( \frac{1}{x} \ln b \right) \right]^x \overset{\star}{=} \exp\left\{ x \ln \left[ \exp\left( \frac{1}{x} \ln b \right) \right] \right\} \overset{\circ}{=} \exp\left[ x \left( \frac{1}{x} \ln b \right) \right] = \exp(\ln b) \overset{\circ}{=} b.
\]

(c) is immediate from (a) and (b).

(d) We have:

\[
b^x = n \overset{\star}{\iff} \exp(x \ln b) = n \iff \ln[\exp(x \ln b)] = \ln n \overset{\circ}{\iff} x \ln b = \ln n \iff x = \frac{\ln n}{\ln b} = \log_b n.
\]

(The last step uses Definition 185.)

In the main text, we proved the following Laws of Exponents only in the special and simple
case where the exponents x and y are positive integers. We will now prove them more
generally:

Proposition 1. (Laws of Exponents.) Let a, b > 0 and x, y ∈ R. Then:

(a) b^x b^y = b^(x+y) . (b) b^(−x) = 1/b^x . (c) b^x / b^y = b^(x−y) . (d) (b^x)^y = b^(xy) . (e) (ab)^x = a^x b^x .

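Proof. Below, the steps marked ⋆ indicate the use of Definition 271, and the steps marked ○
use the fact that exp and ln are each other’s inverse.

(a)
\[
b^x b^y \overset{\star}{=} \exp(x \ln b) \exp(y \ln b) = \exp(x \ln b + y \ln b) = \exp[(x + y) \ln b] \overset{\star}{=} b^{x+y}.
\]

(b)
\[
b^{-x} \overset{\star}{=} \exp(-x \ln b) \overset{1}{=} \exp(-\ln b^x) \overset{2}{=} \exp\left( \ln \frac{1}{b^x} \right) \overset{\circ}{=} \frac{1}{b^x},
\]
where step 1 uses Fact 159 and step 2 uses the identity −ln y = ln (1/y).

(c)
\[
b^{x-y} \overset{\star}{=} \exp[(x - y) \ln b] = \exp(x \ln b - y \ln b) \overset{3}{=} \frac{\exp(x \ln b)}{\exp(y \ln b)} \overset{\star}{=} \frac{b^x}{b^y},
\]
where step 3 uses Fact 160(e).

(d)
\[
(b^x)^y \overset{\star}{=} \exp[y (\ln b^x)] \overset{4}{=} \exp[xy (\ln b)] \overset{\star}{=} b^{xy},
\]
where step 4 uses Fact 159.

(e)
\[
(ab)^x \overset{\star}{=} \exp[x \ln(ab)] \overset{5}{=} \exp[x (\ln a + \ln b)] = \exp(x \ln a + x \ln b) \overset{6}{=} [\exp(x \ln a)] [\exp(x \ln b)] \overset{\star}{=} a^x b^x,
\]
where steps 5 and 6 use Facts 158(b) and 160(c).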
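
Definition 271 and the Laws of Exponents can be spot-checked numerically. The Python sketch below implements b^x as exp (x ln b) for b > 0, together with the special case b = 0, and then tests each law at a few sample values. The function name power and the sample values are assumptions made for this illustration, and agreement is only up to floating-point rounding.

```python
import math

def power(b: float, x: float) -> float:
    """b raised to the power of x, following Definition 271 (b >= 0 only)."""
    if b > 0:
        return math.exp(x * math.log(b))
    if b == 0:
        if x > 0:
            return 0.0
        if x == 0:
            return 1.0
        raise ValueError("0^x is undefined for x < 0")
    raise ValueError("negative bases are not covered by Definition 271")

a, b, x, y = 2.5, 5.0, math.sqrt(3), -0.7  # sample bases and exponents

def close(u, v):
    return math.isclose(u, v, rel_tol=1e-9)

laws = {
    "(a)": close(power(b, x) * power(b, y), power(b, x + y)),
    "(b)": close(power(b, -x), 1 / power(b, x)),
    "(c)": close(power(b, x) / power(b, y), power(b, x - y)),
    "(d)": close(power(power(b, x), y), power(b, x * y)),
    "(e)": close(power(a * b, x), power(a, x) * power(b, x)),
}
print(laws)
print(power(5, math.sqrt(3)) * power(5, -math.sqrt(3)))  # approximately 1
```

In particular, the product 5^√3 ⋅ 5^(−√3) from earlier in this section comes out as 1, up to rounding.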

Fact 28. For every real number x, we have:

ex = exp x.

Proof. By Definitions 271 and 60:

ex = exp (x ln e) = exp (x ⋅ 1) = exp x.


122. Appendices for Part VI. Probability and Statistics

122.1. How to Count

Theorem 41. (AP.) If A and B are disjoint, finite sets, then ∣A ∪ B∣ = ∣A∣ + ∣B∣.

Proof. Let A = {a1 , a2 , . . . , ap } and B = {b1 , b2 , . . . , bq }. Then

A ∪ B = {a1 , a2 , . . . , ap , b1 , b2 , . . . , bq } .

We have ∣A∣ = p, ∣B∣ = q, and ∣A ∪ B∣ = p + q. The result follows.

Corollary 39. If A1 , A2 , . . . , An are disjoint, finite sets, then ∣A1 ∪ A2 ∪ ⋅ ⋅ ⋅ ∪ An ∣ = ∣A1 ∣ + ∣A2 ∣ + ⋅ ⋅ ⋅ + ∣An ∣.

Proof. By induction (details omitted).

Theorem 42. (MP.) If A and B are finite sets, then ∣A × B∣ = ∣A∣ × ∣B∣.

Proof. Let A = {a1 , a2 , . . . , ap } and B = {b1 , b2 , . . . , bq }. Then

A × B = { (a1 , b1 ) , (a1 , b2 ) , . . . , (a1 , bq ) , (a2 , b1 ) , (a2 , b2 ) , . . . , (a2 , bq ) , . . . ,
(ap , b1 ) , (ap , b2 ) , . . . , (ap , bq ) } .

We have ∣A∣ = p, ∣B∣ = q, and ∣A × B∣ = pq. The result follows.

Corollary 40. If A1 , A2 , . . . , An are finite sets, then ∣A1 × A2 × ⋅ ⋅ ⋅ × An ∣ = ∣A1 ∣ × ∣A2 ∣ × ⋅ ⋅ ⋅ × ∣An ∣.

Proof. By induction (details omitted).


Theorem 43. (IEP.) If A and B are finite sets, then ∣A ∪ B∣ = ∣A∣ + ∣B∣ − ∣A ∩ B∣.

Proof. A ∪ B = (A ∖ (A ∩ B)) ∪ B, where A ∖ (A ∩ B) and B are disjoint. So by the AP:

(1) ∣A ∪ B∣ = ∣A ∖ (A ∩ B)∣ + ∣B∣.

Now, (A ∖ (A ∩ B)) ∪ (A ∩ B) = A, where the two sets being united are disjoint. So also
by the AP, ∣A ∖ (A ∩ B)∣ + ∣A ∩ B∣ = ∣A∣ or:

(2) ∣A ∖ (A ∩ B)∣ = ∣A∣ − ∣A ∩ B∣.

Plug (2) into (1) to get the desired result.

Corollary 41. If A1 , A2 , A3 , are finite sets, then

∣A1 ∪ A2 ∪ A3 ∣ = ∣A1 ∣ + ∣A2 ∣ + ∣A3 ∣ − ∣A1 ∩ A2 ∣ − ∣A1 ∩ A3 ∣ − ∣A2 ∩ A3 ∣ + ∣A1 ∩ A2 ∩ A3 ∣ .

Proof. Similar to the previous proof, just more tedious.

And here’s the generalisation of the IEP:

Corollary 42. If A1 , A2 , . . . , An are finite sets, then

\[
\left| \bigcup_{i=1}^{n} A_i \right| = \sum_{i=1}^{n} |A_i| - \sum_{i < j} |A_i \cap A_j| + \sum_{i < j < k} |A_i \cap A_j \cap A_k| - \dots + (-1)^{n-1} \left| \bigcap_{i=1}^{n} A_i \right|.
\]

Proof. By induction (details omitted).

Theorem 44. (CP.) If A and B are finite sets and B ⊆ A, then ∣A ∖ B∣ = ∣A∣ − ∣B∣.

Proof. B and A ∖ B are disjoint, finite sets. Moreover, since B ⊆ A, B ∪ (A ∖ B) = A. So by
the AP, ∣B∣ + ∣A ∖ B∣ = ∣A∣. Rearranging yields the desired result.

Corollary 43. If A is a finite set and B1 , B2 , . . . , Bn ⊆ A are disjoint, then

∣A ∖ (B1 ∪ B2 ∪ ⋅ ⋅ ⋅ ∪ Bn )∣ = ∣A∣ − (∣B1 ∣ + ∣B2 ∣ + ⋅ ⋅ ⋅ + ∣Bn ∣) .

Proof. By the corollary to the AP, ∣B1 ∪ B2 ∪ ⋅ ⋅ ⋅ ∪ Bn ∣ = ∣B1 ∣ + ∣B2 ∣ + ⋅ ⋅ ⋅ + ∣Bn ∣. The result then follows by the CP.
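
The four counting principles of this section (AP, MP, IEP, and CP) can be checked on small concrete sets. The Python sketch below does so using the built-in set type. The particular sets A, B, C, and D are arbitrary choices made for this illustration.

```python
from itertools import product

A = {1, 2, 3, 4}
B = {3, 4, 5}
C = {7, 8}       # disjoint from A
D = {1, 2}       # a subset of A

checks = {
    # AP: for disjoint sets, |A ∪ C| = |A| + |C|.
    "AP": A.isdisjoint(C) and len(A | C) == len(A) + len(C),
    # MP: |A × B| = |A| × |B|.
    "MP": len(set(product(A, B))) == len(A) * len(B),
    # IEP: |A ∪ B| = |A| + |B| − |A ∩ B|.
    "IEP": len(A | B) == len(A) + len(B) - len(A & B),
    # CP: if D ⊆ A, then |A ∖ D| = |A| − |D|.
    "CP": D <= A and len(A - D) == len(A) - len(D),
}
print(checks)
```

Note that the CP check relies on D ⊆ A: for sets that are not nested, such as A and B above, ∣A ∖ B∣ = 2 whereas ∣A∣ − ∣B∣ = 1.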


122.2. Circular Permutations
Consider n objects, only k of which are distinct. Let r1 , r2 , . . . , and rk be the numbers of
times the 1st, 2nd, . . . , and kth distinct objects appear. We already know from Fact 166
that the number of (linear) permutations of these n objects is

n! / (r1 !r2 ! . . . rk !) .

We also know that m distinct objects have m! (linear) permutations and (m − 1)! circular
permutations. A reasonable conjecture might thus be that the number of circular permutations
of the above n objects is

(n − 1)! / (r1 !r2 ! . . . rk !) .

The above conjecture sometimes “works” — e.g. SEE has 3!/2! = 3 (linear) permutations
and SEE indeed also has (3 − 1)!/2! = 1 circular permutation. However and unfortunately,
this conjecture is, in general, incorrect. Here are two counter-examples.

Example 1236. There are 3!/3! = 1 (linear) permutations of the three letters AAA.
If the above conjecture were true, then there ought to be (3 − 1)!/3! = 2!/3! = 1/3 circular
permutations of AAA. But this is not even an integer, so obviously it cannot be the num-
ber of circular permutations of AAA. In fact, there is also exactly 1 circular permutation
of AAA.

Example 1237. There are 6!/ (3!3!) = 20 (linear) permutations of the six letters AAABBB.
If the above conjecture were true, then there ought to be (6 − 1)!/ (3!3!) = 10/3 circular
permutations of AAABBB. But this is not even an integer, so obviously it cannot be
the number of circular permutations of AAABBB. In fact, there are exactly 4 circular
permutations of AAABBB.
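
The counts in the above examples can be confirmed by brute force. The Python sketch below lists all distinct linear arrangements of the given letters, and treats two arrangements as the same circular permutation if one is a rotation of the other (reflections are counted as different). The function name circular_permutations is an assumption made for this illustration; the approach is only practical for short words.

```python
from itertools import permutations

def circular_permutations(letters: str) -> int:
    """Count arrangements of `letters` around a circle, treating rotations as identical."""
    canonical_forms = set()
    for arrangement in set(permutations(letters)):
        # Represent each circular arrangement by its lexicographically smallest rotation.
        rotations = [arrangement[i:] + arrangement[:i] for i in range(len(arrangement))]
        canonical_forms.add(min(rotations))
    return len(canonical_forms)

for word in ("ABC", "SEE", "AAA", "AAABBB"):
    print(word, circular_permutations(word))
```

For three distinct letters such as ABC, the count agrees with the familiar (3 − 1)! = 2.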

A general solution (i.e. formula) is possible but is a bit too advanced for A-Levels.[410]

[410] See e.g. this Handbook on Combinatorics.
122.3. Probability

Proposition 14 (p. 945 above). Let S be the sample space, Σ be the corresponding
event space, and A, B be events. If the probability function P ∶ Σ → R satisfies the
Kolmogorov Axioms, then P also satisfies the following properties:
1. Complements. P(A) = 1 − P (Ac ).
2. Probability of Empty Event is Zero. P(∅) = 0.
3. Monotonicity. If B ⊆ A, then P(B) ≤ P(A).
4. Probabilities Are At Most One. P(A) ≤ 1.
5. Inclusion-Exclusion. P(A ∪ B) = P(A) + P(B) − P(A ∩ B).

Proof. (Continued from p. 945.)

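1. Complements. By definition, A ∩ A^c = ∅. And so by the Additivity Axiom,
P(A ∪ A^c) = P(A) + P(A^c). But also by definition, A ∪ A^c = S. Hence, P(A ∪ A^c) = P(S).
By the Normalisation Axiom, P(S) = 1. Thus, P(A) + P(A^c) = 1, i.e. P(A) = 1 − P(A^c).
2. Probability of Empty Event is Zero. ∅ ∩ A = ∅ and ∅ ∪ A = A. And so again by
the Additivity Axiom, P(∅ ∪ A) = P(A) = P(∅) + P(A). Thus, P(∅) = 0.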
3. Monotonicity. Suppose B ⊆ A. Then B ∩ {A ∖ B} = ∅ and B ∪ {A ∖ B} = A. Thus, by the
Additivity Axiom, P(A) = P(B) + P(A ∖ B). By the Non-Negativity Axiom, P(A ∖ B) ≥ 0. Hence,
P(B) ≤ P(A).
4. Probabilities Are At Most One. Any event A is a subset of S. And so by
Monotonicity, P(A) ≤ P(S). But by the Normalisation Axiom, P(S) = 1. Thus, P(A) ≤ 1.
5. Inclusion-Exclusion Principle. By the Additivity Axiom, P(A ∪ B) = P(A) + P(B ∖ A).
Also by the Additivity Axiom, P(A ∩ B) + P(B ∖ A) = P(B).
Altogether then, P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
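
The five properties can be sanity-checked on a small concrete example. The sketch below assumes a single fair die as the sample space (my choice of illustration, not an example from the text) and uses exact fractions. The names `S`, `P`, `A`, `B`, `C` are illustrative.

```python
from fractions import Fraction

S = frozenset(range(1, 7))  # sample space: the six faces of one fair die

def P(event):
    """Probability of an event (any subset of S) when all outcomes are equally likely."""
    return Fraction(len(event), len(S))

A = frozenset({2, 4, 6})   # "the roll is even"
B = frozenset({4, 6})      # a subset of A
C = frozenset({1, 2, 3})   # overlaps A without containing it

checks = {
    "complements": P(A) == 1 - P(S - A),
    "empty event": P(frozenset()) == 0,
    "monotonicity": B <= A and P(B) <= P(A),
    "at most one": all(P(E) <= 1 for E in (A, B, C, S)),
    "inclusion-exclusion": P(A | C) == P(A) + P(C) - P(A & C),
}
for name, ok in checks.items():
    print(name, ok)
```

A check like this illustrates the proposition on one example; it does not replace the proof.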


122.4. Random Variables

Proposition 15 (p. 981). The expectation operator E is linear. That is, if X and Y
are random variables and c is a constant, then
(a) Additivity: E[X + Y ] = E [X] + E [Y ],
(b) Homogeneity of degree 1: E[cX] = cE [X].

Proof. This proposition applies even for non-discrete random variables. But we’ll prove
this proposition only for the case where the random variables are discrete.
We’ll use the definition of the expectation of a discrete random variable. We prove (b) first.

(b) $$E[cX] = \sum_{k \in \mathrm{Range}(X)} P(X = k) \cdot (ck) = c \sum_{k \in \mathrm{Range}(X)} P(X = k) \cdot k = cE[X].$$

(a) $$\begin{aligned}
E[X + Y] &= \sum_{k \in \mathrm{Range}(X)} \sum_{l \in \mathrm{Range}(Y)} P(X = k, Y = l) \cdot (k + l) \\
&= \sum_{k \in \mathrm{Range}(X)} k \sum_{l \in \mathrm{Range}(Y)} P(X = k, Y = l) + \sum_{l \in \mathrm{Range}(Y)} l \sum_{k \in \mathrm{Range}(X)} P(X = k, Y = l) \\
&= \sum_{k \in \mathrm{Range}(X)} k P(X = k) + \sum_{l \in \mathrm{Range}(Y)} l P(Y = l) \\
&= E[X] + E[Y].
\end{aligned}$$
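
Note that the proof of (a) never uses independence. Here is a small Python sketch of that point, assuming a fair die roll X and the dependent variable Y = X² (an example of my own choosing). The helper `expect` simply evaluates the double sum from the proof over a joint probability table.

```python
from fractions import Fraction

# Joint distribution of (X, Y): X is a fair die roll and Y = X**2,
# so X and Y are clearly dependent.
joint = {(k, k * k): Fraction(1, 6) for k in range(1, 7)}

def expect(g):
    """E[g(X, Y)], computed directly from the joint probabilities."""
    return sum(p * g(k, l) for (k, l), p in joint.items())

E_X = expect(lambda k, l: k)
E_Y = expect(lambda k, l: l)
E_sum = expect(lambda k, l: k + l)
E_3X = expect(lambda k, l: 3 * k)

print(E_sum == E_X + E_Y)   # additivity
print(E_3X == 3 * E_X)      # homogeneity of degree 1
```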


Proposition 16 (p. 989). Let X and Y be independent random variables. Let c be
a constant. Then
(a) Additivity: Var[X + Y ] = Var [X] + Var [Y ],
(b) Homogeneity of degree 2: Var[cX] = c2 Var [X].

Proof. We use Fact 175 and the linearity of the expectation operator.

(b) $$V[cX] = E[(cX)^2] - (c\mu_X)^2 = c^2 E[X^2] - c^2 \mu_X^2 = c^2 \left( E[X^2] - \mu_X^2 \right) = c^2 V[X].$$

To prove (a), we’ll also use Lemma 13:

$$\begin{aligned}
V[X + Y] &= E[(X + Y)^2] - (E[X + Y])^2 \\
&= E[X^2 + Y^2 + 2XY] - (E[X] + E[Y])^2 \\
&= E[X^2] + E[Y^2] + 2E[XY] - \left( \mu_X^2 + \mu_Y^2 + 2\mu_X \mu_Y \right) \\
&= \underbrace{E[X^2] - \mu_X^2}_{V[X]} + \underbrace{E[Y^2] - \mu_Y^2}_{V[Y]} + \underbrace{2\left( E[XY] - \mu_X \mu_Y \right)}_{0}.
\end{aligned}$$

Lemma 13. If X and Y are independent random variables, then E [XY ] = E [X] E [Y ].

Proof. We prove this Lemma only for the case where X and Y are both discrete.

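$$\begin{aligned}
E[XY] &= \sum_{k} \sum_{l} P(X = k, Y = l) \cdot kl \\
&= \sum_{k} \sum_{l} P(X = k) P(Y = l) \cdot kl \qquad \text{(independence)} \\
&= \sum_{k} \left( P(X = k)\, k \sum_{l} P(Y = l) \cdot l \right) = \sum_{k} \left( P(X = k)\, k\, E[Y] \right) \\
&= E[Y] \sum_{k} P(X = k)\, k = E[Y] E[X].
\end{aligned}$$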
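
Lemma 13 and Proposition 16 can be checked exactly on a small example. The sketch below assumes two independent fair dice (my own illustrative choice); `expect` and `var` are hypothetical helper names. The last line shows why independence matters: taking Y = X, we get Var[X + X] = 4 Var[X], not 2 Var[X].

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
# Two independent fair dice: every ordered pair (k, l) has probability 1/36.
joint = {(k, l): Fraction(1, 36) for k, l in product(faces, faces)}

def expect(g):
    """E[g(X, Y)] over the joint distribution."""
    return sum(p * g(k, l) for (k, l), p in joint.items())

def var(g):
    """Var[g(X, Y)] = E[g^2] - (E[g])^2."""
    return expect(lambda k, l: g(k, l) ** 2) - expect(g) ** 2

X = lambda k, l: k
Y = lambda k, l: l

print(expect(lambda k, l: k * l) == expect(X) * expect(Y))   # Lemma 13
print(var(lambda k, l: k + l) == var(X) + var(Y))            # Proposition 16(a)
print(var(lambda k, l: 3 * k) == 9 * var(X))                 # Proposition 16(b)
print(var(lambda k, l: k + k) == 2 * var(X))                 # dependent case: additivity fails
```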


Fact 229. (a) Let X be the number of fair coin-flips until we get two consecutive heads.
Let Y be the number of fair coin-flips until we get HT consecutively. Then E [X] = µX = 6
and E [Y ] = µY = 4.
(b) Flip a fair coin n + 1 times. This gives us n pairs of consecutive coin-flips. Let A
be the proportion of these n pairs of consecutive coin-flips that are HH. Let B be the
proportion that are HT . Then E[A] = µA = 1/4 and E[B] = µB = 1/4.

Proof. (a) To find µX actually requires a clever, new trick. Let

p = E [Additional number of flips to get HH∣Last flip was T ] ,

q = E [Additional number of flips to get HH∣Last two flips were T H] .

Observe that p is the expected number of flips if we’re “restarting”. Thus, p = µX . Now,

q = P (Next flip is H) × 1 + P (Next flip is T ) × (1 + p)

= 0.5 × 1 + 0.5 × (1 + p) = 1 + 0.5p.

(Explanation: If the next flip is H, then we’ve completed HH and this took us only 1 more
flip. If instead the next flip is T , then we start all over again; we’ve already taken 1 flip
and are expected to take another p flips.) Similarly, observe that

p = P (Next flip is H) × (1 + q) + P (Next flip is T ) × (1 + p)

= 0.5 × (2 + 0.5p) + 0.5 × (1 + p) = 1.5 + 0.75p.

(Explanation: If the next flip is H, then we expect to take, in addition, another q flips. If
instead the next flip is T , then we start all over again; we’ve already taken 1 flip and are
expected to take another p flips.)
Hence, p = 6 = µX . The reasoning used above is illustrated by the probability tree below.
Let’s now find µY . Again, let

r = E [Additional number of flips to get HT ∣Last two flips were T T ] ,

s = E [Additional number of flips to get HT ∣Last flip was H] .

Observe that r is the expected number of flips if we’re “restarting”. Thus, r = µY .


Now, also observe that

s = P (Next flip is T ) × 1 + P (Next flip is H) × (1 + s)

= 0.5 × 1 + 0.5 × (1 + s) = 1 + 0.5s.

(Explanation: If the next flip is T , then we’ve completed HT and this took us only 1 more
flip. If instead the next flip is H, then we’ve already taken 1 flip and are expected to take
another s flips.)
So s = 2. Similarly, observe that

r = P (Next flip is H) × (1 + s) + P (Next flip is T ) × (1 + r)

= 0.5 × (1 + 2) + 0.5 × (1 + r) = 2 + 0.5r.

(Explanation: If the next flip is H, then we’ve already taken 1 flip and are expected to
take another s flips. If the next flip is T , then we’ve already taken 1 flip and are expected
to take another r flips.)
So r = 4 = µY .
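
The two expected waiting times can also be estimated by simulation. This is a rough Monte Carlo sketch, not an exact calculation. The function names, the number of trials and the seed are arbitrary choices of mine.

```python
import random

def flips_until(pattern, rng):
    """Flip a fair coin until the most recent flips spell `pattern`; return the number of flips."""
    recent = ""
    count = 0
    while not recent.endswith(pattern):
        # keep only as many recent flips as the pattern is long
        recent = (recent + rng.choice("HT"))[-len(pattern):]
        count += 1
    return count

def average_wait(pattern, trials=20000, seed=0):
    """Average number of flips needed over many independent repetitions."""
    rng = random.Random(seed)
    return sum(flips_until(pattern, rng) for _ in range(trials)) / trials

print(average_wait("HH"))  # should land near 6
print(average_wait("HT"))  # should land near 4
```

The estimates vary slightly with the seed, but the gap between HH and HT should be clearly visible.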

(b) Let Si be the random variable that indicates whether the ith pair of consecutive coin-
flips is HH. That is, Si = 1 if so and Si = 0 if not. Then
$$A = \frac{S_1 + S_2 + \dots + S_n}{n}.$$

And so, $$E[A] = E\left[ \frac{S_1 + S_2 + \dots + S_n}{n} \right] = \frac{1}{n} \sum_{i=1}^{n} E[S_i].$$

But E [Si ] = 1/4. Thus, E [A] = 1/4.

The proof that E [B] = 1/4 is similar.
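
For small n, part (b) can be verified exactly by listing every equally likely sequence of n + 1 flips. A minimal sketch follows; the function name `expected_proportion` and the choice n = 5 are mine.

```python
from fractions import Fraction
from itertools import product

def expected_proportion(pair, n):
    """Exact expected proportion of the n consecutive pairs that equal `pair`,
    averaging over all 2**(n+1) equally likely sequences of n + 1 flips."""
    total = Fraction(0)
    for flips in product("HT", repeat=n + 1):
        hits = sum(1 for i in range(n) if flips[i] + flips[i + 1] == pair)
        total += Fraction(hits, n)
    return total / 2 ** (n + 1)

print(expected_proportion("HH", 5))
print(expected_proportion("HT", 5))
```

So HH and HT are equally frequent as pairs, even though HH takes longer to appear for the first time.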


122.5. The Normal Distribution

Fact 230. $\displaystyle \int_{-\infty}^{\infty} e^{-x^2} \, dx = \sqrt{\pi}$.

Proof. Omitted.

Fact 180 (p. 1008). Let Z ∼ N(0, 1) and φ and Φ be its PDF and CDF.
1. Φ(∞) = 1. (As with any random variable, the area under the entire PDF is 1.)
2. φ (a) > 0, for all a ∈ R. (The PDF is positive everywhere. This has a surprising
implication: however large a is, there is always some non-zero probability that Z ≥ a.)
3. E [Z] = 0. (The mean of Z is 0.)
4. The PDF φ reaches a global maximum at the mean 0. (In fact, we can go ahead and
compute φ (0) = 1/√(2π) ≈ 0.399.)
5. Var [Z] = 1. (The variance of Z is 1.)
6. P (Z ≤ a) = P (Z < a). (We’ve already discussed this earlier. It makes no difference
whether the inequality is strict. This is because P(Z = a) = 0.)
7. The PDF φ is symmetric about the mean. This has several implications:
(a) P (Z ≥ a) = P (Z ≤ −a) = Φ(−a).
(b) Since P (Z ≥ a) = 1 − P (Z ≤ a) = 1 − Φ (a), it follows that Φ(−a) = 1 − Φ (a) or,
equivalently, Φ (a) = 1 − Φ(−a).
(c) Φ (0) = 1 − Φ (0) = 0.5.
8. P (−1 ≤ Z ≤ 1) = Φ (1) − Φ (−1) ≈ 0.6827. (There is probability 0.6827 that Z takes on
values within 1 standard deviation of the mean.)
9. P (−2 ≤ Z ≤ 2) = Φ (2) − Φ (−2) ≈ 0.9545. (There is probability 0.9545 that Z takes on
values within 2 standard deviations of the mean.)
10. P (−3 ≤ Z ≤ 3) = Φ (3) − Φ (−3) ≈ 0.9973. (There is probability 0.9973 that Z takes on
values within 3 standard deviations of the mean.)
11. The PDF φ has two points of inflexion, namely at ±1. (The points of inflexion are one
standard deviation away from the mean.)

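Proof. 1. Let $u = x/\sqrt{2}$. We have $u^2 = 0.5x^2$ and $du/dx = 1/\sqrt{2}$. And using Fact 230:

$$\Phi(\infty) = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} e^{-0.5x^2} \, dx = \frac{1}{\sqrt{2\pi}} \int_{x=-\infty}^{x=\infty} e^{-0.5x^2} \sqrt{2} \, \frac{du}{dx} \, dx = \frac{1}{\sqrt{\pi}} \int_{u=-\infty}^{u=\infty} e^{-u^2} \, du = \frac{1}{\sqrt{\pi}} \sqrt{\pi} = 1.$$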

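2. Obvious.

3. $$E[Z] = \int_{-\infty}^{\infty} x \phi(x) \, dx = \frac{-1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \left( -x e^{-0.5x^2} \right) dx = \frac{-1}{\sqrt{2\pi}} \left[ e^{-0.5x^2} \right]_{-\infty}^{\infty} = \frac{-1}{\sqrt{2\pi}} [0 - 0] = 0.$$

4. $$\frac{d}{da} \phi(a) = \frac{d}{da} \frac{1}{\sqrt{2\pi}} e^{-0.5a^2} = \frac{-a}{\sqrt{2\pi}} e^{-0.5a^2} \; \begin{cases} > 0, & \text{if } a < 0, \\ = 0, & \text{if } a = 0, \\ < 0, & \text{if } a > 0. \end{cases}$$

φ is continuous, increasing for a < 0 and decreasing for a > 0. Thus, φ reaches a global
maximum at 0. By plugging in a = 0, we can compute this global maximum value to be
φ (0) = 1/√(2π) ≈ 0.399.

5. Integrating by parts and then using item 1:

$$\begin{aligned}
V[Z] &= \int_{-\infty}^{\infty} (x - 0)^2 \phi(x) \, dx = \int_{-\infty}^{\infty} x^2 \frac{1}{\sqrt{2\pi}} e^{-0.5x^2} \, dx = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \underbrace{x}_{u} \, x e^{-0.5x^2} \, dx \\
&= \frac{1}{\sqrt{2\pi}} \left( \left[ -x e^{-0.5x^2} \right]_{-\infty}^{\infty} - \int_{-\infty}^{\infty} \left( -e^{-0.5x^2} \right) dx \right) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-0.5x^2} \, dx = 1.
\end{aligned}$$

6. By the Additivity Axiom, P (Z ≤ a) = P (Z < a or Z = a) = P (Z < a) + P (Z = a) = P (Z < a) + 0
= P (Z < a), as desired.

7. Clearly, $\phi(a) = e^{-0.5a^2}/\sqrt{2\pi} = e^{-0.5(-a)^2}/\sqrt{2\pi} = \phi(-a)$ for all a ∈ R. Thus, φ is symmetric
about the vertical axis x = 0, which is also the mean.

7(a). Using the substitution u = −x, we have du/dx = −1 and

$$P(Z \geq a) = \int_{x=a}^{x=\infty} \frac{e^{-0.5x^2}}{\sqrt{2\pi}} \, dx = \int_{u=-a}^{u=-\infty} \frac{-e^{-0.5u^2}}{\sqrt{2\pi}} \, du = \int_{u=-\infty}^{u=-a} \frac{e^{-0.5u^2}}{\sqrt{2\pi}} \, du = P(Z \leq -a) = \Phi(-a).$$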

7(b) and 7(c). Obvious.
8, 9, and 10. These can be computed numerically, using a computer.
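
One way to carry out that numerical computation is sketched below. It relies on the standard identity Φ(a) = (1 + erf(a/√2))/2, where erf is the error function provided by Python’s `math` module. The function name `Phi` is just a label matching the text’s notation.

```python
from math import erf, sqrt

def Phi(a):
    """CDF of the standard normal, via Phi(a) = (1 + erf(a / sqrt(2))) / 2."""
    return 0.5 * (1.0 + erf(a / sqrt(2.0)))

for k in (1, 2, 3):
    # probability that Z lies within k standard deviations of the mean
    print(k, round(Phi(k) - Phi(-k), 4))
```

The three printed values, rounded to four decimal places, are 0.6827, 0.9545 and 0.9973, matching items 8 to 10.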

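11. $$\frac{d^2}{da^2} \phi(a) = \frac{d}{da} \frac{-a}{\sqrt{2\pi}} e^{-0.5a^2} = \frac{1}{\sqrt{2\pi}} e^{-0.5a^2} \left( a^2 - 1 \right) \; \begin{cases} > 0, & \text{if } a < -1, \\ = 0, & \text{if } a = -1, \\ < 0, & \text{if } -1 < a < 1, \\ = 0, & \text{if } a = 1, \\ > 0, & \text{if } a > 1. \end{cases}$$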

Hence, ±1 are the only two points of inflexion since φ changes concavity only here.


Theorem 45. Let a, b ∈ R be constants with a ≠ 0 and X be a continuous random variable
with PDF fX and CDF FX . Let Y = aX + b. Then
$$f_Y(c) = \frac{1}{|a|} f_X\left( \frac{c - b}{a} \right).$$

Proof. FY (c) = P (Y ≤ c) = P (aX + b ≤ c) = P (aX ≤ c − b).

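Case #1. If a > 0, then $F_Y(c) = \dots = P(aX \leq c - b) = P\left( X \leq \frac{c-b}{a} \right) = F_X\left( \frac{c-b}{a} \right)$.
Now differentiate with respect to c:

$$f_Y(c) = \frac{d}{dc} F_Y(c) = \frac{d}{dc} F_X\left( \frac{c-b}{a} \right) = \frac{1}{a} f_X\left( \frac{c-b}{a} \right) = \frac{1}{|a|} f_X\left( \frac{c-b}{a} \right).$$

Case #2. If a < 0, then $F_Y(c) = \dots = P(aX \leq c - b) = P\left( X \geq \frac{c-b}{a} \right) = 1 - F_X\left( \frac{c-b}{a} \right)$.
Now differentiate with respect to c:

$$f_Y(c) = \frac{d}{dc} F_Y(c) = \frac{d}{dc} \left[ 1 - F_X\left( \frac{c-b}{a} \right) \right] = -\frac{1}{a} f_X\left( \frac{c-b}{a} \right) = \frac{1}{|a|} f_X\left( \frac{c-b}{a} \right).$$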

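Fact 181 (p. 1018). Let X ∼ N (µ, σ²) and a, b ∈ R be constants. Then aX + b ∼ N (aµ + b, a²σ²).

Proof. By Theorem 45, the PDF of aX + b is given by

$$f_{aX+b}(c) = \frac{1}{|a|} f_X\left( \frac{c-b}{a} \right) = \frac{1}{|a|} \frac{1}{\sigma\sqrt{2\pi}} e^{-0.5 \left( \frac{\frac{c-b}{a} - \mu}{\sigma} \right)^2} = \frac{1}{|a| \sigma \sqrt{2\pi}} e^{-0.5 \left[ \frac{c - (a\mu + b)}{a\sigma} \right]^2}.$$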


But this lattermost expression is indeed the PDF of the random variable with distribution
N (aµ + b, a2 σ 2 ).
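
The last step can be spot-checked numerically. The sketch below picks arbitrary values for µ, σ, a and b (with a negative, to exercise the |a|) and compares the Theorem 45 expression with the N(aµ + b, a²σ²) density at a few points. All names and numbers here are illustrative assumptions.

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu, sigma):
    """PDF of N(mu, sigma^2) evaluated at x."""
    return exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * sqrt(2 * pi))

def transformed_pdf(c, a, b, mu, sigma):
    """PDF of Y = aX + b at c, using Theorem 45, where X ~ N(mu, sigma^2)."""
    return normal_pdf((c - b) / a, mu, sigma) / abs(a)

mu, sigma, a, b = 1.0, 2.0, -3.0, 5.0
for c in (-4.0, 0.0, 2.0, 7.5):
    via_theorem = transformed_pdf(c, a, b, mu, sigma)
    direct = normal_pdf(c, a * mu + b, abs(a) * sigma)   # standard deviation is |a| * sigma
    print(c, via_theorem, direct)
```

The two columns should agree up to floating-point rounding.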


122.6. Sampling

Fact 183 (p. 1054). Let S = (X1 , X2 , . . . , Xn ) be a random sample of size n. Let X̄ be
the sample mean and S 2 be the sample variance. Let a ∈ R be a constant. Then

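$$\text{(a)} \;\; S^2 = \frac{\sum_{i=1}^{n} X_i^2 - \frac{\left[ \sum_{i=1}^{n} X_i \right]^2}{n}}{n - 1} \qquad \text{and} \qquad \text{(b)} \;\; S^2 = \frac{\sum_{i=1}^{n} (X_i - a)^2 - \frac{\left[ \sum_{i=1}^{n} (X_i - a) \right]^2}{n}}{n - 1}.$$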

Proof. This proof may look intimidating but it’s really just a bunch of tedious algebra. (I’ve
also tried to go slow with the algebra, so more steps are explicitly listed than is typical in
a proof.)
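(a) Start from the definition of the sample variance and do the algebra:

$$\begin{aligned}
S^2 &= \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} = \frac{\sum_{i=1}^{n} \left( X_i^2 + \bar{X}^2 - 2\bar{X} X_i \right)}{n - 1} = \frac{\sum_{i=1}^{n} X_i^2 + \sum_{i=1}^{n} \bar{X}^2 - \sum_{i=1}^{n} \left( 2\bar{X} X_i \right)}{n - 1} \\
&= \frac{\sum_{i=1}^{n} X_i^2 + n\bar{X}^2 - 2\bar{X} \sum_{i=1}^{n} X_i}{n - 1} = \frac{\sum_{i=1}^{n} X_i^2 + n\bar{X}^2 - 2\bar{X} \left( n\bar{X} \right)}{n - 1} = \frac{\sum_{i=1}^{n} X_i^2 - n\bar{X}^2}{n - 1} \\
&= \frac{\sum_{i=1}^{n} X_i^2 - n \left[ \frac{\sum_{i=1}^{n} X_i}{n} \right]^2}{n - 1} = \frac{\sum_{i=1}^{n} X_i^2 - \frac{\left[ \sum_{i=1}^{n} X_i \right]^2}{n}}{n - 1}.
\end{aligned}$$

(b) Start from the formula found in (a) and do the algebra:

$$\begin{aligned}
S^2 &= \frac{\sum_{i=1}^{n} X_i^2 - \frac{\left[ \sum_{i=1}^{n} X_i \right]^2}{n}}{n - 1} = \frac{\sum_{i=1}^{n} (X_i - a + a)^2 - \frac{\left[ \sum_{i=1}^{n} (X_i - a + a) \right]^2}{n}}{n - 1} \\
&= \frac{\sum_{i=1}^{n} \left[ (X_i - a)^2 + a^2 + 2(X_i - a)a \right] - \frac{\left[ \sum_{i=1}^{n} (X_i - a) + \sum_{i=1}^{n} a \right]^2}{n}}{n - 1} \\
&= \frac{\sum_{i=1}^{n} (X_i - a)^2 + \sum_{i=1}^{n} a^2 + 2a \sum_{i=1}^{n} (X_i - a) - \frac{\left[ \sum_{i=1}^{n} (X_i - a) \right]^2 + \left( \sum_{i=1}^{n} a \right)^2 + 2 \sum_{i=1}^{n} (X_i - a) \sum_{i=1}^{n} a}{n}}{n - 1} \\
&= \frac{\sum_{i=1}^{n} (X_i - a)^2 + na^2 + 2a \sum_{i=1}^{n} (X_i - a) - \frac{\left[ \sum_{i=1}^{n} (X_i - a) \right]^2 + (na)^2 + 2na \sum_{i=1}^{n} (X_i - a)}{n}}{n - 1} \\
&= \frac{\sum_{i=1}^{n} (X_i - a)^2 - \frac{\left[ \sum_{i=1}^{n} (X_i - a) \right]^2}{n}}{n - 1}.
\end{aligned}$$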
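
Both shortcut formulas can be checked against the definition on a small data set. This sketch uses exact fractions, and the data values are made up purely for illustration. `sample_variance` follows the definition and `shortcut` follows formula (b), which reduces to formula (a) when a = 0.

```python
from fractions import Fraction

def sample_variance(xs):
    """Sample variance from the definition: sum of squared deviations over n - 1."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** 2 for x in xs) / (n - 1)

def shortcut(xs, a=0):
    """Formula (b) of Fact 183; with a = 0 this is formula (a)."""
    n = len(xs)
    shifted = [x - a for x in xs]
    return (sum(d * d for d in shifted) - sum(shifted) ** 2 / n) / (n - 1)

data = [Fraction(v) for v in (3, 7, 7, 19, 24)]   # illustrative data only
print(sample_variance(data), shortcut(data), shortcut(data, a=10))
```

Choosing a close to the sample mean keeps the intermediate sums small, which is the practical reason for formula (b).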



Proposition 17 (p. 1060). Let (X1 , X2 , . . . , Xn ) be a random sample drawn from a
distribution with population mean µ and population variance σ 2 . Let X̄ be the sample
mean and S 2 be the sample variance. Then

(a) E [X̄] = µ. And (b) E [S 2 ] = σ 2 .

Proof. (a) was proven in Exercise 427. We prove only (b).

Equation $\overset{1}{=}$ is the key piece of intuition (and is formally proven below):

$$\overbrace{E\left[ (X_i - \bar{X})^2 \right]}^{\text{The degree to which } X_i \text{ varies from } \bar{X}} + \overbrace{E\left[ (\bar{X} - \mu)^2 \right]}^{\text{The degree to which } \bar{X} \text{ varies from } \mu} \overset{1}{=} \overbrace{E\left[ (X_i - \mu)^2 \right]}^{\text{The degree to which } X_i \text{ varies from } \mu}.$$

$$E\left[ (X_i - \bar{X})^2 \right] = \overbrace{E\left[ (X_i - \mu)^2 \right]}^{\text{Population variance}} - \overbrace{E\left[ (\bar{X} - \mu)^2 \right]}^{\text{Variance of sample mean}} = \sigma^2 - \frac{\sigma^2}{n} = \frac{n-1}{n} \sigma^2.$$

We’ve just shown that $(X_i - \bar{X})^2$ is a biased estimator for σ². And in turn, S² is not:

$$E\left[ S^2 \right] = E\left[ \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1} \right] = E\left[ \frac{\sum_{i=1}^{n} \frac{n}{n-1} (X_i - \bar{X})^2}{n} \right] = \frac{\sum_{i=1}^{n} E\left[ \frac{n}{n-1} (X_i - \bar{X})^2 \right]}{n} = \frac{n\sigma^2}{n} = \sigma^2.$$

As promised, here is the proof of equation $\overset{1}{=}$:

$$\begin{aligned}
E\left[ (X_i - \bar{X})^2 \right] + E\left[ (\bar{X} - \mu)^2 \right] &= E\left[ (X_i - \bar{X})^2 + (\bar{X} - \mu)^2 \right] \\
&= E\left[ \left( (X_i - \bar{X}) + (\bar{X} - \mu) \right)^2 - 2 (X_i - \bar{X})(\bar{X} - \mu) \right] \\
&= E\left[ (X_i - \mu)^2 - 2 (X_i - \bar{X})(\bar{X} - \mu) \right] \\
&= E\left[ (X_i - \mu)^2 - 2 \left( X_i \bar{X} - \mu X_i - \bar{X}^2 + \mu \bar{X} \right) \right] \\
&= E\left[ (X_i - \mu)^2 - 2 \left( X_i \bar{X} - \bar{X}^2 \right) \right] \qquad \text{(since } E[\mu X_i] = \mu^2 = E[\mu \bar{X}] \text{)} \\
&= E\left[ (X_i - \mu)^2 \right] + 2 \left\{ E\left[ \bar{X}^2 \right] - E\left[ \bar{X} X_i \right] \right\} \\
&= E\left[ (X_i - \mu)^2 \right].
\end{aligned}$$

The last equality follows because $E\left[ \bar{X} X_i \right] = \frac{\sum_{i=1}^{n} E\left[ \bar{X} X_i \right]}{n} = E\left[ \bar{X} \, \frac{\sum_{i=1}^{n} X_i}{n} \right] = E\left[ \bar{X}^2 \right]$.
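
Part (b) can be confirmed exactly on a toy population. The sketch below assumes the population is a single fair die and that samples of size n = 3 are drawn independently (with replacement), so that all 6³ samples are equally likely. Both the setting and the helper names are my own illustrative choices.

```python
from fractions import Fraction
from itertools import product

population = [Fraction(v) for v in range(1, 7)]   # one fair die
mu = sum(population) / len(population)
sigma2 = sum((x - mu) ** 2 for x in population) / len(population)   # population variance

def s2(sample):
    """Sample variance S^2 (divides by n - 1)."""
    n = len(sample)
    mean = sum(sample) / n
    return sum((x - mean) ** 2 for x in sample) / (n - 1)

def biased(sample):
    """The 'divide by n' version, for comparison."""
    n = len(sample)
    return s2(sample) * (n - 1) / n

def average_over_samples(stat, n):
    """Exact expectation of `stat` over all equally likely samples of size n."""
    samples = list(product(population, repeat=n))
    return sum(stat(s) for s in samples) / len(samples)

print(sigma2, average_over_samples(s2, 3), average_over_samples(biased, 3))
```

The first two numbers coincide, while the third is smaller by the factor (n − 1)/n.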


122.7. Null Hypothesis Significance Testing

Definition 272. The random variable Tν with Student’s t-distribution with ν degrees of
freedom has PDF f ∶ R → R given by mapping rule
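$$f(t) = \frac{\int_{0}^{\infty} x^{\frac{\nu + 1}{2} - 1} e^{-x} \, dx}{\sqrt{\nu\pi} \int_{0}^{\infty} x^{\frac{\nu}{2} - 1} e^{-x} \, dx} \left( 1 + \frac{t^2}{\nu} \right)^{-\frac{\nu + 1}{2}}.$$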
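
The two integrals in this definition are values of the Gamma function, Γ(z) = ∫₀^∞ x^(z−1) e^(−x) dx, which Python provides as `math.gamma`. Under that identification, the PDF can be evaluated as in the following sketch. The function name `t_pdf` is an illustrative choice.

```python
from math import gamma, pi, sqrt

def t_pdf(t, nu):
    """PDF of Student's t-distribution with nu degrees of freedom, evaluated at t."""
    # gamma((nu + 1) / 2) and gamma(nu / 2) stand in for the two integrals in the definition
    coeff = gamma((nu + 1) / 2) / (sqrt(nu * pi) * gamma(nu / 2))
    return coeff * (1 + t * t / nu) ** (-(nu + 1) / 2)

print(t_pdf(0.0, 1))    # with nu = 1 this is 1/pi
print(t_pdf(0.0, 30))   # already close to the standard normal's 1/sqrt(2*pi)
```

With ν = 1 the formula reduces to 1/(π(1 + t²)), which gives a convenient check.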


122.8. Calculating the Margin of Error
Let µ be the true population proportion (of votes for Dr. Chee). Say we take a random
sample of size 900.[411] Let X be the sample number of votes for Dr. Chee. We know that
X ∼ B (900, µ).
Our confidence level is 95%. So we want to find the smallest k such that

P (900µ − k ≤ X ≤ 900µ + k) ≥ 0.95.

And ±k/900 will be our margin of error.

Case #1: Perfect hindsight: µ = 9142/23570.
With perfect hindsight, we now know that µ = 9142/23570. So X ∼ B (900, 9142/23570).
We want to find the smallest k such that P (349 − k ≤ X ≤ 349 + k) ≥ 0.95,
where 900 × 9142/23570 ≈ 349. Using the “Binomial” sheet at the usual link, we have

P (349 − 28 ≤ X ≤ 349 + 28) ≈ 0.9488,

P (349 − 29 ≤ X ≤ 349 + 29) ≈ 0.9565.

Thus, k = 29. Now, 29/900 ≈ 3.2%. Thus, at a 95% confidence level, the margin of error
is ±3.2%. This is the “true” margin of error, assuming we know µ. But this assumption
defeats the point of sampling — we don’t know µ, which is why we’re doing sampling in
the first place!
What we want instead is the margin of error in the case where µ is unknown.
Case #2: Without perfect hindsight: µ unknown.
With µ unknown, a conservative interpretation would be to find the smallest k such that
for all µ, P (900µ − k ≤ X ≤ 900µ + k) ≥ 0.95.

[411] This is slightly different from what actually happened: (1) The actual random sampling was most likely
without replacement (which would change the maths slightly). (2) 100 votes were taken from each of 9
different polling stations (which would also change the maths slightly).

Observe that Var [X] = 900µ(1 − µ) is maximised at µ = 0.5. Thus, it is plausible[412] that if
k satisfies

X ∼ B (900, 0.5) ⟹ P (900 × 0.5 − k ≤ X ≤ 900 × 0.5 + k) ≥ 0.95,

then k also satisfies

X ∼ B (900, µ) ⟹ P (900µ − k ≤ X ≤ 900µ + k) ≥ 0.95.

Our problem thus boils down to finding the smallest k such that X ∼ B (900, 0.5) implies

P (450 − k ≤ X ≤ 450 + k) ≥ 0.95.

We have P (450 − 29 ≤ X ≤ 450 + 29) ≈ 0.9508,

P (450 − 28 ≤ X ≤ 450 + 28) ≈ 0.9426.

We conclude that the smallest such k is 29. Now, 29/900 ≈ 3.2%. So the margin of error
may be given as ±3.2%. This is the same as what was calculated above, which is not
surprising, since 9142/23570 ≈ 0.388 is close to 0.5.
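
Both searches for k can be reproduced without a spreadsheet. The sketch below uses exact fractions and widens the interval around a chosen centre until the binomial probability reaches the confidence level. The helper names `binom_pmf` and `smallest_k` are illustrative, and the centre 349 is the rounded value used in Case #1.

```python
from fractions import Fraction
from math import comb

def binom_pmf(n, p, x):
    """P(X = x) for X ~ B(n, p), computed exactly when p is a Fraction."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def smallest_k(n, p, centre, level=Fraction(95, 100)):
    """Smallest k with P(centre - k <= X <= centre + k) >= level, for X ~ B(n, p)."""
    total = binom_pmf(n, p, centre)
    k = 0
    while total < level:
        k += 1
        if centre - k >= 0:
            total += binom_pmf(n, p, centre - k)
        if centre + k <= n:
            total += binom_pmf(n, p, centre + k)
    return k

k_hindsight = smallest_k(900, Fraction(9142, 23570), 349)   # Case #1
k_worst = smallest_k(900, Fraction(1, 2), 450)              # Case #2
print(k_hindsight, k_worst, float(Fraction(k_worst, 900)))
```

Dividing the resulting k by 900 gives the margin of error as a proportion.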
The reader will, of course, wonder why the Elections Department stated that the margin of
error was ±4%, rather than ±3.2% as I calculated here. I am not sure myself. My guess is
that they probably don’t bother going through all the above calculations afresh each time.
Instead, each time they report a sample count, they simply read off the margin of error
from a table that looks something like this:

Sample Size Approximate Margin of Error

400 − 599 ±5%
600 − 999 ±4%
1000 − 2000 ±3%

(By the way, note that it is common to use the CLT approximation when calculating the
margin of error. I have not done so here. Instead, I’ve stuck with using the original, exact
binomial distribution.)

[412] Proving this would need a little work though.
122.9. Correlation and Linear Regression

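Fact 231. Let $x_1, x_2, \dots, x_n$ and $y_1, y_2, \dots, y_n$ be numbers. Let $\bar{x} = \sum x_i / n$ and $\bar{y} = \sum y_i / n$. Then

$$\frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} \in [-1, 1].$$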

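Proof. Let $u = (x_1 - \bar{x}, x_2 - \bar{x}, \dots, x_n - \bar{x})$ and $v = (y_1 - \bar{y}, y_2 - \bar{y}, \dots, y_n - \bar{y})$ be n-dimensional
vectors. Then

$$\frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2} \sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} = \frac{u \cdot v}{|u| \, |v|}.$$

But from what we learnt about vectors,[413] if θ is the angle between the two vectors u and v, then:

$$\cos\theta = \frac{u \cdot v}{|u| \, |v|}.$$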

Since cos θ ∈ [−1, 1], the result follows.
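
The vector view in this proof translates directly into code: centre both lists, then divide the dot product by the product of the lengths. A minimal sketch follows; the function name and the data are illustrative, and the data must not be constant, otherwise a length is zero.

```python
from math import sqrt

def correlation(xs, ys):
    """Correlation coefficient, computed as the cosine of the angle between the centred vectors."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    u = [x - x_bar for x in xs]               # centred x-values
    v = [y - y_bar for y in ys]               # centred y-values
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v)))

xs = [1, 2, 3, 4, 5]
print(correlation(xs, [2, 4, 6, 8, 10]))   # perfectly aligned vectors
print(correlation(xs, [10, 8, 6, 4, 2]))   # exactly opposite vectors
print(correlation(xs, [2, 4, 5, 4, 5]))    # somewhere in between
```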

[413] Of course, in this textbook, we’ve only shown that this is true for two- and three-dimensional vectors.
But let’s just wave our hands and say that this is also true for higher-dimensional vectors.
Fact 187. Let (x1 , x2 , . . . , xn ) and (y1 , y2 , . . . , yn ) be two ordered sets of data. The OLS
regression line of y on x is y − ȳ = b̂ (x − x̄), where

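$$\text{(i)} \quad \hat{b} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2},$$

$$\text{(ii)} \quad \hat{b} = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sum x_i^2 - n\bar{x}^2}.$$

Moreover, the regression line can also be written in the form y = â + b̂x, where b̂ is as given
above and â = ȳ − b̂x̄.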

Proof. (Continued from the proof begun on p. 1098.) Remember that the data (x1 , x2 , . . . , xn )
and (y1 , y2 , . . . , yn ) are given. Thus, we can treat all the xi s and yi s as constants. We have:

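$$\frac{\partial}{\partial \hat{a}} \sum \hat{u}_i^2 = \sum \frac{\partial}{\partial \hat{a}} \hat{u}_i^2 = \sum (2\hat{u}_i) \frac{\partial \hat{u}_i}{\partial \hat{a}} = \sum -2 \left[ y_i - (\hat{a} + \hat{b} x_i) \right].$$

$$\frac{\partial}{\partial \hat{a}} \sum \hat{u}_i^2 = 0 \iff \sum \left[ y_i - (\hat{a} + \hat{b} x_i) \right] = 0 \iff \hat{a} \overset{1}{=} \bar{y} - \hat{b}\bar{x}.$$

We also have:

$$\frac{\partial}{\partial \hat{b}} \sum \hat{u}_i^2 = \sum \frac{\partial}{\partial \hat{b}} \hat{u}_i^2 = \sum (2\hat{u}_i) \frac{\partial \hat{u}_i}{\partial \hat{b}} = \sum -2 x_i \left[ y_i - (\hat{a} + \hat{b} x_i) \right].$$

$\frac{\partial}{\partial \hat{b}} \sum \hat{u}_i^2 = 0 \iff \sum \left[ y_i - (\hat{a} + \hat{b} x_i) \right] x_i = 0$. Plugging $\overset{1}{=}$ into this last equation, we
have $\sum \left[ y_i - (\bar{y} - \hat{b}\bar{x} + \hat{b} x_i) \right] x_i = 0$. Tedious algebra yields Formula (ii):

$$\hat{b} = \frac{\sum x_i y_i - n\bar{x}\bar{y}}{\sum x_i^2 - n\bar{x}^2}.$$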

More algebra yields Formula (i).
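
As a check that Formulas (i) and (ii) agree and that the resulting line satisfies both first-order conditions, here is a small Python sketch with exact fractions. The data set and the function name `ols` are made up for illustration.

```python
from fractions import Fraction

def ols(xs, ys):
    """Return (a_hat, b_hat from Formula (i), b_hat from Formula (ii))."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    b_i = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
           / sum((x - x_bar) ** 2 for x in xs))
    b_ii = ((sum(x * y for x, y in zip(xs, ys)) - n * x_bar * y_bar)
            / (sum(x * x for x in xs) - n * x_bar ** 2))
    a_hat = y_bar - b_i * x_bar
    return a_hat, b_i, b_ii

xs = [Fraction(v) for v in (1, 2, 3, 4, 5)]   # illustrative data only
ys = [Fraction(v) for v in (2, 4, 5, 4, 5)]
a_hat, b_i, b_ii = ols(xs, ys)

residuals = [y - (a_hat + b_i * x) for x, y in zip(xs, ys)]
print(a_hat, b_i, b_ii)
print(sum(residuals), sum(x * r for x, r in zip(xs, residuals)))   # both first-order conditions
```

The second line of output shows the two sums that the first-order conditions require to be zero.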


122.10. Deriving a Linear Model from the Barometric Formula
According to NASA (1976), “U.S. Standard Atmosphere”, p. 12, eq. (33a) (PDF), the
barometric formula (relating pressure P to height h above sea level), in the case where
$L_{M,b} \neq 0$, is given by:

$$P = P_b \left[ \frac{T_{M,b}}{T_{M,b} + L_{M,b}(h - h_b)} \right]^{\frac{g_0 M}{R^* L_{M,b}}},$$

where $P_b$, $T_{M,b}$, $L_{M,b}$, $h_b$, $g_0$, $M$, $R^*$ are simply constants. Now, do the algebra:

$$\begin{aligned}
P &= P_b \left[ \frac{T_{M,b}}{T_{M,b} + L_{M,b}(h - h_b)} \right]^{\frac{g_0 M}{R^* L_{M,b}}} \\
&= P_b \left[ \frac{T_{M,b} + L_{M,b}(h - h_b)}{T_{M,b}} \right]^{-\frac{g_0 M}{R^* L_{M,b}}} \\
&= P_b \left[ 1 + \frac{L_{M,b}}{T_{M,b}}(h - h_b) \right]^{-\frac{g_0 M}{R^* L_{M,b}}},
\end{aligned}$$

$$\ln P = \ln P_b - \frac{g_0 M}{R^* L_{M,b}} \ln\left[ 1 + \frac{L_{M,b}}{T_{M,b}}(h - h_b) \right].$$

Now, for heights up to 11 000 m above sea level, $h_b$ is simply the height at sea level. That
is, $h_b = 0$ m. If we also let $a = \ln P_b$ and $b = -\frac{g_0 M}{R^* L_{M,b}}$ and get rid of the subscripts in $L_{M,b}$
and $T_{M,b}$ (just to make it neater), then we have:

$$\ln P = a + b \ln\left( 1 + \frac{L}{T} h \right).$$
For heights up to 11 000 m above sea level, L = −0.006 5 kelvin per metre is the temperature
lapse rate (the rate at which the temperature falls, as we go up in altitude; see p. 3, Table
4) and T = 288.15 kelvin is the standard sea-level temperature (also precisely equal to 15
°C; see p. 4).
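
To see the model in action, the sketch below evaluates ln P = a + b ln(1 + (L/T)h) at a few heights. The text gives only L and T. The values used here for g₀, M, R* and the sea-level pressure P_b are the commonly published standard-atmosphere constants and are assumptions on my part, so do check them against the NASA document before relying on the output.

```python
from math import exp, log

# Assumed constants (not stated in the text above):
G0 = 9.80665        # standard gravity, m/s^2
M = 0.0289644       # molar mass of air, kg/mol
R_STAR = 8.31432    # gas constant, J/(mol K)
P_B = 101325.0      # sea-level pressure, Pa
# Given in the text:
L = -0.0065         # temperature lapse rate, K/m
T = 288.15          # standard sea-level temperature, K

a = log(P_B)
b = -G0 * M / (R_STAR * L)

def pressure(h):
    """Pressure in pascals at height h metres (0 <= h <= 11000), from the linear model in logs."""
    return exp(a + b * log(1 + (L / T) * h))

for h in (0, 1000, 5000, 11000):
    print(h, round(pressure(h)))
```

The model is linear in the transformed variables ln P and ln(1 + (L/T)h), which is what makes it usable for linear regression.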


Part IX.
Answers to Exercises


123. Part 0 Answers (A Few Basics)

123.1. Ch. 1 Answers (Just To Be Clear)

(This chapter had no exercises.)

123.2. Ch. 2 Answers (PSLE Review: Division)

Tip: Click on the exercise number to return to that exercise.
A1. Long division for 8 057 ÷ 39. The dividend is 8 057 and the divisor is 39.

         1 000s   100s   10s   1s
                    2      0     6
  39 )     8        0      5     7      Explanation
           7        8      0     0      200 × 39 = 7 800
                    2      5     7      8 057 − 7 800 = 257
                                 0      0 × 39 = 0
                    2      5     7      257 − 0 = 257
                    2      3     4      6 × 39 = 234
                           2     3      257 − 234 = 23

We have: 8 057 ÷ 39 = 206 + 23/39 = 206 23/39.

The quotient is 206 and the remainder is 23.
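
The same digit-by-digit procedure can be written as a short Python sketch and checked against the built-in `divmod`. The function name `long_division` is an illustrative choice.

```python
def long_division(dividend, divisor):
    """Digit-by-digit long division; returns (quotient, remainder)."""
    quotient = 0
    remainder = 0
    for digit in str(dividend):
        remainder = remainder * 10 + int(digit)   # bring down the next digit
        q = remainder // divisor                  # next digit of the quotient
        remainder -= q * divisor                  # subtract q × divisor
        quotient = quotient * 10 + q
    return quotient, remainder

print(long_division(8057, 39))
print(divmod(8057, 39))
```

Both lines print (206, 23).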

A2. The error is in Step 5. Since x = y, we have x − y = 0. And so we cannot simply divide
both sides by x − y.
A3. The error is in the first step, when he “divide[s] both numerators by x”. The given
equation’s solutions are x = 8 and x = 0.


123.3. Ch. 3 Answers (Logic)
A4. Remember: P AND Q is false if either P or Q is false, while P OR Q is true if either
P or Q is true.
In this exercise, every conjunction (AND statement) is false because (at least) one of the two
statements forming the conjunction is false. In contrast, every disjunction (OR statement)
is true because (at least) one of the two statements forming the disjunction is true.

(a) B AND C: “Germany is in Asia AND 1 + 1 = 2.” ✗
(b) A AND D: “Germany is in Europe AND 1 + 1 = 3.” ✗
(c) C AND D: “1 + 1 = 2 AND 1 + 1 = 3.” ✗
(d) B OR C: “Germany is in Asia OR 1 + 1 = 2.” ✓
(e) A OR D: “Germany is in Europe OR 1 + 1 = 3.” ✓
(f) C OR D: “1 + 1 = 2 OR 1 + 1 = 3.” ✓

A5. NOT-E: “It’s not raining.” NOT-F : “The grass is not wet.” NOT-G: “I’m not
sleeping.” NOT-H: “My eyes are not shut.”
A6(a) If x = 0.5, then O is true while N is false. Thus, we say that O and N are not
equivalent and write O ⇎ N.
(b) If x = −3, then γ is true while α is false. Thus, we say that γ and α are not equivalent
and write γ ⇎ α.
A7(a) NOT- (B AND C) is true. There are two ways to see this:
• Since B AND C is false, its negation NOT- (B AND C) must be true.
• By Fact 1, NOT- (B AND C) is equivalent to NOT-B OR NOT-C: “Germany is not in
Asia OR 1 + 1 ≠ 2”. Which is true because “Germany is not in Asia” (NOT-B) is true.
(b) NOT- (A AND D) is true. There are two ways to see this:
• Since A AND D is false, its negation NOT- (A AND D) must be true.
• By Fact 1, NOT- (A AND D) is equivalent to NOT-A OR NOT-D: “Germany is not in
Europe OR 1 + 1 ≠ 3”. Which is true because “1 + 1 ≠ 3” (NOT-D) is true.
(c) NOT- (B AND D) is true. There are two ways to see this:
• Since B AND D is false, its negation NOT- (B AND D) must be true.
• By Fact 1, NOT- (B AND D) is equivalent to NOT-B OR NOT-D: “Germany is not in
Asia OR 1 + 1 ≠ 3”. Which is true because “Germany is not in Asia” (NOT-B) is true.
Indeed, “1 + 1 ≠ 3” (NOT-D) is also true.


A8(a) NOT- (B OR C) is false. There are two ways to see this:
• Since B OR C is true, its negation NOT- (B OR C) must be false.
• By Fact 2, NOT- (B OR C) is equivalent to NOT-B AND NOT-C: “Germany is not in
Asia AND 1 + 1 ≠ 2”. Which is false because “1 + 1 ≠ 2” (NOT-C) is false.
(b) NOT- (A OR D) is false. There are two ways to see this:
• Since A OR D is true, its negation NOT- (A OR D) must be false.
• By Fact 2, NOT- (A OR D) is equivalent to NOT-A AND NOT-D: “Germany is not in
Europe AND 1 + 1 ≠ 3”. Which is false because “Germany is not in Europe” (NOT-A) is false.
(c) NOT- (B OR D) is true. There are two ways to see this:
• Since B OR D is false, its negation NOT- (B OR D) must be true.
• By Fact 2, NOT- (B OR D) is equivalent to NOT-B AND NOT-D: “Germany is not in
Asia AND 1 + 1 ≠ 3”. Which is true because both “Germany is not in Asia” (NOT-B)
and “1 + 1 ≠ 3” (NOT-D) are true.
A9. Remember: An implication P ⟹ Q is true if either its hypothesis P is false or its
conclusion Q is true.
Here, the hypothesis of each statement (a)–(d) is false (TPL is not a genius, π is not
rational). Hence, each statement is true.
A10(a) G ⟹ H is true. The converse H ⟹ G is false:

“If my eyes are shut, then I’m sleeping.”

Or: “That my eyes are shut implies that I’m sleeping.”

Two counterexamples to H ⟹ G: (1) I may be resting my eyes. (2) I may be blinking.

In either counterexample, my eyes are shut, but I’m not sleeping.
(b) M ⟹ N is false. The converse N ⟹ M is true:

“If x > 1, then x > 0.” Or: “That x > 1 implies that x > 0.”

(c) γ ⟹ α is false. The converse α ⟹ γ is true:

“If x = 3, then x² = 9.” Or: “That x = 3 implies that x² = 9.”

A11(a) The converse is “If the Nazis won World War II (WW2), then Tin Pei Ling (TPL)
is a genius.” True because the hypothesis is false.
(b) The converse is “If the Allies won WW2, then TPL is a genius.” False because the
hypothesis is true AND the conclusion is false.
(c) The converse is “If I am the king of the world, then π is rational.” True because the
hypothesis is false.
(d) The converse is “If Lee Hsien Loong is Lee Kuan Yew’s son, then π is rational.” False
because the hypothesis is true AND the conclusion is false.


A12(a) A is true and B is false. Therefore, A ⟹ B is false.
B is false. Therefore, the converse B ⟹ A is true. (Alternate answer: A is true.
Therefore, B ⟹ A is true.)
(b) C is true. Therefore, A ⟹ C is true.
A is true. Therefore, the converse C ⟹ A is true.
(c) A is true and D is false. Therefore, A ⟹ D is false.
D is false. Therefore, the converse D ⟹ A is true. (Alternate answer: A is true.
Therefore, D ⟹ A is true.)
(d) C is true and D is false. Therefore, C ⟹ D is false.
D is false. Therefore, the converse D ⟹ C is true.
A13. Remember: An implication P ⟹ Q is true if either its hypothesis P is false or its
conclusion Q is true.

(a) If P is true, then P ⟹ Q (iii) could be true or false and Q ⟹ P (i) must be true.
(b) If P is false, then P ⟹ Q (i) must be true and Q ⟹ P (iii) could be true or false.
(c) If P ⟹ Q is true, then Q ⟹ P (iii) could be true or false.
(d) If P ⟹ Q is false, then Q ⟹ P (i) must be true.

(c) yields us an important result: Even if an implication P ⟹ Q is true, this says
nothing whatsoever about its converse. Indeed, (c) is simply a restatement of Fact 3.
(d) is especially tricky: If P ⟹ Q is false, then Q must be false and so the converse
Q ⟹ P must be true.
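
All four parts of A13 can be confirmed by listing the four possible truth assignments to P and Q. In the sketch below, `implies` encodes the rule from the reminder above (an implication is true if its hypothesis is false or its conclusion is true). The helper `possible_values` and the dictionary keys are illustrative names of mine.

```python
from itertools import product

def implies(p, q):
    """Truth value of the implication 'p implies q'."""
    return (not p) or q

rows = [
    {"P": p, "Q": q, "forward": implies(p, q), "converse": implies(q, p)}
    for p, q in product([True, False], repeat=2)
]

def possible_values(column, condition):
    """Set of truth values that `column` takes among the rows satisfying `condition`."""
    return {row[column] for row in rows if condition(row)}

print("(a)", possible_values("forward", lambda r: r["P"]),
      possible_values("converse", lambda r: r["P"]))
print("(b)", possible_values("forward", lambda r: not r["P"]),
      possible_values("converse", lambda r: not r["P"]))
print("(c)", possible_values("converse", lambda r: r["forward"]))
print("(d)", possible_values("converse", lambda r: not r["forward"]))
```

A set containing both True and False corresponds to answer (iii); a set containing only True corresponds to answer (i).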
A14. No. Again, the error here is to affirm the consequent or commit the fallacy of
the converse:

1. “P ⟹ Q.”
2. “Q.”
3. “Therefore, P .”

A15. By Definition 7, P ⟹ Q is equivalent to NOT-P OR Q. And by Fact 2, the
negation of this NOT-P OR Q is: P AND NOT-Q.


A16. By Fact 5, NOT- (K ⟹ L) is equivalent to: K AND NOT-L.

That’s “Some x is donzer and not kiki.” So the answer is (d).


A17. Given the statement “If x is German, then x is European”, its contrapositive is:

(d) “If x is not European, then x is not German”.

(a) “If x is European, then x is German.” ✗
(b) “If x is not German, then x is not European.” ✗
(c) “If x is not German, then x is European.” ✗
(d) “If x is not European, then x is not German.” ✓
(e) “If x is not European, then x is German.” ✗

By the way, (a) is the converse of the given statement, while (b) is the inverse (negate
both the hypothesis and the conclusion).
A18. O ⟹ N is false (counterexample: x = 0.5) and so by Fact 7, N ⇎ O.
A19. No two of these three statements are equivalent. X ⇎ Y because John may
have a blue NRIC and thus not be a Singapore citizen. X ⇎ Z, because John may be
a newborn Singapore citizen who hasn’t yet obtained his pink NRIC. Y ⇎ Z because
John may have a blue but not a pink NRIC.

A20. Maths/Logic Everyday English

G ⟹ H That I’m sleeping implies that my eyes are shut.
G ⟹ H I’m sleeping only if my eyes are shut.
If G, then H. If I’m sleeping, then my eyes are shut.
If G, H. If I’m sleeping, my eyes are shut.
H if G. My eyes are shut if I’m sleeping.
H when G. My eyes are shut when I’m sleeping.
H follows from G. That my eyes are shut follows from the fact that I’m sleeping.
G is sufficient for H. That I’m sleeping is sufficient for my eyes to be shut.
H is necessary for G. It is necessary that my eyes are shut, for me to be sleeping.

A21.  UA                             UN                            PA                              PN
(a)   “All donzers are kiki.”        “No donzer is kiki.”          “Some donzer is kiki.”          “Some donzer is not kiki.”
(b)   “All donzers cause cancer.”    “No donzer causes cancer.”    “Some donzer causes cancer.”    “Some donzer does not cause cancer.”
(c)   “All bachelors are married.”   “No bachelor is married.”     “Some bachelor is married.”     “Some bachelor is not married.”
(d)   “All bachelors smoke.”         “No bachelor smokes.”         “Some bachelor smokes.”         “Some bachelor does not smoke.”


A22. By Definition 5, the negation of a statement is true whenever the statement is false
and false whenever the statement is true.
(a) The UA “All animals are dogs” and the corresponding UN “No animal is a dog” are
both false. Thus, it is not always true that the UA and the UN are negations of each other.
(b) The PA “Some Korean eats dogs” and the corresponding PN “Some Korean does not
eat dogs” are both true. Thus, it is not always true that the PA and the PN are negations
of each other.

A23 Statement Negation

(a) UA: “All donzers are kiki.” PN: “Some donzer is not kiki.”
(b) UN: “No donzer is kiki.” PA: “Some donzer is kiki.”
(c) PA: “Some donzer is kiki.” UN: “All donzers are not kiki.”
(d) PN: “Some donzer is not kiki.” UA: “All donzers are kiki.”
(e) UA: “All bachelors are married.” PN: “Some bachelor is not married.”
(f) UN: “No bachelor is married.” PA: “Some bachelor is married.”
(g) PA: “Some bachelor is married.” UN: “All bachelors are not married.”
(h) PN: “Some bachelor is not married.” UA: “All bachelors are married.”
(i) UA: “All donzers cause cancer.” PN: “Some donzer does not cause cancer.”
(j) UN: “No donzer causes cancer.” PA: “Some donzer causes cancer.”
(k) PA: “Some donzer causes cancer.” UN: “All donzers do not cause cancer.”
(l) PN: “Some donzer does not cause cancer.” UA: “All donzers cause cancer.”
(m) UA: “All bachelors smoke.” PN: “Some bachelor does not smoke.”
(n) UN: “No bachelor smokes.” PA: “Some bachelor smokes.”
(o) PA: “Some bachelor smokes.” UN: “All bachelors do not smoke.”
(p) PN: “Some bachelor does not smoke.” UA: “All bachelors smoke.”

A24. “No person is LeBron James.” This is a UN with the subject “person” and the
predicate “LeBron James”.
The negation of this UN is the PA “Some person is LeBron James”. Which is true, since
there is a person who is LeBron James, namely LeBron James himself. Since the negation
is true, the commentator’s original statement is false.
Where we place the word not is crucial. The commentator wanted to negate the statement
“Everybody is LeBron James”, but placed the word not in the wrong position. He
incorrectly said, “Everybody is not LeBron James,” but should instead have said, “Not
everybody is LeBron James”. This latter statement is true and is equivalent to the PN
“Some person is not LeBron James”.
Moral of the story: Don’t say “All S are NOT-P ” when you mean “Not all S are P ”.


123.4. Ch. 4 Answers (Sets)
A25. The set of the first seven integers is C = {1, 2, 3, 4, 5, 6, 7} .
A26. There is only one even prime number, namely 2. Hence, D = {2}.
A27. X = {Lee Kuan Yew, Goh Chok Tong, Lee Hsien Loong}.
A28(a) H contains two elements, namely F and G. (You can think of H as a box that
itself contains two boxes, namely F and G.)
(b) H = {{{1, 3, 5} , {100, 200}} , {1, 3, 5, 100, 200}}. (Note that the braces go three-deep.)
A29(a) I contains three elements, namely A, B, and G. (You can think of I as a box that
itself contains three boxes, namely A, B, and G.)
(b) I = {{1, 3, 5} , {100, 200} , {1, 3, 5, 100, 200}}.
(c) Nope. H is a box that contains the two boxes F and G, while I is a box that contains
the three boxes A, B, and G. So the sets H and I are not the same.
A30(a) Los Angeles ∈ The set of the four largest cities in the US.
(b) Tharman Shanmugaratnam ∉ The set of Singapore Prime Ministers (past and present).
A31(a) Yes, because {1, 2, 3} and {3, 2, 1} both contain the exact same elements, namely
the numbers 1, 2, and 3. The order in which we write out the elements of a set doesn’t
matter. (b) No. {{1} , 2, 3} is the set containing three elements: a set containing the
number 1 and the two numbers 2 and 3. In contrast, {{3} , 2, 1} is the set containing three
elements: a set containing the number 3 and the two numbers 2 and 1. Since the two sets
contain different elements, they are not the same set.
A32. n(X) = n ({LKY, GCT, LHL}) = 3.
A33. L is the set containing the first 50 odd positive integers; hence, n(L) = 50. And M
is the set containing the first 99 negative integers; hence, n(M ) = 99.
A34. N = {102, 104, 106, 108, . . . , 996, 998}.
A35. The set W = {Apple, Apple, Apple, Banana, Banana, Apple} has only two distinct
elements. Hence, n(W ) = 2. We can rewrite the set more simply as W = {Apple, Banana}.
A36. There is only one even prime number, namely 2. Hence, C = {2} and n(C) = 1.
A37. The set of all primes is H = {2, 3, 5, 7, 11, 13, 17, 19, 23, 29, . . . }.
A38. None. All of them contain infinitely many elements and so all are infinite.
A39. The set {{{}} , ∅, {∅} , {}} contains only two elements — n (S) = 2.
Observe that {∅} = {{}} and {} = ∅. Hence, {∅} and {} are repeated elements that we
may ignore. Hence, the set {{{}} , ∅, {∅} , {}} = {{{}} , ∅} contains only two elements —
namely, (i) the set that contains the empty set; and (ii) the empty set.
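The counting in A35 and A39 can be checked mechanically. The sketch below is only an illustration: it models sets of sets with Python's `frozenset` (a choice made here, not something the text prescribes), and the names `empty` and `box_of_empty` are hypothetical.

```python
# Ordinary Python sets cannot contain other sets, so nested sets are
# modelled with frozenset (an assumption of this sketch).
empty = frozenset()                 # {} = ∅
box_of_empty = frozenset({empty})   # {∅} = {{}}

# A35: repeated elements collapse.
W = {"Apple", "Apple", "Apple", "Banana", "Banana", "Apple"}

# A39: {{{}}, ∅, {∅}, {}} written with the two names above.
S = {box_of_empty, empty, box_of_empty, empty}

print(len(W))  # 2
print(len(S))  # 2
```

Because `{∅}` and `{{}}` are the same object here, the four listed elements collapse to two, as in the answer.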
A40. The set X = [1, 1] contains the real numbers that are ≥ 1 and ≤ 1. There is only one
such number, namely the number 1. And so, n(X) = 1. We can also write X = {1}.
The set Y = (1, 1) contains the real numbers that are > 1 and < 1. There are no such
numbers. And so, n(Y ) = 0. We can also write Y = {} = ∅.
The set Z = (1, 1.01) contains the real numbers that are > 1 and < 1.01. There are infinitely
many such numbers (e.g. 1.001, 1.0001, 1.0002). And so, n(Z) = ∞.

1393, Contents

A41. R = (−∞, ∞), R+ = (0, ∞), R+0 = [0, ∞), R− = (−∞, 0), and R−0 = (−∞, 0].
A42(a) Every integer is also a rational number and a real number; hence, Z ⊆ Q, R. (b)
A rational number is also a real number; hence, Q ⊆ R. However, some rational numbers
are not integers (e.g. 1.5 is rational but is not an integer); hence, Q ⊆/ Z. (c) Some real
numbers are neither rational nor integers (e.g. π); hence, R ⊆/ Z, Q.
A43. True. The set of current Singapore Prime Minister(s) is {Lee Hsien Loong}. The
set of current Singapore Minister(s) is {Lee Hsien Loong, Tharman, Teo Chee Hean, Khaw
Boon Wan, . . . }. The latter set contains every element that is in the former set. Hence,
the former is a subset of the latter.
A44(a) False. Counterexample: Let A = {1} and B = {1, 2}. Then A ⊆ B, but A ≠ B.
(b) False. Counterexample: Let A = {3, 4} and B = {3}. Then B ⊆ A, but A ≠ B.
(c) True by definition.
(d) True by definition.
(e) False. Counterexample: same as in (a).
(f) False. Counterexample: same as in (b).
A45. Yes, the set of squares is a proper subset of the set of rectangles. All squares
are rectangles and so S ⊆ R. Moreover, some rectangles are not squares and so S ≠ R.
Altogether then, by Definition 19, S ⊂ R.
A46. No. Counterexample: if A = {1, 2} and B = {1, 2}, then A ⊆ B, but A ⊂/ B.
A47. Yes. By definition, A ⊂ B requires that A ⊆ B.
A48. True by definition.
A49(a) [1, 2] ∪ [2, 3] = [1, 3]. (b) (−∞, −3) ∪ [−16, 7) = (−∞, 7). (c) {0} ∪ Z+ = Z+0 .
A50. Observe that every square is a rectangle. Hence, “the set of all squares and all
rectangles” is itself simply “the set of all rectangles”, i.e. S ∪ R = R.
A51. All real numbers are either rational or irrational. Hence, the set of all rationals and
irrationals is itself simply “the set of all reals” or R.
A52(a) (4, 7] ∩ (6, 9) = (6, 7]. (b) [1, 2] ∩ [5, 6] = ∅. (c) (−∞, −3) ∩ [−16, 7) = [−16, −3).
A53. Observe that the only objects that are BOTH squares AND rectangles are squares.
Hence, the intersection of the set of squares and the set of rectangles is itself simply the set
of all squares, i.e. S ∩ R = S.
A54. It is the empty set ∅. This is because there is no object that is BOTH rational AND
irrational.
A55. V ∖ T = {3} and V ∖ U = {1, 2}.
A56. S = {Lee Hsien Loong, Lee Wei Ling, Lee Hsien Yang}.
T = {Lee Wei Ling, Lee Hsien Yang}.
A57. {LKY, LHL}.

1394, Contents

A58(a) R− = {x ∈ R ∶ x < 0} (b) Q− = {x ∈ Q ∶ x < 0}.
(c) Z− = {x ∈ Z ∶ x < 0}. (d) R−0 = {x ∈ R ∶ x ≤ 0}.
(e) Q−0 = {x ∈ Q ∶ x ≤ 0}. (f) Z−0 = {x ∈ Z ∶ x ≤ 0}.
(g) (a, b) = {x ∈ R ∶ a < x < b}. (h) [a, b] = {x ∈ R ∶ a ≤ x ≤ b}.
(i) (a, b] = {x ∈ R ∶ a < x ≤ b}. (j) [a, b) = {x ∈ R ∶ a ≤ x < b}.
(k) (−∞, −3) ∪ (5, ∞) = {x ∈ R ∶ x < −3 or x > 5}.
(l) (−∞, √2] ∪ (e, π) ∪ (π, ∞) = {x ∈ R ∶ x ≤ √2 or e < x < π or x > π}.
(m) (−∞, 3) ∩ (0, 7) = {x ∈ R ∶ 0 < x < 3}.
(n) The set of negative even numbers is {x ∶ x = 2k, k ∈ Z− } or more simply {2k ∶ k ∈ Z− }.
(o) The set of positive odd numbers is {x ∶ x = 2k − 1, k ∈ Z+ } or {x ∶ x = 2k + 1, k ∈ Z+0 }; or
more simply, {2k − 1 ∶ k ∈ Z+ } or {2k + 1 ∶ k ∈ Z+0 }.
(p) The set of negative odd numbers is {x ∶ x = 2k − 1, k ∈ Z−0 } or {x ∶ x = 2k + 1, k ∈ Z− }; or
more simply, {2k − 1 ∶ k ∈ Z−0 } or {2k + 1 ∶ k ∈ Z− }.
(q) {π, 4π, 7π, 10π, . . . } = {(1 + 3k) π ∶ k ∈ Z+0 }.
(r) {−2π, π, 4π, 7π, 10π, . . . } = {(1 + 3k) π ∶ k ∈ Z, k ≥ −1}.
A59(a) R ∖ Z+ = Z+0 .
(b) R ∖ (Q ∪ Z) = R ∖ Q = Q′ .
(c) [1, 6] ∖ ((3, 5) ∩ (1, 4)) = [1, 6] ∖ (3, 4) = [1, 3] ∪ [4, 6]. (The endpoints 3 and 4 are not in the open interval (3, 4) and so are not removed.)
(d) {1, 5, 9, 13, . . . } ∩ {2, 4, 6, 8, . . . } = ∅.
(e) {2, 5, 8, 11, . . . } ∩ {2, 4, 6, 8, . . . } = {2, 8, 14, 20 . . . } = {2 + 6k ∶ k ∈ Z+0 }.
(f) (0, 5] ∩ ([1, 8] ∩ [5, 9)) ′ = (0, 5] ∩ [5, 8] ′ = (0, 5).
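As a rough sanity check of A59(d) and (e), the infinite sets can be truncated at an arbitrary cutoff and intersected. The cutoff `LIMIT` and the variable names are assumptions of this sketch, and a finite check is of course not a proof.

```python
# Finite truncations of the infinite sets in A59(d) and (e).
LIMIT = 100  # arbitrary cutoff chosen for illustration

evens = set(range(2, LIMIT, 2))    # {2, 4, 6, 8, ...}
d_left = set(range(1, LIMIT, 4))   # {1, 5, 9, 13, ...}
e_left = set(range(2, LIMIT, 3))   # {2, 5, 8, 11, ...}

print(d_left & evens)              # set(): these numbers are all odd
print(sorted(e_left & evens)[:4])  # [2, 8, 14, 20]
```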

1395, Contents

123.5. Ch. 5 Answers (O-Level Review)
A60. False. If x < 0, then $\sqrt{x^2} = -x$; e.g., if x = −1, then $\sqrt{x^2} = \sqrt{(-1)^2} = \sqrt{1} = 1 \neq x$.

The statement would be true if we changed its premise: “If x ≥ 0, then $\sqrt{x^2} = x$.” ✓

It would also be true if we changed its conclusion: “If x ∈ R, then $\sqrt{x^2} = |x|$.” ✓
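A61(a)
$$\frac{5^{4x}\cdot 25^{1-x}}{5^{2x+1}+3\cdot 25^{x}+17\cdot 5^{2x}}
= \frac{5^{4x}\cdot 5^{2(1-x)}}{5^{2x+1}+3\cdot 5^{2x}+17\cdot 5^{2x}}
= \frac{5^{2+2x}}{5^{2x}\left(5^{1}+3+17\right)}
= \frac{5^{2+2x}}{5^{2x}\cdot 25}
= \frac{5^{2+2x}}{5^{2x+2}} = 1.$$

(b)
$$\sqrt{2}\,\frac{8^{x+2}-34\cdot 2^{3x}}{\left(\sqrt{8}\right)^{2x+1}}
= \sqrt{2}\,\frac{8^{x+2}-34\cdot 2^{3x}}{\left(\sqrt{8}\right)^{2x}\left(\sqrt{8}\right)^{1}}
= \sqrt{2}\,\frac{8^{x+2}-34\cdot 8^{x}}{8^{x}\sqrt{8}}
= \sqrt{2}\,\frac{8^{x}\left(8^{2}-34\right)}{8^{x}\sqrt{8}}$$
$$= \sqrt{2}\,\frac{64-34}{\sqrt{8}}
= \sqrt{2}\,\frac{30}{\sqrt{8}}
= \sqrt{2}\,\frac{30}{2\sqrt{2}}
= \sqrt{2}\,\frac{15}{\sqrt{2}} = 15.$$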
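Both simplifications in A61 claim that the expression is constant in x, which is easy to spot-check numerically. The function names `a61a` and `a61b` and the sample values of x are hypothetical choices made for this sketch.

```python
import math

def a61a(x):
    """Left-hand side of A61(a), evaluated in floating point."""
    return (5 ** (4 * x) * 25 ** (1 - x)) / (
        5 ** (2 * x + 1) + 3 * 25 ** x + 17 * 5 ** (2 * x)
    )

def a61b(x):
    """Left-hand side of A61(b), evaluated in floating point."""
    return math.sqrt(2) * (8 ** (x + 2) - 34 * 2 ** (3 * x)) / math.sqrt(8) ** (2 * x + 1)

for x in (-1.5, 0.0, 0.3, 2.0):
    # Expect roughly 1 and 15 for every x, up to rounding error.
    print(x, a61a(x), a61b(x))
```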

A62(a) $b^{(x^y)} = b^{xy}$ is false, as the following counterexample shows:

Let b = 2, x = 1, y = 2. Then $b^{(x^y)} = 2^{(1^2)} = 2^1 = 2$, but $b^{xy} = 2^{1\cdot 2} = 2^2 = 4$. Hence, $b^{(x^y)} \neq b^{xy}$.

(b) $(b^x)^y = b^{xy}$ is true and was already proven in Proposition 1(d) above.

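A63.
$$\frac{1}{\frac{y}{x}\pm\sqrt{\frac{y^2}{x^2}+1}}
= \frac{1}{\frac{y}{x}\pm\sqrt{\frac{y^2}{x^2}+1}}\cdot
  \frac{\frac{y}{x}\mp\sqrt{\frac{y^2}{x^2}+1}}{\frac{y}{x}\mp\sqrt{\frac{y^2}{x^2}+1}}
= \frac{\frac{y}{x}\mp\sqrt{\frac{y^2}{x^2}+1}}{\left(\frac{y}{x}\right)^2-\left(\frac{y^2}{x^2}+1\right)}
= \frac{\frac{y}{x}\mp\sqrt{\frac{y^2}{x^2}+1}}{-1}
= -\frac{y}{x}\pm\sqrt{\frac{y^2}{x^2}+1}.$$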

Observe that at the last step, the −1 in the denominator flips the ∓ into a ±.

1396, Contents

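A64.
$$\frac{-b\pm\sqrt{b^2-4ac}}{2a}
= \frac{-b\pm\sqrt{b^2-4ac}}{2a}\cdot\frac{-b\mp\sqrt{b^2-4ac}}{-b\mp\sqrt{b^2-4ac}}
= \frac{b^2-\left(b^2-4ac\right)}{2a\left(-b\mp\sqrt{b^2-4ac}\right)}
= \frac{4ac}{2a\left(-b\mp\sqrt{b^2-4ac}\right)}
= \frac{2c}{-b\mp\sqrt{b^2-4ac}}.$$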

A65(a) $\log_2 32 + \log_3 \frac{1}{27} = 5 - \log_3 27 = 5 - 3 = 2$.

(b) $\log_3 45 - \log_9 25 = \log_3 5 + \log_3 9 - \dfrac{\log_3 25}{\log_3 9} = \log_3 5 + 2 - \dfrac{2\log_3 5}{2} = 2$.

(c) First, $\log_{16} 768 = \log_{16}(256\times 3) = \log_{16} 256 + \log_{16} 3 = 2 + \log_{16} 3 = 2 + \frac{1}{4}\log_2 3$.

Next, $\log_2 \sqrt[4]{3} = \log_2 3^{1/4} = \frac{1}{4}\log_2 3$.

Hence, $\log_{16} 768 - \log_2 \sqrt[4]{3} = 2 + \frac{1}{4}\log_2 3 - \frac{1}{4}\log_2 3 = 2$.
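The three logarithm answers in A65 can be spot-checked in floating point. This is a minimal sketch; the variable names are hypothetical and the results are only accurate up to rounding.

```python
import math

# math.log(value, base) is the base-`base` logarithm; math.log2 is base 2.
a = math.log2(32) + math.log(1 / 27, 3)
b = math.log(45, 3) - math.log(25, 9)
c = math.log(768, 16) - math.log2(3 ** 0.25)

print(a, b, c)  # each should be very close to 2
```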

1397, Contents

124. Part I Answers (Functions and Graphs)

124.1. Ch. 6 Answers (Graphs)

A66(a) The graph of the equation y = e^x is the set {(x, y) ∶ y = e^x}.
(b) The graph of the equation y = 3x + 2 is the set {(x, y) ∶ y = 3x + 2}.
(c) The graph of the equation y = 2x² + 1 is the set {(x, y) ∶ y = 2x² + 1}.

(Figure: the graphs of (a), (b), and (c).)

1398, Contents

A67(a) The graph of y = e^x, −1 ≤ x < 2 is the set {(x, y) ∶ y = e^x, −1 ≤ x < 2}.
(b) The graph of y = 3x + 2, −1 ≤ x < 2 is the set {(x, y) ∶ y = 3x + 2, −1 ≤ x < 2}.
(c) The graph of y = 2x² + 1, −1 ≤ x < 2 is the set {(x, y) ∶ y = 2x² + 1, −1 ≤ x < 2}.

(Figure: the graphs of (a), (b), and (c) on −1 ≤ x < 2.)

1399, Contents

A67(d) (Figure: graphs of the two piecewise-defined equations below.)

y = x + 1 for x ≤ 0, and y = x − 1 for x > 0.

y = x + 1 for x < 0, and y = x − 1 for x ≥ 0.


A68(a) y = 2 has no x-intercepts, one y-intercept (0, 2), and no roots.

(b) y = x2 − 4 has two x-intercepts (−2, 0) and (2, 0), one y-intercept (0, −4), and two roots
−2 and 2.
(c) y = x2 + 2x + 1 has one x-intercept (−1, 0), one y-intercept (0, 1), and one root −1.
(d) y = x2 + 2x + 2 has no x-intercepts, one y-intercept (0, 2), and no roots.
1400, Contents
A69(a) (7 − 4) (y − 5) = (9 − 5) (x − 4) or 3y − 15 = 4x − 16 or y = (4/3)x − 1/3.
(b) (−1 − 1) (y − 2) = (−3 − 2) (x − 1) or −2y + 4 = −5x + 5 or y = (5/2)x − 1/2.
A70(a) (y − 5) = 3 (x − 4) or y = 3x − 7.
(b) (y − 2) = −2 (x − 1) or y = −2x + 4.
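The two-point calculations in A69 can be reproduced exactly with rational arithmetic. In this sketch the helper `line_through` is hypothetical, and the point pairs (4, 5), (7, 9) and (1, 2), (−1, −3) are read off from the equations written in A69 rather than from the exercise statement.

```python
from fractions import Fraction

def line_through(p, q):
    """Return (gradient, y-intercept) of the non-vertical line through p and q."""
    (x1, y1), (x2, y2) = p, q
    m = Fraction(y2 - y1, x2 - x1)
    c = y1 - m * x1
    return m, c

print(line_through((4, 5), (7, 9)))    # gradient 4/3, intercept -1/3
print(line_through((1, 2), (-1, -3)))  # gradient 5/2, intercept -1/2
```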
A72. As x approaches −π/2 from the left, y approaches ∞. Formally:

$$\lim_{x\to(-\pi/2)^-} y = \lim_{x\to(-\pi/2)^-} \tan x = \infty.$$ ◊1

As x approaches −π/2 from the right, y approaches −∞. Formally:

$$\lim_{x\to(-\pi/2)^+} y = \lim_{x\to(-\pi/2)^+} \tan x = -\infty.$$ ◊2

Both ◊1 and ◊2 say that x = −π/2 is a vertical asymptote for the graph of y = tan x.
A73. The line y = 0 (the x-axis) is a horizontal asymptote for the graph of y = 1/x:
As x approaches −∞, y approaches 0 from below: $\lim_{x\to-\infty} y = \lim_{x\to-\infty} 1/x = 0^-$.
And as x approaches ∞, y approaches 0 from above: $\lim_{x\to\infty} y = \lim_{x\to\infty} 1/x = 0^+$.

The line x = 0 (the y-axis) is a vertical asymptote for the graph of y = 1/x:
As x approaches 0 from the left, y approaches −∞: $\lim_{x\to 0^-} y = \lim_{x\to 0^-} 1/x = -\infty$.
And as x approaches 0 from the right, y approaches ∞: $\lim_{x\to 0^+} y = \lim_{x\to 0^+} 1/x = \infty$.

(Figure: graph of y = 1/x, with its vertical and horizontal asymptotes labelled.)
1401, Contents
A74. Refer to graphs and table below. For (c), for each k ∈ Z, let Ek = (2kπ, 1) and
Fk = ((2k + 1) π, −1), since cos ((2k + 1) π) = −1. Note that there are infinitely many points Ek and Fk .

(a) y = x2 + 1. y (b) y = x2 + 1, y
−1 ≤ x ≤ −1

B = (1, 2) D = (1, 2)

A = (0, 1) C = (0, 1) x

(c) y = cos x. y

E−1 = (−2π, 1) E0 = (0, 1) E1 = (2π, 1)

F1 = (3π, −1)
F−1 = (−π, −1) F0 = (π, −1)

A B C D Ek Fk G H I
GMax 3 3 3 3 (d) y = cos x, y
SGMax 3 −1 ≤ x ≤ 1
H = (0, 1)
LMax 3 3 3 3
SLMax 3 3 3 3
GMin 3 3 3 3 3
SGMin 3 3 I = (1, cos 1)
LMin 3 3 3 3 3 G = (−1, cos (−1))
SLMin 3 3 3 3 3 x
Turning 3 3 3 3 3

1402, Contents

A75(a) False. In Example 160, G = (1, 3) is a global maximum but not a strict local
maximum.
(b) True, because a point that’s at least as high as any other point must also be at least
as high as any “nearby” point.
(c) True. A point that’s higher than any other point must also be (i) at least as high as any
other point; (ii) higher than any “nearby” point; and (iii) at least as high as any “nearby”
point. Hence, a strict global maximum must also be a (i) global maximum; (ii) strict local
maximum; and (iii) local maximum.
(d) False. In Example 160, G = (1, 3) is a global maximum and also a local minimum.

124.2. Ch. 7 Answers (Reflection and Symmetry)

A76. The reflection of the point (8, 5) in the point (−2, 4) is the following point:

(2 × (−2) − 8, 2 × 4 − 5) = (−12, 3) .

A77. (2, 3) and (−2, −3).

A78. y = x2 + 2x + 2 is symmetric in the line x = −1.

x = −1

y = x2 + 2x + 2

124.3. Ch. 8 Answers (Solutions and Solution Sets)

(This chapter had no exercises.)

1403, Contents

124.4. Ch. 9 Answers (O-Level Review: The Quadratic Equation)
A79. In each case, the y-intercept is given by (0, c), the discriminant by b² − 4ac, the
x-intercepts by ((−b ± √(b² − 4ac)) /2a, 0), the line of symmetry by x = −b/2a, and the turning
point by (−b/2a, c − b²/4a).

                   (a) y = 2x² + x + 1   (b) y = −2x² + x + 1   (c) y = x² + 4x + 4

y-intercept        (0, 1)                (0, 1)                 (0, 4)

Discriminant       −7 < 0                9 > 0                  0

x-intercepts       None                  (−1/2, 0), (1, 0)      (−2, 0)

Line of symmetry   x = −1/4              x = 1/4                x = −2

Turning point      (−1/4, 7/8)           (1/4, 9/8)             (−2, 0)
The turning point in each of (a) and (c) is also the strict global minimum; and in (b), it
is also the strict global maximum.
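The formulas quoted at the start of A79 translate directly into a small helper. The sketch below is illustrative only: the function name `quadratic_features` and its dictionary keys are hypothetical, exact fractions are used for the turning point, and the roots are returned as floats.

```python
from fractions import Fraction
import math

def quadratic_features(a, b, c):
    """Features of y = ax^2 + bx + c (a != 0), using the formulas in A79."""
    a, b, c = Fraction(a), Fraction(b), Fraction(c)
    disc = b * b - 4 * a * c
    axis = -b / (2 * a)                        # line of symmetry x = -b/2a
    turning = (axis, c - b * b / (4 * a))      # (-b/2a, c - b^2/4a)
    if disc < 0:
        roots = []                             # no x-intercepts
    else:
        r = math.sqrt(disc)
        roots = sorted({float((-b - r) / (2 * a)), float((-b + r) / (2 * a))})
    return {"y_intercept": (0, c), "discriminant": disc,
            "roots": roots, "axis": axis, "turning_point": turning}

for coeffs in [(2, 1, 1), (-2, 1, 1), (1, 4, 4)]:
    print(coeffs, quadratic_features(*coeffs))
```

Using a set for the roots means a repeated root, as in (c), is reported once.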

(Figure: graphs of (a) y = 2x² + x + 1, (b) y = −2x² + x + 1, and (c) y = x² + 4x + 4, with the turning points (−1/4, 7/8), (1/4, 9/8), and (−2, 0) and the y-intercept (0, 1) marked.)


1404, Contents

124.5. Ch. 10 Answers (Functions)
A80. A function consists of three pieces: namely, the domain, the codomain, and the
mapping rule.
A81. In general, the domain can be any set; and the codomain can be any set.
A82. A function maps every element in its domain to exactly one element in its codomain.
A83(a) A function whose domain contains only real numbers is a function of a real variable.
(b) A function whose codomain contains only real numbers is a real-valued function.
A84. f (1) = 1 + 1 = 2. g(1) = 17(1) = 17. h(1) = 3^1 = 3. i(1) is undefined because
1 ∉ Z− = {−1, −2, −3, . . . }. j(1) = 17.
A85. Below, each function is written out explicitly. (i) Each function maps each element
in its domain to exactly one element in the codomain and is thus well-defined. (ii) From
below, it is clear that only b = c.

Function Domain Codomain Mapping rule

a {1, 2} {1, 2, 3, 4}. a(1) = 2, a(2) = 4
b {1, 2, 3} {1, 2, 3, 4, 5, 6} b(1) = 2, b(2) = 4, b(3) = 6
c {1, 2, 3} {1, 2, 3, 4, 5, 6} c(1) = 2, c(2) = 4, c(3) = 6
d {0, 1, 2, 3} {1, 2, 3, 4, 5, 6}. d (0) = 0, d(1) = 2, d(2) = 4, d(3) = 6

A86(a) f (3) = 3, f (π) = 3, f (3.5) = 4, f (3.88) = 4, and f (0) is undefined (because 0 ∉ R+ ).

(b) Yes. (c) Yes.
A87. This is a trick question. The answer is that yes, of course it is possible — as we
stressed earlier, the mapping rule need not “make any sense”.
We can construct exactly 2 × 2 = 4 possible functions using A = {Lion, Eagle} as the domain
and B = {Fat, Tall} as the codomain:

f (Lion) = Fat, f (Eagle) = Fat g (Lion) = Fat, g (Eagle) = Tall.

h (Lion) = Tall, h (Eagle) = Fat. i (Lion) = Tall, i (Eagle) = Tall.

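In general, given any two finite sets S and T , we can construct n (T ) ^ n (S) possible
functions using S as the domain and T as the codomain: each of the n (S) elements of S may
be mapped to any one of the n (T ) elements of T . (Here this gives 2² = 4, which happens to
equal 2 × 2.)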
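The count in A87 can be confirmed by brute force: a function is determined by choosing one image for each element of the domain. The helper `all_functions` below is a hypothetical name, and each function is represented as a Python dictionary purely for illustration.

```python
from itertools import product

def all_functions(domain, codomain):
    """List every function from domain to codomain, each as a dict."""
    domain = list(domain)
    return [dict(zip(domain, images))
            for images in product(codomain, repeat=len(domain))]

A = ["Lion", "Eagle"]
B = ["Fat", "Tall"]

print(len(all_functions(A, B)))         # 4
print(len(all_functions(range(3), B)))  # 8, i.e. 2**3
```

The second call uses a three-element domain, where the number of functions (8) differs from the product of the two set sizes (6).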
A88(a) Yes, a is well-defined because it maps every element in the domain to (ex-
actly) one element in the codomain — we have a (Cow) = Produces milk, a (Chicken) =
Produces eggs, and a (Dog) = Guards the home.
(b) No, b isn’t well-defined, because it isn’t clear what b (Dog) is.
(c) No, c isn’t well-defined, because it isn’t clear what each state’s “most splendid” city is.
(d) No, d isn’t well-defined. China has more than one city with over 10M people, while
Iceland has none. So, China would be mapped to more than one element in the codomain,
while Iceland would be mapped to none — in either case, we’d violate the requirement that
a function map every element in the domain to (exactly) one element in the codomain.

1405, Contents

A89(a)(i) Yes, a is well-defined. Every element in the domain is mapped to (exactly)
one element in the codomain — 5 to 10 ∈ Z, 6 to 12 ∈ Z, and 7 to 14 ∈ Z. (ii) Define
a ∶ {5, 6, 7} → Z by a (x) = 2x.
(b)(i) Yes, b is well-defined. Every element in the domain is mapped to (exactly) one
element in the codomain — 5 to 10 ∈ Z+ , 6 to 12 ∈ Z+ , and 7 to 14 ∈ Z+ . (ii) Define
b ∶ {5, 6, 7} → Z+ by b (x) = 2x.
(c)(i) No, c is not well-defined, because for example, the element 5 in the domain is allegedly
mapped to 10, which isn’t an element in the codomain (the set of negative integers).
(d)(i) No, d is not well-defined, because the element 5.4 in the domain is allegedly mapped
to 10.8, which isn’t an element in the codomain (the set of integers).
(e)(i) Yes, e is well-defined. Every element in the domain is mapped to (exactly) one
element in the codomain — 5.5 to 11 ∈ Z, 6 to 12 ∈ Z, and 7 to 14 ∈ Z. (ii) Define
e ∶ {5.5, 6, 7} → Z by e (x) = 2x.
(f)(i) Yes, f is well-defined. From the mapping rule and the codomain, it is unambiguous
that 3 is to be mapped to 4 ∈ {3, 4}. (ii) Define f ∶ {3} → {3, 4} by f (3) = 4.
(g)(i) Yes, g is well-defined. From the mapping rule and the codomain, it is unambiguous
that 3 and 3.1 are both to be mapped to 4 ∈ {3, 4}. (ii) Define g ∶ {3, 3.1} → {3, 4} by g(3) = 4
and g(3.1) = 4.
(h)(i) No, h is not well-defined. It is unclear whether 0 should be mapped to 3 or 4.
(i)(i) No, i is not well-defined. It is unclear what we should map 4 to, since there is no
number larger than 4 in the codomain.
(j)(i) No, j is not well-defined. It is unclear what we should map 2 to, since there is no
number smaller than 2 in the codomain.
(k)(i) Yes, k is well-defined. From the mapping rule and the codomain, it is unambiguous
that 1 is to be mapped to 1 ∈ {1}. (ii) Define k ∶ {1} → {1} by k (x) = x.
(l)(i) Yes, l is well-defined. From the mapping rule and the codomain, it is unambiguous
that 1 is to be mapped to 1 ∈ {1, 2}. (ii) Define l ∶ {1} → {1, 2} by l (x) = x.
(m)(i) No, m is not well-defined. It is unclear what 2 should be mapped to, since there is
no number that is “the same” as 2 in the codomain.
(n)(i) No, n is not well-defined, because for example, the element −1 in the domain is
allegedly mapped to √−1, which isn’t an element in the codomain (the set of reals).414
(o)(i) No, o is not well-defined, because the element 0 in the domain is allegedly mapped
to 1 ÷ 0, which isn’t an element in the codomain (the set of reals).415
(p)(i) No, p is not well-defined, because for example, the element 3 in the domain is
allegedly mapped to 4, which isn’t an element of the codomain [0, 1].
(q)(i) Yes, q is well-defined. Every element x in the domain [0, 1] is mapped to (exactly)
one element in the codomain R, namely x + 1 ∈ R. (ii) Define q ∶ [0, 1] → R by q (x) = x + 1.

As noted earlier and as we’ll learn later, √−1 is not a real but an imaginary number.
As discussed in Ch. 6, 1 ÷ 0 is not a real number. Indeed, it is not even a number; it is undefined.
1406, Contents
A90. Change the domain of n to R+0 , the set of non-negative reals.416 We then have the
function n ∶ R+0 → R that is (well-)defined by n (x) = √x.

Change the domain of o to R ∖ {0}, the set of all reals except zero. We then have the
function o ∶ R ∖ {0} → R that is (well-)defined by o (x) = 1/x.417
A91(a) Range (a) = R+0 . (b) Range (b) = {0, 1, 4, 9, 16, 25, 36, 49, . . . }.
(c) Range (c) = {0, 1, 4, 9, 16, 25, 36, 49, . . . }. (d) Range (d) = Z.
(e) Range(e) = Z. (f) Range(f ) = {100, 200}. (g) Range(g) = {100}.
d has a range that’s equal to its codomain. So does f . None of the other functions do.
A92. Only (b) “Range(f ) ⊆ Codomain(f )” must be true.

124.6. Ch. 11 Answers (An Introduction to Continuity)

(This chapter had no exercises.)

Note that the new domain R+0 is indeed the largest subset of R such that the function n is well-defined.
The addition of any negative number to this new domain would render the function ill-defined.
Note that the new domain R∖{0} is indeed the largest subset of R such that the function o is well-defined.
The addition of zero to this new domain would render the function ill-defined.
1407, Contents
124.7. Ch. 13 Answers (Arithmetic Combinations of Functions)

A94. (a) (f + g) (2) = f (2) + g(2) = 7(2) + 5 + 2³ = 27.

(b) (g − f ) (1) = g(1) − f (1) = 1³ − (7 × 1 + 5) = −11.
(c) (g ⋅ f ) (2) = g(2)f (2) = 2³ (7 × 2 + 5) = 152.
(d) (kg) (1) = 2g(1) = 2 × 1³ = 2.
(e) (g/f ) (1) = g(1)/f (1) = 1³/ (7 × 1 + 5) = 1/12.
(f) (h + i) (2) = h(2) + i(2) = 2 + 1 + √(2 + 1) = 3 + √3.
(g) (i − h) (1) = i(1) − h(1) = √(1 + 1) − (1 + 1) = √2 − 2.
(h) (i ⋅ h) (2) = i(2)h(2) = √(2 + 1) (2 + 1) = 3√3.
(i) (li) (1) = 5i(1) = 5√(1 + 1) = 5√2.
(j) (i/h) (1) = i(1)/h(1) = √(1 + 1)/ (1 + 1) = √2/2.

(k) For f + h, f − h, and f ⋅ h, the domain is simply:

Domain(f ) ∩ Domain(h) = R ∩ [−1, ∞) = [−1, ∞).

For f /h, the domain is:

Domain(f ) ∩ Domain(h) ∖ {x ∶ h (x) = 0} = [−1, ∞) ∖ {−1} = (−1, ∞).

Hence, define:

(f + h) ∶ [−1, ∞) → R by (f + h) (x) = f (x) + h (x) = 8x + 6.

(f − h) ∶ [−1, ∞) → R by (f − h) (x) = f (x) − h (x) = 6x + 4.
(f ⋅ h) ∶ [−1, ∞) → R by (f ⋅ h) (x) = f (x) ⋅ h (x) = (7x + 5) (x + 1) .
(f /h) ∶ (−1, ∞) → R by (f /h) (x) = f (x) /h (x) = (7x + 5) / (x + 1).

1408, Contents

124.8. Ch. 14 Answers (Inverse Functions)
A95(a) Given a ∶ R → R defined by a (x) = 5x, its inverse a−1 has:
1. Domain (a−1 ) = Range (a) = R;
2. Codomain (a−1 ) = Domain (a) = R;
3. Mapping rule given by: a−1 (a (x)) = x ⇐⇒ a−1 (5x) = x ⇐⇒ a−1 (y) = x (let y = 5x and
do the algebra — x = y/5) ⇐⇒ a−1 (y) = y/5.
Thus, we define a−1 ∶ R → R by a−1 (y) = y/5.

(b) Given b ∶ R → R defined by b (x) = x3 , its inverse b−1 has:

1. Domain (b−1 ) = Range (b) = R;
2. Codomain (b−1 ) = Domain (b) = R;
3. Mapping rule given by: b−1 (b (x)) = x ⇐⇒ b−1 (x³) = x ⇐⇒ b−1 (y) = x (let y = x³ and
do the algebra: x = ∛y) ⇐⇒ b−1 (y) = ∛y.

Thus, we define b−1 ∶ R → R by b−1 (y) = ∛y.

(c) Given c ∶ R+ → R defined by c (x) = ln x, its inverse c−1 has:

1. Domain (c−1 ) = Range (c) = R;
2. Codomain (c−1 ) = Domain (c) = R+ ;
3. Mapping rule given by: c−1 (c (x)) = x ⇐⇒ c−1 (ln x) = x ⇐⇒ c−1 (y) = x (let y = ln x
and do the algebra — x = ey ) ⇐⇒ c−1 (y) = ey .
Thus, we define c−1 ∶ R → R+ by c−1 (y) = ey .

(d) Given d ∶ R+ → R defined by d (x) = 1/x², its inverse d−1 has:

1. Domain (d−1 ) = Range (d) = R+ (since 1/x² takes every positive value, and only positive values, as x ranges over R+ );
2. Codomain (d−1 ) = Domain (d) = R+ ;
3. Mapping rule given by: d−1 (d (x)) = x ⇐⇒ d−1 (1/x²) = x ⇐⇒ d−1 (y) = x
(let y = 1/x² and do the algebra: x = ±1/√y) ⇐⇒ d−1 (y) = 1/√y.

(Note that in the last step, we discard −1/√y, because the codomain of d−1 is the set of
positive real numbers.)

Thus, we define d−1 ∶ R+ → R+ by d−1 (y) = 1/√y.

A96(a) No, a is not invertible because a(−1) = a(1) = 0.

(b) Let x2 > x1 ≥ 0. Then:

$$x_2^2 > x_1^2 \iff x_2^2 - 1 > x_1^2 - 1 \iff b(x_2) > b(x_1) \implies b(x_2) \neq b(x_1).$$

We’ve just shown that if x1 ≠ x2 , then b (x1 ) ≠ b (x2 ). Thus, b is invertible. The inverse
function b−1 ∶ [−1, ∞) → R+0 is defined by b−1 (y) = √(y + 1).

(c) No, c is not invertible because c(−1) = c(1) = 0.

1409, Contents
A97(a) Range(f ) = (1, 2]. So, define f −1 ∶ (1, 2] → (0, 1] by f −1 (x) = x − 1.

(Figure: graphs of f and f −1 , with the points (1, 2) on f and (2, 1) on f −1 marked.)

(b) Range(g) = (0, 2]. So, define g −1 ∶ (0, 2] → (0, 1] by g −1 (x) = x/2.

(Figure: graphs of g and g −1 , with the points (1, 2) on g and (2, 1) on g −1 marked.)

1410, Contents

A97(c) Range(h) = [1, ∞). So, define h−1 ∶ [1, ∞) → (0, 1] by h−1 (x) = 1/x.

(Figure: graphs of h and h−1 , meeting at the point (1, 1).)

(d) Range(i) = (0, 1]. So, define i−1 ∶ (0, 1] → (0, 1] by i−1 (x) = x.

(Figure: graph of i, which coincides with that of i−1 and ends at the point (1, 1).)

1411, Contents

A98(a) The element 1 in the codomain is hit twice: we have f (0) = 1/ (0 − 1)² = 1 and
f (2) = 1/ (2 − 1)² = 1. Thus, f is not one-to-one or invertible.

(b) Let x1 , x2 ∈ (1, ∞) with x2 > x1 . Then x2 − 1 > x1 − 1 > 0 ⟹ (x2 − 1)² > (x1 − 1)²
⟹ 1/ (x2 − 1)² < 1/ (x1 − 1)², so that g (x2 ) ≠ g (x1 ). Thus, g is invertible.

We can find the inverse g −1 as usual:

1. Domain (g −1 ) = Range(g) = R+ ;
2. Codomain (g −1 ) = Domain(g) = (1, ∞);
3. Mapping rule given by:

$$\begin{aligned}
g^{-1}(g(x)) = x &\iff g^{-1}\!\left(\frac{1}{(x-1)^2}\right) = x \\
&\iff g^{-1}(y) = x && \left(\text{Let } y = g(x) = \frac{1}{(x-1)^2}.\right) \\
&\iff g^{-1}(y) = 1 + \frac{1}{\sqrt{y}} && \left(\text{Do the algebra: } x = 1 \pm \frac{1}{\sqrt{y}}.\right)
\end{aligned}$$

Note that in the last step, we discard 1 − 1/√y, because the codomain of g −1 is (1, ∞).
Thus, the inverse function is g −1 ∶ R+ → (1, ∞) defined by g −1 (y) = 1 + 1/√y.

(c) Let x3 , x4 ∈ (−∞, 1) with x4 < x3 . Then x4 − 1 < x3 − 1 < 0 ⟹ (x4 − 1)² > (x3 − 1)²
⟹ 1/ (x4 − 1)² < 1/ (x3 − 1)², so that h (x4 ) ≠ h (x3 ). Thus, h is invertible.

We can find the inverse h−1 as usual:

1. Domain (h−1 ) = Range(h) = R+ ;
2. Codomain (h−1 ) = Domain(h) = (−∞, 1);
3. Mapping rule given by:

$$\begin{aligned}
h^{-1}(h(x)) = x &\iff h^{-1}\!\left(\frac{1}{(x-1)^2}\right) = x \\
&\iff h^{-1}(y) = x && \left(\text{Let } y = h(x) = \frac{1}{(x-1)^2}.\right) \\
&\iff h^{-1}(y) = 1 - \frac{1}{\sqrt{y}} && \left(\text{Do the algebra: } x = 1 \pm \frac{1}{\sqrt{y}}.\right)
\end{aligned}$$

Note that in the last step, we discard 1 + 1/√y, because the codomain of h−1 is (−∞, 1).
Thus, the inverse function is h−1 ∶ R+ → (−∞, 1) defined by h−1 (y) = 1 − 1/√y.
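A quick way to gain confidence in the two inverses found in A98 is to check the round trip x → g(x) → g⁻¹(g(x)) at a few sample points. The sketch below assumes the mapping rule 1/(x − 1)² read off from the working above; the sample points are arbitrary.

```python
import math

def g(x):        # intended for x in (1, ∞)
    return 1 / (x - 1) ** 2

def g_inv(y):    # intended for y in R+
    return 1 + 1 / math.sqrt(y)

def h(x):        # same rule as g, but intended for x in (-∞, 1)
    return 1 / (x - 1) ** 2

def h_inv(y):    # intended for y in R+
    return 1 - 1 / math.sqrt(y)

for x in (1.5, 2.0, 10.0):
    print(x, g_inv(g(x)))   # should reproduce x
for x in (-3.0, 0.0, 0.5):
    print(x, h_inv(h(x)))   # should reproduce x
```

Swapping the two inverses (applying `h_inv` after `g`) does not reproduce x, which is why the sign choice in the last step matters.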

1412, Contents

124.9. Ch. 15 Answers (Composite Functions)

A99(a)(i) Range (g³) = R ⊆ Domain(g) = R. Thus, the composite function g⁴ = gg³ exists.

(ii) Define g⁴ ∶ R → R by:

$$g^4(x) = g\!\left(\frac{3}{4}-\frac{x}{8}\right) = 1 - \frac{\frac{3}{4}-\frac{x}{8}}{2} = \frac{5}{8}+\frac{x}{16}.$$

(iii) g⁴(1) = 11/16 and g⁴(3) = 13/16.

(b) (i) Range (g⁴) = R ⊆ Domain(g) = R. Thus, the composite function g⁵ = gg⁴ exists.

(ii) Define g⁵ ∶ R → R by:

$$g^5(x) = g\!\left(\frac{5}{8}+\frac{x}{16}\right) = 1 - \frac{\frac{5}{8}+\frac{x}{16}}{2} = \frac{11}{16}-\frac{x}{32}.$$

(iii) g⁵(1) = 21/32 and g⁵(3) = 19/32.

(c) (i) Range (g⁵) = R ⊆ Domain(g) = R. Thus, the composite function g⁶ = gg⁵ exists.

(ii) Define g⁶ ∶ R → R by:

$$g^6(x) = g\!\left(\frac{11}{16}-\frac{x}{32}\right) = 1 - \frac{\frac{11}{16}-\frac{x}{32}}{2} = \frac{21}{32}+\frac{x}{64}.$$

(iii) g⁶(1) = 43/64 and g⁶(3) = 45/64.

(d) So far, we have:

$$g(x) = \frac{1}{1}-\frac{x}{2},\quad g^2(x) = \frac{1}{2}+\frac{x}{4},\quad g^3(x) = \frac{3}{4}-\frac{x}{8},$$

$$g^4(x) = \frac{5}{8}+\frac{x}{16},\quad g^5(x) = \frac{11}{16}-\frac{x}{32},\quad g^6(x) = \frac{21}{32}+\frac{x}{64}.$$

We observe that the numerators of the constant terms (1, 1, 3, 5, 11, 21) are the Jacobsthal
numbers; the denominators of the constant terms are 2^(n−1); the denominators of the x terms
are 2^n; and the sign between the two terms alternates between − and +.
Thus, we guess that gⁿ ∶ R → R is defined by:

$$g^n(x) = \frac{2^n-(-1)^n}{3\cdot 2^{n-1}} + (-1)^n\frac{x}{2^n} = \frac{2}{3} - \frac{(-1)^n}{3\cdot 2^{n-1}} + (-1)^n\frac{x}{2^n}.$$

As n → ∞, the second and third terms tend towards zero. Hence, for any x ∈ R, we have:

$$\lim_{n\to\infty} g^n(x) = \frac{2}{3}.$$
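The guessed formula in A99(d) can be tested against direct iteration. This sketch assumes g(x) = 1 − x/2, as read off from the working above; the helper names `g_iter`, `jacobsthal`, and `g_closed` are hypothetical, and exact fractions are used so that the comparison is not blurred by rounding.

```python
from fractions import Fraction

def g(x):
    return 1 - Fraction(x) / 2

def g_iter(n, x):
    """Apply g to x a total of n times."""
    for _ in range(n):
        x = g(x)
    return x

def jacobsthal(n):
    """n-th Jacobsthal number: 1, 1, 3, 5, 11, 21, ... for n = 1, 2, 3, ..."""
    return (2 ** n - (-1) ** n) // 3

def g_closed(n, x):
    """The guessed closed form for the n-fold composite of g."""
    return Fraction(jacobsthal(n), 2 ** (n - 1)) + Fraction((-1) ** n * x, 2 ** n)

print(g_iter(4, 1), g_iter(5, 3), g_iter(6, 3))  # 11/16 19/32 45/64
print(all(g_iter(n, x) == g_closed(n, x)
          for n in range(1, 21) for x in (-2, 0, 1, 3)))  # True
print(float(g_iter(60, 5)))                       # approximately 0.6667
```

Agreement for the first twenty values of n is evidence for the guess, not a proof; a proof would go by induction on n.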

1413, Contents

A100(a) Range(g) = [1, ∞) ⊆ Domain(f ) = R. Hence, the composite function f g ∶ R → R
exists and is defined by:

$$(fg)(x) = f(g(x)) = f\left(x^2+1\right) = e^{x^2+1}.$$

We have $fg(1) = e^{1^2+1} = e^2$ and $fg(2) = e^{2^2+1} = e^5$.

Also, Range(f ) = R+ ⊆ Domain(g) = R. Hence, the composite function gf ∶ R → R exists
and is defined by:

$$(gf)(x) = g(f(x)) = g\left(e^x\right) = \left(e^x\right)^2 + 1 = e^{2x}+1.$$

We have $gf(1) = e^{2\cdot 1}+1 = e^2+1$ and $gf(2) = e^{2\cdot 2}+1 = e^4+1$.

(b) Range(g) = R∖{0} ⊆ Domain(f ) = R∖{0}. Hence, the composite function f g ∶ R∖{0} →
R exists and is defined by:

$$(fg)(x) = f(g(x)) = f\left(\frac{1}{2x}\right) = \frac{1}{1/(2x)} = 2x.$$

We have f g(1) = 2 ⋅ 1 = 2 and f g(2) = 2 ⋅ 2 = 4.

Also, Range(f ) = R ∖ {0} ⊆ Domain(g) = R ∖ {0}. Hence, the composite function gf ∶
R ∖ {0} → R exists and is defined by:

$$(gf)(x) = g(f(x)) = g\left(\frac{1}{x}\right) = \frac{1}{2\cdot(1/x)} = \frac{x}{2}.$$

We have gf (1) = 1/2 and gf (2) = 2/2 = 1.

(c) Range(g) = [1, ∞) ⊆ Domain(f ) = R ∖ {0}. Hence, the composite function f g ∶ R → R
exists and is defined by:

$$(fg)(x) = f(g(x)) = f\left(x^2+1\right) = \frac{1}{x^2+1}.$$

We have f g(1) = 1/ (1² + 1) = 1/2 and f g(2) = 1/ (2² + 1) = 1/5.

Also, Range(f ) = R∖{0} ⊆ Domain(g) = R. Hence, the composite function gf ∶ R∖{0} → R
exists and is defined by:

$$(gf)(x) = g(f(x)) = g\left(\frac{1}{x}\right) = \frac{1}{x^2}+1.$$

We have gf (1) = 1/1² + 1 = 2 and gf (2) = 1/2² + 1 = 5/4.

(d) Range(g) = [−1, ∞) ⊆/ Domain(f ) = R ∖ {0}. Hence, the composite function f g ∶ R → R
does not exist.
Range(f ) = R ∖ {0} ⊆ Domain(g) = R. Hence, the composite function gf ∶ R ∖ {0} → R
exists and is defined by:

$$(gf)(x) = g(f(x)) = g\left(\frac{1}{x}\right) = \frac{1}{x^2}-1.$$

We have gf (1) = 1/1² − 1 = 0 and gf (2) = 1/2² − 1 = −3/4.
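The composites in A100 can be spot-checked by building them from their parts. The sketch below assumes the mapping rules read off from the working (f(x) = e^x or 1/x, and g(x) = x² + 1, 1/(2x), or x² − 1); the helper `compose` and the variable names are hypothetical.

```python
import math

def compose(outer, inner):
    """Return the function x -> outer(inner(x))."""
    return lambda x: outer(inner(x))

recip = lambda x: 1 / x                  # f in parts (b) to (d)

# (a) f(x) = e^x, g(x) = x^2 + 1
square_plus_one = lambda x: x ** 2 + 1
fg_a = compose(math.exp, square_plus_one)
gf_a = compose(square_plus_one, math.exp)

# (b) f(x) = 1/x, g(x) = 1/(2x)
half_recip = lambda x: 1 / (2 * x)
fg_b = compose(recip, half_recip)
gf_b = compose(half_recip, recip)

# (d) f(x) = 1/x, g(x) = x^2 - 1: g(1) = 0 is not in Domain(f)
g_d = lambda x: x ** 2 - 1
gf_d = compose(g_d, recip)
try:
    compose(recip, g_d)(1)
    fg_d_defined_at_1 = True
except ZeroDivisionError:
    fg_d_defined_at_1 = False

print(fg_b(2), gf_b(2), gf_d(2), fg_d_defined_at_1)  # 4.0 1.0 -0.75 False
```

The failure at x = 1 in part (d) is the computational counterpart of Range(g) not being a subset of Domain(f).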

1414, Contents

A101(a) Range(f ) = R+ ⊆ Domain(f ) = R. Hence, the composite function f² ∶ R → R
exists and is defined by

$$f^2(x) = f(f(x)) = f\left(e^x\right) = e^{e^x}.$$

We have $f^2(1) = e^{e^1} = e^e$ and $f^2(2) = e^{e^2}$.

(b) Range(f ) = R ⊆ Domain(f ) = R. Hence, the composite function f² ∶ R → R exists and
is defined by

f²(x) = f (f (x)) = f (3x + 2) = 3 (3x + 2) + 2 = 9x + 8.

We have f²(1) = 9 ⋅ 1 + 8 = 17 and f²(2) = 9 ⋅ 2 + 8 = 26.

(c) Range(f ) = [1, ∞) ⊆ Domain(f ) = R. Hence, the composite function f² ∶ R → R exists
and is defined by

$$f^2(x) = f(f(x)) = f\left(2x^2+1\right) = 2\left(2x^2+1\right)^2 + 1 = 2\left(4x^4+4x^2+1\right) + 1 = 8x^4+8x^2+3.$$

We have f²(1) = 8 ⋅ 1⁴ + 8 ⋅ 1² + 3 = 19 and f²(2) = 8 ⋅ 2⁴ + 8 ⋅ 2² + 3 = 128 + 32 + 3 = 163.

(d) Range(f ) = R ⊆/ Domain(f ) = R+ . Hence, the composite function f² does not
exist.

1415, Contents

124.10. Ch. 16 Answers (Transformations)
A102(a) We already graphed y = 2f (x) + 1 in the above example. Simply reflect that in
the x-axis to get the graph of y = −2f (x) − 1.

y = 2f (x) + 1 y

y = −2f (x) − 1

(b) We already graphed y = 2f (x) + 1 in the above example. Simply reflect that in the
y-axis to get the graph of y = 2f (−x) + 1.

y = 2f (x) + 1 y y = 2f (−x) + 1

1416, Contents

A102(c) We already graphed y = 2f (x + 1) in the above example. Simply reflect that in
the x-axis to get the graph of y = −2f (x + 1).

y = 2f (x + 1)
y = −2f (x + 1)

(d) We already graphed y = 2f (x + 1) in the above example. Simply reflect that in the
y-axis to get the graph of y = 2f (−x + 1).

y = 2f (−x + 1) y = 2f (x + 1) f

1417, Contents

A102(e) We already graphed y = f (2x) + 1 in the above example. Reflect that in the
x-axis to get y = −f (2x) − 1.
Then translate upwards by 2 units to get the graph of y = −f (2x) + 1.

y = −f (2x) + 1 y = f (2x) + 1

(f) We already graphed y = f (2x) + 1 in the above example. Reflect that in the y-axis to
get y = f (−2x) + 1.

y = f (−2x) + 1 y = f (2x) + 1

1418, Contents

A102(g) We already graphed y = f (2x + 1) in the above example. Reflect that in the
x-axis to get y = −f (2x + 1).

y = −f (2x + 1) y = f (2x + 1)

(h) We already graphed y = f (2x + 1) in the above example. Reflect that in the y-axis to
get y = f (−2x + 1).

y = f (−2x + 1) y = f (2x + 1) f

1419, Contents

A103(a) First compress horizontally by a factor of 2 to get y = f (2x).
Then stretch vertically by a factor of 2 to get y = 2f (2x).
Now reflect the portion below the x-axis in the x-axis to get y = ∣2f (2x)∣.

y = ∣2f (2x)∣ y

y = f (2x) y = 2f (2x)

(b) First translate 1 unit rightwards to get y = f (x − 1).

Next reflect the right portion in the y-axis to get the left portion of y = f (∣x − 1∣).
Now translate 2 units upwards to get y = f (∣x − 1∣) + 2.

y = f (∣x − 1∣) + 2 f

y = f (∣x − 1∣)

y = f (x − 1)

1420, Contents


(Figure: graphs of 1. y = 1/x, 2. y = 1/ (x − 2), 3. y = 1/ (5x − 2), 4. y = −1/ (5x − 2), and 5. y = 3 − 1/ (5x − 2).)

1. Start with the graph of y = 1/x.

2. Translate rightwards by 2 units to get y = 1/ (x − 2).
3. Compress horizontally by a factor of 5 to get y = 1/ (5x − 2).
4. Reflect in the x-axis to get y = −1/ (5x − 2).
5. Translate upwards by 3 units to get y = 3 − 1/ (5x − 2).

124.11. Ch. 17 Answers (ln, exp, and e)

(This chapter had no exercises.)

1421, Contents

124.12. Ch. 18 Answers (O-Level Review: The Derivative)
A105. f is both continuous everywhere and differentiable everywhere. ∎ g is continuous
everywhere but not differentiable everywhere. It is differentiable everywhere except at x = 0.
∎ h is neither continuous everywhere nor differentiable everywhere. It is continuous and
differentiable everywhere except at x = 0.
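A106(a) $\dfrac{dy}{dx} = 2x$, so $\left.\dfrac{dy}{dx}\right|_{x=0} = 2(0) = 0$.

(b) $\dfrac{dy}{dx} = 15x^4 - 8x + 7$, so $\left.\dfrac{dy}{dx}\right|_{x=0} = 15\cdot 0^4 - 8\cdot 0 + 7 = 7$.

(c) $\dfrac{d}{dx}\left(x^2+3x+4\right) = 2x+3$ and $\dfrac{d}{dx}\left(3x^5-4x^2+7x-2\right) = 15x^4-8x+7$. Thus, by the product rule:

$$\begin{aligned}
\frac{dy}{dx} &= (2x+3)\left(3x^5-4x^2+7x-2\right) + \left(x^2+3x+4\right)\left(15x^4-8x+7\right) \\
&= 21x^6+54x^5+60x^4-16x^3-15x^2+6x+22.
\end{aligned}$$

And so: $\left.\dfrac{dy}{dx}\right|_{x=0} = 22$.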
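The expansion in A106(c) is easy to get wrong by hand, so here is a small check using coefficient lists. The representation (constant term first) and the helper names `poly_add`, `poly_mul`, and `poly_diff` are choices made for this sketch only.

```python
from itertools import zip_longest

def poly_add(p, q):
    """Add two polynomials given as coefficient lists, constant term first."""
    return [a + b for a, b in zip_longest(p, q, fillvalue=0)]

def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def poly_diff(p):
    """Differentiate a polynomial given as a coefficient list."""
    return [k * c for k, c in enumerate(p)][1:]

u = [4, 3, 1]               # x^2 + 3x + 4
v = [-2, 7, -4, 0, 0, 3]    # 3x^5 - 4x^2 + 7x - 2

via_product_rule = poly_add(poly_mul(poly_diff(u), v), poly_mul(u, poly_diff(v)))
via_expansion = poly_diff(poly_mul(u, v))

print(via_product_rule)  # [22, 6, -15, -16, 60, 54, 21]
print(via_expansion)     # the same list
```

Reading the list from right to left gives 21x⁶ + 54x⁵ + 60x⁴ − 16x³ − 15x² + 6x + 22, and the first entry is the value of the derivative at x = 0.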
A107. Use the product rule for (a) and (b); and the quotient rule for (c)–(e).
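(a) $\dfrac{d}{dx}e^x = e^x$ and $\dfrac{d}{dx}\ln x = \dfrac{1}{x}$. Thus, by the product rule, $\dfrac{dy}{dx} = e^x\cdot\dfrac{1}{x} + e^x\ln x = e^x\left(\dfrac{1}{x}+\ln x\right)$.

(b) $\dfrac{d}{dx}x^2 = 2x$ and $\dfrac{d}{dx}\left(e^x\ln x\right) = e^x\left(\dfrac{1}{x}+\ln x\right)$. Thus, by the product rule:

$$\frac{dy}{dx} = x^2e^x\left(\frac{1}{x}+\ln x\right) + 2xe^x\ln x = xe^x\left(1+2\ln x+x\ln x\right).$$

(c) By the quotient rule: $\dfrac{dy}{dx} = \dfrac{x\,\frac{d\sin x}{dx} - \sin x\,\frac{dx}{dx}}{x^2} = \dfrac{x\cos x-\sin x}{x^2} = \dfrac{\cos x}{x} - \dfrac{\sin x}{x^2}$.

(d) By the quotient rule: $\dfrac{dy}{dx} = \dfrac{\cos x\,\frac{d\sin x}{dx} - \sin x\,\frac{d\cos x}{dx}}{\cos^2 x} = \dfrac{\cos^2 x+\sin^2 x}{\cos^2 x} = \dfrac{1}{\cos^2 x}$.

(e) By the quotient rule: $\dfrac{dy}{dx} = \dfrac{z\,\frac{d1}{dx} - 1\cdot\frac{dz}{dx}}{z^2} = -\dfrac{dz/dx}{z^2}$. By the way, this is called the Reciprocal Rule.

(f) By (e): $\dfrac{dy}{dx} = -\dfrac{d\sin x/dx}{\sin^2 x} = -\dfrac{\cos x}{\sin^2 x}$.

(g) By (e): $\dfrac{dy}{dx} = -\dfrac{d\cos x/dx}{\cos^2 x} = -\dfrac{-\sin x}{\cos^2 x} = \dfrac{\sin x}{\cos^2 x}$.

(h) By (e): $\dfrac{dy}{dx} = -\dfrac{d\tan x/dx}{\tan^2 x} = -\dfrac{1/\cos^2 x}{\sin^2 x/\cos^2 x} = -\dfrac{1}{\sin^2 x}$.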

1422, Contents

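A108(a)
$$\begin{aligned}
\frac{dy}{dx} &= \frac{d}{dx}1 + \frac{d\,[x-\ln(x+1)]^2}{dx} \\
&= 0 + \frac{d\,[x-\ln(x+1)]^2}{d\,[x-\ln(x+1)]}\cdot\frac{d\,[x-\ln(x+1)]}{dx} \\
&= 2\,[x-\ln(x+1)]\left[\frac{dx}{dx} - \frac{d\ln(x+1)}{dx}\right] \\
&= 2\,[x-\ln(x+1)]\left[1 - \frac{d\ln(x+1)}{d(x+1)}\cdot\frac{d(x+1)}{dx}\right] \\
&= 2\,[x-\ln(x+1)]\left(1 - \frac{1}{x+1}\times 1\right) \\
&= 2\,[x-\ln(x+1)]\,\frac{x}{x+1} = \frac{2x}{x+1}\,[x-\ln(x+1)].
\end{aligned}$$

So $\left.\dfrac{dy}{dx}\right|_{x=0} = 2\,[0-\ln(0+1)]\,\dfrac{0}{0+1} = 2\,[0-0]\times 0 = 0$.

(b) Let $z = 1 + [x-\ln(x+1)]^2$. Then, using the chain rule and then the quotient rule:

$$\begin{aligned}
\frac{dy}{dx} &= \frac{d\sin\frac{x}{z}}{dx} = \frac{d\sin\frac{x}{z}}{d\frac{x}{z}}\cdot\frac{d\frac{x}{z}}{dx} = \cos\frac{x}{z}\cdot\frac{d\frac{x}{z}}{dx} = \cos\frac{x}{z}\cdot\frac{z\frac{dx}{dx} - x\frac{dz}{dx}}{z^2} \\
&= \cos\frac{x}{z}\cdot\frac{z - x\left\{[x-\ln(x+1)]\frac{2x}{x+1}\right\}}{z^2} \\
&= \left(\cos\frac{x}{1+[x-\ln(x+1)]^2}\right)\frac{1+[x-\ln(x+1)]^2 - x\left\{[x-\ln(x+1)]\frac{2x}{x+1}\right\}}{\left\{1+[x-\ln(x+1)]^2\right\}^2},
\end{aligned}$$

where in the last step we simply plugged in $z = 1 + [x-\ln(x+1)]^2$. (We could do a little
more algebra to simplify a little further, but we wouldn’t get far.)

Observe that: $z\big|_{x=0} = 1 + [0-\ln(0+1)]^2 = 1$.

Thus: $\left.\dfrac{dy}{dx}\right|_{x=0} = \cos\dfrac{0}{1}\cdot\dfrac{1-0\cdot 0}{1^2} = 1$.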
A109(a) Newton’s Second Law of Motion is $F = \dfrac{d}{dt}(mv)$. (In words, force is equal to the rate of change of momentum.)
(b) By definition: $a = \dfrac{dv}{dt}$.
If mass is constant (i.e. mass is not changing over time), then $\dfrac{dm}{dt} = 0$.
Altogether, by the Product Rule: $F = \dfrac{d}{dt}(mv) = v\dfrac{dm}{dt} + m\dfrac{dv}{dt} = 0 + ma = ma$.
A110(a) $\dfrac{d}{dx}\ln(\exp x) = \dfrac{1}{\exp x} \cdot \dfrac{d}{dx}(\exp x)$.
(b) Since exp is defined to be the inverse of ln, we have $\ln(\exp x) = x$. Thus, $\dfrac{d}{dx}\ln(\exp x) = \dfrac{d}{dx}x = 1$.
(c) Putting our answers in (a) and (b) together, we have:
\[
\frac{1}{\exp x} \cdot \frac{d}{dx}(\exp x) = 1 \quad\text{or}\quad \frac{d}{dx}\exp x = \exp x.
\]

A111(a) The only turning point is (0, 1).

(b) The only turning point is (0, 1).
(c) For every k ∈ Z, the points (2kπ, 1) and ((2k + 1) π, −1) are turning points. (Hence,
there are infinitely many turning points.)
(d) The only turning point is (0, 1).
A112. At A, F , and H, the graph is decreasing. So A, F , and H are not stationary points.
Thus, they cannot be turning points either.
B and D are “kinks” at which the derivative doesn’t exist. So B and D cannot be stationary
points. They are thus not turning points either.
At C and G, the derivative is zero. Thus, C and G are stationary points. However, they
are not turning points (because the derivative isn’t changing from negative to positive or
positive to negative).
At E, the derivative is zero and changing from positive to negative. Thus, E is both a
stationary and a turning point.




Point        A  B  C  D  E  F  G  H
Stationary   ✗  ✗  ✓  ✗  ✓  ✗  ✓  ✗
Turning      ✗  ✗  ✗  ✗  ✓  ✗  ✗  ✗


124.13. Ch. 19 Answers (O-Level Review: Trigonometry)
A114. Refer to the figure below.
1. By construction, we have ∠QP R = A − B and ∣P R∣ = 1.

Thus: ∣QR∣ = sin (A − B) and ∣P Q∣ = cos (A − B).

2. ∠P T U and ∠QP T are alternate; hence, ∠P T U = ∠QP T = A. Moreover, ∣P T ∣ = cos B.

Thus: ∣P U ∣ = sin A cos B and ∣T U ∣ = cos A cos B.

3. ∠SRT is complementary to ∠ST R, which is in turn complementary to ∠P T U . Hence,

∠SRT = ∠P T U = A. Moreover, ∣RT ∣ = sin B.

Thus: ∣ST ∣ = sin A sin B and ∣RS∣ = cos A sin B.

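[Figure: the points P, Q, R, S, T, U, with ∣PR∣ = 1, ∣PT∣ = cos B, ∣RT∣ = sin B, ∣QR∣ = sin(A − B), ∣PQ∣ = cos(A − B), ∣PU∣ = sin A cos B, ∣TU∣ = cos A cos B, ∣ST∣ = sin A sin B, and ∣RS∣ = cos A sin B.]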

We now have: P Q = U T + T S or cos (A − B) = sin A sin B + cos A cos B.

And: QR = P U − RS or sin (A − B) = sin A cos B − cos A sin B.


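A115.
\[
\begin{aligned}
\tan(A \pm B) &= \frac{\sin(A \pm B)}{\cos(A \pm B)} && \text{(By definition of tangent)} \\
&= \frac{\sin A \cos B \pm \cos A \sin B}{\cos A \cos B \mp \sin A \sin B} && \text{(By Addition and Subtraction Formulae for sin and cos)} \\
&= \frac{\sin A/\cos A \pm \sin B/\cos B}{1 \mp (\sin A \sin B)/(\cos A \cos B)} && \text{(Divide by } \cos A \cos B \neq 0\text{)} \\
&= \frac{\tan A \pm \tan B}{1 \mp \tan A \tan B} && \text{(By definition of tangent).}
\end{aligned}
\]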

A116(a) $\sin 2A = \sin A \cos A + \cos A \sin A = 2 \sin A \cos A$.

(b) $\cos 2A = \cos A \cos A - \sin A \sin A = \cos^2 A - \sin^2 A = \cos^2 A - (1 - \cos^2 A) = 2\cos^2 A - 1$.
Also, $\cos 2A = \cos^2 A - \sin^2 A = (1 - \sin^2 A) - \sin^2 A = 1 - 2\sin^2 A$.
(c) $\tan 2A = \dfrac{\tan A + \tan A}{1 - \tan A \tan A} = \dfrac{2 \tan A}{1 - \tan^2 A}$.

A117(a)
\[
\begin{aligned}
\sin 3A &= \sin(A + 2A) \\
&= \sin A \cos 2A + \cos A \sin 2A && \text{(Addition Formula)} \\
&= \sin A\,(1 - 2\sin^2 A) + \cos A\,(2 \sin A \cos A) && \text{(Double-Angle Formulae)} \\
&= \sin A - 2\sin^3 A + 2 \sin A \cos^2 A \\
&= \sin A - 2\sin^3 A + 2 \sin A\,(1 - \sin^2 A) && (\sin^2 A + \cos^2 A = 1) \\
&= 3 \sin A - 4 \sin^3 A.
\end{aligned}
\]

(b)
\[
\begin{aligned}
\cos 3A &= \cos(A + 2A) \\
&= \cos A \cos 2A - \sin A \sin 2A && \text{(Addition Formula)} \\
&= \cos A\,(2\cos^2 A - 1) - \sin A\,(2 \sin A \cos A) && \text{(Double-Angle Formulae)} \\
&= 2\cos^3 A - \cos A - 2 \sin^2 A \cos A \\
&= 2\cos^3 A - \cos A - 2\,(1 - \cos^2 A) \cos A && (\sin^2 A + \cos^2 A = 1) \\
&= 4\cos^3 A - 3 \cos A.
\end{aligned}
\]

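(c)
\[
\begin{aligned}
\tan 3A &= \tan(A + 2A) \\
&= \frac{\tan A + \tan 2A}{1 - \tan A \tan 2A} && \text{(Addition Formula)} \\
&= \frac{\tan A + 2\tan A/(1 - \tan^2 A)}{1 - \tan A\,[2\tan A/(1 - \tan^2 A)]} && \text{(Double-Angle Formula)} \\
&= \frac{\tan A\,(1 - \tan^2 A) + 2\tan A}{1\,(1 - \tan^2 A) - \tan A\,(2\tan A)} && \text{(Multiply by } 1 - \tan^2 A\text{)} \\
&= \frac{3\tan A - \tan^3 A}{1 - 3\tan^2 A}.
\end{aligned}
\]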
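A quick numerical spot-check of the three Triple-Angle Formulae can catch sign slips in derivations like the ones above. The following is only a sketch: the sample angles are arbitrary choices that keep clear of the values where tan A or tan 3A is undefined, and the function name `triple_angle_errors` is made up for this illustration.

```python
import math

def triple_angle_errors(A):
    """Absolute gaps between each side of the three Triple-Angle Formulae."""
    s, c, t = math.sin(A), math.cos(A), math.tan(A)
    return (
        abs(math.sin(3 * A) - (3 * s - 4 * s**3)),
        abs(math.cos(3 * A) - (4 * c**3 - 3 * c)),
        abs(math.tan(3 * A) - (3 * t - t**3) / (1 - 3 * t**2)),
    )

# Arbitrary sample angles (in radians), away from the singularities of tan.
samples = [0.1, 0.4, 0.7, 1.0, 1.3]
worst = max(max(triple_angle_errors(A)) for A in samples)
print(worst)  # should be tiny (floating-point rounding only)
```

A check like this does not prove the identities, of course. It only confirms that they hold, up to rounding, at the angles tried.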


A118. By the Double Angle Formulae, for all A, we have:

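\[
\cos A = \cos\left(\frac{A}{2} + \frac{A}{2}\right) = 2\cos^2\frac{A}{2} - 1 = 1 - 2\sin^2\frac{A}{2}.
\]
Thus:
\[
\sin^2\frac{A}{2} = \frac{1 - \cos A}{2} \quad\text{and}\quad \cos^2\frac{A}{2} = \frac{1 + \cos A}{2}.
\]
Taking square roots, we have:
\[
\sin\frac{A}{2} = \pm\sqrt{\frac{1 - \cos A}{2}} \quad\text{and}\quad \cos\frac{A}{2} = \pm\sqrt{\frac{1 + \cos A}{2}}.
\]
Here we must be a little careful with the signs. By the mnemonic ASTC, we know that sin(A/2) is positive if A/2 is in Quadrant I or II, but negative otherwise. Thus:
\[
\sin\frac{A}{2} =
\begin{cases}
\sqrt{\dfrac{1 - \cos A}{2}} & \text{for } \dfrac{A}{2} \text{ in Quadrant I or II,} \\[2ex]
-\sqrt{\dfrac{1 - \cos A}{2}} & \text{for } \dfrac{A}{2} \text{ in Quadrant III or IV.}
\end{cases}
\]
And cos(A/2) is positive if A/2 is in Quadrant I or IV, but negative otherwise. Thus:
\[
\cos\frac{A}{2} =
\begin{cases}
\sqrt{\dfrac{1 + \cos A}{2}} & \text{for } \dfrac{A}{2} \text{ in Quadrant I or IV,} \\[2ex]
-\sqrt{\dfrac{1 + \cos A}{2}} & \text{for } \dfrac{A}{2} \text{ in Quadrant II or III.}
\end{cases}
\]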


A119. As per the hint, the key is to observe that:

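\[
P = \frac{P+Q}{2} + \frac{P-Q}{2} \quad\text{and}\quad Q = \frac{P+Q}{2} - \frac{P-Q}{2}.
\]
Now simply apply the Addition and Subtraction Formulae:
\[
\begin{aligned}
\sin P &= \sin\left(\frac{P+Q}{2} + \frac{P-Q}{2}\right) = \sin\frac{P+Q}{2}\cos\frac{P-Q}{2} + \cos\frac{P+Q}{2}\sin\frac{P-Q}{2}, \\
\sin Q &= \sin\left(\frac{P+Q}{2} - \frac{P-Q}{2}\right) = \sin\frac{P+Q}{2}\cos\frac{P-Q}{2} - \cos\frac{P+Q}{2}\sin\frac{P-Q}{2}, \\
\cos P &= \cos\left(\frac{P+Q}{2} + \frac{P-Q}{2}\right) = \cos\frac{P+Q}{2}\cos\frac{P-Q}{2} - \sin\frac{P+Q}{2}\sin\frac{P-Q}{2}, \\
\cos Q &= \cos\left(\frac{P+Q}{2} - \frac{P-Q}{2}\right) = \cos\frac{P+Q}{2}\cos\frac{P-Q}{2} + \sin\frac{P+Q}{2}\sin\frac{P-Q}{2}.
\end{aligned}
\]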

You can easily verify that the four S2P or P2S Formulae now follow.

A120. Let $2x = \dfrac{P+Q}{2}$ and $5x = \dfrac{P-Q}{2}$.

Then: $P = \dfrac{P+Q}{2} + \dfrac{P-Q}{2} = 2x + 5x = 7x$;

And: $Q = \dfrac{P+Q}{2} - \dfrac{P-Q}{2} = 2x - 5x = -3x$.

And so, by the P2S Formulae, we have:

(a) $\sin 2x \cos 5x = \dfrac{\sin 7x + \sin(-3x)}{2} = \dfrac{\sin 7x - \sin 3x}{2}$.
(b) $\cos 2x \sin 5x = \dfrac{\sin 7x - \sin(-3x)}{2} = \dfrac{\sin 7x + \sin 3x}{2}$.
(c) $\cos 2x \cos 5x = \dfrac{\cos 7x + \cos(-3x)}{2} = \dfrac{\cos 7x + \cos 3x}{2}$.
(d) $\sin 2x \sin 5x = -\dfrac{\cos 7x - \cos(-3x)}{2} = \dfrac{\cos 3x - \cos 7x}{2}$.


A122(a) Recall (p. 267) that cosine is symmetric in the y-axis. That is, for all x, cos x =
cos (−x). Thus,

− cos θ = cos (θ − π) = cos (π − θ) .

(b) Plugging in θ = cos−1 x, we have:

− cos (cos−1 x) = cos (π − cos−1 x) , where the left-hand side is simply −x.

(c) Applying cos−1 , we have: cos−1 (−x) = π − cos−1 x.

Rearranging, we have: cos−1 x + cos−1 (−x) = π.

A123(a) Recall (p. 19.5) that cosine is symmetric in the y-axis. That is, for all x,
cos x = cos (−x). Thus,

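\[
\sin\theta = \cos\left(\theta - \frac{\pi}{2}\right) = \cos\left(\frac{\pi}{2} - \theta\right).
\]

(b) Plugging in $\theta = \sin^{-1} x$, we have:
\[
\sin\left(\sin^{-1} x\right) = \cos\left(\frac{\pi}{2} - \sin^{-1} x\right),
\]
where the left-hand side is simply x.

(c) Applying $\cos^{-1}$, we have: $\cos^{-1} x = \dfrac{\pi}{2} - \sin^{-1} x$.

Rearranging, we have: $\sin^{-1} x + \cos^{-1} x = \dfrac{\pi}{2}$.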


124.14. Ch. 21 Answers (Polynomials)
A124(a) In the expression (16x + 3)/(5x − 2), the dividend is 16x + 3 and the divisor is 5x − 2.
Long division:

    Terms:          x¹       x⁰
    Quotient:                3.2
    5x − 2 )        16x      +3
                    16x      −6.4
    Remainder:               9.4

The quotient is 3.2 and 9.4 is the remainder. We have:
\[
\frac{16x + 3}{5x - 2} = 3.2 + \frac{9.4}{5x - 2}.
\]
(b) In the expression (4x² − 3x + 1)/(x + 5), the dividend is 4x² − 3x + 1 and the divisor is x + 5.
Long division:

    Terms:          x²       x¹       x⁰
    Quotient:                4x       −23
    x + 5 )         4x²      −3x      +1
                    4x²      +20x
                             −23x     +1
                             −23x     −115
    Remainder:                        116

The quotient is 4x − 23 and 116 is the remainder. We have:
\[
\frac{4x^2 - 3x + 1}{x + 5} = 4x - 23 + \frac{116}{x + 5}.
\]
(c) In the expression (x² + x + 3)/(−x² − 2x + 1), the dividend is x² + x + 3 and the divisor is −x² − 2x + 1.
Long division:

    Terms:               x²       x¹       x⁰
    Quotient:                              −1
    −x² − 2x + 1 )       x²       +x       +3
                         x²       +2x      −1
    Remainder:                    −x       +4

The quotient is −1 and −x + 4 is the remainder. We have:
\[
\frac{x^2 + x + 3}{-x^2 - 2x + 1} = -1 + \frac{-x + 4}{-x^2 - 2x + 1}.
\]
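The long-division procedure used in (a) to (c) can be sketched in code as follows. This is an illustrative implementation only, not a method taken from this textbook: it assumes coefficients are listed from the highest power down, and it uses exact fractions so that 3.2 and 9.4 come out as 16/5 and 47/5. The name `poly_divmod` is made up for this example.

```python
from fractions import Fraction

def poly_divmod(dividend, divisor):
    """Polynomial long division.

    Coefficients run from the highest power down to the constant term,
    e.g. 4x^2 - 3x + 1 is [4, -3, 1]. Returns (quotient, remainder).
    """
    rem = [Fraction(c) for c in dividend]
    div = [Fraction(c) for c in divisor]
    quot = []
    # Keep going while the remainder's degree is at least the divisor's degree.
    while len(rem) >= len(div):
        factor = rem[0] / div[0]      # next term of the quotient
        quot.append(factor)
        # Subtract factor * divisor, lined up at the leading term.
        for i, d in enumerate(div):
            rem[i] -= factor * d
        rem.pop(0)                    # the leading term has been cancelled
    return quot, rem

print(poly_divmod([16, 3], [5, -2]))          # part (a)
print(poly_divmod([4, -3, 1], [1, 5]))        # part (b)
print(poly_divmod([1, 1, 3], [-1, -2, 1]))    # part (c)
```

Each pass of the loop corresponds to one row pair in the written layout: choose the next quotient term, then subtract the matching multiple of the divisor.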
A125(a) (2x³ + 7x² − 3x + 5) ÷ (x − 3) leaves 2 ⋅ 3³ + 7 ⋅ 3² − 3 ⋅ 3 + 5 = 113.
(b) (−2x⁴ + 3x² − 7x − 1) ÷ (x + 2) leaves −2 ⋅ (−2)⁴ + 3 ⋅ (−2)² − 7 ⋅ (−2) − 1 = −7.
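As a sketch of how the Remainder Theorem can be checked mechanically, the snippet below evaluates each polynomial at the relevant value using Horner's method (a standard evaluation scheme, swapped in here for convenience rather than taken from this textbook). The function name `horner` is made up for this example.

```python
def horner(coeffs, x):
    """Evaluate a polynomial at x; coefficients run from the highest power down."""
    value = 0
    for c in coeffs:
        value = value * x + c
    return value

# Remainder Theorem: dividing p(x) by (x - k) leaves the remainder p(k).
rem_a = horner([2, 7, -3, 5], 3)          # divisor x - 3, so k = 3
rem_b = horner([-2, 0, 3, -7, -1], -2)    # divisor x + 2, so k = -2
print(rem_a, rem_b)
```

Note the 0 in the second coefficient list: the polynomial in (b) has no x³ term, and that slot must still be filled.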

A126(a) I’ll use the SSGACM:

(2x − 3) (x + 1) = 2x² − x − 3. ✗

Aiyah, sian. Now try again, but switch −3 and 1:

(2x + 1) (x − 3) = 2x² − 5x − 3. ✓ Yay! Done!

(b) I’ll use the quadratic formula. We have b² − 4ac = (−19)² − 4(7)(−6) = 361 + 168 = 529 > 0 and √529 = 23. Thus:
\[
7x^2 - 19x - 6 = 7\left(x - \frac{19 - 23}{14}\right)\left(x - \frac{19 + 23}{14}\right) = 7\left(x + \frac{2}{7}\right)(x - 3) = (7x + 2)(x - 3).
\]

(c) I’ll use the SSGACM: (2x + 1) (3x − 1) = 6x² + x − 1. ✓ Yay! Done!

(d) I’ll start by using the FTGACM. Since the constant term is −14 = −2 × 7, let’s try
plugging in 2:

p(2) = 2 ⋅ 2³ − 2² − 17 ⋅ 2 − 14 < 0. ✗

Aiyah, sian. Doesn’t work: by the FT, x − 2 is not a factor of 2x³ − x² − 17x − 14.
Let’s instead try −2:

p(−2) = 2 ⋅ (−2)³ − (−2)² − 17 ⋅ (−2) − 14 = −16 − 4 + 34 − 14 = 0. ✓

Yay, works! By the FT, x + 2 is a factor for 2x3 − x2 − 17x − 14.

Now, as usual, write: 2x3 − x2 − 17x − 14 = (x + 2) (ax2 + bx + c) .

The coefficients on the cubed and constant terms are a = 2 and 2c = −14. And so, c = −7.
To find b, look at the coefficients on the squared term, which are 2a + b = −1 and so b = −5.
Thus, ax2 + bx + c = 2x2 − 5x − 7.
To factorise 2x2 − 5x − 7, I’ll use the SSGACM:

(2x − 7) (x + 1) = 2x² − 5x − 7. ✓

(Wah! So “lucky”! Success on the very first try!)

Altogether, we have: 2x3 − x2 − 17x − 14 = (x + 2) (2x − 7) (x + 1).


A127(a) By (i) and the RT, p(1) = a + b − 31 + 3 + 3 = a + b − 25 = 5. And so, b = 30 − a.

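By (ii):
\[
\begin{aligned}
0 = p\left(\frac{1}{2}\right) &= \frac{a}{16} + \frac{b}{8} - \frac{31}{4} + \frac{3}{2} + 3 \\
&= \frac{a}{16} + \frac{b}{8} - \frac{13}{4} = \frac{a}{16} + \frac{30 - a}{8} - \frac{13}{4} \\
&= \frac{60 - a}{16} - \frac{13}{4}.
\end{aligned}
\]
Multiplying through by 16: 0 = 60 − a − 52 = 8 − a.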
16 4

Thus, a = 8 and b = 22. And we have:

p (x) = 8x4 + 22x3 − 31x2 + 3x + 3.

(b) Observe that p (0) > 0. Given also (iii) p (−1/3) < 0, the IVT says there must be some
−1/3 < c < 0 such that p (c) = 0.
So, let’s try the FTGACM, by plugging in −1/4:

\[
8\left(-\frac{1}{4}\right)^4 + 22\left(-\frac{1}{4}\right)^3 - 31\left(-\frac{1}{4}\right)^2 + 3\left(-\frac{1}{4}\right) + 3 = 0. \quad ✓
\]
Yay, works! By the FT, x + 1/4 or 4 (x + 1/4) = 4x + 1 is a factor of p (x).
From (ii), we also already knew that x − 1/2 or 2 (x − 1/2) = 2x − 1 is a factor of p (x).
So write: p (x) = 8x4 + 22x3 − 31x2 + 3x + 3
= (2x − 1) (4x + 1) (dx2 + ex + f )
= (8x2 − 2x − 1) (dx2 + ex + f ) .

The coefficients on the 4th-degree and constant terms are 8d = 8 and −f = 3. And so, d = 1
and f = −3. To find e, look at the coefficients on the linear term, which are −2f − e = 3.
And so, e = −2f − 3 = 3. Thus, dx2 + ex + f = x2 + 3x − 3.
To factorise this last quadratic polynomial, we observe that b2 −4ac = 32 −4(1)(−3) = 21 > 0.
And so, by the quadratic formula, we have:
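\[
x^2 + 3x - 3 = \left(x - \frac{-3 - \sqrt{21}}{2}\right)\left(x - \frac{-3 + \sqrt{21}}{2}\right) = \left(x + \frac{3 + \sqrt{21}}{2}\right)\left(x + \frac{3 - \sqrt{21}}{2}\right).
\]

Altogether then, we have:
\[
p(x) = 8x^4 + 22x^3 - 31x^2 + 3x + 3 = (2x - 1)(4x + 1)\left(x + \frac{3 + \sqrt{21}}{2}\right)\left(x + \frac{3 - \sqrt{21}}{2}\right).
\]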
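For a multi-step answer like this one, a short script can confirm both the conditions used to find a and b and the final factorisation. The sketch below assumes a = 8 and b = 22 as found above; the rational checks are exact, while the comparison with the factored form uses floating point (because of √21) and arbitrary sample points. The names `p` and `factored` are chosen for this illustration.

```python
from fractions import Fraction
import math

def p(x):
    """The quartic found in part (a)."""
    return 8 * x**4 + 22 * x**3 - 31 * x**2 + 3 * x + 3

def factored(x):
    """The factorisation claimed in part (b)."""
    r = math.sqrt(21)
    return (2 * x - 1) * (4 * x + 1) * (x + (3 + r) / 2) * (x + (3 - r) / 2)

conditions_hold = (
    p(1) == 5                       # (i): remainder 5 on division by x - 1
    and p(Fraction(1, 2)) == 0      # (ii): 2x - 1 is a factor
    and p(Fraction(-1, 3)) < 0      # (iii)
    and p(Fraction(-1, 4)) == 0     # the root found by trial in part (b)
)

# Compare the expanded and factored forms at a few arbitrary points.
largest_gap = max(abs(p(x) - factored(x)) for x in (-3.0, -1.5, 0.0, 0.7, 2.0))
print(conditions_hold, largest_gap)
```

Two quartics that agree at five distinct points are identical, so five sample points are enough here, up to floating-point rounding.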


124.15. Ch. 22 Answers (Conic Sections)

A128. Translate x²/a² + y²/b² = 1 leftwards by c units to get (x + c)²/a² + y²/b² = 1. Then further translate downwards by d units to get (x + c)²/a² + (y + d)²/b² = 1.

[Figure: the ellipse (x + c)²/a² + (y + d)²/b² = 1, showing the lines of symmetry x = −c and y = −d, the strict global maximum (−c, b − d), the strict global minimum (−c, −b − d), the x-intercepts at x = −c ± a√(b² − d²)/b, and the y-intercepts at y = −d ± b√(a² − c²)/a.]

So, (x + c)²/a² + (y + d)²/b² = 1 is the exact same ellipse as x²/a² + y²/b² = 1, but now centred on the point (−c, −d) (instead of the origin).

x²/a² + y²/b² = 1 had two turning points: the strict global maximum (0, b) and the strict global minimum (0, −b). Since (x + c)²/a² + (y + d)²/b² = 1 is simply x²/a² + y²/b² = 1 translated c units leftwards and d units downwards, (x + c)²/a² + (y + d)²/b² = 1 again has two turning points: the strict global maximum (−c, b − d) and the strict global minimum (−c, −b − d).
By observation, there are no asymptotes.
By observation, there are two lines of symmetry y = −d and x = −c.
Following the hint, to find the y-intercepts, plug in x = 0:

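\[
\begin{aligned}
\frac{(0 + c)^2}{a^2} + \frac{(y + d)^2}{b^2} = 1 &\iff \frac{(y + d)^2}{b^2} = 1 - \frac{c^2}{a^2} = \frac{a^2 - c^2}{a^2} \\
&\iff \frac{y + d}{b} = \pm\frac{\sqrt{a^2 - c^2}}{a} \iff y = -d \pm \frac{b\sqrt{a^2 - c^2}}{a}.
\end{aligned}
\]
So, if ∣a∣ > ∣c∣, then the y-intercepts are $\left(0, -d - \dfrac{b\sqrt{a^2 - c^2}}{a}\right)$ and $\left(0, -d + \dfrac{b\sqrt{a^2 - c^2}}{a}\right)$.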
If ∣a∣ = ∣c∣, then the (only) y-intercept is (0, −d). (Either the leftmost or rightmost point of
the ellipse just touches the y-axis.)
And if ∣a∣ < ∣c∣, then there are no y-intercepts (the ellipse doesn’t touch the y-axis).
Similarly, to find the x-intercepts, plug in y = 0:

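\[
\begin{aligned}
\frac{(x + c)^2}{a^2} + \frac{(0 + d)^2}{b^2} = 1 &\iff \frac{(x + c)^2}{a^2} = 1 - \frac{d^2}{b^2} = \frac{b^2 - d^2}{b^2} \\
&\iff \frac{x + c}{a} = \pm\frac{\sqrt{b^2 - d^2}}{b} \iff x = -c \pm \frac{a\sqrt{b^2 - d^2}}{b}.
\end{aligned}
\]
So, if ∣b∣ > ∣d∣, then the x-intercepts are $\left(-c - \dfrac{a\sqrt{b^2 - d^2}}{b}, 0\right)$ and $\left(-c + \dfrac{a\sqrt{b^2 - d^2}}{b}, 0\right)$.
If ∣b∣ = ∣d∣, then the (only) x-intercept is (−c, 0). (Either the topmost or bottommost point
of the ellipse just touches the x-axis.)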
And if ∣b∣ < ∣d∣, then there are no x-intercepts (the ellipse doesn’t touch the x-axis).


A129(a) Do the long division: $y = \dfrac{3x + 2}{x + 2} = 3 - \dfrac{4}{x + 2}$.
1. There are two branches — one on the top-left and another on the bottom-right.
2. Intercepts. Plug in x = 0 to get y = 2/2 = 1. So, the y-intercept is (0, 1).
Plug in y = 0 to get 3x + 2 = 0 or x = −2/3. So, the x-intercept is (−2/3, 0).
3. There are no turning points.
4. Asymptotes. The value of x that makes the denominator 0 is −2 — hence, the vertical
asymptote is x = −2.
The quotient in the long division is 3 — hence, the horizontal asymptote is y = 3.
(Note that since the two asymptotes x = −2 and y = 3 are perpendicular, this is a
rectangular hyperbola.)
5. The hyperbola’s centre (the point at which the two asymptotes intersect) is (−2, 3).
(These coordinates are simply given by the vertical and horizontal asymptotes.)
6. The two lines of symmetry may be written as y = x + α and y = −x + β and pass
through the centre (−2, 3). Plugging in the numbers, we find that α = 5 and β = 1.
Thus, the two lines of symmetry are y = x + 5 and y = −x + 1.

[Figure: graph of y = (3x + 2)/(x + 2), showing the intercepts (0, 1) and (−2/3, 0), the centre (−2, 3), the vertical asymptote x = −2, the horizontal asymptote y = 3, and the lines of symmetry y = x + 5 and y = −x + 1.]


A129(b) Do the long division: $y = \dfrac{x - 2}{-2x + 1} = -\dfrac{1}{2} - \dfrac{3/2}{-2x + 1}$.
1. There are two branches — one on the top-right and another on the bottom-left.
2. Intercepts. Plug in x = 0 to get y = −2/1 = −2. So, the y-intercept is (0, −2).
Plug in y = 0 to get x − 2 = 0 or x = 2. So, the x-intercept is (2, 0).
3. There are no turning points.
4. Asymptotes. The value of x that makes the denominator 0 is 1/2 — hence, the vertical
asymptote is x = 1/2.
The quotient in the long division is −1/2 — hence, the horizontal asymptote is y = −1/2.
(Note that since the two asymptotes x = 1/2 and y = −1/2 are perpendicular, this is a
rectangular hyperbola.)
5. The hyperbola’s centre (the point at which the two asymptotes intersect) is (1/2, −1/2).
(These coordinates are simply given by the vertical and horizontal asymptotes.)
6. The two lines of symmetry may be written as y = x + α and y = −x + β and pass
through the centre (1/2, −1/2). Plugging in the numbers, we find that α = −1 and β = 0.
Thus, the two lines of symmetry are y = x − 1 and y = −x.

[Figure: graph of y = (x − 2)/(−2x + 1), showing the intercepts (0, −2) and (2, 0), the centre (1/2, −1/2), the vertical asymptote x = 1/2, the horizontal asymptote y = −1/2, and the lines of symmetry y = x − 1 and y = −x.]


A129(c) Do the long division: $y = \dfrac{-3x + 1}{2x + 3} = -\dfrac{3}{2} + \dfrac{11/2}{2x + 3}$.
1. There are two branches — one on the top-right and another on the bottom-left.
2. Intercepts. Plug in x = 0 to get y = 1/3. So, the y-intercept is (0, 1/3).
Plug in y = 0 to get −3x + 1 = 0 or x = 1/3. So, the x-intercept is (1/3, 0).
3. There are no turning points.
4. Asymptotes. The value of x that makes the denominator 0 is −3/2 — hence, the
vertical asymptote is x = −3/2.
The quotient in the long division is −3/2 — hence, the horizontal asymptote is y = −3/2.
(Note that since the two asymptotes x = −3/2 and y = −3/2 are perpendicular, this is a
rectangular hyperbola.)
5. The hyperbola’s centre (the point at which the two asymptotes intersect) is (−3/2, −3/2).
(These coordinates are simply given by the vertical and horizontal asymptotes.)
6. The two lines of symmetry may be written as y = x+α and y = −x+β and pass through
the centre (−3/2, −3/2). Plugging in the numbers, we find that α = 0 and β = −3. Thus,
the two lines of symmetry are y = x and y = −x − 3.
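The six steps followed in A129(a) to (c) are the same each time, so they can be collected into one small routine. The sketch below is an illustration under stated assumptions: it handles y = (ax + b)/(cx + d) with c ≠ 0 and ad − bc ≠ 0, and the function name and dictionary keys are made up for this example.

```python
from fractions import Fraction

def hyperbola_features(a, b, c, d):
    """Key features of y = (ax + b)/(cx + d), assuming c != 0 and ad - bc != 0."""
    a, b, c, d = map(Fraction, (a, b, c, d))
    h = -d / c    # vertical asymptote x = h (where the denominator is zero)
    k = a / c     # horizontal asymptote y = k (the quotient in the long division)
    return {
        "vertical_asymptote": h,
        "horizontal_asymptote": k,
        "centre": (h, k),
        "alpha": k - h,    # y = x + alpha passes through the centre
        "beta": k + h,     # y = -x + beta passes through the centre
        "x_intercept": -b / a if a != 0 else None,
        "y_intercept": b / d if d != 0 else None,
    }

part_a = hyperbola_features(3, 2, 1, 2)      # y = (3x + 2)/(x + 2)
part_b = hyperbola_features(1, -2, -2, 1)    # y = (x - 2)/(-2x + 1)
part_c = hyperbola_features(-3, 1, 2, 3)     # y = (-3x + 1)/(2x + 3)
print(part_a["centre"], part_a["alpha"], part_a["beta"])
```

The values of alpha and beta come from plugging the centre (h, k) into y = x + α and y = −x + β, exactly as in step 6 of each answer.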

[Figure: graph of y = (−3x + 1)/(2x + 3), showing the intercepts (0, 1/3) and (1/3, 0), the centre (−3/2, −3/2), the vertical asymptote x = −3/2, the horizontal asymptote y = −3/2, and the lines of symmetry y = x and y = −x − 3.]


A130(a) Do the long division: $\dfrac{x^2 + 2x + 1}{x - 4} = x + 6 + \dfrac{25}{x - 4}$.
Intercepts. Plug in x = 0 to get y = 1/ (−4) = −1/4. Thus, the y-intercept is (0, −1/4).
Plug in y = 0 to get x2 + 2x + 1 = 0, an equation which has one (real) solution x = −1. Thus,
the (only) x-intercept is (−1, 0).
Asymptotes. The vertical asymptote x = 4 is given by the value of x for which x − 4 = 0.
The oblique asymptote y = x + 6 is given by the quotient in the long division.
The centre’s x-coordinate is given by the vertical asymptote x = 4. For its y-coordinate,
plug x = 4 into the oblique asymptote to get y = 4 + 6 = 10. Hence, the centre is (4, 10).
You should be able to sketch the two lines of symmetry and the two turning points.418
(The two lines of symmetry run through the centre and bisect an angle formed by the two
asymptotes. And there is one turning point for each branch.)

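[Figure: graph of y = (x² + 2x + 1)/(x − 4), showing the intercepts (0, −1/4) and (−1, 0), the centre (4, 10), the turning points (−1, 0) and (9, 20), the vertical asymptote x = 4, the oblique asymptote y = x + 6, and the lines of symmetry y = (1 − √2)x + 6 + 4√2 and y = (1 + √2)x + 6 − 4√2. In this example, the x-intercept (−1, 0) coincides with the maximum turning point.]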

[Footnote 418] To find the turning points, write: $\dfrac{dy}{dx} = \dfrac{d}{dx}\left(x + 6 + \dfrac{25}{x - 4}\right) = 1 - 25(x - 4)^{-2} = 0 \iff (x - 4)^2 = 25$.
So x = −1, 9. The corresponding y-values are −1 + 6 + 25/(−1 − 4) = 0 and 9 + 6 + 25/(9 − 4) = 20. Thus, the two turning points are (−1, 0) and (9, 20). (If necessary, we can also show that these are respectively the strict local maximum and minimum.)
A130(b) Do the long division: $\dfrac{-x^2 + x - 1}{x + 1} = -x + 2 - \dfrac{3}{x + 1}$.
Intercepts. Plug in x = 0 to get y = −1/1 = −1. Thus, the y-intercept is (0, −1). Plug in
y = 0 to get −x2 + x − 1 = 0, an equation for which there are no (real) solutions. Thus, there
are no x-intercepts.
Asymptotes. The vertical asymptote x = −1 is given by the value of x for which x + 1 = 0.
The oblique asymptote y = −x + 2 is given by the quotient in the long division.
The centre’s x-coordinate is given by the vertical asymptote x = −1. For its y-coordinate,
plug x = −1 into the oblique asymptote to get y = − (−1)+2 = 3. Hence, the centre is (−1, 3).
You should be able to sketch the two lines of symmetry and the two turning points.419

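[Figure: graph of y = (−x² + x − 1)/(x + 1), showing the y-intercept (0, −1), the centre (−1, 3), the turning points (−1 + √3, 3 − 2√3) and (−1 − √3, 3 + 2√3), the vertical asymptote x = −1, the oblique asymptote y = −x + 2, and the lines of symmetry y = (−1 − √2)x + 2 − √2 and y = (−1 + √2)x + 2 + √2, both of which pass through the centre.]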

[Footnote 419] To find the turning points, write: $\dfrac{dy}{dx} = \dfrac{d}{dx}\left(-x + 2 - \dfrac{3}{x + 1}\right) = -1 + 3(x + 1)^{-2} = 0 \iff (x + 1)^2 = 3$.
So x = −1 ± √3. The corresponding y-values are 1 ∓ √3 + 2 − 3/(−1 ± √3 + 1) = 3 ∓ 2√3. Thus, the two turning points are (−1 ± √3, 3 ∓ 2√3). (If necessary, we can also show that these are respectively the strict local maximum and minimum.)
A130(c) Do the long division: $\dfrac{2x^2 - 2x - 1}{x + 4} = 2x - 10 + \dfrac{39}{x + 4}$.
Intercepts. Plug in x = 0 to get y = −1/4. Thus, the y-intercept is (0, −1/4). Plug in y = 0 to get 2x² − 2x − 1 = 0, an equation for which there are two (real) solutions: x = (2 ± √12)/4 = (1 ± √3)/2. Thus, there are two x-intercepts: ((1 ± √3)/2, 0).
Asymptotes. The vertical asymptote x = −4 is given by the value of x for which x + 4 = 0.
The oblique asymptote y = 2x − 10 is given by the quotient in the long division.
The centre’s x-coordinate is given by the vertical asymptote x = −4. For its y-coordinate,
plug x = −4 into the oblique asymptote to get y = 2 (−4) − 10 = −18. Hence, the centre is
(−4, −18).
You should be able to sketch the two lines of symmetry and the two turning points.420

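[Figure: graph of y = (2x² − 2x − 1)/(x + 4), showing the intercepts (0, −1/4), ((1 − √3)/2, 0), and ((1 + √3)/2, 0), the centre (−4, −18), the turning points (−4 + √(39/2), −18 + 2√78) and (−4 − √(39/2), −18 − 2√78), the vertical asymptote x = −4, the oblique asymptote y = 2x − 10, and the lines of symmetry y = (2 + √5)x − 10 + 4√5 and y = (2 − √5)x − 10 − 4√5.]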

[Footnote 420] To find the turning points, write: $\dfrac{dy}{dx} = \dfrac{d}{dx}\left(2x - 10 + \dfrac{39}{x + 4}\right) = 2 - 39(x + 4)^{-2} = 0 \iff (x + 4)^2 = 39/2$.
So x = −4 ± √(39/2). The corresponding y-values are 2(−4 ± √(39/2)) − 10 + 39/(−4 ± √(39/2) + 4) = −18 ± 2√78. Thus, the two turning points are (−4 ± √(39/2), −18 ± 2√78). (If necessary, we can also show that these are respectively the strict local minimum and maximum.)
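The three footnote calculations in A130 follow one pattern: after long division the curve is y = mx + k + R/(x − p), and setting dy/dx = m − R/(x − p)² to zero gives (x − p)² = R/m. A minimal sketch of that pattern is below. The symbols m, k, R, p and the function name `turning_points` are labels introduced for this example, and the code assumes R/m > 0 so that turning points exist.

```python
import math

def turning_points(m, k, R, p):
    """Turning points of y = m*x + k + R/(x - p), assuming R/m > 0.

    Solving dy/dx = m - R/(x - p)**2 = 0 gives x = p - s and x = p + s,
    where s = sqrt(R/m). The points are returned in that order.
    """
    s = math.sqrt(R / m)
    points = []
    for x in (p - s, p + s):
        y = m * x + k + R / (x - p)
        points.append((x, y))
    return points

pts_a = turning_points(1, 6, 25, 4)       # y = x + 6 + 25/(x - 4)
pts_b = turning_points(-1, 2, -3, -1)     # y = -x + 2 - 3/(x + 1)
pts_c = turning_points(2, -10, 39, -4)    # y = 2x - 10 + 39/(x + 4)
print(pts_a, pts_b, pts_c)
```

The results are floating-point approximations, so for (b) and (c) they should be compared with the exact surd answers only up to rounding.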
124.16. Ch. 23 Answers (Simple Parametric Equations)
A131(a) As stated in the above example, at time t = 0, particle P is at (x, y) = (cos 0, sin 0) =
(1, 0). In contrast, particle Q is at (x, y) = (sin 0, cos 0) = (0, 1).
(b) At t = 0, Q is at (x, y) = (sin 0, cos 0) = (0, 1). A little after t = 0, x = sin t will have
grown a little while y = cos t will have shrunk a little. That is, the particle Q will have
moved a little to the right and a little to the south. Thus, Q travels clockwise.
(c) Every 2π s, each particle travels one full circle. Therefore, at t = 664π, each particle
will be at its starting point. And π s later, each particle will have travelled an additional
half-circle. Thus, at t = 665π, particle P will be at (x, y) = (cos π, sin π) = (−1, 0) (at the
left of the circle), while particle Q will be at (x, y) = (sin π, cos π) = (0, −1) (at the bottom
of the circle).
(d) The two particles are at the exact same position whenever (cos t, sin t) = (sin t, cos t).
Thus, they are at the exact same position whenever cos t = sin t. By inspecting the graphs
of cos and sin (see e.g. p. 265), we see that this occurs at t = (k + 1/4) π, for every k ∈ ℤ₀⁺.
(e) For the particle Q, we have:

\[
v_x = \frac{dx}{dt} = \cos t, \quad v_y = \frac{dy}{dt} = -\sin t, \quad a_x = \frac{dv_x}{dt} = \frac{d^2x}{dt^2} = -\sin t, \quad a_y = \frac{dv_y}{dt} = \frac{d^2y}{dt^2} = -\cos t.
\]
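The claims in A131 about direction of travel, the positions at t = 665π, and the meeting times can be spot-checked numerically. The following is a sketch only: it takes P as (cos t, sin t) and Q as (sin t, cos t), as in the answer above, and decides the direction of travel from the sign of x·v_y − y·v_x (positive for anticlockwise motion about the origin), which is a standard test rather than the argument used in part (b). All function names are made up for this example.

```python
import math

def pos_P(t):
    """Position of particle P."""
    return (math.cos(t), math.sin(t))

def pos_Q(t):
    """Position of particle Q."""
    return (math.sin(t), math.cos(t))

def vel_Q(t):
    """Velocity of Q, obtained by differentiating its position."""
    return (math.cos(t), -math.sin(t))

def turning_sense(pos, vel):
    """Sign of x*vy - y*vx: positive means anticlockwise, negative clockwise."""
    x, y = pos
    vx, vy = vel
    return "anticlockwise" if x * vy - y * vx > 0 else "clockwise"

def close(p, q, tol=1e-9):
    """True if two points agree up to floating-point rounding."""
    return all(abs(a - b) < tol for a, b in zip(p, q))

sense_Q = turning_sense(pos_Q(0.3), vel_Q(0.3))
at_665pi = (close(pos_P(665 * math.pi), (-1, 0)), close(pos_Q(665 * math.pi), (0, -1)))
meet = all(close(pos_P((k + 0.25) * math.pi), pos_Q((k + 0.25) * math.pi)) for k in range(5))
print(sense_Q, at_665pi, meet)
```

Only the first five meeting times (k = 0 to 4) are tested here; the general claim rests on the periodicity argument in part (d).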

A132(a) For the particle R, we have:

\[
v_x = \frac{dx}{dt} = -a\sin t, \quad v_y = \frac{dy}{dt} = b\cos t, \quad a_x = \frac{dv_x}{dt} = \frac{d^2x}{dt^2} = -a\cos t, \quad a_y = \frac{dv_y}{dt} = \frac{d^2y}{dt^2} = -b\sin t.
\]
(b)(i) At t = π/4,
\[
\begin{aligned}
(x, y) &= \left(a\cos\frac{\pi}{4}, b\sin\frac{\pi}{4}\right) = \left(\frac{a\sqrt{2}}{2}, \frac{b\sqrt{2}}{2}\right), \\
(v_x, v_y) &= \left(-a\sin\frac{\pi}{4}, b\cos\frac{\pi}{4}\right) = \left(-\frac{a\sqrt{2}}{2}, \frac{b\sqrt{2}}{2}\right), \\
(a_x, a_y) &= \left(-a\cos\frac{\pi}{4}, -b\sin\frac{\pi}{4}\right) = \left(-\frac{a\sqrt{2}}{2}, -\frac{b\sqrt{2}}{2}\right).
\end{aligned}
\]

The particle R starts at (a, 0) and travels anticlockwise. At t = π/4, R has completed one-eighth of the full revolution and is now at the top-right of the ellipse; it is travelling leftwards at a√2/2 m s⁻¹ and upwards at b√2/2 m s⁻¹; and it is accelerating leftwards at a√2/2 m s⁻² and downwards at b√2/2 m s⁻².

(b)(ii) At t = π/2, (x, y) = (a cos (π/2) , b sin (π/2)) = (0, b),

(vx , vy ) = (−a sin (π/2) , b cos (π/2)) = (−a, 0),
(ax , ay ) = (−a cos (π/2) , −b sin (π/2)) = (0, −b).

At t = π/2, R has completed one-quarter of the full revolution and is now at the top of the
ellipse; it is travelling leftwards at a m s−1 (and not upwards at all); and it is accelerating
downwards at b m s−2 (and not rightwards at all).

A132(b)(iii) At t = 2π, (x, y) = (a cos 2π, b sin 2π) = (a, 0),

(vx , vy ) = (−a sin 2π, b cos 2π) = (0, b),
(ax , ay ) = (−a cos 2π, −b sin 2π) = (−a, 0).

At t = 2π, R has completed one full revolution and is back at its starting position; it is
travelling upwards at b m s−1 (and not rightwards at all); and it is accelerating leftwards at
a m s−2 (and not upwards at all).

[Figure: the ellipse U = {(x, y) ∶ x²/a² + y²/b² = 1} = {(x, y) ∶ x = a cos t, y = b sin t, t ≥ 0}, annotated with the position, velocity, and acceleration of R at t = π/4, t = π/2, and t = 2π as computed in (b)(i) to (iii). Arrows indicate instantaneous direction of travel.]

(c) At t = 0, (x, y) = (a cos 0, b sin 0) = (a, 0) and (vx , vy ) = (−a sin 0, b cos 0) = (0, b). Hence:
(i) If a, b < 0, R starts at the left of the ellipse. It also starts by moving downwards and is
thus moving anticlockwise.
(ii) If a > 0, b < 0, R starts at the right of the ellipse. It also starts by moving downwards
and is thus moving clockwise.
(iii) If a < 0, b > 0, R starts at the left of the ellipse. It also starts by moving upwards and
is thus moving clockwise.

A134(a) An instant after t = 1.5π, the particle magically reappears “near” “bottom-right
infinity” (∞, −∞).
(b) During t ∈ (1.5π, 2.5π), the particle moves upwards along the right branch of the
hyperbola. At t = 2π, it is back to its starting position (1, 0). And as t → 2.5π, it “flies off”
towards “top-right infinity” (∞, ∞).


A135(a) x = tan t, y = sec t Ô⇒ y 2 − x2 = 1.
(b) At t = 0, (x, y) = (tan 0, sec 0) = (0, 1) — the particle B is at the midpoint of the top
branch of the hyperbola.
(c) vx = dx/dt = sec2 t is always positive and so the particle is always moving rightwards.
(d)(i) At t = 0, B starts at the midpoint of the top branch of the hyperbola. During
t ∈ [0, 0.5π),B travels rightwards along the right portion of the top branch. As t → 0.5π, B
“flies off” towards the “top-right” infinity; its position, velocity, and acceleration in both
the x- and y-directions approach ∞.
(d)(ii) An instant after t = 0.5π, B magically reappears “near” “bottom-left” infinity.
During t ∈ (0.5π, 1.5π), it travels rightwards, along the bottom branch of the hyperbola.
Specifically, during t ∈ (0.5π, π), it is on the left portion of the bottom branch. At t = π, it
is at the midpoint of the bottom branch. And during t ∈ (π, 1.5π), it is on the right portion
of the bottom branch.
As t → 1.5π, B “flies off” towards the “bottom-right” infinity; its position, velocity, and
acceleration in the x-direction approach ∞; and in the y-direction, they approach −∞.
(d)(iii) An instant after t = 1.5π, B magically reappears “near” “top-left” infinity. During
t ∈ (1.5π, 2.5π), it travels rightwards, along the top branch of the hyperbola.
Specifically, during t ∈ (1.5π, 2π), it is on the left portion of the top branch. At t = 2π, it is
back to its starting position — the midpoint of the top branch. And during t ∈ (2π, 2.5π),
it is on the right portion of the top branch.
As t → 2.5π, B again “flies off” towards the “top-right” infinity as it did when t approached
0.5π; its position, velocity, and acceleration in both the x- and y-directions approach ∞.
(e) At t = 0, B is at the midpoint of the top branch — hence, Bb .
Since 1 ∈ [0, 0.5π) ≈ [0, 1.57), at t = 1, B must be on the right portion of the top branch —
hence, Bc .

Position Ba Bb Bc Bd Be Bf
Time t 5 0 1 2 3 4

Since 2, 3 ∈ (0.5π, π) ≈ (1.57, 3.14), at t = 2 and t = 3, B must be on the left portion of the
bottom branch. Since B is “near” “bottom-left” infinity an instant after t = 0.5π and t = 2
is earlier than t = 3, it must be that t = 2 corresponds to Bd and t = 3 corresponds to Be .
Since 4 ∈ (π, 1.5π) ≈ (3.14, 4.71), at t = 4, B must be on the right portion of the bottom
branch — hence, Bf .
Since 5 ∈ (1.5π, 2π) ≈ (4.71, 6.28), at t = 5, B must be on the left portion of the top branch
— hence, Ba .
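The matching of times to positions in part (e) can be double-checked by computing (tan t, sec t) directly and reading off the branch and side. This is only an illustrative sketch; the function name `locate` and the labels it returns are made up for this example.

```python
import math

def locate(t):
    """Return (branch, side) of the point (tan t, sec t) on y^2 - x^2 = 1."""
    x, y = math.tan(t), 1 / math.cos(t)
    branch = "top" if y > 0 else "bottom"
    if x < 0:
        side = "left"
    elif x > 0:
        side = "right"
    else:
        side = "midpoint"
    return branch, side

positions = {t: locate(t) for t in range(6)}   # t = 0, 1, 2, 3, 4, 5
for t, where in positions.items():
    print(t, where)

# Every sampled point should satisfy y^2 - x^2 = 1 (up to rounding).
on_curve = all(
    abs((1 / math.cos(t)) ** 2 - math.tan(t) ** 2 - 1) < 1e-9 for t in range(6)
)
```

Since t = 2 and t = 3 both land on the left portion of the bottom branch, the code alone cannot tell B_d from B_e; that still needs the ordering argument given in the answer.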


A136(a)(i) Rewrite x = t−1 as t = x+1 and plug this into y = ln (t + 1) to get y = ln (x + 2).
Noting that t ≥ 0 ⇐⇒ t + 1 ≥ 1 ⇐⇒ y = ln (t + 1) ≥ 0, we can rewrite the set as:

A = {(x, y) ∶ y = ln (x + 2) , y ≥ 0} .

(ii) The graph of y = ln (x + 2) is simply the graph of y = ln x shifted leftwards by 2 units.

Note the constraint y ≥ 0 — the particle A travels only along the black graph and does not
travel along the grey portion.

[Figure: graph of A = {(x, y) ∶ x = t − 1, y = ln (t + 1) , t ≥ 0} = {(x, y) ∶ y = ln (x + 2) , y ≥ 0}. At t = 0 (the starting point), (x, y) = (−1, 0) and (vx , vy ) = (1, 1). At t = e − 1 ≈ 1.72, (x, y) = (e − 2, 1) ≈ (0.72, 1) and (vx , vy ) = (1, 1/e) ≈ (1, 0.37). A does not travel along the grey portion.]

(iii) As usual, compute: $v_x = \dfrac{dx}{dt} = 1$ and $v_y = \dfrac{dy}{dt} = \dfrac{1}{t + 1}$.
At t = 0, A starts at the position (x, y) = (0 − 1, ln (0 + 1)) = (−1, 0), and is moving rightwards at 1 m s⁻¹ and upwards at 1/ (0 + 1) = 1 m s⁻¹.
A’s rightwards velocity stays fixed at 1 m s−1 , while its upwards velocity decreases towards
zero. As time progresses, A travels steadily towards “top-right infinity”.


A136(b)(i) Rewrite x = 1/ (t + 1) as t = 1/x − 1. Plug into y = t² + 1 to get y = (1/x − 1)² + 1.

Noting that t ≥ 0 ⇐⇒ t + 1 ≥ 1 ⇐⇒ x = 1/ (t + 1) ∈ (0, 1], we can rewrite the set as:
B = {(x, y) ∶ y = (1/x − 1)² + 1, x ∈ (0, 1]} .

(ii) Using your graphing calculator, we see that the complete graph of y = (1/x − 1)² + 1 has

two branches.
However, we have the constraint x ∈ (0, 1]. And so, particle B travels only along the black
graph and does not travel along the grey portion.

[Figure: graph of B = {(x, y) ∶ x = 1/ (t + 1), y = t² + 1, t ≥ 0} = {(x, y) ∶ y = (1/x − 1)² + 1, x ∈ (0, 1]}. At t = 0 (the starting point), (x, y) = (1/ (0 + 1), 0² + 1) = (1, 1) and (vx , vy ) = (−1/ (0 + 1)², 2 ⋅ 0) = (−1, 0). At t = 3, (x, y) = (1/ (3 + 1), 3² + 1) = (1/4, 10) and (vx , vy ) = (−1/ (3 + 1)², 2 ⋅ 3) = (−1/16, 6). B does not travel along the grey portions.]

(iii) As usual, compute: $v_x = \dfrac{dx}{dt} = -\dfrac{1}{(t + 1)^2}$ and $v_y = \dfrac{dy}{dt} = 2t$.
At t = 0, B starts at the position (x, y) = (1/ (0 + 1) , 0² + 1) = (1, 1), and is moving leftwards at 1/ (0 + 1)² = 1 m s⁻¹ and upwards at 2 ⋅ 0 = 0 m s⁻¹. That is, B is initially not moving in the y-direction.
As time progresses, B moves leftwards and upwards. Its leftwards velocity decreases towards
zero, while its upwards velocity increases towards infinity.


A136(c)(i) Rewrite x = 2 sin t − 1 as (x + 1) /2 = sin t and y = 3 cos² t as y/3 = cos² t. By the identity sin² t + cos² t = 1, we have [(x + 1) /2]² + y/3 = 1, or:
\[
y = 3\left[1 - \left(\frac{x + 1}{2}\right)^2\right] = 3\left[1 - \frac{x^2 + 2x + 1}{4}\right] = -\frac{3}{4}x^2 - \frac{3}{2}x + \frac{9}{4}.
\]

Noting that t ≥ 0 Ô⇒ sin t ∈ [−1, 1] ⇐⇒ x = 2 sin t − 1 ∈ [−3, 1], we may rewrite the set
as: C = {(x, y) ∶ y = −0.75x2 − 1.5x + 2.25, x ∈ [−3, 1]}.
(ii) The graph of y = −0.75x2 − 1.5x + 2.25 is simply a ∩-shaped quadratic, with turning
point at x = −1 and roots x = −3, 1. But note the constraint x ∈ [−3, 1] — the particle C
travels only along the black graph and not along the grey portion.

[Figure: graph of C = {(x, y) ∶ x = 2 sin t − 1, y = 3 cos² t, t ≥ 0} = {(x, y) ∶ y = −(3/4)x² − (3/2)x + 9/4, x ∈ [−3, 1]}. At t = 0 (the starting point), (x, y) = (−1, 3) and (vx , vy ) = (2, 0). At t = π/2, (x, y) = (1, 0) and (vx , vy ) = (0, 0). At t = 3π/2, (x, y) = (−3, 0) and (vx , vy ) = (0, 0). At t = 2π, C returns to its starting position. C does not travel along the grey portions.]

(iii) As usual, compute: $v_x = \dfrac{dx}{dt} = 2\cos t$ and, by the Chain Rule, $v_y = \dfrac{dy}{dt} = -6\cos t \sin t$.
At t = 0, C starts at (2 sin 0 − 1, 3 cos2 0) = (−1, 3) (the maximum point of the parabola)
and has velocity (vx , vy ) = (2 cos 0, −6 sin 0 cos 0) = (2, 0). That is, it is moving rightwards
at 2 m s−1 (and not moving in the y-direction).
During t ∈ (0, 0.5π), it moves rightwards along the parabola. At t = 0.5π, it is at the
rightmost point of the black graph and (vx , vy ) = (0, 0).
It then does a U-turn — during t ∈ (0.5π, 1.5π), it moves leftwards along the parabola.
At t = π, it is again at the maximum point of the parabola. And at t = 1.5π, it is at the
leftmost point of the constrained parabola and again (vx , vy ) = (0, 0).
It then does a U-turn — during t ∈ (1.5π, 2π), it moves rightwards along the parabola. And
at t = 2π, it is again at the maximum point of the parabola.
The particle has completed one period and will during t ∈ [2π, 4π] repeat exactly the same
movement made during t ∈ [0, 2π]. And so on.
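Each part of A136 converts a parametric description into a Cartesian equation. One way to sanity-check such a conversion is to sample a few values of t, compute (x, y) from the parametric equations, and confirm that y matches the Cartesian formula at that x. The sketch below does this for A, B, and C; the dictionary layout, the sample times, and the name `max_gap` are choices made for this illustration.

```python
import math

# name: (parametric point as a function of t, Cartesian y as a function of x)
curves = {
    "A": (lambda t: (t - 1, math.log(t + 1)),
          lambda x: math.log(x + 2)),
    "B": (lambda t: (1 / (t + 1), t**2 + 1),
          lambda x: (1 / x - 1)**2 + 1),
    "C": (lambda t: (2 * math.sin(t) - 1, 3 * math.cos(t)**2),
          lambda x: -0.75 * x**2 - 1.5 * x + 2.25),
}

def max_gap(name, times):
    """Largest difference between parametric y and Cartesian y over the sample times."""
    point, cartesian = curves[name]
    gaps = []
    for t in times:
        x, y = point(t)
        gaps.append(abs(y - cartesian(x)))
    return max(gaps)

times = [0, 0.5, 1, 2, 3, 5, 10]   # arbitrary sample times with t >= 0
gaps = {name: max_gap(name, times) for name in curves}
print(gaps)
```

This checks only that the sampled points lie on the Cartesian curve. The constraints on x or y (such as x ∈ [−3, 1] for C) still have to be argued separately, as in the answers above.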
124.17. Ch. 24 Answers (Solving Inequalities)
A140(a) ∣x − 4∣ ≤ 71 ⇐⇒ −71 ≤ x − 4 ≤ 71 ⇐⇒ −67 ≤ x ≤ 75. The solution set is [−67, 75].
(b) ∣5 − x∣ > 13 ⇐⇒ (5 − x > 13 OR 5 − x < −13) ⇐⇒ (−8 > x OR 18 < x). The solution
set is (−∞, −8) ∪ (18, ∞) or R ∖ [−8, 18].
(c) Sketch y = ∣−3x + 2∣ − 4 and y = x − 1. Observe ∣−3x + 2∣ − 4 ≥ x − 1 ⇐⇒ x is to the left
or right of the two intersection points P and Q. P is given by: −3x + 2 − 4 = x − 1 or −1 = 4x
or x = −1/4. Q is given by: 3x − 2 − 4 = x − 1 or 2x = 5 or x = 5/2. Thus, the solution set is
(−∞, −1/4] ∪ [5/2, ∞) or R ∖ (−1/4, 5/2).

[Figure: graphs of y = x − 1 and y = ∣−3x + 2∣ − 4, intersecting at x = −1/4 and x = 5/2.]

(d) Sketch y = ∣x + 6∣ and y = 2 ∣2x − 1∣. Observe that ∣x + 6∣ > 2 ∣2x − 1∣ ⇐⇒ x is between
the two intersection points P and Q. P is given by: −x − 6 = 2 (2x − 1) or −4 = 5x or x = −4/5.
Q is given by: x + 6 = 2 (2x − 1) or 8 = 3x or x = 8/3. Thus, the solution set is (−4/5, 8/3).

[Figure: graphs of y = 2 ∣2x − 1∣ and y = ∣x + 6∣, intersecting at x = −4/5 and x = 8/3.]


A138(a) (x − 1) / − 4 > 0 ⇐⇒ x − 1 < 0 ⇐⇒ x < 1.
(b) −1/ − 4 > 0 is always true. (c) 1/ − 4 > 0 is always false.
(d) The numerator and denominator equal zero at −1/2 and −2/3. Draw the sign diagram:

+ − +
−2/3 −1/2

Hence, x < −2/3 or x > −1/2.

[Figure: graph of y = (2x + 1) / (3x + 2), with −2/3 and −1/2 marked on the x-axis.]

(e) The numerator and denominator equal zero at −6 and 14/9. Draw the sign diagram:

− + −
−6 14/9

Hence, −6 < x < 14/9.

[Figure: graph of y = (−3x − 18) / (9x − 14), with −6 and 14/9 marked on the x-axis.]

A139(a) (2x + 3) / (−x + 7) < 9 ⇐⇒ 9 + (2x + 3) / (x − 7) > 0 ⇐⇒ (9x − 63 + 2x + 3) / (x − 7) > 0 ⇐⇒ (11x − 60) / (x − 7) > 0 ⇐⇒ (11x − 60, x − 7 > 0 OR 11x − 60, x − 7 < 0). Call these two cases (1) and (2).

(1) 11x − 60, x − 7 > 0 ⇐⇒ (x > 60/11 AND x > 7) ⇐⇒ x > 7.

(2) 11x − 60, x − 7 < 0 ⇐⇒ (x < 60/11 AND x < 7) ⇐⇒ x < 60/11.

Altogether then, (2x + 3) / (−x + 7) < 9 ⇐⇒ x ∈ R ∖ [60/11, 7].

[Figure: graphs of y = (2x + 3) / (−x + 7) and y = 9, with 60/11 and 7 marked on the x-axis.]

(b) (−4x + 2) / (x + 1) > 13 ⇐⇒ (−4x + 2 − 13x − 13) / (x + 1) > 0 ⇐⇒ (−17x − 11) / (x + 1) > 0 ⇐⇒ (−17x − 11, x + 1 > 0 OR −17x − 11, x + 1 < 0). Call these two cases (1) and (2).

(1) −17x − 11, x + 1 > 0 ⇐⇒ (x < −11/17 AND x > −1) ⇐⇒ x ∈ (−1, −11/17).

(2) −17x − 11, x + 1 < 0 ⇐⇒ (x > −11/17 AND x < −1) ⇐⇒ Contradiction.

Altogether then, (−4x + 2) / (x + 1) > 13 ⇐⇒ x ∈ (−1, −11/17).

[Figure: graphs of y = (−4x + 2) / (x + 1) and y = 13, with −1 and −11/17 marked on the x-axis.]


A137(a) Since a < 0, this is a ∩-shaped quadratic. Since b² − 4ac = 1² − 4(−3)(−5) = −59 < 0,
the graph doesn’t touch the x-axis at all and is completely below the x-axis. Hence, there
are no values of x for which this inequality is true, and the solution set is ∅.

[Figure: graph of y = −3x² + x − 5, lying entirely below the x-axis (there are no values of x for which −3x² + x − 5 > 0).]

(b) Since a > 0, this is a ∪-shaped quadratic. Since b² − 4ac = (−2)² − 4(1)(−1) = 8 > 0, the graph intersects the x-axis at two points. These are simply given by the two roots of the quadratic, which are x = (2 ± √8) /2 = 1 ± √2. Hence, x² − 2x − 1 > 0 is true “outside” those two roots, and the solution set is R ∖ [1 − √2, 1 + √2].

[Figure: graph of y = x² − 2x − 1, with 1 − √2 and 1 + √2 marked on the x-axis.]


A141(a) N = x² + 2x + 1 = (x + 1)² is positive everywhere except at x = −1, where it equals zero. So, the inequality is equivalent to x² − 3x + 2 > 0 AND x ≠ −1.

Observe that x² − 3x + 2 = (x − 1) (x − 2) is a ∪-shaped quadratic which intersects the x-axis
at 1 and 2. Thus, x² − 3x + 2 > 0 ⇐⇒ x ∈ R ∖ [1, 2].
Thus, the inequality’s solution set is (R ∖ [1, 2]) ∖ {−1} or (−∞, −1) ∪ (−1, 1) ∪ (2, ∞).

[Figure: graph of y = (x² + 2x + 1) / (x² − 3x + 2), with −1, 1, and 2 marked on the x-axis. x ∈ (−∞, −1) ∪ (−1, 1) ∪ (2, ∞) solves (x² + 2x + 1) / (x² − 3x + 2) > 0.]


A141(b) N = x² − 1 = (x + 1) (x − 1) is positive if x ∈ R ∖ [−1, 1] and negative if x ∈ (−1, 1).
D = x² − 4 = (x + 2) (x − 2) is positive if x ∈ R ∖ [−2, 2] and negative if x ∈ (−2, 2).
The given inequality is true if N, D > 0 or N, D < 0. We have:

N, D > 0 ⇐⇒ x ∈ R ∖ [−1, 1] AND x ∈ R ∖ [−2, 2] ⇐⇒ x ∈ R ∖ [−2, 2].

N, D < 0 ⇐⇒ x ∈ (−1, 1) AND x ∈ (−2, 2) ⇐⇒ x ∈ (−1, 1).

Thus, the inequality’s solution set is (R ∖ [−2, 2]) ∪ (−1, 1) or (−∞, −2) ∪ (−1, 1) ∪ (2, ∞).

[Figure: graph of y = (x² − 1) / (x² − 4), with −2, −1, 1, and 2 marked on the x-axis. x ∈ (−∞, −2) ∪ (−1, 1) ∪ (2, ∞) solves (x² − 1) / (x² − 4) > 0.]


A141(c) N = x² − 3x − 18 = (x − 6) (x + 3) is positive if x ∈ R ∖ [−3, 6] and negative if
x ∈ (−3, 6).
D = −x² + 9x − 14 = − (x − 2) (x − 7) is positive if x ∈ (2, 7) and negative if x ∈ R ∖ [2, 7].
The given inequality is true if N, D > 0 or N, D < 0. We have:

N, D > 0 ⇐⇒ x ∈ R ∖ [−3, 6] AND x ∈ (2, 7) ⇐⇒ x ∈ (2, 7) ∖ [−3, 6] = (6, 7).

N, D < 0 ⇐⇒ x ∈ (−3, 6) AND x ∈ R ∖ [2, 7] ⇐⇒ x ∈ (−3, 6) ∖ [2, 7] = (−3, 2).

Thus, the inequality’s solution set is (−3, 2) ∪ (6, 7).

[Figure: graph of y = (x² − 3x − 18) / (−x² + 9x − 14), with −3, 2, 6, and 7 marked on the x-axis. x ∈ (−3, 2) ∪ (6, 7) solves (x² − 3x − 18) / (−x² + 9x − 14) > 0.]


A142(a) Rewrite the inequality as x³ − x² + x − 1 − e^x > 0, then graph y = x³ − x² + x − 1 − e^x
on your TI84. It looks like the graph may be a little above the x-axis near 3, so let’s zoom
in. But we’ll zoom in not through the ZOOM function, but by adjusting Xmin and Xmax
in the WINDOW menu (I’ve adjusted them to 0 and 4).
And now as usual, we can “simply” find the two roots using the ZERO function. They are
x ≈ 3.047, 3.504.
Altogether then, based on these two roots and what the graph looks like, we conclude:

x³ − x² + x − 1 > e^x ⇐⇒ x³ − x² + x − 1 − e^x > 0 ⇐⇒ x ∈ (3.047 . . . , 3.503 . . . ).

[Screenshots: after graphing; after zooming in; the two roots.]

(b) Rewrite the inequality as √x − cos x > 0, then graph y = √x − cos x on your TI84. It
looks like there’s at least one x-intercept near the origin, maybe more. So let’s zoom in.
Alright, it looks like there’s only one x-intercept. We can as usual find it using the ZERO
function: x ≈ 0.642.
Altogether then, based on this one root and what the graph looks like, we conclude:

√x > cos x ⇐⇒ √x − cos x > 0 ⇐⇒ x > 0.641 . . .

[Screenshots: after graphing; after zooming in; the only root.]

(c) Rewrite the inequality as 1/ (1 − x²) − x³ − sin x > 0, then graph y = 1/ (1 − x²) − x³ − sin x
on your TI84. It looks like there’s only one x-intercept near x = −1.
So let’s simply find it using the ZERO function: x ≈ −1.179.
Altogether then, based on this one root and what the graph looks like, we conclude:

1/ (1 − x²) > x³ + sin x ⇐⇒ 1/ (1 − x²) − x³ − sin x > 0 ⇐⇒ x ∈ (−∞, −1.179 . . . ) ∪ (−1, 1).

[Screenshots: after graphing; the only root.]
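
The roots read off the calculator in A142 can be cross-checked without a TI84. The sketch below uses plain bisection in Python; the function `bisect` and the bracketing intervals are assumptions chosen by inspecting where each expression changes sign, and they are not taken from the original answer.

```python
import math

def bisect(f, lo, hi, tol=1e-10):
    """Find a root of f in [lo, hi], assuming f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    if f_lo * f(hi) > 0:
        raise ValueError("f must change sign on [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if f_lo * f_mid <= 0:
            hi = mid              # sign change lies in the left half
        else:
            lo, f_lo = mid, f_mid # sign change lies in the right half
    return (lo + hi) / 2

f_a = lambda x: x**3 - x**2 + x - 1 - math.exp(x)
f_b = lambda x: math.sqrt(x) - math.cos(x)
f_c = lambda x: 1 / (1 - x**2) - x**3 - math.sin(x)

roots_a = (bisect(f_a, 3.0, 3.2), bisect(f_a, 3.4, 3.6))
root_b = bisect(f_b, 0.5, 1.0)
root_c = bisect(f_c, -1.5, -1.05)
print(roots_a, root_b, root_c)
```

Bisection only locates sign changes inside the brackets supplied, so the shape of the graph is still needed to decide on which side of each root the inequality holds.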


124.18. Ch. 25 Answers (Solving Systems of Equations)
A143. Let A, B, and C be the ages of Apu, Beng, and Caleb today. Let k be the number
of years since Apu was 40 years old. The first sentence says that:

A − k = 40 (1) and B − k = 2 (C − k) (2).

The second sentence says that:

A = 2B (3) and C = 28 (4).

Plug (3) into (1) and (4) into (2) to get:

2B − k = 40 (5) and B − k = 2(28 − k) (6).

From (5), we have

k = 2B − 40. (7)

Now plug (7) into (6) to get:

B − (2B − 40) = 2 [28 − (2B − 40)]
40 − B = 2 (68 − 2B) = 136 − 4B
3B = 96.

Thus, B = 32, so Beng is 32 years old today. And from (3), Apu is 64 years old today.


A144. At 3 p.m., Plane A is 300 km northeast of the starting point and Plane B is 600 km
south of it. The angle formed by their flight paths is 3π/4. The distance between the two
planes is the third side of the triangle, two of whose sides are 300 km and 600 km, and whose
angle between those two sides is 3π/4.

[Figure: triangle with sides 300 km and 600 km meeting at an angle of 3π/4; the third side is ≈ 839 km.]

By the Law of Cosines (Proposition 5), the third side of a triangle is given by:

c² = a² + b² − 2ab cos C = 90 000 + 360 000 − 2(300)(600) × (−√2/2) ≈ 704 558.

Hence, c ≈ √704 558 ≈ 839.
Thus, at 3 p.m., the distance between the two planes is about 839 km. And from 3 p.m.
onwards, they’re travelling directly towards each other, with Plane A at 100 km h⁻¹ and Plane
B at 200 km h⁻¹. Thus, the distance between them is shrinking at a rate of 300 km h⁻¹.
Hence, to collide, it will take about another 839/300 h ≈ 2 h 48 min.
Hence, they will collide at about 5.48 p.m.


A145. The given information yields this system of equations:

2 = a ⋅ 1² + b ⋅ 1 + c = a + b + c, (1)

5 = a ⋅ 3² + b ⋅ 3 + c = 9a + 3b + c, (2)

9 = a ⋅ 6² + b ⋅ 6 + c = 36a + 6b + c. (3)

(2) minus (1) yields: 8a + 2b = 3 ⇐⇒ b = 1.5 − 4a. (4)

Plug (4) into (1) to get: a + 1.5 − 4a + c = 2 ⇐⇒ c = 0.5 + 3a. (5)

Now plug (4) and (5) into (3) to get: 36a + 6 (1.5 − 4a) + 3a + 0.5 = 9 ⇐⇒ 15a = −0.5.

Hence, the (unique) solution is (a, b, c) = (−1/30, 49/30, 2/5).
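
As a cross-check on A145, the same 3 × 3 system can be solved exactly by machine. The sketch below uses Cramer's rule with Python's `fractions` module; this is a swapped-in technique rather than the elimination used above, and the helper names `det3` and `solve3` are illustrative assumptions.

```python
from fractions import Fraction

def det3(m):
    """Determinant of a 3x3 matrix given as a list of rows."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def solve3(coeffs, rhs):
    """Solve a 3x3 linear system by Cramer's rule using exact fractions."""
    d = det3(coeffs)
    if d == 0:
        raise ValueError("system has no unique solution")
    solution = []
    for col in range(3):
        # Replace one column of the coefficient matrix by the right-hand side.
        replaced = [row[:col] + [rhs[r]] + row[col + 1:] for r, row in enumerate(coeffs)]
        solution.append(Fraction(det3(replaced), d))
    return tuple(solution)

# Rows are (x^2, x, 1) for the points (1, 2), (3, 5), (6, 9) used in the system.
points = [(1, 2), (3, 5), (6, 9)]
coeffs = [[x * x, x, 1] for x, _ in points]
rhs = [y for _, y in points]
a, b, c = solve3(coeffs, rhs)
print(a, b, c)  # -1/30 49/30 2/5
```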
A146. Recall that the strict global minimum point of a quadratic equation occurs where
x = −b/2a and thus where:

y = a (−b/2a)² + b (−b/2a) + c = b²/4a − b²/2a + c = c − b²/4a.
The above, along with the fact that (−1, 2) solves the given equation, yields the following
system of equations:

2 = a ⋅ (−1)² + b ⋅ (−1) + c = a − b + c (1), 0 = −b/2a (2), and 0 = c − b²/4a (3).

(2) immediately yields b = 0. (3) then yields c = 0. Plugging these into (1), we have a = 2.

Altogether then, the (unique) solution is (a, b, c) = (2, 0, 0).

A147(a) We’ll use Method 2 from the last example — we’ll rewrite the two equations as:
y = x5 − x3 + 2 − √ ,
1+ x

then graph this equation on our GC. It looks like there are no x-intercepts. We thus
conclude that this system of equations has no solutions and its solution set is ∅.
(b) Again, use Method 2 and rewrite the two equations as:
y = 1/ (1 − x²) − x³ − sin x,

then graph this equation on our GC. It looks like there is only one x-intercept. Using the
“zero” function, we find that it’s at x ≈ −1.179.
Plug this value of x back into either of the original equations to get: y ≈ −2.563. We
conclude that the (unique) solution is ∼ (−1.179, −2.563).
A147(c) Recall that x² + y² = 1 describes the unit circle centred on the origin. Recall also
that to get the TI84 to graph it, we must break it up into two equations:

y = √(1 − x²) and y = −√(1 − x²).

1. Enter the three equations y = √(1 − x²), y = −√(1 − x²), and y = sin x.
2. Press GRAPH to graph the three equations.
3. Zoom in by pressing ZOOM , 2 , then ENTER .
It looks like y = sin x intersects the circle x² + y² = 1 at two points. To find what these
two points are, we will use the “intersect” function.
4. Press 2ND , then CALC to bring up the CALCULATE menu. Then press 5 to select
the “intersect” function.
As usual, the TI84 asks, “First curve?” I’ll first look for the intersection that’s to the
right of the y-axis. So:

5. Press ENTER to confirm that we’re selecting y1 = √(1 − x²) as our first curve.
The TI84 now asks, “Second curve?” For our second curve, we want to select y3 = sin (x).
To do so, simply:
6. Press the down arrow key once.

7. Now as usual, use the left and right arrow keys to move the cursor to approximately where
we think the intersection point is. In my case, I’ve moved it to (x, y) ≈ (0.745, 0.678).
8. Now press ENTER . The TI84 now annoyingly asks, “Guess?” So press ENTER
once more to learn that the intersection point is (x, y) ≈ (0.739, 0.674).
By repeating Steps 4–8 (and making the necessary changes), you should be able to find
that the other intersection point is (x, y) ≈ (−0.739, −0.674). (Alternatively, you can save
yourself some time by observing the symmetries here and immediately infer that this is the
other intersection point.)
We conclude that this system of equations has two solutions: (±0.739 . . . , ±0.673 . . . ).
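
The intersection found on the calculator can also be checked by hand-rolled iteration. The sketch below is an assumption-laden shortcut rather than the calculator procedure: substituting y = sin x into x² + y² = 1 gives x² = cos² x, so the intersection to the right of the y-axis satisfies x = cos x, which the fixed-point iteration x ← cos x approximates.

```python
import math

# Substituting y = sin x into x^2 + y^2 = 1 gives x^2 = cos^2 x,
# so the intersection to the right of the y-axis satisfies x = cos x.
x = 1.0  # arbitrary starting guess
for _ in range(200):
    x = math.cos(x)  # fixed-point iteration x_{n+1} = cos(x_n)

y = math.sin(x)
print(round(x, 3), round(y, 3))
print("on unit circle:", abs(x * x + y * y - 1) < 1e-12)
```

By the symmetry noted above, the other intersection point is (−x, −y).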

[Screenshots: the TI84 display after each of Steps 1 to 8.]

A148(a) First, observe that x² + x − 6 = (x − 2) (x + 3). Next write:

$$\frac{8}{x^2 + x - 6} = \frac{A}{x - 2} + \frac{B}{x + 3} = \frac{A(x + 3) + B(x - 2)}{(x - 2)(x + 3)} = \frac{(A + B)x + 3A - 2B}{(x - 2)(x + 3)}.$$

Comparing coefficients, A + B = 0 and 3A − 2B = 8. So, A = 8/5 and B = −8/5. Thus:

$$\frac{8}{x^2 + x - 6} = \frac{8}{5(x - 2)} - \frac{8}{5(x + 3)}.$$

(b) First, observe that 3x² − 8x − 3 = (3x + 1) (x − 3). Next write:

$$\frac{17x - 5}{3x^2 - 8x - 3} = \frac{A}{3x + 1} + \frac{B}{x - 3} = \frac{A(x - 3) + B(3x + 1)}{(3x + 1)(x - 3)} = \frac{(A + 3B)x - 3A + B}{(3x + 1)(x - 3)}.$$

Comparing coefficients, A + 3B = 17 (1) and −3A + B = −5 (2). 3 × (1) plus (2) yields 10B = 46. So, B = 4.6 and A = 3.2. Thus:

$$\frac{17x - 5}{3x^2 - 8x - 3} = \frac{3.2}{3x + 1} + \frac{4.6}{x - 3}.$$

(c) First use the FTGACM to factorise p (x) = x³ − x² − x + 1. Try 1:

p(1) = 1 − 1 − 1 + 1 = 0. ✓

Yay, works! By the FT, x − 1 is a factor of p (x). Next write:

x³ − x² − x + 1 = (x − 1) (ax² + bx + c).

Comparing coefficients, a = 1, −c = 1, and c − b = −1. So, c = −1 and b = 0. Hence, ax² + bx + c = x² − 1 = (x + 1) (x − 1). Thus, x³ − x² − x + 1 = (x + 1) (x − 1)². Now write:

$$\frac{2x^2 - x + 7}{x^3 - x^2 - x + 1} = \frac{A}{x + 1} + \frac{B}{x - 1} + \frac{C}{(x - 1)^2} = \frac{A(x - 1)^2 + B(x + 1)(x - 1) + C(x + 1)}{(x + 1)(x - 1)^2}$$

$$= \frac{Ax^2 - 2Ax + A + Bx^2 - B + Cx + C}{(x + 1)(x - 1)^2} = \frac{(A + B)x^2 + (-2A + C)x + A - B + C}{(x + 1)(x - 1)^2}.$$

Comparing coefficients, A + B = 2, −2A + C = −1, and A − B + C = 7. Summing these three equations, we get 2C = 8 or C = 4. We then find that A = 5/2 and B = −1/2. Thus:

$$\frac{2x^2 - x + 7}{x^3 - x^2 - x + 1} = \frac{5}{2(x + 1)} - \frac{1}{2(x - 1)} + \frac{4}{(x - 1)^2}.$$


A148(d) First use the FTGACM to factorise p (x) = x³ − 2x² + 4x − 8. Try 1:

p(1) = 1 − 2 + 4 − 8 < 0. ✗

Aiyah, sian. Doesn’t work: by the FT, x − 1 is not a factor of p (x). Now try 2 instead:

p(2) = 2³ − 2 ⋅ 2² + 4 ⋅ 2 − 8 = 0. ✓

Yay, works! By the FT, x − 2 is a factor of p (x). Now write:

x³ − 2x² + 4x − 8 = (x − 2) (ax² + bx + c).

Comparing coefficients, we see that a = 1, −2c = −8, and c − 2b = 4. So c = 4 and b = 0.

Hence, ax² + bx + c = x² + 4. This quadratic has negative discriminant and so cannot be
further factorised. Thus, x³ − 2x² + 4x − 8 = (x − 2) (x² + 4). Now write:

$$\frac{-3x^2 + 5}{x^3 - 2x^2 + 4x - 8} = \frac{A}{x - 2} + \frac{Bx + C}{x^2 + 4} = \frac{A(x^2 + 4) + (Bx + C)(x - 2)}{(x - 2)(x^2 + 4)}$$

$$= \frac{Ax^2 + 4A + Bx^2 + (C - 2B)x - 2C}{(x - 2)(x^2 + 4)} = \frac{(A + B)x^2 + (C - 2B)x + 4A - 2C}{(x - 2)(x^2 + 4)}.$$

Comparing coefficients, A + B = −3 (1), C − 2B = 0 (2), and 4A − 2C = 5 (3).

2 × (1) plus (2) yields 2A + C = −6 (4). Now 2 × (4) plus (3) yields 8A = −7 and thus A = −7/8. We can
then figure out that B = −17/8 and C = −17/4. Thus:

$$\frac{-3x^2 + 5}{x^3 - 2x^2 + 4x - 8} = -\frac{7}{8(x - 2)} - \frac{17(x + 2)}{8(x^2 + 4)}.$$
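
The four decompositions in A148 can be checked exactly by evaluating both sides at rational sample points. This is a minimal sketch using Python's `fractions` module; the dictionary layout and the sample grid are assumptions made for illustration. Since each identity is between rational functions of low degree, agreement at dozens of points away from the poles confirms it.

```python
from fractions import Fraction as F

# Each entry pairs the original rational function with its claimed decomposition.
decompositions = {
    "a": (lambda x: F(8) / (x**2 + x - 6),
          lambda x: F(8, 5) / (x - 2) - F(8, 5) / (x + 3)),
    "b": (lambda x: (17 * x - 5) / (3 * x**2 - 8 * x - 3),
          lambda x: F(16, 5) / (3 * x + 1) + F(23, 5) / (x - 3)),
    "c": (lambda x: (2 * x**2 - x + 7) / (x**3 - x**2 - x + 1),
          lambda x: F(5, 2) / (x + 1) - F(1, 2) / (x - 1) + F(4) / (x - 1) ** 2),
    "d": (lambda x: (-3 * x**2 + 5) / (x**3 - 2 * x**2 + 4 * x - 8),
          lambda x: -F(7, 8) / (x - 2) - 17 * (x + 2) / (8 * (x**2 + 4))),
}

# Sample points chosen to avoid every pole (x = -3, -1, -1/3, 1, 2, 3).
poles = {-3, -1, F(-1, 3), 1, 2, 3}
samples = [F(n, 7) for n in range(-40, 41) if F(n, 7) not in poles]
results = {name: all(original(x) == claimed(x) for x in samples)
           for name, (original, claimed) in decompositions.items()}
print(results)
```

Here 3.2 and 4.6 from part (b) are written as the exact fractions 16/5 and 23/5.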


124.19. Ch. 26 Answers (Extraneous Solutions)
A149. As usual, the squaring operation in Step 1 is an irreversible step. That is, the
following implication is true:

sin x + cos x = 1 ⟹ (sin x + cos x)² = 1².

But its converse is not:

(sin x + cos x)² = 1² ⟹ sin x + cos x = 1.

And so, we may have introduced extraneous solutions through Step 1.

To check which solutions are correct and which are extraneous, simply plug each back into
the original equation.

We can easily verify that x = 0 and x = π/2 do indeed satisfy the original equation, while x = π and x = 3π/2 do
not and are thus extraneous solutions.
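
The plug-back check in A149 is easy to automate. In the sketch below, the candidate list 0, π/2, π, 3π/2 is an assumption: these are the values in [0, 2π) at which the squared equation holds. The function names are illustrative only.

```python
import math

def satisfies_original(x, tol=1e-12):
    """Check sin x + cos x = 1."""
    return abs(math.sin(x) + math.cos(x) - 1) < tol

def satisfies_squared(x, tol=1e-12):
    """Check (sin x + cos x)^2 = 1^2."""
    return abs((math.sin(x) + math.cos(x)) ** 2 - 1) < tol

candidates = {"0": 0.0, "pi/2": math.pi / 2, "pi": math.pi, "3pi/2": 3 * math.pi / 2}
genuine = [name for name, x in candidates.items() if satisfies_original(x)]
extraneous = [name for name, x in candidates.items()
              if satisfies_squared(x) and not satisfies_original(x)]
print("genuine:", genuine)
print("extraneous:", extraneous)
```

Every candidate passes the squared equation, but only two pass the original one, which is exactly what an irreversible step allows.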

A150. Plugging x = 1 back into the original equation =, we have:

2 1

√ √
12 1 + 12 + 1 + 1 = 1 + 1 + 1 + 1 = 4 ≠ 0,

so that x = 1 does not solve =.

2 1

The error is in Step 4, which is incorrectly written as a ⇐⇒ (“is equivalent to” or “if
and only if”) step. It is not. It is a squaring operation and is thus, as usual, an irreversible
step that should’ve been written as ⟹. It is in this step that extraneous solutions may
have been introduced.

Since the only “solution” we found was extraneous, = simply has no (real) solutions.

A151. Plug x = 1 into =: 12 + (−1 − ) + 1 = 1 − 2 + 1 = 0.
5 4 4

Plug x = 1 into the original equation: 1² + 1 + 1 = 3 ≠ 0.

The error is in the ⇐⇒ step labelled 2, which should instead be ⟹. The “divide by x” presumes the
existence of some (real) solution to the original equation, where none may exist (and indeed none does), and
may thus introduce extraneous solutions.

Here the original equation simply has no (real) solutions.


125. Part II Answers (Sequences and Series)

125.1. Ch. 27 Answers (Sequences)

A152(a) Define a ∶ {1, 2, 3, 4, 5, 6, 7, 8, 9, 10} → R by a (n) = n².
(b) Define b ∶ {1, 2, 3, . . . , 100} → R by b (n) = 3n − 1.
(c) Define c ∶ {1, 2, 3, 4, 5, 6, 7} → R by c (n) = n³.
(d) Define d ∶ Z+ → R by d (n) = 2n for n odd and d (n) = 3n for n even.
(e) There is no obvious pattern here. So simply define e ∶ {1, 2, 3} → R by e (1) = 5,
e (2) = 0, and e (3) = 99.
(f) Define f ∶ Z+ → R by f (n) = 1 × 2 × ⋅ ⋅ ⋅ × n = n!.
(g) Observe that the 1st, 3rd, 6th, 10th, 15th, and 21st terms are 1, while all the other
terms are 0. Thus, define g ∶ Z+ → R by g (n) = 1 if n is a triangular number (i.e. 1, 3, 6, . . . )
and g (n) = 0 otherwise.

A153(a) (cn ) = (−1, −3, −5, −7, −9, . . . ).

(b) (dn ) = (1, 8, 27, 64, 125, . . . ).
(c) (cn + dn ) = (0, 5, 22, 57, 116, . . . ).
(d) (cn − dn ) = (−2, −11, −32, −71, −134, . . . ).
(e) (cn dn ) = (−1, −24, −135, −448, −1 125, . . . ).
(f) (cn /dn ) = (−1, −3/8, −5/27, −7/64, −9/125, . . . ).
(g) (kcn ) = (−2, −6, −10, −14, −18, . . . ).
(h) (kdn ) = (2, 16, 54, 128, 250, . . . ).

125.2. Ch. 28 Answers (Series)

(This chapter had no exercises.)


125.3. Ch. 29 Answers (Summation Notation Σ)
A154(a) $\sum_{n=1}^{7} n! = 1 + 2 + 6 + 24 + 120 + 720 + 5\,040 = 5\,913$.
(b) $\sum_{n=1}^{8} (3n - 1) = 2 + 5 + 8 + 11 + 14 + 17 + 20 + 23 = 100$.
(c) $\sum_{n=1}^{7} \frac{n}{2} = \frac{1}{2} + 1 + \frac{3}{2} + 2 + \frac{5}{2} + 3 + \frac{7}{2} = 14$.
(d) $\sum_{n=1}^{6} (9 - n) = 8 + 7 + 6 + 5 + 4 + 3 = 33$.

A155(a) $\sum_{n=0}^{6} (n + 1)! = 1 + 2 + 6 + 24 + 120 + 720 + 5\,040 = 5\,913$.
(b) $\sum_{n=0}^{7} (3n + 2) = 2 + 5 + 8 + 11 + 14 + 17 + 20 + 23 = 100$.
(c) $\sum_{n=0}^{6} \left(\frac{n}{2} + \frac{1}{2}\right) = \frac{1}{2} + 1 + \frac{3}{2} + 2 + \frac{5}{2} + 3 + \frac{7}{2} = 14$.
(d) $\sum_{n=0}^{5} (8 - n) = 8 + 7 + 6 + 5 + 4 + 3 = 33$.

A156(a) $\sum_{i=1}^{4} (2 - i)^i = (2 - 1)^1 + (2 - 2)^2 + (2 - 3)^3 + (2 - 4)^4 = 1^1 + 0^2 + (-1)^3 + (-2)^4 = 1 + 0 - 1 + 16 = 16$.
(b) $\sum_{\star=16}^{17} (4\star + 5) = (4 \cdot 16 + 5) + (4 \cdot 17 + 5) = 69 + 73 = 142$.
(c) $\sum_{x=31}^{33} (x - 3) = (31 - 3) + (32 - 3) + (33 - 3) = 28 + 29 + 30 = 87$.
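
Finite sums like these map directly onto Python's built-in `sum`. The sketch below recomputes A154 to A156; the summation limits are assumptions inferred from the terms listed in each answer, and the variable names are illustrative.

```python
from fractions import Fraction
from math import factorial

a154 = [
    sum(factorial(n) for n in range(1, 8)),    # (a) n = 1..7
    sum(3 * n - 1 for n in range(1, 9)),       # (b) n = 1..8
    sum(Fraction(n, 2) for n in range(1, 8)),  # (c) n = 1..7
    sum(9 - n for n in range(1, 7)),           # (d) n = 1..6
]
# A155 re-indexes the same four sums so that each starts at n = 0.
a155 = [
    sum(factorial(n + 1) for n in range(0, 7)),
    sum(3 * n + 2 for n in range(0, 8)),
    sum(Fraction(n, 2) + Fraction(1, 2) for n in range(0, 7)),
    sum(8 - n for n in range(0, 6)),
]
a156 = [
    sum((2 - i) ** i for i in range(1, 5)),    # (a) i = 1..4
    sum(4 * s + 5 for s in range(16, 18)),     # (b) index runs over 16, 17
    sum(x - 3 for x in range(31, 34)),         # (c) x = 31..33
]
print(a154, a155, a156)
```

Note that `range(a, b)` stops at b − 1, so a sum from n = 1 to 7 uses `range(1, 8)`.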

A157(a) ∑ n! = ∑ n! = 1 + 2 + 6 + 24 + 120 + 720 + 5 040 + . . .

(b) ∑ (3n − 1) = ∑ (3n − 1) = 2 + 5 + 8 + 11 + 14 + 17 + 20 + 23 + . . .

n 1 3 5 7
(c) ∑ = ∑ = + 1 + + 2 + + 3 + + ...
n=1 2 2 2 2 2 2

(d) ∑ (9 − n) = ∑ (9 − n) = 8 + 7 + 6 + 5 + 4 + 3 + . . .

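A158(a) $\sum_{n=0}^{\infty} (n + 1)! = 1 + 2 + 6 + 24 + 120 + 720 + 5\,040 + \dots$

(b) $\sum_{n=0}^{\infty} (3n + 2) = 2 + 5 + 8 + 11 + 14 + 17 + 20 + 23 + \dots$

(c) $\sum_{n=0}^{\infty} \left(\frac{n}{2} + \frac{1}{2}\right) = \frac{1}{2} + 1 + \frac{3}{2} + 2 + \frac{5}{2} + 3 + \frac{7}{2} + \dots$

(d) $\sum_{n=0}^{\infty} (8 - n) = 8 + 7 + 6 + 5 + 4 + 3 + \dots$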


125.4. Ch. 30 Answers (Arithmetic Sequences and Series)
A159(a) a1 = 2, ak = 997, and d = 5. So, k = (997 − 2) /5 + 1 = 200 terms. Thus, the sum is:

(k/2) (a1 + ak ) = (200/2) (2 + 997) = 99 900.

(b) b1 = 3, bk = 1 703, and d = 17. So, k = (1 703 − 3) /17 + 1 = 101 terms. Thus, the sum is:

(k/2) (b1 + bk ) = (101/2) (3 + 1 703) = 86 153.

(c) c1 = 81, ck = 8 081, and d = 8. So, k = (8 081 − 81) /8 + 1 = 1 001 terms. Thus, the sum is:

(k/2) (c1 + ck ) = (1 001/2) (81 + 8 081) = 4 085 081.
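
The two-step recipe used in A159 (count the terms, then average the first and last) can be sketched as a small function. The name `arithmetic_series_sum` is an illustrative assumption; part (c) is run with common difference 8, the divisor used in its term count.

```python
def arithmetic_series_sum(first, last, d):
    """Return (number of terms, sum) for first + (first + d) + ... + last."""
    steps, remainder = divmod(last - first, d)
    if remainder != 0:
        raise ValueError("last term is not reachable from first with step d")
    k = steps + 1                        # number of terms
    return k, k * (first + last) // 2    # k(a1 + ak)/2 is always an integer here

print(arithmetic_series_sum(2, 997, 5))     # (200, 99900)
print(arithmetic_series_sum(3, 1703, 17))   # (101, 86153)
print(arithmetic_series_sum(81, 8081, 8))   # (1001, 4085081)
```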

125.5. Ch. 31 Answers (Geometric Sequences and Series)

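A160(a) a1 = 7, ak = 896, and r = 2. By Corollary 6, the sum of this series is:

$$\frac{a_1 - ra_k}{1 - r} = \frac{7 - 2 \cdot 896}{1 - 2} = 1\,785.$$

(b) b1 = 20, bk = 5/8, and r = 1/2. By Corollary 6, the sum of this series is:

$$\frac{b_1 - rb_k}{1 - r} = \frac{20 - \frac{1}{2} \cdot \frac{5}{8}}{1 - \frac{1}{2}} = 2\left(20 - \frac{1}{2} \cdot \frac{5}{8}\right) = 40 - \frac{5}{8} = 39\tfrac{3}{8}.$$

(c) c1 = 1, ck = 1/243, and r = 1/3. By Corollary 6, the sum of this series is:

$$\frac{c_1 - rc_k}{1 - r} = \frac{1 - \frac{1}{3} \cdot \frac{1}{243}}{1 - \frac{1}{3}} = \frac{3}{2}\left(1 - \frac{1}{729}\right) = \frac{364}{243}.$$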

A161(a) a1 = 6 and r = 3/4. Thus, the sum of this series is 6/ (1 − 3/4) = 24.
(b) b1 = 20 and r = 1/2. Thus, the sum of this series is 20/ (1 − 1/2) = 40.
(c) c1 = 1 and r = 1/3. Thus, the sum of this series is 1/ (1 − 1/3) = 3/2.
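
Both geometric-series formulas used in A160 and A161 can be evaluated exactly with Python's `fractions` module. This is a minimal sketch; the function names are illustrative assumptions, and the finite formula is the (a1 − r·ak)/(1 − r) form cited as Corollary 6.

```python
from fractions import Fraction as F

def finite_geometric_sum(first, last, r):
    """Sum of a finite geometric series via (a1 - r*ak) / (1 - r), for r != 1."""
    return (first - r * last) / (1 - r)

def infinite_geometric_sum(first, r):
    """Sum of an infinite geometric series a1 / (1 - r), valid only for |r| < 1."""
    if abs(r) >= 1:
        raise ValueError("series diverges unless |r| < 1")
    return first / (1 - r)

finite = [
    finite_geometric_sum(F(7), F(896), F(2)),         # A160(a)
    finite_geometric_sum(F(20), F(5, 8), F(1, 2)),    # A160(b)
    finite_geometric_sum(F(1), F(1, 243), F(1, 3)),   # A160(c)
]
infinite = [
    infinite_geometric_sum(F(6), F(3, 4)),    # A161(a)
    infinite_geometric_sum(F(20), F(1, 2)),   # A161(b)
    infinite_geometric_sum(F(1), F(1, 3)),    # A161(c)
]
print(finite, infinite)
```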


125.6. Ch. 32 Answers (Rules of Summation Notation)
A162(a) $\sum_{n=5}^{100} 1 = \sum_{n=1}^{100} 1 - \sum_{n=1}^{4} 1 = 100 - 4 = 96$.
(b) $\sum_{n=5}^{100} n = \sum_{n=1}^{100} n - \sum_{n=1}^{4} n = 5\,050 - 10 = 5\,040$.
(c) $\sum_{n=5}^{100} (n + 1) = \sum_{n=5}^{100} n + \sum_{n=5}^{100} 1 = 5\,040 + 96 = 5\,136$.
(d) $\sum_{n=5}^{100} (3n + 2) = 3\sum_{n=5}^{100} n + 2\sum_{n=5}^{100} 1 = 3 \cdot 5\,040 + 2 \cdot 96 = 15\,312$.
(e) $\sum_{n=5}^{100} nx = x\sum_{n=5}^{100} n = 5\,040x$.

A163(a) Let: $S_5 = 1 + 2x + 3x^2 + 4x^3 + 5x^4$.

Then: $xS_5 = x + 2x^2 + 3x^3 + 4x^4 + 5x^5$.

And so: $$S_5 - xS_5 = (1 - x)S_5 = 1 + x + x^2 + x^3 + x^4 - 5x^5 = \frac{1 - x^5}{1 - x} - 5x^5 = \frac{1 - x^5 - 5x^5 + 5x^6}{1 - x} = \frac{1 - 6x^5 + 5x^6}{1 - x}.$$

Thus: $$S_5 = \frac{1 - 6x^5 + 5x^6}{(1 - x)^2}.$$

(b) Let: $S_k = 1 + 2x + 3x^2 + 4x^3 + \dots + kx^{k-1}$.

Then: $xS_k = x + 2x^2 + 3x^3 + 4x^4 + \dots + kx^k$.

And so: $$S_k - xS_k = (1 - x)S_k = 1 + x + x^2 + x^3 + \dots + x^{k-1} - kx^k = \frac{1 - x^k}{1 - x} - kx^k = \frac{1 - x^k - kx^k + kx^{k+1}}{1 - x} = \frac{1 - (k + 1)x^k + kx^{k+1}}{1 - x}.$$

Thus: $$S_k = \frac{1 - (k + 1)x^k + kx^{k+1}}{(1 - x)^2}.$$
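
The closed form for S_k can be compared against term-by-term summation. The sketch below does so exactly with `fractions`; the function names and the particular test values of k and x are assumptions chosen for illustration (any x ≠ 1 would do).

```python
from fractions import Fraction as F

def direct_sum(k, x):
    """S_k = 1 + 2x + 3x^2 + ... + k x^(k-1), summed term by term."""
    return sum(n * x ** (n - 1) for n in range(1, k + 1))

def closed_form(k, x):
    """S_k = (1 - (k+1) x^k + k x^(k+1)) / (1 - x)^2, valid for x != 1."""
    return (1 - (k + 1) * x ** k + k * x ** (k + 1)) / (1 - x) ** 2

checks = [(k, x) for k in (1, 5, 12) for x in (F(1, 2), F(-3, 4), F(2), F(10))]
agree = all(direct_sum(k, x) == closed_form(k, x) for k, x in checks)
print(agree)
```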


125.7. Ch. 33 Answers (Method of Differences)
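A164(a) Observe that 3 = 4 − 1 = 2² − 1, 8 = 9 − 1 = 3² − 1, 15 = 4² − 1, etc.

Hence, the nth term is: $\frac{1}{(n + 1)^2 - 1}$.

And: $$\frac{1}{3} + \frac{1}{8} + \frac{1}{15} + \frac{1}{24} + \frac{1}{35} + \dots + \frac{1}{999\,999} = \sum_{n=1}^{999} \frac{1}{(n + 1)^2 - 1}.$$

Take the nth term, factorise its denominator, then do the partial fractions decomposition:

$$\frac{1}{(n + 1)^2 - 1} = \frac{1}{(n + 1 - 1)(n + 1 + 1)} = \frac{1}{n(n + 2)} = \frac{A}{n} + \frac{B}{n + 2} = \frac{A(n + 2) + Bn}{n(n + 2)} = \frac{(A + B)n + 2A}{n(n + 2)}.$$

Comparing coefficients, A + B = 0 and 2A = 1. Hence, A = 1/2 and B = −1/2 and:

$$\frac{1}{(n + 1)^2 - 1} = \frac{1}{2}\left(\frac{1}{n} - \frac{1}{n + 2}\right).$$

Thus: $$\sum_{n=1}^{999} \frac{1}{(n + 1)^2 - 1} = \frac{1}{3} + \frac{1}{8} + \frac{1}{15} + \frac{1}{24} + \frac{1}{35} + \dots + \frac{1}{999\,999}$$

$$= \frac{1}{2}\left(\frac{1}{1} - \frac{1}{3} + \frac{1}{2} - \frac{1}{4} + \frac{1}{3} - \frac{1}{5} + \frac{1}{4} - \frac{1}{6} + \dots + \frac{1}{997} - \frac{1}{999} + \frac{1}{998} - \frac{1}{1\,000} + \frac{1}{999} - \frac{1}{1\,001}\right).$$

Observe that all the terms with denominators 3 through 999 will be cancelled out.

Thus: $$\sum_{n=1}^{999} \frac{1}{(n + 1)^2 - 1} = \frac{1}{2}\left(\frac{1}{1} + \frac{1}{2} - \frac{1}{1\,000} - \frac{1}{1\,001}\right) = \frac{3}{4} - \frac{1}{2\,000} - \frac{1}{2\,002} = 0.749\ldots$$

More generally: $$\sum_{n=1}^{k} \frac{1}{(n + 1)^2 - 1} = \frac{3}{4} - \frac{1}{2(k + 1)} - \frac{1}{2(k + 2)}.$$

Hence: $$\frac{1}{3} + \frac{1}{8} + \frac{1}{15} + \frac{1}{24} + \frac{1}{35} + \dots = \lim_{k \to \infty} \sum_{n=1}^{k} \frac{1}{(n + 1)^2 - 1} = \lim_{k \to \infty} \left(\frac{3}{4} - \frac{1}{2(k + 1)} - \frac{1}{2(k + 2)}\right) = \frac{3}{4}.$$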


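(b) The nth term is simply: $\lg \frac{n}{n + 1}$ or lg n − lg (n + 1).

$$\sum_{n=1}^{999} \lg \frac{n}{n + 1} = \lg \frac{1}{2} + \lg \frac{2}{3} + \lg \frac{3}{4} + \dots + \lg \frac{999}{1\,000}$$

$$= \lg 1 - \lg 2 + \lg 2 - \lg 3 + \lg 3 - \lg 4 + \dots + \lg 999 - \lg 1\,000 = \lg 1 - \lg 1\,000 = 0 - 3 = -3.$$

More generally: $$\sum_{n=1}^{k} \lg \frac{n}{n + 1} = \lg 1 - \lg (k + 1) = -\lg (k + 1).$$

$$\lg \frac{1}{2} + \lg \frac{2}{3} + \lg \frac{3}{4} + \dots = \lim_{k \to \infty} \sum_{n=1}^{k} \lg \frac{n}{n + 1} = \lim_{k \to \infty} \left(-\lg (k + 1)\right) = -\infty.$$

That is, the infinite series diverges.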

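(c) The nth term is $\frac{1}{(n + 1)\sqrt{n} + n\sqrt{n + 1}}$. Rationalise the surds:

$$\frac{1}{(n + 1)\sqrt{n} + n\sqrt{n + 1}} \cdot \frac{(n + 1)\sqrt{n} - n\sqrt{n + 1}}{(n + 1)\sqrt{n} - n\sqrt{n + 1}} = \frac{(n + 1)\sqrt{n} - n\sqrt{n + 1}}{(n + 1)^2 n - n^2 (n + 1)}$$

$$= \frac{(n + 1)\sqrt{n} - n\sqrt{n + 1}}{n^3 + 2n^2 + n - (n^3 + n^2)} = \frac{(n + 1)\sqrt{n} - n\sqrt{n + 1}}{n^2 + n} = \frac{\sqrt{n}}{n} - \frac{\sqrt{n + 1}}{n + 1}.$$

Thus: $$\sum_{n=1}^{99} \frac{1}{(n + 1)\sqrt{n} + n\sqrt{n + 1}} = \frac{1}{2\sqrt{1} + 1\sqrt{2}} + \frac{1}{3\sqrt{2} + 2\sqrt{3}} + \dots + \frac{1}{100\sqrt{99} + 99\sqrt{100}}$$

$$= \frac{\sqrt{1}}{1} - \frac{\sqrt{2}}{2} + \frac{\sqrt{2}}{2} - \frac{\sqrt{3}}{3} + \dots + \frac{\sqrt{99}}{99} - \frac{\sqrt{100}}{100} = \frac{\sqrt{1}}{1} - \frac{\sqrt{100}}{100} = 1 - \frac{10}{100} = 0.9.$$

More generally: $$\sum_{n=1}^{k} \frac{1}{(n + 1)\sqrt{n} + n\sqrt{n + 1}} = 1 - \frac{\sqrt{k + 1}}{k + 1} = 1 - \frac{1}{\sqrt{k + 1}}.$$

Hence: $$\lim_{k \to \infty} \sum_{n=1}^{k} \frac{1}{(n + 1)\sqrt{n} + n\sqrt{n + 1}} = \lim_{k \to \infty} \left(1 - \frac{1}{\sqrt{k + 1}}\right) = 1.$$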


(d) First, observe that (n + 1)⁴ − n⁴ = 4n³ + 6n² + 4n + 1. Hence:

$$\sum_{n=1}^{k} \left[(n + 1)^4 - n^4\right] = \sum_{n=1}^{k} (4n^3 + 6n^2 + 4n + 1) = 4\sum_{n=1}^{k} n^3 + 6\sum_{n=1}^{k} n^2 + 4\sum_{n=1}^{k} n + \sum_{n=1}^{k} 1$$

$$= 4\sum_{n=1}^{k} n^3 + k(k + 1)(2k + 1) + 2k(k + 1) + k = 4\sum_{n=1}^{k} n^3 + 2k^3 + 5k^2 + 4k. \quad (1)$$

On the other hand, we also have:

$$\sum_{n=1}^{k} \left[(n + 1)^4 - n^4\right] = 2^4 - 1^4 + 3^4 - 2^4 + 4^4 - 3^4 + \dots + (k + 1)^4 - k^4 = (k + 1)^4 - 1^4 = k^4 + 4k^3 + 6k^2 + 4k. \quad (2)$$

Putting (1) and (2) together, we have:

$$4\sum_{n=1}^{k} n^3 + k(k + 1)(2k + 3) + k = k^4 + 4k^3 + 6k^2 + 4k$$

$$\iff \sum_{n=1}^{k} n^3 = \frac{k^4 + 4k^3 + 6k^2 + 4k - (2k^3 + 5k^2 + 4k)}{4} = \frac{k^4 + 2k^3 + k^2}{4} = \frac{k^2 (k + 1)^2}{4}.$$

And so, we have in particular:

$$\sum_{n=1}^{100} n^3 = \frac{100^2 (100 + 1)^2}{4} = \frac{10\,000\,(10\,201)}{4} = 25\,502\,500.$$

The corresponding infinite series diverges. That is:

$$1^3 + 2^3 + 3^3 + \dots = \lim_{k \to \infty} \sum_{n=1}^{k} n^3 = \lim_{k \to \infty} \frac{k^2 (k + 1)^2}{4} = \infty.$$
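
The telescoped results in A164 can be confirmed by brute-force summation. The following sketch compares each closed form with a direct sum; the function and variable names are illustrative assumptions, parts (a) and (d) are checked exactly, and parts (b) and (c) are checked in floating point.

```python
import math
from fractions import Fraction as F

def partial_sum_a(k):
    """Sum of 1/((n+1)^2 - 1) for n = 1..k, computed exactly."""
    return sum(F(1, (n + 1) ** 2 - 1) for n in range(1, k + 1))

def closed_form_a(k):
    """Telescoped form 3/4 - 1/(2(k+1)) - 1/(2(k+2))."""
    return F(3, 4) - F(1, 2 * (k + 1)) - F(1, 2 * (k + 2))

def sum_of_cubes(k):
    """1^3 + 2^3 + ... + k^3, summed term by term."""
    return sum(n ** 3 for n in range(1, k + 1))

def closed_form_cubes(k):
    """k^2 (k+1)^2 / 4, which is always an integer."""
    return k ** 2 * (k + 1) ** 2 // 4

# (b) and (c) involve logarithms and surds, so compare in floating point.
lg_sum = sum(math.log10(n / (n + 1)) for n in range(1, 1000))
surd_sum = sum(1 / ((n + 1) * math.sqrt(n) + n * math.sqrt(n + 1)) for n in range(1, 100))

print(partial_sum_a(999) == closed_form_a(999), float(closed_form_a(999)))
print(lg_sum, surd_sum)
print(sum_of_cubes(100), closed_form_cubes(100))
```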


126. Part III Answers (Vectors)

126.1. Ch. 34 Answers (Introduction to Vectors)

A165. Tail Head Length This vect

Ð→ ⎛4⎞ √
= (4, 7) = = a 65 4 unit(s) E

Ð→ ⎛ −3 ⎞
= (−3, −4) = = b 5 3 “ W
⎝ −4 ⎠

Ð→ ⎛1⎞ √
= (1, 3) = = c 10 1 “ E

Ð→ ⎛ −4 ⎞ √
= (−4, −7) = = d 65 4 “ W
⎝ −7 ⎠

Ð→ ⎛ −1 ⎞ √
= (−1, −3) = = e 10 1 “ W
⎝ −3 ⎠

A166. Here is one possible counterexample (yours may be different but still correct).
Let u = (1, 0) and v = (0, 1). Then ∣u∣ = ∣(1, 0)∣ = √(1² + 0²) = 1 and ∣v∣ = ∣(0, 1)∣ = √(0² + 1²) = 1,
so that ∣u∣ + ∣v∣ = 1 + 1 = 2. However, u + v = (1, 1), so that ∣u + v∣ = ∣(1, 1)∣ = √(1² + 1²) = √2.
Hence, ∣u + v∣ ≠ ∣u∣ + ∣v∣.
(As we’ll learn later, the correct assertion is this: ∣u + v∣ ≤ ∣u∣ + ∣v∣, with ∣u + v∣ = ∣u∣ + ∣v∣ if
and only if u and v point in the same direction.)

A167. Given the vector (4, −3):

(a) If its tail is (0, 0), then its head is (0, 0) + (4, −3) = (4, −3).
(b) If its head is (0, 0), then its tail is (0, 0) − (4, −3) = (−4, 3).
(c) If its tail is (5, 2), then its head is (5, 2) + (4, −3) = (9, −1).
(d) If its head is (5, 2), then its tail is (5, 2) − (4, −3) = (1, 5).

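A168. $\overrightarrow{AC} + \overrightarrow{CB} = \overrightarrow{AB}$, $\overrightarrow{DC} + \overrightarrow{CA} = \overrightarrow{DA}$, $\overrightarrow{BD} + \overrightarrow{DA} = \overrightarrow{BA}$, $\overrightarrow{AD} - \overrightarrow{CD} = \overrightarrow{AD} + \overrightarrow{DC} = \overrightarrow{AC}$,
$-\overrightarrow{DC} - \overrightarrow{BD} = \overrightarrow{CD} + \overrightarrow{DB} = \overrightarrow{CB}$, and $\overrightarrow{BD} + \overrightarrow{DB} = \overrightarrow{BB} = (0, 0) = \mathbf{0}$.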

A169(a) cv = (cv1 , cv2 ).

(b) ∣cv∣ = ∣(cv1 , cv2 )∣ = √((cv1 )² + (cv2 )²) = √(c²v1² + c²v2²) = √(c² (v1² + v2²)) = ∣c∣ √(v1² + v2²) = ∣c∣ ∣v∣.


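A170(a) $\overrightarrow{AB} = (1, 3)$, $\overrightarrow{AC} = (4, 2)$, $\overrightarrow{BC} = (3, -1)$.
$2\overrightarrow{AB} = (2, 6)$, $3\overrightarrow{AC} = (12, 6)$, $4\overrightarrow{BC} = (12, -4)$.
(b) $|2\overrightarrow{AB}| = \sqrt{2^2 + 6^2} = \sqrt{40} = 2\sqrt{10}$, $|\overrightarrow{AB}| = \sqrt{1^2 + 3^2} = \sqrt{10}$, and so $|2\overrightarrow{AB}| = 2|\overrightarrow{AB}|$.
$|3\overrightarrow{AC}| = \sqrt{12^2 + 6^2} = \sqrt{180} = 3\sqrt{20}$, $|\overrightarrow{AC}| = \sqrt{4^2 + 2^2} = \sqrt{20}$, and so $|3\overrightarrow{AC}| = 3|\overrightarrow{AC}|$.
$|4\overrightarrow{BC}| = \sqrt{12^2 + (-4)^2} = \sqrt{160} = 4\sqrt{10}$, $|\overrightarrow{BC}| = \sqrt{3^2 + (-1)^2} = \sqrt{10}$, and so $|4\overrightarrow{BC}| = 4|\overrightarrow{BC}|$.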

A171. The vectors b and c point in the exact opposite directions because c = −3b.
The vectors b and d point in different directions because b ≠ kd for any k.

A172. By Facts 57 and 58, ∣cv̂∣ = ∣c∣ ∣v̂∣ = ∣c∣ ⋅ 1 = ∣c∣.

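A173. The unit vectors of $\overrightarrow{AB} = (1, 3)$, $\overrightarrow{AC} = (4, 2)$, and $\overrightarrow{BC} = (3, -1)$ are, respectively:

$$\widehat{AB} = \frac{(1, 3)}{\sqrt{1^2 + 3^2}} = \frac{1}{\sqrt{10}}(1, 3) = \left(\frac{1}{\sqrt{10}}, \frac{3}{\sqrt{10}}\right),$$

$$\widehat{AC} = \frac{(4, 2)}{\sqrt{4^2 + 2^2}} = \frac{1}{\sqrt{20}}(4, 2) = \left(\frac{2}{\sqrt{5}}, \frac{1}{\sqrt{5}}\right),$$

$$\widehat{BC} = \frac{(3, -1)}{\sqrt{3^2 + (-1)^2}} = \frac{1}{\sqrt{10}}(3, -1) = \left(\frac{3}{\sqrt{10}}, \frac{-1}{\sqrt{10}}\right).$$

Of course, the above are also the unit vectors of $2\overrightarrow{AB}$, $3\overrightarrow{AC}$, and $4\overrightarrow{BC}$, respectively.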

A174. For v = (3, 2), first write 3 = 1α + 3β (1) and 2 = 2α + 4β (2).

(2) minus 2 × (1) yields −2β = −4 or β = 2 and hence α = −3. Thus:

v = (3, 2) = −3 (1, 2) + 2 (3, 4) = −3a + 2b.

For w = (−1, 0), first write −1 = 1α + 3β (1) and 0 = 2α + 4β (2).

(2) minus 2 × (1) yields −2β = 2 or β = −1 and hence α = 2. Thus:

w = (−1, 0) = 2 (1, 2) − (3, 4) = 2a − b.


A175. Since a ∥/ b, any vector can be written as a linear combination of a and b.
For i = (1, 0), first write α + 7β = 1 (1) and 3α + 5β = 0 (2).

(2) minus 3 × (1) yields −16β = −3 or β = 3/16 and hence α = −5/16. Thus:

i = (1, 0) = −(5/16) (1, 3) + (3/16) (7, 5).

For j = (0, 1), first write α + 7β = 0 (1) and 3α + 5β = 1 (2).

(2) minus 3 × (1) yields −16β = 1 or β = −1/16 and hence α = 7/16. Thus:

j = (0, 1) = (7/16) (1, 3) − (1/16) (7, 5).

For d = (1, 1), first write α + 7β = 1 (1) and 3α + 5β = 1 (2).

(2) minus 3 × (1) yields −16β = −2 or β = 1/8 and hence α = 1/8. Thus:

d = (1, 1) = (1/8) (1, 3) + (1/8) (7, 5).
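
Finding the coefficients of a linear combination amounts to solving a 2 × 2 system, which can be done once and for all. The sketch below uses Cramer's rule with exact fractions; the function name `linear_combination` is an illustrative assumption, and a = (1, 3), b = (7, 5) are the vectors used in A175.

```python
from fractions import Fraction as F

def linear_combination(a, b, target):
    """Return (alpha, beta) with alpha*a + beta*b = target, for non-parallel 2D vectors a, b."""
    det = a[0] * b[1] - a[1] * b[0]
    if det == 0:
        raise ValueError("a and b are parallel")
    alpha = F(target[0] * b[1] - target[1] * b[0], det)
    beta = F(a[0] * target[1] - a[1] * target[0], det)
    return alpha, beta

a, b = (1, 3), (7, 5)
coeffs_i = linear_combination(a, b, (1, 0))
coeffs_j = linear_combination(a, b, (0, 1))
coeffs_d = linear_combination(a, b, (1, 1))
print(coeffs_i, coeffs_j, coeffs_d)
```

The zero-determinant check is the algebraic form of the requirement that a and b not be parallel.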

A176. The position vectors of P , Q, and R are:

p = (6a + 5b) / (5 + 6) = (6/11) (1, 2) + (5/11) (3, 4) = (1/11) (21, 32),

q = (a + 5b) / (5 + 1) = (1/6) (1, 4) + (5/6) (2, 3) = (1/6) (11, 19),

r = (3a + 2b) / (2 + 3) = (3/5) (−1, 2) + (2/5) (3, −4) = (1/5) (3, −2).

Thus: P = (21/11, 32/11), Q = (11/6, 19/6), and R = (3/5, −2/5).


126.2. Ch. 35 Answers (Lines)
A177(a) (1, 3) (or any scalar multiple thereof). (b) (1, −2/7) (ditto). (c) (1, 0) (ditto).
(d) (0, 1) (ditto).

A178(a) The line described by −5x + y + 1 = 0 contains the point (0, −1) and has direction
vector (1, 5). Thus, it can also be described by r = (0, −1) + λ (1, 5) (λ ∈ R). λ = −1, λ = 0,
and λ = 1 produce the points (−1, −6), (0, −1), and (1, 4).
(b) The line described by x − 2y − 1 = 0 contains the point (1, 0) and has direction vector
(2, 1). Thus, it can also be described by r = (1, 0) + λ (2, 1) (λ ∈ R). λ = −1, λ = 0, and λ = 1
produce the points (−1, −1), (1, 0), and (3, 1).
(c) The line described by y − 4 = 0 contains the point (0, 4) and has direction vector (1, 0).
Thus, it can also be described by r = (0, 4) + λ (1, 0) (λ ∈ R). λ = −1, λ = 0, and λ = 1
produce the points (−1, 4), (0, 4), and (1, 4).
(d) The line described by x − 4 = 0 contains the point (4, 0) and has direction vector (0, 1).
Thus, it can also be described by r = (4, 0) + λ (0, 1) (λ ∈ R). λ = −1, λ = 0, and λ = 1
produce the points (4, −1), (4, 0), and (4, 1).

A179. If v = 0, then the “line” would not be a line, but the single point P :

{R ∶ r = p + λv (λ ∈ R)} = {R ∶ r = p} = {P } .

And so, we impose the restriction that v ≠ 0 to rule out the above trivial (or degenerate) case.

A180(a) Write out x = −1 + λ (1) and y = 3 − 2λ (2).

Then (2) plus 2 × (1) yields: y + 2x = 1 or y = −2x + 1.

(b) Write out x = 5 + 7λ (1) and y = 6 + 8λ (2).

Then 7 × (2) minus 8 × (1) yields: 7y − 8x = 2 or y = (8/7) x + 2/7.

(c) Write out x = 3λ (1) and y = −3 (2).

This is a horizontal line. We can discard (1) and be left with the single equation y = −3.

(d) Write out x = 1 (1) and y = 1 + 2λ (2).

This is a vertical line. We can discard (2) and be left with the single equation x = 1.

A181(a) (x − (−1)) /1 = (y − 3) / (−2) or y = −2x + 1.
(b) (x − 5) /7 = (y − 6) /8 or y = (8/7) x + 2/7.
(c) y = −3.
(d) x = 1.
126.3. Ch. 36 Answers (The Scalar Product)
A182. v ⋅ w = (2, 1) ⋅ (−4, 0) = −8 + 0 = −8,

v ⋅ x = (2, 1) ⋅ (8, 7) = 16 + 7 = 23,

w ⋅ x = (−4, 0) ⋅ (8, 7) = −32 + 0 = −32.

Since the scalar product is commutative, w ⋅ v = −8, x ⋅ v = 23, and x ⋅ w = −32.

Since the scalar product is distributive, w ⋅ (x + v) = w ⋅ x + w ⋅ v = −32 − 8 = −40.
Also, (2v) ⋅ x = 2 (v ⋅ x) = 2 ⋅ 23 = 46 and w ⋅ (2x) = (2x) ⋅ w = 2 (x ⋅ w) = 2(−32) = −64.

∣a∣ = ∣(−2, 3)∣ = √((−2)² + 3²) = √((−2, 3) ⋅ (−2, 3)) = √(a ⋅ a).
∣b∣ = ∣(7, 1)∣ = √(7² + 1²) = √((7, 1) ⋅ (7, 1)) = √(b ⋅ b).
∣c∣ = ∣(5, −4)∣ = √(5² + (−4)²) = √((5, −4) ⋅ (5, −4)) = √(c ⋅ c).


126.4. Ch. 37 Answers (The Angle Between Two Vectors)
A185(a) The third side of the triangle corresponds to the vector u − v.
(b) ∣u∣, ∣v∣, and ∣u − v∣.
(c) ∣u − v∣² = ∣u∣² + ∣v∣² − 2 ∣u∣ ∣v∣ cos θ.

[Figure: triangle with sides u, v, and u − v, with angle θ between u and v.]

(d) We’ll actually use distributivity twice:

(u − v) ⋅ (u − v) = (u − v) ⋅ u + (u − v) ⋅ (−v)
= u ⋅ u − u ⋅ v − u ⋅ v + v ⋅ v = u ⋅ u + v ⋅ v − 2u ⋅ v.

∣u − v∣² = ∣u∣² + ∣v∣² − 2 ∣u∣ ∣v∣ cos θ (By (c))
⇐⇒ (u − v) ⋅ (u − v) = u ⋅ u + v ⋅ v − 2 ∣u∣ ∣v∣ cos θ (By Fact 67)
⇐⇒ u ⋅ u + v ⋅ v − 2u ⋅ v = u ⋅ u + v ⋅ v − 2 ∣u∣ ∣v∣ cos θ (By (d))
⇐⇒ −2u ⋅ v = −2 ∣u∣ ∣v∣ cos θ
⇐⇒ u ⋅ v = ∣u∣ ∣v∣ cos θ
⇐⇒ θ = cos⁻¹ [u ⋅ v / (∣u∣ ∣v∣)].

A186(a) The angle between u = (2, 0) and v = (0, 17) is:

$$\theta = \cos^{-1} \frac{(2, 0) \cdot (0, 17)}{|(2, 0)|\,|(0, 17)|} = \cos^{-1} \frac{2 \cdot 0 + 0 \cdot 17}{\sqrt{2^2 + 0^2}\sqrt{0^2 + 17^2}} = \cos^{-1} 0 = \frac{\pi}{2}.$$

θ is right. u and v are perpendicular and point in different directions.

(b) The angle between u = (5, 0) and v = (−3, 0) is:

$$\theta = \cos^{-1} \frac{(5, 0) \cdot (-3, 0)}{|(5, 0)|\,|(-3, 0)|} = \cos^{-1} \frac{5 \cdot (-3) + 0 \cdot 0}{\sqrt{5^2 + 0^2}\sqrt{(-3)^2 + 0^2}} = \cos^{-1} \frac{-15}{5 \cdot 3} = \cos^{-1} (-1) = \pi.$$

θ is straight. u and v are parallel and point in exact opposite directions.

(c) The angle between u = (1, 0) and v = (1, √3) is:

$$\theta = \cos^{-1} \frac{(1, 0) \cdot (1, \sqrt{3})}{|(1, 0)|\,|(1, \sqrt{3})|} = \cos^{-1} \frac{1 \cdot 1 + 0 \cdot \sqrt{3}}{\sqrt{1^2 + 0^2}\sqrt{1^2 + (\sqrt{3})^2}} = \cos^{-1} \frac{1}{\sqrt{1}\sqrt{4}} = \cos^{-1} \frac{1}{2} = \frac{\pi}{3}.$$

θ is acute. u and v are neither parallel nor perpendicular and point in different directions.

(d) The angle between u = (2, −3) and v = (1, 2) is:

$$\theta = \cos^{-1} \frac{(2, -3) \cdot (1, 2)}{|(2, -3)|\,|(1, 2)|} = \cos^{-1} \frac{2 \cdot 1 + (-3) \cdot 2}{\sqrt{2^2 + (-3)^2}\sqrt{1^2 + 2^2}} = \cos^{-1} \frac{-4}{\sqrt{13} \cdot \sqrt{5}} \approx 2.0899.$$

θ is obtuse. u and v are neither parallel nor perpendicular and point in different directions.
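
The formula θ = cos⁻¹ [u ⋅ v / (∣u∣ ∣v∣)] translates directly into code. This is a minimal sketch; the function name `angle_between` is an illustrative assumption, and the clamp guards against rounding pushing the cosine slightly outside [−1, 1].

```python
import math

def angle_between(u, v):
    """Angle in radians between two non-zero 2D vectors u and v."""
    dot = sum(a * b for a, b in zip(u, v))
    cos_theta = dot / (math.hypot(*u) * math.hypot(*v))
    # Clamp to [-1, 1] to guard against rounding before taking arccos.
    return math.acos(max(-1.0, min(1.0, cos_theta)))

pairs = [((2, 0), (0, 17)), ((5, 0), (-3, 0)), ((1, 0), (1, math.sqrt(3))), ((2, -3), (1, 2))]
angles = [angle_between(u, v) for u, v in pairs]
print([round(theta, 4) for theta in angles])
```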
A187. ∣u + v∣² = (u + v) ⋅ (u + v) (Fact 67)
= u ⋅ u + u ⋅ v + v ⋅ u + v ⋅ v (Distributivity)
= u ⋅ u + 2u ⋅ v + v ⋅ v (Commutativity)
= ∣u∣² + 2u ⋅ v + ∣v∣² (Fact 67 again)
= ∣u∣² + 0 + ∣v∣² (u ⋅ v = 0 because u ⊥ v)
= ∣u∣² + ∣v∣².

A188. ∣u + v∣² = ∣u∣² + 2u ⋅ v + ∣v∣²
≤ ∣u∣² + 2 ∣u∣ ∣v∣ + ∣v∣² (Cauchy’s Inequality)
= (∣u∣ + ∣v∣)². (Complete the square)

We’ve just shown that ∣u + v∣² ≤ (∣u∣ + ∣v∣)². And so, taking square roots, we also have
∣u + v∣ ≤ ∣u∣ + ∣v∣.

A189.                 (a) (1, 3)                (b) (4, 2)                         (c) (−1, 2)

x-direction cosine    1/√(1² + 3²) = 1/√10      4/√(4² + 2²) = 4/√20 = 2/√5        −1/√((−1)² + 2²) = −1/√5

y-direction cosine    3/√(1² + 3²) = 3/√10      2/√(4² + 2²) = 2/√20 = 1/√5        2/√((−1)² + 2²) = 2/√5

Unit vector           (1/√10, 3/√10)            (2/√5, 1/√5)                       (−1/√5, 2/√5)


126.5. Ch. 38 Answers (The Angle Between Two Lines)
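A190(a) $\cos^{-1} \frac{|(-1, 1) \cdot (2, -3)|}{|(-1, 1)|\,|(2, -3)|} = \cos^{-1} \frac{|-5|}{\sqrt{2}\sqrt{13}} = \cos^{-1} \frac{5}{\sqrt{26}} \approx 0.197$

(b) $\cos^{-1} \frac{|(1, 5) \cdot (8, 1)|}{|(1, 5)|\,|(8, 1)|} = \cos^{-1} \frac{|13|}{\sqrt{26}\sqrt{65}} = \cos^{-1} \frac{1}{\sqrt{10}} \approx 1.249$

(c) $\cos^{-1} \frac{|(2, 6) \cdot (3, 2)|}{|(2, 6)|\,|(3, 2)|} = \cos^{-1} \frac{|18|}{\sqrt{40}\sqrt{13}} = \cos^{-1} \frac{9}{\sqrt{130}} \approx 0.661$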

126.6. Ch. 39 Answers (Vectors vs Scalars)

(This chapter had no exercises.)

126.7. Ch. 40 Answers (Projection Vectors)

A191(a) Since (33, 33) ∥ (1, 1), the projection of (1, 0) on (33, 33) is equal to the projection of (1, 0) on (1, 1). Hence:

∣proj(33,33) (1, 0)∣ = ∣proj(1,1) (1, 0)∣ = ∣(1, 0) ⋅ (1, 1) / ∣(1, 1)∣∣ = (1 ⋅ 1 + 0 ⋅ 1)/√2 = 1/√2 = √2/2.

(b) The length of the projection of (33, 33) on (1, 0) is:

∣proj(1,0) (33, 33)∣ = ∣(33, 33) ⋅ (1, 0) / ∣(1, 0)∣∣ = (33 + 0)/1 = 33.

(c) We just showed that: ∣proj(33,33) (1, 0)∣ ≠ ∣proj(1,0) (33, 33)∣.
Therefore, the given statement is false. In general, the projection of a on b is not the same
as the projection of b on a.
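
To see the asymmetry numerically, here is a small sketch. The helper name `projection_length` is an assumption made for illustration; it computes ∣a ⋅ b∣ / ∣b∣ for a non-zero vector b.

```python
import math

def projection_length(a, b):
    """Length of the projection of a on the non-zero vector b: |a . b| / |b|."""
    dot = sum(x * y for x, y in zip(a, b))
    return abs(dot) / math.sqrt(sum(y * y for y in b))

# Swapping the two arguments generally changes the result.
print(projection_length((1, 0), (33, 33)))
print(projection_length((33, 33), (1, 0)))
```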


126.8. Ch. 41 Answers (Collinearity)
A192(a) The unique line that contains both A = (3, 1) and B = (1, 6) is:

r = OA + λAB = (3, 1) + λ(−2, 5) (λ ∈ R).

If this line also contains C, then there exists λ̂ such that:

C = (0, −1) = (3, 1) + λ̂(−2, 5), or: 0 = 3 − 2λ̂ (1) and −1 = 1 + 5λ̂ (2).

From (1), we have λ̂ = 1.5. But this contradicts (2). So, there is no solution to the above vector equation (or system of two equations), meaning our line does not contain C. Thus, A, B, and C are not collinear.
(b) The unique line that contains both A = (1, 2) and B = (0, 0) is:

r = OA + λAB = (1, 2) + λ(−1, −2) (λ ∈ R).

If this line also contains C, then there exists λ̂ such that:

C = (3, 6) = (1, 2) + λ̂(−1, −2), or: 3 = 1 − 1λ̂ and 6 = 2 − 2λ̂.

Observe that λ̂ = −2 solves the above vector equation (or system of two equations). Hence, our line does indeed contain the point C (it corresponds to λ̂ = −2). Thus, A, B, and C are collinear.


126.9. Ch. 42 Answers (The Vector Product)
A??(a) a × (b + c) = a1 (b2 + c2 ) − a2 (b1 + c1 ) = a1 b2 − a2 b1 + a1 c2 − a2 c1 = a × b + a × c.
(b) a × b = a1 b2 − a2 b1 = − (a2 b1 − a1 b2 ) = − (b1 a2 − b2 a1 ) = −b × a.
(c) a × a = a1 a2 − a2 a1 = 0.

A193. a × b = (1, −2) × (3, 0) = 1 ⋅ 0 − (−2) ⋅ 3 = 0 + 6 = 6.

a × c = (1, −2) × (4, 1) = 1 ⋅ 1 − (−2) ⋅ 4 = 1 + 8 = 9.
b × c = (3, 0) × (4, 1) = 3 ⋅ 1 − 0 ⋅ 4 = 3 − 0 = 3.
By anti-commutativity, b × a = −a × b = −6, c × a = −a × c = −9, and c × b = −b × c = −3.
By distributivity, a × (b + c) = a × b + a × c = 6 + 9 = 15.
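
These values are easy to confirm with a few lines of code. The sketch below assumes the 2D vector product a × b = a1 b2 − a2 b1 used in this chapter; the name `cross2` is hypothetical.

```python
def cross2(a, b):
    """2D vector product: a1*b2 - a2*b1 (a scalar)."""
    return a[0] * b[1] - a[1] * b[0]

a, b, c = (1, -2), (3, 0), (4, 1)
b_plus_c = (b[0] + c[0], b[1] + c[1])

print(cross2(a, b), cross2(a, c), cross2(b, c))
print(cross2(b, a), cross2(c, a), cross2(c, b))  # anti-commutativity flips each sign
print(cross2(a, b_plus_c))                       # distributivity
```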

A194(a) ∣a∣ = √(a1² + a2²), ∣b∣ = √(b1² + b2²), ∣a × b∣ = ∣a1 b2 − a2 b1∣, and:

cos θ = a ⋅ b / (∣a∣ ∣b∣) = (a1 b1 + a2 b2) / (√(a1² + a2²) √(b1² + b2²)).

(b) If θ ∈ [0, π], then sin θ ≥ 0.

(c) The trigonometric identity is sin²θ + cos²θ = 1 (Fact 29). Rearranging, we have:

sin θ = ±√(1 − cos²θ).

But by (b), sin θ ≥ 0 and so we can discard the negative value. Altogether then:

sin θ = √(1 − cos²θ).

(d) sin θ = √(1 − cos²θ) = √(1 − (a1 b1 + a2 b2)² / ((a1² + a2²) (b1² + b2²))).

(e) (a1² + a2²) (b1² + b2²) − (a1 b1 + a2 b2)²
= a1² b1² + a1² b2² + a2² b1² + a2² b2² − (a1² b1² + a2² b2² + 2 a1 a2 b1 b2)
= a1² b2² + a2² b1² − 2 a1 a2 b1 b2 = (a1 b2 − a2 b1)².

(f) ∣a∣ ∣b∣ sin θ = √(a1² + a2²) √(b1² + b2²) √(1 − (a1 b1 + a2 b2)² / ((a1² + a2²) (b1² + b2²)))
= √((a1² + a2²) (b1² + b2²) − (a1 b1 + a2 b2)²)
= √((a1 b2 − a2 b1)²) (by (e))
= ∣a1 b2 − a2 b1∣ = ∣a × b∣.


126.10. Ch. 43 Answers (The Foot of the Perpendicular)
A195. PA = A − P = (−1, 0) − (2, −3) = (−3, 3), PB = B − P = (3, 2) − (2, −3) = (1, 5).

projv PA = proj(5,1) (−3, 3) = [(−3, 3) ⋅ (5, 1) / (5² + 1²)] (5, 1) = [(−15 + 3)/26] (5, 1) = −(6/13) (5, 1),

projv PB = proj(5,1) (1, 5) = [(1, 5) ⋅ (5, 1) / (5² + 1²)] (5, 1) = [(5 + 5)/26] (5, 1) = (5/13) (5, 1).

By Fact 86 then, the feet of the perpendiculars from A and B to the line are:

P + projv PA = (2, −3) − (6/13) (5, 1) = (1/13) (−4, −45),

P + projv PB = (2, −3) + (5/13) (5, 1) = (1/13) (51, −34) = (17/13) (3, −2).
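
The Formula Method can be sketched in code as follows. This is only an illustration under stated assumptions: coordinates are integers (so that exact fractions can be used), and the name `foot_of_perpendicular` is hypothetical.

```python
from fractions import Fraction

def foot_of_perpendicular(P, v, A):
    """Foot of the perpendicular from point A to the line r = P + t*v (v non-zero).

    Computes P + proj_v(PA), with the scalar t = (PA . v) / (v . v) kept exact.
    """
    PA = [a - p for a, p in zip(A, P)]
    t = Fraction(sum(x * y for x, y in zip(PA, v)), sum(y * y for y in v))
    return tuple(p + t * y for p, y in zip(P, v))

P, v = (2, -3), (5, 1)
print(foot_of_perpendicular(P, v, (-1, 0)))  # foot from A
print(foot_of_perpendicular(P, v, (3, 2)))   # foot from B
```

A quick sanity check on any result is that the vector from the given point to its foot has zero scalar product with v.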

A196(a)(i) By definition, projv PA = (PA ⋅ v̂) v̂ = [(PA ⋅ v) / ∣v∣²] v. We’ve just shown that we can write projv PA = λv, with λ = (PA ⋅ v) / ∣v∣².

(ii) We have B = P + projv PA = P + λv or equivalently OB = OP + λv. We’ve just shown that B satisfies l’s vector equation and thus that B is on l.
(b)(i) AB = B − A = P + projv PA − A = AP + projv PA = − (PA − projv PA) = −rejv PA.
(ii) Recall that in general, rejb a ⊥ b. Hence, rejv PA ⊥ v.
(iii) Since rejv PA ⊥ v, we also have AB = − (rejv PA) ⊥ v and hence AB ⊥ l.
(c)(i) BC is a direction vector of l, so that by definition of the foot of the perpendicular, we must have AB ⊥ BC or AB ⋅ BC = 0.
(ii) We will prove that AC ⋅ BC ≠ 0:

AC ⋅ BC = (AB + BC) ⋅ BC = AB ⋅ BC + BC ⋅ BC = 0 + ∣BC∣² = ∣BC∣² > 0.

We’ve just shown that AC ⊥/ BC. And hence, AC ⊥/ l.

A197(a) If A is on l, then as noted in Remark 62, d = 0. Moreover, PA ∥ v, so that by Corollary 11, PA × v̂ = 0 and hence ∣PA × v̂∣ = 0. Thus, we indeed have d = ∣PA × v̂∣.
(b) By Corollary 13, d = ∣AB∣. (c) AB = −rejv PA.
(d) Putting together (b) and (c), we have d = ∣AB∣ = ∣−rejv PA∣ = ∣rejv PA∣.
But by Fact 85, ∣rejv PA∣ = ∣PA × v̂∣. Thus, d = ∣PA × v̂∣.
A198(a) Let P = (8, 3) and v = (9, 3).
Method 1 (Formula Method). First compute PA = (7, 3) − (8, 3) = (−1, 0) and:

projv PA = proj(9,3) (−1, 0) = [(−1, 0) ⋅ (9, 3) / (9² + 3²)] (9, 3) = [(−9 + 0)/90] (9, 3) = −(1/10) (9, 3).

So: B = P + projv PA = (8, 3) − 0.1 (9, 3) = (7.1, 2.7).
And the distance between A and l is:

∣AB∣ = ∣PA × v̂∣ = ∣(−1, 0) × (9, 3)/√(9² + 3²)∣ = ∣((−1) ⋅ 3 − 0 ⋅ 9)/√90∣ = 3/√90 = 1/√10 = √10/10.

Method 2 (Perpendicular Method). Let B = (8, 3) + λ̃ (9, 3). Then:

AB = B − A = (8, 3) + λ̃ (9, 3) − (7, 3) = (9λ̃ + 1, 3λ̃).

Since B is the foot of the perpendicular, we have AB ⊥ l or AB ⊥ v or AB ⋅ (9, 3) = 0:

0 = AB ⋅ (9, 3) = (9λ̃ + 1, 3λ̃) ⋅ (9, 3) = 9 (9λ̃ + 1) + 3 (3λ̃) = 90λ̃ + 9.

Rearranging, λ̃ = −9/90 = −1/10 = −0.1 and so:

B = (8, 3) + λ̃ (9, 3) = (8, 3) − 0.1 (9, 3) = (7.1, 2.7).

And the distance between A and l is:

∣AB∣ = ∣B − A∣ = ∣(7.1, 2.7) − (7, 3)∣ = ∣(0.1, −0.3)∣ = 0.1 ∣(1, −3)∣ = 0.1√10.

Method 3 (Calculus Method). Let R be a generic point on l. Then:

AR = (9λ + 1, 3λ) and ∣AR∣ = √((9λ + 1)² + (3λ)²) = √(90λ² + 18λ + 1).

Differentiate: d/dλ (90λ² + 18λ + 1) = 180λ + 18.

FOC: (180λ + 18) ∣λ=λ̃ = 0 or λ̃ = −18/180 = −1/10.

Alternatively, we could simply have used “−b/2a”: λ̃ = −18/(2 ⋅ 90) = −18/180 = −1/10.
And now, we can find B and ∣AB∣ as we did in Method 2.
A198(b) Let P = (4, 4) and v = (6, 11) − (4, 4) = (2, 7).
Method 1 (Formula Method). First compute PA = (8, 0) − (4, 4) = (4, −4) and:

projv PA = proj(2,7) (4, −4) = [(4, −4) ⋅ (2, 7) / (2² + 7²)] (2, 7) = [(8 − 28)/53] (2, 7) = −(20/53) (2, 7).

So: B = P + projv PA = (4, 4) − (20/53) (2, 7) = (1/53) (172, 72).
And the distance between A and l is:

∣AB∣ = ∣PA × v̂∣ = ∣(4, −4) × (2, 7)/√(2² + 7²)∣ = ∣(4 ⋅ 7 − (−4) ⋅ 2)/√53∣ = 36/√53.

Method 2 (Perpendicular Method). Let B = (4, 4) + λ̃ (2, 7). Then:

AB = B − A = (4, 4) + λ̃ (2, 7) − (8, 0) = (2λ̃ − 4, 7λ̃ + 4).

Since B is the foot of the perpendicular, we have AB ⊥ l or AB ⊥ v or AB ⋅ (2, 7) = 0:

0 = AB ⋅ (2, 7) = (2λ̃ − 4, 7λ̃ + 4) ⋅ (2, 7) = 2 (2λ̃ − 4) + 7 (7λ̃ + 4) = 53λ̃ + 20.

Rearranging, λ̃ = −20/53 and so:

B = (4, 4) + λ̃ (2, 7) = (4, 4) − (20/53) (2, 7) = (1/53) (172, 72).

And the distance between A and l is:

∣AB∣ = ∣B − A∣ = ∣(1/53) (172, 72) − (8, 0)∣ = ∣(1/53) (−252, 72)∣ = (36/53) ∣(−7, 2)∣ = 36√53/53 = 36/√53.

Method 3 (Calculus Method). Let R be a generic point on l. Then:

AR = (2λ − 4, 7λ + 4) and ∣AR∣ = √((2λ − 4)² + (7λ + 4)²) = √(53λ² + 40λ + 32).

Differentiate: d/dλ (53λ² + 40λ + 32) = 106λ + 40.

FOC: (106λ + 40) ∣λ=λ̃ = 0 or λ̃ = −40/106 = −20/53.

Alternatively, we could simply have used “−b/2a”: λ̃ = −40/(2 ⋅ 53) = −20/53.
And now, we can find B and ∣AB∣ as we did in Method 2.
A198(c) Let P = (8, 4) and v = (5, 6).
Method 1 (Formula Method). First compute PA = (8, 5) − (8, 4) = (0, 1) and:

projv PA = proj(5,6) (0, 1) = [(0, 1) ⋅ (5, 6) / (5² + 6²)] (5, 6) = [(0 + 6)/61] (5, 6) = (6/61) (5, 6).

So: B = P + projv PA = (8, 4) + (6/61) (5, 6) = (1/61) (518, 280).
And the distance between A and l is:

∣PA × v̂∣ = ∣(0, 1) × (5, 6)/√(5² + 6²)∣ = ∣(0 − 5)/√61∣ = 5/√61.

Method 2 (Perpendicular Method). Let B = (8, 4) + λ̃ (5, 6). Then:

AB = B − A = (8, 4) + λ̃ (5, 6) − (8, 5) = (5λ̃, 6λ̃ − 1).

Since B is the foot of the perpendicular, we have AB ⊥ l or AB ⊥ v or AB ⋅ (5, 6) = 0:

0 = AB ⋅ (5, 6) = (5λ̃, 6λ̃ − 1) ⋅ (5, 6) = 5 (5λ̃) + 6 (6λ̃ − 1) = 61λ̃ − 6.

Rearranging, λ̃ = 6/61 and so:

B = (8, 4) + λ̃ (5, 6) = (8, 4) + (6/61) (5, 6) = (1/61) (518, 280).

And the distance between A and l is:

∣AB∣ = ∣B − A∣ = ∣(1/61) (518, 280) − (8, 5)∣ = ∣(1/61) (30, −25)∣ = (5/61) ∣(6, −5)∣ = 5√61/61.

Method 3 (Calculus Method). Let R be a generic point on l. Then:

AR = (5λ, 6λ − 1) and ∣AR∣ = √((5λ)² + (6λ − 1)²) = √(61λ² − 12λ + 1).

Differentiate: d/dλ (61λ² − 12λ + 1) = 122λ − 12.

FOC: (122λ − 12) ∣λ=λ̃ = 0 or λ̃ = 12/122 = 6/61.

Alternatively, we could simply have used “−b/2a”: λ̃ = −(−12)/(2 ⋅ 61) = 6/61.
And now, we can find B and ∣AB∣ as we did in Method 2.
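
The Calculus Method reduces to minimising a quadratic in λ, which the sketch below carries out with exact fractions. It assumes integer coordinates, and the name `closest_parameter` is a hypothetical helper. The three calls use the P, v, and A of A198(a), (b), and (c).

```python
from fractions import Fraction

def closest_parameter(P, v, A):
    """Minimise |AR|^2 = a*t^2 + b*t + c over t, where R = P + t*v, using t = -b/(2a).

    Returns the minimising t and the squared distance from A to the line.
    """
    AP = [p - q for p, q in zip(P, A)]
    a = sum(y * y for y in v)                  # coefficient of t^2
    b = 2 * sum(x * y for x, y in zip(AP, v))  # coefficient of t
    c = sum(x * x for x in AP)                 # constant term
    t = Fraction(-b, 2 * a)
    return t, a * t * t + b * t + c

print(closest_parameter((8, 3), (9, 3), (7, 3)))
print(closest_parameter((4, 4), (2, 7), (8, 0)))
print(closest_parameter((8, 4), (5, 6), (8, 5)))
```

Returning the squared distance keeps everything rational; take a square root at the end if the distance itself is wanted.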
126.11. Ch. 44 Answers (Three-Dimensional Space)
(This chapter had no exercises.)

126.12. Ch. 45 Answers (Vectors in 3D)

A199(a) AB = (0 − 2, 1 − 5, 1 − 8) = (−2, −4, −7).
(b) OA = (2, 5, 8) and OB = (0, 1, 1). (c) AA = 0 = (0, 0, 0).

A200. AB = (−2, −4, −7). So, ∣AB∣ = ∣(−2, −4, −7)∣ = √((−2)² + (−4)² + (−7)²) = √69.

A201(a) A + B is undefined.
(b) A − B is the vector BA = (1 − (−1) , 2 − 0, 3 − 7) = (2, 2, −4).
(c) B + C is undefined. Hence, A + (B + C) is also undefined.
(d) B − C is the vector CB = (−1 − 5, 0 − (−2) , 7 − 3) = (−6, 2, 4). And so, A + (B − C) is a
well-defined vector, namely A + (B − C) = A + CB = (1, 2, 3) + (−6, 2, 4) = (−5, 4, 7).
A202(a) u + v = (1, 2, 3) + (−1, 0, 7) = (0, 2, 10).
(b) u − v = (1, 2, 3) − (−1, 0, 7) = (2, 2, −4).
(c) u + (v + w) = u + v + w = (0, 2, 10) + (5, −2, 3) = (5, 0, 13).
(d) u + (v − w) = u + v − w = (0, 2, 10) − (5, −2, 3) = (−5, 4, 7).
A203. AB = B − A = (3, 6, −5) − (5, −1, 0) = (−2, 7, −5).
AC = C − A = (2, 2, 3) − (5, −1, 0) = (−3, 3, 3).
BC = C − B = (2, 2, 3) − (3, 6, −5) = (−1, −4, 8).
AB − AC = (−2, 7, −5) − (−3, 3, 3) = (1, 4, −8) = −BC = CB. ✓
AB + BC = (−2, 7, −5) + (−1, −4, 8) = (−3, 3, 3) = AC. ✓
A204(a) v and w point in the exact opposite directions because v = −1.5w. They are thus also parallel: v ∥ w.
(b) v and x point in different directions because x ≠ kv for all k ∈ R. They are thus also non-parallel: v ∥/ x.
(c) w and x point in different directions because x ≠ kw for all k ∈ R. They are thus also non-parallel: w ∥/ x.
(d) By our definitions (which covered only non-zero vectors), the zero vector 0 does not
point in the same, exact opposite, or different direction as any other vector (including, in
particular, u).
Also, it is neither parallel nor non-parallel to any other vector (including, in particular, u).


A205(a) ∣a∣ = ∣(1, 2, 3)∣ = √(1² + 2² + 3²) = √14. And â = (1, 2, 3)/√14.
(b) ∣b∣ = ∣(4, 5, 6)∣ = √(4² + 5² + 6²) = √77. And b̂ = (4, 5, 6)/√77.
(c) ∣a − b∣ = ∣(−3, −3, −3)∣ = √((−3)² + (−3)² + (−3)²) = √27 = 3√3. And the unit vector of a − b is (−3, −3, −3)/(3√3) = −(1, 1, 1)/√3.
(d) ∣2a∣ = 2√14. And the unit vector of 2a is â = (1, 2, 3)/√14.
(e) ∣3b∣ = 3√77. And the unit vector of 3b is b̂ = (4, 5, 6)/√77.
(f) ∣−4 (a − b)∣ = 12√3. And the unit vector of −4 (a − b) is the negative of the unit vector of a − b, namely (1, 1, 1)/√3.
A206. v = (9, 0, −1) = 9i − k and w = (−7, 3, 5) = −7i + 3j + 5k.
A207. By the Ratio Theorem, the point P that divides the line segment AB in the ratio 2 ∶ 3 has position vector:

p = (µa + λb)/(λ + µ) = [3 (1, 2, 3) + 2 (4, 5, 6)]/(2 + 3) = (1/5) (11, 16, 21).

Hence, the point is P = (11/5, 16/5, 21/5).
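
The Ratio Theorem is a one-line formula, so a short sketch suffices. The function name `divide_segment` and the use of exact fractions (assuming integer inputs) are illustrative choices, not part of the textbook.

```python
from fractions import Fraction

def divide_segment(a, b, lam, mu):
    """Position vector of the point dividing AB in the ratio lam : mu.

    Implements p = (mu*a + lam*b) / (lam + mu) component by component.
    """
    return tuple(Fraction(mu * x + lam * y, lam + mu) for x, y in zip(a, b))

print(divide_segment((1, 2, 3), (4, 5, 6), 2, 3))
```

With lam = mu = 1 the same function returns the midpoint of AB.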
A208(a) Informally, a vector is an “arrow” with two properties: direction and length.
(b) A point and a vector are entirely different objects and should not be confused. Nonetheless, each can be described by an ordered triple of real numbers.
(c) Let A = (a1 , a2 , a3 ) be a point and a = (a1 , a2 , a3 ) be a vector. We say that a is A’s
position vector.
(d) The vector a = (a1 , a2 , a3 ) carries us from the origin to the point A = (a1 , a2 , a3 ).
A209. OA = A − O = (a1, a2, a3) = a = (a1, a2, a3) (written as a column vector) = a1 i + a2 j + a3 k = a⃗.
A210. A + B is undefined.
A + OB is the point (a1 + b1, a2 + b2, a3 + b3).
OA + OB is the vector (a1 + b1, a2 + b2, a3 + b3).
OA − OB is the vector BA = (a1 − b1, a2 − b2, a3 − b3).
OA − BA is the vector OB = (b1, b2, b3).


126.13. Ch. 46 Answers (The Scalar Product in 3D)
A211. (1, 2, 3) ⋅ (4, 5, 6) = 4 + 10 + 18 = 32.
(−2, 4, −6) ⋅ (1, −2, 3) = −2 − 8 − 18 = −28.
A212(a) The angle between the vectors (1, 2, 3) and (4, 5, 6) is:

cos⁻¹ [(1, 2, 3) ⋅ (4, 5, 6) / (∣(1, 2, 3)∣ ∣(4, 5, 6)∣)] = cos⁻¹ [32 / (√14 √77)] ≈ 0.226.

Thus, the vectors (1, 2, 3) and (4, 5, 6) are neither parallel nor perpendicular. Instead, they point in different directions.
(b) The angle between the vectors (−2, 4, −6) and (1, −2, 3) is:

cos⁻¹ [(−2, 4, −6) ⋅ (1, −2, 3) / (∣(−2, 4, −6)∣ ∣(1, −2, 3)∣)] = cos⁻¹ [−28 / (√56 √14)] = cos⁻¹ (−28/28) = cos⁻¹ (−1) = π.

Thus, the vectors (−2, 4, −6) and (1, −2, 3) are parallel (and point in exact opposite directions). Actually, we could also have arrived at this conclusion by observing that u = −2v.
A213(a) The vector (1, 3, −2) has length √(1² + 3² + (−2)²) = √14.

Hence, its unit vector is: (1, 3, −2)/√14.
Its x-, y-, and z-direction cosines are 1/√14, 3/√14, and −2/√14.
And the angles it makes with the positive x-, y-, and z-axes are:

cos⁻¹ (1/√14) ≈ 1.300, cos⁻¹ (3/√14) ≈ 0.641, and cos⁻¹ (−2/√14) ≈ 2.135.

(b) The vector (4, 2, −3) has length √(4² + 2² + (−3)²) = √29.

Hence, its unit vector is: (4, 2, −3)/√29.
Its x-, y-, and z-direction cosines are 4/√29, 2/√29, and −3/√29.
And the angles it makes with the positive x-, y-, and z-axes are:

cos⁻¹ (4/√29) ≈ 0.734, cos⁻¹ (2/√29) ≈ 1.190, and cos⁻¹ (−3/√29) ≈ 2.162.

(c) The vector (−1, 2, −4) has length √((−1)² + 2² + (−4)²) = √21.

Hence, its unit vector is: (−1, 2, −4)/√21.
Its x-, y-, and z-direction cosines are −1/√21, 2/√21, and −4/√21.
And the angles it makes with the positive x-, y-, and z-axes are:

cos⁻¹ (−1/√21) ≈ 1.791, cos⁻¹ (2/√21) ≈ 1.119, and cos⁻¹ (−4/√21) ≈ 2.632.
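
Direction cosines and the corresponding angles can be recomputed with a short script. This is a sketch assuming a non-zero input vector; `direction_cosines` and `axis_angles` are hypothetical helper names.

```python
import math

def direction_cosines(v):
    """Components of the unit vector of v, i.e. the x-, y-, z-direction cosines."""
    length = math.sqrt(sum(x * x for x in v))
    return tuple(x / length for x in v)

def axis_angles(v):
    """Angles (radians) that v makes with the positive coordinate axes."""
    return tuple(math.acos(c) for c in direction_cosines(v))

for vec in [(1, 3, -2), (4, 2, -3), (-1, 2, -4)]:
    print(vec, [round(t, 3) for t in axis_angles(vec)])
```

A useful check is that the squares of the direction cosines always sum to 1.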


126.14. Ch. 47 Answers (The Proj. and Rej. Vectors in 3D)
A214. projb a = (a ⋅ b̂) b̂ = [(2, 5, −1) ⋅ (1, 1, 4) / (1² + 1² + 4²)] (1, 1, 4) = [(2 + 5 − 4)/18] (1, 1, 4) = (1/6) (1, 1, 4).

rejb a = a − projb a = (2, 5, −1) − (1/6) (1, 1, 4) = (1/6) (11, 29, −10).

projc a = (a ⋅ ĉ) ĉ = [(2, 5, −1) ⋅ (−2, −2, 1) / ((−2)² + (−2)² + 1²)] (−2, −2, 1) = [(−4 − 10 − 1)/9] (−2, −2, 1) = −(15/9) (−2, −2, 1) = (5/3) (2, 2, −1).

rejc a = a − projc a = (2, 5, −1) − (5/3) (2, 2, −1) = (1/3) (−4, 5, 2).

(rejb a) ⋅ b = (1/6) (11, 29, −10) ⋅ (1, 1, 4) = (1/6) (11 + 29 − 40) = 0. ✓

(rejc a) ⋅ c = (1/3) (−4, 5, 2) ⋅ (−2, −2, 1) = (1/3) (8 − 10 + 2) = 0. ✓

A215. projb a = (a ⋅ b̂) b̂ = [(1, 2, 3) ⋅ (4, 5, 6) / (4² + 5² + 6²)] (4, 5, 6) = [(4 + 10 + 18)/77] (4, 5, 6) = (32/77) (4, 5, 6).

∣projb a∣ = (32/77) √77 = 32/√77.

rejb a = a − projb a = (1, 2, 3) − (32/77) (4, 5, 6) = (1/77) (−51, −6, 39) = −(3/77) (17, 2, −13).

∣rejb a∣ = (3/77) √(17² + 2² + (−13)²) = (3/77) √462 = (3/77) √(6 ⋅ 77) = 3√(6/77).

projb a = (32/77) b and so projb a ∥ b. Also, projb a points in the same direction as b.

(rejb a) ⋅ b = −(3/77) (17, 2, −13) ⋅ (4, 5, 6) = −(3/77) [17 ⋅ 4 + 2 ⋅ 5 + (−13) ⋅ 6] = 0 and so rejb a ⊥ b.
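
The projection and rejection vectors can be computed exactly with rational arithmetic. The sketch below assumes integer components and a non-zero b; the names `dot`, `proj`, and `rej` are hypothetical helpers chosen to mirror the notation projb a and rejb a.

```python
from fractions import Fraction

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def proj(a, b):
    """Projection of a on b: ((a . b) / (b . b)) b."""
    k = Fraction(dot(a, b), dot(b, b))
    return tuple(k * x for x in b)

def rej(a, b):
    """Rejection of a on b: a - proj_b(a), which is perpendicular to b."""
    return tuple(x - p for x, p in zip(a, proj(a, b)))

a, b, c = (2, 5, -1), (1, 1, 4), (-2, -2, 1)
print(proj(a, b), rej(a, b))
print(proj(a, c), rej(a, c))
print(dot(rej(a, b), b), dot(rej(a, c), c))  # both should be zero
```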


126.15. Ch. 48 Answers (Lines in 3D)
A216(a) (x + 1)/3 = (y − 1)/(−2) = (z − 1)/1. This line is not perpendicular to any of the axes.
(b) (x − 5)/7 = (y − 6)/8 = (z − 1)/1. This line is not perpendicular to any of the axes.
(c) x/3 = (z − 1)/1 and y = −3. This line is perpendicular to the y-axis.
(d) y = z = 9. This line is perpendicular to the y- and z-axes.
(e) x/4 = y/8 = z/5. This line is not perpendicular to any of the axes.
(f) x = 1 and z = 5. This line is perpendicular to the x- and z-axes.

A217(a) Rewrite the given equations: (x − 2/7)/(5/7) = (y − 50/3)/(70/3) = z/(7/8).
So, this line may be described by r = (2/7, 50/3, 0) + λ (5/7, 70/3, 7/8) (λ ∈ R).
It is not perpendicular to any of the axes.
(b) Rewrite the given equations: x/(1/2) = y/(1/3) = z/(1/5).
So, this line may be described by r = (0, 0, 0) + λ (1/2, 1/3, 1/5) (λ ∈ R).
It is not perpendicular to any of the axes.
(c) Rewrite the given equations: (x − 4/17)/(1/17) = (y − 1/3)/(2/3) = z/(1/3).
So, this line may be described by r = (4/17, 1/3, 0) + λ (1/17, 2/3, 1/3) (λ ∈ R).
It is not perpendicular to any of the axes.
(d) Rewrite the given equations: y = 11/3 and (x − 3)/2 = (z − 2/5)/(7/5).
So, this line may be described by r = (3, 11/3, 2/5) + λ (2, 0, 7/5) (λ ∈ R).
It is perpendicular to the y-axis.
(e) The free variable is y. So, this line may be described by:

r = (65, 0, 1/2) + λ (0, 1, 0) (λ ∈ R).

It is perpendicular to the x- and z-axes.

(f) Rewrite the given equations: x = −5/13 and (y − (−2))/5 = z/1.
So, this line may be described by r = (−5/13, −2, 0) + λ (0, 5, 1) (λ ∈ R).
It is perpendicular to the x-axis.


A218(a) If the two lines intersect, then there are real numbers λ̂ and µ̂ such that:

λ̂ = 1 (1), 1 − λ̂ = 3 (2), 1 + λ̂ = 3 + 2µ̂ (3).

(1) immediately contradicts (2). Hence, the two lines do not intersect and are not identical.

The angle between them is:

cos⁻¹ [∣(1, −1, 1) ⋅ (0, 0, 2)∣ / (∣(1, −1, 1)∣ ∣(0, 0, 2)∣)] = cos⁻¹ [∣0 + 0 + 2∣ / (√(1² + (−1)² + 1²) √(0² + 0² + 2²))] = cos⁻¹ [2 / (√3 ⋅ 2)] ≈ 0.955.

The two lines are neither parallel nor perpendicular. And since they do not intersect either, they are skew.
(b) If the two lines intersect, then there are real numbers λ̂ and µ̂ such that:

−1 = 8µ̂ (1), 2 + λ̂ = −3µ̂ (2), 3 = 5µ̂ (3).

(1) immediately contradicts (3). Hence, the two lines do not intersect and are not identical.

The angle between them is:

cos⁻¹ [∣(0, −1, 0) ⋅ (8, −3, 5)∣ / (∣(0, −1, 0)∣ ∣(8, −3, 5)∣)] = cos⁻¹ [∣0 + 3 + 0∣ / (√(0² + 1² + 0²) √(8² + (−3)² + 5²))] = cos⁻¹ [3 / (1 ⋅ √98)] ≈ 1.263.

The two lines are neither parallel nor perpendicular. And since they do not intersect either, they are skew.
(c) If the two lines intersect, then there are real numbers λ̂ and µ̂ such that:

7 + 8λ̂ = 9 + 3µ̂ (1), 3 + 3λ̂ = 3 − 4µ̂ (2), 4 + 4λ̂ = 7 − 3µ̂ (3).

(1) plus (3) yields 11 + 12λ̂ = 16 or λ̂ = 5/12. And now from (3), we have µ̂ = 4/9. But these values of λ̂ and µ̂ contradict (2). Hence, the two lines do not intersect and are not identical.

The angle between them is:

cos⁻¹ [∣(8, 3, 4) ⋅ (3, −4, −3)∣ / (∣(8, 3, 4)∣ ∣(3, −4, −3)∣)] = cos⁻¹ [∣24 − 12 − 12∣ / (∣(8, 3, 4)∣ ∣(3, −4, −3)∣)] = cos⁻¹ 0 = π/2.

The two lines are perpendicular (and hence not parallel). And since they do not intersect either, they are skew.
(d) The two lines have parallel direction vectors because (−3, −6, −3) = −3 (1, 2, 1). Hence,
the two lines are parallel (and thus neither perpendicular nor skew). Since they are parallel,
the angle between them is zero.
The first line does not contain the point (1, 0, 0) — the only point on the first line with
x-coordinate 1 is (1, 2, 3) (plug in λ = 1). Hence, the two lines are not identical. Since the
two lines are parallel and distinct, they do not intersect at all.
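
Checking whether two non-parallel lines in 3D intersect amounts to solving two of the three coordinate equations and testing the third. The sketch below does this with Cramer's rule and exact fractions. It assumes integer inputs and non-parallel direction vectors, and the name `intersect` is hypothetical. The example lines are read off from the three equations in A218(c), namely r = (7, 3, 4) + λ (8, 3, 4) and r = (9, 3, 7) + µ (3, −4, −3).

```python
from fractions import Fraction
from itertools import combinations

def intersect(p, u, q, v):
    """Intersection point of r = p + s*u and r = q + t*v, or None if there is none.

    Assumes u and v are not parallel; parallel lines need a separate check.
    """
    d = [Fraction(qi - pi) for pi, qi in zip(p, q)]
    # Solve s*u - t*v = q - p using two coordinates, then test all three.
    for i, j in combinations(range(3), 2):
        det = u[i] * (-v[j]) - (-v[i]) * u[j]
        if det != 0:
            s = (d[i] * (-v[j]) - (-v[i]) * d[j]) / det
            t = (u[i] * d[j] - u[j] * d[i]) / det
            point = tuple(pi + s * ui for pi, ui in zip(p, u))
            other = tuple(qi + t * vi for qi, vi in zip(q, v))
            return point if point == other else None
    return None  # every 2x2 determinant vanished, so u and v are parallel

print(intersect((7, 3, 4), (8, 3, 4), (9, 3, 7), (3, -4, -3)))  # lines of A218(c)
print(intersect((0, 0, 0), (1, 0, 0), (1, -1, 0), (0, 1, 0)))   # a made-up intersecting pair
```

Non-parallel lines for which this returns None are skew.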


A219(a) The unique line that contains both A and B is:

r = OA + λAB = (3, 1, 2) + λ(−2, 5, 3) (λ ∈ R).

If the above line also contains C, then there exists λ̂ such that:

C = (0, −1, 0) = (3, 1, 2) + λ̂(−2, 5, 3), or: 0 = 3 − 2λ̂ (1), −1 = 1 + 5λ̂ (2), 0 = 2 + 3λ̂ (3).

From (1), we have λ̂ = 1.5. But this contradicts (2). This contradiction means that there is no solution to the above vector equation (or system of three equations).

Thus, our line does not contain C. We conclude that A, B, and C are not collinear.

(b) The unique line that contains both A and B is:

r = (1, 2, 4) + λ(−1, −2, −3) (λ ∈ R).

If the above line also contains C, then there exists λ̂ such that:

C = (3, 6, 10) = (1, 2, 4) + λ̂(−1, −2, −3), or: 3 = 1 − 1λ̂, 6 = 2 − 2λ̂, 10 = 4 − 3λ̂.

As you can verify, λ̂ = −2 solves the above vector equation (or system of three equations).
Thus, our line contains C. We conclude that A, B, and C are collinear.
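
The collinearity test above can be sketched in code: find the candidate λ̂ from one coordinate, then check it against the others. The function name `collinear` is hypothetical, integer coordinates are assumed, and the coordinates of B used below are inferred from A + AB in each part.

```python
from fractions import Fraction

def collinear(A, B, C):
    """True if C lies on the line through A and B (integer coordinates assumed)."""
    AB = [b - a for a, b in zip(A, B)]
    AC = [c - a for a, c in zip(A, C)]
    for ab, ac in zip(AB, AC):
        if ab != 0:
            lam = Fraction(ac, ab)  # the only possible parameter value
            return all(lam * x == y for x, y in zip(AB, AC))
    return True  # A and B coincide, so any C is trivially collinear with them

print(collinear((3, 1, 2), (1, 6, 5), (0, -1, 0)))   # part (a)
print(collinear((1, 2, 4), (0, 0, 1), (3, 6, 10)))   # part (b)
```

The same function works for points in 2D.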


126.16. Ch. 49 Answers (The Vector Product in 3D)
A220(a) Given u = (0, 1, 2) and v = (3, 4, 5), we have:

u × v = (0, 1, 2) × (3, 4, 5) = (1 × 5 − 2 × 4, 2 × 3 − 0 × 5, 0 × 4 − 1 × 3) = (−3, 6, −3).

We next verify that u × v ⊥ u, v:

(u × v) ⋅ u = (−3, 6, −3) ⋅ (0, 1, 2) = 0 + 6 − 6 = 0, ✓

(u × v) ⋅ v = (−3, 6, −3) ⋅ (3, 4, 5) = −9 + 24 − 15 = 0. ✓

(b) Given w = (−1, −2, −3) and x = (1, 0, 5), we have:

w × x = (−1, −2, −3) × (1, 0, 5) = ((−2) × 5 − (−3) × 0, (−3) × 1 − (−1) × 5, (−1) × 0 − (−2) × 1) = (−10, 2, 2).

We next verify that w × x ⊥ w, x:

(w × x) ⋅ w = (−10, 2, 2) ⋅ (−1, −2, −3) = 10 − 4 − 6 = 0, ✓

(w × x) ⋅ x = (−10, 2, 2) ⋅ (1, 0, 5) = −10 + 0 + 10 = 0. ✓

(c) (a × b) ⋅ a = (a2 b3 − a3 b2, a3 b1 − a1 b3, a1 b2 − a2 b1) ⋅ (a1, a2, a3)
= (a2 b3 − a3 b2) a1 + (a3 b1 − a1 b3) a2 + (a1 b2 − a2 b1) a3
= a1 a2 b3 − a1 b2 a3 + b1 a2 a3 − a1 a2 b3 + a1 b2 a3 − b1 a2 a3 = 0.

(a × b) ⋅ b = (a2 b3 − a3 b2, a3 b1 − a1 b3, a1 b2 − a2 b1) ⋅ (b1, b2, b3)
= (a2 b3 − a3 b2) b1 + (a3 b1 − a1 b3) b2 + (a1 b2 − a2 b1) b3
= b1 a2 b3 − b1 b2 a3 + b1 b2 a3 − a1 b2 b3 + a1 b2 b3 − b1 a2 b3 = 0.
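
The component formula for the vector product, and the perpendicularity just proven, can be spot-checked as follows. This is a minimal sketch; `cross` and `dot` are hypothetical helper names.

```python
def cross(a, b):
    """3D vector product: (a2*b3 - a3*b2, a3*b1 - a1*b3, a1*b2 - a2*b1)."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

u, v = (0, 1, 2), (3, 4, 5)
w, x = (-1, -2, -3), (1, 0, 5)
print(cross(u, v), dot(cross(u, v), u), dot(cross(u, v), v))
print(cross(w, x), dot(cross(w, x), w), dot(cross(w, x), x))
```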


A221(a) a × (b + c) = (a1, a2, a3) × (b1 + c1, b2 + c2, b3 + c3)
= (a2 (b3 + c3) − a3 (b2 + c2), a3 (b1 + c1) − a1 (b3 + c3), a1 (b2 + c2) − a2 (b1 + c1))
= (a2 b3 − a3 b2 + a2 c3 − a3 c2, a3 b1 − a1 b3 + a3 c1 − a1 c3, a1 b2 − a2 b1 + a1 c2 − a2 c1)
= (a2 b3 − a3 b2, a3 b1 − a1 b3, a1 b2 − a2 b1) + (a2 c3 − a3 c2, a3 c1 − a1 c3, a1 c2 − a2 c1)
= a × b + a × c.

(b) In Example 676, we already showed that (1, 2, 3) × (4, 5, 6) = (−3, 6, −3).

We now have: (4, 5, 6) × (1, 2, 3) = (3 ⋅ 5 − 2 ⋅ 6, 1 ⋅ 6 − 3 ⋅ 4, 2 ⋅ 4 − 1 ⋅ 5) = (3, −6, 3).

So that indeed: (4, 5, 6) × (1, 2, 3) = −(1, 2, 3) × (4, 5, 6).

To prove that a × b = −b × a, simply write them out:

a × b = (a2 b3 − a3 b2, a3 b1 − a1 b3, a1 b2 − a2 b1) and b × a = (b2 a3 − b3 a2, b3 a1 − b1 a3, b1 a2 − b2 a1).

(c) (a1, a2, a3) × (a1, a2, a3) = (a2 a3 − a3 a2, a3 a1 − a1 a3, a1 a2 − a2 a1) = (0, 0, 0) = 0.

(d) Let a = (a1, a2, a3) and b = (b1, b2, b3). Then da = (da1, da2, da3) and:

(da) × b = (da2 b3 − da3 b2, da3 b1 − da1 b3, da1 b2 − da2 b1) = d (a2 b3 − a3 b2, a3 b1 − a1 b3, a1 b2 − a2 b1) = d (a × b),

a × (db) = (a2 db3 − a3 db2, a3 db1 − a1 db3, a1 db2 − a2 db1) = d (a2 b3 − a3 b2, a3 b1 − a1 b3, a1 b2 − a2 b1) = d (a × b).


A222(a) ∣a∣ = √(a1² + a2² + a3²), ∣b∣ = √(b1² + b2² + b3²),

∣a × b∣ = √((a2 b3 − a3 b2)² + (a3 b1 − a1 b3)² + (a1 b2 − a2 b1)²),

cos θ = a ⋅ b / (∣a∣ ∣b∣) = (a1 b1 + a2 b2 + a3 b3) / (√(a1² + a2² + a3²) √(b1² + b2² + b3²)).

(b) Since θ ∈ [0, π], it must be that sin θ is non-negative, i.e. sin θ ≥ 0.

(c) The trigonometric identity is sin²θ + cos²θ = 1 (Fact 29). Rearranging, we have sin θ = ±√(1 − cos²θ). Since sin θ ≥ 0, we can discard the negative value. Altogether then, we have:

sin θ = √(1 − cos²θ).

(d) sin θ = √(1 − cos²θ) = √(1 − (a ⋅ b / (∣a∣ ∣b∣))²) = √(1 − (a1 b1 + a2 b2 + a3 b3)² / ((a1² + a2² + a3²) (b1² + b2² + b3²))).

(e) As per the hint, fully expand each of LHS and RHS:

LHS = (a1² + a2² + a3²) (b1² + b2² + b3²) − (a1 b1 + a2 b2 + a3 b3)²
= a1² b1² + a1² b2² + a1² b3² + a2² b1² + a2² b2² + a2² b3² + a3² b1² + a3² b2² + a3² b3²
− (a1² b1² + a2² b2² + a3² b3² + 2 a1 a2 b1 b2 + 2 a1 a3 b1 b3 + 2 a2 a3 b2 b3)
= a1² b2² + a1² b3² + a2² b1² + a2² b3² + a3² b1² + a3² b2² − (2 a1 a2 b1 b2 + 2 a1 a3 b1 b3 + 2 a2 a3 b2 b3).

RHS = (a2 b3 − a3 b2)² + (a3 b1 − a1 b3)² + (a1 b2 − a2 b1)²
= a2² b3² + a3² b2² − 2 a2 a3 b2 b3 + a1² b3² + a3² b1² − 2 a1 a3 b1 b3 + a2² b1² + a1² b2² − 2 a1 a2 b1 b2.

“Clearly”, LHS = RHS.

(f) ∣a∣ ∣b∣ sin θ = √((a1² + a2² + a3²) (b1² + b2² + b3²)) √(1 − (a1 b1 + a2 b2 + a3 b3)² / ((a1² + a2² + a3²) (b1² + b2² + b3²)))
= √((a1² + a2² + a3²) (b1² + b2² + b3²) − (a1 b1 + a2 b2 + a3 b3)²)
= √((a2 b3 − a3 b2)² + (a3 b1 − a1 b3)² + (a1 b2 − a2 b1)²) (by (e))
= ∣a × b∣.


126.17. Ch. 50 Answers: The Distance Between a Point and a Line
A223(a) Let P = (8, 3, 4) and v = (9, 3, 7). Let B be the foot of the perpendicular from A to l.
Method 1 (Formula Method). First, PA = (7, 3, 4) − (8, 3, 4) = (−1, 0, 0). So:

PB = projv PA = proj(9,3,7) (−1, 0, 0) = −(9/139) (9, 3, 7).

By Fact 86: B = P + projv PA = (8, 3, 4) − (9/139) (9, 3, 7) = (1/139) (1 031, 390, 493).

And by Corollary 14, the distance between A and l is:

∣AB∣ = ∣PA × v̂∣ = ∣(−1, 0, 0) × (9, 3, 7)/√(9² + 3² + 7²)∣ = ∣(0, 7, −3)/√139∣ = √(58/139).

Method 2 (Perpendicular Method). Let B = (8, 3, 4) + λ̃ (9, 3, 7). Write down AB:

AB = B − A = (8, 3, 4) + λ̃ (9, 3, 7) − (7, 3, 4) = (9λ̃ + 1, 3λ̃, 7λ̃).

Since AB ⊥ l, we have AB ⊥ v or:

0 = AB ⋅ (9, 3, 7) = (9λ̃ + 1, 3λ̃, 7λ̃) ⋅ (9, 3, 7) = 9 (9λ̃ + 1) + 3 (3λ̃) + 7 (7λ̃) = 139λ̃ + 9.

Rearranging, λ̃ = −9/139.

And now: B = (8, 3, 4) − (9/139) (9, 3, 7) = (1/139) (1 031, 390, 493).

Lovely: this is the same as what we found in Method 1. And now, the distance between A and l is:

∣AB∣ = √((9λ̃ + 1)² + (3λ̃)² + (7λ̃)²) = √(139λ̃² + 18λ̃ + 1) = √(58/139).

Method 3 (Calculus Method). Let R be a generic point on l, so that AR = (9λ + 1, 3λ, 7λ) and the distance between A and R is:

∣AR∣ = √(139λ² + 18λ + 1).

Differentiate: d/dλ (139λ² + 18λ + 1) = 278λ + 18.

FOC: (278λ + 18) ∣λ=λ̃ = 0 or λ̃ = −18/278 = −9/139.

Lovely: this is also what we found in Method 2. We can now find B and ∣AB∣ (omitted).
Alternatively, we could simply have used “−b/2a”: λ̃ = −18/ (2 ⋅ 139) = −9/139.
A223(b) Let P = (4, 4, 3) and v = (6, 11, 5) − (4, 4, 3) = (2, 7, 2). Let B be the foot of the perpendicular from A to l.
Method 1 (Formula Method). First, PA = (8, 0, 2) − (4, 4, 3) = (4, −4, −1). So:

PB = projv PA = proj(2,7,2) (4, −4, −1) = −(22/57) (2, 7, 2).

By Fact 86: B = P + projv PA = (4, 4, 3) − (22/57) (2, 7, 2) = (1/57) (184, 74, 127).

And by Corollary 14, the distance between A and l is:

∣AB∣ = ∣PA × v̂∣ = ∣(4, −4, −1) × (2, 7, 2)/√(2² + 7² + 2²)∣ = ∣(−1, −10, 36)/√57∣ = √(1 397/57).

Method 2 (Perpendicular Method). Let B = (4, 4, 3) + λ̃ (2, 7, 2). Write down AB:

AB = B − A = (4, 4, 3) + λ̃ (2, 7, 2) − (8, 0, 2) = (2λ̃ − 4, 7λ̃ + 4, 2λ̃ + 1).

Since AB ⊥ l, we have AB ⊥ v or:

0 = AB ⋅ (2, 7, 2) = (2λ̃ − 4, 7λ̃ + 4, 2λ̃ + 1) ⋅ (2, 7, 2) = 57λ̃ + 22.

Rearranging, λ̃ = −22/57.

And now: B = (4, 4, 3) − (22/57) (2, 7, 2) = (1/57) (184, 74, 127).

Lovely: this is also what we found in Method 1. And now, the distance between A and l is:

∣AB∣ = √((2λ̃ − 4)² + (7λ̃ + 4)² + (2λ̃ + 1)²) = √(57λ̃² + 44λ̃ + 33) = √(1 397/57).

Method 3 (Calculus Method). Let R be a generic point on l, so that AR = (2λ − 4, 7λ + 4, 2λ + 1) and the distance between A and R is:

∣AR∣ = √(57λ² + 44λ + 33).

Differentiate: d/dλ (57λ² + 44λ + 33) = 114λ + 44.

FOC: (114λ + 44) ∣λ=λ̃ = 0 or λ̃ = −44/114 = −22/57.

Lovely: this is also what we found in Method 2. We can now find B and ∣AB∣ (omitted).
Alternatively, we could simply have used “−b/2a”: λ̃ = −44/ (2 ⋅ 57) = −22/57.
A223(c) Let P = (8, 4, 5) and v = (5, 6, 0). Let B be the foot of the perpendicular from A to l.
Method 1 (Formula Method). First, PA = (8, 5, 9) − (8, 4, 5) = (0, 1, 4). So:

PB = projv PA = proj(5,6,0) (0, 1, 4) = (6/61) (5, 6, 0).

By Fact 86: B = P + projv PA = (8, 4, 5) + (6/61) (5, 6, 0) = (1/61) (518, 280, 305).

And by Corollary 14, the distance between A and l is:

∣AB∣ = ∣PA × v̂∣ = ∣(0, 1, 4) × (5, 6, 0)/√(5² + 6² + 0²)∣ = ∣(−24, 20, −5)/√61∣ = √(1 001/61).

Method 2 (Perpendicular Method). Let B = (8, 4, 5) + λ̃ (5, 6, 0). Write down AB:

AB = B − A = (8, 4, 5) + λ̃ (5, 6, 0) − (8, 5, 9) = (5λ̃, 6λ̃ − 1, −4).

Since AB ⊥ l, we have AB ⊥ v or:

0 = AB ⋅ (5, 6, 0) = (5λ̃, 6λ̃ − 1, −4) ⋅ (5, 6, 0) = 5 (5λ̃) + 6 (6λ̃ − 1) + 0 (−4) = 61λ̃ − 6.

Rearranging, λ̃ = 6/61.

And now: B = (8, 4, 5) + (6/61) (5, 6, 0) = (1/61) (518, 280, 305).

Lovely: this is also what we found in Method 1. And now, the distance between A and l is:

∣AB∣ = √((5λ̃)² + (6λ̃ − 1)² + (−4)²) = √(61λ̃² − 12λ̃ + 17) = √(1 001/61).

Method 3 (Calculus Method). Let R be a generic point on l, so that AR = (5λ, 6λ − 1, −4) and the distance between A and R is:

∣AR∣ = √(61λ² − 12λ + 17).

Differentiate: d/dλ (61λ² − 12λ + 17) = 122λ − 12.

FOC: (122λ − 12) ∣λ=λ̃ = 0 or λ̃ = 12/122 = 6/61.

Lovely: this is also what we found in Method 2. We can now find B and ∣AB∣ (omitted).
Alternatively, we could simply have used “−b/2a”: λ̃ = − (−12) / (2 ⋅ 61) = 6/61.
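
The Formula Method for the distance, d = ∣PA × v̂∣, can be sketched as below. To stay exact, the function returns the squared distance ∣PA × v∣² / ∣v∣² as a fraction; this design choice, the integer-coordinate assumption, and the names `cross` and `squared_distance` are all illustrative rather than taken from the textbook.

```python
from fractions import Fraction

def cross(a, b):
    """3D vector product."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def squared_distance(P, v, A):
    """Squared distance from point A to the line r = P + t*v: |PA x v|^2 / |v|^2."""
    PA = tuple(a - p for a, p in zip(A, P))
    c = cross(PA, v)
    return Fraction(sum(x * x for x in c), sum(y * y for y in v))

print(squared_distance((8, 3, 4), (9, 3, 7), (7, 3, 4)))  # A223(a)
print(squared_distance((4, 4, 3), (2, 7, 2), (8, 0, 2)))  # A223(b)
print(squared_distance((8, 4, 5), (5, 6, 0), (8, 5, 9)))  # A223(c)
```

Taking the square root of each result gives the distance itself.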


126.18. Ch. 51 Answers (Planes: Introduction)
A224(a) [Figure: sketch drawn on x- and y-axes.]

(b) (0, 2, 0) and (0, 2, 1).

(c) (0, 0, 3) and (0, 1, 3).

(d) (0, 2, 3) and (1, 2, 3).


126.19. Ch. 52 Answers (Planes: Formally Defined in Vector Form)
A225. B is on the plane q, but A and C are not:
OA ⋅ (−5, 7, 3) = (5, −3, 1) ⋅ (−5, 7, 3) = −25 − 21 + 3 = −43 ≠ −1. ✗
OB ⋅ (−5, 7, 3) = (1, −2, 6) ⋅ (−5, 7, 3) = −5 − 14 + 18 = −1 = −1. ✓
OC ⋅ (−5, 7, 3) = (−2, 2, −3) ⋅ (−5, 7, 3) = 10 + 14 − 9 = 15 ≠ −1. ✗

A226. No, l contains the point (7, 3, 1), which isn’t on q because (7, 3, 1) ⋅ (4, −3, 2) ≠ −10.
A227. a = (2, −2, 2) and c = (−√2, √2, −√2) are parallel to (1, −1, 1), while b = (2, 2, −2) is not. Hence, a and c are normal to q, but not b.

A228(a) AB = B − A = (−2, 3, 0) − (1, −1, 2) = (−3, 4, −2) and AC = C − A = (0, −1, 1) − (1, −1, 2) = (−1, 0, −1). And so, a normal vector of q is:

n = AB × AC = (−3, 4, −2) × (−1, 0, −1) = (−4, −1, 4).
(b) n ⋅ OA = (−4, −1, 4) ⋅ (1, −1, 2) = −4 + 1 + 8 = 5. So, q may be described by r ⋅ (−4, −1, 4) = 5.
(c) Another normal vector of q is m = 2n = 2 (−4, −1, 4) = (−8, −2, 8).
(d) Hence, the plane q may also be described by r ⋅ (−8, −2, 8) = 10.
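
The steps of A228 (two direction vectors, their vector product, then the constant) can be sketched as follows. The function name `plane_through` is hypothetical, and the three points are assumed to be non-collinear so that the normal vector is non-zero.

```python
def cross(a, b):
    """3D vector product."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def plane_through(A, B, C):
    """Return (n, d) such that the plane through A, B, C is r . n = d."""
    AB = tuple(b - a for a, b in zip(A, B))
    AC = tuple(c - a for a, c in zip(A, C))
    n = cross(AB, AC)                       # normal vector
    d = sum(x * y for x, y in zip(n, A))    # n . OA
    return n, d

n, d = plane_through((1, -1, 2), (-2, 3, 0), (0, -1, 1))
print(n, d)
```

Scaling both n and d by the same non-zero constant describes the same plane, which is the point of parts (c) and (d).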

A229. Only b is perpendicular to q’s normal vector (see below). Hence, only b is on q.

a ⋅ (8, −2, 1) = (3, 7, −5) ⋅ (8, −2, 1) = 24 − 14 − 5 = 5 ✗

b ⋅ (8, −2, 1) = (1, 6, 4) ⋅ (8, −2, 1) = 8 − 12 + 4 = 0 ✓
c ⋅ (8, −2, 1) = (3, 10, 1) ⋅ (8, −2, 1) = 24 − 20 + 1 = 5 ✗

A230. Method 1. First find B = A + AB = (1, 4, −1) + (7, 3, −2) = (8, 7, −3). Then show that B does not satisfy the plane’s vector equation and is thus not on q:
OB ⋅ (7, −1, 3) = (8, 7, −3) ⋅ (7, −1, 3) = 56 − 7 − 9 = 40 ≠ 19. ✗
Method 2. Simply check if AB ⊥ (7, −1, 3):
AB ⋅ (7, −1, 3) = (7, 3, −2) ⋅ (7, −1, 3) = 49 − 3 − 6 = 40 ≠ 0. ✗
So no, AB ⊥/ (7, −1, 3). Hence, by Fact 100, B ∉ q.

A231. Since n ⊥/ a, b, we know for sure that n cannot be a normal vector of q.

It is true that m ⊥ a, b. However, observe that a ∥ b and so Corollary 17 does not apply.
That is, we are unable to conclude that m ⊥ q. The answer is thus, “m may be a normal
vector of q, but we don’t know for sure.”
126.20. Ch. 53 Answers (Planes in Cartesian Form)
A232(a) x + 2y + 3z = 17 doesn’t contain the origin.
(b) −x − 2z = 0 contains the origin.
(c) −2y + 5z = −3 doesn’t contain the origin.

A233(a) Rewrite x + 5 = 17y + z as x − 17y − z = −5. Reading off, the plane may also be described by r ⋅ (1, −17, −1) = −5 and doesn’t contain the origin.
(b) Rewrite y + 1 = 0 as 0x + y + 0z = −1. Reading off, the plane may also be described by
r ⋅ (0, 1, 0) = −1 and doesn’t contain the origin.
(c) Rewrite x + z = y − 2 as x − y + z = −2. Reading off, the plane may also be described by
r ⋅ (1, −1, 1) = −2 and doesn’t contain the origin.

A234(a) The plane described by r ⋅ (0, 0, 1) = 32 or z = 32 contains the points (0, 0, 32),
(1, 0, 32), and (0, 1, 32). It does not contain the points (1, 0, 0), (0, 1, 0), or (0, 0, 1).
(b) The plane described by r ⋅ (5, 3, 1) = −2 or 5x + 3y + z = −2 contains the points (−1, 1, 0),
(0, 0, −2), and (0, −1, 1). It does not contain the points (1, 0, 0), (0, 1, 0), or (0, 0, 1).
(c) The plane described by r ⋅ (1, −2, 3) = 0 or x − 2y + 3z = 0 contains the points (0, 0, 0),
(2, 1, 0), and (−3, 0, 1). It does not contain the points (1, 0, 0), (0, 1, 0), or (0, 0, 1).

A235(a) (2, 1, 0), (3, 0, −1), and (0, 3, 2).
(b) (3, −5, 0), (1, 0, −5), and (0, 1, −3).
(c) (4, 0, −1), (4, 1, −1), and (4, 2, −1).
(d) (1, 0, 0), (0, 0, 1), and (1, 0, 1).


126.21. Ch. 54 Answers (Planes in Parametric Form)
A236(a)(i) x + 2y + 3z = 4.
(ii) Since u ⋅ (1, 2, 3) = (2, −1, 0) ⋅ (1, 2, 3) = 0 and v ⋅ (1, 2, 3) = (−3, 0, 1) ⋅ (1, 2, 3) = 0, both
u and v are on the plane. Also, u ∥/ v because they aren’t scalar multiples of each other.
(iii) Since w ⋅ (1, 2, 3) = (−1, −1, 1) ⋅ (1, 2, 3) = 0, the vector w is on the plane.
The vectors u and v are on the same plane and u ∥/ v. Hence, by Theorem 10, we should
be able to write w as the LC of u and v, as indeed we can:

w = (−1, −1, 1) = (2, −1, 0) + (−3, 0, 1) = u + v.

(iv) Since (1, 1, 1) ⋅ (1, 2, 3) ≠ 0, we have (1, 1, 1) ⊥/ (1, 2, 3) and so (1, 1, 1) is not on the
plane. Hence, by Theorem 10, it cannot be written as a LC of u and v.

(b)(i) x − z = 0.
(ii) Since u ⋅ (1, 0, −1) = (0, 1, 0) ⋅ (1, 0, −1) = 0 and v ⋅ (1, 0, −1) = (1, 0, 1) ⋅ (1, 0, −1) = 0, both
u and v are on the plane. Also, u ∥/ v because they aren’t scalar multiples of each other.
(iii) Since w ⋅ (1, 0, −1) = (1, −1, 1) ⋅ (1, 0, −1) = 0, the vector w is on the plane.
The vectors u and v are on the same plane and u ∥/ v. Hence, by Theorem 10, we should
be able to write w as the LC of u and v, as indeed we can:

w = (1, −1, 1) = (1, 0, 1) − (0, 1, 0) = v − u.

(iv) Since (1, 1, 1)⋅(1, 0, −1) = 0, we have (1, 1, 1) ⊥ (1, 0, −1) and so (1, 1, 1) is on the plane.
And hence, by Theorem 10, it can be written as a LC of u and v:

(1, 1, 1) = (1, 0, 1) + (0, 1, 0) = u + v.

(c)(i) 9x + y + z = −5.
(ii) Since u ⋅ (9, 1, 1) = (0, 1, −1) ⋅ (9, 1, 1) = 0 and v ⋅ (9, 1, 1) = (−1, 9, 0) ⋅ (9, 1, 1) = 0, both
u and v are on the plane. Also, u ∥/ v because they aren’t scalar multiples of each other.
(iii) Since w ⋅ (9, 1, 1) = (−1, 10, −1) ⋅ (9, 1, 1) = 0, the vector w is on the plane.
The vectors u and v are on the same plane and u ∥/ v. Hence, by Theorem 10, we should
be able to write w as the LC of u and v, as indeed we can:

w = (−1, 10, −1) = (0, 1, −1) + (−1, 9, 0) = u + v.

(iv) Since (1, 1, 1) ⋅ (9, 1, 1) ≠ 0, we have (1, 1, 1) ⊥/ (9, 1, 1) and so (1, 1, 1) is not on the
plane. Hence, by Theorem 10, it cannot be written as a LC of u and v.


A237(a) The plane r ⋅ (−1, 2, 5) = 5 has cartesian equation: −x + 2y + 5z = 5.
It contains the point (−5, 0, 0) and the non-parallel vectors (2, 1, 0) and (5, 0, 1). And so,
it may be described by the following parametric equation:

r = (−5, 0, 0) + λ (2, 1, 0) + µ (5, 0, 1) = (−5 + 2λ + 5µ, λ, µ) (λ, µ ∈ R).

(b) The plane r ⋅ (0, 0, 1) = 0 has cartesian equation: z = 0.

It contains the point (0, 0, 0) and the non-parallel vectors (1, 0, 0) and (0, 1, 0). And so, it
may be described by the following parametric equation:

r = (0, 0, 0) + λ (1, 0, 0) + µ (0, 1, 0) = (λ, µ, 0) (λ, µ ∈ R).

(c) The plane r ⋅ (1, −3, 5) = −2 has cartesian equation: x − 3y + 5z = −2.

It contains the point (−2, 0, 0) and the non-parallel vectors (3, 1, 0) and (0, 5, 3). And so,
it may be described by the following parametric equation:

r = (−2, 0, 0) + λ (3, 1, 0) + µ (0, 5, 3) = (−2 + 3λ, λ + 5µ, 3µ) (λ, µ ∈ R).

A238(a) This plane contains the vectors (4, 5, 6) and (7, 8, 9). And so, a normal vector is:

(4, 5, 6) × (7, 8, 9) = (−3, 6, −3).

Since (−3, 6, −3) ∥ (1, −2, 1), a normal vector of the plane is (1, −2, 1).
The plane contains the point (1, 2, 3). Since (1, 2, 3) ⋅ (1, −2, 1) = 1 − 4 + 3 = 0, the plane may
be described in vector and cartesian forms by r ⋅ (1, −2, 1) = 0 and x − 2y + z = 0.
(b) First, rewrite the given parametric equation as:

r = (λ − µ, 4λ + 5, 0) = (0, 5, 0) + λ (1, 4, 0) + µ (−1, 0, 0) (λ, µ ∈ R).

Thus, this plane contains the vectors (1, 4, 0) and (−1, 0, 0). And so, a normal vector is:

(1, 4, 0) × (−1, 0, 0) = (0, 0, 4).

Since (0, 0, 4) ∥ (0, 0, 1), a normal vector of the plane is (0, 0, 1).
The plane contains the point (0, 5, 0). Since (0, 5, 0) ⋅ (0, 0, 1) = 0 + 0 + 0 = 0, the plane may
be described in vector and cartesian forms by r ⋅ (0, 0, 1) = 0 and z = 0.
(c) First, rewrite the given parametric equation as:

r = (1 + µ, 1 + λ, λ + µ) = (1, 1, 0) + λ (0, 1, 1) + µ (1, 0, 1) (λ, µ ∈ R).

Thus, this plane contains the vectors (0, 1, 1) and (1, 0, 1). And so, a normal vector is:

(0, 1, 1) × (1, 0, 1) = (1, 1, −1).

The plane contains the point (1, 1, 0). Since (1, 1, 0) ⋅ (1, 1, −1) = 1 + 1 + 0 = 2, the plane may
be described in vector and cartesian forms by r ⋅ (1, 1, −1) = 2 and x + y − z = 2.
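The conversion used in A238 (take the cross product of the two direction vectors, then dot it with a known point) can be checked mechanically. Below is a minimal Python sketch, applied to A238(c); the helper names cross, dot, and plane_from_parametric are illustrative only and are not part of this textbook.

```python
def cross(u, v):
    """Return the cross product u x v of two 3-vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dot(u, v):
    """Return the scalar product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def plane_from_parametric(point, u, v):
    """Given r = point + lambda*u + mu*v, return (n, d) with r . n = d."""
    n = cross(u, v)
    return n, dot(point, n)


# A238(c): r = (1, 1, 0) + lambda (0, 1, 1) + mu (1, 0, 1)
normal, d = plane_from_parametric((1, 1, 0), (0, 1, 1), (1, 0, 1))
print(normal, d)  # (1, 1, -1) 2
```

The output corresponds to r ⋅ (1, 1, −1) = 2, or x + y − z = 2.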
A239(a) The plane that contains the points (7, 3, 4), (8, 3, 4), and (9, 3, 7) also contains
the vectors (8, 3, 4) − (7, 3, 4) = (1, 0, 0) and (9, 3, 7) − (7, 3, 4) = (2, 0, 3).
Since (1, 0, 0) ∥/ (2, 0, 3), the plane may be described in parametric form by:

r = (7, 3, 4) + λ (1, 0, 0) + µ (2, 0, 3) = (7 + λ + 2µ, 3, 4 + 3µ) (λ, µ ∈ R).

It has normal vector (1, 0, 0) × (2, 0, 3) = (0, −3, 0).

It thus also has normal vector (0, 1, 0). Compute (7, 3, 4) ⋅ (0, 1, 0) = 0 + 3 + 0 = 3. So, this
plane may be described in vector or cartesian form by:

r ⋅ (0, 1, 0) = 3 or y = 3.

(b) The plane that contains the points (8, 0, 2), (4, 4, 3), and (2, 7, 2) also contains the
vectors (4, 4, 3) − (8, 0, 2) = (−4, 4, 1) and (2, 7, 2) − (8, 0, 2) = (−6, 7, 0).
Since (−4, 4, 1) ∥/ (−6, 7, 0), the plane may be described in parametric form by:

r = (8, 0, 2) + λ (−4, 4, 1) + µ (−6, 7, 0) = (8 − 4λ − 6µ, 4λ + 7µ, 2 + λ) (λ, µ ∈ R).

This plane has normal vector (−4, 4, 1) × (−6, 7, 0) = (−7, −6, −4).
It thus also has normal vector (7, 6, 4). Compute (8, 0, 2) ⋅ (7, 6, 4) = 56 + 0 + 8 = 64. So, this
plane may be described in vector or cartesian form by:

r ⋅ (7, 6, 4) = 64 or 7x + 6y + 4z = 64.

(c) The plane that contains the points (8, 5, 9), (8, 4, 5), and (5, 6, 0) also contains the
vectors (8, 5, 9) − (8, 4, 5) = (0, 1, 4) and (8, 5, 9) − (5, 6, 0) = (3, −1, 9).
Since (0, 1, 4) ∥/ (3, −1, 9), the plane may be described in parametric form by:

r = (5, 6, 0) + λ (0, 1, 4) + µ (3, −1, 9) = (5 + 3µ, 6 + λ − µ, 4λ + 9µ) (λ, µ ∈ R).

This plane has normal vector (0, 1, 4) × (3, −1, 9) = (13, 12, −3).
Compute (5, 6, 0) ⋅ (13, 12, −3) = 65 + 72 + 0 = 137. So, this plane may be described in vector
or cartesian form by:

r ⋅ (13, 12, −3) = 137 or 13x + 12y − 3z = 137.


126.22. Ch. 56 Answers (The Angle Between a Line and a Plane)
A240(a) The angle between a line with direction vector (−1, 1, 0) and a plane with normal
vector (3, 4, 5) is:

sin⁻¹ [∣(−1, 1, 0) ⋅ (3, 4, 5)∣ / (∣(−1, 1, 0)∣ ∣(3, 4, 5)∣)] = sin⁻¹ [∣1∣ / (√2 √50)] = sin⁻¹ 0.1 ≈ 0.100.

(b) The given line has direction vector (−1, 4, 9)−(−1, 2, 3) = (0, 2, 6) or (0, 1, 3). The given
plane has normal vector (−3, 1, 0) × (0, 5, −3) = (−3, −9, −15) or (1, 3, 5).
Hence, the angle between the line and the plane is:

sin⁻¹ [∣(0, 1, 3) ⋅ (1, 3, 5)∣ / (∣(0, 1, 3)∣ ∣(1, 3, 5)∣)] = sin⁻¹ [∣18∣ / (√10 √35)] ≈ 1.295.

(c) The given line has direction vector (0, 11, 11)−(−1, 2, 3) = (1, 9, 8). The given plane also
contains the vector (1.5, 0, 0) − (0, 0, 1.5) = (1.5, 0, −1.5) or (1, 0, −1); hence, it has normal
vector (4, −1, 0) × (1, 0, −1) = (1, 4, 1).
Hence, the angle between the line and the plane is:

sin⁻¹ [∣(1, 9, 8) ⋅ (1, 4, 1)∣ / (∣(1, 9, 8)∣ ∣(1, 4, 1)∣)] = sin⁻¹ [∣45∣ / (√146 √18)] ≈ 1.071.
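The three angles in A240 all come from the same formula, so they can be recomputed in a few lines. The following Python sketch assumes the direction and normal vectors stated above; the function name line_plane_angle is hypothetical.

```python
import math


def line_plane_angle(direction, normal):
    """Angle (in radians) between a line and a plane.

    direction: a direction vector of the line.
    normal: a normal vector of the plane.
    Uses sin(angle) = |direction . normal| / (|direction| |normal|).
    """
    dot = sum(a * b for a, b in zip(direction, normal))
    length_d = math.sqrt(sum(a * a for a in direction))
    length_n = math.sqrt(sum(a * a for a in normal))
    return math.asin(abs(dot) / (length_d * length_n))


for v, n in [((-1, 1, 0), (3, 4, 5)),   # A240(a)
             ((0, 1, 3), (1, 3, 5)),    # A240(b)
             ((1, 9, 8), (1, 4, 1))]:   # A240(c)
    print(round(line_plane_angle(v, n), 3))
```

Note that the formula uses sin⁻¹ rather than cos⁻¹ because the angle is measured from the plane, not from its normal.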

A241(a) The line l has direction vector v = (2, 3, 5) and the plane q has normal vector
n = (−10, 0, 4). Now, v ⋅ n = (2, 3, 5) ⋅ (−10, 0, 4) = −20 + 0 + 20 = 0. So, v ⊥ n and thus l ∥ q.
The point (4, 5, 6) is on l but not on q because (4, 5, 6)⋅(−10, 0, 4) = −40+0+24 = −16 ≠ −26.
Hence, l and q do not intersect at all.
(b) The line l has direction vector v = (5, 5, 6) − (3, 2, 1) = (2, 3, 5).
The plane q has normal vector n = (2, 0, 5) × (2, 1, 5) = (−5, 0, 2).
Since v ⋅ n = (2, 3, 5) ⋅ (−5, 0, 2) = −10 + 0 + 10 = 0, we have v ⊥ n and thus l ∥ q.
Compute (3, 0, 1) ⋅ (−5, 0, 2) = −15 + 0 + 2 = −13. So, q has vector equation r ⋅ (−5, 0, 2) = −13.
The point (3, 2, 1) is on l and is also on q because (3, 2, 1) ⋅ (−5, 0, 2) = −15 + 0 + 2 = −13.
Hence, the line lies entirely on the plane.
(c) The line l has direction vector v = (6, 8, 11) − (4, 5, 6) = (2, 3, 5).
The plane q contains the vector (2, 1, −2) − (2, 0, −2) = (0, 1, 0) and thus has normal vector
n = (0, 1, 0) × (3, 0, 10) = (10, 0, −3).
Since v ⋅ n = (2, 3, 5) ⋅ (10, 0, −3) = 20 + 0 − 15 = 5 ≠ 0, we have v ⊥/ n and thus l ∥/ q. Hence,
l and q share exactly one intersection point.
Compute (2, 0, −2) ⋅ (10, 0, −3) = 20 + 0 + 6 = 26. So, q has vector equation r ⋅ (10, 0, −3) = 26.
To find the intersection point, plug a generic point of l into q’s vector equation:

[(4, 5, 6) + λ̂(2, 3, 5)] ⋅ (10, 0, −3) = 26 ⇐⇒ 22 + 5λ̂ = 26 ⇐⇒ λ̂ = 0.8.

Thus, l and q intersect at: (4, 5, 6) + λ̂(2, 3, 5) = (4, 5, 6) + 0.8(2, 3, 5) = (5.6, 7.4, 10).
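The substitution step in A241(c) (plug a generic point of l into q’s vector equation and solve for the parameter) can be sketched in Python with exact fractions. The function name line_plane_intersection is an assumption made for illustration; it returns None whenever the line is parallel to the plane, without distinguishing the cases in A241(a) and (b).

```python
from fractions import Fraction


def dot(u, v):
    """Return the scalar product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def line_plane_intersection(point, direction, normal, d):
    """Intersect the line r = point + t*direction with the plane r . normal = d.

    Returns the intersection point as a tuple of Fractions, or None if the
    line is parallel to the plane (no unique intersection point).
    """
    denominator = dot(direction, normal)
    if denominator == 0:
        return None
    t = Fraction(d - dot(point, normal), denominator)
    return tuple(p + t * v for p, v in zip(point, direction))


# A241(c): l through (4, 5, 6) with direction (2, 3, 5); q: r . (10, 0, -3) = 26
print(line_plane_intersection((4, 5, 6), (2, 3, 5), (10, 0, -3), 26))
# A241(a): the line is parallel to the plane, so None is returned
print(line_plane_intersection((4, 5, 6), (2, 3, 5), (-10, 0, 4), -26))
```

The first result is (28/5, 37/5, 10), which is the point (5.6, 7.4, 10) found above.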
Ch. 57 Answers (The Angle Between Two Planes)

A242(a) The angle between planes with normal vectors (−1, −2, −3) and (3, 4, 5) is:

θ = cos⁻¹ [∣(−1, −2, −3) ⋅ (3, 4, 5)∣ / (∣(−1, −2, −3)∣ ∣(3, 4, 5)∣)] = cos⁻¹ [∣−26∣ / (√14 √50)] ≈ 0.186.

(b) The first plane has normal vector (1, −1, 0) × (3, 5, −1) = (1, 1, 8).
The second plane has normal vector (0, 1, 0) × (10, 2, 3) = (3, 0, −10).
And so, the angle between them is:

cos⁻¹ [∣(1, 1, −8) ⋅ (3, 0, −10)∣ / (∣(1, 1, −8)∣ ∣(3, 0, −10)∣)] = cos⁻¹ [∣83∣ / (√66 √109)] ≈ 0.207.

(c) The first plane contains vectors (3, 0, 0) − (1, 1, 0) = (2, −1, 0) and (3, 0, 0) − (0, 0, 1) =
(3, 0, −1). And so, it has normal vector (2, −1, 0) × (3, 0, −1) = (1, 2, 3).
The second plane contains vectors (1, 0, −1) − (1, −1, 0) = (0, 1, −1) and (0, 3, 1) − (1, −1, 0) =
(−1, 4, 1). And so, it has normal vector (0, 1, −1) × (−1, 4, 1) = (5, 1, 1).
Thus, the angle between the two planes is:

θ = cos⁻¹ [∣(1, 2, 3) ⋅ (5, 1, 1)∣ / (∣(1, 2, 3)∣ ∣(5, 1, 1)∣)] = cos⁻¹ [∣10∣ / (√14 √27)] ≈ 1.031.
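As a rough numerical check of A242(a) and (c), the angle between two planes can be computed from their normal vectors. This is only a sketch; the function name plane_plane_angle is hypothetical.

```python
import math


def plane_plane_angle(n1, n2):
    """Acute (or right) angle in radians between planes with normals n1 and n2.

    Uses cos(angle) = |n1 . n2| / (|n1| |n2|).
    """
    dot = sum(a * b for a, b in zip(n1, n2))
    length_1 = math.sqrt(sum(a * a for a in n1))
    length_2 = math.sqrt(sum(a * a for a in n2))
    return math.acos(abs(dot) / (length_1 * length_2))


print(round(plane_plane_angle((-1, -2, -3), (3, 4, 5)), 3))  # A242(a)
print(round(plane_plane_angle((1, 2, 3), (5, 1, 1)), 3))     # A242(c)
```

Taking the absolute value of the scalar product ensures that the angle returned is never obtuse.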

A243(a) Since (4, 9, 3) ∥/ (1, 1, 2), q1 and q2 are not parallel and intersect along a line
with direction vector (4, 9, 3) × (1, 1, 2) = (15, −5, −5) or (−3, 1, 1).
To find an intersection point, plug x = 0 into their cartesian equations to get 9y + 3z = 61 (1)
and y + 2z = 19 (2). (1) minus 9 × (2) yields −15z = −110 or z = 22/3. And so, y = 13/3. Thus, their
intersection line has vector equation r = (0, 13/3, 22/3) + λ (−3, 1, 1) (λ ∈ R).

(b) q1 has normal vector (1, −1, 0) × (1, −1, 1) = (−1, −1, 0) or (1, 1, 0).
q2 has normal vector (6, −1, 0) × (8, 0, −1) = (1, 6, 8).
Since (1, 1, 0) ∥/ (1, 6, 8), q1 and q2 are not parallel and intersect along a line with direction
vector (1, 1, 0) × (1, 6, 8) = (8, −8, 5).
Compute (1, 3, −2) ⋅ (1, 1, 0) = 1 + 3 + 0 = 4 and (2, 3, 5) ⋅ (1, 6, 8) = 2 + 18 + 40 = 60. Hence, q1
and q2 have cartesian equations x + y = 4 and x + 6y + 8z = 60.
To find an intersection point, plug in x = 0 to get y = 4 (1) and 6y + 8z = 60 (2). So, y = 4
and z = 4.5. Thus, their intersection line has vector equation r = (0, 4, 4.5) + λ (8, −8, 5)
(λ ∈ R).
A243(c) q1 contains the vectors (7, 7, 0) − (1, 1, 6) = (6, 6, −6) or (1, 1, −1) and (5, 3, 3) −
(1, 1, 6) = (4, 2, −3). And so, it has normal vector (1, 1, −1) × (4, 2, −3) = (−1, −1, −2) or
(1, 1, 2).
q2 contains the vectors (7, 3, 1) − (5, 5, 1) = (2, −2, 0) or (1, −1, 0) and (5, 5, 1) − (3, 5, 2) =
(2, 0, −1). And so, it has normal vector (1, −1, 0) × (2, 0, −1) = (1, 1, 2).
Clearly, q1 and q2 are parallel.
To check if they intersect at all, we’ll pick any point on q1 — say (7, 7, 0) — and check
if it’s on q2 . Compute (5, 5, 1) ⋅ (1, 1, 2) = 5 + 5 + 2 = 12 — hence, q2 has vector equation
r ⋅ (1, 1, 2) = 12. Since (7, 7, 0) ⋅ (1, 1, 2) = 7 + 7 + 0 = 14 ≠ 12, the point (7, 7, 0) is not on q2 .
Thus, q1 and q2 do not intersect at all.

(d) q1 contains the vectors (5, 3, 2)−(1, 5, 3) = (4, −2, −1) and (10, 0, 1)−(5, 3, 2) = (5, −3, −1).
And so, it has normal vector (4, −2, −1) × (5, −3, −1) = (−1, −1, −2) or (1, 1, 2).
q2 contains the vectors (8, 8, −2)−(5, −1, 4) = (3, 9, −6) or (1, 3, −2) and (8, 8, −2)−(3, 5, 2) =
(5, 3, −4). And so, it has normal vector (1, 3, −2) × (5, 3, −4) = (−6, −6, −12) or (1, 1, 2).
Clearly, the two planes are parallel.
To check if they intersect at all, we’ll pick any point on q1 — say (10, 0, 1) — and check
if it’s on q2 . Compute (3, 5, 2) ⋅ (1, 1, 2) = 3 + 5 + 4 = 12 — hence, q2 has vector equation
r ⋅ (1, 1, 2) = 12. Since (10, 0, 1) ⋅ (1, 1, 2) = 10 + 0 + 2 = 12, the point (10, 0, 1) is on the second
plane. Since q1 and q2 are parallel and share at least one intersection point, they must be
identical.

(e) Since (7, 1, 1) ∥/ (1, 1, 2), q1 and q2 are not parallel and intersect along a line with
direction vector (7, 1, 1) × (1, 1, 2) = (1, −13, 6).
To find an intersection point, plug x = 0 into their cartesian equations to get y + z = 42 (1) and
y + 2z = 6 (2). (2) minus (1) yields z = −36. And so, y = 78. Thus, their intersection line has
vector equation r = (0, 78, −36) + λ (1, −13, 6) (λ ∈ R).

(f) Since (0, 1, 3) ∥/ (−1, 1, 3), q1 and q2 are not parallel and intersect along a line with
direction vector (0, 1, 3) × (−1, 1, 3) = (0, −3, 1).
Observe that if here we try plugging x = 0 into their cartesian equations, then we get
y + 3z = 0 (1) and y + 3z = 2 (2), which are contradictory. This contradiction tells us that the two
planes have no intersection point with x-coordinate 0.

So, let’s instead try plugging in y = 0 to get 3z = 0 (3) and −x + 3z = 2 (4). Solving, z = 0 and x = −2.

Thus, their intersection line has vector equation r = (−2, 0, 0) + λ (0, −3, 1) (λ ∈ R).


126.23. Ch. 58 Answers (Point-Plane Foot and Distance)
A244(a) BC is a vector on q, so that by definition of the foot of the perpendicular, we
must have AB ⊥ BC or AB ⋅ BC = 0.
(b) We show that AC ⋅ BC ≠ 0:

AC ⋅ BC = (AB + BC) ⋅ BC = AB ⋅ BC + BC ⋅ BC = 0 + ∣BC∣² = ∣BC∣² > 0.

We’ve just shown that AC ⊥/ BC. And hence, AC ⊥/ q.

A245(a) Write B = A + kn = (7, 3, 4) + k(9, 3, 7). Since B ∈ q, we have:

OB ⋅ (9, 3, 7) = 109 or [(7, 3, 4) + k(9, 3, 7)] ⋅ (9, 3, 7) = 109 or 100 + 139k = 109.

So, k = 9/139 and B = A + kn = (7, 3, 4) + (9/139) (9, 3, 7) = (1/139) (1054, 444, 619).
And the distance between A and q is:

∣AB∣ = ∣kn∣ = ∣k∣ ∣n∣ = ∣9/139∣ ∣(9, 3, 7)∣ = (9/139) √139 = 9/√139.

(b) Write B = A + kn = (8, 0, 2) + k(2, 7, 2). Since B ∈ q, we have:

OB ⋅ (2, 7, 2) = 42 or [(8, 0, 2) + k(2, 7, 2)] ⋅ (2, 7, 2) = 42 or 20 + 57k = 42.

So, k = 22/57 and B = A + kn = (8, 0, 2) + (22/57) (2, 7, 2) = (1/57) (500, 154, 158).
And the distance between A and q is:

∣AB∣ = ∣kn∣ = ∣k∣ ∣n∣ = ∣22/57∣ ∣(2, 7, 2)∣ = (22/57) √57 = 22/√57.

(c) Write B = A + kn = (8, 5, 9) + k(5, 6, 0). Since B ∈ q, we have:

OB ⋅ (5, 6, 0) = 64 or [(8, 5, 9) + k(5, 6, 0)] ⋅ (5, 6, 0) = 64 or 70 + 61k = 64.

So, k = −6/61 and B = A + kn = (8, 5, 9) − (6/61) (5, 6, 0) = (1/61) (458, 269, 549).
And the distance between A and q is:

∣AB∣ = ∣kn∣ = ∣k∣ ∣n∣ = ∣−6/61∣ ∣(5, 6, 0)∣ = (6/61) √61 = 6/√61.


A246 By Definition 149, AB ∥ n. So, AB = kn or (a) B = A + kn for some k ∈ R. We thus
also have (b) ∣AB∣ = ∣kn∣ = ∣k∣ ∣n∣.
We will be done if we can show that our k here is that given in Fact 115.
Since B ∈ q, we have:

OB ⋅ n = d or (OA + kn) ⋅ n = d or OA ⋅ n + kn ⋅ n = d.

Rearranging: k = (d − OA ⋅ n) / (n ⋅ n) = (d − OA ⋅ n) / ∣n∣².

A247(a) First compute ∣n∣ = √(9² + 3² + 7²) = √139. Then compute:

k = (d − OA ⋅ n) / ∣n∣² = [109 − (7, 3, 4) ⋅ (9, 3, 7)] / 139 = [109 − (63 + 9 + 28)] / 139 = 9/139.

And so by Fact 115, B = A + kn = (7, 3, 4) + (9/139) (9, 3, 7) = (1/139) (1054, 444, 619).
And as before: ∣AB∣ = ∣k∣ ∣n∣ = (9/139) ⋅ √139 = 9/√139.

(b) First compute ∣n∣ = √(2² + 7² + 2²) = √57. Then compute:

k = (d − OA ⋅ n) / ∣n∣² = [42 − (8, 0, 2) ⋅ (2, 7, 2)] / 57 = [42 − (16 + 0 + 4)] / 57 = 22/57.

And so by Fact 115, B = A + kn = (8, 0, 2) + (22/57) (2, 7, 2) = (2/57) (250, 77, 79).
And as before: ∣AB∣ = ∣k∣ ∣n∣ = (22/57) ⋅ √57 = 22/√57.

(c) First compute ∣n∣ = √(5² + 6² + 0²) = √61. Then compute:

k = (d − OA ⋅ n) / ∣n∣² = [64 − (8, 5, 9) ⋅ (5, 6, 0)] / 61 = [64 − (40 + 30 + 0)] / 61 = −6/61.

And so by Fact 115, B = A + kn = (8, 5, 9) − (6/61) (5, 6, 0) = (1/61) (458, 269, 549).
And as before: ∣AB∣ = ∣k∣ ∣n∣ = (6/61) ⋅ √61 = 6/√61.
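The formula method of A247 translates directly into code. Here is a minimal Python sketch that reproduces A247(a) with exact fractions; the function name foot_and_distance is illustrative only, and the formula for k is the one rearranged in A246.

```python
from fractions import Fraction
import math


def dot(u, v):
    """Return the scalar product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def foot_and_distance(a, n, d):
    """Foot of the perpendicular from point a to the plane r . n = d.

    Returns (k, foot, distance), where k = (d - a . n) / |n|^2,
    foot = a + k*n, and distance = |k| |n|.
    """
    k = Fraction(d - dot(a, n), dot(n, n))
    foot = tuple(ai + k * ni for ai, ni in zip(a, n))
    distance = float(abs(k)) * math.sqrt(dot(n, n))
    return k, foot, distance


# A247(a): A = (7, 3, 4), n = (9, 3, 7), d = 109
k, foot, distance = foot_and_distance((7, 3, 4), (9, 3, 7), 109)
print(k, [str(c) for c in foot], round(distance, 4))
# 9/139 ['1054/139', '444/139', '619/139'] 0.7634
```

The distance 0.7634 agrees with 9/√139 to four decimal places.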


A248. By Fact 115, the distance between the origin O and the given plane is:

∣d − OO ⋅ n∣ / ∣n∣ = ∣d∣ / ∣n∣.

And hence, if ∣n∣ = 1, then the distance between the plane and the origin is simply ∣d∣.
A249(a) Perpendicular Method. Let A and B be the feet of the perpendiculars from
S and T to q. Write: A = S + kn and B = T + ln.
Since A, B ∈ q, we have:
OA ⋅ (5, −3, 1) = 0 or [(−1, 0, 7) + k(5, −3, 1)] ⋅ (5, −3, 1) = 0 or 2 + 35k = 0,
OB ⋅ (5, −3, 1) = 0 or [(3, 2, 1) + l(5, −3, 1)] ⋅ (5, −3, 1) = 0 or 10 + 35l = 0.

Solving, we have k = −2/35 and l = −10/35 = −2/7. Thus:

A = S + kn = (−1, 0, 7) − (2/35) (5, −3, 1) = (1/35) (−45, 6, 243),
B = T + ln = (3, 2, 1) − (2/7) (5, −3, 1) = (1/7) (11, 20, 5).

Formula Method. First compute ∣n∣ = √(5² + (−3)² + 1²) = √35. Then compute:

k = (d − OS ⋅ n) / ∣n∣² = [0 − (−1, 0, 7) ⋅ (5, −3, 1)] / 35 = [0 − (−5 + 0 + 7)] / 35 = −2/35,
l = (d − OT ⋅ n) / ∣n∣² = [0 − (3, 2, 1) ⋅ (5, −3, 1)] / 35 = [0 − (15 − 6 + 1)] / 35 = −2/7.

So, the feet of the perpendiculars from S and T to q are, respectively:

S + kn = (−1, 0, 7) − (2/35) (5, −3, 1) = (1/35) (−45, 6, 243),
T + ln = (3, 2, 1) − (2/7) (5, −3, 1) = (1/7) (11, 20, 5).
(b) The distances between q and the points S and T are, respectively:

∣k∣ ∣n∣ = (2/35) ⋅ √35 = 2/√35,

∣l∣ ∣n∣ = (2/7) ⋅ √35 = 2√5/√7.

(c) Since the origin is on q, the distance between the origin and q is 0.


126.24. Ch. 59 Answers (Coplanarity)
A250. The line AB is described by r = (1, 0, 0) + λ (1, −1, 0) (λ ∈ R) and does not contain
C or D. Hence, A, B, and C are not collinear and neither are A, B, and D.
The line CD is described by r = (0, 0, 1) + λ (1, 1, −2) (λ ∈ R) and does not contain A or B.
Hence, C, D, and A are not collinear and neither are C, D, and B.
A251(a) Let q be the plane that contains A, B, and C. The non-parallel vectors BA =
(3, 2, 4) and BC = (5, 8, 4) are on q.
Method 1 (Vector Form). q has normal vector BA × BC = (3, 2, 4) × (5, 8, 4) = (−24, 8, 14)
or (−12, 4, 7).
Compute OA ⋅ (−12, 4, 7) = (0, 1, 5) ⋅ (−12, 4, 7) = 0 + 4 + 35 = 39. Hence, q may be described
by r ⋅ (−12, 4, 7) = 39.
Now check if D ∈ q: OD ⋅ (−12, 4, 7) = (6, 6, 1) ⋅ (−12, 4, 7) = −72 + 24 + 7 ≠ 39. Nope, it isn’t.
So the four points aren’t coplanar.
Method 2 (Parametric Form). The plane q may be described in parametric form as:

r = (0, 1, 5) + λ (3, 2, 4) + µ (5, 8, 4) = (3λ + 5µ, 1 + 2λ + 8µ, 5 + 4λ + 4µ) (λ, µ ∈ R).

Now check if D ∈ q:

(6, 6, 1) = (3λ + 5µ, 1 + 2λ + 8µ, 5 + 4λ + 4µ), or:

6 = 3λ + 5µ, (1)
6 = 1 + 2λ + 8µ, (2)
1 = 5 + 4λ + 4µ. (3)

2 × (2) minus (3) yields 11 = 12µ − 3 (4) or µ = 7/6. 1.5 × (2) minus (1) yields 3 = 1.5 + 7µ (5) or µ = 3/14,
which contradicts (4). So, D ∉ q and the four points are not coplanar.

(b) Let q be the plane that contains A, B, and C. The non-parallel vectors AB = (1, −3, 5)
and BC = (6, 1, −2) are on q.
Method 1 (Vector Form). q has normal vector AB × BC = (1, −3, 5) × (6, 1, −2) =
(1, 32, 19).
Since q contains the origin, it may be described by r ⋅ (1, 32, 19) = 0.
Now check if D ∈ q: OD ⋅ (1, 32, 19) = (4, 7, −12) ⋅ (1, 32, 19) = 4 + 224 − 228 = 0. Yup, it is.
So D ∈ q and the four points are coplanar.
Method 2 (Parametric Form). The plane q may be described in parametric form as:

r = (0, 0, 0) + λ (1, −3, 5) + µ (6, 1, −2) = (λ + 6µ, −3λ + µ, 5λ − 2µ) (λ, µ ∈ R).

Now check if D ∈ q:
(4, 7, −12) = (λ + 6µ, −3λ + µ, 5λ − 2µ), or:

4 = λ + 6µ, (1)
7 = −3λ + µ, (2)
−12 = 5λ − 2µ. (3)

2 × (2) plus (3) yields 2 = −λ or λ = −2 and hence µ = 1. These values of λ and µ also satisfy (1).

So, D ∈ q and the four points are indeed coplanar.

(c) The line AB may be described by r = (0, 1, 2) + λ (1, 1, 1) (λ ∈ R). By picking λ = 2, we
observe that AB also contains the point C = (2, 3, 4). Hence, A, B, and C are collinear.
And so by Fact 116, the four points are coplanar.
The plane containing them contains the non-parallel vectors AB = (1, 1, 1) and AD =
(19, −1, −7). Hence, it may be described by:

r = (0, 1, 2) + λ (1, 1, 1) + µ (19, −1, −7) (λ, µ ∈ R).
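Method 1 of A251 amounts to testing whether the vector from a known point on q to D is perpendicular to the normal vector. A short Python sketch of that test follows; the function name is_on_plane is hypothetical, and the inputs are the point and direction vectors quoted in A251(a) and (b).

```python
def cross(u, v):
    """Return the cross product u x v of two 3-vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def dot(u, v):
    """Return the scalar product of two vectors."""
    return sum(a * b for a, b in zip(u, v))


def is_on_plane(point, base, u, v):
    """True if point lies on the plane through base that contains u and v."""
    offset = tuple(p - b for p, b in zip(point, base))
    return dot(offset, cross(u, v)) == 0


# A251(a): q passes through A = (0, 1, 5) and contains (3, 2, 4) and (5, 8, 4)
print(is_on_plane((6, 6, 1), (0, 1, 5), (3, 2, 4), (5, 8, 4)))      # False
# A251(b): q passes through the origin and contains (1, -3, 5) and (6, 1, -2)
print(is_on_plane((4, 7, -12), (0, 0, 0), (1, -3, 5), (6, 1, -2)))  # True
```

With integer coordinates the comparison with 0 is exact; with floating-point coordinates a small tolerance would be needed instead.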

A252(a) Since (3, 2, 1) ∥/ (5, 6, 7), the two lines are not parallel. Now write:

(8, 1, 5) + λ̂ (3, 2, 1) = (1, 2, 3) + µ̂ (5, 6, 7), or:

8 + 3λ̂ = 1 + 5µ̂, (1)
1 + 2λ̂ = 2 + 6µ̂, (2)
5 + λ̂ = 3 + 7µ̂. (3)

2 × (2) minus [(1) + (3)] yields −11 = 0, a contradiction. So, the two lines do not intersect.

Thus, the two lines are skew and not coplanar.

(b) Since (3, 9, 0) ∥ (1, 3, 0), the two lines are parallel. They are also distinct because, for
example, the point (1, 1, 1) is on the second line but not on the first.
Thus, they are coplanar and do not intersect. Compute (0, 0, 6) − (1, 1, 1) = (−1, −1, 5).
So, the (unique) plane that contains both lines is r = (0, 0, 6) + λ (1, 3, 0) + µ (−1, −1, 5)
(λ, µ ∈ R).
(c) Since (1, 0, 1) ∥/ (0, 1, 1), the two lines are not parallel. Now write:

(6, 5, 5) + λ̂ (1, 0, 1) = (9, 3, 6) + µ̂ (0, 1, 1), or:

6 + λ̂ = 9, (1)
5 = 3 + µ̂, (2)
5 + λ̂ = 6 + µ̂. (3)

From (1), λ̂ = 3. From (2), µ̂ = 2. These values of λ̂ and µ̂ satisfy (3). Plugging these back in,
the two lines intersect at (6, 5, 5) + 3 (1, 0, 1) = (9, 3, 6) + 2 (0, 1, 1) = (9, 5, 8).

Thus, the two lines are coplanar and the (unique) plane that contains them is r = (6, 5, 5)+
λ (1, 0, 1) + µ (0, 1, 1) (λ, µ ∈ R).
(d) Since (−5, 0, 1) ∥ (10, 0, 2), the two lines are parallel. Indeed, they are identical
because the point (9, 3, 6) which is on the second line is also on the first (to see this, plug
λ = −2 into the first line’s vector equation).
Since they are identical, they intersect at every point along either line.
They are coplanar and there are infinitely many planes that contain both lines.


127. Part IV Answers (Complex Numbers)

127.1. Ch. 60 Answers (Complex Numbers: Introduction)

√ √
A253. Is this number ... 9 − 2i 3i 0 4 4 + 2i i 3
Complex? ✓ ✓ ✓ ✓ ✓ ✓ ✓
Real? ✗ ✗ ✓ ✓ ✗ ✗ ✓
Imaginary? ✓ ✓ ✗ ✗ ✓ ✓ ✗
Purely imaginary? ✗ ✓ ✗ ✗ ✗ ✓ ✗
“Impure” imaginary? ✓ ✗ ✗ ✗ ✓ ✗ ✗
The imaginary unit? ✗ ✗ ✗ ✗ ✗ ✓ ✗

A254. Rationalise any denominators with surds and write out the sine or cosine values:
√ √ √ √ √ √ √ √
2 2 3 2 2 3 2 3 2
a= − i, b= − i, c= − i, d= − i.
2 2 2 2 2 2 2 2
Comparing the real and imaginary parts, we see that only c = d.

A255(a) z = (33, 33e). (b) w = (237 + π, 3 − 2). (c) ω = (p, q).

127.2. Ch. 61 Answers (Some Arithmetic of Complex Numbers)

A256(a) z + w = (−5 + 2i) + (7 + 3i) = 2 + 5i, z − w = (−5 + 2i) − (7 + 3i) = −12 − i.
(b) z + w = (3 − i) + (11 + 2i) = 14 + i, z − w = (3 − i) − (11 + 2i) = −8 − 3i.
(c) z + w = (1 + 2i) + (3 − √2i) = 4 + (2 − √2) i, z − w = (1 + 2i) − (3 − √2i) = −2 + (2 + √2) i.

A257(a) zw = (−5 + 2i) (7 + 3i) = −35 − 15i + 14i + 6i² = −41 − i.

z² = (−5 + 2i)² = 25 + 2 (−5) (2i) − 4 = 21 − 20i.

z³ = (−5 + 2i)² (−5 + 2i) = (21 − 20i) (−5 + 2i) = −105 + 42i + 100i + 40 = −65 + 142i.

(b) zw = (3 − i) (11 + 2i) = 33 + 6i − 11i − 2i² = 35 − 5i.

z² = (3 − i)² = 9 + 2 (3) (−i) − 1 = 8 − 6i.

z³ = (3 − i)² (3 − i) = (8 − 6i) (3 − i) = 24 − 8i − 18i − 6 = 18 − 26i.

(c) zw = (1 + 2i) (3 − √2i) = 3 − √2i + 6i − 2√2i² = 3 + 2√2 + (6 − √2) i.

z² = (1 + 2i)² = 1 + 2 (1) (2i) − 4 = −3 + 4i.

z³ = (1 + 2i)² (1 + 2i) = (−3 + 4i) (1 + 2i) = −3 − 6i + 4i − 8 = −11 − 2i.
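Products and powers like those in A257 are easy to double-check with Python’s built-in complex type. The sketch below verifies A257(a); the variable names are illustrative only.

```python
# Python writes the imaginary unit as j, so -5 + 2i becomes -5 + 2j.
z, w = -5 + 2j, 7 + 3j

product = z * w    # zw
square = z * z     # z squared
cube = z * z * z   # z cubed

print(product, square, cube)  # (-41-1j) (21-20j) (-65+142j)
```

Because the real and imaginary parts here are small integers, the floating-point results are exact.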



A258. zw = (a + ib) (c + id) = ac + iad + ibc + i²bd = (ac − bd) + i (ad + bc).

A259. (2 + i)² = 4 + 2 (2) (i) − 1 = 3 + 4i.

(2 + i)³ = (2 + i) (2 + i)² = (2 + i) (3 + 4i) = 6 + 8i + 3i − 4 = 2 + 11i.
Hence: az³ + bz² + 3z − 1 = (2 + i)³ a + (2 + i)² b + 3 (2 + i) − 1
= (2 + 11i) a + (3 + 4i) b + 3 (2 + i) − 1
= 2a + 3b + 5 + i (11a + 4b + 3).

Two complex numbers are equal if and only if their real and imaginary parts are equal. So:

2a + 3b + 5 = 0 (1) and 11a + 4b + 3 = 0 (2).

Take 3 × (2) minus 4 × (1): 3 (11a + 4b + 3) − 4 (2a + 3b + 5) = 25a − 11 = 0.

So: a = 11/25 = 0.44 and b = −5.88/3 = −1.96.
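The pair of simultaneous equations in A259 can also be solved exactly by Cramer’s rule, which is a different route from the elimination used above. This Python sketch is only a cross-check; the function name solve_2x2 is hypothetical.

```python
from fractions import Fraction


def solve_2x2(a1, b1, c1, a2, b2, c2):
    """Solve a1*x + b1*y = c1 and a2*x + b2*y = c2 by Cramer's rule."""
    det = a1 * b2 - a2 * b1
    if det == 0:
        raise ValueError("no unique solution")
    x = Fraction(c1 * b2 - c2 * b1, det)
    y = Fraction(a1 * c2 - a2 * c1, det)
    return x, y


# A259: 2a + 3b = -5 and 11a + 4b = -3
a, b = solve_2x2(2, 3, -5, 11, 4, -3)
print(a, b)  # 11/25 -49/25
```

Here 11/25 = 0.44 and −49/25 = −1.96, matching the values found by elimination.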

A260(a) z∗ = −5 − 2i. So 1/z = [1/(5² + 2²)] z∗ = (1/29) (−5 − 2i) = (1/29) (−5, −2) = −5/29 − (2/29) i.
(b) w∗ = 3 + i. So 1/w = [1/(3² + 1²)] w∗ = (1/10) (3 + i) = (1/10) (3, 1) = 0.3 + 0.1i.
(c) ω∗ = 1 − 2i. So 1/ω = [1/(1² + 2²)] ω∗ = (1/5) (1 − 2i) = (1/5) (1, −2) = 0.2 − 0.4i.

A261(a) zz∗ = (a + ib) (a − ib) = a² − (ib)² = a² − i²b² = a² + b².

(b) 1/z = (1/z) (z∗/z∗) = z∗/(zz∗) = z∗/(a² + b²), where the last step uses (a).

A262(a) z/w = zw∗/(0² + 1²) = (1 + 3i) i/1 = −3 + i.
(b) z/w = zw∗/(1² + 1²) = (2 − 3i) (1 − i)/2 = (2 − 2i − 3i − 3)/2 = (−1 − 5i)/2 = −0.5 − 2.5i.
(c) z/w = zw∗/[3² + (−√2)²] = (√2 − πi) (3 + √2i)/11 = (3√2 + 2i − 3iπ + √2π)/11 = [(3 + π)/11] √2 + [(2 − 3π)/11] i.
(d) z/w = zw∗/(0² + 1²) = (11 + 2i) (−i)/1 = 2 − 11i.
(e) z/w = zw∗/(2² + 1²) = (−3) (2 − i)/5 = (−6 + 3i)/5 = −1.2 + 0.6i.
(f) z/w = zw∗/(5² + 1²) = (7 − 2i) (5 − i)/26 = (35 − 7i − 10i − 2)/26 = 33/26 − (17/26) i.
127.3. Ch. 62 Answers (Solving Polynomial Equations)
A263(a) x = (−1 ± √−3) /2 = −1/2 ± √3i/2.
(b) x = (−2 ± √−4) /2 = −1 ± i.
(c) x = (−3 ± √−3) /6 = −1/2 ± √3i/6.

A264. (−1 ± √3i)³ = −1 ± 3 (−1)² √3i + 3 (−1) (√3i)² ∓ 3√3i = −1 ± 3√3i + 9 ∓ 3√3i = 8.

A265. Since (−4)³ + 64 = 0, one root is −4 and x + 4 is a factor of x³ + 64.

To find the other two roots, write:

x³ + 64 = (x + 4) (ax² + bx + c) = ax³ + (b + 4a) x² + ?x + 4c.

Comparing coefficients: x³ + 64 = (x + 4) (x² − 4x + 16).

We can now find the other two roots using the usual quadratic formula:

x = [−b ± √(b² − 4ac)] / (2a) = [4 ± √(4² − 4 (1) (16))] / (2 ⋅ 1) = 2 ± √(4 − 16) = 2 ± √12 i = 2 ± 2√3 i.

Thus, the three roots of x³ + 64 = 0 are −4, 2 + 2√3 i, and 2 − 2√3 i.

A266(a) Since 1 is a root, write:

x³ + x² − 2 = (x − 1) (ax² + bx + c) = ax³ + (b − a) x² + ?x − c.

Comparing coefficients: x³ + x² − 2 = (x − 1) (x² + 2x + 2).

We can now find the other two roots using the usual quadratic formula:

x = [−b ± √(b² − 4ac)] / (2a) = [−2 ± √((−2)² − 4 (1) (2))] / (2 ⋅ 1) = −1 ± √(1 − 2) = −1 ± i.

Thus, the three roots of x³ + x² − 2 = 0 are 1 and −1 ± i.

(b) Since 1 is a root, write:

x⁴ − x² − 2x + 2 = (x − 1) (ax³ + bx² + cx + d) = ax⁴ + (b − a) x³ + (c − b) x² + ?x − d.

Comparing coefficients: x⁴ − x² − 2x + 2 = (x − 1) (x³ + x² − 2).

But in (a), we already worked out the three roots of x³ + x² − 2 = 0. Thus, the four roots
of x⁴ − x² − 2x + 2 = 0 are 1, 1 (repeated), and −1 ± i.
A267. Both equations have real coefficients and so the Complex Conjugate Root Theorem
applies. That is, since 2 − 3i solves both equations, so too does 2 + 3i.

[x − (2 − 3i)] [x − (2 + 3i)] = (x − 2)² − (3i)² = x² − 4x + 13.

(a) x⁴ − 6x³ + 18x² − 14x − 39 = (x² − 4x + 13) (ax² + bx + c) = ax⁴ + (b − 4a) x³ + ?x² + ?x + 13c.

Comparing coefficients: ax² + bx + c = x² − 2x − 3.

By the quadratic formula or otherwise, we have x² − 2x − 3 = (x + 1) (x − 3).

Thus, the four roots are 2 − 3i, 2 + 3i, −1, and 3.
(b) −2x⁴ + 21x³ − 93x² + 229x − 195 = ax⁴ + (b − 4a) x³ + ?x² + ?x + 13c.

Comparing coefficients: ax² + bx + c = −2x² + 13x − 15.

We can now find the other two roots using the usual quadratic formula:

x = [−b ± √(b² − 4ac)] / (2a) = [−13 ± √(13² − 4 (−2) (−15))] / [2 (−2)] = (13 ∓ √49) / 4 = (13 ∓ 7) / 4 = 1.5, 5.

Thus, the four roots are 2 − 3i, 2 + 3i, 1.5, and 5.
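One way to double-check the roots found in A267 is to substitute each of them back into the original quartic. The Python sketch below does this using Horner’s rule (a standard evaluation scheme, not a method from this textbook); the function name horner is illustrative.

```python
def horner(coeffs, x):
    """Evaluate a polynomial at x; coeffs run from the highest power down."""
    value = 0
    for c in coeffs:
        value = value * x + c
    return value


quartic_a = [1, -6, 18, -14, -39]      # x^4 - 6x^3 + 18x^2 - 14x - 39
quartic_b = [-2, 21, -93, 229, -195]   # -2x^4 + 21x^3 - 93x^2 + 229x - 195

roots_a = [2 - 3j, 2 + 3j, -1, 3]
roots_b = [2 - 3j, 2 + 3j, 1.5, 5]

# Each claimed root should make its polynomial (numerically) zero.
print(all(abs(horner(quartic_a, r)) < 1e-9 for r in roots_a))  # True
print(all(abs(horner(quartic_b, r)) < 1e-9 for r in roots_b))  # True
```

A small tolerance is used because complex arithmetic is carried out in floating point.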

A268. If 1 − i solves x2 + px + q = 0, then by Theorem 12, so too does 1 + i. And so:

x² + px + q = [x − (1 − i)] [x − (1 + i)] = (x − 1)² − i² = x² − 2x + 2.

Hence, p = −2 and q = 2.

127.4. Ch. 63 Answers (The Argand Diagram)


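A269. [Figure: Argand diagram (horizontal axis x) plotting 2 = (2, 0), −1 = (−1, 0), 2i = (0, 2), 1 + 2i = (1, 2), and −1 − 3i = (−1, −3); the angle θ and the value −1.893 are marked near −1 − 3i.]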


127.5. Ch. 64 Answers (Complex Numbers in Polar Form)
A270. ∣2∣ = 2, ∣−1∣ = 1, ∣2i∣ = 2, ∣1 + 2i∣ = √(1² + 2²) = √5, ∣−1 − 3i∣ = √((−1)² + (−3)²) = √10.

A271. Refer to figure on the previous page. We have arg 2 = 0, arg (−1) = π, arg (2i) = π/2
and arg (1 + 2i) = tan−1 (2/1) ≈ 1.107. For −1 − 3i, observe that θ = tan−1 (3/1). Thus,
arg (−1 − 3i) = θ − π = tan−1 (3/1) − π ≈ −1.893.

A272. [Figure: Argand diagram showing z = 2 − i and w = −3 + 2i.]
z = 2 − i: ∣z∣ = √(2² + (−1)²) = √5 and arg z = − tan⁻¹ (1/2) ≈ −0.464.
w = −3 + 2i: ∣w∣ = √((−3)² + 2²) = √13 and arg w = π − tan⁻¹ (2/3) ≈ 2.554.

A273. arg 2 = cos⁻¹ (2/2) = 0, arg (−1) = cos⁻¹ (−1/1) = π, arg (2i) = cos⁻¹ (0/2) = π/2.
arg (1 + 2i) = cos⁻¹ (1/√5) ≈ 1.107, arg (−1 − 3i) = − cos⁻¹ (−1/√10) ≈ −1.893.
arg z = arg (2 − i) = − cos⁻¹ (2/√5) ≈ −0.464, arg w = arg (−3 + 2i) = cos⁻¹ (−3/√13) ≈ 2.554.

A274. Using the moduli and arguments found in the above answers, we have:
2 = 2 (cos 0 + i sin 0), −1 = 1 (cos π + i sin π), 2i = 2 [cos (π/2) + i sin (π/2)],
1 + 2i ≈ √5 (cos 1.107 + i sin 1.107), −1 − 3i ≈ √10 (cos −1.893 + i sin −1.893),
2 − i ≈ √5 (cos −0.464 + i sin −0.464), −3 + 2i ≈ √13 (cos 2.554 + i sin 2.554).
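The moduli and (principal) arguments used in A270 to A274 can be reproduced with Python’s standard cmath module, whose polar function returns the pair (modulus, argument) with the argument in (−π, π]. A minimal sketch:

```python
import cmath

# The seven numbers from A270 to A274 (Python writes i as j).
numbers = [2, -1, 2j, 1 + 2j, -1 - 3j, 2 - 1j, -3 + 2j]

for number in numbers:
    modulus, argument = cmath.polar(number)
    print(number, round(modulus, 3), round(argument, 3))
```

For example, the line for −1 − 3i shows a modulus of about 3.162 (that is, √10) and an argument of about −1.893.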

127.6. Ch. 65 Answers (Complex Numbers in Exponential Form)

A275. Using the moduli and arguments found in the above answers, we have:
2 = 2e^(0i), −1 = e^(iπ), 2i = 2e^(iπ/2), 1 + 2i ≈ √5 e^(1.107i), −1 − 3i ≈ √10 e^(−1.893i),
z = 2 − i ≈ √5 e^(−0.464i), w = −3 + 2i ≈ √13 e^(2.554i).
127.7. Ch. 66 Answers (More Arithmetic of Complex Numbers)
A276(a) ∣z∣ = ∣1∣ = 1, arg z = 0, ∣w∣ = ∣−3∣ = 3, and arg w = π.
Hence, ∣zw∣ = ∣z∣ ∣w∣ = 3 and arg (zw) = arg z + arg w + 2kπ = 0 + π + 0 = π.
Thus, zw = 3 (cos π + i sin π) = 3e^(iπ) = −3.
Next, ∣−2zw∣ = 2 ∣zw∣ = 2 ⋅ 3 = 6 and arg (−2zw) = arg (zw) − π = 0.
Thus, −2zw = 6 (cos 0 + i sin 0) = 6e^(0i) = 6.
(b) ∣z∣ = ∣2i∣ = 2, arg z = π/2, ∣w∣ = ∣1 + 2i∣ = √5, and arg w = cos⁻¹ (1/√5) ≈ 1.107.
Hence, ∣zw∣ = ∣z∣ ∣w∣ = 2√5 and arg (zw) = arg z + arg w + 2kπ = π/2 + 1.107 + 0 ≈ 2.678.
Thus, zw = 2√5 (cos 2.678 + i sin 2.678) = 2√5 e^(2.678i) ≈ −4 + 2i.
Next, ∣−2zw∣ = 2 ∣zw∣ = 2 ⋅ 2√5 = 4√5 and arg (−2zw) = arg (zw) − π ≈ 2.678 − π ≈ −0.464.
Thus, −2zw = 4√5 (cos −0.464 + i sin −0.464) = 4√5 e^(−0.464i) ≈ 8 − 4i.
(c) ∣z∣ = √10, arg z = − cos⁻¹ (−1/√10) ≈ −1.893, ∣w∣ = 5, and arg w = cos⁻¹ (3/5) ≈ 0.927.
Hence, ∣zw∣ = ∣z∣ ∣w∣ = 5√10 and arg (zw) = arg z + arg w + 2kπ = −1.893 + 0.927 + 0 ≈ −0.965.
Thus, zw = 5√10 (cos −0.965 + i sin −0.965) = 5√10 e^(−0.965i) ≈ 9 − 13i.
Next, ∣−2zw∣ = 2 ∣zw∣ = 2 ⋅ 5√10 = 10√10 and arg (−2zw) = arg (zw) + π ≈ −0.965 + π ≈ 2.177.
Thus, −2zw = 10√10 (cos 2.177 + i sin 2.177) = 10√10 e^(2.177i) ≈ −18 + 26i.
(d) ∣z∣ = √29, arg z = cos⁻¹ (−2/√29) ≈ 1.951, ∣w∣ = 1, arg w = π/2.
Hence, ∣zw∣ = ∣z∣ ∣w∣ = √29 and arg (zw) = arg z + arg w + 2kπ = 1.951 + π/2 − 2π ≈ −2.761.
Thus, zw = √29 (cos −2.761 + i sin −2.761) = √29 e^(−2.761i) ≈ −5 − 2i.
Next, ∣−2zw∣ = 2 ∣zw∣ = 2√29 and arg (−2zw) = arg (zw) + π ≈ −2.761 + π ≈ 0.381.
Thus, −2zw = 2√29 (cos 0.381 + i sin 0.381) = 2√29 e^(0.381i) ≈ 10 + 4i.
(e) ∣z∣ = √2, arg z = − cos⁻¹ (−1/√2) ≈ −2.356, ∣w∣ = √5, and arg w = − cos⁻¹ (−1/√5) ≈ −2.034.
Hence, ∣zw∣ = ∣z∣ ∣w∣ = √10 and arg (zw) = arg z + arg w + 2kπ = −2.356 − 2.034 + 2π ≈ 1.893.
Thus, zw = √10 (cos 1.893 + i sin 1.893) = √10 e^(1.893i) ≈ −1 + 3i.
Next, ∣−2zw∣ = 2 ∣zw∣ = 2√10 and arg (−2zw) = arg (zw) − π ≈ 1.893 − π ≈ −1.249.
Thus, −2zw = 2√10 (cos −1.249 + i sin −1.249) = 2√10 e^(−1.249i) ≈ 2 − 6i.
(f) ∣z∣ = √34, arg z = − cos⁻¹ (−5/√34) ≈ −2.601, ∣w∣ = √26, and arg w = − cos⁻¹ (5/√26) ≈ −0.197.
Hence, ∣zw∣ = ∣z∣ ∣w∣ = √34 √26 and arg (zw) = arg z + arg w + 2kπ ≈ −2.601 − 0.197 + 0 ≈ −2.799.
Thus, zw = √34 √26 (cos −2.799 + i sin −2.799) = √34 √26 e^(−2.799i) ≈ −28 − 10i.
Next, ∣−2zw∣ = 2 ∣zw∣ = 2 √34 √26 and arg (−2zw) = arg (zw) + π ≈ −2.799 + π ≈ 0.343.
Thus, −2zw = 2 √34 √26 (cos 0.343 + i sin 0.343) = 2 √34 √26 e^(0.343i) ≈ 56 + 20i.
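The 2kπ adjustment that appears throughout A276 simply brings the sum of two arguments back into the principal range (−π, π]. The following Python sketch illustrates this for the numbers z = −1 − i and w = −1 − 2i, which are inferred from the moduli and arguments listed in A276(e); the helper name principal is hypothetical.

```python
import cmath
import math


def principal(theta):
    """Shift theta by a multiple of 2*pi so that it lies in (-pi, pi]."""
    return theta - 2 * math.pi * math.ceil((theta - math.pi) / (2 * math.pi))


z, w = -1 - 1j, -1 - 2j   # assumed from the moduli and arguments in A276(e)
raw_sum = cmath.phase(z) + cmath.phase(w)

# The raw sum falls outside (-pi, pi]; adding 2*pi gives the principal value.
print(round(raw_sum, 3), round(principal(raw_sum), 3), round(cmath.phase(z * w), 3))
# -4.391 1.893 1.893
```

The last two values agree, which is the statement arg (zw) = arg z + arg w + 2kπ with k = 1.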


A277(a) z = r (cos θ + i sin θ) and w = s (cos φ + i sin φ).
(b) zw = rs (cos θ + i sin θ) (cos φ + i sin φ)
= rs (cos θ cos φ + i sin θ cos φ + i cos θ sin φ − sin θ sin φ)
= rs [cos (θ + φ) + i sin (θ + φ)] .

$$|zw| = \sqrt{[rs\cos(\theta+\varphi)]^2 + [rs\sin(\theta+\varphi)]^2} = rs\sqrt{\cos^2(\theta+\varphi) + \sin^2(\theta+\varphi)} = rs\sqrt{1} = rs = |z||w|.$$

A278(a) ∣z∣ = 1, arg z = 0. So, ∣1/z∣ = 1/ ∣z∣ = 1, arg (1/z) = − arg z = 0.

Thus: $\frac{1}{z} = 1e^{0i} = \cos 0 + i\sin 0 = 1$.
(b) ∣w∣ = 2, arg w = π/2. So, ∣1/w∣ = 1/ ∣w∣ = 1/2, arg (1/w) = − arg w = −π/2.
Thus: $\frac{1}{w} = \frac{1}{2}e^{-i\pi/2} = \frac{1}{2}\left(\cos\frac{-\pi}{2} + i\sin\frac{-\pi}{2}\right) = -\frac{1}{2}i$.
(c) ∣z∣ = 17, arg z = π. So, ∣1/z∣ = 1/ ∣z∣ = 1/17. Note importantly that z < 0, so that Fact 131(b) does not apply here. We have, simply, arg (1/z) = arg z = π. And:
$$\frac{1}{z} = \frac{1}{17}e^{i\pi} = \frac{1}{17}(\cos\pi + i\sin\pi) = -\frac{1}{17}.$$
(d) ∣w∣ = 8, arg w = −π/2. So, ∣1/w∣ = 1/ ∣w∣ = 1/8, arg (1/w) = − arg w = π/2.
Thus: $\frac{1}{w} = \frac{1}{8}e^{i\pi/2} = \frac{1}{8}\left(\cos\frac{\pi}{2} + i\sin\frac{\pi}{2}\right) = \frac{1}{8}i$.
(e) $|z| = \sqrt{29}$, $\arg z = \cos^{-1}(-2/\sqrt{29}) \approx 1.951$. So, $|1/z| = 1/|z| = 1/\sqrt{29}$, $\arg(1/z) \approx -1.951$.
Thus: $\frac{1}{z} \approx \frac{1}{\sqrt{29}}e^{-1.951i} \approx \frac{1}{\sqrt{29}}[\cos(-1.951) + i\sin(-1.951)] \approx -0.069 - 0.172i$.
(f) $|w| = \sqrt{2}$, $\arg w = -\cos^{-1}(-1/\sqrt{2}) = -3\pi/4$. So, $|1/w| = 1/|w| = 1/\sqrt{2}$, $\arg(1/w) = 3\pi/4$.
Thus: $\frac{1}{w} = \frac{1}{\sqrt{2}}e^{3i\pi/4} = \frac{1}{\sqrt{2}}\left(\cos\frac{3\pi}{4} + i\sin\frac{3\pi}{4}\right) = -\frac{1}{2} + \frac{1}{2}i$.
(g) $|z| = \sqrt{10}$, $\arg z = -\cos^{-1}(1/\sqrt{10}) \approx -1.249$. So, $|1/z| = 1/|z| = 1/\sqrt{10}$, $\arg(1/z) \approx 1.249$.
Thus: $\frac{1}{z} \approx \frac{1}{\sqrt{10}}e^{1.249i} \approx \frac{1}{\sqrt{10}}(\cos 1.249 + i\sin 1.249) \approx 0.1 + 0.3i$.
(h) $|w| = 5$, $\arg w = \cos^{-1}(3/5) \approx 0.927$. So, $|1/w| = 1/|w| = 1/5$, $\arg(1/w) = -\arg w \approx -0.927$.
Thus: $\frac{1}{w} \approx \frac{1}{5}e^{-0.927i} \approx \frac{1}{5}[\cos(-0.927) + i\sin(-0.927)] \approx 0.12 - 0.16i$.
A279(a) $|z| = |1| = 1$, $\arg z = 0$, $|w| = |-3| = 3$, and $\arg w = \pi$. Hence, $|z/w| = |z|/|w| = 1/3$ and $\arg(z/w) = \arg z - \arg w + 2k\pi = 0 - \pi + 2\pi = \pi$. Thus, $z/w = \frac{1}{3}(\cos\pi + i\sin\pi) = \frac{1}{3}e^{i\pi} = -\frac{1}{3}$.
(b) $|z| = |2i| = 2$, $\arg z = \pi/2$, $|w| = |1 + 2i| = \sqrt{5}$, and $\arg w = \cos^{-1}(1/\sqrt{5}) \approx 1.107$. Hence, $|z/w| = |z|/|w| = 2/\sqrt{5}$ and $\arg(z/w) = \arg z - \arg w + 2k\pi = \pi/2 - 1.107 + 0 \approx 0.464$. Thus, $z/w \approx (2/\sqrt{5})(\cos 0.464 + i\sin 0.464) \approx (2/\sqrt{5})\,e^{0.464i} \approx 0.8 + 0.4i$.
$$\frac{2i}{1+2i} = \frac{2i}{1+2i}\cdot\frac{1-2i}{1-2i} = \frac{2i + 4}{1^2 + 2^2} = 0.8 + 0.4i.$$
(c) $|z| = |-1 - 3i| = \sqrt{10}$, $\arg z = -\cos^{-1}(-1/\sqrt{10}) \approx -1.893$, $|w| = |3 + 4i| = 5$, and $\arg w = \cos^{-1}(3/5) \approx 0.927$. Hence, $|z/w| = |z|/|w| = \sqrt{10}/5 = \sqrt{2/5} = \sqrt{0.4}$ and $\arg(z/w) = \arg z - \arg w + 2k\pi = -1.893 - 0.927 + 0 \approx -2.820$.
Thus, $z/w \approx \sqrt{0.4}\,[\cos(-2.820) + i\sin(-2.820)] \approx \sqrt{0.4}\,e^{-2.820i} \approx -0.6 - 0.2i$.
$$\frac{-1-3i}{3+4i} = \frac{-1-3i}{3+4i}\cdot\frac{3-4i}{3-4i} = \frac{-3 + 4i - 9i - 12}{3^2 + 4^2} = \frac{-15 - 5i}{25} = -0.6 - 0.2i.$$
(d) $|z| = |-2 + 5i| = \sqrt{29}$, $\arg z = \cos^{-1}(-2/\sqrt{29}) \approx 1.951$, $|w| = |i| = 1$, and $\arg w = \pi/2$.
Hence, $|z/w| = |z|/|w| = \sqrt{29}$ and $\arg(z/w) = \arg z - \arg w + 2k\pi = 1.951 - \pi/2 + 0 \approx 0.381$.
Thus, $z/w \approx \sqrt{29}\,(\cos 0.381 + i\sin 0.381) \approx \sqrt{29}\,e^{0.381i} \approx 5 + 2i$.
$$\frac{-2+5i}{i} = \frac{-2+5i}{i}\cdot\frac{-i}{-i} = \frac{2i + 5}{1^2} = 5 + 2i.$$
(e) $|z| = |-1 - i| = \sqrt{2}$, $\arg z = -\cos^{-1}(-1/\sqrt{2}) \approx -2.356$, $|w| = |-1 - 2i| = \sqrt{5}$, and $\arg w = -\cos^{-1}(-1/\sqrt{5}) \approx -2.034$. Hence, $|z/w| = |z|/|w| = \sqrt{2}/\sqrt{5} = \sqrt{2/5} = \sqrt{0.4}$ and $\arg(z/w) = \arg z - \arg w + 2k\pi = -2.356 + 2.034 + 0 \approx -0.322$.
Thus, $z/w \approx \sqrt{0.4}\,[\cos(-0.322) + i\sin(-0.322)] \approx \sqrt{0.4}\,e^{-0.322i} \approx 0.6 - 0.2i$.
$$\frac{-1-i}{-1-2i} = \frac{-1-i}{-1-2i}\cdot\frac{-1+2i}{-1+2i} = \frac{1 - 2i + i + 2}{1^2 + 2^2} = \frac{3 - i}{5} = 0.6 - 0.2i.$$
(f) $|z| = |-5 - 3i| = \sqrt{34}$, $\arg z = -\cos^{-1}(-5/\sqrt{34}) \approx -2.601$, $|w| = |5 - i| = \sqrt{26}$, and $\arg w = -\cos^{-1}(5/\sqrt{26}) \approx -0.197$. Hence, $|z/w| = |z|/|w| = \sqrt{34}/\sqrt{26} = \sqrt{17/13}$ and $\arg(z/w) = \arg z - \arg w + 2k\pi = -2.601 + 0.197 + 0 \approx -2.404$.
Thus, $z/w \approx \sqrt{17/13}\,[\cos(-2.404) + i\sin(-2.404)] \approx \sqrt{17/13}\,e^{-2.404i}$.
$$\frac{-5-3i}{5-i} = \frac{-5-3i}{5-i}\cdot\frac{5+i}{5+i} = \frac{-25 - 5i - 15i + 3}{5^2 + 1^2} = \frac{-22 - 20i}{26} = -\frac{11}{13} - \frac{10}{13}i.$$
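As a cross-check of the quotients above, here is a minimal sketch that divides by dividing moduli and subtracting arguments. The helper name `polar_quotient` is ours, and only Python's standard `cmath` module is used.

```python
import cmath

def polar_quotient(z, w):
    """Divide z by w: divide the moduli and subtract the arguments."""
    r1, theta1 = cmath.polar(z)
    r2, theta2 = cmath.polar(w)
    return cmath.rect(r1 / r2, theta1 - theta2)

# (z, w) pairs for parts (a) to (f)
pairs = [(1 + 0j, -3 + 0j), (2j, 1 + 2j), (-1 - 3j, 3 + 4j),
         (-2 + 5j, 1j), (-1 - 1j, -1 - 2j), (-5 - 3j, 5 - 1j)]

for z, w in pairs:
    # Compare the polar-form quotient with Python's built-in complex division
    print(polar_quotient(z, w), z / w)
```

The two printed values should agree up to floating-point rounding, which mirrors the "polar form, then direct division" structure of each part.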

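A280. $\left|\frac{z}{w}\right| = \left|z\cdot\frac{1}{w}\right| \overset{1}{=} |z|\left|\frac{1}{w}\right| \overset{2}{=} |z|\frac{1}{|w|} = \frac{|z|}{|w|}$, where $\overset{1}{=}$ and $\overset{2}{=}$ use Facts 130 and 131.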
128. Part V Answers (Calculus)

128.1. Ch. 67 Answers (An Introduction to Limits)


128.2. Ch. 68 Answers (An Introduction to Continuity, Continued)
A287. By Definition 68, we have:

$$\tan = \frac{\sin}{\cos}, \quad \operatorname{cosec} = \frac{1}{\sin}, \quad \sec = \frac{1}{\cos}, \quad \cot = \frac{\cos}{\sin}.$$
Since each of tan, cosec, sec, and cot is defined as the quotient of two continuous
functions, by Theorem 16, each is continuous.
By Definition 69, $\tan^{-1}$ is defined as the inverse of the continuous function tan, which is
defined on the interval R. And so by Theorem 17, $\tan^{-1}$ is also continuous.


128.3. Ch. 69 Answers (An Introduction to the Derivative)
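A288(a) f is differentiable at −3, because the derivative of f at −3 exists and is equal to
$$\begin{aligned}
\lim_{x\to -3}\frac{f(x) - f(-3)}{x - (-3)} &= \lim_{x\to -3}\frac{|x| - |-3|}{x + 3} && \text{(Simply plug in)}\\
&= \lim_{x\to -3}\frac{-x - 3}{x + 3} && \text{(For all } x \text{ “near” } -3,\ x < 0 \text{ and hence } |x| = -x)\\
&= \lim_{x\to -3} -1 && \text{(Note that } x \neq -3)\\
&= -1 && \text{(Constant Rule).}
\end{aligned}$$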

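(b) f is differentiable at any a < 0, because the derivative of f at a exists and is equal to
$$\begin{aligned}
\lim_{x\to a}\frac{f(x) - f(a)}{x - a} &= \lim_{x\to a}\frac{|x| - |a|}{x - a} && \text{(Simply plug in)}\\
&= \lim_{x\to a}\frac{-x + a}{x - a} && \text{(For all } x \text{ “near” } a < 0,\ x < 0 \text{ and hence } |x| = -x)\\
&= \lim_{x\to a} -1 && \text{(Note that } x \neq a)\\
&= -1 && \text{(Constant Rule).}
\end{aligned}$$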

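A289(a) The derivative of g at −3 exists and equals −6:
$$\begin{aligned}
\lim_{x\to -3}\frac{g(x) - g(-3)}{x - (-3)} &= \lim_{x\to -3}\frac{x^2 - (-3)^2}{x - (-3)} && \text{(Simply plug in)}\\
&= \lim_{x\to -3}\frac{[x - (-3)](x - 3)}{x - (-3)}\\
&= \lim_{x\to -3}(x - 3) && \text{(Note that } x \neq -3)\\
&= \lim_{x\to -3} x + \lim_{x\to -3}(-3) && \text{(Sum and Difference Rules)}\\
&= -3 + (-3) = -6 && \text{(Power and Constant Rules).}
\end{aligned}$$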

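(b) The derivative of g at 0 exists and equals 0:
$$\begin{aligned}
\lim_{x\to 0}\frac{g(x) - g(0)}{x - 0} &= \lim_{x\to 0}\frac{x^2 - 0^2}{x - 0} && \text{(Simply plug in)}\\
&= \lim_{x\to 0}\frac{(x - 0)(x + 0)}{x - 0}\\
&= \lim_{x\to 0}(x + 0) && \text{(Note that } x \neq 0)\\
&= \lim_{x\to 0} x + \lim_{x\to 0} 0 && \text{(Sum and Difference Rules)}\\
&= 0 + 0 = 0 && \text{(Power and Constant Rules).}
\end{aligned}$$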

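(c) The derivative of g at a ∈ R exists and equals 2a:
$$\begin{aligned}
\lim_{x\to a}\frac{g(x) - g(a)}{x - a} &= \lim_{x\to a}\frac{x^2 - a^2}{x - a} && \text{(Simply plug in)}\\
&= \lim_{x\to a}\frac{(x - a)(x + a)}{x - a}\\
&= \lim_{x\to a}(x + a) && \text{(Note that } x \neq a)\\
&= \lim_{x\to a} x + \lim_{x\to a} a && \text{(Sum and Difference Rules)}\\
&= a + a = 2a && \text{(Power and Constant Rules).}
\end{aligned}$$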

A291. For any a ∈ R, we have:
$$\begin{aligned}
\lim_{x\to a}\frac{i(x) - i(a)}{x - a} &= \lim_{x\to a}\frac{x^4 - a^4}{x - a} && \text{(Simply plug in)}\\
&\overset{\star}{=} \lim_{x\to a}\frac{(x - a)(x^3 + ax^2 + a^2x + a^3)}{x - a}\\
&= \lim_{x\to a}(x^3 + ax^2 + a^2x + a^3) && \text{(Note that } x \neq a)\\
&= \lim_{x\to a}x^3 + a\lim_{x\to a}x^2 + a^2\lim_{x\to a}x + \lim_{x\to a}a^3 && \text{(Sum, Difference, and Constant Factor Rules)}\\
&= a^3 + a\cdot a^2 + a^2\cdot a + a^3 = 4a^3 && \text{(Power and Constant Rules).}
\end{aligned}$$
What we’ve just shown is that for any a ∈ R, the derivative of i at a is $4a^3$.
Thus, the derivative of i is the function i′ ∶ R → R defined by $i'(x) = 4x^3$.
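A292. For any a ∈ R, we have:
$$\begin{aligned}
\lim_{x\to a}\frac{f(x) - f(a)}{x - a} &= \lim_{x\to a}\frac{x^c - a^c}{x - a} && \text{(Simply plug in)}\\
&\overset{\circ}{=} \lim_{x\to a}\frac{(x - a)(x^{c-1} + x^{c-2}a + x^{c-3}a^2 + \dots + xa^{c-2} + a^{c-1})}{x - a}\\
&= \lim_{x\to a}(x^{c-1} + x^{c-2}a + x^{c-3}a^2 + \dots + xa^{c-2} + a^{c-1}) && \text{(Note that } x \neq a)\\
&= \lim_{x\to a}x^{c-1} + a\lim_{x\to a}x^{c-2} + a^2\lim_{x\to a}x^{c-3} + \dots + a^{c-2}\lim_{x\to a}x + \lim_{x\to a}a^{c-1} && \text{(Sum, Difference, and Constant Factor Rules)}\\
&= a^{c-1} + a\cdot a^{c-2} + a^2\cdot a^{c-3} + \dots + a^{c-2}\cdot a + a^{c-1} = ca^{c-1} && \text{(Power and Constant Rules).}
\end{aligned}$$
We’ve just shown that for any a ∈ R, the derivative of f at a is $ca^{c-1}$. Thus, the derivative
of f is the function f′ ∶ R → R defined by $f'(x) = cx^{c-1}$.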
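The limit above can be illustrated numerically. This is a minimal sketch, not a proof: the names `difference_quotient` and `power`, and the sample values c = 5 and a = 2, are our own choices for illustration.

```python
def difference_quotient(f, a, h):
    """Slope of the chord of f between a and a + h."""
    return (f(a + h) - f(a)) / h

def power(c):
    """Return the function x -> x**c."""
    return lambda x: x ** c

c, a = 5, 2.0
for h in (1e-2, 1e-4, 1e-6):
    # As h shrinks, the quotient should approach c * a**(c - 1)
    print(h, difference_quotient(power(c), a, h), c * a ** (c - 1))
```

For very small h the quotient eventually suffers from floating-point cancellation, so the values of h here are kept moderate.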
A293. The mistake is in Step 4.
This is a common mistake made by students. To find the derivative of f at 2, we must
first find the derivative of f then plug in 2. The mistake here is to plug in 2 first, getting
f (2) = −2, then differentiating the constant −2, which of course yields 0.
A294(a) Observe that $\frac{1}{g(a)}$ is a constant. And so, by the Constant Rule for Limits
(Theorem 14), we have:
$$\lim_{x\to a}\frac{1}{g(a)} \overset{1}{=} \frac{1}{g(a)}.$$
(b) Since f and g are differentiable at a, by Theorem 19 (result), they are also continuous
at a. And so, by the definition of continuity, we have:
$$\lim_{x\to a}f(x) \overset{2}{=} f(a) \quad\text{and}\quad \lim_{x\to a}g(x) \overset{3}{=} g(a).$$
(c) Since g is continuous at a and g (a) ≠ 0, by Theorem 16 (result), the reciprocal
function 1/g is also continuous at a. And so, again by the definition of continuity, we have:
$$\lim_{x\to a}\frac{1}{g(x)} \overset{4}{=} \frac{1}{g(a)}.$$
(d) By definition of the derivative, we have:
$$\lim_{x\to a}\frac{f(x) - f(a)}{x - a} \overset{5}{=} f'(a) \quad\text{and}\quad \lim_{x\to a}\frac{g(x) - g(a)}{x - a} \overset{6}{=} g'(a).$$
(e) Now return to expression ⋆ and plug in the equations $\overset{1}{=}$ through $\overset{6}{=}$ to get:
$$\begin{aligned}
\lim_{x\to a}\frac{(f/g)(x) - (f/g)(a)}{x - a} &= \lim_{x\to a}\frac{1}{g(a)}\,\lim_{x\to a}\frac{1}{g(x)}\left[\lim_{x\to a}g(x)\lim_{x\to a}\frac{f(x) - f(a)}{x - a} - \lim_{x\to a}f(x)\lim_{x\to a}\frac{g(x) - g(a)}{x - a}\right]\\
&= \frac{1}{g(a)}\,\frac{1}{g(a)}\left[g(a)f'(a) - f(a)g'(a)\right] = \frac{g(a)f'(a) - f(a)g'(a)}{[g(a)]^2}.
\end{aligned}$$
(f) We’ve just shown that for any a ∈ D ∖ {x ∶ g (x) = 0}, the derivative of f /g at a is:
$$\frac{g(a)f'(a) - f(a)g'(a)}{[g(a)]^2}.$$
Thus, the derivative of f /g is the function (f /g)′ ∶ D ∖ {x ∶ g (x) = 0} → R defined by:
$$\left(\frac{f}{g}\right)'(x) = \frac{g(x)f'(x) - f(x)g'(x)}{[g(x)]^2}.$$
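The Quotient Rule formula can be sanity-checked against a numerical derivative. The sketch below is illustrative only: the choices f = sin, g(x) = x² + 1, and the point a = 0.7 are ours, as are the helper names.

```python
import math

def quotient_rule(f, df, g, dg, a):
    """Evaluate [g(a) f'(a) - f(a) g'(a)] / [g(a)]^2."""
    return (g(a) * df(a) - f(a) * dg(a)) / g(a) ** 2

def central_difference(func, a, h=1e-6):
    """Symmetric numerical approximation of func'(a)."""
    return (func(a + h) - func(a - h)) / (2 * h)

f, df = math.sin, math.cos
g = lambda x: x * x + 1.0      # never zero, so f/g is defined on all of R
dg = lambda x: 2.0 * x

a = 0.7
exact = quotient_rule(f, df, g, dg, a)
approx = central_difference(lambda x: f(x) / g(x), a)
print(exact, approx)
```

The two printed numbers should agree to several decimal places; any small gap is the truncation and rounding error of the central difference.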


128.4. Ch. 71 Answers (The Second and Higher Derivatives)
A299(a) False. As noted above, it is true that if a function is twice differentiable, then it
is also differentiable. However, the converse is not generally true.
A300. Since f is a differentiable function, by Definition 168, it is differentiable at every
point in its domain. Thus, the (first) derivative f ′ has the same domain as f . That is,
Domainf ′ = Domainf = D.
We are given no information about the second derivative of f . Thus, all we can say about
the domain of f ′′ is that it must be a subset of the domain of f ′ . That is, Domainf ′′ ⊆
Domainf ′ = D. It may even be that Domainf ′′ = ∅ — that is, f is not twice differentiable
at any point.
A301. We have already used $f^n$ to denote the composite function $\underbrace{f \circ f \circ \dots \circ f}_{n \text{ times}}$. So, the
parentheses help us distinguish $f^{(n)}$, the nth derivative of f, from the composite function $f^n$.
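A302. Every one of g’s derivatives has R as its domain and codomain. We have:
$$\begin{aligned}
g'(x) &= \frac{dg}{dx}(x) = \dot{g}(x) = 4x^3 - 3x^2 + 2x - 1 &&\text{and}& g'(1) &= \frac{dg}{dx}(1) = \dot{g}(1) = 2,\\
g''(x) &= \frac{d^2g}{dx^2}(x) = \ddot{g}(x) = 12x^2 - 6x + 2 &&\text{and}& g''(1) &= \frac{d^2g}{dx^2}(1) = \ddot{g}(1) = 8,\\
g^{(3)}(x) &= \frac{d^3g}{dx^3}(x) = \dddot{g}(x) = 24x - 6 &&\text{and}& g^{(3)}(1) &= \frac{d^3g}{dx^3}(1) = \dddot{g}(1) = 18,\\
g^{(4)}(x) &= \frac{d^4g}{dx^4}(x) = \overset{4}{\dot{g}}(x) = 24 &&\text{and}& g^{(4)}(1) &= \frac{d^4g}{dx^4}(1) = \overset{4}{\dot{g}}(1) = 24.
\end{aligned}$$
For any n ≥ 5:
$$g^{(n)}(x) = \frac{d^ng}{dx^n}(x) = \overset{n}{\dot{g}}(x) = 0 \quad\text{and}\quad g^{(n)}(1) = \frac{d^ng}{dx^n}(1) = \overset{n}{\dot{g}}(1) = 0.$$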
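The successive derivatives and their values at 1 can be reproduced mechanically. This is a minimal sketch that starts from the stated g′ and represents polynomials as coefficient lists in ascending powers; the helper names `differentiate` and `evaluate` are ours.

```python
def differentiate(coeffs):
    """Differentiate a polynomial given as ascending coefficients [c0, c1, ...]."""
    return [k * c for k, c in enumerate(coeffs)][1:]

def evaluate(coeffs, x):
    """Evaluate the polynomial at x (the empty list is the zero polynomial)."""
    return sum(c * x ** k for k, c in enumerate(coeffs))

g_prime = [-1, 2, -3, 4]            # g'(x) = 4x^3 - 3x^2 + 2x - 1
values_at_1 = [evaluate(g_prime, 1)]
current = g_prime
for _ in range(4):                  # g'', g''', g^(4), g^(5)
    current = differentiate(current)
    values_at_1.append(evaluate(current, 1))

print(values_at_1)                  # values of g', g'', g^(3), g^(4), g^(5) at 1
```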


128.5. Ch. 70.2 Answers (Implicit Differentiation)
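A296(a) $\frac{d}{dx}\sec x = \frac{d}{dx}\frac{1}{\cos x} \overset{\div}{=} \frac{0 - (-\sin x)}{\cos^2 x} = \frac{1}{\cos x}\frac{\sin x}{\cos x} = \sec x\tan x$.
(c) Let $y = \cos^{-1}x \in [0, \pi]$. Then $x \overset{1}{=} \cos y$. Apply $\frac{d}{dx}$ to $\overset{1}{=}$ to get $1 \overset{2}{=} -\sin y\,\frac{dy}{dx}$.
From the identity $\sin^2 y + \cos^2 y = 1$, we have $\sin y \overset{3}{=} \pm\sqrt{1 - x^2}$. But since $y \in [0, \pi]$, we know
that $\sin y \geq 0$. And so in $\overset{3}{=}$, we may simply discard the negative value to get $\sin y \overset{4}{=} \sqrt{1 - x^2}$.
Now plug $\overset{4}{=}$ into $\overset{2}{=}$ to get:
$$1 = -\sqrt{1 - x^2}\,\frac{dy}{dx} \quad\text{or}\quad \frac{dy}{dx} = \frac{-1}{\sqrt{1 - x^2}}.$$
(d) Let $y = \tan^{-1}x$. Then $x \overset{1}{=} \tan y$. Apply $\frac{d}{dx}$ to $\overset{1}{=}$ to get $1 \overset{2}{=} \sec^2 y\,\frac{dy}{dx}$.
From the identity $\sec^2 y = 1 + \tan^2 y$, we have $\sec^2 y \overset{3}{=} 1 + x^2$. Plug $\overset{3}{=}$ into $\overset{2}{=}$ to get:
$$1 = (1 + x^2)\frac{dy}{dx} \quad\text{or}\quad \frac{dy}{dx} = \frac{1}{1 + x^2}.$$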


128.6. Ch. ?? Answers (Solving Problems Involving
A295. To the given equation $x^2y + \sin x = 0$, apply $\frac{d}{dx}$ to get $2xy + x^2\frac{dy}{dx} + \cos x = 0$. So
$\frac{dy}{dx} = -\frac{2xy + \cos x}{x^2}$ (for $x \neq 0$). And $\frac{dx}{dy} = -\frac{x^2}{2xy + \cos x}$ (for $2xy + \cos x \neq 0$).
Alternative Method. To find $\frac{dx}{dy}$, we could also have applied $\frac{d}{dy}$ to the given equation:
$$2x\frac{dx}{dy}y + x^2 + \cos x\,\frac{dx}{dy} = 0 \quad\text{and so again,}\quad \frac{dx}{dy} = -\frac{x^2}{2xy + \cos x}.$$

A297. Given $x = \cos t + t^2$ and $y = e^t - t^3$, we may compute $\frac{dx}{dt} = -\sin t + 2t$ and $\frac{dy}{dt} = e^t - 3t^2$.
So $\frac{dy}{dx} = \frac{e^t - 3t^2}{-\sin t + 2t}$ (for $-\sin t + 2t \neq 0$).

A318. Given $x = t^5 + t$ and $y = t^4 - t$, we have $t = 0 \implies (x, y) = (0, 0)$. And $t = 1 \implies (x, y) = (2, 0)$.
Compute $\frac{dy}{dx} = \frac{dy}{dt} \div \frac{dx}{dt} = \frac{4t^3 - 1}{5t^4 + 1}$ (for $5t^4 + 1 \neq 0$).
And so at t = 0, $\frac{dy}{dx} = \frac{4t^3 - 1}{5t^4 + 1} = -1$ and so the tangent line at t = 0 has equation $y = -x$.
While at t = 1, $\frac{dy}{dx} = \frac{4t^3 - 1}{5t^4 + 1} = \frac{3}{6} = \frac{1}{2}$ and so the tangent line at t = 1 has equation
$y = \frac{1}{2}(x - 2) = \frac{1}{2}x - 1$.
And so at their intersection, we have $-x = \frac{1}{2}x - 1$ or $x = \frac{2}{3}$. So their intersection point is
$\left(\frac{2}{3}, -\frac{2}{3}\right)$.
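The tangent-line computation in A318 can be replayed in exact arithmetic. This is a minimal sketch using Python's standard `fractions` module; the helper names `point`, `slope`, and `tangent` are ours.

```python
from fractions import Fraction

def point(t):
    """Point (x, y) on the curve x = t^5 + t, y = t^4 - t."""
    return (t ** 5 + t, t ** 4 - t)

def slope(t):
    """dy/dx = (dy/dt) / (dx/dt) as an exact fraction."""
    return Fraction(4 * t ** 3 - 1, 5 * t ** 4 + 1)

def tangent(t):
    """Return (m, c) such that the tangent line at parameter t is y = m x + c."""
    x0, y0 = point(t)
    m = slope(t)
    return m, y0 - m * x0

m0, c0 = tangent(0)
m1, c1 = tangent(1)

# Solve m0 x + c0 = m1 x + c1 for the intersection of the two tangent lines
x = (c1 - c0) / (m0 - m1)
y = m0 * x + c0
print(x, y)
```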

A283. Given f ∶ R → R defined by $f(x) = \begin{cases} 1, & \text{if } x \leq 0,\\ 2, & \text{if } x > 0, \end{cases}$ we have
$$\lim_{x\to -5}f(x) = 1, \quad \lim_{x\to 0}f(x) \text{ is undefined, and } \lim_{x\to 5}f(x) = 2.$$

The 2nd derivative of g is the function with domain and codomain both R and mapping
rule $x \mapsto 12x^2 - 6x + 2$. It may be denoted $g''$ or $\frac{d^2g}{dx^2}$ or $\ddot{g}$. Evaluated at 1, we have
$g''(1) = \left.\frac{d^2g}{dx^2}\right|_{x=1} = \ddot{g}(1) = 8$.

The 3rd derivative of g is the function with domain and codomain both R and mapping
rule $x \mapsto 24x - 6$. It may be denoted $g^{(3)}$ or $\frac{d^3g}{dx^3}$ or $\dddot{g}$. Evaluated at 1, we have
$g^{(3)}(1) = \left.\frac{d^3g}{dx^3}\right|_{x=1} = \dddot{g}(1) = 18$.

The 4th derivative of g is the function with domain and codomain both R and mapping
rule $x \mapsto 24$. It may be denoted $g^{(4)}$ or $\frac{d^4g}{dx^4}$ or $\overset{4}{\dot{g}}$. Evaluated at 1, we have
$g^{(4)}(1) = \left.\frac{d^4g}{dx^4}\right|_{x=1} = \overset{4}{\dot{g}}(1) = 24$.

For n ≥ 5, the nth derivative of g is the function with domain and codomain both R
and mapping rule $x \mapsto 0$. It may be denoted $g^{(n)}$ or $\frac{d^ng}{dx^n}$ or $\overset{n}{\dot{g}}$. Evaluated at 1, we have
$g^{(n)}(1) = \left.\frac{d^ng}{dx^n}\right|_{x=1} = \overset{n}{\dot{g}}(1) = 0$.


128.7. Ch. 72.1 Answers (Stationary, Maximum, Minimum, and
Inflexion Points)
A307. (a) Given the function f ∶ R → R defined by x ↦ 100 ...
(i) Every point a ∈ R is a maximum point with corresponding maximum value f (a) = 100;
(ii) Every point a ∈ R is a minimum point with corresponding minimum value f (a) = 100;
(iii) No point is a strict maximum;
(iv) No point is a strict minimum;
(v) Every point a ∈ R is a global maximum point with corresponding global maximum
value f (a) = 100;
(vi) Every point a ∈ R is a global minimum point with corresponding global minimum
value f (a) = 100;
(vii) No point is a strict global maximum;
(viii) No point is a strict global minimum.

(b) Given the function g ∶ R → R defined by x ↦ x2 ...

(i) No point is a maximum;
(ii) Only x = 0 is a minimum point with corresponding minimum value g (0) = 0;
(iii) No point is a strict maximum;
(iv) Only x = 0 is a strict minimum point with corresponding strict minimum value g (0) = 0;
(v) No point is a global maximum;
(vi) Only x = 0 is a global minimum point with corresponding global minimum value
g (0) = 0;
(vii) No point is a strict global maximum;
(viii) Only x = 0 is a strict global minimum point with corresponding strict global minimum
value g (0) = 0.


A307. (c) Given the function h ∶ [1, 2] → R defined by x ↦ x2 ...
(i) Only x = 2 is a maximum with corresponding maximum value h(2) = 4;
(ii) Only x = 1 is a minimum point with corresponding minimum value h(1) = 1;
(iii) Only x = 2 is a strict maximum point with corresponding strict maximum value
h(2) = 4;
(iv) Only x = 1 is a strict minimum point with corresponding strict minimum value h(1) = 1;
(v) Only x = 2 is a global maximum with corresponding global maximum value h(2) = 4;
(vi) Only x = 1 is a global minimum point with corresponding global minimum value
h(1) = 1;
(vii) Only x = 2 is a strict global maximum point with corresponding strict global maximum
value h(2) = 4;
(viii) Only x = 1 is a strict global minimum point with corresponding strict global minimum
value h(1) = 1.

A308. (a) It is false that every maximum point or minimum point is a stationary point
— see Points A and E in Example 722.
(b) It is also false that every maximum point or minimum point is a turning point — again,
see Points A and E in Example 722.
(c) It is false that every stationary point is a maximum point or minimum point — see
Point D in Example 722.
(d) By Definition ??, it is true that every turning point is a maximum point or minimum point.
(e) By Definition ??, it is true that every turning point is a stationary point.
(f) It is false that every stationary point is a turning point — again, see Point D in Example 722.

A309. In order for −1 to be a minimum point of g, it must be that to its left, g is

decreasing; while to its right, g is increasing. In other words, to the left of −1, g ′ (x) ≤ 0.
While to the right of −1, g ′ (x) ≥ 0. Altogether then, we must have g ′ (−1) = 0 — at the
minimum point, the gradient of the function must be 0.

A310. “If c is a maximum or minimum point AND in the interior of D, then c is a turning
point” — true! By the IET, c is a stationary point. Since c is also either a maximum or a
minimum point, by Definition ??, c is also a turning point.


A311. (a) f ∶ R → R defined by x ↦ x.
1. Identify all the stationary points (i.e. x where f ′ (x) = 0).
f ′ (x) = 1, for all x. So f has no stationary points.
2. Identify all the non-interior points.
There are no non-interior points because every point x ∈ R is in the interior of R.
3. Check if each of these points is a maximum point, a minimum point, or neither.
There are neither stationary nor non-interior points. Hence, there are no maximum or
minimum points.

(b) g ∶ [0, 1] → R defined by x ↦ x.

1. Identify all the stationary points (i.e. x where g ′ (x) = 0).
g ′ (x) = 1, for all x. So g has no stationary points.
2. Identify all the non-interior points.
The only two non-interior points are 0 and 1.
3. Check if each of these points is a maximum point, a minimum point, or neither.
0 is the only minimum point and 1 is the only maximum point of g.


A311. (c) h ∶ R → R defined by x ↦ x4 − 2x2 .
1. Identify all the stationary points (i.e. x where h′ (x) = 0).
$h'(x) = 4x^3 - 4x = 4x(x^2 - 1) = 4x(x - 1)(x + 1)$. So the stationary points of h are 0, 1, and −1.
2. Identify all the non-interior points.
There are no non-interior points because every point x ∈ R is in the interior of R.
3. Check if each of these points is a maximum point, a minimum point, or neither.
From a sketch of the graph, we see that 0 is a maximum point. And ±1 are minimum points
(and also global minimum points).


A313(a) g ∶ R → R defined by $x \mapsto x^8 + x^7 - x^6$.
1. Identify all the stationary points. $g'(x) = 8x^7 + 7x^6 - 6x^5 = x^5(8x^2 + 7x - 6) = 0 \iff$
$$x = 0, \quad\text{or}\quad x = \frac{-7 \pm \sqrt{7^2 - 4(8)(-6)}}{2(8)} = \frac{-7 \pm \sqrt{241}}{16}.$$
(a) $g''(x) = 56x^6 + 42x^5 - 30x^4$. So
$$g''(0) = 0, \quad g''\left(\frac{-7 - \sqrt{241}}{16}\right) > 0, \quad\text{and}\quad g''\left(\frac{-7 + \sqrt{241}}{16}\right) > 0.$$
(b) So $\frac{-7 \pm \sqrt{241}}{16}$ are both minimum points. The 2DT is inconclusive about 0. But for all x “near” 0, we have $8x^2 + 7x - 6 < 0$, so that $g'(x) = x^5(8x^2 + 7x - 6)$ is positive just to the left of 0 and negative just to the right of 0. Hence, 0 is a maximum point.

2. There are no non-interior points.

Altogether, we conclude that the only maximum point is 0 and the only two minimum
points are $\frac{-7 \pm \sqrt{241}}{16}$.
(b) $h : \left(-\frac{\pi}{2}, \frac{\pi}{2}\right) \to \mathbb{R}$ defined by $x \mapsto \tan x$.
1. Identify all the stationary points. h′ (x) = sec2 x is never equal to 0, so there are no
stationary points.
2. There are no non-interior points.
Altogether, we conclude that there are no maximum points and no minimum points.
(c) i ∶ [0, 2π] → R defined by x ↦ sin x + cos x.
1. Identify all the stationary points. $i'(x) = \cos x - \sin x = 0 \iff x = \frac{\pi}{4}, \frac{5\pi}{4}$.
(a) $i''(x) = -\sin x - \cos x$. So $i''\left(\frac{\pi}{4}\right) < 0$ and $i''\left(\frac{5\pi}{4}\right) > 0$.
(b) So $\frac{\pi}{4}$ is a maximum point and $\frac{5\pi}{4}$ is a minimum point.
2. The only two non-interior points are 0 and 2π. The former is a minimum point and the
latter is a maximum point.
Altogether, we conclude that the two maximum points are $\frac{\pi}{4}$ and 2π, and the two minimum
points are $\frac{5\pi}{4}$ and 0.
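The classification in part (c) can be checked numerically by comparing the function's value at each candidate with its values at nearby points inside the domain. This is a rough sketch, not a proof; the helper `classify` and the step size `eps` are our own illustrative choices.

```python
import math

def i(x):
    return math.sin(x) + math.cos(x)

def classify(f, c, lo, hi, eps=1e-4):
    """Compare f(c) with f at neighbouring points that lie inside [lo, hi]."""
    neighbours = [x for x in (c - eps, c + eps) if lo <= x <= hi]
    if all(f(c) >= f(x) for x in neighbours):
        return "maximum"
    if all(f(c) <= f(x) for x in neighbours):
        return "minimum"
    return "neither"

# Stationary points pi/4 and 5pi/4, plus the non-interior points 0 and 2pi
candidates = {"0": 0.0, "pi/4": math.pi / 4,
              "5pi/4": 5 * math.pi / 4, "2pi": 2 * math.pi}
results = {name: classify(i, c, 0.0, 2 * math.pi) for name, c in candidates.items()}
print(results)
```

At the endpoints only one neighbour lies in the domain, which is why 0 and 2π can be minimum and maximum points without being stationary points.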


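A319. (a) The volume is fixed as $1 = \frac{1}{3}\pi r^2 h$. So $r = \sqrt{\frac{3}{\pi h}}$.

(b) By the Pythagorean Theorem, $l = \sqrt{r^2 + h^2} = \sqrt{\frac{3}{\pi h} + h^2}$.

(c) The total external surface area of the cone (including the base) is
$$A = \pi r l = \pi\sqrt{\frac{3}{\pi h}}\sqrt{\frac{3}{\pi h} + h^2} = \pi\sqrt{\frac{9}{\pi^2h^2} + \frac{3h}{\pi}} = \sqrt{\frac{9}{h^2} + 3\pi h}.$$

(d) Compute
$$\frac{dA}{dh} = \frac{-\frac{18}{h^3} + 3\pi}{2\sqrt{\frac{9}{h^2} + 3\pi h}} = \frac{3\left(\pi - \frac{6}{h^3}\right)}{2\sqrt{\frac{9}{h^2} + 3\pi h}} = \frac{3\left(\pi - \frac{6}{h^3}\right)}{2A}.$$
So $\frac{dA}{dh} = 0 \iff h = \left(\frac{6}{\pi}\right)^{1/3}$.

(e) Use the quotient rule:
$$\frac{d^2A}{dh^2} = \frac{3}{2}\,\frac{\frac{18}{h^4}A - \left(\pi - \frac{6}{h^3}\right)\frac{dA}{dh}}{A^2} = \frac{3}{2}\,\frac{\frac{18}{h^4}A - \left(\pi - \frac{6}{h^3}\right)\frac{3}{2}\frac{\pi - \frac{6}{h^3}}{A}}{A^2} = \frac{9}{4}\,\frac{\frac{12}{h^4}A^2 - \left(\pi - \frac{6}{h^3}\right)^2}{A^3}.$$

(f) $$\frac{12}{h^4}A^2 - \left(\pi - \frac{6}{h^3}\right)^2 = \frac{12}{h^4}\left(\frac{9}{h^2} + 3\pi h\right) - \left(\pi - \frac{6}{h^3}\right)^2 = \frac{108}{h^6} + \frac{36\pi}{h^3} - \left(\pi^2 + \frac{36}{h^6} - \frac{12\pi}{h^3}\right) = \frac{72}{h^6} + \frac{48\pi}{h^3} - \pi^2.$$
This is a ∪-shaped quadratic in $\frac{1}{h^3}$, whose discriminant is $(48\pi)^2 - 4(72)(-\pi^2) = 2592\pi^2 > 0$, so it is not positive for every h > 0 (it tends to $-\pi^2$ as h → ∞). At the stationary point found in (d), however, $\frac{1}{h^3} = \frac{\pi}{6}$ and the expression equals $72\left(\frac{\pi}{6}\right)^2 + 48\pi\left(\frac{\pi}{6}\right) - \pi^2 = 2\pi^2 + 8\pi^2 - \pi^2 = 9\pi^2 > 0$.

(g) So $\frac{d^2A}{dh^2}$ is positive at the stationary point, which is therefore a minimum point. Moreover, by (d), $\frac{dA}{dh}$ is negative for $h < \left(\frac{6}{\pi}\right)^{1/3}$ and positive for $h > \left(\frac{6}{\pi}\right)^{1/3}$.
That is, A is strictly decreasing and then strictly increasing. So the stationary point we found in (d) must also
be the global minimum point.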
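A quick numerical check of the stationary point in A319 is sketched below. The function names `area` and `d_area` and the search grid are our own illustrative choices; the formulas are those for A and dA/dh from parts (c) and (d).

```python
import math

def area(h):
    """A(h) = sqrt(9/h^2 + 3*pi*h), from part (c)."""
    return math.sqrt(9.0 / h ** 2 + 3.0 * math.pi * h)

def d_area(h):
    """dA/dh = 3(pi - 6/h^3) / (2A), from part (d)."""
    return 1.5 * (math.pi - 6.0 / h ** 3) / area(h)

h_star = (6.0 / math.pi) ** (1.0 / 3.0)      # stationary point from part (d)

# Coarse grid search over h in (0, 10] as an independent check
grid = [0.05 * k for k in range(1, 201)]
h_best = min(grid, key=area)

print(h_star, area(h_star), d_area(h_star), h_best)
```

The grid minimiser should sit next to `h_star`, and `d_area` changes sign from negative to positive there, which is the behaviour that makes the stationary point a global minimum.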


128.8. Ch. 76 Answers (Maclaurin Series)
A320(a) First, the derivatives of f are:
$$\begin{aligned}
f'(x) &= k(1 + x)^{k-1},\\
f''(x) &= k(k-1)(1 + x)^{k-2},\\
f^{(3)}(x) &= k(k-1)(k-2)(1 + x)^{k-3},\\
f^{(4)}(x) &= k(k-1)(k-2)(k-3)(1 + x)^{k-4},\\
&\ \ \vdots\\
f^{(n)}(x) &= k(k-1)\dots(k-n+1)(1 + x)^{k-n}.
\end{aligned}$$
Next, evaluate each of f, f′, f′′, etc. at 0:
$$\begin{aligned}
f(0) &= (1 + 0)^k = 1,\\
f'(0) &= k(1 + 0)^{k-1} = k,\\
f''(0) &= k(k-1)(1 + 0)^{k-2} = k(k-1),\\
f^{(3)}(0) &= k(k-1)(k-2)(1 + 0)^{k-3} = k(k-1)(k-2),\\
f^{(4)}(0) &= k(k-1)(k-2)(k-3)(1 + 0)^{k-4} = k(k-1)(k-2)(k-3),\\
&\ \ \vdots\\
f^{(n)}(0) &= k(k-1)\dots(k-n+1)(1 + 0)^{k-n} = k(k-1)\dots(k-n+1) = \frac{k!}{(k-n)!}.
\end{aligned}$$
The nth Maclaurin coefficient generated by f is:
$$c_n = \frac{f^{(n)}(0)}{n!} = \frac{k(k-1)\dots(k-n+1)}{n!} = \frac{k!}{(k-n)!\,n!}.$$
And the Maclaurin series generated by f is:
$$\sum_{n=0}^{\infty}c_nx^n = \sum_{n=0}^{\infty}\frac{f^{(n)}(0)}{n!}x^n = \sum_{n=0}^{\infty}\frac{k!}{(k-n)!\,n!}x^n = 1 + kx + \frac{k(k-1)}{2!}x^2 + \frac{k(k-1)(k-2)}{3!}x^3 + \dots$$
For all x ∈ (−1, 1), we have:
$$f(x) = (1 + x)^k = 1 + kx + \frac{k(k-1)}{2!}x^2 + \frac{k(k-1)(k-2)}{3!}x^3 + \dots$$
(b) First, the derivatives of cos are:
$$\cos' x = -\sin x, \quad \cos'' x = -\cos x, \quad \cos^{(3)} x = \sin x, \quad \cos^{(4)} x = \cos x, \quad \cos^{(5)} x = -\sin x.$$
We observe a cycle after every four derivatives. And so in general:
$$\cos^{(n)} x = \begin{cases} \cos x & \text{for } n = 0, 4, 8, \dots,\\ -\sin x & \text{for } n = 1, 5, 9, \dots,\\ -\cos x & \text{for } n = 2, 6, 10, \dots,\\ \sin x & \text{for } n = 3, 7, 11, \dots \end{cases}$$
Next, evaluate each of cos, cos′, cos′′, etc. at 0:
$$\cos^{(n)} 0 = \begin{cases} 1 & \text{for } n = 0, 4, 8, \dots,\\ 0 & \text{for } n = 1, 5, 9, \dots,\\ -1 & \text{for } n = 2, 6, 10, \dots,\\ 0 & \text{for } n = 3, 7, 11, \dots \end{cases}$$
So, for each $n \in \mathbb{Z}_0^+$, the nth Maclaurin coefficient generated by cos is:
$$c_n = \frac{\cos^{(n)}(0)}{n!} = \begin{cases} 1/n! & \text{for } n = 0, 4, 8, \dots,\\ 0/n! = 0 & \text{for } n = 1, 5, 9, \dots,\\ -1/n! & \text{for } n = 2, 6, 10, \dots,\\ 0/n! = 0 & \text{for } n = 3, 7, 11, \dots \end{cases}$$
And the Maclaurin series generated by cos is:
$$\sum_{n=0}^{\infty}c_nx^n = \sum_{n=0}^{\infty}\frac{\cos^{(n)}(0)}{n!}x^n = \frac{1}{0!} - \frac{x^2}{2!} + \frac{x^4}{4!} - \dots$$
For all x ∈ R, we have:
$$\cos x = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \dots$$
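(c) First, the derivatives of g are:
$$\begin{aligned}
g'(x) &= \frac{1}{1 + x},\\
g''(x) &= \frac{-1}{(1 + x)^2},\\
g^{(3)}(x) &= \frac{2\cdot 1}{(1 + x)^3},\\
g^{(4)}(x) &= \frac{-3\cdot 2\cdot 1}{(1 + x)^4},\\
&\ \ \vdots\\
g^{(n)}(x) &= \frac{(-1)^{n-1}(n-1)!}{(1 + x)^n}.
\end{aligned}$$
Next, evaluate each of g, g′, g′′, etc. at 0:
$$\begin{aligned}
g(0) &= \ln(1 + 0) = 0,\\
g'(0) &= \frac{1}{1 + 0} = 1,\\
g''(0) &= \frac{-1}{(1 + 0)^2} = -1,\\
g^{(3)}(0) &= \frac{2\cdot 1}{(1 + 0)^3} = 2,\\
g^{(4)}(0) &= \frac{-3\cdot 2\cdot 1}{(1 + 0)^4} = -3!,\\
&\ \ \vdots\\
g^{(n)}(0) &= \frac{(-1)^{n-1}(n-1)!}{(1 + 0)^n} = (-1)^{n-1}(n-1)!
\end{aligned}$$
For n ≥ 1, the nth Maclaurin coefficient generated by g is:
$$c_n = \frac{g^{(n)}(0)}{n!} = \frac{(-1)^{n-1}(n-1)!}{n!} = \frac{(-1)^{n-1}}{n}.$$
(And $c_0 = g(0) = 0$.) And the Maclaurin series generated by g is:
$$\sum_{n=0}^{\infty}c_nx^n = \sum_{n=0}^{\infty}\frac{g^{(n)}(0)}{n!}x^n = \sum_{n=1}^{\infty}\frac{(-1)^{n-1}}{n}x^n = \frac{x}{1} - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \dots$$
For all x ∈ (−1, 1], we have:
$$g(x) = \ln(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \dots$$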
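To see the series for ln (1 + x) at work, the sketch below sums its first few terms and compares with `math.log`. The function name `ln1p_partial_sum` and the sample values of x are ours.

```python
import math

def ln1p_partial_sum(x, terms):
    """Sum of the first `terms` terms of x - x^2/2 + x^3/3 - ..."""
    return sum((-1) ** (n - 1) * x ** n / n for n in range(1, terms + 1))

for x, terms in ((0.5, 10), (0.5, 40), (1.0, 1000)):
    approx = ln1p_partial_sum(x, terms)
    # Error shrinks quickly for |x| < 1 but only slowly at the endpoint x = 1
    print(x, terms, approx, math.log(1 + x), abs(approx - math.log(1 + x)))
```

Convergence at x = 1 is noticeably slower than at x = 0.5, consistent with x = 1 being the endpoint of the interval (−1, 1] on which the representation holds.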
A321(a) From List MF26, we know that $\exp x = M(x) = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \dots$ Thus, the
0th, 1st, 2nd, and 3rd Maclaurin series polynomials generated by exp are simply:
$$M_0(x) = 1, \quad M_1(x) = 1 + x, \quad M_2(x) = 1 + x + \frac{x^2}{2!}, \quad\text{and}\quad M_3(x) = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!}.$$

Figure to be
inserted here.

(b) From List MF26, we know that $f(x) = (1 + x)^n = M(x) = 1 + nx + \frac{n(n-1)}{2!}x^2 + \frac{n(n-1)(n-2)}{3!}x^3 + \dots$
Thus, the 0th, 1st, 2nd, and 3rd Maclaurin series polynomials generated by f are simply:
$$M_0(x) = 1, \quad M_1(x) = 1 + nx, \quad M_2(x) = 1 + nx + \frac{n(n-1)}{2!}x^2, \quad\text{and}$$
$$M_3(x) = 1 + nx + \frac{n(n-1)}{2!}x^2 + \frac{n(n-1)(n-2)}{3!}x^3.$$

Figure to be
inserted here.

(c) From List MF26, we know that $\cos x = M(x) = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \dots = 1 - \frac{x^2}{2} + \frac{x^4}{4!} - \dots$ Thus,
the 0th, 1st, 2nd, and 3rd Maclaurin series polynomials generated by cos are simply:
$$M_0(x) = 1, \quad M_1(x) = 1, \quad M_2(x) = 1 - \frac{x^2}{2}, \quad\text{and}\quad M_3(x) = 1 - \frac{x^2}{2}.$$

Figure to be
inserted here.

For small x, $\cos x \approx M_2(x) = 1 - \frac{x^2}{2}$. We call this the small-angle approximation for cosine.
A322(a) The derivative of f is the function f′ ∶ (−1, 1) → R defined by:
$$f'(x) = \frac{1}{(1 - x)^2}.$$
(b) $$f'(x) = \frac{1}{(1 - x)^2} = 1 + 2x + 3x^2 + \dots = \sum_{n=1}^{\infty}nx^{n-1}.$$

(c) It would appear that we can indeed find the derivative of f simply by differentiating
its Maclaurin series term by term!
A325. By the double angle formula for sine, we have:
$$f(x) = \sin x\cos x = \frac{1}{2}\sin 2x.$$
Hence: $f'(x) = \cos 2x$, $f''(x) = -2\sin 2x$, and $f^{(3)}(x) = -4\cos 2x$.
And so: $f(0) = 0$, $f'(0) = 1$, $f''(0) = 0$, and $f^{(3)}(0) = -4$.
Thus: $m_0 = 0$, $m_1 = 1$, $m_2 = 0$, $m_3 = -4/3! = -2/3$ and:
$$f(x) = \sin x\cos x = 0 + 1x + 0x^2 - \frac{2}{3}x^3 + \dots = x - \frac{2}{3}x^3 + \dots$$
A328. Define g ∶ R → R by g (y) = sin y and h ∶ (−1, 1] → R by h (x) = ln (1 + x).
Then the composite function f = gh ∶ (−1, 1] → R is indeed defined by f (x) = sin [ln (1 + x)].
The power (and also Maclaurin) series representation of g is:
$$g(y) = \sin y \overset{1}{=} y - \frac{y^3}{3!} + \frac{y^5}{5!} - \dots \quad\text{for } y \in \mathbb{R}.$$
The power (and also Maclaurin) series representation of h is:
$$h(x) = \ln(1 + x) \overset{2}{=} x - \frac{x^2}{2} + \frac{x^3}{3} - \dots \quad\text{for } x \in (-1, 1].$$

Method 1 (Theorem 38). Simply plug $\overset{2}{=}$ into $\overset{1}{=}$:
$$f(x) = \sin[\ln(1 + x)] = \left(x - \frac{x^2}{2} + \frac{x^3}{3} - \dots\right) - \frac{1}{3!}\left(x - \frac{x^2}{2} + \frac{x^3}{3} - \dots\right)^3 + \frac{1}{5!}\left(x - \frac{x^2}{2} + \frac{x^3}{3} - \dots\right)^5 - \dots \quad\text{for } x \in (-1, 1].$$
Since we’re only asked to write down the expansion up to and including the $x^3$ term, we have:
$$f(x) = \sin[\ln(1 + x)] = \left(x - \frac{x^2}{2} + \frac{x^3}{3} - \dots\right) - \frac{1}{3!}\left(x^3\right) + \dots = x - \frac{x^2}{2} + \frac{x^3}{6} - \dots \quad\text{for } x \in (-1, 1].$$

Method 2 (Theorem 23). Compute the first few derivatives of f:
$$\begin{aligned}
f'(x) &= \frac{\cos[\ln(1 + x)]}{1 + x},\\
f''(x) &= -\frac{\sin[\ln(1 + x)]}{(1 + x)^2} - \frac{\cos[\ln(1 + x)]}{(1 + x)^2},\\
f^{(3)}(x) &= -\frac{\cos[\ln(1 + x)]}{(1 + x)^3} + 2\frac{\sin[\ln(1 + x)]}{(1 + x)^3} + \frac{\sin[\ln(1 + x)]}{(1 + x)^3} + 2\frac{\cos[\ln(1 + x)]}{(1 + x)^3}.
\end{aligned}$$
Evaluate each of f, f′, f′′, and $f^{(3)}$ at 0:
$$f(0) = 0, \quad f'(0) = 1, \quad f''(0) = 0 - 1 = -1, \quad f^{(3)}(0) = -1 + 0 + 0 + 2 = 1.$$
Thus, the Maclaurin series representation of f is:
$$f(x) = \sin[\ln(1 + x)] = \frac{0}{0!} + \frac{1}{1!}x + \frac{-1}{2!}x^2 + \frac{1}{3!}x^3 + \dots = x - \frac{x^2}{2} + \frac{x^3}{6} - \dots \quad\text{for } x \in (-1, 1].$$
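Method 1 amounts to composing two truncated power series. The sketch below does this with exact fractions, keeping terms up to x³ only; the helpers `mul`, `add`, `scale` and the cutoff `N` are our own names, and polynomials are stored as coefficient lists in ascending powers.

```python
from fractions import Fraction

N = 3  # keep terms up to and including x^N

def mul(p, q):
    """Multiply two truncated series, discarding powers above x^N."""
    out = [Fraction(0)] * (N + 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if i + j <= N:
                out[i + j] += a * b
    return out

def add(p, q):
    return [a + b for a, b in zip(p, q)]

def scale(k, p):
    return [k * a for a in p]

# h(x) = ln(1 + x) = x - x^2/2 + x^3/3 - ...
h = [Fraction(0), Fraction(1), Fraction(-1, 2), Fraction(1, 3)]

# sin y = y - y^3/3! + ...; the y^5 term only contributes from x^5 onwards
h_cubed = mul(h, mul(h, h))
f = add(h, scale(Fraction(-1, 6), h_cubed))
print(f)  # coefficients of 1, x, x^2, x^3
```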
128.9. Ch. 77 Answers (Integration)


128.10. Ch. 78 Answers (Antidifferentiation)
A334(a) Three antiderivatives of f are F, G, H ∶ R → R defined by:
$$F(x) = \frac{1}{2}x^2 - 3x, \quad G(x) = \frac{1}{2}x^2 - 3x + 1, \quad H(x) = \frac{1}{2}x^2 - 3x + 2.$$
(b) There exist constants C1 , C2 , C3 ∈ R such that:

A (x) = F (x) + C1 = G (x) + C2 = H (x) + C3 for all x ∈ R.

A335(a) The derivative of F is the function F ′ ∶ R → R with mapping rule F ′ (x) = 4 sin 4x.
Thus, F ′ = f or equivalently F = ∫ f — that is, F is indeed an antiderivative for f .
The derivative of G is the function G′ ∶ R → R with mapping rule

$$G'(x) = 8\left(2\sin x\cos^3 x - 2\sin^3 x\cos x\right) = 16\sin x\cos x\left(\cos^2 x - \sin^2 x\right) = 8\sin 2x\cos 2x = 4\sin 4x.$$

Thus, G′ = f or equivalently G = ∫ f — that is, G is indeed an antiderivative for f .

(b) Although the functions F and G seem very different, they actually differ only by a
constant (namely 1), as we now show:

$$G(x) = 8\sin^2 x\cos^2 x = 2(2\sin x\cos x)(2\sin x\cos x) = 2\sin^2 2x = 1 - \left(1 - 2\sin^2 2x\right) = -\cos 4x + 1 = F(x) + 1.$$

So, F and G do not contradict our assertion that “antiderivatives are unique up to a
constant”, because they do indeed differ by only a constant.
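That F and G differ by the constant 1 can also be spot-checked numerically. This is a minimal sketch, taking F (x) = −cos 4x and G (x) = 8 sin² x cos² x as in the working above; the sample points are arbitrary.

```python
import math

def F(x):
    return -math.cos(4 * x)

def G(x):
    return 8 * math.sin(x) ** 2 * math.cos(x) ** 2

sample_points = [-3.0, -1.2, 0.0, 0.4, 1.0, 2.5, 10.0]
differences = [G(x) - F(x) for x in sample_points]
print(differences)  # each entry should be 1 up to rounding error
```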
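A336. Using the various Rules of Differentiation, we have:
(a) $\frac{d}{dx}(kx + C) = k$.
(b) $\frac{d}{dx}\left(\frac{x^{k+1}}{k+1} + C\right) = (k+1)\frac{x^k}{k+1} = x^k$.
(d) $\frac{d}{dx}(e^x + C) = e^x$.
(e) $\frac{d}{dx}(-\cos x + C) = \sin x$.
(f) $\frac{d}{dx}(\sin x + C) = \cos x$.
(g) $\frac{d}{dx}f \pm \frac{d}{dx}g = \frac{d}{dx}(f \pm g)$.
(h) $\frac{d}{dx}(kf) = k\frac{d}{dx}f$.
(i) $\frac{d}{dx}\left[\frac{1}{a}\left(\int f\right)(ax + b)\right] = \frac{1}{a}\frac{d}{dx}\left[\left(\int f\right)(ax + b)\right] \overset{C}{=} \frac{1}{a}f(ax + b)\cdot a = f(ax + b)$.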
A337(b) Suppose a function has mapping rule $x \mapsto x^k$ (where k ≠ −1). Then this function’s
antiderivatives are exactly those functions whose mapping rule is $x \mapsto \frac{x^{k+1}}{k+1} + C$.
(c) Suppose a function has mapping rule $x \mapsto \frac{1}{x}$. Then this function’s antiderivatives are
exactly those functions whose mapping rule is x ↦ ln ∣x∣ + C.

Remark 157. Observe that the function with mapping rule $x \mapsto \frac{1}{x}$ could have a domain
as large as R ∖ {0}. In which case, we want to say that its antiderivative also has the
same domain. But this wouldn’t be the case if we simply say that the antiderivative has
mapping rule x ↦ ln x + C, because in H2 Maths, ln isn’t defined for negative values.
This, along with the fact that $\frac{d}{dx}(\ln|x| + C) = \frac{1}{x}$, are the reasons why, in general, it is
not OK to simply drop the absolute value sign here.
Note though that if the function with mapping rule $x \mapsto \frac{1}{x}$ has domain D ⊆ R+ , then it is
OK to drop the absolute value sign when writing down its antiderivative.
(d) Suppose a function has mapping rule x ↦ ex . Then this function’s antiderivatives are
exactly those functions whose mapping rule is x ↦ ex + C.
(e) Suppose a function has mapping rule x ↦ sin x. Then this function’s antiderivatives
are exactly those functions whose mapping rule is x ↦ − cos x + C.
(f) Suppose a function has mapping rule x ↦ cos x. Then this function’s antiderivatives
are exactly those functions whose mapping rule is x ↦ sin x + C.
(g) Suppose a function has mapping rule x ↦ (f ± g) (x). Then this function’s antiderivatives
are exactly those functions whose mapping rule is x ↦ ∫ f (x) ± ∫ g (x).
(h) Suppose a function has mapping rule x ↦ kf (x). Then this function’s antiderivatives
are exactly those functions whose mapping rule is x ↦ k ∫ f (x).
A338. Once again, this goes back to our repeated warning about the distinction between
antidifferentiation and integration.
As you were asked to write down in the previous exercise, the Constant Factor Rule in
Theorem 27 (Rules of Antidifferentiation) says that given a function f with antiderivative
∫ f , a function with mapping rule x ↦ kf (x) will have antiderivative k ∫ f .
In contrast, the Constant Factor Rule in Theorem 25 (Rules of Integration) says that if
the area under the graph of f between a and b is S, then the area under the graph of kf
between a and b is kS.
A priori, these two Constant Factor Rules have absolutely no relationship with each other.
One is concerned with antiderivatives, while the other is concerned with finding the area
under a curve. That they are in fact related is established only by the two FTCs.
Of course, these same remarks also apply to the Sum and Difference Rules listed in each of
these same two Theorems.
A339. In each of the following, C denotes the constant of integration.
(a) $\int ax + b\,dx = \frac{1}{2}ax^2 + bx + C$.
(b) $\int ax^2 + bx + c\,dx = \frac{1}{3}ax^3 + \frac{1}{2}bx^2 + cx + C$.
(c) $\int ax^3 + bx^2 + cx + d\,dx = \frac{1}{4}ax^4 + \frac{1}{3}bx^3 + \frac{1}{2}cx^2 + dx + C$.
(d) $\int (ax + b)^c\,dx = \frac{(ax + b)^{c+1}}{a(c + 1)} + C$.
(e) $\int \frac{1}{ax + b}\,dx = \frac{1}{a}\ln|ax + b| + C$.
(f) $\int a\sin(bx + c) + d\,dx = -\frac{a}{b}\cos(bx + c) + dx + C$.
(g) $\int a\exp(bx + c) + d\,dx = \frac{a}{b}\exp(bx + c) + dx + C$.
(h) $\int a\cos bx + c + \frac{1}{dx}\,dx = \frac{a}{b}\sin bx + cx + \frac{1}{d}\ln|x| + C$.


128.11. Ch. 79 Answers (FTC2)


128.12. Ch. 80 Answers (More Techniques of Antidifferentiation)
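A341(a) $\int\frac{1}{4x^2 - 4x + 1}\,dx = \int\frac{1}{(2x - 1)^2}\,dx = -\frac{1}{2}\cdot\frac{1}{2x - 1} + C = \frac{1}{2 - 4x} + C$.
(b) $\int\frac{1}{9x^2 + 30x + 25}\,dx = \int\frac{1}{(3x + 5)^2}\,dx = -\frac{1}{3}\cdot\frac{1}{3x + 5} + C = -\frac{1}{9x + 15} + C$.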
A342(a) Observe that $5x^2 - 2x - 3 = (5x + 3)(x - 1)$. So, write:
$$\frac{1}{5x^2 - 2x - 3} = \frac{A}{5x + 3} + \frac{B}{x - 1} = \frac{A(x - 1) + B(5x + 3)}{(5x + 3)(x - 1)} = \frac{(A + 5B)x + 3B - A}{5x^2 - 2x - 3}.$$
Comparing coefficients, we have A + 5B = 0 and 3B − A = 1. Solving, we have A = −5/8 and
B = 1/8. Thus:
$$\int\frac{1}{5x^2 - 2x - 3}\,dx = -\frac{5}{8}\int\frac{1}{5x + 3}\,dx + \frac{1}{8}\int\frac{1}{x - 1}\,dx = -\frac{5}{8}\cdot\frac{1}{5}\ln|5x + 3| + \frac{1}{8}\ln|x - 1| + C = \frac{1}{8}\ln\left|\frac{x - 1}{5x + 3}\right| + C.$$

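(b) Observing that $x^2 - a^2 = (x + a)(x - a)$, we write:
$$\frac{1}{x^2 - a^2} = \frac{A}{x + a} + \frac{B}{x - a} = \frac{A(x - a) + B(x + a)}{(x + a)(x - a)} = \frac{(A + B)x + (B - A)a}{x^2 - a^2}.$$
Comparing coefficients, we have A + B = 0 and (B − A) a = 1. Solving, we have $A = -\frac{1}{2a}$
and $B = \frac{1}{2a}$. Thus:
$$\begin{aligned}
\int\frac{1}{x^2 - a^2}\,dx &= \int\frac{-\frac{1}{2a}}{x + a} + \frac{\frac{1}{2a}}{x - a}\,dx\\
&= \int\frac{-\frac{1}{2a}}{x + a}\,dx + \int\frac{\frac{1}{2a}}{x - a}\,dx && \text{(Sum Rule)}\\
&= -\frac{1}{2a}\int\frac{1}{x + a}\,dx + \frac{1}{2a}\int\frac{1}{x - a}\,dx && \text{(Constant Rule)}\\
&= -\frac{1}{2a}\ln|x + a| + \frac{1}{2a}\ln|x - a| + C && \text{(Reciprocal Rule)}\\
&= \frac{1}{2a}\left(\ln|x - a| - \ln|x + a|\right) + C\\
&= \frac{1}{2a}\ln\frac{|x - a|}{|x + a|} + C && \text{(Law of Logarithm)}\\
&= \frac{1}{2a}\ln\left|\frac{x - a}{x + a}\right| + C, && \text{(Fact 42)}
\end{aligned}$$
where as usual C is our COI.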

(c) Instead of slaving through all the steps again, we can simply make use of (a):

1546, Contents

1 1 1 x−a 1 x+a
∫ a2 − x2 dx = − ∫ x2 − a2 dx = − 2a ln ∣ x + a ∣ − C = 2a ln ∣ x − a ∣ + Ĉ,

where the last step uses a Law of Logarithm and Ĉ is, as usual, our COI (and could’ve
been denoted by any other symbol).
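A343(a)
$$\begin{aligned}
\int \frac{7x + 2}{4x^2 - 4x + 1}\,dx &= \int \frac{7x + 2}{(2x - 1)^2}\,dx = \int \frac{7}{2}\cdot\frac{2x - 1}{(2x - 1)^2} + \frac{11/2}{(2x - 1)^2}\,dx \\
&= \frac{7}{2}\int \frac{1}{2x - 1}\,dx + \frac{11}{2}\int \frac{1}{(2x - 1)^2}\,dx \\
&= \frac{7}{2}\cdot\frac{1}{2}\ln|2x - 1| + \frac{11}{2}\left(-\frac{1}{2}\cdot\frac{1}{2x - 1}\right) + C = \frac{7}{4}\ln|2x - 1| + \frac{11}{4 - 8x} + C.
\end{aligned}$$

(b) In Example 1035, we already showed that:

$$\int \frac{1}{x^2 + x - 6}\,dx = \int \frac{1}{(x + 3)(x - 2)}\,dx = \frac{1}{5}\ln\left|\frac{x - 2}{x + 3}\right| + C.$$

So:

$$\begin{aligned}
\int \frac{7x + 2}{x^2 + x - 6}\,dx &= \int \frac{7x + 2}{(x + 3)(x - 2)}\,dx = \int 7\cdot\frac{x + 3}{(x + 3)(x - 2)} - \frac{19}{(x + 3)(x - 2)}\,dx \\
&= 7\int \frac{1}{x - 2}\,dx - 19\int \frac{1}{(x + 3)(x - 2)}\,dx \\
&= 7\ln|x - 2| - \frac{19}{5}\ln\left|\frac{x - 2}{x + 3}\right| + C = \frac{1}{5}\left(16\ln|x - 2| + 19\ln|x + 3|\right) + C.
\end{aligned}$$

Note that the last step is nice but not necessary (you should however be perfectly capable of doing this bit of algebra).

(c) Below, $\overset{\star}{=}$ uses our answer from Exercise 342(a).

$$\begin{aligned}
\int \frac{7x + 2}{5x^2 - 2x - 3}\,dx &= \int \frac{7x + 2}{(5x + 3)(x - 1)}\,dx = \int 7\cdot\frac{x - 1}{(5x + 3)(x - 1)} + \frac{9}{(5x + 3)(x - 1)}\,dx \\
&= 7\int \frac{1}{5x + 3}\,dx + 9\int \frac{1}{5x^2 - 2x - 3}\,dx \\
&\overset{\star}{=} \frac{7}{5}\ln|5x + 3| + \frac{9}{8}\ln\left|\frac{x - 1}{5x + 3}\right| + C = \frac{11}{40}\ln|5x + 3| + \frac{9}{8}\ln|x - 1| + C.
\end{aligned}$$

Ditto the last sentence in (b).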


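$$\frac{d}{dx}\left(\ln|\sec x| + C\right) = \begin{cases} \dfrac{\sec x\tan x}{\sec x} = \tan x & \text{for } \sec x > 0, \\[2ex] \dfrac{-\sec x\tan x}{-\sec x} = \tan x & \text{for } \sec x < 0. \end{cases}$$

$$\frac{d}{dx}\left(\ln|\sin x| + C\right) = \begin{cases} \dfrac{\cos x}{\sin x} = \cot x & \text{for } \sin x > 0, \\[2ex] \dfrac{-\cos x}{-\sin x} = \cot x & \text{for } \sin x < 0. \end{cases}$$

$$\frac{d}{dx}\left(-\ln|\operatorname{cosec} x + \cot x| + C\right) = \begin{cases} -\dfrac{-\operatorname{cosec} x\cot x - \operatorname{cosec}^2 x}{\operatorname{cosec} x + \cot x} = \dfrac{\operatorname{cosec} x(\cot x + \operatorname{cosec} x)}{\operatorname{cosec} x + \cot x} = \operatorname{cosec} x & \text{for } \operatorname{cosec} x + \cot x > 0, \\[2ex] -\dfrac{\operatorname{cosec} x\cot x + \operatorname{cosec}^2 x}{-(\operatorname{cosec} x + \cot x)} = \dfrac{\operatorname{cosec} x(\cot x + \operatorname{cosec} x)}{\operatorname{cosec} x + \cot x} = \operatorname{cosec} x & \text{for } \operatorname{cosec} x + \cot x < 0. \end{cases}$$

(h)
$$\frac{d}{dx}\left(\ln|\sec x + \tan x| + C\right) = \begin{cases} \dfrac{\sec x\tan x + \sec^2 x}{\sec x + \tan x} = \dfrac{\sec x(\sec x + \tan x)}{\sec x + \tan x} = \sec x & \text{for } \sec x + \tan x > 0, \\[2ex] \dfrac{-\sec x\tan x - \sec^2 x}{-(\sec x + \tan x)} = \dfrac{\sec x(\sec x + \tan x)}{\sec x + \tan x} = \sec x & \text{for } \sec x + \tan x < 0. \end{cases}$$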

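A347(b) Use the identity $\cos 2x = 2\cos^2 x - 1$:

$$\int \cos^2 x\,dx = \int \frac{\cos 2x + 1}{2}\,dx = \frac{1}{2}x + \frac{\sin 2x}{4} + C.$$

(c) Use the identities $\tan^2 x = \sec^2 x - 1$ and $\int \sec^2 x\,dx = \tan x + C$:

$$\int \tan^2 x\,dx = \int \sec^2 x - 1\,dx = \tan x - x + C.$$

(d) We have the identity: $\sin P + \sin Q \overset{1}{=} 2\sin\frac{P + Q}{2}\cos\frac{P - Q}{2}$.
So let: $mx = \frac{P + Q}{2}$ and $nx = \frac{P - Q}{2}$.
Which means: $P \overset{2}{=} (m + n)x$ and $Q \overset{3}{=} (m - n)x$.

Plugging in $\overset{1}{=}$, $\overset{2}{=}$, and $\overset{3}{=}$, we have:

$$\int \sin mx\cos nx\,dx = \int \frac{1}{2}\left[\sin(m + n)x + \sin(m - n)x\right]dx = -\frac{1}{2}\left[\frac{\cos(m + n)x}{m + n} + \frac{\cos(m - n)x}{m - n}\right] + C.$$

(e) We have the identity: $\cos P - \cos Q \overset{1}{=} -2\sin\frac{P + Q}{2}\sin\frac{P - Q}{2}$.
So let: $mx = \frac{P + Q}{2}$ and $nx = \frac{P - Q}{2}$.
Which means: $P \overset{2}{=} (m + n)x$ and $Q \overset{3}{=} (m - n)x$.

Plugging in $\overset{1}{=}$, $\overset{2}{=}$, and $\overset{3}{=}$, we have:

$$\int \sin mx\sin nx\,dx = \int -\frac{1}{2}\left[\cos(m + n)x - \cos(m - n)x\right]dx = \frac{1}{2}\left[\frac{\sin(m - n)x}{m - n} - \frac{\sin(m + n)x}{m + n}\right] + C.$$

(f) We have the identity: $\cos P + \cos Q \overset{1}{=} 2\cos\frac{P + Q}{2}\cos\frac{P - Q}{2}$.
So let: $mx = \frac{P + Q}{2}$ and $nx = \frac{P - Q}{2}$.
Which means: $P \overset{2}{=} (m + n)x$ and $Q \overset{3}{=} (m - n)x$.

Plugging in $\overset{1}{=}$, $\overset{2}{=}$, and $\overset{3}{=}$, we have:

$$\int \cos mx\cos nx\,dx = \int \frac{1}{2}\left[\cos(m + n)x + \cos(m - n)x\right]dx = \frac{1}{2}\left[\frac{\sin(m - n)x}{m - n} + \frac{\sin(m + n)x}{m + n}\right] + C.$$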
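As a sanity check on the product-to-sum antiderivative in (d), the following is a minimal sketch (not part of the original answer) that compares the closed form against a numerical integral. The helper names `simpson` and `antiderivative_sin_cos`, and the test values of m, n, and the interval, are arbitrary choices made for illustration; the check assumes m ≠ ±n.

```python
import math

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def antiderivative_sin_cos(x, m, n):
    """Closed form from (d): -(1/2)[cos((m+n)x)/(m+n) + cos((m-n)x)/(m-n)]."""
    return -0.5 * (math.cos((m + n) * x) / (m + n)
                   + math.cos((m - n) * x) / (m - n))

# Arbitrary illustrative values with m != +-n.
m, n, a, b = 3, 1, 0.2, 1.1

numeric = simpson(lambda x: math.sin(m * x) * math.cos(n * x), a, b)
closed = antiderivative_sin_cos(b, m, n) - antiderivative_sin_cos(a, m, n)

print(abs(numeric - closed) < 1e-8)
```

The same pattern works for (e) and (f) by swapping in the corresponding integrand and closed form.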
A357. Since $\cot x = \frac{\cos x}{\sin x}$ and $\sin'(x) = \cos x$, we have:

$$\int \cot x\,dx = \int \frac{\cos x}{\sin x}\,dx = \int \frac{\sin'(x)}{\sin x}\,dx \overset{\star}{=} \ln|\sin x| + C.$$

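A355(a) From the given substitution $x = \sec u$, we have $\frac{dx}{du} \overset{4}{=} \sec u\tan u$.
Now plug in $\overset{4}{=}$:

$$\begin{aligned}
\int \frac{1}{x^2\sqrt{x^2 - 1}}\,dx &= \int \frac{1}{\sec^2 u\sqrt{\sec^2 u - 1}}\,dx \\
&= \int \frac{1}{\sec^2 u\sqrt{\tan^2 u}}\,dx \\
&= \int \frac{1}{\sec^2 u\,|\tan u|}\,dx \\
&= \int \frac{1}{\sec^2 u\tan u}\,dx && \text{(By hint)} \\
&\overset{4}{=} \int \frac{1}{\sec u}\frac{du}{dx}\,dx \\
&= \int \frac{1}{\sec u}\,du \\
&= \int \cos u\,du \\
&= \sin u + C_1 \\
&= \sin\left(\sec^{-1} x\right) + C_1.
\end{aligned}$$

(b) From the given substitution $u = 1 - \frac{1}{x^2}$, we have $\frac{du}{dx} \overset{5}{=} \frac{2}{x^3}$ and also:

$$\sqrt{u} \overset{6}{=} \sqrt{1 - \frac{1}{x^2}} = \sqrt{\frac{x^2 - 1}{x^2}} = \frac{\sqrt{x^2 - 1}}{x}.$$

$$\begin{aligned}
\int \frac{1}{x^2\sqrt{x^2 - 1}}\,dx &= \int \frac{1}{2\frac{\sqrt{x^2 - 1}}{x}}\cdot\frac{2}{x^3}\,dx \\
&\overset{5}{=} \int \frac{1}{2\frac{\sqrt{x^2 - 1}}{x}}\frac{du}{dx}\,dx \\
&= \int \frac{1}{2\frac{\sqrt{x^2 - 1}}{x}}\,du \\
&\overset{6}{=} \int \frac{1}{2\sqrt{u}}\,du \\
&= \sqrt{u} + C_2 \\
&= \frac{\sqrt{x^2 - 1}}{x} + C_2.
\end{aligned}$$

(c) Answering Hint 2, two antiderivatives of the same function differ by at most a constant. So:

$$\sin\left(\sec^{-1} x\right) \overset{7}{=} \frac{\sqrt{x^2 - 1}}{x} + D \quad\text{for some } D \in \mathbb{R}.$$

Following Hint 3, plug x = 2 into $\overset{7}{=}$ to get:

$$\text{LHS} = \sin\left(\sec^{-1} 2\right) = \sin\frac{\pi}{3} = \frac{\sqrt{3}}{2}, \qquad \text{RHS} = \frac{\sqrt{2^2 - 1}}{2} + D = \frac{\sqrt{3}}{2} + D.$$

Hence, D = 0 and $\overset{7}{=}$ becomes the desired result:

$$\sin\left(\sec^{-1} x\right) = \frac{\sqrt{x^2 - 1}}{x}.$$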

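A356(a) From $x = \tan u$, we have:

$$u = \tan^{-1} x \quad\text{and}\quad \frac{dx}{du} \overset{5}{=} \sec^2 u.$$

$$\begin{aligned}
\int \frac{x^3}{(x^2 + 1)^{3/2}}\,dx &= \int \frac{\tan^3 u}{(\tan^2 u + 1)^{3/2}}\,dx \\
&= \int \frac{\tan^3 u}{(\sec^2 u)^{3/2}}\,dx \\
&= \int \frac{\tan^3 u}{\sec^3 u}\,dx \\
&\overset{5}{=} \int \frac{\tan^3 u}{\sec u}\frac{du}{dx}\,dx \\
&= \int \frac{\tan^3 u}{\sec u}\,du \\
&= \int \frac{\sin^2 u}{\cos^2 u}\sin u\,du \\
&= \int \frac{1 - \cos^2 u}{\cos^2 u}\sin u\,du \\
&= \int \frac{\sin u}{\cos^2 u} - \sin u\,du \\
&= \frac{1}{\cos u} + \cos u + C_1 \\
&= \frac{1}{\cos(\tan^{-1} x)} + \cos\left(\tan^{-1} x\right) + C_1.
\end{aligned}$$

(b) From the given substitution $u = x^2 + 1$, we have $\frac{du}{dx} \overset{6}{=} 2x$.

$$\begin{aligned}
\int \frac{x^3}{(x^2 + 1)^{3/2}}\,dx &= \int \frac{x^3}{u^{3/2}}\,dx \\
&= \int \frac{x^2}{2u^{3/2}}\,2x\,dx \\
&\overset{6}{=} \int \frac{x^2}{2u^{3/2}}\frac{du}{dx}\,dx \\
&= \frac{1}{2}\int \frac{u - 1}{u^{3/2}}\,du \\
&= \frac{1}{2}\int \frac{1}{u^{1/2}} - \frac{1}{u^{3/2}}\,du \\
&= \frac{1}{2}\left(2u^{1/2} + 2u^{-1/2}\right) + C_2 \\
&= \sqrt{u} + \frac{1}{\sqrt{u}} + C_2 \\
&= \sqrt{x^2 + 1} + \frac{1}{\sqrt{x^2 + 1}} + C_2.
\end{aligned}$$

(c) $\frac{1}{a} + a = \frac{1}{b} + b \overset{\times ab}{\implies} b + a^2 b = a + ab^2 \iff ba^2 - (1 + b^2)a + b = 0$. By the quadratic formula:

$$a = \frac{1 + b^2 \pm \sqrt{(1 + b^2)^2 - 4b^2}}{2b} = \frac{1 + b^2 \pm \sqrt{b^4 + 2b^2 + 1 - 4b^2}}{2b} = \frac{1 + b^2 \pm \sqrt{b^4 - 2b^2 + 1}}{2b} = \frac{1 + b^2 \pm (b^2 - 1)}{2b} = b, \frac{1}{b}.$$

(d) The two antiderivatives found in (a) and (b) differ by at most a constant:

$$\frac{1}{\cos(\tan^{-1} x)} + \cos\left(\tan^{-1} x\right) \overset{7}{=} \sqrt{x^2 + 1} + \frac{1}{\sqrt{x^2 + 1}} + D \quad\text{for some } D \in \mathbb{R}.$$

Plug x = 0 into $\overset{7}{=}$ to get:

$$\text{LHS} = \frac{1}{\cos(\tan^{-1} 0)} + \cos\left(\tan^{-1} 0\right) = 1 + 1 = 2,$$

$$\text{RHS} = \sqrt{0^2 + 1} + \frac{1}{\sqrt{0^2 + 1}} + D = 1 + 1 + D = 2 + D.$$

Thus, D = 0 and $\overset{7}{=}$ becomes:

$$\frac{1}{\cos(\tan^{-1} x)} + \cos\left(\tan^{-1} x\right) = \sqrt{x^2 + 1} + \frac{1}{\sqrt{x^2 + 1}}.$$

By (c) then:

$$\text{Either}\quad \cos\left(\tan^{-1} x\right) \overset{8}{=} \sqrt{x^2 + 1} \quad\text{or}\quad \cos\left(\tan^{-1} x\right) \overset{9}{=} \frac{1}{\sqrt{x^2 + 1}}.$$

Plugging x = 1 into $\overset{8}{=}$ and $\overset{9}{=}$, we see that only $\overset{9}{=}$ is true.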
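The conclusion of (d), that cos(tan⁻¹ x) equals 1/√(x² + 1), can be spot-checked numerically. This is only an illustrative sketch: the function name `identity_gap` and the sample points are arbitrary choices, and a numerical check at a few points is of course not a proof.

```python
import math

def identity_gap(x):
    """Absolute difference between cos(arctan x) and 1/sqrt(x^2 + 1)."""
    return abs(math.cos(math.atan(x)) - 1 / math.sqrt(x * x + 1))

# Arbitrary sample points, including negative values and x = 1 from part (d).
sample_points = [-10.0, -1.0, 0.0, 0.5, 1.0, 7.0]
largest_gap = max(identity_gap(x) for x in sample_points)

print(largest_gap < 1e-12)
```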


128.13. Ch. ?? Answers (The Fundamental Theorems of Calculus)
A??. For the Lower Sum $S_{L12}$, each rectangle has width (or base) 0.5. The first rectangle has height f(0), the second f(0.5), the third f(1), ..., the twelfth f(5.5). And so

$$S_{L12} = \frac{1}{2}\left[f(0) + f\left(\tfrac{1}{2}\right) + \dots + f\left(\tfrac{11}{2}\right)\right] = \frac{1}{2}\left[\left(\sqrt{0} + 1\right) + \left(\sqrt{\tfrac{1}{2}} + 1\right) + \dots + \left(\sqrt{\tfrac{11}{2}} + 1\right)\right] \approx 15.116.$$

For the Upper Sum $S_{U12}$, each rectangle again has width (or base) 0.5. The first rectangle has height f(0.5), the second f(1), the third f(1.5), ..., the twelfth f(6). And so

$$S_{U12} = \frac{1}{2}\left[f\left(\tfrac{1}{2}\right) + f(1) + \dots + f(6)\right] = \frac{1}{2}\left[\left(\sqrt{\tfrac{1}{2}} + 1\right) + \left(\sqrt{1} + 1\right) + \dots + \left(\sqrt{6} + 1\right)\right] \approx 16.341.$$

Altogether then, lower and upper bounds for A(6) are:

$$15.116 \approx S_{L12} \le A(6) = 15.79795897\ldots \le S_{U12} \approx 16.341.$$
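The two sums can be reproduced with a few lines of code. The sketch below assumes f(x) = √x + 1 on [0, 6], as the heights listed in this answer indicate; the function name `lower_upper_sums` is made up for illustration. Because f is increasing, left endpoints give the lower sum and right endpoints give the upper sum.

```python
import math

def f(x):
    # Integrand assumed from the heights in the answer: f(x) = sqrt(x) + 1.
    return math.sqrt(x) + 1

def lower_upper_sums(func, a, b, n):
    """Left- and right-endpoint sums with n equal-width rectangles.

    For an increasing function these are the lower and upper sums.
    """
    width = (b - a) / n
    lower = width * sum(func(a + i * width) for i in range(n))
    upper = width * sum(func(a + (i + 1) * width) for i in range(n))
    return lower, upper

lower, upper = lower_upper_sums(f, 0, 6, 12)
exact_area = (2 / 3) * 6 ** 1.5 + 6   # antiderivative (2/3)x^(3/2) + x at x = 6

print(round(lower, 3), round(exact_area, 5), round(upper, 3))
```

Increasing n (say to 1200) squeezes both sums towards the exact area.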

A??. We do not know which of the infinitely many indefinite integrals of f is the area function. We merely know that the area function is one of them. Hence, we use the indefinite article an, rather than the definite article the.


128.14. Ch. 83 Answers (Definite Integrals)
A364. Our desired area is labelled A below.
Method #1. The entire rectangle A + B + C + D has area $2^{1/3} \times 2 = 2^{4/3}$. The rectangle B + C has area 1 × 1 = 1. The region D has area $\int_1^{2^{1/3}} x^3\,dx = \left[\frac{x^4}{4}\right]_1^{2^{1/3}} = \frac{2^{4/3} - 1}{4}$. Hence,

$$A = A + B + C + D - (B + C + D) = 2^{4/3} - \left(1 + \frac{2^{4/3} - 1}{4}\right) = \frac{3}{4}\left(2^{4/3} - 1\right).$$

Method #2. $y = x^3 \iff x = y^{1/3}$. So $A = \int_{y=1}^{2} x\,dy = \int_1^2 y^{1/3}\,dy = \left[\frac{3}{4}y^{4/3}\right]_1^2 = \frac{3}{4}\left(2^{4/3} - 1\right)$.



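A365. The curve y = sin x and the line y = 0.5 intersect at x = π/6 and x = 5π/6.

$$A = \int_{\pi/6}^{5\pi/6} \sin x - 0.5\,dx = \left[-\cos x - 0.5x\right]_{\pi/6}^{5\pi/6} = -\cos\frac{5\pi}{6} + \cos\frac{\pi}{6} - \frac{5\pi}{12} + \frac{\pi}{12} = -\left(-\frac{\sqrt{3}}{2}\right) + \frac{\sqrt{3}}{2} - \frac{\pi}{3} = \sqrt{3} - \frac{\pi}{3}.$$

A366. By the quadratic formula, the two curves intersect at $x = \pm\sqrt{2}/2$. So

$$A = \int_{-\sqrt{2}/2}^{\sqrt{2}/2} 2 - x^2 - (x^2 + 1)\,dx = \left[x - \frac{2x^3}{3}\right]_{-\sqrt{2}/2}^{\sqrt{2}/2} = \left[\frac{\sqrt{2}}{2} - \frac{2\sqrt{2}}{12}\right] - \left[-\frac{\sqrt{2}}{2} + \frac{2\sqrt{2}}{12}\right] = \frac{2\sqrt{2}}{3}.$$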


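A367. Compute

$$\int_{-2}^{2} x^4 - 16\,dx = \left[\frac{x^5}{5} - 16x\right]_{-2}^{2} = \left(\frac{32}{5} - 32\right) - \left(-\frac{32}{5} + 32\right) = -\frac{256}{5}.$$

Hence the desired area is 256/5.

A368. (Again, it helps to graph this on your calculator.) Note that $y = 1 \iff t = \sqrt[3]{2}$; $y = 2 \iff t = \sqrt[3]{3}$; and $dy/dt = 3t^2$. So the area can be computed as:

$$\begin{aligned}
\int_{y=1}^{2} x\,dy &= \int_{t=\sqrt[3]{2}}^{t=\sqrt[3]{3}} (t^2 + 2t)\,3t^2\,dt = \left[\frac{3t^5}{5} + \frac{6t^4}{4}\right]_{\sqrt[3]{2}}^{\sqrt[3]{3}} \\
&= \left[\frac{3\cdot 3^{5/3}}{5} + \frac{6\cdot 3^{4/3}}{4}\right] - \left[\frac{3\cdot 2^{5/3}}{5} + \frac{6\cdot 2^{4/3}}{4}\right] = \frac{3^{8/3} - 3\cdot 2^{5/3}}{5} + \frac{3^{7/3} - 3\cdot 2^{4/3}}{2}.
\end{aligned}$$

A369. By Fact 10,

$$\int_0^{\pi} \pi y^2\,dx = \int_0^{\pi} \pi\sin^2 x\,dx = \pi\left[\frac{1}{2}x - \frac{\sin 2x}{4}\right]_0^{\pi} = \frac{\pi^2}{2}.$$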


128.15. Ch. 85 Answers (Revisiting Logarithms)
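A370. Differentiate both sides with respect to x to get:

$$\frac{d}{dx}\ln\frac{1}{x} = x\left(-\frac{1}{x^2}\right) = -\frac{1}{x} \quad\text{and}\quad \frac{d}{dx}(-\ln x) = -\frac{1}{x}.$$

Or equivalently:

$$\int\left(-\frac{1}{x}\right)dx \overset{1}{=} \ln\frac{1}{x} \quad\text{and}\quad \int\left(-\frac{1}{x}\right)dx \overset{2}{=} -\ln x.$$

By Corollary 32 then, $\ln\frac{1}{x}$ and $-\ln x$ differ by at most a constant:

$$\ln\frac{1}{x} \overset{3}{=} -\ln x + C.$$

Plugging x = 1 into $\overset{3}{=}$, we have ln 1 = 0 = − ln 1 + C = C. Hence, C = 0. And thus, $\ln\frac{1}{x} = -\ln x$.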
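A371(e) $c\log_b a \overset{\star}{=} \frac{c\ln a}{\ln b} = \frac{\ln a^c}{\ln b} \overset{\star}{=} \log_b a^c$. (The middle step uses Fact 159.)

(f) $\log_b\frac{1}{a} \overset{\star}{=} \frac{\ln(1/a)}{\ln b} = -\frac{\ln a}{\ln b} \overset{\star}{=} -\log_b a$. (The middle step uses Fact 158(c).)

(g) $\log_b(ac) \overset{\star}{=} \frac{\ln(ac)}{\ln b} = \frac{\ln a + \ln c}{\ln b} \overset{\star}{=} \log_b a + \log_b c$. (The middle step uses Fact 158(b).)

(h) is immediate from (f) and (g). But if we’d like, we can also prove this without using (f) and (g):

$$\log_b\frac{a}{c} \overset{\star}{=} \frac{\ln(a/c)}{\ln b} = \frac{\ln a - \ln c}{\ln b} \overset{\star}{=} \log_b a - \log_b c.$$

(The middle step uses Fact 158(d).)

(i) $\log_c a \overset{\star}{=} \frac{\ln a}{\ln c}$ and $\log_c b \overset{\star}{=} \frac{\ln b}{\ln c}$. So:

$$\frac{\log_c a}{\log_c b} = \frac{\ln a/\ln c}{\ln b/\ln c} = \frac{\ln a}{\ln b} \overset{\star}{=} \log_b a.$$

(j) $\log_{a^b} c \overset{\star}{=} \frac{\ln c}{\ln a^b} = \frac{\ln c}{b\ln a} \overset{\star}{=} \frac{1}{b}\log_a c$. (The middle step uses Fact 159.)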
ln ab b ln a b


128.16. Ch. 87 Answers (Differential Equations)
A372. We’ll need to use IBP twice:

$$y = \int \underbrace{e^x}_{v'}\underbrace{\sin x}_{u}\,dx = e^x\sin x - \int \underbrace{\cos x}_{u}\underbrace{e^x}_{v'}\,dx = e^x\sin x - \left(e^x\cos x + \int \sin x\,e^x\,dx\right)$$

$\iff y = \frac{e^x}{2}(\sin x - \cos x) + C$ is the general solution.

Given also the initial condition $x = 0 \implies y = 1$, we find that $1 = \frac{e^0}{2}(\sin 0 - \cos 0) + C = C - 0.5 \iff C = 1.5$. Thus, the particular solution is $y = \frac{e^x}{2}(\sin x - \cos x) + 1.5$.
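A quick numerical check of this particular solution is sketched below. It assumes, as the first line of the answer indicates, that the differential equation is dy/dx = eˣ sin x with y = 1 at x = 0; the helper `central_difference` and the sample points are illustrative choices, not part of the original answer.

```python
import math

def y(x):
    """Particular solution from A372: (e^x / 2)(sin x - cos x) + 1.5."""
    return 0.5 * math.exp(x) * (math.sin(x) - math.cos(x)) + 1.5

def central_difference(g, x, h=1e-5):
    """Approximate g'(x) by the symmetric difference quotient."""
    return (g(x + h) - g(x - h)) / (2 * h)

# Compare the numerical derivative of y with e^x sin x at a few points.
max_gap = max(
    abs(central_difference(y, x) - math.exp(x) * math.sin(x))
    for x in [-1.0, 0.0, 0.7, 2.0]
)

print(max_gap < 1e-6, math.isclose(y(0.0), 1.0))
```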

A373. Rearranging, $\frac{dx}{dy} = \frac{1}{y^2 + 1}$. So the general solution is $x = \int \frac{1}{y^2 + 1}\,dy = \tan^{-1} y + C$ (Proposition ??). Rearranging, the general solution is $y = \tan(x + D)$ (where D = −C).
Given also the initial condition $x = 0 \implies y = 1$, we have C = −π/4. So the particular solution is $x = \tan^{-1} y - \pi/4$.


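A374. We’ll need to use IBP twice:

$$\frac{dy}{dx} = \int \underbrace{e^x}_{v'}\underbrace{\sin x}_{u}\,dx = e^x\sin x - \int \underbrace{\cos x}_{u}\underbrace{e^x}_{v'}\,dx = e^x\sin x - \left(e^x\cos x + \int \sin x\,e^x\,dx\right)$$

$$\iff \frac{dy}{dx} \overset{1}{=} \frac{e^x}{2}(\sin x - \cos x) + C_1$$

$$\begin{aligned}
\implies y = \int \frac{dy}{dx}\,dx &= \int \frac{e^x}{2}(\sin x - \cos x) + C_1\,dx \\
&= \frac{1}{2}\left[\int e^x\sin x\,dx - \int e^x\cos x\,dx\right] + C_1 x \\
&\overset{2}{=} \frac{1}{2}\left[\frac{e^x}{2}(\sin x - \cos x) - \frac{e^x}{2}(\sin x + \cos x)\right] + C_1 x + C_4 \\
&= -\frac{1}{2}e^x\cos x + C_1 x + C_4,
\end{aligned}$$

where $\overset{2}{=}$ used $\overset{1}{=}$ (together with the analogous result $\int e^x\cos x\,dx = \frac{e^x}{2}(\sin x + \cos x) + C_3$) and all constants of integration have been absorbed into $C_4$. The general solution is $y = -\frac{1}{2}e^x\cos x + C_1 x + C_4$.

Given also the initial conditions $x = 0 \implies y = 1$ and $x = \pi/2 \implies y = 2$, we have

$$1 = -\frac{1}{2}e^0\cos 0 + C_1\cdot 0 + C_4 \implies C_4 = 1.5,$$

$$2 = -\frac{1}{2}e^{\pi/2}\cos\frac{\pi}{2} + C_1\cdot\frac{\pi}{2} + 1.5 \implies C_1 = \frac{1}{\pi}.$$

So the particular solution is $y = -\frac{1}{2}e^x\cos x + \frac{x}{\pi} + 1.5$.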


(a) The Law of Gravitation is $F = \frac{GMm}{r^2}$, where G ∈ R is some constant.

(b) (i) Newton’s Second Law of Motion is that $F = \frac{d}{dt}(mv)$.
(ii) By the Product Rule, $F = \frac{d}{dt}(mv) = v\frac{dm}{dt} + m\frac{dv}{dt}$. Assuming that m is constant, we have $\frac{dm}{dt} = 0$ and hence $F = m\frac{dv}{dt}$.

(c) Taking the Earth to be immobile, the force of gravitation is the rate of change of momentum of the small ball. That is,

$$F = \frac{GMm}{r^2} = -m\frac{dv}{dt}.$$

The ball drops towards the surface of the Earth at an increasing speed. By assumption, downwards is the negative direction. Hence the negative sign.
Cancelling out the m’s yields $\frac{GM}{r^2} = -\frac{dv}{dt}$.

(d) (i) $\int_{R+x}^{R} \frac{GM}{r^2}\,dr = GM\left[-\frac{1}{r}\right]_{R+x}^{R} = GM\left(-\frac{1}{R} + \frac{1}{R + x}\right)$.

(ii) $\int_{r=R+x}^{r=R} -\frac{dv}{dt}\,dr = -\int_{r=R+x}^{r=R} \frac{dr}{dt}\,dv = -\int_{v=0}^{v=v_s} v\,dv = -\left[\frac{v^2}{2}\right]_0^{v_s} = -\frac{v_s^2}{2}$.

(iii) $GM\left(-\frac{1}{R} + \frac{1}{R + x}\right) = -\frac{v_s^2}{2} \iff v_s = \pm\sqrt{2GM\left(\frac{1}{R} - \frac{1}{R + x}\right)}$.

By assumption, downwards is the negative direction. So for (d) (iii), we must have $v_s = -\sqrt{2GM\left(\frac{1}{R} - \frac{1}{R + x}\right)}$.

(e) This is simply the same process as before, but in reverse. The ball will keep moving upwards, but the force of gravitation will keep pulling it down, reducing its velocity at a rate given by the equation $\frac{GM}{r^2} = -\frac{dv}{dt}$. Eventually, the velocity of the ball will hit 0 and then start going negative (i.e. the ball will start falling down towards the Earth).

Hence, if x is the maximum height attained by the ball, we have $V = \sqrt{2GM\left(\frac{1}{R} - \frac{1}{R + x}\right)}$.

(f) In order for the ball to never fall back down to earth, it must be that the ball keeps going upwards and never reaches any maximum height. That is, x → ∞. Thus,

$$v_e = \lim_{x\to\infty} V = \lim_{x\to\infty}\sqrt{2GM\left(\frac{1}{R} - \frac{1}{R + x}\right)} = \sqrt{\frac{2GM}{R}}.$$

(g) The escape velocity is

$$v_e = \sqrt{\frac{2GM}{R}} = \sqrt{\frac{2\cdot 6.674\times 10^{-11}\cdot 5.972\times 10^{24}}{6371000}} \approx 11,190.$$

Hence, the escape velocity is approximately 11.19 km s⁻¹.
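The arithmetic in (g) is easy to reproduce. The sketch below simply evaluates √(2GM/R) with the three constants quoted in (g), in SI units; the function name `escape_velocity` is chosen here for illustration.

```python
import math

# Constants as quoted in part (g), in SI units.
G = 6.674e-11      # gravitational constant, m^3 kg^-1 s^-2
M = 5.972e24       # mass of the Earth, kg
R = 6371000        # radius of the Earth, m

def escape_velocity(g_const, mass, radius):
    """Escape velocity sqrt(2GM/R) in metres per second."""
    return math.sqrt(2 * g_const * mass / radius)

v_e = escape_velocity(G, M, R)
print(f"{v_e:.0f} m/s, i.e. about {v_e / 1000:.2f} km/s")
```

To four significant figures this agrees with the 11.19 km s⁻¹ stated above.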


129. Part VI Answers (Probability and Statistics)

129.1. Ch. 88 Answers (How to Count: Four Principles)

A376. Taking the green path, there are 3 ways. Taking the red path, there are 2 ways.
Hence, there are 3 + 2 = 5 ways to get from the Starting Point to the River.

A377. The tree diagram below illustrates.

Case #1. First letter is a D.
Case #1(i). Second letter is a D. Then the last two letters must both be E’s. (1 permutation.)
Case #1(ii). Second letter is an E. Then the last two letters must be either DE or ED. (2 permutations.)
Case #2. First letter is an E.
Case #2(i). Second letter is an E. Then the last two letters must both be D’s. (1 permutation.)
Case #2(ii). Second letter is a D. Then the last two letters must be either DE or ED. (2 permutations.)
Altogether then, there are 1 + 2 + 1 + 2 = 6 possible permutations of the letters in DEED.
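The case-by-case count can be cross-checked by brute force. The following sketch lists every ordering of the four letters and keeps only the distinct strings; it is an illustration of the count rather than the method the exercise intends.

```python
from itertools import permutations

# permutations() treats the two D's (and the two E's) as different objects,
# so the set() collapses orderings that spell the same word.
distinct = sorted(set("".join(p) for p in permutations("DEED")))

print(len(distinct), distinct)
```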


A378. 3 × 5 × 10 = 150.

A379. We must choose three 4D numbers. Choosing the first 4D number involves four
decisions — what to put as the first, second, third and fourth digits, with the condition
that no digit is repeated.


Thus, by the MP, there are 10 × 9 × 8 × 7 = 5040 ways to choose the first 4D number.
If we ignored the fact that we already chose the first 4D number, then there’d similarly be
5040 ways to choose the second 4D number (given the condition that this second 4D number
does not have any repeated digits). However, there is an additional condition — namely,
the second 4D number cannot be the same as the first. Thus, there are 5040 − 1 = 5039
ways to choose the second 4D number.
By similar reasoning, we see that there are 5040 − 2 = 5038 ways to choose the third 4D number.
Altogether then, by the MP, there are 5040 × 5039 × 5038 = 127, 947, 869, 280 ways to choose
the three 4D numbers.
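The counts in this answer can be confirmed by enumeration. The sketch below builds every 4-digit string with no repeated digit and then applies the same multiplication as in the text; the variable names are illustrative.

```python
from itertools import permutations

# Every 4D number with no repeated digit (leading zeros allowed, as in 4D).
valid_numbers = ["".join(p) for p in permutations("0123456789", 4)]
n = len(valid_numbers)

# Ordered choice of three different 4D numbers: n * (n - 1) * (n - 2).
ordered_choices = n * (n - 1) * (n - 2)

print(n, ordered_choices)
```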


A380. Apply the IEP twice.
1. The food court and hawker centre share 2 types of cuisine (Chinese and Western) in
common. And so together, the food court and the hawker centre have 4 + 3 − 2 = 5
different types of cuisine.
2. Combine together the food court and the hawker centre (call this the “Low-Class Place”).
The Low-Class Place has 5 types of cuisine and shares 2 types of cuisine (Chinese and
Malay) with the restaurant. And so together, the Low-Class Place and restaurant have
5 + 3 − 2 = 6 different types of cuisine (namely Chinese, Indonesian, Japanese, Korean,
Malay, and Western).

A381. 10 − 3 = 7. (Can you name them?)


129.2. Ch. 89 Answers (How to Count: Permutations)
A382. 6! = 720, 7! = 5040, and 8! = 40320.

A383. 7!/ (4!3!) = 35.

A384. 9!.

A385. The problem of choosing a president and vice-president from a committee of 11 members is equivalent to the problem of filling 2 spaces with 11 distinct objects. The answer is thus P (11, 2) = 11!/9! = 11 × 10 = 110.

A386. Let B and S stand for brother and sister, respectively.

(a) First consider the problem of permuting the seven letters in BBBBSSS, without any
two B’s next to each other. There is only 1 possible arrangement, namely BSBSBSB.
There are 4! ways to permute the brothers and 3! ways to permute the sisters.
Hence, there are in total 1 × 4!3! = 144 possible ways to arrange the siblings in a line, so
that no two brothers are next to each other.

(b) First consider the problem of permuting the seven letters in BBBBSSS, without any
two S’s next to each other. We’ll use the AP.
1. B in position #1.
(a) B in position #2. Then the only way to fill the remaining five positions is SBSBS.
Total: 1 possible arrangement.
(b) S in position #2. Then we must have B in position #3.
i. B in position #4. Then the only way to fill the remaining three positions is
SBS. Total: 1 possible arrangement.
ii. S in position #4. Then we must have B in position #5. And there are two
ways to fill the remaining two positions: either BS or SB. Total: 2 possible
arrangements.
2. S in position #1. Then we must have B in position #2.
(a) B in position #3. Then, like in 1(b), we are left with two B’s and two S’s to fill
the remaining four positions. Hence, Total: 3 possible arrangements.
(b) S in position #3. Then we must have B in position #4. There are three ways
to fill the remaining three positions: SBB, BSB, and BBS. Total: 3 possible
arrangements.
By the AP, there are 1 + 1 + 2 + 3 + 3 = 10 possible arrangements.
Again, there are 4! ways to permute the brothers and 3! ways to permute the sisters.
Hence, there are in total 10 × 4!3! = 1440 possible ways to arrange the siblings in a line, so
that no two sisters are next to each other.

(c) We saw that there was only 1 possible (linear) permutation of BBBBSSS that satisfied
the restriction, namely BSBSBSB.
If we now arrange the siblings in a circle, there will necessarily be two brothers next to each other.
We thus conclude: There are 0 possible ways to arrange the siblings in a circle so that no
two brothers are next to each other.

(d) In part (b), we found 10 possible (linear) permutations of BBBBSSS that satisfied
the restriction.
Of these, 3 have sisters at the two ends: SBSBBBS, SBBSBBS, and SBBBSBS. If
arranged in a circle, these 3 arrangements would involve two sisters next to each other. So
we must deduct these 3 arrangements.
We are left with 7 possible arrangements: BBSBSBS, SBBSBSB, BSBBSBS,
SBSBBSB, BSBSBBS, SBSBSBB, and BSBSBSB. But of course, these are simply
one and the same fixed circular permutation! (This is consistent with Fact 167, which tells
us to simply divide by 7.)
And again, we must now take into account the fact that the brothers are distinct and
the sisters are distinct. We conclude that there are in total 1 × 4!3! = 144 possible ways to
arrange the siblings in a circle, so that no two sisters are next to each other.
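All four parts of A386 can be verified by brute force over the 7! = 5040 line-ups. In the sketch below, the labels B1 to B4 and S1 to S3 and the helper names are hypothetical. For the circular cases, each circular seating corresponds to exactly 7 linear line-ups (its rotations), so the linear count with the two ends also treated as neighbours is divided by 7.

```python
from itertools import permutations

siblings = ["B1", "B2", "B3", "B4", "S1", "S2", "S3"]

def has_adjacent(arrangement, kind, circular=False):
    """True if two siblings of the given kind ('B' or 'S') sit side by side."""
    pairs = list(zip(arrangement, arrangement[1:]))
    if circular:
        pairs.append((arrangement[-1], arrangement[0]))  # the ends are neighbours
    return any(a[0] == kind and b[0] == kind for a, b in pairs)

def count_arrangements(kind, circular=False):
    """Number of arrangements in which no two siblings of `kind` are adjacent."""
    total = sum(
        1 for p in permutations(siblings)
        if not has_adjacent(p, kind, circular)
    )
    # Each circular arrangement is counted once per rotation.
    return total // len(siblings) if circular else total

print(count_arrangements("B"), count_arrangements("S"),
      count_arrangements("B", circular=True),
      count_arrangements("S", circular=True))
```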


129.3. Ch. 90 Answers (How to Count: Combinations)

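$$\binom{n}{k} = \frac{n!}{k!(n - k)!} = \frac{n \times (n - 1) \times \dots \times (n - k + 1) \times (n - k) \times (n - k - 1) \times \dots \times 1}{k!\,(n - k) \times (n - k - 1) \times \dots \times 1} = \frac{n \times (n - 1) \times (n - 2) \times \dots \times (n - k + 1)}{k!} \quad\text{(mass cancellation).}$$

$$C(4, 2) = \frac{4!}{2!(4 - 2)!} = \frac{4!}{2!2!} = \frac{4 \times 3}{2 \times 1} = 6,$$

$$C(6, 4) = \frac{6!}{4!(6 - 4)!} = \frac{6!}{4!2!} = \frac{6 \times 5}{2 \times 1} = 15,$$

$$C(7, 3) = \frac{7!}{3!(7 - 3)!} = \frac{7!}{3!4!} = \frac{7 \times 6 \times 5}{3 \times 2 \times 1} = 35.$$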

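A389. $\binom{3}{1}\binom{7}{2}\binom{5}{2} = 630$.

A390. (a) C(1, 0) + C(1, 1) = 1 + 1 = 2 = C(2, 1).

(b) C(4, 2) + C(4, 3) = 6 + 4 = 10 = C(5, 3).

(c)
$$\begin{aligned}
C(17, 2) + C(17, 3) &= \frac{17!}{2!15!} + \frac{17!}{3!14!} = \frac{17 \times 16}{2 \times 1} + \frac{17 \times 16 \times 15}{3 \times 2 \times 1} \\
&= 17 \times 8 + 17 \times 8 \times 5 = 17 \times 8 \times 6 = \frac{18 \times 17 \times 16}{3 \times 2 \times 1} = C(18, 3).
\end{aligned}$$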
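All three parts of A390 are instances of the same pattern, C(n, k) + C(n, k + 1) = C(n + 1, k + 1). The sketch below checks that pattern for the three (n, k) pairs used here with Python's `math.comb` (available from Python 3.8); the function name `pascal_rule_holds` is made up for illustration.

```python
from math import comb

def pascal_rule_holds(n, k):
    """Check C(n, k) + C(n, k + 1) == C(n + 1, k + 1)."""
    return comb(n, k) + comb(n, k + 1) == comb(n + 1, k + 1)

# The (n, k) pairs from parts (a), (b), and (c).
for n, k in [(1, 0), (4, 2), (17, 2)]:
    print(n, k, comb(n, k), comb(n, k + 1), comb(n + 1, k + 1),
          pascal_rule_holds(n, k))
```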


A391. $\binom{7}{0} = 1$, $\binom{7}{1} = 7$, $\binom{7}{2} = 21$, $\binom{7}{3} = 35$, $\binom{7}{4} = 35$, $\binom{7}{5} = 21$, $\binom{7}{6} = 7$, $\binom{7}{7} = 1$.

A392. Expanding, we have

$$\begin{aligned}
(1 + x)^3 &= (1 + x)(1 + x)(1 + x) \\
&= \underbrace{1\cdot 1\cdot 1}_{0\ x\text{’s}} + \underbrace{1\cdot 1\cdot x + 1\cdot x\cdot 1 + x\cdot 1\cdot 1}_{1\ x} + \underbrace{1\cdot x\cdot x + x\cdot 1\cdot x + x\cdot x\cdot 1}_{2\ x\text{’s}} + \underbrace{x\cdot x\cdot x}_{3\ x\text{’s}}.
\end{aligned}$$

Consider the 8 terms on the right. There is C(3, 0) = 1 way to choose 0 of the x’s. Hence, the coefficient on $x^0$ is C(3, 0); this corresponds to the term 1 ⋅ 1 ⋅ 1 above.
There are C(3, 1) = 3 ways to choose 1 of the x’s. Hence, the coefficient on $x^1$ is C(3, 1); this corresponds to the terms 1 ⋅ 1 ⋅ x, 1 ⋅ x ⋅ 1, and x ⋅ 1 ⋅ 1 above.
There are C(3, 2) = 3 ways to choose 2 of the x’s. Hence, the coefficient on $x^2$ is C(3, 2); this corresponds to the terms 1 ⋅ x ⋅ x, x ⋅ 1 ⋅ x, and x ⋅ x ⋅ 1 above.
There is C(3, 3) = 1 way to choose 3 of the x’s. Hence, the coefficient on $x^3$ is C(3, 3); this corresponds to the term x ⋅ x ⋅ x above.
Altogether then,

$$(1 + x)^3 = \binom{3}{0}x^0 + \binom{3}{1}x^1 + \binom{3}{2}x^2 + \binom{3}{3}x^3 = 1 + 3x + 3x^2 + x^3.$$

A393. $2^7 = 128$.

$$\binom{7}{0} + \binom{7}{1} + \binom{7}{2} + \dots + \binom{7}{7} = 1 + 7 + 21 + 35 + 35 + 21 + 7 + 1 = 128.$$

So indeed, $2^7 = \binom{7}{0} + \binom{7}{1} + \binom{7}{2} + \dots + \binom{7}{7}$.



$$\begin{aligned}
(3 + x)^4 &= \binom{4}{0}3^4x^0 + \binom{4}{1}3^3x^1 + \binom{4}{2}3^2x^2 + \binom{4}{3}3^1x^3 + \binom{4}{4}3^0x^4 \\
&= 81 + 4\cdot 27x + 6\cdot 9x^2 + 4\cdot 3x^3 + x^4 = 81 + 108x + 54x^2 + 12x^3 + x^4.
\end{aligned}$$
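The coefficients of a binomial expansion are quick to generate by machine, which gives a useful cross-check on hand expansions like the one above. This is a small sketch; the function name `binomial_coefficients` is chosen here for illustration.

```python
from math import comb

def binomial_coefficients(a, n):
    """Coefficients of x^0, x^1, ..., x^n in the expansion of (a + x)^n."""
    return [comb(n, k) * a ** (n - k) for k in range(n + 1)]

print(binomial_coefficients(3, 4))            # coefficients of (3 + x)^4
print(binomial_coefficients(1, 3))            # coefficients of (1 + x)^3
print(sum(comb(7, k) for k in range(8)))      # sum of the row C(7, 0), ..., C(7, 7)
```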

A395. (a) There are $\binom{4}{2} = 6$ ways of choosing the two Tan sons and $\binom{3}{2} = 3$ ways of choosing the two Wong daughters.
Having chosen these sons and daughters, there are only 2! = 2 × 1 possible ways of matching them up. This is because for the first chosen Tan son, we have 2 possible choices of brides for him. And then for the second chosen Tan son, there is only 1 possible choice of bride left for him.
Altogether then, there are $\binom{4}{2}\binom{3}{2}\cdot 2 = 6\cdot 3\cdot 2 = 36$ ways of forming the two couples.

(b) There are $\binom{6}{5} = 6$ ways of choosing the five Lee sons and $\binom{9}{5} = 126$ ways of choosing the five Ho daughters.
Having chosen these sons and daughters, there are 5! = 5 × 4 × 3 × 2 × 1 possible ways of matching them up. This is because for the first chosen Lee son, we have 5 possible choices of brides for him. And then for the second chosen Lee son, there are 4 possible choices of brides left for him. Etc.
Altogether then, there are $\binom{6}{5}\binom{9}{5}\cdot 5! = 6\cdot 126\cdot 5! = 90,720$ ways of forming the five couples.


129.4. Ch. 91 Answers (Probability: Introduction)
A396(a). (i) The appropriate sample space is

S = {A♠, K♠, Q♠, . . . , 2♠, A♥, K♥, Q♥, . . . , 2♥, A♦, K♦, Q♦, . . . , 2♦, A♣, K♣, Q♣, . . . , 2♣} .

(a) (ii) Since there are 52 possible outcomes, there are $2^{52}$ possible events. Hence, the event space contains $2^{52}$ elements. It is too tedious to write this out explicitly.

(a) (iii) As always, P has domain Σ and codomain R. We have P({3♦}) = P({5♣}) = 1/52 and P({3♦, 5♣}) = 2/52. In general, given any event A ∈ Σ, we have

$$P(A) = \frac{|A|}{|S|} = \frac{|A|}{52}.$$

In words, given any event A, its probability P(A) is simply the number of elements it contains, divided by 52. So for example, P ({3♦, 5♣, A♠}) = 3/52, as we would expect.

(a) (iv) John might argue that since packs of poker cards usually come with Jokers, there is the possibility that we mistakenly included one or more Jokers in our deck of cards. He might thus argue that to cover this possibility, we should set our sample space to be

S = {A♠, K♠, . . . , 2♠, A♥, K♥, . . . , 2♥, A♦, K♦, . . . , 2♦, A♣, K♣, . . . , 2♣, Joker} .

The event space would be appropriately adjusted to contain $2^{53}$ elements.

The mapping rule of the probability function would be appropriately adjusted, based on John’s belief of the probability of selecting a Joker. For example, if he reckons that the probability of selecting a Joker is 1/10,000, then he might assign P ({Joker}) = 1/10,000 and for any other card C, P ({C}) = 9999/(10000 ⋅ 52). The probability of any other event A ∈ Σ is as given by the Additivity Axiom.


A396(b). (i) The appropriate sample space is S = {HH, HT, T H, T T }.

(b) (ii) Since there are 4 possible outcomes, there are $2^4 = 16$ possible events. Hence, the event space contains 16 elements. It is not too tedious to write these out explicitly:

Σ = {∅, {HH} , {HT } , {T H} , {T T } , {HH, HT } , {HH, T H} , {HH, T T } ,
{HT, T H} , {HT, T T } , {T H, T T } , {HH, HT, T H} ,
{HH, HT, T T } , {HH, T H, T T } , {HT, T H, T T } , S}.

(b) (iii) As always, P has domain Σ and codomain R. We have P({HH}) = P({HT }) = 1/4 and P({HH, HT, T H}) = 3/4. In general, given any event A ∈ Σ, we have

$$P(A) = \frac{|A|}{|S|} = \frac{|A|}{4}.$$

In words, given any event A, its probability P(A) is simply the number of elements it contains, divided by 4. So for example, P ({T H, T T }) = 2/4, as we would expect.

(b) (iv) John might, as before, argue that there is the possibility that a coin lands on its edge. He might thus argue that the sample space should be

S = {HH, HT, HX, T H, T T, T X, XH, XT, XX} .

The event space would be appropriately adjusted to contain $2^9 = 512$ elements.

The mapping rule of the probability function would be appropriately adjusted. For example, if John believes that any given coin flip has probability 1/6000 of landing on its edge (and is otherwise equally likely to land heads or tails), then we might assign $P(\{XX\}) = (1/6000)^2$, $P(\{HH\}) = (5999/12000)^2$, $P(\{XH\}) = (1/6000)\cdot(5999/12000)$, etc.


A396(c). (i) Writing (i, j) for the outcome in which the first die shows i and the second die shows j, the appropriate sample space contains 36 outcomes:

S = {(1, 1), (1, 2), . . . , (1, 6), (2, 1), (2, 2), . . . , (2, 6), . . . , (6, 1), (6, 2), . . . , (6, 6)} .

(c) (ii) Since there are 36 possible outcomes, there are $2^{36}$ possible events. Hence, the event space contains $2^{36}$ elements.

(c) (iii) As always, P has domain Σ and codomain R. For any two distinct outcomes s, t ∈ S, we have P({s}) = P({t}) = 1/36 and P({s, t}) = 2/36. In general, given any event A ∈ Σ, we have

$$P(A) = \frac{|A|}{|S|} = \frac{|A|}{36}.$$

In words, given any event A, its probability P(A) is simply the number of elements it contains, divided by 36. So for example, any event that contains exactly four outcomes has probability 4/36, as we would expect.

(c) (iv) John might argue that there is the possibility that a die lands on a vertex. He might thus argue that the sample space contains $7^2 = 49$ outcomes and should be

S = {(1, 1), (1, 2), . . . , (1, 6), (1, V), (2, 1), . . . , (6, V), (V, 1), (V, 2), . . . , (V, V)} .

The event space would be appropriately adjusted to contain $2^{49}$ elements.
The mapping rule of the probability function would be appropriately adjusted. For example, if John believes that any given die roll has probability 1/1000000 of landing on a vertex (and is otherwise equally likely to show any of its six faces), then we might assign $P(\{(V, V)\}) = \frac{1}{1000000^2}$, $P(\{(i, j)\}) = \left(\frac{999999}{6000000}\right)^2$ for any faces i and j, etc.


A397. (a) By definition, A and B are mutually exclusive if A ∩ B = ∅. Since P(∅) = 0,
the result follows.

(b) The events A, Ac ∩ B, and Ac ∩ B c ∩ C are mutually exclusive. Moreover, their union
is A ∪ B ∪ C. Hence, by the Additivity Axiom (applied twice),

P(A ∪ B ∪ C) = P(A) + P (Ac ∩ B) + P (Ac ∩ B c ∩ C) .


129.5. Ch. 92 Answers (Conditional Probability)
A398. Let A be the event that we rolled at least one even number and B be the event that the sum of the two dice was 8. We have P(B) = 5/36 (see Exercise 404).
And A ∩ B can occur if and only if the two dice were (2, 6), (4, 4), or (6, 2). Hence, P(A ∩ B) = 3/36.
Altogether then,

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{3/36}{5/36} = \frac{3}{5}.$$
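This conditional probability can be confirmed by listing all 36 equally likely outcomes of two dice. The sketch below is illustrative only; the variable names are not from the text.

```python
from fractions import Fraction
from itertools import product

# All 36 ordered outcomes (first die, second die).
outcomes = list(product(range(1, 7), repeat=2))

# B: the two dice sum to 8.  A: at least one die shows an even number.
event_b = [o for o in outcomes if sum(o) == 8]
event_a_and_b = [o for o in event_b if o[0] % 2 == 0 or o[1] % 2 == 0]

p_b = Fraction(len(event_b), len(outcomes))
p_a_and_b = Fraction(len(event_a_and_b), len(outcomes))
p_a_given_b = p_a_and_b / p_b

print(event_a_and_b, p_a_given_b)
```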

A399. It may be true that

$$P(\text{DNA match} \mid \text{Blood stain is not John Brown's}) = \frac{1}{10,000,000}.$$

It does not however follow, except by the CPF, that

$$P(\text{Blood stain is not John Brown's} \mid \text{DNA match}) = \frac{1}{10,000,000}.$$

There is reason to believe that P (Blood stain is not John Brown’s) is much greater than P (DNA match) and thus that P (Blood stain is not John Brown’s ∣ DNA match) is much greater than P (DNA match ∣ Blood stain is not John Brown’s).
One important factor is that if the DNA database is large, then invariably we’d expect to find, purely by coincidence, a DNA match to the blood stain at the murder scene. As of May 2016, the US National DNA Index contains the DNA profiles of over 12.3 million individuals. And so, even if it were true that there is only probability 1/10,000,000 that two random individuals have a DNA match, we’d expect to find a match, simply by combing through the entire US National DNA Index!
The error here is similar to the lottery example, where we conclude (erroneously) that a lottery winner must have cheated, simply because it was so unlikely that she won.


129.6. Ch. 93 Answers (Probability: Independence)
A400. By Fact 173, A, B are independent events ⇐⇒ P(A∣B) = P(A). Rearranging,
P(B) = P(A ∩ B)/P(A) = P(B∣A), as desired.

A401. First, note that P (H1 ) = P (T1 ) = P (H2 ) = 0.5.

(a) P (H1 ∩ H2 ) = 0.25 = 0.5 × 0.5 = P (H1 ) P (H2 ), so that indeed H1 and H2 are independent.
(b) P (H2 ∩ T1 ) = 0.25 = 0.5×0.5 = P (H2 ) P (T1 ), so that indeed H2 and T1 are independent.
(c) Observe that H1 ∩ T1 = ∅ (it is impossible that “the first coin flip is heads” AND also
“the first coin flip is tails”).
Hence, P (H1 ∩ T1 ) = P (∅) = 0 ≠ 0.25 = 0.5 × 0.5 = P (H1 ) P (T1 ), so that indeed H1 and T1
are not independent.

A402. No, the journalist is incorrectly assuming that the probability of one family
member making the NBA is independent of another family member making the NBA. But
such an assumption is almost certainly false.
The same excellent genes that made Rick Barry a great basketball player, probably also
helped his three sons. Not to mention that having an NBA player as your father probably
helps a lot too.
The two events “family member #1 in NBA” and “family member #2 in NBA” are probably
not independent. So we cannot simply multiply probabilities together.

A403. First, note that P (H1 ) = P (T2 ) = P(X) = 0.5.

(a) P (H1 ∩ T2 ) = 0.25 = 0.5 × 0.5 = P (H1 ) P (T2 ), so that indeed H1 , T2 are independent.
P (H1 ∩ X) = 0.25 = 0.5 × 0.5 = P (H1 ) P (X), so that indeed H1 , X are independent.
P (T2 ∩ X) = 0.25 = 0.5 × 0.5 = P (T2 ) P (X), so that indeed T2 , X are independent.
Altogether then, H1 , T2 , and X are indeed pairwise independent.
(b) The event H1 ∩ T2 ∩ X is the same as the event H1 ∩ T2 . Thus, P (H1 ∩ T2 ∩ X) =
P (H1 ∩ T2 ) = 0.25 ≠ 0.5 × 0.5 × 0.5 = P (H1 ) P (T2 ) P(X), so that indeed the three events are
not independent.
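The distinction between pairwise independence and independence can be made concrete by enumeration. The sketch below is illustrative only. The definition of X is not restated in this answer, so the code ASSUMES that X is the event "the two flips differ", which is consistent with the probabilities quoted above (P(X) = 0.5 and H1 ∩ T2 ∩ X = H1 ∩ T2); the helper `prob` is a made-up name.

```python
from fractions import Fraction
from itertools import product

# Sample space for two fair coin flips: HH, HT, TH, TT (equally likely).
sample_space = list(product("HT", repeat=2))

def prob(event):
    """Probability of an event given as a predicate on outcomes."""
    return Fraction(sum(1 for s in sample_space if event(s)), len(sample_space))

H1 = lambda s: s[0] == "H"      # first flip is heads
T2 = lambda s: s[1] == "T"      # second flip is tails
X = lambda s: s[0] != s[1]      # ASSUMED definition: the two flips differ

pairwise = all(
    prob(lambda s, e=e, f=f: e(s) and f(s)) == prob(e) * prob(f)
    for e, f in [(H1, T2), (H1, X), (T2, X)]
)
triple = prob(lambda s: H1(s) and T2(s) and X(s))

print(pairwise, triple, prob(H1) * prob(T2) * prob(X))
```

Under this assumption the three pairwise checks pass, while the triple intersection has probability 1/4 rather than 1/8.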


129.7. Ch. 95 Answers (Random Variables: Introduction)

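Below, (i, j) denotes the outcome in which the first die shows i and the second die shows j.

k    s such that X(s) = k                              P(X = k)
2    (1,1)                                             1/36
3    (1,2), (2,1)                                      2/36
4    (1,3), (2,2), (3,1)                               3/36
5    (1,4), (2,3), (3,2), (4,1)                        4/36
6    (1,5), (2,4), (3,3), (4,2), (5,1)                 5/36
7    (1,6), (2,5), (3,4), (4,3), (5,2), (6,1)          6/36
8    (2,6), (3,5), (4,4), (5,3), (6,2)                 5/36
9    (3,6), (4,5), (5,4), (6,3)                        4/36
10   (4,6), (5,5), (6,4)                               3/36
11   (5,6), (6,5)                                      2/36
12   (6,6)                                             1/36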

(b) E is the event X ≥ 10.

3 2 1 6 1
(c) P(E) = P (X ≥ 10) = P (X = 10) + P (X = 11) + P (X = 12) = + + = = .
36 36 36 36 6


⎛ ⎞ ⎛ ⎞
A405. = 5 and P = 4.
⎝ ⎠ ⎝ ⎠

⎛ ⎞ ⎛ ⎞
= 3 and Q = 3.
⎝ ⎠ ⎝ ⎠

⎛ ⎞ ⎛ ⎞
(P Q) = 15 and (P Q) = 12.
⎝ ⎠ ⎝ ⎠

A406. If S = {1, 2, 3, 4, 5, 6}, then X ∶ S → R defined by X(s) = s is of course a random variable. A random variable is simply any function with domain S and codomain R; and X certainly meets these requirements.
P (X = 1) = P (X = 2) = P (X = 3) = P (X = 4) = P (X = 5) = P (X = 6) = 1/6 and P (X = k) = 0 for any k ≠ 1, 2, 3, 4, 5, 6.

A407(a). (i) The sample space is

S = {HHHH, HHHT, HHT H, HT HH, T HHH, HHT T, HT HT, T HHT,
HT T H, T HT H, T T HH, HT T T, T HT T, T T HT, T T T H, T T T T }.

The event space Σ is the set of all possible subsets of S and contains $2^{16}$ elements.
The probability function P ∶ Σ → R is defined by P(A) = ∣A∣/16, for any event A ∈ Σ.

(a) (ii) The random variable X ∶ S → R is the function defined by

T T T T ↦ 0,
HT T T, T HT T, T T HT, T T T H ↦ 1,
HHT T, HT HT, T HHT, HT T H, T HT H, T T HH ↦ 2,
HHHT, HHT H, HT HH, T HHH ↦ 3,
HHHH ↦ 4.

(a)(iii) P(X = 4) = 1/16, P(X = 3) = 4/16, P(X = 2) = 6/16, P(X = 1) = 4/16, P(X = 0) = 1/16, P(X = k) = 0, for any k ≠ 0, 1, 2, 3, 4.


A407(b). (i) The sample space S consists of 216 outcomes:

⎪ ⎫

⎪ ⎪

S=⎨ ⎬
⎪ ⎪
, ,..., , ,..., , ,...,

⎪ ⎪

⎩ ⎭

The event space Σ is the set of all possible subsets of S and contains 2216 elements.
The probability function P ∶ Σ → R is defined by P(A) = ∣A∣/216, for any event A ∈ Σ.

(b) (ii) The range of X is {3, 4, 5, . . . , 18}. We now count the number of ways there are for
the three dice to reach a sum of 3, to reach a sum of 4, etc. This will enable us to write
down the mapping rule of the function X ∶ S → R.
To get a sum of 3, the three dice must be (1, 1, 1). There is thus 3!/3! = 1 possibility.
To get a sum of 4, the three dice must be (1, 1, 2) or permutations thereof. There are thus 3!/2! = 3 possibilities.
To get a sum of 5, the three dice must be (1, 1, 3), (1, 2, 2), or permutations thereof. There are thus 3!/2! + 3!/2! = 6 possibilities.
To get a sum of 6, the three dice must be (1, 1, 4), (1, 2, 3), (2, 2, 2), or permutations thereof. There are 3!/2! + 3! + 3!/3! = 10 such possibilities.
To get a sum of 7, the three dice must be (1, 1, 5), (1, 2, 4), (1, 3, 3), (2, 2, 3), or permutations thereof. There are 3!/2! + 3! + 3!/2! + 3!/2! = 15 such possibilities.
To get a sum of 8, the three dice must be (1, 1, 6), (1, 2, 5), (1, 3, 4), (2, 2, 4), (2, 3, 3), or permutations thereof. There are 3!/2! + 3! + 3! + 3!/2! + 3!/2! = 21 such possibilities.
To get a sum of 9, the three dice must be (1, 2, 6), (1, 3, 5), (1, 4, 4), (2, 2, 5), (2, 3, 4), (3, 3, 3), or permutations thereof. There are 3! + 3! + 3!/2! + 3!/2! + 3! + 3!/3! = 25 such possibilities.
To get a sum of 10, the three dice must be (1, 3, 6), (1, 4, 5), (2, 2, 6), (2, 3, 5), (2, 4, 4), (3, 3, 4), or permutations thereof. There are 3! + 3! + 3!/2! + 3! + 3!/2! + 3!/2! = 27 such possibilities.
By symmetry, there are also 27 ways to get a sum of 11; also 25 ways to get a sum of 12, 21 ways to get a sum of 13, 15 ways to get a sum of 14, 10 ways to get a sum of 15, 6 ways to get a sum of 16, 3 ways to get a sum of 17, and 1 way to get a sum of 18.

So X ∶ S → R is defined by

X(1, 1, 1) = 3,

X(1, 1, 2) = X(1, 2, 1) = X(2, 1, 1) = 4,

X(1, 1, 3) = X(1, 3, 1) = X(3, 1, 1) = X(1, 2, 2) = X(2, 1, 2) = X(2, 2, 1) = 5,

and so on. In general, X(a, b, c) = a + b + c.

(b)(iii) P(X = 3) = 1/216, P(X = 4) = 3/216, P(X = 5) = 6/216,

P(X = 6) = 10/216, P(X = 7) = 15/216, P(X = 8) = 21/216,

P(X = 9) = 25/216, P(X = 10) = 27/216, P(X = 11) = 27/216,

P(X = 12) = 25/216, P(X = 13) = 21/216, P(X = 14) = 15/216,

P(X = 15) = 10/216, P(X = 16) = 6/216, P(X = 17) = 3/216,

P(X = 18) = 1/216, P(X = k) = 0,
for any k ∉ {3, 4, 5, . . . , 18}.
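
As a cross-check on these counts, here is a minimal Python sketch (standard library only; the names sum_counts and pmf are ours, not the textbook's). It enumerates all 216 equally likely ordered outcomes of three fair dice and tallies each sum.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate all 6^3 = 216 ordered outcomes (a, b, c) of three fair dice
# and count how many outcomes give each sum a + b + c.
sum_counts = Counter(a + b + c for a, b, c in product(range(1, 7), repeat=3))

# P(X = k) = (number of outcomes with sum k) / 216.
pmf = {k: Fraction(n, 216) for k, n in sorted(sum_counts.items())}

for k in sorted(sum_counts):
    print(k, sum_counts[k])
```

Each printed line gives a sum k followed by the number of outcomes that produce it, which can be compared against the numerators listed in (b)(iii).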


129.8. Ch. 96 Answers (Random Variables: Independence)
A408. No. For example, P (X = 0, Y = 0) = 0, but

P (X = 0) P (Y = 0) = 0.5 × 0.25 = 0.125.

129.9. Ch. 97 Answers (Random Variables: Expectation)

A409. (a) P(X + Y = 2) is simply the probability of 2 heads and 0 sixes OR 1 head and
1 six OR 0 heads and 2 sixes. So

P (X + Y = 2) = (1/2)(1/2)(5/6)(5/6) + C(2, 1)(1/2)(1/2) ⋅ C(2, 1)(5/6)(1/6) + (1/2)(1/2)(1/6)(1/6) = 25/144 + 20/144 + 1/144 = 46/144.

(b) P (X + Y = 3) is simply the probability of 2 heads and 1 six OR 1 head and 2 sixes. So

P (X + Y = 3) = (1/2)(1/2) ⋅ C(2, 1)(5/6)(1/6) + C(2, 1)(1/2)(1/2) ⋅ (1/6)(1/6) = 10/144 + 2/144 = 12/144.

(c) P (X + Y = 4) is simply the probability of 2 heads and 2 sixes. So

P (X + Y = 4) = (1/2)(1/2)(1/6)(1/6) = 1/144.

(d) E[X + Y ] = ∑ P (X + Y = k) ⋅ k, where the sum is over k ∈ Range(X + Y )

= P (X + Y = 0) ⋅ 0 + P (X + Y = 1) ⋅ 1 + P (X + Y = 2) ⋅ 2
+ P (X + Y = 3) ⋅ 3 + P (X + Y = 4) ⋅ 4

= (25/144) ⋅ 0 + (60/144) ⋅ 1 + (46/144) ⋅ 2 + (12/144) ⋅ 3 + (1/144) ⋅ 4 = (60 + 92 + 36 + 4)/144 = 192/144 = 4/3.
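
The distribution of X + Y can also be checked by brute force. The sketch below assumes, as the working in (a) to (c) suggests, that X is the number of heads in two fair coin flips and Y is the number of sixes in two fair die rolls; the variable names are ours.

```python
from fractions import Fraction
from itertools import product

# 1 marks a "success" (a head, or a six); 0 marks a failure.
coin = {1: Fraction(1, 2), 0: Fraction(1, 2)}
die = {1: Fraction(1, 6), 0: Fraction(5, 6)}

# Build the distribution of X + Y over all combinations of
# (coin 1, coin 2, die 1, die 2), assuming independence.
pmf = {}
for c1, c2, d1, d2 in product(coin, coin, die, die):
    total = c1 + c2 + d1 + d2
    prob = coin[c1] * coin[c2] * die[d1] * die[d2]
    pmf[total] = pmf.get(total, 0) + prob

expectation = sum(k * p for k, p in pmf.items())
print(pmf, expectation)
```

Exact fractions are used so that the results can be compared with the hand computation without rounding.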


A410(a). The range of X consists simply of the possible prizes from the “big” game.
Range(X) = {2000, 1000, 490, 250, 60, 0}. (Don’t forget to include 0.)
Similarly, Range(Y ) = {3000, 2000, 800, 0}.
(b) P (X = 2000) = P (X = 1000) = P (X = 490) = 1/10000,

P (X = 250) = P (X = 60) = 10/10000,

P (X = 0) = 9977/10000,

P (Y = 3000) = P (Y = 2000) = P (Y = 800) = 1/10000,

P (Y = 0) = 9997/10000.

(c) E [X] = ∑ P (X = k) ⋅ k = 2000P (X = 2000) + 1000P (X = 1000) + 490P (X = 490) + 250P (X = 250) + 60P (X = 60) + 0P (X = 0)

= 2000/10000 + 1000/10000 + 490/10000 + (250 ⋅ 10)/10000 + (60 ⋅ 10)/10000 + (9977 ⋅ 0)/10000 = 0.659.

E [Y ] = ∑ P (Y = k) ⋅ k, where the sum is over k ∈ Range(Y )

= P (Y = 3000) ⋅ 3000 + P (Y = 2000) ⋅ 2000 + P (Y = 800) ⋅ 800 + P (Y = 0) ⋅ 0

= (1/10000) ⋅ 3000 + (1/10000) ⋅ 2000 + (1/10000) ⋅ 800 + (9997/10000) ⋅ 0 = 0.3 + 0.2 + 0.08 + 0 = 0.58.
(d) For every $1 staked, the “big” game is expected to lose you $0.341 and the “small”
game is expected to lose you $0.42. Thus, the “big” game is expected to lose you less.
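
The two expected values can be reproduced with a short sketch. The helper expected_value and the dictionaries big and small are our own names; the probabilities are the ones used in the working for (c), and a stake of $1 per game is assumed, as in (d).

```python
from fractions import Fraction


def expected_value(pmf):
    """Return the sum of prize * probability over a {prize: probability} dict."""
    assert sum(pmf.values()) == 1, "probabilities must sum to 1"
    return sum(prize * prob for prize, prob in pmf.items())


tenk = 10000
big = {2000: Fraction(1, tenk), 1000: Fraction(1, tenk), 490: Fraction(1, tenk),
       250: Fraction(10, tenk), 60: Fraction(10, tenk), 0: Fraction(9977, tenk)}
small = {3000: Fraction(1, tenk), 2000: Fraction(1, tenk),
         800: Fraction(1, tenk), 0: Fraction(9997, tenk)}

e_big = expected_value(big)
e_small = expected_value(small)

# Expected loss per $1 staked is the stake minus the expected prize.
print(float(1 - e_big), float(1 - e_small))
```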


129.10. Ch. 98 Answers (Random Variables: Variance)
A411. In Exercise 407(b), we already found that P (Z = 3) = 1/216, P (Z = 4) = 3/216, ...,
P (Z = 18) = 1/216. By symmetry, we have µ = E [Z] = 10.5. So:

E [Z²] = (1/216) ⋅ 3² + (3/216) ⋅ 4² + ⋅ ⋅ ⋅ + (1/216) ⋅ 18² = 25704/216 = 119.

Hence, V [Z] = E [Z²] − µ² = 119 − 10.5² = 8.75.
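
A brute-force check of E [Z], E [Z²], and V [Z] is sketched below. It assumes, as in Exercise 407(b), that Z is the sum of three fair dice; the variable names are ours.

```python
from fractions import Fraction
from itertools import product

# All 216 equally likely values of Z = a + b + c.
outcomes = [a + b + c for a, b, c in product(range(1, 7), repeat=3)]
n = len(outcomes)

mean = Fraction(sum(outcomes), n)                    # E[Z]
mean_sq = Fraction(sum(z * z for z in outcomes), n)  # E[Z^2]
variance = mean_sq - mean ** 2                       # V[Z] = E[Z^2] - (E[Z])^2

print(mean, mean_sq, variance)
```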

A412. E [Y ] = (35/100) × 20 cm + (65/100) × 30 cm = 26.5 cm.

V [Y ] = (35/100) × (20 cm − 26.5 cm)² + (65/100) × (30 cm − 26.5 cm)² = 22.75 cm².

SD [Y ] = √V [Y ] ≈ 4.77 cm.

A413. (a) 2µ kg, 2σ² kg².

(b) 2µ kg, 4σ² kg².
(c) The mean of the total weight of the two fish is 2µ kg. However, we do not know the
variance, since the weights of the two fish are not independent.

129.11. Ch. 99 Answers (The Coin-Flips Problem)

(This chapter had no exercises.)

129.12. Ch. 100 Answers (Bernoulli Trial and Distribution)

(This chapter had no exercises.)


129.13. Ch. 101 Answers (Binomial Distribution)
A414. Let X ∼ B (20, 0.01) be the number of components in engine #1 that fail. Let
Y ∼ B (35, 0.005) be the number of components in engine #2 that fail.
The probability that engine #1 fails is

P (X ≥ 2) = 1 − P (X ≤ 1) = 1 − P (X = 0) − P (X = 1)
= 1 − C(20, 0) 0.01⁰ 0.99²⁰ − C(20, 1) 0.01¹ 0.99¹⁹
≈ 0.0169.

The probability that engine #2 fails is

P (Y ≥ 2) = 1 − P (Y ≤ 1) = 1 − P (Y = 0) − P (Y = 1)
= 1 − C(35, 0) 0.005⁰ 0.995³⁵ − C(35, 1) 0.005¹ 0.995³⁴
≈ 0.0133.

Hence, the probability that both engines fail is

P (X ≥ 2) P (Y ≥ 2) ≈ 0.00022.
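
These binomial probabilities can be reproduced numerically. The sketch below uses only math.comb; the helper names binom_pmf and prob_at_least_two are ours, and the last line assumes (as the answer does) that the two engines fail independently.

```python
from math import comb


def binom_pmf(n, p, k):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_at_least_two(n, p):
    """P(X >= 2) = 1 - P(X = 0) - P(X = 1) for X ~ B(n, p)."""
    return 1 - binom_pmf(n, p, 0) - binom_pmf(n, p, 1)


engine1 = prob_at_least_two(20, 0.01)
engine2 = prob_at_least_two(35, 0.005)
both = engine1 * engine2  # independence of the two engines is assumed

print(round(engine1, 4), round(engine2, 4), round(both, 5))
```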


129.14. Ch. 102 Answers (Continuous Uniform Distribution)

A415. (a) FY (k) = 0 if k < 3; FY (k) = 0.5(k − 3) if k ∈ [3, 5]; FY (k) = 1 if k > 5.

(b) fY (k) = 0.5 if k ∈ [3, 5]; fY (k) = 0 otherwise.

(c) P (3.1 ≤ Y ≤ 4.6) = 0.75 is in blue and P (4.8 ≤ Y ≤ 4.9) = 0.05 is in red.


129.15. Ch. 103 Answers (Normal Distribution)
A416. (a) From Z-tables,

P (Z ≥ 1.8) = 1 − P (Z ≤ 1.8) = 1 − Φ(1.8) ≈ 1 − 0.9641 = 0.0359.

Graphing calculator screenshot:


(b) From Z-tables,

P (−0.351 < Z < 1.2) = Φ(1.2) − Φ(−0.351) = Φ(1.2) − [1 − Φ(0.351)]

≈ 0.8849 − (1 − 0.6372) = 0.8849 − 0.3628 = 0.5221.
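
In place of Z-tables or a graphing calculator, the standard normal CDF can be evaluated through the error function. This is only a sketch; phi_cdf is our own helper name.

```python
from math import erf, sqrt


def phi_cdf(z):
    """Standard normal CDF, Phi(z) = 0.5 * (1 + erf(z / sqrt(2)))."""
    return 0.5 * (1 + erf(z / sqrt(2)))


p_a = 1 - phi_cdf(1.8)                # P(Z >= 1.8)
p_b = phi_cdf(1.2) - phi_cdf(-0.351)  # P(-0.351 < Z < 1.2)

print(round(p_a, 4), round(p_b, 4))
```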

Graphing calculator screenshot:


A417. If µ = 0 and σ² = 1, then

fX (a) = [1/(σ√(2π))] e^(−0.5((a − µ)/σ)²) = [1/(1 ⋅ √(2π))] e^(−0.5((a − 0)/1)²) = [1/√(2π)] e^(−0.5a²) = φ (a).

We’ve just shown that the PDF of X ∼ N(µ, σ²), when µ = 0 and σ² = 1, is the same as the PDF
of the SNRV Z ∼ N(0, 1). Hence, the SNRV is indeed simply a normal random variable
with mean µ = 0 and variance σ² = 1.

A418. First observe that (X − µ)/σ = X/σ + (−µ/σ). Now simply use Fact 181, with a = 1/σ and
b = −µ/σ:

(X − µ)/σ = X/σ + (−µ/σ) ∼ N (µ/σ + (−µ/σ), (1/σ²) σ²) = N (0, 1).


A419. Let X ∼ N (µ, σ 2 ) and let fX and FX be the PDF and CDF of X.
1. We know from Fact 180 that Φ (∞) = 1. And by Corollary 33, FX (∞) = P (X ≤ ∞) =
Φ (∞). Thus, FX (∞) = 1.
2. fX (a) = (1/σ) φ ((a − µ)/σ). But we already know from Fact 180 that φ ((a − µ)/σ) > 0. Since σ > 0 too, fX (a) > 0.
3. E [X] = E [σZ + µ] = σE [Z] + µ = µ.
4. We know from Fact 180 that the standard normal PDF φ attains a global maximum
at 0. That is, φ (0) ≥ φ (a), for all a ∈ R. Since X = σZ + µ, this is equivalent to
fX (σ ⋅ 0 + µ) ≥ fX (σ ⋅ a + µ), for all a ∈ R. Equivalently, fX (µ) ≥ fX (b), for all b ∈ R.
That is, fX attains a global maximum at µ.
5. Var [X] = Var [σZ + µ] = σ² Var [Z] = σ².
6. We know from Fact 180 that for any a ∈ R, we have P (Z ≤ a) = P (Z < a). Equivalently,
for all a ∈ R, P (X ≤ σa + µ) = P (X < σa + µ). Equivalently, for all b ∈ R, P (X ≤ b) =
P (X < b).
7. We know from Fact 180 that φ is symmetric about 0. Since X = σZ +µ, fX must likewise
be symmetric about σ ⋅ 0 + µ = µ.

(a) By Corollary 33, P (X ≥ µ + a) = P (Z ≥ a/σ). By Fact 180, P (Z ≥ a/σ) = P (Z ≤ −a/σ).
Now again by Corollary 33, P (Z ≤ −a/σ) = P (X ≤ µ − a). Altogether then, P (X ≥ µ + a) =
P (X ≤ µ − a), as desired. And of course, by definition, P (X ≤ µ − a) = FX (µ − a).
(b) Obvious.
(c) Obvious.
8. First use Corollary 33: P (µ − σ ≤ X ≤ µ + σ) = P (−1 ≤ Z ≤ 1). Now use Fact 180:
P (−1 ≤ Z ≤ 1) = 0.6827.
9. First use Corollary 33: P (µ − 2σ ≤ X ≤ µ + 2σ) = P (−2 ≤ Z ≤ 2). Now use Fact 180:
P (−2 ≤ Z ≤ 2) = 0.9545.
10. First use Corollary 33: P (µ − 3σ ≤ X ≤ µ + 3σ) = P (−3 ≤ Z ≤ 3). Now use Fact 180:
P (−3 ≤ Z ≤ 3) = 0.9973.
11. By Fact 180, φ has two points of inflexion, namely at ±1. That is, φ changes concavity at
±1. Since by Corollary 33, X = σZ + µ, fX must likewise change concavity at ±1 ⋅ σ + µ =
µ ± σ. That is, fX has two points of inflexion, namely at µ ± σ.


A420. We are given that X ∼ N(2.14, 5) and Y ∼ N(−0.33, 2).

(a) P (X ≥ 1) = P (Z ≥ (1 − 2.14)/√5) ≈ P (Z ≥ −0.5098)

= P (Z ≤ 0.5098) = Φ (0.5098) ≈ 0.6949.

P (Y ≥ 1) = P (Z ≥ (1 − (−0.33))/√2) ≈ P (Z ≥ 0.9405)

= 1 − P (Z ≤ 0.9405) = 1 − Φ (0.9405) ≈ 0.1735.

[Figures: P (X ≥ 1) and P (Y ≥ 1); P (−2 ≤ X ≤ −1.5) and P (−2 ≤ Y ≤ −1.5).]

(b) P (−2 ≤ X ≤ −1.5) = P (((−2) − 2.14)/√5 ≤ Z ≤ ((−1.5) − 2.14)/√5)

≈ P (−1.8515 ≤ Z ≤ −1.6279) = P (1.6279 ≤ Z ≤ 1.8515)

= Φ (1.8515) − Φ (1.6279) ≈ 0.9679 − 0.9482 = 0.0197.

P (−2 ≤ Y ≤ −1.5) = P (((−2) − (−0.33))/√2 ≤ Z ≤ ((−1.5) − (−0.33))/√2)

≈ P (−1.1809 ≤ Z ≤ −0.8273) = P (0.8273 ≤ Z ≤ 1.1809)

= Φ (1.1809) − Φ (0.8273) ≈ 0.8812 − 0.7959 = 0.0853.


A421. (a) Let W ∼ N (25000, 64000000) and E ∼ N (200, 10000). Let B = 0.002W + 0.3E
be the total bill in a given month. Then

B ∼ N (0.002 × 25000 + 0.3 × 200, 0.002² × 64000000 + 0.3² × 10000)

= N (50 + 60, 256 + 900) = N (110, 1156) .

Thus, P (B > 100) ≈ 0.6157 (calculator).

(b) Let B1 ∼ N (110, 1156), B2 ∼ N (110, 1156), . . . , B12 ∼ N (110, 1156) be the bills in each
of the 12 months.
Then the total bill in a year is T = B1 +B2 +⋅ ⋅ ⋅+B12 ∼ N (12 × 110, 12 × 1156) = N (1320, 13872).
Thus, P (T > 1000) ≈ 0.9967 (calculator).
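
The two "calculator" values in (a) and (b) can be reproduced as follows. This sketch assumes, as the answer does, that W and E are independent and that the twelve monthly bills are independent; the helper names phi_cdf and normal_sf are ours.

```python
from math import erf, sqrt


def phi_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))


def normal_sf(x, mean, var):
    """P(N > x) for N ~ Normal(mean, var)."""
    return 1 - phi_cdf((x - mean) / sqrt(var))


# Monthly bill B = 0.002 W + 0.3 E.
mean_b = 0.002 * 25000 + 0.3 * 200
var_b = 0.002**2 * 64_000_000 + 0.3**2 * 10_000

p_month = normal_sf(100, mean_b, var_b)            # P(B > 100)
p_year = normal_sf(1000, 12 * mean_b, 12 * var_b)  # P(T > 1000)

print(round(p_month, 4), round(p_year, 4))
```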

(c) The total bill in a given month is B = 0.002W + xE and

B ∼ N (50 + 200x, 256 + 10000x²) .

Our goal is to find the value of x for which P (B > 100) = 0.1. We have

P (B > 100) = P (Z > [100 − (50 + 200x)]/√(256 + 10000x²)) = P (Z > (50 − 200x)/√(256 + 10000x²))

= 1 − Φ ((50 − 200x)/√(256 + 10000x²)) = 0.1.

From the Z-tables,

Φ ((50 − 200x)/√(256 + 10000x²)) = 0.9 ⇐⇒ (50 − 200x)/√(256 + 10000x²) ≈ 1.2815.

One can rearrange, do the algebra (square both sides), and use the quadratic formula, discarding
the extraneous root for which 50 − 200x < 0. Alternatively, one can simply use one’s graphing
calculator to find that x ≈ 0.121. We conclude that x can be at most approximately 0.121 for the
probability that the total utility bill in a given month exceeds $100 to be 0.1 or less.


A422. From our earlier work, we know that each die roll has mean 3.5 and variance 35/12.
Let X be the sum of the 30 die rolls.
The CLT says that since n = 30 ≥ 30 is large enough and the distribution is “nice enough”
(we are assuming this), X can be approximated by the normal random variable Y ∼
N (30 × 3.5, 30 × 35/12) = N (105, 1050/12). Thus, using also the continuity correction, we
have P(100 ≤ X ≤ 110) ≈ P(99.5 ≤ Y ≤ 110.5) ≈ 0.4435 (calculator).
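
To see how good the normal approximation is here, one can compare it with the exact distribution. The sketch below assumes that X is the sum of 30 independent rolls of a fair die; dice_sum_pmf and phi_cdf are our own helper names.

```python
from fractions import Fraction
from math import erf, sqrt


def dice_sum_pmf(n_dice):
    """Exact distribution of the sum of n_dice fair dice, built by repeated convolution."""
    pmf = {0: Fraction(1)}
    for _ in range(n_dice):
        nxt = {}
        for total, p in pmf.items():
            for face in range(1, 7):
                nxt[total + face] = nxt.get(total + face, 0) + p / 6
        pmf = nxt
    return pmf


def phi_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))


pmf = dice_sum_pmf(30)
exact = float(sum(p for k, p in pmf.items() if 100 <= k <= 110))

# Normal approximation with continuity correction.
mean, var = 30 * 3.5, 30 * 35 / 12
approx = phi_cdf((110.5 - mean) / sqrt(var)) - phi_cdf((99.5 - mean) / sqrt(var))

print(exact, approx)
```

The first printed number is the exact probability P(100 ≤ X ≤ 110) and the second is the CLT approximation.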

A423. Let X be the random variable that is the sum of the weights of the 5,000 Coco-Pops.
The CLT says that since n = 5000 ≥ 30 is large enough and the distribution is “nice
enough” (we are assuming this), X can be approximated by the normal random variable
Y ∼ N (5000 × 0.1, 5000 × 0.004) = N (500, 20). Thus, P (X ≤ 499) ≈ P (Y ≤ 499) ≈ 0.4115.


129.16. Ch. 106 Answers (Sampling)
A424. x̄ = (3 + 14 + 2 + 8 + 8 + 6 + 0)/7 = 41/7 and

s² = (∑ xi² − nx̄²)/(n − 1) = (9 + 196 + 4 + 64 + 64 + 36 + 0 − 41²/7)/6 = 155/7.

A425. (a) The sample mean and sample variance are

x̄ = (∑ x)/n = 1885/10 = 188.5,

s² = [∑ x² − (∑ x)²/n]/(n − 1) = (378,265 − 1885²/10)/9 ≈ 2550.

(b) The sample mean and sample variance are

x̄ = (∑ x)/n = [∑ (x − 50 + 50)]/n = [∑ (x − 50) + ∑ 50]/n = (1885 + 50n)/n = 1885/n + 50 = 238.5,

s² = {∑ (xi − 50)² − [∑ (xi − 50)]²/n}/(n − 1) = (378,265 − 1885²/10)/9 ≈ 2550.

A426. (a) Assume that the weights of the five Singaporeans sampled are independently and
identically distributed. Then unbiased estimates for the population mean µ and variance σ²
of the weights of Singaporeans are, respectively, the observed sample mean x̄ and
observed sample variance s²:

x̄ = (∑ xi)/n = (32 + 88 + 67 + 75 + 56)/5 = 63.6,

s² = (∑ xi² − nx̄²)/(n − 1) = (32² + 88² + 67² + 75² + 56² − 5 × 63.6²)/4 = 448.3.

(b) We don’t know! And unless we literally gather and weigh every single Singaporean, we
will never know what exactly the average weight of a Singaporean is.
All we’ve found in part (a) is an estimate (63.6 kg) for the average weight of a Singaporean.
We know that on average, the estimator we use “gets it right”.
However, it could well be that we’re unlucky (and got 5 unusually heavy or unusually light
persons) and the estimate of 63.6 kg is thus way off.
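
The observed sample mean and the (unbiased, divide-by-n−1) sample variance in part (a) can be checked with Python's statistics module. The sketch also evaluates the shortcut formula (∑ xi² − nx̄²)/(n − 1) directly; the variable names are ours.

```python
from statistics import mean, variance

weights = [32, 88, 67, 75, 56]  # observed sample, in kg
n = len(weights)

x_bar = mean(weights)   # observed sample mean
s2 = variance(weights)  # observed sample variance (divides by n - 1)

# The same variance via the shortcut formula used in the answer.
s2_shortcut = (sum(x * x for x in weights) - n * x_bar**2) / (n - 1)

print(x_bar, s2, s2_shortcut)
```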


A427. E [X̄] = E [(X1 + X2 + ⋅ ⋅ ⋅ + Xn)/n] = E [X1 + X2 + ⋅ ⋅ ⋅ + Xn]/n

= (E [X1] + E [X2] + ⋅ ⋅ ⋅ + E [Xn])/n = (µ + µ + ⋅ ⋅ ⋅ + µ)/n = nµ/n = µ.

We have just shown that E [X̄] = µ. In other words, we’ve just shown that X̄ is an unbiased
estimator for µ.

A428. (a) The observed random sample is (x1 , x2 , . . . , x10 ) = (1, 1, 1, 1, 1, 1, 1, 0, 0, 0). The
observed sample mean and observed sample variance are

x̄ = (x1 + x2 + ⋅ ⋅ ⋅ + x10)/10 = 0.7,

s² = [(x1 − x̄)² + (x2 − x̄)² + ⋅ ⋅ ⋅ + (x10 − x̄)²]/(n − 1) = (7 ⋅ 0.3² + 3 ⋅ 0.7²)/9 ≈ 0.23.

(b) Yes, the observed sample mean x̄ = 0.7 is an unbiased estimate for the true population
mean µ (i.e. the true proportion of coin flips that are heads).

And yes, the observed sample variance s2 = 0.23 is an unbiased estimate for the true
population variance σ 2 .

(c) No, this is merely one observed random sample, from which we generated a single
estimate (“guess”) — namely x̄ = 0.7 — of the true population mean µ.
All we know is that the sample mean X̄ is an unbiased estimator for the true population
mean µ. That is, the average estimate generated by X̄ will equal µ.
However, any particular estimate x̄ may or may not be equal to µ. Indeed, if we’re unlucky,
our particular estimate may be very far from the true µ.

A429. Var [X̄] = Var [(1/n) (X1 + X2 + ⋅ ⋅ ⋅ + Xn)] = (1/n²) Var [X1 + X2 + ⋅ ⋅ ⋅ + Xn]

= (1/n²) (Var [X1] + Var [X2] + ⋅ ⋅ ⋅ + Var [Xn]) = (1/n²) (nσ²) = σ²/n,

where the second line uses the independence of X1, X2, . . . , Xn.


A430. (a) The population mean µ is the number defined by µ = ∑ xi /k. It is the
average across all population values.
(b) The population variance σ² is the number defined by σ² = ∑ (xi − µ)²/k. It measures
the dispersion across the population values.
(c) The sample mean X̄ is a random variable defined by X̄ = ∑ Xi /n. It is the average
of all values in a random sample.

(d) The sample variance S² is a random variable defined by S² = ∑ (Xi − X̄)²/ (n − 1).
It measures the dispersion across the values in a random sample.
(e) The mean of the sample mean, also called the expected value of the sample mean, is the
number E [X̄]. The interpretation is that if we have infinitely many observed random samples
of size n, calculate the observed sample mean for each, then E [X̄] is equal to the average
across the observed sample means. It can be shown that E [X̄] = µ and hence that the
sample mean X̄ is an unbiased estimator for the population mean µ.
(f) The variance of the sample mean is the number Var [X̄]. The interpretation is that if
we have infinitely many observed random samples of size n, calculate the observed sample
mean for each, then Var [X̄] measures the dispersion across the observed sample means.
(g) The mean of the sample variance, also called the expected value of the sample variance,
is the number E [S 2 ]. The interpretation is that if we have infinitely many observed
random samples of size n, calculate the observed sample variance for each, then E [S 2 ] is
equal to the average across the observed sample variances. It can be shown that E [S 2 ] = σ 2
and hence that the sample variance S 2 is an unbiased estimator for the population variance
σ2 .
(h) Given an observed random sample, e.g. (x1 , x2 , x3 ) = (1, 1, 0), we can calculate the
corresponding observed sample mean as
x1 + x2 + x3 1 + 1 + 0 2
x̄ = = = .
3 3 3
The observed sample mean is the average of all values in an observed random sample.
(i) Given an observed random sample, e.g. (x1 , x2 , x3 ) = (1, 1, 0), we can calculate the
corresponding observed sample variance as

s² = [(x1 − x̄)² + (x2 − x̄)² + (x3 − x̄)²]/(3 − 1) = (1/9 + 1/9 + 4/9)/2 = 1/3.

The observed sample variance measures the dispersion across the values in an observed random sample.


129.17. Ch. 107 Answers (Null Hypothesis Significance Testing)
A431. Let µ be the probability that a coin-flip is heads. The null and alternative
hypotheses are

H0 ∶ µ = 0.5 and HA ∶ µ > 0.5.

Our random sample is 20 coin-flips: (X1 , X2 , . . . , X20 ), where Xi takes on the value 1 if
the ith coin-flip is heads and 0 otherwise.
Our test statistic is the number of heads: T = X1 + X2 + ⋅ ⋅ ⋅ + X20 .
In our observed random sample (x1 , x2 , . . . , x20 ), there are 17 heads. So the observed
test statistic is t = 17.
Assuming H0 were true, we’d have T ∼ B (20, 0.5). Thus, the p-value is

P (T ≥ 17∣H0 ) = P (T = 17∣H0 ) + P (T = 18∣H0 ) + P (T = 19∣H0 ) + P (T = 20∣H0 )

= C(20, 17) 0.5¹⁷ 0.5³ + C(20, 18) 0.5¹⁸ 0.5² + C(20, 19) 0.5¹⁹ 0.5¹ + C(20, 20) 0.5²⁰ 0.5⁰ ≈ 0.0013.

Since p ≈ 0.0013 < α = 0.05, we reject H0 at the 5% significance level.

A432. Let µ be the true long-run proportion of coin-flips that are heads. The null and
alternative hypotheses are

H0 ∶ µ = 0.5 and HA ∶ µ ≠ 0.5.

Our random sample is 20 coin-flips: (X1 , X2 , . . . , X20 ), where Xi takes on the value 1 if
the ith coin-flip is heads and 0 otherwise.
Our test statistic is the number of heads: T = X1 + X2 + ⋅ ⋅ ⋅ + X20 .
In our observed random sample (x1 , x2 , . . . , x20 ), there are 17 heads. So the observed
test statistic is t = 17.
Assuming H0 were true, we’d have T ∼ B (20, 0.5). Thus, the p-value is

P (T ≥ 17 or T ≤ 3∣H0 ) = P (T = 0∣H0 ) + ⋅ ⋅ ⋅ + P (T = 3∣H0 ) + P (T = 17∣H0 ) + ⋅ ⋅ ⋅ + P (T = 20∣H0 )

= C(20, 0) 0.5⁰ 0.5²⁰ + C(20, 1) 0.5¹ 0.5¹⁹ + ⋅ ⋅ ⋅ + C(20, 3) 0.5³ 0.5¹⁷ + C(20, 17) 0.5¹⁷ 0.5³ + ⋅ ⋅ ⋅ + C(20, 20) 0.5²⁰ 0.5⁰ ≈ 0.0026.

Since p ≈ 0.0026 < α = 0.05, we reject H0 at the 5% significance level.
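
Both p-values (one-tailed in A431, two-tailed in A432) can be computed directly from the B (20, 0.5) distribution. This is a minimal sketch; binom_pmf, p_one_tailed, and p_two_tailed are our own names.

```python
from math import comb


def binom_pmf(k, n=20, p=0.5):
    """P(T = k) for T ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# One-tailed p-value: P(T >= 17 | H0).
p_one_tailed = sum(binom_pmf(k) for k in range(17, 21))

# Two-tailed p-value: P(T >= 17 or T <= 3 | H0).
p_two_tailed = p_one_tailed + sum(binom_pmf(k) for k in range(0, 4))

print(round(p_one_tailed, 4), round(p_two_tailed, 4))
```

Because B (20, 0.5) is symmetric about 10, the two-tailed p-value is exactly twice the one-tailed one.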


A433. Let µ be the probability that a coin-flip is heads.
(a) The competing hypotheses are H0 ∶ µ = 0.5, HA ∶ µ > 0.5.
The test statistic T is the number of heads (out of the 20 coin-flips).
For t = 14, the corresponding p-value is

P (T ≥ 14∣H0 ) = P (T = 14∣H0 ) + P (T = 15∣H0 ) + ⋅ ⋅ ⋅ + P (T = 20∣H0 )

= C(20, 14) 0.5¹⁴ 0.5⁶ + C(20, 15) 0.5¹⁵ 0.5⁵ + ⋅ ⋅ ⋅ + C(20, 20) 0.5²⁰ 0.5⁰ ≈ 0.05766.

For t = 15, the corresponding p-value is

P (T ≥ 15∣H0 ) = P (T = 15∣H0 ) + P (T = 16∣H0 ) + ⋅ ⋅ ⋅ + P (T = 20∣H0 )

= C(20, 15) 0.5¹⁵ 0.5⁵ + C(20, 16) 0.5¹⁶ 0.5⁴ + ⋅ ⋅ ⋅ + C(20, 20) 0.5²⁰ 0.5⁰ ≈ 0.02069.

Thus, the critical value is 15 (this is the value of t at which we are just able to reject H0 at
the α = 0.05 significance level).
And the critical region is {15, 16, . . . , 20} (this is the set of values of t at which we’d be able
to reject H0 at the α = 0.05 significance level).
(b) The competing hypotheses are H0 ∶ µ = 0.5, HA ∶ µ ≠ 0.5.
The test statistic T is the number of heads (out of the 20 coin-flips).
For t = 14, the corresponding p-value is

P (T ≥ 14 or T ≤ 6∣H0 ) = 1 − P (7 ≤ T ≤ 13∣H0 )

= 1 − [P (T = 7∣H0 ) + P (T = 8∣H0 ) + ⋅ ⋅ ⋅ + P (T = 13∣H0 )]

= 1 − [C(20, 7) 0.5⁷ 0.5¹³ + C(20, 8) 0.5⁸ 0.5¹² + ⋅ ⋅ ⋅ + C(20, 13) 0.5¹³ 0.5⁷] ≈ 0.1153.

For t = 15, the corresponding p-value is

P (T ≥ 15 or T ≤ 5∣H0 ) = 1 − P (6 ≤ T ≤ 14∣H0 )

= 1 − [P (T = 6∣H0 ) + P (T = 7∣H0 ) + ⋅ ⋅ ⋅ + P (T = 14∣H0 )]

= 1 − [C(20, 6) 0.5⁶ 0.5¹⁴ + C(20, 7) 0.5⁷ 0.5¹³ + ⋅ ⋅ ⋅ + C(20, 14) 0.5¹⁴ 0.5⁶] ≈ 0.0414.

Thus, the critical value is 15 and the critical region is {15, 16, . . . , 20}.
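
The search for the critical value can be automated. The sketch below scans t = 11, 12, . . . , 20 for the smallest observed number of heads whose p-value falls below α = 0.05, for both the one-tailed and the two-tailed test; the function names and the scanning approach are ours, not the textbook's.

```python
from math import comb

N = 20        # number of coin-flips
ALPHA = 0.05  # significance level


def upper_tail(t):
    """P(T >= t | H0) for T ~ B(20, 0.5)."""
    return sum(comb(N, k) for k in range(t, N + 1)) / 2**N


def two_tailed(t):
    """P(T >= t or T <= N - t | H0), for t above N / 2."""
    lower = sum(comb(N, k) for k in range(0, N - t + 1)) / 2**N
    return upper_tail(t) + lower


# Smallest t (above the null mean of 10) at which H0 is rejected.
crit_one = min(t for t in range(11, N + 1) if upper_tail(t) < ALPHA)
crit_two = min(t for t in range(11, N + 1) if two_tailed(t) < ALPHA)

print(crit_one, crit_two)
```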


A434. The competing hypotheses are:

H0 ∶ µ = 34,
HA ∶ µ ≠ 34.

The observed sample mean is

x̄ = (35 + 35 + 31 + 32 + 33 + 34 + 31 + 34 + 35 + 34)/10 = 33.4.

The corresponding p-value is

p = P (X̄ ≤ 33.4 or X̄ ≥ 34.6∣H0 ) = P (X̄ ≤ 33.4∣H0 ) + P (X̄ ≥ 34.6∣H0 )

= P (Z ≤ (33.4 − 34)/√(9/10)) + P (Z ≥ (34.6 − 34)/√(9/10)) ≈ 0.5271.

The large p-value does not cast doubt on or provide evidence against H0 . We fail to reject
H0 at the α = 0.05 significance level.

A435. The competing hypotheses are:

H0 ∶ µ = 34,
HA ∶ µ ≠ 34.

The observed sample mean is x̄ = 33.4.

The corresponding p-value is

p = P (X̄ ≤ 33.4 or X̄ ≥ 34.6∣H0 ) = P (X̄ ≤ 33.4∣H0 ) + P (X̄ ≥ 34.6∣H0 )

≈ P (Z ≤ (33.4 − 34)/√(9/100)) + P (Z ≥ (34.6 − 34)/√(9/100)) ≈ 0.04550 (using the CLT).

The small p-value casts doubt on or provides evidence against H0 . We reject H0 at the
α = 0.05 significance level.


A436. The competing hypotheses are:

H0 ∶ µ = 34,
HA ∶ µ ≠ 34.

The observed sample mean is x̄ = 33.4. And the observed sample variance is s2 = 11.2.
The corresponding p-value is

p = P (X̄ ≤ 33.4 or X̄ ≥ 34.6∣H0 ) = P (X̄ ≤ 33.4∣H0 ) + P (X̄ ≥ 34.6∣H0 )

≈ P (Z ≤ (33.4 − 34)/√(11.2/100)) + P (Z ≥ (34.6 − 34)/√(11.2/100)) ≈ 0.07300 (using the CLT).

The fairly small p-value casts some doubt on or provides some evidence against H0 . But
we fail to reject H0 at the α = 0.05 significance level.
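
The three two-tailed p-values in A434 to A436 differ only in the variance used and the sample size, so one small helper covers them all. This is a sketch under the same normal (or CLT) assumptions as the answers; two_tailed_p and phi_cdf are our own names.

```python
from math import erf, sqrt


def phi_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))


def two_tailed_p(x_bar, mu0, var, n):
    """Two-tailed p-value for H0: mu = mu0, given the variance of one observation."""
    z = abs(x_bar - mu0) / sqrt(var / n)
    return 2 * (1 - phi_cdf(z))


p_434 = two_tailed_p(33.4, 34, 9, 10)      # known variance 9, n = 10
p_435 = two_tailed_p(33.4, 34, 9, 100)     # known variance 9, n = 100
p_436 = two_tailed_p(33.4, 34, 11.2, 100)  # sample variance 11.2, n = 100

print(round(p_434, 4), round(p_435, 4), round(p_436, 4))
```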
A437. The observed sample mean is x̄ = 68 and the observed sample variance (use Fact
183(a)) is

s² = (∑ xi² − [∑ xi]²/n)/(n − 1) = (50 × 5000 − (68 × 50)²/50)/49 ≈ 383.7.
Let µ be the true average weight of a Singaporean. The competing hypotheses are H0 ∶ µ =
75 and HA ∶ µ < 75.
(This is a one-tailed test, because your friend’s claim is that the average American is heavier
than the average Singaporean. If the claim were instead that the average American’s weight
is different from the average Singaporean’s, then we’d have a two-tailed test.)
Since the sample size n = 50 is “large enough”, we can appeal to the CLT. The p-value is

p = P (X̄ ≤ 68∣H0 ) ≈ P (Z ≤ (68 − 75)/√(383.7/50)) ≈ 0.0058 (using the CLT).

The small p-value casts doubt on or provides evidence against H0 . We can reject H0 at any
conventional significance level (α = 0.1, α = 0.05, or α = 0.01).


129.18. Ch. 108 Answers (Correlation and Linear Regression)

A438. [Figure: scatter diagram of q against p ($).]

A439. Compute p̄ = (8 + 9 + 4 + 10 + 8) /5 = 7.8 and

q̄ = (300 + 250 + 1000 + 400 + 400) /5 = 470. Also,

∑ (pi − p̄) (qi − q̄) = (8 − p̄) (300 − q̄) + (9 − p̄) (250 − q̄) + ⋅ ⋅ ⋅ + (8 − p̄) (400 − q̄)
= (8 − 7.8) (300 − 470) + (9 − 7.8) (250 − 470) + ⋅ ⋅ ⋅ + (8 − 7.8) (400 − 470) = −2480,

√[∑ (pi − p̄)²] = √[(8 − p̄)² + (9 − p̄)² + (4 − p̄)² + (10 − p̄)² + (8 − p̄)²]
= √[(8 − 7.8)² + (9 − 7.8)² + (4 − 7.8)² + (10 − 7.8)² + (8 − 7.8)²] = √20.8 ≈ 4.56070170,

√[∑ (qi − q̄)²] = √[(300 − q̄)² + (250 − q̄)² + ⋅ ⋅ ⋅ + (400 − q̄)²]
= √[(300 − 470)² + (250 − 470)² + ⋅ ⋅ ⋅ + (400 − 470)²] = √368000 ≈ 606.63003552.

Thus, r = ∑ (pi − p̄) (qi − q̄) / {√[∑ (pi − p̄)²] √[∑ (qi − q̄)²]} ≈ −2480/(4.56070170 × 606.63003552) ≈ −0.8964.


A440. (a) We already computed (in the previous exercise) that p̄ = 7.8, q̄ = 470,
∑ (pi − p̄) (qi − q̄) = −2480 and ∑ (pi − p̄)² = 20.8. So,

b̂ = ∑ (pi − p̄) (qi − q̄) / ∑ (pi − p̄)² = −2480/20.8 ≈ −119.2.

Thus, the regression line of q on p is q − q̄ = b̂ (p − p̄) or q − 470 = −119.2 (p − 7.8) or

q = 1400 − 119.2p.

(b)
i               1      2      3      4      5
pi ($)          8      9      4      10     8
qi              300    250    1000   400    400
q̂i              446    327    923    208    446
ûi = qi − q̂i    −146   −77    77     192    −46

(c) [Figure: graph of q against p ($).]

(d) The SSR is ∑ ûi² ≈ 72308 (computed from the unrounded residuals; the rounded residuals in
the table give (−146)² + (−77)² + 77² + 192² + (−46)² = 72154).

A441. [Screenshots: TI84 display after each of Steps 1 to 12.]

The TI84 tells us that r = −.8963881445 and the regression line is y = ax + b = −119.2307692x +
1400. This is indeed consistent with the answers from the previous exercises.

A442. In the previous exercises, we already calculated that the OLS line of best fit is
q = 1400 − 119.2p. Thus,
(a) By interpolation, a barber who charged $7 per haircut sold 1400 − 119.2 × 7 ≈ 566
haircuts.
(b) By extrapolation, a barber who charged $200 per haircut sold 1400 − 119.2 × 200 = −22440
haircuts. This is plainly absurd.
The second prediction is thus obviously less reliable than the first.


A443. (a) r ≈ 0.954.

(b) r ≈ 0.984.


130. Part VII Answers (2006–17 A-Level Exams)

130.1. Ch. 109 Answers (Functions and Graphs)

A444 (9758 N2017/I/2)(i)


[Figure: graphs of y = b ∣x − a∣ and y = 1/(x − a), with the vertical asymptote x = a and the intersection point at x = a + 1/√b marked.]

(ii) We look for the two graphs’ intersection points. First, suppose x > a. Then:

1/(x − a) = b ∣x − a∣ = b (x − a) ⇐⇒ b (x − a)² = 1 ⇐⇒ x − a = ±1/√b ⇐⇒ x = a ± 1/√b.

Since x > a and b > 0, it cannot be that x = a − 1/√b. We conclude that if x > a, then the
two graphs intersect at the point where x = a + 1/√b.
Next, suppose x < a. Then:

1/(x − a) = b ∣x − a∣ ⇐⇒ 1/(x − a) = −b (x − a) ⇐⇒ b (x − a)² = −1.

Since b > 0 and (x − a)² > 0, the last equation cannot hold and so the two graphs do not intersect.

Altogether, the two graphs intersect at only one point, namely where x = a + 1/√b.
From the graph, we observe that the given inequality is true to the right of this intersection
point and to the left of the vertical asymptote x = a (shaded red region above). Altogether
then, the inequality’s solution set is:

(−∞, a) ∪ (a + 1/√b, ∞) = R ∖ [a, a + 1/√b] .

Equivalently: x < a or x > a + 1/√b.
A445 (9758 N2017/I/4). By long division, y = 4 + 1/(x + 2).
(i) Compute the gradient of C: dy/dx = −(x + 2)⁻². So, dy/dx < 0 for all x except −2.
But since there’s no point on C for which x = −2, we conclude that the gradient of C is
negative at all points.
(ii) The horizontal and vertical asymptotes are y = 4 and x = −2.
(iii) Start with the graph of y = 4 + 1/(x + 2). Translate rightwards by 2 units to get y = 4 + 1/x.
Then translate downwards by 4 units to get y = 1/x.
A446 (9758 N2017/I/5)(a) Let f (x) = x³ + ax² + bx + c. By the Remainder Theorem,

f (1) = 1 + a + b + c = 8,   (1)

f (2) = 8 + 4a + 2b + c = 12,   (2)

f (3) = 27 + 9a + 3b + c = 25.   (3)

This is a system of three equations with three unknowns. Either solve using your graphing
calculator or “manually”, as we now do:

(2) minus (1) yields 4 = 7 + 3a + b or b = −3 (a + 1).   (4)

Plug (4) into (2) to get 8 + 4a − 6 (a + 1) + c = 12 or c = 10 + 2a.   (5)

Plug (4) and (5) into (3) to get 27 + 9a − 9 (a + 1) + 10 + 2a = 25 or a = −1.5.   (6)

Plug (6) into (4) and (5) to get b = 1.5 and c = 7.

(b) Given the curve of f (x) = x³ − 1.5x² + 1.5x + 7, its gradient (derivative) is f ′ (x) =
3x² − 3x + 1.5.
f ′ (x) is a quadratic polynomial in x with positive coefficient on x² and negative discriminant
(−3)² − 4 (3) (1.5) = 9 − 18 = −9. Hence, it is ∪-shaped and doesn’t touch the horizontal axis.

Thus, f ′ (x) > 0 for all x.

This means that f is everywhere increasing. And so, f can intersect the x-axis at
most once; equivalently, f can have at most one (real) root. (I)
Now observe that f (−2) = −10 < 0 and f (0) = 7 > 0. Being a polynomial function, f is continuous.
And so, by the Intermediate Value Theorem, there exists d ∈ (−2, 0) such that f (d) = 0.
This shows that f has at least one (real) root. (II)
Together, (I) and (II) show that f has exactly one real root. Using our graphing calculator,
we find that it is x ≈ −1.33.
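
The argument above (f is increasing and changes sign on (−2, 0)) is exactly what makes bisection work, so the root can be located without a graphing calculator. A minimal sketch, with bisect as our own helper name:

```python
def f(x):
    """The cubic found in part (a): x^3 - 1.5x^2 + 1.5x + 7."""
    return x**3 - 1.5 * x**2 + 1.5 * x + 7


def bisect(func, lo, hi, tol=1e-10):
    """Find a root of a continuous func on [lo, hi], given a sign change."""
    assert func(lo) * func(hi) < 0, "func must change sign on [lo, hi]"
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if func(lo) * func(mid) <= 0:
            hi = mid   # the sign change is in the left half
        else:
            lo = mid   # the sign change is in the right half
    return (lo + hi) / 2


root = bisect(f, -2.0, 0.0)
print(round(root, 2))
```

The same function f can be used to confirm the remainders in part (a) by evaluating f (1), f (2), and f (3).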
(c) We are given that the tangent is parallel to the line y = 2x + 3. Or equivalently:

f ′ (x) = 2 ⇐⇒ 3x² − 3x + 1.5 = 2 ⇐⇒ 3x² − 3x − 0.5 = 0

⇐⇒ x = [3 ± √((−3)² − 4(3)(−0.5))]/(2 ⋅ 3) = (3 ± √15)/6 ≈ 1.15, −0.145.

A447 (9758 N2017/II/1)(i) Plug x = 3/t and y = 2t into y = 2x to get 2t = 6/t or t = ±√3.
And so the two points are A = (√3, 2√3) and B = (−√3, −2√3). We have:

∣AB∣ = √{[√3 − (−√3)]² + [2√3 − (−2√3)]²} = √(4 ⋅ 3 + 16 ⋅ 3) = √60 = 2√15.

(ii) dy/dx = dy/dt ÷ dx/dt = 2 ÷ (−3/t²) = −2t²/3.
And so the equation of the tangent line at P (3/p, 2p) is:

y − 2p = −(2p²/3) (x − 3/p) .

At D, y = 0 and so: 0 − 2p = −(2p²/3) (x − 3/p) ⇐⇒ 1/p = (1/3) (x − 3/p) ⇐⇒ x = 3/p + 3/p = 6/p. That is,
D = (6/p, 0).
At E, x = 0 and so: y − 2p = −(2p²/3) (0 − 3/p) ⇐⇒ y = 2p + 2p² (1/p) = 2p + 2p = 4p. That is,
E = (0, 4p).
The midpoint of D = (6/p, 0) and E = (0, 4p) is F = (3/p, 2p).
Write x = 3/p and y = 2p. Then p = 3/x and so the desired cartesian equation is y = 6/x.
A448 (9758 N2017/II/3)(a)(i) The graph of y = f (2x) is the graph of y = f (x) compressed
inwards towards the y-axis by a factor of 2. Hence, its x-intercept is (a/2, 0), while
its y-intercept (0, b) remains unchanged.
(ii) The graph of y = f (x − 1) is the graph of y = f (x) translated rightwards by 1 unit.
Hence, its x-intercept is (a + 1, 0). We cannot say anything about the y-intercept.
(iii) Since f (2x − 1) = f (2 (x − 1/2)), the graph of y = f (2x − 1) is the graph of y = f (2x)
translated rightwards by 1/2 unit. Hence, its x-intercept is ((a + 1)/2, 0). We cannot say
anything about the y-intercept.
(iv) The graph of y = f⁻¹ (x) is the graph of y = f (x) reflected in the line y = x. Hence,
its x-intercept is (b, 0) and its y-intercept is (0, a).
(b)(i) a = 1 is excluded because g (1) would be undefined.
(ii) g² (x) = 1 − 1/[1 − (1 − 1/(1 − x))] = 1 − 1/[1/(1 − x)] = 1 − (1 − x) = x.
Recall that g⁻¹ (g (x)) = x. Since g (g (x)) = x, we have g⁻¹ (x) = g (x) = 1 − 1/(1 − x).
(iii) g² (b) = g⁻¹ (b) ⇐⇒ b = 1 − 1/(1 − b) ⇐⇒ b − 1 = −1/(1 − b) ⇐⇒ (1 − b)² = 1 ⇐⇒ 1 − b = ±1
⇐⇒ b = 0, 2.
A449 (9740 N2016/I/1). (4x² + 4x − 14)/(x − 4) − (x + 3) = [4x² + 4x − 14 − (x² − x − 12)]/(x − 4)

= (3x² + 5x − 2)/(x − 4) = (3x − 1) (x + 2)/(x − 4) = f.

The given inequality is equivalent to f < 0. The numerator or denominator of f
equals zero at x = 1/3, x = −2, and x = 4. Sign diagram:

Interval:    x < −2    −2 < x < 1/3    1/3 < x < 4    x > 4
Sign of f:     −            +              −            +

Answer: x ∈ (−∞, −2) ∪ (1/3, 4).

A450 (9740 N2016/I/3). To get the graph of y = f (x) = k (x − l)⁴ + m:
1. Translate the graph of y = x⁴ rightwards by l units to get the graph of y = (x − l)⁴;
2. Stretch the graph of y = (x − l)⁴ vertically (outwards from the x-axis) by a factor of k to
get the graph of y = k (x − l)⁴;
3. Translate the graph of y = k (x − l)⁴ upwards by m units to get the graph of y = f (x) =
k (x − l)⁴ + m.

The graph of y = x⁴ has turning point (0, 0). Following the above steps, this turning point
corresponds to (1) the point (0 + l, 0) = (l, 0) on the graph of y = (x − l)⁴; (2) the point
(l, k ⋅ 0) = (l, 0) on the graph of y = k (x − l)⁴; and (3) the point (l, 0 + m) = (l, m) on the
graph of y = k (x − l)⁴ + m.

We are given that the turning point corresponds to the point (a, b) on the graph of
y = f (x) = k (x − l)⁴ + m. Hence, we have a = l and b = m.
Plug (0, c) into y = f (x) = k (x − l)⁴ + m to get c = kl⁴ + m = ka⁴ + b. Thus, k = (c − b) /a⁴.

[Figure: graphs of y = x⁴, y = (x − l)⁴, y = k (x − l)⁴, y = f (x) = k (x − l)⁴ + m, and y = 1/f (x), with the points (0, 0), (0, c), (a, b) = (l, m), (0, 1/c), and (a, 1/b) marked.]

y = 1/f (x) has y-intercept (0, 1/c) and turning point (a, 1/b).

A451 (9740 N2016/I/10)(a)(i) Let y = 1 + √x. Do the algebra: (y − 1)² = x. Thus,

f⁻¹ (x) = (x − 1)².

Range f = [1, ∞) = Domain f⁻¹.

Remark 158. N2016-I-10(a)(ii) was a difficult question that few students could have prop-
erly and rigorously solved under exam conditions. As my answer below suggests, any
proper and rigorous answer to this question is necessarily somewhat lengthy.
We do not have access to the “official” answers, but I suspect that the “official” answer
to this question was hand-wavy, sloppy, incomplete, and non-rigorous. Which seems to
be how our educational system works — hurl at you a difficult question, then expect you
to foggily sleepwalk through and “answer” it without really understanding what’s going on.
(ii) First sentence of the question.

f f (x) = x

⇐⇒ 1 + √(1 + √x) = x

⇐⇒ √(1 + √x) = x − 1

⟹ 1 + √x = (x − 1)²

⇐⇒ √x = (x − 1)² − 1

⟹ x = [(x − 1)² − 1]² = [(x − 1 − 1) (x − 1 + 1)]²
= [(x − 2) x]² = (x² − 4x + 4) x² = x⁴ − 4x³ + 4x²

⇐⇒ x⁴ − 4x³ + 4x² − x = 0.

(A brief word about the two ⟹ ’s: It is generally true that a = b ⟹ a² = b². However,
as you will recall, the converse is false. That is, it is not generally true that a² = b² implies
a = b. A simple counterexample is a = 1 and b = −1. Hence, these two ⟹ ’s cannot be
replaced by ⇐⇒ ’s.)
Observe that if f f (x) = x, then x ≠ 0 because f f (0) = 2 ≠ 0. So, if f f (x) = x, then we can
divide x⁴ − 4x³ + 4x² − x = 0 by x ≠ 0 to get:

x³ − 4x² + 4x − 1 = 0.

Call the left-hand side g (x).

Second sentence of the question. We now solve the cubic equation g (x) = 0. Observe that g (1) = 0. Hence, by the Factor Theorem, we know that x − 1 is a factor of g (x). So write:

g (x) = x^3 − 4x^2 + 4x − 1 = (x − 1) (ax^2 + bx + c) = ax^3 + (b − a) x^2 + ?x − c,

(where ? is a coefficient I didn’t bother to compute because it isn’t necessary). Comparing coefficients, we have a = 1, b = −3, and c = 1. And now by the quadratic formula, the roots of ax^2 + bx + c = x^2 − 3x + 1 are:

x = [−b ± √(b^2 − 4ac)] / (2a) = [3 ± √((−3)^2 − 4 (1) (1))] / [2 (1)] = (3 ± √5) /2.

Altogether then, the three roots to the equation g (x) = 0 are 1, (3 − √5) /2, and (3 + √5) /2.

By plugging these three values into 1 + √(1 + √x) = x, we see that only (3 + √5) /2 solves it or, equivalently, solves f f (x) = x. The other two roots 1 and (3 − √5) /2 to g (x) = 0 are extraneous solutions (as discussed in an earlier chapter).

Third sentence of the question. Observe that:

f f (x) = x ⇐⇒ f −1 (f f (x)) = f −1 (x) ⇐⇒ f (x) = f −1 (x).


Hence, any solution to f f (x) = x is also a solution to f (x) = f −1 (x).
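The point about extraneous solutions can be checked numerically. The following is only a rough sketch in Python, assuming (as the working above implies) that f (x) = 1 + √x; the names ff, cubic, candidates, and genuine_roots are my own labels, not part of the question.

```python
import math

def ff(x):
    """f(f(x)) for the assumed rule f(x) = 1 + sqrt(x)."""
    return 1 + math.sqrt(1 + math.sqrt(x))

def cubic(x):
    """The cubic x^3 - 4x^2 + 4x - 1 obtained after squaring twice."""
    return x**3 - 4 * x**2 + 4 * x - 1

# The three roots of the cubic found by hand.
candidates = {
    "1": 1.0,
    "(3 - sqrt 5)/2": (3 - math.sqrt(5)) / 2,
    "(3 + sqrt 5)/2": (3 + math.sqrt(5)) / 2,
}

def genuine_roots(tol=1e-9):
    """Names of the candidates that really satisfy ff(x) = x."""
    return [name for name, x in candidates.items() if abs(ff(x) - x) < tol]

for name, x in candidates.items():
    print(name, "cubic:", round(cubic(x), 9), "ff(x) - x:", round(ff(x) - x, 6))
print("genuine:", genuine_roots())
```

All three candidates make the cubic vanish (up to rounding), but only one of them survives the check against the original, unsquared equation.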


(b)(i) g (0) = 1, g (1) = 1 + g (0) = 1 + 1 = 2, g (2) = 2 + g (1) = 2 + 2 = 4, g (3) = 1 + g (2) = 5,

g (4) = 2 + g (2) = 6, g (5) = 1 + g (4) = 7, g (6) = 2 + g (3) = 7, g (7) = 1 + g (6) = 1 + 7 = 8,
g (12) = 2 + g (6) = 9.
(ii) No, because g (5) = g (6) = 7. That is, the element 7 in Codomaing is “hit” by two
distinct elements in Domaing. By definition then, g is not invertible and hence (by Fact
21) does not have an inverse.
A452 (9740 N2015/I/1)(i) Compute dy/dx = −2a/x^3 + b. From the information given, we have this system of equations:

a/1.6^2 + 1.6b + c = −2.4,
a/0.7^2 − 0.7b + c = 3.6,
dy/dx at x = 1: −2a/1^3 + b = 2.

So a ≈ −3.593, b ≈ −5.187, c ≈ 7.303 (calculator).
(ii) a/x^2 + bx + c = 0 ⟹ x ≈ −0.589 (calculator).
(iii) As x → ±∞, y → bx + c. Hence, the other asymptote is y = bx + c or y ≈ −5.187x + 7.303.
A453 (9740 N2015/I/2)(i) One method for “doing” this question is to simply graph the equations on your calculator and copy. But here as an exercise, I’ll do it without a calculator:
First draw the graph of y = (x + 1) / (1 − x) (below left). Write (x + 1) / (1 − x) = −1 + 2/ (1 − x). This is a rectangular hyperbola with two distinct branches.
• Intercepts. The graph of y = (x + 1) / (1 − x) crosses the vertical axis at (0, 1) and the horizontal axis at (−1, 0).
• Asymptotes. Observe that as x → 1, (x + 1) / (1 − x) → ±∞, so that the graph of y = (x + 1) / (1 − x) has vertical asymptote x = 1. And as x → ±∞, (x + 1) / (1 − x) → −1, so that the graph of y = (x + 1) / (1 − x) has horizontal asymptote y = −1.
• The centre is thus (1, −1).
• The two lines of symmetry run through the centre and bisect the angles formed by the asymptotes.

Figure to be inserted here.

Now take those parts of the graph of y = (x + 1) / (1 − x) below the x-axis and reflect them in the x-axis to get the graph of y = ∣(x + 1) / (1 − x)∣ (above right). As instructed, we’ve also graphed y = x + 2.
(ii) Let P , Q, and R (marked above) be the three points at which the two graphs intersect.
Then the given inequality holds if and only if x is between P and Q or to the right of R.
One method for “doing” this question is to simply use your calculator to find the three
intersection points. But here as an exercise, I’ll do it without a calculator:
First, to find the x-coordinate of the intersection point Q, suppose x ∈ [−1, 1). Then:

∣(x + 1) / (1 − x)∣ = x + 2 ⇐⇒ (x + 1) / (1 − x) = x + 2 ⇐⇒ x + 1 = −x^2 − x + 2 ⇐⇒ x^2 + 2x − 1 = 0
⇐⇒ x = [−2 ± √(2^2 − 4 (1) (−1))] / [2 (1)] = −1 ± √2.

Note that −1 − √2 ∉ [−1, 1), while −1 + √2 ∈ [−1, 1). Thus, the intersection point Q has x-coordinate −1 + √2.
Next, to find the x-coordinates of the intersection points P and R, suppose x ∉ [−1, 1]. Then:

∣(x + 1) / (1 − x)∣ = x + 2 ⇐⇒ − (x + 1) / (1 − x) = x + 2 ⇐⇒ x + 1 = x^2 + x − 2 ⇐⇒ x^2 − 3 = 0 ⇐⇒ x = ±√3.

Thus, the x-coordinates of the intersection points P and R are −√3 and √3.
Altogether then, the given inequality holds if and only if x ∈ (−√3, −1 + √2) ∪ (√3, ∞).
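The solution set can be spot-checked on a grid of sample points. This is only a sketch: it assumes that the given inequality is ∣(x + 1) / (1 − x)∣ < x + 2 (the inequality itself is not restated in this answer), and the function names are mine.

```python
import math

def holds(x):
    """Assumed inequality |(x + 1)/(1 - x)| < x + 2 (undefined at x = 1)."""
    return x != 1 and abs((x + 1) / (1 - x)) < x + 2

def in_solution_set(x):
    """Membership in (-sqrt 3, -1 + sqrt 2) U (sqrt 3, infinity)."""
    r2, r3 = math.sqrt(2), math.sqrt(3)
    return (-r3 < x < -1 + r2) or (x > r3)

# Hundredths from -5 to 5; none of these coincides with an endpoint.
samples = [k / 100 for k in range(-500, 501)]
mismatches = [x for x in samples if holds(x) != in_solution_set(x)]
print(mismatches)
```

An empty list of mismatches is of course not a proof, but it does guard against sign slips in the case analysis.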
A454 (9740 N2015/I/5)(i) First, move the graph of y = x^2 rightwards by 3 units to get the graph of y = (x − 3)^2.

Then stretch it vertically, outwards from the x-axis by a factor of 0.25 (or equivalently, compress it vertically, inwards towards the x-axis by a factor of 4) to get the graph of y = 0.25 (x − 3)^2.

(ii) The easy way is by graphing calculator, but as an exercise, let’s do it without a calculator.
If −1 ≤ x < 0 or 3 < x ≤ 4, then f (x) = 0.
If 0 ≤ x ≤ 1, then f (x) = 1. Note that f (1) = 1 and lim_(x→1) f (x) = 1. Similarly, f (3) = 0 and lim_(x→3) f (x) = 0. (These say that f is continuous at both 1 and 3.)
If 1 < x ≤ 3, then f (x) = 0.25 (x − 3)^2. Note that f (3) = 0.

(iii) Again, the easy way is by graphing calculator, but again as an exercise, let’s do it without a calculator.
The graph of y = 1 + f (0.5x) is that of f stretched horizontally outwards from the y-axis by a factor of 2, then shifted up by 1 unit.
Method #1: Mechanically do the algebra. First replace each x with 0.5x. We have:

f (x) = 1 for 0 ≤ x ≤ 1; 0.25 (x − 3)^2 for 1 < x ≤ 3; 0 otherwise.
⟹ f (0.5x) = 1 for 0 ≤ 0.5x ≤ 1; 0.25 (0.5x − 3)^2 for 1 < 0.5x ≤ 3; 0 otherwise.
⟹ f (0.5x) = 1 for 0 ≤ x ≤ 2; 0.25 (0.5x − 3)^2 for 2 < x ≤ 6; 0 otherwise.
⟹ 1 + f (0.5x) = 2 for 0 ≤ x ≤ 2; 1 + 0.25 (0.5x − 3)^2 for 2 < x ≤ 6; 1 otherwise.

Method #2: Reason it through. Take the original (red) graph. For the piece 0 ≤ x ≤ 1, stretch horizontally, outwards from the vertical axis, by a factor of 2. Then shift up by 1 unit. Do the same for the piece 1 ≤ x ≤ 4, and again for the piece −1 ≤ x ≤ 0.

A455 (9740 N2015/II/3)(a)(i) Rangef = (−∞, 0). Pick any element y ∈ Rangef and do the algebra:

y = 1/ (1 − x^2) ⇐⇒ 1 − x^2 = 1/y ⇐⇒ x^2 = 1 − 1/y ⇐⇒ x = ±√(1 − 1/y).

(Note that the first ⇐⇒ is permissible because y ≠ 0.)

Since x > 1, we can reject the negative value. Thus:

x = √(1 − 1/y).

This last equation shows that every y ∈ Rangef corresponds to a unique x ∈ Domainf. Equivalently, f is invertible.
(ii) The inverse function is f^−1 ∶ (−∞, 0) → (1, ∞) (that is, from Rangef to Domainf) with the mapping rule f^−1 (x) = √(1 − 1/x).
(b) Let y ∈ Rangeg. Then there exists some x ∈ R ∖ {−1, 1} such that g (x) = y or:

(2 + x) / (1 − x^2) = y   or   y − yx^2 = 2 + x   or   yx^2 + x + 2 − y = 0.

Observe that this last equation is a quadratic equation in the variable x. Since x ∈ R, it holds for some value of x if and only if its discriminant is non-negative. That is:

1^2 − 4y (2 − y) ≥ 0   or   4y^2 − 8y + 1 ≥ 0.

(The case y = 0 is not quadratic, but it gives x = −2, which is consistent with the result below.)
To solve this inequality, observe that its LHS is a quadratic expression in y. It has positive coefficient on y^2 and is thus ∪-shaped. By the quadratic formula, its roots are:

[8 ± √((−8)^2 − 4 (4) (1))] / [2 (4)] = 1 ± √(1 − 1/4) = 1 ± √3/2.

Thus, the inequality and therefore also the quadratic equation hold if and only if:

y ∈ (−∞, 1 − 0.5√3] ∪ [1 + 0.5√3, ∞).

What we’ve just shown is that there exists x ∈ R ∖ {−1, 1} such that g (x) = y if and only if y is in the above set. Thus:

Rangeg = (−∞, 1 − 0.5√3] ∪ [1 + 0.5√3, ∞).


A456 (9740 N2014/I/1)(i)

f^2 (x) = f (1/ (1 − x)) = 1/ [1 − 1/ (1 − x)] = (1 − x) / (1 − x − 1) = (1 − x) / (−x) = 1 − 1/x.

To show that f^2 (x) = f^−1 (x), we need merely show that f^2 (y) = x ⇐⇒ f (x) = y. To that end, write:

f^2 (y) = x ⇐⇒ 1 − 1/y = x ⇐⇒ f (x) = f (1 − 1/y) = 1/ [1 − (1 − 1/y)] = 1/ (1/y) = y.

(ii) f^3 (x) = f f^2 (x) = f f^−1 (x) = x.

A457 (9740 N2014/I/4)(i) The graph of y 2 = f (x):
• Is symmetric in the x-axis.
• Is empty wherever f (x) < 0. So here, it is empty to the left of A and between B and C.
• Has turning points (0, √d) and (0, −√d) because the graph of y = f (x) has turning point (0, d).
• Intersects the graph of y = f (x) wherever f (x) = 1.
• Has the same x-intercepts as y = f (x), namely A = (−a, 0), B = (b, 0), and C = (c, 0).


[Figure: graphs of y = f (x) and y^2 = f (x), with labelled points A = (−a, 0), B = (b, 0), C = (c, 0), the origin O, (0, d), (0, √d), and (0, −√d).]

(ii) The tangents to the curve y^2 = f (x) at the points where it crosses the x-axis are vertical.
A458 (9740 N2014/II/1)(i) dy/dx = dy/dt ÷ dx/dt = 6 ÷ 6t = 1/t = 0.4. So, t = 2.5.
(ii) The tangent line at (3p^2, 6p) has equation y − 6p = (dy/dx) (x − 3p^2) = (1/p) (x − 3p^2). Where this line meets the y-axis, we have y − 6p = (1/p) (0 − 3p^2) = −3p or y = 3p. So D = (0, 3p) and the mid-point of the line segment P D is:

((3p^2 + 0) /2, (6p + 3p) /2) = (1.5p^2, 4.5p).

Write x = 1.5p^2 and y = 4.5p. Observe that 1.5p^2 = 1.5 (4.5p/4.5)^2 = (4.5p)^2 /13.5. So the desired cartesian equation is:

x = y^2 /13.5.
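As a check on the algebra, the mid-point can be recomputed from scratch for a few values of p and tested against the cartesian equation. This is a minimal sketch using exact fractions; it assumes the curve is x = 3t^2, y = 6t (as the working above implies), and the helper names are mine.

```python
from fractions import Fraction

def midpoint_PD(p):
    """Mid-point of P = (3p^2, 6p) and D, the y-intercept of the tangent at P."""
    p = Fraction(p)
    px, py = 3 * p**2, 6 * p
    # The tangent at P has gradient 1/p; follow it back to x = 0.
    dy = py + (1 / p) * (0 - px)
    return ((px + 0) / 2, (py + dy) / 2)

def on_locus(point):
    """True if the point satisfies x = y^2 / 13.5, written as 27x = 2y^2."""
    x, y = point
    return 27 * x == 2 * y**2

for p in [1, 2, Fraction(-3, 2), Fraction(5, 7)]:
    print(p, midpoint_PD(p), on_locus(midpoint_PD(p)))
```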

A459 (9740 N2013/I/2). Rearranging, we have: xy − y = x^2 + x + 1 ⇐⇒ x^2 + (1 − y) x + (y + 1) = 0.

Observe that this last equation is a quadratic equation in the variable x. It has a real solution x if and only if its discriminant is non-negative, i.e.:

(1 − y)^2 − 4 (1) (y + 1) = y^2 − 6y − 3 ≥ 0.

In turn, this is a quadratic inequality in the variable y with positive coefficient on y^2. It holds if and only if y is “outside” the two roots. By the quadratic formula, the two roots are:

y = [6 ± √((−6)^2 − 4 (1) (−3))] / [2 (1)] = 3 ± √(9 + 3) = 3 ± 2√3.

Altogether then, the quadratic equation holds for some x ⇐⇒ the inequality holds ⇐⇒ y ∈ (−∞, 3 − 2√3] ∪ [3 + 2√3, ∞).

Thus, this is also the set of possible values that y can take.
A460 (9740 N2013/I/3)(i) y = (x + 1) / (2x − 1) = 0.5 + 1.5/ (2x − 1).
To find the y-intercept, set x = 0 to get:
y = 0.5 + 1.5/ (2 ⋅ 0 − 1) = −1.
Hence, the y-intercept is (0, −1).
To find the x-intercept, set y = 0 to get:

(x + 1) / (2x − 1) = 0 or x = −1.

Hence, the x-intercept is (−1, 0).
Observe that as x → 0.5, we have y → ±∞. Hence, x = 0.5 is a vertical asymptote.
And as x → ±∞, y → 0.5. Hence, y = 0.5 is a horizontal asymptote.

[Figure: graph of y = (x + 1) / (2x − 1), with horizontal asymptote y = 0.5, vertical asymptote x = 0.5, centre (0.5, 0.5), lines of symmetry y = x and y = 1 − x, x-intercept (−1, 0), and y-intercept (0, −1).]

(ii) (x + 1) / (2x − 1) < 1

⇐⇒ “x + 1 < 2x − 1 AND 2x − 1 > 0” OR “x + 1 > 2x − 1 AND 2x − 1 < 0”

⇐⇒ “2 < x AND x > 0.5” OR “2 > x AND x < 0.5”

⇐⇒ x > 2 OR x < 0.5.

Thus, the solution set is (−∞, 0.5) ∪ (2, ∞).

A461 (9740 N2013/II/1)(i) Observe that for example, 1 ∈ Rangeg, but 1 ∉ Domainf. Hence, Rangeg ⊈ Domainf and so f g does not exist.
(ii) We presume from the question’s wording that the functions gf and (gf)^−1 exist and do not bother verifying their existence.

gf (x) = g ((2 + x) / (1 − x)) = 1 − 2 (2 + x) / (1 − x).

gf (x) = 5 ⇐⇒ 1 − 2 (2 + x) / (1 − x) = 5 ⇐⇒ 4 = −2 (2 + x) / (1 − x) ⇐⇒ 2x − 2 = 2 + x ⇐⇒ x = 4. So (gf)^−1 (5) = 4.

A462 (9740 N2012/I/1). Let x, y, and z be, respectively, the costs of the under-16,
16-65, and over-65 tickets. Then we have the following system of equations:

9x + 6y + 4z = $162.03 (1),   7x + 5y + 3z = $128.36 (2),   10x + 4y + 5z = $158.50 (3).

One method for solving the above system of equations is to use your graphing calculator. But here as a masochistic exercise, let’s do it by hand:
(1) minus (2) yields: 2x + y + z = $33.67. (4)
(1) minus (3) yields: −x + 2y − z = $3.53. (5)
(4) plus (5) yields: x + 3y = $37.20 or y = $12.40 − x/3. (6)
Plug (6) into (4) to get: 2x + $12.40 − x/3 + z = $33.67 or z = $21.27 − 5x/3. (7)
Plug (6) and (7) into (2) to get: 7x + $62 − 5x/3 + $63.81 − 5x = $128.36 or x = $7.65. (8)
And now plugging (8) into (6) and (7), we have:

x = $7.65, y = $9.85, z = $8.52.
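After a by-hand elimination like this, it is worth substituting the answer back into all three original equations. The sketch below does so in whole cents, to avoid floating-point rounding; the variable and function names are my own.

```python
# Ticket prices in cents: under-16, 16-65, over-65.
prices = (765, 985, 852)

# Each original equation as (coefficients, total in cents).
equations = [
    ((9, 6, 4), 16203),
    ((7, 5, 3), 12836),
    ((10, 4, 5), 15850),
]

def residuals(prices):
    """Left-hand side minus right-hand side for each equation."""
    return [
        sum(c * p for c, p in zip(coeffs, prices)) - total
        for coeffs, total in equations
    ]

print(residuals(prices))
```

Each residual being zero confirms that the three prices satisfy the original system exactly.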


A463 (9740 N2012/I/7)(i) Let h = g. We will show that h (g (x)) = x:

h (g (x)) = [g (x) + k] / [g (x) − 1] = [(x + k) / (x − 1) + k] / [(x + k) / (x − 1) − 1] = [x + k + k (x − 1)] / [x + k − (x − 1)] = x (1 + k) / (k + 1) = x.

Hence, g^−1 = h and g is self-inverse.

(ii) To find the y-intercept, plug in x = 0 to get: y = (0 + k) / (0 − 1) = −k. Hence, the y-intercept is (0, −k).
To find the x-intercept, plug in y = 0 to get: 0 = (x + k) / (x − 1) or x = −k. Hence, the x-intercept is (−k, 0).
Observe that as x → 1, y → ±∞. Hence, the graph has the vertical asymptote x = 1.
Observe that as x → ±∞, y → 1. Hence, the graph has the horizontal asymptote y = 1.

[Figure: graph of y = (x + k) / (x − 1), with horizontal asymptote y = 1, a vertical asymptote, centre (1, 1), two lines of symmetry (one of them y = x), x-intercept (−k, 0), and y-intercept (0, −k).]

(iii) Since g is self-inverse, a line of symmetry is y = x.

Three steps to transform the graph of y = 1/x to that of y = (x + k) / (x − 1) = 1 + (k + 1) / (x − 1):
1. Translate rightwards by 1 unit to get y = 1/ (x − 1).
2. Stretch vertically by a factor of k + 1, outwards from the x-axis to get y = (k + 1) / (x − 1).
3. Translate upwards by 1 unit to get y = 1 + (k + 1) / (x − 1) = (x + k) / (x − 1).

A464 (9740 N2012/II/3)(i) Graph on your calculator and copy.
[Figure: graphs of y = f (x) (left) and y = ∣f (x)∣ (right).]

(ii) f (x) = 4 ⇐⇒ x^3 + x^2 − 2x − 4 = 4 ⇐⇒ x^3 + x^2 − 2x − 8 = 0. By observation, x = 2 is an integer solution to this last equation.
Write: x^3 + x^2 − 2x − 8 = (x − 2) (ax^2 + bx + c) = ax^3 + (b − 2a) x^2 + ?x − 2c,
where as usual, ? denotes a coefficient we didn’t bother computing because it wasn’t necessary. Comparing coefficients, we have a = 1, b = 3, and c = 4. Thus:
x^3 + x^2 − 2x − 8 = (x − 2) (x^2 + 3x + 4).

Observe though that the quadratic polynomial x^2 + 3x + 4 has negative discriminant. Hence, the only real solution to f (x) = 4 is 2.
(iii) The given equation is equivalent to f (x + 3) = 4.
In (ii), we showed that f (x) = 4 has solution x = 2.
Hence, the given equation has solution x + 3 = 2 or x = −1.
(iv) Where f (x) < 0, reflect the graph in the x-axis. Where f (x) ≥ 0, keep it unchanged.
(v) ∣f (x)∣ = 4 ⇐⇒ ∣x^3 + x^2 − 2x − 4∣ = 4.

Suppose x^3 + x^2 − 2x − 4 ≥ 0. Then ∣f (x)∣ = 4 becomes x^3 + x^2 − 2x − 4 = 4 or x^3 + x^2 − 2x − 8 = 0.

We already found that this cubic equation has only one real root, namely 2. We can verify that x = 2 satisfies x^3 + x^2 − 2x − 4 ≥ 0.

Now suppose instead that x^3 + x^2 − 2x − 4 < 0. Then ∣f (x)∣ = 4 becomes −x^3 − x^2 + 2x + 4 = 4 or

0 = x^3 + x^2 − 2x = x (x^2 + x − 2) = x (x + 2) (x − 1).

This second cubic equation has three real roots, namely 0, −2, and 1. We can verify that these values of x satisfy x^3 + x^2 − 2x − 4 < 0.

Altogether, the equation ∣f (x)∣ = 4 has four real roots: −2, 0, 1, and 2.
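Since all four roots happen to be integers, a brute-force search over a small range of integers gives a quick confirmation. This is only a sketch (it cannot by itself rule out non-integer roots, which the factorisations above take care of); the names are mine.

```python
def f(x):
    """f(x) = x^3 + x^2 - 2x - 4."""
    return x**3 + x**2 - 2 * x - 4

claimed_roots = [-2, 0, 1, 2]

# Integers in [-10, 10] at which |f(x)| = 4.
integer_roots = [x for x in range(-10, 11) if abs(f(x)) == 4]

print(integer_roots)
print([f(x) for x in claimed_roots])  # signs tell us which case each root came from
```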
A465 (9740 N2011/I/1). The numerator N = x^2 + x + 1 is a quadratic polynomial with positive coefficient on x^2 and negative discriminant. So, N > 0 for all x.
Thus, the given inequality holds if and only if the denominator is negative.
The denominator D = x^2 + x − 2 is a quadratic polynomial with positive coefficient on x^2 and roots given by:

x = [−1 ± √(1^2 − 4 (1) (−2))] / [2 (1)] = −2, 1.

Hence, D < 0 (and the given inequality holds) if and only if x ∈ (−2, 1).
A466 (9740 N2011/I/2)(i) The given information forms the following system of equations:

a (−1.5)^2 + b (−1.5) + c = 4.5,   a (2.1)^2 + b (2.1) + c = 3.2,   a (3.4)^2 + b (3.4) + c = 4.1.

2.25a − 1.5b + c = 4.5 (1),   4.41a + 2.1b + c = 3.2 (2),   11.56a + 3.4b + c = 4.1 (3).

One method for solving the above system of equations is to use your graphing calculator. But here just for fun, let’s do it by hand:
(2) minus (1) yields 2.16a + 3.6b = −1.3 or 108a + 180b = −65. (4)
(3) minus (2) yields 7.15a + 1.3b = 0.9 or 143a + 26b = 18. (5)
(180/26) × (5) minus (4) yields (90/13) × 143a − 108a = (90/13) × 18 + 65 or 882a = (1 620 + 845) /13 = 2 465/13 or:

a = 2 465/ (13 ⋅ 882) = 2 465/11 466 ≈ 0.215. (6)

Plugging (6) back into (5), we have:

b = (18 − 143a) /26 = 9/13 − (11/2) a = 9/13 − (11/2) ⋅ 2 465/ (13 ⋅ 882) = (1 764 × 9 − 27 115) / (2 ⋅ 13 ⋅ 882) = (15 876 − 27 115) / (2 ⋅ 13 ⋅ 882) = −11 239/22 932 ≈ −0.490.

And now from (1), we have:

c = 4.5 − 2.25a + 1.5b = 9/2 − (9/4) ⋅ 2 465/ (13 ⋅ 882) + (3/2) ⋅ (−11 239) / (2 ⋅ 13 ⋅ 882) = 9/2 − 2 465/ (52 ⋅ 98) − 11 239/ (52 ⋅ 294) = 9/2 − 18 634/ (52 ⋅ 294) = 9/2 − 9 317/ (26 ⋅ 294) = 9/2 − 1 331/ (26 ⋅ 42) = (9 ⋅ 26 ⋅ 21 − 1 331) / (26 ⋅ 42) = 3 583/1 092 ≈ 3.281.

(ii) f ′ (x) = 2ax + b = (2 465/5 733) x − 11 239/22 932 > 0 ⇐⇒ x > (11 239/22 932) ÷ (2 465/5 733) = 11 239/9 860 ≈ 1.140.
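The exact fractions above are easy to get wrong by hand, so here is a sketch that re-solves the system with Python’s exact rational arithmetic. The small Gauss-Jordan routine solve3 is my own helper (not a library function) and assumes the system has a unique solution.

```python
from fractions import Fraction as F

def solve3(rows):
    """Gauss-Jordan elimination on a 3x4 augmented matrix of Fractions."""
    m = [list(r) for r in rows]
    n = 3
    for col in range(n):
        # Find a row with a non-zero entry in this column and swap it up.
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        # Scale the pivot row so that the pivot entry is 1.
        m[col] = [v / m[col][col] for v in m[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [v - factor * w for v, w in zip(m[r], m[col])]
    return [m[r][n] for r in range(n)]

# The three given points (x, y) on y = ax^2 + bx + c.
points = [(F("-1.5"), F("4.5")), (F("2.1"), F("3.2")), (F("3.4"), F("4.1"))]
rows = [[x**2, x, F(1), y] for x, y in points]
a, b, c = solve3(rows)

print(a, b, c)
print(-b / (2 * a))  # f'(x) > 0 exactly when x exceeds this value (since a > 0)
```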
A467 (9740 N2011/II/3)(i) Write y = f (x) = ln (2x + 1) + 3. Now do the algebra:

y − 3 = ln (2x + 1) ⇐⇒ e^(y−3) = 2x + 1 ⇐⇒ 0.5 (e^(y−3) − 1) = x.

Thus, f^−1 (x) = 0.5 (e^(x−3) − 1).

As usual, we have Domainf^−1 = Rangef = R and Rangef^−1 = Domainf = (−0.5, ∞).
(ii) To find the y-intercept of the graph of f, plug in x = 0:

y = ln (2 ⋅ 0 + 1) + 3 = ln 1 + 3 = 3.

Hence, f has y-intercept (0, 3).

To find the x-intercept of the graph of f, plug in y = 0:
0 = ln (2x + 1) + 3 or −3 = ln (2x + 1) or e^−3 = 2x + 1 or x = 0.5 (e^−3 − 1).

Hence, f has x-intercept (0.5 (e^−3 − 1), 0).

Observe that as x → −0.5, f (x) → −∞. Hence, x = −0.5 is a vertical asymptote.

[Figure: graphs of y = f (x) and y = f^−1 (x), with vertical asymptote x = −0.5 for f (x), horizontal asymptote y = −0.5 for f^−1 (x), and intercepts (0, 3), (0.5 (e^−3 − 1), 0), (3, 0), and (0, 0.5 (e^−3 − 1)).]

The graphs of f and f^−1 are reflections of each other in the line y = x. Hence, the graph of f^−1 has x-intercept (3, 0), y-intercept (0, 0.5 (e^−3 − 1)), and horizontal asymptote y = −0.5.
(iii) By Fact 23, any point at which f intersects the line y = x is also a point at which f intersects f^−1.
The points where f intersects the line y = x are the points where f (x) = x or equivalently ln (2x + 1) + 3 = x. And thus, f and f^−1 also intersect at the points where this equation holds. Solve it by using your graphing calculator. We find that the two solutions are x ≈ −0.4847, 5.482.
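If you would like to see where the calculator’s two numbers come from, a plain bisection search on h (x) = ln (2x + 1) + 3 − x reproduces them. This is a sketch only; the brackets [−0.499, 0] and [0, 10] are my own choices (picked so that h changes sign on each), and the function names are mine.

```python
import math

def h(x):
    """h(x) = f(x) - x for f(x) = ln(2x + 1) + 3, defined for x > -0.5."""
    return math.log(2 * x + 1) + 3 - x

def bisect(lo, hi, tol=1e-10):
    """Plain bisection; requires h(lo) and h(hi) to have opposite signs."""
    if h(lo) * h(hi) >= 0:
        raise ValueError("h must change sign on [lo, hi]")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if h(lo) * h(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

roots = [bisect(-0.499, 0.0), bisect(0.0, 10.0)]
print([round(r, 4) for r in roots])
```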


Remark 159. The writers of (iii) may have made a mistake. It seems that they believed the following statement to be true and were leading students to write it down:

“Any point at which f and f −1 intersect is on the line y = x.”

But as was already discussed in Ch. 14.3, the above statement is false.
A468 (9740 N2010/I/5)(i) Translating y = x^3 rightwards by 2 units yields y = (x − 2)^3.

Stretching y = (x − 2)^3 outwards from the y-axis by a factor of 0.5 yields y = (2x − 2)^3.

Translating y = (2x − 2)^3 downwards by 6 units yields y = (2x − 2)^3 − 6. This is the new curve.

To find the new curve’s y-intercept, plug in x = 0 to get y = (2 ⋅ 0 − 2)^3 − 6 = −14. So, the y-intercept is (0, −14).

To find the new curve’s x-intercept, plug in y = 0 to get 0 = (2x − 2)^3 − 6 or ∛6 = 2x − 2 or x = 0.5 ∛6 + 1. So, the x-intercept is (0.5 ∛6 + 1, 0).

(ii) Being the reflection of f in the line y = x, the graph of f^−1 has x-intercept (−14, 0) and y-intercept (0, 0.5 ∛6 + 1).


[Figure: graphs of y = f (x) = (2x − 2)^3 − 6 and y = f^−1 (x), with intercepts (0, −14) and (0.5 ∛6 + 1, 0) of y = f (x), and intercepts (−14, 0) and (0, 0.5 ∛6 + 1) of y = f^−1 (x).]

A469 (9740 N2010/II/4)(i) Graph on your calculator and copy. The asymptotes are
x = ±1.


[Figure: graph of y = f (x), with vertical asymptotes x = −1 and x = 1 and y-intercept (0, −1).]

(ii) Observe that f is symmetric in the y-axis. So, by restricting Domainf to R^+_0, the new function produced would be invertible. Hence, the smallest k for which f^−1 exists is k = 0.
(iii) f g (x) = f (1/ (x − 3)) = 1/ [(1/ (x − 3))^2 − 1] = (x − 3)^2 / [1 − (x − 3)^2]
= (x − 3)^2 / ([1 − (x − 3)] [1 + (x − 3)]) = (x − 3)^2 / [(4 − x) (x − 2)].

(iv) First, N = (x − 3)^2 > 0 for all x ≠ 3.

Next, D = (4 − x) (x − 2) is a quadratic polynomial with negative coefficient on x^2 and roots 4 and 2. Thus, D > 0 ⇐⇒ x ∈ (2, 4).
Altogether then, for x ≠ 3, N /D > 0 ⇐⇒ x ∈ (2, 4).
Note though that 3 ∉ Domain (f g). Thus, f g (x) > 0 ⇐⇒ x ∈ (2, 4) ∖ {3}.


Remark 160. The last part of N2010/II/4 was a difficult question. Below I give two solutions.
The first is a non-rigorous hand-wavy solution that relies (as always) on your graphing calculator. This was probably what your A-Level examiners wanted you to write down.
The second is a rigorous solution (in other words, the only sort of correct solution) that few students would have been able to produce under exam conditions.
(v) Method 1 (non-rigorous). Graph y = (x − 3)^2 / [(4 − x) (x − 2)] on your calculator. Observe that y ∈ (−∞, −1) ∪ [0, ∞).
One is thus tempted to conclude that Range (f g) = (−∞, −1) ∪ [0, ∞), but this would be slightly incorrect.

[Figure: graph of y = f g (x). The point (3, 0) is not part of the graph of y = f g (x).]

Note well that while your graphing calculator has graphed y for all x ∈ R ∖ {2, 4}, the domain of f g is R ∖ {2, 3, 4}.
Hence, the point (3, 0) is not in the graph of f g. Since there is no other x for which f g (x) = 0, we conclude that 0 ∉ Range (f g). So, the tempting conclusion above is incorrect. Instead, we have:

Range (f g) = (−∞, −1) ∪ (0, ∞).

Method 2 (rigorous). Observe that:

(x − 3)^2 / [(4 − x) (x − 2)] = (x^2 − 6x + 9) / (−x^2 + 6x − 8) = −1 + 1/ (−x^2 + 6x − 8) = −1 + 1/ [(4 − x) (x − 2)].

Now, (4 − x) (x − 2) is a quadratic polynomial with negative coefficient on x^2. Its two roots are 2 and 4. It thus attains, at x = 3, a maximum value of (4 − 3) (3 − 2) = 1. Thus, if x ∈ R, then (4 − x) (x − 2) ∈ (−∞, 1].
But given that x ∈ R ∖ {2, 3, 4}, we have instead:

(4 − x) (x − 2) ∈ (−∞, 1) ∖ {0} = (−∞, 0) ∪ (0, 1).

And so: 1/ [(4 − x) (x − 2)] ∈ (−∞, 0) ∪ (1, ∞).

And thus: −1 + 1/ [(4 − x) (x − 2)] ∈ (−∞, −1) ∪ (0, ∞) = Range (f g).

A470 (9740 N2009/I/1)(i) Write u_n = an^2 + bn + c. The information given yields the following system of equations:

a ⋅ 1^2 + b ⋅ 1 + c = 10 (1),   a ⋅ 2^2 + b ⋅ 2 + c = 6 (2),   a ⋅ 3^2 + b ⋅ 3 + c = 5 (3).

You can solve this using your calculator. But here as an exercise, let’s do it by hand:
(1) minus (2) yields −3a − b = 4. (4)
(2) minus (3) yields −5a − b = 1. (5)
(4) minus (5) yields 2a = 3 or a = 1.5. (6)
And now plugging (6) into (4), we have b = −8.5. (7)
Plugging (6) and (7) into (1), we have c = 17.

Altogether then, u_n = 1.5n^2 − 8.5n + 17.

(ii) u_n = 1.5n^2 − 8.5n + 17 > 100 ⇐⇒ 3n^2 − 17n − 166 > 0.

This is a quadratic inequality with positive coefficient on n^2. Thus, it holds if and only if n is “outside” the two roots.

And by the quadratic formula, the roots are:

n = [17 ± √((−17)^2 − 4 (3) (−166))] /6 = [17 ± √(289 + 1 992)] /6 = (17 ± √2 281) /6.

We can discard the negative value. The positive value that remains is approximately 10.8.
Since n must be an integer, the set of values of n for which u_n > 100 is {11, 12, 13, . . . }.
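Both parts can be confirmed by simply tabulating the sequence. A minimal sketch (the function name u and the search limit of 1000 are my own choices):

```python
def u(n):
    """u_n = 1.5 n^2 - 8.5 n + 17."""
    return 1.5 * n**2 - 8.5 * n + 17

# Part (i): the formula should reproduce the three given terms.
print([u(n) for n in (1, 2, 3)])

# Part (ii): the first n for which u_n exceeds 100.
first_n = next(n for n in range(1, 1000) if u(n) > 100)
print(first_n, u(first_n - 1), u(first_n))
```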
A471 (9740 N2009/I/6)(i) Observe that C1 is a rectangular hyperbola. Write y = (x − 2) / (x + 2) = 1 − 4/ (x + 2). Thus, C1 has y-intercept (0, −1) and x-intercept (2, 0). As x → −2, y → ±∞, and so x = −2 is a vertical asymptote. As x → ±∞, y → 1, and so y = 1 is a horizontal asymptote.
Observe that C2 is an ellipse centred on the origin, with y-intercepts (0, −√3) and (0, √3) and x-intercepts (−√6, 0) and (√6, 0). There are no asymptotes.

(ii) Plug C1’s equation y = (x − 2) / (x + 2) into C2’s to get x^2 /6 + [(x − 2) / (x + 2)]^2 /3 = 1.
Multiply by 6 (x + 2)^2 to get: x^2 (x + 2)^2 + 2 (x − 2)^2 = 6 (x + 2)^2.

Rearranging, 2 (x − 2)^2 = 6 (x + 2)^2 − x^2 (x + 2)^2 = (x + 2)^2 (6 − x^2), as desired.

(iii) −0.5149, 2.445.

[Figure: graphs of C1 (a rectangular hyperbola, with its asymptotes, centre, lines of symmetry, and intercepts) and C2 (an ellipse, with its horizontal and vertical intercepts).]

A472 (9740 N2009/II/3)(i) Write y = ax/ (bx − a) and do the algebra:

y (bx − a) = ax ⇐⇒ −ay = x (a − by) ⇐⇒ ay/ (by − a) = x,

where the division in the last step is permissible because y ≠ a/b.

Thus, f^−1 (x) = ax/ (bx − a).
Notice that f (x) = f^−1 (x). Thus, f^2 (x) = f f^−1 (x) = x.

Observe that f is a rectangular hyperbola with asymptotes x = a/b and y = a/b. So, Rangef = R ∖ {a/b}.
Notice that Rangef = Domainf. Thus, Rangef^2 = Rangef = R ∖ {a/b}.
(ii) Since Rangeg = R ∖ {0} ⊈ Domainf = R ∖ {a/b}, f g does not exist.
(iii) f^−1 (x) = x ⇐⇒ ax/ (bx − a) = x ⇐⇒ ax = x (bx − a) ⇐⇒ 0 = x (bx − 2a) ⇐⇒ x = 0 or x = 2a/b.
A473 (9740 N2008/I/9)(i) f ′ (x) = [(cx + d) a − (ax + b) c] / (cx + d)^2 = (ad − bc) / (cx + d)^2. Since ad − bc ≠ 0, f ′ (x) ≠ 0 for any x and so there are no turning points.
(ii) If ad − bc = 0, then f ′ (x) = 0 for all x. Hence, the graph is simply a horizontal line.
(iii) a = 3, b = −7, c = 2, and d = 1. Since ad − bc = 3 ⋅ 1 − (−7) ⋅ 2 = 17 > 0, by (i), the given graph has a positive gradient at all points.
(iv)(a) This graph is a rectangular hyperbola. Write y = (3x − 7) / (2x + 1) = 1.5 − 8.5/ (2x + 1).
This graph has y-intercept (0, −7) and x-intercept (7/3, 0).
As x → −0.5, y → ±∞. So, this graph has vertical asymptote x = −0.5.
As x → ±∞, y → 1.5. So, this graph has horizontal asymptote y = 1.5.


[Figure: graphs of y = (3x − 7) / (2x + 1) and y^2 = (3x − 7) / (2x + 1), with vertical asymptote x = −0.5, horizontal asymptote y = 1.5 of the first graph, x-intercept (7/3, 0) for both graphs, and y-intercept (0, −7).]

(iv)(b) The graph of y 2 = f (x):

• Is symmetric in the x-axis.
• Is empty wherever f (x) < 0. So here, it is empty between the vertical asymptote and
the x-intercept.
• Intersects the graph of y = f (x) wherever f (x) = 1.
• Has the same x-intercept as y = f (x), namely (7/3, 0).

A474 (9233 N2008/I/14)(i) Write y = x/ (x^2 − 1) = x/ [(x + 1) (x − 1)]. For any y-intercepts, plug in x = 0 to get y = 0/ (0^2 − 1) = 0. So, the only y-intercept is (0, 0). For any x-intercepts, plug in y = 0 to get x = 0. So, the only x-intercept is (0, 0).
As x → ±1, y → ±∞. Thus, two vertical asymptotes are x = 1 and x = −1.
As x → ±∞, y → 0. Thus, a horizontal asymptote is y = 0.
(ii) The graph of y 2 = f (x):
• Is symmetric in the x-axis.
• Is empty wherever f (x) < 0. So here, it is empty to the left of x = −1 and between x = 0
and x = 1.
• Intersects the graph of y = f (x) wherever f (x) = 1.
• Has the same x-intercept as y = f (x), namely (0, 0). Of course, this is also a y-intercept.

At the origin, the tangent to the curve y^2 = x/ (x^2 − 1) is vertical.

[Figure: graphs of y = x/ (x^2 − 1) and y^2 = x/ (x^2 − 1), with vertical asymptotes x = ±1, a horizontal asymptote for both graphs, and the point (0, 0), which is the horizontal and vertical intercept for both graphs.]

(iii) x/ (x^2 − 1) = e^x ⇐⇒ x = e^x (x^2 − 1) ⇐⇒ xe^−x = x^2 − 1 ⇐⇒ 1 + xe^−x = x^2.

(iv) Try x_1 = 1. Then:

x_2 = √(1 + x_1 e^−x_1) = √(1 + e^−1) ≈ 1.169 564.
x_3 = √(1 + x_2 e^−x_2) = √(1 + 1.169 564e^−1.169 564) ≈ 1.167 541.
x_4 = √(1 + x_3 e^−x_3) = √(1 + 1.167 541e^−1.167 541) ≈ 1.167 587.

So the positive root of x^2 = 1 + xe^−x is x ≈ 1.17.
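The iteration is mechanical enough to hand over to a few lines of code. The following is a sketch of the same fixed-point iteration x_(n+1) = √(1 + x_n e^−x_n), starting from x_1 = 1; the function names and the choice of 50 steps are mine.

```python
import math

def step(x):
    """One step of the iteration x -> sqrt(1 + x * exp(-x))."""
    return math.sqrt(1 + x * math.exp(-x))

def iterate(x1=1.0, steps=50):
    """Return [x_1, x_2, ..., x_(steps + 1)]."""
    xs = [x1]
    for _ in range(steps):
        xs.append(step(xs[-1]))
    return xs

xs = iterate()
print([round(x, 6) for x in xs[:4]])  # x_1 to x_4
print(round(xs[-1], 2))               # the limit, to 2 decimal places
```

The successive values settle down after only a few steps, which is why three iterations by hand were already enough to give the root to 2 decimal places.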
A475 (9740 N2008/II/4)(i) Take the graph of y = x^2, translate it rightwards by 4 units, then upwards by 1 unit to get y = (x − 4)^2 + 1.

Take care to note that Domainf = (4, ∞) and Rangef = (1, ∞). In particular, the graph of f does not include the point (4, 1).

[Figure: graphs of y = f (x), y = f^−1 (x), and a line. The point (4, 1) is not part of the graph of y = f (x). The point (1, 4) is not part of the graph of y = f^−1 (x).]

(ii) Write y = (x − 4)^2 + 1. Do the algebra: y − 1 = (x − 4)^2 ⇐⇒ ±√(y − 1) = x − 4 ⇐⇒ x = 4 ± √(y − 1). Since Domainf = (4, ∞), we have x > 4 and so we can discard the negative value. Thus, f^−1 (x) = 4 + √(x − 1).

Domainf^−1 = Rangef = (1, ∞).

(iii) See above.
(iv) Reflect f in the line y = x to get f^−1.
By Fact 23, any point at which f intersects the line y = x is also a point at which f intersects f^−1. And so, let us find points at which f intersects y = x.
To do so, write f (x) = x or (x − 4)^2 + 1 = x or x^2 − 9x + 17 = 0 or:

x = [9 ± √((−9)^2 − 4 (1) (17))] / [2 (1)] = (9 ± √13) /2.

Since Domainf = (4, ∞), we may discard any values that are smaller than or equal to 4.

This leaves us with (9 + √13) /2 as a solution to f (x) = f^−1 (x). [421]

[421] The answer here may have sufficed on these particular A-Level exams where the writers seem to have made a mistake (see Remark 161). However, this is not in fact a complete answer. We have merely found one solution to f (x) = f^−1 (x). But there may or may not be still other solutions.
To confirm that this is the unique solution, let us directly solve f (x) = f^−1 (x):

(x − 4)^2 + 1 = 4 + √(x − 1) ⇐⇒ x^2 − 8x + 13 = √(x − 1) ⟹ x^4 + 64x^2 + 169 − 16x^3 + 26x^2 − 208x = x − 1 ⇐⇒ x^4 − 16x^3 + 90x^2 − 209x + 170 = 0.

What we’ve shown above is that any solution to f (x) = f^−1 (x) must also be a solution to this quartic equation. The converse, though, is not true. So, what we’ll do is find all the solutions to the quartic equation, then examine each of them to see if it solves f (x) = f^−1 (x).
Above we have already found that two roots of the quartic equation are (9 ± √13) /2. (We also concluded that one of these is a solution to f (x) = f^−1 (x) while the other is not.)
By the Fundamental Theorem of Algebra, the quartic equation has four roots. Two of these must be (9 ± √13) /2. Thus:

x^4 − 16x^3 + 90x^2 − 209x + 170 = (x^2 + ax + b) [x − (9 + √13) /2] [x − (9 − √13) /2] = (x^2 + ax + b) (x^2 − 9x + 17) = x^4 + (a − 9) x^3 + ?x^2 + ?x + 17b.

Comparing coefficients, a = −7 and b = 10. And now x^2 + ax + b = x^2 − 7x + 10 = (x − 2) (x − 5). So, the other two roots to the quartic equation are 2 and 5.
Since 2 ∉ Domainf, 2 cannot be a solution to f (x) = f^−1 (x).
We have f (5) = (5 − 4)^2 + 1 = 2, while f^−1 (5) = 4 + √(5 − 1) = 6, so that 5 is also not a solution to f (x) = f^−1 (x).
Altogether then, (9 + √13) /2 is indeed the unique solution to f (x) = f^−1 (x).

Remark 161. The writers of (iv) may have made a mistake. It seems that they believed the following statement to be true:

“Any point at which f and f^−1 intersect is on the line y = x.”

But as was already discussed in Ch. 14.3, the above statement is false.

A476 (9740 N2007/I/1). (2x^2 − x − 19) / (x^2 + 3x + 2) − 1 = [2x^2 − x − 19 − (x^2 + 3x + 2)] / (x^2 + 3x + 2) = (x^2 − 4x − 21) / (x^2 + 3x + 2).
And so (2x^2 − x − 19) / (x^2 + 3x + 2) > 1 ⇐⇒ (2x^2 − x − 19) / (x^2 + 3x + 2) − 1 > 1 − 1 = 0 ⇐⇒ (x^2 − 4x − 21) / (x^2 + 3x + 2) > 0.
The numerator N = x^2 − 4x − 21 has positive coefficient on x^2 and roots −3 and 7.
The denominator D = x^2 + 3x + 2 has positive coefficient on x^2 and roots −1 and −2.
Hence, N /D > 0

⇐⇒ “N > 0 AND D > 0” OR “N < 0 AND D < 0”

⇐⇒ “’x < −3 OR x > 7’ AND ’x < −2 OR x > −1’” OR “−3 < x < 7 AND −2 < x < −1”

⇐⇒ “x < −3 OR x > 7” OR −2 < x < −1.

Altogether then, the inequality holds if and only if x ∈ (−∞, −3) ∪ (−2, −1) ∪ (7, ∞).
A477 (9740 N2007/I/2)(i) Since Rangeg = R^+_0 ⊈ Domainf = R ∖ {3}, f g does not exist.
Since Rangef = R ∖ {0} ⊆ Domaing = R, gf exists. We have Domain (gf) = Domainf = R ∖ {3} and:

gf (x) = g (f (x)) = g (1/ (x − 3)) = 1/ (x − 3)^2.

(ii) Write y = f (x) = 1/ (x − 3) and do the algebra to get 1/y = x − 3 or 1/y + 3 = x. (Note that y ≠ 0 because 0 ∉ Rangef.) Hence, Domainf^−1 = Rangef = R ∖ {0} and:

f^−1 (x) = 1/x + 3.
A478 (9740 N2007/I/5). Write y = (2x + 7) / (x + 2) = 2 + 3/ (x + 2).
Starting with y = 1/x:
1. Translate leftwards by 2 units to get y = 1/ (x + 2).
2. Stretch vertically outwards from the x-axis by a factor of 3 to get y = 3/ (x + 2).
3. Translate upwards by 2 units to get y = 2 + 3/ (x + 2).
This is a rectangular hyperbola.
For the y-intercept, plug in x = 0 to get y = 2 + 3/ (0 + 2) = 3.5. So, the only y-intercept is (0, 3.5).
For the x-intercept, plug in y = 0 to get 0 = 2 + 3/ (x + 2) or x = −3.5. So, the only x-intercept is (−3.5, 0).
As x → −2, y → ±∞. So, x = −2 is a vertical asymptote.
As x → ±∞, y → 2. So, y = 2 is a horizontal asymptote.


[Figure: graph of y = (2x + 7) / (x + 2), with vertical asymptote x = −2, a horizontal asymptote, centre (−2, 2), lines of symmetry y = x + 4 and y = −x, x-intercept (−3.5, 0), and y-intercept (0, 3.5).]

A479 (9740 N2007/II/1). Let p, m, and l be the prices (in dollars per kilogram) of,
respectively, the pineapples, mangoes, and lychees. Then the given table yields the following
system of equations:

1.15p + 0.6m + 0.55l = 8.28 (1),   1.2p + 0.45m + 0.3l = 6.84 (2),   2.15p + 0.9m + 0.65l = 13.05 (3).

We can either solve the above system by calculator or by hand. We’ll do the latter:
2 × (2) minus (3) yields 0.25p − 0.05l = 0.63 or 25p − 5l = 63. (4)
4 × (2) minus 3 × (1) yields 1.35p − 0.45l = 2.52 or 135p − 45l = 252. (5)
9 × (4) minus (5) yields 90p = 315 or p = 3.5. (6)
Now plug (6) into (4) to get l = 4.9. (7)
Next plug (6) and (7) into (1) to get m = 2.6.

So, the total amount paid by Lee Lian was:

1.3p + 0.25m + 0.5l = 1.3 ⋅ 3.5 + 0.25 ⋅ 2.6 + 0.5 ⋅ 4.9 = 4.55 + 0.65 + 2.45 = 7.65 dollars.
A480 (9233 N2007/II/4)(i) Write y = (4x + 1) / (x − 3) = 4 + 13/ (x − 3).
As x → 3, y → ±∞. So, x = 3 is a vertical asymptote.
As x → ±∞, y → 4. So, y = 4 is a horizontal asymptote.
(ii) For the y-intercept, plug in x = 0 to get y = 4 + 13/ (0 − 3) = −1/3. So, the only y-intercept is (0, −1/3).
For the x-intercept, plug in y = 0 to get 0 = 4 + 13/ (x − 3) or x = −1/4. So, the only x-intercept is (−1/4, 0).

[Figure: graph of y = (4x + 1) / (x − 3), with its horizontal and vertical asymptotes, centre (3, 4), lines of symmetry y = x + 1 and y = −x + 7, x-intercept (−1/4, 0), and y-intercept (0, −1/3).]

(iii) Domainf^−1 = Rangef = R ∖ {4}. Write y = f (x) = 4 + 13/ (x − 3). Then do the algebra:

y − 4 = 13/ (x − 3) ⇐⇒ 13/ (y − 4) = x − 3 ⇐⇒ 3 + 13/ (y − 4) = x.

(Note that the second step is permitted because y ≠ 4.) So, f^−1 (x) = 3 + 13/ (x − 4).
A481 (9233 N2006/I/3)(i) Since Rangeg = R+ ⊆ Domainf = R+, f g exists and has mapping rule:

f g (x) = f (g (x)) = f (3/x) = 5 ⋅ (3/x) + 3 = 15/x + 3.

Since Rangeg = R+ ⊆ Domaing = R+, g^2 exists and has mapping rule:

g^2 (x) = g (g (x)) = g (3/x) = 3/ (3/x) = x.

Since Rangeg^2 = R+ ⊆ Domaing = R+, g^3 exists and has mapping rule:

g^3 (x) = g (g^2 (x)) = g (x) = 3/x.

We observe that for n odd, g^n ∶ R+ → R is defined by g^n (x) = 3/x and has range R+.
And for n even, g^n ∶ R+ → R is defined by g^n (x) = x and has range R+.
Thus, g^35 ∶ R+ → R is defined by g^35 (x) = 3/x and has range R+.
(ii) h (x) = 5f (x) + 3.
A482 (9233 N2006/II/1).

(x − 9) / (x^2 − 9) ≤ 1 ⇐⇒ (x − 9) / (x^2 − 9) − 1 ≤ 0 ⇐⇒ (x − 9 − x^2 + 9) / (x^2 − 9) ≤ 0
⇐⇒ (x − x^2) / (x^2 − 9) ≤ 0 ⇐⇒ x (1 − x) / [(x + 3) (x − 3)] ≤ 0 ⇐⇒ x (x − 1) / [(x + 3) (x − 3)] ≥ 0.   (1)

Let N and D denote the numerator and denominator of the LHS of (1). We have:

N/D ≥ 0
⇐⇒ “N ≥ 0 AND D ≥ 0 AND x ≠ ±3” OR “N ≤ 0 AND D ≤ 0 AND x ≠ ±3”

⇐⇒ “‘x ≤ 0 OR x ≥ 1’ AND ‘x ≤ −3 OR x ≥ 3’ AND x ≠ ±3” OR “0 ≤ x ≤ 1 AND −3 ≤ x ≤ 3 AND x ≠ ±3”

⇐⇒ “‘x ≤ 0 OR x ≥ 1’ AND ‘x < −3 OR x > 3’” OR “0 ≤ x ≤ 1 AND −3 < x < 3”

⇐⇒ “x < −3 OR x > 3” OR 0 ≤ x ≤ 1.

Altogether then, the inequality holds if and only if x ∈ (−∞, −3) ∪ [0, 1] ∪ (3, ∞).
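
As a sanity check on the solution set, the following sketch compares the original inequality with the claimed answer on a grid of sample points. The grid (hundredths from −10 to 10) is an arbitrary choice for illustration; agreement on it is evidence, not a proof.

```python
from fractions import Fraction as F

def satisfies_inequality(x):
    """True if (x - 9)/(x^2 - 9) <= 1; x must not be 3 or -3."""
    return (x - 9) / (x * x - 9) <= 1

def in_claimed_solution(x):
    """True if x lies in (-inf, -3) U [0, 1] U (3, inf)."""
    return x < -3 or 0 <= x <= 1 or x > 3

# Sample points on a grid of hundredths, skipping x = -3 and x = 3
# (where the left-hand side is undefined).
grid = [F(k, 100) for k in range(-1000, 1001) if k not in (-300, 300)]
mismatches = [x for x in grid if satisfies_inequality(x) != in_claimed_solution(x)]
print(len(mismatches))
```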


130.2. Ch. 110 Answers (Sequences and Series)

A483 (9758 N2017/I/9)(a)(i) u_n = S_n − S_(n−1) = An^2 + Bn − [A (n − 1)^2 + B (n − 1)] = 2An − A + B.
(ii) u_10 = 2A ⋅ 10 − A + B = 19A + B = 48 (1) and u_17 = 2A ⋅ 17 − A + B = 33A + B = 90 (2).

(2) minus (1) yields 14A = 42 or A = 3. Now by (1), B = 48 − 19A = 48 − 57 = −9.

(b) r^2 (r + 1)^2 − (r − 1)^2 r^2 = r^2 [(r + 1)^2 − (r − 1)^2] = r^2 [(r + 1 − (r − 1)) (r + 1 + r − 1)] = r^2 [2 (2r)] = 4r^3. So k = 4.

∑_{r=1}^{n} 4r^3 = ∑_{r=1}^{n} [r^2 (r + 1)^2 − (r − 1)^2 r^2]
= 1^2 ⋅ 2^2 − 0^2 ⋅ 1^2 + 2^2 ⋅ 3^2 − 1^2 ⋅ 2^2 + ⋅ ⋅ ⋅ + n^2 (n + 1)^2 − (n − 1)^2 n^2
= −0^2 ⋅ 1^2 + n^2 (n + 1)^2 = n^2 (n + 1)^2.

(c) We have: a_(n+1) / a_n = [x^(n+1) / (n + 1)!] / [x^n / n!] = x / (n + 1). So:

lim_{n→∞} ∣a_(n+1) / a_n∣ = lim_{n→∞} ∣x / (n + 1)∣ = 0 < 1.

By D’Alembert’s ratio test then, the given series converges.

If you’ve done Part V (Calculus), you should be able to easily recognise that this is the
Maclaurin series expansion for e^x. In fact, this is even printed on List MF26, p. 2:

e^x = ∑_{r=0}^{∞} x^r / r!

A484 (9758 N2017/II/2). Let the arithmetic progression be (a_i) and the geometric
progression be (g_i). Let d = a_2 − a_1 be the common difference in the arithmetic progression.
(i) a_1 = 3 and a_13 = 3 + 12d. So, (3 + 3 + 12d) × 13/2 = 156, so d = (2 × 156/13 − 6)/12 = 1.5.
(ii) The common ratio r cannot be equal to 1, because if so, ∑ g_i = 13g_1 = 13 × 3 = 39 ≠ 156.

The sum of the first 13 terms is 3 (1 − r^13) / (1 − r) = 156 ⇐⇒ 3 − 3r^13 = 156 − 156r ⇐⇒
r^13 − 52r + 51 = 0.
Use your graphing calculator to find that besides 1, the other two real roots of this last
equation are r ≈ −1.451, 1.210.
These are thus also the two possible values of r.
(iii) We know that g_n = 3r^(n−1) and a_n = 3 + 1.5 (n − 1).
We are told that r ≈ 1.210. We are told also that g_n > 100a_n.
Thus: 3 ⋅ 1.210^(n−1) > 100 [3 + 1.5 (n − 1)] = 150 + 150n. Graph 3 ⋅ 1.210^(x−1) − 150x − 150 in your
graphing calculator. You should find that there is a positive x-intercept. To the left of this
x-intercept, the graph is below the x-axis and to the right, it is above.
This x-intercept is given by x ≈ 41.149. Thus, the smallest value of n for which the inequality
holds is 42.
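
In place of a graphing calculator, both steps can be sketched in a few lines. This is only an illustration: the bisection brackets (1.1 to 1.3 and −1.6 to −1.3) are assumptions chosen by inspecting the sign of r^13 − 52r + 51, and the helper names are made up.

```python
def f(r):
    """Left-hand side of r^13 - 52r + 51 = 0."""
    return r ** 13 - 52 * r + 51

def bisect(func, lo, hi, tol=1e-10):
    """Find a root of func in [lo, hi], assuming func changes sign there."""
    f_lo = func(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = func(mid)
        if (f_lo < 0) == (f_mid < 0):
            lo, f_lo = mid, f_mid   # root lies in the upper half
        else:
            hi = mid                # root lies in the lower half
    return (lo + hi) / 2

def smallest_n(r, first=3.0, d=1.5, factor=100):
    """Smallest n with first * r^(n-1) > factor * [first + d (n - 1)]."""
    n = 1
    while first * r ** (n - 1) <= factor * (first + d * (n - 1)):
        n += 1
    return n

r_pos = bisect(f, 1.1, 1.3)     # the root near 1.210
r_neg = bisect(f, -1.6, -1.3)   # the root near -1.451
print(round(r_neg, 3), round(r_pos, 3), smallest_n(1.210))
```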
A486 (9740 N2016/I/6)(i) Let P (k) be the proposition: ∑_{r=1}^{k} r (r^2 + 1) = (1/4) k (k + 1) (k^2 + k + 2).

We first verify that P (1) is true:

∑_{r=1}^{1} r (r^2 + 1) = 1 (1^2 + 1) = 2 = (1/4) ⋅ 1 ⋅ 2 ⋅ 4 = (1/4) ⋅ 1 (1 + 1) (1^2 + 1 + 2). ✓

Now let k be any positive integer. Suppose P (k) is true. Below we show that P (k + 1) is
also true and hence, by the principle of mathematical induction, that the given proposition
is also true:

∑_{r=1}^{k+1} r (r^2 + 1) = ∑_{r=1}^{k} r (r^2 + 1) + (k + 1) [(k + 1)^2 + 1]
= (1/4) k (k + 1) (k^2 + k + 2) + (k + 1) (k^2 + 2k + 2)   (by P (k))
= (k + 1) [(1/4) k (k^2 + k + 2) + (k^2 + 2k + 2)]
= (k + 1) [(1/4) k^3 + (5/4) k^2 + 2.5k + 2]
= (1/4) (k + 1) (k^3 + 5k^2 + 10k + 8)
= (1/4) (k + 1) (k + 2) (k^2 + 3k + 4)
= (1/4) (k + 1) (k + 2) [(k + 1)^2 + (k + 1) + 2]. ✓
(ii) u_1 = u_0 + 1^3 + 1 = 2 + 1 + 1 = 4.
u_2 = u_1 + 2^3 + 2 = 4 + 8 + 2 = 14.
u_3 = u_2 + 3^3 + 3 = 14 + 27 + 3 = 44.
(iii) Through telescoping, we have ∑_{r=1}^{n} (u_r − u_(r−1)) = u_n − u_0 = u_n − 2.   (1)

But for r ≥ 1, we also have u_r − u_(r−1) = u_(r−1) + r^3 + r − u_(r−1) = r^3 + r. So, using (i):

∑_{r=1}^{n} (u_r − u_(r−1)) = ∑_{r=1}^{n} (r^3 + r) = ∑_{r=1}^{n} r (r^2 + 1) = (1/4) n (n + 1) (n^2 + n + 2).

Plugging this last equation into (1), we have u_n = (1/4) n (n + 1) (n^2 + n + 2) + 2.
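
A quick way to gain confidence in the closed form is to compare it with the recurrence directly. The sketch below assumes u_0 = 2 and u_r = u_(r−1) + r^3 + r, as used in the answer; the function names are made up.

```python
def u_by_recurrence(n):
    """Compute u_n from u_0 = 2 and u_r = u_(r-1) + r^3 + r."""
    u = 2
    for r in range(1, n + 1):
        u += r ** 3 + r
    return u

def u_closed_form(n):
    """u_n = (1/4) n (n + 1) (n^2 + n + 2) + 2; the product is always divisible by 4."""
    return n * (n + 1) * (n * n + n + 2) // 4 + 2

print([u_by_recurrence(n) for n in range(4)])
print(all(u_by_recurrence(n) == u_closed_form(n) for n in range(51)))
```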
A485 (9740 N2016/I/4). We are given that:

a + 3d = br^4,   (1)

a + 8d = br^7,   (2)

a + 11d = br^14.   (3)

(i) (3) divided by (1) yields r^10 = (a + 11d) / (a + 3d), while (2) divided by (1) yields r^3 = (a + 8d) / (a + 3d). Now:

5r^10 − 8r^3 + 3 = 5 (a + 11d) / (a + 3d) − 8 (a + 8d) / (a + 3d) + 3 = (5a + 55d − 8a − 64d + 3a + 9d) / (a + 3d) = 0.
Use your graphing calculator to find that the solutions to this 10th-degree polynomial
equation are r ≈ 0.74, 1.
You can verify by the Factor Theorem that r = 1 is indeed a root. So, given that ∣r∣ < 1,
the only possible value of r is r ≈ 0.74.
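
If no graphing calculator is at hand, the root with ∣r∣ < 1 can be located by bisection. This is a sketch only: the bracket [0.5, 0.9] is an assumption, chosen because 5r^10 − 8r^3 + 3 is positive at 0.5 and negative at 0.9.

```python
def g(r):
    """Left-hand side of 5r^10 - 8r^3 + 3 = 0."""
    return 5 * r ** 10 - 8 * r ** 3 + 3

def bisect(func, lo, hi, tol=1e-10):
    """Find a root of func in [lo, hi], assuming func changes sign there."""
    f_lo = func(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = func(mid)
        if (f_lo < 0) == (f_mid < 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return (lo + hi) / 2

root = bisect(g, 0.5, 0.9)
print(round(root, 2), g(1.0))
```

The second printed value evaluates the polynomial at r = 1, the root that is discarded because ∣r∣ < 1 is required.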
(ii) The limit of the infinite geometric series is b / (1 − r). And the sum of the first n terms
is b (1 − r^n) / (1 − r).
Hence, the sum of the terms after the nth is:

b / (1 − r) − b (1 − r^n) / (1 − r) = [b / (1 − r)] [1 − (1 − r^n)] = br^n / (1 − r) ≈ 0.74^n b / 0.26.
A487 (9740 N2015/I/8). First, note that in seconds, the required time interval is
[5 400, 6 300].
(i) The time (in seconds) taken by A to complete the 50 laps is:

(First term + Last term) × (Number of terms) / 2 = (T + T + 49 × 2) × 50/2 = 50T + 49 × 50 = 50T + 2 450.

So, we need 50T + 2 450 ∈ [5 400, 6 300] or 50T ∈ [2 950, 3 850] or T ∈ [59, 77].
(ii) The time (in seconds) taken by B to complete the 50 laps is:

t (1 − r^50) / (1 − r) = t (1 − 1.02^50) / (1 − 1.02) = t (1.02^50 − 1) / 0.02 = 50t (1.02^50 − 1).

So, we need 50t (1.02^50 − 1) ∈ [5 400, 6 300] or 3 192.267 ≤ 50t ≤ 3 724.311 or 63.845 ≤ t ≤ 74.486.
(iii) T = 59 and t ≈ 63.845. So the times taken to complete the 50th lap by A and B are:

T + 49 × 2 = 157 and t × 1.0249 ≈ 168 seconds.

And so the desired difference is 11 seconds.
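
The arithmetic in (i) to (iii) can be reproduced with a short script. This is a sketch using floating-point arithmetic, so the last digits may differ slightly from a calculator; the variable names are made up.

```python
# Athlete A: lap times form an arithmetic progression T, T + 2, ..., T + 98.
a_low = (5400 - 2450) / 50    # smallest T with total time in [5400, 6300]
a_high = (6300 - 2450) / 50   # largest such T

# Athlete B: lap times form a geometric progression with ratio 1.02.
growth = 1.02 ** 50 - 1
t_low = 5400 / (50 * growth)
t_high = 6300 / (50 * growth)

# 50th-lap times when each athlete takes exactly 1.5 hours in total.
lap50_a = a_low + 49 * 2
lap50_b = t_low * 1.02 ** 49

print(a_low, a_high, round(t_low, 3), round(t_high, 3))
print(lap50_a, round(lap50_b), round(lap50_b) - lap50_a)
```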

A488 (9740 N2015/II/4)(a) Let P (k) be the following proposition:

∑_{r=1}^{k} r (r + 2) (r + 5) = (1/12) k (k + 1) (3k^2 + 31k + 74).

We show that P (1) is true:

∑_{r=1}^{1} r (r + 2) (r + 5) = 1 × 3 × 6 = 18 = (1/12) × 1 × 2 × 108 = (1/12) ⋅ 1 (1 + 1) (3 ⋅ 1^2 + 31 ⋅ 1 + 74). ✓

We next show that for all j ∈ N, if P (j) is true, then P (j + 1) is also true:

∑_{r=1}^{j+1} r (r + 2) (r + 5) = ∑_{r=1}^{j} r (r + 2) (r + 5) + (j + 1) (j + 3) (j + 6)
= (1/12) j (j + 1) (3j^2 + 31j + 74) + (j + 1) (j + 3) (j + 6)   (by P (j))
= [(j + 1)/12] (3j^3 + 31j^2 + 74j) + (j + 1) (j^2 + 9j + 18)
= [(j + 1)/12] (3j^3 + 31j^2 + 74j + 12j^2 + 108j + 216)
= [(j + 1)/12] (3j^3 + 43j^2 + 182j + 216)
= (1/12) (j + 1) (j + 2) [3 (j + 1)^2 + 31 (j + 1) + 74]. ✓
(b)(i) Write:

A / (2r + 1) + B / (2r + 3) = [(2r + 3) A + (2r + 1) B] / [(2r + 1) (2r + 3)] = [(2A + 2B) r + 3A + B] / (4r^2 + 8r + 3).

So 2A + 2B = 0 (1) and 3A + B = 2 (2).

2 × (2) minus (1) yields 4A = 4 or A = 1 and thus B = −1. Hence:

2 / (4r^2 + 8r + 3) = 1 / (2r + 1) − 1 / (2r + 3).

(ii) ∑_{r=1}^{n} 2 / (4r^2 + 8r + 3) = ∑_{r=1}^{n} [1 / (2r + 1) − 1 / (2r + 3)]
= (1/3 − 1/5) + (1/5 − 1/7) + (1/7 − 1/9) + ⋅ ⋅ ⋅ + [1 / (2n + 1) − 1 / (2n + 3)]
= 1/3 − 1 / (2n + 3).

(iii) The sum to infinity is 1/3. Hence, the difference between S_n and the sum to infinity
is 1 / (2n + 3). Now:

1 / (2n + 3) ≤ 10^(−3) ⇐⇒ 1 000 ≤ 2n + 3 ⇐⇒ n ≥ 498.5.
So the smallest such n is 499.
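
The telescoped sum and the threshold n = 499 can both be checked exactly with fractions. This is a minimal sketch; the helper names are made up.

```python
from fractions import Fraction as F

def partial_sum(n):
    """Sum of 2 / (4r^2 + 8r + 3) for r = 1, ..., n, computed term by term."""
    return sum(F(2, 4 * r * r + 8 * r + 3) for r in range(1, n + 1))

def closed_form(n):
    """The telescoped result 1/3 - 1/(2n + 3)."""
    return F(1, 3) - F(1, 2 * n + 3)

def smallest_n_within(tolerance):
    """Smallest n whose gap to the sum to infinity, 1/(2n + 3), is at most tolerance."""
    n = 1
    while F(1, 2 * n + 3) > tolerance:
        n += 1
    return n

print(all(partial_sum(n) == closed_form(n) for n in range(1, 31)))
print(smallest_n_within(F(1, 1000)))
```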
A489 (9740 N2014/I/6)(i) Let P (k) be the following proposition:

p_k = (1/3) (7 − 4^k).

We show that P (1) is true:

p_1 = (1/3) (7 − 4) = (1/3) (7 − 4^1). ✓

We next show that for all j ∈ N, if P (j) is true, then P (j + 1) is also true:

p_(j+1) = 4p_j − 7
= (4/3) (7 − 4^j) − 7   (by P (j))
= (1/3) (28 − 4^(j+1) − 21)
= (1/3) (7 − 4^(j+1)). ✓

(ii) ∑_{r=1}^{n} p_r = ∑_{r=1}^{n} (1/3) (7 − 4^r) = (1/3) ∑_{r=1}^{n} (7 − 4^r) = (1/3) (∑_{r=1}^{n} 7 − ∑_{r=1}^{n} 4^r)
= (1/3) [7n − 4 (1 − 4^n) / (1 − 4)] = (1/3) [7n + 4 (1 − 4^n) / 3] = 7n/3 + 4/9 − 4^(n+1)/9.

(b)(i) As n → ∞, we have (n + 1)! → ∞ and so 1 / (n + 1)! → 0 and thus S_n = 1 − 1 / (n + 1)! → 1.
(ii) u_n = S_n − S_(n−1) = 1 − 1 / (n + 1)! − (1 − 1 / n!) = 1 / n! − 1 / (n + 1)!
= (n + 1) / (n + 1)! − 1 / (n + 1)! = n / (n + 1)!.

A490 (9740 N2014/II/3)(i)(a) She runs 8 m in Stage 1 and an additional 8 m in each

subsequent stage. So, in Stage n, she runs 8n m. Altogether then, the distance run in the
first 10 stages is:
(First term + Last term) × (Number of terms) / 2 = (8 + 10 ⋅ 8) × 10/2 = 440 m.
(b) The distance run in the first n stages is:

(First term + Last term) × (Number of terms) / 2 = (8 + 8n) × n/2 = 4n + 4n^2 m.

Write 4n + 4n^2 ≥ 5 000 or n^2 + n − 1 250 ≥ 0.   (1)

By the quadratic formula, the roots of n^2 + n − 1 250 = 0 are:

n = [−1 ± √(1^2 − 4 (1) (−1 250))] / [2 (1)] = −0.5 ± 0.5√5 001 ≈ 34.859, −35.859.

Hence, (1) holds if and only if n ≤ −35.859 or n ≥ 34.859.

Thus, the minimum number of stages to complete in order to have run at least 5 km is 35.
(ii) The distance run in the nth stage is 2^(n−1) ⋅ 8 m. Thus, the distance run in the first n
stages is:

∑_{k=1}^{n} (2^(k−1) ⋅ 8) = 8 ∑_{k=1}^{n} 2^(k−1) = 8 (2^n − 1) / (2 − 1) = 2^(n+3) − 8 m.

Let j be the largest integer such that 2^(j+3) − 8 < 10 000. Since 2^13 = 8 192 and 2^14 = 16 384,
we have j + 3 = 13 or j = 10.
So, the distance run after completing exactly 10 stages is 2^13 − 8 = 8 184 m.
So, at the instant at which she has run exactly 10 km or 10 000 m, she has completed 1 816 m
of the 11th stage. Since Stage 11 is 2^(11−1) ⋅ 8 = 8 192 m long, at this instant, she will not
even have completed half of Stage 11. Thus, at this instant, she is 1 816 m away from O and
running away from O.
A491 (9740 N2013/I/7)(i) The nth piece is p = (2/3)^(n−1) × 128 cm long.   (1)

Applying ln to (1), we get:

ln p = ln [(2/3)^(n−1) × 128] = ln (2/3)^(n−1) + ln 128
= (n − 1) ln (2/3) + ln 2^7 = (n − 1) (ln 2 − ln 3) + 7 ln 2 = (n + 6) ln 2 + (−n + 1) ln 3.

Thus, A = 1, B = 6, C = −1, and D = 1.

(ii) Let S_n be the total length of string cut off after cutting off n pieces. Then:

S_n = ∑_{k=1}^{n} (2/3)^(k−1) × 128 = 128 [1 − (2/3)^n] / (1 − 2/3) = 384 − 384 (2/3)^n.

Thus, as n → ∞, S_n = 384 − 384 (2/3)^n → 384.
(iii) Let j be the smallest integer such that S_j > 380. Then:

S_j = 384 − 384 (2/3)^j > 380 or 4 > 384 (2/3)^j or (3/2)^j > 384/4 = 96 or
j ln (3/2) > ln 96 or j > ln 96 / ln (3/2) ≈ 11.257.

Thus, the minimum number of pieces one must cut off in order for the length cut off to
exceed 380 cm is j = 12.
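
Here is a small sketch that confirms j = 12 by brute force with exact fractions, and also evaluates the logarithmic bound. The function names are made up for this illustration.

```python
import math
from fractions import Fraction as F

def cut_length(n):
    """Total length (cm) cut off after n pieces: 384 - 384 (2/3)^n."""
    return 384 - 384 * F(2, 3) ** n

def smallest_j(threshold=380):
    """Smallest j for which the total length cut off exceeds the threshold."""
    j = 1
    while cut_length(j) <= threshold:
        j += 1
    return j

bound = math.log(96) / math.log(1.5)   # j must exceed this value
print(smallest_j(), round(bound, 3))
```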
A492 (9740 N2013/I/9)(i) Let P (k) be the following proposition:

∑_{r=1}^{k} r (2r^2 + 1) = (1/2) k (k + 1) (k^2 + k + 1).

We show that P (1) is true:

∑_{r=1}^{1} r (2r^2 + 1) = 1 (2 ⋅ 1^2 + 1) = 3 = (1/2) ⋅ 1 ⋅ 2 ⋅ 3 = (1/2) ⋅ 1 (1 + 1) (1^2 + 1 + 1). ✓

We next show that for all j ∈ N, if P (j) is true, then P (j + 1) is also true:


∑_{r=1}^{j+1} r (2r^2 + 1) = ∑_{r=1}^{j} r (2r^2 + 1) + (j + 1) [2 (j + 1)^2 + 1]
= (1/2) j (j + 1) (j^2 + j + 1) + (j + 1) (2j^2 + 4j + 3)   (by P (j))
= [(j + 1)/2] (j^3 + j^2 + j) + [(j + 1)/2] (4j^2 + 8j + 6)
= [(j + 1)/2] (j^3 + 5j^2 + 9j + 6)
= (1/2) (j + 1) (j + 2) (j^2 + aj + b)
= (1/2) (j + 1) (j + 2) (j^2 + 3j + 3)
= (1/2) (j + 1) (j + 2) [(j + 1)^2 + (j + 1) + 1]. ✓

f (r) − f (r − 1) = (2r^3 + 3r^2 + r + 24) − [2 (r − 1)^3 + 3 (r − 1)^2 + (r − 1) + 24]
= (2r^3 + 3r^2 + r + 24) − [2 (r^3 − 3r^2 + 3r − 1) + 3 (r^2 − 2r + 1) + (r − 1) + 24]
= 6r^2.

So, a = 6. And hence:

∑_{r=1}^{n} r^2 = (1/6) ∑_{r=1}^{n} [f (r) − f (r − 1)] = (1/6) [f (1) − f (0) + f (2) − f (1) + ⋅ ⋅ ⋅ + f (n) − f (n − 1)]
= [f (n) − f (0)] / 6 = (2n^3 + 3n^2 + n + 24 − 24) / 6 = (2n^3 + 3n^2 + n) / 6 = n (n + 1) (2n + 1) / 6.

∑_{r=1}^{n} f (r) = ∑_{r=1}^{n} (2r^3 + 3r^2 + r + 24) = ∑_{r=1}^{n} [r (2r^2 + 1) + 3r^2 + 24]
= ∑_{r=1}^{n} [r (2r^2 + 1)] + 3 ∑_{r=1}^{n} r^2 + ∑_{r=1}^{n} 24
= (1/2) n (n + 1) (n^2 + n + 1) + n (n + 1) (2n + 1) / 2 + 24n.

A493 (9740 N2012/I/3)(i) u_2 = (3 ⋅ 2 − 1) / 6 = 5/6 and u_3 = (3 ⋅ 5/6 − 1) / 6 = 1/4.
(ii) As n → ∞, u_(n+1) − u_n → 0 ⇐⇒ (3u_n − 1) / 6 − u_n = −u_n / 2 − 1/6 → 0 ⇐⇒ u_n → −1/3.
(iii) Let P (k) be the following proposition:

u_k = (14/3) (1/2)^k − 1/3.

We show that P (1) is true:

u_1 = 2 = 7/3 − 1/3 = (14/3) (1/2)^1 − 1/3. ✓

We next show that for all j ∈ N, if P (j) is true, then P (j + 1) is also true:

u_(j+1) = u_j + (u_(j+1) − u_j) = u_j + (−u_j / 2 − 1/6) = u_j / 2 − 1/6
= [(14/3) (1/2)^j − 1/3] / 2 − 1/6   (by P (j))
= (14/3) (1/2)^(j+1) − 1/3. ✓

A494 (9740 N2012/II/4)(i) On the first day of the nth month, she deposits 100 +
(n − 1) 10 = 10n + 90. Hence, through the first day of the nth month, her account has:
(First term + Last term) × (Number of terms) / 2 = (100 + 10n + 90) × n/2 = 5n^2 + 95n.

Let j be the smallest positive integer such that 5j^2 + 95j > 5 000 or j^2 + 19j − 1 000 > 0. By
the quadratic formula, the two roots of x^2 + 19x − 1 000 = 0 are:

x = [−19 ± √(19^2 − 4 (1) (−1 000))] / [2 (1)] = (−19 ± √4 361) / 2 ≈ −42.519, 23.519.

Hence, j = 24. Thus, her account first exceeded $5 000 on the 24th month — that is, on
December 1 2002.
(ii) Let Sn be the amount in his account on the last day of each month, after interest has
been paid.
Then S1 = 1.005 ⋅ 100 and Sn+1 = 1.005 (Sn + 100).
So, in general:
S_n = 1.005^n ⋅ 100 + 1.005^(n−1) ⋅ 100 + ⋅ ⋅ ⋅ + 1.005 ⋅ 100 = 1.005 ⋅ 100 ⋅ (1.005^n − 1) / (1.005 − 1) = 20 100 (1.005^n − 1).

Let j be the smallest positive integer such that S_j = 20 100 (1.005^j − 1) > 5 000 or 201 ⋅
1.005^j > 251 or:

1.005^j > 251/201 or j ln 1.005 > ln (251/201) or j > ln (251/201) / ln 1.005 ≈ 44.541.
Hence, j = 45. Thus, his account first exceeded $5 000 in the 45th month — that is, in
September 2004.
(iii) Let r be the interest rate. Then given r, the amount in the account on 2 December
2003 is:

100 (1 + r)^35 + 100 (1 + r)^34 + ⋅ ⋅ ⋅ + 100 (1 + r) + 100,

where:
• the term 100 (1 + r)^35 is the Jan 2001 deposit, which has earned interest 35 times;
• the term 100 (1 + r)^34 is the Feb 2001 deposit, which has earned interest 34 times;
• the term 100 (1 + r) is the Nov 2003 deposit, which has earned interest once; and
• the final term 100 is the Dec 2003 deposit, which has not earned interest.

We want the above amount to equal 5 000. So, write 100 [(1 + r)^36 − 1] / r = 5 000 or:

(1 + r)^36 − 1 = 50r.   (1)

Solving (1) by calculator, r ≈ 0.01796 = 1.796%.

Note that (1) is an equation we haven’t learnt to solve in H2 Maths, so you’ll need to use
your calculator here.
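
For readers curious about what the calculator is doing, here is a rough sketch that finds r by bisection on the account balance. The bracket [0.001, 0.1] is an assumption (the balance is below 5 000 at the lower end and above it at the upper end), and the function names are made up.

```python
def balance(r, deposits=36, amount=100):
    """Balance just after the last deposit; the k-th most recent deposit has earned interest k times."""
    return sum(amount * (1 + r) ** k for k in range(deposits))

def solve_rate(target=5000, lo=0.001, hi=0.1, tol=1e-12):
    """Bisection on r, using the fact that the balance increases with r."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if balance(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

rate = solve_rate()
print(round(rate, 5), round(balance(rate), 2))
```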

A495 (9740 N2011/I/6)(i)

sin (r + 1/2) θ − sin (r − 1/2) θ = sin rθ cos (1/2) θ + cos rθ sin (1/2) θ − [sin rθ cos (1/2) θ − cos rθ sin (1/2) θ] = 2 cos rθ sin (1/2) θ.

(ii) Rearranging (i), we have cos rθ = [sin (r + 1/2) θ − sin (r − 1/2) θ] / [2 sin (1/2) θ]. And so:

∑_{r=1}^{n} cos rθ = ∑_{r=1}^{n} [sin (r + 1/2) θ − sin (r − 1/2) θ] / [2 sin (1/2) θ]
= [1 / (2 sin (1/2) θ)] [sin (3/2) θ − sin (1/2) θ + sin (5/2) θ − sin (3/2) θ + ⋅ ⋅ ⋅ + sin ((2n + 1)/2) θ − sin ((2n − 1)/2) θ]
= [1 / (2 sin (1/2) θ)] [sin ((2n + 1)/2) θ − sin (1/2) θ] = sin (n + 1/2) θ / [2 sin (1/2) θ] − 1/2.

(iii) Let P (k) be the following proposition:

∑_{r=1}^{k} sin rθ = [cos (1/2) θ − cos (k + 1/2) θ] / [2 sin (1/2) θ].

We show that P (1) is true:

∑_{r=1}^{1} sin rθ = sin θ = 2 sin (1/2) θ sin θ / [2 sin (1/2) θ] = −2 sin (−(1/2) θ) sin θ / [2 sin (1/2) θ]
= −2 sin [(1/2 − 3/2) θ / 2] sin [(1/2 + 3/2) θ / 2] / [2 sin (1/2) θ]
= [cos (1/2) θ − cos (3/2) θ] / [2 sin (1/2) θ], ✓

where the last step uses the last formula printed under Trigonometry in List MF26, p. 3.
We next show that for all j ∈ N, if P (j) is true, then P (j + 1) is also true:

∑_{r=1}^{j+1} sin rθ = ∑_{r=1}^{j} sin rθ + sin (j + 1) θ
= [cos (1/2) θ − cos (j + 1/2) θ] / [2 sin (1/2) θ] + sin (j + 1) θ   (by P (j))
= [cos (1/2) θ − cos (j + 1/2) θ + 2 sin (1/2) θ sin (j + 1) θ] / [2 sin (1/2) θ]
= [cos (1/2) θ − cos (j + 1/2) θ + cos (j + 1/2) θ − cos (j + 3/2) θ] / [2 sin (1/2) θ]
= [cos (1/2) θ − cos (j + 3/2) θ] / [2 sin (1/2) θ], ✓

as desired. (Again, the step that rewrites 2 sin (1/2) θ sin (j + 1) θ uses the same trigonometric identity as before.)

A496 (9740 N2011/I/9)(i) The depth drilled on the nth day is 256 − 7 (n − 1) = 263 − 7n
metres and the depth drilled through the nth day is:
D_n = (First term + Last term) × (Number of terms) / 2 = (256 + 263 − 7n) × n/2 = (519/2) n − (7/2) n^2.

Thus, the depth drilled on the 10th day is 263 − 7 ⋅ 10 = 193 metres.
Let j be the smallest integer such that 263 − 7j < 10 or j > 253/7 ≈ 36.1. Then j = 37. So,
the total depth drilled is:

D_j = D_37 = (519/2) ⋅ 37 − (7/2) ⋅ 37^2 = 4 810 metres.
(ii) Through the nth day, the depth drilled is:

d_n = ∑_{r=1}^{n} 256 (8/9)^(r−1) = 256 [1 − (8/9)^n] / (1 − 8/9) = 2 304 [1 − (8/9)^n] metres.

By “theoretical maximum”, the writers of this question probably mean this:

lim_{n→∞} ∑_{r=1}^{n} 256 (8/9)^(r−1) = lim_{n→∞} 2 304 [1 − (8/9)^n] = 2 304 metres.

Let j be the smallest integer such that d_j > 0.99 ⋅ 2 304 or 2 304 [1 − (8/9)^j] > 0.99 ⋅ 2 304 or:

1 − (8/9)^j > 0.99 or 0.01 > (8/9)^j or −ln 100 > j ln (8/9) or j > ln 100 / ln (9/8) ≈ 39.1.

Thus, j = 40.
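
Both drilling models can be checked by brute force. The sketch below uses exact fractions; the function names are made up for this illustration.

```python
from fractions import Fraction as F

def arithmetic_day_depth(n):
    """Depth (m) drilled on day n under the arithmetic model."""
    return 263 - 7 * n

def first_day_below(limit=10):
    """First day on which less than `limit` metres is drilled."""
    n = 1
    while arithmetic_day_depth(n) >= limit:
        n += 1
    return n

def arithmetic_total(n):
    """Total depth after n days: (519/2) n - (7/2) n^2."""
    return F(519, 2) * n - F(7, 2) * n * n

def geometric_total(n):
    """Total depth after n days under the geometric model: 2304 [1 - (8/9)^n]."""
    return 2304 * (1 - F(8, 9) ** n)

def first_day_exceeding(fraction=F(99, 100)):
    """First day on which the geometric total exceeds the given fraction of 2304 m."""
    n = 1
    while geometric_total(n) <= fraction * 2304:
        n += 1
    return n

last_day = first_day_below()
print(last_day, arithmetic_total(last_day), first_day_exceeding())
```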
A497 (9740 N2010/I/3)(i) u_n = S_n − S_(n−1) = n (2n + c) − (n − 1) [2 (n − 1) + c]
= 2n^2 + cn − (2n^2 − 4n + 2 + cn − c) = 4n − 2 + c.

(ii) Since u_n = 4n − 2 + c and u_(n+1) = 4 (n + 1) − 2 + c = 4n + 2 + c, we have:

u_(n+1) = u_n + 4.

A498 (9740 N2010/II/2)(i) Let P (k) be the following proposition:

∑_{r=1}^{k} r (r + 2) = (1/6) k (k + 1) (2k + 7).

We show that P (1) is true:

∑_{r=1}^{1} r (r + 2) = 1 ⋅ 3 = 3 = (1/6) ⋅ 1 ⋅ 2 ⋅ 9 = (1/6) ⋅ 1 (1 + 1) (2 ⋅ 1 + 7). ✓

We next show that for all j ∈ N, if P (j) is true, then P (j + 1) is also true:

∑_{r=1}^{j+1} r (r + 2) = ∑_{r=1}^{j} r (r + 2) + (j + 1) (j + 3)
= (1/6) j (j + 1) (2j + 7) + (j + 1) (j + 3)   (by P (j))
= [(j + 1)/6] (2j^2 + 7j) + [(j + 1)/6] (6j + 18)
= [(j + 1)/6] (2j^2 + 13j + 18)
= [(j + 1)/6] (j + 2) (2j + 9)
= (1/6) (j + 1) (j + 1 + 1) [2 (j + 1) + 7]. ✓
(ii)(a) Observe that:

1 / [r (r + 2)] = 0.5/r − 0.5/(r + 2).

Hence:

∑_{r=1}^{n} 1 / [r (r + 2)] = ∑_{r=1}^{n} [0.5/r − 0.5/(r + 2)]
= 0.5/1 − 0.5/3 + 0.5/2 − 0.5/4 + 0.5/3 − 0.5/5 + ⋅ ⋅ ⋅ + 0.5/(n − 1) − 0.5/(n + 1) + 0.5/n − 0.5/(n + 2)
= 0.5/1 + 0.5/2 − 0.5/(n + 1) − 0.5/(n + 2) = 3/4 − 1/[2 (n + 1)] − 1/[2 (n + 2)].
(b) In the formula found in (ii)(a), as n → ∞, the second and third terms vanish. Hence,
as n → ∞, ∑_{r=1}^{n} 1 / [r (r + 2)] → 3/4.
A499 (9740 N2009/I/3)(i)

1/(n − 1) − 2/n + 1/(n + 1) = [n (n + 1) − 2 (n − 1) (n + 1) + (n − 1) n] / [(n − 1) n (n + 1)]
= [n^2 + n − 2 (n^2 − 1) + n^2 − n] / [n (n^2 − 1)] = −2 (−1) / (n^3 − n) = 2 / (n^3 − n).

So, A = 2.

(ii) ∑_{r=2}^{n} 1/(r^3 − r) = (1/2) ∑_{r=2}^{n} [1/(r − 1) − 2/r + 1/(r + 1)]
= (1/2) [1/1 − 2/2 + 1/3 + 1/2 − 2/3 + 1/4 + 1/3 − 2/4 + 1/5 + ⋅ ⋅ ⋅ + 1/(n − 2) − 2/(n − 1) + 1/n + 1/(n − 1) − 2/n + 1/(n + 1)].

Observe that the terms with denominators 3 through n − 1 are cancelled out. Hence:

∑_{r=2}^{n} 1/(r^3 − r) = (1/2) [1/1 − 2/2 + 1/2 + 1/n − 2/n + 1/(n + 1)] = (1/2) [1/2 − 1/n + 1/(n + 1)] = 1/4 − 1/(2n) + 1/[2 (n + 1)].

(iii) In the formula found in (ii), as n → ∞, the second and third terms vanish. Hence, as
n → ∞, ∑_{r=2}^{n} 1/(r^3 − r) → 1/4.
A500 (9740 N2009/I/5)(i) Let P (k) be the following proposition:
∑_{r=1}^{k} r^2 = (1/6) k (k + 1) (2k + 1).

We show that P (1) is true:

∑_{r=1}^{1} r^2 = 1^2 = 1 = (1/6) ⋅ 1 ⋅ 2 ⋅ 3 = (1/6) ⋅ 1 (1 + 1) (2 ⋅ 1 + 1). ✓

We next show that for all j ∈ N, if P (j) is true, then P (j + 1) is also true:

∑_{r=1}^{j+1} r^2 = ∑_{r=1}^{j} r^2 + (j + 1)^2 = (1/6) j (j + 1) (2j + 1) + (j + 1)^2   (by P (j))
= [(j + 1)/6] (2j^2 + j) + [(j + 1)/6] (6j + 6) = [(j + 1)/6] (2j^2 + 7j + 6)
= [(j + 1)/6] (j + 2) (2j + 3) = (1/6) (j + 1) (j + 1 + 1) [2 (j + 1) + 1]. ✓

(ii) ∑_{r=n+1}^{2n} r^2 = ∑_{r=1}^{2n} r^2 − ∑_{r=1}^{n} r^2 = (1/6) 2n (2n + 1) (2 ⋅ 2n + 1) − (1/6) n (n + 1) (2n + 1)
= [n (2n + 1)/6] (8n + 2) − [n (2n + 1)/6] (n + 1) = [n (2n + 1)/6] (7n + 1).
A501 (9740 N2009/I/8)(i) Let r be the common ratio. Then the 25th bar has length
20r^24 = 5 cm and so r = (5/20)^(1/24) = 0.5^(1/12).
In the limit, the total length of all the bars is 20 / (1 − r) ≈ 356.343 cm.
And so indeed, no matter how many bars there are, their total length cannot exceed 357 cm.
(ii) The total length is L = 20 (1 − r^25) / (1 − r) ≈ 272.257 cm.
The length of the 13th bar is 20r^12 = 20 ⋅ (0.5^(1/12))^12 = 10 cm.

(iii) The total length L is:

L = (Length of 25th bar + Length of 1st bar) × (Number of terms) / 2 = (5 + 5 + 24d) × 25/2 = 125 + 300d = 272.257.

So, d ≈ 0.491 cm and the length of the longest bar (the 1st bar) is 5 + 24d ≈ 16.781 cm.
A502 (9740 N2008/I/2). Let P (k) be the following proposition:
S_k = ∑_{r=1}^{k} u_r = (1/6) k (k + 1) (4k + 5).

We show that P (1) is true:

S_1 = ∑_{r=1}^{1} u_r = u_1 = 1 (2 ⋅ 1 + 1) = 3 = (1/6) ⋅ 1 ⋅ 2 ⋅ 9 = (1/6) ⋅ 1 (1 + 1) (4 ⋅ 1 + 5). ✓

We next show that for all j ∈ N, if P (j) is true, then P (j + 1) is also true:

S_(j+1) = ∑_{r=1}^{j} u_r + (j + 1) (2j + 3) = (1/6) j (j + 1) (4j + 5) + (j + 1) (2j + 3)   (by P (j))
= [(j + 1)/6] (4j^2 + 5j) + [(j + 1)/6] (12j + 18)
= [(j + 1)/6] (4j^2 + 17j + 18) = [(j + 1)/6] (j + 2) (4j + 9) = (1/6) (j + 1) (j + 1 + 1) [4 (j + 1) + 5]. ✓
A503 (9740 N2008/I/10)(i) On the first day of the nth month, she saves 10 + 3 (n − 1) =
7 + 3n dollars.
Thus, the total saved through the first day of the nth month is:
(First term + Last term) × (Number of terms) / 2 = (10 + 7 + 3n) × n/2 = (3/2) n^2 + (17/2) n.

Let j be the smallest positive integer such that (3/2) j^2 + (17/2) j > 2 000 or 3j^2 + 17j − 4 000 > 0.
By the quadratic formula, the solutions to 3x^2 + 17x − 4 000 = 0 are:

x = [−17 ± √(17^2 − 4 (3) (−4 000))] / [2 (3)] = (−17 ± √48 289) / 6 ≈ −39.458, 33.791.

Thus, j = 34. So, she will have saved over $2 000 on 1 October 2011.

Alternatively: In the 1st month (Jan 2009), she has saved 10 dollars. In the 2nd (Feb 2009), she has saved
10 + (10 + 1 × 3) dollars. So in the nth month, she has saved 10 + (10 + 1 × 3) + (10 + 2 × 3) +
⋅ ⋅ ⋅ + [10 + (n − 1) × 3] = [20 + (n − 1) × 3] × n/2 = 8.5n + 1.5n^2 dollars. Set 8.5n + 1.5n^2 = 2 000
and solve:

n = [−8.5 ± √(8.5^2 − 4 (1.5) (−2 000))] / 3 = (−8.5 ± 109.874) / 3.

We can ignore the negative root. So n = (−8.5 + 109.874) / 3 ≈ 33.791. So it is only in the 34th
month that she has saved over $2 000. That’s 1 October 2011.
(ii)(a) At the end of 2 years, her original $10 has earned 10 × 1.02^24 − 10 ≈ 6.084 dollars in
compound interest.
(b) Just after interest has been paid on the last day of the nth month, the total in her
account is:

10 ⋅ 1.02^n + 10 ⋅ 1.02^(n−1) + ⋅ ⋅ ⋅ + 10 ⋅ 1.02 = 10 ⋅ 1.02 ⋅ (1.02^n − 1) / (1.02 − 1) = 510 (1.02^n − 1) dollars.

Hence, at the end of 2 years, just after interest has been paid on 31 December 2010, the
total in her account is:

510 (1.02^24 − 1) ≈ 310.303 dollars.

(c) Let j be the smallest positive integer such that 510 (1.02^j − 1) > 2 000 or 510 ⋅ 1.02^j > 2 510 or:

1.02^j > 251/51 or j ln 1.02 > ln (251/51) or j > ln (251/51) ÷ ln 1.02 ≈ 80.476.
Thus, it is only after j = 81 complete months that her total savings first exceed $2 000.
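
The month counts in (i) and (ii)(c) can be confirmed by simply simulating the savings. This is a sketch with made-up function names; the compound-interest part uses floating-point arithmetic.

```python
def months_simple(target=2000):
    """Months needed for deposits of 10, 13, 16, ... (no interest) to exceed the target."""
    total, n = 0, 0
    while total <= target:
        n += 1
        total += 10 + 3 * (n - 1)
    return n

def balance_compound(n):
    """Balance after n months of $10 deposits at 2% per month: 510 (1.02^n - 1)."""
    return 510 * (1.02 ** n - 1)

def months_compound(target=2000):
    """Complete months needed for the compound balance to exceed the target."""
    n = 0
    while balance_compound(n) <= target:
        n += 1
    return n

print(months_simple(), round(balance_compound(24), 3), months_compound())
```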
A504 (9233 N2008/II/2). Let an and gn denote the nth terms of the arithmetic and
geometric progressions. Let d and r be the corresponding common difference and ratio. We
are given that:
a_2 + g_2 = 1/2 or 1/2 + d + r/2 = 1/2 or d + r/2 = 0.   (1)

And: a_3 + g_3 = 1/8 or 1/2 + 2d + r^2/2 = 1/8 or 2d + r^2/2 = −3/8.   (2)

2 × (1) minus (2) yields r − r^2/2 = 3/8 or 4r^2 − 8r + 3 = 0 or:

r = [8 ± √((−8)^2 − 4 (4) (3))] / [2 (4)] = 1 ± √(1 − 3/4) = 1 ± 1/2 = 1/2, 3/2.

Since the geometric progression converges, ∣r∣ < 1 and so r = 1/2. And thus, its sum to
infinity is:

g_1 / (1 − r) = (1/2) / (1 − 1/2) = 1.

A505 (9740 N2007/I/9)(i) Using your graphing calculator, α ≈ 0.619 and β ≈ 1.512
(ii) Suppose x_n → L. Then x_(n+1) − x_n → 0. Or:

x_(n+1) − x_n = (1/3) e^(x_n) − x_n → 0 or (1/3) e^L − L = 0.

Equivalently, L equals a solution to e^x − 3x = 0. So, L equals α or β.
Remark 162. The answer to (ii) takes for granted certain results that students were not
taught (even under the old 9740 syllabus). This question should never have been asked.
(iii) If x_1 = 0, then x_2 = 1/3, x_3 ≈ 0.465, x_4 ≈ 0.531, x_5 ≈ 0.567, x_6 ≈ 0.588, . . . , x_15 ≈ 0.619.
So the sequence converges to α ≈ 0.619.
If x_1 = 1, then x_2 ≈ 0.906, x_3 ≈ 0.825, x_4 ≈ 0.761, x_5 ≈ 0.713, x_6 ≈ 0.680, . . . , x_17 ≈ 0.619. So
the sequence converges to α ≈ 0.619.
If x_1 = 2, then x_2 ≈ 2.463, x_3 ≈ 3.913, x_4 ≈ 16.690, x_5 ≈ 5 903 230.335. “Clearly”, the sequence
diverges.
(iv) From the graph of y = e^x − 3x, we observe that if α < x_n < β, then e^(x_n) − 3x_n < 0 or
(1/3) e^(x_n) < x_n or x_(n+1) < x_n.
Similarly, we observe that if x_n < α or x_n > β, then e^(x_n) − 3x_n > 0 or (1/3) e^(x_n) > x_n or x_(n+1) > x_n.
(v) If x_n > β ≈ 1.512, then (iv) tells us that x_(n+1) > x_n and therefore that the sequence
diverges. We saw this with x_1 = 2 in (iii).
If x_n ∈ (α, β) ≈ (0.619, 1.512), then (iv) tells us that x_(n+1) < x_n. We saw this with x_1 = 1 in
(iii).
If x_n < α ≈ 0.619, then (iv) tells us that x_(n+1) > x_n. We saw this with x_1 = 0 in (iii).
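
The three behaviours in (iii) are easy to reproduce numerically. Below is a minimal sketch of the iteration x_(n+1) = (1/3) e^(x_n); the cap of 10^6 is an arbitrary cut-off added only to stop the divergent case before it overflows.

```python
import math

def iterate(x1, steps, cap=1e6):
    """Apply x -> e^x / 3 up to `steps` times, stopping early once x exceeds `cap`."""
    x = x1
    for _ in range(steps):
        if x > cap:
            break
        x = math.exp(x) / 3
    return x

for start in (0, 1, 2):
    print(start, iterate(start, 200))
```

Starting from 0 or 1, the printed value settles near α ≈ 0.619. Starting from 2, it blows past the cap after a handful of steps.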
A506 (9740 N2007/I/10)(i) We are given that the first term of the geometric progression
equals a. We are also given:

ra = a + 3d,   (1)
r^2 a = a + 5d.   (2)

Rearranging (1) and (2) so that d is on one side, we get:

d = a (r − 1) / 3 = a (r^2 − 1) / 5 or 5r − 5 = 3r^2 − 3 or 3r^2 − 5r + 2 = 0.
3 5

(ii) 3r^2 − 5r + 2 = (3r − 2) (r − 1) = 0 and so r = 2/3 or r = 1. But if r = 1, then by (1), d = 0,
contradicting our assumption that d ≠ 0. Hence, r = 2/3. Since ∣r∣ < 1, the geometric series
converges to:

a / (1 − r) = a / (1 − 2/3) = 3a.

(iii) From (1), d = (ra − a) / 3 = (−a/3) / 3 = −a/9. And now:

S = (n/2) [a + a + (n − 1) d] = an + [(n − 1)/2] nd = an [1 − (n − 1)/18] = an (19 − n) / 18.

S > 4a ⇐⇒ an (19 − n) / 18 > 4a ⇐⇒ n (19 − n) > 72 ⇐⇒ n^2 − 19n + 72 < 0.   (3)

By the quadratic formula, x^2 − 19x + 72 = 0 has solutions:

x = [19 ± √((−19)^2 − 4 (1) (72))] / [2 (1)] = (19 ± √73) / 2 ≈ 5.228, 13.772.

So, (3) holds if and only if 5.228 < n < 13.772. Of course, n must be an integer. And so, the
set of possible values of n for which (3) or S > 4a holds is {6, 7, 8, 9, 10, 11, 12, 13}.
A507 (9740 N2007/II/2)(i) Let P (k) be the following proposition:

u_k = 1/k^2.

We show that P (1) is true:

u_1 = 1 = 1/1^2. ✓

We next show that for all j ∈ N, if P (j) is true, then P (j + 1) is also true:

u_(j+1) = u_j − (2j + 1) / [j^2 (j + 1)^2] = 1/j^2 − (2j + 1) / [j^2 (j + 1)^2]   (by P (j))
= [(j + 1)^2 − (2j + 1)] / [j^2 (j + 1)^2] = j^2 / [j^2 (j + 1)^2] = 1/(j + 1)^2. ✓

(ii) ∑_{n=1}^{N} (2n + 1) / [n^2 (n + 1)^2] = ∑_{n=1}^{N} (u_n − u_(n+1))
= u_1 − u_2 + u_2 − u_3 + ⋅ ⋅ ⋅ + u_N − u_(N+1)
= u_1 − u_(N+1) = 1 − 1/(N + 1)^2.
(iii) In the formula just found in (ii), as N → ∞, the second term vanishes, so that the
series converges to 1.
(iv) First observe that (2n + 1) / [n^2 (n + 1)^2] = [2 (n + 1) − 1] / [(n + 1)^2 (n + 1 − 1)^2]. Thus:

∑_{n=2}^{N} (2n − 1) / [n^2 (n − 1)^2] = ∑_{n=1}^{N−1} (2n + 1) / [n^2 (n + 1)^2] = 1 − 1/N^2.
A508 (9233 N2007/I/14). Let P (k) be the following proposition:

∑_{r=1}^{k} sin rx = [cos (1/2) x − cos (k + 1/2) x] / [2 sin (1/2) x].

We show that P (1) is true:

∑_{r=1}^{1} sin rx = sin x = 2 sin x sin (1/2) x / [2 sin (1/2) x] = [cos (1/2) x − cos (1 + 1/2) x] / [2 sin (1/2) x]. ✓

(The last step used the last trigonometric identity printed on List MF26, p. 3.)

We next show that for all j ∈ N, if P (j) is true, then P (j + 1) is also true:

∑_{r=1}^{j+1} sin rx = ∑_{r=1}^{j} sin rx + sin (j + 1) x
= [cos (1/2) x − cos (j + 1/2) x] / [2 sin (1/2) x] + sin (j + 1) x   (by P (j))
= [cos (1/2) x − cos (j + 1/2) x + 2 sin (1/2) x sin (j + 1) x] / [2 sin (1/2) x]
= [cos (1/2) x − cos (j + 1/2) x + cos (j + 1/2) x − cos (j + 3/2) x] / [2 sin (1/2) x]
= [cos (1/2) x − cos (j + 1 + 1/2) x] / [2 sin (1/2) x]. ✓

(Again, the step that rewrites 2 sin (1/2) x sin (j + 1) x used the same trigonometric identity as before.)


A509 (9233 N2007/II/1). ∑_{r=1}^{2n} 3^(r+2) = 9 ∑_{r=1}^{2n} 3^r = 9 ⋅ 3 ⋅ (3^(2n) − 1) / (3 − 1) = (27/2) (3^(2n) − 1).
A510 (9233 N2006/I/1). The first term is S_1 = 6 − 2/3^(1−1) = 4.
The common ratio is (S_2 − S_1) ÷ S_1 = S_2 ÷ S_1 − 1 = (6 − 2/3^(2−1)) ÷ 4 − 1 = 4/3 − 1 = 1/3.
A511 (9233 N2006/I/11)(i) Let P (k) be the following proposition:
∑_{r=1}^{k} r^3 = (1/4) k^2 (k + 1)^2.

We show that P (1) is true:

∑_{r=1}^{1} r^3 = 1^3 = 1 = (1/4) ⋅ 1 ⋅ 4 = (1/4) ⋅ 1^2 (1 + 1)^2. ✓

We next show that for all j ∈ N, if P (j) is true, then P (j + 1) is also true:

∑_{r=1}^{j+1} r^3 = ∑_{r=1}^{j} r^3 + (j + 1)^3 = (1/4) j^2 (j + 1)^2 + (j + 1)^3   (by P (j))
= [(j + 1)^2/4] j^2 + [(j + 1)^2/4] (4j + 4) = [(j + 1)^2/4] (j^2 + 4j + 4) = [(j + 1)^2/4] (j + 2)^2. ✓

(ii) 2^3 + 4^3 + ⋅ ⋅ ⋅ + (2n)^3 = ∑_{r=1}^{n} (2r)^3 = 8 ∑_{r=1}^{n} r^3 = 2n^2 (n + 1)^2.

∑_{r=1}^{n} (2r − 1)^3 = 1^3 + 3^3 + ⋅ ⋅ ⋅ + (2n − 1)^3 = 1^3 + 2^3 + ⋅ ⋅ ⋅ + (2n)^3 − [2^3 + 4^3 + ⋅ ⋅ ⋅ + (2n)^3]
= ∑_{r=1}^{2n} r^3 − ∑_{r=1}^{n} (2r)^3 = (1/4) (2n)^2 (2n + 1)^2 − 2n^2 (n + 1)^2
= n^2 [4n^2 + 4n + 1 − (2n^2 + 4n + 2)] = n^2 (2n^2 − 1).


130.3. Ch. 111 Answers (Vectors)
A512 (9758 N2017/I/6)(i) Assuming t takes on all values in R, the vector equation
r = a + tb describes the line with direction vector b and which passes through the point
with position vector a.
(ii) The vector equation r ⋅ n = d describes the plane that has normal vector n and is of
distance ∣d∣ away from the origin. (See Corollary 23.)
(iii) Since b ⋅ n ≠ 0, the line is not parallel to the plane. Hence (Fact 107), the line and
plane intersect at exactly one point. This point is given by the solution of the following
system of (two vector) equations:

r = a + tb and r ⋅ n = d.

To solve these equations, plug the first into the second:

(a + t̂b) ⋅ n = d ⇐⇒ a ⋅ n + t̂ b ⋅ n = d ⇐⇒ t̂ = (d − a ⋅ n) / (b ⋅ n).
That is, t̂ solves the above system of equations.
And this solution corresponds to the point at which the line and plane intersect. This point
has position vector:
a + t̂b = a + [(d − a ⋅ n) / (b ⋅ n)] b.
A513 (9758 N2017/I/10)(i) The existing cable corresponds to the line described by:

r = (0, 0, 0) + λ (3, 1, −2) (λ ∈ R)

The new cable corresponds to the line with direction vector:

P Q = Q − P = (5, 7, a) − (1, 2, −1) = (4, 5, a + 1) .

And so, this line may be described by:

r = (1, 2, −1) + µ (4, 5, a + 1) (µ ∈ R).

If the two lines intersect, then there exist real numbers λ̂ and µ̂ such that (0, 0, 0) + λ̂ (3, 1, −2) = (1, 2, −1) + µ̂ (4, 5, a + 1), or:

3λ̂ = 1 + 4µ̂,   (1)
λ̂ = 2 + 5µ̂,   (2)
−2λ̂ = −1 + (a + 1) µ̂.   (3)

3 × (2) minus (1) yields 0 = 5 + 11µ̂ or µ̂ = −5/11. And now from (2), λ̂ = −3/11. If these values
of λ̂ and µ̂ satisfy (3), then:

−2 (−3/11) = −1 + (a + 1) (−5/11) or a = −22/5.

(ii) Let R = (0, 0, 0) + λ̃ (3, 1, −2) = λ̃ (3, 1, −2).
If ∠PRQ is right, then PR ⊥ QR or PR ⋅ QR = 0. But:

PR = R − P = λ̃ (3, 1, −2) − (1, 2, −1) = (3λ̃ − 1, λ̃ − 2, −2λ̃ + 1) and
QR = R − Q = λ̃ (3, 1, −2) − (5, 7, −3) = (3λ̃ − 5, λ̃ − 7, −2λ̃ + 3).

So: PR ⋅ QR = (3λ̃ − 1) (3λ̃ − 5) + (λ̃ − 2) (λ̃ − 7) + (−2λ̃ + 1) (−2λ̃ + 3)
= 14λ̃^2 − 35λ̃ + 22.

This is a quadratic expression in λ̃ with discriminant (−35)^2 − 4 (14) (22) = −7 < 0. Hence,
PR ⋅ QR ≠ 0, so PR is not perpendicular to QR and thus ∠PRQ cannot be right.
(iii) Let R = (0, 0, 0) + λ̄ (3, 1, −2) = λ̄ (3, 1, −2).

∣PR∣ = √[(3λ̄ − 1)^2 + (λ̄ − 2)^2 + (−2λ̄ + 1)^2] = √(14λ̄^2 − 14λ̄ + 6).

∣PR∣ is minimised at “−b/2a” (Fact 20):
λ̄ = −(−14) / (2 ⋅ 14) = 0.5.
Hence, R = λ̄ (3, 1, −2) = (1.5, 0.5, −1) and ∣PR∣ = √(14λ̄^2 − 14λ̄ + 6) = √(3.5 − 7 + 6) = √2.5.
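
Parts (ii) and (iii) can be double-checked with a few dot products. The sketch below assumes P = (1, 2, −1), Q = (5, 7, −3) and the cable direction (3, 1, −2), as used in the answer; the variable names are made up.

```python
from fractions import Fraction as F

def dot(u, v):
    """Scalar product of two vectors given as tuples."""
    return sum(x * y for x, y in zip(u, v))

P, Q = (1, 2, -1), (5, 7, -3)
d = (3, 1, -2)   # direction vector of the existing cable (through the origin)

# With R = lam * d:  PR . QR = (d.d) lam^2 - ((P + Q).d) lam + P.Q
quad_a = dot(d, d)
quad_b = -dot(tuple(p + q for p, q in zip(P, Q)), d)
quad_c = dot(P, Q)
discriminant = quad_b ** 2 - 4 * quad_a * quad_c   # negative means no right angle

# The point on the cable closest to P is the foot of the perpendicular from P.
lam_closest = F(dot(P, d), dot(d, d))
R = tuple(lam_closest * c for c in d)
PR = tuple(r - p for r, p in zip(R, P))
dist_sq = dot(PR, PR)

print(discriminant, lam_closest, R, dist_sq)
```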
A514 (9740 N2016/I/5)(i) Method 1. u +v = (2 + a, −1, 2 + b), u −v = (2 − a, −1, 2 − b),
and so:

(u + v) × (u − v) = (2 + a, −1, 2 + b) × (2 − a, −1, 2 − b)
= (b − 2 + 2 + b, (2 + b) (2 − a) − (2 + a) (2 − b), −2 − a + 2 − a)
= (2b, 4b − 4a, −2a).

Method 2. Recall that the vector product has the following three properties (Fact 81): the
vector product is distributive and anti-commutative; moreover, the vector product
of a vector with itself is the zero vector. Hence:

(u + v) × (u − v) = u × (u − v) + v × (u − v) (Distributivity)
=u×u−u×v+v×u−v×v (“)
=u×u+v×u+v×u−v×v (Anti-commutativity)
=0+v×u+v×u−0 (Self vector product is zero)
= 2v × u.

But: v × u = (a, 0, b) × (2, −1, 2) = (b, 2b − 2a, −a).

So: (u + v) × (u − v) = 2v × u = (2b, 4b − 4a, −2a).
(ii) We are given that 2b = −2a ⇐⇒ b = −a.

So: (u + v) × (u − v) = (2b, 4b − 4a, −2a) = (−2a, −8a, −2a).

∣(u + v) × (u − v)∣ = √[(−2a)^2 + (−8a)^2 + (−2a)^2] = √(72a^2) = √72 ∣a∣.

If √72 ∣a∣ = 1, then a = ±1/√72.
(iii) Recall that the scalar product is distributive and commutative. Hence:

(u + v) ⋅ (u − v) = u ⋅ u − u ⋅ v + v ⋅ u − v ⋅ v = u ⋅ u − v ⋅ v = 0.

So: v ⋅ v = u ⋅ u or ∣v∣^2 = ∣u∣^2 or ∣v∣ = ∣u∣ = √[2^2 + (−1)^2 + 2^2] = √9 = 3.

A515 (9740 N2016/I/11)(i)(a) Let w = (−2, 1, 2) be l’s direction vector. We show that
w is perpendicular to the two non-parallel vectors u = (1, 2, 0) and v = (a, 4, −2) = (0, 4, −2)
on p:

w ⋅ u = (−2, 1, 2) ⋅ (1, 2, 0) = −2 + 2 + 0 = 0, ✓

w ⋅ v = (−2, 1, 2) ⋅ (0, 4, −2) = 0 + 4 − 4 = 0. ✓

Since w ⊥ u, v, we have w ⊥ p and thus also l ⊥ p.

To find the point at which l and p intersect, write (1, −3, 2) + λ̂ (1, 2, 0) + µ̂ (0, 4, −2) = (−1, 0, 1) + t̂ (−2, 1, 2), or:

1 + λ̂ = −1 − 2t̂,   (1)
−3 + 2λ̂ + 4µ̂ = t̂,   (2)
2 − 2µ̂ = 1 + 2t̂.   (3)

From (1), λ̂ = −2 (1 + t̂).   (4)

Plug (4) into (2) to get: −3 + 2 [−2 (1 + t̂)] + 4µ̂ = t̂ or µ̂ = (5t̂ + 7) /4.   (5)

Plug (5) into (3) to get: 2 − 2 (5t̂ + 7) /4 = 1 + 2t̂ or t̂ = −5/9.   (6)

And now plugging (6) back into (4) and (5), we have λ̂ = −8/9 and µ̂ = 19/18. So, l and p intersect at the point (−1, 0, 1) − (5/9) (−2, 1, 2) = (1/9, −5/9, −1/9).

(i)(b) “Obviously”, the two planes must be parallel to p, with one “above” it and the other
“below” it.

Upper plane
The plane p B
D Lower plane

The plane p has normal vector n = (1, 2, 0) × (0, 4, −2) = (−4, 2, 4). We have $\hat{\mathbf{n}} = \mathbf{n}/|\mathbf{n}| = (-4, 2, 4)/\sqrt{36} = \frac{1}{3}(-2, 1, 2)$.
The plane p contains the point B = (1, −3, 2).
The upper plane contains the point C = B + 12n̂ = B + 4 (−2, 1, 2) = (−7, 1, 10).
The lower plane contains the point D = B − 12n̂ = B − 4 (−2, 1, 2) = (9, −7, −6).
Both planes have normal vector (−2, 1, 2). We have:
$\overrightarrow{OC} \cdot (-2, 1, 2) = 14 + 1 + 20 = 35$ and $\overrightarrow{OD} \cdot (-2, 1, 2) = -18 - 7 - 12 = -37$.

And so, the two planes have cartesian equations:

−2x + y + 2z = 35 and −2x + y + 2z = −37.
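As a quick sanity check on part (i)(b), one can confirm that the planes with normal vector (−2, 1, 2) and right-hand sides 35 and −37 are each at distance 12 from p. This is a minimal sketch under the assumption that distances are measured along the normal; the helper names `dot` and `gap` are hypothetical.

```python
from fractions import Fraction as F

def dot(p, q):
    return sum(x * y for x, y in zip(p, q))

n = (-2, 1, 2)          # common normal vector, of length 3
B = (1, -3, 2)          # point on p
d_p = dot(B, n)         # so p is the plane n . r = d_p

def gap(d_other):
    """Distance between p and the parallel plane n . r = d_other."""
    return F(abs(d_other - d_p), 3)

upper_gap = gap(35)
lower_gap = gap(-37)
print(d_p, upper_gap, lower_gap)
```

Dividing by 3 uses the fact that |(−2, 1, 2)| = 3.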

(ii) If a line and a plane intersect at zero or more than one points, then they are parallel
(Fact 107). And so, w ⊥ n or w ⋅ n = 0.
But n = (1, 2, 0) × (a, 4, −2) = (−4, 2, 4 − 2a).
So w ⋅ n = (−2, 1, 2) ⋅ (−4, 2, 4 − 2a) = 18 − 4a = 0 or a = 18/4 = 9/2.
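A516 (9740 N2015/I/7)(i) $\overrightarrow{OC} = 0.6\mathbf{a}$ and $\overrightarrow{OD} = \frac{5}{11}\mathbf{b}$.
(ii) $\overrightarrow{BC} = \overrightarrow{OC} - \overrightarrow{OB} = 0.6\mathbf{a} - \mathbf{b}$ and so the line BC can be written as $\mathbf{r} = \mathbf{b} + \lambda(0.6\mathbf{a} - \mathbf{b}) = 0.6\lambda\mathbf{a} + (1 - \lambda)\mathbf{b}$, for λ ∈ R, as desired.
$\overrightarrow{AD} = \overrightarrow{OD} - \overrightarrow{OA} = \frac{5}{11}\mathbf{b} - \mathbf{a}$ and so the line AD can be written as $\mathbf{r} = \mathbf{a} + \mu(\frac{5}{11}\mathbf{b} - \mathbf{a}) = (1 - \mu)\mathbf{a} + \frac{5}{11}\mu\mathbf{b}$, for µ ∈ R, as desired.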
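(iii) Where the lines meet, we have $0.6\lambda\mathbf{a} + (1 - \lambda)\mathbf{b} = (1 - \mu)\mathbf{a} + \frac{5}{11}\mu\mathbf{b}$. Equating the coefficients, we have $0.6\lambda = 1 - \mu$ (1) and $\frac{5}{11}\mu = 1 - \lambda$ (2). From (1), we have $\mu = 1 - 0.6\lambda$. Plugging this into (2), we have

$$\tfrac{5}{11}(1 - 0.6\lambda) = 1 - \lambda \iff 1 - 0.6\lambda = \tfrac{11}{5} - \tfrac{11}{5}\lambda \iff \tfrac{8}{5}\lambda = \tfrac{6}{5} \iff \lambda = \tfrac{3}{4}.$$

And µ = 0.55. Altogether then, the position vector of E is 0.45a + 0.25b.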


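$\overrightarrow{AE} = -0.55\mathbf{a} + 0.25\mathbf{b}$ and $\overrightarrow{ED} = -0.45\mathbf{a} + \frac{9}{44}\mathbf{b}$. We observe that $\overrightarrow{AE} = \frac{11}{9}\overrightarrow{ED}$ and so the desired ratio AE ∶ ED is 11 ∶ 9.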
A517 (9740 N2015/II/2)(i) The angle is

$$\cos^{-1}\left(\frac{(2, 3, -6) \cdot (1, 0, 0)}{|(2, 3, -6)|\,|(1, 0, 0)|}\right) = \cos^{-1}\left(\frac{2}{\sqrt{49}\sqrt{1}}\right) \approx 1.281.$$

(ii) The vector from P to a generic point on L is (2, 5, −6) − r = (2, 5, −6) − (1, −2, −4) −
(2λ, 3λ, −6λ) = (1 − 2λ, 7 − 3λ, −2 + 6λ). The length of this vector is
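$$\sqrt{(1 - 2\lambda)^2 + (7 - 3\lambda)^2 + (-2 + 6\lambda)^2} = \sqrt{49\lambda^2 - 70\lambda + 54}.$$

$$49\lambda^2 - 70\lambda + 54 = 33 \iff 49\lambda^2 - 70\lambda + 21 = 0 \iff 7\lambda^2 - 10\lambda + 3 = 0 \iff (7\lambda - 3)(\lambda - 1) = 0 \iff \lambda = 3/7, 1.$$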
Hence, the two points are (1, −2, −4)+3/7(2, 3, −6) = 1/7(13, −5, −46) and (1, −2, −4)+(2, 3, −6) =
(3, 1, −10).
$49\lambda^2 - 70\lambda + 54$ is a ∪-shaped quadratic with minimum point given by $98\lambda - 70 = 0$ or $\lambda = 5/7$.
Hence, the closest point is (1, −2, −4) + 5/7(2, 3, −6) = 1/7(17, 1, −58).
(iii) The plane is parallel to the vectors (2, 3, −6) and (2, 5, −6) − (1, −2, −4) = (1, 7, −2).
It thus has normal vector (2, 3, −6) × (1, 7, −2) = (36, −2, 11). Moreover, we know that
(1, −2, −4) is on the plane. Hence, a cartesian equation is 36x − 2y + 11z = 36 × 1 − 2 × (−2) +
11 × (−4) = −4.
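The values of λ in part (ii) can be checked with exact arithmetic. The sketch below is illustrative only: `dist_sq` is a hypothetical helper, and the closest point is found with the standard projection formula λ = (P − A) ⋅ d / (d ⋅ d) rather than by differentiating the quadratic as in the text.

```python
from fractions import Fraction as F

P = (2, 5, -6)      # the given point
A = (1, -2, -4)     # a point on L
d = (2, 3, -6)      # direction vector of L

def dist_sq(lam):
    """Squared distance from P to the point A + lam * d on L."""
    return sum((p - (a + lam * k)) ** 2 for p, a, k in zip(P, A, d))

# Projection formula for the closest point: lam = (P - A) . d / (d . d)
lam_closest = F(sum((p - a) * k for p, a, k in zip(P, A, d)),
                sum(k * k for k in d))
closest = tuple(a + lam_closest * k for a, k in zip(A, d))
print(dist_sq(F(3, 7)), dist_sq(1), lam_closest, closest)
```

Both λ = 3/7 and λ = 1 should give a squared distance of 33, and the projection should reproduce λ = 5/7.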
A518 (9740 N2014/I/3)(i) One possibility is that one or both are zero vectors. And if
neither is a zero vector, then they point either in the same direction or in the exact opposite
directions — or equivalently, either â = b̂ or â = −b̂.
(ii) $\widehat{(1, 2, -2)} = \dfrac{(1, 2, -2)}{3}$.
(iii) It is $\dfrac{|(1, 2, -2) \cdot (0, 0, 1)|}{3 \times 1} = \dfrac{2}{3}$.
A519 (9740 N2014/I/9)(i) The plane q is parallel to the vectors (1, 2, −3) and (2, −1, 4).
It thus has normal vector (1, 2, −3) × (2, −1, 4) = (5, −10, −5) and hence also normal vector
(−1, 2, 1). It contains the point (1, −1, 3). Altogether then, it has cartesian equation −x +
2y + z = 0.
(ii) Line m has direction vector (−1, 2, 1) × (1, 2, −3) = (−8, −2, −4) and hence also direction
vector (4, 1, 2).
To find a point that is on both planes, try plugging in x = 0. Then from the equation of
q, we have z = −2y. Now plug this also into the equation of p to get 2y − 3(−2y) = 12 or
y = 1.5. Hence, an intersection point is (0, 1.5, −3).
Altogether then, the line m has vector equation r = (0, 1.5, −3) + λ(4, 1, 2), for λ ∈ R.
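(iii) $\overrightarrow{AB} = \overrightarrow{OB} - \overrightarrow{OA} = (4\lambda, 1.5 + \lambda, -3 + 2\lambda) - (1, -1, 3) = (4\lambda - 1, 2.5 + \lambda, -6 + 2\lambda)$. So $|\overrightarrow{AB}|^2 = (4\lambda - 1)^2 + (2.5 + \lambda)^2 + (-6 + 2\lambda)^2 = 21\lambda^2 - 27\lambda + 43.25$. This lattermost expression is a ∪-shaped quadratic, with minimum point given by $42\lambda - 27 = 0$ or $\lambda = 9/14$. So $B = (4\lambda, 1.5 + \lambda, -3 + 2\lambda) = \left(\frac{18}{7}, \frac{15}{7}, -\frac{12}{7}\right)$.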
A520 (9740 N2013/I/1)(i) From the equation for p, we have $z = 0.5x - 2$ (1).

Plug (1) into the equation for q to get $2x - 2y + 0.5x - 2 = 6 \iff y = 1.25x - 4$ (2).

Now plug (1) and (2) into the equation for r to get

$$5x - 4(1.25x - 4) + \mu(0.5x - 2) = -9 \iff 0.5\mu x + 25 - 2\mu = 0 \;\; (3) \iff x = \frac{4\mu - 50}{\mu}. \;\; (4)$$

So if µ = 3, from (1), (2), and (4), we have:

$$x = -\frac{38}{3}, \quad y = -\frac{119}{6}, \quad z = -\frac{25}{3}.$$

(ii) From (3), if µ = 0, then we have 25 = 0, a contradiction. So the three planes do not intersect.

A521 (9740 N2013/I/6)(i) Every vector in the plane can be expressed as the linear
combination of any two vectors with distinct directions (see Fact 61).
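(ii) By the Ratio Theorem, $\overrightarrow{ON} = \dfrac{4\mathbf{a} + 3\mathbf{c}}{7}$.
(iii) The area of triangle ONC is

$$\begin{aligned} 0.5\,|\overrightarrow{ON} \times \overrightarrow{OC}| &= 0.5\left|\frac{4\mathbf{a} + 3\mathbf{c}}{7} \times \mathbf{c}\right| \\ &= \tfrac{1}{14}\,|(4\mathbf{a} + 3\mathbf{c}) \times \mathbf{c}| \\ &= \tfrac{1}{14}\,|4\mathbf{a} \times \mathbf{c} + 3\mathbf{c} \times \mathbf{c}| && \text{(distributivity of vector product)} \\ &= \tfrac{1}{14}\,|4\mathbf{a} \times \mathbf{c}| && (\mathbf{v} \times \mathbf{v} = \mathbf{0}) \\ &= \tfrac{1}{14}\,|4\mathbf{a} \times (\lambda\mathbf{a} + \mu\mathbf{b})| \\ &= \tfrac{1}{14}\,|4\mathbf{a} \times \lambda\mathbf{a} + 4\mathbf{a} \times \mu\mathbf{b}| && \text{(distributivity of vector product)} \\ &= \tfrac{1}{14}\,|4\mathbf{a} \times \mu\mathbf{b}| = \tfrac{2\mu}{7}\,|\mathbf{a} \times \mathbf{b}|. \end{aligned}$$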
Similarly, the area of triangle OMC is
$0.5\,|\overrightarrow{OM} \times \overrightarrow{OC}| = 0.5\,|0.5\mathbf{b} \times \mathbf{c}|$
= 1/4 ∣b × c∣
= 1/4 ∣b × (λa + µb)∣
= 1/4 ∣b × λa + b × µb∣ (distributivity of vector product)
= 1/4 ∣b × λa∣ (v × v = 0)
= 1/4λ ∣b × a∣ .

Altogether then, 2µ/7 = λ/4 or λ = 8µ/7.

A522 (9740 N2013/II/4)(i) The acute angle θ between two planes is given by the absolute value of the scalar product of their normal vectors:

$$\cos\theta = \frac{|(2, -2, 1) \cdot (-6, 3, 2)|}{|(2, -2, 1)|\,|(-6, 3, 2)|} = \frac{|-16|}{\sqrt{9}\sqrt{49}} = \frac{16}{3 \times 7} = \frac{16}{21}.$$

So θ ≈ 0.705.
(ii) The intersection line of two planes has direction vector given by the vector product of
their normal vectors: (2, −2, 1) × (−6, 3, 2) = (−7, −10, −6).
A point that is on both planes satisfies both equations 2x − 2y + z = 1 and −6x + 3y + 2z = −1.
Plugging in x = 0, the first equation yields z = 1 + 2y, which when plugged into the second
equation yields y = −3/7. So a point that is on both planes is (0, −3/7, 1/7).
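(iii) The distance between a point a and a plane is given by $|d - \mathbf{a} \cdot \hat{\mathbf{n}}|$, where n is its normal vector and $d = \mathbf{r} \cdot \hat{\mathbf{n}}$.
For $p_1$, d = 1/3 and for $p_2$, d = −1/7. Hence, the distance between A(4, 3, c) and the plane $p_1$ is

$$\left|\frac{1}{3} - \frac{(4, 3, c) \cdot (2, -2, 1)}{3}\right| = \left|\frac{1}{3} - \frac{2 + c}{3}\right| = \left|-\frac{1 + c}{3}\right|,$$

and the distance between A(4, 3, c) and the plane $p_2$ is

$$\left|-\frac{1}{7} - \frac{(4, 3, c) \cdot (-6, 3, 2)}{7}\right| = \left|-\frac{1}{7} + \frac{15 - 2c}{7}\right| = \left|\frac{14 - 2c}{7}\right|.$$

Equating these two distances, we have

$$-\frac{1 + c}{3} = \frac{14 - 2c}{7} \iff -7 - 7c = 42 - 6c \iff c = -49,$$

$$\text{OR} \quad \frac{1 + c}{3} = \frac{14 - 2c}{7} \iff 7 + 7c = 42 - 6c \iff c = 35/13.$$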
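Both values of c can be confirmed by computing the two point-to-plane distances directly. This is a minimal sketch, assuming the planes are 2x − 2y + z = 1 and −6x + 3y + 2z = −1 as in part (ii); the helper names `dist_to_plane` and `both_distances` are hypothetical.

```python
from fractions import Fraction as F

def dist_to_plane(point, normal, d, normal_length):
    """Distance from a point to the plane normal . r = d."""
    return abs(sum(p * n for p, n in zip(point, normal)) - d) / normal_length

def both_distances(c):
    A = (4, 3, c)
    return (dist_to_plane(A, (2, -2, 1), 1, 3),     # |(2, -2, 1)| = 3
            dist_to_plane(A, (-6, 3, 2), -1, 7))    # |(-6, 3, 2)| = 7

d1, d2 = both_distances(F(-49))
e1, e2 = both_distances(F(35, 13))
print(d1, d2, e1, e2)
```

For each value of c, the two distances in the pair should coincide.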
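A523 (9740 N2012/I/5)(i) The area of triangle OAC is

$$\begin{aligned} 0.5\,|\overrightarrow{OA} \times \overrightarrow{OC}| &= 0.5\,|\mathbf{a} \times (\lambda\mathbf{a} + \mu\mathbf{b})| \\ &= 0.5\,|\mathbf{a} \times \lambda\mathbf{a} + \mathbf{a} \times \mu\mathbf{b}| && \text{(distributivity of vector product)} \\ &= 0.5\,|\mathbf{a} \times \mu\mathbf{b}| && (\mathbf{v} \times \mathbf{v} = \mathbf{0}) \\ &= 0.5\mu\,|(1, -1, 1) \times (1, 2, 0)| \\ &= 0.5\mu\,|(-2, 1, 3)| = 0.5\sqrt{14}\,\mu. \end{aligned}$$

So $0.5\sqrt{14}\,\mu = \sqrt{126}$ or $\mu = 2 \times \sqrt{126/14} = 2 \times \sqrt{9} = 2 \times 3 = 6$.

(ii) c = λa + µb = λa + 4b = λ(1, −1, 1) + 4(1, 2, 0) = (4 + λ, 8 − λ, λ). We are given that $|\mathbf{c}| = 5\sqrt{3}$.
So $(4 + \lambda)^2 + (8 - \lambda)^2 + \lambda^2 = 3\lambda^2 - 8\lambda + 80 = (5\sqrt{3})^2 = 75 \iff 3\lambda^2 - 8\lambda + 5 = 0 = (3\lambda - 5)(\lambda - 1)$, so λ = 5/3 or 1. And c = (17/3, 19/3, 5/3) or (5, 7, 1).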
A524 (9740 N2012/I/9)(i) r = (7, 8, 9) + λ(8, 16, 8).
(ii) The position vector of N is given by $\mathbf{p} + (\overrightarrow{pa} \cdot \hat{\mathbf{v}})\,\hat{\mathbf{v}}$, where v is the direction vector of the line, p is a point on the line, and a is the given point (1, 8, 3). Compute $\widehat{(8, 16, 8)} = \dfrac{(1, 2, 1)}{\sqrt{6}}$ and now:

$$(7, 8, 9) + \frac{(-6, 0, -6) \cdot (1, 2, 1)}{\sqrt{6}}\,\frac{(1, 2, 1)}{\sqrt{6}} = (7, 8, 9) - \frac{12}{6}(1, 2, 1) = (5, 4, 7).$$

By the Ratio Theorem, the ratio AN ∶ NB = α ∶ 1 satisfies

$$(5, 4, 7) = \frac{(7, 8, 9) + \alpha(-1, -8, 1)}{\alpha + 1} = \frac{(7 - \alpha, 8 - 8\alpha, 9 + \alpha)}{\alpha + 1}.$$

Solving $5 = \dfrac{7 - \alpha}{\alpha + 1}$, we have 5α + 5 = 7 − α or α = 1/3. So the ratio AN ∶ NB = α ∶ 1 = 1/3 ∶ 1 = 1 ∶ 3.
(iii) To be written.
A525 (9740 N2011/I/7)(i) m = 0.5 (p + q) = 1/6a + 3/10b. The area of triangle OM P is

0.5 ∣m × p∣ = 0.5 ∣0.5 (p + q) × p∣

= 0.25 ∣p × p + q × p∣ (distributivity of vector product)
= 0.25 ∣q × p∣ (v × v = 0)
= 0.25 ∣3/5b × 1/3a∣
= 0.05 ∣b × a∣ = 0.05 ∣a × b∣ .

(ii) (a) Since a is a unit vector, $(2p)^2 + (6p)^2 + (3p)^2 = 1$ or $49p^2 = 1$ or $p = 1/7$.
(b) ∣a ⋅ b∣ is the length of the projection vector of b on a.
(c) a × b = 1/7(2, −6, 3) × (1, 1, −2) = 1/7(9, 7, 8).
A526 (9740 N2011/I/11)(i) A normal vector to the plane is

(4 − (−2), −1 − (−5), −3 − 2) × (4 − 4, −1 − (−3), −3 − (−2))

= (6, 4, −5) × (0, 2, −1) = (6, 6, 12).

Another normal vector to the plane is a scalar multiple of the above, namely (1, 1, 2). We
have (4, −1, −3) ⋅ (1, 1, 2) = −3. Hence, a cartesian equation of p is x + y + 2z = −3.
(ii) From the equations for $l_1$, we have $\dfrac{x - 1}{2} = z + 3 \iff x = 2(z + 3) + 1 = 2z + 7$ (1) and $\dfrac{y - 2}{-4} = z + 3 \iff y = -4z - 10$ (2).
Plug in (1) and (2) into the equations for $l_2$ to get

$$\frac{2z + 7 + 2}{1} \overset{(3)}{=} \frac{-4z - 10 - 1}{5} \overset{(4)}{=} \frac{z - 3}{k}.$$

From (3), we have $10z + 45 = -4z - 11 \iff z = -56/14 = -4$. Now from (4), we have

$$k = 5\,\frac{z - 3}{-4z - 11} = 5 \times \frac{-7}{5} = -7.$$
(iii) The direction vector of l1 is perpendicular to the normal vector of the plane p, as we
can verify — (2, −4, 1) ⋅ (1, 1, 2) = 0. Moreover, a point on l1 is on p, as we can verify —
(1, 2, −3) ⋅ (1, 1, 2) = −3. Altogether then, l1 is on p.
From the equations for l2 , we have y = 5x+11 and z = −7x−11. Plug these into the equation
for the plane p to get: x + (5x + 11) + 2(−7x − 11) = −3 ⇐⇒ −8x − 11 = −3 ⇐⇒ x = −1. So
y = 6 and z = −4. The intersection point is (−1, 6, −4).


(iv) The angle θ between $l_2$ and the normal vector to p is given by

$$\theta = \cos^{-1}\left(\frac{(1, 5, -7) \cdot (1, 1, 2)}{|(1, 5, -7)|\,|(1, 1, 2)|}\right) = \cos^{-1}\left(\frac{-8}{\sqrt{75}\sqrt{6}}\right) = \cos^{-1}\left(\frac{-8}{15\sqrt{2}}\right) \approx 1.957.$$

So the acute angle between $l_2$ and p is 1.957 − π/2 ≈ 0.387.

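A527 (9740 N2010/I/1)(i) $|\mathbf{b}|^2 = 1 + 2^2 + 2^2 = 9 = |\mathbf{a}|^2 = (2^2 + 3^2 + 6^2)p^2 = 49p^2$. So p = 3/7.

(ii)
$$(\mathbf{a} + \mathbf{b}) \cdot (\mathbf{a} - \mathbf{b}) = \left(\tfrac{6}{7} + 1, \tfrac{9}{7} + 2, \tfrac{18}{7} + 2\right) \cdot \left(\tfrac{6}{7} - 1, \tfrac{9}{7} - 2, \tfrac{18}{7} - 2\right) = -\frac{13}{49} - \frac{115}{49} + \frac{128}{49} = 0.$$

(Optional. Actually, more generally, since $(\mathbf{a} + \mathbf{b}) \cdot (\mathbf{a} - \mathbf{b}) = |\mathbf{a}|^2 - |\mathbf{b}|^2$, if $|\mathbf{a}| = |\mathbf{b}|$, then $(\mathbf{a} + \mathbf{b}) \cdot (\mathbf{a} - \mathbf{b}) = 0$.)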
A528 (9740 N2010/I/10)(i) The line has direction vector (−3, 6, 9), which is a scalar
multiple of the plane’s normal vector (1, −2, −3). So the line is perpendicular to the plane.
(ii) From the equations of the line, we have y = −2x + 19 and z = −3x + 27. Plug these in to
the equation of the plane to get x − 2(−2x + 19) − 3(−3x + 27) = 0 ⇐⇒ 14x − 119 = 0 ⇐⇒
x = 119/14 = 8.5. And so y = 2 and z = 1.5. So the point of intersection is (8.5, 2, 1.5).
(iii) We can easily verify that the given point satisfies the equations for the line: $\dfrac{-2 - 10}{-3} = 4 = \dfrac{23 + 1}{6} = \dfrac{33 + 3}{9}$. The point is therefore on the line.
The point of intersection we found in (ii) (call it X) is equidistant to both A and B.
Moreover, these three points are collinear. Thus, B = (19, −19, −30).
(iv) The area of triangle OAB is

0.5 ∣a × b∣ = 0.5 ∣(−2, 23, 33) × (19, −19, −30)∣

= 0.5 ∣(−63, 567, −399)∣ ≈ 348.

A529 (9740 N2009/I/10)(i) The angle θ between the two planes is given by

$$\theta = \cos^{-1}\left(\frac{(2, 1, 3) \cdot (-1, 2, 1)}{|(2, 1, 3)|\,|(-1, 2, 1)|}\right) = \cos^{-1}\left(\frac{3}{\sqrt{14}\sqrt{6}}\right) \approx 1.237.$$

(ii) The line l has direction vector (2, 1, 3) × (−1, 2, 1) = (−5, −5, 5) and thus also direction
vector (1, 1, −1).
A point (x, y, z) that lies on both planes satisfies $2x + y + 3z = 1$ (1) and $-x + 2y + z = 2$ (2). Plugging in x = 0, (1) yields y = 1 − 3z and now (2) yields z = 0. So (x, y, z) = (0, 1, 0).

Altogether then, the line l has vector equation r = (0, 1, 0) + λ(1, 1, −1), for λ ∈ R.
(iii) The line l is parallel to the plane p3 , as we now verify: (1, 1, −1) ⋅ (2 − k, 1 + 2k, 3 + k) = 2 − k + 1 + 2k − 3 − k = 0. Moreover, the point (0, 1, 0), which is on the line l, is also on the plane p3 , as we now verify: 2 × 0 + 1 + 3 × 0 − 1 + k(−0 + 2 × 1 + 0 − 2) = 0. Altogether then, the line l lies in p3 for any constant k.


We want to find k such that (2, 3, 4) satisfies 2x + y + 3z − 1 + k(−x + 2y + z − 2) = 0.
That is, 2 × 2 + 3 + 3 × 4 − 1 + k(−2 + 2 × 3 + 4 − 2) = 18 + 6k = 0. So k = −3. So the plane is
2x + y + 3z − 1 − 3(−x + 2y + z − 2) = 0 or 5x − 5y + 5 = 0 or x − y + 1 = 0.
A530 (9740 N2009/II/2)(i) Let $\mathbf{p} = \overrightarrow{OP}$. By the Ratio Theorem, $\mathbf{p} = \dfrac{2(11, -13, 2) + (14, 14, 14)}{3} = (12, -4, 6)$. So the point P is (12, −4, 6).
(ii) AB ⋅ p = (−3, −27, −12) ⋅ (12, −4, 6) = 0.
(iii) $\mathbf{c} = \widehat{(12, -4, 6)} = \widehat{(6, -2, 3)} = \dfrac{(6, -2, 3)}{\sqrt{6^2 + (-2)^2 + 3^2}} = \dfrac{(6, -2, 3)}{7}$. $|\mathbf{a} \cdot \mathbf{c}|$ is the length of the projection vector of a on p.
(iv) a × p = (140, 84, −224). $|\mathbf{a} \times \mathbf{p}|$ is the area of the parallelogram formed with a and p as its sides, where the tails of a and p are the same point. The area of the triangle OAP is $0.5\,|\mathbf{a} \times \mathbf{p}| = 0.5\sqrt{140^2 + 84^2 + (-224)^2} \approx 139$.
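A531 (9740 N2008/I/3)(i) $\overrightarrow{OP} = \overrightarrow{OA} + \overrightarrow{OB} = (6, 3, -3)$.
(ii) The angle AOB is equal to the angle between the vectors $\overrightarrow{OA}$ and $\overrightarrow{OB}$:

$$\cos^{-1}\left(\frac{(1, 4, -3) \cdot (5, -1, 0)}{|(1, 4, -3)|\,|(5, -1, 0)|}\right) = \cos^{-1}\left(\frac{1}{\sqrt{26}\sqrt{26}}\right) \approx 1.532.$$

(iii) It is $|\overrightarrow{OA} \times \overrightarrow{OB}| \approx 25.981$.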
A532 (9740 N2008/I/11). You can either find the intersection point using a graphing
calculator or painfully by hand, as I do now:
From $p_1$, $z = 1 - \frac{2}{3}x + \frac{5}{3}y$ (1). Plug (1) into the equation for $p_2$ to get $3x + 2y - 5z = 3x + 2y - 5\left(1 - \frac{2}{3}x + \frac{5}{3}y\right) = -5$ or $\frac{19}{3}x - \frac{19}{3}y = 0$ or $x = y$ (2). And so from (1), we have $z = 1 + x$ (3). Now plug in (3) and (2) into the equation for $p_3$ to get $5x + \lambda x + 17(1 + x) = \mu$ or $(22 + \lambda)x = \mu - 17$ or

$$x = \frac{\mu - 17}{22 + \lambda} = -\frac{0.4}{1.1} = -\frac{4}{11}.$$

So the point of intersection is $\left(-\frac{4}{11}, -\frac{4}{11}, \frac{7}{11}\right)$.
(i) The line has direction vector (2, −5, 3) × (3, 2, −5) = (19, 19, 19) and thus also direction
vector (1, 1, 1). From our work above, x = y at the intersection of the two planes. Plug in
x = 0 to find that the two planes intersect at (0, 0, 1). Altogether then, the line has vector
equation r = (0, 0, 1) + α(1, 1, 1), for α ∈ R.
(ii) Two points on the line are (0, 0, 1) and (−1, −1, 0). Plug these into the equation for
plane p3 to get 17 = µ and −5 − λ = µ, so that µ = 17 and λ = −22.
(iii) The line l must be parallel to the plane p3 , so that (1, 1, 1) ⋅ (5, λ, 17) = 0 or λ = −22.
Moreover, the point (0, 0, 1) on the line is not on the plane, so that µ ≠ 17.
(iv) Another vector that is parallel to the plane to be found is (1, −1, 3)−(0, 0, 1) = (1, −1, 2).
The plane thus has normal vector (1, 1, 1) × (1, −1, 2) = (3, −1, −2). Compute also d =
(0, 0, 1) ⋅ (3, −1, −2) = −2. Altogether then, the plane has cartesian equation 3x − y − 2z = −2.


A533 (9233 N2008/I/11)(i) From the equations of the first line, we have y = 2x − 2 and
z = 5 − x. Plugging these into the equations of the second line, we have
$$\frac{x - 1}{-1} \overset{(1)}{=} \frac{2x - 2 + 3}{-3} \overset{(2)}{=} \frac{5 - x - 4}{1}.$$

Both (1) and (2) imply that x = 4 and so indeed the two lines intersect. (If they didn’t intersect, then (1) would contradict (2).) So the point of intersection is (4, 6, 1).

(ii) The angle between the lines is given by:

$$\cos^{-1}\frac{(1, 2, -1) \cdot (-1, -3, 1)}{|(1, 2, -1)|\,|(-1, -3, 1)|} = \cos^{-1}\frac{-8}{\sqrt{6}\sqrt{11}} \approx 2.967.$$

This is obtuse. So the acute angle is π − 2.967 ≈ 0.175.
A534 (9740 N2007/I/6)(i) (1, −1, 2) ⋅ (2, 4, 1) = 0.
(ii) By the Ratio Theorem, OM = 1/3 [2(1, −1, 2) + (2, 4, 1)] = 1/3(4, 2, 5).
(iii) The area of triangle OAC is
$$0.5\,|\overrightarrow{OA} \times \overrightarrow{OC}| = 0.5\,|(1, -1, 2) \times (-4, 2, 2)| = 0.5\,|(-6, -10, -2)| = 0.5\sqrt{140} \approx 5.916.$$

A535 (9740 N2007/I/8)(i) The line l has vector equation r = (1, 2, 4) + λ(−3, 1, −3), for
λ ∈ R. Plugging this into the equation for the plane, we have 3(1−3λ)−(2+λ)+2(4−3λ) = 17
⇐⇒ 9 − 16λ = 17 ⇐⇒ λ = −0.5. So the point of intersection is (1, 2, 4) − 0.5(−3, 1, −3) =
(2.5, 1.5, 5.5).
(ii) The angle between l and the normal vector to p is
$$\cos^{-1}\frac{(-3, 1, -3) \cdot (3, -1, 2)}{|(-3, 1, -3)|\,|(3, -1, 2)|} = \cos^{-1}\frac{-16}{\sqrt{19}\sqrt{14}} \approx 2.946.$$

So the angle between the line and the plane is 2.946 − π/2 ≈ 1.376.

(iii) $|d - \mathbf{a} \cdot \hat{\mathbf{n}}| = \dfrac{|17 - (1, 2, 4) \cdot (3, -1, 2)|}{\sqrt{14}} = \dfrac{|17 - 9|}{\sqrt{14}} = \dfrac{8}{\sqrt{14}} = \dfrac{4\sqrt{14}}{7} \approx 2.138$.
A536 (9233 N2007/I/7). The foot of the perpendicular from a point A to a line is $Q + (\overrightarrow{QA} \cdot \hat{\mathbf{v}})\,\hat{\mathbf{v}}$, where Q is any point on the line and v is the line’s direction vector. Hence,

$$P = (-3, 8, 3) + \frac{(4, -5, -5) \cdot (2, 2, 3)}{\sqrt{17}}\,\frac{(2, 2, 3)}{\sqrt{17}} = (-3, 8, 3) - \frac{17(2, 2, 3)}{17} = (-5, 6, 0).$$

$|\overrightarrow{AP}| = |(-6, 3, 2)| = \sqrt{49} = 7$.
A537 (9233 N2007/II/2)(i) OD = 0.75(1, −3, 4). So the line AD has direction vector
(3.25, 3.25, 0) and hence also direction vector (1, 1, 0). So the line AD has equation r =
(4, 1, 3) + λ(1, 1, 0), for λ ∈ R.
(ii) OC = 0.25(4, 1, 3). So the line BC has direction vector (0, 3.25, −3.25) and hence also
direction vector (0, 1, −1). So the line BC has equation r = (1, −3, 4) + µ(0, 1, −1), for µ ∈ R.
Setting the equations of the two lines equal to each other, we have 4 + λ = 1, 1 + λ = −3 + µ,
and 3 = 4 − µ, so that λ = −3 and µ = 1. And the point of intersection is (1, −2, 3).
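A538 (9233 N2006/I/14). By the Ratio Theorem, $\overrightarrow{OP} = (1 - \lambda)\overrightarrow{OA} + \lambda\overrightarrow{OB} = (1 - \lambda)(1, -2, 5) + \lambda(1, 3, 0) = (1, -2 + 5\lambda, 5 - 5\lambda)$. And $\overrightarrow{OQ} = (1 - \mu)\overrightarrow{OC} + \mu\overrightarrow{OD} = (1 - \mu)(10, 1, 2) + \mu(-2, 4, 5) = (10 - 12\mu, 1 + 3\mu, 2 + 3\mu)$.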
(i) PQ has direction vector $\overrightarrow{AB} \times \overrightarrow{CD} = (0, 5, -5) \times (-12, 3, 3) = (30, 60, 60)$ and hence also direction vector (1, 2, 2).
Moreover, $\overrightarrow{PQ} = \overrightarrow{OQ} - \overrightarrow{OP} = (9 - 12\mu, 3 + 3\mu - 5\lambda, -3 + 3\mu + 5\lambda)$, which must be a scalar multiple of (1, 2, 2). And so $3 + 3\mu - 5\lambda = 2(9 - 12\mu)$ (1) and $-3 + 3\mu + 5\lambda = 2(9 - 12\mu)$ (2). Taking (2) minus (1), we have −6 + 10λ = 0 or λ = 0.6. Taking (2) plus (1), we have 6µ = 4(9 − 12µ) or µ = 2/3. Altogether then, $\overrightarrow{PQ} = (1, 2, 2)$, as desired.
(ii) First observe that $\overrightarrow{AQ} = \overrightarrow{OQ} - \overrightarrow{OA} = (10 - 12\mu, 1 + 3\mu, 2 + 3\mu) - (1, -2, 5) = (2, 3, 4) - (1, -2, 5) = (1, 5, -1)$.
Now compute that the area of triangle ABQ is

$$0.5\,|\overrightarrow{AB} \times \overrightarrow{AQ}| = 0.5\,|(0, 5, -5) \times (1, 5, -1)| = 0.5\,|(20, -5, -5)| \approx 10.607.$$


130.4. Ch. 112 Answers (Complex Numbers)
A539 (9758 N2017/I/8)(a) By the quadratic formula:

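$$z = \frac{2 \pm \sqrt{(-2)^2 - 4(1 - i)(5 + 5i)}}{2(1 - i)} = \frac{1 \pm \sqrt{1 - (5 + 5i - 5i + 5)}}{1 - i} = \frac{1 \pm \sqrt{-9}}{1 - i} = \frac{1 \pm 3i}{1 - i}.$$

Multiply by $\dfrac{1 + i}{1 + i}$:

$$\frac{1 \pm 3i}{1 - i} = \frac{1 \pm 3i}{1 - i} \cdot \frac{1 + i}{1 + i} = \frac{1 + i \pm 3i \mp 3}{1^2 + 1^2} = -1 + 2i \text{ or } 2 - i.$$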
(b)(i) $\omega^2 = (1 - i)^2 = 1 - 1 - 2i = -2i$, $\omega^3 = (1 - i)(-2i) = -2 - 2i$, $\omega^4 = (1 - i)(-2 - 2i) = -2 - 2i + 2i - 2 = -4$.
We are given that:

$$\omega^4 + p\omega^3 + 39\omega^2 + q\omega + 58 = -4 + p(-2 - 2i) + 39(-2i) + q(1 - i) + 58 = q - 2p + 54 + i(-q - 2p - 78) = 0.$$

Hence, $q - 2p + 54 = 0$ (1) and $-q - 2p - 78 = 0$ (2). Taking (1) plus (2), we have −4p − 24 = 0 or p = −6 and q = −66.
(c)(i) Since the coefficients of the given quartic equation are all real, by the Complex Conjugate Root Theorem (Theorem 12), $\omega^* = 1 + i$ also solves the given equation. Thus, a quadratic factor of the given quartic polynomial is $(\omega - 1 + i)(\omega - 1 - i) = \omega^2 + 1 - 2\omega + 1 = \omega^2 - 2\omega + 2$.
Now write:

$$\omega^4 - 6\omega^3 + 39\omega^2 - 66\omega + 58 = (\omega^2 - 2\omega + 2)(a\omega^2 + b\omega + c) = a\omega^4 + (b - 2a)\omega^3 + ?\omega^2 + ?\omega + 2c,$$

where ?’s are coefficients we didn’t bother to compute. Comparing coefficients, we have a = 1, b = −4, and c = 29.
Thus: $\omega^4 - 6\omega^3 + 39\omega^2 - 66\omega + 58 = (\omega^2 - 2\omega + 2)(\omega^2 - 4\omega + 29)$.
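The roots found in this answer can be substituted back using Python's built-in complex numbers. This is a sketch only: the quadratic's coefficients are read off from the quadratic formula in part (a), and the roots 2 ± 5i of ω² − 4ω + 29 are an added assumption (obtained by solving that factor), not values stated in the answer.

```python
def quadratic(z):
    return (1 - 1j) * z**2 - 2 * z + (5 + 5j)

def quartic(w):
    return w**4 - 6 * w**3 + 39 * w**2 - 66 * w + 58

roots_a = [-1 + 2j, 2 - 1j]
# The last two roots come from w^2 - 4w + 29 = 0 (assumption, see lead-in).
roots_b = [1 - 1j, 1 + 1j, 2 + 5j, 2 - 5j]

residuals = ([abs(quadratic(z)) for z in roots_a]
             + [abs(quartic(w)) for w in roots_b])
print(residuals)
```

Every residual should be zero up to floating-point rounding.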
A540 (9740 N2016/I/7)(a) Simply plug in −1 + 5i and verify that the equation holds:

$$(-1 + 5i)^2 + (-1 - 8i)(-1 + 5i) + (-17 + 7i) = (1 - 25 - 10i) + (1 - 5i + 8i + 40) + (-17 + 7i) = 0.$$
Note that the Complex Conjugate Roots Theorem (Theorem ???) does not apply here
because the coefficients of the given quadratic equation are not all real.
Let a + ib be the other root. Then:
$w^2 + (-1 - 8i)w + (-17 + 7i) = (w + 1 - 5i)(w - a - ib) = w^2 + (1 - a - 5i - ib)w + ?$,
where we didn’t bother computing ? because it isn’t necessary.
Comparing coefficients, we have 1 − a = −1 or a = 2 and −5i − ib = −8i or b = 3. Hence, the
other root is 2 + 3i.
(b) Plug in 1 + ai into the given equation:

$$\begin{aligned} (1 + ai)^3 - 5(1 + ai)^2 + 16(1 + ai) + k &= 1 + 3ai - 3a^2 - a^3 i - 5(1 + 2ai - a^2) + 16 + 16ai + k \\ &= 1 - 3a^2 - 5 + 5a^2 + 16 + k + i(3a - a^3 - 10a + 16a) \\ &= 12 + 2a^2 + k + i(9a - a^3) = 0. \end{aligned}$$

Comparing coefficients, we have $12 + 2a^2 + k = 0$ (1) and $9a - a^3 = 0$ (2). From (2), we have a = ±3, 0.

We are given that a > 0 and so a = 3 (3). Plugging (3) into (1), we have k = −30.

A541 (9740 N2016/II/4)(a)(i) $|z - 3 - i| = 1$ traces out a unit circle centred on 3 + i = (3, 1).
$\arg z = \tan^{-1} 0.4$ traces out a ray from the origin, with slope 0.4.

[Figure: the circle $|z - 3 - i| = 1$ with centre (3, 1), the ray $\arg z = \tan^{-1} 0.4$, and the points $z_1$ and $z_2$.]


(a)(ii) We are asked to find the two points that are on the circle and equidistant from $z_1$ and $z_2$. These are the two blue points depicted in the figure above.
The line connecting these two blue points is perpendicular to the line with slope 0.4. It thus has slope −1/0.4 = −2.5 and direction vector (2, −5), whose unit vector is $\frac{1}{\sqrt{29}}(2, -5)$.
Moreover, it passes through the point (3, 1). Thus, it may be described by the vector equation

$$\mathbf{r} = (3, 1) + \frac{\lambda}{\sqrt{29}}(2, -5), \quad \lambda \in \mathbb{R}.$$

The circle’s radius is 1. And so, the two blue points are given by:

$$(3, 1) + \frac{1}{\sqrt{29}}(2, -5) = \left(3 + \frac{2}{\sqrt{29}}, 1 - \frac{5}{\sqrt{29}}\right) \quad\text{or}\quad (3, 1) - \frac{1}{\sqrt{29}}(2, -5) = \left(3 - \frac{2}{\sqrt{29}}, 1 + \frac{5}{\sqrt{29}}\right).$$
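(b)(i) $|w| = |2 - 2i| = \sqrt{2^2 + (-2)^2} = 2\sqrt{2}$ and $\arg w = \arg(2 - 2i) = -\cos^{-1}\dfrac{2}{2\sqrt{2}} = -\cos^{-1}\dfrac{1}{\sqrt{2}} = -\dfrac{\pi}{4}$.
Thus, $w = 2\sqrt{2}\,e^{-i\pi/4}$ and the cube roots of w are $(2\sqrt{2})^{1/3} e^{-i\pi/12} = \sqrt{2}\,e^{-i\pi/12}$, $\sqrt{2}\,e^{(-\pi/12 + 2\pi/3)i} = \sqrt{2}\,e^{7i\pi/12}$, and $\sqrt{2}\,e^{(-\pi/12 - 2\pi/3)i} = \sqrt{2}\,e^{-3i\pi/4}$.
(b)(ii) $\arg(w^* w^n) = \arg w^* + n \arg w + 2k\pi = \dfrac{\pi}{4} - \dfrac{\pi n}{4} + 2k\pi = \dfrac{\pi}{2}\left(\dfrac{1 - n}{2} + 4k\right)$, where k = −1, 0, 1.
So, $\dfrac{1 - n}{2} + 4k = 1$ and n > 0. So we must have k = 1 and thus $\dfrac{1 - n}{2} = -3$ or n = 7.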
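The cube roots and the value n = 7 can be checked numerically with the standard `cmath` module. This is only a rough sketch; the variable names are hypothetical, and the check of part (b)(ii) assumes the target argument is π/2, as the working above suggests.

```python
import cmath
import math

w = 2 - 2j
modulus = math.sqrt(2)                   # (2 * sqrt(2)) ** (1 / 3)
args = [-math.pi / 12, 7 * math.pi / 12, -3 * math.pi / 4]
cube_roots = [cmath.rect(modulus, t) for t in args]

# Each candidate root, cubed, should give back w.
errors = [abs(r**3 - w) for r in cube_roots]

# Argument of w* w^n for n = 7.
phase_n7 = cmath.phase(w.conjugate() * w**7)
print(errors, phase_n7)
```

`cmath.phase` returns the principal argument in (−π, π], which matches the convention used here.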
A542 (9740 N2015/I/9)(a)
$$\frac{w^2}{w^*} = \frac{(a + ib)^2}{a - ib} = \frac{a^2 - b^2 + 2abi}{a - ib} = \frac{a^2 - b^2 + 2abi}{a - ib} \times \frac{a + ib}{a + ib} = \frac{a^3 - ab^2 + 2a^2 bi + a^2 ib - ib^3 - 2ab^2}{a^2 + b^2} = \frac{1}{a^2 + b^2}\left[(a^3 - 3ab^2) + i(3a^2 b - b^3)\right]$$

is purely imaginary if and only if $a^3 - 3ab^2 = 0$. But $a^3 - 3ab^2 = a(a^2 - 3b^2) = a(a - \sqrt{3}b)(a + \sqrt{3}b)$.

So either $b = \pm a/\sqrt{3}$ or a = 0 (but the latter is explicitly ruled out in the question).

Altogether, the possible values of w = a + ib are given by $b = \pm a/\sqrt{3}$, where a is any non-zero real number.
(b)(i) $z^5 = 2^5 e^{i\pi(-0.5)} = 2^5 e^{i\pi(-0.5 + 2k)}$ for k ∈ Z. So $z = 2e^{i\pi(-0.5 + 2k)/5}$ for k = 0, ±1, ±2. So
∣z∣ = 2 and arg z = −0.9π, −0.5π, −0.1π, 0.3π, 0.7π.

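(ii)
$$\begin{aligned} z_1 - z_2 &= 2e^{i\pi(0.7)} - 2e^{i\pi(-0.9)} \\ &= 2e^{i\pi(-0.1)}\left(e^{i\pi(0.8)} - e^{i\pi(-0.8)}\right) \\ &= 2e^{i\pi(-0.1)}\,2i \sin 0.8\pi \\ &= (4 \sin 0.8\pi)\,i e^{i\pi(-0.1)}. \end{aligned}$$

So
$$\begin{aligned} \arg(z_1 - z_2) &= \arg\left[(4 \sin 0.8\pi)\,i e^{i\pi(-0.1)}\right] \\ &= \arg(4 \sin 0.8\pi) + \arg i + \arg\left(e^{i\pi(-0.1)}\right) + 2k\pi \\ &= 0 + \pi/2 - 0.1\pi + 2k\pi = 0.4\pi \quad (k = 0), \end{aligned}$$

and
$$|z_1 - z_2| = \left|(4 \sin 0.8\pi)\,i e^{i\pi(-0.1)}\right| = |4 \sin 0.8\pi|\,\underbrace{|i|}_{1}\,\underbrace{\left|e^{i\pi(-0.1)}\right|}_{1} = 4 \sin 0.8\pi = 4 \sin 0.2\pi,$$

where the last line uses the fact that sin(π − x) = sin x.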
A543 (9740 N2014/I/5)(i) $z^2 = (1 + 2i)^2 = 1^2 + (2i)^2 + 2(1)(2i) = 1 - 4 + 4i = -3 + 4i$.
$z^3 = (-3 + 4i)(1 + 2i) = -3 - 6i + 4i + (4i)(2i) = -3 - 2i - 8 = -11 - 2i$. So:

$$\frac{1}{z^3} = \frac{1}{-11 - 2i} = \frac{1}{-11 - 2i} \times \frac{-11 + 2i}{-11 + 2i} = \frac{-11 + 2i}{11^2 - (2i)^2} = \frac{-11 + 2i}{121 + 4} = \frac{-11 + 2i}{125}.$$

(ii) Since $pz^2 + \dfrac{q}{z^3} = p(-3 + 4i) + q\,\dfrac{-11 + 2i}{125} = \left(-3p - \dfrac{11}{125}q\right) + i\left(4p + \dfrac{2}{125}q\right)$ is real, we have $4p + \dfrac{2}{125}q = 0$ or q = −250p. And $pz^2 + \dfrac{q}{z^3} = 19p$.

A544 (9740 N2014/II/4)(a)(i) This is simply a circle with radius 4 centred on the point −5 + i. It has cartesian equation $(x + 5)^2 + (y - 1)^2 = 4^2$ (1).

[Figure: the circle {z : |z + 5 − i| = 4}, with centre (−5, 1) and radius 4.]

(ii) The complex equation $|z - 6i| = |z + 10 + 4i|$ is equivalent to the cartesian equation $(x - 0)^2 + (y - 6)^2 = (x + 10)^2 + (y + 4)^2$ or $-12y + 36 = 20x + 100 + 8y + 16$ or $y = -x - 4$ (2).

So to find the intersection points of the line and the circle, plug (2) into (1) to get $(x + 5)^2 + (-x - 4 - 1)^2 = 4^2$ or $2(x + 5)^2 = 4^2$ or $(x + 5)^2 = 8$ or $x + 5 = \pm\sqrt{8}$ or $x = -5 \pm \sqrt{8}$. So the possible values of z are $-5 \pm \sqrt{8} + (5 \mp \sqrt{8} - 4)i = -5 \pm \sqrt{8} + (1 \mp \sqrt{8})i$.

(b)(i) $w = \sqrt{3} - i$, so $|w| = \sqrt{(\sqrt{3})^2 + (-1)^2} = 2$ and $\arg w = \tan^{-1}\dfrac{-1}{\sqrt{3}} = -\dfrac{\pi}{6}$. So $w = 2e^{i(-\pi/6)}$.
And so $w^6 = 2^6 e^{i(-\pi + 2k\pi)} = 2^6 e^{i\pi}$.

(ii) $\arg\left(\dfrac{w^n}{w^*}\right) = \arg w^n - \arg w^* + 2k\pi = n \arg w + \arg w + 2k\pi = (n + 1)\arg w + 2k\pi = (n + 1) \times (-\pi/6) + 2k\pi$. A complex number z is real if and only if arg z = 0 or arg z = π. So by observation, the three smallest positive whole number values of n for which $\dfrac{w^n}{w^*}$ is real are 5, 11, and 17.
A545 (9740 N2013/I/4)(i) $(1 + 2i)^3 = 1 + 3 \times 2i + 3 \times (2i)^2 + (2i)^3 = 1 + 6i - 12 - 8i = -11 - 2i$.
(ii) Since w = 1 + 2i is a root for $az^3 + 5z^2 + 17z + b = 0$, we have

$$\begin{aligned} 0 &= a(1 + 2i)^3 + 5(1 + 2i)^2 + 17(1 + 2i) + b \\ &= a(-11 - 2i) + 5(-3 + 4i) + 17 + 34i + b \\ &= (-11a - 15 + 17 + b) + i(-2a + 20 + 34) \\ &= \underbrace{(2 - 11a + b)}_{0} + i\underbrace{(54 - 2a)}_{0}. \end{aligned}$$

$54 - 2a = 0 \implies a = 27$. And $2 - 11a + b = 0 \implies b = 11(27) - 2 = 295$.

(iii) By the complex conjugate roots theorem, 1 − 2i is also a root for the equation. Write

$$\begin{aligned} 27z^3 + 5z^2 + 17z + 295 &= 27[z - (1 + 2i)][z - (1 - 2i)](z - k) \\ &= 27\left[(z - 1)^2 - (2i)^2\right](z - k) \\ &= 27(z^2 - 2z + 5)(z - k) \\ &= 27\left[z^3 - (k + 2)z^2 + (2k + 5)z - 5k\right]. \end{aligned}$$

So 295 = 27 × (−5k) or k = −59/27. The roots are 1 ± 2i and −59/27.
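The factorisation can be checked by multiplying the factors back out with exact fractions. The sketch below is illustrative: `poly_mul` is a hypothetical helper that multiplies coefficient lists, and it is not a method used in the text.

```python
from fractions import Fraction as F

def poly_mul(p, q):
    """Multiply polynomials given as coefficient lists, highest power first."""
    out = [0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

k = F(-59, 27)
# 27 (z^2 - 2z + 5)(z - k), expanded into coefficients of z^3, z^2, z, 1
product = [27 * c for c in poly_mul([1, -2, 5], [1, -k])]
print(product)
```

The expanded coefficients should match those of 27z³ + 5z² + 17z + 295.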

A546 (9740 N2013/I/8)(i) $|w| = |(1 - i\sqrt{3})z| = |1 - i\sqrt{3}|\,|z| = \sqrt{1^2 + (\sqrt{3})^2}\,|z| = 2|z|$.
$\arg w = \arg[(1 - i\sqrt{3})z] = \arg(1 - i\sqrt{3}) + \arg z + 2k\pi = \tan^{-1}(-\sqrt{3}) + \theta + 2k\pi = -\pi/3 + \theta + 2k\pi \in [-\pi/3 + 2k\pi, \pi/6 + 2k\pi]$. So we should choose k = 0 and arg w = −π/3 + θ.
(ii) z is the top-right quarter of the circumference of the circle of radius r, centred on the origin.
Take the position vector of z, rotate it clockwise by π/3 radians about the origin, double its length: this is the position vector of w.

[Figure: the locus $\{z = re^{i\theta} : \theta \in [0, \pi/2]\}$ from $re^{i0}$ to $re^{i\pi/2}$, and the corresponding locus of w from $2re^{i(-\pi/3)}$ to $2re^{i\pi/6}$.]

(iii) $\arg\left(\dfrac{z^{10}}{w^2}\right) = \arg z^{10} - \arg w^2 + 2k\pi = 10 \arg z - 2 \arg w + 2k\pi = 10\theta - 2\left(-\dfrac{\pi}{3} + \theta\right) + 2k\pi = 8\theta + \dfrac{2\pi}{3} + 2k\pi = \pi$, so $\theta = \dfrac{\pi}{24}$ (with k = 0).
3 24
A547 (9740 N2012/I/6)(i) $z^3 = (1 + ic)^3 = 1^3 + 3(ic) + 3(ic)^2 + (ic)^3 = 1 + 3ic - 3c^2 - ic^3 = (1 - 3c^2) + i(3c - c^3)$.

(ii) $z^3$ is real if and only if $3c - c^3 = 0$ or $c = 0, \pm\sqrt{3}$. The question already ruled out c = 0. So $c = \pm\sqrt{3}$ and $z = 1 \pm i\sqrt{3}$.

(iii) $z = 1 - i\sqrt{3} = |z|e^{i \arg z} = 2e^{i(-\pi/3)}$. $|z^n| = 2^n > 1000$ if and only if n > 9. (The reason is that $2^9 = 512$ and $2^{10} = 1024$.) So the smallest positive integer n is 10.
$|z^{10}| = 2^{10}$ and $\arg z^{10} = 10(-\pi/3) + 2k\pi = 2\pi/3$ (k = 2).
A548 (9740 N2012/II/2)(i) ∣z − (7 − 3i)∣ = 4 describes a circle with centre 7 − 3i and
radius 4.

[Figure: the circle {z : |z − (7 − 3i)| = 4}, with centre c = (7, −3) and radius 4, and the origin o.]
(ii)(a) a is the point on the circle’s circumference that is closest to the origin o. The line l through the origin and the centre of the circle passes through a (see Fact ??).
But the distance of the centre of the circle from the origin is $\sqrt{7^2 + 3^2} = \sqrt{58}$. The distance of the centre of the circle to the point a is 4 (this is simply the length of the radius). Hence, the distance of the origin to the point a is $\sqrt{58} - 4$.
(b) △abc is right. So $ab^2 + bc^2 = ca^2 = 4^2 = 16$.
But the line l has gradient $-\frac{3}{7}$ (because it runs through the origin and the point 7 − 3i) and so $ab = \frac{3}{7}bc$. Hence, $\left(\frac{3}{7}\right)^2 \times bc^2 + bc^2 = 16$. Or $bc^2 = 16 \times \frac{49}{58}$. Or $bc = 4 \times \frac{7}{\sqrt{58}} = \frac{28}{\sqrt{58}}$. And $ab = \frac{12}{\sqrt{58}}$. Hence, $a = \left(7 - \frac{28}{\sqrt{58}}, -3 + \frac{12}{\sqrt{58}}\right)$.
(iii) By observation, d is the point where ∣arg z∣ is as large as possible. There, arg z = arg(7 − 3i) − ∠cod.
But △cod is right. So $\angle cod = \sin^{-1}\frac{4}{\sqrt{58}}$. Moreover, $\arg(7 - 3i) = \tan^{-1}\frac{-3}{7}$.
Altogether then, $\arg z = \tan^{-1}\frac{-3}{7} - \sin^{-1}\frac{4}{\sqrt{58}} \approx -0.9579$.
A549 (9740 N2011/I/10)(i) Let $(x + iy)^2 = x^2 - y^2 + i(2xy) = -8i$. So $x^2 - y^2 = 0$ (1) and $2xy = -8$ (2). From (2), we observe that x and y must have opposite signs. From (1), x = ±y and by our observation of the previous sentence, we must have x = −y. And now from (2), we have $2(-y) \times y = -8$ or $-2y^2 = -8$ or y = ±2. Altogether then, $z_1 = -2 + 2i$ and $z_2 = 2 - 2i$.

(ii) Using the quadratic formula and part (i) (which gives $\sqrt{-2i} = \frac{1}{2}\sqrt{-8i} = \pm(1 - i)$),

$$w = \frac{-4 \pm \sqrt{4^2 - 4(1)(4 + 2i)}}{2} = -2 \pm \sqrt{4 - (4 + 2i)} = -2 \pm \sqrt{-2i} = -2 \pm (1 - i) = -1 - i, \; -3 + i.$$

(iii)(a) This is simply the line that is equidistant to $z_1 = (-2, 2)$ and $z_2 = (2, -2)$. By observation, it has cartesian equation y = x.

[Figure: the lines $|z - z_1| = |z - z_2|$ and $|w - w_1| = |w - w_2|$.]

(b) This is simply the line that is equidistant to $w_1 = (-3, 1)$ and $w_2 = (-1, -1)$. By observation, it has cartesian equation y = x + 2.
(iv) The two lines are parallel and do not intersect.
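The roots in part (ii) can be checked by applying the quadratic formula with the standard `cmath.sqrt` and substituting back. This is a minimal sketch, assuming the equation is w² + 4w + (4 + 2i) = 0 as read off from the quadratic formula in part (ii); the names `f`, `disc`, and `root_disc` are hypothetical.

```python
import cmath

def f(w):
    return w**2 + 4 * w + (4 + 2j)

disc = 4**2 - 4 * (4 + 2j)            # discriminant, equal to -8i
root_disc = cmath.sqrt(disc)           # principal square root of -8i
w1 = (-4 + root_disc) / 2
w2 = (-4 - root_disc) / 2
print(w1, w2, abs(f(w1)), abs(f(w2)))
```

The two residuals should be zero up to rounding, and the midpoint of the two roots should be (−2, 0), which lies on y = x + 2.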
A550 (9740 N2011/II/1)(i) This is simply the circle with radius 3 and centre 2 + 5i,
including all the points within the circle.

[Figure: the disc {z : |z − (2 + 5i)| ≤ 3}, with centre c = (2, 5) and radius 3, and the point (6, 1).]

(ii) The points on the circle’s circumference that are closest to and furthest from the origin o are a and b. The line l through the origin and the centre of the circle passes through both a and b (see Fact ??).
$oc = \sqrt{2^2 + 5^2} = \sqrt{29}$ and ac = 3. Hence, $oa = \sqrt{29} - 3$. Symmetrically, $ob = \sqrt{29} + 3$. The maximum and minimum possible values of ∣z∣ are thus $\sqrt{29} \pm 3$.
(iii) The locus of points that satisfy both ∣z − 2 − 5i∣ ≤ 3 and 0 ≤ arg z ≤ π/4 is the blue closed segment.
By observation, ∣z − 6 − i∣ is maximised either at $P_1$ or $P_2$. These points are given by

$$(p - 2)^2 + (p - 5)^2 = 3^2 \iff 2p^2 - 14p + 20 = 0 \iff p^2 - 7p + 10 = 0 \iff (p - 5)(p - 2) = 0.$$

So $P_1 = (2, 2)$ and $P_2 = (5, 5)$. The distances of these points to the point (6, 1) are $\sqrt{4^2 + 1^2} = \sqrt{17}$ and $\sqrt{1^2 + 4^2} = \sqrt{17}$. So both are, equally, the furthest point from (6, 1).

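A551 (9740 N2010/I/8)(i) For $z_1$, $r = \sqrt{1^2 + (\sqrt{3})^2} = 2$ and $\theta = \tan^{-1}(\sqrt{3}/1) = \pi/3$. For $z_2$, $r = \sqrt{(-1)^2 + (-1)^2} = \sqrt{2}$ and $\theta = \tan^{-1}\dfrac{-1}{-1} = \dfrac{-3\pi}{4}$.

Altogether then, $z_1 = 2\left[\cos\dfrac{\pi}{3} + i \sin\dfrac{\pi}{3}\right]$ and $z_2 = \sqrt{2}\left[\cos\dfrac{-3\pi}{4} + i \sin\dfrac{-3\pi}{4}\right]$.
(ii) $\left|\dfrac{z_1}{z_2}\right| = \dfrac{|z_1|}{|z_2|} = \dfrac{2}{\sqrt{2}} = \sqrt{2}$. $\arg\left(\dfrac{z_1}{z_2}\right) = \arg z_1 - \arg z_2 = \pi/3 - \dfrac{-3\pi}{4} = \dfrac{13\pi}{12} = \dfrac{-11\pi}{12}$.
Hence, $\left(\dfrac{z_1}{z_2}\right)^* = \sqrt{2}\left[\cos\dfrac{11\pi}{12} + i \sin\dfrac{11\pi}{12}\right]$.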
(iii)(a) This is simply the circle with centre z1 and radius 2.

[Figure: the circle $|z - z_1| = 2$ centred on $z_1 = (1, \sqrt{3})$, and the ray $\arg(z - z_2) = \pi/4$ from $z_2 = (-1, -1)$.]

(b) This is simply the ray from the point z2 (but excluding the point z2 ) that makes an
angle π/4 with the horizontal.

(iv) We want to find x > 0 such that $|(x, 0) - (1, \sqrt{3})| = 2$ or $(x - 1)^2 + 3 = 4$ or $(x - 1)^2 = 1$ or x = 0, 2. So (2, 0) is where the locus $|z - z_1| = 2$ meets the positive real axis.
A552 (9740 N2010/II/1)(i)
$$x = \frac{6 \pm \sqrt{(-6)^2 - 4(1)(34)}}{2} = 3 \pm \frac{\sqrt{-100}}{2} = 3 \pm 5i.$$
(ii) Since −2 + i is a root of $x^4 + 4x^3 + x^2 + ax + b = 0$, we have

$$\begin{aligned} (-2 + i)^4 + 4(-2 + i)^3 + (-2 + i)^2 + a(-2 + i) + b &= 0 \\ &\;\;\vdots \quad \text{(tedious algebra)} \\ \underbrace{-12 - 2a + b}_{=0} + \underbrace{(16 + a)}_{=0}\,i &= 0. \end{aligned}$$

$16 + a = 0 \implies a = -16$. Moreover, $-12 - 2a + b = 0 \implies -12 - 2(-16) + b = 0 \implies b = -20$.

By the complex conjugate roots theorem, −2 − i is also a root. So

$$\begin{aligned} x^4 + 4x^3 + x^2 - 16x - 20 &= [x - (-2 + i)][x - (-2 - i)](x^2 + cx + d) \\ &= (x + 2 - i)(x + 2 + i)(x^2 + cx + d) \\ &= \left[(x + 2)^2 - i^2\right](x^2 + cx + d) \\ &= (x^2 + 4x + 5)(x^2 + cx + d) \\ &= x^4 + (4 + c)x^3 + (5 + 4c + d)x^2 + (5c + 4d)x + 5d. \end{aligned}$$

Comparing coefficients, we have c = 0 and d = −4. So $x^2 + cx + d = x^2 - 4 = (x - 2)(x + 2)$. So the other two roots are ±2.
A553 (9740 N2009/I/9)(i) $z^7 = 1 + i = 2^{1/2} e^{i\pi/4} = 2^{1/2} e^{i\pi(1/4 + 2k)}$. By de Moivre’s Theorem, $z = 2^{1/14} e^{i\pi(1/28 + 2k/7)}$, for k = 0, ±1, ±2, ±3.




(iii) $|z - z_1| = |z - z_2|$ is the line (blue) that is equidistant to the points $z_1 = 2^{1/14} e^{i\pi(1/28)}$ and $z_2 = 2^{1/14} e^{i\pi(1/28 + 2/7)}$.
Explanation #1: 0 satisfies the equation $|z - z_1| = |z - z_2|$, as we can easily verify: $|0 - z_1| = |0 - z_2| = 2^{1/14}$. So 0 is in the locus $|z - z_1| = |z - z_2|$.
Explanation #2: The perpendicular bisector of a chord runs through the centre of the
circle. So in this case, the perpendicular bisector of the chord z1 z2 runs through the origin
(which is the centre of the circle).
A554 (9740 N2008/I/8)(i)
$$(1 + \sqrt{3}i)^2 (1 + \sqrt{3}i) = (-2 + 2\sqrt{3}i)(1 + \sqrt{3}i) = -2 - 6 + (2\sqrt{3} - 2\sqrt{3})i = -8.$$

(ii)
$$\begin{aligned} 0 &= 2z^3 + az^2 + bz + 4 \\ &= 2(-8) + a(-2 + 2\sqrt{3}i) + b(1 + \sqrt{3}i) + 4 \\ &= \underbrace{-12 - 2a + b}_{=0 \;\; (1)} + i\sqrt{3}\underbrace{(2a + b)}_{=0 \;\; (2)}. \end{aligned}$$

Adding (1) and (2) together, we have −12 + 2b = 0 or b = 6. And now from (2), a = −3.

(iii) By the complex conjugate roots theorem, another root is $1 - \sqrt{3}i$. So

$$\begin{aligned} 2z^3 - 3z^2 + 6z + 4 &= 2[z - (1 + \sqrt{3}i)][z - (1 - \sqrt{3}i)](z - c) \\ &= 2(z - 1 - \sqrt{3}i)(z - 1 + \sqrt{3}i)(z - c) \\ &= 2\left[(z - 1)^2 - (\sqrt{3}i)^2\right](z - c) \\ &= 2(z^2 - 2z + 4)(z - c) \\ &= 2\left[z^3 + (-c - 2)z^2 + (4 + 2c)z - 4c\right]. \end{aligned}$$

Comparing coefficients, we have c = −0.5, which is also the third root for the equation.

A555 (9740 N2008/II/3)(a) $|p| = \left|\dfrac{w}{w^*}\right| = \dfrac{|w|}{|w^*|} = 1$ and $\arg p = \arg\dfrac{w}{w^*} = \arg w - \arg w^* = \theta - (-\theta) = 2\theta$.
$\arg p^5 = 10\theta + 2k\pi$. The argument of a positive real number is $2m\pi$ for some integer $m$.
Hence, $\theta = n\pi/5$ for integers $n$. Given also the restriction that $\theta \in (0, \pi/2)$, we have $\theta = \pi/5$ or $2\pi/5$.
(b) ∣z∣ ≤ 6 is a circle of radius 6 centred on the origin, including the interior of the circle.
∣z∣ = ∣z − 8 − 6i∣ is a line that is equidistant to the origin and the point (8, 6).
So the locus of z is the line segment AB.

[Figure: Argand diagram showing the disc $|z| \le 6$, the point $8 + 6i$, and the line $|z| = |z - 8 - 6i|$.]

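(ii) Observe that $\arg z$ is maximised and minimised at $A$ and $B$. $\arg A = \angle COX + \angle AOC$, $\arg B = \angle COX - \angle BOC$. Moreover, $\angle COX = \arg(8 + 6i) = \tan^{-1}\frac{6}{8} = \tan^{-1}\frac{3}{4}$.

Note that $\triangle AOC$ is right and the length of $OC$ is half of $|8 + 6i| = \sqrt{8^2 + 6^2} = 10$. So $OC = 5$. Since $OA = 6$, we have $\angle AOC = \angle BOC = \cos^{-1}\frac{OC}{OA} = \cos^{-1}\frac{5}{6}$.
Altogether then, $\arg A = \angle COX + \angle AOC = \tan^{-1}\frac{3}{4} + \cos^{-1}\frac{5}{6} \approx 1.229$ and $\arg B = \angle COX - \angle BOC = \tan^{-1}\frac{3}{4} - \cos^{-1}\frac{5}{6} \approx 0.058$.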
A556 (9233 N2008/I/9)(i) This is the circle centred on −2i with radius 2.

[Figure: the circle $|z + 2i| = 2$, centred on $-2i$ with radius 2.]

(ii) This is the line that is equidistant to the points 2 + i and i.

(iii) This is the region bounded by and including the rays $\arg(z + 1 - 3i) = \pi/6$ and $\arg(z + 1 - 3i) = \pi/3$.

[Figure: the region $\pi/6 \le \arg(z + 1 - 3i) \le \pi/3$, with vertex at $-1 + 3i$ and the angle $\pi/6$ marked.]

A557 (9233 N2008/II/3)(i) $(1 - i)^2 = 1 - 1 - 2i = -2i$. ✓

The other root is $-1 + i$, because $(-1 + i)^2 = 1 - 1 - 2i = -2i$.

Remark 163. Do not make the mistake of concluding that by the complex conjugate roots theorem, $1 + i$ is the other root of the equation $w^2 = -2i$. The theorem applies only to polynomials whose coefficients are all real. It does not apply here because there is an imaginary coefficient.
(ii)
\begin{align*}
z &= \frac{3 + 5i \pm \sqrt{(3+5i)^2 - 4(1)(-4)(1-2i)}}{2} = \frac{3 + 5i \pm \sqrt{9 - 25 + 30i + 16(1 - 2i)}}{2} \\
&= \frac{3 + 5i \pm \sqrt{-2i}}{2} = \frac{3 + 5i \pm (1 - i)}{2} = 2 + 2i,\ 1 + 3i.
\end{align*}

A558 (9740 N2007/I/3)(a) This is the circle with radius $\sqrt{13}$ centred on the point $-2 + 3i$.

[Figure: the circle $|z + 2 - 3i| = \sqrt{13}$, with centre $-2 + 3i$.]


(b) $(a + ib)(a - ib) + 2(a + ib) = 3 + 4i$ or $a^2 + b^2 + 2a + 2bi = 3 + 4i$. Two complex numbers are equal if and only if their real and imaginary parts are equal. So $a^2 + b^2 + 2a \overset{1}{=} 3$ and $2b \overset{2}{=} 4$.

From $\overset{2}{=}$, $b = 2$. Plug this into $\overset{1}{=}$ to find that $a^2 + 2a + 1 = 0$ or $a = -1$. So $w = -1 + 2i$.

A559 (9740 N2007/I/7)(i) By the complex conjugate roots theorem, another root is $re^{-i\theta}$. And so a quadratic factor of $P(z)$ is:
\begin{align*}
(z - re^{i\theta})(z - re^{-i\theta}) &= z^2 - rze^{i\theta} - rze^{-i\theta} + (re^{i\theta})(re^{-i\theta}) \\
&= z^2 - rz\,(e^{i\theta} + e^{-i\theta}) + r^2 = z^2 - 2rz\cos\theta + r^2.
\end{align*}

(ii) $z^6 = -64 = 64e^{i\pi} = 2^6 e^{i\pi(1+2k)}$ for $k \in \mathbb{Z}$. So $z = 2e^{i\pi(1+2k)/6}$ for $k = 0, \pm 1, \pm 2, -3$.
(iii) We first use (ii), then use (i):
\begin{align*}
z^6 + 64 &= \underbrace{(z - 2e^{i\pi/6})(z - 2e^{-i\pi/6})}_{z^2 - 2(2)\cos(\pi/6)\,z + 2^2}\ \underbrace{(z - 2e^{3i\pi/6})(z - 2e^{-3i\pi/6})}_{z^2 - 2(2)\cos(3\pi/6)\,z + 2^2}\ \underbrace{(z - 2e^{5i\pi/6})(z - 2e^{-5i\pi/6})}_{z^2 - 2(2)\cos(5\pi/6)\,z + 2^2} \\
&= (z^2 - 2\sqrt{3}z + 4)(z^2 + 4)(z^2 + 2\sqrt{3}z + 4).
\end{align*}
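The factorisation of $z^6 + 64$ into three real quadratics can be checked numerically. The following is only a sketch in Python (the function names `p_direct` and `p_factored` and the sample points are illustrative choices, not part of the original answer): it confirms that the six values from (ii) are roots and that the factored and unfactored forms agree at a few arbitrary complex points.

```python
import cmath
import math

def p_direct(z):
    """The polynomial z^6 + 64."""
    return z**6 + 64

def p_factored(z):
    """Product of the three real quadratic factors from (iii)."""
    s = 2 * math.sqrt(3)
    return (z * z - s * z + 4) * (z * z + 4) * (z * z + s * z + 4)

# The six roots from (ii): z = 2 e^{i pi (1 + 2k) / 6}, k = 0, ±1, ±2, -3.
roots = [2 * cmath.exp(1j * math.pi * (1 + 2 * k) / 6) for k in (0, 1, -1, 2, -2, -3)]
max_residual = max(abs(p_direct(r)) for r in roots)

# Compare the two forms at a few arbitrary sample points.
samples = [0.5, -1.25 + 0.75j, 2 - 3j]
max_gap = max(abs(p_direct(z) - p_factored(z)) for z in samples)

print(max_residual, max_gap)  # both should be at rounding-error level
```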

A560 (9233 N2007/I/9)(i) By the complex conjugate roots theorem, another root is $-ki$. Altogether then,
\begin{align*}
az^4 + bz^3 + cz^2 + dz + e &= a\,(z - ki)(z + ki)(z^2 + fz + g) \\
&= a\,(z^2 + k^2)(z^2 + fz + g) \\
&= a\,[z^4 + fz^3 + (k^2 + g)z^2 + k^2 f z + gk^2].
\end{align*}

By comparing coefficients, we have $b \overset{1}{=} af$, $c = a(k^2 + g)$, $d \overset{2}{=} ak^2 f$, and $e = agk^2$. Now verify that indeed:
\begin{align*}
ad^2 + b^2 e &= a^3 k^4 f^2 + a^3 f^2 g k^2 \\
&= (af) \times [a(k^2 + g)] \times (ak^2 f) \\
&= b \times c \times d. \quad ✓
\end{align*}

(ii) $a = 1$, $b = 3$, $c = 13$, $d = 27$, $e = 36$. So indeed $ad^2 + b^2 e = 1 \times 27^2 + 3^2 \times 36 = 1053 = 3 \times 13 \times 27 = bcd$.

From $\overset{1}{=}$ above, $f = \dfrac{b}{a} = 3$. So from $\overset{2}{=}$, $k = \pm\sqrt{\dfrac{d}{af}} = \pm\sqrt{\dfrac{27}{1 \times 3}} = \pm\sqrt{9} = \pm 3$. So the two desired roots are $\pm 3i$.
A561 (9233 N2007/II/5). The locus of $P$ is the ray from (but excluding) the point $2i$ that makes an angle $\pi/3$ with the horizontal. This is one half of the line with cartesian equation $y = x\tan\frac{\pi}{3} + 2 = \sqrt{3}x + 2$.
The locus of $Q$ is the line that is equidistant to the points $4$ and $-2$. It has cartesian equation $x = 1$.

[Figure: the ray $P: \arg(z - 2i) = \pi/3$ and the line $Q: |z + 2| = |z - 4|$, with the points $(-2, 0)$ and $(4, 0)$ marked on the $x$-axis.]

The intersection of the two lines is $(1, \sqrt{3} + 2)$ or $1 + i(\sqrt{3} + 2)$.
\[[1 + i(\sqrt{3} + 2)]\,[1 - i(\sqrt{3} + 2)] = 1 + (\sqrt{3} + 2)^2 = 1 + 3 + 4 + 4\sqrt{3} = 8 + 4\sqrt{3}.\] ✓
A562 (9233 N2006/I/5)(i) This is the circle with radius 3 centred on −4 + 4i.

[Figure: the circle $|z + 4 - 4i| = 3$ with centre $A = (-4, 4)$, and the point $C = (0, 1)$.]

(ii) Given a point (C here), the line connecting it to the centre of a circle (A here) also
passes through the point on the circumference (B here) that is closest to the given point
(see Fact ??).
The distance between $A$ and $C$ is $\sqrt{(-4 - 0)^2 + (4 - 1)^2} = \sqrt{4^2 + 3^2} = 5$. So the distance between $B$ and $C$ is $5 - 3 = 2$.
A563 (9233 N2006/I/6)(i)

\begin{align*}
0 &= (ki)^4 - 2(ki)^3 + 6(ki)^2 - 8(ki) + 8 \\
&= k^4 + 2k^3 i - 6k^2 - 8ki + 8 \\
&= \underbrace{k^4 - 6k^2 + 8}_{=0} + \underbrace{2k(k^2 - 4)}_{=0}\, i.
\end{align*}

From $2k(k^2 - 4) = 0$, we have $k = 0, \pm 2$. Only $k = \pm 2$ also satisfies $k^4 - 6k^2 + 8 = 0$. So the equation has roots $\pm 2i$.

(ii)
\begin{align*}
z^4 - 2z^3 + 6z^2 - 8z + 8 &= (z - 2i)(z + 2i)(z^2 + az + b) \\
&= (z^2 + 4)(z^2 + az + b) \\
&= z^4 + az^3 + (4 + b)z^2 + 4az + 4b.
\end{align*}

Comparing coefficients, $a = -2$ and $b = 2$. So $z^2 + az + b = z^2 - 2z + 2$, whose zeros are
\[z = \frac{2 \pm \sqrt{(-2)^2 - 4(1)(2)}}{2} = 1 \pm \sqrt{1 - 2} = 1 \pm i.\]
Altogether then, the equation has roots $\pm 2i$, $1 \pm i$.


130.5. Ch. 113 Answers (Calculus)
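A564 (9758 N2017/I/1).
\begin{align*}
e^{2x}\ln(1 + ax) &= \left(1 + 2x + \frac{(2x)^2}{2!} + \dots\right)\left(ax - \frac{(ax)^2}{2} + \frac{(ax)^3}{3} - \dots\right) \\
&= ax - \frac{(ax)^2}{2} + \frac{(ax)^3}{3} + 2ax^2 - a^2x^3 + 2ax^3 + \dots \\
&= ax + \left(2a - \frac{a^2}{2}\right)x^2 + \left(2a - a^2 + \frac{a^3}{3}\right)x^3 + \dots
\end{align*}
\[2a - \frac{a^2}{2} = 0 \iff \frac{a}{2}(4 - a) = 0 \iff a = 0 \text{ (discard) or } a = 4.\]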
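As a cross-check of the expansion above, the series coefficients can be multiplied out exactly. This is a minimal sketch using Python's `fractions` module; the helper name `product_coeffs` is an illustrative choice. It multiplies the truncated Maclaurin series of $e^{2x}$ and $\ln(1+ax)$ and confirms that the $x^2$ coefficient vanishes when $a = 4$.

```python
from fractions import Fraction
from math import factorial

def product_coeffs(a, order=3):
    """Coefficients of x^0..x^order in e^{2x} * ln(1 + a x), computed exactly."""
    # e^{2x} = sum 2^k x^k / k!
    exp_c = [Fraction(2**k, factorial(k)) for k in range(order + 1)]
    # ln(1 + ax) = ax - (ax)^2/2 + (ax)^3/3 - ...
    log_c = [Fraction(0)] + [Fraction((-1) ** (k + 1) * a**k, k) for k in range(1, order + 1)]
    # Cauchy product of the two truncated series.
    return [sum(exp_c[i] * log_c[k - i] for i in range(k + 1)) for k in range(order + 1)]

coeffs = product_coeffs(Fraction(4))
print(coeffs)  # x^2 coefficient is 2a - a^2/2, which is 0 at a = 4
```

With $a = 4$, the $x^3$ coefficient $2a - a^2 + a^3/3$ evaluates to $40/3$.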
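A565 (9758 N2017/I/3)(i) Apply the $\frac{d}{dx}$ operator to the given equation:
\[2y\frac{dy}{dx} - 2\left(y + x\frac{dy}{dx}\right) + 10x \overset{1}{=} 0.\]

The stationary points are given by $\frac{dy}{dx} \overset{2}{=} 0$. Plug $\overset{2}{=}$ into $\overset{1}{=}$:
\[-2y + 10x = 0 \quad\text{or}\quad y \overset{3}{=} 5x.\]

Plug $\overset{3}{=}$ back into the original equation:
\[25x^2 - 10x^2 + 5x^2 - 10 = 0 \quad\text{or}\quad x^2 = \frac{1}{2} \quad\text{or}\quad x = \pm\frac{1}{\sqrt{2}} = \pm\frac{\sqrt{2}}{2}.\]

Thus, the $x$-coordinates of the two stationary points of $C$ are $\pm\sqrt{2}/2$.

(ii) The point in question is $P = \left(\frac{\sqrt{2}}{2}, \frac{5\sqrt{2}}{2}\right)$.
Apply the $\frac{d}{dx}$ operator to $\overset{1}{=}$:
\[2\left(\left(\frac{dy}{dx}\right)^2 + y\frac{d^2y}{dx^2}\right) - 2\left(\frac{dy}{dx} + \frac{dy}{dx} + x\frac{d^2y}{dx^2}\right) + 10 \overset{4}{=} 0.\]

Plug $\frac{dy}{dx} \overset{2}{=} 0$ into $\overset{4}{=}$:
\[2y\frac{d^2y}{dx^2} - 2x\frac{d^2y}{dx^2} + 10 = 0 \quad\text{or}\quad \frac{d^2y}{dx^2} = \frac{5}{x - y} \quad (\text{for } x \ne y).\]

At $P$, we have $x < y$ and so the second derivative is negative. By the Second Derivative Test then, $P$ is a maximum point.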
A566 (9758 N2017/I/7)(i) We’ll use the following identity given on List MF26 (p. 3):
\[\cos P - \cos Q = -2\sin\frac{P+Q}{2}\sin\frac{P-Q}{2}.\]
Set: $2mx \overset{1}{=} \dfrac{P+Q}{2}$ and $2nx \overset{2}{=} \dfrac{P-Q}{2}$.

The sum of $\overset{1}{=}$ and $\overset{2}{=}$ yields: $2(m+n)x \overset{3}{=} P$.

Now plug $\overset{3}{=}$ into $\overset{1}{=}$ to get: $Q = 4mx - P = 2(m-n)x$.

And thus:
\begin{align*}
\int \sin 2mx \sin 2nx\,dx &= \frac{1}{2}\int \cos Q - \cos P\,dx = \frac{1}{2}\int \cos 2(m-n)x - \cos 2(m+n)x\,dx \\
&= \frac{1}{2}\left[\frac{\sin 2(m-n)x}{2(m-n)} - \frac{\sin 2(m+n)x}{2(m+n)}\right] + C.
\end{align*}

(ii)
\begin{align*}
\int_0^{\pi} (f(x))^2\,dx &= \int_0^{\pi} (\sin 2mx + \sin 2nx)^2\,dx \\
&= \int_0^{\pi} \sin^2 2mx + \sin^2 2nx + 2\sin 2mx\sin 2nx\,dx \\
&= \int_0^{\pi} \frac{1 - \cos 4mx}{2} + \frac{1 - \cos 4nx}{2} + 2\sin 2mx\sin 2nx\,dx \\
&= \left[x - \frac{1}{8m}\sin 4mx - \frac{1}{8n}\sin 4nx + \frac{\sin 2(m-n)x}{2(m-n)} - \frac{\sin 2(m+n)x}{2(m+n)}\right]_0^{\pi} = \pi,
\end{align*}

where the last step exploits the fact that $\sin(k\pi) = 0$ for any $k \in \mathbb{Z}$.
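The value $\pi$ can be sanity-checked numerically. The sketch below assumes, as the derivation above does, that $m$ and $n$ are distinct positive integers; the function name `integrate_f_squared`, the midpoint rule, and the sample values $m = 3$, $n = 5$ are illustrative choices only.

```python
import math

def integrate_f_squared(m, n, steps=20000):
    """Midpoint-rule estimate of the integral of (sin 2mx + sin 2nx)^2 over [0, pi]."""
    h = math.pi / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * h  # midpoint of the i-th subinterval
        total += (math.sin(2 * m * x) + math.sin(2 * n * x)) ** 2
    return total * h

approx = integrate_f_squared(3, 5)
print(approx, math.pi)  # the two values should agree closely
```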
A567 (9758 N2017/I/11)(i)(a) $\frac{dv}{dt} = c$. (b) $4 + 2.5c = 29$. So $c = 10$ and $v = 4 + 10t$.
(ii) $\frac{dv}{dt} = c - kv = 10 - kv \iff \frac{dt}{dv} \overset{1}{=} \frac{1}{10 - kv}$. Apply the $\int dv$ operator to $\overset{1}{=}$:
\[\int \frac{dt}{dv}\,dv = \int \frac{1}{10 - kv}\,dv \quad\text{or}\quad t + C = -\frac{1}{k}\ln|10 - kv|.\]
\[10 - kv \overset{2}{=} \pm e^{-k(t+C)} = \pm e^{-kC}e^{-kt} = C_1 e^{-kt}.\]

Plug the initial values $(t, v) = (0, 0)$ into $\overset{2}{=}$ to get: $10 - k \cdot 0 = C_1 e^{-k \cdot 0} = C_1$ or $C_1 = 10$.

Thus: $10 - kv = 10e^{-kt}$ or $v = \dfrac{10}{k}(1 - e^{-kt})$.
(iii) The terminal velocity is $v_T = \lim_{t\to\infty} v = \lim_{t\to\infty} \dfrac{10}{k}(1 - e^{-kt}) = \dfrac{10}{k} = 40$. So, $k = 1/4$. Now:

$v = 0.9 \times 40 = 36$ or $\dfrac{10}{k}(1 - e^{-kt}) = 36$ or $1 - e^{-t/4} = 0.9$ or $\ln 0.1 = -t/4$ or $t = 4\ln 10$.

The time taken for this object to reach 90% of its terminal velocity is $4\ln 10$ s $\approx 9.210$ s.
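A568 (9758 N2017/II/4)(a) First, sketch the given curve and line:

[Figure: the line $2y = x - 1$ and the curve $y = x^2 - 6x + 5$.]

To find their intersection points, plug the line’s equation into the curve’s:
\begin{align*}
x - 1 = 2x^2 - 12x + 10 &\iff 2x^2 - 13x + 11 = 0 \\
&\iff x = \frac{13 \pm \sqrt{(-13)^2 - 4(2)(11)}}{2(2)} = \frac{13 \pm 9}{4} = 1, \frac{11}{2}.
\end{align*}

So, the requested area is:
\[\int_1^{11/2} \frac{x - 1}{2} - (x^2 - 6x + 5)\,dx = \left[-\frac{x^3}{3} + \frac{13}{4}x^2 - \frac{11}{2}x\right]_1^{11/2} = \frac{243}{16} = 15.1875.\]

(b)(i)
\[\int_0^1 \pi\left(\frac{\sqrt{y}}{a - y^2}\right)^2 dy = \pi\int_0^1 \frac{y}{(a - y^2)^2}\,dy = \pi\left[\frac{1}{2}\cdot\frac{1}{a - y^2}\right]_0^1 = \frac{\pi}{2}\left(\frac{1}{a-1} - \frac{1}{a}\right) = \frac{\pi}{2(a^2 - a)}.\]

(ii) We are given that the volume of the second container is four times that of the first:
\[\frac{\pi}{2(b^2 - b)} = 4\cdot\frac{\pi}{2(a^2 - a)} \quad\text{or}\quad 4(b^2 - b) = a^2 - a \quad\text{or}\quad 4b^2 - 4b + a - a^2 = 0.\]

By the quadratic formula:
\[b = \frac{4 \pm \sqrt{16 - 4(4)(a - a^2)}}{8} = \frac{1 \pm \sqrt{1 - a + a^2}}{2}.\]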
Note that the correct solution to (b)(ii) must include both values of b. It is incorrect to
reject either value. See lengthy remark on the next page.


Remark 164. This entire remark concerns 9758 N2017/II/4(b)(ii) or Exercise 568,
which was answered on the previous page.
I have come across one set of published TYS answers422 that rejects the smaller value of
b. I suspect that this may also have been the answer the writers of this question were
looking for. But based on all the information given in the question, there is no reason —
and it is therefore incorrect — to reject the smaller value of b. The question did specify
that a > 1 but placed no restriction on what b could be.
Consider for example the case where $a = 2$. Then the two possible values of $b$ are:
\[b_1 = \frac{1 + \sqrt{3}}{2} \quad\text{or}\quad b_2 = \frac{1 - \sqrt{3}}{2}.\]
The three graphs below are those of: $x = \dfrac{\sqrt{y}}{a - y^2}$, $x = \dfrac{\sqrt{y}}{b_1 - y^2}$, and $x = \dfrac{\sqrt{y}}{b_2 - y^2}$.

[Figure: the three graphs $x = \frac{\sqrt{y}}{a - y^2}$, $x = \frac{\sqrt{y}}{b_1 - y^2}$, and $x = \frac{\sqrt{y}}{b_2 - y^2}$, with the points $(1, 1)$, $\left(\frac{1}{b_1 - 1}, 1\right)$, and $\left(\frac{1}{b_2 - 1}, 1\right)$ marked.]

Let $V$ be the volume generated by rotating the black area around the $y$-axis, i.e. the volume found in (b)(i). Let $V_1$ and $V_2$ be the corresponding volumes for the red and blue areas. (The red area includes the black area.)
It is not difficult to show that $V = \pi/4$ and $V_1 = \pi = V_2$, so that indeed, each of the volumes $V_1$ and $V_2$ is four times the volume $V$.
We can quite easily prove that this is also more generally true for any $a > 1$:
\[V_1 = V_2 = \int_0^1 \pi\left(\frac{\sqrt{y}}{b - y^2}\right)^2 dy = \pi\int_0^1 \left[\frac{\sqrt{y}}{\left(1 \pm \sqrt{1 - a + a^2}\right)/2 - y^2}\right]^2 dy = \dots = \frac{2\pi}{a^2 - a}.\]

Footnote 422: The one-sentence reason given is: “Since $a > 1$, $1 - a + a^2 > 1 \implies b > 1$.”
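A569 (9740 N2016/I/2). Here as an exercise, let’s do this without a calculator.
First, apply $\ln$ to the given equation: $\ln y = \cos x \ln 2$. Then apply $\frac{d}{dx}$:
\[\frac{1}{y}\frac{dy}{dx} = -\sin x \ln 2 \quad\text{or}\quad \frac{dy}{dx} = -y\ln 2\sin x = -2^{\cos x}\ln 2\sin x.\]
Thus: $\left.\dfrac{dy}{dx}\right|_{x=0} = -2^{\cos 0}\ln 2\sin 0 = 0$.
\[\left.\frac{dy}{dx}\right|_{x=\pi/2} = -2^{\cos(\pi/2)}\ln 2\sin\frac{\pi}{2} = -2^0\ln 2\cdot 1 = -\ln 2 \approx -0.693.\]

(ii) The tangent line at $x = 0$ is: $y - 2^{\cos 0} = 0\,(x - 0)$ or $y \overset{1}{=} 2$.

The tangent line at $x = \dfrac{\pi}{2}$ is: $y - 2^{\cos(\pi/2)} = -\ln 2\left(x - \dfrac{\pi}{2}\right)$ or $y \overset{2}{=} -x\ln 2 + \dfrac{\pi}{2}\ln 2 + 1$.

Plug $\overset{1}{=}$ into $\overset{2}{=}$ to get: $2 = -x\ln 2 + \dfrac{\pi}{2}\ln 2 + 1$ or $x = \dfrac{\pi}{2} - \dfrac{1}{\ln 2}$.
So, the tangents meet at $\left(\dfrac{\pi}{2} - \dfrac{1}{\ln 2},\ 2\right)$.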

A570 (9740 N2016/I/8)(i) $f'(x) = a\sec^2(ax + b) = a\,[1 + \tan^2(ax + b)] = a + a\tan^2(ax + b) = a + ay^2$.
$f''(x) = 2ayf'(x) = 2a^2y\,(1 + y^2)$.
$f'''(x) = 2a^2\,[f'(x)(1 + y^2) + y\cdot 2yf'(x)] = 2a^2 f'(x)(1 + 3y^2) = 2a^3(1 + y^2)(1 + 3y^2)$.

(ii) $f(0) = \tan\left(a\cdot 0 + \frac{\pi}{4}\right) = \tan\frac{\pi}{4} = 1$.

$f'(0) = a\,\{1 + [f(0)]^2\} = a\,(1 + 1^2) = 2a$.

$f''(0) = 2a^2 f(0)\,(1 + [f(0)]^2) = 2a^2\cdot 1\,(1 + 1^2) = 4a^2$.

$f'''(0) = 2a^3\,(1 + [f(0)]^2)(1 + 3[f(0)]^2) = 2a^3(1 + 1^2)(1 + 3\cdot 1^2) = 16a^3$.

Thus: $f(x) = f(0) + f'(0)x + \dfrac{f''(0)}{2!}x^2 + \dfrac{f'''(0)}{3!}x^3 + \dots = 1 + 2ax + 2a^2x^2 + \dfrac{8}{3}a^3x^3 + \dots$

(iii) $f(0) = \tan(2\cdot 0 + 0) = \tan 0 = 0$.

$f'(0) = a\,\{1 + [f(0)]^2\} = 2\,(1 + 0^2) = 2$.

$f''(0) = 2a^2 f(0)\,(1 + [f(0)]^2) = 2\cdot 2^2\cdot 0\,(1 + 0^2) = 0$.

$f'''(0) = 2a^3\,(1 + [f(0)]^2)(1 + 3[f(0)]^2) = 2\cdot 2^3(1 + 0^2)(1 + 3\cdot 0^2) = 16$.

Thus: $f(x) = f(0) + f'(0)x + \dfrac{f''(0)}{2!}x^2 + \dfrac{f'''(0)}{3!}x^3 + \dots = 2x + \dfrac{8}{3}x^3 + \dots$
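A571 (9740 N2016/I/9)(i)(a) Plug $y = \dfrac{dx}{dt}$ and $\dfrac{dy}{dt} = \dfrac{d^2x}{dt^2}$ into the given equation to get:
\[\frac{dy}{dt} + 2y = 10 \quad\text{or}\quad \frac{dy}{dt} = 10 - 2y.\]
So $\dfrac{dt}{dy} \overset{1}{=} \dfrac{1}{10 - 2y}$ (for $y \ne 5$). Now apply $\int dy$ to $\overset{1}{=}$:
\[\int \frac{dt}{dy}\,dy = \int \frac{1}{10 - 2y}\,dy = -\frac{1}{2}\int \frac{1}{y - 5}\,dy \quad\text{or}\quad t + C = -\frac{1}{2}\ln(5 - y) \quad (\text{for } y < 5).\]

Or: $5 - y = e^{-2(t+C)}$ or $y \overset{2}{=} 5 + C_1 e^{-2t}$ (where $C_1 = -e^{-2C}$).

We are given the initial values $\left(t, x, y = \dfrac{dx}{dt}\right) \overset{3}{=} (0, 0, 0)$. Plugging these into $\overset{2}{=}$, we get: $0 = 5 + C_1 e^{-2\cdot 0}$ or $C_1 = -5$. Thus, we have:
\[y = 5 - 5e^{-2t} \quad\text{or}\quad \frac{dx}{dt} \overset{4}{=} 5 - 5e^{-2t}.\]
Apply $\int dt$ to $\overset{4}{=}$ to get: $x \overset{5}{=} \int 5 - 5e^{-2t}\,dt = 5t + \dfrac{5}{2}e^{-2t} + C_2$.
Plugging the initial values $\overset{3}{=}$ into $\overset{5}{=}$, we get:
\[C_2 = 0 - 5\cdot 0 - \frac{5}{2}e^{-2\cdot 0} = -\frac{5}{2}, \quad\text{and thus:}\quad x = 5t + \frac{5}{2}e^{-2t} - \frac{5}{2}.\]

(ii) Apply $\int dt$ to the given differential equation:
\[\frac{dx}{dt} \overset{6}{=} \int 10 - 5\sin\frac{t}{2}\,dt = 10t + 10\cos\frac{t}{2} + C_3.\]

Plugging the initial values $\overset{3}{=}$ into $\overset{6}{=}$, we get $C_3 = 0 - 10\cdot 0 - 10\cos 0 = -10$.

Now apply the $\int dt$ operator to $\overset{6}{=}$ to get:
\[x \overset{7}{=} 10\int t + \cos\frac{t}{2} - 1\,dt = 10\left(\frac{t^2}{2} + 2\sin\frac{t}{2} - t\right) + C_4.\]

Plugging the initial values $\overset{3}{=}$ into $\overset{7}{=}$, we get $C_4 = 0 - 10\,(0 + 0 - 0) = 0$.

Thus, $x = 10\left(\dfrac{t^2}{2} + 2\sin\dfrac{t}{2} - t\right) = 5\left(t^2 + 4\sin\dfrac{t}{2} - 2t\right)$.

(iii) For model (i), $x = 5 \iff 5t + \dfrac{5}{2}e^{-2t} - \dfrac{5}{2} = 5 \iff t \approx 1.474$ (calculator).
For model (ii), $x = 5 \iff 5\left(t^2 + 4\sin\dfrac{t}{2} - 2t\right) = 5 \iff t \approx 1.046$ (calculator).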
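The two calculator values in (iii) can be reproduced with a simple bisection search. This is only a sketch: the function names and the bracketing interval $[0, 3]$ are illustrative assumptions (both displacement functions are increasing for $t \ge 0$, so each equation has a single root there).

```python
import math

def x_model_i(t):
    """Displacement under model (i): x = 5t + (5/2)e^{-2t} - 5/2."""
    return 5 * t + 2.5 * math.exp(-2 * t) - 2.5

def x_model_ii(t):
    """Displacement under model (ii): x = 5(t^2 + 4 sin(t/2) - 2t)."""
    return 5 * (t * t + 4 * math.sin(t / 2) - 2 * t)

def bisect(g, lo, hi, tol=1e-10):
    """Root of g on [lo, hi], assuming g(lo) and g(hi) have opposite signs."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if (g(lo) < 0) == (g(mid) < 0):
            lo = mid  # root lies in the upper half
        else:
            hi = mid  # root lies in the lower half
    return (lo + hi) / 2

t_i = bisect(lambda t: x_model_i(t) - 5, 0.0, 3.0)
t_ii = bisect(lambda t: x_model_ii(t) - 5, 0.0, 3.0)
print(round(t_i, 3), round(t_ii, 3))
```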
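A572 (9740 N2016/II/1). We are given that $\dfrac{dV}{dt} \overset{1}{=} 0.1$.
Also, $\tan\alpha = \dfrac{r}{h} = 0.5$. So, $h = 2r$ and $\dfrac{dh}{dt} \overset{2}{=} 2\dfrac{dr}{dt}$.
Plug $h = 2r$ into $V = \dfrac{1}{3}\pi r^2 h$ to get $V = \dfrac{2\pi}{3}r^3$ or $r = \left(\dfrac{3V}{2\pi}\right)^{1/3}$. So, when $V = 3$, we have $r \overset{3}{=} \left(\dfrac{9}{2\pi}\right)^{1/3}$.
\[\frac{dV}{dt} = \frac{d}{dt}\left(\frac{\pi}{3}r^2h\right) = \frac{\pi}{3}\left(2r\frac{dr}{dt}h + r^2\frac{dh}{dt}\right) \overset{2}{=} \frac{\pi}{3}\left(rh\frac{dh}{dt} + r^2\frac{dh}{dt}\right) = \frac{\pi}{3}r\,(h + r)\frac{dh}{dt} \overset{4}{=} \pi r^2\frac{dh}{dt}.\]

Putting $\overset{1}{=}$, $\overset{3}{=}$, and $\overset{4}{=}$ together, we have:
\[\frac{dh}{dt} = \frac{0.1}{\pi r^2} = \frac{1}{10\pi}\left(\frac{2\pi}{9}\right)^{2/3} = \frac{1}{10}\pi^{-1/3}\left(\frac{2}{9}\right)^{2/3} \approx 0.0250.\]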

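A573 (9740 N2016/II/2)(a)(i) Use Integration by Parts twice:
\begin{align*}
\int x^2\cos nx\,dx &= \frac{1}{n}x^2\sin nx - \frac{2}{n}\int x\sin nx\,dx \\
&= \frac{1}{n}x^2\sin nx - \frac{2}{n}\left[-\frac{1}{n}x\cos nx - \int -\frac{1}{n}\cos nx\,dx\right] \\
&= \frac{1}{n}x^2\sin nx + \frac{2}{n^2}x\cos nx - \frac{2}{n^3}\sin nx + C \\
&= \frac{1}{n}\left[\left(x^2 - \frac{2}{n^2}\right)\sin nx + \frac{2}{n}x\cos nx\right] + C.
\end{align*}

(ii)
\begin{align*}
\int_{\pi}^{2\pi} x^2\cos nx\,dx &= \frac{1}{n}\left[\left(x^2 - \frac{2}{n^2}\right)\sin nx + \frac{2}{n}x\cos nx\right]_{\pi}^{2\pi} \\
&= \frac{1}{n}\left[\frac{2}{n}x\cos nx\right]_{\pi}^{2\pi} = \frac{2}{n^2}(2\pi\cdot 1 \pm \pi) = 2\,(2 \pm 1)\frac{\pi}{n^2}.
\end{align*}

So, $a = 6$ or $2$.

(b) Given the substitution $u = 9 - x^2$, we have $\dfrac{du}{dx} \overset{1}{=} -2x$ and $\dfrac{dx}{du} \overset{2}{=} \dfrac{1}{-2x}$.
The volume of the solid is:
\begin{align*}
\pi\int_0^2 y^2\,dx &= \pi\int_0^2 \left(\frac{x\sqrt{x}}{9 - x^2}\right)^2 dx = \pi\int_0^2 \frac{x^3}{(9 - x^2)^2}\,dx = \pi\int_0^2 \frac{x^3}{(9 - x^2)^2}\frac{dx}{du}\frac{du}{dx}\,dx \\
&= \pi\int_{u=9}^{5} \frac{x^3}{u^2}\cdot\frac{1}{-2x}\,du = -\frac{\pi}{2}\int_9^5 \frac{x^2}{u^2}\,du = -\frac{\pi}{2}\int_9^5 \frac{9 - u}{u^2}\,du = -\frac{\pi}{2}\int_9^5 \frac{9}{u^2} - \frac{1}{u}\,du \\
&= -\frac{\pi}{2}\left[-\frac{9}{u} - \ln|u|\right]_9^5 = \frac{\pi}{2}\left(\frac{9}{5} + \ln 5 - \frac{9}{9} - \ln 9\right) = \frac{\pi}{2}\left(\frac{4}{5} + \ln\frac{5}{9}\right) \approx 0.333.
\end{align*}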
A574 (9740 N2016/II/3)(i) $y = 0 \iff \cos t = 1 \iff t = 0, 2\pi \iff x = -1, 2\pi - 1$.
So, the $x$-intercepts are $(-1, 0)$ and $(2\pi - 1, 0)$. (Note that these correspond to where $t = 0$ and $t = 2\pi$.)

[Figure: the curve, with $(x, y) = (-1, 0)$ when $t = 0$, $(x, y) = (\pi + 1, 2)$ when $t = \pi$, and $(x, y) = (2\pi - 1, 0)$ when $t = 2\pi$.]

$y = 1 - \cos t$ is maximised when $\cos t = -1$ or $t = \pi$. So, the maximum point is $(\pi + 1, 2)$.

(ii)
\begin{align*}
\int_{t=0}^{t=a} y\,dx &= \int_{t=0}^{t=a} y\frac{dx}{dt}\,dt = \int_0^a (1 - \cos t)(1 + \sin t)\,dt \\
&= \int_0^a 1 + \sin t - \cos t - \sin t\cos t\,dt = \int_0^a 1 + \sin t - \cos t - \frac{1}{2}\sin 2t\,dt \\
&= \left[t - \cos t - \sin t + \frac{1}{4}\cos 2t\right]_0^a = a - \cos a - \sin a + \frac{1}{4}\cos 2a + \frac{3}{4}.
\end{align*}

(iii) The gradient of the normal line at $t = \pi/2$ is:
\[-\left.\frac{dx}{dy}\right|_{t=\pi/2} = -\left.\frac{dx}{dt}\div\frac{dy}{dt}\right|_{t=\pi/2} = -\left.\frac{1 + \sin t}{\sin t}\right|_{t=\pi/2} = -2.\]
Hence, the normal line’s equation is:
\[y - \left(1 - \cos\frac{\pi}{2}\right) = -2\left[x - \left(\frac{\pi}{2} - \cos\frac{\pi}{2}\right)\right] \quad\text{or}\quad y = -2x + \pi + 1.\]

Thus, $E = \left(\dfrac{\pi + 1}{2}, 0\right)$, $F = (0, \pi + 1)$, and the area of $\triangle OEF$ is:
\[\frac{1}{2}\cdot\frac{\pi + 1}{2}\,(\pi + 1) = \frac{(\pi + 1)^2}{4}.\]

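A575 (9740 N2015/I/3)(i) We have graphed some continuous function $f: [0, 1] \to \mathbb{R}$.

[Figure: graph of $y = f(x)$ on $[0, 1]$, with two red rectangles of width $\frac{1}{2}$ (heights $f(\frac{1}{2})$, $f(\frac{2}{2})$) and four blue rectangles of width $\frac{1}{4}$ (heights $f(\frac{1}{4})$, $f(\frac{2}{4})$, $f(\frac{3}{4})$, $f(\frac{4}{4})$).]

Let $A = \int_0^1 f(x)\,dx$. That is, let $A$ be the area under $f$ or more precisely, the area bounded by $f$, the $x$-axis, and the vertical lines $x = 0$ and $x = 1$.
The two red rectangles correspond to the case where $n = 2$ and have total area:
\[\frac{1}{2}\left[f\left(\frac{1}{2}\right) + f\left(\frac{2}{2}\right)\right],\]
which serves as a crude approximation of $A$.
The four blue rectangles correspond to the case where $n = 4$ and have total area:
\[\frac{1}{4}\left[f\left(\frac{1}{4}\right) + f\left(\frac{2}{4}\right) + f\left(\frac{3}{4}\right) + f\left(\frac{4}{4}\right)\right],\]
which serves as a better if still fairly crude approximation of $A$.
It is plausible then that as $n$ grows (and the number of rectangles grows), the expression:
\[\frac{1}{n}\left[f\left(\frac{1}{n}\right) + f\left(\frac{2}{n}\right) + \dots + f\left(\frac{n}{n}\right)\right]\]
serves as an ever-better approximation of $A$.
It is thus plausible that:
\[\lim_{n\to\infty}\frac{1}{n}\left[f\left(\frac{1}{n}\right) + f\left(\frac{2}{n}\right) + \dots + f\left(\frac{n}{n}\right)\right] \overset{1}{=} \int_0^1 f(x)\,dx.\]

(Note that we haven’t proven that $\overset{1}{=}$ is true. We have merely presented an informal argument for why it might be true. Which is all you need to know for H2 Maths.)

(ii) Define $f: \mathbb{R} \to \mathbb{R}$ by $x \mapsto \sqrt[3]{x}$. Then:
\[\frac{1}{n}\left(\frac{\sqrt[3]{1} + \sqrt[3]{2} + \dots + \sqrt[3]{n}}{\sqrt[3]{n}}\right) = \frac{1}{n}\left[\sqrt[3]{\frac{1}{n}} + \sqrt[3]{\frac{2}{n}} + \dots + \sqrt[3]{\frac{n}{n}}\right] = \frac{1}{n}\left[f\left(\frac{1}{n}\right) + f\left(\frac{2}{n}\right) + \dots + f\left(\frac{n}{n}\right)\right].\]
Hence, by (i):
\[\lim_{n\to\infty}\frac{1}{n}\left(\frac{\sqrt[3]{1} + \sqrt[3]{2} + \dots + \sqrt[3]{n}}{\sqrt[3]{n}}\right) = \int_0^1 f(x)\,dx = \int_0^1 \sqrt[3]{x}\,dx = \left[\frac{3}{4}x^{4/3}\right]_0^1 = \frac{3}{4}.\]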
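The limit in (ii) can be illustrated numerically by evaluating the sum for a large $n$. The sketch below is only an illustration (the function name `riemann_cuberoot` and the choice $n = 100\,000$ are arbitrary); it computes the right-endpoint sum $\frac{1}{n}\sum_{k=1}^{n}\sqrt[3]{k/n}$ and compares it with $3/4$.

```python
def riemann_cuberoot(n):
    """Right-endpoint Riemann sum (1/n) * sum_{k=1}^{n} (k/n)^(1/3)."""
    return sum((k / n) ** (1 / 3) for k in range(1, n + 1)) / n

approx = riemann_cuberoot(100_000)
print(approx)  # should be close to 3/4
```

Because the cube root is increasing, the right-endpoint sum slightly overestimates the integral, and the gap shrinks as $n$ grows.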
A576 (9740 N2015/I/4). The rectangle’s perimeter is $2(x + y)$. The semicircle’s is $2x + \pi x$. The sum of these two perimeters is:
\[d = 2(x + y) + 2x + \pi x = (4 + \pi)x + 2y \quad\text{or}\quad y \overset{1}{=} \frac{d}{2} - \left(2 + \frac{\pi}{2}\right)x.\]

The rectangle’s area is $xy$. The semicircle’s is $\pi x^2/2$. The sum of these two areas is:
\[A = xy + \frac{\pi}{2}x^2 \overset{1}{=} x\left[\frac{d}{2} - \left(2 + \frac{\pi}{2}\right)x\right] + \frac{\pi}{2}x^2 = x\left(\frac{d}{2} - 2x\right).\]

Observe that $A$ is a quadratic polynomial in $x$ with negative coefficient on $x^2$ and roots $0$ and $d/4$. Thus, its maximum, which is also its turning point, is given by the midpoint of these two roots, i.e. at $x = d/8$. Hence, the maximised area is:
\[A\big|_{x=d/8} = \frac{d}{8}\left(\frac{d}{2} - 2\cdot\frac{d}{8}\right) = \frac{1}{32}d^2, \quad\text{where } k = \frac{1}{32}.\]

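A577 (9740 N2015/I/6) Refer to List MF26, p. 2.

(i) In the last standard Maclaurin expansion, replace each $x$ with $2x$:
\[\ln(1 + 2x) = (2x) - \frac{(2x)^2}{2} + \frac{(2x)^3}{3} - \dots = 2x - 2x^2 + \frac{8}{3}x^3 - \dots\]
(ii) In the second Maclaurin expansion, replace $n$ with $c$ and each $x$ with $bx$:
\begin{align*}
ax\,(1 + bx)^c &= ax\left[1 + c\,(bx) + \frac{c\,(c-1)(bx)^2}{2!} + \frac{c\,(c-1)(c-2)(bx)^3}{3!} + \dots\right] \\
&= ax + abcx^2 + \frac{1}{2}ab^2c\,(c-1)x^3 + \frac{1}{6}ab^3c\,(c-1)(c-2)x^4 + \dots
\end{align*}
Comparing coefficients, we have: $a \overset{1}{=} 2$, $abc \overset{2}{=} -2$, $\dfrac{1}{2}ab^2c\,(c-1) \overset{3}{=} \dfrac{8}{3}$.
Plug $\overset{1}{=}$ into $\overset{2}{=}$ to get $b \overset{4}{=} -1/c$. Then plug $\overset{1}{=}$ and $\overset{4}{=}$ into $\overset{3}{=}$ to get:
\[\frac{c - 1}{c} = \frac{8}{3} \iff 1 - \frac{1}{c} = \frac{8}{3} \iff c = \frac{-3}{5}.\]
And now from $\overset{4}{=}$, we also have $b = \dfrac{5}{3}$.
Thus, the coefficient of $x^4$ is:
\[\frac{1}{6}ab^3c\,(c-1)(c-2) = \frac{1}{6}(2)\left(\frac{5}{3}\right)^3\left(-\frac{3}{5}\right)\left(-\frac{8}{5}\right)\left(-\frac{13}{5}\right) = \frac{1}{3}\left(\frac{1}{3}\right)^3(-3)(-8)(-13) = -\frac{104}{27}.\]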
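The arithmetic in (ii) is easy to get wrong by hand, so here is a short exact check using Python's `fractions` module. The variable names are illustrative; the code simply substitutes $a = 2$, $c = -3/5$, $b = -1/c$ into the three coefficient conditions and then evaluates the $x^4$ coefficient.

```python
from fractions import Fraction

a = Fraction(2)
c = Fraction(-3, 5)
b = -1 / c  # from abc = -2 with a = 2

# Coefficients of x^2, x^3 and x^4 in the expansion of ax(1 + bx)^c.
coeff_x2 = a * b * c
coeff_x3 = a * b**2 * c * (c - 1) / 2
coeff_x4 = a * b**3 * c * (c - 1) * (c - 2) / 6

print(b, coeff_x2, coeff_x3, coeff_x4)
```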
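A578 (9740 N2015/I/10)(i) $A_1 + A_2 = \int_0^{\pi/2} \cos x\,dx = [\sin x]_0^{\pi/2} = 1$.
\[A_1 = \int_0^{\pi/4} \sin x\,dx + \int_{\pi/4}^{\pi/2} \cos x\,dx = [-\cos x]_0^{\pi/4} + [\sin x]_{\pi/4}^{\pi/2} = \left(-\frac{\sqrt{2}}{2} + 1\right) + \left(1 - \frac{\sqrt{2}}{2}\right) = 2 - \sqrt{2}.\]
Hence: $A_2 = (A_1 + A_2) - A_1 = 1 - (2 - \sqrt{2}) = \sqrt{2} - 1$.
And:
\[\frac{A_1}{A_2} = \frac{2 - \sqrt{2}}{\sqrt{2} - 1} = \frac{2 - \sqrt{2}}{\sqrt{2} - 1}\cdot\frac{\sqrt{2} + 1}{\sqrt{2} + 1} = \frac{2\sqrt{2} - 2 + 2 - \sqrt{2}}{2 - 1} = \sqrt{2}.\]

(ii) The volume of the solid is $\pi\int_0^{\sqrt{2}/2} x^2\,dy = \pi\int_0^{\sqrt{2}/2} (\sin^{-1}y)^2\,dy$.

(iii) From the given substitution $y = \sin u$, we have $\dfrac{dy}{du} = \cos u$.
\begin{align*}
\pi\int_0^{\sqrt{2}/2} (\sin^{-1}y)^2\,dy &= \pi\int_0^{\sqrt{2}/2} u^2\,dy \\
&= \pi\int_{y=0}^{y=\sqrt{2}/2} u^2\frac{dy}{du}\,du \\
&= \pi\int_0^{\pi/4} u^2\cos u\,du. \quad (\star)
\end{align*}

Use Integration by Parts twice:
\begin{align*}
\int u^2\cos u\,du &= u^2\sin u - \int 2u\sin u\,du \\
&= u^2\sin u - 2\left[u\,(-\cos u) - \int (-\cos u)\,du\right] \\
&\overset{2}{=} u^2\sin u + 2u\cos u - 2\sin u.
\end{align*}

Plug $\overset{2}{=}$ into $(\star)$:
\begin{align*}
\pi\int_0^{\pi/4} u^2\cos u\,du &= \pi\left[u^2\sin u + 2u\cos u - 2\sin u\right]_0^{\pi/4} \\
&= \pi\left[\left(\frac{\pi^2}{16}\cdot\frac{\sqrt{2}}{2} + \frac{\pi}{2}\cdot\frac{\sqrt{2}}{2} - \sqrt{2}\right) - (0 + 0 - 0)\right] \\
&= \pi\frac{\sqrt{2}}{2}\left(\frac{\pi^2}{16} + \frac{\pi}{2} - 2\right).
\end{align*}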
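A579 (9740 N2015/I/11)(i)
\[\frac{dy}{dx} = \frac{dy}{d\theta}\div\frac{dx}{d\theta} = \frac{6\sin\theta\cos^2\theta - 3\sin^3\theta}{3\sin^2\theta\cos\theta} = \frac{2\cos^2\theta - \sin^2\theta}{\sin\theta\cos\theta} = 2\frac{\cos\theta}{\sin\theta} - \frac{\sin\theta}{\cos\theta} = 2\cot\theta - \tan\theta.\]
(ii) Stationary point $\iff \dfrac{dy}{dx} = 0 \iff 2\cot\hat\theta - \tan\hat\theta = 0 \iff 2 = \tan^2\hat\theta \iff \tan\hat\theta = \pm\sqrt{2}$.
Since $\hat\theta \in [0, \pi/2]$, $\tan\hat\theta > 0$. We can discard $\tan\hat\theta = -\sqrt{2}$, and so $\tan\hat\theta \overset{1}{=} \sqrt{2}$ ($k = 2$).
At this stationary point, we have:
\[\sin\hat\theta = \sqrt{\frac{2}{3}}, \quad \cos\hat\theta = \sqrt{\frac{1}{3}}, \quad (x, y) = \left(\left(\sqrt{\frac{2}{3}}\right)^3, \frac{2}{\sqrt{3}}\right).\]
\[\frac{d^2y}{dx^2} = \frac{d}{dx}(2\cot\theta - \tan\theta) = \frac{d\theta}{dx}\left(-2\operatorname{cosec}^2\theta - \sec^2\theta\right) = \frac{-2\operatorname{cosec}^2\theta - \sec^2\theta}{3\sin^2\theta\cos\theta}.\]

Evaluated at $\theta = \hat\theta$, this last expression has negative numerator and positive denominator and is thus negative. So by the Second Derivative Test, this is a maximum turning point.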
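(iii) Observe that since $\theta \in [0, \pi/2]$, we have $y = 3\sin^2\theta\cos\theta \ge 0$. And so, the curve $C$ is entirely above the $x$-axis.
Moreover, $x = \sin^3\theta$ is strictly increasing in $\theta$, with the endpoints of $\theta$ matching those of the range of values taken by $x$.
Thus, the requested area is simply:
\begin{align*}
\int_{\theta=0}^{\theta=\pi/2} y\,dx &= \int_{\theta=0}^{\theta=\pi/2} 3\sin^2\theta\cos\theta\,dx = \int_{\theta=0}^{\theta=\pi/2} 3\sin^2\theta\cos\theta\,\frac{dx}{d\theta}\frac{d\theta}{dx}\,dx = \int_{\theta=0}^{\theta=\pi/2} 3\sin^2\theta\cos\theta\,\frac{dx}{d\theta}\,d\theta \\
&= \int_0^{\pi/2} 3\sin^2\theta\cos\theta\,(3\sin^2\theta\cos\theta)\,d\theta \\
&= \int_0^{\pi/2} 9\sin^4\theta\cos^2\theta\,d\theta \approx 0.884.
\end{align*}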
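The calculator value for the area can be cross-checked with a simple numerical integration. The sketch below is an illustration only (the function name `area_under_C` and the midpoint rule are arbitrary choices). It also compares the result with $9\pi/32$, which is the exact value of this integral by the standard reduction (Wallis) formulas, a fact not used in the answer above.

```python
import math

def area_under_C(steps=10000):
    """Midpoint-rule estimate of the integral of 9 sin^4(t) cos^2(t) over [0, pi/2]."""
    h = (math.pi / 2) / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h  # midpoint of the i-th subinterval
        total += 9 * math.sin(t) ** 4 * math.cos(t) ** 2
    return total * h

area = area_under_C()
print(round(area, 3))  # compare with 9*pi/32
```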


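(iv) The intersection points of the line and the curve $C$ are given by:
\[y = ax \quad\text{or}\quad 3\sin^2\theta\cos\theta = a\sin^3\theta \quad\text{or}\quad \frac{3}{a} \overset{2}{=} \tan\theta.\]

From (ii), the maximum point occurs at $\tan\hat\theta \overset{1}{=} \sqrt{2}$. So plug $\overset{1}{=}$ into $\overset{2}{=}$ to get:
\[\frac{3}{a} = \sqrt{2} \quad\text{or}\quad a = \frac{3}{\sqrt{2}} = \frac{3\sqrt{2}}{2}.\]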
A580 (9740 N2015/II/1)(i) The maximum height is attained when
\[\frac{dh}{dt} = 0 \quad\text{or}\quad 16 - \frac{1}{2}h = 0 \quad\text{or}\quad h = 32.\]

(ii) Rearrange the given differential equation as: $\dfrac{dt}{dh} = \dfrac{10}{\sqrt{16 - \frac{1}{2}h}}$.

So: $t = \displaystyle\int \frac{10}{\sqrt{16 - \frac{1}{2}h}}\,dh = -40\sqrt{16 - \frac{1}{2}h} + C = -20\sqrt{64 - 2h} + C$.

Plugging in the given initial condition $(h, t) = (0, 0)$, we have $C = 160$. Hence:
\[t = -20\sqrt{64 - 2h} + 160.\]
Thus: $t\big|_{h=16} = -20\sqrt{64 - 2\cdot 16} + 160 = -20\cdot 4\sqrt{2} + 160$.

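A581 (9740 N2014/I/2). Apply the $\frac{d}{dx}$ operator to the given equation:
\[2xy + x^2\frac{dy}{dx} + y^2 + x\cdot 2y\frac{dy}{dx} \overset{1}{=} 0.\]
Plug $\dfrac{dy}{dx} = -1$ into $\overset{1}{=}$ to get:
\[2xy - x^2 + y^2 - 2xy = y^2 - x^2 = 0 \quad\text{or}\quad y^2 \overset{2}{=} x^2 \quad\text{or}\quad y \overset{3}{=} \pm x.\]

Plug $\overset{2}{=}$ and $\overset{3}{=}$ into the given equation to get:
\[\pm x^3 + x^3 + 54 = 0.\]

Clearly, this last equation is true if and only if $y = x$. So, we have:
\[2x^3 + 54 = 0 \quad\text{or}\quad x = -3.\]
Thus, the unique point at which $\dfrac{dy}{dx} = -1$ is $(-3, -3)$.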
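A582 (9740 N2014/I/7)(i) $\alpha \approx 1.885$ (mindless use of the calculator).
For $\beta$, we work out the “exact value”:
\[f(x) = -7 \iff x^6 - 3x^4 - 7 = -7 \iff x^4(x^2 - 3) = 0 \iff x = 0, \pm\sqrt{3}.\]

From the graph, $\beta = \sqrt{3}$.

(ii) $\displaystyle\int_{\beta}^{\alpha} f(x)\,dx \approx \int_{\sqrt{3}}^{1.885} x^6 - 3x^4 - 7\,dx = \left[\frac{x^7}{7} - \frac{3}{5}x^5 - 7x\right]_{\sqrt{3}}^{1.885} \approx -0.597$.

(iii) $\displaystyle\int_0^{\beta} f(x)\,dx = \int_0^{\sqrt{3}} x^6 - 3x^4 - 7\,dx = \left[\frac{x^7}{7} - \frac{3}{5}x^5 - 7x\right]_0^{\sqrt{3}} = \frac{27}{7}\sqrt{3} - \frac{27}{5}\sqrt{3} - 7\sqrt{3}$.
Hence, the requested area is $-\left(\dfrac{27}{7}\sqrt{3} - \dfrac{27}{5}\sqrt{3} - 7\sqrt{3}\right) - 7\sqrt{3} = \dfrac{54}{35}\sqrt{3}$.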
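The calculator steps in (i) to (iii) can be mimicked with a few lines of Python. This is a sketch under stated assumptions: $f(x) = x^6 - 3x^4 - 7$ as in the working above, a bisection search on the illustrative bracket $[1.5, 2.5]$ for $\alpha$, and the antiderivative `F` used in (ii) and (iii).

```python
import math

def f(x):
    return x**6 - 3 * x**4 - 7

def F(x):
    """Antiderivative of f used in (ii) and (iii)."""
    return x**7 / 7 - 3 * x**5 / 5 - 7 * x

def bisect(g, lo, hi, tol=1e-12):
    """Root of g on [lo, hi], assuming g(lo) and g(hi) have opposite signs."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if (g(lo) < 0) == (g(mid) < 0):
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

alpha = bisect(f, 1.5, 2.5)               # positive root of f(x) = 0
beta = math.sqrt(3)                       # positive solution of f(x) = -7
integral_ii = F(alpha) - F(beta)          # part (ii)
area_iii = -(F(beta) - F(0)) - 7 * beta   # part (iii)

print(round(alpha, 3), round(integral_ii, 3), area_iii)
```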

(iv) $f(-x) = (-x)^6 - 3(-x)^4 - 7 = x^6 - 3x^4 - 7 = f(x)$.

If $\gamma$ solves $f(x) = 0$, then so too does $-\gamma$.
So for example, since $\alpha \approx 1.885$ solves $f(x) = 0$, so too does $-\alpha \approx -1.885$.

Remark 165. The last question was strangely open-ended. I think the above answer
should have sufficed for the full four marks. But of course, who can divine what was on
the mind of those who wrote this question?
Here are two other things that could also have been “said”:

• Compute $f'(x) = 6x^5 - 12x^3 = 6x^3(x^2 - 2)$. Hence, for $x \ge 0$, $f'(x) > 0 \iff x > \sqrt{2}$. Thus, the only positive root is $\alpha$. Therefore, the other four roots are complex.
• We can even find these using only what we’ve learnt in H2 Maths (footnote 423). Write:
\[x^6 - 3x^4 - 7 = (x - \alpha)(x + \alpha)(x^4 + bx^2 + c) = (x^2 - \alpha^2)(x^4 + bx^2 + c).\]

Comparing coefficients on the $x^4$ and constant terms, we have $-3 = -\alpha^2 + b$ and $-7 = -\alpha^2 c$. And thus, $b = \alpha^2 - 3$ and $c = 7/\alpha^2$.
Now let $z = x^2$, so that $x^4 + bx^2 + c = z^2 + bz + c = 0$. By the usual quadratic formula:
\[z = \frac{-b \pm \sqrt{b^2 - 4c}}{2} = \frac{3 - \alpha^2 \pm \sqrt{\alpha^4 - 6\alpha^2 + 9 - 28/\alpha^2}}{2}.\]
So, the other four (complex) roots of $f(x) = 0$ are:
\[x = \pm\sqrt{z} = \pm\sqrt{\frac{3 - \alpha^2 \pm \sqrt{\alpha^4 - 6\alpha^2 + 9 - 28/\alpha^2}}{2}}.\]
Footnote 423: In particular, without knowing how to solve cubic equations.
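A583 (9740 N2014/I/8)(i) List MF26, p. 4 tells us that $\displaystyle\int \frac{1}{\sqrt{a^2 - x^2}}\,dx = \sin^{-1}\frac{x}{a}$.
\[\int f(x)\,dx = \int \frac{1}{\sqrt{9 - x^2}}\,dx = \sin^{-1}\frac{x}{3} + C.\]

(ii)
\begin{align*}
f(x) &= \frac{1}{\sqrt{9 - x^2}} = \frac{1}{3}\cdot\frac{1}{\sqrt{1 - \left(\frac{x}{3}\right)^2}} = \frac{1}{3}\left[1 - \left(\frac{x}{3}\right)^2\right]^{-1/2} \\
&= \frac{1}{3}\left[1 + \left(-\frac{1}{2}\right)\left[-\left(\frac{x}{3}\right)^2\right] + \frac{\left(-\frac{1}{2}\right)\left(-\frac{3}{2}\right)\left[-\left(\frac{x}{3}\right)^2\right]^2}{2!} + \frac{\left(-\frac{1}{2}\right)\left(-\frac{3}{2}\right)\left(-\frac{5}{2}\right)\left[-\left(\frac{x}{3}\right)^2\right]^3}{3!} + \dots\right] \\
&= \frac{1}{3} + \frac{1}{54}x^2 + \frac{1}{648}x^4 + \frac{5}{34\,992}x^6 + \dots
\end{align*}

(iii) By the merry and unjustified application of Theorem ??:
\begin{align*}
\sin^{-1}\frac{x}{3} &= \int f(x)\,dx \\
&= \int \frac{1}{3} + \frac{1}{54}x^2 + \frac{1}{648}x^4 + \frac{5}{34\,992}x^6 + \dots\,dx \\
&= C + \frac{1}{3}x + \frac{1}{162}x^3 + \frac{1}{3\,240}x^5 + \frac{5}{244\,944}x^7 + \dots
\end{align*}

Plugging in $x = 0$, we see that $C = 0$. So:
\[\sin^{-1}\frac{x}{3} = \frac{1}{3}x + \frac{1}{162}x^3 + \frac{1}{3\,240}x^5 + \frac{5}{244\,944}x^7 + \dots\]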

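A584 (9740 N2014/I/10) The given initial condition is $\left(t, x, \dfrac{dx}{dt}\right) \overset{1}{=} \left(0, \dfrac{1}{2}, -\dfrac{1}{4}\right)$.
(i) Plug $\overset{1}{=}$ into the given differential equation:
\[-\frac{1}{4} = k\left[1 + \frac{1}{2} - \left(\frac{1}{2}\right)^2\right] = \frac{5}{4}k \quad\text{or}\quad k = -\frac{1}{5}.\]

(ii) $1 + x - x^2 = \dfrac{5}{4} - \left(x - \dfrac{1}{2}\right)^2$. So:
\begin{align*}
t &= \int \frac{1}{k\,(1 + x - x^2)}\,dx = \int \frac{1}{-\frac{1}{5}\left[\frac{5}{4} - \left(x - \frac{1}{2}\right)^2\right]}\,dx = 5\int \frac{1}{\left(x - \frac{1}{2}\right)^2 - \frac{5}{4}}\,dx \\
&= 5\cdot\frac{1}{2\sqrt{5/4}}\ln\left|\frac{x - 1/2 - \sqrt{5/4}}{x - 1/2 + \sqrt{5/4}}\right| + C = \sqrt{5}\ln\frac{\sqrt{5} + 1 - 2x}{\sqrt{5} + 2x - 1} + C,
\end{align*}

where we can remove the absolute value operator because $x \in \left[0, \frac{1}{2}\right]$.

Plugging in $\overset{1}{=}$, we find that $C = 0$. So: $t \overset{2}{=} \sqrt{5}\ln\dfrac{\sqrt{5} + 1 - 2x}{\sqrt{5} + 2x - 1}$.

(iii)(a) $t\big|_{x=1/4} = \sqrt{5}\ln\dfrac{\sqrt{5} + 1/2}{\sqrt{5} - 1/2} = \sqrt{5}\ln\dfrac{2\sqrt{5} + 1}{2\sqrt{5} - 1}$.

(b) $t\big|_{x=0} = \sqrt{5}\ln\dfrac{\sqrt{5} + 1}{\sqrt{5} - 1} \approx 2.152$.
(iv) Rearrange $\overset{2}{=}$:
\[e^{t/\sqrt{5}} = \frac{\sqrt{5} + 1 - 2x}{2x + \sqrt{5} - 1} = -1 + \frac{2\sqrt{5}}{2x + \sqrt{5} - 1} \iff x = \frac{\sqrt{5}}{e^{t/\sqrt{5}} + 1} + \frac{1 - \sqrt{5}}{2}.\]

[Figure: graph of $x$ against $t$, passing through $\left(0, \frac{1}{2}\right)$ and $\left(\sqrt{5}\ln\frac{\sqrt{5} + 1}{\sqrt{5} - 1}, 0\right) \approx (2.152, 0)$.]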
A585 (9740 N2014/I/11)(i) By the Pythagorean Theorem, $h = \sqrt{4^2 - r^2} = \sqrt{16 - r^2}$.
The hemisphere’s volume is $V_h = \dfrac{2}{3}\pi r^3$ and the cone’s is $V_c = \dfrac{1}{3}\pi r^2 h = \dfrac{1}{3}\pi r^2\sqrt{16 - r^2}$.
So, the total volume is: $V(r) = V_h + V_c = \dfrac{2}{3}\pi r^3 + \dfrac{1}{3}\pi r^2\sqrt{16 - r^2}$.
We have $r \in [0, 4]$. Clearly, $r = 0$ does not produce a maximum.
Compute the derivative:
\[V'(r) = 2\pi r^2 + \frac{\pi}{3}\left(2r\sqrt{16 - r^2} - \frac{r^3}{\sqrt{16 - r^2}}\right).\]
Observe that $V'(4) < 0$. So, 4 cannot be a maximum.
Since neither boundary point 0 or 4 is a maximum, the maximum is attained in the interior. And so, by the Interior Extremum Theorem, the maximum is a stationary point.
So, we now look for any stationary points. Observe that $V'(r) = 0$ is equivalent to:
\[2r + \frac{1}{3}\left(2\sqrt{16 - r^2} - \frac{r^2}{\sqrt{16 - r^2}}\right) \overset{1}{=} 0 \quad\text{or}\quad r \overset{2}{=} 0.\]

We already noted that $r \overset{2}{=} 0$ does not produce a maximum. Now rearrange $\overset{1}{=}$:
\[2r = \frac{1}{3}\left(\frac{r^2}{\sqrt{16 - r^2}} - 2\sqrt{16 - r^2}\right) = \frac{1}{3}\cdot\frac{r^2 - 32 + 2r^2}{\sqrt{16 - r^2}} = \frac{3r^2 - 32}{3\sqrt{16 - r^2}}.\]
\[\overset{\text{Square}}{\implies} \quad 4r^2 = \frac{9r^4 + 1\,024 - 192r^2}{9\,(16 - r^2)} \iff 45r^4 - 768r^2 + 1\,024 \overset{3}{=} 0.\]

(ii) The equation $\overset{3}{=}$ is quadratic in $r^2$. By the quadratic formula:
\[r^2 = \frac{768 \pm \sqrt{768^2 - 4\,(45)(1\,024)}}{2\,(45)} = \frac{384 \pm \sqrt{101\,376}}{45} \approx 15.608,\ 1.457.\]
So, the positive solutions to $\overset{3}{=}$ are: $r_a \approx \sqrt{15.608} \approx 3.951$ and $r_b \approx \sqrt{1.457} \approx 1.207$.
(iii) $V'(r_a) = 0$ and $V'(r_b) \ne 0$. Thus, only $r_a$ is a stationary point (footnote 424).

So: $r_a = r_1 \approx 3.951$ and $h(r_1) = \sqrt{16 - r_1^2} \approx 0.625$.

[Figure: graph of $V$ against $r$, with the point $(r_1, V(r_1)) \approx (3.951, 139)$ marked.]

Footnote 424: The extraneous solution $r_b$ arose because of the squaring step $\overset{\text{Square}}{\implies}$ (see Ch. 26).
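The claim that squaring introduced an extraneous solution can be checked directly by evaluating $V'$ at both candidates. The following is a minimal sketch (the function names `V` and `dV` are illustrative), using the expressions for $V(r)$ and $V'(r)$ from the working above.

```python
import math

def V(r):
    """Total volume: hemisphere plus cone with slant height 4."""
    return (2 / 3) * math.pi * r**3 + (1 / 3) * math.pi * r**2 * math.sqrt(16 - r * r)

def dV(r):
    """Derivative V'(r), valid for 0 < r < 4."""
    s = math.sqrt(16 - r * r)
    return 2 * math.pi * r * r + (math.pi / 3) * (2 * r * s - r**3 / s)

# Both positive solutions of 45 r^4 - 768 r^2 + 1024 = 0.
disc = math.sqrt(768**2 - 4 * 45 * 1024)
r_a = math.sqrt((768 + disc) / 90)
r_b = math.sqrt((768 - disc) / 90)

print(r_a, dV(r_a))  # derivative is (numerically) zero: a genuine stationary point
print(r_b, dV(r_b))  # derivative is clearly non-zero: an extraneous solution
print(V(r_a))        # roughly 139
```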


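A586 (9740 N2014/II/2). (See List MF26, p. 2.)
\[\frac{9x^2 + x - 13}{(2x - 5)(x^2 + 9)} = \frac{A}{2x - 5} + \frac{Bx + C}{x^2 + 9} = \frac{(A + 2B)x^2 + (2C - 5B)x + 9A - 5C}{(2x - 5)(x^2 + 9)}.\]
Comparing coefficients: $A + 2B \overset{1}{=} 9$, $2C - 5B \overset{2}{=} 1$, $9A - 5C \overset{3}{=} -13$.
Solving, we have $A = 3$, $B = 3$, $C = 8$. Hence, the given definite integral equals:
\begin{align*}
\int_0^2 \frac{3}{2x - 5} + \frac{3x + 8}{x^2 + 9}\,dx &= \left[\frac{3}{2}\ln|2x - 5| + \frac{3}{2}\ln(x^2 + 9) + \frac{8}{3}\tan^{-1}\frac{x}{3}\right]_0^2 \\
&= \frac{3}{2}(\ln 1 - \ln 5) + \frac{3}{2}(\ln 13 - \ln 9) + \frac{8}{3}\left(\tan^{-1}\frac{2}{3} - \tan^{-1}\frac{0}{3}\right) = \frac{3}{2}\ln\frac{13}{45} + \frac{8}{3}\tan^{-1}\frac{2}{3}.
\end{align*}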
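As a sanity check on the partial fractions and the closed form, the original integrand can be integrated numerically and compared with $\frac{3}{2}\ln\frac{13}{45} + \frac{8}{3}\tan^{-1}\frac{2}{3}$. This sketch uses a composite Simpson's rule; the function names and the number of subintervals are illustrative choices.

```python
import math

def integrand(x):
    """Original integrand, before partial fractions."""
    return (9 * x * x + x - 13) / ((2 * x - 5) * (x * x + 9))

def simpson(g, a, b, n=1000):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = g(a) + g(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(a + i * h)
    return total * h / 3

numeric = simpson(integrand, 0.0, 2.0)
closed_form = 1.5 * math.log(13 / 45) + (8 / 3) * math.atan(2 / 3)
print(numeric, closed_form)  # the two values should agree closely
```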

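A587 (9740 N2013/I/5)(i) For $x \in [-a, a]$, we have $f(x) = \sqrt{1 - \left(\dfrac{x}{a}\right)^2}$ or equivalently:
\[[f(x)]^2 + \left(\frac{x}{a}\right)^2 = 1, \quad\text{with } f(x) \ge 0.\]

Thus, this portion of the graph is a semi-ellipse with $y$-intercept 1 and $x$-intercepts $\pm a$.
On $[-4a, -2a]$ and $[2a, 4a]$, there are two identical semi-ellipses. And on $[5a, 6a]$, there is a quarter-ellipse. For all other $x$, we have $y = 0$. Altogether then:

[Figure: graph of $y = f(x)$, with ticks at $x = -4a, -2a, -a, a, 3a, 5a, 6a$.]

(ii) From the given substitution $x = a\sin\theta$, we have $\dfrac{dx}{d\theta} = a\cos\theta$. We specify also that $\theta \in [-\pi/2, \pi/2]$, so that $\sqrt{\cos^2\theta} = \cos\theta$.

Note that both the upper and lower limits of integration are in $[-a, a]$. So:
\begin{align*}
\int_{a/2}^{\sqrt{3}a/2} f(x)\,dx &= \int_{a/2}^{\sqrt{3}a/2} \sqrt{1 - \frac{x^2}{a^2}}\,dx = \int_{a/2}^{\sqrt{3}a/2} \sqrt{1 - \frac{x^2}{a^2}}\,\frac{dx}{d\theta}\frac{d\theta}{dx}\,dx \\
&= \int_{\pi/6}^{\pi/3} \sqrt{1 - \sin^2\theta}\,a\cos\theta\,d\theta = \int_{\pi/6}^{\pi/3} \sqrt{\cos^2\theta}\,a\cos\theta\,d\theta = a\int_{\pi/6}^{\pi/3} \cos^2\theta\,d\theta \\
&= a\int_{\pi/6}^{\pi/3} \frac{\cos 2\theta + 1}{2}\,d\theta = a\left[\frac{1}{4}\sin 2\theta + \frac{1}{2}\theta\right]_{\pi/6}^{\pi/3} = \frac{\pi}{12}a.
\end{align*}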
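A588 (9740 N2013/I/10)(i) Note that $z < 3/2$ implies $3 - 2z > 0$. Rearrange (A):
\[\frac{dx}{dz} = \frac{1}{3 - 2z} \quad\text{or}\quad x = \int \frac{1}{3 - 2z}\,dz = -\frac{1}{2}\ln|3 - 2z| + C_1 = -\frac{1}{2}\ln(3 - 2z) + C_1.\]

Or: $3 - 2z = C_3 e^{-2x}$ or $3 - C_3 e^{-2x} = 2z$ or $z = C_4 e^{-2x} + \dfrac{3}{2}$.

(ii) $\dfrac{dy}{dx} = z = C_4 e^{-2x} + \dfrac{3}{2}$. So, $y = \displaystyle\int C_4 e^{-2x} + \frac{3}{2}\,dx = C_5 e^{-2x} + \frac{3}{2}x + C_6$.

(iii) $\dfrac{d^2y}{dx^2} = \dfrac{d}{dx}\dfrac{dy}{dx} = -2C_4 e^{-2x} = a\dfrac{dy}{dx} + b = a\left(C_4 e^{-2x} + \dfrac{3}{2}\right) + b$.
Comparing coefficients, $a = -2$ and $b = 3$.

(iv) Two of these lines are:
\[y = \frac{3}{2}x \quad (C_5 = 0,\ C_6 = 0) \quad\text{and}\quad y = \frac{3}{2}x + 1 \quad (C_5 = 0,\ C_6 = 1).\]
A non-linear member of this family is:
\[y = e^{-2x} + \frac{3}{2}x \quad (C_5 = 1,\ C_6 = 0),\]
which has the line $y = \dfrac{3}{2}x$ as its asymptote.

[Figure: the curve $y = e^{-2x} + \frac{3}{2}x$ and its asymptote $y = \frac{3}{2}x$.]
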

A589 (9740 N2013/I/11)(i) $\dfrac{dy}{dx} = \dfrac{dy}{dt}\div\dfrac{dx}{dt} = 6t^2 \div 6t = t$.
So the tangent has equation $y - 2t^3 = t\,(x - 3t^2)$ or $y = tx - t^3$.
(ii) The two tangent lines are: $y \overset{1}{=} px - p^3$ and $y \overset{2}{=} qx - q^3$.
For $R$, equate $\overset{1}{=}$ and $\overset{2}{=}$:
\[px_R - p^3 = qx_R - q^3 \quad\text{or}\quad (p - q)\,x_R = p^3 - q^3 \quad\text{or}\quad x_R = \frac{p^3 - q^3}{p - q} = p^2 + pq + q^2.\]
(Note that it’s OK to divide by $p - q$ because $p \ne q$.)
The corresponding $y$-coordinate is $y_R = px_R - p^3 = p\,(x_R - p^2) = p\,(pq + q^2)$. Hence:
\[R = \left(p^2 + pq + q^2,\ p\,(pq + q^2)\right).\]

If $pq = -1$, then $R = (p^2 - 1 + q^2, -p - q)$ and we can verify that $x_R = y_R^2 + 1$:
\[x_R = p^2 - 1 + q^2 = p^2 + q^2 + 2pq + 1 = (-p - q)^2 + 1 = y_R^2 + 1.\]

(iii) Plug the given parametric equations into $x = y^2 + 1$ to get:
\[3t^2 = 4t^6 + 1 \quad\text{or}\quad 4t^6 - 3t^2 + 1 \overset{3}{=} 0.\]

Observe that $t^2 = -1$ solves $\overset{3}{=}$. So write:
\[4t^6 - 3t^2 + 1 = (t^2 + 1)(4t^4 + bt^2 + c).\]

Comparing coefficients, we have $b = -4$ and $c = 1$. So:
\[4t^6 - 3t^2 + 1 = (t^2 + 1)(4t^4 - 4t^2 + 1) = (t^2 + 1)(2t^2 - 1)(2t^2 - 1).\]

Thus, the only real solutions to $\overset{3}{=}$ are given by $2t^2 - 1 = 0$ or $t = \pm\sqrt{1/2}$.

Since $y \ge 0$ at $M$, we must have $t = \sqrt{1/2}$. Thus:
\[M = \left(3\cdot\frac{1}{2},\ 2\left(\sqrt{\frac{1}{2}}\right)^3\right) = \left(\frac{3}{2},\ \sqrt{\frac{1}{2}}\right).\]

(iv) In the given region, $C$ and $L$ may be described by: $y = 2\left(\dfrac{x}{3}\right)^{1.5}$ and $y = \sqrt{x - 1}$.
The area under $C$, above the $x$-axis, and from 0 to $M$ is:
\[\int_0^{3/2} 2\left(\frac{x}{3}\right)^{1.5} dx = \left[\frac{2}{3^{1.5}\cdot 2.5}x^{2.5}\right]_0^{3/2} = \frac{4}{5\cdot 3^{1.5}}\left(\frac{3}{2}\right)^{2.5} = \frac{3}{5\sqrt{2}} = \frac{3\sqrt{2}}{10}.\]
The area under $L$, above the $x$-axis, and from 1 to $M$ is:
\[\int_1^{3/2} \sqrt{x - 1}\,dx = \left[\frac{2}{3}(x - 1)^{1.5}\right]_1^{3/2} = \frac{2}{3}\left(\frac{1}{2}\right)^{1.5} = \frac{1}{3\sqrt{2}} = \frac{\sqrt{2}}{6}.\]
Hence, the requested area is $\dfrac{3\sqrt{2}}{10} - \dfrac{\sqrt{2}}{6} = \dfrac{2}{15}\sqrt{2}$.
A590 (9740 N2013/II/2). The three figures reproduced, but with D and E added:

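[Figures: Fig. 1, Fig. 2, and Fig. 3, with D and E marked.]

(i) $|AD| = x \tan\dfrac{\pi}{3} = \sqrt{3}x$. Hence:

\[ |DE| = |AB| - (|AD| + |BE|) = a - 2|AD| = a - 2\sqrt{3}x. \tag{1} \]

Note that $|DE|$ is also the length of each side of the equilateral triangle in Fig. 2. So, that triangle’s area is:

\[ T = \frac{1}{2}|DE|^2 \sin\frac{\pi}{3} = \frac{1}{2}\left(a - 2\sqrt{3}x\right)^2 \frac{\sqrt{3}}{2} = \frac{1}{4}\sqrt{3}\left(a - 2\sqrt{3}x\right)^2. \]

The prism has height $x$ and hence volume:

\[ V(x) = xT = \frac{1}{4}x\sqrt{3}\left(a - 2\sqrt{3}x\right)^2. \]

(ii) From (1), we have $x \leq a/(2\sqrt{3}) = \sqrt{3}a/6$. Hence, $x \in [0, \sqrt{3}a/6]$. Observe that the boundary points $x = 0$ and $x = \sqrt{3}a/6$ correspond to $V = 0$ and thus cannot be maximum points. So, the maximum is attained in the interior. Thus, by the Interior Extremum Theorem, the maximum is also a stationary point.

And so, let us find any stationary points of $V$.

\[ V'(x) = \frac{\sqrt{3}}{4}\left[\left(a - 2\sqrt{3}x\right)^2 + 2x\left(a - 2\sqrt{3}x\right)\left(-2\sqrt{3}\right)\right] \]
\[ = \frac{\sqrt{3}}{4}\left(a - 2\sqrt{3}x\right)\left(a - 2\sqrt{3}x - 4\sqrt{3}x\right) = \frac{\sqrt{3}}{4}\left(a - 2\sqrt{3}x\right)\left(a - 6\sqrt{3}x\right). \]

Now:

\[ V'(x) = 0 \iff x = \frac{a}{2\sqrt{3}} = \frac{\sqrt{3}}{6}a \quad\text{or}\quad x = \frac{a}{6\sqrt{3}} = \frac{\sqrt{3}}{18}a. \]

We already saw that $x = \sqrt{3}a/6$ is not a maximum. So, the maximum must be:

\[ x = \frac{\sqrt{3}}{18}a \quad\text{and}\quad V_{\max} = V\left(\frac{\sqrt{3}}{18}a\right) = \frac{1}{4} \cdot \frac{\sqrt{3}}{18}a \cdot \sqrt{3}\left(a - 2\sqrt{3} \cdot \frac{\sqrt{3}}{18}a\right)^2 = \frac{a}{24}\left(a - \frac{a}{3}\right)^2 = \frac{a^3}{54}. \]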
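A quick numerical cross-check of this maximum, sketched in Python under the assumption $a = 1$ (any positive $a$ would do). The function names `volume` and `grid_max` and the grid size are our own illustrative choices.

```python
import math

def volume(x, a=1.0):
    """V(x) = (sqrt(3)/4) * x * (a - 2*sqrt(3)*x)^2, the prism volume from part (i)."""
    return math.sqrt(3) / 4 * x * (a - 2 * math.sqrt(3) * x) ** 2

def grid_max(a=1.0, steps=100_000):
    """Return (x, V(x)) maximising V over a uniform grid on [0, sqrt(3)*a/6]."""
    upper = math.sqrt(3) * a / 6
    best_x = max((upper * i / steps for i in range(steps + 1)),
                 key=lambda x: volume(x, a))
    return best_x, volume(best_x, a)

x_best, v_best = grid_max()
print(x_best, math.sqrt(3) / 18)  # grid maximiser vs. sqrt(3)*a/18 with a = 1
print(v_best, 1 / 54)             # grid maximum vs. a^3/54 with a = 1
```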


A591 (9740 N2013/II/3)(i) f (0) = ln (1 + 2 sin 0) = ln (1 + 0) = ln 1 = 0.

\[ f'(x) = \frac{2\cos x}{1 + 2\sin x} \implies f'(0) = 2. \]

\[ f''(x) = 2\left[\frac{-\sin x}{1 + 2\sin x} - \frac{2\cos^2 x}{(1 + 2\sin x)^2}\right] \implies f''(0) = 2(0 - 2) = -4. \]

\[ f'''(x) = 2\left[\frac{-\cos x}{1 + 2\sin x} + \frac{2\sin x\cos x}{(1 + 2\sin x)^2} + \frac{4\cos x\sin x}{(1 + 2\sin x)^2} + \frac{4\cos^3 x \cdot 2}{(1 + 2\sin x)^3}\right] \]
\[ \implies f'''(0) = 2(-1 + 0 + 0 + 8) = 14. \]

Those computations were a wonderful and highly enriching experience. Now:

\[ f(x) = \frac{f(0)}{0!} + \frac{f'(0)}{1!}x + \frac{f''(0)}{2!}x^2 + \frac{f'''(0)}{3!}x^3 + \dots = 0 + 2x - 2x^2 + \frac{7}{3}x^3 + \dots \]

(ii)
\[ e^{ax}\sin nx = \left[1 + ax + \frac{(ax)^2}{2!} + \dots\right]\left[nx - \frac{(nx)^3}{3!} + \dots\right] = nx + anx^2 + \left(\frac{a^2 n}{2} - \frac{n^3}{6}\right)x^3 + \dots \]

Comparing the first two non-zero coefficients, we have $n = 2$, $an = -2$, and so $a = -1$. Hence, the third non-zero term is:

\[ \left(\frac{a^2 n}{2} - \frac{n^3}{6}\right)x^3 = \left[\frac{(-1)^2 \cdot 2}{2} - \frac{2^3}{6}\right]x^3 = \left(1 - \frac{8}{6}\right)x^3 = -\frac{x^3}{3}. \]

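A592 (9740 N2012/I/2)(i) $\displaystyle\int \frac{x^3}{1 + x^4}\, dx = \frac{1}{4}\ln(1 + x^4) + C.$

(ii) From the given substitution $u = x^2$, we have $\dfrac{du}{dx} = 2x$ or $x = \dfrac{1}{2}\dfrac{du}{dx}$. Now:

\[ \int \frac{x}{1 + x^4}\, dx = \int \frac{1}{2} \cdot \frac{1}{1 + x^4}\frac{du}{dx}\, dx = \frac{1}{2}\int \frac{1}{1 + u^2}\, du \overset{\star}{=} \frac{1}{2}\tan^{-1} u + C = \frac{1}{2}\tan^{-1} x^2 + C. \]

(Note that $\overset{\star}{=}$ is given on List MF26, p. 4.)

(iii) Just use your calculator: $\displaystyle\int_0^1 \left(\frac{x}{1 + x^4}\right)^2 dx \approx 0.186.$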
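If no graphing calculator is at hand, the value in (iii) can be reproduced with a few lines of Python. This is only a sketch: the integrand $(x/(1+x^4))^2$ is as we read it from the answer above, and `simpson` is our own helper, not something from the syllabus.

```python
def simpson(f, a, b, n=1000):
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def integrand(x):
    # (x / (1 + x^4))^2
    return (x / (1 + x ** 4)) ** 2

value = simpson(integrand, 0.0, 1.0)
print(round(value, 3))
```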
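A593 (9740 N2012/I/4)(i) Observe that: $\angle ACB = \pi - (\angle ABC + \angle BAC) = \dfrac{\pi}{4} - \theta.$

By the Law of Sines, we have $\dfrac{AC}{\sin\angle ABC} = \dfrac{AB}{\sin\angle ACB}$ (with $AB = 1$), or:

\[ AC = \sin\frac{3\pi}{4} \cdot \frac{1}{\sin\left(\frac{\pi}{4} - \theta\right)} = \sin\frac{3\pi}{4} \cdot \frac{1}{\sin\frac{\pi}{4}\cos\theta - \sin\theta\cos\frac{\pi}{4}} = \frac{1}{\cos\theta - \sin\theta}, \]

where the last step uses $\sin\dfrac{3\pi}{4} = \sin\dfrac{\pi}{4} = \cos\dfrac{\pi}{4}$.

(ii) From the small-angle approximations $\cos\theta \approx 1 - \frac{1}{2}\theta^2$ and $\sin\theta \approx \theta$, we have:

\[ \cos\theta - \sin\theta \approx 1 - \theta - \frac{1}{2}\theta^2. \]

Now use the Maclaurin series expansion for $(1 + x)^n$, with $x = -\theta - \frac{1}{2}\theta^2$ and $n = -1$:

\[ \frac{1}{\cos\theta - \sin\theta} \approx \frac{1}{1 - \theta - \frac{1}{2}\theta^2} = 1 + nx + \frac{n(n - 1)}{2!}x^2 + \dots \]
\[ \approx 1 + (-1)\left(-\theta - \frac{1}{2}\theta^2\right) + \frac{(-1)(-2)}{2}\left(-\theta - \frac{1}{2}\theta^2\right)^2 \approx 1 + \theta + \frac{3}{2}\theta^2. \]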

A594 (9740 N2012/I/8)(i) Differentiate the given equation with respect to x:

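\[ 1 - \frac{dy}{dx} = 2(x + y)\left(1 + \frac{dy}{dx}\right) = 2(x + y) + 2(x + y)\frac{dy}{dx}. \]

Or: $1 - 2x - 2y = (2x + 2y + 1)\dfrac{dy}{dx}.$

For points at which $2x + 2y + 1 \neq 0$, we have:

\[ \frac{dy}{dx} = \frac{1 - 2x - 2y}{2x + 2y + 1} \quad\text{or}\quad 1 + \frac{dy}{dx} = 1 + \frac{1 - 2x - 2y}{2x + 2y + 1} = \frac{2}{2x + 2y + 1}. \tag{1} \]

(ii) Apply $\dfrac{d}{dx}$ to (1), then do some algebra:

\[ \frac{d^2 y}{dx^2} = \frac{-2}{(2x + 2y + 1)^2}\left(2 + 2\frac{dy}{dx}\right) = \frac{-4}{(2x + 2y + 1)^2}\left(1 + \frac{dy}{dx}\right) = -\left(1 + \frac{dy}{dx}\right)^3. \]

(iii) The turning point occurs where $\dfrac{dy}{dx} = 0$. But at any such point, $\dfrac{d^2 y}{dx^2} = -1 < 0$.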
So, by the Second Derivative Test, the turning point is a maximum.
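A595 (9740 N2012/I/10)(i) The model’s volume is:

\[ k = \pi r^2 h + \frac{2}{3}\pi r^3. \tag{1} \]

So:

\[ h = \frac{k}{\pi r^2} - \frac{2}{3}r. \tag{2} \]

And the model’s external surface area is:

\[ A = \pi r^2 + 2\pi rh + 2\pi r^2 = \pi r(2h + 3r) = \pi r\left[2\left(\frac{k}{\pi r^2} - \frac{2}{3}r\right) + 3r\right] = \frac{2k}{r} + \frac{5}{3}\pi r^2. \tag{3} \]

Compute: $\dfrac{dA}{dr} = -\dfrac{2k}{r^2} + \dfrac{10}{3}\pi r.$

And so: $\dfrac{dA}{dr} = 0 \iff -\dfrac{2k}{r^2} + \dfrac{10}{3}\pi r = 0 \iff r = \left(\dfrac{3k}{5\pi}\right)^{1/3}.$

By (2), the corresponding value of $h$ is:

\[ h = \frac{k}{\pi r^2} - \frac{2}{3}r = \frac{k}{\pi\left(\frac{3k}{5\pi}\right)^{2/3}} - \frac{2}{3}\left(\frac{3k}{5\pi}\right)^{1/3} = \left(\frac{5}{3}\right)^{2/3}\frac{k^{1/3}}{\pi^{1/3}} - \frac{2k^{1/3}}{3^{2/3}5^{1/3}\pi^{1/3}} = \left(\frac{3k}{5\pi}\right)^{1/3}. \]

To show that this is indeed a minimum, we compute the second derivative:

\[ \frac{d^2 A}{dr^2} = \frac{4k}{r^3} + \frac{10}{3}\pi > 0. \]

The second derivative is positive for all $r > 0$. So by the Second Derivative Test, the point we found is indeed a global minimum.

(ii) We are now given instead that $k = 200$ and:

\[ A = 180 = \frac{2k}{r} + \frac{5}{3}\pi r^2 = \frac{400}{r} + \frac{5}{3}\pi r^2 \quad\text{or}\quad \frac{5}{3}\pi r^3 - 180r + 400 = 0. \tag{4} \]

Use your calculator to solve (4):

\[ r \approx -6.759 \text{ (reject)} \quad\text{or}\quad r \approx 3.037 \quad\text{or}\quad r \approx 3.722. \]

By (2), the corresponding values of $h$ are:

\[ h = \frac{k}{\pi r^2} - \frac{2}{3}r = \frac{200}{\pi r^2} - \frac{2}{3}r \approx 4.877,\ 2.116. \]

Given the constraint that $r < h$, we have:

\[ (r, h) \approx (3.037, 4.877). \]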
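For readers who want to see where the calculator’s three roots come from, here is a minimal bisection sketch in Python. The bracketing intervals were picked by us after checking the sign of the cubic at a few points; the function names `cubic`, `bisect`, and `height` are hypothetical helpers for illustration only.

```python
import math

def cubic(r):
    """Left-hand side of (5/3)*pi*r^3 - 180*r + 400 = 0 (the case k = 200, A = 180)."""
    return 5 * math.pi / 3 * r ** 3 - 180 * r + 400

def bisect(f, lo, hi, tol=1e-10):
    """Bisection on [lo, hi]; assumes f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = (lo + hi) / 2
        f_mid = f(mid)
        if (f_lo < 0) == (f_mid < 0):
            lo, f_lo = mid, f_mid   # root lies in the upper half
        else:
            hi = mid                # root lies in the lower half
    return (lo + hi) / 2

def height(r, k=200):
    """h = k/(pi r^2) - (2/3) r."""
    return k / (math.pi * r ** 2) - 2 * r / 3

roots = [bisect(cubic, -8.0, -6.0), bisect(cubic, 3.0, 3.3), bisect(cubic, 3.5, 4.0)]
for r in roots:
    print(round(r, 3), round(height(r), 3))
```

Only the middle root satisfies $r < h$, in line with the answer above.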


A596 (9740 N2012/I/11)(i) Note that x is increasing in θ.

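\[ \frac{dy}{dx} = \frac{dy}{d\theta} \div \frac{dx}{d\theta} = \frac{\sin\theta}{1 - \cos\theta} = \frac{2\sin\frac{\theta}{2}\cos\frac{\theta}{2}}{2\sin^2\frac{\theta}{2}} = \frac{\cos\frac{\theta}{2}}{\sin\frac{\theta}{2}} \overset{\star}{=} \cot\frac{\theta}{2}. \]

Note that $\dfrac{dy}{dx}$ is undefined where $\theta = 0$ or $\theta = 2\pi$. Also, $\left.\dfrac{dy}{dx}\right|_{\theta = \pi} = 0$.

As $\theta \to 0$, $\dfrac{dy}{dx} \to \infty$. And as $\theta \to 2\pi$, $\dfrac{dy}{dx} \to -\infty$.
So, as $\theta \to 0$ or $\theta \to 2\pi$, the tangents to $C$ tend towards being vertical.

(ii) We’re supposed to just copy from a calculator but we can actually figure this one out without one. At the endpoints $(\theta, x, y) = (0, 0, 0)$ and $(\theta, x, y) = (2\pi, 2\pi, 0)$, the tangents are vertical. At the midpoint, $(\theta, x, y) = (\pi, \pi, 2)$ and the tangent is horizontal. Between the endpoints, $\dfrac{dy}{dx} = \cot\dfrac{\theta}{2}$, which is positive for $\theta < \pi$ and negative for $\theta > \pi$.

[Figure: sketch of $C$ with the points $(\theta, x, y) = (0, 0, 0)$, $(\pi, \pi, 2)$, and $(2\pi, 2\pi, 0)$ labelled.]

(iii) Given $x = \theta - \sin\theta$ and $y = 1 - \cos\theta$, we have $\dfrac{dx}{d\theta} = 1 - \cos\theta$ and:

\[ \int_0^{2\pi} y\, dx = \int_{\theta = 0}^{\theta = 2\pi} y\frac{dx}{d\theta}\, d\theta = \int_0^{2\pi} (1 - \cos\theta)^2\, d\theta = \int_0^{2\pi} 1 + \cos^2\theta - 2\cos\theta\, d\theta \]
\[ = \int_0^{2\pi} 1 + \frac{\cos 2\theta + 1}{2} - 2\cos\theta\, d\theta = \left[\frac{3}{2}\theta + \frac{1}{4}\sin 2\theta - 2\sin\theta\right]_0^{2\pi} = 3\pi. \]

(iv) The gradient of the normal to $C$ at $P$ is: $-\dfrac{1}{dy/dx} = -\tan\dfrac{p}{2}.$

Thus, the normal is:

\[ y - (1 - \cos p) = -\tan\frac{p}{2}\left[x - (p - \sin p)\right]. \tag{3} \]

For the $x$-intercept, plug $y = 0$ into (3) to get $\cos p - 1 = -\tan\dfrac{p}{2}\left[x - (p - \sin p)\right]$ or:

\[ 1 - \cos p = \tan\frac{p}{2}\left[x - (p - \sin p)\right] \quad\text{or}\quad \cot\frac{p}{2} = \frac{x - (p - \sin p)}{1 - \cos p}. \]

And now by $\dfrac{\sin\theta}{1 - \cos\theta} \overset{\star}{=} \cot\dfrac{\theta}{2}$, we have $x - (p - \sin p) = \sin p$ and so $x = p$.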
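A597 (9740 N2012/II/1)(a) $\dfrac{dy}{dx} = \displaystyle\int 16 - 9x^2\, dx = 16x - 3x^3 + C_1.$

\[ y = \int 16x - 3x^3 + C_1\, dx = 8x^2 - \frac{3}{4}x^4 + C_1 x + C_2. \]

(b) If $u \neq \dfrac{4}{3}$, then $\dfrac{dt}{du} = \dfrac{1}{16 - 9u^2}$ and:

\[ t = \int \frac{1}{16 - 9u^2}\, du = \frac{1}{9}\int \frac{1}{(4/3)^2 - u^2}\, du = \frac{1}{9} \cdot \frac{1}{2 \cdot (4/3)}\ln\left|\frac{4/3 + u}{4/3 - u}\right| + C = \frac{1}{24}\ln\left|\frac{4 + 3u}{4 - 3u}\right| + C. \]

Plug in the given initial condition $(t, u) = (0, 1)$ to find that $C = -\dfrac{1}{24}\ln 7$. So:

\[ t = \frac{1}{24}\left(\ln\left|\frac{4 + 3u}{4 - 3u}\right| - \ln 7\right) = \frac{1}{24}\ln\left|\frac{4 + 3u}{7(4 - 3u)}\right| \quad\text{for } u \neq \frac{4}{3}. \]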

A598 (9740 N2011/I/3)(i) The tangent at the point with parameter $t$ has gradient:

\[ \frac{dy}{dx} = \frac{dy}{dt} \div \frac{dx}{dt} = -\frac{2}{t^2} \div (2t) = -\frac{1}{t^3}. \]

So the tangent has equation: $y - \dfrac{2}{p} = -\dfrac{1}{p^3}(x - p^2)$ or $y = -\dfrac{x}{p^3} + \dfrac{3}{p}$.

(ii) For the $x$-intercept $Q$, plug in $y = 0$: $0 = -x/p^3 + 3/p$ or $x = 3p^2$. For the $y$-intercept $R$, plug in $x = 0$: $y = 0 + 3/p$. So, $Q = (3p^2, 0)$ and $R = (0, 3/p)$.

(iii) The mid-point is $(x, y) = (1.5p^2, 1.5/p)$ and has equation $xy^2 = 1.5p^2(1.5/p)^2 = 1.5^3$.

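A599 (9740 N2011/I/4)(i) Recall that $(a + b)^6 = a^6 + 6a^5 b + \dots$ So, using List MF26, with $a = 1 - \dfrac{x^2}{2}$ and $b = \dfrac{x^4}{24}$:

\[ \cos^6 x = \left(1 - \frac{x^2}{2} + \frac{x^4}{24} + \dots\right)^6 = \left(1 - \frac{x^2}{2}\right)^6 + 6\left(1 - \frac{x^2}{2}\right)^5\frac{x^4}{24} + \dots \]
\[ = 1 + 6\left(\frac{-x^2}{2}\right) + 15\left(\frac{-x^2}{2}\right)^2 + 6(1)\frac{x^4}{24} + \dots = 1 - 3x^2 + 4x^4 + \dots \]

(ii)(a) By the merry and unjustified application of Theorem ??:

\[ \int_0^a \cos^6 x\, dx = \int_0^a 1 - 3x^2 + 4x^4 + \dots\, dx = \left[x - x^3 + \frac{4}{5}x^5 + \dots\right]_0^a = a - a^3 + \frac{4}{5}a^5 + \dots \]

If $a = \pi/4$, then $\displaystyle\int_0^{\pi/4} \cos^6 x\, dx \approx \frac{\pi}{4} - \left(\frac{\pi}{4}\right)^3 + \frac{4}{5}\left(\frac{\pi}{4}\right)^5 \approx 0.540.$

(b) By our calculator: $\displaystyle\int_0^{\pi/4} \cos^6 x\, dx \approx 0.475.$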

Using the first few terms of the Maclaurin series as an approximation works well if π/4 is
near 0. But it isn’t and so this approximation doesn’t work well.
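The gap between the two numbers can be seen directly with a short Python sketch. It compares the three-term series estimate against a numerical integral; `simpson` is our own helper and the step count is an arbitrary choice.

```python
import math

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

a = math.pi / 4
series_estimate = a - a ** 3 + 4 / 5 * a ** 5          # from 1 - 3x^2 + 4x^4
numeric_value = simpson(lambda x: math.cos(x) ** 6, 0.0, a)

print(round(series_estimate, 3), round(numeric_value, 3))
```

The two values differ already in the second decimal place, which is the point being made about $\pi/4$ not being close to 0.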
A600 (9740 N2011/I/5)(i) The graphs of $y = f(|x|)$ and $y = |f(x)|$ are identical to that of $f$, except that:
• Where $x < 0$, $y = f(|x|)$ is the reflection in the $y$-axis of the part of $f$ where $x > 0$.
• Where $f(x) < 0$, $y = |f(x)|$ is the reflection of $f$ in the $x$-axis.

[Figures: graphs of $y = f(|x|)$ and $y = |f(x)|$, each shown together with $y = f(x)$; the points $(0, 2)$, $(2, 0)$, and $(-2, 0)$ are labelled.]

\[ y = f(|x|) = \begin{cases} 2 - x & \text{for } x \geq 0, \\ 2 + x & \text{for } x < 0. \end{cases} \qquad y = |f(x)| = \begin{cases} 2 - x & \text{for } x \leq 2, \\ x - 2 & \text{for } x > 2. \end{cases} \]

(ii) For $x < 0$, we have $f(|x|) = 2 + x \neq 2 - x = |f(x)|$. ✗

For $x \in [0, 2]$, we have $f(|x|) = 2 - x = 2 - x = |f(x)|$. ✓

For $x > 2$, we have $f(|x|) = 2 - x \neq x - 2 = |f(x)|$. ✗

Altogether then, $f(|x|) = |f(x)| \iff x \in [0, 2]$.

(iii)
\[ \int_{-1}^1 f(|x|)\, dx = \int_{-1}^0 2 + x\, dx + \int_0^1 2 - x\, dx = \left[2x + \frac{x^2}{2}\right]_{-1}^0 + \left[2x - \frac{x^2}{2}\right]_0^1 = 3. \tag{1} \]

\[ \int_1^a |f(x)|\, dx = \int_1^2 (2 - x)\, dx + \int_2^a (x - 2)\, dx = \left[2x - \frac{x^2}{2}\right]_1^2 + \left[\frac{x^2}{2} - 2x\right]_2^a \]
\[ = \left(4 - 2 - 2 + \frac{1}{2}\right) + \left(\frac{a^2}{2} - 2a - 2 + 4\right) = \frac{a^2}{2} - 2a + \frac{5}{2}. \tag{2} \]

Set (1) and (2) to be equal: $\dfrac{a^2}{2} - 2a + \dfrac{5}{2} = 3$ or $a^2 - 4a - 1 = 0$.

By the quadratic formula: $a = \dfrac{4 \pm \sqrt{(-4)^2 - 4(1)(-1)}}{2(1)} = 2 \pm \sqrt{4 + 1} = 2 \pm \sqrt{5}.$

We can discard $a = 2 - \sqrt{5} < 2$. So, $a = 2 + \sqrt{5}$.
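A601 (9740 N2011/I/8)(i) $\displaystyle\int \frac{1}{100 - v^2}\, dv = \frac{1}{20}\ln\left|\frac{10 + v}{10 - v}\right| + C_1$ (see List MF26, p. 4).

(ii)(a) Rearrange the given differential equation: $\dfrac{dt}{dv} = \dfrac{1}{10 - 0.1v^2} = 10 \cdot \dfrac{1}{100 - v^2}.$

So: $t = \displaystyle\int \frac{1}{10 - 0.1v^2}\, dv = \int 10 \cdot \frac{1}{100 - v^2}\, dv = \frac{1}{2}\ln\left|\frac{10 + v}{10 - v}\right| + C_2.$

Plug in the given initial condition $(t, v) = (0, 0)$ to get $0 = \dfrac{1}{2}\ln\left|\dfrac{10 + 0}{10 - 0}\right| + C_2$. So, $C_2 = 0$.

Thus:
\[ t = \frac{1}{2}\ln\left|\frac{10 + v}{10 - v}\right| \tag{1} \]
and $\left. t\right|_{v = 5} = \dfrac{1}{2}\ln\left|\dfrac{10 + 5}{10 - 5}\right| = \dfrac{1}{2}\ln 3.$

(b) Rearrange (1): $2t = \ln\left|\dfrac{10 + v}{10 - v}\right| \iff e^{2t} = \left|\dfrac{10 + v}{10 - v}\right|.$

For $v < 10$:
\[ e^{2t} = \frac{10 + v}{10 - v} = -1 + \frac{20}{10 - v} \iff v(t) = 10 - \frac{20}{e^{2t} + 1}. \tag{2} \]

So, if we start with $v < 10$, then for all $t$, we have $v < 10$. And we do start from $v < 10$ (indeed, we start at $v = 0$). So (2) holds for all $t$ and $v(1) = 10 - 20/(e^2 + 1)$.

(c) $\displaystyle\lim_{t \to \infty} v(t) = 10.$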


A602 (9740 N2011/II/2)(i) The box has length L = 2 (n − x), breadth B = n − 2x, and
height H = x. It thus has volume:

V = LBH = 2 (n − x) (n − 2x) x = 2 (n2 − 3nx + 2x2 ) x = 2n2 x − 6nx2 + 4x3 .

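(ii) Compute: $\dfrac{dV}{dx} = 2n^2 - 12nx + 12x^2$. So:

\[ \frac{dV}{dx} = 0 \iff 2n^2 - 12nx + 12x^2 = 0 \iff n^2 - 6nx + 6x^2 = 0. \]

Or, by the quadratic formula applied to $6x^2 - 6nx + n^2 = 0$:

\[ x = \frac{6n \pm \sqrt{(-6n)^2 - 4(6)(n^2)}}{2(6)} = \frac{3n \pm \sqrt{9n^2 - 6n^2}}{6} = \frac{1}{2}n \pm \frac{\sqrt{3}}{6}n = \frac{1}{2}\left(1 \pm \frac{\sqrt{3}}{3}\right)n. \]

We discard the larger value of $x$ because it implies that $B < 0$:

\[ B = n - 2x = n - 2 \cdot \frac{1}{2}\left(1 + \frac{\sqrt{3}}{3}\right)n = -\frac{\sqrt{3}}{3}n < 0. \]

Thus, the only stationary value of $V$ occurs at $x = \dfrac{1}{2}\left(1 - \dfrac{\sqrt{3}}{3}\right)n$.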
A603 (9740 N2011/II/4)(a)(i) Use Integration by Parts twice:

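\[ \int_0^n x^2 e^{-2x}\, dx = \left[x^2\left(\frac{-1}{2}\right)e^{-2x} - \int 2x\left(\frac{-1}{2}\right)e^{-2x}\, dx\right]_0^n = \left[-\frac{1}{2}x^2 e^{-2x} + \int xe^{-2x}\, dx\right]_0^n \]
\[ = \left[-\frac{1}{2}x^2 e^{-2x} + x\left(\frac{-1}{2}\right)e^{-2x} - \int \left(\frac{-1}{2}\right)e^{-2x}\, dx\right]_0^n = \left[-\frac{1}{2}x^2 e^{-2x} - \frac{1}{2}xe^{-2x} - \frac{1}{4}e^{-2x}\right]_0^n \]
\[ = \left[-\frac{1}{4}e^{-2x}\left(2x^2 + 2x + 1\right)\right]_0^n = -\frac{1}{4}e^{-2n}\left(2n^2 + 2n + 1\right) + \frac{1}{4}. \]

(a)(ii) $\displaystyle\int_0^\infty x^2 e^{-2x}\, dx = \lim_{n \to \infty}\left[-\frac{1}{4}e^{-2n}\left(2n^2 + 2n + 1\right) + \frac{1}{4}\right] = \frac{1}{4}.$

(b) From the given substitution $x = \tan\theta$, we have $\dfrac{dx}{d\theta} = \sec^2\theta$ and:

\[ V = \pi\int_0^1 y^2\, dx = \pi\int_0^1 \left(\frac{4x}{x^2 + 1}\right)^2 dx = \pi\int_{x = 0}^{x = 1} \frac{16\tan^2\theta}{(\tan^2\theta + 1)^2}\, dx = \pi\int_{x = 0}^{x = 1} \frac{16\tan^2\theta}{(\sec^2\theta)^2}\, dx \]
\[ = \pi\int_{\theta = 0}^{\theta = \pi/4} \frac{16\tan^2\theta}{\sec^4\theta}\frac{dx}{d\theta}\, d\theta = \pi\int_0^{\pi/4} \frac{16\tan^2\theta}{\sec^2\theta}\, d\theta = 16\pi\int_0^{\pi/4} \sin^2\theta\, d\theta \]
\[ = 8\pi\int_0^{\pi/4} 1 - \cos 2\theta\, d\theta = 8\pi\left[\theta - \frac{1}{2}\sin 2\theta\right]_0^{\pi/4} = 8\pi\left(\frac{\pi}{4} - \frac{1}{2}\right) = 2\pi(\pi - 2). \]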
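Since a sign slip is easy to make in the last step of (b), here is a small Python sketch that evaluates the volume integral numerically. The value comes out near 7.173, which matches $8\pi(\pi/4 - 1/2) = 2\pi(\pi - 2)$. The helper `simpson` is our own and is not part of the exam answer.

```python
import math

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

# V = pi * integral over [0, 1] of y^2, with y = 4x / (x^2 + 1)
volume = math.pi * simpson(lambda x: (4 * x / (x * x + 1)) ** 2, 0.0, 1.0)
closed_form = 2 * math.pi * (math.pi - 2)

print(volume, closed_form)
```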

A604 (9740 N2010/I/2)(i) $e^x = 1 + x + \dfrac{1}{2}x^2 + \dots$ and $1 + \sin 2x = 1 + 2x + \dots$

Thus: $e^x(1 + \sin 2x) = \left(1 + x + \dfrac{1}{2}x^2 + \dots\right)(1 + 2x + \dots) = 1 + 3x + \dfrac{5}{2}x^2 + \dots$

(ii) $\left(1 + \dfrac{4x}{3}\right)^n = 1 + n\dfrac{4x}{3} + \dfrac{n(n - 1)}{2}\left(\dfrac{4x}{3}\right)^2 + \dots = 1 + \dfrac{4n}{3}x + \dfrac{8n(n - 1)}{9}x^2 + \dots$

So, $3 = \dfrac{4n}{3}$ or $n = \dfrac{9}{4}$. And $\dfrac{8n(n - 1)}{9} = \dfrac{8(9/4)(5/4)}{9} = \dfrac{5}{2}$. ✓

A605 (9740 N2010/I/4)(i) Differentiate the given equation with respect to x:

\[ 2x - 2y\frac{dy}{dx} + 2y + 2x\frac{dy}{dx} = 0 \iff 2(x - y)\frac{dy}{dx} = -2(x + y) \iff \frac{dy}{dx} = \frac{x + y}{y - x}. \]

(Note that it’s OK to divide by $y - x$ because $y - x \neq 0$: if $x = y$, then the given equation implies $2x^2 + 4 = 0$, which has no real solutions.)

(ii) Tangent parallel to $x$-axis $\iff dy/dx = 0 \iff y = -x$. (1)

Plug (1) into the given equation: $x^2 - x^2 - 2x^2 + 4 = 0$ or $x^2 = 2$ or $x = \pm\sqrt{2}$.

So, the two points are $(\mp\sqrt{2}, \pm\sqrt{2})$.
A606 (9740 N2010/I/6)(i) Using your calculator:

$\beta \approx 0.347$ and $\gamma \approx 1.532$.

(ii) Again, use your calculator: $\left|\displaystyle\int_\beta^\gamma x^3 - 3x + 1\, dx\right| \approx 0.781.$

(iii) The curve and line intersect at $x = -\sqrt{3}$. So, the requested area is:

\[ \int_{-\sqrt{3}}^0 x^3 - 3x\, dx = \left[\frac{x^4}{4} - \frac{3}{2}x^2\right]_{-\sqrt{3}}^0 = -\frac{9}{4} + \frac{3}{2} \cdot 3 = \frac{9}{4}. \]

(iv) Let $f(x) = x^3 - 3x + 1$. The maximum and minimum values of $f$ are:

\[ f(-1) = (-1)^3 - 3(-1) + 1 = 3 \quad\text{and}\quad f(1) = 1^3 - 3 \cdot 1 + 1 = -1. \]

By observation, $x^3 - 3x + 1 = k$ has three distinct real roots if and only if $k$ is strictly between the above two values. That is, $k \in (-1, 3)$.

A607 (9740 N2010/I/7) We will assume throughout that θ < 20.

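(i) The differential equation is:
\[ \frac{d\theta}{dt} = k(20 - \theta) \quad\text{or}\quad \frac{dt}{d\theta} = \frac{1}{k(20 - \theta)}. \tag{1} \]

The initial condition is $\left(t, \theta, \dfrac{d\theta}{dt}\right) = (0, 10, 1)$. (2) Plug (2) into (1):

\[ 1 = k(20 - 10) \quad\text{or}\quad k = \frac{1}{10}. \]

So:
\[ t = 10\int \frac{1}{20 - \theta}\, d\theta = -10\ln|20 - \theta| + C = -10\ln(20 - \theta) + C. \tag{3} \]

Plug (2) into (3): $0 = -10\ln(20 - 10) + C$ or $C = 10\ln 10$. So:

\[ t(\theta) = 10\ln\frac{10}{20 - \theta} \quad\text{or}\quad e^{0.1t} = \frac{10}{20 - \theta} \quad\text{or}\quad \theta(t) = 20 - 10e^{-0.1t}. \]

(ii) $t(15) = 10\ln 2 \approx 6.931.$

\[ \lim_{t \to \infty}\theta(t) = \lim_{t \to \infty}\left(20 - 10e^{-0.1t}\right) = 20. \]

[Figure: graph of $\theta(t)$ against $t$, approaching the horizontal asymptote $\theta = 20$.]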


A608 (9740 N2010/I/9)(i) The box has volume 3x2 y = 300. Rearranging, y = 100/x2 .

The box’s and lid’s external surface areas are:

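\[ A_{\text{Box}} = 2(xy + 3xy) + 3x^2 = 2\left(\frac{100}{x} + \frac{300}{x}\right) + 3x^2 = \frac{800}{x} + 3x^2. \]
\[ A_{\text{Lid}} = 2(kxy + 3xky) + 3x^2 = \frac{800}{x}k + 3x^2. \]

Hence, the total external surface area is: $A = A_{\text{Box}} + A_{\text{Lid}} = \dfrac{800}{x}(1 + k) + 6x^2.$

We have $x \in (0, \infty)$. Hence, by the Interior Extremum Theorem, the minimum must be a stationary point. So, let’s find any stationary points:

\[ \frac{dA}{dx} = -\frac{800}{x^2}(1 + k) + 12x = 0 \iff 12x^3 = 800(1 + k) \iff x = \sqrt[3]{200(1 + k)/3}. \]

Thus, $A$ is minimised at: $x = \sqrt[3]{\dfrac{200}{3}(1 + k)}.$

(ii) $\dfrac{y}{x} = \dfrac{100}{x^2} \div x = \dfrac{100}{x^3} = 100 \div \dfrac{200(1 + k)}{3} = \dfrac{3}{2(1 + k)}.$

(iii) $k \in (0, 1] \iff 2(1 + k) \in (2, 4] \iff \dfrac{1}{2(1 + k)} \in \left[\dfrac{1}{4}, \dfrac{1}{2}\right) \iff \dfrac{3}{2(1 + k)} \in \left[\dfrac{3}{4}, \dfrac{3}{2}\right).$

\[ \frac{y}{x} = 1 \iff \frac{3}{2(1 + k)} = 1 \iff k = \frac{1}{2}. \]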


A609 (9740 N2010/I/11)(i) The tangent at t has gradient:

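\[ \frac{dy}{dx} = \frac{dy}{dt} \div \frac{dx}{dt} = \frac{1 + t^{-2}}{1 - t^{-2}} = \frac{t^2 + 1}{t^2 - 1} \quad (\text{for } t \neq 0, \pm 1). \]

So the tangent at point $P$ has equation:

\[ y - \left(p - \frac{1}{p}\right) = \frac{p^2 + 1}{p^2 - 1}\left[x - \left(p + \frac{1}{p}\right)\right] \quad\text{or}\quad (p^2 - 1)y - \frac{(p^2 - 1)^2}{p} = (p^2 + 1)\left(x - \frac{p^2 + 1}{p}\right). \]

Multiply through by $p$:

\[ p(p^2 - 1)y - (p^2 - 1)^2 = (p^2 + 1)\left[px - (p^2 + 1)\right]. \]

Rearranging:

\[ (p^2 + 1)x - (p^2 - 1)y = \frac{1}{p}\left[(p^2 + 1)^2 - (p^2 - 1)^2\right] = \frac{1}{p}(2 \cdot 2p^2) = 4p. \tag{1} \]

(ii) For $A$, plug $y = x$ into (1): $(p^2 + 1)x - (p^2 - 1)x = 4p$ or $x = 2p$. So, $A = (2p, 2p)$.

For $B$, plug $y = -x$ into (1): $(p^2 + 1)x + (p^2 - 1)x = 4p$ or $x = \dfrac{2}{p}$. So, $B = \left(\dfrac{2}{p}, -\dfrac{2}{p}\right)$.

Observe that since $y = x$ and $y = -x$ are perpendicular, $OA$ and $OB$ are likewise perpendicular. Hence, the area of the triangle $OAB$ is simply given by the primary-school formula “half base times height”:

\[ \frac{1}{2}|OA||OB| = \frac{1}{2}\left|\sqrt{(2p)^2 + (2p)^2}\right|\left|\sqrt{\left(\frac{2}{p}\right)^2 + \left(-\frac{2}{p}\right)^2}\right| = \frac{1}{2}\sqrt{8p^2}\sqrt{\frac{8}{p^2}} = \frac{1}{2} \cdot 8 = 4, \]

which is indeed independent of $p$.

(iii) Observe that $x + y = 2t$ and $x - y = \dfrac{2}{t}$. Hence, $x^2 - y^2 = (x + y)(x - y) = 4$.
This is an east-west hyperbola with $x$-intercepts $(\pm 2, 0)$ and asymptotes $y = \pm x$.

[Figure: the hyperbola $C$ with $x$-intercepts $(2, 0)$ and $(-2, 0)$ and asymptotes $y = x$ and $y = -x$.]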


A610 (9740 N2010/II/3)(i) Find the stationary points:

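\[ \frac{dy}{dx} = \sqrt{x + 2} + \frac{x}{2\sqrt{x + 2}} = \frac{x + 2 + x/2}{\sqrt{x + 2}} = \frac{1.5x + 2}{\sqrt{x + 2}}. \]

\[ \frac{dy}{dx} = 0 \iff \frac{1.5x + 2}{\sqrt{x + 2}} = 0 \iff x = -\frac{4}{3}. \]

So, $x = -4/3$ is the only stationary point. To show that this is also a minimum turning point, we use the Second Derivative Test:

\[ \left.\frac{d^2 y}{dx^2}\right|_{x = -4/3} = \left[\frac{1.5}{\sqrt{x + 2}} - \frac{1}{2} \cdot \frac{1.5x + 2}{(x + 2)^{1.5}}\right]_{x = -4/3} = \left[\frac{1.5(x + 2) - 0.75x - 1}{(x + 2)^{1.5}}\right]_{x = -4/3} > 0. \]

(ii)(a) The given equation is equivalent to $y = \pm x\sqrt{x + 2}$, which is, of course, very similar to the equation given in (i).

From (i), we immediately know that: $\dfrac{dy}{dx} = \pm\dfrac{1.5x + 2}{\sqrt{x + 2}}.$

Hence: $\left.\dfrac{dy}{dx}\right|_{x = 0} = \pm\dfrac{1.5 \cdot 0 + 2}{\sqrt{0 + 2}} = \pm\dfrac{2}{\sqrt{2}} = \pm\sqrt{2}.$

(ii)(b) [Figure: graph of $y^2 = x^2(x + 2)$.]

(iii) [Figure: graph of $y = f'(x) = \dfrac{1.5x + 2}{\sqrt{x + 2}}$, with vertical asymptote $x = -2$.]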

A611 (9740 N2009/I/2). Use what’s given on List MF26, p. 4:

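\[ \int_0^1 \frac{1}{4 - x^2}\, dx = \left[\frac{1}{2 \cdot 2}\ln\left|\frac{2 + x}{2 - x}\right|\right]_0^1 = \frac{1}{4}\left(\ln\frac{2 + 1}{2 - 1} - \ln\frac{2 + 0}{2 - 0}\right) = \frac{1}{4}\ln 3. \]

\[ \int_0^{1/2p} \frac{1}{\sqrt{1 - p^2 x^2}}\, dx = \frac{1}{p}\int_0^{1/2p} \frac{1}{\sqrt{1/p^2 - x^2}}\, dx = \frac{1}{p}\left[\sin^{-1}\frac{x}{1/p}\right]_0^{1/2p} = \frac{1}{p}\sin^{-1}\frac{1}{2} = \frac{\pi}{6p}. \]

\[ \frac{1}{4}\ln 3 = \frac{\pi}{6p} \iff p = \frac{4\pi}{6\ln 3} = \frac{2\pi}{3\ln 3} \approx 1.906. \]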
A612 (9740 N2009/I/4)(i) f (27) + f (45) = f (3) + f (1) = 5 + 6 = 11.
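(ii) [Figure: graph of $y = f(x)$, with the points $(-7, 6)$, $(-6, 3)$, $(-4, 7)$, $(-2, 3)$, $(0, 7)$, $(2, 3)$, $(4, 7)$, $(6, 3)$, $(8, 7)$, and $(10, 3)$ labelled.]

(iii)
\[ \int_{-4}^3 f(x)\, dx = \int_{-4}^{-2} f(x)\, dx + \int_{-2}^0 f(x)\, dx + \int_0^2 f(x)\, dx + \int_2^3 f(x)\, dx \]
\[ = \int_0^2 7 - x^2\, dx + \int_2^4 2x - 1\, dx + \int_0^2 7 - x^2\, dx + \int_2^3 2x - 1\, dx \]
\[ = 2\left[7x - \frac{1}{3}x^3\right]_0^2 + \left[x^2 - x\right]_2^4 + \left[x^2 - x\right]_2^3 = 22\tfrac{2}{3} + 12 - 2 + 6 - 2 = 36\tfrac{2}{3}. \]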

A613 (9740 N2009/I/7)(i) f (0) = ecos 0 = e1 = e.

\[ f'(x) = e^{\cos x}(-\sin x) \implies f'(0) = e^{\cos 0}(-\sin 0) = 0. \]
\[ f''(x) = e^{\cos x}(-\sin x)^2 + e^{\cos x}(-\cos x) \implies f''(0) = 0 - e^1 \cdot 1 = -e. \]

So: $e^{\cos x} = \dfrac{f(0)}{0!} + \dfrac{f'(0)}{1!}x + \dfrac{f''(0)}{2!}x^2 + \dots = e + 0x + \dfrac{-e}{2}x^2 + \dots = e - \dfrac{e}{2}x^2 + \dots$

(ii) Let $g(x) = \dfrac{1}{a + bx^2}$. So $g(0) = \dfrac{1}{a}$. Also: $g'(x) = -\dfrac{2bx}{(a + bx^2)^2} \implies g'(0) = 0.$

And: $g''(x) = -\dfrac{2b}{(a + bx^2)^2} + \dfrac{8b^2 x^2}{(a + bx^2)^3} \implies g''(0) = -\dfrac{2b}{a^2}.$

Since the first two non-zero terms coincide, we have: $f(0) = g(0)$ or $a = 1/e$.

And: $f''(0) = g''(0)$ or $-e = -\dfrac{2b}{a^2} = -2be^2$ or $b = \dfrac{1}{2e}.$
A614 (9740 N2009/I/11)(i)

(ii) Compute: $f'(x) = e^{-x^2} + xe^{-x^2}(-2x) = e^{-x^2}(1 - 2x^2).$

So, $f'(x) = 0 \iff x = \pm\sqrt{0.5}$. So, the two stationary points are $\left(\pm\sqrt{0.5}, \pm\sqrt{0.5}e^{-1/2}\right)$.

To verify that these are also turning points, compute: $f''(x) = -2xe^{-x^2}(1 - 2x^2) + e^{-x^2}(-4x).$

And so: $f''\left(\pm\sqrt{0.5}\right) = \mp 2\sqrt{0.5}e^{-0.5}(1 - 1) \mp 4\sqrt{0.5}e^{-0.5} = \mp 2\sqrt{2}e^{-0.5}.$

The second derivative is negative at $x = \sqrt{0.5}$ and positive at $x = -\sqrt{0.5}$. Hence, by the Second Derivative Test, the first point is a maximum turning point and the second is a minimum turning point.

(iii) From the given substitution $u = x^2$, we have $\dfrac{du}{dx} = 2x$ and:

\[ \int_0^n xe^{-x^2}\, dx = \int_0^n \frac{1}{2}\frac{du}{dx}e^{-u}\, dx = \frac{1}{2}\int_{u = 0}^{u = n^2} e^{-u}\, du = \frac{1}{2}\left[-e^{-u}\right]_0^{n^2} = \frac{1}{2}\left(1 - e^{-n^2}\right). \]

So, the requested area is: $\displaystyle\lim_{n \to \infty}\int_0^n xe^{-x^2}\, dx = \lim_{n \to \infty}\frac{1}{2}\left(1 - e^{-n^2}\right) = \frac{1}{2}.$

(iv) By symmetry: $\displaystyle\int_{-2}^2 \left|xe^{-x^2}\right| dx = 2\int_0^2 xe^{-x^2}\, dx = 2\left[\frac{1}{2}\left(1 - e^{-n^2}\right)\right]_{n = 2} = 1 - e^{-4}.$

(v) Calculator: $\pi\displaystyle\int_0^1 y^2\, dx = \pi\int_0^1 \left(xe^{-x^2}\right)^2 dx \approx 0.363.$
A615 (9740 N2009/II/1)(i) Observations:
1. x = t2 + 4t = t (t + 4). So, x is minimised at t = −2 and is strictly increasing for t ∈ [−2, 1].
2. y ′ (t) = 3t2 + 2t = t (3t + 2), so that y has turning points at t = −2/3 and t = 0.

When t = 1,
(x, y) = (5, 2)

When t = −2,
(x, y) = (−4, −4)

(ii) Compute: $\dfrac{dy}{dx} = \dfrac{dy}{dt} \div \dfrac{dx}{dt} = \dfrac{3t^2 + 2t}{2t + 4}$. So: $\left.\dfrac{dy}{dx}\right|_{t = 2} = \dfrac{3 \cdot 2^2 + 2 \cdot 2}{2 \cdot 2 + 4} = \dfrac{16}{8} = 2.$

Thus: $P = (2^2 + 4 \cdot 2, 2^3 + 2^2) = (12, 12)$.

So, $l$ has cartesian equation:
\[ y - 12 = 2(x - 12) \quad\text{or}\quad y = 2x - 12. \tag{1} \]

(iii) Plug the given parametric equations into (1) to get:

\[ t^3 + t^2 = 2(t^2 + 4t) - 12 \quad\text{or}\quad t^3 - t^2 - 8t + 12 = 0. \tag{2} \]

The solutions to (2) give us the points at which the curve $C$ and the line $l$ intersect.

We already know that $t = 2$ solves (2), because $P$ is an intersection point. So write:

\[ t^3 - t^2 - 8t + 12 = (t - 2)(t^2 + bt + c). \]

Comparing coefficients, we have $c = -6$ and $b - 2 = -1$ or $b = 1$. So:

\[ t^3 - t^2 - 8t + 12 = (t - 2)(t^2 + t - 6) = (t - 2)(t - 2)(t + 3). \]

So, the only other intersection point is at $t = -3$, which corresponds to the point:

\[ Q = \left((-3)^2 + 4(-3), (-3)^3 + (-3)^2\right) = (-3, -18). \]

A616 (9740 N2009/II/4)(i) $\dfrac{dn}{dt} = \displaystyle\int \frac{d^2 n}{dt^2}\, dt = \int 10 - 6t\, dt = 10t - 3t^2 + C.$

And so, the general solution is:
\[ n = \int \frac{dn}{dt}\, dt = \int 10t - 3t^2 + C\, dt = 5t^2 - t^3 + Ct + D. \tag{1} \]

The remainder of the answer for (i) is no longer in the current 9758 syllabus.

Plug the initial condition $(t, n) = (0, 100)$ into (1): $100 = 5 \cdot 0^2 - 0^3 + C \cdot 0 + D = D.$

So, the family of curves is: $n = 5t^2 - t^3 + Ct + 100.$

Sketched below are three members of this family, corresponding to $C = 0$, $C = 1$, and $C = 2$:

[Figure: three members of the family $n = 5t^2 - t^3 + Ct + 100$, each passing through $n = 100$ at $t = 0$.]

(ii) We will simply assume that $n < 150$.

Rearrange the given differential equation: $\dfrac{dt}{dn} = \dfrac{1}{3 - 0.02n} = \dfrac{50}{150 - n}$. So:

\[ t = \int \frac{dt}{dn}\, dn = \int \frac{50}{150 - n}\, dn = -50\ln|150 - n| + E = -50\ln(150 - n) + E. \tag{2} \]

Plug the given initial condition $(t, n) = (0, 100)$ into (2):

\[ 0 = -50\ln(150 - 100) + E = -50\ln 50 + E. \quad\text{So, } E = 50\ln 50. \]

And: $t = 50\ln\dfrac{50}{150 - n}$ or $e^{t/50} = \dfrac{50}{150 - n}$ or $n = 150 - 50e^{-t/50}.$

As t → ∞, n → 150. So, the population will approach 150 thousand.

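A617 (9740 N2008/I/1). The dotted area is $\displaystyle\int_a^4 \sqrt{y}\, dy = \left[\frac{2}{3}y^{3/2}\right]_a^4 = \frac{2}{3}\left(8 - a^{3/2}\right).$

The grey area is $\displaystyle\int_1^2 x^2\, dx = \left[\frac{1}{3}x^3\right]_1^2 = \frac{1}{3}(8 - 1) = \frac{7}{3}$. If these two areas are equal, then:

\[ \frac{2}{3}\left(8 - a^{3/2}\right) = \frac{7}{3} \iff 8 - a^{3/2} = \frac{7}{2} \iff a = \left(\frac{9}{2}\right)^{2/3} \approx 2.726. \]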

A618 (9740 N2008/I/4)(i)
\[ y = \int \frac{dy}{dx}\, dx = \int \frac{3x}{x^2 + 1}\, dx = \frac{3}{2}\ln(x^2 + 1) + C. \tag{1} \]

(ii) Plug the initial condition $(x, y) = (0, 2)$ into (1) to get: $2 = 1.5\ln 1 + C = C.$

So, $y = 1.5\ln(x^2 + 1) + 2.$

(iii) As $x \to \pm\infty$, $dy/dx \to 0$.

[Figure: three members of the family $y = 1.5\ln(x^2 + 1) + C$, namely $C = 0$, $C = 1$, and $y = 1.5\ln(x^2 + 1) + 2$.]

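A619 (9740 N2008/I/5)(i)

\[ \int_0^{1/\sqrt{3}} \frac{1}{1 + 9x^2}\, dx = \frac{1}{9}\int_0^{1/\sqrt{3}} \frac{1}{(1/3)^2 + x^2}\, dx = \frac{1}{9}\left[3\tan^{-1} 3x\right]_0^{1/\sqrt{3}} = \frac{1}{3}\tan^{-1}\sqrt{3} = \frac{\pi}{9}. \]

(ii)
\[ \int_1^e x^n\ln x\, dx = \left[\frac{1}{n + 1}x^{n + 1}\ln x\right]_1^e - \int_1^e \frac{1}{n + 1}x^{n + 1}\frac{1}{x}\, dx = \frac{1}{n + 1}e^{n + 1} - \left[\frac{1}{(n + 1)^2}x^{n + 1}\right]_1^e \]
\[ = \frac{1}{n + 1}e^{n + 1} - \frac{1}{(n + 1)^2}\left[e^{n + 1} - 1^{n + 1}\right] = \frac{(n + 1)e^{n + 1} + 1 - e^{n + 1}}{(n + 1)^2} = \frac{ne^{n + 1} + 1}{(n + 1)^2}. \]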
A620 (9740 N2008/I/6)(a) Use the Law of Cosines:

AC 2 = AB 2 + BC 2 − 2 (AB) (BC) cos θ = 1 + 9 − 6 cos θ = 10 − 6 cos θ.

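So: $AC = \sqrt{10 - 6\cos\theta}$. Plug in the small-angle approximation $\cos\theta \approx 1 - \dfrac{1}{2}\theta^2$:

\[ AC \approx \sqrt{10 - 6\left(1 - \frac{1}{2}\theta^2\right)} = \sqrt{4 + 3\theta^2}. \]

So: $\sqrt{4 + 3\theta^2} = 2\left(1 + \dfrac{3}{4}\theta^2\right)^{1/2} = 2\left[1 + \dfrac{1}{2}\left(\dfrac{3}{4}\theta^2\right) + \dots\right] = 2 + \dfrac{3}{4}\theta^2 + \dots$

Hence, $a = 2$ and $b = 3/4$.

(b) $f(0) = \tan\dfrac{\pi}{4} = 1$. $f'(x) = 2\sec^2\left(2x + \dfrac{\pi}{4}\right) \implies f'(0) = 2\sec^2\dfrac{\pi}{4} = 2 \cdot 2 = 4.$

\[ f''(x) = 8\sec^2\left(2x + \frac{\pi}{4}\right)\tan\left(2x + \frac{\pi}{4}\right) \implies f''(0) = 8\sec^2\frac{\pi}{4}\tan\frac{\pi}{4} = 8 \cdot 2 \cdot 1 = 16. \]

Hence, $f(x) = 1 + 4x + \dfrac{16}{2!}x^2 + \dots = 1 + 4x + 8x^2 + \dots$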

A621 (9740 N2008/I/7). The straight parts have length x + 2y. The semicircular part
has length πx/2. So, the total time to build the wall is: 3 (x + 2y) + 9πx/2 = 180.
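Rearranging:
\[ y = 30 - \frac{3}{4}\pi x - \frac{1}{2}x = 30 - \left(\frac{1}{2} + \frac{3}{4}\pi\right)x. \tag{1} \]

The area of the flower-bed is:

\[ A = xy + \frac{1}{2}\pi\left(\frac{x}{2}\right)^2 = 30x - \left(\frac{1}{2} + \frac{3}{4}\pi\right)x^2 + \frac{\pi}{8}x^2 = 30x - \left(\frac{1}{2} + \frac{5}{8}\pi\right)x^2. \]

Observe that $A$ is a quadratic polynomial in $x$ with negative coefficient on $x^2$. Hence, its maximum is at “$x = -b/2a$” or:

\[ x = \frac{-30}{-2\left(\frac{1}{2} + \frac{5}{8}\pi\right)} = \frac{120}{4 + 5\pi} \approx 6.089. \tag{2} \]

For the corresponding value of $y$, plug (2) into (1):

\[ y = 30 - \left(\frac{1}{2} + \frac{3}{4}\pi\right)\frac{120}{4 + 5\pi} = 30 - \frac{60 + 90\pi}{4 + 5\pi} = 60 \cdot \frac{1 + \pi}{4 + 5\pi} \approx 12.609. \]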
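The figures above are easy to double-check in Python. The sketch below recomputes $x$ and $y$ from the closed forms, confirms that the building-time constraint $3(x + 2y) + 9\pi x/2 = 180$ holds, and checks that nudging $x$ in either direction reduces the area. The function names are our own.

```python
import math

def y_of(x):
    """y from the time constraint: y = 30 - (1/2 + 3*pi/4) x."""
    return 30 - (0.5 + 0.75 * math.pi) * x

def area(x):
    """Flower-bed area: rectangle x*y plus a semicircle of radius x/2."""
    return x * y_of(x) + 0.5 * math.pi * (x / 2) ** 2

x_star = 120 / (4 + 5 * math.pi)
y_star = y_of(x_star)
build_time = 3 * (x_star + 2 * y_star) + 9 * math.pi * x_star / 2

print(round(x_star, 3), round(y_star, 3), round(build_time, 6))
print(area(x_star) > area(x_star - 0.01), area(x_star) > area(x_star + 0.01))
```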
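A622 (9740 N2008/II/1)(i), (iii) [Figure: sketch for parts (i) and (iii).]

(ii) $e^x\sin x = \left(1 + x + \dfrac{x^2}{2} + \dots\right)\left(x - \dfrac{x^3}{3!} + \dots\right) = x + x^2 + \dfrac{x^3}{3} + \dots$

(iv) $|f(x) - g(x)| = 0.5 \iff \left|e^x\sin x - \left(x + x^2 + \dfrac{x^3}{3}\right)\right| = 0.5 \iff x \approx -1.962$ or $x \approx 1.560$.

So: $|f(x) - g(x)| < 0.5 \iff -1.962 < x < 1.560.$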

A623 (9740 N2008/II/2)(i) The upper half of the curve $C$ has equation $y = \sqrt{x\sqrt{1 - x}}$.

And so, by symmetry, the requested area is $R = 2\displaystyle\int_0^1 \sqrt{x\sqrt{1 - x}}\, dx \approx 0.999.$

(ii) From $u = 1 - x$, we have $\dfrac{du}{dx} = -1 = \dfrac{dx}{du}$. And the requested volume is:

\[ \pi\int_0^1 y^2\, dx = \pi\int_0^1 x\sqrt{1 - x}\, dx = \pi\int_{x = 0}^{x = 1} (1 - u)\sqrt{u}\, dx = \pi\int_{u = 1}^{u = 0} (1 - u)\sqrt{u}(-1)\, du \]
\[ = \pi\int_0^1 \sqrt{u} - u^{3/2}\, du = \pi\left[\frac{2}{3}u^{3/2} - \frac{2}{5}u^{5/2}\right]_0^1 = \frac{4\pi}{15}. \]

(iii) Differentiate the given equation with respect to $x$: $2y\dfrac{dy}{dx} = \sqrt{1 - x} - \dfrac{x}{2\sqrt{1 - x}}.$

\[ \frac{dy}{dx} = 0 \iff \sqrt{1 - x} - \frac{x}{2\sqrt{1 - x}} = 0 \iff 2(1 - x) - x = 0 \iff x = 2/3. \]

(There are two stationary points, both with x-coordinate 2/3. One is a maximum point of
C and the other a minimum point of C.)
A624 (9233 N2008/I/2). By the standard Maclaurin series expansions, for small x:

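\[ \cos 2x \approx 1 - \frac{(2x)^2}{2!} = 1 - 2x^2 \quad\text{and}\quad \frac{1}{\sqrt{1 + x^2}} = (1 + x^2)^{-1/2} \approx 1 + \left(\frac{-1}{2}\right)x^2 = 1 - \frac{1}{2}x^2. \]

Thus: $\dfrac{\cos 2x}{\sqrt{1 + x^2}} \approx (1 - 2x^2)\left(1 - \dfrac{1}{2}x^2\right) \approx 1 - \dfrac{5}{2}x^2.$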

So, a = 1 and b = −5/2.

A625 (9233 N2008/I/3). Use Integration by Parts:

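\[ \int_0^1 xe^{-2x}\, dx = \left[x\left(\frac{-1}{2}\right)e^{-2x}\right]_0^1 - \int_0^1 \left(\frac{-1}{2}\right)e^{-2x}\, dx = -\frac{1}{2}e^{-2} + \left[-\frac{1}{4}e^{-2x}\right]_0^1 = \frac{1}{4}\left(1 - 3e^{-2}\right). \]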

A626 (9233 N2008/I/4). From the given substitution $t = \ln x$, we have $\dfrac{dt}{dx} = \dfrac{1}{x}$. So:

\[ \int_e^{e^3} \frac{1}{x(\ln x)^2}\, dx = \int_e^{e^3} \frac{1}{(\ln x)^2}\frac{dt}{dx}\, dx = \int_{t = 1}^{t = 3} \frac{1}{t^2}\, dt = \left[-\frac{1}{t}\right]_1^3 = \frac{2}{3}. \]

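A627 (9233 N2008/I/6)(i)
\[ |x - a| = \begin{cases} x - a & \text{for } x \geq a, \\ a - x & \text{for } x < a. \end{cases} \]

[Figure: graph of $y = |x - a|$, with $-b$, $a$, and $b$ marked on the $x$-axis.]

(ii) By observation, this definite integral is simply the area of the two triangles created by the graph and the $x$-axis:

\[ \frac{1}{2}\left[a - (-b)\right]^2 + \frac{1}{2}(b - a)^2 = a^2 + b^2. \]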

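A628 (9233 N2008/I/8).
\[ \int_a^\infty \frac{1}{4 + x^2}\, dx = \left[\frac{1}{2}\tan^{-1}\frac{x}{2}\right]_a^\infty = \frac{\pi}{4} - \frac{1}{2}\tan^{-1}\frac{a}{2}. \]

\[ \int_{1/2}^{\sqrt{3}/2} \frac{1}{\sqrt{1 - x^2}}\, dx = \left[\sin^{-1} x\right]_{1/2}^{\sqrt{3}/2} = \frac{\pi}{3} - \frac{\pi}{6} = \frac{\pi}{6}. \]

If the two expressions are equal, then:

\[ \frac{\pi}{4} - \frac{1}{2}\tan^{-1}\frac{a}{2} = \frac{\pi}{6} \iff \frac{\pi}{6} = \tan^{-1}\frac{a}{2} \iff a = 2\tan\frac{\pi}{6} = \frac{2}{\sqrt{3}} = \frac{2}{3}\sqrt{3}. \]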
A629 (9233 N2008/I/10)(i) From the given substitution
\[ y = xz, \tag{1} \]
we have
\[ \frac{dy}{dx} = z + x\frac{dz}{dx}. \tag{2} \]

Now plug these into the given differential equation:

\[ x^2 z\left(z + x\frac{dz}{dx}\right) = x^2 + x^2 z^2 \iff x^2\left(z^2 + xz\frac{dz}{dx} - z^2 - 1\right) = 0 \iff x^2\left(xz\frac{dz}{dx} - 1\right) = 0. \]

So, provided $x \neq 0$, we have $xz\dfrac{dz}{dx} = 1$.

(ii) Rearrange this as $z\dfrac{dz}{dx} = \dfrac{1}{x}$. Apply the $\int dx$ operator to get:

\[ \int z\frac{dz}{dx}\, dx = \int \frac{1}{x}\, dx \quad\text{or}\quad \int z\, dz = \int \frac{1}{x}\, dx \quad\text{or}\quad z^2 = 2\ln|x| + C. \tag{3} \]

From (1), we have $y^2 = x^2 z^2$ or, assuming $x \neq 0$, $z^2 = y^2/x^2$. (4) (Note that from the given differential equation, if $x = 0$, then $y = 0$. We will note this in our final answer below.)

Plug (4) into (3) to get: $y^2 = x^2(2\ln|x| + C)$ or $y = \pm x\sqrt{2\ln|x| + C}$.

Plug in the initial condition $(x, y) = (2, 6)$ to get: $6^2 = 36 = 2^2(2\ln 2 + C)$ or $C = 9 - 2\ln 2$.

So:
\[ y = \begin{cases} 0 & \text{for } x = 0, \\ \pm x\sqrt{2\ln|x| + 9 - 2\ln 2} & \text{for } x \neq 0. \end{cases} \]

A630 (9233 N2008/I/13)(i) The gradient of the normal to the curve is:
\[ -\frac{dx}{dy} = -\frac{dx}{dt} \div \frac{dy}{dt} = \frac{-3\cos^2 t(-\sin t)}{3\sin^2 t\cos t} = \frac{\cos t}{\sin t}. \]

Thus, the equation of the normal at the given point is:

\[ y - \sin^3 t = \frac{\cos t}{\sin t}\left(x - \cos^3 t\right) \quad\text{or}\quad x\cos t - y\sin t = \cos^4 t - \sin^4 t. \tag{1} \]

(ii)
\[ \cos^4 t - \sin^4 t = (\cos^2 t + \sin^2 t)(\cos^2 t - \sin^2 t) = (1)(\cos 2t) = \cos 2t. \tag{2} \]

(iii) Plug (2) into (1) to get
\[ x\cos t - y\sin t = \cos 2t. \tag{3} \]
(Note: $0 < t < \pi/4 \implies \cos t \neq 0 \neq \sin t$.)

For the $x$-intercept, plug $y = 0$ into (3) to get $x\cos t = \cos 2t$ or $x = \cos 2t/\cos t$.

For the $y$-intercept, plug $x = 0$ into (3) to get $-y\sin t = \cos 2t$ or $y = -\cos 2t/\sin t$.

So: $A = \left(\dfrac{\cos 2t}{\cos t}, 0\right)$ and $B = \left(0, -\dfrac{\cos 2t}{\sin t}\right)$. And thus:

\[ |AB| = \sqrt{\left(\frac{\cos 2t}{\cos t}\right)^2 + \left(-\frac{\cos 2t}{\sin t}\right)^2} = \cos 2t\sqrt{\frac{1}{\cos^2 t} + \frac{1}{\sin^2 t}} = \cos 2t\sqrt{\frac{\sin^2 t + \cos^2 t}{\sin^2 t\cos^2 t}} \]
\[ = \cos 2t\frac{1}{\sin t\cos t} = \frac{\cos 2t}{\frac{1}{2}\sin 2t} = 2\cot 2t. \quad (\text{So, } k = 2.) \]
A631 (9233 N2008/I/14)(i) Let P (k) be the following proposition:

\[ 1 + 2x + 3x^2 + \dots + kx^{k - 1} = \frac{1 - (k + 1)x^k + kx^{k + 1}}{(1 - x)^2}. \]

We show that $P(1)$ is true: $1 = \dfrac{1 - (1 + 1)x^1 + 1 \cdot x^{1 + 1}}{(1 - x)^2} = \dfrac{1 - 2x + x^2}{(1 - x)^2}$. ✓

We next show that for all $j \in \mathbb{N}$, if $P(j)$ is true, then $P(j + 1)$ is also true:

\[ 1 + 2x + 3x^2 + \dots + jx^{j - 1} + (j + 1)x^j = \frac{1 - (j + 1)x^j + jx^{j + 1}}{(1 - x)^2} + (j + 1)x^j \]
\[ = \frac{1 + (j + 1)x^j\left[(1 - x)^2 - 1\right] + jx^{j + 1}}{(1 - x)^2} = \frac{1 + (j + 1)x^j(x^2 - 2x) + jx^{j + 1}}{(1 - x)^2} \]
\[ = \frac{1 - 2(j + 1)x^{j + 1} + jx^{j + 1} + (j + 1)x^{j + 2}}{(1 - x)^2} = \frac{1 - (j + 2)x^{j + 1} + (j + 1)x^{j + 2}}{(1 - x)^2}. \]

(ii) By definition,
\[ \frac{d}{dx}\int 1 + 2x + 3x^2 + \dots + nx^{n - 1}\, dx = 1 + 2x + 3x^2 + \dots + nx^{n - 1}. \tag{1} \]

Also:
\[ \frac{d}{dx}\int 1 + 2x + 3x^2 + \dots + nx^{n - 1}\, dx = \frac{d}{dx}\left(x + x^2 + x^3 + \dots + x^n + C\right) \]
\[ = \frac{d}{dx}\left(x\frac{1 - x^n}{1 - x} + C\right) = \frac{1 - x^n}{1 - x} + x\left[\frac{-nx^{n - 1}}{1 - x} + \frac{1 - x^n}{(1 - x)^2}\right] = \frac{(1 - x^n - nx^n)(1 - x) + x - x^{n + 1}}{(1 - x)^2} \]
\[ = \frac{1 - x^n - nx^n - x + x^{n + 1} + nx^{n + 1} + x - x^{n + 1}}{(1 - x)^2} = \frac{1 - (n + 1)x^n + nx^{n + 1}}{(1 - x)^2}. \tag{2} \]

The claim now follows from (1) and (2).
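A proof by induction is the real argument, but it can be reassuring to test the closed form on a few values. Here is a minimal Python sketch using exact rational arithmetic; the function names `lhs` and `rhs` and the sample values of $x$ are our own choices.

```python
from fractions import Fraction

def lhs(x, n):
    """1 + 2x + 3x^2 + ... + n x^(n-1), summed term by term."""
    return sum(k * x ** (k - 1) for k in range(1, n + 1))

def rhs(x, n):
    """Closed form [1 - (n+1) x^n + n x^(n+1)] / (1 - x)^2, valid for x != 1."""
    return (1 - (n + 1) * x ** n + n * x ** (n + 1)) / (1 - x) ** 2

samples = [Fraction(1, 2), Fraction(-3, 4), Fraction(5, 3)]
all_agree = all(lhs(x, n) == rhs(x, n) for x in samples for n in range(1, 9))
print(all_agree)
```

Because `Fraction` is exact, equality here is genuine equality and not agreement up to rounding.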

A632 (9233 N2008/II/1). Refer to List MF26, p. 3:

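\[ \cos 4x - \cos 6x = -2\sin\left(\frac{4 + 6}{2}x\right)\sin\left(\frac{4 - 6}{2}x\right) = -2\sin 5x\sin(-x) = 2\sin 5x\sin x. \]

\[ \int_0^{\pi/3} \sin 5x\sin x\, dx = \frac{1}{2}\int_0^{\pi/3} \cos 4x - \cos 6x\, dx = \frac{1}{2}\left[\frac{\sin 4x}{4} - \frac{\sin 6x}{6}\right]_0^{\pi/3} = -\frac{\sqrt{3}}{16}. \]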


A633 (9233 N2008/II/5)(i) Let f (x) be the given expression. Then:

\[ f'(x) = \frac{1}{1 + x} - \frac{(x + 2) \cdot 2 - 2x \cdot 1}{(x + 2)^2} = \frac{(x + 2)^2 - 4(1 + x)}{(1 + x)(x + 2)^2} = \frac{x^2}{(1 + x)(x + 2)^2}. \]

Assuming⁴²⁵ the domain of $\ln$ is $\mathbb{R}^+$, we have $1 + x > 0$ and hence also $(x + 2)^2 > 0$. Assuming $x \in \mathbb{R}$, we have $x^2 \geq 0$. Altogether then:

\[ f'(x) = \frac{x^2}{(1 + x)(x + 2)^2} \geq 0. \]

(ii) Observe that $f(0) = 0$.

Since $f$ is non-decreasing, for all $x \geq 0$, we must have $f(x) \geq f(0) = 0$ and thus:

\[ \ln(1 + x) \geq \frac{2x}{x + 2}. \]
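A634 (9740 N2007/I/4)(i) For $I \neq 2/3$, we have $\dfrac{4}{2 - 3I} = \dfrac{dt}{dI}$.

So:
\[ \int \frac{4}{2 - 3I}\, dI = \int \frac{dt}{dI}\, dI \quad\text{or}\quad -\frac{4}{3}\ln|2 - 3I| = t + C. \tag{1} \]

Plug the given initial condition $(t, I) = (0, 2)$ into (1) to get:

\[ -\frac{4}{3}\ln|2 - 3 \cdot 2| = 0 + C \quad\text{or}\quad C = -\frac{4}{3}\ln 4. \]

So: $\dfrac{4}{3}\ln\dfrac{4}{|2 - 3I|} = t$ or $\dfrac{4}{|2 - 3I|} = e^{3t/4}$ or $|2 - 3I| = 4e^{-3t/4}$.

Hence:
\[ I = \begin{cases} \dfrac{1}{3}\left(4e^{-3t/4} + 2\right) & \text{for } I > 2/3, \\[2mm] \dfrac{1}{3}\left(2 - 4e^{-3t/4}\right) & \text{for } I < 2/3. \end{cases} \]

(ii) We start at $t = 0$ with $I = 2 > 2/3$. So, $\displaystyle\lim_{t \to \infty} I = \lim_{t \to \infty}\frac{1}{3}\left(4e^{-3t/4} + 2\right) = \frac{2}{3}.$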

⁴²⁵ It turns out that more generally, the domain of the natural logarithm function ln is C ∖ {0}, that is, the set of all complex numbers excluding 0. In which case, it is perfectly possible that 1 + x < 0 and the conclusion to which this question leads is false.
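A635 (9740 N2007/I/11)(i) [Figure: the curve $x = \cos^2 t$, $y = \sin^3 t$ for $0 \leq t \leq \pi/2$. When $t = 0$, $(x, y) = (1, 0)$. When $t = \pi/2$, $(x, y) = (0, 1)$.]

(ii) $\dfrac{dy}{dx} = \dfrac{dy}{dt} \div \dfrac{dx}{dt} = \dfrac{3\sin^2 t\cos t}{2\cos t(-\sin t)} = -\dfrac{3}{2}\sin t.$

So, the gradient of the tangent at the given point is $-\dfrac{3}{2}\sin\theta$ and its equation is:

\[ y - \sin^3\theta = -\frac{3}{2}\sin\theta\left(x - \cos^2\theta\right). \tag{1} \]

For its $x$-intercept, plug $y = 0$ into (1) to get:

\[ -\sin^3\theta = -\frac{3}{2}\sin\theta\left(x_Q - \cos^2\theta\right). \quad\text{Or: } x_Q = \frac{2}{3}\sin^2\theta + \cos^2\theta. \]

For its $y$-intercept, plug $x = 0$ into (1) to get:

\[ y - \sin^3\theta = -\frac{3}{2}\sin\theta\left(-\cos^2\theta\right) \quad\text{or}\quad y_R = \frac{3}{2}\sin\theta\cos^2\theta + \sin^3\theta. \]

$OQR$ is simply a triangle with area:

\[ \frac{1}{2}x_Q y_R = \frac{1}{2}\left(\frac{2}{3}\sin^2\theta + \cos^2\theta\right)\left(\frac{3}{2}\sin\theta\cos^2\theta + \sin^3\theta\right) \]
\[ = \frac{1}{12}\sin\theta\left(2\sin^2\theta + 3\cos^2\theta\right)\left(3\cos^2\theta + 2\sin^2\theta\right) = \frac{1}{12}\sin\theta\left(2\sin^2\theta + 3\cos^2\theta\right)^2 = \frac{1}{12}\sin\theta\left(2 + \cos^2\theta\right)^2. \]

(iii) The requested area is:

\[ \int_0^1 y\, dx = \int_{t = \pi/2}^{t = 0} y\frac{dx}{dt}\, dt = \int_{\pi/2}^0 \sin^3 t \cdot 2\cos t(-\sin t)\, dt = 2\int_0^{\pi/2} \cos t\sin^4 t\, dt. \]

The given substitution $u = \sin t$ yields $\dfrac{du}{dt} = \cos t$. So:

\[ 2\int_0^{\pi/2} \cos t\sin^4 t\, dt = 2\int_0^{\pi/2} u^4\frac{du}{dt}\, dt = 2\int_{u = 0}^{u = 1} u^4\, du = 2\left[\frac{1}{5}u^5\right]_0^1 = \frac{2}{5}. \]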
A636 (9740 N2007/II/3)(i) Let $f(x) = (1 + x)^n$. Then $f(0) = 1$. Also:

$f'(x) = n(1 + x)^{n - 1}$ and $f'(0) = n$.

$f''(x) = n(n - 1)(1 + x)^{n - 2}$ and $f''(0) = n(n - 1)$.

$f'''(x) = n(n - 1)(n - 2)(1 + x)^{n - 3}$ and $f'''(0) = n(n - 1)(n - 2)$.

Hence: $(1 + x)^n = 1 + nx + \dfrac{n(n - 1)}{2!}x^2 + \dfrac{n(n - 1)(n - 2)}{3!}x^3 + \dots$

(ii) First, $(1 + 2x^2)^{3/2} = 1 + 1.5(2x^2) + \dots = 1 + 3x^2 + \dots$

\[ (4 - x)^{3/2} = 8\left(1 - \frac{1}{4}x\right)^{3/2} = 8\left[1 + \frac{3}{2}\left(-\frac{1}{4}x\right) + \frac{\frac{3}{2} \cdot \frac{1}{2}}{2!}\left(-\frac{1}{4}x\right)^2 + \frac{\frac{3}{2} \cdot \frac{1}{2}\left(-\frac{1}{2}\right)}{3!}\left(-\frac{1}{4}x\right)^3 + \dots\right] \]
\[ = 8 - 3x + \frac{3}{16}x^2 + \frac{1}{128}x^3 + \dots \]

So:
\[ (4 - x)^{3/2}(1 + 2x^2)^{3/2} = \left(8 - 3x + \frac{3}{16}x^2 + \frac{1}{128}x^3 + \dots\right)\left(1 + 3x^2 + \dots\right) \]
\[ = 8 - 3x + \frac{3}{16}x^2 + \frac{1}{128}x^3 + 24x^2 - 9x^3 + \dots = 8 - 3x + 24\tfrac{3}{16}x^2 - 8\tfrac{127}{128}x^3 + \dots \]

(iii) The expansions are valid provided:

\[ |-x/4| < 1 \text{ AND } |2x^2| < 1 \iff |x| < 4 \text{ AND } |x^2| < 1/2 \iff |x| < 1/\sqrt{2}. \]
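The coefficients in (ii) involve enough fractions that an exact check is worthwhile. The following Python sketch multiplies the two truncated series using rational arithmetic. The helper names `binomial_coeffs` and `multiply` are our own, and we keep terms only up to $x^3$.

```python
from fractions import Fraction as F

def binomial_coeffs(n, terms):
    """Coefficients of 1, u, u^2, ... in the Maclaurin series of (1 + u)^n."""
    coeffs = [F(1)]
    for k in range(1, terms):
        coeffs.append(coeffs[-1] * (n - (k - 1)) / k)
    return coeffs

def multiply(p, q, terms):
    """Product of two coefficient lists, truncated to the first `terms` powers of x."""
    out = [F(0)] * terms
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if i + j < terms:
                out[i + j] += a * b
    return out

TERMS = 4  # keep powers x^0 .. x^3
c = binomial_coeffs(F(3, 2), TERMS)

# (4 - x)^(3/2) = 8 (1 + u)^(3/2) with u = -x/4
p = [8 * c[k] * F(-1, 4) ** k for k in range(TERMS)]

# (1 + 2x^2)^(3/2) = (1 + u)^(3/2) with u = 2x^2, so only even powers of x appear
q = [F(0)] * TERMS
for k in range(TERMS):
    if 2 * k < TERMS:
        q[2 * k] = c[k] * 2 ** k

product = multiply(p, q, TERMS)
print(product)  # coefficients of 1, x, x^2, x^3
```

Note that $24\tfrac{3}{16} = \tfrac{387}{16}$ and $8\tfrac{127}{128} = \tfrac{1151}{128}$, which is the form in which the script reports the last two coefficients.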

A637 (9740 N2007/II/4)(i)

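\[ \int_0^{5\pi/3} \sin^2 x\, dx = \int_0^{5\pi/3} \frac{1 - \cos 2x}{2}\, dx = \left[\frac{1}{2}x - \frac{1}{4}\sin 2x\right]_0^{5\pi/3} = \frac{5\pi}{6} + \frac{\sqrt{3}}{8}. \]

\[ \int_0^{5\pi/3} \cos^2 x\, dx = \int_0^{5\pi/3} 1 - \sin^2 x\, dx = \left[x\right]_0^{5\pi/3} - \left(\frac{5\pi}{6} + \frac{\sqrt{3}}{8}\right) = \frac{5\pi}{6} - \frac{\sqrt{3}}{8}. \]

(ii)(a) Use Integration by Parts twice:

\[ \int_0^{\pi/2} x^2\sin x\, dx = \left[x^2(-\cos x)\right]_0^{\pi/2} - \int_0^{\pi/2} 2x(-\cos x)\, dx = 0 + 2\int_0^{\pi/2} x\cos x\, dx \]
\[ = 2\left[x\sin x - \int \sin x\, dx\right]_0^{\pi/2} = 2\left[x\sin x + \cos x\right]_0^{\pi/2} = 2\left(\frac{\pi}{2} + 0 - 0 - 1\right) = \pi - 2. \]

(ii)(b) Calculator: $\pi\displaystyle\int_0^{\pi/2} \left(x^2\sin x\right)^2 dx \approx 5.391.$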


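A638 (9233 N2007/I/2). $(4 + 3x)^{5/2} = 4^{5/2}\left(1 + \dfrac{3}{4}x\right)^{5/2}.$

The first negative coefficient is the 4th Maclaurin coefficient:

\[ 4^{5/2} \cdot \frac{\frac{5}{2} \cdot \frac{3}{2} \cdot \frac{1}{2} \cdot \left(-\frac{1}{2}\right)}{4!}\left(\frac{3}{4}\right)^4 = 2^5 \cdot \frac{-15/16}{4!}\left(\frac{3}{4}\right)^4 = \frac{-15}{12} \cdot \frac{81}{256} = \frac{-5}{4} \cdot \frac{81}{256} = -\frac{405}{1\,024}. \]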

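A639 (9233 N2007/I/3).
\[ \pi\int_{1/2}^{\sqrt{3}/2} y^2\, dx = \pi\int_{1/2}^{\sqrt{3}/2} \frac{1}{1 + 4x^2}\, dx = \frac{\pi}{4}\int_{1/2}^{\sqrt{3}/2} \frac{1}{(1/2)^2 + x^2}\, dx \]
\[ = \frac{\pi}{4}\left[2\tan^{-1} 2x\right]_{1/2}^{\sqrt{3}/2} = \frac{\pi}{2}\left(\tan^{-1}\sqrt{3} - \tan^{-1} 1\right) = \frac{\pi^2}{24}. \]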

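A640 (9233 N2007/I/8)(i) From $t = \sin u$, we have $\sin^{-1} t = u$ and $\dfrac{du}{dt} = \dfrac{1}{\cos u}$. So:

\[ \int \frac{(\sin^{-1} t)\cos\left[(\sin^{-1} t)^2\right]}{\sqrt{1 - t^2}}\, dt = \int \frac{u\cos u^2}{\sqrt{1 - \sin^2 u}}\, dt = \int \frac{u\cos u^2}{|\cos u|}\, dt \]
\[ = \int \frac{u\cos u^2}{\cos u}\, dt = \int u\cos u^2\frac{du}{dt}\, dt = \int u\cos u^2\, du = \frac{1}{2}\sin u^2 + C = \frac{1}{2}\sin\left(\sin^{-1} t\right)^2 + C. \]

(ii)
\[ \int_0^1 \frac{(\sin^{-1} t)\cos\left[(\sin^{-1} t)^2\right]}{\sqrt{1 - t^2}}\, dt = \int_{u = 0}^{u = \pi/2} u\cos u^2\, du = \frac{1}{2}\left[\sin u^2\right]_0^{\pi/2} = \frac{1}{2}\sin\frac{\pi^2}{4}. \]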

A641 (9233 N2007/I/10)(i) For x ∈ [0, 2π], the graphs of y = cos x and y = sin x intersect
at x = π/4 and x = 5π/4. With the aid of a sketch, we see that the given inequality holds
to the left of the first intersection point and to the right of the second. That is:

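\[ \cos x > \sin x \iff x \in \left[0, \frac{\pi}{4}\right) \cup \left(\frac{5\pi}{4}, 2\pi\right]. \]

(ii)
\[ \int_0^{2\pi} |\cos x - \sin x|\, dx = \int_0^{\pi/4} \cos x - \sin x\, dx + \int_{\pi/4}^{5\pi/4} \sin x - \cos x\, dx + \int_{5\pi/4}^{2\pi} \cos x - \sin x\, dx \]
\[ = \left[\sin x + \cos x\right]_0^{\pi/4} + \left[-\cos x - \sin x\right]_{\pi/4}^{5\pi/4} + \left[\sin x + \cos x\right]_{5\pi/4}^{2\pi} \]
\[ = \left(\sin\frac{\pi}{4} + \cos\frac{\pi}{4} - 1\right) + 2\left(\sin\frac{\pi}{4} + \cos\frac{\pi}{4}\right) + \left(\sin\frac{\pi}{4} + \cos\frac{\pi}{4}\right) + 1 = 8\sin\frac{\pi}{4} = 4\sqrt{2}. \]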
A642 (9233 N2007/I/11).
\[ \frac{5x + 4}{(x - 5)(x^2 + 4)} = \frac{A}{x - 5} + \frac{Bx + C}{x^2 + 4} = \frac{Ax^2 + 4A + Bx^2 - 5Bx + Cx - 5C}{(x - 5)(x^2 + 4)} = \frac{(A + B)x^2 + (-5B + C)x + 4A - 5C}{(x - 5)(x^2 + 4)}. \]

Comparing coefficients:

\[ A + B = 0, \qquad -5B + C = 5, \qquad 4A - 5C = 4. \]

Solving, we have $C = 0$, $B = -1$, and $A = 1$. So:

\[ \int_1^4 \frac{5x + 4}{(x - 5)(x^2 + 4)}\, dx = \int_1^4 \frac{1}{x - 5} - \frac{x}{x^2 + 4}\, dx = \left[\ln|x - 5| - \frac{1}{2}\ln(x^2 + 4)\right]_1^4 \]
\[ = \ln 1 - \ln 4 - \frac{1}{2}\ln 20 + \frac{1}{2}\ln 5 = \ln\frac{\sqrt{5}}{4\sqrt{20}} = \ln\frac{1}{8} = -\ln 8. \]

A643 (9233 N2007/I/13)(i) $y'(x) = \frac{\sec x \tan x}{\sec x} = \tan x$.

y ′′ (x) = sec2 x. y ′′′ (x) = 2 sec2 x tan x = 2y ′′ (x) y ′ (x).

(ii) y (4) (x) = 2 (2 sec2 x tan2 x + sec4 x). So, y (4) (0) = 2 (0 + 1) = 2.

(iii) $$\ln (\sec x) = \frac{y(0)}{0!} + \frac{y'(0)}{1!} x + \frac{y''(0)}{2!} x^2 + \frac{y'''(0)}{3!} x^3 + \frac{y^{(4)}(0)}{4!} x^4 + \dots$$

$$= 0 + 0x + \frac{1}{2} x^2 + \frac{0}{6} x^3 + \frac{2}{4!} x^4 + \dots = \frac{1}{2} x^2 + \frac{1}{12} x^4 + \dots$$

(iv) First, observe that $\ln \left(\sec \frac{\pi}{4}\right) = \ln \sqrt{2} = \frac{1}{2} \ln 2$. And now by (iii):

$$\ln 2 = 2 \ln \left(\sec \frac{\pi}{4}\right) \approx 2 \left[\frac{1}{2} \left(\frac{\pi}{4}\right)^2 + \frac{1}{12} \left(\frac{\pi}{4}\right)^4\right] = \frac{\pi^2}{16} + \frac{\pi^4}{1\,536}.$$
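As a rough numerical sanity check of (iv), the truncated series from (iii) can be compared with the true value of ln 2. This is only a sketch; the function name `ln_sec_series` is made up for illustration.

```python
import math

def ln_sec_series(x):
    """Maclaurin polynomial for ln(sec x), truncated after the x^4 term."""
    return x**2 / 2 + x**4 / 12

# ln 2 = 2 ln(sec(pi/4)), so doubling the polynomial at pi/4 approximates ln 2.
approx = 2 * ln_sec_series(math.pi / 4)
closed_form = math.pi**2 / 16 + math.pi**4 / 1536

print(approx, closed_form, math.log(2))
```

The two-term approximation slightly underestimates ln 2, as expected when the positive higher-order terms are dropped.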


A644 (9233 N2007/I/14)(i) Differentiate the given equation with respect to x:

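$$2x - 2y \frac{dy}{dx} = A \quad\text{or}\quad 2x - 2y \frac{dy}{dx} = \frac{x^2 - y^2}{x} \quad\text{or}\quad \frac{dy}{dx} = \frac{2x - \frac{x^2 - y^2}{x}}{2y} = \frac{x^2 + y^2}{2xy}.$$

(ii) From the given substitution $y = vx$, we have $\frac{dy}{dx} = x \frac{dv}{dx} + v$. So:

$$x \frac{dv}{dx} = \frac{dy}{dx} - v = -\frac{2xy}{x^2 + y^2} - v = -\frac{2vx^2}{x^2 + v^2 x^2} - v = -\left(\frac{2v}{1 + v^2} + v\right) = -\frac{3v + v^3}{1 + v^2}.$$

(iii) Rearrange (ii): $-\frac{1 + v^2}{3v + v^3} = \frac{1}{x} \frac{dx}{dv}$. Then apply $\int \, dv$:

$$-\int \frac{1 + v^2}{3v + v^3} \, dv = \int \frac{1}{x} \frac{dx}{dv} \, dv, \quad\text{so that}\quad -\frac{1}{3} \ln (3v + v^3) = \int \frac{1}{x} \, dx = \ln x + C_1$$

$$\iff (3v + v^3)^{-1/3} = C_2 x \iff 3x^3 v + x^3 v^3 = C \iff 3x^2 y + y^3 = C.$$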

A645 (9233 N2006/I/7). Consider the cone formed by the liquid. Let r be the radius
of this cone’s base and h be its height. Then tan 45○ = Opp/Adj = r/h = 1, so that r = h.

So, the volume of the liquid is: $V = \frac{\pi}{3} r^2 h = \frac{\pi}{3} h^3$.

We are given that: $V'(t) = -2$.

But we also have: $V'(t) = \pi h^2 h'(t)$.

So: $h'(t) = \frac{-2}{\pi h^2}$.

At t seconds after the start of the experiment, we have:

$$V(t) = 390 - 2t = \frac{\pi}{3} [h(t)]^3 \quad\text{or}\quad [h(t)]^2 = \left[\frac{3}{\pi} (390 - 2t)\right]^{2/3}.$$

So, at 3 minutes or 180 seconds after the start of the experiment, we have:

$$[h(180)]^2 = \left[\frac{3}{\pi} (390 - 2 \cdot 180)\right]^{2/3} = \left(\frac{90}{\pi}\right)^{2/3}.$$

And thus:

$$h'(180) = \frac{-2}{\pi [h(180)]^2} = \frac{-2}{\sqrt[3]{90^2 \pi}} \approx -0.068.$$

The instantaneous rate of decrease of the depth of the liquid at 3 minutes is approximately
0.068 centimetres per second.
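The rate found above can be checked numerically. The sketch below assumes only the relations used in this answer, V = (π/3)h³ and V(t) = 390 − 2t; the function names are hypothetical.

```python
import math

def depth(t):
    """Depth h(t) of the liquid, solved from (pi/3) h^3 = 390 - 2t."""
    return (3 * (390 - 2 * t) / math.pi) ** (1 / 3)

def depth_rate(t):
    """h'(t), from V'(t) = pi * h^2 * h'(t) = -2."""
    return -2 / (math.pi * depth(t) ** 2)

rate = depth_rate(180)

# Independent check: central finite difference of depth(t) at t = 180.
eps = 1e-4
finite_difference = (depth(180 + eps) - depth(180 - eps)) / (2 * eps)

print(round(rate, 3), round(finite_difference, 3))
```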
A646 (9233 N2006/I/8). Apply $\frac{d}{dx}$ to the given equation: $6x + y + x \frac{dy}{dx} + 2y \frac{dy}{dx} = 0$.

So: $\frac{dy}{dx} = 0 \iff 6x + y = 0 \iff y = -6x$.

Now plug $y = -6x$ into the given equation:

$$3x^2 + x(-6x) + (-6x)^2 = 33 \quad\text{or}\quad 33x^2 = 33 \quad\text{or}\quad x = \pm 1.$$

So, the two points at which the tangent is parallel to the x-axis are (±1, ∓6).
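A647 (9233 N2006/I/9)(i) $\frac{d}{d\theta} \sec \theta = \frac{d}{d\theta} \frac{1}{\cos \theta} = -\frac{-\sin \theta}{\cos^2 \theta} = \frac{1}{\cos \theta} \frac{\sin \theta}{\cos \theta} = \sec \theta \tan \theta$.

(ii) From $x = \sec \theta - 1$, we have $\frac{dx}{d\theta} = \sec \theta \tan \theta$ and $x^2 + 2x = (x + 1)^2 - 1 = \sec^2 \theta - 1$. So:

$$\int_{\sqrt{2}-1}^{1} \frac{1}{(x + 1) \sqrt{x^2 + 2x}} \, dx = \int_{\sqrt{2}-1}^{1} \frac{1}{\sec \theta \sqrt{\sec^2 \theta - 1}} \, dx = \int_{\sqrt{2}-1}^{1} \frac{1}{\sec \theta \sqrt{\tan^2 \theta}} \, dx$$

$$= \int_{\sqrt{2}-1}^{1} \frac{1}{\sec \theta \tan \theta} \, dx = \int_{\sqrt{2}-1}^{1} \frac{d\theta}{dx} \, dx = \int_{\theta=\pi/4}^{\theta=\pi/3} d\theta = [\theta]_{\pi/4}^{\pi/3} = \frac{\pi}{12}.$$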

A648 (9233 N2006/I/12)(i) Write:

$$\frac{1 + x - 2x^2}{(2 - x)(1 + x^2)} = \frac{A}{2 - x} + \frac{Bx + C}{1 + x^2} = \frac{A + 2C + (2B - C) x + (A - B) x^2}{(2 - x)(1 + x^2)}.$$

Comparing coefficients: $A + 2C = 1$ (1), $2B - C = 1$ (2), $A - B = -2$ (3).

(1) plus 2 × (2) plus 4 × (3) yields 5A = −5 or A = −1. We then find that C = 1 and B = 1.

So: $$\frac{1 + x - 2x^2}{(2 - x)(1 + x^2)} = -\frac{1}{2 - x} + \frac{x + 1}{1 + x^2}. \quad (4)$$

(ii) We use the first standard Maclaurin series expansion twice. First:

$$-\frac{1}{2 - x} = -\frac{1}{2} \left(1 - \frac{x}{2}\right)^{-1} = -\frac{1}{2} \left[1 + (-1)\left(-\frac{x}{2}\right) + \frac{(-1)(-2)}{2!} \left(-\frac{x}{2}\right)^2 + \dots\right] = -\frac{1}{2} - \frac{1}{4} x - \frac{1}{8} x^2 - \dots \quad (5)$$

Second: $\frac{1}{1 + x^2} = 1 + (-1) x^2 + \dots = 1 - x^2 + \dots$

So: $$\frac{x + 1}{1 + x^2} = (x + 1)(1 - x^2 + \dots) = 1 + x - x^2 + \dots \quad (6)$$

Plug (5) and (6) into (4) to get:

$$\frac{1 + x - 2x^2}{(2 - x)(1 + x^2)} = \left(-\frac{1}{2} - \frac{1}{4} x - \frac{1}{8} x^2 - \dots\right) + \left(1 + x - x^2 + \dots\right) = \frac{1}{2} + \frac{3}{4} x - \frac{9}{8} x^2 + \dots$$

(iii) The expansions in (ii) are valid if: $\left|-\frac{x}{2}\right| < 1$ AND $|x^2| < 1$, or "$|x| < 1$".
A649 (9233 N2006/I/14)(i) The gradient of QR is:

$$\frac{y_R - y_Q}{x_R - x_Q} = \frac{c/r - c/q}{cr - cq} = \frac{1/r - 1/q}{r - q} = -\frac{1}{qr}.$$

(ii) The described line has gradient $qr$, passes through $P\left(cp, \frac{c}{p}\right)$, and thus has equation:

$$y - \frac{c}{p} = qr (x - cp). \quad (1)$$

This line passes through V. So, plug $(x, y) = \left(cv, \frac{c}{v}\right)$ into (1) to get:

$$\frac{c}{v} - \frac{c}{p} = qr (cv - cp) \quad\text{or}\quad \frac{1}{v} - \frac{1}{p} = qr (v - p) \quad\text{or}\quad v = -\frac{1}{pqr}.$$

(iii) Observe that $xy = c^2$. Applying the $\frac{d}{dy}$ operator, we have $y \frac{dx}{dy} + x = 0$.

Rearranging, the gradient of the normal to the given curve at t is: $-\frac{dx}{dy} = \frac{x}{y} = \frac{ct}{c/t} = t^2$.

So the gradient of the normal at P is $p^2$.

(iv) The equation of the normal at P is:

$$y - \frac{c}{p} = p^2 (x - cp). \quad (2)$$

This line passes through S. So, plug $(x, y) = \left(cs, \frac{c}{s}\right)$ into (2) to get:

$$\frac{c}{s} - \frac{c}{p} = p^2 (cs - cp) \quad\text{or}\quad \frac{p - s}{sp} = p^2 (s - p) \quad\text{or}\quad -\frac{1}{s} = p^3 \quad\text{or}\quad s = -\frac{1}{p^3}.$$

(v) Since QP ⊥ PR, their gradients must be negative reciprocals of each other.

From (i), QP has gradient $-\frac{1}{qp}$ and PR has gradient $-\frac{1}{pr}$. So, $-\frac{1}{qp} = pr$ or $-\frac{1}{qr} = p^2$.

By (i), the left-hand side $-\frac{1}{qr}$ is the gradient of QR. But $p^2$ is also the gradient of the normal at P. So, QR and the normal at P are parallel.

A650 (9233 N2006/II/2)(i) By the Quotient Rule:

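$$\frac{dz}{dx} = \frac{\sqrt{x^2 + 32} - x \left(\frac{1}{2}\right) (x^2 + 32)^{-1/2} (2x)}{x^2 + 32} = \frac{x^2 + 32 - x^2}{(x^2 + 32)^{3/2}} = \frac{32}{(x^2 + 32)^{3/2}}.$$

(ii) $$\int_2^7 \frac{1}{(x^2 + 32)^{3/2}} \, dx = \frac{1}{32} \left[\frac{x}{\sqrt{x^2 + 32}}\right]_2^7 = \frac{1}{32} \left(\frac{7}{\sqrt{81}} - \frac{2}{\sqrt{36}}\right) = \frac{1}{32} \left(\frac{7}{9} - \frac{2}{6}\right) = \frac{1}{32} \cdot \frac{4}{9} = \frac{1}{72}.$$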
130.6. Ch. 114 Answers (Probability and Statistics)
A651 (9758 N2017/II/5). XXX
A652 (9758 N2017/II/6). XXX
A653 (9758 N2017/II/7). XXX
A654 (9758 N2017/II/8). XXX
A655 (9758 N2017/II/9). XXX
A656 (9758 N2017/II/10). XXX
A657 (9740 N2016/II/5). XXX
A658 (9740 N2016/II/6). XXX
A659 (9740 N2016/II/7). XXX
A660 (9740 N2016/II/8). XXX
A661 (9740 N2016/II/9). XXX
A662 (9740 N2016/II/10). XXX
A663 (9740 N2015/II/5)(i) The manager may not have all the required information to
properly implement stratified sampling. For example, he may not know what proportion
of the sampling population each age group composes.
(ii) Decide what the age groups are and how many he wishes to survey from each group.
(That is, for each age group, set a quota of respondents to be surveyed.) Then simply go
around surveying customers he sees in the supermarket, until he meets the quota for each
age group.
(iii) The manager may unconsciously gravitate towards customers that look more friendly.
He may thus not get a representative sample of his customers (many of whom look unfriendly).
A664 (9740 N2015/II/6)(i) Let X be the number of red sweets in the packet.

$$P(X \ge 4) = 1 - P(X < 4) = 1 - P(X = 0) - P(X = 1) - P(X = 2) - P(X = 3)$$

$$= 1 - 0.75^{10} - \binom{10}{1} 0.75^9 \cdot 0.25 - \binom{10}{2} 0.75^8 \cdot 0.25^2 - \binom{10}{3} 0.75^7 \cdot 0.25^3 \approx 0.224125.$$
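The binomial tail in (i) can be evaluated directly as a check. A minimal sketch, assuming X ∼ B(10, 0.25) as in part (ii); the helper names are illustrative only.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

def binom_at_least(k, n, p):
    """P(X >= k) = 1 - P(X <= k - 1)."""
    return 1 - sum(binom_pmf(i, n, p) for i in range(k))

prob = binom_at_least(4, 10, 0.25)
print(round(prob, 6))
```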

(ii) X ∼ B(100, 0.25). Since np = 25 > 5 and n(1 − p) = 75 > 5, the normal approximation
Y ∼ N(25, 18.75) is suitable. Hence, using also the continuity correction,

$$P(X \ge 30) = 1 - P(X < 30) \approx 1 - P(Y < 29.5) = 1 - \Phi\left(\frac{29.5 - 25}{\sqrt{18.75}}\right) \approx 1 - \Phi(1.039) \approx 1 - 0.8506 = 0.1494.$$

(iii) Let p = P(X ≥ 30) ≈ 0.1494 and q = 1 − P(X ≥ 30) ≈ 0.8506. Then the desired probability is

$$\binom{15}{0} q^{15} + \binom{15}{1} p q^{14} + \binom{15}{2} p^2 q^{13} + \binom{15}{3} p^3 q^{12} \approx 0.8245.$$

A665 (9740 N2015/II/7)(i) The rate at which errors are made is independent of the
number of errors that have already been made.
The rate at which errors are made is constant throughout the newspaper.
(ii) Let E ∼ Po(6 ⋅ 1.3) = Po(7.8). Then
$$P(E > 10) = 1 - P(E \le 10) = 1 - e^{-7.8} \left(\frac{7.8^0}{0!} + \frac{7.8^1}{1!} + \dots + \frac{7.8^{10}}{10!}\right) \approx 0.164770.$$
(iii) Let F ∼ Po(1.3n). We are given that P(F < 2) < 0.05. That is:
$$e^{-1.3n} \left(\frac{(1.3n)^0}{0!} + \frac{(1.3n)^1}{1!}\right) < 0.05 \quad\text{or}\quad e^{-1.3n} (1 + 1.3n) < 0.05.$$
Let f (n) = e−1.3n (1+1.3n). From calculator, f (1), f (2), f (3) > 0.05 and f (4) < 0.05. Hence,
the smallest possible integer value of n is 4.
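The "from calculator" search in (iii) amounts to stepping through n until f(n) drops below 0.05. A small sketch of that search; the function name and the default rate argument are illustrative.

```python
import math

def prob_fewer_than_two(n, rate=1.3):
    """f(n) = P(F < 2) for F ~ Po(rate * n)."""
    lam = rate * n
    return math.exp(-lam) * (1 + lam)

n = 1
while prob_fewer_than_two(n) >= 0.05:
    n += 1

print(n, prob_fewer_than_two(n - 1), prob_fewer_than_two(n))
```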
A666 (9740 N2015/II/8)(i)
$$\bar{x} = \frac{\sum x_i}{n} = \frac{0.80 + 1.000 + 0.82 + 0.85 + 0.93 + 0.96 + 0.81 + 0.89}{8} = 0.8825,$$

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{(0.80 - 0.8825)^2 + (1.000 - 0.8825)^2 + \dots + (0.89 - 0.8825)^2}{7} \approx 0.005592857.$$

(ii) The null hypothesis is H0 ∶ µ0 = 0.9 and the alternative hypothesis is HA ∶ µ0 < 0.9.

$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{0.8825 - 0.9}{\sqrt{0.005592857} / \sqrt{8}} \approx -0.661860.$$

Since |t| < t7,0.1 = 1.415, we are unable to reject the null hypothesis at the 10% significance level.
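The sample statistics and the test statistic can be reproduced with Python's standard library. This is a sketch of the arithmetic only; the critical value 1.415 is taken from the answer above rather than computed.

```python
import math
import statistics

data = [0.80, 1.000, 0.82, 0.85, 0.93, 0.96, 0.81, 0.89]
n = len(data)

mean = statistics.mean(data)
var = statistics.variance(data)  # unbiased estimate, divides by n - 1

mu_0 = 0.9
t = (mean - mu_0) / math.sqrt(var / n)

print(mean, var, t)
print("reject H0" if t < -1.415 else "do not reject H0")
```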
A667 (9740 N2015/II/9)(i) By indep., P(B∣A) = P(B) = 0.4.

(ii)P(A ∪ B ∪ C) = P(A) + P(B) + P(C) − P(A ∩ B) − P(A ∩ C) − P(B ∩ C) + P(A ∩ B ∩ C)

= 0.45 + 0.4 + 0.3 − 0.45 ⋅ 0.4 − 0.45 ⋅ 0.3 − P(B ∩ C) + 0.1 = 0.935 − P(B ∩ C)

Ô⇒ P(A′ ∩ B ′ ∩ C ′ ) = 1 − P(A ∪ B ∪ C) = 0.065 + P(B ∩ C).

The above is true even if B and C are not independent.

And if B and C are independent, P(B ∩ C) = 0.4 ⋅ 0.3 = 0.12 and P(A′ ∩ B ′ ∩ C ′ ) = 0.185.
(iii) We know that P(A ∩ B ′ ∩ C) = P(A ∩ C) − P(A ∩ B ∩ C) = 0.135 − 0.1 = 0.035.
We want to find lower and upper bounds for P(B ∩ C). Refer to diagram below.
At one extreme, it could be that P(A′ ∩ B ∩ C) = 0, in which case P(B ∩ C) = 0.1.
At the other extreme, it could be that P(A′ ∩ B ′ ∩ C) = 0, in which case P(B ∩ C) = 0.265.
Altogether, P(A′ ∩ B′ ∩ C′) = 0.065 + P(B ∩ C) ∈ [0.165, 0.33].
Note that P(B ∩ C) ≥ P(A ∩ B ∩ C) = 0.1, P(B ∩ C) ≤ P(B) = 0.4, and P(B ∩ C) ≤ P(C) = 0.3.

A668 (9740 N2015/II/10)(i)

[Scatter diagram of y (vertical axis, 0 to 30) against x (horizontal axis, 0 to 45,000).]

(ii) (a) PMCC ≈ −0.9807.

(ii) (b) PMCC ≈ −0.9748.
(ii) (c) PMCC ≈ −0.9986.
(iii) We are apparently supposed to presume that the greater the PMCC, the “better” or
the “more appropriate”. So we are supposed to use (c) from part (ii).
The estimated regression equation is y − ȳ = b(x − x̄), where b = ∑ x̂i ŷi / ∑ x̂i². So in this
case, the estimated regression equation is

P − 14.083 = −0.147 ( h − 140.986).

(iv) Let x be the height given in metres. Then 3x = h. Thus, the above equation may be
rewritten as

P − 14.083 = −0.147 ( 3x − 140.986) .

A669 (9740 N2015/II/11)(i) 8!/ (2!2!) = 10080.

(ii) There is only one arrangement where the letters are in alphabetical order, namely
AABBCEGS. Hence, the number of these arrangements in which the letters are not in
alphabetical order is 10080 − 1 = 10079.
(iii) Treating the two A’s as a single unit and the two B’s as a single unit, we have 6 units
altogether, so there are 6! arrangements.
(iv) Treating the two A’s as a single unit, we have 7 units altogether, so there are 7!/2! arrangements.
Treating the two B’s as a single unit, we have 7 units altogether, so there are 7!/2! arrangements.
Hence, the number of arrangements where at least one pair of identical letters is adjacent is 7!/2! +
7!/2! − 6! = 7! − 6!, where the subtraction of 6! is to avoid double counting.
Hence, the number of different arrangements with no two adjacent letters the same is
8!/ (2!2!) − (7! − 6!) = 5760.
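With only 8! = 40320 orderings, the counts in (i) and (iv) can be confirmed by brute force. A sketch using the letters AABBCEGS named in (ii):

```python
from itertools import permutations

letters = "AABBCEGS"

# Distinct arrangements: duplicates from the repeated A's and B's collapse in the set.
distinct = set(permutations(letters))
total = len(distinct)

# Arrangements in which no two neighbouring letters are the same.
no_adjacent_equal = sum(
    1 for word in distinct
    if all(a != b for a, b in zip(word, word[1:]))
)

print(total, no_adjacent_equal)
```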
A670 (9740 N2015/II/12)(i) Let A1 , A2 , A3 , A4 , A5 be independent random variables
with the identical distribution N(300, 20²). Then F = A1 + A2 + A3 + A4 + A5 ∼ N(5 ⋅ 300, 5 ⋅ 20²) and

$$P(F > 1600) = 1 - P(F \le 1600) = 1 - \Phi\left(\frac{1600 - 1500}{\sqrt{2000}}\right) = 1 - \Phi\left(\sqrt{5}\right) \approx 1 - \Phi(2.236) \approx 1 - 0.9873 = 0.0127.$$

(ii) Let P ∼ N(200, 15²). Then E = P1 + P2 + ⋅ ⋅ ⋅ + P8 ∼ N(8 ⋅ 200, 8 ⋅ 15²). Then F − E ∼ N(5 ⋅ 300 − 8 ⋅ 200, 5 ⋅ 20² + 8 ⋅ 15²) = N(−100, 3800) and

$$P(F > E) = P(F - E > 0) = 1 - P(F - E \le 0) = 1 - \Phi\left(\frac{0 - (-100)}{\sqrt{3800}}\right) = 1 - \Phi\left(\frac{100}{\sqrt{3800}}\right) \approx 1 - \Phi(1.622) \approx 1 - 0.9476 = 0.0524.$$

(iii) 0.85F + 0.9E ∼ N(0.85 ⋅ 5 ⋅ 300 + 0.9 ⋅ 8 ⋅ 200, 0.85² ⋅ 5 ⋅ 20² + 0.9² ⋅ 8 ⋅ 15²) = N(2715, 2903).

$$P(0.85F + 0.9E < 2750) = \Phi\left(\frac{2750 - 2715}{\sqrt{2903}}\right) \approx \Phi(0.650) \approx 0.7422.$$

A671 (9740 N2014/II/5)(i) Arrange these 10000 customers by name, alphabetically. If

two customers have the exact same name, then randomly pick one to precede the other.
From this list of alphabetically-sorted customers, pick every 20th customer to survey.
(ii) Advantage: Each customer has equal probability of being surveyed.
Disadvantage: There is the small risk that there is some periodic pattern that could bias
the sample. For example, it could be that the customers are all in some country (or
concentration camp), where each person has a 9-digit number for a name (e.g. 001533123)
and only the most-privileged persons have 7 as the last digit of their name. If so, our
proposed method would omit all the most-privileged persons.
Such a pattern is obviously contrived and absurdly unlikely. In practice, it is unlikely that
my proposed method of systematic sampling is any different from purely random sampling.
A672 (9740 N2014/II/6)(i) $\binom{3}{1} \binom{8}{4} \binom{5}{2} \binom{6}{4} = 31\,500$.

(ii) Ways to include only the midfielder brother $= \binom{3}{1} \binom{8}{4} \binom{4}{1} \binom{5}{4}$.

Ways to include only the attacker brother $= \binom{3}{1} \binom{8}{4} \binom{4}{2} \binom{5}{3}$.

In total, 16 800 ways.

(iii) The club now has 3 goalkeepers, 8 defenders, 3 midfielders, 5 attackers, and one player
(call him Apu) who can either be a midfielder or a defender.

Ways to form a team without Apu $= \binom{3}{1} \binom{8}{4} \binom{3}{2} \binom{5}{4} = 3\,150$.

Ways to form a team with Apu as a midfielder $= \binom{3}{1} \binom{8}{4} \binom{3}{1} \binom{5}{4} = 3\,150$.

Ways to form a team with Apu as a defender $= \binom{3}{1} \binom{8}{3} \binom{3}{2} \binom{5}{4} = 2\,520$.

In total, 8 820 ways.
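The products of binomial coefficients above are easy to mistype, so a direct evaluation is a useful check. A sketch; the variable names are illustrative.

```python
from math import comb

# (i) Choose 1 goalkeeper, 4 defenders, 2 midfielders, 4 attackers.
total_i = comb(3, 1) * comb(8, 4) * comb(5, 2) * comb(6, 4)

# (ii) Exactly one of the two brothers is included.
only_midfielder = comb(3, 1) * comb(8, 4) * comb(4, 1) * comb(5, 4)
only_attacker = comb(3, 1) * comb(8, 4) * comb(4, 2) * comb(5, 3)
total_ii = only_midfielder + only_attacker

# (iii) Apu is left out, plays as a midfielder, or plays as a defender.
without_apu = comb(3, 1) * comb(8, 4) * comb(3, 2) * comb(5, 4)
apu_midfielder = comb(3, 1) * comb(8, 4) * comb(3, 1) * comb(5, 4)
apu_defender = comb(3, 1) * comb(8, 3) * comb(3, 2) * comb(5, 4)
total_iii = without_apu + apu_midfielder + apu_defender

print(total_i, total_ii, total_iii)
```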
A673 (9740 N2014/II/7)(i) Let X be the number of sixes rolled. Then $X \sim B\left(10, \frac{1}{6}\right)$.

And $P(X = 3) = \binom{10}{3} \left(\frac{1}{6}\right)^3 \left(\frac{5}{6}\right)^7 = \frac{10 \cdot 9 \cdot 8 \cdot 5^7}{3! \, 6^{10}} \approx 0.155045$.

(ii) Let Y be the number of sixes rolled. Then $Y \sim B\left(60, \frac{1}{6}\right)$. We have np > 5 and
n(1 − p) > 5. So $Z \sim N\left(10, \frac{50}{6}\right)$ is a suitable approximate distribution for Y. Using also the
continuity correction, we have

$$P(5 \le Y \le 8) \approx P(4.5 < Z < 8.5) = \Phi\left(\frac{8.5 - 10}{\sqrt{50/6}}\right) - \Phi\left(\frac{4.5 - 10}{\sqrt{50/6}}\right) \approx \Phi(-0.520) - \Phi(-1.905)$$

= Φ(1.905) − Φ(0.520) ≈ 0.9716 − 0.6985 = 0.2731.

(Without using an approximation, P(5 ≤ Y ≤ 8) ≈ 0.291854.)

(iii) Let A be the number of sixes rolled. Then $A \sim B\left(60, \frac{1}{15}\right)$. We have n > 20 and np = 4 < 5.
So B ∼ Po(4) is a suitable approximate distribution for A.

$$P(5 \le A \le 8) \approx P(5 \le B \le 8) = e^{-4} \left(\frac{4^5}{5!} + \frac{4^6}{6!} + \frac{4^7}{7!} + \frac{4^8}{8!}\right) \approx 0.349800.$$
(Without using an approximation, P(5 ≤ A ≤ 8) ≈ 0.353659.)
A674 (9740 N2014/II/8)(a) Case (i) is in red and case (ii) is in blue.

(b) (i) (A) PMCC ≈ −0.9470452.

(b) (i) (B) PMCC ≈ −0.974921.
(ii) It’s not at all clear which is the better model. But apparently we are supposed to say
that the second model is better because the magnitude of its PMCC is greater.
In general, the estimated regression equation is y − ȳ = b(x − x̄), where b = ∑ x̂i ŷi / ∑ x̂i².
So in this case, the estimated regression equation is

P − 72590 ≈ −33659.728 (ln m − 3.657)

⇐⇒ P ≈ −33659.728 ln m + 195693.560.

(iii) P(50) ≈ −33659.728 ln 50 + 195693.560 ≈ 64016.

A675 (9740 N2014/II/9)(i) Let X be the number of minutes a bus is late after the new
company has taken over. We’ll assume X ∼ N (µ, σ 2 ).
Our null hypothesis is H0 ∶ µ = µ0 = 4.3 and our alternative hypothesis is HA ∶ µ < µ0 = 4.3.
(ii) The null hypothesis is not rejected if $\bar{t} > \mu_0 - t_{9,0.1} \cdot k / \sqrt{n} = 4.3 - 1.383 \cdot \sqrt{3.2} / \sqrt{10} \approx 3.518$.

(iii) The null hypothesis is rejected if $\bar{t} < \mu_0 - t_{9,0.1} \cdot k / \sqrt{n}$ or $4.0 < 4.3 - 1.383 \cdot k / \sqrt{10}$ or
$k < 0.3 \sqrt{10} / 1.383$ or $k^2 < 0.3^2 \cdot 10 / 1.383^2 \approx 0.471$.
A676 (9740 N2014/II/10)(i)(a) 0.1 ⋅ 0.2 ⋅ 0.1 = 0.002.
(i)(b) The probability that no ⋆ is displayed is 0.9 ⋅ 0.8 ⋅ 0.9 = 0.648. And so the probability
that at least one ⋆ symbol is displayed is 1 − 0.648 = 0.352.
(i)(c) P(× × +) = 0.3 ⋅ 0.1 ⋅ 0.2, P(× + ×) = 0.3 ⋅ 0.3 ⋅ 0.4, P(+ × ×) = 0.4 ⋅ 0.1 ⋅ 0.4.
Thus, the desired probability is 0.006 + 0.036 + 0.016 = 0.058.
(ii) Writing ⋆′ for "not ⋆", the probability that there is exactly one ⋆ is P(⋆ ⋆′ ⋆′) + P(⋆′ ⋆ ⋆′) + P(⋆′ ⋆′ ⋆) = 0.1 ⋅ 0.8 ⋅
0.9 + 0.9 ⋅ 0.2 ⋅ 0.9 + 0.9 ⋅ 0.8 ⋅ 0.1 = 0.306.
The probability that the symbols are ⋆, +, ◯ (in any order) is

P (⋆ + ◯) + P (⋆◯+) + P (+ ⋆ ◯) + P (◯ ⋆ +) + P (+◯⋆) + P (◯ + ⋆)
= 0.1(0.3 ⋅ 0.3 + 0.4 ⋅ 0.2) + 0.2(0.4 ⋅ 0.3 + 0.2 ⋅ 0.2) + 0.1(0.4 ⋅ 0.4 + 0.2 ⋅ 0.3)
= 0.017 + 0.032 + 0.022 = 0.071

Hence, the desired probability is 0.071/0.306 = 71/306 ≈ 0.232026.
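The conditional probability in (ii) can be checked by enumerating all 4³ outcomes. The per-position symbol probabilities below are not quoted from the question; they are inferred from the products that appear in this answer (each position's four probabilities sum to 1), so treat them as an assumption. Symbols are encoded as `x`, `+`, `*`, `o`.

```python
from itertools import product

# Assumed symbol probabilities for the three positions, inferred from the working above.
positions = [
    {"x": 0.3, "+": 0.4, "*": 0.1, "o": 0.2},
    {"x": 0.1, "+": 0.3, "*": 0.2, "o": 0.4},
    {"x": 0.4, "+": 0.2, "*": 0.1, "o": 0.3},
]

def prob(outcome):
    """Probability of one ordered outcome, positions being independent."""
    p = 1.0
    for dist, symbol in zip(positions, outcome):
        p *= dist[symbol]
    return p

outcomes = list(product("x+*o", repeat=3))

p_one_star = sum(prob(o) for o in outcomes if o.count("*") == 1)
p_star_plus_circle = sum(prob(o) for o in outcomes if sorted(o) == sorted("*+o"))
conditional = p_star_plus_circle / p_one_star

print(p_one_star, p_star_plus_circle, conditional)
```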

A677 (9740 N2014/II/11)(i)(a) Let O ∼ Po(2) and P ∼ Po(11). Then

$$P(P > 8) = 1 - P(P \le 8) = 1 - e^{-11} \left(\frac{11^0}{0!} + \frac{11^1}{1!} + \dots + \frac{11^8}{8!}\right) \approx 0.768015.$$

(i)(b) O + P ∼ Po(13). So

$$P(O + P < 15) = e^{-13} \left(\frac{13^0}{0!} + \frac{13^1}{1!} + \dots + \frac{13^{14}}{14!}\right) \approx 0.675132.$$

(ii) Let Q ∼ Po(2n). We are given that P(Q < 3) < 0.01. That is,

$$P(Q < 3) = e^{-2n} \left(\frac{(2n)^0}{0!} + \frac{(2n)^1}{1!} + \frac{(2n)^2}{2!}\right) = e^{-2n} \left(1 + 2n + 2n^2\right) < 0.01.$$

Let f (n) = e−2n (1 + 2n + 2n2 ). From calculator, f (1), f (2), f (3), f (4) > 0.01 and f (5) <
0.01. Hence, the smallest possible integer value of n is 5.
(iii) Let R ∼ Po(52 ⋅ 11) = Po(572). Given a large sample, we can use the normal distribution
S ∼ N(572, 572) as an approximation. Hence, using also the continuity correction,

$$P(R > 550) \approx P(S > 550.5) = 1 - P(S < 550.5) = 1 - \Phi\left(\frac{550.5 - 572}{\sqrt{572}}\right) \approx 1 - \Phi(-0.898960) = \Phi(0.898960) \approx 0.8158.$$

(iv) Sales may be seasonal — e.g. it may be that art collectors make most of their purchases
in the northern hemisphere’s summer months.


The sales of originals and prints may not be independent of each other. E.g., an art collector
who buys an original Picasso might wish to also buy a few copies thereof.
A678 (9740 N2013/II/5)(i) Use a computer program to randomly sort the 100000
employees into an ordered list. Pick the first 90 employees on the list.
The Chief Executive’s idea of a representative sample might be to have each country’s
employees proportionally represented. For example, if 10% of employees are from India,
then she may want 9 of the invited employees to be from India.
(ii) Stratified sampling is more appropriate. If say 10% of employees are from India, 30%
from China, 20% from Thailand, and 40% from Singapore, then we could instead pick
from the list the first 9 Indian employees, the first 27 Chinese employees, the first 18 Thai
employees, and the first 36 Singaporean employees.
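A679 (9740 N2013/II/6). $P(Y < 2a) = P\left(Z < \frac{2a - \mu}{\sigma}\right) = 0.95 \implies \frac{2a - \mu}{\sigma} \approx 1.645 \iff \frac{2a - \mu}{1.645} \approx \sigma$.

$$P(Y < a) = P\left(Z < \frac{a - \mu}{\sigma}\right) = 0.25 \implies \frac{a - \mu}{\sigma} \approx -0.674 \iff \mu - a \approx 0.674 \sigma \approx 0.674 \cdot \frac{2a - \mu}{1.645}$$

$$\iff \mu \left(1 + \frac{0.674}{1.645}\right) \approx \left(1 + \frac{2 \cdot 0.674}{1.645}\right) a \iff \mu \approx 1.29a.$$

That is, k ≈ 1.29.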
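The value of k can be recomputed without table look-ups by using the standard normal quantile function. The sketch below divides the two standardised equations to eliminate σ and a; this elimination step is our own rearrangement rather than the one written above.

```python
from statistics import NormalDist

z_upper = NormalDist().inv_cdf(0.95)  # about 1.645
z_lower = NormalDist().inv_cdf(0.25)  # about -0.674

# With mu = k * a: (2a - mu) / sigma = z_upper and (a - mu) / sigma = z_lower.
# Dividing gives (2 - k) / (1 - k) = z_upper / z_lower, which is linear in k.
ratio = z_upper / z_lower
k = (ratio - 2) / (ratio - 1)

print(round(k, 2))
```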
A680 (9740 N2013/II/7)(i) The probability that one packet contains a free gift is
independent of whether another packet contains a free gift.
There is no possibility that any one packet contains two or more free gifts.
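(ii) Let $F \sim B\left(20, \frac{1}{20}\right)$. Then $P(F = 1) = \binom{20}{1} \left(\frac{1}{20}\right) \left(\frac{19}{20}\right)^{19} \approx 0.377354$.

(iii) Let $F \sim B\left(60, \frac{1}{20}\right)$. Since n = 60 is large and np = 3 is small, a suitable approximation
for F is G ∼ Po(3).

$$P(F \ge 5) \approx P(G \ge 5) = 1 - P(G \le 4) = 1 - e^{-3} \left(\frac{3^0}{0!} + \frac{3^1}{1!} + \frac{3^2}{2!} + \frac{3^3}{3!} + \frac{3^4}{4!}\right) \approx 0.184737.$$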
(By comparison, the actual probability is P(F ≥ 5) ≈ 0.180335.)
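Both the Poisson approximation and the exact binomial tail in (iii) can be computed in a few lines. A sketch, taking n = 60 and p = 1/20 (so that np = 3) as in this answer:

```python
from math import comb, exp, factorial

n, p = 60, 1 / 20
lam = n * p  # Poisson mean used for the approximation

# P(at least 5) = 1 - P(at most 4) under each model.
poisson_tail = 1 - sum(exp(-lam) * lam**k / factorial(k) for k in range(5))
binomial_tail = 1 - sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(5))

print(round(poisson_tail, 6), round(binomial_tail, 6))
```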
A681 (9740 N2013/II/8)(i) P(B ∩ A′ ) = P(B∣A′ )P(A′ ) = 0.8 × 0.3 = 0.24.
(ii) P(A′ ∩ B ′ ) = 1 − [P(A) + P(B ∩ A′ )] = 1 − 0.7 − 0.24 = 0.06.
(iii) P(A′∣B′) = 1 − P(A∣B′) = 0.12.

$$P(B') = \frac{P(A' \cap B')}{P(A' \mid B')} = \frac{0.06}{0.12} = 0.5.$$

$$P(A \cap B) = 1 - [P(A') + P(A \cap B')] = 1 - 0.3 - P(A \cap B') = 0.7 - P(A \mid B') P(B') = 0.7 - 0.88 \times 0.5 = 0.26.$$

A682 (9740 N2013/II/9)(i)


$$\bar{x} = \frac{\sum x_i}{n} = \frac{14.0 + 12.5 + 11.0 + 11.0 + 12.5 + 12.6 + 15.6 + 13.2}{8} = 12.8,$$

$$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{(14.0 - 12.8)^2 + (12.5 - 12.8)^2 + \dots + (13.2 - 12.8)^2}{7} \approx 2.305714.$$

(ii) The necessary assumption is that the population is normally distributed.
The null hypothesis is H0 ∶ µ0 = 13.8 and the alternative hypothesis is HA ∶ µ0 < 13.8.

$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{12.8 - 13.8}{\sqrt{2.305714 / 8}} \approx -1.862697.$$

Since |t| < t7,0.05 = 1.895, we are unable to reject the null hypothesis at the 5% significance level.
A683 (9740 N2013/II/10)(i) In blue is case (A), in red is case (B), and in green is
case (C).


[Scatter diagram: Distance, y (vertical axis, up to 150) against Speed, x (horizontal axis, 0 to 150).]

(iii) As a function of speed, the distance travelled decreases at an increasing rate. So (A)
is the most appropriate.
PMCC ≈ −0.939203.
(iv) In general, the estimated regression equation is y − ȳ = b(x − x̄), where b = ∑ x̂i ŷi / ∑ x̂i².
So in this case, the estimated regression equation is

y − 135 ≈ −0.00461978 (x2 − 11850.66667)

⇐⇒ y ≈ −0.00461978x2 + 189.747528.


Thus, y(110) ≈ −0.00461978(110)2 + 189.747528 ≈ 134.
A684 (9740 N2013/II/11)(i) The total number of ways to choose a code is 26³ ⋅ 9²
(1423656). The number of ways to choose a code with three different letters and two different
digits is 26 ⋅ 25 ⋅ 24 ⋅ 9 ⋅ 8 (1123200). Hence, the desired probability is 26 ⋅ 25 ⋅ 24 ⋅ 9 ⋅ 8 / (26³ ⋅ 9²) = 400/507
≈ 0.78895.
(ii) The number of ways to choose the two digits so that the second digit is larger than the
first is 1 + 2 + ⋅ ⋅ ⋅ + 8 = 36. Hence, the desired probability is (1 + 2 + ⋅ ⋅ ⋅ + 8)/9² = 4/9 = 0.4̇.
(iii) The number of ways to choose a code with exactly two letters the same, but not two
digits the same is

$$\underbrace{26}_{\text{Repeated letter}} \times \underbrace{25}_{\text{Third letter}} \times \underbrace{\frac{3!}{2!}}_{\text{Arrange these three letters}} \times \underbrace{9 \cdot 8}_{\text{Two digits}} = 26 \cdot 25 \cdot 3 \cdot 9 \cdot 8 = 140400.$$

The number of ways to choose a code with exactly two digits the same, but not exactly
two letters the same is

$$\underbrace{9}_{\text{Repeated digit}} \times \Big( \underbrace{26 \cdot 25 \cdot 24}_{\text{All three letters different}} + \underbrace{26}_{\text{All three letters same}} \Big) = 9 \cdot 26 \cdot 601 = 140634.$$

Hence the desired probability is

$$\frac{26 \cdot 25 \cdot 3 \cdot 9 \cdot 8 + 9 \cdot 26 \cdot 601}{26^3 \cdot 9^2} = \frac{25 \cdot 3 \cdot 8 + 601}{26^2 \cdot 9} = \frac{1201}{6084} \approx 0.197.$$
(iv) There are 4 ways to choose the even digit, 5 to choose the odd digit then 2 ways to
arrange these two digits. Hence, there are 4 ⋅ 5 ⋅ 2 = 40 ways to choose the two digits.
There are 5 ways to choose the vowel. There are 21² ways to choose the two consonants.
We can now slot in the vowel amidst the consonants in 3 different ways. Hence, there are
5 ⋅ 21² ⋅ 3 ways to choose the three letters.
Altogether then, there are 5 ⋅ 21² ⋅ 3 ⋅ 4 ⋅ 5 ⋅ 2 ways to choose a code with exactly one vowel
and exactly one even digit.

Hence the desired probability is $\frac{5 \cdot 21^2 \cdot 3 \cdot 4 \cdot 5 \cdot 2}{26^3 \cdot 9^2} = \frac{5 \cdot 7^2 \cdot 5}{13^3 \cdot 3} = \frac{1225}{6591} \approx 0.18586$.
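All four probabilities in this answer can be verified by enumeration. The sketch below assumes, as the counts 26³ and 9² suggest, that a code is three letters from A to Z followed by two digits from 1 to 9; letters and digits are counted separately and then multiplied, which keeps the enumeration small.

```python
from fractions import Fraction
from itertools import product
from string import ascii_uppercase

letter_triples = list(product(ascii_uppercase, repeat=3))
digit_pairs = list(product(range(1, 10), repeat=2))
total = len(letter_triples) * len(digit_pairs)

# Letter patterns, classified by the number of distinct letters in the triple.
all_different = sum(len(set(t)) == 3 for t in letter_triples)
exactly_two_same = sum(len(set(t)) == 2 for t in letter_triples)
all_same = sum(len(set(t)) == 1 for t in letter_triples)
one_vowel = sum(sum(c in "AEIOU" for c in t) == 1 for t in letter_triples)

# Digit patterns.
digits_different = sum(a != b for a, b in digit_pairs)
digits_same = sum(a == b for a, b in digit_pairs)
digits_increasing = sum(a < b for a, b in digit_pairs)
one_even = sum((a % 2 == 0) + (b % 2 == 0) == 1 for a, b in digit_pairs)

p_i = Fraction(all_different * digits_different, total)
p_ii = Fraction(digits_increasing, len(digit_pairs))
p_iii = Fraction(
    exactly_two_same * digits_different + digits_same * (all_different + all_same),
    total,
)
p_iv = Fraction(one_vowel * one_even, total)

print(p_i, p_ii, p_iii, p_iv)
```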
A685 (9740 N2013/II/12)(i) #1. The number of people sick on a particular day is
independent of how many were sick the previous day. #2. The average number of illnesses
in any span of 30 days is the same, throughout the course of the year.
Condition #1 may not be met if the illness is contagious. If so, we’d expect the number of
people sick on a particular day to depend (positively) on how many were sick the previous day.
Condition #2 may not be met if the illnesses are seasonal. For example, due to influenza,
illnesses may be more common during the winter than during the summer.


Let A ∼ Po(1.2) and M ∼ Po(2.7).
(ii) Let B ∼ Po(1.2n). Then P(B = 0) < 0.01 ⇐⇒ e−1.2n < 0.01 ⇐⇒ n > (ln 0.01) /(−1.2) ≈
3.8. Hence, the smallest number of days is 4.
(iii) Let C be the total number of days of absence across both departments, over a 5-day
period. Then C ∼ Po(19.5) and:
$$P(C > 20) = 1 - P(C \le 20) = 1 - e^{-19.5} \sum_{i=0}^{20} \frac{19.5^i}{i!} \approx 0.396583.$$

(iv) Let D be the total number of days of absence across both departments, over a 60-day
period. Then D ∼ Po(234). Since λD = 234 is large, the normal distribution is a suitable
approximation. Let E ∼ N (234, 234). Then:

$$P(200 \le D \le 250) \approx P(199.5 \le E \le 250.5) = \Phi\left(\frac{250.5 - 234}{\sqrt{234}}\right) - \Phi\left(\frac{199.5 - 234}{\sqrt{234}}\right)$$

$$\approx \Phi(1.0786) - \Phi(-2.2553) \approx 0.8597 - 0.0120 = 0.8477.$$

A686 (9740 N2012/II/5)(i)(a) Let +, −, D, and N denote the events “positive result”,
“negative result”, “has disease”, and “no disease”. Then:

P(+) = P(+∣D)P(D) + P(+∣N )P(N ) = p ⋅ 0.001 + (1 − p) ⋅ 0.999 = 0.999 − 0.998p = 0.00599.

(i)(b) P(D∣+) = P(D ∩ +) ÷ P(+) = P(D)P(+∣D) ÷ P(+) = 0.001p ÷ 0.00599 ≈ 0.166110.

(ii) We are given that P(D∣+) = 0.75. But

$$P(D \mid +) = \frac{0.001p}{0.999 - 0.998p}.$$

So 3(0.999 − 0.998p) = 4(0.001p) or 2.997 = 2.998p or p ≈ 0.999666.

A687 (9740 N2012/II/6)(i) H0 ∶ µ0 = 14.0, HA ∶ µ0 ≠ 14.0.
(ii) Under H0, x̄ ∼ N(14.0, 3.8²/20). Since Z0.025 = 1.96, the values of x̄ for which the null hypothesis
would not be rejected are

$$\bar{x} \in \left(\mu - Z_{0.025} \frac{\sigma}{\sqrt{n}}, \ \mu + Z_{0.025} \frac{\sigma}{\sqrt{n}}\right) = \left(14.0 - 1.96 \frac{3.8}{\sqrt{20}}, \ 14.0 + 1.96 \frac{3.8}{\sqrt{20}}\right) \approx (12.335, 15.665).$$

(iii) The null hypothesis is rejected.

A688 (9740 N2012/II/7)(i) There are 15! ways to arrange the 15 individuals.
There are 2 ways to arrange the 2 sisters as a single unit. Counting the 2 sisters as a single
unit, we have 14 units total, and there are 14! ways to arrange these 14 units. So, there
are in total 2 ⋅ 14! ways to arrange the 15 individuals so that the two sisters are together.
Hence, the probability that the sisters are next to each other is 2 ⋅ 14!/15! = 2/15 = 0.13̇.
(ii) There are 3! ways to arrange the 3 brothers as a single unit. Counting the 3 brothers
as a single unit, we have 13 units total, and there are 13! ways to arrange these 13 units.
So, there are in total 3! ⋅ 13! ways to arrange the 15 individuals so that the three brothers
are together. We do not want the three brothers to be together.
Hence, the desired probability is 1−3!⋅13!/15! = 1−6/ (14 ⋅ 15) = 1−1/35 = 34/35 ≈ 0.97142857.
(iii) There are 2 ways to arrange the 2 sisters as a single unit and 3! ways to arrange the 3
brothers as a single unit. Counting the 2 sisters as a single unit and also the 3 brothers as
a single unit, we have 12 units in total, and there are 12! ways to arranges these 12 units.
So, there are in total 2 ⋅ 3! ⋅ 12! ways to arrange the 15 individuals so that the 2 sisters are
together and the 3 brothers are together.
Hence, the desired probability is 2 ⋅ 3! ⋅ 12!/15! = 12/(13 ⋅ 14 ⋅ 15) = 2/(13 ⋅ 7 ⋅ 5) = 2/455 ≈ 0.0043956.
(iv) Let A and B denote the events that “the sisters are next to each other” and “the
brothers are next to each other”. Our desired probability is P(A ∪ B).

$$P(A \cup B) = P(A) + P(B) - P(A \cap B) = \frac{2}{15} + \frac{1}{35} - \frac{2}{455} = \frac{91 \cdot 2}{3 \cdot 455} + \frac{13}{455} - \frac{2}{455}$$

$$= \frac{91 \cdot 2}{3 \cdot 455} + \frac{33}{3 \cdot 455} = \frac{43}{3 \cdot 91} = \frac{43}{273} \approx 0.15751.$$
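Exact rational arithmetic avoids slips when combining the three fractions in (iv). A sketch of the inclusion-exclusion sum, built from the counts in parts (i) to (iii):

```python
from fractions import Fraction
from math import factorial

total = factorial(15)

p_sisters = Fraction(2 * factorial(14), total)              # sisters adjacent
p_brothers = Fraction(factorial(3) * factorial(13), total)  # three brothers together
p_both = Fraction(2 * factorial(3) * factorial(12), total)  # both at once

p_union = p_sisters + p_brothers - p_both

print(p_sisters, p_brothers, p_both, p_union, float(p_union))
```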
A689 (9740 N2012/II/8)(i)

[Scatter diagram: Percentage mark, y (vertical axis, up to 100) against Week, x (horizontal axis, 0 to 6).]

(ii) The trend is one of steady improvement. After a terrible performance in Week 1, Amy
resolves to work hard. Her work pays off, with her mark improving week after week.
The only deviation from trend occurs on Week 5, because Amy happened to be experi-
menting with drugs that week.
(iii) A linear model would suggest that she eventually breaks the 100% barrier, which is
quite impossible.
A quadratic model would suggest that her mark eventually starts falling and moreover at
an increasing rate, which is quite improbable, unless of course she gets hooked on drugs.
(iv) PMCC ≈ −0.929744.
(v) We are supposed to say that the most appropriate choice is wherever the magnitude of
the PMCC is the largest. Hence, L = 92 is the most appropriate.
(vi) In general, the estimated regression equation is y − ȳ = b(x − x̄), where b = ∑ x̂i ŷi / ∑ x̂i².
So in this case, the estimated regression equation is

1747, Contents

ln (92 − y) − 3.125912 ≈ −0.279599(x − 3.5)
⇐⇒ ln (92 − y) ≈ −0.279599x + 4.104510.

y ≥ 90 ⇐⇒ ln (92 − y) ≤ ln 2 ⇐⇒ −0.28x + 4.10 ≤ ln 2 ⇐⇒ x ≥ 12.2 (approximately). So she’ll get at least 90% in Week 13.

(vii) As x → ∞, y → L. An interpretation is thus that L is the best mark she can ever
hope to get, no matter how long she spends studying.
A690 (9740 N2012/II/9)(i) The choice must be binary — a voter must be said to either
support the Alliance Party or not support it.
The probability that any one polled voter supports the Party is independent of whether
another polled voter supports the party.
(ii) $P(A = 3) + P(A = 4) = \binom{30}{3} p^3 (1 - p)^{27} + \binom{30}{4} p^4 (1 - p)^{26} \approx 0.373068$.

(iii)(a) np = 16.5 > 5 and n(1 − p) = 13.5 > 5 are both large and so yes, the normal
distribution N(16.5, 16.5 ⋅ 0.45) would be a suitable approximation for A.
(iii)(b) p is large. And so, while it is certainly possible to use the Poisson distribution as
an approximation, it would fare poorly.
(iv) $P(A = 15) = \binom{30}{15} p^{15} (1 - p)^{15} \approx 0.06864$.

Thus, $p(1 - p) = p - p^2 \approx \left[0.06864 \Big/ \binom{30}{15}\right]^{1/15} \approx 0.237900$.

Rearranging, p² − p + 0.237900 = 0. By the quadratic formula, p ≈ 0.39, 0.61. Given that
p < 0.5, we have p ≈ 0.39.
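The two steps in (iv), taking the 15th root and then solving the quadratic, can be reproduced numerically. A sketch; the value 0.06864 is the probability quoted above.

```python
from math import comb, sqrt

target = 0.06864  # P(A = 15)

# C(30, 15) * [p(1 - p)]^15 = target, so solve for the product p(1 - p) first.
pq = (target / comb(30, 15)) ** (1 / 15)

# p^2 - p + pq = 0, solved by the quadratic formula.
disc = sqrt(1 - 4 * pq)
roots = ((1 - disc) / 2, (1 + disc) / 2)
p = min(roots)  # the root satisfying p < 0.5

print(round(pq, 6), [round(r, 2) for r in roots], round(p, 2))
```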

A691 (9740 N2012/II/10)(i) The number of gold coins in a randomly chosen square
metre is independent of how many gold coins there are in the square metre to its left.
No two coins are stacked exactly on top of each other.
(ii) Let G ∼ Po(0.8). Then:
$$P(G \ge 3) = 1 - e^{-0.8} \left(\frac{0.8^0}{0!} + \frac{0.8^1}{1!} + \frac{0.8^2}{2!}\right) \approx 0.0474226.$$

(iii) Let H ∼ Po(0.8x). Then P(H = 1) = e−0.8x (0.8x) = 0.2.

By calculator, x ≈ 0.323964, 3.1783. So x ≈ 0.323964.
(iv) Let I ∼ Po(80). Since λ is large, the normal distribution J ∼ N(80, 80) is a suitable
approximation. Using also the continuity correction:

$$P(I \ge 90) \approx P(J \ge 89.5) = 1 - \Phi\left(\frac{89.5 - 80}{\sqrt{80}}\right) \approx 1 - \Phi(1.062) \approx 1 - 0.8559 = 0.1441.$$

(v) Let P ∼ Po(3). Let Z be the number of gold coins and pottery shards found in 50 m2 .
Then Z ∼ Po(190). Since λ is large, the normal distribution Q ∼ N(190, 190) is a suitable
approximation for Z. Using also the continuity correction,


$$P(Z \ge 200) \approx P(Q \ge 199.5) = 1 - \Phi\left(\frac{199.5 - 190}{\sqrt{190}}\right) \approx 1 - \Phi(0.6892) \approx 1 - 0.7546 = 0.2454.$$

(vi) Let X and Y be, respectively, the numbers of gold coins and pottery shards found in
50 m2 . Then X ∼ Po(40) and Y ∼ Po(150). Our goal is to find P(Y ≥ 3X) = P(Y − 3X ≥ 0).
Since λX = 40 and λY = 150 are both large, the normal distributions A ∼ N(40, 40) and
B ∼ N(150, 150) are suitable approximations for X and Y , respectively. And in turn,
B − 3A ∼ N(150 − 3 ⋅ 40, 150 + 32 ⋅ 40) = N(30, 510) is a good approximation for Y − 3X.
Hence, using also the continuity correction,

$$P(Y - 3X \ge 0) \approx P(B - 3A \ge -0.5) = 1 - \Phi\left(\frac{-0.5 - 30}{\sqrt{510}}\right) = \Phi\left(\frac{30.5}{\sqrt{510}}\right) \approx \Phi(1.3506) \approx 0.9116.$$
A692 (9740 N2011/II/5)(i) $P(X < 40.0) = P\left(Z < \frac{40.0 - \mu}{\sigma}\right) = 0.05 \iff \frac{40.0 - \mu}{\sigma} \approx -1.645 \iff \mu \approx 1.645\sigma + 40.0$ (1).

$P(X < 70.0) = P\left(Z < \frac{70.0 - \mu}{\sigma}\right) = 0.975 \iff \frac{70.0 - \mu}{\sigma} \approx 1.96 \iff \mu \approx -1.96\sigma + 70.0$ (2).

Comparing (1) and (2), we have 1.645σ + 40.0 ≈ −1.96σ + 70.0 ⇐⇒ 3.605σ ≈ 30.0 ⇐⇒ σ ≈ 8.3
and µ ≈ 53.7.
A693 (9740 N2011/II/6)(i) Decide what the age groups will be. Decide how many
from each age group are to be interviewed (these are our quotas). Then pick, at random,
residents on the street to be interviewed, until the quota for every age group is fulfilled.
(ii) Residents who are on the street may not be a representative sample of the population.
(iii) Random sampling. Acquire a complete list of the city suburb’s population. Use a
computer program to randomly pick a sample. Interview this sample.
No, it is not realistic. First, one may not be able to acquire a complete list of the city suburb’s
population. Second, one may not be able to contact every member of one’s sample.
A694 (9740 N2011/II/7)(i) #1. I do indeed make an actual attempt to contact n
different friends.
#2. The probability that one friend is contactable is independent of whether another friend
is contactable.
(ii) Assumption #1 may not hold because if say n = 100, I may run out of time before I
attempt to contact all 100 different friends.
Assumption #2 may not hold because my friends probably know each other and so they
might be watching a movie together and their handphones are switched off. This would
mean that the probability that one friend is contactable is dependent on whether another
friend is contactable.
(iii) $P(R \ge 6) = 1 - \sum_{i=0}^{5} P(R = i) = 1 - \sum_{i=0}^{5} \binom{8}{i} 0.7^i \, 0.3^{8-i} \approx 0.551774$.
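The tail probability in (iii) can be recomputed directly. The sketch below assumes R ∼ B(8, 0.7); the number of trials, 8, is an inference on our part, chosen because it reproduces the stated value 0.551774.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(R = k) for R ~ B(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 8, 0.7  # n = 8 is an assumption; see the note above

prob = 1 - sum(binom_pmf(i, n, p) for i in range(6))
print(round(prob, 6))
```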

(iv) Since np = 28 > 5 and n(1 − p) = 12 > 5 are both large, a suitable approximation to R
is the normal distribution S ∼ N (28, 8.4). Using also the continuity correction, we have

1749, Contents

$$P(R < 25) \approx P(S < 24.5) = \Phi\left(\frac{24.5 - 28}{\sqrt{8.4}}\right) \approx \Phi(-1.2076) = 1 - \Phi(1.2076) \approx 1 - 0.8863 = 0.1137.$$

A695 (9740 N2011/II/8)(i)

(ii) The PMCC is ≈ −0.992317, which is very large in magnitude. But this merely means
that the correlation between x and y is very strong. It does not also imply that their true
relationship is definitely linear. Indeed in this case, it appears that the relationship is not linear.
(iii) We are supposed to say that the larger the magnitude of the PMCC, the better the
model. In this case, the PMCC of y and x2 is −0.999984. And so we’re supposed to conclude
that y = a + bx2 is the better model.
(iv) In general, the estimated regression equation is y−ȳ = b(x−x̄), where b = ∑ x̂i ∑ ŷi / ∑ x̂2i .
So in this case, the estimated regression equation is

y − 10.885714 ≈ −0.856210 (x² − 13.25)

⇐⇒ y ≈ −0.856210x² + 22.230492.

y(3.2) = −0.856210 ⋅ (3.2)² + 22.230492 ≈ 13.5.

A696 (9740 N2011/II/9)(i)(a) 0.6 ⋅ 0.05 + 0.4 ⋅ 0.07 = 0.03 + 0.028 = 0.058.
(i)(b) 0.03/0.058 = 15/29 ≈ 0.517241.
(ii)(a) P(Exactly one faulty) = P(First faulty, second not) + P(Second faulty, first not) =
0.058 (1 − 0.058) + (1 − 0.058) 0.058 = 2 ⋅ 0.058 ⋅ 0.942 = 0.109272.

(ii)(b) Let E be the event “both made by A” and F be the event “exactly one faulty”. Then

P(Both made by A ∣ Exactly one faulty) = P(E∣F) = P(E ∩ F)/P(F).

But P(E ∩ F) = P(E)P(F∣E) = 0.6² (0.05 ⋅ 0.95 + 0.95 ⋅ 0.05) = 0.0342. Hence P(E∣F) =
0.0342/0.109272 ≈ 0.312980.
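The four probabilities in A696 can be reproduced with exact fractions. This is a rough sketch only: the supplier label "B" and all variable names are assumptions made for illustration, and the shares and fault rates are read off the working above.

```python
from fractions import Fraction as F

share = {"A": F(60, 100), "B": F(40, 100)}   # proportion made by each source ("B" is an assumed label)
faulty = {"A": F(5, 100), "B": F(7, 100)}    # fault rate of each source

p_faulty = sum(share[s] * faulty[s] for s in share)        # (i)(a)
p_A_given_faulty = share["A"] * faulty["A"] / p_faulty     # (i)(b)
p_exactly_one = 2 * p_faulty * (1 - p_faulty)              # (ii)(a)

# (ii)(b): both made by A AND exactly one of the two is faulty
p_bothA_and_one = share["A"] ** 2 * 2 * faulty["A"] * (1 - faulty["A"])
p_bothA_given_one = p_bothA_and_one / p_exactly_one

print(p_faulty, p_A_given_faulty, float(p_exactly_one), float(p_bothA_given_one))
```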
A697 (9740 N2011/II/10)(i) We are given that T ∼ N(38.0, 5.0²).
Let X be the time taken to install the component after background music is introduced.
Assume that X remains normally distributed with standard deviation 5.0 (these are questionable
assumptions, but without these we cannot proceed). That is, X ∼ N(µ0, 5.0²).
The null hypothesis is H0 ∶ µ0 = 38.0 and the alternative hypothesis is HA ∶ µ0 < 38.0.

(ii) Z0.05 ≈ 1.645. So to reject the null hypothesis, we must have t̄ < µ0 − Z0.05 σ/√n =
38.0 − 1.645 ⋅ 5.0/√50 ≈ 36.8.

(iii) Since the null is not rejected with t̄ = 37.1, we must have t̄ = 37.1 > µ0 − Z0.05 σ/√n =
38.0 − 1.645 ⋅ 5.0/√n. Rearranging, n < (1.645 ⋅ 5.0/0.9)² ≈ 83.5. Thus, n ∈ {1, 2, . . . , 83}.
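The critical value in (ii) and the bound on n in (iii) are simple enough to check by machine. A minimal sketch, with variable names of my own choosing and the critical value 1.645 taken from the working above:

```python
from math import floor, sqrt

mu0, sigma, z = 38.0, 5.0, 1.645   # H0 mean, known s.d., Z_0.05

# (ii) with n = 50, reject H0 if the sample mean falls below this value
critical = mu0 - z * sigma / sqrt(50)

# (iii) H0 is not rejected at a sample mean of 37.1, so n < n_bound
n_bound = (z * sigma / (mu0 - 37.1)) ** 2
n_max = floor(n_bound)             # largest possible sample size (n_bound is not an integer)

print(critical, n_bound, n_max)
```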

A698 (9740 N2011/II/11)(i) There are in total C(30, 10) ways to choose the committee.
There are C(18, 4) × C(12, 6) ways to choose a committee with exactly 4 women. Hence,
the desired probability is

C(18, 4) C(12, 6)/C(30, 10) = [(18 ⋅ 17 ⋅ 16 ⋅ 15)/4!] [(12 ⋅ 11 ⋅ 10 ⋅ 9 ⋅ 8 ⋅ 7)/6!] / [(30 ⋅ 29 ⋅ ⋯ ⋅ 21)/10!]

= (17 ⋅ 48)/(29 ⋅ 13 ⋅ 23) ≈ 0.0941068.
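As a sanity check on the fraction above, here is a short sketch that evaluates C(18, 4) C(12, 6)/C(30, 10) exactly. The variable names are hypothetical; the counts (18 women, 12 men, committee of 10) are read off the working.

```python
from fractions import Fraction
from math import comb

women, men, size = 18, 12, 10

# P(exactly 4 women on the committee), as an exact fraction
p_four_women = Fraction(comb(women, 4) * comb(men, size - 4), comb(women + men, size))

print(p_four_women, float(p_four_women))
```

The exact value is 816/8671, which is about 0.0941.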
(ii) The number of ways to choose a committee with exactly r women is:

C(18, r) C(12, 10 − r).

And the number of ways to choose a committee with exactly r + 1 women is:

C(18, r + 1) C(12, 9 − r).

We are told that the first number is greater than the second, i.e.

C(18, r) C(12, 10 − r) > C(18, r + 1) C(12, 9 − r)

⇐⇒ [18!/((18 − r)! r!)] ⋅ [12!/((2 + r)! (10 − r)!)] > [18!/((17 − r)! (r + 1)!)] ⋅ [12!/((3 + r)! (9 − r)!)]

⇐⇒ (17 − r)!(r + 1)!(3 + r)!(9 − r)! > (18 − r)!r!(2 + r)!(10 − r)! (as desired).

Continuing with the algebra, we have (r + 1)(3 + r) > (18 − r)(10 − r) ⇐⇒ r² + 4r + 3 >
r² − 28r + 180 ⇐⇒ 32r > 177 ⇐⇒ r > 5 + 17/32.
We have just proven that P(R = r) > P(R = r + 1) if and only if r = 6, 7, 8, 9. That is,
we have just shown that P(R = 6) > P(R = 7) > P(R = 8) > P(R = 9) > P(R = 10), but
P(R = 0) ≤ P(R = 1) ≤ P(R = 2) ≤ P(R = 3) ≤ P(R = 4) ≤ P(R = 5) ≤ P(R = 6).
We have thus shown that 6 is a most-probable-number-of-women and that 7, 8, 9, 10 are
not. We must rule out that 5 (or any smaller number) is a most-probable-number-of-women.
But clearly, 6!4! ≠ 5!5!, so that

C(18, 6) C(12, 4) ≠ C(18, 5) C(12, 5).

Hence, it is indeed the case that P(R = 5) < P(R = 6). Thus, 6 is indeed the unique
most-probable-number-of-women.
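The conclusion can also be confirmed by brute force: simply tabulate the number of committees for every r and look for the largest. A minimal sketch (the dictionary `ways` and the name `mode` are my own):

```python
from math import comb

# Number of committees of 10 with exactly r women (18 women, 12 men)
ways = {r: comb(18, r) * comb(12, 10 - r) for r in range(11)}

mode = max(ways, key=ways.get)   # the r with the most committees
print(mode, ways[mode])
```

Since every committee is equally likely, the r with the most committees is also the most probable number of women.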
A699 (9740 N2011/II/12)(i) Let X be the number of people who join the queue in a
period of 4 minutes. Then X ∼ Po(4.8) and:
P(X ≥ 8) = 1 − P(X ≤ 7) = 1 − e^{−4.8} ∑_{i=0}^{7} 4.8^i/i! ≈ 0.113334.

(ii) Let Y be the number of people who join the queue in a period of t seconds. Then
Y ∼ Po(1.2t/60) = Po(0.02t). We are told that P(Y ≤ 1) = 0.7. That is,

P(Y ≤ 1) = e^{−0.02t} (1 + 0.02t) = 0.7.

By calculator, t ≈ 54.8675.
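"By calculator" here just means solving the equation numerically. One way to do so is bisection, sketched below under the assumption that t is measured in seconds (so the rate is 1.2/60 = 0.02 per second); the function name and the bracket [0, 1000] are my own choices.

```python
from math import exp

def p_at_most_one(t):
    """P(Y <= 1) for Y ~ Po(0.02 t)."""
    lam = 0.02 * t
    return exp(-lam) * (1 + lam)

# p_at_most_one is decreasing in t: it equals 1 at t = 0 and is tiny at t = 1000
lo, hi = 0.0, 1000.0
for _ in range(100):
    mid = (lo + hi) / 2
    if p_at_most_one(mid) > 0.7:
        lo = mid      # probability still too high, so t must be larger
    else:
        hi = mid
t = (lo + hi) / 2
print(t)
```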
(iii) Let Z be the number of people who leave the queue over 15 minutes. Then Z ∼ Po(27).
Let B be the number of people who join the queue over 15 minutes. Then B ∼ Po(18).
We wish to find P(35 + B − Z ≥ 24) = P(Z − B ≤ 11).
Since λZ = 27 is large, a suitable approximation for Z is the normal distribution A ∼
N(27, 27). Since λB = 18 is large, a suitable approximation for B is the normal distribution
C ∼ N(18, 18). In turn, a suitable approximation for Z − B is A − C ∼ N(9, 45). Hence,
using also the continuity correction,

P(Z − B ≤ 11) ≈ P(A − C ≤ 11.5) = Φ((11.5 − 9)/√45) ≈ Φ(0.3727) ≈ 0.6453.

(iv) There might be certain periods of time when more planes arrive and other periods when
fewer arrive. So the rate at which people join the queue will probably not be constant.
A700 (9740 N2010/II/5)(i) Say we wish to stratify the spectators by age group. One
problem is that we may not know what proportion of the spectators belongs to each age
group. As such, it may be difficult to get a representative sample.
(ii) Order the spectators by their names, alphabetically. Choose every 100th spectator on
the list to survey.
A701 (9740 N2010/II/6)(i)
t̄ = ∑t/n = 454.3/11 = 41.3,

s² = [∑t² − (∑t)²/n]/(n − 1) = (18779.43 − 454.3²/11)/10 = 1.684.
(ii) The null hypothesis is H0 ∶ µ0 = 42.0 and the alternative hypothesis is HA ∶ µ0 ≠ 42.0.

T = (t̄ − µ0)/(s/√n) = (41.3 − 42.0)/√(1.684/11) ≈ −1.789.

Since ∣T∣ < t10,0.05 = 1.812, we are unable to reject the null hypothesis.
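The whole of A701 can be reproduced from the two summary statistics ∑t = 454.3 and ∑t² = 18779.43. A minimal sketch, with variable names of my own and the critical value 1.812 taken from the working above rather than computed:

```python
from math import sqrt

n, sum_t, sum_t2, mu0 = 11, 454.3, 18779.43, 42.0

mean = sum_t / n
var = (sum_t2 - sum_t**2 / n) / (n - 1)    # unbiased sample variance s^2
t_stat = (mean - mu0) / sqrt(var / n)

reject = abs(t_stat) > 1.812               # two-tailed comparison with t_{10, 0.05}
print(mean, var, t_stat, reject)
```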
A702 (9740 N2010/II/7)(i) P(A ∩ B ′ ) = P(A∣B ′ )P(B ′ ) = 0.8 ⋅ 0.4 = 0.32.
(ii) P(A ∪ B) = P(B) + P(A ∩ B ′ ) = 0.92.
(iii) P(B ′ ∣A) = P(B ′ ∩ A) ÷ P(A) = 0.32 ÷ 0.7 = 16/35 ≈ 0.457142857.
(iv) P(A′ ∩ C) = P(A′ )P(C) = 0.3 ⋅ 0.5 = 0.15.
(v) P(A′ ∩ B ∩ C) ≤ 0.15.
A703 (9740 N2010/II/8)(i) The probability that the number is greater than 30000 is
the probability that the first digit is 3, 4, or 5. Answer: 3/5 = 0.6.
(ii) The first three digits are odd and there are 3! ways to arrange them. The last two are
even and there are 2! ways to arrange them. The total number of ways to arrange the five
digits is 5!. Answer: 3!2!/5! = 1/10 = 0.1.
(iii) If the first digit is 3, the last digit must be 1 or 5, and in each case, there are 3! ways
to arrange the middle 3 digits.
Similarly, if the first digit is 5, the last digit must be 1 or 3, and in each case, there are 3!
ways to arrange the middle 3 digits.
If the first digit is 4, the last digit can be 1, 3, or 5, and in each case, there are 3! ways to
arrange the middle 3 digits.
Altogether then, there are 7 ⋅ 3! ways to get such a number and the desired probability is
7 ⋅ 3!/5! = 7/20 = 0.35.
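With only 5! = 120 arrangements, A703 can be checked by listing them all. The sketch below assumes (from the working above, not from the question itself) that the number is a random arrangement of the digits 1 to 5, and that the three events are "greater than 30000", "first three digits odd, last two even", and "greater than 30000 and odd". The helper `prob` is hypothetical.

```python
from fractions import Fraction
from itertools import permutations

numbers = list(permutations([1, 2, 3, 4, 5]))   # all 5! equally likely arrangements

def prob(event):
    """Probability of an event, given as a predicate on the tuple of digits."""
    return Fraction(sum(1 for d in numbers if event(d)), len(numbers))

p_i = prob(lambda d: d[0] >= 3)                          # first digit is 3, 4, or 5
p_ii = prob(lambda d: all(x % 2 == 1 for x in d[:3]))    # odd, odd, odd, even, even
p_iii = prob(lambda d: d[0] >= 3 and d[-1] % 2 == 1)     # greater than 30000 and odd

print(p_i, p_ii, p_iii)
```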
A704 (9740 N2010/II/9)(i) Our desired probability is P(Y > 2X) = P(Y − 2X > 0).
Now, Y − 2X ∼ N(400 − 2 ⋅ 180, 60² + 2² ⋅ 30²) = N(40, 7200). So

P(Y − 2X > 0) = 1 − Φ((0 − 40)/√7200) ≈ Φ(0.4714) ≈ 0.6813.

(ii) Our desired probability is P(0.12X + 0.05Y > 45). Now:

0.12X + 0.05Y ∼ N(0.12 ⋅ 180 + 0.05 ⋅ 400, 0.12² ⋅ 30² + 0.05² ⋅ 60²) = N(41.6, 21.96)
⟹ P(0.12X + 0.05Y > 45) = 1 − Φ((45 − 41.6)/√21.96) ≈ 1 − Φ(0.7255) ≈ 1 − 0.7658 = 0.2342.

(iii) Our desired probability is P (0.12X1 + 0.12X2 > 45). Now:

0.12X1 + 0.12X2 ∼ N(0.12 ⋅ 180 + 0.12 ⋅ 180, 0.12² ⋅ 30² + 0.12² ⋅ 30²) = N(43.2, 25.92)
⟹ P(0.12X1 + 0.12X2 > 45) = 1 − Φ((45 − 43.2)/√25.92) ≈ 1 − Φ(0.3536) ≈ 1 − 0.6381 = 0.3619.
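All three parts of A704 use the same fact: a linear combination of independent normals is normal, with mean ∑cᵢµᵢ and variance ∑cᵢ²σᵢ². Here is a small sketch of that rule using Python's standard `statistics.NormalDist`; the helper `combo` and the tuples `X`, `Y` are my own names, and the parameters are read off the working above.

```python
from math import sqrt
from statistics import NormalDist

def combo(terms):
    """Distribution of sum(c * V) over independent normals V ~ N(mean, sd^2).

    Each term is a tuple (c, mean, sd).
    """
    mean = sum(c * m for c, m, _ in terms)
    var = sum(c**2 * sd**2 for c, _, sd in terms)
    return NormalDist(mean, sqrt(var))

X, Y = (180, 30), (400, 60)   # (mean, sd) of X and of Y

p_i = 1 - combo([(1, *Y), (-2, *X)]).cdf(0)            # P(Y - 2X > 0)
p_ii = 1 - combo([(0.12, *X), (0.05, *Y)]).cdf(45)     # P(0.12X + 0.05Y > 45)
p_iii = 1 - combo([(0.12, *X), (0.12, *X)]).cdf(45)    # two independent copies of X

print(p_i, p_ii, p_iii)
```

Note that (iii) passes X twice as two separate terms, which is what makes the variance 2 ⋅ 0.12² ⋅ 30² rather than (2 ⋅ 0.12)² ⋅ 30².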

A705 (9740 N2010/II/10)(i)


(ii)(a) PMCC ≈ 0.986024.

(ii)(b) PMCC ≈ 0.990681.
(iii) We are, as usual, supposed to say that the larger the magnitude of the PMCC, the
better the model. So F = c + dv² is the better model.
(iv) In general, the estimated regression equation is y−ȳ = b(x−x̄), where b = ∑ x̂i ∑ ŷi / ∑ x̂2i .
So in this case, the estimated regression equation is

F − 14.25 ≈ 0.0242420 (v² − 456)

⇐⇒ F ≈ 0.0242420v² + 3.195652.

And F = 26.0 ⇐⇒ v ≈ √[(26.0 − 3.195652)/0.0242420] ≈ 30.7.
To predict a value of v given a value of F , it would be more appropriate to use a regression
where v (or a function of v) is the independent variable and F (or a function of F ) is the
dependent variable.
A706 (9740 N2010/II/11)(i) Let X be the number of calls received in a randomly
chosen period of 4 minutes. Then X ∼ Po(12) and

P(X = 8) = e^{−12} 12⁸/8! ≈ 0.0655233.
(ii) Let Y be the number of calls received in a randomly chosen period of t seconds. Then
Y ∼ Po(3t/60) = Po(0.05t) and P(Y = 0) = e^{−0.05t} = 0.2. So t = (ln 0.2)/(−0.05) ≈ 32.
(iii) Let Z be the number of calls received in a randomly chosen period of 12 hours.
Then Z ∼ Po(2160) and a suitable approximation therefor is the normal distribution A ∼
N (2160, 2160). Hence, using also the continuity correction,

P(Z > 2200) ≈ P(A ≥ 2200.5) = 1 − Φ((2200.5 − 2160)/√2160) ≈ 1 − Φ(0.8714) ≈ 1 − 0.8082 = 0.1918.

(iv) C(6, 2) 0.1918² 0.8082⁴ ≈ 0.2354.

(v) Let B be the number of busy days out of 30. Since np ≈ 5.754 > 5 and n(1 − p) > 5, a
suitable approximation to B is the normal distribution C ∼ N (5.754, 4.650). So using also
the continuity correction,

P(B ≤ 10) ≈ P(C ≤ 10.5) = Φ((10.5 − 5.754)/√4.650) ≈ Φ(2.201) ≈ 0.9861.

(Without using any approximation, P(B ≤ 10) ≈ 0.980906.)
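The comparison between the normal approximation and the exact binomial value can be reproduced as follows. This is a sketch only, assuming B ∼ B(30, 0.1918) with p rounded to four decimal places as in the working above.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 30, 0.1918   # p is the rounded probability of a busy day from (iii)

# Exact binomial: P(B <= 10)
exact = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(11))

# Normal approximation with continuity correction: P(C <= 10.5)
approx = NormalDist(n * p, sqrt(n * p * (1 - p))).cdf(10.5)

print(exact, approx)
```

The two values differ in the third decimal place, which is a reminder that np ≈ 5.754 only barely satisfies the usual rule of thumb.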

A707 (9740 N2009/II/5). Simply survey people standing outside the theatre waiting
for the movie to start. Stop once the quota of 100 persons is met.
A disadvantage is that this may not be a representative sample. For example, there will be
no late-comers in our sample of 100.
A708 (9740 N2009/II/6)(i)

(ii) No. A linear model would imply that several centuries hence, the time taken to run a
mile would be negative, which is clearly impossible.
The scatter diagram similarly suggests that the rate of improvement is tapering off, rather
than linear.
(iii) A quadratic model would imply that the world record time taken to run a mile eventu-
ally bottoms out, then starts increasing. But by definition, it is impossible that the world
record time increases.
(iv) In general, the estimated regression equation is y−ȳ = b(x−x̄), where b = ∑ x̂i ∑ ŷi / ∑ x̂2i .
So in this case, the estimated regression equation is

ln t − 3.161647 ≈ −0.0161280(x − 1965)

⇐⇒ ln t ≈ −0.0161280x + 34.853071.

t(2010) ≈ e^{−0.0161280(2010) + 34.853071} ≈ 11.4. So the predicted world record time on 1st January
2010 is 3 m 41.4 s.
Our range of data is 1930-2000. We are extrapolating our data, which might not always
work out reliably.
A709 (9740 N2009/II/7)(i) Let E and F be the events that “a randomly chosen component
is faulty” and “a randomly chosen component was supplied by A”. Then:

P(E) = 0.01p ⋅ 0.05 + (1 − 0.01p) ⋅ 0.03 = 0.03 + 0.02 ⋅ 0.01p, which equals 0.035 when p = 25.

(ii) f(p) = P(F∣E) = P(F ∩ E)/P(E) = (0.01p ⋅ 0.05)/(0.03 + 0.02 ⋅ 0.01p) = 0.05p/(3 + 0.02p) = 2.5 − 7.5/(3 + 0.02p).

f′(p) = 7.5(3 + 0.02p)⁻² (0.02) > 0. This shows that the probability that a randomly chosen
component that is faulty was supplied by A is increasing in the percentage of electronic
components bought from A. Which is not very surprising.
A710 (9740 N2009/II/8)(i) We have 8 letters total, 3 of which are repeated. Hence,
there are 8!/3! = 6720 possible permutations.
(ii) Let TD or DT be a single letter. Then we have 7 “letters” total, 3 of which are
repeated, so there are 2! × 7!/3! possible permutations that we do not want. So there are
6720 − 2! × 7!/3! = 5040 possible permutations that we do want.
(iii) The 4 consonants by themselves have 4! possible permutations. The 4 vowels by
themselves have 4! ÷ 3! = 4 possible permutations. The first letter can either be a consonant
or a vowel. Hence, there are in total 2 × 4! × 4 = 192 possible permutations.
(iv) There are only four broad possibilities: E _ _ E _ _ E _, E _ _ E _ _ _ E, E _ _ _ E _ _ E,
and _ E _ _ E _ _ E. Each of these has 5! possible permutations. Hence,
there are in total 4 × 5! = 480 possible permutations.
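All four counts in A710 can be verified by enumeration. The sketch below assumes the word is ELEVATED; that is my assumption (the working only says 8 letters, three of them E, four consonants including T and D), but any word of that shape gives the same counts. The three predicate names are hypothetical.

```python
from itertools import permutations

words = set(permutations("ELEVATED"))   # distinct arrangements only (assumed word)
vowels = set("AE")

def no_TD_adjacent(w):
    """(ii) T and D are not next to each other."""
    return abs(w.index("T") - w.index("D")) != 1

def alternates(w):
    """(iii) consonants and vowels alternate."""
    return all((a in vowels) != (b in vowels) for a, b in zip(w, w[1:]))

def es_spread_out(w):
    """(iv) at least two other letters between any two E's."""
    pos = [i for i, c in enumerate(w) if c == "E"]
    return all(b - a >= 3 for a, b in zip(pos, pos[1:]))

counts = (len(words),
          sum(map(no_TD_adjacent, words)),
          sum(map(alternates, words)),
          sum(map(es_spread_out, words)))
print(counts)
```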
A711 (9740 N2009/II/9)(i) M̄ ∼ N(2.5, 0.1²/n). So P(M̄ > 2.53) = 1 − Φ((2.53 − 2.5)/√(0.1²/n)) =
1 − Φ(0.3√n) = 0.0668 ⇐⇒ Φ(0.3√n) = 0.9332 ⇐⇒ 0.3√n = 1.5 ⇐⇒ n = 25.
(ii) Assuming the thicknesses of the textbooks are independently distributed,

X = M1 + ⋯ + M21 + S1 + ⋯ + S24 ∼ N(21 ⋅ 2.5 + 24 ⋅ 2.0, 21 ⋅ 0.1² + 24 ⋅ 0.08²) = N(100.5, 0.3636).

Now, P(X ≤ 100) = Φ((100 − 100.5)/√0.3636) ≈ 1 − Φ(0.8292) ≈ 1 − 0.7964 = 0.2036.
(iii) Again assuming the thicknesses of the textbooks are independently distributed, our
desired probability is P(S1 + S2 + S3 + S4 < 3M) = P(S1 + S2 + S3 + S4 − 3M < 0). Now, S1 +
S2 + S3 + S4 − 3M ∼ N(4 ⋅ 2.0 − 3 ⋅ 2.5, 4 ⋅ 0.08² + 3² ⋅ 0.1²) = N(0.5, 0.1156). Hence,

P(S1 + S2 + S3 + S4 − 3M < 0) = Φ((0 − 0.5)/√0.1156) ≈ 1 − Φ(1.4706) ≈ 1 − 0.9293 = 0.0707.

(iv) The thicknesses of the textbooks are independently distributed.

A712 (9740 N2009/II/10)(i)
x̄ = ∑x/n = 86.4/9 = 9.6,

s² = [∑x² − (∑x)²/n]/(n − 1) = (835.92 − 86.4²/9)/8 ≈ 0.81.

(ii) A necessary assumption is that X is normally distributed. The null hypothesis is
H0 ∶ µ0 = 10 and the alternative hypothesis is HA ∶ µ0 ≠ 10.
t = (x̄ − µ0)/(s/√n) = (9.6 − 10)/√(0.81/9) = −4/3.

Since ∣t∣ < t8,0.025 = 2.306, we are unable to reject the null hypothesis.
The sample size is small. And so we are unable to appeal to the CLT and claim that a
normal distribution is a suitable approximate distribution for x̄.
(Author’s remark: It actually makes no sense to say that “the CLT does not apply in this
context”. The CLT certainly applies. It is merely that the normal distribution is a poor
approximation for the sample mean.)
(iii) We’d use the Z-test instead.
A713 (9740 N2009/II/11)(i) The probability that any observed car is red is independent
of whether any other observed car is red.
Each car is either strictly red or strictly not red.

(ii) P(4 ≤ R < 8)

= C(20, 4) 0.15⁴ 0.85¹⁶ + C(20, 5) 0.15⁵ 0.85¹⁵ + C(20, 6) 0.15⁶ 0.85¹⁴ + C(20, 7) 0.15⁷ 0.85¹³

≈ 0.346354.

(iii) Since np and n(1−p) are large, a suitable approximation to R is the normal distribution
X ∼ N (72, 50.4). Hence, using also the continuity correction,

P(R < 60) ≈ P(X < 59.5) = Φ((59.5 − 72)/√50.4) ≈ 1 − Φ(1.761) ≈ 1 − 0.9609 = 0.0391.

(iv) Since n is large and p is small, a suitable approximation to R is the Poisson distribution
Y ∼ Po(4.8). Hence,

P(R = 3) ≈ P(Y = 3) = e^{−4.8} 4.8³/3! ≈ 0.152.

(v) P(R = 0) + P(R = 1) = C(20, 0) p⁰ (1 − p)²⁰ + C(20, 1) p¹ (1 − p)¹⁹
= (1 − p)¹⁹ (1 − p + 20p) = 0.2.

By calculator, p ≈ 0.142432.
A714 (9740 N2008/II/5)(i) Take any ordered list of the 950 pupils. From the list, pick
every 19th student.
(ii) We might want each level to be equally well-represented. For example, we might like
approximately one-sixth of the sample to be from Primary 1, another sixth from Primary
2, etc.
In which case we’d probably prefer to do a stratified sample. The method might be some-
thing like this: Pick from the aforementioned ordered list the first 108 Primary 1 students,
the first 108 Primary 2 students, etc.
A715 (9740 N2008/II/6). Let the mass of calcium in a bottle (after the extreme
weather) be X ∼ N(µ0, σ²). (We have made the necessary assumption that X is normally
distributed.)
The null hypothesis is H0 ∶ µ0 = 78 and the alternative hypothesis is HA ∶ µ0 ≠ 78. Now,

t = (x̄ − µ0)/(s/√n) = (∑x/n − 78)/(√{[∑x² − (∑x)²/n]/(n − 1)}/√n) ≈ −1.207.

Since ∣t∣ < t14,0.025 ≈ 2.145, we are unable to reject the null hypothesis.
A716 (9740 N2008/II/7)(i) Let A1 denote the event that A wins the first set. Similarly
define A2 , A3 , B1 , B2 , and B3 . P (A2 ) = P (A1 ∩ A2 ) + P (B1 ∩ A2 ) = 0.6 ⋅ 0.7 + 0.4 ⋅ 0.2 = 0.5.
(ii) P (A wins) = P (A1 ∩ A2 ) + P (A1 ∩ B2 ∩ A3 ) + P (B1 ∩ A2 ∩ A3 ) = 0.42 + 0.6 ⋅ 0.3 ⋅ 0.2 +
0.4 ⋅ 0.2 ⋅ 0.7 = 0.42 + 0.036 + 0.056 = 0.512.
(iii) P (B1 ∩ A2 ∩ A3 ) /P (A wins) = 0.056/0.512 = 0.109375.
A717 (9740 N2008/II/8)(i) PMCC ≈ 0.9695281468. This large PMCC merely suggests
that there is a strong (positive) linear relationship between x and t. However, the true
relationship between x and t could be something other than linear.

(iii) Without P , it appears that t is increasing, but at a decreasing rate. So a log model
might be appropriate.
(iv) In general, the estimated regression equation is y − ȳ = b(x − x̄), where

b = ∑ x̂i ∑ ŷi / ∑ x̂2i .

So in this case, the estimated regression equation is

t − 6.45 ≈ 4.396563 (ln x − 1.143002)

⇐⇒ t ≈ 4.396563 ln x + 1.424722.

So for the model t = a + b ln x, the least square estimates are a ≈ 1.4 and b ≈ 4.4.
(v) t(x = 4.8) ≈ 4.4 ln(4.8) + 1.4 ≈ 8.3.
(vi) This would be an extrapolation of the data, which may or may not be wise.
A718 (9740 N2008/II/9)(i) X ∼ Po(1.8).
P(X ≥ 4) = 1 − P(X ≤ 3) = 1 − e^{−1.8} (1.8⁰/0! + 1.8¹/1! + 1.8²/2! + 1.8³/3!) ≈ 0.108708.
(ii) Let Y be the total number of pianos sold in a given week. Then Y ∼ Po(4.4). P(Y =
4) = e^{−4.4} 4.4⁴/4! ≈ 0.191736.
(iii) Let Z be the number of grand pianos sold in 50 weeks. Then Z ∼ Po(90). Since λZ
is large, a suitable approximation is the normal distribution A ∼ N (90, 90). Hence, using
also the continuity correction,

P(Z < 80) ≈ P(A < 79.5) = Φ((79.5 − 90)/√90) ≈ 1 − Φ(1.1068) ≈ 1 − 0.8657 = 0.1343.

(iv) An organisation might buy a relatively-large number of grand pianos on any given day.
So it is not likely that the rate at which grand pianos are sold is constant throughout the
A719 (9740 N2008/II/10)(i) C(3, 2) C(4, 3) C(5, 3) = 3 ⋅ 4 ⋅ 10 = 120.
(ii) = 9.
(iii) C(5, 4) C(7, 4) + C(5, 5) C(7, 3) = 5 ⋅ 35 + 1 × 35 = 210.
(iv) The number of ways to have
• No diplomats from K (i.e. only diplomats from L and M) is C(9, 8);
• No diplomats from L is C(8, 8);
• No diplomats from M is 0.
The total number of ways to choose the diplomats is C(12, 8). Hence the number of ways to
have at least 1 diplomat from each island is

C(12, 8) − [C(9, 8) + C(8, 8)] = 495 − (9 + 1) = 485.
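The complement count in (iv) can be double-checked by listing all C(12, 8) = 495 selections and keeping those that include every island. A minimal sketch, assuming (as the binomial coefficients above suggest) 3 diplomats from K, 4 from L, and 5 from M:

```python
from itertools import combinations

diplomats = ["K"] * 3 + ["L"] * 4 + ["M"] * 5   # island of each of the 12 diplomats

# Every way to choose 8 of the 12 diplomats (by index, so individuals are distinct)
groups = list(combinations(range(len(diplomats)), 8))

# Keep the selections in which all three islands are represented
valid = sum(1 for g in groups if {diplomats[i] for i in g} == {"K", "L", "M"})

print(len(groups), valid)
```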

A720 (9740 N2008/II/11)(i) X1 + X2 ∼ N(100, 2 ⋅ 8²). So:

P(X1 + X2 > 120) = 1 − Φ((120 − 100)/√(2 ⋅ 8²)) ≈ 1 − Φ(1.768) ≈ 1 − 0.9615 = 0.0385.

(ii) X1 − X2 ∼ N(0, 2 ⋅ 8²). So:

P(X1 > X2 + 15) = P(X1 − X2 > 15) = 1 − Φ((15 − 0)/√(2 ⋅ 8²)) ≈ 1 − Φ(1.3258) ≈ 1 − 0.9075 = 0.0925.
(iii) P(Y < 74) = Φ((74 − µ)/σ) = 0.0668 ⇐⇒ (74 − µ)/σ = −1.5.
P(Y > 146) = 1 − Φ((146 − µ)/σ) = 0.0668 ⇐⇒ Φ((146 − µ)/σ) = 0.9332 ⇐⇒ (146 − µ)/σ = 1.5.
Subtracting, (146 − µ)/σ − (74 − µ)/σ = 72/σ = 1.5 − (−1.5) = 3 ⇐⇒ σ = 24, and so µ = 110.
Since σ = 8a and µ = 50a + b, a = 3 and b = −40.
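The same pair of simultaneous equations can be solved numerically, using the inverse normal c.d.f. in place of the tables. A sketch using the standard library's `statistics.NormalDist`; the variable names are mine, and the relations σ = 8a and µ = 50a + b are taken from the working above.

```python
from statistics import NormalDist

z_lo = NormalDist().inv_cdf(0.0668)        # about -1.5
z_hi = NormalDist().inv_cdf(1 - 0.0668)    # about  1.5

# (146 - mu)/sigma = z_hi and (74 - mu)/sigma = z_lo; subtract to eliminate mu
sigma = (146 - 74) / (z_hi - z_lo)
mu = 74 - z_lo * sigma

a = sigma / 8        # from sigma = 8a
b = mu - 50 * a      # from mu = 50a + b

print(sigma, mu, a, b)
```

The results come out very slightly off the round numbers because 0.0668 is itself a rounded table value.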
A721 (9233 N2008/I/1). There are 3! ways to arrange the 3 groups of books. And within each
group of books, we can permute them as usual. So there are 3!6!5!4! = 12 441 600 ways.
A722 (9233 N2008/II/23). By independence, pA∩B = pA pB . Also pA∪B = pA +pB −pA∩B =
pA + pB − pA pB . Plugging in the given numbers, we have 0.4 = 0.2 + pB − 0.2pB , so pB = 0.25.
pB pC = 0.25 ⋅ 0.4 = 0.1 = pB∩C , so that by definition, B and C are indeed independent.
A723 (9233 N2008/II/26)(i) Let X ∼ Po(3). P(X > 2) = 1 − P(X ≤ 2) = 1 −
e^{−3} (1 + 3 + 9/2) = 1 − 8.5e^{−3} ≈ 1 − 0.423 = 0.577.
(ii) Let Y be the number of times the machine will break down in a period of four weeks.
Then Y ∼ Po(12).

P(Y ≤ 3) = e^{−12} (1 + 12 + 12²/2 + 12³/6) ≈ 0.00229.

(iii) Let Z be the number of times the machine will break down in a period of 16 weeks.
Then Z ∼ Po(48). Since λZ is large, a suitable approximation for Z is the normal distribu-
tion A ∼ N (48, 48). Hence, using also the continuity correction,

P(Z > 50) ≈ P(A > 50.5) = 1 − Φ((50.5 − 48)/√48) ≈ 1 − Φ(0.3608) ≈ 1 − 0.6409 = 0.3591.

A724 (9233 N2008/II/27)(i) Let the mass after the adjustment be X ∼ N (µ0 , σ 2 ). It
is necessary to assume that these masses remain normally distributed. The null hypothesis
is H0 ∶ µ0 = 32.40 and the alternative hypothesis is HA ∶ µ0 ≠ 32.40. Now,
t = (x̄ − µ0)/(s/√n) = (32.00 − 32.40)/√(2.892/80) ≈ −2.104.

Since ∣t∣ > t79,0.025 ≈ 1.99, we can reject the null hypothesis.
(ii) This means that if H0 were true and we tested infinitely many size-80 samples (as done
above), we’d reject H0 in 5% of the samples.
(iii) The one-tailed p-value is ≈ 0.0193. So the least level of significance is 1.93%.
A725 (9233 N2008/II/29)(i) Let X ∼ N(50, 4²). The probability that Mr Sim is late
on any given day is

P(X > 55) = 1 − Φ((55 − 50)/4) = 1 − Φ(1.25) ≈ 1 − 0.8944 = 0.1056.
Assuming that the probability that he’s late each day is independent of whether he was
late on any other day, the probability that he will be late no more than once in 5 days is

C(5, 0) 0.1056⁰ 0.8944⁵ + C(5, 1) 0.1056¹ 0.8944⁴ ≈ 0.910.

(ii) Let Y ∼ N(40, 5²). Our desired probability is P(X − Y − 5 < 0). Assuming the journey
times of Messrs Sim and Lee are independent, X − Y − 5 ∼ N(5, 4² + 5²). Thus,

P(X − Y − 5 < 0) = Φ((0 − 5)/√(4² + 5²)) ≈ 1 − Φ(0.7809) ≈ 1 − 0.7826 = 0.2174.

(iii) Assume that the journey times of Messrs Sim and Lee each day are independent. Then
the desired probability is

C(5, 3) 0.2174³ 0.7826² + C(5, 4) 0.2174⁴ 0.7826¹ + C(5, 5) 0.2174⁵ 0.7826⁰ ≈ 0.0722.

A726 (9233 N2008/II/30)(i) Let M ∼ N(µ, σ²). P(M < 86.50) = Φ((86.50 − µ)/σ) = 0.12
⇐⇒ (86.50 − µ)/σ = −1.175. (1)
P(M > 92.25) = 1 − Φ((92.25 − µ)/σ) = 0.2 ⇐⇒ Φ((92.25 − µ)/σ) = 0.8 ⇐⇒ (92.25 − µ)/σ = 0.842. (2)
(2) minus (1) yields 5.75/σ = 2.017 ⇐⇒ σ ≈ 2.85. And now µ ≈ 89.85.
(ii) Let X ∼ N(µ, σ²). P(µ − 2 ≤ X ≤ µ + 2) = 0.8 ⟹ P(X ≤ µ + 2) = 0.9 ⇐⇒ Φ(2/σ) =
0.9 ⇐⇒ 2/σ ≈ 1.281 ⇐⇒ σ ≈ 1.56.
(iii) Let X̄ ∼ N(µ, σ²/n). Then P(X̄ ≥ µ + 0.50) ≤ 0.1 ⇐⇒ 1 − Φ(0.50/(σ/√n)) ≤ 0.1 ⇐⇒
0.9 ≤ Φ(0.50√n/σ) ⇐⇒ 0.50√n/σ ≥ 1.281 ⇐⇒ 0.50√n/σ ≥ 2/σ (since 2/σ ≈ 1.281 by (ii))
⇐⇒ 0.50√n ≥ 2 ⇐⇒ n ≥ 16.
A727 (9740 N2007/II/5)(i) Consider a survey of whether students like a particular
teacher. A quota of 10 students is to be chosen. Take a list of the teacher’s students, sort
their names alphabetically, and pick the first 10 students on the list.
One disadvantage is that this sample of 10 students might not be representative. For
example, they might all be siblings from the same family of Angs.

(ii) Yes. If say the teacher teaches 10 different classes, we could stratify our sample by
class and pick 1 student from each class.
A728 (9740 N2007/II/6).

C(10, 0) 0.24⁰ 0.76¹⁰ + C(10, 1) 0.24¹ 0.76⁹ + ⋯ + C(10, 4) 0.24⁴ 0.76⁶ ≈ 0.933.

(i) Let X ∼ B(1000, 0.24) be the number of people in a sample of 1000 that have gene A.
Since np = 240 > 5 and n(1 − p) = 760 > 5 are both large, a suitable approximation for X is
the normal distribution Y ∼ N (240, 182.4). Hence, using also the continuity correction,

P(230 ≤ X ≤ 260) ≈ P(229.5 ≤ Y ≤ 260.5) = Φ((260.5 − 240)/√182.4) − Φ((229.5 − 240)/√182.4)
≈ Φ(1.5179) − Φ(−0.7775) ≈ 0.9355 − 0.2180 ≈ 0.7175.

(ii) Let Z ∼ B(1000, 0.003) be the number of people in a sample of 1000 that have gene B.
Since n is large and p is small, a suitable approximation for Z is the Poisson distribution
A ∼ Po(3). Hence,

P(2 ≤ Z < 5) ≈ P(2 ≤ A < 5) = P(A = 2) + P(A = 3) + P(A = 4)

= e^{−3} (3²/2 + 3³/6 + 3⁴/24) ≈ 0.616.

A729 (9740 N2007/II/7)(i)

x̄ = ∑x/n = 4626/150 = 30.84 and s² = [∑x² − (∑x)²/n]/(n − 1) ≈ 33.7259.
(ii) Let H0 ∶ µ0 = 30 and HA ∶ µ0 > 30 be the null and alternative hypotheses. Now,

Z = (x̄ − µ0)/(s/√n) ≈ (30.84 − 30)/√(33.7259/150) ≈ 1.772.

Since Z > Z0.05 = 1.645, we can reject the null hypothesis.

(iii) We used the Z-test. The sample size is large, so the normal distribution is a good
approximation provided the underlying distribution is “nice enough”.
A730 (9740 N2007/II/8)(i) Let C be the weight of a randomly chosen chicken. Then
C ∼ N(2.2, 0.5²). Then 3C ∼ N(3 ⋅ 2.2, 3² ⋅ 0.5²) = N(6.6, 1.5²) and:

P(3C > 7) = 1 − Φ((7 − 6.6)/1.5) = 1 − Φ(4/15) ≈ 0.3949.

(ii) Let T be the weight of a randomly chosen turkey. Then T ∼ N(10.5, 2.1²). Then
5T ∼ N(5 ⋅ 10.5, 5² ⋅ 2.1²) = N(52.5, 10.5²) and:

P(5T > 55) = 1 − Φ((55 − 52.5)/10.5) = 1 − Φ(5/21) ≈ 0.405904.
Thus, P(3C > 7) ⋅ P(5T > 55) ≈ 0.160.

(iii) 3C + 5T ∼ N(6.6 + 52.5, 1.5² + 10.5²) = N(59.1, 112.5). So:

P(3C + 5T > 62) = 1 − Φ((62 − 59.1)/√112.5) ≈ 1 − Φ(0.2734) ≈ 0.392.

(iv) The event “chicken costs more than $7 and turkey costs more than $55” is a
proper subset of the event “chicken and turkey together cost more than $62”. By the monotonicity of
probability, the probability of the latter is greater than that of the former.
A731 (9740 N2007/II/9)(i)(a) 12! (b) 6! ⋅ 26 .
(ii)(a) 11!
(ii)(b) Fix any man. Then we must have to his right: Woman, man, woman, man, etc. So
(ii)(c) Fix any man A. Then we must have:
• To his right: “Wife A, some other man, that some other man’s wife, etc.”; OR
• To his left: “Wife A, some other man, that some other man’s wife, etc.”.
In the first scenario, we have 5! possible arrangements. Likewise in the second. Altogether
2 ⋅ 5! possible arrangements.
A732 (9740 N2007/II/10).

Figure to be
inserted here.

(i) P(1, 1, 1) = 1/8 ⋅ 1/4 ⋅ 1/2 = 1/64.
(ii) P(1, 1) + P(1, 0, 1) + P(0, 1, 1) = 1/8 ⋅ 1/4 + 1/8 ⋅ 3/4 ⋅ 1/4 + 7/8 ⋅ 1/8 ⋅ 1/4 = (8 + 2 ⋅ 3 + 7)/256 = 21/256.
(iii) Let E and F be the events that “the third throw is successful” and “exactly two of
the three throws are successful”.
P(E ∩ F) = P(1, 0, 1) + P(0, 1, 1) = 1/8 ⋅ 3/4 ⋅ 1/4 + 7/8 ⋅ 1/8 ⋅ 1/4 = 13/256.
P(F) = P(E ∩ F) + P(E′ ∩ F) = 13/256 + P(1, 1, 0) = 17/256.
Thus, P(E∣F ) = P(E ∩ F ) ÷ P(F ) = 13/17.
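The tree diagram for A732 is missing here, but the products above are consistent with a simple rule: the first throw succeeds with probability 1/8, the chance doubles after each success, and it is unchanged after a failure. Taking that rule as an assumption (it is inferred from the numbers, not quoted from the question), the answers can be checked by enumerating all eight outcomes. The function `outcome_prob` is a hypothetical helper.

```python
from fractions import Fraction
from itertools import product

def outcome_prob(outcome):
    """Probability of a success (1) / failure (0) sequence under the assumed rule."""
    p, total = Fraction(1, 8), Fraction(1)
    for success in outcome:
        total *= p if success else 1 - p
        if success:
            p *= 2          # assumed: chance doubles after each success
    return total

outcomes = list(product([1, 0], repeat=3))

p_all = outcome_prob((1, 1, 1))                                               # (i)
p_two_plus = sum(outcome_prob(o) for o in outcomes if sum(o) >= 2)            # (ii)
p_F = sum(outcome_prob(o) for o in outcomes if sum(o) == 2)                   # exactly two
p_E_and_F = sum(outcome_prob(o) for o in outcomes if sum(o) == 2 and o[2])    # and third is a success

print(p_all, p_two_plus, p_E_and_F / p_F)
```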
A733 (9740 N2007/II/11)(i) In general, the estimated regression equation is y − ȳ =
b(x − x̄), where b = ∑ x̂i ∑ ŷi / ∑ x̂2i . So in this case, the estimated regression equation is:

x − 32 ≈ −0.260 (t − 131.667) or x ≈ −0.260t + 66.194.

(ii) x(t = 300) ≈ −0.259701 ⋅ (300) + 66.194030 ≈ −11.7.

From the scatter diagram, the linear model does not appear to be suitable. Moreover, the
linear model predicts that at t = 300, x < 0, which is impossible.
(iii) PMCC ≈ −0.993839. Its magnitude is larger than that of −0.912 and very close to 1. It
would appear that the regression of ln x on t is a more appropriate model.
(iv) In general, the estimated regression equation is y−ȳ = b(x−x̄), where b = ∑ x̂i ∑ ŷi / ∑ x̂2i .
So in this case, the estimated regression equation is

ln x − 2.995391 ≈ −0.0123434 (t − 131.666667)

⇐⇒ ln x ≈ −0.0123434t + 4.620609

(v) x = 15 ⟹ t ≈ (4.620609 − ln 15)/0.0123434 ≈ 155.

A734 (9233 N2007/I/4)(i) It cannot be that all three vertices are collinear. Thus, one
vertex must be chosen from the upper line segment and the other must be chosen from the
lower line segment. Hence, there are 3 × 6 = 18 possible triangles.
(ii) Consider triangles that do not have A as a vertex. Two vertices must be chosen from one
line segment and the third must be chosen from the other. So there are C(3, 2) ⋅ 7 + 4 ⋅ C(6, 2) =
21 + 60 = 81 possible triangles. Now, including also triangles with A as a vertex, we have
99 possible triangles.
A735 (9233 N2007/II/23)(i) X ∼ (30, 5²) ⟹ X̄100 ∼ (30, 5²/100) = (30, 0.25). Since
the sample size is sufficiently large, by the Central Limit Theorem, a suitable approximation
for X̄100 is the normal distribution Y ∼ N (30, 0.25). So

P (29.2 ≤ X̄100 ≤ 30.8) ≈ P (29.2 ≤ Y ≤ 30.8) ≈ 0.945201 − 0.054799 ≈ 0.890.

(ii) The distribution is “sufficiently nice” that with a sample size of 100, it is appropriate
to use the CLT.
A736 (9233 N2007/II/25)(i) P(W ∣B) = 20/52 = 5/13 ≈ 0.384615.
(ii) P(B∣W ) = 20/40 = 0.5.
(iii) P(B ∪ W) = (40 + 32)/100 = 72/100 = 0.72.
(iv) P(W)P(B) = 0.4 ⋅ 0.52 = 0.208 ≠ 0.2 = P(B ∩ W) and so W and B are not independent.
There are men who take chemistry (equivalently, P(M ∩ C) ≠ 0), so M and C are not
mutually exclusive.
A737 (9233 N2007/II/26)(i) Let X be the number of genuine call-outs in a randomly
chosen two-week period. Then X ∼ Po(4) and
P(X < 6) = e^{−4} (1 + 4 + 4²/2! + 4³/3! + 4⁴/4! + 4⁵/5!) ≈ 0.785130.
(ii) Let Y be the total number of call-outs in a randomly chosen six-week period. Then
Y ∼ Po(15) and since λY is large, a suitable approximation for Y is the normal distribution
Z ∼ N (15, 15). Hence, using also the continuity correction,

P(Y > 19) ≈ P(Z > 19.5) ≈ 1 − Φ((19.5 − 15)/√15) ≈ 0.123.

A738 (9233 N2007/II/27)(i) L + H ∼ N(5 + 3, 0.1² + 0.05²) = N(8, 0.0125). So

P(7.9 ≤ L + H ≤ 8.2) = Φ((8.2 − 8.0)/√0.0125) − Φ((7.9 − 8.0)/√0.0125) ≈ 0.963 − 0.185 ≈ 0.778.

(ii) 0.74L + 0.86H ∼ N(0.74 ⋅ 5 + 0.86 ⋅ 3, 0.74² ⋅ 0.1² + 0.86² ⋅ 0.05²) = N(6.28, 0.00728225).

So, P(6.1 ≤ 0.74L + 0.86H ≤ 6.2) = Φ((6.2 − 6.28)/√0.00728225) − Φ((6.1 − 6.28)/√0.00728225)

≈ 0.183 − 0.021 ≈ 0.162.

A742 (9233 N2006/II/26)(i) Let X be the number of severe floods in a randomly-chosen
100-year period. Then X ∼ Po(2). So

[P(X = 1)]² = (e^{−2} ⋅ 2)² = 4e^{−4} ≈ 0.0733.

(ii) Let Y be the number of severe floods in a randomly-chosen 1000-year period. Then
Y ∼ Po(20). Since λY is large, a suitable approximation for Y is the normal distribution
Z ∼ N (20, 20). Hence, using also the continuity correction,

P(Y > 25) ≈ P(Z > 25.5) = 1 − Φ((25.5 − 20)/√20) ≈ 0.109.

A739 (9233 N2006/I/4). We could have:

• All three identical — 1 possibility.
• Two identical — 5 possibilities.
• One identical — = 10 possibilities.
• None identical — = 10 possibilities.
So total 26 possibilities.
A740 (9233 N2006/II/23)(i) P(A) = 1/3.
The sum of two scores is 9 if the dice are (3, 6), (4, 5), (5, 4), or (6, 3). So P(B) = 4/36 = 1/9.
P(A ∩ B) = 2/36 = 1/18 ≠ P(A)P(B), so A and B are not independent.
(ii) P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = 1/3 + 1/9 − 1/18 = 7/18.
A741 (9233 N2006/II/25)(i) The null hypothesis is H0 ∶ µ = µ0 = 10000 and the alternative
hypothesis is HA ∶ µ < 10000.

Z = (x̄ − µ0)/(s/√n) = [∑(x − 10000)/n + 10000 − µ0] / (√({∑(x − 10000)² − [∑(x − 10000)]²/n}/(n − 1))/√n)

= (−2510/80 + 10000 − 10000)/(√({2010203 − (−2510)²/80}/79)/√80) ≈ −1.795.

Since ∣Z∣ > Z0.05 = 1.645, we can reject the null hypothesis.
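The test statistic above is built from the coded sums ∑(x − 10000) = −2510 and ∑(x − 10000)² = 2010203. The sketch below reproduces it; the variable names are mine, and the comparison with −1.645 reflects the one-tailed alternative HA ∶ µ < 10000.

```python
from math import sqrt

n, mu0 = 80, 10000
sum_d, sum_d2 = -2510, 2010203      # sums of (x - 10000) and (x - 10000)^2

mean = sum_d / n + 10000                        # undo the coding to get the sample mean
var = (sum_d2 - sum_d**2 / n) / (n - 1)         # coding by a constant leaves s^2 unchanged
z = (mean - mu0) / sqrt(var / n)

reject = z < -1.645                 # one-tailed test at the 5% level
print(mean, var, z, reject)
```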
(ii) If H0 is true and we conduct the above test on infinitely many size-80 samples, we’d
(falsely) reject H0 for 5% of the samples.
A743 (9233 N2006/II/28)(i) Let the speed of any car (in km h⁻¹) be X ∼ N(µ, σ²). We
are given that P(X > 125) = 1/80 and P(X < 40) = 1/10.

P(X > 125) = 1/80 ⇐⇒ 1 − Φ((125 − µ)/σ) = 1/80 ⇐⇒ Φ((125 − µ)/σ) = 79/80
⇐⇒ (125 − µ)/σ ≈ 2.240. (1)

P(X < 40) = 1/10 ⇐⇒ Φ((40 − µ)/σ) = 1/10 ⇐⇒ (40 − µ)/σ ≈ −1.282. (2)

(1) minus (2) yields 85/σ ≈ 3.522 ⇐⇒ σ ≈ 24.1 and µ ≈ 70.9.

(ii) C(10, 0) 0.1⁰ 0.9¹⁰ + C(10, 1) 0.1¹ 0.9⁹ + C(10, 2) 0.1² 0.9⁸ + C(10, 3) 0.1³ 0.9⁷ ≈ 0.987.

(iii) Let Y be the number of cars out of a random sample of 100 that are travelling at speed
less than 40 km h⁻¹. Then Y ∼ B(100, 0.1). Since np = 10 > 5 and n(1 − p) = 90 > 5 are
both large, a suitable approximation to Y is the normal distribution Z ∼ N(10, 9). Hence,
using also the continuity correction:

P(Y ≤ 8) ≈ P(Z ≤ 8.5) = Φ((8.5 − 10)/√9) = 1 − Φ(0.5) ≈ 1 − 0.6915 = 0.3085.


A
A Mathematician’s Lament, iv, xxxvii
antiderivative, 816
  constant of integration, 818
  uniqueness, 818
antidifferentiation, 816
  rules, 821
  techniques, 828
    partial fractions, 829
    trigonometric identities, 841
arccosine, 282
arcsine, 282
arctangent, 282

B
boundary points, 723

C
Chain Rule, 693
  Leibniz’s notation, 696
circular functions, 282
concavity, 734
Confucian East Asia, xxxvii
conjugate, 69
  pair, 69
constant function, 282
  is continuous, 669
constant of integration, 818
continuity
  and limits, 668
  at isolated points, 672
  formal definition, 660
  is implied by differentiability, 680
  of constant function, 669
  of cos, 670
  of elementary function, 671
  of identity function, 669
  of inverse function, 670
  of inverse trigonometric function, 671
  of ln, 670
  of polynomial function, 670
  of sin, 670
  of trigonometric function, 670, 671
coordinates, 77
Correct Procedure for Finding Extrema (CPFE), 728
cos
  is continuous, 670
cosecant, 282
cosine, 282
cotangent, 282

D
De Morgan’s Laws, 54
definite integral, 800
derivative, 673
  formal definition, 674
  is a function, 681
  motivation, 673
  notation, 682
    Lagrange, 682
    Leibniz, 682
    Newton, 682
differentiability
  as approximate linearity, 678
  elementary functions, 714
  implies continuity, 680
differentiation
  as an operator, 682
  Chain Rule
    proof, 693, 1338
  Inverse Function Theorem, 1339
  Parametric Differentiation Rule, 1339
  Product Rule, 690
    proof, 691
  Quotient Rule, 690
    mnemonic, 237, 690
    proof, 691
  Rules of, 685
Dirichlet function, 656, 665
discontinuities, 662
  Dirichlet function, 665
  everywhere-discontinuous functions, 665
  seemingly-discontinuous functions, 666
  types, 662
    essential or infinite, 664
    jump, 663
    removable, 662, 663
division, 4
  by zero, 6
  dividend, 4
  divisor, 4
  Euclidean algorithm, 4
  long division, 5
  quotient, 4
  remainder, 4

E
elementary function
  is continuous, 671
elementary functions, 282
  differentiability, 714
exponents, 63
  0⁰ = 1, 64
  Laws of, 67
  root, 65
extrema, 719
  maximum point, 719
    strict, 719
  minimum point, 719
    strict, 719

F
factorial n!, 62
Familee, 57
Fermat, 726
Fermat’s Last Theorem, 726
Fermat’s Theorem, 726
First Derivative Test (FDT), 729
First Derivative Test for Inflexion Points (FDTI), 742
Flawed Secondary School Procedure for Finding Extrema (FSSPFE), 725
Fundamental Theorem of Calculus
  First, 811

G
Google calculator, xxxi
graphing calculators, xxxiii
graphs, 77

I
identity function, 282
identity mapping, 282
  is continuous, 669
Increasing/Decreasing Test, 716
indefinite integral, 816
indefinite integration, 816
inflexion point, 738
  spelling, 738
inflexion points
  non-stationary, 741
  stationary, 741
Interior Extremum Theorem (IET), 725
interior points, 723
Internet Archive, xxxii
inverse function
  is continuous, 670
Inverse Function Theorem
  Leibniz’s notation, 696
inverse trigonometric function
  is continuous, 671
inverse trigonometric functions, 282
Isaac Newton, 6
isolated points
  continuity at, 672

L
Leibniz’s notation, 695
Library Genesis, xxxii
limits, 647
  and continuity, 668
  Dirichlet function, 656
  does not exist, 652
  informal definitions, 647
  Rules, 659
List MF26, xxxi
ln
  is continuous, 670
logarithms, 70
  Laws of, 70
logic, 8
  affirming the consequent, 19
  categorical propositions, 27
    negations, 27
  conjunction AND, 10
  contrapositive, 23
    equivalence of, 23
    uses of, 24
  converse, 17
    not equivalent, 17
  De Morgan’s Laws, 13
    negation of AND, 13
    negation of OR, 14
  disjunction OR, 10
  equivalence, 25
  equivalence ⇐⇒, 12
  fallacy of the converse, 19
  implication, 15, 26
    formal definition, 16
  negation NOT, 11
  negation of implication, 20
  statements, 9
  summary, 31
  Wason Four-Card Puzzle, 8, 22
LyX, xxv

M
mathematical vocabulary, 61
  coefficient, 61
  constant term, 61
  dummy or placeholder variable, 73
  equation, 61
  expression, 61
  inequality, 61
  left-hand-side (LHS), 61
  right-hand-side (RHS), 61
  term, 61
  variable, 61
Mathworld.Wolfram, xxxii
maximum point, 719
  strict, 719
minimum point, 719
  strict, 719

N
nice function, 151
Nicholas Saunderson, 6
notation
  , 43
  , 43
  0, 43
  ellipsis . . ., 37
  empty set ∅, 44
  in ∈, not in ∉, 35
  intersection ∩, 50
  intervals, 45
  minus-plus ∓, 69
  number of elements n(S), 37
  plus-minus ±, 69
  proper subset of ⊂, 48
  set complement A′, 53
  set minus ∖, 51
  set of integers Z, 40
  set of rational numbers Q, 41
  set of real numbers R, 39
  subset of ⊆, 47
  union ∪, 49
  universal set E, 52
numbers
  taxonomy, 42

O
O Level, xxxi
  assumed knowledge, xxxi
online calculators, xxxi
ordered pairs, 77
ordered pairs, equality of, 77

P
Paul Lockhart, iv, xxxvii
PISA, xxxvii
  2015 results, xliii
Plus Zero Trick, 691, 831, 832
polynomial function, 282
  is continuous, 670
polynomials, 73
  coefficient, 73
  cubic, 74
  equation, 73
  in two variables, 74
  linear, 73
  quadratic, 73
  quartic, 74
  quintic, 74
  sextic, 74
  term, 73
power function, 282
primitive, 816
ProofWiki, xxxii

Q
quotes
  Confucius, xxviii
  Disraeli, xxviii
  Einstein, xxix
  Feynman, xxix, 612
  Halmos, xxvii
  Hardy, iv, 2
  Hilbert, 646
  Lockhart, iv, xl, 2, 377, 409
  Mencken, xxxv
  Michael Polanyi, iv
  Peter Singer, xxix
  Poincaré, iv
  Russell, 646
  Socrates (Plato’s), xxviii
  Spivak, 137
  von Neumann, 76, 646
  Willy Wonka (Roald Dahl), 905
  Wozniak, xxxv

R
rationalising a surd, 69


S
Sci-Hub, xxxii
secant, 282
secant lines, 674
Second Derivative Test for Inflexion Points (SDTI), 743
sets, 32
  De Morgan’s Laws, 13, 54
  elements, 32, 33
  set-builder notation, 56
  summary, 60
sin
  is continuous, 670
sine, 282
Singapore Math, xxxvii
South Korea, xxxvii
Spider-Man, xxxiv
Stack Exchange, xxxii
stationary point, 719
Stephen Hawking, 6
suicides, xxxvii
surd, 69
Symbolab, xxxi

T
tangent, 282
Tangent Line Test (TLT), 739
TI84, xxxiii
Times One Trick, 693
tips, xxvii, xxxi, xxxiv
trigonometric function
  is continuous, 670, 671
trigonometric functions, 282
turning point, 719

W
Wolfram Alpha, xxxi

Z
zero function, 282


Abbreviations Used in This Textbook
We give the number of the page on which each abbreviation is first used.

2D two-dimensional

3D three-dimensional

9233 subject code of 2002–08 H2-Maths-equivalent syllabus

9740 subject code of 2007–17 H2 Maths syllabus
9758 subject code of current H2 Maths syllabus (2017–?)

A Level Singapore-Cambridge Advanced Level

‘A’ Maths Additional Mathematics (an O-Level subject)

ASEAN Association of Southeast Asian Nations 57

BBC British Broadcasting Corporation xxxv

CLT Central Limit Theorem xl

COI constant of integration 818

CPFE Correct Procedure for Finding Extrema 732

DOS disk operating system xxxiii


ESGS Every School a Good School xxxvi

FDTE First Derivative Test for Extrema 729, 1347

FDTI First Derivative Test for Inflexion Points 742, 1350

FLC four-letter campaign xxxvi

FSSPFE the Flawed Secondary School Procedure for Finding Extrema 725


FTCs Fundamental Theorems of Calculus

GCT Goh Chok Tong

GDP gross domestic product xxxv

GUI graphical user interface


H1, H2, H3 Higher 1, Higher 2, Higher 3 (see A Level)
HCI Hwa Chong Institution

IET Interior Extremum Theorem 725

IMHO in my humble opinion xxxiii

IMO International Mathematical Olympiad

J1, J2 the first and second years of junior college

JC junior college xxxi

JSYK just so you know

LHL Lee Hsien Loong

List MF26 List of Formulae and Statistical Tables xxxi

LKY Lee Kuan Yew

MOE Ministry of Education (Singapore) xxxiii


O Level Singapore-Cambridge Ordinary Level

PAP People’s Action Party

PDF portable document format

PDF probability density function

PISA Programme for International Student Assessment

PM Prime Minister 56
PSC Public Service Commission (Singapore) xxxvi

Q&A question and answer xxxii

SDN Social Development Network

SDTE Second Derivative Test for Extrema 730

SDTI Second Derivative Test for Inflexion Points 744

SDU Social Development Unit

SE Stack Exchange
SEAB Singapore Examinations and Assessment Board xxxiii

TI84 TI-84 PLUS Silver Edition calculator xxxiii

TLA three-letter abbreviation xxv

TLLM Teach Less, Learn More xxxvi

TLT Tangent Line Test 738

TPL Tin Pei Ling 16

TSLN Thinking Schools, Learning Nation xxxvi

TYS Ten Year Series xxxv

UK United Kingdom
US United States of America

Singlish Used in This Textbook
We give the number of the page on which each Singlish expression is first used.

angmoh white person (literally “red-haired”) 259

Familee the family of Lee Kuan Yew 57

Gahmen government xxxv

kiasu literally, “afraid to lose” xxxi

mug to study hard, especially in rote fashion 259

promos the J1 end-of-year promotional examinations xxxvii


Notation Used in the Main Text
We list only notation not already listed on pp. 14–18 of your H2 Maths syllabus.
This is a fairly short list because we’ve generally tried to stick closely to the notation used
in your syllabus and exams.
We give the number of the page on which each piece of notation is first used and/or defined.

∵ because 3
∴ therefore 3
” ditto 3

Z₀⁺ the set of non-negative integers, {0, 1, 2, 3, . . . } 43

Z₀⁻ the set of non-positive integers, {. . . , −3, −2, −1, 0} 43
Q₀⁻ the set of non-positive rational numbers, {x ∈ Q ∶ x ≤ 0} 43
R₀⁻ the set of non-positive real numbers, {x ∈ R ∶ x ≤ 0} 43

⊊ proper subset of (used by other writers, not used in this textbook) 48


∖ set minus (or set difference) 51

Domain f the domain of the function f

Codomain f the codomain of the function f
Range f the range of the function f

lim_{x→a⁻} f(x) the left-hand limit of f as x tends to a 103, 1327

lim_{x→a⁺} f(x) the right-hand limit of f as x tends to a 103, 1327

⋛ any inequality, i.e. any of >, ≥, <, or ≤.

BS the set of boundary points of the set S 723, 1342

IS the set of interior points of the set S 723, 1342


Notation Used in the Appendices
We list only notation not already listed on the previous page or on pp. 14–18 of your H2
Maths syllabus.
We give the number of the page on which each piece of notation is first used and/or defined.

A×B the cartesian product of the sets A and B 1267

Nε(a) the ε-neighbourhood of a; in R, the set (a − ε, a + ε) 1325

Nε⁻(a) the left ε-neighbourhood of a; in R, the set (a − ε, a) 1325
Nε⁺(a) the right ε-neighbourhood of a; in R, the set (a, a + ε) 1325
Nε(a) the deleted (or punctured) ε-neighbourhood of a; in R, the set (a − ε, a) ∪ (a, a + ε) 1325
