Derivations of Applied Mathematics
Thaddeus H. Black
Revised 11 August 2012


Thaddeus H. Black, 1967–. Derivations of Applied Mathematics. U.S. Library of Congress class QA401. Copyright © 1983–2012 Thaddeus H. Black. Published by the Debian Project [17]. This book is free software; you can redistribute and/or modify it under the terms of the GNU General Public License [25], version 2. This is a prepublished draft, dated 11 August 2012.

Contents

Preface

1  Introduction
   1.1  Applied mathematics
   1.2  Rigor
        1.2.1  Axiom and definition
        1.2.2  Mathematical extension
   1.3  Complex numbers and complex variables
   1.4  On the text

Part I  The calculus of a single variable

2  Classical algebra and geometry
   2.1  Basic arithmetic relationships
        2.1.1  Commutivity, associativity, distributivity
        2.1.2  Negative numbers
        2.1.3  Inequality
        2.1.4  The change of variable
   2.2  Quadratics
   2.3  Integer and series notation
   2.4  The arithmetic series
   2.5  Powers and roots
        2.5.1  Notation and integral powers
        2.5.2  Roots
        2.5.3  Powers of products and powers of powers
        2.5.4  Sums of powers
        2.5.5  Summary and remarks
   2.6  Multiplying and dividing power series
        2.6.1  Multiplying power series
        2.6.2  Dividing power series
        2.6.3  Dividing power series by matching coefficients
        2.6.4  Common quotients and the geometric series
        2.6.5  Variations on the geometric series
   2.7  Constants and variables
   2.8  Exponentials and logarithms
        2.8.1  The logarithm
        2.8.2  Properties of the logarithm
   2.9  Triangles and other polygons: simple facts
        2.9.1  The area of a triangle
        2.9.2  The triangle inequalities
        2.9.3  The sum of interior angles
   2.10 The Pythagorean theorem
   2.11 Functions
   2.12 Complex numbers (introduction)
        2.12.1 Rectangular complex multiplication
        2.12.2 Complex conjugation
        2.12.3 Power series and analytic functions (preview)

3  Trigonometry
   3.1  Definitions
   3.2  Simple properties
   3.3  Scalars, vectors, and vector notation
   3.4  Rotation
   3.5  Trigonometric sums and differences
        3.5.1  Variations on the sums and differences
        3.5.2  Trigonometric functions of double and half angles
   3.6  Trigonometrics of the hour angles
   3.7  The laws of sines and cosines
   3.8  Summary of properties
   3.9  Cylindrical and spherical coordinates
   3.10 The complex triangle inequalities
   3.11 De Moivre's theorem

4  The derivative
   4.1  Infinitesimals and limits
        4.1.1  The infinitesimal
        4.1.2  Limits
   4.2  Combinatorics
        4.2.1  Combinations and permutations
        4.2.2  Pascal's triangle
   4.3  The binomial theorem
        4.3.1  Expanding the binomial
        4.3.2  Powers of numbers near unity
        4.3.3  Complex powers of numbers near unity
   4.4  The derivative
        4.4.1  The derivative of the power series
        4.4.2  The Leibnitz notation
        4.4.3  The derivative of a function of a complex variable
        4.4.4  The derivative of z^a
        4.4.5  The logarithmic derivative
   4.5  Basic manipulation of the derivative
        4.5.1  The derivative chain rule
        4.5.2  The derivative product rule
        4.5.3  A derivative product pattern
   4.6  Extrema and higher derivatives
   4.7  L'Hôpital's rule
   4.8  The Newton-Raphson iteration

5  The complex exponential
   5.1  The real exponential
   5.2  The natural logarithm
   5.3  Fast and slow functions
   5.4  Euler's formula
   5.5  Complex exponentials and de Moivre
   5.6  Complex trigonometrics
        5.6.1  The hyperbolic functions
        5.6.2  Inverse complex trigonometrics
   5.7  Summary of properties
   5.8  Derivatives of complex exponentials
        5.8.1  Derivatives of sine and cosine
        5.8.2  Derivatives of the trigonometrics
        5.8.3  Derivatives of the inverse trigonometrics
   5.9  The actuality of complex quantities

6  Primes, roots and averages
   6.1  Prime numbers
        6.1.1  The infinite supply of primes
        6.1.2  Compositional uniqueness
        6.1.3  Rational and irrational numbers
   6.2  The existence and number of roots
        6.2.1  Polynomial roots
        6.2.2  The fundamental theorem of algebra
   6.3  Addition and averages
        6.3.1  Serial and parallel addition
        6.3.2  Averages

7  The integral
   7.1  The concept of the integral
        7.1.1  An introductory example
        7.1.2  Generalizing the introductory example
        7.1.3  The balanced definition and the trapezoid rule
   7.2  The antiderivative
   7.3  Operators, linearity and multiple integrals
        7.3.1  Operators
        7.3.2  A formalism
        7.3.3  Linearity
        7.3.4  Summational and integrodifferential commutivity
        7.3.5  Multiple integrals
   7.4  Areas and volumes
        7.4.1  The area of a circle
        7.4.2  The volume of a cone
        7.4.3  The surface area and volume of a sphere
   7.5  Checking an integration
   7.6  Contour integration
   7.7  Discontinuities
   7.8  Remarks (and exercises)

8  The Taylor series
   8.1  The power-series expansion of 1/(1 − z)^(n+1)
        8.1.1  The formula
        8.1.2  The proof by induction
        8.1.3  Convergence
        8.1.4  General remarks on mathematical induction
   8.2  Shifting a power series' expansion point
   8.3  Expanding functions in Taylor series
   8.4  Analytic continuation
   8.5  Branch points
   8.6  Entire and meromorphic functions
   8.7  Extrema over a complex domain
   8.8  Cauchy's integral formula
        8.8.1  The meaning of the symbol dz
        8.8.2  Integrating along the contour
        8.8.3  The formula
        8.8.4  Enclosing a multiple pole
   8.9  Taylor series for specific functions
   8.10 Error bounds
        8.10.1 Examples
        8.10.2 Majorization
        8.10.3 Geometric majorization
        8.10.4 Calculation outside the fast convergence domain
        8.10.5 Nonconvergent series
        8.10.6 Remarks
   8.11 Calculating 2π
   8.12 Odd and even functions
   8.13 Trigonometric poles
   8.14 The Laurent series
   8.15 Taylor series in 1/z
   8.16 The multidimensional Taylor series

9  Integration techniques
   9.1  Integration by antiderivative
   9.2  Integration by substitution
   9.3  Integration by parts
   9.4  Integration by unknown coefficients
   9.5  Integration by closed contour
   9.6  Integration by partial-fraction expansion
        9.6.1  Partial-fraction expansion
        9.6.2  Repeated poles
        9.6.3  Integrating a rational function
        9.6.4  The derivatives of a rational function
        9.6.5  Repeated poles (the conventional technique)
        9.6.6  The existence and uniqueness of solutions
   9.7  Frullani's integral
   9.8  Products of exponentials, powers and logs
   9.9  Integration by Taylor series

10 Cubics and quartics
   10.1 Vieta's transform
   10.2 Cubics
   10.3 Superfluous roots
   10.4 Edge cases
   10.5 Quartics
   10.6 Guessing the roots

Part II  Matrices and vectors

11 The matrix
   11.1 Provenance and basic use
        11.1.1 The linear transformation
        11.1.2 Matrix multiplication (and addition)
        11.1.3 Row and column operators
        11.1.4 The transpose and the adjoint
   11.2 The Kronecker delta
   11.3 Dimensionality and matrix forms
        11.3.1 The null and dimension-limited matrices
        11.3.2 The identity, scalar and extended matrices
        11.3.3 The active region
        11.3.4 Other matrix forms
        11.3.5 The rank-r identity matrix
        11.3.6 The truncation operator
        11.3.7 The elementary vector and lone-element matrix
        11.3.8 Off-diagonal entries
   11.4 The elementary operator
        11.4.1 Properties
        11.4.2 Commutation and sorting
   11.5 Inversion and similarity (introduction)
   11.6 Parity
   11.7 The quasielementary operator
        11.7.1 The general interchange operator
        11.7.2 The general scaling operator
        11.7.3 Addition quasielementaries
   11.8 The unit triangular matrix
        11.8.1 Construction
        11.8.2 The product of like unit triangular matrices
        11.8.3 Inversion
        11.8.4 The parallel unit triangular matrix
        11.8.5 The partial unit triangular matrix
   11.9 The shift operator
   11.10 The Jacobian derivative

12 Rank and the Gauss-Jordan
   12.1 Linear independence
   12.2 The elementary similarity transformation
   12.3 The Gauss-Jordan decomposition
        12.3.1 Motive
        12.3.2 Method
        12.3.3 The algorithm
        12.3.4 Rank and independent rows
        12.3.5 Inverting the factors
        12.3.6 Truncating the factors
        12.3.7 Properties of the factors
        12.3.8 Marginalizing the factor In
        12.3.9 Decomposing an extended operator
   12.4 Vector replacement
   12.5 Rank
        12.5.1 A logical maneuver
        12.5.2 The impossibility of identity-matrix promotion
        12.5.3 General matrix rank and its uniqueness
        12.5.4 The full-rank matrix
        12.5.5 Under- and overdetermined systems (introduction)
        12.5.6 The full-rank factorization
        12.5.7 Full column rank and the Gauss-Jordan's K and S
        12.5.8 The significance of rank uniqueness

13 Inversion and orthonormalization
   13.1 Inverting the square matrix
   13.2 The exactly determined linear system
   13.3 The kernel
        13.3.1 The Gauss-Jordan kernel formula
        13.3.2 Converting between kernel matrices
        13.3.3 The degree of freedom
   13.4 The nonoverdetermined linear system
        13.4.1 Particular and homogeneous solutions
        13.4.2 A particular solution
        13.4.3 The general solution
   13.5 The residual
   13.6 The pseudoinverse and least squares
        13.6.1 Least squares in the real domain
        13.6.2 The invertibility of A∗A
        13.6.3 Positive definiteness
        13.6.4 The Moore-Penrose pseudoinverse
   13.7 The multivariate Newton-Raphson iteration
   13.8 The dot product
   13.9 The complex vector triangle inequalities
   13.10 The orthogonal complement
   13.11 Gram-Schmidt orthonormalization
        13.11.1 Efficient implementation
        13.11.2 The Gram-Schmidt decomposition
        13.11.3 The Gram-Schmidt kernel formula
   13.12 The unitary matrix

14 The eigenvalue
   14.1 The determinant
        14.1.1 Basic properties
        14.1.2 The determinant and the elementary operator
        14.1.3 The determinant of a singular matrix
        14.1.4 The determinant of a matrix product
        14.1.5 Determinants of inverse and unitary matrices
        14.1.6 Inverting the square matrix by determinant
   14.2 Coincident properties
   14.3 The eigenvalue itself
   14.4 The eigenvector
   14.5 Eigensolution facts
   14.6 Diagonalization
   14.7 Remarks on the eigenvalue
   14.8 Matrix condition
   14.9 The similarity transformation
   14.10 The Schur decomposition
        14.10.1 Derivation
        14.10.2 The nondiagonalizable matrix
   14.11 The Hermitian matrix
   14.12 The singular-value decomposition
   14.13 General remarks on the matrix

15 Vector analysis
   15.1 Reorientation
        15.1.1 The Tait-Bryan rotations
        15.1.2 The Euler rotations
   15.2 Multiplication
        15.2.1 The dot product
        15.2.2 The cross product
   15.3 Orthogonal bases
   15.4 Notation
        15.4.1 Components by subscript
        15.4.2 Einstein's summation convention
        15.4.3 The Kronecker delta and the Levi-Civita epsilon
   15.5 Algebraic identities
   15.6 Isotropy
   15.7 Parabolic coordinates
        15.7.1 The parabola
        15.7.2 Parabolic coordinates in two dimensions
        15.7.3 Properties
        15.7.4 The parabolic cylindrical coordinate system
        15.7.5 The circular paraboloidal coordinate system

16 Vector calculus
   16.1 Fields and their derivatives
        16.1.1 The ∇ operator
        16.1.2 Operator notation
        16.1.3 The directional derivative and the gradient
        16.1.4 Divergence
        16.1.5 Curl
        16.1.6 Cross-directional derivatives
   16.2 Integral forms
        16.2.1 The divergence theorem
        16.2.2 Stokes' theorem
   16.3 Summary of definitions and identities
   16.4 The Laplacian, et al.
   16.5 Contour derivative product rules
   16.6 Metric coefficients
        16.6.1 Displacements, areas and volumes
        16.6.2 The vector field and its scalar components
   16.7 Nonrectangular notation
   16.8 Derivatives of the basis vectors
   16.9 Derivatives in the nonrectangular systems
        16.9.1 Derivatives in cylindrical coordinates
        16.9.2 Derivatives in spherical coordinates
        16.9.3 Finding the derivatives geometrically
   16.10 Vector infinitesimals

Part III  Transforms and special functions

17 The Fourier series
   17.1 Parseval's principle
   17.2 Time, space and frequency
   17.3 The square, triangular and Gaussian pulses
   17.4 Expanding waveforms in Fourier series
        17.4.1 Derivation of the Fourier-coefficient formula
        17.4.2 The square wave
        17.4.3 The rectangular pulse train
        17.4.4 Linearity and sufficiency
        17.4.5 The trigonometric form
   17.5 The sine-argument function
        17.5.1 Derivative and integral
        17.5.2 Properties of the sine-argument function
        17.5.3 Properties of the sine integral
        17.5.4 The sine integral's limit by complex contour
   17.6 Gibbs' phenomenon

18 The Fourier and Laplace transforms
   18.1 The Fourier transform
        18.1.1 Fourier's equation
        18.1.2 The transform and inverse transform
        18.1.3 The complementary variables of transformation
        18.1.4 An example
   18.2 Properties of the Fourier transform
        18.2.1 Duality
        18.2.2 Real and imaginary parts
        18.2.3 The Fourier transform of the Dirac delta
        18.2.4 Shifting, scaling and differentiation
        18.2.5 Convolution and correlation
        18.2.6 Parseval's theorem
        18.2.7 Oddness and evenness
   18.3 Fourier transforms of selected functions
   18.4 The Fourier transform of integration
   18.5 The Gaussian pulse
   18.6 The Laplace transform
   18.7 Solving differential equations by Laplace
   18.8 Initial and final values by Laplace
   18.9 The spatial Fourier transform

19 Introduction to special functions
   19.1 The Gaussian pulse and its moments

20 Probability
   20.1 Definitions and basic concepts
   20.2 The statistics of a distribution
   20.3 The sum of random variables
   20.4 The transformation of a random variable
   20.5 The normal distribution
   20.6 Inference of statistics
   20.7 The random walk and its consequences
        20.7.1 The random walk
        20.7.2 Consequences
   20.8 Other distributions
        20.8.1 The uniform distribution
        20.8.2 The exponential distribution
        20.8.3 The Rayleigh distribution
        20.8.4 The Maxwell distribution
        20.8.5 The log-normal distribution
   20.9 The Box-Muller transformation
   20.10 The normal CDF at large arguments
   20.11 The normal quantile

Appendices

A  Hexadecimal notation, et al.
   A.1  Hexadecimal numerals
   A.2  Avoiding notational clutter
B  The Greek alphabet
C  A sketch of pure complex theory
D  Manuscript history

List of tables

2.1  Basic properties of arithmetic.
2.2  Power properties and definitions.
2.3  Dividing power series through successively smaller powers.
2.4  Dividing power series through successively larger powers.
2.5  General properties of the logarithm.
3.1  Simple properties of the trigonometric functions.
3.2  Trigonometric functions of the hour angles.
3.3  Further properties of the trigonometric functions.
3.4  Rectangular, cylindrical and spherical coordinate relations.
5.1  Complex exponential properties.
5.2  Derivatives of the trigonometrics.
5.3  Derivatives of the inverse trigonometrics.
6.1  Parallel and serial addition identities.
7.1  Basic derivatives for the antiderivative.
8.1  Taylor series.
9.1  Antiderivatives of products of exps, powers and logs.
10.1 A method to extract the three roots of the general cubic.
10.2 A method to extract the four roots of the general quartic.
11.1 Elementary operators: interchange.
11.2 Elementary operators: scaling.
11.3 Elementary operators: addition.
11.4 Matrix inversion properties.
11.5 Properties of the parallel unit triangular matrix.
12.1 Some elementary similarity transformations.
12.2 A few properties of the Gauss-Jordan factors.
12.3 The symmetrical equations of § 12.4.
15.1 Properties of the Kronecker delta and Levi-Civita epsilon.
15.2 Algebraic vector identities.
15.3 Parabolic coordinate properties.
15.4 Circular paraboloidal coordinate properties.
16.1 Definitions and identities of vector calculus.
16.2 Metric coefficients.
16.3 Derivatives of the basis vectors.
16.4 Vector derivatives in cylindrical coordinates.
16.5 Vector derivatives in spherical coordinates.
18.1 Fourier duality rules.
18.2 Real and imaginary parts of the Fourier transform.
18.3 Convolution and correlation, and their Fourier properties.
18.4 Convolution and correlation in their peculiar notation.
18.5 Properties involving shifting, scaling and differentiation.
18.6 Fourier transform pairs.
18.7 Properties of the Laplace transform.
18.8 Laplace transform pairs.
B.1  The Roman and Greek alphabets.

List of figures

1.1  Two triangles.
2.1  Multiplicative commutivity.
2.2  The sum of a triangle's inner angles: turning at the corner.
2.3  A right triangle.
2.4  The Pythagorean theorem.
2.5  The complex (or Argand) plane.
3.1  The sine and the cosine.
3.2  The sine function.
3.3  A two-dimensional vector u = x̂x + ŷy.
3.4  A three-dimensional vector v = x̂x + ŷy + ẑz.
3.5  Vector basis rotation.
3.6  The 0x18 hours in a circle.
3.7  Calculating the hour trigonometrics.
3.8  The laws of sines and cosines.
3.9  A point on a sphere.
4.1  The plan for Pascal's triangle.
4.2  Pascal's triangle.
4.3  A local extremum.
4.4  A level inflection.
4.5  The Newton-Raphson iteration.
5.1  The natural exponential.
5.2  The natural logarithm.
5.3  The complex exponential and Euler's formula.
5.4  The derivatives of the sine and cosine functions.
7.1  Areas representing discrete sums.
7.2  An area representing an infinite sum of infinitesimals.
7.3  Integration by the trapezoid rule.
7.4  The area of a circle.
7.5  The volume of a cone.
7.6  A sphere.
7.7  An element of a sphere's surface.
7.8  A contour of integration.
7.9  The Heaviside unit step u(t).
7.10 The Dirac delta δ(t).
8.1  A complex contour of integration in two segments.
8.2  A Cauchy contour integral.
8.3  Majorization.
9.1  Integration by closed contour.
10.1 Vieta's transform.
13.1 Fitting a line to measured data.
15.1 A point on a sphere.
15.2 The dot product.
15.3 The cross product.
15.4 The cylindrical basis.
15.5 The spherical basis.
15.6 A vector projected onto a plane.
15.7 The parabola.
15.8 Locating a point by parabolic construction.
15.9 The parabolic coordinate grid in two dimensions.
17.1 A square wave.
17.2 Superpositions of sinusoids.
17.3 The square, triangular and Gaussian pulses.
17.4 A rectangular pulse train.
17.5 A Dirac delta pulse train.
17.6 The sine-argument function.
17.7 The sine integral.
17.8 The points at which t intersects tan t.
17.9 A complex contour about which to integrate e^(iz)/i2z.
17.10 Gibbs' phenomenon, plotted logarithmically.
18.1 A pulse.
18.2 The Fourier transform of the pulse of Fig. 18.1.
20.1 The normal distribution and its CDF.

Preface

[You are reading a prepublished draft of the book, dated on the title page.]

I never meant to write this book. It emerged unheralded, unexpectedly. The book began in 1983 when a high-school classmate challenged me to prove the Pythagorean theorem on the spot. I lost the dare, but looking the proof up later I recorded it on loose leaves, adding to it the derivations of a few other theorems of interest to me. From such a kernel the notes grew over time, until family and friends suggested that the notes might make the material for the book you hold.

The book as a whole surveys the general mathematical methods common to engineering, architecture, chemistry and physics. It concisely arrays and ambitiously reprises the mathematics the practicing scientist or engineer will likely once have met but may imperfectly recall. Its focus on derivations is what principally distinguishes this book from the few others¹ of its class. The book is neither a tutorial on the one hand nor a bald reference on the other: it is a study reference, in the tradition of, for instance, Kernighan's and Ritchie's The C Programming Language [42], deriving the mathematics it reprises. In this book, you can look up some particular result directly, or you can begin on page one and read—with toil and commensurate profit—straight through to the end of the last chapter. The book thus is a marshal or guide. For the prospective, college-bound scientist or engineer, the book offers a concrete means to earn an enduring academic advantage, filling gaps in one's knowledge while extending one's mathematical reach, to the extent to which study of the book proper is augmented by reflection, review, exercise and application.

¹ Other books of the class include [15][39][4].

The reader who has come to look up a particular result will, I trust, already have turned ahead to it; so let me here address the other reader, who means to begin on page one.

No result is presented here but that it is justified in a style engineers, scientists and other applied mathematicians will recognize—not indeed the high style of the professional mathematician, which serves other needs, but the long-established style of applications.

Plan

Following its introduction in Ch. 1 the book comes in three parts. The first part begins with a brief review of classical algebra and geometry and develops thence the calculus of a single complex variable, this calculus being the axle as it were about which higher mathematics turns. The second part constructs the initially oppressive but broadly useful mathematics of matrices and vectors, without which so many modern applications (to the fresh incredulity of each generation of college students) remain analytically intractable—the jewel of this second part being the eigenvalue of Ch. 14. The third and final part, the most interesting but also the most advanced, introduces the mathematics of the Fourier transform, probability and the wave equation—each of which is enhanced by the use of special functions, the third part's unifying theme.

The book's overall plan, though extensive enough to take several hundred pages to execute, is straightforward enough to describe in a single sentence. The plan is to derive as many mathematical results, useful to engineers and their brethren, as possible in a coherent train, recording and presenting the derivations together in an orderly manner in a single volume. What constitutes "useful" or "orderly" is a matter of perspective and judgment, of course. My own peculiar heterogeneous background in military service, building construction, electrical engineering, electromagnetic analysis and Debian development, my nativity, residence and citizenship in the United States, undoubtedly bias the selection and presentation to some degree. How other authors go about writing their books, I do not know, but I suppose that what is true for me is true for many of them also: we begin by organizing notes for our own use, then observe that the same notes might prove useful to others, and then undertake to revise the notes and to bring them into a form which actually is useful to others. Whether this book succeeds in the last point is for the reader to judge.

Notation

The book deviates from—or cautiously improves, if you will endorse the characterization—the conventional notation of applied mathematics in one conspicuous respect which, I think, requires some defense here.

The book employs hexadecimal numerals.

Why not decimal only? There is nothing wrong with decimal numerals as such; I am for them. Decimal numerals are well in history and anthropology (man has ten fingers), finance and accounting (dollars, cents, pounds, shillings, pence: the base hardly matters) and law and engineering (the physical units are arbitrary anyway), but they are merely serviceable in mathematical theory, never aesthetic. There unfortunately really is no gradual way to bridge the gap to hexadecimal (shifting to base eleven, thence to twelve, etc., is no use). If one wishes to reach hexadecimal ground then one must leap. Twenty years of keeping my own private notes in hex have persuaded me that the leap justifies the risk. Admittedly, one might judge the last to be more excuse than cause. Yet, though a dreary train of sophists down the years, impatient of experience, eager to innovate, has indisputably abused such causes—in ways which the mature reader of a certain cast of mind will find all too familiar—such causes would hardly merit abuse did they not sometimes hide a latent measure of justice. Custom is not always defaced, but sometimes adorned, by the respectful attendance of a prudent discrimination, retaining especially a partial regard for the stately grandeur of the decimal numerals mdclxvi of the famous Roman style. It is to the justice, or at least to the aesthetic, rather than to the sophistry that I affect to appeal here. It is in this spirit alone that hexadecimal numerals are given place here. In other matters, the book leaps seldom. The book in general walks a tolerably conventional applied mathematical line.
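A reader who wants to check a hexadecimal numeral can lean on the C language, whose 0x prefix the book's notation matches; the following tiny program is an illustration added here for this edition, not part of the book's own text:

    #include <stdio.h>

    int main(void)
    {
        /* The 0x prefix marks a hexadecimal numeral, so
           0x18 = (1)(16) + 8 = 24 in decimal. */
        printf("0x18 = %d\n", 0x18); /* prints: 0x18 = 24 */
        printf("0xA5 = %d\n", 0xA5); /* prints: 0xA5 = 165 */
        return 0;
    }

Appendix A gives the book's own orientation to the notation.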

Publication

The book belongs to the emerging tradition of open-source software, where at the time of this writing it fills a void. Nevertheless it is a book, not a program. Lore among open-source developers holds that open development inherently leads to superior work. Often it does in fact. Personally, with regard to my own work, I should rather not make too many claims: it would be vain to deny that professional editing and formal peer review, neither of which the book enjoys, had substantial value. On the other hand, it does not do to despise the amateur (literally, one who does for the love of it: not such a bad motive, after all²) on principle, either—unless one would on the same principle despise a Socrates, a Washington, an Einstein or a Debian Developer [17].

² The expression is derived from an observation I seem to recall George F. Will making.

Open source has a spirit to it which leads readers to be far more generous with their feedback than ever could be the case with a traditional, proprietary book. Such readers, among whom a surprising concentration of talent and expertise are found, enrich the work freely. This is as it should be.

The book's open-source publication implies that it can neither go out of print nor grow hard to find. You can indeed copy and distribute the book yourself if you wish, but the point is that other writers can refer their readers hither without undue fear of imposition, thus saving scarce pages in the other writers' own books and articles, pages otherwise devoted to mathematical appendices rather omitted. This has value. After all, a reference one's reader can conveniently follow differs practically from a reference one's reader can follow with difficulty if at all.

Edition

The book is extensively footnoted. Some of the footnotes unremarkably cite sources but many are discursive in nature, offering nonessential material which, though edifying, coheres insufficiently well to join the book's main narrative. The footnote is an imperfect messenger: catching the reader's eye, it can break the flow of otherwise good prose. Modern publishing promotes various alternatives to the footnote—numbered examples, sidebars, special fonts, colored inks, etc. Some of these are merely trendy. Others, like numbered examples, really do help the right kind of book; but for this book the humble footnote, long sanctioned by an earlier era of publishing, extensively employed by such sages as Gibbon [27] and Shirer [66], seems the most able messenger. In this book it shall have many messages to bear.

In typical science/engineering style, the book numbers its sections, tables, figures and formulas, but not its theorems, the last of which it generally sets in italic type. Within the book, a theorem is referenced by the number of the section that states it.

The book subjoins an alphabetical index as a standard convenience. Even so, the canny reader will avoid using the index (of this and most other books), which alone of the book's pages is not to be regarded as a proper part of the book. Such a reader will tend rather to consult the book's table of contents, which is a proper part.

The book includes a bibliography listing works I have referred to while writing. Mathematics however by nature promotes queer bibliographies, for its methods and truths are established by derivation

rather than authority. Much of the book consists of common mathematical knowledge or of proofs I have worked out with my own pencil from various ideas gleaned—who knows from where?—over the years, usually too implicitly coherently to cite. The latter proofs are perhaps original or semi-original from my personal point of view, but it is unlikely that many if any of them are truly new; the mathematics itself often tends to suggest the form of the proof: if to me, then surely also to others who came before. And even where a proof is new, the idea proven probably is not.

Acknowledgements

Among the bibliography's entries stands a reference [11] to my doctoral adviser G.S. Brown, though the book's narrative seldom explicitly invokes the reference. Prof. Brown had nothing directly to do with the book's development, for a substantial bulk of the manuscript, or of the notes underlying it, had been drafted years before I had ever heard of Prof. Brown or he of me, and my work under him did not regard the book in any case. However, to the initiated, the ways in which a good doctoral adviser influences his student are both complex and deep. Prof. Brown's style and insight touch the book in many places and in many ways.

More and much earlier than to Prof. Brown, the book owes a debt to my mother and, separately, to my father, without either of whom the book would never have come to be. Admittedly, any author of any book might say as much in a certain respect, but it is no office of a mathematical book's preface to burden readers with an author's expressions of filial piety. No, it is in entirely another respect that I lay the matter here. My mother taught me at her own hand most of the mathematics I ever learned as a child, patiently building a foundation that cannot but be said to undergird the whole book today. More recently, my mother has edited tracts of the book's manuscript. My father generously financed much of my formal education but—more than this—one would have had to grow up with my brother and me in my father's home to appreciate the grand sweep of the man's curiosity, the general depth of his knowledge, the profound wisdom of his practicality and the enduring example of his love of excellence. Steady encouragement from my wife and children contribute to the book in ways only an author can properly appreciate.

May the book deserve such a heritage.

THB


Chapter 1

Introduction

This is a book of applied mathematical proofs. If you have seen a mathematical result, if you want to know why the result is so, you can look for the proof here.

The book's purpose is to convey the essential ideas underlying the derivations of a large number of mathematical results useful in the modeling of physical systems. To this end, the book emphasizes main threads of mathematical argument and the motivation underlying the main threads, deëmphasizing formal mathematical rigor. It derives mathematical results from the purely applied perspective of the scientist and the engineer.

The book's chapters are topical. This first chapter treats a few introductory matters of general interest.

1.1 Applied mathematics

What is applied mathematics?

    Applied mathematics is a branch of mathematics that concerns itself with the application of mathematical knowledge to other domains. . . . The question of what is applied mathematics does not answer to logical classification so much as to the sociology of professionals who use mathematics. [46]

That is about right, on both counts. In this book we shall define applied mathematics to be correct mathematics useful to scientists, engineers and the like; proceeding not from reduced, well-defined sets of axioms but rather directly from a nebulous mass of natural arithmetical, geometrical and classical-algebraic idealizations of physical systems; demonstrable but generally lacking the detailed rigor of the professional mathematician.

1.2 Rigor

It is impossible to write such a book as this without some discussion of mathematical rigor. Applied and pure mathematics differ principally and essentially in the layer of abstract definitions the latter subimposes beneath the physical ideas the former seeks to model. Notions of mathematical rigor fit far more comfortably in the abstract realm of the professional mathematician; they do not always translate so gracefully to the applied realm. The applied mathematical reader should be aware of this difference.

1.2.1 Axiom and definition

Ideally, a professional mathematician knows or precisely specifies in advance the set of fundamental axioms he means to use to derive a result. A prime aesthetic here is irreducibility: no axiom in the set should overlap the others or be specifiable in terms of the others. Geometrical argument—proof by sketch—is distrusted. The professional mathematical literature discourages undue pedantry indeed, but its readers do implicitly demand a convincing assurance that its writers could derive results in pedantic detail if called upon to do so. Precise definition here is critically important, which is why the professional mathematician tends not to accept blithe statements such as that

    1/0 = ∞,

without first inquiring as to exactly what is meant by symbols like 0 and ∞.

The applied mathematician begins from a different base. His ideal lies not in precise definition or irreducible axiom, but rather in the elegant modeling of the essential features of some physical system. Here, mathematical definitions tend to be made up ad hoc along the way, based on previous experience solving similar problems, adapted implicitly to suit the model at hand. If you ask the applied mathematician exactly what his axioms are, which symbolic algebra he is using, he usually doesn't know; what he knows is that the bridge is founded in certain soils with specified tolerances, suffers such-and-such a wind load, etc. To avoid error, the applied mathematician relies not on abstract formalism but rather on a thorough mental grasp of the essential physical features of the phenomenon he is trying to model. An equation like

    1/0 = ∞

may make perfect sense without further explanation to an applied mathematical readership, depending on the physical context in which the equation is introduced. Geometrical argument—proof by sketch—is not only trusted but treasured. Abstract definitions are wanted only insofar as they smooth the analysis of the particular physical problem at hand; such definitions are seldom promoted for their own sakes.

The irascible Oliver Heaviside, responsible for the important applied mathematical technique of phasor analysis, once said,

    It is shocking that young people should be addling their brains over mere logical subtleties, trying to understand the proof of one obvious fact in terms of something equally . . . obvious. [53]

Exaggeration, perhaps, but from the applied mathematical perspective Heaviside nevertheless had a point. The professional mathematicians Richard Courant and David Hilbert put it more soberly in 1924 when they wrote,

    Since the seventeenth century, physical intuition has served as a vital source for mathematical problems and methods. Recent trends and fashions have, however, weakened the connection between mathematics and physics; mathematicians, turning away from the roots of mathematics in intuition, have concentrated on refinement and emphasized the postulational side of mathematics, and at times have overlooked the unity of their science with physics and other fields. In many cases, physicists have ceased to appreciate the attitudes of mathematicians. [15, Preface]

Although the present book treats "the attitudes of mathematicians" with greater deference than some of the unnamed 1924 physicists might have done, still, Courant and Hilbert could have been speaking for the engineers and other applied mathematicians of our own day as well as for the physicists of theirs. To the applied mathematician, the mathematics is not principally meant to be developed and appreciated for its own sake; it is meant to be used. This book adopts the Courant-Hilbert perspective.

The introduction you are now reading is not the right venue for an essay on why both kinds of mathematics—applied and professional (or pure)—are needed. Each kind has its place; and although it is a stylistic error to mix the two indiscriminately, clearly the two have much to do with one another. However this may be, this book is a book of derivations of applied mathematics. The derivations here proceed by a purely applied approach.
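As a gloss added here (not the book's own words), the professional's inquiry about 1/0 = ∞ can be answered by reading the equation as a statement about limits:

    \lim_{\epsilon \to 0^{+}} \frac{1}{\epsilon} = \infty,
    \qquad
    \lim_{\epsilon \to 0^{-}} \frac{1}{\epsilon} = -\infty.

The two one-sided limits disagree, which is exactly why the pure mathematician balks at the bare symbol 1/0 while the applied writer lets physical context select the intended sense.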

1.2.2 Mathematical extension

Profound results in mathematics are occasionally achieved simply by extending results already known. For example, negative integers and their properties can be discovered by counting backward—3, 2, 1, 0—then asking what follows (precedes?) 0 in the countdown and what properties this new, negative integer must have to interact smoothly with the already known positives. The astonishing Euler's formula (§ 5.4) is discovered by a similar but more sophisticated mathematical extension.

More often, however, the results achieved by extension are unsurprising and not very interesting in themselves. Such extended results are the faithful servants of mathematical rigor. Consider for example the triangle on the left of Fig. 1.1.

Figure 1.1: Two triangles. (On the left, a triangle of base b and height h split into two right triangles of bases b1 and b2; on the right, an obtuse triangle of base b and height h, enclosed by a larger right triangle of base b1 with a small excess right triangle of base −b2.)

This triangle is evidently composed of two right triangles of areas

    A1 = b1h/2,
    A2 = b2h/2

(each right triangle is exactly half a rectangle). Hence the main triangle's area is

    A = A1 + A2 = (b1 + b2)h/2 = bh/2.

Very well. What about the triangle on the right? Its b1 is not shown on the figure, and what is that −b2, anyway? Answer: the triangle is composed of the difference of two right triangles, with b1 the base of the larger, overall one: b1 = b + (−b2). The b2 is negative because the sense of the small right triangle's area in the proof is negative: the small area is subtracted from the large rather than added. By extension on this basis, the main triangle's area is again seen to be A = bh/2. The proof is exactly the same. In fact, once the central idea of adding two right triangles is grasped, the extension is really rather obvious—too obvious to be allowed to burden such a book as this. Excepting the uncommon cases where extension reveals something interesting or new, this book generally leaves the mere extension of proofs—including the validation of edge cases and over-the-edge cases—as an exercise to the interested reader.
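The two cases can be collected into a single worked line, a restatement added here for emphasis rather than new material:

    A \;=\; A_1 + A_2
      \;=\; \frac{b_1 h}{2} + \frac{b_2 h}{2}
      \;=\; \frac{(b_1 + b_2)\,h}{2}
      \;=\; \frac{bh}{2},

which holds whether b2 is positive (left triangle, the two areas added) or negative (right triangle, the small area subtracted), since in either case b1 + b2 = b.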

3 Complex numbers and complex variables More than a mastery of mere logical details. Less beautiful but more practical paths to the topic exist. COMPLEX NUMBERS AND COMPLEX VARIABLES 5 the large rather than added. this book generally leaves the mere extension of proofs— including the validation of edge cases and over-the-edge cases—as an exercise to the interested reader. a bare sketch of the abstract theory of the complex variable is found in Appendix C. A feel for the math is the great thing. the main triangle’s area is again seen to be A = bh/2. the reader comes to appreciate that most mathematical properties which apply for real numbers apply equally for complex. 9.1. The proof is exactly the same. axioms. Sections 2.10. Readers unfamiliar with the Greek alphabet will ﬁnd it in Appendix B. 4. In these sections and throughout the book. are felt to be secondary.12. constitute the book’s principal stages of complex development. though often useful. 4. plus all of Chs.5.6.3. . 5 and 8. 6. that few properties concern real numbers alone. By extension on this basis. In fact.2 this book leads the reader along one of these. 1.3. The book’s rapidly staged development of complex numbers and complex variables is planned on this sensibility. once the central idea of adding two right triangles is grasped. 8’s footnote 8. 1. 3. symbolic algebras and the like. the extension is really rather obvious—too obvious to be allowed to burden such a book as this. It denotes variables in Greek letters as well as Roman.1 The abstract theory is quite beautiful. Excepting the uncommon cases where extension reveals something interesting or new. Formal deﬁnitions. 1 2 [5][23][67][34] See Ch. 3.4. However.2.4 On the text The book gives numerals in hexadecimal.3. For supplemental reference. Readers unfamiliar with the hexadecimal notation will ﬁnd a brief orientation thereto in Appendix A.11.5 and 9. it is an holistic view of the mathematics and of its use in the modeling of physical systems which is the mark of the applied mathematician. Pure mathematics develops an abstract theory of the complex variable. its arc takes oﬀ too late and ﬂies too far from applications for such a book as this.

nonetheless. which color otherwise remains unobserved. to strike an appropriately lively balance between the opposite demands of logical progress and literary relief. even gray reading. Delineations of black and white become the book’s duty. The book’s subject and purpose thus restrict its overt style.6 CHAPTER 1. Mathematics however should serve the demands not only of deduction but equally of insight. though this book does try—at some risk to strict consistency of tone—to add color in suitable shades. . neither every sequence of equations nor every conjunction of ﬁgures is susceptible to an apparent hue the writer can openly paint upon it. version 2. INTRODUCTION Licensed to the public under the GNU General Public Licence [25]. Yet. this book meets the Debian Free Software Guidelines [18]. The book begins by developing the calculus of a single variable. but only to that abeyant hue. that luster which reveals or reﬂects the ﬁre of the reader’s own mathematical imagination. A book of mathematical derivations by its nature can tend to make dry. by the latter of which alone mathematics derives either feeling or use.

Part I

The calculus of a single variable


Chapter 2

Classical algebra and geometry

One learns arithmetic and the simplest elements of classical algebra and geometry as a child. Few readers presumably, on the present book's tier, would wish the book to begin with a treatment of 1 + 1 = 2, or of how to solve 3x − 2 = 7, or of the formal consequences of the congruence of the several angles produced when a line intersects some parallels. However, there are some basic points which do seem worth touching. The book starts with these.

2.1 Basic arithmetic relationships

This section states some arithmetical rules.

2.1.1 Commutivity, associativity, distributivity, identity and inversion

Table 2.1 lists several arithmetical rules, each of which applies not only to real numbers but equally to the complex numbers of § 2.12.

Table 2.1: Basic properties of arithmetic.

    a + b = b + a                  Additive commutivity
    a + (b + c) = (a + b) + c      Additive associativity
    a + 0 = 0 + a = a              Additive identity
    a + (−a) = 0                   Additive inversion
    ab = ba                        Multiplicative commutivity
    (a)(bc) = (ab)(c)              Multiplicative associativity
    (a)(1) = (1)(a) = a            Multiplicative identity
    (a)(1/a) = 1                   Multiplicative inversion
    (a)(b + c) = ab + ac           Distributivity

Most of the rules are appreciated at once if the meaning of the symbols is understood. In the case of multiplicative commutivity, one imagines a rectangle with sides of lengths a and b, then the same rectangle turned on its side, as in Fig. 2.1: since the area of the rectangle is the same in either case, and since the area is the length times the width in either case (the area is more or less a matter of counting the little squares), evidently multiplicative commutivity holds.

Figure 2.1: Multiplicative commutivity. [The same rectangle, with sides a and b, shown upright and turned on its side.]

A similar argument validates multiplicative associativity, except that here

we compute the volume of a three-dimensional rectangular box, which box we turn various ways.¹

Multiplicative inversion lacks an obvious interpretation when a = 0. Loosely,

    1/0 = ∞.

But since 3/0 = ∞ also, surely either the zero or the infinity, or both, somehow differ in the latter case.

Looking ahead in the book, we note that the multiplicative properties do not always hold for more general linear transformations. For example, matrix multiplication is not commutative and vector cross-multiplication is not associative. Where associativity does not hold and parentheses do not otherwise group, right-to-left association is notationally implicit:²,³

    A × B × C = A × (B × C).

The sense of it is that the thing on the left (A ×) operates on the thing on the right (B × C). (In the rare case in which the question arises, you may want to use parentheses anyway.)

¹ [67, Ch. 1]

² The nonassociative cross product B × C is introduced in § 15.2.

³ The fine C and C++ programming languages are unfortunately stuck with the reverse order of association, along with division inharmoniously on the same level of syntactic precedence as multiplication. Standard mathematical notation is more elegant: abc/uvw = [(a)(bc)]/[(u)(vw)], for example.

2.1.2 Negative numbers

Consider that

    (+a)(+b) = +ab,
    (+a)(−b) = −ab,
    (−a)(+b) = −ab,
    (−a)(−b) = +ab.

The first three of the four equations are unsurprising, but the last is interesting. Why would a negative count −a of a negative quantity −b come to

a positive product +ab? To see why, consider the progression

    ⋮
    (+3)(−b) = −3b,
    (+2)(−b) = −2b,
    (+1)(−b) = −1b,
     (0)(−b) =  0b,
    (−1)(−b) = +1b,
    (−2)(−b) = +2b,
    (−3)(−b) = +3b,
    ⋮

The logic of arithmetic demands that the product of two negative numbers be positive for this reason.

2.1.3 Inequality

If⁴

    a < b,

then necessarily

    a + x < b + x.

However, the relationship between ua and ub depends on the sign of u:

    ua < ub if u > 0;
    ua > ub if u < 0.

Also, provided that a and b share the same sign,

    1/a > 1/b.

⁴ Few readers attempting this book will need to be reminded that < means "is less than," that > means "is greater than," or that ≤ and ≥ respectively mean "is less than or equal to" and "is greater than or equal to."

2.1.4 The change of variable

The applied mathematician very often finds it convenient to change variables, introducing new symbols to stand in place of old. For this we have

the change of variable or assignment notation⁵

    Q ← P.

This means, "in place of P, put Q"; or, "let Q now equal P." For example, if a^2 + b^2 = c^2, then the change of variable 2µ ← a yields the new form (2µ)^2 + b^2 = c^2.

Similar to the change of variable notation is the definition notation

    Q ≡ P.

This means, "let the new symbol Q represent P."⁶ The two notations logically mean about the same thing. Subjectively, Q ≡ P identifies a quantity P sufficiently interesting to be given a permanent name Q, whereas Q ← P implies nothing especially interesting about P or Q; it just introduces a (perhaps temporary) new symbol Q to ease the algebra. The concepts grow clearer as examples of the usage arise in the book.

⁵ There appears to exist no broadly established standard mathematical notation for the change of variable, other than the = equal sign, which regrettably does not fill the role well. One can indeed use the equal sign, but then what does the change of variable k = k + 1 mean? It looks like a claim that k and k + 1 are the same, which is impossible. The notation k ← k + 1 by contrast is unambiguous; it means to increment k by one. However, the latter notation admittedly has seen only scattered use in the literature. The C and C++ programming languages use == for equality and = for assignment (change of variable). Even k ← k + 1 can confuse readers inasmuch as it appears to imply two different values for the same symbol k, but the notation is sometimes used anyway when new symbols are unwanted or because more precise alternatives (like k_n = k_{n−1} + 1) seem overwrought. Still, usually it is better to introduce a new symbol, as in j ← k + 1.

⁶ One would never write k ≡ k + 1. In some books, ≡ is printed as ≜.

2.2 Quadratics

Differences and sums of squares are conveniently factored as

    a^2 − b^2 = (a + b)(a − b),
    a^2 + b^2 = (a + ib)(a − ib),
    a^2 + 2ab + b^2 = (a + b)^2,
    a^2 − 2ab + b^2 = (a − b)^2    (2.1)

(where i is the imaginary unit, a number defined such that i^2 = −1, introduced in more detail in § 2.12 below).

Useful as these four forms are, however, none of them can directly factor the more general quadratic⁷ expression

    z^2 − 2βz + γ^2.

To factor this, we complete the square, writing

    z^2 − 2βz + γ^2 = z^2 − 2βz + γ^2 + (β^2 − γ^2) − (β^2 − γ^2)
                    = z^2 − 2βz + β^2 − (β^2 − γ^2)
                    = (z − β)^2 − (β^2 − γ^2).

The expression evidently has roots⁸ where

    (z − β)^2 = (β^2 − γ^2),

or in other words where⁹

    z = β ± √(β^2 − γ^2).    (2.2)

This suggests the factoring¹⁰

    z^2 − 2βz + γ^2 = (z − z1)(z − z2),    (2.3)

where z1 and z2 are the two values of z given by (2.2). It follows that the two solutions of the quadratic equation

    z^2 = 2βz − γ^2    (2.4)

are those given by (2.2), which is called the quadratic formula.¹¹ (Cubic and quartic formulas also exist respectively to extract the roots of polynomials of third and fourth order, but they are much harder. See Ch. 10 and its Tables 10.1 and 10.2.)

⁷ The adjective quadratic refers to the algebra of expressions in which no term has greater than second order. Examples of quadratic expressions include x^2, 2x^2 − 7x + 3 and x^2 + 2xy + y^2. By contrast, the expressions x^3 − 1 and 5x^2 y are cubic not quadratic because they contain third-order terms. First-order expressions like x + 1 are linear; zeroth-order expressions like 3 are constant. Expressions of fourth and fifth order are quartic and quintic, respectively. (If not already clear from the context, order basically refers to the number of variables multiplied together in a term. The term 5x^2 y = 5[x][x][y] is of third order, for instance.)

⁸ A root of f(z) is a value of z for which f(z) = 0. See § 2.11.

⁹ The symbol ± means "+ or −." In conjunction with this symbol, the alternate symbol ∓ occasionally also appears, meaning "− or +"—which is the same thing except that, where the two symbols appear together, (±z) + (∓z) = 0.

¹⁰ It suggests it because the expressions on the left and right sides of (2.3) are both quadratic (the highest power is z^2) and have the same roots. Substituting into the equation the values of z1 and z2 and simplifying proves the suggestion correct.

¹¹ The form of the quadratic formula which usually appears in print is x = [−b ± √(b^2 − 4ac)]/2a, which solves the quadratic ax^2 + bx + c = 0. However, this writer finds the form (2.2) easier to remember. For example, by (2.2) the quadratic z^2 = 3z − 2 has the solutions z = 3/2 ± √[(3/2)^2 − 2] = 1 or 2.
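For readers who like to verify such formulas numerically, a minimal Python sketch of (2.2) follows. The sketch is an illustration only; the function name quadratic_roots is arbitrary.

    # A numerical check of eqn. (2.2): the roots of z^2 - 2*beta*z + gamma^2.
    import cmath

    def quadratic_roots(beta, gamma):
        # z = beta +/- sqrt(beta^2 - gamma^2); cmath keeps complex roots valid.
        d = cmath.sqrt(beta**2 - gamma**2)
        return beta + d, beta - d

    # Footnote 11's example z^2 = 3z - 2, for which beta = 3/2 and gamma^2 = 2:
    print(quadratic_roots(1.5, cmath.sqrt(2)))  # prints roots (nearly) 2 and 1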

2.3 Integer and series notation

Sums and products of series arise so frequently in mathematical work that one finds it convenient to define terse notations to express them. The summation notation

    Σ_{k=a}^{b} f(k)

means to let k equal each of the integers a, a+1, a+2, . . . , b in turn, evaluating the function f(k) at each k, then adding the several f(k). For example,¹²

    Σ_{k=3}^{6} k^2 = 3^2 + 4^2 + 5^2 + 6^2 = 0x56.

The similar multiplication notation

    Π_{j=a}^{b} f(j)

means to multiply the several f(j) rather than to add them. The symbols Σ and Π come respectively from the Greek letters for S and P, and may be regarded as standing for "Sum" and "Product." The j or k is a dummy variable, index of summation or loop counter—a variable with no independent existence, used only to facilitate the addition or multiplication of the series.¹³ (Nothing prevents one from writing Π_k rather than Π_j, incidentally. For a dummy variable, one can use any letter one likes. However, the general habit of writing k in the Σ and j in the Π proves convenient at least in § 4.5.2 and Ch. 8, so we start now.)

¹² The hexadecimal numeral 0x56 represents the same number the decimal numeral 86 represents. The book's preface explains why the book represents such numbers in hexadecimal. Appendix A tells how to read the numerals.

¹³ Section 7.3 speaks further of the dummy variable.
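The Σ and Π notations map straightforwardly onto loops. A brief Python sketch (the function names are chosen here only for illustration):

    # The summation and product notations of this section, spelled out as loops.
    def series_sum(f, a, b):
        # Let k equal each integer a, a+1, ..., b; evaluate f(k); add them all.
        return sum(f(k) for k in range(a, b + 1))

    def series_product(f, a, b):
        # Multiply the several f(j) rather than adding them.
        p = 1
        for j in range(a, b + 1):
            p *= f(j)
        return p

    print(hex(series_sum(lambda k: k**2, 3, 6)))  # prints 0x56, i.e. decimal 86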

The product shorthand

    n! ≡ Π_{j=1}^{n} j,
    n!/m! ≡ Π_{j=m+1}^{n} j,    (2.5)

is very frequently used. The notation n! is pronounced "n factorial." Regarding the notation n!/m!, this can of course be regarded correctly as n! divided by m!, but it usually proves more amenable to regard the notation as a single unit.¹⁴

Because multiplication in its more general sense as linear transformation (§ 11.1) is not always commutative, we specify that

    Π_{j=a}^{b} f(j) = [f(b)][f(b−1)][f(b−2)] ··· [f(a+2)][f(a+1)][f(a)]

rather than the reverse order of multiplication.¹⁵ Multiplication proceeds from right to left. In the event that the reverse order of multiplication is needed, we will use the notation

    ∐_{j=a}^{b} f(j) = [f(a)][f(a+1)][f(a+2)] ··· [f(b−2)][f(b−1)][f(b)].

Note that for the sake of definitional consistency,

    Σ_{k=N+1}^{N} f(k) = 0,
    Π_{j=N+1}^{N} f(j) = 1.

This means among other things that

    0! = 1.

¹⁴ One reason among others for this is that factorials rapidly multiply to extremely large sizes, overflowing computer registers during numerical computation. If you can avoid unnecessary multiplication by regarding n!/m! as a single unit, this is a win.

¹⁵ The extant mathematical literature lacks an established standard on the order of multiplication implied by the "Π" symbol, but this is the order we will use in this book.
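Footnote 14's advice, to regard n!/m! as a single unit, translates directly into code. A minimal sketch (the function name is arbitrary):

    # Compute n!/m! = (m+1)(m+2)...(n) as one short product, per (2.5),
    # never forming the two large factorials separately.
    def factorial_ratio(n, m):
        p = 1
        for j in range(m + 1, n + 1):
            p *= j
        return p  # an empty product (n == m) correctly yields 1

    print(factorial_ratio(10, 7))  # prints 720, i.e. (8)(9)(10)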

Context tends to make the notation

    N, j, k ∈ Z

unnecessary, but if used (as here and in § 2.5) it states explicitly that N, j and k are integers. (The symbol Z represents¹⁶ the set of all integers: Z ≡ {. . . , −5, −4, −3, −2, −1, 0, 1, 2, 3, 4, 5, . . .}. The symbol ∈ means "belongs to" or "is a member of." Integers conventionally get the letters¹⁷ i, j, k, m, n, M and N when available—though i is sometimes avoided because the same letter represents the imaginary unit of § 2.12. Where additional letters are needed ℓ, p and q, plus the capitals of these and the earlier listed letters, can be pressed into service, occasionally joined even by r and s. Greek letters are avoided, as—ironically in light of the symbol Z—are the Roman letters x, y and z. Refer to Appendix B.)

On first encounter, the Σ and Π notation of § 2.3 seems a bit overwrought, whether or not the ∈ Z notation also is used. Admittedly it is easier for the beginner to read "f(1) + f(2) + ··· + f(N)" than "Σ_{k=1}^{N} f(k)." However, experience shows the latter notation to be extremely useful in expressing more sophisticated mathematical ideas. We will use such notation extensively in this book.

¹⁶ The letter Z recalls the transitive and intransitive German verb zählen, "to count."

¹⁷ Though Fortran is perhaps less widely used a computer programming language than it once was, it dominated applied-mathematical computer programming for decades, during which the standard way to declare an integer variable to the Fortran compiler was simply to let its name begin with I, J, K, L, M or N. Hence, this alphabetical convention is fairly well cemented in practice.

2.4 The arithmetic series

A simple yet useful application of the series sum of § 2.3 is the arithmetic series

    Σ_{k=a}^{b} k = a + (a + 1) + (a + 2) + ··· + b.

Pairing a with b, then a + 1 with b − 1, then a + 2 with b − 2, etc., the average of each pair is [a + b]/2; thus the average of the entire series is [a + b]/2. (The pairing may or may not leave an unpaired element at the series midpoint k = [a + b]/2, but this changes nothing.) The series has b − a + 1 terms. Hence,

    Σ_{k=a}^{b} k = (b − a + 1) (a + b)/2.    (2.6)
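Equation (2.6) is easily spot-checked against a direct sum; for instance, in Python (the chosen a and b are arbitrary):

    # Checking the arithmetic series (2.6) against a brute-force sum.
    a, b = 3, 100
    assert sum(range(a, b + 1)) == (b - a + 1) * (a + b) // 2
    print((b - a + 1) * (a + b) // 2)  # prints 5047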

Success with this arithmetic series leads one to wonder about the geometric series Σ_{k=0}^{∞} z^k. Section 2.6.4 addresses that point.

2.5 Powers and roots

This necessarily tedious section discusses powers and roots. It offers no surprises. Table 2.2 summarizes its definitions and results. Readers seeking more rewarding reading may prefer just to glance at the table and then to skip directly to the start of the next section.

In this section, the exponents j, k, m, n, p, q, r, s ∈ Z are integers, but the exponents a and b are arbitrary real numbers.

Table 2.2: Power properties and definitions.

    z^n ≡ Π_{j=1}^{n} z,  n ≥ 0
    z = (z^{1/n})^n = (z^n)^{1/n}
    √z ≡ z^{1/2}
    (uv)^a = u^a v^a
    z^{p/q} = (z^{1/q})^p = (z^p)^{1/q}
    z^{ab} = (z^a)^b = (z^b)^a
    z^{a+b} = z^a z^b
    z^{a−b} = z^a / z^b
    z^{−b} = 1 / z^b
    j, k, m, n, p, q ∈ Z

2.5.1 Notation and integral powers

The power notation

    z^n

indicates the number z, multiplied by itself n times. More formally, when the exponent n is a nonnegative integer,¹⁸

    z^n ≡ Π_{j=1}^{n} z.    (2.7)

For example,¹⁹

    z^3 = (z)(z)(z),
    z^2 = (z)(z),
    z^1 = z,
    z^0 = 1.

Notice that in general,

    z^{n−1} = z^n / z.

This leads us to extend the definition to negative integral powers with

    z^{−n} = 1 / z^n.    (2.8)

From the foregoing it is plain that

    z^{m+n} = z^m z^n,
    z^{m−n} = z^m / z^n,    (2.9)

for any integral m and n. For similar reasons,

    z^{mn} = (z^m)^n = (z^n)^m.    (2.10)

On the other hand, from multiplicative associativity and commutivity,

    (uv)^n = u^n v^n.    (2.11)

¹⁸ The symbol "≡" means "=", but it further usually indicates that the expression on its right serves to define the expression on its left. Refer to § 2.1.4.

¹⁹ The case 0^0 is interesting because it lacks an obvious interpretation. The specific interpretation depends on the nature and meaning of the two zeros. For interest, if E ≡ 1/ε, then

    lim_{ε→0+} ε^ε = lim_{E→∞} (1/E)^{1/E} = lim_{E→∞} E^{−1/E} = lim_{E→∞} e^{−(ln E)/E} = e^0 = 1.
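Footnote 19's limit invites a quick numerical look; the following Python lines merely tabulate ε^ε for shrinking ε:

    # Footnote 19, numerically: eps**eps approaches 1 as eps -> 0+.
    for eps in (1e-1, 1e-3, 1e-6, 1e-9):
        print(eps, eps**eps)  # the right-hand column tends toward 1.0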

2.5.2 Roots

Fractional powers are not something we have defined yet, so for consistency with (2.10) we let

    (u^{1/n})^n = u.

This has u^{1/n} as the number which, raised to the nth power, yields u. Setting

    v = u^{1/n},

it follows by successive steps that

    v^n = u,
    (v^n)^{1/n} = v,
    (v^n)^{1/n} = u^{1/n}.

Taking the u and v formulas together, then,

    (z^{1/n})^n = z = (z^n)^{1/n}    (2.12)

for any z and integral n. In any case, the power and root operations mutually invert one another.

The number z^{1/n} is called the nth root of z—or in the very common case n = 2, the square root of z, often written √z. When z is real and nonnegative, the last notation is usually implicitly taken to mean the real, nonnegative square root.

What about powers expressible neither as n nor as 1/n, such as the 3/2 power? If z and w are numbers related by

    w^q = z,

then

    w^{pq} = z^p.

Taking the qth root,

    w^p = (z^p)^{1/q}.

But w = z^{1/q}, so this is

    (z^{1/q})^p = (z^p)^{1/q},

which says that it does not matter whether one applies the power or the root first; the result is the same. We define z^{p/q} such that

    (z^{1/q})^p = z^{p/q} = (z^p)^{1/q}.    (2.13)

Since any real number can be approximated arbitrarily closely by a ratio of integers, (2.13) implies a power definition for all real exponents.

Equation (2.13) is this subsection's main result. However, § 2.5.3 will find it useful if we can also show here that

    (z^{1/q})^{1/s} = z^{1/qs} = (z^{1/s})^{1/q}.    (2.14)

The proof is straightforward. If

    w ≡ z^{1/qs},

then raising to the qs power yields

    (w^s)^q = z.

Successively taking the qth and sth roots gives

    w = (z^{1/q})^{1/s}.

By identical reasoning,

    w = (z^{1/s})^{1/q}.

But since w ≡ z^{1/qs}, the last two equations imply (2.14), as we have sought.

2.5.3 Powers of products and powers of powers

Per (2.10), (uv)^p = u^p v^p. Raising this equation to the 1/q power and applying (2.14) to reorder the powers, we have that

    (uv)^{p/q} = [u^p v^p]^{1/q}
               = [(u^p)^{q/q} (v^p)^{q/q}]^{1/q}
               = [(u^{p/q})^q (v^{p/q})^q]^{1/q}
               = [(u^{p/q})(v^{p/q})]^{q/q}
               = u^{p/q} v^{p/q}.

In other words,

    (uv)^a = u^a v^a    (2.15)

for any real a.

On the other hand, per (2.10),

    z^{pr} = (z^p)^r.

Raising this equation to the 1/qs power and applying (2.10), (2.13) and (2.14) to reorder the powers, we have that

    z^{(p/q)(r/s)} = (z^{p/q})^{r/s}.

By identical reasoning,

    z^{(p/q)(r/s)} = (z^{r/s})^{p/q}.

Since p/q and r/s can approximate any real numbers with arbitrary precision, this implies that

    (z^a)^b = z^{ab} = (z^b)^a    (2.16)

for any real a and b.

2.5.4 Sums of powers

With (2.9), (2.13) and (2.14), one can reason that

    z^{(p/q)+(r/s)} = (z^{ps+rq})^{1/qs} = (z^{ps} z^{rq})^{1/qs} = z^{p/q} z^{r/s},

or in other words that

    z^{a+b} = z^a z^b.    (2.17)

In the case that a = −b,

    1 = z^{−b+b} = z^{−b} z^b,

which implies that

    z^{−b} = 1 / z^b.    (2.18)

But then replacing −b ← b in (2.17) leads to

    z^{a−b} = z^a z^{−b},

which according to (2.18) is

    z^{a−b} = z^a / z^b.    (2.19)

2.5.5 Summary and remarks

Table 2.2 on page 18 summarizes the section's definitions and results. Looking ahead to § 2.12, we observe that nothing in the foregoing analysis requires the base variables z, w, u and v to be real numbers; if complex (§ 2.12), the formulas remain valid. Still, the analysis does imply that the various exponents m, n, p/q, a, b and so on are real numbers. This restriction we shall remove later, purposely defining the action of a complex exponent to comport with the results found here. With such a definition the results apply not only for all bases but also for all exponents, real or complex.

2.6 Multiplying and dividing power series

A power series²⁰ is a weighted sum of integral powers:

    A(z) = Σ_{k=−∞}^{∞} a_k z^k,    (2.20)

where the several a_k are arbitrary constants. This section discusses the multiplication and division of power series.

²⁰ Another name for the power series is polynomial. The word "polynomial" usually connotes a power series with a finite number of terms, but the two names in fact refer to essentially the same thing. Professional mathematicians use the terms more precisely. They call (2.20) a "power series" only if a_k = 0 for all k < 0—in other words, technically, not if it includes negative powers of z. They call it a "polynomial" only if it is a "power series" with a finite number of terms. They call (2.20) in general a Laurent series. That is after all exactly what it is, and the name "Laurent series" is a name we shall meet again in § 8.14. In the meantime however we admit that the professionals have vaguely daunted us by adding to the name some pretty sophisticated connotations, to the point that we applied mathematicians (at least in the author's country) seem to feel somehow unlicensed actually to use the name. We tend to call (2.20) a "power series with negative powers," or just "a power series." This book follows the last usage. You however can call (2.20) a Laurent series if you prefer (and if you pronounce it right: "lor-ON"). Nevertheless if you do use the name "Laurent series," be prepared for people subjectively—for no particular reason—to expect you to establish complex radii of convergence, to sketch some annulus in the Argand plane, and/or to engage in other maybe unnecessary formalities. If that is not what you seek, then you may find it better just to call the thing by the less lofty name of "power series"—or better, if it has a finite number of terms, by the even humbler name of "polynomial." Semantics. All these names mean about the same thing, but one is expected most

carefully always to give the right name in the right place. What a bother! (Someone once told the writer that the Japanese language can give different names to the same object, depending on whether the speaker is male or female. The power-series terminology seems to share a spirit of that kin.) If you seek just one word for the thing, the writer recommends that you call it a "power series" and then not worry too much about it until someone objects. When someone does object, you can snow him with the big word "Laurent series," instead.

The experienced scientist or engineer may notice that the above vocabulary omits the name "Taylor series." The vocabulary omits the name because that name fortunately remains unconfused in usage—it means quite specifically a power series without negative powers and tends to connote a representation of some particular function of interest—as we shall see in Ch. 8.

2.6.1 Multiplying power series

Given two power series

    A(z) = Σ_{k=−∞}^{∞} a_k z^k,
    B(z) = Σ_{k=−∞}^{∞} b_k z^k,    (2.21)

the product of the two series is evidently

    P(z) ≡ A(z)B(z) = Σ_{k=−∞}^{∞} Σ_{j=−∞}^{∞} a_j b_{k−j} z^k.    (2.22)

2.6.2 Dividing power series

The quotient Q(z) = B(z)/A(z) of two power series is a little harder to calculate, and there are at least two ways to do it. Section 2.6.3 below will do it by matching coefficients, but this subsection does it by long division. For example,

    (2z^2 − 3z + 3)/(z − 2) = (2z^2 − 4z)/(z − 2) + (z + 3)/(z − 2)
                            = 2z + (z + 3)/(z − 2)
                            = 2z + (z − 2)/(z − 2) + 5/(z − 2)
                            = 2z + 1 + 5/(z − 2).

The strategy is to take the dividend²¹ B(z) piece by piece, purposely choosing pieces easily divided by A(z).

²¹ If Q(z) is a quotient and R(z) a remainder, then B(z) is a dividend (or numerator) and A(z) a divisor (or denominator). Such are the Latin-derived names of the parts of a long division.

If you feel that you understand the example, then that is really all there is to it, and you can skip the rest of the subsection if you like. One sometimes wants to express the long division of power series more formally, however. That is what the rest of the subsection is about. (Be advised however that the cleverer technique of § 2.6.3, though less direct, is often easier and faster.)

Formally, we prepare the long division B(z)/A(z) by writing

    B(z) = A(z)Q_n(z) + R_n(z),    (2.23)

where R_n(z) is a remainder (being the part of B[z] remaining to be divided), and

    A(z) = Σ_{k=−∞}^{K} a_k z^k,  a_K ≠ 0,
    B(z) = Σ_{k=−∞}^{N} b_k z^k,
    R_N(z) = B(z),
    Q_N(z) = 0,
    R_n(z) = Σ_{k=−∞}^{n} r_nk z^k,
    Q_n(z) = Σ_{k=n−K+1}^{N−K} q_k z^k,    (2.24)

where K and N identify the greatest orders k of z^k present in A(z) and B(z), respectively.

Well, that is a lot of symbology. What does it mean? The key to understanding it lies in understanding (2.23), which is not one but several equations—one equation for each value of n, where n = N, N − 1, N − 2, . . . . The dividend B(z) and the divisor A(z) stay the same from one n to the next, but the quotient Q_n(z) and the remainder R_n(z) change. At start, Q_N(z) = 0 while R_N(z) = B(z), but the thrust of the long division process is to build Q_n(z) up by wearing R_n(z) down. The goal is to grind R_n(z) away to nothing, to make it disappear as n → −∞.

As in the example, we pursue the goal by choosing from R_n(z) an easily divisible piece containing the whole high-order term of R_n(z). The piece we choose is (r_nn/a_K) z^{n−K} A(z), which we add and subtract from (2.23) to

obtain

    B(z) = A(z) [Q_n(z) + (r_nn/a_K) z^{n−K}] + [R_n(z) − (r_nn/a_K) z^{n−K} A(z)].

Matching this equation against the desired iterate

    B(z) = A(z)Q_{n−1}(z) + R_{n−1}(z)

and observing from the definition of Q_n(z) that Q_{n−1}(z) = Q_n(z) + q_{n−K} z^{n−K}, we find that

    q_{n−K} = r_nn / a_K,
    R_{n−1}(z) = R_n(z) − q_{n−K} z^{n−K} A(z).    (2.25)

To begin the actual long division, we initialize

    R_N(z) = B(z),

for which (2.23) is trivially true. Then we iterate per (2.25) as many times as desired. If an infinite number of times, then so long as R_n(z) tends to vanish as n → −∞, it follows from (2.23) that

    B(z)/A(z) = Q_{−∞}(z).    (2.26)

Iterating only a finite number of times leaves a remainder,

    B(z)/A(z) = Q_n(z) + R_n(z)/A(z),    (2.27)

except that it may happen that R_n(z) = 0 for sufficiently small n.

Table 2.3 summarizes the long-division procedure.²² In its q_{n−K} equation, the table includes also the result of § 2.6.3 below.

²² [69, § 3.2]

Table 2.3: Dividing power series through successively smaller powers.

    B(z) = A(z)Q_n(z) + R_n(z)
    A(z) = Σ_{k=−∞}^{K} a_k z^k,  a_K ≠ 0
    B(z) = Σ_{k=−∞}^{N} b_k z^k
    R_N(z) = B(z)
    Q_N(z) = 0
    R_n(z) = Σ_{k=−∞}^{n} r_nk z^k
    Q_n(z) = Σ_{k=n−K+1}^{N−K} q_k z^k
    q_{n−K} = r_nn / a_K = (1/a_K) [ b_n − Σ_{k=n−K+1}^{N−K} a_{n−k} q_k ]
    R_{n−1}(z) = R_n(z) − q_{n−K} z^{n−K} A(z)
    B(z)/A(z) = Q_{−∞}(z)
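For the common special case of ordinary polynomials—finite series in nonnegative powers—Table 2.3's iteration reduces to the familiar polynomial long division, which the following Python sketch implements. The coefficient-list convention and the names are chosen here only for illustration.

    def poly_divide(b, a):
        # Divide B(z) by A(z) per Table 2.3, returning (quotient, remainder).
        # b and a list coefficients from the highest power of z down to z^0.
        r = list(b)          # the running remainder R_n(z)
        q = []               # quotient coefficients, highest power first
        while len(r) >= len(a):
            coef = r[0] / a[0]          # q_{n-K} = r_nn / a_K
            q.append(coef)
            # R_{n-1}(z) = R_n(z) - q_{n-K} z^{n-K} A(z):
            for i in range(len(a)):
                r[i] -= coef * a[i]
            r.pop(0)         # the leading term has cancelled exactly
        return q, r

    # The book's example (2z^2 - 3z + 3)/(z - 2):
    print(poly_divide([2.0, -3.0, 3.0], [1.0, -2.0]))
    # prints ([2.0, 1.0], [5.0]), i.e. quotient 2z + 1, remainder 5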

It should be observed in light of Table 2.3 that if²³

    A(z) = Σ_{k=Ko}^{K} a_k z^k,
    B(z) = Σ_{k=No}^{N} b_k z^k,

then

    R_n(z) = Σ_{k=n−(K−Ko)+1}^{n} r_nk z^k for all n < No + (K − Ko).    (2.28)

That is, the remainder has order one less than the divisor has, when n < No + (K − Ko).

If a more formal demonstration of (2.28) is wanted, then consider per (2.25) that

    R_{m−1}(z) = R_m(z) − (r_mm/a_K) z^{m−K} A(z).

The greatest-order term of R_n(z) is by definition a z^n term. The reason for this, of course, is that we have strategically planned the long-division iteration precisely to cause the leading term of the divisor to cancel the leading term of the remainder at each step.²⁴ If the least-order term of R_m(z) is a z^No term (as clearly is the case at least for the initial remainder R_N[z] = B[z]), then according to the equation so also must the least-order term of R_{m−1}(z) be a z^No term, unless an even lower-order term be contributed by the product z^{m−K} A(z). But that very product's term of least order is a z^{m−(K−Ko)} term. So, evidently the least-order term of R_{m−1}(z) is a z^{m−(K−Ko)} term when m − (K − Ko) ≤ No; otherwise a z^No term. This is better stated after the change of variable n + 1 ← m: the least-order term of R_n(z) is a z^{n−(K−Ko)+1} term when n < No + (K − Ko); otherwise a z^No term. In summary, when n < No + (K − Ko), the terms of R_n(z) run from z^{n−(K−Ko)+1} through z^n, which is exactly the claim (2.28) makes.

The long-division procedure of Table 2.3 extends the quotient Q_n(z) through successively smaller powers of z. Often, however, one prefers to extend the quotient through successively larger powers of z, where a z^K term is A(z)'s term of least order. In this case, the long division goes by the complementary rules of Table 2.4.

²³ The notations Ko, a_k and z^k are usually pronounced, respectively, as "K naught," "a sub k" and "z to the k" (or, more fully, "z to the kth power")—at least in the author's country.

²⁴ This is the reason the greatest-order term of R_{n−1}(z) is a z^{n−1} term and no higher.

2.6.3 Dividing power series by matching coefficients

There is another, sometimes quicker way to divide power series than by the long division of § 2.6.2. One can divide them by matching coefficients.²⁵ If

    Q_∞(z) = B(z)/A(z),    (2.29)

where

    A(z) = Σ_{k=K}^{∞} a_k z^k,  a_K ≠ 0,
    B(z) = Σ_{k=N}^{∞} b_k z^k

²⁵ [45][23, § 3.5]

Table 2.4: Dividing power series through successively larger powers.

    B(z) = A(z)Q_n(z) + R_n(z)
    A(z) = Σ_{k=K}^{∞} a_k z^k,  a_K ≠ 0
    B(z) = Σ_{k=N}^{∞} b_k z^k
    R_N(z) = B(z)
    Q_N(z) = 0
    R_n(z) = Σ_{k=n}^{∞} r_nk z^k
    Q_n(z) = Σ_{k=N−K}^{n−K−1} q_k z^k
    q_{n−K} = r_nn / a_K = (1/a_K) [ b_n − Σ_{k=N−K}^{n−K−1} a_{n−k} q_k ]
    R_{n+1}(z) = R_n(z) − q_{n−K} z^{n−K} A(z)
    B(z)/A(z) = Q_∞(z)

(2.30 CHAPTER 2.30) is correct when Q∞ (z) does converge. the fact that (2. Expanding the left side according to (2. CLASSICAL ALGEBRA AND GEOMETRY are known and Q∞ (z) = ∞ k=N −K qk z k is to be calculated. n=N k=N −K But for this to hold for all z. often what interest us are only the series’ ﬁrst several terms n−K−1 Qn (z) = k=N −K 26 qk z k . however. powers of z if needed or desired. each coeﬃcient depending on the coeﬃcients earlier computed. but Tables 2. then one can multiply (2.34). See footnote 27. even though all its coeﬃcients are known. Admittedly.4 incorporate the technique both ways. which diverges when26 |z| > 1. Transferring all terms but aK qn−K to the equation’s right side and dividing by aK . n ≥ N. ∞ n−K an−k qk z n = ∞ n=N bn z n . rather than increasing.30) Equation (2. The coeﬃcient-matching technique of this subsection is easily adapted to the division of series in decreasing. .30) computes the coeﬃcients of Q(z).3 and 2.30) yields a sequence of coeﬃcients does not necessarily mean that the resulting power series Q∞ (z) converges to some deﬁnite value over a given domain. we have that qn−K = 1 aK n−K−1 bn − an−k qk k=N −K .29) through by A(z) to obtain the form A(z)Q∞ (z) = B(z). Consider for instance (2. n ≥ N. the coeﬃcients must match for each n: n−K k=N −K an−k qk = bn . At least (2. The adaptation is left as an exercise to the interested reader. Even when Q∞ (z) as such does not converge.22) and changing the index n ← k on the right side.

2.6.4 Common power-series quotients and the geometric series

Frequently encountered power-series quotients, calculated by the long division of § 2.6.2, computed by the coefficient matching of § 2.6.3, and/or verified by multiplying, include²⁷

    1/(1 ± z) = Σ_{k=0}^{∞} (∓z)^k,       |z| < 1;
              = −Σ_{k=−∞}^{−1} (∓z)^k,    |z| > 1.    (2.33)

Equation (2.33) almost incidentally answers a question which has arisen in § 2.4 and which often arises in practice: to what total does the infinite geometric series Σ_{k=0}^{∞} z^k, |z| < 1, sum? Answer: it sums exactly to 1/(1 − z). However, there is a simpler, more aesthetic, more instructive way to demonstrate the same thing, as follows. Let

    S ≡ Σ_{k=0}^{∞} z^k,  |z| < 1.

Multiplying by z yields

    zS ≡ Σ_{k=1}^{∞} z^k.

Subtracting the latter equation from the former leaves

    (1 − z)S = 1,

which, after dividing by 1 − z, implies that

    S ≡ Σ_{k=0}^{∞} z^k = 1/(1 − z),  |z| < 1,    (2.34)

as was to be demonstrated.

²⁷ The notation |z| represents the magnitude of z. For example, |5| = 5, but also |−5| = 5.
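Partial sums make (2.34) easy to watch converge; for instance, in Python:

    # Partial sums of the geometric series approach 1/(1 - z) when |z| < 1.
    z = 0.5
    print(sum(z**k for k in range(40)), 1 / (1 - z))  # both print (nearly) 2.0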

2.6.5 Variations on the geometric series

Besides being more aesthetic than the long division of § 2.6.2, the difference technique of § 2.6.4 permits one to extend the basic geometric series in several ways. For instance, the sum

    S1 ≡ Σ_{k=0}^{∞} k z^k,  |z| < 1

(which arises in, among others, Planck's quantum blackbody radiation calculation²⁸), we can compute as follows. We multiply the unknown S1 by z, producing

    zS1 = Σ_{k=0}^{∞} k z^{k+1} = Σ_{k=1}^{∞} (k − 1) z^k.

We then subtract zS1 from S1, leaving

    (1 − z)S1 = Σ_{k=0}^{∞} k z^k − Σ_{k=1}^{∞} (k − 1) z^k = Σ_{k=1}^{∞} z^k = z Σ_{k=0}^{∞} z^k = z/(1 − z),

where we have used (2.34) to collapse the last sum. Dividing by 1 − z, we arrive at

    S1 ≡ Σ_{k=0}^{∞} k z^k = z/(1 − z)^2,  |z| < 1,    (2.35)

which was to be found. Further series of the kind, such as Σ_k k^2 z^k, Σ_k k^3 z^k, Σ_k (k + 1)(k) z^k, etc., can be calculated in like manner as the need for them arises.

²⁸ [50]
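Equation (2.35) admits the same kind of numerical spot check:

    # Checking (2.35): the sum of k*z**k equals z/(1 - z)**2 for |z| < 1.
    z = 0.3
    print(sum(k * z**k for k in range(200)), z / (1 - z)**2)
    # both print (nearly) 0.6122448979591837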

2.7 Indeterminate constants, independent variables and dependent variables

Mathematical models use indeterminate constants, independent variables and dependent variables. The three are best illustrated by example. Consider the time t a sound needs to travel from its source to a distant listener:

    t = ∆r / v_sound,

where ∆r is the distance from source to listener and v_sound is the speed of sound. Here, v_sound is an indeterminate constant (given particular atmospheric conditions, it doesn't vary), ∆r is an independent variable, and t is a dependent variable. The model gives t as a function of ∆r; so, if you tell the model how far the listener sits from the sound source, the model returns the time needed for the sound to propagate from one to the other.

Note that the abstract validity of the model does not necessarily depend on whether we actually know the right figure for v_sound (if I tell you that sound goes at 500 m/s, but later you find out that the real figure is 331 m/s, it probably doesn't ruin the theoretical part of your analysis; you just have to recalculate numerically). The point is that conceptually there preëxists some right figure for the indeterminate constant: that sound, you see, goes at some constant speed—whatever it is—and that we can calculate the delay in terms of this.

The same model in the example would remain valid if atmospheric conditions were changing (v_sound would then be an independent variable) or if the model were used in designing a musical concert hall²⁹ to suffer a maximum acceptable sound time lag from the stage to the hall's back row (t would then be an independent variable; ∆r, dependent). Although there exists a definite philosophical distinction between the three kinds of quantity, nevertheless it cannot be denied that which particular quantity is an indeterminate constant, an independent variable or a dependent variable often depends upon one's immediate point of view.

Occasionally we go so far as deliberately to change our point of view in mid-analysis, now regarding as an independent variable what a moment ago we had regarded as an indeterminate constant, for instance (a typical case of this arises in the solution of differential equations by the method of unknown coefficients, § 9.4). Such a shift of viewpoint is fine, so long as we remember that there is a difference

²⁹ Math books are funny about examples like this. Such examples remind one of the kind of calculation one encounters in a childhood arithmetic textbook: one could calculate the quantity of water in a kitchen mixing bowl just as well, as of the quantity of air contained in an astronaut's round helmet, but astronauts' helmets are so much more interesting than bowls. Knowing the figure is not the point; it is the idea of the example that matters here. The chance that the typical reader will ever specify the dimensions of a real musical concert hall is of course vanishingly small. However, the chance that the typical reader will ever specify something technical is quite large. Although sophisticated models with many factors and terms do indeed play a major role in engineering, the great majority of practical engineering calculations—for quick, day-to-day decisions where small sums of money and negligible risk to life are at stake—are done with models hardly more sophisticated than the one shown here. So, maybe the concert-hall example is not so unreasonable, after all.

between the three kinds of quantity and we keep track of which quantity is which kind to us at the moment.

The main reason it matters which symbol represents which of the three kinds of quantity is that in calculus, one analyzes how change in independent variables affects dependent variables as indeterminate constants remain fixed.

(Section 2.3 has introduced the dummy variable, which the present section's threefold taxonomy seems to exclude. Such a dummy variable does not seem very "independent," of course; but its dependence is on the operator controlling the expression, not on some other variable within the expression. Within the expression, the dummy variable fills the role of an independent variable; without, it fills no role because logically it does not exist there. In fact, most dummy variables are just independent variables—a few are dependent variables—whose scope is restricted to a particular expression. Refer to §§ 2.3 and 7.3.)

2.8 Exponentials and logarithms

In § 2.5 we have considered the power operation z^a, where (in § 2.7's language) the independent variable z is the base and the indeterminate constant a is the exponent. There is another way to view the power operation, however. One can view it as the exponential operation

    a^z,

where the variable z is the exponent and the constant a is the base.

2.8.1 The logarithm

The exponential operation follows the same laws the power operation follows, but because the variable of interest is now the exponent rather than the base, the inverse operation is not the root but rather the logarithm:

    log_a(a^z) = z.    (2.36)

The logarithm log_a w answers the question, "What power must I raise a to, to get w?" Raising a to the power of the last equation, we have that

    a^{log_a(a^z)} = a^z.

With the change of variable w ← a^z, this is

    a^{log_a w} = w.    (2.37)

Hence, the exponential and logarithmic operations mutually invert one another.

2.8.2 Properties of the logarithm

The basic properties of the logarithm include that

    log_a uv = log_a u + log_a v,    (2.38)
    log_a (u/v) = log_a u − log_a v,    (2.39)
    log_a (w^z) = z log_a w,    (2.40)
    w^z = a^{z log_a w},    (2.41)
    log_b w = log_a w / log_a b.    (2.42)

Of these, (2.38) follows from the steps

    uv = (u)(v),
    a^{log_a uv} = (a^{log_a u})(a^{log_a v}) = a^{log_a u + log_a v}.

Equation (2.39) follows by similar reasoning. Equations (2.40) and (2.41) follow from the steps

    w^z = (w^z) = (w)^z,
    w^z = a^{log_a (w^z)} = (a^{log_a w})^z = a^{z log_a w}.

And (2.42) follows from the steps

    w = b^{log_b w},
    log_a w = log_a (b^{log_b w}) = log_b w log_a b.

Among other purposes, (2.38) through (2.42) serve respectively to transform products to sums, quotients to differences, powers to products, exponentials to differently based exponentials, and logarithms to differently based logarithms. Table 2.5 repeats the equations along with (2.36) and (2.37) (which also emerge as restricted forms of eqns. 2.40 and 2.41), thus summarizing the general properties of the logarithm.
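Each of (2.38) through (2.42) can be spot-checked numerically; a brief Python sketch follows, in which the sample values are arbitrary:

    # Numerical spot checks of (2.38) through (2.42).
    import math

    def log_base(a, x):
        return math.log(x) / math.log(a)

    u, v, w, z, a, b = 3.0, 5.0, 7.0, 1.3, 2.0, 10.0
    print(math.isclose(log_base(a, u * v), log_base(a, u) + log_base(a, v)))  # (2.38)
    print(math.isclose(log_base(a, u / v), log_base(a, u) - log_base(a, v)))  # (2.39)
    print(math.isclose(log_base(a, w**z), z * log_base(a, w)))                # (2.40)
    print(math.isclose(w**z, a**(z * log_base(a, w))))                        # (2.41)
    print(math.isclose(log_base(b, w), log_base(a, w) / log_base(a, b)))      # (2.42)
    # each line prints True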

Table 2.5: General properties of the logarithm.

    log_a uv = log_a u + log_a v
    log_a (u/v) = log_a u − log_a v
    log_a (w^z) = z log_a w
    w^z = a^{z log_a w}
    log_b w = log_a w / log_a b
    log_a (a^z) = z
    w = a^{log_a w}

2.9 Triangles and other polygons: simple facts

This section gives simple facts about triangles and other polygons.

2.9.1 The area of a triangle

The area of a right triangle³⁰ is half the area of the corresponding rectangle. This is seen by splitting a rectangle down its diagonal into a pair of right triangles of equal size. The fact that any triangle's area is half its base length times its height is seen by dropping a perpendicular from one point of the triangle to the opposite side (see Fig. 1.1 on page 4), dividing the triangle into two right triangles, for each of which the fact is true. In algebraic symbols,

    A = bh/2,    (2.43)

where A stands for area, b for base length, and h for perpendicular height.

³⁰ A right triangle is a triangle, one of whose three angles is perfectly square.

2.9.2 The triangle inequalities

Any two sides of a triangle together are longer than the third alone, which itself is longer than the difference between the two. In symbols,

    |a − b| < c < a + b,    (2.44)

where a, b and c are the lengths of a triangle's three sides. These are the triangle inequalities. The truth of the sum inequality c < a + b is seen by

sketching some triangle on a sheet of paper and asking: if c is the direct route between two points and a + b is an indirect route, then how can a + b not be longer? Of course the sum inequality is equally good on any of the triangle's three sides, so one can write a < c + b and b < c + a just as well as c < a + b. Rearranging the a and b inequalities, we have that a − b < c and b − a < c, which together say that |a − b| < c. The last is the difference inequality, completing (2.44)'s proof.³¹

2.9.3 The sum of interior angles

A triangle's three interior angles³² sum to 2π/2. One way to see the truth of this fact is to imagine a small car rolling along one of the triangle's sides. Reaching the corner, the car turns to travel along the next side, and so on round all three corners to complete a circuit, returning to the start. Since the car again faces the original direction, we reason that it has turned a total of 2π: a full revolution. But the angle φ the car turns at a corner and the triangle's inner angle ψ there together form the straight angle 2π/2 (the sharper the inner angle, the more the car turns: see Fig. 2.2). In

Figure 2.2: The sum of a triangle's inner angles: turning at the corner. [The car's turning angle φ and the triangle's inner angle ψ together form a straight angle.]

³¹ Section 13.9 proves the triangle inequalities more generally, though regrettably without recourse to this subsection's properly picturesque geometrical argument.

³² Many or most readers will already know the notation 2π and its meaning as the angle of full revolution. For those who do not, the notation is introduced more properly in §§ 3.1, 3.6 and 8.11 below. Briefly, the symbol 2π represents a complete turn, a full circle, a spin to face the same direction as before. Hence 2π/4 represents a square turn or right angle. You may be used to the notation 360° in place of 2π; but, for the reasons explained in Appendix A and in footnote 16 of Ch. 3, this book tends to avoid the former notation.

mathematical notation,

    φ_k + ψ_k = 2π/2,  k = 1, 2, 3,
    φ_1 + φ_2 + φ_3 = 2π,

where ψ_k and φ_k are respectively the triangle's inner angles and the angles through which the car turns. Solving the latter equations for φ_k and substituting into the former yields

    ψ_1 + ψ_2 + ψ_3 = 2π/2,    (2.45)

which was to be demonstrated.

Extending the same technique to the case of an n-sided polygon, we have that

    Σ_{k=1}^{n} φ_k = 2π,
    φ_k + ψ_k = 2π/2.

Solving the latter equations for φ_k and substituting into the former yields

    Σ_{k=1}^{n} (2π/2 − ψ_k) = 2π,

or in other words

    Σ_{k=1}^{n} ψ_k = (n − 2) (2π/2).    (2.46)

Equation (2.45) is then seen to be a special case of (2.46) with n = 3.

2.10 The Pythagorean theorem

Along with Euler's formula (5.12), the fundamental theorem of calculus (7.2), Cauchy's integral formula (8.29) and Fourier's equation (18.1), the Pythagorean theorem is one of the most famous results in all of mathematics. The theorem holds that

    a^2 + b^2 = c^2,    (2.47)

where a, b and c are the lengths of the legs and diagonal of a right triangle, as in Fig. 2.3. Many proofs of the theorem are known.

Figure 2.3: A right triangle. [Legs a and b, diagonal c.]

Figure 2.4: The Pythagorean theorem. [A tilted square of side c inscribed in a square of side a + b.]

One such proof posits a square of side length a + b with a tilted square of side length c inscribed as in Fig. 2.4. The area of each of the four triangles in the figure is evidently ab/2. The area of the tilted inner square is c^2. The area of the large outer square is (a + b)^2. But the large outer square is comprised of the tilted inner square plus the four triangles, hence the area of the large outer square equals the area of the tilted inner square plus the areas of the four triangles. In mathematical symbols, this is

    (a + b)^2 = c^2 + 4(ab/2),

which equation expands directly to yield (2.47).³³

The Pythagorean theorem is readily extended to three dimensions as

    a^2 + b^2 + h^2 = r^2,    (2.48)

where h is an altitude perpendicular to both a and b, and where r is the corresponding three-dimensional diagonal: the diagonal of the right triangle whose legs are c and h. Inasmuch as (2.47) applies to any right triangle, it follows that c^2 + h^2 = r^2, which simplifies directly to (2.48).

³³ This elegant proof is far simpler than the one famously given by the ancient geometer Euclid, yet more appealing than alternate proofs often found in print. Unfortunately the citation is now long lost, but the present writer encountered the proof this section gives somewhere years ago and has never seen it in print since, so can claim no credit for originating it. Whether Euclid was acquainted with the simple proof given here this writer does not know, but it is possible [77, "Pythagorean theorem," 02:32, 31 March 2006] that Euclid chose his proof because it comported better with the restricted set of geometrical elements he permitted himself to work with. Be that as it may, a current, electronic source for the proof is [77] as cited earlier in this footnote.

2.11 Functions

This book is not the place for a gentle introduction to the concept of the function. Briefly, however, a function is a mapping from one number (or vector of several numbers) to another. For example, f(x) = x^2 − 1 is a function which maps 1 to 0 and −3 to 8, among others.

One often speaks of domains and ranges when discussing functions. The domain of a function is the set of numbers one can put into it. The range of a function is the corresponding set of numbers one can get out of it. In the example, if the domain is restricted to real x such that |x| ≤ 3, then the corresponding range is −1 ≤ f(x) ≤ 8.

Other terms which arise when discussing functions are root (or zero), singularity and pole. A root (or zero) of a function is a domain point at which the function evaluates to zero (the example has roots at x = ±1). A singularity of a function is a domain point at which the function's output diverges; that is, where the function's output is infinite.³⁴ A pole is a singularity that behaves locally like 1/x (rather than, for example, like 1/√x). A singularity that behaves as 1/x^N is a multiple pole, which (§ 9.6.2) can be thought of as N poles. The example's function f(x) has no singularities for finite x; the function h(x) = 1/(x^2 − 1), however, has poles at x = ±1.³⁵

The notation f⁻¹(·) indicates the inverse of the function f(·) such that

    f⁻¹[f(x)] = x = f[f⁻¹(x)],    (2.49)

thus swapping the function's range with its domain. Though only a function which never maps distinct values from its domain together onto a single value in its range is strictly invertible—which means that the f(x) = x^2 − 1 of the example is not strictly invertible, since it maps both x = −u and x = u onto f(x) = u^2 − 1—it often does not pay to interpret the requirement too rigidly, for context tends to choose between available values (see for instance § 8.5). Inconsistently, inversion's notation f⁻¹(·) clashes with the similar-looking but different-meaning notation f²(·) ≡ [f(·)]², whereas f⁻¹(·) ≠ [f(·)]⁻¹: both notations are conventional and both are used in this book.

³⁴ Here is one example of the book's deliberate lack of formal mathematical rigor. A more precise formalism to say that "the function's output is infinite" might be lim_{x→xo} |f(x)| = ∞, and yet preciser formalisms than this are conceivable, and occasionally even are useful. Be that as it may, the applied mathematician tends to avoid such formalisms where there seems no immediate need for them.

³⁵ Besides the root, the singularity and the pole, there is also the troublesome branch point, an infamous example of which is z = 0 in the function g[z] = √z. Branch points are important, but the book must lay a more extensive foundation before introducing them properly in § 8.5. (There is further the essential singularity, an example of which is z = 0 in p(z) = exp(1/z), but the best way to handle such unreasonable singularities is almost always to change a variable, as w ← 1/z, or otherwise to frame the problem such that one need not approach the singularity. Except tangentially much later when it treats asymptotic series, this book will have little to say of such singularities.)

2. one summer afternoon. imaginary numbers are substantial. then the period separating your fourteenth and twenty-ﬁrst birthdays could have been measured as −i7 iyears. never directly from counting things. The unpersuaded reader is asked to suspend judgment a while.50) and since the quantity i thus deﬁned is found to be critically important across broad domains of higher mathematics. 37 For some inscrutable reason. Imaginary numbers are not to mathematics as. He will soon see the use. 36 . or absolute value) |z| of the complex number is deﬁned to be the length ρ in Fig. whereas rising engineers take the mathematicians’ classes at school and then. The reason imaginary numbers are called “imaginary” probably has to do with the fact that they emerge from mathematical operations only. Since there exists no real number i such that i2 = −1 (2. let us call it a useful formalism.— neither of us has ever met just 1). but the curiously absolute character of the notational split is interesting as a social phenomenon. The writer has his preference between the two notations and this book reﬂects it. however. imaginary elfs are (presumably) not substantial objects. we accept (2. The sum of a real number x and an imaginary number iy is the complex number z = x + iy. say. promptly start writing z ∗ for the rest of their lives. Madness? No. Notice. The magnitude (or modulus.36 Imaginary numbers are given their own number line. imaginary elfs are to the physical world. plotted at right angles to the familiar real number line as in Fig. one chair. in the author’s country at least. let us not call it that. However.12 Complex numbers (introduction) Section 2.5.5. having passed the classes.5. but perhaps not of quite the right concept in this usage. The number i is just a concept. What it has not done is to tell us √ how to regard a quantity such as −1. that the number 1/2 never emerges directly from counting things. The conjugate z ∗ of this complex number is deﬁned to be37 z ∗ = x − iy. professional mathematicians seem universally to write z instead of z ∗ . CLASSICAL ALGEBRA AND GEOMETRY 2. which per the Pythagorean theorem The English word imaginary is evocative. If for some reason the iyear were oﬀered as a unit of time. either. 2. rather.50) as the deﬁnition of a fundamentally new kind of quantity: the imaginary number.2 has introduced square roots. but then so is the number 1 (though you and I have often met one of something—one apple. etc. in the mathematical realm.42 CHAPTER 2. of course. In the physical world. The word imaginary in the mathematical sense is thus more of a technical term than a descriptive adjective.

Figure 2.5: The complex (or Argand) plane, and a complex number z therein. [Axes ℜ(z) and iℑ(z), marked −2, −1, 1, 2 and −i2, −i, i, i2; the number z lies at radius ρ and angle φ.]

The conjugate z* of this complex number is defined to be³⁷

    z* = x − iy.

The magnitude (or modulus, or absolute value) |z| of the complex number is defined to be the length ρ in Fig. 2.5, which per the Pythagorean theorem (§ 2.10) is such that

    |z|^2 = x^2 + y^2.    (2.51)

The phase arg z of the complex number is defined to be the angle φ in the figure, which in terms of the trigonometric functions of § 3.1³⁸ is such that

    tan(arg z) = y/x.    (2.52)

Specifically to extract the real and imaginary parts of a complex number, the notations

    ℜ(z) = x,
    ℑ(z) = y,    (2.53)

are conventionally recognized (although often the symbols ℜ[·] and ℑ[·] are written Re[·] and Im[·], particularly when printed by hand).

³⁷ For some inscrutable reason, in the author's country at least, professional mathematicians seem universally to write z̄ instead of z*, whereas rising engineers take the mathematicians' classes at school and then, having passed the classes, promptly start writing z* for the rest of their lives. The writer has his preference between the two notations and this book reflects it, but the curiously absolute character of the notational split is interesting as a social phenomenon.

³⁸ This is a forward reference. If the equation doesn't make sense to you yet for this reason, skip it for now. The important point is that arg z is the angle φ in the figure.

2.12.1 Multiplication and division of complex numbers in rectangular form

Several elementary properties of complex numbers are readily seen if the fact that i^2 = −1 is kept in mind, including that

    z1 z2 = (x1 + iy1)(x2 + iy2) = (x1 x2 − y1 y2) + i(y1 x2 + x1 y2);    (2.54)

    z1 / z2 = (x1 + iy1)/(x2 + iy2) = [(x2 − iy2)/(x2 − iy2)] [(x1 + iy1)/(x2 + iy2)]
            = [(x1 x2 + y1 y2) + i(y1 x2 − x1 y2)] / (x2^2 + y2^2).    (2.55)

It is a curious fact, trivially proved, that

    1/i = −i.    (2.56)

It is a useful fact that

    z* z = x^2 + y^2 = |z|^2    (2.57)

(the curious fact, eqn. 2.56, is useful, too).

2.12.2 Complex conjugation

An important property of complex numbers descends subtly from the fact that

    i^2 = −1 = (−i)^2.    (2.58)

If one defined some number j ≡ −i, claiming that j not i were the true imaginary unit,³⁹ then one would find that

    (−j)^2 = −1 = j^2,

and thus that all the basic properties of complex numbers in the j system held just as well as they did in the i system. The units i and j would differ indeed, but would perfectly mirror one another in every respect.

³⁹ [22, § I:22-5]

That is the basic idea. To establish it symbolically needs a page or so of slightly abstract algebra as follows, the goal of which will be to show that [f(z)]* = f(z*) for some unspecified function f(z) with specified properties. To begin with, if

    z = x + iy,

then by definition,

    z* = x − iy.

Proposing that (z^{k−1})* = (z*)^{k−1} (which may or may not be true but for the moment we assume it), we can write

    z^{k−1} = s_{k−1} + i t_{k−1},
    (z*)^{k−1} = s_{k−1} − i t_{k−1},

where s_{k−1} and t_{k−1} are symbols introduced to represent the real and imaginary parts of z^{k−1}. Multiplying the former equation by z = x + iy and the latter by z* = x − iy, we have that

    z^k = (x s_{k−1} − y t_{k−1}) + i(y s_{k−1} + x t_{k−1}),
    (z*)^k = (x s_{k−1} − y t_{k−1}) − i(y s_{k−1} + x t_{k−1}).

With the definitions s_k ≡ x s_{k−1} − y t_{k−1} and t_k ≡ y s_{k−1} + x t_{k−1}, this is written more succinctly

    z^k = s_k + i t_k,
    (z*)^k = s_k − i t_k.

In other words, if (z^{k−1})* = (z*)^{k−1}, then it necessarily follows that (z^k)* = (z*)^k. Solving the definitions of s_k and t_k for s_{k−1} and t_{k−1} yields the reverse definitions s_{k−1} = (x s_k + y t_k)/(x^2 + y^2) and t_{k−1} = (−y s_k + x t_k)/(x^2 + y^2). Therefore, except when z = x + iy happens to be null or infinite, the implication is reversible by reverse reasoning, so by mathematical induction⁴⁰ we have that

    (z^k)* = (z*)^k    (2.59)

for all integral k. We have also from (2.54) that

    (z1 z2)* = z1* z2*    (2.60)

for any complex z1 and z2.⁴¹

⁴⁰ Mathematical induction is an elegant old technique for the construction of mathematical proofs. Section 8.1 elaborates on the technique and offers a more extensive example. Beyond the present book, a very good introduction to mathematical induction is found in [30].

⁴¹ [30][67]
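Python's built-in complex type makes (2.59) and (2.60) easy to spot-check numerically; the sampled values below are arbitrary:

    # Spot checks of (2.59) and (2.60) using Python's complex arithmetic.
    z = 3 + 4j
    for k in range(-3, 4):
        assert abs((z**k).conjugate() - z.conjugate()**k) < 1e-12
    z1, z2 = 1 + 2j, 2 + 3j
    assert (z1 * z2).conjugate() == z1.conjugate() * z2.conjugate()
    print("ok")  # both identities hold for the sampled values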

2.12.3 Power series and analytic functions (preview)

Equation (2.61) below is a general power series⁴² in z − zo. Such power series have broad application.⁴³ The interesting part is not so much in the general form (2.61) of the series as it is in the specific choice of a_k and b_k, which this section does not discuss. In fact the general power series is a sort of one-size-fits-all mathematical latex glove, which can be stretched to fit around almost any function.

Consequences of (2.59) and (2.60) include that if

    f(z) ≡ Σ_{k=−∞}^{∞} (a_k + i b_k)(z − zo)^k,    (2.61)
    f*(z) ≡ Σ_{k=−∞}^{∞} (a_k − i b_k)(z − zo*)^k,    (2.62)

where a_k and b_k are real and imaginary parts of the coefficients peculiar to the function f(·), then

    [f(z)]* = f*(z*).    (2.63)

In the common case where all b_k = 0 and zo = xo is a real number, then f(·) and f*(·) are the same function, so (2.63) reduces to the desired form

    [f(z)]* = f(z*),    (2.64)

which says that the effect of conjugating the function's input is merely to conjugate its output.

Equation (2.64) expresses a significant, general rule of complex numbers and complex variables which is better explained in words than in mathematical symbols. The rule is this: for most equations and systems of equations used to model physical systems, one can produce an equally valid alternate model simply by simultaneously conjugating all the complex quantities present.

⁴² [30][67]

⁴³ That is a pretty impressive-sounding statement: "Such power series have broad application." However, molecules, air and words also have "broad application"; merely stating the fact does not tell us much.

The property (2.63) applies to all such functions, with (2.64) also applying to those for which b_k = 0 and z_o = x_o. Such functions, whether b_k = 0 and z_o = x_o or not, are analytic functions (§ 8.4).⁴⁵

The property the two equations (2.63) and (2.64) represent is called the conjugation property. Basically, it says that if one replaces all the i in some mathematical model with −i, then the resulting conjugate model is equally as valid as the original.

There nevertheless exist one common group of functions which cannot be constructed as power series. These all have to do with the parts of complex numbers and have been introduced in this very section: the magnitude |·|; the phase arg(·); the conjugate (·)*; and the real and imaginary parts ℜ(·) and ℑ(·). These functions are not analytic and do not in general obey the conjugation property. Also not analytic are the Heaviside unit step u(t) and the Dirac delta δ(t) (§ 7.7), used to model discontinuities explicitly.

We shall have more to say about analytic functions in Ch. 8. We shall have more to say about complex numbers in §§ 3.10, 3.11, 4.3.3 and 4.4, and much more yet in Ch. 5.

⁴⁵ In the formal mathematical definition, a function is analytic which is infinitely differentiable (Ch. 4) in the immediate domain neighborhood of interest. However, for applications a fair working definition of the analytic function might be "a function expressible as a power series." Chapter 8 elaborates. All power series are infinitely differentiable except at their poles.


Chapter 3

Trigonometry

Trigonometry is the branch of mathematics which relates angles to lengths. This chapter introduces the trigonometric functions and derives their several properties.

3.1 Definitions

Consider the circle-inscribed right triangle of Fig. 3.1.

In considering the circle, we will find some terminology useful: the angle φ in the diagram is measured in radians, where a radian is the angle which, when centered in a unit circle, describes an arc of unit length.¹ Measured in radians, an angle φ intercepts an arc of curved length ρφ on a circle of radius ρ (that is, of distance ρ from the circle's center to its perimeter). The angle of full revolution is given the symbol 2π—which thus is the circumference of a unit circle.² A quarter revolution, 2π/4, is then the right angle, or square angle.

The trigonometric functions sin φ and cos φ (the "sine" and "cosine" of φ) relate the angle φ to the lengths shown in Fig. 3.1. The tangent function is then defined as

tan φ ≡ sin φ / cos φ,   (3.1)

¹ The word "unit" means "one" in this context. A unit circle is a circle of radius 1; a unit length is a length of 1 (not one centimeter or one mile, just an abstract 1). An angle in radians is a dimensionless number, so one need not write "φ = 2π/4 radians"; it suffices to write "φ = 2π/4." In mathematical theory, we express angles in radians.

² Section 8.11 computes the numerical value of 2π.

Figure 3.1: The sine and the cosine (shown on a circle-inscribed right triangle, with the circle centered at the triangle's point; the triangle's legs measure ρ cos φ and ρ sin φ, and its diagonal ρ).

which is the "rise" per unit "run," or slope, of the triangle's diagonal.³

Inverses of the three trigonometric functions can also be defined:

arcsin(sin φ) = φ,
arccos(cos φ) = φ,
arctan(tan φ) = φ.

When the last of these is written in the form

arctan (y/x),

it is normally implied that x and y are to be interpreted as rectangular coordinates⁴ and that the arctan function is to return φ in the correct quadrant −π < φ ≤ π (for example, arctan[1/(−1)] = [+3/8][2π], whereas arctan[(−1)/1] = [−1/8][2π]). This is similarly the usual interpretation when an equation like tan φ = y/x is written.

³ Often seen in print is the additional notation sec φ ≡ 1/cos φ, csc φ ≡ 1/sin φ and cot φ ≡ 1/tan φ, respectively the "secant," "cosecant" and "cotangent." This book does not use the notation.

⁴ Rectangular coordinates are pairs of numbers (x, y) which uniquely specify points in a plane. Conventionally, the x coordinate indicates distance eastward; the y coordinate, northward. For instance, the coordinates (3, −4) mean the point three units eastward and four units southward (that is, −4 units northward) from the origin (0, 0). A third rectangular coordinate can also be added—(x, y, z)—where the z indicates distance upward.
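Most programming languages expose the quadrant-correct arctangent as a two-argument function which accepts y and x separately, just as the text's convention requires. The Python lines below (a minimal illustration; the values are arbitrary) contrast it with the naive one-argument arctangent:

    import math
    print(math.atan2(1, -1))    # quadrant-correct: 3*pi/4, i.e. [+3/8][2*pi]
    print(math.atan(1 / -1))    # naive: -pi/4, i.e. [-1/8][2*pi]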

3.2 Simple properties

By the Pythagorean theorem (§ 2.10), it is seen generally that⁵

cos² φ + sin² φ = 1.   (3.2)

Inspecting Fig. 3.1 and observing (3.1) and (3.2), one readily discovers the several simple trigonometric properties of Table 3.1.

Figure 3.2 plots the sine function; the shape in the plot is called a sinusoid.

Figure 3.2: The sine function (sin t plotted against t).

3.3 Scalars, vectors, and vector notation

In applied mathematics, a vector is an amplitude of some kind coupled with a direction.⁶ For example, "55 miles per hour northwestward" is a vector, as is the entity u depicted in Fig. 3.3. The entity v depicted in Fig. 3.4 is also a vector, in this case a three-dimensional one.

⁵ The notation cos² φ means (cos φ)².

⁶ The same word vector is also used to indicate an ordered set of N scalars (§ 8.16) or an N × 1 matrix (Ch. 11), but those are not the uses of the word meant here. See also the introduction to Ch. 15.

Table 3.1: Simple properties of the trigonometric functions.

sin(−φ) = −sin φ                 cos(−φ) = +cos φ
sin(2π/4 − φ) = +cos φ           cos(2π/4 − φ) = +sin φ
sin(2π/2 − φ) = +sin φ           cos(2π/2 − φ) = −cos φ
sin(φ ± 2π/4) = ±cos φ           cos(φ ± 2π/4) = ∓sin φ
sin(φ ± 2π/2) = −sin φ           cos(φ ± 2π/2) = −cos φ
sin(φ + n2π) = sin φ             cos(φ + n2π) = cos φ

tan(−φ) = −tan φ
tan(2π/4 − φ) = +1/tan φ
tan(2π/2 − φ) = −tan φ
tan(φ ± 2π/4) = −1/tan φ
tan(φ ± 2π/2) = +tan φ
tan(φ + n2π) = tan φ

tan φ = sin φ / cos φ
cos² φ + sin² φ = 1
1 + tan² φ = 1/cos² φ
1 + 1/tan² φ = 1/sin² φ

Figure 3.3: A two-dimensional vector u = x̂x + ŷy, shown with its rectangular components.

Figure 3.4: A three-dimensional vector v = x̂x + ŷy + ẑz.

Many readers will already find the basic vector concept familiar, but for those who do not, a brief review: Vectors such as the

u = x̂x + ŷy,
v = x̂x + ŷy + ẑz

of the figures are composed of multiples of the unit basis vectors x̂, ŷ and ẑ, which themselves are vectors of unit length pointing in the cardinal directions their respective symbols suggest.⁷ Any vector a can be factored into an amplitude a and a unit vector â, as

a = âa = â|a|,

where the â represents direction only and has unit magnitude by definition, and where the a or |a| represents amplitude only and carries the physical units if any.⁸ For example, a = 55 miles per hour, â = northwestward. The unit vector â itself can be expressed in terms of the unit basis vectors: for example, if x̂ points east and ŷ points north, then â = x̂(−1/√2) + ŷ(1/√2), where per the Pythagorean theorem (−1/√2)² + (1/√2)² = 1².

A single number which is not a vector or a matrix (Ch. 11) is called a scalar. In the example, a = 55 miles per hour is a scalar. Though the scalar a in the example happens to be real, scalars can be complex, too—which might surprise one, since scalars by definition lack direction and the Argand phase φ of Fig. 2.5 so strongly resembles a direction. However, phase is not an actual direction in the vector sense (the real number line

⁷ Printing by hand, one customarily writes a general vector like u with an arrow or bar drawn over the letter, and a unit vector like x̂ with the circumflex ("hat") drawn over it.

⁸ The word "unit" here is unfortunately overloaded. As an adjective in mathematics, or in its nounal form "unity," it refers to the number one (1)—not one mile per hour, one kilogram, one Japanese yen or anything like that. The word "unit" itself as a noun however usually signifies a physical or financial reference quantity of measure, like a mile per hour, a kilogram or even a Japanese yen. There is no inherent mathematical unity to 1 mile per hour (otherwise known as 0.447 meters per second, among other names). By contrast, a "unitless 1"—a 1 with no physical unit attached—does represent mathematical unity. In applications, one often puts physical quantities in ratio precisely to strip the physical units from them, comparing the ratio to unity without regard to physical units. Consider the ratio r = h₁/h_o of your height h₁ to my height h_o. Maybe you are taller than I am and so r = 1.05 (not 1.05 cm or 1.05 feet, just 1.05). That r > 1 means neither more nor less than that you are taller than I am. Now consider the ratio h₁/h₁ of your height to your own height. That ratio is of course unity, exactly 1, just an abstract 1. There is nothing ephemeral in the concept of mathematical unity, nor in the concept of unitless quantities in general; the concept is quite straightforward and entirely practical.

in the Argand plane cannot be said to run west-to-east, or anything like that). The x, y and z of Fig. 3.4 are each (possibly complex) scalars; v = x̂x + ŷy + ẑz is a vector. If x, y and z are complex, then⁹

|v|² = |x|² + |y|² + |z|² = x*x + y*y + z*z
     = [ℜ(x)]² + [ℑ(x)]² + [ℜ(y)]² + [ℑ(y)]² + [ℜ(z)]² + [ℑ(z)]².   (3.3)

A point is sometimes identified by the vector expressing its distance and direction from the origin of the coordinate system: that is, the point (x, y) can be identified with the vector x̂x + ŷy. However, in the general case vectors are not associated with any particular origin; they represent distances and directions, not fixed positions.

Notice the relative orientation of the axes in Fig. 3.4. The axes are oriented such that if you point your flat right hand in the x direction, then bend your fingers in the y direction and extend your thumb, the thumb then points in the z direction. This is orientation by the right-hand rule. A left-handed orientation is equally possible, but as neither orientation has a natural advantage over the other, we arbitrarily but conventionally accept the right-handed one as standard.¹⁰

Sections 3.4 and 3.9, and Chs. 15 and 16, speak further of the vector.

3.4 Rotation

A fundamental problem in trigonometry arises when a vector

u = x̂x + ŷy   (3.4)

must be expressed in terms of alternate unit vectors x̂′ and ŷ′, where x̂′ and ŷ′ stand at right angles to one another and lie in the plane¹¹ of x̂ and ŷ

⁹ Some books print |v| as ‖v‖ or even ‖v‖₂ to emphasize that it represents the real, scalar magnitude of a complex vector. The reason the last notation subscripts a numeral 2 is obscure, having to do with the professional mathematician's generalized definition of a thing he calls the "norm." This book just renders it |v|.

¹⁰ The writer does not know the etymology for certain, but verbal lore in American engineering has it that the name "right-handed" comes from experience with a standard right-handed wood screw or machine screw. If you hold the screwdriver in your right hand and turn the screw in the natural manner clockwise, turning the screw slot from the x orientation toward the y, the screw advances away from you in the z direction into the wood or hole. If somehow you came across a left-handed screw, you'd probably find it easier to drive that screw with the screwdriver in your left hand.

¹¹ A plane, as the reader on this tier undoubtedly knows, is a flat (but not necessarily level) surface, infinite in extent unless otherwise specified. Space is three-dimensional. A plane is two-dimensional. A line is one-dimensional. A point is zero-dimensional. The plane belongs to this geometrical hierarchy.

Figure 3.5: Vector basis rotation (the new basis [x̂′ ŷ′] rotated by the angle φ from the old [x̂ ŷ], with a vector u drawn against both).

but are rotated from the latter by an angle φ as depicted in Fig. 3.5.¹² In terms of the trigonometric functions of § 3.1, evidently

x̂′ = +x̂ cos φ + ŷ sin φ,
ŷ′ = −x̂ sin φ + ŷ cos φ;   (3.5)

and by appeal to symmetry it stands to reason that

x̂ = +x̂′ cos φ − ŷ′ sin φ,
ŷ = +x̂′ sin φ + ŷ′ cos φ.   (3.6)

Substituting (3.6) into (3.4) yields

u = x̂′(x cos φ + y sin φ) + ŷ′(−x sin φ + y cos φ),   (3.7)

which was to be derived.

Equation (3.7) finds general application where rotations in rectangular coordinates are involved. If the question is asked, "what happens if I rotate

¹² The " ′ " mark is pronounced "prime" or "primed" (for no especially good reason of which the author is aware, but anyway, that's how it's pronounced). Mathematical writing employs the mark for a variety of purposes. Here, the mark merely distinguishes the new unit vector x̂′ from the old x̂.
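The coordinate transformation (3.7) translates directly into code. Here is a minimal Python sketch (the function name is the writer's own, chosen for illustration) giving a vector's coordinates in the rotated basis:

    import math

    def in_rotated_basis(x, y, phi):
        # Coordinates of u = (x, y) in a basis rotated by phi, per (3.7).
        return (x*math.cos(phi) + y*math.sin(phi),
                -x*math.sin(phi) + y*math.cos(phi))

    # Rotating the basis by the vector's own angle aligns x' with u:
    print(in_rotated_basis(1.0, 1.0, math.pi/4))   # ~ (1.414, 0.0)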

not the unit basis vectors but rather the vector u instead?" the answer is that it amounts to the same thing, except that the sense of the rotation is reversed:

u′ = x̂(x cos φ − y sin φ) + ŷ(x sin φ + y cos φ).   (3.8)

Whether it is the basis or the vector which rotates thus depends on your point of view.¹³ Much later in the book, § 15.1 will extend rotation in two dimensions to reorientation in three dimensions.

3.5 Trigonometric functions of sums and differences of angles

With the results of § 3.4 in hand, we now stand in a position to consider trigonometric functions of sums and differences of angles. Let

â ≡ x̂ cos α + ŷ sin α,
b̂ ≡ x̂ cos β + ŷ sin β,

be vectors of unit length in the xy plane, respectively at angles α and β from the x axis. If we wanted b̂ to coincide with â, we would have to rotate it by φ = α − β. According to (3.8) and the definition of b̂, if we did this we would obtain

b̂′ = x̂[cos β cos(α − β) − sin β sin(α − β)]
   + ŷ[cos β sin(α − β) + sin β cos(α − β)].

Since we have deliberately chosen the angle of rotation such that b̂′ = â, we can separately equate the x̂ and ŷ terms in the expressions for â and b̂′ to obtain the pair of equations

cos α = cos β cos(α − β) − sin β sin(α − β),
sin α = cos β sin(α − β) + sin β cos(α − β).

Solving the last pair simultaneously¹⁴ for sin(α − β) and cos(α − β) and observing that sin²(·) + cos²(·) = 1 yields

sin(α − β) = sin α cos β − cos α sin β,
cos(α − β) = cos α cos β + sin α sin β.   (3.9)

¹³ This is only true, of course, with respect to the vectors themselves. When one actually rotates a physical body, the body experiences forces during rotation which might or might not change the body internally in some relevant way.

¹⁴ The easy way to do this is

• to subtract sin β times the first equation from cos β times the second, then to solve the result for sin(α − β);

• to add cos β times the first equation to sin β times the second, then to solve the result for cos(α − β).

This shortcut technique for solving a pair of equations simultaneously for a pair of variables is well worth mastering. In this book alone, it proves useful many times.

With the change of variable β ← −β and the observations from Table 3.1 that sin(−φ) = −sin φ and cos(−φ) = +cos(φ), eqns. (3.9) become

sin(α + β) = sin α cos β + cos α sin β,
cos(α + β) = cos α cos β − sin α sin β.   (3.10)

Equations (3.9) and (3.10) are the basic formulas for trigonometric functions of sums and differences of angles.

3.5.1 Variations on the sums and differences

Several useful variations on (3.9) and (3.10) are achieved by combining the equations in various straightforward ways.¹⁵ These include

sin α sin β = [cos(α − β) − cos(α + β)]/2,
sin α cos β = [sin(α − β) + sin(α + β)]/2,
cos α cos β = [cos(α − β) + cos(α + β)]/2.   (3.11)

With the change of variables δ ← α − β and γ ← α + β, (3.9) and (3.10) become

sin δ = sin[(γ + δ)/2] cos[(γ − δ)/2] − cos[(γ + δ)/2] sin[(γ − δ)/2],
cos δ = cos[(γ + δ)/2] cos[(γ − δ)/2] + sin[(γ + δ)/2] sin[(γ − δ)/2],
sin γ = sin[(γ + δ)/2] cos[(γ − δ)/2] + cos[(γ + δ)/2] sin[(γ − δ)/2],
cos γ = cos[(γ + δ)/2] cos[(γ − δ)/2] − sin[(γ + δ)/2] sin[(γ − δ)/2].

¹⁵ Refer to footnote 14 above for the technique.
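As a quick numerical sanity check of (3.9), (3.10) and (3.11), one may compare both sides in Python; the angles below are the writer's arbitrary choices.

    import math
    a, b = 0.7, 0.3   # arbitrary angles alpha and beta, in radians
    print(math.sin(a - b), math.sin(a)*math.cos(b) - math.cos(a)*math.sin(b))
    print(math.cos(a + b), math.cos(a)*math.cos(b) - math.sin(a)*math.sin(b))
    print(math.sin(a)*math.sin(b), (math.cos(a - b) - math.cos(a + b))/2)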

Combining these in various ways, we have that

sin γ + sin δ = 2 sin[(γ + δ)/2] cos[(γ − δ)/2],
sin γ − sin δ = 2 cos[(γ + δ)/2] sin[(γ − δ)/2],
cos δ + cos γ = 2 cos[(γ + δ)/2] cos[(γ − δ)/2],
cos δ − cos γ = 2 sin[(γ + δ)/2] sin[(γ − δ)/2].   (3.12)

3.5.2 Trigonometric functions of double and half angles

If α = β, then eqns. (3.10) become the double-angle formulas

sin 2α = 2 sin α cos α,
cos 2α = 2 cos² α − 1 = cos² α − sin² α = 1 − 2 sin² α.   (3.13)

Solving (3.13) for sin² α and cos² α yields the half-angle formulas

sin² α = (1 − cos 2α)/2,
cos² α = (1 + cos 2α)/2.   (3.14)

3.6 Trigonometrics of the hour angles

In general one uses the Taylor series of Ch. 8 to calculate trigonometric functions of specific angles. However, for angles which happen to be integral multiples of an hour—there are twenty-four or 0x18 hours in a circle, just as there are twenty-four or 0x18 hours in a day¹⁶—simpler expressions exist. Figure 3.6 shows the angles. Since such angles arise very frequently in practice, it seems worth our while to study them specially.

Table 3.2 tabulates the trigonometric functions of these hour angles. To see how the values in the table are calculated, look at the square and the

¹⁶ Hence an hour is 15°—but you weren't going to write your angles in such inelegant conventional notation as "15°," were you? Well, if you were, you're in good company. The author is fully aware of the barrier the unfamiliar notation poses for most first-time readers of the book. The barrier is erected neither lightly nor disrespectfully. Consider:

• There are 0x18 hours in a circle.

• There are 360 degrees in a circle.

Both sentences say the same thing, don't they? But even though the "0x" hex prefix is a bit clumsy, the first sentence nevertheless says the thing rather better. It is hoped that after trying the notation a while, the reader will approve the choice. The hex and hour notations are recommended mostly only for theoretical math work; it is not claimed that they offer much benefit in most technical work of the less theoretical kinds. If you wrote an engineering memo describing a survey angle as 0x1.80 hours instead of 22.5 degrees, you'd probably not like the reception the memo got. Nonetheless, the improved notation fits a book of this kind so well that the author hazards it. The reader is urged to invest the attention and effort to master the notation.

There is a psychological trap regarding the hour. The familiar, standard clock face shows only twelve hours not twenty-four, so the angle between eleven o'clock and twelve on the clock face is not an hour of arc! That angle is two hours of arc. This is so because the clock face's geometry is artificial. If you have ever been to the Old Royal Observatory at Greenwich, England, you may have seen the big clock face there with all twenty-four hours on it; it'd be a bit hard to read the time from such a crowded clock face were it not so big, but anyway, the angle between hours on the Greenwich clock is indeed an honest hour of arc. [9]

Figure 3.6: The 0x18 hours in a circle.

Table 3.2: Trigonometric functions of the hour angles.

ANGLE φ
[radians]       [hours]   sin φ           tan φ            cos φ
0                0        0               0                1
2π/0x18          1        (√3−1)/(2√2)    (√3−1)/(√3+1)    (√3+1)/(2√2)
2π/0xC           2        1/2             1/√3             √3/2
2π/8             3        1/√2            1                1/√2
2π/6             4        √3/2            √3               1/2
(5)(2π)/0x18     5        (√3+1)/(2√2)    (√3+1)/(√3−1)    (√3−1)/(2√2)
2π/4             6        1               ∞                0
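The closed forms of Table 3.2 can be checked against floating-point arithmetic. The brief Python lines below (written only for this check; one hour of arc is 2π/0x18 radians) compare the one-hour entries with the library functions:

    import math
    hour = 2*math.pi / 0x18    # one hour of arc
    print(math.sin(hour), (math.sqrt(3) - 1) / (2*math.sqrt(2)))
    print(math.cos(hour), (math.sqrt(3) + 1) / (2*math.sqrt(2)))
    print(math.tan(hour), (math.sqrt(3) - 1) / (math.sqrt(3) + 1))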

Figure 3.7: A square and an equilateral triangle for calculating trigonometric functions of the hour angles (the square with sides of unit length and diagonal √2; the triangle with sides of unit length, halved bottom leg of lengths 1/2 and 1/2, and perpendicular √3/2).

equilateral triangle¹⁷ of Fig. 3.7. Each of the square's four angles naturally measures six hours, and the diagonal splits the square's corner into equal halves of three hours each. Since a triangle's angles always total twelve hours (§ 2.9.3), by symmetry each of the angles of the equilateral triangle in the figure measures four. Also by symmetry, the perpendicular splits the triangle's top angle into equal halves of two hours each and its bottom leg into equal segments of length 1/2 each. The Pythagorean theorem (§ 2.10) then supplies the various other lengths in the figure, after which we observe

¹⁷ An equilateral triangle is, as the name and the figure suggest, a triangle whose three sides all have the same length.

from Fig. 3.1 that

• the sine of a non-right angle in a right triangle is the opposite leg's length divided by the diagonal's;

• the tangent is the opposite leg's length divided by the adjacent leg's; and

• the cosine is the adjacent leg's length divided by the diagonal's.

With this observation and the lengths in the figure, one can calculate the sine, tangent and cosine of angles of two, three and four hours. The values for zero and six hours are, of course, seen by inspection. The values for one and five hours are found by applying (3.9) and (3.10) against the values for two and three hours just calculated.¹⁸

Figure 3.8: The laws of sines and cosines (a triangle with sides a, b and c, interior angles α, β and γ, and a perpendicular of height h).

3.7 The laws of sines and cosines

Refer to the triangle of Fig. 3.8. By the definition of the sine function, one can write that

c sin β = h = b sin γ,

or in other words that

sin β / b = sin γ / c.

¹⁸ The creative reader may notice that he can extend the table to any angle by repeated application of the various sum, difference and half-angle formulas from the preceding sections to the values already in the table. However, the Taylor series (§ 8.9) offers a cleaner, quicker way to calculate trigonometrics of non-hour angles.

But there is nothing special about β and γ; what is true for them must be true for α, too.¹⁹ Hence,

sin α / a = sin β / b = sin γ / c.   (3.15)

This equation is known as the law of sines.

On the other hand, if one expresses a and b as vectors emanating from the point γ,²⁰

a = x̂a,
b = x̂b cos γ + ŷb sin γ,

then

c² = |b − a|²
   = (b cos γ − a)² + (b sin γ)²
   = a² + (b²)(cos² γ + sin² γ) − 2ab cos γ.

Since cos²(·) + sin²(·) = 1, this is

c² = a² + b² − 2ab cos γ,   (3.16)

known as the law of cosines.

3.8 Summary of properties

Table 3.2 on page 61 has listed the values of trigonometric functions of the hour angles. Table 3.1 on page 52 has summarized simple properties of the trigonometric functions. Table 3.3 summarizes further properties, gathering them from §§ 3.4, 3.5 and 3.7.

¹⁹ "But," it is objected, "there is something special about α. The perpendicular h drops from it." True. However, the h is just a utility variable to help us to manipulate the equation into the desired form; we're not interested in h itself. Nothing prevents us from dropping additional perpendiculars h_β and h_γ from the other two corners and using those as utility variables, too, if we like. We can use any utility variables we want.

²⁰ Here is another example of the book's judicious relaxation of formal rigor. Of course there is no "point γ"; γ is an angle not a point. However, the writer suspects in light of Fig. 3.8 that few readers will be confused as to which point is meant. The skillful applied mathematician does not multiply labels without need.
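For a concrete instance of (3.15) and (3.16), the Python lines below (sides and included angle chosen arbitrarily) compute a triangle's third side and then confirm the law of sines:

    import math
    a, b, gamma = 3.0, 4.0, 2*math.pi/6                # two sides, included angle
    c = math.sqrt(a*a + b*b - 2*a*b*math.cos(gamma))   # law of cosines (3.16)
    alpha = math.asin(a*math.sin(gamma)/c)
    print(c)                                           # ~ 3.606
    print(math.sin(alpha)/a, math.sin(gamma)/c)        # equal, per (3.15)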

Table 3.3: Further properties of the trigonometric functions.

u = x̂′(x cos φ + y sin φ) + ŷ′(−x sin φ + y cos φ)

sin(α ± β) = sin α cos β ± cos α sin β
cos(α ± β) = cos α cos β ∓ sin α sin β

sin α sin β = [cos(α − β) − cos(α + β)]/2
sin α cos β = [sin(α − β) + sin(α + β)]/2
cos α cos β = [cos(α − β) + cos(α + β)]/2

sin γ + sin δ = 2 sin[(γ + δ)/2] cos[(γ − δ)/2]
sin γ − sin δ = 2 cos[(γ + δ)/2] sin[(γ − δ)/2]
cos δ + cos γ = 2 cos[(γ + δ)/2] cos[(γ − δ)/2]
cos δ − cos γ = 2 sin[(γ + δ)/2] sin[(γ − δ)/2]

sin 2α = 2 sin α cos α
cos 2α = 2 cos² α − 1 = cos² α − sin² α = 1 − 2 sin² α

sin² α = (1 − cos 2α)/2
cos² α = (1 + cos 2α)/2

sin α / a = sin β / b = sin γ / c
c² = a² + b² − 2ab cos γ


9: A point on a sphere.) ˆ z ˆ θ φ x ˆ r z y ˆ ρ Table 3. z) coordinates. in spherical (r. ρ2 = x2 + y 2 r 2 = ρ2 + z 2 = x2 + y 2 + z 2 ρ tan θ = z y tan φ = x z = r cos θ ρ = r sin θ x = ρ cos φ = r sin θ cos φ y = ρ sin φ = r sin θ sin φ . φ) and cylindrical (ρ. cylindrical and spherical coordinates. CYLINDRICAL AND SPHERICAL COORDINATES 67 Figure 3. φ. (The axis labels bear circumﬂexes in this ﬁgure only to disambiguate the z axis from the cylindrical coordinate z.4: Relations among the rectangular.3.9. θ.

23 Symbols like ρx are logical but. substituting identities from the table. r (3. ρ ˆ ˆ y = +ˆ sin φ + φ cos φ. θ and φ is usually not a good idea. tan θ y = tan φy = ρy . Occaˆ sionally however a problem arises in which it is more convenient to orient x ˆ or y in the direction of the problem’s axis (usually because ˆ has already z been established in the direction of some other pertinent axis). as far as this writer is aware. y . but § 15. z (3.68 CHAPTER 3. TRIGONOMETRY or. θ= r ˆ ρ= (3. The writer is not aware of any conventionally established symbols for quantities like these. Changing the meanings of known symbols like ρ.18) Such variable unit basis vectors point locally in the directions in which their respective coordinates advance. x z . φ= ρ ˆ ˆ ˆ ˆz + ρρ ˆ x x + y y + zz z ˆ= = . y x . r ˆ ˆ ρ = +ˆ sin θ + θ cos θ. tan θ x = tan φx = instead if needed. 23 (ρy )2 = z 2 + x2 .6 at least will use the ρx -style symbology. we have also that ˆ ˆ x = +ˆ cos φ − φ sin φ. ˆ ˆ xx + yy . not standard. ρ ˆ ˆ z = +ˆ cos θ − θ sin θ. but you can use symbols like (ρx )2 = y 2 + z 2 .20) ρx . ρ ˆ −ˆ y + yx x ˆ . Combining pairs of (3.17)’s equations appropriately.19) ˆ Convention usually orients z in the direction of a problem’s axis. r r r ˆ z ˆ −ˆρ + ρz .

Equation (3.15). 25 Section 13.5 on page 43.3.25 An important consequence of (3. b and c represent the three sides of a triangle such that a + b + c = 0. then per (2.2. 3.2. 2.22) Naturally.” but that’s all right.9.22) will ﬁnd use among other places in § 8.22) implies convergence of zk . (3.10.44) |a| − |b| ≤ |a + b| ≤ |a| + |b| .2 in vector notation. the ﬁgures do suggest an analogy between complex phase and vector direction. . two-dimensional vectors a.11 De Moivre’s theorem Compare the Argand-plotted complex number of Fig. for example. then why not equally for complex scalars? Consider the geometric interpretation of the Argand plane of Fig.21) and (3.21) for any two complex numbers z1 and z2 . one may ﬁnd the latter formula useful for sums of real numbers.22) hold equally well for real numbers as for complex.9.3.22) is that if |zk | converges. which can be hard to do directly. when some of the numbers summed are positive and others negative.10.10 The complex triangle inequalities If the real. which per (3.5 (page 43) against the vector of Fig. Extending the sum inequality. 2.23) Reading closely. These are just the triangle inequalities of § 2.5 we can write z = (ρ)(cos φ + i sin φ) = ρ cis φ. one might note that § 2. then zk also converges. 2.3 (page 53).2 uses the “<” sign rather than the “≤. |z1 | − |z2 | ≤ |z1 + z2 | ≤ |z1 | + |z2 | (3. (3. Although complex numbers are scalars not vectors. Evidently.24 But if the triangle inequalities hold for real vectors in a plane. we have that k zk ≤ k |zk | .9 proves the triangle inequalities more generally. Convergence of |zk |. See also (9. is often easier to establish. See § 1. THE COMPLEX TRIANGLE INEQUALITIES 69 3. Such a consequence is important because mathematical derivations sometimes need the convergence of zk established. 3. 24 (3. With reference to Fig.

28) are known as de Moivre’s theorem.28) Equations (3. y = ρ sin φ.27 Also called de Moivre’s formula.54). but since the three equations express essentially the same idea. 27 [67][77] 26 (3. If z = x + iy. ρ1 ρ2 But according to (3. TRIGONOMETRY cis φ ≡ cos φ + i sin φ. (3. Per (2. (3.25) z1 z2 = (cos φ1 cos φ2 − sin φ1 sin φ2 ) + i(sin φ1 cos φ2 + cos φ1 sin φ2 ).27) and (3.26 . It says that if you want to multiply complex numbers. then evidently x = ρ cos φ. Applying (3.10).24) (3.70 where CHAPTER 3. (3. It follows by parallel reasoning (or by extension) that ρ1 z1 = cis(φ1 − φ2 ) z2 ρ2 and by extension that z a = ρa cis aφ. z1 z2 = (x1 x2 − y1 y2 ) + i(y1 x2 + x1 y2 ).26). or to some variation thereof. it suﬃces • to multiply their magnitudes and • to add their phases. Some authors apply the name of de Moivre directly only to (3. this is just z1 z2 = cos(φ1 + φ2 ) + i sin(φ1 + φ2 ).25) to the equation yields (3.27) .26) is an important result. ρ1 ρ2 or in other words z1 z2 = ρ1 ρ2 cis(φ1 + φ2 ).28).26) Equation (3. if you refer to any of them as de Moivre’s theorem then you are unlikely to be misunderstood.

5 will revisit the derivation of de Moivre’s theorem. Section 5.11. both deﬁned in Ch. DE MOIVRE’S THEOREM We have not shown yet.4.3. but will in § 5. that cis φ = exp iφ = eiφ . 71 where exp(·) is the natural exponential function and e is the natural logarithmic base. . 5. De Moivre’s theorem is most useful in this light.

72 CHAPTER 3. TRIGONOMETRY .

f ′ (t)? • Interpreting some function f ′ (t) as an instantaneous rate of change.1 Inﬁnitesimals and limits Calculus systematically treats numbers so large and so small. so brieﬂy stated. the calculus beginner’s ﬁrst task is quantitatively to understand the pair’s interrelationship. generality and signiﬁcance. Although once grasped the concept is relatively simple. Sir Isaac Newton and G. Leibnitz cleared this hurdle for us in the seventeenth century. Once grasped. or derivative. With the pair in hand. what is the corresponding accretion. Such an understanding constitutes the basic calculus concept. so now at least we know the right pair of questions to ask. he would probably ﬁnd it quicker and more pleasant to begin with a book like the one referenced. f (t)? This chapter builds toward a basic understanding of the ﬁrst question. Many instructional textbooks—[30] is a worthy example—have been written to lead the beginner gently. In this book we necessarily state the concept brieﬂy. then move along. what is the function’s instantaneous rate of change. 4. The greatest conceptual hurdle—the stroke of brilliance—probably lies simply in stating the pair of questions clearly. the concept is simple and brieﬂy stated. Although a suﬃciently talented. or integral.W.Chapter 4 The derivative The mathematics of calculus concerns a complementary pair of questions:1 • Given some function f (t). It cannot be the role of a book like this one to lead the beginner gently toward an apprehension of the basic calculus concept. they lie beyond the reach of our mundane number system. dedicated beginner could perhaps obtain the basic calculus concept directly here. 1 73 . to understand this pair of questions. is no trivial thing. They are the pair which eluded or confounded the most brilliant mathematical minds of the ancient world.

so if it is not immediately clear then let us approach the matter colloquially. then. For instance. Bigger than that. remember?” “Sorry. no.” “Zero. but its smallness conceptually lies beyond the reach of our mundane number system. the inﬁnitesimals are often not true mathematical inﬁnitesimals but rather relatively very small quantities such as the mass of a wood screw compared to the mass .1. rather it is δ/ǫ = 3.01. You said that we should use hexadecimal notation in this book.” This is the idea of the inﬁnitesimal. The principal advantage of using symbols like ǫ rather than 0 for inﬁnitesimals is in that it permits us conveniently to compare one inﬁnitesimal against another. Nevertheless.1 The inﬁnitesimal A number ǫ is an inﬁnitesimal if it is so small that 0 < |ǫ| < a for all possible mundane positive numbers a. etc.0000 0000 0000 0001?” “Smaller. to add them together. then the quotient δ/ǫ is not some unfathomable 0/0. This is somewhat a diﬃcult concept. to divide them.” I reply. THE DERIVATIVE 4. “How big is your inﬁnitesimal?” you ask. “How small?” “Very small.” “Smaller than 2−0x1 0000 0000 0000 0000 ?” “Now that is an impressively small number.” “What about 0x0. In physical applications. smaller than 0x0. if δ = 3ǫ is another inﬁnitesimal. It is a deﬁnite number of a certain nonzero magnitude.” “Smaller than 0x0. If ǫ is an inﬁnitesimal. Yes. “Very. right. my inﬁnitesimal is smaller still. My inﬁnitesimal is deﬁnitely bigger than zero. then 1/ǫ can be regarded as an inﬁnity: a very large number much larger than any mundane number one can name.” “Oh.74 CHAPTER 4.” “Smaller than 0x0. very small. Let me propose to you that I have an inﬁnitesimal.01?” “Smaller than what?” “Than 2−8 .0001? Is it smaller than that?” “Much smaller.

one common way to specify that ǫ be inﬁnitesimal is to write that ǫ ≪ 1. In keeping with this book’s adecimal theme (Appendix A) and the concept of the hour of arc (§ 3. Similarly. even very roughly speaking. In any case. the implication is that z draws toward zo from + the positive side such that z > zo . z→0 2z 2 lim The symbol “limQ ” is short for “in the limit as Q. For quantities between 1/0xC and 1/0x10000. depending on your point of view. we should probably render the rule as twelve points per wavelength here. typically such that one can regard the quantity u/v to be an inﬁnitesimal.2 Limits The notation limz→zo indicates that z draws as near to zo as it possibly can. positions of spacecraft and concentrations of chemical impurities must sometimes be accounted more precisely). On the other hand. it depends on the accuracy one seeks. when written limz→zo . ﬁrst-order inﬁnitesimal ǫ that the even latter cannot measure it. It is just a reminder that a quantity approaches some value.and higher-order inﬁnitesimals are likewise possible. a quantity less than 1/0x10000 of the principal is indeed inﬁnitesimal for most practical purposes (but not all: for example. INFINITESIMALS AND LIMITS 75 of a wooden house frame.6).4.1. The ǫ2 is an inﬁnitesimal to the inﬁnitesimals. used when saying that the quantity 2 Among scientists and engineers who study wave phenomena. indicates that u is much less than v. there is an old rule of thumb that sinusoidal waveforms be discretized not less ﬁnely than ten points per wavelength. The reason for the notation is to provide a way to handle expressions like 3z 2z as z vanishes: 3z 3 = . or v ≫ u. 4. a quantity greater then 1/0xC of the principal to which it compares probably cannot rightly be regarded as inﬁnitesimal. the − implication is that z draws toward zo from the negative side. . When written limz→zo . The additional cost of inviting one more guest to the wedding may or may not be inﬁnitesimal. whatever “negligible” means in the context. In fact.” Notice that lim is not a function like log or sin. The key point is that the inﬁnitesimal quantity be negligible by comparison.1.2 The second-order inﬁnitesimal ǫ2 is so small on the scale of the common. or the audio power of your voice compared to that of a jet engine. Third. The notation u ≪ v.

This section treats the problem in its basic form. There are evidently n P ≡ n!/(n − k)! (4. suppose that what you want is exactly k blocks. then how many distinct choices of blocks do you have? Answer: you have 2n choices. you select your favorite block ﬁrst: there are n options for this. the permutations redgreen-blue and green-red-blue constitute the same combination. 20). or permutations. Don’t let it confuse you.3 4. I have several small wooden blocks of various shapes and sizes.76 CHAPTER 4. the problem of selecting k speciﬁc items out of a set of n available items belongs to probability theory (Ch. If I oﬀer you the blocks and you are free to take all. I have only n − 1 blocks left). Now. available for you to select exactly k blocks. Desiring exactly k blocks. then the third. some or none of them at your option. However.2.2 Combinatorics In its general form. THE DERIVATIVE equaled the value would be confusing. the same problem also applies to the handling of polynomials or power series.1) k ordered ways. for instance. The lim notation is convenient to use sometimes. Then you select your third favorite—for this there are n − 2 options—and so on until you have k blocks. 4. then accept or reject the second. some of these distinct permutations put exactly the same combination of blocks in your hand. painted diﬀerent colors so that you can clearly tell each block from the others. neither more nor fewer. if you can take whichever blocks you want. however. whereas red-white-blue is a diﬀerent combination entirely. because you can accept or reject the ﬁrst block. but it is not magical. For a single combination 3 [30] . there are n − 1 options (why not n options? because you have already taken one block from me. Consider that to say z→2− lim (z + 2) = 4 is just a fancy way of saying that 2 + 2 = 4.1 Combinations and permutations Consider the following scenario. and so on. In its basic form. Then you select your second favorite: for this.

green-red-blue. Equation (4. there is no obvious deﬁnition. Because one can choose neither fewer than zero nor more than n from n blocks.9) come directly from the deﬁnition (4. .4) comes directly from the observation (made at the head of this section) that 2n total combinations are possible if any k is allowed.2.2). = = = = = n k . green-blue-red. they relate combinatoric coeﬃcients to their neighbors in Pascal’s triangle (§ 4. blue-green-red). evidently k! permutations are possible (redgreen-blue.6) (4. blue).3) (4.4.6) through (4. to choose k blocks then. (4. green. COMBINATORICS 77 of k blocks (red. blue-red-green.2) n n n−k n k (4. red-blue-green.2). or the black block plus k − 1 from the original set. k=0 n−1 k−1 + n−1 k n k (4. (4. Equations (4.8) .9) n−k+1 n k−1 k k+1 n n−k k+1 n k n−1 k−1 Equation (4. Equation (4.7) (4. n = 0 unless 0 ≤ k ≤ n.5) (4. Hence dividing the number of permutations (4.3) results from changing the variable k ← n − k in (4.2). k! of combinations include that = n k .2.10) k For n k n n−k n−1 k when n < 0. you can either choose k from the original set. (4.5) is seen when an nth block—let us say that it is a black block—is added to an existing set of n − 1 blocks.1) by k! yields the number of combinations n k Properties of the number n k ≡ n!/(n − k)! .4) = 2n .

(4. as (4. Notice how each entry in the triangle is the sum of the two entries immediately above.3 The binomial theorem This section presents the binomial theorem and one of its signiﬁcant consequences.11) can be extended to nonintegral n. k Evaluated. by an heroic derivational eﬀort.2.3. . However. 4.) 4. 0 0 1 0 2 0 3 0 4 0 4 1 3 1 4 2 2 1 3 2 1 1 2 2 3 3 4 3 4 4 .2 Pascal’s triangle Consider the triangular layout in Fig.11) The author is given to understand that. . What is covered—in Table 8. 4.1—is the Taylor series for (1 + z)a−1 for complex z and complex a. .1 of the various possible n . 4. (4.5) predicts. since applied mathematics does not usually concern itself with hard theorems of little known practical use. which amounts to much the same thing. 4. this yields Fig. Pascal’s triangle. THE DERIVATIVE Figure 4. (In fact this is the easy way to ﬁll Pascal’s triangle out: for each entry. just add the two entries above.1: The plan for Pascal’s triangle.78 CHAPTER 4.2. the extension as such is not covered in this book.1 Expanding the binomial The binomial theorem holds that4 n (a + b) = k=0 4 n n k an−k bk .

3. .3.12) (actually this holds for any ǫ. Since (a + b)n = (a + b)(a + b) · · · (a + b)(a + b).12) for (m. b = ǫ. each (a+b) factor corresponds to one of the “wooden blocks.2 Since Powers of numbers near unity n 0 = 1 and n 1 = n. m > 0. small or large. n ≥ 0. In either form. 4. In the common case that a = 1.2: Pascal’s triangle. |ǫ| ≪ 1.2. it follows from (4. accepting it. but the typical case of interest has |ǫ| ≪ 1). |ǫ| ≪ 1. this is n (1 + ǫ) = k=0 n n k ǫk (4. |ǫo | ≪ 1.4. that5 1 + mǫo ≈ (1 + ǫo )m 5 The symbol ≈ means “approximately equals. 79 1 1 1 1 2 1 1 3 3 1 1 4 6 4 1 1 5 A A 5 1 1 6 F 14 F 6 1 1 7 15 23 23 15 7 1 .” where a means rejecting the block and b. THE BINOMIAL THEOREM Figure 4. n) ∈ Z. . |δ| ≪ 1.” . the binomial theorem is a direct consequence of the combinatorics of § 4.

In fact nothing prevents us from deﬁning the action of a complex power such that (4.13). THE DERIVATIVE to excellent precision. 4. are complex? Changing the symbol z ← x and observing that the inﬁnitesimal ǫ may also be complex.13) into the new domain. accurate way of approximating any real power of numbers in the near neighborhood of 1. Ch. or both. No work we have yet done in the book answers the question. we have that (1 + ǫ)n/m ≈ 1 + Inverting this equation yields (1 + ǫ)−n/m ≈ 1 [1 − (n/m)ǫ] n = ≈ 1 − ǫ.80 CHAPTER 4. m Changing 1 + δ ← (1 + ǫ)n and observing from the (1 + ǫo )m equation above that this implies that δ ≈ nǫ. 1 + (n/m)ǫ [1 − (n/m)ǫ][1 + (n/m)ǫ] m (1 + ǫ)x ≈ 1 + xǫ (4. The writer knows of no conventional name6 for (4. raising the equation to the 1/m power then changing δ ← mǫo . one would like (4. Still. 8 will introduce the Taylor expansion as such. but its very form suggests the question: what if ǫ or x. one wants to know whether (1 + ǫ)z ≈ 1 + zǫ (4. “the ﬁrst-order Taylor expansion” is a conventional name for it.3 Complex powers of numbers near unity Equation (4. the action of a complex power z remains undeﬁned.14) to hold. but so unwieldy a name does not ﬁt the present context.3.13) n ǫ.14) still holds. which we now do.14) does hold. 6 . m Taken together. The equation oﬀers a simple. logically extending the known result (4. Furthermore. we have that (1 + δ)1/m ≈ 1 + δ .13) is ﬁne as far as it goes. but named or unnamed it is an important equation. for consistency’s sake. the last two equations imply that for any real x. because although a complex inﬁnitesimal ǫ poses no particular problem. Actually.

we now stand in a position properly to introduce the chapter’s subject. (4.15) says that f ′ (t) = = 7 ∞ k=−∞ ∞ k=−∞ ǫ→0+ lim (ck )(t + ǫ/2)k − (ck )(t − ǫ/2)k ǫ (1 + ǫ/2t)k − (1 − ǫ/2t)k .14) grows large. one can deﬁne the same derivative in the unbalanced form f ′ (t) = lim f (t + ǫ) − f (t) . In mathematical symbols and for the moment using real numbers.4. ǫ ǫ→0+ but this book generally prefers the more elegant balanced form (4. 4. ǫ (4. but for the moment we shall use the equation in a more ordinary manner to develop the concept and basic application of the derivative. as follows. There is no helping this.15). (4.14).4 The derivative Having laid down (4. What is the derivative? The derivative is the instantaneous rate or slope of a function. f ′ (t) ≡ lim f (t + ǫ/2) − f (t − ǫ/2) .16) where the ck are in general complex coeﬃcients.1 The derivative of the power series In the very common case that f (t) is the power series f (t) = ∞ k=−∞ ck tk .15) ǫ→0+ Alternately. the derivative.4.4 will investigate the extremely interesting eﬀects which arise when ℜ(ǫ) = 0 and the power z in (4.7 4. which we will now use in developing the derivative’s several properties through the rest of the chapter.4. ǫ ǫ→0+ lim ck tk From this section through § 4. The reader is advised to tread through these sections line by stubborn line. THE DERIVATIVE 81 Section 5. in the good trust that the math thus gained will prove both interesting and useful.7. . the mathematical notation grows a little thick.

(4. is G. ǫ ∞ k=−∞ ck ktk−1 . 9 This subsection is likely to confuse many readers the ﬁrst time they read it. however. this is f (t) = which simpliﬁes to f ′ (t) = ′ ∞ k=−∞ CHAPTER 4. one can choose any inﬁnitesimal size ǫ. In (4. the meaning is clear enough: d(·) signiﬁes how much (·) changes f ′ (t) = 8 Equation (4. and df is a dependent inﬁnitesimal whose size relative to dt depends on the independent variable t. df .4. Often they have developed positive misunderstandings.18).8 4. For the independent inﬁnitesimal dt.3 will remedy the oversight.2 The Leibnitz notation The f ′ (t) notation used above for the derivative is due to Sir Isaac Newton. but occasionally when there are two independent variables it helps the analysis to adjust the size of one of the independent inﬁnitesimals with respect to the other. THE DERIVATIVE ǫ→0+ lim ck tk (1 + kǫ/2t) − (1 − kǫ/2t) . (4.15). Because there is signiﬁcant practical beneﬁt in learning how to handle the Leibnitz notation correctly—particularly in applied complex variable theory—this subsection seeks to present each Leibnitz element in its correct light. The meaning of the symbol d unfortunately depends on the context. . like ∂f /∂z.17) admittedly has not explicitly considered what happens when the real t becomes the complex z. df such that per (4. Leibnitz’s notation9 dt = ǫ.17) Equation (4. = f (t + dt/2) − f (t − dt/2). Usually better on the whole. many users of applied mathematics have never developed a clear understanding as to precisely what the individual symbols mean. conceptually. Usually the exact choice of size does not matter.18) dt Here dt is the inﬁnitesimal.W.4.17) gives the general derivative of the power series.82 Applying (4.14). but § 4. and is easier to start with. The reason is that Leibnitz elements like dt and ∂f usually tend to appear in practice in certain speciﬁc relations to one another. As a result.

Applying (4.14), this is

f′(t) = Σ_{k=−∞}^{∞} lim_{ǫ→0+} c_k t^k [(1 + kǫ/2t) − (1 − kǫ/2t)]/ǫ,

which simplifies to

f′(t) = Σ_{k=−∞}^{∞} c_k k t^{k−1}.   (4.17)

Equation (4.17) gives the general derivative of the power series.⁸

4.4.2 The Leibnitz notation

The f′(t) notation used above for the derivative is due to Sir Isaac Newton, and is easier to start with. Usually better on the whole, however, is G.W. Leibnitz's notation⁹

dt = ǫ,
df = f(t + dt/2) − f(t − dt/2),

such that per (4.15),

f′(t) = df/dt.   (4.18)

Here dt is the infinitesimal, and df is a dependent infinitesimal whose size relative to dt depends on the independent variable t.¹⁰ For the independent infinitesimal dt, conceptually, one can choose any infinitesimal size ǫ. Usually the exact choice of size does not matter, but occasionally when there are two independent variables it helps the analysis to adjust the size of one of the independent infinitesimals with respect to the other.

The meaning of the symbol d unfortunately depends on the context. In (4.18), the meaning is clear enough: d(·) signifies how much (·) changes when the independent variable t increments by dt. Notice, however, that the notation dt itself has two distinct meanings:¹¹

• the independent infinitesimal dt = ǫ; and

• d(t), which is how much (t) changes as t increments by dt.

⁸ Equation (4.17) admittedly has not explicitly considered what happens when the real t becomes the complex z, but § 4.4.3 will remedy the oversight.

⁹ This subsection is likely to confuse many readers the first time they read it. The reason is that Leibnitz elements like dt and ∂f usually tend to appear in practice in certain specific relations to one another, like ∂f/∂z. As a result, many users of applied mathematics have never developed a clear understanding as to precisely what the individual symbols mean; often they have developed positive misunderstandings. Because there is significant practical benefit in learning how to handle the Leibnitz notation correctly—particularly in applied complex variable theory—this subsection seeks to present each Leibnitz element in its correct light.

¹⁰ If you do not fully understand this sentence, reread it carefully with reference to (4.15) and (4.18) until you do; it's important. The quantities dt and df represent coordinated infinitesimal changes in t and f respectively, so there is usually no trouble with treating dt and df as though they were the same kind of thing.

If t is an independent variable, then dt is just an infinitesimal of some kind, whose specific size could be a function of t but more likely is just a constant. If a constant, then dt does not fundamentally have anything to do with t as such. In fact, if s and t are both independent variables, then we can (and in complex analysis sometimes do) say that ds = dt = ǫ, after which nothing prevents us from using the symbols ds and dt interchangeably. Maybe it would be clearer in some cases to write ǫ instead of dt, but the latter is how it is conventionally written. By contrast, if f is a dependent variable, then df or d(f) is the amount by which f changes as t changes by dt. The df is infinitesimal but not constant; it is a function of t. Maybe it would be clearer in some cases to write d_t f instead of df, but for most cases the former notation is unnecessarily cluttered; the latter is how it is conventionally written.

Now, most of the time what we are interested in is not dt or df as such, but rather the ratio df/dt or the sum Σ_k f(k dt) dt = ∫ f(t) dt. For this reason, we do not usually worry about which of df and dt is the independent infinitesimal, nor do we usually worry about the precise value of dt. This is fine, but one must take care: changing perspective in mid-analysis—now regarding f as the independent variable—is allowed and perfectly proper, but the dt and df after the change are not the same as the dt and df before the change. Treating the infinitesimals only in ratio leads one to forget that dt does indeed have a precise value.

Sometimes when writing a differential equation like the potential-kinetic energy equation ma dx = mv dv, we do not necessarily have either v or x in mind as the independent variable. The important point is that dv and dx be coordinated so that the ratio dv/dx has a definite value no matter which of the two be regarded as independent, or whether the independent be some third variable (like t) not in the equation. One can avoid the confusion simply by keeping the dv/dx or df/dt always in ratio, never treating the infinitesimals individually. Many applied mathematicians do precisely that. That is okay as far as it goes, but it really denies the entire point of the Leibnitz notation; one might as well just stay with the Newton notation in that case. Instead, this writer recommends that you learn the Leibnitz notation properly, developing the ability to treat the infinitesimals individually.

Because the book is a book of applied mathematics, this footnote does not attempt to say everything there is to say about infinitesimals. For instance, it has not yet pointed out (but does so now) that even if s and t are equally independent variables, at the fundamental level they really aren't: one can have dt = ǫ(t), ds = δ(s, t), such that dt has prior independence to ds. The point is not to fathom all the possible implications from the start; you can do that as the need arises. The point is to develop a clear picture in your mind of what a Leibnitz infinitesimal really is. Once you have the picture, you can go from there.

At first glance, the distinction between dt and d(t) seems a distinction without a difference; and for most practical cases of interest, so indeed it is. However, when switching perspective in mid-analysis as to which variables are dependent and which are independent, or when changing multiple independent complex variables simultaneously, the math can get a little tricky. In such cases, it may be wise to use the symbol dt to mean d(t) only, introducing some unambiguous symbol like ǫ to represent the independent infinitesimal. In any case you should appreciate the conceptual difference between dt = ǫ and d(t).

Where two or more independent variables are at work in the same equation, it is conventional to use the symbol ∂ instead of d, as a reminder that the reader needs to pay attention to which ∂ tracks which independent variable.¹² A derivative ∂f/∂t or ∂f/∂s in this case is sometimes called by the slightly misleading name of partial derivative. (If needed or desired, one can write ∂_t f when tracking t, ∂_s f when tracking s. Such notation appears only rarely in the literature, so your audience might not understand it when you write it. Use discretion.)

Conventional shorthand for d(df) is d²f; and for (dt)², dt². So

d²f/dt² = d(df/dt)/dt

is a derivative of a derivative, or second derivative. By extension, the notation

d^k f/dt^k

represents the kth derivative.

4.4.3 The derivative of a function of a complex variable

For (4.15) to be robust, written here in the slightly more general form

df/dz = lim_{ǫ→0} [f(z + ǫ/2) − f(z − ǫ/2)]/ǫ,   (4.19)

one should like it to evaluate the same in the limit regardless of the complex phase of ǫ. That is, if δ is a positive real infinitesimal, then it should be equally valid to let ǫ = δ, ǫ = −δ, ǫ = iδ, ǫ = −iδ, ǫ = (4 − i3)δ or any other infinitesimal value, so long as 0 < |ǫ| ≪ 1. One should like the

¹² The writer confesses that he remains unsure why this minor distinction merits the separate symbol ∂, but he accepts the notation as conventional nevertheless.

4. Particularly. Meeting the criterion. In the professionals’ favor. They seem to prefer the unbalanced nonetheless.20) The unbalanced deﬁnition of the derivative from § 4.19) rather than (4. In fact for the sake of robustness.19) except at isolated nonanalytic points (like z = 0 in h[z] = 1/z √ or g[z] = z).14). Moreover. 2. whose complex form is df f (z + ǫ) − f (z) = lim .) Scientists and engineers tend to prefer the balanced deﬁnition among other reasons because it more reliably approximates the derivative of a function for which only discrete samples are available [22.7). §§ I:9. Professional mathematicians have diﬀerent needs.5).7]. plus the Heaviside unit step u(t) and the Dirac delta δ(t) (§ 7. . THE DERIVATIVE 85 derivative (4.6 and I:9. one acknowledges that the balanced deﬁnition strictly misjudges the modulus function f (z) = |z| to be diﬀerentiable solely at the point z = 0. one normally demands that derivatives do come out the same regardless of the Argand direction. arg[·]. so the derivative (4. [·]∗ .17) of the general power series. most functions encountered in applications do meet the criterion (4.19) this book prefers. works without modiﬁcation when ǫ is complex. written here as (1 + ǫ)w ≈ 1 + wǫ. It is a question for the professionals. whereas that the unbalanced deﬁnition. and (4. such functions are fully diﬀerentiable except at their poles (where the derivative goes inﬁnite in any case) and other nonanalytic points. Where the limit (4. d dz 13 ∞ k=−∞ ck z k = ∞ k=−∞ ck kz k−1 (4. Where the derivative (4.4. ℜ[·] and ℑ[·]. though. judges the modulus to be diﬀerentiable nowhere—though the writer is familiar with no signiﬁcant appliedmathematical implication of the distinction. complex neighborhood? The writer does not know. see § 2.3).15) is the deﬁnition we normally use for the derivative for this reason. for this writer at least the balanced deﬁnition just better captures the subjective sense of the thing.19) is sensitive to the Argand direction or complex phase of ǫ. (Would it co¨rdinate the two deﬁnitions to o insist that a derivative exist not only at a point but everywhere in the point’s immediate. the key formula (4.4.13 Excepting the nonanalytic parts of complex numbers (|·|. ǫ→0 dz ǫ does not always serve applications as well as does the balanced deﬁnition (4. there we normally say that the derivative does not exist. probably more sensibly.19) to come out the same regardless of the Argand direction from which ǫ approaches 0 (see Fig.12.19) does exist—where the derivative is ﬁnite and insensitive to Argand direction—there we say that the function f (z) is diﬀerentiable.

4.4.4 The derivative of z^a

Inspection of § 4.4.1's logic in light of (4.14) reveals that nothing prevents us from replacing the real t, real ǫ and integral k of that section with arbitrary complex z, ǫ and a. That is,

d(z^a)/dz = lim_{ǫ→0} [(z + ǫ/2)^a − (z − ǫ/2)^a]/ǫ
          = lim_{ǫ→0} z^a [(1 + ǫ/2z)^a − (1 − ǫ/2z)^a]/ǫ
          = lim_{ǫ→0} z^a [(1 + aǫ/2z) − (1 − aǫ/2z)]/ǫ,

which simplifies to

d(z^a)/dz = az^{a−1}   (4.21)

for any complex z and a. How exactly to evaluate z^a or z^{a−1} when a is complex is another matter, treated in § 5.4 and its (5.13); but in any case you can use (4.21) for real a right now.

4.4.5 The logarithmic derivative

Sometimes one is more interested in knowing the rate of f(t) relative to the value of f(t) than in knowing the absolute rate itself. For example, if you inform me that you earn $1000 a year on a bond you hold, then I may commend you vaguely for your thrift but otherwise the information does not tell me much. However, if you inform me instead that you earn 10 percent a year on the same bond, then I might want to invest. The latter figure is a relative rate, or logarithmic derivative,

(df/dt)/f(t) = (d/dt) ln f(t).   (4.22)

The investment principal grows at the absolute rate df/dt, but the bond's interest rate is (df/dt)/f(t). Equation (4.22) expresses the significant concept of a relative rate, like 10 percent annual interest on a bond. The natural logarithmic notation ln f(t) may not mean much to you yet, as we'll not introduce it formally until § 5.2, so you can ignore the right side of (4.22) for the moment; but the equation's left side at least should make sense to you.
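To make the bond example concrete (the figures below are invented solely for illustration): if the principal grows as f(t) = 10000 × 1.10^t dollars, the absolute rate df/dt grows along with the principal, while the logarithmic derivative stays fixed near ln 1.10 ≈ 0.0953 per year:

    f = lambda t: 10000 * 1.10**t
    df = lambda t, eps=1e-6: (f(t + eps/2) - f(t - eps/2)) / eps
    for t in (0.0, 5.0):
        print(df(t), df(t)/f(t))   # absolute rate grows; relative rate is constant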

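Not part of the book's own apparatus, but possibly helpful: a minimal numerical check of (4.21), sketched in Python. The point z, the power a and the step ε below are arbitrary illustrative choices.

    # Check d(z^a)/dz = a z^(a-1) for complex z and a, using the balanced
    # difference quotient [f(z+eps/2) - f(z-eps/2)]/eps with a small step.
    import cmath

    def balanced_derivative(f, z, eps=1e-6):
        return (f(z + eps / 2) - f(z - eps / 2)) / eps

    z = 1.2 + 0.7j
    a = 0.5 - 1.3j
    f = lambda w: cmath.exp(a * cmath.log(w))      # z**a, principal branch
    exact = a * cmath.exp((a - 1) * cmath.log(z))  # a z**(a-1), same branch

    print(balanced_derivative(f, z))
    print(exact)    # agrees with the line above to several figures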
4.5 Basic manipulation of the derivative

This section introduces the derivative chain and product rules.

4.5.1 The derivative chain rule

If f is a function of w, which itself is a function of z, then[14]

    df/dz = (df/dw)(dw/dz).    (4.23)

Equation (4.23) is the derivative chain rule. For example, one can rewrite

    f(z) = √(3z^2 − 1)

in the form

    f(w) = w^{1/2},
    w(z) = 3z^2 − 1.

Then

    df/dw = 1/(2w^{1/2}) = 1/(2√(3z^2 − 1)),
    dw/dz = 6z,

so by (4.23),

    df/dz = (df/dw)(dw/dz) = 6z/(2√(3z^2 − 1)) = 3z/√(3z^2 − 1).

[14] It bears emphasizing to readers who may inadvertently have picked up unhelpful ideas about the Leibnitz notation in the past: the dw factor in the denominator cancels the dw factor in the numerator; a thing divided by itself is 1. That's it. There is nothing more to the proof of the derivative chain rule than that.

4.5.2 The derivative product rule

In general per (4.13),

    d Π_j f_j(z) = Π_j f_j(z + dz/2) − Π_j f_j(z − dz/2).

But to first order,

    f_j(z ± dz/2) ≈ f_j(z) ± (df_j/dz)(dz/2) = f_j(z) ± df_j/2,

so, in the limit,

    d Π_j f_j(z) = Π_j [f_j(z) + df_j/2] − Π_j [f_j(z) − df_j/2].

Since the product of two or more of the df_j is negligible compared to the first-order infinitesimals to which they are added here, this simplifies to

    d Π_j f_j(z) = [Π_j f_j(z)] [Σ_k df_k/(2f_k(z))] − [Π_j f_j(z)] [Σ_k −df_k/(2f_k(z))],

or in other words,

    d Π_j f_j = [Π_j f_j] [Σ_k df_k/f_k].    (4.24)

Equation (4.24) is the derivative product rule. In the common case of only two f_j, this comes to

    d(f_1 f_2) = f_2 df_1 + f_1 df_2.    (4.25)

On the other hand, if f_1(z) = f(z) and f_2(z) = 1/g(z), then by the derivative chain rule (4.23), df_2 = −dg/g^2; so,

    d(f/g) = (g df − f dg)/g^2.    (4.26)

After studying the complex exponential in Ch. 5, we shall stand in a position to write (4.24) in the slightly specialized but often useful form[15]

    d [ Π_j g_j^{a_j} e^{b_j h_j} ln c_j p_j ]
        = [ Π_j g_j^{a_j} e^{b_j h_j} ln c_j p_j ]
          × [ Σ_k a_k dg_k/g_k + Σ_k b_k dh_k + Σ_k dp_k/(p_k ln c_k p_k) ],    (4.27)

where the a_k, b_k and c_k are arbitrary complex coefficients and the g_k, h_k and p_k are arbitrary functions.[16]

[15] This paragraph is extra. You can skip it for now if you prefer.
[16] The subsection is sufficiently abstract that it is a little hard to understand unless one already knows what it means. An example may help:

    d [ (u^2 v^3/z) e^{−5t} ln 7s ]
        = [ (u^2 v^3/z) e^{−5t} ln 7s ] × [ 2 du/u + 3 dv/v − dz/z − 5 dt + ds/(s ln 7s) ].

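A quick numerical check of the product rule (not from the book's text; the functions and the test point are illustrative choices only):

    # Verify d(f1 f2 f3)/dz = (f1 f2 f3)(f1'/f1 + f2'/f2 + f3'/f3), the
    # form (4.24), by balanced difference quotients at a complex point.
    import cmath

    def d(f, z, eps=1e-6):
        return (f(z + eps / 2) - f(z - eps / 2)) / eps

    f1 = cmath.sin
    f2 = cmath.exp
    f3 = lambda z: z**3 + 2
    prod = lambda z: f1(z) * f2(z) * f3(z)

    z = 0.8 + 0.3j
    lhs = d(prod, z)
    rhs = prod(z) * (d(f1, z)/f1(z) + d(f2, z)/f2(z) + d(f3, z)/f3(z))
    print(lhs, rhs)   # the two agree to roundoff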
4.5.3 A derivative product pattern

According to (4.24) and (4.21), the derivative of the product z^a f(z) with respect to its independent variable z is

    d/dz [z^a f(z)] = z^a (df/dz) + a z^{a−1} f(z).

Swapping the equation's left and right sides then dividing through by z^a yields

    df/dz + a f/z = [d(z^a f)/dz]/z^a,    (4.28)

a pattern worth committing to memory, emerging among other places in § 16.9.

4.6 Extrema and higher derivatives

One problem which arises very frequently in applied mathematics is the problem of finding a local extremum, that is, a local minimum or maximum, of a real-valued function f(x). Refer to Fig. 4.3. The almost distinctive characteristic of the extremum f(x_o) is that[17]

    (df/dx)|_{x=x_o} = 0.    (4.29)

At the extremum, the slope is zero. The curve momentarily runs level there. One solves (4.29) to find the extremum.

Whether the extremum be a minimum or a maximum depends on whether the curve turn from a downward slope to an upward, or from an upward slope to a downward, respectively. If from downward to upward, then the derivative of the slope is evidently positive; if from upward to downward, then negative.

[17] The notation P|_Q means "P when Q," "P, given Q," or "P evaluated at Q." Sometimes it is alternately written P|Q or [P]_Q.

Figure 4.3: A local extremum. [plot of f(x) against x, marking the extremum f(x_o) at x = x_o]

But the derivative of the slope is just the derivative of the derivative, or second derivative. Hence if df/dx = 0 at x = x_o, then

    (d^2 f/dx^2)|_{x=x_o} > 0 implies a local minimum at x_o;
    (d^2 f/dx^2)|_{x=x_o} < 0 implies a local maximum at x_o.

Regarding the case

    (d^2 f/dx^2)|_{x=x_o} = 0,

this might be either a minimum or a maximum but more probably is neither, being rather a level inflection point as depicted in Fig. 4.4.[18] (In general the term inflection point signifies a point at which the second derivative is zero. The inflection point of Fig. 4.4 is level because its first derivative is zero, too.)

[18] Of course if the first and second derivatives are zero not just at x = x_o but everywhere, then f(x) = y_o is just a level straight line, but you knew that already. Whether one chooses to call some random point on a level straight line an inflection point or an extremum, or both or neither, would be a matter of definition, best established not by prescription but rather by the needs of the model at hand.

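Here is a small sketch of the test just described, applied to the illustrative function f(x) = x^3 − 3x (the function is this sketch's own choice, not the book's):

    # Classify the extrema of f(x) = x**3 - 3*x by the second derivative.
    def df(x):  return 3*x**2 - 3     # df/dx, zero at x = +/-1
    def d2f(x): return 6*x            # second derivative

    for xo in (-1.0, 1.0):            # the solutions of (4.29)
        kind = "minimum" if d2f(xo) > 0 else \
               "maximum" if d2f(xo) < 0 else "possible inflection"
        print(xo, kind)               # -1.0 maximum, 1.0 minimum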
Figure 4.4: A level inflection. [plot of f(x) against x, the curve running level at f(x_o) without turning]

4.7 L'Hôpital's rule

If z = z_o is a root of both f(z) and g(z), or alternately if z = z_o is a pole of both functions, that is, if both functions go to zero or infinity together at z = z_o, then l'Hôpital's rule holds that

    lim_{z→z_o} f(z)/g(z) = [(df/dz)/(dg/dz)]|_{z=z_o}.    (4.30)

In the case where z = z_o is a root, l'Hôpital's rule is proved by reasoning[19]

    lim_{z→z_o} f(z)/g(z)
        = lim_{z→z_o} [f(z) − 0]/[g(z) − 0]
        = lim_{z→z_o} [f(z) − f(z_o)]/[g(z) − g(z_o)]
        = lim_{z→z_o} df/dg
        = lim_{z→z_o} (df/dz)/(dg/dz).

In the case where z = z_o is a pole, new functions F(z) ≡ 1/f(z) and G(z) ≡ 1/g(z) of which z = z_o is a root are defined, with which

    lim_{z→z_o} f(z)/g(z) = lim_{z→z_o} G(z)/F(z) = lim_{z→z_o} dG/dF = lim_{z→z_o} (−dg/g^2)/(−df/f^2),

where we have used the fact from (4.21) that d(1/u) = −du/u^2 for any u. Canceling the minus signs and multiplying by g^2/f^2, we have that

[19] Partly with reference to [77, "L'Hopital's rule," 03:40, 5 April 2006].

    lim_{z→z_o} g(z)/f(z) = lim_{z→z_o} dg/df.

Inverting,

    lim_{z→z_o} f(z)/g(z) = lim_{z→z_o} df/dg = lim_{z→z_o} (df/dz)/(dg/dz).

And if z_o itself is infinite? Then, whether it represents a root or a pole, we define the new variable Z = 1/z and the new functions Φ(Z) = f(1/Z) = f(z) and Γ(Z) = g(1/Z) = g(z), with which we apply l'Hôpital's rule for Z → 0 to obtain

    lim_{z→∞} f(z)/g(z) = lim_{Z→0} Φ(Z)/Γ(Z) = lim_{Z→0} (dΦ/dZ)/(dΓ/dZ) = lim_{Z→0} (df/dZ)/(dg/dZ)
        = lim_{z→∞, Z→0} [(df/dz)(dz/dZ)]/[(dg/dz)(dz/dZ)]
        = lim_{z→∞} [(df/dz)(−z^2)]/[(dg/dz)(−z^2)]
        = lim_{z→∞} (df/dz)/(dg/dz).

Nothing in the derivation requires that z or z_o be real.

L'Hôpital's rule is used in evaluating indeterminate forms of the kinds 0/0 and ∞/∞, plus related forms like (0)(∞) which can be recast into either of the two main forms. Good examples of the use require math from Ch. 5 and later, but a simple example can be given here. Consider the ratio lim_{x→0} (x^3 + x)^2/x^2, which is 0/0. The easier way to resolve this particular ratio would naturally be to cancel a factor of x^2 from it, but just to make the point let us apply l'Hôpital's rule instead. Applying the rule reduces the ratio to lim_{x→0} 2(x^3 + x)(3x^2 + 1)/2x, which is still 0/0. Applying l'Hôpital's rule again to the result yields lim_{x→0} 2[(3x^2 + 1)^2 + (x^3 + x)(6x)]/2 = 2/2 = 1. Nothing prevents one from applying l'Hôpital's rule recursively in this way, should the occasion arise; where expressions involving trigonometric or special functions (Chs. 5 and [not yet written]) appear in ratio, a recursive application of l'Hôpital's rule can be just the thing one needs. Observe, however, that one must stop applying l'Hôpital's rule once the ratio is no longer 0/0 or ∞/∞: in the example, applying the rule a third time would have ruined the result.

If we may borrow from (5.8) the natural logarithmic function and its derivative,[20]

    d ln x/dx = 1/x,

then a typical l'Hôpital example is[21]

    lim_{x→∞} ln x/√x = lim_{x→∞} (1/x)/(1/2√x) = lim_{x→∞} 2/√x = 0.

The example incidentally shows that natural logarithms grow slower than square roots, an instance of a more general principle we shall meet in § 5.3. Section 5.3 will put l'Hôpital's rule to work.

[20] This paragraph is optional reading for the moment. You can read Ch. 5 first, then come back here and read the paragraph if you prefer.
[21] [65, § 10-2]

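One can also watch the last example happen numerically. A sketch (the sample points are arbitrary):

    # lim ln(x)/sqrt(x) = 0 as x -> infinity, observed at a few points.
    import math
    for x in (1e2, 1e4, 1e8, 1e16):
        print(x, math.log(x) / math.sqrt(x))
    # 0.46, 0.092, 0.0018, 3.7e-07: the ratio dwindles, though slowly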
etc. y 93 x xk xk+1 f (x) 4.4. is the line which most nearly approximates a curve at a given point. The tangent touches the curve at the point.5 is a good example of a tangent line. A tangent line. give successively better estimates of the true root z∞ . To understand the Newton-Raphson iteration.8 The Newton-Raphson iteration The Newton-Raphson iteration is a powerful.5: The Newton-Raphson iteration. The trigonometric tangent function is named from a variation on Fig. fast converging. also just called a tangent. z2 .. leaving the rightward leg tangent to the circle. calculated in turn by the iteration (4.31). Given a function f (z) of which the root is desired.5.8. consider the function y = f (x) of Fig 4. 4.1 in which the triangle’s bottom leg is extended to unit length. Then z1 . the Newton-Raphson iteration is zk+1 = z − f (z) d dz f (z) z=zk . broadly applicable method for ﬁnding roots numerically. and in the neighborhood of the point it goes in the same direction the curve goes. THE NEWTON-RAPHSON ITERATION Figure 4. The relationship between the tangent line and the trigonometric tangent function of Ch. z3 . 3 is slightly obscure. The iteration approximates the curve f (x) by its tangent line24 (shown as the dashed line in the ﬁgure): ˜ fk (x) = f (xk ) + 24 d f (x) dx x=xk (x − xk ). (4. 3. . maybe more of linguistic interest than of mathematical.31) One begins the iteration by guessing the root and calling the guess z0 . The dashed line in Fig.

It then approximates the root x_{k+1} as the point at which f̃_k(x_{k+1}) = 0:

    f̃_k(x_{k+1}) = 0 = f(x_k) + [d f(x)/dx]|_{x=x_k} (x_{k+1} − x_k).

Solving for x_{k+1}, we have that

    x_{k+1} = [ x − f(x)/(d f(x)/dx) ]|_{x=x_k},

which is (4.31) with x ← z.

Although the illustration uses real numbers, nothing forbids complex z and f(z). The Newton-Raphson iteration works just as well for these.

The principal limitation of the Newton-Raphson arises when the function has more than one root, as most interesting functions do. The iteration often converges on the root nearest the initial guess z_o but does not always, and in any case there is no guarantee that the root it finds is the one you wanted. The most straightforward way to beat this problem is to find all the roots: first you find some root α; then you remove that root (without affecting any of the other roots) by dividing f(z)/(z − α); then you find the next root by iterating on the new function f(z)/(z − α); and so on until you have found all the roots. If this procedure is not practical (perhaps because the function has a large or infinite number of roots), then you should probably take care to make a sufficiently accurate initial guess if you can.

A second limitation of the Newton-Raphson is that, if you happen to guess z_0 especially unfortunately, then the iteration might never converge at all. For example, the roots of f(z) = z^2 + 2 are z = ±i√2, but if you guess that z_0 = 1 then the iteration has no way to leave the real number line, so it never converges[23] (and if you guess that z_0 = √2, well, try it with your pencil and see what z_2 comes out to be). You can fix the problem with a different, possibly complex initial guess.

A third limitation arises where there is a multiple root. In this case, the Newton-Raphson normally still converges, but relatively slowly. For instance, the Newton-Raphson converges relatively slowly on the triple root of f(z) = z^3. However, even the relatively slow convergence is still pretty fast and is usually adequate, even for calculations by hand.

[23] It is entertaining to try this on a computer. Then try again with z_0 = 1 + i2^{−0x10}.

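A minimal sketch of (4.31) in Python, run on the example just given, f(z) = z^2 + 2. The helper's name, its step and tolerance limits, and the guesses are this sketch's own choices, not the book's:

    # Newton-Raphson iteration (4.31) for complex z.
    def newton_raphson(f, df, z0, steps=50, tol=1e-12):
        z = complex(z0)
        for _ in range(steps):
            d = df(z)
            if d == 0:
                break                      # tangent is level; give up
            z_next = z - f(z) / d          # eqn. (4.31)
            if abs(z_next - z) < tol:
                return z_next
            z = z_next
        return z                           # may not have converged

    f  = lambda z: z * z + 2
    df = lambda z: 2 * z

    print(newton_raphson(f, df, 1 + 1j))   # about 1.4142j, a true root
    print(newton_raphson(f, df, 1))        # real guess: never leaves the real line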
Usually in practice, the Newton-Raphson iteration works very well. For most functions, once the Newton-Raphson finds the root's neighborhood, it converges on the actual root remarkably quickly. Figure 4.5 shows why: in the neighborhood, the curve hardly departs from the straight line.

The Newton-Raphson iteration is a champion square root calculator, incidentally. Consider

    f(x) = x^2 − p,

whose roots are

    x = ±√p.

Per (4.31), the Newton-Raphson iteration for this is

    x_{k+1} = (1/2)[x_k + p/x_k].    (4.32)

If you start by guessing x_0 = 1 and iterate several times, the iteration (4.32) converges on x_∞ = √p fast. To calculate the nth root x = p^{1/n}, let f(x) = x^n − p and iterate[24][25]

    x_{k+1} = (1/n)[(n − 1)x_k + p/x_k^{n−1}].    (4.33)

Equations (4.32) and (4.33) work not only for real p but also usually for complex. Given x_0 = 1, however, they converge reliably and orderly only for real, nonnegative p. (To see why, sketch f[x] in the fashion of Fig. 4.5.) If reliable, orderly convergence is needed for complex p = u + iv = σ cis ψ, σ ≥ 0, you can decompose p^{1/n} per de Moivre's theorem (3.28) as p^{1/n} = σ^{1/n} cis(ψ/n), in which cis(ψ/n) = cos(ψ/n) + i sin(ψ/n) is calculated by the Taylor series of Table 8.1. Then σ is real and nonnegative, upon which (4.33) reliably, orderly computes σ^{1/n}.

The Newton-Raphson iteration however excels as a practical root-finding technique, so it often pays to be a little less theoretically rigid in applying it. If so, then don't bother to decompose; seek p^{1/n} directly, using complex z_k in place of the real x_k. In the uncommon event that the direct iteration does not seem to converge, start over again with some randomly chosen complex z_0. This saves effort and usually works.

[24] [65, § 4-9][52, § 6.1]
[25] [76]

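The root iterations (4.32) and (4.33) in a few lines of Python (a sketch only; the sample values of p and n are arbitrary):

    # nth root of p by the Newton-Raphson iteration (4.33), with x0 = 1.
    def nth_root(p, n, steps=60):
        x = 1.0
        for _ in range(steps):
            x = ((n - 1) * x + p / x**(n - 1)) / n   # eqn. (4.33)
        return x

    print(nth_root(2.0, 2))   # 1.41421356..., sqrt(2) per (4.32)
    print(nth_root(5.0, 3))   # 1.70997594..., the cube root of 5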
Section 13.7 generalizes the Newton-Raphson iteration to handle vector-valued functions. Chapter 8, treating the Taylor series, will continue the general discussion of the derivative. This concludes the chapter.

Chapter 5

The complex exponential

The complex natural exponential is ubiquitous in higher mathematics. There seems hardly a corner of calculus, basic or advanced, in which the complex exponential does not strongly impress itself and frequently arise. Because the complex natural exponential emerges (at least pedagogically) out of the real natural exponential, this chapter introduces first the real natural exponential and its inverse, the real natural logarithm, and then proceeds to show how the two can operate on complex arguments. It derives the exponential and logarithmic functions' basic properties and explains their close relationship to the trigonometrics. It works out the functions' derivatives and the derivatives of the basic members of the trigonometric and inverse trigonometric families to which they respectively belong.

5.1 The real exponential

Consider the factor

    (1 + ε)^N.

This is the overall factor by which a quantity grows after N iterative rounds of multiplication by (1 + ε). What happens when ε is very small but N is very large? The really interesting question is, what happens in the limit, as ε → 0 and N → ∞, while x = εN remains a finite number? The answer is that the factor becomes

    exp x ≡ lim_{ε→0} (1 + ε)^{x/ε}.    (5.1)

Equation (5.1) defines the natural exponential function, commonly, more briefly named the exponential function. Another way to write the same

definition is

    e ≡ lim_{ε→0} (1 + ε)^{1/ε},    (5.2)
    exp x = e^x.    (5.3)

Whichever form we write it in, the question remains as to whether the limit actually exists; that is, whether 0 < e < ∞; whether in fact we can put some concrete bound on e. To show that we can,[1] we observe per (4.13) and (4.19) that the derivative of the exponential function is

    d exp x/dx = lim_{δ→0} [exp(x + δ/2) − exp(x − δ/2)]/δ
               = lim_{δ,ε→0} [(1 + ε)^{(x+δ/2)/ε} − (1 + ε)^{(x−δ/2)/ε}]/δ
               = lim_{δ,ε→0} (1 + ε)^{x/ε} [(1 + ε)^{+δ/2ε} − (1 + ε)^{−δ/2ε}]/δ
               = lim_{δ,ε→0} (1 + ε)^{x/ε} [(1 + δ/2) − (1 − δ/2)]/δ
               = lim_{ε→0} (1 + ε)^{x/ε},

which is to say that

    d exp x/dx = exp x.    (5.4)

This is a curious, important result: the derivative of the exponential function is the exponential function itself; the slope and height of the exponential function are everywhere equal. For the moment, however, what interests us is that

    d exp 0/dx = exp 0 = lim_{ε→0} (1 + ε)^0 = 1,

which says that the slope and height of the exponential function are both unity at x = 0, implying that the straight line which best approximates the exponential function in that neighborhood, the tangent line, which just grazes the curve, is

    y(x) = 1 + x.

[1] Excepting (5.4), the author would prefer to omit much of the rest of this section, but even at the applied level cannot think of a logically permissible way to do it. It seems nonobvious that the limit lim_{ε→0} (1 + ε)^{1/ε} actually does exist. The rest of this section shows why it does.

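One can watch the limit (5.2) converge numerically. A sketch (the step values are arbitrary; too small an ε would eventually founder on floating-point roundoff, but these values are safe):

    # (1 + eps)**(1/eps) approaches e = 2.71828... (0x2.B7E1...) as eps shrinks.
    for eps in (1e-1, 1e-3, 1e-5, 1e-7):
        print(eps, (1 + eps)**(1 / eps))
    # 2.5937..., 2.7169..., 2.71826..., 2.7182816...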
With the tangent line y(x) found, the next step toward putting a concrete bound on e is to show that y(x) ≤ exp x for all real x; that is, that the curve runs nowhere below the line. To show this, we observe per (5.1) that the essential action of the exponential function is to multiply repeatedly by 1 + ε as x increases and, contrariwise, to divide repeatedly by 1 + ε as x decreases. Since 1 + ε > 1, this action means for real x that

    exp x_1 ≤ exp x_2 if x_1 ≤ x_2.

However, a positive number remains positive no matter how many times one multiplies or divides it by 1 + ε, so the same action also means that

    0 ≤ exp x

for all real x. In light of (5.4), the last two equations imply further that

    d exp x_1/dx ≤ d exp x_2/dx if x_1 ≤ x_2,
    0 ≤ d exp x/dx.

But we have purposely defined the tangent line y(x) = 1 + x such that

    exp 0 = y(0) = 1,
    d exp 0/dx = d y(0)/dx = 1;

that is, such that the line just grazes the curve of exp x at x = 0. Rightward, at x > 0, evidently the curve's slope only increases, bending upward away from the line. Leftward, at x < 0, evidently the curve's slope only decreases, again bending upward away from the line. Either way, the curve never crosses below the line for real x. In symbols,

    y(x) ≤ exp x.

Figure 5.1 depicts.

Evaluating the last inequality at x = −1/2 and x = 1, we have that

    1/2 ≤ exp(−1/2),
    2 ≤ exp(1).

But per (5.2), exp x = e^x, so

    1/2 ≤ e^{−1/2},
    2 ≤ e^1,

or in other words,

    2 ≤ e ≤ 4,    (5.5)

which in consideration of (5.2) puts the desired bound on the exponential function. The limit does exist.

Figure 5.1: The natural exponential. [plot of exp x against x, with the tangent line y = 1 + x grazing the curve at x = 0 and crossing the x axis at x = −1]

By the Taylor series of Table 8.1, the value

    e ≈ 0x2.B7E1

can readily be calculated, but the derivation of that series does not come until Ch. 8.

5.2 The natural logarithm

In the general exponential expression b^x, one can choose any base b; for example, b = 2 is an interesting choice. As we shall see in § 5.4, however, it turns out that b = e, where e is the constant introduced in (5.2), is the most interesting choice of all. For this reason among others, the base-e logarithm is similarly interesting, such that we define for it the special notation

    ln(·) = log_e(·),

and call it the natural logarithm. Dividing (5.4) by exp x yields the logarithmic derivative (§ 4.4.5)

    d(exp x)/[(exp x) dx] = 1,    (5.6)

a form which expresses or captures the deep curiosity of the natural exponential maybe even better than does (5.4).

Figure 5.2: The natural logarithm. [plot of ln x against x, crossing the x axis at x = 1, with −1 marked on the vertical axis]

If

    y = ln x,

then

    x = exp y,

and per (5.4),

    dx/dy = exp y.

But this means that

    dx/dy = x,

the inverse of which is

    dy/dx = 1/x.

In other words,

    d ln x/dx = 1/x.    (5.7)

Figure 5.2 plots the natural logarithm. The natural logarithm inverts the natural exponential and vice versa:

    ln exp x = ln e^x = x,
    exp ln x = e^{ln x} = x.    (5.8)

Like many of the equations in these early chapters, here is another rather significant result. One can specialize Table 2.5's logarithmic base-conversion identity to read

    log_b w = ln w/ln b.    (5.9)

This equation converts any logarithm to a natural logarithm. Base b = 2 logarithms are interesting, so we note here that ln 2 = −ln(1/2) ≈ 0x0.B172, which Ch. 8 and its Table 8.1 will show how to calculate.

5.3 Fast and slow functions

The exponential exp x is a fast function. The logarithm ln x is a slow function. These functions grow, diverge or decay respectively faster and slower than x^a. Such claims are proved by l'Hôpital's rule (4.30). Applying the rule, we have that

    lim_{x→∞} ln x/x^a = lim_{x→∞} −1/(a x^a) = { 0 if a > 0;  +∞ if a ≤ 0 },
    lim_{x→0} ln x/x^a = lim_{x→0} −1/(a x^a) = { 0 if a < 0;  −∞ if a ≥ 0 },    (5.10)

which reveals the logarithm to be a slow function. Since the exp(·) and ln(·) functions are mutual inverses, we can leverage (5.10) to show also that

    lim_{x→∞} exp(±x)/x^a = lim_{x→∞} exp[±x − a ln x]
        = lim_{x→∞} exp[(x)(±1 − a (ln x)/x)]
        = lim_{x→∞} exp[(x)(±1 − 0)]
        = lim_{x→∞} exp[±x].

That is,

    lim_{x→∞} exp(+x)/x^a = ∞,
    lim_{x→∞} exp(−x)/x^a = 0,    (5.11)

which reveals the exponential to be a fast function. Exponentials grow or decay faster than powers; logarithms diverge slower. Such conclusions are extended to bases other than the natural base e simply by observing that log_b x = ln x/ln b and that b^x = exp(x ln b). Thus exponentials generally are fast and logarithms generally are slow, regardless of the base.[2]

It is interesting and worthwhile to contrast the sequence

    ..., −3!/x^4, 2!/x^3, −1!/x^2, 0!/x^1, ln x, x^1/1!, x^2/2!, x^3/3!, x^4/4!, ...

against the sequence

    ..., −3!/x^4, 2!/x^3, −1!/x^2, 0!/x^1, x^0, x^1/1!, x^2/2!, x^3/3!, x^4/4!, ...

As x → +∞, each sequence increases in magnitude going rightward. Also, each term in each sequence is the derivative with respect to x of the term to its right, the exceptions being at the middle: in the first sequence, ln x is not the derivative of x^1/1!; in the second, 0!/x^1 is not the derivative of x^0. The exception is peculiar. What is going on here?

The answer is that x^0 (which is just a constant) and ln x both are of zeroth order in x. This seems strange at first because ln x diverges as x → ∞ whereas x^0 does not. The natural logarithm does indeed eventually diverge to infinity, in the literal sense that there is no height it does not eventually reach, but the divergence is extremely slow; so slow, in fact, that per (5.10), lim_{x→∞} (ln x)/x^ε = 0 for any positive ε no matter how small.[3] The logarithm ascends, but it certainly does not hurry; and indeed one can write

    x^0 = lim_{u→∞} ln(x + u)/ln u,

which casts x^0 as a logarithm shifted and scaled.

[2] There are of course some degenerate edge cases like b = 0 and b = 1. The reader can detail these as the need arises.
[3] One does not grasp how truly slow the divergence is until one calculates a few concrete values. Consider for instance how far out x must run to make ln x = 0x100. It's a long, long way. Figure 5.2 has plotted ln x only for x ∼ 1, but beyond the figure's window the curve (whose slope is 1/x) flattens rapidly rightward, to the extent that it locally resembles the plot of a constant value; it takes practically forever just to reach 0x100.

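A concrete taste of the footnote's point (a sketch; ln x = 0x100 means ln x = 256 in decimal):

    # How far must x run for ln x to reach 256?  Out to e**256.
    import math
    print(math.exp(256))                   # about 1.51e+111
    print(math.log(1e6), math.log(1e12))   # ~13.8 vs ~27.6: a millionfold
                                           # increase in x merely doubles ln x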
Admittedly, one ought not strain such logic too far, because ln x is not in fact a constant, but the point nevertheless remains that x^0 and ln x often play analogous roles in mathematics. The logarithm can in some situations profitably be thought of as a "diverging constant" of sorts.

Less strange-seeming perhaps is the consequence of (5.11) that exp x is of infinite order in x, that x^∞ and exp x play analogous roles.

It befits an applied mathematician subjectively to internalize (5.10) and (5.11), to remember that ln x resembles x^0 and that exp x resembles x^∞. A qualitative sense that logarithms are slow and exponentials, fast, helps one to grasp mentally the essential features of many mathematical models one encounters in practice.

Now leaving aside fast and slow functions for the moment, we turn our attention in the next section to the highly important matter of the exponential of a complex argument.

5.4 Euler's formula

The result of § 5.1 leads to one of the central questions in all of mathematics. How can one evaluate

    exp iθ = lim_{ε→0} (1 + ε)^{iθ/ε},

where i^2 = −1 is the imaginary unit introduced in § 2.12?

To begin, one can take advantage of (4.14) to write the last equation in the form

    exp iθ = lim_{ε→0} (1 + iε)^{θ/ε},

but from here it is not obvious where to go. The book's development up to the present point gives no obvious direction. In fact it appears that the interpretation of exp iθ remains for us to define, if we can find a way to define it which fits sensibly with our existing notions of the real exponential. So, if we don't quite know where to go with this yet, what do we know? One thing we know is that if θ = ε, then

    exp(iε) = (1 + iε)^{ε/ε} = 1 + iε.

Figure 5.3: The complex exponential and Euler's formula. [Argand plane: the point z = x + iy at radius ρ and phase φ, with the increment Δz drawn at right angles to the radial arm]

With such thoughts in mind, let us multiply a complex number z = x + iy by 1 + iε, obtaining

    (1 + iε)(x + iy) = (x − εy) + i(y + εx).

The resulting change in z is

    Δz = (1 + iε)(x + iy) − (x + iy) = (ε)(−y + ix),

in which it is seen that

    |Δz| = (ε)√(y^2 + x^2) = ερ,
    arg(Δz) = arctan[x/(−y)] = φ + 2π/4.

The Δz, ρ = |z| and φ = arg z are as shown in Fig. 5.3. Whether in the figure or in the equations, the change Δz is evidently proportional to the magnitude of z, but at a right angle to z's radial arm in the complex plane. To travel about a circle wants motion always perpendicular to the circle's radial arm, which happens to be just the kind of motion Δz represents.

Referring to the figure and the last equations, we have then that

    Δρ ≡ |z + Δz| − |z| = 0,
    Δφ ≡ arg(z + Δz) − arg z = |Δz|/ρ = ερ/ρ = ε,

which results evidently are valid for infinitesimal ε → 0 and, importantly, stand independently of the value of ρ. (But does ρ not grow at least a little, as the last equations almost seem to suggest? The answer is no; or, if you prefer, the answer is that Δρ ≈ {[√(1 + ε^2)] − 1}ρ ≈ ε^2 ρ/2 ≈ 0, a second-order infinitesimal inconsequential on the scale of ε or ερ, utterly vanishing by comparison in the limit ε → 0.)

With such results in hand, now let us recall from earlier in the section that, as we have asserted or defined,

    exp iθ = lim_{ε→0} (1 + iε)^{θ/ε},

and that this remains so for arbitrary real θ. Yet what does such an equation do, mechanically, but to compute exp iθ by multiplying 1 by 1 + iε repeatedly, θ/ε times? The plain answer is that such an equation does precisely this and nothing else. We have recently seen how each multiplication of the kind the equation suggests increments the phase φ by Δφ = ε while not changing the magnitude ρ. Since the phase φ begins from arg 1 = 0, it must become

    φ = (θ/ε) ε = θ

after θ/ε increments of ε each, while the magnitude must remain

    ρ = 1.

Reversing the sequence of the last two equations and recalling that ρ ≡ |exp iθ| and that φ ≡ arg(exp iθ),

    |exp iθ| = 1,
    arg(exp iθ) = θ.

Moreover, had we known that θ were just φ ≡ arg(exp iθ), naturally we should have represented it by the symbol φ from the start. Changing φ ← θ now, we have for real φ that

    |exp iφ| = 1,
    arg(exp iφ) = φ,

which equations together say neither more nor less than that

    exp iφ = cos φ + i sin φ = cis φ,    (5.12)

where the notation cis(·) is as defined in § 3.11.

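The mechanical construction just described can be carried out literally on a computer. A sketch (θ and ε are arbitrary choices; the loop multiplies 1 by 1 + iε exactly θ/ε times):

    # Build exp(i*theta) by repeated multiplication and compare against
    # cos(theta) + i*sin(theta).
    import cmath, math

    theta = 2 * math.pi / 3
    eps = 1e-5
    z = 1 + 0j
    for _ in range(round(theta / eps)):
        z *= 1 + 1j * eps

    print(z)                        # approx -0.5 + 0.8660i
    print(cmath.rect(1.0, theta))   # cos + i sin, for comparison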
Along with the Pythagorean theorem (2.47), the fundamental theorem of calculus (7.2), Cauchy's integral formula (8.29) and Fourier's equation (18.1), eqn. (5.12) is one of the most famous results in all of mathematics. It is called Euler's formula,[4][5] and it opens the exponential domain fully to complex numbers, not just for the natural base e but for any base. How? Consider in light of Fig. 5.3 and (5.12) that one can express any complex number in the form

    z = x + iy = ρ exp iφ,

where

    x = ρ cos φ,    y = ρ sin φ.

If a complex base w is similarly expressed in the form

    w = u + iv = σ exp iψ,

where

    u = σ cos ψ,    v = σ sin ψ,    σ = √(u^2 + v^2),    tan ψ = v/u,

then it follows that

    w^z = exp[ln w^z] = exp[z ln w] = exp[(x + iy)(iψ + ln σ)] = exp[(x ln σ − ψy) + i(y ln σ + ψx)].

Since exp(α + β) = e^{α+β} = exp α exp β, the last equation is

    w^z = exp(x ln σ − ψy) exp i(y ln σ + ψx).    (5.13)

Equation (5.13) serves to raise any complex number to a complex power.

[4] For native English speakers who do not speak German, Leonhard Euler's name is pronounced as "oiler."
[5] An alternate derivation of Euler's formula (5.12), less intuitive and requiring slightly more advanced mathematics, but briefer, constructs from Table 8.1 the Taylor series for exp iφ, cos φ and i sin φ, then adds the latter two to show them equal to the first of the three. Such an alternate derivation lends little insight, perhaps, but at least it builds confidence that we actually knew what we were doing when we came up with the incredible (5.12).

Curious consequences of Euler's formula (5.12) include that

    e^{±i2π/4} = ±i,
    e^{±i2π/2} = −1,
    e^{in2π} = 1.    (5.14)

For the natural logarithm of a complex number in light of Euler's formula, we have that

    ln w = ln(σ e^{iψ}) = ln σ + iψ.    (5.15)

5.5 Complex exponentials and de Moivre's theorem

Euler's formula (5.12) implies that complex numbers z_1 and z_2 can be written

    z_1 = ρ_1 e^{iφ_1},
    z_2 = ρ_2 e^{iφ_2}.    (5.16)

By the basic power properties of Table 2.2, then,

    z_1 z_2 = ρ_1 ρ_2 e^{i(φ_1+φ_2)} = ρ_1 ρ_2 exp[i(φ_1 + φ_2)],
    z_1/z_2 = (ρ_1/ρ_2) e^{i(φ_1−φ_2)} = (ρ_1/ρ_2) exp[i(φ_1 − φ_2)],
    z^a = ρ^a e^{iaφ} = ρ^a exp[iaφ].    (5.17)

This is de Moivre's theorem, introduced in § 3.11.

5.6 Complex trigonometrics

Applying Euler's formula (5.12) to +φ then to −φ, we have that

    exp(+iφ) = cos φ + i sin φ,
    exp(−iφ) = cos φ − i sin φ.

Adding the two equations and solving for cos φ yields

    cos φ = [exp(+iφ) + exp(−iφ)]/2.    (5.18)

Subtracting the second equation from the first and solving for sin φ yields

    sin φ = [exp(+iφ) − exp(−iφ)]/i2.    (5.19)

Thus are the trigonometrics expressed in terms of complex exponentials.

5.6.1 The hyperbolic functions

The forms (5.18) and (5.19) suggest the definition of new functions

    cosh φ ≡ [exp(+φ) + exp(−φ)]/2,    (5.20)
    sinh φ ≡ [exp(+φ) − exp(−φ)]/2,    (5.21)
    tanh φ ≡ sinh φ/cosh φ.    (5.22)

These are called the hyperbolic functions. Their inverses arccosh, etc., are defined in the obvious way. The Pythagorean theorem for trigonometrics (3.2) is that cos^2 φ + sin^2 φ = 1; and from (5.20) and (5.21) one can derive the hyperbolic analog:

    cos^2 φ + sin^2 φ = 1,
    cosh^2 φ − sinh^2 φ = 1.    (5.23)

Both lines of (5.23) hold for complex φ as well as for real.[6]

The notation exp i(·) or e^{i(·)} is sometimes felt to be too bulky. Although less commonly seen than the other two, the notation

    cis(·) ≡ exp i(·) = cos(·) + i sin(·)

is also conventionally recognized, as earlier seen in § 3.11. Also conventionally recognized are sin^{−1}(·) and occasionally asin(·) for arcsin(·), and likewise for the several other trigonometrics.

Replacing z ← φ in this section's several equations implies a coherent definition for trigonometric functions of a complex variable.

[6] Chapter 15 teaches that the "dot product" of a unit vector and its own conjugate is unity, v̂* · v̂ = 1 in the notation of that chapter, which tempts one incorrectly to suppose by analogy that (cos φ)* cos φ + (sin φ)* sin φ = 1 and that (cosh φ)* cosh φ − (sinh φ)* sinh φ = 1 when the angle φ is complex. Such conjugated forms can generally be true only if φ is real; it is (5.23), exactly as written, which holds for complex φ as well as for real. We originally derived (5.23)'s first line from Fig. 3.1, whose angle φ is a real angle; the figure is quite handy for real φ, but what if anything the figure means when φ is complex is not obvious. Such confusion probably tempts few readers unfamiliar with the material of Ch. 15, so you can ignore this footnote for now. Then, if later you return after reading Ch. 15 and if the confusion then arises, such thoughts may serve to clarify the matter.

Comparing (5.18) and (5.19) respectively to (5.20) and (5.21), we have that

    cosh z = cos iz,
    i sinh z = sin iz,
    i tanh z = tan iz,    (5.24)

by which one can immediately adapt the many trigonometric properties of Tables 3.1 and 3.3 to hyperbolic use.

At this point in the development one begins to notice that the sin, cos, exp, cis, cosh and sinh functions are each really just different facets of the same mathematical phenomenon. Likewise their respective inverses: arcsin, arccos, ln, −i ln, arccosh and arcsinh. Conventional names for these two mutually inverse families of functions are unknown to the author, but one might call them the natural exponential and natural logarithmic families. Or, if the various tangent functions were included, then one might call them the trigonometric and inverse trigonometric families.

5.6.2 Inverse complex trigonometrics

Since one can express the several trigonometric functions in terms of complex exponentials, one would like to know, complementarily, whether one cannot express the several inverse trigonometric functions in terms of complex logarithms. As it happens, one can.[7]

Let us consider the arccosine function, for instance. If per (5.18)

    z = cos w = (e^{iw} + e^{−iw})/2,

then by successive steps,

    e^{iw} = 2z − e^{−iw},
    e^{iw} e^{iw} = 2z e^{iw} − 1,
    e^{iw} = z ± √(z^2 − 1),

the last step of which has used the quadratic formula (2.2). Taking the logarithm, we have that

    w = (1/i) ln[z ± i√(1 − z^2)],

or, since by definition z = cos w, that

    arccos z = (1/i) ln[z ± i√(1 − z^2)].    (5.25)

Similarly,

    arcsin z = (1/i) ln[iz ± √(1 − z^2)].    (5.26)

The arctangent goes only a little differently:

    z = tan w = −i (e^{iw} − e^{−iw})/(e^{iw} + e^{−iw}),
    z e^{iw} + z e^{−iw} = −i e^{iw} + i e^{−iw},
    (i + z) e^{iw} = (i − z) e^{−iw},
    e^{i2w} = (i − z)/(i + z),

implying that

    arctan z = (1/i2) ln[(i − z)/(i + z)].    (5.27)

By the same means, one can work out the inverse hyperbolics to be

    arccosh z = ln[z ± √(z^2 − 1)],
    arcsinh z = ln[z ± √(z^2 + 1)],
    arctanh z = (1/2) ln[(1 + z)/(1 − z)].    (5.28)

[7] [67, Ch. 2]

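One can spot-check (5.25) against a library arccosine. A sketch in Python, assuming principal branches of the logarithm and square root throughout and choosing the plus sign; the sample points are arbitrary and are kept off the real branch cuts (−∞, −1] and [1, ∞), where sign-of-zero conventions differ:

    # arccos z = (1/i) ln[z + i*sqrt(1 - z**2)], checked numerically.
    import cmath

    def arccos_by_log(z):
        return cmath.log(z + 1j * cmath.sqrt(1 - z*z)) / 1j

    for z in (0.5 + 0j, -0.3 + 0.8j, 1.5 + 0.5j):
        print(arccos_by_log(z), cmath.acos(z))   # the pairs agree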
5.7 Summary of properties

Table 5.1 gathers properties of the complex exponential from this chapter and from §§ 2.12, 3.11 and 4.4.

Table 5.1: Complex exponential properties.

    z ≡ x + iy = ρ e^{iφ}        w ≡ u + iv = σ e^{iψ}
    exp z ≡ e^z                  cis z ≡ cos z + i sin z = e^{iz}
    i^2 = −1 = (−i)^2            1/i = −i
    e^{iφ} = cos φ + i sin φ     e^{iz} = cos z + i sin z
    z_1 z_2 = ρ_1 ρ_2 e^{i(φ_1+φ_2)} = (x_1 x_2 − y_1 y_2) + i(y_1 x_2 + x_1 y_2)
    z_1/z_2 = (ρ_1/ρ_2) e^{i(φ_1−φ_2)} = [(x_1 x_2 + y_1 y_2) + i(y_1 x_2 − x_1 y_2)]/(x_2^2 + y_2^2)
    z^a = ρ^a e^{iaφ}            w^z = e^{x ln σ − ψy} e^{i(y ln σ + ψx)}
    ln w = ln σ + iψ
    sin z = (e^{iz} − e^{−iz})/i2    cos z = (e^{iz} + e^{−iz})/2    tan z = sin z/cos z
    sinh z = (e^z − e^{−z})/2        cosh z = (e^z + e^{−z})/2       tanh z = sinh z/cosh z
    sin iz = i sinh z                cos iz = cosh z                 tan iz = i tanh z
    arcsin z = (1/i) ln[iz ± √(1 − z^2)]       arcsinh z = ln[z ± √(z^2 + 1)]
    arccos z = (1/i) ln[z ± i√(1 − z^2)]       arccosh z = ln[z ± √(z^2 − 1)]
    arctan z = (1/i2) ln[(i − z)/(i + z)]      arctanh z = (1/2) ln[(1 + z)/(1 − z)]
    cos^2 z + sin^2 z = 1 = cosh^2 z − sinh^2 z
    d exp z/dz = exp z           d ln w/dw = 1/w
    (d/dz) ln f(z) = (df/dz)/f(z)
    log_b w = ln w/ln b

5. Better applied style is to ﬁnd the derivatives by observing directly the circle from which the sine and cosine functions come. which is to say that arg(dz/dt) = φ + 2π/4.5.19).4.4: The derivatives of the sine and cosine functions. • the direction is at right angles to the arm of ρ. ℑ(z) dz dt z ℜ(z) 113 ρ ωt + φo 5. Refer to Fig.1 Derivatives of sine and cosine One can compute derivatives of the sine and cosine functions from (5. With these observations we can write that dz 2π 2π = (ρω) cos ωt + φo + + i sin ωt + φo + dt 4 4 = (ρω) [− sin(ωt + φo ) + i cos(ωt + φo )] . considering the ﬁgure. and in what Argand direction? Answer: (5. Suppose that the point z in the ﬁgure is not ﬁxed but travels steadily about the circle such that z(t) = (ρ) [cos(ωt + φo ) + i sin(ωt + φo )] .31) . d d dz = (ρ) cos(ωt + φo ) + i sin(ωt + φo ) .18) and (5. but to do it in that way doesn’t seem sporting.8. DERIVATIVES OF COMPLEX EXPONENTIALS Figure 5.8. • the speed |dz/dt| is also (ρ)(dφ/dt) = ρω.29) How fast then is the rate dz/dt.30) (5. (5. dt dt dt Evidently however.

Matching the real and imaginary parts of (5.30) against those of (5.31), we have that

    (d/dt) cos(ωt + φ_o) = −ω sin(ωt + φ_o),
    (d/dt) sin(ωt + φ_o) = +ω cos(ωt + φ_o).    (5.32)

If ω = 1 and φ_o = 0, these are

    (d/dt) cos t = −sin t,
    (d/dt) sin t = +cos t.    (5.33)

5.8.2 Derivatives of the trigonometrics

Equations (5.4) and (5.33) give the derivatives of exp(·), sin(·) and cos(·). From these, with the help of (5.23) and the derivative chain and product rules (§ 4.5), one can calculate the several derivatives of Table 5.2.[8]

Table 5.2: Derivatives of the trigonometrics.

    d exp z/dz = +exp z          d(1/exp z)/dz = −1/exp z
    d sin z/dz = +cos z          d(1/sin z)/dz = −1/(sin z tan z)
    d cos z/dz = −sin z          d(1/cos z)/dz = +tan z/cos z
    d tan z/dz = +(1 + tan^2 z) = +1/cos^2 z
    d(1/tan z)/dz = −(1 + 1/tan^2 z) = −1/sin^2 z
    d sinh z/dz = +cosh z        d(1/sinh z)/dz = −1/(sinh z tanh z)
    d cosh z/dz = +sinh z        d(1/cosh z)/dz = −tanh z/cosh z
    d tanh z/dz = 1 − tanh^2 z = +1/cosh^2 z
    d(1/tanh z)/dz = 1 − 1/tanh^2 z = −1/sinh^2 z

5.8.3 Derivatives of the inverse trigonometrics

Observe the pair

    d exp z/dz = exp z,
    d ln w/dw = 1/w.

The natural exponential exp z belongs to the trigonometric family of functions, as does its derivative. The natural logarithm ln w, by contrast, belongs to the inverse trigonometric family of functions; but its derivative is simpler, not a trigonometric or inverse trigonometric function at all. By analogy, do all the inverse trigonometrics have simpler derivatives? It turns out that they do. Refer to the account of the natural logarithm's derivative in § 5.2. Following a similar procedure, we have by successive steps that, if

    arcsin w = z,

then

    w = sin z,
    dw/dz = cos z = ±√(1 − sin^2 z) = ±√(1 − w^2),
    dz/dw = ±1/√(1 − w^2),

which is to say that

    d arcsin w/dw = ±1/√(1 − w^2).    (5.34)

Similarly, if

    arctan w = z,

then

    w = tan z,
    dw/dz = 1 + tan^2 z = 1 + w^2,
    dz/dw = 1/(1 + w^2),

that is,

    d arctan w/dw = 1/(1 + w^2).    (5.35)

Derivatives of the other inverse trigonometrics are found in the same way. Table 5.3 summarizes.

Table 5.3: Derivatives of the inverse trigonometrics.

    d ln w/dw = 1/w
    d arcsin w/dw = ±1/√(1 − w^2)
    d arccos w/dw = ∓1/√(1 − w^2)
    d arctan w/dw = 1/(1 + w^2)
    d arcsinh w/dw = ±1/√(w^2 + 1)
    d arccosh w/dw = ±1/√(w^2 − 1)
    d arctanh w/dw = 1/(1 − w^2)

5.9 The actuality of complex quantities

Doing all this neat complex math, the applied mathematician can lose sight of some questions he probably ought to keep in mind: Is there really such a thing as a complex quantity in nature? If not, then hadn't we better avoid these complex quantities, leaving them to the professional mathematical theorists? As developed by Oliver Heaviside in 1887,[9] the answer depends on your point of view.

If I have 300 g of grapes and 100 g of grapes, then I have 400 g altogether. If I have 500 g of grapes and −100 g of grapes, again I have 400 g altogether. (What does it mean to have −100 g of grapes? Maybe that I ate some!) But what if I have 200 + i100 g of grapes and 200 − i100 g of grapes? Answer: again, 400 g altogether. Probably you would not choose to think of 200 + i100 g of grapes and 200 − i100 g of grapes, but in physical applications complex quantities tend to arise in just such conjugate pairs.

It turns out to be much easier to analyze two complex wave quantities of constant magnitude than to analyze one real wave quantity of varying magnitude, so one often describes wave phenomena as linear superpositions (sums) of countervailing complex exponentials. Consider for instance the propagating wave

    A cos[ωt − kz] = (A/2) exp[+i(ωt − kz)] + (A/2) exp[−i(ωt − kz)].

The benefit of splitting the real cosine into two complex parts is that while the magnitude of the cosine changes with time t, the magnitude of either exponential alone remains steady (see the circle in Fig. 5.3). Moreover, since each complex wave quantity is the complex conjugate of the other, the analyses thereof are mutually conjugate, too; so you normally needn't actually analyze the second. The one analysis suffices for both.[10]

[8] [65, back endpaper]
[9] [53]
[10] If the point is not immediately clear, an example: Suppose that by the Newton-Raphson iteration (§ 4.8) you have found a root of the polynomial x^3 + 2x^2 + 3x + 4 at x ≈ −0x0.2D + i0x1.8C. Where is there another root? Answer: at the complex conjugate, x ≈ −0x0.2D − i0x1.8C. One need not actually run the Newton-Raphson again to find the conjugate root. It's like reflecting your sister's handwriting: to read her handwriting backward, you needn't ask her to try writing reverse with the wrong hand; you can just hold her regular script up to a mirror. Of course, this ignores the question of why one would want to reflect someone's handwriting in the first place, but anyway, reflecting, which is to say conjugating, complex quantities often is useful.

Some authors have gently denigrated the use of imaginary parts in physical applications as a mere mathematical trick, as though the parts were not actually there. Well, that is one way to treat the matter, but it is not the way this book recommends. Nothing in the mathematics requires you to regard the imaginary parts as physically nonexistent. You need not abuse Ockham's razor! (Ockham's razor, "Do not multiply objects without necessity,"[11] is not a bad philosophical indicator as far as it goes, but is overused in some circles, particularly in circles in which Aristotle[12] is mistakenly believed to be vaguely outdated. More often than one likes to believe, the necessity to multiply objects remains hidden until one has ventured the multiplication, nor reveals itself to the one who wields the razor, whose hand humility should stay.)

It is true by Euler's formula (5.12) that a complex exponential exp iφ can be decomposed into a sum of trigonometrics. However, it is equally true by the complex trigonometric formulas (5.18) and (5.19) that a trigonometric can be decomposed into a sum of complex exponentials. So, if each can be decomposed into the other, then which of the two is the real decomposition? Answer: it depends on your point of view. Experience seems to recommend viewing the complex exponential as the basic element, as the element of which the trigonometrics are composed, rather than the other way around. From this point of view, it is (5.18) and (5.19) which are the real decomposition. Euler's formula itself is secondary.

The complex exponential method of offsetting imaginary parts offers an elegant yet practical mathematical means to model physical wave phenomena. So go ahead: regard the imaginary parts as actual. Aristotle would regard them so (or so the author suspects). To regard the imaginary parts as actual hurts nothing, and it helps with the math.

[11] [71, Ch. 12]
[12] [21]

Chapter 6

Primes, roots and averages

This chapter gathers a few significant topics, each of whose treatment seems too brief for a chapter of its own.

6.1 Prime numbers

A prime number, or simply, a prime, is an integer greater than one, divisible only by one and itself. A composite number is an integer greater than one and not prime. A composite number can be composed as a product of two or more prime numbers. All positive integers greater than one are either composite or prime.

The mathematical study of prime numbers and their incidents constitutes number theory, and it is a deep area of mathematics. The deeper results of number theory seldom arise in applications,[1] however, so we will confine our study of number theory in this book to one or two of its simplest, most broadly interesting results.

[1] The deeper results of number theory do arise in cryptography, or so the author has been led to understand. Although cryptography is literally an application of mathematics, its spirit is that of pure mathematics rather than of applied. If you seek cryptographic derivations, this book is probably not the one you want.

6.1.1 The infinite supply of primes

The first primes are evidently 2, 3, 5, 7, 0xB, .... Is there a last prime? To show that there is not, suppose that there were. More precisely, suppose that there existed exactly N primes, with N finite, letting p_1, p_2, ..., p_N represent these primes from least to greatest. Now consider the product of all the primes,

    C = Π_{j=1}^{N} p_j.

What of C + 1? Since p_1 = 2 divides C, it cannot divide C + 1. Similarly, since p_2 = 3 divides C, it cannot divide C + 1. The same goes for p_3 = 5, p_4 = 7, p_5 = 0xB, etc. Apparently none of the primes in the p_j series divides C + 1, which implies either that C + 1 itself is prime, or that C + 1 is composed of primes not in the series. But the latter is assumed impossible on the ground that the p_j series includes all primes; and the former is assumed impossible on the ground that C + 1 > C > p_N, with p_N the greatest prime. The contradiction proves false the assumption which gave rise to it. The false assumption: that there were a last prime.

Thus there is no last prime. No matter how great a prime number one finds, a greater can always be found. The supply of primes is infinite.[2]

Attributed to the ancient geometer Euclid, the foregoing proof is a classic example of mathematical reductio ad absurdum, or as usually styled in English, proof by contradiction.[3]

[2] [68]
[3] [62, Appendix 1][77, "Reductio ad absurdum," 02:36, 28 April 2006]

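A concrete instance of Euclid's argument, sketched in Python (the choice of the first five primes is arbitrary):

    # Form C = product of the listed primes and observe that none of
    # them divides C + 1.
    primes = [2, 3, 5, 7, 11]               # 11 = 0xB
    C = 1
    for p in primes:
        C *= p
    print(C + 1)                            # 2311
    print([(C + 1) % p for p in primes])    # every remainder is 1, never 0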
6.1.2 Compositional uniqueness

Occasionally in mathematics, plausible assumptions can hide subtle logical flaws. One such plausible assumption is the assumption that every positive integer has a unique prime factorization. It is readily seen that the first several positive integers,

    1 = (),  2 = (2^1),  3 = (3^1),  4 = (2^2),  5 = (5^1),  6 = (2^1)(3^1),  7 = (7^1),  8 = (2^3),  ...,

each have unique prime factorizations, but is this necessarily true of all positive integers?

To show that it is true, suppose that it were not.[4] More precisely, suppose that there did exist positive integers factorable each in two or more distinct ways, with the symbol C representing the least such integer. Noting that C must be composite (prime numbers by definition are each factorable only one way, like 5 = [5^1]), let

    C_p ≡ Π_{j=1}^{N_p} p_j,    C_q ≡ Π_{k=1}^{N_q} q_k,
    C_p = C_q = C,
    p_j ≤ p_{j+1},    q_k ≤ q_{k+1},
    p_1 ≤ q_1,
    N_p > 1,    N_q > 1,

where C_p and C_q represent two distinct prime factorizations of the same number C and where the p_j and q_k are the respective primes ordered from least to greatest. We see that p_j ≠ q_k for any j and k, that is, that the same prime cannot appear in both factorizations, because if the same prime r did appear in both, then C/r either would be prime (in which case both factorizations would be [r][C/r], defying our assumption that the two differed) or would constitute an ambiguously factorable composite integer less than C, when we had already defined C to represent the least such. Among other effects, the fact that p_j ≠ q_k strengthens the definition p_1 ≤ q_1 to read

    p_1 < q_1.

Let us now rewrite the two factorizations in the form

    C_p = p_1 A_p,    A_p ≡ Π_{j=2}^{N_p} p_j,
    C_q = q_1 A_q,    A_q ≡ Π_{k=2}^{N_q} q_k,

[4] Unfortunately the author knows no more elegant proof than this, yet cannot even cite this one properly. The author encountered the proof in some book over a decade ago. The identity of that book is now long forgotten.

where p_1 and q_1 are the least primes in their respective factorizations. Since C is composite and since p_1 < q_1, we have that

    1 < p_1 < q_1 ≤ √C ≤ A_q < A_p < C,

which implies that

    p_1 q_1 < C.

The last inequality lets us compose the new positive integer

    B = C − p_1 q_1,

which might be prime or composite (or unity), but which either way enjoys a unique prime factorization because B < C, with C the least positive integer factorable two ways. Observing that some integer s which divides C necessarily also divides C ± ns, we note that each of p_1 and q_1 necessarily divides B. This means that B's unique factorization includes both p_1 and q_1, which further means that the product p_1 q_1 divides B. But if p_1 q_1 divides B, then it divides B + p_1 q_1 = C, also.

Let E represent the positive integer which results from dividing C by p_1 q_1:

    E ≡ C/(p_1 q_1).

Then,

    E q_1 = C/p_1 = A_p,
    E p_1 = C/q_1 = A_q.

That E q_1 = A_p says that q_1 divides A_p. But A_p < C, so A_p's prime factorization is unique, and we see above that A_p's factorization does not include any q_k, not even q_1. The contradiction proves false the assumption which gave rise to it. The false assumption: that there existed a least composite number C prime-factorable in two distinct ways.

Thus no positive integer is ambiguously factorable. Prime factorizations are always unique.

We have observed at the start of this subsection that plausible assumptions can hide subtle logical flaws. Indeed this is so. Interestingly however, the plausible assumption of the present subsection has turned out absolutely correct; we have just had to do some extra work to prove it. Such effects are typical on the shadowed frontier where applied shades into pure mathematics: with sufficient experience and with a firm grasp of the model at hand, if you think that it's true, then it probably is. Judging when to delve into the mathematics anyway, seeking a more rigorous demonstration of a proposition one feels pretty sure is correct, is a matter of applied mathematical style. It depends on how sure one feels, and more importantly on whether the unsureness felt is true uncertainty or is just an unaccountable desire for more precise mathematical definition (if the latter, then unlike the author you may have the right temperament to become a professional mathematician). The author does judge the present subsection's proof to be worth the applied effort; but nevertheless, when one lets logical minutiae distract him to too great a degree, one admittedly begins to drift out of the applied mathematical realm that is the subject of this book.

6.1.3 Rational and irrational numbers

A rational number is a finite real number expressible as a ratio of integers[5]

    x = p/q,    (p, q) ∈ Z,    q ≠ 0.

The ratio is fully reduced if p and q have no prime factors in common. For instance, 4/6 is not fully reduced, whereas 2/3 is.

An irrational number is a finite real number which is not rational. For example, √2 is irrational. In fact any x = √n is irrational unless integral; there is no such thing as a √n which is not an integer but is rational.

To prove[6] the last point, suppose that there did exist a fully reduced

    x = p/q = √n,    (n, p, q) ∈ Z,    n > 0,    p > 0,    q > 1.

Squaring the equation, we have that

    p^2/q^2 = n,

which form is evidently also fully reduced. But if q > 1, then the fully reduced n = p^2/q^2 is not an integer as we had assumed that it was. The contradiction proves false the assumption which gave rise to it. Hence there exists no rational, nonintegral √n, as was to be demonstrated. The proof is readily extended to show that any x = n^{j/k} is irrational if nonintegral, the extension by writing p^k/q^k = n^j then following similar steps as those this paragraph outlines.

[5] Section 2.3 explains the ∈ Z notation.
[6] A proof somewhat like the one presented here is found in [62, Appendix 1].

That's all the number theory the book treats; but in applied math, so little will take you pretty far. Now onward we go to other topics.

6.2 The existence and number of polynomial roots

This section shows that an Nth-order polynomial must have exactly N roots.

6.2.1 Polynomial roots

Consider the quotient B(z)/A(z), where

    A(z) = z − α,
    B(z) = Σ_{k=0}^{N} b_k z^k,    N > 0,    b_N ≠ 0,
    B(α) = 0.

In the long-division symbology of Table 2.3,

    B(z) = A(z) Q_0(z) + R_0(z),

where Q_0(z) is the quotient and R_0(z), a remainder. In this case the divisor A(z) = z − α has first order, and as § 2.6.2 has observed, first-order divisors leave zeroth-order, constant remainders R_0(z) = ρ. Thus substituting yields

    B(z) = (z − α) Q_0(z) + ρ.

When z = α, this reduces to

    B(α) = ρ.

But B(α) = 0 by assumption, so ρ = 0. Evidently the division leaves no remainder ρ, which is to say that z − α exactly divides every polynomial B(z) of which z = α is a root.

Note that if the polynomial B(z) has order N, then the quotient Q(z) = B(z)/(z − α) has exactly order N − 1; that is, the leading, z^{N−1} term of the quotient is never null. The reason is that if the leading term were null, if Q(z) had order less than N − 1, then B(z) = (z − α) Q(z) could not possibly have order N as we have assumed.

6.2.2 The fundamental theorem of algebra

The fundamental theorem of algebra holds that any polynomial B(z) of order N can be factored

    B(z) = Σ_{k=0}^{N} b_k z^k = b_N Π_{j=1}^{N} (z − α_j),    b_N ≠ 0,    (6.1)

where the α_k are the N roots of the polynomial.[7]

To prove the theorem, it suffices to show that all polynomials of order N > 0 have at least one root; for if a polynomial of order N has a root α_N, then according to § 6.2.1 one can divide the polynomial by z − α_N to obtain a new polynomial of order N − 1. To the new polynomial the same logic applies: if it has at least one root α_{N−1}, then one can divide it by z − α_{N−1} to obtain yet another polynomial of order N − 2; and so on, one root extracted at each step, factoring the polynomial step by step into the desired form b_N Π_{j=1}^{N} (z − α_j).

It remains however to show that there exists no polynomial B(z) of order N > 0 lacking roots altogether. To show that there is no such polynomial, consider the locus[8] of all B(ρ e^{iφ}) in the Argand range plane, where z = ρ e^{iφ}, ρ is held constant, and φ is variable. Because e^{i(φ+n2π)} = e^{iφ} and no fractional powers of z appear in (6.1), this locus forms a closed loop. At very large ρ, the b_N z^N term dominates B(z), so the locus there evidently has the general character of b_N ρ^N e^{iNφ}. As such, the locus is nearly but not quite a circle at radius b_N ρ^N from the Argand origin B(z) = 0, revolving N times at that great distance before exactly repeating. On the other hand, when ρ = 0 the entire locus collapses on the single point B(0) = b_0.

Now consider the locus at very large ρ again, but this time let ρ slowly shrink. Watch the locus as ρ shrinks. The locus is like a great string or rubber band, joined at the ends and looped in N great loops. As ρ shrinks smoothly, the string's shape changes smoothly. Eventually ρ disappears and the entire string collapses on the point B(0) = b_0. Since the string originally has looped N times at great distance about the Argand origin, but at the end has collapsed on a single point, then at some time between it must have swept through the origin and every other point within the original loops.

[7] Professional mathematicians typically state the theorem in a slightly different form. They also prove it in rather a different way. [34, Ch. 10, Prob. 74]
[8] A locus is the geometric collection of points which satisfy a given criterion. For example, the locus of all points in a plane at distance ρ from a point O is a circle; the locus of all points in three-dimensional space equidistant from two points P and Q is a plane; etc.

After all, the string is a continuous string; because B(z) is everywhere differentiable, the string can only sweep as ρ decreases: it can never skip. The Argand origin lies inside the loops at the start but outside at the end, so the string must sweep through the origin at some intermediate ρ; and the values of ρ and φ precisely where the string has swept through the origin by definition constitute a root B(ρ e^{iφ}) = 0. Thus as we were required to show, B(z) does have at least one root, which observation completes the applied demonstration of the fundamental theorem of algebra.

The fact that the roots exist is one thing. Actually finding the roots numerically is another matter. For a quadratic (second order) polynomial, (2.2) gives the roots. For cubic (third order) and quartic (fourth order) polynomials, formulas for the roots are known (see Ch. 10) though seemingly not so for quintic (fifth order) and higher-order polynomials,[9] but the Newton-Raphson iteration (§ 4.8) can be used to locate a root numerically in any case. The Newton-Raphson is used to extract one root (any root) at each step as described above, reducing the polynomial step by step until all the roots are found.

The reverse problem, finding the polynomial given the roots, is much easier: one just multiplies out Π_j (z − α_j), as in (6.1).

[9] In a celebrated theorem of pure mathematics [75, "Abel's impossibility theorem"], it is said to be shown that no such formula even exists, given that the formula be constructed according to certain rules. Undoubtedly the theorem is interesting to the professional mathematician, but to the applied mathematician it probably suffices to observe merely that no such formula is known.

6.3 Addition and averages

This section discusses the two basic ways to add numbers and the three basic ways to calculate averages of them.

6.3.1 Serial and parallel addition

Consider the following problem. There are three masons. The strongest and most experienced of the three, Adam, lays 120 bricks per hour.[10] Next is Brian who lays 90. Charles is new; he lays only 60. Given eight hours, how many bricks can the three men lay? Answer:

    (8 hours)(120 + 90 + 60 bricks per hour) = 2160 bricks.

Now suppose that we are told that Adam can lay a brick every 30 seconds; Brian, every 40 seconds; Charles, every 60 seconds. How much time do the three men need to lay 2160 bricks?

[10] The figures in the example are in decimal notation.

6.3 Addition and averages

This section discusses the two basic ways to add numbers and the three basic ways to calculate averages of them.

6.3.1 Serial and parallel addition

Consider the following problem. There are three masons. The strongest and most experienced of the three, Adam, lays 120 bricks per hour. [Footnote 10: The figures in the example are in decimal notation.] Next is Brian who lays 90. Charles is new; he lays only 60. Given eight hours, how many bricks can the three men lay? Answer:

$$(8\ \text{hours})(120 + 90 + 60\ \text{bricks per hour}) = 2160\ \text{bricks}.$$

Now suppose that we are told that Adam can lay a brick every 30 seconds; Brian, every 40 seconds; Charles, every 60 seconds. How much time do the three men need to lay 2160 bricks? Answer:

$$\frac{2160\ \text{bricks}}{\left(\tfrac{1}{30} + \tfrac{1}{40} + \tfrac{1}{60}\right)\ \text{bricks per second}} = 28{,}800\ \text{seconds} \times \frac{1\ \text{hour}}{3600\ \text{seconds}} = 8\ \text{hours}.$$

The two problems are precisely equivalent. Neither is stated in simpler terms than the other. The notation used to solve the second is less elegant, but fortunately there exists a better notation:

$$(2160\ \text{bricks})(30 \parallel 40 \parallel 60\ \text{seconds per brick}) = 8\ \text{hours},$$

where

$$\frac{1}{30 \parallel 40 \parallel 60} = \frac{1}{30} + \frac{1}{40} + \frac{1}{60}.$$

The operator ∥ is called the parallel addition operator. It works according to the law

$$\frac{1}{a \parallel b} = \frac{1}{a} + \frac{1}{b}, \tag{6.2}$$

where the familiar operator + is verbally distinguished from the ∥ when necessary by calling the + the serial addition or series addition operator. With (6.2) and a bit of arithmetic, the several parallel-addition identities of Table 6.1 are soon derived.

The writer knows of no conventional notation for parallel sums of series, but suggests that the notation which appears in the table,

$$\operatorname*{\|}_{k=a}^{b} f(k) \equiv f(a) \parallel f(a+1) \parallel f(a+2) \parallel \cdots \parallel f(b),$$

might serve if needed.

Assuming that none of the values involved is negative, one can readily show that [Footnote 11: The word iff means, "if and only if."]

$$a \parallel x \le b \parallel x \quad \text{iff} \quad a \le b. \tag{6.3}$$

This is intuitive. Counterintuitive, perhaps, is that

$$a \parallel x \le a. \tag{6.4}$$

Because we have all learned as children to count in the sensible manner 1, 2, 3, 4, 5, ...—rather than as 1, 1/2, 1/3, 1/4, 1/5, ...—serial addition (+) seems more natural than parallel addition (∥) does.

Table 6.1: Parallel and serial addition identities.

$$\frac{1}{a \parallel b} = \frac{1}{a} + \frac{1}{b} \qquad\qquad \frac{1}{a + b} = \frac{1}{a} \parallel \frac{1}{b}$$
$$a \parallel b = \frac{ab}{a+b} \qquad\qquad a + b = \frac{ab}{a \parallel b}$$
$$a \parallel \frac{1}{b} = \frac{a}{1 + ab} \qquad\qquad a + \frac{1}{b} = \frac{a}{1 \parallel ab}$$
$$a \parallel b = b \parallel a \qquad\qquad a + b = b + a$$
$$a \parallel (b \parallel c) = (a \parallel b) \parallel c \qquad\qquad a + (b + c) = (a + b) + c$$
$$a \parallel \infty = \infty \parallel a = a \qquad\qquad a + 0 = 0 + a = a$$
$$a \parallel (-a) = \infty \qquad\qquad a + (-a) = 0$$
$$(a)(b \parallel c) = ab \parallel ac \qquad\qquad (a)(b + c) = ab + ac$$
$$\frac{1}{\|_k\, a_k} = \sum_k \frac{1}{a_k} \qquad\qquad \frac{1}{\sum_k a_k} = \operatorname*{\|}_k \frac{1}{a_k}$$
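In passing, the masons' problem makes an easy machine check of (6.2). The short Python fragment below is this rewrite's own illustration (the function name `par` is the sketch's assumption); it computes a parallel sum exactly as the law prescribes:

```python
def par(*values):
    """Parallel addition: 1/(a || b || ...) = 1/a + 1/b + ..., per eq. (6.2)."""
    return 1.0 / sum(1.0 / v for v in values)

seconds_per_brick = par(30, 40, 60)        # 13.33... seconds per brick
hours = 2160 * seconds_per_brick / 3600    # the masons' 2160 bricks
print(hours)                               # 8.0
```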

The psychological barrier is hard to breach, yet for many purposes parallel addition is in fact no less fundamental. Its rules are inherently neither more nor less complicated, as Table 6.1 illustrates; yet outside the electrical engineering literature the parallel addition notation is seldom seen. Now that you have seen it, you can use it. There is profit in learning to think both ways. (Exercise: counting from zero serially goes 0, 1, 2, 3, 4, 5, ...; how does the parallel analog go?) [Footnote 13: [64, eqn. 1.27]]

Convention brings no special symbol for parallel subtraction, incidentally. One merely writes a ∥ (−b), which means exactly what it appears to mean.

Parallel addition gives the electrical engineer a neat way of adding the impedances of parallel-connected loads. In electric circuits, loads are connected in parallel as often as, in fact probably more often than, they are connected in series.

6.3.2 Averages

Let us return to the problem of the preceding section. Among the three masons, what is their average productivity? The answer depends on how you look at it. On the one hand,

$$\frac{120 + 90 + 60\ \text{bricks per hour}}{3} = 90\ \text{bricks per hour}.$$

On the other hand,

$$\frac{30 + 40 + 60\ \text{seconds per brick}}{3} = 43\tfrac{1}{3}\ \text{seconds per brick}.$$

These two figures are not the same. That is, 1/(43⅓ seconds per brick) ≠ 90 bricks per hour. Yet both figures are valid. Which figure you choose depends on what you want to calculate. A common mathematical error among businessmen seems to be to fail to realize that both averages are possible and that they yield different numbers (if the businessman quotes in bricks per hour, the productivities average one way; if in seconds per brick, the other way; yet some businessmen will never clearly consider the difference). Realizing this, the clever businessman might negotiate a contract so that the average used worked to his own advantage. [Footnote 14: "And what does the author know about business?" comes the rejoinder. The rejoinder is fair enough. If the author wanted to demonstrate his business acumen (or lack thereof) he'd do so elsewhere, not here! There are a lot of good business books out there and this is not one of them. The fact remains nevertheless that businessmen sometimes use mathematics in peculiar ways, making relatively easy problems harder and more mysterious than the problems need to be. Some businessmen are mathematically rather sharp—as you presumably are if you are in business and are reading these words—but as for most: when real mathematical ability is needed, that's what they hire engineers, architects and the like for. If you have ever encountered the little monstrosity of an approximation banks (at least in the author's country) actually use in place of (9.12) to accrue interest and amortize loans, then you have met the difficulty. Trying to convince businessmen that their math is wrong, incidentally, is in the author's experience usually a waste of time; but somehow he doubts that many boards of directors would be willing to bet the company on a financial formula containing some mysterious-looking e^x. Business demands other talents.]

When it is unclear which of the two averages is more appropriate, a third average is available, the geometric mean

$$[(120)(90)(60)]^{1/3}\ \text{bricks per hour}.$$

The geometric mean does not have the problem either of the two averages discussed above has. The inverse geometric mean $[(30)(40)(60)]^{1/3}$ seconds per brick implies the same average productivity. The mathematically savvy sometimes prefer the geometric mean over either of the others for this reason. [Footnote 15: The writer, an American, was recently, pleasantly surprised to learn that the formula his country's relevant federal statute stipulates to implement the Constitutional requirement that representation in the country's federal House of Representatives be apportioned by population actually, properly always apportions the next available seat in the House to the state whose geometric mean of population per representative before and after apportionment would be greatest. Admittedly, the Republic does not rise or fall on the niceties of averaging techniques; nonetheless, the author is not sure, but it seems that some American who knew his mathematics was involved in the drafting of that statute!]

Generally, the arithmetic, geometric and harmonic means are defined

$$\mu \equiv \frac{\sum_k w_k x_k}{\sum_k w_k} = \left[\operatorname*{\|}_k \frac{1}{w_k}\right]\left[\sum_k w_k x_k\right], \tag{6.5}$$

$$\mu_\Pi \equiv \left[\prod_j x_j^{w_j}\right]^{\|_j (1/w_j)} = \left[\prod_j x_j^{w_j}\right]^{1/\sum_k w_k}, \tag{6.6}$$

$$\mu_\parallel \equiv \frac{\sum_k w_k}{\sum_k w_k/x_k} = \left[\operatorname*{\|}_k \frac{x_k}{w_k}\right]\left[\sum_k w_k\right], \tag{6.7}$$

where the x_k are the several samples and the w_k are weights. For two samples weighted equally, these are

$$\mu = \frac{a + b}{2}, \tag{6.8}$$
$$\mu_\Pi = \sqrt{ab}, \tag{6.9}$$
$$\mu_\parallel = \frac{2ab}{a + b} = 2(a \parallel b). \tag{6.10}$$

If a ≥ 0 and b ≥ 0, then by successive steps, [Footnote 16: The steps are logical enough, but the motivation behind them remains inscrutable until the reader realizes that the writer originally worked the steps out backward with his pencil, from the last step to the first. Only then did he reverse the order and write the steps formally here. The writer had no idea that he was supposed to start from 0 ≤ (a − b)² until his pencil working backward showed him. "Begin with the end in mind," the saying goes. In this case the saying is right. When you can follow the logic but cannot understand what could possibly have inspired the writer to conceive the logic in the first place, try reading backward. The same reading strategy often clarifies inscrutable math.]

$$0 \le (a - b)^2,$$
$$0 \le a^2 - 2ab + b^2,$$
$$4ab \le a^2 + 2ab + b^2,$$
$$2\sqrt{ab} \le a + b,$$
$$\frac{2\sqrt{ab}}{a + b} \le 1 \le \frac{a + b}{2\sqrt{ab}},$$
$$\frac{2ab}{a + b} \le \sqrt{ab} \le \frac{a + b}{2},$$
$$2(a \parallel b) \le \sqrt{ab} \le \frac{a + b}{2}.$$

That is,

$$\mu_\parallel \le \mu_\Pi \le \mu. \tag{6.11}$$

The arithmetic mean is greatest and the harmonic mean, least, with the geometric mean falling between.
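A quick numerical check of (6.5) through (6.11) against the masons' figures may help fix the ideas. The Python fragment below is this rewrite's own illustration, using equal weights:

```python
# A minimal sketch: the three means of the masons' rates, checking (6.11).
rates = [120, 90, 60]        # bricks per hour (decimal notation)
n = len(rates)

mu = sum(rates) / n                            # arithmetic mean, eq. (6.5)
mu_pi = 1.0
for x in rates:
    mu_pi *= x
mu_pi **= 1.0 / n                              # geometric mean, eq. (6.6)
mu_par = n / sum(1.0 / x for x in rates)       # harmonic mean, eq. (6.7)

print(mu, mu_pi, mu_par)                       # 90.0, 86.53..., 83.07...
assert mu_par <= mu_pi <= mu                   # the inequality (6.11)
```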

Does (6.11) hold when there are several nonnegative samples of various nonnegative weights? To show that it does, consider the case of N = 2^m nonnegative samples of equal weight. Nothing prevents one from dividing such a set of samples in half, considering each subset separately; for if (6.11) holds for each subset individually then surely it holds for the whole set (this is so because the average of the whole set is itself the average of the two subset averages, where the word "average" signifies the arithmetic, geometric or harmonic mean as appropriate). But each subset can further be divided in half, then each subsubset can be divided in half again, and so on until each smallest group has two members only—in which case we already know that (6.11) obtains. Starting there and recursing back, we have that (6.11) obtains for the entire set. Now consider that a sample of any weight can be approximated arbitrarily closely by several samples of weight 1/2^m, provided that m is sufficiently large. By this reasoning, (6.11) holds for any nonnegative weights of nonnegative samples, which was to be demonstrated.

Chapter 7

The integral

Chapter 4 has observed that the mathematics of calculus concerns a complementary pair of questions:

• Given some function f(t), what is the function's instantaneous rate of change, or derivative, f′(t)?

• Interpreting some function f′(t) as an instantaneous rate of change, what is the corresponding accretion, or integral, f(t)?

Chapter 4 has built toward a basic understanding of the first question. This chapter builds toward a basic understanding of the second. The understanding of the second question constitutes the concept of the integral, one of the profoundest ideas in all of mathematics.

This chapter, which introduces the integral, is undeniably a hard chapter. Experience knows no reliable way to teach the integral adequately to the uninitiated except through dozens or hundreds of pages of suitable examples and exercises, yet the book you are reading cannot be that kind of book. The sections of the present chapter concisely treat matters which elsewhere rightly command chapters or whole books of their own. Concision can be a virtue—and by design, nothing essential is omitted here—but the bold novice who wishes to learn the integral from these pages alone faces a daunting challenge. It can be done. However, for less intrepid readers who quite reasonably prefer a gentler initiation, [30] is warmly recommended.

7.1 The concept of the integral

An integral is a finite accretion or sum of an infinite number of infinitesimal elements. This section introduces the concept.

7.1.1 An introductory example

Consider the sums

$$S_1 = \sum_{k=0}^{\mathrm{0x10}-1} k,$$
$$S_2 = \frac{1}{2}\sum_{k=0}^{\mathrm{0x20}-1} \frac{k}{2},$$
$$S_4 = \frac{1}{4}\sum_{k=0}^{\mathrm{0x40}-1} \frac{k}{4},$$
$$S_8 = \frac{1}{8}\sum_{k=0}^{\mathrm{0x80}-1} \frac{k}{8},$$
$$\vdots$$
$$S_n = \frac{1}{n}\sum_{k=0}^{(\mathrm{0x10})n-1} \frac{k}{n}.$$

What do these sums represent? One way to think of them is in terms of the shaded areas of Fig. 7.1. In the figure, S₁ is composed of several tall, thin rectangles of width 1 and height k; S₂, of rectangles of width 1/2 and height k/2. [Footnote 1: If the reader does not fully understand this paragraph's illustration, if the relation of the sum to the area seems unclear, the reader is urged to pause and consider the illustration carefully until he does understand it. If it still seems unclear, then the reader should probably suspend reading here and go study a good basic calculus text like [30]. The concept is important.] As n grows, the shaded region in the figure looks more and more like a triangle of base length b = 0x10 and height h = 0x10. In fact it appears that

$$\lim_{n\to\infty} S_n = \frac{bh}{2} = \mathrm{0x80},$$

or more tersely

$$S_\infty = \mathrm{0x80},$$

is the area the increasingly fine stairsteps approach.

[Figure 7.1: Areas representing discrete sums. (Two panels: f₁(τ) against τ with Δτ = 1, whose stairstep area is S₁; and f₂(τ) against τ with Δτ = 1/2, whose stairstep area is S₂; each axis running to 0x10.)]

Notice how we have evaluated S∞, the sum of an infinite number of infinitely narrow rectangles, without actually adding anything up. We have taken a shortcut directly to the total.

In the equation

$$S_n = \frac{1}{n}\sum_{k=0}^{(\mathrm{0x10})n-1} \frac{k}{n},$$

let us now change the variables

$$\tau \leftarrow \frac{k}{n}, \qquad \Delta\tau \leftarrow \frac{1}{n},$$

to obtain the representation

$$S_n = \Delta\tau \sum_{k=0}^{(\mathrm{0x10})n-1} \tau;$$

or more properly,

$$S_n = \sum_{k=0}^{(k|_{\tau=\mathrm{0x10}})-1} \tau\,\Delta\tau,$$

where the notation $k|_{\tau=\mathrm{0x10}}$ indicates the value of k when τ = 0x10. Then

$$S_\infty = \lim_{\Delta\tau\to 0^+} \sum_{k=0}^{(k|_{\tau=\mathrm{0x10}})-1} \tau\,\Delta\tau,$$

in which it is conventional as Δτ vanishes to change the symbol dτ ← Δτ, where dτ is the infinitesimal of Ch. 4:

$$S_\infty = \lim_{d\tau\to 0^+} \sum_{k=0}^{(k|_{\tau=\mathrm{0x10}})-1} \tau\,d\tau.$$

The symbol $\lim_{d\tau\to 0^+} \sum_{k=0}^{(k|_{\tau=\mathrm{0x10}})-1}$ is cumbersome, so we replace it with the new symbol [Footnote 2: Like the Greek S, Σ, denoting discrete summation, the seventeenth century-styled Roman S, ∫, stands for Latin "summa," English "sum." See [77, "Long s," 14:54, 7 April 2006].] $\int_0^{\mathrm{0x10}}$ to obtain the form

$$S_\infty = \int_0^{\mathrm{0x10}} \tau\,d\tau.$$

This means, "stepping in infinitesimal intervals of dτ, the sum of all τ dτ from τ = 0 to τ = 0x10." Graphically, it is the shaded area of Fig. 7.2.

[Figure 7.2: An area representing an infinite sum of infinitesimals, the shaded area S∞ under f(τ) up to τ = 0x10. (Observe that the infinitesimal dτ is now too narrow to show on this scale. Compare against Δτ in Fig. 7.1.)]
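Though the argument needs no machine, the stairstep sums are easy to tabulate numerically. The following Python fragment, added in this rewrite for illustration, watches S_n approach 0x80 (decimal 128):

```python
# A minimal sketch: the stairstep sums S_n of section 7.1.1 approaching
# the triangle area bh/2 = 0x80 (decimal 128) as n grows.
def S(n):
    top = 0x10 * n                       # k runs from 0 to (0x10)n - 1
    return sum(k / n for k in range(top)) / n

for n in (1, 2, 4, 8, 0x100):
    print(n, S(n))                       # 120.0, 124.0, 126.0, 127.0, ...
```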

7.1.2 Generalizing the introductory example

Now consider a generalization of the example of § 7.1.1:

$$S_n = \frac{1}{n}\sum_{k=an}^{bn-1} f\!\left(\frac{k}{n}\right).$$

(In the example of § 7.1.1, f[τ] was the simple f[τ] = τ, but in general it could be any function.) With the change of variables

$$\tau \leftarrow \frac{k}{n}, \qquad \Delta\tau \leftarrow \frac{1}{n},$$

whereby

$$k|_{\tau=a} = an, \qquad k|_{\tau=b} = bn, \qquad (k, n) \in \mathbb{Z}, \quad n \neq 0$$

(but a and b need not be integers), this is

$$S_n = \sum_{k=(k|_{\tau=a})}^{(k|_{\tau=b})-1} f(\tau)\,\Delta\tau.$$

In the limit,

$$S_\infty = \lim_{d\tau\to 0^+} \sum_{k=(k|_{\tau=a})}^{(k|_{\tau=b})-1} f(\tau)\,d\tau = \int_a^b f(\tau)\,d\tau.$$

This is the integral of f(τ) in the interval a < τ < b. It represents the area under the curve of f(τ) in that interval.

7.1.3 The balanced definition and the trapezoid rule

Actually, just as we have defined the derivative in the balanced form (4.15), we do well to define the integral in balanced form, too:

$$\int_a^b f(\tau)\,d\tau \equiv \lim_{d\tau\to 0^+}\left[\frac{f(a)\,d\tau}{2} + \sum_{k=(k|_{\tau=a})+1}^{(k|_{\tau=b})-1} f(\tau)\,d\tau + \frac{f(b)\,d\tau}{2}\right]. \tag{7.1}$$

Here, the first and last integration samples are each balanced "on the edge," half within the integration domain and half without. Equation (7.1) is known as the trapezoid rule. Figure 7.3 depicts it. The name "trapezoid" comes of the shapes of the shaded integration elements in the figure. Observe however that it makes no difference whether one regards the shaded trapezoids or the dashed rectangles as the actual integration elements; the total integration area is the same either way. [Footnote 3: The trapezoid rule (7.1) is perhaps the most straightforward, general, robust way to define the integral, but other schemes are possible, too. For example, taking the trapezoids in adjacent pairs—such that a pair enjoys not only a sample on each end but a third sample in the middle—one can for each pair fit a second-order curve f(τ) ≈ (c₂)(τ − τ_middle)² + (c₁)(τ − τ_middle) + c₀ to the function, choosing the coefficients c₂, c₁ and c₀ to make the curve match the function exactly at the pair's three sample points; then substitute the area under the pair's curve (which by the end of § 7.4 we shall know how to calculate exactly) for the areas of the two trapezoids. Changing the symbol Δτ ← dτ on one side of the equation to suggest coarse sampling, the result is the unexpectedly simple

$$\int_a^b f(\tau)\,\Delta\tau \approx \left[\frac{1}{3}f(a) + \frac{4}{3}f(a+\Delta\tau) + \frac{2}{3}f(a+2\,\Delta\tau) + \frac{4}{3}f(a+3\,\Delta\tau) + \frac{2}{3}f(a+4\,\Delta\tau) + \cdots + \frac{4}{3}f(b-\Delta\tau) + \frac{1}{3}f(b)\right]\Delta\tau,$$

as opposed to the trapezoidal

$$\int_a^b f(\tau)\,\Delta\tau \approx \left[\frac{1}{2}f(a) + f(a+\Delta\tau) + f(a+2\,\Delta\tau) + f(a+3\,\Delta\tau) + f(a+4\,\Delta\tau) + \cdots + f(b-\Delta\tau) + \frac{1}{2}f(b)\right]\Delta\tau$$

implied by (7.1). The curved scheme is called Simpson's rule. It is clever and well known. Simpson's rule had real uses in the slide-rule era when, for practical reasons, one preferred to let Δτ be sloppily large, sampling a curve only a few times to estimate its integral; yet the rule is much less useful when a computer is available to do the arithmetic over an adequate number of samples. At best Simpson's rule does not help much with a computer; at worst it can yield spurious results; and because it is easy to program it tends to encourage thoughtless application. Other than in the footnote you are reading, Simpson's rule is not covered in this book.]

[Figure 7.3: Integration by the trapezoid rule (7.1). Notice that the shaded and dashed areas total the same. (The figure plots f(τ) against τ between the limits a and b, with an element of width dτ marked.)]

The important point to understand is that the integral is conceptually just a sum. It is a sum of an infinite number of infinitesimal elements as dτ tends to vanish, but a sum nevertheless; nothing more.

Nothing actually requires the integration element width dτ to remain constant from element to element, incidentally. Constant widths are usually easiest to handle but variable widths find use in some cases. The only requirement is that dτ remain infinitesimal. (For further discussion of the point, refer to the treatment of the Leibnitz notation in § 4.4.2.)
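As a concrete illustration, added in this rewrite (the footnote's caution about Simpson's rule notwithstanding), here is the trapezoid rule of (7.1) at a finite Δτ in Python:

```python
# A minimal sketch: the trapezoid rule (7.1) at a finite step dtau,
# the first and last samples weighted half, "on the edge."
def trapezoid(f, a, b, n=1000):
    dtau = (b - a) / n
    total = (f(a) + f(b)) / 2                    # the two edge samples
    total += sum(f(a + k * dtau) for k in range(1, n))
    return total * dtau

print(trapezoid(lambda t: t, 0.0, 0x10))         # 128.0 = 0x80, per sect. 7.1.1
```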

7.2 The antiderivative and the fundamental theorem of calculus

If

$$S(x) \equiv \int_a^x g(\tau)\,d\tau,$$

then what is the derivative dS/dx? After some reflection, one sees that the derivative must be

$$\frac{dS}{dx} = g(x).$$

This is so because the action of the integral is to compile or accrete the area under a curve. The integral accretes area at a rate proportional to the curve's height g(τ): the higher the curve, the faster the accretion. In this way one sees that the integral and the derivative are inverse operators; the one inverts the other. The integral is the antiderivative. More precisely,

$$\int_a^b \frac{df}{d\tau}\,d\tau = f(\tau)\big|_a^b, \tag{7.2}$$

where the notation $f(\tau)|_a^b$ or $[f(\tau)]_a^b$ means f(b) − f(a).

The importance of (7.2), fittingly named the fundamental theorem of calculus, [Footnote 4: [30, § 11.6][65, § 5-4][77, "Fundamental theorem of calculus," 06:29, 23 May 2006]] can hardly be overstated. As the formula which ties together the complementary pair of questions asked at the chapter's start, (7.2) is of utmost importance in the practice of mathematics. The idea behind the formula is indeed simple once grasped, but to grasp the idea firmly in the first place is not entirely trivial. [Footnote 5: Having read from several calculus books and, like millions of others perhaps including the reader, having sat years ago in various renditions of the introductory calculus lectures in school, the author has never yet met a more convincing demonstration of (7.2) than the formula itself. Somehow the underlying idea is too simple, too profound to explain. It's like trying to explain how to drink water, or how to count or to add. Elaborate explanations and their attendant constructs and formalities are indeed possible to contrive, but the idea itself is so simple that somehow such contrivances seem to obscure the idea more than to reveal it. One ponders the formula (7.2) a while, then the idea dawns on him. If you want some help pondering, try this: Sketch some arbitrary function f(τ) on a set of axes at the bottom of a piece of paper—some squiggle of a curve over an interval a < τ < b will do nicely—then on a separate set of axes directly above the first, sketch the corresponding slope function df/dτ. Mark two points a and b on the common horizontal axis; then on the upper, df/dτ plot, shade the integration area under the curve. Now consider (7.2) in light of your sketch. There. Does the idea not dawn? Another way to see the truth of the formula begins by canceling its (1/dτ) dτ to obtain the form $\sum_{\tau=a}^{b} df = f(\tau)|_a^b$. If this way works better for you, fine; but make sure that you understand it the other way, too.] The idea is simple but big. The reader is urged to pause now and ponder the formula thoroughly until he feels reasonably confident that indeed he does grasp it and the important idea it represents. One is unlikely to do much higher mathematics without this formula.

As an example of the formula's use, consider that because (d/dτ)(τ³/6) = τ²/2, it follows that

$$\int_2^x \frac{\tau^2\,d\tau}{2} = \int_2^x \frac{d}{d\tau}\!\left(\frac{\tau^3}{6}\right)d\tau = \frac{\tau^3}{6}\bigg|_2^x = \frac{x^3 - 8}{6}.$$

Gathering elements from (4.21) and from Tables 5.2 and 5.3, Table 7.1 lists a handful of the simplest, most useful derivatives for antiderivative use. Section 9.1 speaks further of the antiderivative.

Table 7.1: Basic derivatives for the antiderivative.

$$\int_a^b \frac{df}{d\tau}\,d\tau = f(\tau)\big|_a^b$$

$$\tau^{a-1} = \frac{d}{d\tau}\left(\frac{\tau^a}{a}\right), \qquad a \neq 0$$
$$\frac{1}{\tau} = \frac{d}{d\tau}\,\ln\tau, \qquad \ln 1 = 0$$
$$\exp\tau = \frac{d}{d\tau}\,\exp\tau, \qquad \exp 0 = 1$$
$$\cos\tau = \frac{d}{d\tau}\,\sin\tau, \qquad \sin 0 = 0$$
$$\sin\tau = \frac{d}{d\tau}\,(-\cos\tau), \qquad \cos 0 = 1$$
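To see (7.2) and the table in action numerically, one might check the worked example above against a fine finite sum (a sketch added in this rewrite):

```python
# A minimal sketch: checking the example of section 7.2,
# integral from 2 to x of tau^2/2 dtau = (x^3 - 8)/6, by a fine sum.
x, n = 3.0, 100000
dtau = (x - 2.0) / n
approx = sum(((2.0 + k * dtau) ** 2 / 2) * dtau for k in range(n))
exact = (x ** 3 - 8) / 6
print(approx, exact)       # both near 19/6 = 3.1666...
```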

7.3 Operators, linearity and multiple integrals

This section presents the operator concept, discusses linearity and its consequences, treats the commutivity of the summational and integrodifferential operators, and introduces the multiple integral.

7.3.1 Operators

An operator is a mathematical agent that combines several values of a function. Such a definition, unfortunately, is extraordinarily unilluminating to those who do not already know what it means. A better way to introduce the operator is by giving examples. Operators include +, −, multiplication, division, Σ, Π, ∫ and ∂. The essential action of an operator is to take several values of a function and combine them in some way. For example, Π is an operator in

$$\prod_{j=1}^{5} (2j - 1) = (1)(3)(5)(7)(9) = \mathrm{0x3B1}.$$

Notice that the operator has acted to remove the variable j from the expression 2j − 1. The j appears on the equation's left side but not on its right. The operator has used the variable up. Such a variable, used up by an operator, is a dummy variable, as encountered earlier in § 2.3.

7.3.2 A formalism

But then how are + and − operators? They don't use any dummy variables up, do they? Well, it depends on how you look at it. Consider the sum S = 3 + 5. One can write this as

$$S = \sum_{k=0}^{1} f(k),$$

where

$$f(k) \equiv \begin{cases} 3 & \text{if } k = 0, \\ 5 & \text{if } k = 1, \\ \text{undefined} & \text{otherwise.} \end{cases}$$

Then,

$$S = \sum_{k=0}^{1} f(k) = f(0) + f(1) = 3 + 5 = 8.$$

By such admittedly excessive formalism, the + operator can indeed be said to use a dummy variable up. The point is that + is in fact an operator just like the others. Another example of the kind:

$$D = g(z) - h(z) + p(z) + q(z) = g(z) - h(z) + p(z) - 0 + q(z) = \Phi(0,z) - \Phi(1,z) + \Phi(2,z) - \Phi(3,z) + \Phi(4,z) = \sum_{k=0}^{4} (-)^k\,\Phi(k,z),$$

where

$$\Phi(k,z) \equiv \begin{cases} g(z) & \text{if } k = 0, \\ h(z) & \text{if } k = 1, \\ p(z) & \text{if } k = 2, \\ 0 & \text{if } k = 3, \\ q(z) & \text{if } k = 4, \\ \text{undefined} & \text{otherwise.} \end{cases}$$

Such unedifying formalism is essentially useless in applications, except as a vehicle for definition. Once you understand why + and − are operators just as Σ and Π are, you can forget the formalism. It doesn't help much.

7.3.3 Linearity

A function f(z) is linear iff (if and only if) it has the properties

$$f(z_1 + z_2) = f(z_1) + f(z_2),$$
$$f(\alpha z) = \alpha f(z),$$
$$f(0) = 0.$$

The functions f(z) = 3z, f(u, v) = 2u − v and f(z) = 0 are examples of linear functions. Nonlinear functions include [Footnote 6: If 3z + 1 is a linear expression, then how is not f(z) = 3z + 1 a linear function? Answer: it is partly a matter of purposeful definition, partly of semantics. The equation y = 3x + 1 plots a line, so the expression 3z + 1 is literally "linear" in this sense; but the definition has more purpose to it than merely this. When you see the linear expression 3z + 1, think 3z + 1 = 0, then g(z) = 3z = −1. The g(z) = 3z is linear; the −1 is the constant value it targets. That's the sense of it.] f(z) = z², f(u, v) = √(uv), f(t) = cos ωt, f(z) = 3z + 1 and even f(z) = 1.

An operator L is linear iff it has the properties

$$L(f_1 + f_2) = Lf_1 + Lf_2,$$
$$L(\alpha f) = \alpha Lf,$$
$$L(0) = 0.$$

The operators Σ, ∫, +, − and ∂ are examples of linear operators. For instance, [Footnote 7: You don't see d in the list of linear operators? But d in this context is really just another way of writing ∂, so, yes, d is linear, too. See § 4.4.2.]

$$\frac{d}{dz}\,[f_1(z) + f_2(z)] = \frac{df_1}{dz} + \frac{df_2}{dz}.$$

Nonlinear operators include multiplication, division and the various trigonometric functions, among others. Section 16.1.2 will have more to say about operators and their notation.

7.3.4 Summational and integrodifferential commutivity

Consider the sum

$$S_1 = \sum_{k=a}^{b}\left[\sum_{j=p}^{q} \frac{x^k}{j!}\right].$$

This is a sum of the several values of the expression x^k/j!, evaluated at every possible pair (j, k) in the indicated domain. Now consider the sum

$$S_2 = \sum_{j=p}^{q}\left[\sum_{k=a}^{b} \frac{x^k}{j!}\right].$$

This is evidently a sum of the same values, only added in a different order. Apparently S₁ = S₂. Reflection along these lines must soon lead the reader to the conclusion that, in general,

$$\sum_k \sum_j f(j,k) = \sum_j \sum_k f(j,k).$$

Now consider that an integral is just a sum of many elements, and that a derivative is just a difference of two elements. Integrals and derivatives must then have the same commutative property discrete sums have. For example,

$$\int_{v=-\infty}^{\infty}\int_{u=a}^{b} f(u,v)\,du\,dv = \int_{u=a}^{b}\int_{v=-\infty}^{\infty} f(u,v)\,dv\,du;$$
$$\sum_k \int f_k(v)\,dv = \int \sum_k f_k(v)\,dv;$$
$$\frac{\partial}{\partial v}\int f\,du = \int \frac{\partial f}{\partial v}\,du. \tag{7.3}$$

In general,

$$L_v L_u f(u,v) = L_u L_v f(u,v),$$

where L is any of the linear operators Σ, ∫ or ∂.

Some convergent summations, like

$$\sum_{k=0}^{\infty}\sum_{j=0}^{1} \frac{(-)^j}{2k + j + 1},$$

diverge once reordered, as

$$\sum_{j=0}^{1}\sum_{k=0}^{\infty} \frac{(-)^j}{2k + j + 1}.$$

One cannot blithely swap operators here. This is not because swapping is wrong, but rather because the inner sum after the swap diverges, hence the outer sum after the swap has no concrete summand on which to work. (Why does the inner sum after the swap diverge? Answer: 1 + 1/3 + 1/5 + ··· = [1] + [1/3 + 1/5] + [1/7 + 1/9 + 1/0xB + 1/0xD] + ··· > 1[1/4] + 2[1/8] + 4[1/0x10] + ··· = 1/4 + 1/4 + 1/4 + ··· . See also § 8.10.5.) For a more twisted example of the same phenomenon, consider [Footnote 8: [3, § 1.2.3]]

$$1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \cdots = \left[1 - \frac{1}{2} - \frac{1}{4}\right] + \left[\frac{1}{3} - \frac{1}{6} - \frac{1}{8}\right] + \cdots,$$

which associates two negative terms with each positive, but still seems to omit no term. Paradoxically, then,

$$1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \cdots = \left[\frac{1}{2} - \frac{1}{4}\right] + \left[\frac{1}{6} - \frac{1}{8}\right] + \cdots = \frac{1}{2} - \frac{1}{4} + \frac{1}{6} - \frac{1}{8} + \cdots = \frac{1}{2}\left[1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \cdots\right],$$

or so it would seem, but cannot be, for it claims falsely that the sum is half itself. A better way to have handled the example might have been to write the series as

$$\lim_{n\to\infty}\left[1 - \frac{1}{2} + \frac{1}{3} - \frac{1}{4} + \cdots + \frac{1}{2n-1} - \frac{1}{2n}\right]$$

in the first place, thus explicitly specifying equal numbers of positive and negative terms. [Footnote 9: Some students of professional mathematics would assert that the false conclusion had been reached through lack of rigor. Well, maybe. This writer however does not feel sure that rigor is quite the right word for what was lacking here. Professional mathematics does bring an elegant notation and a set of formalisms which serve ably to spotlight certain limited kinds of blunders, but these are blunders no less by the applied approach. The stalwart Leonhard Euler—arguably the greatest series-smith in mathematical history—wielded his heavy analytical hammer in thunderous strokes before professional mathematics had conceived the notation or the formalisms. If the great Euler did without, then you and I might not always be forbidden to follow his robust example. See also footnote 11. On the other hand, the professional approach is worth study if you have the time. Recommended introductions include [44], preceded if necessary by [30] and/or [3, Ch. 1].] So specifying would have prevented the error. In the earlier example,

$$\lim_{n\to\infty}\sum_{k=0}^{n}\sum_{j=0}^{1} \frac{(-)^j}{2k + j + 1}$$

likewise would have prevented the error, or at least have made the error explicit.

The conditional convergence [Footnote 10: [44, § 16]] of the last paragraph, which can occur in integrals as well as in sums, seldom poses much of a dilemma in practice. One can normally swap summational and integrodifferential operators with little worry. The reader however should at least be aware that conditional convergence troubles can arise where a summand or integrand varies in sign or phase.
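The alternating series just discussed makes a neat machine demonstration. The Python fragment below, this rewrite's own illustration, sums the first terms of the series in the usual order and then in the one-positive, two-negative order, landing near ln 2 and (ln 2)/2 respectively:

```python
# A minimal sketch: reordering the alternating harmonic series changes
# its sum, illustrating conditional convergence (section 7.3.4).
import math

n = 100000

# Usual order: 1 - 1/2 + 1/3 - 1/4 + ...  ->  ln 2
usual = sum((-1) ** (k + 1) / k for k in range(1, 2 * n + 1))

# One positive term followed by two negatives:
# 1 - 1/2 - 1/4 + 1/3 - 1/6 - 1/8 + ...  ->  (ln 2)/2
reordered = sum(1 / (2 * k - 1) - 1 / (4 * k - 2) - 1 / (4 * k)
                for k in range(1, n + 1))

print(usual, math.log(2))          # ~0.6931
print(reordered, math.log(2) / 2)  # ~0.3466
```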

7.3.5 Multiple integrals

Consider the function

$$f(u, w) = \frac{u^2}{w}.$$

Such a function would not be plotted as a curved line in a plane, but rather as a curved surface in a three-dimensional space. Integrating the function seeks not the area under the curve but rather the volume under the surface:

$$V = \int_{u_1}^{u_2}\int_{w_1}^{w_2} \frac{u^2}{w}\,dw\,du.$$

This is a double integral. Inasmuch as it can be written in the form

$$V = \int_{u_1}^{u_2} g(u)\,du, \qquad g(u) \equiv \int_{w_1}^{w_2} \frac{u^2}{w}\,dw,$$

its effect is to cut the area under the surface into flat, upright slices, then the slices crosswise into tall, thin towers. The towers are integrated over w to constitute the slice, then the slices over u to constitute the volume. In light of § 7.3.4, evidently nothing prevents us from swapping the integrations: u first, then w. Hence

$$V = \int_{w_1}^{w_2}\int_{u_1}^{u_2} \frac{u^2}{w}\,du\,dw.$$

And indeed this makes sense, doesn't it? What difference should it make whether we add the towers by rows first then by columns, or by columns first then by rows? The total volume is the same in any case—albeit the integral

over w is potentially ill-behaved [Footnote 11: A great deal of ink is spilled in the applied mathematical literature when summations and/or integrations are interchanged. The author tends to recommend saving the ink, for pure and applied mathematics want different styles. What usually matters in applications is not whether a particular summation or integration satisfies some formal test but rather whether one clearly understands the summand to be summed or the integrand to be integrated. See also footnote 9.] near w = 0; so that, if for instance w₁ were negative, w₂ were positive, and both were real, one might rather write the double integral as [Footnote 12: It is interesting to consider the effect of withdrawing the integral's limit at −ε to −2ε, as $\lim_{\epsilon\to 0^+}\left[\int_{w_1}^{-2\epsilon} + \int_{+\epsilon}^{w_2}\right]\int_{u_1}^{u_2} (u^2/w)\,du\,dw$; for, surprisingly—despite that the parameter ε is vanishing anyway—the withdrawal does alter the integral unless the limit at +ε also is withdrawn. The reason is that $\lim_{\epsilon\to 0^+}\int_{\epsilon}^{2\epsilon} (1/w)\,dw = \ln 2 \neq 0$.]

$$V = \lim_{\epsilon\to 0^+}\left[\int_{w_1}^{-\epsilon} + \int_{+\epsilon}^{w_2}\right]\int_{u_1}^{u_2} \frac{u^2}{w}\,du\,dw.$$

Double integrations arise very frequently in applications. Triple integrations arise about as often. For instance, if μ(r) = μ(x, y, z) represents the variable mass density of some soil, [Footnote 13: Conventionally the Greek letter ρ not μ is used for density, but it happens that we need the letter ρ for a different purpose later in the paragraph.] then the total soil mass in some rectangular volume is

$$M = \int_{x_1}^{x_2}\int_{y_1}^{y_2}\int_{z_1}^{z_2} \mu(x, y, z)\,dz\,dy\,dx.$$

As a concise notational convenience, the last is likely to be written

$$M = \int_V \mu(\mathbf{r})\,d\mathbf{r},$$

where the V stands for "volume" and is understood to imply a triple integration. Similarly for the double integral,

$$V = \int_S f(\boldsymbol{\rho})\,d\boldsymbol{\rho},$$

where the S stands for "surface" and is understood to imply a double integration.

Even more than three nested integrations are possible. If we integrated over time as well as space, the integration would be fourfold. A spatial Fourier transform (§ 18.9) implies a triple integration; and its inverse, another triple: a sixfold integration altogether. Manifold nesting of integrals is thus not just a theoretical mathematical topic; it arises in sophisticated real-world engineering models. The topic concerns us here for this reason.
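A finite-sum analogue of the double integral, added in this rewrite for illustration, shows the row/column swap numerically (with w₁ > 0 so that the integrand stays well behaved):

```python
# A minimal sketch: the double integral of u^2/w over a rectangle,
# summed towers-then-slices and slices-then-towers (section 7.3.5).
n = 400
u1, u2, w1, w2 = 0.0, 1.0, 1.0, 2.0
du, dw = (u2 - u1) / n, (w2 - w1) / n

us = [u1 + (k + 0.5) * du for k in range(n)]   # midpoint samples
ws = [w1 + (k + 0.5) * dw for k in range(n)]

V_uw = sum(sum((u * u / w) * dw for w in ws) * du for u in us)
V_wu = sum(sum((u * u / w) * du for u in us) * dw for w in ws)
print(V_uw, V_wu)    # both near (1/3) ln 2 = 0.2310...
```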

7.4 Areas and volumes

By composing and solving appropriate integrals, one can calculate the perimeters, areas and volumes of interesting common shapes and solids.

7.4.1 The area of a circle

Figure 7.4 depicts an element of a circle's area. The element has wedge shape, but inasmuch as the wedge is infinitesimally narrow, the wedge is indistinguishable from a triangle of base length ρ dφ and height ρ. The area of such a triangle is A_triangle = ρ² dφ/2. Integrating the many triangles, we find the circle's area to be

$$A_{\text{circle}} = \int_{\phi=-\pi}^{\pi} A_{\text{triangle}} = \int_{-\pi}^{\pi} \frac{\rho^2\,d\phi}{2} = \frac{2\pi\rho^2}{2}. \tag{7.4}$$

(The numerical value of 2π—the circumference or perimeter of the unit circle—we have not calculated yet. We will calculate it in § 8.11.)

[Figure 7.4: The area of a circle. (A wedge of angle dφ at radius ρ in the x-y plane.)]

7.4.2 The volume of a cone

One can calculate the volume of any cone (or pyramid) if one knows its base area B and its altitude h measured normal [Footnote 14: Normal here means "at right angles."] to the base. Refer to Fig. 7.5. A cross-section of a cone, cut parallel to the cone's base, has the same shape the base has but a different scale. If coordinates are chosen such that the altitude h runs in the ẑ direction with z = 0 at the cone's vertex, then the cross-sectional area is evidently [Footnote 15: The fact may admittedly not be evident to the reader at first glance. If it is not yet evident to you, consider what it means to cut parallel to a cone's base a cross-section of the cone, and how cross-sections cut nearer a cone's vertex are smaller though the same shape. What if the base were square? Would the cross-sectional area not be (B)(z/h)² in that case? What if the base were a right triangle with equal legs—in other words, half a square? What if the base were some other strange shape like the base depicted in Fig. 7.5? Could such a strange shape not also be regarded as a definite, well-characterized part of a square? (With a pair of scissors one can cut any shape from a square piece of paper, after all.) Thinking along such lines must soon lead one to the insight that the parallel-cut cross-sectional area of a cone can be nothing other than (B)(z/h)².] (B)(z/h)², so the cone's volume is

$$V_{\text{cone}} = \int_0^h (B)\left(\frac{z}{h}\right)^2 dz = \frac{B}{h^2}\int_0^h z^2\,dz = \frac{B}{h^2}\left[\frac{h^3}{3}\right] = \frac{Bh}{3}, \tag{7.5}$$

regardless of the base's shape.

[Figure 7.5: The volume of a cone, base area B, altitude h.]

7.4.3 The surface area and volume of a sphere

Of a sphere, Fig. 7.6, one wants to calculate both the surface area and the volume. For the surface area, the sphere's surface is sliced vertically down the z axis into narrow constant-φ tapered strips (each strip broadest at the sphere's equator, tapering to points at the sphere's ±z poles) and horizontally across the z axis into narrow constant-θ rings, as in Fig. 7.7.

[Figure 7.6: A sphere of radius r, with axes x̂, ŷ, ẑ, colatitude θ, azimuth φ and cylindrical radius ρ.]

[Figure 7.7: An element of the sphere's surface (see Fig. 7.6), with sides r dθ and ρ dφ.]

A surface element so produced (seen as shaded in Fig. 7.7) evidently has the area [Footnote 16: It can be shown, incidentally—the details are left as an exercise—that dS = −r dz dφ also. The subsequent integration arguably goes a little easier if dS is accepted in this mildly clever form. The form is interesting in any event if one visualizes the specific annular area the expression $\int_{\phi=-\pi}^{\pi} dS = -2\pi r\,dz$ represents: evidently, a precisely equal portion of the sphere's surface corresponds to each equal step along the z axis, pole to pole; so, should you slice an unpeeled apple into parallel slices of equal thickness, though some slices will be bigger across and thus heavier than others, each slice curiously must take an equal share of the apple's skin. (This is true, anyway, if you judge Fig. 7.7 to represent an apple. The author's children judge it to represent "the Death Star with a square gun," so maybe it depends on your point of view.)]

$$dS = (r\,d\theta)(\rho\,d\phi) = r^2\sin\theta\,d\theta\,d\phi.$$

The sphere's total surface area then is the sum of all such elements over the sphere's entire surface:

$$S_{\text{sphere}} = \int_{\phi=-\pi}^{\pi}\int_{\theta=0}^{\pi} dS = \int_{\phi=-\pi}^{\pi}\int_{\theta=0}^{\pi} r^2\sin\theta\,d\theta\,d\phi = r^2\int_{\phi=-\pi}^{\pi} [-\cos\theta]_0^{\pi}\,d\phi = r^2\int_{\phi=-\pi}^{\pi} [2]\,d\phi = 4\pi r^2, \tag{7.6}$$

where we have used the fact from Table 7.1 that sin τ = (d/dτ)(−cos τ).

Having computed the sphere's surface area, one can find its volume just as § 7.4.1 has found a circle's area—except that instead of dividing the circle into many narrow triangles, one divides the sphere into many narrow cones, each cone with base area dS and altitude r, with the vertices of all the cones meeting at the sphere's center. Per (7.5), the volume of one such cone is V_cone = r dS/3. Hence,

$$V_{\text{sphere}} = \oint_S V_{\text{cone}} = \oint_S \frac{r\,dS}{3} = \frac{r}{3}\oint_S dS = \frac{r}{3}\,S_{\text{sphere}},$$

where the useful symbol $\oint_S$ indicates integration over a closed surface. In light of (7.6), the total volume is

$$V_{\text{sphere}} = \frac{4\pi r^3}{3}. \tag{7.7}$$

(One can compute the same spherical volume more prosaically, without reference to cones, by writing dV = r² sin θ dr dθ dφ then integrating $\int_V dV$. The derivation given above, however, is preferred because it lends the additional insight that a sphere can sometimes be viewed as a great cone rolled up about its own vertex. The circular area derivation of § 7.4.1 lends an analogous insight: that a circle can sometimes be viewed as a great triangle rolled up about its own vertex.)

7.5 Checking an integration

Dividing 0x46B/0xD = 0x57 with a pencil, how does one check the result? [Footnote 17: Admittedly, few readers will ever have done much such multidigit hexadecimal arithmetic with a pencil, but, hey, go with it. In decimal, it's 1131/13 = 87. Actually, hexadecimal is just proxy for binary (see Appendix A), and long division in straight binary is kind of fun. It is simpler than decimal or hexadecimal division, and it's how computers divide. If you have never tried it, you might. The insight gained is worth the trial.] Answer: by multiplying (0x57)(0xD) = 0x46B. Multiplication inverts division. Easier than division, multiplication provides a quick, reliable check.

Likewise, integrating

$$\int_a^b \frac{\tau^2}{2}\,d\tau = \frac{b^3 - a^3}{6}$$

with a pencil, how does one check the result? Answer: by differentiating

$$\left[\frac{\partial}{\partial b}\left(\frac{b^3 - a^3}{6}\right)\right]_{b=\tau} = \frac{\tau^2}{2}.$$

Differentiation inverts integration. Easier than integration, differentiation like multiplication provides a quick, reliable check.

More formally, according to (7.2),

$$S \equiv \int_a^b \frac{df}{d\tau}\,d\tau = f(b) - f(a). \tag{7.8}$$

Differentiating (7.8) with respect to b and a,

$$\left.\frac{\partial S}{\partial b}\right|_{b=\tau} = \left.\frac{df}{d\tau}\right|_{\tau}, \qquad \left.\frac{\partial S}{\partial a}\right|_{a=\tau} = -\left.\frac{df}{d\tau}\right|_{\tau}. \tag{7.9}$$

Either line of (7.9) can be used to check an integration. Evaluating (7.8) at b = a yields

$$S|_{b=a} = 0, \tag{7.10}$$

which can be used to check further. [Footnote 18: Using (7.10) to check the example, (b³ − a³)/6|_{b=a} = 0.]

As useful as (7.9) and (7.10) are, they nevertheless serve only integrals with variable limits. They are of little use to check definite integrals like (9.14) below, which lack variable limits to differentiate. However, many or most integrals one meets in practice have or can be given variable limits; (7.9) and (7.10) do serve such indefinite integrals. It is a rare irony of mathematics that, although numerically differentiation is indeed harder than integration, analytically precisely the opposite is true. Analytically, differentiation is the easier. Reversing an integration by taking an easy derivative is thus an excellent way to check a hard-earned integration result.
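The check lends itself to a short numerical routine, sketched here in Python as this rewrite's own illustration: differentiate the claimed result numerically with respect to the upper limit and compare against the integrand, per (7.9).

```python
# A minimal sketch: checking an integration per eq. (7.9).
# Claimed result: integral from a to b of tau^2/2 dtau = (b^3 - a^3)/6.
def S(a, b):
    return (b ** 3 - a ** 3) / 6          # the hard-earned integration result

a, tau, h = 1.0, 2.0, 1e-5
dS_db = (S(a, tau + h) - S(a, tau - h)) / (2 * h)   # balanced difference
print(dS_db, tau ** 2 / 2)                # both ~2.0: the check passes
```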

7.6 Contour integration

To this point we have considered only integrations in which the variable of integration advances in a straight line from one point to another: for instance, $\int_a^b f(\tau)\,d\tau$, in which the function f(τ) is evaluated at τ = a, a + dτ, a + 2dτ, ..., b. The integration variable is a real-valued scalar which can do nothing but make a straight line from a to b.

Such is not the case when the integration variable is a vector. Consider the integral

$$S = \int_{\mathbf{r}=\hat{\mathbf{x}}\rho}^{\hat{\mathbf{y}}\rho} (x^2 + y^2)\,d\ell,$$

where dℓ is the infinitesimal length of a step along the path of integration. What does this integral mean? Does it mean to integrate from r = x̂ρ to r = 0, then from there to r = ŷρ? Or does it mean to integrate along the arc of Fig. 7.8? The two paths of integration begin and end at the same points, but they differ in between, and the integral certainly does not come out the same both ways. Yet many other paths of integration from x̂ρ to ŷρ are possible, not just these two.

Because multiple paths are possible, we must be more specific:

$$S = \int_C (x^2 + y^2)\,d\ell,$$

where C stands for "contour" and means in this example the specific contour of Fig. 7.8. In the example, x² + y² = ρ² (by the Pythagorean theorem) and dℓ = ρ dφ, so

$$S = \int_C \rho^2\,d\ell = \int_0^{2\pi/4} \rho^3\,d\phi = \frac{2\pi}{4}\rho^3.$$

[Figure 7.8: A contour of integration C, an arc at radius ρ from φ = 0 to φ = 2π/4 in the x-y plane.]

In the example the contour is open, but closed contours which begin and end at the same point are also possible, indeed common. The useful symbol $\oint$ indicates integration over a closed contour. It means that the contour ends where it began: the loop is closed. The contour of Fig. 7.8 would be closed, for instance, if it continued to r = 0 and then back to r = x̂ρ.

Besides applying where the variable of integration is a vector, contour integration applies equally where the variable of integration is a complex scalar. In the latter case some interesting mathematics emerge, as we shall see in §§ 8.8 and 9.5.

7.7 Discontinuities

The polynomials and trigonometrics studied to this point in the book offer flexible means to model many physical phenomena of interest, but one thing they do not model gracefully is the simple discontinuity. Consider a mechanical valve opened at time t = t_o. The flow x(t) past the valve is

$$x(t) = \begin{cases} 0, & t < t_o, \\ x_o, & t > t_o. \end{cases}$$

One can write this more concisely in the form

$$x(t) = u(t - t_o)\,x_o, \tag{7.11}$$

where u(t) is the Heaviside unit step,

$$u(t) \equiv \begin{cases} 0, & t < 0, \\ 1, & t > 0, \end{cases} \tag{7.12}$$

plotted in Fig. 7.9.

[Figure 7.9: The Heaviside unit step u(t), a step of unit height at t = 0.]

The derivative of the Heaviside unit step is the curious Dirac delta

$$\delta(t) \equiv \frac{d}{dt}\,u(t), \tag{7.13}$$

also called [Footnote 19: [40, § 19.5]] the impulse function, plotted in Fig. 7.10. This function is zero everywhere except at t = 0, where it is infinite, with the property that

$$\int_{-\infty}^{\infty} \delta(t)\,dt = 1,$$

and the interesting consequence that

$$\int_{-\infty}^{\infty} \delta(t - t_o)\,f(t)\,dt = f(t_o) \tag{7.14}$$

for any function f(t). (Equation 7.14 is the sifting property of the Dirac delta.)

[Figure 7.10: The Dirac delta δ(t), an infinite spike at t = 0.]

[Footnote 20: In the author's country at least, a sort of debate seems to have run for decades between professional and applied mathematicians over the Dirac delta δ(t). Some professional mathematicians seem to have objected that δ(t) is not a function, inasmuch as it lacks certain properties common to functions as they define them [56, § 2.4][19]. From the applied point of view the objection is admittedly a little hard to understand, until one realizes that it is more a dispute over methods and definitions than over facts. What the professionals seem to be saying is that δ(t) does not fit as neatly as they would like into the abstract mathematical framework they had established for functions in general before Paul Dirac came along in 1930 [77, "Paul Dirac," 05:48, 25 May 2006] and slapped his disruptive δ(t) down on the table. The objection is not so much that δ(t) is not allowed as it is that professional mathematics for years after 1930 lacked a fully coherent theory for it. It's a little like the six-fingered man in Goldman's The Princess Bride [29]. If I had established a definition of "nobleman" which subsumed "human," whose relevant traits in my definition included five fingers on each hand, when the six-fingered Count Rugen appeared on the scene, then you would expect me to adapt my definition, wouldn't you? By my preëxisting definition, the six-fingered count is "not a nobleman"; but such exclusion really tells one more about flaws in the definition than it does about the count. Whether the professional mathematician's definition of the function is flawed, of course, is not for this writer to judge. Even if not, however, the fact of the Dirac delta dispute, coupled with the difficulty we applied mathematicians experience in trying to understand the reason the dispute even exists, has unfortunately surrounded the Dirac delta with a kind of mysterious aura, an elusive sense that δ(t) hides subtle mysteries—when what it really hides is an internal discussion of words and means among the professionals. The internal discussion of words and means, deep and hard, we leave to the professionals, who know whereof they speak. To us the Dirac delta δ(t) is just a function. The professionals who had established the theoretical framework before 1930 justifiably felt reluctant to throw the whole framework away because some scientists and engineers like us came along one day with a useful new function which didn't quite fit, but that was the professionals' problem not ours.]

It seems inadvisable for the narrative to digress at this point to explore u(z) and δ(z), the unit step and delta of a complex argument, although by means of Fourier analysis (Ch. 18) or by conceiving the Dirac delta as an infinitely narrow Gaussian pulse (§ 18.5) it could perhaps do so. The book has more pressing topics to treat. For the book's present purpose the interesting action of the two functions is with respect to the real argument t.

The Dirac delta is defined for vectors, too, such that

$$\int_V \delta(\mathbf{r})\,d\mathbf{r} = 1. \tag{7.15}$$
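Numerically one cannot represent δ(t) exactly, but a narrow pulse of unit area stands in for it, and the sifting property (7.14) survives the approximation. The Python fragment below is this rewrite's own illustration:

```python
# A minimal sketch: the sifting property (7.14), with a rectangular pulse
# of width eps and height 1/eps standing in for the Dirac delta.
import math

eps, t_o, dt = 1e-3, 1.0, 1e-6
f = math.cos

# The integrand vanishes outside the pulse's support,
# t_o - eps/2 < t < t_o + eps/2, so sum only there.
n = int(eps / dt)
total = sum(f(t_o - eps / 2 + (k + 0.5) * dt) / eps * dt for k in range(n))
print(total, f(t_o))     # both ~0.5403: the pulse sifts out f(t_o)
```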

7.8 Remarks (and exercises)

The concept of the integral is relatively simple once grasped, but its implications are broad, deep and hard. This chapter is short. One reason introductory calculus texts run so long is that they include many, many pages of integration examples and exercises. The reader who desires a gentler introduction to the integral might consult among others the textbook the chapter's introduction has recommended.

Even if this book is not an instructional textbook, it seems not meet that it should include no exercises at all here. Here are a few. Work the exercises if you like. Some of them do need material from later chapters, so you should not expect to be able to complete them all now. The harder ones are marked with ∗ asterisks.

1. Evaluate (a) $\int_0^x \tau\,d\tau$; (b) $\int_0^x \tau^2\,d\tau$. (Answer: x²/2; x³/3.)

2. Evaluate (a) $\int_1^x (1/\tau^2)\,d\tau$; (b) $\int_a^x 3\tau^{-2}\,d\tau$; (c) $\int_a^x C\tau^n\,d\tau$; (d) $\int_0^x (a_2\tau^2 + a_1\tau)\,d\tau$; ∗(e) $\int_1^x (1/\tau)\,d\tau$.

3. ∗ Evaluate (a) $\int_0^x \sum_{k=0}^{\infty} \tau^k\,d\tau$; (b) $\sum_{k=0}^{\infty} \int_0^x \tau^k\,d\tau$; (c) $\int_0^x \sum_{k=0}^{\infty} (\tau^k/k!)\,d\tau$.

4. Evaluate $\int_0^x \exp\alpha\tau\,d\tau$.

5. Evaluate (a) $\int_{-2}^{5} (3\tau^2 - 2\tau^3)\,d\tau$; (b) $\int_{5}^{-2} (3\tau^2 - 2\tau^3)\,d\tau$.

6. Evaluate $\int_1^{\infty} (3/\tau^2)\,d\tau$. Work the exercise by hand in hexadecimal and give the answer in hexadecimal.

7. Evaluate (a) $\int_0^x \sin\omega\tau\,d\tau$; (b) $\int_0^x \cos\omega\tau\,d\tau$; ∗(c) [Footnote 21: [65, § 8-2]] $\int_0^x \tau\sin\omega\tau\,d\tau$.

8. ∗ Evaluate [Footnote 22: [65, § 5-6]] (a) $\int_1^x \sqrt{1 + 2\tau}\,d\tau$; (b) $\int_a^x [(\cos\sqrt{\tau})/\sqrt{\tau}]\,d\tau$.

9. ∗ Evaluate [Footnote 23: [65, back endpaper]] (a) $\int_0^x [1/(1 + \tau^2)]\,d\tau$ (answer: arctan x); (b) $\int_0^x [(4 + i3)/\sqrt{2 - 3\tau^2}]\,d\tau$ (hint: the answer involves another inverse trigonometric).

10. ∗∗ Evaluate (a) $\int_{-\infty}^{x} \exp[-\tau^2/2]\,d\tau$; (b) $\int_{-\infty}^{\infty} \exp[-\tau^2/2]\,d\tau$.

11. ∗ Evaluate the integral of the example of § 7.6 along the alternate contour suggested there, from x̂ρ to 0 to ŷρ.

The last exercise in particular requires some experience to answer. Moreover, it requires a developed sense of applied mathematical style to put the answer in a pleasing form (the right form for part b is very different from that for part a). Some of the easier exercises, of course, you should be able to work right now.

The point of the exercises is to illustrate how hard integrals can be to solve, and in fact how easy it is to come up with an integral which no one really knows how to solve very well. Some solutions to the same integral are better than others (easier to manipulate, faster to numerically calculate, etc.), yet not even the masters can solve them all in practical ways. On the other hand, integrals which arise in practice often can be solved very well with sufficient cleverness—and the more cleverness you develop, the more such integrals you can solve. The ways to solve them are myriad. The mathematical art of solving diverse integrals is well worth cultivating. Chapter 9 introduces some of the basic, most broadly useful integral-solving techniques.

Before addressing techniques of integration, however, as promised earlier we turn our attention in Chapter 8 back to the derivative, applied in the form of the Taylor series.

Chapter 8

The Taylor series

The Taylor series is a power series which fits a function in a limited domain neighborhood. Fitting a function in such a way brings two advantages:

• it lets us take derivatives and integrals in the same straightforward way (4.20) we take them with any power series; and

• it implies a simple procedure to calculate the function numerically.

This chapter introduces the Taylor series and some of its incidents. It also derives Cauchy's integral formula. The chapter's early sections prepare the ground for the treatment of the Taylor series proper in § 8.3. [Footnote 1: Because even at the applied level the proper derivation of the Taylor series involves mathematical induction, analytic continuation and the matter of convergence domains, no balance of rigor the chapter might strike seems wholly satisfactory. The chapter errs maybe toward too much rigor; for, with a little less, most of §§ 8.1, 8.2, 8.4 and 8.6 would cease to be necessary. From another point of view, the chapter errs maybe toward too little rigor. Some pretty constructs of pure mathematics serve the Taylor series and Cauchy's integral formula, but such constructs drive the applied mathematician on too long a detour. The chapter as written represents the most nearly satisfactory compromise the writer has been able to attain.]

For the impatient, to read only the following sections might not be an unreasonable way to shorten the chapter: §§ 8.1, 8.3 and 8.9, plus the introduction of § 8.8.

8.1 The power-series expansion of 1/(1 − z)^{n+1}

Before approaching the Taylor series proper in § 8.3, we shall find it both interesting and useful to demonstrate that

$$\frac{1}{(1 - z)^{n+1}} = \sum_{k=0}^{\infty} \binom{n+k}{n} z^k, \qquad n \ge 0. \tag{8.1}$$

The demonstration comes in three stages. Of the three, it is the second stage (§ 8.1.2) which actually proves (8.1). The first stage (§ 8.1.1) comes up with the formula for the second stage to prove. The third stage (§ 8.1.3) establishes the sum's convergence. In all the section, i, j, k, m, n, K ∈ Z.

8.1.1 The formula

In § 2.6.4 we found that

$$\frac{1}{1 - z} = \sum_{k=0}^{\infty} z^k = 1 + z + z^2 + z^3 + \cdots \quad \text{for } |z| < 1.$$

What about 1/(1 − z)², 1/(1 − z)³, 1/(1 − z)⁴, and so on? By the long-division procedure of § 2.6.2, one can calculate the first few terms of 1/(1 − z)² to be

$$\frac{1}{(1 - z)^2} = \frac{1}{1 - 2z + z^2} = 1 + 2z + 3z^2 + 4z^3 + \cdots,$$

whose coefficients 1, 2, 3, 4, ... happen to be the numbers down the first diagonal of Pascal's triangle (Fig. 4.2 on page 79; see also Fig. 4.1). Dividing 1/(1 − z)³ seems to produce the coefficients 1, 3, 6, 0xA, ... down the second diagonal; dividing 1/(1 − z)⁴, the coefficients down the third. A curious pattern seems to emerge, worth investigating more closely. The pattern recommends the conjecture (8.1).

To motivate the conjecture a bit more formally (though without actually proving it yet), suppose that 1/(1 − z)^{n+1}, n ≥ 0, is expandable in the power series

$$\frac{1}{(1 - z)^{n+1}} = \sum_{k=0}^{\infty} a_{nk} z^k, \tag{8.2}$$

where the a_{nk} are coefficients to be determined.

Multiplying (8.2) by 1 − z, we have that

$$\frac{1}{(1 - z)^n} = \sum_{k=0}^{\infty} [a_{nk} - a_{n(k-1)}] z^k.$$

This is to say that

$$a_{(n-1)k} = a_{nk} - a_{n(k-1)},$$

or in other words that

$$a_{n(k-1)} + a_{(n-1)k} = a_{nk}. \tag{8.3}$$

Thinking of Pascal's triangle, one recognizes in (8.3) the rule by which the triangle is built, transcribed here in the symbols

$$\binom{m-1}{j-1} + \binom{m-1}{j} = \binom{m}{j}, \tag{8.4}$$

except that (8.3) is not a_{(m−1)(j−1)} + a_{(m−1)j} = a_{mj}. Various changes of variable are possible to make (8.4) better match (8.3). We might try at first a few false ones, but eventually the change

$$n + k \leftarrow m, \qquad k \leftarrow j,$$

recommends itself. Thus changing in (8.4) gives

$$\binom{n+k-1}{k-1} + \binom{n+k-1}{k} = \binom{n+k}{k}.$$

Transforming according to Ch. 4's symmetry rule for the combinatorial coefficient, that $\binom{m}{j} = \binom{m}{m-j}$, this is

$$\binom{n+[k-1]}{n} + \binom{[n-1]+k}{n-1} = \binom{n+k}{n}, \tag{8.5}$$

which fits (8.3) perfectly. Hence we conjecture that

$$a_{nk} = \binom{n+k}{n}, \tag{8.6}$$

which coefficients, applied to (8.2), yield (8.1).

Equation (8.1) is thus suggestive. It works at least for the important case of n = 0; this much is easy to test. In light of (8.3), it seems to imply a relationship between the 1/(1 − z)^{n+1} series and the 1/(1 − z)^n series for any n. But to seem is not to be. At this point, all we can say is that (8.1) seems right. We will establish that it is right in the next subsection.

8.1.2 The proof by induction

Equation (8.1) is proved by induction as follows. Consider the sum

$$S_n \equiv \sum_{k=0}^{\infty} \binom{n+k}{n} z^k. \tag{8.7}$$

Multiplying by 1 − z yields

$$(1 - z)S_n = \sum_{k=0}^{\infty} \left[\binom{n+k}{n} - \binom{n+[k-1]}{n}\right] z^k.$$

Per (8.5), this is

$$(1 - z)S_n = \sum_{k=0}^{\infty} \binom{[n-1]+k}{n-1} z^k. \tag{8.8}$$

Now suppose that (8.1) is true for n = i − 1 (where i denotes an integer rather than the imaginary unit):

$$\frac{1}{(1 - z)^i} = \sum_{k=0}^{\infty} \binom{[i-1]+k}{i-1} z^k. \tag{8.9}$$

In light of (8.8), this means that

$$\frac{1}{(1 - z)^i} = (1 - z)S_i.$$

Dividing by 1 − z,

$$\frac{1}{(1 - z)^{i+1}} = S_i.$$

Applying (8.7),

$$\frac{1}{(1 - z)^{i+1}} = \sum_{k=0}^{\infty} \binom{i+k}{i} z^k. \tag{8.10}$$

Evidently (8.9) implies (8.10). In other words, if (8.1) is true for n = i − 1, then it is also true for n = i. Thus by induction, if it is true for any one n, then it is also true for all greater n.

The "if" in the last sentence is important. Like all inductions, this one needs at least one start case to be valid (many inductions actually need a consecutive pair of start cases). The n = 0 supplies the start case

$$\frac{1}{(1 - z)^{0+1}} = \sum_{k=0}^{\infty} \binom{k}{0} z^k = \sum_{k=0}^{\infty} z^k,$$

which per (2.34) we know to be true.
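Before taking up convergence, a quick numerical spot check of (8.1) costs little. The Python fragment below is this rewrite's own illustration:

```python
# A minimal sketch: spot-checking eq. (8.1),
# 1/(1-z)^(n+1) = sum over k of C(n+k, n) z^k, for |z| < 1.
from math import comb

n, z, terms = 3, 0.5, 60
series = sum(comb(n + k, n) * z ** k for k in range(terms))
closed = 1.0 / (1.0 - z) ** (n + 1)
print(series, closed)      # both 16.0, to rounding
```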

8.1.3 Convergence

The question remains as to the domain over which the sum (8.1) converges. [Footnote 2: The meaning of the verb to converge may seem clear enough from the context and from earlier references, but if explanation here helps: a series converges if and only if it approaches a specific, finite value after many terms. A more rigorous way of saying the same thing is as follows: the series $S = \sum_{k=0}^{\infty} \tau_k$ converges iff (if and only if), for all possible positive constants ε, there exists a finite K ≥ −1 such that $\left|\sum_{k=K+1}^{n} \tau_k\right| < \epsilon$ for all n ≥ K (of course it is also required that the τ_k be finite, but you knew that already). The professional mathematical literature calls such convergence "uniform convergence," distinguishing it through a test devised by Weierstrass from the weaker "pointwise convergence" [3, § 1.5]. The applied mathematician can profit substantially by learning the professional view in the matter, but the effect of trying to teach the professional view in a book like this would not be pleasing. Here, we avoid error by keeping a clear view of the physical phenomena the mathematics is meant to model. It is interesting nevertheless to consider an example of an integral for which convergence is not so simple, such as Frullani's integral of § 9.7.]

To answer the question, consider that, per Ch. 4's rules for the combinatorial coefficient,

$$\binom{m}{j} = \frac{m}{m-j}\binom{m-1}{j}, \qquad m > 0.$$

With the substitution n + k ← m, n ← j, this means that

$$\binom{n+k}{n} = \frac{n+k}{k}\binom{n+[k-1]}{n},$$

or more tersely,

$$a_{nk} = \frac{n+k}{k}\,a_{n(k-1)},$$

where

$$a_{nk} \equiv \binom{n+k}{n}$$

are the coefficients of the power series (8.1). Rearranging factors,

$$\frac{a_{nk}}{a_{n(k-1)}} = \frac{n+k}{k} = 1 + \frac{n}{k}. \tag{8.11}$$

Multiplying (8.11) by z^k/z^{k−1} gives the ratio

$$\frac{a_{nk}\,z^k}{a_{n(k-1)}\,z^{k-1}} = \left(1 + \frac{n}{k}\right) z,$$

which is to say that the kth term of (8.1) is (1 + n/k)z times the (k − 1)th term. So long as the criterion [Footnote 3: Although one need not ask the question to understand the proof, the reader may nevertheless wonder why the simpler |(1 + n/k)z| < 1 is not given as a criterion. The surprising answer is that not all series $\sum \tau_k$ with |τ_k/τ_{k−1}| < 1 converge! For example, the extremely simple $\sum 1/k$ does not converge. As we see however, all series $\sum \tau_k$ with |τ_k/τ_{k−1}| < 1 − δ do converge. The really curious reader may now ask why $\sum 1/k$ does not converge. Answer: it majorizes $\int_1^x (1/\tau)\,d\tau = \ln x$. See also § 8.10.]

$$\left|\left(1 + \frac{n}{k}\right) z\right| \le 1 - \delta$$

is satisfied for all sufficiently large k > K—where 0 < δ ≪ 1 is a small positive constant—then the series evidently converges (see § 2.6.4). But we can bind 1 + n/k as close to unity as desired by making K sufficiently large, so to meet the criterion it suffices that

$$|z| < 1. \tag{8.12}$$

The bound (8.12) thus establishes a sure convergence domain for (8.1).

8.1.4 General remarks on mathematical induction

We have proven (8.1) by means of a mathematical induction. The virtue of induction as practiced in § 8.1.2 is that it makes a logically clean, airtight case for a formula. Its vice is that it conceals the subjective process which has led the mathematician to consider the formula in the first place. Once you obtain a formula somehow, maybe you can prove it by induction; but the induction probably does not help you to obtain the formula! A good inductive proof usually begins by motivating the formula proven. The distinction is subtle but rather important.

Richard W. Hamming once said of mathematical induction,

The theoretical difficulty the student has with mathematical induction arises from the reluctance to ask seriously, "How could I prove a formula for an infinite number of cases when I know that testing a finite number of cases is not enough?" Once you really face this question, you will understand the ideas behind mathematical induction. It is only when you grasp the problem clearly that the method becomes clear. [30, § 2.6]

8.1.4 General remarks on mathematical induction

We have proven (8.1) by means of a mathematical induction. Mathematical induction is a broadly applicable technique for constructing mathematical proofs. The virtue of induction as practiced in § 8.1.2 is that it makes a logically clean, airtight case for a formula. Its vice is that it conceals the subjective process which has led the mathematician to consider the formula in the first place. Once you obtain a formula somehow, maybe you can prove it by induction, but the induction probably does not help you to obtain the formula! A good inductive proof usually begins by motivating the formula proven. Richard W. Hamming once said of mathematical induction,

    The theoretical difficulty the student has with mathematical induction arises from the reluctance to ask seriously, "How could I prove a formula for an infinite number of cases when I know that testing a finite number of cases is not enough?" Once you really face this question, you will understand the ideas behind mathematical induction. It is only when you grasp the problem clearly that the method becomes clear. [30, § 2.3]

Hamming also wrote,

    The function of rigor is mainly critical and is seldom constructive. Rigor is the hygiene of mathematics, which is needed to protect us against careless thinking. [30, § 1.6]

The applied mathematician may tend to avoid rigor for which he finds no immediate use, but he does not disdain mathematical rigor on principle. The style lies in exercising rigor at the right level for the problem at hand. Hamming, a professional mathematician who sympathized with the applied mathematician's needs, wrote further,

    Ideally, when teaching a topic the degree of rigor should follow the student's perceived need for it. . . . It is necessary to require a gradually rising level of rigor so that when faced with a real need for it you are not left helpless. As a result, [one cannot teach] a uniform level of rigor, but rather a gradually rising level. Logically, this is indefensible, but psychologically there is little else that can be done. [30, § 1.6]

Applied mathematics holds that the practice is defensible, on the ground that the math serves the model. We will not always write inductions out as explicitly in this book as we have done in the present section—often we will leave the induction as an implicit exercise for the interested reader—but this section's example at least lays out the general pattern of the technique.

8.2 Shifting a power series' expansion point

One more question we should treat before approaching the Taylor series proper in § 8.3 concerns the shifting of a power series' expansion point. How can the expansion point of the power series

    f(z) = \sum_{k=K}^{\infty} (a_k)(z - z_o)^k, \quad (k, K) \in \mathbb{Z}, \; K \le 0,    (8.13)

which may have terms of negative order, be shifted from z = z_o to z = z_1?

The first step in answering the question is straightforward: one rewrites (8.13) in the form

    f(z) = \sum_{k=K}^{\infty} (a_k)([z - z_1] - [z_o - z_1])^k,

then changes the variables

    w \leftarrow \frac{z - z_1}{z_o - z_1}, \qquad c_k \leftarrow [-(z_o - z_1)]^k a_k,    (8.14)

to obtain

    f(z) = \sum_{k=K}^{\infty} (c_k)(1 - w)^k.    (8.15)

Splitting the k < 0 terms from the k \ge 0 terms in (8.15), we have that

    f(z) = f_-(z) + f_+(z),    (8.16)

    f_-(z) \equiv \sum_{k=0}^{-(K+1)} \frac{c_{[-(k+1)]}}{(1-w)^{k+1}},

    f_+(z) \equiv \sum_{k=0}^{\infty} (c_k)(1-w)^k.

Of the two subseries, the f_-(z) is expanded term by term using (8.1), after which combining like powers of w yields the form

    f_-(z) = \sum_{k=0}^{\infty} q_k w^k, \qquad q_k \equiv \sum_{n=0}^{-(K+1)} (c_{[-(n+1)]}) \binom{n+k}{n}.    (8.17)

The f_+(z) is even simpler to expand: one need only multiply the series out term by term, expanding each (1-w)^k binomially and combining like powers of w to reach the form

    f_+(z) = \sum_{k=0}^{\infty} p_k w^k, \qquad p_k \equiv (-)^k \sum_{n=k}^{\infty} (c_n) \binom{n}{k},    (8.18)

the factor (-)^k arising because the binomial expansion of (1-w)^k brings powers of -w.
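A small numerical sketch, added here for illustration only, exercises (8.13) through (8.18) on the simple concrete series f(z) = 1/z + 2 + 3z about z_o = 0, shifted to z_1 = 1. The variable names and the truncation length N are arbitrary choices of the sketch:

    from math import comb

    a = {-1: 1, 0: 2, 1: 3}            # f(z) = 1/z + 2 + 3z about z_o = 0
    zo, z1, K, N = 0.0, 1.0, -1, 16    # shift to z_1 = 1; keep N terms of w

    c = {k: (-(zo - z1))**k * ak for k, ak in a.items()}     # per (8.14)
    q = [sum(c[-(n + 1)] * comb(n + k, n)
             for n in range(-(K + 1) + 1)) for k in range(N)]  # per (8.17)
    p = [(-1)**k * sum(c[n] * comb(n, k)
                       for n in range(k, 2)) for k in range(N)]  # per (8.18)

    z = 1.25                            # test point within |w| < 1
    w = (z - z1) / (zo - z1)
    shifted = sum((q[k] + p[k]) * w**k for k in range(N))
    print(shifted, 1/z + 2 + 3*z)       # the two values agree closely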

Equations (8.13) through (8.18) serve to shift a power series' expansion point, calculating the coefficients of a power series for f(z) about z = z_1, given those of a power series about z = z_o. Notice that—unlike the original, z = z_o power series—the new, z = z_1 power series has terms (z - z_1)^k only for k \ge 0; it has no terms of negative order. At the price per (8.12) of restricting the convergence domain to |w| < 1, shifting the expansion point away from the pole at z = z_o has resolved the k < 0 terms.

The method fails if z = z_1 happens to be a pole or other nonanalytic point of f(z). The convergence domain vanishes as z_1 approaches such a forbidden point. (Examples of such forbidden points include z = 0 in h[z] = 1/z and in g[z] = \sqrt{z}. See §§ 8.4 through 8.8.) Furthermore, even if z_1 does represent a fully analytic point of f(z), it also must lie within the convergence domain of the original, z = z_o series for the shift to be trustworthy as derived.

The attentive reader might observe that we have formally established the convergence neither of f_-(z) in (8.17) nor of f_+(z) in (8.18). Regarding the former convergence, that of f_-(z), since according to (8.12) each term of the original f_-(z) of (8.16) converges for |w| < 1, the reconstituted f_-(z) of (8.17) safely converges in the same domain. The latter convergence, that of f_+(z), is harder to establish in the abstract because that subseries has an infinite number of terms. As we will see by pursuing a different line of argument in § 8.3, however, the f_+(z) of (8.18) can be nothing other than the Taylor series about z = z_1 of the function f_+(z) in any event, enjoying the same convergence domain any such Taylor series enjoys.^4

In applied mathematics, one does not normally try to shift the expansion point of an unspecified function f(z); rather, one shifts the expansion point of some concrete function like sin z or ln(1 - z). The imagined difficulty (if any) vanishes in the concrete case; we have strategically framed the problem so that one needn't worry about it, anyway.

  ^4 A rigorous argument can be constructed without appeal to § 8.3 if desired, from the ratio n/(n - k) of (4.9) and its brethren, which ratio approaches unity with increasing n. A more elegant rigorous argument can be made indirectly by way of a complex contour integral. In applied mathematics, however, the important point is the one made in the narrative: f_+(z) can be nothing other than the Taylor series in any event.

8.3 Expanding functions in Taylor series

Having prepared the ground, we now stand in position to treat the Taylor series proper.

The treatment begins with a question: if you had to express some function f(z) by a power series

    f(z) = \sum_{k=0}^{\infty} (a_k)(z - z_o)^k,

with terms of nonnegative order k \ge 0 only, how would you do it? The procedure of § 8.1 worked well enough in the case of f(z) = 1/(1-z)^{n+1}, but it is not immediately obvious that the same procedure works more generally. What if f(z) = sin z, for example?^5

Fortunately a different way to attack the power-series expansion problem is known. It works by asking the question: what power series, having terms of nonnegative order only, most resembles f(z) in the immediate neighborhood of z = z_o? To resemble f(z), the desired power series should have a_0 = f(z_o); otherwise it would not have the right value at z = z_o. Then it should have a_1 = f'(z_o) for the right slope. Then, a_2 = f''(z_o)/2 for the right second derivative, and so on. With this procedure,

    f(z) = \sum_{k=0}^{\infty} \left( \left. \frac{d^k f}{dz^k} \right|_{z=z_o} \right) \frac{(z - z_o)^k}{k!}.    (8.19)

Equation (8.19) is the Taylor series. Where it converges, it has all the same derivatives f(z) has, so if f(z) is infinitely differentiable then the Taylor series is an exact representation of the function.^6

  ^5 The actual Taylor series for sin z is given in § 8.9.

  ^6 The professional mathematician reading such words is likely to blanch. To him, such words want rigor; to him, such words want pages and whole chapters [5][23][67][34] of rigor. Before submitting unreservedly to the professional's scruples in the matter, let us not forget (§ 1.2.1) that the professional's approach is founded in postulation, whereas ours is founded in physical metaphor. Our means differ from his for this reason alone. Still, an alternate way to think about the Taylor series' sufficiency might interest some readers. It begins with an infinitely differentiable function F(z) and its Taylor series f(z) about z_o, letting \Delta F(z) \equiv F(z) - f(z) be the part of F(z) not representable as a Taylor series about z_o. If \Delta F(z) is the part of F(z) not representable as a Taylor series, then \Delta F(z_o) and all its derivatives at z_o must be identically zero (otherwise, by the Taylor series formula of eqn. 8.19, one could construct a nonzero Taylor series for \Delta F[z_o] from the nonzero derivatives). However, if F(z) is infinitely differentiable and if all the derivatives of \Delta F(z) are zero at z = z_o, then by the unbalanced definition of the derivative from § 4.4 all the derivatives must also be zero at z = z_o \pm \epsilon, hence also at z = z_o \pm 2\epsilon, and so on. This means that \Delta F(z) = 0; in other words, there is no part of F(z) not representable as a Taylor series. A more formal way to make the same argument would be to suppose that d^n \Delta F/dz^n |_{z=z_o+\epsilon} = h for some integer n \ge 0, whereas that this would mean that d^{n+1} \Delta F/dz^{n+1} |_{z=z_o} = h/\epsilon; inasmuch as the latter is one of the derivatives of \Delta F(z) at z = z_o, it follows that h = 0. The interested reader can fill the details in, but basically that is how the alternate argument begins. After all, if the first, second, third derivatives and so forth, evaluated at some expansion point, indicate anything at all, then must they not indicate how the function and its several derivatives will evolve from that point? And if they do indicate that, then what could null derivatives indicate but null evolution? Yet such arguments, even if accepted, are satisfactory only from a certain point of view—and slightly less so once one considers the asymptotic series of [chapter not yet written] later in the book. A more elegant rigorous argument, preferred by the professional mathematicians [45][5][23] but needing significant theoretical preparation, involves integrating over a complex contour about the expansion point; Appendix C sketches that proof. In applied mathematics, whether there actually exist any infinitely differentiable functions not representable as Taylor series is more or less beside the point—at least until a concrete need for such a hypothetical function should present itself. Applied mathematicians normally regard mathematical functions to be imprecise analogs of, or metaphors for, physical quantities of interest; since the functions are imprecise analogs in any case, the applied mathematician is logically free implicitly to define the functions he uses as Taylor series in the first place—that is, to restrict the set of infinitely differentiable functions used in the model to the subset of such functions representable as Taylor series. With such an implicit definition, the definitions serve the model, not the other way around. (It is entertaining incidentally to consider [76, "Extremum"] the Taylor series of the function sin[1/x]—although in practice this particular function is readily expanded after the obvious change of variable u \leftarrow 1/x.) The discussion of rigor is confined here to a footnote not to deprecate rigor as such, but to deprecate insistence on rigor which serves little known purpose in applications; other than to divert the interested reader to Appendix C, this footnote will leave the matter in that form.
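In Python, (8.19) is nearly a one-liner once the derivatives at the expansion point are known. A minimal sketch, added for illustration and assuming only the familiar derivative cycle of sin z:

    from math import factorial, sin, cos

    def sin_taylor(z, zo, terms):
        # d^k(sin)/dz^k cycles through sin, cos, -sin, -cos at z = zo
        derivs = [sin(zo), cos(zo), -sin(zo), -cos(zo)]
        return sum(derivs[k % 4] * (z - zo)**k / factorial(k)
                   for k in range(terms))

    print(sin_taylor(1.0, 0.3, 12), sin(1.0))   # agree to many digits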

The Taylor series is not guaranteed to converge outside some neighborhood near z = z_o, but where it does converge it is precise.

When z_o = 0, the series is also called the Maclaurin series. By either name, the series is a construct of great importance and tremendous practical value, as we shall soon see.

8.4 Analytic continuation

As earlier mentioned in § 2.12.3, an analytic function is a function which is infinitely differentiable in the domain neighborhood of interest—or, maybe more appropriately for our applied purpose, a function expressible as a Taylor series in that neighborhood.

As we have seen, only one Taylor series about z_o is possible for a given function f(z):

    f(z) = \sum_{k=0}^{\infty} (a_k)(z - z_o)^k.

However, nothing prevents one from transposing the series to a different expansion point z = z_1 by the method of § 8.2, except that the transposed series may there enjoy a different convergence domain. As it happens, this section's purpose finds it convenient to swap symbols z_o \leftrightarrow z_1, transposing rather from expansion about z = z_1 to expansion about z = z_o. In the swapped notation, so long as the expansion point z = z_o lies fully within (neither outside nor right on the edge of) the z = z_1 series' convergence domain, the transposition is possible.

Since an analytic function f(z) is infinitely differentiable and enjoys a unique Taylor expansion f_o(z - z_o) = f(z) about each point z_o in its domain, it follows that if two Taylor series f_1(z - z_1) and f_2(z - z_2) find even a small neighborhood |z - z_o| < \epsilon which lies in the domain of both, then the two can both be transposed to the common z = z_o expansion point. If the two are found to have the same Taylor series there, then f_1 and f_2 both represent the same function; the two series evidently describe the selfsame underlying analytic function. Moreover, if a series f_3 is found whose domain overlaps that of f_2, then a series f_4 whose domain overlaps that of f_3, and so on, and if each pair in the chain matches at least in a small neighborhood in its region of overlap, then the whole chain of overlapping series necessarily represents the same underlying analytic function f(z). The series f_1 and the series f_n represent the same analytic function even if their domains do not directly overlap at all.

This is a manifestation of the principle of analytic continuation. The principle holds that if two analytic functions are the same within some domain neighborhood |z - z_o| < \epsilon, then they are the same everywhere.^7 Observe however that the principle fails at poles and other nonanalytic points, because the function is not differentiable there.

  ^7 The writer hesitates to mention that he is given to understand [67] that the domain neighborhood can technically be reduced to a domain contour of nonzero length but zero width. Having never met a significant application of this extension of the principle, the writer has neither researched the extension's proof nor asserted its truth. He does not especially recommend that the reader worry over the point. The domain neighborhood |z - z_o| < \epsilon suffices.

The result of § 8.2, which shows general power series to be expressible as Taylor series except at their poles and other nonanalytic points, extends the analytic continuation principle to cover power series in general, including power series with terms of negative order.

Now, observe: though all convergent power series are indeed analytic, one need not actually expand every analytic function in a power series. Sums, products and ratios of analytic functions are no less differentiable than the functions themselves—as also, by the derivative chain rule, is an analytic function of analytic functions. For example, where g(z) and h(z) are analytic, there also is f(z) \equiv g(z)/h(z) analytic (except perhaps at isolated points where h[z] = 0). Besides, given Taylor series for g(z) and h(z) one can make a power series for f(z) by long division if desired, so that is all right.

When the professional mathematician speaks generally of a "function," he means any function at all. One can construct some pretty unreasonable functions if one wants to, such as

    f([2k+1] 2^m) \equiv (-)^m, \quad (k, m) \in \mathbb{Z};
    f(z) \equiv 0 \text{ otherwise.}

However, neither functions like this f(z) nor more subtly unreasonable functions normally arise in the modeling of physical phenomena. When such functions do arise, one transforms, approximates, reduces, replaces and/or avoids them. The full theory which classifies and encompasses—or explicitly excludes—such functions is thus of limited interest to the applied mathematician, and this book does not cover it.^8

The subject of analyticity is rightly a matter of deep concern to the professional mathematician. It is also a long wedge which drives pure and applied mathematics apart. The foundations of the pure theory of a complex variable, though abstract, are beautiful, and though they do not comfortably fit a book like this even an applied mathematician can profit substantially by studying them. Many books do cover the pure theory in varying degrees, including [23][67][34] and numerous others; the few pages of Appendix C trace only the pure theory's main thread. However that may be, the pure theory is probably best appreciated after one already understands its chief conclusions. Though not for the explicit purpose of serving the pure theory, the present chapter does develop just such an understanding. Section 8.15 speaks further on the point.

  ^8 This does not mean that the scientist or engineer never encounters nonanalytic functions. On the contrary, he encounters several, but they are not subtle: |z|; arg z; z*; \Re(z); \Im(z); u(t); \delta(t). Refer to §§ 2.12 and 7.7. Such functions are nonanalytic either because they lack proper derivatives in the Argand plane according to (4.19) or because one has defined them only over a real domain.

8.5 Branch points

The function g(z) = \sqrt{z} is an interesting, troublesome function. Its derivative is dg/dz = 1/(2\sqrt{z}), so even though the function itself is finite at z = 0, its derivative is not finite there. Evidently g(z) has a nonanalytic point at z = 0, yet the point is not a pole. What is it? We call it a branch point.

The defining characteristic of the branch point is that, given a function f(z) with such a point at z = z_o, if one encircles^9 the point once alone (that is, without also encircling some other branch point) by a closed contour in the Argand domain plane, while simultaneously tracking f(z) in the Argand range plane—and if one demands that z and f(z) move smoothly, that neither suddenly skip from one spot to another—then one finds that f(z) ends in a different place than it began, even though z itself has returned precisely to its own starting point. The range contour remains open even though the domain contour is closed.

In complex analysis, a branch point may be thought of informally as a point z_o at which a "multiple-valued function" changes values when one winds once around z_o. [77, "Branch point," 18:10, 16 May 2006]

An analytic function like g(z) = \sqrt{z} having a branch point evidently is not single-valued. It is multiple-valued. For a single z more than one distinct g(z) is possible. An analytic function like h(z) = 1/z, by contrast, is single-valued even though it has a pole. This function does not suffer the syndrome described. When a domain contour encircles a pole, the corresponding range contour is properly closed. Poles do not cause their functions to be multiple-valued and thus are not branch points.

  ^9 For readers whose native language is not English, "to encircle" means "to surround" or "to enclose." The verb does not require the boundary to have the shape of an actual, geometrical circle; any closed shape suffices. However, the circle is a typical shape, probably the most fitting shape to imagine when thinking of the concept abstractly.
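The winding behavior is easy to exhibit numerically. The sketch below, added for illustration, walks z once around the unit circle while tracking a smoothly chosen square root—at each step taking whichever of the two roots \pm\sqrt{z} lies nearer the previous value—and finds g(z) returned to the negative of its starting value:

    import cmath

    steps = 1000
    w = 1.0 + 0j                      # g(1) = 1, the starting branch
    for s in range(1, steps + 1):
        z = cmath.exp(2j * cmath.pi * s / steps)   # once around z = 0
        r = cmath.sqrt(z)
        w = r if abs(r - w) < abs(-r - w) else -r  # keep g(z) continuous
    print(w)    # approximately -1: the range contour has not closed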

Evidently f(z) \equiv (z - z_o)^a has a branch point at z = z_o if and only if a is not an integer. If f(z) does have a branch point—if a is not an integer—then the mathematician must draw a distinction between z_1 = z_o + \rho e^{i\phi} and z_2 = z_o + \rho e^{i(\phi+2\pi)}, even though the two are exactly the same number. Indeed z_1 = z_2, but paradoxically f(z_1) \ne f(z_2).

This is difficult. It is confusing, too, until one realizes that the fact of a branch point says nothing whatsoever about the argument z. As far as z is concerned, there really is no distinction between z_1 = z_o + \rho e^{i\phi} and z_2 = z_o + \rho e^{i(\phi+2\pi)}—none at all. What draws the distinction is the multiple-valued function f(z) which uses the argument.

It is as though I had a mad colleague who called me Thaddeus Black, until one day I happened to walk past behind his desk (rather than in front as I usually did), whereupon for some reason he began calling me Gorbag Pfufnik. I had not changed at all, but now the colleague calls me by a different name. The change isn't really in me, is it? It's in my colleague, who seems to suffer a branch point. If it is important to me to be sure that my colleague really is addressing me when he cries, "Pfufnik!" then I had better keep a running count of how many times I have turned about his desk, hadn't I, even though the number of turns is personally of no import to me.

The usual analysis strategy when one encounters a branch point is simply to avoid the point. Where an integral follows a closed contour as in § 8.8, the strategy is to compose the contour to exclude the branch point, to shut it out. Such a strategy of avoidance usually prospers.^10

  ^10 Traditionally associated with branch points in complex variable theory are the notions of branch cuts and Riemann sheets. These ideas are interesting, but are not central to the analysis as developed in this book and are not covered here. The interested reader might consult a book on complex variables or advanced calculus like [34], among many others.

8.6 Entire and meromorphic functions

Though an applied mathematician is unwise to let abstract definitions enthrall his thinking, pure mathematics nevertheless brings some technical definitions the applied mathematician can use. Two such are the definitions of entire and meromorphic functions.^11

A function f(z) which is analytic for all finite z is an entire function. Examples include f(z) = z^2 and f(z) = exp z, but not f(z) = 1/z, which has a pole at z = 0.

A function f(z) which is analytic for all finite z except at isolated poles (which can be n-fold poles if n is a finite, positive integer), which has no branch points, and of which no circle of finite radius in the Argand domain plane encompasses an infinity of poles, is a meromorphic function. Examples include f(z) = 1/z, f(z) = 1/(z+2) + 1/(z-1)^3 + 2z^2 and f(z) = tan z—the last of which has an infinite number of poles, but of which the poles nowhere cluster in infinite numbers. The function f(z) = tan(1/z) is not meromorphic, since it has an infinite number of poles within the Argand unit circle. Even the function f(z) = exp(1/z) is not meromorphic: it has only

  ^11 [76]

the one, isolated nonanalytic point at z = 0, and that point is no branch point; but the point is an essential singularity, having the character of an infinitifold (\infty-fold) pole.^12 Sections 8.15 and 9.6 speak further of the matter.

  ^12 [45] If it seems unclear that the singularities of tan z are actual poles, incidentally, then consider that

      \tan z = \frac{\sin z}{\cos z} = -\frac{\cos w}{\sin w},

  wherein we have changed the variable

      w \leftarrow z - (2n+1)\frac{2\pi}{4}, \quad n \in \mathbb{Z}.

  Section 8.9 and its Table 8.1, below, give Taylor series for cos z and sin z, with which

      \tan z = \frac{-1 + w^2/2 - w^4/0x18 - \cdots}{w\,[1 - w^2/6 + w^4/0x78 - \cdots]}.

  By long division,

      \tan z = -\frac{1}{w} + \frac{w/3 - w^3/0x1E + \cdots}{1 - w^2/6 + w^4/0x78 - \cdots}.

  (On the other hand, if it is unclear that z = [2n+1][2\pi/4] are the only singularities tan z has—that it has no singularities of which \Im[z] \ne 0—then consider that the singularities of tan z occur where cos z = 0, which by Euler's formula occurs where exp[+iz] = -exp[-iz]. This in turn is possible only if |exp[+iz]| = |exp[-iz]|, which happens only for real z.)

8.7 Extrema over a complex domain

If an analytic function f(z) is expanded by (8.19) or by other means about an analytic expansion point z = z_o such that

    f(z) = f(z_o) + \sum_{k=1}^{\infty} (a_k)(z - z_o)^k;

and if

    a_k = 0 \text{ for } k < K, \text{ but } a_K \ne 0, \quad (k, K) \in \mathbb{Z}, \; 0 < K < \infty,

such that a_K is the series' first nonzero coefficient; then, in the immediate neighborhood of the expansion point,

    f(z) \approx f(z_o) + (a_K)(z - z_o)^K, \quad |z - z_o| \ll 1.

Changing \rho' e^{i\phi'} \leftarrow z - z_o, this is

    f(z) \approx f(z_o) + a_K \rho'^K e^{iK\phi'}, \quad 0 \le \rho' \ll 1.    (8.20)

Evidently one can shift the output of an analytic function f(z) slightly in any desired Argand direction by shifting slightly the function's input z. Specifically according to (8.20), to shift f(z) by \Delta f \approx \epsilon e^{i\psi}, one can shift z by \Delta z \approx (\epsilon/a_K)^{1/K} e^{i(\psi + n2\pi)/K}, n \in \mathbb{Z}. Except at a nonanalytic point of f(z) or in the trivial case that f(z) were everywhere constant, this always works—even where [df/dz]_{z=z_o} = 0.

That one can shift an analytic function's output smoothly in any Argand direction whatsoever has the significant consequence that neither the real nor the imaginary part of the function—nor for that matter any linear combination \Re[e^{-i\omega} f(z)] of the real and imaginary parts—can have an extremum within the interior of a domain over which the function is fully analytic. That is, a function's extrema over a bounded analytic domain^13 never lie within the domain's interior but always on its boundary.^14

  ^13 Professional mathematicians tend to define the domain and its boundary more carefully.
  ^14 [72][45]

8.8 Cauchy's integral formula

In § 7.6 we considered the problem of vector contour integration, in which the sum value of an integration depends not only on the integration's endpoints but also on the path, or contour, over which the integration is done. Because real scalars are confined to a single line, no alternate choice of path is possible where the variable of integration is a real scalar, so the contour problem does not arise in that case. It does however arise where the variable of integration is a complex scalar, because there again different paths are possible. Refer to the Argand plane (Fig. 2.5).

Consider the integral

    S_n = \int_{z_1}^{z_2} z^{n-1}\, dz, \quad n \in \mathbb{Z}.    (8.21)

If z were always a real number, then by the antiderivative (§ 7.2) this integral would evaluate to (z_2^n - z_1^n)/n—or, in the case of n = 0, to \ln(z_2/z_1). Inasmuch as z is complex, however, the correct evaluation is less obvious. To evaluate the integral sensibly in the latter case, one must consider some specific path of integration in the Argand plane. One must also consider the meaning of the symbol dz.

8.8.1 The meaning of the symbol dz

The symbol dz represents an infinitesimal step in some direction in the Argand plane:

    dz = [z + dz] - [z]
       = (\rho + d\rho) e^{i(\phi + d\phi)} - \rho e^{i\phi}
       = (\rho + d\rho) e^{i\,d\phi} e^{i\phi} - \rho e^{i\phi}
       = (\rho + d\rho)(1 + i\,d\phi) e^{i\phi} - \rho e^{i\phi}.

Since the product of two infinitesimals is negligible even on infinitesimal scale, we can drop the d\rho\,d\phi term.^15 After canceling finite terms, we are left with the peculiar but fine formula

    dz = (d\rho + i\rho\,d\phi) e^{i\phi}.    (8.22)

  ^15 The dropping of second-order infinitesimals like d\rho\,d\phi, added to first-order infinitesimals like d\rho, is a standard calculus technique. One cannot always drop them, however. Occasionally one encounters a sum in which not only do the finite terms cancel, but also the first-order infinitesimals. In such a case, the second-order infinitesimals dominate and cannot be dropped. An example of the type is

      \lim_{\epsilon \to 0} \frac{(1-\epsilon)^3 + 3(1+\epsilon) - 4}{\epsilon^2} = \lim_{\epsilon \to 0} \frac{(1 - 3\epsilon + 3\epsilon^2) + (3 + 3\epsilon) - 4}{\epsilon^2} = 3.

  One typically notices that such a case has arisen when the dropping of second-order infinitesimals has left an ambiguous 0/0. To fix the problem, you simply go back to the step where you dropped the infinitesimal and you restore it, then you proceed from there. Otherwise there isn't much point in carrying second-order infinitesimals around. In the relatively uncommon event that you need them, you'll know it. The math itself will tell you.

8.8.2 Integrating along the contour

Now consider the integration (8.21) along the contour of Fig. 8.1.

Figure 8.1: A contour of integration in the Argand plane, in two segments: constant-\rho (z_a to z_b); and constant-\phi (z_b to z_c). [Axes x = \Re(z), y = \Im(z); the contour runs from z_a along the arc of radius \rho through angle \phi to z_b, thence radially outward to z_c.]

Integrating along the constant-\phi segment,

    \int_{z_b}^{z_c} z^{n-1}\, dz = \int_{\rho_b}^{\rho_c} (\rho e^{i\phi})^{n-1} (d\rho + i\rho\,d\phi) e^{i\phi}
      = \int_{\rho_b}^{\rho_c} (\rho e^{i\phi})^{n-1} (d\rho) e^{i\phi}
      = e^{in\phi} \int_{\rho_b}^{\rho_c} \rho^{n-1}\, d\rho
      = \frac{e^{in\phi}}{n} (\rho_c^n - \rho_b^n)
      = \frac{z_c^n - z_b^n}{n}.

Integrating along the constant-\rho arc,

    \int_{z_a}^{z_b} z^{n-1}\, dz = \int_{\phi_a}^{\phi_b} (\rho e^{i\phi})^{n-1} (d\rho + i\rho\,d\phi) e^{i\phi}
      = \int_{\phi_a}^{\phi_b} (\rho e^{i\phi})^{n-1} (i\rho\,d\phi) e^{i\phi}
      = i\rho^n \int_{\phi_a}^{\phi_b} e^{in\phi}\, d\phi
      = \frac{i\rho^n}{in} \left( e^{in\phi_b} - e^{in\phi_a} \right)
      = \frac{z_b^n - z_a^n}{n}.

Adding the two, we have that

    \int_{z_a}^{z_c} z^{n-1}\, dz = \frac{z_c^n - z_a^n}{n}.

Since any path of integration between any two complex numbers z_1 and z_2 is approximated arbitrarily closely by a succession of short constant-\rho and constant-\phi segments, it follows generally that

    \int_{z_1}^{z_2} z^{n-1}\, dz = \frac{z_2^n - z_1^n}{n}, \quad n \in \mathbb{Z}, \; n \ne 0,    (8.23)

surprisingly the same as for real z. The applied mathematician might reasonably ask, "Was (8.23) really worth the trouble? We knew that already. It's the same as for real numbers." Well, we really didn't know it before deriving it, but the point is well taken nevertheless. Notice, however, the exemption of n = 0: equation (8.23) does not hold in that case. Consider the n = 0 integral

    S_0 = \int_{z_1}^{z_2} \frac{dz}{z}.

Following the same steps as before and using (5.8) and (2.39), we find that

    \int_{\rho_1}^{\rho_2} \frac{dz}{z} = \int_{\rho_1}^{\rho_2} \frac{(d\rho + i\rho\,d\phi) e^{i\phi}}{\rho e^{i\phi}} = \int_{\rho_1}^{\rho_2} \frac{d\rho}{\rho} = \ln\frac{\rho_2}{\rho_1}.    (8.24)

This is always real-valued, but otherwise it brings no surprise. However,

    \int_{\phi_1}^{\phi_2} \frac{dz}{z} = \int_{\phi_1}^{\phi_2} \frac{(d\rho + i\rho\,d\phi) e^{i\phi}}{\rho e^{i\phi}} = i \int_{\phi_1}^{\phi_2} d\phi = i(\phi_2 - \phi_1).    (8.25)

The odd thing about this is in what happens when the contour closes a complete loop in the Argand plane about the z = 0 pole. In this case, \phi_2 = \phi_1 + 2\pi; thus S_0 = i2\pi, even though the integration ends where it begins.

Generalizing, we have that

    \oint (z - z_o)^{n-1}\, dz = 0, \quad n \ne 0;
    \oint \frac{dz}{z - z_o} = i2\pi;
    \quad n \in \mathbb{Z},    (8.26)

where as in § 7.6 the symbol \oint represents integration about a closed contour that ends where it begins, and where it is implied that the contour loops positively (counterclockwise, in the direction of increasing \phi) exactly once about the z = z_o pole.

Notice that the formula's i2\pi does not depend on the precise path of integration, but only on the fact that the path loops once positively about the pole. Notice also that nothing in the derivation of (8.23) actually requires that n be an integer, so one can write

    \int_{z_1}^{z_2} z^{a-1}\, dz = \frac{z_2^a - z_1^a}{a}, \quad a \ne 0.    (8.27)

However, (8.26) does not hold in the latter case: for nonintegral a, the closed-contour integral of (z - z_o)^{a-1} comes to zero only if the contour does not enclose the branch point at z = z_o.

For a closed contour which encloses no pole or other nonanalytic point, then, (8.27) has that \oint z^{a-1}\, dz = 0, or with the change of variable z - z_o \leftarrow z,

    \oint (z - z_o)^{a-1}\, dz = 0.

But because any analytic function can be expanded in the form f(z) = \sum_k (c_k)(z - z_o)^{a_k - 1} (which is just a Taylor series if the a_k happen to be positive integers), this means that

    \oint f(z)\, dz = 0    (8.28)

if f(z) is everywhere analytic within the contour.^16

  ^16 The careful reader will observe that (8.28)'s derivation does not explicitly handle an f(z) represented by a Taylor series with an infinite number of terms and a finite convergence domain (for example, f[z] = \ln[1-z]). However, by § 8.2 one can transpose such a series from z_o to an overlapping convergence domain about z_1. Let the contour's interior be divided into several cells, each of which is small enough to enjoy a single convergence domain, and integrate about each cell. Because the cells share boundaries within the contour's interior, each interior boundary is integrated twice, once in each direction, canceling. The original contour—each piece of which is an exterior boundary of some cell—is integrated once piecewise. The total integral is therefore the sum of the several cell integrals, each of which is zero. This is the basis on which a more rigorous proof is constructed. [48, § 1.1]
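A blunt numerical check of (8.26) and (8.28)—again an illustrative sketch, not part of the derivation—approximates the closed contour by many short chords around a unit circle centered on z_o:

    import cmath

    def loop_integral(f, zo, N=4096):
        # sum f(z) dz over a unit circle about zo, chord by chord
        pts = [zo + cmath.exp(2j * cmath.pi * k / N) for k in range(N + 1)]
        return sum(f((pts[k] + pts[k+1]) / 2) * (pts[k+1] - pts[k])
                   for k in range(N))

    zo = 0.5 + 0.25j
    print(loop_integral(lambda z: 1/(z - zo), zo))   # ~ i2*pi, per (8.26)
    print(loop_integral(lambda z: (z - zo)**3, zo))  # ~ 0, per (8.26)
    print(loop_integral(lambda z: (z - zo)**-2, zo)) # ~ 0, per (8.26)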

8.8.3 The formula

The combination of (8.26) and (8.28) is powerful. Consider the closed contour integral

    \oint \frac{f(z)}{z - z_o}\, dz,

where the contour encloses no nonanalytic point of f(z) itself but does enclose the pole of f(z)/(z - z_o) at z = z_o. If the contour were a tiny circle of infinitesimal radius about the pole, then the integrand would reduce to f(z_o)/(z - z_o), and then per (8.26),

    \oint \frac{f(z)}{z - z_o}\, dz = i2\pi f(z_o).    (8.29)

But if the contour were not an infinitesimal circle but rather the larger contour of Fig. 8.2? In this case, if the dashed detour which excludes the pole is taken, then according to (8.28) the resulting integral totals zero; but the two straight integral segments evidently cancel, and similarly as we have just reasoned, the reverse-directed integral about the tiny detour circle is -i2\pi f(z_o); so to bring the total integral to zero the integral about the main contour must be i2\pi f(z_o). Thus, (8.29) holds for any positively-directed contour which once encloses a pole and no other nonanalytic point, whether the contour be small or large. Equation (8.29) is Cauchy's integral formula.

If the contour encloses multiple poles (§§ 2.11 and 9.6.2), then by the principle of linear superposition (§ 7.3.3),

    \oint \left[ f_o(z) + \sum_k \frac{f_k(z)}{z - z_k} \right] dz = i2\pi \sum_k f_k(z_k),    (8.30)

where the f_o(z) is a regular part,^17 and where neither f_o(z) nor any of the several f_k(z) has a pole or other nonanalytic point within (or on) the contour.

  ^17 [48, § 1.1]

Figure 8.2: A Cauchy contour integral. [The Argand plane, axes \Re(z) and \Im(z); the contour encircles the pole at z = z_o, with a dashed detour by which the pole may be excluded.]

In words, (8.30) says that an integral about a closed contour in the Argand plane comes to i2\pi times the sum of the residues of the poles (if any) thus enclosed. The values f_k(z_k), which represent the strengths of the poles, are called residues. (Note however that eqn. 8.30 does not handle branch points. If there is a branch point, the contour must exclude it or the formula will not work.)

As we shall see in § 9.5, whether in the form of (8.29) or of (8.30), Cauchy's integral formula is an extremely useful result.^18

  ^18 [34, § 10.6][67][77, "Cauchy's integral formula," 14:13, 20 April 2006]

8.8.4 Enclosing a multiple pole

When a complex contour of integration encloses a double, triple or other n-fold pole, the integration can be written

    S = \oint \frac{f(z)}{(z - z_o)^{m+1}}\, dz, \quad m + 1 = n, \; m \in \mathbb{Z}, \; m \ge 0.

Expanding f(z) in a Taylor series (8.19) about z = z_o,

    S = \oint \sum_{k=0}^{\infty} \left( \left. \frac{d^k f}{dz^k} \right|_{z=z_o} \right) \frac{dz}{(k!)(z - z_o)^{m-k+1}}.

But according to (8.26), only the k = m term contributes, so

    S = \oint \left( \left. \frac{d^m f}{dz^m} \right|_{z=z_o} \right) \frac{dz}{(m!)(z - z_o)}
      = \frac{1}{m!} \left( \left. \frac{d^m f}{dz^m} \right|_{z=z_o} \right) \oint \frac{dz}{z - z_o}
      = \frac{i2\pi}{m!} \left. \frac{d^m f}{dz^m} \right|_{z=z_o},

where the integral is evaluated in the last step according to (8.26). Altogether,

    \oint \frac{f(z)}{(z - z_o)^{m+1}}\, dz = \frac{i2\pi}{m!} \left. \frac{d^m f}{dz^m} \right|_{z=z_o}, \quad m \in \mathbb{Z}, \; m \ge 0.    (8.31)

Equation (8.31) evaluates a contour integral about an n-fold pole as (8.29) does about a single pole. (When m = 0, the two equations are the same.)^19

  ^19 [45][67]
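Equation (8.31) likewise submits to a direct numerical check. The following sketch, added for illustration, integrates exp z against an m-fold pole and compares against (i2\pi/m!)\, d^m(\exp)/dz^m |_{z_o} = (i2\pi/m!) \exp z_o:

    import cmath
    from math import factorial

    def loop_integral(f, zo, N=4096):
        # sum f(z) dz over a unit circle about zo, chord by chord
        pts = [zo + cmath.exp(2j * cmath.pi * k / N) for k in range(N + 1)]
        return sum(f((pts[k] + pts[k+1]) / 2) * (pts[k+1] - pts[k])
                   for k in range(N))

    zo = 0.3 + 0.1j
    for m in range(3):
        S = loop_integral(lambda z: cmath.exp(z) / (z - zo)**(m + 1), zo)
        print(S, 2j * cmath.pi * cmath.exp(zo) / factorial(m))  # per (8.31)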

8.9 Taylor series for specific functions

With the general Taylor series formula (8.19), the derivatives of Tables 5.2 and 5.3, and the observation from (4.21) that

    \frac{d(z^a)}{dz} = a z^{a-1},

one can calculate Taylor series for many functions. For instance, expanding about z = 1,

    \ln z |_{z=1} = 0;
    \left. \frac{d}{dz} \ln z \right|_{z=1} = \left. \frac{1}{z} \right|_{z=1} = 1;
    \left. \frac{d^2}{dz^2} \ln z \right|_{z=1} = \left. \frac{-1}{z^2} \right|_{z=1} = -1;
    \left. \frac{d^3}{dz^3} \ln z \right|_{z=1} = \left. \frac{2}{z^3} \right|_{z=1} = 2;
    \vdots
    \left. \frac{d^k}{dz^k} \ln z \right|_{z=1} = \left. \frac{-(-)^k (k-1)!}{z^k} \right|_{z=1} = -(-)^k (k-1)!, \quad k > 0.

With these derivatives, the Taylor series about z = 1 is

    \ln z = \sum_{k=1}^{\infty} \left[ -(-)^k (k-1)! \right] \frac{(z-1)^k}{k!} = -\sum_{k=1}^{\infty} \frac{(1-z)^k}{k},

evidently convergent for |1 - z| < 1. (And if z lies outside the convergence domain? Several strategies are then possible. One can expand the Taylor series about a different point, but cleverer and easier is to take advantage of some convenient relationship like ln w = -ln[1/w]. Section 8.10.4 elaborates.) Using such Taylor series, one can relatively efficiently calculate actual numerical values for ln z and many other functions.

Table 8.1 lists Taylor series for a few functions of interest. All the series converge for |z| < 1; the exp z, sin z and cos z series converge for all complex z. Among the several series, the series for arctan z is computed indirectly^20 by way of Table 5.3 and (2.33):

    \arctan z = \int_0^z \frac{1}{1 + w^2}\, dw = \int_0^z \sum_{k=0}^{\infty} (-)^k w^{2k}\, dw = \sum_{k=0}^{\infty} \frac{(-)^k z^{2k+1}}{2k+1}.

  ^20 [65, § 11-7]
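Computed in Python, for instance—a sketch added for illustration, with arbitrary truncation counts:

    from math import atan, log

    def ln_series(z, terms=60):
        # ln z = -sum_{k>=1} (1-z)^k / k, good for |1 - z| < 1
        return -sum((1 - z)**k / k for k in range(1, terms))

    def arctan_series(z, terms=60):
        # arctan z = sum_k (-)^k z^(2k+1) / (2k+1), good for |z| < 1
        return sum((-1)**k * z**(2*k + 1) / (2*k + 1) for k in range(terms))

    print(ln_series(1.5), log(1.5))          # agree closely
    print(arctan_series(0.5), atan(0.5))     # agree closely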

Table 8.1: Taylor series.

    f(z) = \sum_{k=0}^{\infty} \left( \left. \frac{d^k f}{dz^k} \right|_{z=z_o} \right) \prod_{j=1}^{k} \frac{z - z_o}{j}

    (1 + z)^{a-1} = \sum_{k=0}^{\infty} \prod_{j=1}^{k} \left( \frac{a}{j} - 1 \right) z

    \exp z = \sum_{k=0}^{\infty} \prod_{j=1}^{k} \frac{z}{j} = \sum_{k=0}^{\infty} \frac{z^k}{k!}

    \sin z = \sum_{k=0}^{\infty} z \prod_{j=1}^{k} \frac{-z^2}{(2j)(2j+1)}

    \cos z = \sum_{k=0}^{\infty} \prod_{j=1}^{k} \frac{-z^2}{(2j-1)(2j)}

    \sinh z = \sum_{k=0}^{\infty} z \prod_{j=1}^{k} \frac{z^2}{(2j)(2j+1)}

    \cosh z = \sum_{k=0}^{\infty} \prod_{j=1}^{k} \frac{z^2}{(2j-1)(2j)}

    -\ln(1 - z) = \sum_{k=1}^{\infty} \frac{1}{k} \prod_{j=1}^{k} z = \sum_{k=1}^{\infty} \frac{z^k}{k}

    \arctan z = \sum_{k=0}^{\infty} \frac{1}{2k+1} z \prod_{j=1}^{k} (-z^2) = \sum_{k=0}^{\infty} \frac{(-)^k z^{2k+1}}{2k+1}

It is interesting to observe from Table 8.1 the useful first-order approximations that

    \lim_{z \to 0} \exp z = 1 + z,
    \lim_{z \to 0} \sin z = z,    (8.32)

among others.

8.10 Error bounds

One naturally cannot actually sum a Taylor series to an infinite number of terms. One must add some finite number of terms, then quit—which raises the question: how many terms are enough? How can one know that one has added adequately many terms; that the remaining terms, which constitute the tail of the series, are sufficiently insignificant? How can one set error bounds on the truncated sum?

8.10.1 Examples

Some series alternate sign. For example, from Table 8.1,

    \ln\frac{3}{2} = \ln\left(1 + \frac{1}{2}\right) = \frac{1}{(1)(2^1)} - \frac{1}{(2)(2^2)} + \frac{1}{(3)(2^3)} - \frac{1}{(4)(2^4)} + \cdots

Each term is smaller in magnitude than the last, so the true value of ln(3/2) necessarily lies between the sum of the series to n terms and the sum to n + 1 terms. The last and next partial sums bound the result. Up to but not including the fourth-order term, for instance,

    S_4 = \frac{1}{(1)(2^1)} - \frac{1}{(2)(2^2)} + \frac{1}{(3)(2^3)},
    S_4 - \frac{1}{(4)(2^4)} < \ln\frac{3}{2} < S_4.

Other series however do not alternate sign. For example,

    \ln 2 = -\ln\frac{1}{2} = -\ln\left(1 - \frac{1}{2}\right) = S_5 + R_5,
    S_5 = \frac{1}{(1)(2^1)} + \frac{1}{(2)(2^2)} + \frac{1}{(3)(2^3)} + \frac{1}{(4)(2^4)},
    R_5 = \frac{1}{(5)(2^5)} + \frac{1}{(6)(2^6)} + \cdots

The basic technique in such a case is to find a replacement series (or integral) R'_n which one can collapse analytically, each of whose terms equals or exceeds in magnitude the corresponding term of R_n. For the example, one might choose

    R_5 < R'_5 = \frac{1}{5} \sum_{k=5}^{\infty} \frac{1}{2^k} = \frac{2}{(5)(2^5)},

wherein (2.34) had been used to collapse the summation. Then,

    S_5 < \ln 2 < S_5 + R'_5.

For real 0 \le x < 1 generally,

    S_n \equiv \sum_{k=1}^{n-1} \frac{x^k}{k},
    R'_n \equiv \sum_{k=n}^{\infty} \frac{x^k}{n} = \frac{x^n}{(n)(1-x)},
    S_n < -\ln(1 - x) < S_n + R'_n.

Many variations and refinements are possible, some of which we will meet in the rest of the section, but that is the basic technique: to add several terms of the series to establish a lower bound, then to overestimate the remainder of the series to establish an upper bound. The overestimate R'_n majorizes the series' true remainder R_n. Notice that the R'_n in the example is a fairly small number, and that it would have been a lot smaller yet had we included a few more terms in S_n (for instance, n = 0x40 would have bound ln 2 tighter than the limit of a computer's typical double-type floating-point accuracy). The technique usually works well in practice for this reason.

8.10.2 Majorization

To majorize in mathematics is to be, or to replace by virtue of being, everywhere at least as great as. This is best explained by example. Consider the summation

    S = \sum_{k=1}^{\infty} \frac{1}{k^2} = 1 + \frac{1}{2^2} + \frac{1}{3^2} + \frac{1}{4^2} + \cdots

The exact value this summation totals to is unknown to us, but the summation does rather resemble the integral (refer to Table 7.1)

    I = \int_1^{\infty} \frac{dx}{x^2} = \left. -\frac{1}{x} \right|_1^{\infty} = 1.

Figure 8.3: Majorization. [The stairstep areas 1/2^2, 1/3^2, 1/4^2, \ldots lie beneath the dashed curve y = 1/x^2.]

Figure 8.3 plots S and I together as areas—or more precisely, plots S - 1 and I together as areas (the summation's first term is omitted). As the plot shows, the unknown area S - 1 cannot possibly be as great as the known area I. In symbols,

    S - 1 < I = 1,

or,

    S < 2.

The area I between the dashed curve and the x axis majorizes the area S - 1 between the stairstep curve and the x axis, because the height of the dashed curve is everywhere at least as great as that of the stairstep curve, thus guaranteeing the absolute upper limit on S. (Of course S < 2 is a very loose limit, but that isn't the point of the example. In practical calculation, one would let a computer add many terms of the series first numerically, and only then majorize the remainder. Cleverer ways to majorize the remainder of this particular series will occur to the reader, such as in representing the terms graphically—not as flat-topped rectangles—but as slant-topped trapezoids, shifted in the figure a half unit rightward.)

Majorization serves surely to bound an unknown quantity by a larger, known quantity. Reflecting, minorization^21 serves surely to bound an unknown quantity by a smaller, known quantity. The quantities in question are often integrals and/or series summations, the two of which are akin as Fig. 8.3 illustrates. The choice of whether to majorize a particular unknown quantity by an integral or by a series summation depends on the convenience of the problem at hand.

  ^21 The author does not remember ever encountering the word minorization heretofore in print, but as a reflection of majorization the word seems logical. This book at least will use the word where needed. You can use it too if you like.
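In the practical spirit just described—sum many terms numerically, then majorize the tail—a brief sketch. The tail \sum_{k>n} 1/k^2 is majorized by \int_n^{\infty} dx/x^2 = 1/n, fencing S in a known interval. (Purely for checking, the code also prints \pi^2/6, the value that deeper theory, outside this chapter's scope, assigns to this sum; it indeed lies between the bounds.)

    from math import pi

    n = 1000
    Sn = sum(1.0 / k**2 for k in range(1, n + 1))
    tail_bound = 1.0 / n          # integral of dx/x^2 from n to infinity
    print(Sn, "< S <", Sn + tail_bound)
    print(pi**2 / 6)              # lies within the fenced interval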

The series S of this subsection is interesting, incidentally. It is a harmonic series rather than a power series, because although its terms do decrease in magnitude it has no z^k factor (or seen from another point of view, it does have a z^k factor, but z = 1), and the ratio of adjacent terms' magnitudes approaches unity as k grows. Harmonic series can be hard to sum accurately, but clever majorization can help (and if majorization does not help enough, the series transformations of [chapter not yet written] can help even more).

8.10.3 Geometric majorization

Harmonic series can be hard to sum as § 8.10.2 has observed, but more common than harmonic series are true power series, easier to sum in that they include a z^k factor in each term. There is no one, ideal bound that works equally well for all power series. However, a simple, general bound which works quite adequately for most power series encountered in practice, including among many others all the Taylor series of Table 8.1, is the geometric majorization

    |\epsilon_n| < \frac{|\tau_n|}{1 - |\rho_n|}.    (8.33)

Here, \tau_n represents the power series' nth-order term (in Table 8.1's series for exp z, for example, \tau_n = z^n/[n!]). The |\rho_n| is a positive real number chosen, preferably as small as possible, such that

    \left| \frac{\tau_{k+1}}{\tau_k} \right| \le |\rho_n| \text{ for all } k \ge n,    (8.34)
    0 < |\rho_n| < 1,    (8.35)

Second.23 Here. merely by taking one or two more terms into the truncated sum.33). 22 . if the Taylor series exp z = ∞ k k=0 j=1 z = j ∞ k=0 zk k! also of Table 8. oppositely as we deﬁne it here. then n−1 − ln(1 − z) ≈ |ǫn | < k=1 |z n | /n zk . unknown) inﬁnite series sum. k . exact (but uncalculatable. ERROR BOUNDS 189 which is to say. so we have chosen |ρn | = |z|.36). more or less.8. This choice is a matter of taste. k=0 (8. and if we choose to stipulate that n + 1 > |z| .1 is truncated before the nth-order term. Given these deﬁnitions. such that each term in the series’ tail is smaller than the last by at least a factor of |ρn |. where S∞ represents the true. if22 n−1 Sn ≡ τk . one can quite conveniently let n = 1 or n = 2.22) imply the geometric majorization (8. There is no reason to use the error bound for n = 0 when. |τk+1 /τk | = [k/(k + 1)] |z| < |z| for all k ≥ n > 0.10. for example. a pair of concrete examples should serve to clarify. Some scientists and engineers—as. for example. Professional mathematicians—as. the author of [73]—seem to tend toward the ǫn ≡ S∞ − Sn of (8. 1 − |z| where ǫn is the error in the truncated sum. If the last paragraph seems abstract. 23 This particular error bound fails for n = 0. if the Taylor series − ln(1 − z) = ∞ k=1 zk k of Table 8. First.34) and (3.36) ǫn ≡ S ∞ − S n .1 is truncated before the nth-order term. but that is no ﬂaw. then (2. the authors of [55] and even this writer in earlier years—prefer to deﬁne ǫn ≡ Sn − S∞ .

|τk+1 /τk | = |z| /(k + 1). f (γ) = g[f (1/γ)] = −f (1/γ) and f (αγ) = h[f (α). The series for − ln(1 − z) and (1 + z)a−1 for instance each converge for |z| < 1 (though slowly for |z| ≈ 1). To shift the series’ expansion points per § 8.6) like these a more probably proﬁtable strategy is to ﬁnd and exploit some property of the functions to transform their arguments. k! . 24 . so we have chosen |ρn | = |z| /(n + 1). THE TAYLOR SERIES n−1 k exp z ≈ |ǫn | < k=0 j=1 k=0 n | /n! |z z = j n−1 zk .37) where g[·] and h[·. whose maximum value for all k ≥ n occurs when k = n. f (γ)] = f (α)f (γ). 8. but for nonentire functions (§ 8. the Taylor series of Table 8. f (γ)] = f (α) + f (γ). f (γ) = g[f (1/γ)] = 1/f (1/γ) and f (αγ) = h[f (α).1 tend to converge slowly for some values of z and not at all for others.4 Calculation outside the fast convergence domain Used directly. To make it seem more concrete. such as 1 − ln γ = ln . 1 − |z| /(n + 1) Here. f (γ) = g f (8. Let f (1 + ζ) be a function whose Taylor series about ζ = 0 converges for |ζ| < 1 and which obeys properties of the forms24 1 . and that the function f (1 + ζ) = (1 + z)a−1 has ζ = z. γ f (αγ) = h [f (α). f (γ)] . whereas each series diverges when asked to compute a quantity like − ln 3 or 3a−1 directly.190 then CHAPTER 8. γ 1 γ a−1 = . (1/γ)a−1 which leave the respective Taylor series to compute quantities like − ln(1/3) and (1/3)a−1 they can handle. ·] are functions we know how to compute like g[·] = −[·] This paragraph’s notation is necessarily abstract. consider that the function f (1 + ζ) = − ln(1 − z) has ζ = −z.2 is one way to seek convergence.10.

Identifying

    \frac{1}{\gamma} = 1 + \zeta, \qquad \gamma = \frac{1}{1 + \zeta}, \qquad \frac{1 - \gamma}{\gamma} = \zeta,    (8.38)

we have that

    f(\gamma) = g\left[ f\left( 1 + \frac{1 - \gamma}{\gamma} \right) \right],    (8.39)

whose convergence domain |\zeta| < 1 is |1 - \gamma|/|\gamma| < 1, which is |\gamma - 1| < |\gamma| or in other words

    \Re(\gamma) > \frac{1}{2}.    (8.40)

Although the transformation from \zeta to \gamma has not lifted the convergence limit altogether, we see that it has apparently opened the limit to a broader domain.

Though this writer knows no way to lift the convergence limit altogether that does not cause more problems than it solves, one can take advantage of the h[\cdot,\cdot] property of (8.37) to sidestep the limit, computing f(\omega) indirectly for any \omega \ne 0 by any of several tactics. One nonoptimal but entirely effective tactic is represented by the equations

    \omega \equiv i^n 2^m \gamma,
    |\Im(\gamma)| \le \Re(\gamma),
    1 \le \Re(\gamma) < 2,
    \quad m, n \in \mathbb{Z},

whereupon the formula

    f(\omega) = h[f(i^n 2^m), f(\gamma)]    (8.41)

calculates f(\omega) fast for any \omega \ne 0—provided only that we have other means to compute f(i^n 2^m).^25

  ^25 Equation (8.41) admittedly leaves open the question of how to compute f(i^n 2^m), but at least for the functions this subsection has used as examples this is not hard. For the logarithm, -\ln(i^n 2^m) = m \ln(1/2) - i n(2\pi/4). For the power, (i^n 2^m)^{a-1} = \mathrm{cis}[(n 2\pi/4)(a-1)]/[(1/2)^{a-1}]^m. The sine and cosine in the cis function are each calculated directly by Taylor series (possibly with the help of Table 3.1), as are the numbers \ln(1/2) and (1/2)^{a-1}. The number 2\pi we have not calculated yet, but will in § 8.11.
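Restricted to real, positive \omega for simplicity (so that n = 0 and only the factor 2^m is needed), the tactic of (8.41) looks like this in Python. The sketch is illustrative only; the book's full tactic also covers complex \omega via the factor i^n:

    from math import log   # imported only to check the answer

    def ln_series(gamma, terms=200):
        # ln(gamma) = -sum (1 - gamma)^k / k, for |1 - gamma| < 1;
        # slowest, per the remark below, as gamma nears the frontier
        z = 1.0 - gamma
        return -sum(z**k / k for k in range(1, terms))

    def ln_reduced(omega):
        m = 0                        # find m with omega / 2**m in [1, 2)
        while omega >= 2.0:
            omega /= 2.0; m += 1
        while omega < 1.0:
            omega *= 2.0; m -= 1
        ln2 = -ln_series(0.5)        # ln 2 = -ln(1/2), also by series
        return m * ln2 + ln_series(omega)   # h[.,.] = [.] + [.], per (8.41)

    print(ln_reduced(1000.0), log(1000.0))  # agree closely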

Notice how (8.40) fences \gamma within a comfortable zone, keeping \gamma moderately small in magnitude but never too near the \Re(\gamma) = 1/2 frontier in the Argand plane. In theory all finite \gamma rightward of the frontier let the Taylor series converge, but extreme \gamma of any kind let the series converge only slowly (and, due to compound floating-point rounding error, inaccurately) inasmuch as they imply that |\zeta| \approx 1. Besides allowing all \omega \ne 0, the tactic (8.41) also thus significantly speeds series convergence.

The method and tactic of (8.37) through (8.41) are useful in themselves and also illustrative generally. Of course, most nonentire functions lack properties of the specific kinds that (8.37) demands, but such functions may have other properties one can analogously exploit.^26 Any number of further examples and tactics of the kind will occur to the creative reader, shrinking a function's argument by some convenient means before feeding the argument to the function's Taylor series, thus causing the Taylor series to converge faster or indeed to converge at all.

  ^26 To draw another example from Table 8.1, consider that

      \arctan \omega = \alpha + \arctan \zeta,
      \zeta \equiv \frac{\omega \cos\alpha - \sin\alpha}{\omega \sin\alpha + \cos\alpha},

  where arctan \omega is interpreted as the geometrical angle the vector \hat{x} + \hat{y}\omega makes with \hat{x}. Axes are rotated per (3.7) through some angle \alpha to reduce the tangent from \omega to \zeta, where arctan \zeta is interpreted as the geometrical angle the vector \hat{x} + \hat{y}\omega = \hat{x}'(\omega \sin\alpha + \cos\alpha) + \hat{y}'(\omega \cos\alpha - \sin\alpha) makes with \hat{x}'.

8.10.5 Nonconvergent series

Variants of this section's techniques can be used to prove that a series does not converge at all. For example,

    \sum_{k=1}^{\infty} \frac{1}{k}

does not converge because

    \frac{1}{k} > \int_k^{k+1} \frac{d\tau}{\tau};

hence,

    \sum_{k=1}^{\infty} \frac{1}{k} > \sum_{k=1}^{\infty} \int_k^{k+1} \frac{d\tau}{\tau} = \int_1^{\infty} \frac{d\tau}{\tau} = \ln \infty.

8.10.6 Remarks

The study of error bounds is not a matter of rules and formulas so much as of ideas, suggestions and tactics. There is normally no such thing as an optimal error bound—with sufficient cleverness, some tighter bound can usually be discovered—but often easier and more effective than cleverness is simply to add a few extra terms into the series before truncating it (that is, to increase n a little). Perfect precision is impossible, but adequate precision is usually not hard to achieve. To eliminate the error to the 0x34-bit (sixteen-decimal place) precision of a computer's double-type floating-point representation typically requires something like 0x34 terms—if the series be wisely composed and if care be taken to keep z moderately small and reasonably distant from the edge of the series' convergence domain. Besides, few engineering applications really use much more than 0x10 bits (five decimal places) in any case. To eliminate the error entirely usually demands adding an infinite number of terms, which is impossible anyway—and since eliminating the error entirely also requires recording the sum to infinite precision, eliminating the error entirely is not normally a goal one seeks.

Occasionally nonetheless a series arises for which even adequate precision is quite hard to achieve. An infamous example is

    S = -\sum_{k=1}^{\infty} \frac{(-)^k}{\sqrt{k}} = 1 - \frac{1}{\sqrt{2}} + \frac{1}{\sqrt{3}} - \frac{1}{\sqrt{4}} + \cdots,

which obviously converges, but sum it if you can! It is not easy to do.

Before closing the section, we ought to arrest one potential agent of terminological confusion. The "error" in a series summation's error bounds is unrelated to the error of probability theory (Ch. 20). The English word "error" is thus overloaded here. A series sum converges to a definite value, and to the same value every time the series is summed; no chance is involved. It is just that we do not necessarily know exactly what that value is. What we can do, by this section's techniques or perhaps by other methods, is to establish a definite neighborhood in which the unknown value is sure to lie, and we can make that neighborhood as tight as we want, merely by including a sufficient number of terms in the sum.

The topic of series error bounds is what G. S. Brown refers to as "trick-based."^27 There is no general answer to the error-bound problem, but there are several techniques which help, some of which this section has introduced. Other techniques we shall meet later in the book as the need for them arises.

  ^27 [11]

8.11 Calculating 2\pi

The Taylor series for arctan z in Table 8.1 implies a neat way of calculating the constant 2\pi. We already know that tan(2\pi/8) = 1, or in other words that

    \arctan 1 = \frac{2\pi}{8}.

Applying the Taylor series, we have that

    2\pi = 8 \sum_{k=0}^{\infty} \frac{(-)^k}{2k+1}.    (8.42)

The series (8.42) is simple but converges extremely slowly. Much faster convergence is given by angles smaller than 2\pi/8. For example, from Table 3.2,

    \arctan \frac{\sqrt{3}-1}{\sqrt{3}+1} = \frac{2\pi}{0x18}.

Applying the Taylor series at this angle, we have that^28

    2\pi = 0x18 \sum_{k=0}^{\infty} \frac{(-)^k}{2k+1} \left( \frac{\sqrt{3}-1}{\sqrt{3}+1} \right)^{2k+1} \approx 0x6.487F.    (8.43)

  ^28 The writer is given to understand that clever mathematicians have invented subtle, still much faster-converging iterative schemes toward 2\pi. However, there is fast and there is fast: the relatively straightforward series this section gives converges to the best accuracy of your computer's floating-point register within a paltry 0x40 iterations or so—and, after all, you only need to compute the numerical value of 2\pi once. Admittedly, the writer supposes that useful lessons lurk in the clever mathematics underlying the subtle schemes, but such schemes are not covered here.
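In Python, (8.43) is a one-liner—a sketch for illustration, recalling that 0x18 = 24 and 0x40 = 64 in decimal:

    from math import pi, sqrt

    x = (sqrt(3) - 1) / (sqrt(3) + 1)
    two_pi = 24 * sum((-1)**k * x**(2*k + 1) / (2*k + 1) for k in range(64))
    print(two_pi, 2 * pi)    # agree to full double precision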

8.12 Odd and even functions

An odd function is one for which f(-z) = -f(z). Any function whose Taylor series about z_o = 0 includes only odd-order terms is an odd function. Examples of odd functions include z^3 and sin z.

An even function is one for which f(-z) = f(z). Any function whose Taylor series about z_o = 0 includes only even-order terms is an even function. Examples of even functions include z^2 and cos z.

Odd and even functions are interesting because of the symmetry they bring—the plot of a real-valued odd function being symmetric about a point, the plot of a real-valued even function being symmetric about a line. Many functions are neither odd nor even, of course, but one can always split an analytic function into two components—one odd, the other even—by the simple expedient of sorting the odd-order terms from the even-order in the function's Taylor series. Alternately,

    f(z) = f_{odd}(z) + f_{even}(z),
    f_{odd}(z) = \frac{f(z) - f(-z)}{2},
    f_{even}(z) = \frac{f(z) + f(-z)}{2},    (8.44)

the latter two lines of which are verified by substituting -z \leftarrow z and observing the definitions at the section's head of odd and even, then the first line of which is verified by adding the latter two. For example, exp z = sinh z + cosh z.

Section 18.7 will have more to say about odd and even functions.
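The split (8.44) is directly computable. A sketch, added for illustration, verifying exp z = sinh z + cosh z at an arbitrary complex point:

    import cmath

    def odd_part(f, z):  return (f(z) - f(-z)) / 2    # per (8.44)
    def even_part(f, z): return (f(z) + f(-z)) / 2    # per (8.44)

    z = 0.7 + 0.2j
    print(odd_part(cmath.exp, z), cmath.sinh(z))   # equal: exp's odd part
    print(even_part(cmath.exp, z), cmath.cosh(z))  # equal: exp's even part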

8.13 Trigonometric poles

The singularities of the trigonometric functions are single poles of residue \pm 1 or \pm i. For the circular trigonometrics, all the poles lie along the real number line; for the hyperbolic trigonometrics, along the imaginary. Specifically, of the eight trigonometric functions

    \frac{1}{\sin z}, \; \frac{1}{\cos z}, \; \frac{1}{\tan z}, \; \tan z, \; \frac{1}{\sinh z}, \; \frac{1}{\cosh z}, \; \frac{1}{\tanh z}, \; \tanh z,

the poles and their respective residues are

    \left. \frac{z - k\pi}{\sin z} \right|_{z = k\pi} = (-)^k;
    \left. \frac{z - (k - 1/2)\pi}{\cos z} \right|_{z = (k - 1/2)\pi} = (-)^k;
    \left. \frac{z - k\pi}{\tan z} \right|_{z = k\pi} = 1;
    [z - (k - 1/2)\pi] \tan z \,\big|_{z = (k - 1/2)\pi} = -1;
    \left. \frac{z - ik\pi}{\sinh z} \right|_{z = ik\pi} = (-)^k;
    \left. \frac{z - i(k - 1/2)\pi}{\cosh z} \right|_{z = i(k - 1/2)\pi} = i(-)^k;
    \left. \frac{z - ik\pi}{\tanh z} \right|_{z = ik\pi} = 1;
    [z - i(k - 1/2)\pi] \tanh z \,\big|_{z = i(k - 1/2)\pi} = 1;
    \quad k \in \mathbb{Z}.    (8.45)

To support (8.45)'s claims, we shall marshal the identities of Tables 5.1 and 5.2 plus l'Hôpital's rule (4.30). Before calculating residues and such, however, we should like to verify that the poles (8.45) lists are in fact the only poles that there are; that we have forgotten no poles. Consider for instance the function 1/\sin z = i2/(e^{iz} - e^{-iz}). This function evidently goes infinite only when e^{iz} = e^{-iz}, which is possible only for real z; but for real z, the sine function's very definition establishes the poles z = k\pi (refer to Fig. 3.1). With the observations from Table 5.1 that i \sinh z = \sin iz and \cosh z = \cos iz, similar reasoning for each of the eight trigonometrics forbids poles other than those (8.45) lists. Satisfied that we have forgotten no poles, therefore, we finally apply l'Hôpital's rule to each of the ratios

    \frac{z - k\pi}{\sin z}, \; \frac{z - (k - 1/2)\pi}{\cos z}, \; \frac{z - k\pi}{\tan z}, \; \frac{z - (k - 1/2)\pi}{1/\tan z}, \; \frac{z - ik\pi}{\sinh z}, \; \frac{z - i(k - 1/2)\pi}{\cosh z}, \; \frac{z - ik\pi}{\tanh z}, \; \frac{z - i(k - 1/2)\pi}{1/\tanh z}

to reach (8.45). Trigonometric poles evidently are special only in that a trigonometric function has an infinite number of them. The poles are ordinary, single poles, with residues, subject to Cauchy's integral formula and so on. The trigonometrics are meromorphic functions (§ 8.6) for this reason.^29

  ^29 [45]

The six simpler trigonometrics sin z, cos z, sinh z, cosh z, exp z and cis z—conspicuously excluded from this section's gang of eight—have no poles for finite z, because e^z, e^{iz}, e^z \pm e^{-z} and e^{iz} \pm e^{-iz} then likewise are finite. These simpler trigonometrics are not only meromorphic but also entire. Observe however that the inverse trigonometrics are multiple-valued and have branch points, and thus are not meromorphic at all.

8.14 The Laurent series

Any analytic function can be expanded in a Taylor series, but never about a pole or branch point of the function. Sometimes one nevertheless wants to expand at least about a pole. Consider for example expanding

    f(z) = \frac{e^{-z}}{1 - \cos z}    (8.46)

about the function's pole at z = 0. Expanding dividend and divisor separately,

    f(z) = \frac{1 - z + z^2/2 - z^3/6 + \cdots}{z^2/2 - z^4/0x18 + \cdots}
         = \frac{\sum_{j=0}^{\infty} (-)^j z^j/j!}{-\sum_{k=1}^{\infty} (-)^k z^{2k}/(2k)!}.

By long division—extracting first the 2/z^2 term and then the -2/z term, the remainder's k = 0 terms disappearing as intended at each stage—and then factoring z^2/z^2 from the division, one finds that

    f(z) = \frac{2}{z^2} - \frac{2}{z}
         + \left[ \sum_{k=0}^{\infty} \left( \frac{(2k+3)(2k+4) + (-)^k 2}{(2k+4)!}\, z^{2k} - \frac{(2k+4) + (-)^k 2}{(2k+4)!}\, z^{2k+1} \right) \right]
           \bigg/ \left[ \sum_{k=0}^{\infty} \frac{(-)^k z^{2k}}{(2k+2)!} \right].    (8.47)

One can continue dividing to extract further terms if desired, and if all the terms

    f(z) = \frac{2}{z^2} - \frac{2}{z} + \frac{7}{6} - \frac{z}{2} + \cdots

are extracted the result is the Laurent series proper,

    f(z) = \sum_{k=K}^{\infty} (a_k)(z - z_o)^k, \quad (k, K) \in \mathbb{Z}, \; K \le 0.    (8.48)

cos z This function has a nonanalytic point of a most peculiar nature at z = 0. − zo )k (8.49) (k. The fo (z − zo ) of (8. including multiple poles like the one in the example here. 8. suﬃces and may even be preferable. In either form. Consider the function f (z) = sin(1/z) .6 tell more about poles generally. TAYLOR SERIES IN 1/Z However for many purposes (as in eqn. Several other ways are possible. the Laurent series remedies this defect. K ≤ 0.8][23. Thinking ﬂexibly. 31 [34. even then. The typical trouble one has with the Taylor series is that a function’s poles and branch points limit the series’ convergence domain.14 represents one way to extend the Taylor series. c0 = 0. From an applied point of view however what interests us is the basic technique to remove the poles from an otherwise analytic function. one can often evade the trouble.15. The point is an essential singularity. Handling the pole separately. K) ∈ Z.15 Taylor series in 1/z A little imagination helps the Taylor series a lot. The ordinary Taylor series diverges at a function’s pole. unlike f (z).31 Sections 9. but calculating the Taylor coeﬃcients would take a lot of eﬀort and.30.50) where.47) the partial Laurent series f (z) = −1 k=K 199 (ak )(z − zo )k + ∞ k=0 (bk )(z ∞ k=0 (ck )(z − zo )k . The Laurent series of § 8. 8.50) is f (z)’s regular part at z = zo . (k. fo (z − zo ) is analytic at z = zo . (8. however. K ≤ 0.5] . f (z) = −1 k=K (ak )(z − zo )k + fo (z − zo ). Further rigor is left to the professionals. § 10.8. § 2. One could expand the function directly about some other point like z = 1. the resulting series would suﬀer a straitly limited 30 The professional mathematician’s treatment of the Laurent series usually begins by deﬁning an annular convergence domain (a convergence domain bounded without by a large circle and within by a small) in the Argand plane. and one cannot expand the function directly about it.5 and 9. K) ∈ Z.

2k+2 That expansion is good only for |z| < 2. Depending on the application.2). 1 − z 2 /2! + z 4 /4! − · · · which is all one needs to calculate f (z) numerically—and may be all one needs for analysis. One can expand in negative powers of z equally validly as in positive powers. All that however tries too hard. one needs no Taylor series to handle such a function (one simply does the indicated arithmetic). Neither of the section’s examples is especially interesting in itself. cos z z z −1 − z −3 /3! + z −5 /5! − · · · . any eﬀective means to ﬁnd the coeﬃcients suﬃces. Then by (8. but their point is that it often pays to think ﬂexibly in extending and applying the Taylor series. then take the Taylor series either of the whole function at once or of pieces of it separately.19) may be the canonical way to determine Taylor coeﬃcients. Suppose however that a Taylor series speciﬁcally about z = 0 were indeed needed for some reason. Note that we have computed the two series for g(z) without ever actually taking a derivative.200 CHAPTER 8. And. but for |z| > 2 we also have that g(z) = 1/z 2 1 = 2 2 (1 − 2/z) z ∞ k=0 1+k 1 2 z k = ∞ k=2 2k−2 (k − 1) . THE TAYLOR SERIES convergence domain. one can ﬁrst change variables or otherwise rewrite the function in some convenient way. zk which expands in negative rather than positive powers of z. it may suﬃce to write f (z) = This is f (z) = 1 sin w . One is not required immediately to take the Taylor series of a function as it presents itself. (z − 2)2 Most often. w≡ . g(z) = 1 1/4 = 2 (1 − z/2) 4 ∞ k=0 1+k 1 z 2 k = ∞ k=0 k+1 k z . too. . consider g(z) = 1 . As an example of a diﬀerent kind.1) and (4. though taking derivatives per (8.

. . . if N = 2 then k = (k1 . For example. kN — one for each of the independent variables. (2. . . . .. (0. With this idea in mind. (3. The idea of the Taylor series does not diﬀer where there are two or more independent variables. . 2). . is a vector with N = 3. one must account for the crossderivatives.16 The multidimensional Taylor series Equation (8. that’s neat. . . (2. then. (2. 2).16. . In this generalized sense of the word. 1). (1.8. 3). z2 ) = z1 +z1 z2 +2z2 . . . 0). k2 ) = (0. only the details are a little more complicated. THE MULTIDIMENSIONAL TAYLOR SERIES 201 8. . For 2 2 example. 2). 2). (2. (3. Each of the kn runs independently from 0 to ∞. consider the function f (z1 . . v2 = y z and v3 = z.19) has given the Taylor series for functions of a single variable. . (3. 0). • The k is a nonnegative integer vector of N counters—k1 . . (Generalized vectors of arbitrary N will ﬁgure prominently in the book from Ch. v1 = x. . k! (8. (0. (3. 3). 3). 11 onward. . z2 . which has terms z1 and 2z2 —these we understand—but also has the cross-term z1 z2 for which the relevant derivative is the cross-derivative ∂ 2 f /∂z1 ∂z2 . k2 . 1). Where two or more independent variables are involved. (1. 0).) . (1. too. and every permutation is possible.3. . . . The ˆ ˆ geometrical vector v = xx + yy + ˆz of § 3. the multidimensional Taylor series is f (z) = k ∂kf ∂zk z=zo (z − zo )k . What does it mean? • The z is a vector 32 incorporating the several independent variables z1 . (1.. zN . (0.51) Well. meaning that ∂kf ≡ ∂zk 32 ∂ kn (∂zn )kn n=1 N f. 3). 0). 1). . 1). a vector is an ordered set of N elements. • The ∂ k f /∂zk represents the kth cross-derivative of f (z). .

19) represents a function f (z).51) represents a function f (z) as accurately as the simple Taylor series (8. the multidimensional Taylor series (8. .202 • The (z − zo )k represents CHAPTER 8. THE TAYLOR SERIES N (z − zo ) ≡ • The k! represents k n=1 (zn − zon )kn . Thus within some convergence domain about z = zo . the multidimensional Taylor series (8. N k! ≡ kn !. n=1 With these deﬁnitions. and for the same reason.51) yields all the right derivatives and cross-derivatives at the expansion point z = zo .

It is hard to guess in advance which technique might work best. recognizing it to be the derivative of ln τ .2. 1 1 The notation f (τ )|z or [f (τ )]z means f (z) − f (a).1). for.1) For instance. 1 x 1 dτ = ln τ |x = ln x. a dτ (9. unfortunately. then directly writes down the solution ln τ |x . some by another.1 Integration by antiderivative The simplest way to solve an integral is just to look at it. when it comes to adding an inﬁnite number of inﬁnitesimal elements.19) implies a general technique for calculating a derivative symbolically. 1 τ One merely looks at the integrand 1/τ . 9. Some functions are best integrated by one technique.Chapter 9 Integration techniques Equation (4. how is one actually to do the sum? It turns out that there is no one general answer to this question. Refer to § 7. This chapter surveys several weapons of the intrepid mathematician’s arsenal against the integral. Its counterpart (7. recognizing its integrand to be the derivative of something already known:1 z a df dτ = f (τ )|z . a a 203 . implies a general technique only for calculating an integral numerically—and even for this purpose it is imperfect.

2 Integration by substitution Consider the integral x2 S= x1 x dx . . 9. du = 2x dx. with the change of variable u ← 1 + x2 . whose diﬀerential is (by successive steps) d(u) = d(1 + x2 ). useful variation on the antiderivative technique seems worth calling out specially here. nonobvious. One particular.3 and 9. INTEGRATION TECHNIQUES The technique by itself is pretty limited. the frequent object of other integration techniques is to transform an integral into a form to which this basic technique can be applied. for example.24) and (8.1.2.25) have that z2 z1 ρ2 dz = ln + i(φ2 − φ1 ). Besides the essential τ a−1 = d dτ τa a .1 provide several further good derivatives this antiderivative technique can use. If z = ρeiφ .204 CHAPTER 9. However. when z1 and z2 are real but negative numbers. 1 + x2 This integral is not in a form one immediately recognizes.3) This helps. z ρ1 (9. 5.2) Tables 7. (9. 5. then (8. However.

where u(τ ) and v(τ ) are functions of an independent variable τ .9. INTEGRATION BY PARTS the integral is x2 205 S = x=x1 x2 = = = = x=x1 1+x2 2 x dx u 2x dx 2u du 2u u=1+x2 1 1 ln u 2 1 ln 2 1+x2 2 u=1+x2 1 1 + x2 2 . d(uv) = u dv + v du. 1 + x2 x2 =x which indeed has the form of the integrand we started with.5 of the ﬁnal expression with respect to x2 : ∂ 1 1 + x2 2 ln ∂x2 2 1 + x2 1 = x2 =x = 1 ∂ ln 1 + x2 − ln 1 + x2 2 1 2 ∂x2 x . u dv = d(uv) − v du. we can take the derivative per § 7. Reordering terms.3. τ =a (9. b τ =a u dv = uv|b =a − τ b v du. 9. whether alone or in combination with other techniques.4) . The technique is integration by substitution. Integrating.3 Integration by parts Integration by parts is a curious but very broadly applicable technique which begins with the derivative product rule (4. It does not solve all integrals but it does solve many.25). 1 + x2 1 To check the result.

Any convenient v suﬃces. (9. Letting u ← τ. We can begin by integrating the cos ατ dτ part. For an example of the rule’s operation. dv ← cos ατ dτ. not just for C = 0. However. we choose v = (sin ατ )/α. The new integral may or may not be easier to integrate than was the original u dv. 3 [48] . sin ατ v= . INTEGRATION TECHNIQUES Equation (9.5) The careful reader will observe that v = (sin ατ )/α + C matches the chosen dv for any value of C. Unsure how to integrate this.206 CHAPTER 9. we ﬁnd that2 du = dτ. The technique does not just integrate each part of an integral separately. S(x) = τ sin ατ α x 0 x − 0 sin ατ x dτ = sin αx + cos αx − 1. consider the deﬁnite integral3 Γ(z) ≡ 2 ∞ 0 e−τ τ z−1 dτ. then. The virtue of the technique lies in that one often can ﬁnd a part dv which does yield an easier v du. The technique is powerful for this reason. α According to (9. nothing in the integration by parts technique requires us to consider all possible v.4). In this case. consider the integral x S(x) = 0 τ cos ατ dτ. α α Though integration by parts is a powerful technique. upon which it rewards the mathematician only with a whole new integral v du.4) is the rule of integration by parts. we can begin by integrating part of it. What it does is to integrate one part of an integral separately—whichever part one has chosen to identify as dv—while contrarily diﬀerentiating the other part u. For another kind of example of the rule’s operation. ℜ(z) > 0. This is true. one should understand clearly what it does and does not do. It isn’t that simple.

(9. called the gamma function. 207 dv ← τ z−1 dτ. we evidently have that du = −e−τ dτ.4 Integration by unknown coeﬃcients One of the more powerful integration techniques is relatively inelegant. 9. z Substituting these according to (9. it follows by induction on (9. z Γ(z + 1) = zΓ(z). Integration by parts has made this ﬁnding possible.5) yields Γ(z) = e−τ τz z ∞ = [0 − 0] + = When written τ =0 ∞ 0 − ∞ 0 τz z −e−τ dτ τ z −τ e dτ z Γ(z + 1) .9. ℜ(z) > 0. yet it easily cracks some integrals that give other techniques trouble. INTEGRATION BY UNKNOWN COEFFICIENTS Letting u ← e−τ .6) this is an interesting result.7) Thus (9.5) Γ(1) = 0 ∞ e−τ dτ = −e−τ ∞ 0 = 1. . It is best illustrated by example. (9.1) plus intelligent guessing. The technique is the method of unknown coeﬃcients.4.6) that (n − 1)! = Γ(n).4) into (9. τz v= . Since per (9. can be taken as an extended deﬁnition of the factorial (z − 1)! for all z. and it is based on the antiderivative (9.5).

where I is the loan’s interest rate and P is the borrower’s payment rate. at least) by law or custom actually use a needlessly more complicated formula—and not only more complicated.9) The same technique solves diﬀerential equations. dρ with which one can solve the integral. If it is desired to ﬁnd the correct payment rate P which pays Real banks (in the author’s country. Using this value for a. but mathematically slightly incorrect.208 CHAPTER 9. one can write the speciﬁc antiderivative e−(ρ/σ) 2 /2 ρ= d 2 −σ 2 e−(ρ/σ) /2 . too. INTEGRATION TECHNIQUES Consider the integral (which arises in probability theory) x S(x) = 0 e−(ρ/σ) 2 /2 ρ dρ.10) which conceptually represents4 the changing balance x of a bank loan account over time t. σ implying that a = −σ 2 (evidently the guess is right. x|t=T = 0. but see: if the guess were right. one has no guarantee that the guess is right. such as e−(ρ/σ) 2 /2 ρ= d −(ρ/σ)2 /2 ae . (9. Having guessed. concluding that S(x) = −σ 2 e−(ρ/σ) 2 /2 x 0 = σ2 1 − e−(x/σ) 2 /2 . then the antiderivative would have the form e−(ρ/σ) 2 /2 ρ = d −(ρ/σ)2 /2 ae dρ aρ 2 = − 2 e−(ρ/σ) /2 . Consider for example the diﬀerential equation dx = (Ix − P ) dt.8) If one does not know how to solve the integral in a more elegant way. 4 . (9. (9. too. after all). x|t=0 = xo . dρ where the a is an unknown coeﬃcient. one can guess a likely-seeming antiderivative form.

The guess’ derivative is dx = αAeαt dt. Substituting the last two equations into (9. 0 = IB − P. I I and for the latter condition. INTEGRATION BY UNKNOWN COEFFICIENTS 209 the loan oﬀ in the time T . where α. For the former condition. which at least is satisﬁed if both of the equations αAeαt = IAeαt . Evidently good choices for α and B.11) is P P xo = Ae(I)(0) + =A+ . are satisﬁed.4.11) I to (9. (9.10). I Solving the last two equations simultaneously.9.10) and dividing by dt yields αAeαt = IAeαt + IB − P. then (perhaps after some bad guesses) we guess the form x(t) = Aeαt + B. 0 = AeIT + P . I Substituting these coeﬃcients into the x(t) equation above yields the general solution P x(t) = AeIt + (9. P B= . then. 1 − e−IT A= (9. we establish by applying the given boundary conditions x|t=0 = xo and x|t=T = 0. we have that −e−IT xo . 1 − e−IT Ixo P = . A and B are unknown coeﬃcients. are α = I. The constants A and P .12) .

§ 1. The trouble. The virtue of the method of unknown coeﬃcients lies in that it permits one to try an entire family of candidate solutions at once. with the payment rate P required of the borrower given by (9. too—and it has surprise value (for some reason people seem not to expect it). then Cauchy’s integral formula (8. If the outer circle in the ﬁgure is of inﬁnite 5 [48. no evident factoring into parts. but it is pretty powerful. with the contour enclosing the pole at z = −1 but shutting out the branch point at z = 0.210 CHAPTER 9. Such are the kinds of problems the method can solve. with the family members distinguished by the values of the coeﬃcients.13) to (9.1.2] . 9.29) has that integrating once counterclockwise about a closed complex contour. from the inelegant to the sublime. if one writes the same integral using the complex variable z in place of the real variable τ . Consider the deﬁnite integral5 S= 0 ∞ τa dτ. Slightly inelegant the method may be. but there is a way.10) meeting the boundary conditions. If a solution exists anywhere in the family. Observing that τ is only a dummy integration variable. INTEGRATION TECHNIQUES Applying these to the general solution (9. of course.5 Integration by closed contour We pass now from the elephant to the falcon. is that the integral S does not go about a closed complex contour. τ +1 This is a hard integral. One can however construct a closed complex contour I of which S is a part. The method of unknown coeﬃcients is an elephant. The integrand has a pole at τ = −1. No obvious substitution. yields I= za dz = i2πz a |z=−1 = i2π ei2π/2 z+1 a = i2πei2πa/2 . the method usually ﬁnds it.12). as in Fig 9.11) yields the speciﬁc solution x(t) = xo 1 − e(I)(t−T ) 1 − e−IT (9. −1 < a < 0. seems to solve the integral.

5. Even though z itself takes on the same values along I3 as along I1 . The integrand has a branch point at z = 0. Indeed. but besides being incorrect this defeats the purpose of the closed contour technique. The ﬁgure tempts one to make the mistake of writing that I1 = S = −I3 . the multiple-valued integrand z a /(z + 1) does not. in passing from I3 through I4 to I1 . then the closed contour I is composed of the four parts I = I1 + I2 + I3 + I4 = (I1 + I3 ) + I2 + I4 . the two parts I1 + I3 = 0 do not cancel. More subtlety is needed. One must take care to interpret the four parts correctly. INTEGRATION BY CLOSED CONTOUR Figure 9. ℑ(z) 211 I2 z = −1 I4 I1 I3 ℜ(z) radius and the inner. ρ+1 . in fact. so. of inﬁnitesimal.9. ρ+1 ρa dρ = ei2πa S.1: Integration by closed contour. I1 = 0 ∞ ∞ 0 −I3 = (ρei0 )a dρ = (ρei0 ) + 1 (ρei2π )a dρ = ei2πa (ρei2π ) + 1 ∞ 0 ∞ 0 ρa dρ = S. the contour has circled. The integrand z a /(z + 1) is multiple-valued. which.

1 + a cos θ So astonishing is the result. INTEGRATION TECHNIQUES I = I1 + I2 + I3 + I4 = (I1 + I3 ) + I2 + I4 = (1 − ei2πa )S + lim 2π ρ→∞ φ=0 2π za dz − lim ρ→0 z+1 z a−1 dz − lim − lim ρ→0 2π φ=0 2π za dz z+1 = (1 − ei2πa )S + lim = (1 − ei2πa )S + lim ρ→∞ φ=0 a 2π ρ→∞ z a φ=0 z a+1 ρ→0 φ=0 a+1 2π φ=0 z a dz . CHAPTER 9.14) an astonishing result.29). But by Cauchy’s integral formula we have already found an expression for I. S = − sin(2πa/2) That is. 7 [45] . Substituting this expression into the last equation yields. ℑ(a) = 0. τ +1 sin(2πa/2) (9. the second limit vanishes.1) conﬁrms the result. S = i2πei2πa/2 .212 Therefore. that one is unlikely to believe it at ﬁrst encounter. Since a < 0. leaving I = (1 − ei2πa )S. the ﬁrst limit vanishes. and because a > −1. e − ei2πa/2 2π/2 . by successive steps. 0 ∞ τa 2π/2 dτ = − . too. Such results vindicate the eﬀort we have spent in deriving Cauchy’s integral formula (8. 1 − ei2πa i2π S = −i2πa/2 . |ℜ(a)| < 1. straightforward (though computationally highly ineﬃcient) numerical integration per (7. −1 < a < 0. i2πei2πa/2 = (1 − ei2πa )S. However.6 Another example7 is 2π T = 0 6 dθ . as the interested reader and his computer can check.

here again the contour is not closed. meaning that one of the two poles lies within the contour and one lies without. One thus obtains the equivalent integral T = dz/iz i2 dz =− 2 + 2z/a + 1 1 + (a/2)(z + 1/z) a z dz i2 = − . The previous example closed the contour by extending it.9. The integrand evidently has poles at −1 ± √ a 1 − a2 z= . INTEGRATION BY CLOSED CONTOUR 213 As in the previous example. as is .5. whose magnitudes are such that √ 2 − a2 ∓ 2 1 − a2 . In this example there is no branch point to exclude. √ √ a z − −1 + 1 − a2 /a z − −1 − 1 − a2 /a whose contour is the unit circle in the Argand plane. begins and ends the integration at the same point. one changes the variable z ← eiθ and takes advantage of the fact that z. nor need one extend the contour. Rather. |z| = a2 2 One of the two magnitudes is less than unity and one is greater. unlike θ. excluding the branch point.

√ 2 + 2 1 − a2 .29). this seems equally unreasonable to the writer—but work it does. The robustness of the technique lies in that any contour of any shape will work. integration by closed contour. 1 − a2 √ 1 − a2 √ − 1 − a2 √ 1 − 1 − a2 √ 2 − 2 1 − a2 √ 2 − a2 − 2 1 − a2 √ 2 − a2 − 2 1 − a2 a2 . (−a2 )(0) > (−a2 )(1 − a2 ). . 8 These steps are perhaps best read from bottom to top. 1 − a2 √ z=(−1+ 1−a2 )/a Observe that by means of a complex variable of integration. nevertheless. √ 2 − a2 + 2 1 − a2 . so long as the contour encloses appropriate poles in the Argand domain plane while shutting branch points out.214 CHAPTER 9. It is a great technique. integrating about the pole within the contour yields T = i2π −i2/a √ z − −1 − 1 − a2 /a =√ 2π . If it seems unreasonable to the reader to expect so ﬂamboyant a technique actually to work. 1 − a2 > 1 − 2a2 + a4 . < < < < a2 1 − a2 2 0 > −a2 + a4 . √ < −(1 − a2 ) < < < < < 2a2 a2 1 1 − a2 . The technique. is found in practice to solve many integrals other techniques ﬁnd almost impossible to crack. The key to making the technique work lies in closing a contour one knows how to treat. 6’s footnote 16. See Ch. √ 2 − a2 + 2 1 − a2 . each example has indirectly evaluated an integral whose integrand is purely real. 0 < 1 − a2 . INTEGRATION TECHNIQUES seen by the successive steps8 a2 < 1. a2 Per Cauchy’s integral formula (8. > > 1 − a2 . √ 1 + 1 − a2 .

1 Partial-fraction expansion f (z) = −4 5 + .12] Terminology (you probably knew this already): A fraction is the ratio of two numbers or expressions B/A. using (9. In the fraction. B is the numerator and A is the denominator. m. ℓ. 0 0 f (τ ) dτ −1 = −1 = [−4 ln(1 − τ ) + 5 ln(2 − τ )]0 . It introduces the expansion itself ﬁrst. 9.4. INTEGRATION BY PARTIAL-FRACTION EXPANSION The extension z2 z1 z2 215 f (z) dz ≤ z1 |f (z) dz| (9. as in § 17. N ∈ Z. −1 −4 dτ + τ −1 0 −1 5 dτ τ −2 The trouble is that one is not always given the function in the amenable form. p. n. M. k.9.16) [56.6 Integration by partial-fraction expansion This section treats integration by partial-fraction expansion. §§ 2. Appendix F][34. the former is probably the more amenable to analysis.3). z−1 z−2 Consider the function Combining the two fractions over a common denominator10 yields f (z) = z+3 .6. p(·) . For example.9 Throughout the section. j.5. j ′ .15) of the complex triangle sum inequality (3. The quotient is Q = B/A. (z − 1)(z − 2) Of the two forms.6. 10 .7 and 10.22) from the discrete to the continuous case sometimes proves useful in evaluating integrals by this section’s technique. 9. Given a rational function f (z) = 9 N −1 k k=0 bk z N j=1 (z − αj ) (9.

17) z − αk k=1 where multiplying each fraction of (9. the partial-fraction expansion has the form N Ak f (z) = .16).6.17) gives the ratio 1= N −1 k k=0 bk z N j=1 (z N k=1 − αj ) Ak . That is.1 is that it cannot directly handle repeated poles. The conventional way to expand a fraction with repeated poles is presented in § 9.18) ﬁnds an uncanceled pole remaining in its denominator and thus fails for An = Aj (it still works for the other Am ). Hence. z − αk In the immediate neighborhood of z = αm . we have that Am = N −1 k k=0 bk z N j=1 (z − αj ) /(z − αm ) = lim [(z − αm )f (z)] .16) by (9. z→αm z=αm (9.18) where Am . Equations (9. Dividing (9. 1 = lim N −1 k k=0 bk z N j=1 (z z→αm − αj ) Am . yielding (9. n = j. z − αm Rearranging factors. then the residue formula (9. if αn = αj . 9.17) by N j=1 (z N j=1 (z − αj ) /(z − αk ) − αj ) /(z − αk ) puts the several fractions over a common denominator. the mth term Am /(z − αm ) dominates the summation of (9. the value of f (z) with the pole canceled. (9.17) and (9.5 below. but because at least to this writer that way does not lend much applied . is called the residue of f (z) at the pole z = αm .18) together give the partial-fraction expansion of (9.6.6.2 Repeated poles The weakness of the partial-fraction expansion of § 9.216 CHAPTER 9.16)’s rational function f (z). INTEGRATION TECHNIQUES in which no two of the several poles αj are the same.17).

the present subsection treats the matter in a diﬀerent way. Consider the function g(z) = N −1 k=0 Cei2πk/N . N −1 C ei2πk/N g(z) = z k=0 ∞ j=0 ǫei2πk/N z j = C N −1 ∞ k=0 j=1 ǫj−1 ei2πjk/N zj N −1 k=0 = C But11 ∞ j=1 ǫj−1 zj ei2πj/N k . z mN m=1 For |z| ≫ ǫ—that is. so g(z) = N C ǫmN −1 . Do the same for j = 2 then j = 8. . then for N = 8 and j = 3 plot the several (ei2πj/N )k in the Argand plane. |z| ≫ ǫ.6. in the other cases they cancel. g(z) = C z N −1 k=0 ei2πk/N . z − ǫei2πk/N (9. N > 1. g(z) ≈ N C 11 ǫN −1 . This function evidently has a small circle of poles in the Argand plane at αk = ǫei2πk/N . Only in the j = 8 case do the terms add coherently. 1 − (ǫei2πk/N )/z Using (2.19) where C is a real-valued constant. This eﬀect—reinforcing when j = nN .1 will formally introduce later in the book. Here. INTEGRATION BY PARTIAL-FRACTION EXPANSION 217 insight. zN If you don’t see why. otherwise. Factoring. 0 < ǫ ≪ 1. which § 17. we separate the poles.34) to expand the fraction. Hence. except in the immediate neighborhood of the small circle of poles—the ﬁrst term of the summation dominates. N −1 k=0 ei2πj/N k = N 0 ∞ if j = mN.9. canceling otherwise—is a classic manifestation of Parseval’s principle.

zN g(z) ≈ But given the chosen value of C. we have that 1 1 = lim N ǫ→0 N ǫN −1 (z − zo ) N −1 k=0 ei2πk/N .20) The signiﬁcance of (9. N ǫN −1 then 1 . |z| ≫ ǫ. The poles are close together but very strong.218 CHAPTER 9.20) is that it lets one replace an N -fold pole with a small circle of ordinary poles. which per § 9. if we strategically choose C= 1 .19) is g(z) = 1 N ǫN −1 N −1 k=0 ei2πk/N . N > 1. 0 < ǫ ≪ 1.6. z − zo + ǫei2πk/N (9. z − ǫei2πk/N Joining the last two equations together. (9. INTEGRATION TECHNIQUES Having achieved this approximation. changing z − zo ← z. Notice incidentally that 1/N ǫN −1 is a large number not a small. N > 1. and writing more formally. .1 we already know how to handle.

the pole hiding under dominant. . Notice how the calculation has discovered an additional. if needed. double pole there. INTEGRATION BY PARTIAL-FRACTION EXPANSION An example to illustrate the technique. (9.20)—to expand the function into a sum of partial fractions. 9.6.3 Integrating a rational function If one can ﬁnd the poles of a rational function of the form (9.17) and (9.18)—and. then one can use (9. single pole at z = 1.16). separating a double pole: z2 − z + 6 (z − 1)2 (z + 2) 219 f (z) = z2 − z + 6 ǫ→0 (z − [1 + ǫei2π(0)/2 ])(z − [1 + ǫei2π(1)/2 ])(z + 2) z2 − z + 6 = lim ǫ→0 (z − [1 + ǫ])(z − [1 − ǫ])(z + 2) 1 z2 − z + 6 = lim ǫ→0 z − [1 + ǫ] (z − [1 − ǫ])(z + 2) z=1+ǫ = lim + + = lim 1 z − [1 − ǫ] 1 z+2 z2 − z + 6 (z − [1 + ǫ])(z + 2) z2 z=1−ǫ ǫ→0 = lim ǫ→0 = lim ǫ→0 6+ǫ 1 z − [1 + ǫ] 6ǫ + 2ǫ2 1 6−ǫ + z − [1 − ǫ] −6ǫ + 2ǫ2 1 0xC + z+2 9 −1/ǫ − 1/6 4/3 1/ǫ − 1/6 + + z − [1 + ǫ] z − [1 − ǫ] z+2 1/ǫ −1/ǫ −1/3 4/3 + + + z − [1 + ǫ] z − [1 − ǫ] z − 1 z + 2 −z+6 (z − [1 + ǫ])(z − [1 − ǫ]) z=−2 . each of which one can integrate individually.6.9.

(Notice incidentally how much easier it is symbolically to diﬀerentiate than to integrate!) Section 18. conﬁrming the result. 18’s Laplace transform to solve a linear diﬀerential equation.220 CHAPTER 9. x f (τ ) dτ 0 = = = = = = = = = τ2 − τ + 6 dτ 2 0 (τ − 1) (τ + 2) x −1/ǫ −1/3 4/3 1/ǫ lim + + + dτ ǫ→0 0 τ − [1 + ǫ] τ − [1 − ǫ] τ − 1 τ + 2 1 1 lim ln([1 + ǫ] − τ ) − ln([1 − ǫ] − τ ) ǫ→0 ǫ ǫ x 4 1 − ln(1 − τ ) + ln(τ + 2) 3 3 0 x 1 [1 + ǫ] − τ 4 1 lim ln − ln(1 − τ ) + ln(τ + 2) ǫ→0 ǫ [1 − ǫ] − τ 3 3 0 x [1 − τ ] + ǫ 4 1 1 ln − ln(1 − τ ) + ln(τ + 2) lim ǫ→0 ǫ [1 − τ ] − ǫ 3 3 0 x 2ǫ 4 1 1 ln 1 + − ln(1 − τ ) + ln(τ + 2) lim ǫ→0 ǫ 1−τ 3 3 0 x 1 4 2ǫ 1 lim − ln(1 − τ ) + ln(τ + 2) ǫ→0 ǫ 1−τ 3 3 0 x 1 4 2 − ln(1 − τ ) + ln(τ + 2) lim ǫ→0 1 − τ 3 3 0 2 x+2 1 4 − 2 − ln(1 − x) + ln . INTEGRATION TECHNIQUES Continuing the example of § 9. applying it in the context of Ch. for 0 ≤ x < 1. we can take the derivative of the ﬁnal expression: d dx x+2 1 4 2 − 2 − ln(1 − x) + ln 1−x 3 3 2 x=τ −1/3 4/3 2 + + = 2 (τ − 1) τ −1 τ +2 2−τ +6 τ = . (τ − 1)2 (τ + 2) which indeed has the form of the integrand we started with.6. 1−x 3 3 2 x To check (§ 7. .7 exercises the technique in a more sophisticated way.5) that the result is correct.2.

22) is good at least for this case.9.6. (9. 9. Then. dwk [g(w)]k+1 (9. First of interest is the property that a function in the general rational form wp h0 (w) .3 aﬀords extra insight. INTEGRATION BY PARTIAL-FRACTION EXPANSION 221 9. This subsection derives it. d dn−1 Φ d wp−n+1 hn−1 (w) wp−n hn (w) = = . g(0) = 0. 0 ≤ k ≤ p.5 Repeated poles (the conventional technique) Though the technique of §§ 9.22) holds for k = n − 1. The property is proved by induction.4 The derivatives of a rational function Not only the integral of a rational function interests us.6. so (9. The conventional technique is worth learning not only because it is conventional but also because it is usually quicker to apply in practice. One needs no special technique to compute such derivatives. of course. if (9.21) Φ(w) = g(w) enjoys derivatives in the general rational form dk Φ wp−k hk (w) = . The reason is that (9.22) is (9. when 0 ≤ k < p and w = 0. hn (w) ≡ wg dw dw dn Φ dwn = which makes hn (like hn−1 ) a polynomial in nonnegative powers of w.6. (9. By induction on this basis. A related property is that dk Φ dwk =0 w=0 for 0 ≤ k < p. n dw dwn−1 dw [g(w)] [g(w)]n+1 dg dhn−1 − nwhn−1 + (p − n + 1)ghn−1 . as was to be demonstrated. .22)’s denominator is [g(w)]k+1 = 0. the function and its ﬁrst p − 1 derivatives are all zero at w = 0. it is not the conventional technique to expand in partial fractions a rational function having a repeated pole.23) That is.2 and 9. 0 < n ≤ p. but the derivatives do bring some noteworthy properties.22) holds for all 0 ≤ k ≤ p.6. its derivatives interest us.6. (9. When k = 0.22) where g and hk are polynomials in nonnegative powers of w. (9. whereas its numerator has a wp−k = 0 factor. too.21).

23) with w = z − αm reveals the double summation and its ﬁrst pm − 1 derivatives all to be null at z = αm . cannot be expanded solely in the ﬁrst-order fractions of § 9.25) lacks are the values of its several coeﬃcients Ajℓ . j=1 αj ′ = αj if j ′ = j. But (9. ℓ=0 j=m (Ajℓ )(z − αm )pm + (z − αj )pj −ℓ pm −1 ℓ=0 (Amℓ )(z − αm )ℓ .24) N ≡ pj . M .25) What the partial-fraction expansion (9. ℓ=0 j=m (Ajℓ )(z − αm )pm (z − αj )pj −ℓ z=αm = 0.1.25) by (z − αm )pm to obtain the form M (z − αm ) pm f (z) = pj −1 j=1. N and the several pj are integers. Ajℓ . (z − αj )pj −ℓ (9. one multiplies (9. To determine them with respect to the pm -fold pole at z = αm . the (z − αm )pm f (z) equation’s kth derivative reduces at that point to = z=αm pm −1 ℓ=0 dk (z − αm )pm f (z) dz k dk (Amℓ )(z − αm )ℓ dz k z=αm = k!Amk . . 0 ≤ k < pm . f (z) = N −1 k k=0 bk z M j=1 (z M − αj )pj . dk dz k M pj −1 j=1. that is. INTEGRATION TECHNIQUES A rational function with repeated poles. so. but can indeed be expanded if higherorder fractions are allowed: f (z) = M pj −1 j=1 ℓ=0 pj ≥ 0. k.222 CHAPTER 9. 0 ≤ k < pm .6. where j. One can determine the coeﬃcients with respect to one (possibly repeated) pole at a time. (9. 1 ≤ m ≤ M .

26) means. (9. 9. and some of the coeﬃcients come from the one.6. Maybe there exist two distinct expansions. if only a short way. On the other hand.6.6 The existence and uniqueness of solutions Equation (9.5’s logic discovers no guarantee that all of (9. A careful review of § 9.9.25)’s partial fractions. Uniqueness is proved by positing two solutions f (z) = M pj −1 j=1 ℓ=0 Ajℓ = (z − αj )pj −ℓ M pj −1 j=1 ℓ=0 Bjℓ (z − αj )pj −ℓ and computing the diﬀerence M pj −1 j=1 ℓ=0 Bjℓ − Ajℓ (z − αj )pj −ℓ between them. but on this occasion let us nonetheless follow the professional’s line of reasoning.26)’s coeﬃcients actually come from the same expansion. otherwise what we have found is a phantom. “Why should we prove that a solution exists. some from the other. in which event it is not even clear what (9. yes. This however . but the professional’s point is that we have found the solution only if in fact it does exist.6. these coeﬃcients evidently depend not only on the residual function itself but also on its several derivatives. “But these are quibbles.26) has solved (9. Logically this diﬀerence must be zero for all z if the two solutions are actually to represent the same function f (z).” Well.26) to weight the expansion (9. maybe there exists no expansion at all. one derivative per repetition of the pole. once we have actually found it?” Ah. z=αj 0 ≤ ℓ < p. In case of a repeated pole.25). INTEGRATION BY PARTIAL-FRACTION EXPANSION 223 Changing j ← m and ℓ ← k and solving for Ajℓ then produces the coeﬃcients Ajℓ = 1 ℓ! dℓ (z − αj )pj f (z) dz ℓ . Comes from us the reply. and uniquely. “The present book is a book of applied mathematics.24) and (9. cavils and nitpicks!” we are inclined to grumble. A professional mathematician might object however that it has done so without ﬁrst proving that a unique solution actually exists.

ℓ). positive coeﬃcients and f (τ ) is an arbitrary complex expression in τ . Each coeﬃcient bk is seen thereby to be a linear combination of the several Ajℓ . 9. which we have already established.224 CHAPTER 9. there always exist an alternate set of bk for which the same system has multiple solutions. the two solutions are one and the same. Therefore it is not possible for the system to have no solution—which is to say. say. b1 = A00 + A01 + A10 . one cannot. but if f (0+ ) = f (+∞). but such proofs when desired tend to ﬁt the pattern outlined here. . because each half-integral alone diverges. Nonetheless. forbids such multiple solutions in all cases. τ where a and b are real. an N × N system of N linear equations in N unknowns results—which might for example (if. splitting the integral in two is the right idea. the solution necessarily exists. b2 = 2A01 − 5A10 .24). But uniqueness. We will show in Chs. We will not often in this book prove existence and uniqueness explicitly. Therefore. provided that one ﬁrst relaxes the limits of integration as 1/ǫ S = lim ǫ→0+ ǫ f (bτ ) dτ − τ 1/ǫ ǫ f (aτ ) dτ τ . Existence comes of combining the several fractions of (9. One wants to split such an integral in two as [f (bτ )/τ ] dτ − [f (aτ )/τ ] dτ . INTEGRATION TECHNIQUES is seen to be possible only if Bjℓ = Ajℓ for each (j.25) over a common denominator and comparing the resulting numerator against the numerator of (9. where the combination’s weights depend solely on the locations αj and multiplicities pj of f (z)’s several poles. 11 through 14 that when such a system has no solution. N = 3) look like b0 = −2A00 + A01 + 3A10 .7 Frullani’s integral One occasionally meets an integral of the form S= 0 ∞ f (bτ ) − f (aτ ) dτ. From the N coeﬃcients bk and the N coeﬃcients Ajℓ .

13 12 .5. 0 τ a (9. [48. “Frullani’s integral”] One could write the latter product more generally as τ a−1 ln βτ . The two occur often enough to merit investigation here.9. if a and b are both real and positive. § 1.27) which.12 9.1][76.8 Integrating products of exponentials.8. ln βτ = ln β + ln τ . ∞ 0 f (bτ ) − f (aτ ) b dτ = [f (τ )]∞ ln . wherein ln β is just a constant. PRODUCTS OF EXPONENTIALS. we have split the integration as though a ≤ b.3][3. powers and logarithms The products exp(ατ )τ n (where n ∈ Z) and τ a−1 ln τ tend to arise13 among other places in integrands related to special functions ([chapter not yet written]). as is easy to verify). this is b/ǫ S = = ǫ→0+ lim f (+∞) a/ǫ dσ − f (0+ ) σ bǫ aǫ dσ σ ǫ→0+ lim f (+∞) ln bǫ b/ǫ − f (0+ ) ln a/ǫ aǫ b = [f (τ )]∞ ln . 0 a Thus we have Frullani’s integral. § 2. According to Table 2. works for any f (τ ) which has deﬁnite f (0+ ) and f (+∞). but in fact it does not matter which of a and b is the greater. however. POWERS AND LOGS Changing σ ← bτ in the left integral and σ ← aτ in the right yields b/ǫ 225 S = = = ǫ→0+ lim bǫ bǫ f (σ) dσ − σ a/ǫ aǫ f (σ) dσ σ f (σ) − f (σ) dσ + σ b/ǫ a/ǫ ǫ→0+ lim aǫ −f (σ) dσ + σ f (σ) dσ − σ a/ǫ bǫ bǫ aǫ f (σ) dσ σ b/ǫ ǫ→0+ lim a/ǫ f (σ) dσ σ (here on the face of it.5. So long as each of f (ǫ) and f (1/ǫ) approaches a constant value as ǫ vanishes.

Therefore. 73] [65. a = 0.14 (−)n−k exp(ατ )τ k . (n!/k!)αn−k+1 which demands that B = 1/a and that C = −1/a2 . 0 ≤ k ≤ n. n ∈ Z. INTEGRATION TECHNIQUES Concerning exp(ατ )τ n . = αan exp(ατ )τ n + k=0 αak + If so.15 τ a−1 ln τ = 14 15 d τa dτ a ln τ − 1 a . 0 ≤ k < n. ak = Therefore. after maybe some false tries we eventually do strike the right form exp(ατ )τ n = d dτ τ a−1 ln τ d a τ [B ln τ + C] dτ = τ a−1 [aB ln τ + (B + aC)]. = n 1 .28) a−1 ln τ is less obvious. by § 9. Appendix 2. then evidently an = ak That is.3’s observation that ln τ is of zeroth order in τ . The right form to guess for the antiderivative of τ Remembering however § 5. (9.29) [65. eqn. 74] . eqn. n ≥ 0. α ak+1 .226 CHAPTER 9. = − (k + 1)(α) 1 α n j=k+1 (−jα) = (−)n−k . Appendix 2. (n!/k!)αn−k+1 k=0 (9. α = 0.4’s method of unknown coeﬃcients we guess its antiderivative to ﬁt the form exp(ατ )τ n = = k=0 n−1 d dτ n n ak exp(ατ )τ k k=0 n αak exp(ατ )τ k + k=1 ak exp(ατ )τ k−1 k ak+1 k+1 exp(ατ )τ k .

powers and logarithms.30) seemed hard to guess nevertheless. 8 and the antiderivative of § 9. But not all. exp(ατ )τ n ln τ and so on can be computed in like manner as the need arises. τ dτ 2 (9. d dτ n k=0 τa exp(ατ )τ n = τ a−1 ln τ ln τ τ = = (−)n−k exp(ατ )τ k . but in this case with a little imagination the antiderivative is not hard to guess: d (ln τ )2 ln τ = .9 Integration by Taylor series With suﬃcient cleverness the techniques of the foregoing sections solve many.30) If (9. (9. many integrals.1: Antiderivatives of products of exponentials. as sometimes it does.1 summarizes.30).1 together oﬀer a concise.9. n ≥ 0. would yield the same (9. the Taylor series of Ch. practical way to integrate some functions.29) as a → 0.9.31) 9. When all else fails. then l’Hˆpital’s rule (4. INTEGRATION BY TAYLOR SERIES 227 Table 9. a=0 d 1 ln τ − dτ a a 2 d (ln τ ) dτ 2 Antiderivatives of terms like τ a−1 (ln τ )2 . Table 9. o applied to (9. at the price of losing the functions’ . α = 0 (n!/k!)αn−k+1 . with the observation from (2. Equation (9. n ∈ Z.29) fails when a = 0.41) that τ a = exp(a ln τ ).30).

It works just the same. it is just a series. x 0 exp − τ2 2 dτ = = = x ∞ 0 k=0 x ∞ 0 k=0 ∞ k=0 (−τ 2 /2)k dτ k! (−)k τ 2k dτ 2k k! x (−)k τ 2k+1 (2k + 1)2k k! 0 ∞ k=0 = ∞ k=0 (−)k x2k+1 (2k + 1)2k k! = (x) 1 2k + 1 k j=1 −x2 . or calculate its value. For example. or take its derivative τ2 d myf τ = exp − dτ 2 . 2j exp − τ2 2 dτ = myf x. The myf z is no less a function than sin z is. . a special-purpose technique of integration by cylindrical transformation will surface in § 18. INTEGRATION TECHNIQUES known closed analytic forms.228 CHAPTER 9. After all. You can plot the function. however. 0 ∞ k=0 (−)k z 2k+1 = (z) (2k + 1)2k k! x ∞ k=0 1 2k + 1 k j=1 −z 2 . The series above converges just as accurately and just as fast. or do with it whatever else one does with functions.1 is used to calculate sin z.5. Beyond the several integration techniques this chapter has introduced. 2j The result is no function one recognizes. it’s just a function you hadn’t heard of before. too. when a Taylor series from Table 8. This is not necessarily bad. then sin z is just a series. Sometimes it helps to give the series a name like myf z ≡ Then.

hefting the weighty topic of the matrix. One would prefer an actual formula to extract the roots. but that is an approximate iteration rather than an exact formula like (2. we shall refresh ourselves with an interesting but lighter mathematical topic: the topic of cubics and quartics.Chapter 10 Cubics and quartics Under the heat of noonday. Chapters 2 through 9 have established the applied mathematical foundations upon which coming chapters will build. will indeed begin to build on those foundations. But in this short chapter which rests between. The roots of higher-order polynomials.2). between the hard work of the morning and the heavy lifting of the afternoon. one likes to lay down one’s burden and rest a spell in the shade. 11. which though not plain to see the quadratic formula (2.1 However. So much algebra has been known since antiquity. No general formula to extract the roots of the nth-order polynomial seems to be known. 1 Refer to Ch. The quadratic polynomial z 2 + a1 z + a0 has of course two roots. the lone root z = −a0 of which is plain to see. The expression z + a0 is a linear polynomial.2) extracts with little eﬀort. the Newton-Raphson iteration (4. and Ch. 6’s footnote 9. and as we have seen in § 4. 229 .31) locates swiftly. to extract the roots of the cubic and quartic polynomials z 3 + a2 z 2 + a1 z + a0 .8 it can occasionally fail to converge. z 4 + a3 z 3 + a2 z 2 + a1 z + a0 .

CUBICS AND QUARTICS though the ancients never discovered how. z ≈ wo /w. 1 ← z. w (10.” 00:26. The 16thcentury algebraists Ferrari. and so forth. 31 Oct. § 1. formulas do exist. Section 10. 9 Nov. we have that z ≈ w. inasmuch as exp[−φ] = 1/ exp φ. 2006][77.1) is Vieta’s transform. 10.1 Vieta’s transform There is a sense to numbers by which 1/2 resembles 2.3) 2 [76. but as |w| approaches |wo | this 2 ceases to be true. To capture this sense. 2006][77. w+ An interesting alternative to Vieta’s transform is w 2 wo ← z.2 shows how Vieta’s transform can be used.1) w Equation (10.20) of the cosh(·) function. 1/3 resembles 3. 1 Nov.4 For |w| ≫ |wo |. “Quartic equation.3 might be named Vieta’s parallel transform. For |w| ≪ |wo |.230 CHAPTER 10.2) which in light of § 6. This chapter explains.” 05:17. in the neighborhood of which w transitions from the one domain to the other. 1/4 resembles 4. “Quartic equation”][77.5] 3 This change of variable broadly recalls the sum-of-exponentials form (5. “Gerolamo Cardano. so one begins by changing the variable x+h←z (10. c e 2006][69. Figure 10. Vieta.” [76. one can transform a function f (z) into a function f (w) by the change of variable3 w+ or. (10. Tartaglia and Cardano have given us the clever technique. The constant wo is the corner value. w 2 wo ← z.2 10.2 Cubics The general cubic polynomial is too hard to extract the roots of directly. more generally. “Fran¸ois Vi`te.” 22:35. “Vieta’s substitution”] .1 plots Vieta’s transform for real w in the case wo = 1. “Cubic equation”][76. 4 Also called “Vieta’s substitution.

3 (10.10. then the . are the cubic polynomial’s three roots.4) . plotted logarithmically. CUBICS 231 Figure 10. If one could strike the px term as well. 3 a1 a2 a2 q ≡ −a0 + −2 3 3 a2 a1 a2 a2 2 +2 x + a0 − 3 3 3 3 (10. The solutions to the equation x3 = px + q.6) then.1: Vieta’s transform (10.1) for wo = 1. where p ≡ −a1 + a2 2 . That was the easy part. (10.5) . So we have struck the a2 z 2 term. ln z ln w to obtain the polynomial x3 + (a2 + 3h)x2 + (a1 + 2ha2 + 3h2 )x + (a0 + ha1 + h2 a2 + h3 ).2. what to do next is not so obvious. The choice a2 3 casts the polynomial into the improved form h≡− x 3 + a1 − or better yet x3 − px − q.

but no very simple substitution like (10. That way is no good. such a substitution does achieve it. we have the quadratic equation (w3 )2 = 2 p q w3 − 2 3 3 . before unveiling (10.3.8) which invites the choice 2 wo ≡ p . To deﬁne inscrutable coeﬃcients unnecessarily before the need for them is apparent seems poor applied mathematical style.2) we know how to solve. 3 (10.10) seems to imply six roots.7) (10.11) Why did we not deﬁne P and Q so to begin with? Well. Vieta’s transform has reduced the original cubic to a quadratic. we lacked motivation to do so. double the three the fundamental theorem of algebra (§ 6. we should like to improve the notation by deﬁning5 p P ←− . w3 Multiplying by w3 and rearranging terms.6) by the change of variable w+ we get the new equation 2 2 w3 + (3wo − p)w + (3wo − p) 2 w6 wo + o = q.10) which by (2. Vieta-transforming (10. with the idea of balancing the equation between oﬀsetting w and 1/w terms. . various substitutions.232 CHAPTER 10. We shall return to this point in § 10. w w3 2 wo ←x w (10.2) allows a cubic polynomial to have.10). Lacking guidance. The careful reader will observe that (10.2.1). but at the price of reintroducing an unwanted x2 or z 2 term. 2 5 (10. (10. This works.8) to read w3 + (p/3)3 = q.9) reducing (10.3) achieves this—or rather. 3 q Q←+ . For the moment. one might try many. but after weeks or months of such frustration one might eventually discover Vieta’s transform (10. however. none of which seems to help much. CUBICS AND QUARTICS roots would follow immediately.

If .3 Superﬂuous roots As § 10. SUPERFLUOUS ROOTS 233 Table 10.2) allows a cubic polynomial to have.6) and (10. w3 ≡ Q ± Q2 + P 3 otherwise.3.4 below. the equations of Table 10. treated in §§ 10. x ≡ 0 w − P/w a2 z = x− 3 3 if P = 0 and Q = 0.2 has observed.13) = 2Qw + P .10. which three do we really need and which three can we ignore as superﬂuous? The six w naturally come in two groups of three: one group of three from the one w3 and a second from the other. we will guess—and logically it is only a guess—that a single w3 generates three distinct x and thus (because z diﬀers from x only by a constant oﬀset) all three roots z.) 0 = z 3 + a2 z 2 + a1 z + a0 a1 a2 2 P ≡ − 3 3 1 a2 a2 a1 Q ≡ −2 −a0 + 3 2 3 3 3 2Q if P = 0. double the three the fundamental theorem of algebra (§ 6. 10. 3 Table 10.12) (10. The deﬁnition x ≡ w − P/w maps two w to any one x.2.1: A method to extract the three roots of the general cubic polynomial. otherwise. 3 (10.1 seem to imply six roots.1 summarizes the complete cubic polynomial root extraction method in the revised notation—including a few ﬁne points regarding superﬂuous roots and edge cases. However. one can choose either sign. (In the deﬁnition of w3 . so in fact the equations imply only three x and thus three roots z.3 and 10.10) are written 3 2 (w ) x3 = 2Q − 3P x. what the equations really imply is not six distinct roots but six distinct w. For this reason. The question then is: of the six w. with which (10.

CUBICS AND QUARTICS the guess is right. then the second w3 cannot but yield the same three roots. Cubing6 the last equation. just as the verb to square means “to raise to the second power. Letting the symbol w1 represent the third w. by successive steps. e−i2π/3 w1 P = e−i2π/3 w1 + +i2π/3 . Let us suppose that a single w3 did generate two w which led to the same x. Squaring. Q2 + P 3 . then canceling oﬀsetting terms and factoring. e+i2π/3 w1 − P e+i2π/3 w1 P e+i2π/3 w1 + −i2π/3 e w1 P e+i2π/3 w1 + w1 .” . −P 3 = 2Q2 + P 3 ± 2Q or. 6 The verb to cube in this context means “to raise to the third power. Q2 + P 3 = ∓Q Q2 + P 3 . 6 w1 = −P 3 . 6 w1 = 2Q2 + P 3 ± 2Q 6 Combining the last two on w1 . Q4 + 2Q2 P 3 + P 6 = Q4 + Q2 P 3 . but squaring the table’s w3 deﬁnition for w = w1 . then (since all three w come from the same w3 ) the two w are e+i2π/3 w1 and e−i2π/3 w1 . e w1 P = e−i2π/3 w1 + .” as to change y to y 3 . Q2 + P 3 . which means that the second w3 is superﬂuous and can safely be overlooked. w1 = e−i2π/3 w1 − P which can only be true if 2 w1 = −P. (P 3 )(Q2 + P 3 ) = 0. rearranging terms and halving. let us suppose that it did not. But is the guess right? Does a single w3 in fact generate three distinct x? To prove that it does.234 CHAPTER 10. Because x ≡ w − P/w.

or any other convenient root3 ﬁnding technique to ﬁnd a single root w1 such that w1 = w3 .10. it matters not which. because w appears in the denominator of x’s deﬁnition. incidentally. the Taylor series of Table 8. not both. the method of Table 10. so two roots come easier. One can choose either sign. the contradiction proves false the assumption which gave rise to it. 10. as was to be demonstrated. When this happens.4 Edge cases Section 10. As a simple rule. w1 .1 gives the quadratic solution w 3 ≡ Q ± Q2 + P 3 . the three x descending from a single w3 are indeed distinct. Some cubic polynomials do meet the demand—§ 10. but e±i2π/3 = (−1 ± i 3)/2. The conclusion: either. For example. it can matter. two roots—at z = 1. that nothing prevents two actual roots of a cubic polynomial from having the same value. . but the eﬀects of these cubic edge cases seem suﬃciently nonobvious that the book might include here a few words about them. one can apply the Newton-Raphson iteration (4.1. just as it ought to do. 1 and 2. with a single root at z = 2 and a double root—that is.7 The one sign alone yields all three roots of the general cubic polynomial. EDGE CASES 235 The last equation demands rigidly that either P = 0 or P 3 = −Q2 . (10. For most cubic polynomials. Mostly the book does not worry much about edge cases. Then√ other the ±i2π/3 w . Therefore. the cubic polynomial (z − 1)(z − 1)(z − 2) = z 3 − 4z 2 + 5z − 2 has roots at 1.1 properly yields the single root once and the double root twice. then.14) 2 We should observe. when the two w3 diﬀer in magnitude one might choose the larger. This certainly is possible. Table 10.4.33). of the two signs in the table’s quadratic solution w3 ≡ Q ± Q2 + P 3 demands to be considered. and it does not mean that one of the two roots is superﬂuous or that the polynomial has fewer than three roots. The assumption: that the three x descending from a single w3 were not distinct.4 will treat these and the reader is asked to set them aside for the moment—but most cubic polynomials do not meet it. provided that P = 0 and P 3 = −Q2 . They are e 1 √ −1 ± i 3 w = w1 . In calculating the three w from w3 . if for no other reason than to oﬀer the reader a model of how to think about edge cases on his own. 7 Numerically.3 excepts the edge cases P = 0 and P 3 = −Q2 .

In the edge case P 3 = −Q2 . according to (10. one of which is w3 = 0. x3 = 2Q − 3P x. For this reason. w3 = 2Q or 0. we shall consider ﬁrst the edge cases themselves.236 CHAPTER 10. In the edge case P 3 = −Q2 . which is awkward in light of Table 10. In this section. We address this edge in the spirit of l’Hˆpital’s o rule. w3 = Q.1’s deﬁnition that x ≡ w − P/w. both w3 are the same. arises where the two edges meet— where P = 0 and P 3 = −Q2 . (2Q)1/3 ǫ ǫ x = w− =− + (2Q)1/3 = (2Q)1/3 . The edge case P 3 = −Q2 gives only the one quadratic solution w3 = Q.3. w (2Q)1/3 . by sidestepping it. This is ﬁne. in applying the table’s method when P = 0. gives two distinct quadratic solutions w3 . no other x being possible. The double edge case. or equivalently where P = 0 and Q = 0. Then. w3 = Q + Q = 2Q. One of the two however is w3 = Q − Q = 0. Section 10. or more precisely.12). 2Q ǫ . which in this case means that x3 = 0 and thus that x = 0 absolutely. choosing the − sign in the deﬁnition of w3 . changing P inﬁnitesimally from P = 0 to P = ǫ. then their eﬀect on the proof of § 10. and does not worry about choosing one w3 over the other. which puts an awkward 0/0 in the table’s deﬁnition of x. like the general non-edge case. One merely accepts that w3 = Q.3 generally ﬁnds it suﬃcient to consider either of the two signs. so the one w3 suﬃces by default because the other w3 brings nothing diﬀerent. However. This implies the triple root z = −a2 /3. it gives two quadratic solutions which happen to have the same value. CUBICS AND QUARTICS in which § 10. w3 = Q − w = − Q2 + ǫ3 = Q − (Q) 1 + ǫ3 2Q2 =− ǫ3 . In the edge case P = 0. The edge case P = 0. The edge case P = 0 however does give two distinct w3 .3 has excluded the edge cases from its proof of the suﬃciency of a single w3 . one chooses the other quadratic solution. the trouble is that w3 = 0 and that no alternate w3 is available. Both edge cases are interesting. Let us now add the edge cases to the proof. At the corner. or corner case.

the roots lying nicely along the Argand axes. whereas (1)1/3 = 1. 1/3 237 . and. either way. The reason the quartic is simpler to reduce is probably related to the fact that (1)1/4 = √ ±1. one begins solving the quartic by changing the variable x+h←z to obtain the equation x4 = sx2 + px + q. The details differ. strangely enough.17) . though. 10. historically Ferrari discovered it earlier [76. +3 a3 4 4 . The (1)1/4 brings a much neater result. w (2Q)1/3 Evidently the roots come out the same. w3 = Q + w = (2Q) Q2 + ǫ3 = 2Q. The kernel of the cubic technique lay in reducing the cubic to a quadratic. 6’s footnote 9. What motivated Ferrari to chase the quartic solution while the cubic solution remained still unknown.10.8 As with the cubic. 3 2 p ≡ −a1 + 2a2 q ≡ −a0 + a1 a3 a3 −8 4 4 a3 a3 − a2 4 4 (10. “Quartic equation”].22). but one supposes that it might make an interesting story. See Ch. This may also be why the quintic is intractable—but here we trespass the professional mathematician’s territory and stray from the scope of this book. 4 a3 4 2 (10. this writer does not know. we now turn our attention to the general quartic.16) s ≡ −a2 + 6 . The kernel of the quartic technique lies likewise in reducing the quartic to a cubic. in some ways the quartic reduction is actually the simpler.15) (10. ±i. which he could not solve until Tartaglia applied Vieta’s transform to it. ǫ ǫ x = w − = (2Q)1/3 − = (2Q)1/3 . This completes the proof.5.5 Quartics Having successfully extracted the roots of the general cubic polynomial. where h≡− a3 . 8 Even stranger. QUARTICS But choosing the + sign. . Apparently Ferrari discovered the quartic’s resolvent cubic (10. (−1±i 3)/2.

one supposes that a wise choice might at least render (10. As for u. but with respect to x2 rather than x.19). The right side would be.20) or.19) 2 = k2 x2 + px + j 2 .18) and (10. one must regard (10. as x2 + u where k2 ≡ 2u + s.22) [76. what choice for u would be wise? Well. rearranging terms and scaling. 2k (10. 2 8 9 (10. we have introduced it ourselves and can assign it any value we like. after distributing factors.18). The left side of that equation is a perfect square.16) further. too.19) properly. j= p . In these equations. CUBICS AND QUARTICS To reduce (10. one must be cleverer. j 2 ≡ u2 + q. even after specifying u we remain free at least to choose signs for j and k. s. we have that p2 = 4(2u + s)(u2 + q). The clever idea is to transfer some but not all of the sx2 term to the equation’s left side by x4 + 2ux2 = (2u + s)x2 + px + q.17).238 CHAPTER 10. (10. The variable u is completely free. p and q have deﬁnite values ﬁxed by (10.18) Now. then to complete the square on the equation’s left side as in § 2.2. better expressed. Ferrari9 supplies the cleverness. or. look at (10. where u remains to be chosen. but not so u. So. arbitrarily choosing the + sign. j or k. if it were that p = ±2jk. (10. And though j 2 and k2 depend on u. we propose the constraint that p = 2jk. still. that s 4sq − p2 0 = u3 + u2 + qu + .18) easier to simplify. though no choice would truly be wrong. (10. “Quartic equation”] . so.21) Squaring (10.20) and substituting for j 2 and k2 from (10.

so we can just pick one u and ignore the other two. 2 2 = kx + j . reveals the four roots of the general quartic polynomial.6 Guessing the roots It is entertaining to put pencil to paper and use Table 10.19) then gives k (again.26) improves the notation. Table 10. Equation (10.2) solves as x=± k ±o 2 k 2 2 (10.1’s method to extract the roots of the cubic polynomial 0 = [z − 1][z − i][z + i] = z 3 − z 2 + z − 1. (10. (10. and which we now specify as a second constraint.22) of course yields three u not one.10. which we know by Table 10.25).20) into (10. GUESSING THE ROOTS 239 Equation (10. 10. we can just pick one of the two signs). Using the improved notation. but the resolvent cubic is a voluntary constraint.25). j and k established.22) are both honored. ±o sign is independent of the two.1 how to solve for u. and (10. In view of (10. (10.18) to reach the form x2 + u which is x2 + u 2 = k2 x2 + 2jkx + j 2 .21) then gives j.23) The resolvent cubic (10. Equation (10. (10.6. 2 J ← j. If the constraints (10.22) is the resolvent cubic. with the other equations and deﬁnitions of this section. which (2.24) ± j − u. . then we can safely substitute (10.21) and (10. With u.2 summarizes the complete quartic polynomial root extraction method. the change of variables K← k .23) implies the quadratic x2 = ±(kx + j) − u.25) wherein the two ± signs are tied together but the third.

CUBICS AND QUARTICS Table 10. (In the table. J ≡ p/4K otherwise. where any one of the three resulting u serves. the ±o is independent but the other two are tied together. the four resulting combinations giving the four roots of the general quartic. Of the three ± signs in x’s deﬁnition.) 0 = z 4 + a3 z 3 + a2 z 2 + a1 z + a0 a3 2 s ≡ −a2 + 6 4 a3 a3 3 p ≡ −a1 + 2a2 −8 4 4 a3 2 a3 a3 − a2 +3 q ≡ −a0 + a1 4 4 4 s 2 4sq − p2 0 = u3 + u + qu + 8 √ 2 2u + s K ≡ ± 2 ± u2 + q if K = 0.2: A method to extract the four roots of the general quartic polynomial. x ≡ ±K ±o a3 z = x− 4 K2 ± J − u 4 .240 CHAPTER 10. the resolvent cubic is solved for u by the method of Table 10. Either of the two K similarly serves.1.

GUESSING THE ROOTS One ﬁnds that z = w+ w3 ≡ 1 2 − 2 . 2. ±3. a quartic. but if any of the roots happens to be real and rational. it must belong to the set 1 2 3 6 ±1. ±2. Trying the several candidates on the polynomial. ±2. 3 3 w √ 2 5 + 33 .2 and guess the roots directly. which naturally has the same roots.0000. 2 2 Doubling to make the coeﬃcients all integers produces the polynomial 2z 5 − 7z 4 + 8z 3 + 1z 2 − 0xAz + 6. ± . a quintic or any other polynomial has real. ±i. ± 2 2 2 2 . ±6. If any reader can straightforwardly simplify the expression without solving a cubic polynomial of some kind. 6. one ﬁnds that 1. 33 241 which says indeed that z = 1. In general no better way is known. rational roots. the root of the polynomial comes mysteriously to 1. rational root is possible. but why? The root’s symbolic form gives little clue. ± . the author would like to hear of it. Figuring the square and cube roots in the expression numerically. a trick is known to sidestep Tables 10. No other real. If the roots are complex or irrational.1 and 10. whose factors are ±1. rational candidates are the factors of the polynomial’s trailing coeﬃcient (in the example. no better way is known to this author. Dividing these out leaves a quadratic which is easy to solve for the remaining roots. to the extent to which a cubic. 10 . they are hard to guess.6. ± . One has found the number 1 but cannot recognize it. However. −1 and 3/2 are indeed roots. The real.10. Consider for example the quintic polynomial 1 7 z 5 − z 4 + 4z 3 + z 2 − 5z + 3. ±3 and ±6) divided by the factors of the polynomial’s leading coeﬃcient (in the example. but just you try to simplify it! A more baroque. At least.10 we are stuck with the cubic baroquity. more impenetrable way to write the number 1 is not easy to conceive.

whose factors are ±1 and ±2). The reason no other real, rational root is possible is seen[11] by writing z = p/q—where p, q ∈ Z are integers and the fraction p/q is fully reduced—then multiplying the nth-order polynomial by q^n to reach the form

    an p^n + an−1 p^(n−1) q + · · · + a1 p q^(n−1) + a0 q^n = 0,

where all the coefficients ak are integers. Moving the q^n term to the equation's right side, we have that

    (an p^(n−1) + an−1 p^(n−2) q + · · · + a1 q^(n−1)) p = −a0 q^n,

which implies that a0 q^n is a multiple of p. But by demanding that the fraction p/q be fully reduced, we have defined p and q to be relatively prime to one another—that is, we have defined them to have no factors but ±1 in common—so, not only a0 q^n but a0 itself is a multiple of p. By similar reasoning, an is a multiple of q. But if a0 is a multiple of p, and an, a multiple of q, then p and q are factors of a0 and an respectively, as was to be demonstrated.[12] We conclude for this reason that no real, rational root is possible except a factor of a0 divided by a factor of an.

Such root-guessing is little more than an algebraic trick, of course, but it can be a pretty useful trick if it saves us the embarrassment of inadvertently expressing simple rational numbers in ridiculous ways.

One could write much more about higher-order algebra, but now that the reader has tasted the topic he may feel inclined to agree that, though the general methods this chapter has presented to solve cubics and quartics are interesting, further effort were nevertheless probably better spent elsewhere. The next several chapters turn to the topic of the matrix, harder but much more profitable, toward which we mean to put substantial effort.

[11] The presentation here is quite informal. We do not want to spend many pages on this.
[12] [69, § 3.2]

Part II

Matrices and vectors


Chapter 11

The matrix

Chapters 2 through 9 have laid solidly the basic foundations of applied mathematics. This chapter begins to build on those foundations, demanding some heavier mathematical lifting.

Taken by themselves, most of the foundational methods of the earlier chapters have handled only one or at most a few numbers (or functions) at a time. However, in practical applications the need to handle large arrays of numbers at once arises often. Some nonobvious effects emerge then, as, for example, the eigenvalue of Ch. 14.

Regarding the eigenvalue: the eigenvalue was always there, but prior to this point in the book it was usually trivial—the eigenvalue of 5 is just 5, for instance—so we didn't bother much to talk about it. It is when numbers are laid out in orderly grids like

    C = [ 6  4  0 ]
        [ 3  0  1 ]
        [ 3  1  0 ]

that nontrivial eigenvalues arise (though you cannot tell just by looking, the eigenvalues of C happen to be −1 and [7 ± sqrt 0x49]/2). But, just what is an eigenvalue? Answer: an eigenvalue is the value by which an object like C scales an eigenvector without altering the eigenvector's direction. Of course, we have not yet said what an eigenvector is, either, or how C might scale something, but it is to answer precisely such questions that this chapter and the three which follow it are written.

Of course, we are getting ahead of ourselves. Let's back up. An object like C is called a matrix. It serves as a generalized coefficient or multiplier. Where we have used single numbers as coefficients or multipliers

heretofore, one can with sufficient care often use matrices instead. The matrix interests us for this reason among others.

The technical name for the "single number" is the scalar. Such a number, as for instance 5 or −4 + i3, is called a scalar because its action alone during multiplication is simply to scale the thing it multiplies. Besides acting alone, however, scalars can also act in concert—in orderly formations—thus constituting any of three basic kinds of arithmetical object:

• the scalar itself, a single number like α = 5 or β = −4 + i3;

• the vector, a column of m scalars like

      u = [  5      ]
          [ −4 + i3 ],

  which can be written in-line with the notation u = [5 −4 + i3]^T (here there are two scalar elements, 5 and −4 + i3, so in this example m = 2);

• the matrix, an m × n grid of scalars, or equivalently a row of n vectors, like

      A = [ 0  6   2 ]
          [ 1  1  −1 ],

  which can be written in-line with the notation A = [0 6 2; 1 1 −1] or the notation A = [0 1; 6 1; 2 −1]^T (here there are two rows and three columns of scalar elements, so in this example m = 2 and n = 3).[1]

Several general points are immediately to be observed about these various objects. First, despite the geometrical Argand interpretation of the complex number, a complex number is not a two-element vector but a scalar; therefore any or all of a vector's or matrix's scalar elements can be complex. Second, an m-element vector does not differ for most purposes from an m × 1 matrix; generally the two can be regarded as the same thing. Third, the three-element (that is, three-dimensional) geometrical vector of § 3.3 is just an m-element vector with m = 3. Fourth, m and n can be any nonnegative integers, even one, even zero, even infinity. Fifth, though the progression scalar, vector, matrix suggests next a "matrix stack" or stack of p matrices, such objects in fact are seldom used. As we shall see in § 11.1, the chief advantage of the standard matrix is that it neatly represents the linear transformation of one vector into another. "Matrix stacks" bring no such advantage. This book does not treat them.

[1] Where one needs visually to distinguish a symbol like A representing a matrix, one can write it [A], in square brackets.[2] Normally however a simple A suffices.
[2] Alternate notations seen in print include boldface A and underlined A.

The matrix is a notoriously hard topic to motivate. The idea of the matrix is deceptively simple. The mechanics of matrix arithmetic are deceptively intricate. The most basic body of matrix theory, without which little or no useful matrix work can be done, is deceptively extensive. The matrix neatly encapsulates a substantial knot of arithmetical tedium and clutter, but to understand the matrix one must first understand the tedium and clutter the matrix encapsulates. As far as the author is aware, no one has ever devised a way to introduce the matrix which does not seem shallow, broad, tiresome, irksome, even interminable at first encounter; yet the matrix is too important to ignore. Applied mathematics brings nothing else quite like it.[3]

Chapters 11 through 14 treat the matrix and its algebra. This chapter, Ch. 11, introduces the rudiments of the matrix itself.[4]

[3] In most of its chapters, the book seeks a balance between terseness the determined beginner cannot penetrate and prolixity the seasoned veteran will not abide. The matrix upsets this balance.

Part of the trouble with the matrix is that its arithmetic is just that, an arithmetic, no more likely to be mastered by mere theoretical study than was the classical arithmetic of childhood. To master matrix arithmetic, one must drill it; yet the book you hold is fundamentally one of theory not drill.

The reader who has previously drilled matrix arithmetic will meet here the essential applied theory of the matrix. That reader will find this chapter and the next three tedious enough. The reader who has not previously drilled matrix arithmetic, however, is likely to find these chapters positively hostile. Only the doggedly determined beginner will learn the matrix here alone; others will find it more amenable to drill matrix arithmetic first in the early chapters of an introductory linear algebra textbook, dull though such chapters be (see [47] or better yet the fine, surprisingly less dull [33] for instance, though the early chapters of almost any such book give the needed arithmetical drill). Returning here thereafter, the beginner can expect to find these chapters still tedious but no longer impenetrable. The reward is worth the effort. That is the approach the author recommends.

To the mathematical rebel, the young warrior with face painted and sword agleam, still determined to learn the matrix here alone, the author salutes his honorable defiance. Would the rebel consider alternate counsel? If so, then the rebel might compose a dozen matrices of various sizes and shapes, square and tall, decomposing each carefully by pencil per the Gauss-Jordan method of § 12.3, checking results (again by pencil; using a machine defeats the point of the exercise, and using a sword, well, it won't work) by multiplying factors to restore the original matrices. Several hours of such drill should build the young warrior the practical arithmetical foundation to master—with commensurate effort—the theory these chapters bring. The way of the warrior is hard, but conquest is not impossible.

To the matrix veteran, the author presents these four chapters with grim enthusiasm. Substantial, logical, necessary the chapters may be, but exciting they are not. At least, the earlier parts are not very exciting (later parts are better). As a reasonable compromise, the veteran seeking more interesting reading might skip directly to Chs. 13 and 14, referring back to Chs. 11 and 12 as need arises.
[4] [7][24][33][47]

11.1 Provenance and basic use

It is in the study of linear transformations that the concept of the matrix first arises. We begin there.

11.1.1 The linear transformation

Section 7.3.3 has introduced the idea of linearity. The linear transformation[5] is the operation of an m × n matrix A, as in

    Ax = b,    (11.1)

to transform an n-element vector x into an m-element vector b, while respecting the rules of linearity

    A(x1 + x2) = Ax1 + Ax2 = b1 + b2,
    A(αx) = αAx = αb,    (11.2)
    A(0) = 0.

For example,

    A = [ 0  6   2 ]
        [ 1  1  −1 ]

is the 2 × 3 matrix which transforms a three-element vector x into a two-element vector b such that

    Ax = [ 0x1 + 6x2 + 2x3 ] = b,
         [ 1x1 + 1x2 − 1x3 ]

where

    x = [ x1 ]        b = [ b1 ]
        [ x2 ],           [ b2 ].
        [ x3 ]

[5] Professional mathematicians conventionally are careful to begin by drawing a clear distinction between the ideas of the linear transformation, the basis set and the simultaneous system of linear equations—proving from suitable axioms that the three amount more or less to the same thing, rather than implicitly assuming the fact. The professional approach [7, Chs. 1 and 2][47, Chs. 2 and 5] has much to recommend it, but it is not the approach we will follow here.

In general, the operation of a matrix A is that[6][7]

    b_i = Σ_{j=1}^{n} a_ij x_j,    (11.3)

where x_j is the jth element of x, b_i is the ith element of b, and a_ij ≡ [A]_ij is the element at the ith row and jth column of A, counting from top left (in the example for instance, a_12 = 6).

Besides representing linear transformations as such, matrices can also represent simultaneous systems of linear equations. For example, the system

    0x1 + 6x2 + 2x3 = 2,
    1x1 + 1x2 − 1x3 = 4,

is compactly represented as

    Ax = b,

with A as given above and b = [2 4]^T. Seen from this point of view, a simultaneous system of linear equations is itself neither more nor less than a linear transformation.

[6] As observed in Appendix B, there are unfortunately not enough distinct Roman and Greek letters available to serve the needs of higher mathematics. In matrix work, the Roman letters ijk conventionally serve as indices, but the same letter i also serves as the imaginary unit, which is not an index and has nothing to do with indices. Fortunately, the meaning is usually clear from the context: i in Σ_i or a_ij is an index; i in −4 + i3 or e^{iφ} is the imaginary unit. Should a case arise in which the meaning is not clear, one can use ℓjk or some other convenient letters for the indices.
[7] Whether to let the index j run from 0 to n − 1 or from 1 to n is an awkward question of applied mathematical style. In computers, the index normally runs from 0 to n − 1, and in many ways this really is the more sensible way to do it. In mathematical theory, however, a 0 index normally implies something special or basic about the object it identifies. The book you are reading tends to let the index run from 1 to n, following mathematical convention in the matter for this reason. Conceived more generally, an m × n matrix can be considered an ∞ × ∞ matrix with zeros in the unused cells. Here, both indices i and j run from −∞ to +∞ anyway, so the computer's indexing convention poses no dilemma in this case. See § 11.3.
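A brief numerical sketch in Python (not from the text; it assumes numpy) spells out (11.3) element by element and checks it against the example system.

```python
import numpy as np

A = np.array([[0, 6, 2],
              [1, 1, -1]])
x = np.array([1, 1, -2])        # one vector x that solves the example system

# b_i = sum_j a_ij x_j, written out per (11.3) ...
b = [sum(A[i, j]*x[j] for j in range(3)) for i in range(2)]
# ... agrees with the library's matrix-vector product.
assert (A @ x == b).all()
print(b)                        # [2, 4], the b of the example system
```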

11.1.2 Matrix multiplication (and addition)

Nothing prevents one from lining several vectors x_k up in a row, industrial mass production-style, transforming them at once into the corresponding vectors b_k by the same matrix A. In this case,

    X ≡ [ x1 x2 · · · xp ],
    B ≡ [ b1 b2 · · · bp ],
    AX = B,    (11.4)
    b_ik = Σ_{j=1}^{n} a_ij x_jk.

Equation (11.4) implies a definition for matrix multiplication. Such matrix multiplication is associative since

    [(A)(XY)]_ik = Σ_{j=1}^{n} a_ij [XY]_jk
                 = Σ_{j=1}^{n} a_ij Σ_{ℓ=1}^{p} x_jℓ y_ℓk
                 = Σ_{ℓ=1}^{p} Σ_{j=1}^{n} a_ij x_jℓ y_ℓk
                 = [(AX)(Y)]_ik.    (11.5)

Matrix multiplication is not generally commutative, however,

    AX ≠ XA,    (11.6)

as one can show by a suitable counterexample like A = [0 1; 0 0], X = [1 0; 0 0]. To multiply a matrix by a scalar, one multiplies each of the matrix's elements individually by the scalar:

    [αA]_ij = α a_ij.    (11.7)

Evidently multiplication by a scalar is commutative: αAx = Aαx.

Matrix addition works in the way one would expect, element by element; and as one can see from (11.4), under multiplication, matrix addition is indeed distributive:

    [X + Y]_ij = x_ij + y_ij;
    (A)(X + Y) = AX + AY;    (11.8)
    (A + C)(X) = AX + CX.
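The counterexample to commutativity can be checked directly. A minimal Python sketch, assuming numpy:

```python
import numpy as np

A = np.array([[0, 1],
              [0, 0]])
X = np.array([[1, 0],
              [0, 0]])

print(A @ X)     # the null matrix
print(X @ A)     # not the null matrix, so AX != XA in general

Y = np.array([[2, 3],
              [5, 7]])
assert ((A @ X) @ Y == A @ (X @ Y)).all()   # associativity, per (11.5)
```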

11.1.3 Row and column operators

The matrix equation Ax = b represents the linear transformation of x into b, as we have seen. Viewed from another perspective, however, the same matrix equation represents something else: it represents a weighted sum of the columns of A, with the elements of x as the weights. In this view, one writes (11.3) as

    b = Σ_{j=1}^{n} [A]_*j x_j,    (11.9)

where [A]_*j is the jth column of A. Here x is not only a vector; it is also an operator. It operates on A's columns. By virtue of multiplying A from the right, the vector x is a column operator acting on A.

If several vectors x_k line up in a row to form a matrix X, such that AX = B, then the matrix X is likewise a column operator:

    [B]_*k = Σ_{j=1}^{n} [A]_*j x_jk.    (11.10)

The kth column of X weights the several columns of A to yield the kth column of B.

If a matrix multiplying from the right is a column operator, is a matrix multiplying from the left a row operator? Indeed it is. Another way to write AX = B, besides (11.10), is

    [B]_i* = Σ_{j=1}^{n} a_ij [X]_j*.    (11.11)

The ith row of A weights the several rows of X to yield the ith row of B. The matrix A is a row operator. It operates on X's rows. By virtue of multiplying X from the left, in AX = B, the matrix A is a row operator acting on X.

(Observe the notation. The * here means "any" or "all." Hence [X]_j* means "jth row, all columns of X"—that is, the jth row of X. Similarly, [A]_*j means "all rows, jth column of A"—that is, the jth column of A.)

Column operators attack from the right; row operators, from the left. This rule is worth memorizing; the concept is important.
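Both views can be confirmed numerically. A short Python sketch, assuming numpy; the particular X is arbitrary:

```python
import numpy as np

A = np.array([[0, 6, 2],
              [1, 1, -1]])
X = np.array([[1, 0],
              [1, 1],
              [-2, 2]])
B = A @ X

# Column view (11.10): the kth column of X weights the columns of A.
for k in range(2):
    assert (sum(X[j, k]*A[:, j] for j in range(3)) == B[:, k]).all()

# Row view (11.11): the ith row of A weights the rows of X.
for i in range(2):
    assert (sum(A[i, j]*X[j, :] for j in range(3)) == B[i, :]).all()
```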

Since matrix multiplication produces the same result whether one views it as a linear transformation (11.4), a column operation (11.10) or a row operation (11.11), one might wonder what purpose lies in defining matrix multiplication three separate ways. However, it is not so much for the sake of the mathematics that we define it three ways as it is for the sake of the mathematician. We do it for ourselves. Mathematically, the latter two do indeed expand to yield (11.4), but as written the three represent three different perspectives on the matrix. A tedious, nonintuitive matrix theorem from one perspective can appear suddenly obvious from another (see for example eqn. 11.63). Results hard to visualize one way are easy to visualize another. It is worth developing the mental agility to view and handle matrices all three ways for this reason.

11.1.4 The transpose and the adjoint

One function peculiar to matrix algebra is the transpose

    C = A^T,  c_ij = a_ji,    (11.12)

which mirrors an m × n matrix into an n × m matrix. For example,

    A^T = [ 0   1 ]
          [ 6   1 ]
          [ 2  −1 ].

Similar and even more useful is the conjugate transpose or adjoint[8]

    C = A*,  c_ij = a*_ji,    (11.13)

which again mirrors an m × n matrix into an n × m matrix, but conjugates each element as it goes.

The transpose is convenient notationally to write vectors and matrices in-line and to express certain matrix-arithmetical mechanics, but algebraically the transpose is artificial. It is the adjoint rather which mirrors a matrix properly. (If the transpose and adjoint functions applied to words as to matrices, then the transpose of "derivations" would be "snoitavired," whereas the adjoint would be "snoitavired." See the difference?) On real-valued matrices like the A in the example, of course, the transpose and the adjoint amount to the same thing.

[8] Alternate notations sometimes seen in print for the adjoint include A† (a notation which in this book means something unrelated) and A^H (a notation which recalls the name of the mathematician Charles Hermite). However, the book you are reading writes the adjoint only as A*, a notation which better captures the sense of the thing in the author's view.
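In code, the two mirrorings differ only by the conjugation. A minimal Python sketch, assuming numpy:

```python
import numpy as np

A = np.array([[1 + 2j, 3],
              [0, -4j]])

C_transpose = A.T           # c_ij = a_ji, per (11.12)
C_adjoint = A.conj().T      # c_ij = a_ji*, per (11.13)

print(C_transpose[0, 0])    # (1+2j): merely mirrored
print(C_adjoint[0, 0])      # (1-2j): mirrored and conjugated
```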

If one needed to conjugate the elements of a matrix without transposing the matrix itself, one could contrive notation like A*^T. Such a need seldom arises, however.

Observe that

    (A2 A1)^T = A1^T A2^T,
    (A2 A1)* = A1* A2*,    (11.14)

and more generally that[9]

    (∏_k Ak)^T = ∐_k Ak^T,
    (∏_k Ak)* = ∐_k Ak*.    (11.15)

11.2 The Kronecker delta

Section 7.7 has introduced the Dirac delta. The discrete analog of the Dirac delta is the Kronecker delta[10]

    δ_i ≡ 1 if i = 0; 0 otherwise;    (11.16)

or

    δ_ij ≡ 1 if i = j; 0 otherwise.    (11.17)

The Kronecker delta enjoys the Dirac-like properties that

    Σ_{i=−∞}^{∞} δ_i = Σ_{i=−∞}^{∞} δ_ij = Σ_{j=−∞}^{∞} δ_ij = 1    (11.18)

and that

    Σ_{j=−∞}^{∞} δ_ij a_jk = a_ik,    (11.19)

the latter of which is the Kronecker sifting property. The Kronecker equations (11.18) and (11.19) parallel the Dirac equations (7.13) and (7.14). Chs. 11 and 14 will find frequent use for the Kronecker delta. Later, § 15.3 will revisit the Kronecker delta in another light.

[9] Recall from § 2.3 that ∏_k Ak = · · · A3 A2 A1 whereas ∐_k Ak = A1 A2 A3 · · · .
[10] [77, "Kronecker delta," 15:59, 31 May 2006]
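The sifting property (11.19) is easy to confirm numerically. A brief Python sketch, assuming numpy; the array a is arbitrary:

```python
import numpy as np

def delta(i, j):
    return 1 if i == j else 0

a = np.arange(16).reshape(4, 4)          # an arbitrary a_jk

# Sifting property (11.19): sum_j delta_ij a_jk = a_ik.
for i in range(4):
    for k in range(4):
        assert sum(delta(i, j)*a[j, k] for j in range(4)) == a[i, k]

# Equivalently, the matrix whose elements are delta_ij is the identity.
assert (np.eye(4, dtype=int) @ a == a).all()
```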

11.3 Dimensionality and matrix forms

An m × n matrix like

    X = [ −4   0 ]
        [  1   2 ]
        [  2  −1 ]

can be viewed as the ∞ × ∞ matrix

        [ · · ·  0   0   0  0  0  · · · ]
        [ · · ·  0  −4   0  0  0  · · · ]
    X = [ · · ·  0   1   2  0  0  · · · ]
        [ · · ·  0   2  −1  0  0  · · · ]
        [ · · ·  0   0   0  0  0  · · · ],

with zeros in the unused cells. As before, x_11 = −4 and x_32 = −1, but now x_ij exists for all integral i and j; for instance, x_(−1)(−1) = 0. For such a matrix, indeed for all matrices, the matrix multiplication rule (11.4) generalizes to

    B = AX,
    b_ik = Σ_{j=−∞}^{∞} a_ij x_jk.    (11.20)

For square matrices whose purpose is to manipulate other matrices or vectors in place, merely padding with zeros often does not suit. Consider for example the square matrix

    A3 = [ 1  0  0 ]
         [ 5  1  0 ]
         [ 0  0  1 ].

This A3 is indeed a matrix, really, but when it acts A3 X as a row operator on some 3 × p matrix X, its effect is to add to X's second row, 5 times the first. Further consider

    A4 = [ 1  0  0  0 ]
         [ 5  1  0  0 ]
         [ 0  0  1  0 ]
         [ 0  0  0  1 ],

which does the same to a 4 × p matrix X. We can also define A5, A6, A7, . . . , if we want; but, really, all these express the same operation: "to add to the second row, 5 times the first."

The ∞ × ∞ matrix

        [ · · ·  1  0  0  0  0  · · · ]
        [ · · ·  5  1  0  0  0  · · · ]
    A = [ · · ·  0  0  1  0  0  · · · ]
        [ · · ·  0  0  0  1  0  · · · ]
        [ · · ·  0  0  0  0  1  · · · ]

(the ones continuing down the main diagonal to infinity both ways) expresses the operation generally. As before, a_11 = 1 and a_21 = 5, but now also a_(−1)(−1) = 1 and a_09 = 0, among others. By running ones infinitely both ways out the main diagonal, we guarantee by (11.20) that when A acts AX on a matrix X of any dimensionality whatsoever, A adds to the second row of X, 5 times the first—and affects no other row. (But what if X is a 1 × p matrix, and has no second row? Then the operation AX creates a new second row, 5 times the first—or rather so fills in X's previously null second row.)

In the infinite-dimensional view, the matrices A and X differ essentially.[11] This section explains, developing some nonstandard formalisms the derivations of later sections and chapters can use.[12]

[11] This particular section happens to use the symbols A and X to represent certain specific matrix forms because such usage flows naturally from the usage Ax = b of § 11.1. Such usage admittedly proves awkward in other contexts. Traditionally in matrix work and elsewhere in the book, the letter A does not necessarily represent an extended operator as it does here, but rather an arbitrary matrix of no particular form.
[12] The idea of infinite dimensionality is sure to discomfit some readers, who have studied matrices before and are used to thinking of a matrix as having some definite size. There is nothing wrong with thinking of a matrix as having some definite size, only that that view does not suit the present book's development. And really, the idea of an ∞ × 1 vector or an ∞ × ∞ matrix should not seem so strange. After all, consider the vector u such that

    u_ℓ = sin ℓε,

where 0 < ε ≪ 1 and ℓ is an integer, which holds all values of the function sin θ of a real argument θ. Of course one does not actually write down or store all the elements of an infinite-dimensional vector or matrix, any more than one actually writes down or stores all the bits (or digits) of 2π. Writing them down or storing them is not the point. The point is that infinite dimensionality is all right; that the idea thereof does not threaten to overturn the reader's preëxisting matrix knowledge; that, though the construct seem unfamiliar, no fundamental conceptual barrier rises against it. Different ways of looking at the same mathematics can be extremely useful to the applied mathematician. The applied mathematical reader who has never heretofore considered infinite dimensionality in vectors and matrices would be well served to take the opportunity to do so here.
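On a computer one cannot hold the infinite matrix, but one can emulate its action by truncating it to whatever window the operand needs. A Python sketch, assuming numpy; the function name is invented for illustration:

```python
import numpy as np

def extended_A(n):
    """Truncate the infinite-dimensional A (ones down the main diagonal,
    a lone 5 at row 2, column 1) to its leading n x n corner."""
    A = np.eye(n)
    A[1, 0] = 5              # zero-based indexing: the (2, 1) element
    return A

for rows in (3, 4, 7):
    X = np.random.rand(rows, 2)
    Y = extended_A(rows) @ X
    # Whatever X's dimensionality, A adds to X's second row 5 times the
    # first, and affects no other row.
    assert np.allclose(Y[1], X[1] + 5*X[0])
    assert np.allclose(np.delete(Y, 1, axis=0), np.delete(X, 1, axis=0))
```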

11.3.1 The null and dimension-limited matrices

The null matrix is just what its name implies:

        [ · · ·  0  0  0  0  0  · · · ]
    0 = [ · · ·  0  0  0  0  0  · · · ]
        [ · · ·  0  0  0  0  0  · · · ];

or more compactly,

    [0]_ij = 0.

Special symbols like 0 or O are possible for the null matrix, but usually a simple 0 suffices. There are no surprises here; the null matrix brings all the expected properties of a zero, like

    0 + A = A,
    [0][X] = 0.

The same symbol 0 used for the null scalar (zero) and the null matrix is used for the null vector, too. Whether the scalar 0, the vector 0 and the matrix 0 actually represent different things is a matter of semantics, but the three are interchangeable for most practical purposes in any case. Basically, a zero is a zero is a zero; there's not much else to it.[13] (As we shall discover in Ch. 12, dimensionality is a poor measure of a matrix's size in any case. What really counts is not a matrix's m × n dimensionality but rather its rank.)

Now a formality: the ordinary m × n matrix X can be viewed, infinite-dimensionally, as a variation on the null matrix, inasmuch as X differs from the null matrix only in the mn elements x_ij, 1 ≤ i ≤ m, 1 ≤ j ≤ n. Though the theoretical dimensionality of X be ∞ × ∞, one need record only the mn elements, plus the values of m and n, to retain complete information about such a matrix.

[13] Well, there's a lot else to it, when it comes to dividing by zero as in Ch. 4, or to summing an infinity of zeros as in Ch. 7, but those aren't what we were speaking of here.

So the semantics are these: when we call a matrix X an m × n matrix, or more precisely a dimension-limited matrix with an m × n active region, we will mean formally that X is an ∞ × ∞ matrix whose elements are all zero outside the m × n rectangle:

    x_ij = 0 except where 1 ≤ i ≤ m and 1 ≤ j ≤ n.    (11.21)

By these semantics, every 3 × 2 matrix (for example) is also formally a 4 × 4 matrix, but a 4 × 4 matrix is not in general a 3 × 2 matrix.

11.3.2 The identity and scalar matrices and the extended operator

The general identity matrix—or simply, the identity matrix—is

        [ · · ·  1  0  0  0  0  · · · ]
        [ · · ·  0  1  0  0  0  · · · ]
    I = [ · · ·  0  0  1  0  0  · · · ]
        [ · · ·  0  0  0  1  0  · · · ]
        [ · · ·  0  0  0  0  1  · · · ],

or more compactly,

    [I]_ij = δ_ij,    (11.22)

where δ_ij is the Kronecker delta of § 11.2. The identity matrix I is a matrix 1, as it were,[14] bringing the essential property one expects of a 1:

    IX = X = XI.    (11.23)

The scalar matrix is

         [ · · ·  λ  0  0  0  0  · · · ]
         [ · · ·  0  λ  0  0  0  · · · ]
    λI = [ · · ·  0  0  λ  0  0  · · · ]
         [ · · ·  0  0  0  λ  0  · · · ]
         [ · · ·  0  0  0  0  λ  · · · ],

or more compactly,

    [λI]_ij = λδ_ij.    (11.24)

[14] In fact you can write it as 1 if you like. That is essentially what it is. The I can be regarded as standing for "identity" or as the Roman numeral I.

If the identity matrix I is a matrix 1, then the scalar matrix λI is a matrix λ, such that

    [λI]X = λX = X[λI].    (11.25)

The identity matrix is (to state the obvious) just the scalar matrix with λ = 1.

The extended operator A is a variation on the scalar matrix λI, λ ≠ 0, inasmuch as A differs from λI only in p specific elements, with p a finite number. Symbolically,

    a_ij = (λ)(δ_ij + α_k)  if (i, j) = (i_k, j_k), 1 ≤ k ≤ p;
    a_ij = λδ_ij            otherwise;    (11.26)
    λ ≠ 0.

The several α_k control how the extended operator A differs from λI. One need record only the several α_k along with their respective addresses (i_k, j_k), plus the scale λ, to retain complete information about such a matrix. For example, for an extended operator fitting the pattern

    A = [ λ    0  0  0          0 ]
        [ λα1  λ  0  0          0 ]
        [ 0    0  λ  λα2        0 ]
        [ 0    0  0  λ(1 + α3)  0 ]
        [ 0    0  0  0          λ ]

(the λ's continuing down the main diagonal to infinity both ways), one need record only the values of α1, α2 and α3, the respective addresses (2, 1), (3, 4) and (4, 4), and the value of the scale λ; this information alone implies the entire ∞ × ∞ matrix A.

When we call a matrix A an extended n × n operator, or an extended operator with an n × n active region, we will mean formally that A is an ∞ × ∞ matrix and is further an extended operator for which

    1 ≤ i_k ≤ n and 1 ≤ j_k ≤ n for all 1 ≤ k ≤ p.    (11.27)

That is, an extended n × n operator is one whose several α_k all lie within the n × n square. The A in the example is an extended 4 × 4 operator (and also a 5 × 5, a 6 × 6, etc., but not a 3 × 3).

(Often in practice for smaller operators—especially in the typical case that λ = 1—one finds it easier just to record all the n × n elements of the active region. This is fine. Large matrix operators, however, tend to be

sparse, meaning that they depart from λI in only a very few of their many elements. It would waste a lot of computer memory explicitly to store all those zeros, so one normally stores just the few elements, instead.)

Implicit in the definition of the extended operator is that the identity matrix I and the scalar matrix λI, λ ≠ 0, are extended operators with 0 × 0 active regions (and also 1 × 1, 2 × 2, etc.). If λ = 0, however, the scalar matrix λI is just the null matrix, which is no extended operator but rather by definition a 0 × 0 dimension-limited matrix.

11.3.3 The active region

Though maybe obvious, it bears stating explicitly that a product of dimension-limited and/or extended-operational matrices with n × n active regions itself has an n × n active region.[15] (Remember that a matrix with an m′ × n′ active region also by definition has an n × n active region if m′ ≤ n and n′ ≤ n.) If any of the factors has dimension-limited form then so does the product; otherwise the product is an extended operator.[16]

11.3.4 Other matrix forms

Besides the dimension-limited form of § 11.3.1 and the extended-operational form of § 11.3.2, other infinite-dimensional matrix forms are certainly possible. One could for example advantageously define a "null sparse" form, recording only nonzero elements and their addresses in an otherwise null matrix; or a "tridiagonal extended" form, bearing repeated entries not only along the main diagonal but also along the diagonals just above and just below. Section 11.9 introduces one worthwhile matrix which fits neither the dimension-limited nor the extended-operational form. Still, the dimension-limited and extended-operational forms are normally the most useful, and they are the ones we will principally be handling in this book.

[15] The section's earlier subsections formally define the term active region with respect to each of the two matrix forms.
[16] If symbolic proof of the subsection's claims is wanted, here it is in outline:

    a_ij = λ_a δ_ij unless 1 ≤ (i, j) ≤ n,
    b_ij = λ_b δ_ij unless 1 ≤ (i, j) ≤ n;
    [AB]_ij = Σ_k a_ik b_kj
            = Σ_k (λ_a δ_ik) b_kj = λ_a b_ij unless 1 ≤ i ≤ n,
            = Σ_k a_ik (λ_b δ_kj) = λ_b a_ij unless 1 ≤ j ≤ n,
            = λ_a λ_b δ_ij unless 1 ≤ (i, j) ≤ n.

It's probably easier just to sketch the matrices and look at them, though.
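One simple way to store such a sparse extended operator is to keep the scale λ plus a map from addresses to the departures α_k, as (11.26) suggests. A minimal Python sketch, not from the text; the sample departures are hypothetical:

```python
# The scale lambda plus a map (i, j) -> alpha_k, per (11.26).
lam = 1.0
alphas = {(2, 1): 5.0, (3, 4): -2.0}       # hypothetical departures from lam*I

def element(i, j):
    """Recover a_ij of the infinite-dimensional A from the sparse record."""
    if (i, j) in alphas:
        return lam*((1 if i == j else 0) + alphas[(i, j)])
    return lam*(1 if i == j else 0)

print(element(2, 1))    # 5.0: a recorded departure
print(element(9, 9))    # 1.0: on the main diagonal, unrecorded
print(element(2, 9))    # 0.0: off the diagonal, unrecorded
```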

One reason to have defined specific infinite-dimensional matrix forms is to show how straightforwardly one can fully represent a practical matrix of an infinity of elements by a modest, finite quantity of information. Further reasons to have defined such forms will soon occur.

11.3.5 The rank-r identity matrix

The rank-r identity matrix I_r is the dimension-limited matrix for which

    [I_r]_ij = δ_ij if 1 ≤ i ≤ r and/or 1 ≤ j ≤ r;
    [I_r]_ij = 0    otherwise;    (11.28)

where either the "and" or the "or" can be regarded (it makes no difference). The effect of I_r is that

    I_m X = X = X I_n,
    I_m x = x,    (11.29)

where X is an m × n matrix and x, an m × 1 vector. Examples of I_r include

    I_3 = [ 1  0  0 ]
          [ 0  1  0 ]
          [ 0  0  1 ].

(Remember that in the infinite-dimensional view, I_3, though a 3 × 3 matrix, is formally an ∞ × ∞ matrix with zeros in the unused cells. It has only the three ones and fits the 3 × 3 dimension-limited form of § 11.3.1. The areas of I_3 not shown are all zero, even along the main diagonal.) The rank r can be any nonnegative integer, even zero (though the rank-zero identity matrix I_0 is in fact the null matrix, normally just written 0). If alternate indexing limits are needed (for instance for a computer-indexed identity matrix whose indices run from 0 to r − 1), the notation I_a^b, where

    [I_a^b]_ij ≡ δ_ij if a ≤ i ≤ b and/or a ≤ j ≤ b;
    [I_a^b]_ij ≡ 0    otherwise;    (11.30)

can be used; the rank in this case is r = b − a + 1, which is just the count of ones along the matrix's main diagonal.

The name "rank-r" implies that I_r has a "rank" of r, and indeed it does. For the moment, however, we will discern the attribute of rank only in the rank-r identity matrix itself. Section 12.5 defines rank for matrices more generally.

11.3.6 The truncation operator

The rank-r identity matrix I_r is also the truncation operator. Attacking from the left, as in I_r A, it retains the first through rth rows of A but cancels other rows. Attacking from the right, as in A I_r, it retains the first through rth columns. Such truncation is useful symbolically to reduce an extended operator to dimension-limited form.

Whether a matrix C has dimension-limited or extended-operational form (though not necessarily if it has some other form), if it has an m × n active region[17] and

    m ≤ r,  n ≤ r,

then

    I_r C = I_r C I_r = C I_r.    (11.31)

Equation (11.31) says at least two things:

• It is superfluous to truncate both rows and columns; it suffices to truncate one or the other.

• The rank-r identity matrix I_r commutes freely past C.

Evidently big identity matrices commute freely where small ones cannot (and the general identity matrix I = I_∞ commutes freely past everything).

11.3.7 The elementary vector and the lone-element matrix

The lone-element matrix E_mn is the matrix with a one in the mnth cell and zeros elsewhere:

    [E_mn]_ij ≡ δ_im δ_jn = 1 if i = m and j = n; 0 otherwise.    (11.32)

By this definition, C = Σ_{i,j} c_ij E_ij for any matrix C. The vector analog of the lone-element matrix is the elementary vector e_m, which has a one as the mth element:

    [e_m]_i ≡ δ_im = 1 if i = m; 0 otherwise.    (11.33)

By this definition, [I]_*j = e_j and [I]_i* = e_i^T.

[17] Refer to the definitions of active region in §§ 11.3.1 and 11.3.2. That a matrix has an m × n active region does not necessarily mean that it is all zero outside the m × n rectangle. (After all, if it were always all zero outside, then there would be little point in applying a truncation operator. There would be nothing there to truncate.)
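The truncating action is easy to see numerically. A short Python sketch, assuming numpy; the helper name is invented, and the matrix is viewed through a finite 5 × 5 window:

```python
import numpy as np

def I_r(r, n):
    """The rank-r identity matrix, viewed through an n x n window."""
    M = np.zeros((n, n))
    M[:r, :r] = np.eye(r)
    return M

C = np.arange(1, 26).reshape(5, 5).astype(float)
r = 3
left = I_r(r, 5) @ C       # retains the first r rows of C, cancels the rest
right = C @ I_r(r, 5)      # retains the first r columns of C
both = I_r(r, 5) @ C @ I_r(r, 5)
print(both)                # only the leading 3 x 3 block of C survives
```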

11.3.8 Off-diagonal entries

It is interesting to observe and useful to note that if

    [C1]_i* = [C2]_i* = e_i^T,

then also

    [C1 C2]_i* = e_i^T;    (11.34)

and likewise that if

    [C1]_*j = [C2]_*j = e_j,

then also

    [C1 C2]_*j = e_j.    (11.35)

The product of matrices has off-diagonal entries in a row or column only if at least one of the factors itself has off-diagonal entries in that row or column. Or, less readably but more precisely, the ith row or jth column of the product of matrices can depart from e_i^T or e_j, respectively, only if the corresponding row or column of at least one of the factors so departs. The reason is that in (11.34), C1 acts as a row operator on C2; that if C1's ith row is e_i^T, then its action is merely to duplicate C2's ith row, which itself is just e_i^T. Parallel logic naturally applies to (11.35).

11.4 The elementary operator

Section 11.1.3 has introduced the general row or column operator. Conventionally denoted T, the elementary operator is a simple extended row or column operator from sequences of which more complicated extended operators can be built.[18] The elementary operator T comes in three kinds.[19]

• The first is the interchange elementary

    T_[i↔j] = I − (E_ii + E_jj) + (E_ij + E_ji),    (11.36)

which by operating T_[i↔j] A or A T_[i↔j] respectively interchanges A's ith row or column with its jth.

[18] In § 11.3, the symbol A specifically represented an extended operator, but here and generally the symbol represents any matrix.
[19] As a matter of definition, some authors [47] forbid T_[i↔i] as an elementary operator, where j = i, since after all T_[i↔i] = I; which is to say that the operator doesn't actually do anything. There exist legitimate tactical reasons to forbid (as in § 11.6), but normally this book permits. It is good to define a concept aesthetically. One should usually do so when one can, and indeed in this case one might reasonably promote either definition on aesthetic grounds. However, an applied mathematician ought not to let a mere definition entangle him. What matters is the underlying concept. Where the definition does not serve the concept well, the applied mathematician considers whether it were not worth the effort to adapt the definition accordingly.

• The second is the scaling elementary

    T_α[i] = I + (α − 1)E_ii,  α ≠ 0,    (11.37)

which by operating T_α[i] A or A T_α[i] scales (multiplies) A's ith row or column, respectively, by the factor α.

• The third and last is the addition elementary

    T_α[ij] = I + αE_ij,  i ≠ j,    (11.38)

which by operating T_α[ij] A adds to the ith row of A, α times the jth row; or which by operating A T_α[ij] adds to the jth column of A, α times the ith column.

Examples of the elementary operators include

    T_[1↔2] = [ 0  1  0  0  0 ]        T_5[4] = [ 1  0  0  0  0 ]
              [ 1  0  0  0  0 ]                 [ 0  1  0  0  0 ]
              [ 0  0  1  0  0 ]                 [ 0  0  1  0  0 ]
              [ 0  0  0  1  0 ]                 [ 0  0  0  5  0 ]
              [ 0  0  0  0  1 ],                [ 0  0  0  0  1 ],

    T_5[21] = [ 1  0  0  0  0 ]
              [ 5  1  0  0  0 ]
              [ 0  0  1  0  0 ]
              [ 0  0  0  1  0 ]
              [ 0  0  0  0  1 ]

(with ones continuing down each main diagonal to infinity both ways).
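The three kinds are simple to construct and apply within a finite window. A Python sketch, assuming numpy; the function names are invented, and indices here are zero-based:

```python
import numpy as np

n = 4  # work within an n x n window of the infinite-dimensional forms

def T_interchange(i, j):
    T = np.eye(n); T[[i, j]] = T[[j, i]]; return T

def T_scale(i, alpha):
    T = np.eye(n); T[i, i] = alpha; return T       # alpha must be nonzero

def T_add(i, j, alpha):
    T = np.eye(n); T[i, j] = alpha; return T       # I + alpha*E_ij, i != j

A = np.arange(16.).reshape(n, n)
print(T_interchange(0, 1) @ A)   # first and second rows interchanged
print(T_scale(2, 5) @ A)         # third row scaled by 5
print(T_add(0, 2, -1) @ A)       # first row, minus one times the third
```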

Note that none of these, and in fact no elementary operator of any kind, differs from I in more than four elements.

11.4.1 Properties

Significantly, elementary operators as defined above are always invertible (which is to say, reversible in effect), with

    T_[i↔j]^−1 = T_[j↔i] = T_[i↔j],
    T_α[i]^−1 = T_(1/α)[i],    (11.39)
    T_α[ij]^−1 = T_−α[ij],

being themselves elementary operators such that

    T^−1 T = I = T T^−1    (11.40)

in each case.[20] This means that any sequence ∏_k T_k of elementaries can safely be undone by the reverse sequence ∐_k T_k^−1:

    ∐_k T_k^−1 ∏_k T_k = I = ∏_k T_k ∐_k T_k^−1.    (11.41)

The rank-r identity matrix I_r is no elementary operator,[21] nor is the lone-element matrix E_mn; but the general identity matrix I is indeed an elementary operator. The last can be considered a distinct, fourth kind of elementary operator if desired, but it is probably easier just to regard it as an elementary of any of the first three kinds, since I = T_[i↔i] = T_1[i] = T_0[ij].

From (11.31), we have that

    I_r T = I_r T I_r = T I_r  if 1 ≤ i ≤ r and 1 ≤ j ≤ r    (11.42)

for any elementary operator T which operates within the given bounds. Equation (11.42) lets an identity matrix with sufficiently high rank pass through a sequence of elementaries as needed.

In general, the transpose of an elementary row operator is the corresponding elementary column operator. Curiously, the interchange elementary is its own transpose and adjoint:

    T_[i↔j]* = T_[i↔j]^T = T_[i↔j].    (11.43)

[20] The addition elementary T_α[ii] and the scaling elementary T_0[i] are forbidden precisely because they are not generally invertible.
[21] If the statement seems to contradict statements of some other books, it is only a matter of definition. The other books are not wrong; their underlying definitions just differ slightly. This book finds it convenient to define the elementary operator in infinite-dimensional, extended-operational form.

11.4.2 Commutation and sorting

Elementary operators often occur in long chains like

    A = T_−4[32] T_[2↔3] T_(1/5)[3] T_(1/2)[31] T_5[21] T_[1↔3],

with several elementaries of all kinds intermixed. Some applications demand that the elementaries be sorted and grouped by kind, as

    A = (T_[2↔3] T_[1↔3]) (T_−4[21] T_(1/0xA)[13] T_5[23]) (T_(1/5)[1])

or as

    A = (T_−4[32] T_(1/0xA)[21] T_5[31]) (T_(1/5)[2]) (T_[2↔3] T_[1↔3]),

among other possible orderings. Though you probably cannot tell just by looking, the three products above are different orderings of the same elementary chain; they yield the same A and thus represent exactly the same matrix operation. Interesting is that the act of reordering the elementaries has altered some of them into other elementaries of the same kind, but has changed the kind of none of them.

One sorts a chain of elementary operators by repeatedly exchanging adjacent pairs. This of course supposes that one can exchange adjacent pairs, which seems impossible since matrix multiplication is not commutative: A1 A2 ≠ A2 A1. However, at the moment we are dealing in elementary operators only; and for most pairs T1 and T2 of elementary operators, though indeed T1 T2 ≠ T2 T1, it so happens that there exists either a T1′ such that T1 T2 = T2 T1′ or a T2′ such that T1 T2 = T2′ T1, where T1′ and T2′ are elementaries of the same kinds respectively as T1 and T2. The attempt sometimes fails when both T1 and T2 are addition elementaries, but all other pairs commute in this way. Significantly, though commutation can alter one (never both) of the two elementaries, it changes the kind of neither.

Many qualitatively distinct pairs of elementaries exist; we will list these exhaustively in a moment. First, however, we should like to observe a natural hierarchy among the three kinds of elementary: (i) interchange; (ii) scaling; (iii) addition.

• The interchange elementary is the strongest. Itself subject to alteration only by another interchange elementary, it can alter any elementary by commuting past. When an interchange elementary commutes past another elementary of any kind, it changes the kind of neither; what it alters are the other elementary's indices i and/or j (or m and/or n, or whatever symbols

happen to represent the indices in question). Refer to Table 11.1.

• Next in strength is the scaling elementary. Only an interchange elementary can alter it, and it in turn can alter only an addition elementary. Scaling elementaries do not alter one another during commutation. When a scaling elementary commutes past an addition elementary, what it alters is the latter's scale α (or β, or whatever symbol happens to represent the scale in question). Refer to Table 11.2.

• The addition elementary, last and weakest, is subject to alteration by either of the other two, itself having no power to alter any elementary during commutation. A pair of addition elementaries are the only pair that can altogether fail to commute—they fail when the row index of one equals the column index of the other—but when they do commute, neither alters the other. Refer to Table 11.3.

Tables 11.1, 11.2 and 11.3 list all possible pairs of elementary operators, as the reader can check. The only pairs that fail to commute are the last three of Table 11.3. When two elementaries commute past one another, only one of the two is altered. (Which one? Either. The mathematician chooses.)

11.5 Inversion and similarity (introduction)

If Tables 11.1, 11.2 and 11.3 exhaustively describe the commutation of one elementary past another elementary, then what can one write of the commutation of an elementary past the general matrix A? With some matrix algebra,

    T A = (T A)(I) = (T A)(T^−1 T),
    A T = (I)(A T) = (T T^−1)(A T),

one can write that

    T A = [T A T^−1] T,
    A T = T [T^−1 A T],    (11.44)

where T^−1 is given by (11.39). An elementary commuting rightward changes A to T A T^−1; commuting leftward, to T^−1 A T.

Table 11.1: Inverting, commuting, combining and expanding elementary operators: interchange. In the table, i ≠ j ≠ m ≠ n; no two indices are the same. Notice that the effect an interchange elementary T_[m↔n] has in passing any other elementary, even another interchange elementary, is simply to replace m by n and n by m among the indices of the other elementary.

    T_[m↔n] = T_[n↔m]
    T_[m↔m] = I
    I T_[m↔n] = T_[m↔n] I
    T_[m↔n] T_[m↔n] = T_[m↔n] T_[n↔m] = T_[n↔m] T_[m↔n] = I
    T_[m↔n] T_[i↔n] = T_[i↔n] T_[m↔i] = T_[i↔m] T_[m↔n] = (T_[i↔n] T_[m↔n])^2
    T_[m↔n] T_[i↔j] = T_[i↔j] T_[m↔n]
    T_[m↔n] T_α[m] = T_α[n] T_[m↔n]
    T_[m↔n] T_α[i] = T_α[i] T_[m↔n]
    T_[m↔n] T_α[ij] = T_α[ij] T_[m↔n]
    T_[m↔n] T_α[in] = T_α[im] T_[m↔n]
    T_[m↔n] T_α[mj] = T_α[nj] T_[m↔n]
    T_[m↔n] T_α[mn] = T_α[nm] T_[m↔n]

Table 11.2: Inverting, commuting, combining and expanding elementary operators: scaling. In the table, i ≠ j ≠ m ≠ n; no two indices are the same.

    T_1[m] = I
    I T_β[m] = T_β[m] I
    T_(1/β)[m] T_β[m] = I
    T_β[m] T_α[m] = T_α[m] T_β[m] = T_αβ[m]
    T_β[m] T_α[i] = T_α[i] T_β[m]
    T_β[m] T_α[ij] = T_α[ij] T_β[m]
    T_β[m] T_αβ[im] = T_α[im] T_β[m]
    T_β[m] T_α[mj] = T_αβ[mj] T_β[m]

Table 11.3: Inverting, commuting, combining and expanding elementary operators: addition. In the table, i ≠ j ≠ m ≠ n; no two indices are the same. The last three lines give pairs of addition elementaries that do not commute.

    T_0[ij] = I
    I T_α[ij] = T_α[ij] I
    T_−α[ij] T_α[ij] = I
    T_β[ij] T_α[ij] = T_α[ij] T_β[ij] = T_(α+β)[ij]
    T_β[mj] T_α[ij] = T_α[ij] T_β[mj]
    T_β[in] T_α[ij] = T_α[ij] T_β[in]
    T_β[mn] T_α[ij] = T_α[ij] T_β[mn]
    T_β[mi] T_α[ij] = T_α[ij] T_αβ[mj] T_β[mi]
    T_β[jn] T_α[ij] = T_α[ij] T_−αβ[in] T_β[jn]
    T_β[ji] T_α[ij] ≠ T_α[ij] T_β[ji]
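Any line of the tables can be checked by direct multiplication within a finite window. A Python sketch, assuming numpy; the helper names are invented, indices are zero-based, and two representative lines are tested:

```python
import numpy as np

def T_interchange(m, n_, size=5):
    T = np.eye(size); T[[m, n_]] = T[[n_, m]]; return T

def T_add(i, j, alpha, size=5):
    T = np.eye(size); T[i, j] = alpha; return T

m, n_, j, alpha = 0, 1, 3, 2.5
# Table 11.1: T[m<->n] T_alpha[mj] = T_alpha[nj] T[m<->n]
lhs = T_interchange(m, n_) @ T_add(m, j, alpha)
rhs = T_add(n_, j, alpha) @ T_interchange(m, n_)
assert np.allclose(lhs, rhs)

# Table 11.3: T_beta[mi] T_alpha[ij] = T_alpha[ij] T_alphabeta[mj] T_beta[mi]
i, beta = 2, -1.5
lhs = T_add(m, i, beta) @ T_add(i, j, alpha)
rhs = T_add(i, j, alpha) @ T_add(m, j, alpha*beta) @ T_add(m, i, beta)
assert np.allclose(lhs, rhs)
```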

First encountered in § 11.4, the notation T^−1 means the inverse of the elementary operator T, such that T^−1 T = I = T T^−1. Matrix inversion is not for elementary operators only, though. Many more general matrices C also have inverses such that

    C^−1 C = I = C C^−1.    (11.45)

(Do all matrices have such inverses? No. For example, the null matrix has no such inverse.) The broad question of how to invert a general matrix C, we leave for Chs. 12 and 13 to address. For the moment however we should like to observe three simple rules involving matrix inversion.

First, nothing in the logic leading to (11.44) actually requires the matrix T there to be an elementary operator. Any matrix C for which C^−1 is known can fill the role. Hence,

    C A = [C A C^−1] C,
    A C = C [C^−1 A C].    (11.46)

The transformation C A C^−1 or C^−1 A C is called a similarity transformation. Sections 12.2 and 14.9 speak further of this.

Second,

    (C^T)^−1 = (C^−1)^T ≡ C^−T,
    (C*)^−1 = (C^−1)* ≡ C^−*,    (11.47)

where C^−* is condensed notation for conjugate transposition and inversion in either order and C^−T is of like style. Equation (11.47) is a consequence of (11.14), since for conjugate transposition

    (C^−1)* C* = [C C^−1]* = [I]* = I = [I]* = [C^−1 C]* = C* (C^−1)*

and similarly for nonconjugate transposition.

Third,

    (∏_k Ck)^−1 = ∐_k Ck^−1.    (11.48)

This rule emerges upon repeated application of (11.45), which yields

    ∐_k Ck^−1 ∏_k Ck = I = ∏_k Ck ∐_k Ck^−1.

Table 11.4: Matrix inversion properties. (The properties work equally for C^−1(r) as for C^−1 if A honors an r × r active region. The full notation C^−1(r) for the rank-r inverse incidentally is not standard, usually is not needed, and normally is not used.)

    C^−1 C = I = C C^−1
    C^−1(r) C = I_r = C C^−1(r)
    (C^T)^−1 = (C^−1)^T ≡ C^−T
    (C*)^−1 = (C^−1)* ≡ C^−*
    C A = [C A C^−1] C
    A C = C [C^−1 A C]
    (∏_k Ck)^−1 = ∐_k Ck^−1

A more limited form of the inverse exists than the infinite-dimensional form of (11.45). This is the rank-r inverse, a matrix C^−1(r) such that

    C^−1(r) C = I_r = C C^−1(r).    (11.49)

The full notation C^−1(r) is not standard and usually is not needed, since the context usually implies the rank. When so, one can abbreviate the notation to C^−1. In either notation, (11.47) and (11.48) apply equally for the rank-r inverse as for the infinite-dimensional inverse. Because of (11.31), eqn. (11.46) too applies for the rank-r inverse if A's active region is limited to r × r. (Section 13.2 uses the rank-r inverse to solve an exactly determined linear system. This is a famous way to use the inverse, with which many or most readers will already be familiar; but before using it so in Ch. 13, we shall first learn how to compute it reliably in Ch. 12.) Table 11.4 summarizes.

11.6 Parity

Consider the sequence of integers or other objects 1, 2, 3, . . . , n. By successively interchanging pairs of the objects (any pairs, not just adjacent pairs), one can achieve any desired permutation (§ 4.2.1). For example, beginning

with 1, 2, 3, 4, 5, one can achieve the permutation 3, 5, 1, 4, 2 by interchanging first the 1 and 3, then the 2 and 5.

Now contemplate all possible pairs:

    (1, 2)  (1, 3)  (1, 4)  · · ·  (1, n);
            (2, 3)  (2, 4)  · · ·  (2, n);
                    (3, 4)  · · ·  (3, n);
                            · · ·       ;
                                (n − 1, n).

In a given permutation (like 3, 5, 1, 4, 2), some pairs will appear in correct order with respect to one another, while others will appear in incorrect order. (In 3, 5, 1, 4, 2, the pair [1, 2] appears in correct order in that the larger 2 stands to the right of the smaller 1; but the pair [1, 3] appears in incorrect order in that the larger 3 stands to the left of the smaller 1.) If p is the number of pairs which appear in incorrect order (in the example, p = 6), and if p is even, then we say that the permutation has even or positive parity; if odd, then odd or negative parity.[22]

Now consider: every interchange of adjacent elements must either increment or decrement p by one, reversing parity. Why? Well, if two elements are adjacent and their order is correct, then interchanging falsifies the order; if the order is incorrect, then interchanging rectifies the order. Either way, an adjacent interchange alters p by exactly ±1, but only of that pair (no other element interposes, thus the interchange affects the ordering of no other pair).

What about nonadjacent elements? Does interchanging a pair of these reverse parity, too? To answer the question, let u and v represent the two elements interchanged, with a1, a2, . . . , am the elements lying between. Before the interchange:

    . . . , u, a1, a2, . . . , am−1, am, v, . . .

After the interchange:

    . . . , v, a1, a2, . . . , am−1, am, u, . . .

The interchange reverses with respect to one another just the pairs

    (u, a1)  (u, a2)  · · ·  (u, am−1)  (u, am)
    (a1, v)  (a2, v)  · · ·  (am−1, v)  (am, v)
    (u, v)

The number of pairs reversed is odd. Since each reversal alters p by ±1, the net change in p apparently also is odd, reversing parity. It seems that regardless of how distant the pair, interchanging any pair of elements reverses the permutation's parity.

[22] For readers who learned arithmetic in another language than English, the even integers are . . . , −4, −2, 0, 2, 4, 6, . . . ; the odd integers are . . . , −3, −1, 1, 3, 5, 7, . . . .
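Counting the incorrectly ordered pairs mechanizes the parity rule. A minimal Python sketch, not from the text, using only the standard library:

```python
from itertools import combinations

def parity(seq):
    """Count pairs in incorrect order; an even count means even parity."""
    p = sum(1 for (a, b) in combinations(seq, 2) if a > b)
    return +1 if p % 2 == 0 else -1

perm = [3, 5, 1, 4, 2]        # the example permutation, for which p = 6
print(parity(perm))           # +1: even (positive) parity

# Interchanging any pair of elements, adjacent or not, reverses parity.
swapped = perm.copy()
swapped[0], swapped[4] = swapped[4], swapped[0]
print(parity(swapped))        # -1: odd (negative) parity
```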

The sole exception arises when an element is interchanged with itself. This does not change parity, but it does not change anything else, either, so in parity calculations we ignore it.[23] All other interchanges reverse parity.

We discuss parity in this, a chapter on matrices, because parity concerns the elementary interchange operator of § 11.4. The rows or columns of a matrix can be considered elements in a sequence. If so, then the interchange operator T_[i↔j], i ≠ j, acts precisely in the manner described, interchanging rows or columns and thus reversing parity. It follows that if i_k ≠ j_k and q is odd, then ∏_{k=1}^{q} T_[i_k↔j_k] ≠ I. However, it is possible that ∏_{k=1}^{q} T_[i_k↔j_k] = I if q is even. In any event, even q implies even p, which means even (positive) parity; odd q implies odd p, which means odd (negative) parity.

We shall have more to say about parity in §§ 11.7.1 and 14.1.

11.7 The quasielementary operator

Multiplying sequences of the elementary operators of § 11.4, one can form much more complicated operators, which per (11.41) are always invertible. Such complicated operators are not trivial to analyze, however, so one finds it convenient to define an intermediate class of operators, called in this book the quasielementary operators, more complicated than elementary operators but less so than arbitrary matrices.

A quasielementary operator is composed of elementaries only of a single kind. There are thus three kinds of quasielementary—interchange, scaling and addition—to match the three kinds of elementary. With respect to interchange and scaling, any sequences of elementaries of the respective kinds are allowed. With respect to addition, there are some extra rules, explained in § 11.7.3. The three subsections which follow respectively introduce the three kinds of quasielementary operator.

[23] This is why some authors forbid self-interchanges, as explained in footnote 19.

11.7.1 The interchange quasielementary or general interchange operator

Any product P of zero or more interchange elementaries,

    P = ∏_k T_[i_k↔j_k],    (11.50)

constitutes an interchange quasielementary, permutation matrix, permutor or general interchange operator.[24] An example is

    P = T_[2↔5] T_[1↔3] = [ 0  0  1  0  0 ]
                          [ 0  0  0  0  1 ]
                          [ 1  0  0  0  0 ]
                          [ 0  0  0  1  0 ]
                          [ 0  1  0  0  0 ]

(with ones continuing down the main diagonal to infinity both ways outside the region shown). This operator resembles I in that it has a single one in each row and in each column, but the ones here do not necessarily run along the main diagonal. The effect of the operator is to shuffle the rows or columns of the matrix it operates on, without altering any of the rows or columns it shuffles.

By (11.41), (11.39), (11.43) and (11.15), the inverse of the general interchange operator is

    P^−1 = (∏_k T_[i_k↔j_k])^−1 = ∐_k T_[i_k↔j_k]^−1
         = ∐_k T_[i_k↔j_k] = ∐_k T_[i_k↔j_k]*
         = (∏_k T_[i_k↔j_k])* = P* = P^T    (11.51)

(where P* = P^T because P has only real elements). The inverse, transpose and adjoint of the general interchange operator are thus the same:

    P^T P = P* P = I = P P* = P P^T.    (11.52)

[24] The letter P here recalls the verb "to permute."
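Equation (11.52) is quickly confirmed for the example. A Python sketch, assuming numpy; the helper is invented and zero-based, so the text's T_[2↔5] T_[1↔3] becomes swaps of rows (1, 4) and (0, 2):

```python
import numpy as np

def T_interchange(i, j, size=5):
    T = np.eye(size); T[[i, j]] = T[[j, i]]; return T

# P = T[2<->5] T[1<->3], with the text's rows renumbered from zero
P = T_interchange(1, 4) @ T_interchange(0, 2)

# A single one in each row and column; inverse = transpose = adjoint (11.52)
assert np.allclose(P @ P.T, np.eye(5))
assert np.allclose(P.T @ P, np.eye(5))
print(P.astype(int))
```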

. As we shall see in § 14. ··· ··· ··· ··· ··· . . in this case elementary scaling operators:25 .. . The letter D here recalls the adjective “diagonal. 0 ∗ 0 0 0 . The parity of the general interchange operator P concerns us for this reason. . hence that Tαi [i] = I. D= ∞ i=−∞ Tαi [i] = ∞ i=−∞ Tαi [i] = ∞ i=−∞ αi Eii = (11. This works precisely as described in § 11.2 The scaling quasielementary or general scaling operator Like the interchange quasielementary P of § 11. for some. .. . only interchange elementaries T[ik ↔jk ] for which ik = jk are counted. . Parity. incidentally.1. the matrix is nonetheless composable as a deﬁnite sequence of interchange elementaries. THE MATRIX A signiﬁcant attribute of the general interchange operator P is its parity: positive or even parity if the number of interchange elementaries T[ik ↔jk ] which compose it is even.7. however. . 0 0 ∗ 0 0 . not just of the operation P represents. For the purpose of parity determination.53) (of course it might be that αi = 1. regardless of whether one ultimately means to use P as a row or column operator. the positive (even) and negative (odd) parities sometimes lend actual positive and negative senses to the matrices they describe. . . It is the number of interchanges. Thus the example’s P above has even parity (two interchanges). . . as does I itself (zero interchanges). . The reason is that. any T[i↔i] = I noninterchanges are ignored. 0 0 0 ∗ 0 . No interchange quasielementary P has positive parity as a row operator but negative as a column operator.7. diagonal matrix or general scaling operator D consists of a product of zero or more elementary operators. . which determines P ’s parity. the scaling quasielementary. . . . negative or odd parity if the number is odd. . 11. αi = 0 is forbidden by the deﬁnition of the scaling 25 ··· ··· ··· ··· ··· .6. 0 0 0 0 ∗ .” . is a property of the matrix P itself. . . .274 CHAPTER 11. not the use. but T[i↔j] alone (one interchange) has odd parity if i = j. . . most or even all i. . ∗ 0 0 0 0 .1. . .

An example is

    D = T_−5[4] T_4[2] T_7[1] = [ 7  0  0   0  0 ]
                                [ 0  4  0   0  0 ]
                                [ 0  0  1   0  0 ]
                                [ 0  0  0  −5  0 ]
                                [ 0  0  0   0  1 ].

This operator resembles I in that all its entries run down the main diagonal; but these entries, though never zeros, are not necessarily ones, either. They are nonzero scaling factors. The effect of the operator is to scale the rows or columns of the matrix it operates on.

The general scaling operator is a particularly simple matrix. Its inverse is evidently

    D^−1 = ∐_{i=−∞}^{∞} T_(1/αi)[i] = ∏_{i=−∞}^{∞} T_(1/αi)[i] = Σ_{i=−∞}^{∞} E_ii/αi,    (11.54)

where each element down the main diagonal is individually inverted.

A superset of the general scaling operator is the diagonal matrix, defined less restrictively that [A]_ij = 0 for i ≠ j, where zeros along the main diagonal are allowed. The conventional notation

    [diag{x}]_ij ≡ δ_ij x_i = δ_ij x_j,    (11.55)

    diag{x} = [ x1  0   0   0   0  ]
              [ 0   x2  0   0   0  ]
              [ 0   0   x3  0   0  ]
              [ 0   0   0   x4  0  ]
              [ 0   0   0   0   x5 ],

converts a vector x into a diagonal matrix. The diagonal matrix in general is not invertible and is no quasielementary operator, but is sometimes useful nevertheless.
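The elementwise inversion of (11.54) is easy to exhibit on the example D. A brief Python sketch, assuming numpy:

```python
import numpy as np

x = np.array([7., 4., 1., -5., 1.])    # the diagonal of the example D
D = np.diag(x)                         # diag{x}, per (11.55)
D_inv = np.diag(1/x)                   # each diagonal element inverted, (11.54)
assert np.allclose(D_inv @ D, np.eye(5))
```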

11.7.3 Addition quasielementaries

Any product of interchange elementaries (§ 11.7.1), any product of scaling elementaries (§ 11.7.2), qualifies as a quasielementary operator. Not so, any product of addition elementaries. To qualify as a quasielementary, a product of elementary addition operators must meet some additional restrictions. Four types of addition quasielementary are defined:[26]

• the downward multitarget row addition operator,[27]

    L_[j] = ∏_{i=j+1}^{∞} T_αij[ij] = ∐_{i=j+1}^{∞} T_αij[ij]

          = I + Σ_{i=j+1}^{∞} αij E_ij    (11.56)

          = [ 1  0  0  0  0 ]
            [ 0  1  0  0  0 ]
            [ 0  0  1  0  0 ]
            [ 0  0  ∗  1  0 ]
            [ 0  0  ∗  0  1 ]

  (here shown for j the third column, the ones continuing down the main diagonal and the ∗'s continuing down the jth column to infinity), whose inverse is

    L_[j]^−1 = ∐_{i=j+1}^{∞} T_−αij[ij] = ∏_{i=j+1}^{∞} T_−αij[ij]

             = I − Σ_{i=j+1}^{∞} αij E_ij = 2I − L_[j];    (11.57)

[26] In this subsection the explanations are briefer than in the last two, but the pattern is similar. The reader can fill in the details.
[27] The letter L here recalls the adjective "lower."

• the upward multitarget row addition operator,[28]

    U_[j] = ∐_{i=−∞}^{j−1} T_αij[ij] = ∏_{i=−∞}^{j−1} T_αij[ij]

          = I + Σ_{i=−∞}^{j−1} αij E_ij    (11.58)

          = [ 1  0  ∗  0  0 ]
            [ 0  1  ∗  0  0 ]
            [ 0  0  1  0  0 ]
            [ 0  0  0  1  0 ]
            [ 0  0  0  0  1 ]

  (here shown for j the third column, the ∗'s continuing up the jth column to infinity), whose inverse is

    U_[j]^−1 = ∏_{i=−∞}^{j−1} T_−αij[ij] = ∐_{i=−∞}^{j−1} T_−αij[ij]

             = I − Σ_{i=−∞}^{j−1} αij E_ij = 2I − U_[j];    (11.59)

• the rightward multitarget column addition operator, which is the transpose L_[j]^T of the downward operator; and

• the leftward multitarget column addition operator, which is the transpose U_[j]^T of the upward operator.

[28] The letter U here recalls the adjective "upper."

11.8 The unit triangular matrix

Yet more complicated than the quasielementary of § 11.7 is the unit triangular matrix, with which we draw this necessary but tedious chapter toward

    L = I + \sum_{i=-\infty}^{\infty} \sum_{j=-\infty}^{i-1} \alpha_{ij} E_{ij}
      = I + \sum_{j=-\infty}^{\infty} \sum_{i=j+1}^{\infty} \alpha_{ij} E_{ij}
      = \begin{bmatrix}
          \ddots &   &   &   &   &        \\
                 & 1 &   &   &   &        \\
                 & * & 1 &   &   &        \\
                 & * & * & 1 &   &        \\
                 & * & * & * & 1 &        \\
                 &   &   &   &   & \ddots
        \end{bmatrix};                                                   (11.60)

    U = I + \sum_{i=-\infty}^{\infty} \sum_{j=i+1}^{\infty} \alpha_{ij} E_{ij}
      = I + \sum_{j=-\infty}^{\infty} \sum_{i=-\infty}^{j-1} \alpha_{ij} E_{ij}
      = \begin{bmatrix}
          \ddots &   &   &   &   &        \\
                 & 1 & * & * & * &        \\
                 &   & 1 & * & * &        \\
                 &   &   & 1 & * &        \\
                 &   &   &   & 1 &        \\
                 &   &   &   &   & \ddots
        \end{bmatrix}.                                                   (11.61)

The former is a unit lower triangular matrix; the latter, a unit upper triangular matrix. The unit triangular matrix is a generalized addition quasielementary, which adds not only to multiple targets but also from multiple sources, though in one direction only: downward or leftward for L or U^T (or U^*); upward or rightward for U or L^T (or L^*). This section presents and derives the basic properties of the unit triangular matrix.

The general triangular matrix L_S or U_S,29 which by definition can have any values along its main diagonal, is sometimes of interest, as in the Schur decomposition of § 14.10.30 The strictly triangular matrix L − I or U − I is likewise sometimes of interest, as in Table 11.5. However, such matrices cannot in general be expressed as products of elementary operators, and this section does not treat them.

29 The subscript S here stands for Schur. Other books typically use the symbols L and U for the general triangular matrix of Schur, but this book distinguishes by the subscript.
30 [77, "Schur decomposition," 00:32, 30 Aug. 2007]

11.8.1 Construction

To make a unit triangular matrix is straightforward:

    L = \coprod_{j=-\infty}^{\infty} L_{[j]},
    U = \prod_{j=-\infty}^{\infty} U_{[j]}.                              (11.62)

So long as the multiplication is done in the order indicated,31 then conveniently,

    [L]_{ij} = [L_{[j]}]_{ij},
    [U]_{ij} = [U_{[j]}]_{ij},                                           (11.63)

which is to say that the entries of L and U are respectively nothing more than the relevant entries of the several L[j] and U[j]. Equation (11.63) enables one to use (11.62) immediately and directly, without calculation, to build any unit triangular matrix desired.

The correctness of (11.63) is most easily seen if the several L[j] and U[j] are regarded as column operators acting sequentially on I:

    L = (I) \coprod_{j=-\infty}^{\infty} L_{[j]},
    U = (I) \prod_{j=-\infty}^{\infty} U_{[j]}.

The reader can construct an inductive proof symbolically on this basis without too much difficulty if desired, by considering the order in which the several L[j] and U[j] act; but just thinking about how L[j] adds columns leftward and U[j], rightward, (11.63) follows at once.

31 Recall again from § 2.3 that \prod_k A_k = \cdots A_3 A_2 A_1, whereas \coprod_k A_k = A_1 A_2 A_3 \cdots. This means that (\prod_k A_k)(C) applies first A_1, then A_2, A_3 and so on, as row operators to C; whereas (C)(\coprod_k A_k) applies first A_1, then A_2, A_3 and so on, as column operators to C. The symbols \prod and \coprod as this book uses them can thus be thought of respectively as row and column sequencers.
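Equation (11.63)'s claim, that the ordered product's entries are simply the α values themselves, can be verified numerically. A minimal Python sketch (mine, with hypothetical α values, truncated to a 4 × 4 active region):

    import numpy as np

    n = 4
    alpha = np.array([[0, 0, 0, 0],
                      [2, 0, 0, 0],
                      [-1, 3, 0, 0],
                      [4, 1, 5, 0]], dtype=float)  # below-diagonal entries

    def L_j(j):
        """The downward addition quasielementary L[j] of eqn. 11.56."""
        M = np.eye(n)
        M[j+1:, j] = alpha[j+1:, j]
        return M

    # Column-sequencer order per eqn. 11.62: apply L[0], then L[1], ...
    L = np.eye(n)
    for j in range(n):
        L = L @ L_j(j)

    # Eqn. 11.63: the product's entries are just the alphas themselves.
    assert np.allclose(L, np.eye(n) + alpha)

Reversing the order of the factors would mix products of the α values into the entries, which is why (11.62) fixes the sequence.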

11.8.2 The product of like unit triangular matrices

The product of like unit triangular matrices,

    L_1 L_2 = L,
    U_1 U_2 = U,                                                         (11.64)

is another unit triangular matrix of the same type. The proof for unit lower and unit upper triangular matrices is the same. In the unit lower triangular case, one starts from a form of the definition of a unit lower triangular matrix:

    [L_1]_{ij} or [L_2]_{ij} = { 0 if i < j;  1 if i = j }.

Then,

    [L_1 L_2]_{ij} = \sum_{m=-\infty}^{\infty} [L_1]_{im} [L_2]_{mj}.

But as we have just observed, [L_1]_{im} is null when i < m, and [L_2]_{mj} is null when m < j. Therefore,

    [L_1 L_2]_{ij} = { 0 if i < j;  \sum_{m=j}^{i} [L_1]_{im} [L_2]_{mj} if i ≥ j }.

Inasmuch as this is true, nothing prevents us from weakening the statement to read

    [L_1 L_2]_{ij} = { 0 if i < j;  \sum_{m=j}^{i} [L_1]_{im} [L_2]_{mj} if i = j }.

But this is just

    [L_1 L_2]_{ij} = { 0 if i < j;  [L_1]_{ii} [L_2]_{ii} = (1)(1) = 1 if i = j },

which again is the very definition of a unit lower triangular matrix. Hence (11.64).
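A quick numeric spot-check of (11.64), with random entries standing in for arbitrary α values (a sketch of mine, not the book's):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 5
    L1 = np.tril(rng.normal(size=(n, n)), -1) + np.eye(n)
    L2 = np.tril(rng.normal(size=(n, n)), -1) + np.eye(n)
    P = L1 @ L2
    assert np.allclose(np.triu(P, 1), 0.0)   # still lower triangular
    assert np.allclose(np.diag(P), 1.0)      # still unit along the diagonal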

11.8.3 Inversion

Inasmuch as any unit triangular matrix can be constructed from addition quasielementaries by (11.62), and inasmuch as (11.63) supplies the specific quasielementaries, and inasmuch as (11.57) or (11.59) gives the inverse of each such quasielementary, one can always invert a unit triangular matrix easily by

    L^{-1} = \prod_{j=-\infty}^{\infty} L_{[j]}^{-1},
    U^{-1} = \coprod_{j=-\infty}^{\infty} U_{[j]}^{-1}.                  (11.65)

In view of (11.64), therefore, the inverse of a unit lower triangular matrix is another unit lower triangular matrix; and the inverse of a unit upper triangular matrix, another unit upper triangular matrix.

It is plain to see but still interesting to note that, unlike the inverse, the adjoint or transpose of a unit lower triangular matrix is a unit upper triangular matrix; and the adjoint or transpose of a unit upper triangular matrix, a unit lower triangular matrix. The adjoint reverses the sense of the triangle.

11.8.4 The parallel unit triangular matrix

If a unit triangular matrix fits the special, restricted form

    L_\parallel^{\{k\}} = I + \sum_{j=-\infty}^{k} \sum_{i=k+1}^{\infty} \alpha_{ij} E_{ij}    (11.66)
      = \begin{bmatrix}
          \ddots &   &   &   &   &        \\
                 & 1 &   &   &   &        \\
                 &   & 1 &   &   &        \\
                 & * & * & 1 &   &        \\
                 & * & * &   & 1 &        \\
                 &   &   &   &   & \ddots
        \end{bmatrix}

or

    U_\parallel^{\{k\}} = I + \sum_{j=k}^{\infty} \sum_{i=-\infty}^{k-1} \alpha_{ij} E_{ij}    (11.67)
      = \begin{bmatrix}
          \ddots &   &   & \vdots & \vdots &        \\
                 & 1 &   & *      & *      &        \\
                 &   & 1 & *      & *      &        \\
                 &   &   & 1      &        &        \\
                 &   &   &        & 1      &        \\
                 &   &   &        &        & \ddots
        \end{bmatrix},

confining its nonzero elements to a rectangle within the triangle as shown, then it is a parallel unit triangular matrix and has some special properties the general unit triangular matrix lacks.

The general unit lower triangular matrix L acting LA on a matrix A adds the rows of A downward. The parallel unit lower triangular matrix L∥{k} acting L∥{k}A also adds rows downward, but with the useful restriction that it makes no row of A both source and target. The addition is from A's rows through the kth, to A's (k + 1)th row onward. A horizontal frontier separates source from target, which thus march in A as separate squads.

Similar observations naturally apply with respect to the parallel unit upper triangular matrix U∥{k}, which acting U∥{k}A adds rows upward, and also with respect to L∥{k}T and U∥{k}T, which acting AL∥{k}T and AU∥{k}T add columns respectively rightward and leftward (remembering that L∥{k}T is no unit lower but a unit upper triangular matrix; U∥{k}T, a unit lower).

The reason we care about the separation of source from target is that, in matrix arithmetic generally, where source and target are not separate but remain intermixed, the sequence matters in which rows or columns are added. That is, in general,

    T_{\alpha_1[i_1 j_1]} T_{\alpha_2[i_2 j_2]} \ne I + \alpha_1 E_{i_1 j_1} + \alpha_2 E_{i_2 j_2} \ne T_{\alpha_2[i_2 j_2]} T_{\alpha_1[i_1 j_1]}.

It makes a difference whether the one addition comes before, during or after the other, but only because the target of the one addition might be the source of the other. The danger is that i1 = j2 or i2 = j1. Remove this danger, and the sequence ceases to matter (refer to Table 11.3). That is exactly what the parallel unit triangular matrix does: it separates source from target and thus removes the danger. It is for this reason that

the parallel unit triangular matrix brings the useful property that

    L_\parallel^{\{k\}} = I + \sum_{j=-\infty}^{k} \sum_{i=k+1}^{\infty} \alpha_{ij} E_{ij}
      = \coprod_{j=-\infty}^{k} \coprod_{i=k+1}^{\infty} T_{\alpha_{ij}[ij]}
      = \coprod_{j=-\infty}^{k} \prod_{i=k+1}^{\infty} T_{\alpha_{ij}[ij]}
      = \prod_{j=-\infty}^{k} \coprod_{i=k+1}^{\infty} T_{\alpha_{ij}[ij]}
      = \prod_{j=-\infty}^{k} \prod_{i=k+1}^{\infty} T_{\alpha_{ij}[ij]}
      = \coprod_{i=k+1}^{\infty} \coprod_{j=-\infty}^{k} T_{\alpha_{ij}[ij]}
      = \cdots,                                                          (11.68)

    U_\parallel^{\{k\}} = I + \sum_{j=k}^{\infty} \sum_{i=-\infty}^{k-1} \alpha_{ij} E_{ij}
      = \coprod_{j=k}^{\infty} \coprod_{i=-\infty}^{k-1} T_{\alpha_{ij}[ij]} = \cdots,

which says that one can build a parallel unit triangular matrix equally well in any sequence, in contrast to the case of the general unit triangular matrix, whose construction per (11.62) one must sequence carefully. (Though eqn. 11.68 does not show them, even more sequences are possible.) The multiplication is fully commutative. You can scramble the factors' ordering any random way you like.
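The commutativity (11.68) asserts can be checked numerically by building the same parallel unit lower triangular matrix from its elementaries in two randomly scrambled orders. A minimal Python sketch of mine, with hypothetical sizes and values:

    import numpy as np

    rng = np.random.default_rng(2)
    n, k = 5, 1
    alpha = rng.normal(size=(n, n))
    # Sources j <= k, targets i > k, per eqn. 11.66.
    pairs = [(i, j) for i in range(k + 1, n) for j in range(k + 1)]

    def build(order):
        """Multiply the elementaries T_alpha[ij] in the given order."""
        M = np.eye(n)
        for i, j in order:
            T = np.eye(n)
            T[i, j] = alpha[i, j]
            M = M @ T
        return M

    forward = build(pairs)
    scrambled = build([pairs[p] for p in rng.permutation(len(pairs))])
    assert np.allclose(forward, scrambled)   # any order, same matrix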

Under such conditions, the inverse of the parallel unit triangular matrix is particularly simple:32

    L_\parallel^{\{k\}-1} = I - \sum_{j=-\infty}^{k} \sum_{i=k+1}^{\infty} \alpha_{ij} E_{ij}
      = 2I - L_\parallel^{\{k\}}
      = \coprod_{j=-\infty}^{k} \coprod_{i=k+1}^{\infty} T_{-\alpha_{ij}[ij]} = \cdots,
                                                                         (11.69)
    U_\parallel^{\{k\}-1} = I - \sum_{j=k}^{\infty} \sum_{i=-\infty}^{k-1} \alpha_{ij} E_{ij}
      = 2I - U_\parallel^{\{k\}}
      = \coprod_{j=k}^{\infty} \coprod_{i=-\infty}^{k-1} T_{-\alpha_{ij}[ij]} = \cdots,

where again the elementaries can be multiplied in any order. Pictorially,

    L_\parallel^{\{k\}-1} =
      \begin{bmatrix}
        \ddots &    &    &   &   &        \\
               & 1  &    &   &   &        \\
               &    & 1  &   &   &        \\
               & -* & -* & 1 &   &        \\
               & -* & -* &   & 1 &        \\
               &    &    &   &   & \ddots
      \end{bmatrix},
    \quad
    U_\parallel^{\{k\}-1} =
      \begin{bmatrix}
        \ddots &   &   & \vdots & \vdots &        \\
               & 1 &   & -*     & -*     &        \\
               &   & 1 & -*     & -*     &        \\
               &   &   & 1      &        &        \\
               &   &   &        & 1      &        \\
               &   &   &        &        & \ddots
      \end{bmatrix}.

The inverse of a parallel unit triangular matrix is just the matrix itself, only with each element off the main diagonal negated. Table 11.5 records a few properties that come immediately of the last observation and from the parallel unit triangular matrix's basic layout.

32 There is some odd parochiality at play in applied mathematics when one calls such collections of symbols as (11.69) "particularly simple." Nevertheless, in the present context the idea (11.69) represents is indeed simple: that one can multiply constituent elementaries in any order and still reach the same parallel unit triangular matrix; that the elementaries in this case do not interfere.
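Equation (11.69)'s negation rule is likewise easy to verify numerically. A minimal sketch of mine, with hypothetical sizes:

    import numpy as np

    # A parallel unit lower triangular matrix L{k}: its off-diagonal
    # block occupies rows k+1.. and columns ..k only.
    n, k = 6, 2
    rng = np.random.default_rng(1)
    L = np.eye(n)
    L[k+1:, :k+1] = rng.normal(size=(n - k - 1, k + 1))
    L_inv = 2 * np.eye(n) - L        # eqn. 11.69
    assert np.allclose(L @ L_inv, np.eye(n))

The check succeeds because the off-diagonal block B satisfies B² = 0, so (I + B)(I − B) = I.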

Table 11.5: Properties of the parallel unit triangular matrix. (In the table, the notation I_a^b represents the generalized dimension-limited identity matrix or truncator of eqn. 11.30. Note that the inverses L∥{k}−1 = L∥{k}′ and U∥{k}−1 = U∥{k}′ are parallel unit triangular matrices themselves, such that the table's properties hold for them, too.)

    \frac{L_\parallel^{\{k\}} + L_\parallel^{\{k\}-1}}{2} = I
      = \frac{U_\parallel^{\{k\}} + U_\parallel^{\{k\}-1}}{2}

    I_{k+1}^{\infty} L_\parallel^{\{k\}} I_{-\infty}^{k}
      = L_\parallel^{\{k\}} - I
      = I_{k+1}^{\infty} (L_\parallel^{\{k\}} - I) I_{-\infty}^{k}

    I_{-\infty}^{k-1} U_\parallel^{\{k\}} I_{k}^{\infty}
      = U_\parallel^{\{k\}} - I
      = I_{-\infty}^{k-1} (U_\parallel^{\{k\}} - I) I_{k}^{\infty}

If L∥{k} honors an n × n active region, then

    (I_n - I_k) L_\parallel^{\{k\}} I_k = L_\parallel^{\{k\}} - I = (I_n - I_k)(L_\parallel^{\{k\}} - I) I_k

and

    (I - I_n)(L_\parallel^{\{k\}} - I) = 0 = (L_\parallel^{\{k\}} - I)(I - I_n).

If U∥{k} honors an n × n active region, then

    I_{k-1} U_\parallel^{\{k\}} (I_n - I_{k-1}) = U_\parallel^{\{k\}} - I = I_{k-1} (U_\parallel^{\{k\}} - I)(I_n - I_{k-1})

and

    (I - I_n)(U_\parallel^{\{k\}} - I) = 0 = (U_\parallel^{\{k\}} - I)(I - I_n).

11.8.5 The partial unit triangular matrix

Besides the notation L and U for the general unit lower and unit upper triangular matrices and the notation L∥{k} and U∥{k} for the parallel unit lower and unit upper triangular matrices, we shall find it useful to introduce the additional notation

    L^{[k]} = I + \sum_{j=k}^{\infty} \sum_{i=j+1}^{\infty} \alpha_{ij} E_{ij}    (11.70)
      = \begin{bmatrix}
          \ddots &   &   &   &   &        \\
                 & 1 &   &   &   &        \\
                 &   & 1 &   &   &        \\
                 &   &   & 1 &   &        \\
                 &   &   & * & 1 &        \\
                 &   &   & * & * & \ddots
        \end{bmatrix},

    U^{[k]} = I + \sum_{j=-\infty}^{k} \sum_{i=-\infty}^{j-1} \alpha_{ij} E_{ij}    (11.71)
      = \begin{bmatrix}
          \ddots & * & * &   &   &        \\
                 & 1 & * &   &   &        \\
                 &   & 1 &   &   &        \\
                 &   &   & 1 &   &        \\
                 &   &   &   & 1 &        \\
                 &   &   &   &   & \ddots
        \end{bmatrix}

for unit triangular matrices whose off-diagonal content is confined to a narrow wedge, and

    L^{\{k\}} = I + \sum_{j=-\infty}^{k} \sum_{i=j+1}^{\infty} \alpha_{ij} E_{ij}    (11.72)
      = \begin{bmatrix}
          \ddots &   &   &   &   &        \\
               * & 1 &   &   &   &        \\
               * & * & 1 &   &   &        \\
               * & * & * & 1 &   &        \\
               * & * & * &   & 1 &        \\
                 &   &   &   &   & \ddots
        \end{bmatrix},

    U^{\{k\}} = I + \sum_{j=k}^{\infty} \sum_{i=-\infty}^{j-1} \alpha_{ij} E_{ij}    (11.73)
      = \begin{bmatrix}
          \ddots &   & * & * & * &        \\
                 & 1 & * & * & * &        \\
                 &   & 1 & * & * &        \\
                 &   &   & 1 & * &        \\
                 &   &   &   & 1 &        \\
                 &   &   &   &   & \ddots
        \end{bmatrix}

for the supplementary forms.33 Such notation is not standard in the literature, but it serves a purpose in this book and is introduced here for this reason.

If names are needed for L^{[k]}, U^{[k]}, L^{\{k\}} and U^{\{k\}}, the former pair can be called minor partial unit triangular matrices, and the latter pair, major partial unit triangular matrices. Whether minor or major, the partial unit triangular matrix is a matrix which leftward or rightward of the kth column resembles I. Of course partial unit triangular matrices which resemble I above or below the kth row are equally possible, and can be denoted L^{[k]T}, U^{[k]T}, L^{\{k\}T} and U^{\{k\}T}.

Observe that the parallel unit triangular matrices L∥{k} and U∥{k} of § 11.8.4 are in fact also major partial unit triangular matrices, as the notation suggests.

33 The notation is arguably imperfect in that L^{\{k\}} + L^{[k]} - I \ne L, but rather L^{\{k\}} + L^{[k+1]} - I = L. The conventional notation \sum_{k=a}^{b} f(k) + \sum_{k=b}^{c} f(k) \ne \sum_{k=a}^{c} f(k) suffers the same arguable imperfection.

11.9 The shift operator

Not all useful matrices fit the dimension-limited and extended-operational forms of § 11.3. An exception is the shift operator Hk, defined that

    [H_k]_{ij} = \delta_{i(j+k)}.                                        (11.74)

For example,

    H_2 = \begin{bmatrix}
            \ddots &   &   &   &   &   &        \\
                   & 0 &   &   &   &   &        \\
                   & 0 & 0 &   &   &   &        \\
                   & 1 & 0 & 0 &   &   &        \\
                   &   & 1 & 0 & 0 &   &        \\
                   &   &   & 1 & 0 & 0 &        \\
                   &   &   &   &   &   & \ddots
          \end{bmatrix},

with ones running parallel to the main diagonal, two steps below it. Operating HkA, Hk shifts A's rows downward k steps. Operating AHk, Hk shifts A's columns leftward k steps. Obviously, the shift operator's inverse, transpose and adjoint are the same:

    H_k^T H_k = H_k^* H_k = I = H_k H_k^* = H_k H_k^T,
    H_k^{-1} = H_k^T = H_k^* = H_{-k}.                                   (11.75)

Further obvious but useful identities include that

    (I_\ell - I_k) H_k = H_k I_{\ell-k},
    H_{-k} (I_\ell - I_k) = I_{\ell-k} H_{-k}.                           (11.76)

Inasmuch as the shift operator shifts all rows or columns of the matrix it operates on, its active region is ∞ × ∞ in extent.
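A minimal Python sketch of mine illustrates Hk on a finite window; because the true operator's active region is ∞ × ∞, a finite truncation only mimics it away from the window's edges. The helper H below is hypothetical, not the book's notation:

    import numpy as np

    def H(k, n=6):
        """Finite window on the shift operator: [Hk]ij = 1 iff i = j + k."""
        M = np.zeros((n, n))
        for i in range(n):
            j = i - k
            if 0 <= j < n:
                M[i, j] = 1.0
        return M

    A = np.arange(36.0).reshape(6, 6)
    assert np.allclose((H(2) @ A)[2:], A[:-2])        # rows shifted down 2
    assert np.allclose((A @ H(2))[:, :-2], A[:, 2:])  # columns shifted left 2
    assert np.allclose(H(2).T, H(-2))                 # eqn. 11.75, in-window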

11.10 The Jacobian derivative

Chapter 4 has introduced the derivative of a function with respect to a scalar variable. One can also take the derivative of a function with respect to a vector variable, and the function itself can be vector-valued. The derivative is

    \left[\frac{df}{dx}\right]_{ij} = \frac{\partial f_i}{\partial x_j}.    (11.77)

For instance, if x has three elements and f has two, then

    \frac{df}{dx} =
      \begin{bmatrix}
        \partial f_1/\partial x_1 & \partial f_1/\partial x_2 & \partial f_1/\partial x_3 \\
        \partial f_2/\partial x_1 & \partial f_2/\partial x_2 & \partial f_2/\partial x_3
      \end{bmatrix}.

This is called the Jacobian derivative, the Jacobian matrix, or just the Jacobian.34 Each of its columns is the derivative with respect to one element of x.

The Jacobian derivative of a vector with respect to itself is

    \frac{dx}{dx} = I.                                                   (11.78)

The derivative is not In as one might think, because, even if x has only n elements, one could vary x_{n+1} in principle, and \partial x_{n+1}/\partial x_{n+1} = 1.

The Jacobian derivative obeys the derivative product rule (4.25) in the form35

    \frac{d}{dx}\left(g^T A f\right) = g^T A \frac{df}{dx} + \left[\left(\frac{dg}{dx}\right)^T A f\right]^T,
    \frac{d}{dx}\left(g^* A f\right) = g^* A \frac{df}{dx} + \left[\left(\frac{dg}{dx}\right)^* A f\right]^T,
                                                                         (11.79)

valid for any constant matrix A, as is seen by applying the definition (4.19) of the derivative, which here is

    \frac{\partial (g^* A f)}{\partial x_j}
      = \lim_{\partial x_j \to 0}
        \frac{(g + \partial g/2)^* A (f + \partial f/2) - (g - \partial g/2)^* A (f - \partial f/2)}{\partial x_j},

and simplifying.

The shift operator of § 11.9 and the Jacobian derivative of this section complete the family of matrix rudiments we shall need to begin to do increasingly interesting things with matrices in Chs. 13 and 14. Before doing interesting things, however, we must treat two more foundational matrix matters. The two are the Gauss-Jordan decomposition and the matter of matrix rank, which will be the subjects of Ch. 12, next.

34 [77, "Jacobian," 00:50, 15 Sept. 2007]
35 Notice that the last term on (11.79)'s second line is transposed, not adjointed.
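The definition (11.77) is easy to check by finite differences. A minimal Python sketch of mine, with a hypothetical f mapping three elements to two:

    import numpy as np

    def f(x):
        return np.array([x[0] * x[1], x[1] + x[2]**2])

    def jacobian_fd(f, x, h=1e-6):
        """Central-difference estimate of [df/dx]ij = dfi/dxj."""
        fx = f(x)
        J = np.empty((fx.size, x.size))
        for j in range(x.size):
            dx = np.zeros_like(x)
            dx[j] = h
            J[:, j] = (f(x + dx) - f(x - dx)) / (2 * h)
        return J

    x = np.array([1.0, 2.0, 3.0])
    J_analytic = np.array([[x[1], x[0], 0.0],
                           [0.0,  1.0, 2 * x[2]]])
    assert np.allclose(jacobian_fd(f, x), J_analytic, atol=1e-6)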


Chapter 12

Matrix rank and the Gauss-Jordan decomposition

Chapter 11 has brought the matrix and its rudiments, the latter including

• the lone-element matrix E (§ 11.3.7),
• the null matrix 0 (§ 11.3.1),
• the rank-r identity matrix Ir (§ 11.3.5),
• the general identity matrix I and the scalar matrix λI (§ 11.3.2),
• the elementary operator T (§ 11.4),
• the quasielementary operator P, D, L[k] or U[k] (§ 11.7), and
• the unit triangular matrix L or U (§ 11.8).

Such rudimentary forms have useful properties, as we have seen. The general matrix A does not necessarily have any of these properties, but it turns out that one can factor any matrix whatsoever into a product of rudiments which do have the properties, and that several orderly procedures are known to do so. The simplest of these, and indeed one of the more useful, is the Gauss-Jordan decomposition. This chapter introduces it.

Section 11.3 has deëmphasized the concept of matrix dimensionality m × n, supplying in its place the new concept of matrix rank. However, that section has actually defined rank only for the rank-r identity matrix Ir. In fact all matrices have rank. This chapter explains.

Before treating the Gauss-Jordan decomposition and the matter of matrix rank as such, we shall find it helpful to prepare two preliminaries thereto: (i) the matter of the linear independence of vectors; and (ii) the elementary similarity transformation. The chapter begins with these.

Except in § 12.2, the chapter demands more rigor than one likes in such a book as this. However, it is hard to see how to avoid the rigor here, and logically the chapter cannot be omitted. We will drive through the chapter in as few pages as can be managed, and then onward to the more interesting matrix topics of Chs. 13 and 14.

12.1 Linear independence

Linear independence is a significant possible property of a set of vectors, whether the set be the several columns of a matrix, the several rows, or some other vectors, the property being defined as follows. A vector is linearly independent if its role cannot be served by the other vectors in the set. More formally, the n vectors of the set {a1, a2, a3, a4, a5, ..., an} are linearly independent if and only if none of them can be expressed as a linear combination (a weighted sum) of the others. That is, the several ak are linearly independent iff

    \alpha_1 a_1 + \alpha_2 a_2 + \alpha_3 a_3 + \cdots + \alpha_n a_n \ne 0    (12.1)

for all nontrivial αk, where "nontrivial αk" means the several αk, at least one of which is nonzero (trivial αk, by contrast, would be α1 = α2 = α3 = ··· = αn = 0). Vectors which can combine nontrivially to reach the null vector are by definition linearly dependent.

Linear independence is a property of vectors. Technically the property applies to scalars, too, inasmuch as a scalar resembles a one-element vector, so any nonzero scalar alone is linearly independent; but there is no such thing as a linearly independent pair of scalars, because one of the pair can always be expressed as a complex multiple of the other. Significantly but less obviously, there is also no such thing as a linearly independent set which includes the null vector; (12.1) forbids it. Paradoxically, even the single-member, n = 1 set consisting only of a1 = 0 is, strictly speaking, not linearly independent.

For consistency of definition, we regard the empty, n = 0 set as linearly independent, on the technical ground that the only possible linear combination of the empty set is trivial.1

1 This is the kind of thinking which typically governs mathematical edge cases. One could define the empty set to be linearly dependent if one really wanted to, but what then of the observation that adding a vector to a linearly dependent set never renders the set independent? Surely in this light it is preferable just to define the empty set as independent in the first place. Similar thinking makes 0! = 1, \sum_{k=0}^{-1} a_k z^k = 0, and 2 not 1 the least prime, among other examples.
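Numerically, (12.1) can be tested by assembling the vectors as the columns of a matrix and counting its independent columns, a count § 12.5 will identify with matrix rank. A minimal Python sketch of mine, with hypothetical vectors and NumPy's rank routine used only as a black-box check:

    import numpy as np

    a1 = np.array([1.0, 0.0, 2.0])
    a2 = np.array([0.0, 1.0, 1.0])
    a3 = a1 + 2 * a2                       # deliberately dependent
    A = np.column_stack([a1, a2, a3])
    print(np.linalg.matrix_rank(A))        # prints 2, not 3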

If a linear combination of several independent vectors ak forms a vector b, then one might ask: can there exist a different linear combination of the same vectors ak which also forms b? That is, if

    \beta_1 a_1 + \beta_2 a_2 + \beta_3 a_3 + \cdots + \beta_n a_n = b,

then is

    \beta_1' a_1 + \beta_2' a_2 + \beta_3' a_3 + \cdots + \beta_n' a_n = b

possible? To answer the question, suppose that it were possible. The difference of the two equations then would be

    (\beta_1' - \beta_1) a_1 + (\beta_2' - \beta_2) a_2 + (\beta_3' - \beta_3) a_3 + \cdots + (\beta_n' - \beta_n) a_n = 0.

According to (12.1), this could only be so if the coefficients in the last equation were trivial; that is, only if β1′ − β1 = 0, β2′ − β2 = 0, β3′ − β3 = 0, ..., βn′ − βn = 0. But this says no less than that the two linear combinations, which we had supposed to differ, were in fact one and the same. One concludes therefore that, if a vector b can be expressed as a linear combination of several linearly independent vectors ak, then it cannot be expressed as any other combination of the same vectors. The combination is unique.

Linear independence can apply in any dimensionality, but it helps to visualize the concept geometrically in three dimensions, using the three-dimensional geometrical vectors of § 3.3. Two such vectors are independent so long as they do not lie along the same line. A third such vector is independent of the first two so long as it does not lie in their common plane. A fourth such vector (unless it points off into some unvisualizable fourth dimension) cannot possibly then be independent of the three.

We discuss the linear independence of vectors in this, a chapter on matrices, because (§ 11.1) a matrix is essentially a sequence of vectors, either of column vectors or of row vectors. As we shall see in § 12.5, the important property of matrix rank depends on the number of linearly independent columns or rows a matrix has.

12.2 The elementary similarity transformation

Section 11.5 and its (11.46) have introduced the similarity transformation CAC^{-1} or C^{-1}AC, which arises when an operator C commutes respectively rightward or leftward past a matrix A. The similarity transformation has several interesting properties, some of which we are now prepared to discuss, particularly in the case in which the operator happens to be an elementary, C = T. In this case, the several rules of Table 12.1 obtain.

Most of the table's rules are fairly obvious if the meaning of the symbols is understood, though to grasp some of the rules it helps to sketch the relevant matrices on a sheet of paper. Of course rigorous symbolic proofs can be constructed after the pattern of § 11.8.2, but they reveal little or nothing sketching the matrices does not. The symbols P, D, L and U of course represent the quasielementaries and unit triangular matrices of §§ 11.7 and 11.8. The symbols P′, D′, L′ and U′ also represent quasielementaries and unit triangular matrices, only not necessarily the same ones P, D, L and U do.

The rules permit one to commute some but not all elementaries past a quasielementary or unit triangular matrix, without fundamentally altering the character of the quasielementary or unit triangular matrix, and sometimes without altering the matrix at all. The rules find use among other places in the Gauss-Jordan decomposition of § 12.3.

Table 12.1: Some elementary similarity transformations.

    T[i↔j] I T[i↔j] = I
    T[i↔j] P T[i↔j] = P′
    T[i↔j] D T[i↔j] = D′ = D + ([D]jj − [D]ii) Eii + ([D]ii − [D]jj) Ejj
    T[i↔j] D T[i↔j] = D                        if [D]ii = [D]jj
    T[i↔j] L^{[k]} T[i↔j] = L^{[k]}′           if i < k and j < k
    T[i↔j] U^{[k]} T[i↔j] = U^{[k]}′           if i > k and j > k
    T[i↔j] L^{{k}} T[i↔j] = L^{{k}}′           if i > k and j > k
    T[i↔j] U^{{k}} T[i↔j] = U^{{k}}′           if i < k and j < k
    T[i↔j] L∥{k} T[i↔j] = L∥{k}′               if i > k and j > k
    T[i↔j] U∥{k} T[i↔j] = U∥{k}′               if i < k and j < k
    Tα[i] I T(1/α)[i] = I
    Tα[i] D T(1/α)[i] = D
    Tα[i] A T(1/α)[i] = A′   where A is any of L, U, L^{[k]}, U^{[k]}, L^{{k}}, U^{{k}}, L∥{k}, U∥{k}
    Tα[ij] I T−α[ij] = I
    Tα[ij] D T−α[ij] = D + ([D]jj − [D]ii) α Eij = D′
    Tα[ij] D T−α[ij] = D                       if [D]ii = [D]jj
    Tα[ij] L T−α[ij] = L′                      if i > j
    Tα[ij] U T−α[ij] = U′                      if i < j
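As a quick numeric illustration, one of the table's rules can be spot-checked directly; the following is a minimal Python sketch of mine, with hypothetical sizes and values:

    import numpy as np

    # Check: T_a[ij] D T_-a[ij] = D + ([D]jj - [D]ii) a E_ij.
    n, i, j, a = 4, 1, 3, 2.5
    D = np.diag([3.0, 7.0, -2.0, 5.0])
    T = np.eye(n); T[i, j] = a
    T_inv = np.eye(n); T_inv[i, j] = -a
    lhs = T @ D @ T_inv
    rhs = D.copy(); rhs[i, j] = (D[j, j] - D[i, i]) * a
    assert np.allclose(lhs, rhs)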

12.3 The Gauss-Jordan decomposition

The Gauss-Jordan decomposition of an arbitrary, dimension-limited, m × n matrix A is2

    A = G_> I_r G_< = P D L U I_r K S,
    G_> \equiv P D L U,
    G_< \equiv K S,                                                      (12.2)

where

• P and S are general interchange operators (§ 11.7.1);
• D is a general scaling operator (§ 11.7.2);
• L and U are respectively unit lower and unit upper triangular matrices (§ 11.8);
• K = L∥{r}T is the transpose of a parallel unit lower triangular matrix, being thus a parallel unit upper triangular matrix (§ 11.8.4);
• G> and G< are composites3 as defined by (12.2); and
• r is an unspecified rank.

The Gauss-Jordan decomposition is also called the Gauss-Jordan factorization. Whether all possible matrices A have a Gauss-Jordan decomposition (they do, in fact) is a matter this section addresses. However, at least for matrices which do have one, because G> and G< are composed of invertible factors, one can left-multiply the equation A = G> Ir G< by G>^{-1} and right-multiply it by G<^{-1} to obtain

    U^{-1} L^{-1} D^{-1} P^{-1} A S^{-1} K^{-1} = G_>^{-1} A G_<^{-1} = I_r,
    S^{-1} K^{-1} = G_<^{-1},
    U^{-1} L^{-1} D^{-1} P^{-1} = G_>^{-1},                              (12.3)

the Gauss-Jordan's complementary form.

12.3.1 Motive

Equation (12.2) seems inscrutable. The equation itself is easy enough to read, but just as there are many ways to factor a scalar (0xC = [4][3] = [2]²[3] = [2][6], for example), there are likewise many ways to factor a matrix. Why choose this particular way? There are indeed many ways. We shall meet some of the others in §§ 13.11, 14.6, 14.10 and 14.12. The Gauss-Jordan decomposition we meet here however has both significant theoretical properties and useful practical applications; and in any case it needs less advanced preparation to appreciate than the others and (at least as developed in this book) precedes the others logically. It emerges naturally when one posits a pair of square, n × n

2 Most introductory linear algebra texts this writer has met call the Gauss-Jordan decomposition instead the "LU decomposition" and include fewer factors in it, typically merging D into L and omitting K and S. They also omit Ir, since their matrices have pre-defined dimensionality. Perhaps the reader will agree that the decomposition is cleaner as presented here.
3 One can pronounce G> and G< respectively as "G acting rightward" and "G acting leftward." The letter G itself can be regarded as standing for "Gauss-Jordan," but admittedly it is chosen as much because otherwise we were running out of available Roman capitals!

matrices, A and A^{-1}, for which A^{-1}A = In, where A is known and A^{-1} is to be determined. (The A^{-1} here is the A^{-1(n)} of eqn. 11.49. However, it is not yet claimed that AA^{-1} = In; it is only supposed here that A^{-1}A = In.)

The matrix A^{-1} such that A^{-1}A = In may or may not exist (usually it does exist if A is square, but even then it may not, as we shall soon see), and even if it does exist, how to determine it is not immediately obvious. To determine A^{-1} is not an entirely trivial problem. And still, what if A is not square? In the present subsection however we are not trying to prove anything, only to motivate, so for the moment let us suppose a square A for which A^{-1} does exist, and let us seek A^{-1} by left-multiplying A by a sequence \prod T of elementary row operators, each of which makes the matrix more nearly resemble In. When In is finally achieved, then we shall have that

    \Big(\prod T\Big)(A) = I_n,

or, left-multiplying by In and observing that In² = In,

    (I_n)\Big(\prod T\Big)(A) = I_n,

which implies that

    A^{-1} = (I_n)\Big(\prod T\Big).

The product of elementaries which transforms A to In, truncated (§ 11.3.6) to n × n dimensionality, itself constitutes A^{-1}. This observation is what motivates the Gauss-Jordan decomposition.

By successive steps,4 a concrete example:

    A = \begin{bmatrix} 2 & -4 \\ 3 & -1 \end{bmatrix},
    T_{(1/2)[1]} A = \begin{bmatrix} 1 & -2 \\ 3 & -1 \end{bmatrix},
    T_{-3[21]} T_{(1/2)[1]} A = \begin{bmatrix} 1 & -2 \\ 0 & 5 \end{bmatrix},
    T_{(1/5)[2]} T_{-3[21]} T_{(1/2)[1]} A = \begin{bmatrix} 1 & -2 \\ 0 & 1 \end{bmatrix},
    T_{2[12]} T_{(1/5)[2]} T_{-3[21]} T_{(1/2)[1]} A = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix},
    I_2 T_{2[12]} T_{(1/5)[2]} T_{-3[21]} T_{(1/2)[1]} A = I_2.

4 Theoretically, all elementary operators including the ones here have extended-operational form (§ 11.3.2), but all those ··· ellipses clutter the page too much. Only the 2 × 2 active regions are shown here.

Hence,

    A^{-1} = I_2 T_{2[12]} T_{(1/5)[2]} T_{-3[21]} T_{(1/2)[1]}
           = \begin{bmatrix} -1/0xA & 2/5 \\ -3/0xA & 1/5 \end{bmatrix}.

Using the elementary commutation identity that T_{\beta[m]} T_{\alpha[mj]} = T_{\alpha\beta[mj]} T_{\beta[m]}, from Table 11.2, to group like operators, we have that

    A^{-1} = I_2 T_{2[12]} T_{-(3/5)[21]} T_{(1/5)[2]} T_{(1/2)[1]}
           = \begin{bmatrix} -1/0xA & 2/5 \\ -3/0xA & 1/5 \end{bmatrix},

or, multiplying the two scaling elementaries to merge them into a single general scaling operator (§ 11.7.2),

    A^{-1} = I_2 T_{2[12]} T_{-(3/5)[21]} D^{-1}
           = \begin{bmatrix} -1/0xA & 2/5 \\ -3/0xA & 1/5 \end{bmatrix}.

The last equation is written symbolically as

    A^{-1} = I_2 U^{-1} L^{-1} D^{-1},

from which

    A = D L U I_2
      = \begin{bmatrix} 2 & 0 \\ 0 & 5 \end{bmatrix}
        \begin{bmatrix} 1 & 0 \\ 3/5 & 1 \end{bmatrix}
        \begin{bmatrix} 1 & -2 \\ 0 & 1 \end{bmatrix} I_2
      = \begin{bmatrix} 2 & -4 \\ 3 & -1 \end{bmatrix}.

Now, the equation A = DLU I2 is not (12.2); or rather, it is (12.2), but only in the special case that r = 2 and P = S = K = I, which begs the question: why do we need the factors P, S and K in the first place? The answer regarding P and S is that these factors respectively gather row and column interchange elementaries, of which the example given has used none but which other examples sometimes need or want, particularly to avoid dividing by zero when they encounter a zero in an inconvenient cell of the matrix (the reader might try reducing A = [0 1; 1 0] to I2, for instance; a row or column interchange is needed here). Regarding K, this factor comes into play when A has broad rectangular rather than square shape, and also sometimes when one of the rows of A happens to be a linear combination of the others. The last point, admittedly, we are not quite ready to detail yet, so if the reader will accept the other factors and suspend judgment on K until the actual need for it emerges in § 12.3.3, step 12, then we will proceed on this basis.
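The motive can be seen executed. A minimal Python sketch of mine accumulates the text's sequence of elementary row operators; their product is A^{-1}:

    import numpy as np

    A = np.array([[2.0, -4.0],
                  [3.0, -1.0]])
    T = [np.array([[0.5, 0.0], [0.0, 1.0]]),    # T_(1/2)[1]: scale row 1
         np.array([[1.0, 0.0], [-3.0, 1.0]]),   # T_-3[21]: clear below pivot
         np.array([[1.0, 0.0], [0.0, 0.2]]),    # T_(1/5)[2]: scale row 2
         np.array([[1.0, 2.0], [0.0, 1.0]])]    # T_2[12]: clear above pivot
    A_inv = np.eye(2)
    for t in T:
        A_inv = t @ A_inv                       # accumulate the operators
    assert np.allclose(A_inv @ A, np.eye(2))
    print(A_inv)    # [[-0.1, 0.4], [-0.3, 0.2]]: -1/0xA, 2/5; -3/0xA, 1/5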

12.3.2 Method

The Gauss-Jordan decomposition of a matrix A is not discovered at one stroke but rather is gradually built up, elementary by elementary. It begins with the equation

    A = I I I I A I I,

where the six I hold the places of the six Gauss-Jordan factors P, D, L, U, K and S of (12.2). By successive elementary operations, the A on the right is gradually transformed into Ir, while the six I are gradually transformed into the six Gauss-Jordan factors. The decomposition thus ends with the equation

    A = P D L U I_r K S,

which is (12.2). In between, while the several matrices are gradually being transformed, the equation is represented as

    A = \tilde P \tilde D \tilde L \tilde U \tilde I \tilde K \tilde S,    (12.4)

where the initial value of Ĩ is A and the initial values of P̃, D̃, etc., are all I.

Each step of the transformation goes as follows. The matrix Ĩ is left- or right-multiplied by an elementary operator T. To compensate, one of the six factors is right- or left-multiplied by T^{-1}. Intervening factors are multiplied by both T and T^{-1}, which multiplication constitutes an elementary similarity transformation as described in § 12.2. For example,

    A = \tilde P \big(\tilde D T_{(1/\alpha)[i]}\big)
        \big(T_{\alpha[i]} \tilde L T_{(1/\alpha)[i]}\big)
        \big(T_{\alpha[i]} \tilde U T_{(1/\alpha)[i]}\big)
        \big(T_{\alpha[i]} \tilde I\big) \tilde K \tilde S,

which is just (12.4), inasmuch as the adjacent elementaries cancel one another; then,

    \tilde D \leftarrow \tilde D T_{(1/\alpha)[i]},
    \tilde L \leftarrow T_{\alpha[i]} \tilde L T_{(1/\alpha)[i]},
    \tilde U \leftarrow T_{\alpha[i]} \tilde U T_{(1/\alpha)[i]},
    \tilde I \leftarrow T_{\alpha[i]} \tilde I,

thus associating the operation with the appropriate factor, in this case D̃. Such elementary row and column operations are repeated until Ĩ = Ir, at which point (12.4) has become the Gauss-Jordan decomposition (12.2).

12.3.3 The algorithm

Having motivated the Gauss-Jordan decomposition in § 12.3.1 and having proposed a basic method to pursue it in § 12.3.2, we shall now establish a definite, orderly, failproof algorithm to achieve it. Broadly, the algorithm

• copies A into the variable working matrix Ĩ (step 1 below);
• reduces Ĩ by suitable row (and maybe column) operations to unit upper triangular form (steps 2 through 7);
• establishes a rank r (step 8); and
• reduces the now unit triangular Ĩ further to the rank-r identity matrix Ir (steps 9 through 13).

Specifically, the algorithm decrees the following steps. (The steps as written include many parenthetical remarks, so many that some steps seem to consist more of parenthetical remarks than of actual algorithm. The remarks are unnecessary to execute the algorithm's steps as such. They are however necessary to explain and to justify the algorithm's steps to the reader.)

1. Begin by initializing

    \tilde P \leftarrow I,\ \tilde D \leftarrow I,\ \tilde L \leftarrow I,\ \tilde U \leftarrow I,\ \tilde K \leftarrow I,\ \tilde S \leftarrow I,
    \tilde I \leftarrow A,
    i \leftarrow 1,

where Ĩ holds the part of A remaining to be decomposed, where i is a row index, and where the others are the variable working matrices of (12.4). (Since A by definition is a dimension-limited m × n matrix, one naturally need not store A beyond the m × n active region. The other six variable working matrices each have extended-operational form, but they also confine their activity to well-defined regions: m × m for P̃, D̃, L̃ and Ũ; n × n for K̃ and S̃. One need store none of the matrices beyond these bounds. What is less clear until one has read the whole algorithm, but nevertheless true, is that one also need not store the dimension-limited Ĩ beyond the m × n active region.)

2. (Besides arriving at this point from step 1 above, the algorithm also reënters here from step 7 below. From step 1, Ĩ = A and L̃ = I, so this step 2 though logical seems unneeded. The need grows clear once one has read through step 7.) Observe that neither the ith row of Ĩ nor any row below it has an entry left of the ith column; that is, that Ĩ is all-zero below-leftward of and directly leftward of (though not directly below) the pivot element ĩii.5 Observe also that above the ith row, the matrix has proper unit upper triangular form (§ 11.8). Regarding the other factors, notice that L̃ enjoys the major partial unit triangular form L^{{i−1}} (§ 11.8.5) and that d̃kk = 1 for all k ≥ i. Pictorially (the ith row and column depicted at center),

    \tilde D = \operatorname{diag}\{\ldots, *, *, *, 1, 1, 1, \ldots\},

    \tilde L = L^{\{i-1\}} =
      \begin{bmatrix}
        \ddots &   &   &   &   &        \\
             * & 1 &   &   &   &        \\
             * & * & 1 &   &   &        \\
             * & * &   & 1 &   &        \\
             * & * &   &   & 1 &        \\
               &   &   &   &   & \ddots
      \end{bmatrix},
    \quad
    \tilde I =
      \begin{bmatrix}
        \ddots &   &   &   &   &        \\
               & 1 & * & * & * &        \\
               &   & 1 & * & * &        \\
               &   &   & * & * &        \\
               &   &   & * & * &        \\
               &   &   &   &   & \ddots
      \end{bmatrix},

where Ĩ is unit upper triangular above the ith row and arbitrary from the ith row down, though zero leftward of the ith column there.

5 The notation ĩii looks interesting, but this is accidental. The ĩ relates not to the doubled, subscribed index ii but to Ĩ. The notation ĩii thus means [Ĩ]ii; in other words, it means the current iith element of the variable working matrix Ĩ.

3. Choose a nonzero element ĩpq ≠ 0 on or below the pivot row, where p ≥ i and q ≥ i. (The easiest choice may simply be ĩii, if ĩii ≠ 0, but any nonzero element from the ith row downward can in general be chosen. Beginning students of the Gauss-Jordan or LU decomposition are conventionally taught to choose first the least possible q, then the least possible p. When one has no reason to choose otherwise, that is as good a choice as any. There is however no actual need to choose so. In fact alternate choices can sometimes improve practical numerical accuracy.6,7 Theoretically nonetheless, when doing exact arithmetic, the choice is quite arbitrary, so long as ĩpq ≠ 0.) If no nonzero element is available, if all remaining rows p ≥ i are now null, then skip directly to step 8.

6 A typical Intel or AMD x86-class computer processor represents a C/C++ double-type floating-point number, x = 2^p b, in 0x40 bits of computer memory. Of the 0x40 bits, 0x34 are for the number's mantissa 2.0 ≤ b < 4.0 (not 1.0 ≤ b < 2.0 as one might expect), 0xB are for the number's exponent −0x3FF ≤ p ≤ 0x3FE, and one is for the number's ± sign. (The mantissa's high-order bit, which is always 1, is implied not stored, thus is one neither of the 0x34 nor of the 0x40 bits.) The out-of-bounds exponents p = −0x400 and p = 0x3FF serve specially respectively to encode 0 and ∞. Such a floating-point representation is easily accurate enough for most practical purposes, but of course it is not generally exact. [38, § 1-4.2.2]
7 The Gauss-Jordan's floating-point errors come mainly from dividing by small pivots. Such errors are naturally avoided by avoiding small pivots, at least until as late in the algorithm as possible. Smallness however is relative: a small pivot in a row and a column each populated by even smaller elements is unlikely to cause as much error as is a large pivot in a row and a column each populated by even larger elements. The following heuristic if programmed intelligently might not be too computationally expensive: Define the pivot-smallness metric

    \eta_{pq}^2 \equiv
      \frac{2\,\tilde\imath_{pq}^* \tilde\imath_{pq}}
           {\sum_{p'=i}^{m} \tilde\imath_{p'q}^* \tilde\imath_{p'q}
            + \sum_{q'=i}^{n} \tilde\imath_{pq'}^* \tilde\imath_{pq'}}.

Choose the p and q of least ηpq. If two are equally least, then choose first the lesser column index q, then if necessary the lesser row index p.

4. Observing that (12.4) can be expanded to read

    A = \big(\tilde P T_{[p\leftrightarrow i]}\big)
        \big(T_{[p\leftrightarrow i]} \tilde D T_{[p\leftrightarrow i]}\big)
        \big(T_{[p\leftrightarrow i]} \tilde L T_{[p\leftrightarrow i]}\big)
        \big(T_{[p\leftrightarrow i]} \tilde U T_{[p\leftrightarrow i]}\big)
        \big(T_{[p\leftrightarrow i]} \tilde I T_{[i\leftrightarrow q]}\big)
        \big(T_{[i\leftrightarrow q]} \tilde K T_{[i\leftrightarrow q]}\big)
        \big(T_{[i\leftrightarrow q]} \tilde S\big)
      = \big(\tilde P T_{[p\leftrightarrow i]}\big) \tilde D
        \big(T_{[p\leftrightarrow i]} \tilde L T_{[p\leftrightarrow i]}\big) \tilde U
        \big(T_{[p\leftrightarrow i]} \tilde I T_{[i\leftrightarrow q]}\big) \tilde K
        \big(T_{[i\leftrightarrow q]} \tilde S\big),

let

    \tilde P \leftarrow \tilde P T_{[p\leftrightarrow i]},
    \tilde L \leftarrow T_{[p\leftrightarrow i]} \tilde L T_{[p\leftrightarrow i]},
    \tilde I \leftarrow T_{[p\leftrightarrow i]} \tilde I T_{[i\leftrightarrow q]},
    \tilde S \leftarrow T_{[i\leftrightarrow q]} \tilde S,

thus interchanging the pth with the ith row and the qth with the ith column, to bring the chosen element to the pivot position. (Refer to Table 12.1 for the similarity transformations. The Ũ and K̃ transformations disappear because at this stage of the algorithm, still Ũ = K̃ = I. The D̃ transformation disappears because p ≥ i and d̃kk = 1 for all k ≥ i. Regarding the L̃ transformation, it does not disappear, but L̃ has major partial unit triangular form L^{{i−1}}, which form according to Table 12.1 it retains since i − 1 < i ≤ p.)

5. Observing that (12.4) can be expanded to read

    A = \big(\tilde P \tilde D T_{\tilde\imath_{ii}[i]}\big)
        \big(T_{(1/\tilde\imath_{ii})[i]} \tilde L T_{\tilde\imath_{ii}[i]}\big)
        \big(T_{(1/\tilde\imath_{ii})[i]} \tilde U T_{\tilde\imath_{ii}[i]}\big)
        \big(T_{(1/\tilde\imath_{ii})[i]} \tilde I\big) \tilde K \tilde S
      = \big(\tilde P \tilde D T_{\tilde\imath_{ii}[i]}\big)
        \big(T_{(1/\tilde\imath_{ii})[i]} \tilde L T_{\tilde\imath_{ii}[i]}\big) \tilde U
        \big(T_{(1/\tilde\imath_{ii})[i]} \tilde I\big) \tilde K \tilde S,

normalize the new ĩii pivot by letting

    \tilde D \leftarrow \tilde D T_{\tilde\imath_{ii}[i]},
    \tilde L \leftarrow T_{(1/\tilde\imath_{ii})[i]} \tilde L T_{\tilde\imath_{ii}[i]},
    \tilde I \leftarrow T_{(1/\tilde\imath_{ii})[i]} \tilde I.

This forces ĩii = 1. It also changes the value of d̃ii. Pictorially after

. . . . . CHAPTER 12. 0 0 0 0 0 0 1 . . . It also ﬁlls in L’s ith column below the ı {i−1} form to the L{i} form. . . . . .. . . . . pivot. . 0 0 ∗ 0 0 0 0 .1. Observing that (12. . . RANK AND THE GAUSS-JORDAN ˜ D = . . . ∗ ∗ ∗ 1 ∗ ∗ ∗ . . . . . ∗ ∗ ∗ ∗ ∗ ∗ ∗ . .. . 1 0 0 0 0 0 0 . . . too. . . . . . . . . . . . . ∗ ∗ ∗ ∗ ∗ ∗ ∗ . because i − 1 < i. Refer to Table 12. . ··· ··· ··· ··· ··· ··· ··· . 0 0 0 ∗ 0 0 0 . . ı ˜ clear I’s ith column below the pivot by letting m ˜ ˜ L← L ˜ I← m p=i+1 T˜pi [pi] . . ∗ 1 0 0 0 0 0 . . . . . again it leaves L in the major partial {i−1} . . . advancing that matrix from the L ˜ T−˜pi [pi] I . ··· ··· ··· ··· ··· ··· ··· . ı p=i+1 ˜ This forces ˜ip = 0 for all p > i. 0 0 0 0 1 0 0 . . . . ˜ ˜ (Though the step changes L. 0 ∗ 0 0 0 0 0 . . ∗ ∗ 1 0 0 0 0 .. . .304 this step. ∗ ∗ ∗ ∗ ∗ ∗ ∗ . ··· ··· ··· ··· ··· ··· ··· ··· ··· ··· ··· ··· ··· ··· .4) can be expanded to read ˜˜ ˜ ı A = P D LT˜pi [pi] ˜ ı T−˜pi [pi] U T˜pi [pi] ı ˜ ˜˜ T−˜pi [pi] I K S ı ˜˜ ˜ ı ˜ ˜ ˜˜ = P D LT˜pi [pi] U T−˜pi [pi] I K S. . 0 0 0 0 0 1 0 . . . ı . ∗ 0 0 0 0 0 0 . . . . . . . .) unit triangular form L 6. . . ˜ I = . . . .. . . . .

Pictorially after this step, Ĩ is unit upper triangular through its ith column, with zeros below the ith pivot, and

    \tilde L = L^{\{i\}} =
      \begin{bmatrix}
        \ddots &   &   &   &   &        \\
             * & 1 &   &   &   &        \\
             * & * & 1 &   &   &        \\
             * & * & * & 1 &   &        \\
             * & * & * &   & 1 &        \\
               &   &   &   &   & \ddots
      \end{bmatrix}.

(Note that it is not necessary actually to apply the addition elementaries here one by one. Together they easily form an addition quasielementary L[i], thus can be applied all at once. See § 11.7.3.)

7. Increment

    i \leftarrow i + 1.

Go back to step 2.

8. Decrement

    i \leftarrow i - 1

to undo the last instance of step 7 (even if there never was an instance of step 7), thus letting i point to the matrix's last nonzero row. After decrementing, let the rank

    r \equiv i.

Notice that, certainly, r ≤ m and r ≤ n.

9. (Besides arriving at this point from step 8 above, the algorithm also reënters here from step 11 below.) If i = 0, then skip directly to step 12.

10. Observing that (12.4) can be expanded to read

    A = \tilde P \tilde D \tilde L
        \big(\tilde U T_{\tilde\imath_{pi}[pi]}\big)
        \big(T_{-\tilde\imath_{pi}[pi]} \tilde I\big) \tilde K \tilde S,

clear Ĩ's ith column above the pivot by letting

    \tilde U \leftarrow \tilde U \coprod_{p=1}^{i-1} T_{\tilde\imath_{pi}[pi]},
    \tilde I \leftarrow \Bigg[\prod_{p=1}^{i-1} T_{-\tilde\imath_{pi}[pi]}\Bigg] \tilde I.

This forces ĩpi = 0 for all p ≠ i. It also fills in Ũ's ith column above the pivot, advancing that matrix from the U^{{i+1}} form to the U^{{i}} form. Pictorially after this step,

    \tilde U = U^{\{i\}} =
      \begin{bmatrix}
        \ddots &   &   & * & * &        \\
               & 1 &   & * & * &        \\
               &   & 1 & * & * &        \\
               &   &   & 1 & * &        \\
               &   &   &   & 1 &        \\
               &   &   &   &   & \ddots
      \end{bmatrix},

and Ĩ's ith column matches the identity's. (As in step 6, here again it is not necessary actually to apply the addition elementaries one by one. Together they easily form an addition quasielementary U[i]. See § 11.7.3.)

11. Decrement i ← i − 1. Go back to step 9.

12. Notice that Ĩ now has the form of a rank-r identity matrix, except with n − r extra columns dressing its right edge (often r = n however; then there are no extra columns). Pictorially,

    \tilde I =
      \begin{bmatrix}
        \ddots &   &   &   & \vdots & \vdots &        \\
               & 1 &   &   & *      & *      &        \\
               &   & 1 &   & *      & *      &        \\
               &   &   & 1 & *      & *      &        \\
               &   &   &   & 0      & 0      &        \\
               &   &   &   &        &        & \ddots
      \end{bmatrix}.

Observing that (12.4) can be expanded to read

    A = \tilde P \tilde D \tilde L \tilde U
        \big(\tilde I T_{-\tilde\imath_{pq}[pq]}\big)
        \big(T_{\tilde\imath_{pq}[pq]} \tilde K\big) \tilde S,

use the now conveniently elementarized columns of Ĩ's main body to suppress the extra columns on its right edge by letting

    \tilde I \leftarrow \tilde I \coprod_{q=r+1}^{n} \coprod_{p=1}^{r} T_{-\tilde\imath_{pq}[pq]},
    \tilde K \leftarrow \Bigg[\prod_{q=r+1}^{n} \prod_{p=1}^{r} T_{\tilde\imath_{pq}[pq]}\Bigg] \tilde K.

(Actually, entering this step, it was that K̃ = I, so in fact K̃ becomes just the product above. As in steps 6 and 10, here again it is not necessary actually to apply the addition elementaries one by one. Together they easily form a parallel unit upper, not lower, triangular matrix L∥{r}T. See § 11.8.4.)

13. Notice now that Ĩ = Ir. Let

    P \equiv \tilde P,\ D \equiv \tilde D,\ L \equiv \tilde L,\ U \equiv \tilde U,\ K \equiv \tilde K,\ S \equiv \tilde S.

End.

Never stalling, the algorithm cannot fail to achieve Ĩ = Ir and thus a complete Gauss-Jordan decomposition of the form (12.2), though what value the rank r might turn out to have is not normally known to us in advance.

which does not belong to the algorithm’s main loop and has nothing to do with the availability of nonzero rows to step 3. This is unsurprising. Step 6 in the ith iteration adds downward multiples of the ith pivot row. To no row is ever added. the same rank r ≥ 0.1). r < m. (We have not yet proven. The action of step 6 then is to add multiples of the chosen pivot row downward only—that is. necessarily.308 CHAPTER 12. only if step 3 ﬁnds a null row i ≤ m.1. . that the algorithm always produces the same Ir . r ≤ n. Observe also however that the rank always fully reaches r = m if the rows of the original matrix A are linearly independent. but ﬁnd only rows which already include multiples of the ﬁrst pivot row. We can safely ignore this unproven fact however for the immediate moment. regardless of which pivots ˜pq = 0 ı one happens to choose in step 3 along the way.8 According to (12. How do we know that step 6 can never create a null row? We know this because the action of step 6 is to add multiples only of current and earlier pivot rows ˜ to rows in I which have not yet been on pivot. but before going to that extreme consider: The action of steps 3 and 4 is to choose a pivot row p ≥ i and to shift it upward to the ith position. only to rows which have not yet been on pivot. Therefore. r ≤ m. (12.4 Rank and independent rows Observe that the Gauss-Jordan algorithm of § 12. such action has no power to cancel the independent rows it targets. one can drown the assertion rigorously in symbols to prove it.3 operates always within the bounds of the original m × n matrix A. but according to § 12. Step 6 in the second iteration therefore adds downward multiples of the second pivot row. steps 3 and 4 in the second iteration ﬁnd no unmixed rows available to choose as second pivot. a multiple of itself—until step 10. directly or indirectly. The reason for this observation is that the rank can fall short. Indeed the narrative of the algorithm’s step 8 has already noticed the fact.5) The rank r exceeds the number neither of the matrix’s rows nor of its columns. RANK AND THE GAUSS-JORDAN the rank r might turn out to have is not normally known to us in advance.) 12. which already includes multiples of the ﬁrst through (i − 1)th.3. This being so. 8 If the truth of the sentence’s assertion regarding the action of step 6 seems nonobvious.3. such null rows never were linearly independent in the ﬁrst place). but step 3 can ﬁnd such a null row only if step 6 has created one (or if there were a null row in the original matrix A. but will in § 12. So it comes to pass that multiples only of current and earlier pivot rows are added to rows which have not yet been on pivot.5. which already includes a multiple of the ﬁrst pivot row.

12.3.5 Inverting the factors

Inverting the six Gauss-Jordan factors is easy. Sections 11.7 and 11.8 have shown how. Each of the six factors, P, D, L, U, K and S, is composed of a sequence \prod T of elementary operators. Each of the six inverse factors, P^{-1}, D^{-1}, L^{-1}, U^{-1}, K^{-1} and S^{-1}, is therefore composed of the reverse sequence \coprod T^{-1} of inverse elementary operators. If one merely records the sequence of elementaries used to build each of the six factors, then reverses each sequence, inverts each elementary, and multiplies, the six inverse factors result.

One need not however go even to that much trouble. One actually need not record the individual elementaries; one can invert, multiply and forget them in stream. This means starting the algorithm from step 1 with six extra variable working matrices (besides the seven already there):

    \tilde P^{-1} \leftarrow I;\ \tilde D^{-1} \leftarrow I;\ \tilde L^{-1} \leftarrow I;\ \tilde U^{-1} \leftarrow I;\ \tilde K^{-1} \leftarrow I;\ \tilde S^{-1} \leftarrow I.

(There is no Ĩ^{-1}, not because it would not be useful, but because its initial value would be9 A^{-1(r)}, unknown at algorithm's start.) Then, for each operation on any of P̃, D̃, L̃, Ũ, K̃ or S̃, one operates inversely on the corresponding inverse matrix. For example, in step 5,

    \tilde D \leftarrow \tilde D T_{\tilde\imath_{ii}[i]},
        \qquad \tilde D^{-1} \leftarrow T_{(1/\tilde\imath_{ii})[i]} \tilde D^{-1},
    \tilde L \leftarrow T_{(1/\tilde\imath_{ii})[i]} \tilde L T_{\tilde\imath_{ii}[i]},
        \qquad \tilde L^{-1} \leftarrow T_{(1/\tilde\imath_{ii})[i]} \tilde L^{-1} T_{\tilde\imath_{ii}[i]},
    \tilde I \leftarrow T_{(1/\tilde\imath_{ii})[i]} \tilde I.

With this simple extension, the algorithm yields all the factors not only of the Gauss-Jordan decomposition (12.2) but simultaneously also of the Gauss-Jordan's complementary form (12.3).

12.3.6 Truncating the factors

None of the six factors of (12.2) actually needs to retain its entire extended-operational form (§ 11.3.2).

9 Section 11.5 explains the notation.

The four factors on the left, row operators, act wholly by their m × m squares; the two on the right, column operators, by their n × n. Beyond these bounds, neither Ir nor A has anything but zeros anyway, so there is nothing for the six operators to act upon beyond those bounds in any event. We can truncate all six operators to dimension-limited forms (§ 11.3.1) for this reason if we want.

To truncate the six operators formally, we left-multiply (12.2) by Im and right-multiply it by In, obtaining

    I_m A I_n = I_m P D L U I_r K S I_n.

According to § 11.3.6, the Im and In respectively truncate rows and columns, actions which have no effect on A since it is already a dimension-limited m × n matrix. By successive steps, then, using (11.42) repeatedly,

    A = I_m P D L U I_r K S I_n = I_m^7 P D L U I_r^2 K S I_n^3,

and finally, grouping factors,

    A = (I_m P I_m)(I_m D I_m)(I_m L I_m)(I_m U I_r)(I_r K I_n)(I_n S I_n),    (12.6)

where the dimensionalities of the six factors on the equation's right side are respectively m × m, m × m, m × m, m × r, r × n and n × n. Equation (12.6) expresses any dimension-limited rectangular matrix A as the product of six particularly simple, dimension-limited rectangular factors.

By similar reasoning from (12.2),

    A = (I_m G_> I_r)(I_r G_< I_n),                                      (12.7)

where the dimensionalities of the two factors are m × r and r × n.

The book will seldom point it out again explicitly, but one can straightforwardly truncate not only the Gauss-Jordan factors but most other factors and operators, too, by the method of this subsection.10

10 Comes the objection, "Black, why do you make it more complicated than it needs to be? For what reason must all your matrices have infinite dimensionality, anyway? They don't do it that way in my other linear algebra book." It is a fair question. The answer is that this book is a book of applied mathematical theory; and theoretically in the author's view, infinite-dimensional matrices are significantly neater to handle. To append a null row or a null column to a dimension-limited matrix is to alter the matrix in no essential way, nor is there any real difference between T5[21] when it row-operates on a 3 × p matrix and the same elementary when it row-operates on a 4 × p. The relevant theoretical constructs ought to reflect such insights. Hence infinite dimensionality. The two matrix forms of § 11.3 manifest the sense that a matrix can represent a linear transformation, whose rank matters; or a reversible row or column operation, whose rank does not. The extended-operational form, having infinite rank, serves the latter case. In either case, the dimensionality m × n of the matrix is a distraction. It is the rank r, if any, that counts. Anyway, a matrix displaying an infinite field of zeros resembles a shipment delivering an infinite supply of nothing; one need not be too impressed with either.
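Equation (12.7)'s truncation amounts to a rank factorization, which the gauss_jordan sketch above can exhibit numerically. A minimal sketch of mine, borrowing the 3 × 3 matrix § 12.5 will consider, whose rank turns out to be 2:

    import numpy as np

    A = np.array([[5., 1., 6.],
                  [3., 6., 9.],
                  [2., 4., 6.]])
    P, D, L, U, r, K, S = gauss_jordan(A)
    Ir = np.zeros(A.shape); Ir[:r, :r] = np.eye(r)
    B = (P @ D @ L @ U @ Ir)[:, :r]      # Im G> Ir, truncated to m x r
    C = (Ir @ K @ S)[:r, :]              # Ir G< In, truncated to r x n
    assert np.allclose(B @ C, A)
    print(r)                             # 2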

12.3.7 Properties of the factors

One would expect such neatly formed operators as the factors of the Gauss-Jordan to enjoy some useful special properties. Indeed they do. Table 12.2 lists a few. The table's properties formally come from (11.52) and Table 11.5; but, if one firmly grasps the matrix forms involved and comprehends the notation (neither of which is trivial to do), and if one sketches the relevant factors schematically with a pencil, then the properties are plainly seen without reference to Ch. 11 as such, especially if one understands that the operator (In − Ir) is a truncator that selects columns r + 1 through n of the matrix it operates leftward upon.

Table 12.2: A few properties of the Gauss-Jordan factors.

    P^* = P^{-1} = P^T          S^* = S^{-1} = S^T
    P^{-*} = P = P^{-T}         S^{-*} = S = S^{-T}

    \frac{K + K^{-1}}{2} = I
    I_r K (I_n - I_r) = K - I = I_r (K - I)(I_n - I_r)
    I_r K^{-1} (I_n - I_r) = K^{-1} - I = I_r (K^{-1} - I)(I_n - I_r)
    (I - I_n)(K - I) = 0 = (K - I)(I - I_n)
    (I - I_n)(K^{-1} - I) = 0 = (K^{-1} - I)(I - I_n)

The table's properties regarding P and S express a general advantage all permutors share. The table's properties regarding K are admittedly less significant, included mostly only because § 13.3 will need them; even so, the K properties are always true, and they might find other uses. Further properties of the several Gauss-Jordan factors can be gleaned from the respectively relevant subsections of §§ 11.7 and 11.8.

12.3.8 Marginalizing the factor In

If A happens to be a square, n × n matrix and if it develops that the rank r = n, then one can take advantage of (11.31) to rewrite the Gauss-Jordan decomposition (12.2) in the form

    P D L U K S I_n = A = I_n P D L U K S,                               (12.8)

thus marginalizing the factor In. This is to express the Gauss-Jordan solely in row operations or solely in column operations. It does not change the algorithm and it does not alter the factors; it merely reorders the factors after the algorithm has determined them. It fails however if A is rectangular or r < n.

12.3.9 Decomposing an extended operator

Once one has derived the Gauss-Jordan decomposition, to extend it to decompose a reversible, n × n extended operator A (where per § 11.3.2 A outside the n × n active region resembles the infinite-dimensional identity matrix I) is trivial. One merely writes

    A = P D L U K S,

wherein the Ir has become an I. Or, if one prefers, one decomposes the n × n dimension-limited matrix In A = In A In = A In as

    A I_n = P D L U I_n K S = P D L U K S I_n,

from which, inasmuch as all the factors present but In are n × n extended operators, the preceding equation results. One can decompose only reversible extended operators so. The Gauss-Jordan fails on irreversible extended operators, for which the rank of the truncated form A In is r < n. See § 12.5.

This subsection's equations remain unnumbered because they say little new. Their only point, really, is that what an operator does outside some appropriately delimited active region is seldom interesting, because the vector on which the operator ultimately acts is probably null there in any event. In such a context it may not matter whether one truncates the operator. Indeed this was also the point of § 12.3.6 and, if you like, of (11.31), too.11

11 If "it may not matter," as the narrative says, then one might just put all matrices in dimension-limited form. Many books do. To put them all in dimension-limited form however brings at least three effects the book you are reading prefers to avoid. First, it leaves shift-and-truncate operations hard to express cleanly (refer to §§ 11.3.6 and 11.9 and, as a typical example of the usage, § 12.3.6). Second, it confuses the otherwise natural extension of discrete vectors into continuous functions. Third, it leaves one to consider the ranks of reversible operators like T[1↔2] that naturally should have no rank. The last of the three is arguably most significant: matrix rank is such an important attribute that one prefers to impute it only to those operators about which it actually says something interesting.

Regarding the present section as a whole, the Gauss-Jordan decomposition is a significant achievement. It is not the only matrix decomposition; further interesting decompositions include the Gram-Schmidt of § 13.11, the diagonal of § 14.6, the Schur of § 14.10 and the singular-value of § 14.12, among others. But the Gauss-Jordan nonetheless reliably factors an arbitrary m × n matrix A, which we had not known how to handle very well, into a product of unit triangular matrices and quasielementaries, which we do. We will put the Gauss-Jordan to good use in Ch. 13. However, before closing the present chapter we should like finally, squarely to define and to establish the concept of matrix rank, not only for Ir but for all matrices. To do that, we shall first need one more preliminary: the technique of vector replacement.

12.4 Vector replacement

Consider a set of m + 1 (not necessarily independent) vectors

    {u, a_1, a_2, ..., a_m}.

As a definition, the space these vectors address consists of all linear combinations of the set's several vectors. That is, the space consists of all vectors b formable as

    \beta_o u + \beta_1 a_1 + \beta_2 a_2 + \cdots + \beta_m a_m = b.    (12.9)

Now consider a specific vector v in the space,

    \psi_o u + \psi_1 a_1 + \psi_2 a_2 + \cdots + \psi_m a_m = v,        (12.10)

for which ψo ≠ 0. Solving (12.10) for u, we find that

    \frac{1}{\psi_o} v - \frac{\psi_1}{\psi_o} a_1 - \frac{\psi_2}{\psi_o} a_2 - \cdots - \frac{\psi_m}{\psi_o} a_m = u.

With the change of variables

    φo ← 1/ψo,
    φ1 ← −ψ1/ψo,
    φ2 ← −ψ2/ψo,
      ⋮
    φm ← −ψm/ψo,

for which, quite symmetrically, it happens that

    ψo = 1/φo,
    ψ1 = −φ1/φo,
    ψ2 = −φ2/φo,
      ⋮
    ψm = −φm/φo,

the solution is

    φo v + φ1 a1 + φ2 a2 + · · · + φm am = u.    (12.11)

Equation (12.11) has identical form to (12.10), only with the symbols u ↔ v and ψ ↔ φ swapped. Since φo = 1/ψo, assuming that ψo is finite it even appears that φo ≠ 0; so, the symmetry is complete. Table 12.3 summarizes.

Now further consider an arbitrary vector b which lies in the space addressed by the vectors

    {u, a1, a2, . . . , am}.

Does the same b also lie in the space addressed by the vectors

    {v, a1, a2, . . . , am}?

Table 12.3: The symmetrical equations of § 12.4.

    ψo u + ψ1 a1 + ψ2 a2 + · · · + ψm am = v      φo v + φ1 a1 + φ2 a2 + · · · + φm am = u
    0 ≠ 1/ψo = φo                                 0 ≠ 1/φo = ψo
    −ψ1/ψo = φ1                                   −φ1/φo = ψ1
    −ψ2/ψo = φ2                                   −φ2/φo = ψ2
        ⋮                                             ⋮
    −ψm/ψo = φm                                   −φm/φo = ψm

To show that it does, we substitute into (12.9) the expression for u from (12.11), obtaining the form

    (βo)(φo v + φ1 a1 + φ2 a2 + · · · + φm am) + β1 a1 + β2 a2 + · · · + βm am = b.

Collecting terms, this is

    βo φo v + (βo φ1 + β1) a1 + (βo φ2 + β2) a2 + · · · + (βo φm + βm) am = b,

in which we see that, yes, b does indeed also lie in the latter space. Naturally the problem's u ↔ v symmetry then guarantees the converse, that an arbitrary vector b which lies in the latter space also lies in the former. Therefore, a vector b must lie in both spaces or neither, never just in one or the other. The two spaces are, in fact, one and the same.

This leads to the following useful conclusion. Given a set of vectors

    {u, a1, a2, . . . , am},

one can safely replace the u by a new vector v, obtaining the new set

    {v, a1, a2, . . . , am},

provided that the replacement vector v includes at least a little of the replaced vector u (ψo ≠ 0 in eqn. 12.10) and that v is otherwise an honest linear combination of the several vectors of the original set, untainted by foreign contribution. Such vector replacement does not in any way alter the space addressed. The new space is exactly the same as the old.
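Readers who like to check such conclusions numerically can do so in a few lines of code. The minimal sketch below assumes Python with the numpy library; the vectors u, a1, a2 and the coefficients ψ are arbitrarily contrived for the illustration. It verifies that the old set and the new set address the same space by comparing ranks.

    import numpy as np

    u  = np.array([1., 0., 2.])
    a1 = np.array([0., 1., 1.])
    a2 = np.array([3., 1., 0.])

    psi_o, psi_1, psi_2 = 2., -1., 4.        # psi_o nonzero, as (12.10) requires
    v = psi_o*u + psi_1*a1 + psi_2*a2        # v contains at least a little of u

    old = np.column_stack([u, a1, a2])
    new = np.column_stack([v, a1, a2])

    # Equal ranks, and no rank gained by pooling the two sets together:
    # the two addressed spaces coincide.
    print(np.linalg.matrix_rank(old),
          np.linalg.matrix_rank(new),
          np.linalg.matrix_rank(np.column_stack([old, new])))   # 3 3 3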

As a corollary, if the vectors of the original set happen to be linearly independent (§ 12.1), then the vectors of the new set are linearly independent, too. For, if it were that

    γo v + γ1 a1 + γ2 a2 + · · · + γm am = 0

for nontrivial γo and γk, then either γo = 0, impossible since that would make the several ak themselves linearly dependent, or γo ≠ 0, in which case v would be a linear combination of the several ak alone. But if v were a linear combination of the several ak alone, then (12.10) would still also explicitly make v a linear combination of the same ak plus a nonzero multiple of u. Yet both combinations cannot be, because according to § 12.1, two distinct combinations of independent vectors can never target the same v. Either way, the contradiction proves false the assumption which gave rise to it: that the vectors of the new set were linearly dependent. Hence the vectors of the new set are equally as independent as the vectors of the old.

12.5 Rank

Sections 11.3.5 and 11.3.6 have introduced the rank-r identity matrix Ir, where the integer r is the number of ones the matrix has along its main diagonal. Other matrices have rank, too. Commonly, an n × n matrix has rank r = n, but consider the matrix

    ⎡ 5  1  6 ⎤
    ⎢ 3  6  9 ⎥
    ⎣ 2  4  6 ⎦

The third column of this matrix is the sum of the first and second columns. Also, the third row is just two-thirds the second. Either way, by columns or by rows, the matrix has only two independent vectors. The rank of this 3 × 3 matrix is not r = 3 but r = 2.

This section establishes properly the important concept of matrix rank. The section demonstrates that every matrix has a definite, unambiguous rank, and shows how this rank can be calculated.
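A quick numerical check of the last claim, assuming Python with the numpy library (the matrix is the one just displayed):

    import numpy as np

    A = np.array([[5., 1., 6.],
                  [3., 6., 9.],
                  [2., 4., 6.]])

    print(np.linalg.matrix_rank(A))       # 2, not 3
    print(A[:, 0] + A[:, 1] - A[:, 2])    # zeros: third column = first + second
    print((2./3.)*A[1, :] - A[2, :])      # zeros: third row = two-thirds the second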

To forestall potential confusion in the matter, we should immediately observe that, like the rest of this chapter but unlike some other parts of the book, this section explicitly trades in exact numbers. If a matrix element here is 5, then it is exactly 5; if 0, then exactly 0. Many real-world matrices, of course, especially matrices populated by measured data, can never truly be exact, but that is not the point here. Here, the numbers are exact.¹²

12.5.1 A logical maneuver

In § 12.5.2 we will execute a pretty logical maneuver, which one might name, "the end justifies the means."¹³ It is a subtle maneuver, and the maneuver if unexpected can confuse, so this subsection is to prepare the reader to expect the maneuver.

The logical maneuver follows this basic pattern.

    If P1 then Q. If P2 then Q. If P3 then Q. Which of P1, P2 and P3 is true is
    not known, but what is known is that at least one of the three is true: P1 or
    P2 or P3. Therefore, although one can draw no valid conclusion regarding any
    one of the three predicates P1, P2 and P3, one can still conclude that their
    common object Q is true.

One valid way to prove Q, then, would be to suppose P1 and show that it led to Q, then alternately to suppose P2 and show that it separately led to Q, then again to suppose P3 and show that it also led to Q. The final step would be to show somehow that P1, P2 and P3 could not possibly all be false at once. Herein, the means is to assert several individually suspicious claims, none of which one actually means to prove; the end which justifies the means is the conclusion Q, which thereby one can and does prove.

Once the reader feels that he grasps its logic, he can proceed to the next subsection where the maneuver is put to use.

¹² It is false to suppose that because applied mathematics permits imprecise quantities, like 3.0 ± 0.1 inches for the length of your thumb, it also requires them. On the contrary, the length of your thumb may indeed be 3.0 ± 0.1 inches, but surely no triangle has 3.0 ± 0.1 sides! A triangle has exactly three sides. The ratio of a circle's circumference to its radius is exactly 2π. The author has exactly one brother. A construction contract might require the builder to finish within exactly 180 days (though the actual construction time might be an inexact t = 172.6 ± 0.2 days). Exact quantities are every bit as valid in applied mathematics as imprecise ones are. Where the distinction matters, it is the applied mathematician's responsibility to distinguish between the two kinds of quantity.

¹³ The maneuver's name rings a bit sinister, does it not? The author recommends no such maneuver in social or family life! Logically here however it helps the math.
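If the reader mistrusts the pattern, a brute-force check over all sixteen truth assignments settles it. The sketch below assumes Python; the function name premises is of course merely illustrative.

    from itertools import product

    # Premises: (P1 or P2 or P3), and each Pi separately implies Q.
    def premises(p1, p2, p3, q):
        return (p1 or p2 or p3) and all((not p) or q for p in (p1, p2, p3))

    # In every truth assignment satisfying the premises, Q is true,
    # though which of the Pi holds remains unknown.
    assert all(q for p1, p2, p3, q in product([False, True], repeat=4)
               if premises(p1, p2, p3, q))
    print("Q holds in every model of the premises.")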

12.5.2 The impossibility of identity-matrix promotion

Consider the matrix equation

    A Ir B = Is.    (12.12)

If r ≥ s, then it is trivial to find matrices A and B for which (12.12) holds: A = Is = B, among others. If r < s, however, it is not so easy. In fact it is impossible. This subsection proves the impossibility. It shows that one cannot by any row and column operations, reversible or otherwise, ever transform an identity matrix into another identity matrix of greater rank (§ 11.3.5).

Equation (12.12) can be written in the form

    (A Ir) B = Is,    (12.13)

where, because Ir attacking from the right is the column truncation operator (§ 11.3.6), the product A Ir is a matrix with an unspecified number of rows but only r columns; or, more precisely, with no more than r nonzero columns. Viewed this way, per § 11.1.3, B operates on the r columns of A Ir to produce the s columns of Is.

The r columns of A Ir are nothing more than the first through rth columns of A. Let the symbols a1, a2, a3, a4, a5, . . . , ar denote these columns. The s columns of Is, then, are nothing more than the elementary vectors e1, e2, e3, e4, e5, . . . , es (§ 11.3.7). The claim (12.13) makes is thus that the several vectors ak together address each of the several elementary vectors ej; that is, that a linear combination¹⁴

    b1j a1 + b2j a2 + b3j a3 + · · · + brj ar = ej    (12.14)

exists for each ej, 1 ≤ j ≤ s.

The claim (12.14) will turn out to be false because there are too many ej, but to prove this, we shall assume for the moment that the claim were true. The proof then is by contradiction,¹⁵ and it runs as follows.

¹⁴ Observe that unlike as in § 12.1, here we have not necessarily assumed that the several ak are linearly independent.

¹⁵ As the reader will have observed by this point in the book, the technique, also called reductio ad absurdum, is the usual mathematical technique to prove impossibility. One assumes the falsehood to be true, then reasons toward a contradiction which proves the assumption false. Section 6.1.1 among others has already illustrated the technique, but the technique's use here is more sophisticated.

Consider the elementary vector e1. For j = 1, eqn. (12.14) is

    b11 a1 + b21 a2 + b31 a3 + · · · + br1 ar = e1,

which says that the elementary vector e1 is a linear combination of the several vectors

    {a1, a2, a3, a4, a5, . . . , ar}.

Because e1 is a linear combination, and inasmuch as e1 is nonzero, according to § 12.4 one can safely replace any of the vectors in the set by e1 without altering the space addressed. (For example, replacing a1 by e1, the new set would be {e1, a2, a3, a4, a5, . . . , ar}.)

The only restriction per § 12.4 is that e1 contain at least a little of the vector ak it replaces, that is, that bk1 ≠ 0. Of course there is no guarantee specifically that b11 ≠ 0, so for e1 to replace a1 might not be allowed. However, inasmuch as e1 is nonzero, then according to (12.14) at least one of the several bk1 also is nonzero; and if bk1 is nonzero then e1 can replace ak. Some of the ak might indeed be forbidden, but never all; there is always at least one ak which e1 can replace. (For example, if a1 were forbidden because b11 = 0, then a3 might be available instead because b31 ≠ 0.)

Here comes the hard part. Here is where the logical maneuver of § 12.5.1 comes in. The book to this point has established no general method to tell which of the several ak the elementary vector e1 actually contains (§ 13.2 gives the method, but that section depends logically on this one, so we cannot properly appeal to it here). According to (12.14), the vector e1 might contain some of the several ak or all of them, but surely it contains at least one of them. Therefore, even though it is illicit to replace an ak by an e1 which contains none of it, and even though replacing the wrong ak logically invalidates any conclusion which flows from the replacement, still we can proceed with the proof, provided only that, in the end, we shall find that all the choices, licit and illicit, lead ultimately alike to the same, identical conclusion. Whether we shall so find, that the illicit choice of replacement and the licit choice had led alike to the same, identical conclusion, remains to be determined. If we do so find then, in the end, the logic will demand of us only an assurance that some licit choice had existed at the time the choice was or might have been made. The logic will never ask, even in retrospect, which specific choice had been the licit one, for only the complete absence of licit choices can threaten the present maneuver. The claim (12.14) guarantees at least one licit choice, even though we have no idea which of the several ak the vector e1 contains.

Now consider the elementary vector e2. According to (12.14), e2 lies in the space addressed by the original set

    {a1, a2, a3, a4, a5, . . . , ar}.

Therefore, e2 also lies in the space addressed by the new set

    {e1, a2, a3, a4, a5, . . . , ar}

(or {a1, a2, e1, a4, a5, . . . , ar}, or whatever the new set happens to be), which, as we have reasoned, addresses precisely the same space as did the original set. That is, not only do coefficients bk2 exist such that

    b12 a1 + b22 a2 + b32 a3 + · · · + br2 ar = e2,

but also coefficients βk2 exist such that

    β12 e1 + β22 a2 + β32 a3 + · · · + βr2 ar = e2.

Again it is impossible for all the coefficients βk2 to be zero; but, moreover, it is impossible for β12 to be the sole nonzero coefficient, for (as should seem plain to the reader who grasps the concept of the elementary vector, § 11.3.7) no elementary vector can ever be a linear combination of other elementary vectors alone! The linear combination which forms e2 evidently includes a nonzero multiple of at least one of the remaining ak. At least one of the βk2 attached to an ak (not β12, which is attached to e1) must be nonzero. Therefore by the same reasoning as before, we now choose an ak with a nonzero coefficient βk2 ≠ 0 and replace it by e2, obtaining an even newer set of vectors like

    {e1, a2, a3, e2, a5, . . . , ar}.

This newer set addresses precisely the same space as the previous set, and thus also as the original set.

And so it goes, replacing one ak by an ej at a time, until all the ak are gone and our set has become

    {e1, e2, e3, e4, e5, . . . , er},

which addresses precisely the same space as did the original set

    {a1, a2, a3, a4, a5, . . . , ar}.

And this is the one, identical conclusion the maneuver of § 12.5.1 has demanded. All intermediate choices, by various paths licit and illicit, ultimately have led alike to the single conclusion of this paragraph, which thereby is properly established: by making exactly r replacements, one converts the set {a1, a2, a3, . . . , ar} into {e1, e2, e3, . . . , er} without altering the space the set addresses.

Admittedly, such subtle logic may not be easy to discern. Here is another, slightly different light by which to illuminate the question. Suppose again that we wish per § 12.4 to convert the set

    {a1, a2, a3, . . . , ar} into {e1, e2, e3, . . . , er},

without altering the space addressed, assuming again per § 12.4 that the several vectors ak of the original set, taken together, address each of the elementary vectors ej, 1 ≤ j ≤ s, r < s. To reach our goal, first we will put the elementary vector e1 in the place of one of the several vectors ak, then we will put e2 in the place of one of the remaining ak, and so on until, at last, we put er in the place of the last remaining ak. We will give the several ej, 1 ≤ j ≤ r, in their proper order, but we might take the several ak in any of r! distinct sequences: for instance, in the case that r = 3, we might take the several ak in any of the 3! = 6 distinct sequences

    (a1, a2, a3), (a1, a3, a2), (a2, a1, a3), (a2, a3, a1), (a3, a1, a2), (a3, a2, a1);

except however that we might (or might not) find certain sequences blockaded in the event. Blockaded? Well, consider for example the sequence (a2, a3, a1); and suppose that

    e1 = 0a1 + 4a2 − 2a3 and e2 = 5a1 − (1/2)e1 + 0a3

(noticing that the latter already has e1 on the right instead of a2): in this case the sequence (a2, a3, a1) is blockaded, which is to say, forbidden, because, once e1 has replaced a2, e2 cannot according to § 12.4 replace a3, since e2 then contains none of a3. [Actually, both sequences beginning (a1, . . .) are blockaded in this example, too, because the turn of e1 comes first and, at that time, e1 contains none of a1.] Clear? No? Too many subscripts? Well, there's nothing for it: if you wish to understand then you will simply have to trace all the subscripts out with your pencil; the example cannot be made any simpler.
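One can also trace the subscripts with a computer. The sketch below assumes Python with the numpy library and contrives concrete vectors a1, a2, a3 (the particular values are assumptions of the illustration, chosen to realize the coefficients of the example just given); it compares ranks to show that the licit replacements preserve the addressed space, whereas the blockaded replacement would collapse it.

    import numpy as np

    def rank(*cols):
        return np.linalg.matrix_rank(np.column_stack(cols))

    e1 = np.array([1., 0., 0.])
    e2 = np.array([0., 1., 0.])

    # Contrived so that e1 = 0*a1 + 4*a2 - 2*a3 and e2 = 5*a1 - (1/2)*e1 + 0*a3.
    a3 = np.array([0., 0., 1.])
    a2 = (e1 + 2.*a3)/4.
    a1 = (e2 + e1/2.)/5.

    print(rank(a1, a2, a3))   # 3: the original set addresses all of R^3
    print(rank(a1, e1, a3))   # 3: e1 licitly replaces a2 (its coefficient 4 is nonzero)
    print(rank(e2, e1, a3))   # 3: e2 then licitly replaces a1 (its coefficient 5 is nonzero)
    print(rank(a1, e1, e2))   # 2: had e2 replaced a3 instead, the space would collapse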

Now, although some sequences might be blockaded, no unblockaded sequence can run to a dead end, so to speak. After each unblockaded replacement another replacement will always be possible. The reason is as before: that, so long as each elementary vector ej in turn contains some of the vector it replaces, the replacement cannot alter the space the set addresses; that the space by initial assumption includes all the elementary vectors; that no elementary vector can be composed solely of other elementary vectors; and, consequently, that each elementary vector in turn must be found to contain at least one of the set's then remaining ak. The conclusion is that we can indeed convert {a1, a2, a3, . . . , ar} into {e1, e2, e3, . . . , er}, step by step, without altering the space addressed.

The conclusion leaves us with a problem, however. There remain more ej, 1 ≤ j ≤ s, than there are ak, 1 ≤ k ≤ r, because, as we have stipulated, r < s. Some elementary vectors ej, r < j ≤ s, are evidently left over. Back at the beginning of the section, the claim (12.14) made was that

    {a1, a2, a3, a4, a5, . . . , ar}

together addressed each of the several elementary vectors ej, 1 ≤ j ≤ s. But as we have seen, this amounts to a claim that

    {e1, e2, e3, e4, e5, . . . , er}

together addressed each of the several elementary vectors ej, too, even the left-over ej, r < j ≤ s. Plainly this is impossible with respect to the left-over ej, because no elementary vector can be composed solely of other elementary vectors. The contradiction proves false the claim which gave rise to it. The false claim: that the several ak, 1 ≤ k ≤ r, together addressed all the ej, 1 ≤ j ≤ s, even when r < s.

Hence finally we conclude that no matrices A and B exist which satisfy (12.12) when r < s. Equation (12.13), which is just (12.12) written differently, asserts that B is a column operator which does precisely what we have just shown impossible: to combine the r columns of A Ir to yield the s columns of Is, the latter of which are just the elementary vectors e1, e2, e3, e4, e5, . . . , es. In other words, we conclude that although row and column operations can demote identity matrices in rank, they can never promote them. The promotion of identity matrices is impossible. The logic, though slightly complicated, is nonetheless inescapable.

12.5.3 General matrix rank and its uniqueness

Step 8 of the Gauss-Jordan algorithm (§ 12.3.3) discovers a rank r for any matrix A. One should like to think that this rank r were a definite property of the matrix itself rather than some unreliable artifact of the algorithm, but until now we have lacked the background theory to prove it. Now we have the theory. Here is the proof.

The proof begins with a formal definition of the quantity whose uniqueness we are trying to prove.

• The rank r of an identity matrix Ir is the number of ones along its main diagonal. (This is from § 11.3.5.)

• The rank r of a general matrix A is the rank of an identity matrix Ir to which A can be reduced by reversible row and column operations. A matrix A has rank r if and only if matrices B> and B< exist such that

    B> A B< = Ir,
    A = B>⁻¹ Ir B<⁻¹,
    B>⁻¹ B> = I = B> B>⁻¹,    (12.15)
    B<⁻¹ B< = I = B< B<⁻¹.

The question is whether in (12.15) only a single rank r is possible.

To answer the question, we suppose that another rank were possible, that A had not only rank r but also rank s. Then,

    A = B>⁻¹ Ir B<⁻¹,
    A = G>⁻¹ Is G<⁻¹.

Combining these equations,

    B>⁻¹ Ir B<⁻¹ = G>⁻¹ Is G<⁻¹.

Solving first for Ir, then for Is,

    (B> G>⁻¹) Is (G<⁻¹ B<) = Ir,
    (G> B>⁻¹) Ir (B<⁻¹ G<) = Is.

Were it that r ≠ s, then one of these two equations would constitute the demotion of an identity matrix and the other, a promotion. But according to § 12.5.2 and its (12.12), promotion is impossible. Therefore r ≠ s is also impossible, and

    r = s

is guaranteed. No matrix has two different ranks. Matrix rank is unique.

This finding has two immediate implications:

• Reversible row and/or column operations exist to change any matrix of rank r to any other matrix of the same rank. The reason is that, according to (12.15), reversible operations exist to change both matrices to Ir and back.

• No reversible operation can change a matrix's rank.
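A numerical illustration of the implications, assuming Python with the numpy library (the random integer operators merely stand in for the B> and B< of the narrative):

    import numpy as np

    rng = np.random.default_rng(0)

    A = np.array([[5., 1., 6.],
                  [3., 6., 9.],
                  [2., 4., 6.]])        # the rank-2 matrix from the head of the section

    # Draw random integer operators until both are reversible. (For an integer
    # matrix the determinant is an integer, so |det| > 0.5 means det is nonzero.)
    while True:
        Bg = rng.integers(-3, 4, (3, 3)).astype(float)
        Bl = rng.integers(-3, 4, (3, 3)).astype(float)
        if abs(np.linalg.det(Bg)) > 0.5 and abs(np.linalg.det(Bl)) > 0.5:
            break

    print(np.linalg.matrix_rank(A))            # 2
    print(np.linalg.matrix_rank(Bg @ A @ Bl))  # still 2: reversible operations
                                               # cannot change a matrix's rank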

The discovery that every matrix has a single, unambiguous rank and the establishment of a failproof algorithm, the Gauss-Jordan, to ascertain that rank have not been easy to achieve, but they are important achievements nonetheless, worth the effort thereto. The reason these achievements matter is that the mere dimensionality of a matrix is a chimerical measure of the matrix's true size, as for instance for the 3 × 3 example matrix at the head of the section. Matrix rank by contrast is an entirely solid, dependable measure. We will rely on it often. Section 12.5.8 comments further.

12.5.4 The full-rank matrix

According to (12.5), the rank r of a matrix can exceed the number neither of the matrix's rows nor of its columns. The greatest rank possible for an m × n matrix is the lesser of m and n. A full-rank matrix, then, is defined to be an m × n matrix with maximum rank r = m or r = n; or, if m = n, both. A matrix of less than full rank is a degenerate matrix.

Consider a tall m × n matrix C, m ≥ n, one of whose n columns is a linear combination (§ 12.1) of the others. One could by definition target the dependent column with addition elementaries, using multiples of the other columns to wipe the dependent column out. Having zeroed the dependent column, one could then interchange it over to the matrix's extreme right, effectively throwing the column away, shrinking the matrix to m × (n − 1) dimensionality. Shrinking the matrix necessarily also shrinks the bound on the matrix's rank to r ≤ n − 1, which is to say, to r < n. But the shrink, done by reversible column operations, is itself reversible, by which § 12.5.3 binds the rank of the original, m × n matrix C likewise to r < n. The matrix C, one of whose columns is a linear combination of the others, is necessarily degenerate for this reason.

Now consider a tall matrix A with the same m × n dimensionality, but with a full n independent columns. The transpose Aᵀ of such a matrix has a full n independent rows. One of the conclusions of § 12.3.4 was that a matrix of independent rows always has rank equal to the number of rows. Since Aᵀ is such a matrix, its rank is a full r = n. But formally, what this says is that there exist operators B<ᵀ and B>ᵀ such that In = B<ᵀ Aᵀ B>ᵀ, the transpose of which equation is B> A B< = In, which in turn says that not only Aᵀ but also A itself has full rank r = n.

Parallel reasoning rules the rows of broad matrices, m ≤ n. To square matrices, m = n, both lines of reasoning apply.
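The column-wiping argument can be watched in action. A minimal sketch, assuming Python with the numpy library and a contrived 4 × 3 matrix whose third column depends on the other two:

    import numpy as np

    C = np.array([[1., 0., 1.],
                  [2., 1., 3.],
                  [0., 1., 1.],
                  [1., 1., 2.]])     # tall 4 x 3; third column = first + second

    C2 = C.copy()
    C2[:, 2] -= C2[:, 0] + C2[:, 1]  # addition elementaries wipe the dependent column out
    print(C2[:, 2])                  # all zeros: the column was indeed dependent
    print(np.linalg.matrix_rank(C))        # 2 < n = 3: C is degenerate
    print(np.linalg.matrix_rank(C[:, :2])) # 2: throwing the zeroed column away loses nothing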

Gathering findings, we have that

• a tall m × n matrix, m ≥ n, has full rank if and only if its columns are linearly independent;

• a broad m × n matrix, m ≤ n, has full rank if and only if its rows are linearly independent;

• a square n × n matrix, m = n, has full rank if and only if its columns and/or its rows are linearly independent; and

• a square matrix has both independent columns and independent rows, or neither; never just one or the other.

To say that a matrix has full column rank is to say that it is tall or square and has full rank r = n ≤ m. To say that a matrix has full row rank is to say that it is broad or square and has full rank r = m ≤ n. Only a square matrix can have full column rank and full row rank at the same time, because a tall or broad matrix cannot but include, respectively, more rows or more columns than Ir.

12.5.5 Underdetermined and overdetermined linear systems (introduction)

The last paragraph of § 12.5.4 provokes yet further terminology. A linear system Ax = b is underdetermined if A lacks full column rank, that is, if r < n, because inasmuch as some of A's columns then depend linearly on the others such a system maps multiple n-element vectors x to the same m-element vector b, meaning that knowledge of b does not suffice to determine x uniquely. Complementarily, a linear system Ax = b is overdetermined if A lacks full row rank, that is, if r < m. If A lacks both, then the system is paradoxically both underdetermined and overdetermined and is thereby degenerate. If A happily has both, then the system is exactly determined.

Section 13.2 solves the exactly determined linear system. Section 13.4 solves the nonoverdetermined linear system. Section 13.6 analyzes the unsolvable overdetermined linear system among others. Further generalities await Ch. 13; but, regarding the overdetermined system specifically, the present subsection would observe at least the few following facts.

An overdetermined linear system Ax = b cannot have a solution for every possible m-element driving vector b.
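Before the claim is proven below, one can witness it numerically. The sketch assumes Python with the numpy library; the 3 × 2 matrix and the random driving vector are contrived for the illustration.

    import numpy as np

    rng = np.random.default_rng(1)

    A = np.array([[1., 2.],
                  [2., 4.],
                  [3., 5.]])        # m x n = 3 x 2 with rank r = 2 < m: overdetermined
    b = rng.standard_normal(3)      # an unrestricted driving vector

    x, residual, rank, _ = np.linalg.lstsq(A, b, rcond=None)
    print(rank)                     # 2
    print(residual)                 # nonzero in general: no x satisfies Ax = b exactly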

The truth of this claim can be seen by decomposing the system's matrix A by Gauss-Jordan and then left-multiplying the decomposed system by G>⁻¹ to reach the form

    Ir G< x = G>⁻¹ b.

If the m-element vector c ≡ G>⁻¹ b, then Ir G< x = c, which is impossible unless the last m − r elements of c happen to be zero. But since G> is invertible, each b corresponds to a unique c and vice versa; so, if b is an unrestricted m-element vector then so also is c, which verifies the claim.

Complementarily, a nonoverdetermined linear system Ax = b does have a solution for every possible m-element driving vector b. This is so because in this case the last m − r elements of c do happen to be zero; or, better stated, because c in this case has no nonzeros among its last m − r elements, because it has no last m − r elements, for the trivial reason that r = m.

It is an analytical error, and an easy one innocently to commit, to require that

    Ax = b

for unrestricted b when A lacks full row rank. The error is easy to commit because the equation looks right, because such an equation is indeed valid over a broad domain of b and might very well have been written correctly in that context, only not in the context of unrestricted b. Analysis including such an error can lead to subtly absurd conclusions. It is never such an analytical error however to require that

    Ax = 0

because, whatever other solutions such a system might have, it has at least the solution x = 0.

12.5.6 The full-rank factorization

One sometimes finds dimension-limited matrices of less than full rank inconvenient to handle. However, every dimension-limited, m × n matrix of rank r can be expressed as the product of two full-rank matrices, one m × r and the other r × n, both also of rank r:

    A = BC.    (12.16)

The truncated Gauss-Jordan (12.7) constitutes one such full-rank factorization: B = Im G> Ir, C = Ir G< In, good for any matrix. Other full-rank factorizations are possible, including among others the truncated Gram-Schmidt (13.56). The full-rank factorization is not unique.¹⁶

Of course, if an m × n matrix already has full rank r = m or r = n, then the full-rank factorization is trivial: A = Im A or A = A In.

Section 13.6.4 uses the full-rank factorization.

¹⁶ [7, § 3.3][58, "Moore-Penrose generalized inverse"]
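A full-rank factorization is easily computed in practice. The sketch below assumes Python with the sympy library and uses the pivot-column/row-echelon factorization, one of the "other full-rank factorizations" just mentioned, rather than the book's truncated Gauss-Jordan (12.7) itself.

    import sympy as sp

    A = sp.Matrix([[5, 1, 6],
                   [3, 6, 9],
                   [2, 4, 6]])        # the rank-2 matrix of § 12.5

    R, pivots = A.rref()               # reduced row echelon form and pivot columns
    r = len(pivots)
    B = A[:, list(pivots)]             # m x r: the pivot columns of A
    C = R[:r, :]                       # r x n: the nonzero rows of the echelon form

    assert B * C == A                  # A = BC, a full-rank factorization
    print(B.rank(), C.rank())          # 2 and 2: both factors have full rank r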

12.5.7 Full column rank and the Gauss-Jordan factors K and S

The Gauss-Jordan decomposition (12.2),

    A = PDLU Ir KS,

of a tall or square m × n matrix A of full column rank r = n ≤ m always finds the factor K = I; moreover, if one happens always to choose q = i as pivot column, then it finds S = I, too.

That K = I is seen by the algorithm's step 12, which creates K. Step 12 nulls the spare columns q > r that dress Ĩ's right, but in this case Ĩ has only r columns and therefore has no spare columns to null. Hence step 12 does nothing and K = I.

That S = I comes immediately of choosing q = i for pivot column during each iterative instance of the algorithm's step 3. But, one must ask, can one choose so? What if column q = i were unusable? That is, what if the only nonzero elements remaining in Ĩ's ith column stood above the main diagonal, unavailable for step 4 to bring to pivot? Well, were it so, then one would indeed have to choose q ≠ i to swap the unusable column away rightward, but see: nothing in the algorithm later fills such a column's zeros with anything else (they remain zeros), so swapping the column away rightward could only delay the crisis. The column would remain unusable. Eventually the column would reappear on pivot when no usable column rightward remained available to swap it with, which contrary to our assumption would mean precisely that r < n. Such contradiction can only imply that if r = n then no unusable column can ever appear. One need not swap, then. We conclude that though one might voluntarily choose q ≠ i during the algorithm's step 3, the algorithm cannot force one to do so if r = n. Yet if one always does choose q = i, as the full-column-rank matrix A evidently leaves one free to do, then indeed S = I.

Theoretically, the Gauss-Jordan decomposition (12.2) includes the factors K and S precisely to handle matrices with more columns than rank. Matrices of full column rank r = n, common in applications, by definition have no such problem. Therefore, the Gauss-Jordan decomposition theoretically needs no K or S for such matrices, which fact lets us abbreviate the decomposition for such matrices to read

    A = PDLU In.    (12.17)

Observe however that just because one theoretically can set S = I does not mean that one actually should. The column permutor S exists to be used, after all, especially numerically to avoid small pivots during early invocations of the algorithm's step 5. Equation (12.17) is not mandatory but optional for a matrix A of full column rank (though still r = n and thus K = I for such a matrix, even when the unabbreviated eqn. 12.2 is used). There are however times when it is nice to know that one theoretically could, if doing exact arithmetic, set S = I if one wanted to.

Since PDLU acts as a row operator, (12.17) implies that each row of the full-rank matrix A lies in the space the rows of In address. This is obvious and boring, but interesting is the converse implication of (12.17)'s complementary form,

    U⁻¹ L⁻¹ D⁻¹ P⁻¹ A = In,

that each row of In lies in the space the rows of A address. The rows of In and the rows of A evidently address the same space. One can moreover say the same of A's columns, since B = Aᵀ has full rank just as A does. In the whole, if a matrix A is square and has full rank r = n, then A's columns together, A's rows together, In's columns together and In's rows together each address the same, complete n-dimensional space.

12.5.8 The significance of rank uniqueness

The result of § 12.5.3, that matrix rank is unique, is an extremely important matrix theorem. It constitutes the chapter's chief result, which we have spent so many pages to attain. Without this theorem, the very concept of matrix rank must remain in doubt, along with all that attends to the concept. The theorem is the rock upon which the general theory of the matrix is built.

The concept underlying the theorem promotes the useful sensibility that a matrix's rank, much more than its mere dimensionality or the extent of its active region, represents the matrix's true size.

Dimensionality can deceive, after all. For example, the honest 2 × 2 matrix

    ⎡ 5  1 ⎤
    ⎣ 3  6 ⎦

has two independent rows or, alternately, two independent columns, and, hence, rank r = 2. One can easily construct a phony 3 × 3 matrix from the honest 2 × 2, however, simply by applying some 3 × 3 row and column elementaries:

            ⎡ 5  1  0 ⎤                  ⎡ 5  1  6 ⎤
 T(2/3)[32] ⎢ 3  6  0 ⎥ T1[13] T1[23]  = ⎢ 3  6  9 ⎥
            ⎣ 0  0  0 ⎦                  ⎣ 2  4  6 ⎦

The 3 × 3 matrix on the equation's right is the one we met at the head of the section. It looks like a rank-three matrix, but really has only two independent columns and two independent rows. Its true rank is r = 2. We have here caught a matrix impostor pretending to be bigger than it really is.

Now, admittedly, adjectives like "honest" and "phony," terms like "impostor," are a bit hyperbolic. The last paragraph has used them to convey the subjective sense of the matter, but of course there is nothing mathematically improper or illegal about a matrix of less than full rank, so long as the true rank is correctly recognized. When one models a physical phenomenon by a set of equations, one sometimes is dismayed to discover that one of the equations, thought to be independent, is really just a useless combination of the others. This can happen in matrix work, too. The rank of a matrix helps one to recognize how many truly independent vectors, dimensions or equations one actually has available to work with, rather than how many seem available at first glance. That is the sense of matrix rank.

An applied mathematician with some matrix experience actually probably recognizes this particular 3 × 3 matrix as a fraud on sight, but it is a very simple example. No one can just look at some arbitrary matrix and instantly perceive its true rank. Consider for instance the 5 × 5 matrix (in hexadecimal notation)

    ⎡  12    F    2  −19    0 ⎤
    ⎢   9   15    2   −E    6 ⎥
    ⎢   3    3    2   −6    1 ⎥
    ⎢   1    2    D    3    5 ⎥
    ⎣   0   12    9   −2    1 ⎦

whose rank is not r = 5 but r = 4.¹⁷

¹⁷ As the reader can verify by the Gauss-Jordan algorithm.
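The phony matrix's construction is mechanical enough that a computer can repeat it. The sketch assumes Python with the numpy library; the three elementary operators are built by hand as 3 × 3 arrays.

    import numpy as np

    honest = np.array([[5., 1.],
                       [3., 6.]])

    A0 = np.zeros((3, 3)); A0[:2, :2] = honest   # the honest 2 x 2, padded with zeros

    T_23_32 = np.eye(3); T_23_32[2, 1] = 2./3.   # left-multiplied: row 3 += (2/3) row 2
    T_1_13  = np.eye(3); T_1_13[0, 2]  = 1.      # right-multiplied: column 3 += column 1
    T_1_23  = np.eye(3); T_1_23[1, 2]  = 1.      # right-multiplied: column 3 += column 2

    phony = T_23_32 @ A0 @ T_1_13 @ T_1_23
    print(phony)                                 # [[5 1 6] [3 6 9] [2 4 6]]
    print(np.linalg.matrix_rank(phony))          # still 2: elementaries are reversible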


Chapter 13

Inversion and orthonormalization

The undeniably tedious Chs. 11 and 12 have piled the matrix theory deep while affording scant practical reward. One might be forgiven for forgetting after so many pages of abstract theory that the matrix afforded any reward or had any use at all. Uses however it has. Sections 11.1.1 and 12.5.5 have already broached¹ the matrix's most basic use, the primary subject of this chapter: to represent a system of m linear scalar equations in n unknowns neatly as

    Ax = b

and to solve the whole system at once by inverting the matrix A that characterizes it. Building upon the two tedious chapters, this chapter brings the first rewarding matrix work.

Now, before we go on, we want to confess that such a use alone, on the surface of it, though interesting, might not have justified the whole uncomfortable bulk of Chs. 11 and 12. We already knew how to solve a simultaneous system of linear scalar equations in principle without recourse to the formality of a matrix, as in the last step to derive (3.9) as far back as Ch. 3. Why should we have suffered two bulky chapters, after all, if only to prepare to do here something we already knew how to do? The question is a fair one, but admits at least four answers.

¹ The reader who has skipped Ch. 12 might at least review § 12.5.

First, as this chapter explains, the matrix neatly solves a linear system not only for a particular driving vector b but for all possible driving vectors b at one stroke. Second and yet more impressively, the matrix allows § 13.6 to introduce the pseudoinverse to approximate the solution to an unsolvable linear system and, moreover, to do so both optimally and efficiently, whereas such overdetermined systems arise commonly in applications. Third, to solve the linear system neatly is only the primary and most straightforward use of the matrix, not its only use: the even more interesting eigenvalue and its incidents await Ch. 14. Fourth, specific applications aside, one should never underestimate the blunt practical benefit of reducing an arbitrarily large grid of scalars to a single symbol A, which one can then manipulate by known algebraic rules. Most students first learning the matrix have wondered at this stage whether it were worth all the tedium; so, if the reader now wonders, then he stands in good company. The matrix finally begins to show its worth here.

The chapter opens in § 13.1 by inverting the square matrix, to solve the exactly determined, n × n linear system in § 13.2. It continues in § 13.3 by computing the rectangular matrix's kernel, to solve the nonoverdetermined, m × n linear system in § 13.4. In § 13.6, it brings forth the aforementioned pseudoinverse, which rightly approximates the solution to the unsolvable overdetermined linear system. After briefly revisiting the Newton-Raphson iteration in § 13.7, it concludes by introducing the concept and practice of vector orthonormalization in §§ 13.8 through 13.12.

13.1 Inverting the square matrix

Consider an n × n square matrix A of full rank r = n. Suppose that extended operators G>, G<, G>⁻¹ and G<⁻¹ can be found, each with an n × n active region (§ 11.3.2), such that²

    G>⁻¹ G> = I = G> G>⁻¹,
    G<⁻¹ G< = I = G< G<⁻¹,    (13.1)
    A = G> In G<.

Observing from (11.31) that

    In A = A In,
    In G<⁻¹ G>⁻¹ = G<⁻¹ G>⁻¹ In,

we find by successive steps that

    A = G> In G<,
    In A = G> G< In,
    G<⁻¹ G>⁻¹ In A = In,
    (G<⁻¹ In G>⁻¹)(A) = In;

or alternately that

    A = G> In G<,
    A In = In G> G<,
    A In G<⁻¹ G>⁻¹ = In,
    (A)(G<⁻¹ In G>⁻¹) = In.

Either way, we have that

    A⁻¹ A = In = A A⁻¹,
    A⁻¹ ≡ G<⁻¹ In G>⁻¹.    (13.2)

² The symbology and associated terminology might disorient a reader who had skipped Chs. 11 and 12. In this book, the symbol I theoretically represents an ∞ × ∞ identity matrix. Outside the m × m or n × n square, the operators G> and G< each resemble the ∞ × ∞ identity matrix I, which means that the operators affect respectively only the first m rows or n columns of the thing they operate on. (In the present section it happens that m = n because the matrix A of interest is square, but this footnote uses both symbols because generally m ≠ n.) The symbol Ir contrarily represents an identity matrix of only r ones, though it too can be viewed as an ∞ × ∞ matrix with zeros in the unused regions. If interpreted as an ∞ × ∞ matrix, the matrix A of the m × n system Ax = b has nonzero content only within the m × n rectangle. None of this is complicated, really. Its purpose is merely to separate the essential features of a reversible operation like G> or G< from the dimensionality of the vector or matrix on which the operation happens to operate. The definitions do however necessarily, slightly diverge from definitions the reader may have been used to seeing in other books. In this book, one can legally multiply any two matrices, because all matrices are theoretically ∞ × ∞, anyway (though whether it makes any sense in a given circumstance to multiply mismatched matrices is another question; sometimes it does make sense, as in eqns. 13.24 and 14.49, but more often it does not, which naturally is why the other books tend to forbid such multiplication). To the extent that the definitions confuse, the reader might briefly review the earlier chapters, especially § 11.3.

Of course, for this to work, G>, G<, G>⁻¹ and G<⁻¹ must exist, be known and honor n × n active regions, without exception. There is no trouble here, however, which might seem a practical hurdle: the factors do exist, and indeed we know how to find them, for the Gauss-Jordan decomposition plus (12.3) and the body of § 12.3 have shown exactly how to find just such a G>, G<, G>⁻¹ and G>⁻¹ for any square matrix A of full rank.

Equation (13.2) features the important matrix A⁻¹, the rank-n inverse of A. We have not yet much studied the rank-n inverse, but we have defined it in (11.49), where we gave it the fuller, nonstandard notation A⁻¹⁽ⁿ⁾. When naming the rank-n inverse in words one usually says simply, "the inverse," because the rank is implied by the size of the square active region of the matrix inverted; but the rank-n inverse from (11.49) is not quite the infinite-dimensional inverse from (11.45), which is what G>⁻¹ and G<⁻¹ are. According to (13.2), the product of A⁻¹ and A, or, written more fully, the product of A⁻¹⁽ⁿ⁾ and A, is, not I, but In.

Properties that emerge from (13.2) include the following.

• Like A, the rank-n inverse A⁻¹ (more fully written A⁻¹⁽ⁿ⁾) too is an n × n square matrix of full rank r = n.

• Since A is square and has full rank (§ 12.5.4), its rows and, separately, its columns are linearly independent, so it has only the one, unique inverse A⁻¹. No other rank-n inverse of A exists.

• On the other hand, inasmuch as A is square and has full rank, it does per (13.2) indeed have an inverse A⁻¹. The rank-n inverse exists.

• If B = A⁻¹ then B⁻¹ = A. That is, A is itself the rank-n inverse of A⁻¹. The matrices A and A⁻¹ thus form an exclusive, reciprocal pair.

• If B is an n × n square matrix and either BA = In or AB = In, then both equalities in fact hold; B = A⁻¹. One can have neither equality without the other.

• Only a square, n × n matrix of full rank r = n has a rank-n inverse. A matrix A′ which is not square, or whose rank falls short of a full r = n, is not invertible in the rank-n sense of (13.2).

That the inverse exists is plain, inasmuch as the Gauss-Jordan decomposition plus (13.2) reliably calculate it. That the inverse is unique begins from § 12.5.4's observation that the columns (like the rows) of A are linearly independent because A is square and has full rank. From this beginning and the fact that In = AA⁻¹, it follows that [A⁻¹]∗1 represents³ the one and only possible combination of A's columns which achieves e1; that [A⁻¹]∗2 represents the one and only possible combination of A's columns which achieves e2; and so on through en. One could observe likewise respecting the independent rows of A. Either way, A⁻¹ is unique. Moreover, no other n × n matrix B ≠ A⁻¹ satisfies either requirement of (13.2), that BA = In or that AB = In, much less both.

It is not claimed that the matrix factors G> and G< themselves are unique, incidentally. On the contrary, many different pairs of matrix factors G> and G< can yield A = G> In G<, no less than that many different pairs of scalar factors γ> and γ< can yield α = γ> 1 γ<. Though the Gauss-Jordan decomposition is a convenient means to G> and G<, it is hardly the only means, and any proper G> and G< found by any means will serve so long as they satisfy (13.1). What are unique are not the factors but the A and A⁻¹ they produce.

What of the degenerate n × n square matrix A′, of rank r < n? Rank promotion is impossible as §§ 12.5.2 and 12.5.3 have shown, so in the sense of (13.2) such a matrix has no inverse; for, if it had, then A′⁻¹ would by definition represent a row or column operation which impossibly promoted A′ to the full rank r = n of In. Indeed, in that it has no inverse such a degenerate matrix closely resembles the scalar 0, which has no reciprocal. Mathematical convention owns a special name for a square matrix which is degenerate and thus has no inverse; it calls it a singular matrix.

And what of a rectangular matrix? Is it degenerate? Well, no, not exactly; not necessarily. The definitions of the present particular section are meant for square matrices; they do not neatly apply to nonsquare ones. Refer to §§ 12.5.3 and 12.5.4. However, appending the right number of null rows or columns to a nonsquare matrix does turn it into a degenerate square, in which case the preceding argument applies.

³ The notation [A⁻¹]∗j means "the jth column of A⁻¹." Refer to § 11.1.3.
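The resemblance to the scalar 0 can be watched numerically. A minimal sketch, assuming Python with the numpy library and a contrived rank-1 square matrix:

    import numpy as np

    S = np.array([[1., 2.],
                  [2., 4.]])      # rank 1 < n = 2: a singular matrix

    print(np.linalg.det(S))       # 0.0: like the scalar 0, S has no reciprocal
    try:
        np.linalg.inv(S)
    except np.linalg.LinAlgError as err:
        print("inversion fails:", err)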

13.2 The exactly determined linear system

Section 11.1.1 has shown how the single matrix equation

    Ax = b    (13.3)

concisely represents an entire simultaneous system of linear scalar equations. If the system has n scalar equations and n scalar unknowns, then the matrix A has square, n × n dimensionality. Furthermore, if the n scalar equations are independent of one another, then the rows of A are similarly independent, which gives A full rank and makes it invertible. Under these conditions, one can solve (13.3) and the corresponding system of linear scalar equations by left-multiplying (13.3) by the A⁻¹ of (13.2) and (13.1) to reach the famous formula

    x = A⁻¹ b.    (13.4)

Inverting the square matrix A of scalar coefficients, (13.4) concisely solves a simultaneous system of n linear scalar equations in n scalar unknowns. It is the classic motivational result of matrix theory.

It has taken the book two long chapters to reach (13.4). If one omits first to prepare the theoretical ground sufficiently to support more advanced matrix work, then one can indeed reach (13.4) with rather less effort than the book has done.⁴ As the chapter's introduction has observed, however, we shall soon meet additional interesting applications of the matrix which in any case require the theoretical ground to have been prepared. Equation (13.4) is only the first fruit of the effort.

⁴ For motivational reasons, introductory, tutorial linear algebra textbooks like [33] and [47] rightly yet invariably invert the general square matrix of full rank much earlier, reaching (13.4) with less effort. The deferred price the student pays for the simpler-seeming approach of the tutorials is twofold. First, the student fails to develop the Gauss-Jordan decomposition properly, instead learning the less elegant but easier to grasp "row echelon form" of "Gaussian elimination" [33, Ch. 1][47, § 1.2], which makes good matrix-arithmetic drill but leaves the student imperfectly prepared when the time comes to study kernels and eigensolutions or to read and write matrix-handling computer code. Second, in the long run the tutorials save no effort, because the student still must at some point develop the theory underlying matrix rank and supporting each of the several coincident properties of § 14.2. What the tutorials do is pedagogically necessary; it is how the writer first learned the matrix and probably how the reader first learned it, too; but it is appropriate to a tutorial, not to a study reference like this book. In this book, where derivations prevail, the proper place to invert the general square matrix of full rank is here. Indeed, the inversion here goes smoothly, because Chs. 11 and 12 have laid under it a firm foundation upon which, and supplied it the right tools with which, to work.
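In code, (13.4) is a one-liner. The sketch assumes Python with the numpy library; the system itself is contrived. (Numerical practice prefers a factor-and-solve routine to an explicit inverse, but both agree here.)

    import numpy as np

    A = np.array([[2., 1., 0.],
                  [1., 3., 1.],
                  [0., 1., 4.]])     # n = 3 independent equations: invertible
    b = np.array([1., 2., 3.])

    x = np.linalg.inv(A) @ b         # the famous x = A^(-1) b
    print(x)
    print(np.allclose(A @ x, b))                  # True
    print(np.allclose(np.linalg.solve(A, b), x))  # True: factor-and-solve agrees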

Where the inverse does not exist, where the square matrix A is singular, the rows of the matrix are linearly dependent, meaning that the corresponding system actually contains fewer than n useful scalar equations. Depending on the value of the driving vector b, the superfluous equations either merely reproduce or flatly contradict information the other equations already supply. Either way, no unique solution to a linear system described by a singular square matrix is possible, though a good approximate solution is given by the pseudoinverse of § 13.6. In the language of § 12.5.5, the singular square matrix characterizes a system that is both underdetermined and overdetermined.

13.3 The kernel

If a matrix A has full column rank (§ 12.5.4), then the columns of A are linearly independent and

    Ax = 0    (13.5)

is impossible if In x ≠ 0. If the matrix however lacks full column rank then (13.5) is possible even if In x ≠ 0. In either case, any n-element x (including x = 0) that satisfies (13.5) belongs to the kernel of A.

Let A be an m × n matrix of rank r. A second matrix,⁵ AK, minimally represents the kernel of A if and only if

• AK has n × (n − r) dimensionality (which gives AK tall rectangular form unless r = 0),

• AK has full rank n − r (that is, the columns of AK are linearly independent, which gives AK full column rank), and

• AK satisfies the equation

    A AK = 0.    (13.6)

The n − r independent columns of the kernel matrix AK address the complete space x = AK a of vectors in the kernel, where the (n − r)-element vector a can have any value. In symbols,

    Ax = A(AK a) = (A AK) a = 0.

⁵ The conventional mathematical notation for the kernel of A is ker{A}, null{A} or something nearly resembling one of the two (the notation seems to vary from editor to editor), which technically represent the kernel space itself, as opposed to the notation AK, which represents a matrix whose columns address the kernel space. This book deëmphasizes the distinction and prefers the kernel matrix notation AK. If we were really precise, we might write not AK but AK(n) to match the A⁻¹⁽ʳ⁾ of (11.49). The abbreviated notation AK is probably clear enough for most practical purposes, though.

The definition does not pretend that the kernel matrix AK is unique. Except when A has full column rank the kernel matrix is not unique; there are infinitely many kernel matrices AK to choose from for a given matrix A. What is unique is not the kernel matrix but rather the space its columns address, and it is the latter space rather than AK as such that is technically the kernel (if you forget and call AK "a kernel," though, you'll be all right).

The Gauss-Jordan kernel formula⁶

    AK = S⁻¹ K⁻¹ Hr In−r = G<⁻¹ Hr In−r    (13.7)

gives a complete kernel AK of A, where S⁻¹, K⁻¹ and G<⁻¹ are the factors their respective symbols indicate of the Gauss-Jordan decomposition's complementary form (12.3) and Hr is the shift operator of § 11.9. Section 13.3.1 derives the formula.

⁶ The name Gauss-Jordan kernel formula is not standard as far as the writer is aware, but we would like a name for (13.7). This name seems as fitting as any.
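A kernel matrix is computed readily enough by machine, though not by the Gauss-Jordan route of § 13.3.1 below: the sketch assumes Python with the sympy library, whose nullspace method serves the same end, and a contrived 3 × 4 matrix. The checks mirror the three defining conditions above.

    import sympy as sp

    A = sp.Matrix([[1, 2, 3, 4],
                   [2, 4, 6, 8],
                   [1, 1, 1, 1]])        # m x n = 3 x 4 with rank r = 2

    AK = sp.Matrix.hstack(*A.nullspace())  # a kernel matrix of A

    print(AK.shape)                  # (4, 2): n x (n - r) dimensionality
    print(A * AK == sp.zeros(3, 2))  # True: A AK = 0
    print(AK.rank())                 # 2 = n - r: the kernel columns are independent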

THE KERNEL Applying an identity from Table 12. 0) = −Ir KHr a. b) = f (a. If b = 0 as (13.11) f ≡ Ir Sx.9) to achieve precisely this eﬀect.5) requires. The equation has however on the left only Ir Sx. > Sx(a. but that the choice then determines the ﬁrst r elements. Ir K(In − Ir )Sx + Ir Sx = G−1 b. Equation (13.9) is interesting. Naturally this is no accident. . we have (probably after some trial and error not recorded here) planned the steps leading to (13. Sx(a. The implication is signiﬁcant. 0) a = f (a.9) implies that one can choose the last n − r elements of Sx freely. It has Sx on both sides. Notice how we now associate the factor (In − Ir ) rightward as a row truncator. b) a = f (a. where Sx is the vector x with elements reordered in some particular way. 0) = 7 f (a.13. (13. This makes f and thereby also x functions of the free parameter a and the driving vector b: f (a.10) a ≡ H−r (In − Ir )Sx. 11 and 12 have gone to such considerable trouble to develop the basic theory of the matrix.9) Equation (13. (13. 0) + Hr a. then f (a.9) in the improved form f = G−1 b − Ir KHr a. Ir Sx = G−1 b − Ir K(In − Ir )Sx. which is the remaining n − r elements. where a represents the n − r free elements of Sx and f represents the r dependent elements. b) + Hr a. and on the right only (In − Ir )Sx. though it had ﬁrst entered acting leftward as a column truncator. > 339 (13. > Rearranging terms.7 No element of Sx appears on both sides.3.2 on page 311. which is the ﬁrst r elements of Sx. > Sx = f a = f + Hr a. To express the implication more clearly we can rewrite (13. b) = G−1 b − Ir KHr a. The ﬂexibility to reassociate operators in such a way is one of many good reasons Chs.

(13. the columns of In−r . then the columns of x(In−r . Equation (13. with the elements of a again as the weights. in aggregate. INVERSION AND ORTHONORMALIZATION Substituting the ﬁrst line into the second.12) Left-multiplying by S −1 = S ∗ = S T produces the alternate kernel formula (13. from another perspective. (13. exactly address the domain of a. And what are the solutions for each ej ? Answer: the corresponding columns of AK . The solution x = AK a for a given choice of a becomes a weighted combination of the solutions for each ej . which by deﬁnition are the independent values of x that cause b = 0. baﬄing.13) is SAK = (I − Ir K)(In − Ir )Hr = [(In − Ir ) − Ir K(In − Ir )]Hr = [(In − Ir ) − (K − I)]Hr . The idea is that if we can solve the problem for each elementary vector ej —that is. 0) = (I − Ir K)Hr ej .14) is correct but not as simple as it could be. by which8 SAK = (I − Ir K)Hr In−r . this seems trivial.14) The alternate kernel formula (13. For all the ej at once. eqn. By the identity (11. AK = S −1 (I − Ir K)Hr In−r .340 CHAPTER 13. then x by AK ? One justiﬁes them in that the columns of In−r are the several ej .15) 8 These are diﬃcult steps. of which any (n − r)-element vector a can be constructed as the linear combination n−r a = In−r a = [ e1 e2 e3 ··· en−r ]a = j=1 aj e j weighted by the elements of a. How does one justify replacing a by ej . . 0) likewise exactly address the range of x(a. 0) = (I − Ir K)Hr a. (13. Seen from one perspective. then ej by In−r . Sx(ej .6) has already named this range AK . until one grasps what is really going on here.13) (13. But if all the ej at once. where 1 ≤ j ≤ n − r. 0) = (I − Ir K)Hr In−r .76). Sx(In−r . 0). if we can solve the problem for the identity matrix In−r —then we shall implicitly have solved it for every a because a is a weighted combination of the ej and the whole problem is linear. Sx(a. In the event that a = ej .

9 but (13.16) though unproven still helps because it posits a hypothesis toward which to target the analysis. (13.18) into (13. SAK = K −1 (In − Ir )Hr + [K −1 − K −1 In − I + In ]Hr . since all of K’s interesting content lies by construction right of its rth column).15) is not obvious. right-multiplying by Ir the identity that Ir K −1 (In − Ir ) = K −1 − I and canceling terms.15) schematically with a pencil. 9 Well. THE KERNEL 341 where we have used Table 12. and if one remembers that K −1 is just K with elements oﬀ the main diagonal negated. we have that K −1 Ir = Ir (13. Substituting (13. the appearance pretty much is entirely convincing. no. actually.2 again in the last step.16) The appearance is not entirely convincing.3.2. so SAK = K −1 (In − Ir )Hr .2 also help. .18) (which actually is pretty obvious if you think about it. First. Two variations on the identities of Table 12. (13. According to Table 12. then it appears that SAK = K −1 Hr In−r .15) yields SAK = [(In − K −1 Ir ) − (I − K −1 )]Hr . How to proceed symbolically from (13. SAK = K −1 (In − Ir )Hr + [(K −1 − I)(I − In )]Hr . the quantity in square brackets is zero.17) and (13.13. but let us ﬁnish the proof symbolically nonetheless.17) Second. Factoring. 2 we have that K − I = I − K −1 . from the identity that K + K −1 = I. Now we have enough to go on with. Adding 0 = K −1 In Hr − K −1 In Hr and rearranging terms. but if one sketches the matrices of (13.

The ﬁnal step is to left-multiply (13. INVERSION AND ORTHONORMALIZATION which. the deﬁnition of AK in the section’s introduction demands both of these features. In general such features would be hard to establish.2 Converting between kernel matrices If C is a reversible (n−r)×(n−r) operator by which we right-multiply (13. that it had full rank. without altering the space addressed. Moreover.5. . Regarding the whole kernel space.1 and 12. Refer to §§ 12.3.4 lets one replace the columns of AK with those of A′K .5. reversibly. worth pausing brieﬂy to appreciate. proves (13.13) the last rows of SAK are Hr In−r .19) like AK evidently represents the kernel of A: AA′K = A(AK C) = (AAK )C = 0. one column at a time. but here the factors conveniently are Gauss-Jordan factors. AK addresses it because AK comes from all a. and SAK lacks it because according to (13.16).7)’s AK actually addressed the whole kernel space of A rather than only part. Moreover. that is. but if out of sequence then a reversible permutation at the end corrects the order.54) below incidentally tends to make a good choice for C. Regarding redundancy. some C exists to convert AK into every alternate kernel matrix A′K of A. Indeed this makes sense: because the columns of AK C address the same space the columns of AK address. One would further like to feel sure that AK had no redundant columns. then the matrix A′K = AK C (13.342 CHAPTER 13.7) that was to be derived. 13. (It might not let one replace the columns in sequence. AK lacks it because SAK lacks it. 13. We know this because § 12.3.) The orthonormalizing column operator R−1 of (13. the two matrices necessarily represent the same underlying kernel.2 for the pattern by which this is done. (13.3 The degree of freedom A slightly vague but extraordinarily useful concept has emerged in this section.76) has that (In − Ir )Hr = Hr In−r . considering that the identity (11.6). So.16) by S −1 = S ∗ = S T . reaching (13.7) has both features and does ﬁt the deﬁnition. The concept is the concept of the degree of freedom. One would like to feel sure that the columns of (13. in fact.

A degree of freedom is a parameter one remains free to determine within some continuous domain. For example, Napoleon's artillerist¹⁰ might have enjoyed as many as six degrees of freedom in firing a cannonball: two in where he chose to set up his cannon (one degree in north-south position, one in east-west); two in aim (azimuth and elevation); one in muzzle velocity (as governed by the quantity of gunpowder used to propel the ball); and one in time. A seventh potential degree of freedom, the height from which the artillerist fires, is of course restricted by the lay of the land: the artillerist can fire from a high place only if the place he has chosen to fire from happens to be up on a hill, for Napoleon had no flying cannon.

Yet even among the six remaining degrees of freedom, the artillerist might find some impractical to exercise. The artillerist probably preloads the cannon always with a standard charge of gunpowder because, when he finds his target in the field, he cannot spare the time to unload the cannon and alter the charge: this costs one degree of freedom. Likewise, to shift the cannon to better ground, the artillerist must limber up the cannon and hitch it to a horse; for this too he cannot spare time in the heat of battle: this costs two degrees. And Napoleon might yell, "Fire!" canceling the time degree as well. Two degrees of freedom remain to the artillerist; but, since exactly two degrees are needed to hit some particular target on the battlefield, the two are enough.

Some apparent degrees of freedom are not real. For example, muzzle velocity gives the artillerist little control firing elevation does not also give; and, on the other hand, too much gunpowder might break the cannon. Other degrees of freedom are nonlinear in effect: a certain firing elevation gives maximum range; nearer targets can be hit by firing either higher or lower at the artillerist's discretion.

Now consider what happens if the artillerist loses one of his last two remaining degrees of freedom. Maybe the cannon's carriage wheel is broken and the artillerist can no longer turn the cannon; that is, he can still choose firing elevation but no longer azimuth. In such a strait, to hit some particular target on the battlefield, the artillerist needs somehow to recover another degree of freedom, for he needs two but has only one. If he disregards Napoleon's order, "Fire!" (maybe not a wise thing to do, but, anyway, . . . ) and waits for the target to traverse the cannon's fixed line of fire, then he can still hope to hit even with the broken carriage wheel; for could he choose neither azimuth nor the moment to fire, then he would almost surely miss.

¹⁰ The author, who has never fired an artillery piece (unless an arrow from a Boy Scout bow counts), invites any real artillerist among the readership to write in to improve the example.

All of this is hard to generalize in unambiguous mathematical terms, but the count of the degrees of freedom in a system is of high conceptual importance to the engineer nonetheless. Basically, the count captures the idea that to control n output variables of some system takes at least n independent input variables. Engineers of all kinds think in this way: an aeronautical engineer knows in advance that an airplane needs at least n ailerons, rudders and other control surfaces for the pilot adequately to control the airplane; an electrical engineer knows in advance that a circuit needs at least n potentiometers for the technician adequately to tune the circuit; and so on. The n may possibly for various reasons still not suffice—it might be wise in some cases to allow n + 1 or n + 2—but in no event will fewer than n do.

A degree of freedom has some continuous nature, not merely a discrete choice to turn left or right. In geometry, a point brings none; a line brings a single degree of freedom; a plane brings two. If the line bends and turns like a mountain road, it still brings a single degree of freedom. And if the road reaches an intersection? Answer: still one degree. The driver on the mountain road cannot claim a second degree of freedom at the mountain intersection (he can indeed claim a choice, but the choice being discrete lacks the proper character of a degree of freedom), but he might plausibly claim a second degree of freedom upon reaching the city, where the web or grid of streets is dense enough to approximate access to any point on the city's surface. Just how many streets it takes to turn the driver's "line" experience into a "plane" experience is a matter for the mathematician's discretion. On the other hand, a swimmer in a swimming pool enjoys three degrees of freedom (up-down, north-south, east-west) even though his domain in any of the three is limited to the small volume of the pool.

Reviewing (13.11), we find n − r degrees of freedom in the general underdetermined linear system, represented by the n − r free elements of a. If the underdetermined system is not also overdetermined, that is, if it is nondegenerate such that r = m, then it is guaranteed to have a family of solutions x. This family is the topic of the next section.

13.4 The nonoverdetermined linear system

The exactly determined linear system of § 13.2 is common, but also common is the more general, nonoverdetermined linear system

    Ax = b,  (13.20)

in which b is a known, m-element vector, x is an unknown, n-element vector, and A is a square or broad, m × n matrix of full row rank (§ 12.5.4)

    r = m ≤ n.  (13.21)

Except in the exactly determined edge case r = m = n of § 13.2, the nonoverdetermined linear system has no unique solution but rather a family of solutions. This section delineates the family.

13.4.1 Particular and homogeneous solutions

The nonoverdetermined linear system (13.20) by definition admits more than one solution x for a given driving vector b. Such a system is hard to solve all at once, though, so we prefer to split the system as

    Ax1 = b,
    A(AK a) = 0,
    x = x1 + AK a,  (13.22)

which, when the second line is added to the first and the third is substituted, makes the whole form (13.20). Splitting the system does not change it, but it does let us treat the system's first and second lines in (13.22) separately. In the split form, the symbol x1 represents any one n-element vector that happens to satisfy the form's first line—many are possible; the mathematician just picks one—and is called a particular solution of (13.20). The (n − r)-element vector a remains unspecified, whereupon AK a represents the complete family of n-element vectors that satisfy the form's second line. The family of vectors expressible as AK a is called the homogeneous solution of (13.20). Notice the italicized articles a and the.

The Gauss-Jordan kernel formula (13.7) has given us AK and thereby the homogeneous solution, which renders the analysis of (13.20) already half done. To complete the analysis, it remains in § 13.4.2 to find a particular solution.

13.4.2 A particular solution

Any particular solution will do. Equation (13.11) has that

    (S)[x1(a, b) + AK a] = f(a, b) + Hr a,
    f(a, b) ≡ G⁻¹b − Ir KHr a,

where we have substituted the last line of (13.22) for x. This holds for any a and b. Of course, we are not free to choose the driving vector b, but since we need only one particular solution, a can be anything we want. Why not

    a = 0?

Then

    f(0, b) = G⁻¹b,
    Sx1(0, b) = f(0, b) = G⁻¹b,
    x1 = S⁻¹G⁻¹b.  (13.23)

13.4.3 The general solution

Assembling (13.7), (13.22) and (13.23) yields the general solution

    x = S⁻¹(G⁻¹b + K⁻¹Hr In−r a)  (13.24)

to the nonoverdetermined linear system (13.20).

In exact arithmetic (13.24) solves the nonoverdetermined linear system in theory exactly. Of course, practical calculations are usually done in limited precision, in which compounded rounding error in the last bit eventually disrupts (13.24) for matrices larger than some moderately large size. Avoiding unduly small pivots early in the Gauss-Jordan extends (13.24)'s reach to larger matrices, and for yet larger matrices a bewildering variety of more sophisticated techniques exists to mitigate the problem, which can be vexing because the problem arises even when the matrix A is exactly known. Equation (13.24) is useful and correct, but one should at least be aware that it can in practice lose floating-point accuracy when the matrix it attacks grows too large. (It can also lose accuracy when the matrix's rows are almost dependent, but that is more the fault of the matrix than of the formula. See § 14.8, which addresses a related problem.)

13.5 The residual

Equations (13.2) and (13.4) solve the exactly determined linear system Ax = b. Equation (13.24) broadens the solution to include the nonoverdetermined linear system.
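For the reader who wants to see the family of solutions numerically: the following minimal sketch, in Python with the numpy library, computes a particular solution and a kernel matrix for a small hypothetical nonoverdetermined system. It does not construct the book's Gauss-Jordan factors S, G and K; library routines (least squares and the singular value decomposition) stand in for them.

    import numpy as np

    # Hypothetical broad system Ax = b with full row rank r = m = 2, n = 3.
    A = np.array([[1., 2., 3.],
                  [0., 1., 4.]])
    b = np.array([6., 5.])

    x1 = np.linalg.lstsq(A, b, rcond=None)[0]  # one particular solution
    _, s, Vh = np.linalg.svd(A)                # rows of Vh beyond the rank
    r = int(np.sum(s > 1e-12))                 #   span the kernel of A
    AK = Vh[r:].T                              # n x (n - r) kernel matrix

    a = np.array([2.0])                        # free (n - r)-element vector
    x = x1 + AK @ a                            # another member of the family
    assert np.allclose(A @ x, b)

Any choice of the free vector a yields another solution of the same system, which is exactly the n − r degrees of freedom the text counts.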

None of those equations however can handle the overdetermined linear system, because for general b the overdetermined linear system

    Ax ≈ b  (13.25)

has no exact solution. (See § 12.5 for the definitions of underdetermined, overdetermined, etc.)

One is tempted to declare the overdetermined system uninteresting because it has no solution and to leave the matter there, but this would be a serious mistake. In fact the overdetermined system is especially interesting, and the more so because it arises so frequently in applications. One seldom trusts a minimal set of data for important measurements, yet extra data imply an overdetermined system. We need to develop the mathematics to handle the overdetermined system properly.

The quantity¹¹,¹²

    r(x) ≡ b − Ax  (13.26)

measures how nearly some candidate solution x solves the system (13.25). We call this quantity the residual, and the smaller, the better. More precisely, the smaller the nonnegative real scalar

    [r(x)]*[r(x)] = Σi |ri(x)|²,  (13.27)

called the squared residual norm, the more favorably we regard the candidate solution x.

13.6 The Moore-Penrose pseudoinverse and the least-squares problem

A typical problem is to fit a straight line to some data. For example, suppose that we are building-construction contractors with a unionized work force, whose labor union can supply additional, fully trained labor on demand. Suppose further that we are contracted to build a long freeway and have been adding workers to the job in recent weeks to speed construction. On Saturday morning at the end of the second week, we gather and plot the production data on the left of Fig. 13.1. If ui and bi respectively represent the number of workers and the length of freeway completed during week i,

¹¹ Alas, the alphabet has only so many letters (see Appendix B). The r here is unrelated to matrix rank r.
¹² This is as [73] defines it. Some authors [55] however prefer to define r(x) ≡ Ax − b, instead.

Figure 13.1: Fitting a line to measured data. [Two scatter plots of length newly completed during the week against number of workers: the two-week data on the left, the five-week data on the right.]

then we can fit a straight line b = σu + γ to the measured production data such that

    [b1; b2] = [u1 1; u2 1][σ; γ],

inverting the matrix per §§ 13.1 and 13.2 to solve for x ≡ [σ γ]ᵀ, in the hope that the resulting line will predict future production accurately.

That is all mathematically irreproachable. By the fifth Saturday however we shall have gathered more production data, plotted on the figure's right, to which we should like to fit a better line to predict production more accurately. The added data present a problem. Statistically, the added data are welcome, but geometrically we need only two points to specify a line; what are we to do with the other three? The five points together overdetermine the linear system

    [u1 1; u2 1; u3 1; u4 1; u5 1][σ; γ] = [b1; b2; b3; b4; b5].

There is no way to draw a single straight line b = σu + γ exactly through all five, for in placing the line we enjoy only two degrees of freedom.¹³

The proper approach is to draw among the data points a single straight line that misses the points as narrowly as possible. More precisely, the proper

¹³ Section 13.3.3 characterized a line as enjoying only one degree of freedom. Why now two? The answer is that § 13.3.3 discussed travel along a line rather than placement of a line as here. Though both involve lines, they differ as driving an automobile differs from washing one. Do not let this confuse you.

approach chooses parameters σ and γ to minimize the squared residual norm [r(x)]*[r(x)] of § 13.5, given that

    A = [u1 1; u2 1; u3 1; u4 1; u5 1],  x = [σ; γ],  b = [b1; b2; b3; b4; b5].

Such parameters constitute a least-squares solution.

The matrix A in the example has two columns, data marching on the left, all ones on the right. This is a typical structure for A, but in general any matrix A with any number of columns of any content might arise (because there were more than two relevant variables or because some data merited heavier weight than others, among many further reasons). Whatever matrix A might arise from whatever source, this section attacks the difficult but important problem of approximating optimally a solution to the general, possibly unsolvable linear system (13.25), Ax ≈ b.

13.6.1 Least squares in the real domain

The least-squares problem is simplest when the matrix A enjoys full column rank and no complex numbers are involved. In this case, we seek to minimize the squared residual norm

    [r(x)]ᵀ[r(x)] = (b − Ax)ᵀ(b − Ax)
                 = xᵀAᵀAx + bᵀb − xᵀAᵀb − bᵀAx
                 = xᵀAᵀAx + bᵀb − 2xᵀAᵀb
                 = xᵀAᵀ(Ax − 2b) + bᵀb,

in which the transpose is used interchangeably for the adjoint because all the numbers involved happen to be real. The norm is minimized where

    d(rᵀr)/dx = 0

(in which d/dx is the Jacobian operator of § 11.10). A requirement that

    d[xᵀAᵀ(Ax − 2b) + bᵀb]/dx = 0

comes of combining the last two equations. Differentiating by the Jacobian product rule (11.79) yields the equation

    xᵀAᵀA + [Aᵀ(Ax − 2b)]ᵀ = 0;

or, after transposing the equation, rearranging terms and dividing by 2, the simplified equation

    AᵀAx = Aᵀb.

Assuming (as warranted by § 13.6.2, next) that the n × n square matrix AᵀA is invertible, the simplified equation implies the approximate but optimal least-squares solution

    x = (AᵀA)⁻¹Aᵀb  (13.28)

to the unsolvable linear system (13.25) in the restricted but quite typical case that A and b are real and A has full column rank.

Equation (13.28) plots the line on Fig. 13.1's right. As the reader can see, the line does not pass through all the points, for no line can; but it does pass pretty convincingly nearly among them. In fact it passes optimally nearly among them, in the squared-residual norm sense of (13.27). No line can pass more nearly.¹⁴

¹⁴ Here is a nice example of the use of the mathematical adjective optimal in its adverbial form. "Optimal" means "best." Many problems in applied mathematics involve discovering the best of something. What constitutes the best however can be a matter of judgment, even of dispute. We will leave to the philosopher and the theologian the important question of what constitutes objective good, for applied mathematics is a poor guide to such mysteries. The role of applied mathematics is to construct suitable models to calculate quantities needed to achieve some definite good; its role is not, usually, to identify the good as good in the first place.
One generally establishes mathematical optimality by some suitable, nonnegative, real cost function or metric, and the smaller the metric, the better. The metric (13.27) is so chosen. "But," comes the objection, "what if some more complicated metric is better?" Well, if the other metric really, objectively is better, then one should probably use it. In general however the mathematical question is: what does one mean by "better?" Better by which metric? Each metric is better according to itself. This is where the mathematician's experience, taste and judgment come in.
In the present section's example, too much labor on the freeway job might actually slow construction rather than speed it. One could therefore seek to fit not a line but some downward-turning curve to the data. Mathematics offers many downward-turning curves. A circle, maybe? Not likely. An experienced mathematician would probably reject the circle on the aesthetic yet practical ground that the parabola b = αu² + σu + γ lends itself to easier analysis. Strictly speaking, the mathematics cannot tell us which metric to use, but where no other consideration prevails the applied mathematician tends to choose the metric that best simplifies the mathematics at hand—and, really, that is about as good a way to choose a metric as any. Yet even fitting a mere straight line offers choices. One might fit the line to the points (bi, ui) rather than to the points (ui, bi), or to the points (ln ui, ln bi) rather than to the points (ui, bi). The three resulting lines differ subtly. They predict production differently. The adjective "optimal" alone evidently does not always tell us all we need to know. Section 6.3 offers a choice between averages that resembles in spirit this footnote's choice between metrics.
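A minimal numeric sketch of (13.28) in Python with the numpy library, fitting the straight line b = σu + γ to made-up production data (the u and b values below are hypothetical, not the book's):

    import numpy as np

    u = np.array([10., 15., 20., 25., 30.])    # workers, weeks 1 through 5
    b = np.array([1.1, 1.8, 2.2, 2.9, 3.4])    # lengths newly completed

    A = np.column_stack([u, np.ones_like(u)])  # data on the left, ones on the right
    x = np.linalg.solve(A.T @ A, A.T @ b)      # x = (A^T A)^{-1} A^T b, per (13.28)
    sigma, gamma = x

    # The same optimum, found by the library's own least-squares routine:
    assert np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0])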

13.6.2 The invertibility of A*A

Section 13.6.1 has assumed correctly but unwarrantedly that the product AᵀA were invertible for real A of full column rank. For real A, it happens that Aᵀ = A*, so it only broadens the same assumption to suppose that the product A*A were invertible for complex A of full column rank.¹⁵ This subsection warrants the latter assumption, thereby incidentally also warranting the former.

Let A be a complex, m × n matrix of full column rank r = n ≤ m. Suppose falsely that A*A were not invertible but singular. Since the product A*A is a square, n × n matrix, this is to suppose (§ 13.1) that the product's rank r′ < n were less than full, implying (§ 12.5.4) that its columns (as its rows) depended on one another. This would mean that there existed a nonzero, n-element vector u for which

    A*Au = 0,  In u ≠ 0.

Left-multiplying by u* would give that u*A*Au = 0, or in other words that

    (Au)*(Au) = Σ_{i=1}^{n} |[Au]i|² = 0.

But this could only be so if

    Au = 0,  In u ≠ 0,

impossible when the columns of A are independent. The contradiction proves false the assumption which gave rise to it. The false assumption: that A*A were singular. Thus, the n × n product A*A is invertible for any tall or square, m × n matrix A of full column rank r = n ≤ m.

¹⁵ Notice that if A is tall, then A*A is a compact, n × n square, whereas AA* is a big, m × m square. It is the compact square that concerns this section. The big square is not very interesting and in any case is not invertible.

13.6.3 Positive definiteness

An n × n matrix C is positive definite if and only if

    ℑ(u*Cu) = 0 and ℜ(u*Cu) > 0 for all In u ≠ 0.  (13.29)

As in § 13.6.2, here also when a matrix A has full column rank r = n ≤ m the product u*A*Au = (Au)*(Au) is real and positive for all nonzero, n-element vectors u. Thus per (13.29) the product A*A is positive definite for any matrix A of full column rank.

An n × n matrix C is nonnegative definite if and only if

    ℑ(u*Cu) = 0 and ℜ(u*Cu) ≥ 0 for all u.  (13.30)

By reasoning like the last paragraph's, the product A*A is nonnegative definite for any matrix A whatsoever.

Such definitions might seem opaque, but their sense is that a positive definite operator never reverses the thing it operates on, that is, that the product Au points more in the direction of u than of −u. A positive definite operator resembles a positive scalar in this sense. Section 13.8 explains further.

13.6.4 The Moore-Penrose pseudoinverse

Not every m × n matrix A enjoys full rank. Nevertheless, every m × n matrix A of rank r can be factored into a product¹⁶

    A = BC

of an m × r tall or square matrix B and an r × n broad or square matrix C, both of which factors themselves enjoy full rank r. (If A happens to have full row or column rank, then one can just choose B = Im or C = In; but even if A lacks full rank, the Gauss-Jordan decomposition of eqn. (12.2) finds at least the full-rank factorization B = G>Ir, C = IrG<.) This being so, a conjecture seems warranted. Suppose that, inspired by (13.28), we manipulated (13.25) by the successive steps

    Ax ≈ b,
    BCx ≈ b,
    (B*B)⁻¹B*BCx ≈ (B*B)⁻¹B*b,
    Cx ≈ (B*B)⁻¹B*b.

¹⁶ This subsection uses the symbols B and b for unrelated purposes, which is unfortunate but conventional. See footnote 11.

and we know from § 13. the latter of which admittedly requires real A of full column rank but does minimize the residual when its requirements are met.6.6. (13. According to (13.26). So here is our conjecture: • no x enjoys a smaller squared residual norm r∗ r than the x of (13.31)? One can but investigate. u ≈ (CC ∗ )−1 (B ∗ B)−1 B ∗ b.13.” x = C ∗ (CC ∗ )−1 (B ∗ B)−1 B ∗ b. . moderate or large. of some alternate x from the x of (13. 353 thus restricting x to the space addressed by the independent columns of C ∗ . and.31) Equation (13. CC ∗ u ≈ (B ∗ B)−1 B ∗ b. this is [b − Ax]∗ [b − Ax] ≤ [b − (A)(x + ∆x)]∗ [b − (A)(x + ∆x)].31) does.31) is strictly least in magnitude. the x of (13. whether small. and • among all x that enjoy the same.2 at least that the two matrices it tries to invert are invertible. altering the “≈” sign to “=. Changing the variable back and (because we are conjecturing and can do as we like). THE PSEUDOINVERSE AND LEAST SQUARES Then suppose that we changed C ∗ u ← x. The conjecture is bold.31) does resemble (13. even if there were more than one x which minimized the residual. After all. The ﬁrst point of the conjecture is symbolized r∗ (x)r(x) ≤ r∗ (x + ∆x)r(x + ∆x). Reorganizing.31).28). Continuing. (13. where ∆x represents the deviation.31) has a pleasingly symmetrical form. [b − Ax]∗ [b − Ax] ≤ [(b − Ax) − A ∆x]∗ [(b − Ax) − A ∆x]. but if you think about it in the right way it is not unwarranted under the circumstance. one of them might be smaller than the others: why not the x of (13. minimal squared residual norm.

or in other words,

    [b − Ax]*[b − Ax] ≤ [(b − Ax) − A ∆x]*[(b − Ax) − A ∆x].

Distributing factors and canceling like terms,

    0 ≤ −∆x*A*(b − Ax) − (b − Ax)*A ∆x + ∆x*A*A ∆x.

But according to (13.31) and the full-rank factorization A = BC,

    A*(b − Ax) = A*b − A*Ax
               = [C*B*][b] − [C*B*][BC][C*(CC*)⁻¹(B*B)⁻¹B*b]
               = C*B*b − C*(B*B)(CC*)(CC*)⁻¹(B*B)⁻¹B*b
               = C*B*b − C*B*b = 0,

which reveals two of the inequality's remaining three terms to be zero, leaving an assertion that

    0 ≤ ∆x*A*A ∆x.

Each step in the present paragraph is reversible,¹⁷ so the assertion in the last form is logically equivalent to the conjecture's first point, with which the paragraph began. Moreover, the assertion in the last form is correct because the product of any matrix and its adjoint according to § 13.6.3 is a nonnegative definite operator, thus establishing the conjecture's first point.

The conjecture's first point, now established, has it that no x + ∆x enjoys a smaller squared residual norm than the x of (13.31) does. It does not claim that no x + ∆x enjoys the same, minimal squared residual norm. The latter case is symbolized

    r*(x)r(x) = r*(x + ∆x)r(x + ∆x),

or equivalently by the last paragraph's logic,

    0 = ∆x*A*A ∆x;

or in other words,

    A ∆x = 0.

But A = BC, so this is to claim that B(C ∆x) = 0, which since B has full column rank is possible only if

    C ∆x = 0.

¹⁷ The paragraph might inscrutably but logically instead have ordered the steps in reverse as in §§ 6.3.2 and 9.5. See Ch. 6's footnote 16.

Considering the product ∆x*x in light of (13.31) and the last equation, we observe that

    ∆x*x = ∆x*[C*(CC*)⁻¹(B*B)⁻¹B*b]
         = [C ∆x]*[(CC*)⁻¹(B*B)⁻¹B*b],

which is to observe that ∆x*x = 0 for any ∆x for which x + ∆x achieves minimal squared residual norm.

Returning attention to the conjecture, its second point is symbolized

    x*x < (x + ∆x)*(x + ∆x)

for any ∆x ≠ 0 for which x + ∆x achieves minimal squared residual norm (note that it's "<" this time, not "≤" as in the conjecture's first point). Distributing factors and canceling like terms,

    0 < x*∆x + ∆x*x + ∆x*∆x.

But the last paragraph has found that ∆x*x = 0 for precisely such ∆x as we are considering here, so the last inequality reduces to read

    0 < ∆x*∆x,

which naturally for ∆x ≠ 0 is true. Since each step in the paragraph is reversible, reverse logic establishes the conjecture's second point. With both its points established, the conjecture is true.

If A = BC is a full-rank factorization, then the matrix¹⁸

    A† ≡ C*(CC*)⁻¹(B*B)⁻¹B*  (13.32)

of (13.31) is called the Moore-Penrose pseudoinverse of A, or, more briefly, the pseudoinverse of A. Whether underdetermined, exactly determined, overdetermined or even degenerate, every matrix has a Moore-Penrose pseudoinverse.

¹⁸ Some books print A† as A⁺.
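A numeric sketch of (13.32) in Python with numpy, on a hypothetical rank-1 matrix built directly from a full-rank factorization A = BC; the result agrees with the library's own pseudoinverse:

    import numpy as np

    B = np.array([[1.], [2.], [3.]])   # m x r, full column rank
    C = np.array([[4., 5.]])           # r x n, full row rank
    A = B @ C                          # 3 x 2, rank r = 1

    Bs, Cs = B.conj().T, C.conj().T
    A_dag = Cs @ np.linalg.inv(C @ Cs) @ np.linalg.inv(Bs @ B) @ Bs  # (13.32)

    assert np.allclose(A_dag, np.linalg.pinv(A))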

Yielding the optimal approximation

    x = A†b,  (13.33)

the Moore-Penrose solves the linear system (13.25) as well as the system can be solved—exactly if possible, with minimal squared residual norm if impossible—and moreover minimizes the solution's squared magnitude x*x (which the solution of eqn. 13.23 fails to do). If A is square and invertible, then the Moore-Penrose A† = A⁻¹ is just the inverse, and then of course (13.33) solves the system uniquely and exactly. Nothing can solve the system uniquely if A has broad shape, but the Moore-Penrose still solves the system exactly in that case as long as A has full row rank, moreover minimizing x*x as a side-benefit. If A lacks full row rank, then the Moore-Penrose solves the system as nearly as the system can be solved (as in Fig. 13.1) and as a side-benefit also minimizes x*x. The Moore-Penrose is thus a general-purpose solver and approximator for linear systems. It is a significant discovery.¹⁹

13.7 The multivariate Newton-Raphson iteration

When we first met the Newton-Raphson iteration in § 4.8 we lacked the matrix notation and algebra to express and handle vector-valued functions adeptly. Now that we have the notation and algebra we can write down the multivariate Newton-Raphson iteration almost at once.

The iteration approximates the nonlinear vector function f(x) by its tangent

    f̃k(x) = f(xk) + [df/dx]|x=xk (x − xk),

where df/dx is the Jacobian derivative of § 11.10. It then approximates the root xk+1 as the point at which f̃k(xk+1) = 0:

    f̃k(xk+1) = 0 = f(xk) + [df/dx]|x=xk (xk+1 − xk).

Solving for xk+1 (approximately if necessary), we have that

    xk+1 = { x − [df/dx]† f(x) }|x=xk,  (13.34)

where [·]† is the Moore-Penrose pseudoinverse of § 13.6—which is just the ordinary inverse [·]⁻¹ of § 13.1 if f and x happen each to have the same number of elements. Refer to § 4.8 and Fig. 4.5.

¹⁹ [7, § 3.3][58, "Moore-Penrose generalized inverse"][63]
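A sketch of (13.34) in Python with numpy, for a hypothetical two-element f(x) whose root is known; the pseudoinverse of the Jacobian reduces here to the ordinary inverse because f and x have the same number of elements:

    import numpy as np

    def f(x):
        return np.array([x[0]**2 + x[1]**2 - 4.0,  # a circle and a line:
                         x[0] - x[1]])             #   roots where they cross

    def jacobian(x):
        return np.array([[2.0*x[0], 2.0*x[1]],
                         [1.0,     -1.0     ]])

    x = np.array([1.0, 0.5])                        # initial guess
    for _ in range(20):
        x = x - np.linalg.pinv(jacobian(x)) @ f(x)  # x_{k+1} per (13.34)

    assert np.allclose(f(x), 0.0)                   # converges near (2**0.5, 2**0.5)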

Despite the Moore-Penrose notation of (13.34), the Newton-Raphson iteration is not normally meant to be applied at a value of x for which the Jacobian is degenerate. The iteration intends rather in light of (13.32) that

    [df/dx]† = [df/dx]*([df/dx][df/dx]*)⁻¹  if r = m ≤ n,
             = [df/dx]⁻¹                    if r = m = n,  (13.35)
             = ([df/dx]*[df/dx])⁻¹[df/dx]*  if r = n ≤ m,

where B = Im in the first case and C = In in the last. It does not intend to use the full (13.32). If both r < m and r < n—which is to say, if the Jacobian is degenerate—then (13.35) fails, as though the curve of Fig. 4.5 ran horizontally at the test point, whereat one quits, restarting the iteration from another point.

13.8 The dot product

The dot product of two vectors, also called the inner product,²¹ is the product of the two vectors to the extent to which they run in the same direction. It is written a · b. In general,

    a · b = (a1e1 + a2e2 + · · · + anen) · (b1e1 + b2e2 + · · · + bnen).

But if the dot product is to mean anything, it must be that

    ei · ej = δij,  (13.36)

giving

    a · b = a1b1 + a2b2 + · · · + anbn;

or, more concisely,

    a · b ≡ aᵀb = Σ_{j=−∞}^{∞} aj bj.  (13.37)

²¹ The term inner product is often used to indicate a broader class of products than the one defined here, especially in some of the older literature. Where used, the notation usually resembles ⟨a, b⟩ or (b, a), both of which mean a* · b (or, more broadly, some similar product), except that which of a and b is conjugated depends on the author. Most recently, at least in the author's country, the usage ⟨a, b⟩ ≡ a* · b seems to be emerging as standard where the dot is not used [7, § 3.1][24, Ch. 4]. This book prefers the dot.

Where vectors may have complex elements, usually one is not interested in a · b so much as in

    a* · b ≡ a*ᵀb = Σ_{j=−∞}^{∞} aj* bj.  (13.38)

The reason is that

    ℜ(a* · b) = ℜ(a) · ℜ(b) + ℑ(a) · ℑ(b),

with the product of the imaginary parts added not subtracted, thus honoring the right Argand sense of "the product of the two vectors to the extent to which they run in the same direction."

The dot notation does not worry whether its arguments are column or row vectors, incidentally:

    a · b = a · bᵀ = aᵀ · b = aᵀ · bᵀ = aᵀb.

That is, if either vector is wrongly oriented, the notation implicitly reorients it before using it. (The more orderly notation aᵀb by contrast assumes that both are proper column vectors.)

By the Pythagorean theorem, the dot product

    |a|² = a* · a  (13.39)

gives the square of a vector's magnitude, always real, never negative. The unit vector in a's direction then is

    â ≡ a/|a| = a/√(a* · a),  (13.40)

from which

    |â|² = â* · â = 1.  (13.41)

When two vectors do not run in the same direction at all, such that

    a* · b = 0,  (13.42)

the two vectors are said to lie orthogonal to one another. Geometrically this puts them at right angles. For other angles θ between two vectors,

    â* · b̂ = cos θ,  (13.43)

which formally defines the angle θ even when a and b have more than three elements each.
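A small numeric sketch of (13.38) through (13.43) in Python with numpy, on made-up vectors; numpy's vdot conjugates its first argument, matching a* · b:

    import numpy as np

    a = np.array([3.0, 4.0, 0.0])
    b = np.array([4.0, -3.0, 5.0])

    a_hat = a / np.sqrt(np.vdot(a, a).real)   # unit vector per (13.40)
    b_hat = b / np.sqrt(np.vdot(b, b).real)

    assert np.isclose(np.vdot(a_hat, a_hat).real, 1.0)  # (13.41)
    assert np.isclose(np.vdot(a, b), 0.0)     # these two lie orthogonal, (13.42)

    theta = np.arccos(np.vdot(a_hat, b_hat).real)       # the angle of (13.43)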

13.9 The complex vector triangle inequalities

The triangle inequalities (2.44) and (3.21) lead one to hypothesize generally that

    |a| − |b| ≤ |a + b| ≤ |a| + |b|  (13.44)

for any complex, n-dimensional vectors a and b.

The proof of the sum hypothesis that |a + b| ≤ |a| + |b| is by contradiction. We suppose falsely that

    |a + b| > |a| + |b|.

Squaring and using (13.39),

    (a + b)* · (a + b) > a* · a + 2|a||b| + b* · b.

Distributing factors and canceling like terms,

    a* · b + b* · a > 2|a||b|,

in which we observe that the left side must be positive because the right side is nonnegative. Splitting a and b each into real and imaginary parts on the inequality's left side and then halving both sides,

    ℜ(a) · ℜ(b) + ℑ(a) · ℑ(b) > |a||b|.

Defining the new, 2n-dimensional real vectors

    f ≡ [ℜ(a1); ℑ(a1); ℜ(a2); ℑ(a2); . . . ; ℜ(an); ℑ(an)],
    g ≡ [ℜ(b1); ℑ(b1); ℜ(b2); ℑ(b2); . . . ; ℜ(bn); ℑ(bn)],

we make the inequality to be

    f · g > |f||g|.

(This naturally is impossible for any case in which f = 0 or g = 0, but wishing to establish impossibility for all cases we pretend not to notice and continue reasoning as follows.) Squaring again,

    (f · g)² > (f · f)(g · g).

That is,

    Σ_{i,j} fi gi fj gj > Σ_{i,j} fi² gj².

Subtracting Σ_i (fi gi)² from each side and reordering factors,

    Σ_{i≠j} [(fi gj)(gi fj)] > Σ_{i≠j} (fi gj)²,

which we can cleverly rewrite in the form

    Σ_{i<j} [2(fi gj)(gi fj)] > Σ_{i<j} [(fi gj)² + (gi fj)²],

where

    Σ_{i<j} = Σ_{i=1}^{2n−1} Σ_{j=i+1}^{2n}.

Transferring all terms to the inequality's right side,

    0 > Σ_{i<j} [(fi gj)² − 2(fi gj)(gi fj) + (gi fj)²].

This is

    0 > Σ_{i<j} [fi gj − gi fj]²,

which, since we have constructed the vectors f and g to have real elements only, is impossible in all cases. The contradiction proves false the assumption that gave rise to it, thus establishing the sum hypothesis of (13.44).

The difference hypothesis that |a| − |b| ≤ |a + b| is established by defining a vector c such that

    a + b + c = 0,

whereupon according to the sum hypothesis (which we have already established),

    |a + c| ≤ |a| + |c|,
    |b + c| ≤ |b| + |c|.

That is,

    |−b| ≤ |a| + |−a − b|,
    |−a| ≤ |b| + |−a − b|,

which is the difference hypothesis in disguise. This completes the proof of (13.44).

As in § 3.10, here too we can extend the sum inequality to the even more general form

    |Σ_k ak| ≤ Σ_k |ak|.  (13.45)

13.10 The orthogonal complement

The m × (m − r) kernel (§ 13.3)²²

    A⊥ ≡ A*K  (13.46)

is an interesting matrix. By definition of the kernel, the columns of A*K are the independent vectors uj for which A*uj = 0, which—inasmuch as the rows of A* are the adjoints of the columns of A—is possible only when each uj lies orthogonal to every column of A. This says that the columns of A⊥ ≡ A*K address the complete space of vectors that lie orthogonal to A's columns, such that

    A⊥*A = 0 = A*A⊥.  (13.47)

The matrix A⊥ is called the orthogonal complement²³ or perpendicular matrix to A. Among other uses, the orthogonal complement A⊥ supplies the columns A lacks to reach full row rank. Properties include that

    A*K = A⊥,
    A*⊥ = AK.  (13.48)

13.11 Gram-Schmidt orthonormalization

If a vector x = AK a belongs to a kernel space AK (§ 13.3), then so equally does any αx. If the vectors x1 = AK a1 and x2 = AK a2 both belong, then so does α1x1 + α2x2. Kernel vectors have no inherent scale. If I claim AK = [3 4 5; −1 1 0]ᵀ to represent a kernel, then you are not mistaken arbitrarily to rescale each column of my AK by a separate nonzero factor, instead for instance representing the same kernel

²² The symbol A⊥ [33][7][47] can be pronounced "A perp," short for "A perpendicular," since by (13.47) A⊥ is in some sense perpendicular to A. If we were really precise, we might write not A⊥ but A⊥(m). Refer to footnote 5.
²³ [33, § 3.VI.3]

as AK = [6 8 0xA; −1/7 1/7 0]ᵀ. Style however generally asks the applied mathematician to remove the false appearance of scale by using (13.40) to normalize the columns of a kernel matrix to unit magnitude before reporting them. The same goes for the eigenvectors of Ch. 14 to come.

Where a kernel matrix AK has two or more columns (or a repeated eigenvalue has two or more eigenvectors), style generally asks the applied mathematician not only to normalize but also to orthogonalize the columns before reporting them.

One orthogonalizes a vector b with respect to a vector a by subtracting from b a multiple of a such that

    a* · b⊥ = 0,
    b⊥ ≡ b − βa,

where the symbol b⊥ represents the orthogonalized vector. Substituting the second of these equations into the first and solving for β yields

    β = (a* · b)/(a* · a).  (13.49)

Hence,

    a* · b⊥ = 0,
    b⊥ ≡ b − a [(a* · b)/(a* · a)].  (13.50)

But according to (13.40), a = â √(a* · a); and according to (13.41), â* · â = 1; so,

    b⊥ = b − â(â* · b);

or, in matrix notation,

    b⊥ = b − â(â*)(b).

This is arguably better written,

    b⊥ = [I − (â)(â*)] b  (13.51)

(observe that it's [â][â*], a matrix, rather than the scalar [â*][â]).

One orthonormalizes a set of vectors by orthogonalizing them with respect to one another, then by normalizing each of them to unit magnitude. The procedure to orthonormalize several vectors

    {x1, x2, x3, . . . , xn}

therefore is as follows. First, normalize x1 by (13.40); call the result x̂1⊥. Second, orthogonalize x2 with respect to x̂1⊥ by (13.50) or (13.51), then normalize it; call the result x̂2⊥. Third, orthogonalize x3 with respect to x̂1⊥ then to x̂2⊥, then normalize it; call the result x̂3⊥. Proceed in this manner through the several xj. Symbolically,

    x̂j⊥ = xj⊥ / √(xj⊥* xj⊥),
    xj⊥ ≡ [Π_{i=1}^{j−1} (I − x̂i⊥ x̂i⊥*)] xj.  (13.52)

By the vector replacement principle of § 12.4 in light of (13.49), the resulting orthonormal set of vectors

    {x̂1⊥, x̂2⊥, x̂3⊥, . . . , x̂n⊥}

addresses the same space as did the original set.

Orthonormalization naturally works equally for any linearly independent set of vectors, not only for kernel vectors or eigenvectors. By the technique, one can conveniently replace a set of independent vectors by an equivalent, orthonormal set which addresses precisely the same space.

13.11.1 Efficient implementation

To turn an equation like the latter line of (13.52) into an efficient numerical algorithm sometimes demands some extra thought, in perspective of whatever it happens to be that one is trying to accomplish. If all one wants is some vectors orthonormalized, then the equation as written is neat but is overkill, because the product x̂i⊥ x̂i⊥* is a matrix, whereas the product x̂i⊥* xj implied by (13.51) is just a scalar. Fortunately, one need not apply the latter line of (13.52) exactly as written. One can instead introduce intermediate vectors xji, representing the multiplication in the admittedly messier form

    xj1 ≡ xj,
    xj(i+1) ≡ xji − (x̂i⊥* · xji) x̂i⊥,
    xj⊥ = xjj.  (13.53)
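A sketch of (13.53) in Python with numpy, for real vectors (complex ones would want a conjugated dot product); each column is orthogonalized against the already-finished unit vectors one scalar projection at a time, then normalized:

    import numpy as np

    def orthonormalize(X):
        """Orthonormalize the independent columns of X, per (13.53)."""
        Q = X.astype(float).copy()
        n = Q.shape[1]
        for j in range(n):
            for i in range(j):   # x_{j(i+1)} = x_{ji} - (x_i . x_{ji}) x_i
                Q[:, j] -= (Q[:, i] @ Q[:, j]) * Q[:, i]
            Q[:, j] /= np.sqrt(Q[:, j] @ Q[:, j])   # normalize per (13.40)
        return Q

    X = np.array([[1., 1., 0.],
                  [1., 0., 1.],
                  [0., 1., 1.]])
    Q = orthonormalize(X)
    assert np.allclose(Q.T @ Q, np.eye(3))

Only one column is updated at a time, in place, which is exactly the memory economy the next paragraph describes.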

Besides obviating the matrix I − x̂i⊥ x̂i⊥* and the associated matrix multiplication, the messier form (13.53) has the significant additional practical virtue that it lets one forget each intermediate vector xji immediately after using it. (A well-written orthonormalizing computer program reserves memory for one intermediate vector only, which memory it repeatedly overwrites—and, actually, probably does not even reserve that much, working rather in the memory space it has already reserved for x̂j⊥.)²⁴ Other equations one algorithmizes can likewise benefit from thoughtful rendering.

13.11.2 The Gram-Schmidt decomposition

The orthonormalization technique this section has developed is named the Gram-Schmidt process. One can turn it into the Gram-Schmidt decomposition

    A = QR = QUDS,
    R ≡ UDS,  (13.54)

also called the orthonormalizing or QR decomposition, by an algorithm that somewhat resembles the Gauss-Jordan algorithm of § 12.3.3; except that (12.4) here becomes

    A = Q̃ŨD̃S̃  (13.55)

and initially Q̃ ← A. By elementary column operations based on (13.54) and (13.55), the algorithm gradually transforms Q̃ into a dimension-limited, m × r matrix Q of orthonormal columns, distributing the inverse elementaries to Ũ, D̃ and S̃ according to Table 12.1—where the latter three working matrices ultimately become the extended-operational factors U, D and S of (13.54).

Borrowing the language of computer science we observe that the indices i and j of (13.52) and (13.53) imply a two-level nested loop, one level looping over j and the other over i. The equations suggest j-major nesting, with the loop over j at the outer level and the loop over i at the inner, such that the several (i, j) index pairs occur in the sequence (reading left to right then top to bottom)

    (1, 2)
    (1, 3) (2, 3)
    (1, 4) (2, 4) (3, 4)
      ···    ···    ···

²⁴ [77, "Gram-Schmidt process," 04:48, 11 Aug. 2007]

In reality, however, (13.53)'s middle line requires only that no x̂i⊥ be used before it is fully calculated; otherwise that line does not care which (i, j) pair follows which. The i-major nesting

    (1, 2) (1, 3) (1, 4) · · ·
           (2, 3) (2, 4) · · ·
                  (3, 4) · · ·
                           ···

bringing the very same index pairs in a different sequence, is just as valid. We choose i-major nesting on the subtle ground that it affords better information to the choice of column index p during the algorithm's step 3.

The algorithm, in detail:

1. Begin by initializing

    Ũ ← I,  D̃ ← I,  S̃ ← I,  Q̃ ← A,  i ← 1.

2. (Besides arriving at this point from step 1 above, the algorithm also reënters here from step 9 below.) Observe that Ũ enjoys the major partial unit triangular form L̃{i−1}T (§ 11.8.5), that D̃ is a general scaling operator (§ 11.7.2) with d̃jj = 1 for all j ≥ i, that S̃ is a permutor (§ 11.7.1), and that the first through (i − 1)th columns of Q̃ consist of mutually orthonormal unit vectors.

3. Choose a column p ≥ i of Q̃ containing at least one nonzero element. (The simplest choice is perhaps p = i as long as the ith column does not happen to be null, but one might instead prefer to choose the column of greatest magnitude, or to choose randomly, among other heuristics.) If Q̃ is null in and rightward of its ith column such that no column p ≥ i remains available to choose, then skip directly to step 10.

4. Observing that (13.55) can be expanded to read

    A = (Q̃T[i↔p])(T[i↔p]ŨT[i↔p])(T[i↔p]D̃T[i↔p])(T[i↔p]S̃)
      = (Q̃T[i↔p])(T[i↔p]ŨT[i↔p])(D̃)(T[i↔p]S̃),

where the latter line has applied a rule from Table 12.1, interchange the chosen pth column to the ith position by

    Q̃ ← Q̃T[i↔p],
    Ũ ← T[i↔p]ŨT[i↔p],
    S̃ ← T[i↔p]S̃.

5. Observing that (13.55) can be expanded to read

    A = (Q̃T(1/α)[i])(Tα[i]ŨT(1/α)[i])(Tα[i]D̃)(S̃),

normalize the ith column of Q̃ by

    Q̃ ← Q̃T(1/α)[i],
    Ũ ← Tα[i]ŨT(1/α)[i],
    D̃ ← Tα[i]D̃,

where

    α = √([Q̃]*∗i · [Q̃]∗i).

6. Initialize

    j ← i + 1.

7. (Besides arriving at this point from step 6 above, the algorithm also reënters here from step 8 below.) If j > n then skip directly to step 9. Otherwise, observing that (13.55) can be expanded to read

    A = (Q̃T−β[ij])(Tβ[ij]Ũ)(D̃S̃),

orthogonalize the jth column of Q̃ per (13.53) with respect to the ith column by

    Q̃ ← Q̃T−β[ij],
    Ũ ← Tβ[ij]Ũ,

where

    β = [Q̃]*∗i · [Q̃]∗j.

8. Increment

    j ← j + 1

and return to step 7.

9. Increment

    i ← i + 1

and return to step 2.

10. Let

    Q ≡ Q̃,  U ≡ Ũ,  D ≡ D̃,  S ≡ S̃,
    r = i − 1.

End.

Though the Gram-Schmidt algorithm broadly resembles the Gauss-Jordan, at least two significant differences stand out: (i) the Gram-Schmidt is one-sided because it operates only on the columns of Q̃, never on the rows; (ii) since Q is itself dimension-limited, the Gram-Schmidt decomposition (13.54) needs and has no explicit factor Ir.

As in § 12.3, here also one sometimes prefers that S = I. The algorithm optionally supports this preference if the m × n matrix A has full column rank r = n, when null columns cannot arise, if one always chooses p = i during the algorithm's step 3. Such optional discipline maintains S = I when desired.

Whether S = I or not, the matrix Q = QIr has only r columns, so one can write (13.54) as

    A = (QIr)(R).

Reassociating factors, this is

    A = (Q)(IrR),  (13.56)

which per (12.16) is a proper full-rank factorization with which one can compute the pseudoinverse A† of A (see eqn. 13.32, but see also eqn. 13.65, below).

If the Gram-Schmidt decomposition (13.54) looks useful, it is even more useful than it looks. The most interesting of its several factors is the m × r orthonormalized matrix Q, whose orthonormal columns address the same space the columns of A themselves address. If Q reaches the maximum possible rank r = m, achieving square, m × m shape, then it becomes a unitary matrix—the subject of § 13.12.

Before treating the unitary matrix, however, let us pause to extract a kernel from the Gram-Schmidt decomposition in § 13.11.3, next.

13.11.3 The Gram-Schmidt kernel formula

Like the Gauss-Jordan decomposition in (13.7), the Gram-Schmidt decomposition too brings a kernel formula. To develop and apply it, one decomposes an m × n matrix

    A = QR  (13.57)

per the Gram-Schmidt (13.54) and its algorithm in § 13.11.2. Observing that the r independent columns of the m × r matrix Q address the same space the columns of A address, one then constructs the m × (r + m) matrix

    A′ ≡ Q + Im H−r = [ Q  Im ]  (13.58)

and decomposes it too,

    A′ = Q′R′,  (13.59)

again by Gram-Schmidt—with the differences that, this time, one chooses p = 1, 2, 3, . . . , r during the first r instances of the algorithm's step 3, and that one skips the unnecessary step 7 for all j ≤ r, on the ground that the earlier Gram-Schmidt application of (13.57) has already orthonormalized the first r columns of A′, which columns, after all, are just Q. The resulting m × m, full-rank square matrix

    Q′ = Q + A⊥H−r = [ Q  A⊥ ]  (13.60)

consists of

• r columns on the left that address the same space the columns of A address and

• m − r columns on the right that give a complete orthogonal complement (§ 13.10) A⊥ of A.

Each column has unit magnitude and conveniently lies orthogonal to every other column, left and right.

Equation (13.60) is probably the more useful form, but either form constitutes the Gram-Schmidt kernel formula

    A*K = A⊥ = Q′Hr Im−r,  (13.61)

which as such computes the kernel of a matrix—not of A, but of A*. To compute the kernel of a matrix B by Gram-Schmidt one sets A = B* and applies (13.57) through (13.61).
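A numeric sketch of (13.57) through (13.61) in Python with numpy, computing an orthonormal kernel of a hypothetical rank-1 matrix B; the library's qr routine stands in for the section's Gram-Schmidt algorithm, and the simple column slice assumes the leading columns of A are not null:

    import numpy as np

    B = np.array([[1., 2., 3.],
                  [2., 4., 6.]])       # rank 1, so its kernel is 2-dimensional
    A = B.conj().T                     # set A = B*, per the text
    m = A.shape[0]

    Q, _ = np.linalg.qr(A)             # A = QR, (13.57)
    r = np.linalg.matrix_rank(B)
    Qp, _ = np.linalg.qr(np.hstack([Q[:, :r], np.eye(m)]))  # A' = [Q Im], (13.58)
    BK = Qp[:, r:]                     # rightmost m - r columns, per (13.61)

    assert np.allclose(B @ BK, 0.0)                # BK addresses the kernel of B
    assert np.allclose(BK.T @ BK, np.eye(m - r))   # and comes orthonormalized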

In either the form (13.60) or the form (13.61), the Gram-Schmidt kernel formula does everything the Gauss-Jordan kernel formula (13.7) does and in at least one sense does it better: if one wants a Gauss-Jordan kernel orthonormalized, then one must orthonormalize it as an extra step, whereas the Gram-Schmidt kernel comes already orthonormalized. Refer to (13.48).

Being square, the m × m matrix Q′ is a unitary matrix, as the last paragraph of § 13.11.2 has alluded. The unitary matrix is the subject of § 13.12 that follows.

13.12 The unitary matrix

When the orthonormalized matrix Q of the Gram-Schmidt decomposition (13.54) is square, having the maximum possible rank r = m, it brings one property so interesting that the property merits a section of its own. The property is that

    Q*Q = Im = QQ*.  (13.62)

The reason that Q*Q = Im is that Q's columns are orthonormal, and that the very definition of orthonormality demands that the dot product [Q]*∗i · [Q]∗j of orthonormal columns be zero unless i = j, when the dot product of a unit vector with itself is unity. That Im = QQ* is unexpected, however, until one realizes²⁵ that the equation Q*Q = Im characterizes Q* to be the rank-m inverse of Q, and that § 13.1 lets any rank-m inverse (orthonormal or otherwise) attack just as well from the right as from the left. Thus,

    Q⁻¹ = Q*,  (13.63)

a very useful property. A matrix Q that satisfies (13.62), whether derived from the Gram-Schmidt or from elsewhere, is called a unitary matrix. (Note that the permutor of § 11.7.1 enjoys the property of eqn. 13.63 precisely because it is unitary.)

One immediate consequence of (13.62) is that a square matrix with either orthonormal columns or orthonormal rows is unitary and has both.

The product of two or more unitary matrices is itself unitary if the matrices are of the same dimensionality. To prove it, consider the product

    Q = Qa Qb  (13.64)

of m × m unitary matrices Qa and Qb. Let the symbols qj, qaj and qbj respectively represent the jth columns of Q, Qa and Qb and let the symbol qbij represent the ith element of qbj. By the columnwise interpretation

²⁵ [24, § 4.4]

(§ 11.1.3) of matrix multiplication,

    qj = Σ_i qbij qai.

The adjoint dot product of any two of Q's columns then is

    qj′* · qj = Σ_{i,i′} qbi′j′* qbij qai′* · qai.

But qai′* · qai = δi′i because Qa is unitary,²⁶ so

    qj′* · qj = Σ_i qbij′* qbij = qbj′* · qbj = δj′j,

which says neither more nor less than that the columns of Q are orthonormal, which is to say that Q is unitary, as was to be demonstrated.

Unitary operations preserve length. That is, operating on an m-element vector by an m × m unitary matrix does not alter the vector's magnitude. To prove it, consider the system

    Qx = b.

Multiplying the system by its own adjoint yields

    x*Q*Qx = b*b.

But according to (13.62), Q*Q = Im; so,

    x*x = b*b,

as was to be demonstrated.

Equation (13.63) lets one use the Gram-Schmidt decomposition (13.54) to invert a square matrix as

    A⁻¹ = R⁻¹Q* = S*D⁻¹U⁻¹Q*.  (13.65)

²⁶ This is true only for 1 ≤ i ≤ m, but you knew that already.
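A numeric sketch of (13.62), (13.63) and (13.65) in Python with numpy, on a hypothetical square matrix:

    import numpy as np

    A = np.array([[2., 1., 0.],
                  [1., 3., 1.],
                  [0., 1., 4.]])
    Q, R = np.linalg.qr(A)                     # Gram-Schmidt-style A = QR

    assert np.allclose(Q.T @ Q, np.eye(3))     # Q is unitary, (13.62)
    A_inv = np.linalg.solve(R, Q.T)            # A^{-1} = R^{-1} Q*, (13.65)
    assert np.allclose(A @ A_inv, np.eye(3))

    x = np.array([1.0, -2.0, 2.0])
    assert np.isclose(x @ x, (Q @ x) @ (Q @ x))  # unitary operations preserve length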

Unitary extended operators are certainly possible, for if Q is an m × m dimension-limited matrix, then the extended operator

    Q∞ = Q + (I − Im),

which is just Q with ones running out the main diagonal from its active region, itself meets the unitary criterion (13.62) for m = ∞.

Unitary matrices are so easy to handle that they can sometimes justify significant effort to convert a model to work in terms of them if possible. We shall meet the unitary matrix again in §§ 14.10 and 14.12.

As the chapter's introduction had promised, the matrix has shown its worth here, for without the matrix's notation, arithmetic and algebra most of the chapter's findings would have lain beyond practical reach. The chapter as a whole has demonstrated at least in theory (and usually in practice) techniques to solve any linear system characterized by a matrix of finite dimensionality, whatever the matrix's rank or shape. It has explained how to orthonormalize a set of vectors and has derived from the explanation the useful Gram-Schmidt decomposition. And even so, the single most interesting agent of matrix arithmetic remains yet to be treated. This last is the eigenvalue, and it is the subject of Ch. 14, next.


Chapter 14

The eigenvalue

The eigenvalue is a scalar by which a square matrix scales a vector without otherwise changing it, such that Av = λv. This chapter analyzes the eigenvalue and the associated eigenvector it scales. Before treating the eigenvalue proper, the chapter gathers from across Chs. 11 through 14 several properties all invertible square matrices share, assembling them in § 14.2 for reference. One of these regards the determinant, which opens the chapter.

14.1 The determinant

Through Chs. 11, 12 and 13 the theory of the matrix has developed slowly but pretty straightforwardly. Here comes the first unexpected turn. It begins with an arbitrary-seeming definition. The determinant of an n × n square matrix A is the sum of n! terms, each term the product of n elements, no two elements from the same row or column, terms of positive parity adding to and terms of negative parity subtracting from the sum—a term's parity (§ 11.6) being the parity of the permutor (§ 11.7.1) marking the positions of the term's elements.

Unless you already know about determinants, the definition alone might seem hard to parse, so try this. The inverse of the general 2 × 2 square matrix

    A2 = [ a11 a12 ; a21 a22 ],


by the Gauss-Jordan method or any other convenient technique, is found to be

    A2⁻¹ = [ a22 −a12 ; −a21 a11 ] / (a11a22 − a12a21).

The quantity¹

    det A2 = a11a22 − a12a21

in the denominator is defined to be the determinant of A2. Each of the determinant's terms includes one element from each column of the matrix and one from each row, with parity giving the term its ± sign. The determinant of the general 3 × 3 square matrix by the same rule is

    det A3 = (a11a22a33 + a12a23a31 + a13a21a32)
           − (a13a22a31 + a12a21a33 + a11a23a32);

and indeed if we tediously invert such a matrix symbolically, we do find that quantity in the denominator there.

The parity rule merits a more careful description. The parity of a term like a12a23a31 is positive because the parity of the permutor, or interchange quasielementary (§ 11.7.1),

    P =

    [ 0 0 1
      1 0 0
      0 1 0 ]

marking the positions of the term’s elements is positive. The parity of a term like a13 a22 a31 is negative for the same reason. The determinant comprehends all possible such terms, n! in number, half of positive parity and half of negative. (How do we know that exactly half are of positive and half, negative? Answer: by pairing the terms. For every term like a12 a23 a31 whose marking permutor is P , there is a corresponding a13 a22 a31 whose marking permutor is T[1↔2] P , necessarily of opposite parity. The sole exception to the rule is the 1 × 1 square matrix, which has no second term to pair.)

¹ The determinant det A used to be written |A|, an appropriately terse notation for which the author confesses some nostalgia. The older notation |A| however unluckily suggests "the magnitude of A," which though not quite the wrong idea is not quite the right idea, either. The magnitude |z| of a scalar or |u| of a vector is a real-valued, nonnegative, nonanalytic function of the elements of the quantity in question, whereas the determinant det A is a complex-valued, analytic function. The book follows convention by denoting the determinant as det A for this reason among others.
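The n!-term definition transcribes directly into code. A sketch in Python with numpy (illustrative only: impractical beyond small n, since cheaper methods exist), signing each term by the parity of its marking permutor:

    from itertools import permutations
    import numpy as np

    def det_by_definition(a):
        """Sum of n! terms, each signed by its marking permutor's parity."""
        n = len(a)
        total = 0.0
        for p in permutations(range(n)):
            inversions = sum(1 for i in range(n) for j in range(i + 1, n)
                             if p[i] > p[j])
            term = -1.0 if inversions % 2 else 1.0   # the term's parity
            for i in range(n):
                term *= a[i][p[i]]                   # one element per row and column
            total += term
        return total

    A3 = np.array([[2., 0., 1.],
                   [1., 3., 2.],
                   [0., 1., 1.]])
    assert np.isclose(det_by_definition(A3), np.linalg.det(A3))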


Normally the context implies a determinant's rank n, but the nonstandard notation det(n) A is available especially to call the rank out, stating explicitly that the determinant has exactly n! terms. (See also §§ 11.3.5 and 11.5 and eqn. 11.49.²) It is admitted³ that we have not, as yet, actually shown the determinant to be a generally useful quantity; we have merely motivated and defined it. Historically the determinant probably emerged not from abstract considerations but for the mundane reason that the quantity it represents occurred frequently in practice (as in the A2⁻¹ example above). Nothing however logically prevents one from simply defining some quantity which, at first, one merely suspects will later prove useful. So we do here.⁴

14.1.1 Basic properties

The determinant det A enjoys several useful basic properties. • If ai′′ ∗ ci∗ = ai′ ∗ ai∗ c∗j a∗j ′′ = a∗j ′ a∗j when i = i′ , when i = i′′ , otherwise, when j = j ′ , when j = j ′′ , otherwise,

or if

where i′′ = i′ and j ′′ = j ′ , then

**det C = − det A. Interchanging rows or columns negates the determinant. • If ci∗ =
**

2 3

(14.1)

αai∗ ai∗

when i = i′ , otherwise,

And further Ch. 13’s footnotes 5 and 22. [24, § 1.2] 4 [24, Ch. 1]

376 or if c∗j = then

CHAPTER 14. THE EIGENVALUE

αa∗j a∗j

when j = j ′ , otherwise, (14.2)

det C = α det A.

Scaling a single row or column of a matrix scales the matrix’s determinant by the same factor. (Equation 14.2 tracks the linear scaling property of § 7.3.3 and of eqn. 11.2.) • If ci∗ = or if c∗j = then det C = det A + det B. (14.3) If one row or column of a matrix C is the sum of the corresponding rows or columns of two other matrices A and B, while the three matrices remain otherwise identical, then the determinant of the one matrix is the sum of the determinants of the other two. (Equation 14.3 tracks the linear superposition property of § 7.3.3 and of eqn. 11.2.) • If or if c∗j ′ = 0, then det C = 0. (14.4) A matrix with a null row or column also has a null determinant. • If or if c∗j ′′ = γc∗j ′ , ci′′ ∗ = γci′ ∗ , ci′ ∗ = 0, a∗j + b∗j a∗j = b∗j when j = j ′ , otherwise, ai∗ + bi∗ ai∗ = bi∗ when i = i′ , otherwise,

14.1. THE DETERMINANT where i′′ = i′ and j ′′ = j ′ , then det C = 0.

377

(14.5)

The determinant is zero if one row or column of the matrix is a multiple of another. • The determinant of the adjoint is just the determinant’s conjugate, and the determinant of the transpose is just the determinant itself: det C ∗ = (det C)∗ ; det C T = det C. (14.6)

These basic properties are all fairly easy to see if the deﬁnition of the determinant is clearly understood. Equations (14.2), (14.3) and (14.4) come because each of the n! terms in the determinant’s expansion has exactly one element from row i′ or column j ′ . Equation (14.1) comes because a row or column interchange reverses parity. Equation (14.6) comes because according to § 11.7.1, the permutors P and P ∗ always have the same parity, and because the adjoint operation individually conjugates each element of C. Finally, (14.5) comes because, in this case, every term in the determinant’s expansion ﬁnds an equal term of opposite parity to oﬀset it. Or, more formally, (14.5) comes because the following procedure does not alter the matrix: (i) scale row i′′ or column j ′′ by 1/γ; (ii) scale row i′ or column j ′ by γ; (iii) interchange rows i′ ↔ i′′ or columns j ′ ↔ j ′′ . Not altering the matrix, the procedure does not alter the determinant either; and indeed according to (14.2), step (ii)’s eﬀect on the determinant cancels that of step (i). However, according to (14.1), step (iii) negates the determinant. Hence the net eﬀect of the procedure is to negate the determinant—to negate the very determinant the procedure is not permitted to alter. The apparent contradiction can be reconciled only if the determinant is zero to begin with. From the foregoing properties the following further property can be deduced. • If ci∗ = or if c∗j = a∗j + αa∗j ′ a∗j ai∗ + αai′ ∗ ai∗ when i = i′′ , otherwise, when j = j ′′ , otherwise,

378 where i′′ = i′ and j ′′ = j ′ , then

CHAPTER 14. THE EIGENVALUE

det C = det A.

(14.7)

Adding to a row or column of a matrix a multiple of another row or column does not change the matrix’s determinant. To derive (14.7) for rows (the column proof is similar), one deﬁnes a matrix B such that αai′ ∗ when i = i′′ , bi∗ ≡ ai∗ otherwise. From this deﬁnition, bi′′ ∗ = αai′ ∗ whereas bi′ ∗ = ai′ ∗ , so bi′′ ∗ = αbi′ ∗ , which by (14.5) guarantees that det B = 0. On the other hand, the three matrices A, B and C diﬀer only in the (i′′ )th row, where [C]i′′ ∗ = [A]i′′ ∗ + [B]i′′ ∗ ; so, according to (14.3), det C = det A + det B. Equation (14.7) results from combining the last two equations.

14.1.2 The determinant and the elementary operator

Section 14.1.1 has it that interchanging, scaling or adding rows or columns of a matrix respectively negates, scales or does not alter the matrix’s determinant. But the three operations named are precisely the operations of the three elementaries of § 11.4. Therefore,

      det T[i↔j]A = − det A = det A T[i↔j],
      det Tα[i]A  =  α det A = det A Tα[j],
      det Tα[ij]A =    det A = det A Tα[ij],                    (14.8)
      1 ≤ (i, j) ≤ n,   i ≠ j,

for any n × n square matrix A. Obviously also,

      det IA = det A = det AI,
      det InA = det A = det A In,
      det I = 1 = det In.                                       (14.9)
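Though the text itself supplies no code, a reader wishing to check (14.8) numerically can do so in a few lines of Python (the numpy library and the particular matrices and indices below are arbitrary choices for illustration, not part of the text):

    import numpy as np

    rng = np.random.default_rng(0)
    n = 4
    A = rng.standard_normal((n, n))

    T_swap = np.eye(n); T_swap[[0, 2]] = T_swap[[2, 0]]   # an interchange elementary
    T_scale = np.eye(n); T_scale[1, 1] = 3.0              # a scaling elementary
    T_add = np.eye(n); T_add[3, 0] = -2.5                 # an addition elementary

    detA = np.linalg.det(A)
    print(np.isclose(np.linalg.det(T_swap @ A), -detA))        # interchange negates
    print(np.isclose(np.linalg.det(T_scale @ A), 3.0 * detA))  # scaling scales
    print(np.isclose(np.linalg.det(T_add @ A), detA))          # addition does not alter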


If A is taken to represent an arbitrary product of identity matrices (In and/or I) and elementary operators, then a significant consequence of (14.8) and (14.9), applied recursively, is that the determinant of a product is the product of the determinants, at least where identity matrices and elementary operators are concerned. In symbols,5

      det (∏k Mk) = ∏k det Mk,
      Mk ∈ {In, I, T[i↔j], Tα[i], Tα[ij]},                      (14.10)
      1 ≤ (i, j) ≤ n.

This matters because, as the Gauss-Jordan decomposition of § 12.3 has shown, one can build up any square matrix of full rank by applying elementary operators to In. Section 14.1.4 will put the rule (14.10) to good use.

14.1.3 The determinant of a singular matrix

Equation (14.8) gives elementary operators the power to alter a matrix’s determinant almost arbitrarily—almost arbitrarily, but not quite. What an n × n elementary operator6 cannot do is to change an n × n matrix’s determinant to or from zero. Once zero, a determinant remains zero under the action of elementary operators. Once nonzero, always nonzero. Elementary operators being reversible have no power to breach this barrier. Another thing n × n elementaries cannot do according to § 12.5.3 is to change an n × n matrix’s rank. Nevertheless, such elementaries can reduce any n × n matrix reversibly to Ir , where r ≤ n is the matrix’s rank, by the Gauss-Jordan algorithm of § 12.3. Equation (14.4) has that the n × n determinant of Ir is zero if r < n, so it follows that the n × n determinant of every rank-r matrix is similarly zero if r < n; and complementarily that the n × n determinant of a rank-n matrix is never zero. Singular matrices always have zero determinants; full-rank square matrices never do. One can evidently tell the singularity or invertibility of a square matrix from its determinant alone.

5 Notation like “∈”, first met in § 2.3, can be too fancy for applied mathematics, but it does help here. The notation Mk ∈ {. . .} restricts Mk to be any of the things between the braces. As it happens though, in this case, (14.11) below is going to erase the restriction.
6 That is, an elementary operator which honors an n × n active region. See § 11.3.2.


14.1.4 The determinant of a matrix product

Sections 14.1.2 and 14.1.3 suggest the useful rule that

      det AB = det A det B.                                     (14.11)

To prove the rule, we consider three distinct cases. The first case is that A is singular. In this case, B acts as a column operator on A, whereas according to § 12.5.2 no operator has the power to promote A in rank. Hence the product AB is no higher in rank than A, which says that AB is no less singular than A, which implies that AB like A has a null determinant. Evidently (14.11) holds in the first case.

The second case is that B is singular. The proof here resembles that of the first case.

The third case is that neither matrix is singular. Here, we use Gauss-Jordan to decompose both matrices into sequences of elementary operators and rank-n identity matrices, for which

      det AB = det {[A][B]}
             = det {[∏T · In · ∏T][∏T · In · ∏T]}
             = ∏ det T · det In · ∏ det T · ∏ det T · det In · ∏ det T
             = det {∏T · In · ∏T} det {∏T · In · ∏T}
             = det A det B,

which is a schematic way of pointing out in light of (14.10) merely that since A and B are products of identity matrices and elementaries, the determinant of the product is the product of the determinants. So it is that (14.11) holds in all three cases, as was to be demonstrated. The determinant of a matrix product is the product of the matrix determinants.
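A one-line numerical check of (14.11), again a Python sketch with numpy and arbitrary example matrices:

    import numpy as np
    rng = np.random.default_rng(1)
    A, B = rng.standard_normal((5, 5)), rng.standard_normal((5, 5))
    print(np.isclose(np.linalg.det(A @ B),
                     np.linalg.det(A) * np.linalg.det(B)))  # True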

14.1.5 Determinants of inverse and unitary matrices

From (14.11) it follows that

      det A⁻¹ = 1 / det A,                                      (14.12)

because A⁻¹A = In and det In = 1.

From (14.6) it follows that if Q is a unitary matrix (§ 13.12), then

      det Q∗ det Q = 1,
      |det Q| = 1.                                              (14.13)

The reason is that

      |det Q|² = (det Q)∗(det Q) = det Q∗ det Q = det Q∗Q = det Q⁻¹Q = det In = 1.
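For instance—a minimal Python sketch with an arbitrary angle—the plane rotation is unitary, and its determinant indeed has unit magnitude per (14.13):

    import numpy as np
    phi = 0.7                       # an arbitrary rotation angle
    Q = np.array([[np.cos(phi), -np.sin(phi)],
                  [np.sin(phi),  np.cos(phi)]])
    print(np.isclose(abs(np.linalg.det(Q)), 1.0))  # True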

14.1.6 Inverting the square matrix by determinant

The Gauss-Jordan algorithm comfortably inverts concrete matrices of moderate size, but swamps one in nearly interminable algebra when symbolically inverting general matrices larger than the A₂ at the section’s head. Slogging through the algebra to invert A₃ symbolically nevertheless (the reader need not actually do this unless he desires a long exercise), one quite incidentally discovers a clever way to factor the determinant:

      CᵀA = (det A)In = ACᵀ;
      cij ≡ det Rij;
      [Rij]i′j′ ≡ 1       if i′ = i and j′ = j,
                  0       if i′ = i or j′ = j but not both,
                  ai′j′   otherwise.                            (14.14)

Pictorially,

            [ ···  ∗ ∗ 0 ∗ ∗  ··· ]
            [ ···  ∗ ∗ 0 ∗ ∗  ··· ]
      Rij = [ ···  0 0 1 0 0  ··· ]
            [ ···  ∗ ∗ 0 ∗ ∗  ··· ]
            [ ···  ∗ ∗ 0 ∗ ∗  ··· ],

same as A except in the ith row and jth column. The matrix C, called the cofactor of A, then consists of the determinants of the various Rij. Another way to write (14.14) is

      [CᵀA]ij = (det A)δij = [ACᵀ]ij,                           (14.15)

which comprises two cases. In the case that i = j,

      [ACᵀ]ij = [ACᵀ]ii = ∑ℓ aiℓ ciℓ = ∑ℓ aiℓ det Riℓ = det A = (det A)δij,


wherein the equation ∑ℓ aiℓ det Riℓ = det A states that det A, being a determinant, consists of several terms, each term including one factor from each row of A, where aiℓ provides the ith row and Riℓ provides the other rows.7 In the case that i ≠ j,

      [ACᵀ]ij = ∑ℓ aiℓ cjℓ = ∑ℓ aiℓ det Rjℓ = 0 = (det A)(0) = (det A)δij,

wherein ∑ℓ aiℓ det Rjℓ is the determinant, not of A itself, but rather of A with the jth row replaced by a copy of the ith, which according to (14.5) evaluates to zero. Similar equations can be written for [CᵀA]ij in both cases. The two cases together prove (14.15), hence also (14.14).

Dividing (14.14) by det A, we have that8

      A⁻¹A = In = AA⁻¹,
      A⁻¹ = Cᵀ / det A.                                         (14.16)

Equation (14.16) inverts a matrix by determinant. In practice, it inverts small matrices nicely, through about 4 × 4 dimensionality (the A₂⁻¹ equation at the head of the section is just eqn. 14.16 for n = 2). It inverts 5 × 5 and even 6 × 6 matrices reasonably, too—especially with the help of a computer to do the arithmetic. Though (14.16) still holds in theory for yet larger matrices, and though symbolically it expresses the inverse of an abstract, n × n matrix concisely whose entries remain unspecified, for concrete matrices much bigger than 4 × 4 to 6 × 6 or so its several determinants begin to grow too great and too many for practical calculation. The Gauss-Jordan technique (or even the Gram-Schmidt technique) is preferred to invert concrete matrices above a certain size for this reason.9

7 This is a bit subtle, but if you actually write out A₃ and its cofactor C₃ symbolically, trying (14.15) on them, then you will soon see what is meant.
8 Cramer’s rule [24, § 1.6], of which the reader may have heard, results from applying (14.16) to (13.4). However, Cramer’s rule is really nothing more than (14.16) in a less pleasing form, so this book does not treat Cramer’s rule as such.
9 For very large matrices, even the Gauss-Jordan grows impractical, due to compound floating-point rounding error and the maybe large but nonetheless limited quantity of available computer memory. Iterative techniques ([chapter not yet written]) serve to invert such matrices approximately.
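Equation (14.16) translates directly into code. The following Python sketch (numpy assumed; the function name and test matrix are ours, chosen for illustration) forms each Rij literally as (14.14) defines it, then inverts by determinant:

    import numpy as np

    def cofactor_inverse(A):
        """Invert A per (14.16): A^{-1} = C^T / det A, with c_ij = det R_ij."""
        n = A.shape[0]
        C = np.empty((n, n))
        for i in range(n):
            for j in range(n):
                R = A.astype(float).copy()   # R_ij: same as A except...
                R[i, :] = 0.0                # ...the ith row zeroed,
                R[:, j] = 0.0                # ...the jth column zeroed,
                R[i, j] = 1.0                # ...and a 1 where they cross
                C[i, j] = np.linalg.det(R)
        return C.T / np.linalg.det(A)

    A = np.array([[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 4.0]])
    print(np.allclose(cofactor_inverse(A), np.linalg.inv(A)))   # True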

14.2 Coincident properties

Chs. 11, 12 and 13, plus this chapter up to the present point, have discovered several coincident properties of the invertible n × n square matrix. One does not feel the full impact of the coincidence when these properties are left scattered across the long chapters; so, let us gather and summarize the properties here. A square, n × n matrix evidently has either all of the following properties or none of them, never some but not others.

• The matrix is invertible (§ 13.1).
• Its rows are linearly independent (§§ 12.1 and 12.3.4).
• Its columns are linearly independent (§ 12.5.4).
• Its columns address the same space the columns of In address, and its rows address the same space the rows of In address (§ 12.5.7).
• The Gauss-Jordan algorithm reduces it to In (§ 12.3.3). (In this, per § 12.5.3, the choice of pivots does not matter.)
• Decomposing it, the Gram-Schmidt algorithm achieves a fully square, unitary, n × n factor Q (§ 13.11.2).
• It has full rank r = n (§ 12.5.4).
• The linear system Ax = b it represents has a unique n-element solution x, given any specific n-element driving vector b (§ 13.2).
• The determinant det A ≠ 0 (§ 14.1.3).
• None of its eigenvalues is zero (§ 14.3, below).

The square matrix which has one of these properties, has all of them. The square matrix which lacks one, lacks all. Assuming exact arithmetic, a square matrix is either invertible, with all that that implies, or singular; never both.

The distinction between invertible and singular matrices is theoretically as absolute as (and indeed is analogous to) the distinction between nonzero and zero scalars. Whether the distinction is always useful is another matter. Usually the distinction is indeed useful, but a matrix can be almost singular just as a scalar can be almost zero. Such a matrix is known, among other ways, by its unexpectedly small determinant. Now it is true: in exact arithmetic, a nonzero determinant, no matter how small, implies a theoretically invertible matrix. Practical matrices however often have entries whose values are imprecisely known; and even when they don’t, the computers which invert them tend to do arithmetic imprecisely in floating-point. Matrices which live on the hazy frontier between invertibility and singularity resemble the infinitesimals of § 4.1.1. They are called ill-conditioned matrices. Section 14.8 develops the topic.
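The coincidence is easy to observe numerically. A brief Python sketch (numpy assumed; the matrices are arbitrary) shows a full-rank matrix enjoying rank n, a nonzero determinant and no zero eigenvalue, and a deliberately singular matrix losing all three at once:

    import numpy as np
    rng = np.random.default_rng(2)
    A = rng.standard_normal((4, 4))                 # full rank with probability 1
    print(np.linalg.matrix_rank(A))                 # 4
    print(np.isclose(np.linalg.det(A), 0.0))        # False: nonzero determinant
    print(np.isclose(np.min(np.abs(np.linalg.eigvals(A))), 0.0))  # False

    B = A.copy(); B[3] = B[0] + B[1]                # force a dependent row
    print(np.linalg.matrix_rank(B))                 # 3
    print(np.isclose(np.linalg.det(B), 0.0))        # True
    print(np.isclose(np.min(np.abs(np.linalg.eigvals(B))), 0.0))  # True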

14.3 The eigenvalue itself

We stand ready at last to approach the final major agent of matrix arithmetic, the eigenvalue. Suppose a square, n × n matrix A, a nonzero n-element vector

      v = In v ≠ 0,                                             (14.17)

and a scalar λ, together such that

      Av = λv,

or in other words such that

      Av = λIn v.                                               (14.18)

If so, then

      [A − λIn]v = 0.                                           (14.19)

Since In v is nonzero, the last equation is true if and only if the matrix [A − λIn] is singular—which in light of § 14.1.3 is to demand that

      det(A − λIn) = 0.                                         (14.20)

The left side of (14.20) is an nth-order polynomial in λ, the characteristic polynomial, whose n roots are the eigenvalues10 of the matrix A.

What is an eigenvalue, really? An eigenvalue is a scalar a matrix resembles under certain conditions. When a matrix happens to operate on the right eigenvector v, it is all the same whether one applies the entire matrix or just the eigenvalue to the vector. The matrix scales the eigenvector by the eigenvalue without otherwise altering the vector, changing the vector’s magnitude but not its direction. The eigenvalue alone takes the place of the whole, hulking matrix. This is what (14.18) means. Of course it works only when v happens to be the right eigenvector, which § 14.4 discusses.

When λ = 0, (14.20) makes

      det A = 0,

which as we have said is the sign of a singular matrix; nothing prevents λ = 0. Zero eigenvalues and singular matrices always travel together. Singular matrices each have at least one zero eigenvalue; nonsingular matrices never do. (If unsure, take your pencil and just calculate the characteristic polynomials of the 3 × 3 matrices I₃ and 0.) The eigenvalues of a matrix’s inverse are the inverses of the matrix’s eigenvalues. That is,

      λ′j λj = 1 for all 1 ≤ j ≤ n if A′A = In = AA′.           (14.21)

The reason behind (14.21) comes from answering the question: if Avj scales vj by the factor λj, then what does A′Avj = Ivj do to vj?

Naturally one must solve (14.20)’s nth-order polynomial to locate the actual eigenvalues. One solves it by the same techniques by which one solves any polynomial: the quadratic formula (2.2); the cubic and quartic methods of Ch. 10; the Newton-Raphson iteration (4.31). On the other hand, the determinant (14.20) can be impractical to expand for a large matrix; here iterative techniques help: see [chapter not yet written].11

Observe incidentally that the characteristic polynomial of an n × n matrix always enjoys full order n, regardless of the matrix’s rank. The reason lies in the determinant det(A − λIn), which comprises exactly n! determinant-terms (we say “determinant-terms” rather than “terms” here only to avoid confusing the determinant’s terms with the characteristic polynomial’s), only one of which,

      (a₁₁ − λ)(a₂₂ − λ) · · · (aₙₙ − λ),

gathers elements straight down the main diagonal of the matrix [A − λIn]. When multiplied out, this main-diagonal determinant-term evidently contributes a (−λ)ⁿ to the characteristic polynomial, whereas none of the other determinant-terms finds enough factors of λ to reach order n.

10 An example:

      A = [ 2   3 ]
          [ 0  −1 ],

      det(A − λIn) = det [ 2 − λ     3    ]
                         [   0    −1 − λ  ]
                   = (2 − λ)(−1 − λ) − (0)(3)
                   = λ² − λ − 2 = 0,
      λ = −1 or 2.

11 The inexpensive [24] also covers the topic competently and readably.

14.4 The eigenvector

It is an odd fact that (14.19) and (14.20) reveal the eigenvalues λ of a square matrix A while obscuring the associated eigenvectors v. Once one has calculated an eigenvalue, though, one can feed it back to calculate the associated eigenvector. According to (14.19), the eigenvectors are the n-element vectors for which

      [A − λIn]v = 0,

which is to say that the eigenvectors are the vectors of the kernel space of the degenerate matrix [A − λIn]—which one can calculate (among other ways) by the Gauss-Jordan kernel formula (13.7) or the Gram-Schmidt kernel formula (13.61).

An eigenvalue and its associated eigenvector, taken together, are sometimes called an eigensolution.
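Footnote 10’s example can be checked in a few lines of Python (numpy assumed). The kernel vector is extracted here from the SVD, one way among the “other ways” besides (13.7) and (13.61):

    import numpy as np
    A = np.array([[2.0, 3.0], [0.0, -1.0]])
    print(np.roots([1.0, -1.0, -2.0]))   # roots of lambda^2 - lambda - 2: [ 2. -1.]
    lam = 2.0
    v = np.linalg.svd(A - lam * np.eye(2))[2][-1]  # spans the kernel of [A - lambda I]
    print(np.allclose(A @ v, lam * v))   # True: v is the eigenvector for lambda = 2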

14.5 Eigensolution facts

Many useful or interesting mathematical facts concern the eigensolution, among them the following.

• A matrix and its inverse share the same eigenvectors with inverted eigenvalues. Refer to (14.21) and its explanation in § 14.3.

• If the eigensolutions of A are (λj, vj), then the eigensolutions of A + αIn are (λj + α, vj). The eigenvalues move over by α while the eigenvectors remain fixed. This is seen by adding αvj to both sides of the definition Avj = λjvj.

• Eigenvectors corresponding to distinct eigenvalues are always linearly independent of one another. To prove this fact, consider several independent eigenvectors v₁, v₂, . . . , vk−1 respectively with distinct eigenvalues λ₁, λ₂, . . . , λk−1, and further consider another eigenvector vk which might or might not be independent but which too has a distinct eigenvalue λk. Were vk dependent, which is to say, did nontrivial coefficients cj exist such that

      vk = ∑_{j=1}^{k−1} cj vj,

then left-multiplying the equation by A − λkIn would yield

      0 = ∑_{j=1}^{k−1} (λj − λk) cj vj,

impossible since the k − 1 eigenvectors are independent. Thus vk too is independent, whereupon by induction from a start case of k = 1 we conclude that there exists no dependent eigenvector with a distinct eigenvalue.

• If an n × n square matrix A has n independent eigenvectors (which is always so if the matrix has n distinct eigenvalues and often so even otherwise), then any n-element vector can be expressed as a unique linear combination of the eigenvectors. This is a simple consequence of the fact that the n × n matrix V whose columns are the several eigenvectors vj has full rank r = n. Unfortunately some matrices with repeated eigenvalues also have repeated eigenvectors—as for example, curiously,12 [1 0; 1 1]ᵀ, whose double eigenvalue λ = 1 has the single eigenvector [1 0]ᵀ. Section 14.10.2 speaks of matrices of the last kind.

• An n × n square matrix whose eigenvectors are linearly independent of one another cannot share all eigensolutions with any other n × n square matrix. This fact proceeds from the last point, that every n-element vector x is a unique linear combination of independent eigenvectors. Neither of the two proposed matrices A₁ and A₂ could scale any of the eigenvector components of x differently than the other matrix did, so A₁x − A₂x = (A₁ − A₂)x = 0 for all x, which in turn is possible only if A₁ = A₂.

• A positive definite matrix has only real, positive eigenvalues. A nonnegative definite matrix has only real, nonnegative eigenvalues. Were it not so, then v∗Av = λv∗v (in which v∗v naturally is a positive real scalar) would violate the criterion for positive or nonnegative definiteness. See § 13.6.3.

• Every n × n square matrix has at least one eigensolution if n > 0, because according to the fundamental theorem of algebra (6.1) the matrix’s characteristic polynomial (14.20) has at least one root, an eigenvalue, which by definition would be no eigenvalue if it had no eigenvector to scale, and for which (14.19) necessarily admits at least one nonzero solution v because its matrix A − λIn is degenerate.

12 [32]
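The second fact in the list—the eigenvalue shift—invites a numerical spot-check. A Python sketch (numpy assumed; the matrix and shift are arbitrary):

    import numpy as np
    rng = np.random.default_rng(3)
    A = rng.standard_normal((4, 4))
    alpha = 1.5
    lam = np.linalg.eigvals(A)
    lam_shifted = np.linalg.eigvals(A + alpha * np.eye(4))
    print(np.allclose(np.sort_complex(lam + alpha),
                      np.sort_complex(lam_shifted)))   # True: eigenvalues move over by alpha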

14.6 Diagonalization

Any n × n matrix with n independent eigenvectors (which class per § 14.5 includes, but is not limited to, every n × n matrix with n distinct eigenvalues) can be diagonalized as

      A = V ΛV⁻¹,                                               (14.22)

where

      Λ = [ λ₁  0  ···   0    0  ]
          [ 0   λ₂ ···   0    0  ]
          [ ·   ·  ···   ·    ·  ]
          [ 0   0  ··· λn−1   0  ]
          [ 0   0  ···   0    λn ]

is an otherwise empty n × n matrix with the eigenvalues of A set along its main diagonal and

      V = [ v₁ v₂ · · · vn−1 vn ]

is an n × n matrix whose columns are the eigenvectors of A.13 This is so because the identity Avj = vjλj holds for all 1 ≤ j ≤ n; or, expressed more concisely, because the identity

      AV = V Λ

holds, from which (14.22) follows. The matrix V is invertible because its columns the eigenvectors are independent. Equation (14.22) is called the eigenvalue decomposition, the diagonal decomposition or the diagonalization of the square matrix A.

One might object that we had shown only how to compose some matrix V ΛV⁻¹ with the correct eigenvalues and independent eigenvectors, but had failed to show that the matrix was actually A. However, we need not show this, because § 14.5 has already demonstrated that two matrices with the same eigenvalues and independent eigenvectors are in fact the same matrix, whereby the product V ΛV⁻¹ can be nothing other than A.

Besides factoring a diagonalizable matrix by (14.22), one can apply the same formula to compose a diagonalizable matrix with desired eigensolutions.

An n × n matrix with n independent eigenvectors (which class, again, includes, but is not limited to, every n × n matrix with n distinct eigenvalues) is called a diagonalizable matrix. The diagonal matrix diag{x} of (11.55) is trivially diagonalizable as diag{x} = In diag{x}In.

It is a curious and useful fact that

      A² = (V ΛV⁻¹)(V ΛV⁻¹) = V Λ²V⁻¹,

and by extension that

      Aᵏ = V ΛᵏV⁻¹,                                             (14.23)

for any diagonalizable matrix A. The diagonal matrix Λᵏ is nothing more than the diagonal matrix Λ with each element individually raised to the kth power, such that

      [Λᵏ]ij = δij λjᵏ.

Changing z ← k implies the generalization14

      Aᶻ = V ΛᶻV⁻¹,
      [Λᶻ]ij = δij λjᶻ,                                          (14.24)

good for any diagonalizable A and complex z.

Nondiagonalizable matrices are troublesome and interesting. The nondiagonalizable matrix vaguely resembles the singular matrix in that both represent edge cases and can be hard to handle numerically, but the resemblance ends there, and a matrix can be either without being the other. The n × n null matrix for example is singular but still diagonalizable. What a nondiagonalizable matrix is in essence is a matrix with a repeated eigensolution: the same eigenvalue with the same eigenvector, twice or more. More formally, a nondiagonalizable matrix is a matrix with an n-fold eigenvalue whose corresponding eigenvector space fewer than n eigenvectors fully characterize. Section 14.10.2 will have more to say about the nondiagonalizable matrix.

13 If this seems confusing, then consider that the jth column of the product AV is Avj, whereas the jth column of Λ having just the one element acts to scale V’s jth column only.
14 It may not be clear however according to (5.13) which branch of λjᶻ one should choose at each index j, especially if A has negative or complex eigenvalues.

14.7 Remarks on the eigenvalue

Eigenvalues and their associated eigenvectors stand among the principal reasons one goes to the considerable trouble to develop matrix theory as we have done in recent chapters. The idea that a matrix resembles a humble scalar in the right circumstance is powerful.
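Equation (14.23) is easily exercised numerically. A Python sketch (numpy assumed; matrix and power arbitrary):

    import numpy as np
    rng = np.random.default_rng(4)
    A = rng.standard_normal((4, 4))
    lam, V = np.linalg.eig(A)                  # A = V diag(lam) V^{-1}
    k = 5
    Ak = V @ np.diag(lam**k) @ np.linalg.inv(V)   # (14.23): each eigenvalue raised to k
    print(np.allclose(Ak, np.linalg.matrix_power(A, k)))  # True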

Among the reasons for this is that a matrix can represent an iterative process, operating repeatedly on a vector v to change it first to Av, then to A²v, to A³v and so on. The dominant eigenvalue of A, largest in magnitude, tends then to transform v into the associated eigenvector, gradually but relatively eliminating all other components of v. Should the dominant eigenvalue have greater than unit magnitude, it destabilizes the iteration; thus one can sometimes judge the stability of a physical process indirectly by examining the eigenvalues of the matrix which describes it. Then there is the edge case of the nondiagonalizable matrix, which matrix surprisingly covers only part of its domain with eigenvectors. All this is fairly deep mathematics. It brings an appreciation of the matrix for reasons which were anything but apparent from the outset of Ch. 11.

Remarks continue in §§ 14.10.2 and 14.13.

14.8 Matrix condition

The largest in magnitude of the several eigenvalues of a diagonalizable operator A, denoted here λmax, tends to dominate the iteration Aᵏx. Section 14.7 has named λmax the dominant eigenvalue for this reason.

One sometimes finds it convenient to normalize a dominant eigenvalue by defining a new operator A′ ≡ A/|λmax|, whose own dominant eigenvalue λmax/|λmax| has unit magnitude. In terms of the new operator, the iteration becomes

      Aᵏx = |λmax|ᵏ A′ᵏx,

leaving one free to carry the magnifying effect |λmax|ᵏ separately if one prefers to do so. However, the scale factor 1/|λmax| scales all eigenvalues equally; thus, if A’s eigenvalue of smallest magnitude is denoted λmin, then the corresponding eigenvalue of A′ is λmin/|λmax|. If zero, then both matrices according to § 14.1.3 are singular; if nearly zero, then both matrices are ill conditioned.

Such considerations lead us to define the condition of a diagonalizable matrix quantitatively as15

      κ ≡ |λmax / λmin|,                                        (14.25)

by which

      κ ≥ 1                                                     (14.26)

is always a real number of no less than unit magnitude. For best invertibility, κ = 1 would be ideal (it would mean that all eigenvalues had the same magnitude), though in practice quite a broad range of κ is usually acceptable.

15 [72]

Could we always work in exact arithmetic, the value of κ might not interest us much as long as it stayed finite; but in computer floating point, or where the elements of A are known only within some tolerance, infinite κ tends to emerge imprecisely rather as large κ ≫ 1. An ill-conditioned matrix by definition16 is a matrix of large κ ≫ 1. The applied mathematician handles such a matrix with due skepticism.

Matrix condition so defined turns out to have another useful application. Suppose that a diagonalizable matrix A is precisely known but that the corresponding driving vector b is not. If

      A(x + δx) = b + δb,

where δb is the error in b and δx is the resultant error in x, then one should like to bound the ratio |δx|/|x| to ascertain the reliability of x as a solution. Transferring A to the equation’s right side,

      x + δx = A⁻¹(b + δb).

Subtracting x = A⁻¹b and taking the magnitude,

      |δx| = |A⁻¹ δb|.

Dividing by |x| = |A⁻¹b|,

      |δx| / |x| = |A⁻¹ δb| / |A⁻¹ b|.

The quantity |A⁻¹ δb| cannot exceed |λmin⁻¹ δb|. The quantity |A⁻¹ b| cannot fall short of |λmax⁻¹ b|. Thus,

      |δx| / |x| ≤ |λmin⁻¹ δb| / |λmax⁻¹ b| = |λmax/λmin| |δb|/|b|.

That is,

      |δx| / |x| ≤ κ |δb| / |b|.                                (14.27)

Condition, incidentally, might technically be said to apply to scalars as well as to matrices. The condition of every nonzero scalar is happily κ = 1, but ill condition remains a property of matrices alone.

16 There is of course no definite boundary, no particular edge value of κ, less than which a matrix is well conditioned, at and beyond which it turns ill-conditioned. If I tried to claim that a matrix with a fine κ = 3 were ill conditioned, or that one with a wretched κ = 2^0x18 were well conditioned, then you might not credit me—but the mathematics nevertheless can only give the number; it remains to the mathematician to interpret it. But you knew that already.
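The bound (14.27) can be watched in action. A Python sketch (numpy assumed; the matrix here is made symmetric positive definite so that it is certainly diagonalizable, and the perturbation is arbitrary):

    import numpy as np
    rng = np.random.default_rng(5)
    M = rng.standard_normal((4, 4))
    A = M.T @ M + 0.1 * np.eye(4)              # symmetric, hence diagonalizable
    lam = np.linalg.eigvalsh(A)
    kappa = abs(lam).max() / abs(lam).min()    # the condition kappa of (14.25)

    b = rng.standard_normal(4)
    db = 1e-6 * rng.standard_normal(4)         # a small error in b
    x = np.linalg.solve(A, b)
    dx = np.linalg.solve(A, b + db) - x
    print(np.linalg.norm(dx) / np.linalg.norm(x)
          <= kappa * np.linalg.norm(db) / np.linalg.norm(b))   # True per (14.27)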

14.9 The similarity transformation

Any collection of vectors assembled into a matrix can serve as a basis by which other vectors can be expressed. For example, if the columns of

      B = [ 1  −1 ]
          [ 0   2 ]
          [ 0   1 ]

are regarded as a basis, then the vector

      B [ 5 ] = 5 [ 1 ] + 1 [ −1 ] = [ 4 ]
        [ 1 ]     [ 0 ]     [  2 ]   [ 2 ]
                  [ 0 ]     [  1 ]   [ 1 ]

is (5, 1) in the basis B: five times the first basis vector plus once the second. The basis provides the units from which other vectors can be built.17

Particularly interesting is the n × n, invertible complete basis B, in which the n basis vectors are independent and address the same full space the columns of In address. If

      x = Bu,

then u represents x in the basis B. Left-multiplication by B evidently converts out of the basis. Left-multiplication by B⁻¹,

      u = B⁻¹x,

then does the reverse, converting into the basis. One can therefore convert any operator A to work within a complete basis B by the successive steps

      Ax = b,
      ABu = b,
      [B⁻¹AB]u = B⁻¹b,

by which the operator B⁻¹AB is seen to be the operator A, only transformed to work within the basis18 B.

17 The reader may need to ponder the basis concept a while to grasp it, but the concept is simple once grasped and little purpose would be served by dwelling on it here. Basically, the idea is that one can build the same vector from alternate building blocks, not only from the standard building blocks e₁, e₂, e₃, etc.—except that the right word for the relevant “building block” is basis vector. The books [33] and [47] introduce the basis more gently; one might consult one of those if needed.
18 The professional matrix literature sometimes distinguishes by typeface between the matrix B and the basis B its columns represent. Such semantical distinctions seem a little too fine for applied use, though. This book just uses B.
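A two-line Python check of the example (numpy assumed):

    import numpy as np
    B = np.array([[1.0, -1.0], [0.0, 2.0], [0.0, 1.0]])
    print(B @ np.array([5.0, 1.0]))   # [4. 2. 1.]: the vector that is (5, 1) in the basis B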

The conversion from A into B⁻¹AB is called a similarity transformation. If B happens to be unitary (§ 13.12), then the conversion is also called a unitary transformation. The matrix B⁻¹AB the transformation produces is said to be similar (or, if B is unitary, unitarily similar) to the matrix A. We have already met the similarity transformation in §§ 11.5 and 12.2. Now we have the theory to appreciate it properly.

Probably the most important property of the similarity transformation is that it alters no eigenvalues. That is, if

      Ax = λx,

then, by successive steps,

      B⁻¹A(BB⁻¹)x = λB⁻¹x,
      [B⁻¹AB]u = λu,   u = B⁻¹x.                                (14.28)

The eigenvalues of A and the similar B⁻¹AB are the same for any square, n × n matrix B.

14.10 The Schur decomposition

The Schur decomposition of an arbitrary, n × n square matrix A is

      A = Q US Q∗,                                              (14.29)

where Q is an n × n unitary matrix whose inverse, as for any unitary matrix (§ 13.12), is Q⁻¹ = Q∗; and where US is a general upper triangular matrix which can have any values (even zeros) along its main diagonal. The Schur decomposition is slightly obscure, is somewhat tedious to derive and is of limited use in itself, but serves a theoretical purpose.19 We derive it here for this reason.

19 The alternative is to develop the interesting but difficult Jordan canonical form, which for brevity’s sake this chapter prefers to omit.

14.10.1 Derivation

Suppose that20 (for some reason, which will shortly grow clear) we have a matrix B of the form

      B = [ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ]
          [ 0 ∗ ∗ ∗ ∗ ∗ ∗ ]
          [ 0 0 ∗ ∗ ∗ ∗ ∗ ]
          [ 0 0 0 ∗ ∗ ∗ ∗ ]                                     (14.30)
          [ 0 0 0 0 ∗ ∗ ∗ ]
          [ 0 0 0 0 ∗ ∗ ∗ ]
          [ 0 0 0 0 ∗ ∗ ∗ ],

where the ith row and ith column are depicted at center. Suppose further that we wish to transform B not only similarly but unitarily into

      C ≡ W∗BW = [ ∗ ∗ ∗ ∗ ∗ ∗ ∗ ]
                 [ 0 ∗ ∗ ∗ ∗ ∗ ∗ ]
                 [ 0 0 ∗ ∗ ∗ ∗ ∗ ]
                 [ 0 0 0 ∗ ∗ ∗ ∗ ]                              (14.31)
                 [ 0 0 0 0 ∗ ∗ ∗ ]
                 [ 0 0 0 0 0 ∗ ∗ ]
                 [ 0 0 0 0 0 ∗ ∗ ],

where W is an n × n unitary matrix, and where we do not mind if any or all of the ∗ elements change in going from B to C but we require zeros in the indicated spots. Let Bo and Co represent the (n − i) × (n − i) submatrices in the lower right corners respectively of B and C, such that

      Bo ≡ In−i H−i B Hi In−i,
      Co ≡ In−i H−i C Hi In−i,                                  (14.32)

where Hk is the shift operator of § 11.9. Pictorially,

      Bo = [ ∗ ∗ ∗ ]      Co = [ ∗ ∗ ∗ ]
           [ ∗ ∗ ∗ ]           [ 0 ∗ ∗ ]
           [ ∗ ∗ ∗ ],          [ 0 ∗ ∗ ].

Equation (14.31) seeks an n × n unitary matrix W to transform the matrix B into a new matrix C ≡ W∗BW such that C fits the form (14.31) stipulates. The question remains as to whether a unitary W exists that satisfies the form and whether for general B we can discover a way to calculate it. To narrow the search, because we need not find every W that satisfies the form but only one such W, let us look first for a W that fits the restricted template

      W = Ii + Hi Wo H−i = [ 1 0 0 0 0 0 0 ]
                           [ 0 1 0 0 0 0 0 ]
                           [ 0 0 1 0 0 0 0 ]
                           [ 0 0 0 1 0 0 0 ]                    (14.33)
                           [ 0 0 0 0 ∗ ∗ ∗ ]
                           [ 0 0 0 0 ∗ ∗ ∗ ]
                           [ 0 0 0 0 ∗ ∗ ∗ ],

which contains a smaller, (n − i) × (n − i) unitary submatrix Wo in its lower right corner and resembles In elsewhere. The unitary submatrix Wo has only n − i columns and n − i rows, so In−i Wo = Wo = Wo In−i. Beginning from (14.31), we have by successive, reversible steps that

      C = W∗BW
        = (Ii + Hi Wo∗ H−i)(B)(Ii + Hi Wo H−i)
        = Ii B Ii + Ii B Hi Wo H−i + Hi Wo∗ H−i B Ii + Hi Wo∗ H−i B Hi Wo H−i
        = Ii B Ii + Ii B Hi Wo In−i H−i + Hi In−i Wo∗ H−i B Ii
          + Hi In−i Wo∗ In−i H−i B Hi In−i Wo In−i H−i
        = Ii [B] Ii + Ii [B Hi Wo H−i](In − Ii) + (In − Ii)[Hi Wo∗ H−i B] Ii
          + (In − Ii)[Hi Wo∗ Bo Wo H−i](In − Ii).

The four terms on the equation’s right, each term with rows and columns neatly truncated, represent the four quarters of C ≡ W∗BW—upper left, upper right, lower left and lower right, respectively. The lower left term is null because

      (In − Ii)[Hi Wo∗ H−i B] Ii = (In − Ii)[Hi Wo∗ In−i H−i B Ii] Ii
                                 = (In − Ii)[Hi Wo∗ H−i][(In − Ii) B Ii] Ii
                                 = (In − Ii)[Hi Wo∗ H−i][0] Ii = 0,

leaving

      C = Ii [B] Ii + Ii [B Hi Wo H−i](In − Ii) + (In − Ii)[Hi Wo∗ Bo Wo H−i](In − Ii).

But the upper left term makes the upper left areas of B and C the same, and the upper right term does not bother us because we have not restricted the content of C’s upper right area. Apparently any (n − i) × (n − i) unitary submatrix Wo whatsoever obeys (14.31) in the upper left, the upper right and the lower left. That leaves the lower right. To focus solely on the lower right area, left- and right-multiplying (14.31) by the truncator (In − Ii), we have the reduced requirement that

      (In − Ii) C (In − Ii) = (In − Ii) W∗BW (In − Ii).          (14.34)

Further left-multiplying by H−i, right-multiplying by Hi, and applying the identity (11.76) yields

      In−i H−i C Hi In−i = In−i H−i W∗BW Hi In−i;

or, substituting from (14.32),

      Co = In−i H−i W∗BW Hi In−i.

Expanding W per (14.33),

      Co = In−i H−i (Ii + Hi Wo∗ H−i) B (Ii + Hi Wo H−i) Hi In−i;

or, since In−i H−i Ii = 0 = Ii Hi In−i,

      Co = In−i H−i (Hi Wo∗ H−i) B (Hi Wo H−i) Hi In−i
         = In−i Wo∗ H−i B Hi Wo In−i
         = Wo∗ In−i H−i B Hi In−i Wo.

Per (14.32), this is

      Co = Wo∗ Bo Wo.                                            (14.35)

The steps from (14.34) to (14.35) are reversible, so the latter is as good a way to state the reduced requirement as the former is. To achieve a unitary transformation of the form (14.31), it suffices to satisfy (14.35). The increasingly well-stocked armory of matrix theory we now have to draw from makes satisfying (14.35) possible as follows. Observe per § 14.5 that every square matrix has at least one eigensolution. Let (λo, vo) represent an eigensolution of Bo—any eigensolution of Bo—with vo normalized to unit magnitude. Form the broad, (n − i) × (n − i + 1) matrix

      F ≡ [ vo e₁ e₂ e₃ · · · en−i ].

Decompose F by the Gram-Schmidt technique of § 13.11.2, choosing p = 1 during the first instance of the algorithm’s step 3 (though choosing any permissible p thereafter), to obtain

      F = QF RF.

Noting that the Gram-Schmidt algorithm orthogonalizes only rightward, observe that the first column of the (n − i) × (n − i) unitary matrix QF remains simply the first column of F, which is the unit eigenvector vo:

      [QF]∗1 = QF e₁ = vo.

Transform Bo unitarily by QF to define the new matrix

      G ≡ QF∗ Bo QF,

then transfer factors to reach the equation

      QF G QF∗ = Bo.

Right-multiplying by QF e₁ = vo and noting that Bo vo = λo vo, observe that

      QF G e₁ = λo vo.

Left-multiplying by QF∗,

      G e₁ = λo QF∗ vo.

Noting that the Gram-Schmidt process has rendered orthogonal to vo all columns of QF but the first, which is vo, observe that

      G e₁ = λo QF∗ vo = λo e₁ = [ λo ]
                                 [ 0  ]
                                 [ 0  ],

which means that

      G = [ λo ∗ ∗ ]
          [ 0  ∗ ∗ ]
          [ 0  ∗ ∗ ],

which fits the very form (14.32) the submatrix Co is required to have. Conclude therefore that

      Wo = QF,
      Co = G,                                                   (14.36)

where QF and G are as this paragraph develops, together constitute a valid choice for Wo and Co, satisfying the reduced requirement (14.35) and thus also the original requirement (14.31).

Equation (14.36) completes a failsafe technique to transform unitarily any square matrix B of the form (14.30) into a square matrix C of the form (14.31). Naturally the technique can be applied recursively as

      B|i=i′ = C|i=i′−1,   1 ≤ i′ ≤ n,                          (14.37)

because the form (14.30) of B at i = i′ is nothing other than the form (14.31) of C at i = i′ − 1. Therefore, if we let

      B|i=0 = A,                                                (14.38)

then it follows by induction that

      B|i=n = US,                                               (14.39)

where per (14.30) the matrix US has the general upper triangular form the Schur decomposition (14.29) requires. Moreover, because the product of unitary matrices according to (13.64) is itself a unitary matrix, we have that

      Q = ∏_{i′=0}^{n−1} (W|i=i′),                              (14.40)

which along with (14.39) accomplishes the Schur decomposition.

20 This subsection assigns various capital Roman letters to represent the several matrices and submatrices it manipulates. Its choice of letters except in (14.29) is not standard and carries no meaning elsewhere. The Roman alphabet provides only twenty-six capitals, of which this subsection uses too many to be allowed to reserve any. The writer had to choose some letters and these are ones he chose. This footnote mentions the fact because good mathematical style avoids assigning letters that already bear a conventional meaning in a related context (for example, this book avoids writing Ax = b as T e = i, not because the latter is wrong but because it would be extremely confusing).

14.10.2 The nondiagonalizable matrix

The characteristic equation (14.20) of the general upper triangular matrix US is

      det(US − λIn) = 0.

Unlike most determinants, this determinant brings only the one term

      det(US − λIn) = ∏_{i=1}^{n} (uSii − λ) = 0

whose factors run straight down the main diagonal, where the determinant’s n! − 1 other terms are all zero because each of them includes at least one zero factor from below the main diagonal.21 Hence no element above the main diagonal of US even influences the eigenvalues, which apparently are

      λi = uSii,                                                (14.41)

the main-diagonal elements.

According to (14.28), similarity transformations preserve eigenvalues, and the Schur decomposition (14.29) is in fact a similarity transformation. Since, as we have seen, every matrix A has a Schur decomposition, if

      A = Q US Q∗,

then the eigenvalues of A are just the values along the main diagonal of US.22

One might think that the Schur decomposition offered an easy way to calculate eigenvalues, but it is less easy than it first appears because one must calculate eigenvalues to reach the Schur decomposition in the first place. Whatever practical merit the Schur decomposition might have or lack, though, it brings at least the theoretical benefit of (14.41): every square matrix without exception has a Schur decomposition, whose triangular factor US openly lists all eigenvalues along its main diagonal.

This theoretical benefit pays when some of the n eigenvalues of an n × n square matrix A repeat. By the Schur decomposition, one can construct a second square matrix A′, as near as desired to A but having n distinct eigenvalues, simply by perturbing the main diagonal of US to23

      US′ ≡ US + ǫ diag{u},
      ui′ ≠ ui if λi′ = λi,                                     (14.42)

where |ǫ| ≪ 1 and where u is an arbitrary vector that meets the criterion given. Though infinitesimally near A, the modified matrix A′ = Q US′ Q∗ unlike A has n (maybe infinitesimally) distinct eigenvalues. With sufficient toil, one can analyze such perturbed eigenvalues and their associated eigenvectors similarly as § 9.6.2 has analyzed perturbed poles.

Equation (14.42) brings us to the nondiagonalizable matrix of the subsection’s title. Section 14.6 and its diagonalization formula (14.22) diagonalize any matrix with distinct eigenvalues and even any matrix with repeated eigenvalues but distinct eigenvectors, but fail where eigenvectors repeat. Equation (14.42) separates eigenvalues, thus also eigenvectors—for according to § 14.5 eigenvectors of distinct eigenvalues never depend on one another—permitting a nonunique but still sometimes usable form of diagonalization in the limit ǫ → 0 even when the matrix in question is strictly nondiagonalizable.

The finding that every matrix is arbitrarily nearly diagonalizable illuminates a question the chapter has evaded up to the present point. The question: does a p-fold root in the characteristic polynomial (14.20) necessarily imply a p-fold eigenvalue in the corresponding matrix? The existence of the nondiagonalizable matrix casts a shadow of doubt until one realizes that every nondiagonalizable matrix is arbitrarily nearly diagonalizable—and, moreover, is arbitrarily nearly diagonalizable with distinct eigenvalues. If you claim that a matrix has a triple eigenvalue and someone disputes the claim, then you can show him a nearly identical matrix with three infinitesimally distinct eigenvalues. That is the essence of the idea. We will leave the answer in that form.

21 The determinant’s definition in § 14.1 makes the following two propositions equivalent: (i) that a determinant’s term which includes one or more factors above the main diagonal also includes one or more factors below; (ii) that the only permutor that marks no position below the main diagonal is the one which also marks no position above. In either form, the proposition’s truth might seem less than obvious until viewed from the proper angle. Consider a permutor P. If P marked no position below the main diagonal, then it would necessarily have pnn = 1, else the permutor’s bottom row would be empty, which is not allowed. In the next-to-bottom row, it would necessarily have p(n−1)(n−1) = 1, because the nth column is already occupied. In the next row up, p(n−2)(n−2) = 1, and so on, thus affirming the proposition.

22 An unusually careful reader might worry that A and US had the same eigenvalues with different multiplicities. It would be surprising if it actually were so; but, still, one would like to give a sounder reason than the participle “surprising.” Consider however that

      A − λIn = Q US Q∗ − λIn = Q[US − Q∗(λIn)Q]Q∗ = Q[US − λ(Q∗InQ)]Q∗ = Q[US − λIn]Q∗.

According to (14.11) and (14.13), this equation’s determinant is

      det[A − λIn] = det{Q[US − λIn]Q∗} = det Q det[US − λIn] det Q∗ = det[US − λIn],

which says that A and US have not only the same eigenvalues but also the same characteristic polynomials, thus further the same eigenvalue multiplicities.

23 Equation (11.55) defines the diag{·} notation.
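Numerical libraries do compute the Schur decomposition; a Python sketch using scipy’s schur routine (an external library, not part of the text; the complex output form is chosen so that US is triangular even when eigenvalues are complex) confirms (14.29) and (14.41):

    import numpy as np
    from scipy.linalg import schur

    rng = np.random.default_rng(6)
    A = rng.standard_normal((4, 4))
    US, Q = schur(A, output='complex')        # A = Q US Q*, US upper triangular
    print(np.allclose(Q @ US @ Q.conj().T, A))                 # True
    print(np.allclose(np.tril(US, -1), 0))                     # zeros below the diagonal
    print(np.allclose(np.sort_complex(np.diag(US)),
                      np.sort_complex(np.linalg.eigvals(A))))  # diagonal lists the eigenvalues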

Generalizing the nondiagonalizability concept leads one eventually to the ideas of the generalized eigenvector24 (which solves the higher-order linear system [A − λI]ᵏv = 0) and the Jordan canonical form,25 which together roughly track the sophisticated conventional pole-separation technique of § 9.6. Then there is a kind of sloppy Schur form called a Hessenberg form which allows content in US along one or more subdiagonals just beneath the main diagonal. One could profitably propose and prove any number of useful theorems concerning the nondiagonalizable matrix and its generalized eigenvectors, or concerning the eigenvalue problem26 more broadly, in more and less rigorous ways, but for the time being we will let the matter rest there.

24 [26, Ch. 7]
25 [24, Ch. 5]
26 [79]

14.11 The Hermitian matrix

An m × m square matrix A that is its own adjoint,

      A∗ = A,                                                   (14.43)

is called a Hermitian or self-adjoint matrix. Properties of the Hermitian matrix include that

• its eigenvalues are real,
• its eigenvectors corresponding to distinct eigenvalues lie orthogonal to one another, and
• it is unitarily diagonalizable (§§ 13.12 and 14.6) such that

      A = V ΛV∗.                                                (14.44)

That the eigenvalues are real is proved by letting (λ, v) represent an eigensolution of A and constructing the product v∗Av, for which

      λ∗v∗v = (Av)∗v = v∗Av = v∗(Av) = λv∗v,

which is to say that

      λ∗ = λ,

which naturally is possible only if λ is real.

That eigenvectors corresponding to distinct eigenvalues lie orthogonal to one another is proved27 by letting (λ₁, v₁) and (λ₂, v₂) represent eigensolutions of A and constructing the product v₂∗Av₁, for which

      λ₂∗ v₂∗v₁ = (Av₂)∗v₁ = v₂∗Av₁ = v₂∗(Av₁) = λ₁ v₂∗v₁.

But according to the last paragraph all eigenvalues are real; the eigenvalues λ₁ and λ₂ are no exceptions. Hence,

      λ₂ = λ₁ or v₂∗v₁ = 0.

To prove the last hypothesis of the three needs first some definitions as follows. Given an m × m matrix A, let the s columns of the m × s matrix Vo represent the s independent eigenvectors of A such that (i) each column has unit magnitude and (ii) columns whose eigenvectors share the same eigenvalue lie orthogonal to one another. Let the s × s diagonal matrix Λo carry the eigenvalues on its main diagonal such that

      AVo = VoΛo,

where the distinction between the matrix Λo and the full eigenvalue matrix Λ of (14.22) is that the latter always includes a p-fold eigenvalue p times, whereas the former includes a p-fold eigenvalue only as many times as the eigenvalue enjoys independent eigenvectors. Let the m − s columns of the m × (m − s) matrix Vo⊥ represent the complete orthogonal complement (§ 13.10) to Vo—perpendicular to all eigenvectors, each column of unit magnitude—such that

      Vo⊥∗Vo = 0 and Vo⊥∗Vo⊥ = Im−s.

Recall from § 14.5 that s ≠ 0 but 0 < s ≤ m because every square matrix has at least one eigensolution. Recall from § 14.6 that s = m if and only if A is diagonalizable.28

With these definitions in hand, we can now prove by contradiction that all Hermitian matrices are diagonalizable, falsely supposing a nondiagonalizable Hermitian matrix A, whose Vo⊥ (since A is supposed to be nondiagonalizable, implying that s < m) would have at least one column. For such a matrix A, s × (m − s) and (m − s) × (m − s) auxiliary matrices F and G necessarily would exist such that

      AVo⊥ = VoF + Vo⊥G,

not due to any unusual property of the product AVo⊥ but for the mundane reason that the columns of Vo and Vo⊥ together by definition addressed the space of all m-element vectors—including the columns of AVo⊥. Left-multiplying by Vo∗, we would have by successive steps that

      Vo∗AVo⊥ = Vo∗VoF + Vo∗Vo⊥G,
      (AVo)∗Vo⊥ = IsF + Vo∗Vo⊥G,
      (VoΛo)∗Vo⊥ = F + Vo∗Vo⊥G,
      Λo∗Vo∗Vo⊥ = F + Vo∗Vo⊥G,
      Λo∗(0) = F + (0)G,
      0 = F,

where we had relied on the assumption that A were Hermitian and thus that, as proved above, its distinctly eigenvalued eigenvectors lay orthogonal to one another, in consequence of which A∗ = A and Vo∗Vo = Is. The finding that F = 0 reduces the AVo⊥ equation above to read

      AVo⊥ = Vo⊥G.

In the reduced equation the matrix G would have at least one eigensolution, not due to any unusual property of G but because according to § 14.5 every square matrix, 1 × 1 or larger, has at least one eigensolution. Let (µ, w) represent an eigensolution of G. Right-multiplying by the (m − s)-element vector w ≠ 0, we would have by successive steps that

      AVo⊥w = Vo⊥Gw,
      A(Vo⊥w) = µ(Vo⊥w).

The last equation claims that (µ, Vo⊥w) were an eigensolution of A, when we had supposed that all of A’s eigenvectors lay in the space addressed by the columns of Vo; whereas the vector Vo⊥w by construction did not lie in that space, being not an eigenvector but perpendicular to them all. The contradiction proves false the assumption that gave rise to it. The assumption: that a nondiagonalizable Hermitian A existed. We conclude that all Hermitian matrices are diagonalizable—and conclude further that they are unitarily diagonalizable on the ground that their eigenvectors lie orthogonal to one another—as was to be demonstrated.

Having proven that all Hermitian matrices are diagonalizable and have real eigenvalues and orthogonal eigenvectors, one wonders whether the converse holds: are all diagonalizable matrices with real eigenvalues and orthogonal eigenvectors Hermitian? To show that they are, one can construct the matrix described by the diagonalization formula (14.22),

      A = V ΛV∗,

where V⁻¹ = V∗ because this V is unitary (§ 13.12). The equation’s adjoint is

      A∗ = V Λ∗V∗.

But all the eigenvalues here are real, which means that Λ∗ = Λ and the right sides of the two equations are the same. That is, A∗ = A, as was to be demonstrated. All diagonalizable matrices with real eigenvalues and orthogonal eigenvectors are Hermitian.

This section brings properties that greatly simplify many kinds of matrix analysis. The properties demand a Hermitian matrix, which might seem a severe and unfortunate restriction—except that one can left-multiply any exactly determined linear system

      Cx = d

by C∗ to get the equivalent Hermitian system

      [C∗C]x = [C∗d],                                           (14.45)

in which A = C∗C and b = C∗d, for which the properties obtain.29

27 [47, § 8.1]
28 A concrete example: the invertible but nondiagonalizable matrix

      A = [ −1   0    0     0  ]
          [ −6   5   5/2  −5/2 ]
          [  0   0    5     0  ]
          [  0   0    0     5  ]

has a single eigenvalue at λ = −1 and a triple eigenvalue at λ = 5, the latter of whose eigenvector space is fully characterized by two eigenvectors rather than three, such that

      Vo = [ 1/√2   0    0   ]      Λo = [ −1  0  0 ]      Vo⊥ = [   0   ]
           [ 1/√2   1    0   ]           [  0  5  0 ]            [   0   ]
           [  0     0   1/√2 ]           [  0  0  5 ]            [  1/√2 ]
           [  0     0   1/√2 ]                                   [ −1/√2 ]

The orthogonal complement Vo⊥ supplies the missing vector, not an eigenvector but perpendicular to them all. In the example, m = 4 and s = 3. All vectors in the example are reported with unit magnitude. The two λ = 5 eigenvectors are reported in mutually orthogonal form, but notice that eigenvectors corresponding to distinct eigenvalues need not be orthogonal when A is not Hermitian.

29 The device (14.45) worsens a matrix’s condition and may be undesirable for this reason, but it works in theory at least.
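All three Hermitian properties can be observed at once in Python (numpy assumed; the matrix is an arbitrary complex example made Hermitian by construction):

    import numpy as np
    rng = np.random.default_rng(7)
    M = rng.standard_normal((4, 4)) + 1j * rng.standard_normal((4, 4))
    A = M + M.conj().T                         # Hermitian: A* = A
    lam, V = np.linalg.eigh(A)                 # eigh exploits the Hermitian structure
    print(np.isrealobj(lam))                   # True: the eigenvalues are real
    print(np.allclose(V.conj().T @ V, np.eye(4)))          # eigenvectors orthonormal
    print(np.allclose(V @ np.diag(lam) @ V.conj().T, A))   # A = V Lambda V*, per (14.44)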

14.12 The singular-value decomposition

Occasionally an elegant idea awaits discovery, almost in plain sight. If the unlikely thought occurred to you to take the square root of a matrix, then the following idea is one you might discover.30

Consider the n × n product A∗A of a tall or square, m × n matrix A of full column rank

      r = n ≤ m

and its adjoint A∗. The product A∗A is invertible according to § 13.6.2; is positive definite according to § 13.6.3; and, since (A∗A)∗ = A∗A, is clearly Hermitian according to § 14.11; thus is unitarily diagonalizable according to (14.44) as

      A∗A = V ΛV∗.                                              (14.46)

Here, the n × n matrices Λ and V represent respectively the eigenvalues and eigenvectors not of A but of the product A∗A. Though nothing requires the product’s eigenvectors to be real, because the product is positive definite § 14.5 does require all of its eigenvalues to be real and moreover positive—which means among other things that the eigenvalue matrix Λ has full rank. That the eigenvalues, the diagonal elements of Λ, are real and positive is a useful fact; for just as a real, positive scalar has a real, positive square root, so equally has Λ a real, positive square root under these conditions. Let the symbol Σ = √Λ represent the n × n real, positive square root of the eigenvalue matrix Λ such that

      Λ = Σ∗Σ,

      Σ∗ = Σ = [ +√λ₁    0    ···     0       0   ]
               [   0   +√λ₂   ···     0       0   ]
               [   ·     ·    ···     ·       ·   ]             (14.47)
               [   0     0    ···  +√λn−1     0   ]
               [   0     0    ···     0    +√λn   ],

where the singular values of A populate Σ’s diagonal. Applying (14.47) to (14.46) then yields

      A∗A = V Σ∗ΣV∗,
      V∗A∗AV = Σ∗Σ.                                             (14.48)

30 [77, “Singular value decomposition,” 14:29, 18 Oct. 2007]

Now consider the m × m matrix U such that

      AV Σ⁻¹ = U In,
      AV = U Σ,
      A = U ΣV∗.                                                (14.49)

Substituting (14.49)’s second line into (14.48)’s second line gives the equation

      Σ∗U∗U Σ = Σ∗Σ;

but ΣΣ⁻¹ = In, so left- and right-multiplying respectively by Σ⁻∗ and Σ⁻¹ leaves that

      In U∗U In = In,

which says neither more nor less than that the first n columns of U are orthonormal. Equation (14.49) does not constrain the last m − n columns of U, leaving us free to make them anything we want. Why not use Gram-Schmidt to make them orthonormal, thus making U a unitary matrix? If we do this, then (14.49) constitutes the singular-value decomposition of A. Apparently every full-rank matrix has a singular-value decomposition.

But what of the matrix of less than full rank r < n? In this case the product A∗A is singular and has only s < n nonzero eigenvalues (it may be that s = r, but this is irrelevant to the proof at hand). However, if the s nonzero eigenvalues are arranged first in Λ, then (14.49) becomes

      AV Σ⁻¹ = U Is,
      AV = U Σ,
      A = U ΣV∗.                                                (14.50)

The product A∗A is nonnegative definite in this case and ΣΣ⁻¹ = Is, but the reasoning is otherwise the same as before. Apparently every matrix of less than full rank has a singular-value decomposition, too.

If A happens to be an invertible square matrix, then the singular-value decomposition evidently inverts it as

      A⁻¹ = V Σ⁻¹U∗.                                            (14.51)

If A happens to have broad shape, then we can decompose A∗ instead, so this case poses no special trouble.
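The section’s recipe can be followed literally in Python (numpy assumed; the tall, full-column-rank matrix is an arbitrary real example, so adjoints reduce to transposes):

    import numpy as np
    rng = np.random.default_rng(8)
    A = rng.standard_normal((5, 3))             # tall, full column rank
    lam, V = np.linalg.eigh(A.T @ A)            # eigensolution of A*A, per (14.46)
    Sigma = np.diag(np.sqrt(lam))               # the real, positive square root (14.47)
    U = A @ V @ np.linalg.inv(Sigma)            # the first n columns of U, per (14.49)
    print(np.allclose(U.T @ U, np.eye(3)))      # True: those columns are orthonormal
    print(np.allclose(U @ Sigma @ V.T, A))      # True: A = U Sigma V*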

14.13 General remarks on the matrix

Chapters 11 through 14 have derived the uncomfortably bulky but—incredibly—approximately minimal knot of theory one needs to grasp the matrix properly and to use it with moderate versatility. As far as the writer knows, no one has yet discovered a satisfactory way to untangle the knot.31 Applied mathematics brings nothing else quite like it.

These several matrix chapters have not covered every topic they might. The topics they omit fall roughly into two classes. One is the class of more advanced and more specialized matrix theory, about which we will have more to say in a moment. The other is the class of basic matrix theory these chapters do not happen to use. The essential agents of matrix analysis—multiplicative associativity, rank, inversion, pseudoinversion, the kernel, the orthogonal complement, orthonormalization, the eigenvalue, diagonalization and so on—are the same in practically all books on the subject, but the way the agents are developed differs. What has given these chapters their hefty bulk is not so much the immediate development of the essential agents as the preparatory development of theoretical tools used to construct the essential agents, yet most of the tools are of limited interest in themselves; it is the agents that matter. This book has chosen a way that needs some tools like truncators other books omit, but does not need other tools like projectors other books32 include. Tools like the projector not used here tend to be omitted here or deferred to later chapters, not because they are altogether useless but because they are not used here and because the present chapters are already too long. The reader who understands the Moore-Penrose pseudoinverse and/or the Gram-Schmidt process reasonably well can after all pretty easily figure out how to construct a projector without explicit instructions thereto,33 should the need arise.

The choice to learn the basic theory of the matrix is almost an all-or-nothing choice; and how many scientists and engineers would rightly choose the “nothing” if the matrix did not serve so very many applications as it does? Since it does serve so very many, the “all” it must be.

Paradoxically and thankfully, more advanced and more specialized matrix theory though often harder tends to come in smaller, more manageable increments: the Cholesky decomposition, for instance, or the conjugate-gradient algorithm. The theory develops endlessly. From the present pause one could proceed directly to such topics. However, since this is the first proper pause these several matrix chapters have afforded, maybe we ought to take advantage to change the subject, since the book is Derivations of Applied Mathematics rather than Derivations of Applied Matrices.

31 Of course, one might avoid true understanding and instead work by memorized rules. That is not always a bad plan, really; but if that were your plan then it seems spectacularly unlikely that you would be reading a footnote buried beneath the further regions of the hinterland of Chapter 14 in such a book as this.
32 Such as [33, § 3.VI.3], a lengthy but well-knit tutorial this writer recommends.
33 Well, since we have brought it up (though only as an example of tools these chapters have avoided bringing up), briefly: a projector is a matrix that flattens an arbitrary vector b into its nearest shadow b̃ within some restricted subspace. If the columns of A represent the subspace, then x represents b̃ in the subspace basis iff Ax = b̃, which is to say that Ax ≈ b, whereupon x = A†b. That is, per (13.32),

      b̃ = Ax = AA†b = [BC][C∗(CC∗)⁻¹(B∗B)⁻¹B∗]b = B(B∗B)⁻¹B∗b,

in which the matrix B(B∗B)⁻¹B∗ is the projector. Thence it is readily shown that the deviation b − b̃ lies orthogonal to the shadow b̃. More broadly defined, any matrix M for which M² = M is a projector. One can approach the projector in other ways, but there are two ways at least.

Chapter 15

Vector analysis

Leaving the matrix, this chapter and the next turn to a curiously underappreciated agent of applied mathematics, the three-dimensional geometrical vector, first met in §§ 3.4 and 3.9. Seen from one perspective, the three-dimensional geometrical vector is the n = 3 special case of the general, n-dimensional vector of Chs. 11 through 14.1 Because its three elements represent the three dimensions of the physical world, however, the three-dimensional geometrical vector merits closer attention and special treatment.

It also merits a shorter name. Where the geometrical context is clear—as it is in this chapter and the next—we will call the three-dimensional geometrical vector just a vector. A name like “matrix vector” or “n-dimensional vector” can disambiguate the vector of Chs. 11 through 14 where necessary but, since the three-dimensional geometrical vector is in fact a vector, it usually is not necessary to disambiguate. The lone word vector serves.

In the present chapter’s context and according to § 3.3, a vector consists of an amplitude of some kind plus a direction. Per § 3.9, three scalars called coordinates suffice together to specify the amplitude and direction and thus the vector, the three being (x, y, z) in the rectangular coordinate system, (ρ, φ, z) in the cylindrical coordinate system, or (r, θ, φ) in the spherical coordinate system—as Fig. 15.1 illustrates and Table 3.4 on page 67 interrelates—among other, more exotic possibilities (§ 15.7).

1 [13, Ch. 2]

Figure 15.1: A point on a sphere, in spherical (r; θ; φ) and cylindrical (ρ; φ, z) coordinates. (The axis labels bear circumflexes in this figure only to disambiguate the ẑ axis from the cylindrical coordinate z. See also Fig. 15.5. The figure marks the axes x̂, ŷ, ẑ; the radii r and ρ; the height z; and the angles θ and φ.)

Without the notation, one would write an expression like

\[ \frac{(z-z') - \left[\partial z'/\partial x\right]_{x=x',\,y=y'}(x-x') - \left[\partial z'/\partial y\right]_{x=x',\,y=y'}(y-y')} {\left\{\left[1+(\partial z'/\partial x)^2+(\partial z'/\partial y)^2\right]_{x=x',\,y=y'}\right\}^{1/2} \left[(x-x')^2+(y-y')^2+(z-z')^2\right]^{1/2}} \]

for the aspect coefficient relative to a local surface normal (and if the sentence's words do not make sense to you yet, don't worry; just look the symbols over and appreciate the expression's bulk). The same coefficient in standard vector notation is

    n̂ · Δr̂.

Besides being more evocative (once one has learned to read it) and much more compact, the standard vector notation brings the major advantage of freeing a model's geometry from reliance on any particular coordinate system. Reorienting axes (§ 15.1) for example knots the former expression like spaghetti but does not disturb the latter expression at all.

Two-dimensional geometrical vectors arise in practical modeling about as often as three-dimensional geometrical vectors do. Fortunately, the two-dimensional case needs little special treatment, for it is just the three-dimensional with z = 0 or θ = 2π/4 (see however § 15.6).

Here at the outset, a word on complex numbers seems in order. Unlike most of the rest of the book, this chapter and the next will work chiefly in real numbers, or at least in real coordinates. Notwithstanding, complex coordinates are possible. Indeed, in the rectangular coordinate system complex coordinates are perfectly appropriate and are straightforward enough to handle, and these chapters will not avoid them. The cylindrical and spherical systems, however, were not conceived with complex coordinates in mind; and, although it might with some theoretical subtlety be possible to treat complex radii, azimuths and elevations consistently as three-dimensional coordinates, these chapters will not try to do so. What these chapters will avoid are complex nonrectangular coordinates like [3 + j2; −1/4, 0].² (This is not to say that you cannot have a complex vector like, say, ρ̂[3 + j2] − φ̂[1/4] in a nonrectangular basis. You can have such a vector; it is fine.)

Vector addition will already be familiar to the reader from Ch. 3 or (quite likely) from earlier work outside this book. This chapter therefore begins with the reorientation of axes in § 15.1 and vector multiplication in § 15.2.

²The author would be interested to learn if there existed an uncontrived scientific or engineering application that actually used complex, nonrectangular coordinates.

15.1 Reorientation

Matrix notation expresses the rotation of axes (3.5) as

\[ \begin{bmatrix} \hat x' \\ \hat y' \\ \hat z' \end{bmatrix} = \begin{bmatrix} \cos\phi & \sin\phi & 0 \\ -\sin\phi & \cos\phi & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \hat x \\ \hat y \\ \hat z \end{bmatrix}. \]

In three dimensions, however, one can do more than just to rotate the x and y axes about the z. One can reorient the three axes generally, as follows.

15.1.1 The Tait-Bryan rotations

With a yaw and a pitch to point the x axis in the desired direction, plus a roll to position the y and z axes as desired about the new x axis,³ one can reorient the three axes generally:

\[ \begin{bmatrix} \hat x' \\ \hat y' \\ \hat z' \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\psi & \sin\psi \\ 0 & -\sin\psi & \cos\psi \end{bmatrix} \begin{bmatrix} \cos\theta & 0 & -\sin\theta \\ 0 & 1 & 0 \\ \sin\theta & 0 & \cos\theta \end{bmatrix} \begin{bmatrix} \cos\phi & \sin\phi & 0 \\ -\sin\phi & \cos\phi & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \hat x \\ \hat y \\ \hat z \end{bmatrix}; \tag{15.1} \]

or, inverting per (3.6),

\[ \begin{bmatrix} \hat x \\ \hat y \\ \hat z \end{bmatrix} = \begin{bmatrix} \cos\phi & -\sin\phi & 0 \\ \sin\phi & \cos\phi & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \cos\theta & 0 & \sin\theta \\ 0 & 1 & 0 \\ -\sin\theta & 0 & \cos\theta \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\psi & -\sin\psi \\ 0 & \sin\psi & \cos\psi \end{bmatrix} \begin{bmatrix} \hat x' \\ \hat y' \\ \hat z' \end{bmatrix}. \tag{15.2} \]

These are called the Tait-Bryan rotations, or alternately the Cardano rotations.⁴,⁵

³The English maritime verbs to yaw, to pitch and to roll describe the rotational motion of a vessel at sea. For a vessel to yaw is for her to rotate about her vertical axis, so her bow (her forwardmost part) yaws from side to side. For a vessel to pitch is for her to rotate about her "beam axis," so her bow pitches up and down. For a vessel to roll is for her to rotate about her "fore-aft axis" such that she rocks or lists (leans) without changing the direction she points [77, "Glossary of nautical terms," 23:00, 20 May 2008]. In the Tait-Bryan rotations as explained in this book, to yaw is to rotate about the z axis, to pitch about the y, and to roll about the x [41]. In the Euler rotations as explained in this book later in the present section, however, the axes are assigned to the vessel differently, such that to yaw is to rotate about the x axis, to pitch about the y, and to roll about the z. This implies that the Tait-Bryan vessel points x-ward whereas the Euler vessel points z-ward. The reason to shift perspective so is to maintain the semantics of the symbols θ and φ (though not ψ) according to Fig. 15.1. If this footnote seems confusing, then read (15.1) and (15.2), which are correct.

⁴The literature seems to agree on no standard order among the three Tait-Bryan rotations; and, though the rotational angles are usually named φ, θ and ψ, which angle gets which name admittedly depends on the author.

⁵[12]


Notice in (15.1) and (15.2) that the transpose (though curiously not the adjoint) of each 3 × 3 Tait-Bryan factor is also its inverse.

In concept, the Tait-Bryan equations (15.1) and (15.2) say nearly all one needs to say about reorienting axes in three dimensions; but, still, the equations can confuse the uninitiated. Consider a vector

    v = x̂x + ŷy + ẑz.    (15.3)

It is not the vector one reorients but rather the axes used to describe the vector. Envisioning the axes as in Fig. 15.1 with the z axis upward, one first yaws the x axis through an angle φ toward the y, then pitches it downward through an angle θ away from the z. Finally, one rolls the y and z axes through an angle ψ about the new x, all the while maintaining the three axes rigidly at right angles to one another. These three Tait-Bryan rotations can orient axes any way. Yet, even once one has clearly visualized the Tait-Bryan sequence, the prospect of applying (15.2) (which inversely represents the sequence) to (15.3) can still seem daunting until one rewrites the latter equation in the form

\[ v = \begin{bmatrix} \hat x & \hat y & \hat z \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix}, \tag{15.4} \]

after which the application is straightforward. There results

    v′ = x̂′x′ + ŷ′y′ + ẑ′z′,

where

\[ \begin{bmatrix} x' \\ y' \\ z' \end{bmatrix} \equiv \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\psi & \sin\psi \\ 0 & -\sin\psi & \cos\psi \end{bmatrix} \begin{bmatrix} \cos\theta & 0 & -\sin\theta \\ 0 & 1 & 0 \\ \sin\theta & 0 & \cos\theta \end{bmatrix} \begin{bmatrix} \cos\phi & \sin\phi & 0 \\ -\sin\phi & \cos\phi & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ z \end{bmatrix}, \tag{15.5} \]

and where Table 3.4 converts to cylindrical or spherical coordinates if and as desired. Since (15.5) resembles (15.1), it comes as no surprise that its inverse,

\[ \begin{bmatrix} x \\ y \\ z \end{bmatrix} = \begin{bmatrix} \cos\phi & -\sin\phi & 0 \\ \sin\phi & \cos\phi & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \cos\theta & 0 & \sin\theta \\ 0 & 1 & 0 \\ -\sin\theta & 0 & \cos\theta \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\psi & -\sin\psi \\ 0 & \sin\psi & \cos\psi \end{bmatrix} \begin{bmatrix} x' \\ y' \\ z' \end{bmatrix}, \tag{15.6} \]

resembles (15.2).

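A brief numerical check of the foregoing, in Python with NumPy (a minimal sketch whose function names are merely illustrative): it builds the three Tait-Bryan factors of (15.1), confirms that each factor's transpose is its inverse, and applies (15.5) to the coordinates of a vector.

    import numpy as np

    def roll(psi):     # rotation about the new x axis, per (15.1)
        return np.array([[1, 0, 0],
                         [0, np.cos(psi), np.sin(psi)],
                         [0, -np.sin(psi), np.cos(psi)]])

    def pitch(theta):  # rotation about the y axis
        return np.array([[np.cos(theta), 0, -np.sin(theta)],
                         [0, 1, 0],
                         [np.sin(theta), 0, np.cos(theta)]])

    def yaw(phi):      # rotation about the z axis
        return np.array([[np.cos(phi), np.sin(phi), 0],
                         [-np.sin(phi), np.cos(phi), 0],
                         [0, 0, 1]])

    phi, theta, psi = 0.3, -0.7, 1.1
    for R in (roll(psi), pitch(theta), yaw(phi)):
        assert np.allclose(R.T @ R, np.eye(3))   # transpose is inverse

    T = roll(psi) @ pitch(theta) @ yaw(phi)      # the product of (15.1), (15.5)
    xyz = np.array([1.0, 2.0, 3.0])              # coordinates x, y, z of v
    xyz_prime = T @ xyz                          # coordinates x', y', z' per (15.5)
    assert np.allclose(np.linalg.norm(xyz_prime), np.linalg.norm(xyz))

Since reorientation merely redescribes the same vector against new axes, the vector's length cannot change, which the last line confirms.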

15.1.2 The Euler rotations

A useful alternative to the Tait-Bryan rotations are the Euler rotations, which view the problem of reorientation from the perspective of the z axis rather than of the x. The Euler rotations consist of a roll and a pitch followed by another roll, without any explicit yaw:⁶

\[ \begin{bmatrix} \hat x' \\ \hat y' \\ \hat z' \end{bmatrix} = \begin{bmatrix} \cos\psi & \sin\psi & 0 \\ -\sin\psi & \cos\psi & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \cos\theta & 0 & -\sin\theta \\ 0 & 1 & 0 \\ \sin\theta & 0 & \cos\theta \end{bmatrix} \begin{bmatrix} \cos\phi & \sin\phi & 0 \\ -\sin\phi & \cos\phi & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \hat x \\ \hat y \\ \hat z \end{bmatrix}; \tag{15.7} \]

and inversely

\[ \begin{bmatrix} \hat x \\ \hat y \\ \hat z \end{bmatrix} = \begin{bmatrix} \cos\phi & -\sin\phi & 0 \\ \sin\phi & \cos\phi & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \cos\theta & 0 & \sin\theta \\ 0 & 1 & 0 \\ -\sin\theta & 0 & \cos\theta \end{bmatrix} \begin{bmatrix} \cos\psi & -\sin\psi & 0 \\ \sin\psi & \cos\psi & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \hat x' \\ \hat y' \\ \hat z' \end{bmatrix}. \tag{15.8} \]

Whereas the Tait-Bryan rotations point the x axis first, the Euler tactic is to point first the z.

So, that's it. One can reorient three axes arbitrarily by rotating them in pairs about the z, y and x or the z, y and z axes in sequence—or, generalizing, in pairs about any of the three axes so long as the axis of the middle rotation differs from the axes (Tait-Bryan) or axis (Euler) of the first and last. A firmer grasp of the reorientation of axes in three dimensions comes with practice, but those are the essentials of it.

⁶As for the Tait-Bryan, for the Euler also the literature agrees on no standard sequence. What one author calls a pitch, another might call a yaw, and some prefer to roll twice about the x axis rather than the z. What makes a reorientation an Euler rather than a Tait-Bryan is that the Euler rolls twice.
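The Euler product (15.7) submits to the same numerical treatment as the Tait-Bryan product did. A minimal, self-contained Python sketch (the function names are the sketch's own):

    import numpy as np

    def rot_z(angle):  # a roll about z; same matrix form as the yaw of (15.1)
        c, s = np.cos(angle), np.sin(angle)
        return np.array([[c, s, 0], [-s, c, 0], [0, 0, 1]])

    def rot_y(angle):  # a pitch about y
        c, s = np.cos(angle), np.sin(angle)
        return np.array([[c, 0, -s], [0, 1, 0], [s, 0, c]])

    phi, theta, psi = 0.3, -0.7, 1.1
    E = rot_z(psi) @ rot_y(theta) @ rot_z(phi)   # roll, pitch, roll: (15.7)
    assert np.allclose(E.T @ E, np.eye(3))       # E inverts by transposition,
                                                 # which is what (15.8) says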

15.2 Multiplication

One can multiply a vector in any of three ways. The first, scalar multiplication, is trivial: if a vector v is as defined by (15.3), then

    ψv = x̂ψx + ŷψy + ẑψz.    (15.9)

Such scalar multiplication evidently scales a vector's length without diverting its direction. The other two forms of vector multiplication involve multiplying a vector by another vector and are the subjects of the two subsections that follow.


15.2.1 The dot product

We first met the dot product in § 13.8. It works similarly for the geometrical vectors of this chapter as for the matrix vectors of Ch. 13:

    v₁ · v₂ = x₁x₂ + y₁y₂ + z₁z₂,    (15.10)

which, if the vectors v₁ and v₂ are real, is the product of the two vectors to the extent to which they run in the same direction. It is the product to the extent to which the vectors run in the same direction because one can reorient axes to point x̂′ in the direction of v₁, whereupon v₁ · v₂ = x′₁x′₂ since y′₁ and z′₁ have vanished.

Naturally, to be valid, the dot product must not vary under a reorientation of axes; and indeed if we write (15.10) in matrix notation,

\[ v_1 \cdot v_2 = \begin{bmatrix} x_1 & y_1 & z_1 \end{bmatrix} \begin{bmatrix} x_2 \\ y_2 \\ z_2 \end{bmatrix}, \tag{15.11} \]

and then expand each of the two factors on the right according to (15.6), we see that the dot product does not in fact vary. As in (13.43) of § 13.8, here too the relationship

    v₁* · v₂ = v₁*v₂ cos θ,
    v̂₁* · v̂₂ = cos θ,    (15.12)

gives the angle θ between two vectors according to Fig. 3.1's cosine if the vectors are real, by definition hereby if complex. Consequently, the two vectors are mutually orthogonal—that is, the vectors run at right angles θ = 2π/4 to one another—if and only if

    v₁* · v₂ = 0.    (15.13)

That the dot product is commutative, v₂ · v₁ = v₁ · v₂, is obvious from (15.10). Fig. 15.2 illustrates the dot product.
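Both the angle relation (15.12) and the invariance claim can be checked numerically. A minimal Python sketch (the random orthogonal matrix here stands in for any reorientation):

    import numpy as np

    v1 = np.array([1.0, 2.0, 2.0])
    v2 = np.array([2.0, 0.0, 1.0])

    dot = v1 @ v2                                                  # (15.10)
    cos_theta = dot / (np.linalg.norm(v1) * np.linalg.norm(v2))    # (15.12)
    theta = np.arccos(cos_theta)

    # Any matrix with orthonormal columns leaves the dot product unchanged:
    Q = np.linalg.qr(np.random.default_rng(0).standard_normal((3, 3)))[0]
    assert np.allclose((Q @ v1) @ (Q @ v2), dot)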

Figure 15.2: The dot product. (The figure depicts vectors a and b separated by an angle θ, the projection b cos θ of b onto a, and the product a · b = ab cos θ.)

15.2.2 The cross product

The dot product of two vectors according to § 15.2.1 is a scalar. One can also multiply two vectors to obtain a vector, however, and it is often useful to do so. As the dot product is the product of two vectors to the extent to which they run in the same direction, the cross product is the product of two vectors to the extent to which they run in different directions. Unlike the dot product, the cross product is a vector, defined in rectangular coordinates as

\[ v_1 \times v_2 = \begin{vmatrix} \hat x & \hat y & \hat z \\ x_1 & y_1 & z_1 \\ x_2 & y_2 & z_2 \end{vmatrix} \equiv \hat x(y_1 z_2 - z_1 y_2) + \hat y(z_1 x_2 - x_1 z_2) + \hat z(x_1 y_2 - y_1 x_2), \tag{15.14} \]

where the |·| notation is a mnemonic (actually a pleasant old determinant notation § 14.1 could have but did not happen to use) whose semantics are as shown.

As the dot product, the cross product too is invariant under reorientation. One could demonstrate this fact by multiplying out (15.2) and (15.6) then substituting the results into (15.14): a lengthy, unpleasant exercise. Fortunately, it is also an unnecessary exercise; for, inasmuch as a reorientation consists of three rotations in sequence, it suffices merely to show that rotation about one axis does not alter the cross product. One proves the proposition in the latter form by setting any two of φ, θ and ψ to zero before multiplying out and substituting.

Several facets of the cross product draw attention to themselves.

• The cyclic progression

    · · · → x → y → z → x → y → z → x → y → · · ·    (15.15)

of (15.14) arises again and again in vector analysis. Where the progression is honored, as in ẑx₁y₂, the associated term bears a + sign, otherwise a − sign, due to § 11.6's parity principle and the right-hand rule.

• The cross product is not commutative. In fact,

    v₂ × v₁ = −v₁ × v₂,    (15.16)

which is a direct consequence of the previous point regarding parity, or which can be seen more prosaically in (15.14) by swapping the places of v₁ and v₂.

• The cross product is not associative. That is,

    (v₁ × v₂) × v₃ ≠ v₁ × (v₂ × v₃),

as is proved by a suitable counterexample like v₁ = v₂ = x̂, v₃ = ŷ.

• The cross product runs perpendicularly to each of its two factors if the vectors involved are real. That is,

    v₁ · (v₁ × v₂) = 0 = v₂ · (v₁ × v₂),    (15.17)

as is seen by substituting (15.14) into (15.10) with an appropriate change of variables and simplifying.

• Unlike the dot product, the cross product is closely tied to three-dimensional space. Two-dimensional space (a plane) can have a cross product so long as one does not mind that the product points off into the third dimension, but to speak of a cross product in four-dimensional space would require arcane definitions and would otherwise make little sense. Fortunately, the physical world is three-dimensional (or, at least, the space in which we model all but a few, exotic physical phenomena is three-dimensional), so the cross product's limitation as defined here to three dimensions will seldom if ever disturb us.

• Section 15.2.1 has related the cosine of the angle between vectors to the dot product. One can similarly relate the angle's sine to the cross product if the vectors involved are real, as

    |v₁ × v₂| = v₁v₂ sin θ,
    |v̂₁ × v̂₂| = sin θ,    (15.18)

demonstrated by reorienting axes such that v̂₁ = x̂′, that v̂₂ has no component in the ẑ′ direction, and that v̂₂ has only a nonnegative component in the ŷ′ direction; by remembering that reorientation cannot alter a cross product; and finally by applying (15.14) and comparing the result against Fig. 3.1's sine. (If the vectors involved are complex then nothing prevents the operation |v₁* × v₂| by analogy with eqn. 15.12—in fact the operation v₁* × v₂ without the magnitude sign is used routinely to calculate electromagnetic power flow⁷—but each of the cross product's three rectangular components has its own complex phase which the magnitude operation flattens, so the result's relationship to the sine of an angle is not immediately clear.)

Fig. 15.3 illustrates the cross product.

Figure 15.3: The cross product. (The figure depicts vectors a and b separated by an angle θ and their product c = a × b = ĉab sin θ, perpendicular to both.)

⁷[31, eqn. 1-51]
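The listed facets are likewise easy to verify numerically; a minimal Python sketch:

    import numpy as np

    a = np.array([1.0, 2.0, 2.0])
    b = np.array([2.0, 0.0, 1.0])
    c = np.cross(a, b)                                     # (15.14)

    assert np.allclose(np.cross(b, a), -c)                 # (15.16)
    assert np.isclose(a @ c, 0) and np.isclose(b @ c, 0)   # (15.17)

    # Consistency with (15.18): sin and cos of the same angle theta.
    sin_theta = np.linalg.norm(c) / (np.linalg.norm(a) * np.linalg.norm(b))
    cos_theta = (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
    assert np.isclose(sin_theta**2 + cos_theta**2, 1.0)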

15.3 Orthogonal bases

A vector exists independently of the components by which one expresses it, for, whether q = x̂x + ŷy + ẑz or q = x̂′x′ + ŷ′y′ + ẑ′z′, it remains the same vector q. However, where a model involves a circle, a cylinder or a sphere, where a model involves a contour or a curved surface of some kind, to choose x̂′, ŷ′ and ẑ′ wisely can immensely simplify the model's analysis. Normally one requires that x̂′, ŷ′ and ẑ′ each retain unit length, run perpendicularly to one another, and obey the right-hand rule (§ 3.3), but otherwise any x̂′, ŷ′ and ẑ′ can serve. Moreover, a model can specify various x̂′, ŷ′ and ẑ′ under various conditions, for nothing requires the three to be constant.



Recalling the constants and variables of § 2.7, such a concept is flexible enough to confuse the uninitiated severely and soon. As in § 2.7, here too an example affords perspective. Imagine driving your automobile down a winding road, where q represented your speed⁸ and ℓ̂ represented the direction the road ran, not generally, but just at the spot along the road at which your automobile momentarily happened to be. That your velocity were ℓ̂q meant that you kept skilfully to your lane; on the other hand, that your velocity were (ℓ̂ cos ψ + v̂ sin ψ)q—where v̂, at right angles to ℓ̂, represented the direction right-to-left across the road—would have you drifting out of your lane at an angle ψ. A headwind had velocity −ℓ̂q_wind; a crosswind, ±v̂q_wind. A car a mile ahead of you had velocity ℓ̂₂q₂ = (ℓ̂ cos β + v̂ sin β)q₂, where β represented the difference (assuming that the other driver kept skilfully to his own lane) between the road's direction a mile ahead and its direction at your spot. For all these purposes the unit vector ℓ̂ would remain constant. However, fifteen seconds later, after you had rounded a bend in the road, the symbols ℓ̂ and v̂ would by definition represent different vectors than before, with respect to which one would express your new velocity as ℓ̂q but would no longer express the headwind's velocity as −ℓ̂q_wind because, since the road had turned while the wind had not, the wind would no longer be a headwind. And this is where confusion can arise: your own velocity had changed while the expression representing it had not; whereas the wind's velocity had not changed while the expression representing it had. This is not because ℓ̂ differs from place to place at a given moment, for like any other vector the vector ℓ̂ (as defined in this particular example) is the same vector everywhere. Rather, it is because ℓ̂ is defined relative to the road at your automobile's location, which location changes as you drive.

If a third unit vector ŵ were defined, perpendicular both to ℓ̂ and to v̂ such that [ℓ̂ v̂ ŵ] obeyed the right-hand rule, then the three together would constitute an orthogonal basis. Any three real,⁹ right-handedly mutually perpendicular unit vectors [x̂′ ŷ′ ẑ′] in three dimensions, whether constant or variable, for which

    ŷ′ · ẑ′ = 0,    ŷ′ × ẑ′ = x̂′,    ℑ(x̂′) = 0,
    ẑ′ · x̂′ = 0,    ẑ′ × x̂′ = ŷ′,    ℑ(ŷ′) = 0,    (15.19)
    x̂′ · ŷ′ = 0,    x̂′ × ŷ′ = ẑ′,    ℑ(ẑ′) = 0,

⁸Conventionally one would prefer the letter v to represent speed, with velocity as v, which in the present example would happen to be v = ℓ̂v. However, this section will require the letter v for an unrelated purpose.

⁹A complex orthogonal basis is also theoretically possible but is normally unnecessary in geometrical applications and involves subtleties in the cross product. This chapter, which specifically concerns three-dimensional geometrical vectors rather than the general, n-dimensional vectors of Ch. 11, is content to consider real bases only. Note that one can express a complex vector in a real basis.

constitutes such an orthogonal basis, from which other vectors can be built. The geometries of some models suggest no particular basis, in which case one usually just uses a constant [x̂ ŷ ẑ]. The geometries of other models however do suggest a particular basis, often a variable one.

• Where the model features a contour like the example's winding road, an [ℓ̂ v̂ ŵ] basis (or a [û v̂ ℓ̂] basis or even a [û ℓ̂ ŵ] basis) can be used, where ℓ̂ locally follows the contour. The variable unit vectors v̂ and ŵ (or û and v̂, etc.) can be defined in any convenient way so long as they remain perpendicular to one another and to ℓ̂—such that (ẑ × ℓ̂) · ŵ = 0 for instance (that is, such that ŵ lay in the plane of ẑ and ℓ̂)—but if the geometry suggests a particular v̂ or ŵ (or û), like the direction right-to-left across the example's road, then that v̂ or ŵ should probably be used.¹⁰ The letter ℓ here stands for "longitudinal."

• Where the model features a curved surface like the surface of a wavy sea,¹¹ a [û v̂ n̂] basis (or a [û n̂ ŵ] basis, etc.) can be used, where n̂ points locally perpendicularly to the surface. The letter n here stands for "normal," a synonym for "perpendicular." Observe, incidentally but significantly, that such a unit normal n̂ tells one everything one needs to know about its surface's local orientation.

• Combining the last two, where the model features a contour along a curved surface, an [ℓ̂ v̂ n̂] basis can be used. One need not worry about choosing a direction for v̂ in this case since necessarily v̂ = n̂ × ℓ̂.

• Where the model features a circle or cylinder, a [ρ̂ φ̂ ẑ] basis can be used, where ẑ is constant and runs along the cylinder's axis (or perpendicularly through the circle's center), ρ̂ is variable and points locally away from the axis, and φ̂ is variable and runs locally along the circle's perimeter in the direction of increasing azimuth φ. Refer to § 3.9 and Fig. 15.4.

¹⁰The assertion wants a citation, which the author lacks.

¹¹[57]

Figure 15.4: The cylindrical basis. (The conventional symbols ⊙ and ⊗ respectively represent vectors pointing out of the page toward the reader and into the page away from the reader. Thus, this figure shows the constant basis vector ẑ pointing out of the page toward the reader. The dot in the middle of the ⊙ is supposed to look like the tip of an arrowhead. The figure also marks the variable basis vectors ρ̂ and φ̂ and the coordinates ρ and φ.)

• Where the model features a sphere, an [r̂ θ̂ φ̂] basis can be used, where r̂ is variable and points locally away from the sphere's center, θ̂ is variable and runs locally tangentially to the sphere's surface in the direction of increasing elevation θ (that is, though not usually in the −ẑ direction itself, as nearly as possible to the −ẑ direction without departing from the sphere's surface), and φ̂ is variable and runs locally tangentially to the sphere's surface in the direction of increasing azimuth φ (that is, along the sphere's surface perpendicularly to ẑ). Standing on the earth's surface, with the earth as the sphere, r̂ would be up, θ̂ south, and φ̂ east. Refer to § 3.9 and Fig. 15.5.

• Occasionally a model arises with two circles that share a center but whose axes stand perpendicular to one another. In such a model one conventionally establishes ẑ as the direction of the principal circle's axis but then is left with x̂ or ŷ as the direction of the secondary circle's axis, upon which an [x̂ ρ̂ˣ φ̂ˣ], [φ̂ˣ r̂ θ̂ˣ], [φ̂ʸ ŷ ρ̂ʸ] or [θ̂ʸ φ̂ʸ r̂] basis can be used locally as appropriate, the superscript marking the circle to which the cylindrical or spherical quantity refers. Refer to § 3.9.

Many other orthogonal bases are possible (as in § 15.7, for instance), but the foregoing are the most common. Whether listed here or not, each orthogonal basis orders its three unit vectors by the right-hand rule (15.19).

Figure 15.5: The spherical basis (see also Fig. 15.1). (The figure marks the variable basis vectors r̂, θ̂ and φ̂ at a point on a sphere.)
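Any candidate basis can be tested against the conditions (15.19) numerically. A minimal Python sketch, here trying the cylindrical basis [ρ̂ φ̂ ẑ] at an arbitrary azimuth φ (the function name is the sketch's own):

    import numpy as np

    def cylindrical_basis(phi):
        rho_hat = np.array([np.cos(phi), np.sin(phi), 0.0])
        phi_hat = np.array([-np.sin(phi), np.cos(phi), 0.0])
        z_hat = np.array([0.0, 0.0, 1.0])
        return rho_hat, phi_hat, z_hat

    b1, b2, b3 = cylindrical_basis(phi=0.8)   # play the roles of x', y', z'
    assert np.isclose(b2 @ b3, 0) and np.isclose(b3 @ b1, 0) and np.isclose(b1 @ b2, 0)
    assert np.allclose(np.cross(b2, b3), b1)  # right-handed, per (15.19)
    assert np.allclose(np.cross(b3, b1), b2)
    assert np.allclose(np.cross(b1, b2), b3)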

Quiz: what does the vector expression ρ̂3 − φ̂(1/4) + ẑ2 mean? Wrong answer: it meant the cylindrical coordinates (3; −1/4, 2); or, it meant the position vector x̂3 cos(−1/4) + ŷ3 sin(−1/4) + ẑ2 associated with those coordinates. Right answer: the expression means nothing certain in itself but acquires a definite meaning only when an azimuthal coordinate φ is also supplied, after which the expression indicates the ordinary rectangular vector x̂′3 − ŷ′(1/4) + ẑ′2, where x̂′ = ρ̂ = x̂ cos φ + ŷ sin φ, ŷ′ = φ̂ = −x̂ sin φ + ŷ cos φ, and ẑ′ = ẑ. But, if this is so—if the cylindrical basis [ρ̂ φ̂ ẑ] is used solely to express rectangular vectors—then why should we name this basis "cylindrical"? Answer: only because cylindrical coordinates (supplied somewhere) determine the actual directions of its basis vectors. Once directions are determined, such a basis is used purely rectangularly, like any other orthogonal basis.

This can seem confusing until one has grasped what the so-called nonrectangular bases are for. Consider the problem of air flow in a jet engine. It probably suits such a problem that instantaneous local air velocity within the engine cylinder be expressed in cylindrical coordinates, with the z axis oriented along the engine's axle; but this does not mean that the air flow within the engine cylinder were everywhere ẑ-directed. On the contrary, a local air velocity of q = [−ρ̂5.0 + φ̂30.0 − ẑ250.0] m/s would have air moving through the point in question at 250.0 m/s aftward along the axle, 5.0 m/s inward toward the axle and 30.0 m/s circulating about the engine cylinder. In this model, it is true that the basis vectors ρ̂ and φ̂ indicate different directions at different positions within the cylinder, but at a particular position the basis vectors are still used rectangularly to express q, the instantaneous local air velocity at that position. It's just that the "rectangle" is rotated locally to line up with the axle.

Naturally, you cannot make full sense of an air-velocity vector q unless you have also the coordinates (ρ; φ, z) of the position within the engine cylinder at which the air has the velocity the vector specifies—yet this is where confusion can arise, for besides the air-velocity vector there is also, separately, a position vector r = x̂ρ cos φ + ŷρ sin φ + ẑz. One may denote the air-velocity vector as¹² q(r), a function of position; yet, though the position vector is as much a vector as the velocity vector is, one nonetheless handles it differently. One will not normally express the position vector r in the cylindrical basis.

It would make little sense to try to express the position vector r in the cylindrical basis, because the position vector is the very thing that determines the cylindrical basis. In the cylindrical basis, after all, the position vector is necessarily r = ρ̂ρ + ẑz (and consider: in the spherical basis it is the even more cryptic r = r̂r), and how useful is that, really? Well, maybe it is useful in some situations, but for the most part to express the position vector in the cylindrical basis would be as to say, "My house is zero miles away from home." Or, "The time is presently now." Such statements may be tautologically true, perhaps, but they are confusing because they only seem to give information. The position vector r determines the basis, after which one expresses things other than position, like instantaneous local air velocity q, in that basis. In fact, the only basis normally suitable to express a position vector is a fixed rectangular basis like [x̂ ŷ ẑ]. Otherwise, one uses cylindrical coordinates (ρ; φ, z), but not a cylindrical basis [ρ̂ φ̂ ẑ], to express a position r in a cylindrical geometry.

Maybe the nonrectangular bases would more precisely be called "rectangular bases of the nonrectangular coordinate systems," but those are too many words and, anyway, that is not how the usage has evolved. Chapter 16 will elaborate the story by considering spatial derivatives of quantities like air velocity, when one must take the variation in ρ̂ and φ̂ from point to point into account, but the foregoing is the basic story nevertheless.

¹²Conventionally, one is much more likely to denote a velocity vector as u(r) or v(r), except that the present chapter is (as footnote 8 has observed) already using the letters u and v for an unrelated purpose. To denote position as r however is entirely standard.
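The quiz's "right answer" is mechanical enough to compute. A minimal Python sketch, assuming a supplied azimuth φ, that turns a cylindrical-basis expression a_ρρ̂ + a_φφ̂ + a_z ẑ into the rectangular vector it indicates (the function name is the sketch's own):

    import numpy as np

    def from_cylindrical_basis(a_rho, a_phi, a_z, phi):
        # The basis directions that the supplied azimuth phi determines:
        rho_hat = np.array([np.cos(phi), np.sin(phi), 0.0])
        phi_hat = np.array([-np.sin(phi), np.cos(phi), 0.0])
        z_hat = np.array([0.0, 0.0, 1.0])
        # Once directions are determined, the basis is used purely rectangularly:
        return a_rho * rho_hat + a_phi * phi_hat + a_z * z_hat

    # The quiz's expression rho_hat*3 - phi_hat*(1/4) + z_hat*2, at phi = 0.8:
    v = from_cylindrical_basis(3.0, -0.25, 2.0, phi=0.8)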

15.4 Notation

The vector notation of §§ 15.1 and 15.2 is correct, familiar and often expedient, but sometimes inconveniently prolix. This admittedly difficult section augments the notation to render it much more concise.

15.4.1 Components by subscript

The notation

    a_x ≡ x̂ · a,    a_ρ ≡ ρ̂ · a,
    a_y ≡ ŷ · a,    a_r ≡ r̂ · a,
    a_z ≡ ẑ · a,    a_θ ≡ θ̂ · a,
    a_n ≡ n̂ · a,    a_φ ≡ φ̂ · a,

and so forth abbreviates the indicated dot product. That is to say, the notation represents the component of a vector a in the indicated direction. Generically,

    a_α ≡ α̂ · a.    (15.20)

Applied mathematicians use subscripts for several unrelated or vaguely related purposes, so the full dot-product notation α̂ · a is often clearer in print than the abbreviation a_α is, but the abbreviation especially helps when several such dot products occur together in the same expression.¹³

Since

    a = x̂a_x + ŷa_y + ẑa_z,
    b = x̂b_x + ŷb_y + ẑb_z,

the abbreviation lends a more amenable notation to the dot and cross products of (15.10) and (15.14):

    a · b = a_x b_x + a_y b_y + a_z b_z;    (15.21)

\[ a \times b = \begin{vmatrix} \hat x & \hat y & \hat z \\ a_x & a_y & a_z \\ b_x & b_y & b_z \end{vmatrix}. \tag{15.22} \]

¹³"Wait!" comes the objection. "I thought that you said that a_x meant x̂ · a. Now you claim that it means the x component of a?" But there is no difference between x̂ · a and the x component of a. The two are one and the same.


In fact—because, as we have seen, reorientation of axes cannot alter the dot and cross products—any orthogonal basis [x̂′ ŷ′ ẑ′] (§ 15.3) can serve here, so one can write more generally that

    a · b = a_x′ b_x′ + a_y′ b_y′ + a_z′ b_z′;    (15.23)

\[ a \times b = \begin{vmatrix} \hat x' & \hat y' & \hat z' \\ a_{x'} & a_{y'} & a_{z'} \\ b_{x'} & b_{y'} & b_{z'} \end{vmatrix}. \tag{15.24} \]

Because all those prime marks burden the notation and for professional mathematical reasons, the general forms (15.23) and (15.24) are sometimes rendered

    a · b = a₁b₁ + a₂b₂ + a₃b₃,

\[ a \times b = \begin{vmatrix} \hat e_1 & \hat e_2 & \hat e_3 \\ a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \end{vmatrix}, \]

but you have to be careful about that in applied usage because people are not always sure whether a symbol like a₃ means "the third component of the vector a" (as it does here) or "the third vector's component in the â direction" (as it would in eqn. 15.10). Typically, applied mathematicians will write in the manner of (15.21) and (15.22) with the implied understanding that they really mean (15.23) and (15.24) but prefer not to burden the notation with extra little strokes—that is, with the implied understanding that x, y and z could just as well be ρ, φ and z or the coordinates of any other orthogonal, right-handed, three-dimensional basis.

Some pretty powerful confusion can afflict the student regarding the roles of the cylindrical symbols ρ, φ and z; or, worse, of the spherical symbols r, θ and φ. Such confusion reflects a pardonable but remediable lack of understanding of the relationship between coordinates like ρ, φ and z and their corresponding unit vectors ρ̂, φ̂ and ẑ. Section 15.3 has already written of the matter; but, further to dispel the confusion, one can now ask the student what the cylindrical coordinates of the vectors ρ̂, φ̂ and ẑ are. The correct answer: (1; φ, 0), (1; φ + 2π/4, 0) and (0; 0, 1), respectively. Then, to reinforce, one can ask the student which cylindrical coordinates the variable vectors ρ̂ and φ̂ are functions of. The correct answer: both are functions of the coordinate φ only (ẑ, a constant vector, is not a function of anything). What the student needs to understand is that, among the cylindrical coordinates, φ is a different kind of thing than z and ρ are:

• z and ρ are lengths whereas φ is an angle;

• but a_ρ, a_φ and a_z are all the same kind of thing, lengths;

• and ρ̂, φ̂ and ẑ are all the same kind of thing, unit vectors.

Now to ask the student a harder question: in the cylindrical basis, what is the vector representation of (ρ₁; φ₁, z₁)? The correct answer: ρ̂ρ₁ cos(φ₁ − φ) + φ̂ρ₁ sin(φ₁ − φ) + ẑz₁. The student that gives this answer probably grasps the cylindrical symbols.

If the reader feels that the notation begins to confuse more than it describes, the writer empathizes but regrets to inform the reader that the rest of the section, far from granting the reader a comfortable respite to absorb the elaborated notation as it stands, will not delay to elaborate the notation yet further! The confusion however is subjective. The trouble with vector work is that one has to learn to abbreviate or the expressions involved grow repetitive and unreadably long. Eventually one accepts the need and takes the trouble to master the conventional vector abbreviation this section presents; and, indeed, the abbreviation is rather elegant once one becomes used to it. So, study closely and take heart! The notation is not actually as impenetrable as it at first will seem.

15.4.2 Einstein's summation convention

Einstein's summation convention is this: that repeated indices are implicitly summed over.¹⁴ For vectors, where the convention is in force, the equation¹⁵

    a · b = a_i b_i    (15.25)

means that

    a · b = Σ_i a_i b_i

or, more fully, that

    a · b = Σ_{i=x′,y′,z′} a_i b_i = a_x′ b_x′ + a_y′ b_y′ + a_z′ b_z′,

which is (15.23), except that Einstein's form (15.25) expresses it more succinctly. Likewise,

    a × b = î(a_{i+1} b_{i−1} − b_{i+1} a_{i−1})    (15.26)

is (15.24)—although an experienced applied mathematician would probably apply the Levi-Civita epsilon of § 15.4.3, below, to further abbreviate this last equation to the form of (15.27) before presenting it.

¹⁴[35]

¹⁵Some professional mathematicians now write a superscript aⁱ in certain cases in place of a subscript a_i, where the superscript bears some additional semantics [77, "Einstein notation," 05:36, 10 February 2008]. Scientists and engineers however tend to prefer Einstein's original, subscript-only notation.

¹⁶Einstein's summation convention is also called the Einstein notation, a term sometimes taken loosely to include also the Kronecker delta and Levi-Civita epsilon of § 15.4.3.
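In numerical work the convention has a direct analogue; a minimal Python sketch (numpy.einsum happens to take Einstein's notation literally):

    import numpy as np

    a = np.array([1.0, 2.0, 3.0])
    b = np.array([4.0, 5.0, 6.0])

    # The explicit summation that "a_i b_i" hides:
    explicit = sum(a[i] * b[i] for i in range(3))

    # einsum reads the convention directly: the repeated index i is summed.
    assert np.isclose(np.einsum('i,i->', a, b), explicit)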

What is important to understand about Einstein's summation convention is that, in and of itself, it brings no new mathematics. It is rather a notational convenience.¹⁶ It asks a reader to regard a repeated index like the i in "a_i b_i" as a dummy index (§ 2.3) and thus to read "a_i b_i" as "Σ_i a_i b_i." It does not magically create a summation where none existed; it just hides the summation sign to keep it from cluttering the page. It is the kind of notational trick an accountant might appreciate. Under the convention, the summational operator Σ_i is implied not written, but the operator is still there. Admittedly confusing on first encounter, the convention's utility and charm are felt after only a little practice.

Incidentally, nothing requires you to invoke Einstein's summation convention everywhere and for all purposes. You can waive the convention, writing the summation symbol out explicitly whenever you like.¹⁷ In contexts outside vector analysis, to invoke the convention at all may make little sense. Nevertheless, you should indeed learn the convention—if only because you must learn it to understand the rest of this chapter—but once having learned it you should naturally use it only where it actually serves to clarify. Fortunately, in vector work, it often does just that.

Quiz:¹⁸ if δ_ij is the Kronecker delta of § 11.2, then what does the symbol δ_ii represent where Einstein's summation convention is in force?

15.4.3 The Kronecker delta and the Levi-Civita epsilon

Einstein's summation convention expresses the dot product (15.25) neatly but, as we have seen in (15.26), does not by itself wholly avoid unseemly repetition in the cross product. The Levi-Civita epsilon¹⁹ ε_ijk mends this, rendering the cross-product as

    a × b = ε_ijk î a_j b_k,    (15.27)

where²⁰

    ε_ijk ≡ { +1 if (i, j, k) = (x′, y′, z′), (y′, z′, x′) or (z′, x′, y′);
              −1 if (i, j, k) = (x′, z′, y′), (y′, x′, z′) or (z′, y′, x′);    (15.28)
               0 otherwise [for instance if (i, j, k) = (x′, x′, y′)].

In the language of § 11.6, the Levi-Civita epsilon quantifies parity. (Chapters 11 and 14 did not use it, but the Levi-Civita notation applies in any number of dimensions, not only three as in the present chapter. In this more general sense the Levi-Civita is the determinant of the permutor whose ones hold the indicated positions—which is a formal way of saying that it's a + sign for even parity and a − sign for odd. For instance, in the four-dimensional, 4 × 4 case, ε₁₂₃₄ = 1 whereas ε₁₂₄₃ = −1: refer to §§ 11.6, 11.7.1 and 14.1. Table 15.1, however, as the rest of this section and chapter, concerns the three-dimensional case only.)

Technically, the Levi-Civita epsilon and Einstein's summation convention are two separate, independent things, but a canny reader takes the Levi-Civita's appearance as a hint that Einstein's convention is probably in force, as in (15.27). The two tend to go together.²¹ In any event, one sometimes hears Einstein's summation convention, the Kronecker delta and the Levi-Civita epsilon together referred to as "the Einstein notation," which though maybe not quite terminologically correct is hardly incorrect enough to argue over and is clear enough in practice.

The Levi-Civita epsilon ε_ijk relates to the Kronecker delta δ_ij of § 11.2 approximately as the cross product relates to the dot product. Both delta and epsilon find use in vector work. For example, one can write (15.25) alternately in the form

    a · b = δ_ij a_i b_j.

Table 15.1 lists several relevant properties,²² each with Einstein's summation convention in force.²³

¹⁷[75, "Einstein summation"]

¹⁸[59]

¹⁹Also called the Levi-Civita symbol, tensor, or permutor. For native English speakers who do not speak Italian, the "ci" in Levi-Civita's name is pronounced as the "chi" in "children."

²⁰[58, "Levi-Civita permutation symbol"]

²¹The writer has heard the apocryphal belief expressed that the letter ε, a Greek e, stood in this context for "Einstein." As far as the writer knows, ε is merely the letter after δ, which represents the name of Paul Dirac—though the writer does not claim his indirected story to be any less apocryphal than the other one (the capital letter Δ has a point on top that suggests the pointy nature of the Dirac delta of Fig. 7.10, which makes for yet another plausible story).

²²[59]

²³The table incidentally answers § 15.4.2's quiz.

Table 15.1: Properties of the Kronecker delta and the Levi-Civita epsilon.

    δ_jk = δ_kj            δ_ij δ_jk = δ_ik            δ_ii = 3
    δ_jk ε_ijk = 0         δ_nk ε_ijk = ε_ijn
    ε_ijk = ε_jki = ε_kij = −ε_ikj = −ε_jik = −ε_kji
    ε_ijk ε_ijk = 6        ε_ijn ε_ijk = 2δ_nk         ε_imn ε_ijk = δ_mj δ_nk − δ_mk δ_nj

Of the table's several properties, the property that ε_imn ε_ijk = δ_mj δ_nk − δ_mk δ_nj is proved by observing that, in any given term of the Einstein sum, in the case that i = x′, either (j, k) = (y′, z′) or (j, k) = (z′, y′), and also either (m, n) = (y′, z′) or (m, n) = (z′, y′); the several indices can take any values, but combinations other than the ones listed drive ε_imn or ε_ijk, or both, to zero, thus contributing nothing to the sum. This implies that either (j, k) = (m, n) or (j, k) = (n, m)—which, when one takes parity into account, is exactly what the property in question asserts. The cases that i = y′ and i = z′ are similar.

The property that ε_ijn ε_ijk = 2δ_nk is proved by observing that, for k = n = x′, an (i, j) = (y′, z′) term and an (i, j) = (z′, y′) term both contribute positively to the sum, whence the factor 2; and similarly for k = n = y′ and again for k = n = z′, whereas for k ≠ n no term of the sum survives.

A concrete example helps. Consider the compound product c × (a × b). In this section's notation and with the use of (15.27),

the compound product is

\[ \begin{aligned} c \times (a \times b) &= c \times (\varepsilon_{ijk}\,\hat\imath\, a_j b_k) = \varepsilon_{mni}\,\hat m\, c_n (\varepsilon_{ijk}\,\hat\imath\, a_j b_k)_i = \varepsilon_{mni}\varepsilon_{ijk}\,\hat m\, c_n a_j b_k \\ &= \varepsilon_{imn}\varepsilon_{ijk}\,\hat m\, c_n a_j b_k = (\delta_{mj}\delta_{nk} - \delta_{mk}\delta_{nj})\,\hat m\, c_n a_j b_k \\ &= \delta_{mj}\delta_{nk}\,\hat m\, c_n a_j b_k - \delta_{mk}\delta_{nj}\,\hat m\, c_n a_j b_k \\ &= \hat\jmath\, c_k a_j b_k - \hat k\, c_j a_j b_k = (\hat\jmath\, a_j)(c_k b_k) - (\hat k\, b_k)(c_j a_j). \end{aligned} \]

That is, in light of (15.25),

    c × (a × b) = a(c · b) − b(c · a),    (15.29)

a useful vector identity. Written without the benefit of Einstein's summation convention, the example's central step would have been

\[ c \times (a \times b) = \sum_{i,j,k,m,n} \varepsilon_{imn}\varepsilon_{ijk}\,\hat m\, c_n a_j b_k = \sum_{j,k,m,n} (\delta_{mj}\delta_{nk} - \delta_{mk}\delta_{nj})\,\hat m\, c_n a_j b_k, \]

which makes sense if you think about it hard enough,²⁴ and justifies the table's claim that ε_imn ε_ijk = δ_mj δ_nk − δ_mk δ_nj. (Notice that the compound Kronecker operator δ_mj δ_nk includes nonzero terms for the case that j = k = m = n = x′, for the case that j = k = m = n = y′ and for the case that j = k = m = n = z′, whereas the compound Levi-Civita operator ε_imn ε_ijk does not. However, the compound Kronecker operator −δ_mk δ_nj includes canceling terms for these same three cases. This is why the table's claim is valid as written.)

To belabor the topic further here would serve little purpose. The reader who does not feel entirely sure that he understands what is going on might work out the table's several properties with his own pencil, in something like the style of the example, until he is satisfied that he adequately understands the several properties and their correct use.

Section 16.7 will refine the notation for use when derivatives with respect to angles come into play but, before leaving the present section, we might pause for a moment to appreciate (15.29) in the special case that b = c = n̂:

    −n̂ × (n̂ × a) = a − n̂(n̂ · a).    (15.30)

The difference a − n̂(n̂ · a) evidently projects a vector a onto the plane whose unit normal is n̂. Equation (15.30) reveals that the double cross product −n̂ × (n̂ × a) projects the same vector onto the same plane. Fig. 15.6 illustrates.

²⁴If thinking about it hard enough does not work, then write the sums out in interminable detail. Expanded, the sum ε_imn ε_ijk f(j, k, m, n) keeps only those terms in which (j, k) and (m, n) each comprise the two index values other than i; taking parity into account, the matched terms like f(y′, z′, y′, z′) enter with + signs and the swapped terms like f(y′, z′, z′, y′) with − signs, which is exactly what the sum (δ_mj δ_nk − δ_mk δ_nj) f(j, k, m, n) yields once its j = k = m = n terms have canceled between the two compound Kronecker operators. For the property that ε_ijn ε_ijk = 2δ_nk, the corresponding expansion gives ε_ijn ε_ijk f(k, n) = 2[f(x′, x′) + f(y′, y′) + f(z′, z′)] = 2δ_nk f(k, n). For the property that ε_ijk ε_ijk = 6, ε_ijk ε_ijk = ε²_x′y′z′ + ε²_y′z′x′ + ε²_z′x′y′ + ε²_x′z′y′ + ε²_y′x′z′ + ε²_z′y′x′ = 6.
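The epsilon-delta properties and the identities (15.29) and (15.30) also submit readily to a numerical check. A minimal Python sketch that builds ε_ijk as a 3 × 3 × 3 array per (15.28) and verifies the table's key property:

    import numpy as np

    eps = np.zeros((3, 3, 3))
    for i, j, k in [(0, 1, 2), (1, 2, 0), (2, 0, 1)]:
        eps[i, j, k], eps[i, k, j] = 1.0, -1.0    # per (15.28)
    delta = np.eye(3)

    # epsilon_imn epsilon_ijk = delta_mj delta_nk - delta_mk delta_nj:
    lhs = np.einsum('imn,ijk->mnjk', eps, eps)
    rhs = (np.einsum('mj,nk->mnjk', delta, delta)
           - np.einsum('mk,nj->mnjk', delta, delta))
    assert np.allclose(lhs, rhs)

    # (15.29) and, with b = c = n_hat, (15.30):
    rng = np.random.default_rng(1)
    a, b, c = rng.standard_normal(3), rng.standard_normal(3), rng.standard_normal(3)
    assert np.allclose(np.cross(c, np.cross(a, b)), a * (c @ b) - b * (c @ a))
    n = np.array([0.0, 0.0, 1.0])
    assert np.allclose(-np.cross(n, np.cross(n, a)), a - n * (n @ a))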

Figure 15.6: A vector projected onto a plane. (The figure depicts a vector a, a unit normal n̂, the normal component n̂(n̂ · a), and the projection −n̂ × (n̂ × a) = a − n̂(n̂ · a) lying in the plane.)

15.5 Algebraic identities

Vector algebra is not in principle very much harder than scalar algebra is, but with three distinct types of product it has more rules controlling the way its products and sums are combined. Table 15.2 lists several of these.²⁵,²⁶

Table 15.2: Algebraic vector identities.

    ψa = îψa_i        a · b ≡ a_i b_i        a × b ≡ ε_ijk î a_j b_k
    a* · a = |a|²                    (ψ)(a + b) = ψa + ψb
    b · a = a · b                    b × a = −a × b
    c · (a + b) = c · a + c · b      c × (a + b) = c × a + c × b
    a · (ψb) = (ψ)(a · b)            a × (ψb) = (ψ)(a × b)
    c · (a × b) = a · (b × c) = b · (c × a)
    c × (a × b) = a(c · b) − b(c · a)
    −n̂ × (n̂ × a) = a − n̂(n̂ · a)

²⁵[70, Appendix A][31, Appendix II]

²⁶Nothing in any of the table's identities requires the vectors involved to be real. The table is equally as valid when vectors are complex.

Most of the table's identities are plain by the formulas (15.9), (15.21) and (15.22) respectively for the scalar, dot and cross products, and two were proved as (15.29) and (15.30). The remaining identity is proved in the notation of § 15.4 as

    ε_ijk c_i a_j b_k = ε_ijk a_i b_j c_k = ε_ijk b_i c_j a_k,

each step being a cyclic relabeling of the dummy indices, licit since ε_ijk = ε_kij = ε_jki; whence

    c · (ε_ijk î a_j b_k) = a · (ε_ijk î b_j c_k) = b · (ε_ijk î c_j a_k),

which, in light of (15.27), is

    c · (a × b) = a · (b × c) = b · (c × a).    (15.31)

Besides the several vector identities, the table also includes the three vector products in Einstein notation. Each definition and identity of Table 15.2 is invariant under reorientation of axes.²⁷

15.6 Isotropy

A real,²⁸ three-dimensional coordinate system²⁹ (α; β; γ) is isotropic at a point r = r₁ if and only if

    β̂(r₁) · γ̂(r₁) = 0,
    γ̂(r₁) · α̂(r₁) = 0,    (15.32)
    α̂(r₁) · β̂(r₁) = 0,

and

\[ \left|\frac{\partial r}{\partial\alpha}\right|_{r=r_1} = \left|\frac{\partial r}{\partial\beta}\right|_{r=r_1} = \left|\frac{\partial r}{\partial\gamma}\right|_{r=r_1}. \tag{15.33} \]

That is, a three-dimensional system is isotropic if its three coordinates advance locally at right angles to one another but at the same rate.

Of the three basic three-dimensional coordinate systems—indeed, of all the three-dimensional coordinate systems this book treats—only the rectangular is isotropic according to (15.32) and (15.33).³⁰ Isotropy admittedly would not be a very interesting property if that were all there were to it. However, there is also two-dimensional isotropy, more interesting because it arises oftener.

²⁷If the reader's native language is English, then he is likely to have heard of the unfortunate "back cab rule," which actually is not a rule but an unhelpful mnemonic for one of Table 15.2's identities. The mnemonic is mildly orthographically clever but, when learned, significantly impedes real understanding of the vector. The writer recommends that the reader forget the rule if he has heard of it, for, in mathematics, spelling-based mnemonics are seldom if ever a good idea.

²⁸The reader is reminded that one can licitly express a complex vector in a real basis.

²⁹This chapter's footnote 31 and Ch. 16's footnote 21 explain the usage of semicolons as coordinate delimiters.

³⁰Whether it is even possible to construct an isotropic, nonrectangular coordinate system in three dimensions is a question we will leave to the professional mathematician. The author has not encountered such a system.

A real, two-dimensional coordinate system (α; β) is isotropic at a point ρ^γ = ρ^γ₁ if and only if

    α̂(ρ^γ₁) · β̂(ρ^γ₁) = 0    (15.34)

and

\[ \left|\frac{\partial \rho^{\gamma}}{\partial\alpha}\right|_{\rho^{\gamma}=\rho^{\gamma}_1} = \left|\frac{\partial \rho^{\gamma}}{\partial\beta}\right|_{\rho^{\gamma}=\rho^{\gamma}_1}, \tag{15.35} \]

where ρ^γ = α̂α + β̂β represents position in the α-β plane. (If the α-β plane happens to be the x-y plane, as is often the case, then ρ^γ = ρ^z = ρ and one can omit the superscript.) The two-dimensional rectangular system (x, y) naturally is isotropic. Because |∂ρ/∂φ| = (ρ)|∂ρ/∂ρ|, the standard two-dimensional cylindrical system (ρ; φ) as such is nonisotropic; but the change of coordinate

    λ ≡ ln(ρ/ρₒ),    (15.36)

where ρₒ is some arbitrarily chosen reference radius, converts the system straightforwardly into the logarithmic cylindrical system (λ; φ), which is isotropic everywhere in the plane except at the origin ρ = 0. Further two-dimensionally isotropic coordinate systems include the parabolic system of § 15.7.2, to follow.

15.7 Parabolic coordinates

Scientists and engineers find most spatial-geometrical problems they encounter in practice to fall into either of two categories. The first category comprises problems of simple geometry conforming to any one of the three basic coordinate systems—rectangular, cylindrical or spherical. The second category comprises problems of complicated geometry, analyzed in the rectangular system not because the problems' geometries fit that system but rather because they fit no system and thus give one little reason to depart from the rectangular. One however occasionally encounters problems of a third category, whose geometries are simple but, though simple, nevertheless fit none of the three basic coordinate systems. Then it may fall to the scientist or engineer to devise a special coordinate system congenial to the problem.

This section will treat the parabolic coordinate systems which, besides being arguably the most useful of the various special systems, serve as good examples of the kind of special system a scientist or engineer might be called upon to devise.

The two three-dimensional parabolic systems are the parabolic cylindrical system (σ, τ, z) of § 15.7.4 and the circular paraboloidal system³¹ (η; φ, ξ) of § 15.7.5, where the angle φ and the length z are familiar to us but σ, τ, η and ξ—neither angles nor lengths but root-lengths (that is, coordinates having dimensions of [length]^(1/2))—are new.³² Both three-dimensional parabolic systems derive from the two-dimensional parabolic system (σ, τ) of § 15.7.2.³³

However, before handling any parabolic system we ought formally to introduce the parabola itself, next.

15.7.1 The parabola

Parabolic coordinates are based on a useful geometrical curve called the parabola, which many or most readers will have met long before opening this book's covers. The parabola, simple but less obvious than the circle, may however not be equally well known to all readers, and even readers already acquainted with it might appreciate a reëxamination. This subsection reviews the parabola.

Given a point, called the focus, and a line, called the directrix,³⁴ plus the plane in which the focus and the directrix both lie, the associated parabola is that curve which lies in the plane everywhere equidistant from both focus and directrix.³⁵ See Fig. 15.7.

³¹The reader probably will think nothing of it now, but later may wonder why the circular paraboloidal coordinates are (η; φ, ξ) rather than (ξ; φ, η) or (η, ξ; φ). The peculiar ordering is to honor the right-hand rule (§ 3.3 and eqn. 15.19), since η̂ × ξ̂ = −φ̂ rather than +φ̂. See § 15.7.5. (Regarding the semicolon ";" delimiter, it doesn't mean much. This book arbitrarily uses a semicolon when the following coordinate happens to be an angle, which helps to distinguish rectangular coordinates from cylindrical from spherical. Admittedly, such a notational convention ceases to help much when parabolic coordinates arrive, but we will continue to use it for inertia's sake. See also Ch. 16's footnote 21.)

³²The letters σ, τ, η and ξ are available letters this section happens to use, not necessarily standard parabolic symbols. See Appendix B.

³³[77, "Parabolic coordinates," 09:59, 19 July 2008]

³⁴Whether the parabola's definition ought to forbid the directrix to pass through the focus is a stylistic question this book will leave unanswered.

³⁵[65, § 12-1]

Figure 15.7: The parabola. (The figure depicts a parabola, its focus, its directrix, and the two equal distances a from a point on the curve to each.)

Referring to the figure, if rectangular coordinates are established such that x̂ and ŷ lie in the plane, that the parabola's focus lies at (x, y) = (0, k), and that the equation y = k − σ² describes the parabola's directrix, then the equation

    x² + (y − k)² = (y − k + σ²)²

evidently expresses the equidistance rule of the parabola's definition. Solving for y − k and then, from that solution, for y, we have that

    y = x²/(2σ²) + (k − σ²/2).    (15.37)

With the definitions that

    µ ≡ 1/(2σ²),
    κ ≡ k − σ²/2,    (15.38)

given which

    σ² = 1/(2µ),
    k = κ + 1/(4µ),    (15.39)

eqn. (15.37) becomes

    y = µx² + κ.    (15.40)

Equations fitting the general form (15.40) arise extremely commonly in applications, and any equation that fits the form can be plotted as a parabola. To choose a particularly famous example, the equation that describes a projectile's flight in the absence of air resistance fits the form, which for example is why projectiles fly in parabolic arcs.

Observe that the parabola's definition does not actually require the directrix to be ŷ-oriented: the directrix can be x̂-oriented or, indeed, oriented any way (though naturally in that case eqns. 15.37 and 15.40 would have to be modified).
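Equations (15.38) and (15.39) convert mechanically between the parabola's (µ, κ) form and its focal parameters; a minimal Python sketch (the function name is the sketch's own):

    def parabola_focus_directrix(mu, kappa):
        # Per (15.39): sigma^2 = 1/(2 mu) and k = kappa + 1/(4 mu);
        # the focus lies at (0, k) and the directrix is the line y = k - sigma^2.
        sigma2 = 1.0 / (2.0 * mu)
        k = kappa + 1.0 / (4.0 * mu)
        return k, k - sigma2

    # For y = x^2 (mu = 1, kappa = 0): focus at (0, 1/4), directrix y = -1/4.
    focus_y, directrix_y = parabola_focus_directrix(1.0, 0.0)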

Observe also the geometrical fact that the parabola's track necessarily bisects the angle between the two line segments labeled "a" in Fig. 15.7. One of the consequences of this geometrical fact—a fact it seems better to let the reader visualize and ponder than to try to justify in so many words³⁶—is that a parabolic mirror reflects precisely³⁷ toward its focus all light rays that arrive perpendicularly to its directrix (which for instance is why satellite dish antennas have parabolic cross-sections).

15.7.2 Parabolic coordinates in two dimensions

Parabolic coordinates are most easily first explained in the two-dimensional case that z = 0, where the parameter k of § 15.7.1 has been set to k = 0. In two dimensions, the parabolic coordinates (σ, τ) represent the point in the x-y plane that lies equidistant

• from the line y = −σ²,
• from the line y = +τ², and
• from the point ρ = 0.

Figure 15.8 depicts the construction described.

³⁶If it helps nevertheless, some words: Consider that the two line segments labeled a in the figure run in the directions of increasing distance respectively from the focus and from the directrix. If you want to draw away from the directrix at the same rate as you draw away from the focus, thus maintaining equal distances, then your track cannot but exactly bisect the angle between the two segments. (To bisect a thing, incidentally—if the context has not already made the meaning plain—is to divide the thing at its middle into two equal parts.) Once you grasp the idea, the bisection is obvious, though to grasp the idea can take some thought.

³⁷Well, the focusing is precise insofar as rays are concerned. Physically, the ray model of light implied here is valid only insofar as λ ≪ σ², where λ represents the light's characteristic wavelength; and, regardless of λ, the ray model breaks down in the immediate neighborhood of the mirror's focus. Such wave-mechanical considerations are confined to a footnote not because they were untrue but rather because they do not concern the present geometrical discussion.

Figure 15.8: Locating a point in two dimensions by parabolic construction. (The figure marks the point ρ = 0, the lines y = −σ² and y = +τ², and the three equal distances a.)

In the figure are two dotted curves, one of which represents the point's parabolic track if σ were varied while τ were held constant, and the other of which represents the point's parabolic track if τ were varied while σ were held constant. Observe according to § 15.7.1's bisection finding that each parabola necessarily bisects the angle between two of the three line segments labeled a in the figure. Observe further that the two angles' sum is the straight angle 2π/2, from which one can conclude, significantly, that the two parabolas cross precisely at right angles to one another.

Figure 15.9 lays out the parabolic coordinate grid in two dimensions. Notice in the figure that one of the grid's several cells is subdivided at its quarter-marks for illustration's sake, to show how one can subgrid at need to locate points like, for example, (σ, τ) = (7/2, −9/4) visually. (That the subgrid's cells approach square shape implies that the parabolic system is isotropic, a significant fact § 15.7.3 will demonstrate formally.)

Using the Pythagorean theorem, one can symbolically express the equidistant construction rule above as

    a = σ² + y = τ² − y,
    a² = ρ² = x² + y².    (15.41)

From the first line of (15.41),

    y = (τ² − σ²)/2.    (15.42)

On the other hand, combining the two lines of (15.41),

    (σ² + y)² = x² + y² = (τ² − y)²,

or, subtracting y²,

    σ⁴ + 2σ²y = x² = τ⁴ − 2τ²y.

Substituting (15.42)'s expression for y,

    x² = (στ)².

VECTOR ANALYSIS Figure 15. σ=0 ±1 ±2 ±3 ±3 τ =0 ±1 ±2 That either x = +στ or x = −στ would satisfy this equation. (15.9: The parabolic coordinate grid in two dimensions.3 Properties The derivatives of (15. since ρ2 = x2 + y 2 .7.43) together imply that ρ= τ 2 + σ2 . (15.43) Combining (15. (15. 2 (15.42) and (15.45) 15.44) to isolate σ 2 and τ 2 yields σ 2 = ρ − y.42) and (15. Also.44) (15.44) are dx = σ dτ + τ dσ.42) and (15. (15. dy = τ dτ − σ dσ. τ 2 = ρ + y.440 CHAPTER 15.46) . Arbitrarily choosing the + sign gives us that x = στ. dρ = τ dτ + σ dσ.43).

∂τ ˆ ˆ = x(σ dτ + τ dσ) + y(τ dτ − σ dσ) (15.46) has expanded the diﬀerentials and from which . that ˆ σ= ˆ ˆ xτ − yσ √ . 2ρ ˆ ˆ xσ + yτ ˆ . 2 + σ2 τ or.47) from which it is apparent that (15.50) in which (15.48) of which the dot product ˆ ˆ σ · τ = 0 if ρ = 0 (15.44) yields dσ = τ dx − σ dy . 2ρ σ dx + τ dy dτ = .48) simultaneously for x ˆ and y then produces ˆ ˆ τ σ + στ √ . conﬁrming our earlier ﬁnding that the various grid parabolas cross ˆ always at right angles to one another. τ = √ 2ρ (15. 2ρ ˆ ˆ xτ − yσ ˆ .7. Solving (15.46) simultaneously for dσ and dτ and then collapsing the resultant subexpression τ 2 + σ 2 per (15. y= √ 2ρ ˆ x= One can express an inﬁnitesimal change in position in the plane as ˆ ˆ dρ = x dx + y dy ˆ ˆ = (ˆ τ − yσ) dσ + (ˆ σ + yτ ) dτ. ∂σ ∂ρ ˆ ˆ = xσ + yτ.44). σ=√ 2 + σ2 τ ˆ ˆ xσ + yτ ˆ τ =√ . collapsing again per (15. 2ρ ˆ ˆ τ τ − σσ ˆ . PARABOLIC COORDINATES 441 Solving the ﬁrst two lines of (15.15. x x ∂ρ ˆ ˆ = xτ − yσ.49) is null.

3: Parabolic coordinate properties. All the properties of Table 15. ˆ ˆz The orthogonal parabolic cylindrical basis is [σ τ ˆ].49) and (15. x = στ τ 2 − σ2 y = 2 2 + σ2 τ ρ = 2 2 2 ρ = x + y2 σ2 = ρ − y τ2 = ρ + y ˆ x = ˆ y = ˆ σ = ˆ τ = ˆ ˆ σ×τ = ˆ ˆ τ σ + στ √ 2ρ ˆ ˆ τ τ − σσ √ 2ρ ˆ ˆ xτ − yσ √ 2ρ ˆ ˆ xσ + y τ √ 2ρ ˆ z ˆ ˆ σ·τ = 0 ∂ρ ∂ρ = ∂σ ∂τ and thus ∂ρ ∂ρ = . The surfaces of constant σ and of constant τ in this system are parabolic cylinders (and the surfaces of constant z naturally are planes). 15.442 CHAPTER 15. τ. Observe however that the system is isotropic only in two dimensions not three. . gathering parabolic coordinate properties from this subsection and § 15.34) and (15.51) Equations (15. ∂σ ∂τ (15.35).4 The parabolic cylindrical coordinate system Two-dimensional parabolic coordinates are trivially extended to three dimensions by adding a z coordinate.7. thus constituting the parabolic cylindrical coordinate system (σ.2. Table 15. implying that the two-dimensional parabolic coordinate system is isotropic except at ρ = 0.3 summarizes.7. z).51) respectively meet the requirements (15.3 apply. VECTOR ANALYSIS Table 15.

5 The circular paraboloidal coordinate system Sometimes one would like to extend the parabolic system to three dimensions by adding an azimuth φ rather than a height z. φ.7.7. The system is the circular paraboloidal system (η. which are just the properties of Table 15. foci and directrices of Figs. the circular paraboloidal system too is isotropic in two dimensions. Therefore. but then one tends to prefer the parabolas. The properties of Table 15. 443 ρ = ηξ ξ 2 − η2 z = 2 ξ 2 + η2 r = 2 r 2 = ρ2 + z 2 = x2 + y 2 + z 2 η2 = r − z ξ2 = r + z ˆ ρ = ˆ z = ˆ η = ˆ ξ = ˆ ˆ η×ξ = ˆ ˆ η·ξ = 0 ∂r ∂r = ∂η ∂ξ ˆ ˆ ξη + η ξ √ 2r ˆ − ηη ξξ ˆ √ 2r ˆ z ρξ − ˆη √ 2r ˆ ˆ ρη + zξ √ 2r ˆ −φ 15. given the usual deﬁnition of the φ unit basis vector. just as in the cylindrical system). or half planes if you like. Like the parabolic cylindrical system. η × ξ = ˆ rather than +φ as one might ﬁrst guess.9 to run in the ρ-z plane rather than in the x-y.4 result. PARABOLIC COORDINATES Table 15. 15. one deﬁnes the coordinates η and ξ to represent in the ρ-z plane what the letters σ and τ have represented in the x-y. ˆ ˆ ˆ Notice that. parabolas rotated about the z axis (and the surfaces of constant φ are planes. right-handed ˆ −φ sequence of the orthogonal circular paraboloidal basis therefore would be .3 with coordinates changed. The surfaces of constant η and of constant ξ in the circular paraboloidal system are paraboloids.15. The correct. This is possible.8 and 15. ξ).4: Circular paraboloidal coordinate properties.

VECTOR ANALYSIS [ˆ φ ξ]. . Chapter 16.444 CHAPTER 15. next.38 η ˆ ˆ This concludes the present chapter on the algebra of vector analysis. 38 See footnote 31. will venture hence into the larger and even more interesting realm of vector calculus.

1 This ψ(r) is unrelated to the Tait-Bryan and Euler roll angles ψ of § 15. § 18.3. Wind velocity2 q(r) is an example of a vector ﬁeld. 2 As § 15. Tactically. 16. the vector is a continuous quantity and as such has not only an algebra but also a calculus.” but there is a special. Like the scalar. Air pressure p(r) is an example of a scalar ﬁeld. which it needs for another purpose.1. alternate name for such a quantity. a vector ﬁeld can be thought of as composed of three scalar ﬁelds ˆ ˆ ˆ q(r) = xqx (r) + yqy (r) + zqz (r). In the unlikely event of confusion. See Appendix B. if you prefer. A ﬁeld is a quantity distributed over space or. you can use an alternate letter like η for the roll angle.1 Fields and their derivatives A scalar quantity σ(t) or vector quantity f (t) whose value varies over time is “a function of time t.” We can likewise call a scalar quantity1 ψ(r) or vector quantity a(r) whose value varies over space “a function of position r. whose value at a given location r has both amplitude and direction.Chapter 16 Vector calculus Chapter 15 has introduced the algebra of the three-dimensional geometrical vector. a function in which spatial position serves as independent variable. an unfortunate but tolerable overloading of the Greek letter ψ in the conventional notation of vector analysis. whose value at a given location r has amplitude but no direction. We call it a ﬁeld. 445 . These are typical examples.4]. this section also uses the letter q for velocity in place of the conventional v [8. This chapter develops the calculus of the vector.

This is not because the Laplacian were uninteresting but rather because the Laplacian is actually a second-order derivative—a derivative of a derivative. You and I who already grasp the concept of upstairs and downstairs do not ﬁnd a building of two ﬂoors. The ﬁeld q(r) as a whole is the interesting thing. As one can take the derivative dσ/dt or df /dt with respect to time t of a function σ(t) or f (t). especially hard to envision. and the curl. To try to think of upstairs and downstairs might confuse him with partly false images of sitting in a tree or of clambering onto (and breaking) the roof of his hut. If we will speak of a ﬁeld’s derivative with respect to position r then we shall be more precise. In fact dψ/dr and da/dr mean nothing very distinct in most contexts and we shall avoid such notation. since CHAPTER 16. The notation dσ/dt means “the rate of σ as time t advances. the divergence. the three components come tactically. Section 15. or three or even thirty. “There are many trees and antelopes but only one sky and ﬂoor. such components are uninteresting in themselves. How can one speak of many skies or many ﬂoors?” The student’s principal conceptual diﬃculty with the several vector derivatives is of this kind. by and bz are to a vector b. but our caveman is used to thinking of the ground and the ﬂoor as more or less the same thing. However.” Imagine a caveman. As we said. for it is not obvious what symbology like dψ/dr or da/dr would actually mean. He might not understand. “advances in which direction?” The notation oﬀers no hint. We will address the Laplacian in § 16. 3 . Suppose that you tried to describe to the caveman a house or building of more than one ﬂoor. typically. VECTOR CALCULUS ˆ ˆ ˆ q(r) = x′ qx′ (r) + y′ qy′ (r) + z′ qz ′ (r) for any orthogonal basis [x′ y′ z′ ] as well. Scalar and vector ﬁelds are of utmost use in the modeling of physical phenomena.2 has given the vector three distinct kinds of product. the speciﬁc scalar ﬁelds qx (r). the gradient. This section gives the ﬁeld no fewer than four distinct kinds of derivative: the directional derivative.4.” but if the notation dψ/dr likewise meant “the rate of ψ as position r advances” then it would necessarily prompt one to ask. one can likewise take the derivative with respect to position r of a ﬁeld ψ(r) or a(r). Vector veterans may notice that the Laplacian is not listed.446 but. derivatives with respect to position create a notational problem.3 So many derivatives bring the student a conceptual diﬃculty one could call “the caveman problem. qy (r) and qz (r) are no more essential to the vector ﬁeld q(r) than the speciﬁc scalars bx .

16. Section 16. Maybe there exists a model in which “[Tuesday]” knows how to operate on a scalar like ax . The model might allow it. Now consider a “vector” ˆ ∇=x ∂ ∂ ∂ ˆ ˆ +y +z . What matters in this context is not that c have amplitude and direction (it has neither) but rather that it have the three orthonormal components it needs to participate formally in relevant vector operations. maybe. but.) If there did exist such a model. It has these. but c is not a true vector. As if this were not enough. the cross product c × a too could be licit in that model. but consider: c · a = [Tuesday]ax + [Wednesday]ay + [Thursday]az . (Operate on? Yes. The dot and cross products in and of themselves do not forbid it.1.1) This ∇ is not a true vector any more than c is. ∂x ∂y ∂z (16. unlike the terms in the earlier dot product. then the dot product c · a could be licit in that model. composed according to the usual rule for cross products. but the point is that c behaves as though it were a vector insofar as vector operations like the dot product are concerned. Consider a vector Then consider a “vector” ˆ ˆ c = x[Tuesday] + y[Wednesday] + ˆ[Thursday]. then the writer thinks as you do. z If you think that the latter does not look very much like a vector. The writer does not know how to interpret a nonsensical term like “[Tuesday]ax ” any more than the reader does. ∂x ∂y ∂z Such a dot product might or might not prove useful. .2 elaborates the point.1 The ∇ operator ˆ ˆ ˆ a = xax + yay + zaz . but if we treat it as one then we have that ∇·a= ∂ax ∂ay ∂az + + . FIELDS AND THEIR DERIVATIVES 447 16.1. Nothing in the dot product’s deﬁnition requires the component amplitudes of c to multiply those of a. at least we know what this one’s terms mean. so “[Tuesday]” might do something to ax other than to multiply it. That the components’ amplitudes seem nonsensical is beside the point. Multiplication is what the component amplitudes of true vectors do.1.

ay and az do. d/dro . informally pronounced “del” (in the author’s country at least). may suddenly ﬁnd the notation . ∂i ∂x ∂y ∂z (16. The partial diﬀerential operators ∂/∂x. Whatever mark distinguishes the special r. This is easier to see at ﬁrst with respect the true vector a. That is the convention. The operator ∇ however shares more in common with a true vector than merely having x. y and z components. After a brief digression to 4 A few readers not fully conversant with the material of Ch. VECTOR CALCULUS Well. ı 4 there ∇o = ˆ ∂/∂io . If ∇ takes the place of the ambiguous d/dr. to whom this chapter had been making sense until the last two sentences. relative to those axes. like a true vector. Now consider ∇. What’s the point? The answer is that there wouldn’t be any point if the only nonvector “vectors” in question were of c’s nonsensical kind. exactly the same form. Consider rotating the x and y axes through an angle φ about the z axis. Hence. days of the week.2) evidently the same operator regardless of the choice of basis [ˆ ′ y′ z′ ]. There ensues ˆ ˆ a = xax + yay + ˆaz z ˆ = (ˆ ′ cos φ − y′ sin φ)(ax′ cos φ − ay′ sin φ) x ˆ z + (ˆ ′ sin φ + y′ cos φ)(ax′ sin φ + ay′ cos φ) + ˆ′ az ′ x ˆ = x′ [ax′ cos2 φ − ay′ cos φ sin φ + ax′ sin2 φ + ay′ cos φ sin φ] ˆ + y′ [−ax′ cos φ sin φ + ay′ sin2 φ + ax′ cos φ sin φ + ay′ cos2 φ] ˆ + z′ az ′ ˆ ˆ ˆ = x′ ax′ + y′ ay′ + z′ az ′ . (15. 15. ∇o .8). the same mark ∇ distinguishes the corresponding special ∇. the operator ∇ is amenable to having its axes reoriented by (15.2). r † and so on.7) and (15. ∂/∂y and ∂/∂z change no diﬀerently under reorientation than the component amplitudes ax . (15. where the ﬁnal expression has diﬀerent axes than the original but. where ro = ˆio . It is x ˆ ˆ this invariance under reorientation that makes the ∇ operator useful. ı Introduced by Oliver Heaviside. for. ersatz vectors—it all seems rather abstract. the vector diﬀerential operator ∇ ﬁnds extensive use in the modeling of physical phenomena. partial derivatives. Further rotation about other axes would further reorient but naturally also would not alter the form. then what takes the place ˜ of the ambiguous d/dr′ . d/d˜. ∇ =ˆ ı ∂ ∂ ∂ ∂ ˆ ˆ ˆ = x′ ′ + y′ ′ + z′ ′ . For example. ∇. d/dr† and so on? Answer: ∇′ .448 CHAPTER 16. as follows.1).

The notation is Einstein’s. And. This subsection digresses to explain. Any letter might serve as well as the example’s J. 16. FIELDS AND THEIR DERIVATIVES 449 discuss operator notation. It means ˆio = ı ˆio = x′ x′ + y′ yo + z′ zo . the subsections that follow will use the operator to develop and present the four basic kinds of vector derivative. For example. Seen in this way.1.1. Thus.z ′ in the leftmost form of which the summation sign is implied not written. 5 [11] . though the notation Jt usually formally resembles multiplication in other respects as we shall see. generally. Perhaps you did not know that 5 was an operator—and. The a · in the dot product a · b and the a × in the cross product a × b can proﬁtably be regarded as unary operators.4. so we have met operator notation before. it is true that 5t = t5. the scalar 5 itself is no operator but just a number—but where no other operation is deﬁned operator notation implies scalar multiplication by default.16.2 Operator notation Section 16. we have met operator notation much earlier than that. for instance. Jt = tJ if J is a unary operator. like the matrix row operator A in matrix notation’s product Ax (§ 11. Refer to § 15. in fact.1 has introduced operator notation without explaining what it is or even what it concerns. 5t and t5 actually mean two diﬀerent things.3 has already broadly introduced the notion of the operator. a single function or another single mathematical object in some deft inite way. more fully written t J ≡ 0 · dt where the “·” holds the place of the thing operated upon. a single distributed quantity. which happens to be commutative. Whether operator notation can licitly represent any unary operation incomprehensible.5 whose eﬀect is such that. A unary operator is a mathematical agent that transforms a single discrete quantity. the operator J in operator notation’s operation Jt attacks from the left. ˆ o ˆ ′ ˆ ′ ı i=x′ . The product 5t can if you like be regarded as the unary operator “5 times.” operating on t. indeed.1). Section 7. J ≡ 0 dt is a unary operator. Jt = t2 /2 and J cos ωt = (sin ωt)/ω. The matrix actually is a type of unary operator and matrix notation is a specialization of operator notation. but what distinguishes operator notation is that.1.y ′ . though naturally in the speciﬁc case of scalar multiplication. Operator notation concerns the representation of unary operators and the operations they specify.1. a single ﬁeld.

or.1) so that. the notation gives r no speciﬁc direction in which to advance.3. the same algebra matrices obey.’ operating on t. Operators associate from right to left (§ 2. Indeed. The operators J and A above are examples of linear unary operators. operators associate from right to left except where parentheses group otherwise. Jωt = J(ωt). VECTOR CALCULUS whatsoever is a deﬁnitional question we will leave for the professional mathematician to answer.1. t For example. It is precisely this algebra that makes operator notation so useful.450 CHAPTER 16. 16.2 between λ and λI against the distinction between ω and Ω here. one can compare the distinction in § 11. rather than go to the trouble of deﬁning extra symbols like Ω. Observe however that the perceived need for parentheses comes only of the purely notational ambiguity as to whether ωt bears the semantics of “the product of ω and t” or those of “the unary operator ‘ω times. but otherwise linear unary operators follow familiar rules of multiplication like (J2 + J3 )J1 = J2 J1 + J3 J1 . . however. One can speak of a unary operator like J. not (Jω)t. whereupon (JΩ)t = JΩt = J(Ωt) = J(ωt). to rely on the right-to-left convention that Jωt = J(ωt). lest an expression like Kt mislead an understandably unsuspecting audience. better. the operator K ≡ · + 3 is not linear and almost certainly should never be represented in operator notation as here. but in normal usage operator notation represents only linear unary operations. the derivative notation d/dr is ambiguous because.5) a matrix enjoys. which take little space on the page and are universally understood. One can leave an operation unresolved. it is usually easier just to write the parentheses.3 The directional derivative and the gradient In the calculus of vector ﬁelds. as the section’s introduction has observed. Linear unary operators often do not commute. for a linear unary operator enjoys the same associativity (11.3’s rule of linearity. in which case the product JΩ would itself be an operator.2) is an unresolved unary operator of the same kind. tJ is itself a unary operator—it is the operator t 0 dt—though one can assign no particular value to it until it actually operates on something.1. so.” The perceived need and any associated confusion would vanish if Ω ≡ (ω)(·) were unambiguously an operator. so J1 J2 = J2 J1 generally. and for the same reason. notationally. A or Ω without giving it anything in particular to operate upon. The operator ∇ of (16. Modern conventional applied mathematical notation though generally excellent remains imperfect. Still. In operator notation.3. in operator notation. Linear unary operators obey a deﬁnite algebra. unary operations that honor § 7. when it matters.

The gradient represents the amplitude and direction of a scalar ﬁeld’s locally steepest rate.5). whereupon the directional derivative is invariant. (b · ∇)ψ(r) = bi ∂i as to the vector ﬁeld.5) b · ∇ψ(r) = bi ∂i In the case (16.3) to express the derivative unambiguously.2) and accorded a reference vector b to supply a direction and a scale. This operator applies equally to the scalar ﬁeld. Within (16.4) of the vector ﬁeld. however. (b · ∇)a(r) = bi ∂a ∂aj = ˆbi . It can be a vector ﬁeld b(r) that varies from place to place. ∂i ∂i (16. The result of a ∇ operation. Note that the directional derivative is the derivative not of the reference vector b but only of the ﬁeld ψ(r) or a(r).4) and (16. the quantity ∇ψ(r) = ˆ ı ∂ψ ∂i (16. as ∂ψ . the directional derivative does not care. ∇a(r) itself means nothing coherent6 so the parentheses usually are retained. the directional derivative operator b · ∇ is invariant under reorientation of axes. . the gradient ∇ψ(r) is likewise invariant. it is not the object of it. The vector b just directs and scales the derivative.4) For the scalar ﬁeld the parentheses are unnecessary and conventionally are omitted.1. only scalar ﬁelds have gradients. too. Equations (16.5) deﬁne the directional derivative. Appendix B]. Though both scalar and vector ﬁelds have directional derivatives.16. Nothing requires b to be constant. Formally a dot product. but this book won’t treat that. ∂ψ . it does mean something coherent in dyadic analysis [10. (16. 6 Well. one can compose the directional derivative operator (b · ∇) = bi ∂ ∂i (16.6) is called the gradient of the scalar ﬁeld ψ(r). though. FIELDS AND THEIR DERIVATIVES 451 given (16.

452 CHAPTER 16.2. net ﬂow outward from the region in question. 7 The author has American football in mind but other football games have goal lines and goal posts. The surface presents its full area to a perpendicular ﬂow.8) a(r) · ds.7) through a closed surface that will concern us here. When something like air ﬂows through any surface—not necessarily a physical barrier but an imaginary surface like the goal line’s vertical plane in a football game7 —what matters is not the surface’s area as such but rather the area the surface presents to the ﬂow. VECTOR CALCULUS 16. but otherwise the ﬂow sees a foreshortened surface.3. where wind blowing through the region. (The paragraph says much in relatively few words. One can contemplate the ﬂux Φopen = S a(r) · ds through an open surface as well as through a closed.7) S is a vector inﬁnitesimal of amplitude ds. . Pick your favorite brand of football. If it seems opaque then try to visualize eqn. as though the surface were projected onto a plane perpendicular to the ﬂow. Refer to Fig. in which the vector ds represents the area and orientation of a patch of the region’s enclosing surface.7’s dot product a[r] · ds. a sink.7 actually describes ﬂux not through an open surface but through a closed—it could be the imaginary rectangular box enclosing the region of football play to goal-post height. entering and leaving. (16. would constitute zero net ﬂux.) A region of positive ﬂux is a source.1. but where a positive net ﬂux would have barometric pressure falling and air leaving the region maybe because a storm is coming—and you’ve got the idea. however. so we will approach it indirectly.4 Divergence There exist other vector derivatives than the directional derivative and gradient of § 16. too. directed normally outward from the closed surface bounding the region—ds being the area of an inﬁnitesimal element of the surface. One of these is divergence. 16. Flux is ﬂow through a surface: in this case.1. of negative ﬂux. The ﬂux of a vector ﬁeld a(r) outward from a region in space is Φ≡ where ˆ ds ≡ n · ds (16. 15. Now realize that eqn. through the concept of ﬂux as follows. the area of a tiny patch. 16. It is not easy to motivate divergence directly. but it is the outward ﬂux (16.

but this merely elaborates the limits of integration along those lines.z) ∆ax (y. ∂x ∂y ∂z Naturally.x) zmax (x. y) dx dy. x) = ∆y(z. y) dx dy. It changes the problem in no essential way. ∂y ∂az dz ∂z ∆ay (z. ∂x ∂ay dy.1. x). z) = respectively of ax . z) dy dz + ∆ay (z. y) = ∂z upon which Φ = ∂ax ∂x + ∂az ∂z ∆x(y. then ∆ax (y.16. ∆az (x.8 If the ﬁeld equivalently if the region in question is are practically constant through it. then some lines might enter and exit the region more than once. ∂x ∂ay ∆ay (z.z) ymax (z. 8 . But each of the last equation’s three integrals represents the region’s volume V . x) dz dx ∆z(x.x) ∂ax dx. y. y). or small enough that the derivatives these increases are simply ∂ax ∆x(y. ∂y ∂az ∆z(x.or z-directed line. x) = ymin (z. FIELDS AND THEIR DERIVATIVES 453 The outward ﬂux Φ of a vector ﬁeld a(r) through a closed surface bounding some deﬁnite region in space is evidently Φ= where xmax (y.y) ∆az (x. z) = xmin (y.y) represent the increase across the region ˆ ˆ ˆ an x-. z) dy dz + ∂ay ∂y ∆y(z. ∆ax (y. z). ay or az along has constant derivatives ∂a/∂i. y) = zmin (x. if the region’s boundary happens to be concave. x) dz dx + ∆az (x. so ∂ax ∂ay ∂az + + Φ = (V ) .

10) (16. VECTOR CALCULUS or. or .454 CHAPTER 16. ˆ Let [ˆ v n] be an orthogonal basis with n normal to the contour’s plane uˆ ˆ ˆ ˆ such that travel positively along the contour tends from u toward v rather than the reverse. 16. Φ ∂ax ∂ay ∂az ∂ai = + + = = ∇ · a(r).or v-directed line. dividing through by the volume. It needs ﬁrst the concept of circulation as follows. but it suits our purpose here to consider speciﬁcally a closed contour that happens not to depart from a single. The circulation Γ of a vector ﬁeld a(r) about this contour is evidently Γ= where umax (v) ∆av (v) dv − ∆au (u) du. ∆av (v) = umin (v) vmax (u) ∂av du. If the ﬁeld has constant derivatives ∂a/∂i. ∂u ∂au dv ∂v ∆au (u) = vmin (u) represent the increase across the contour’s interior respectively of av or au ˆ ˆ along a u.7) which represented a double integration over a surface.5 Curl Curl is to divergence as the cross product is to the dot product. (16. divergence is invariant under reorientation of axes.9) the name divergence. the here represents only a single integration. V ∂x ∂y ∂z ∂i We give this ratio of outward ﬂux to volume. ∇ · a(r) = ∂ai . unlike the S of (16. One can in general contemplate circulation about any closed contour. ﬂat plane in space. Curl is a little trickier to visualize. though. Formally a dot product.11) where. ∂i (16.1. representing the intensity of a source or sink. The circulation of a vector ﬁeld a(r) about a closed contour in space is Γ≡ a(r) · dℓ.

14) we call curl.13) and although we have developed directional curl with respect to a contour in some speciﬁed plane. = ∂j ∂u ∂v (16. representing the intensity of circulation. ˆ ˆ n · [∇ × a(r)] = n · ǫijkˆ ı ∂au ∂ak ∂av − . Γ = (A) ∂u ∂v or. ∇ × a(r) = ǫijkˆ ı ∂ak . Although it emerges from directional curl (16. Curl (16. Γ A = ∂av ∂au − ∂u ∂v ∂ak ˆ =n· ∂j ˆ ˆ ˆ u v n ∂/∂u ∂/∂v ∂/∂n au av an (16. But each of the last equation’s two integrals represents the area A within the contour. about a speciﬁed axis. ∂j (16. We might have choˆ sen another plane and though n would then be diﬀerent the same (16. The cross product in (16. is a property of the ﬁeld .14) would necessarily result. curl (16. so ∂au ∂av − .13) the name directional curl.13). the degree of twist so to speak.16. dividing through by the area. a scalar.1. ∂u ∂au ∆au (u) = ∆v(u). FIELDS AND THEIR DERIVATIVES 455 equivalently if the contour in question is short enough that the derivatives are practically constant over it.12) ˆ = n · ǫijkˆ ı ˆ = n · [∇ × a(r)].14) is an interesting quantity. ∂v ∆av (v) = upon which Γ= ∂av ∂u ∆u(v) dv − ∂au ∂v ∆v(u) du. Directional curl. then these increases are simply ∂av ∆u(v).14) itself turns out to be altogether independent of any particular plane. We give this ratio of circulation to area.

1. Curl. compute rates with reference to some direction b. Note however that directional curl so deﬁned is not a distinct kind of derivative but rather is just curl.4) directional derivatives themselves and also including directional curl (16. .13) has deﬁned it. VECTOR CALCULUS and the plane. Formally a cross product. 16.4) here too any reference vector b or ˆ vector ﬁeld b(r) can serve to direct the curl. As in (16.13) to motivate and develop the concept (16. so it may be said that curl is the locally greatest directional curl. curl is invariant under reorientation of axes. Unlike the vector directional derivative (16. dot-multiplied by a reference vector.4). ˆ We have needed n and (16. Directional curl evidently cannot exceed curl in magnitude. not only n. however. oriented normally to the locally greatest directional curl’s plane. the concept of curl stands on its own. ∂k ∂ai ∂aj − ∂j ∂i (16. Hence. b · [∇ × a(r)] = b · ǫijkˆ ı ∂ak ∂ak .14) of curl.15). 9 The author is unaware of a conventional name for these derivatives.15) This would be the actual deﬁnition of directional curl. An ordinary dot product.6 Cross-directional derivatives The several directional derivatives of the b · ∇ class. The name crossdirectional seems as apt as any. that of the crossdirectional derivatives. unexpectedly is a property of the ﬁeld only. including the scalar (16. The cross-directional derivatives are b × ∇ψ = ǫijkˆbj ı b × ∇ × a = ˆbi ∂ψ . Another class of directional derivatives however is possible. = ǫijk bi ∂j ∂j (16. a vector. but will equal ˆ it when n points in its direction. whereupon one can return to deﬁne directional curl more generally than (16.9 These compute rates perpendicularly to b.5) and vector (16. the cross-directional derivatives are not actually new derivatives but are cross products of b with derivatives already familiar to us.456 CHAPTER 16. Once developed. directional curl is likewise invariant.16) .

2. According to the deﬁnition (16.16. Even a single volume element however can have two distinct kinds of surface area: inner surface area shared with another element. overall volume. more general volumes as well. Interior elements naturally have only the former kind but .1 The divergence theorem Section 16. the latter derived as b × ∇ × a = b × ǫijkˆ ı ∂ak ∂j ∂ak ∂j ˆ = ǫmni ǫijk mbn i ˆ = ǫmni mbn ǫijkˆ ı ∂ak ∂j ∂ak ∂j ∂ai ∂aj ∂ak ˆ ∂ak − kbj = ˆbi − ˆbi = ˆbk ∂j ∂j ∂j ∂i ˆ = (δmj δnk − δmk δnj )mbn where the Levi-Civita identity that ǫmni ǫijk = ǫimn ǫijk = δmj δnk − δmk δnj comes from Table 15.7). 16. the ﬂux from any volume is Φ= S a · ds.2.1. If one subdivides a large volume into inﬁnitesimal volume elements dv. then the ﬂux from a single volume element is Φelement = Selement a · ds.1. INTEGRAL FORMS 457 We call these respectively the cross-directional derivative (itself) and crossdirectional curl. The two subsections that follow explain. 16. which we are now prepared to treat.4 has contemplated the ﬂux of a vector ﬁeld a(r) from a volume small enough that the divergence ∇ ·a were practically constant through the volume. This shift comes in two kinds. One would like to treat the ﬂux from larger.2 Integral forms The vector ﬁeld’s distinctive maneuver is the shift between integral forms. and outer surface area shared with no other element because it belongs to the surface of the larger.

eqn.11 [70. a · ds + elements Souter a · ds.458 CHAPTER 16. which surface after all consists of nothing other than the several boundary elements’ outer surface patches. so Φelement = elements elements Souter a · ds = S a · ds.17) Equation (16. (16. 1. In this equation.17) is the divergence theorem. consequently.17) swaps the one integral for the other.10 The divergence theorem. elements S That is. Applying (16.9) to the equation’s left side to express the ﬂux Φelement from a single volume element yields ∇ · a dv = a · ds. where Selement = ments together. The integral on the equation’s left and the one on its right each arise in vector analysis more often than one might expect. neatly relates the divergence within a volume to the ﬂux from it. such that the one volume element’s ds on the surface the two elements share points oppositely to the other volume element’s ds on the same surface). V ∇ · a dv = S a · ds.2. but the inner sum is null because it includes each interior surface twice. the vector’s version of the fundamental theorem of calculus (7. because each interior surface is shared by two elements such that ds2 = −ds1 (in other words. VECTOR CALCULUS boundary elements have both kinds of surface area. so one can elaborate the last equation to read Φelement = Sinner a · ds + Sinner Souter a · ds Adding all the ele- for a single element. which would seem 11 10 . often a proﬁtable maneuver. overall volume. the ﬁeld’s divergence can be inﬁnite. the associated ﬁeld can be discontinuous and. When they do. (16.2). It is an important result.8] Where a wave propagates through a material interface. the last integration is over the surface of the larger. we have that Φelement = elements elements Sinner + Souter .

18) S Equation (16. reasoning parallel to that of § 16.7.18) serves to swap one vector integral for another where such a maneuver is needed.3][70. then the divergence theorem necessarily remains valid in the limit ǫ → 0. Sir George Gabriel Stokes evidently is not to be denied his fame! 14 [6.11) the circulation about the entire surface is Γ = a · dℓ and the circulation about any one surface element is Γelement = element a · dℓ.3 Summary of deﬁnitions and identities of vector calculus Table 16.2. (16.2. related theorem for directional curl.9)—concludes that (∇ × a) · ds = a · dℓ.13 neatly relating the directional curl over a (possibly nonplanar) surface to the circulation about it. One can integrate ﬁnitely through either inﬁnity.1 lists useful deﬁnitions and identities of vector calculus. developed as follows. but no one calls it that. From this equation.1 is a second. the last several of to call assumptions underlying (16. is integrable. 1.12) in place of (16.1—only using (16.” then should (16. If an open surface. though inﬁnite. Appendix II. However. the inﬁnite divergence at a material interface is normally integrable in the same way the Dirac delta of § 7. SUMMARY OF DEFINITIONS AND IDENTITIES 459 16.12 .4. through which the associated ﬁeld varies steeply but continuously. Like the divergence theorem (16. Appendix A] .20] 13 If (16.14 the ﬁrst several of which it gathers from §§ 16.16.2 Stokes’ theorem Corresponding to the divergence theorem of § 16. If one can conceive of an interface not as a sharp layer of zero thickness but rather as a thin layer of thickness ǫ.2.18) not be “the curl theorem”? Answer: maybe it should be. is subdivided into inﬁnitesimal surface elements ds— each element small enough not only to experience essentially constant curl but also to be regarded as planar—then according to (16.17) into question. 12 [70. 16.18) is Stokes’ theorem.3.17) is “the divergence theorem. Stokes’ theorem (16.1 and 16.2. eqn. Appendix II][31. whether the surface be conﬁned to a plane or be warped in three dimensions (as for example in bowl shape).17).

2. However. VECTOR CALCULUS which (exhibiting heretofore unfamiliar symbols like ∇2 ) it gathers from § 16. ∂i ∂i ∂i ∇(ψω) = ˆ ı The hardest to prove is that ∂ai ∂bi ∂(ai bi ) = ˆbi + ˆai ∂j ∂j ∂j ∂aj ∂bj ∂ai ∂aj ∂bi ∂bj = ˆbi + ˆbi − + ˆai − + ˆai ∂i ∂j ∂i ∂i ∂j ∂i = (b · ∇)a + b × ∇ × a + (a · ∇)b + a × ∇ × b = (b · ∇ + b × ∇ × )a + (a · ∇ + a × ∇ × )b. ∇(a · b) = ∇(ai bi ) = ˆ because to prove it one must recognize in it the cross-directional curl of (16. its action diﬀers from a vector’s in some cases. Among the several product rules the easiest to prove is that ∂ψ ∂ω ∂(ψω) = ωˆ ı + ψˆ ı = ω∇ψ + ψ∇ω.4 to follow. and there are more distinct ways in which it can act. The product rules resemble the triple products of Table 15.5. a few are statements of the ∇ operator’s distributivity over summation. b · ∇ = ∇ · b. since ∇ is a diﬀerential operator for which.2). Of the identities in the middle of the table. for instance.16). only with the ∇ operator in place of the vector c. Also nontrivial to prove is that ∇ × (a × b) = ∇ × (ǫijkˆaj bk ) ı ∂(ǫijkˆaj bk )i ı ∂(aj bk ) ˆ ˆ = ǫmni m = ǫmni ǫijk m ∂n ∂n ∂(aj bk ) ˆ = (δmj δnk − δmk δnj )m ∂n ∂(aj bk ) ˆ ∂(aj bk ) ∂(aj bi ) ∂(ai bj ) = ˆ −k =ˆ −ˆ ∂k ∂j ∂i ∂i ∂aj ∂bj ∂bi ∂ai ˆbi + ˆaj + ˆbj = − ˆai ∂i ∂i ∂i ∂i = (b · ∇ + ∇ · b)a − (a · ∇ + ∇ · a)b. The rest are vector derivative product rules (§ 4. .460 CHAPTER 16.

∂ ∂i ∂ψ ∇ψ = ˆ ı ∂i ∂ai ∇·a = ∂i ∇ ≡ ˆ ı ∂ak ∂j ∂ψ b × ∇ψ = ǫijkˆbj ı ∂k ∇ × a = ǫijkˆ ı Φ ≡ Γ ≡ S b · ∇ = bi b · ∇ψ = (b · ∇)a = b·∇×a = b×∇×a = V ∂ ∂i ∂ψ bi ∂i ∂aj ∂a bi = ˆbi ∂i ∂i ∂ak ǫijk bi ∂j ∂ai ∂aj ˆbi − ∂j ∂i S a · ds a · dℓ S ∇ · a dv = a · ds a · dℓ C (∇ × a) · ds = ∇ × (a + b) = ∇ × a + ∇ × b ∇(ψ + ω) = ∇ψ + ∇ω ∇(ψω) = ω∇ψ + ψ∇ω ∇ · (a + b) = ∇ · a + ∇ · b ∇ × (ψa) = ψ∇ × a − a × ∇ψ ∇ · (ψa) = a · ∇ψ + ψ∇ · a ∇ × (a × b) = (b · ∇ + ∇ · b)a − (a · ∇ + ∇ · a)b ∂ 2 ai ∂2 ∇∇ · a = ˆ ∇2 ≡ ∂i2 ∂j ∂i ∂ 2 aj ∂2ψ ∂2a ∇2 ψ = ∇ · ∇ψ = 2 ∇2 a = = ˆ 2 = ˆ∇2 (ˆ · a) ∂i ∂i2 ∂i ∇ × ∇ψ = 0 ∇·∇×a = 0 ∂ ∂ai ∂aj ∇×∇×a = ˆ − ∂i ∂j ∂i ∇∇ · a = ∇2 a + ∇ × ∇ × a ∇ · (a × b) = b · ∇ × a − a · ∇ × b ∇(a · b) = (b · ∇ + b × ∇ × )a + (a · ∇ + a × ∇ × )b .3. SUMMARY OF DEFINITIONS AND IDENTITIES 461 Table 16.1: Deﬁnitions and identities of vector calculus (see also Table 15.16.2 on page 433).

except that this book is not actually an instructional textbook.20) And probably should have been left as exercises.4 The Laplacian and other second-order derivatives Table 16. That the deﬁnitions and identities at the top of the table are invariant. second-order vector derivatives too come in several kinds. 16. we have already seen. ∂i2 ∂ 2 aj ∂2a ∇2 a = 2 = ˆ 2 = ˆ∇2 (ˆ · a).4. will give invariance to the deﬁnitions and identities at the bottom. Inasmuch as none of the derivatives or products within the table’s several product rules vary under rotation of axes. the product rules are themselves invariant. ∂i ∂i Other second-order vector derivatives include ∂ 2 ai .462 CHAPTER 16. ∂i ∂i ∂i ∂(ψak ) ∂ak ∂ψ ∇ × (ψa) = ǫijkˆ ı = ǫijkˆψ + ǫijkˆak ı ı ∂j ∂j ∂j = ψ∇ × a − a × ∇ψ. and § 16. . The whole table therefore is invariant under rotation of axes. the alternate symbol ∆ replaces ∇2 in some books. ∇∇ · a = ˆ ∂j ∂i ∂ ∂ai ∂aj ∇×∇×a =ˆ − . ∂i ∂j ∂i ∇2 ψ = ∇ · ∇ψ = 15 (16.19) ∂2ψ . ∂(ǫijk aj bk ) ∂aj ∂bk = ǫijk bk + ǫijk aj ∇ · (a × b) = ∂i ∂i ∂i = b · ∇ × a − a · ∇ × b. the simplest of which is the Laplacian 16 ∇2 ≡ ∂2 . next. Like vector products and ﬁrst-order vector derivatives. VECTOR CALCULUS The others are less hard:15 ∂(ψai ) ∂ψ ∂ai ∇ · (ψa) = = ai +ψ = a · ∇ψ + ψ∇ · a.1 ends with second-order vector derivatives. 16 Though seldom seen in applied usage in the author’s country. ∂i2 (16. The reader who wants exercises might hide the page from sight and work the three identities out with his own pencil.

source-free or (prosaically) divergenceless ﬁeld. who had earned Ph.22) ∇ · ∇ × a = 0. b2 = 0. This is unexpected but is a direct consequence of the deﬁnitions of the gradient.4. unacquainted with one other. curl and divergence: ∂2ψ ∂ψ ˆ = ǫmni m ∂i ∂n ∂i ∂ak ∂ 2 ak ∇ · ∇ × a = ∇ · ǫijkˆ ı = ǫijk ∂j ∂i ∂j ı ∇ × ∇ψ = ∇ × ˆ = 0. but it is not just the one textbook. (16. Even at least one widely distributed textbook expresses this belief. 17 . ∇ × ∇ψ = 0.D. for the writer has heard it verbally from at least two engineers. then the two ﬁelds could diﬀer only by an additive constant. Consider the admittedly conˆ ˆ trived counterexample of b1 = xy + yx. A ﬁeld like ∇×a that does not diverge is called a solenoidal. = 0. The table includes two curious null identities.21) Table 16. there has been a mistaken belief afoot that. In words.16. ET AL. the United States.22) states that gradients do not curl and curl does not diverge.1 summarizes. (16. A ﬁeld like ∇ψ that does not curl is called an irrotational ﬁeld. naming it Helmholtz’s theorem. but the writer remains unconvinced. maybe it is.17 In the writer’s country. = ˆ ∂k ∂j ∂j ∂i ∂j ∂i Combining the various second-order vector derivatives yields the useful identity that ∇∇ · a = ∇2 a + ∇ × ∇ × a. (16. if two ﬁelds b1 (r) and b2 (r) had everywhere the same divergence and curl. So the belief must be correct.s in diﬀerent eras in diﬀerent regions of the country. THE LAPLACIAN. mustn’t it? Well. the latter of which is derived as ∇ × ∇ × a = ∇ × ǫijkˆ ı ˆ = ǫmni m ∂ ∂n ∂ak ∂j ǫijkˆ ı ∂ak ∂j ˆ = ǫmni ǫijk m i 463 ∂ 2 ak ∂n ∂j ∂ 2 ak ˆ = (δmj δnk − δmk δnj )m ∂n ∂j ∂ 2 ak ∂ 2 ai ∂ 2 aj ∂ 2 ak ˆ −k 2 =ˆ −ˆ 2 .

were not actually very necessary in practical applications. ∂ℓ ∂ℓ ∂ℓ ∂ψ ∂a ∂ (ψa) = a +ψ . several of which Table 16.) Corrections by readers are invited. inasmuch as each of those ﬁrst-order vector derivatives is invariant under reorientation of axes.464 CHAPTER 16. What the writer really believes however is that Hermann von Helmholtz probably originally had put some appropriate restrictions on b1 and b2 which. one can treat the scalar ﬁelds ψ(r) and ω(r) as ordinary scalar functions ψ(ℓ) and On an applied level. As a hypothesis. ∂ℓ ∂ℓ ∂ℓ ∂a ∂b ∂ (a × b) = −b × +a× .23) is attractive. one supposes that18 ∂ψ ∂ω ∂ (ψω) = ω +ψ . too.23) represents. Therefore. too? That is. VECTOR CALCULUS Each of this section’s second-order vector derivatives—including the vector Laplacian ∇2 a.25) gives the derivative product rule for functions of a scalar variable.23) where ℓ is the distance along some arbitrary contour in space. though interesting. (Some believe the theorem necessary to establish a “gauge” in a wave equation. if obeyed. in the restricted case (16. 16. See below. ∂ℓ ∂ℓ ∂ℓ (16.5 Contour derivative product rules Equation (4.1. but if they examine the use of their gauges closely then they will likely discover that one does not logically actually need to invoke the theorem to use the gauges. the narrative will reverse the order to kill the sign.23)’s last line is an artifact of ordering the line’s factors in the style of Table 16. . (16. the writer knows of literally no other false theorem so widely believed to be true.1 lists. Fields—that is. if you think about it in the right way.23) is true is clear. That a transcription error would go undetected so many years would tend to suggest that Helmholtz’s theorem. The table’s product rules however are general product rules that regard full spatial derivatives. which leads the writer to suspect that he himself had somehow erred in judging the theorem false. because.21)—is or can be composed of ﬁrst-order vector derivatives already familiar to us from § 16. What about derivatives along an arbitrary contour? Do they obey product rules. each second-order vector derivative is likewise invariant. But is it true? That the ﬁrst line of (16. Before proving the line.1. ∂ℓ ∂ℓ ∂ℓ ∂a ∂b ∂ (a · b) = b · +a· . according to (16. made his theorem true but which at some time after his death got lost in transcription. functions of a vector variable—obey product rules. 18 The − sign in (16.

23) never evaluates ψ(r) or ω(r) but along the contour. .6. k) = (x. which completes the proof of (16.3) of which is ǫijkˆ ı ∂aj ∂bk ∂ (aj bk ) = ǫijkˆ bk + a j ı .19 or indeed of coordinates in any three-dimensional system. Refer to Table 3. The truth of (16. The same naturally goes for the vector ﬁelds a(r) and b(r). ∂ℓ ∂ℓ ∂ℓ each of which. cylindrical and spherical geometries normally recommend cylindrical and spherical coordinate systems. However. The reason a cylindrical or spherical system makes some of the table’s identities hard to use is that some of the table’s identities involve derivatives 19 For example. ∂ℓ ∂ℓ ∂ℓ the Levi-Civita form (§ 15. for (i.4. too. and so forth. j. z. θ. too. since one can write the second line in the form ∂ ∂ψ ∂ai ˆ ı (ψai ) = ˆ ai ı +ψ ∂ℓ ∂ℓ ∂ℓ and the third line in the form ∂ ∂ai ∂bi (ai bi ) = bi + ai . y). whereupon (4. for i = y and for i = z. is true separately for i = x.6 Metric coeﬃcients A scalar ﬁeld ψ(r) is the same ﬁeld whether expressed as a function ψ(x. Nevertheless.23)’s last line is slightly less obvious. φ. one can reorder factors to write the line as ∂a ∂b ∂ (a × b) = ×b+a× . so (16. ψ(ρ. which one can treat as vector functions a(ℓ) and b(ℓ) of the scalar distance ℓ. systems which make some of the identities of Table 16. so the second and third lines of (16. 16. y. METRIC COEFFICIENTS 465 ω(ℓ) of the scalar distance ℓ along the contour.23)’s last line as a whole is true.4. ∂ℓ ∂ℓ ∂ℓ The Levi-Civita form is true separately for (i. φ) of spherical coordinates.23) are true. y. according to the ﬁrst line. z) of cylindrical coordinates or ψ(r. the ﬁeld ψ = x2 + y 2 in rectangular coordinates is ψ = ρ2 in cylindrical coordinates.23). z). z) of rectangular coordinates.16. k) = (x.25) applies— for (16.1 hard to use. j.

2: The metric coeﬃcients of the rectangular.21 or even something exotic like [13. when special coordinate systems like the parabolic systems of § 15. hx = 1 hy = 1 hz = 1 CYL. r. hr = 1 hθ = r hφ = r sin θ d/di. (z. cylindrical and spherical coordinate systems. θ. y ′ and z ′ represent lengths. θ x ). since ρ = r sin θ per Table 3. x. φ. right-handed. hρ = 1 hφ = ρ hz = 1 SPHER. θx ) come into play. θ.” to delimit coordinate triplets. Such factors are called metric coeﬃcients and Table 16.2 lists them. y ′ . notation which per § 15. γ)—whether the symbols (α. z).4. y.4. whereby a semicolon precedes an angle (or. the metric coeﬃcient hφ seems to have two diﬀerent values. φ. Because one cannot use an angle as though it were a length. (r.4. § 2-4] The book’s admittedly clumsy usage of semicolons “. more generally.6. z). to convert coordinates other than lengths to lengths). which the table now formalizes. z ′ ).2 stands for d/dx′ .” and commas “. But among the cylindrical and spherical coordinates are θ and φ. x). (φx . (x′ . γ) stand for (x. angles rather than lengths. The two are really the same value.1. 16. β. it would make more sense to write in the 21 20 .466 CHAPTER 16. For example. VECTOR CALCULUS Table 16. thus. We therefore want factors to convert the angles in question to lengths (or. maybe. z) and (r. RECT. though.7 come into play. (ρ. r. serves well enough to distinguish the three principal coordinate systems (x. y). one cannot use the table in cylindrical or spherical coordinates as the table stands. areas and volumes In any orthogonal. three-dimensional coordinate system (α. z). In the table. y. z.20 The use of the table is this: that for any metric coeﬃcient hα a change dα in its coordinate α sweeps out a length hα dα. precedes a generic coordinate like α that could stand for an angle). the notation d/di cannot stand for d/dθ or d/dφ and. in cylindrical coordinates hφ = ρ according to table. etc. (y.. φ). Logically. incidentally. (ρ. one value in cylindrical coordinates and another in spherical. d/dy ′ or d/dz ′ where the coordinates x′ . β.1 Displacements. so a change dφ in the azimuthal coordinate φ sweeps out a length ρ dφ—a fact we have already informally observed as far back as § 7. φ) visually from one another but ceases to help much when further coordinate systems such as (φx . in this section’s case.

a vector ﬁeld a(r) too is the same ﬁeld whether expressed as a function a(x. r r Again in any orthogonal.25) cannot meaningfully in three dimensions be expressed as a vector as an area inﬁnitesimal (16. 15’s footnote 31.24) ˆ represents an area inﬁnitesimal normal to α. z).6. See also Ch. z in cylindrical coordinates. since in three dimensions a volume has no orientation.2 The vector ﬁeld and its scalar components Like a scalar ﬁeld ψ(r). β. Notice that § 7. a(ρ. the volume inﬁnitesimal in a spherical geometry is dv = hr hθ hφ dr dθ dφ = r 2 sin θ dr dθ dφ.6. METRIC COEFFICIENTS the parabolic (σ. . ˆ ˆ a(r) = ρaρ (r) + φaφ (r) + ˆaz (r).7—the product ˆ ds = αhβ hγ dβ dγ 467 (16. z) of § 15. θ. but to do so seems overwrought and fortunately no one the author knows of does it in that way.16. For example. right-handed. For example. Naturally however. γ). a length or displacement inﬁnitesimal can indeed be expressed as a vector.24) can. A vector ﬁeld however is the sum of three scalar ﬁelds.25) represents a volume inﬁnitesimal. 16. In rectangular coordinates. as ˆ dℓ = αhα dα. The book adheres to the semicolon convention not for any deep reason but only for lack of a better convention. A volume inﬁnitesimal (16. The delimiters just are not that important. (16.26) Section 16. three-dimensional coordinate system (α.4 has already calculated several integrals involving area and volume inﬁnitesimals of these kinds. the area inﬁnitesimal on a spherical surface of radius r is ds = ˆhθ hφ dθ dφ = ˆr 2 sin θ dθ dφ. τ. x. φ) of spherical coordinates. z) of rectangular coordinates. the product dv = hα hβ hγ dα dβ dγ (16.10 will have more to say about vector inﬁnitesimals in nonrectangular coordinates. each scaling an appropriate unit vector. ˆ ˆ a(r) = xax (r) + yay (r) + ˆaz (r). y. or indeed of coordinates in any three-dimensional system. z manner of (. y. φ. z) of cylindrical coordinates or a(r.

Of course there are the variable ˆ ˆ ˆ unit vectors ρ(r). it is true that ρ · φ = 0. this failure is not hard to redress. Unfortunately. aθ (r) and aφ (r) in and of themselves do not diﬀer in nature from ax (r). az (r) to compose the vector ﬁeld a(r) whereas no such constant unit vectors exist to combine the scalar ﬁelds aρ (r). Whereas the standard Einstein symbols i. ar (r). because constant unit vectors x. can stand for any coordinates. j and k to stand for unspeciﬁed coordinates. ρ(r1 ) · φ(r2 ) = 0. but the practical and philosophical r diﬀerences between these and the constant unit vectors is greater than it ˆ ˆ might seem. One ˆ does tend to use them diﬀerently. The subsection is here because no obviously better spot for it presents itself.2 and 16. The notation relies on symbols like i. The tilde “˜” atop the symbol˜ warns readers that the ı coordinate it represents is not necessarily a length and that. ˆ(r). and Tables 15. az (r). the modiﬁed Einstein symbols ˜. that x · y = 0 is always true.1 use it extensively. ψ(r) or any other scalar ﬁeld.468 and in spherical coordinates. VECTOR CALCULUS ˆ ˆ a(r) = ˆar (r) + θaθ (r) + φaφ (r). ar (r). For instance. ı ı The products h˜˜. because ∂/∂i is taken to represent a derivative speciﬁcally with respect to a length whereas nonrectangular coordinates like θ and φ are not lengths. so long as what is meant ˆ ˆ ˆ ˆ by this is that ρ(r) · φ(r) = 0. (One might ask why such a subsection as this would appear in a section on metric coeﬃcients.) 16. h˜˜ and hkk always represent lengths. ı ˜ ı ˜ . the notation fails in the nonrectangular coordinate systems when derivatives come into play. one must multiply ˜ by an appropriate metric coeﬃcient h˜ (§ 16.7 Nonrectangular notation Section 15. though. ˜ and k .1. ay (r). r The scalar ﬁelds aρ (r).9 to come. CHAPTER 16. However. which this section now ı ˜ introduces. On the other hand. even for coordinates like θ and φ that are not lengths.4 has introduced Einstein’s summation convention. as they do in Table 16. j and k can stand only for lengths. ˆ ˆ y and z exist to combine the scalar ﬁelds ax (r). the Kronecker delta δij and the Levi-Civita epsilon ǫijk together as notation for use in the deﬁnition of vector operations and in the derivation of vector identities. an algebraic error ˆ ˆ fairly easy to commit. θ(r) and φ(r).6). if one wants a length. aθ (r) and aφ (r). but moreover because we shall need the understanding the subsection conveys to apply metric coeﬃcients consistently and correctly in § 16. ay (r). Fortunately.

∂ψ .3 records them. naturally.3.16. as the coordinate system is an orthogonal.4 on page 421. Fig. instead.27) .9. 16. 15. The modiﬁed notation will ﬁnd use in § 16. ∂i but as § 16. one can just write them down. The cylindrical metric coeﬃcients of Table 16.3 and 15. In fact. as for example ˆ ∂ ∂ρ ˆ ˆ ˆ = (ˆ cos φ + y sin φ) = −ˆ sin φ + y cos φ = +φ. right-handed coordinate system as are all the coordinate systems in this book.8. and Fig. one can compute the table’s derivatives symbolically. 16. 16. looking at Fig.1 on page 410.1 Derivatives in cylindrical coordinates ∇ψ = ˆ ı According to Table 16. since ˆ.9. ˆ and k need no modiﬁcation even when modiﬁed symbols ı ˆ like ˜. the result of which is ˆ ∇ψ = ρ ∂ψ ∂ψ ˆ ∂ψ ˆ +φ +z . but in cylindrical and spherical coordinates it is probably easier just to look at the ﬁgures. 15.5 on page 422.9 Derivatives in the nonrectangular systems This section develops vector derivatives in cylindrical and spherical coordinates. ∂ρ ρ ∂φ ∂z (16. Table 16.4. ˆ and k are taken to represent unit vectors ı ˜ ı ˆ a proper orthogonal basis irrespective of the coordinate system— and [ˆ ˆ k]. ı so long. ˜ and k are in use. x x ∂φ ∂φ Such an approach prospers in special coordinate systems like the parabolic systems of Tables 15.8 Derivatives of the basis vectors The derivatives of the various unit basis vectors with respect to the several coordinates of their respective coordinate systems are not hard to compute.1.2 make the necessary conversion. DERIVATIVES OF THE BASIS VECTORS 469 ˆ The symbols ˆ. 15.6 has observed Einstein’s symbol i must stand for a length not an angle. whereas one of the three cylindrical coordinates—the azimuth φ—is an angle. Naturally.

3: Derivatives of the basis vectors. RECTANGULAR ˆ ∂x =0 ∂x ˆ ∂y =0 ∂x ∂ˆ z =0 ∂x ˆ ∂x =0 ∂y ˆ ∂y =0 ∂y ∂ˆ z =0 ∂y ˆ ∂x =0 ∂z ˆ ∂y =0 ∂z ∂ˆ z =0 ∂z CYLINDRICAL ˆ ∂ρ =0 ∂ρ ˆ ∂φ =0 ∂ρ ∂ˆ z =0 ∂ρ ˆ ∂ρ ˆ = +φ ∂φ ˆ ∂φ = −ˆ ρ ∂φ ∂ˆ z =0 ∂φ SPHERICAL ∂ˆ r =0 ∂r ˆ ∂θ =0 ∂r ˆ ∂φ =0 ∂r ∂ˆ r ˆ = +θ ∂θ ˆ ∂θ = −ˆ r ∂θ ˆ ∂φ =0 ∂θ ∂ˆ r ˆ = +φ sin θ ∂φ ˆ ∂θ ˆ = +φ cos θ ∂φ ˆ ∂φ ˆ = −ˆ = −ˆ sin θ − θ cos θ ρ r ∂φ ˆ ∂ρ =0 ∂z ˆ ∂φ =0 ∂z ∂ˆ z =0 ∂z . VECTOR CALCULUS Table 16.470 CHAPTER 16.

∂z ∂z ∂z +ˆ z ∂az ∂φ (16.23) yields (3)(3)(2) = 0x12 (eighteen) terms in the result. xˆz According to Table 16. If this is done. ∂ρ ρ ∂φ ∂z (16. we have that (b · ∇)a = bρ ∂a ∂a ∂a + bφ + bz . Some of the 0x12 terms turn out to be null. (b · ∇)a = bρ ∂ ∂ ∂ + bφ + bz ∂ρ ρ ∂φ ∂z ˆ ˆ ρaρ + φaφ + ˆaz . Half the 0x12 terms involve derivatives of the basis vectors. which Table 16. then the basis [ˆ o φo ˆ] is ρ ˆ z constant like [ˆ y ˆ] but not awkward like it.1. nothing prevents us from deﬁning a [ˆ φ ˆ ρ ρˆ ˆ ρ ˆ ˆ constant basis [ˆ o φo z] such that [ˆ φ z] = [ˆ o φo z] at the point r = ro at ρ ˆ ˆ which the derivative is evaluated.1. Fortunately.9. each term of two factors. ∇·a= ∂ai ∂i .28) Expanding the vector ﬁeld a in the cylindrical basis. DERIVATIVES IN THE NONRECTANGULAR SYSTEMS Again according to Table 16. z Here are three derivatives of three terms.3 computes. Evaluating the derivatives according to the contour derivative product rule (16. ∂i 471 Applying the cylindrical metric coeﬃcients. (b · ∇)a = bi ∂a .16. The result is that ˆ (b · ∇)a = bρ ρ ∂aρ ˆ ∂aφ ∂az +φ +ˆ z ∂ρ ∂ρ ∂ρ bφ ∂aρ ˆ ∂aφ + aρ ˆ ρ − aφ + φ + ρ ∂φ ∂φ ∂aρ ˆ ∂aφ ∂az ˆ + bz ρ +φ +ˆ z .29) To evaluate divergence and curl wants more care. whereas [ˆ y z] is awkward in a cylindrical geometry and xˆˆ ˆ z] is not constant. It also wants a constant basis to work in.

leading to any number of hard-to-detect false conclusions. . would be a ghastly error. ∂ρ ∂ρ ρ ∂φ ρ ∂φ ∂z ∂z ∂a ˆ ∂a ∂a + φo · +ˆ· z . so ρ ˆ ˆ ∇ · a = ρo · That is.472 CHAPTER 16.28).23). ∂ρ ρ ∂φ ∂aφ ∂az ∂aρ + + . Refer to § 16. ρ ∂ρ ρ ∂φ ∂z (16. ∂ρ ρ ∂φ ∂z But [ˆ o φo z] are constant unit vectors. ∇·a= ∂ρ ρ ρ ∂φ ∂z or. ∂ρ ρ ∂φ ∂z Applying the contour derivative product rule (16. As above.30) ˆ ∂(ˆ · a) ∂(φo · a) z z ρ ˆ ∂(ˆ o · a) − ∂(ˆ · a) − + φo ρ ∂φ ∂z ∂z ∂ρ ˆ ∂(φo · a) ∂(ˆ o · a) ρ − . that ∇·a= Again according to Table 16. ˆ ∇ · a = ρo · ˆ ˆ ∂a ∂ ρo ∂a ∂ φo ∂a ∂ˆ z ˆ + · a + φo · + ·a+ˆ· z + · a. expressed more cleverly in light of (4.1. this time most of the terms turn out to be null. ∂ρ ρ ∂φ ∂z +ˆ z 22 Mistakenly to write here that ∇·a = which is not true. ∇·a= ˆ ρ· ∂ ∂ ∂ ˆ z +φ· +ˆ· ∂ρ ρ ∂φ ∂z ˆ ˆ z ρaρ + φaφ + ˆaz . Fortunately.2. ∂ρ ρ ∂φ ∂z Expanding the ﬁeld in the cylindrical basis. ˆ ∇·a=ρ· ∂a ∂a ˆ ∂a ˆ +φ· +z· . The result is that ∂aφ ∂aρ aρ ∂az + + + . here again the expansion yields 0x12 terms. ∇ × a = ǫijkˆ ı ˆ = ρo ∂ak ∂j ∂aφ ∂(ρaρ ) ∂az + + . VECTOR CALCULUS In cylindrical coordinates and the [ˆ o φo z] basis. this is22 ρ ˆ ˆ ∇·a= ˆ ∂(ˆ o · a) ∂(φo · a) ∂(ˆ · a) ρ z + + .6.

for the author regrettably knows of no valid shortcut—the clumsy alternative. because Table 16. ˆ − +φ ρ ∂φ ∂z ∂z ∂ρ ∂ρ ρ ρ ∂φ or.31) − +φ ρ ∂φ ∂z ∂z ∂ρ ρ ∂ρ ∂φ Table 16. DERIVATIVES IN THE NONRECTANGULAR SYSTEMS That is. 23 ˆ A concrete example: if ψ(r) = eiφ /ρ.23 To calculate the vector Laplacian ∇2 a in cylindrical coordinates is tedious but nonetheless can with care be done accurately by means of Table 16. ρ whereupon ∇2 ψ = ∇ · ˆ −ˆ + iφ ρ eiφ ρ2 ˆ = −ˆ + iφ · ∇ ρ eiφ ρ2 + eiφ ˆ ∇ · −ˆ + iφ . . (16.1’s identity that ∇2 a = ∇∇ · a − ∇ × ∇ × a. The result is that ˆ ∇×a=ρ ∂aφ ∂az ˆ ∂aρ − ∂az + z ∂aφ + aφ − ∂aρ . ρ ρ2 To ﬁnish the example is left as an exercise.4. ∂ρ ρ ∂φ Here the expansion yields 0x24 terms. ∇×a = ˆ z ρ ˆ· ∂ ˆ ∂ +φ ρ· ∂ −z· ∂ ˆ ˆ ˆ −φ· ρ ∂φ ∂z ∂z ∂ρ ∂ ∂ ˆ ˆ ˆ ˆ ˆ ˆ +z φ· −ρ· ρaρ + φaφ + zaz . ˆ z ∇×a = ρ ˆ· ∂a ˆ ∂a + φ ρ · ∂a − z · ∂a ˆ ˆ ˆ −φ· ρ ∂φ ∂z ∂z ∂ρ ∂a ∂a ˆ −ρ· .1 gives the scalar Laplacian as ∇2 ψ = ∇ · ∇ψ. For example.4 summarizes.16.9. one can calculate ∇2 ψ in cylindrical coordinates by taking the divergence of ψ’s gradient. that ˆ ∇×a=ρ ∂aφ ˆ ∂az ˆ ∂aρ − ∂az + z ∂(ρaφ ) − ∂aρ . One can compute a second-order vector derivative in cylindrical coordinates as a sequence of two ﬁrst-order cylindrical vector derivatives. but fortunately as last time this time most of the terms again turn out to be null. expressed more cleverly. then ∇ψ = (−ˆ + iφ)eiφ /ρ2 per Table 16. +ˆ φ· z ˆ ∂ρ ρ ∂φ 473 Expanding the ﬁeld in the cylindrical basis. (This means that to calculate the vector Laplacian ∇2 a in cylindrical coordinates takes not just two but actually four ﬁrst-order cylindrical vector derivatives.

less insightful. ∂ψ ˆ ∂ψ ∂ψ ˆ +φ +z ∂ρ ρ ∂φ ∂z ∂a ∂a ∂a bρ + bφ + bz ∂ρ ρ ∂φ ∂z ∂aρ ˆ ∂aφ ∂az ˆ ˆ +φ +z bρ ρ ∂ρ ∂ρ ∂ρ bφ ∂aρ ˆ ∂aφ + aρ + z ∂az ˆ ˆ − aφ + φ ρ + ρ ∂φ ∂φ ∂φ ∂aρ ˆ ∂aφ ∂az ˆ ˆ + bz ρ +φ +z ∂z ∂z ∂z ∂aφ ∂az ∂(ρaρ ) + + ρ ∂ρ ρ ∂φ ∂z ∂aφ ˆ ∂az ˆ ∂aρ − ∂az + z ∂(ρaφ ) − ∂aρ ˆ − ρ +φ ρ ∂φ ∂z ∂z ∂ρ ρ ∂ρ ∂φ ˆ ∇ψ = ρ (b · ∇)a = = ∇·a = ∇×a = less proper.2 Derivatives in spherical coordinates One can compute vector derivatives in spherical coordinates as in cylindrical coordinates (§ 16.) 16.2. ∂i Applying the spherical metric coeﬃcients of Table 16. ∂r r ∂θ (r sin θ) ∂φ (16.4: Vector derivatives in cylindrical coordinates. being to take the Laplacian in rectangular coordinates and then to convert back to the cylindrical domain.9.1).9.1.474 CHAPTER 16. ∂i . for to work cylindrical problems directly in cylindrical coordinates is almost always advisable. even more tedious and not recommended. According to Table 16. only the spherical details though not essentially more complicated are messier.1.32) Again according to Table 16. ∇ψ = ˆ ı ∂ψ . VECTOR CALCULUS Table 16. we have that ∇ψ = ˆ r ∂ψ ∂ψ ˆ ∂ψ ˆ +θ +φ . (b · ∇)a = bi ∂a .

reasoning as in § 16. ∂r r r ∂θ r tan θ (r sin θ) ∂φ ∂aφ 1 ∂(r 2 ar ) ∂(aθ sin θ) + + . reasoning as in § 16. we have that (b · ∇)a = br ∂a ∂a ∂a + bθ + bφ . that ∇·a= (16.33) Expanding the vector ﬁeld a in the spherical basis. r Evaluating the derivatives. the result is that ∇·a= ∂aφ 2ar ∂aθ aθ ∂ar + + + + .16.1.9. ∇·a= ˆ· r ∂ ∂ ˆ ∂ +φ· ˆ +θ· ∂r r ∂θ (r sin θ) ∂φ ˆ ˆ ˆar + θaθ + φaφ .34) +φ ∂φ ∂ai ∂a ˆ ∂a ∂a ˆ =ˆ· r +θ· +φ· . (b · ∇)a = br ∂ ∂ ∂ + bθ + bφ ∂r r ∂θ (r sin θ) ∂φ ˆ ˆ ˆar + θaθ + φaφ .35) Again according to Table 16. ∂r r ∂θ (r sin θ) ∂φ 475 (16.1. ∇ × a = ǫijkˆ ı ∂ak ∂j ∂a ∂a ∂a ˆ ˆ ∂a ˆ r −θ· −φ· = ˆ φ· r ˆ +θ ˆ· r ∂θ (r sin θ) ∂φ (r sin θ) ∂φ ∂r ˆ ˆ ∂a − ˆ · ∂a . DERIVATIVES IN THE NONRECTANGULAR SYSTEMS Applying the cylindrical metric coeﬃcients. expressed more cleverly. r r ∂r (sin θ) ∂θ (sin θ) ∂φ or. ∂i ∂r r ∂θ (r sin θ) ∂φ According to Table 16.9. (16.9.1. (b · ∇)a = br ˆ r ∂ar ˆ ∂aθ ˆ ∂aφ +θ +φ ∂r ∂r ∂r ∂ar bθ ˆ ∂aθ + ar + φ ∂aφ ˆ ˆ + r − aθ + θ r ∂θ ∂θ ∂θ bφ ∂ar ˆ ∂aθ − aφ cos θ ˆ + r − aφ sin θ + θ r sin θ ∂φ ∂φ ˆ ∂aφ + ar sin θ + aθ cos θ . +φ θ· r ∂r r ∂θ .1. r Evaluating the derivatives. ∇·a= Expanding the ﬁeld in the spherical basis.

and where ds = ds(r′ ) is the corresponding surface patch. as the ratio of ﬂux Φ from a vanishing test volume ∆V to the volume itself.7) Φ = S a(r′ ) · ds.3 Finding the derivatives geometrically The method of §§ 16. ∇×a = ∂ ∂ ∂ ˆ ˆ ∂ ˆ r ˆ φ· r ˆ −θ· −φ· +θ ˆ· r ∂θ (r sin θ) ∂φ (r sin θ) ∂φ ∂r ˆ ˆ ∂ −ˆ· ∂ ˆ ˆ ˆar + θaθ + φaφ . (16.9.1 and 16. VECTOR CALCULUS Expanding the ﬁeld in the spherical basis.5 summarizes.2 is general.36) + r ∂r ∂θ Table 16. reliable and correct. the result is that ∇×a = ˆ r ∂aφ aφ ∂aφ aφ ∂aθ ∂ar ˆ + − − − +θ r ∂θ r tan θ (r sin θ) ∂φ (r sin θ) ∂φ ∂r r aθ ∂ar ˆ ∂aθ + − . as a sequence of two ﬁrst-order vector derivatives.476 CHAPTER 16. that ∇×a = ˆ ∂(raφ ) ∂(aφ sin θ) ∂aθ ˆ r ∂ar θ − − + r sin θ ∂θ ∂φ r (sin θ) ∂φ ∂r ˆ ∂(raθ ) ∂ar φ − . . geometrically. Refer to § 16.9. +φ ∂r r r ∂θ or.1. One can compute a second-order vector derivative in spherical coordinates as in cylindrical coordinates.1. expressed more cleverly. where per (16. 16. but there exists an alternate. arguably neater method to derive nonrectangular formulas for most vector derivatives.9. Adapting the notation to this subsection’s purpose we can write (16. ∆V (16.4. where r′ is a position on the test volume’s surface.37) ∆V →0 thus deﬁning a vector’s divergence fundamentally as in § 16.9.9) as ∇ · a(r) ≡ lim Φ . +φ θ· r r ∂r r ∂θ Evaluating the derivatives.

5: Vector derivatives in spherical coordinates. ∂ψ ∂ψ ˆ ∂ψ ˆ +θ +φ ∂r r ∂θ (r sin θ) ∂φ ∂a ∂a ∂a + bθ + bφ br ∂r r ∂θ (r sin θ) ∂φ ∂ar ˆ ∂aθ ˆ ∂aφ +θ +φ br ˆ r ∂r ∂r ∂r bθ ∂ar ˆ ∂aθ + ar + φ ∂aφ ˆ ˆ + r − aθ + θ r ∂θ ∂θ ∂θ bφ ∂ar ˆ ∂aθ − aφ cos θ ˆ + r − aφ sin θ + θ r sin θ ∂φ ∂φ ∂aφ ˆ + ar sin θ + aθ cos θ +φ ∂φ ∂aφ 1 ∂(r 2 ar ) ∂(aθ sin θ) + + r r ∂r (sin θ) ∂θ (sin θ) ∂φ ˆ ∂(raφ ) ∂(aφ sin θ) ∂aθ ˆ ∂ar θ r − − + r sin θ ∂θ ∂φ r (sin θ) ∂φ ∂r ˆ ∂(raθ ) ∂ar φ + − r ∂r ∂θ ∇ψ = ˆ r (b · ∇)a = = ∇·a = ∇×a = .16. DERIVATIVES IN THE NONRECTANGULAR SYSTEMS 477 Table 16.9.

25 More rigorously.24) of the sides through which the ﬁelds pass.1. such as that it remain wholly enclosed within a sphere of vanishing radius. Incidentally.β.γ) = −aα hβ hγ |r′ =r(α−∆α/2. but we will not try for such a level of rigor here. the net A professional mathematician would probably enjoin the volume’s shape to obey certain technical restrictions.) 24 .478 CHAPTER 16. 2 ∆γ . So lengthy a digression however would only formalize what we already knew. τ +∆τ /2 If you will believe that lim∆τ →0 τ −∆τ /2 f (τ ′ ) dτ ′ = f (τ ) ∆τ for any τ in the neighborhood of which f (τ ) is well behaved. β.24 so let us give it six sides and shape it as an almost rectangular box that conforms precisely to the coordinate system (α. of more practical temperament.γ) ∆β ∆γ.γ) = +aα hβ hγ |r′ =r(α+∆α/2. VECTOR CALCULUS So long as the test volume ∆V includes the point r and is otherwise inﬁnitesimal in extent. we remain free to shape the volume as we like.and −α-ward sides will then be25 Φ+α = (+aα )(hβ hγ ∆β ∆γ)|r′ =r(α+∆α/2.β. he might review Ch. products of the outward-directed ﬁeld components and the areas (16.β. that one can approximate to ﬁrst order the integral of a well-behaved quantity over an inﬁnitesimal domain by the quantity’s value at the domain’s midpoint. then you will probably also believe its three-dimensional analog in the narrative. γ) in use: α− ∆α ≤ α′ ≤ α + 2 ∆β β− ≤ β′ ≤ β + 2 ∆γ ≤ γ′ ≤ γ + γ− 2 ∆α . Φ−α = (−aα )(hβ hγ ∆β ∆γ)|r′ =r(α−∆α/2. (If the vagueness in this context of the adjective “well-behaved” deeply troubles any reader then that reader may possess the worthy temperament of a professional mathematician. Consider: if the ﬁeld grows in strength across a single side of the test volume and if the test volume is small enough that second-order eﬀects can be ignored. cylindrical and spherical coordinates and to ponder the matter a while. Other readers. 2 The ﬂuxes outward through the box’s +α. namely. Thence by successive steps. 2 ∆β . then what single value ought one to choose to represent the ﬁeld over the whole side but its value at the side’s midpoint? Such visualization should soon clear up any confusion and is what the writer recommends. are advised to visualize test volumes in rectangular. the contrast between the two modes of thought this footnote reveals is exactly the sort of thing Courant and Hilbert were talking about in § 1. 8 and then seek further illumination in the professional mathematical literature. one might digress from this point to expand the ﬁeld in a threedimensional Taylor series (§ 8.β.γ) ∆β ∆γ.2.16) to account for the ﬁeld’s variation over a single side of the test volume.

. invoking Einstein’s summation convention in § 16. DERIVATIVES IN THE NONRECTANGULAR SYSTEMS ﬂux outward through the pair of opposing sides will be Φα = Φ+α + Φ−α = = = aα hβ hγ |r′ =r(α+∆α/2. The total ﬂux from the test volume then is Φ = Φα + Φβ + Φγ = ∆V h3 ∂ ∂α h3 aα hα + ∂ ∂β h3 aβ hβ + ∂ ∂γ h3 aγ hγ .γ) ∆β ∆γ ∂(aα hβ hγ ) ∂(aα hβ hγ ) ∆α ∆β ∆γ = ∆α ∆β ∆γ ∂α ∂α ∆V ∂(aα hβ hγ ) . (16. Φγ = hα hβ hγ ∂γ The three equations are better written Φα = ∆V ∂ h3 ∂α ∆V ∂ Φβ = 3 h ∂β Φγ = where h3 ≡ hα hβ hγ . hα hβ hγ ∂α 479 Naturally.β. or.16.7’s modiﬁed style. (16. substituting the last equation into (16. . Finally.39) .38) ∆V ∂ h3 ∂γ h3 aα hα h3 aβ hβ h3 aγ hγ . hα hβ hγ ∂β ∆V ∂(aγ hα hβ ) .37).9.γ) − aα hβ hγ |r′ =r(α−∆α/2. Φ= ∆V ∂ h3 ∂˜ ı h3 a˜ ı h˜ ı .β. ∇·a= ∂ 3 ∂˜ h ı h3 a˜ ı h˜ ı . the same goes for the other two pairs of sides: Φβ = ∆V ∂(aβ hγ hα ) .

we have that ˆ γ ·∇×a= 26 hγ ∂(hβ aβ ) ∂(hα aα ) − . as is the ˆ appearance of both γ and Γ. 2 ∆β .and +β-ward edges will be Γ−β = +hα aα |r′ =r(α. VECTOR CALCULUS An analogous formula for curl is not much harder to derive but is harder to approach directly. 2 The circulations along the +α.γ) ∆β.β. The capital and minuscule symbols here represent unrelated quantities. perpendicular to γ .β+∆β/2. h3 ∂α ∂β The appearance of both a and A in (16.40). and likewise the circulations along the −β.11) Γ = γ a(r′ ) · dℓ and the notation γ reminds us that the ˆ contour of integration lies in the α-β plane. ∆α . which we give four edges and an almost rectangular shape that conforms precisely to the coordinate system (α. whence the total circulation about the contour is Γ = = ∂(hβ aβ ) ∂(hα aα ) ∆α ∆β − ∆β ∆α ∂α ∂β hγ ∆A ∂(hβ aβ ) ∂(hα aα ) − . Γ+β = −hα aα |r′ =r(α.β.γ) ∆β.40) is unfortunate but coincidental. Equation (16. ∆A→0 ∆A (16. h3 ∂α ∂β Substituting the last equation into (16.γ) ∆α. so we will approach it by deriving ﬁrst the formula for ˆ γ -directed directional curl.480 CHAPTER 16.40) where per (16. Γ−α = −hβ aβ |r′ =r(α−∆α/2. γ) in use: α− ∆α ≤ α′ ≤ α + 2 ∆β β− ≤ β′ ≤ β + 2 γ ′ = γ.and −α-ward edges will be Γ+α = +hβ aβ |r′ =r(α+∆α/2. . β.γ) ∆α.β−∆β/2. In this case the contour of integration bounds not a test volume but a test surface.12) has it that26 ˆ γ · ∇ × a(r) ≡ lim Γ .

(16.41) and (16.9. in Einstein notation. It is ∇ψ = ˆ ∂ψ ı . h3 hα aα hβ aβ hγ aγ or.39) and (16.41) for divergence and curl. the integral.16. then let us pause a moment to appreciate a few of the many concepts the notation implicitly encapsulates. h˜ ∂˜ ı ı (16. ˆ α·∇×a = ∂(hβ aβ ) hα ∂(hγ aγ ) − . the ﬁeld.39).42) One can generate most of the vector-derivative formulas of Tables 16. rotational invariance. DERIVATIVES IN THE NONRECTANGULAR SYSTEMS Likewise. nonrectangular coordinates. β·∇×a = 3 h ∂γ ∂α 481 But one can split any vector v into locally rectangular components as v = ˆ ˆ ˆ ˆ ˆ γ α(α · v) + β(β · v) + γ (ˆ · v). three-dimensional geometry. There are the vector.4 and 16. One can generate additional vector-derivative formulas for special coordinate systems like the parabolic systems of § 15. 3 h ∂β ∂γ hβ ∂(hα aα ) ∂(hγ aγ ) ˆ − . the dummy variable and so on—each of which concepts itself yet encapsulates several further ideas—not to mention multiplication and division which themselves are not trivial.27 ∇×a= ǫ˜kˆh˜ ∂(hk ak ) ı˜˜ ı ı ˜ ˜ .5 by means of this subsection’s (16.41) Compared to the formulas (16. so ˆ ˆ ˆ ˆ ˆ γ ∇ × a = α(α · ∇ × a) + β(β · ∇ × a) + γ (ˆ · ∇ × a) ˆ βhβ ∂(hα aα ) ∂(hγ aγ ) ˆ αhα ∂(hγ aγ ) ∂(hβ aβ ) = − − + 3 3 h ∂β ∂γ h ∂γ ∂α ˆ γ hγ ∂(hβ aβ ) ∂(hα aα ) + 3 − h ∂α ∂β ˆ β ˆ ˆ αhα βh γ hγ 1 = ∂/∂α ∂/∂β ∂/∂γ . What a marvel mathematical notation is! If you can read (16. the derivative. circulation. the unit vector.41) and understand the message it conveys. It is doubtful that one could explain it even tersely to the uninitiated in fewer than ﬁfty pages. and yet to the initiated one can express it all in half a line.42). h3 ∂˜ (16. the corresponding gradient formula is trivial. parity. 27 .7 by means of the same equations.

This is easy once you see how to do it. The geometrical vector of Chs. One might integrate in any of the forms ψ dℓ C C a dℓ S ψ ds S a ds a · ds a × ds ψ dℓ C C a · dℓ a × dℓ ψ ds S S C S among others. β) can serve. Where over a surface. β). Given such functions and a ﬁeld integral to compute. So. VECTOR CALCULUS 16.44) ˆ of two vectors that lie on the surface. a single function γ(α. 15 and 16 and the matrix of Chs.482 CHAPTER 16. one vector normal to β and the other ˆ to α. Those are the essentials of the three-dimensional geometrical vector—of its analysis and of its calculus. one wants an expression for the integrand’s inﬁnitesimal dℓ or ds in terms respectively of the contour functions α(γ) and β(γ) or of the surface function γ(α. (16.10 Vector inﬁnitesimals To integrate a ﬁeld over a contour or surface is a typical maneuver of vector calculus. The contour inﬁnitesimal is evidently dℓ = ˆ ˆ γ hγ + α hα dα ˆ hβ dβ +β dγ dγ dγ. that’s it. edges not of a rectangular patch of the surface but of a patch whose projection onto the α-β plane is an (hα dα)-by-(hβ dβ) rectangle. but one can nevertheless correctly construct it as the cross product ds = = hγ ∂γ dα × ∂α 1 ∂γ ˆ ∂γ ˆ ˆ γ −α −β hγ hα ∂α hβ ∂β ˆ ˆ αhα + γ hγ ∂γ ˆ ˆ βhβ + γ ∂β h3 dα dβ dβ (16. Where the integration is over a contour. a pair of functions α(γ) and β(γ) typically can serve to specify the contour.43) ˆ consisting of a step in the γ direction plus the corresponding steps in the orˆ ˆ thogonal α and β directions. Harder is the surface inﬁnitesimal. 11 through 14 have in common that they represent well-developed ways of marshaling several quantities together to a common purpose: three quantities in the specialized case of the .

16. one cannot well manage without them. 9 has left oﬀ.10. they are exceedingly signiﬁcant. we will begin from the start of the next chapter to introduce a series of advanced topics that pick up where Ch. In applications. VECTOR INFINITESIMALS 483 geometrical vector. The time nevertheless has come to change the subject. . they have proven unexpectedly interesting. Turning the page. entering ﬁrst upon the broad topic of the Fourier transform. Matrices and vectors vastly expand the domain of physical phenomena a scientist or engineer can model. it must be said. Mathematically. n quantities in the generalized case of the matrix. Matrices and vectors have admittedly not been easy for us to treat but after a slow start.

VECTOR CALCULUS .484 CHAPTER 16.

Part III Transforms and special functions 485 .

.

ℑ(T1 ) = 0. This ﬁrst of the two chapters brings the Fourier transform in its primitive guise as the Fourier series. 1 [40. as the one Lord Kelvin has acclaimed “a great mathematical poem. 8 but meant for repeating waveforms.”1 It is the Fourier transform. Examples include the square wave of Fig. none is so useful. if the waveform is real. which this chapter and the next will develop. functions f (t) of which f (t) = f (t + nT1 ). alternately.Chapter 17 The Fourier series It might fairly be said that. of sinusoids. and few so appealing.1.1: A square wave. f (t) A T1 t 487 . The Fourier series is an analog of the Taylor series of Ch. 17] Figure 17. A Fourier series expands such a repeating waveform as a superposition of complex exponentials or. 17. T1 = 0. (17. for all n ∈ Z.1) where T1 is the waveform’s characteristic period. Ch. among advanced mathematical techniques.

488

CHAPTER 17. THE FOURIER SERIES

Suppose that you wanted to approximate the square wave of Fig. 17.1 by a single sinusoid. You might try the sinusoid at the top of Fig. 17.2—which is not very convincing, maybe, but if you added to the sinusoid another, suitably scaled sinusoid of thrice the frequency then you would obtain the somewhat better ﬁtting curve in the ﬁgure’s middle. The curve at the ﬁgure’s bottom would yet result after you had added in four more sinusoids respectively of ﬁve, seven, nine and eleven times the primary frequency. Algebraically, f (t) = 8A 3(2π)t (2π)t 1 − cos cos 2π T1 3 T1 5(2π)t 1 7(2π)t 1 − cos + ··· . + cos 5 T1 7 T1

(17.2)

How faithfully (17.2) really represents the repeating waveform and why its 1 coeﬃcients happen to be 1, − 3 , 1 , − 1 , . . . are among the questions this chap5 7 ter will try to answer; but, visually at least, it looks as though superimposing sinusoids worked. The chapter begins in preliminaries, starting with a discussion of Parseval’s principle.

17.1

Parseval’s principle

Parseval’s principle is that a step in every direction is no step at all. In the Argand plane (Fig. 2.5), stipulated that ∆ω T1 = 2π, ℑ(∆ω) = 0, ℑ(T1 ) = 0, T1 = 0, and also that2 j, n, N ∈ Z, n = 0, |n| < N, 2 ≤ N, (17.4) ℑ(to ) = 0, (17.3)

2 That 2 ≤ N is a redundant requirement, since (17.4)’s other lines imply it, but it doesn’t hurt to state it anyway.

17.1. PARSEVAL’S PRINCIPLE

489

**Figure 17.2: Superpositions of one, two and six sinusoids to approximate the square wave of Fig. 17.1.
**

f (t)

t

f (t)

t

f (t)

t

490

CHAPTER 17. THE FOURIER SERIES

**the principle is expressed algebraically as that3
**

to +T1 /2 to −T1 /2

ein ∆ω τ dτ = 0

(17.5)

**or alternately in discrete form as that
**

N −1 j=0

ei2πnj/N = 0.

(17.6)

Because the product ∆ω T1 = 2π relates ∆ω to T1 , the symbols ∆ω and T1 together represent in (17.3) and (17.5) not two but only one independent parameter. If T1 bears physical units then these typically will be units of time (seconds, for instance), whereupon ∆ω will bear the corresponding units of angular frequency (such as radians per second). The frame oﬀset to and the dummy variable τ naturally must have the same dimensions4 T1 has and normally will bear the same units. This matter is discussed further in § 17.2. To prove (17.5) symbolically is easy: one merely carries out the indicated integration. To prove (17.6) symbolically is not much harder: one replaces the complex exponential ei2πnj/N by limǫ→0+ e(i−ǫ)2πnj/N and then uses (2.34) to evaluate the summation. Notwithstanding, we can do better, for an alternate, more edifying, physically more insightful explanation of the two equations is possible as follows. Because n is a nonzero integer, (17.5) and (17.6) represent sums of steps in every direction—that is, steps in every phase—in the Argand plane (more precisely, eqn. 17.6 represents a sum over a discrete but balanced, uniformly spaced selection of phases). An appeal to symmetry forbids such sums from favoring any one phase n ∆ω τ or 2πnj/N over any other. This being the case, how could the sums of (17.5) and (17.6) ever come to any totals other than zero? The plain answer is that they can come to no other totals. A step in every direction is indeed no step at all. This is why (17.5) and (17.6) are so.5 We have actually already met Parseval’s principle, informally, in § 9.6.2.

An expression like to ± T1 /2 means to ± (T1 /2), here and elsewhere in the book. The term dimension in this context refers to the kind of physical unit. A quantity like T1 for example, measurable in seconds or years (but not, say, in kilograms or dollars), has dimensions of time. An automobile’s speed having dimensions of length divided by time can be expressed in miles per hour as well as in meters per second but not directly, say, in volts per centimeter; and so on. 5 The writer unfortunately knows of no conventionally established name for Parseval’s principle. The name Parseval’s principle seems as apt as any and this is the name this book will use.

4 3

17.2. TIME, SPACE AND FREQUENCY

491

One can translate Parseval’s principle from the Argand realm to the analogous realm of geometrical vectors, if needed, in the obvious way.

17.2

Time, space and frequency

A frequency is the inverse of an associated period of time, expressing the useful concept of the rate at which a cycle repeats. For example, an internalcombustion engine whose crankshaft revolves once every 20 milliseconds— which is to say, once every 1/3000 of a minute—runs thereby at a frequency of 3000 revolutions per minute (RPM). Frequency however comes in two styles: cyclic frequency (as in the engine’s example), conventionally represented by letters like ν and f ; and angular frequency, by letters like ω and k. If T , ν and ω are letters taken to stand respectively for a period of time, the associated cyclic frequency and the associated angular frequency, then by deﬁnition νT = 1, ωT = 2π, ω = 2πν. The period T will have dimensions of time like seconds. The cyclic frequency ν will have dimensions of inverse time like hertz (cycles per second).6 The angular frequency ω will have dimensions of inverse time like radians per second. The applied mathematician should make himself aware, and thereafter keep in mind, that the cycle per second and the radian per second do not differ dimensionally from one another. Both are technically units of [second]−1 , whereas the words “cycle” and “radian” in the contexts of the phrases “cycle per second” and “radian per second” are verbal cues that, in and of themselves, play no actual part in the mathematics. This is not because the

A pedagogical knot seems to tangle Marc-Antoine Parseval’s various namesakes. Because Parseval’s principle can be extracted as a special case from Parseval’s theorem (eqn. 18.34 in the next chapter), the literature sometimes indiscriminately applies the name “Parseval’s theorem” to both. This is ﬁne as far as it goes, but the knot arrives when one needs Parseval’s principle to derive the Fourier series, which one needs to derive the Fourier transform, which one needs in turn to derive Parseval’s theorem, at least as this book develops them. The way to untie the knot is to give Parseval’s principle its own name and to let it stand as an independent result. 6 Notice incidentally, contrary to the improper verbal usage one sometimes hears, that there is no such thing as a “hert.” Rather, “Hertz” is somebody’s name. The uncapitalized form “hertz” thus is singular as well as plural.

(17.7)

492

CHAPTER 17. THE FOURIER SERIES

cycle and the radian were ephemeral but rather because the second is unfundamental. The second is an arbitrary unit of measure. The cycle and the radian are deﬁnite, discrete, inherently countable things; and, where things are counted, it is ultimately up to the mathematician to interpret the count (consider for instance that nine baseball hats may imply nine baseball players and one baseball team, but that there is nothing in the number nine itself to tell us so). To distinguish angular frequencies from cyclic frequencies, it remains to the mathematician to lend factors of 2π where needed. The word “frequency” without a qualifying adjective is usually taken to mean cyclic frequency unless the surrounding context implies otherwise. Frequencies exist in space as well as in time: kλ = 2π. (17.8)

Here, λ is a wavelength measured in meters or other units of length. The wavenumber 7 k is an angular spatial frequency measured in units like radians per meter. (Oddly, no conventional symbol for cyclic spatial frequency seems to be current. The literature just uses k/2π which, in light of the potential for confusion between ν and ω in the temporal domain, is probably for the best.) Where a wave propagates the propagation speed v= ω λ = T k (17.9)

relates periods and frequencies in space and time. Now, we must admit that we ﬁbbed when we said that T had to have dimensions of time. Physically, that is the usual interpretation, but mathematically T (and T1 , t, to , τ , etc.) can bear any units and indeed are not required to bear units at all, as § 17.1 has observed. The only mathematical requirement is that the product ωT = 2π (or ∆ω T1 = 2π or the like, as appropriate) be dimensionless. However, when T has dimensions of length rather than time it is conventional—indeed, it is practically mandatory if one wishes to be understood—to change λ ← T and k ← ω as this section has done, though the essential Fourier mathematics is the same regardless of T ’s dimensions (if any) or of whether alternate symbols like λ and k are used.

7 The wavenumber k is no integer, notwithstanding that the letter k tends to represent integers in other contexts.

**17.3. THE SQUARE, TRIANGULAR AND GAUSSIAN PULSES Figure 17.3: The square, triangular and Gaussian pulses.
**

Π(t) 1

493

t −1 2 Λ(t) 1

1 2

t −1 Ω(t)

√1 2π

1

t

17.3

The square, triangular and Gaussian pulses

The Dirac delta of § 7.7 and of Fig. 7.10 is useful for the unit area it covers among other reasons, but for some purposes its curve is too sharp. One occasionally ﬁnds it expedient to substitute either the square or the triangular pulse of Fig. 17.3, Π(t) ≡ Λ(t) ≡ 1 0 if |t| ≤ 1/2, otherwise;

1 − |t| if |t| ≤ 1, 0 otherwise;

(17.10)

for the Dirac delta, both of which pulses evidently share Dirac’s property

494 that

∞ −∞ ∞ −∞ ∞ −∞

CHAPTER 17. THE FOURIER SERIES

1 δ T 1 Π T 1 Λ T

τ − to T τ − to T τ − to T

dτ = 1, dτ = 1, dτ = 1, (17.11)

for any real T > 0 and real to . In the limit, 1 Π T 1 Λ lim + T T →0 lim t − to T t − to T = δ(t − to ), = δ(t − to ),

T →0+

(17.12)

constituting at least two possible implementations of the Dirac delta in case such an implementation were needed. Looking ahead, if we may further abuse the Greek capitals to let them represent pulses whose shapes they accidentally resemble, then a third, subtler implementation—more complicated to handle but analytic (§ 8.4) and therefore preferable for some purposes—is the Gaussian pulse lim 1 Ω T t − to T = δ(t − to ), 1 t2 √ exp − 2 2π , (17.13)

T →0+

Ω(t) ≡

the mathematics of which § 18.5 and Chs. 19 and 20 will unfold.

17.4

Expanding repeating waveforms in Fourier series

The Fourier series represents a repeating waveform (17.1) as a superposition of sinusoids. More precisely, inasmuch as Euler’s formula (5.18) makes a sinusoid a sum of two complex exponentials, the Fourier series supposes that a repeating waveform were a superposition f (t) =

∞ j=−∞

aj eij ∆ω t

(17.14)

17.4. EXPANDING WAVEFORMS IN FOURIER SERIES

495

of many complex exponentials, in which (17.3) is obeyed yet in which neither the several Fourier coeﬃcients aj nor the waveform f (t) itself need be real. Whether one can properly represent every repeating waveform as a superposition (17.14) of complex exponentials is a question §§ 17.4.4 and 17.6 will address later; but, at least to the extent to which one can properly represent such a waveform, we will now assert that one can recover any or all of the waveform’s Fourier coeﬃcients aj by choosing an arbitrary frame oﬀset to (to = 0 is a typical choice) and then integrating

to +T1 /2 to −T1 /2

aj =

1 T1

e−ij ∆ω τ f (τ ) dτ.

(17.15)

17.4.1

Derivation of the Fourier-coeﬃcient formula

But why should (17.15) work? How is it to recover a Fourier coeﬃcient aj ? The answer is that it recovers a Fourier coeﬃcient aj by isolating it, and that it isolates it by shifting frequencies and integrating. Equation (17.14) has proposed to express a repeating waveform as a series of complex exponentials, each exponential of the form aj eij ∆ω t in which aj is a weight to be determined. Unfortunately, (17.14) can hardly be very useful until the several aj actually are determined, whereas how to determine aj from (17.14) for a given value of j is not immediately obvious. The trouble with using (17.14) to determine the several coeﬃcients aj is that it includes all the terms of the series and, hence, all the coeﬃcients aj at once. To determine aj for a given value of j, one should like to suppress the entire series except the single element aj eij ∆ω t , isolating this one element for analysis. Fortunately, Parseval’s principle (17.5) gives us a way to do this, as we shall soon see. Now, to prove (17.15) we mean to use (17.15), a seemingly questionable act. Nothing prevents us however from taking only the right side of (17.15)— not as an equation but as a mere expression—and doing some algebra with it to see where the algebra leads, for if the algebra should lead to the left side of (17.15) then we should have proven the equation. Accordingly, changing dummy variables τ ← t and ℓ ← j in (17.14) and then substituting into (17.15)’s right side the resulting expression for f (τ ), we have by suc-

496 cessive steps that8

CHAPTER 17. THE FOURIER SERIES

1 T1 = = = =

to +T1 /2 to −T1 /2

e−ij ∆ω τ f (τ ) dτ e−ij ∆ω τ

to +T1 /2 ∞ ℓ=−∞

1 T1 1 T1 aj T1 aj T1

to +T1 /2

aℓ eiℓ ∆ω τ dτ

to −T1 /2 ∞ ℓ=−∞ to +T1 /2 to −T1 /2

aℓ

to −T1 /2

ei(ℓ−j) ∆ω τ dτ

ei(j−j) ∆ω τ dτ dτ = aj ,

to +T1 /2

to −T1 /2

in which Parseval’s principle (17.5) has killed all but the ℓ = j term in the summation. Thus is (17.15) formally proved. Though the foregoing formally completes the proof, the idea behind the formality remains more interesting than the formality itself, for one would like to know not only the fact that (17.15) is true but also the thought which leads one to propose the equation in the ﬁrst place. The thought is as follows. Assuming that (17.14) indeed can represent the waveform f (t) properly, one observes that the transforming factor e−ij ∆ω τ of (17.15) serves to shift the waveform’s jth component aj eij ∆ω t —whose angular frequency evidently is ω = j ∆ω—down to a frequency of zero, incidentally shifting the waveform’s several other components to various nonzero frequencies as well. Signiﬁcantly, the transforming factor leaves each shifted frequency to be a whole multiple of the waveform’s fundamental frequency ∆ω. By Parseval’s principle, (17.15)’s integral then kills all the thus frequency-shifted components except the zero-shifted one by integrating the components over complete cycles, passing only the zero-shifted component which, once shifted, has no cycle. Such is the thought which has given rise to the equation.

It is unfortunately conventional to footnote steps like these with some formal remarks on convergence and the swapping of summational/integrodiﬀerential operators. Refer to §§ 7.3.4 and 7.3.5.

8

17.4. EXPANDING WAVEFORMS IN FOURIER SERIES

497

17.4.2

The square wave

According to (17.15), the Fourier coeﬃcients of Fig. 17.1’s square wave are, if to = T1 /4 is chosen and by successive steps, aj = = = But e−ij ∆ω τ

τ =−T1 /4

1 T1 A T1

3T1 /4 −T1 /4

e−ij ∆ω τ f (τ ) dτ

3T1 /4

T1 /4

−T1 /4

−

e−ij ∆ω τ dτ

3T1 /4

T1 /4 T1 /4 −T1 /4

iA −ij ∆ω τ e 2πj

−

.

T1 /4

= e−ij ∆ω τ e−ij ∆ω τ

τ =3T1 /4 τ =T1 /4

= ij , = (−i)j ,

so e−ij ∆ω τ

T1 /4 −T1 /4 j 3T1 /4

−

T1 /4

= . . . , −i4, 0, i4, 0, −i4, 0, i4, . . . for j = . . . , −3, −2, −1, 0, 1, 2, 3, . . . Therefore, aj = (−i)j − ij = i2A 2πj for odd j, for even j (17.16)

= [(−i) − i ] − [ij − (−i)j ] = 2[(−i)j − ij ]

j

(−)(j−1)/2 4A/2πj 0

are the square wave’s Fourier coeﬃcients which, when the coeﬃcients are applied to (17.14) and when (5.18) is invoked, indeed yield the speciﬁc series of sinusoids (17.2) and Fig. 17.2 have proposed.

17.4.3

The rectangular pulse train

The square wave of § 17.4.2 is an important, canonical case and (17.2) is arguably worth memorizing. After the square wave however the variety of

498

**CHAPTER 17. THE FOURIER SERIES Figure 17.4: A rectangular pulse train.
**

f (t)

A T1 ηT1 t

possible repeating waveforms has no end. Whenever an unfamiliar repeating waveform arises, one can calculate its Fourier coeﬃcients (17.15) on the spot by the straightforward routine of § 17.4.2. There seems little point therefore in trying to tabulate waveforms here. One variant on the square wave nonetheless is interesting enough to attract special attention. This variant is the pulse train of Fig. 17.4, f (t) = A

∞ j=−∞

Π

t − jT1 ηT1

;

(17.17)

where Π(·) is the square pulse of (17.10); the symbol A represents the pulse’s full height rather than the half-height of Fig. 17.1; and the dimensionless factor 0 ≤ η ≤ 1 is the train’s duty cycle, the fraction of each cycle its pulse is as it were on duty. By the routine of § 17.4.2, aj = = = 1 T1 A T1

T1 /2

e−ij ∆ω τ f (τ ) dτ e−ij ∆ω τ dτ

ηT1 /2

−T1 /2 ηT1 /2 −ηT1 /2

iA −ij ∆ω τ e 2πj

=

−ηT1 /2

2πηj 2A sin 2πj 2

**for j = 0. On the other hand, a0 = 1 T1
**

T1 /2

f (τ ) dτ =

−T1 /2

A T1

ηT1 /2

dτ = ηA

−ηT1 /2

17.4. EXPANDING WAVEFORMS IN FOURIER SERIES is the waveform’s mean value. Altogether for the pulse train, 2A sin 2πηj if j = 0, 2 aj = 2πj ηA if j = 0

499

(17.18)

(though eqn. 17.26 will improve the notation later). An especially interesting special case occurs when the duty cycle grows very short. Since limη→0+ sin(2πηj/2) = 2πηj/2 according to (8.32), it follows from (17.18) that lim aj = ηA, (17.19)

η→0+

the same for every index j. As the duty cycle η tends to vanish the pulse tends to disappear and the Fourier coeﬃcients along with it; but we can compensate for vanishing duty if we wish by increasing the pulse’s amplitude A proportionally, maintaining the product ηT1 A = 1 (17.20)

of the pulse’s width ηT1 and its height A, thus preserving unit area9 under the pulse. In the limit η → 0+ , the pulse then by deﬁnition becomes the Dirac delta of Fig. 7.10, and the pulse train by construction becomes the Dirac delta pulse train of Fig. 17.5. Enforcing (17.20) on (17.19) yields the Dirac delta pulse train’s Fourier coeﬃcients aj = 1 . T1 (17.21)

17.4.4

Linearity and suﬃciency

The Fourier series is evidently linear according to the rules of § 7.3.3. That is, if the Fourier coeﬃcients of f1 (t) are aj1 and the Fourier coeﬃcients of f2 (t) are aj2 , and if the two waveforms f1 (t) and f2 (t) share the same fundamental period T1 , then the Fourier coeﬃcients of f (t) = f1 (t) + f2 (t) are aj = aj1 + aj2 . Likewise, the Fourier coeﬃcients of αf (t) are αaj and

In light of the discussion of time, space and frequency in § 17.2, we should clarify that we do not here mean a physical area measurable in square meters or the like. We merely mean the dimensionless product of the width (probably measured in units of time like seconds) and the height (correspondingly probably measured in units of frequency like inverse seconds) of the rectangle a single pulse encloses in Fig. 17.4. Though it is not a physical area the rectangle one sketches on paper to represent it, as in the ﬁgure, of course does have an area. The word area here is meant in the latter sense.

9

500

**CHAPTER 17. THE FOURIER SERIES Figure 17.5: A Dirac delta pulse train.
**

f (t)

T1

t

the Fourier coeﬃcients of the null waveform fnull (t) ≡ 0 are themselves null, thus satisfying the conditions of linearity. All this however supposes that the Fourier series actually works.10 Though Fig. 17.2 is suggestive, the ﬁgure alone hardly serves to demonstrate that every repeating waveform were representable as a Fourier series. To try to consider every repeating waveform at once would be too much to try at ﬁrst in any case, so let us start from a more limited question: does there exist any continuous, repeating waveform11 f (t) = 0 of period T1 whose Fourier coeﬃcients aj = 0 are identically zero? If the waveform f (t) in question is continuous then nothing prevents us from discretizing (17.15) as aj ∆τM 1 = lim M →∞ T1 ≡ T1 , 2M + 1

M

**e(−ij ∆ω)(to +ℓ ∆τM ) f (to + ℓ ∆τM ) ∆τM ,
**

ℓ=−M

**and further discretizing the waveform itself as f (t) = lim
**

∞ p=−∞

M →∞

f (to + p ∆τM ) Π

t − (to + p ∆τM ) , ∆τM

**in which Π[·] is the square pulse of (17.10). Substituting the discretized
**

The remainder of this dense subsection can be regarded as optional reading. As elsewhere in the book, the notation f (t) = 0 here forbids only the all-zero waveform. It does not forbid waveforms like f (t) = A sin ωt that happen to take a zero value at certain values of t.

11 10

we can do better than merely to state that CM is invertible: we can write down its actual inverse.2 has it that neither fM nor aM can be null unless both are. But is CM invertible? This seems a hard question to answer until we realize that the rows of CM consist of sampled complex exponentials which repeat over the interval T1 and thus stand subject to Parseval’s principle (17. M →∞ assuming that CM is invertible. then matrix notation renders the last equation as M →∞ [fM ]ℓ ≡ f (to + ℓ ∆τM ). CM is invertible. § 14. the answer to our question is that. M Equation (11. In the limit M → ∞. −1 [CM ]ℓj = T1 e(+ij ∆ω)(to +ℓ ∆τM ) . lim aM = lim CM fM .17. representing a (2M + 1)-dimensional identity matrix whose string of ones extends along its main diagonal from j = ℓ = −M through j = ℓ = M . M →∞ whereby M →∞ −1 lim fM = lim CM aM .2) also that CM CM = I−M . Because CM is invertible. we have that aj = ∆τM M →∞ T1 lim ∆τM M →∞ T1 lim M ∞ 501 ℓ=−M p=−∞ M e(−ij ∆ω)(to +ℓ ∆τM ) f (to + p ∆τM )Π(ℓ − p) = e(−ij ∆ω)(to +ℓ ∆τM ) f (to + ℓ ∆τM ).6). this implies that no continuous. Realizing this. ℓ) ≤ M. ℓ=−M If we deﬁne the (2M + 1)-element vectors and (2M + 1) × (2M + 1) matrix [aM ]j ≡ aj . 12 . ∆τM (−ij ∆ω)(to +ℓ ∆τM ) e . EXPANDING WAVEFORMS IN FOURIER SERIES waveform into the discretized formula for aj .30) has deﬁned the notation I−M . [CM ]jℓ ≡ T1 −M ≤ (j. (2M + 1) ∆τM −1 −1 M M such that12 CM CM = I−M and thus per (13.4. yes. So.

2 this chapter has posed as its principal example is a repeating waveform but. In other words. repeating waveform is representable as a Fourier series. more subtle. and moreover more evocative. We will leave to the professional mathematician the classiﬁcation of such unreasonable waveforms. not a continuous one. to calculate Fourier coeﬃcients and express Fourier series in terms of complex exponentials as (17. Chapter 8’s footnote 6 has argued in a similar style.12) as ∞ j=1 f (t) = a0 + [(aj + a−j ) cos j ∆ω t + i(aj − a−j ) sin j ∆ω t] . 17. Being continuous and unrepresentable as a Fourier series. Hence. earlier in the book.5 The trigonometric form It is usually best.14) by Euler’s formula (5. Occasionally.15) do. the investigation of the waveforms’ Fourier series and the provision of greater rigor generally. A truly discontinuous waveform would admittedly invalidate the discretization above of f (t). the square wave of Figs. And what of discontinuous waveforms? Well.6 will derive.1 and 17.14) and (17. continuous because both F (t) and f (t) are continuous. though. but as the last paragraph has concluded this can only be so if ∆F (t) = 0. 17. or at least neatest and cleanest. THE FOURIER SERIES repeating waveform f (t) = 0 exists whose Fourier coeﬃcients aj = 0 are identically zero. ∆F (t) has null Fourier coeﬃcients. every continuous.502 CHAPTER 17. when the repeating waveform f (t) is real. more complete answer to the question though is that a discontinuity incurs Gibbs’ phenomenon. which § 17. one prefers to work in sines and cosines rather than in complex exponentials. ∆F (t) = 0 indeed.14 The better. but see: nothing prevents us from approximating the square wave’s discontinuity by an arbitrarily steep slope. of course. whereupon this subsection’s conclusion again applies. repeating waveform F (t) and its Fourier series f (t). which implies13 that f (t) = F (t). One writes (17. Where this subsection’s conclusion cannot be made to apply is where unreasonable waveforms like A sin[B/ sin ωt] come into play.4. Let ∆F (t) ≡ F (t)−f (t) be the part of F (t) unrepresentable as a Fourier series. Now consider a continuous. 14 13 .

(17. 2 (17.22) sin(j ∆ω τ )f (τ ) dτ. a0 = bj ≡ (aj + a−j ) = cj ≡ i(aj − a−j ) = 1 T1 2 T1 2 T1 to +T1 /2 503 f (τ ) dτ.5 The sine-argument function Equation (17.15). to −T1 /2 to +T1 /2 (17.5. THE SINE-ARGUMENT FUNCTION Then. 17. to −T1 /2 to +T1 /2 cos(j ∆ω τ )f (τ ) dτ.23) The complex conjugate of (17.4 its Fourier coeﬃcients. but a better notation for (17.26) sin z z (17.15) is a∗ = j 1 T1 to +T1 /2 to −T1 /2 e+ij ∆ω τ f ∗ (τ ) dτ.17.24) cj = −2ℑ(aj ) 17. If the waveform happens to be real then f ∗ (t) = f (t).22) and (17. j Combining (17.15) implies that a−j = a∗ if ℑ[f (t)] = 0.18) gives the pulse train of Fig. to −T1 /2 which give the Fourier series the trigonometric form f (t) = a0 + ∞ j=1 (bj cos j ∆ω t + cj sin j ∆ω t) . we have that bj = 2ℜ(aj ) if ℑ[f (t)] = 0. (17.18) is aj = ηA Sa where Sa z ≡ 2πηj .25) (17. superimposing coeﬃcients in (17. which in light of the last equation and (17.27) .24).

2][20] use the sinc(·) notation for another function. This section introduces the sine-argument function and some of its properties.26) to be cos z − Sa z d Sa z = .3][14. sincalternate z ≡ Sa 2πz sin(2πz/2) = . They can skip ahead to the start of the next chapter without great loss. 17.27) and the derivative product rule (4.29) 15 Many (including the author himself in other contexts) call it the sinc function.5. less interested in special functions than in basic Fourier theory. divided by z. (2m)(2m + 1) (17.1. . may ﬁnd this section unproﬁtably tedious. anyway.28) the Taylor series of sin z from Table 8.6. so this is the notation we will use.1 Derivative and integral The sine-argument function’s derivative is computed from the deﬁnition (17. some [56. THE FOURIER SERIES Figure 17.6: The sine-argument function. 2 2πz/2 The unambiguous Sa(·) suits this particular book better. § 2.6. plus also the related sine integral.504 CHAPTER 17. will read the present section because Gibbs depends on its results. denoting it sinc(·) and pronouncing it as “sink. § 17.” Unfortunately. The function’s Taylor series is Sa z = ∞ j=0 j m=1 −z 2 .16 17. § 4. Among other readers however some.15 plotted in Fig. 16 Readers interested in Gibbs’ phenomenon. dz z (17. Sa t ≡ −2π − 2π 2 sin t t 1 2π 2 2π t is the sine-argument function.

” 18 17 .17.5. (−)n Sa(±t) > 0 over nπ < t < (n + 1)π for each n ≥ 0.5. 17. where Sa 0 = 1. Si t ≡ 2π 4 t 0 505 Sa τ dτ −2π − 2π 2 1 t −1 − 2π 4 2π 2 2π The function’s integral is expressed as a Taylor series after integrating the function’s own Taylor series (17.31) • Over the real domain ℑ(t) = 0.7. if ℑ(t) = 0 then ℑ(Sa t) = 0.30) plotted in Fig. n ∈ Z. • It is that |Sa t| < 1 over the real domain ℑ(t) = 0 except at the global maximum t = 0.3] The name “sine-argument” incidentally seems to have been back-constructed from the name “sine integral. positive and negative lobes. n = 0. THE SINE-ARGUMENT FUNCTION Figure 17. 17. Convention gives this integrated function its own name and notation: it calls it the sine integral 17 . § 3. • The zeros of Sa z occur at z = nπ.7: The sine integral. 2j + 1 m=1 (2m)(2m + 1) j (17.2 Properties of the sine-argument function Sine-argument properties include the following. (17. [48. That is. the function Sa t alternates between distinct. n ∈ Z. • The sine-argument function is real over the real domain.18 and denotes it by Si(·).28) term by term to obtain the form z Si z ≡ Sa τ dτ = 0 ∞ j=0 z −z 2 . Speciﬁcally.

17.5. interpreted as the corresponding direct shortcut to the axis (see Fig. multiplying by z 2 / cos z and changing t ← z—it is to assert that tan t = t exactly once in each interval nπ ≤ t < (n + 1)π. • The sine-argument function and its derivative converge toward t→±∞ lim Sa t = 0.8. But according to Table 5.29)’s left side to zero. To assert that each lobe has but a single peak is to assert that (d/dt) Sa t = 0 exactly once in each lobe. over the real domain ℑ(t) = 0. that |Sa t| < 1 says merely that |sin t| < |t| for nonzero t. (8. and similarly for t ≤ 0. n ≥ 0.32) d Sa t = 0. implying that tan t is everywhere at least as steep as t is—and. Among the less obvious properties. THE FOURIER SERIES • Each of the sine-argument’s lobes has but a single peak. .27). or. 3. That each of the sine-argument function’s lobes should have but a single peak seems right in view of Fig. the derivative (d/dt) Sa t = 0 is zero at only a single value of t on each lobe. well.3 Properties of the sine integral Properties of the sine integral Si t of (17. interpreted as an angle—which is to say.506 CHAPTER 17. where the curves evidently cannot but intersect exactly once in each interval. 17. which must be true since t.30) include the following. equivalently—after setting (17. as a curved distance about a unit circle—can hardly be shorter than sin t.6 but is nontrivial to prove. dt cos2 t whereas dt/dt = 1. (17. (17. if you prefer.32) obtains—or. t→±∞ dt lim Some of these properties are obvious in light of the sine-argument function’s deﬁnition (17. For t = 0.1). That is.2 1 d tan t = ≥ 1.28). for t ≥ 0. 17. the most concise way to ﬁnish the argument is to draw a picture of it. as in Fig.

4 (17. negative for negative t and. • The global maximum and minimum of Si t over the real domain ℑ(t) = 0 occur respectively at the ﬁrst positive and negative zeros of Sa t. of course.33) t→±∞ That the sine integral should reach its local extrema at the sine-argument’s zeros ought to be obvious to the extent to which the concept of integration is understood. f (t) 507 − 2π 2 2π 2 t t tan t • Over the real domain ℑ(t) = 0. the sine integral Si t is positive for positive t. • The local extrema of Si t over the real domain ℑ(t) = 0 occur at the zeros of Sa t. THE SINE-ARGUMENT FUNCTION Figure 17.17.5. which are t = ±π. • The sine integral converges toward lim Si t = ± 2π . To explain the other properties it helps ﬁrst to have expressed .8: The points at which t intersects tan t. zero for t = 0.

concluding that the integral’s global maximum and minimum over the real domain occur respectively at the sine-argument function’s ﬁrst positive and negative zeros. n) ∈ Z.508 the sine integral in the form CHAPTER 17. THE FOURIER SERIES t Si t = Sn + nπ n−1 Sa τ dτ. because each lobe majorizes the next (§ 8. Sn ≡ Uj ≡ Uj .19 in the integrand. such that 0 ≤ (−)j t jπ Sa τ dτ < (−)j Uj < (−)j−1 Uj−1 . |Sa τ | ≥ |Sa τ + π| for all τ ≥ 0—the magnitude of the area under each lobe exceeds that under the next. 19 . edges and the like. because.33) wants some cleverness to calculate and will be the subject of the next subsection. (j. and further concluding that the integral is positive for all positive t and negative for all negative t. since there is no U−1 ) and thus that 0 = S0 < S2m < S2m+2 < S∞ < S2m+3 < S2m+1 < S1 for all m > 0. j=0 (j+1)π Sa τ dτ. The several Uj alternate in sign but. 0 ≤ j. More rigorously. Equation (17.10. m ∈ Z. one could fuss here for a third of a page or so over signs. The foregoing applies only when t ≥ 0 but naturally one can reason similarly for t ≤ 0. t = ±π.2)—that is. 0 ≤ n. jπ nπ ≤ t < (n + 1)π. to give the reason perfectly unambiguously. where each partial integral Uj integrates over a single lobe of the sineargument. j ∈ Z (except that the Uj−1 term of the inequality does not apply when j = 0. To do so is left as an exercise to those who aspire to the pure mathematics profession. jπ ≤ t < (j + 1)π.

4 The sine integral’s limit by complex contour Equation (17. ℑ(z) I3 I4 I5 I6 I1 I2 ℜ(z) 509 17. i2z rather than trying to integrate the sine-argument function all at once let us ﬁrst try to integrate just one of its two complex terms.9. 2 4 .17.5. we shall have to think of something cleverer. Many contours are possible and one is unlikely to ﬁnd an amenable contour on the ﬁrst attempt. so it cannot answer the question. Noticing per (5. To evaluate the integral in the inﬁnite limit. we will apply the closed-contour technique of § 9. but why? The integral’s Taylor series (17. THE SINE-ARGUMENT FUNCTION Figure 17.5.19) that Sa z = e+iz − e−iz .9: A complex contour about which to integrate eiz /i2z. 17.5. but perhaps after several false tries we discover and choose the contour of Fig.33) has proposed that the sine integral converges toward a value of 2π/4. leaving the other term aside to handle later. i2z To compute the integral I1 . The integral about the inner semicircle of this contour is I6 = C6 eiz dz = lim i2z ρ→0+ 0 2π/2 eiz (iρeiφ dφ) = i2(ρeiφ ) 0 2π/2 ei0 dφ 2π =− . choosing a contour in the Argand plane that incorporates I1 but shuts out the integrand’s pole at z = 0.30) is impractical to compute for large t and is useless for t → ∞. for the moment computing only I1 ≡ ∞ 0 eiz dz .

we can weaken to read a |I2 | ≤ lim only possible if a→∞ 0 e−y dy 1 = lim = 0. i2z from which. a |I2 | ≤ lim a→∞ 0 eia e−y dy = lim a→∞ 2z a 0 e−y dy . Because the contour encloses no pole. according to the continuous triangle inequality (9. 2z from which. a→∞ 2a 2a I2 = 0.510 CHAPTER 17. 2 |z| which. 2 |z| which. according to the continuous triangle inequality. since 0 < a ≤ |z| across the segment. The integral down the contour’s left segment is I4 = 0 for like reason.15). we can weaken to read a |I3 | ≤ lim only possible if a→∞ −a e−a dx = lim e−a = 0. a→∞ 2a I3 = 0. THE FOURIER SERIES The integral across the contour’s top segment is I3 = C3 eiz dz = lim a→∞ i2z −a a ei(x+ia) dx = lim a→∞ i2z a −a −eix e−a dx . eiz dz = I1 + I2 + I3 + I4 + I5 + I6 = 0. since 0 < a ≤ |z| across the segment. i2z . The integral up the contour’s right segment is I2 = C2 eiz dz = lim a→∞ i2z a 0 ei(a+iy) dy = lim a→∞ 2z a 0 eia e−y dy . a |I3 | ≤ lim a→∞ −a −eix e−a dx = lim a→∞ i2z a −a e−a dx .

17.20 17. though Integration by closed contour is a subtle technique. once one has grasped the technique? An integration of −e−iz /i2z is precisely the sort of thing an experienced applied mathematician would expect to fall out as a byproduct of the contour integration of eiz /i2z. We said that it was fortuitous that I5 .6. 4 ∞ 0 eiz dz = i2z eix dx i2x is the integral we wanted to compute in the ﬁrst place. repeating waveform.6 Gibbs’ phenomenon Section 17.4. but what is that I5 ? Answer: 0 eix dx eiz dz = . but is it really merely fortuitous. which refers a sine integral to the known amplitude of a square wave. the discovery being a process of informed trial and error. is it not? What a ﬁnesse this subsection’s calculation has been! The author rather strongly sympathizes with the reader who still somehow cannot quite believe that contour integration actually works. I1 = C1 511 2π . Paradoxically.2 the Fourier series seems to work for such discontinuous waveforms. the chapter’s examples have been of discontinuous waveforms like the square wave. i2x which fortuitously happens to integrate the heretofore neglected term of the sine-argument function we started with. lim Si t = 0 ∞ t→∞ Sa x dx = 0 ∞ e+ix − e−ix 2π dx = I1 + I5 = . 20 . I5 = −∞ i2x C5 i2z or. which we did not know how to eliminate.4 has shown how the Fourier series suﬃces to represent a continuous. At least in Fig. quite independent method to evaluate the integral is known and it ﬁnds the same number 2π/4. changing −x ← x. 17. I5 = 0 ∞ −e−ix dx . turned out to be something we needed anyway. GIBBS’ PHENOMENON which in light of the foregoing calculations implies that I1 + I5 = Now. Thus. i2x 4 which was to be computed.6. The interested reader can extract this other method from Gibbs’ calculation in § 17. The trick is to discover the contour from which it actually does fall out. but in the case of the sine integral another.

17. J.512 CHAPTER 17. the Fourier series works. and moreover they can have signiﬁcant physical eﬀects.10 by the triangular pulse of Fig. Yet.23 Henry Wilbraham investigated this phenomenon as early as 1848. though systems encountered in practice admittedly typically suﬀer a middle frequency domain through which frequencies are only partly suppressed. Willard Gibbs explored its engineering implications in 1899.3. a signiﬁcant thing happens when one truncates the Fourier series. § 15. as we said. considered in this light. One can represent a discontinuity by a relatively sharp continuity—as for instance one can represent the Dirac delta of Fig. in which case to truncate the series is more or less the right thing to do. 7. This oscillation and overshot turn out to be irreducible.12) is suﬃciently small—and. the Fourier series oscillates and overshoots. Mathematically however one is more likely to approximate a Fourier series by truncating it after some ﬁnite number N of terms. if we like. indeed. if T in (17. 2j + 1 T1 21 So called because they pass low frequencies while suppressing high ones. f (t) = 2π 2j + 1 T1 j=0 which we can. Changing t − T1 /4 ← t in (17. but. it does not imply much of anything. and. with its sloped edges.24 Let us along with them refer to the square wave of Fig. 17. so-called21 “low-pass” physical systems that naturally suppress high frequencies22 are common.2] 23 [43] 24 [78][28] .2) to delay the square wave by a quarter cycle yields ∞ (2j + 1)(2π)t 1 8A sin . write as f (t) = lim Again changing ∆v ← 2(2π)t T1 8A N →∞ 2π N −1 j=0 (2j + 1)(2π)t 1 sin . As further Fourier components are added the Fourier waveform better approximates the square wave.2 on page 489. So. At a discontinuity. what does all this imply? In one sense. it oscillates about and overshoots—it “rings about” in the electrical engineer’s vernacular—the square wave’s discontinuities (the verb “to ring” here recalls the ringing of a bell or steel beam). THE FOURIER SERIES we have never exactly demonstrated that it should work for them. 22 [40. or how.

It means what it appears to mean. 2π (17. When it does. and the associated sine-integral ringing constitute Gibbs’ phenomenon. 2j + 1 m=1 (2m)(2m + 1) j This overshot. as Fig. if it is to obey Fourier. Though unexpected the eﬀect can and does actually arise in physical systems.34) as it should has it that f (t) ≈ A when25 t ≩ 0. 0 < ∆v ≪ 1.3.2DD2)A. the maximum value of f (t) is of interest to mechanical and electrical engineers among others because. GIBBS’ PHENOMENON makes this T1 4A f ∆v = lim N →∞ 2π 2(2π) Stipulating that ∆v be inﬁnitesimal.34) Equation (17.33) gives us that limu→∞ Si u = 2π/4. When t ≈ 0 however it gives the waveform locally the sine integral’s shape of Fig.17.30) fmax 4A 2π 4A = Si = 2π 2 2π ∞ j=0 2π/2 −(2π/2)2 ≈ (0x1. u= 2 where according to (17. N −1 j=0 513 Sa j+ 1 2 ∆v ∆v. giving it a steep 25 Here is an exotic symbol: ≩. 17. and further deﬁning u ≡ N ∆v. Admittedly as earlier alluded. that the summation become an integration. .2DD2)A. peaking momentarily at (0x1. so (17.7. and indeed strictly this is so: a true discontinuity. the engineer wants to allow safely for the overshot. therefore. one can sometimes substantially evade Gibbs by softening a discontinuity’s edge. According to § 17.5.10 depicts. that t > 0 and t ≈ 0.6. the sine integral Si u reaches its maximum at 2π . must overshoot according to Gibbs. we have that N →∞ lim f T1 4A u = 2(2π)N 2π u Sa v dv = 0 4A Si u. 17. We have said that Gibbs’ phenomenon is irreducible. (which in light of the deﬁnition of ∆v is to stipulate that 0 < t ≪ T1 ) such that dv ≡ ∆v and. if an element in an engineered system will overshoot its designed position.

by rolling the Fourier series oﬀ gradually rather than truncating it exactly at N terms. alternately. Nature may do likewise. Engineers may do one or the other. adapting this section’s subtle technique as the circumstance might demand. A good engineer or other applied mathematician will make himself aware of Gibbs’ phenomenon and of the mathematics behind it for this reason. Neither however is the point. 26 . However. explicitly or implicitly. such extra-ﬁne mathematical craftsmanship is unnecessary to this section’s purpose.10: Gibbs’ phenomenon. which is why the full Gibbs is not always observed in engineered systems. f (t) A t but not vertical slope and maybe rounding its corners a little.26 or.514 CHAPTER 17. and indeed there are times at which he might do so. THE FOURIER SERIES Figure 17. If the applied mathematician is especially exacting he might represent a discontinuity by the probability integral of [not yet written] or maybe (if slightly less exacting) as an arctangent. yet that one can still analyze them proﬁtably. or both. The point is that sharp discontinuities do not behave in the manner one might na¨ ıvely have expected.

18. Formally. convert the pulse f (t) into the pulse train g(t) ≡ ∞ n=−∞ f (t − nT1 ). An eﬀort to extend the Fourier series to the broader domain of nonrepeating waveforms leads to the Fourier transform. 1 515 . by (17. which naturally does repeat.] 18. [This chapter is yet only a rough draft.1 Fourier’s equation Consider the nonrepeating waveform or pulse of Fig.15). such a condition would forbid a function like f (t) = A cos ωo t. yet one can however give it something very like a Fourier series in the following way. Because the pulse does not repeat it has no Fourier series.1 Second. ℑ(T1 ) = 0. extending the Fourier series. this chapter’s chief subject. 18. Third. use these coeﬃcients in the Fourier One could divert rigorously from this point to consider formal requirements against f (t) but it suﬃces that f (t) be suﬃciently limited in extent that g(t) exist for all ℜ(T1 ) > 0.Chapter 18 The Fourier and Laplace transforms The Fourier series of Ch. 17 though quite useful applies solely to waveforms that repeat. First.1. calculate the Fourier coeﬃcients of this pulse train g(t).1 The Fourier transform This section derives and presents the Fourier transform.1.

observing that limT1 →∞ g(t) = f (t). or. f (t) t series (17. but one can evade this formality. (18. where Π(t) is the rectangular pulse of (17. recover from the train the original pulse f (t) = lim ∞ j=−∞ T1 →∞ 1 T1 T1 /2 −T1 /2 e−ij ∆ω τ f (τ ) dτ eij ∆ω t . . Fifth. by deﬁning the function as f (t) = limT2 →∞ Π(t/T2 )A cos ωo t.1) This is Fourier’s equation.516 CHAPTER 18.10). a remarkable. THE FOURIER AND LAPLACE TRANSFORMS Figure 18. f (t) = lim 1 √ 2π 1 eij ∆ω t √ 2π j=−∞ ∞ 2π/2∆ω −2π/2∆ω ∆ω→0+ e−ij ∆ω τ f (τ ) dτ ∆ω. such that 1 f (t) = √ 2π 1 eiωt √ 2π −∞ ∞ ∞ −∞ e−iωτ f (τ ) dτ dω. Fourth. We will leave to the professionals further consideration of formal requirements.3) that ∆ω T1 = 2π and reordering factors. observing per (17. among other ways. deﬁning the symbol ω ≡ j ∆ω observe that the summation is really an integration in the limit.1: A pulse. highly signiﬁcant result.14) to reconstruct g(t) = ∞ j=−∞ 1 T1 T1 /2 −T1 /2 e−ij ∆ω τ g(τ ) dτ eij ∆ω t .

several alternate deﬁnitions and usages of the Fourier series are broadly current in the writer’s country alone. To understand why this should be so.1.1.3 The complementary variables of transformation If t represents time then ω represents angular frequency as § 17. in the frequency domain. let us observe that the quantity in (18. . 1 F (ω) = F {f (t)} ≡ √ 2π ∞ −∞ e−iωτ f (τ ) dτ. The reader can adapt the book’s presentation to the Fourier deﬁnition and usage his colleagues prefer at need.2 has explained.1. We conventionally give this function the capitalized symbol F (ω) and name it the Fourier transform of f (t).2) provides the weights F (ω) for the construction.2 plots the Fourier transform of the pulse of Fig. The Fourier transform (18.1. unfortunately.2) serves as a continuous measure of a function’s frequency content.2) is a function not of t but rather of ω. 2 Regrettably.2) serves to deﬁne.1) and changing η ← ω as the dummy variable of integration.2 The transform and inverse transform The reader may agree that Fourier’s equation (18.1)’s square braces. 18. 18.3) and (18. (18. the same letter here as f and F ) as a short form to represent the transformation (18.3) This last is the inverse Fourier transform of the function F (ω). introducing also the useful notation F{·} (where the script letter F stands for “Fourier” and is only coincidentally. Substituting (18.15) of the discrete Fourier series. (18. but in what way is it remarkable? To answer.2) into (18. THE FOURIER TRANSFORM 517 18. Alternate usages [22] change −i ← i in certain circumstances. The transform ﬁnds even wider application than does the series. Indeed.14) and (17. The essential Fourier mathematics however remains the same in any case.1) is curious.3) constructs a function f (t) of an inﬁnity of inﬁnitesimally graded complex exponentials and that (18. the Fourier transform’s complementary equations (18. In this case the function f (t) is said to operate in the time domain and the corresponding transformed function F (ω).2) are but continuous versions of the earlier complementary equations (17. Alternate deﬁnitions [56][14] handle the factors of √ 1/ 2π diﬀerently.2 Figure 18.18. we have that 1 f (t) = F −1 {F (ω)} = √ 2π ∞ −∞ eiηt F (η) dη. consider that (18.

eivθ F (θ) dθ.2: The Fourier transform of the pulse of Fig.518 CHAPTER 18.5) where (18. eiωt F (ω) dω. Formally. ℜ[F (ω)] ω ℑ[F (ω)] The mutually independent variables ω and t are then the complementary variables of transformation. THE FOURIER AND LAPLACE TRANSFORMS Figure 18. ∞ −∞ ∞ −∞ e−iωt f (t) dt. incidentally. one can instead write 1 F (ω) = F {f (t)} = √ 2π 1 f (t) = F −1 {F (ω)} = √ 2π F ≡ Fωt .4) in which the θ is in itself no variable of transformation but only a dummy variable. Notice here the usage of the symbol F. 18. . To emphasize the distinction between the untransformed and transformed (respectively typically time and frequency) domains. ∞ −∞ ∞ −∞ e−ivθ f (θ) dθ. one can use any two letters in place of t and ω. however. and indeed one need not even use two diﬀerent letters. one can elaborate the F—here or wherever else it appears— as Fvv .2). The unadorned symbol F however usually acquits itself clearly enough in context (refer to § A.3) together with appropriate changes of dummy variable. Fωt or the like to identify the complementary variables of transformation explicitly.5) is just (18. (18. (18.1. As clarity demands. for it is sometimes easier just to write 1 F (v) = F {f (v)} = √ 2π 1 f (v) = F −1 {F (v)} = √ 2π F ≡ Fvv .2) and (18.

8) and yet further transforms by the duality rule and the other properties of § 18.18. exhibiting several broadly useful properties one might grasp to wield the transform eﬀectively. 2π (18.27). 2π 2 1 0 where Sa(·) is the sine-argument function of (17. F {Λ(v)} = v 2 2π 1 √ e−ivθ [1 + (iv)(1 + θ)] 0 −1 + e−ivθ [−1 + (iv)(1 − θ)] = Sa (v/2) √ . continuing. According to Table 9. PROPERTIES OF THE FOURIER TRANSFORM 519 constitute a Fourier transform pair. θe−ivθ = [d/dθ][e−ivθ (1 + ivθ)/v 2 ].7) 2π One can compute other Fourier transforms in like manner. Thus we ﬁnd the Fourier transform pair 2 F Sa (v/2) Λ(v) → √ . This section derives and lists the properties. 18.4 An example As a Fourier example. so. such as that Π(v) → F Sa(v/2) √ .10). Its Fourier transform according to (18. consider the triangular pulse Λ(t) of (17.1 (though it is easy enough to ﬁgure it out without recourse to the table).2.1. (18. Whichever letter or letters might be used for the independent variable. the functions F f (v) → F (v) (18.2 Properties of the Fourier transform The Fourier transform obeys an algebra of its own. .2.4) is F {Λ(v)} = = 1 √ 2π 1 √ 2π ∞ −∞ 0 −1 e−ivθ Λ(θ) dθ e−ivθ (1 + θ) dθ + 0 1 e−ivθ (1 − θ) dθ .6) 18.

Consider the Fourier pair Λ(v) → √ F Sa2 [v/2]/ 2π mentioned in § 18. F (hG (v))]. is √ F just the pair Sa2 [v/2]/ 2π → Λ(v).11) converts the identity to its compositional dual e−iav f (v) → F (v + a). to combine (18.9) which is that the transform of the transform is the original function with the independent variable reversed. f (hG (v))] → g[−v. G[v. F (−hg (−v))]. or alternately (18. (18. compositional duality. but is it correct? To show that it is. This is best introduced by example.4)’s ﬁrst line. the identity extends the pair to √ F Λ(v − a) → e−iav Sa2 [v/2]/ 2π.10) is the pair √ F Sa2 [v/2]/ 2π → Λ(−v) which.9). expresses the Fourier transform’s duality rule. (·)] = (·). THE FOURIER AND LAPLACE TRANSFORMS 18.2.11) is correct. plus the Fourier identity f (v − a) → e−iav F (v) which we have not yet met but will in § 18.2. (·)] = (e−iav )(·).9) to form the endless transform progression · · · → f (v) → F (v) → f (−v) → F (−v) → f (v) → · · · F F F F F F (18. Note incidentally that the direct dual of the original pair per (18.11)’s ﬁrst line to get the F F F F (18. eqn. and G[v. an interesting and useful property. which √ F in turn extends the pair to e−iav Λ(v) → Sa2 [(v + a)/2]/ 2π. g[v. recognizing hg (v) = v − a. f (hg (v))] → G[v. (18. take the direct dual on v of (18.1 Duality ∞ −∞ Changing −v ← v makes (18.11) to determine this. expressed abstractly as g[v.10) Equation (18. it does seem useful. On the other hand.1. assuming that (18. The Fourier transform evinces duality in another guise too.4.6) and (18. this says neither more nor less than that F F (v) → f (−v). So.4)’s second line to read 1 f (−v) = √ 2π e−ivθ F (θ) dθ. but that we need neither the identity nor (18.4 below. according to (18. Identifying √ f (v) = Λ(v) and F (v) = Sa2 [v/2]/ 2π.520 CHAPTER 18.10). However. hG (v) = v.11) . since it happens that Λ(−v) = Λ(v). and moreover enlightening. It is entertaining.

Φ(hG (v))] → g[−v. Suppose some particular function f ([·]) whose Fourier transform on [·] is F ([·]). F F Refer to § 18. f (hG (v))] → g[−v. f (hg (−v))]. F then observe that substituting these two. such that φ([·]) → Φ([·]). F (−[·]) → f ([·]). (18. φ([·]) might represent any function so long as Φ([·]) were let to represent the same function’s Fourier transform on [·]. F (−hg (−v))]. F2 (hG2 (v))].2. f2 (hG2 (v))] → g[−v. F (hG (v))]. Therefore. F2 (−hg2 (−v))]. completing the proof. (18.3. such that vv g[v. F (hG (v))] → g[−v. F F To be symbolically precise. in view of the foregoing. F1 (hG1 (v)). vv G[v. Once the proof is understood. F[·][·] F[·][·] whereas the F in the formal pairs was Fvv . this as we said is merely a formal pair. complementary deﬁnitions together into the formal pair (18.13) 3 whose inverse Fourier transform on [·]. G[v. as3 φ([·]) → Φ([·]). F then change the symbols φ ← f and Φ ← F to express the same formal pair as F G[v. f1 (hG1 (v)).12) G[v.11) is readily extended to g[v. which is to say that it represents no functions in particular but presents a pattern to which functions can be ﬁtted. for the two of which there must exist—by direct duality thrice on [·]— the Fourier pair F F (−[·]) → f ([·]). f (hg (v))] → G[v.1. the F here is F[·][·] .18. PROPERTIES OF THE FOURIER TRANSFORM formal pair 521 Now.11)’s second line. . (18.12) yields (18. Let us deﬁne Φ([·]) ≡ f ([·]). cannot but be φ([·]) ≡ F (−[·]). F1 (−hg1 (−v)). f1 (hg1 (v)). φ(hg (−v))]. f2 (hg2 (v))] → G[v.

f1 (hg1 (v)). F (hG (v))] G[v. f (hg (v))] → G[v. F (−hg (−v))] g[v. G[v. fk (hGk (v))] → g[−v.1 summarizes. fk (hgk (v))] → G[v. f2 (hg2 (v)). THE FOURIER AND LAPLACE TRANSFORMS Table 18.1: Fourier duality rules. F2 (hG2 (v))] G[v. f2 (hG2 (v))] → g[−v. the table’s several rules involving g. F1 (hG1 (v)). in which g[v. (Observe that the compositional rules. . f1 (hG1 (v)). . f (hG (v))] → g[−v. Fk (−hgk (−v))] F F F F F F F F F F and indeed generalized to g[v. f1 (hg1 (v)). fk (hGk (v))] → g[−v. F2 (−hg2 (−v))] g[v.522 CHAPTER 18. . Fk (hGk (v))] G[v.) f (v) → F (v) F (v) → f (−v) f (−v) → F (−v) F (−v) → f (v) g[v. Fk (−hgk (−v))].2. Table 18. F F (18. transform only properties valid for all f [v]. f3 (hg3 (v)). f2 (hg2 (v))] → G[v.2 Real and imaginary parts The Fourier transform of a function’s conjugate according to (18. Fk (hGk (v))]. fk (hgk (v))] means g[v. fk (hgk (v))] → G[v.14) 18.4) is 1 F{f ∗ (v)} = √ 2π 1 e−ivθ f ∗ (θ) dθ = √ 2π −∞ ∞ ∞ −∞ eiv θ f (θ) dθ ∗ ∗ . .]. F1 (−hg1 (−v)).

58) (18. The Fourier transform as such therefore does not meet § 2.17) For real v and an f (v) which itself is real for all real v. unlike most of the book’s mathematics before Ch.3 is F here implied.2. because Fourier transforms real functions into complex ones. but the property referred to.12. ℑ[f (v)] = 0. i2 ℜ[f (v)] = then the Fourier transforms of these parts according to (18.16) does hold. it implies that4 f ∗ (v) → F ∗ (−v ∗ ).10) that F{f ∗ (v)} = F −∗ {f (v ∗ )} = F ∗ {f (−v ∗ )}. This implies by (18. . 18. i2 F F (18.16) are5 ℜ[f (v)] → F (v) + F ∗ (−v ∗ ) .15)—and thus ultimately also the Fourier transform’s deﬁnition (18.4)—have arbitrarily chosen a particular sign for the i in the phasing factor e−ij ∆ω τ or e−ivθ . i2 From past experience with complex conjugation.64) to hold. eqns.16) that f ∗ (v) → F ∗ (v). Viewed from another angle. so to speak. (17. Fourier superposition A1 f1 (v) + A2 f2 (v) → A1 F1 (v) + A2 F2 (v). PROPERTIES OF THE FOURIER TRANSFORM 523 in which we have taken advantage of the fact that the dummy variable θ = θ ∗ happens to be real.14) and (17.4) and (18. the latter line becomes 0→ 4 F F (v) − F ∗ (−v) if ℑ(v) = 0 and.18. but this natural expectation would actually have been incorrect.15) where the symbology F −∗ {·} ≡ [F −1 {·}]∗ is used and F −1 {g(w)} = F{F −1 {F −1 {g(w)}}} = F{g(−w)}. which phasing factor the Fourier integration bakes into the transformed function F (v). (18. which does not depend on this subsection’s results anyway. Readers whom this troubles might consider that. 5 The precisely orderly reader might note that a forward reference to Table 18. (18. 17. it must be so. is so trivial to prove that we will not bother about the precise ordering in this case.16) If we express the real and imaginary parts of f (v) in the style of (2. Fortunately. 2 ∗ ∗ F F (v) − F (−v ) ℑ[f (v)] → . 2 f (v) − f ∗ (v) ℑ[f (v)] = .2’s condition for (2. In the arrow notation. an applied mathematician might F naturally have expected of (18. See Figs.1 and 18.2. as f (v) + f ∗ (v) . for all such v.

18. Table 18.10) is 1→ F (18. ℑ[f (v)] = 0.3 The Fourier transform of the Dirac delta Section 18.18) Interpreted. an impossible integral to evaluate.2 summarizes. 18. ℑ[f (v)] = 0.14).20.2 has illustrated.2: Real and imaginary parts of the Fourier transform. incidentally. so let us pause to compute these now.2. for all such v.18) says for real v and f (v) that the plot of ℜ[F (v)] is symmetric about the vertical axis whereas the plot of ℑ[F (v)] is symmetric about the origin. we ﬁnd curiously that 1 F δ(v) → √ .12) and invoking its sifting property (7. (18. as Fig. for all such v. THE FOURIER AND LAPLACE TRANSFORMS Table 18. Applying (18.4 we would have found 1 → of ∞ [1/ 2π] −∞ e−ivθ dθ.20) inasmuch as δ(−v) = δ(v).) . f ∗ (v) → F ∗ (−v ∗ ) F (v) + F ∗ (−v ∗ ) F ℜ[f (v)] → 2 F (v) − F ∗ (−v ∗ ) F ℑ[f (v)] → i2 If ℑ(v) = 0 and.4 will need one particular pair and its dual sooner. 18. √ f [v] ≡ 1—directly according to eqn. (18. then F (v) = F ∗ (−v). 2π the dual of which according to (18.2.524 CHAPTER 18.19) √ 2π δ(v) (18. F whereby F (v) = F ∗ (−v) if ℑ(v) = 0 and.3 will compute several Fourier transform pairs but § 18.4) to the Dirac delta (7. (The duality rule proves its worth in eqn. Had we tried to calculate the Fourier transform of 1—that F is. 18.

3 lists several Fourier properties involving shifting. The table’s fourth property is proved trivially. ℜ(α) = 0 A1 f1 (v) + A2 f2 (v) → A1 F1 (v) + A2 F2 (v) d F f (v) → ivF (v) dv d F ivf (v) → − F (v) dv v F (v) 2π F f (τ ) dτ → + F (0)δ(v) iv 2 −∞ 18. diﬀerentiation and integration.4)’s second line.11) as the composition dual of the table’s ﬁrst property.3: Fourier properties involving shifting. PROPERTIES OF THE FOURIER TRANSFORM 525 Table 18. The table’s ﬁfth and sixth properties begin from the derivative of the inverse Fourier transform. plus an expression of the Fourier transform’s linearity.2. is proved through (18. f (v − a) → e−iav F (v) eiav f (v) → F (v − a) A v F Af (αv) → F |α| α F F F if ℑ(α) = 0. or. that is. This derivative is d f (v) = dv ∞ 1 √ iθeivθ F (θ) dθ 2π −∞ ∞ 1 eivθ [iθF (θ)] dθ = √ 2π −∞ = F −1 {ivF (v)}. The table’s second property is proved by applying (18.4) to f (v − a) then changing ξ ← θ−a. alternately. . of (18.4 Shifting.4) to eiav f (v).18. scaling. The table’s third property is proved by applying (18. scaling and diﬀerentiation Table 18. scaling and diﬀerentiation. which implies F d f (v) dv = ivF (v).2. The table’s ﬁrst property is proved by applying (18.4) to Af (αv) then changing ξ ← αθ.

[37. Since the system is linear. THE FOURIER AND LAPLACE TRANSFORMS the table’s ﬁfth property.2] See Ch. it follows that its response to an arbitrary input f (t) is g(t) ≡ ∞ −∞ h(t − τ )f (τ ) dτ .2.5 Convolution and correlation The concept of convolution emerges from mechanical engineering (or from its subdisciplines electrical and chemical engineering).526 CHAPTER 18. or. Table 18. e−ivθ h = √ 2 2 2π −∞ −∞ h ∞ −∞ ∞ Now changing φ ← θ/2 + ψ. g(t) ≡ ∞ −∞ h t −τ 2 f t +τ 2 dτ. § 2. Besides the identities this section derives.11) of the ﬁfth. in which the response of a linear system to an impulse δ(t) is some characteristic transfer function h(t).4 will prove. The sixth and last property is the compositional dual (18.45). changing t/2 + τ ← τ to improve the equation’s symmetry. 18. (18. Changing v ← t and ψ ← τ in (18.4) yields7 F v v −ψ f + ψ dψ 2 2 −∞ ∞ ∞ θ θ 1 −ψ f + ψ dψ dθ e−ivθ h = √ 2 2 2π −∞ −∞ ∞ ∞ θ θ 1 −ψ f + ψ dθ dψ.21) to comport with the notation found elsewhere in this section then applying (18. which § 18. F = = 6 7 h v v −ψ f + ψ dψ 2 2 ∞ −∞ ∞ −∞ ∞ −∞ 2 √ 2π 2 √ 2π e−iv(2φ−2ψ) h(φ − 2ψ)f (φ) dφ dψ ∞ −∞ e−ivφ f (φ) e−iv(φ−2ψ) h(φ − 2ψ) dψ dφ. .3 also includes (18.21) This integral deﬁnes6 convolution of the two functions f (t) and h(t). 17’s footnote 8.

25) Furthermore. Closely related to the convolutional integral (18. (18.11) of (18. (18.24) whose transform and dual transform are computed as in the last paragraph to be ∞ √ v v F f ψ+ dψ → h ψ− 2π H(−v)F (v). so ∞ −∞ F h∗ ψ − v v f ψ+ 2 2 dψ → F √ 2π H ∗ (v ∗ )F (v).18. h∗ (v) → H ∗ (−v ∗ ). 2 (18.22) is 1 F h(v)f (v) → √ 2π H v −ψ F 2 v + ψ dψ.26) . ∞ −∞ 1 F h∗ (v ∗ )f (v) → √ 2π H∗ ψ − v v F ψ+ 2 2 dψ. according to (18.2. −ψ f + ψ dψ → 2 2 ∞ −∞ (18.21) is the integral g(t) ≡ ∞ −∞ h τ− t 2 f τ+ t 2 dτ. convolution in the one domain evidently transforms to multiplication in the other.22) or by (18. F = = = That is. h √ v v F 2π H(v)F (v).22) The compositional dual (18. 2 2 −∞ ∞ v v 1 F F ψ+ dψ. PROPERTIES OF THE FOURIER TRANSFORM Again changing χ ← φ − 2ψ.23).16).23) in which we have changed −ψ ← ψ as the dummy variable of integration. ∞ −∞ ∞ 527 h −∞ v v −ψ f + ψ dψ 2 2 ∞ −∞ 1 √ 2π √ 2π √ e−ivφ f (φ) ∞ −∞ ∞ −∞ e−ivχ h(χ) dχ dφ 1 √ 2π ∞ −∞ 1 √ 2π e−ivχ h(χ) dχ e−ivφ f (φ) dφ 2π H(v)F (v). Whether by (18. H ψ− h(−v)f (v) → √ 2 2 2π −∞ (18.

§ 1. we see that in (18. 2 2 ∞ 1 v F h∗ (v)f (v) → √ −ψ F H∗ 2 2π −∞ v + ψ dψ.29) h∗ τ − f τ+ dτ (18. as before. It is called correlation.30) for correlation.28). (18. Reviewing this subsection.22) of the convolutional integral.6A] . THE FOURIER AND LAPLACE TRANSFORMS in which the second line is the compositional dual of the ﬁrst with. the operation the integral (18. known as convolution.26) we have already determined the transform and dual transform of the correlational integral (18. Nothing prevents one from correlating a function with itself. § 19. 2 (18.27) Unlike the operation the integral (18. Convolution and correlation arise often enough in applications to enjoy their own.21) expresses. being a measure of the degree to which one function tracks another with an oﬀset in the independent variable.28) expresses does have a name. and indeed one can do the same to the transform (18.9 For convolution. incidentally.24) expresses has no special name as far as the writer is aware. 8 9 f (t) ∗ h(t) = h(t) ∗ f (t).31) proves useful at times. the commutative and associative properties that f (t) ∗ [g(t) ∗ h(t)] = [f (t) ∗ g(t)] ∗ h(t).528 CHAPTER 18.4] [36. The autocorrelation Rf f (t) = ∞ −∞ f∗ τ − t 2 f τ+ t 2 dτ (18. the operation its variant g(t) ≡ ∞ −∞ h∗ τ − t 2 f τ+ t 2 dτ (18.32) [40. obtaining ∞ −∞ h∗ √ v v F −ψ f + ψ dψ → 2π H ∗ (−v ∗ )F (v). the dummy variable −ψ ← ψ changed. peculiar notations8 h(t) ∗ f (t) ≡ for convolution and Rf h (t) ≡ ∞ −∞ ∞ −∞ h t −τ 2 t 2 f t +τ 2 t 2 dτ (18. However.

we should take note of one entry of 18.4).6 Parseval’s theorem ∞ −∞ ∞ 1 eivθ F (θ) dθ dv h∗ (v) √ 2π −∞ −∞ ∞ ∞ 1 √ eivθ h∗ (v) dv F (θ) dθ 2π −∞ −∞ ∗ ∞ ∞ 1 √ e−ivθ h(v) dv F (θ) dθ. § 1. −∞ in which we have used (18.33) 18.34) −∞ 10 −∞ Electrical engineers call the quantity |F (v)|2 on (18. [36. PROPERTIES OF THE FOURIER TRANSFORM 529 are useful.2. 1 F √ Rf f (v) → |F (v)|2 if ℑ(v) = 0. 2π For10 real v. This same entry is also found in Table 18.4 and 18.26)— but when written in the correlation’s peculiar notation it draws attention √ to a peculiar result. 2π −∞ −∞ ∞ ∞ By successive steps. we have that ∞ h∗ (v)f (v) dv = ∞ H ∗ (v)F (v) dv. Scaled by 1/ 2π. too.4 in other notation—indeed it is just the ﬁrst line of (18.29) and.5 in F √ particular. h∗ (v)f (v) dv = = = = H ∗ (θ)F (θ) dθ. (18.18.33)’s right the energy spectral density of f (v). 2π (18.6B] . Rf h (v) → ( 2π)H ∗ (v ∗ )F (v). the autocorrelation and its Fourier transform are evidently 1 F √ Rf f (v) → F ∗ (v ∗ )F (v). Before closing the section. both may be demonstrated √ √ F as f (v) ∗ [g(v) ∗ h(v)] → ( 2π)F (v)[( 2π)G(v)H(v)] = √ √ F −1 ( 2π)[( 2π)F (v)G(v)]H(v) → [f (v) ∗ g(v)] ∗ h(v) and similarly for the commutative property. Changing v ← θ on the right. through Fourier transformation.5 summarize. where the former may be demonstrated by changing −τ ← τ in (18.2. Tables 18. interchanged the integrations and assumed that the dummy variables v and θ of integration remain real.

∞ −∞ h v v F −ψ f + ψ dψ → 2 2 h(v)f (v) → F √ 2π H(v)F (v) ∞ −∞ 1 √ 2π √ H ∞ −∞ v −ψ 2 v ×F + ψ dψ 2 h ψ− v v f ψ+ 2 2 dψ → F F 2π H(−v)F (v) ∞ −∞ h(−v)f (v) → ∞ −∞ 1 √ 2π √ H ψ− v 2 v 2 dψ ×F ψ+ 2π H ∗ (−v ∗ )F (v) ∞ −∞ h∗ v v F −ψ f + ψ dψ → 2 2 h∗ (v)f (v) → F 1 √ 2π √ H∗ ∞ −∞ v −ψ 2 v ×F + ψ dψ 2 h∗ ψ − v v f ψ+ 2 2 dψ → F F 2π H ∗ (v ∗ )F (v) ∞ −∞ h∗ (v ∗ )f (v) → 1 √ 2π H∗ ψ − v 2 v 2 dψ ×F ψ+ . THE FOURIER AND LAPLACE TRANSFORMS Table 18.4: Convolution and correlation. and their Fourier properties.530 CHAPTER 18.

ω and |F (ω)|2 respectively have physical dimensions of time. too: see § 17. angular frequency and energy per unit angular frequency.6B] .35) When this is written as ∞ −∞ |f (t)|2 dt = |F (ω)|2 dω. then the theorem conveys the important physical insight that energy transmitted at various times can equally well be regarded as energy transmitted at various frequencies.) ∞ −∞ ∞ −∞ f (t) ∗ h(t) = h(t) ∗ f (t) Rf h (t) ≡ ≡ F h t −τ 2 t 2 f f t +τ 2 τ+ t 2 dτ dτ h(v) ∗ f (v) → h(v)f (v) → Rf h (v) → h∗ (v ∗ )f (v) → f (t) ∗ [g(t) ∗ h(t)] = F F F √ 2π H(v)F (v) h∗ τ − 1 √ [H(v) ∗ F (v)] 2π √ 2π H ∗ (v ∗ )F (v) 1 √ RF H (v) 2π [f (t) ∗ g(t)] ∗ h(t) This is Parseval’s theorem.5: Convolution and correlation in their peculiar notation. energy per unit time.35) as ∞ −∞ 11 f 2 (v) dv = ∞ −∞ ℜ2 [F (v)] dv + ∞ −∞ ℑ2 [F (v)] dv (18.2. PROPERTIES OF THE FOURIER TRANSFORM 531 Table 18. § 2-2][36. |f (t)|2 . This works for space and spatial frequencies. (Note that the ∗ which appears in the table as h[t] ∗ f [t] diﬀers in meaning from the ∗ in h∗ [v ∗ ]. § 1.18. (18.11 Especially interesting is the special case h(t) = f (t). when ∞ −∞ |f (v)|2 dv = ∞ −∞ ∞ −∞ |F (v)|2 dv. one can write (18.2. For real f (v).36) [14. and t.

• if f (−v) = −f (v) for all v.4) as 1 F (−v) = √ 2π ∞ −∞ e−i(−v)θ f (θ) dθ. often do precisely this—eﬀectively transmitting two. independent streams of information at once. (18.10) of the ﬁrst two of these are evidently v F √ → 2π Π(−v). δ(v) and the constant 1 in (18. then F (−v) = F (v).12 Practical digital electronic communications systems.20). this is seen by expressing F (−v) per (18. respectively.3 The Fourier transforms of selected functions We have already computed the Fourier transforms of Π(v). Sa 2 v F √ 2π Λ(−v). THE FOURIER AND LAPLACE TRANSFORMS which expresses the principle of quadrature. See § 8.2. Λ(v). → Sa2 2 12 [14. independent channels.12.7 Oddness and evenness Odd functions have odd transforms. without conﬂict. • if f (−v) = f (v) for all v. in the selfsame band.7). Even functions have even transforms. 18. in-phase or I channel and an imaginary-phased. Symbolically. The duals (18. (18. then F (−v) = −F (v). 18. quadrature-phase or Q channel. wired or wireless. a real-phased. namely. § 5-1] . The even case is analyzed likewise.8). conveying the additional physical insight that a single frequency can carry energy in not one but each of two distinct.19) and (18. In the odd case. then changing the dummy variable −θ ← θ to get F (−v) = ∞ 1 √ e−i(−v)(−θ) f (−θ) dθ 2π −∞ ∞ 1 = −√ e−ivθ f (θ) dθ 2π −∞ = −F (v).532 CHAPTER 18.

ℜ(a) > 0. 2 533 which by the scaling property of Table 18.3. 1 F u(v) → √ + Cδ(v).11).18. 7.37) implies signiﬁcantly that. to spread energy evenly over an available “baseband” but to let no energy leak outside that band. 2π (iv) where the necessary term Cδ(v). 2 imply that13 v .4) to u(v)e−av .38) Interesting is the limit a → 0+ in (18.37) Applying the Fourier transform’s deﬁnition (18. → 2 √ v F → 2π Λ(v).2. 0 revealing the transform pair u(v)e−av → √ F 1 . FOURIER TRANSFORMS OF SELECTED FUNCTIONS or. including radio.9) one can convert u(v) to an odd function by the simple expedient of subtracting 1/2 from it. with scale C to be determined. 17.38) when both a and v vanish at once. + C− u(v) − → √ 2 2 2π (iv) In electronic communications systems. where u(v) is the Heaviside unit step (7. since Π(−v) = Π(v) and Λ(−v) = Λ(v). 13 .3 √ 2π F Sa(v) → Π √2 2π F Sa2 (v) → Λ 2 (18. one should transmit sine-argument-shaped pulses as in Fig. Sa Sa2 v F √ 2π Π(v). 2 v . we have then that √ 1 1 F 2π δ(v).6. 2π (a + iv) (18. the ﬁrst line of (18.38). F √ Since 1/2 → ( 2π/2)δ(v) according to (18. What we do know from § 18. yields F u(v)e −av 1 =√ 2π ∞ 0 e −(a+iv)θ 1 dθ = √ 2π e−(a+iv)θ −(a + iv) ∞ . merely admits that we do not yet know how to evaluate (18.7 is that odd functions have odd transforms and that (as one can readily see in Fig.20).

1 the trigonometrics from their complex parts. THE FOURIER AND LAPLACE TRANSFORMS √ which to make its right side odd demands that C = 2π/2. and can be computed straightforwardly from the pairs √ F 2π δ(v − a). On the other hand. we have that √ 2π F sin av → [δ(v − a) − δ(v + a)] . j2 (18. The reason is that the Dirac delta pulse train’s Fourier series according to (17. 2πn!(a + iv)n+1 (18. 0 Since all but the k = 0 term vanish.1.20) Table 18. e−av v n = so F u(v)e−av v n = = 1 √ 2π 1 √ 2π ∞ 0 n k=0 d dv n k=0 −e−av v k .534 CHAPTER 18.3’s property that eiav f (v) → F (v − a). 2 Curiously.39) u(v) → √ + 2 2π iv results.14) is ∞ j=−∞ F δ(v − jT1 ) = ∞ j=−∞ eij(2π/T1 )v . according to Table 9.5 turns out to be another Dirac delta pulse train. n ≥ 0. The transform pair √ 1 2π F δ(v) (18.21) and (17. n ∈ Z. the Fourier transform of the Dirac delta pulse train of Fig.41) √ F 2π δ(v + a).40) The Fourier transforms of sin av and cos av are interesting and important. T1 . (n!/k!)an−k+1 e−(a+iv)θ θ n dθ −e−(a+iv)θ θ k (n!/k!)(a + iv)n−k+1 ∞ . e−iav → which result by applying to (18.42) √ 2π F cos av → [δ(v − a) + δ(v + a)] . eiav → (18. ℜ(a) > 0. 17. Composing per Table 5. the last equation implies the transform pair u(v)e−av v n → √ F 1 .

18.41) is ∞ j=−∞ 535 δ(v − jT1 ) → F √ 2π T1 ∞ j=−∞ δ v−j 2π T1 . the table also covers the Gaussian pulse of § 18.4 The Fourier transform of the integration operation Though it includes the Fourier transform of the diﬀerentiation operation.44) j=−∞ j=−∞ discovering a pulse train whose Fourier transform is itself. and vice versa. (18.4 on the rightward form.4. the further the pulses of the original train. yields the Fourier pair v f (τ ) dτ −∞ 14 → ≡ F √ 2π H(v)F (v). 18.5. we begin by observing that one can express the integration in question in either of the equivalent forms v f (τ ) dτ = −∞ ∞ −∞ u v v −τ f +τ 2 2 dτ. yet. We have the theory now.43) Apparently. Invoking an identity of Table 18. (18. Table 18. Letting T1 = 2π in (18. even √ when transformed.6 summarizes. [37.33] . the closer the pulses of the transformed train. the train remains a train of Dirac deltas. Prob. iv 2 (18. THE FOURIER TRANSFORM OF INTEGRATION the transform of which according to (18.14 To develop (18. then substituting the leftward form. Table 18. 5. for when we compiled the table we lacked the needed theory. Besides gathering transform pairs from this and earlier sections.3 omits the complementary identity v −∞ f (τ ) dτ → F F (v) 2π + F (0)δ(v). H(v) F{u(v)}.45).43) we ﬁnd the pair ∞ ∞ √ √ F δ v − j 2π → δ v − j 2π .45) the Fourier transform of the integration operation.

6: Fourier transform pairs. THE FOURIER AND LAPLACE TRANSFORMS Table 18. n ≥ 0 √ 2π δ(v − a) √ 2π [δ(v − a) − δ(v + a)] j2 √ 2π [δ(v − a) + δ(v + a)] √2 v 2π Π 2 √2 v 2π Λ 2 2 √ ∞ 2π 2π δ v−j T1 T1 j=−∞ ∞ j=−∞ √ δ v − j 2π √ δ v − j 2π Ω(v) → Ω(v) .536 CHAPTER 18. ℜ(a) > 0 2π (a + iv) 1 √ . 1 → u(v) → δ(v) → Π(v) → Λ(v) → u(v)e−av → F F F F F F F √ 2π δ(v) u(v)e−av v n → eiav → sin av → cos av → Sa(v) → Sa2 (v) → δ(v − jT1 ) → → F F F F F F F F ∞ j=−∞ ∞ j=−∞ √ 1 2π √ + δ(v) 2 2π iv 1 √ 2π Sa(v/2) √ 2π 2 Sa (v/2) √ 2π 1 √ . 2π n!(a + iv)n+1 ℜ(a) > 0. n ∈ Z.

1 and 17.6.18. iv 2 of which sifting the rightmost term produces (18. we shall encounter a most curious function. the question now arises again: can any function be its own transform? Well. We found that a sinusoid could be its own derivative after a fashion—diﬀerentiation shifted its curve leftward but did not alter its shape—but that the only nontrivial function to be exactly its own derivative was the natural exponential f (z) = Aez . 4 and 5. so v −∞ 537 f (τ ) dτ → F F (v) 2π + F (v)δ(v). also known as the bell curve among other names. F{u(v)} = 1/[( 2π)iv] + ( 2π/2)δ(v). for now. 18. but this train unlike the natural exponential is abrupt and ungraceful. 2π (18.43]. we have already found in § 18.4) of the Fourier transform.3. 20. we can just copy here the pulse’s deﬁnition from (20.3. whether directly or indirectly. because it was indeed its own derivative. An alternate technique is outlined in [37.5. We later found the same natural exponential to ﬁll several signiﬁcant mathematical roles—largely. we asked among other questions whether any function could be its own derivative. Studying the Fourier transform. the Gaussian pulse. but known techniques to compute it include the following.5 The Gaussian pulse While studying the derivative in Chs.45). In Chs. 19 and 20.3 that the Dirac delta pulse train can be. We will defer discussion of the Gaussian pulse’s provenance to the coming chapters but.15 From the deﬁnition (18. One should like an analytic function. . THE GAUSSIAN PULSE √ √ But according to Table 18. and preferably not a train but a single pulse. F{Ω(v)} = 15 1 2π ∞ −∞ exp − θ2 − ivθ 2 dθ. 5. Prob.15) as Ω(t) ≡ exp −t2 /2 √ . perhaps not the sort of function one had in mind.46) plotted on pages 562 below and 493 above. during the study of special functions and probability. respectively in Figs. The Fourier transform of the Gaussian pulse is even trickier to compute than were the transforms of § 18.

not to forget. mathematicians have been searching hundreds of years for clever techniques to evaluate just such integrals and. leaving only the two. None of the techniques of Ch. Fortunately. we record it in books like this. when occasionally they should discover such a technique and reveal it to us.47) How to proceed from here is not immediately obvious. and the formal proof is left as an exercise to the interested reader17 ) lie so far away that—for this integrand—they integrate to nothing. THE FOURIER AND LAPLACE TRANSFORMS Completing the square (§ 2.538 CHAPTER 18. Had we not studied complex contour integration in § 9. So tracing leaves us with F{Ω(v)} = exp −v 2 /2 2π ∞ −∞ exp − ξ2 2 dξ. dθ −∞ (θ + iv)2 2 exp − ξ2 2 dξ. F{Ω(v)} = exp −v 2 /2 2π ∞+iv −∞+iv exp −v 2 /2 2π exp −v 2 /2 2π ∞ −∞ ∞ exp − exp − v2 θ2 − ivθ + 2 2 dθ. since happily we have studied it. short complex segments at the ends which (as is easy enough to see. that it is everywhere analytic—we recognize that one can trace the path of integration from −∞ + iθ to ∞ + iθ along any contour one likes. but the real part happens to be positive—indeed. 9 seems especially suitable to evaluate I≡ ∞ −∞ exp − ξ2 2 dξ. 17 16 . though if a search for a suitable contour of integration failed one might fall back on the Taylor-series technique of § 9.5 we should ﬁnd such an integral hard to integrate in closed form. most extremely positive—over the domains of the segments in question.6) of ξ—that is. why. [16] The short complex segments at the ends might integrate to something were the real part of ξ 2 negative. Let us trace it along the real Argand axis from −∞ to ∞. observing that the integrand exp(ξ 2 ) is an entire function (§ 8.16 F{Ω(v)} = = Changing ξ ← θ + iv. (18.2). However.9.

18. What if we multiply the two? Then I2 = = ∞ −∞ ∞ −∞ exp − ∞ −∞ x2 2 dx ∞ exp − −∞ 2 + y2 x exp − y2 2 dy 2 dx dy.18 The equations I= x2 2 −∞ ∞ y2 exp − I= 2 −∞ exp − ∞ 539 dx.5.1) becomes suddenly easy to guess: ∞ ρ2 = 2π. I 2 = 2π − exp − 2 0 So evidently. nothing then prevents us from double-integrating by the cylindrical coordinates (ρ.48) exp − [22. but that an extra factor of the dummy variable has appeared— that the integrand ends not with dρ but with ρ dρ. express the same integral I in two diﬀerent ways. § I:40-4] . ξ2 2 dξ = √ 2π (18. instead. y) are rectangular coordinates. At a casual glance. the last integral in square brackets does not look much diﬀerent from the integral with which we started. the only diﬀerence being in the choice of letter for the dummy variable. If we interpret thus. THE GAUSSIAN PULSE Here is the technique. One. dy. the integral’s solution by antiderivative (§ 9. φ). as I2 = 2π/2 −2π/2 0 ∞ ∞ exp − ρ2 2 ρ2 2 ρ dρ dφ = 2π 0 exp − ρ dρ . Once we have realized this. but see: it is not only that the lower limit of integration and the letter of the dummy variable have changed. I= which means that ∞ −∞ 18 √ 2π. geometrical way to interpret this I 2 is as a double integration over a plane in which (x.

The Gaussian pulse resembles the natural exponential in its general versatility. transforms pulses like those of Figs. in the tall.3DF2. ±3. F (18.13) have recommended the shape of the Gaussian pulse. ±1. the Gaussian pulse is about as tightly localized as a nontrivial.46) reveals the remarkable transform pair Ω(v) → Ω(v). seems powerless to bend Gauss’ mighty curve.1 and 17. thus rendering the Dirac delta susceptible to an analytic limiting process which transforms amenably. Away from its middle region |t| 1. ±2. who can twist and knot other curves with ease. narrow limit.50) the same as for Π(t). for not only is the Gaussian pulse analytic (unlike the Dirac delta) but it also behaves uncommonly well under Fourier transformation (like the Dirac delta). Old Fourier.46) and (18. 0x0. ±4. 0x0.3 straightforwardly but stumbles Consider that Ω(t) ≈ 0x0. 20 to come.6621.48) that ∞ −∞ Ω(t) dt = 1. the Gaussian pulse evidently vanishes rather convincingly.3 and its (17. deﬁning the Fourier transform in the Fωt notation. 0x0. now that we have the Gaussian pulse in hand we shall ﬁnd that it ably ﬁlls all sorts of roles—not least the principal role of Ch. This section lends a bit more force to the recommendation. ±5.47).0000 at t = 0. Finally substituting (18. 2π which in view of (18. and that Ω(±8) < 2−0x2F . Λ(t) and indeed δ(t).49) The Gaussian pulse transforms to itself. Section 17.19 The passion of one of the author’s mentors in extolling the Gaussian pulse as “absolutely a beautiful function” seems well supported by the practical mathematical virtues exhibited by the function itself. It is worth observing incidentally in light of (18. Indeed.0DD2. THE FOURIER AND LAPLACE TRANSFORMS as was to be calculated. 18. 0x0. 18 to develop the fairly deep mathematics underlying the Gaussian pulse and supporting its basic application. 19 . Too.540 CHAPTER 18. we have that F{Ω(v)} = exp −v 2 /2 √ . uncontrived analytic function can be. (18.48) into (18. though the book has required several chapters through this Ch.6 The Laplace transform Equation (18. to implement the Dirac delta δ(t).0122. 18.5).0009. 0x0.

but nonetheless can prompt the tasteful mathematician to wonder whether a simpler alternative to the Fourier transform were not possible. The ramping property comes by diﬀerentiating and negating (18.18.51)—quite unlike the Fourier transform’s deﬁnition (18. whereafter the higher-order differentiation property comes by repeated application of the diﬀerentiation property. 7][40. Ch. usually with a negative real part.3). Ch. The integration property merely reverses the integration property ∞ on the function g(t) ≡ 0− f (τ ) dτ .51)’s integrand suppresses even the tail the transform does not omit. 21 [37. thus eﬀectively converting even a time-unlimited function to an integrable pulse—and Laplace does so without resort to indirect techniques. when s is purely imaginary. 3][56. s = jω is the transform variable and.20 the Laplace transform F (s) = L{f (t)} ≡ ∞ 0− e−st f (t) dt (18. This book does not treat it. the Laplace transform is very like the Fourier. the Laplace transform’s deﬁnition (18. The ﬁrst several of the Laplace properties of Table 18. s (in which one can elaborate the symbol L as Lst if desired) comes by direct application of the deﬁnition. For instance. The diﬀerentiation property comes by L d f (t) dt = ∞ 0− e−st d f (t) dt = dt ∞ 0− e−st d [f (t)] and thence by integration by parts (§ 9.3 have we been able to transform such functions.2 and 18. for which dg/dt = f (t) and g(0− ) = 0.51) as − 20 d d F (s) = − ds ds ∞ 0− e−st f (t) dt = ∞ 0− e−st [tf (t)] dt = L{tf (t)}. the pair L 1 1→ . Only by the indirect techniques of §§ 18.21 As we said. which in (18. 19] . Ch.7 likewise come by direct application. 3].51) oﬀers such an alternative. Ch. Such indirect techniques are valid and even interesting. THE LAPLACE TRANSFORM 541 on time-unlimited functions like f (t) = cos ωo t or even the ridiculously simple f (t) = 1. Here.6. but Laplace’s advantage lies in that it encourages the use of a complex s.5)—tends to transform simple functions straightforwardly. There has been invented a version of the Laplace transform which omits no tail [37. At the sometimes acceptable cost of omitting one of the Fourier integral’s two tails.

and the pair tn → n!/sn+1 comes by repeated application of the same property.1 into their complex exponential comL ponents). beginning L = ∞ −∞ u ∞ t −ψ h 2 e−st −∞ t t t −ψ u +ψ f + ψ dψ 2 2 2 ∞ t t t t u −ψ h −ψ u +ψ f +ψ 2 2 2 2 −∞ dψ dt. In the application of either table.2. a may be. The pair t → 1/s2 comes by application of the property that tf (t) → −(d/ds)F (s) to the pair 1 → 1/s. eventually reaching the form L = ∞ −∞ u ∞ −∞ t −ψ h 2 t −ψ u 2 t +ψ f 2 ∞ t +ψ 2 dψ e−sχ u(χ)h(χ) dχ e−sφ u(φ)f (φ) dφ . this time to curtail each integration to begin at 0− rather than at −∞. As Table 18. The convolution property comes as it did in § 18. regardless of the value of ψ. most too of the latter table’s entries come by direct application of the Laplace transform’s deﬁnition (18. thus completing the convolution property’s proof. complex.51) (though to reach the sine and cosine entries one should ﬁrst split the sine and cosine functions per Table 5. so Table 18. THE FOURIER AND LAPLACE TRANSFORMS whereafter again the higher-order property comes by repeated application. L L L 18. As in § 18. but α and t are normally real. wherein evidently u(t/2 − ψ)u(t/2 + ψ) = 0 for all t < 0. and s usually is.5.2.5 except that here we take advantage the presence of Heaviside’s unit step to manipulate the limits of integration.7 Solving diﬀerential equations by Laplace The Laplace transform is curious. but admittedly one often ﬁnds in practice that the more straightforward—though harder to analyze—Fourier transform is a better tool for frequency-domain analysis. −∞ after which once more we take advantage of Heaviside.542 CHAPTER 18. among other reasons because Fourier brings an inverse transformation formula (18. As the former table’s. The pairs transforming e−at t and e−at tn come similarly. here also we change φ ← t/2 + ψ and χ ← φ − 2ψ.7 lists Laplace properties.8 lists Laplace transform pairs.5) whereas .

u(t − to )f (t − to ) → e−sto F (s) e−at f (t) → F (s + a) A s L Af (αt) → F α α L L L if ℑ(α) = 0. ℜ(α) > 0 A1 f1 (t) + A2 f2 (t) → A1 F1 (t) + A2 F2 (t) d L f (t) → sF (s) − f (0− ) dt dn L f (t) → sn F (s) − dtn ∞ 0− n−1 sk k=0 dn−1−k f (t) dtn−1−k t=0− f (τ ) dτ F (s) s d L tf (t) → − F (s) ds dn L n t f (t) → (−)n n F (s) ds → L L [u(t)h(t)] ∗ [u(t)f (t)] → H(s)F (s) Table 18.18.8: Laplace transform pairs. SOLVING DIFFERENTIAL EQUATIONS BY LAPLACE 543 Table 18. δ(t) → 1 1 → t → tn → sin ωo t → cos ωo t → L L L L L L 1 s 1 s2 n! n+1 s ωo 2 s2 + ωo s 2 + ω2 s o e−at → e−at t → e−at tn → e−at sin ωo t → e−at cos ωo t → L L L L L 1 s+a 1 (s + a)2 n! (s + a)n+1 ωo 2 (s + a)2 + ωo s+a 2 (s + a)2 + ωo .7: Properties of the Laplace transform.7.

evidently. However. s+2 22 Actually. another use use for the Laplace transform happens to arise. diﬀerentiation in the time (untransformed) domain corresponds to multiplication by the transform variable s in the frequency (transformed) domain. by the way. for it has a F diﬀerentiation property. which looks rather alike. f (0) = 1 2 .544 CHAPTER 18. dt t=0− Applying the properties of Table 18. (d/dv)f (v) → ivF (s). s+2 t=0− − d f (t) dt = t=0− 1 . Ch. The diﬀerence however lies in Laplace’s extra Laplace term −f (0− ) which. 23 [40. but we’ll not use it. one might say the same of the Fourier transform. where f (t) ≡ [1 d/dt]T f (t).7 that (d/dt)f (t) → sF (s) − f (0− ). Example 19. consider for example the linear diﬀerential equation23. Now. To see the signiﬁcance. Laplace does bring an uncouth contour-integrating inverse transformation formula in footnote 25. d f (t) = 2. term by term. (s2 + 4s + 3)F (s) − (s + 4) f (t) + 3F (s) = 1 . dt2 dt f (t)|t=0− = 1. The latL ter use emerges from the Laplace property of Table 18. 8] d f (t) = dt 0 −3 1 −4 f (t) + 0 e−2t . The eﬀort required to assimilate the notation rewards the student with signiﬁcant insight into the manner in which initial conditions—here symbolized f (0)—determine a system’s subsequent evolution. written in the statespace style [56. represents the untransformed function’s initial condition.22 This depends on the application.24 d2 d f (t) + 4 f (t) + 3f (t) = e−2t . THE FOURIER AND LAPLACE TRANSFORMS Laplace does not. formally. yields the transformed equation s2 F (s) − s f (t) − d f (t) dt t=0− t=0− t=0− + 4 sF (s) − f (t) That is.8. too.7 and transforms of Table 18. . signiﬁcantly.31] 24 It is rather enlightening to study the same diﬀerential equation. according to which.

However. F (s) = Factoring the denominator. (s + 1)(s + 2)(s + 3) s2 + 8s + 0xD . The careful reader might object that we have never proved that the Laplace transform cannot map distinct time-domain functions atop one another in the frequency domain. SOLVING DIFFERENTIAL EQUATIONS BY LAPLACE Applying the known initial conditions. Isolating the heretofore unknown frequency-domain function F (s). What one thus ought to ask is whether the Laplace transform can map time-domain functions. (s + 2)(s2 + 4s + 3)F (s) = s2 + 8s + 13. (s + 2)(s2 + 4s + 3) 1 . (s2 + 4s + 3)F (s) − (s + 6) = Multiplying by s + 2 and rearranging. s+1 s+2 s+3 Though we lack an inverse transformation formula. 25 . a domain Laplace ignores.6). the functions being distinct for t ≥ 0. F (s) = s2 + 8s + 0xD . even the careful reader will admit that the suggested f2 (t) diﬀers from f (t) only over t < 0.18.7. term by term.25 This section’s Laplace technique neatly solves many linear diﬀerential equations. that we have never shown the Laplace transform to be invertible. The objection has merit. it seems that we do not need one because—having split the frequency-domain equation into such simple terms—we can just look up the inverse transformation in Table 18. whose Laplace transform does not diﬀer from that of f (t). Consider for instance the time-domain function f2 (t) = u(t)[3e−t − e−2t − e−3t ]. that is. The time-domain solution f (t) = 3e−t − e−2t − e−3t results to the diﬀerential equation with which we started. s+2 1 . 3 1 1 F (s) = − − . s+2 545 Expanding in partial fractions (this step being the key to the whole technique: see § 9. (s2 + 4s + 3)F (s) − (s + 4)[1] − [2] = Combining like terms.8.

(Note that these are not transform pairs as in Tables 18. s→∞ t→∞ lim f (t) = lim sF (s). eqn. 7. To consider the choice of contours of integration and otherwise to polish the answer is left as an exercise to the interested reader.7 though eﬀective is sometimes too much work. s→∞ d f (t) dt = 0− dt f (0+ ) − f (0− ) = s→∞ s→∞ lim sF (s) − f (0− ). the precondition being met by any f (t) = u(t)g(t) (in which u[t] is formally deﬁned for the present purpose such that u[0] = 1). which invoke the time-diﬀerentiation property of Table 18. s→0 (18. f (0+ ) = lim sF (s). for one can check the correctness. lim sF (s) − f (0− ). The Laplace transform’s initial.8 but actual equations. However. which results [56. one can answer the latter question formally nonetheless by changing s ← iω in (18. and probably also the suﬃciency.7 and 18. f (t) = (1/i2π) −i∞ est F (s) ds. lim sF (s) − f (0− ). to cause the limits of integration involved to behave nicely.8 Initial and ﬁnal values by Laplace The method of § 18. when all one wants to know are the initial and/or ﬁnal values of a function f (t).1) and observing the peculiar. In one sense it may be unnecessary to answer even the latter question. here it is noted only that. contour-integrating inverse of the Laplace i∞ transform. .7 and the last of atop one another in the frequency domain. one might insist as a precondition to answering the question something like that f (t) = 0 for all t < 0.) One derives the initial-value theorem by the successive steps d f (t) = dt d e−st f (t) dt = dt 0+ s→∞ 0− lim s→∞ ∞ lim L s→∞ lim sF (s) − f (0− ).546 CHAPTER 18.2]. of any solution Laplace might oﬀer to a particular diﬀerential equation by the expedient of substituting the solution back into the equation.and ﬁnal-value theorems.52) meet this want. when one is not interested in the details between. THE FOURIER AND LAPLACE TRANSFORMS 18.

is a spatial frequency. This book tends to reﬂect its author’s preference for ei(−ωt+k·r) . into the more general.26 18. which diﬀers by discipline.18. or by whichever pair of letters is let to represent the complementary variables of transformation.9. To review the general interpretation and use of such a factor lies beyond the chapter’s scope but the factor’s very form. There results the spatial Fourier transform F (k) = 1 (2π)3/2 1 f (r) = (2π)3/2 e+ik·r f (r) dr. suggests Fourier transformation with respect not merely to time but also to space.52)’s second line follows immediately. [56.53) e −ik·r F (k) dk. s→0 and (18. s→0 = lim sF (s) − f (0− ).9 The spatial Fourier transform In the study of wave mechanics. physicists and engineers sometimes elaborate this chapter’s kernel eivθ or eiωt .5) but cubing the 1/ 2π scale factor for the triple integration and reversing the sign of the kernel’s exponent.5] The choice of sign here is a matter of convention. spatiotemporal phase factor27 ei(±ωt∓k·r) . s→0 = lim sF (s) − f (0− ). V √ analogous to (18. where k and r are threedimensional geometrical vectors and r in particular represents a position in space. THE SPATIAL FOURIER TRANSFORM which implies (18. one begins d f (t) s→0 dt ∞ d lim e−st f (t) dt s→0 0− dt ∞ d f (t) dt − dt 0 lim f (t) − f (0− ) lim L t→∞ 547 = lim sF (s) − f (0− ). 27 26 . V (18. For the ﬁnal value. analogous to ω. s→0 = lim sF (s) − f (0− ). also for other reasons called a propagation vector.52)’s ﬁrst line. ei(∓ωt±k·r) = ei(∓ωt±kx x±ky y±kz z) . convenient in electrical modeling but slightly less convenient in quantum-mechanical work. The transform variable k. § 7.

F (kz ) = √ 2π −∞ 1 F (kρ ) = e+ikρ ·ρ f (ρ) dρ. one can restrict it to two dimensions or even one. 2π S 1 F (k) = e+ik·r f (r) dr. including a fourth integration to convert time t to temporal frequency ω while also converting position r to spatial frequency k. various plausibly useful Fourier transforms include ∞ 1 e−iωt f (t) dt. THE FOURIER AND LAPLACE TRANSFORMS Nothing prevents one from extending (18.53) to four dimensions. F (ω) = √ 2π −∞ ∞ 1 e+ikz z f (z) dz. ω) = (2π)2 V −∞ among others. . F (k. (2π)3/2 V ∞ 1 ei(−ωt+k·r) f (r. Thus. t) dt dr. On the other hand.548 CHAPTER 18.

the latter originally in English. and reading Lebedev or Andrews one does soon perceive the shape of the thing). diving thence ﬂuidly into the mathematics before anyone should notice that those authors have motivated but never quite delineated their topic (actually. the former available at the time of this writing in Richard A. whose celebrated handbook2 though not expressly a book on special functions is largely about them. inexpensive English translation. authors respectively in Russian and English of two of the better post-World War II books on the topic. it’s not a bad approach. before the ﬁrst mathematical symbol strikes the page. and the needful topic that most provokes his mathematical curiosity may be that of special functions. 2 [1] 3 [51] 1 549 . However. What is a special function? Trouble arises at once. so brieﬂy addressed by this book’s § 6.1 seem to decline to say what a special function is.” What are we to make of The books respectively are [48] and [3]. Lebedev and Larry C. Abramowitz and Stegun. Andrews. N. partial draft. for it is not easy to discover a precise deﬁnition of the term. merely suggesting in their respective prefaces some things it does.1.Chapter 19 Introduction to special functions [This chapter is a rough. an applied mathematician is pleased to follow a lesser muse.] No topic more stirs the pure mathematician’s imagination than that of number theory. are even silenter on the question. So it is said.N. The sober dictionary3 on the author’s desk correctly deﬁnes such mathematical terms as “eigenvalue” and “Fourier transform” but seems unacquainted with “special function. Silverman’s excellent.

in which the unknown is not a variable but a function f (w) that operates on a dummy variable of integration. and in that. yet. The deﬁnition will do to go on with. even so. 2π We have already met as (20. in how reluctantly they yield their mathematical secrets. the more intrepidly one explores their realm.1 The Gaussian pulse and its moments Ω(x) = exp −x2 /2 √ . useful results will start to ﬂow from it almost at once. in discretized form.4 or to solve a diﬀerential equation elementary functions alone cannot evaluate or solve. in matrix notation (Chs. in how often they connect seemingly unrelated phenomena. Such a deﬁnition approximates at least the aspect and use of the special functions this book means to treat. as in Ch. we have already met integral equations in disguise. w)f (w) dw = h(z). 2 through 5. 11). 4 It need not interest you but in case it does. which means no more than it seems to mean.4)—likely of a single and at most of a few complex scalar variables. harder to analyze and evaluate than the elementary functions of Chs. an integral equation is an equation like ∞ −∞ g(z. We surely will not begin to exhaust the topic in this book. A special function is an analytic function (§ 8. Actually. deﬁned in a suitably canonical form—that serves to evaluate an integral.15). INTRODUCTION TO SPECIAL FUNCTIONS this? Here is what. so maybe integral equations are not so strange as they look. 19. 11 through 14) resembling Gf = h. in how readily they conform to unexpected applications. The topic of special functions seems inexhaustible. The fascination of special functions to the scientist and engineer lies in how gracefully they analyze otherwise intractable physical models. The integral equation is just the matrix equation with the discrete vectors f and h replaced by their continuous versions ∆w f (j ∆w) and ∆z h(i ∆z) (the i representing not the imaginary unit here but just an index.550 CHAPTER 19. the more disquietly one feels that one had barely penetrated the realm’s frontier. . to solve an integral equation.

thus indicating not only the length but the degree of uncertainty in the length. the nomenclature is ﬁne once you know what it means. none may be so counterintuitive yet. 25-year-old American male is 70 ± 3 inches tall. I mean that it is about three inches. Actually. Probability in its guise as statistics is the mathematics which produces.] Of all mathematical ﬁelds of study. whether as probability in the technical term’s conventional. I likely do not mean that it is exactly three inches.1 inches. If I tell you that my thumb is three inches long. randomly chosen sample of 25-year-old American males ought The nomenclature is slightly unfortunate.1 Man’s mind. inferred from actual measurements of some number N > 1 of 25-year-old American males. I might report the length as 3. 1 551 . so probability is the mathematics of uncertainty. to be precise— of a typical.AEC5. seems somehow to rebel against the concept. restricted meaning or as probability in its expanded or inverted guise as statistics.Chapter 20 Probability [This chapter is still quite a rough draft. Statistics (singular noun) the expanded mathematical discipline—as opposed to the statistics (plural noun) mean and standard deviation of § 20.0 ± 0.6. but the chapter will have at least a little to say about it in § 20. untrained. say. so widely applied as that of probability. More obviously statistical is a report that the average. As calculus is the mathematics of change. Sustained reﬂection on the concept however gradually reveals to the mind a fascinating mathematical landscape.2—as such lies mostly beyond this book’s scope. for the report implies among other things that a little √ 1 over two-thirds—(1/ 2π) −1 exp(−τ 2 /2) dτ ≈ 0x0. but on ﬁrst encounter it provokes otherwise unnecessary footnotes like this one. Quantitatively. Were statistics called “inferred probability” or “probabilistic estimation” the name would suggest something like the right taxonomy. paradoxically. analyzes and interprets such quantities. Deep mathematics underlie such a report.

Other than by this brief introduction. often an assumption of symmetry such as that no face of a die or card from a deck ought to turn up more often than another. is the discovery and derivation of the essential mathematical functions of probability theory (including in § 20. It means that the ratio of the number of three-of-the-samerank events to the number of trials must converge exactly upon 1/425 as the number of trials tends toward inﬁnity. were I to shuﬄe 425 decks and you to draw three cards from each.W. However. queen and ten. then you might draw three of the same rank from two. then there would be some ﬁnite probability—which is (3/51)(2/50) = 1/425—that the three cards you draw would share the same rank (why?).1 million nor as few as 0. in the form of the present chapter. three or four decks. though very unlikely from twenty decks—so what does a probability of 1/425 really mean? The answer is that it means something like this: were I to shuﬄe 425 million decks then you would draw three of the same rank from very nearly 1. See also § 4. If I should however shuﬄe the deck. why?). PROBABILITY to be found to have heights between 67 and 73 inches. this knowledge alone must slightly lower your estimation of the probability that your three would subsequently share the same rank as one another to (40/49)(3/48)(2/47) + (9/49)(2/48)(1/47) ≈ 1/428 (again. Entertaining so plausible an assumption. . draw three cards oﬀ the top. The reason it is not is that its mathematics in this case is based not on observation but on a teleological assumption of some kind. if you should draw three cards at random from a standard 52-card deck (let us use decimal notation rather than hexadecimal in this paragraph. and look at the three cards without showing them to you. Probability is also met in games of chance and in models of systems which—from the model’s point of view—logically resemble games of chance. Hamming’s ﬁne book [30] ably ﬁlls such a role.552 CHAPTER 20.2. That the probability should be 1/425 suggests that one would draw three of the same rank once in 425 tries. and you saw that they were ace. then the probability that your three would share the same rank were again 1/425 (why?). since neither you nor I have ever heard of a 0x34-card deck). On the other hand. all before inviting you to draw three.5 the derivation of one critical result undergraduate statistics textbooks 2 The late R. and in this setting probability is not statistics.0 million decks—almost certainly not from as many as 1. the book you are reading is not well placed to oﬀer a gentle tutorial in probabilistic thought. the deck comprising four cards each of thirteen ranks.9 million. or from none at all. if before you drew I let you peek at my three hidden cards.2 What it does oﬀer.

20. Nevent Pevent ≡ lim . as for example we would be if f (x) = δ(x) and a = 0. you can stay here and work through the concepts on your own. ℑ[f (x)] = 0. ∞ 1= f (x) dx. omit to prove).20.1. If the sentence and section make little sense to you then so likely will the rest of the chapter. (20. The notation in this case is not really interested in the bounding points themselves.3) . but any statistics text you might ﬁnd conveniently at hand should ﬁll the gap—which is less a mathematical gap than a conceptual one. P(a−ǫ)(b+ǫ) . Conceptually.1) where the event of interest is that the random variable x fall3 in the interval4 a < x < b and Pba is the probability of this event. if deﬁant. DEFINITIONS AND BASIC CONCEPTS 553 usually state but. −∞ F (x) ≡ where f (τ ) dτ. then we can always write in the style of P(0− )b . except that such notational niceties would distract from the point the notation means to convey. Pba = F (b) − F (a). often met in the literature.1 Deﬁnitions and basic concepts A probability is the chance that a trial of some kind will result in some speciﬁed event of interest.2) 0 = F (−∞). We can even be most explicit and write in the style of P {a ≤ x ≤ b}. P(a+ǫ)(b−ǫ) or the like. plus a brief investigation of these functions’ principal properties and typical use. This sentence and the rest of the section condense somewhat lengthy tracts of an introductory collegiate statistics text like [74][2][49][60]. If we are interested the bounding points. N →∞ N where N and Nevent are the numbers respectively of trials and events. among others. The corresponding cumulative distribution function (CDF) is x 0 ≤ f (x). 4 We might as well have expressed the interval a < x < b as a ≤ x ≤ b or even as a ≤ x < b. A probability density function (PDF) or distribution is a function f (x) deﬁned such that b Pba = a f (x) dx. P(0+ )b . understandably. −∞ (20. 3 (20. Or. 1 = F (∞).

It is easy enough to see that the product P = P1 P2 (20. where. That is. f2 (x) ∗ f1 (x) = f1 (x) ∗ f2 (x) ≡ ∞ −∞ f1 x x − τ f2 +τ 2 2 dτ. . = a b = a which in consideration of (20. Harder to see. is that the convolution f (x) = f1 (x) ∗ f2 (x) (20.5. per Table 18.6).1) implies (20. (20. PROBABILITY The quantile F −1 (·) inverts the CDF F (x) such that F −1 [F (x)] = x. independent events will occur. but just as important. if you think about it in a certain way.4) generally calculatable by a Newton-Raphson iteration (4.554 CHAPTER 20.6) of two probability density functions composes the single probability density function of the sum of two random variables x = x1 + x2 .31) if by no other means. the probability that a < x1 + x2 < b cannot but be Pba = = = = a b ǫ→0+ lim ∞ k=−∞ ∞ k=−∞ b (k+1/2)ǫ b−kǫ f1 (x) dx (k−1/2)ǫ b a−kǫ f2 (x) dx ǫ→0+ ∞ −∞ b lim ǫf1 (kǫ) a f2 (x − kǫ) dx f1 (τ ) a ∞ −∞ ∞ −∞ ∞ −∞ f2 (x − τ ) dx dτ f1 (τ )f2 (x − τ ) dτ dx f1 f1 x x + τ f2 −τ 2 2 x x − τ f2 +τ 2 2 dτ dx dτ dx.5) of two probabilities composes the single probability that not just one but both of two.

7) suggests.2.2 The statistics of a distribution A probability density function f (x) describes a distribution whose mean µ and standard deviation σ are deﬁned such that µ≡ x = σ 2 ≡ (x − x )2 = ∞ −∞ ∞ −∞ f (x)x dx. The standard deviation σ measures a random variable’s typical excursion from the mean. (20. .20. deﬁned as the ﬁrst line of (20. The mean and standard deviation are statistics of the distribution. THE STATISTICS OF A DISTRIBUTION 555 20. about which a random variable should center.7) f (x)(x − µ)2 dx. The mean µ is just the distribution’s average. but these two are the most important ones and are the two this book treats. where · indicates the expected value of the quantity enclosed. it was saying that his height could quantitatively be modeled as a random variable drawn from a distribution whose statistics are µ = 70 inches and σ = 3 inches.5 When the chapter’s introduction proposed that the average 25-year-old American male were 70 ± 3 inches tall. 5 Other statistics than the mean and standard deviation are possible.

For the mean. substituting (20. −∞ That is. but at least it is nice to know that our mathematics is working as it should. substituting (20.6) into the ﬁrst line of (20.7). (20.7). µ = = = = = = ∞ −∞ ∞ −∞ ∞ −∞ ∞ −∞ ∞ −∞ ∞ −∞ [f1 (x) ∗ f2 (x)]x dx ∞ −∞ f1 x x − τ f2 +τ 2 2 ∞ dτ x dx f2 (τ ) f2 (τ ) f2 (τ ) −∞ ∞ f1 (x − τ )x dx dτ f1 (x)(x + τ ) dx dτ f1 (x)x dx + τ ∞ −∞ −∞ ∞ −∞ f1 (x) dx dτ f2 (τ )[µ1 + τ ] dτ ∞ = µ1 f2 (τ ) dτ + ∞ −∞ f2 (τ )τ dτ. µ = µ1 + µ2 . . σ2 = = ∞ −∞ ∞ −∞ [f1 (x) ∗ f2 (x)](x − µ)2 dx ∞ −∞ f1 x x − τ f2 +τ 2 2 dτ (x − µ)2 dx.6) into the second line of (20. The standard deviation of the sum of two random variables is such that.8) which is no surprise.556 CHAPTER 20.3 The sum of random variables The statistics of the sum of two random variables x = x1 + x2 are of interest. PROBABILITY 20.

Applying (20.8),

    \sigma^2 = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_1\!\left(\frac{x}{2} - \tau\right) f_2\!\left(\frac{x}{2} + \tau\right) d\tau\;(x - \mu_1 - \mu_2)^2\,dx
             = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_1\!\left(\frac{x + \mu_1}{2} - \tau\right) f_2\!\left(\frac{x + \mu_1}{2} + \tau\right) d\tau\;(x - \mu_2)^2\,dx
             = \int_{-\infty}^{\infty} f_2(\tau) \left[ \int_{-\infty}^{\infty} f_1(x + \mu_1 - \tau)(x - \mu_2)^2\,dx \right] d\tau
             = \int_{-\infty}^{\infty} f_2(\tau) \left[ \int_{-\infty}^{\infty} f_1(x)\,[(x - \mu_1) + (\tau - \mu_2)]^2\,dx \right] d\tau
             = \int_{-\infty}^{\infty} f_2(\tau) \left[ \int_{-\infty}^{\infty} f_1(x)(x - \mu_1)^2\,dx + 2(\tau - \mu_2) \int_{-\infty}^{\infty} f_1(x)\,x\,dx + (\tau - \mu_2)(\tau - \mu_2 - 2\mu_1) \int_{-\infty}^{\infty} f_1(x)\,dx \right] d\tau.

Applying (20.7) and (20.1),

    \sigma^2 = \int_{-\infty}^{\infty} f_2(\tau)\,\left[ \sigma_1^2 + 2(\tau - \mu_2)\mu_1 + (\tau - \mu_2)(\tau - \mu_2 - 2\mu_1) \right] d\tau
             = \int_{-\infty}^{\infty} f_2(\tau)\,\left[ \sigma_1^2 + (\tau - \mu_2)^2 \right] d\tau
             = \sigma_1^2 \int_{-\infty}^{\infty} f_2(\tau)\,d\tau + \int_{-\infty}^{\infty} f_2(\tau)(\tau - \mu_2)^2\,d\tau.

Applying (20.7) and (20.1) again,

    \sigma^2 = \sigma_1^2 + \sigma_2^2.    (20.9)

If this is right, as indeed it is, then the act of adding random variables together not only adds the means of the variables' respective distributions according to (20.8) but also, according to (20.9), adds the squares of the standard deviations.
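A quick Monte Carlo experiment corroborates (20.8) and (20.9); the two source distributions and the sample count below are arbitrary choices for illustration.

    import numpy as np

    # Adding independent random variables adds the means and adds the
    # squares of the standard deviations, whatever the distributions are.

    rng = np.random.default_rng(1)
    x1 = rng.exponential(2.0, 1_000_000)   # mu1 = 2,   sigma1^2 = 4
    x2 = rng.random(1_000_000)             # mu2 = 1/2, sigma2^2 = 1/12
    x = x1 + x2

    print(x.mean(), 2.0 + 0.5)             # (20.8)
    print(x.var(), 4.0 + 1.0 / 12.0)       # (20.9), stated in variances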

It follows directly that, if N independent instances x_1, x_2, . . . , x_N of a random variable are drawn from the same distribution f_o(x_k), the distribution's statistics being µ_o and σ_o, then the statistics of their sum x = \sum_{k=1}^{N} x_k = x_1 + x_2 + \cdots + x_N are

    \mu = N\mu_o, \quad \sigma = \sqrt{N}\,\sigma_o.    (20.10)

20.4 The transformation of a random variable

If x_o is a random variable obeying the distribution f_o(x_o) and g(\cdot) is some invertible function whose inverse per (2.49) is styled g^{-1}(\cdot), then x \equiv g(x_o) is itself a random variable obeying the distribution

    f(x) = \left. \frac{f_o(x_o)}{|dg/dx_o|} \right|_{x_o = g^{-1}(x)}.    (20.11)

Another, suaver way to write the same thing is

    f(x)\,|dx| = f_o(x_o)\,|dx_o|.    (20.12)

Either way, this is almost obvious if seen from just the right perspective, but can in any case be supported symbolically by

    \int_a^b f_o(x_o)\,dx_o = \int_{g(a)}^{g(b)} f_o(x_o)\,\frac{dx_o}{dx}\,dx = \int_{g(a)}^{g(b)} \frac{f_o(x_o)}{dg/dx_o}\,dx.

One of the most frequently useful transformations is the simple

    x \equiv g(x_o) \equiv \alpha x_o, \quad \Re(\alpha) > 0, \quad \Im(\alpha) = 0.    (20.13)

For this, evidently dg/dx_o = \alpha, so according to (20.11) or (20.12),

    f(x) = \frac{1}{|\alpha|}\,f_o\!\left(\frac{x}{\alpha}\right).

If \mu_o = 0 and \sigma_o = 1, then \mu = 0 and, in train of (20.7),

    \sigma^2 = \int_{-\infty}^{\infty} f(x)\,x^2\,dx = \int_{-\infty}^{\infty} f_o(x_o)(\alpha x_o)^2\,dx_o = \alpha^2,

whereby \sigma = \alpha and we can rewrite the transformed PDF as

    f(x) = \frac{1}{\sigma}\,f_o\!\left(\frac{x}{\sigma}\right).    (20.14)
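The transformation rule (20.11) can likewise be sanity-checked numerically. In the following sketch the invertible function g(x_o) = exp(x_o) and the uniform source distribution are assumed purely for illustration.

    import numpy as np

    # If xo is uniform on [0,1) and x = g(xo) = exp(xo), then
    # |dg/dxo| = x, so (20.11) predicts f(x) = 1/x on [1, e).

    rng = np.random.default_rng(2)
    x = np.exp(rng.random(1_000_000))

    edges = np.linspace(1.0, np.e, 51)
    hist, _ = np.histogram(x, bins=edges, density=True)
    mid = (edges[:-1] + edges[1:]) / 2.0

    print(np.max(np.abs(hist - 1.0 / mid)))   # small: the histogram follows 1/x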

Assuming null mean, (20.14) states that the act of scaling a random variable flattens out the variable's distribution and scales its standard deviation, all by the same factor, which, naturally, is what one would expect it to do.

20.5 The normal distribution

Combining the ideas of §§ 20.3 and 20.4 leads us now to ask whether a distribution does not exist for which, when independent random variables drawn from it are added together, the sum obeys the same distribution, only the standard deviations differing. More precisely, we should like to identify a distribution

    f_o(x_o): \quad \mu_o = 0, \quad \sigma_o = 1,

for which, if x_1 and x_2 are random variables drawn respectively from the distributions

    f_1(x_1) = \frac{1}{\sigma_1}\,f_o\!\left(\frac{x_1}{\sigma_1}\right),
    f_2(x_2) = \frac{1}{\sigma_2}\,f_o\!\left(\frac{x_2}{\sigma_2}\right),

as (20.14) suggests, then x = x_1 + x_2 by construction is a random variable drawn from the distribution

    f(x) = \frac{1}{\sigma}\,f_o\!\left(\frac{x}{\sigma}\right),

where, per (20.9),

    \sigma^2 = \sigma_1^2 + \sigma_2^2.

There are several distributions one might try, but eventually the Gaussian pulse Ω(x_o) of Chs. 17 and 18,

    \Omega(x) \equiv \frac{\exp(-x^2/2)}{\sqrt{2\pi}},    (20.15)

recommends itself. This works; the distribution f_o(x_o) = \Omega(x_o) meets our criterion.

To prove that the distribution f_o(x_o) = \Omega(x_o) meets our criterion we shall first have to show that it is indeed a distribution according to (20.1). Especially, we shall have to demonstrate that

    \int_{-\infty}^{\infty} \Omega(x_o)\,dx_o = 1.

Fortunately, as it happens, we have already demonstrated this fact as (18.50), so, since \Omega(x_o) evidently meets the other demands of (20.1), it apparently is a proper distribution. That \mu_o = 0 for \Omega(x_o) is obvious by symmetry. That \sigma_o = 1 is shown by

    \int_{-\infty}^{\infty} \Omega(x_o)\,x_o^2\,dx_o = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} x_o^2 \exp\!\left(-\frac{x_o^2}{2}\right) dx_o
        = -\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} x_o\,d\!\left[\exp\!\left(-\frac{x_o^2}{2}\right)\right]
        = \left[ -\frac{x_o \exp(-x_o^2/2)}{\sqrt{2\pi}} \right]_{-\infty}^{\infty} + \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} \exp\!\left(-\frac{x_o^2}{2}\right) dx_o
        = 0 + \int_{-\infty}^{\infty} \Omega(x_o)\,dx_o,

from which, again according to (18.50),

    \int_{-\infty}^{\infty} \Omega(x_o)\,x_o^2\,dx_o = 1,    (20.16)

as was to be shown. Now having justified the assertions that \Omega(x_o) is a proper distribution and that its statistics are \mu_o = 0 and \sigma_o = 1, all that remains to be proved per (20.6) is that

    \frac{1}{\sigma_1}\,\Omega\!\left(\frac{x_o}{\sigma_1}\right) * \frac{1}{\sigma_2}\,\Omega\!\left(\frac{x_o}{\sigma_2}\right) = \frac{1}{\sigma}\,\Omega\!\left(\frac{x_o}{\sigma}\right), \quad \sigma_1^2 + \sigma_2^2 = \sigma^2,    (20.17)

which is to prove that the sum of Gaussian random variables is itself Gaussian.
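Before the Fourier argument, the two moment claims can also be corroborated by quadrature; the finite integration limits in this illustrative sketch are an assumption, the neglected tails being negligible.

    import numpy as np

    # Omega integrates to 1, and its mean and second moment (20.16)
    # are 0 and 1 respectively.

    xo = np.linspace(-12.0, 12.0, 200_001)
    omega = np.exp(-xo ** 2 / 2.0) / np.sqrt(2.0 * np.pi)

    print(np.trapz(omega, xo))            # ~ 1, per (18.50)
    print(np.trapz(omega * xo, xo))       # ~ 0, mu_o
    print(np.trapz(omega * xo ** 2, xo))  # ~ 1, sigma_o^2 per (20.16)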

We will prove it in the Fourier domain of Ch. 18 as follows. According to Ch. 18's tables,

    \frac{1}{\sigma_1}\,\Omega\!\left(\frac{x_o}{\sigma_1}\right) * \frac{1}{\sigma_2}\,\Omega\!\left(\frac{x_o}{\sigma_2}\right)
        = \mathcal{F}^{-1}\left\{ \sqrt{2\pi}\;\mathcal{F}\!\left[\frac{1}{\sigma_1}\,\Omega\!\left(\frac{x_o}{\sigma_1}\right)\right] \mathcal{F}\!\left[\frac{1}{\sigma_2}\,\Omega\!\left(\frac{x_o}{\sigma_2}\right)\right] \right\}
        = \mathcal{F}^{-1}\left\{ \sqrt{2\pi}\,\Omega(\sigma_1 x_o)\,\Omega(\sigma_2 x_o) \right\}
        = \mathcal{F}^{-1}\left\{ \sqrt{2\pi}\;\frac{\exp(-\sigma_1^2 x_o^2/2)}{\sqrt{2\pi}}\;\frac{\exp(-\sigma_2^2 x_o^2/2)}{\sqrt{2\pi}} \right\}
        = \mathcal{F}^{-1}\left\{ \frac{\exp[-(\sigma_1^2 + \sigma_2^2)x_o^2/2]}{\sqrt{2\pi}} \right\}
        = \mathcal{F}^{-1}\left\{ \Omega\!\left(\sqrt{\sigma_1^2 + \sigma_2^2}\;x_o\right) \right\}
        = \frac{1}{\sqrt{\sigma_1^2 + \sigma_2^2}}\,\Omega\!\left(\frac{x_o}{\sqrt{\sigma_1^2 + \sigma_2^2}}\right),

the last line of which is (20.17) in other notation, thus completing the proof.

The function \Omega(\cdot) turns out to be even more prominent in probability theory than in Fourier theory. In the Fourier context of Ch. 18 one usually names \Omega(\cdot) the Gaussian pulse, as we have seen, but in a probabilistic context it usually goes by the name of the normal distribution. This is what we will call \Omega(\cdot) through the rest of the present chapter. Alternate conventional names include those of the Gaussian distribution and the bell curve. (The Greek capital \Omega vaguely, accidentally resembles a bell, as does the distribution's plot, and we will not be too proud to take advantage of the accident, so that is how you can remember it if you like.) By whichever name, Fig. 20.1 plots the normal distribution \Omega(\cdot) and its cumulative distribution function (20.2).

Regarding the cumulative normal distribution function, one way to calculate it numerically is to integrate the normal distribution's Taylor series term by term. As it happens, § 9.9 has worked a very similar integral as an example, so this section will not repeat the details, but the result is

    F_\Omega(x_o) = \int_{-\infty}^{x_o} \Omega(\tau)\,d\tau
        = \frac{1}{2} + \frac{1}{\sqrt{2\pi}} \sum_{k=0}^{\infty} \frac{(-)^k x_o^{2k+1}}{(2k+1)\,2^k\,k!}
        = \frac{1}{2} + \frac{1}{\sqrt{2\pi}} \sum_{k=0}^{\infty} \frac{x_o^{2k+1}}{2k+1} \prod_{j=1}^{k} \frac{-x_o^2}{2j}.    (20.18)
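The term-by-term summation of (20.18) is easy to mechanize. A minimal Python sketch follows; the function name F_omega and the stopping tolerance are invented for illustration, and math.erf supplies an independent check of the result.

    import math

    # Sum the Taylor series (20.18) until the terms no longer matter.

    def F_omega(xo, tol=1e-16):
        total, term, k = 0.0, xo, 0
        while abs(term) > tol * (1.0 + abs(total)):
            total += term
            k += 1
            term *= -xo * xo / (2.0 * k)               # the product factor in (20.18)
            term *= (2.0 * k - 1.0) / (2.0 * k + 1.0)  # update the 1/(2k+1) factor
        return 0.5 + total / math.sqrt(2.0 * math.pi)

    for xo in (0.0, 1.0, 2.0):
        print(F_omega(xo), 0.5 * (1.0 + math.erf(xo / math.sqrt(2.0))))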

Figure 20.1: The normal distribution \Omega(x) \equiv (1/\sqrt{2\pi}) \exp(-x^2/2) and its cumulative distribution function F_\Omega(x) = \int_{-\infty}^{x} \Omega(\tau)\,d\tau. [The figure's two plots show \Omega(x), whose peak height is 1/\sqrt{2\pi}, and F_\Omega(x), which rises from 0 toward 1, each against x with ticks at x = \pm 1.]

Unfortunately, this Taylor series, though always theoretically correct, is practical only for small and moderate |x_o| \lesssim 1. For |x_o| \gg 1, see § 20.10.

The normal distribution tends to be the default distribution in applied mathematics. When one lacks a reason to do otherwise, one models a random quantity as a normally distributed random variable; see § 20.7 for the reason.

20.6 Inference of statistics

Suppose that several concrete instances of a random variable, collectively called a sample, were drawn from a distribution f(x) and presented to you, but that you were not told the shape of f(x). Could you infer the shape? The answer is that you could infer the shape with passable accuracy provided that the number N of samples were large. Typically however one will be prepared to make some assumption about the shape, such as that

    f(x) = \frac{1}{\sigma}\,\Omega\!\left(\frac{x - \mu}{\sigma}\right),    (20.19)

which is to assume that x were normally distributed with unknown statistics µ and σ. The problem then becomes to infer the statistics from the sample.

In the absence of additional information, one can hardly suppose much about the mean other than that

    \mu \approx \frac{1}{N} \sum_k x_k;    (20.20)

one infers the mean to be the average of the instances one has observed.

One might think to infer the standard deviation in much the same way except that to calculate the standard deviation directly according to (20.7) would implicate our imperfect estimate (20.20) of the mean. If we wish to estimate the standard deviation accurately from the sample then we shall have to proceed more carefully than that.

It will simplify the standard-deviational analysis to consider the shifted random variable

    u_k = x_k - \mu_\text{true}

instead of x_k directly, where \mu_\text{true} is not the estimated mean of (20.20) but the true, unknown mean of the hidden distribution f(x). The distribution of u then is f(u + \mu_\text{true}), a distribution which by construction has zero mean. (Naturally, we do not know, and shall never know, the actual value of \mu_\text{true}, but this does not prevent us from representing \mu_\text{true} symbolically during analysis.) We shall presently find helpful the identities

    \left\langle \sum_k u_k^2 \right\rangle = N\sigma_\text{true}^2,
    \left\langle \left( \sum_k u_k \right)^{\!2} \right\rangle = N\sigma_\text{true}^2,

the first of which is merely a statement of the leftward part of (20.7)'s second line with respect to the unknown distribution f(u + \mu_\text{true}) whose mean \langle u \rangle is null by construction; the second of which considers the sum \sum_k u_k as a random variable whose mean again is null but whose standard deviation \sigma_\Sigma according to (20.9) is such that \sigma_\Sigma^2 = N\sigma_\text{true}^2.

With the foregoing definition and identities in hand, let us construct from the available sample the quantity

    (\sigma')^2 \equiv \frac{1}{N} \sum_k \left[ x_k - \frac{1}{N} \sum_j x_j \right]^2,

which would tend to approach \sigma_\text{true}^2 as N grew arbitrarily large but which, unlike \sigma_\text{true}^2, is a quantity we can actually compute for any N > 1. By successive steps,

    (\sigma')^2 = \frac{1}{N} \sum_k \left[ (u_k + \mu_\text{true}) - \frac{1}{N} \sum_j (u_j + \mu_\text{true}) \right]^2
                = \frac{1}{N} \sum_k \left[ u_k - \frac{1}{N} \sum_j u_j \right]^2
                = \frac{1}{N} \sum_k u_k^2 - \frac{2}{N^2} \left( \sum_k u_k \right)\!\left( \sum_j u_j \right) + \frac{1}{N^2} \left( \sum_j u_j \right)^{\!2}
                = \frac{1}{N} \sum_k u_k^2 - \frac{1}{N^2} \left( \sum_k u_k \right)^{\!2},

the expected value of which is

    \langle (\sigma')^2 \rangle = \frac{1}{N} \left\langle \sum_k u_k^2 \right\rangle - \frac{1}{N^2} \left\langle \left( \sum_k u_k \right)^{\!2} \right\rangle.

Applying the identities of the last paragraph,

    \langle (\sigma')^2 \rangle = \sigma_\text{true}^2 - \frac{\sigma_\text{true}^2}{N} = \frac{N-1}{N}\,\sigma_\text{true}^2,

from which

    \sigma_\text{true}^2 = \frac{N}{N-1}\,\langle (\sigma')^2 \rangle.

Because the expected value \langle (\sigma')^2 \rangle is not a quantity whose value we know, we can only suppose that (\sigma')^2 \approx \langle (\sigma')^2 \rangle, whereby

    \sigma_\text{true}^2 \approx \frac{N}{N-1}\,(\sigma')^2,

and, substituting the definition of (\sigma')^2 into the last equation,

    \sigma^2 \approx \frac{1}{N-1} \sum_k \left[ x_k - \frac{1}{N} \sum_j x_j \right]^2.    (20.21)
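In code, the estimates (20.20) and (20.21) look as follows. The sketch, including its invented function name and its test numbers (the chapter's 70 ± 3 inches), is illustrative only.

    import numpy as np

    # The sample mean divides by N; the sample variance divides by
    # N - 1 to compensate for using the estimated mean for mu_true.

    def sample_statistics(x):
        x = np.asarray(x, dtype=float)
        N = x.size
        mu = x.sum() / N                          # (20.20)
        sigma2 = ((x - mu) ** 2).sum() / (N - 1)  # (20.21)
        return mu, np.sqrt(sigma2)

    rng = np.random.default_rng(3)
    mu, sigma = sample_statistics(rng.normal(70.0, 3.0, 1000))
    print(mu, sigma)    # near the hidden statistics, 70 and 3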

The estimates (20.20) and (20.21) are known as sample statistics. They are the statistics one imputes to an unknown distribution based on the incomplete information of N > 1 samples. Such questions as the estimates' reliability confound two separate uncertainties: the uncertainty inherent by definition (20.1) in a random variable even were the variable's distribution precisely known; and the uncertain knowledge of the distribution.⁷ Fortunately, if N ≫ 1, one can usually tear the two uncertainties from one another without undue violence to accuracy, supposing that the distribution were the normal (20.19), pretending that one knew the unknown µ and σ to be exactly the values (20.20) and (20.21) respectively calculate, and modeling on this basis. If further elaborated, the mathematics of statistics rapidly grow much more complicated. The book will not pursue the matter further but will mention that the kinds of questions that arise tend to involve the statistics of the statistics themselves, treating the statistics as random variables.

This chapter generally assumes independent random variables when it speaks of probability. In statistical work however one must sometimes handle correlated quantities like the height and weight of a 25-year-old American male; for, if I point to some 25-year-old over there and say, "That's Pfufnik. The average is 160 pounds, but he weighs 250!" then your estimate of his probable height will change, obviously, because height and weight are not independent but correlated. The conventional statistical measure⁶ of the correlation of a series (x_k, y_k) of pairs of data, such as the ([height]_k, [weight]_k) of the example, is the correlation coefficient

    r \equiv \frac{\sum_k (x_k - \mu_x)(y_k - \mu_y)}{\sqrt{\sum_k (x_k - \mu_x)^2\,\sum_k (y_k - \mu_y)^2}},    (20.22)

a unitless quantity whose value is ±1, indicating perfect correlation, when y_k = x_k or even when y_k = a_1 x_k + a_0, but whose value should be near zero when the paired data are unrelated. See Fig. 13.1 for another example of the kind of paired data in whose correlation one might be interested: in the figure, the correlation would be +1 if the points all fell right on the line.

⁶ [74, §§ 12-6 and 12-14]. Beware that the conventional correlation coefficient of eqn. 20.22 can overstate the relationship between paired data when N is small; consider for instance that r = ±1 always when N = 2. The coefficient as given nevertheless is conventional.

⁷ The subtle mathematical implications of this far exceed the scope of the present book but are developed to one degree or another in numerous collegiate statistics texts of which [74][2][49][60] are representative examples.
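A direct transcription of (20.22), again as an illustrative Python sketch; the linear height-weight model and its coefficients below are invented test data, not statistics from the text.

    import numpy as np

    # r is near +1 or -1 for nearly linear paired data, near 0 otherwise.

    def correlation(x, y):
        x, y = np.asarray(x, float), np.asarray(y, float)
        dx, dy = x - x.mean(), y - y.mean()
        return (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

    rng = np.random.default_rng(4)
    height = rng.normal(70.0, 3.0, 500)
    weight = 4.0 * height - 120.0 + rng.normal(0.0, 5.0, 500)  # correlated
    noise = rng.normal(0.0, 1.0, 500)                          # unrelated

    print(correlation(height, weight))   # near +1
    print(correlation(height, noise))    # near 0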

20.7 The random walk and its consequences

This section brings overview, insight and practical perspective.

20.7.1 The random walk

Matthew Sands gave a famous lecture [22, § I:6], on behalf of Richard P. Feynman at Caltech in the fall of 1961, on probability. The lecture is a classic and is recommended to every reader of this chapter who can conveniently lay hands on a printed copy, recommended among other reasons because it lends needed context to the rather abstruse mathematics this chapter has presented to the present point. It also analyzes the simple but often encountered statistics of a series of all-or-nothing-type attempts. One section of the lecture begins:

    "There is [an] interesting problem in which the idea of probability is required. It is the problem of the 'random walk.' In its simplest version, we imagine a 'game' in which a 'player' starts at the point [D = 0] and at each 'move' is required to take a step either forward (toward [+D]) or backward (toward [−D]). The choice is to be made randomly, determined, for example, by the toss of a coin. How shall we describe the resulting motion?"

Sands goes on to observe that, though one cannot guess whether the "player" will have gone forward or backward after N steps (indeed, in the absence of other information one must expect \langle D_N \rangle = 0, zero net progress), "[one has] the feeling that as N increases, [the 'player'] is likely to have strayed farther from the starting point."

Sands is right, but if \langle D_N \rangle is not a suitable measure of this "likely straying," then what would be? The measure \langle |D_N| \rangle might recommend itself, but this being nonanalytic (§§ 2.12.3 and 8.4) proves inconvenient in practice (you can try it if you like). The success of the least-squares technique of § 13.6 however encourages us to try the measure \langle D_N^2 \rangle. The squared distance D_N^2 is nonnegative in every instance and also is analytic, so its expected value \langle D_N^2 \rangle proves a most convenient measure of "likely straying," so to speak. It is moreover a measure universally accepted among scientists and engineers, and it is the measure this book will adopt.

And what is this expected value? Sands finds two

possibilities: either the "player" steps forward on his Nth step, in which case

    \langle D_N^2 \rangle = \langle (D_{N-1} + 1)^2 \rangle = \langle D_{N-1}^2 \rangle + 2\langle D_{N-1} \rangle + 1,

or he steps backward on his Nth step, in which case

    \langle D_N^2 \rangle = \langle (D_{N-1} - 1)^2 \rangle = \langle D_{N-1}^2 \rangle - 2\langle D_{N-1} \rangle + 1.

Since forward and backward are equally likely, the actual expected value must be the average

    \langle D_N^2 \rangle = \langle D_{N-1}^2 \rangle + 1

of the two possibilities. Evidently, the expected value increases by 1 with each step. Thus by induction, since D_0 = 0,

    \langle D_N^2 \rangle = N.

The PDF of D_N is more complicated (though not especially hard to calculate in view of § 4.2), but its statistics are evidently \mu_N = 0 and \sigma_N = \sqrt{N}, agreeing with (20.10). Observe that the PDF of a single step x_k is f_o(x_o) = [\delta(x_o + 1) + \delta(x_o - 1)]/2, where \delta(\cdot) is the Dirac delta of Ch. 7, and that the corresponding statistics are \mu_o = 0 and \sigma_o = 1.

20.7.2 Consequences

An important variation of the random walk comes with the distribution

    f_o(x_o) = (1 - p_o)\,\delta(x_o) + p_o\,\delta(x_o - 1),    (20.23)

which describes or governs an act whose probability of success is p_o. This distribution's statistics according to (20.7) are such that

    \mu_o = p_o, \quad \sigma_o^2 = (1 - p_o)\,p_o.    (20.24)

As an example of the use, consider a real-estate agent who expects to sell one house per 10 times he shows a prospective buyer a house: p_o = 1/10 = 0.10. The agent's expected result from a single showing, according to (20.24), is to sell \mu_o \pm \sigma_o = 0.10 \pm 0.30 of a house. The agent's expected result from N = 400 showings, according to (20.10), is to sell \mu \pm \sigma = N\mu_o \pm \sqrt{N}\,\sigma_o = 40.0 \pm 6.0 houses. Such a conclusion, of course, is valid only to the extent to which the model is valid (which in a real-estate agent's case may be not very), but that nevertheless is how the mathematics of it work.
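One can watch \langle D_N \rangle \approx 0 and \langle D_N^2 \rangle \approx N emerge by simulating many walks; the walk count and seed in this Python sketch are arbitrary.

    import numpy as np

    # Many random walks of N equally likely steps of +1 or -1.

    rng = np.random.default_rng(5)
    N, walks = 400, 100_000
    D = rng.choice([-1, 1], size=(walks, N)).sum(axis=1)

    print(D.mean())                         # ~ 0
    print((D.astype(float) ** 2).mean())    # ~ N = 400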

As the number N of attempts grows large one finds that the distribution f(x) of the number of successes begins more and more to take on the bell-shape of Fig. 20.1's normal distribution. Indeed, this makes sense, for one would expect the aforementioned real-estate agent to have a relatively high probability of selling 39, 40 or 41 houses but a low probability to sell 10 or 70. If for 400 showings the distribution is f(x) then, for 800 = 400 + 400 showings, the distribution must be f(x) * f(x); thus one would expect f(x) to take on something rather like the bell-shape. Since the only known PDF which, when convolved with itself, does not change shape is the normal distribution of § 20.5, one infers that the normal distribution is the PDF toward which the real-estate agent's distribution (and indeed most other distributions of sums of random variables) must converge⁸ as N → ∞.

In the theory and application of probability, the normal distribution is the master distribution, the distribution of last resort, often the only distribution tried. Applications tend to approximate sums of several random variables as though the sums were normally distributed and, moreover, tend to impute normal distributions to random variables whose true distributions are unnoticed, uninteresting or unknown. The banal suggestion, "When unsure, go normal!" usually prospers in probabilistic work.

⁸ Admittedly, the argument, which supposes that all (or at least most) aggregate PDFs must tend toward some common shape as N grows large, is somewhat specious, or at least unrigorous, though on the other hand it is hard to imagine any plausible conclusion other than the correct one the argument reaches. One can construct an alternate though tedious argument toward the normal distribution on the following basis. Counting permutations per § 4.2, derive an exact expression for the probability of k successes in N tries, which is

    P_k = \binom{N}{k}\,p^k (1-p)^{N-k}.

To render the arithmetic tractable one might try first the specific case of p = 1/2 and make various arithmetical approximations as one goes. Considering also the probabilities of k − 1 and k + 1 successes in N tries, approximate the logarithmic derivative of P_k as (\partial P_k/\partial k)/P_k \approx (P_{k+1} - P_{k-1})/2P_k or, better (remembering that suitable arithmetical approximations are permissible in such work), as (\partial P_k/\partial k)/P_k \approx (P_{k+1/2} - P_{k-1/2})/P_k. Discover a continuous, analytic function that for large N has a similar logarithmic derivative in the distribution's peak region, which is f(x) = C\exp(-ax^2). Change x \leftarrow (k - p_o N)/\sqrt{(1 - p_o)p_o N}. The author confesses that he prefers the specious argument of the narrative, but to fill in the tedious details is left as an exercise to the interested (penitent?) reader.
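Footnote 8's exact expression P_k can be compared against the normal density directly; this illustrative sketch uses the real-estate numbers p = 0.10 and N = 400 and a few arbitrarily chosen k.

    import math

    # Compare P_k = C(N,k) p^k (1-p)^(N-k) against the normal density
    # with mu = p N and sigma = sqrt((1-p) p N). P_k is a probability
    # mass, but the step in k is 1, so the columns are comparable.

    p, N = 0.10, 400
    mu, sigma = p * N, math.sqrt((1.0 - p) * p * N)

    for k in (34, 40, 46):
        exact = math.comb(N, k) * p ** k * (1.0 - p) ** (N - k)
        normal = math.exp(-((k - mu) / sigma) ** 2 / 2.0) / (sigma * math.sqrt(2.0 * math.pi))
        print(k, exact, normal)    # the two columns roughly agree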

20.8 Other distributions

Many distributions other than the normal one of Fig. 20.1 are possible. This section will name a few of the most prominent.

20.8.1 The uniform distribution

The uniform distribution can be defined in any of several forms, but the conventional form is

    f(x) = \Pi\!\left(x - \frac{1}{2}\right) = \begin{cases} 1 & \text{if } 0 \le x < 1, \\ 0 & \text{otherwise,} \end{cases}    (20.25)

where Π(·) is the square pulse of Ch. 17. Besides sometimes being useful in its own right, this is also the distribution a computer's pseudorandom-number generator obeys. One can extract normally distributed (§ 20.5) or Rayleigh-distributed (§ 20.8.3) random variables from it by the Box-Muller transformation of § 20.9.

20.8.2 The exponential distribution

The exponential distribution is⁹

    f(x) = \frac{1}{\mu} \exp\!\left(-\frac{x}{\mu}\right), \quad x \ge 0,    (20.26)

whose mean is

    \langle x \rangle = \frac{1}{\mu} \int_0^{\infty} \exp\!\left(-\frac{x}{\mu}\right) x\,dx = \left[ -\exp\!\left(-\frac{x}{\mu}\right)(x + \mu) \right]_0^{\infty} = \mu

as advertised, and whose standard deviation is such that

    \sigma^2 = \frac{1}{\mu} \int_0^{\infty} \exp\!\left(-\frac{x}{\mu}\right)(x - \mu)^2\,dx = \left[ -\exp\!\left(-\frac{x}{\mu}\right)(x^2 + \mu^2) \right]_0^{\infty} = \mu^2

(the integration by the method of unknown coefficients of § 9.4), which implies that

    \sigma = \mu.    (20.27)

The exponential's CDF (20.2) and quantile (20.4) are evidently

    F(x) = 1 - \exp\!\left(-\frac{x}{\mu}\right),
    F^{-1}(u) = -\mu \ln(1 - u).    (20.28)

Among other effects, the exponential distribution models the delay until some imminent event like a mechanical bearing's failure or the arrival of a retail establishment's next customer.

⁹ Unlike the section's other subsections, this one explicitly includes the mean µ in the expression (20.26) of its distribution. The inclusion of µ here is admittedly inconsistent; however, in typical applications the entire point of choosing the exponential distribution may be to specify µ, or to infer it. The exponential distribution is inherently "µ-focused," so to speak. The author prefers to leave the µ in the expression for this reason. The reader who prefers to do so can mentally set µ = 1 and read the section in that light.
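The quantile (20.28) converts uniform instances to exponential ones, as § 20.9 will exploit more generally. A minimal Python sketch, with µ = 2 assumed arbitrarily:

    import math
    import random

    # If u is uniform on [0,1), then x = -mu ln(1-u) obeys (20.26).
    # Both the sample mean and the sample standard deviation should
    # come out near mu, per (20.27).

    mu = 2.0
    sample = [-mu * math.log(1.0 - random.random()) for _ in range(100_000)]

    mean = sum(sample) / len(sample)
    var = sum((x - mean) ** 2 for x in sample) / (len(sample) - 1)
    print(mean, math.sqrt(var))    # both ~ 2.0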

20.8.3 The Rayleigh distribution

The Rayleigh distribution is a generalization of the normal distribution for position in a plane. Let each of the x and y coordinates be drawn independently from a normal distribution of zero mean and unit standard deviation, such that

    dP \equiv [\Omega(x)\,dx]\,[\Omega(y)\,dy]
       = \frac{1}{2\pi} \exp\!\left(-\frac{x^2 + y^2}{2}\right) dx\,dy
       = \frac{1}{2\pi} \exp\!\left(-\frac{\rho^2}{2}\right) \rho\,d\rho\,d\phi,

whence

    P_{ba} \equiv \int_{\phi=-\pi}^{\pi} \int_{\rho=a}^{b} dP
           = \frac{1}{2\pi} \int_{-\pi}^{\pi} \int_a^b \exp\!\left(-\frac{\rho^2}{2}\right) \rho\,d\rho\,d\phi
           = \int_a^b \exp\!\left(-\frac{\rho^2}{2}\right) \rho\,d\rho,

which implies the distribution

    f(\rho) = \rho \exp\!\left(-\frac{\rho^2}{2}\right), \quad \rho \ge 0.    (20.29)

This is the Rayleigh distribution.

8. Maxwell’s derivation closely resembles Rayleigh’s. the speed at which an air molecule might travel. Rayleigh’s CDF (20. for the integrand exp(−ρ2 /2)ρ dρ dφ above includes no φ. I:40. with the diﬀerence that Maxwell uses all three of x.4) are evidently F (ρ) = 1 − exp − F −1 ρ2 2 .10 20. is r2 2r 2 .8.1) is proved by evaluating the integral ∞ 0 f (ρ) dρ = 1 (20.30) using the method of § 18. eqn. y and z and then transforms to spherical rather than cylindrical coordinates.32) f (r) = √ exp − 2 2π which models. there is nothing in the mathematics to favor any particular value of φ over another. That it is a proper distribution according to (20. The Rayleigh distribution models among others the distance ρ by which a missile might miss its target. among others.33) [22. Incidentally. φ by symmetry will be uniformly distributed. The distribution which results. 20. so. it is not x but xo ≡ 10 ln x α (20.2) and quantile (20.7] .31) (u) = −2 ln(1 − u). φ being the azimuth toward which the missile misses. OTHER DISTRIBUTIONS 571 This is the Rayleigh distribution. (20. unlike ρ. r ≥ 0.5 The log-normal distribution In the log-normal distribution. the Maxwell distribution.8.4 The Maxwell distribution The Maxwell distribution extends the Rayleigh from two to three dimensions.20.5. (20.

Setting x = g(x_o) = \exp(\alpha x_o) and f_o(x_o) = \Omega(x_o) in (20.11), one can express the log-normal distribution in the form¹¹

    f(x) = \frac{1}{\alpha x}\,\Omega\!\left(\frac{\ln x}{\alpha}\right).    (20.34)

20.9 The Box-Muller transformation

The quantiles (20.28) and (20.31) imply easy conversions from the uniform distribution to the exponential and Rayleigh, respectively. Unfortunately, we lack a quantile formula for the normal distribution. Fortunately, though we lack an easy way to convert a single uniform instance to a single normal instance, we can still convert a pair of uniform instances to a pair of normal instances, by way of Rayleigh, as follows.

Section 20.8.3 has shown how Rayleigh gives the distance ρ by which a missile misses a target when each of x and y are normally distributed and, interestingly, how the azimuth φ is uniformly distributed under these conditions. Because we know the quantiles, to convert a pair of instances u and v of a uniformly distributed random variable to Rayleigh's distance and azimuth is thus straightforward:¹²

    \rho = \sqrt{-2\ln(1 - u)},
    \phi = 2\pi\left(v - \frac{1}{2}\right).    (20.35)

But for the reason just given,

    x = \rho\cos\phi,
    y = \rho\sin\phi,    (20.36)

must then constitute two independent instances of a normally distributed random variable with µ = 0 and σ = 1, a fairly common case. Equations (20.35) and (20.36) are the Box-Muller transformation.¹³

¹¹ [54, Ch. 5]

¹² One can eliminate a little trivial arithmetic by appropriate changes of variable in (20.35), like u' ← 1 − u, but to do so saves little computational time and makes the derivation harder to understand. The interested reader might complete the improvement as an exercise.

¹³ [76]
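Transcribing (20.35) and (20.36) gives a working generator of normal instances; in this Python sketch the function name box_muller and the sample count are illustrative only.

    import math
    import random

    # A pair of uniform instances (u, v) becomes a pair of independent
    # normal instances (x, y) with mu = 0 and sigma = 1.

    def box_muller():
        u, v = random.random(), random.random()
        rho = math.sqrt(-2.0 * math.log(1.0 - u))    # Rayleigh distance, (20.35)
        phi = 2.0 * math.pi * (v - 0.5)              # uniform azimuth, (20.35)
        return rho * math.cos(phi), rho * math.sin(phi)   # (20.36)

    xs = [x for x, _ in (box_muller() for _ in range(50_000))]
    print(sum(xs) / len(xs))                              # ~ 0
    print(math.sqrt(sum(x * x for x in xs) / len(xs)))    # ~ 1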

20.10 The normal cumulative distribution function at large arguments

The Taylor series (20.18) in theory correctly calculates the normal CDF F_\Omega(x), an entire function, for any argument x. In practice however, consider the Taylor series

    1 − F_Ω(6) ≈ −0x0.8000 + 0x2.64C6 − 0xE.5CA7 + 0x4D.8DEC − ···

Not promising, is it? Using a computer's standard, double-type floating-point arithmetic, this calculation fails, swamped by rounding error. One can always calculate in greater precision, of course, asking the computer to carry extra bits; and, actually, this is not necessarily a bad approach.¹⁴ At some scale though, even with a computer, the calculation grows expensive. There remain however several reasons one might prefer a more efficient formula.

• One might wish to evaluate the CDF at thousands or millions of points, not just one.
• One might wish to evaluate the CDF on a low-power "embedded device."
• One might need to evaluate the CDF under a severe time constraint measured in microseconds, as in aircraft control.
• Hard though it might be for some to imagine, one might actually wish to evaluate the CDF with a pencil! Or with a slide rule.¹⁵ (Besides that one might not have a suitable electronic computer conveniently at hand, that electronic computers will never again be scarce is a proposition whose probability the author is not prepared to evaluate.)
• The mathematical method by which a more efficient formula is derived is most instructive.
• One might regard a prudent measure of elegance, even in applications, to be its own reward.

¹⁴ [61]

¹⁵ Such methods prompt one to wonder how much useful mathematics our civilization should have forgone had Leonhard Euler (1707–1783), Carl Friedrich Gauss (1777–1855) and other hardy mathematical minds of the past had computers to lean upon.

Integrating by parts,

    1 - F_\Omega(x) = \frac{1}{\sqrt{2\pi}} \int_x^{\infty} e^{-\tau^2/2}\,d\tau
                    = \frac{1}{\sqrt{2\pi}} \int_x^{\infty} \left(-\frac{1}{\tau}\right) d\!\left(e^{-\tau^2/2}\right)
                    = \left[ -\frac{e^{-\tau^2/2}}{\sqrt{2\pi}\,\tau} \right]_{\tau=x}^{\infty} - \frac{1}{\sqrt{2\pi}} \int_x^{\infty} \frac{e^{-\tau^2/2}}{\tau^2}\,d\tau
                    = \frac{1}{\sqrt{2\pi}} \left[ \frac{e^{-x^2/2}}{x} - \int_x^{\infty} \frac{e^{-\tau^2/2}}{\tau^2}\,d\tau \right],

and so on, repeatedly, until

    1 - F_\Omega(x) = \frac{1}{\sqrt{2\pi}} \left[ \frac{e^{-x^2/2}}{x} - \frac{e^{-x^2/2}}{x^3} + \frac{3e^{-x^2/2}}{x^5} - \cdots + \frac{(-)^{n-1}(2n-3)!!\,e^{-x^2/2}}{x^{2n-1}} \right] + (-)^n (2n-1)!!\,\frac{1}{\sqrt{2\pi}} \int_x^{\infty} \frac{e^{-\tau^2/2}}{\tau^{2n}}\,d\tau,

in which the notation m!! indicates the double factorial, the product of every second positive integer counting down from m: that is, m(m-2)(m-4)\cdots(2) for even m or m(m-2)(m-4)\cdots(1) for odd m.
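Truncating the series at its smallest term (such asymptotic series eventually diverge) yields an efficient evaluation at large x. The following Python sketch is an illustration under that truncation rule, not the book's own algorithm; math.erfc supplies an independent check.

    import math

    # Evaluate 1 - F_Omega(x) for large x by the asymptotic series above.

    def upper_tail(x):
        omega = math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)
        total, term, n = 0.0, 1.0 / x, 1
        while True:
            total += term
            nxt = term * (-(2.0 * n - 1.0) / (x * x))   # next series term
            if abs(nxt) >= abs(term):                   # terms now growing: stop
                break
            term, n = nxt, n + 1
        return omega * total

    print(upper_tail(6.0))                          # ~ 9.87e-10
    print(0.5 * math.erfc(6.0 / math.sqrt(2.0)))    # independent check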