
# Derivations of Applied Mathematics

Thaddeus H. Black

Revised 19 April 2008


Thaddeus H. Black, 1967–. *Derivations of Applied Mathematics.* 19 April 2008. U.S. Library of Congress class QA401. Copyright © 1983–2008 by Thaddeus H. Black, thb@derivations.org. Published by the Debian Project [10]. This book is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License [16], version 2.

## Contents

• Preface
• 1 Introduction
• 1.1 Applied mathematics
• 1.2 Rigor
• 1.2.1 Axiom and definition
• 1.2.2 Mathematical extension
• 1.3 Complex numbers and complex variables
• 1.4 On the text
• 2 Classical algebra and geometry
• 2.1 Basic arithmetic relationships
• 2.1.1 Commutivity, associativity, distributivity
• 2.1.2 Negative numbers
• 2.1.3 Inequality
• 2.1.4 The change of variable
• 2.2 Quadratics
• 2.3 Notation for series sums and products
• 2.4 The arithmetic series
• 2.5 Powers and roots
• 2.5.1 Notation and integral powers
• 2.5.2 Roots
• 2.5.3 Powers of products and powers of powers
• 2.5.4 Sums of powers
• 2.5.5 Summary and remarks
• 2.6 Multiplying and dividing power series
• 2.6.1 Multiplying power series
• 2.6.2 Dividing power series
• 2.6.3 Dividing power series by matching coefficients
• 2.6.4 Common quotients and the geometric series
• 2.6.5 Variations on the geometric series
• 2.7 Constants and variables
• 2.8 Exponentials and logarithms
• 2.8.1 The logarithm
• 2.8.2 Properties of the logarithm
• 2.9 Triangles and other polygons: simple facts
• 2.9.1 Triangle area
• 2.9.2 The triangle inequalities
• 2.9.3 The sum of interior angles
• 2.10 The Pythagorean theorem
• 2.11 Functions
• 2.12 Complex numbers (introduction)
• 2.12.1 Rectangular complex multiplication
• 2.12.2 Complex conjugation
• 2.12.3 Power series and analytic functions (preview)
• 3 Trigonometry
• 3.1 Definitions
• 3.2 Simple properties
• 3.3 Scalars, vectors, and vector notation
• 3.4 Rotation
• 3.5 Trigonometric sums and differences
• 3.5.1 Variations on the sums and differences
• 3.5.2 Trigonometric functions of double and half angles
• 3.6 Trigonometrics of the hour angles
• 3.7 The laws of sines and cosines
• 3.8 Summary of properties
• 3.9 Cylindrical and spherical coordinates
• 3.10 The complex triangle inequalities
• 3.11 De Moivre’s theorem
• 4 The derivative
• 4.1 Infinitesimals and limits
• 4.1.1 The infinitesimal
• 4.1.2 Limits
• 4.2 Combinatorics
• 4.2.1 Combinations and permutations
• 4.2.2 Pascal’s triangle
• 4.3 The binomial theorem
• 4.3.1 Expanding the binomial
• 4.3.2 Powers of numbers near unity
• 4.3.3 Complex powers of numbers near unity
• 4.4 The derivative
• 4.4.1 The derivative of the power series
• 4.4.2 The Leibnitz notation
• 4.4.3 The derivative of a function of a complex variable
• 4.4.4 The derivative of z^a
• 4.4.5 The logarithmic derivative
• 4.5 Basic manipulation of the derivative
• 4.5.1 The derivative chain rule
• 4.5.2 The derivative product rule
• 4.6 Extrema and higher derivatives
• 4.7 L’Hôpital’s rule
• 4.8 The Newton-Raphson iteration
• 5 The complex exponential
• 5.1 The real exponential
• 5.2 The natural logarithm
• 5.3 Fast and slow functions
• 5.4 Euler’s formula
• 5.5 Complex exponentials and de Moivre
• 5.6 Complex trigonometrics
• 5.7 Summary of properties
• 5.8 Derivatives of complex exponentials
• 5.8.1 Derivatives of sine and cosine
• 5.8.2 Derivatives of the trigonometrics
• 5.8.3 Derivatives of the inverse trigonometrics
• 5.9 The actuality of complex quantities
• 6 Primes, roots and averages
• 6.1 Prime numbers
• 6.1.1 The infinite supply of primes
• 6.1.2 Compositional uniqueness
• 6.1.3 Rational and irrational numbers
• 6.2 The existence and number of roots
• 6.2.1 Polynomial roots
• 6.2.2 The fundamental theorem of algebra
• 6.3 Addition and averages
• 6.3.1 Serial and parallel addition
• 6.3.2 Averages
• 7 The integral
• 7.1 The concept of the integral
• 7.1.1 An introductory example
• 7.1.2 Generalizing the introductory example
• 7.1.3 The balanced definition and the trapezoid rule
• 7.2 The antiderivative
• 7.3 Operators, linearity and multiple integrals
• 7.3.1 Operators
• 7.3.2 A formalism
• 7.3.3 Linearity
• 7.3.4 Summational and integrodifferential transitivity
• 7.3.5 Multiple integrals
• 7.4 Areas and volumes
• 7.4.1 The area of a circle
• 7.4.2 The volume of a cone
• 7.4.3 The surface area and volume of a sphere
• 7.5 Checking an integration
• 7.6 Contour integration
• 7.7 Discontinuities
• 7.8 Remarks (and exercises)
• 8 The Taylor series
• 8.1 The power-series expansion of 1/(1 − z)^(n+1)
• 8.1.1 The formula
• 8.1.2 The proof by induction
• 8.1.3 Convergence
• 8.1.4 General remarks on mathematical induction
• 8.2 Shifting a power series’ expansion point
• 8.3 Expanding functions in Taylor series
• 8.4 Analytic continuation
• 8.5 Branch points
• 8.6 Entire and meromorphic functions
• 8.7 Extrema over a complex domain
• 8.8 Cauchy’s integral formula
• 8.8.1 The meaning of the symbol dz
• 8.8.2 Integrating along the contour
• 8.8.3 The formula
• 8.8.4 Enclosing a multiple pole
• 8.9 Taylor series for specific functions
• 8.10 Error bounds
• 8.10.1 Examples
• 8.10.2 Majorization
• 8.10.3 Geometric majorization
• 8.10.4 Calculation outside the fast convergence domain
• 8.10.5 Nonconvergent series
• 8.10.6 Remarks
• 8.11 Calculating 2π
• 8.12 Odd and even functions
• 8.13 Trigonometric poles
• 8.14 The Laurent series
• 8.15 Taylor series in 1/z
• 8.16 The multidimensional Taylor series
• 9 Integration techniques
• 9.1 Integration by antiderivative
• 9.2 Integration by substitution
• 9.3 Integration by parts
• 9.4 Integration by unknown coefficients
• 9.5 Integration by closed contour
• 9.6 Integration by partial-fraction expansion
• 9.6.1 Partial-fraction expansion
• 9.6.2 Repeated poles
• 9.6.3 Integrating a rational function
• 9.6.4 The derivatives of a rational function
• 9.6.5 Repeated poles (the conventional technique)
• 9.6.6 The existence and uniqueness of solutions
• 9.7 Frullani’s integral
• 9.8 Products of exponentials, powers and logs
• 9.9 Integration by Taylor series
• 10 Cubics and quartics
• 10.1 Vieta’s transform
• 10.2 Cubics
• 10.3 Superfluous roots
• 10.4 Edge cases
• 10.5 Quartics
• 10.6 Guessing the roots
• 11 The matrix
• 11.1 Provenance and basic use
• 11.1.1 The linear transformation
• 11.1.2 Matrix multiplication (and addition)
• 11.1.3 Row and column operators
• 11.1.4 The transpose and the adjoint
• 11.2 The Kronecker delta
• 11.3 Dimensionality and matrix forms
• 11.3.1 The null and dimension-limited matrices
• 11.3.2 The identity, scalar and extended matrices
• 11.3.3 The active region
• 11.3.4 Other matrix forms
• 11.3.5 The rank-r identity matrix
• 11.3.6 The truncation operator
• 11.3.7 The elementary vector and the lone-element matrix
• 11.3.8 Off-diagonal entries
• 11.4 The elementary operator
• 11.4.1 Properties
• 11.4.2 Commutation and sorting
• 11.5 Inversion and similarity (introduction)
• 11.6 Parity
• 11.7 The quasielementary operator
• 11.7.1 The general interchange operator
• 11.7.2 The general scaling operator
• 11.7.3 Addition quasielementaries
• 11.8 The unit triangular matrix
• 11.8.1 Construction
• 11.8.2 The product of like unit triangular matrices
• 11.8.3 Inversion
• 11.8.4 The parallel unit triangular matrix
• 11.8.5 The partial unit triangular matrix
• 11.9 The shift operator
• 11.10 The Jacobian derivative
• 12 Rank and the Gauss-Jordan
• 12.1 Linear independence
• 12.2 The elementary similarity transformation
• 12.3 The Gauss-Jordan decomposition
• 12.3.1 Motive
• 12.3.2 Method
• 12.3.3 The algorithm
• 12.3.4 Rank and independent rows
• 12.3.5 Inverting the factors
• 12.3.6 Truncating the factors
• 12.3.7 Properties of the factors
• 12.3.8 Marginalizing the factor In
• 12.3.9 Decomposing an extended operator
• 12.4 Vector replacement
• 12.5 Rank
• 12.5.1 A logical maneuver
• 12.5.2 The impossibility of identity-matrix promotion
• 12.5.3 General matrix rank and its uniqueness
• 12.5.4 The full-rank matrix
• 12.5.5 Underdetermined and overdetermined linear systems (introduction)
• 12.5.6 The full-rank factorization
• 12.5.7 Full column rank and the Gauss-Jordan’s K and S
• 12.5.8 The significance of rank uniqueness
• 13 Inversion and orthonormalization
• 13.1 Inverting the square matrix
• 13.2 The exactly determined linear system
• 13.3 The kernel
• 13.3.1 The Gauss-Jordan kernel formula
• 13.3.2 Converting between kernel matrices
• 13.3.3 The degree of freedom
• 13.4 The nonoverdetermined linear system
• 13.4.1 Particular and homogeneous solutions
• 13.4.2 A particular solution
• 13.4.3 The general solution
• 13.5 The residual
• 13.6 The pseudoinverse and least squares
• 13.6.1 Least squares in the real domain
• 13.6.2 The invertibility of A∗A
• 13.6.3 Positive definiteness
• 13.6.4 The Moore-Penrose pseudoinverse
• 13.7 The multivariate Newton-Raphson iteration
• 13.8 The dot product
• 13.9 The orthogonal complement
• 13.10 Gram-Schmidt orthonormalization
• 13.10.1 Efficient implementation
• 13.10.2 The Gram-Schmidt decomposition
• 13.10.3 The Gram-Schmidt kernel formula
• 13.11 The unitary matrix
• 14 The eigenvalue
• 14.1 The determinant
• 14.1.1 Basic properties
• 14.1.2 The determinant and the elementary operator
• 14.1.3 The determinant of a singular matrix
• 14.1.4 The determinant of a matrix product
• 14.1.5 Determinants of inverse and unitary matrices
• 14.1.6 Inverting the square matrix by determinant
• 14.2 Coincident properties
• 14.3 The eigenvalue itself
• 14.4 The eigenvector
• 14.5 Eigensolution facts
• 14.6 Diagonalization
• 14.7 Remarks on the eigenvalue
• 14.8 Matrix condition
• 14.9 The similarity transformation
• 14.10 The Schur decomposition
• 14.10.1 Derivation
• 14.10.2 The nondiagonalizable matrix
• 14.11 The Hermitian matrix
• 14.12 The singular value decomposition
• 14.13 General remarks on the matrix
• 15 Vector analysis
• 15.1 Reorientation
• 15.2 Multiplication
• 15.2.1 The dot product
• 15.2.2 The cross product
• 15.3 Orthogonal bases
• 15.4 Notation
• 15.4.1 Components by subscript
• 15.4.2 Einstein’s summation convention
• 15.4.3 The Kronecker delta and the Levi-Civita epsilon
• 15.5 Algebraic identities
• 15.6 Fields and their derivatives
• 15.6.1 The ∇ operator
• 15.6.2 Operator notation
• 15.6.3 The directional derivative and the gradient
• 15.6.4 Divergence
• 15.6.5 Curl
• 15.7 Integral forms
• 15.7.1 The divergence theorem
• 15.7.2 Stokes’ theorem
• A Hex and other notational matters
• A.1 Hexadecimal numerals
• A.2 Avoiding notational clutter
• B The Greek alphabet
• C Manuscript history
## List of Tables

• 2.1 Basic properties of arithmetic
• 2.2 Power properties and definitions
• 2.3 Dividing power series through successively smaller powers
• 2.4 Dividing power series through successively larger powers
• 2.5 General properties of the logarithm
• 3.1 Simple properties of the trigonometric functions
• 3.2 Trigonometric functions of the hour angles
• 3.3 Further properties of the trigonometric functions
• 3.4 Rectangular, cylindrical and spherical coordinate relations
• 5.1 Complex exponential properties
• 5.2 Derivatives of the trigonometrics
• 5.3 Derivatives of the inverse trigonometrics
• 6.1 Parallel and serial addition identities
• 7.1 Basic derivatives for the antiderivative
• 8.1 Taylor series
• 9.1 Antiderivatives of products of exps, powers and logs
• 10.1 The method to extract the three roots of the general cubic
• 10.2 The method to extract the four roots of the general quartic
• 11.1 Elementary operators: interchange
• 11.2 Elementary operators: scaling
• 11.3 Elementary operators: addition
• 11.4 Matrix inversion properties
• 11.5 Properties of the parallel unit triangular matrix
• 12.1 Some elementary similarity transformations
• 12.2 A few properties of the Gauss-Jordan factors
• 12.3 The symmetrical equations of § 12.4
• 15.1 Properties of the Kronecker delta and Levi-Civita epsilon
• 15.2 Algebraic vector identities
• B.1 The Roman and Greek alphabets

List of Figures

Two triangles
Multiplicative commutivity
The sum of a triangle's inner angles: turning at the corner
A right triangle
The Pythagorean theorem
The complex (or Argand) plane
The sine and the cosine
The sine function
A two-dimensional vector u = x̂x + ŷy
A three-dimensional vector v = x̂x + ŷy + ẑz
Vector basis rotation
The 0x18 hours in a circle
Calculating the hour trigonometrics
The laws of sines and cosines
A point on a sphere
The plan for Pascal's triangle
Pascal's triangle
A local extremum
A level inflection
The Newton-Raphson iteration
The natural exponential
The natural logarithm
The complex exponential and Euler's formula
The derivatives of the sine and cosine functions
The cylindrical basis
The spherical basis
Areas representing discrete sums
An area representing an infinite sum of infinitesimals
Integration by the trapezoid rule
The area of a circle
The volume of a cone
A sphere
An element of a sphere's surface
A contour of integration
The Heaviside unit step u(t)
The Dirac delta δ(t)
A complex contour of integration in two segments
A Cauchy contour integral
Majorization
Integration by closed contour
Vieta's transform
Fitting a line to measured data
The dot product
The cross product

Preface

I never meant to write this book. It emerged unheralded, unexpectedly.

The book began in 1983 when a high-school classmate challenged me to prove the Pythagorean theorem on the spot. I lost the dare, but looking the proof up later I recorded it on loose leaves, adding to it the derivations of a few other theorems of interest to me. From such a kernel the notes grew over time, until family and friends suggested that the notes might make the material for a book.

The book, a work yet in progress but a complete, entire book as it stands, first frames coherently the simplest, most basic derivations of applied mathematics, treating quadratics, trigonometrics, exponentials, derivatives, integrals, series, complex variables and, of course, the aforementioned Pythagorean theorem. These and others establish the book's foundation in Chapters 2 through 9. Later chapters build upon the foundation, deriving results less general or more advanced. Such is the book's plan.

The book is neither a tutorial on the one hand nor a bald reference on the other. It is a study reference, in the tradition of, for instance, Kernighan's and Ritchie's The C Programming Language [27]. In the book, you can look up some particular result directly, or you can begin on page one and read—with toil and commensurate profit—straight through to the end of the last chapter. The book is so organized.

The book generally follows established applied mathematical convention but where worthwhile occasionally departs therefrom. One particular respect in which the book departs requires some defense here, I think: the book employs hexadecimal numerals.

There is nothing wrong with decimal numerals as such. Decimal numerals are fine in history and anthropology (man has ten fingers), finance and accounting (dollars, cents, pounds, shillings, pence: the base hardly matters), law and engineering (the physical units are arbitrary anyway); but they are merely serviceable in mathematical theory, never aesthetic. There unfortunately really is no gradual way to bridge the gap to hexadecimal.

As to a grand goal, underlying purpose or hidden plan, the book has none, other than to derive as many useful mathematical results as possible and to record the derivations together in an orderly manner in a single volume. What constitutes "useful" or "orderly" is a matter of perspective and judgment, of course. My own peculiar heterogeneous background in military service, building construction, electrical engineering, electromagnetic analysis and Debian development, my nativity, residence and citizenship in the United States, undoubtedly bias the selection and presentation to some degree. How other authors go about writing their books, I do not know, but I suppose that what is true for me is true for many of them also: we begin by organizing notes for our own use, then observe that the same notes might prove useful to others, and then undertake to revise the notes and to bring them into a form which actually is useful to others. Whether this book succeeds in the last point is for the reader to judge.

Much of the book consists of common mathematical knowledge or of proofs I have worked out with my own pencil from various ideas gleaned who knows where over the years. The latter proofs are perhaps original or semi-original from my personal point of view, but it is unlikely that many if any of them are truly new. To the initiated, the mathematics itself often tends to suggest the form of the proof: if to me, then surely also to others who came before; and even where a proof is new the idea proven probably is not. Mathematics by its very nature promotes queer bibliographies, for its methods and truths are established by derivation rather than authority.

Among the bibliography's entries stands a reference [6] to my doctoral adviser G.S. Brown, though the book's narrative seldom explicitly invokes the reference. Prof. Brown had nothing directly to do with the book's development, for a substantial bulk of the manuscript, or of the notes underlying it, had been drafted years before I had ever heard of Prof. Brown or he of me, and my work under him did not regard the book in any case. However, the ways in which a good doctoral adviser influences his student are both complex and deep. Prof. Brown's style and insight touch the book in many places and in many ways, usually too implicitly coherently to cite.

THB


Chapter 1

Introduction

This is a book of applied mathematical proofs. If you have seen a mathematical result, if you want to know why the result is so, you can look for the proof here.

The book's purpose is to convey the essential ideas underlying the derivations of a large number of mathematical results useful in the modeling of physical systems. To this end, the book emphasizes main threads of mathematical argument and the motivation underlying the main threads, deëmphasizing formal mathematical rigor. It derives mathematical results from the purely applied perspective of the scientist and the engineer.

The book's chapters are topical. This first chapter treats a few introductory matters of general interest.

1.1 Applied mathematics

What is applied mathematics?

Applied mathematics is a branch of mathematics that concerns itself with the application of mathematical knowledge to other domains. . . . The question of what is applied mathematics does not answer to logical classification so much as to the sociology of professionals who use mathematics. [30]

That is about right, on both counts. In this book we shall define applied mathematics to be correct mathematics useful to scientists, engineers and the like; proceeding not from reduced, well defined sets of axioms but rather directly from a nebulous mass of natural arithmetical, geometrical and classical-algebraic idealizations of physical systems; demonstrable but generally lacking the detailed rigor of the professional mathematician.


1.2 Rigor

It is impossible to write such a book as this without some discussion of mathematical rigor. Applied and professional mathematics diﬀer principally and essentially in the layer of abstract deﬁnitions the latter subimposes beneath the physical ideas the former seeks to model. Notions of mathematical rigor ﬁt far more comfortably in the abstract realm of the professional mathematician; they do not always translate so gracefully to the applied realm. The applied mathematical reader and practitioner needs to be aware of this diﬀerence.

1.2.1 Axiom and definition

Ideally, a professional mathematician knows or precisely specifies in advance the set of fundamental axioms he means to use to derive a result. A prime aesthetic here is irreducibility: no axiom in the set should overlap the others or be specifiable in terms of the others. Geometrical argument—proof by sketch—is distrusted. The professional mathematical literature discourages undue pedantry indeed, but its readers do implicitly demand a convincing assurance that its writers could derive results in pedantic detail if called upon to do so. Precise definition here is critically important, which is why the professional mathematician tends not to accept blithe statements such as that

1/0 = ∞,

without first inquiring as to exactly what is meant by symbols like 0 and ∞.

The applied mathematician begins from a different base. His ideal lies not in precise definition or irreducible axiom, but rather in the elegant modeling of the essential features of some physical system. Here, mathematical definitions tend to be made up ad hoc along the way, based on previous experience solving similar problems, adapted implicitly to suit the model at hand. If you ask the applied mathematician exactly what his axioms are, which symbolic algebra he is using, he usually doesn't know; what he knows is that the bridge has its footings in certain soils with specified tolerances, suffers such-and-such a wind load, etc. To avoid error, the applied mathematician relies not on abstract formalism but rather on a thorough mental grasp of the essential physical features of the phenomenon he is trying to model. An equation like

1/0 = ∞


may make perfect sense without further explanation to an applied mathematical readership, depending on the physical context in which the equation is introduced. Geometrical argument—proof by sketch—is not only trusted but treasured. Abstract deﬁnitions are wanted only insofar as they smooth the analysis of the particular physical problem at hand; such deﬁnitions are seldom promoted for their own sakes. The irascible Oliver Heaviside, responsible for the important applied mathematical technique of phasor analysis, once said, It is shocking that young people should be addling their brains over mere logical subtleties, trying to understand the proof of one obvious fact in terms of something equally . . . obvious. [35] Exaggeration, perhaps, but from the applied mathematical perspective Heaviside nevertheless had a point. The professional mathematicians Richard Courant and David Hilbert put it more soberly in 1924 when they wrote, Since the seventeenth century, physical intuition has served as a vital source for mathematical problems and methods. Recent trends and fashions have, however, weakened the connection between mathematics and physics; mathematicians, turning away from the roots of mathematics in intuition, have concentrated on reﬁnement and emphasized the postulational side of mathematics, and at times have overlooked the unity of their science with physics and other ﬁelds. In many cases, physicists have ceased to appreciate the attitudes of mathematicians. [8, Preface] Although the present book treats “the attitudes of mathematicians” with greater deference than some of the unnamed 1924 physicists might have done, still, Courant and Hilbert could have been speaking for the engineers and other applied mathematicians of our own day as well as for the physicists of theirs. To the applied mathematician, the mathematics is not principally meant to be developed and appreciated for its own sake; it is meant to be used. 
This book adopts the Courant-Hilbert perspective. The introduction you are now reading is not the right venue for an essay on why both kinds of mathematics—applied and professional (or pure)— are needed. Each kind has its place; and although it is a stylistic error to mix the two indiscriminately, clearly the two have much to do with one another. However this may be, this book is a book of derivations of applied mathematics. The derivations here proceed by a purely applied approach.

Figure 1.1: Two triangles. (The figure shows two triangles of common height h; the bases are marked b1, b, b2 and −b2.)

1.2.2 Mathematical extension

Profound results in mathematics are occasionally achieved simply by extending results already known. For example, negative integers and their properties can be discovered by counting backward—3, 2, 1, 0—then asking what follows (precedes?) 0 in the countdown and what properties this new, negative integer must have to interact smoothly with the already known positives. The astonishing Euler's formula (§ 5.4) is discovered by a similar but more sophisticated mathematical extension. More often, however, the results achieved by extension are unsurprising and not very interesting in themselves. Such extended results are the faithful servants of mathematical rigor.

Consider for example the triangle on the left of Fig. 1.1. This triangle is evidently composed of two right triangles of areas

A1 = b1 h/2,
A2 = b2 h/2

(each right triangle is exactly half a rectangle). Hence the main triangle's area is

A = A1 + A2 = (b1 + b2)h/2 = bh/2.

Very well. What about the triangle on the right? Its b1 is not shown on the figure, and what is that −b2, anyway? Answer: the triangle is composed of the difference of two right triangles, with b1 the base of the larger, overall one: b1 = b + (−b2). The b2 is negative because the sense of the small right triangle's area in the proof is negative: the small area is subtracted from


the large rather than added. By extension on this basis, the main triangle’s area is again seen to be A = bh/2. The proof is exactly the same. In fact, once the central idea of adding two right triangles is grasped, the extension is really rather obvious—too obvious to be allowed to burden such a book as this. Excepting the uncommon cases where extension reveals something interesting or new, this book generally leaves the mere extension of proofs— including the validation of edge cases and over-the-edge cases—as an exercise to the interested reader.
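The decomposition argument is easy to check numerically. The following short Python sketch is my own illustration, not the book's; the function names are invented for the purpose. It computes the main triangle's area both as A1 + A2 and directly as bh/2, for the left-hand triangle (both pieces positive) and the right-hand one (the second piece subtracted):

```python
# Check the triangle-area derivation of Fig. 1.1 numerically:
# a triangle of base b and height h splits into two right triangles
# of bases b1 and b2, with b = b1 + b2.
def area_by_decomposition(b1, b2, h):
    A1 = b1 * h / 2   # area of the first right triangle
    A2 = b2 * h / 2   # area of the second (negative if b2 < 0)
    return A1 + A2

def area_direct(b, h):
    return b * h / 2

# Left-hand triangle of the figure: both pieces positive, b = 3 + 2 = 5.
left = area_by_decomposition(3.0, 2.0, 4.0)

# Right-hand triangle: the small piece is subtracted, b1 = b + (-b2) = 7.
right = area_by_decomposition(7.0, -2.0, 4.0)
```

Both cases agree with bh/2 for b = 5 and h = 4, exactly as the extension argument claims.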

1.3 Complex numbers and complex variables

More than a mastery of mere logical details, it is an holistic view of the mathematics and of its use in the modeling of physical systems which is the mark of the applied mathematician. A feel for the math is the great thing. Formal deﬁnitions, axioms, symbolic algebras and the like, though often useful, are felt to be secondary. The book’s rapidly staged development of complex numbers and complex variables is planned on this sensibility. Sections 2.12, 3.10, 3.11, 4.3.3, 4.4, 6.2, 9.5 and 9.6.5, plus all of Chs. 5 and 8, constitute the book’s principal stages of complex development. In these sections and throughout the book, the reader comes to appreciate that most mathematical properties which apply for real numbers apply equally for complex, that few properties concern real numbers alone. Pure mathematics develops an abstract theory of the complex variable.1 The abstract theory is quite beautiful. However, its arc takes oﬀ too late and ﬂies too far from applications for such a book as this. Less beautiful but more practical paths to the topic exist;2 this book leads the reader along one of these.

1.4 On the text

The book gives numerals in hexadecimal. It denotes variables in Greek letters as well as Roman. Readers unfamiliar with the hexadecimal notation will ﬁnd a brief orientation thereto in Appendix A. Readers unfamiliar with the Greek alphabet will ﬁnd it in Appendix B. Licensed to the public under the GNU General Public Licence [16], version 2, this book meets the Debian Free Software Guidelines [11].
1 [14][44][24]
2 See Ch. 8's footnote 8.
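Readers who want to sanity-check a hexadecimal numeral without flipping to Appendix A can do so mechanically. This small Python sketch is mine, not the book's; it merely reads a numeral like 0x56 digit by digit in base sixteen:

```python
# A hexadecimal numeral encodes digits in base sixteen:
# 0x56 means 5*16 + 6.
value = 5 * 16 + 6

# Python understands the 0x prefix natively, so the two must agree.
parsed = int("0x56", 16)
literal = 0x56
```

All three expressions name the same number, which decimal notation writes as 86.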


If you cite an equation, section, chapter, figure or other item from this book, it is recommended that you include in your citation the book's precise draft date as given on the title page. The reason is that equation numbers, chapter numbers and the like are numbered automatically by the LaTeX typesetting software: such numbers can change arbitrarily from draft to draft. If an example citation helps, see [5] in the bibliography.

Chapter 2

Classical algebra and geometry
Arithmetic and the simplest elements of classical algebra and geometry, we learn as children. Few readers will want this book to begin with a treatment of 1 + 1 = 2; or of how to solve 3x − 2 = 7. However, there are some basic points which do seem worth touching. The book starts with these.

2.1 Basic arithmetic relationships

This section states some arithmetical rules.

2.1.1 Commutivity, associativity, distributivity, identity and inversion

Table 2.1 lists several arithmetical rules, each of which applies not only to real numbers but equally to the complex numbers of § 2.12. Most of the rules are appreciated at once if the meaning of the symbols is understood. In the case of multiplicative commutivity, one imagines a rectangle with sides of lengths a and b, then the same rectangle turned on its side, as in Fig. 2.1: since the area of the rectangle is the same in either case, and since the area is the length times the width in either case (the area is more or less a matter of counting the little squares), evidently multiplicative commutivity holds. A similar argument validates multiplicative associativity, except that here we compute the volume of a three-dimensional rectangular box, which box we turn various ways.1
1 [44, Ch. 1]


Table 2.1: Basic properties of arithmetic.

a + b = b + a
a + (b + c) = (a + b) + c
a + 0 = 0 + a = a
a + (−a) = 0
ab = ba
(a)(bc) = (ab)(c)
(a)(1) = (1)(a) = a
(a)(1/a) = 1
(a)(b + c) = ab + ac

Figure 2.1: Multiplicative commutivity. (The figure shows the same rectangle, with sides a and b, in two orientations.)


Multiplicative inversion lacks an obvious interpretation when a = 0. Loosely,

1/0 = ∞.

But since 3/0 = ∞ also, surely either the zero or the infinity, or both, somehow differ in the latter case.

Looking ahead in the book, we note that the multiplicative properties do not always hold for more general linear transformations. For example, matrix multiplication is not commutative and vector cross-multiplication is not associative. Where associativity does not hold and parentheses do not otherwise group, right-to-left association is notationally implicit:2,3

A × B × C = A × (B × C).

The sense of it is that the thing on the left (A×) operates on the thing on the right (B × C). (In the rare case in which the question arises, you may want to use parentheses anyway.)
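The warning about more general linear transformations is easy to illustrate concretely. The following Python sketch is my example, not the book's: it multiplies two 2×2 matrices in both orders and shows the products differ.

```python
# 2x2 matrix product, rows-by-columns.
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

A = [[1, 2],
     [3, 4]]
B = [[0, 1],
     [1, 0]]

AB = matmul(A, B)   # A operating on B: swaps A's columns
BA = matmul(B, A)   # B operating on A: swaps A's rows
```

Here AB swaps columns while BA swaps rows, so AB ≠ BA even for these small matrices.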

2.1.2 Negative numbers

Consider that

(+a)(+b) = +ab,
(−a)(+b) = −ab,
(+a)(−b) = −ab,
(−a)(−b) = +ab.

The first three of the four equations are unsurprising, but the last is interesting. Why would a negative count −a of a negative quantity −b come to
2 The fine C and C++ programming languages are unfortunately stuck with the reverse order of association, along with division inharmoniously on the same level of syntactic precedence as multiplication. Standard mathematical notation is more elegant: abc/uvw = [(a)(bc)]/[(u)(vw)].
3 The nonassociative cross product B × C is introduced in § 15.2.2.


a positive product +ab? To see why, consider the progression

. . .
(+3)(−b) = −3b,
(+2)(−b) = −2b,
(+1)(−b) = −1b,
(0)(−b) = 0b,
(−1)(−b) = +1b,
(−2)(−b) = +2b,
(−3)(−b) = +3b,
. . .

The logic of arithmetic demands that the product of two negative numbers be positive for this reason.
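The progression can be generated mechanically. This short Python sketch (mine, not the book's) steps the multiplier down from +3 to −3 against b = 1 and confirms that the products climb steadily by +b:

```python
# For b = 1, compute (k)(-b) as k steps down from +3 to -3.
b = 1
products = [k * (-b) for k in range(3, -4, -1)]
# Each step down in k adds +b to the product, which is why
# arithmetic forces (-a)(-b) = +ab at the bottom of the progression.
```

The list comes out as [-3, -2, -1, 0, 1, 2, 3], increasing by one at each step.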

2.1.3 Inequality

If4

a < b,

then necessarily

a + x < b + x.

However, the relationship between ua and ub depends on the sign of u:

ua < ub if u > 0;
ua > ub if u < 0.

Also,

1/a > 1/b.

2.1.4 The change of variable

The applied mathematician very often ﬁnds it convenient to change variables, introducing new symbols to stand in place of old. For this we have
4 Few readers attempting this book will need to be reminded that < means "is less than," that > means "is greater than," or that ≤ and ≥ respectively mean "is less than or equal to" and "is greater than or equal to."

the change of variable or assignment notation5

Q ← P.

This means, "in place of P, put Q"; or, "let Q now equal P." For example, if a^2 + b^2 = c^2, then the change of variable 2µ ← a yields the new form (2µ)^2 + b^2 = c^2.

Similar to the change of variable notation is the definition notation

Q ≡ P.

This means, "let the new symbol Q represent P."6

The two notations logically mean about the same thing. Subjectively, Q ≡ P identifies a quantity P sufficiently interesting to be given a permanent name Q, whereas Q ← P implies nothing especially interesting about P or Q; it just introduces a (perhaps temporary) new symbol Q to ease the algebra. The concepts grow clearer as examples of the usage arise in the book.

2.2 Quadratics

Differences and sums of squares are conveniently factored as

a^2 − b^2 = (a + b)(a − b),
a^2 + b^2 = (a + ib)(a − ib),
a^2 + 2ab + b^2 = (a + b)^2,
a^2 − 2ab + b^2 = (a − b)^2,   (2.1)

(where i is the imaginary unit, a number defined such that i^2 = −1, introduced in more detail in § 2.12 below).

5 There appears to exist no broadly established standard mathematical notation for the change of variable, other than the = equal sign, which regrettably does not fill the role well. One can indeed use the equal sign, but then what does the change of variable k = k + 1 mean? It looks like a claim that k and k + 1 are the same, which is impossible. The notation k ← k + 1 by contrast is unambiguous; it means to increment k by one. However, the latter notation admittedly has seen only scattered use in the literature. The C and C++ programming languages use == for equality and = for assignment (change of variable), as the reader may be aware.

6 One would never write k ≡ k + 1. Even k ← k + 1 can confuse readers inasmuch as it appears to imply two different values for the same symbol k, but the latter notation is sometimes used anyway when new symbols are unwanted or because more precise alternatives (like kn = kn−1 + 1) seem overwrought. Still, usually it is better to introduce a new symbol, as in j ← k + 1. In some books, ≡ is printed as ≜.
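The complex factoring in (2.1) can be spot-checked with Python's built-in complex arithmetic, in which the imaginary unit i is written 1j. The check below is my own, not part of the book:

```python
# Spot-check the factorings of (2.1) at sample values a = 3, b = 2.
a, b = 3, 2

# a^2 - b^2 = (a + b)(a - b)
diff_lhs = a**2 - b**2
diff_rhs = (a + b) * (a - b)

# a^2 + b^2 = (a + ib)(a - ib), where i is the imaginary unit, i**2 = -1.
sum_lhs = a**2 + b**2
sum_rhs = (a + 1j * b) * (a - 1j * b)
```

The imaginary parts cancel in the product, leaving the purely real sum of squares.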

Useful as these four forms are, however, none of them can directly factor the more general quadratic7 expression

z^2 − 2βz + γ^2.

To factor this, we complete the square, writing

z^2 − 2βz + γ^2 = z^2 − 2βz + γ^2 + (β^2 − γ^2) − (β^2 − γ^2)
               = z^2 − 2βz + β^2 − (β^2 − γ^2)
               = (z − β)^2 − (β^2 − γ^2).

The expression evidently has roots8 where

(z − β)^2 = (β^2 − γ^2),

or in other words where

z = β ± √(β^2 − γ^2).   (2.2)

This suggests the factoring9

z^2 − 2βz + γ^2 = (z − z1)(z − z2),   (2.3)

where z1 and z2 are the two values of z given by (2.2). It follows that the two solutions of the quadratic equation

z^2 = 2βz − γ^2   (2.4)

are those given by (2.2), which is called the quadratic formula.10 (Cubic and quartic formulas also exist respectively to extract the roots of polynomials of third and fourth order, but they are much harder. See Ch. 10 and its Tables 10.1 and 10.2.)

2.3 Notation for series sums and products

Sums and products of series arise so frequently in mathematical work that one finds it convenient to define terse notations to express them. The summation notation

Σ_{k=a}^{b} f(k)

means to let k equal each of a, a + 1, a + 2, . . . , b in turn, evaluating the function f(k) at each k, then adding the several f(k). For example,11

Σ_{k=3}^{6} k^2 = 3^2 + 4^2 + 5^2 + 6^2 = 0x56.

The similar multiplication notation

Π_{j=a}^{b} f(j)

means to multiply the several f(j) rather than to add them. The symbols Σ and Π come respectively from the Greek letters for S and P, and may be regarded as standing for "Sum" and "Product." The j or k is a dummy variable, index of summation or loop counter—a variable with no independent existence, used only to facilitate the addition or multiplication of the series.12 (Nothing prevents one from writing Π_k rather than Π_j, incidentally. For a dummy variable, one can use any letter one likes. However, the general habit of writing k and j proves convenient at least in § 4.5.2 and Ch. 8, so we start now.)

7 The adjective quadratic refers to the algebra of expressions in which no term has greater than second order. Examples of quadratic expressions include x^2, 2x^2 − 7x + 3 and x^2 + 2xy + y^2. By contrast, the expressions x^3 − 1 and 5x^2 y are cubic not quadratic because they contain third-order terms. First-order expressions like x + 1 are linear; zeroth-order expressions like 3 are constant. The term 5x^2 y = 5[x][x][y] is of third order, incidentally. (If not already clear from the context, order basically refers to the number of variables multiplied together in a term.)

8 A root of f(z) is a value of z for which f(z) = 0. See § 2.11.

9 It suggests it because the expressions on the left and right sides of (2.3) are both quadratic (the highest power is z^2) and have the same roots. Substituting into the equation the values of z1 and z2 and simplifying proves the suggestion correct.

10 The form of the quadratic formula which usually appears in print is

x = [−b ± √(b^2 − 4ac)] / (2a),

which solves the quadratic ax^2 + bx + c = 0. However, this writer finds the form (2.2) easier to remember. For example, by (2.2) the quadratic z^2 = 3z − 2 has the solutions

z = 3/2 ± √[(3/2)^2 − 2] = 1 or 2.

11 The hexadecimal numeral 0x56 represents the same number the decimal numeral 86 represents. The book's preface explains why the book represents such numbers in hexadecimal. Appendix A tells how to read the numerals.

12 Section 7.3 speaks further of the dummy variable.
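Two of this section's results lend themselves to quick numerical checks; the few lines of Python below are my own, not part of the book. The first check applies the form (2.2) to the footnote's example z^2 = 3z − 2; the second evaluates the summation example that the book writes as 0x56.

```python
import cmath  # complex square root, in case beta**2 < gamma**2

def quadratic_roots(beta, gamma):
    """Roots of z**2 = 2*beta*z - gamma**2, in the form (2.2)."""
    d = cmath.sqrt(beta**2 - gamma**2)
    return beta + d, beta - d

# The footnote's example z**2 = 3*z - 2, i.e. beta = 3/2, gamma**2 = 2.
z1, z2 = quadratic_roots(1.5, 2**0.5)

# The summation example: sum k**2 for k = 3 .. 6
# (Python's range excludes its upper bound).
total = sum(k**2 for k in range(3, 7))
```

The roots come out as 2 and 1, and the sum is 86, i.e. 0x56. The dummy variable's name is immaterial: replacing k by j in the generator changes nothing.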

The product shorthand

n! ≡ Π_{j=1}^{n} j,
n!/m! ≡ Π_{j=m+1}^{n} j   (2.5)

is very frequently used. The notation n! is pronounced "n factorial." Regarding the notation n!/m!, this can of course be regarded correctly as n! divided by m!, but it usually proves more amenable to regard the notation as a single unit.13

Because multiplication in its more general sense as linear transformation (§ 11.1) is not always commutative, we specify that

Π_{j=a}^{b} f(j) = [f(b)][f(b − 1)][f(b − 2)] · · · [f(a + 2)][f(a + 1)][f(a)]

rather than the reverse order of multiplication.14 Multiplication proceeds from right to left. In the event that the reverse order of multiplication is needed, we shall use the notation

∐_{j=a}^{b} f(j) = [f(a)][f(a + 1)][f(a + 2)] · · · [f(b − 2)][f(b − 1)][f(b)].

Note that for the sake of definitional consistency,

Σ_{k=N+1}^{N} f(k) = 0 + Σ_{k=N+1}^{N} f(k) = 0,
Π_{j=N+1}^{N} f(j) = (1) Π_{j=N+1}^{N} f(j) = 1.

This means among other things that

0! = 1.

13 One reason among others for this is that factorials rapidly multiply to extremely large sizes, overflowing computer registers during numerical computation. If you can avoid unnecessary multiplication by regarding n!/m! as a single unit, this is a win.

14 The extant mathematical literature lacks an established standard on the order of multiplication implied by the "Π" symbol, but this is the order we shall use in this book.

Table 2.2: Power properties and definitions.

z^n ≡ Π_{j=1}^{n} z,  n ≥ 0
z = (z^{1/n})^n = (z^n)^{1/n}
√z ≡ z^{1/2}
(uv)^a = u^a v^a
z^{p/q} = (z^{1/q})^p = (z^p)^{1/q}
z^{ab} = (z^a)^b = (z^b)^a
z^{a+b} = z^a z^b
z^{a−b} = z^a / z^b
z^{−b} = 1 / z^b

2.5 Powers and roots

2.5.1 Notation and integral powers

The power notation

z^n

indicates the number z, multiplied by itself n times. More formally, when the exponent n is a nonnegative integer,16

z^n ≡ Π_{j=1}^{n} z,  n ≥ 0.   (2.7)

16 The symbol "≡" means "=", but it further usually indicates that the expression on its right serves to define the expression on its left. Refer to § 2.1.4.
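The n!/m! shorthand and the empty-sum and empty-product conventions are easy to exercise in Python; `math.prod` here stands in for the Π notation. The check is mine, not the book's:

```python
from math import factorial, prod

n, m = 10, 7
# n!/m! treated as a single unit: multiply only j = m+1 .. n,
# never forming the two huge factorials separately.
quotient = prod(range(m + 1, n + 1))   # 8 * 9 * 10

# Empty sum and empty product, per the definitional-consistency rule.
empty_sum = sum(range(8, 8))           # no terms at all
empty_prod = prod(range(8, 8))         # no factors at all
```

In particular the empty product being 1 is exactly what makes 0! = 1.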

For example,

z^3 = (z)(z)(z),
z^2 = (z)(z),
z^1 = z,
z^0 = 1.

Notice that in general,

z^{n−1} = z^n / z.

This leads us to extend the definition to negative integral powers with

z^{−n} = 1 / z^n.   (2.8)

From the foregoing it is plain that

z^{m+n} = z^m z^n,
z^{m−n} = z^m / z^n,   (2.9)

for any integral m and n. For similar reasons,

z^{mn} = (z^m)^n = (z^n)^m.   (2.10)

On the other hand from multiplicative associativity and commutivity,

(uv)^n = u^n v^n.   (2.11)

2.5.2 Roots

Fractional powers are not something we have defined yet, so for consistency with (2.10) we let

(u^{1/n})^n = u.

This has u^{1/n} as the number which, raised to the nth power, yields u. Setting

v = u^{1/n},

17 The case 0^0 is interesting because it lacks an obvious interpretation. The specific interpretation depends on the nature and meaning of the two zeros. For interest, if E ≡ 1/ε, then

lim_{ε→0+} ε^ε = lim_{E→∞} (1/E)^{1/E} = lim_{E→∞} E^{−1/E} = lim_{E→∞} e^{−(ln E)/E} = e^0 = 1.

it follows by successive steps that

v^n = u,
(v^n)^{1/n} = u^{1/n},
(v^n)^{1/n} = v.

Taking the u and v formulas together, then,

(z^{1/n})^n = z = (z^n)^{1/n}   (2.12)

for any z and integral n. The number z^{1/n} is called the nth root of z—or in the very common case n = 2, the square root of z, often written

√z.

When z is real and nonnegative, the last notation is usually implicitly taken to mean the real, nonnegative square root. In any case, the power and root operations mutually invert one another.

What about powers expressible neither as n nor as 1/n, such as the 3/2 power? If z and w are numbers related by

w^q = z,

then

w^{pq} = z^p.

Taking the qth root,

w^p = (z^p)^{1/q}.

But w = z^{1/q}, so this is

(z^{1/q})^p = (z^p)^{1/q},

which says that it does not matter whether one applies the power or the root first; the result is the same. Extending (2.10) therefore, we define z^{p/q} such that

(z^{1/q})^p = z^{p/q} = (z^p)^{1/q}.   (2.13)

Since any real number can be approximated arbitrarily closely by a ratio of integers, (2.13) implies a power definition for all real exponents. Equation (2.13) is this subsection's main result. However, § 2.5.3 will find it useful if we can also show here that

(z^{1/q})^{1/s} = z^{1/qs} = (z^{1/s})^{1/q}.   (2.14)
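A few floating-point spot checks of this section's rules, written by me rather than taken from the book: the integral-power identities (2.9), the footnote's limit for ε^ε, and (2.13)'s claim that power and root may be applied in either order for a nonnegative real base.

```python
z = 7.0
m, n = 5, 2
p, q = 3, 2   # compute z**(3/2) two ways

sum_rule = z**(m + n)          # z^(m+n)
prod_rule = z**m * z**n        # z^m z^n
neg_rule = z**(-n)             # extended definition z^(-n) = 1/z^n

# The footnote's limit: eps**eps -> 1 as eps -> 0+.
eps_power = 1e-12 ** 1e-12

# (2.13): root-then-power equals power-then-root.
root_then_power = (z**(1 / q))**p
power_then_root = (z**p)**(1 / q)
```

Up to floating-point rounding, both orderings agree with z**1.5, and ε^ε sits within a hair of 1.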

2.5. POWERS AND ROOTS

The proof is straightforward. If

    w ≡ z^{1/qs},

then raising to the qs power yields

    (w^s)^q = z.

Successively taking the qth and sth roots gives

    w = (z^{1/q})^{1/s}.

By identical reasoning,

    w = (z^{1/s})^{1/q}.

But since w ≡ z^{1/qs}, the last two equations imply (2.14), as we have sought.

2.5.3 Powers of products and powers of powers

Per (2.11),

    (uv)^p = u^p v^p.

Raising this equation to the 1/q power, we have that

    (uv)^{p/q} = [u^p v^p]^{1/q}
               = [(u^p)^{q/q} (v^p)^{q/q}]^{1/q}
               = [(u^{p/q})^q (v^{p/q})^q]^{1/q}
               = [(u^{p/q})(v^{p/q})]^{q/q}
               = u^{p/q} v^{p/q}.

In other words,

    (uv)^a = u^a v^a                                          (2.15)

for any real a.

On the other hand, per (2.10),

    z^{pr} = (z^p)^r.

Raising this equation to the 1/qs power and applying (2.10), (2.13) and (2.14) to reorder the powers, we have that

    z^{(p/q)(r/s)} = (z^{p/q})^{r/s}.

By identical reasoning,

    z^{(p/q)(r/s)} = (z^{r/s})^{p/q}.

Since p/q and r/s can approximate any real numbers with arbitrary precision, this implies that

    (z^a)^b = z^{ab} = (z^b)^a                                (2.16)

for any real a and b.

2.5.4 Sums of powers

With (2.13) and (2.16), one can reason that

    z^{(p/q)+(r/s)} = (z^{ps+rq})^{1/qs} = (z^{ps} z^{rq})^{1/qs} = z^{p/q} z^{r/s},

or in other words that

    z^{a+b} = z^a z^b.                                        (2.17)

In the case that a = -b,

    1 = z^{-b+b} = z^{-b} z^b,

which implies that

    z^{-b} = 1/z^b.                                           (2.18)

But then replacing -b ← b in (2.17) leads to

    z^{a-b} = z^a z^{-b},

which according to (2.18) is

    z^{a-b} = z^a / z^b.                                      (2.19)

2.5.5 Summary and remarks

Table 2.2 on page 16 summarizes the section's definitions and results. Looking ahead to § 2.12, § 3.11 and Ch. 5, we observe that nothing in the foregoing analysis requires the base variables z, w, u and v to be real numbers; if complex (§ 2.12), the formulas remain valid. Still, the analysis does imply that the various exponents m, n, p/q, a, b and so on are real numbers. This restriction, we shall remove later, purposely defining the action of a complex exponent to comport with the results found here. With such a definition the results apply not only for all bases but also for all exponents.
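The real-exponent laws (2.15) through (2.19) admit a similar numeric sanity check (again an illustrative sketch with arbitrarily chosen positive bases and real exponents, not part of the text):

```python
import math

# Numeric spot-check of the real-exponent laws (2.15)-(2.19).
u, v, z = 1.8, 0.6, 3.2   # arbitrary positive reals
a, b = 1.5, -0.4          # arbitrary real exponents

assert math.isclose((u * v) ** a, u ** a * v ** a)       # (2.15)
assert math.isclose((z ** a) ** b, z ** (a * b))         # (2.16)
assert math.isclose(z ** (a + b), z ** a * z ** b)       # (2.17)
assert math.isclose(z ** -b, 1 / z ** b)                 # (2.18)
assert math.isclose(z ** (a - b), z ** a / z ** b)       # (2.19)
print("exponent laws verified")
```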

2.6 Multiplying and dividing power series

A power series[18] is a weighted sum of integral powers:

    A(z) = Σ_{k=-∞}^{∞} a_k z^k,                              (2.20)

where the several a_k are arbitrary constants. This section discusses the multiplication and division of power series.

[Footnote 18: Another name for the power series is polynomial. The word "polynomial" usually connotes a power series with a finite number of terms, but the two names in fact refer to essentially the same thing. Professional mathematicians use the terms more precisely. Equation (2.20), they call a "power series" only if a_k = 0 for all k < 0—in other words, technically, not if it includes negative powers of z. They call it a "polynomial" only if it is a "power series" with a finite number of terms. They call (2.20) in general a Laurent series. The name "Laurent series" is a name we shall meet again in § 8.14. In the meantime however we admit that the professionals have vaguely daunted us by adding to the name some pretty sophisticated connotations, to the point that we applied mathematicians (at least in the author's country) seem to feel somehow unlicensed actually to use the name. We tend to call (2.20) a "power series with negative powers," or just "a power series." This book follows the last usage. You however can call (2.20) a Laurent series if you prefer (and if you pronounce it right: "lor-ON"). That is after all exactly what it is. Nevertheless if you do use the name "Laurent series," be prepared for people subjectively—for no particular reason—to expect you to establish complex radii of convergence, to sketch some annulus in the Argand plane, and/or to engage in other maybe unnecessary formalities. If that is not what you seek, then you may find it better just to call the thing by the less lofty name of "power series"—or better, if it has a finite number of terms, by the even humbler name of "polynomial." Semantics. All these names mean about the same thing, but one is expected most carefully always to give the right name in the right place. What a bother! (Someone once told the writer that the Japanese language can give different names to the same object, depending on whether the speaker is male or female. The power-series terminology seems to share a spirit of that kin.) If you seek just one word for the thing, the writer recommends that you call it a "power series" and then not worry too much about it until someone objects. When someone does object, you can snow him with the big word "Laurent series," instead. The experienced scientist or engineer may notice that the above vocabulary omits the name "Taylor series." The vocabulary omits the name because that name fortunately remains unconfused in usage—it means quite specifically a power series without negative powers and tends to connote a representation of some particular function of interest—as we shall see in Ch. 8.]

22 CHAPTER 2.6. (2.3. One sometimes wants to express the long division of power series more formally. however. Section 2. we prepare the long division B(z)/A(z) by writing B(z) = A(z)Qn (z) + Rn (z).23) If Q(z) is a quotient and R(z) a remainder. . Such are the Latin-derived names of the parts of a long division. and you can skip the rest of the subsection if you like. (2.) Formally. then B(z) is a dividend (or numerator ) and A(z) a divisor (or denominator ). z−2 z−2 z−2 = The strategy is to take the dividend19 B(z) piece by piece. but this subsection does it by long division.21) bk z . 19 (2. That is what the rest of the subsection is about.3 below will do it by matching coeﬃcients. though less direct.6.2 Dividing power series The quotient Q(z) = B(z)/A(z) of two power series is a little harder to calculate.6. For example.6. then that is really all there is to it. purposely choosing pieces easily divided by A(z). If you feel that you understand the example. 2z 2 − 3z + 3 z−2 2z 2 − 4z z + 3 z+3 + = 2z + z−2 z−2 z−2 z−2 5 5 = 2z + + = 2z + 1 + .22) k=−∞ j=−∞ 2. CLASSICAL ALGEBRA AND GEOMETRY 2. k the product of the two series is evidently P (z) ≡ A(z)B(z) = ∞ ∞ aj bk−j z k . (Be advised however that the cleverer technique of § 2. and there are at least two ways to do it. is often easier and faster.1 Multiplying power series Given two power series A(z) = B(z) = ∞ k=−∞ ∞ k=−∞ ak z k .

N − 2. where K and N identify the greatest orders k of z k present in A(z) and B(z). MULTIPLYING AND DIVIDING POWER SERIES 23 where Rn (z) is a remainder (being the part of B[z] remaining to be divided). + Rn (z) − aK aK Matching this equation against the desired iterate B(z) = A(z)Qn−1 (z) + Rn−1 (z) and observing from the deﬁnition of Qn (z) that Qn−1 (z) = Qn (z) + . At start. What does it mean? The key to understanding it lies in understanding (2. . respectively. . The goal is to grind Rn (z) away to nothing.23). . Rn (z) = k=−∞ Qn (z) = N −K k=n−K+1 qk z k . where n = N. but the quotient Qn (z) and the remainder Rn (z) change. which is not one but several equations—one equation for each value of n. N − 1. which we add and subtract from (2. Well.24) rnk z k . that is a lot of symbology. k=−∞ B(z) = RN (z) = B(z).6. As in the example. The piece we choose is (rnn /aK )z n−K A(z). bk z k . aK = 0. . to make it disappear as n → −∞. and K A(z) = k=−∞ N ak z k .2. n (2. QN (z) = 0. but the thrust of the long division process is to build Qn (z) up by wearing Rn (z) down. QN (z) = 0 while RN (z) = B(z). The dividend B(z) and the divisor A(z) stay the same from one n to the next.23) to obtain B(z) = A(z) Qn (z) + rnn n−K rnn n−K z z A(z) . we pursue the goal by choosing from Rn (z) an easily divisible piece containing the whole high-order term of Rn (z).

Rn (z) B(z) = Qn (z) + . then so long as Rn (z) tends to vanish as n → −∞. § 3.2] The notations Ko . A(z) Iterating only a ﬁnite number of times leaves a remainder. Table 2.20 In its qn−K equation. Then we iterate per (2.” “a sub k” and “z to the k” (or.24 CHAPTER 2. k=No B(z) = then n Rn (z) = k=n−(K−Ko )+1 20 21 rnk z k for all n < No + (K − Ko ). bk z k . the table includes also the result of § 2. (2. qn−K = where no term remains in Rn−1 (z) higher than a z n−1 term. as “K naught.3 summarizes the long-division procedure.23) that B(z) = Q−∞ (z).25) except that it may happen that Rn (z) = 0 for suﬃciently small n. respectively.3 below.23) is trivially true.27) (2. aK Rn−1 (z) = Rn (z) − qn−K z n−K A(z).25) as many times as desired. If an inﬁnite number of times. To begin the actual long division. we ﬁnd that rnn .28) [46. “z to the kth power”)—at least in the author’s country. it follows from (2. A(z) A(z) (2. we initialize RN (z) = B(z). ak and z k are usually pronounced. for which (2.26) (2. CLASSICAL ALGEBRA AND GEOMETRY qn−K z n−K . more fully.6. It should be observed in light of Table 2. .3 that if21 K A(z) = k=Ko N ak z k .

3: Dividing power series through successively smaller powers.6.2. MULTIPLYING AND DIVIDING POWER SERIES 25 Table 2. B(z) = A(z)Qn (z) + Rn (z) K A(z) = k=−∞ N ak z k . aK = 0 bk z k k=−∞ B(z) = RN (z) = B(z) QN (z) = 0 n Rn (z) = k=−∞ rnk z k N −K k=n−K+1 Qn (z) = qk z k N −K qn−K = 1 rnn = aK aK bn − an−k qk Rn−1 (z) = Rn (z) − qn−K z B(z) = Q−∞ (z) A(z) k=n−K+1 n−K A(z) .

which is exactly the claim (2. The greatest-order term of Rn (z) is by deﬁnition a z n term. Often. then according to the equation so also must the leastorder term of Rm−1 (z) be a z No term.26 CHAPTER 2. bk z k ∞ k=N −K qk z k If a more formal demonstration of (2. 2. however. In this case. otherwise a z No term.28) is wanted.29) ∞ k=K ∞ k=N ak z k . Under these conditions. evidently the least-order term of Rm−1 (z) is a z m−(K−Ko ) term when m − (K − Ko ) ≤ No . CLASSICAL ALGEBRA AND GEOMETRY That is. One can divide them by matching coeﬃcients.3 extends the quotient Qn (z) through successively smaller powers of z. of course. one prefers to extend the quotient through successively larger powers of z. This is better stated after the change of variable n + 1 ← m: the least-order term of Rn (z) is a z n−(K−Ko )+1 term when n < No + (K − Ko ).28) makes. the terms of Rn (z) run from z n−(K−Ko )+1 through z n . unless an even lower-order term be contributed by the product z m−K A(z). aK = 0. in summary.2.6.22 The long-division procedure of Table 2. is that we have strategically planned the long-division iteration precisely to cause the leading term of the divisor to cancel the leading term of the remainder at each step. otherwise a z No term. But that very product’s term of least order is a z m−(K−Ko ) term.6.25) that rmm m−K Rm−1 (z) = Rm (z) − z A(z).5] . The reason for this. the long division goes by the complementary rules of Table 2. 23 [29][14. when n < No + (K − Ko ). So. the remainder has order one less than the divisor has. A(z) (2.4. then consider per (2. § 2. sometimes quicker way to divide power series than by the long division of § 2. where a z K term is A(z)’s term of least order. aK If the least-order term of Rm (z) is a z No term (as clearly is the case at least for the initial remainder RN [z] = B[z]).3 Dividing power series by matching coeﬃcients There is another.23 If Q∞ (z) = where A(z) = B(z) = are known and Q∞ (z) = 22 B(z) .

aK = 0 bk z k RN (z) = B(z) QN (z) = 0 Rn (z) = Qn (z) = k=N −K ∞ rnk z k qk z k k=n n−K−1 qn−K = rnn 1 = aK aK n−K−1 bn − an−k qk A(z) Rn+1 (z) = Rn (z) − qn−K z B(z) = Q∞ (z) A(z) k=N −K n−K .2.4: Dividing power series through successively larger powers. MULTIPLYING AND DIVIDING POWER SERIES 27 Table 2.6. B(z) = A(z)Qn (z) + Rn (z) A(z) = B(z) = ∞ k=K ∞ k=N ak z k .

30) yields a sequence of coeﬃcients does not necessarily mean that the resulting power series Q∞ (z) converges to some deﬁnite value over a given domain.3 and 2. even though all its coeﬃcients are known. often what interest us are only the series’ ﬁrst several terms n−K−1 Qn (z) = In this case. The adaptation is left as an exercise to the interested reader. (2.29) through by A(z) to obtain the form A(z)Q∞ (z) = B(z). we have that qn−K = 1 aK n−K−1 bn − an−k qk k=N −K . Admittedly.30) is correct when Q∞ (z) does converge. then one can multiply (2. each coeﬃcient depending on the coeﬃcients earlier computed.22) and changing the index n ← k on the right side. n=N k=N −K n−K k=N −K But for this to hold for all z. ∞ n−K an−k qk z n = ∞ n=N bn z n . rather than increasing. Transferring all terms but aK qn−K to the equation’s right side and dividing by aK .4 incorporate the technique both ways.31) for Rn (z). the fact that (2. Q∞ (z) = k=N −K qk z k . but Tables 2. the coeﬃcients must match for each n: an−k qk = bn . At least (2. Even when Q∞ (z) as such does not converge. Rn (z) = B(z) − A(z)Qn (z). Solving (2.30) computes the coeﬃcients of Q(z). however. Rn (z) B(z) = Qn (z) + A(z) A(z) and convergence is not an issue.31) (2. (2.28 CHAPTER 2.34). Expanding the left side according to (2. which diverges when |z| > 1. powers of z if needed or desired. CLASSICAL ALGEBRA AND GEOMETRY is to be calculated. n ≥ N. The coeﬃcient-matching technique of this subsection is easily adapted to the division of series in decreasing.30) Equation (2.32) . Consider for instance (2. n ≥ N.

∞ k=1 Multiplying by z yields zS ≡ zk . |5| = 5. sum? Answer: it sums exactly to 1/(1 − k=0 z). which. ∞ k=0 zk = 1 .6.6.   1 k=0 (2. |z| < 1. Let S≡ ∞ k=0 z k . For example. computed by the coeﬃcient matching of § 2. calculated by the long division of § 2. and/or veriﬁed by multiplying. more aesthetic way to demonstrate the same thing. after dividing by 1 − z. the diﬀerence technique of § 2.5 Variations on the geometric series Besides being more aesthetic than the long division of § 2. 1−z (2.2. |z| < 1.6.2. |z| < 1.3.34) 2. |z| < 1. . as follows. but also |−5| = 5.4 Common power-series quotients and the geometric series Frequently encountered power-series quotients.   k=−∞ Equation (2.33) almost incidentally answers a question which has arisen in § 2. However. include24 ∞    (∓)k z k .4 permits one to extend the basic geometric series in 24 The notation |z| represents the magnitude of z.33) = 1 ± z  −1  k k − (∓) z .6.4 and which often arises in practice: to what total does the inﬁnite geometric series ∞ z k . |z| > 1. implies that S≡ as was to be demonstrated. there is a simpler.6. MULTIPLYING AND DIVIDING POWER SERIES 29 2.6.2.6. Subtracting the latter equation from the former leaves (1 − z)S = 1.

the model 25 [33] . independent variables and dependent variables. it doesn’t vary). vsound is an indeterminate constant (given particular atmospheric conditions. among others. k (k + 1)(k)z k . We multiply the unknown S1 by z. independent variables and dependent variables Mathematical models use indeterminate constants. |z| < 1. we can compute as follows.7 Indeterminate constants. |z| < 1 (which arises in. so. Here. k k3 z k . 1−z where we have used (2. We then subtract zS1 from S1 . The three are best illustrated by example. we arrive at ∞ z . The model gives t as a function of ∆r. and t is a dependent variable. where ∆r is the distance from source to listener and vsound is the speed of sound. the sum S1 ≡ ∞ k=0 kz k . Dividing by 1 − z. Planck’s quantum blackbody radiation calculation25 ). 2.34) to collapse the last sum. (2. can be calculated in like manner as the need for them arises. etc. Further series of the kind. Consider the time t a sound needs to travel from its source to a distant listener: t= ∆r vsound .35) kz k = S1 ≡ (1 − z)2 k=0 which was to be found.30 CHAPTER 2. CLASSICAL ALGEBRA AND GEOMETRY several ways.. leaving (1 − z)S1 = ∞ k=0 kz − k ∞ k=1 (k − 1)z = k ∞ k=1 z =z k ∞ k=0 zk = z . ∆r is an independent variable. For instance. if you tell the model how far the listener sits from the sound source. producing zS1 = ∞ kz k+1 = ∞ k=0 k=1 (k − 1)z k . such as k k2 z k .

[Footnote 26: Math books are funny about examples like this. The chance that the typical reader will ever specify the dimensions of a real musical concert hall is of course vanishingly small; but that is not the point, because the chance that the typical reader will ever specify something technical is quite large. Although sophisticated models with many factors and terms do indeed play a major role in engineering, the great majority of practical engineering calculations—for quick, day-to-day decisions where small sums of money and negligible risk to life are at stake—are done with models hardly more sophisticated than the one shown here. So, maybe the concert-hall example is not so unreasonable, after all. Such examples remind one of the kind of calculation one encounters in a childhood arithmetic textbook: one could calculate the quantity of water in a kitchen mixing bowl just as well as the quantity of air contained in an astronaut's round helmet—but astronauts' helmets are so much more interesting than bowls, you see. It is the idea of the example that matters here.]
Note that the abstract validity of the model does not necessarily depend on whether we actually know the right figure for v_sound (if I tell you that sound goes at 500 m/s, but later you find out that the real figure is 331 m/s, it probably doesn't ruin the theoretical part of your analysis; you just have to recalculate numerically). Knowing the figure is not the point. The point is that conceptually there preëxists some right figure for the indeterminate constant: that sound goes at some constant speed—whatever it is—and that we can calculate the delay in terms of this.

The same model in the example would remain valid if atmospheric conditions were changing (v_sound would then be an independent variable) or if the model were used in designing a musical concert hall[26] to suffer a maximum acceptable sound time lag from the stage to the hall's back row (t would then be an independent variable; Δr, dependent).

Although there exists a definite philosophical distinction between the three kinds of quantity, nevertheless it cannot be denied that which particular quantity is an indeterminate constant, an independent variable or a dependent variable often depends upon one's immediate point of view. Occasionally we go so far as deliberately to change our point of view in mid-analysis, now regarding as an independent variable what a moment ago we had regarded as an indeterminate constant, for instance (a typical case of this arises in the solution of differential equations by the method of unknown coefficients, § 9.4). Such a shift of viewpoint is fine, so long as we remember that there is a difference between the three kinds of quantity and we keep track of which quantity is which kind to us at the moment.

The main reason it matters which symbol represents which of the three kinds of quantity is that in calculus, one analyzes how change in independent variables affects dependent variables as indeterminate constants remain fixed.

(Section 2.3 has introduced the dummy variable, which the present section's threefold taxonomy seems to exclude. However, in fact, most dummy variables are just independent variables—a few are dependent variables—whose scope is restricted to a particular expression. Such a dummy variable does not seem very "independent," of course; but its dependence is on the operator controlling the expression, not on some other variable within the expression. Within the expression, the dummy variable fills the role of an independent variable; without, it fills no role because logically it does not exist there. Refer to §§ 2.3 and 7.3.)

2.8 Exponentials and logarithms

In § 2.5 we have considered the power operation z^a, where (in § 2.7's language) the independent variable z is the base and the indeterminate constant a is the exponent. There is another way to view the power operation, however. One can view it as the exponential operation

    a^z,

where the variable z is the exponent and the constant a is the base.

2.8.1 The logarithm

The exponential operation follows the same laws the power operation follows, but because the variable of interest is now the exponent rather than the base, the inverse operation is not the root but rather the logarithm:

    log_a(a^z) = z.                                           (2.36)

The logarithm log_a w answers the question, "What power must I raise a to, to get w?" Raising a to the power of the last equation, we have that

    a^{log_a(a^z)} = a^z.

With the change of variable w ← a^z, this is

    a^{log_a w} = w.                                          (2.37)

Hence, the exponential and logarithmic operations mutually invert one another.

2.8.2 Properties of the logarithm

The basic properties of the logarithm include that

    log_a uv = log_a u + log_a v,                             (2.38)
    log_a (u/v) = log_a u - log_a v,                          (2.39)
    log_a (w^z) = z log_a w,                                  (2.40)
    w^z = a^{z log_a w},                                      (2.41)
    log_b w = log_a w / log_a b.                              (2.42)

Of these, (2.38) follows from the steps

    (uv) = (u)(v),
    a^{log_a uv} = (a^{log_a u})(a^{log_a v}),
    a^{log_a uv} = a^{log_a u + log_a v}.

Equation (2.39) follows by similar reasoning. Equations (2.40) and (2.41) follow from the steps

    w^z = (w^z) = (w)^z,
    w^z = a^{log_a(w^z)} = (a^{log_a w})^z,
    w^z = a^{log_a(w^z)} = a^{z log_a w}.

And (2.42) follows from the steps

    w = b^{log_b w},
    log_a w = log_a(b^{log_b w}),
    log_a w = log_b w log_a b.

Among other purposes, (2.38) through (2.42) serve respectively to transform products to sums, quotients to differences, powers to products, exponentials to differently based exponentials, and logarithms to differently based logarithms. Table 2.5 repeats the equations along with (2.36) and (2.37) (which also emerge as restricted forms of eqns. 2.40 and 2.41), thus summarizing the general properties of the logarithm.
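The logarithm properties (2.38) through (2.42) can be spot-checked numerically (an illustrative sketch; the base-conversion helper log_base is hypothetical, introduced only for the check):

```python
import math

# Numeric spot-check of the logarithm properties (2.38)-(2.42).
a, b, u, v, w, z = 3.0, 7.0, 2.5, 1.3, 4.2, 0.8

def log_base(base, x):
    """log of x in the given base, via the natural logarithm."""
    return math.log(x) / math.log(base)

assert math.isclose(log_base(a, u * v), log_base(a, u) + log_base(a, v))  # (2.38)
assert math.isclose(log_base(a, u / v), log_base(a, u) - log_base(a, v))  # (2.39)
assert math.isclose(log_base(a, w ** z), z * log_base(a, w))              # (2.40)
assert math.isclose(w ** z, a ** (z * log_base(a, w)))                    # (2.41)
assert math.isclose(log_base(b, w), log_base(a, w) / log_base(a, b))      # (2.42)
print("logarithm properties verified")
```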

Table 2.5: General properties of the logarithm.

    log_a uv = log_a u + log_a v
    log_a (u/v) = log_a u - log_a v
    log_a (w^z) = z log_a w
    w^z = a^{z log_a w}
    log_b w = log_a w / log_a b
    log_a (a^z) = z
    w = a^{log_a w}

2.9 Triangles and other polygons: simple facts

This section gives simple facts about triangles and other polygons.

2.9.1 Triangle area

The area of a right triangle[27] is half the area of the corresponding rectangle. This is seen by splitting a rectangle down its diagonal into a pair of right triangles of equal size, for each of which the fact is true. The fact that any triangle's area is half its base length times its height is seen by dropping a perpendicular from one point of the triangle to the opposite side (see Fig. 1.1 on page 4), dividing the triangle into two right triangles, for each of which the fact is true. In algebraic symbols,

    A = bh/2,                                                 (2.43)

where A stands for area, b for base length, and h for perpendicular height.

[Footnote 27: A right triangle is a triangle, one of whose three angles is perfectly square.]

2.9.2 The triangle inequalities

Any two sides of a triangle together are longer than the third alone, which itself is longer than the difference between the two. In symbols,

    |a - b| < c < a + b,                                      (2.44)

where a, b and c are the lengths of a triangle's three sides. These are the triangle inequalities. The truth of the sum inequality c < a + b is seen by sketching some triangle on a sheet of paper and asking: if c is the direct route between two points and a + b is an indirect route, then how can a + b not be longer? Of course the sum inequality is equally good on any of the triangle's three sides.

One can thus write a < c + b and b < c + a just as well as c < a + b. Rearranging the a and b inequalities, we have that a - b < c and b - a < c, which together say that |a - b| < c. The last is the difference inequality, completing (2.44)'s proof.

2.9.3 The sum of interior angles

A triangle's three interior angles[28] sum to 2π/2. One way to see the truth of this fact is to imagine a small car rolling along one of the triangle's sides. Reaching the corner, the car turns to travel along the next side, and so on round all three corners to complete a circuit, returning to the start. Since the car again faces the original direction, we reason that it has turned a total of 2π: a full revolution. But the angle φ the car turns at a corner and the triangle's inner angle ψ there together form the straight angle 2π/2 (the sharper the inner angle, the more the car turns): see Fig. 2.2. In mathematical notation,

    φ_1 + φ_2 + φ_3 = 2π,
    φ_k + ψ_k = 2π/2,  k = 1, 2, 3,

where ψ_k and φ_k are respectively the triangle's inner angles and the angles through which the car turns. Solving the latter equations for φ_k and

Figure 2.2: The sum of a triangle's inner angles: turning at the corner. [The figure marks a turning angle φ and an inner angle ψ at one corner.]

[Footnote 28: Many or most readers will already know the notation 2π and its meaning as the angle of full revolution. For those who do not, the notation is introduced more properly in §§ 3.1, 3.6 and 8.11 below. Briefly, the symbol 2π represents a complete turn, a full circle, a spin to face the same direction as before. Hence 2π/4 represents a square turn or right angle. You may be used to the notation 360° in place of 2π, but for the reasons explained in Appendix A and in footnote 15 of Ch. 3, this book tends to avoid the former notation.]

substituting into the former yields

    ψ_1 + ψ_2 + ψ_3 = 2π/2,                                   (2.45)

which was to be demonstrated.

Extending the same technique to the case of an n-sided polygon, we have that

    Σ_{k=1}^{n} φ_k = 2π,
    φ_k + ψ_k = 2π/2.

Solving the latter equations for φ_k and substituting into the former yields

    Σ_{k=1}^{n} (2π/2 - ψ_k) = 2π,

or in other words

    Σ_{k=1}^{n} ψ_k = (n - 2)(2π/2).                          (2.46)

Equation (2.45) is then seen to be a special case of (2.46) with n = 3.

2.10 The Pythagorean theorem

Along with Euler's formula (5.11), the fundamental theorem of calculus (7.2) and Cauchy's integral formula (8.29), the Pythagorean theorem is one of the most famous results in all of mathematics. The theorem holds that

    a^2 + b^2 = c^2,                                          (2.47)

where a, b and c are the lengths of the legs and diagonal of a right triangle, as in Fig. 2.3. Many proofs of the theorem are known.

One such proof posits a square of side length a + b with a tilted square of side length c inscribed as in Fig. 2.4. The area of each of the four triangles in the figure is evidently ab/2. The area of the tilted inner square is c^2. The area of the large outer square is (a + b)^2. But the large outer square is comprised of the tilted inner square plus the four triangles, hence the area
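Equations (2.45) and (2.46) can be confirmed by computing a polygon's interior angles directly from vertex coordinates (an illustrative sketch; the sample vertices are arbitrary convex polygons, not from the text):

```python
import math

# Check of (2.45)/(2.46): a convex polygon's interior angles, computed from
# its vertex coordinates, total (n - 2)*(2*pi/2) = (n - 2)*pi.
def interior_angle_sum(verts):
    n = len(verts)
    total = 0.0
    for i in range(n):
        ax, ay = verts[i - 1]           # previous vertex
        bx, by = verts[i]               # this vertex
        cx, cy = verts[(i + 1) % n]     # next vertex
        u = (ax - bx, ay - by)          # side back to the previous vertex
        v = (cx - bx, cy - by)          # side on to the next vertex
        dot = u[0] * v[0] + u[1] * v[1]
        total += math.acos(dot / (math.hypot(*u) * math.hypot(*v)))
    return total

triangle = [(0, 0), (4, 0), (1, 3)]
pentagon = [(0, 0), (3, 0), (4, 2), (2, 4), (-1, 2)]
assert math.isclose(interior_angle_sum(triangle), math.pi)
assert math.isclose(interior_angle_sum(pentagon), 3 * math.pi)
print("interior angles sum to (n-2)*pi")
```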

Figure 2.3: A right triangle. [Legs of lengths a and b; diagonal of length c.]

Figure 2.4: The Pythagorean theorem. [A square of side length a + b, with a tilted square of side length c inscribed; the four corner right triangles have legs a and b.]

of the large outer square equals the area of the tilted inner square plus the areas of the four triangles. In mathematical symbols, this is

    (a + b)^2 = c^2 + 4(ab/2),

which equation expands directly to yield

    a^2 + 2ab + b^2 = c^2 + 2ab,

which simplifies directly to (2.47), the Pythagorean theorem.[29]

The Pythagorean theorem is readily extended to three dimensions as

    a^2 + b^2 + h^2 = r^2,                                    (2.48)

where h is an altitude perpendicular to both a and b, thus also to c; and where r is the corresponding three-dimensional diagonal: the diagonal of the right triangle whose legs are c and h. Inasmuch as (2.47) applies to any right triangle, it follows that c^2 + h^2 = r^2, which equation expands per (2.47) to yield (2.48).

[Footnote 29: This elegant proof is far simpler than the one famously given by the ancient geometer Euclid, yet more appealing than alternate proofs often found in print. Whether Euclid was acquainted with the simple proof given here this writer does not know, but it is possible [52, "Pythagorean theorem," 02:32, 31 March 2006] that Euclid chose his proof because it comported better with the restricted set of geometrical elements he permitted himself to work with. Be that as it may, the present writer encountered the proof this section gives somewhere years ago and has never seen it in print since, so can claim no credit for originating it. Unfortunately the citation is now long lost. A current source for the proof is [52] as cited earlier in this footnote.]

2.11 Functions

This book is not the place for a gentle introduction to the concept of the function. Briefly, however, a function is a mapping from one number (or vector of several numbers) to another. For example, f(x) = x^2 - 1 is a function which maps 1 to 0 and -3 to 8, among others.

One often speaks of domains and ranges when discussing functions. The domain of a function is the set of numbers one can put into it. The range of a function is the corresponding set of numbers one can get out of it. In the example, if the domain is restricted to real x such that |x| ≤ 3, then the corresponding range is -1 ≤ f(x) ≤ 8.

Other terms which arise when discussing functions are root (or zero), singularity and pole. A root (or zero) of a function is a domain point at which the function evaluates to zero (the example has roots at x = ±1). A singularity of a function is a domain point at which the function's output
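The theorem and its three-dimensional extension (2.48) are easy to confirm on a classic example (an illustrative sketch using the well-known 3-4-5 and 5-12-13 right triangles):

```python
import math

# Numeric check of the Pythagorean theorem (2.47) and its 3-D extension
# (2.48): the diagonal r of a box with edges a, b, h satisfies
# a^2 + b^2 + h^2 = r^2, because r is the diagonal of the right triangle
# whose legs are c and h.
a, b, h = 3.0, 4.0, 12.0
c = math.hypot(a, b)             # planar diagonal: c^2 = a^2 + b^2
r = math.hypot(c, h)             # three-dimensional diagonal
assert math.isclose(c, 5.0)
assert math.isclose(r, 13.0)
assert math.isclose(a**2 + b**2 + h**2, r**2)
print("c =", c, "r =", r)
```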

(Besides the root. the singularity and the pole. the function h(x) = 1/(x2 − 1) has poles at x = ±1. Since there exists no real number i such that i2 = −1 (2.49) and since the quantity i thus deﬁned is found to be critically important across broad domains of higher mathematics.32 30 Here is one example of the book’s deliberate lack of formal mathematical rigor. however. or otherwise to frame the problem such that one need not approach the singularity. an infamous example of which is z = 0 in the function √ g[z] = z.5.— neither of us has ever met just 1). 32 The English word imaginary is evocative. A more precise formalism to say that “the function’s output is inﬁnite” might be x→xo lim |f (x)| = ∞.49) as the deﬁnition of a fundamentally new kind of quantity: the imaginary number. What it has not done is to tell us √ how to regard a quantity such as −1. however. say. that the number 1/2 never emerges directly from counting things. The example function f (x) has no singularities for ﬁnite x. then the period separating your fourteenth and twenty-ﬁrst birthdays could have .2.6. where the function’s output is inﬁnite. The word imaginary in the mathematical sense is thus more of a technical term than a descriptive adjective. there is also the troublesome branch point. one chair. which (§ 9.5. However.12 Complex numbers (introduction) Section 2. Notice. that is. but perhaps not of quite the right concept in this usage. The applied mathematician tends to avoid such formalism where there seems no immediate use for it. Imaginary numbers are not to mathematics as. of course. If for some reason the iyear were oﬀered as a unit of time. for example. but the book must lay a more extensive foundation before introducing them properly in § 8. either. Branch points are important. imaginary elfs are (presumably) not substantial objects. as w ← 1/z. in the mathematical realm. one summer afternoon. never directly from counting things.12. COMPLEX NUMBERS (INTRODUCTION) 39 diverges. 
imaginary numbers are substantial. but the best way to handle such unreasonable singularities is almost always to change a variable. but then so is the number 1 (though you and I have often met one of something—one apple. we accept (2. etc. This book will have little to say of such singularities. 31 There is further the essential singularity. like 1/ x).30 A pole is a sin√ gularity that behaves locally like 1/x (rather than.2) can be thought of as N poles. In the physical world.2 has introduced square roots.31 ) 2. The number i is just a concept. an example of which is z = 0 in p(z) = exp(1/z). A singularity that behaves as 1/xN is a multiple pole. imaginary elfs are to the physical world. The reason imaginary numbers are called “imaginary” probably has to do with the fact that they emerge from mathematical operations only.

No; let us not call it that; rather, let us call it a useful formalism. The unpersuaded reader is asked to suspend judgment a while. He will soon see the use.]

Imaginary numbers are given their own number line, plotted at right angles to the familiar real number line as in Fig. 2.5. The sum of a real number x and an imaginary number iy is the complex number

    z = x + iy.

The conjugate z* of this complex number is defined to be

    z* = x - iy.

The magnitude (or modulus, or absolute value) |z| of the complex number is defined to be the length ρ in Fig. 2.5, which per the Pythagorean theorem (§ 2.10) is such that

    |z|^2 = x^2 + y^2.                                        (2.50)

The phase arg z of the complex number is defined to be the angle φ in the figure, which in terms of the trigonometric functions of § 3.1[33] is such that

    tan(arg z) = y/x.                                         (2.51)

Figure 2.5: The complex (or Argand) plane, and a complex number z therein. [The horizontal axis is ℜ(z) with marks -2, -1, 1, 2; the vertical axis is iℑ(z) with marks -i2, -i, i, i2; the number z sits at radius ρ and angle φ.]

[Footnote 33: This is a forward reference. The important point is that arg z is the angle φ in the figure. If the equation doesn't make sense to you yet for this reason, skip it for now. He who has not yet met the trigonometric functions will soon see the use.]

2. 34 [13. = 2 x2 + y 2 2 It is a curious fact that It is a useful fact that 1 = −i. but would perfectly mirror one another in every respect. particularly when printed by hand).54) (2.12. 2. COMPLEX NUMBERS (INTRODUCTION) 41 Speciﬁcally to extract the real and imaginary parts of a complex number.56) (the curious fact. claiming that j not i were the true imaginary unit.34 then one would ﬁnd that (−j)2 = −1 = j 2 . including that z1 z2 = (x1 x2 − y1 y2 ) + i(y1 x2 + x1 y2 ).2.53) (2.2 Complex conjugation An important property of complex numbers descends subtly from the fact that i2 = −1 = (−i)2 . the notations ℑ(z) = y.1 Multiplication and division of complex numbers in rectangular form Several elementary properties of complex numbers are readily seen if the fact that i2 = −1 is kept in mind. (2.55.12. z1 x2 − iy2 x1 + iy1 x1 + iy1 = = z2 x2 + iy2 x2 − iy2 x2 + iy2 (x1 x2 + y1 y2 ) + i(y1 x2 − x1 y2 ) . ℜ(z) = x. 2.52) are conventionally recognized (although often the symbols ℜ(·) and ℑ(·) are written Re(·) and Im(·). eqn. § I:22-5] . The units i and j would diﬀer indeed. is useful. i z ∗ z = x2 + y 2 = |z|2 (2.12.55) (2. too). If one deﬁned some number j ≡ −i. and thus that all the basic properties of complex numbers in the j system held just as well as they did in the i system.
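The rectangular-form products and quotients above are easy to spot-check numerically. The following brief sketch is not part of the original text; it leans on Python's built-in complex type as an independent reference, with arbitrary example values.

```python
# Check (2.53)-(2.56): complex multiplication, division, 1/i and z* z
# in rectangular form, against Python's built-in complex arithmetic.
x1, y1 = 3.0, -2.0
x2, y2 = 1.5, 4.0
z1, z2 = complex(x1, y1), complex(x2, y2)

# (2.53): z1 z2 = (x1 x2 - y1 y2) + i(y1 x2 + x1 y2)
prod = complex(x1*x2 - y1*y2, y1*x2 + x1*y2)
assert abs(prod - z1*z2) < 1e-12

# (2.54): z1/z2 = [(x1 x2 + y1 y2) + i(y1 x2 - x1 y2)] / (x2^2 + y2^2)
quot = complex(x1*x2 + y1*y2, y1*x2 - x1*y2) / (x2**2 + y2**2)
assert abs(quot - z1/z2) < 1e-12

# (2.55): 1/i = -i, and (2.56): z* z = |z|^2
assert 1/1j == -1j
assert abs(z1.conjugate()*z1 - abs(z1)**2) < 1e-12
```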


That is the basic idea. To establish it symbolically needs a page or so of slightly abstract algebra as follows, the goal of which will be to show that [f(z)]* = f(z*) for some unspecified function f(z) with specified properties. To begin with, if z = x + iy, then by definition

    z* = x − iy.

Proposing that (z^(k−1))* = (z*)^(k−1) (which may or may not be true, but for the moment we assume it), we can write

    z^(k−1) = sk−1 + itk−1,
    (z*)^(k−1) = sk−1 − itk−1,

where sk−1 and tk−1 are symbols introduced to represent the real and imaginary parts of z^(k−1). Multiplying the former equation by z = x + iy and the latter by z* = x − iy, we have that

    z^k = (x sk−1 − y tk−1) + i(y sk−1 + x tk−1),
    (z*)^k = (x sk−1 − y tk−1) − i(y sk−1 + x tk−1).

With the definitions sk ≡ x sk−1 − y tk−1 and tk ≡ y sk−1 + x tk−1, this is written more succinctly

    z^k = sk + itk,
    (z*)^k = sk − itk.

In other words, if (z^(k−1))* = (z*)^(k−1), then it necessarily follows that (z^k)* = (z*)^k. Solving the definitions of sk and tk for sk−1 and tk−1 yields the reverse definitions sk−1 = (x sk + y tk)/(x² + y²) and tk−1 = (−y sk + x tk)/(x² + y²). Therefore, except when z = x + iy happens to be null or infinite, the implication is reversible by reverse reasoning, so by mathematical induction³⁵ we have that

    (z^k)* = (z*)^k    (2.57)

for all integral k. We have also from (2.53) that

    (z1 z2)* = z1* z2*    (2.58)

for any complex z1 and z2. Consequences of (2.57) and (2.58) include that if

    f(z) ≡ Σ_{k=−∞}^{∞} (ak + ibk)(z − zo)^k,    (2.59)
    f*(z) ≡ Σ_{k=−∞}^{∞} (ak − ibk)(z − zo*)^k,    (2.60)

where the ak and bk are the real and imaginary parts of the coefficients peculiar to the function f(·), then

    [f(z)]* = f*(z*).    (2.61)

In the common case where all bk = 0 and zo = xo is a real number, f(·) and f*(·) are the same function, so (2.61) reduces to the desired form

    [f(z)]* = f(z*),    (2.62)

which says that the effect of conjugating the function’s input is merely to conjugate its output. Equation (2.62) expresses a significant, general rule of complex numbers and complex variables which is better explained in words than in mathematical symbols. The rule is this: for most equations and systems of equations used to model physical systems, one can produce an equally valid alternate model simply by simultaneously conjugating all the complex quantities present.³⁶

³⁵Mathematical induction is an elegant old technique for the construction of mathematical proofs. Section 8.1 elaborates on the technique and offers a more extensive example. Beyond the present book, a very good introduction to mathematical induction is found in [20].
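The conjugation rule lends itself to a quick numerical check. The sketch below is not part of the original text; the cubic polynomial is an arbitrary example of a power series with real coefficients (all bk = 0, zo = 0), and the second check reuses the book's own footnote illustration.

```python
# Verify [f(z)]* = f(z*) for a power series with real coefficients
# (here an arbitrary cubic polynomial, z_o = 0).
def f(z):
    return 3 + 2*z - 5*z**2 + 0.5*z**3

z = 1.7 - 2.3j
assert abs(f(z).conjugate() - f(z.conjugate())) < 1e-9

# The footnote's illustration: conjugating every complex quantity in
# (1+i2)(2+i3) + (1-i) = -3+i6 yields an equally valid equation.
lhs = (1+2j)*(2+3j) + (1-1j)
assert lhs == -3+6j
assert lhs.conjugate() == (1-2j)*(2-3j) + (1+1j)
```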

2.12.3 Power series and analytic functions (preview)

Equation (2.59) is a general power series³⁷ in z − zo. Such power series have broad application.³⁸

³⁶[20][44]
³⁷[24, § 10.8]
³⁸That is a pretty impressive-sounding statement: “Such power series have broad application.” However, air, words and molecules also have “broad application”; merely stating the fact does not tell us much. In fact the general power series is a sort of one-size-fits-all mathematical latex glove, which can be stretched to fit around almost any function. The interesting part is not so much in the general form (2.59) of the series as it is in the specific choice of ak and bk, which this section does not discuss. Observe that the Taylor series (which this section also does not discuss; see § 8.3) is a power series with ak = bk = 0 for k < 0.

It happens in practice that most functions of interest


in modeling physical phenomena can conveniently be constructed as power series (or sums of power series)³⁹ with suitable choices of ak, bk and zo. The property (2.61) applies to all such functions, with (2.62) also applying to those for which bk = 0 and zo = xo. The property the two equations represent is called the conjugation property. Basically, it says that if one replaces all the i in some mathematical model with −i, then the resulting conjugate model is equally as valid as the original.⁴⁰ Such functions, whether bk = 0 and zo = xo or not, are analytic functions (§ 8.4). In the formal mathematical definition, a function is analytic if it is infinitely differentiable (Ch. 4) in the immediate domain neighborhood of interest. However, for applications a fair working definition of the analytic function might be “a function expressible as a power series.” Chapter 8 elaborates. All power series are infinitely differentiable except at their poles. There nevertheless exists one common group of functions which cannot be constructed as power series. These all have to do with the parts of complex numbers and have been introduced in this very section: the magnitude |·|; the phase arg(·); the conjugate (·)*; and the real and imaginary parts ℜ(·) and ℑ(·). These functions are not analytic and do not in general obey the conjugation property. Also not analytic are the Heaviside unit step u(t) and the Dirac delta δ(t) (§ 7.7), used to model discontinuities explicitly. We shall have more to say about analytic functions in Ch. 8. We shall have more to say about complex numbers in §§ 3.11, 4.3.3, and 4.4, and much more yet in Ch. 5.

³⁹The careful reader might observe that this statement neglects the Gibbs phenomenon, but that curious matter will be dealt with in [section not yet written].
⁴⁰To illustrate, from the fact that (1 + i2)(2 + i3) + (1 − i) = −3 + i6 the conjugation property infers immediately that (1 − i2)(2 − i3) + (1 + i) = −3 − i6. Observe however that no such property holds for the real parts: (−1 + i2)(−2 + i3) + (−1 − i) ≠ 3 + i6.

Chapter 3

Trigonometry
Trigonometry is the branch of mathematics which relates angles to lengths. This chapter introduces the trigonometric functions and derives their several properties.

3.1 Definitions

Consider the circle-inscribed right triangle of Fig. 3.1. In considering the circle, we shall ﬁnd some terminology useful: the angle φ in the diagram is measured in radians, where a radian is the angle which, when centered in a unit circle, describes an arc of unit length.1 Measured in radians, an angle φ intercepts an arc of curved length ρφ on a circle of radius ρ (that is, of distance ρ from the circle’s center to its perimeter). An angle in radians is a dimensionless number, so one need not write “φ = 2π/4 radians”; it suﬃces to write “φ = 2π/4.” In mathematical theory, we express angles in radians. The angle of full revolution is given the symbol 2π—which thus is the circumference of a unit circle.2 A quarter revolution, 2π/4, is then the right angle, or square angle. The trigonometric functions sin φ and cos φ (the “sine” and “cosine” of φ) relate the angle φ to the lengths shown in Fig. 3.1. The tangent function is then deﬁned as sin φ , (3.1) tan φ ≡ cos φ
¹The word “unit” means “one” in this context. A unit length is a length of 1 (not one centimeter or one mile, just an abstract 1). A unit circle is a circle of radius 1.
²Section 8.11 computes the numerical value of 2π.


Figure 3.1: The sine and the cosine (shown on a circle-inscribed right triangle, with the circle centered at the triangle’s point).
[Figure: the circle-inscribed right triangle, with hypotenuse of length ρ at angle φ from the x axis; horizontal leg ρ cos φ, vertical leg ρ sin φ.]

which is the “rise” per unit “run,” or slope, of the triangle’s diagonal. Inverses of the three trigonometric functions can also be deﬁned: arcsin (sin φ) = φ, arccos (cos φ) = φ, arctan (tan φ) = φ. When the last of these is written in the form y , arctan x it is normally implied that x and y are to be interpreted as rectangular coordinates3 and that the arctan function is to return φ in the correct quadrant −π < φ ≤ π (for example, arctan[1/(−1)] = [+3/8][2π], whereas arctan[(−1)/1] = [−1/8][2π]). This is similarly the usual interpretation when an equation like y tan φ = x is written. By the Pythagorean theorem (§ 2.10), it is seen generally that4 cos2 φ + sin2 φ = 1. (3.2)

3 Rectangular coordinates are pairs of numbers (x, y) which uniquely specify points in a plane. Conventionally, the x coordinate indicates distance eastward; the y coordinate, northward. For instance, the coordinates (3, −4) mean the point three units eastward and four units southward (that is, −4 units northward) from the origin (0, 0). A third rectangular coordinate can also be added—(x, y, z)—where the z indicates distance upward. 4 The notation cos2 φ means (cos φ)2 .
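The quadrant-correct arctangent described above is exactly what the two-argument arctangent of most programming languages computes. A short sketch (not part of the original text) using Python's math.atan2:

```python
import math

# arctan(y/x) resolved into the correct quadrant, -pi < phi <= pi.
# The text's examples: arctan[1/(-1)] = (+3/8)(2 pi),
#                      arctan[(-1)/1] = (-1/8)(2 pi).
assert math.isclose(math.atan2(1, -1), (3/8) * 2*math.pi)
assert math.isclose(math.atan2(-1, 1), (-1/8) * 2*math.pi)

# The one-argument arctangent alone cannot distinguish the two cases:
assert math.atan(1/-1) == math.atan(-1/1)
```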

Figure 3.2: The sine function.

[Figure: plot of sin t versus t over −2π/2 ≤ t ≤ 2π/2, with unit amplitude.]

Fig. 3.2 plots the sine function. The shape in the plot is called a sinusoid.

3.2 Simple properties

Inspecting Fig. 3.1 and observing (3.1) and (3.2), one readily discovers the several simple trigonometric properties of Table 3.1.

3.3 Scalars, vectors, and vector notation

In applied mathematics, a vector is an amplitude of some kind coupled with a direction.⁵ For example, “55 miles per hour northwestward” is a vector, as is the entity u depicted in Fig. 3.3. The entity v depicted in Fig. 3.4 is also a vector, in this case a three-dimensional one. Many readers will already find the basic vector concept familiar, but for those who do not, a brief review: vectors such as the

    u = x̂x + ŷy,
    v = x̂x + ŷy + ẑz
5 The same word vector is also used to indicate an ordered set of N scalars (§ 8.16) or an N × 1 matrix (Ch. 11), but those are not the uses of the word meant here.


Table 3.1: Simple properties of the trigonometric functions.

    sin(−φ)       = −sin φ     cos(−φ)       = +cos φ     tan(−φ)       = −tan φ
    sin(2π/4 − φ) = +cos φ     cos(2π/4 − φ) = +sin φ     tan(2π/4 − φ) = +1/tan φ
    sin(2π/2 − φ) = +sin φ     cos(2π/2 − φ) = −cos φ     tan(2π/2 − φ) = −tan φ
    sin(φ ± 2π/4) = ±cos φ     cos(φ ± 2π/4) = ∓sin φ     tan(φ ± 2π/4) = −1/tan φ
    sin(φ ± 2π/2) = −sin φ     cos(φ ± 2π/2) = −cos φ     tan(φ ± 2π/2) = +tan φ
    sin(φ + n2π)  = sin φ      cos(φ + n2π)  = cos φ      tan(φ + n2π)  = tan φ

    tan φ = sin φ / cos φ
    cos²φ + sin²φ = 1
    1 + tan²φ = 1/cos²φ
    1 + 1/tan²φ = 1/sin²φ
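The table's entries can be spot-checked numerically at an arbitrary angle. A brief sketch, not part of the original text:

```python
import math

phi = 0.7  # an arbitrary angle
sin, cos, tan = math.sin, math.cos, math.tan
q = 2*math.pi/4  # a quarter revolution, the right angle

assert math.isclose(sin(-phi), -sin(phi))
assert math.isclose(sin(q - phi), cos(phi))
assert math.isclose(cos(phi + q), -sin(phi))
assert math.isclose(tan(q - phi), 1/tan(phi))
assert math.isclose(1 + tan(phi)**2, 1/cos(phi)**2)
```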


Figure 3.3: A two-dimensional vector u = x̂x + ŷy, shown with its rectangular components x̂x and ŷy.

Figure 3.4: A three-dimensional vector v = x̂x + ŷy + ẑz.

of the figures are composed of multiples of the unit basis vectors x̂, ŷ and ẑ, which themselves are vectors of unit length pointing in the cardinal directions their respective symbols suggest.⁶ Any vector a can be factored into an amplitude a and a unit vector â, as

    a = âa,

where the â represents direction only and has unit magnitude by definition, and where the a represents amplitude only and carries the physical units if any.⁷ For example, a = 55 miles per hour, â = northwestward. The unit vector â itself can be expressed in terms of the unit basis vectors: for example, if x̂ points east and ŷ points north, then â = −x̂(1/√2) + ŷ(1/√2), where per the Pythagorean theorem (−1/√2)² + (1/√2)² = 1².

A single number which is not a vector or a matrix (Ch. 11) is called a scalar. In the example, a = 55 miles per hour is a scalar. Though the scalar a in the example happens to be real, scalars can be complex, too—which might surprise one, since scalars by definition lack direction and the Argand phase φ of Fig. 2.5 so strongly resembles a direction. However, phase is not an actual direction in the vector sense (the real number line in the Argand plane cannot be said to run west-to-east, or anything like that). The x, y and z of Fig. 3.4 are each (possibly complex) scalars; v =

6 Printing by hand, one customarily writes a general vector like u as “ u ” or just “ u ”, ˆ and a unit vector like x as “ x ”. ˆ 7 The word “unit” here is unfortunately overloaded. As an adjective in mathematics, or in its noun form “unity,” it refers to the number one (1)—not one mile per hour, one kilogram, one Japanese yen or anything like that; just an abstract 1. The word “unit” itself as a noun however usually signiﬁes a physical or ﬁnancial reference quantity of measure, like a mile per hour, a kilogram or even a Japanese yen. There is no inherent mathematical unity to 1 mile per hour (otherwise known as 0.447 meters per second, among other names). By contrast, a “unitless 1”—a 1 with no physical unit attached— does represent mathematical unity. Consider the ratio r = h1 /ho of your height h1 to my height ho . Maybe you are taller than I am and so r = 1.05 (not 1.05 cm or 1.05 feet, just 1.05). Now consider the ratio h1 /h1 of your height to your own height. That ratio is of course unity, exactly 1. There is nothing ephemeral in the concept of mathematical unity, nor in the concept of unitless quantities in general. The concept is quite straightforward and entirely practical. That r > 1 means neither more nor less than that you are taller than I am. In applications, one often puts physical quantities in ratio precisely to strip the physical units from them, comparing the ratio to unity without regard to physical units.

x̂x + ŷy + ẑz is a vector. If x, y and z are complex, then⁸

    |v|² = |x|² + |y|² + |z|²
         = x*x + y*y + z*z
         = [ℜ(x)]² + [ℑ(x)]² + [ℜ(y)]² + [ℑ(y)]² + [ℜ(z)]² + [ℑ(z)]².    (3.3)

A point is sometimes identified by the vector expressing its distance and direction from the origin of the coordinate system. That is, the point (x, y) can be identified with the vector x̂x + ŷy. However, in the general case vectors are not associated with any particular origin; they represent distances and directions, not fixed positions.

Notice the relative orientation of the axes in Fig. 3.4. The axes are oriented such that if you point your flat right hand in the x direction, then bend your fingers in the y direction and extend your thumb, the thumb then points in the z direction. This is orientation by the right-hand rule.⁹ A left-handed orientation is equally possible, but as neither orientation has a natural advantage over the other, we arbitrarily but conventionally accept the right-handed one as standard.

Sections 3.4 and 3.9, and Ch. 15, speak further of the vector.

⁸Some books print |v| as ‖v‖ or even ‖v‖₂ to emphasize that it represents the real, scalar magnitude of a complex vector. The reason the last notation subscripts a numeral 2 is obscure, having to do with the professional mathematician’s generalized definition of a thing he calls the “norm.” This book just renders it |v|.
⁹The writer does not know the etymology for certain, but verbal lore in American engineering has it that the name “right-handed” comes from experience with a standard right-handed wood screw or machine screw. If you hold the screwdriver in your right hand and turn the screw in the natural manner clockwise, turning the screw slot from the x orientation toward the y, the screw advances away from you in the z direction into the wood or hole. If somehow you came across a left-handed screw, you’d probably find it easier to drive that screw with the screwdriver in your left hand.

3.4 Rotation

A fundamental problem in trigonometry arises when a vector

    u = x̂x + ŷy    (3.4)

must be expressed in terms of alternate unit vectors x̂′ and ŷ′, where x̂′ and ŷ′ stand at right angles to one another and lie in the plane¹⁰ of x̂ and ŷ, but are rotated from the latter by an angle ψ as depicted in Fig. 3.5.¹¹

Figure 3.5: Vector basis rotation.

In terms of the trigonometric functions of § 3.1, evidently

    x̂′ = +x̂ cos ψ + ŷ sin ψ,
    ŷ′ = −x̂ sin ψ + ŷ cos ψ,    (3.5)

and by appeal to symmetry it stands to reason that

    x̂ = +x̂′ cos ψ − ŷ′ sin ψ,
    ŷ = +x̂′ sin ψ + ŷ′ cos ψ.    (3.6)

Substituting (3.6) into (3.4) yields

    u = x̂′(x cos ψ + y sin ψ) + ŷ′(−x sin ψ + y cos ψ),    (3.7)

which was to be derived. Equation (3.7) finds general application where rotations in rectangular coordinates are involved. If the question is asked, “what happens if I rotate not the unit basis vectors but rather the vector u instead?” the answer is

¹⁰A plane, as the reader on this tier undoubtedly knows, is a flat (but not necessarily level) surface, infinite in extent unless otherwise specified. A line is one-dimensional, a plane is two-dimensional, and space is three-dimensional; a point is zero-dimensional. The plane belongs to this geometrical hierarchy.
¹¹The “′” mark is pronounced “prime” or “primed” (for no especially good reason of which the author is aware, but anyway, that’s how it’s pronounced). Mathematical writing employs the mark for a variety of purposes. Here, the mark merely distinguishes the new unit vector x̂′ from the old x̂.
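Equation (3.7) can be checked numerically by transforming a vector into the rotated basis and then reconstructing it per (3.6). A sketch with arbitrary example values, not part of the original text:

```python
import math

# A vector u = (x, y) and a basis rotated by psi.
x, y = 3.0, 2.0
psi = 0.6

# Components of u in the rotated basis, per (3.7):
xp = x*math.cos(psi) + y*math.sin(psi)
yp = -x*math.sin(psi) + y*math.cos(psi)

# Reconstruct u = x'hat*xp + y'hat*yp in the original coordinates,
# using x'hat = (cos psi, sin psi), y'hat = (-sin psi, cos psi) per (3.5).
ux = xp*math.cos(psi) - yp*math.sin(psi)
uy = xp*math.sin(psi) + yp*math.cos(psi)
assert math.isclose(ux, x)
assert math.isclose(uy, y)

# Rotation preserves the vector's length.
assert math.isclose(math.hypot(xp, yp), math.hypot(x, y))
```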

that it amounts to the same thing, except that the sense of the rotation is reversed:

    u′ = x̂(x cos ψ − y sin ψ) + ŷ(x sin ψ + y cos ψ).    (3.8)

Whether it is the basis or the vector which rotates thus depends on your point of view.¹²

3.5 Trigonometric functions of sums and differences of angles

With the results of § 3.4 in hand, we now stand in a position to consider trigonometric functions of sums and differences of angles. Let

    â ≡ x̂ cos α + ŷ sin α,
    b̂ ≡ x̂ cos β + ŷ sin β,

be vectors of unit length in the x̂ŷ plane, respectively at angles α and β from the x̂ axis. If we wanted b̂ to coincide with â, we would have to rotate it by ψ = α − β. According to (3.8) and the definition of b̂, if we did this we would obtain

    b̂′ = x̂[cos β cos(α − β) − sin β sin(α − β)]
       + ŷ[cos β sin(α − β) + sin β cos(α − β)].

Since we have deliberately chosen the angle of rotation such that b̂′ = â, we can separately equate the x̂ and ŷ terms in the expressions for â and b̂′ to obtain the pair of equations

    cos α = cos β cos(α − β) − sin β sin(α − β),
    sin α = cos β sin(α − β) + sin β cos(α − β).

Solving the last pair simultaneously¹³ for sin(α − β) and cos(α − β) and observing that sin²(·) + cos²(·) = 1 yields

    sin(α − β) = sin α cos β − cos α sin β,
    cos(α − β) = cos α cos β + sin α sin β.    (3.9)

With the change of variable β ← −β and the observations from Table 3.1 that sin(−φ) = −sin φ and cos(−φ) = +cos φ, eqns. (3.9) become

    sin(α + β) = sin α cos β + cos α sin β,
    cos(α + β) = cos α cos β − sin α sin β.    (3.10)

Equations (3.9) and (3.10) are the basic formulas for trigonometric functions of sums and differences of angles.

3.5.1 Variations on the sums and differences

Several useful variations on (3.9) and (3.10) are achieved by combining the equations in various straightforward ways.¹⁴ These include

    sin α sin β = [cos(α − β) − cos(α + β)]/2,
    sin α cos β = [sin(α − β) + sin(α + β)]/2,
    cos α cos β = [cos(α − β) + cos(α + β)]/2.    (3.11)

¹²This is only true, of course, with respect to the vectors themselves. When one actually rotates a physical body, the body experiences forces during rotation which might or might not change the body internally in some relevant way.
¹³The easy way to do this is
• to subtract sin β times the first equation from cos β times the second, then to solve the result for sin(α − β);
• to add cos β times the first equation to sin β times the second, then to solve the result for cos(α − β).
This shortcut technique for solving a pair of equations simultaneously for a pair of variables is well worth mastering. In this book alone, it proves useful many times.
¹⁴Refer to footnote 13 above for the technique.
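The sum, difference and product formulas can be verified numerically at arbitrary angles. A brief sketch, not part of the original text:

```python
import math

alpha, beta = 0.9, 0.4  # arbitrary angles
s, c = math.sin, math.cos

# (3.9)/(3.10): functions of sums and differences of angles
assert math.isclose(s(alpha + beta), s(alpha)*c(beta) + c(alpha)*s(beta))
assert math.isclose(c(alpha - beta), c(alpha)*c(beta) + s(alpha)*s(beta))

# (3.11): the product variations
assert math.isclose(s(alpha)*s(beta), (c(alpha-beta) - c(alpha+beta))/2)
assert math.isclose(c(alpha)*c(beta), (c(alpha-beta) + c(alpha+beta))/2)
```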

With the change of variables δ ← α − β and γ ← α + β, (3.9) and (3.10) become

    sin δ = sin[(γ + δ)/2] cos[(γ − δ)/2] − cos[(γ + δ)/2] sin[(γ − δ)/2],
    cos δ = cos[(γ + δ)/2] cos[(γ − δ)/2] + sin[(γ + δ)/2] sin[(γ − δ)/2],
    sin γ = sin[(γ + δ)/2] cos[(γ − δ)/2] + cos[(γ + δ)/2] sin[(γ − δ)/2],
    cos γ = cos[(γ + δ)/2] cos[(γ − δ)/2] − sin[(γ + δ)/2] sin[(γ − δ)/2].

Combining these in various ways, we have that

    sin γ + sin δ = 2 sin[(γ + δ)/2] cos[(γ − δ)/2],
    sin γ − sin δ = 2 cos[(γ + δ)/2] sin[(γ − δ)/2],
    cos δ + cos γ = 2 cos[(γ + δ)/2] cos[(γ − δ)/2],
    cos δ − cos γ = 2 sin[(γ + δ)/2] sin[(γ − δ)/2].    (3.12)

3.5.2 Trigonometric functions of double and half angles

If α = β, then eqns. (3.9) and (3.10) become the double-angle formulas

    sin 2α = 2 sin α cos α,
    cos 2α = 2 cos²α − 1 = cos²α − sin²α = 1 − 2 sin²α.    (3.13)

Solving (3.13) for sin²α and cos²α yields the half-angle formulas

    sin²α = (1 − cos 2α)/2,
    cos²α = (1 + cos 2α)/2.    (3.14)

3.6 Trigonometric functions of the hour angles

In general one uses the Taylor series of Ch. 8 to calculate trigonometric functions of specific angles. However, for angles which happen to be integral multiples of an hour—there are twenty-four or 0x18 hours in a circle, just as there are twenty-four or 0x18 hours in a day¹⁵—simpler expressions exist. Figure 3.6 shows the angles. Since such angles arise very frequently in practice, it seems worth our while to study them specially.

Table 3.2 tabulates the trigonometric functions of these hour angles. To see how the values in the table are calculated, look at the square and the equilateral triangle¹⁶ of Fig. 3.7. Each of the square’s four angles naturally measures six hours, and the diagonal splits the square’s corner into equal halves of three hours each. Also by symmetry, and since a triangle’s angles always total twelve hours (§ 2.9.3), each of the angles of the equilateral triangle in the figure measures four. The perpendicular splits the triangle’s top angle into equal halves of two hours each and its bottom leg into equal segments of length 1/2 each. The Pythagorean theorem (§ 2.10) then supplies the various other lengths in the figure, after which we observe from Fig. 3.1 that

• the sine of a non-right angle in a right triangle is the opposite leg’s length divided by the diagonal’s,

¹⁵Hence an hour is 15°. But you weren’t going to write your angles in such inelegant conventional notation as “15°,” were you? Well, if you were, you’re in good company. Consider: “There are 0x18 hours in a circle.” “There are 360 degrees in a circle.” Both sentences say the same thing, don’t they? But even though the “0x” hex prefix is a bit clumsy, the first sentence nevertheless says the thing rather better. Still, the hex and hour notations are recommended mostly only for theoretical math work; it is not claimed that they offer much benefit in most technical work of the less theoretical kinds. If you wrote an engineering memo describing a survey angle as 0x1.80 hours instead of 22.5 degrees, you’d probably not like the reception the memo got. Nonetheless, the improved notation fits a book of this kind so well that the author hazards it. The author is fully aware of the barrier the unfamiliar notation poses for most first-time readers of the book; the barrier is erected neither lightly nor disrespectfully. The reader is urged to invest the attention and effort to master the notation; it is hoped that after trying the notation a while, the reader will approve the choice.[4]
There is a psychological trap regarding the hour. The familiar, standard clock face shows only twelve hours, not twenty-four, so the angle between eleven o’clock and twelve on the clock face is not an hour of arc! That angle is two hours of arc. This is so because the clock face’s geometry is artificial. If you have ever been to the Old Royal Observatory at Greenwich, England, you may have seen the big clock face there with all twenty-four hours on it; the angle between hours on the Greenwich clock is indeed an honest hour of arc. (It’d be a bit hard to read the time from such a crowded clock face were it not so big.)
¹⁶An equilateral triangle is, as the name and the figure suggest, a triangle whose three sides all have the same length.
Figure 3.6: The 0x18 hours in a circle.

Figure 3.7: A square and an equilateral triangle for calculating trigonometric functions of the hour angles.

[Figure: a square of side 1 with diagonal √2; an equilateral triangle of side 1 with altitude √3/2 splitting the base into segments of 1/2.]

Table 3.2: Trigonometric functions of the hour angles.

    φ [radians]     φ [hours]   sin φ           tan φ           cos φ
    0               0           0               0               1
    2π/0x18         1           (√3−1)/(2√2)    (√3−1)/(√3+1)   (√3+1)/(2√2)
    2π/0xC          2           1/2             1/√3            √3/2
    2π/8            3           1/√2            1               1/√2
    2π/6            4           √3/2            √3              1/2
    (5)(2π)/0x18    5           (√3+1)/(2√2)    (√3+1)/(√3−1)   (√3−1)/(2√2)
    2π/4            6           1               ∞               0
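The table's closed-form hour-angle values can be checked against a numerical library. A sketch, not part of the original text:

```python
import math

# One hour of arc is 2*pi/0x18 radians (15 degrees).
hour = 2*math.pi / 0x18
r2, r3 = math.sqrt(2), math.sqrt(3)

assert math.isclose(math.sin(1*hour), (r3 - 1)/(2*r2))    # one hour
assert math.isclose(math.tan(1*hour), (r3 - 1)/(r3 + 1))  # one hour
assert math.isclose(math.sin(2*hour), 1/2)                # two hours
assert math.isclose(math.cos(3*hour), 1/r2)               # three hours
assert math.isclose(math.sin(4*hour), r3/2)               # four hours
assert math.isclose(math.cos(5*hour), (r3 - 1)/(2*r2))    # five hours
```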

• the tangent is the opposite leg’s length divided by the adjacent leg’s, and

• the cosine is the adjacent leg’s length divided by the diagonal’s.

With this observation and the lengths in the figure, one can calculate the sine, tangent and cosine of angles of two, three and four hours. The values for zero and six hours are, of course, seen by inspection. The values for one and five hours are found by applying (3.9) and (3.10) against the values for two and three hours just calculated.¹⁷

¹⁷The creative reader may notice that he can extend the table to any angle by repeated application of the various sum, difference and half-angle formulas from the preceding sections to the values already in the table. However, the Taylor series (§ 8.9) offers a cleaner, quicker way to calculate trigonometrics of non-hour angles.

3.7 The laws of sines and cosines

Refer to the triangle of Fig. 3.8.

Figure 3.8: The laws of sines and cosines.

[Figure: a triangle with sides a, b, c opposite angles α, β, γ, a perpendicular h dropped from the top corner, and axes x, y.]

By the definition of the sine function, one can write that

    c sin β = h = b sin γ,

or in other words that

    sin β / b = sin γ / c.

But there is nothing special about β and γ;¹⁸ what is true for them must be true for α, too. Hence,

    sin α / a = sin β / b = sin γ / c.    (3.15)

This equation is known as the law of sines.

On the other hand, if one expresses a and b as vectors emanating from the point γ,¹⁹

    a = x̂a,
    b = x̂b cos γ + ŷb sin γ,

then

    c² = |b − a|²
       = (b cos γ − a)² + (b sin γ)²
       = a² + (b²)(cos²γ + sin²γ) − 2ab cos γ.

Since cos²(·) + sin²(·) = 1, this is

    c² = a² + b² − 2ab cos γ,    (3.16)

known as the law of cosines.

3.8 Summary of properties

Table 3.1 on page 48 has summarized simple properties of the trigonometric functions. Table 3.2 on page 58 has listed the values of trigonometric functions of the hour angles. Table 3.3 summarizes further properties, gathering them from §§ 3.4, 3.5 and 3.7.

¹⁸“But,” it is objected, “there is something special about β and γ. The perpendicular h drops from it.” True. However, the h is just a utility variable to help us to manipulate the equation into the desired form; we’re not interested in h itself. Nothing prevents us from dropping additional perpendiculars hβ and hγ from the other two corners and using those as utility variables, too, if we like. We can use any utility variables we want. The skillful applied mathematician does not multiply labels without need.
¹⁹Here is another example of the book’s judicious relaxation of formal rigor. Of course there is no “point γ”; γ is an angle not a point. However, the writer suspects in light of Fig. 3.8 that few readers will be confused as to which point is meant.
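The laws of sines and cosines can be exercised together numerically: the law of cosines fixes the third side, the law of sines then recovers the remaining angles, and the angles must total a half revolution. A sketch with arbitrary example values, not part of the original text:

```python
import math

# A triangle given two sides a, b and the included angle gamma.
a, b, gamma = 3.0, 5.0, 1.1

# Law of cosines (3.16) gives the third side:
c = math.sqrt(a**2 + b**2 - 2*a*b*math.cos(gamma))

# Law of sines (3.15) recovers the remaining (here acute) angles:
alpha = math.asin(a*math.sin(gamma)/c)
beta = math.asin(b*math.sin(gamma)/c)

# The interior angles total a half revolution, 2*pi/2.
assert math.isclose(alpha + beta + gamma, math.pi)
```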

Table 3.3: Further properties of the trigonometric functions.

    u = x̂′(x cos ψ + y sin ψ) + ŷ′(−x sin ψ + y cos ψ)
    sin(α ± β) = sin α cos β ± cos α sin β
    cos(α ± β) = cos α cos β ∓ sin α sin β
    sin α sin β = [cos(α − β) − cos(α + β)]/2
    sin α cos β = [sin(α − β) + sin(α + β)]/2
    cos α cos β = [cos(α − β) + cos(α + β)]/2
    sin γ + sin δ = 2 sin[(γ + δ)/2] cos[(γ − δ)/2]
    sin γ − sin δ = 2 cos[(γ + δ)/2] sin[(γ − δ)/2]
    cos δ + cos γ = 2 cos[(γ + δ)/2] cos[(γ − δ)/2]
    cos δ − cos γ = 2 sin[(γ + δ)/2] sin[(γ − δ)/2]
    sin 2α = 2 sin α cos α
    cos 2α = 2 cos²α − 1 = cos²α − sin²α = 1 − 2 sin²α
    sin²α = (1 − cos 2α)/2
    cos²α = (1 + cos 2α)/2
    sin α / a = sin β / b = sin γ / c
    c² = a² + b² − 2ab cos γ

3.9 Cylindrical and spherical coordinates

Section 3.3 has introduced the concept of the vector

    v = x̂x + ŷy + ẑz.

The coefficients (x, y, z) on the equation’s right side are coordinates—specifically, rectangular coordinates—which given a specific orthonormal²⁰ set of unit basis vectors [x̂ ŷ ẑ] uniquely identify a point (see Fig. 3.4 on page 49). Such rectangular coordinates are simple and general, and are convenient for many purposes. However, there are at least two broad classes of conceptually simple problems for which rectangular coordinates tend to be inconvenient: problems in which an axis or a point dominates. Consider for example an electric wire’s magnetic field, whose intensity varies with distance from the wire (an axis); or the illumination a lamp sheds on a printed page of this book, which depends on the book’s distance from the lamp (a point).

To attack a problem dominated by an axis, the cylindrical coordinates (ρ; φ, z) can be used instead of the rectangular coordinates (x, y, z). To attack a problem dominated by a point, the spherical coordinates (r; θ; φ) can be used.²¹ Refer to Fig. 3.9. Such coordinates are related to one another and to the rectangular coordinates by the formulas of Table 3.4.

Cylindrical and spherical coordinates can greatly simplify the analyses of the kinds of problems they respectively fit, but they come at a price. There are no constant unit basis vectors to match them. That is,

    v = x̂x + ŷy + ẑz ≠ ρ̂ρ + φ̂φ + ẑz ≠ r̂r + θ̂θ + φ̂φ.

It doesn’t work that way. Nevertheless variable unit basis vectors are defined:

    ρ̂ ≡ +x̂ cos φ + ŷ sin φ,
    φ̂ ≡ −x̂ sin φ + ŷ cos φ,
    r̂ ≡ +ẑ cos θ + ρ̂ sin θ,
    θ̂ ≡ −ẑ sin θ + ρ̂ cos θ.    (3.17)

Such variable unit basis vectors point locally in the directions in which their respective coordinates advance.

²⁰Orthonormal in this context means “of unit length and at right angles to the other vectors in the set.” [52, “Orthonormality,” 14:19, 7 May 2006]
²¹Notice that the φ is conventionally written second in cylindrical (ρ; φ, z) coordinates but third in spherical (r; θ; φ) coordinates. This odd-seeming convention is to maintain proper right-handed coordinate rotation. (The explanation will seem clearer once Ch. 15 is read.)

Figure 3.9: A point on a sphere, in spherical (r; θ; φ) and cylindrical (ρ; φ, z) coordinates. (The axis labels bear circumflexes in this figure only to disambiguate the ẑ axis from the cylindrical coordinate z.)

Table 3.4: Relations among the rectangular, cylindrical and spherical coordinates.

    ρ² = x² + y²
    r² = ρ² + z² = x² + y² + z²
    tan θ = ρ/z
    tan φ = y/x
    z = r cos θ
    ρ = r sin θ
    x = ρ cos φ = r sin θ cos φ
    y = ρ sin φ = r sin θ sin φ
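The relations of Table 3.4 can be exercised by converting a rectangular point to spherical coordinates and back. A sketch with arbitrary example values, not part of the original text (the two-argument arctangent keeps the angles in the correct quadrants):

```python
import math

# Start from rectangular (x, y, z); convert to spherical (r, theta, phi)
# per Table 3.4, then convert back.
x, y, z = 2.0, -3.0, 1.5

rho = math.sqrt(x**2 + y**2)
r = math.sqrt(rho**2 + z**2)
theta = math.atan2(rho, z)   # tan(theta) = rho/z, quadrant-correct
phi = math.atan2(y, x)       # tan(phi) = y/x, quadrant-correct

assert math.isclose(z, r*math.cos(theta))
assert math.isclose(rho, r*math.sin(theta))
assert math.isclose(x, r*math.sin(theta)*math.cos(phi))
assert math.isclose(y, r*math.sin(theta)*math.sin(phi))
```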

Convention usually orients ẑ in the direction of a problem’s axis. Occasionally however a problem arises in which it is more convenient to orient x̂ or ŷ in the direction of the problem’s axis (usually because ẑ has already been established in the direction of some other pertinent axis). Changing the meanings of known symbols like ρ, θ and φ is usually not a good idea, but you can use symbols like²²

    (ρˣ)² = y² + z²,    (ρʸ)² = z² + x²,
    tan θˣ = ρˣ/x,      tan θʸ = ρʸ/y,
    tan φˣ = z/y,       tan φʸ = x/z,    (3.18)

instead if needed.

²²Symbols like ρˣ are logical but, as far as this writer is aware, not conventionally established or standard.

3.10 The complex triangle inequalities

If a, b and c represent the three sides of a triangle such that a + b + c = 0, then per (2.44)

    |a| − |b| ≤ |a + b| ≤ |a| + |b|.

These are just the triangle inequalities of § 2.9.2 in vector notation.²³ But if the triangle inequalities hold for vectors in a plane, then why not equally for complex numbers? Consider the geometric interpretation of the Argand plane of Fig. 2.5 on page 40. Evidently,

    |z1| − |z2| ≤ |z1 + z2| ≤ |z1| + |z2|    (3.19)

for any two complex numbers z1 and z2. Extending the sum inequality, we have that

    | Σk zk | ≤ Σk |zk| .    (3.20)

Naturally, (3.19) and (3.20) hold equally well for real numbers as for complex; one may find the latter formula useful for sums of real numbers, for example, when some of the numbers summed are positive and others negative.

²³Reading closely, one might note that § 2.9.2 uses the “<” sign rather than the “≤,” but that’s all right. See § 1.2.2.
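The complex triangle inequalities are easy to probe numerically; any example values will do. A sketch, not part of the original text:

```python
# Check (3.19) and (3.20) on arbitrary complex numbers.
z1, z2 = 3 - 4j, -1 + 2j

# (3.19): |z1| - |z2| <= |z1 + z2| <= |z1| + |z2|
assert abs(z1) - abs(z2) <= abs(z1 + z2) <= abs(z1) + abs(z2)

# (3.20): the magnitude of a sum never exceeds the sum of magnitudes.
zs = [1 + 1j, -2 + 0.5j, 0 - 3j, 2.5 + 2j]
assert abs(sum(zs)) <= sum(abs(z) for z in zs)
```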

An important consequence of (3.20) is that if Σ|zₖ| converges, then Σzₖ also converges. Such a consequence is important because mathematical derivations sometimes need the convergence of Σzₖ established, which can be hard to do directly. Convergence of Σ|zₖ|, which per (3.20) implies convergence of Σzₖ, is often easier to establish. Equation (3.20) will find use among other places in § 8.10.

3.11 De Moivre's theorem

Compare the Argand-plotted complex number of Fig. 2.5 (page 40) against the vector of Fig. 3.3 (page 49). Although complex numbers are scalars not vectors, the figures do suggest an analogy between complex phase and vector direction. With reference to Fig. 2.5 we can write

    z = (ρ)(cos φ + i sin φ) = ρ cis φ,        (3.21)

where

    cis φ ≡ cos φ + i sin φ.        (3.22)

If z = x + iy, then evidently

    x = ρ cos φ,   y = ρ sin φ.        (3.23)

Per (2.53),

    z₁z₂ = (x₁x₂ − y₁y₂) + i(y₁x₂ + x₁y₂).

Applying (3.23) to the equation yields

    z₁z₂/(ρ₁ρ₂) = (cos φ₁ cos φ₂ − sin φ₁ sin φ₂) + i(sin φ₁ cos φ₂ + cos φ₁ sin φ₂).

But according to (3.10), this is just

    z₁z₂/(ρ₁ρ₂) = cos(φ₁ + φ₂) + i sin(φ₁ + φ₂),

or in other words

    z₁z₂ = ρ₁ρ₂ cis(φ₁ + φ₂).        (3.24)

Equation (3.24) is an important result. It says that if you want to multiply complex numbers, it suffices

• to multiply their magnitudes and
• to add their phases.

It follows by parallel reasoning (or by extension) that

    z₁/z₂ = (ρ₁/ρ₂) cis(φ₁ − φ₂)        (3.25)

and by extension that

    zᵃ = ρᵃ cis aφ.        (3.26)

Equations (3.24), (3.25) and (3.26) are known as de Moivre's theorem.²⁴ ²⁵

We have not shown yet, but shall in § 5.4, that

    cis φ = exp iφ = e^{iφ},

where exp(·) is the natural exponential function and e is the natural logarithmic base, both defined in Ch. 5. De Moivre's theorem is most useful in this light.

²⁴Also called de Moivre's formula. Some authors apply the name of de Moivre directly only to (3.26), or to some variation thereof; but since the three equations express essentially the same idea, if you refer to any of them as de Moivre's theorem then you are unlikely to be misunderstood.
²⁵[44][52]
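The magnitude-and-phase rule (3.24) is easy to check numerically. A minimal sketch in Python (not part of the text; the helper `cis` merely implements (3.22)):

```python
import cmath
import math

def cis(phi):
    # cis(phi) = cos(phi) + i sin(phi), per (3.22)
    return complex(math.cos(phi), math.sin(phi))

z1 = 2.0 * cis(0.3)   # rho1 = 2.0, phi1 = 0.3
z2 = 1.5 * cis(1.1)   # rho2 = 1.5, phi2 = 1.1

product = z1 * z2
# (3.24) predicts rho1*rho2 cis(phi1 + phi2):
predicted = (2.0 * 1.5) * cis(0.3 + 1.1)

assert abs(product - predicted) < 1e-12
```

The product's magnitude comes out ρ₁ρ₂ = 3 and its phase φ₁ + φ₂ = 1.4, as the theorem says.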

Chapter 4

The derivative

The mathematics of calculus concerns a complementary pair of questions:¹

• Given some function f(t), what is the function's instantaneous rate of change, or derivative, f′(t)?
• Interpreting some function f′(t) as an instantaneous rate of change, what is the corresponding accretion, or integral, f(t)?

This chapter builds toward a basic understanding of the first question.

¹Although once grasped the concept is relatively simple, to understand this pair of questions, so briefly stated, is no trivial thing. They are the pair which eluded or confounded the most brilliant mathematical minds of the ancient world. The greatest conceptual hurdle, the stroke of brilliance, probably lies simply in stating the pair of questions clearly. Sir Isaac Newton and G.W. Leibnitz cleared this hurdle for us in the seventeenth century, so now at least we know the right pair of questions to ask. With the pair in hand, the calculus beginner's first task is quantitatively to understand the pair's interrelationship, generality and significance. Such an understanding constitutes the basic calculus concept. It cannot be the role of a book like this one to lead the beginner gently toward an apprehension of the basic calculus concept. Many instructional textbooks ([20] is a worthy example) have been written to lead the beginner gently. Once grasped, the concept is simple and briefly stated. In this book we necessarily state the concept briefly, then move along. Although a sufficiently talented, dedicated beginner could perhaps obtain the basic calculus concept directly here, he would probably find it quicker and more pleasant to begin with a book like the one referenced.

4.1 Infinitesimals and limits

Calculus systematically treats numbers so large and so small, they lie beyond the reach of our mundane number system.

4.1.1 The infinitesimal

A number ǫ is an infinitesimal if it is so small that

    0 < |ǫ| < a

for all possible mundane positive numbers a. This is somewhat a difficult concept, so if it is not immediately clear then let us approach the matter colloquially. Let me propose to you that I have an infinitesimal.

"How big is your infinitesimal?" you ask.

"Very, very small," I reply.

"How small?"

"Very small."

"Smaller than 0x0.01?"

"Smaller than what?"

"Than 2⁻⁸. You said that we should use hexadecimal notation in this book, remember?"

"Sorry. Yes, right, smaller than 0x0.01."

"What about 0x0.0001? Is it smaller than that?"

"Much smaller."

"Smaller than 0x0.0000 0000 0000 0001?"

"Smaller."

"Smaller than 2^{−0x1 0000 0000 0000 0000}?"

"Now that is an impressively small number. Nevertheless, my infinitesimal is smaller still."

"Zero, then."

"Oh, no. Bigger than that. My infinitesimal is definitely bigger than zero."

This is the idea of the infinitesimal. It is a definite number of a certain nonzero magnitude, but its smallness conceptually lies beyond the reach of our mundane number system.

The principal advantage of using symbols like ǫ rather than 0 for infinitesimals is that it permits us conveniently to compare one infinitesimal against another, to add them together, to divide them, etc. For instance, if δ = 3ǫ is another infinitesimal, then the quotient δ/ǫ is not some unfathomable 0/0; rather it is δ/ǫ = 3. If ǫ is an infinitesimal, then 1/ǫ can be regarded as an infinity: a very large number much larger than any mundane number one can name.

In physical applications, the infinitesimals are often not true mathematical infinitesimals but rather relatively very small quantities such as the mass of a wood screw compared to the mass

of a wooden house frame, or the audio power of your voice compared to that of a jet engine. The additional cost of inviting one more guest to the wedding may or may not be infinitesimal, depending on your point of view. The key point is that the infinitesimal quantity be negligible by comparison, whatever "negligible" means in the context.²

The second-order infinitesimal ǫ² is so small on the scale of the common, first-order infinitesimal ǫ that the latter cannot even measure it. The ǫ² is an infinitesimal to the infinitesimals. Third- and higher-order infinitesimals are likewise possible.

The notation u ≪ v, or v ≫ u, indicates that u is much less than v, typically such that one can regard the quantity u/v to be an infinitesimal. In fact, one common way to specify that ǫ be infinitesimal is to write that ǫ ≪ 1.

4.1.2 Limits

The notation lim_{z→zo} indicates that z draws as near to zo as it possibly can. When written lim_{z→zo⁺}, the implication is that z draws toward zo from the positive side such that z > zo. Similarly, when written lim_{z→zo⁻}, the implication is that z draws toward zo from the negative side.

The reason for the notation is to provide a way to handle expressions like 3z/2z as z vanishes:

    lim_{z→0} 3z/2z = 3/2.

The symbol "lim_Q" is short for "in the limit as Q." Notice that lim is not a function like log or sin. It is just a reminder that a quantity approaches some value, used when saying that the quantity

²Among scientists and engineers who study wave phenomena, there is an old rule of thumb that sinusoidal waveforms be discretized not less finely than ten points per wavelength. In keeping with this book's adecimal theme (Appendix A) and the concept of the hour of arc (§ 3.6), we should probably render the rule as twelve points per wavelength here. In any case, even very roughly speaking, a quantity greater than 1/0xC of the principal to which it compares probably cannot rightly be regarded as infinitesimal. For quantities between 1/0xC and 1/0x10000, it depends on the accuracy one seeks. On the other hand, a quantity less than 1/0x10000 of the principal is indeed infinitesimal for most practical purposes (but not all: for example, positions of spacecraft and concentrations of chemical impurities must sometimes be accounted more precisely).

equaled the value would be confusing. Consider that to say

    lim_{z→2⁻} (z + 2) = 4

is just a fancy way of saying that 2 + 2 = 4. The lim notation is convenient to use sometimes, but it is not magical. Don't let it confuse you.

4.2 Combinatorics

In its general form, the problem of selecting k specific items out of a set of n available items belongs to probability theory ([chapter not yet written]). In its basic form, however, the same problem also applies to the handling of polynomials or power series. This section treats the problem in its basic form.³

4.2.1 Combinations and permutations

Consider the following scenario. I have several small wooden blocks of various shapes and sizes, painted different colors so that you can clearly tell each block from the others. If I offer you the blocks and you are free to take all, some or none of them at your option, if you can take whichever blocks you want, then how many distinct choices of blocks do you have? Answer: you have 2ⁿ choices, because you can accept or reject the first block, then accept or reject the second, then the third, and so on.

Now, suppose that what you want is exactly k blocks, neither more nor fewer. Desiring exactly k blocks, you select your favorite block first: there are n options for this. Then you select your second favorite: for this, there are n − 1 options (why not n options? because you have already taken one block from me; I have only n − 1 blocks left). Then you select your third favorite (for this there are n − 2 options), and so on until you have k blocks. There are evidently

    ₙPₖ ≡ n!/(n − k)!        (4.1)

ordered ways, or permutations, available for you to select exactly k blocks. However, some of these distinct permutations put exactly the same combination of blocks in your hand; for instance, the permutations red-green-blue and green-red-blue constitute the same combination, whereas red-white-blue is a different combination entirely. For a single combination

³[20]

of k blocks (red, green, blue), evidently k! permutations are possible (red-green-blue, red-blue-green, green-red-blue, green-blue-red, blue-red-green, blue-green-red). Hence dividing the number of permutations (4.1) by k! yields the number of combinations

    (n k) ≡ n!/[(n − k)! k!].        (4.2)

Properties of the number (n k) of combinations include that

    (n n−k) = (n k),        (4.3)
    Σ_{k=0}^{n} (n k) = 2ⁿ,        (4.4)
    (n−1 k−1) + (n−1 k) = (n k),        (4.5)
    (n k) = [(n − k + 1)/k] (n k−1)        (4.6)
          = [(k + 1)/(n − k)] (n k+1)        (4.7)
          = [n/(n − k)] (n−1 k)        (4.8)
          = (n/k) (n−1 k−1).        (4.9)

Equation (4.3) results from changing the variable k ← n − k in (4.2). Equation (4.4) comes directly from the observation (made at the head of this section) that 2ⁿ total combinations are possible if any k is allowed. Equation (4.5) is seen when an nth block (let us say that it is a black block) is added to an existing set of n − 1 blocks: to choose k blocks then, you can either choose k from the original set, or the black block plus k − 1 from the original set. Equations (4.6) through (4.9) come directly from the definition (4.2); they relate combinatoric coefficients to their neighbors in Pascal's triangle (§ 4.2.2).

Because one can choose neither fewer than zero nor more than n from n blocks,

    (n k) = 0 unless 0 ≤ k ≤ n.        (4.10)

For (n k) when n < 0, there is no obvious definition.
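The identities above are quick to check numerically. A small sketch (not from the text), using Python's `math.comb` for the coefficient (n k):

```python
from math import comb

n = 7

# (4.4): the combinations over all k sum to 2**n.
total = sum(comb(n, k) for k in range(n + 1))

# (4.3): symmetry under the change of variable k <- n - k.
symmetric = all(comb(n, k) == comb(n, n - k) for k in range(n + 1))

# (4.5): each entry of Pascal's triangle is the sum of the two above it.
pascal = all(comb(n - 1, k - 1) + comb(n - 1, k) == comb(n, k)
             for k in range(1, n))

# (4.10): the coefficient vanishes outside 0 <= k <= n
# (math.comb returns zero when k > n).
out_of_range = comb(n, n + 2)

assert total == 2 ** n and symmetric and pascal and out_of_range == 0
```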

4.2.2 Pascal's triangle

Consider the triangular layout in Fig. 4.1 of the various possible (n k). Evaluated, this yields Fig. 4.2, Pascal's triangle. Notice how each entry in the triangle is the sum of the two entries immediately above, as (4.5) predicts. (In fact this is the easy way to fill Pascal's triangle out: for each entry, just add the two entries above.)

Figure 4.1: The plan for Pascal's triangle.

    (0 0)
    (1 0) (1 1)
    (2 0) (2 1) (2 2)
    (3 0) (3 1) (3 2) (3 3)
    (4 0) (4 1) (4 2) (4 3) (4 4)
    ...

4.3 The binomial theorem

This section presents the binomial theorem and one of its significant consequences.

4.3.1 Expanding the binomial

The binomial theorem holds that⁴

    (a + b)ⁿ = Σ_{k=0}^{n} (n k) a^{n−k} b^k.        (4.11)

⁴The author is given to understand that, by an heroic derivational effort, (4.11) can be extended to nonintegral n. However, since applied mathematics does not usually concern itself with hard theorems of little known practical use, the extension is not covered in this book.

Figure 4.2: Pascal's triangle (entries in hexadecimal).

    1
    1  1
    1  2  1
    1  3  3  1
    1  4  6  4  1
    1  5  A  A  5  1
    1  6  F  14  F  6  1
    1  7  15  23  23  15  7  1
    ...

In the common case that a = 1, b = ǫ, |ǫ| ≪ 1, this is

    (1 + ǫ)ⁿ = Σ_{k=0}^{n} (n k) ǫ^k        (4.12)

(actually this holds for any ǫ, small or large, but the typical case of interest has |ǫ| ≪ 1). In either form, the binomial theorem is a direct consequence of the combinatorics of § 4.2. Since

    (a + b)ⁿ = (a + b)(a + b) · · · (a + b)(a + b),

each (a + b) factor corresponds to one of the "wooden blocks," where a means rejecting the block and b, accepting it.

4.3.2 Powers of numbers near unity

Since (n 0) = 1 and (n 1) = n, it follows from (4.12) that

    (1 + ǫo)ⁿ ≈ 1 + nǫo,

to excellent precision for nonnegative integral n if ǫo is sufficiently small. Furthermore, raising the equation to the 1/n power then changing δ ← nǫo and m ← n, we have that

    (1 + δ)^{1/m} ≈ 1 + δ/m.

Changing 1 + δ ← (1 + ǫ)ⁿ and observing from the (1 + ǫo)ⁿ equation above that this implies that δ ≈ nǫ, we have that

    (1 + ǫ)^{n/m} ≈ 1 + (n/m)ǫ.

Inverting this equation yields

    (1 + ǫ)^{−n/m} ≈ 1/[1 + (n/m)ǫ] = [1 − (n/m)ǫ]/{[1 − (n/m)ǫ][1 + (n/m)ǫ]} ≈ 1 − (n/m)ǫ.

Taken together, the last two equations imply that

    (1 + ǫ)^x ≈ 1 + xǫ        (4.13)

for any real x.

The writer knows of no conventional name⁵ for (4.13), but named or unnamed it is an important equation. The equation offers a simple, accurate way of approximating any real power of numbers in the near neighborhood of 1.

4.3.3 Complex powers of numbers near unity

Equation (4.13) raises the question: what if ǫ or x, or both, are complex? Changing the symbol z ← x and observing that the infinitesimal ǫ may also be complex, one wants to know whether

    (1 + ǫ)^z ≈ 1 + zǫ        (4.14)

still holds. No work we have yet done in the book answers the question, because although a complex infinitesimal ǫ poses no particular problem, the action of a complex power z remains undefined. Still, for consistency's sake, one would like (4.14) to hold. In fact nothing prevents us from defining the action of a complex power such that (4.14) does hold, which we now do, logically extending the known result (4.13) into the new domain.

Section 5.4 will investigate the extremely interesting effects which arise when ℜ(ǫ) = 0 and the power z in (4.14) grows large, but for the moment we shall use the equation in a more ordinary manner to develop the concept and basic application of the derivative, as follows.

⁵Other than "the first-order Taylor expansion," but such an unwieldy name does not fit the present context. The Taylor expansion as such will be introduced in Ch. 8.
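A quick numeric look (not from the text) at the near-unity approximation (4.13): for small ǫ the neglected terms begin at (n 2)ǫ², so the error shrinks quadratically with ǫ.

```python
eps = 1e-6
n = 5

exact = (1 + eps) ** n
approx = 1 + n * eps          # (1 + eps)**n ~ 1 + n*eps, per (4.13)

# The first neglected term is comb(n, 2) * eps**2 = 10 * 1e-12 here,
# so the two values agree to about eleven decimal places.
abs_error = abs(exact - approx)
rel_error = abs_error / exact
```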

4.4 The derivative

Having laid down (4.14), we now stand in a position properly to introduce the chapter's subject, the derivative. What is the derivative? The derivative is the instantaneous rate or slope of a function. In mathematical symbols and for the moment using real numbers,

    f′(t) ≡ lim_{ǫ→0⁺} [f(t + ǫ/2) − f(t − ǫ/2)]/ǫ.        (4.15)

Alternately, one can define the same derivative in the unbalanced form

    f′(t) = lim_{ǫ→0⁺} [f(t + ǫ) − f(t)]/ǫ,

but this book generally prefers the more elegant balanced form (4.15), which we will now use in developing the derivative's several properties through the rest of the chapter.⁶

4.4.1 The derivative of the power series

In the very common case that f(t) is the power series

    f(t) = Σ_{k=−∞}^{∞} cₖ t^k,        (4.16)

where the cₖ are in general complex coefficients, (4.15) says that

    f′(t) = Σ_{k=−∞}^{∞} lim_{ǫ→0⁺} [(cₖ)(t + ǫ/2)^k − (cₖ)(t − ǫ/2)^k]/ǫ
          = Σ_{k=−∞}^{∞} lim_{ǫ→0⁺} cₖ t^k [(1 + ǫ/2t)^k − (1 − ǫ/2t)^k]/ǫ.

Applying (4.14), this is

    f′(t) = Σ_{k=−∞}^{∞} lim_{ǫ→0⁺} cₖ t^k [(1 + kǫ/2t) − (1 − kǫ/2t)]/ǫ,

⁶From this section through § 4.7, the mathematical notation grows a little thick. There is no helping this. The reader is advised to tread through these sections line by stubborn line, in the good trust that the math thus gained will prove both interesting and useful.

which simplifies to

    f′(t) = Σ_{k=−∞}^{∞} cₖ k t^{k−1}.        (4.17)

Equation (4.17) gives the general derivative of the power series.⁷

4.4.2 The Leibnitz notation

The f′(t) notation used above for the derivative is due to Sir Isaac Newton, and is easier to start with. Usually better on the whole, however, is G.W. Leibnitz's notation⁸

    dt = ǫ,
    df = f(t + dt/2) − f(t − dt/2),

such that per (4.15),

    f′(t) = df/dt.        (4.18)

Here dt is the infinitesimal, and df is a dependent infinitesimal whose size relative to dt depends on the independent variable t.⁹ The quantities dt and df represent coordinated infinitesimal changes in t and f respectively.

The meaning of the symbol d unfortunately depends on the context. In (4.18), the meaning is clear enough: d(·) signifies how much (·) changes when the independent variable t increments by dt. Notice, however, that the notation dt itself has two distinct meanings:¹⁰

⁷Equation (4.17) admittedly has not explicitly considered what happens when the real t becomes the complex z, but § 4.4.3 will remedy the oversight.
⁸This subsection is likely to confuse many readers the first time they read it. The reason is that Leibnitz elements like dt and ∂f usually tend to appear in practice in certain specific relations to one another, like ∂f/∂z. As a result, many users of applied mathematics have never developed a clear understanding as to precisely what the individual symbols mean. Often they have developed positive misunderstandings. Because there is significant practical benefit in learning how to handle the Leibnitz notation correctly (particularly in applied complex variable theory), this subsection seeks to present each Leibnitz element in its correct light.
⁹If you do not fully understand this sentence, reread it carefully with reference to (4.15) and (4.18) until you do; it's important.
¹⁰This is difficult, yet the author can think of no clearer, more concise way to state it. For the independent infinitesimal dt, one can choose any infinitesimal size ǫ. Usually the exact choice of size does not matter, but occasionally when there are two independent variables it helps the analysis to adjust the size of one of the independent infinitesimals with respect to the other.

• the independent infinitesimal dt = ǫ, and
• d(t), which is how much (t) changes as t increments by dt.

At first glance, the distinction between dt and d(t) seems a distinction without a difference; and for most practical cases of interest, so indeed it is. However, when switching perspective in mid-analysis as to which variables are dependent and which are independent, or when changing multiple independent complex variables simultaneously, the math can get a little tricky.

If t is an independent variable, then dt is just an infinitesimal of some kind, whose specific size could be a function of t but more likely is just a constant. If a constant, then dt does not fundamentally have anything to do with t as such. In fact, if s and t are equally independent variables, then we can (and in complex analysis sometimes do) say that ds = dt = ǫ, after which nothing prevents us from using the symbols ds and dt interchangeably. (One can instead have ds = δ(s, t), such that dt has prior independence to ds, or one can have dt = ǫ(t), a function of t.) Maybe it would be clearer in some cases to write ǫ instead of dt, but the latter is how it is conventionally written.

By contrast, if f is a dependent variable, then df or d(f) is the amount by which f changes as t changes by dt. The df is infinitesimal but not constant; it is a function of t. Maybe it would be clearer in some cases to write dₜf instead of df, but for most cases the former notation is unnecessarily cluttered; the latter is how it is conventionally written.

Now, most of the time what we are interested in is not dt or df as such, but rather the ratio df/dt or the sum Σₖ f(k dt) dt = ∫ f(t) dt. For this reason we do not usually worry about which of df and dt is the independent infinitesimal, nor about the precise value of dt; so there is usually no trouble with treating dt and df as though they were the same kind of thing. Conceptually, however, at the fundamental level they really aren't.

Changing perspective in mid-analysis, now regarding f as the independent variable, is allowed and perfectly proper, but one must take care: the dt and df after the change are not the same as the dt and df before the change. The ratio df/dt, however, remains the same in any case. One can avoid the confusion simply by keeping the dv/dx or df/dt always in ratio, never treating the infinitesimals individually; many applied mathematicians do precisely that, but one might as well just stay with the Newton notation in that case. That is okay as far as it goes, but it really denies the entire point of the Leibnitz notation. Because the book is a book of applied mathematics, this writer recommends that you learn the Leibnitz notation properly, developing the ability to treat the infinitesimals individually.

For instance, sometimes when writing a differential equation like the potential-kinetic energy equation ma dx = mv dv, we do not necessarily have either v or x in mind as the independent variable. The important point is that dv and dx be coordinated so that the ratio dv/dx has a definite value no matter which of the two be regarded as independent, or whether the independent be some third variable (like t) not in the equation.

The point is to develop a clear picture in your mind of what a Leibnitz infinitesimal really is. The point is not to fathom all the possible implications from the start; this discussion does not attempt to say everything there is to say about infinitesimals. Once you have the picture, you can go from there, as the need arises.

In such cases, it may be wise to use the symbol dt to mean d(t) only, introducing some unambiguous symbol like ǫ to represent the independent infinitesimal. Use discretion, though: such notation appears only rarely in the literature, so your audience might not understand it when you write it. In any case you should appreciate the conceptual difference between dt = ǫ and d(t).

Where two or more independent variables are at work in the same equation, it is conventional to use the symbol ∂ instead of d, as a reminder that the reader needs to pay attention to which ∂ tracks which independent variable. (If needed or desired, one can write ∂ₜ(·) when tracking t, ∂ₛ(·) when tracking s.) A derivative ∂f/∂t or ∂f/∂s in this case is sometimes called by the slightly misleading name of partial derivative.¹¹

Conventional shorthand for d(df) is d²f; and for (dt)², dt². Hence

    d(df/dt)/dt = d²f/dt²

is a derivative of a derivative, or second derivative. By extension, the notation

    dᵏf/dtᵏ

represents the kth derivative.

4.4.3 The derivative of a function of a complex variable

For (4.15) to be robust, written here in the slightly more general form

    df/dz = lim_{ǫ→0} [f(z + ǫ/2) − f(z − ǫ/2)]/ǫ,        (4.19)

one should like it to evaluate the same in the limit regardless of the complex phase of ǫ. That is, if δ is a positive real infinitesimal, then it should be equally valid to let ǫ = δ, ǫ = −δ, ǫ = iδ, ǫ = −iδ, ǫ = (4 − i3)δ or any other infinitesimal value, so long as 0 < |ǫ| ≪ 1. One should like the derivative (4.19) to come out the same regardless of the Argand direction from which ǫ approaches 0 (see Fig. 2.5). In fact for the sake of robustness, one normally demands that derivatives do come out the same regardless of the Argand direction; and (4.19) rather than (4.15) is the definition we normally use for the derivative for this reason.

¹¹The writer confesses that he remains unsure why this minor distinction merits the separate symbol ∂, but he accepts the notation as conventional nevertheless.

Where the limit (4.19) is sensitive to the Argand direction or complex phase of ǫ, there we normally say that the derivative does not exist. Where the derivative (4.19) does exist (where the derivative is finite and insensitive to Argand direction), there we say that the function f(z) is differentiable.

Excepting the nonanalytic parts of complex numbers (|·|, arg[·], [·]*, ℜ[·] and ℑ[·]; see § 2.12.3), plus the Heaviside unit step u(t) and the Dirac delta δ(t) (§ 7.7), most functions encountered in applications do meet the criterion (4.19) except at isolated nonanalytic points (like z = 0 in h[z] = 1/z or g[z] = √z). Meeting the criterion, such functions are fully differentiable except at their poles (where the derivative goes infinite in any case) and other nonanalytic points. Particularly, the key formula (4.14), written here as

    (1 + ǫ)^w ≈ 1 + wǫ,

works without modification when ǫ is complex; so the derivative (4.17) of the general power series,

    (d/dz) Σ_{k=−∞}^{∞} cₖ z^k = Σ_{k=−∞}^{∞} cₖ k z^{k−1},        (4.20)

holds equally well for complex z as for real.

4.4.4 The derivative of zᵃ

Inspection of § 4.4.1's logic in light of (4.14) reveals that nothing prevents us from replacing the real t, real ǫ and integral k of that section with arbitrary complex z, ǫ and a. That is,

    d(zᵃ)/dz = lim_{ǫ→0} [(z + ǫ/2)ᵃ − (z − ǫ/2)ᵃ]/ǫ
             = lim_{ǫ→0} zᵃ [(1 + ǫ/2z)ᵃ − (1 − ǫ/2z)ᵃ]/ǫ
             = lim_{ǫ→0} zᵃ [(1 + aǫ/2z) − (1 − aǫ/2z)]/ǫ,

which simplifies to

    d(zᵃ)/dz = a z^{a−1}        (4.21)

for any complex z and a. How exactly to evaluate zᵃ or z^{a−1} when a is complex is another matter, treated in § 5.4 and its (5.12); but in any case you can use (4.21) for real a right now.
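A numeric sketch (not from the text): the balanced quotient (4.19) applied to f(z) = zᵃ reproduces a z^{a−1}, and for this differentiable function (away from its nonanalytic point z = 0) the answer does not depend on the Argand direction from which ǫ approaches 0. Python's `**` computes the principal branch of the complex power, which suffices here.

```python
def derivative(f, z, eps):
    # balanced difference quotient, per (4.19)
    return (f(z + eps / 2) - f(z - eps / 2)) / eps

a = 2.5
f = lambda z: z ** a
z = 1.3 + 0.4j
analytic = a * z ** (a - 1)          # (4.21)

# approach 0 from several Argand directions: delta, -delta, i*delta, ...
delta = 1e-6
directions = [delta, -delta, 1j * delta, (4 - 3j) * delta / 5]
results = [derivative(f, z, eps) for eps in directions]

# all directions agree with (4.21) to high precision
spread = max(abs(r - analytic) for r in results)
```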

4.4.5 The logarithmic derivative

Sometimes one is more interested in knowing the rate of f(t) relative to the value of f(t) than in knowing the absolute rate itself. For example, if you inform me that you earn $1000 a year on a bond you hold, then I may commend you vaguely for your thrift but otherwise the information does not tell me much. However, if you inform me instead that you earn 10 percent a year on the same bond, then I might want to invest. The latter figure is a relative rate, or logarithmic derivative,

    (df/dt)/f(t) = (d/dt) ln f(t).        (4.22)

The investment principal grows at the absolute rate df/dt, but the bond's interest rate is (df/dt)/f(t). The natural logarithmic notation ln f(t) may not mean much to you yet, as we'll not introduce it formally until § 5.2, so you can ignore the right side of (4.22) for the moment; but the equation's left side at least should make sense to you. It expresses the significant concept of a relative rate, like 10 percent annual interest on a bond.

4.5 Basic manipulation of the derivative

This section introduces the derivative chain and product rules.
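Before moving on, the relative rate of § 4.4.5 can be checked numerically. A sketch (not from the text): for a principal growing exponentially at rate r, the logarithmic derivative (df/dt)/f(t) recovers the constant r whatever the size of the principal.

```python
import math

def log_derivative(f, t, eps=1e-6):
    # balanced difference quotient per (4.15), divided by f(t)
    df = f(t + eps / 2) - f(t - eps / 2)
    return (df / eps) / f(t)

r = 0.10                                  # 10 percent annual interest
f = lambda t: 5000.0 * math.exp(r * t)    # bond worth 5000 at t = 0

rate = log_derivative(f, 3.0)             # ~ 0.10, independent of t
```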

4.5.1 The derivative chain rule

If f is a function of w, which itself is a function of z, then¹²

    df/dz = (df/dw)(dw/dz).        (4.23)

Equation (4.23) is the derivative chain rule. It bears emphasizing to readers who may inadvertently have picked up unhelpful ideas about the Leibnitz notation in the past: the dw factor in the denominator cancels the dw factor in the numerator; a thing divided by itself is 1. That's it. There is nothing more to the proof of the derivative chain rule than that.

4.5.2 The derivative product rule

In general per (4.19),

    d ∏ⱼ fⱼ(z) = ∏ⱼ fⱼ(z + dz/2) − ∏ⱼ fⱼ(z − dz/2).

But to first order,

    fⱼ(z ± dz/2) ≈ fⱼ(z) ± (dfⱼ/dz)(dz/2) = fⱼ(z) ± dfⱼ/2;

so, in the limit,

    d ∏ⱼ fⱼ(z) = ∏ⱼ [fⱼ(z) + dfⱼ/2] − ∏ⱼ [fⱼ(z) − dfⱼ/2].

¹²For example, one can rewrite f(z) = √(3z² − 1) in the form f(w) = w^{1/2}, w(z) = 3z² − 1. Then df/dw = 1/(2w^{1/2}) = 1/(2√(3z² − 1)), dw/dz = 6z, so by (4.23), df/dz = (df/dw)(dw/dz) = 6z/(2√(3z² − 1)) = 3z/√(3z² − 1).

Since the product of two or more dfⱼ is negligible compared to the first-order infinitesimals to which they are added here, this simplifies to

    d ∏ⱼ fⱼ(z) = [∏ⱼ fⱼ(z)] Σₖ [dfₖ/2fₖ(z)] − [∏ⱼ fⱼ(z)] Σₖ [−dfₖ/2fₖ(z)],

or in other words

    d ∏ⱼ fⱼ = [∏ⱼ fⱼ] Σₖ dfₖ/fₖ.        (4.24)

In the common case of only two fⱼ, this comes to

    d(f₁f₂) = f₂ df₁ + f₁ df₂.        (4.25)

On the other hand, if f₁(z) = f(z) and f₂(z) = 1/g(z), then by the derivative chain rule (4.23), df₂ = −dg/g²; so,

    d(f/g) = (g df − f dg)/g².        (4.26)

Equation (4.24) is the derivative product rule.

After studying the complex exponential in Ch. 5, we shall stand in a position to write (4.24) in the slightly specialized but often useful form¹⁴

    d [∏ⱼ gⱼ^{aⱼ} e^{bⱼhⱼ} ∏ⱼ ln cⱼpⱼ]
      = [∏ⱼ gⱼ^{aⱼ} e^{bⱼhⱼ} ∏ⱼ ln cⱼpⱼ] × [Σₖ aₖ dgₖ/gₖ + Σₖ bₖ dhₖ + Σₖ dpₖ/(pₖ ln cₖpₖ)],        (4.27)

where the aₖ, bₖ and cₖ are arbitrary complex coefficients and the gₖ, hₖ and pₖ are arbitrary functions. An example may help:

    d [(u²v³/z) e^{−5t} ln 7s]
      = [(u²v³/z) e^{−5t} ln 7s] [2 du/u + 3 dv/v − dz/z − 5 dt + ds/(s ln 7s)].

¹⁴This paragraph is extra; you can skip it for now if you prefer. The paragraph is sufficiently abstract that it is a little hard to understand unless one already knows what it means.
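The chain-rule example in the footnote of § 4.5.1 is easy to confirm numerically; a sketch (not from the text):

```python
import math

def derivative(f, z, eps=1e-6):
    # balanced difference quotient, per (4.15)
    return (f(z + eps / 2) - f(z - eps / 2)) / eps

# f(z) = sqrt(3 z**2 - 1); the chain rule gives df/dz = 3z/sqrt(3 z**2 - 1)
f = lambda z: math.sqrt(3 * z ** 2 - 1)
z = 2.0

numeric = derivative(f, z)
analytic = 3 * z / math.sqrt(3 * z ** 2 - 1)
```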

4.6 Extrema and higher derivatives

One problem which arises very frequently in applied mathematics is the problem of finding a local extremum (that is, a local minimum or maximum) of a real-valued function f(x). Refer to Fig. 4.3.

[Figure 4.3: A local extremum.]

The almost distinctive characteristic of the extremum f(xo) is that¹⁶

    df/dx |_{x=xo} = 0.        (4.28)

At the extremum, the slope is zero. The curve momentarily runs level there. One solves (4.28) to find the extremum.

Whether the extremum be a minimum or a maximum depends on whether the curve turn from a downward slope to an upward, or from an upward slope to a downward, respectively. If from downward to upward, then the derivative of the slope is evidently positive; if from upward to downward, then negative. But the derivative of the slope is just the derivative of the derivative, or second derivative. Hence if df/dx = 0 at x = xo, then

    d²f/dx² |_{x=xo} > 0 implies a local minimum at xo;
    d²f/dx² |_{x=xo} < 0 implies a local maximum at xo.

¹⁶The notation P|Q means "P when Q," "P, given Q," or "P evaluated at Q." Sometimes it is alternately written P|_Q or [P]_Q.
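The classification rule can be illustrated numerically. A sketch (not from the text), using f(x) = x³ − 3x, whose slope 3x² − 3 vanishes at x = ±1:

```python
def d1(f, x, eps=1e-5):
    # first derivative: balanced difference quotient, per (4.15)
    return (f(x + eps / 2) - f(x - eps / 2)) / eps

def d2(f, x, eps=1e-4):
    # second derivative: balanced second difference
    return (f(x + eps) - 2 * f(x) + f(x - eps)) / eps ** 2

f = lambda x: x ** 3 - 3 * x

slope_at_1 = d1(f, 1.0)           # ~ 0: x = 1 satisfies (4.28)
curvature_at_1 = d2(f, 1.0)       # ~ +6 > 0: local minimum
curvature_at_m1 = d2(f, -1.0)     # ~ -6 < 0: local maximum
```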

Regarding the case

    d²f/dx² |_{x=xo} = 0,

this might be either a minimum or a maximum but more probably is neither, being rather a level inflection point as depicted in Fig. 4.4.¹⁷ (In general the term inflection point signifies a point at which the second derivative is zero. The inflection point of Fig. 4.4 is level because its first derivative is zero, too, but you knew that already.)

[Figure 4.4: A level inflection.]

4.7 L'Hôpital's rule

If z = zo is a root of both f(z) and g(z), or alternately if z = zo is a pole of both functions (that is, if both functions go to zero or infinity together at z = zo), then l'Hôpital's rule holds that

    lim_{z→zo} f(z)/g(z) = (df/dz)/(dg/dz) |_{z=zo}.        (4.29)

¹⁷Of course if the first and second derivatives are zero not just at x = xo but everywhere, then f(x) = yo is just a level straight line. Whether one chooses to call some random point on a level straight line an inflection point or an extremum, or both or neither, would be a matter of definition, best established not by prescription but rather by the needs of the model at hand.

In the case where z = z_o is a root, l'Hôpital's rule is proved by reasoning^18

    lim_{z→z_o} f(z)/g(z) = lim_{z→z_o} [f(z) − 0]/[g(z) − 0]
                          = lim_{z→z_o} [f(z) − f(z_o)]/[g(z) − g(z_o)]
                          = lim_{z→z_o} df/dg
                          = lim_{z→z_o} (df/dz)/(dg/dz).

In the case where z = z_o is a pole, new functions F(z) ≡ 1/f(z) and G(z) ≡ 1/g(z), of which z = z_o is a root, are defined, with which

    lim_{z→z_o} f(z)/g(z) = lim_{z→z_o} G(z)/F(z) = lim_{z→z_o} dG/dF
                          = lim_{z→z_o} (−dg/g²)/(−df/f²),

where we have used the fact from (4.21) that d(1/u) = −du/u² for any u. Canceling the minus signs and multiplying by g²/f², we have that

    lim_{z→z_o} g(z)/f(z) = lim_{z→z_o} dg/df.

Inverting,

    lim_{z→z_o} f(z)/g(z) = lim_{z→z_o} df/dg = lim_{z→z_o} (df/dz)/(dg/dz).

And if z_o itself is infinite? Then, whether it represents a root or a pole, we define the new variable Z = 1/z and the new functions Φ(Z) = f(1/Z) = f(z) and Γ(Z) = g(1/Z) = g(z), with which we apply l'Hôpital's rule for Z → 0 to obtain

    lim_{z→∞} f(z)/g(z) = lim_{Z→0} Φ(Z)/Γ(Z) = lim_{Z→0} (dΦ/dZ)/(dΓ/dZ)
                        = lim_{Z→0} [(df/dz)(dz/dZ)]/[(dg/dz)(dz/dZ)]
                        = lim_{z→∞} [(df/dz)(−z²)]/[(dg/dz)(−z²)]
                        = lim_{z→∞} (df/dz)/(dg/dz).

Nothing in the derivation requires that z or z_o be real. Where expressions involving trigonometric or special functions (Chs. 5 and [not yet written]) appear in ratio, a recursive application of l'Hôpital's rule can be just the thing one needs.^19

^18 Partly with reference to [52, "L'Hopital's rule," 03:40, 5 April 2006].
^19 Nothing prevents one from applying l'Hôpital's rule recursively, should the occasion arise. Consider for example the ratio lim_{x→0} (x³ + x)²/x², which is 0/0. Applying l'Hôpital's rule reduces the ratio to lim_{x→0} 2(x³ + x)(3x² + 1)/2x, which is still 0/0. Applying l'Hôpital's rule again to the result yields lim_{x→0} 2[(3x² + 1)² + (x³ + x)(6x)]/2 = 2/2 = 1. Observe that one must stop applying l'Hôpital's rule once the ratio is no longer 0/0 or ∞/∞; in the example, applying the rule a third time would have ruined the result. (The easier way to resolve this particular ratio would naturally be to cancel a factor of x² from it, but just to make the point, the recursive rule resolves it, too.)
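Footnote 19's recursive example is easy to check by machine. The sketch below is an illustration of mine, not part of the text; it evaluates the ratio (x³ + x)²/x² at shrinking x and watches it settle on the value 1 that two applications of l'Hôpital's rule predict:

```python
# Evaluate (x^3 + x)^2 / x^2 at shrinking x; l'Hopital's rule,
# applied twice, says the limit as x -> 0 is 1.
def ratio(x):
    return (x**3 + x)**2 / x**2

values = [ratio(10.0**-k) for k in range(1, 7)]
limit_estimate = values[-1]
```

At x = 10⁻⁶ the ratio already agrees with 1 to roughly twelve decimal places.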

z2 . broadly applicable method for ﬁnding roots numerically.. fast converging.5. o 4. z3 . Good examples of the use require math from Ch.8 The Newton-Raphson iteration The Newton-Raphson iteration is a powerful. give successively better estimates of the true root z∞ . calculated in turn by the iteration (4. lim √ = lim x→∞ 1/2 x x→∞ x x x→∞ The example incidentally shows that natural logarithms grow slower than square roots.30). then come back here and read the paragraph if you prefer. The iteration approximates the curve f (x) by its tangent 20 This paragraph is optional reading for the moment. THE DERIVATIVE L’Hˆpital’s rule is used in evaluating indeterminate forms of the kinds o 0/0 and ∞/∞. Given a function f (z) of which the root is desired.30) One begins the iteration by guessing the root and calling the guess z0 .3. consider the function y = f (x) of Fig 4. Section 5. (4. plus related forms like (0)(∞) which can be recast into either of the two main forms.7) the natural logarithmic function and its derivative. You can read Ch. Then z1 .3 will put l’Hˆpital’s rule to work. 5 ﬁrst.86 CHAPTER 4. but if we may borrow from (5.20 1 d ln x = . 5 and later. § 10-2] . an instance of a more general principle we shall meet in § 5. the Newton-Raphson iteration is f (z) d dz f (z) z=zk zk+1 = z − . dx x then a typical l’Hˆpital example is21 o ln x 1/x 2 √ = lim √ = 0. etc. To understand the Newton-Raphson iteration. 21 [42.

Figure 4.5: The Newton-Raphson iteration. [Curve f(x) against x; the dashed tangent at x_k crosses the x-axis at x_{k+1}.]

The iteration approximates the curve f(x) by its tangent line^22 (shown as the dashed line in the figure):

    f̃_k(x) = f(x_k) + [ d f(x)/dx ]_{x=x_k} (x − x_k).

It then approximates the root x_{k+1} as the point at which f̃_k(x_{k+1}) = 0:

    f̃_k(x_{k+1}) = 0 = f(x_k) + [ d f(x)/dx ]_{x=x_k} (x_{k+1} − x_k).

Solving for x_{k+1}, we have that

    x_{k+1} = x_k − [ f(x) / (d f(x)/dx) ]_{x=x_k},

which is (4.30) with x ← z. Although the illustration uses real numbers, nothing forbids complex z and f(z). The Newton-Raphson iteration works just as well for these.

^22 A tangent line, also just called a tangent, is the line which most nearly approximates a curve at a given point. The tangent touches the curve at the point, and in the neighborhood of the point it goes in the same direction the curve goes. The dashed line in Fig. 4.5 is a good example of a tangent line. The trigonometric tangent function is named from a variation on Fig. 3.1 in which the triangle's bottom leg is extended to unit length, leaving the rightward leg tangent to the circle. The relationship between the tangent line and the trigonometric tangent function of Ch. 3 is slightly obscure, maybe more of linguistic interest than of mathematical.
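The tangent-line construction above is precisely the iteration (4.30). Here is a minimal sketch of mine, not the text's, with the derivative estimated by a centered difference rather than supplied analytically; the function names are illustrative:

```python
# Newton-Raphson iteration z_{k+1} = z_k - f(z_k)/f'(z_k), with the
# derivative estimated numerically by a centered difference.
def newton_raphson(f, z0, steps=20, h=1e-7):
    z = z0
    for _ in range(steps):
        dfdz = (f(z + h) - f(z - h)) / (2 * h)
        z = z - f(z) / dfdz
    return z

root = newton_raphson(lambda x: x**2 - 2.0, 1.0)  # converges on sqrt(2)
```

Nothing in the code forbids a complex z0, in keeping with the remark that the iteration works just as well for complex z and f(z).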

The principal limitation of the Newton-Raphson arises when the function has more than one root, as most interesting functions do. The iteration often converges on the root nearest the initial guess z_o but does not always, and in any case there is no guarantee that the root it finds is the one you wanted. The most straightforward way to beat this problem is to find all the roots: first you find some root α, then you remove that root (without affecting any of the other roots) by dividing f(z)/(z − α), then you find the next root by iterating on the new function f(z)/(z − α), and so on until you have found all the roots. If this procedure is not practical (perhaps because the function has a large or infinite number of roots), then you should probably take care to make a sufficiently accurate initial guess if you can.

A second limitation of the Newton-Raphson is that, if you happen to guess z_0 especially unfortunately, then the iteration might never converge at all. For example, the roots of f(z) = z² + 2 are z = ±i√2, but if you guess that z_0 = 1 then the iteration has no way to leave the real number line, so it never converges^23 (and if you guess that z_0 = √2: well, try it with your pencil and see what z_2 comes out to be). You can fix the problem with a different, possibly complex initial guess.

A third limitation arises where there is a multiple root. In this case, the Newton-Raphson normally still converges, but relatively slowly. For instance, the Newton-Raphson converges relatively slowly on the triple root of f(z) = z³. However, even the relatively slow convergence is still pretty fast and is usually adequate, even for calculations by hand.

Usually in practice, the Newton-Raphson iteration works very well. For most functions, once the Newton-Raphson finds the root's neighborhood, it converges on the actual root remarkably quickly. Figure 4.5 shows why: in the neighborhood, the curve hardly departs from the straight line.

The Newton-Raphson iteration is a champion square root calculator, incidentally. Consider

    f(x) = x² − p,

whose roots are x = ±√p. Per (4.30), the Newton-Raphson iteration for this is

    x_{k+1} = (1/2) [ x_k + p/x_k ].    (4.31)

^23 It is entertaining to try this on a computer. Then try again with z_0 = 1 + i2^{−0x10}.

If you start by guessing x_0 = 1 and iterate several times, the iteration (4.31) converges on x_∞ = √p fast.

To calculate the nth root x = p^{1/n}, let f(x) = x^n − p and iterate^{24,25}

    x_{k+1} = (1/n) [ (n − 1)x_k + p/x_k^{n−1} ].    (4.32)

Equations (4.31) and (4.32) work not only for real p but also usually for complex. Given x_0 = 1, however, they converge reliably and orderly only for real, nonnegative p. (To see why, sketch f[x] in the fashion of Fig. 4.5.) If reliable, orderly convergence is needed for complex p = u + iv = σ cis ψ, σ ≥ 0, then you can decompose p^{1/n} per de Moivre's theorem (3.26) as p^{1/n} = σ^{1/n} cis(ψ/n), in which cis(ψ/n) = cos(ψ/n) + i sin(ψ/n) is calculated by the Taylor series of Table 8.1. Then σ is real and nonnegative, upon which (4.32) reliably, orderly computes σ^{1/n}.

The Newton-Raphson iteration however excels as a practical root-finding technique, so it often pays to be a little less theoretically rigid in applying it. If so, then don't bother to decompose; seek p^{1/n} directly, using complex z_k in place of the real x_k. In the uncommon event that the direct iteration does not seem to converge, start over again with some randomly chosen complex z_0. This saves effort and usually works.

Section 13.7 generalizes the Newton-Raphson iteration to handle vector-valued functions.

This concludes the chapter. Chapter 8, treating the Taylor series, will continue the general discussion of the derivative.

^{24,25} [42, § 4-9]; [34, § 6.1]; [51].
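The square-root iteration (4.31) and the nth-root iteration (4.32) can be sketched directly. The trial on a complex p below illustrates the text's remark that the iterations usually work for complex p, too; the function names and trial values are mine, not the text's:

```python
# Square-root iteration (4.31): x_{k+1} = (x_k + p/x_k)/2,
# Newton-Raphson applied to f(x) = x^2 - p.
def sqrt_iter(p, x0=1.0, steps=60):
    x = x0
    for _ in range(steps):
        x = (x + p / x) / 2
    return x

# nth-root iteration (4.32): x_{k+1} = ((n-1)x_k + p/x_k^(n-1))/n,
# Newton-Raphson applied to f(x) = x^n - p.
def nth_root(p, n, x0=1.0, steps=60):
    x = x0
    for _ in range(steps):
        x = ((n - 1) * x + p / x**(n - 1)) / n
    return x

sqrt2 = sqrt_iter(2.0)                # converges on sqrt(2)
cbrt8 = nth_root(8.0, 3)              # converges on 2
root4_2i = nth_root(2j, 4, 1 + 1j)    # a complex fourth root of 2i
```

With the real default guess x_0 = 1 the complex trial would not converge orderly, which is why a complex starting guess is supplied there.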


Chapter 5

The complex exponential

In higher mathematics, the complex natural exponential is almost impossible to avoid. It seems to appear just about everywhere. This chapter introduces the concept of the natural exponential and of its inverse, the natural logarithm, and shows how the two operate on complex arguments. It derives the functions' basic properties and explains their close relationship to the trigonometrics. It works out the functions' derivatives and the derivatives of the basic members of the trigonometric and inverse trigonometric families to which they respectively belong.

5.1 The real exponential

Consider the factor

    (1 + ε)^N.

This is the overall factor by which a quantity grows after N iterative rounds of multiplication by (1 + ε). What happens when ε is very small but N is very large? The really interesting question is, what happens in the limit, as ε → 0 and N → ∞, while x = εN remains a finite number? The answer is that the factor becomes

    exp x ≡ lim_{ε→0} (1 + ε)^{x/ε}.    (5.1)

Equation (5.1) defines the natural exponential function, commonly and more briefly named the exponential function. Another way to write the same definition is

    exp x = e^x,    (5.2)
    e ≡ lim_{ε→0} (1 + ε)^{1/ε}.    (5.3)
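The defining limit (5.3) can be watched numerically; the sketch below is mine and of course illustrates, rather than proves, that the limit exists:

```python
# Evaluate (1 + eps)^(1/eps) at shrinking eps; the values climb
# toward e = 2.71828...
def e_estimate(eps):
    return (1 + eps)**(1 / eps)

estimates = [e_estimate(10.0**-k) for k in (2, 4, 6)]
```

Each tenfold-squared reduction of ε adds several correct digits to the estimate.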

Whichever form we write it in, the question remains as to whether the limit actually exists; that is, whether 0 < e < ∞, whether in fact we can put some concrete bound on e. It seems nonobvious that the limit lim_{ε→0} (1 + ε)^{1/ε} actually does exist. The rest of this section shows why it does.^1

To show that it does, we observe per (4.19) that the derivative of the exponential function is

    d/dx exp x = lim_{δ→0} [exp(x + δ/2) − exp(x − δ/2)]/δ
               = lim_{δ,ε→0} [(1 + ε)^{(x+δ/2)/ε} − (1 + ε)^{(x−δ/2)/ε}]/δ
               = lim_{δ,ε→0} (1 + ε)^{x/ε} [(1 + ε)^{+δ/2ε} − (1 + ε)^{−δ/2ε}]/δ
               = lim_{δ,ε→0} (1 + ε)^{x/ε} [(1 + δ/2) − (1 − δ/2)]/δ
               = lim_{ε→0} (1 + ε)^{x/ε},

which is to say that

    d/dx exp x = exp x.    (5.4)

This is a curious, important result: the derivative of the exponential function is the exponential function itself; the slope and height of the exponential function are everywhere equal. For the moment, however, what interests us is that

    d/dx exp 0 = exp 0 = lim_{ε→0} (1 + ε)^0 = 1,

which says that the slope and height of the exponential function are both unity at x = 0, implying that the straight line which best approximates the exponential function in that neighborhood (the tangent line, which just grazes the curve) is

    y(x) = 1 + x.

With the tangent line y(x) found, the next step toward putting a concrete bound on e is to show that y(x) ≤ exp x for all real x; that is, that the curve runs nowhere below the line. To show this, we observe per (5.1) that the essential action of the exponential function is to multiply repeatedly by 1 + ε as x increases and, correspondingly, to divide repeatedly by 1 + ε as x decreases. Since 1 + ε > 1, this

^1 Excepting (5.4), the author would prefer to omit much of the rest of this section, but even at the applied level cannot think of a logically permissible way to do it.

action means for real x that

    exp x₁ ≤ exp x₂    if x₁ ≤ x₂.

However, a positive number remains positive no matter how many times one multiplies or divides it by 1 + ε, so the same action also means that

    0 ≤ exp x

for all real x. In light of (5.4), the last two equations imply further that

    d/dx exp x₁ ≤ d/dx exp x₂    if x₁ ≤ x₂,
    0 ≤ d/dx exp x.

But we have purposely defined the tangent line y(x) = 1 + x such that

    exp 0 = y(0) = 1,
    d/dx exp 0 = d/dx y(0) = 1;

that is, such that the line just grazes the curve of exp x at x = 0. Rightward, at x > 0, evidently the curve's slope only increases, bending upward away from the line. Leftward, at x < 0, evidently the curve's slope only decreases, again bending upward away from the line. Either way, the curve never crosses below the line for real x. In symbols,

    y(x) ≤ exp x.

Figure 5.1 depicts. Evaluating the last inequality at x = −1/2 and x = 1, we have that

    1/2 ≤ exp(−1/2),
    2 ≤ exp(1).

But per (5.2), exp x = e^x, so

    1/2 ≤ e^{−1/2},
    2 ≤ e^1,

Figure 5.1: The natural exponential. [Plot of exp x against x; the curve has unit height and unit slope at x = 0.]

or in other words,

    2 ≤ e ≤ 4,    (5.5)

which in consideration of (5.2) puts the desired bound on the exponential function. The limit does exist.

By the Taylor series of Table 8.1, the value

    e ≈ 0x2.B7E1

can readily be calculated, but the derivation of that series does not come until Ch. 8.

5.2 The natural logarithm

In the general exponential expression b^x, one can choose any base b; for example, b = 2 is an interesting choice. As we shall see in § 5.4, however, it turns out that b = e, where e is the constant introduced in (5.3), is the most interesting choice of all. For this reason among others, the base-e logarithm is similarly interesting, such that we define for it the special notation

    ln(·) = log_e(·),

and call it the natural logarithm. Just as for any other base b, so also for base b = e: the natural logarithm inverts the natural exponential and vice versa,

    ln exp x = ln e^x = x,
    exp ln x = e^{ln x} = x.    (5.6)

Figure 5.2: The natural logarithm. [Plot of ln x against x; the curve is zero at x = 1.]

Figure 5.2 plots the natural logarithm. If

    y = ln x,

then

    x = exp y,

and per (5.4),

    dx/dy = exp y.

But this means that

    dx/dy = x,

the inverse of which is

    dy/dx = 1/x.

In other words,

    d/dx ln x = 1/x.    (5.7)

Like many of the equations in these early chapters, here is another rather significant result.^2

^2 Besides the result itself, the technique which leads to the result is also interesting and is worth mastering. We shall use the technique more than once in this book.
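Equation (5.7) is easy to spot-check with a centered difference; a sketch of mine, not the text's:

```python
import math

# Compare a centered-difference estimate of d(ln x)/dx against 1/x
# at a few sample points.
def dln(x, h=1e-6):
    return (math.log(x + h) - math.log(x - h)) / (2 * h)

errors = [abs(dln(x) - 1 / x) for x in (0.5, 1.0, 3.0, 10.0)]
```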

One can specialize Table 2.5's logarithmic base-conversion identity to read

    log_b w = ln w / ln b.    (5.8)

This equation converts any logarithm to a natural logarithm. Base b = 2 logarithms are interesting, so we note here that

    ln 2 = −ln(1/2) ≈ 0x0.B172,

which Ch. 8 and its Table 8.1 will show how to calculate.

5.3 Fast and slow functions

The exponential exp x is a fast function. The logarithm ln x is a slow function. These functions grow, diverge or decay respectively faster and slower than x^a.

Such claims are proved by l'Hôpital's rule (4.29). Applying the rule, we have that

    lim_{x→∞} (ln x)/x^a = lim_{x→∞} 1/(a x^a) = { 0 if a > 0; +∞ if a ≤ 0 },

    lim_{x→0} (ln x)/x^a = lim_{x→0} 1/(a x^a) = { −∞ if a ≥ 0; 0 if a < 0 },    (5.9)

which reveals the logarithm to be a slow function. Since the exp(·) and ln(·) functions are mutual inverses, we can leverage (5.9) to show also that

    lim_{x→∞} exp(±x)/x^a = lim_{x→∞} exp[±x − a ln x]
                          = lim_{x→∞} exp[(x)(±1 − a (ln x)/x)]
                          = lim_{x→∞} exp[(x)(±1 − 0)]
                          = lim_{x→∞} exp[±x].

That is,

    lim_{x→∞} exp(+x)/x^a = ∞,
    lim_{x→∞} exp(−x) x^a = 0,    (5.10)
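Both the base-conversion identity (5.8) and the slow growth (5.9) lend themselves to quick numeric illustration; a sketch of mine, not the text's:

```python
import math

# (5.8): log_b w = ln(w)/ln(b); compute log_2 8 from natural logarithms.
log2_8 = math.log(8) / math.log(2)

# (5.9) in action: ln(x)/x^a with a = 0.1 decays toward zero,
# however slowly, as x grows.
slow = [math.log(x) / x**0.1 for x in (1e4, 1e8, 1e16, 1e32)]
```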

which reveals the exponential to be a fast function. Exponentials grow or decay faster than powers; logarithms diverge slower. Such conclusions are extended to bases other than the natural base e simply by observing that log_b x = ln x/ln b and that b^x = exp(x ln b). Thus exponentials generally are fast and logarithms generally are slow.^3

It is interesting and worthwhile to contrast the sequence

    . . . , −3!/x⁴, 2!/x³, −1!/x², 0!/x¹, x⁰/0!, x¹/1!, x²/2!, x³/3!, x⁴/4!, . . .

against the sequence

    . . . , −3!/x⁴, 2!/x³, −1!/x², 0!/x¹, ln x, x¹/1!, x²/2!, x³/3!, x⁴/4!, . . .

As x → +∞, each sequence increases in magnitude going rightward. Also, each term in each sequence is the derivative with respect to x of the term to its right, except left of the middle element in the first sequence and right of the middle element in the second. The exception is peculiar. What is going on here?

The answer is that x⁰ (which is just a constant) and ln x both are of zeroth order in x. This seems strange at first because ln x diverges as x → ∞ whereas x⁰ does not. The natural logarithm does indeed eventually diverge to infinity, in the literal sense that there is no height it does not eventually reach, but the divergence of the former is extremely slow; so slow, in fact, that per (5.9) lim_{x→∞} (ln x)/x^ε = 0 for any positive ε no matter how small.^4 Figure 5.2 has plotted ln x only for x ∼ 1, but beyond the figure's window the curve (whose slope is 1/x) flattens rapidly rightward, to the extent that it locally resembles the plot of a constant value. The logarithm can in some situations profitably be thought of as a "diverging constant" of sorts, and indeed one can write

    x⁰ = lim_{u→∞} ln(x + u)/ln u,

which casts x⁰ as a logarithm shifted and scaled. Admittedly, one ought not strain such logic too far, because ln x is not in fact a constant; but the point nevertheless remains that x⁰ and ln x often play analogous roles in mathematics.

^3 There are of course some degenerate edge cases like b = 0 and b = 1. The reader can detail these as the need arises.
^4 One does not grasp how truly slow the divergence is until one calculates a few concrete values. Consider for instance how far out x must run to make ln x = 0x100. It's a long, long way. The logarithm does eventually get there, but it certainly does not hurry; it takes practically forever just to reach 0x100.

Less strange-seeming perhaps is the consequence of (5.10) that exp x is of infinite order in x; that x^∞ and exp x play analogous roles.

It befits an applied mathematician subjectively to internalize (5.9) and (5.10): to remember that ln x resembles x⁰ and that exp x resembles x^∞. A qualitative sense that logarithms are slow and exponentials, fast, helps one to grasp mentally the essential features of many mathematical models one encounters in practice.

Now leaving aside fast and slow functions for the moment, we turn our attention in the next section to the highly important matter of the exponential of a complex argument.

5.4 Euler's formula

The result of § 5.1 leads to one of the central questions in all of mathematics. How can one evaluate

    exp iθ = lim_{ε→0} (1 + ε)^{iθ/ε},

where i² = −1 is the imaginary unit introduced in § 2.12?

To begin, one can take advantage of (4.14) to write the last equation in the form

    exp iθ = lim_{ε→0} (1 + iε)^{θ/ε},

but from here it is not obvious where to go. The book's development up to the present point gives no obvious direction. In fact it appears that the interpretation of exp iθ remains for us to define, if we can find a way to define it which fits sensibly with our existing notions of the real exponential. So, if we don't quite know where to go with this yet, what do we know?

One thing we know is that if θ = ε, then

    exp(iε) = (1 + iε)^{ε/ε} = 1 + iε.

But per § 5.1, the essential operation of the exponential function is to multiply repeatedly by some factor, where the factor is not quite exactly unity; in this case, by 1 + iε. So let us multiply a complex number z = x + iy by 1 + iε, obtaining

    (1 + iε)(x + iy) = (x − εy) + i(y + εx).

The resulting change in z is

    Δz = (1 + iε)(x + iy) − (x + iy) = (ε)(−y + ix),

Figure 5.3: The complex exponential and Euler's formula. [Argand plane with axes ℜ(z) and iℑ(z); the point z = ρe^{iφ} rides a dashed circle of radius ρ, and the increment Δz points at a right angle to z's arm.]

in which it is seen that

    |Δz| = (ε)√(y² + x²) = (ε)|z|,
    arg(Δz) = arctan[x/(−y)] = arg z + 2π/4.

In other words, the change is proportional to the magnitude of z, but at a right angle to z's arm in the complex plane. Refer to Fig. 5.3.

Since the change is directed neither inward nor outward from the dashed circle in the figure, evidently its effect is to move z a distance (ε)|z| counterclockwise about the circle. In other words, referring to the figure, the change Δz leads to

    Δρ = 0,
    Δφ = ε.

These two equations, implying an unvarying, steady counterclockwise progression about the circle in the diagram, are somewhat unexpected. Given the steadiness of the progression, it follows from the equations that

    |exp iθ| = lim_{ε→0} |(1 + iε)^{θ/ε}| = |exp 0| + lim_{ε→0} (θ/ε) Δρ = 1,
    arg[exp iθ] = lim_{ε→0} arg[(1 + iε)^{θ/ε}] = arg[exp 0] + lim_{ε→0} (θ/ε) Δφ = θ.

That is, exp iθ is the complex number which lies on the Argand unit circle at phase angle θ. Had we known that θ was an Argand phase angle, naturally we should have represented it by the symbol φ from the start. Changing φ ← θ now, we have for real φ that

    |exp iφ| = 1,
    arg[exp iφ] = φ,

which says neither more nor less than that

    exp iφ = cos φ + i sin φ = cis φ,    (5.11)

where cis(·) is as defined in § 3.11. Along with the Pythagorean theorem (2.47), the fundamental theorem of calculus (7.2) and Cauchy's integral formula (8.29), (5.11) is one of the most famous results in all of mathematics. It is called Euler's formula,^{5,6} and it opens the exponential domain fully to complex numbers.

It is seen by (5.11) that one can express any complex number in the form

    z = x + iy = ρ exp iφ.

If a complex base w is similarly expressed in the form

    w = u + iv = σ exp iψ,

then it follows that

    w^z = exp[ln w^z] = exp[z ln w]
        = exp[(x + iy)(iψ + ln σ)]
        = exp[(x ln σ − ψy) + i(y ln σ + ψx)].

Since exp(α + β) = e^{α+β} = exp α exp β, the last equation is

    w^z = exp(x ln σ − ψy) exp i(y ln σ + ψx),    (5.12)

where

    x = ρ cos φ,    y = ρ sin φ,
    σ = √(u² + v²),    tan ψ = v/u.

^5 For native English speakers who do not speak German, Leonhard Euler's name is pronounced as "oiler."
^6 An alternate derivation of Euler's formula (5.11), less intuitive and requiring slightly more advanced mathematics but briefer, constructs from Table 8.1 the Taylor series for exp iφ, cos φ and i sin φ, then adds the latter two to show them equal to the first of the three. Such an alternate derivation lends little insight, perhaps, but at least it builds confidence that we actually knew what we were doing when we came up with the incredible (5.11).
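Euler's formula (5.11) can be verified numerically at any phase angle; this sketch of mine, not the text's, also exercises the famous special case e^{iπ} = −1:

```python
import cmath
import math

# exp(i*phi) should equal cos(phi) + i*sin(phi) for every real phi.
phis = [0.3, 1.0, math.pi / 2, math.pi]
errors = [abs(cmath.exp(1j * p) - (math.cos(p) + 1j * math.sin(p)))
          for p in phis]
euler_point = cmath.exp(1j * math.pi)  # lies at -1 on the unit circle
```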

Equation (5.12) serves to raise any complex number to a complex power.

Curious consequences of Euler's formula (5.11) include that

    e^{±i2π/4} = ±i,
    e^{±i2π/2} = −1,
    e^{in2π} = 1.    (5.13)

For the natural logarithm of a complex number in light of Euler's formula, we have that

    ln w = ln(σe^{iψ}) = ln σ + iψ.    (5.14)

5.5 Complex exponentials and de Moivre's theorem

Euler's formula (5.11) implies that complex numbers z₁ and z₂ can be written

    z₁ = ρ₁ e^{iφ₁},
    z₂ = ρ₂ e^{iφ₂}.    (5.15)

By the basic power properties of Table 2.2, then,

    z₁z₂ = ρ₁ρ₂ e^{i(φ₁+φ₂)} = ρ₁ρ₂ exp[i(φ₁ + φ₂)],
    z₁/z₂ = (ρ₁/ρ₂) e^{i(φ₁−φ₂)} = (ρ₁/ρ₂) exp[i(φ₁ − φ₂)],
    z^a = ρ^a e^{iaφ} = ρ^a exp[iaφ].    (5.16)

This is de Moivre's theorem, introduced in § 3.11.

5.6 Complex trigonometrics

Applying Euler's formula (5.11) to +φ then to −φ, we have that

    exp(+iφ) = cos φ + i sin φ,
    exp(−iφ) = cos φ − i sin φ.

Adding the two equations and solving for cos φ yields

    cos φ = [exp(+iφ) + exp(−iφ)]/2.    (5.17)

Subtracting the second equation from the first and solving for sin φ yields

    sin φ = [exp(+iφ) − exp(−iφ)]/i2.    (5.18)

Thus are the trigonometrics expressed in terms of complex exponentials.

The forms (5.17) and (5.18) suggest the definition of new functions

    cosh φ ≡ [exp(+φ) + exp(−φ)]/2,    (5.19)
    sinh φ ≡ [exp(+φ) − exp(−φ)]/2,    (5.20)
    tanh φ ≡ sinh φ / cosh φ.    (5.21)

These are called the hyperbolic functions. Their inverses arccosh, etc., are defined in the obvious way. The Pythagorean theorem for trigonometrics (3.2) is that cos² φ + sin² φ = 1; and from (5.19) and (5.20) one can derive the hyperbolic analog:

    cos² φ + sin² φ = 1,
    cosh² φ − sinh² φ = 1.    (5.22)

Both lines of (5.22) hold for complex φ as well as for real.^7

^7 Chapter 15 teaches that the "dot product" of a unit vector and its own conjugate is unity (v̂* · v̂ = 1, in the notation of that chapter), which tempts one incorrectly to suppose by analogy that cos* φ cos φ + sin* φ sin φ = 1 and that cosh* φ cosh φ − sinh* φ sinh φ = 1 when the angle φ is complex. However, (5.17) through (5.20) can generally be true only if (5.22) holds exactly as written for complex φ as well as for real; hence the conjugated forms do not in general equal unity for complex φ. Such confusion probably tempts few readers unfamiliar with the material of Ch. 15, so you can ignore this footnote for now. However, if later you return after reading Ch. 15 and the confusion then arises, consider that the angle φ of Fig. 3.1 is a real angle, whereas we originally derived (5.22)'s first line from that figure. The figure is quite handy for real φ, but what if anything the figure means when φ is complex is not obvious. If the confusion descends directly or indirectly from the figure, then such thoughts may serve to clarify the matter.
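The claim that (5.22) holds for complex angles, together with the trigonometric-hyperbolic connections visible already in the exponential forms (cosh z = cos iz and i sinh z = sin iz), can be spot-checked numerically at a complex angle; a sketch of mine, not the text's:

```python
import cmath

# At a complex angle z, check cos^2 z + sin^2 z = 1 and
# cosh^2 z - sinh^2 z = 1, plus cosh z = cos(iz) and i*sinh z = sin(iz).
z = 0.7 + 0.4j
pyth_trig = cmath.cos(z)**2 + cmath.sin(z)**2
pyth_hyp = cmath.cosh(z)**2 - cmath.sinh(z)**2
link_cos = cmath.cosh(z) - cmath.cos(1j * z)
link_sin = 1j * cmath.sinh(z) - cmath.sin(1j * z)
```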

The notation exp i(·) or e^{i(·)} is sometimes felt to be too bulky. Although less commonly seen than the other two, the notation

    cis(·) ≡ exp i(·) = cos(·) + i sin(·)

is also conventionally recognized. Also conventionally recognized are sin⁻¹(·) and occasionally asin(·) for arcsin(·), and likewise for the several other trigs.

Replacing z ← φ in this section's several equations implies a coherent definition for trigonometric functions of a complex variable. Then, comparing (5.17) and (5.18) respectively to (5.19) and (5.20), we have that

    cosh z = cos iz,
    i sinh z = sin iz,
    i tanh z = tan iz,    (5.23)

by which one can immediately adapt the many trigonometric properties of Tables 3.1 and 3.3 to hyperbolic use.

At this point in the development one begins to notice that the sin, cos, exp, cis, cosh and sinh functions are each really just different facets of the same mathematical phenomenon. Likewise their respective inverses: arcsin, arccos, ln, −i ln, arccosh and arcsinh. Conventional names for these two mutually inverse families of functions are unknown to the author, but one might call them the natural exponential and natural logarithmic families. Or, if the various tangent functions were included, then one might call them the trigonometric and inverse trigonometric families.

5.7 Summary of properties

Table 5.1 gathers properties of the complex exponential from this chapter and from §§ 2.11, 3.11 and 4.4.

5.8 Derivatives of complex exponentials

This section computes the derivatives of the various trigonometric and inverse trigonometric functions.

5.8.1 Derivatives of sine and cosine

One can compute derivatives of the sine and cosine functions from (5.17) and (5.18), but to do it in that way doesn't seem sporting. Better applied style is to find the derivatives by observing directly the circle from which the sine and cosine functions come.

Table 5.1: Complex exponential properties.

    i² = −1 = (−i)²
    1/i = −i
    e^{iφ} = cos φ + i sin φ
    e^{iz} = cos z + i sin z
    z₁z₂ = ρ₁ρ₂ e^{i(φ₁+φ₂)} = (x₁x₂ − y₁y₂) + i(y₁x₂ + x₁y₂)
    z₁/z₂ = (ρ₁/ρ₂) e^{i(φ₁−φ₂)} = [(x₁x₂ + y₁y₂) + i(y₁x₂ − x₁y₂)]/(x₂² + y₂²)
    z^a = ρ^a e^{iaφ}
    w^z = e^{x ln σ − ψy} e^{i(y ln σ + ψx)}
    ln w = ln σ + iψ

    sin z = (e^{iz} − e^{−iz})/i2        sinh z = (e^z − e^{−z})/2
    cos z = (e^{iz} + e^{−iz})/2         cosh z = (e^z + e^{−z})/2
    tan z = sin z / cos z                tanh z = sinh z / cosh z
    sin iz = i sinh z                    cos iz = cosh z
    tan iz = i tanh z

    cos² z + sin² z = 1 = cosh² z − sinh² z

    z ≡ x + iy = ρ e^{iφ}
    w ≡ u + iv = σ e^{iψ}
    exp z ≡ e^z
    cis z ≡ cos z + i sin z = e^{iz}

    d/dz exp z = exp z
    d/dw ln w = 1/w
    d/dz ln f(z) = (df/dz)/f(z)
    log_b w = ln w / ln b

Figure 5.4: The derivatives of the sine and cosine functions. [Argand plane with axes ℜ(z) and ℑ(z): the point z rides a circle of radius ρ at phase ωt + φₒ; the rate dz/dt points at a right angle to the arm.]

Refer to Fig. 5.4. Suppose that the point z in the figure is not fixed but travels steadily about the circle such that

    z(t) = (ρ)[cos(ωt + φₒ) + i sin(ωt + φₒ)].    (5.24)

How fast then is the rate dz/dt, and in what Argand direction?

    dz/dt = (ρ)[ d/dt cos(ωt + φₒ) + i d/dt sin(ωt + φₒ) ].    (5.25)

Evidently,

• the speed is |dz/dt| = (ρ)(dφ/dt) = ρω;
• the direction is at right angles to the arm of ρ, which is to say that arg(dz/dt) = φ + 2π/4.

With these observations we can write that

    dz/dt = (ρω)[ cos(ωt + φₒ + 2π/4) + i sin(ωt + φₒ + 2π/4) ]
          = (ρω)[ −sin(ωt + φₒ) + i cos(ωt + φₒ) ].    (5.26)

Matching the real and imaginary parts of (5.25) against those of (5.26), we

have that

    d/dt cos(ωt + φₒ) = −ω sin(ωt + φₒ),
    d/dt sin(ωt + φₒ) = +ω cos(ωt + φₒ).    (5.27)

If ω = 1 and φₒ = 0, these are

    d/dt cos t = −sin t,
    d/dt sin t = +cos t.    (5.28)

5.8.2 Derivatives of the trigonometrics

Equations (5.4) and (5.28) give the derivatives of exp(·), sin(·) and cos(·). From these, with the help of (5.22) and the derivative chain and product rules (§ 4.5), we can calculate the several derivatives of Table 5.2.^8

5.8.3 Derivatives of the inverse trigonometrics

Observe the pair

    d/dz exp z = exp z,
    d/dw ln w = 1/w.

The natural exponential exp z belongs to the trigonometric family of functions, as does its derivative. The natural logarithm ln w, by contrast, belongs to the inverse trigonometric family of functions; but its derivative is simpler, not a trigonometric or inverse trigonometric function at all. In Table 5.2, one notices that all the trigonometrics have trigonometric derivatives. By analogy with the natural logarithm, do all the inverse trigonometrics have simpler derivatives? It turns out that they do. Refer to the account of the natural logarithm's derivative in § 5.2. Following a similar procedure, we have by successive steps that

^8 [42, back endpaper]

Table 5.2: Derivatives of the trigonometrics.

    d/dz exp z = +exp z           d/dz (1/exp z) = −1/exp z
    d/dz sin z = +cos z           d/dz (1/sin z) = −1/(sin z tan z)
    d/dz cos z = −sin z           d/dz (1/cos z) = +tan z/cos z

    d/dz tan z = +(1 + tan² z) = +1/cos² z
    d/dz (1/tan z) = −(1 + 1/tan² z) = −1/sin² z

    d/dz sinh z = +cosh z         d/dz (1/sinh z) = −1/(sinh z tanh z)
    d/dz cosh z = +sinh z         d/dz (1/cosh z) = −tanh z/cosh z

    d/dz tanh z = 1 − tanh² z = +1/cosh² z
    d/dz (1/tanh z) = 1 − 1/tanh² z = −1/sinh² z

    arcsin w = z,
    w = sin z,
    dw/dz = cos z,
    dw/dz = ±√(1 − sin² z),
    dw/dz = ±√(1 − w²),
    dz/dw = ±1/√(1 − w²),

    d/dw arcsin w = ±1/√(1 − w²).    (5.29)

Similarly,

    arctan w = z,
    w = tan z,
    dw/dz = 1 + tan² z,
    dw/dz = 1 + w²,
    dz/dw = 1/(1 + w²),

    d/dw arctan w = 1/(1 + w²).    (5.30)

Derivatives of the other inverse trigonometrics are found in the same way. Table 5.3 summarizes.
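Two of the derivatives just derived can be spot-checked numerically on the principal branches (hence the + signs); a sketch of mine, not the text's:

```python
import math

# Centered-difference checks of d(arctan w)/dw = 1/(1 + w^2)
# and d(arcsin w)/dw = 1/sqrt(1 - w^2) at w = 0.5.
def diff(f, w, h=1e-6):
    return (f(w + h) - f(w - h)) / (2 * h)

err_atan = abs(diff(math.atan, 0.5) - 1 / (1 + 0.5**2))
err_asin = abs(diff(math.asin, 0.5) - 1 / math.sqrt(1 - 0.5**2))
```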

5.9 The actuality of complex quantities

Doing all this neat complex math, the applied mathematician can lose sight of some questions he probably ought to keep in mind: Is there really such a thing as a complex quantity in nature? If not, then hadn't we better avoid these complex quantities, leaving them to the professional mathematical theorists?

As developed by Oliver Heaviside in 1887,^9 the answer depends on your point of view. If I have 300 g of grapes and 100 g of grapes, then I have 400 g

^9 [35]

5.9. THE ACTUALITY OF COMPLEX QUANTITIES Table 5.3: Derivatives of the inverse trigonometrics. d ln w = dw d arcsin w dw d arccos w dw d arctan w dw d arcsinh w dw d arccosh w dw d arctanh w dw = = = = = = 1 w ±1 1 − w2 ∓1 √ 1 − w2 1 1 + w2 ±1 √ w2 + 1 ±1 √ w2 − 1 1 1 − w2 √

109

altogether. Alternately, if I have 500 g of grapes and −100 g of grapes, again I have 400 g altogether. (What does it mean to have −100 g of grapes? Maybe that I ate some!) But what if I have 200 + i100 g of grapes and 200 − i100 g of grapes? Answer: again, 400 g. Probably you would not choose to think of 200 + i100 g of grapes and 200 − i100 g of grapes, but because of (5.17) and (5.18), one often describes wave phenomena as linear superpositions (sums) of countervailing complex exponentials. Consider for instance the propagating wave A A exp[+i(ωt − kz)] + exp[−i(ωt − kz)]. 2 2 The beneﬁt of splitting the real cosine into two complex parts is that while the magnitude of the cosine changes with time t, the magnitude of either exponential alone remains steady (see the circle in Fig. 5.3). It turns out to be much easier to analyze two complex wave quantities of constant magnitude than to analyze one real wave quantity of varying magnitude. Better yet, since each complex wave quantity is the complex conjugate of the other, the analyses thereof are mutually conjugate, too; so you normally needn’t actually analyze the second. The one analysis suﬃces for both.10 (It’s like A cos[ωt − kz] =
10

If the point is not immediately clear, an example: Suppose that by the Newton-

110

CHAPTER 5. THE COMPLEX EXPONENTIAL

reflecting your sister's handwriting. To read her handwriting backward, you needn't ask her to try writing in reverse with the wrong hand; you can just hold her regular script up to a mirror. Of course, this ignores the question of why one would want to reflect someone's handwriting in the first place; but anyway, reflecting—which is to say, conjugating—complex quantities often is useful.)

Some authors have gently denigrated the use of imaginary parts in physical applications as a mere mathematical trick, as though the parts were not actually there. Well, that is one way to treat the matter, but it is not the way this book recommends. Nothing in the mathematics requires you to regard the imaginary parts as physically nonexistent. You need not abuse Occam's razor! (Occam's razor, "Do not multiply objects without necessity,"11 is a fine philosophical tool as far as it goes, but is overused in some circles. More often than one likes to believe, the necessity remains hidden until one has ventured to multiply the objects, nor reveals itself to the one who wields the razor, whose hand humility should stay.) So go ahead: regard the imaginary parts as actual. It doesn't hurt anything, and it helps with the math.

It is true by Euler's formula (5.11) that a complex exponential exp iφ can be decomposed into a sum of trigonometrics. However, it is equally true by the complex trigonometric formulas (5.17) and (5.18) that a trigonometric can be decomposed into a sum of complex exponentials. So, if each can be decomposed into the other, then which of the two is the real decomposition? Answer: it depends on your point of view. Experience seems to recommend viewing the complex exponential as the basic element—as the element of which the trigonometrics are composed—rather than the other way around. From this point of view, it is (5.17) and (5.18) which are the real decomposition. Euler's formula itself is secondary. The complex exponential method of offsetting imaginary parts offers an elegant yet practical mathematical way to model physical wave phenomena.

If by the Newton-Raphson iteration (§ 4.8) you have found a root of the polynomial x^3 + 2x^2 + 3x + 4 at x ≈ −0x0.2D + i0x1.8C, where is there another root? Answer: at the complex conjugate, x ≈ −0x0.2D − i0x1.8C. One need not actually run the Newton-Raphson again to find the conjugate root.

11 [48, Ch. 12]

Chapter 6

Primes, roots and averages
This chapter gathers a few signiﬁcant topics, each of whose treatment seems too brief for a chapter of its own.

6.1 Prime numbers

A prime number—or simply, a prime—is an integer greater than one, divisible only by one and itself. A composite number is an integer greater than one and not prime. A composite number can be composed as a product of two or more prime numbers. All positive integers greater than one are either composite or prime. The mathematical study of prime numbers and their incidence constitutes number theory, and it is a deep area of mathematics. The deeper results of number theory seldom arise in applications,1 however, so we shall confine our study of number theory in this book to one or two of its simplest, most broadly interesting results.

6.1.1 The infinite supply of primes

The ﬁrst primes are evidently 2, 3, 5, 7, 0xB, . . . . Is there a last prime? To show that there is not, suppose that there were. More precisely, suppose that there existed exactly N primes, with N ﬁnite, letting p1 , p2 , . . . , pN represent these primes from least to greatest. Now consider the product of
1 The deeper results of number theory do arise in cryptography, or so the author has been led to understand. Although cryptography is literally an application of mathematics, its spirit is that of pure mathematics rather than of applied. If you seek cryptographic derivations, this book is probably not the one you want.


all the primes,

    C = ∏_{j=1}^{N} p_j .
What of C + 1? Since p1 = 2 divides C, it cannot divide C + 1. Similarly, since p2 = 3 divides C, it also cannot divide C + 1. The same goes for p3 = 5, p4 = 7, p5 = 0xB, etc. Apparently none of the primes in the pj series divides C + 1, which implies either that C + 1 itself is prime, or that C + 1 is composed of primes not in the series. But the latter is assumed impossible on the ground that the pj series includes all primes; and the former is assumed impossible on the ground that C + 1 > C > pN , with pN the greatest prime. The contradiction proves false the assumption which gave rise to it. The false assumption: that there were a last prime. Thus there is no last prime. No matter how great a prime number one ﬁnds, a greater can always be found. The supply of primes is inﬁnite.2 Attributed to the ancient geometer Euclid, the foregoing proof is a classic example of mathematical reductio ad absurdum, or as usually styled in English, proof by contradiction.3
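Euclid's construction can be played out numerically. A minimal sketch in Python (not from the book; the variable names are the sketch's own):

```python
from math import prod

# The first few primes, written here in decimal (the book writes 0xB for 11).
primes = [2, 3, 5, 7, 11]

# Form C, the product of all the supposed primes, and consider C + 1.
C = prod(primes)
candidate = C + 1  # 2311

# None of the listed primes divides C + 1, since each divides C exactly.
assert all(candidate % p != 0 for p in primes)

# So C + 1 is either itself prime or composed of primes outside the list;
# either way the list was incomplete.  (Here 2311 happens to be prime.)
```

Running the same check with any finite list of primes gives the same outcome, which is the heart of the contradiction.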

6.1.2 Compositional uniqueness

Occasionally in mathematics, plausible assumptions can hide subtle logical flaws. One such plausible assumption is the assumption that every positive integer has a unique prime factorization. It is readily seen that the first several positive integers—1 = (), 2 = (2^1), 3 = (3^1), 4 = (2^2), 5 = (5^1), 6 = (2^1)(3^1), 7 = (7^1), 8 = (2^3), . . .—each have unique prime factorizations, but is this necessarily true of all positive integers?

To show that it is true, suppose that it were not.4 More precisely, suppose that there did exist positive integers factorable each in two or more distinct ways, with the symbol C representing the least such integer. Noting that C must be composite (prime numbers by definition are each factorable only one way, like 5 = [5^1]), let

    C_p ≡ ∏_{j=1}^{N_p} p_j ,    C_q ≡ ∏_{k=1}^{N_q} q_k ,    C_p = C_q = C,
    p_j ≤ p_{j+1} ,    q_k ≤ q_{k+1} ,    N_p > 1,    N_q > 1,

where C_p and C_q represent two distinct prime factorizations of the same number C and where the p_j and q_k are the respective primes ordered from least to greatest. We see that p_j ≠ q_k for any j and k—that is, that the same prime cannot appear in both factorizations—because if the same prime r did appear in both then C/r either would be prime (in which case both factorizations would be [r][C/r], defying our assumption that the two differed) or would constitute an ambiguously factorable composite integer less than C when we had already defined C to represent the least such. Among other effects, the fact that p_j ≠ q_k strengthens the definition p_1 ≤ q_1 to read

    p_1 < q_1 .

Let us now rewrite the two factorizations in the form

    C_p = p_1 A_p ,    C_q = q_1 A_q ,    C_p = C_q = C,
    A_p ≡ ∏_{j=2}^{N_p} p_j ,    A_q ≡ ∏_{k=2}^{N_q} q_k ,

where p_1 and q_1 are the least primes in their respective factorizations. Since C is composite and since p_1 < q_1, we have that

    1 < p_1 < q_1 ≤ √C ≤ A_q < A_p < C,

which implies that

    p_1 q_1 < C.

The last inequality lets us compose the new positive integer

    B = C − p_1 q_1 ,

which might be prime or composite (or unity), but which either way enjoys a unique prime factorization because B < C, with C the least positive integer factorable two ways. Observing that some integer s which divides C necessarily also divides C ± ns, we note that each of p_1 and q_1 necessarily divides B. This means that B's unique factorization includes both p_1 and q_1, which further means that the product p_1 q_1 divides B. But if p_1 q_1 divides B, then it divides B + p_1 q_1 = C, also. Let E represent the positive integer which results from dividing C by p_1 q_1:

    E ≡ C/(p_1 q_1).

Then,

    E q_1 = C/p_1 = A_p ,    E p_1 = C/q_1 = A_q .

That E q_1 = A_p says that q_1 divides A_p. But A_p < C, so A_p's prime factorization is unique—and we see above that A_p's factorization does not include any q_k, not even q_1. The contradiction proves false the assumption which gave rise to it. The false assumption: that there existed a least composite number C prime-factorable in two distinct ways. Thus no positive integer is ambiguously factorable. Prime factorizations are always unique.

We have observed at the start of this subsection that plausible assumptions can hide subtle logical flaws. Indeed this is so. Interestingly however, the plausible assumption of the present subsection has turned out absolutely correct; we have just had to do some extra work to prove it. Such effects are typical on the shadowed frontier where applied shades into pure mathematics: with sufficient experience and with a firm grasp of the model at hand, if you think that it's true, then it probably is. Judging when to delve into the mathematics anyway, seeking a more rigorous demonstration of a proposition one feels pretty sure is correct, is a matter of applied mathematical style. It depends on how sure one feels, and more importantly on whether the unsureness felt is true uncertainty or is just an unaccountable desire for more precise mathematical definition (if the latter, then unlike the author you may have the right temperament to become a professional mathematician). The author does judge the present subsection's proof to be worth the applied effort; but nevertheless, when one lets logical minutiae distract him to too great a degree, he admittedly begins to drift out of the applied mathematical realm that is the subject of this book.

2 [45]
3 [39, Appendix 1][52, "Reductio ad absurdum," 02:36, 28 April 2006]
4 Unfortunately the author knows no more elegant proof than this, yet cannot even cite this one properly. The author encountered the proof in some book over a decade ago. The identity of that book is now long forgotten.

6.1.3 Rational and irrational numbers

A rational number is a finite real number expressible as a ratio of integers

    x = p/q,    q ≠ 0.

The ratio is fully reduced if p and q have no prime factors in common. For instance, 4/6 is not fully reduced, whereas 2/3 is. An irrational number is a finite real number which is not rational. For example, √2 is irrational. In fact any x = √n is irrational unless integral; there is no such thing as a √n which is not an integer but is rational.

To prove5 the last point, suppose that there did exist a fully reduced

    x = p/q = √n,    n > 0,    p > 0,    q > 1,

where p, q and n are all integers. Squaring the equation, we have that

    p²/q² = n,

which form is evidently also fully reduced. But if q > 1, then the fully reduced n = p²/q² is not an integer as we had assumed that it was. The contradiction proves false the assumption which gave rise to it. Hence there exists no rational, nonintegral √n, as was to be demonstrated. The proof is readily extended to show that any x = n^{j/k} is irrational if nonintegral, the extension by writing p^k/q^k = n^j then following similar steps as those this paragraph outlines.

5 A proof somewhat like the one presented here is found in [39, Appendix 1].
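The unique-factorization result can at least be exercised numerically. The following sketch (the book gives no code; trial division is merely the obvious method, and the helper name is the sketch's own) factors each integer and multiplies the factors back:

```python
from math import prod

def prime_factors(n):
    """Factor n > 1 by trial division, returning primes in nondecreasing order."""
    factors, d = [], 2
    while d * d <= n:
        while n % d == 0:
            factors.append(d)
            n //= d
        d += 1
    if n > 1:
        factors.append(n)   # whatever remains is itself prime
    return factors

# Trial division always yields the same ordered factorization, and multiplying
# the factors back together recovers the integer, consistent with uniqueness.
for n in range(2, 1000):
    fs = prime_factors(n)
    assert prod(fs) == n
    assert fs == sorted(fs)
```

This does not re-prove uniqueness, of course; it merely lets one watch the theorem at work on small integers.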

That's all the number theory the book treats; but in applied math, so little will take you pretty far. Now onward we go to other topics.

6.2 The existence and number of polynomial roots

This section shows that an Nth-order polynomial must have exactly N roots.

6.2.1 Polynomial roots

Consider the quotient B(z)/A(z), where

    A(z) = z − α,
    B(z) = ∑_{k=0}^{N} b_k z^k ,    N > 0,    b_N ≠ 0,
    B(α) = 0.

In the long-division symbology of Table 2.3,

    B(z) = A(z)Q_0(z) + R_0(z),

where Q_0(z) is the quotient and R_0(z), a remainder. In this case the divisor A(z) = z − α has first order, and as § 2.6.2 has observed, first-order divisors leave zeroth-order, constant remainders R_0(z) = ρ. Thus substituting yields

    B(z) = (z − α)Q_0(z) + ρ.

When z = α, this reduces to B(α) = ρ. But B(α) = 0 by assumption, so ρ = 0. Evidently the division leaves no remainder ρ, which is to say that z − α exactly divides every polynomial B(z) of which z = α is a root.

Note that if the polynomial B(z) has order N, then the quotient Q(z) = B(z)/(z − α) has exactly order N − 1. That is, the leading, z^{N−1} term of the quotient is never null. The reason is that if the leading term were null, if Q(z) had order less than N − 1, then B(z) = (z − α)Q(z) could not possibly have order N as we have assumed.
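That z − α exactly divides B(z) when B(α) = 0 is easy to check numerically by synthetic division (Horner's scheme, a standard method assumed here rather than taken from the book):

```python
def divide_by_root(b, alpha):
    """Divide B(z) = sum b[k] z^k by (z - alpha) synthetically.
    Returns (q, rho): quotient coefficients q[k] and the constant remainder rho."""
    n = len(b) - 1               # order N of the polynomial
    q = [0.0] * n                # the quotient has order N - 1
    acc = b[n]
    for k in range(n - 1, -1, -1):
        q[k] = acc
        acc = b[k] + alpha * acc
    return q, acc                # acc equals B(alpha), the remainder rho

# B(z) = z^2 - 3z + 2 has the root z = 1, so division by (z - 1) is exact:
q, rho = divide_by_root([2.0, -3.0, 1.0], 1.0)  # coefficients b0, b1, b2
assert rho == 0.0
assert q == [-2.0, 1.0]          # quotient z - 2, whose root 2 is B's other root
```

The remainder returned is exactly B(α), mirroring the substitution z = α in the text: a zero remainder and a root are the same observation.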

where z = ρeiφ .5). the bN z N term dominates B(z). then according to § 6.2 The fundamental theorem of algebra The fundamental theorem of algebra holds that any polynomial B(z) of order N can be factored N N B(z) = k=0 bk z = bN k j=1 (z − αj ). On the other hand.1 one can divide the polynomial by z − αN to obtain a new polynomial of order N − 1. the locus of all points in a plane at distance ρ from a point O is a circle. factoring the polynomial step by step into the desired form bN N (z − αj ). then one can divide it by z−αN −1 to obtain yet another polynomial of order N − 2. For example. it suﬃces to show that all polynomials of order N > 0 have at least one root. Prob. Because ei(φ+n2π) = eiφ and no fractional powers of z appear in (6. (6. THE EXISTENCE AND NUMBER OF ROOTS 117 6.2. 74] 7 A locus is the geometric collection of points which satisfy a given criterion. this locus forms a closed loop. when ρ = 0 the entire locus collapses on the single point B(0) = b0 . the locus of all points in three-dimensional space equidistant from two points P and Q is a plane. 2. the locus is nearly but not quite a circle at radius bN ρN from the Argand origin B(z) = 0. [24. 10. etc. so the locus there evidently has the general character of bN ρN eiN φ . joined at the ends and looped in N great loops. one root extracted at each step. At very large ρ. Ch. Eventually ρ disappears and the entire string collapses on the point B(0) = b0 . j=1 It remains however to show that there exists no polynomial B(z) of order N > 0 lacking roots altogether. bN = 0. for if a polynomial of order N has a root αN . As ρ shrinks smoothly.6 To prove the theorem. They also prove it in rather a diﬀerent way. and so on. the string’s shape changes smoothly. consider the locus7 of all B(ρeiφ ) in the Argand range plane (Fig. but at the end has collapsed on a single point. then at some time between it must have swept through the origin and every other point within the original loops.2. 
Now consider the locus at very large ρ again. Since the string originally has looped N times at great distance about the Argand origin. Watch the locus as ρ shrinks. As such. but this time let ρ slowly shrink. To the new polynomial the same logic applies: if it has at least one root αN −1 . revolving N times at that great distance before exactly repeating.6. and φ is variable. . 6 Professional mathematicians typically state the theorem in a slightly diﬀerent form. ρ is held constant. To show that there is no such polynomial.2. The locus is like a great string or rubber band.1) where the αk are the N roots of the polynomial.1).

given that the formula be constructed according to certain rules. formulas for the roots are known (see Ch. The Argand origin lies inside the loops at the start but outside at the end.8 but the NewtonRaphson iteration (§ 4.1). ﬁnding the polynomial given the roots. 6. (2. but to the applied mathematician it probably suﬃces to observe merely that no such formula is known. Undoubtedly the theorem is interesting to the professional mathematician. 6. reducing the polynomial step by step until all the roots are found. Charles is new. PRIMES. every 40 seconds. 10) though seemingly not so for quintic (ﬁfth order) and higher-order polynomials. as in (6. The strongest and most experienced of the three. ROOTS AND AVERAGES After all. For a quadratic (second order) polynomial. Given eight hours. .3. every 60 seconds. he lays only 60.118 CHAPTER 6. “Abel’s impossibility theorem”]. Charles. Now suppose that we are told that Adam can lay a brick every 30 seconds. is much easier: one just multiplies out j (z − αj ). then the values of ρ and φ precisely where the string has swept through the origin by deﬁnition constitute a root B(ρeiφ ) = 0. it is said to be shown that no such formula even exists. lays 120 bricks per hour. Brian. If so.9 Next is Brian who lays 90. so the string can only sweep as ρ decreases. how many bricks can the three men lay? Answer: (8 hours)(120 + 90 + 60 bricks per hour) = 2160 bricks.8) can be used to locate a root numerically in any case. it can never skip.1 Serial and parallel addition Consider the following problem. which observation completes the applied demonstration of the fundamental theorem of algebra. B(z) does have at least one root. Thus as we were required to show. Adam. Actually ﬁnding the roots numerically is another matter. The fact that the roots exist is one thing. How much time do the 8 In a celebrated theorem of pure mathematics [50. The Newton-Raphson is used to extract one root (any root) at each step as described above. 
The reverse problem.2) gives the roots. 9 The ﬁgures in the example are in decimal notation. For cubic (third order) and quartic (fourth order) polynomials.3 Addition and averages This section discusses the two basic ways to add numbers and the three basic ways to calculate averages of them. There are three masons. B(z) is everywhere diﬀerentiable.

1 . . Counterintuitive. 5. .—rather than as 1. . ADDITION AND AVERAGES three men need to lay 2160 bricks? Answer: 1 30 119 + 1 40 2160 bricks 1 + 60 bricks per second = 28. . 5 . where 1 1 1 1 = + + . but suggests that the notation which appears in the table. Assuming that none of the values involved is negative. 1 .2) and a bit of arithmetic. 3.2) a b a b where the familiar operator + is verbally distinguished from the when necessary by calling the + the serial addition or series addition operator.6. It works according to the law 1 1 1 = + . “if and only if. 30 40 60 30 40 60 The operator is called the parallel addition operator. (6. Neither is stated in simpler terms than the other.800 seconds = 8 hours. (6. (6. but fortunately there exists a better notation: (2160 bricks)(30 40 60 seconds per brick) = 8 hours. 4. perhaps.3) This is intuitive.1 are soon derived.4) Because we have all learned as children to count in the sensible man1 1 ner 1. the several parallel-addition identities of Table 6.3. 2. With (6. 2 . one can readily show that10 a x ≤ b x iﬀ a ≤ b. b k=a f (k) ≡ f (a) f (a + 1) f (a + 2) · · · f (b). might serve if needed. 1 hour 3600 seconds The two problems are precisely equivalent. . The writer knows of no conventional notation for parallel sums of series. . is that a x ≤ a.” . The notation used to solve the second is less elegant.—serial addition (+) seems 3 4 10 The word iﬀ means.
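The parallel-addition law and the bricklayer arithmetic can be verified directly. A sketch in Python, with `par` as a hypothetical helper name for the ∥ operator (exact rational arithmetic avoids rounding):

```python
from fractions import Fraction

def par(*terms):
    """Parallel addition: 1/(a || b || ...) = 1/a + 1/b + ...  (terms nonzero)."""
    return 1 / sum(Fraction(1, 1) / Fraction(t) for t in terms)

# The bricklayer problem both ways round:
bricks = 8 * (120 + 90 + 60)              # serial addition of rates: 2160 bricks
hours = 2160 * par(30, 40, 60) / 3600     # parallel addition of seconds per brick
assert bricks == 2160
assert hours == 8

# The law 1/(a || b) = 1/a + 1/b, and the counterintuitive a || x <= a:
a, b = Fraction(3), Fraction(5)
assert 1 / par(a, b) == 1 / a + 1 / b
assert par(a, b) <= a
```

The same `par` helper, applied to resistances, is the familiar parallel-resistor formula, which is where the operator's name comes from.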

Table 6.1: Parallel and serial addition identities.

    1/(a ∥ b) = 1/a + 1/b              1/(a + b) = 1/a ∥ 1/b
    a ∥ b = ab/(a + b)                 a + b = ab/(a ∥ b)
    a ∥ (1/b) = a/(1 + ab)             a + 1/b = (1 + ab)/b
    a ∥ b = b ∥ a                      a + b = b + a
    a ∥ (b ∥ c) = (a ∥ b) ∥ c          a + (b + c) = (a + b) + c
    a ∥ ∞ = ∞ ∥ a = a                  a + 0 = 0 + a = a
    a ∥ (−a) = ∞                       a + (−a) = 0
    (a)(b ∥ c) = ab ∥ ac               (a)(b + c) = ab + ac
    ∥∑_k a_k = 1 / ∑_k (1/a_k)         ∑_k a_k = 1 / ∥∑_k (1/a_k)

6.3.2 Averages

When it is unclear which of the two averages is more appropriate, a third average is available, the geometric mean

    [(120)(90)(60)]^{1/3} bricks per hour.

The inverse geometric mean

    [(30)(40)(60)]^{1/3} seconds per brick

implies the same average productivity. The geometric mean does not have the problem either of the two averages discussed above has; the mathematically savvy sometimes prefer the geometric mean over either of the others for this reason.

The fact remains nevertheless that businesspeople sometimes use mathematics in peculiar ways, making relatively easy problems harder and more mysterious than the problems need to be. Some businesspeople are mathematically rather sharp—as you presumably are if you are in business and are reading these words—but as for most: when real mathematical ability is needed, that's what they hire engineers, architects and the like for. Business demands other talents. The author is not sure, but somehow he doubts that many boards of directors would be willing to bet the company on a financial formula containing some mysterious-looking e^x. If you have ever encountered the little monstrosity of an approximation banks (at least in the author's country) actually use in place of (9.12) to accrue interest and amortize loans, then you have met the difficulty. Trying to convince businesspeople that their math is wrong, incidentally, is in the author's experience usually a waste of time; out there, this is not one of the battles worth fighting.

Generally, the arithmetic, geometric and harmonic means are defined

    μ ≡ ( ∑_k w_k x_k ) / ( ∑_k w_k ),    (6.5)
    μΠ ≡ ( ∏_j x_j^{w_j} )^{1/∑_k w_k},    (6.6)
    μ∥ ≡ ( ∑_k w_k ) / ( ∑_k w_k/x_k ) = ( ∑_k w_k ) ∥∑_k (x_k/w_k),    (6.7)

where the x_k are the several samples and the w_k are weights. For two samples weighted equally, these are

    μ = (a + b)/2,    (6.8)
    μΠ = √(ab),    (6.9)
    μ∥ = 2(a ∥ b).    (6.10)

If a ≥ 0 and b ≥ 0, then by successive steps,14

    0 ≤ (a − b)²,
    0 ≤ a² − 2ab + b²,
    4ab ≤ a² + 2ab + b²,
    2√(ab) ≤ a + b,
    2√(ab)/(a + b) ≤ 1 ≤ (a + b)/(2√(ab)),
    2ab/(a + b) ≤ √(ab) ≤ (a + b)/2,
    2(a ∥ b) ≤ √(ab) ≤ (a + b)/2.

That is,

    μ∥ ≤ μΠ ≤ μ.    (6.11)

The arithmetic mean is greatest and the harmonic mean, least, with the geometric mean falling between.

Does (6.11) hold when there are several nonnegative samples of various nonnegative weights? To show that it does, consider the case of N = 2^m nonnegative samples of equal weight. Nothing prevents one from dividing such a set of samples in half, considering each subset separately, for if (6.11) holds for each subset individually then surely it holds for the whole set (this is so because the average of the whole set is itself the average of the two subset averages, where the word "average" signifies the arithmetic, geometric or harmonic mean as appropriate). But each subset can further be divided in half, then each subsubset can be divided in half again, and so on until each smallest group has two members only—in which case we already know that (6.11) obtains. Starting there and recursing back, we have that (6.11) obtains for the entire set. Now consider that a sample of any weight can be approximated arbitrarily closely by several samples of weight 1/2^m, provided that m is sufficiently large. By this reasoning, (6.11) holds for any nonnegative weights of nonnegative samples, which was to be demonstrated.

14 The steps are logical enough, but the motivation behind them remains inscrutable until the reader realizes that the writer originally worked the steps out backward with his pencil, from the last step to the first. Only then did he reverse the order and write the steps formally here. The writer had no idea that he was supposed to start from 0 ≤ (a − b)² until his pencil working backward showed him. "Begin with the end in mind," the saying goes. In this case the saying is right. The same reading strategy often clarifies inscrutable math. When you can follow the logic but cannot understand what could possibly have inspired the writer to conceive the logic in the first place, try reading backward.
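The ordering μ∥ ≤ μΠ ≤ μ of (6.11) can be spot-checked numerically. A sketch for equally weighted samples (the helper names are the sketch's own, not the book's):

```python
from math import prod
import random

def means(samples):
    """Arithmetic, geometric and harmonic means of equally weighted samples."""
    n = len(samples)
    mu = sum(samples) / n                       # arithmetic mean
    mu_pi = prod(samples) ** (1.0 / n)          # geometric mean
    mu_par = n / sum(1.0 / x for x in samples)  # harmonic mean
    return mu, mu_pi, mu_par

# The masons' rates in bricks per hour:
mu, mu_pi, mu_par = means([120.0, 90.0, 60.0])
assert mu_par <= mu_pi <= mu    # roughly 83.1 <= 86.5 <= 90.0

# The ordering holds for any positive samples (tolerance guards float error):
for _ in range(1000):
    xs = [random.uniform(0.1, 100.0) for _ in range(5)]
    mu, mu_pi, mu_par = means(xs)
    assert mu_par <= mu_pi * (1 + 1e-9) and mu_pi <= mu * (1 + 1e-9)
```

Equality of the three means occurs only when all samples coincide, which the random trials essentially never produce.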

Chapter 7

The integral

Chapter 4 has observed that the mathematics of calculus concerns a complementary pair of questions:

• Given some function f(t), what is the function's instantaneous rate of change, or derivative, f′(t)?

• Interpreting some function f′(t) as an instantaneous rate of change, what is the corresponding accretion, or integral, f(t)?

Chapter 4 has built toward a basic understanding of the first question. This chapter builds toward a basic understanding of the second. The understanding of the second question constitutes the concept of the integral, one of the profoundest ideas in all of mathematics.

This chapter, which introduces the integral, is undeniably a hard chapter. Experience knows no reliable way to teach the integral adequately to the uninitiated except through dozens or hundreds of pages of suitable examples and exercises, yet the book you are reading cannot be that kind of book. The sections of the present chapter concisely treat matters which elsewhere rightly command chapters or whole books of their own. Concision can be a virtue—and by design, nothing essential is omitted here—but the bold novice who wishes to learn the integral from these pages alone faces a daunting challenge. It can be done. However, for less intrepid readers who quite reasonably prefer a gentler initiation, [20] is warmly recommended.

7.1 The concept of the integral

An integral is a finite accretion or sum of an infinite number of infinitesimal elements. This section introduces the concept.

7.1.1 An introductory example

Consider the sums

    S₁ = ∑_{k=0}^{0x10−1} k,
    S₂ = (1/2) ∑_{k=0}^{0x20−1} (k/2),
    S₄ = (1/4) ∑_{k=0}^{0x40−1} (k/4),
    S₈ = (1/8) ∑_{k=0}^{0x80−1} (k/8),
      ⋮
    S_n = (1/n) ∑_{k=0}^{(0x10)n−1} (k/n).

What do these sums represent? One way to think of them is in terms of the shaded areas of Fig. 7.1. In the figure, S₁ is composed of several tall, thin rectangles of width 1 and height k; S₂, of rectangles of width 1/2 and height k/2.

Figure 7.1: Areas representing discrete sums.

As n grows, the shaded region in the figure looks more and more like a triangle of base length b = 0x10 and height h = 0x10. In fact it appears that

    lim_{n→∞} S_n = bh/2 = 0x80,

or more tersely,

    S_∞ = 0x80,

is the area the increasingly fine stairsteps approach. Notice how we have evaluated S_∞, the sum of an infinite number of infinitely narrow rectangles, without actually adding anything up. We have taken a shortcut directly to the total.

If the reader does not fully understand this paragraph's illustration—if the relation of the sum to the area seems unclear—the reader is urged to pause and consider the illustration carefully until he does understand it. If it still seems unclear, then the reader should probably suspend reading here and go study a good basic calculus text like [20]. The concept is important.

In the equation

    S_n = (1/n) ∑_{k=0}^{(0x10)n−1} (k/n),

let us now change the variables

    τ ← k/n,    Δτ ← 1/n,

to obtain the representation

    S_n = Δτ ∑_{k=0}^{(k|_{τ=0x10})−1} τ,

or more properly,

    S_n = ∑_{k=0}^{(k|_{τ=0x10})−1} τ Δτ,

where the notation k|_{τ=0x10} indicates the value of k when τ = 0x10. Then

    S_∞ = lim_{Δτ→0⁺} ∑_{k=0}^{(k|_{τ=0x10})−1} τ Δτ,

in which it is conventional as Δτ vanishes to change the symbol dτ ← Δτ, where dτ is the infinitesimal of Ch. 4:

    S_∞ = lim_{dτ→0⁺} ∑_{k=0}^{(k|_{τ=0x10})−1} τ dτ.

The symbol lim_{dτ→0⁺} ∑_{k=0}^{(k|_{τ=0x10})−1} is cumbersome, so we replace it with the new symbol2 ∫₀^{0x10} to obtain the form

    S_∞ = ∫₀^{0x10} τ dτ.

This means, "stepping in infinitesimal intervals of dτ, the sum of all τ dτ from τ = 0 to τ = 0x10." Graphically, it is the shaded area of Fig. 7.2.

Figure 7.2: An area representing an infinite sum of infinitesimals. (Observe that the infinitesimal dτ is now too narrow to show on this scale; compare against Δτ in Fig. 7.1.)

2 Like the Greek S, ∑, denoting discrete summation, the seventeenth century-styled Roman S, ∫, stands for Latin "summa," English "sum." See [52, "Long s," 14:54, 7 April 2006].

7.1.2 Generalizing the introductory example

Now consider a generalization of the example of § 7.1.1:

    S_n = (1/n) ∑_{k=an}^{bn−1} f(k/n).

(In the example of § 7.1.1, f[τ] was the simple f[τ] = τ, but in general it could be any function.) With the change of variables

    τ ← k/n,    Δτ ← 1/n,

this is

    S_n = ∑_{k=(k|_{τ=a})}^{(k|_{τ=b})−1} f(τ) Δτ.

In the limit,

    S_∞ = lim_{dτ→0⁺} ∑_{k=(k|_{τ=a})}^{(k|_{τ=b})−1} f(τ) dτ = ∫_a^b f(τ) dτ.

This is the integral of f(τ) in the interval a < τ < b. It represents the area under the curve of f(τ) in that interval.

7.1.3 The balanced definition and the trapezoid rule

Actually, just as we have defined the derivative in the balanced form (4.15), we do well to define the integral in balanced form, too:

    ∫_a^b f(τ) dτ ≡ lim_{dτ→0⁺} { f(a)dτ/2 + ∑_{k=(k|_{τ=a})+1}^{(k|_{τ=b})−1} f(τ) dτ + f(b)dτ/2 }.    (7.1)

Here, the first and last integration samples are each balanced "on the edge," half within the integration domain and half without. Equation (7.1) is known as the trapezoid rule. Figure 7.3 depicts it. The name "trapezoid" comes of the shapes of the shaded integration elements in the figure. Observe however that it makes no difference whether one regards the shaded trapezoids or the dashed rectangles as the actual integration elements; the total integration area is the same either way.3 (For further discussion of the point, refer to the treatment of the Leibnitz notation in § 4.4.)

Figure 7.3: Integration by the trapezoid rule (7.1). Notice that the shaded and dashed areas total the same.

The important point to understand is that the integral is conceptually just a sum. It is a sum of an infinite number of infinitesimal elements as dτ tends to vanish, nothing more. Nothing actually requires the integration element width dτ to remain constant from element to element, incidentally. Constant widths are usually easiest to handle, but variable widths find use in some cases. The only requirement is that dτ remain infinitesimal.

3 The trapezoid rule (7.1) is perhaps the most straightforward, robust way to define the integral, but other schemes are possible. For example, one can give the integration elements quadratically curved tops which more nearly track the actual curve. That scheme is called Simpson's rule. [A section on Simpson's rule might be added to the book at some later date.]

7.2 The antiderivative and the fundamental theorem of calculus

If

    S(x) ≡ ∫_a^x g(τ) dτ,

then what is the derivative dS/dx? After some reflection, one sees that the derivative must be

    dS/dx = g(x).
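With a finite step count in place of the infinitesimal dτ, the balanced definition (7.1) becomes the familiar numerical trapezoid rule. A sketch (the book defines the rule as a limit; this finite-n version is an assumption for illustration):

```python
def trapezoid(f, a, b, n):
    """Integrate f from a to b by the trapezoid rule (7.1) with n finite steps:
    the first and last samples sit 'on the edge' and are weighted by 1/2."""
    dt = (b - a) / n
    total = 0.5 * f(a) * dt + 0.5 * f(b) * dt
    for k in range(1, n):
        total += f(a + k * dt) * dt
    return total

# For f(t) = t the rule is exact at any step count, because the trapezoids fit
# the straight line perfectly; even n = 4 recovers the triangle area 0x80:
assert trapezoid(lambda t: t, 0.0, 16.0, 4) == 128.0

# For a curved integrand the error shrinks as dt does:
err = abs(trapezoid(lambda t: t * t, 0.0, 1.0, 1000) - 1.0 / 3.0)
assert err < 1e-5
```

Taking n to infinity, so that dt becomes the infinitesimal dτ, recovers the balanced definition exactly.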
This is so because the action of the integral is to compile or accrete the area under a curve. The integral accretes area at a rate proportional to the curve's height f(τ): the higher the curve, the faster the accretion. In this way one sees that the integral and the derivative are inverse operators; the one inverts the other. The integral is the antiderivative.

More precisely,

    ∫_a^b (df/dτ) dτ = f(τ)|_a^b ,    (7.2)

where the notation f(τ)|_a^b or [f(τ)]_a^b means f(b) − f(a). The importance of (7.2) can hardly be overstated. As the formula which ties together the complementary pair of questions asked at the chapter's start, (7.2) is of utmost importance in the practice of mathematics. The idea behind the formula is indeed simple once grasped, but to grasp the idea firmly in the first place is not entirely trivial.4 The formula is fittingly named the fundamental theorem of calculus.5

The idea is simple but big. The reader is urged to pause now and ponder the formula thoroughly until he feels reasonably confident that indeed he does grasp it and the important idea it represents. One is unlikely to do much higher mathematics without this formula. One ponders the formula (7.2) a while, then the idea dawns on him.

If you want some help pondering, try this: Sketch some arbitrary function f(τ) on a set of axes at the bottom of a piece of paper—some squiggle of a curve will do nicely—then on a separate set of axes directly above the first, sketch the corresponding slope function df/dτ. Mark two points a and b on the common horizontal axis; then on the upper, df/dτ plot, shade the integration area under the curve. Now consider (7.2) in light of your sketch. There. Does the idea not dawn? Another way to see the truth of the formula begins by canceling its (1/dτ) dτ to obtain the form ∑_{τ=a}^{b} df = f(τ)|_a^b. If this way works better for you, fine, but make sure that you understand it the other way, too.

As an example of the formula's use, consider that because (d/dτ)(τ³/6) = τ²/2, it follows that

    ∫₂^x (τ²/2) dτ = ∫₂^x (d/dτ)(τ³/6) dτ = (τ³/6)|₂^x = (x³ − 8)/6.

Gathering elements from (4.21) and from Tables 5.2 and 5.3, Table 7.1 lists a handful of the simplest, most useful derivatives for antiderivative use.

Table 7.1: Basic derivatives for the antiderivative.

    ∫_a^b (df/dτ) dτ = f(τ)|_a^b

    τ^{a−1} = (d/dτ)(τ^a/a),    a ≠ 0
    1/τ = (d/dτ) ln τ,    ln 1 = 0
    exp τ = (d/dτ) exp τ,    exp 0 = 1
    sin τ = (d/dτ)(−cos τ),    cos 0 = 1
    cos τ = (d/dτ) sin τ,    sin 0 = 0

Section 9.1 speaks further of the antiderivative.

7.3 Operators, linearity and multiple integrals

This section presents the operator concept, discusses linearity and its consequences, treats the transitivity of the summational and integrodifferential operators, and introduces the multiple integral.

7.3.1 Operators

An operator is a mathematical agent that combines several values of a function.

4 Having read from several calculus books and, like millions of others perhaps including the reader, having sat years ago in various renditions of the introductory calculus lectures in school, the author has never yet met a more convincing demonstration of (7.2) than the formula itself. Somehow the underlying idea is too simple, too profound to explain. It's like trying to explain how to drink water, or how to count or to add. Elaborate explanations and their attendant constructs and formalities are indeed possible to contrive, but the idea itself is so simple that somehow such contrivances seem to obscure the idea more than to reveal it.
5 [20, § 11.6][42, § 5-4][52, "Fundamental theorem of calculus," 06:29, 23 May 2006]

Another example of the kind:

    D = g(z) − h(z) + p(z) + q(z)
      = Φ(0, z) − Φ(1, z) + Φ(2, z) − Φ(3, z) + Φ(4, z)
      = g(z) − h(z) + p(z) − 0 + q(z)
      = Σ_{k=0}^{4} (−)^k Φ(k, z),

where

    Φ(k, z) ≡ { g(z) if k = 0;  h(z) if k = 1;  p(z) if k = 2;  0 if k = 3;  q(z) if k = 4;  undefined otherwise. }

Such unedifying formalism is essentially useless in applications, except as a vehicle for definition. Once you understand why + and − are operators just as Σ and Π are, you can forget the formalism. It doesn't help much.

7.3.3 Linearity

A function f(z) is linear iff (if and only if) it has the properties

    f(z1 + z2) = f(z1) + f(z2),
    f(αz) = αf(z),
    f(0) = 0.

The functions f(z) = 3z, f(u, v) = 2u − v and f(z) = 0 are examples of linear functions. Nonlinear functions include⁶ f(z) = z², f(u, v) = √(uv), f(t) = cos ωt, f(z) = 3z + 1 and even f(z) = 1.

⁶ If 3z + 1 is a linear expression, then how is not f(z) = 3z + 1 a linear function? Answer: it is partly a matter of purposeful definition, partly of semantics. The equation y = 3x + 1 plots a line, so the expression 3z + 1 is literally "linear" in this sense; but the definition has more purpose to it than merely this. When you see the linear expression 3z + 1, think 3z + 1 = 0, then g(z) = 3z = −1. The g(z) = 3z is linear; the −1 is the constant value it targets. That's the sense of it.

An operator L is linear iff it has the properties

    L(f1 + f2) = Lf1 + Lf2,
    L(αf) = αLf,
    L(0) = 0.

The operators Σ, ∫, +, − and ∂ are examples of linear operators.⁷ For instance,

    (d/dz)[f1(z) + f2(z)] = df1/dz + df2/dz.

Nonlinear operators include multiplication, division and the various trigonometric functions, among others. Section 15.2 will have more to say about operators and their notation.

⁷ You don't see d in the list of linear operators? But d in this context is really just another way of writing ∂, so, yes, d is linear, too. See § 4.4.

7.3.4 Summational and integrodifferential transitivity

Consider the sum

    S1 = Σ_{k=a}^{b} Σ_{j=p}^{q} (x^k / j!).

This is a sum of the several values of the expression x^k/j!, evaluated at every possible pair (j, k) in the indicated domain. Now consider the sum

    S2 = Σ_{j=p}^{q} Σ_{k=a}^{b} (x^k / j!).

This is evidently a sum of the same values, only added in a different order. Apparently S1 = S2. Reflection along these lines must soon lead the reader to the conclusion that, in general,

    Σ_k Σ_j f(j, k) = Σ_j Σ_k f(j, k).

Now consider that an integral is just a sum of many elements, and that a derivative is just a difference of two elements. Integrals and derivatives must then have the same transitive property discrete sums have.
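The linearity properties can be checked numerically. The sketch below (illustrative only; the grid and sample functions are this sketch's own) treats summation over a fixed grid as an operator L, verifies the two linearity properties, and shows that multiplication of function values fails them:

```python
# Summation over a fixed sample grid, treated as an operator L.
grid = [0.1 * n for n in range(10)]

def L(f):
    return sum(f(t) for t in grid)

f1 = lambda t: 3.0 * t
f2 = lambda t: t * t
alpha = 7.0

# L(f1 + f2) = L(f1) + L(f2)  (additivity)
lhs = L(lambda t: f1(t) + f2(t))
rhs = L(f1) + L(f2)
print(abs(lhs - rhs) < 1e-12)     # True

# L(alpha * f) = alpha * L(f)  (homogeneity)
print(abs(L(lambda t: alpha * f1(t)) - alpha * L(f1)) < 1e-12)  # True

# Multiplication, by contrast, is not a linear operator:
M = lambda f: f(2.0) * f(3.0)
print(M(lambda t: f1(t) + f2(t)) == M(f1) + M(f2))              # False
```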

For example,

    ∫_{v=−∞}^{∞} ∫_{u=a}^{b} f(u, v) du dv = ∫_{u=a}^{b} ∫_{v=−∞}^{∞} f(u, v) dv du;

    Σ_k ∫ f_k(v) dv = ∫ Σ_k f_k(v) dv;

    (∂/∂v) ∫ f du = ∫ (∂f/∂v) du.

In general,

    L_v L_u f(u, v) = L_u L_v f(u, v),    (7.3)

where L is any of the linear operators Σ, ∫ or ∂.

Some convergent summations, like

    Σ_{k=0}^{∞} Σ_{j=0}^{1} (−)^j / (2k + j + 1),

diverge once reordered, as

    Σ_{j=0}^{1} Σ_{k=0}^{∞} (−)^j / (2k + j + 1).

One cannot blithely swap operators here. This is not because swapping is wrong, but rather because the inner sum after the swap diverges, hence the outer sum after the swap has no concrete summand on which to work. (Why does the inner sum after the swap diverge? Answer: 1 + 1/3 + 1/5 + ··· = [1] + [1/3 + 1/5] + [1/7 + 1/9 + 1/0xB + 1/0xD] + ··· > 1[1/4] + 2[1/8] + 4[1/0x10] + ··· = 1/4 + 1/4 + 1/4 + ··· .) For a more twisted example of the same phenomenon, consider⁸

    1 − 1/2 + 1/3 − 1/4 + ··· = (1 − 1/2 − 1/4) + (1/3 − 1/6 − 1/8) + ··· ,

which associates two negative terms with each positive, but still seems to omit no term. Paradoxically, then,

    1 − 1/2 + 1/3 − 1/4 + ···
        = 1 − 1/2 − 1/4 + 1/3 − 1/6 − 1/8 + ···
        = 1/2 − 1/4 + 1/6 − 1/8 + ···
        = (1/2)(1 − 1/2 + 1/3 − 1/4 + ···).

⁸ [1]
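The rearrangement paradox can be watched happening numerically. A small sketch (illustrative; the function names are this sketch's own): the alternating harmonic series approaches ln 2, while the rearrangement that pairs two negative terms with each positive approaches only half that value.

```python
import math

def alternating(n):
    # 1 - 1/2 + 1/3 - 1/4 + ... , first n terms
    return sum((-1.0) ** (k + 1) / k for k in range(1, n + 1))

def rearranged(groups):
    # (1 - 1/2 - 1/4) + (1/3 - 1/6 - 1/8) + ... :
    # group m contributes 1/(2m-1) - 1/(4m-2) - 1/(4m).
    return sum(1.0 / (2 * m - 1) - 1.0 / (4 * m - 2) - 1.0 / (4 * m)
               for m in range(1, groups + 1))

print(alternating(10**6))   # approaches ln 2  = 0.6931...
print(rearranged(10**6))    # approaches ln 2 / 2 = 0.3466...
```

Every term of the original series eventually appears in the rearrangement, yet the two limits differ: conditional convergence in action.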

The conclusion is false, for it claims falsely that the sum is half itself. A better way to have handled the example might have been to write the series as

    lim_{n→∞} (1 − 1/2 + 1/3 − 1/4 + ··· + 1/(2n − 1) − 1/(2n)),

thus explicitly specifying equal numbers of positive and negative terms.⁹ So specifying would have prevented the error in the first place.

⁹ Some students of professional mathematics would assert that the false conclusion was reached through lack of rigor. Well, maybe. This writer however does not feel sure that rigor is quite the right word for what was lacking here. Professional mathematics does bring an elegant notation and a set of formalisms which serve ably to spotlight certain limited kinds of blunders, but these are blunders no less by the applied approach. The stalwart Leonhard Euler—arguably the greatest series-smith in mathematical history—wielded his heavy analytical hammer in thunderous strokes before professional mathematics had conceived the notation or the formalisms. If the great Euler did without, then you and I might not always be forbidden to follow his robust example. On the other hand, the professional approach is worth study if you have the time. Recommended introductions include [28], preceded if necessary by [20] and/or [1, Ch. 1].

The conditional convergence¹⁰ of the last paragraph, which can occur in integrals as well as in sums, seldom poses much of a dilemma in practice. One can normally swap summational and integrodifferential operators with little worry. The reader however should at least be aware that conditional convergence troubles can arise where a summand or integrand varies in sign or phase.

¹⁰ [28, § 16]

7.3.5 Multiple integrals

Consider the function

    f(u, v) = u²/v.

Such a function would not be plotted as a curved line in a plane, but rather as a curved surface in a three-dimensional space. Integrating the function seeks not the area under the curve but rather the volume under the surface:

    V = ∫_{u1}^{u2} ∫_{v1}^{v2} (u²/v) dv du.

This is a double integral. Inasmuch as it can be written in the form

    V = ∫_{u1}^{u2} g(u) du,
    g(u) ≡ ∫_{v1}^{v2} (u²/v) dv,

its effect is to cut the area under the surface into flat, upright slices, then the slices crosswise into tall, thin towers. The towers are integrated over v to constitute the slice, then the slices over u to constitute the volume. In light of § 7.3.4, evidently nothing prevents us from swapping the integrations: u first, then v. Hence

    V = ∫_{v1}^{v2} ∫_{u1}^{u2} (u²/v) du dv.

And indeed this makes sense, doesn't it? What difference does it make whether we add the towers by rows first then by columns, or by columns first then by rows? The total volume is the same in any case.

Double integrations arise very frequently in applications. Triple integrations arise about as often. For instance, if µ(r) = µ(x, y, z) represents the variable mass density of some soil,¹¹ then the total soil mass in some rectangular volume is

    M = ∫_{x1}^{x2} ∫_{y1}^{y2} ∫_{z1}^{z2} µ(x, y, z) dz dy dx.

¹¹ Conventionally the Greek letter ρ not µ is used for density, but it happens that we need the letter ρ for a different purpose later in the paragraph.

As a concise notational convenience, the last is often written

    M = ∫_V µ(r) dr,

where the V stands for "volume" and is understood to imply a triple integration. Similarly for the double integral,

    V = ∫_S f(ρ) dρ,

where the S stands for "surface" and is understood to imply a double integration.

Even more than three nested integrations are possible. If we integrated over time as well as space, the integration would be fourfold. A spatial Fourier transform ([section not yet written]) implies a triple integration; and its inverse, another triple: a sixfold integration altogether. Manifold nesting of integrals is thus not just a theoretical mathematical topic; it arises in sophisticated real-world engineering models. The topic concerns us here for this reason.
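The order-swapping claim is easy to test numerically on the example f(u, v) = u²/v. The following sketch (illustrative; the limits u ∈ [1, 2], v ∈ [1, 3] and the grid size are this sketch's own choices) computes the double integral both ways by the midpoint rule and compares against the analytic value (7/3) ln 3:

```python
import math

f = lambda u, v: u * u / v

N = 400
du = (2.0 - 1.0) / N
dv = (3.0 - 1.0) / N
u_mid = [1.0 + (i + 0.5) * du for i in range(N)]
v_mid = [1.0 + (j + 0.5) * dv for j in range(N)]

# v first, then u (towers into slices, slices into volume):
V1 = sum(sum(f(u, v) * dv for v in v_mid) * du for u in u_mid)
# u first, then v:
V2 = sum(sum(f(u, v) * du for u in u_mid) * dv for v in v_mid)

exact = (7.0 / 3.0) * math.log(3.0)   # (u^3/3)|_1^2 times (ln v)|_1^3
print(V1, V2, exact)                  # all three agree closely
```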

7.4 Areas and volumes

By composing and solving appropriate integrals, one can calculate the perimeters, areas and volumes of interesting common shapes and solids.

7.4.1 The area of a circle

Figure 7.4 depicts an element of a circle's area. The element has wedge shape, but inasmuch as the wedge is infinitesimally narrow, the wedge is indistinguishable from a triangle of base length ρ dφ and height ρ. The area of such a triangle is

    A_triangle = ρ² dφ / 2.

Integrating the many triangles, we find the circle's area to be

    A_circle = Σ_{φ=−π}^{π} A_triangle = ∫_{−π}^{π} (ρ²/2) dφ = 2πρ²/2.    (7.4)

(The numerical value of 2π—the circumference or perimeter of the unit circle—we have not calculated yet. We shall calculate it in § 8.11.)

Figure 7.4: The area of a circle. [Diagram: a wedge of angular width dφ at radius ρ in the x-y plane.]

7.4.2 The volume of a cone

One can calculate the volume of any cone (or pyramid) if one knows its base area B and its altitude h measured normal¹² to the base. Refer to Fig. 7.5.

¹² Normal here means "at right angles."
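The wedge summation of (7.4) can be mimicked numerically; a quick sketch (illustrative), adding N narrow wedges of width dφ = 2π/N:

```python
import math

rho = 2.0
N = 100000
dphi = 2.0 * math.pi / N
A = sum(rho**2 * dphi / 2.0 for _ in range(N))   # sum of wedge areas
print(A, math.pi * rho**2)                       # both equal pi*rho**2
```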

Consider what it means to cut parallel to a cone's base a cross-section of the cone. A cross-section of a cone, cut parallel to the cone's base, has the same shape the base has but a different scale. If coordinates are chosen such that the altitude h runs in the ẑ direction with z = 0 at the cone's vertex, then the cross-sectional area is evidently¹³ (B)(z/h)².

Figure 7.5: The volume of a cone. [Diagram: a cone of base area B and altitude h, with a parallel cross-section.]

¹³ The fact may admittedly not be evident to the reader at first glance. If it is not yet evident to you, then ponder Fig. 7.5 a moment. What if the base were square? Would the cross-sectional area not be (B)(z/h)² in that case? What if the base were a right triangle with equal legs—in other words, half a square? What if the base were some other strange shape like the base depicted in Fig. 7.5? Could such a strange shape not also be regarded as a definite, well characterized part of a square? (With a pair of scissors one can cut any shape from a square piece of paper, after all.) Thinking along such lines must soon lead one to the insight that the parallel-cut cross-sectional area of a cone can be nothing other than (B)(z/h)², regardless of the base's shape.

For this reason, the cone's volume is

    V_cone = ∫_0^h (B)(z/h)² dz = (B/h²) ∫_0^h z² dz = (B/h²)(h³/3) = Bh/3.    (7.5)

7.4.3 The surface area and volume of a sphere

Of a sphere, one wants to calculate both the surface area and the volume. For the surface area, the sphere's surface is sliced vertically down the z axis into narrow constant-φ tapered strips (each strip broadest at the sphere's equator, tapering to points at the sphere's ±z poles) and horizontally across the z axis into narrow constant-θ rings, as in Figs. 7.6 and 7.7. A surface element so produced (seen as shaded in the latter figure) evidently has the area

    dS = (r dθ)(ρ dφ) = r² sin θ dθ dφ.
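The cone-volume integral (7.5) can be checked by the same kind of numeric summation; a sketch (illustrative; B, h and the grid size are this sketch's own choices):

```python
B, h = 5.0, 3.0        # any base area and altitude
N = 100000
dz = h / N
# Sum the cross-sectional slabs B*(z/h)**2 * dz (midpoint rule):
V = sum(B * (((i + 0.5) * dz) / h) ** 2 * dz for i in range(N))
print(V, B * h / 3.0)  # both equal B*h/3 = 5.0
```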

Figure 7.6: A sphere. [Diagram: spherical coordinates r, θ, φ with axes x̂, ŷ, ẑ.]

Figure 7.7: An element of the sphere's surface (see Fig. 7.6). [Diagram: a surface element with sides r dθ and ρ dφ.]

The sphere's total surface area then is the sum of all such elements over the sphere's entire surface:

    S_sphere = ∫_{φ=−π}^{π} ∫_{θ=0}^{π} dS
             = ∫_{φ=−π}^{π} ∫_{θ=0}^{π} r² sin θ dθ dφ
             = r² ∫_{φ=−π}^{π} [−cos θ]_0^π dφ
             = r² ∫_{φ=−π}^{π} [2] dφ
             = 4πr²,    (7.6)

where we have used the fact from Table 7.1 that sin τ = (d/dτ)(−cos τ).

Having computed the sphere's surface area, one can find its volume just as § 7.4.1 has found a circle's area—except that instead of dividing the circle into many narrow triangles, one divides the sphere into many narrow cones, each cone with base area dS and altitude r, with the vertices of all the cones meeting at the sphere's center. Per (7.5), the volume of one such cone is V_cone = r dS/3. Hence,

    V_sphere = ∮_S V_cone = ∮_S (r dS/3) = (r/3) ∮_S dS = (r/3) S_sphere,

where the useful symbol ∮_S indicates integration over a closed surface. In light of (7.6), the total volume is

    V_sphere = 4πr³/3.    (7.7)

(One can compute the same spherical volume more prosaically, without reference to cones, by writing dV = r² sin θ dr dθ dφ then integrating ∫_V dV. The derivation given above, however, is preferred because it lends the additional insight that a sphere can sometimes be viewed as a great cone rolled up about its own vertex. The circular area derivation of § 7.4.1 lends an analogous insight: that a circle can sometimes be viewed as a great triangle rolled up about its own vertex.)
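Both spherical results can be verified by brute numeric summation of the surface elements dS = r² sin θ dθ dφ; a sketch (illustrative; r and the grid size are this sketch's own choices):

```python
import math

r = 1.5
N = 2000
dtheta = math.pi / N
# The phi integration contributes a plain factor of 2*pi:
theta_sum = sum(math.sin((i + 0.5) * dtheta) * dtheta for i in range(N))
S = r**2 * theta_sum * 2.0 * math.pi
print(S, 4.0 * math.pi * r**2)            # both equal 4*pi*r**2

# The cone picture then gives the volume, V = r*S/3:
V = r * S / 3.0
print(V, 4.0 * math.pi * r**3 / 3.0)      # both equal 4*pi*r**3/3
```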

7.5 Checking an integration

Dividing 0x46B/0xD = 0x57 with a pencil, how does one check the result?¹⁴ Answer: by multiplying (0x57)(0xD) = 0x46B. Multiplication inverts division. Easier than division, multiplication provides a quick, reliable check.

¹⁴ Admittedly, few readers will ever have done much such multidigit hexadecimal arithmetic with a pencil, but, hey, go with it. In decimal, it's 1131/13 = 87. Actually, hexadecimal is just proxy for binary (see Appendix A), and long division in straight binary is kind of fun. If you have never tried it, you might. It is simpler than decimal or hexadecimal division, and it's how computers divide. The insight gained is worth the trial.

Likewise, integrating

    ∫_a^b (τ²/2) dτ = (b³ − a³)/6

with a pencil, how does one check the result? Answer: by differentiating

    [∂/∂b ((b³ − a³)/6)]_{b=τ} = τ²/2.

Differentiation inverts integration. Easier than integration, differentiation like multiplication provides a quick, reliable check. More formally, according to (7.2),

    S ≡ ∫_a^b (df/dτ) dτ = f(b) − f(a).    (7.8)

Differentiating (7.8) with respect to b and a,

    [∂S/∂b]_{b=τ} = df/dτ,
    [∂S/∂a]_{a=τ} = −df/dτ.    (7.9)

Either line of (7.9) can be used to check an integration. Evaluating (7.8) at b = a yields

    S|_{b=a} = 0,    (7.10)

which can be used to check further.¹⁵

¹⁵ Using (7.10) to check the example, (b³ − a³)/6 |_{b=a} = 0.

As useful as (7.9) and (7.10) are, they nevertheless serve only integrals with variable limits. They are of little use to check definite integrals
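The check of § 7.5 is itself easy to automate. A sketch (illustrative; the central-difference step h is this sketch's own choice):

```python
# The integration result S(a, b) = (b**3 - a**3)/6 for the integrand
# f(tau) = tau**2 / 2, checked per (7.9) and (7.10).
def S(a, b):
    return (b**3 - a**3) / 6.0

f = lambda tau: tau**2 / 2.0

a, b, h = 2.0, 5.0, 1e-6
dSdb = (S(a, b + h) - S(a, b - h)) / (2.0 * h)   # dS/db, numerically
dSda = (S(a + h, b) - S(a - h, b)) / (2.0 * h)   # dS/da, numerically

print(dSdb, f(b))     # both equal 12.5
print(dSda, -f(a))    # both equal -2.0
print(S(b, b))        # 0.0, per (7.10)
```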

like (9.14) below, which lack variable limits to differentiate. However, many or most integrals one meets in practice have or can be given variable limits, and equations (7.9) and (7.10) do serve such indefinite integrals.

It is a rare irony of mathematics that, although numerically differentiation is indeed harder than integration, analytically precisely the opposite is true. Analytically, differentiation is the easier. So far the book has introduced only easy integrals, but Ch. 9 will bring much harder ones. Even experienced mathematicians are apt to err in analyzing these. Reversing an integration by taking an easy derivative is thus an excellent way to check a hard-earned integration result.

7.6 Contour integration

To this point we have considered only integrations in which the variable of integration advances in a straight line from one point to another: for instance, ∫_a^b f(τ) dτ, in which the function f(τ) is evaluated at τ = a, a + dτ, a + 2dτ, ..., b. The integration variable is a real-valued scalar which can do nothing but make a straight line from a to b.

Such is not the case when the integration variable is a vector. Consider the integral

    S = ∫_{r=x̂ρ}^{ŷρ} (x² + y²) dℓ,

where dℓ is the infinitesimal length of a step along the path of integration. What does this integral mean? Does it mean to integrate from r = x̂ρ to r = 0, then from there to r = ŷρ? Or does it mean to integrate along the arc of Fig. 7.8? The two paths of integration begin and end at the same points, but they differ in between, and the integral certainly does not come out the same both ways. Yet many other paths of integration from x̂ρ to ŷρ are possible, not just these two.

Because multiple paths are possible, we must be more specific:

    S = ∫_C (x² + y²) dℓ,

where C stands for "contour" and means in this example the specific contour of Fig. 7.8. In the example, x² + y² = ρ² (by the Pythagorean theorem) and dℓ = ρ dφ, so

    S = ∫_C ρ² dℓ = ∫_0^{2π/4} ρ³ dφ = (2π/4) ρ³.
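The arc contour of § 7.6 can be integrated numerically as a check; a sketch (illustrative), stepping φ from 0 to 2π/4 with x = ρ cos φ, y = ρ sin φ and dℓ = ρ dφ:

```python
import math

rho = 1.25
N = 100000
dphi = (math.pi / 2.0) / N      # the arc spans phi = 0 .. 2*pi/4
S = 0.0
for i in range(N):
    phi = (i + 0.5) * dphi
    x = rho * math.cos(phi)
    y = rho * math.sin(phi)
    S += (x * x + y * y) * rho * dphi     # integrand times dl
print(S, (math.pi / 2.0) * rho**3)        # both equal (2*pi/4)*rho**3
```

Integrating straight from x̂ρ to 0 and then out to ŷρ instead would trace a different path and yield a different number, which is the section's point.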

In the example the contour is open, but closed contours which begin and end at the same point are also possible, indeed common. In the latter case some interesting mathematics emerge, as we shall see in §§ 8.8 and 9.5. The useful symbol ∮ indicates integration over a closed contour. It means that the contour ends where it began: the loop is closed. The contour of Fig. 7.8 would be closed, for instance, if it continued to r = 0 and then back to r = x̂ρ.

Besides applying where the variable of integration is a vector, contour integration applies equally where the variable of integration is a complex scalar.

Figure 7.8: A contour of integration. [Diagram: a quarter-circle arc C of radius ρ at angle φ in the x-y plane.]

7.7 Discontinuities

The polynomials and trigonometrics studied to this point in the book offer flexible means to model many physical phenomena of interest, but one thing they do not model gracefully is the simple discontinuity. Consider a mechanical valve opened at time t = t_o. The flow x(t) past the valve is

    x(t) = { 0,   t < t_o;
             x_o, t > t_o. }

One can write this more concisely in the form

    x(t) = u(t − t_o) x_o,

where u(t) is the Heaviside unit step,

    u(t) ≡ { 0, t < 0;
             1, t > 0; }    (7.11)

plotted in Fig. 7.9.

Figure 7.9: The Heaviside unit step u(t). [Plot: a step from 0 to 1 at t = 0.]

The derivative of the Heaviside unit step is the curious Dirac delta

    δ(t) ≡ (d/dt) u(t),    (7.12)

plotted in Fig. 7.10. This function is zero everywhere except at t = 0, where it is infinite, with the property that

    ∫_{−∞}^{∞} δ(t) dt = 1,    (7.13)

Figure 7.10: The Dirac delta δ(t). [Plot: an impulse at t = 0.]

and the interesting consequence that

    ∫_{−∞}^{∞} δ(t − t_o) f(t) dt = f(t_o)    (7.14)

for any function f(t). (Equation 7.14 is the sifting property of the Dirac delta.)¹⁶

¹⁶ It seems inadvisable for the narrative to digress at this point to explore u(z) and δ(z), the unit step and delta of a complex argument, although by means of Fourier analysis ([chapter not yet written]) or by conceiving the Dirac delta as an infinitely narrow Gaussian pulse ([chapter not yet written]) it could perhaps do so. The book has more pressing topics to treat. For the book's present purpose the interesting action of the two functions is with respect to the real argument t.

In the author's country at least, a sort of debate seems to have run for decades between professional and applied mathematicians over the Dirac delta δ(t). Some professional mathematicians seem to have objected that δ(t) is not a function, inasmuch as it lacks certain properties common to functions as they define them [36, § 2.4][12]. From the applied point of view the objection is admittedly a little hard to understand, until one realizes that it is more a dispute over methods and definitions than over facts. What the professionals seem to be saying is that δ(t) does not fit as neatly as they would like into the abstract mathematical framework they had established for functions in general before Paul Dirac came along in 1930 [52, "Paul Dirac," 05:48, 25 May 2006] and slapped his disruptive δ(t) down on the table. The objection is not so much that δ(t) is not allowed as it is that professional mathematics for years after 1930 lacked a fully coherent theory for it.

It's a little like the six-fingered man in Goldman's The Princess Bride [19]. If I had established a definition of "nobleman" which subsumed "human," whose relevant traits in my definition included five fingers on each hand, then when the six-fingered Count Rugen appeared on the scene, you would expect me to adapt my definition, wouldn't you? By my preëxisting definition, strictly speaking, the six-fingered count is "not a nobleman"; but such exclusion really tells one more about flaws in the definition than it does about the count.

To us the Dirac delta δ(t) is just a function. The internal discussion of words and means, we leave to the professionals, who know whereof they speak. The professionals who had established the theoretical framework before 1930 justifiably felt reluctant to throw the whole framework away because some scientists and engineers like us came along one day with a useful new function which didn't quite fit, but that was the professionals' problem not ours. Whether the professional mathematician's definition of the function is flawed, of course, is not for this writer to judge. Even if not, the fact of the Dirac delta dispute, coupled with the difficulty we applied mathematicians experience in trying to understand the reason the dispute even exists, has unfortunately surrounded the Dirac delta with a kind of mysterious aura, an elusive sense that δ(t) hides subtle mysteries—when what it really hides is an internal discussion of words and means among the professionals.

The Dirac delta is defined for vectors, too, such that

    ∫_V δ(r) dr = 1.    (7.15)
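The sifting property (7.14) can be illustrated numerically by modeling δ(t) as an ever-narrower unit-area Gaussian pulse, as the footnote suggests (the test function f and the widths below are this sketch's own choices):

```python
import math

def delta(t, eps):
    # unit-area Gaussian pulse of width eps, approximating delta(t)
    return math.exp(-t * t / (2.0 * eps * eps)) / (eps * math.sqrt(2.0 * math.pi))

f = lambda t: math.cos(t) + t**2
t_o = 0.75

N = 200000
dt = 4.0 / N                      # integrate over t_o - 2 .. t_o + 2
for eps in (0.1, 0.01, 0.001):
    I = sum(delta(t_o - 2.0 + (i + 0.5) * dt - t_o, eps)
            * f(t_o - 2.0 + (i + 0.5) * dt) * dt for i in range(N))
    print(eps, I)                 # I approaches f(t_o) as eps shrinks

print(f(t_o))                     # the sifted value the integrals approach
```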

7.8 Remarks (and exercises)

The concept of the integral is relatively simple once grasped, but its implications are broad, deep and hard. This chapter is short. One reason introductory calculus texts run so long is that they include many, many pages of integration examples and exercises. The reader who desires a gentler introduction to the integral might consult among others the textbook the chapter's introduction has recommended.

Even if this book is not an instructional textbook, it seems not meet that it should include no exercises at all here. Here are a few. Work the exercises if you like. Some of them do need material from later chapters, so you should not expect to be able to complete them all now. The harder ones are marked with ∗ asterisks.

1. Evaluate (a) ∫_0^x τ dτ; (b) ∫_0^x τ² dτ. (Answer: x²/2; x³/3.)

2. Evaluate (a) ∫_1^x (1/τ²) dτ; (b) ∫_a^x 3τ⁻² dτ; (c) ∫_a^x Cτⁿ dτ; (d) ∫_0^x (a₂τ² + a₁τ) dτ; ∗(e) ∫_1^x (1/τ) dτ.

3. ∗ Evaluate (a) ∫_0^x Σ_{k=0}^{∞} τ^k dτ; (b) Σ_{k=0}^{∞} ∫_0^x τ^k dτ; (c) ∫_0^x Σ_{k=0}^{∞} (τ^k/k!) dτ.

4. Evaluate ∫_0^x exp ατ dτ.

5. Evaluate (a) ∫_{−2}^{5} (3τ² − 2τ³) dτ; (b) ∫_{5}^{−2} (3τ² − 2τ³) dτ. Work the exercise by hand in hexadecimal and give the answer in hexadecimal.

6. Evaluate ∫_1^∞ (3/τ²) dτ.

7. Evaluate (a) ∫_0^x cos ωτ dτ; (b) ∫_0^x sin ωτ dτ; ∗(c)¹⁷ ∫_a^x τ sin ωτ dτ.

8. ∗ Evaluate¹⁸ (a) ∫_1^x √(1 + 2τ) dτ; (b) ∫_a^x [(cos √τ)/√τ] dτ.

9. ∗ Evaluate¹⁹ (a) ∫_0^x [1/(1 + τ²)] dτ (answer: arctan x); (b) ∫_0^x [(4 + i3)/√(2 − 3τ²)] dτ (hint: the answer involves another inverse trigonometric).

10. ∗∗ Evaluate (a) ∫_{−∞}^{x} exp[−τ²/2] dτ; (b) ∫_{−∞}^{∞} exp[−τ²/2] dτ.

11. ∗ Evaluate the integral of the example of § 7.6 along the alternate contour suggested there, from x̂ρ to 0 to ŷρ.

¹⁷ [42, § 8-2]
¹⁸ [42, § 5-6]
¹⁹ [42, back endpaper]

The last exercise in particular requires some experience to answer. Moreover, it requires a developed sense of applied mathematical style to put the answer in a pleasing form (the right form for part b is very different from that for part a). Some of the easier exercises, of course, you should be able to work right now.

The point of the exercises is to illustrate how hard integrals can be to solve, and in fact how easy it is to come up with an integral which no one really knows how to solve very well. Some solutions to the same integral are better than others (easier to manipulate, faster to numerically calculate, etc.) yet not even the masters can solve them all in practical ways. On the other hand, integrals which arise in practice often can be solved very well with sufficient cleverness—and the more cleverness you develop, the more such integrals you can solve. The ways to solve them are myriad. The mathematical art of solving diverse integrals is well worth cultivating.

Chapter 9 introduces some of the basic, most broadly useful integral-solving techniques. Before addressing techniques of integration, however, as promised earlier we turn our attention in Chapter 8 back to the derivative, applied in the form of the Taylor series.


Chapter 8

The Taylor series

The Taylor series is a power series which fits a function in a limited domain neighborhood. Fitting a function in such a way brings two advantages:

• it lets us take derivatives and integrals in the same straightforward way (4.20) we take them with any power series; and

• it implies a simple procedure to calculate the function numerically.

This chapter introduces the Taylor series and some of its incidents. It also derives Cauchy's integral formula. The chapter's early sections prepare the ground for the treatment of the Taylor series proper in § 8.3.¹

For the impatient, to read only the following sections might not be an unreasonable way to shorten the chapter: §§ 8.1, 8.3, 8.5, 8.8, 8.9 and 8.11, plus the introduction of § 8.2.

¹ Because even at the applied level the proper derivation of the Taylor series involves mathematical induction, analytic continuation and the matter of convergence domains, no balance of rigor the chapter might strike seems wholly satisfactory. The chapter errs maybe toward too much rigor; for, with a little less, most of §§ 8.1, 8.2, 8.4 and 8.6 would cease to be necessary. From another point of view, the chapter errs maybe toward too little rigor. Some pretty constructs of pure mathematics serve the Taylor series and Cauchy's integral formula; however, such constructs drive the applied mathematician on too long a detour. The chapter as written represents the most nearly satisfactory compromise the writer has been able to attain.

8.1 The power-series expansion of 1/(1 − z)^{n+1}

Before approaching the Taylor series proper in § 8.3, we shall find it both interesting and useful to demonstrate that

    1/(1 − z)^{n+1} = Σ_{k=0}^{∞} C(n+k, n) z^k,    n ≥ 0,    (8.1)

where C(m, j) denotes the binomial coefficient. The demonstration comes in three stages. Of the three, it is the second stage (§ 8.1.2) which actually proves (8.1). The first stage (§ 8.1.1) comes up with the formula for the second stage to prove. The third stage (§ 8.1.3) establishes the sum's convergence.

8.1.1 The formula

In § 2.6.4 we found that

    1/(1 − z) = Σ_{k=0}^{∞} z^k = 1 + z + z² + z³ + ···    for |z| < 1.

What about 1/(1 − z)², 1/(1 − z)³, 1/(1 − z)⁴, and so on? By the long-division procedure of Table 2.4, one can calculate the first few terms of 1/(1 − z)² to be

    1/(1 − z)² = 1/(1 − 2z + z²) = 1 + 2z + 3z² + 4z³ + ···,

whose coefficients 1, 2, 3, 4, ... happen to be the numbers down the first diagonal of Pascal's triangle (Fig. 4.2 on page 73; see also Fig. 4.1). Dividing 1/(1 − z)³ seems to produce the coefficients 1, 3, 6, 0xA, ... down the second diagonal; dividing 1/(1 − z)⁴, the coefficients down the third. A curious pattern seems to emerge, worth investigating more closely. The pattern recommends the conjecture (8.1).

To motivate the conjecture a bit more formally (though without actually proving it yet), suppose that 1/(1 − z)^{n+1}, n ≥ 0, is expandable in the power series

    1/(1 − z)^{n+1} = Σ_{k=0}^{∞} a_{nk} z^k,    (8.2)

where the a_{nk} are coefficients to be determined. Multiplying by 1 − z, we have that

    1/(1 − z)^n = Σ_{k=0}^{∞} (a_{nk} − a_{n(k−1)}) z^k.
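Before proving (8.1), one can at least check it numerically; a quick sketch (illustrative) comparing partial sums against the closed form for a few n:

```python
from math import comb            # comb(m, j) is the binomial coefficient

z = 0.4
for n in range(4):
    partial = sum(comb(n + k, n) * z**k for k in range(200))
    exact = 1.0 / (1.0 - z) ** (n + 1)
    print(n, partial, exact)     # the two columns agree
```

For n = 1 this reproduces the 1 + 2z + 3z² + ··· coefficients read off Pascal's first diagonal above.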

This is to say that

    a_{(n−1)k} = a_{nk} − a_{n(k−1)},    (8.3)

or in other words that

    a_{n(k−1)} + a_{(n−1)k} = a_{nk}.

Thinking of Pascal's triangle, (8.3) reminds one of (4.5), transcribed here in the symbols

    C(m−1, j−1) + C(m−1, j) = C(m, j),    (8.4)

except that (8.3) is not (8.4) perfectly. Various changes of variable are possible to make (8.4) better match (8.3). We might try at first a few false ones, but eventually the change

    n + k ← m,
    k ← j,

recommends itself. Thus changing in (8.4) gives

    C(n+k−1, k−1) + C(n+k−1, k) = C(n+k, k).

Transforming according to the rule (4.3), this is

    C(n+(k−1), n) + C((n−1)+k, n−1) = C(n+k, n),    (8.5)

which fits (8.3) perfectly. Hence we conjecture that

    a_{nk} = C(n+k, n),    (8.6)

which coefficients, applied to (8.2), yield (8.1).

Equation (8.1) is thus suggestive. It works at least for the important case of n = 0; this much is easy to test. In light of (8.3), it seems to imply a relationship between the 1/(1 − z)^{n+1} series and the 1/(1 − z)^n series for any n. But to seem is not to be. At this point, all we can say is that (8.1) seems right. We shall establish that it is right in the next subsection.

8.1.2 The proof by induction

Equation (8.1) is proved by induction as follows. Consider the sum

    S_n ≡ Σ_{k=0}^{∞} C(n+k, n) z^k.    (8.7)

Multiplying by 1 − z yields

    (1 − z) S_n = Σ_{k=0}^{∞} [C(n+k, n) − C(n+(k−1), n)] z^k.

Per (8.5), this is

    (1 − z) S_n = Σ_{k=0}^{∞} C((n−1)+k, n−1) z^k.    (8.8)

Now suppose that (8.1) is true for n = i − 1 (where i denotes an integer rather than the imaginary unit):

    1/(1 − z)^i = Σ_{k=0}^{∞} C((i−1)+k, i−1) z^k.    (8.9)

In light of (8.8), this means that

    1/(1 − z)^i = (1 − z) S_i.

Dividing by 1 − z,

    1/(1 − z)^{i+1} = S_i.

Applying (8.7),

    1/(1 − z)^{i+1} = Σ_{k=0}^{∞} C(i+k, i) z^k.    (8.10)

Evidently (8.9) implies (8.10). In other words, if (8.1) is true for n = i − 1, then it is also true for n = i. Thus by induction, if it is true for any one n, then it is also true for all greater n.

The "if" in the last sentence is important. Like all inductions, this one needs at least one start case to be valid (many inductions actually need a consecutive pair of start cases). The n = 0 supplies the start case

    1/(1 − z)^{0+1} = Σ_{k=0}^{∞} C(k, 0) z^k = Σ_{k=0}^{∞} z^k,

which per (2.34) we know to be true.

8.1.3 Convergence

The question remains as to the domain over which the sum (8.1) converges.² To answer the question, consider that per (4.9),

    C(m, j) = [m/(m − j)] C(m−1, j)

for any integers m > 0 and j. With the substitution n + k ← m, n ← j, this means that

    C(n+k, n) = [(n+k)/k] C(n+(k−1), n),

or more tersely,

    a_{nk} = [(n+k)/k] a_{n(k−1)},

where

    a_{nk} ≡ C(n+k, n)

are the coefficients of the power series (8.1). Rearranging factors,

    a_{nk}/a_{n(k−1)} = (n+k)/k = 1 + n/k.    (8.11)

² The meaning of the verb to converge may seem clear enough from the context and from earlier references, but if explanation here helps: a series converges if and only if it approaches a specific, finite value after many terms. A more rigorous way of saying the same thing is as follows: the series

    S = Σ_{k=0}^{∞} τ_k

converges iff (if and only if), for all possible positive constants ǫ, there exists a finite K ≥ −1 such that

    | Σ_{k=K+1}^{n} τ_k | < ǫ

for all n ≥ K (of course it is also required that the τ_k be finite, but you knew that already). The professional mathematical literature calls such convergence "uniform convergence," distinguishing it through a test devised by Weierstrass from the weaker "pointwise convergence" [1, § 1.5]. The applied mathematician can profit substantially by learning the professional view in the matter, but the effect of trying to teach the professional view in a book like this would not be pleasing. Here, we avoid error by keeping a clear view of the physical phenomena the mathematics is meant to model. It is interesting nevertheless to consider an example of an integral for which convergence is not so simple, such as Frullani's integral of § 9.7.

20).11) by z k /z k−1 gives the ratio ank z k n z. The theoretical diﬃculty the student has with mathematical induction arises from the reluctance to ask seriously. as in § 8. 3 .10. 3. the reader may nevertheless wonder why the simpler |(1 + n/k)z| < 1 is not given as a criterion. (8. The P surprising answer is that not all series τk with |τk /τk−1 | < 1 converge! For example. P P the extremely simple 1/k does not converge. = 1+ k−1 an(k−1) z k which is to say that the kth term of (8.4 General remarks on mathematical induction We have proven (8. The virtue of induction as practiced in § 8. 8. Hamming once said of mathematical induction. So long as the criterion3 1+ n z ≤1−δ k is satisﬁed for all suﬃciently large k > K—where 0 < δ ≪ 1 is a small positive constant—then the series evidently converges (see § 2. The distinction is P subtle but rather important.4 and eqn. maybe you can prove it by induction. all series τk with |τk /τk−1 | < 1 − δ do converge.1. but the induction probably does not help you to obtain the formula! A good inductive proof usually begins by motivating the formula proven. Once you obtain a formula somehow. As we see however.1. Its vice is that it conceals the subjective process which has led the mathematician to consider the formula in the ﬁrst place. Richard W. Answer: it Rx majorizes 1 (1/τ ) dτ = ln x.1) by means of a mathematical induction. airtight case for a formula.1. The really curious reader may now ask why 1/k does not converge.2 is that it makes a logically clean. so to meet the criterion it suﬃces that |z| < 1. See (5.1.1). “How could I prove a formula for an inﬁnite number of cases when I know that testing a ﬁnite number of cases is not enough?” Once you Although one need not ask the question to understand the proof.1) is (1 + n/k)z times the (k − 1)th term.156 CHAPTER 8.6.7) and § 8.12) The bound (8.12) thus establishes a sure convergence domain for (8. 
But we can bind 1+n/k as close to unity as desired by making K suﬃciently large. THE TAYLOR SERIES Multiplying (8.
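The convergence domain |z| < 1 just established can be probed numerically. A sketch (the particular n, z and truncation are my choices for illustration): the partial sums of Σ_k C(n+k, n) z^k approach 1/(1 − z)^{n+1}, and the kth-to-(k−1)th term ratio is exactly (1 + n/k)z, which tends to z as k grows.

```python
from math import comb

# Partial sums of the power series (8.1) for 1/(1 - z)^(n+1), |z| < 1.
def partial_sum(n, z, terms):
    return sum(comb(n + k, n) * z ** k for k in range(terms))

n, z = 2, 0.5
exact = 1.0 / (1.0 - z) ** (n + 1)          # here, 1/(0.5)^3 = 8
assert abs(partial_sum(n, z, 60) - exact) < 1e-9

# The kth-to-(k-1)th term ratio is (1 + n/k) z, per (8.11).
k = 1000
ratio = (comb(n + k, n) * z ** k) / (comb(n + k - 1, n) * z ** (k - 1))
assert abs(ratio - (1 + n / k) * z) < 1e-12
```

Since the ratio settles below unity in magnitude for |z| < 1, the tail of the series is dominated by a convergent geometric series, which is the substance of the criterion (8.12).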

We shall not always write inductions out as explicitly in this book as we have done in the present section—often we shall leave the induction as an implicit exercise for the interested reader—but this section's example at least lays out the general pattern of the technique.

Hamming, a professional mathematician who sympathized with the applied mathematician's needs, wrote further,

	Ideally, when teaching a topic the degree of rigor should follow the student's perceived need for it. . . . It is necessary to require a gradually rising level of rigor so that when faced with a real need for it you are not left helpless. As a result, [one cannot teach] a uniform level of rigor, but rather a gradually rising level. Logically, this is indefensible, but psychologically there is little else that can be done. [20, § 1.6]

Applied mathematics holds that the practice is defensible, on the ground that the math serves the model. Hamming also wrote,

	The function of rigor is mainly critical and is seldom constructive. Rigor is the hygiene of mathematics, which is needed to protect us against careless thinking. [20, § 1.6]

The applied mathematician may tend to avoid rigor for which he finds no immediate use, but he does not disdain mathematical rigor on principle; the style lies in exercising rigor at the right level for the problem at hand. Hamming nevertheless makes a pertinent point.

8.2 Shifting a power series' expansion point

One more question we should treat before approaching the Taylor series proper in § 8.3 concerns the shifting of a power series' expansion point. How can the expansion point of the power series

	f(z) = Σ_{k=K}^∞ (a_k)(z − z_o)^k,   K ≤ 0,   (8.13)

which may have terms of negative order, be shifted from z = z_o to z = z_1?

The first step in answering the question is straightforward: one rewrites (8.13) in the form

	f(z) = Σ_{k=K}^∞ (a_k)([z − z_1] − [z_o − z_1])^k,   (8.14)

then changes the variables

	w ← (z − z_1)/(z_o − z_1),
	c_k ← [−(z_o − z_1)]^k a_k,

to obtain

	f(z) = Σ_{k=K}^∞ (c_k)(1 − w)^k.   (8.15)

Splitting the k < 0 terms from the k ≥ 0 terms in (8.15), we have that

	f(z) = f−(z) + f+(z),   (8.16)
	f−(z) ≡ Σ_{k=0}^{−(K+1)} c_{−(k+1)}/(1 − w)^{k+1},
	f+(z) ≡ Σ_{k=0}^∞ (c_k)(1 − w)^k.

Of the two subseries, the f−(z) is expanded term by term using (8.1), after which combining like powers of w yields the form

	f−(z) = Σ_{k=0}^∞ q_k w^k,
	q_k ≡ Σ_{n=0}^{−(K+1)} ((n + k) choose n) c_{−(n+1)}.   (8.17)

The f+(z) is even simpler to expand: one need only multiply the series out term by term binomially, combining like powers of w to reach the form

	f+(z) = Σ_{k=0}^∞ p_k w^k,
	p_k ≡ Σ_{n=k}^∞ ((n) choose (k)) (−)^k c_n.   (8.18)
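The recipe above can be exercised numerically. A sketch specialized to K = 0, where only the f+ subseries appears (the function, points and truncation are my choices): shift the expansion point of f(z) = 1/(1 − z) = Σ_k z^k from z_o = 0 to z_1 = −1/2 and compare against the directly known Taylor coefficients 1/(1 − z_1)^{k+1}.

```python
from math import comb

zo, z1 = 0.0, -0.5
N = 80                                            # truncation of the sums
a = [1.0] * N                                     # a_k about zo: all ones here
c = [(-(zo - z1)) ** n * a[n] for n in range(N)]  # c_k per the change of variable

# p_k = (-1)^k * sum_{n >= k} C(n, k) c_n, from expanding (1 - w)^n binomially
# and combining like powers of w, as in (8.18).
p = [(-1) ** k * sum(comb(n, k) * c[n] for n in range(k, N)) for k in range(10)]

# Since w = (z - z1)/(zo - z1), the (z - z1)^k coefficient is p_k/(zo - z1)^k,
# which must match the direct Taylor coefficient 1/(1 - z1)^(k + 1).
for k in range(8):
    b_k = p[k] / (zo - z1) ** k
    assert abs(b_k - 1.0 / (1.0 - z1) ** (k + 1)) < 1e-6
```

The truncation N stands in for the formally infinite inner sums; it is adequate here because |z_1 − z_o| is well inside the original convergence domain.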

Equations (8.13) through (8.18) serve to shift a power series' expansion point, calculating the coefficients of a power series for f(z) about z = z_1, given those of a power series about z = z_o. Notice that—unlike the original, z = z_o power series—the new, z = z_1 power series has terms (z − z_1)^k only for k ≥ 0; it has no terms of negative order. Shifting the expansion point away from the pole at z = z_o has resolved the k < 0 terms.

The method fails if z = z_1 happens to be a pole or other nonanalytic point of f(z); the convergence domain vanishes as z_1 approaches such a forbidden point. (Examples of such forbidden points include z = 0 in h[z] = 1/z and in g[z] = √z.) Furthermore, even if z_1 does represent a fully analytic point of f(z), it also must lie within the convergence domain of the original, z = z_o series for the shift to be trustworthy as derived.

The attentive reader might observe that we have formally established the convergence neither of f−(z) in (8.17) nor of f+(z) in (8.18). Regarding the former convergence, that of f−(z): at the price per (8.12) of restricting the convergence domain to |w| < 1, each term of the original f−(z) of (8.16) converges for |w| < 1; and since running the sum in (8.13) from the finite k = K ≤ 0 rather than from the infinite k = −∞ leaves f−(z) only a finite number of terms, the reconstituted f−(z) of (8.17) safely converges in the same domain. The latter convergence, that of f+(z), is harder to establish in the abstract because that subseries has an infinite number of terms. As we shall see by pursuing a different line of argument in § 8.3, however, the f+(z) of (8.18) can be nothing other than the Taylor series about z = z_1 of the function f+(z) in any event, enjoying the same convergence domain any such Taylor series enjoys.

[Footnote: A rigorous argument can be constructed without appeal to § 8.3 if desired, from the ratio n/(n − k) of (4.9) and its brethren, which ratio approaches unity with increasing n. The interested reader can fill the details in. A more elegant rigorous argument can be made indirectly by way of a complex contour integral. However, we have strategically framed the problem so that one needn't worry about it; the important point is the one made in the narrative: f+(z) can be nothing other than the Taylor series in any event.]

In applied mathematics, however, one does not normally try to shift the expansion point of an unspecified function f(z). Rather, one shifts the expansion point of some concrete function like sin z or ln(1 − z). The imagined difficulty (if any) vanishes in the concrete case.

8.3 Expanding functions in Taylor series

Having prepared the ground, we now stand in position to treat the Taylor series proper. The treatment begins with a question: if you had to express

some function f(z) by a power series

	f(z) = Σ_{k=0}^∞ (a_k)(z − z_o)^k,

with terms of nonnegative order k ≥ 0 only, how would you do it? The procedure of § 8.1 worked well enough in the case of f(z) = 1/(1 − z)^{n+1}, but it is not immediately obvious that the same procedure works more generally. What if f(z) = sin z, for example?

[Footnote: The actual Taylor series for sin z is given in § 8.9.]

Fortunately a different way to attack the power-series expansion problem is known. It works by asking the question: what power series, having terms of nonnegative order only, most resembles f(z) in the immediate neighborhood of z = z_o? To resemble f(z), the desired power series should have a_0 = f(z_o); otherwise it would not have the right value at z = z_o. Then it should have a_1 = f′(z_o) for the right slope, a_2 = f″(z_o)/2 for the right second derivative, and so on. With this procedure,

	f(z) = Σ_{k=0}^∞ [d^k f/dz^k]_{z=z_o} (z − z_o)^k / k!.   (8.19)

Equation (8.19) is the Taylor series. Where it converges, it has all the same derivatives f(z) has, so if f(z) is infinitely differentiable then the Taylor series is an exact representation of the function.

[Footnote: However, for readers who want a little more rigor nevertheless, the argument goes briefly as follows. Consider an infinitely differentiable function F(z) and its Taylor series f(z) about z_o. Let ∆F(z) ≡ F(z) − f(z) be the part of F(z) not representable as a Taylor series about z_o. Then ∆F(z_o) and all its derivatives at z_o must be identically zero (otherwise by the Taylor series formula of eqn. 8.19, one could construct a nonzero Taylor series for ∆F[z_o] from the nonzero derivatives). However, if F(z) is infinitely differentiable and if all the derivatives of ∆F(z) are zero at z = z_o, then by the unbalanced definition of the derivative from chapter 4, all the derivatives must also be zero at z = z_o ± ǫ, hence also at z = z_o ± 2ǫ, and so on. This means that ∆F(z) = 0; in other words, there is no part of F(z) not representable as a Taylor series. (A more elegant rigorous proof, preferred by the professional mathematicians [29][14] but needing significant theoretical preparation, involves integrating over a complex contour about the expansion point. We will not detail that proof here.) Further proof details may be too tiresome to inflict on applied mathematical readers, but basically that is how the more rigorous proof goes. The reason the rigorous proof is confined to a footnote is not a deprecation of rigor as such; it is a deprecation of rigor which serves little purpose in applications. Applied mathematicians normally regard mathematical functions to be imprecise analogs of physical quantities of interest. Since the functions are imprecise analogs in any case, the applied mathematician is logically free implicitly to define the functions he uses as Taylor series in the first place—that is, to restrict the set of infinitely differentiable functions used in the model to the subset of such functions representable as Taylor series. With such an implicit definition, whether there actually exist any infinitely differentiable functions not representable as Taylor series is more or less beside the point. In applied mathematics, the definitions serve the model, not the other way around. (It is entertaining to consider [51, "Extremum"] the Taylor series of the function sin[1/x]—although in practice this particular function is readily expanded after the obvious change of variable u ← 1/x.)]
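The procedure just described can be tried at once on f(z) = sin z about z_o = 0, whose kth derivative at 0 cycles through 0, 1, 0, −1. A numerical sketch comparing the truncated series (8.19) with the library sine:

```python
import math

# Taylor series (8.19) for sin z about zo = 0, built from the derivative
# cycle sin, cos, -sin, -cos evaluated at 0.
def sin_taylor(x, terms=20):
    cycle = (0.0, 1.0, 0.0, -1.0)            # d^k(sin)/dz^k at z = 0
    return sum(cycle[k % 4] * x ** k / math.factorial(k)
               for k in range(terms))

for x in (0.0, 0.5, 1.0, 2.0):
    assert abs(sin_taylor(x) - math.sin(x)) < 1e-12
```

Twenty terms suffice here because the factorial in the denominator crushes the tail; § 8.10 takes up the general question of how many terms are enough.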

The Taylor series is not guaranteed to converge outside some neighborhood near z = z_o, but where it does converge it is precise. When z_o = 0, the series is also called the Maclaurin series. By either name, the series is a construct of great importance and tremendous practical value, as we shall soon see.

8.4 Analytic continuation

As earlier mentioned in § 2.12.3, an analytic function is a function which is infinitely differentiable in the domain neighborhood of interest—or, maybe more appropriately for our applied purpose, a function expressible as a Taylor series in that neighborhood. As we have seen, only one Taylor series about z_o is possible for a given function f(z):

	f(z) = Σ_{k=0}^∞ (a_k)(z − z_o)^k.

However, nothing prevents one from transposing the series to a different expansion point by the method of § 8.2—except that the transposed series may there enjoy a different convergence domain. (As it happens, this section's purpose finds it convenient to swap symbols z_o ↔ z_1, transposing rather from expansion about z = z_1 to expansion about z = z_o; in the swapped notation the shift is trustworthy so long as the expansion point z = z_o lies fully within—neither outside nor right on the edge of—the z = z_1 series' convergence domain.)

Since an analytic function f(z) is infinitely differentiable and enjoys a unique Taylor expansion f_o(z − z_o) = f(z) about each point z_o in its domain, it follows that if two Taylor series f_1(z − z_1) and f_2(z − z_2) find even a small neighborhood |z − z_o| < ǫ which lies in the domain of both, then the two can both be transposed to the common z = z_o expansion point.
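Before continuing the argument, a numerical sketch of what it is claiming (the function and points are my choices): two distinct Taylor series for the same function f(z) = 1/(1 − z), one about z_o = 0 with domain |z| < 1 and one about z_1 = −1, where f(z) = Σ_k (z + 1)^k/2^{k+1} with domain |z + 1| < 2, agree wherever their domains overlap.

```python
# Two expansions of the selfsame analytic function 1/(1 - z).
def series_about_0(z, terms=200):
    return sum(z ** k for k in range(terms))

def series_about_minus1(z, terms=200):
    return sum((z + 1) ** k / 2 ** (k + 1) for k in range(terms))

# Points in the overlap of the two convergence domains:
for z in (-0.5, -0.25, 0.25):
    assert abs(series_about_0(z) - series_about_minus1(z)) < 1e-9
    assert abs(series_about_0(z) - 1.0 / (1.0 - z)) < 1e-9
```

The second series moreover remains good for −3 < z < −1, outside the first series' domain, which is the continuation idea in miniature.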

If the two are found to have the same Taylor series there, then f_1 and f_2 both represent the same function: the two series evidently describe the selfsame underlying analytic function. Moreover, if a series f_3 is found whose domain overlaps that of f_2, then a series f_4 whose domain overlaps that of f_3, and so on, and if each pair in the chain matches at least in a small neighborhood in its region of overlap, then the whole chain of overlapping series necessarily represents the same underlying analytic function f(z). The series f_1 and the series f_n represent the same analytic function even if their domains do not directly overlap at all.

This is a manifestation of the principle of analytic continuation. The principle holds that if two analytic functions are the same within some domain neighborhood |z − z_o| < ǫ, then they are the same everywhere. Observe however that the principle fails at poles and other nonanalytic points, because the function is not differentiable there.

[Footnote: The writer hesitates to mention that he is given to understand [44] that the domain neighborhood can technically be reduced to a domain contour of nonzero length but zero width. Having never met a significant application of this extension of the principle, the writer has neither researched the extension's proof nor asserted its truth. He does not especially recommend that the reader worry over the point; the domain neighborhood |z − z_o| < ǫ suffices.]

The result of § 8.15, which shows general power series to be expressible as Taylor series except at their poles and other nonanalytic points, extends the analytic continuation principle to cover power series in general, including power series with terms of negative order.

Now, observe: though all convergent power series are indeed analytic, one need not actually expand every analytic function in a power series. Sums, products and ratios of analytic functions are no less differentiable than the functions themselves—as also, by the derivative chain rule, is an analytic function of analytic functions. For example, where g(z) and h(z) are analytic, there also is f(z) ≡ g(z)/h(z) analytic (except perhaps at isolated points where h[z] = 0). Besides, given Taylor series for g(z) and h(z) one can make a power series for f(z) by long division if desired, so that is all right.

The subject of analyticity is rightly a matter of deep concern to the professional mathematician. It is also a long wedge which drives pure and applied mathematics apart. When the professional mathematician speaks generally of a "function," he means any function at all. Section 8.15 speaks further on the point. One can construct

some pretty unreasonable functions if one wants to, such as

	f([2k + 1]2^m) ≡ (−)^m,  where k and m are integers;
	f(z) ≡ 0 otherwise.

However, neither functions like this f(z) nor more subtly unreasonable functions normally arise in the modeling of physical phenomena. The full theory which classifies and encompasses—or explicitly excludes—such functions is thus of limited interest to the applied mathematician, and this book does not cover it.

[Footnote: Many books do cover it in varying degrees, including [14][44][24] and numerous others. The foundations of the pure theory of a complex variable, though abstract, are beautiful, and though they do not comfortably fit a book like this, even an applied mathematician can profit substantially by studying them. However that may be, the pure theory is probably best appreciated after one already understands its chief conclusions; though not for the explicit purpose of serving the pure theory, the present chapter does develop just such an understanding. Maybe a future release of the book will trace the pure theory's main thread in an appendix.]

This does not mean that the scientist or engineer never encounters nonanalytic functions. On the contrary, he encounters several, but they are not subtle: |z|; arg z; z*; ℜ(z); ℑ(z); u(t); δ(t). Refer to §§ 2.12 and 7.7. Such functions are nonanalytic either because they lack proper derivatives in the Argand plane according to (4.19) or because one has defined them only over a real domain. When such functions do arise, one transforms, approximates, reduces, replaces and/or avoids them.

8.5 Branch points

The function g(z) = √z is an interesting, troublesome function. Its derivative is dg/dz = 1/(2√z), so even though the function is finite at z = 0, its derivative is not finite there. Evidently g(z) has a nonanalytic point at z = 0, yet the point is not a pole. What is it? We call it a branch point.

The defining characteristic of the branch point is that, given a function f(z) with such a point at z = z_o, if one encircles the point once alone (that is, without also encircling some other branch point) by a closed contour in the Argand domain plane, while simultaneously tracking f(z) in the Argand range plane—and if one demands that z and f(z) move smoothly, that neither suddenly skip from one spot to another—then one finds that f(z) ends in a different place than it began, even though z itself has returned precisely to its own starting point.

[Footnote: For readers whose native language is not English: "to encircle" means "to surround" or "to enclose." The verb does not require the boundary to have the shape of an actual, geometrical circle; any closed shape suffices. However, the circle is a typical shape, probably the most fitting shape to imagine when thinking of the concept abstractly.]

An analytic function like g(z) = √z having a branch point evidently is not single-valued. It is multiple-valued: for a single z more than one distinct g(z) is possible. An analytic function like h(z) = 1/z, by contrast, is single-valued even though it has a pole. This function does not suffer the syndrome described; poles do not cause their functions to be multiple-valued and thus are not branch points. Evidently f(z) ≡ (z − z_o)^a has a branch point at z = z_o if and only if a is not an integer.

If f(z) does have a branch point—if a is not an integer—then the mathematician must draw a distinction between z_1 = z_o + ρe^{iφ} and z_2 = z_o + ρe^{i(φ+2π)}, even though the two are exactly the same number. Indeed z_1 = z_2, but paradoxically f(z_1) ≠ f(z_2). This is confusing, until one realizes that the fact of a branch point says nothing whatsoever about the argument z. As far as z is concerned, there really is no distinction between z_1 and z_2—none at all. What draws the distinction is the multiple-valued function f(z) which uses the argument.

It is as though I had a mad colleague who called me Thaddeus Black, until one day I happened to walk past behind his desk (rather than in front as I usually did), whereupon for some reason he began calling me Gorbag Pfufnik. I had not changed at all; the change isn't really in me, is it? It's in my colleague, who seems to suffer a branch point. If it is important to me to be sure that my colleague really is addressing me when he cries, "Pfufnik!" then I had better keep a running count of how many times I have turned about his desk, even though the number of turns is personally of no import to me.

[Footnote: In complex analysis, a branch point may be thought of informally as a point z_o at which a "multiple-valued function" changes values when one winds once around z_o. [52, "Branch point," 18:10, 16 May 2006]]

The usual analysis strategy when one encounters a branch point is simply to avoid the point. Where an integral follows a closed contour as in § 8.8, the strategy is to compose the contour to exclude the branch point, to shut it out.
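The winding behavior just described can be checked numerically for g(z) = √z (a sketch; the step count is my choice): walk z once counterclockwise around the branch point at z = 0, at each step choosing whichever of the two square roots lies nearer the previous value, so that g(z) moves smoothly.

```python
import cmath

steps = 1000
z0 = 1.0 + 0.0j
f = cmath.sqrt(z0)                    # start at sqrt(1) = +1
for s in range(1, steps + 1):
    z = cmath.exp(2j * cmath.pi * s / steps)      # unit circle about z = 0
    r = cmath.sqrt(z)                             # one of the two roots
    f = r if abs(r - f) <= abs(-r - f) else -r    # keep g(z) continuous
assert abs(z - z0) < 1e-9      # z has returned precisely to its start...
assert abs(f + 1.0) < 1e-9     # ...but g(z) ends at -1, not at +1
```

Running the same loop about a center well away from z = 0 (no branch point enclosed) brings f back to its starting value, which is exactly the distinction the text draws between poles and branch points.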

Such a strategy of avoidance usually prospers.

[Footnote: Traditionally associated with branch points in complex variable theory are the notions of branch cuts and Riemann sheets. These ideas are interesting, but are not central to the analysis as developed in this book and are not covered here. The interested reader might consult a book on complex variables or advanced calculus like [24].]

8.6 Entire and meromorphic functions

Though an applied mathematician is unwise to let abstract definitions enthrall his thinking, pure mathematics nevertheless brings some technical definitions the applied mathematician can use. Two such are the definitions of entire and meromorphic functions.

A function f(z) which is analytic for all finite z is an entire function. Examples include f(z) = z² and f(z) = exp z, but not f(z) = 1/z, which has a pole at z = 0. [51]

A function f(z) which is analytic for all finite z except at isolated poles (which can be n-fold poles if n is a finite integer), of which no circle of finite radius in the Argand domain plane encompasses an infinity of poles, is a meromorphic function. [29] Examples include f(z) = 1/z, f(z) = 1/(z + 2) + 1/(z − 1)³ + 2z² and f(z) = tan z—the last of which has an infinite number of poles, but of which the poles nowhere cluster in infinite numbers, among many others. The function f(z) = tan(1/z) is not meromorphic, since it has an infinite number of poles within the Argand unit circle. Even the function f(z) = exp(1/z) is not meromorphic: it has only the one, isolated nonanalytic point at z = 0, and that point is no branch point; but the point is an essential singularity, having the character of an infinitifold (∞-fold) pole.

[Footnote: If it seems unclear that the singularities of tan z are actual poles, then consider that

	tan z = sin z/cos z = −cos w/sin w,

wherein we have changed the variable

	w ← z − (2n + 1)(2π/4).

With the series for cos w and sin w (Section 8.9 and its Table 8.1, below, give Taylor series for cos z and sin z),

	tan z = (−1 + w²/2 − w⁴/0x18 + ···)/(w − w³/6 + w⁵/0x78 − ···);

by long division,

	tan z = −1/w + (w/3 − w³/0x1E + ···)/(1 − w²/6 + w⁴/0x78 − ···).

(On the other hand, if it is unclear that z = [2n + 1][2π/4] are the only singularities tan z has—that it has no singularities of which ℑ[z] ≠ 0—then consider that the singularities of tan z occur where cos z = 0, which by Euler's formula occurs where exp[+iz] = exp[−iz]. This in turn is possible only if |exp[+iz]| = |exp[−iz]|, which happens only for real z.) Sections 8.15 and 9.6 speak further of the matter.]

8.7 Extrema over a complex domain

If a function f(z) is expanded by (8.19) or by other means about an analytic expansion point z = z_o such that

	f(z) = f(z_o) + Σ_{k=1}^∞ (a_k)(z − z_o)^k,

and if

	a_k = 0 for 0 < k < K, but a_K ≠ 0,

such that a_K, 0 < K < ∞, is the series' first nonzero coefficient, then, in the immediate neighborhood of the expansion point,

	f(z) ≈ f(z_o) + (a_K)(z − z_o)^K,  |z − z_o| ≪ 1.   (8.20)

Changing ρ′e^{iφ′} ← z − z_o, this is

	f(z) ≈ f(z_o) + a_K ρ′^K e^{iKφ′},  0 ≤ ρ′ ≪ 1.

Evidently one can shift the output of an analytic function f(z) slightly in any desired Argand direction by shifting slightly the function's input z. Specifically according to (8.20), to shift f(z) by ∆f ≈ ǫe^{iψ}, one can shift z by ∆z ≈ (ǫ/a_K)^{1/K} e^{i(ψ+n2π)/K}.

Inasmuch as z is complex, this always works—even where [df/dz]_{z=z_o} = 0.

[Footnote: Except at a nonanalytic point of f(z) or in the trivial case that f(z) were everywhere constant.]

That one can shift an analytic function's output smoothly in any Argand direction whatsoever has the significant consequence that neither the real nor the imaginary part of the function—nor for that matter any linear combination ℜ[e^{−iω}f(z)] of the real and imaginary parts—can have an extremum within the interior of a domain over which the function is fully analytic. That is, a function's extrema over a bounded analytic domain never lie within the domain's interior but always on its boundary.

[Footnote: Professional mathematicians tend to define the domain and its boundary more carefully. [9][29]]

8.8 Cauchy's integral formula

In § 7.6 we considered the problem of vector contour integration, in which the sum value of an integration depends not only on the integration's endpoints but also on the path, or contour, over which the integration is done. Because real scalars are confined to a single line, no alternate choice of path is possible where the variable of integration is a real scalar, so the contour problem does not arise in that case. It does however arise where the variable of integration is a complex scalar, because there again different paths are possible. Refer to the Argand plane (chapter 2).

Consider the integral

	S_n = ∫_{z_1}^{z_2} z^{n−1} dz.   (8.21)

If z were always a real number, then by the antiderivative (§ 7.2) this integral would evaluate to (z_2^n − z_1^n)/n—or, in the case of n = 0, to ln(z_2/z_1). Inasmuch as z is complex, however, the correct evaluation is less obvious. To evaluate the integral sensibly in the latter case, one must consider some specific path of integration in the Argand plane. One must also consider the meaning of the symbol dz.
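The boundary-extremum consequence stated above can be probed numerically before we take up dz. A sketch (the function, domain and grid are my choices): sample ℜ[exp z] = e^x cos y over the unit disk and see where the maximum falls.

```python
import cmath
import itertools

best, best_r = None, None
n = 101
for i, j in itertools.product(range(n), repeat=2):
    x = -1 + 2 * i / (n - 1)
    y = -1 + 2 * j / (n - 1)
    if x * x + y * y <= 1.0:                  # keep only points in the disk
        v = cmath.exp(complex(x, y)).real     # Re[exp z] = e^x cos y
        if best is None or v > best:
            best = v
            best_r = (x * x + y * y) ** 0.5
# The maximizing sample sits on the rim |z| = 1, not in the interior.
assert best_r > 0.99
```

However fine the grid, the maximum lands on the boundary, as the section's shifting argument predicts it must.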

8.8.1 The meaning of the symbol dz

The symbol dz represents an infinitesimal step in some direction in the Argand plane:

	dz = [z + dz] − [z]
	   = (ρ + dρ)e^{i(φ+dφ)} − ρe^{iφ}
	   = (ρ + dρ)e^{i dφ}e^{iφ} − ρe^{iφ}
	   = (ρ + dρ)(1 + i dφ)e^{iφ} − ρe^{iφ}.

Since the product of two infinitesimals is negligible even on infinitesimal scale, we can drop the dρ dφ term. After canceling finite terms, we are left with the peculiar but excellent formula

	dz = (dρ + iρ dφ)e^{iφ}.   (8.22)

[Footnote: The dropping of second-order infinitesimals like dρ dφ, added to first-order infinitesimals like dρ, is a standard calculus technique. One cannot always drop them, however. Occasionally one encounters a sum in which not only do the finite terms cancel, but also the first-order infinitesimals. In such a case, the second-order infinitesimals dominate and cannot be dropped. An example of the type is

	lim_{ǫ→0} [(1 − ǫ)³ + 3(1 + ǫ) − 4]/ǫ² = lim_{ǫ→0} [(1 − 3ǫ + 3ǫ²) + (3 + 3ǫ) − 4]/ǫ² = 3.

One typically notices that such a case has arisen when the dropping of second-order infinitesimals has left an ambiguous 0/0. To fix the problem, you simply go back to the step where you dropped the infinitesimal and you restore it, then you proceed from there. Otherwise there isn't much point in carrying second-order infinitesimals around. In the relatively uncommon event that you need them, you'll know it; the math itself will tell you.]

8.8.2 Integrating along the contour

Now consider the integration (8.21) along the contour of Fig. 8.1, in two segments: constant-ρ (z_a to z_b), and constant-φ (z_b to z_c).

[Figure 8.1: A contour of integration in the Argand plane, in two segments: a constant-ρ arc of radius ρ from z_a to z_b, and a constant-φ straight segment from z_b to z_c at angle φ. Axes: x = ℜ(z), y = ℑ(z).]

Integrating along the constant-φ segment, where dφ = 0,

	∫_{z_b}^{z_c} z^{n−1} dz = ∫_{ρ_b}^{ρ_c} (ρe^{iφ})^{n−1} (dρ + iρ dφ)e^{iφ}
	  = ∫_{ρ_b}^{ρ_c} (ρe^{iφ})^{n−1} (dρ)e^{iφ}
	  = e^{inφ} ∫_{ρ_b}^{ρ_c} ρ^{n−1} dρ
	  = (e^{inφ}/n)(ρ_c^n − ρ_b^n)
	  = (z_c^n − z_b^n)/n.

Integrating along the constant-ρ arc, where dρ = 0,

	∫_{z_a}^{z_b} z^{n−1} dz = ∫_{φ_a}^{φ_b} (ρe^{iφ})^{n−1} (dρ + iρ dφ)e^{iφ}
	  = ∫_{φ_a}^{φ_b} (ρe^{iφ})^{n−1} (iρ dφ)e^{iφ}
	  = iρ^n ∫_{φ_a}^{φ_b} e^{inφ} dφ
	  = (iρ^n/in)(e^{inφ_b} − e^{inφ_a})
	  = (z_b^n − z_a^n)/n.

Adding the two, we have that

	∫_{z_a}^{z_c} z^{n−1} dz = (z_c^n − z_a^n)/n,

surprisingly the same as for real z. Since any path of integration between any two complex numbers z_1 and z_2 is approximated arbitrarily closely by a succession of short constant-ρ and constant-φ segments, it follows generally that

	∫_{z_1}^{z_2} z^{n−1} dz = (z_2^n − z_1^n)/n,  n ≠ 0.   (8.23)

The applied mathematician might reasonably ask, "Was (8.23) really worth the trouble? We knew that already. It's the same as for real numbers." Well, we really didn't know it before deriving it, but the point is well taken nevertheless. However, notice the exemption of n = 0: equation (8.23) does not hold in that case. Consider the n = 0 integral

	S_0 = ∫_{z_1}^{z_2} dz/z.

Following the same steps as before and using (5.7) and (2.39), we find that

	∫_{ρ_1}^{ρ_2} dz/z = ∫_{ρ_1}^{ρ_2} (dρ + iρ dφ)e^{iφ}/(ρe^{iφ}) = ∫_{ρ_1}^{ρ_2} dρ/ρ = ln(ρ_2/ρ_1).   (8.24)

This is always real-valued, but otherwise it brings no surprise. However,

	∫_{φ_1}^{φ_2} dz/z = ∫_{φ_1}^{φ_2} (dρ + iρ dφ)e^{iφ}/(ρe^{iφ}) = i ∫_{φ_1}^{φ_2} dφ = i(φ_2 − φ_1).   (8.25)
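The path-independence asserted by (8.23) for n ≠ 0 invites a numerical check. A sketch (the endpoints, waypoint and step count are my choices): integrate z² (that is, n = 3) along two different piecewise-linear paths from z_1 to z_2 by a fine midpoint rule.

```python
# Fine midpoint-rule integration of f along a piecewise-linear path.
def path_integral(f, pts):
    total = 0.0 + 0.0j
    for a, b in zip(pts, pts[1:]):
        total += f((a + b) / 2) * (b - a)   # f at the segment midpoint
    return total

z1, z2 = 1.0 + 0.0j, 0.5 + 1.0j
w = 2.0 + 0.5j                              # waypoint for the second path
N = 20000
straight = [z1 + (z2 - z1) * t / N for t in range(N + 1)]
bent = [z1 + (w - z1) * t / N for t in range(N + 1)] \
     + [w + (z2 - w) * t / N for t in range(1, N + 1)]

n = 3
exact = (z2 ** n - z1 ** n) / n             # per (8.23)
assert abs(path_integral(lambda z: z ** (n - 1), straight) - exact) < 1e-6
assert abs(path_integral(lambda z: z ** (n - 1), bent) - exact) < 1e-6
```

Both paths return the same value, as (8.23) promises; the n = 0 exemption is exactly where the next paragraphs pick up.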

The odd thing about this is in what happens when the contour closes a complete loop in the Argand plane about the z = 0 pole. In this case, φ_2 = φ_1 + 2π, thus

	S_0 = i2π,

even though the integration ends where it begins.

Generalizing, we have that

	∮ dz/(z − z_o) = i2π,   (8.26)

where as in § 7.6 the symbol ∮ represents integration about a closed contour that ends where it begins, and where it is implied that the contour loops positively (counterclockwise, in the direction of increasing φ) exactly once about the z = z_o pole.

Notice that the formula's i2π does not depend on the precise path of integration, but only on the fact that the path loops once positively about the pole. Notice also that nothing in the derivation of (8.23) actually requires that n be an integer, so one can write

	∫_{z_1}^{z_2} z^{a−1} dz = (z_2^a − z_1^a)/a,  a ≠ 0.   (8.27)

For a closed contour which encloses no pole or other nonanalytic point, (8.27) has that

	∮ z^{a−1} dz = 0,  a ≠ 0,

or with the change of variable z − z_o ← z,

	∮ (z − z_o)^{a−1} dz = 0.

[Footnote: Because z^{a−1} is multiple-valued for nonintegral a (§ 8.5), its integral comes to zero for nonintegral a only if the contour does not enclose the branch point at z = z_o.]

But because any analytic function can be expanded in the form f(z) = Σ_k (c_k)(z − z_o)^{a_k−1} (which is just a Taylor series if the a_k happen to be positive integers), this means that

	∮ f(z) dz = 0   (8.28)

if f(z) is everywhere analytic within the contour.
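Equation (8.26) and its companions are easy to check numerically on a circular contour about z_o (a sketch; the center, radius and step count are my choices): the loop integral of 1/(z − z_o) comes to i2π, while integer powers (z − z_o)^{n−1} with n ≠ 0 integrate to zero.

```python
import cmath

# Midpoint-rule loop integral around a circle of given radius about zo.
def loop_integral(f, zo, radius=1.0, steps=20000):
    total = 0.0 + 0.0j
    for s in range(steps):
        a = zo + radius * cmath.exp(2j * cmath.pi * s / steps)
        b = zo + radius * cmath.exp(2j * cmath.pi * (s + 1) / steps)
        total += f((a + b) / 2) * (b - a)
    return total

zo = 0.3 + 0.2j
assert abs(loop_integral(lambda z: 1.0 / (z - zo), zo) - 2j * cmath.pi) < 1e-6
assert abs(loop_integral(lambda z: (z - zo) ** 2, zo)) < 1e-9       # n = 3
assert abs(loop_integral(lambda z: (z - zo) ** (-3), zo)) < 1e-9    # n = -2
```

Only the n = 0 power survives the loop, which is why a lone 1/(z − z_o) pole controls the closed-contour integral in what follows.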

8.8.3 The formula

The combination of (8.26) and (8.28) is powerful. Consider the closed contour integral

	∮ f(z)/(z − z_o) dz,

where the contour encloses no nonanalytic point of f(z) itself but does enclose the pole of f(z)/(z − z_o) at z = z_o. If the contour were a tiny circle of infinitesimal radius about the pole, then the integrand would reduce to f(z_o)/(z − z_o), and then per (8.26),

	∮ f(z)/(z − z_o) dz = i2πf(z_o).   (8.29)

But if the contour were not an infinitesimal circle but rather the larger contour of Fig. 8.2? In this case, if the dashed detour which excludes the pole is taken, then according to (8.28) the resulting integral totals zero; but the two straight integral segments evidently cancel, and, similarly as we have just reasoned, the reverse-directed integral about the tiny detour circle is −i2πf(z_o); so to bring the total integral to zero the integral about the main contour must be i2πf(z_o). Thus, (8.29) holds for any positively-directed contour which once encloses a pole and no other nonanalytic point, whether the contour be small or large. Equation (8.29) is Cauchy's integral formula.

[Footnote: The careful reader will observe that (8.28)'s derivation does not explicitly handle an f(z) represented by a Taylor series with an infinite number of terms and a finite convergence domain (for example, f[z] = ln[1 − z]). However, by § 8.2 one can transpose such a series from z_o to an overlapping convergence domain about z_1. Let the contour's interior be divided into several cells, each of which is small enough to enjoy a single convergence domain; integrate about each cell. Because the cells share boundaries within the contour's interior, each interior boundary is integrated twice, once in each direction, canceling. The original contour—each piece of which is an exterior boundary of some cell—is integrated once piecewise. Thus the integral about the original contour is the sum of the integrals about the cells, each of which per (8.28) totals zero. This is the basis on which a more rigorous proof is constructed. [32, § 1.1]]

If the contour encloses multiple poles (§§ 2.11 and 9.6.2), then by the principle of linear superposition (§ 7.3.3),

	∮ [f_o(z) + Σ_k f_k(z)/(z − z_k)] dz = i2π Σ_k f_k(z_k),   (8.30)

where the f_o(z) is a regular part, and where neither f_o(z) nor any of the several f_k(z) has a pole or other nonanalytic point within (or on) the contour.
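Cauchy's integral formula (8.29) itself submits to a direct numerical check (a sketch; f = exp and the contour are my choices): integrating f(z)/(z − z_o) once counterclockwise about z_o and dividing out the i2π should recover f(z_o).

```python
import cmath

# Numerical Cauchy integral: (1/(i*2*pi)) * loop of f(z)/(z - zo) dz.
def cauchy_value(f, zo, radius=1.0, steps=20000):
    total = 0.0 + 0.0j
    for s in range(steps):
        a = zo + radius * cmath.exp(2j * cmath.pi * s / steps)
        b = zo + radius * cmath.exp(2j * cmath.pi * (s + 1) / steps)
        m = (a + b) / 2                     # midpoint of the short chord
        total += f(m) / (m - zo) * (b - a)
    return total / (2j * cmath.pi)

zo = 0.2 - 0.1j
assert abs(cauchy_value(cmath.exp, zo) - cmath.exp(zo)) < 1e-6
```

Changing the radius (so long as the contour still encloses z_o and no other nonanalytic point) leaves the answer unchanged, illustrating the small-or-large contour claim above.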

§ 10. (z − zo )m+1 where m + 1 = n. Expanding f (z) in a Taylor series (8. The values fk (zk ). triple or other n-fold pole. CAUCHY’S INTEGRAL FORMULA Figure 8.5. “Cauchy’s integral formula. 20 April 2006] . (k!)(z − zo )m−k+1 [24.2: A Cauchy contour integral. (Note however that eqn. the integration can be written S= f (z) dz.” 14:13.18 8.8. ℑ(z) zo 173 ℜ(z) contour. are called residues. (8.8.30 does not handle branch points. In words.8. which represent the strengths of the poles.30) says that an integral about a closed contour in the Argand plane comes to i2π times the sum of the residues of the poles (if any) thus enclosed. the contour must exclude it or the formula will not work.30) Cauchy’s integral formula is an extremely useful result.29) or of (8.4 Enclosing a multiple pole When a complex contour of integration encloses a double.6][44][52. S= 18 ∞ k=0 dk f dz k z=zo dz . If there is a branch point.) As we shall see in § 9. whether in the form of (8.19) about z = zo . 8.

But according to (8.26), only the k = m term contributes, so

    S = [d^m f/dz^m]|z=zo ∮ dz/[(m!)(z − zo)]
      = (1/m!) [d^m f/dz^m]|z=zo ∮ dz/(z − zo)
      = (i2π/m!) [d^m f/dz^m]|z=zo,

where the integral is evaluated in the last step according to (8.29). Altogether,

    ∮ f(z) dz/(z − zo)^(m+1) = (i2π/m!) [d^m f/dz^m]|z=zo,   m ≥ 0.   (8.31)

Equation (8.31) evaluates a contour integral about an n-fold pole as (8.29) does about a single pole. (When m = 0, the two equations are the same.)19

8.9 Taylor series for specific functions

With the general Taylor series formula (8.19), the derivatives of Tables 5.2 and 5.3, and the observation from (4.21) that d(z^a)/dz = az^(a−1), one can calculate Taylor series for many functions. For instance, expanding about z = 1,

    ln z|z=1 = 0,
    (d/dz) ln z|z=1 = [1/z]|z=1 = 1,
    (d²/dz²) ln z|z=1 = [−1/z²]|z=1 = −1,
    (d³/dz³) ln z|z=1 = [2/z³]|z=1 = 2,
    (d^k/dz^k) ln z|z=1 = [−(−)^k (k − 1)!/z^k]|z=1 = −(−)^k (k − 1)!,   k > 0.

19 [29][44]

Thus, the Taylor series about z = 1 is

    ln z = Σ(k=1 to ∞) [−(−)^k (k − 1)!/k!] (z − 1)^k = −Σ(k=1 to ∞) (1 − z)^k/k,

evidently convergent for |1 − z| < 1. Using such Taylor series, one can relatively efficiently calculate actual numerical values for ln z and many other functions.

Table 8.1 lists Taylor series for a few functions of interest. All the series converge for |z| < 1. The exp z, sin z and cos z series converge for all complex z. Among the several series, the series for arctan z is computed indirectly20 by way of Table 5.3 and (2.33):

    arctan z = ∫(0 to z) dw/(1 + w²) = ∫(0 to z) Σ(k=0 to ∞) (−)^k w^(2k) dw = Σ(k=0 to ∞) (−)^k z^(2k+1)/(2k + 1).

(And if z lies outside the convergence domain? Several strategies are then possible. One can expand the Taylor series about a different point, but cleverer and easier is to take advantage of some convenient relationship like ln w = −ln[1/w]. Section 8.10.4 elaborates.)

8.10 Error bounds

One naturally cannot actually sum a Taylor series to an infinite number of terms. One must add some finite number of terms, then quit—which raises the question: how many terms are enough? How can one know that one has added adequately many terms; that the remaining terms, which constitute the tail of the series, are sufficiently insignificant? How can one set error bounds on the truncated sum?

8.10.1 Examples

Some series alternate sign. For these it is easy if the numbers involved happen to be real. For example, from Table 8.1,

    ln(3/2) = ln(1 + 1/2) = 1/[(1)(2¹)] − 1/[(2)(2²)] + 1/[(3)(2³)] − 1/[(4)(2⁴)] + ···

20 [42, § 11-7]

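The transposed series for ln z can be checked numerically. The sketch below (Python, not part of the original text; the 60-term truncation is an arbitrary illustrative choice) sums −Σ(1 − z)^k/k at z = 3/2 and compares the result against the library logarithm:

```python
import math

def ln_series(z, n=60):
    # ln z = -sum_{k=1}^{n} (1 - z)^k / k, convergent for |1 - z| < 1
    return -sum((1 - z) ** k / k for k in range(1, n + 1))

print(ln_series(1.5))   # ~0.405465, the value of ln(3/2)
print(math.log(1.5))
```

At z = 3/2 the ratio |1 − z| = 1/2, so each added term roughly halves the remaining error, exactly the behavior the error-bound discussion below quantifies.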
Table 8.1: Taylor series.

    f(z) = Σ(k=0 to ∞) [d^k f/dz^k]|z=zo Π(j=1 to k) (z − zo)/j

    (1 + z)^(a−1) = Σ(k=0 to ∞) Π(j=1 to k) (a/j − 1) z

    exp z = Σ(k=0 to ∞) Π(j=1 to k) z/j = Σ(k=0 to ∞) z^k/k!

    sin z = Σ(k=0 to ∞) z Π(j=1 to k) −z²/[(2j)(2j + 1)]

    cos z = Σ(k=0 to ∞) Π(j=1 to k) −z²/[(2j − 1)(2j)]

    sinh z = Σ(k=0 to ∞) z Π(j=1 to k) z²/[(2j)(2j + 1)]

    cosh z = Σ(k=0 to ∞) Π(j=1 to k) z²/[(2j − 1)(2j)]

    −ln(1 − z) = Σ(k=1 to ∞) (1/k) Π(j=1 to k) z = Σ(k=1 to ∞) z^k/k

    arctan z = Σ(k=0 to ∞) [z/(2k + 1)] Π(j=1 to k) (−z²) = Σ(k=0 to ∞) (−)^k z^(2k+1)/(2k + 1)

Each term of the series is smaller in magnitude than the last, and the terms alternate in sign, so the true value of ln(3/2) necessarily lies between the sum of the series to n terms and the sum to n + 1 terms. After three terms, for instance,

    S3 − 1/[(4)(2⁴)] < ln(3/2) < S3,
    S3 = 1/[(1)(2¹)] − 1/[(2)(2²)] + 1/[(3)(2³)].

Other series however do not alternate sign. For example,

    ln 2 = −ln(1/2) = −ln(1 − 1/2) = S4 + R4,
    S4 = 1/[(1)(2¹)] + 1/[(2)(2²)] + 1/[(3)(2³)] + 1/[(4)(2⁴)],
    R4 = 1/[(5)(2⁵)] + 1/[(6)(2⁶)] + ···

The basic technique in such a case is to find a replacement series (or integral) R′n which one can collapse analytically, each of whose terms equals or exceeds in magnitude the corresponding term of Rn. For the example, one might choose

    R′4 = (1/5) Σ(k=5 to ∞) 1/2^k = 2/[(5)(2⁵)],

wherein (2.34) had been used to collapse the summation. Then,

    S4 < ln 2 < S4 + R′4.

For real 0 ≤ x < 1 generally,

    Sn ≡ Σ(k=1 to n) x^k/k,
    R′n ≡ Σ(k=n+1 to ∞) x^k/(n + 1) = x^(n+1)/[(n + 1)(1 − x)],
    Sn < −ln(1 − x) < Sn + R′n.

Many variations and refinements are possible, some of which we shall meet in the rest of the section, but that is the basic technique: to add several terms of the series to establish a lower bound, then to overestimate the remainder of the series to establish an upper bound.

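The fencing of ln 2 by S4 and S4 + R′4 can be confirmed directly. A minimal Python sketch (Python and the library value math.log(2) are used only to check the bound; neither is part of the text's own development):

```python
import math

x = 0.5   # -ln(1 - x) = ln 2, a series whose terms do not alternate
n = 4
S_n = sum(x ** k / k for k in range(1, n + 1))
R_n = x ** (n + 1) / ((n + 1) * (1 - x))   # collapsed replacement series R'_n

print(S_n < math.log(2) < S_n + R_n)   # True: ln 2 is fenced between
print(S_n, math.log(2), S_n + R_n)
```

Raising n tightens the fence geometrically, since each tail term is at most half the one before it.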
The overestimate R′n in the example majorizes the series' true remainder Rn. Notice that the R′n is a fairly small number, and that it would have been a lot smaller yet had we included a few more terms in Sn (for instance, n = 0x40 would have bound ln 2 tighter than the limit of a computer's typical double-type floating-point accuracy). The technique usually works well in practice for this reason. In practical calculation, one would let a computer add many terms of the series first numerically, and only then majorize the remainder.

8.10.2 Majorization

To majorize in mathematics is to be, or to replace by virtue of being, everywhere at least as great as, a known quantity. Majorization serves surely to bound an unknown quantity by a larger, known quantity. Reflecting, minorization21 serves surely to bound an unknown quantity by a smaller, known quantity. This is best explained by example. Consider the summation

    S = Σ(k=1 to ∞) 1/k² = 1 + 1/2² + 1/3² + 1/4² + ···

The exact value this summation totals to is unknown to us, but the summation does rather resemble the integral (refer to Table 7.1)

    I = ∫(1 to ∞) dx/x² = −1/x |(1 to ∞) = 1.

Figure 8.3 plots S and I together as areas—or, more precisely, plots S − 1 and I together as areas (the summation's first term is omitted). As the plot shows, the unknown area S − 1 cannot possibly be as great as the known area I. In symbols,

    S − 1 < I = 1; or, S < 2.

The integral I majorizes the summation S − 1, thus guaranteeing the absolute upper limit on S. (Of course S < 2 is a very loose limit, but that isn't the point of the example. Cleverer ways to majorize the remainder of this particular series will occur to the reader, such as representing the terms graphically—not as flat-topped rectangles—but as slant-topped trapezoids, shifted in the figure a half unit rightward.)

21 The author does not remember ever encountering the word minorization heretofore in print, but as a reflection of majorization the word seems logical. This book at least will use the word where needed. You can use it too if you like.

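The integral majorization of S can be imitated in a few lines. The following Python sketch (not from the text; N = 1000 is an arbitrary choice, and math.pi**2/6 serves only as an external check of the known value of S) majorizes the tail beyond the Nth term by ∫(N to ∞) dx/x² = 1/N:

```python
import math

# majorize the tail of S = sum 1/k**2 by the integral of 1/x**2:
N = 1000
S_N = sum(1.0 / k ** 2 for k in range(1, N + 1))
tail_bound = 1.0 / N   # integral_N^infinity dx/x**2 = 1/N >= the tail

# hence S_N < S < S_N + 1/N; in particular S < 2
print(S_N, S_N + tail_bound)
```

This is exactly the practical pattern described above: let the computer add many terms numerically, then majorize only the remainder.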
Figure 8.3: Majorization. [The figure plots the stairstep areas 1/2², 1/3², 1/4², . . . against the dashed curve y = 1/x².] The area I between the dashed curve and the x axis majorizes the area S − 1 between the stairstep curve and the x axis, because the height of the dashed curve is everywhere at least as great as that of the stairstep curve.

The quantities in question are often integrals and/or series summations, the two of which are akin as Fig. 8.3 illustrates. The choice of whether to majorize a particular unknown quantity by an integral or by a series summation depends on the convenience of the problem at hand. The series S of this subsection is interesting, incidentally, in that it is a harmonic series rather than a power series: although its terms do decrease in magnitude, it has no z^k factor (or, seen from another point of view, it does have a z^k factor, but z = 1), and the ratio of adjacent terms' magnitudes approaches unity as k grows.

8.10.3 Geometric majorization

Harmonic series can be hard to sum accurately, but clever majorization can help (and if majorization does not help enough, the series transformations of Ch. [not yet written] can help even more). However, more common than harmonic series are true power series, which are easier to sum in that they include a z^k factor. There is no one, ideal bound which works equally well for all power series; but the point of establishing a bound is not to sum a power series exactly but rather to fence the sum within some sufficiently (rather than optimally) small neighborhood.

A simple, general bound which works quite adequately for most power series encountered in practice, including among many others all the Taylor series of Table 8.1, is the geometric majorization

    |ǫn| < |τn+1| / (1 − |ρ|).   (8.32)

Here, τk represents the power series' kth-order term (in Table 8.1's series for exp z, for example, τk = z^k/[k!]). The |ρ| is a positive real number chosen, preferably as small as possible, such that

    |τk+1/τk| ≤ |ρ| for all k > n,
    |τk+1/τk| < |ρ| for at least one k > n,   (8.33)
    |ρ| < 1;

which is to say, more or less, such that each term in the series' tail is smaller than the last by at least a factor of |ρ|. Given these definitions, if

    Sn ≡ Σ(k=0 to n) τk,   (8.34)
    ǫn ≡ Sn − S∞,   (8.35)

where S∞ represents the true, exact (but uncalculatable, unknown) infinite series sum, then (2.34) and (3.20) imply the geometric majorization (8.32).

If the last paragraph seems abstract, a pair of concrete examples should serve to clarify. First, if the Taylor series

    −ln(1 − z) = Σ(k=1 to ∞) z^k/k

of Table 8.1 is truncated after the nth-order term, then

    −ln(1 − z) ≈ Σ(k=1 to n) z^k/k,
    |ǫn| < [|z|^(n+1)/(n + 1)] / (1 − |z|),

where ǫn is the error in the truncated sum. Here, |τk+1/τk| = [k/(k + 1)] |z| < |z| for all k > n, so we have chosen |ρ| = |z|.

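The geometric majorization for −ln(1 − z) is easy to exercise. A Python sketch (not part of the text; the parameters z = 0.5, n = 8 are arbitrary, and math.log stands in for the unknown true sum S∞ purely as a check):

```python
import math

def ln_bound(z, n):
    """Truncate -ln(1 - z) after the nth-order term; return the
    partial sum and the geometric majorization with |rho| = |z|."""
    S_n = sum(z ** k / k for k in range(1, n + 1))
    bound = (abs(z) ** (n + 1) / (n + 1)) / (1 - abs(z))
    return S_n, bound

S_n, bound = ln_bound(0.5, 8)
print(abs(S_n - (-math.log(1 - 0.5))) < bound)   # True: the bound holds
```

At z = 0.5 and n = 8 the actual error and its bound differ by only about ten percent, showing how little the majorization gives away when |ρ| is chosen tightly.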
Second, if the Taylor series

    exp z = Σ(k=0 to ∞) Π(j=1 to k) z/j = Σ(k=0 to ∞) z^k/k!

also of Table 8.1 is truncated after the nth-order term, and if n + 2 > |z|, then

    exp z ≈ Σ(k=0 to n) z^k/k!,
    |ǫn| < [|z|^(n+1)/(n + 1)!] / [1 − |z|/(n + 2)].

Here, |τk+1/τk| = |z|/(k + 1), whose maximum value for all k > n occurs when k = n + 1, so we have chosen |ρ| = |z|/(n + 2).

8.10.4 Calculation outside the fast convergence domain

Used directly, the Taylor series of Table 8.1 tend to converge slowly for some values of z and not at all for others. The series for −ln(1 − z) and (1 + z)^(a−1) for instance each converge for |z| < 1 (though slowly for |z| ≈ 1), whereas each series diverges when asked to compute a quantity like −ln 3 or 3^(a−1) directly. To shift the series' expansion points per § 8.2 is one way to seek convergence, but for nonentire functions (§ 8.6) like these a more probably profitable strategy is to find and exploit some property of the functions to transform their arguments, such as

    −ln γ = ln(1/γ),
    γ^(a−1) = 1/(1/γ)^(a−1),

which leave the respective Taylor series to compute quantities like −ln(1/3) and (1/3)^(a−1) they can handle. Let f(1 + ζ) be a function whose Taylor series about ζ = 0 converges for

·] are functions we know how to compute like g[·] = −[·] or g[·] = 1/[·]. γ we have that f (γ) = g f 1+ 1−γ γ . γ 1 . . f (γ) = g f (8. consider that the function f (1 + ζ) = − ln(1 − z) has ζ = −z.182 CHAPTER 8. ·] = [·][·]. and like h[·. f (γ) = g[f (1/γ)] = 1/f (1/γ) and f (αγ) = h[f (α).38) (8. ·] property of (8. f (γ)] = f (α)f (γ).37) whose convergence domain |ζ| < 1 is |1 − γ| / |γ| < 1.36) where g[·] and h[·. f (γ)] .39) This paragraph’s notation is necessarily abstract. ·] = [·] + [·] or h[·. and that the function f (1 + ζ) = (1 + z)a−1 has ζ = z. one can take advantage of the h[·. (8. which is |γ − 1| < |γ| or in other words 1 ℜ(γ) > . (8. γ= 1+ζ 1−γ = ζ. f (γ) = g[f (1/γ)] = −f (1/γ) and f (αγ) = h[f (α). 1 ≤ ℜ(γ) < 2. 22 |ℑ(γ)| ≤ ℜ(γ). 2 Although the transformation from ζ to γ has not lifted the convergence limit altogether. One nonoptimal but entirely eﬀective tactic is represented by the equations ω ≡ in 2m γ. computing f (ω) indirectly for any ω = 0 by any of several tactics. Identifying 1 = 1 + ζ.36) to sidestep the limit. To make it seem more concrete. f (γ)] = f (α) + f (γ). Though this writer knows no way to lift the convergence limit altogether that does not cause more problems than it solves. we see that it has apparently opened the limit to a broader domain. γ f (αγ) = h [f (α). THE TAYLOR SERIES |ζ| < 1 and which obeys properties of the forms22 1 .

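The argument-transformation tactic can be sketched concretely for the logarithm over the positive reals. In the Python sketch below (not from the text; the normalization window [0.75, 1.5) and the 60-term truncation are the author's illustrative choices), the property ln(2^m γ) = m ln 2 + ln γ shrinks the argument before the Taylor series about 1 is applied, and ln 2 itself is computed as −ln(1/2) by the same series:

```python
import math

def ln_series(g, n=60):
    # Taylor series about 1: ln g = -sum (1 - g)^k / k, |1 - g| < 1
    return -sum((1 - g) ** k / k for k in range(1, n + 1))

LN2 = -ln_series(0.5)   # ln 2 = -ln(1/2), which the series can handle

def ln(w):
    """ln w for real w > 0 via ln(2**m * g) = m*ln 2 + ln g,
    with g scaled into [0.75, 1.5) so the series converges fast."""
    m = 0
    while w >= 1.5:
        w /= 2; m += 1
    while w < 0.75:
        w *= 2; m -= 1
    return m * LN2 + ln_series(w)

print(abs(ln(1000.0) - math.log(1000.0)))   # ~0
```

Because |1 − γ| never exceeds 1/2 after scaling, the series converges geometrically fast for every positive argument, exactly the effect the transformation is meant to achieve.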
24 To draw another example from Table 8. the tactic (8. For the logarithm. For the power. but shall in § 8. For example.39) fences γ within a comfortable zone. but such functions may have other properties one can analogously exploit. most nonentire functions lack properties of the speciﬁc kinds that (8. Axes are rotated per (3. Besides allowing all ω = 0.40) calculates f (ω) fast for any ω = 0.24 8. The method and tactic of (8.5 Nonconvergent series Variants of this section’s techniques can be used to prove that a series does not converge at all.40) leaves open the question of how to compute f (in 2m ).36) demands.11. shrinking a function’s argument by some convenient means before feeding the argument to the function’s Taylor series. In theory all ﬁnite γ rightward of the frontier let the Taylor series converge. as are the numbers ln(1/2) and (1/2)a−1 . τ Equation (8. Of course. we have not calculated yet. The sine and cosine in the cis function are each calculated directly by Taylor series. . ω sin α + cos α ˆ ˆ ˆ where arctan ω is interpreted as the geometrical angle the vector x + yω makes with x. − ln(in 2m ) = m ln(1/2) − in(2π/4). but for the example functions at least this is not hard. keeping γ moderately small in magnitude but never too near the ℜ(γ) = 1/2 frontier in the Argand plane. thus causing the Taylor series to converge faster or indeed to converge at all.39) also thus signiﬁcantly speeds series convergence.40) are useful in themselves and also illustrative generally. ERROR BOUNDS whereupon the formula23 f (ω) = h[f (in 2m ). consider that arctan ω = α + arctan ζ.10. but extreme γ of any kind let the series converge only slowly (and due to compound ﬂoating-point rounding error inaccurately) inasmuch as they imply that |ζ| ≈ 1.8.7) through some angle α to reduce the tangent from ω to ζ. Notice how (8.10. ω cos α − sin α ζ≡ . ∞ k=1 1 k does not converge because 1 > k 23 k+1 k dτ .1.36) through (8. (in 2m )a−1 = cis[(n2π/4)(a − 1)]/[(1/2)a−1 ]m . 
f (γ)] 183 (8. Any number of further examples and tactics of the kind will occur to the creative reader. The number 2π.

hence,

    Σ(k=1 to ∞) 1/k > Σ(k=1 to ∞) ∫(k to k+1) dτ/τ = ∫(1 to ∞) dτ/τ = ln ∞.

8.10.6 Remarks

The study of error bounds is not a matter of rules and formulas so much as of ideas, suggestions and tactics. There is normally no such thing as an optimal error bound—with sufficient cleverness, some tighter bound can usually be discovered—but often easier and more effective than cleverness is simply to add a few extra terms into the series before truncating it (that is, to increase n a little). To eliminate the error entirely usually demands adding an infinite number of terms, which is impossible anyway; and since eliminating the error entirely also requires recording the sum to infinite precision, which is impossible too, eliminating the error entirely is not normally a goal one seeks. What we can do, by this section's techniques or perhaps by other methods, is to establish a definite neighborhood in which the unknown value is sure to lie, and we can make that neighborhood as tight as we want, merely by including a sufficient number of terms in the sum. To eliminate the error to the 0x34-bit (sixteen-decimal place) precision of a computer's double-type floating-point representation typically requires something like 0x34 terms—if the series be wisely composed and if care be taken to keep z moderately small and reasonably distant from the edge of the series' convergence domain. Besides, few engineering applications really use much more than 0x10 bits (five decimal places) in any case.

Perfect precision is impossible, but adequate precision is usually not hard to achieve. Occasionally nonetheless a series arises for which even adequate precision is quite hard to achieve. An infamous example is

    S = −Σ(k=1 to ∞) (−)^k/√k = 1 − 1/√2 + 1/√3 − 1/√4 + ···,

which obviously converges, but sum it if you can! It is not easy to do.

Before closing the section, we ought to arrest one potential agent of terminological confusion. The "error" in a series summation's error bounds is unrelated to the error of probability theory; no chance is involved. The English word "error" is thus overloaded here. A series sum converges to a definite value, and to the same value every time the series is summed; it is just that we do not necessarily know exactly what that value is.

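The minorization of the harmonic series by ∫ dτ/τ in § 8.10.5 can be watched at work numerically. A short Python sketch (not part of the text; the sample values of n are arbitrary):

```python
import math

# 1/k exceeds the integral of 1/t over [k, k+1], so the partial sums
# of the harmonic series exceed ln(n + 1), growing without bound:
for n in (10, 1000, 100000):
    H_n = sum(1.0 / k for k in range(1, n + 1))
    print(n, H_n, math.log(n + 1))   # H_n > ln(n+1) every time
```

The partial sums track ln(n + 1) from above, so no finite truncation ever settles down: the series diverges, just as the minorization proves.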
The topic of series error bounds is what G. S. Brown refers to as "trick-based."25 There is no general answer to the error-bound problem, but there are several techniques which help, some of which this section has introduced. Other techniques we shall meet later in the book as the need for them arises.

8.11 Calculating 2π

The Taylor series for arctan z in Table 8.1 implies a neat way of calculating the constant 2π. We already know that tan(2π/8) = 1, or in other words that

    arctan 1 = 2π/8.

Applying the Taylor series, we have that

    2π = 8 Σ(k=0 to ∞) (−)^k/(2k + 1).   (8.41)

The series (8.41) is simple but converges extremely slowly. Much faster convergence is given by angles smaller than 2π/8. For example, from Table 3.2,

    arctan[(√3 − 1)/(√3 + 1)] = 2π/0x18.

Applying the Taylor series at this angle, we have that26

    2π = 0x18 Σ(k=0 to ∞) [(−)^k/(2k + 1)] [(√3 − 1)/(√3 + 1)]^(2k+1) ≈ 0x6.487F.   (8.42)

8.12 Odd and even functions

An odd function is one for which f(−z) = −f(z). Any function whose Taylor series about zo = 0 includes only odd-order terms is an odd function. Examples of odd functions include z³ and sin z.

25 [6]

26 The writer is given to understand that clever mathematicians have invented subtle, still much faster-converging iterative schemes toward 2π, but such schemes are not covered here. Admittedly, the writer supposes that useful lessons lurk in the clever mathematics underlying the subtle schemes, but there is fast and there is fast. The relatively straightforward series this section gives converges to the best accuracy of your computer's floating-point register within a paltry 0x40 iterations or so—and, after all, you only need to compute the numerical value of 2π once.

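Series (8.42) of § 8.11 can be run directly. A Python sketch (not from the text; the 0x40-term truncation echoes the footnote's remark about how many iterations suffice):

```python
import math

# tan(2*pi/0x18) = (sqrt(3) - 1)/(sqrt(3) + 1), a small argument at
# which arctan's Taylor series converges fast; 0x18 = 24 copies of
# that angle make the full circle 2*pi.
x = (math.sqrt(3) - 1) / (math.sqrt(3) + 1)
two_pi = 0x18 * sum((-1) ** k * x ** (2 * k + 1) / (2 * k + 1)
                    for k in range(0x40))
print(two_pi)   # ~6.283185307179586
```

Since x ≈ 0.268, each further term shrinks by roughly x² ≈ 0.072, so far fewer than 0x40 terms already exhaust double precision.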
An even function is one for which f(−z) = f(z). Any function whose Taylor series about zo = 0 includes only even-order terms is an even function. Examples of even functions include z² and cos z.

Odd and even functions are interesting because of the symmetry they bring—the plot of a real-valued odd function being symmetric about a point, the plot of a real-valued even function being symmetric about a line. Many functions are neither odd nor even, of course, but one can always split an analytic function into two components—one odd, the other even—by the simple expedient of sorting the odd-order terms from the even-order terms in the function's Taylor series. For example, exp z = sinh z + cosh z.

8.13 Trigonometric poles

The singularities of the trigonometric functions are single poles of residue ±1 or ±i. For the circular trigonometrics, all the poles lie along the real number line; for the hyperbolic trigonometrics, along the imaginary. Specifically, of the eight trigonometric functions

    1/sin z, 1/cos z, 1/tan z, tan z,
    1/sinh z, 1/cosh z, 1/tanh z, tanh z,

the poles and their respective residues are

    (z − kπ)/sin z |z=kπ = (−)^k,
    [z − (k − 1/2)π]/cos z |z=(k−1/2)π = (−)^k,
    [z − (k − 1/2)π] tan z |z=(k−1/2)π = −1,
    (z − kπ)/tan z |z=kπ = 1,
    (z − ikπ)/sinh z |z=ikπ = (−)^k,
    [z − i(k − 1/2)π]/cosh z |z=i(k−1/2)π = i(−)^k,
    [z − i(k − 1/2)π] tanh z |z=i(k−1/2)π = 1,
    (z − ikπ)/tanh z |z=ikπ = 1.   (8.43)

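The odd/even split of § 8.12 amounts to two one-line formulas, which the following Python sketch checks against exp z = sinh z + cosh z (the sketch and the sample point z are illustrative, not from the text):

```python
import cmath

def odd_part(f, z):
    return (f(z) - f(-z)) / 2    # keeps only the odd-order Taylor terms

def even_part(f, z):
    return (f(z) + f(-z)) / 2    # keeps only the even-order Taylor terms

z = 0.7 + 0.3j
print(abs(odd_part(cmath.exp, z) - cmath.sinh(z)))    # ~0: exp's odd part
print(abs(even_part(cmath.exp, z) - cmath.cosh(z)))   # ~0: exp's even part
```

Sorting odd-order from even-order Taylor terms and averaging f(z) with ±f(−z) are the same operation, since (−z)^k = ±z^k according to the parity of k.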
but for real z. sinh z cosh z tanh z 1/ tanh z to reach (8. .2 plus l’Hˆpital’s rule (4.44) [29] . These simpler trigonometrics are not only meromorphic but also entire.1 and 5. which is possible only for real z. o however. single poles.1). exp z and cis z—conspicuously excluded from this section’s gang of eight—have no poles for ﬁnite z. . the sine function’s very deﬁnition establishes the poles z = kπ (refer to Fig. therefore. cos z. Consider for instance the function 1/ sin z = i2/(eiz − e−iz ). we should like to verify that the poles (8. we ﬁnally apply l’Hˆpital’s rule to each of the ratios o z − kπ z − (k − 1/2)π z − kπ z − (k − 1/2)π . 8. Consider for example expanding f (z) = 27 e−z 1 − cos z (8. The trigonometrics are meromorphic functions (§ 8. similar reasoning for each of the eight trigonometrics forbids poles other than those (8. and thus are not meromorphic at all. . This function evidently goes inﬁnite only when eiz = e−iz . . With the observations from Table 5. sin z cos z tan z 1/ tan z z − ikπ z − i(k − 1/2)π z − ikπ z − i(k − 1/2)π .43) lists are in fact the only poles that there are. because ez .43)’s claims. we shall marshal the identities of Tables 5. Sometimes one nevertheless wants to expand at least about a pole.29). . sinh z.1 that i sinh z = sin iz and cosh z = cos iz.6) for this reason. Before calculating residues and such. eiz .8.43). subject to Cauchy’s integral formula and so on. cosh z.43) lists. The poles are ordinary. with residues. but never about a pole or branch point of the function. THE LAURENT SERIES 187 To support (8. Trigonometric poles evidently are special only in that a trigonometric function has an inﬁnite number of them.27 The six simpler trigonometrics sin z. Satisﬁed that we have forgotten no poles. ez ± e−z and eiz ± e−iz then likewise are ﬁnite. that we have forgotten no poles. 3.14 The Laurent series Any analytic function can be expanded in a Taylor series.14. 
Observe however that the inverse trigonometrics are multiple-valued and have branch points.

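Two of the residues (8.43) can be checked numerically by stepping a small distance ε off a pole and forming the indicated ratio. A Python sketch (not from the text; ε and the ranges of k are arbitrary):

```python
import cmath

pi = cmath.pi
eps = 1e-5   # small step off the pole

# (z - k*pi)/sin z -> (-)^k at the poles of 1/sin z:
for k in range(-2, 3):
    ratio = eps / cmath.sin(k * pi + eps)
    print(k, ratio)   # ~(-1)**k

# [z - i(k - 1/2)*pi]/cosh z -> i(-)^k at the poles of 1/cosh z:
for k in range(0, 3):
    ratio = eps / cmath.cosh(1j * (k - 0.5) * pi + eps)
    print(k, ratio)   # ~1j * (-1)**k
```

The ratios settle onto ±1 and ±i respectively as ε shrinks, in agreement with the l'Hôpital evaluation described in the text.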
Expanding dividend and divisor separately. By long division. THE TAYLOR SERIES about the function’s pole at z = 0. f (z) = 2 2 − + z2 z 2 2 − 2+ z z + ∞ k=0 ∞ k=1 (−)k z 2k (2k)! ∞ k=1 − z 2k+1 z 2k + (2k)! (2k + 1)! (−)k z 2k (2k)! = 2 2 − + z2 z ∞ k=1 − ∞ k=0 (−)k 2z 2k−2 (−)k 2z 2k−1 + (2k)! (2k)! z 2k+1 z 2k + − (2k)! (2k + 1)! − (−)k 2z 2k+1 (2k + 2)! ∞ k=1 ∞ k=1 + = 2 2 − + 2 z z ∞ k=0 (−)k z 2k (2k)! (−)k 2z 2k (2k + 2)! ∞ k=0 + = 2 2 − + 2 z z ∞ k=0 − z 2k+1 z 2k + (2k)! (2k + 1)! (−)k 2 z 2k ∞ k=1 (−)k z 2k (2k)! −(2k + 1)(2k + 2) + (2k + 2)! + (2k + 2) − (−)k 2 2k+1 z (2k + 2)! (−)k z 2k .188 CHAPTER 8. (2k)! . f (z) = = = 1 − z + z 2 /2 − z 3 /6 + · · · z 2 /2 − z 4 /0x18 + · · · ∞ j j j=0 (−) z /j! − ∞ k 2k k=1 (−) z /(2k)! ∞ 2k /(2k)! + z 2k+1 /(2k k=0 −z ∞ k 2k k=1 (−) z /(2k)! + 1)! .

One can continue dividing to extract further terms if desired, and if all the terms of

    f(z) = 2/z² − 2/z + 7/6 − z/2 + ···   (8.45)

are extracted the result is the Laurent series proper,

    f(z) = Σ(k=K to ∞) (ak)(z − zo)^k,   K ≤ 0.   (8.46)

However for many purposes the partial Laurent series

    f(z) = Σ(k=K to −1) (ak)(z − zo)^k + [Σ(k=0 to ∞) (bk)(z − zo)^k] / [Σ(k=0 to ∞) (ck)(z − zo)^k],   K ≤ 0, c0 ≠ 0,   (8.47)

suffices and may even be preferable. In either form,

    f(z) = Σ(k=K to −1) (ak)(z − zo)^k + fo(z − zo),   K ≤ 0,   (8.48)

where, unlike f(z), fo(z − zo) is analytic at z = zo. The fo(z − zo) of (8.48) is f(z)'s regular part at z = zo.

The ordinary Taylor series diverges at a function's pole; handling the pole separately, the Laurent series remedies this defect.28 Sections 9.5 and 9.6 tell more about poles generally, including multiple poles like the one in the example here.

28 The professional mathematician's treatment of the Laurent series usually begins by defining an annular convergence domain (a convergence domain bounded without by a large circle and within by a small) in the Argand plane. From an applied point of view however what interests us is the basic technique to remove the poles from an otherwise analytic function. Further rigor is left to the professionals.29

29 [24, § 10.8][14, § 2.5]

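The leading Laurent coefficients found by the long division can be confirmed numerically by subtracting the pole terms and watching what remains as z shrinks. A Python sketch (not from the text; the sample values of z are arbitrary):

```python
import math

def f(z):
    return math.exp(-z) / (1 - math.cos(z))

# Laurent series about z = 0 found by long division:
#   f(z) = 2/z**2 - 2/z + 7/6 - z/2 + ...
for z in (0.1, 0.01):
    regular = f(z) - 2 / z ** 2 + 2 / z   # strip the two pole terms
    print(z, regular)                      # approaches 7/6 - z/2
```

What survives the subtraction is the regular part fo(z), finite at the double pole the original f(z) could not approach.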
it may suﬃce to write f (z) = f (z) = This is f (z) = 1 sin w . As an example of a diﬀerent kind.190 CHAPTER 8. One could expand the function directly about some other point like z = 1. the resulting series would suﬀer a straitly limited convergence domain. even then. The typical trouble one has with the Taylor series is that a function’s poles and branch points limit the series’ convergence domain. Consider the function sin(1/z) . cos z z which is all one needs to calculate f (z) numerically—and may be all one needs for analysis. consider g(z) = 1 . and one cannot expand the function directly about it.15 Taylor series in 1/z A little imagination helps the Taylor series a lot. one needs no Taylor series to handle such a function (one simply does the indicated arithmetic). (z − 2)2 z −1 − z −3 /3! + z −5 /5! − · · · . but calculating the Taylor coeﬃcients would take a lot of eﬀort and. All that however tries too hard. Depending on the application.1) and (4. Then by (8. Suppose however that a Taylor series speciﬁcally about z = 0 were indeed needed for some reason. The point is an essential singularity. Thinking ﬂexibly. THE TAYLOR SERIES 8. Several other ways are possible. zk . too. cos z This function has a nonanalytic point of a most peculiar nature at z = 0. 2k+2 That expansion is good only for |z| < 2.2). but for |z| > 2 we also have that g(z) = 1/z 2 1 = 2 2 (1 − 2/z) z ∞ k=0 1+k 1 2 z k = ∞ k=2 2k−2 (k − 1) . w≡ . The Laurent series of § 8. 1 − z 2 /2! + z 4 /4! − · · · Most often. g(z) = 1 1/4 = 2 (1 − z/2) 4 ∞ k=0 1+k 1 z 2 k = ∞ k=0 k+1 k z . however.14 represents one way to extend the Taylor series. one can often evade the trouble.

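The two expansions of g(z) = 1/(z − 2)², one in positive powers valid for |z| < 2 and one in negative powers valid for |z| > 2, can both be summed numerically. A Python sketch (not from the text; the evaluation points and truncation are arbitrary):

```python
# g(z) = 1/(z - 2)**2 expanded two ways:
#   |z| < 2:  g(z) = sum_{k>=0} (k + 1) * z**k / 2**(k+2)
#   |z| > 2:  g(z) = sum_{k>=2} 2**(k-2) * (k - 1) / z**k
z_in, z_out = 1.0, 5.0
inner = sum((k + 1) * z_in ** k / 2 ** (k + 2) for k in range(200))
outer = sum(2 ** (k - 2) * (k - 1) / z_out ** k for k in range(2, 200))
print(inner, 1 / (z_in - 2) ** 2)    # both ~1
print(outer, 1 / (z_out - 2) ** 2)   # both ~1/9
```

Neither series required any differentiation to find: both came from reshaping 1/(1 − w)² geometrically, which is the section's point about thinking flexibly.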
v2 = y z and v3 = z. Note that we have computed the two series for g(z) without ever actually taking a derivative. v1 = x. . too. . consider the function f (z1 . then take the Taylor series either of the whole function at once or of pieces of it separately. though taking derivatives per (8. What does it mean? • The z is a vector 30 incorporating the several independent variables z1 . One can expand in negative powers of z equally validly as in positive powers. is a vector with N = 3. . Where two or more independent variables are involved. which has terms z1 and 2z2 —these we understand—but also has the cross-term z1 z2 for which the relevant derivative is the cross-derivative ∂ 2 f /∂z1 ∂z2 .16 The multidimensional Taylor series Equation (8.19) has given the Taylor series for functions of a single variable. one must account for the crossderivatives. . the multidimensional Taylor series is f (z) = k ∂kf ∂zk z=zo (z − zo )k . One is not required immediately to take the Taylor series of a function as it presents itself.19) may be the canonical way to determine Taylor coeﬃcients. k2 . k! (8. THE MULTIDIMENSIONAL TAYLOR SERIES 191 which expands in negative rather than positive powers of z. that’s neat. only the details are a little more complicated. • The k is a nonnegative integer vector of N counters—k1 . And. With this idea in mind. For example. z2 ) = z1 +z1 z2 +2z2 . Each of the kn runs independently from 0 to ∞.16.3.49) Well. Neither of the section’s examples is especially interesting in itself. . 30 . .8. The ˆ ˆ geometrical vector v = xx + yy + ˆz of § 3. 8. . The idea of the Taylor series does not diﬀer where there are two or more independent variables. kN — one for each of the independent variables. z2 . In this generalized sense of the word. one can ﬁrst change variables or otherwise rewrite the function in some convenient way. For 2 2 example. but their point is that it often pays to think ﬂexibly in extending and applying the Taylor series. 
a vector is an ordered set of N elements. . zN . any eﬀective means to ﬁnd the coeﬃcients suﬃces. and every permutation is possible. then.

. (1.19) represents a function f (z). 0). . 1). Thus within some convergence domain about z = zo . . (0.192 if N = 2 then CHAPTER 8. 2). (3.49) represents a function f (z) as accurately as the simple Taylor series (8. . meaning that ∂kf ≡ ∂zk • The (z − zo )k represents N N n=1 ∂ kn (∂zn )kn f. . n=1 With these deﬁnitions. (z − zo ) ≡ • The k! represents k n=1 (zn − zon )kn . 2). 1). 3). (3. and for the same reason. (2. 3). (2. 0). . . . . 0). k2 ) = (0. (1. . the multidimensional Taylor series (8..49) yields all the right derivatives and cross-derivatives at the expansion point z = zo . . N k! ≡ kn !. . 3). 2). THE TAYLOR SERIES k = (k1 . . (1. the multidimensional Taylor series (8. . (3. (0. (3. 2). 0). (2. . . • The ∂ k f /∂zk represents the kth cross-derivative of f (z). (0. 3). 1). (1. . 1). (2. ..

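The multidimensional Taylor series (8.49) can be tried on a function whose cross-derivatives are easy to write down. In the Python sketch below (not part of the text; the example function and truncation are the author's illustrative choices), f(z1, z2) = exp(z1 + 2 z2) has the cross-derivative ∂^k f/∂z^k = 2^k2 at the origin for every counter pair (k1, k2):

```python
import math

def f_taylor(z1, z2, n=30):
    """Multidimensional Taylor sum of exp(z1 + 2*z2) about the origin:
    every cross-derivative there equals 2**k2, so the series is
    sum over (k1, k2) of 2**k2 * z1**k1 * z2**k2 / (k1! * k2!)."""
    return sum(2 ** k2 * z1 ** k1 * z2 ** k2
               / (math.factorial(k1) * math.factorial(k2))
               for k1 in range(n) for k2 in range(n))

print(f_taylor(0.3, 0.2), math.exp(0.3 + 2 * 0.2))   # agree
```

Every permutation of the counters (k1, k2) contributes one term, which is exactly what the vector notation of (8.49) compresses.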
a dτ (9.2. recognizing its integrand to be the derivative of something already known:1 z a df dτ = f (τ )|z .1) For instance. implies a general technique only for calculating an integral numerically—and even for this purpose it is imperfect. 9. This chapter surveys several weapons of the intrepid mathematician’s arsenal against the integral. It is hard to guess in advance which technique might work best. then directly writes down the solution ln τ |x . Its counterpart (7.1 Integration by antiderivative The simplest way to solve an integral is just to look at it. how is one actually to do the sum? It turns out that there is no one general answer to this question. recognizing it to be the derivative of ln τ . 1 1 The notation f (τ )|z or [f (τ )]z means f (z) − f (a). 1 τ One merely looks at the integrand 1/τ . when it comes to adding an inﬁnite number of inﬁnitesimal elements. some by another. a a 193 . Some functions are best integrated by one technique.19) implies a general technique for calculating a derivative symbolically.Chapter 9 Integration techniques Equation (4. for. Refer to § 7.1). unfortunately. 1 x 1 dτ = ln τ |x = ln x.

whose diﬀerential is (by successive steps) d(u) = d(1 + x2 ). 9. (9. with the change of variable u ← 1 + x2 . However. z ρ1 (9.3) This helps.24) and (8. nonobvious. when z1 and z2 are real but negative numbers. . However.1 provide several further good derivatives this antiderivative technique can use. 1 + x2 This integral is not in a form one immediately recognizes.2.3 and 9. for example. 5.2) Tables 7.194 CHAPTER 9. One particular. du = 2x dx. Besides the essential τ a−1 = d dτ τa a .2 Integration by substitution Consider the integral x2 S= x1 x dx . INTEGRATION TECHNIQUES The technique by itself is pretty limited. If z = ρeiφ . useful variation on the antiderivative technique seems worth calling out specially here. then (8. the frequent object of other integration techniques is to transform an integral into a form to which this basic technique can be applied. 5.25) have that z2 z1 dz ρ2 = ln + i(φ2 − φ1 ).1.

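The substitution result can be double-checked against a direct numerical quadrature. A Python sketch (not from the text; the midpoint rule and the limits 0 and 2 are illustrative):

```python
import math

def S_closed(x1, x2):
    # result of the change of variable u <- 1 + x**2
    return 0.5 * math.log((1 + x2 ** 2) / (1 + x1 ** 2))

def S_numeric(x1, x2, n=100000):
    # midpoint-rule quadrature of the original integrand x/(1 + x**2)
    h = (x2 - x1) / n
    total = 0.0
    for j in range(n):
        x = x1 + (j + 0.5) * h
        total += x / (1 + x ** 2) * h
    return total

print(S_closed(0.0, 2.0))    # 0.5 * ln 5 ~ 0.8047
print(S_numeric(0.0, 2.0))
```

Agreement of the two values confirms the antiderivative the substitution produced, complementing the text's check by differentiation.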
whether alone or in combination with other techniques. 1 + x2 1 To check the result. 9. where u(τ ) and v(τ ) are functions of an independent variable τ . τ =a (9. Integrating. It does not solve all integrals but it does solve many. 1 + x2 x2 =x which indeed has the form of the integrand we started with. d(uv) = u dv + v du.25). INTEGRATION BY PARTS the integral is x2 195 S = x=x1 x2 = = = = x=x1 1+x2 2 x dx u 2x dx 2u du 2u u=1+x2 1 1 ln u 2 1 ln 2 1+x2 2 u=1+x2 1 1 + x2 2 . Reordering terms.3.9. u dv = d(uv) − v du. The technique is integration by substitution.4) . b τ =a u dv = uv|b =a − τ b v du.3 Integration by parts Integration by parts is a curious but very broadly applicable technique which begins with the derivative product rule (4.5 of the ﬁnal expression with respect to x2 : ∂ 1 1 + x2 2 ln ∂x2 2 1 + x2 1 = x2 =x = 1 ∂ ln 1 + x2 − ln 1 + x2 2 1 2 ∂x2 x . we can take the derivative per § 7.

not just for C = 0. sin ατ v = . dv ← cos ατ dτ. we can begin by integrating part of it. What it does is to integrate one part of an integral separately—whichever part one has chosen to identify as dv—while contrarily diﬀerentiating the other part u. Unsure how to integrate this. then. but one should understand clearly what it does and does not do. nothing in the integration by parts technique requires us to consider all possible v. consider the integral x S(x) = 0 τ cos ατ dτ. The technique is powerful for this reason. Any convenient v suﬃces. consider the deﬁnite integral3 Γ(z) ≡ 2 ∞ 0 e−τ τ z−1 dτ. For an example of the rule’s operation. INTEGRATION TECHNIQUES Equation (9.5) The careful reader will observe that v = (sin ατ )/α + C matches the chosen dv for any value of C.4). Letting u ← τ. We can begin by integrating the cos ατ dτ part. The new integral may or may not be easier to integrate than was the original u dv. It does not just integrate each part of an integral separately. α According to (9. we ﬁnd that2 du = dτ.4) is the rule of integration by parts. (9. ℜ(z) > 0. It isn’t that simple. However. S(x) = τ sin ατ α x 0 x − 0 sin ατ x dτ = sin αx + cos αx − 1. The virtue of the technique lies in that one often can ﬁnd a part dv which does yield an easier v du. This is true. For another kind of example of the rule’s operation. we choose v = (sin ατ )/α.196 CHAPTER 9. In this case. upon which it rewards the mathematician only with a whole new integral v du. 3 [32] . α α Integration by parts is a powerful technique.

Letting

    u ← e^{−τ},    dv ← τ^{z−1} dτ,

we evidently have that

    du = −e^{−τ} dτ,    v = τ^z/z.

Substituting these according to (9.4) into (9.5) yields

    Γ(z) = [e^{−τ} τ^z/z]_{τ=0}^{∞} − ∫₀^∞ (τ^z/z)(−e^{−τ}) dτ
         = [0 − 0] + ∫₀^∞ (τ^z/z) e^{−τ} dτ
         = Γ(z + 1)/z.

When written

    Γ(z + 1) = zΓ(z),    (9.6)

this is an interesting result. Since per (9.5)

    Γ(1) = ∫₀^∞ e^{−τ} dτ = [−e^{−τ}]₀^∞ = 1,

it follows by induction on (9.6) that

    (n − 1)! = Γ(n).    (9.7)

Thus (9.5), called the gamma function, can be taken as an extended definition of the factorial (z − 1)! for all z, ℜ(z) > 0. Integration by parts has made this finding possible.

9.4 Integration by unknown coefficients

One of the more powerful integration techniques is relatively inelegant, yet it easily cracks some integrals that give other techniques trouble. The technique is the method of unknown coefficients, and it is based on the antiderivative (9.1) plus intelligent guessing. It is best illustrated by example.
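The recurrence (9.6) and the factorial identity (9.7) can both be confirmed numerically from the definition (9.5). This Python sketch (names mine; the infinite upper limit is truncated at an assumed τ = 60, where the integrand is negligible) estimates the gamma integral by the midpoint rule:

```python
import math

def gamma_numeric(z, upper=60.0, n=200000):
    # Midpoint-rule estimate of (9.5): Gamma(z) = integral_0^inf e^-t t^(z-1) dt,
    # truncated at t = upper (the tail beyond is negligibly small here).
    h = upper / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        total += math.exp(-t) * t**(z - 1.0)
    return total * h

# (9.6): Gamma(z+1) = z * Gamma(z)
z = 2.5
assert abs(gamma_numeric(z + 1.0) - z * gamma_numeric(z)) < 1e-3
# (9.7): Gamma(n) = (n-1)!
assert abs(gamma_numeric(5.0) - math.factorial(4)) < 1e-2
```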

Consider the integral (which arises in probability theory)

    S(x) = ∫₀^x e^{−(ρ/σ)²/2} ρ dρ.    (9.8)

If one does not know how to solve the integral in a more elegant way, one can guess a likely-seeming antiderivative form, such as

    e^{−(ρ/σ)²/2} ρ = (d/dρ) a e^{−(ρ/σ)²/2},

where the a is an unknown coefficient. Having guessed, one has no guarantee that the guess is right, but see: if the guess were right, then the antiderivative would have the form

    e^{−(ρ/σ)²/2} ρ = (d/dρ) a e^{−(ρ/σ)²/2} = −(aρ/σ²) e^{−(ρ/σ)²/2},

implying that

    a = −σ²

(evidently the guess is right, after all). Using this value for a, one can write the specific antiderivative

    e^{−(ρ/σ)²/2} ρ = (d/dρ)[−σ² e^{−(ρ/σ)²/2}],

with which one can solve the integral, concluding that

    S(x) = [−σ² e^{−(ρ/σ)²/2}]₀^x = σ²[1 − e^{−(x/σ)²/2}].    (9.9)

The same technique solves differential equations, too. Consider for example the differential equation

    dx = (Ix − P) dt,    x|_{t=0} = x_o,    x|_{t=T} = 0,    (9.10)

which conceptually represents⁴ the changing balance x of a bank loan account over time t, where I is the loan's interest rate and P is the borrower's payment rate. If it is desired to find the correct payment rate P which pays

⁴Real banks (in the author's country, at least) by law or custom actually use a needlessly more complicated formula—and not only more complicated, but mathematically slightly incorrect, too.
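The closed form (9.9) can be checked against direct quadrature of (9.8). A minimal Python sketch, with function names of my own choosing:

```python
import math

def S_closed(x, sigma):
    # (9.9): S(x) = sigma^2 * (1 - e^{-(x/sigma)^2 / 2})
    return sigma**2 * (1.0 - math.exp(-(x / sigma)**2 / 2.0))

def S_numeric(x, sigma, n=100000):
    # Midpoint rule on the integrand of (9.8): e^{-(rho/sigma)^2/2} * rho.
    h = x / n
    total = 0.0
    for k in range(n):
        r = (k + 0.5) * h
        total += math.exp(-(r / sigma)**2 / 2.0) * r
    return total * h

assert abs(S_closed(2.0, 1.5) - S_numeric(2.0, 1.5)) < 1e-6
```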

the loan off in the time T, then (perhaps after some bad guesses) we guess the form

    x(t) = Ae^{αt} + B,

where α, A and B are unknown coefficients. The guess' derivative is

    dx = αAe^{αt} dt.

Substituting the last two equations into (9.10) and dividing by dt yields

    αAe^{αt} = IAe^{αt} + IB − P,

which at least is satisfied if both of the equations

    αAe^{αt} = IAe^{αt},
    0 = IB − P,

are satisfied. Evidently good choices for α and B, then, are

    α = I,    B = P/I.

Substituting these coefficients into the x(t) equation above yields the general solution

    x(t) = Ae^{It} + P/I    (9.11)

to (9.10). The constants A and P, we establish by applying the given boundary conditions x|_{t=0} = x_o and x|_{t=T} = 0. For the former condition,

    x_o = Ae^{(I)(0)} + P/I = A + P/I,

and for the latter condition,

    0 = Ae^{IT} + P/I.

Solving the last two equations simultaneously, we have that

    A = −e^{−IT} x_o / (1 − e^{−IT}),    P = I x_o / (1 − e^{−IT}).    (9.12)

Applying these to the general solution (9.11) yields the specific solution

    x(t) = [x_o / (1 − e^{−IT})] [1 − e^{(I)(t−T)}]    (9.13)

to (9.10) meeting the boundary conditions, with the payment rate P required of the borrower given by (9.12).

The virtue of the method of unknown coefficients lies in that it permits one to try an entire family of candidate solutions at once, with the family members distinguished by the values of the coefficients. If a solution exists anywhere in the family, the method usually finds it. Such are the kinds of problems the method can solve. The method of unknown coefficients is an elephant. Slightly inelegant the method may be, but it is pretty powerful, too—and it has surprise value (for some reason people seem not to expect it).

9.5 Integration by closed contour

We pass now from the elephant to the falcon, from the inelegant to the sublime. Consider the definite integral⁵

    S = ∫₀^∞ τ^a/(τ + 1) dτ,    −1 < a < 0.

This is a hard integral. No obvious substitution, no evident factoring into parts, seems to solve the integral; but there is a way. The integrand has a pole at τ = −1. If one writes the same integral using the complex variable z in place of the real variable τ, then Cauchy's integral formula (8.29) has that integrating once counterclockwise about a closed complex contour, with the contour enclosing the pole at z = −1 but shutting out the branch point at z = 0, yields

    I = ∮ z^a/(z + 1) dz = i2π z^a|_{z=−1} = i2π (e^{i2π/2})^a = i2π e^{i2πa/2}.

The trouble, of course, is that the integral S does not go about a closed complex contour. One can however construct a closed complex contour I of which S is a part, as in Fig. 9.1. If the outer circle in the figure is of infinite

⁵[32, § 1.2]
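The loan solution (9.13) and payment rate (9.12) are easy to sanity-check: the boundary conditions should hold exactly, and the differential equation (9.10) should hold at interior times. A Python sketch (function names and the sample figures are mine):

```python
import math

def loan_balance(t, xo, I, T):
    # Specific solution (9.13): x(t) = xo * (1 - e^{I(t-T)}) / (1 - e^{-I T})
    return xo * (1.0 - math.exp(I * (t - T))) / (1.0 - math.exp(-I * T))

def payment_rate(xo, I, T):
    # (9.12): P = I * xo / (1 - e^{-I T})
    return I * xo / (1.0 - math.exp(-I * T))

xo, I, T = 1000.0, 0.05, 10.0
P = payment_rate(xo, I, T)

# Boundary conditions x(0) = xo and x(T) = 0:
assert abs(loan_balance(0.0, xo, I, T) - xo) < 1e-9
assert abs(loan_balance(T, xo, I, T)) < 1e-9

# The differential equation dx/dt = I x - P, checked by centered differences:
t, h = 4.0, 1e-6
dxdt = (loan_balance(t + h, xo, I, T) - loan_balance(t - h, xo, I, T)) / (2 * h)
assert abs(dxdt - (I * loan_balance(t, xo, I, T) - P)) < 1e-5
```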

Figure 9.1: Integration by closed contour. [The figure plots the contour in the Argand plane, ℜ(z) horizontal and ℑ(z) vertical, with the pole marked at z = −1: the outgoing segment I₁ just above the positive real axis, the large outer circle I₂, the returning segment I₃ just below the positive real axis, and the small inner circle I₄ about the origin.]

radius and the inner, of infinitesimal, then the closed contour I is composed of the four parts

    I = I₁ + I₂ + I₃ + I₄ = (I₁ + I₃) + I₂ + I₄.

The figure tempts one to make the mistake of writing that I₁ = S = −I₃, but besides being incorrect this defeats the purpose of the closed contour technique. More subtlety is needed. One must take care to interpret the four parts correctly. The integrand z^a/(z + 1) is multiple-valued; so, in fact, the two parts I₁ and I₃ do not cancel. The integrand has a branch point at z = 0, which, in passing from I₃ through I₄ to I₁, the contour has circled. Even though z itself takes on the same values along I₃ as along I₁, the multiple-valued integrand z^a/(z + 1) does not. Indeed,

    I₁ = ∫₀^∞ (ρe^{i0})^a/[(ρe^{i0}) + 1] dρ = ∫₀^∞ ρ^a/(ρ + 1) dρ = S,

    −I₃ = ∫₀^∞ (ρe^{i2π})^a/[(ρe^{i2π}) + 1] dρ = e^{i2πa} ∫₀^∞ ρ^a/(ρ + 1) dρ = e^{i2πa} S.

Therefore,

    I = I₁ + I₂ + I₃ + I₄ = (I₁ + I₃) + I₂ + I₄
      = (1 − e^{i2πa})S + lim_{ρ→∞} ∫_{φ=0}^{2π} z^a/(z + 1) dz − lim_{ρ→0} ∫_{φ=0}^{2π} z^a/(z + 1) dz
      = (1 − e^{i2πa})S + lim_{ρ→∞} ∫_{φ=0}^{2π} z^{a−1} dz − lim_{ρ→0} ∫_{φ=0}^{2π} z^a dz
      = (1 − e^{i2πa})S + lim_{ρ→∞} [z^a/a]_{φ=0}^{2π} − lim_{ρ→0} [z^{a+1}/(a + 1)]_{φ=0}^{2π}.

Since a < 0, the first limit vanishes; and because a > −1, the second limit vanishes, too, leaving

    I = (1 − e^{i2πa})S.

But by Cauchy's integral formula we have already found an expression for I. Substituting this expression into the last equation yields, by successive steps,

    i2π e^{i2πa/2} = (1 − e^{i2πa})S,
    S = i2π e^{i2πa/2}/(1 − e^{i2πa}),
    S = i2π/(e^{−i2πa/2} − e^{i2πa/2}),
    S = −(2π/2)/sin(2πa/2).

That is,

    ∫₀^∞ τ^a/(τ + 1) dτ = −(2π/2)/sin(2πa/2),    −1 < a < 0, ℑ(a) = 0,    (9.14)

an astonishing result. So astonishing is the result, that one is unlikely to believe it at first encounter. However, straightforward (though computationally highly inefficient) numerical integration per (7.1) confirms the result, as the interested reader and his computer can check. Such results vindicate the effort we have spent in deriving Cauchy's integral formula (8.29).

Another example⁷ is

    T = ∫₀^{2π} dθ/(1 + a cos θ),    ℑ(a) = 0, |ℜ(a)| < 1.

⁷[29]
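The book invites the reader to confirm (9.14) numerically. Here is one way to do it in Python (the transformation and names are mine): fold the interval [1, ∞) back onto (0, 1] via τ → 1/τ, substitute τ = s² to soften the endpoint singularity, and apply the midpoint rule:

```python
import math

def S_exact(a):
    # (9.14): integral_0^inf t^a/(t+1) dt = -pi/sin(pi*a), for -1 < a < 0
    return -math.pi / math.sin(math.pi * a)

def S_numeric(a, n=200000):
    # After folding and substituting tau = s^2, the integrand over (0,1) is
    #   2 (s^{2a+1} + s^{-2a-1}) / (1 + s^2).
    h = 1.0 / n
    total = 0.0
    for k in range(n):
        s = (k + 0.5) * h
        total += 2.0 * (s**(2 * a + 1) + s**(-2 * a - 1)) / (1.0 + s * s)
    return total * h

for a in (-0.5, -0.3):
    assert abs(S_numeric(a) - S_exact(a)) < 5e-3
```

For a = −1/2 the exact value is π, which the quadrature reproduces.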

As in the previous example, here again the contour is not closed. The previous example closed the contour by extending it, excluding the branch point. In this example there is no branch point to exclude, nor need one extend the contour. Rather, one changes the variable

    z ← e^{iθ}

and takes advantage of the fact that z, unlike θ, begins and ends the integration at the same point. One thus obtains the equivalent integral

    T = ∮ (dz/iz)/[1 + (a/2)(z + 1/z)]
      = −(i2/a) ∮ dz/(z² + 2z/a + 1)
      = −(i2/a) ∮ dz/{[z − (−1 + √(1 − a²))/a][z − (−1 − √(1 − a²))/a]},

whose contour is the unit circle in the Argand plane. The integrand evidently has poles at

    z = [−1 ± √(1 − a²)]/a,

whose magnitudes are such that

    |z|² = [2 − a² ∓ 2√(1 − a²)]/a².

One of the two magnitudes is less than unity and one is greater, meaning that one of the two poles lies within the contour and one lies without, as is

seen by the successive steps⁸

    a² < 1,
    0 < 1 − a²,
    (−a²)(0) > (−a²)(1 − a²),
    0 > −a² + a⁴,
    1 − a² > 1 − 2a² + a⁴,
    1 − a² > (1 − a²)²,
    √(1 − a²) > 1 − a²,
    −√(1 − a²) < −(1 − a²),
    1 − √(1 − a²) < a²,
    2 − 2√(1 − a²) < 2a²,
    2 − a² − 2√(1 − a²) < a² < 2 − a² + 2√(1 − a²),
    [2 − a² − 2√(1 − a²)]/a² < 1 < [2 − a² + 2√(1 − a²)]/a².

Per Cauchy's integral formula (8.29), integrating about the pole within the contour yields

    T = i2π (−i2/a)/[z − (−1 − √(1 − a²))/a] |_{z=(−1+√(1−a²))/a} = 2π/√(1 − a²).

Observe that by means of a complex variable of integration, each example has indirectly evaluated an integral whose integrand is purely real. If it seems unreasonable to the reader to expect so flamboyant a technique actually to work, this seems equally unreasonable to the writer—but work it does. It is a great technique.

The technique, integration by closed contour, is found in practice to solve many integrals other techniques find almost impossible to crack. The key to making the technique work lies in closing a contour one knows how to treat. The robustness of the technique lies in that any contour of any shape will work, so long as the contour encloses appropriate poles in the Argand domain plane while shutting branch points out.

⁸These steps are perhaps best read from bottom to top. See Ch. 6's footnote 14.
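The second closed-contour result, T = 2π/√(1 − a²), is also easy to confirm by brute-force quadrature of the purely real integrand, as the chapter suggests. A Python sketch (names mine):

```python
import math

def T_exact(a):
    # Closed-contour result: integral_0^{2pi} d(theta)/(1 + a cos theta)
    #   = 2 pi / sqrt(1 - a^2), for |a| < 1.
    return 2.0 * math.pi / math.sqrt(1.0 - a * a)

def T_numeric(a, n=100000):
    # Midpoint rule; for a smooth periodic integrand this converges rapidly.
    h = 2.0 * math.pi / n
    total = 0.0
    for k in range(n):
        total += 1.0 / (1.0 + a * math.cos((k + 0.5) * h))
    return total * h

for a in (0.0, 0.5, -0.9):
    assert abs(T_numeric(a) - T_exact(a)) < 1e-6
```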

9.6 Integration by partial-fraction expansion

This section treats integration by partial-fraction expansion. It introduces the expansion itself first.⁹

9.6.1 Partial-fraction expansion

Consider the function

    f(z) = −4/(z − 1) + 5/(z − 2).

Combining the two fractions over a common denominator¹⁰ yields

    f(z) = (z + 3)/[(z − 1)(z − 2)].

Of the two forms, the former is probably the more amenable to analysis. For example, using (9.3),

    ∫_{−1}^{0} f(τ) dτ = ∫_{−1}^{0} −4/(τ − 1) dτ + ∫_{−1}^{0} 5/(τ − 2) dτ
                       = [−4 ln(1 − τ) + 5 ln(2 − τ)]_{−1}^{0}.

The trouble is that one is not always given the function in the amenable form.

Given a rational function

    f(z) = [Σ_{k=0}^{N−1} b_k z^k] / [Π_{j=1}^{N} (z − α_j)]    (9.15)

in which no two of the several poles α_j are the same, the partial-fraction expansion has the form

    f(z) = Σ_{k=1}^{N} A_k/(z − α_k),    (9.16)

where multiplying each fraction of (9.16) by

    {[Π_{j=1}^{N} (z − α_j)]/(z − α_k)} / {[Π_{j=1}^{N} (z − α_j)]/(z − α_k)}

⁹[36, Appendix F][24, §§ 2.7 and 10.12]
¹⁰Terminology (you probably knew this already): A fraction is the ratio of two numbers or expressions B/A. In the fraction, B is the numerator and A is the denominator. The quotient is Q = B/A.

puts the several fractions over a common denominator, yielding (9.15). Dividing (9.15) by (9.16) gives the ratio

    1 = {[Σ_{k=0}^{N−1} b_k z^k] / [Π_{j=1}^{N} (z − α_j)]} / {Σ_{k=1}^{N} A_k/(z − α_k)}.

In the immediate neighborhood of z = α_m, the mth term A_m/(z − α_m) dominates the summation of (9.16). Hence,

    1 = lim_{z→α_m} {[Σ_{k=0}^{N−1} b_k z^k] / [Π_{j=1}^{N} (z − α_j)]} / {A_m/(z − α_m)}.

Rearranging factors, we have that

    A_m = {[Σ_{k=0}^{N−1} b_k z^k] / {[Π_{j=1}^{N} (z − α_j)]/(z − α_m)}}_{z=α_m}
        = lim_{z→α_m} [(z − α_m) f(z)],    (9.17)

where A_m, the value of f(z) with the pole canceled, is called the residue of f(z) at the pole z = α_m. Equations (9.16) and (9.17) together give the partial-fraction expansion of (9.15)'s rational function f(z).

9.6.2 Repeated poles

The weakness of the partial-fraction expansion of § 9.6.1 is that it cannot directly handle repeated poles. That is, if α_i = α_j, i ≠ j, then the residue formula (9.17) finds an uncanceled pole remaining in its denominator and thus fails for A_i = A_j (it still works for the other A_m). The conventional way to expand a fraction with repeated poles is presented in § 9.6.5 below; but because at least to this writer that way does not lend much applied insight, the present subsection treats the matter in a different way. Here, we separate the poles.

Consider the function

    g(z) = Σ_{k=0}^{N−1} C e^{i2πk/N}/(z − εe^{i2πk/N}),    N > 1, 0 < ε ≪ 1,    (9.18)

where C is a real-valued constant. This function evidently has a small circle of poles in the Argand plane at α_k = εe^{i2πk/N}. Factoring,

    g(z) = (C/z) Σ_{k=0}^{N−1} e^{i2πk/N}/[1 − εe^{i2πk/N}/z].
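The residue formula (9.17) is mechanical enough to code directly. This Python sketch (names mine) evaluates the canceled form of (9.17) for each distinct pole and reproduces the section's opening example:

```python
def residues(b, alphas):
    # (9.17): A_m = lim_{z -> alpha_m} (z - alpha_m) f(z), for
    # f(z) = (sum_k b_k z^k) / prod_j (z - alpha_j) with distinct poles.
    A = []
    for m, am in enumerate(alphas):
        num = sum(bk * am**k for k, bk in enumerate(b))
        den = 1.0
        for j, aj in enumerate(alphas):
            if j != m:
                den *= am - aj
        A.append(num / den)
    return A

# f(z) = (z + 3)/((z - 1)(z - 2)) = -4/(z - 1) + 5/(z - 2)
A = residues([3.0, 1.0], [1.0, 2.0])
assert abs(A[0] - (-4.0)) < 1e-12 and abs(A[1] - 5.0) < 1e-12
```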

Using (2.34) to expand the fraction,

    g(z) = (C/z) Σ_{k=0}^{N−1} e^{i2πk/N} Σ_{j=0}^{∞} (εe^{i2πk/N}/z)^j
         = C Σ_{k=0}^{N−1} Σ_{j=1}^{∞} ε^{j−1} e^{i2πjk/N}/z^j
         = C Σ_{j=1}^{∞} (ε^{j−1}/z^j) Σ_{k=0}^{N−1} (e^{i2πj/N})^k.

But¹¹

    Σ_{k=0}^{N−1} (e^{i2πj/N})^k = { N if j = mN; 0 otherwise },

so

    g(z) = NC Σ_{m=1}^{∞} ε^{mN−1}/z^{mN}.

For |z| ≫ ε—that is, except in the immediate neighborhood of the small circle of poles—the first term of the summation dominates. Hence,

    g(z) ≈ NC ε^{N−1}/z^N,    |z| ≫ ε.

Having achieved this approximation, if we strategically choose

    C = 1/(N ε^{N−1}),

then

    g(z) ≈ 1/z^N,    |z| ≫ ε.

But given the chosen value of C, (9.18) is

    g(z) = [1/(N ε^{N−1})] Σ_{k=0}^{N−1} e^{i2πk/N}/(z − εe^{i2πk/N}),    N > 1, 0 < ε ≪ 1.

¹¹If you don't see why, then for N = 8 and j = 3 plot the several (e^{i2πj/N})^k in the Argand plane. Do the same for j = 2 then j = 8. Only in the j = 8 case do the terms add coherently; in the other cases they cancel. This effect—reinforcing when j = mN, canceling otherwise—is a classic manifestation of Parseval's principle.

Joining the last two equations together, changing z − z_o ← z, and writing more formally, we have that

    1/(z − z_o)^N = lim_{ε→0} [1/(N ε^{N−1})] Σ_{k=0}^{N−1} e^{i2πk/N}/(z − z_o − εe^{i2πk/N}),    N > 1.    (9.19)

The significance of (9.19) is that it lets one replace an N-fold pole with a small circle of ordinary poles, which per § 9.6.1 we already know how to handle. Notice incidentally that 1/Nε^{N−1} is a large number not a small. The poles are close together but very strong.

An example to illustrate the technique, separating a double pole:

    f(z) = (z² − z + 6)/[(z − 1)²(z + 2)]
         = lim_{ε→0} (z² − z + 6)/{(z − [1 + εe^{i2π(0)/2}])(z − [1 + εe^{i2π(1)/2}])(z + 2)}
         = lim_{ε→0} (z² − z + 6)/{(z − [1 + ε])(z − [1 − ε])(z + 2)}
         = lim_{ε→0} { (1/(z − [1 + ε])) [(z² − z + 6)/((z − [1 − ε])(z + 2))]_{z=1+ε}
                     + (1/(z − [1 − ε])) [(z² − z + 6)/((z − [1 + ε])(z + 2))]_{z=1−ε}
                     + (1/(z + 2)) [(z² − z + 6)/((z − [1 + ε])(z − [1 − ε]))]_{z=−2} }
         = lim_{ε→0} { (1/(z − [1 + ε])) [(6 + ε)/(6ε + 2ε²)]
                     + (1/(z − [1 − ε])) [(6 − ε)/(−6ε + 2ε²)]
                     + (1/(z + 2)) [0xC/9] }
         = lim_{ε→0} { (1/ε − 1/6)/(z − [1 + ε]) + (−1/ε − 1/6)/(z − [1 − ε]) + (4/3)/(z + 2) }
         = lim_{ε→0} { (1/ε)/(z − [1 + ε]) + (−1/ε)/(z − [1 − ε]) } + (−1/3)/(z − 1) + (4/3)/(z + 2).

Notice how the calculation has discovered an additional, single pole at z = 1, hiding under the dominant, double pole there.
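The pole-separation identity (9.19) can be checked numerically: for a small ε, the circle of N ordinary poles should closely reproduce 1/(z − z_o)^N away from the circle. A Python sketch (names mine; the poles are placed at z_o + εe^{i2πk/N}):

```python
import cmath

def npole_approx(z, zo, N, eps):
    # (9.19): an N-fold pole at zo replaced by a small circle of N simple poles,
    #   1/(z - zo)^N  ~=  (1/(N eps^{N-1})) * sum_k e^{i2pi k/N}/(z - zo - eps e^{i2pi k/N}).
    s = 0.0 + 0.0j
    for k in range(N):
        u = cmath.exp(2j * cmath.pi * k / N)
        s += u / (z - zo - eps * u)
    return s / (N * eps**(N - 1))

z, zo, N = 2.0 + 1.0j, 0.5j, 3
exact = 1.0 / (z - zo)**N
assert abs(npole_approx(z, zo, N, 1e-3) - exact) < 1e-6
```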

9.6.3 Integrating a rational function

If one can find the poles of a rational function of the form (9.15), then one can use (9.16) and (9.17)—and, if needed, (9.19)—to expand the function into a sum of partial fractions, each of which one can integrate individually.

Continuing the example of § 9.6.2, for 0 ≤ x < 1,

    ∫₀^x f(τ) dτ = ∫₀^x (τ² − τ + 6)/[(τ − 1)²(τ + 2)] dτ
      = lim_{ε→0} ∫₀^x [ (1/ε)/(τ − [1 + ε]) + (−1/ε)/(τ − [1 − ε]) + (−1/3)/(τ − 1) + (4/3)/(τ + 2) ] dτ
      = lim_{ε→0} [ (1/ε) ln([1 + ε] − τ) − (1/ε) ln([1 − ε] − τ) − (1/3) ln(1 − τ) + (4/3) ln(τ + 2) ]₀^x
      = lim_{ε→0} [ (1/ε) ln{([1 + ε] − τ)/([1 − ε] − τ)} − (1/3) ln(1 − τ) + (4/3) ln(τ + 2) ]₀^x
      = lim_{ε→0} [ (1/ε) ln{([1 − τ] + ε)/([1 − τ] − ε)} − (1/3) ln(1 − τ) + (4/3) ln(τ + 2) ]₀^x
      = lim_{ε→0} [ (1/ε) ln{1 + 2ε/(1 − τ)} − (1/3) ln(1 − τ) + (4/3) ln(τ + 2) ]₀^x
      = lim_{ε→0} [ (1/ε)(2ε/(1 − τ)) − (1/3) ln(1 − τ) + (4/3) ln(τ + 2) ]₀^x
      = [ 2/(1 − τ) − (1/3) ln(1 − τ) + (4/3) ln(τ + 2) ]₀^x
      = 2/(1 − x) − 2 − (1/3) ln(1 − x) + (4/3) ln[(x + 2)/2].

To check (§ 7.5) that the result is correct, we can take the derivative of the final expression:

    d/dx [ 2/(1 − x) − 2 − (1/3) ln(1 − x) + (4/3) ln((x + 2)/2) ]_{x=τ}
      = 2/(τ − 1)² + (−1/3)/(τ − 1) + (4/3)/(τ + 2)
      = (τ² − τ + 6)/[(τ − 1)²(τ + 2)],

which indeed has the form of the integrand we started with, confirming the
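Beyond the symbolic derivative check, the closed-form antiderivative can be compared against direct quadrature of the original rational integrand on 0 ≤ x < 1. A Python sketch (names mine):

```python
import math

def f(t):
    # The rational integrand of the running example.
    return (t * t - t + 6.0) / ((t - 1.0)**2 * (t + 2.0))

def F(x):
    # Antiderivative found by partial fractions, valid for 0 <= x < 1:
    #   2/(1-x) - 2 - (1/3) ln(1-x) + (4/3) ln((x+2)/2)
    return (2.0 / (1.0 - x) - 2.0
            - math.log(1.0 - x) / 3.0
            + 4.0 * math.log((x + 2.0) / 2.0) / 3.0)

# Midpoint-rule integral from 0 to x agrees with the closed form.
x, n = 0.5, 100000
h = x / n
numeric = sum(f((k + 0.5) * h) for k in range(n)) * h
assert abs(numeric - F(x)) < 1e-6
```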

result. (Notice incidentally how much easier it is symbolically to differentiate than to integrate!)

9.6.4 The derivatives of a rational function

Not only the integral of a rational function interests us; its derivatives interest us, too. One needs no special technique to compute such derivatives, of course, but the derivatives do bring some noteworthy properties.

First of interest is the property that a function in the general rational form

    Φ(w) = w^p h₀(w)/g(w),    g(0) ≠ 0,    (9.20)

enjoys derivatives in the general rational form

    d^kΦ/dw^k = w^{p−k} h_k(w)/[g(w)]^{k+1},    0 ≤ k ≤ p,    (9.21)

where g and h_k are polynomials in nonnegative powers of w. The property is proved by induction. When k = 0, (9.21) is (9.20), so (9.21) is good at least for this case. Then, if (9.21) holds for k = i − 1,

    d^iΦ/dw^i = (d/dw)[d^{i−1}Φ/dw^{i−1}] = (d/dw){w^{p−i+1} h_{i−1}(w)/[g(w)]^i}
              = w^{p−i} h_i(w)/[g(w)]^{i+1},

    h_i(w) ≡ wg (dh_{i−1}/dw) − iwh_{i−1} (dg/dw) + (p − i + 1)gh_{i−1},    0 < i ≤ p,

which makes h_i (like h_{i−1}) a polynomial in nonnegative powers of w. By induction on this basis, (9.21) holds for all 0 ≤ k ≤ p, as was to be demonstrated.

A related property is that

    d^kΦ/dw^k |_{w=0} = 0    for 0 ≤ k < p.    (9.22)

That is, the function and its first p − 1 derivatives are all zero at w = 0. The reason is that, when 0 ≤ k < p and w = 0, (9.21)'s denominator is [g(w)]^{k+1} ≠ 0, whereas its numerator has a w^{p−k} = 0 factor.

9.6.5 Repeated poles (the conventional technique)

Though the technique of §§ 9.6.2 and 9.6.3 affords extra insight, it is not the conventional technique to expand in partial fractions a rational function

having a repeated pole. The conventional technique is worth learning not only because it is conventional but also because it is usually quicker to apply in practice. This subsection derives it.

A rational function with repeated poles,

    f(z) = [Σ_{k=0}^{N−1} b_k z^k] / [Π_{j=1}^{M} (z − α_j)^{p_j}],    (9.23)
    N ≡ Σ_{j=1}^{M} p_j,
    p_j ≥ 0,
    α_{j′} ≠ α_j if j′ ≠ j,

where j, k, M, N and the several p_j are integers, cannot be expanded solely in the first-order fractions of § 9.6.1, but can indeed be expanded if higher-order fractions are allowed:

    f(z) = Σ_{j=1}^{M} Σ_{ℓ=0}^{p_j−1} A_{jℓ}/(z − α_j)^{p_j−ℓ}.    (9.24)

What the partial-fraction expansion (9.24) lacks are the values of its several coefficients A_{jℓ}.

One can determine the coefficients with respect to one (possibly repeated) pole at a time. To determine them with respect to the p_m-fold pole at z = α_m, 1 ≤ m ≤ M, one multiplies (9.24) by (z − α_m)^{p_m} to obtain the form

    (z − α_m)^{p_m} f(z) = Σ_{j=1, j≠m}^{M} Σ_{ℓ=0}^{p_j−1} (A_{jℓ})(z − α_m)^{p_m}/(z − α_j)^{p_j−ℓ}
                         + Σ_{ℓ=0}^{p_m−1} (A_{mℓ})(z − α_m)^ℓ.

But (9.22) with w = z − α_m reveals the double summation and its first p_m − 1 derivatives all to be null at z = α_m; that is,

    (d^k/dz^k) [ Σ_{j=1, j≠m}^{M} Σ_{ℓ=0}^{p_j−1} (A_{jℓ})(z − α_m)^{p_m}/(z − α_j)^{p_j−ℓ} ]_{z=α_m} = 0,    0 ≤ k < p_m;

so, the (z − α_m)^{p_m} f(z) equation's kth derivative reduces at that point to

    (d^k/dz^k) [(z − α_m)^{p_m} f(z)]_{z=α_m}
      = Σ_{ℓ=0}^{p_m−1} (d^k/dz^k) [(A_{mℓ})(z − α_m)^ℓ]_{z=α_m}
      = k! A_{mk},    0 ≤ k < p_m.

Changing j ← m and ℓ ← k and solving for A_{jℓ} then produces the coefficients

    A_{jℓ} = (1/ℓ!) { (d^ℓ/dz^ℓ) [(z − α_j)^{p_j} f(z)] }_{z=α_j},    0 ≤ ℓ < p_j,    (9.25)

to weight the expansion (9.24)'s partial fractions. In case of a repeated pole, these coefficients evidently depend not only on the residual function itself but also on its several derivatives, one derivative per repetition of the pole.

9.6.6 The existence and uniqueness of solutions

Equation (9.25) has solved (9.23) and (9.24). A professional mathematician might object however that it has done so without first proving that a unique solution actually exists.

Comes from us the reply, "Why should we prove that a solution exists, once we have actually found it?"

Ah, but the professional's point is that we have found the solution only if in fact it does exist, and uniquely; otherwise what we have found is a phantom. A careful review of § 9.6.5's logic discovers no guarantee that all of (9.25)'s coefficients actually come from the same expansion. Maybe there exist two distinct expansions, and some of the coefficients come from the one, and some from the other. On the other hand, maybe there exists no expansion at all, in which event it is not even clear what (9.25) means.

"But these are quibbles, cavils and nitpicks!" we are inclined to grumble. "The present book is a book of applied mathematics."

Well, yes, but on this occasion let us nonetheless follow the professional's line of reasoning, if only a short way.

Uniqueness is proved by positing two solutions

    f(z) = Σ_{j=1}^{M} Σ_{ℓ=0}^{p_j−1} A_{jℓ}/(z − α_j)^{p_j−ℓ}
         = Σ_{j=1}^{M} Σ_{ℓ=0}^{p_j−1} B_{jℓ}/(z − α_j)^{p_j−ℓ}

and computing the difference

    Σ_{j=1}^{M} Σ_{ℓ=0}^{p_j−1} (B_{jℓ} − A_{jℓ})/(z − α_j)^{p_j−ℓ}

between them. Logically this difference must be zero for all z if the two solutions are actually to represent the same function f(z). This however is seen to be possible only if B_{jℓ} = A_{jℓ} for each (j, ℓ). Therefore, the two solutions are one and the same.

Existence comes of combining the several fractions of (9.24) over a common denominator and comparing the resulting numerator against the numerator of (9.23). Each coefficient b_k is seen thereby to be a linear combination of the several A_{jℓ}, where the combination's weights depend solely on the locations α_j and multiplicities p_j of f(z)'s several poles. From the N coefficients b_k and the N coefficients A_{jℓ}, an N × N system of N linear equations in N unknowns results—which might for example (if, say, N = 3) look like

    b₀ = −2A₀₀ + A₀₁ + 3A₁₀,
    b₁ = A₀₀ + A₀₁ + A₁₀,
    b₂ = 2A₀₁ − 5A₁₀.

We shall show in Chs. 11 through 14 that when such a system has no solution, there always exist an alternate set of b_k for which the same system has multiple solutions. But uniqueness, which we have already established, forbids such multiple solutions in all cases. Therefore it is not possible for the system to have no solution—which is to say, the solution necessarily exists.

We shall not often in this book prove existence and uniqueness explicitly, but such proofs when desired tend to fit the pattern outlined here.

9.7 Frullani's integral

One occasionally meets an integral of the form

    S = ∫₀^∞ [f(bτ) − f(aτ)]/τ dτ,

where a and b are real, positive coefficients and f(τ) is an arbitrary complex expression in τ. Such an integral, one wants to split in two as ∫ [f(bτ)/τ] dτ − ∫ [f(aτ)/τ] dτ; but if f(0⁺) ≠ f(+∞), one cannot, because each half-integral

alone diverges. Nonetheless, splitting the integral in two is the right idea, provided that one first relaxes the limits of integration as

    S = lim_{ε→0⁺} [ ∫_ε^{1/ε} f(bτ)/τ dτ − ∫_ε^{1/ε} f(aτ)/τ dτ ].

Changing σ ← bτ in the left integral and σ ← aτ in the right yields

    S = lim_{ε→0⁺} [ ∫_{bε}^{b/ε} f(σ)/σ dσ − ∫_{aε}^{a/ε} f(σ)/σ dσ ]
      = lim_{ε→0⁺} [ ∫_{a/ε}^{b/ε} f(σ)/σ dσ + ∫_{bε}^{a/ε} [f(σ) − f(σ)]/σ dσ − ∫_{aε}^{bε} f(σ)/σ dσ ]
      = lim_{ε→0⁺} [ ∫_{a/ε}^{b/ε} f(σ)/σ dσ − ∫_{aε}^{bε} f(σ)/σ dσ ]

(here on the face of it, we have split the integration as though a ≤ b, but in fact it does not matter which of a and b is the greater, as is easy to verify). So long as each of f(ε) and f(1/ε) approaches a constant value as ε vanishes, this is

    S = lim_{ε→0⁺} [ f(+∞) ∫_{a/ε}^{b/ε} dσ/σ − f(0⁺) ∫_{aε}^{bε} dσ/σ ]
      = lim_{ε→0⁺} [ f(+∞) ln{(b/ε)/(a/ε)} − f(0⁺) ln{(bε)/(aε)} ]
      = [f(τ)]₀^∞ ln(b/a).

Thus we have Frullani's integral,

    ∫₀^∞ [f(bτ) − f(aτ)]/τ dτ = [f(τ)]₀^∞ ln(b/a),    (9.26)

which, if a and b are both real and positive, works for any f(τ) which has definite f(0⁺) and f(+∞).¹²

¹²[32, § 1.3][1, § 2.5.1][51, "Frullani's integral"]
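Frullani's integral (9.26) is easy to exercise numerically with a concrete f. The sketch below (names mine) takes f(τ) = e^{−τ}, for which f(0⁺) = 1 and f(+∞) = 0, and integrates over the relaxed limits [ε, 1/ε] on a logarithmic grid (natural here, since dτ/τ = d ln τ):

```python
import math

def frullani_exact(a, b, f_inf, f_zero):
    # (9.26): integral_0^inf [f(b t) - f(a t)]/t dt = [f(inf) - f(0+)] ln(b/a)
    return (f_inf - f_zero) * math.log(b / a)

def frullani_numeric(a, b, f, eps=1e-6, n=400000):
    # Midpoint rule in u = ln(t) over [ln eps, ln(1/eps)]; dt/t = du.
    lo, hi = math.log(eps), math.log(1.0 / eps)
    h = (hi - lo) / n
    total = 0.0
    for k in range(n):
        t = math.exp(lo + (k + 0.5) * h)
        total += f(b * t) - f(a * t)
    return total * h

a, b = 2.0, 3.0
f = lambda t: math.exp(-t)   # f(0+) = 1, f(+inf) = 0
assert abs(frullani_numeric(a, b, f) - frullani_exact(a, b, 0.0, 1.0)) < 1e-4
```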

9.8 Integrating products of exponentials, powers and logarithms

The products exp(ατ)τⁿ and τ^{a−1} ln τ tend to arise among other places in integrands related to special functions ([chapter not yet written]). The two occur often enough to merit investigation here.

Concerning exp(ατ)τⁿ, by § 9.4's method of unknown coefficients we guess its antiderivative to fit the form

    exp(ατ)τⁿ = (d/dτ) Σ_{k=0}^{n} a_k exp(ατ)τ^k
              = Σ_{k=0}^{n} αa_k exp(ατ)τ^k + Σ_{k=1}^{n} k a_k exp(ατ)τ^{k−1}
              = αa_n exp(ατ)τⁿ + Σ_{k=0}^{n−1} [αa_k + (k + 1)a_{k+1}] exp(ατ)τ^k.

If so, then evidently

    a_n = 1/α,    a_k = −(k + 1)a_{k+1}/α,    0 ≤ k < n.

That is,

    a_k = (1/α) Π_{j=k+1}^{n} [−j/α]⁻¹·(−1)… = (−)^{n−k}/[(n!/k!)α^{n−k+1}],    0 ≤ k ≤ n.

Therefore,¹⁴

    exp(ατ)τⁿ = (d/dτ) Σ_{k=0}^{n} {(−)^{n−k}/[(n!/k!)α^{n−k+1}]} exp(ατ)τ^k,    n ≥ 0, α ≠ 0.    (9.27)

The right form to guess for the antiderivative of τ^{a−1} ln τ is less obvious. One could write the product more generally as τ^{a−1} ln βτ; but according to Table 2.5, ln βτ = ln β + ln τ, wherein ln β is just a constant, so the form τ^{a−1} ln τ suffices. Remembering however § 5.3's observation that ln τ is of zeroth order in τ, after maybe some false tries we eventually do strike the right form

    τ^{a−1} ln τ = (d/dτ) [τ^a (B ln τ + C)] = τ^{a−1} [aB ln τ + (B + aC)],

which demands that B = 1/a and that C = −1/a².

¹⁴[42, Appendix 2, eqn. 73]

Therefore,¹⁵

    τ^{a−1} ln τ = (d/dτ) [(τ^a/a)(ln τ − 1/a)],    a ≠ 0.    (9.28)

Equation (9.28) fails when a = 0, but in this case with a little imagination the antiderivative is not hard to guess:

    (ln τ)/τ = (d/dτ) [(ln τ)²/2].    (9.29)

If (9.29) seemed hard to guess nevertheless, then l'Hôpital's rule (4.29), applied to (9.28) as a → 0, with the observation from (2.41) that τ^a = exp(a ln τ), would yield the same (9.29).

Table 9.1 summarizes. Antiderivatives of terms like τ^{a−1}(ln τ)², exp(ατ)τⁿ ln τ and so on can be computed in like manner as the need arises.

Table 9.1: Antiderivatives of products of exponentials, powers and logarithms.

    exp(ατ)τⁿ = (d/dτ) Σ_{k=0}^{n} {(−)^{n−k}/[(n!/k!)α^{n−k+1}]} exp(ατ)τ^k,    α ≠ 0, n ≥ 0
    τ^{a−1} ln τ = (d/dτ) [(τ^a/a)(ln τ − 1/a)],    a ≠ 0
    (ln τ)/τ = (d/dτ) [(ln τ)²/2]

9.9 Integration by Taylor series

With sufficient cleverness the techniques of the foregoing sections solve many, many integrals. But not all. When all else fails, as sometimes it does, the Taylor series of Ch. 8 and the antiderivative of § 9.1 together offer a concise,

¹⁵[42, Appendix 2, eqn. 74]
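The logarithm entries of Table 9.1 admit the same numerical-differentiation check. A short Python sketch (names mine):

```python
import math

def antideriv_power_log(tau, a):
    # (9.28): an antiderivative of t^{a-1} ln t is (t^a / a)(ln t - 1/a), a != 0.
    return (tau**a / a) * (math.log(tau) - 1.0 / a)

def antideriv_log_over_t(tau):
    # (9.29): an antiderivative of (ln t)/t is (ln t)^2 / 2.
    return math.log(tau)**2 / 2.0

h = 1e-6
tau, a = 2.3, 0.5
d = (antideriv_power_log(tau + h, a) - antideriv_power_log(tau - h, a)) / (2 * h)
assert abs(d - tau**(a - 1.0) * math.log(tau)) < 1e-6

d = (antideriv_log_over_t(tau + h) - antideriv_log_over_t(tau - h)) / (2 * h)
assert abs(d - math.log(tau) / tau) < 1e-6
```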

practical way to integrate some functions, at the price of losing the functions' known closed analytic forms. For example,

    ∫₀^x exp(−τ²/2) dτ = ∫₀^x Σ_{k=0}^{∞} (−τ²/2)^k/k! dτ
      = ∫₀^x Σ_{k=0}^{∞} (−)^k τ^{2k}/(2^k k!) dτ
      = [ Σ_{k=0}^{∞} (−)^k τ^{2k+1}/((2k + 1)2^k k!) ]₀^x
      = Σ_{k=0}^{∞} (−)^k x^{2k+1}/((2k + 1)2^k k!)
      = (x) Σ_{k=0}^{∞} [1/(2k + 1)] Π_{j=1}^{k} (−x²/2j).

The result is no function one recognizes; it is just a series. This is not necessarily bad, however. After all, when a Taylor series from Table 8.1 is used to calculate sin z, then sin z is just a series, too. The series above converges just as accurately and just as fast.

Sometimes it helps to give the series a name like

    myf z ≡ Σ_{k=0}^{∞} (−)^k z^{2k+1}/((2k + 1)2^k k!) = (z) Σ_{k=0}^{∞} [1/(2k + 1)] Π_{j=1}^{k} (−z²/2j).

Then,

    ∫₀^x exp(−τ²/2) dτ = myf x.

The myf z is no less a function than sin z is; it's just a function you hadn't heard of before. You can plot the function, or calculate its value, or take its derivative

    (d/dτ) myf τ = exp(−τ²/2),

or do with it whatever else one does with functions. It works just the same.
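The named series really does behave like any other function. This Python sketch (the name myf follows the text's; the implementation details are mine) sums a truncation of the series and compares it against brute-force quadrature of exp(−τ²/2):

```python
import math

def myf(z, terms=60):
    # myf z = (z) * sum_{k>=0} [1/(2k+1)] * prod_{j=1}^{k} (-z^2 / (2j)),
    # accumulating the product factor incrementally.
    total, prod = 0.0, 1.0
    for k in range(terms):
        total += prod / (2 * k + 1)
        prod *= -z * z / (2.0 * (k + 1))
    return z * total

def integral_numeric(x, n=100000):
    # Midpoint-rule quadrature of exp(-t^2/2) from 0 to x.
    h = x / n
    return sum(math.exp(-(((k + 0.5) * h)**2) / 2.0) for k in range(n)) * h

for x in (0.5, 1.0, 2.0):
    assert abs(myf(x) - integral_numeric(x)) < 1e-8
```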


Chapter 10

Cubics and quartics

Under the heat of noonday, between the hard work of the morning and the heavy lifting of the afternoon, one likes to lay down one's burden and rest a spell in the shade. Chapters 2 through 9 have established the applied mathematical foundations upon which coming chapters will build; and Ch. 11, hefting the weighty topic of the matrix, will indeed begin to build on those foundations. But in this short chapter which rests between, we shall refresh ourselves with an interesting but lighter mathematical topic: the topic of cubics and quartics.

The expression z + a₀ is a linear polynomial, the lone root z = −a₀ of which is plain to see. The quadratic polynomial z² + a₁z + a₀ has of course two roots, which though not plain to see the quadratic formula (2.2) extracts with little effort. So much algebra has been known since antiquity. One would prefer an actual formula, however, to extract the roots of the cubic and quartic polynomials

    z³ + a₂z² + a₁z + a₀,
    z⁴ + a₃z³ + a₂z² + a₁z + a₀.

The roots of higher-order polynomials, the Newton-Raphson iteration (4.30) locates swiftly, but that is an approximate iteration rather than an exact formula like (2.2), and as we have seen in § 4.8 it can occasionally fail to converge. No general formula to extract the roots of the nth-order polynomial seems to be known,¹ though the ancients never discovered how, to extract the roots of the cubic and quartic polynomials formulas do exist. The 16th-century algebraists Ferrari, Vieta, Tartaglia and Cardano have given us the clever technique. This chapter explains.²

¹Refer to Ch. 6's footnote 8.

2 10.” 05:17. “Fran¸ois Vi`te. w+ An interesting alternative to Vieta’s transform is w 2 wo ← z. but as |w| approaches |wo | this 2 ceases to be true. 31 Oct.3) 2 [51.4 For |w| ≫ |wo |. “Quartic equation”][52.” [51. Figure 10. 4 Also called “Vieta’s substitution. “Quartic equation. and so forth. Section 10. z ≈ wo /w. For |w| ≪ |wo |. 1/3 resembles 3. 9 Nov. (10. we have that z ≈ w. “Gerolamo Cardano.2 Cubics The general cubic polynomial is too hard to extract the roots of directly. The constant wo is the corner value. so one begins by changing the variable x+h←z (10. 1 ← z.5] 3 This change of variable broadly recalls the sum-of-exponentials form (5. one can transform a function f (z) into a function f (w) by the change of variable3 w+ or.1) w Equation (10.3 might be named Vieta’s parallel transform. 2006][52. “Cubic equation”][51. 10. c e 2006][46.1 plots Vieta’s transform for real w in the case wo = 1. The 16thcentury algebraists Ferrari.1) is Vieta’s transform. 1 Nov. Vieta. w (10.2) which in light of § 6. To capture this sense. “Vieta’s substitution”] .” 22:35. § 1. 2006][52. inasmuch as exp[−φ] = 1/ exp φ.” 00:26.2 shows how Vieta’s transform can be used.1 Vieta’s transform There is a sense to numbers by which 1/2 resembles 2. This chapter explains. formulas do exist. in the neighborhood of which w transitions from the one domain to the other. Tartaglia and Cardano have given us the clever technique. more generally. 1/4 resembles 4.19) of the cosh(·) function. CUBICS AND QUARTICS though the ancients never discovered how.220 CHAPTER 10. w 2 wo ← z.

Figure 10.1: Vieta's transform (10.1) for w_o = 1, plotted logarithmically (axes ln w and ln z).

to obtain the polynomial

    x³ + (a₂ + 3h)x² + (a₁ + 2ha₂ + 3h²)x + (a₀ + ha₁ + h²a₂ + h³).

The choice

    h ≡ −a₂/3    (10.4)

casts the polynomial into the improved form

    x³ + [a₁ − 3(a₂/3)²]x + [a₀ − a₁a₂/3 + 2(a₂/3)³],

or better yet

    x³ − px − q,

where

    p ≡ −a₁ + 3(a₂/3)²,
    q ≡ −a₀ + a₁a₂/3 − 2(a₂/3)³.    (10.5)

The solutions to the equation

    x³ = px + q,    (10.6)

then, are the cubic polynomial's three roots.

So we have struck the a₂z² term. That was the easy part; what to do next is not so obvious. If one could strike the px term as well, then the

roots would follow immediately, but no very simple substitution like (10.3) achieves this—or rather, such a substitution does achieve it, but at the price of reintroducing an unwanted x² or z² term. That way is no good. Lacking guidance, one might try many, various substitutions, none of which seems to help much; but after weeks or months of such frustration one might eventually discover Vieta's transform (10.1), with the idea of balancing the equation between offsetting w and 1/w terms. This works.

Vieta-transforming (10.6) by the change of variable

    w + w_o²/w ← x,    (10.7)

we get the new equation

    w³ + (3w_o² − p)w + (3w_o² − p)(w_o²/w) + w_o⁶/w³ = q,    (10.8)

which invites the choice

    w_o² ≡ p/3,    (10.9)

reducing (10.8) to read

    w³ + (p/3)³/w³ = q.

Multiplying by w³ and rearranging terms, we have the quadratic equation

    (w³)² = 2(q/2)w³ − (p/3)³,    (10.10)

which by (2.2) we know how to solve.

Vieta's transform has reduced the original cubic to a quadratic. The careful reader will observe that (10.10) seems to imply six roots, double the three the fundamental theorem of algebra (§ 6.2) allows a cubic polynomial to have. We shall return to this point in § 10.3. For the moment, however, we should like to improve the notation by defining⁵

    P ← −p/3,    Q ← +q/2,    (10.11)

⁵Why did we not define P and Q so to begin with? Well, before unveiling (10.10), we lacked motivation to do so. To define inscrutable coefficients unnecessarily before the need for them is apparent seems poor applied mathematical style.

With the notation of (10.11), equations (10.6) and (10.10) are written

    x³ = 2Q − 3Px,    (10.12)
    (w³)² = 2Qw³ + P³.    (10.13)

Table 10.1 summarizes the complete cubic polynomial root extraction method in the revised notation, including a few fine points regarding superfluous roots and edge cases, treated in §§10.3 and 10.4 below.

Table 10.1: The method to extract the three roots of the general cubic polynomial. (In the definition of w³, one can choose either sign; it matters not which.)

    0 = z³ + a₂z² + a₁z + a₀
    P ≡ a₁/3 − (a₂/3)²
    Q ≡ (1/2)[−a₀ + 3(a₂/3)(a₁/3) − 2(a₂/3)³]
    w³ ≡ 2Q if P = 0; Q ± √(Q² + P³) otherwise
    x ≡ 0 if P = 0 and Q = 0; w − P/w otherwise
    z = x − a₂/3

10.3 Superfluous roots

As §10.2 has observed, the equations of Table 10.1 seem to imply six roots, double the three the fundamental theorem of algebra (§6.2.2) allows a cubic polynomial to have. However, what the equations really imply is not six distinct roots but six distinct w. The definition x ≡ w − P/w maps two w to any one x, so in fact the equations imply only three x and thus three roots z. The question then is: of the six w, which three do we really need and which three can we ignore as superfluous?

The six w naturally come in two groups of three: one group of three from the one w³ and a second from the other. For this reason, we shall guess (and logically it is only a guess) that a single w³ generates three distinct x and thus, because z differs from x only by a constant offset, all three roots z.
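Table 10.1 transcribes almost line for line into code. The sketch below is ours, not the book's: the function name `cubic_roots` is an assumption, and `cmath` supplies the complex arithmetic. It follows the table literally, including the P = 0 edge case and the P = Q = 0 corner case.

```python
import cmath

def cubic_roots(a2, a1, a0):
    """Roots of z^3 + a2 z^2 + a1 z + a0 = 0, per Table 10.1 (a sketch)."""
    P = a1/3 - (a2/3)**2
    Q = (-a0 + 3*(a2/3)*(a1/3) - 2*(a2/3)**3) / 2
    # either sign of the radical serves; we take +
    w3 = 2*Q if P == 0 else Q + cmath.sqrt(Q**2 + P**3)
    roots = []
    for k in range(3):                  # the three cube roots w of w^3
        if P == 0 and Q == 0:
            x = 0.0                     # corner case: triple root
        else:
            w = abs(w3)**(1/3) * cmath.exp(1j*(cmath.phase(w3) + 2*cmath.pi*k)/3)
            x = w - P/w
        roots.append(x - a2/3)          # z = x - a2/3
    return roots

# 0 = z^3 - z^2 + z - 1 = (z - 1)(z - i)(z + i)
for r in cubic_roots(-1, 1, -1):
    assert abs(r**3 - r**2 + r - 1) < 1e-9
```

The residual check at the end avoids having to match the roots in any particular order.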

If the guess is right, then the second w³ cannot but yield the same three roots, which means that the second w³ is superfluous and can safely be overlooked. But is the guess right? Does a single w³ in fact generate three distinct x?

To prove that it does, let us suppose that it did not. Let us suppose that a single w³ did generate two w which led to the same x. Letting the symbol w₁ represent the third w, then (since all three w come from the same w³) the two w are e^{+i2π/3}w₁ and e^{−i2π/3}w₁. Because x ≡ w − P/w, by successive steps,

    e^{+i2π/3}w₁ − P/(e^{+i2π/3}w₁) = e^{−i2π/3}w₁ − P/(e^{−i2π/3}w₁),
    e^{+i2π/3}w₁ − (P/w₁)e^{−i2π/3} = e^{−i2π/3}w₁ − (P/w₁)e^{+i2π/3},
    (e^{+i2π/3} − e^{−i2π/3})w₁ = −(P/w₁)(e^{+i2π/3} − e^{−i2π/3}),

which can only be true if

    w₁² = −P.

Cubing⁶ the last equation,

    w₁⁶ = −P³;

but squaring the table's w³ definition for w = w₁,

    w₁⁶ = 2Q² + P³ ± 2Q√(Q² + P³).

Combining the last two on w₁⁶,

    −P³ = 2Q² + P³ ± 2Q√(Q² + P³),

or, rearranging terms and halving,

    Q² + P³ = ∓Q√(Q² + P³).

Squaring,

    Q⁴ + 2Q²P³ + P⁶ = Q⁴ + Q²P³,

then canceling offsetting terms and factoring,

    (P³)(Q² + P³) = 0.

Footnote 6: The verb to cube in this context means "to raise to the third power," as to change y to y³, just as the verb to square means "to raise to the second power."

The last equation demands rigidly that either P = 0 or P³ = −Q². Some cubic polynomials do meet the demand; §10.4 will treat these, and the reader is asked to set them aside for the moment. Most cubic polynomials, however, do not meet it. For most cubic polynomials, then, the contradiction proves false the assumption which gave rise to it. The assumption: that the three x descending from a single w³ were not distinct. Therefore, provided that P ≠ 0 and P³ ≠ −Q², the three x descending from a single w³ are indeed distinct, as was to be demonstrated.

The conclusion: either, not both, of the two signs in the table's quadratic solution w³ ≡ Q ± √(Q² + P³) demands to be considered. One can choose either sign; it matters not which.⁷ The one sign alone yields all three roots of the general cubic polynomial.

In calculating the three w from w³, one can apply the Newton-Raphson iteration (4.32), the Taylor series of Table 8.1, or any other convenient root-finding technique to find a single root w₁ such that w₁³ = w³. Then the other two roots come easier: since e^{±i2π/3} = (−1 ± i√3)/2, they are

    w = w₁, [(−1 ± i√3)/2] w₁.    (10.14)

We should observe, incidentally, that nothing prevents two actual roots of a cubic polynomial from having the same value. This certainly is possible, and it does not mean that one of the two roots is superfluous or that the polynomial has fewer than three roots. For example, the cubic polynomial (z − 1)(z − 1)(z − 2) = z³ − 4z² + 5z − 2 has roots at 1, 1 and 2, with a single root at z = 2 and a double root, that is, two roots, at z = 1. When this happens, the method of Table 10.1 properly yields the single root once and the double root twice, just as it ought to do.

Footnote 7: Numerically, it can matter. As a simple rule, when the two w³ differ in magnitude one might choose the larger, because w appears in the denominator of x's definition.

10.4 Edge cases

Section 10.3 excepts the edge cases P = 0 and P³ = −Q². Mostly the book does not worry much about edge cases, but the effects of these cubic edge cases seem sufficiently nonobvious that the book might include here a few words about them, if for no other reason than to offer the reader a model of how to think about edge cases on his own.
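Two of the section's claims invite a quick numerical check: that either sign of the radical in w³ yields the same three roots, and that a double root is returned twice rather than dropped. The sketch below is ours; the helper names `roots_from` and `coeffs_PQ` are assumptions, not the book's notation.

```python
import cmath

def roots_from(w3, P, a2):
    """The three z generated by one quadratic solution w^3 (helper, a sketch)."""
    zs = []
    for k in range(3):
        w = abs(w3)**(1/3) * cmath.exp(1j*(cmath.phase(w3) + 2*cmath.pi*k)/3)
        zs.append(w - P/w - a2/3)
    return zs

def coeffs_PQ(a2, a1, a0):
    """P and Q of Table 10.1."""
    P = a1/3 - (a2/3)**2
    Q = (-a0 + 3*(a2/3)*(a1/3) - 2*(a2/3)**3) / 2
    return P, Q

# z^3 - z^2 + z - 1: a non-edge cubic (P != 0, P^3 != -Q^2)
P, Q = coeffs_PQ(-1.0, 1.0, -1.0)
plus = roots_from(Q + cmath.sqrt(Q**2 + P**3), P, -1.0)
minus = roots_from(Q - cmath.sqrt(Q**2 + P**3), P, -1.0)
assert all(min(abs(a - b) for b in minus) < 1e-9 for a in plus)  # same three roots

# (z - 1)^2 (z - 2) = z^3 - 4z^2 + 5z - 2: the double root appears twice
P, Q = coeffs_PQ(-4.0, 5.0, -2.0)
zs = sorted(roots_from(Q + cmath.sqrt(Q**2 + P**3), P, -4.0), key=lambda z: z.real)
assert all(abs(z - t) < 1e-6 for z, t in zip(zs, [1, 1, 2]))
```

The second example happens to sit on the P³ = −Q² edge, so the radical is numerically near zero and either sign again gives the same w³.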

Table 10.1 gives the quadratic solution

    w³ ≡ Q ± √(Q² + P³),

in which §10.3 generally finds it sufficient to consider either of the two signs, and does not worry about choosing one w³ over the other. In this section, we shall consider first the edge cases themselves, then their effect on the proof of §10.3.

The edge case P = 0, like the general non-edge case, gives two distinct quadratic solutions w³. One of the two, however, is w³ = Q − Q = 0, which is awkward in light of Table 10.1's definition that x ≡ w − P/w, for it puts an awkward 0/0 in the table's definition of x. This is fine; in applying the table's method when P = 0, one simply chooses the other quadratic solution, w³ = Q + Q = 2Q.

The edge case P³ = −Q² gives only the one quadratic solution w³ = Q; or, more precisely, it gives two quadratic solutions which happen to have the same value. One merely accepts that both w³ are the same, so that the one w³ suffices by default, because the other w³ brings nothing different.

The double edge case, or corner case, arises where the two edges meet, where P = 0 and P³ = −Q², or equivalently where P = 0 and Q = 0. At the corner, the trouble is that w³ = 0 and that no alternate w³ is available. However, according to (10.12),

    x³ = 2Q − 3Px,

which in the corner case means that x³ = 0 and thus that x = 0 absolutely, no other x being possible. This implies the triple root z = −a₂/3.

Both edge cases are interesting. Section 10.3 has excluded them from its proof of the sufficiency of a single w³, in effect by sidestepping them. Let us now add the edge cases to the proof. In the edge case P³ = −Q², both w³ are the same, so the one w³ suffices by default. The edge case P = 0 however does give two distinct w³, one of which is w³ = 0. We address this edge in the spirit of l'Hôpital's rule, changing P infinitesimally from P = 0 to P = ε. Then, choosing the − sign in the definition of w³,

    w³ = Q − √(Q² + ε³) = Q − (Q)(1 + ε³/2Q²) = −ε³/2Q,
    w = −ε/(2Q)^{1/3},
    x = w − ε/w = −ε/(2Q)^{1/3} + (2Q)^{1/3} = (2Q)^{1/3}.

But choosing the + sign,

    w³ = Q + √(Q² + ε³) = 2Q,
    w = (2Q)^{1/3},
    x = w − ε/w = (2Q)^{1/3} − ε/(2Q)^{1/3} = (2Q)^{1/3}.

Evidently the roots come out the same, either way. This completes the proof.

10.5 Quartics

Having successfully extracted the roots of the general cubic polynomial, we now turn our attention to the general quartic. The kernel of the cubic technique lay in reducing the cubic to a quadratic. The kernel of the quartic technique lies likewise in reducing the quartic to a cubic. The details differ, though; and, strangely enough, in some ways the quartic reduction is actually the simpler.⁸

As with the cubic, one begins solving the quartic by changing the variable

    x + h ← z,    where h ≡ −a₃/4,

to obtain the equation

    x⁴ = sx² + px + q,    (10.15)

where

    s ≡ −a₂ + 6(a₃/4)²,
    p ≡ −a₁ + 2a₂(a₃/4) − 8(a₃/4)³,    (10.16)
    q ≡ −a₀ + a₁(a₃/4) − a₂(a₃/4)² + 3(a₃/4)⁴.    (10.17)

Footnote 8: Even stranger, historically Ferrari discovered the quartic solution earlier [51, "Quartic equation"]. Apparently Ferrari discovered the quartic's resolvent cubic (10.22), which he could not solve until Tartaglia applied Vieta's transform to it. What motivated Ferrari to chase the quartic solution while the cubic solution remained still unknown, this writer does not know, but one supposes that it might make an interesting story. The reason the quartic is simpler to reduce is probably related to the fact that (1)^{1/4} = ±1, ±i, whereas (1)^{1/3} = 1, (−1 ± i√3)/2. The (1)^{1/4} brings a much neater result, the roots lying nicely along the Argand axes. This may also be why the quintic is intractable; but here we trespass the professional mathematician's territory and stray from the scope of this book. See Ch. 6's footnote 8.

To reduce (10.15) further, one must be cleverer. Ferrari⁹ supplies the cleverness. The clever idea is to transfer some but not all of the sx² term to the equation's left side by writing

    x⁴ + 2ux² = (2u + s)x² + px + q,

where u remains to be chosen, then to complete the square on the equation's left side as in §2.2, but with respect to x² rather than x, as

    (x² + u)² = k²x² + px + j²,    (10.18)

where

    k² ≡ 2u + s,
    j² ≡ u² + q.    (10.19)

In these equations, s, p and q have definite values fixed by (10.16) and (10.17), but not so u, j or k. The variable u is completely free; we have introduced it ourselves and can assign it any value we like. And though j² and k² depend on u, still, even after specifying u we remain free at least to choose signs for j and k. As for u, though no choice would truly be wrong, one supposes that a wise choice might at least render (10.18) easier to simplify.

So, what choice for u would be wise? Well, look at (10.18). The left side of that equation is a perfect square. The right side would be, too, if it were that p = ±2jk. Arbitrarily choosing the + sign, we propose the constraint that

    p = 2jk,    (10.20)

or, better expressed,

    j = p/2k.    (10.21)

Squaring (10.20) and substituting for j² and k² from (10.19), we have that

    p² = 4(2u + s)(u² + q);

or, after distributing factors, rearranging terms and scaling, that

    0 = u³ + (s/2)u² + qu + (4sq − p²)/8.    (10.22)

Footnote 9: [51, "Quartic equation"]

Equation (10.22) is the resolvent cubic, which we know by Table 10.1 how to solve for u, and which we now specify as a second constraint. The resolvent cubic of course yields three u not one, but since any one u honors the constraint, we can just pick one u and ignore the other two. With u established, (10.21) then gives j, and (10.19) gives k (again, we can just pick one of the two signs).

If the constraints (10.20) and (10.22) are both honored, then we can safely substitute (10.20) into (10.18) to reach the form

    (x² + u)² = k²x² + 2jkx + j²,

which is

    (x² + u)² = (kx + j)².    (10.23)

Equation (10.23) implies the quadratic

    x² = ±(kx + j) − u,    (10.24)

which (2.2) solves as

    x = ±k/2 ±ₒ √((k/2)² ± j − u),    (10.25)

wherein the two ± signs are tied together but the third, ±ₒ sign is independent of the two. Equation (10.25), with the other equations and definitions of this section, reveals the four roots of the general quartic polynomial.

In view of (10.25), the change of variables

    K ← k/2,
    J ← j,    (10.26)

improves the notation. Using the improved notation, Table 10.2 summarizes the complete quartic polynomial root extraction method.

10.6 Guessing the roots

It is entertaining to put pencil to paper and use Table 10.1's method to extract the roots of the cubic polynomial

    0 = [z − 1][z − i][z + i] = z³ − z² + z − 1.

Table 10.2: The method to extract the four roots of the general quartic polynomial. (In the table, the resolvent cubic is solved for u by the method of Table 10.1, where any one of the three resulting u serves. Either of the two K similarly serves. Of the three ± signs in x's definition, the ±ₒ is independent but the other two are tied together, the four resulting combinations giving the four roots of the general quartic.)

    0 = z⁴ + a₃z³ + a₂z² + a₁z + a₀
    s ≡ −a₂ + 6(a₃/4)²
    p ≡ −a₁ + 2a₂(a₃/4) − 8(a₃/4)³
    q ≡ −a₀ + a₁(a₃/4) − a₂(a₃/4)² + 3(a₃/4)⁴
    0 = u³ + (s/2)u² + qu + (4sq − p²)/8
    K ≡ ±√(2u + s) / 2
    J ≡ ±√(u² + q) if K = 0; p/4K otherwise
    x ≡ ±K ±ₒ √(K² ± J − u)
    z = x − a₃/4
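Table 10.2 likewise transcribes into code. The sketch below is ours (the function names are assumptions): it nests the cubic method of Table 10.1 inside to extract one resolvent root u, then forms the four sign combinations of the table. A crude numeric tolerance stands in for the table's exact K = 0 test.

```python
import cmath

def one_cubic_root(b2, b1, b0):
    """One root u of u^3 + b2 u^2 + b1 u + b0 = 0, per Table 10.1 (a sketch)."""
    P = b1/3 - (b2/3)**2
    Q = (-b0 + 3*(b2/3)*(b1/3) - 2*(b2/3)**3) / 2
    if P == 0 and Q == 0:
        return -b2/3                    # corner case: triple root
    w3 = complex(2*Q) if P == 0 else Q + cmath.sqrt(Q**2 + P**3)
    w = w3**(1/3)                       # principal cube root
    return w - P/w - b2/3

def quartic_roots(a3, a2, a1, a0):
    """Roots of z^4 + a3 z^3 + a2 z^2 + a1 z + a0 = 0, per Table 10.2 (a sketch)."""
    h = a3/4
    s = -a2 + 6*h**2
    p = -a1 + 2*a2*h - 8*h**3
    q = -a0 + a1*h - a2*h**2 + 3*h**4
    u = one_cubic_root(s/2, q, (4*s*q - p**2)/8)   # any resolvent root serves
    K = cmath.sqrt(2*u + s)/2                      # either sign serves
    J = cmath.sqrt(u**2 + q) if abs(K) < 1e-6 else p/(4*K)
    return [sg*K + so*cmath.sqrt(K**2 + sg*J - u) - h
            for sg in (1, -1) for so in (1, -1)]   # z = x - a3/4

# (z - 1)(z - 2)(z - 3)(z - 4) = z^4 - 10z^3 + 35z^2 - 50z + 24
for r in quartic_roots(-10, 35, -50, 24):
    assert abs(r**4 - 10*r**3 + 35*r**2 - 50*r + 24) < 1e-6
```

The residual check again sidesteps root ordering; z⁴ − 1 exercises the K = 0 branch.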

One finds that

    z = 1/3 + w − 2/(3²w),
    w³ ≡ (2/3³)(5 + √(3³)),

which says indeed that z = 1, but just you try to simplify it! Figuring the square and cube roots in the expression numerically, the root of the polynomial comes mysteriously to 1.0000. One has found the number 1 but cannot recognize it; the root's symbolic form gives little clue that z = 1, but why? A more baroque, more impenetrable way to write the number 1 is not easy to conceive. In general no better way is known;¹⁰ we are stuck with the cubic baroquity.

However, to the extent that a cubic, a quartic, a quintic or any other polynomial has real, rational roots, a trick is known to sidestep Tables 10.1 and 10.2 and guess the roots directly. (If the roots are complex or irrational, they are hard to guess.) Consider for example the quintic polynomial

    z⁵ − (7/2)z⁴ + 4z³ + (1/2)z² − 5z + 3.

Doubling to make the coefficients all integers produces the polynomial

    2z⁵ − 7z⁴ + 8z³ + 1z² − 0xAz + 6,

which naturally has the same roots. If any of the roots happens to be real and rational, it must belong to the set

    {±1, ±2, ±3, ±6, ±1/2, ±2/2, ±3/2, ±6/2}.

No other real, rational root is possible. The real, rational candidates are the factors of the polynomial's trailing coefficient (in the example, 6, whose factors are ±1, ±2, ±3 and ±6) divided by the factors of the polynomial's leading coefficient (in the example, 2, whose factors are ±1 and ±2). Trying the several candidates on the polynomial, one finds that 1, −1 and 3/2 are indeed roots. Dividing these out leaves a quadratic which is easy to solve for the remaining roots.

Footnote 10: At least, no better way is known to this author. If any reader can straightforwardly simplify the expression without solving a cubic polynomial of some kind, the author would like to hear of it.
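The candidate-testing routine is mechanical enough to automate. The sketch below is ours (the function name is an assumption); Python's `fractions.Fraction` keeps the arithmetic exact, and Horner evaluation tests each factor of a₀ over each factor of aₙ.

```python
from fractions import Fraction

def rational_roots(coeffs):
    """Real, rational roots of an integer polynomial, highest power first (a sketch)."""
    an, a0 = coeffs[0], coeffs[-1]
    def factors(n):
        n = abs(n)
        return [d for d in range(1, n + 1) if n % d == 0]
    found = set()
    for num in factors(a0):
        for den in factors(an):
            for cand in (Fraction(num, den), Fraction(-num, den)):
                acc = Fraction(0)
                for c in coeffs:        # Horner evaluation, exact
                    acc = acc*cand + c
                if acc == 0:
                    found.add(cand)
    return found

# 2z^5 - 7z^4 + 8z^3 + z^2 - 0xA z + 6 (0xA is ten)
assert rational_roots([2, -7, 8, 1, -0xA, 6]) == {Fraction(1), Fraction(-1), Fraction(3, 2)}
```

A zero trailing coefficient would first need the root z = 0 divided out, which this sketch does not handle.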

Why no other real, rational root is possible is seen¹¹ by writing z = p/q, where p and q are integers and the fraction p/q is fully reduced, then multiplying the nth-order polynomial by qⁿ to reach the form

    aₙpⁿ + aₙ₋₁pⁿ⁻¹q + · · · + a₁pqⁿ⁻¹ + a₀qⁿ = 0,

where all the coefficients aₖ are integers. Moving the qⁿ term to the equation's right side, we have that

    (aₙpⁿ⁻¹ + aₙ₋₁pⁿ⁻²q + · · · + a₁qⁿ⁻¹)p = −a₀qⁿ,

which implies that a₀qⁿ is a multiple of p. But by demanding that the fraction p/q be fully reduced, we have defined p and q to be relatively prime to one another, that is, we have defined them to have no factors but ±1 in common; so, not only a₀qⁿ but a₀ itself is a multiple of p. By similar reasoning, aₙ is a multiple of q. But if a₀ is a multiple of p, and aₙ, a multiple of q, then p and q are factors of a₀ and aₙ respectively. We conclude for this reason, as was to be demonstrated, that no real, rational root is possible except a factor of a₀ divided by a factor of aₙ.¹²

Such root-guessing is little more than an algebraic trick, of course, but it can be a pretty useful trick if it saves us the embarrassment of inadvertently expressing simple rational numbers in ridiculous ways.

One could write much more about higher-order algebra, but now that the reader has tasted the topic he may feel inclined to agree that, though the general methods this chapter has presented to solve cubics and quartics are interesting, further effort were nevertheless probably better spent elsewhere. The next several chapters turn to the topic of the matrix, harder but much more profitable, toward which we mean to put substantial effort.

Footnote 11: The presentation here is quite informal; we do not want to spend many pages on this.
Footnote 12: [46, §3.2]

Chapter 11
The matrix

Chapters 2 through 9 have laid solidly the basic foundations of applied mathematics. This chapter begins to build on those foundations, demanding some heavier mathematical lifting.

Taken by themselves, most of the foundational methods of the earlier chapters have handled only one or at most a few numbers (or functions) at a time; however, in practical applications the need to handle large arrays of numbers at once arises often. Some nonobvious effects emerge then, as, for example, the eigenvalue of Ch. 14.

But, just what is an eigenvalue? Answer: an eigenvalue is the value by which an object like C scales an eigenvector without altering the eigenvector's direction. Of course, we have not yet said what an eigenvector is, either, or how C might scale something, but it is to answer precisely such questions that this chapter and the three which follow it are written.

However, we are getting ahead of ourselves. Let's back up. Regarding the eigenvalue: the eigenvalue was always there, but prior to this point in the book it was usually trivial; the eigenvalue of 5 is just 5, for instance, so we didn't bother much to talk about it. It is when numbers are laid out in orderly grids like

    C = [ 6 4 0 ]
        [ 3 0 1 ]
        [ 3 1 0 ]

that nontrivial eigenvalues arise (though you cannot tell just by looking: the eigenvalues of C happen to be −1 and [7 ± √0x49]/2). An object like C is called a matrix. It serves as a generalized coefficient or multiplier. Where we have used single numbers as coefficients or multipliers heretofore, one can with sufficient care often use matrices instead.

The technical name for the "single number" is the scalar. Such a number, as for instance 5 or −4 + i3, is called a scalar because its action alone during multiplication is simply to scale the thing it multiplies. Besides acting alone, scalars can also act in concert, in orderly formations, thus constituting any of three basic kinds of arithmetical object:

• the scalar itself, a single number like α = 5 or β = −4 + i3;

• the vector, a column of m scalars like

      u = [    5    ]
          [ −4 + i3 ],

  which can be written in-line with the notation u = [5 −4 + i3]ᵀ (here there are two scalar elements, 5 and −4 + i3, so in this example m = 2);

• the matrix, an m × n grid of scalars, or equivalently a row of n vectors, like

      A = [ 0 6  2 ]
          [ 1 1 −1 ],

  which can be written in-line with the notation A = [0 6 2; 1 1 −1] or the notation A = [0 1; 6 1; 2 −1]ᵀ (here there are two rows and three columns of scalar elements, so in this example m = 2 and n = 3).

Several general points are immediately to be observed about these various objects. First, m and n can be any nonnegative integers: even one, even zero, even infinity.¹ Second, an m-element vector does not differ for most purposes from an m × 1 matrix; generally the two can be regarded as the same thing. Third, the three-element (that is, three-dimensional) geometrical vector of §3.3 is just an m-element vector with m = 3. Fourth, despite the geometrical Argand interpretation of the complex number, a complex number is not a two-element vector but a scalar; therefore any or all of a vector's or matrix's scalar elements can be complex. Fifth, though the progression scalar, vector, matrix suggests next a "matrix stack" or stack of p matrices, such objects in fact are seldom used; as we shall see in §11.1.1, the chief advantage of the standard matrix is that it neatly represents the linear transformation of one vector into another, and "matrix stacks" bring no such advantage. This book does not treat them.

The matrix interests us for this reason among others.²

Footnote 1: Where one needs visually to distinguish a symbol like A representing a matrix, one can write it [A]. Normally however a simple A suffices.
Footnote 2: Alternate notations seen in print include boldface A and underlined A.

The matrix is a notoriously hard topic to motivate. The idea of the matrix is deceptively simple. The mechanics of matrix arithmetic are deceptively intricate. The most basic body of matrix theory, without which little or no useful matrix work can be done, is deceptively extensive. The matrix neatly encapsulates a substantial knot of arithmetical tedium and clutter, but to understand the matrix one must first understand the tedium and clutter the matrix encapsulates. As far as the author is aware, no one has ever devised a way to introduce the matrix which does not seem shallow, tiresome, irksome, even interminable at first encounter; yet the matrix is too important to ignore. Applied mathematics brings nothing else quite like it, so the author presents these four chapters with grim enthusiasm.³

Part of the trouble with the matrix is that its arithmetic is just that, an arithmetic, no more likely to be mastered by mere theoretical study than was the classical arithmetic of childhood. To master matrix arithmetic, one must drill it; yet the book you hold is fundamentally one of theory not drill.

The reader who has previously drilled matrix arithmetic will meet here the essential applied theory of the matrix. That reader will find this chapter and the next three tedious enough. The reader who has not previously drilled matrix arithmetic, however, is likely to find these chapters positively hostile. Only the doggedly determined beginner will learn the matrix here alone; others will find it more amenable to drill matrix arithmetic first in the early chapters of an introductory linear algebra textbook, dull though such chapters be (see [31] or better yet the fine, surprisingly less dull [23] for instance). That is the approach the author recommends.

To the mathematical rebel, the young warrior with face painted and sword agleam, still determined to learn the matrix here alone, the author salutes his honorable defiance. Would the rebel consider alternate counsel? If so, then the rebel might compose a dozen matrices of various sizes and shapes, broad, square and tall, decomposing each carefully by pencil per the Gauss-Jordan method of §12.3, then checking results (again by pencil; using a machine defeats the point of the exercise, and using a sword, well, it won't work) by multiplying factors to restore the original matrices. Several hours of such drill should build the young warrior the practical arithmetical foundation to master, with commensurate effort, the theory these chapters bring. The way of the warrior is hard, but conquest is not impossible. The reward is worth the effort.

Returning here thereafter, the beginner can expect to find these chapters still tedious but no longer impenetrable. As for the matrix veteran seeking more interesting reading: such a veteran might skip directly to Chs. 13 and 14, referring back to Chs. 11 and 12 as need arises. Substantial, broad, necessary the chapters may be, but exciting they are not; at least, the earlier parts are not very exciting (later parts are better).

Footnote 3: In most of its chapters, the book seeks a balance between terseness the determined beginner cannot penetrate and prolixity the seasoned veteran will not abide. The matrix upsets this balance.

Chapters 11 through 14 treat the matrix and its algebra. This chapter, Ch. 11, introduces the rudiments of the matrix itself.⁴

11.1 Provenance and basic use

11.1.1 The linear transformation

Section 7.3 has introduced the idea of linearity. It is in the study of linear transformations that the concept of the matrix first arises. We begin there.

The linear transformation⁵ is the operation of an m × n matrix A, as in

    Ax = b,    (11.1)

to transform an n-element vector x into an m-element vector b, while respecting the rules of linearity

    A(x₁ + x₂) = Ax₁ + Ax₂ = b₁ + b₂,
    A(αx) = αAx = αb,
    A(0) = 0.

For example,

    A = [ 0 6  2 ]
        [ 1 1 −1 ]    (11.2)

is the 2 × 3 matrix which transforms a three-element vector x into a two-element vector b such that

    Ax = [ 0x₁ + 6x₂ + 2x₃ ] = b,
         [ 1x₁ + 1x₂ − 1x₃ ]

where

    x = [ x₁ ]        b = [ b₁ ]
        [ x₂ ],           [ b₂ ].
        [ x₃ ]

Footnote 4: [2][15][23][31]
Footnote 5: Professional mathematicians conventionally are careful to begin by drawing a clear distinction between the ideas of the linear transformation, the basis set and the simultaneous system of linear equations, proving from suitable axioms that the three amount more or less to the same thing, rather than implicitly assuming the fact. The professional approach [2, Chs. 1 and 2][31, Chs. 1, 2 and 5] has much to recommend it, but it is not the approach we will follow here.
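The rules of linearity are easy to exhibit numerically with the example A of (11.2). A minimal sketch, using plain Python lists; the helper name `matvec` is ours:

```python
def matvec(A, x):
    """b = Ax for a matrix A (list of rows) and a vector x (a sketch)."""
    return [sum(a*xi for a, xi in zip(row, x)) for row in A]

A = [[0, 6, 2],
     [1, 1, -1]]
x1, x2, alpha = [1, 2, 3], [-1, 0, 4], 5

# A(x1 + x2) = Ax1 + Ax2
lhs = matvec(A, [u + v for u, v in zip(x1, x2)])
rhs = [u + v for u, v in zip(matvec(A, x1), matvec(A, x2))]
assert lhs == rhs

# A(alpha x) = alpha (Ax)
assert matvec(A, [alpha*u for u in x1]) == [alpha*b for b in matvec(A, x1)]
```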

there are unfortunately not enough distinct Roman and Greek letters available to serve the needs of higher mathematics.1. industrial mass production-style. however. the operation of a matrix A is that6. is compactly represented as Ax = b.3. matrices can also represent simultaneous systems of linear equations. 11. one can use ℓjk or some other convenient letters for the indices. the system 0x1 + 6x2 + 2x3 = 2. following mathematical convention in the matter for this reason. P the meaning is usually clear from the context: i in i or aij is an index.11. 1x1 + 1x2 − 1x3 = 4. PROVENANCE AND BASIC USE In general. with A as given above and b = [2 4]T . and aij ≡ [A]ij is the element at the ith row and jth column of A.7 n 237 bi = j=1 aij xj . Seen from this point of view. In computers. The book you are reading tends to let the index run from 1 to n. Here. so the computer’s indexing convention poses no dilemma in this case. which is not an index and has nothing to do with indices. but the same letter i also serves as the imaginary unit. a simultaneous system of linear equations is itself neither more nor less than a linear transformation.1. a 0 index normally implies something special or basic about the object it identiﬁes.3) where xj is the jth element of x. i in −4 + i3 or eiφ is the imaginary unit. transforming them at once into the corresponding As observed in Appendix B. a12 = 6). an m × n matrix can be considered an ∞ × ∞ matrix with zeros in the unused cells. the index normally runs from 0 to n − 1. counting from top left (in the example for instance. both indices i and j run from −∞ to +∞ anyway. and in many ways this really is the more sensible way to do it. Besides representing linear transformations as such. 6 . 7 Whether to let the index j run from 0 to n−1 or from 1 to n is an awkward question of applied mathematical style.2 Matrix multiplication (and addition) Nothing prevents one from lining several vectors xk up in a row. In matrix work. 
Footnote 6: As observed in Appendix B, there are unfortunately not enough distinct Roman and Greek letters available to serve the needs of higher mathematics. In matrix work, the Roman letters ijk conventionally serve as indices, but the same letter i also serves as the imaginary unit, which is not an index and has nothing to do with indices. Fortunately, the meaning is usually clear from the context: i in Σᵢ or aᵢⱼ is an index; i in −4 + i3 or e^{iφ} is the imaginary unit. Should a case arise in which the meaning is not clear, one can use ℓjk or some other convenient letters for the indices.

Footnote 7: Whether to let the index j run from 0 to n − 1 or from 1 to n is an awkward question of applied mathematical style. In computers, the index normally runs from 0 to n − 1, and in many ways this really is the more sensible way to do it. In mathematical theory, however, a 0 index normally implies something special or basic about the object it identifies. The book you are reading tends to let the index run from 1 to n, following mathematical convention in the matter for this reason. Fortunately, a dilemma seldom arises in matrix work: conceived more generally, an m × n matrix can be considered an ∞ × ∞ matrix with zeros in the unused cells, in which both indices i and j run from −∞ to +∞ anyway, so the computer's indexing convention poses no dilemma in this case. See §11.3.

Lining the vectors up as

    X ≡ [x₁ x₂ · · · xₚ],
    B ≡ [b₁ b₂ · · · bₚ],

we have that

    AX = B,

where, applying (11.3) column by column,

    bᵢₖ = Σⱼ₌₁ⁿ aᵢⱼ xⱼₖ.    (11.4)

Equation (11.4) implies a definition for matrix multiplication. Such matrix multiplication is associative, since

    [(A)(XY)]ᵢₖ = Σⱼ₌₁ⁿ aᵢⱼ [XY]ⱼₖ
                = Σⱼ₌₁ⁿ aᵢⱼ Σₗ₌₁ᵖ xⱼₗ yₗₖ
                = Σₗ₌₁ᵖ Σⱼ₌₁ⁿ aᵢⱼ xⱼₗ yₗₖ
                = [(AX)(Y)]ᵢₖ.    (11.5)

Matrix multiplication is not generally commutative, however:

    AX ≠ XA,    (11.6)

as one can show by a suitable counterexample like A = [0 1; 0 0], X = [1 0; 0 0]. To multiply a matrix by a scalar, one multiplies each of the matrix's elements individually by the scalar:

    [αA]ᵢⱼ = α aᵢⱼ.    (11.7)

Evidently multiplication by a scalar is commutative: αAx = Aαx.

Matrix addition works in the way one would expect, element by element; and as one can see from (11.4), under multiplication, matrix addition is indeed distributive:

    [X + Y]ᵢⱼ = xᵢⱼ + yᵢⱼ,
    (A)(X + Y) = AX + AY,
    (A + C)(X) = AX + CX.    (11.8)
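Equation (11.4) is itself nearly code. Below, a direct transcription (the function name is ours), together with the section's counterexample showing that AX ≠ XA and a spot check of associativity:

```python
def matmul(A, X):
    """B = AX per (11.4): b_ik = sum_j a_ij x_jk (matrices as lists of rows)."""
    m, n, p = len(A), len(X), len(X[0])
    assert all(len(row) == n for row in A)
    return [[sum(A[i][j]*X[j][k] for j in range(n)) for k in range(p)]
            for i in range(m)]

A = [[0, 1],
     [0, 0]]
X = [[1, 0],
     [0, 0]]
assert matmul(A, X) != matmul(X, A)        # not commutative, per (11.6)

Y = [[2, 3],
     [4, 5]]
assert matmul(matmul(A, X), Y) == matmul(A, matmul(X, Y))  # associative, per (11.5)
```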

11.1.3 Row and column operators

The matrix equation Ax = b represents the linear transformation of x into b, as we have seen. Viewed from another perspective, however, the same matrix equation represents something else: it represents a weighted sum of the columns of A, with the elements of x as the weights. In this view, one writes (11.3) as

    b = Σⱼ₌₁ⁿ [A]∗ⱼ xⱼ,    (11.9)

where [A]∗ⱼ is the jth column of A. Here x is not only a vector; it is also an operator. It operates on A's columns. By virtue of multiplying A from the right, the vector x is a column operator acting on A.

If several vectors xₖ line up in a row to form a matrix X, such that AX = B, then the matrix X is likewise a column operator:

    [B]∗ₖ = Σⱼ₌₁ⁿ [A]∗ⱼ xⱼₖ.    (11.10)

The kth column of X weights the several columns of A to yield the kth column of B.

If a matrix multiplying from the right is a column operator, is a matrix multiplying from the left a row operator? Indeed it is. Another way to write AX = B, besides (11.10), is

    [B]ᵢ∗ = Σⱼ₌₁ⁿ aᵢⱼ [X]ⱼ∗.    (11.11)

The ith row of A weights the several rows of X to yield the ith row of B. The matrix A is a row operator. It operates on X's rows. (Observe the notation. The ∗ here means "any" or "all." Hence [X]ⱼ∗ means "jth row, all columns of X," that is, the jth row of X; [A]∗ⱼ means "all rows, jth column of A," that is, the jth column of A.)

Column operators attack from the right; row operators, from the left. This rule is worth memorizing; the concept is important. In AX = B, the matrix X operates on A's columns; the matrix A operates on X's rows.

Since matrix multiplication produces the same result whether one views it as a linear transformation (11.4), a column operation (11.10) or a row operation (11.11), one might wonder what purpose lies in defining matrix multiplication three separate ways. However, it is not so much for the sake

of the mathematics that we define it three ways as it is for the sake of the mathematician. We do it for ourselves. Mathematically, the latter two do indeed expand to yield (11.4), but as written the three represent three different perspectives on the matrix. A tedious, nonintuitive matrix theorem from one perspective can appear suddenly obvious from another (see for example eqn. 11.63). Results hard to visualize one way are easy to visualize another. It is worth developing the mental agility to view and handle matrices all three ways for this reason.

11.1.4 The transpose and the adjoint

One function peculiar to matrix algebra is the transpose

    C = Aᵀ,
    cᵢⱼ = aⱼᵢ,    (11.12)

which mirrors an m × n matrix into an n × m matrix. For example,

    Aᵀ = [ 0  1 ]
         [ 6  1 ]
         [ 2 −1 ].

Similar and even more useful is the conjugate transpose or adjoint⁸

    C = A∗,
    cᵢⱼ = a∗ⱼᵢ,    (11.13)

which again mirrors an m × n matrix into an n × m matrix, but conjugates each element as it goes. The transpose is convenient notationally to write vectors and matrices in-line and to express certain matrix-arithmetical mechanics, but algebraically the transpose is artificial. It is the adjoint rather which mirrors a matrix properly. (If the transpose and adjoint functions applied to words as to matrices, then the transpose of "derivations" would be "snoitavired," whereas the adjoint would be "snoitavired" with each letter conjugated. See the difference?) On real-valued matrices like the A in the example, of course, the transpose and the adjoint amount to the same thing.

Footnote 8: Alternate notations sometimes seen in print for the adjoint include A† (a notation which in this book means something unrelated) and Aᴴ (a notation which recalls the name of the mathematician Charles Hermite). However, the book you are reading writes the adjoint only as A∗, a notation which better captures the sense of the thing in the author's view.
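In code, the transpose and the adjoint are one-liners. The sketch below (function names ours) mirrors the 2 × 3 example A and checks that on a complex column vector the adjoint conjugates where the transpose does not:

```python
def transpose(A):
    """C = A^T per (11.12): c_ij = a_ji."""
    return [[A[i][j] for i in range(len(A))] for j in range(len(A[0]))]

def adjoint(A):
    """C = A* per (11.13): c_ij = conj(a_ji)."""
    return [[A[i][j].conjugate() for i in range(len(A))] for j in range(len(A[0]))]

A = [[0, 6, 2],
     [1, 1, -1]]
assert transpose(A) == [[0, 1], [6, 1], [2, -1]]
assert adjoint(A) == transpose(A)      # real-valued: the two amount to the same thing

u = [[5], [-4 + 3j]]                   # the column vector u of the text
assert adjoint(u) == [[5, -4 - 3j]]    # conjugated as it is mirrored
```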

If one needed to conjugate the elements of a matrix without transposing the matrix itself, one could contrive notation like A∗ᵀ. Such a need seldom arises, however.

Observe that

    (A₂A₁)ᵀ = A₁ᵀA₂ᵀ,
    (A₂A₁)∗ = A₁∗A₂∗,    (11.14)

and more generally that⁹

    (∏ₖ Aₖ)ᵀ = ∐ₖ Aₖᵀ,
    (∏ₖ Aₖ)∗ = ∐ₖ Aₖ∗.    (11.15)

Footnote 9: Recall from §2.3 that ∏ₖ Aₖ = · · · A₃A₂A₁, whereas ∐ₖ Aₖ = A₁A₂A₃ · · ·.

11.2 The Kronecker delta

Section 7.7 has introduced the Dirac delta. The discrete analog of the Dirac delta is the Kronecker delta¹⁰

    δᵢ ≡ 1 if i = 0; 0 otherwise;    (11.16)

or

    δᵢⱼ ≡ 1 if i = j; 0 otherwise.    (11.17)

The Kronecker delta enjoys the Dirac-like properties that

    Σᵢ₌₋∞^∞ δᵢ = Σᵢ₌₋∞^∞ δᵢⱼ = Σⱼ₌₋∞^∞ δᵢⱼ = 1    (11.18)

and that

    Σⱼ₌₋∞^∞ δᵢⱼ aⱼₖ = aᵢₖ,    (11.19)

the latter of which is the Kronecker sifting property. The Kronecker equations (11.18) and (11.19) parallel the Dirac equations (7.13) and (7.14). Chapters 11 and 14 will find frequent use for the Kronecker delta. Later, §15.4.3 will revisit the Kronecker delta in another light.

Footnote 10: [52, "Kronecker delta," 15:59, 31 May 2006]
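The sifting property (11.19) can be checked mechanically. A minimal sketch (names ours); a finite summation range stands in for the infinite sum, since the delta vanishes everywhere else:

```python
def delta(i, j):
    """The Kronecker delta (11.17)."""
    return 1 if i == j else 0

# sifting property (11.19): sum_j delta_ij a_jk = a_ik
a = [[10*i + k for k in range(5)] for i in range(5)]
for i in range(5):
    for k in range(5):
        assert sum(delta(i, j)*a[j][k] for j in range(5)) == a[i][k]
```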

11.3 Dimensionality and matrix forms

An m × n matrix like

X = [ −4  0 ]
    [  1  2 ]
    [  2 −1 ]

can be viewed as the ∞ × ∞ matrix

    [ ⋱                    ]
    [ ··· 0  0  0 0 ···    ]
X = [ ··· 0 −4  0 0 ···    ]
    [ ··· 0  1  2 0 ···    ]
    [ ··· 0  2 −1 0 ···    ]
    [ ··· 0  0  0 0 ···    ]
    [                    ⋱ ]

with zeros in the unused cells. As before, x11 = −4 and x32 = −1, but now the xij exist for all integral i and j; for instance, x(−1)(−1) = 0. For such a matrix, indeed for all matrices, the matrix multiplication rule (11.4) generalizes to

B = AX,
b_ik = Σ_{j=−∞}^{∞} a_ij x_jk.    (11.20)

For square matrices whose purpose is to manipulate other matrices or vectors in place, merely padding with zeros often does not suit. Consider for example the square matrix

A3 = [ 1 0 0 ]
     [ 5 1 0 ]
     [ 0 0 1 ]

This A3 is indeed a matrix, but when it acts A3 X as a row operator on some 3 × p matrix X, its effect is to add to X's second row, 5 times the first. Further consider

A4 = [ 1 0 0 0 ]
     [ 5 1 0 0 ]
     [ 0 0 1 0 ]
     [ 0 0 0 1 ]

which does the same to a 4 × p matrix X. We can also define A5, A6, A7, . . . , if we want; but, really, all these express the same operation: "to add to the second row, 5 times the first."
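The row operation that A3, A4, A5, . . . all express can be checked with the ordinary multiplication rule. A minimal sketch in pure Python, using the example entries from the text (the matmul helper is mine, not a library call):

```python
def matmul(A, X):
    m, p, n = len(A), len(X), len(X[0])
    return [[sum(A[i][j] * X[j][k] for j in range(p)) for k in range(n)]
            for i in range(m)]

A3 = [[1, 0, 0],
      [5, 1, 0],
      [0, 0, 1]]
X = [[-4, 0],
     [1, 2],
     [2, -1]]

# A3 X adds 5 times X's first row to its second row; rows 1 and 3 pass
# through unchanged.
print(matmul(A3, X))   # [[-4, 0], [-19, 2], [2, -1]]
```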

The ∞ × ∞ matrix

    [ ⋱                  ]
    [ ··· 1 0 0 0 0 ···  ]
A = [ ··· 5 1 0 0 0 ···  ]
    [ ··· 0 0 1 0 0 ···  ]
    [ ··· 0 0 0 1 0 ···  ]
    [ ··· 0 0 0 0 1 ···  ]
    [                  ⋱ ]

expresses the operation generally. As before, a11 = 1 and a21 = 5, but now also a(−1)(−1) = 1 and a09 = 0, among others. By running ones infinitely both ways out the main diagonal, we guarantee by (11.20) that when A acts AX on a matrix X of any dimensionality whatsoever, A adds to the second row of X, 5 times the first—and affects no other row. (But what if X is a 1 × p matrix, and has no second row? Then the operation AX creates a new second row, 5 times the first—or rather so fills in X's previously null second row.)

In the infinite-dimensional view, the matrices A and X differ essentially.[12] This section explains, developing some nonstandard formalisms the derivations of later sections and chapters can use.[11]

[11] The idea of infinite dimensionality is sure to discomfit some readers, who have studied matrices before and are used to thinking of a matrix as having some definite size. There is nothing wrong with thinking of a matrix as having some definite size; it is only that that view does not suit the present book's development. And really, the idea of an ∞ × 1 vector or an ∞ × ∞ matrix should not seem so strange. After all, consider the vector u such that uℓ = sin ℓǫ, where 0 < ǫ ≪ 1 and ℓ is an integer, which holds all values of the function sin θ of a real argument θ. Of course one does not actually write down or store all the elements of an infinite-dimensional vector or matrix, any more than one actually writes down or stores all the bits (or digits) of 2π. Writing them down or storing them is not the point. The point is that infinite dimensionality is all right; that the idea thereof does not threaten to overturn the reader's preëxisting matrix knowledge; that, though the construct seem unfamiliar, no fundamental conceptual barrier rises against it. Different ways of looking at the same mathematics can be extremely useful to the applied mathematician. The applied mathematical reader who has never heretofore considered infinite dimensionality in vectors and matrices would be well served to take the opportunity to do so here.

[12] This particular section happens to use the symbols A and X to represent certain specific matrix forms because such usage flows naturally from the usage Ax = b of § 11.1. Such usage admittedly proves awkward in other contexts. Traditionally in matrix work and elsewhere in the book, the letter A does not necessarily represent an extended operator as it does here, but rather an arbitrary matrix of no particular form.

11.3.1 The null and dimension-limited matrices

The null matrix is just what its name implies:

    [ ⋱                  ]
    [ ··· 0 0 0 0 0 ···  ]
0 = [ ··· 0 0 0 0 0 ···  ]
    [ ··· 0 0 0 0 0 ···  ]
    [ ··· 0 0 0 0 0 ···  ]
    [ ··· 0 0 0 0 0 ···  ]
    [                  ⋱ ]

or more compactly,

[0]_ij = 0.

Special symbols like 0, 0̄ or O are possible for the null matrix, but usually a simple 0 suffices. There are no surprises here; the null matrix brings all the expected properties of a zero, like

0 + A = A,
[0][X] = 0.

The same symbol 0 used for the null scalar (zero) and the null matrix is used for the null vector, too. Whether the scalar 0, the vector 0 and the matrix 0 actually represent different things is a matter of semantics, but the three are interchangeable for most practical purposes in any case. Basically, a zero is a zero is a zero; there's not much else to it.[13]

[13] Well, of course, there's a lot else to it, when it comes to dividing by zero as in Ch. 4, or to summing an infinity of zeros as in Ch. 7, but those aren't what we were speaking of here.

Now a formality: the ordinary m × n matrix X can be viewed, infinite-dimensionally, as a variation on the null matrix, inasmuch as X differs from the null matrix only in the mn elements x_ij, 1 ≤ i ≤ m, 1 ≤ j ≤ n. Though the theoretical dimensionality of X be ∞ × ∞, one need record only the mn elements, plus the values of m and n, to retain complete information about such a matrix. (As we shall discover in Ch. 12, dimensionality is a poor measure of a matrix's size in any case. What really counts is not a matrix's m × n dimensionality but rather its rank.) So the semantics are these: when we call a matrix X an m × n matrix, or more precisely a dimension-limited matrix with an m × n

active region, we shall mean formally that X is an ∞ × ∞ matrix whose elements are all zero outside the m × n rectangle:

x_ij = 0 except where 1 ≤ i ≤ m and 1 ≤ j ≤ n.    (11.21)

By these semantics, every 3 × 2 matrix (for example) is also formally a 4 × 4 matrix, but a 4 × 4 matrix is not in general a 3 × 2 matrix.

11.3.2 The identity and scalar matrices and the extended operator

The general identity matrix—or simply, the identity matrix—is

    [ ⋱                  ]
    [ ··· 1 0 0 0 0 ···  ]
I = [ ··· 0 1 0 0 0 ···  ]
    [ ··· 0 0 1 0 0 ···  ]
    [ ··· 0 0 0 1 0 ···  ]
    [ ··· 0 0 0 0 1 ···  ]
    [                  ⋱ ]

or more compactly,

[I]_ij = δ_ij,    (11.22)

where δ_ij is the Kronecker delta of § 11.2. The identity matrix I is a matrix 1, as it were,[14] bringing the essential property one expects of a 1:

IX = X = XI.    (11.23)

The scalar matrix is

     [ ⋱                  ]
     [ ··· λ 0 0 0 0 ···  ]
λI = [ ··· 0 λ 0 0 0 ···  ]
     [ ··· 0 0 λ 0 0 ···  ]
     [ ··· 0 0 0 λ 0 ···  ]
     [ ··· 0 0 0 0 λ ···  ]
     [                  ⋱ ]

or more compactly,

[λI]_ij = λδ_ij.    (11.24)

[14] In fact you can write it as 1 if you like. That is essentially what it is. The I can be regarded as standing for "identity" or as the Roman numeral I.

If the identity matrix I is a matrix 1, then the scalar matrix λI is a matrix λ, such that

[λI]X = λX = X[λI].    (11.25)

The identity matrix is (to state the obvious) just the scalar matrix with λ = 1.

The extended operator A is a variation on the scalar matrix λI, λ ≠ 0, inasmuch as A differs from λI only in p specific elements, with p a finite number. Symbolically,

a_ij = (λ)(δ_ij + α_k)  if (i, j) = (i_k, j_k), 1 ≤ k ≤ p;
a_ij = λδ_ij            otherwise;
λ ≠ 0.    (11.26)

The several α_k control how the extended operator A differs from λI. To retain complete information about such a matrix, one need record only the several α_k along with their respective addresses (i_k, j_k), plus the scale λ. For example, for an extended operator fitting the pattern

    [ ⋱                            ]
    [ ··· λ   0 0 0       0 ···    ]
A = [ ··· λα1 λ 0 0       0 ···    ]
    [ ··· 0   0 λ λα2     0 ···    ]
    [ ··· 0   0 0 λ(1+α3) 0 ···    ]
    [ ··· 0   0 0 0       λ ···    ]
    [                            ⋱ ]

one need record only the values of α1, α2 and α3, the respective addresses (2, 1), (3, 4) and (4, 4), and the value of the scale λ; this information alone implies the entire ∞ × ∞ matrix A.

When we call a matrix A an extended n × n operator, or an extended operator with an n × n active region, we shall mean formally that A is an ∞ × ∞ matrix and is further an extended operator for which

1 ≤ i_k ≤ n and 1 ≤ j_k ≤ n for all 1 ≤ k ≤ p.    (11.27)

That is, an extended n × n operator is one whose several α_k all lie within the n × n square. The A in the example is an extended 4 × 4 operator (and also a 5 × 5, a 6 × 6, etc., but not a 3 × 3).

(Often in practice for smaller operators—especially in the typical case that λ = 1—one finds it easier just to record all the n × n elements of the active region. This is fine. Large matrix operators however tend to be
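How the few recorded α_k, their addresses (i_k, j_k) and the scale λ imply the entire ∞ × ∞ matrix per (11.26) can be sketched directly. The numeric values of lam and the alphas below are made up for the illustration; the function a is an illustrative helper, not from the text.

```python
lam = 2.0
alphas = {(2, 1): 3.0, (3, 4): -1.0, (4, 4): 0.5}   # hypothetical alpha_k

def a(i, j):
    """a_ij per (11.26): lam*(delta_ij + alpha_k) at a recorded address,
    lam*delta_ij everywhere else, out to any index whatsoever."""
    d = 1 if i == j else 0
    if (i, j) in alphas:
        return lam * (d + alphas[(i, j)])
    return lam * d

print(a(2, 1), a(4, 4), a(9, 9), a(5, 7))   # 6.0 3.0 2.0 0.0
```

Note that a(9, 9) far outside the active region still returns λ: the ones (scaled by λ) run infinitely out the main diagonal.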

sparse, meaning that they depart from λI in only a very few of their many elements. It would waste a lot of computer memory explicitly to store all those zeros, so one normally stores just the few elements, instead.)

11.3.3 The active region

Though maybe obvious, it bears stating explicitly that a product of dimension-limited and/or extended-operational matrices with n × n active regions itself has an n × n active region.[15] (Remember that a matrix with an m′ × n′ active region also by definition has an n × n active region if m′ ≤ n and n′ ≤ n.) If any of the factors has dimension-limited form then so does the product; otherwise the product is an extended operator.[16]

Implicit in the definition of the extended operator is that the identity matrix I and the scalar matrix λI, λ ≠ 0, are extended operators with 0 × 0 active regions (and also 1 × 1, 2 × 2, etc.). If λ = 0, however, the scalar matrix λI is just the null matrix, which is no extended operator but rather by definition a 0 × 0 dimension-limited matrix.

11.3.4 Other matrix forms

Besides the dimension-limited form of § 11.3.1 and the extended-operational form of § 11.3.2, other infinite-dimensional matrix forms are certainly possible. One could for example advantageously define a "null sparse" form, recording only nonzero elements and their addresses in an otherwise null matrix; or a "tridiagonal extended" form, bearing repeated entries not only along the main diagonal but also along the diagonals just above and just below. Section 11.9 introduces one worthwhile matrix which fits neither the dimension-limited nor the extended-operational form. Still, the dimension-limited and extended-operational forms are normally the most useful, and they are the ones we shall principally be handling in this book.

One reason to have defined specific infinite-dimensional matrix forms is to show how straightforwardly one can fully represent a practical matrix of an infinity of elements by a modest, finite quantity of information. Further reasons to have defined such forms will soon occur.

11.3.5 The rank-r identity matrix

The rank-r identity matrix I_r is the dimension-limited matrix for which

[I_r]_ij = δ_ij if 1 ≤ i ≤ r and/or 1 ≤ j ≤ r; 0 otherwise;    (11.28)

where either the "and" or the "or" can be regarded (it makes no difference). The effect of I_r is that

I_m X = X = X I_n,
I_m x = x,    (11.29)

where X is an m × n matrix and x, an m × 1 vector. Examples of I_r include

I_3 = [ 1 0 0 ]
      [ 0 1 0 ]
      [ 0 0 1 ]

(Remember that in the infinite-dimensional view, I_3, though a 3 × 3 matrix, is formally an ∞ × ∞ matrix with zeros in the unused cells. It has only the three ones and fits the 3 × 3 dimension-limited form of § 11.3.1. The areas of I_3 not shown are all zero, even along the main diagonal.)

The rank r can be any nonnegative integer, even zero (though the rank-zero identity matrix I_0 is in fact the null matrix, normally just written 0). If alternate indexing limits are needed (for instance for a computer-indexed identity matrix whose indices run from 0 to r − 1), the notation I_a^b, where

[I_a^b]_ij ≡ δ_ij if a ≤ i ≤ b and/or a ≤ j ≤ b; 0 otherwise;    (11.30)

can be used; the rank in this case is r = b − a + 1.

The name "rank-r" implies that I_r has a "rank" of r—and indeed it does, where the rank here is just the count of ones along the matrix's main diagonal. Section 12.5 defines rank for matrices more generally; for the moment, however, we shall discern the attribute of rank only in the rank-r identity matrix itself.

[15] The section's earlier subsections formally define the term active region with respect to each of the two matrix forms.

[16] If symbolic proof of the subsection's claims is wanted, here it is in outline:

a_ij = λ_a δ_ij unless 1 ≤ (i, j) ≤ n,
b_ij = λ_b δ_ij unless 1 ≤ (i, j) ≤ n,
[AB]_ij = Σ_k a_ik b_kj
        = { Σ_k (λ_a δ_ik) b_kj = λ_a b_ij unless 1 ≤ i ≤ n,
          { Σ_k a_ik (λ_b δ_kj) = λ_b a_ij unless 1 ≤ j ≤ n,
        = λ_a λ_b δ_ij unless 1 ≤ (i, j) ≤ n.

It's probably easier just to sketch the matrices and look at them, though.

11.3.6 The truncation operator

The rank-r identity matrix I_r is also the truncation operator. Attacking from the left, as in I_r A, it retains the first through rth rows of A but cancels other rows. Attacking from the right, as in A I_r, it retains the first through rth columns. Such truncation is useful symbolically to reduce an extended operator to dimension-limited form.

Whether a matrix C has dimension-limited or extended-operational form (though not necessarily if it has some other form), if it has an m × n active region[17] and if

m ≤ r and n ≤ r,

then

I_r C = I_r C I_r = C I_r.    (11.31)

Equation (11.31) says at least two things:

• It is superfluous to truncate both rows and columns; it suffices to truncate one or the other.
• The rank-r identity matrix I_r commutes freely past C.

Evidently big identity matrices commute freely where small ones cannot (and the general identity matrix I = I_∞ commutes freely past everything).

[17] Refer to the definitions of active region in §§ 11.3.1 and 11.3.2. That a matrix has an m × n active region does not necessarily mean that it is all zero outside the m × n rectangle. (After all, if it were always all zero outside, then there would be little point in applying a truncation operator. There would be nothing there to truncate.)

11.3.7 The elementary vector and the lone-element matrix

The lone-element matrix E_mn is the matrix with a one in the mnth cell and zeros elsewhere:

[E_mn]_ij ≡ δ_im δ_jn = 1 if i = m and j = n; 0 otherwise.    (11.32)

By this definition, C = Σ_{i,j} c_ij E_ij for any matrix C. The vector analog of the lone-element matrix is the elementary vector e_m, which has a one as the mth element:

[e_m]_i ≡ δ_im = 1 if i = m; 0 otherwise.    (11.33)

By this definition, [I]_*j = e_j and [I]_i* = e_i^T.
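The truncation of § 11.3.6 can be checked numerically over a finite window standing in for the infinite-dimensional forms. A minimal sketch (helper names are mine):

```python
def matmul(A, B):
    return [[sum(A[i][j] * B[j][k] for j in range(len(B)))
             for k in range(len(B[0]))] for i in range(len(A))]

def identity_r(r, size):
    """The rank-r identity (11.28), written out in a size x size window."""
    return [[1 if i == j and i < r else 0 for j in range(size)]
            for i in range(size)]

A = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
I2 = identity_r(2, 3)

print(matmul(I2, A))   # [[1, 2, 3], [4, 5, 6], [0, 0, 0]]  (rows kept)
print(matmul(A, I2))   # [[1, 2, 0], [4, 5, 0], [7, 8, 0]]  (columns kept)
```

Attacking from the left cancels rows beyond the rth; from the right, columns, just as the text says.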

11.3.8 Off-diagonal entries

It is interesting to observe and useful to note that if

[C1]_i* = [C2]_i* = e_i^T,

then also

[C1 C2]_i* = e_i^T;    (11.34)

and likewise that if

[C1]_*j = [C2]_*j = e_j,

then also

[C1 C2]_*j = e_j.    (11.35)

The product of matrices has off-diagonal entries in a row or column only if at least one of the factors itself has off-diagonal entries in that row or column. Or, less readably but more precisely, the ith row or jth column of the product of matrices can depart from e_i^T or e_j, respectively, only if the corresponding row or column of at least one of the factors so departs. The reason is that in (11.34), C1 acts as a row operator on C2; and if C1's ith row is e_i^T, then its action is merely to duplicate C2's ith row, which itself is just e_i^T. Parallel logic naturally applies to (11.35).

11.4 The elementary operator

Section 11.1.3 has introduced the general row or column operator. Conventionally denoted T, the elementary operator is a simple extended row or column operator from sequences of which more complicated extended operators can be built.[18] The elementary operator T comes in three kinds.

• The first is the interchange elementary

T[i↔j] = I − (E_ii + E_jj) + (E_ij + E_ji),    (11.36)

which by operating T[i↔j]A or AT[i↔j] respectively interchanges A's ith row or column with its jth.[19]

[18] In § 11.3, the symbol A specifically represented an extended operator, but here and generally the symbol represents any matrix.

[19] As a matter of definition, some authors [31] forbid T[i↔i] as an elementary operator, where j = i, since after all T[i↔i] = I, which is to say that the operator doesn't actually do anything. There exist legitimate tactical reasons to forbid (as in § 11.6), but normally this book permits.

• The second is the scaling elementary

T_α[i] = I + (α − 1)E_ii,  α ≠ 0,    (11.37)

which by operating T_α[i]A or AT_α[i] scales (multiplies) A's ith row or column, respectively, by the factor α.

• The third and last is the addition elementary

T_α[ij] = I + αE_ij,  i ≠ j,    (11.38)

which by operating T_α[ij]A adds to the ith row of A, α times the jth row; or which by operating AT_α[ij] adds to the jth column of A, α times the ith column.

Examples of the elementary operators include

         [ ⋱                  ]
         [ ··· 0 1 0 0 0 ···  ]
T[1↔2] = [ ··· 1 0 0 0 0 ···  ]
         [ ··· 0 0 1 0 0 ···  ]
         [ ··· 0 0 0 1 0 ···  ]
         [ ··· 0 0 0 0 1 ···  ]
         [                  ⋱ ]

         [ ⋱                  ]
         [ ··· 1 0 0 0 0 ···  ]
T_5[4] = [ ··· 0 1 0 0 0 ···  ]
         [ ··· 0 0 1 0 0 ···  ]
         [ ··· 0 0 0 5 0 ···  ]
         [ ··· 0 0 0 0 1 ···  ]
         [                  ⋱ ]

          [ ⋱                  ]
          [ ··· 1 0 0 0 0 ···  ]
T_5[21] = [ ··· 5 1 0 0 0 ···  ]
          [ ··· 0 0 1 0 0 ···  ]
          [ ··· 0 0 0 1 0 ···  ]
          [ ··· 0 0 0 0 1 ···  ]
          [                  ⋱ ]

(It is good to define a concept aesthetically, and one should usually do so when one can; indeed in this case one might reasonably promote either definition on aesthetic grounds. However, an applied mathematician ought not to let a mere definition entangle him. What matters is the underlying concept. Where the definition does not serve the concept well, the applied mathematician considers whether it were not worth the effort to adapt the definition accordingly.)
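The three elementary operators (11.36)-(11.38) can be built literally from lone-element matrices E_mn, as the definitions prescribe. A sketch over a finite n × n window, with 0-based indices rather than the text's 1-based ones (all helper names are mine):

```python
def E(m, n, size):
    """The lone-element matrix E_mn of (11.32), in a size x size window."""
    return [[1 if (i, j) == (m, n) else 0 for j in range(size)]
            for i in range(size)]

def madd(*Ms):
    return [[sum(M[i][j] for M in Ms) for j in range(len(Ms[0]))]
            for i in range(len(Ms[0]))]

def scale(c, M):
    return [[c * x for x in row] for row in M]

def identity(size):
    return [[1 if i == j else 0 for j in range(size)] for i in range(size)]

size = 3
I = identity(size)
# Interchange (11.36): I - (E_ii + E_jj) + (E_ij + E_ji), here i=0, j=1.
T_swap01 = madd(I, scale(-1, E(0, 0, size)), scale(-1, E(1, 1, size)),
                E(0, 1, size), E(1, 0, size))
# Addition (11.38): I + alpha*E_ij, here alpha=5, i=1, j=0.
T_5_10 = madd(I, scale(5, E(1, 0, size)))

print(T_swap01)   # [[0, 1, 0], [1, 0, 0], [0, 0, 1]]
print(T_5_10)     # [[1, 0, 0], [5, 1, 0], [0, 0, 1]]
```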

Note that none of these—and in fact no elementary operator of any kind—differs from I in more than four elements.

11.4.1 Properties

Significantly, elementary operators as defined above are always invertible (which is to say, reversible in effect), with

T[i↔j]^(−1) = T[j↔i] = T[i↔j],
T_α[i]^(−1) = T_(1/α)[i],
T_α[ij]^(−1) = T_(−α)[ij],    (11.39)

being themselves elementary operators such that

T^(−1) T = I = T T^(−1)    (11.40)

in each case.[20] This means that any sequence ∏_k T_k of elementaries can safely be undone by the reverse sequence ∐_k T_k^(−1):

∐_k T_k^(−1) ∏_k T_k = I = ∏_k T_k ∐_k T_k^(−1).    (11.41)

The rank-r identity matrix I_r is no elementary operator,[21] nor is the lone-element matrix E_mn; but the general identity matrix I is indeed an elementary operator, since I = T[i↔i] = T_1[i] = T_0[ij]. The last can be considered a distinct, fourth kind of elementary operator if desired, but it is probably easier just to regard it as an elementary of any of the first three kinds.

From (11.31), we have that

I_r T = I_r T I_r = T I_r  if 1 ≤ i ≤ r and 1 ≤ j ≤ r    (11.42)

for any elementary operator T which operates within the given bounds. Equation (11.42) lets an identity matrix with sufficiently high rank pass through a sequence of elementaries as needed.

In general, the transpose of an elementary row operator is the corresponding elementary column operator. Curiously, the interchange elementary is its own transpose and adjoint:

T[i↔j]^* = T[i↔j]^T = T[i↔j].    (11.43)

[20] The addition elementary T_α[ii] and the scaling elementary T_0[i] are forbidden precisely because they are not generally invertible.

[21] If the statement seems to contradict statements of some other books, it is only a matter of definition. The other books are not wrong; their underlying definitions just differ slightly. This book finds it convenient to define the elementary operator in infinite-dimensional, extended-operational form.
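The inverses (11.39) are easy to confirm numerically: each kind of elementary, multiplied by its stated inverse, yields the identity. A sketch over a 4 × 4 window with 0-based indices (helper names are mine):

```python
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def interchange(i, j, n):
    T = identity(n); T[i][i] = T[j][j] = 0; T[i][j] = T[j][i] = 1; return T

def scaling(a, i, n):
    T = identity(n); T[i][i] = a; return T

def addition(a, i, j, n):
    T = identity(n); T[i][j] = a; return T

n = 4
I = identity(n)
print(matmul(interchange(0, 2, n), interchange(0, 2, n)) == I)  # True
print(matmul(scaling(0.5, 1, n), scaling(2.0, 1, n)) == I)      # True
print(matmul(addition(-5, 2, 0, n), addition(5, 2, 0, n)) == I) # True
```

The interchange undoes itself, the scaling is undone by the reciprocal scale, and the addition is undone by the negated scale, per (11.39).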

11.4.2 Commutation and sorting

Elementary operators often occur in long chains like

A = T_(−4)[32] T[2↔3] T_(1/5)[3] T_(1/2)[31] T_5[21] T[1↔3],

with several elementaries of all kinds intermixed. Some applications demand that the elementaries be sorted and grouped by kind, as

A = (T[2↔3] T[1↔3]) (T_(−4)[32] T_(1/0xA)[21] T_5[31]) (T_(1/5)[2])

or as

A = (T_(−4)[21] T_(1/0xA)[13] T_5[23]) (T_(1/5)[1]) (T[2↔3] T[1↔3]),

among other possible orderings. Though you probably cannot tell just by looking, the three products above are different orderings of the same elementary chain; they yield the same A and thus represent exactly the same matrix operation. Interesting is that the act of reordering the elementaries has altered some of them into other elementaries of the same kind, but has changed the kind of none of them.

One sorts a chain of elementary operators by repeatedly exchanging adjacent pairs. This of course supposes that one can exchange adjacent pairs, which seems impossible since matrix multiplication is not commutative: A1 A2 ≠ A2 A1. However, at the moment we are dealing in elementary operators only; and for most pairs T1 and T2 of elementary operators, though indeed T1 T2 ≠ T2 T1, it so happens that there exists either a T1′ such that T1 T2 = T2 T1′ or a T2′ such that T1 T2 = T2′ T1, where T1′ and T2′ are elementaries of the same kinds respectively as T1 and T2. The attempt sometimes fails when both T1 and T2 are addition elementaries, but all other pairs commute in this way. Significantly, elementaries of different kinds always commute. And, though commutation can alter one (never both) of the two elementaries, it changes the kind of neither.

Many qualitatively distinct pairs of elementaries exist; we shall list these exhaustively in a moment. First, however, we should like to observe a natural hierarchy among the three kinds of elementary: (i) interchange; (ii) scaling; (iii) addition.

• The interchange elementary is the strongest. Itself subject to alteration only by another interchange elementary, it can alter any elementary by commuting past. When an interchange elementary commutes past another elementary of any kind, what it alters are the other elementary's indices i and/or j (or m and/or n, or whatever symbols

happen to represent the indices in question). When two interchange elementaries commute past one another, only one of the two is altered. (Which one? Either; the mathematician chooses.) Refer to Table 11.1.

• Next in strength is the scaling elementary. Only an interchange elementary can alter it, and it in turn can alter only an addition elementary. When a scaling elementary commutes past an addition elementary, what it alters is the latter's scale α (or β, or whatever symbol happens to represent the scale in question). Scaling elementaries do not alter one another during commutation. Refer to Table 11.2.

• The addition elementary, last and weakest, is subject to alteration by either of the other two, itself having no power to alter any elementary during commutation. A pair of addition elementaries are the only pair that can altogether fail to commute—they fail when the row index of one equals the column index of the other—but when they do commute, neither alters the other. Refer to Table 11.3.

Tables 11.1, 11.2 and 11.3 list all possible pairs of elementary operators, as the reader can check. The only pairs that fail to commute are the last three of Table 11.3. Tables 11.1, 11.2 and 11.3 exhaustively describe the commutation of one elementary past another elementary.

11.5 Inversion and similarity (introduction)

If Tables 11.1, 11.2 and 11.3 exhaustively describe the commutation of one elementary past another elementary, then what can one write of the commutation of an elementary past the general matrix A? With some matrix algebra,

TA = (TA)(I) = (TA)(T^(−1)T),
AT = (I)(AT) = (TT^(−1))(AT),

one can write that

TA = [TAT^(−1)]T,
AT = T[T^(−1)AT],    (11.44)

where T^(−1) is given by (11.39). An elementary commuting rightward changes A to TAT^(−1); commuting leftward, to T^(−1)AT.

Table 11.1: Inverting, commuting, combining and expanding elementary operators: interchange. In the table, no two indices among i, j, m and n are the same.

T[m↔n] = T[n↔m]
T[m↔m] = I
I T[m↔n] = T[m↔n] I
T[m↔n] T[m↔n] = I
T[m↔n] T[i↔n] = T[i↔n] T[m↔i] = T[i↔m] T[m↔n]
T[m↔n] T[i↔j] = T[i↔j] T[m↔n]
T[m↔n] T_α[m] = T_α[n] T[m↔n]
T[m↔n] T_α[i] = T_α[i] T[m↔n]
T[m↔n] T_α[ij] = T_α[ij] T[m↔n]
T[m↔n] T_α[in] = T_α[im] T[m↔n]
T[m↔n] T_α[mj] = T_α[nj] T[m↔n]
T[m↔n] T_α[mn] = T_α[nm] T[m↔n]

Notice that the effect an interchange elementary T[m↔n] has in passing any other elementary, even another interchange elementary, is simply to replace m by n and n by m among the indices of the other elementary.

Table 11.2: Inverting, commuting, combining and expanding elementary operators: scaling. In the table, no two indices among i, j, m and n are the same.

T_1[m] = I
I T_β[m] = T_β[m] I
T_(1/β)[m] T_β[m] = I
T_β[m] T_α[m] = T_α[m] T_β[m] = T_αβ[m]
T_β[m] T_α[i] = T_α[i] T_β[m]
T_β[m] T_α[ij] = T_α[ij] T_β[m]
T_β[m] T_αβ[im] = T_α[im] T_β[m]
T_β[m] T_α[mj] = T_αβ[mj] T_β[m]

Table 11.3: Inverting, commuting, combining and expanding elementary operators: addition. In the table, no two indices among i, j, m and n are the same. The last three lines give pairs of addition elementaries that do not commute.

T_0[ij] = I
I T_α[ij] = T_α[ij] I
T_(−α)[ij] T_α[ij] = I
T_β[ij] T_α[ij] = T_α[ij] T_β[ij] = T_(α+β)[ij]
T_β[mj] T_α[ij] = T_α[ij] T_β[mj]
T_β[in] T_α[ij] = T_α[ij] T_β[in]
T_β[mn] T_α[ij] = T_α[ij] T_β[mn]
T_β[mi] T_α[ij] = T_α[ij] T_αβ[mj] T_β[mi]
T_β[jn] T_α[ij] = T_α[ij] T_(−αβ)[in] T_β[jn]
T_β[ji] T_α[ij] ≠ T_α[ij] T_β[ji]
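The table entries can be spot-checked numerically. Below, one rule of Table 11.1 (an interchange re-addresses a row addition it passes) and the genuinely non-commuting addition pair of Table 11.3 are verified over a 4 × 4 window with 0-based indices (helper names are mine):

```python
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def interchange(i, j, n):
    T = identity(n); T[i][i] = T[j][j] = 0; T[i][j] = T[j][i] = 1; return T

def addition(a, i, j, n):
    T = identity(n); T[i][j] = a; return T

size = 4
# Table 11.1 rule T[m<->n] T_a[mj] = T_a[nj] T[m<->n], with m=0, n=1, j=2.
lhs = matmul(interchange(0, 1, size), addition(3, 0, 2, size))
rhs = matmul(addition(3, 1, 2, size), interchange(0, 1, size))
print(lhs == rhs)   # True

# Table 11.3 last line: T_b[ji] and T_a[ij] fail to commute (i=0, j=1).
ij = matmul(addition(7, 1, 0, size), addition(3, 0, 1, size))
ji = matmul(addition(3, 0, 1, size), addition(7, 1, 0, size))
print(ij == ji)     # False
```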

First encountered in § 11.4, the notation T^(−1) means the inverse of the elementary operator T, such that T^(−1)T = I = TT^(−1). Matrix inversion, though, is not for elementary operators only. Many more general matrices C also have inverses such that

C^(−1) C = I = C C^(−1).    (11.45)

(Do all matrices have such inverses? No. For example, the null matrix has no such inverse.) The broad question of how to invert a general matrix C, we leave for Chs. 12 and 13 to address. For the moment, however, we should like to observe three simple rules involving matrix inversion.

First, nothing in the logic leading to (11.44) actually requires the matrix T there to be an elementary operator. Any matrix C for which C^(−1) is known can fill the role. Hence,

CA = [C A C^(−1)] C,
AC = C [C^(−1) A C].    (11.46)

The transformation C A C^(−1) or C^(−1) A C is called a similarity transformation. Sections 12.2 and 14.9 speak further of this.

Second,

(C^T)^(−1) = C^(−T) = (C^(−1))^T,
(C^*)^(−1) = C^(−*) = (C^(−1))^*,    (11.47)

where C^(−*) is condensed notation for conjugate transposition and inversion in either order and C^(−T) is of like style. Equation (11.47) is a consequence of (11.14), since for conjugate transposition

(C^(−1))^* C^* = [C C^(−1)]^* = [I]^* = I = [I]^* = [C^(−1) C]^* = C^* (C^(−1))^*,

and similarly for nonconjugate transposition.

Third,

(∏_k C_k)^(−1) = ∐_k C_k^(−1).    (11.48)

This rule emerges upon repeated application of (11.45), which yields

∐_k C_k^(−1) ∏_k C_k = I = ∏_k C_k ∐_k C_k^(−1).
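The product rule (11.48) — the inverse of a product is the reversed product of the inverses — can be checked numerically with factors whose inverses are known exactly, such as elementary operators per (11.39). A sketch with 0-based indices (helper names are mine):

```python
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def addition(a, i, j, n):
    T = identity(n); T[i][j] = a; return T

def scaling(a, i, n):
    T = identity(n); T[i][i] = a; return T

n = 3
C1, C2 = addition(4, 2, 0, n), scaling(4, 1, n)
C1_inv, C2_inv = addition(-4, 2, 0, n), scaling(0.25, 1, n)

product = matmul(C1, C2)
reversed_inverses = matmul(C2_inv, C1_inv)   # note the reversed order
print(matmul(product, reversed_inverses) == identity(n))   # True
```

Reversing the order matters: C2^(−1) must undo C2 before C1^(−1) meets C1, which is exactly what the coproduct notation ∐ records.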

A more limited form of the inverse exists than the infinite-dimensional form of (11.45). This is the rank-r inverse, a matrix C^(−1(r)) such that

C^(−1(r)) C = I_r = C C^(−1(r)).    (11.49)

The full notation C^(−1(r)) for the rank-r inverse incidentally is not standard, usually is not needed, and normally is not used, since the context usually implies the rank. When so, one can abbreviate the notation to C^(−1). In either notation, because of (11.31), (11.47) and (11.48) apply equally for the rank-r inverse as for the infinite-dimensional inverse; (11.46) too applies for the rank-r inverse if A's active region is limited to r × r. Table 11.4 summarizes.

(Section 13.2 uses the rank-r inverse to solve an exactly determined linear system. This is a famous way to use the inverse, with which many or most readers will already be familiar; but before using it so in Ch. 13, we shall first learn how to compute it reliably in Ch. 12.)

Table 11.4: Matrix inversion properties. (The table's properties work equally for C^(−1(r)) as for C^(−1) if A honors an r × r active region.)

C^(−1) C = I = C C^(−1)
C^(−1(r)) C = I_r = C C^(−1(r))
(C^T)^(−1) = C^(−T) = (C^(−1))^T
(C^*)^(−1) = C^(−*) = (C^(−1))^*
CA = [C A C^(−1)] C
AC = C [C^(−1) A C]
(∏_k C_k)^(−1) = ∐_k C_k^(−1)

11.6 Parity

Consider the sequence of integers or other objects 1, 2, 3, . . . , n. By successively interchanging pairs of the objects (any pairs, not just adjacent pairs), one can achieve any desired permutation (§ 4.2.1). For example, beginning

with 1, 2, 3, 4, 5, one can achieve the permutation 3, 5, 1, 4, 2 by interchanging first the 1 and 3, then the 2 and 5.

Now contemplate all possible pairs:

(1,2)  (1,3)  (1,4)  ···  (1,n)
       (2,3)  (2,4)  ···  (2,n)
              (3,4)  ···  (3,n)
                      ⋱     ⋮
                          (n−1,n)

In a given permutation (like 3, 5, 1, 4, 2), some pairs will appear in correct order with respect to one another, while others will appear in incorrect order. (In 3, 5, 1, 4, 2, the pair [1, 2] appears in correct order in that the larger 2 stands to the right of the smaller 1; but the pair [1, 3] appears in incorrect order in that the larger 3 stands to the left of the smaller 1.) If p is the number of pairs which appear in incorrect order (in the example, p = 6), and if p is even, then we say that the permutation has even or positive parity; if odd, then odd or negative parity.[22]

Now consider: every interchange of adjacent elements must either increment or decrement p by one, thus reversing parity. Why? Well, think about it. If two elements are adjacent and their order is correct, then interchanging falsifies the order—but only of that pair (no other element interposes, so the interchange affects the ordering of no other pair). Complementarily, if the order is incorrect, then interchanging rectifies the order. Either way, an adjacent interchange alters p by exactly ±1.

What about nonadjacent elements? Does interchanging a pair of these reverse parity, too? To answer the question, let u and v represent the two elements interchanged, with a1, a2, . . . , am the elements lying between. Before the interchange:

. . . , u, a1, a2, . . . , am−1, am, v, . . .

After the interchange:

. . . , v, a1, a2, . . . , am−1, am, u, . . .

The interchange reverses with respect to one another just the pairs

(u, a1)  (u, a2)  · · ·  (u, am−1)  (u, am)
(a1, v)  (a2, v)  · · ·  (am−1, v)  (am, v)
(u, v)

[22] For readers who learned arithmetic in another language than English: the even integers are . . . , −4, −2, 0, 2, 4, 6, . . . ; the odd integers are . . . , −3, −1, 1, 3, 5, 7, . . . .
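The pair count p is simple to compute, and computing it confirms the section's claims for the text's example: p = 6 (even parity) for 3, 5, 1, 4, 2, and any single interchange — adjacent or not — flips p between even and odd. A sketch (the helper name is mine):

```python
def incorrect_pairs(seq):
    """p of § 11.6: the number of pairs standing in incorrect order."""
    n = len(seq)
    return sum(1 for i in range(n) for j in range(i + 1, n)
               if seq[i] > seq[j])

perm = [3, 5, 1, 4, 2]
print(incorrect_pairs(perm))         # 6  -> even (positive) parity

perm[0], perm[4] = perm[4], perm[0]  # interchange a nonadjacent pair
print(incorrect_pairs(perm) % 2)     # 1  -> parity reversed to odd
```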

1 and 14. This does not change parity. the net change in p apparently also is odd. explained in § 11. reversing parity. there are some extra rules. The sole exception arises when an element is interchanged with itself. The rows or columns of a matrix can be considered elements in a sequence. so in parity calculations we ignore it.7. more complicated than elementary operators but less so than arbitrary matrices. interchanging rows or columns and thus reversing parity. however. then q T[ik ↔jk ] = I. it is possible that q T[ik ↔jk ] = I k=1 k=1 if q is even. called in this book the quasielementary operators. With respect to interchange and scaling.7. However. even q implies even p. which means even (positive) parity. Since each reversal alters p by ±1. It seems that regardless of how distant the pair. any sequences of elementaries of the respective kinds are allowed.41) are always invertible. We discuss parity in this. as explained in footnote 19.1. If so.260 CHAPTER 11. With respect to addition. scaling and addition—to match the three kinds of elementary. but it does not change anything else. so one ﬁnds it convenient to deﬁne an intermediate class of operators. We shall have more to say about parity in §§ 11. The three subsections which follow respectively introduce the three kinds of quasielementary operator. Such complicated operators are not trivial to analyze. which means odd (negative) parity.3. acts precisely in the manner described. either. A quasielementary operator is composed of elementaries only of a single kind.4. THE MATRIX The number of pairs reversed is odd. It follows that if ik = jk and q is odd. a chapter on matrices. then the interchange operator T[i↔j] . . There are thus three kinds of quasielementary—interchange. one can form much more complicated operators.7 The quasielementary operator Multiplying sequences of the elementary operators of § 11. because parity concerns the elementary interchange operator of § 11. i = j. 
23 This is why some authors forbid self-interchanges. which per (11. 11.4.23 All other interchanges reverse parity. odd q implies odd p. In any event. interchanging any pair of elements reverses the permutation’s parity.
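The book gives no code, but the parity claim above is easy to check numerically. The sketch below is mine, not the text's: `pair_count` computes p, the number of pairs appearing out of numerical order, and the test confirms that interchanging any distinct pair of an ordered sequence flips p from even to odd, while a self-interchange changes nothing.

```python
def pair_count(seq):
    # p: the number of pairs of elements appearing out of numerical order
    return sum(1 for a in range(len(seq))
                 for b in range(a + 1, len(seq))
                 if seq[a] > seq[b])

def interchange(seq, i, j):
    s = list(seq)
    s[i], s[j] = s[j], s[i]
    return s

base = [1, 2, 3, 4, 5]              # in order: p = 0, even (positive) parity
distant = interchange(base, 0, 3)   # interchanging even a distant pair reverses parity
selfsame = interchange(base, 2, 2)  # a self-interchange changes nothing
```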

11.7.1 The interchange quasielementary or general interchange operator

Any product P of zero or more interchange elementaries,

    P = prod(k) T[ik<->jk],                                   (11.50)

constitutes an interchange quasielementary, permutation matrix, permutor or general interchange operator.24 An example is

    P = T[2<->5] T[1<->3] =
        [ 0 0 1 0 0
          0 0 0 0 1
          1 0 0 0 0
          0 0 0 1 0
          0 1 0 0 0 ]

(ellipses about the active region omitted here). This operator resembles I in that it has a single one in each row and in each column, but the ones here do not necessarily run along the main diagonal. The effect of the operator is to shuffle the rows or columns of the matrix it operates on, without altering any of the rows or columns it shuffles.

By (11.41), (11.43) and (11.15), the inverse of the general interchange operator is

    P^-1 = (prod(k) T[ik<->jk])^-1 = coprod(k) T[ik<->jk]^-1
         = coprod(k) T[ik<->jk] = coprod(k) T[ik<->jk]*
         = coprod(k) T[ik<->jk]^T = P* = P^T                  (11.51)

(where P* = P^T because P has only real elements). The inverse, transpose and adjoint of the general interchange operator are thus the same:

    P^T P = P* P = I = P P* = P P^T.                          (11.52)

24 The letter P here recalls the verb "to permute."
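As a quick numerical check of (11.51) and (11.52), here is a Python sketch (helper names are mine, and indices are written 0-based, so the text's P = T[2<->5] T[1<->3] becomes T[1<->4] T[0<->2]):

```python
def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]

def T_interchange(n, i, j):
    # the elementary interchange operator T[i<->j], within an n x n active region
    T = identity(n)
    T[i][i] = T[j][j] = 0
    T[i][j] = T[j][i] = 1
    return T

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def transpose(A):
    return [list(row) for row in zip(*A)]

n = 5
P = matmul(T_interchange(n, 1, 4), T_interchange(n, 0, 2))
```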

A significant attribute of the general interchange operator P is its parity: positive or even parity if the number of interchange elementaries T[ik<->jk] which compose it is even; negative or odd parity if the number is odd. For the purpose of parity determination, only interchange elementaries T[ik<->jk] for which ik != jk are counted; any T[i<->i] = I noninterchanges are ignored. Thus the example P above has even parity (two interchanges), as does I itself (zero interchanges), but T[i<->j] alone (one interchange) has odd parity if i != j. As we shall see in section 14.1, the positive (even) and negative (odd) parities sometimes lend actual positive and negative senses to the matrices they describe. The parity of the general interchange operator P concerns us for this reason.

Parity, incidentally, is a property of the matrix P itself, not just of the operation P represents. The reason is that, regardless of whether one ultimately means to use P as a row or column operator, the matrix is nonetheless composable as a definite sequence of interchange elementaries. It is the number of interchanges, not the use, which determines P's parity. No interchange quasielementary P has positive parity as a row operator but negative as a column operator.

11.7.2 The scaling quasielementary or general scaling operator

Like the interchange quasielementary P of section 11.7.1, the scaling quasielementary, diagonal matrix or general scaling operator D consists of a product of zero or more elementary operators, in this case elementary scaling operators:25

    D = prod(i=-inf to inf) T_ai[i] = coprod(i=-inf to inf) T_ai[i]
      = sum(i=-inf to inf) ai Eii =
        [ * 0 0 0 0
          0 * 0 0 0
          0 0 * 0 0
          0 0 0 * 0
          0 0 0 0 * ]                                         (11.53)

(of course it might be that ai = 1, hence that T_ai[i] = I, for some, most or even all i; however, ai = 0 is forbidden by the definition of the scaling elementary).

25 The letter D here recalls the adjective "diagonal."

An example is

    D = T-5[4] T4[2] T7[1] =
        [ 7  0 0  0 0
          0  4 0  0 0
          0  0 1  0 0
          0  0 0 -5 0
          0  0 0  0 1 ].

This operator resembles I in that all its entries run down the main diagonal; but these entries, though never zeros, are not necessarily ones, either. They are nonzero scaling factors. The effect of the operator is to scale the rows or columns of the matrix it operates on.

The general scaling operator is a particularly simple matrix. Its inverse is evidently

    D^-1 = coprod(i=-inf to inf) T_(1/ai)[i] = prod(i=-inf to inf) T_(1/ai)[i]
         = sum(i=-inf to inf) Eii / ai,                       (11.54)

where each element down the main diagonal is individually inverted.

A superset of the general scaling operator is the diagonal matrix, defined less restrictively in that [A]ij = 0 for i != j, where zeros along the main diagonal are allowed. The conventional notation

    [diag{x}]ij == dij xi = dij xj,                            (11.55)

    diag{x} =
        [ x1 0  0  0  0
          0  x2 0  0  0
          0  0  x3 0  0
          0  0  0  x4 0
          0  0  0  0  x5 ],

converts a vector x into a diagonal matrix. The diagonal matrix in general is not invertible and is no quasielementary operator, but is sometimes useful nevertheless.

11.7.3 Addition quasielementaries

Any product of interchange elementaries (section 11.7.1), any product of scaling elementaries (section 11.7.2), qualifies as a quasielementary operator. Not so, however, any product of addition elementaries.
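The element-by-element inversion (11.54) can be checked numerically. A minimal Python sketch, assuming the book's example D = T-5[4] T4[2] T7[1] and using exact rational arithmetic (the helper names are mine):

```python
from fractions import Fraction

def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

n = 5
alphas = {0: Fraction(7), 1: Fraction(4), 3: Fraction(-5)}  # T7[1] T4[2] T-5[4], 0-based
D = identity(n)
for i, a in alphas.items():
    D[i][i] = a

# the inverse simply inverts each diagonal element individually
Dinv = identity(n)
for i, a in alphas.items():
    Dinv[i][i] = 1 / a
```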

To qualify as a quasielementary, a product of elementary addition operators must meet some additional restrictions. Four types of addition quasielementary are defined:26

* the downward multitarget row addition operator,27

    L[j] = prod(i=j+1 to inf) T_aij[ij] = coprod(i=j+1 to inf) T_aij[ij]
         = I + sum(i=j+1 to inf) aij Eij =
           [ 1 0 0 0 0
             0 1 0 0 0
             0 0 1 0 0
             0 0 * 1 0
             0 0 * 0 1 ],                                     (11.56)

  whose inverse is

    L[j]^-1 = coprod(i=j+1 to inf) T_-aij[ij] = prod(i=j+1 to inf) T_-aij[ij]
            = I - sum(i=j+1 to inf) aij Eij = 2I - L[j];      (11.57)

26 In this subsection the explanations are briefer than in the last two, but the pattern is similar. The reader can fill in the details.
27 The letter L here recalls the adjective "lower."
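The striking form of (11.57), that inverting L[j] merely negates its off-diagonal entries, follows because the product of any two of its addition terms vanishes. A short Python check of my own (arbitrary sample alphas, 0-based indices):

```python
def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

n, j = 5, 2     # source row j; targets are the rows below it
L_j = identity(n)
L_j[3][j] = 5   # arbitrary alpha entries below the diagonal in column j
L_j[4][j] = -2

# the claimed inverse 2I - L[j]: the same matrix, off-diagonal entries negated
L_j_inv = [[2 * int(i == k) - L_j[i][k] for k in range(n)] for i in range(n)]
```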

* the upward multitarget row addition operator,28

    U[j] = coprod(i=-inf to j-1) T_aij[ij] = prod(i=-inf to j-1) T_aij[ij]
         = I + sum(i=-inf to j-1) aij Eij =
           [ 1 0 * 0 0
             0 1 * 0 0
             0 0 1 0 0
             0 0 0 1 0
             0 0 0 0 1 ],                                     (11.58)

  whose inverse is

    U[j]^-1 = prod(i=-inf to j-1) T_-aij[ij] = coprod(i=-inf to j-1) T_-aij[ij]
            = I - sum(i=-inf to j-1) aij Eij = 2I - U[j];     (11.59)

* the rightward multitarget column addition operator, which is the transpose L[j]^T of the downward operator; and

* the leftward multitarget column addition operator, which is the transpose U[j]^T of the upward operator.

28 The letter U here recalls the adjective "upper."

11.8 The unit triangular matrix

Yet more complicated than the quasielementary of section 11.7 is the unit triangular matrix, with which we draw this necessary but tedious chapter toward a long close:

    L = I + sum(i=-inf to inf) sum(j=-inf to i-1) aij Eij
      = I + sum(j=-inf to inf) sum(i=j+1 to inf) aij Eij =
        [ 1 0 0 0 0
          * 1 0 0 0
          * * 1 0 0
          * * * 1 0
          * * * * 1 ];                                        (11.60)

    U = I + sum(i=-inf to inf) sum(j=i+1 to inf) aij Eij
      = I + sum(j=-inf to inf) sum(i=-inf to j-1) aij Eij =
        [ 1 * * * *
          0 1 * * *
          0 0 1 * *
          0 0 0 1 *
          0 0 0 0 1 ].                                        (11.61)

The former is a unit lower triangular matrix; the latter, a unit upper triangular matrix. The unit triangular matrix is a generalized addition quasielementary, which adds not only to multiple targets but also from multiple sources, though in one direction only: downward or leftward for L or U^T (or U*); upward or rightward for U or L^T (or L*). This section presents and derives the basic properties of the unit triangular matrix.

The general triangular matrix LS or US, which by definition can have any values along its main diagonal, is sometimes of interest, as in the Schur decomposition of section 14.10.29 The strictly triangular matrix L - I or U - I is likewise sometimes of interest, as in Table 11.5. However, such matrices cannot in general be expressed as products of elementary operators, and this section does not treat them.

29 The subscript S here stands for Schur. Other books typically use the symbols L and U for the general triangular matrix of Schur, but this book distinguishes by the subscript.30
30 [52, "Schur decomposition," 00:32, 30 August 2007]

11.8.1 Construction

To make a unit triangular matrix is straightforward:

    L = coprod(j=-inf to inf) L[j],
    U = prod(j=-inf to inf) U[j].                             (11.62)

So long as the multiplication is done in the order indicated,31 then conveniently,

    [L]ij = [L[j]]ij,
    [U]ij = [U[j]]ij,                                         (11.63)

which is to say that the entries of L and U are respectively nothing more than the relevant entries of the several L[j] and U[j]. Equation (11.63) enables one to use (11.62) immediately and directly, without calculation, to build any unit triangular matrix desired.

The correctness of (11.63) is most easily seen if the several L[j] and U[j] are regarded as column operators acting sequentially on I:

    L = (I) coprod(j=-inf to inf) L[j],
    U = (I) prod(j=-inf to inf) U[j].

The reader can construct an inductive proof symbolically on this basis without too much difficulty if desired, but just thinking about how L[j] adds columns leftward and U[j], rightward, then considering the order in which the several L[j] and U[j] act, (11.63) follows at once.

31 Recall again from section 2.3 that prod(k) Ak = ... A3 A2 A1, whereas coprod(k) Ak = A1 A2 A3 .... This means that (prod(k) Ak)(C) applies first A1, then A2, A3 and so on, as row operators to C; whereas (C)(coprod(k) Ak) applies first A1, then A2, A3 and so on, as column operators to C. The symbols prod and coprod as this book uses them can thus be thought of respectively as row and column sequencers.
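The claim (11.63) can be sketched numerically: build one addition quasielementary L[j] per column with arbitrary entries, multiply them as column operators on I in ascending order of j, and observe that the product's entries are exactly the relevant entries of the several L[j]. The code and its helper names are mine, not the text's.

```python
import random

def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

random.seed(0)
n = 4
# one downward addition quasielementary L[j] per column j
Ls = []
for j in range(n):
    Lj = identity(n)
    for i in range(j + 1, n):
        Lj[i][j] = random.randint(-3, 3)
    Ls.append(Lj)

# apply the several L[j] as column operators to I, in ascending order of j
L = identity(n)
for Lj in Ls:
    L = matmul(L, Lj)
```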

11.8.2 The product of like unit triangular matrices

The product of like unit triangular matrices,

    L1 L2 = L,
    U1 U2 = U,                                                (11.64)

is another unit triangular matrix of the same type. To prove it, consider the unit lower triangular case (the proof for unit lower and unit upper triangular matrices is the same). One starts from a form of the definition of a unit lower triangular matrix:

    [L1]ij or [L2]ij = { 0 if i < j,
                       { 1 if i = j.

Then,

    [L1 L2]ij = sum(m=-inf to inf) [L1]im [L2]mj.

But as we have just observed, [L1]im is null when i < m, and [L2]mj is null when m < j. Therefore,

    [L1 L2]ij = { 0                                if i < j,
                { sum(m=j to i) [L1]im [L2]mj      if i >= j.

Inasmuch as this is true, nothing prevents us from weakening the statement to read

    [L1 L2]ij = { 0                                if i < j,
                { sum(m=j to i) [L1]im [L2]mj      if i = j.

But this is just

    [L1 L2]ij = { 0                                        if i < j,
                { [L1]ij [L2]ij = [L1]ii [L2]ii = (1)(1) = 1 if i = j,

which again is the very definition of a unit lower triangular matrix. Hence (11.64).

11.8.3 Inversion

Inasmuch as any unit triangular matrix can be constructed from addition quasielementaries by (11.62), inasmuch as (11.63) supplies the specific quasielementaries, and inasmuch as (11.57) or (11.59) gives the inverse of each such quasielementary,
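A concrete instance of (11.64), in a Python sketch of my own making: the product of two unit lower triangular matrices has ones down its diagonal and zeros above it, as the proof asserts.

```python
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def is_unit_lower_triangular(A):
    n = len(A)
    return (all(A[i][i] == 1 for i in range(n)) and
            all(A[i][j] == 0 for i in range(n) for j in range(i + 1, n)))

L1 = [[1, 0, 0], [2, 1, 0], [-1, 4, 1]]
L2 = [[1, 0, 0], [3, 1, 0], [5, -2, 1]]
L = matmul(L1, L2)
```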

one can always invert a unit triangular matrix easily by

    L^-1 = prod(j=-inf to inf) L[j]^-1,
    U^-1 = coprod(j=-inf to inf) U[j]^-1.                     (11.65)

In view of (11.64), therefore, the inverse of a unit lower triangular matrix is another unit lower triangular matrix; and the inverse of a unit upper triangular matrix, another unit upper triangular matrix.

It is plain to see but still interesting to note that, unlike the inverse, the adjoint or transpose of a unit lower triangular matrix is a unit upper triangular matrix, and the adjoint or transpose of a unit upper triangular matrix, a unit lower triangular matrix. The adjoint reverses the sense of the triangle.

11.8.4 The parallel unit triangular matrix

If a unit triangular matrix fits the special, restricted form

    L{k} = I + sum(j=-inf to k) sum(i=k+1 to inf) aij Eij =
           [ 1 0 0 0 0
             0 1 0 0 0
             0 0 1 0 0
             * * * 1 0
             * * * 0 1 ]                                      (11.66)
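That the inverse of a unit lower triangular matrix is again unit lower triangular can be checked by computing the inverse directly by forward substitution, which the unit diagonal renders divisionless. A Python sketch (my own helper names):

```python
def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def invert_unit_lower(L):
    # solve L X = I column by column by forward substitution;
    # the unit diagonal means no division is ever needed
    n = len(L)
    X = [[0] * n for _ in range(n)]
    for j in range(n):
        for i in range(n):
            X[i][j] = int(i == j) - sum(L[i][k] * X[k][j] for k in range(i))
    return X

L = [[1, 0, 0], [2, 1, 0], [-1, 4, 1]]
Linv = invert_unit_lower(L)
```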

or

    U{k} = I + sum(j=k to inf) sum(i=-inf to k-1) aij Eij =
           [ 1 0 * * *
             0 1 * * *
             0 0 1 0 0
             0 0 0 1 0
             0 0 0 0 1 ],                                     (11.67)

confining its nonzero elements to a rectangle within the triangle as shown, then it is a parallel unit triangular matrix and has some special properties the general unit triangular matrix lacks.

The general unit lower triangular matrix L acting LA on a matrix A adds the rows of A downward. The parallel unit lower triangular matrix L{k} acting L{k}A also adds rows downward, but with the useful restriction that it makes no row of A both source and target. The addition is from A's rows through the kth, to A's (k+1)th row onward. A horizontal frontier separates source from target, which thus march in A as separate squads.

Similar observations naturally apply with respect to the parallel unit upper triangular matrix U{k}, which acting U{k}A adds rows upward, and also with respect to L{k}^T and U{k}^T, which acting AL{k}^T and AU{k}^T add columns respectively rightward and leftward (remembering that L{k}^T is no unit lower but a unit upper triangular matrix; U{k}^T, the lower).

The reason we care about the separation of source from target is that, in matrix arithmetic generally, where source and target are not separate but remain intermixed, the sequence matters in which rows or columns are added. It makes a difference whether the one addition comes before, during or after the other, but only because the target of the one addition might be the source of the other. The danger is that i1 = j2 or i2 = j1. Remove this danger, and the sequence ceases to matter (refer to Table 11.3):

    Ta1[i1 j1] Ta2[i2 j2] = I + a1 Ei1j1 + a2 Ei2j2 = Ta2[i2 j2] Ta1[i1 j1].

That is exactly what the parallel unit triangular matrix does: it separates source from target and thus removes the danger. It is for this reason that

the parallel unit triangular matrix brings the useful property that

    L{k} = I + sum(j=-inf to k) sum(i=k+1 to inf) aij Eij
         = prod(j=-inf to k) prod(i=k+1 to inf) T_aij[ij]
         = coprod(j=-inf to k) coprod(i=k+1 to inf) T_aij[ij]
         = prod(i=k+1 to inf) prod(j=-inf to k) T_aij[ij]
         = coprod(i=k+1 to inf) coprod(j=-inf to k) T_aij[ij],  (11.68)

    U{k} = I + sum(j=k to inf) sum(i=-inf to k-1) aij Eij
         = prod(j=k to inf) prod(i=-inf to k-1) T_aij[ij] = ...,

which says that one can build a parallel unit triangular matrix equally well in any sequence, in contrast to the case of the general unit triangular matrix, whose construction per (11.62) one must sequence carefully. (Though eqn. 11.68 does not show them, even more sequences are possible.) You can scramble the factors' ordering any random way you like: the multiplication is fully commutative. Under such conditions, the inverse of the parallel unit

triangular matrix is particularly simple:32

    L{k}^-1 = I - sum(j=-inf to k) sum(i=k+1 to inf) aij Eij = 2I - L{k}
            = prod(j=-inf to k) prod(i=k+1 to inf) T_-aij[ij] = ...,
    U{k}^-1 = I - sum(j=k to inf) sum(i=-inf to k-1) aij Eij = 2I - U{k}
            = prod(j=k to inf) prod(i=-inf to k-1) T_-aij[ij] = ...,  (11.69)

where again the elementaries can be multiplied in any order. Pictorially,

    L{k}^-1 =
        [  1  0  0 0 0
           0  1  0 0 0
           0  0  1 0 0
          -* -* -* 1 0
          -* -* -* 0 1 ],

    U{k}^-1 =
        [ 1 0 -* -* -*
          0 1 -* -* -*
          0 0  1  0  0
          0 0  0  1  0
          0 0  0  0  1 ].

The inverse of a parallel unit triangular matrix is just the matrix itself, only with each element off the main diagonal negated. Table 11.5 records a few properties that come immediately of the last observation and from the parallel unit triangular matrix's basic layout.

32 There is some odd parochiality at play in applied mathematics when one calls such collections of symbols as (11.69) "particularly simple." Nevertheless, in the present context the idea (11.69) represents is indeed simple: that one can multiply the constituent elementaries in any order and still reach the same parallel unit triangular matrix; that the elementaries in this case do not interfere.
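The contrast between (11.69) and the general case is easy to demonstrate numerically. In the sketch below (my own, 0-based indices), negating the off-diagonal elements inverts the parallel matrix exactly, whereas the same trick fails for a general unit lower triangular matrix whose middle row is both target and source.

```python
def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

n = 5  # parallel: sources are rows/columns 0..2, targets rows 3..4
L_par = identity(n)
L_par[3][0], L_par[3][1], L_par[3][2] = 2, -1, 4
L_par[4][0], L_par[4][2] = 7, -3

# negate every off-diagonal element: 2I - L{k}
L_par_inv = [[2 * int(i == j) - L_par[i][j] for j in range(n)] for i in range(n)]

# contrast: in a general unit lower triangular matrix, a row can be
# both target and source, and mere negation does NOT invert
L_gen = identity(3)
L_gen[1][0], L_gen[2][1] = 2, 3
L_gen_neg = [[2 * int(i == j) - L_gen[i][j] for j in range(3)] for i in range(3)]
```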

Table 11.5: Properties of the parallel unit triangular matrix. (In the table, the notation I_a^b represents the generalized dimension-limited identity matrix or truncator of eqn. 11.30. Note that the inverses L{k}' = L{k}^-1 and U{k}' = U{k}^-1 are parallel unit triangular matrices themselves, such that the table's properties hold for them, too.)

    (L{k} + L{k}^-1)/2 = I = (U{k} + U{k}^-1)/2

    I_(k+1)^inf (L{k} - I) I_(-inf)^k = L{k} - I
    I_(-inf)^(k-1) (U{k} - I) I_k^inf = U{k} - I

    If L{k} honors an n x n active region, then
        (I_n - I_k) L{k} I_k = L{k} - I = (I_n - I_k)(L{k} - I) I_k
        and (I - I_n)(L{k} - I) = 0 = (L{k} - I)(I - I_n).

    If U{k} honors an n x n active region, then
        I_(k-1) U{k} (I_n - I_(k-1)) = U{k} - I = I_(k-1) (U{k} - I)(I_n - I_(k-1))
        and (I - I_n)(U{k} - I) = 0 = (U{k} - I)(I - I_n).

11.8.5 The partial unit triangular matrix

Besides the notation L and U for the general unit lower and unit upper triangular matrices and the notation L{k} and U{k} for the parallel unit lower and unit upper triangular matrices, we shall find it useful to introduce the additional notation

    L[k] = I + sum(j=k to inf) sum(i=j+1 to inf) aij Eij =
           [ 1 0 0 0 0
             0 1 0 0 0
             0 0 1 0 0
             0 0 * 1 0
             0 0 * * 1 ],                                     (11.70)

    U[k] = I + sum(j=-inf to k) sum(i=-inf to j-1) aij Eij =
           [ 1 * * 0 0
             0 1 * 0 0
             0 0 1 0 0
             0 0 0 1 0
             0 0 0 0 1 ]                                      (11.71)

and

    L{k} = I + sum(j=-inf to k) sum(i=j+1 to inf) aij Eij =
           [ 1 0 0 0 0
             * 1 0 0 0
             * * 1 0 0
             * * * 1 0
             * * * 0 1 ],                                     (11.72)

    U{k} = I + sum(j=k to inf) sum(i=-inf to j-1) aij Eij =
           [ 1 0 * * *
             0 1 * * *
             0 0 1 * *
             0 0 0 1 *
             0 0 0 0 1 ].                                     (11.73)

If names are needed for L[k], U[k], L{k} and U{k}, the former pair can be called minor partial unit triangular matrices, and the latter pair, major partial unit triangular matrices. Whether minor or major, the partial unit triangular matrix is a matrix which leftward or rightward of the kth column resembles I. Of course partial unit triangular matrices which resemble I above or below the kth row are equally possible, and can be denoted L[k]^T, U[k]^T, L{k}^T and U{k}^T.33

Observe that the parallel unit triangular matrices L{k} and U{k} of section 11.8.4 are in fact also major partial unit triangular matrices, as the notation suggests.

33 Such notation is not standard in the literature, but it serves a purpose in this book and is introduced here for this reason. The notation is arguably imperfect in that L{k} + L[k] - I != L but rather that L{k} + L[k+1] - I = L. The conventional notation

    sum(k=a to b) f(k) + sum(k=b to c) f(k) != sum(k=a to c) f(k)

suffers the same arguable imperfection.

11.9 The shift operator

Not all useful matrices fit the dimension-limited and extended-operational forms of section 11.3. An exception is the shift operator Hk, defined such that

    [Hk]ij = d_i(j+k).                                        (11.74)

For example,

    H2 =
        [ 0 0 0 0 0 0 0
          0 0 0 0 0 0 0
          1 0 0 0 0 0 0
          0 1 0 0 0 0 0
          0 0 1 0 0 0 0
          0 0 0 1 0 0 0
          0 0 0 0 1 0 0 ].

Operating Hk A, Hk shifts A's rows downward k steps. Operating A Hk, Hk shifts A's columns leftward k steps. Inasmuch as the shift operator shifts all rows or columns of the matrix it operates on, its active region is inf x inf in extent. Obviously, the shift operator's inverse, transpose and adjoint are the same:

    Hk^T Hk = Hk* Hk = I = Hk Hk* = Hk Hk^T,
    Hk^-1 = Hk^T = Hk* = H-k.                                 (11.75)

Further obvious but useful identities include that

    (Il - Ik) Hk = Hk Il-k,
    H-k (Il - Ik) = Il-k H-k.                                 (11.76)

11.10 The Jacobian derivative

Chapter 4 has introduced the derivative of a function with respect to a scalar variable. One can also take the derivative of a function with respect to a vector variable, and the function itself can be vector-valued. The derivative is

    [df/dx]ij = dfi/dxj.                                      (11.77)

For instance, if x has three elements and f has two, then

    df/dx = [ df1/dx1  df1/dx2  df1/dx3
              df2/dx1  df2/dx2  df2/dx3 ].
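A small sketch of the shift operator's action, mine rather than the book's. Note one caveat the code must respect: (11.75) holds for the infinite-dimensional operator, so a finite n x n window of Hk shows truncation at its edges; the assertions below therefore check the transpose identity and the interior row and column shifts only.

```python
def shift_op(n, k):
    # [H_k]_ij = delta_{i,(j+k)}, viewed through an n x n window
    return [[int(i == j + k) for j in range(n)] for i in range(n)]

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def transpose(A):
    return [list(row) for row in zip(*A)]

n = 6
H2 = shift_op(n, 2)
A = [[10 * i + j for j in range(n)] for i in range(n)]
rows_down = matmul(H2, A)   # H_k A shifts A's rows downward k steps
cols_left = matmul(A, H2)   # A H_k shifts A's columns leftward k steps
```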

This is called the Jacobian derivative, the Jacobian matrix, or just the Jacobian.34 Each of its columns is the derivative with respect to one element of x.

The Jacobian derivative of a vector with respect to itself is

    dx/dx = I.                                                (11.78)

The derivative is not In as one might think, because, even if x has only n elements, still, one could vary x_(n+1) in principle, and dx_(n+1)/dx_(n+1) != 0.

The Jacobian derivative obeys the derivative product rule (4.25) in the form35

    d/dx (g^T A f) = g^T A (df/dx) + [ (dg/dx)^T A f ]^T,
    d/dx (g* A f)  = g* A (df/dx)  + [ (dg/dx)* A f ]^T,      (11.79)

valid for any constant matrix A, as is seen by applying the definition (4.19) of the derivative, which here is

    d(g* A f)/dxj = lim(dxj -> 0)
        [ (g + dg/2)* A (f + df/2) - (g - dg/2)* A (f - df/2) ] / dxj,

and simplifying.

The shift operator of section 11.9 and the Jacobian derivative of this section complete the family of matrix rudiments we shall need to begin to do increasingly interesting things with matrices in Chs. 13 and 14. Before doing interesting things, however, we must treat two more foundational matrix matters. The two are the Gauss-Jordan decomposition and the matter of matrix rank, which will be the subjects of Ch. 12, next.

34 [52, "Jacobian," 00:50, 15 Sept. 2007]
35 Notice that the last term on (11.79)'s second line is transposed, not adjointed.
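Definition (11.77) can be illustrated by comparing an analytic Jacobian against a central-difference estimate. The example function and helper below are mine, not the text's: f maps a three-element x to a two-element result, so df/dx is the 2 x 3 matrix of (11.77).

```python
def f(x):
    # an example vector-valued function: f1 = x1*x2, f2 = x2 + x3^2
    return [x[0] * x[1], x[1] + x[2] ** 2]

def jacobian(f, x, h=1e-6):
    # [df/dx]_ij = d f_i / d x_j, estimated by central differences
    m, n = len(f(x)), len(x)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        fp, fm = f(xp), f(xm)
        for i in range(m):
            J[i][j] = (fp[i] - fm[i]) / (2 * h)
    return J

x = [2.0, 3.0, 0.5]
J = jacobian(f, x)
J_exact = [[3.0, 2.0, 0.0],   # row 1: [x2, x1, 0]
           [0.0, 1.0, 1.0]]   # row 2: [0, 1, 2*x3]
```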


Chapter 12

Matrix rank and the Gauss-Jordan decomposition

Chapter 11 has brought the matrix and its rudiments, the latter including

* the lone-element matrix E (section 11.3.7),
* the null matrix 0 (section 11.3.1),
* the rank-r identity matrix Ir (section 11.3.5),
* the general identity matrix I and the scalar matrix lambda*I (section 11.3.2),
* the elementary operator T (section 11.4),
* the quasielementary operator P, D, L[k] or U[k] (section 11.7), and
* the unit triangular matrix L or U (section 11.8).

Such rudimentary forms have useful properties, as we have seen. The general matrix A does not necessarily have any of these properties, but it turns out that one can factor any matrix whatsoever into a product of rudiments which do have the properties, and that several orderly procedures are known to do so. The simplest of these, and indeed one of the more useful, is the Gauss-Jordan decomposition. This chapter introduces it.

Section 11.3 has deemphasized the concept of matrix dimensionality m x n, supplying in its place the new concept of matrix rank. However, that section has actually defined rank only for the rank-r identity matrix Ir. In fact all matrices have rank. This chapter explains.

Before treating the Gauss-Jordan decomposition and the matter of matrix rank as such, we shall find it helpful to prepare two preliminaries thereto: (i) the matter of the linear independence of vectors; and (ii) the elementary similarity transformation. The chapter begins with these.

Except in section 12.2, the chapter demands more rigor than one likes in such a book as this. However, it is hard to see how to avoid the rigor here, and logically the chapter cannot be omitted. We shall drive through the chapter in as few pages as can be managed, and then onward to the more interesting matrix topics of Chs. 13 and 14.

12.1 Linear independence

Linear independence is a significant possible property of a set of vectors, whether the set be the several columns of a matrix, the several rows, or some other vectors, the property being defined as follows. A vector is linearly independent if its role cannot be served by the other vectors in the set. More formally, the n vectors of the set {a1, a2, a3, a4, a5, ..., an} are linearly independent if and only if none of them can be expressed as a linear combination, a weighted sum, of the others. That is, the several ak are linearly independent iff

    a1*alpha1 + a2*alpha2 + a3*alpha3 + ... + an*alphan != 0  (12.1)

for all nontrivial alphak, where "nontrivial alphak" means the several alphak, at least one of which is nonzero (trivial alphak, by contrast, would be alpha1 = alpha2 = alpha3 = ... = alphan = 0). Vectors which can combine nontrivially to reach the null vector are by definition linearly dependent.

Linear independence is a property of vectors. Technically the property applies to scalars, too, inasmuch as a scalar resembles a one-element vector, so any nonzero scalar alone is linearly independent; but there is no such thing as a linearly independent pair of scalars, because one of the pair can always be expressed as a complex multiple of the other. Significantly but less obviously, there is also no such thing as a linearly independent set which includes the null vector; (12.1) forbids it. Paradoxically, even the single-member, n = 1 set consisting only of a1 = 0 is, strictly speaking, not linearly independent.

For consistency of definition, we regard the empty, n = 0 set as linearly independent, on the technical ground that the only possible linear combination of the empty set is trivial.1

1 This is the kind of thinking which typically governs mathematical edge cases. One
and logically the chapter cannot be omitted. the n vectors of the set {a1 . an } are linearly independent if and only if none of them can be expressed as a linear combination—a weighted sum—of the others. A vector is linearly independent if its role cannot be served by the other vectors in the set. or some other vectors—the property being deﬁned as follows. . . it is hard to see how to avoid the rigor here. even the singlemember. we regard the empty. 12.

only if β1 − β1 = 0. then it cannot be expressed as any other combination of the same vectors. among other examples. .1. the important property of matrix rank depends on the number of linearly independent columns or rows a matrix has. but it helps to visualize the concept geometrically in three dimensions. Similar thinking makes 0! = 1.5. β3 − ′ β3 = 0. then one might ask: can there exist a diﬀerent linear combination of the same vectors ak which also forms b? That is. . As we shall see in § 12. this could only be so if the coeﬃcients in the last ′ ′ ′ equation where trivial—that is. were in fact one and the same. The combination is unique. if a vector b can be expressed as a linear combination of several linearly independent vectors ak . because (§ 11. The diﬀerence of the two equations then would be ′ ′ ′ ′ (β1 − β1 )a1 + (β2 − β2 )a2 + (β3 − β3 )a3 + · · · + (βn − βn )an = 0. and 2 not 1 k=0 the least prime. Two such vectors are independent so long as they do not lie along the same line. One concludes therefore that. a chapter on matrices. β2 − β2 = 0. LINEAR INDEPENDENCE 281 If a linear combination of several independent vectors ak forms a vector b. βn − βn = 0. which we had supposed to diﬀer. but what then of the observation that adding a vector to a linearly dependent set never renders the set independent? Surely in this light it is preferable justP deﬁne the empty set as to independent in the ﬁrst place. then is ′ ′ ′ ′ β1 a1 + β2 a2 + β3 a3 + · · · + βn an = b possible? To answer the question. .1) a matrix is essentially a sequence of vectors—either of column vectors or of row vectors. depending on one’s point of view.1). could deﬁne the empty set to be linearly dependent if one really wanted to. if β1 a1 + β2 a2 + β3 a3 + · · · + βn an = b. −1 ak z k = 0.12. According to (12. using the threedimensional geometrical vectors of § 3. But this says no less than that the two linear combinations. suppose that it were possible. .3. 
Linear independence can apply in any dimensionality. A third such vector is independent of the ﬁrst two so long as it does not lie in their common plane. where the several ak satisfy (12. A fourth such vector (unless it points oﬀ into some unvisualizable fourth dimension) cannot possibly then be independent of the three. We discuss the linear independence of vectors in this.1). .

12.2 The elementary similarity transformation

Section 11.5 and its (11.46) have introduced the similarity transformation C A C^-1 or C^-1 A C, which arises when an operator C commutes respectively rightward or leftward past a matrix A. The similarity transformation has several interesting properties, some of which we are now prepared to discuss, particularly in the case in which the operator happens to be an elementary, C = T. In this case, the several rules of Table 12.1 obtain.

Most of the table's rules are fairly obvious if the meaning of the symbols is understood, though to grasp some of the rules it helps to sketch the relevant matrices on a sheet of paper. Of course rigorous symbolic proofs can be constructed after the pattern of section 11.8.2, but they reveal little or nothing sketching the matrices does not. The symbols P, D, L and U of course represent the quasielementaries and unit triangular matrices of sections 11.7 and 11.8. The symbols P', D', L' and U' also represent quasielementaries and unit triangular matrices, only not necessarily the same ones P, D, L and U do.

The rules permit one to commute some but not all elementaries past a quasielementary or unit triangular matrix, without fundamentally altering the character of the quasielementary or unit triangular matrix, and sometimes without altering the matrix at all. The rules find use among other places in the Gauss-Jordan decomposition of section 12.3.

12.3 The Gauss-Jordan decomposition

The Gauss-Jordan decomposition of an arbitrary, dimension-limited, m x n matrix A is2

    A = G> Ir G< = P D L U Ir K S,
    G> == P D L U,
    G< == K S,                                                (12.2)

where

* P and S are general interchange operators (section 11.7.1),

2 Most introductory linear algebra texts this writer has met call the Gauss-Jordan decomposition instead the "LU decomposition" and include fewer factors in it, typically merging D into L and omitting K and S. They also omit Ir, since their matrices have pre-defined dimensionality. Perhaps the reader will agree that the decomposition is cleaner as presented here.

Table 12.1: Some elementary similarity transformations.

    T[i<->j] I T[i<->j] = I
    T[i<->j] P T[i<->j] = P'
    T[i<->j] D T[i<->j] = D' = D + ([D]jj - [D]ii) Eii + ([D]ii - [D]jj) Ejj
    T[i<->j] D T[i<->j] = D                  if [D]ii = [D]jj
    T[i<->j] L[k] T[i<->j] = L[k]'           if i < k and j < k
    T[i<->j] U[k] T[i<->j] = U[k]'           if i > k and j > k
    T[i<->j] L{k} T[i<->j] = L{k}'           if i > k and j > k
    T[i<->j] U{k} T[i<->j] = U{k}'           if i < k and j < k
    Ta[i] I T(1/a)[i] = I
    Ta[i] D T(1/a)[i] = D
    Ta[i] A T(1/a)[i] = A'                   where A is any of L, U, L[k], U[k], L{k}, U{k}
    Ta[ij] I T-a[ij] = I
    Ta[ij] D T-a[ij] = D' = D + ([D]jj - [D]ii) a Eij
    Ta[ij] D T-a[ij] = D                     if [D]ii = [D]jj
    Ta[ij] L T-a[ij] = L'                    if i > j
    Ta[ij] U T-a[ij] = U'                    if i < j

* D is a general scaling operator (section 11.7.2),
* L and U are respectively unit lower and unit upper triangular matrices (section 11.8),
* K = L{r}^T is the transpose of a parallel unit lower triangular matrix, being thus a parallel unit upper triangular matrix (section 11.8.4),
* G> and G< are composites3 as defined by (12.2), and
* r is an unspecified rank.

The Gauss-Jordan decomposition is also called the Gauss-Jordan factorization.

Whether all possible matrices A have a Gauss-Jordan decomposition (they do, in fact) is a matter this section addresses. However, at least for matrices which do have one, because G> and G< are composed of invertible factors, one can left-multiply the equation A = G> Ir G< by G>^-1 and right-multiply it by G<^-1 to obtain

    U^-1 L^-1 D^-1 P^-1 A S^-1 K^-1 = G>^-1 A G<^-1 = Ir,
    S^-1 K^-1 = G<^-1,
    U^-1 L^-1 D^-1 P^-1 = G>^-1,                              (12.3)

the Gauss-Jordan's complementary form.

12.3.1 Motive

Equation (12.2) seems inscrutable. The equation itself is easy enough to read, but just as there are many ways to factor a scalar (0xC = [4][3] = [2]^2[3] = [2][6], for example), there are likewise many ways to factor a matrix. Why choose this particular way? There are indeed many ways. We shall meet some of the others in sections 13.6, 14.10 and 14.12. The Gauss-Jordan decomposition we meet here however has both significant theoretical properties and useful practical applications, and in any case needs less advanced preparation to appreciate than the others, and (at least as developed in this book) precedes the others logically.

3 One can pronounce G> and G< respectively as "G acting rightward" and "G acting leftward." The letter G itself can be regarded as standing for "Gauss-Jordan," but admittedly it is chosen as much because otherwise we were running out of available Roman capitals!

It emerges naturally when one posits a pair of square, n × n matrices, A and A⁻¹, for which A⁻¹A = I_n, where A is known and A⁻¹ is to be determined. (The A⁻¹ here is the A⁻¹(n) of eqn. 11.49. However, it is only supposed here that A⁻¹A = I_n; it is not yet claimed that AA⁻¹ = I_n.)

To determine A⁻¹ is not an entirely trivial problem. The matrix A⁻¹ such that A⁻¹A = I_n may or may not exist (usually it does exist if A is square, but even then it may not, as we shall soon see), and even if it does exist, how to determine it is not immediately obvious. And still, if one can determine A⁻¹, what if A is not square? In the present subsection however we are not trying to prove anything, only to motivate, so for the moment let us suppose a square A for which A⁻¹ does exist, and let us seek A⁻¹ by left-multiplying A by a sequence ∏T of elementary row operators, each of which makes the matrix more nearly resemble I_n. When I_n is finally achieved, then we shall have that

  (∏T)(A) = I_n,

or, left-multiplying by I_n and observing that I_n² = I_n,

  (I_n)(∏T)(A) = I_n,

which implies that

  A⁻¹ = (I_n)(∏T).

The product of elementaries which transforms A to I_n, truncated (§ 11.3.6) to n × n dimensionality, itself constitutes A⁻¹. This observation is what motivates the Gauss-Jordan decomposition.

Consider for example⁴ the concrete matrix

  A = [  2    3 ]
      [ −4   −1 ].

By successive steps,

  [ 1/2  0 ] A = [  1   3/2 ]
  [  0   1 ]     [ −4   −1  ],

  [ 1  0 ][ 1/2  0 ] A = [ 1  3/2 ]
  [ 4  1 ][  0   1 ]     [ 0   5  ],

  [ 1   0  ][ 1  0 ][ 1/2  0 ] A = [ 1  3/2 ]
  [ 0  1/5 ][ 4  1 ][  0   1 ]     [ 0   1  ],

  [ 1  −3/2 ][ 1   0  ][ 1  0 ][ 1/2  0 ] A = [ 1  0 ]
  [ 0    1  ][ 0  1/5 ][ 4  1 ][  0   1 ]     [ 0  1 ].

Hence,

  A⁻¹ = [ 1  0 ][ 1  −3/2 ][ 1   0  ][ 1  0 ][ 1/2  0 ] = [ −1/0xA  −3/0xA ]
        [ 0  1 ][ 0    1  ][ 0  1/5 ][ 4  1 ][  0   1 ]   [   2/5     1/5  ].

Using the elementary commutation identity that Tβ[m] Tα[mj] = Tαβ[mj] Tβ[m], from Table 11.2, to group like operators,

  A⁻¹ = [ 1  0 ][ 1  −3/2 ][ 1    0 ][ 1   0  ][ 1/2  0 ] = [ −1/0xA  −3/0xA ]
        [ 0  1 ][ 0    1  ][ 4/5  1 ][ 0  1/5 ][  0   1 ]   [   2/5     1/5  ],

or, multiplying the two scaling elementaries to merge them into a single general scaling operator (§ 11.7.2),

  A⁻¹ = [ 1  0 ][ 1  −3/2 ][ 1    0 ][ 1/2   0  ] = [ −1/0xA  −3/0xA ]
        [ 0  1 ][ 0    1  ][ 4/5  1 ][  0   1/5 ]   [   2/5     1/5  ].

The last equation is written symbolically as

  A⁻¹ = I₂ U⁻¹ L⁻¹ D⁻¹,

from which

  A = D L U I₂ = [ 2  0 ][  1    0 ][ 1  3/2 ][ 1  0 ] = [  2    3 ]
                 [ 0  5 ][ −4/5  1 ][ 0   1  ][ 0  1 ]   [ −4   −1 ].

⁴Theoretically, all elementary operators including the ones here have extended-operational form (§ 11.3.2), but all those · · · ellipses clutter the page too much. Only the 2 × 2 active regions are shown here.
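The example's arithmetic can be checked mechanically. The following sketch (not part of the original text) redoes the reduction in exact rational arithmetic; the helper `matmul` and the names T1 through T4 are mine, and 0xA = 10 in the book's hexadecimal style.

```python
from fractions import Fraction as F

# Exact check of the 2 x 2 example: four elementary row operators
# reduce A to the identity, so their product is A's inverse.
def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

A  = [[F(2), F(3)], [F(-4), F(-1)]]
T1 = [[F(1, 2), F(0)], [F(0), F(1)]]   # T(1/2)[1]: scale row 1 by 1/2
T2 = [[F(1), F(0)], [F(4), F(1)]]      # T4[21]: add 4 x row 1 to row 2
T3 = [[F(1), F(0)], [F(0), F(1, 5)]]   # T(1/5)[2]: scale row 2 by 1/5
T4 = [[F(1), F(-3, 2)], [F(0), F(1)]]  # T(-3/2)[12]: clear above the pivot

Ainv = matmul(T4, matmul(T3, matmul(T2, T1)))
assert Ainv == [[F(-1, 10), F(-3, 10)], [F(2, 5), F(1, 5)]]
assert matmul(Ainv, A) == [[F(1), F(0)], [F(0), F(1)]]

# the regrouped factors of A = D L U I2
D = [[F(2), F(0)], [F(0), F(5)]]
L = [[F(1), F(0)], [F(-4, 5), F(1)]]
U = [[F(1), F(3, 2)], [F(0), F(1)]]
assert matmul(D, matmul(L, U)) == A
```

Exact fractions are used rather than floating point so the assertions test the algebra itself, not rounding.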

The last equation is (12.2)—or rather, it is (12.2) but only in the special case that r = 2 and P = S = K = I—which begs the question: why do we need the factors P, S and K in the first place? The answer regarding P and S is that these factors respectively gather row and column interchange elementaries, of which the example given has used none but which other examples sometimes need or want, particularly to avoid dividing by zero when they encounter a zero in an inconvenient cell of the matrix (the reader might try reducing A = [0 1; 1 0] to I₂, for instance; a row or column interchange is needed here). Regarding K, this factor comes into play when A has broad rectangular rather than square shape, and also sometimes when one of the rows of A happens to be a linear combination of the others. The last point, admittedly, we are not quite ready to detail yet, but if the reader will accept the other factors and suspend judgment on K until the actual need for it emerges in § 12.3.3, step 12, then we shall proceed on this basis.

12.3.2 Method

The Gauss-Jordan decomposition of a matrix A is not discovered at one stroke but rather is gradually built up, elementary by elementary. It begins with the equation

  A = IIIIAII,

where the six I hold the places of the six Gauss-Jordan factors P, D, L, U, K and S of (12.2). By successive elementary operations, the A on the right is gradually transformed into I_r, while the six I are gradually transformed into the six Gauss-Jordan factors. The decomposition thus ends with the equation

  A = P D L U I_r K S,

which is (12.2). In between, while the several matrices are gradually being transformed, the equation is represented as

  A = P̃ D̃ L̃ Ũ Ĩ K̃ S̃,                                                   (12.4)

where the initial value of Ĩ is A and the initial values of P̃, D̃, etc., are all I.

Each step of the transformation goes as follows. The matrix Ĩ is left- or right-multiplied by an elementary operator T. To compensate, one of the six factors is right- or left-multiplied by T⁻¹. Intervening factors are multiplied by both T and T⁻¹, which multiplication constitutes an elementary similarity transformation as described in § 12.2.
For example, suppose that the algorithm is to scale the ith row of Ĩ by a factor α. Inasmuch as the adjacent elementaries cancel one another, (12.4) can be expanded to read

  A = (P̃) (D̃ T(1/α)[i]) (Tα[i] L̃ T(1/α)[i]) (Tα[i] Ũ T(1/α)[i]) (Tα[i] Ĩ) (K̃) (S̃),

whereupon the algorithm assigns

  D̃ ← D̃ T(1/α)[i],
  L̃ ← Tα[i] L̃ T(1/α)[i],
  Ũ ← Tα[i] Ũ T(1/α)[i],
  Ĩ ← Tα[i] Ĩ,

thus associating the operation with the appropriate factor—in this case, D̃. Such elementary row and column operations are repeated until Ĩ = I_r, at which point (12.4) has become the Gauss-Jordan decomposition (12.2).

12.3.3 The algorithm

Having motivated the Gauss-Jordan decomposition in § 12.3.1 and having proposed a basic method to pursue it in § 12.3.2, we shall now establish a definite, orderly, failproof algorithm to achieve it. Broadly, the algorithm

  • copies A into the variable working matrix Ĩ (step 1 below);
  • reduces Ĩ by suitable row (and maybe column) operations to unit upper triangular form (steps 2 through 7);
  • establishes a rank r (step 8); and
  • reduces the now unit triangular Ĩ further to the rank-r identity matrix I_r (steps 9 through 13).

Specifically, the algorithm decrees the following steps. (The steps as written include many parenthetical remarks—so many that some steps seem to consist more of parenthetical remarks than of actual algorithm. The remarks are unnecessary to execute the algorithm's steps as such. They are however necessary to explain and to justify the algorithm's steps to the reader.)

1. Begin by initializing

  P̃ ← I,   D̃ ← I,   L̃ ← I,   Ũ ← I,   Ĩ ← A,   K̃ ← I,   S̃ ← I,
  i ← 1,

where Ĩ holds the part of A remaining to be decomposed, where the others are the variable working matrices of (12.4), and where i is a row index. (The eventual goal will be to factor all of Ĩ away, leaving Ĩ = I_r, though the precise value of r will not be known until step 8.)

Since A by definition is a dimension-limited m × n matrix, one naturally need not store A beyond the m × n active region. What is less clear until one has read the whole algorithm, but nevertheless true, is that one also need not store the dimension-limited Ĩ beyond the m × n active region. The other six variable working matrices each have extended-operational form, but they also confine their activity to well defined regions: m × m for P̃, D̃, L̃ and Ũ; n × n for K̃ and S̃. One need store none of the matrices beyond these bounds.

2. (Besides arriving at this point from step 1 above, the algorithm also reënters here from step 7 below.)⁵ Observe that neither the ith row of Ĩ nor any row below it has an entry left of the ith column; that is, that Ĩ is all-zero below-leftward of and directly leftward of (though not directly below) the pivot element ĩ_ii. (The notation ĩ_ii looks interesting, but the ˜ relates not to the doubled, subscribed index ii but to Ĩ; ĩ_ii thus means [Ĩ]_ii—in other words, the current iith element of the variable working matrix Ĩ.) Observe also that above the ith row, the matrix has proper unit upper triangular form (§ 11.8). Regarding the other factors, notice that L̃ enjoys the major partial unit triangular form L^{{i−1}} (§ 11.8.5) and that d̃_kk = 1 for all k ≥ i.

⁵From step 1, Ĩ = A and L̃ = I, so this step 2 though logical seems unneeded. The need grows clear once one has read through step 7.

Pictorially, D̃ is diagonal with ones at and below the iith position; Ĩ is unit upper triangular above its ith row but unrestricted from the ith row downward, except all-zero leftward of the ith column there; and L̃ = L^{{i−1}} is unit lower triangular with only its first i − 1 columns yet filled in. (The original's pictorial sketches of D̃, Ĩ and L̃, depicting the ith row and column at center, are not reproduced here.)

3. Choose a nonzero element ĩ_pq ≠ 0 on or below the pivot row, where p ≥ i and q ≥ i. (The easiest choice may simply be ĩ_ii, if ĩ_ii ≠ 0, but any nonzero element from the ith row downward can in general be chosen. Beginning students of the Gauss-Jordan or LU decomposition are conventionally taught to choose first the least possible q then the least possible p. When one has no reason to choose otherwise, that is as good a choice as any. There is however no actual need to choose so. In fact alternate choices can sometimes improve practical numerical accuracy.⁷) If no nonzero element is available—if all remaining rows p ≥ i are now null—then skip directly to step 8.

⁶A typical Intel or AMD x86-class computer processor represents a C/C++ double-type floating-point number, x = 2^p b, in 0x40 bits of computer memory. Of the 0x40 bits, 0x34 are for the number's mantissa 2.0 ≤ b < 4.0 (not 1.0 ≤ b < 2.0 as one might expect), 0xB are for the number's exponent −0x3FF ≤ p ≤ 0x3FE, and one is for the number's ± sign. (The mantissa's high-order bit, which is always 1, is implied not stored, thus is one neither of the 0x34 nor of the 0xB bits.) The out-of-bounds exponents p = −0x400 and p = 0x3FF serve specially respectively to encode 0 and ∞. Such a floating-point representation is easily accurate enough for most practical purposes, but of course it is not generally exact. [26, § 1-4.2]

⁷The Gauss-Jordan's floating-point errors⁶ come mainly from dividing by small pivots. Such errors are naturally avoided by avoiding small pivots, at least until as late in the algorithm as possible. Smallness however is relative: a small pivot in a row and a column each populated by even smaller elements is unlikely to cause as much error as is a large pivot in a row and a column each populated by even larger elements. To choose a pivot, any of several heuristics are reasonable. The following heuristic if programmed intelligently might not be too computationally expensive. Define the pivot-smallness metric

  η²_pq ≡ 2 ĩ*_pq ĩ_pq / ( Σ_{p′=i}^{m} ĩ*_{p′q} ĩ_{p′q} + Σ_{q′=i}^{n} ĩ*_{pq′} ĩ_{pq′} ),

and choose the p and q of greatest η_pq, so long as ĩ_pq ≠ 0. If two candidates are equal, then choose first the lesser column index q, then if necessary the lesser row index p. All this is standard computing practice.
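The footnote's pivot metric is easily coded. In the sketch below (function name and example numbers are mine, not the book's) larger values of η² mark candidate pivots that dominate their own row and column, and are therefore safer to divide by:

```python
import numpy as np

def eta_sq(I, i, p, q):
    # squared pivot metric of the heuristic, 0-based indices:
    # 2|I[p,q]|^2 over the sums of squares down column q and across row p,
    # both taken from index i onward
    num = 2.0 * abs(I[p, q]) ** 2
    den = np.sum(np.abs(I[i:, q]) ** 2) + np.sum(np.abs(I[p, i:]) ** 2)
    return num / den

I = np.array([[1., 2.],
              [3., 4.]])
best = max(((p, q) for p in range(2) for q in range(2)),
           key=lambda pq: eta_sq(I, 0, *pq))
# the entry 4 at (1, 1) dominates its row and column most strongly
```

The metric costs only a row sum and a column sum per candidate, which is what the footnote means by "not too computationally expensive."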
4. Observing that (12.4) can be expanded to read

  A = (P̃ T[p↔i]) (T[p↔i] D̃ T[p↔i]) (T[p↔i] L̃ T[p↔i]) (T[p↔i] Ũ T[p↔i])
        × (T[p↔i] Ĩ T[i↔q]) (T[i↔q] K̃ T[i↔q]) (T[i↔q] S̃)
    = (P̃ T[p↔i]) D̃ (T[p↔i] L̃ T[p↔i]) Ũ (T[p↔i] Ĩ T[i↔q]) K̃ (T[i↔q] S̃),

let

  P̃ ← P̃ T[p↔i],
  L̃ ← T[p↔i] L̃ T[p↔i],
  Ĩ ← T[p↔i] Ĩ T[i↔q],
  S̃ ← T[i↔q] S̃,

thus interchanging the pth with the ith row and the qth with the ith column, to bring the chosen element to the pivot position. (Refer to Table 12.1 for the similarity transformations. The Ũ and K̃ transformations disappear because at this stage of the algorithm, still Ũ = K̃ = I. The D̃ transformation disappears because p ≥ i and d̃_kk = 1 for all k ≥ i. Regarding the L̃ transformation, it does not disappear, but L̃ has major partial unit triangular form L^{{i−1}}, which form according to Table 12.1 it retains since i − 1 < i ≤ p.)

5. Observing that (12.4) can be expanded to read

  A = (P̃) (D̃ T_ĩii[i]) (T_(1/ĩii)[i] L̃ T_ĩii[i]) (T_(1/ĩii)[i] Ũ T_ĩii[i])
        × (T_(1/ĩii)[i] Ĩ) (K̃) (S̃),

normalize the new ĩ_ii pivot by letting

  D̃ ← D̃ T_ĩii[i],
  L̃ ← T_(1/ĩii)[i] L̃ T_ĩii[i],
  Ũ ← T_(1/ĩii)[i] Ũ T_ĩii[i],
  Ĩ ← T_(1/ĩii)[i] Ĩ.

This forces ĩ_ii = 1. It also changes the value of d̃_ii. Pictorially after this step the pivot element of Ĩ is 1, while D̃ remains diagonal. (The original's pictorial sketches of D̃ and Ĩ are not reproduced here.)

(Though the step changes L̃, the similarity transformation leaves that matrix in the major partial unit triangular form L^{{i−1}}, because i − 1 < i. Refer to Table 12.1.)

6. Observing that (12.4) can be expanded to read

  A = P̃ D̃ (L̃ T_ĩpi[pi]) (T_−ĩpi[pi] Ũ T_ĩpi[pi]) (T_−ĩpi[pi] Ĩ) K̃ S̃
    = P̃ D̃ (L̃ T_ĩpi[pi]) Ũ (T_−ĩpi[pi] Ĩ) K̃ S̃,

clear Ĩ's ith column below the pivot by letting

  L̃ ← L̃ ( ∏_{p=i+1}^{m} T_ĩpi[pi] ),
  Ĩ ← ( ∏_{p=i+1}^{m} T_−ĩpi[pi] ) Ĩ.

This forces ĩ_pi = 0 for all p > i. It also fills in L̃'s ith column below the pivot, advancing that matrix from the L^{{i−1}} form to the L^{{i}} form. Pictorially, Ĩ is now unit upper triangular through its ith row and column, while L̃ = L^{{i}}. (The original's pictorial sketches are not reproduced here.)

(Note that it is not necessary actually to apply the addition elementaries here one by one. Together they easily form an addition quasielementary L^[i], and thus can be applied all at once. See § 11.7.3.)

7. Increment

  i ← i + 1,

and go back to step 2.

8. Decrement

  i ← i − 1

to undo the last instance of step 7 (even if there never was an instance of step 7), thus letting i point to the matrix's last nonzero row. After decrementing, let the rank

  r ≡ i.

Notice that, certainly, r ≤ m and r ≤ n.

9. (Besides arriving at this point from step 8 above, the algorithm also reënters here from step 11 below.) If i = 0, then skip directly to step 12.

10. Observing that (12.4) can be expanded to read

  A = P̃ D̃ L̃ (Ũ T_ĩpi[pi]) (T_−ĩpi[pi] Ĩ) K̃ S̃,

clear Ĩ's ith column above the pivot by letting

  Ũ ← Ũ ( ∏_{p=1}^{i−1} T_ĩpi[pi] ),
  Ĩ ← ( ∏_{p=1}^{i−1} T_−ĩpi[pi] ) Ĩ.

This forces ĩ_pi = 0 for all p ≠ i. It also fills in Ũ's ith column above the pivot, advancing that matrix from the U^{{i+1}} form to the U^{{i}} form.

Pictorially after this step, Ũ = U^{{i}} while Ĩ's ith column stands cleared above the pivot. (The original's pictorial sketches of Ũ and Ĩ are not reproduced here.)

(As in step 6, here again it is not necessary actually to apply the addition elementaries one by one. Together they easily form an addition quasielementary U^[i]. See § 11.7.3.)

11. Decrement

  i ← i − 1,

and go back to step 9.

12. Notice that Ĩ now has the form of a rank-r identity matrix, except with n − r extra columns dressing its right edge (often r = n however; then there are no extra columns). Pictorially, Ĩ's main body is I_r, while stray nonzero elements may remain in its first r rows rightward of the rth column. (The original's pictorial sketch is not reproduced here.) Observing that (12.4) can be expanded to read

  A = P̃ D̃ L̃ Ũ (Ĩ T_−ĩpq[pq]) (T_ĩpq[pq] K̃) S̃,

use the now conveniently elementarized columns of Ĩ's main body to suppress the extra columns on its right edge by

  Ĩ ← Ĩ ( ∏_{q=r+1}^{n} ∏_{p=1}^{r} T_−ĩpq[pq] ),
  K̃ ← ( ∏_{q=r+1}^{n} ∏_{p=1}^{r} T_ĩpq[pq] ) K̃.

(Actually, entering this step, it was that K̃ = I, so in fact K̃ becomes just the product above. As in steps 6 and 10, here again it is not necessary actually to apply the addition elementaries one by one. Together they easily form a parallel unit upper—not lower—triangular matrix L^{{r}T}. See § 11.8.4.)

13. Notice that Ĩ = I_r. Let

  P ≡ P̃,   D ≡ D̃,   L ≡ L̃,   U ≡ Ũ,   K ≡ K̃,   S ≡ S̃.                  (12.5)

End. Never stalling, the algorithm cannot fail to achieve Ĩ = I_r and thus a complete Gauss-Jordan decomposition of the form (12.2), though what value the rank r might turn out to have is not normally known to us in advance. (We have not yet proven, but shall in § 12.5, that the algorithm always produces the same I_r, the same rank r ≥ 0, regardless of which pivots ĩ_pq ≠ 0 one happens to choose in step 3 along the way. We can safely ignore this unproven fact however for the immediate moment.)
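For readers who prefer code, the thirteen steps admit the following compact numerical sketch (an illustration under floating-point arithmetic with a simple largest-magnitude pivot rule, not the book's own program; all names are mine). The working matrix Ĩ is reduced while the six factors absorb the compensating elementaries, exactly as in (12.4):

```python
import numpy as np

def gauss_jordan(A, tol=1e-9):
    """Return P, D, L, U, r, K, S with A = P @ D @ L @ U @ Ir @ K @ S,
    where Ir is the m-by-n rank-r identity."""
    A = np.asarray(A, dtype=float)
    m, n = A.shape
    P, D, L, U = (np.eye(m) for _ in range(4))
    K, S = np.eye(n), np.eye(n)
    I = A.copy()                       # the working matrix of (12.4)
    i = 0                              # 0-based here; the text counts from 1
    while i < min(m, n):               # steps 2 through 7
        sub = np.abs(I[i:, i:])
        if sub.max() <= tol:           # step 3: no nonzero pivot remains
            break
        p, q = (int(x) + i for x in np.unravel_index(np.argmax(sub), sub.shape))
        P[:, [p, i]] = P[:, [i, p]]    # step 4: P <- P T[p<->i]
        L[[p, i], :] = L[[i, p], :]    #         L <- T[p<->i] L T[p<->i]
        L[:, [p, i]] = L[:, [i, p]]
        I[[p, i], :] = I[[i, p], :]    #         I <- T[p<->i] I T[i<->q]
        I[:, [q, i]] = I[:, [i, q]]
        S[[q, i], :] = S[[i, q], :]    #         S <- T[i<->q] S
        d = I[i, i]                    # step 5: normalize the pivot
        D[:, i] *= d
        L[i, :] /= d; L[:, i] *= d     # similarity transformation of L
        I[i, :] /= d
        for p2 in range(i + 1, m):     # step 6: clear below the pivot
            c = I[p2, i]
            L[:, i] += c * L[:, p2]
            I[p2, :] -= c * I[i, :]
        i += 1                         # step 7
    r = i                              # step 8: the rank
    for i in range(r - 1, -1, -1):     # steps 9 through 11
        for p2 in range(i):            # step 10: clear above the pivot
            c = I[p2, i]
            U[:, i] += c * U[:, p2]
            I[p2, :] -= c * I[i, :]
    for q in range(r, n):              # step 12: suppress extra columns
        for p2 in range(r):
            c = I[p2, q]
            K[p2, :] += c * K[q, :]
            I[:, q] -= c * I[:, p2]
    return P, D, L, U, r, K, S         # step 13

# a broad rectangular, rank-deficient example exercising S and K
A = np.array([[0., 1., 2.],
              [0., 2., 4.]])
P, D, L, U, r, K, S = gauss_jordan(A)
Ir = np.zeros(A.shape); Ir[:r, :r] = np.eye(r)
assert r == 1
assert np.allclose(P @ D @ L @ U @ Ir @ K @ S, A)
```

The example deliberately starts with a null first column and a dependent second row, so the sketch must use its column-interchange factor S and its column-suppressing factor K, as the narrative of steps 3 and 12 anticipates.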

12.3.4 Rank and independent rows

Observe that the Gauss-Jordan algorithm of § 12.3.3 operates always within the bounds of the original m × n matrix A. The rank r exceeds the number neither of the matrix's rows nor of its columns: necessarily, r ≤ m and r ≤ n. This is unsurprising; indeed the narrative of the algorithm's step 8 has already noticed the fact.

Observe also however that the rank always fully reaches r = m if the rows of the original matrix A are linearly independent. The reason for this observation is that the rank can fall short, r < m, only if step 3 finds a null row i ≤ m; but step 3 can find such a null row only if step 6 has created one (or if there were a null row in the original matrix A; to be pedantic, such null rows never were linearly independent in the first place). How do we know that step 6 can never create a null row? We know this because the action of step 6 is to add multiples only of current and earlier pivot rows to rows in Ĩ which have not yet been on pivot.⁸ To no row is ever added, directly or indirectly, a multiple of itself—until step 10, which does not belong to the algorithm's main loop and has nothing to do with the availability of nonzero rows to step 3. Such action has no power to cancel the independent rows it targets.

⁸If the truth of the sentence's assertion regarding the action of step 6 seems nonobvious, one can drown the assertion rigorously in symbols to prove it, but before going to that extreme consider: The action of steps 3 and 4 is to choose a pivot row p ≥ i and to shift it upward to the ith position. The action of step 6 then is to add multiples of the chosen pivot row downward only—that is, only to rows which have not yet been on pivot. Steps 3 and 4 in the second iteration thus find no unmixed rows available to choose as second pivot, but find only rows which already include multiples of the first pivot row. Step 6 in the second iteration therefore adds multiples of the second pivot row downward, which already include multiples of the first pivot row. Step 6 in the ith iteration adds multiples of the ith pivot row downward, which already include multiples of the first through (i − 1)th. So it comes to pass that multiples only of current and earlier pivot rows are added to rows which have not yet been on pivot. If one insists on symbols, one can supply them, but it isn't even that hard.

12.3.5 Inverting the factors

Inverting the six Gauss-Jordan factors is easy. Sections 11.7 and 11.8 have shown how.
According to (12.2), each of the six factors—P, D, L, U, K and S—is composed of a sequence ∏T of elementary operators. Each of the six inverse factors—P⁻¹, D⁻¹, L⁻¹, U⁻¹, K⁻¹ and S⁻¹—is therefore composed of the reverse sequence ∏T⁻¹ of inverse elementary operators. If one merely records the sequence of elementaries used to build each of the six factors—if one reverses each sequence, inverts each elementary, and multiplies—then the six inverse factors result.

One need not however go even to that much trouble. One actually need not record the individual elementaries at all; one can invert, multiply and forget them in stream.

This means starting the algorithm from step 1 with six extra variable working matrices (besides the seven already there):

  P̃⁻¹ ← I,   D̃⁻¹ ← I,   L̃⁻¹ ← I,   Ũ⁻¹ ← I,   K̃⁻¹ ← I,   S̃⁻¹ ← I.

(There is no Ĩ⁻¹, not because it would not be useful, but because its initial value would be⁹ A⁻¹(r), unknown at algorithm's start.) Then, for each operation on any of P̃, D̃, L̃, Ũ, K̃ or S̃, one operates inversely on the corresponding inverse matrix. For example, in step 5,

  D̃ ← D̃ T_ĩii[i],                 D̃⁻¹ ← T_(1/ĩii)[i] D̃⁻¹,
  L̃ ← T_(1/ĩii)[i] L̃ T_ĩii[i],    L̃⁻¹ ← T_(1/ĩii)[i] L̃⁻¹ T_ĩii[i],
  Ĩ ← T_(1/ĩii)[i] Ĩ.

With this simple extension, the algorithm yields all the factors not only of the Gauss-Jordan decomposition (12.2) but simultaneously also of the Gauss-Jordan's complementary form (12.3).

⁹Section 11.5 explains the notation.

12.3.6 Truncating the factors

None of the six factors of (12.2) actually needs to retain its entire extended-operational form (§ 11.3.2). The four factors on the left, row operators, act wholly by their m × m squares; the two on the right, column operators, by their n × n. According to § 11.3.6, neither I_r nor A has anything but zeros outside the m × n rectangle, so there is nothing for the six operators to act upon beyond those bounds in any event. We can truncate all six operators to dimension-limited forms (§ 11.3.1) for this reason if we want.

To truncate the six operators formally, we left-multiply (12.2) by I_m and right-multiply it by I_n, obtaining

  I_m A I_n = I_m P D L U I_r K S I_n.

The I_m and I_n respectively truncate rows and columns, actions which have no effect on A since it is already a dimension-limited m × n matrix.

By successive steps, using (11.31) or (11.42) repeatedly,

  A = I_m P D L U I_r K S I_n
    = (I_m P I_m)(I_m D I_m)(I_m L I_m)(I_m U I_r)(I_r K I_n)(I_n S I_n),  (12.6)

where the dimensionalities of the six factors on the equation's right side are respectively m × m, m × m, m × m, m × r, r × n and n × n. Equation (12.6) expresses any dimension-limited rectangular matrix A as the product of six particularly simple, dimension-limited rectangular factors.

By similar reasoning from (12.2),

  A = (I_m G> I_r)(I_r G< I_n),                                           (12.7)

where the dimensionalities of the two factors are m × r and r × n.

The book will seldom point it out again explicitly, but one can straightforwardly truncate not only the Gauss-Jordan factors but most other factors and operators, too, by the method of this subsection.¹⁰
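Equation (12.7) is the familiar rank factorization: any rank-r matrix is the product of an m × r matrix and an r × n matrix. A small numerical illustration follows; the factors here are chosen by inspection for the example, not produced by the Gauss-Jordan algorithm.

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [2., 4., 6.],     # twice the first row
              [1., 1., 1.]])    # so the rank r is 2
B = np.array([[1., 0.],
              [2., 0.],
              [0., 1.]])        # m x r, playing the role of Im G> Ir
C = np.array([[1., 2., 3.],
              [1., 1., 1.]])    # r x n, playing the role of Ir G< In
assert np.allclose(B @ C, A)
```

The point of (12.7) is precisely that such a pair B, C exists for every matrix, with inner dimension equal to the rank.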
¹⁰Comes the objection, "Black, why do you make it more complicated than it needs to be? For what reason must all your matrices have infinite dimensionality, anyway? They don't do it that way in my other linear algebra book." It is a fair question. The answer is that this book is a book of applied mathematical theory, and theoretically in the author's view, infinite-dimensional matrices are significantly neater to handle. To append a null row or a null column to a dimension-limited matrix is to alter the matrix in no essential way, nor is there any real difference between T5[21] when it row-operates on a 3 × p matrix and the same elementary when it row-operates on a 4 × p. The two matrix forms of § 11.3 manifest the sense that a matrix can represent a linear transformation, whose rank matters, or a reversible row or column operation, whose rank does not; the extended-operational form, having infinite rank, serves the latter case. In either case, the dimensionality m × n of the matrix is a distraction. It is the rank r, if any, that counts. The relevant theoretical constructs ought to reflect such insights. Hence infinite dimensionality. Anyway, a matrix displaying an infinite field of zeros resembles a shipment delivering an infinite supply of nothing; one need not be too impressed with either.

12.3.7 Properties of the factors

One would expect such neatly formed operators as the factors of the Gauss-Jordan to enjoy some useful special properties. Indeed they do. Table 12.2 lists a few. The table's properties formally come from (11.52) and Table 11.5; but, if one firmly grasps the matrix forms involved and comprehends the notation (neither of which is trivial to do), and if one sketches the relevant factors schematically with a pencil, then the properties are plainly seen without reference to Ch. 11 as such.

Table 12.2: A few properties of the Gauss-Jordan factors.

  P∗ = P⁻¹ = P^T              S∗ = S⁻¹ = S^T
  P⁻∗ = P⁻T = P               S⁻∗ = S⁻T = S

  (K + K⁻¹)/2 = I
  I_r K (I_n − I_r) = K − I = I_r (K − I)(I_n − I_r) = (K − I)(I_n − I_r)
  I_r K⁻¹ (I_n − I_r) = K⁻¹ − I = I_r (K⁻¹ − I)(I_n − I_r) = (K⁻¹ − I)(I_n − I_r)
  (I − I_n)(K − I) = 0 = (K − I)(I − I_n)
  (I − I_n)(K⁻¹ − I) = 0 = (K⁻¹ − I)(I − I_n)

The table's properties regarding P and S express a general advantage all permutors share. The table's properties regarding K are admittedly less significant, included mostly only because § 13.3 will need them; they might find other uses, anyway, and even the K properties are always true. To interpret them, it helps if one understands that the operator (I_n − I_r) is a truncator that selects columns r + 1 through n of the matrix it operates leftward upon. Further properties of the several Gauss-Jordan factors can be gleaned from the respectively relevant subsections of §§ 11.7 and 11.8.

12.3.8 Marginalizing the factor I_n

If A happens to be a square, n × n matrix and if it develops that the rank r = n, then one can take advantage of (11.31) to rewrite the Gauss-Jordan decomposition (12.2) in the form

  P D L U K S I_n = A = I_n P D L U K S,                                  (12.8)

thus marginalizing the factor I_n. This is to express the Gauss-Jordan solely in row operations or solely in column operations. It does not change the algorithm and it does not alter the factors; it merely reorders the factors after the algorithm has determined them. It fails however if A is rectangular or r < n.
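Several of Table 12.2's K properties follow from the fact that K − I is nonzero only in its first r rows and last n − r columns, so that (K − I)² = 0 and hence K⁻¹ = 2I − K. A numerical spot check (the values of r, n and the block entries here are hypothetical):

```python
import numpy as np

n, r = 4, 2
N = np.zeros((n, n))
N[:r, r:] = [[5., 7.],
             [2., 3.]]          # the active block of K - I
K = np.eye(n) + N               # a parallel unit upper triangular K
Kinv = np.linalg.inv(K)

In = np.eye(n)                  # standing in for the n x n identity
Ir = np.zeros((n, n)); Ir[:r, :r] = np.eye(r)

assert np.allclose(N @ N, 0)                     # (K - I)^2 = 0
assert np.allclose((K + Kinv) / 2, In)           # so K^-1 = 2I - K
assert np.allclose(Ir @ K @ (In - Ir), K - In)
assert np.allclose(Ir @ Kinv @ (In - Ir), Kinv - In)
```

Here (In − Ir) acts as the truncator the narrative describes, selecting the last n − r columns.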

12.3.9 Decomposing an extended operator

To extend the Gauss-Jordan decomposition to decompose a reversible, n × n extended operator A (where per § 11.3.2 A outside the n × n active region resembles the infinite-dimensional identity matrix I) is trivial. One merely writes

  A = P D L U K S,

wherein the I_r has become an I. Or, if you like, one decomposes the n × n dimension-limited matrix I_n A = I_n A I_n = A I_n as

  A I_n = P D L U I_n K S = P D L U K S I_n,

from which, inasmuch as all the factors present but I_n are n × n extended operators, the preceding equation results.

One can decompose only reversible extended operators so. The Gauss-Jordan fails on irreversible extended operators, for which the rank of the truncated form A I_n is r < n. See § 12.5.

This subsection's equations remain unnumbered because they say little new. Their only point, really, is that what an operator does outside some appropriately delimited active region is seldom interesting, because the vector on which the operator ultimately acts is probably null there in any event. In such a context it may not matter whether one truncates the operator.¹¹ Indeed this was also the point of § 12.3.6 and, to a degree, of (11.31). Once one has derived the Gauss-Jordan decomposition, the extended-operational matrix form is hardly more than a formality. All it says is that the extended operator unobtrusively leaves untouched anything it happens to find outside its operational domain, whereas a dimension-limited operator would have truncated whatever it found there. Since what is found outside the operational domain is often uninteresting, this may be a distinction without a difference, which one can safely ignore.
¹¹If "it may not matter," as the narrative says, then one might just put all matrices in dimension-limited form. Many books do. To put them all in dimension-limited form however brings at least three effects the book you are reading prefers to avoid. First, it leaves shift-and-truncate operations hard to express cleanly (refer to §§ 11.3.6 and 11.9 and, as a typical example of the usage, the equations of this subsection). Second, it confuses the otherwise natural extension of discrete vectors into continuous functions. Third, it leaves one to consider the ranks of reversible operators like T[1↔2] that naturally should have no rank. The last of the three is arguably most significant: matrix rank is such an important attribute that one prefers to impute it only to those operators about which it actually says something interesting.

Regarding the present section as a whole, the Gauss-Jordan decomposition is a significant achievement. It is not the only matrix decomposition—further interesting decompositions include the Gram-Schmidt of § 13.11, the diagonal of § 14.6, the Schur of § 14.10 and the singular value of § 14.12, among others—but the Gauss-Jordan nonetheless reliably factors an arbitrary m × n matrix A, which we had not known how to handle very well, into a product of unit triangular matrices and quasielementaries, which we do. We shall put the Gauss-Jordan to good use in Ch. 13. However, before closing the present chapter we should like finally, squarely to define and to establish the concept of matrix rank, not only for I_r but for all matrices. To do that, we shall first need one more preliminary: the technique of vector replacement.

12.4 Vector replacement

Consider a set of m + 1 (not necessarily independent) vectors

  {u, a1, a2, . . . , am}.

As a definition, the space these vectors address consists of all linear combinations of the set's several vectors. That is, the space consists of all vectors b formable as

  βo u + β1 a1 + β2 a2 + · · · + βm am = b.                               (12.9)

Now consider a specific vector v in the space,

  ψo u + ψ1 a1 + ψ2 a2 + · · · + ψm am = v,                               (12.10)

for which ψo ≠ 0. Solving (12.10) for u, we find that

  (1/ψo) v − (ψ1/ψo) a1 − (ψ2/ψo) a2 − · · · − (ψm/ψo) am = u.

12.4. VECTOR REPLACEMENT                                             303

With the change of variables

   φo ← 1/ψo,
   φ1 ← −ψ1/ψo,
   φ2 ← −ψ2/ψo,
   ···
   φm ← −ψm/ψo,

the solution is

   φo v + φ1 a1 + φ2 a2 + · · · + φm am = u.                         (12.11)

Equation (12.11) has identical form to (12.10), only with the symbols u ↔ v and ψ ↔ φ swapped. Since φo = 1/ψo, assuming that ψo is finite it even appears that φo ≠ 0; so, quite symmetrically, it happens that

   ψo = 1/φo,
   ψ1 = −φ1/φo,
   ψ2 = −φ2/φo,
   ···
   ψm = −φm/φo.

The symmetry is complete. Table 12.3 summarizes.

Now further consider an arbitrary vector b which lies in the space addressed by the vectors {u, a1, a2, . . . , am}. Does the same b also lie in the space addressed by the vectors {v, a1, a2, . . . , am}?
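The symmetry between the ψ and φ coefficient sets can be spot-checked numerically. The following is a minimal sketch (the coefficient values are hypothetical, chosen only for illustration; NumPy is assumed), converting ψ to φ by the change of variables above and recovering ψ by the symmetric reverse relations:

```python
import numpy as np

# Hypothetical coefficients psi_o, psi_1, ..., psi_m, with psi_o nonzero.
psi = np.array([2.0, 3.0, -1.0, 0.5])   # psi[0] plays the role of psi_o

# Forward change of variables: phi_o = 1/psi_o, phi_k = -psi_k/psi_o.
phi = np.empty_like(psi)
phi[0] = 1.0 / psi[0]
phi[1:] = -psi[1:] / psi[0]

# The symmetric reverse relations recover psi exactly.
psi_back = np.empty_like(phi)
psi_back[0] = 1.0 / phi[0]
psi_back[1:] = -phi[1:] / phi[0]

assert np.allclose(psi_back, psi)
```

The round trip succeeds for any finite ψo ≠ 0, which is the algebraic content of the completed symmetry.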

Table 12.3: The symmetrical equations of § 12.4.

   ψo u + ψ1 a1 + ψ2 a2 + · · · + ψm am = v     φo v + φ1 a1 + φ2 a2 + · · · + φm am = u
   0 ≠ ψo                                       0 ≠ φo
   1/ψo = φo                                    1/φo = ψo
   −ψ1/ψo = φ1                                  −φ1/φo = ψ1
   −ψ2/ψo = φ2                                  −φ2/φo = ψ2
   ···                                          ···
   −ψm/ψo = φm                                  −φm/φo = ψm

To show that it does, we substitute into (12.9) the expression for u from (12.11), obtaining the form

   (βo)(φo v + φ1 a1 + φ2 a2 + · · · + φm am) + β1 a1 + β2 a2 + · · · + βm am = b.

Collecting terms, this is

   βo φo v + (βo φ1 + β1) a1 + (βo φ2 + β2) a2 + · · · + (βo φm + βm) am = b,

in which we see that, yes, b does indeed also lie in the latter space. Naturally the problem's u ↔ v symmetry then guarantees the converse: that an arbitrary vector b which lies in the latter space also lies in the former. Therefore, a vector b must lie in both spaces or neither, never just in one or the other. The two spaces are, in fact, one and the same.

This leads to the following useful conclusion. Given a set of vectors

   {u, a1, a2, . . . , am},

one can safely replace the u by a new vector v, obtaining the new set

   {v, a1, a2, . . . , am},

provided that the replacement vector v includes at least a little of the replaced vector u (ψo ≠ 0 in eqn. 12.10) and that v is otherwise an honest

12.5. RANK                                                           305

linear combination of the several vectors of the original set, untainted by foreign contribution. Such vector replacement does not in any way alter the space addressed. The new space is exactly the same as the old.

As a corollary, if the vectors of the original set happen to be linearly independent (§ 12.1), then the vectors of the new set are linearly independent, too. For, if it were that

   γo v + γ1 a1 + γ2 a2 + · · · + γm am = 0

for nontrivial γo and γk, then either γo = 0—impossible, since that would make the several ak themselves linearly dependent—or γo ≠ 0, in which case v would be a linear combination of the several ak alone. But if v were a linear combination of the several ak alone, then (12.10) would still also explicitly make v a linear combination of the same ak plus a nonzero multiple of u. Yet both combinations cannot be, because, according to § 12.1, two distinct combinations of independent vectors can never target the same v. The contradiction proves false the assumption which gave rise to it: that the vectors of the new set were linearly dependent. Hence the vectors of the new set are equally as independent as the vectors of the old.

12.5   Rank

Sections 11.3.5 and 11.3.6 have introduced the rank-r identity matrix Ir, where the integer r is the number of ones the matrix has along its main diagonal. Other matrices have rank, too. Commonly, an n × n matrix has rank r = n, but consider the matrix6

   [ 5  1  6 ]
   [ 3  6  9 ]
   [ 2  4  6 ]

The rank of this 3 × 3 matrix is not r = 3 but r = 2, because—whether by columns or by rows—the matrix has only two independent vectors. This section establishes properly the important concept of matrix rank, demonstrating that every matrix has a definite, unambiguous rank and showing how this rank can be calculated.

To forestall potential confusion in the matter, we should immediately observe that—like the rest of this chapter but unlike some other parts of the book—this section explicitly trades in exact numbers.

   6 The third column of this matrix is the sum of the first and second columns. Also, the third row is just two-thirds the second.
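The footnote's two observations, and the rank they imply, are easy to confirm numerically. A minimal sketch (NumPy is assumed; `np.linalg.matrix_rank` estimates rank from singular values):

```python
import numpy as np

A = np.array([[5.0, 1.0, 6.0],
              [3.0, 6.0, 9.0],
              [2.0, 4.0, 6.0]])

# The third column is the sum of the first and second...
assert np.allclose(A[:, 2], A[:, 0] + A[:, 1])
# ...and the third row is two-thirds the second.
assert np.allclose(A[2, :], (2.0/3.0) * A[1, :])

# Hence only two independent vectors, by columns or by rows: rank r = 2.
assert np.linalg.matrix_rank(A) == 2
```

Note that floating-point rank estimation involves a tolerance; the exact-number reasoning of the text is what actually settles the matter.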

If a matrix element here is 5, then it is exactly 5; if 0, then exactly 0. Exact quantities are every bit as valid in applied mathematics as imprecise ones are. It is false to suppose that because applied mathematics permits imprecise quantities, like 3.0 ± 0.1 inches for the length of your thumb, it also requires them. On the contrary, the length of your thumb may indeed be 3.0 ± 0.1 inches, but surely no triangle has 3.0 ± 0.1 sides! A triangle has exactly three sides. The ratio of a circle's circumference to its radius is exactly 2π. The author has exactly one brother. A construction contract might require the builder to finish within exactly 180 days (though the actual construction time might be an inexact t = 172.6 ± 0.2 days), and so on. Many real-world matrices, of course—especially matrices populated by measured data—can never truly be exact, but that is not the point here. Herein, the numbers are exact. Where the distinction matters, it is the applied mathematician's responsibility to distinguish between the two kinds of quantity.

12.5.1   A logical maneuver

In § 12.5.2 we shall execute a pretty logical maneuver, which one might name, "the end justifies the means."13 It is a subtle maneuver, and when embedded within a larger logical construct as in § 12.5.2, the maneuver if unexpected can confuse, so this subsection is to prepare the reader to expect it. The logical maneuver follows this basic pattern:

   If P1 then Q. If P2 then Q. If P3 then Q. Which of P1, P2 and P3 are true is not known, but what is known is that at least one of the three is true: P1 or P2 or P3. Therefore, although one can draw no valid conclusion regarding any one of the three predicates P1, P2 or P3, one can still conclude that their common object Q is true.

One valid way to prove Q, of course, would be to suppose P1 and show that it led to Q, then alternately to suppose P2 and show that it separately led to Q, then again to suppose P3 and show that it also led to Q. The final step would be to show somehow that P1, P2 and P3 could not possibly all be false at once. Here, the means is to assert several individually suspicious claims, none of which one actually means to prove; the end which justifies the means is the conclusion Q, which thereby one can and does prove. Once the reader feels that he grasps its logic, he can proceed to the next subsection where the maneuver is put to use.

   13 The maneuver's name rings a bit sinister, does it not? The author recommends no such maneuver in social or family life! Logically here, however, it helps the math.

12.5. RANK                                                           307

12.5.2   The impossibility of identity-matrix promotion

Consider the matrix equation

   A Ir B = Is.                                                      (12.12)

If r ≥ s, then it is trivial to find matrices A and B for which (12.12) holds: A = Is = B. If r < s, however, it is not so easy; in fact it is impossible. This subsection proves the impossibility. It shows that one cannot by any row and column operations, reversible or otherwise, ever transform an identity matrix into another identity matrix of greater rank (§ 11.3.5).

The proof is by contradiction,15 and it runs as follows. Because Ir attacking from the right is the column truncation operator (§ 11.3.6), the product A Ir is a matrix with an unspecified number of rows but only r columns—or, more precisely, with no more than r nonzero columns. Viewed this way, per § 11.1, B operates on the r columns of A Ir to produce the s columns of Is; that is, (12.12) can be written in the form

   (A Ir) B = Is.                                                    (12.13)

The r columns of A Ir are nothing more than the first through rth columns of A. Let the symbols a1, a2, a3, a4, a5, . . . , ar denote these columns. The s columns of Is, for their part, are nothing more than the elementary vectors e1, e2, e3, e4, e5, . . . , es (§ 11.3.7). The claim (12.13) makes is thus that the several vectors ak together address each of the several elementary vectors ej—that is, that a linear combination14

   b1j a1 + b2j a2 + b3j a3 + · · · + brj ar = ej                    (12.14)

exists for each ej, 1 ≤ j ≤ s. The claim (12.14) will turn out to be false because there are too many ej; but to prove this, we shall assume for the moment that the claim were true.

   14 Observe that, unlike as in § 12.1, here we have not necessarily assumed that the several ak are linearly independent.
   15 As the reader will have observed by this point in the book, the technique—also called reductio ad absurdum—is the usual mathematical technique to prove impossibility. One assumes the falsehood to be true, then reasons toward a contradiction which proves the assumption false. Section 6.1 among others has already illustrated the technique, but the technique's use here is more sophisticated.
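The impossibility has a familiar numerical shadow: a product routed through only r columns can never exhibit rank exceeding r, so the s > r columns of Is are unreachable. A small sketch (random example matrices of my own choosing, NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
r, s, n = 3, 5, 6

# A*Ir keeps only the first r columns of A: at most r nonzero columns.
A_Ir = rng.standard_normal((n, n))[:, :r]   # n x r
B = rng.standard_normal((r, s))             # combines them into s columns

# No choice of B promotes the product's rank past r, so rank s > r fails.
assert np.linalg.matrix_rank(A_Ir @ B) <= r < s
```

This is of course only an illustration of the rank bound, not the proof; the proof above is what establishes the bound for every A and B.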

Consider the elementary vector e1. For j = 1, (12.14) is

   b11 a1 + b21 a2 + b31 a3 + · · · + br1 ar = e1,

which says that the elementary vector e1 is a linear combination of the several vectors {a1, a2, a3, a4, a5, . . . , ar}. Because e1 is a linear combination, according to § 12.4 one can safely replace any of the vectors in the set by e1 without altering the space addressed. (For example, replacing a1 by e1, the new set would be {e1, a2, a3, a4, a5, . . . , ar}.) The only restriction per § 12.4 is that e1 contain at least a little of the vector ak it replaces—that bk1 ≠ 0. Of course there is no guarantee specifically that b11 ≠ 0, so for e1 to replace a1 might not be allowed. However, inasmuch as e1 is nonzero, then according to (12.14) at least one of the several bk1 also is nonzero; and if bk1 is nonzero then e1 can replace ak. Some of the ak might indeed be forbidden, but never all; there is always at least one ak which e1 can replace. (For example, if a1 were forbidden because b11 = 0, then a3 might be available instead because b31 ≠ 0, in which case the new set would be {a1, a2, e1, a4, a5, . . . , ar}.)

Here is where the logical maneuver of § 12.5.1 comes in. The book to this point has established no general method to tell which of the several ak the elementary vector e1 actually contains (§ 13.2 gives the method, but that section depends logically on this one, so we cannot licitly appeal to it here). According to (12.14), the vector e1 might contain some of the several ak or all of them, but surely it contains at least one of them. Therefore, even though we have no idea which of the several ak the vector e1 contains, even though some of the choices might be forbidden, and even though replacing the wrong ak would logically invalidate any conclusion which flowed from the replacement, still we can proceed with the proof, provided that the false choices and the true choices lead ultimately alike to the same identical end. If they do, then all the logic requires is an assurance that there does exist at least one true choice, even if we remain ignorant as to which choice that is. The claim (12.14) guarantees at least one true choice. Whether, as the maneuver also demands, all the choices, true and false, lead ultimately alike to the same identical end remains to be determined.

12.5. RANK                                                           309

Now consider the elementary vector e2. According to (12.14), coefficients bk2 exist such that

   b12 a1 + b22 a2 + b32 a3 + · · · + br2 ar = e2,

so that e2 lies in the space addressed by the original set {a1, a2, a3, a4, a5, . . . , ar}. But the new set ({e1, a2, a3, a4, a5, . . . , ar}, or whatever the new set happens to be) addresses precisely the same space, so e2 also lies in the space addressed by the new set; that is, not only do coefficients bk2 exist, but moreover coefficients βk2 exist such that

   β12 e1 + β22 a2 + β32 a3 + · · · + βr2 ar = e2.

Again it is impossible for all the coefficients βk2 to be zero; but, moreover, it is impossible for β12 to be the sole nonzero coefficient, for (as should seem plain to the reader who grasps the concept of the elementary vector, § 11.3.7) no elementary vector can ever be a linear combination of other elementary vectors alone! The linear combination which forms e2 evidently includes a nonzero multiple of at least one of the remaining ak: at least one of the βk2 attached to an ak (not β12, which is attached to e1) must be nonzero. Therefore, by the same reasoning as before, we now choose an ak with a nonzero coefficient βk2 ≠ 0 and replace it by e2, obtaining an even newer set of vectors like

   {e1, a2, a3, e2, a5, . . . , ar}.

This newer set addresses precisely the same space as the previous set, thus also as the original set.

And so it goes, replacing one ak by an ej at a time, until all the ak are gone and our set has become

   {e1, e2, e3, e4, e5, . . . , er},

which, as we have reasoned, addresses precisely the same space as did the original set {a1, a2, a3, a4, a5, . . . , ar}. All intermediate choices, true and false, ultimately lead to the single conclusion of this paragraph, which thereby is properly established. And this is the one identical end the maneuver of § 12.5.1 has demanded.
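One step of the replacement chain can be watched numerically. A minimal sketch (the columns a1, a2, a3 are example vectors of my own choosing, NumPy assumed): an elementary vector replaces an ak whose coefficient is nonzero, and the addressed space survives the swap, as rank comparisons confirm.

```python
import numpy as np

a1 = np.array([2.0, 1.0, 0.0])
a2 = np.array([0.0, 1.0, 1.0])
a3 = np.array([1.0, 0.0, 1.0])
A = np.column_stack([a1, a2, a3])

e1 = np.array([1.0, 0.0, 0.0])
b = np.linalg.solve(A, e1)       # e1 = b[0] a1 + b[1] a2 + b[2] a3
k = int(np.argmax(np.abs(b)))    # choose an ak that e1 contains (b[k] != 0)
assert abs(b[k]) > 1e-12

new = A.copy()
new[:, k] = e1                   # replace that ak by e1

r = np.linalg.matrix_rank
assert r(new) == r(A) == r(np.column_stack([A, new]))  # same space addressed
```

Choosing the largest available coefficient here is only a convenient numerical policy; the proof needs no such method, which is precisely why it leans on the logical maneuver.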

The conclusion leaves us with a problem, however. There are more ej, 1 ≤ j ≤ s, than there are ak, 1 ≤ k ≤ r, because, as we have stipulated, r < s. Some elementary vectors ej, r < j ≤ s, are evidently left over. Back at the beginning of the section, the claim (12.14) made was that

   {a1, a2, a3, a4, a5, . . . , ar}

together addressed each of the several elementary vectors ej, 1 ≤ j ≤ s, even when r < s. But as we have seen, this amounts to a claim that

   {e1, e2, e3, e4, e5, . . . , er}

together addressed each of the several elementary vectors ej, too. Plainly this is impossible with respect to the left-over ej, r < j ≤ s, for no elementary vector can be a linear combination of other elementary vectors alone. The contradiction proves false the claim which gave rise to it. The false claim: that the several ak, 1 ≤ k ≤ r, addressed all the ej, 1 ≤ j ≤ s, even when r < s.

Equation (12.13), which is just (12.12) written differently, asserts that B is a column operator which does precisely what we have just shown impossible: to combine the r columns of A Ir to yield the s columns of Is, the latter of which are just the elementary vectors e1, e2, e3, e4, e5, . . . , es. Hence finally we conclude that no matrices A and B exist which satisfy (12.12) when r < s. In other words, we conclude that although row and column operations can demote identity matrices in rank, they can never promote them. The promotion of identity matrices is impossible.

12.5.3   General matrix rank and its uniqueness

Step 8 of the Gauss-Jordan algorithm (§ 12.3.3) discovers a rank r for any matrix A. One should like to think that this rank r were a definite property of the matrix itself rather than some unreliable artifact of the algorithm, but until now we have lacked the background theory to prove it. Now we have the theory. Here is the proof.

The proof begins with a formal definition of the quantity whose uniqueness we are trying to prove.

• The rank r of an identity matrix Ir is the number of ones along its main diagonal. (This is from § 11.3.5.)

• The rank r of a general matrix A is the rank of an identity matrix Ir to which A can be reduced by reversible row and column operations.
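The definition's key feature—insensitivity to which reversible operations one applies—can be exercised numerically. In this sketch (NumPy assumed; random invertible matrices stand in for reversible row and column operations), the rank of the section's example matrix is unchanged by any such operations:

```python
import numpy as np

A = np.array([[5.0, 1.0, 6.0],
              [3.0, 6.0, 9.0],
              [2.0, 4.0, 6.0]])    # rank 2, from the head of the section

rng = np.random.default_rng(1)
B_row = rng.standard_normal((3, 3))   # stand-in for reversible row operations
B_col = rng.standard_normal((3, 3))   # stand-in for reversible column operations
# Random square matrices are invertible with probability 1; check anyway.
assert abs(np.linalg.det(B_row)) > 1e-12 and abs(np.linalg.det(B_col)) > 1e-12

r = np.linalg.matrix_rank
assert r(B_row @ A @ B_col) == r(A) == 2
```

The same check with any other invertible pair gives the same answer, which is the numerical face of the uniqueness theorem proved next.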

12.5. RANK                                                           311

A matrix A has rank r if and only if matrices B> and B< exist such that

   B> A B< = Ir,
   A = B>⁻¹ Ir B<⁻¹,
   B>⁻¹ B> = I = B> B>⁻¹,
   B<⁻¹ B< = I = B< B<⁻¹.                                            (12.15)

The question is whether in (12.15) only a single rank r is possible. To answer the question, we suppose that another rank were possible—that A had not only rank r but also rank s. Then,

   A = B>⁻¹ Ir B<⁻¹,
   A = G>⁻¹ Is G<⁻¹.

Combining these equations,

   B>⁻¹ Ir B<⁻¹ = G>⁻¹ Is G<⁻¹.

Solving first for Ir, then for Is,

   (B> G>⁻¹) Is (G<⁻¹ B<) = Ir,
   (G> B>⁻¹) Ir (B<⁻¹ G<) = Is.

Were it that r ≠ s, then one of these two equations would constitute the demotion of an identity matrix and the other, a promotion. But according to § 12.5.2 and its (12.12), promotion is impossible. Therefore r ≠ s is impossible, and

   r = s

is guaranteed. No matrix has two different ranks. Matrix rank is unique.

This finding has two immediate implications:

• Reversible row and/or column operations exist to change any matrix of rank r to any other matrix of the same rank. The reason is that, according to (12.15), reversible operations exist to change both matrices to Ir and back.

• No reversible operation can change a matrix's rank.

The discovery that every matrix has a single, unambiguous rank and the establishment of a failproof algorithm—the Gauss-Jordan—to ascertain that rank have not been easy to achieve, but they are important achievements

nonetheless, worth the effort thereto. The reason these achievements matter is that the mere dimensionality of a matrix is a chimerical measure of the matrix's true size—as for instance for the 3 × 3 example matrix at the head of the section. Matrix rank by contrast is an entirely solid, dependable measure. We shall rely on it often. Section 12.5.8 comments further.

12.5.4   The full-rank matrix

The rank r of a matrix can exceed the number neither of the matrix's rows nor of its columns; the greatest rank possible for an m × n matrix is the lesser of m and n. A full-rank matrix, then, is defined to be an m × n matrix with maximum rank r = m or r = n—or, if m = n, both. A matrix of less than full rank is a degenerate matrix.

Consider a tall m × n matrix C, m ≥ n, one of whose n columns is a linear combination (§ 12.1) of the others. One could by definition target the dependent column with addition elementaries, using multiples of the other columns to wipe the dependent column out. Having zeroed the dependent column, one could then interchange it over to the matrix's extreme right, effectively throwing the column away, shrinking the matrix to m × (n − 1) dimensionality. Shrinking the matrix necessarily also shrinks the bound on the matrix's rank to r ≤ n − 1—which is to say, to r < n. But the shrink, done by reversible column operations, is itself reversible, by which § 12.5.3 binds the rank of the original, m × n matrix C likewise to r < n. The matrix C, one of whose columns is a linear combination of the others, is necessarily degenerate for this reason.

Now consider a tall matrix A with the same m × n dimensionality, m ≥ n, but with a full n independent columns. The transpose AT of such a matrix has a full n independent rows. One of the conclusions of § 12.3.4 was that a matrix of independent rows always has rank equal to the number of rows. Since AT is such a matrix, its rank is a full r = n. But formally, what this says is that there exist operators B>T and B<T such that In = B<T AT B>T, the transpose of which equation is B> A B< = In—which in turn says that not only AT but also A itself has full rank r = n.

Parallel reasoning rules the rows of broad matrices, m ≤ n, of course. To square matrices, m = n, both lines of reasoning apply.

Gathering findings, we have that

• a tall m × n matrix, m ≥ n, has full rank if and only if its columns are linearly independent;

• a broad m × n matrix, m ≤ n, has full rank if and only if its rows are linearly independent; and

• a square n × n matrix, m = n, has full rank if and only if its columns and/or its rows are linearly independent;

and, moreover, a square matrix which has full rank has both independent columns and independent rows, never just one or the other.

To say that a matrix has full column rank is to say that it is tall or square and has full rank r = n ≤ m. To say that a matrix has full row rank is to say that it is broad or square and has full rank r = m ≤ n. Only a square matrix can have full column rank and full row rank at the same time, because a tall or broad matrix cannot but include, respectively, more rows or more columns than Ir.

12.5.5   Underdetermined and overdetermined linear systems (introduction)

The last paragraph of § 12.5.4 provokes yet further terminology. A linear system Ax = b is underdetermined if A lacks full column rank—that is, if r < n—because inasmuch as some of A's columns then depend linearly on the others, such a system maps multiple n-element vectors x to the same m-element vector b, meaning that knowledge of b does not suffice to determine x uniquely. Complementarily, a linear system Ax = b is overdetermined if A lacks full row rank—that is, if r < m. If A lacks both, then the system is paradoxically both underdetermined and overdetermined and is thereby degenerate. If A happily has both, then the system is exactly determined.

Section 13.2 solves the exactly determined linear system. Section 13.4 solves the nonoverdetermined linear system. Section 13.6 analyzes the unsolvable overdetermined linear system among others. Further generalities await Ch. 13; the present subsection would observe at least the few following facts.

An overdetermined linear system Ax = b cannot have a solution for every possible m-element driving vector b. The truth of this claim can be seen by decomposing the system's matrix A by Gauss-Jordan and then left-multiplying the decomposed system by G>⁻¹ to reach the form

   Ir G< x = G>⁻¹ b.

If the m-element vector c ≡ G>⁻¹ b, then Ir G< x = c, which, if r < m, is impossible unless the last m − r elements of c happen to be zero.
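Both terms are easy to exhibit numerically. A least-squares sketch (the example systems are my own, chosen for illustration; NumPy assumed): the overdetermined system leaves a residual for a generic b, while the underdetermined one maps distinct x to the same b.

```python
import numpy as np

# Overdetermined: tall A lacking full row rank (m = 3 > r = 2).
A_over = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
b = np.array([1.0, 1.0, 3.0])                   # generic b: no exact solution
x, res, rank, _ = np.linalg.lstsq(A_over, b, rcond=None)
assert res[0] > 1e-12                           # a nonzero residual remains

# Underdetermined: broad A lacking full column rank (n = 3 > r = 2).
A_under = A_over.T
x1 = np.array([1.0, 2.0, 0.0])
x2 = x1 + np.array([1.0, 1.0, -1.0])            # (1, 1, -1) lies in the kernel
assert np.allclose(A_under @ x1, A_under @ x2)  # same b from different x
```

The residual in the first case is exactly the obstruction the text describes: the last m − r elements of c refuse to vanish for unrestricted b.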

But since G> is invertible, each b corresponds to a unique c and vice versa; so, if b is an unrestricted m-element vector then so also is c, which verifies the claim.

Complementarily, a nonoverdetermined linear system Ax = b does have a solution for every possible m-element driving vector b. This is so because in this case the last m − r elements of c do happen to be zero; or, better stated, because c in this case has no nonzeros among its last m − r elements, because it has no last m − r elements, for the trivial reason that r = m.

It is an analytical error, and an easy one innocently to commit, to require that

   Ax = b

for unrestricted b when A lacks full row rank. The error is easy to commit because the equation looks right, because such an equation is indeed valid over a broad domain of b and might very well have been written correctly in that context, only not in the context of unrestricted b. Analysis including such an error can lead to subtly absurd conclusions. It is never such an analytical error however to require that

   Ax = 0

because, whatever other solutions such a system might have, it has at least the solution x = 0.

12.5.6   The full-rank factorization

One sometimes finds dimension-limited matrices of less than full rank inconvenient to handle. However, every dimension-limited, m × n matrix of rank r can be expressed as the product of two full-rank matrices, one m × r and the other r × n, both also of rank r:

   A = BC.                                                           (12.16)

The truncated Gauss-Jordan (12.7) constitutes one such full-rank factorization:

   B = Im G> Ir,
   C = Ir G< In.

The full-rank factorization is not unique.16 Other full-rank factorizations are possible, including among others the truncated Gram-Schmidt (13.54). Of course, if an m × n matrix already has full rank r = m or r = n, then the full-rank factorization is trivial: A = Im A or A = A In. Section 13.6.4 uses the full-rank factorization.

   16 [2, § 3.3][37, "Moore-Penrose generalized inverse"]
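A full-rank factorization of the section's 3 × 3, rank-2 example can be computed directly. The sketch below is my own construction via least squares, not the book's Gauss-Jordan route (NumPy assumed): B keeps two independent columns of A, and C expresses all three columns of A in terms of them.

```python
import numpy as np

A = np.array([[5.0, 1.0, 6.0],
              [3.0, 6.0, 9.0],
              [2.0, 4.0, 6.0]])            # m x n = 3 x 3, rank r = 2

B = A[:, :2]                               # m x r: two independent columns of A
C = np.linalg.lstsq(B, A, rcond=None)[0]   # r x n: coefficients, column by column

assert B.shape == (3, 2) and C.shape == (2, 3)
assert np.linalg.matrix_rank(B) == np.linalg.matrix_rank(C) == 2
assert np.allclose(B @ C, A)               # A = BC, both factors of full rank
```

That a different choice of independent columns yields a different, equally valid pair B, C illustrates the nonuniqueness the footnote records.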

12.5. RANK                                                           315

12.5.7   Full column rank and the Gauss-Jordan factors K and S

The Gauss-Jordan decomposition (12.2),

   A = P D L U Ir K S,

includes the factors K and S precisely to handle matrices with more columns than rank. Matrices of full column rank r = n, common in applications, by definition have no such problem, so the Gauss-Jordan decomposition theoretically needs no K or S for such matrices. Theoretically, the Gauss-Jordan decomposition of a tall or square m × n matrix A of full column rank r = n ≤ m always finds the factor K = I, regardless of the pivots one chooses during the Gauss-Jordan algorithm's step 3; and, if one always chooses pivot column q = i, it finds S = I also—which fact lets us abbreviate the decomposition for such matrices to read

   A = P D L U In.                                                   (12.17)

That K = I is seen by the algorithm's step 12, which creates K. Step 12 nulls the spare columns q > r that dress Ĩ's right, but in this case Ĩ has only r columns and therefore has no spare columns to null. Hence step 12 does nothing and K = I.

That S = I comes immediately of choosing q = i for pivot column during each iterative instance of the algorithm's step 3. But, one must ask, can one choose so? What if column q = i were unusable? That is, what if the only nonzero elements remaining in Ĩ's ith column stood above the main diagonal, unavailable for step 4 to bring to pivot? Well, were it so, then one would indeed have to choose q ≠ i to swap the unusable column away rightward; but see: nothing in the algorithm later fills such a column's zeros with anything else—they remain zeros—so swapping the column away rightward could only delay the crisis. The column would remain unusable. Eventually the column would reappear on pivot when no usable column rightward remained available to swap it with, which contrary to our assumption would mean precisely that r < n. Such contradiction can only imply that if r = n then no unusable column can ever appear; one need not swap. We conclude that though one might voluntarily choose q ≠ i during the algorithm's step 3, the algorithm cannot force one to do so if r = n. Yet if one always does choose q = i, as the full-column-rank matrix A evidently leaves one free to do, then indeed S = I.
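For a square matrix of full rank whose pivots cooperate, the abbreviated factorization can be exhibited with a bare-hands elimination. The sketch below is my own minimal Doolittle LU, not the book's Gauss-Jordan algorithm; it takes P = S = K = I throughout and folds the book's D factor into U (NumPy assumed):

```python
import numpy as np

def lu_nopivot(A):
    """Doolittle LU without pivoting: A = L U, L unit lower triangular.
    Assumes every leading pivot is nonzero (full rank, no swaps needed)."""
    n = A.shape[0]
    L, U = np.eye(n), A.astype(float).copy()
    for i in range(n):
        for j in range(i + 1, n):
            L[j, i] = U[j, i] / U[i, i]    # elimination multiplier
            U[j, :] -= L[j, i] * U[i, :]   # clear the entry below the pivot
    return L, U

A = np.array([[5.0, 1.0], [3.0, 6.0]])     # full rank, r = n = 2
L, U = lu_nopivot(A)
assert np.allclose(L, np.tril(L)) and np.allclose(U, np.triu(U))
assert np.allclose(L @ U, A)
```

As the next page cautions, forgoing the permutors is a theoretical license, not numerical advice: practical codes pivot to avoid small divisors.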

Observe however that just because one theoretically can set S = I does not mean that one actually should. The column permutor S exists to be used, after all—especially numerically, to avoid small pivots during early invocations of the algorithm's step 5. Equation (12.17) is not mandatory but optional for a matrix A of full column rank (though still r = n and thus K = I for such a matrix, even when the unabbreviated eqn. 12.2 is used). There are however times when it is nice to know that one theoretically could, if doing exact arithmetic, set S = I if one wanted to.

Since P D L U acts as a row operator, (12.17) implies that each row of the full-rank matrix A lies in the space the rows of In address. This is obvious and boring; but interesting is the converse implication of (12.17)'s complementary form,

   U⁻¹ L⁻¹ D⁻¹ P⁻¹ A = In,

that each row of In lies in the space the rows of A address. The rows of In and the rows of A evidently address the same space. One can moreover say the same of A's columns, since B = AT has full rank just as A does. In's columns together and In's rows together each address the same, complete n-dimensional space. In the whole, if a matrix A is square and has full rank r = n, then A's columns together, and A's rows together, each address the same, complete n-dimensional space.

12.5.8   The significance of rank uniqueness

The result of § 12.5.3, that matrix rank is unique, is an extremely important matrix theorem. It constitutes the chapter's chief result, which we have spent so many pages to attain. Without this theorem, the very concept of matrix rank, along with all that attends to the concept, must remain in doubt. The theorem is the rock upon which the general theory of the matrix is built.

The concept underlying the theorem promotes the useful sensibility that a matrix's rank, much more than its mere dimensionality or the extent of its active region, represents the matrix's true size. Dimensionality can deceive, after all. For example, the honest 2 × 2 matrix

   [ 5  1 ]
   [ 3  6 ]

has two independent rows or, alternately, two independent columns, hence rank r = 2. One can easily construct a phony 3 × 3 matrix from the honest 2 × 2, however, simply by applying some 3 × 3 row and column

elementaries:

   T(2/3)[32]  [ 5  1 ]  T1[13] T1[23]  =  [ 5  1  6 ]
               [ 3  6 ]                    [ 3  6  9 ]
                                           [ 2  4  6 ]

The 3 × 3 matrix on the equation's right is the one we met at the head of the section. It looks like a rank-three matrix, but really has only two independent columns and two independent rows; its true rank is r = 2. We have here caught a matrix impostor pretending to be bigger than it really is.

Now, admittedly, adjectives like "honest" and "phony," terms like "impostor," are a bit hyperbolic. The last paragraph has used them to convey the subjective sense of the matter, but of course there is nothing mathematically improper or illegal about a matrix of less than full rank, so long as the true rank is correctly recognized. An applied mathematician with some matrix experience actually probably recognizes this particular 3 × 3 matrix as a fraud on sight, but it is a very simple example. No one can just look at some arbitrary matrix and instantly perceive its true rank. Consider for instance the 5 × 5 matrix (in hexadecimal notation)

   [ 12   9   3    1    0 ]
   [  F  15   6    3    2 ]
   [  2   D   9  −19   −E ]
   [ −2   0   6    1    5 ]
   [  5   1  −4    4   −8 ]

whose rank is not what its dimensionality suggests.17 That is the sense of matrix rank. The rank of a matrix helps one to recognize how many truly independent vectors, dimensions or equations one actually has available to work with, rather than how many seem available at first glance. When one models a physical phenomenon by a set of equations, one sometimes is dismayed to discover that one of the equations, thought to be independent, is really just a useless combination of the others. This can happen in matrix work, too.

   17 As the reader can verify by the Gauss-Jordan algorithm, the matrix's rank is not r = 5 but r = 4.
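The 2 × 2-to-3 × 3 construction above is easy to replay numerically. In this sketch (NumPy assumed), the elementary operators act as one row operation and two column operations on the 2 × 2 embedded, dimension-limitedly, in a 3 × 3 of zeros, and the rank stays r = 2 throughout:

```python
import numpy as np

M = np.zeros((3, 3))
M[:2, :2] = [[5.0, 1.0], [3.0, 6.0]]   # the honest 2 x 2, rank 2

M[2, :] += (2.0/3.0) * M[1, :]         # row elementary T(2/3)[32]
M[:, 2] += M[:, 0]                     # column elementary T1[13]
M[:, 2] += M[:, 1]                     # column elementary T1[23]

assert np.allclose(M, [[5.0, 1.0, 6.0],
                       [3.0, 6.0, 9.0],
                       [2.0, 4.0, 6.0]])
assert np.linalg.matrix_rank(M) == 2   # still only rank 2: a phony 3 x 3
```

The elementaries are reversible, so by § 12.5.3 no sequence of them could ever have raised the rank above 2.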


Chapter 13

Inversion and orthonormalization

The undeniably tedious Chs. 11 and 12 have piled the matrix theory deep while affording scant practical reward. Building upon the two tedious chapters, this chapter brings the first rewarding matrix work.

One might be forgiven for forgetting after so many pages of abstract theory that the matrix afforded any reward or had any use at all. Uses however it has. Sections 11.1 and 12.5.5 have already broached1 the matrix's most basic use, the primary subject of this chapter: to represent a system of m linear scalar equations in n unknowns neatly as

   Ax = b

and to solve the whole system at once by inverting the matrix A that characterizes it.

Now, before we go on, we want to confess that such a use alone, on the surface of it—though interesting—might not have justified the whole uncomfortable bulk of Chs. 11 and 12. We already knew how to solve a simultaneous system of linear scalar equations in principle without recourse to the formality of a matrix, as in the last step to derive (3.9) as far back as Ch. 3. Why should we have suffered two bulky chapters, if only to prepare to do here something we already knew how to do? The question is a fair one, but admits at least four answers. First, as this chapter

   1 The reader who has skipped Ch. 12 might at least review § 12.5.

explains, to solve the linear system neatly is only the primary and most straightforward use of the matrix, not its only use: the even more interesting eigenvalue and its incidents await Ch. 14. Second and yet more impressively, the matrix neatly solves a linear system not only for a particular driving vector b but for all possible driving vectors b at one stroke. Third, the matrix allows § 13.6 to introduce the pseudoinverse to approximate the solution to an unsolvable linear system and, moreover, to do so both optimally and efficiently, whereas such overdetermined systems arise commonly in applications. Fourth, specific applications aside, one should never underestimate the blunt practical benefit of reducing an arbitrarily large grid of scalars to a single symbol A, which one can then manipulate by known algebraic rules.

The chapter opens in § 13.1 by inverting the square matrix, to solve the exactly determined, n × n linear system in § 13.2. It continues in § 13.3 by computing the rectangular matrix's kernel, to solve the nonoverdetermined, m × n linear system in § 13.4. In § 13.6, it brings forth the aforementioned pseudoinverse, which rightly approximates the solution to the unsolvable overdetermined linear system. After briefly revisiting the Newton-Raphson iteration in § 13.7, it concludes by introducing the concept and practice of vector orthonormalization in §§ 13.8 through 13.11.

If the reader now wonders whether all this were worth the tedium, then he stands in good company: most students first learning the matrix have wondered at this stage. The matrix finally begins to show its worth here.

13.1   Inverting the square matrix

Consider an n × n square matrix A of full rank r = n.
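The "one stroke" claim of the second answer above can be made concrete before the formal derivation. A small sketch (the example system is my own, chosen for illustration; NumPy assumed): invert A once, and every driving vector b is solved by a single multiplication.

```python
import numpy as np

A = np.array([[5.0, 1.0], [3.0, 6.0]])   # full rank, hence invertible
A_inv = np.linalg.inv(A)                 # computed once...

B = np.array([[1.0, 0.0, 7.0],           # ...then several driving vectors b,
              [0.0, 1.0, 2.0]])          # stacked here as columns,
X = A_inv @ B                            # are all solved at one stroke.

assert np.allclose(A @ X, B)
```

(Numerically one more often factors A and back-substitutes rather than forming the inverse explicitly, but the algebraic point—one decomposition serves all b—is the same.)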

region (§ 11.3.2), such that[2]

    G>^(-1) G> = I = G> G>^(-1),
    G<^(-1) G< = I = G< G<^(-1),                    (13.1)
    A = G> In G<.

Observing from (11.31) that In A = A In, we find by successive steps that

    A = G> In G<,
    In A = G> G< In,
    G<^(-1) G>^(-1) In A = In,
    (G<^(-1) In G>^(-1))(A) = In;

or alternately that

    A = G> In G<,
    A In = In G> G<,
    A In G<^(-1) G>^(-1) = In,
    (A)(G<^(-1) In G>^(-1)) = In.

Either way, we have that

    A^(-1) A = In = A A^(-1),
    A^(-1) ≡ G<^(-1) In G>^(-1).                    (13.2)

Of course, for this to work, G>, G<, G>^(-1) and G<^(-1) must exist, be known and honor n × n active regions, which might seem a practical hurdle. However, inasmuch as A is square and has full rank, (12.3) and the body of § 12.3 have shown exactly how to find just such a G>, G<, G>^(-1) and G<^(-1) for any square matrix A of full rank, so there is no trouble here. The factors do exist, and indeed we know how to find them.

Equation (13.2) features the important matrix A^(-1), the rank-n inverse of A. We have not yet much studied the rank-n inverse, but we have defined it in (11.49), where we gave it the fuller, nonstandard notation A^(-1(n)). When naming the rank-n inverse in words one usually says simply, "the inverse," because the rank is implied by the size of the square active region of the matrix inverted; but the rank-n inverse from (11.49) is not quite the infinite-dimensional inverse from (11.45), which is what G>^(-1) and G<^(-1) are. According to (13.2), the product of A^(-1) and A—or, written more fully, the product of A^(-1(n)) and A—is, not I, but In, without exception.

Properties that emerge from (13.2) include the following.

• Like A, the rank-n inverse A^(-1) (more fully written A^(-1(n))) too is an n × n square matrix of full rank r = n.

• Since A is square and has full rank (§ 12.5.4), its rows and, separately, its columns are linearly independent, so it has only the one, unique inverse A^(-1). No other rank-n inverse of A exists.

• On the other hand, A is itself the rank-n inverse of A^(-1), so, inasmuch as A^(-1) is square and has full rank, it does per (13.2) indeed have an inverse, and that inverse is A. The matrices A and A^(-1) thus form an exclusive, reciprocal pair.

• If B = A^(-1) then B^(-1) = A.

[2] The symbology and associated terminology might disorient a reader who had skipped Chs. 11 and 12. In this book, the symbol I theoretically represents an ∞ × ∞ identity matrix. Outside the m × m or n × n square, the operators G> and G< each resemble the ∞ × ∞ identity matrix I, which means that the operators affect respectively only the first m rows or n columns of the thing they operate on. (In the present section it happens that m = n because the matrix A of interest is square, but this footnote uses both symbols because generally m ≠ n.) The symbol Ir contrarily represents an identity matrix of only r ones, though it too can be viewed as an ∞ × ∞ matrix with zeroes in the unused regions. If interpreted as an ∞ × ∞ matrix, the matrix A of the m × n system Ax = b has nonzero content only within the m × n rectangle. None of this is complicated, really. Its purpose is merely to separate the essential features of a reversible operation like G> or G< from the dimensionality of the vector or matrix on which the operation happens to operate. The definitions do however necessarily, slightly diverge from definitions the reader may have been used to seeing in other books. In this book, one can legally multiply any two matrices, because all matrices are theoretically ∞ × ∞, anyway (though whether it makes any sense in a given circumstance to multiply mismatched matrices is another question; sometimes it does make sense, but more often it does not, which naturally is why the other books tend to forbid such multiplication). To the extent that the definitions confuse, the reader might briefly review the earlier chapters, especially § 11.3.

• Only a square, n × n matrix of full rank r = n has a rank-n inverse. A matrix A′ which is not square, or whose rank falls short of a full r = n, is not invertible in the rank-n sense of (13.2).

• If B is an n × n square matrix and either BA = In or AB = In, then both equalities in fact hold, and B = A^(-1). One can have neither equality without the other.

That A^(-1) is an n × n square matrix of full rank and that A is itself the inverse of A^(-1) proceed from the definition (13.2) of A^(-1) plus § 12.5.3's finding that reversible operations like G>^(-1) and G<^(-1) cannot change In's rank. That the inverse exists is plain, inasmuch as the Gauss-Jordan decomposition plus (13.2) reliably calculate it. That the inverse is unique begins from § 12.5.4's observation that the columns (like the rows) of A are linearly independent because A is square and has full rank. From this beginning and the fact that In = AA^(-1), it follows that [A^(-1)]∗1 represents[3] the one and only possible combination of A's columns which achieves e1; that [A^(-1)]∗2 represents the one and only possible combination of A's columns which achieves e2; and so on through en. One could observe likewise respecting the independent rows of A. Either way, A^(-1) is unique. Moreover, no other n × n matrix B ≠ A^(-1) satisfies either requirement of (13.2)—that BA = In or that AB = In—much less both.

It is not claimed that the matrix factors G> and G< themselves are unique, incidentally. On the contrary, many different pairs of matrix factors G> and G< can yield A = G> In G<, no less than that many different pairs of scalar factors γ> and γ< can yield α = γ> 1 γ<. What are unique are not the factors but the A and A^(-1) they produce. Though the Gauss-Jordan decomposition is a convenient means to G> and G<, it is hardly the only means, and any proper G> and G< found by any means will serve so long as they satisfy (13.1).

What of the degenerate n × n square matrix A′, of rank r < n? Rank promotion is impossible as §§ 12.5.2 and 12.5.3 have shown, so in the sense of (13.2) such a matrix has no inverse; for, if it had, then A′^(-1) would by definition represent a row or column operation which impossibly promoted A′ to the full rank r = n of In. Indeed, in that it has no inverse such a degenerate matrix closely resembles the scalar 0, which has no reciprocal. Mathematical convention owns a special name for a square matrix which is degenerate and thus has no inverse; it calls it a singular matrix.

[3] The notation [A^(-1)]∗j means "the jth column of A^(-1)."
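The reciprocal-pair properties above are easy to check numerically. The sketch below is an illustration only, assuming numpy: it computes the inverse by the library's built-in routine rather than by the text's Gauss-Jordan factors G> and G<, and the matrix is randomly generated rather than drawn from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4
A = rng.standard_normal((n, n))   # a generic n x n matrix, almost surely of full rank

Ainv = np.linalg.inv(A)           # the unique rank-n inverse A^(-1)
In = np.eye(n)

# A^(-1) A = In = A A^(-1), to floating-point precision.
assert np.allclose(Ainv @ A, In)
assert np.allclose(A @ Ainv, In)

# A is itself the inverse of A^(-1): the two form a reciprocal pair.
assert np.allclose(np.linalg.inv(Ainv), A)
```

Which pair of factors produced the inverse does not matter, as the text observes: only the A and A^(-1) they produce are unique.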

And what of a rectangular matrix? Is it degenerate? Well, no, not necessarily. The definitions of the present particular section are meant for square matrices; they do not neatly apply to nonsquare ones. However, appending the right number of null rows or columns to a nonsquare matrix does turn it into a degenerate square, in which case the preceding argument applies.

13.2 The exactly determined linear system

Section 11.1.1 has shown how the single matrix equation

    Ax = b                    (13.3)

concisely represents an entire simultaneous system of linear scalar equations. If the system has n scalar equations and n scalar unknowns, then the matrix A has square, n × n dimensionality. Furthermore, if the n scalar equations are independent of one another, then the rows of A are similarly independent, which gives A full rank and makes it invertible. Under these conditions, one can solve (13.3) and the corresponding system of linear scalar equations by left-multiplying (13.3) by the A^(-1) of (13.2) to reach the famous formula

    x = A^(-1) b.                    (13.4)

Inverting the square matrix A of scalar coefficients, (13.4) concisely solves a simultaneous system of n linear scalar equations in n scalar unknowns. It is the classic motivational result of matrix theory.

It has taken the book two long chapters to reach (13.4). If one omits first to prepare the theoretical ground sufficiently to support more advanced matrix work, then one can indeed reach (13.4) with rather less effort than the book has done.[4] As the chapter's introduction has observed, however, (13.4) is only the first fruit of the effort.

[4] For motivational reasons, tutorial linear algebra textbooks like [23] and [31] rightly yet invariably invert the general square matrix of full rank much earlier, reaching (13.4) with less effort. The deferred price the student pays for the simpler-seeming approach of the tutorials is twofold. First, the student fails to develop the Gauss-Jordan decomposition properly, instead learning the less elegant but easier to grasp "row echelon form" of "Gaussian elimination" [23][31]—which makes good matrix-arithmetic drill but leaves the student imperfectly prepared when the time comes to study kernels and eigensolutions or to read and write matrix-handling computer code. Second, in the long run the tutorials save no effort, because the student still must at some point develop the theory underlying matrix rank and supporting each of the several coincident properties of § 14.2. What the tutorials do is pedagogically necessary—it is how the writer first learned the matrix and probably how the reader first learned it, too—but it is appropriate to a tutorial, not to a study reference like this book. In this book, where derivations prevail, the proper place to invert the general square matrix of full rank is here.
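The famous formula (13.4) can be exercised directly. The sketch below, assuming numpy and an invented 3 × 3 system, solves an exactly determined system both by forming A^(-1) explicitly and by a direct solver; the two agree, though in numerical practice the solver is usually preferred since it need not form the inverse at all.

```python
import numpy as np

# Three independent scalar equations in three unknowns: Ax = b.
A = np.array([[2., 1., 0.],
              [1., 3., 1.],
              [0., 1., 4.]])
b = np.array([3., 5., 9.])

x = np.linalg.inv(A) @ b      # x = A^(-1) b, the formula (13.4)
x2 = np.linalg.solve(A, b)    # the same solution without forming A^(-1)

assert np.allclose(A @ x, b)  # the solution satisfies every scalar equation
assert np.allclose(x, x2)
```

Because A^(-1) is computed once, the same inverse solves the system for any other driving vector b at one stroke, which is the chapter's first answer to the question of why the matrix is worth the trouble.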

Indeed, the inversion here goes smoothly, because Chs. 11 and 12 have laid under it a firm foundation upon which—and supplied it the right tools with which—to work. Moreover, we shall soon meet additional interesting applications of the matrix which in any case require the theoretical ground to have been prepared.

Where the inverse does not exist, where the square matrix A is singular, the rows of the matrix are linearly dependent, meaning that the corresponding system actually contains fewer than n useful scalar equations. Depending on the value of the driving vector b, the superfluous equations either merely reproduce or flatly contradict information the other equations already supply. Either way, no unique solution to a linear system described by a singular square matrix is possible—though a good approximate solution is given by the pseudoinverse of § 13.6. In the language of § 12.5.5, the singular square matrix characterizes a system that is both underdetermined and overdetermined.

13.3 The kernel

If a matrix A has full column rank (§ 12.5.4), then the columns of A are linearly independent and

    Ax = 0                    (13.5)

is impossible if In x ≠ 0. If the matrix however lacks full column rank, then (13.5) is possible even if In x ≠ 0. In either case, any n-element x (including x = 0) that satisfies (13.5) belongs to the kernel of A.

Let A be an m × n matrix of rank r. A second matrix,[5] AK, minimally represents the kernel of A if and only if

• AK has n × (n − r) dimensionality (which gives AK tall rectangular form unless r = 0),

[5] The conventional mathematical notation for the kernel of A is ker{A}, null{A} or something nearly resembling one of the two—the notation seems to vary from editor to editor—which technically represent the kernel space itself, as opposed to the notation AK which represents a matrix whose columns address the kernel space. This book deëmphasizes the distinction and prefers the kernel matrix notation AK. If we were really precise, we might write not AK but AK(n) to match the A^(-1(r)) of (11.49). The abbreviated notation AK is probably clear enough for most practical purposes, though.

• AK has full rank n − r (that is, the columns of AK are linearly independent, which gives AK full column rank), and

• AK satisfies the equation

    A AK = 0.                    (13.6)

The n − r independent columns of the kernel matrix AK address the complete space x = AK a of vectors in the kernel, where the (n − r)-element vector a can have any value. In symbols,

    Ax = A(AK a) = (A AK) a = 0.

The definition does not pretend that the kernel matrix AK is unique. Except when A has full column rank the kernel matrix is not unique, indeed; there are infinitely many kernel matrices AK to choose from for a given matrix A. What is unique is not the kernel matrix but rather the space its columns address.

The Gauss-Jordan kernel formula[6]

    AK = S^(-1) K^(-1) Hr In−r = G<^(-1) Hr In−r                    (13.7)

gives a complete kernel AK of A, where S^(-1), K^(-1) and G<^(-1) are the factors their respective symbols indicate of the Gauss-Jordan decomposition's complementary form (12.3) and Hr is the shift operator of § 11.9. Section 13.3.1 derives the formula.

13.3.1 The Gauss-Jordan kernel formula

To derive (13.7) is not easy. The derivation begins from the statement of the linear system

    Ax = b,  where b = 0 or r = m, or both,                    (13.8)

and where b and x are respectively m- and n-element vectors and A is an m × n matrix of rank r. This statement is broader than (13.5) requires, but it serves § 13.4, too; so, for generality's sake, we leave b unspecified for the moment, constrained only by the given proviso. Gauss-Jordan factoring A, by successive steps,

    G> Ir K S x = b,
    Ir K S x = G>^(-1) b,
    Ir (K − I) S x + Ir S x = G>^(-1) b.

[6] The name Gauss-Jordan kernel formula is not standard as far as the writer is aware, but we would like a name for (13.7). This name seems as fitting as any.

Applying an identity from Table 12.2 on page 300,

    Ir K (In − Ir) S x + Ir S x = G>^(-1) b.

Rearranging terms,

    Ir S x = G>^(-1) b − Ir K (In − Ir) S x.                    (13.9)

Equation (13.9) is interesting. It has Sx on both sides, where Sx is the vector x with elements reordered in some particular way. The equation has however on the left only Ir Sx, which is the first r elements of Sx, and on the right only (In − Ir)Sx, which is the remaining n − r elements. No element of Sx appears on both sides.[7] Naturally this is no accident: we have (probably after some trial and error not recorded here) planned the steps leading to (13.9) to achieve precisely this effect. The implication is significant: (13.9) implies that one can choose the last n − r elements of Sx freely, but that the choice then determines the first r elements.

To express the implication more clearly we can rewrite (13.9) in the improved form

    f = G>^(-1) b − Ir K Hr a,
    Sx = f + Hr a,                    (13.10)
    f ≡ Ir Sx,
    a ≡ H−r (In − Ir) Sx,

where a represents the n − r free elements of Sx and f represents the r dependent elements. This makes f and thereby also x functions of the free parameter a and the driving vector b:

    f(a, b) = G>^(-1) b − Ir K Hr a,
    Sx(a, b) = f(a, b) + Hr a.                    (13.11)

If b = 0 as (13.5) requires, then

    f(a, 0) = −Ir K Hr a,
    Sx(a, 0) = f(a, 0) + Hr a.

[7] Notice how we now associate the factor (In − Ir) rightward as a row truncator, though it had first entered acting leftward as a column truncator. The flexibility to reassociate operators in such a way is one of many good reasons Chs. 11 and 12 have gone to such considerable trouble to develop the basic theory of the matrix.

Substituting the first line into the second,

    Sx(a, 0) = (I − Ir K) Hr a.                    (13.12)

In the event that a = ej, where 1 ≤ j ≤ n − r,

    Sx(ej, 0) = (I − Ir K) Hr ej.

For all the ej at once,

    Sx(In−r, 0) = (I − Ir K) Hr In−r.

But if all the ej at once, in aggregate, exactly address the domain of a, then the columns of x(In−r, 0) likewise exactly address the range of x(a, 0). Equation (13.6) has already named this range AK, by which[8]

    S AK = (I − Ir K) Hr In−r.                    (13.13)

Left-multiplying by S^(-1) = S∗ = S^T produces the alternate kernel formula

    AK = S^(-1) (I − Ir K) Hr In−r.                    (13.14)

The alternate kernel formula (13.14) is correct but not as simple as it could be. By the identity (11.76), Hr In−r = (In − Ir) Hr, so (13.13) is

    S AK = (I − Ir K)(In − Ir) Hr
         = [(In − Ir) − Ir K (In − Ir)] Hr
         = [(In − Ir) − (K − I)] Hr,                    (13.15)

where we have used Table 12.2 again in the last step.

[8] These are difficult steps, baffling until one grasps what is really going on here. How does one justify replacing a by ej, then ej by In−r, then x by AK? One justifies them in that the columns of In−r are the several ej, where 1 ≤ j ≤ n − r, of which any (n − r)-element vector a can be constructed as the linear combination

    a = In−r a = [ e1 e2 e3 ··· en−r ] a = Σ_{j=1}^{n−r} aj ej,

weighted by the elements of a. The idea is that, if we can solve the problem for each elementary vector ej—that is, if we can solve the problem for the identity matrix In−r—then we shall implicitly have solved it for every a, because a is a weighted combination of the ej and the whole problem is linear. The solution x = AK a for a given choice of a becomes a weighted combination of the solutions for each ej, with the elements of a again as the weights. And what are the solutions for each ej? Answer: the corresponding columns of AK, which by definition are the independent values of x that cause b = 0. Seen from one perspective, this seems trivial; from another perspective, baffling; until one grasps what is really going on here.

How to proceed symbolically from (13.15) is not obvious. However, if one sketches the matrices of (13.15) schematically with a pencil, and if one remembers that K^(-1) is just K with elements off the main diagonal negated, then it appears that

    S AK = K^(-1) Hr In−r.                    (13.16)

The appearance is not entirely convincing,[9] but (13.16) though unproven still helps, because it posits a hypothesis toward which to target the analysis.

Two variations on the identities of Table 12.2 also help. First, from the identity that K + K^(-1) = 2I, we have that

    K − I = I − K^(-1).                    (13.17)

Second, right-multiplying by Ir the identity that Ir K^(-1) (In − Ir) = K^(-1) − I and canceling terms, we have that

    K^(-1) Ir = Ir                    (13.18)

(which actually is pretty obvious if you think about it, since all of K's interesting content lies by construction right of its rth column). Now we have enough to go on with. Substituting (13.17) and (13.18) into (13.15) yields

    S AK = [(In − K^(-1) Ir) − (I − K^(-1))] Hr.

Adding 0 = K^(-1) In Hr − K^(-1) In Hr and rearranging terms,

    S AK = K^(-1) (In − Ir) Hr + [K^(-1) − K^(-1) In − I + In] Hr.

Factoring,

    S AK = K^(-1) (In − Ir) Hr + [(K^(-1) − I)(I − In)] Hr.

The quantity in square brackets is zero, since all of K's interesting content lies by construction within its first n columns, so

    S AK = K^(-1) (In − Ir) Hr,

[9] Well, actually, the appearance pretty much is entirely convincing, but let us finish the proof symbolically nonetheless.

which, considering that the identity (11.76) has that (In − Ir) Hr = Hr In−r, proves (13.16). The final step is to left-multiply (13.16) by S^(-1) = S∗ = S^T, reaching (13.7) that was to be derived.

One would like to feel sure that the columns of (13.7)'s AK actually addressed the whole kernel space of A rather than only part, and one would further like to feel sure that AK had no redundant columns; that is, that it had full rank. In general such features would be hard to establish, but here the factors conveniently are Gauss-Jordan factors. Regarding the whole kernel space, AK addresses it because AK comes from all a. Regarding redundancy, AK lacks it because S AK lacks it, and S AK lacks it because according to (13.13) the last rows of S AK are Hr In−r. So, in fact, (13.7) has both features and does fit the definition.

13.3.2 Converting between kernel matrices

If C is a reversible (n − r) × (n − r) operator by which we right-multiply (13.6), then the matrix

    A′K = AK C                    (13.19)

like AK evidently represents the kernel of A:

    A A′K = A (AK C) = (A AK) C = 0.

Indeed this makes sense: because the columns of AK C address the same space the columns of AK address, the two matrices necessarily represent the same underlying kernel. Moreover, some C exists to convert AK into every alternate kernel matrix A′K of A, in that § 12.4 lets one replace the columns of AK with those of A′K, reversibly, one column at a time, without altering the space addressed. (It might not let one replace the columns in sequence, but if out of sequence then a reversible permutation at the end corrects the order. Refer to §§ 12.5.1 and 12.5.2 for the pattern by which this is done.) The orthonormalizing column operator R^(-1) of (13.52) below incidentally tends to make a good choice for C.
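A kernel matrix satisfying the section's definition can also be obtained numerically without the Gauss-Jordan factors: the right singular vectors of A belonging to its zero singular values serve. The sketch below, assuming numpy and an invented 2 × 4 matrix, is a numerical alternative to formula (13.7), not the formula itself; it checks the definition's three requirements and then converts AK into an alternate kernel matrix A′K = AK C per (13.19).

```python
import numpy as np

# An m x n = 2 x 4 matrix of rank r = 2, so its kernel has n - r = 2 dimensions.
A = np.array([[1., 2., 3., 4.],
              [0., 1., 1., 1.]])
m, n = A.shape
r = np.linalg.matrix_rank(A)

# The right singular vectors belonging to zero singular values span the kernel.
_, _, Vh = np.linalg.svd(A)
AK = Vh[r:].T                                 # n x (n - r)

# The definition's three requirements:
assert AK.shape == (n, n - r)                 # n x (n - r) dimensionality
assert np.linalg.matrix_rank(AK) == n - r     # full column rank
assert np.allclose(A @ AK, 0)                 # A AK = 0, eqn (13.6)

# Any x = AK a lies in the kernel of A.
a = np.array([2., -1.])
assert np.allclose(A @ (AK @ a), 0)

# Right-multiplying by any reversible C gives an alternate kernel matrix
# A'K = AK C, per (13.19); its columns address the same underlying space.
C = np.array([[1., 1.],
              [0., 2.]])
AKp = AK @ C
assert np.allclose(A @ AKp, 0)
assert np.linalg.matrix_rank(np.hstack([AK, AKp])) == n - r
```

The last assertion confirms what the text observes: AK and A′K together span no more than the one (n − r)-dimensional kernel space, so the two matrices represent the same kernel.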

13.3.3 The degree of freedom

A slightly vague but extraordinarily useful concept has emerged in this section, worth pausing briefly to appreciate. The concept is the concept of the degree of freedom.

A degree of freedom is a parameter one remains free to determine within some continuous domain. For example, Napoleon's artillerist[10] might have enjoyed as many as six degrees of freedom in firing a cannonball: two in where he chose to set up his cannon (one degree in north-south position, one in east-west), two in aim (azimuth and elevation), one in muzzle velocity (as governed by the quantity of gunpowder used to propel the ball), and one in time. A seventh potential degree of freedom, the height from which the artillerist fires, is of course restricted by the lay of the land: the artillerist can fire from a high place only if the place he has chosen to fire from happens to be up on a hill, for Napoleon had no flying cannon.

Some apparent degrees of freedom are not real. For example, muzzle velocity gives the artillerist little control firing elevation does not also give; and, anyway, too much gunpowder might break the cannon. Other degrees of freedom are nonlinear in effect: a certain firing elevation gives maximum range, but nearer targets can be hit by firing either higher or lower at the artillerist's discretion.

Yet even among the six remaining degrees of freedom, the artillerist might find some impractical to exercise. The artillerist probably preloads the cannon always with a standard charge of gunpowder because, when he finds his target in the field, he cannot spare the time to unload the cannon and alter the charge: this costs one degree of freedom. Likewise, the artillerist must limber up the cannon and hitch it to a horse to shift it to better ground, but for this too he cannot spare time in the heat of battle: this costs two degrees. And Napoleon might yell, "Fire!" canceling the time degree as well. Two degrees of freedom remain to the artillerist; but, since exactly two degrees are needed to hit some particular target on the battlefield, the two are enough.

Now consider what happens if the artillerist loses one of his last two remaining degrees of freedom. Maybe the cannon's carriage wheel is broken and the artillerist can no longer turn the cannon: he can still choose firing elevation but no longer azimuth. In such a strait to hit some particular target on the battlefield, the artillerist needs somehow to recover another degree of freedom, for he needs two but has only one. If he disregards Napoleon's order, "Fire!" (maybe not a wise thing to do) and waits for the target to traverse the cannon's fixed line of fire, then he can still hope to hit even with the broken carriage wheel; for could he choose neither azimuth nor the moment to fire, then he would almost surely miss.

[10] The author, who has never fired an artillery piece (unless an arrow from a Boy Scout bow counts), invites any real artillerist among the readership to write in to improve the example.

All of this is hard to generalize in unambiguous mathematical terms, but the count of the degrees of freedom in a system is of high conceptual importance to the engineer nonetheless. Basically, the count captures the idea that to control n output variables of some system takes at least n independent input variables. Engineers of all kinds think in this way: an aeronautical engineer knows in advance that an airplane needs at least n ailerons, rudders and other control surfaces for the pilot adequately to control the airplane; an electrical engineer knows in advance that a circuit needs at least n potentiometers for the technician adequately to tune the circuit; and so on. The n may possibly for various reasons still not suffice—it might be wise in some cases to allow n + 1 or n + 2—but in no event will fewer than n do.

A degree of freedom has some continuous nature, not merely a discrete choice to turn left or right. In geometry, a point brings no degree of freedom; a line brings a single degree of freedom; a plane brings two. If the line bends and turns like a mountain road, it still brings a single degree of freedom. And if the road reaches an intersection? Answer: still one degree. The driver on the mountain road cannot claim a second degree of freedom at the mountain intersection (he can indeed claim a choice, but the choice being discrete lacks the proper character of a degree of freedom), but he might plausibly claim a second degree of freedom upon reaching the city, where the web or grid of streets is dense enough to approximate access to any point on the city's surface. Just how many streets it takes to turn the driver's "line" experience into a "plane" experience is a matter for the mathematician's discretion. On the other hand, a swimmer in a swimming pool enjoys three degrees of freedom (up-down, north-south, east-west) even though his domain in any of the three is limited to the small volume of the pool.

Reviewing (13.11), we find n − r degrees of freedom in the general underdetermined linear system, represented by the n − r free elements of a. If the underdetermined system is not also overdetermined, that is, if it is nondegenerate such that r = m, then it is guaranteed to have a family of solutions x. This family is the topic of the next section.

13.4 The nonoverdetermined linear system

The exactly determined linear system of § 13.2 is common, but also common is the more general, nonoverdetermined linear system

    Ax = b,                    (13.20)

in which b is a known, m-element vector, x is an unknown, n-element vector, and A is a square or broad, m × n matrix of full row rank (§ 12.5.4)

    r = m ≤ n.                    (13.21)

Except in the exactly determined edge case r = m = n of § 13.2, the nonoverdetermined linear system has no unique solution but rather a family of solutions. This section delineates the family.

13.4.1 Particular and homogeneous solutions

The nonoverdetermined linear system (13.20) by definition admits more than one solution x for a given driving vector b. Such a system is hard to solve all at once, so we prefer to split the system as

    Ax1 = b,
    A(AK a) = 0,                    (13.22)
    x = x1 + AK a,

which makes the whole form (13.20) when the second line is added to the first and the third is substituted. Splitting the system does not change it, but it does let us treat the system's first and second lines in (13.22) separately. In the split form, the symbol x1 represents any one n-element vector that happens to satisfy the form's first line—many are possible; the mathematician just picks one—and is called a particular solution of (13.20). The (n − r)-element vector a remains unspecified, whereupon AK a represents the complete family of n-element vectors that satisfy the form's second line; the family of vectors expressible as AK a is called the homogeneous solution of (13.20). Notice the italicized articles a and the.

The Gauss-Jordan kernel formula (13.7) has given us AK and thereby the homogeneous solution, which renders the analysis of (13.20) already half done. To complete the analysis, it remains in § 13.4.2 to find a particular solution.

13.4.2 A particular solution

Any particular solution will do, but since we need only one, the mathematician can pick a convenient one. Equation (13.11) has that

    f(a, b) = G>^(-1) b − Ir K Hr a,
    (S)[x1(a, b) + AK a] = f(a, b) + Hr a,

where we have substituted the last line of (13.22) for x. This holds for any a and b. We are not free to choose the driving vector b, but since we need only one particular solution, a can be anything we want. Why not

    a = 0?

Then

    f(0, b) = G>^(-1) b,
    Sx1(0, b) = f(0, b) = G>^(-1) b.

That is,

    x1 = S^(-1) G>^(-1) b.                    (13.23)

13.4.3 The general solution

Assembling (13.7), (13.22) and (13.23) yields the general solution

    x = S^(-1) (G>^(-1) b + K^(-1) Hr In−r a)                    (13.24)

to the nonoverdetermined linear system (13.20).

In exact arithmetic (13.24) solves the nonoverdetermined linear system in theory exactly. Of course, practical calculations are usually done in limited precision, in which compounded rounding error in the last bit eventually disrupts (13.24) for matrices larger than some moderately large size. Avoiding unduly small pivots early in the Gauss-Jordan extends (13.24)'s reach to larger matrices, and for yet larger matrices a bewildering variety of more sophisticated techniques exists to mitigate the problem, which can be vexing because the problem arises even when the matrix A is exactly known. Equation (13.24) is useful and correct, but one should at least be aware that it can in practice lose floating-point accuracy when the matrix it attacks grows too large. (It can also lose accuracy when the matrix's rows are almost dependent, but that is more the fault of the matrix than of the formula. See § 14.8, which addresses a related problem.)
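The shape of the general solution—one particular solution plus the whole homogeneous family—can be illustrated numerically. The sketch below, assuming numpy and an invented broad system of full row rank, finds a particular solution by a pseudoinverse and a kernel matrix by singular-value decomposition rather than by the Gauss-Jordan factors of (13.24); the factors differ, but the solution family addressed is the same.

```python
import numpy as np

# A broad, full-row-rank system: m = 2 equations, n = 4 unknowns, r = m = 2.
A = np.array([[1., 2., 3., 4.],
              [0., 1., 1., 1.]])
b = np.array([10., 3.])

x1 = np.linalg.pinv(A) @ b        # one particular solution (many would do)
assert np.allclose(A @ x1, b)

_, _, Vh = np.linalg.svd(A)
AK = Vh[2:].T                     # a kernel matrix: n - r = 2 free columns

# Every member of the family x = x1 + AK a solves the system,
# whatever the free (n - r)-element parameter a.
for a in (np.zeros(2), np.array([1., 0.]), np.array([-3., 7.])):
    x = x1 + AK @ a
    assert np.allclose(A @ x, b)
```

The loop exercises the n − r degrees of freedom of § 13.3.3: a is chosen freely, and the first r elements of the solution adjust dependently.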

13.5 The residual

Equations (13.2) and (13.4) solve the exactly determined linear system Ax = b. Equation (13.24) broadens the solution to include the nonoverdetermined linear system. None of those equations however can handle the overdetermined linear system, because for general b the overdetermined linear system

    Ax ≈ b                    (13.25)

has no exact solution. (See § 12.5.5 for the definitions of underdetermined, overdetermined, etc.)

One is tempted to declare the overdetermined system uninteresting because it has no solution and to leave the matter there, but this would be a serious mistake. In fact the overdetermined system is especially interesting, and the more so because it arises so frequently in applications. One seldom trusts a minimal set of data for important measurements, yet extra data imply an overdetermined system. We need to develop the mathematics to handle the overdetermined system properly.

The quantity[11][12]

    r(x) ≡ Ax − b                    (13.26)

measures how nearly some candidate solution x solves the system (13.25). We call this quantity the residual, and the smaller, the better. More precisely, the smaller the nonnegative real scalar

    [r(x)]∗ [r(x)] = Σi |ri(x)|²                    (13.27)

is, called the squared residual norm, the more favorably we regard the candidate solution x.

13.6 The Moore-Penrose pseudoinverse and the least-squares problem

A typical problem is to fit a straight line to some data. For example, suppose that we are building-construction contractors with a unionized work force, whose labor union can supply additional, fully trained labor on demand. Suppose further that we are contracted to build a long freeway and have been adding workers to the job in recent weeks to speed construction. On Saturday morning at the end of the second week, we gather and plot the production data on the left of Fig. 13.1.

[11] Alas, the alphabet has only so many letters (see Appendix B). The r here is unrelated to matrix rank r.
[12] Some books [49] prefer to define r(x) ≡ b − Ax, instead.

Figure 13.1: Fitting a line to measured data. (Both panels plot the length of freeway newly completed during the week against the number of workers; the left panel shows two data points, the right panel five.)

If ui and bi respectively represent the number of workers and the length of freeway completed during week i, then we can fit a straight line b = σu + γ to the measured production data such that

    [ u1  1 ] [ σ ]   [ b1 ]
    [ u2  1 ] [ γ ] = [ b2 ],

inverting the matrix per §§ 13.1 and 13.2 to solve for x ≡ [σ γ]^T, in the hope that the resulting line will predict future production accurately.

That is all mathematically irreproachable. By the fifth Saturday however we shall have gathered more production data, plotted on the figure's right, to which we should like to fit a better line to predict production more accurately. The added data present a problem. Statistically, the added data are welcome, but geometrically we need only two points to specify a line; what are we to do with the other three? The five points together overdetermine the linear system

    [ u1  1 ]         [ b1 ]
    [ u2  1 ] [ σ ]   [ b2 ]
    [ u3  1 ] [ γ ] = [ b3 ]
    [ u4  1 ]         [ b4 ]
    [ u5  1 ]         [ b5 ].

There is no way to draw a single straight line b = σu + γ exactly through all five, for in placing the line we enjoy only two degrees of freedom.[13]

The proper approach is to draw among the data points a single straight line that misses the points as narrowly as possible. More precisely, the proper

[13] Section 13.3.3 characterized a line as enjoying only one degree of freedom. Why now two? The answer is that § 13.3.3 discussed travel along a line rather than placement of a line as here. Though both involve lines, they differ as driving an automobile differs from washing one. Do not let this confuse you.

More precisely, the proper approach chooses parameters σ and γ to minimize the squared residual norm [r(x)]∗[r(x)] of § 13.5, given that

        [ u1  1 ]                      [ b1 ]
        [ u2  1 ]                      [ b2 ]
    A = [ u3  1 ],   x = [ σ ],   b = [ b3 ].
        [ u4  1 ]        [ γ ]        [ b4 ]
        [ u5  1 ]                     [ b5 ]
        [  ⋮  ⋮ ]                     [  ⋮ ]

Such parameters constitute a least-squares solution.

The matrix A in the example has two columns, data marching on the left, all ones on the right. This is a typical structure for A, but in general any matrix A with any number of columns of any content might arise (because there were more than two relevant variables or because some data merited heavier weight than others, among many further reasons). Whatever matrix A might arise from whatever source, this section attacks the difficult but important problem of approximating optimally a solution to the general, possibly unsolvable linear system (13.25), Ax ≈ b.

13.6.1 Least squares in the real domain

The least-squares problem is simplest when the matrix A enjoys full column rank and no complex numbers are involved. In this case, we seek to minimize the squared residual norm

    [r(x)]ᵀ[r(x)] = (Ax − b)ᵀ(Ax − b)
                  = xᵀAᵀAx + bᵀb − [xᵀAᵀb + bᵀAx]
                  = xᵀAᵀAx + bᵀb − 2xᵀAᵀb
                  = xᵀAᵀ(Ax − 2b) + bᵀb,

in which the transpose is used interchangeably for the adjoint because all the numbers involved happen to be real. The norm is minimized where

    (d/dx)(rᵀr) = 0

(in which d/dx is the Jacobian operator of § 11.10). A requirement that

    (d/dx)[xᵀAᵀ(Ax − 2b) + bᵀb] = 0

comes of combining the last two equations. Differentiating by the Jacobian product rule (11.79) yields the equation

    xᵀAᵀA + [Aᵀ(Ax − 2b)]ᵀ = 0,

or, after transposing the equation, rearranging terms and dividing by 2, the simplified equation

    AᵀAx = Aᵀb.

Assuming (as warranted by § 13.6.2, next) that the n × n square matrix AᵀA is invertible, the simplified equation implies the approximate but optimal least-squares solution

    x = (AᵀA)⁻¹Aᵀb                                                (13.28)

to the unsolvable linear system (13.25) in the restricted but quite typical case that A and b are real and A has full column rank.

Equation (13.28) plots the line on Fig. 13.1's right. As the reader can see, the line does not pass through all the points, but it does pass pretty convincingly nearly among them. In fact it passes optimally nearly among them, for no line can pass more nearly, in the squared-residual norm sense of (13.27).¹⁴

¹⁴ Here is a nice example of the use of the mathematical adjective optimal in its adverbial form. "Optimal" means "best." Many problems in applied mathematics involve discovering the best of something. What constitutes the best however can be a matter of judgment, even of dispute. We shall leave to the philosopher and the theologian the important question of what constitutes objective good, for applied mathematics is a poor guide to such mysteries. The role of applied mathematics is to construct suitable models to calculate quantities needed to achieve some definite good; its role is not, usually, to identify the good as good in the first place.
    One generally establishes mathematical optimality by some suitable, nonnegative, real cost function or metric. In general however the mathematical question is: what does one mean by "better"? Better by which metric? Each metric is better according to itself. This is where the mathematician's experience, taste and judgment come in. The mathematics cannot tell us which metric to use, but where no other consideration prevails the applied mathematician tends to choose the metric that best simplifies the mathematics at hand, and, really, that is about as good a way to choose a metric as any. The metric (13.27) is so chosen.
    "But," comes the objection, "what if some more complicated metric is better?" Well, if the other metric really, objectively is better, then one should probably use it. In the present section's example, too much labor on the freeway job might actually slow construction rather than speed it, so one could seek to fit not a line but some downward-turning curve to the data. Mathematics offers many downward-turning curves. A circle, maybe? Not likely. An experienced mathematician would probably reject the circle on the aesthetic yet practical ground that the parabola b = αu² + σu + γ lends itself to easier analysis.
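As a concrete numerical illustration (not part of the text), the short pure-Python sketch below fits a line b = σu + γ to five made-up data points by forming and solving the normal equations AᵀAx = Aᵀb of (13.28); the data values are invented for the demonstration, not the freeway example's.

```python
def transpose(M):
    return [list(row) for row in zip(*M)]

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def solve2x2(M, v):
    """Solve the 2x2 system M x = v by Cramer's rule."""
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    return [(M[1][1] * v[0] - M[0][1] * v[1]) / det,
            (M[0][0] * v[1] - M[1][0] * v[0]) / det]

u = [1.0, 2.0, 3.0, 4.0, 5.0]          # workers per week (made-up data)
b = [0.5, 1.9, 3.1, 3.9, 5.1]          # length completed per week

A = [[ui, 1.0] for ui in u]            # data on the left, ones on the right
At = transpose(A)
AtA = matmul(At, A)                    # the 2x2 matrix A^T A
Atb = [sum(a * bi for a, bi in zip(row, b)) for row in At]

sigma, gamma = solve2x2(AtA, Atb)      # least-squares slope and intercept
print(sigma, gamma)
```

For a normal system larger than 2 × 2 one would of course call a library solver rather than write Cramer's rule out by hand, but the tiny case keeps the arithmetic of (13.28) visible.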

13.6.2 The invertibility of A∗A

Section 13.6.1 has assumed correctly but unwarrantedly that the product AᵀA were invertible for real A of full column rank. For real A, it happens that Aᵀ = A∗, so it only broadens the same assumption to suppose that the product A∗A were invertible for complex A of full column rank.¹⁵ This subsection warrants the latter assumption, thereby incidentally also warranting the former.

Let A be a complex, m × n matrix of full column rank r = n ≤ m. Suppose falsely that A∗A were not invertible but singular. Since the product A∗A is an n × n square matrix, this is to suppose (§ 13.1) that the product's rank r′ < n were less than full, implying (§ 12.5.4) that its columns (as its rows) depended on one another. This would mean that there existed a nonzero, n-element vector u for which

    A∗Au = 0,  Iₙu ≠ 0.

Left-multiplying by u∗ would give that

    u∗A∗Au = 0,  Iₙu ≠ 0,

or in other words that

    (Au)∗(Au) = Σᵢ₌₁ⁿ |[Au]ᵢ|² = 0,  Iₙu ≠ 0.

But this could only be so if

    Au = 0,  Iₙu ≠ 0,

impossible when the columns of A are independent. The contradiction proves false the assumption which gave rise to it: that A∗A were singular. Thus, the n × n product A∗A is invertible for any tall or square, m × n matrix A of full column rank r = n ≤ m.

¹⁴ (continued) Yet even fitting a mere straight line offers choices. One might fit the line to the points (bᵢ, uᵢ) or (ln uᵢ, ln bᵢ) rather than to the points (uᵢ, bᵢ). The three resulting lines differ subtly; they predict production differently. The adjective "optimal" alone evidently does not always tell us all we need to know. Section 6.3 offers a choice between averages that resembles in spirit this footnote's choice between metrics.
¹⁵ Notice that if A is tall, then A∗A is a compact, n × n square, whereas AA∗ is a big, m × m square. It is the compact square that concerns this section. The big square is not very interesting and in any case is not invertible.
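The argument is easy to check numerically. In the sketch below (an illustration, not from the text), a tall 3 × 2 matrix of independent columns yields an invertible AᵀA with nonzero determinant, while a rank-deficient matrix yields a singular product.

```python
def ata(A):
    """Form the n x n product A^T A for a tall real matrix A."""
    cols = list(zip(*A))
    return [[sum(x * y for x, y in zip(ci, cj)) for cj in cols] for ci in cols]

def det2(M):
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

A_full = [[1.0, 1.0], [2.0, 1.0], [3.0, 1.0]]   # independent columns
A_def  = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]   # second column = 2 * first

print(det2(ata(A_full)))   # nonzero, so A^T A is invertible
print(det2(ata(A_def)))    # zero, so A^T A is singular
```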

13.6.3 Positive definiteness

An n × n matrix C is positive definite if and only if

    ℑ(u∗Cu) = 0 and ℜ(u∗Cu) > 0 for all n-element vectors u such that Iₙu ≠ 0.    (13.29)

An n × n matrix C is nonnegative definite if and only if

    ℑ(u∗Cu) = 0 and ℜ(u∗Cu) ≥ 0 for all u.                        (13.30)

Such definitions might seem opaque, but their sense is that a positive definite operator never reverses the thing it operates on, that the product Au points more in the direction of u than of −u. A positive definite operator resembles a positive scalar in this sense. Section 13.8 explains further.

As in § 13.6.2, here also when a matrix A has full column rank r = n ≤ m the product u∗A∗Au = (Au)∗(Au) is real and positive for all nonzero, n-element vectors u. Thus per (13.29) the product A∗A is positive definite for any matrix A of full column rank. By reasoning like the last paragraph's, the product A∗A is nonnegative definite for any matrix A whatsoever, even if A lacks full rank.

13.6.4 The Moore-Penrose pseudoinverse

Not every m × n matrix A enjoys full rank. According to (12.16), however, every m × n matrix A of rank r can be factored into a product¹⁶

    A = BC

of an m × r tall or square matrix B and an r × n broad or square matrix C, both of which factors themselves enjoy full rank r. (If A happens to have full row or column rank, then one can just choose B = Iₘ or C = Iₙ; but even if A lacks full rank, the Gauss-Jordan decomposition of eqn. 12.2 finds at least the full-rank factorization B = G>Iᵣ, C = IᵣG<.) This being so, a conjecture seems warranted. Suppose that, inspired by (13.28), we manipulated (13.25) by the successive steps

    Ax ≈ b,
    BCx ≈ b,
    (B∗B)⁻¹B∗BCx ≈ (B∗B)⁻¹B∗b,
    Cx ≈ (B∗B)⁻¹B∗b.

¹⁶ This subsection uses the symbols B and b for unrelated purposes, which is unfortunate but conventional. See footnote 11.

Then suppose that we changed

    C∗u ← x,

thus restricting x to the space addressed by the independent columns of C∗. Continuing,

    CC∗u ≈ (B∗B)⁻¹B∗b,
    u ≈ (CC∗)⁻¹(B∗B)⁻¹B∗b.

Changing the variable back and (because we are conjecturing and can do as we like) altering the "≈" sign to "=",

    x = C∗(CC∗)⁻¹(B∗B)⁻¹B∗b.                                      (13.31)

Equation (13.31) has a pleasingly symmetrical form, and we know from § 13.6.2 at least that the two matrices it tries to invert are invertible. So here is our conjecture:

• no x enjoys a smaller squared residual norm r∗r than the x of (13.31) does; and

• among all x that enjoy the same, minimal squared residual norm, the x of (13.31) is strictly least in magnitude.

The conjecture is bold, but if you think about it in the right way it is not unwarranted under the circumstance. After all, (13.31) does resemble (13.28), the latter of which admittedly requires real A of full column rank but does minimize the residual when its requirements are met; and, even if there were more than one x which minimized the residual, one of them might be smaller than the others: why not the x of (13.31)? One can but investigate.

The first point of the conjecture is symbolized

    r∗(x)r(x) ≤ r∗(x + ∆x)r(x + ∆x),

where ∆x represents the deviation, whether small, moderate or large, of some alternate x from the x of (13.31). According to (13.26), this is

    [Ax − b]∗[Ax − b] ≤ [(A)(x + ∆x) − b]∗[(A)(x + ∆x) − b].

Reorganizing,

    [Ax − b]∗[Ax − b] ≤ [(Ax − b) + A ∆x]∗[(Ax − b) + A ∆x].

Distributing factors and canceling like terms,

    0 ≤ (Ax − b)∗A ∆x + ∆x∗A∗(Ax − b) + ∆x∗A∗A ∆x.

But according to (13.31) and the full-rank factorization A = BC,

    A∗(Ax − b) = A∗Ax − A∗b
               = [C∗B∗][BC][C∗(CC∗)⁻¹(B∗B)⁻¹B∗b] − [C∗B∗][b]
               = C∗(B∗B)(CC∗)(CC∗)⁻¹(B∗B)⁻¹B∗b − C∗B∗b
               = C∗B∗b − C∗B∗b = 0,

which reveals two of the inequality's remaining three terms to be zero, leaving an assertion that

    0 ≤ ∆x∗A∗A ∆x.

Each step in the present paragraph is reversible,¹⁷ so the assertion in the last form is logically equivalent to the conjecture's first point, with which the paragraph began. Moreover, the assertion in the last form is correct because the product of any matrix and its adjoint according to § 13.6.3 is a nonnegative definite operator, thus establishing the conjecture's first point.

The conjecture's first point, now established, has it that no x + ∆x enjoys a smaller squared residual norm than the x of (13.31) does. It does not claim that no x + ∆x enjoys the same, minimal squared residual norm. The latter case is symbolized

    r∗(x)r(x) = r∗(x + ∆x)r(x + ∆x),

or equivalently by the last paragraph's logic,

    0 = ∆x∗A∗A ∆x,

or in other words,

    A ∆x = 0.

But A = BC, so this is to claim that B(C ∆x) = 0, which since B has full column rank is possible only if

    C ∆x = 0,

¹⁷ The paragraph might inscrutably but logically instead have ordered the steps in reverse as in §§ 6.3.2 and 9.5. See Ch. 6's footnote 14.

which naturally for ∆x = 0 is true. Considering the product ∆x∗x in light of (13.31) and the last equation, we observe that

    ∆x∗x = ∆x∗[C∗(CC∗)⁻¹(B∗B)⁻¹B∗b]
         = [C ∆x]∗[(CC∗)⁻¹(B∗B)⁻¹B∗b],

which is to observe that ∆x∗x = 0 for any ∆x for which x + ∆x achieves minimal squared residual norm.

Returning attention to the conjecture, its second point is symbolized

    x∗x < (x + ∆x)∗(x + ∆x)

for any ∆x ≠ 0 for which x + ∆x achieves minimal squared residual norm (note that it's "<" this time, not "≤" as in the conjecture's first point). Distributing factors and canceling like terms,

    0 < x∗∆x + ∆x∗x + ∆x∗∆x.

But the last paragraph has found that ∆x∗x = 0 for precisely such ∆x as we are considering here, so the last inequality reduces to read

    0 < ∆x∗∆x.

Since each step in the paragraph is reversible, reverse logic establishes the conjecture's second point. With both its points established, the conjecture is true.

If A = BC is a full-rank factorization, then the matrix¹⁸

    A† ≡ C∗(CC∗)⁻¹(B∗B)⁻¹B∗                                       (13.32)

of (13.31) is called the Moore-Penrose pseudoinverse of A, or more briefly the pseudoinverse of A. Whether underdetermined, exactly determined, overdetermined or even degenerate, every matrix has a Moore-Penrose pseudoinverse. Yielding the optimal approximation

    x = A†b,                                                      (13.33)

¹⁸ Some books print A† as A⁺.
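To make (13.32) concrete, the following pure-Python sketch (an illustration, not the text's code) computes A† for one small rank-1 real matrix whose full-rank factorization A = BC is chosen by hand, then checks the characteristic property that A A† A reproduces A.

```python
def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def transpose(M):
    return [list(r) for r in zip(*M)]

A = [[1.0, 2.0],
     [2.0, 4.0]]                # rank 1: the second row is twice the first
B = [[1.0], [2.0]]              # m x r factor of A = BC, full column rank
C = [[1.0, 2.0]]                # r x n factor, full row rank

BtB = matmul(transpose(B), B)[0][0]   # B*B is 1 x 1 here, a scalar: 5
CCt = matmul(C, transpose(C))[0][0]   # CC* likewise: 5

# A† = C*(CC*)^-1 (B*B)^-1 B*; the 1 x 1 inverses are plain reciprocals.
A_dag = [[ci[0] * bj / (CCt * BtB) for bj in transpose(B)[0]]
         for ci in transpose(C)]

AAdA = matmul(matmul(A, A_dag), A)    # A A† A should reproduce A
print(A_dag)
print(AAdA)
```

For a matrix of larger rank the two inverted factors become genuine r × r matrices, but the shape of the computation is the same.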

then the Moore-Penrose solves the linear system (13.25) as well as the system can be solved: exactly if possible, with minimal squared residual norm if impossible. If A is square and invertible, then the Moore-Penrose A† = A⁻¹ is just the inverse, and then of course (13.33) solves the system uniquely and exactly. Nothing can solve the system uniquely if A has broad shape, but the Moore-Penrose still solves the system exactly in that case as long as A has full row rank, moreover minimizing the solution's squared magnitude x∗x (which the solution of eqn. 13.23 fails to do). If A lacks full row rank, then the Moore-Penrose solves the system as nearly as the system can be solved (as in Fig. 13.1) and as a side-benefit also minimizes x∗x. The Moore-Penrose is thus a general-purpose solver and approximator for linear systems. It is a significant discovery.¹⁹

13.7 The multivariate Newton-Raphson iteration

When we first met the Newton-Raphson iteration in § 4.8 we lacked the matrix notation and algebra to express and handle vector-valued functions adeptly. Now that we have the notation and algebra we can write down the multivariate Newton-Raphson iteration almost at once.

The iteration approximates the nonlinear vector function f(x) by its tangent

    f̃ₖ(x) = f(xₖ) + [df/dx]_{x=xₖ} (x − xₖ),

where df/dx is the Jacobian derivative of § 11.10. It then approximates the root xₖ₊₁ as the point at which f̃ₖ(xₖ₊₁) = 0:

    f̃ₖ(xₖ₊₁) = 0 = f(xₖ) + [df/dx]_{x=xₖ} (xₖ₊₁ − xₖ).

Solving for xₖ₊₁ (approximately if necessary), we have that

    xₖ₊₁ = [ x − (df/dx)† f(x) ]_{x=xₖ},                          (13.34)

where [·]† is the Moore-Penrose pseudoinverse of § 13.6, which is just the ordinary inverse [·]⁻¹ of § 13.1 if f and x happen each to have the same number of elements. Refer to § 4.8.²⁰

¹⁹ [2, § 3.3][37, "Moore-Penrose generalized inverse"]
²⁰ [40]
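In the square, invertible-Jacobian case, where [·]† reduces to the ordinary inverse, iteration (13.34) is easy to try numerically. The two-equation system in the sketch below is made up for the demonstration; its root is x = y = √2.

```python
import math

def f(x, y):
    # A made-up system: x^2 + y^2 = 4 together with x = y.
    return (x * x + y * y - 4.0, x - y)

def jacobian(x, y):
    return [[2.0 * x, 2.0 * y],
            [1.0, -1.0]]

def newton_step(x, y):
    (f1, f2), J = f(x, y), jacobian(x, y)
    det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
    # x_{k+1} = x_k - J^{-1} f(x_k), the 2 x 2 inverse written out directly
    dx = (J[1][1] * f1 - J[0][1] * f2) / det
    dy = (-J[1][0] * f1 + J[0][0] * f2) / det
    return x - dx, y - dy

x, y = 1.0, 0.5                 # a rough initial guess
for _ in range(20):
    x, y = newton_step(x, y)
print(abs(x - math.sqrt(2.0)) < 1e-9, abs(y - math.sqrt(2.0)) < 1e-9)
```

A degenerate Jacobian at some iterate would zero the determinant here, which is exactly the failure case the next paragraph of the text discusses.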

Despite the Moore-Penrose notation of (13.34), the Newton-Raphson iteration is not normally meant to be applied at a value of x for which the Jacobian is degenerate. The iteration intends rather in light of (13.32) that

    [df/dx]† = [df/dx]∗([df/dx][df/dx]∗)⁻¹    if r = m ≤ n,
               [df/dx]⁻¹                       if r = m = n,      (13.35)
               ([df/dx]∗[df/dx])⁻¹[df/dx]∗    if r = n ≤ m,

where B = Iₘ in the first case and C = Iₙ in the last. It does not intend to use the full (13.32). If both r < m and r < n, which is to say, if the Jacobian is degenerate, then (13.35) fails, as though the curve of Fig. 4.5 ran horizontally at the test point, whereupon one quits, restarting the iteration from another point.

13.8 The dot product

The dot product of two vectors, also called the inner product,²¹ is the product of the two vectors to the extent that they run in the same direction. It is written a · b. In general,

    a · b ≡ aᵀb = Σ_{j=−∞}^{∞} aⱼbⱼ.                              (13.36)

To see the sense of (13.36), expand

    a · b = (a₁e₁ + a₂e₂ + ··· + aₙeₙ) · (b₁e₁ + b₂e₂ + ··· + bₙeₙ).

But if the dot product is to mean anything, it must be that

    eᵢ · eⱼ = δᵢⱼ,                                                (13.37)

distinct basis vectors running in wholly distinct directions. Therefore,

    a · b = a₁b₁ + a₂b₂ + ··· + aₙbₙ,

agreeing with (13.36).

²¹ The term inner product is often used to indicate a broader class of products than the one defined here, especially in some of the older literature. Where used, the notation usually resembles ⟨a, b⟩ or (b, a), both of which mean a∗ · b (or, more broadly, some similar product), except that which of a and b is conjugated depends on the author. Most recently, at least in the author's country, the usage ⟨a, b⟩ ≡ a∗ · b seems to be emerging as standard where the dot is not used [2, § 3.1][15, Ch. 4]. This book prefers the dot.

The dot notation does not worry whether its arguments are column or row vectors, incidentally:

    a · b = a · bᵀ = aᵀ · b = aᵀ · bᵀ = aᵀb.

That is, if either vector is wrongly oriented, the notation implicitly reorients it before using it. (The more orderly notation aᵀb by contrast assumes that both are proper column vectors.)

Where vectors may have complex elements, usually one is not interested in a · b so much as in

    a∗ · b ≡ a∗ᵀb = Σ_{j=−∞}^{∞} a*ⱼbⱼ.                           (13.38)

The reason is that

    ℜ(a∗ · b) = ℜ(a) · ℜ(b) + ℑ(a) · ℑ(b),

with the product of the imaginary parts added not subtracted, thus honoring the right Argand sense of "the product of the two vectors to the extent that they run in the same direction."

By the Pythagorean theorem, the dot product

    |a|² = a∗ · a                                                 (13.39)

gives the square of a vector's magnitude, always real, never negative. The unit vector in a's direction then is

    â ≡ a/|a| = a/√(a∗ · a),                                      (13.40)

from which

    |â|² = â∗ · â = 1.                                            (13.41)

When two vectors do not run in the same direction at all, such that

    a∗ · b = 0,                                                   (13.42)

the two vectors are said to lie orthogonal to one another. Geometrically this puts them at right angles. For other angles θ between two vectors,

    â∗ · b̂ = cos θ,                                               (13.43)

which formally defines the angle θ even when a and b have more than three elements each.
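A few of these definitions, sketched in pure Python as an illustration (not from the text); the conjugation in a∗ · b is what keeps |a|² real even for complex elements.

```python
import math

def adjoint_dot(a, b):
    # a* . b: conjugate the first argument's elements, then sum the products
    return sum(x.conjugate() * y for x, y in zip(a, b))

def magnitude(a):
    # |a| = sqrt(a* . a), always real and nonnegative
    return math.sqrt(adjoint_dot(a, a).real)

a = [3.0, 0.0, 0.0]
b = [1.0, 1.0, 0.0]

# cos(theta) = (unit vector of a)* . (unit vector of b)
cos_theta = adjoint_dot([x / magnitude(a) for x in a],
                        [y / magnitude(b) for y in b]).real

print(magnitude(a))                   # 3.0
print(cos_theta)                      # about 0.7071, i.e. cos 45 degrees
print(adjoint_dot([1j, 0], [1j, 0]))  # conjugation makes |a|^2 come out real
```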

13.9 The orthogonal complement A⊥ ≡ A∗ᴷ

The m × (m − r) kernel (§ 13.3)²²

    A⊥ ≡ A∗ᴷ                                                      (13.44)

is an interesting matrix. By definition of the kernel, the columns of A∗ᴷ are the independent vectors uⱼ for which A∗uⱼ = 0, which, inasmuch as the rows of A∗ are the adjoints of the columns of A, is possible only when each uⱼ lies orthogonal to every column of A. This says that the columns of A⊥ ≡ A∗ᴷ address the complete space of vectors that lie orthogonal to A's columns, such that

    A⊥∗A = 0 = A∗A⊥.                                              (13.45)

The matrix A⊥ is called the orthogonal complement²³ or perpendicular matrix to A. Among other uses, the orthogonal complement A⊥ supplies the columns A lacks to reach full row rank. Properties include that

    A∗ᴷ = A⊥,
    A∗⊥ = Aᴷ.                                                     (13.46)

13.10 Gram-Schmidt orthonormalization

If a vector x = Aᴷa belongs to a kernel space Aᴷ (§ 13.3), then so equally does any αx. If the vectors x₁ = Aᴷa₁ and x₂ = Aᴷa₂ both belong, then so does α₁x₁ + α₂x₂. Kernel vectors have no inherent scale. If I claim Aᴷ = [3 4 5; −1 1 0]ᵀ to represent a kernel, then you are not mistaken arbitrarily to rescale each column of my Aᴷ by a separate nonzero factor, instead for instance representing the same kernel as Aᴷ = [6 8 0xA; −½ ½ 0]ᵀ. Style generally asks the applied mathematician to remove the false appearance of scale by using (13.40) to normalize the columns of a kernel matrix to unit magnitude before reporting them. The same goes for the eigenvectors of Ch. 14 to come. Where a kernel matrix Aᴷ has two or more columns (or a repeated eigenvalue has two or more eigenvectors), style generally asks the applied mathematician not only to normalize but also to orthogonalize the columns

²² If we were really precise, we might write not A⊥ but A⊥⁽ᵐ⁾. Refer to footnote 5.
²³ The symbol A⊥ [23][2][31] can be pronounced "A perp," short for "A perpendicular," since by (13.45) A⊥ is in some sense perpendicular to A. [23, § 3.VI.3]

before reporting them.

One orthogonalizes a vector b with respect to a vector a by subtracting from b a multiple of a such that a∗ · b⊥ = 0, where the symbol b⊥ represents the orthogonalized vector. Symbolically,

    b⊥ ≡ b − βa,
    a∗ · b⊥ = 0.

Substituting the second of these equations into the first and solving for β yields

    β = (a∗ · b)/(a∗ · a).                                        (13.47)

Hence,

    b⊥ ≡ b − [(a∗ · b)/(a∗ · a)] a.                               (13.48)

But according to (13.40), a = â√(a∗ · a), and according to (13.41), â∗ · â = 1, so,

    b⊥ = b − â(â∗ · b),

or, in matrix notation,

    b⊥ = b − â(â∗)(b).

This is arguably better written

    b⊥ = [I − (â)(â∗)] b                                          (13.49)

(observe that it's [â][â∗], a matrix, rather than the scalar [â∗][â]).

One orthonormalizes a set of vectors by orthogonalizing them with respect to one another, then by normalizing each of them to unit magnitude. The procedure to orthonormalize several vectors

    {x₁, x₂, x₃, ..., xₙ}

therefore is as follows. First, normalize x₁ by (13.40); call the result x̂₁⊥. Second, orthogonalize x₂ with respect to x̂₁⊥ by (13.48) or (13.49), then normalize it; call the result x̂₂⊥. Third, orthogonalize x₃ with respect to x̂₁⊥ then to x̂₂⊥, then normalize it; call the result x̂₃⊥. Proceed in this manner through the several xⱼ. Symbolically,

    x̂ⱼ⊥ = xⱼ⊥/√(x∗ⱼ⊥ xⱼ⊥),
    xⱼ⊥ ≡ [ ∏_{i=1}^{j−1} (I − x̂ᵢ⊥ x̂∗ᵢ⊥) ] xⱼ.                     (13.50)
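The orthogonalize-then-normalize procedure just described can be sketched in a few lines of pure Python (an illustration for real vectors, not the text's code); each vector is orthogonalized against the already-finished unit vectors one at a time, using the scalar-coefficient form of the projection rather than the matrix form.

```python
import math

def orthonormalize(vectors):
    basis = []
    for x in vectors:
        x = list(x)                     # the lone working copy, overwritten
        for q in basis:
            coeff = sum(qi * xi for qi, xi in zip(q, x))  # q* . x (real case)
            x = [xi - coeff * qi for qi, xi in zip(q, x)]
        norm = math.sqrt(sum(xi * xi for xi in x))
        basis.append([xi / norm for xi in x])
    return basis

Q = orthonormalize([[1.0, 1.0, 0.0],
                    [1.0, 0.0, 1.0],
                    [0.0, 1.0, 1.0]])

# Orthonormality check: the dot product of columns i and j should be delta_ij.
dots = [[round(sum(a * b for a, b in zip(qi, qj)), 12) for qj in Q] for qi in Q]
print(dots)   # approximately the 3 x 3 identity
```

The routine assumes the input vectors are linearly independent; a dependent vector would orthogonalize to zero and the normalization would divide by zero.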

By the vector replacement principle of § 12.4 in light of (13.48), the resulting orthonormal set of vectors

    {x̂₁⊥, x̂₂⊥, x̂₃⊥, ..., x̂ₙ⊥}

addresses the same space as did the original set. By the technique, one can conveniently replace a set of independent vectors by an equivalent, orthonormal set which addresses precisely the same space. Orthonormalization naturally works equally for any linearly independent set of vectors, not only for kernel vectors or eigenvectors.

13.10.1 Efficient implementation

To turn an equation like the latter line of (13.50) into an efficient numerical algorithm sometimes demands some extra thought, in perspective of whatever it happens to be that one is trying to accomplish. If all one wants is some vectors orthonormalized, then the equation as written is neat but is overkill, because the product x̂ᵢ⊥x̂∗ᵢ⊥ is a matrix, whereas the product x∗ⱼ⊥xⱼ⊥ implied by (13.48) is just a scalar. Fortunately, one need not apply the latter line of (13.50) exactly as written. One can instead introduce intermediate vectors xⱼᵢ, representing the multiplication in the admittedly messier form

    xⱼ₁ ≡ xⱼ,
    xⱼ₍ᵢ₊₁₎ ≡ xⱼᵢ − (x̂∗ᵢ⊥ · xⱼᵢ) x̂ᵢ⊥,
    xⱼ⊥ = xⱼⱼ.                                                    (13.51)

Besides obviating the matrix I − x̂ᵢ⊥x̂∗ᵢ⊥ and the associated matrix multiplication, the messier form (13.51) has the significant additional practical virtue that it lets one forget each intermediate vector xⱼᵢ immediately after using it. (A well written orthonormalizing computer program reserves memory for one intermediate vector only, which memory it repeatedly overwrites; and, actually, probably does not even reserve that much, working rather in the memory space it has already reserved for x̂ⱼ⊥.)²⁴ Other equations one algorithmizes can likewise benefit from thoughtful rendering.

13.10.2 The Gram-Schmidt decomposition

The orthonormalization technique this section has developed is named the Gram-Schmidt process. One can turn it into the Gram-Schmidt decomposition

    A = QR = QUDS,  R ≡ UDS,                                      (13.52)

also called the orthonormalizing or QR decomposition, by an algorithm that somewhat resembles the Gauss-Jordan algorithm of § 12.3.3, except that (12.4) here becomes

    A = Q̃ŨD̃S̃                                                     (13.53)

and initially Q̃ ← A. By elementary column operations based on (13.50) and (13.51), the algorithm gradually transforms Q̃ into a dimension-limited, m × r matrix Q of orthonormal columns, distributing the inverse elementaries to Ũ, D̃ and S̃ according to Table 12.1, where the latter three working matrices ultimately become the extended-operational factors U, D and S of (13.52).

Borrowing the language of computer science we observe that the indices i and j of (13.50) and (13.51) imply a two-level nested loop, one level looping over j and the other over i. The equations suggest j-major nesting, with the loop over j at the outer level and the loop over i at the inner, such that the several (i, j) index pairs occur in the sequence (reading left to right then top to bottom)

    (1, 2)
    (1, 3) (2, 3)
    (1, 4) (2, 4) (3, 4)
    ···

In reality, however, (13.51)'s middle line requires only that no x̂ᵢ⊥ be used before it is fully calculated; otherwise that line does not care which (i, j) pair follows which. The i-major nesting

    (1, 2) (1, 3) (1, 4) ···
    (2, 3) (2, 4) ···
    (3, 4) ···

bringing the very same index pairs in a different sequence, is just as valid. We choose i-major nesting on the subtle ground that it affords better information to the choice of column index p during the algorithm's step 3. The algorithm, in detail:

²⁴ [52, "Gram-Schmidt process," 04:48, 11 August 2007]

among other ˜ heuristics.1. 5. but one might instead prefer to choose the column of greatest magnitude.7.2) with djj = 1 for all j ≥ i.13. the algorithm also ˜ re¨nters here from step 9 below. Begin by initializing ˜ Q ← A. ˜ S ← I.) If Q is null in and rightward of its ith column such that no column p ≥ i remains available to choose. ˜ ˜ U ← T[i↔p] U T[i↔p] . ˜ ˜ S ← T[i↔p] S.7. Observing that (13. interchange the chosen pth column to the ith position by ˜ ˜ Q ← QT[i↔p] . Choose a column p ≥ i of Q containing at least one nonzero element.1).53) can be expanded to read A = = ˜ QT[i↔p] ˜ QT[i↔p] ˜ T[i↔p] U T[i↔p] ˜ T[i↔p] DT[i↔p] ˜ T[i↔p]S ˜ ˜ ˜ T[i↔p] U T[i↔p] D T[i↔p]S . GRAM-SCHMIDT ORTHONORMALIZATION 1.) Observe that U enjoys the major e {i−1}T (§ 11. Observing that (13. ˜ 3. or to choose randomly. and that the ﬁrst through (i − 1)th columns of Q consist of mutually orthonormal unit vectors.5). ˜ D ← I. 351 2.8. . that D is a general ˜ partial unit triangular form L ˜ ˜ scaling operator (§ 11.53) can be expanded to read ˜ A = QT(1/α)[i] ˜ Tα[i] U T(1/α)[i] ˜ ˜ Tα[i] D S. 4. that S is permutor ˜ (§ 11. (The simplest choice is perhaps p = i as long as the ith column does not happen to be null. where the latter line has applied a rule from Table 12. then skip directly to step 10.10. i ← 1. (Besides arriving at this point from step 1 above. ˜ U ← I.

observing that (13. 6. ˜ S ≡ S. where α= ˜ Q ∗ ∗i ˜ · Q ∗i . where ˜ β= Q ∗ ∗i ˜ · Q ∗j . INVERSION AND ORTHONORMALIZATION ˜ normalize the ith column of Q by ˜ ˜ Q ← QT(1/α)[i] . Let ˜ Q ≡ Q.352 CHAPTER 13.51) with respect to the ith column by ˜ ˜ Q ← QT−β[ij] . 8. e Otherwise. 10.) If j > n then skip directly to step 9. Increment and return to step 7. ˜ U ≡ U. ˜ ˜ U ← Tα[i] U T(1/α)[i] . 7. the algorithm also re¨nters here from step 8 below. Increment and return to step 2. (Besides arriving at this point from step 6 above.53) can be expanded to read ˜ A = QT−β[ij] ˜ ˜˜ Tβ[ij]U DS. i← i+1 j ←j+1 ˜ D ≡ D. 9. ˜ orthogonalize the jth column of Q per (13. . ˜ ˜ U ← Tβ[ij] U . r = i − 1. ˜ ˜ D ← Tα[i] D. Initialize j ← i + 1.

End.

Though the Gram-Schmidt algorithm broadly resembles the Gauss-Jordan, at least two significant differences stand out: (i) the Gram-Schmidt is one-sided because it operates only on the columns of Q̃, never on the rows; (ii) since Q is itself dimension-limited, the Gram-Schmidt decomposition (13.52) needs and has no explicit factor Iᵣ.

As in § 12.3.3, here also one sometimes prefers that S = I. The algorithm optionally supports this preference if the m × n matrix A has full column rank r = n, when null columns cannot arise, if one always chooses p = i during the algorithm's step 3. Such optional discipline maintains S = I when desired.

Whether S = I or not, the matrix Q = QIᵣ has only r columns, so one can write (13.52) as

    A = (QIᵣ)(R).

Reassociating factors, this is

    A = (Q)(IᵣR),                                                 (13.54)

which per (12.16) is a proper full-rank factorization with which one can compute the pseudoinverse A† of A (see eqn. 13.32, above, but see also eqn. 13.63, below).

If the Gram-Schmidt decomposition (13.52) looks useful, it is even more useful than it looks. The most interesting of its several factors is the m × r orthonormalized matrix Q, whose orthonormal columns address the same space the columns of A themselves address. If Q reaches the maximum possible rank r = m, achieving square, m × m shape, then it becomes a unitary matrix, the subject of § 13.11. Before treating the unitary matrix, however, let us pause to extract a kernel from the Gram-Schmidt decomposition in § 13.10.3, next.

13.10.3 The Gram-Schmidt kernel formula

Like the Gauss-Jordan decomposition in (13.7), the Gram-Schmidt decomposition too brings a kernel formula. To develop and apply it, one decomposes an m × n matrix

    A = QR                                                        (13.55)

per the Gram-Schmidt (13.52) and its algorithm in § 13.10.2. Observing that the r independent columns of the m × r matrix Q address the same space the columns of A address, one then constructs the m × (r + m) matrix

    A′ ≡ Q + Iₘ H₋ᵣ = [ Q  Iₘ ]                                   (13.56)

and decomposes it too,

    A′ = Q′R′,                                                    (13.57)

again by Gram-Schmidt, with the differences that, this time, one chooses p = 1, 2, 3, ..., r during the first r instances of the algorithm's step 3, and that one skips the unnecessary step 7 for all j ≤ r, on the ground that the earlier Gram-Schmidt application of (13.55) has already orthonormalized the first r columns of A′, which columns, after all, are just Q. The resulting m × m, full-rank square matrix

    Q′ = Q + A⊥H₋ᵣ = [ Q  A⊥ ]                                    (13.58)

consists of

• r columns on the left that address the same space the columns of A address and

• m − r columns on the right that give a complete orthogonal complement (§ 13.9) A⊥ of A.

Each column, left and right, has unit magnitude and conveniently lies orthogonal to every other column. Being square, having the maximum possible rank r = m, the m × m matrix Q′ is a unitary matrix, as the last paragraph of § 13.10.2 has alluded. The matrix

    A∗ᴷ = A⊥ = Q′ Hᵣ Iₘ₋ᵣ                                         (13.59)

extracts the rightward columns that express the kernel, not of A, but of A∗. To compute the kernel of a matrix B by Gram-Schmidt one sets A = B∗ and applies (13.55) through (13.59). Refer to (13.46).

In either the form (13.58) or the form (13.59), the Gram-Schmidt kernel formula does everything the Gauss-Jordan kernel formula (13.7) does and in at least one sense does it better, for, if one wants a Gauss-Jordan kernel orthonormalized, then one must orthonormalize it as an extra step, whereas the Gram-Schmidt kernel comes already orthonormalized. Equation (13.58) is probably the more useful form.

The unitary matrix is the subject of § 13.11 that follows.

A matrix Q that satisfies

    Q∗Q = Iₘ = QQ∗,                                               (13.60)

whether derived from the Gram-Schmidt or from elsewhere, is called a unitary matrix. The reason that Q∗Q = Iₘ is that Q's columns are orthonormal, and that the very definition of orthonormality demands that the dot product [Q]∗∗ᵢ · [Q]∗ⱼ of orthonormal columns be zero unless i = j, when the dot product of a unit vector with itself is unity. That Iₘ = QQ∗ is unexpected, however, until one realizes²⁵ that the equation Q∗Q = Iₘ characterizes Q∗ to be the rank-m inverse of Q, and that § 13.1 lets any rank-m inverse (orthonormal or otherwise) attack just as well from the right as from the left. Thus,

    Q⁻¹ = Q∗,                                                     (13.61)

a very useful property. (Note that the permutor of § 11.7.1 enjoys the property of eqn. 13.61 precisely because it is unitary.) One immediate consequence of (13.60) is that a square matrix with either orthonormal columns or orthonormal rows is unitary and has both.

The product of two or more unitary matrices is itself unitary if the matrices are of the same dimensionality. To prove it, consider the product

    Q = QₐQᵦ                                                      (13.62)

of m × m unitary matrices Qₐ and Qᵦ. Let the symbols qⱼ, qₐⱼ and qᵦⱼ respectively represent the jth columns of Q, Qₐ and Qᵦ and let the symbol qᵦᵢⱼ represent the ith element of qᵦⱼ. By the columnwise interpretation (§ 11.1.3) of matrix multiplication,

    qⱼ = Σᵢ qᵦᵢⱼ qₐᵢ.

The adjoint dot product of any two of Q's columns then is

    q∗ⱼ′ · qⱼ = Σᵢ,ᵢ′ q*ᵦᵢ′ⱼ′ qᵦᵢⱼ q∗ₐᵢ′ · qₐᵢ.

But q∗ₐᵢ′ · qₐᵢ = δᵢ′ᵢ because Qₐ is unitary,²⁶ so

    q∗ⱼ′ · qⱼ = Σᵢ q*ᵦᵢⱼ′ qᵦᵢⱼ = q∗ᵦⱼ′ · qᵦⱼ = δⱼ′ⱼ,

²⁵ [15, § 4.4]
²⁶ This is true only for 1 ≤ i ≤ m, but you knew that already.
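These properties are easy to check numerically. The sketch below (an illustration, not from the text) uses real plane rotations, for which the adjoint is the plain transpose; it confirms Q∗Q = I = QQ∗ for a rotation and for the product of two rotations.

```python
import math

def rot(t):
    # A plane rotation by angle t, a simple real unitary (orthogonal) matrix
    return [[math.cos(t), -math.sin(t)],
            [math.sin(t),  math.cos(t)]]

def matmul(X, Y):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def transpose(M):
    return [list(r) for r in zip(*M)]

def is_identity(M, tol=1e-12):
    return all(abs(M[i][j] - (1.0 if i == j else 0.0)) < tol
               for i in range(len(M)) for j in range(len(M)))

Qa, Qb = rot(0.3), rot(1.1)
Q = matmul(Qa, Qb)                              # the product of two unitaries

print(is_identity(matmul(transpose(Qa), Qa)))   # Q*Q = I
print(is_identity(matmul(Qa, transpose(Qa))))   # QQ* = I
print(is_identity(matmul(transpose(Q), Q)))     # the product is unitary too
```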

which says neither more nor less than that the columns of Q are orthonormal, which is to say that Q is unitary, as was to be demonstrated.

Unitary operations preserve length. That is, operating on an m-element vector by an m × m unitary matrix does not alter the vector's magnitude. To prove it, consider the system

    Qx = b.

Multiplying the system by its own adjoint yields

    x∗Q∗Qx = b∗b.

But according to (13.60), Q∗Q = Im; so,

    x∗x = b∗b,    (13.63)

as was to be demonstrated.

Equation (13.61) lets one use the Gram-Schmidt decomposition (13.52) to invert a square matrix as

    A−1 = R−1Q∗ = S∗D−1U−1Q∗.

Unitary extended operators are certainly possible; for, if Q is an m × m dimension-limited matrix, then the extended operator

    Q∞ = Q + (I − Im),

which is just Q with ones running out the main diagonal from its active region, itself meets the unitary criterion (13.60) for m = ∞.

Unitary matrices are so easy to handle that they can sometimes justify significant effort to convert a model to work in terms of them if possible. We shall meet the unitary matrix again in §§ 14.10 and 14.12.

The chapter as a whole has demonstrated at least in theory (and usually in practice) techniques to solve any linear system characterized by a matrix of finite dimensionality, whatever the matrix's rank or shape. It has explained how to orthonormalize a set of vectors and has derived from the explanation the useful Gram-Schmidt decomposition. As the chapter's introduction had promised, the matrix has shown its worth here; for without the matrix's notation, arithmetic and algebra, most of the chapter's findings would have lain beyond practical reach.

And even so, the single most interesting agent of matrix arithmetic remains yet to be treated. This last is the eigenvalue, and it is the subject of Ch. 14, next.
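Both facts lend themselves to a short numerical sketch, with numpy's QR factorization again standing in for the book's Gram-Schmidt decomposition and a randomly generated matrix serving as the example:

```python
import numpy as np

# Two checks: a unitary operation preserves length (13.63), and the
# QR (Gram-Schmidt) factors invert a square matrix as A^-1 = R^-1 Q*.
rng = np.random.default_rng(1)
A = rng.standard_normal((3, 3))    # nonsingular with probability 1
Q, R = np.linalg.qr(A)             # A = QR, Q unitary, R upper triangular

x = rng.standard_normal(3)
b = Q @ x                          # operate on x by the unitary Q
A_inv = np.linalg.inv(R) @ Q.T     # R^-1 Q* (real case: Q* = Q^T)
```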

Chapter 14

The eigenvalue

The eigenvalue is a scalar by which a square matrix scales a vector without otherwise changing it, such that

    Av = λv.

This chapter analyzes the eigenvalue and the associated eigenvector it scales.

Before treating the eigenvalue proper, the chapter gathers from across Chs. 11 through 14 several properties all invertible square matrices share, assembling them in § 14.2 for reference. One of these regards the determinant, which opens the chapter.

14.1 The determinant

Through Chs. 11, 12 and 13 the theory of the matrix has developed slowly but pretty straightforwardly. Here comes the first unexpected turn. It begins with an arbitrary-seeming definition. The determinant of an n × n square matrix A is the sum of n! terms, each term the product of n elements, no two elements from the same row or column, terms of positive parity adding to and terms of negative parity subtracting from the sum—a term's parity (§ 11.6) being the parity of the permutor (§ 11.7.1) marking the positions of the term's elements.

Unless you already know about determinants, the definition alone might seem hard to parse, so try this. The inverse of the general 2 × 2 square matrix

    A2 = [ a11  a12 ]
         [ a21  a22 ],

by the Gauss-Jordan method or any other convenient technique, is found to be

    A2−1 = (1 / (a11a22 − a12a21)) [  a22  −a12 ]
                                   [ −a21   a11 ].

The quantity1

    det A2 = a11a22 − a12a21

in the denominator is defined to be the determinant of A2. Each of the determinant's terms includes one element from each column of the matrix and one from each row, with parity giving the term its ± sign. The determinant comprehends all possible such terms, n! in number, half of positive parity and half of negative. (How do we know that exactly half are of positive and half, negative? Answer: by pairing the terms. For every term like a12a23a31 whose marking permutor is P, there is a corresponding a13a22a31 whose marking permutor is T[1↔2]P, necessarily of opposite parity. The sole exception to the rule is the 1 × 1 square matrix, which has no second term to pair.)

By the same rule, the determinant of the general 3 × 3 square matrix is

    det A3 = (a11a22a33 + a12a23a31 + a13a21a32)
           − (a13a22a31 + a12a21a33 + a11a23a32).

The parity rule merits a more careful description. The parity of a term like a12a23a31 is positive because the parity of the permutor, or interchange quasielementary (§ 11.7.1),

        [ 0 1 0 ]
    P = [ 0 0 1 ]
        [ 1 0 0 ]

marking the positions of the term's elements is positive. The parity of a term like a13a22a31 is negative for the same reason.

The determinant det A used to be written |A|, an appropriately terse notation for which the author confesses some nostalgia. The older notation |A| however unluckily suggests "the magnitude of A," which though not quite the wrong idea is not quite the right idea, either. The magnitude |z| of a scalar or |u| of a vector is a real-valued, nonnegative, nonanalytic function of the elements of the quantity in question, whereas the determinant det A is a complex-valued, analytic function. The book follows convention by denoting the determinant as det A for this reason among others.

1 And indeed, if we tediously invert such a matrix symbolically, we do find that quantity in the denominator there.
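The definition can be coded almost word for word: a sum of n! terms, one element from each row and column, each term signed by the parity of its marking permutor. The sketch below counts inversions to get the parity; it is O(n · n!) arithmetic, for illustration only:

```python
import itertools
import numpy as np

def det_by_permutation(A):
    """Determinant straight from the n!-term permutation definition."""
    n = len(A)
    total = 0.0
    for perm in itertools.permutations(range(n)):
        # parity of the marking permutor = parity of the inversion count
        inversions = sum(1 for i in range(n) for j in range(i + 1, n)
                         if perm[i] > perm[j])
        sign = -1.0 if inversions % 2 else 1.0
        term = 1.0
        for row, col in enumerate(perm):   # one element per row and column
            term *= A[row][col]
        total += sign * term
    return total

A3 = [[1., 2., 3.],
      [4., 5., 6.],
      [7., 8., 10.]]
```

For n = 2 the two permutations give exactly a11a22 − a12a21, the quantity defined above.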

Historically the determinant probably emerged not from abstract considerations but for the mundane reason that the quantity it represents occurred frequently in practice (as in the A2−1 example above). Nothing however logically prevents one from simply defining some quantity which, at first, one merely suspects will later prove useful. So we do here.2 (It is admitted3 that we have not, as yet, actually shown the determinant to be a generally useful quantity; we have merely motivated and defined it.)

Normally the context implies a determinant's rank n, but the nonstandard notation det(n) A is available especially to call the rank out, stating explicitly that the determinant has exactly n! terms. (See also §§ 11.3.5 and 11.5 and eqn. 11.49.4)

14.1.1 Basic properties

The determinant det A enjoys several useful basic properties.

• If

    ci∗ = { ai′′∗  when i = i′,
          { ai′∗   when i = i′′,
          { ai∗    otherwise,

or if

    c∗j = { a∗j′′  when j = j′,
          { a∗j′   when j = j′′,
          { a∗j    otherwise,

where i′′ ≠ i′ and j′′ ≠ j′, then

    det C = − det A.    (14.1)

Interchanging rows or columns negates the determinant.

• If

    ci∗ = { αai∗  when i = i′,
          { ai∗   otherwise,

2 [15, § 1.2]
3 And further Ch. 13's footnotes 5 and 22.
4 [15, § 1.1]

or if

    c∗j = { αa∗j  when j = j′,
          { a∗j   otherwise,

then

    det C = α det A.    (14.2)

Scaling a single row or column of a matrix scales the matrix's determinant by the same factor. (Equation 14.2 tracks the linear scaling property of § 7.3.3 and of eqn. 11.2.)

• If

    ci∗ = { ai∗ + bi∗  when i = i′,
          { ai∗ = bi∗  otherwise,

or if

    c∗j = { a∗j + b∗j  when j = j′,
          { a∗j = b∗j  otherwise,

then

    det C = det A + det B.    (14.3)

If one row or column of a matrix C is the sum of the corresponding rows or columns of two other matrices A and B, while the three matrices remain otherwise identical, then the determinant of the one matrix is the sum of the determinants of the other two. (Equation 14.3 tracks the linear superposition property of § 7.3.3 and of eqn. 11.2.)

• If

    ci′∗ = 0,

or if

    c∗j′ = 0,

then

    det C = 0.    (14.4)

A matrix with a null row or column also has a null determinant.

• If

    ci′′∗ = γci′∗,

or if

    c∗j′′ = γc∗j′,

where i′′ ≠ i′ and j′′ ≠ j′, then

    det C = 0.    (14.5)

The determinant is zero if one row or column of the matrix is a multiple of another.

• The determinant of the adjoint is just the determinant's conjugate, and the determinant of the transpose is just the determinant itself:

    det C∗ = (det C)∗;
    det C^T = det C.    (14.6)

These basic properties are all fairly easy to see if the definition of the determinant is clearly understood. Equations (14.2), (14.3) and (14.4) come because each of the n! terms in the determinant's expansion has exactly one element from row i′ or column j′. Equation (14.1) comes because a row or column interchange reverses parity. Equation (14.5) comes because, in this case, every term in the determinant's expansion finds an equal term of opposite parity to offset it. Or, more formally, (14.5) comes because the following procedure does not alter the matrix: (i) scale row i′′ or column j′′ by 1/γ; (ii) scale row i′ or column j′ by γ; (iii) interchange rows i′ ↔ i′′ or columns j′ ↔ j′′. Not altering the matrix, the procedure does not alter the determinant either; and yet, according to (14.2), step (ii)'s effect on the determinant cancels that of step (i), while according to (14.1), step (iii) negates the determinant. Hence the net effect of the procedure is to negate the determinant—to negate the very determinant the procedure is not permitted to alter. The apparent contradiction can be reconciled only if the determinant is zero to begin with. Finally, (14.6) comes because according to § 11.7.1 the permutors P and P∗ always have the same parity, and because the adjoint operation individually conjugates each element of C.

From the foregoing properties the following further property can be deduced.

• If

    ci∗ = { ai∗ + αai′∗  when i = i′′,
          { ai∗          otherwise,

or if

    c∗j = { a∗j + αa∗j′  when j = j′′,
          { a∗j          otherwise,
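The basic properties invite numerical spot-checks. The 3 × 3 matrix below is an arbitrary example of mine, not the text's:

```python
import numpy as np

# Spot-checks of the basic determinant properties on an example matrix.
A = np.array([[2., 1., 0.],
              [1., 3., 1.],
              [0., 1., 4.]])
det = np.linalg.det

swapped = A[[1, 0, 2], :]                  # interchange two rows      (14.1)
scaled = A.copy(); scaled[0] *= 5.0        # scale one row             (14.2)
dependent = A.copy(); dependent[2] = 3.0 * A[0]   # row a multiple     (14.5)
added = A.copy(); added[1] += 7.0 * A[0]   # add a multiple of a row   (14.7)
```

Since A is real, det A^T = det A also doubles as a check of (14.6).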

where i′′ ≠ i′ and j′′ ≠ j′, then

    det C = det A.    (14.7)

Adding to a row or column of a matrix a multiple of another row or column does not change the matrix's determinant.

To derive (14.7) for rows (the column proof is similar), one defines a matrix B such that

    bi∗ ≡ { αai′∗  when i = i′′,
          { ai∗    otherwise.

From this definition, bi′′∗ = αai′∗, whereas bi′∗ = ai′∗, so

    bi′′∗ = αbi′∗,

which by (14.5) guarantees that

    det B = 0.

On the other hand, the three matrices A, B and C differ only in the (i′′)th row, where [C]i′′∗ = [A]i′′∗ + [B]i′′∗; so, according to (14.3),

    det C = det A + det B.

Equation (14.7) results from combining the last two equations.

14.1.2 The determinant and the elementary operator

Section 14.1.1 has it that interchanging, scaling or adding rows or columns of a matrix respectively negates, scales or does not alter the matrix's determinant. But the three operations named are precisely the operations of the three elementaries of § 11.4. Therefore,

    det T[i↔j]A = − det A = det AT[i↔j],
    det Tα[i]A  =  α det A = det ATα[j],
    det Tα[ij]A =    det A = det ATα[ij],
    1 ≤ (i, j) ≤ n,  i ≠ j,    (14.8)

for any n × n square matrix A. Obviously also,

    det IA = det A = det AI,
    det InA = det A = det AIn,
    det I = 1 = det In.    (14.9)

If A is taken to represent an arbitrary product of identity matrices (In and/or I) and elementary operators, then a significant consequence of (14.8) and (14.9), applied recursively, is that the determinant of a product is the product of the determinants, at least where identity matrices and elementary operators are concerned. In symbols,5

    det Πk Mk = Πk det Mk,
    Mk ∈ { In, I, T[i↔j], Tα[i], Tα[ij] },
    1 ≤ (i, j) ≤ n.    (14.10)

As it happens though, (14.11) below is going to erase the restriction. This matters because, as the Gauss-Jordan decomposition of § 12.3 has shown, one can build up any square matrix of full rank by applying elementary operators to In. Section 14.1.4 will put the rule (14.10) to good use.

14.1.3 The determinant of a singular matrix

Equation (14.8) gives elementary operators the power to alter a matrix's determinant almost arbitrarily—almost arbitrarily, but not quite. What an n × n elementary operator6 cannot do is to change an n × n matrix's determinant to or from zero. Once zero, a determinant remains zero under the action of elementary operators. Once nonzero, always nonzero. Elementary operators, being reversible, have no power to breach this barrier.

Another thing n × n elementaries cannot do according to § 12.5.3 is to change an n × n matrix's rank. Nevertheless, such elementaries can reduce any n × n matrix reversibly to Ir, where r ≤ n is the matrix's rank, by the Gauss-Jordan algorithm of § 12.3. Equation (14.4) has that the n × n determinant of Ir is zero if r < n, so it follows that the n × n determinant of every rank-r matrix is similarly zero if r < n; and complementarily, that the n × n determinant of a rank-n matrix is never zero. Singular matrices always have zero determinants; full-rank square matrices never do. One can evidently tell the singularity or invertibility of a square matrix from its determinant alone.

5 Notation like "∈" can be too fancy for applied mathematics, but it does help here. The notation Mk ∈ {. . .} restricts Mk to be any of the things between the braces.
6 That is, an elementary operator which honors an n × n active region.

14.1.4 The determinant of a matrix product

Sections 14.1.2 and 14.1.3 suggest the useful rule that

    det AB = det A det B.    (14.11)

To prove the rule, we consider three distinct cases.

The first case is that A is singular. In this case, B acts as a column operator on A, whereas according to § 12.5.2 no operator has the power to promote A in rank. Hence the product AB is no higher in rank than A, which says that AB is no less singular than A, which implies that AB like A has a null determinant. Evidently (14.11) holds in the first case.

The second case is that B is singular. The proof here resembles that of the first case.

The third case is that neither matrix is singular. Here, we use Gauss-Jordan to decompose both matrices into sequences of elementary operators and rank-n identity matrices, for which

    det AB = det {[A][B]}
           = det { [(Π T) In (Π T)] [(Π T) In (Π T)] }
           = (Π det T)(det In)(Π det T) (Π det T)(det In)(Π det T)
           = det {(Π T) In (Π T)} det {(Π T) In (Π T)}
           = det A det B,

which is a schematic way of pointing out in light of (14.10) merely that since A and B are products of identity matrices and elementaries, the determinant of the product is the product of the determinants. So it is that (14.11) holds in all three cases, as was to be demonstrated. The determinant of a matrix product is the product of the matrix determinants.

14.1.5 Determinants of inverse and unitary matrices

From (14.11) it follows that

    det A−1 = 1 / det A    (14.12)

because A−1A = In and det In = 1.
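Both (14.11) and (14.12) are easy to confirm numerically on randomly generated matrices:

```python
import numpy as np

# Checking det AB = det A det B (14.11) and det A^-1 = 1/det A (14.12).
rng = np.random.default_rng(2)
A = rng.standard_normal((4, 4))    # nonsingular with probability 1
B = rng.standard_normal((4, 4))

lhs = np.linalg.det(A @ B)
rhs = np.linalg.det(A) * np.linalg.det(B)
inv_det = np.linalg.det(np.linalg.inv(A))
```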

From (14.6) it follows that if Q is a unitary matrix (§ 13.11), then det Q∗ det Q = 1. That is,

    |det Q| = 1.    (14.13)

The reason is that

    |det Q|² = (det Q)∗(det Q) = det Q∗ det Q = det Q∗Q = det Q−1Q = det In = 1.

14.1.6 Inverting the square matrix by determinant

The Gauss-Jordan algorithm comfortably inverts concrete matrices of moderate size, but swamps one in nearly interminable algebra when symbolically inverting general matrices larger than the A2 at the section's head. Slogging through the algebra to invert A3 symbolically nevertheless (the reader need not actually do this unless he desires a long exercise), one quite incidentally discovers a clever way to factor the determinant:

    C^T A = (det A)In = AC^T;
    cij ≡ det Rij;
                 { 1      if i′ = i and j′ = j,
    [Rij]i′j′ ≡  { 0      if i′ = i or j′ = j but not both,
                 { ai′j′  otherwise.    (14.14)

Pictorially,

          [ · · ·  ∗ ∗ 0 ∗ ∗  · · · ]
          [ · · ·  ∗ ∗ 0 ∗ ∗  · · · ]
    Rij = [ · · ·  0 0 1 0 0  · · · ],
          [ · · ·  ∗ ∗ 0 ∗ ∗  · · · ]
          [ · · ·  ∗ ∗ 0 ∗ ∗  · · · ]

same as A except in the ith row and jth column. The matrix C, called the cofactor of A, then consists of the determinants of the various Rij.

Another way to write (14.14) is

    [C^T A]ij = (det A)δij = [AC^T]ij,    (14.15)

which comprises two cases. In the case that i = j,

    [AC^T]ij = [AC^T]ii = Σℓ aiℓciℓ = Σℓ aiℓ det Riℓ = det A = (det A)δij,

wherein the equation Σℓ aiℓ det Riℓ = det A states that det A, being a determinant, consists of several terms, each term including one factor from each row of A, where aiℓ provides the ith row and Riℓ provides the other rows.7

In the case that i ≠ j,

    [AC^T]ij = Σℓ aiℓcjℓ = Σℓ aiℓ det Rjℓ = 0 = (det A)(0) = (det A)δij,

wherein Σℓ aiℓ det Rjℓ is the determinant, not of A itself, but rather of A with the jth row replaced by a copy of the ith, which according to (14.5) evaluates to zero. Similar equations can be written for [C^T A]ij in both cases. The two cases together prove (14.15), hence also (14.14).

Dividing (14.14) by det A, we have that8

    A−1A = In = AA−1,
    A−1 = C^T / det A.    (14.16)

Equation (14.16) inverts a matrix by determinant. In practice, it inverts small matrices nicely, through about 4 × 4 dimensionality (the A2−1 equation at the head of the section is just eqn. 14.16 for n = 2). It inverts 5 × 5 and even 6 × 6 matrices reasonably, too—especially with the help of a computer to do the arithmetic. Though (14.16) still holds in theory for yet larger matrices, and though symbolically it expresses the inverse of an abstract n × n matrix concisely whose entries remain unspecified, for concrete matrices much bigger than 4 × 4 to 6 × 6 or so its several determinants begin to grow too great and too many for practical calculation. The Gauss-Jordan technique (or even the Gram-Schmidt technique) is preferred to invert concrete matrices above a certain size for this reason.9

7 This is a bit subtle, but if you actually write out A3 and its cofactor C3 symbolically, trying (14.15) on them, then you will soon see what is meant.
8 Cramer's rule [15, § 1.6], of which the reader may have heard, results from applying (14.16) to (13.4). However, Cramer's rule is really nothing more than (14.16) in a less pleasing form, so this book does not treat Cramer's rule as such.
9 For very large matrices, even the Gauss-Jordan grows impractical, due to compound floating-point rounding error and the maybe large but nonetheless limited quantity of available computer memory. Iterative techniques ([chapter not yet written]) serve to invert such matrices approximately.

14.2 Coincident properties

Chs. 11, 12 and 13, plus this chapter up to the present point, have discovered several coincident properties of the invertible n × n square matrix. One does
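The determinant inversion of (14.14) through (14.16) translates directly into code. Here numpy's `det` computes each cij = det Rij, with Rij built by clearing the ith row and jth column except for a 1 at (i, j); the example matrix is an arbitrary small one, in keeping with the text's warning about size:

```python
import numpy as np

def cofactor_inverse(A):
    """Invert by determinant per (14.14)-(14.16): A^-1 = C^T / det A."""
    n = A.shape[0]
    C = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            R = A.astype(float).copy()
            R[i, :] = 0.0          # clear the ith row...
            R[:, j] = 0.0          # ...and the jth column,
            R[i, j] = 1.0          # except for a 1 at (i, j)
            C[i, j] = np.linalg.det(R)   # c_ij = det R_ij
    return C.T / np.linalg.det(A)

A = np.array([[2., 0., 1.],
              [1., 3., 0.],
              [0., 1., 4.]])
A_inv = cofactor_inverse(A)
```

The n² small determinants are exactly the "several determinants" whose growth makes the method impractical for large concrete matrices.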

not feel the full impact of the coincidence when these properties are left scattered across the long chapters, so let us gather and summarize the properties here. A square, n × n matrix evidently has either all of the following properties or none of them, never some but not others.

• The matrix is invertible (§ 13.1).
• Its rows are linearly independent (§§ 12.1 and 12.3.4).
• Its columns are linearly independent (§ 12.5.4).
• Its columns address the same space the columns of In address, and its rows address the same space the rows of In address (§ 12.5.7).
• The Gauss-Jordan algorithm reduces it to In (§ 12.3.3). (In this, the choice of pivots does not matter.)
• Decomposing it, the Gram-Schmidt algorithm achieves a fully square, unitary, n × n factor Q (§ 13.10).
• It has full rank r = n (§ 12.5.4).
• The linear system Ax = b it represents has a unique n-element solution x, given any specific n-element driving vector b (§ 13.3).
• The determinant det A ≠ 0 (§ 14.1.3).
• None of its eigenvalues is zero (§ 14.3, below).

The square matrix which has one of these properties, has all of them. The square matrix which lacks one, lacks all. Assuming exact arithmetic, a square matrix is either invertible, with all that that implies, or singular; never both. The distinction between invertible and singular matrices is theoretically as absolute as (and indeed is analogous to) the distinction between nonzero and zero scalars.

Whether the distinction is always useful is another matter. Usually the distinction is indeed useful, but a matrix can be almost singular just as a scalar can be almost zero. Such a matrix is known, among other ways, by its unexpectedly small determinant. Now it is true: in exact arithmetic, a nonzero determinant, no matter how small, implies a theoretically invertible matrix. Practical matrices however often have entries whose values are imprecisely known; and even when they don't, the computers which invert them tend to do arithmetic imprecisely in floating-point. Matrices which live

on the hazy frontier between invertibility and singularity resemble the infinitesimals of § 4.1.1. They are called ill conditioned matrices. Section 14.8 develops the topic.

14.3 The eigenvalue itself

We stand ready at last to approach the final major agent of matrix arithmetic, the eigenvalue. Suppose a square, n × n matrix A, a nonzero n-element vector

    v = Inv ≠ 0,    (14.17)

and a scalar λ, together such that

    Av = λv,    (14.18)

or in other words such that

    Av = λInv.

If so, then

    [A − λIn]v = 0.    (14.19)

Since Inv is nonzero, the last equation is true if and only if the matrix [A − λIn] is singular—which in light of § 14.1.3 is to demand that

    det(A − λIn) = 0.    (14.20)

The left side of (14.20) is an nth-order polynomial in λ, the characteristic polynomial, whose n roots are the eigenvalues10 of the matrix A.

What is an eigenvalue, really? An eigenvalue is a scalar a matrix resembles under certain conditions. When a matrix happens to operate on the right eigenvector v, it is all the same whether one applies the entire matrix or just the eigenvalue to the vector. The matrix scales the eigenvector by the eigenvalue without otherwise altering the vector, changing the vector's

10 An example:

    A = [ 2  0 ]
        [ 3 −1 ],

    det(A − λIn) = det [ 2 − λ    0    ]
                       [   3    −1 − λ ]
                 = (2 − λ)(−1 − λ) − (0)(3)
                 = λ² − λ − 2 = 0,

    λ = −1 or 2.
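The footnote's example can be worked numerically. For a 2 × 2 matrix the characteristic polynomial det(A − λIn) reduces to λ² − (tr A)λ + det A, whose roots here are −1 and 2:

```python
import numpy as np

# The 2 x 2 example: A = [[2, 0], [3, -1]] has characteristic polynomial
# lambda^2 - lambda - 2, with roots (eigenvalues) -1 and 2.
A = np.array([[2., 0.],
              [3., -1.]])

# For 2 x 2: det(A - lambda I) = lambda^2 - (tr A) lambda + det A
coeffs = [1.0, -np.trace(A), np.linalg.det(A)]
roots = sorted(np.roots(coeffs).real)
```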

magnitude but not its direction. The eigenvalue alone takes the place of the whole, hulking matrix. This is what (14.18) means. Of course it works only when v happens to be the right eigenvector, which § 14.4 discusses.

The eigenvalues of a matrix's inverse are the inverses of the matrix's eigenvalues. That is,

    λ′jλj = 1 for all 1 ≤ j ≤ n if A′A = In = AA′.    (14.21)

The reason behind (14.21) comes from answering the question: if Avj scales vj by the factor λj, then what does A′Avj = Ivj do to vj?

Naturally one must solve (14.20)'s nth-order polynomial to locate the actual eigenvalues. One solves it by the same techniques by which one solves any polynomial: the quadratic formula (2.2); the cubic and quartic methods of Ch. 10; the Newton-Raphson iteration (4.30). On the other hand, the determinant (14.20) can be impractical to expand for a large matrix; here iterative techniques help: see [chapter not yet written].11

Zero eigenvalues and singular matrices always travel together. Singular matrices each have at least one zero eigenvalue; nonsingular matrices never do. When λ = 0, (14.20) makes det A = 0, which as we have said is the sign of a singular matrix.

An eigenvalue and its associated eigenvector, taken together, are sometimes called an eigensolution.

14.4 The eigenvector

It is an odd fact that (14.19) and (14.20) reveal the eigenvalues λ of a square matrix A while obscuring the associated eigenvectors v. Once one has calculated an eigenvalue, though, one can feed it back to calculate the associated eigenvector. According to (14.19), the eigenvectors are the n-element vectors for which

    [A − λIn]v = 0,

which is to say that the eigenvectors are the vectors of the kernel space of the degenerate matrix [A − λIn]—which one can calculate (among other ways) by the Gauss-Jordan kernel formula (13.7) or the Gram-Schmidt kernel formula (13.59).

11 The inexpensive [15] also covers the topic competently and readably.
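Feeding an eigenvalue back to find its eigenvector amounts to computing the kernel of the degenerate matrix A − λIn. The sketch below extracts that kernel from the SVD rather than by the text's Gauss-Jordan or Gram-Schmidt kernel formulas (a numerical stand-in, not the book's procedure), continuing the earlier 2 × 2 example:

```python
import numpy as np

# Eigenvector for the known eigenvalue lambda = 2 of the example matrix.
A = np.array([[2., 0.],
              [3., -1.]])
lam = 2.0

M = A - lam * np.eye(2)          # degenerate (singular) by construction
_, s, Vt = np.linalg.svd(M)
v = Vt[-1]                       # right singular vector for the zero
                                 # singular value spans the kernel of M
```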

6. . Were it not so..3.22) where Λ= 2 6 6 6 6 6 4 λ1 0 . every n×n matrix with n distinct eigenvalues) can be diagonalized as A = V ΛV −1 .2 speaks of matrices of the last kind. Section 14. but is not limited to. which in turn is possible only if A1 = A2 . λn−1 0 0 0 . .10.6 Diagonalization Any n × n matrix with n independent eigenvectors (which class per § 14.12 [1 0. nonnegative eigenvalues.1) the matrix’s characteristic polynomial (14. 1 1]T . . This fact proceeds from the last point. 14. . which by deﬁnition would be no eigenvalue if it had no eigenvector to scale.19) necessarily admits at least one nonzero solution v because its matrix A − λIn is degenerate. then v∗ Av = λv∗ v (in which v∗ v naturally is a positive real scalar) would violate the criterion for positive or nonnegative deﬁniteness. • An n × n square matrix whose eigenvectors are linearly independent of one another cannot share all eigensolutions with any other n×n square matrix. (14. ··· ··· 0 0 . • Every n × n square matrix has at least one eigensolution if n > 0. whose double eigenvalue λ = 1 has the single eigenvector [1 0]T . positive eigenvalues. . because according to the fundamental theorem of algebra (6.20) has at least one root. that every n-element vector x is a unique linear combination of independent eigenvectors. See § 13. . • A positive deﬁnite matrix has only real. .6. A nonnegative deﬁnite matrix has only real. DIAGONALIZATION 371 curiously. . and for which (14. Neither of the two proposed matrices A1 and A2 could scale any of the eigenvector components of x diﬀerently than the other matrix did.5 includes. 0 λn 3 7 7 7 7 7 5 12 [22] . 0 0 ··· ··· . so A1 x − A2 x = (A1 − A2 )x = 0 for all x. . 0 0 0 λ2 . an eigenvalue.14.

One might object that we had shown only how to compose some matrix V ΛV −1 with the correct eigenvalues and independent eigenvectors.23) for any diagonalizable matrix A. includes every n × n matrix with n distinct eigenvalues and also includes many matrices with fewer) is called a diagonalizable matrix. The diagonal matrix Λk is nothing more than the diagonal matrix Λ with each element individually raised to the kth power. because § 14. THE EIGENVALUE is an otherwise empty n × n matrix with the eigenvalues of A set along its main diagonal and V = v1 v2 · · · vn−1 vn is an n × n matrix whose columns are the eigenvectors of A. the diagonal decomposition or the diagonalization of the square matrix A. or. Besides factoring a diagonalizable matrix by (14.372 CHAPTER 14. Equation (14.22). we need not show this. then consider that the jth column of the product AV is Avj .55) is trivially diagonalizable as diag{x} = In diag{x}In . one can apply the same formula to compose a diagonalizable matrix with desired eigensolutions. j ij If this seems confusing. from which (14. because the identity AV = V Λ holds.13 The matrix V is invertible because its columns the eigenvectors are independent. An n × n matrix with n independent eigenvectors (which class.5 has already demonstrated that two matrices with the same eigenvalues and independent eigenvectors are in fact the same matrix. The diagonal matrix diag{x} of (11. such that Λk = δij λk . but had failed to show that the matrix was actually A.22) follows. This is so because the identity Avj = vj λj holds for all 1 ≤ j ≤ n.22) is called the eigenvalue decomposition. 13 . expressed more concisely. whereas the jth column of Λ having just the one element acts to scale V ’s jth column only. It is a curious and useful fact that A2 = (V ΛV −1 )(V ΛV −1 ) = V Λ2 V −1 and by extension that Ak = V Λk V −1 (14. whereby the product V ΛV −1 can be nothing other than A. However. again.

The nondiagonalizable matrix vaguely resembles the singular matrix in that both represent edge cases and can be hard to handle numerically.2 will have more to say about the nondiagonalizable matrix. Should the dominant eigenvalue have greater than unit magnitude. tends then to transform v into the associated eigenvector. REMARKS ON THE EIGENVALUE Changing z ← k implies the generalization14 Az = V Λz V −1 . All this is fairly deep mathematics. 14 It may not be clear however according to (5. a nondiagonalizable matrix is a matrix with an n-fold eigenvalue whose corresponding eigenvector space fewer than n eigenvectors fully characterize. then to A2 v. largest in magnitude. thus one can sometimes judge the stability of a physical process indirectly by examining the eigenvalues of the matrix which describes it. . Λz ij 373 = δij λz .24) good for any diagonalizable A and complex z. The idea that a matrix resembles a humble scalar in the right circumstance is powerful.10. 14. The n × n null matrix for example is singular but still diagonalizable. but the resemblance ends there. operating repeatedly on a vector v to change it ﬁrst to Av. Section 14. A3 v and so on. Then there is the edge case of the nondiagonalizable matrix. which matrix surprisingly covers only part of its domain with eigenvectors. twice or more. and a matrix can be either without being the other. it destabilizes the iteration.2 and 14. Remarks continue in §§ 14.12) which branch of λz one should choose j at each index j. It brings an appreciation of the matrix for reasons which were anything but apparent from the outset of Ch.10. gradually but relatively eliminating all other components of v.14. Nondiagonalizable matrices are troublesome and interesting. 11. 
What a nondiagonalizable matrix is in essence is a matrix with a repeated eigensolution: the same eigenvalue with the same eigenvector.7 Remarks on the eigenvalue Eigenvalues and their associated eigenvectors stand among the principal reasons one goes to the considerable trouble to develop matrix theory as we have done in recent chapters. More formally.7. especially if A has negative or complex eigenvalues. j (14.13. The dominant eigenvalue of A. Among the reasons for this is that a matrix can represent an iterative process.
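The diagonalization (14.22) and the power formula (14.23) of § 14.6 can be confirmed numerically, again on the chapter's 2 × 2 example with its distinct eigenvalues:

```python
import numpy as np

# A = V Lambda V^-1 (14.22) and A^k = V Lambda^k V^-1 (14.23).
A = np.array([[2., 0.],
              [3., -1.]])
lam, V = np.linalg.eig(A)          # eigenvalues and eigenvector columns
Lam = np.diag(lam)

A_rebuilt = V @ Lam @ np.linalg.inv(V)
A_cubed = V @ np.diag(lam**3) @ np.linalg.inv(V)   # Lambda^k elementwise
```

Note that `lam**3` raises each diagonal element individually to the third power, exactly as [Λk]ij = δijλjk prescribes.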

then the corresponding eigenvalue of A′ is λmin / |λmax |. Section 14. for instance. Matrix condition so deﬁned turns out to have another useful application. An ill conditioned matrix by deﬁnition16 is a matrix of large κ ≫ 1.3 are singular. THE EIGENVALUE 14. or where the elements of A are known only within some tolerance. the scale factor 1/ |λmax | scales all eigenvalues equally. In terms of the new operator. 16 15 The largest in magnitude of the several eigenvalues of a diagonalizable operator A. no particular edge value of κ. One sometimes ﬁnds it convenient to normalize a dominant eigenvalue by deﬁning a new operator A′ ≡ A/ |λmax |. but in computer ﬂoating point. If zero. then both matrices are ill conditioned. or that one with a wretched κ = 20x18 were well conditioned. but you knew that already. then you might not credit me—but the mathematics nevertheless can only give the number. If I tried to claim that a matrix with a ﬁne κ = 3 were ill conditioned. denoted here λmax . tends to dominate the iteration Ak x.25) κ≡ λmin by which κ≥1 (14. at and beyond which it turns ill conditioned.374 CHAPTER 14. Suppose that a diagonalizable matrix A is precisely known but that the corresponding driving vector b is not. it remains to the mathematician to interpret it. if nearly zero. The applied mathematician handles such a matrix with due skepticism. the iteration becomes Ak x = |λmax |k A′k x. (14. If A(x + δx) = b + δb. κ = 1 would be ideal (it would mean that all eigenvalues had the same magnitude). leaving one free to carry the magnifying eﬀect |λmax |k separately if one prefers to do so. Such considerations lead us to deﬁne the condition of a diagonalizable matrix quantitatively as15 λmax . thus. though in practice quite a broad range of κ is usually acceptable. Could we always work in exact arithmetic. the value of κ might not interest us much as long as it stayed ﬁnite.26) . 
less than which a matrix is well conditioned.8 Matrix condition is always a real number of no less than unit magnitude. For best invertibility. whose own dominant eigenvalue λmax / |λmax | has unit magnitude.7 has named λmax the dominant eigenvalue for this reason. [9] There is of course no deﬁnite boundary. if A’s eigenvalue of smallest magnitude is denoted λmin . inﬁnite κ tends to emerge imprecisely rather as large κ ≫ 1. However. then both matrices according to § 14.
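Both the normalization A′ ≡ A/|λmax| and the definition (14.25) are easy to exercise numerically. The following sketch (NumPy; the particular matrix and all names are illustrative choices, not the book's) computes κ from the eigenvalues and verifies the iteration identity A^k x = |λmax|^k A′^k x:

```python
import numpy as np

# A diagonalizable (here diagonal) matrix with eigenvalues of very
# different magnitude, so that kappa = |lambda_max / lambda_min| >> 1.
A = np.diag([100.0, 4.0, 0.5])

lam = np.linalg.eigvals(A)
lam_max = lam[np.argmax(np.abs(lam))]
lam_min = lam[np.argmin(np.abs(lam))]
kappa = abs(lam_max) / abs(lam_min)
print(kappa)                      # 200.0, per (14.25); kappa >= 1 per (14.26)

# Normalizing by the dominant eigenvalue: A' = A/|lambda_max| has a
# dominant eigenvalue of unit magnitude, and A^k x = |lambda_max|^k A'^k x.
A_prime = A / abs(lam_max)
x = np.array([1.0, 1.0, 1.0])
k = 3
lhs = np.linalg.matrix_power(A, k) @ x
rhs = abs(lam_max) ** k * (np.linalg.matrix_power(A_prime, k) @ x)
print(np.allclose(lhs, rhs))      # True
```

The same κ would emerge from any similarity transformation of this A, since, as the chapter shows, similarity preserves eigenvalues.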

If A(x + δx) = b + δb, where δb is the error in b and δx is the resultant error in x, then one should like to bound the ratio |δx|/|x| to ascertain the reliability of x as a solution. Transferring A to the equation's right side,

    x + δx = A⁻¹(b + δb).

Subtracting x = A⁻¹b and taking the magnitude,

    |δx| = |A⁻¹ δb|.

Dividing by |x| = |A⁻¹ b|,

    |δx|/|x| = |A⁻¹ δb| / |A⁻¹ b|.

The quantity |A⁻¹ δb| cannot exceed |λmin⁻¹ δb|. The quantity |A⁻¹ b| cannot fall short of |λmax⁻¹ b|. Thus, according to (14.25),

    |δx|/|x| ≤ |λmin⁻¹ δb| / |λmax⁻¹ b| = |λmax/λmin| (|δb|/|b|).

That is,

    |δx|/|x| ≤ κ |δb|/|b|.    (14.27)

Condition, incidentally, might technically be said to apply to scalars as well as to matrices; the condition of every nonzero scalar is happily κ = 1, but ill condition remains a property of matrices alone. Such semantical distinctions seem a little too fine for applied use, though.

14.9 The similarity transformation

Any collection of vectors assembled into a matrix can serve as a basis by which other vectors can be expressed. For example, if the columns of

    B = [ 1  −1 ]
        [ 0   2 ]
        [ 0   1 ]

are regarded as a basis, then the vector

    B [ 5 ] = 5 [ 1 ] + 1 [ −1 ] = [ 4 ]
      [ 1 ]     [ 0 ]     [  2 ]   [ 2 ]
                [ 0 ]     [  1 ]   [ 1 ]

is (5, 1) in the basis B: five times the first basis vector plus once the second. The basis provides the units from which other vectors can be built. The reader may need to ponder the basis concept a while to grasp it, but the concept is simple once grasped and little purpose would be served by dwelling on it here. Basically, the idea is that one can build the same vector from alternate building blocks, not only from the standard building blocks e1, e2, e3, etc., except that the right word for the relevant "building block" is basis vector. The books [23] and [31] introduce the basis more gently; one might consult one of those if needed.

Particularly interesting is the n×n, invertible complete basis B, in which the n basis vectors are independent and address the same full space the columns of In address. If x = Bu, then u represents x in the basis B. Left-multiplication by B evidently converts out of the basis; left-multiplication by B⁻¹ then does the reverse, converting into the basis. One can therefore convert any operator A to work within a complete basis B by the successive steps

    Ax = b,
    ABu = b,
    [B⁻¹AB]u = B⁻¹b,    (14.28)

by which the operator B⁻¹AB is seen to be the operator A, only transformed to work within the basis¹⁸ B.

The conversion from A into B⁻¹AB is called a similarity transformation. If B happens to be unitary (§ 13.11), then the conversion is also called a unitary transformation. The matrix B⁻¹AB the transformation produces is said to be similar (or, if B is unitary, unitarily similar) to the matrix A. We have already met the similarity transformation in §§ 11.5 and 12.2; now we have the theory to appreciate it properly.

Probably the most important property of the similarity transformation is that it alters no eigenvalues. That is, if Ax = λx, then, by successive steps,

    B⁻¹A(BB⁻¹)x = λB⁻¹x,
    [B⁻¹AB]u = λu,  u = B⁻¹x.

18. The professional matrix literature sometimes distinguishes by typeface between the matrix B and the basis B its columns represent. This book just uses B.
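The basis conversion (14.28) and the eigenvalue-preserving property can both be watched at work. A NumPy sketch (the matrices here are illustrative choices, not the book's):

```python
import numpy as np

A = np.array([[2.0, 1.0, 0.0],     # an operator with eigenvalues 2, 3, 5
              [0.0, 3.0, 1.0],
              [0.0, 0.0, 5.0]])
B = np.array([[1.0, 1.0, 0.0],     # an invertible basis: columns are basis vectors
              [0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0]])

u = np.array([5.0, 1.0, 0.0])
x = B @ u                           # x = Bu: B converts out of the basis

Bi = np.linalg.inv(B)
A_sim = Bi @ A @ B                  # the operator within the basis, per (14.28)

# Acting with A on x, then converting into the basis, matches acting
# with B^{-1}AB directly on the basis representation u:
print(np.allclose(Bi @ (A @ x), A_sim @ u))   # True

# A similarity transformation alters no eigenvalues:
print(np.allclose(np.sort_complex(np.linalg.eigvals(A)),
                  np.sort_complex(np.linalg.eigvals(A_sim))))   # True
```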

14.10 The Schur decomposition

The Schur decomposition of an arbitrary, square, n×n matrix A is

    A = Q US Q∗,    (14.29)

where Q is an n×n unitary matrix whose inverse, as for any unitary matrix (§ 13.11), is Q⁻¹ = Q∗; and where US is a general upper triangular matrix which can have any values (even zeros) along its main diagonal. The Schur decomposition is slightly obscure, is somewhat tedious to derive and is of limited use in itself, but serves a theoretical purpose.¹⁹ We derive it here for this reason.

19. The alternative is to develop the interesting but difficult Jordan canonical form, which for brevity's sake this chapter prefers to omit.

14.10.1 Derivation

Suppose that²⁰ (for some reason, which will shortly grow clear) we have an n×n matrix B whose first i columns are already upper triangular; that is, in block form,

    B = [ T  X  ]
        [ 0  Bo ],    (14.30)

where T is an i×i upper triangular submatrix, where the (n−i)×i block beneath it is null, and where the blocks X and Bo are unrestricted; the ith row and ith column sit at the block boundary. Suppose further that we wish to transform B not only similarly but unitarily into

    C ≡ W∗BW,    (14.31)

where W is an n×n unitary matrix and where C is to fit the same form as (14.30) but with its first i+1 columns upper triangular. We do not mind if any or all of the unrestricted elements change in going from B to C, but we require zeros in the indicated spots. The question remains as to whether a unitary W exists that satisfies the form and whether for general B we can discover a way to calculate it.

Let Bo and Co represent the (n−i)×(n−i) submatrices in the lower right corners respectively of B and C, such that

    Bo ≡ In−i H−i B Hi In−i,
    Co ≡ In−i H−i C Hi In−i,    (14.32)

where Hk is the shift operator of § 11.9. To narrow the search for W (a search made a little easier because we need not find every W that satisfies the form but only one such W), let us look first for a W that fits the restricted template

    W = Ii + Hi Wo H−i = [ Ii  0  ]
                         [ 0   Wo ],    (14.33)

which contains a smaller, (n−i)×(n−i) unitary submatrix Wo in its lower right corner and resembles In elsewhere.

Beginning from (14.31), we have by successive, reversible steps that

    C = W∗BW
      = (Ii + Hi Wo∗ H−i)(B)(Ii + Hi Wo H−i)
      = Ii B Ii + Ii B Hi Wo H−i + Hi Wo∗ H−i B Ii + Hi Wo∗ H−i B Hi Wo H−i.

The unitary submatrix Wo has only n−i columns and n−i rows, so In−i Wo = Wo = Wo In−i. Thus,

    C = Ii[B]Ii + Ii[B Hi Wo H−i](In − Ii)
        + (In − Ii)[Hi Wo∗ H−i B]Ii + (In − Ii)[Hi Wo∗ Bo Wo H−i](In − Ii),

where the last step has used (14.32) and the identity (11.76). The four terms on the equation's right, each term with rows and columns neatly truncated, represent the four quarters of C ≡ W∗BW: upper left, upper right, lower left and lower right, respectively. The lower left term is null because

    (In − Ii)[Hi Wo∗ H−i B]Ii = (In − Ii)[Hi Wo∗ In−i H−i B Ii]Ii
                              = (In − Ii)[Hi Wo∗ H−i][(In − Ii)B Ii]Ii
                              = (In − Ii)[Hi Wo∗ H−i][0]Ii = 0,

leaving

    C = Ii[B]Ii + Ii[B Hi Wo H−i](In − Ii) + (In − Ii)[Hi Wo∗ Bo Wo H−i](In − Ii).    (14.34)

But the upper left term makes the upper left areas of B and C the same, and the upper right term does not bother us because we have not restricted the content of C's upper right area. That leaves the lower right. Apparently, any (n−i)×(n−i) unitary submatrix Wo whatsoever obeys (14.31) in the lower left, upper left and upper right. Left- and right-multiplying (14.31) by the truncator (In − Ii) to focus solely on the lower right area, we have the reduced requirement that

    (In − Ii)C(In − Ii) = (In − Ii)W∗BW(In − Ii).

20. This subsection assigns various capital Roman letters to represent the several matrices and submatrices it manipulates. Its choice of letters except in (14.29) is not standard and carries no meaning elsewhere. The writer had to choose some letters and these are ones he chose. The Roman alphabet provides only twenty-six capitals, of which this subsection uses too many to be allowed to reserve any. This footnote mentions the fact because good mathematical style avoids assigning letters that already bear a conventional meaning in a related context (for example, this book avoids writing Ax = b as Te = i, not because the latter is wrong but because it would be extremely confusing).

Further left-multiplying by H−i, right-multiplying by Hi, and applying the identity (11.76) yields

    In−i H−i C Hi In−i = In−i H−i W∗BW Hi In−i;

or, substituting from (14.32),

    Co = In−i H−i W∗BW Hi In−i.

Expanding W per (14.33),

    Co = In−i H−i (Ii + Hi Wo∗ H−i) B (Ii + Hi Wo H−i) Hi In−i;

or, since In−i H−i Ii = 0 = Ii Hi In−i,

    Co = In−i H−i (Hi Wo∗ H−i) B (Hi Wo H−i) Hi In−i
       = In−i Wo∗ H−i B Hi Wo In−i
       = Wo∗ In−i H−i B Hi In−i Wo.

Per (14.32), this is

    Co = Wo∗ Bo Wo.    (14.35)

The steps from (14.34) to (14.35) are reversible, so the latter is as good a way to state the reduced requirement as the former is. To achieve a unitary transformation of the form (14.31), it therefore suffices to satisfy (14.35).

The increasingly well-stocked armory of matrix theory we now have to draw from makes satisfying (14.35) possible as follows. Observe per § 14.5 that every square matrix has at least one eigensolution. Let (λo, vo) represent an eigensolution of Bo (any eigensolution of Bo), with vo normalized to unit magnitude. Form the broad, (n−i)×(n−i+1) matrix

    F ≡ [ vo  e1  e2  e3  ···  e_(n−i) ].

Decompose F by the Gram-Schmidt technique of § 13.10.2, choosing p = 1 during the first instance of the algorithm's step 3 (though choosing any permissible p thereafter), to obtain

    F = QF RF.

Noting that the Gram-Schmidt algorithm orthogonalizes only rightward, observe that the first column of the (n−i)×(n−i) unitary matrix QF remains simply the first column of F, which is the unit eigenvector vo:

    [QF]∗1 = QF e1 = vo.

Transform Bo unitarily by QF to define the new matrix

    G ≡ QF∗ Bo QF,

then transfer factors to reach the equation

    QF G QF∗ = Bo.

Right-multiplying by QF e1 = vo and noting that Bo vo = λo vo, observe that

    QF G e1 = λo vo.

Left-multiplying by QF∗,

    G e1 = λo QF∗ vo.

Noting that the Gram-Schmidt process has rendered orthogonal to vo all columns of QF but the first, which is vo, observe that

    G e1 = λo QF∗ vo = λo e1,

which means that G's first column is λo atop zeros, so that G fits the very form the submatrix Co is required to have. Conclude therefore that

    Wo = QF,
    Co = G,    (14.36)

together constitute a valid choice for Wo and Co, satisfying the reduced requirement (14.35) and thus also the original requirement (14.31).

Equation (14.36) completes a failsafe technique to transform unitarily any square matrix B of the form (14.30) into a square matrix C of the form (14.31). Naturally, the technique can be applied recursively as

    B|i=i′ = C|i=i′−1,  1 ≤ i′ ≤ n,    (14.37)

because the form (14.30) of B at i = i′ is nothing other than the form (14.31) of C at i = i′ − 1. Therefore, if we let

    B|i=0 = A,    (14.38)

then it follows by induction that

    B|i=n = US,    (14.40)

where per (14.30) the matrix US has the general upper triangular form the Schur decomposition (14.29) requires. Moreover, because the product of unitary matrices according to (13.62) is itself a unitary matrix, we have that

    Q = Π_(i′=0)^(n−1) (W|i=i′),    (14.39)

which along with (14.40) accomplishes the Schur decomposition.

14.10.2 The nondiagonalizable matrix

The characteristic equation (14.20) of the general upper triangular matrix US is

    det(US − λIn) = 0.

Unlike most determinants, this determinant brings only the one term

    det(US − λIn) = Π_(i=1)^(n) (uSii − λ) = 0,

whose factors run straight down the main diagonal; the determinant's n! − 1 other terms are all zero because each of them includes at least one zero factor from below the main diagonal.²¹ Hence no element above the main diagonal of US even influences the eigenvalues, which apparently are

    λi = uSii.    (14.41)

21. The determinant's definition in § 14.1 makes the following two propositions equivalent: (i) that a determinant's term which includes one or more factors above the main diagonal also includes one or more factors below; (ii) that the only permutor that marks no position below the main diagonal is the one which also marks no position above. In either form, the proposition's truth might seem less than obvious until viewed from the proper angle. Consider a permutor P. If P marked no position below the main diagonal, then it would necessarily have pnn = 1, else the permutor's bottom row would be empty, which is not allowed. In the next-to-bottom row, p(n−1)(n−1) = 1, because the nth column is already occupied. In the next row up, p(n−2)(n−2) = 1, and so on; thus affirming the proposition.
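The derivation above is constructive, so it can be exercised directly. The following NumPy sketch (illustrative, not the book's own development; it leans on numpy's eig and qr where the text calls for an eigensolution and for Gram-Schmidt) implements the recursion: at each step i it takes any eigensolution (λo, vo) of the lower-right block Bo, builds a unitary QF whose first column spans vo, embeds it per (14.33), and applies C = W∗BW:

```python
import numpy as np

def schur_by_deflation(A):
    """Unitarily triangularize A = Q US Q*, following the chapter's
    recursion (14.36)-(14.40)."""
    n = A.shape[0]
    B = A.astype(complex)
    Q = np.eye(n, dtype=complex)
    for i in range(n - 1):
        Bo = B[i:, i:]                      # the (n-i) x (n-i) submatrix Bo
        lam, vecs = np.linalg.eig(Bo)
        vo = vecs[:, 0] / np.linalg.norm(vecs[:, 0])
        # F = [vo e1 e2 ...]; QR orthogonalizes rightward, so QF's first
        # column spans vo (up to a unit phase, which cancels below).
        F = np.column_stack([vo, np.eye(n - i)])
        QF, _ = np.linalg.qr(F)
        W = np.eye(n, dtype=complex)
        W[i:, i:] = QF                      # W = Ii + Hi Wo H-i, with Wo = QF
        B = W.conj().T @ B @ W              # C = W* B W: one more zeroed column
        Q = Q @ W
    return Q, B                             # A = Q B Q*, B upper triangular

A = np.array([[4.0, 1.0, 2.0],
              [0.5, 3.0, 1.0],
              [1.0, 0.0, 2.0]])
Q, US = schur_by_deflation(A)
print(np.allclose(Q @ US @ Q.conj().T, A))            # True: A = Q US Q*
print(np.allclose(np.tril(US, -1), 0))                # True: US triangular
print(np.allclose(np.sort_complex(np.linalg.eigvals(A)),
                  np.sort_complex(np.diag(US))))      # True, per (14.41)
```

The sketch also confirms the section's theoretical point: the eigenvalues of A appear along the main diagonal of US.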

The Schur decomposition (14.29) is in fact a similarity transformation; and, as we have seen, similarity transformations preserve eigenvalues. If therefore A = Q US Q∗, then the eigenvalues of A are just the values along the main diagonal of US.²²

One might think that the Schur decomposition offered an easy way to calculate eigenvalues, but it is less easy than it first appears, because one must calculate eigenvalues to reach the Schur decomposition in the first place. Whatever practical merit the Schur decomposition might have or lack, however, it brings at least the theoretical benefit of (14.41): every square matrix without exception has a Schur decomposition, whose triangular factor US openly lists all eigenvalues along its main diagonal.

This theoretical benefit pays when some of the n eigenvalues of an n×n square matrix A repeat. By the Schur decomposition, one can construct a second square matrix A′, as near as desired to A but having n distinct eigenvalues, simply by perturbing the main diagonal of US to²³

    US′ ≡ US + ε diag{u},  ui′ ≠ ui where λi′ = λi (i′ ≠ i),    (14.42)

where |ε| ≪ 1 and where u is an arbitrary vector that meets the criterion given. Though infinitesimally near A, the modified matrix A′ = Q US′ Q∗ unlike A has n (maybe infinitesimally) distinct eigenvalues. With sufficient toil, one can analyze such perturbed eigenvalues and their associated eigenvectors similarly as § 9.6.2 has analyzed perturbed poles.

22. An unusually careful reader might worry that A and US had the same eigenvalues with different multiplicities. It would be surprising if it actually were so; still, one would like to give a sounder reason than the participle "surprising." Consider however that

    A − λIn = Q US Q∗ − λIn = Q[US − Q∗(λIn)Q]Q∗ = Q[US − λ(Q∗ In Q)]Q∗ = Q[US − λIn]Q∗.

According to (14.11) and (14.13), this equation's determinant is

    det[A − λIn] = det{Q[US − λIn]Q∗} = det Q det[US − λIn] det Q∗ = det[US − λIn],

which says that A and US have not only the same eigenvalues but also the same characteristic polynomials, thus further the same eigenvalue multiplicities.
23. Equation (11.55) defines the diag{·} notation.
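The trick of (14.42) can be watched at work on a small nondiagonalizable example. In this NumPy sketch (the particular US, Q, ε and u are illustrative choices) the unperturbed matrix has a repeated eigenvalue with only one eigenvector, while the perturbed matrix has two distinct eigenvalues and two independent eigenvectors:

```python
import numpy as np

# US is upper triangular with a double eigenvalue 5 but only one
# independent eigenvector for it: a nondiagonalizable triangular factor.
US = np.array([[5.0, 1.0],
               [0.0, 5.0]])
Q = np.eye(2)                   # any unitary Q; identity keeps the demo simple
A = Q @ US @ Q.T

# Perturb the main diagonal per (14.42), with distinct u_i where the
# eigenvalues repeat:
eps = 1e-6
u = np.array([1.0, 2.0])
US_p = US + eps * np.diag(u)
A_p = Q @ US_p @ Q.T

print(np.linalg.eigvals(A))     # [5. 5.]: repeated
lam_p = np.linalg.eigvals(A_p)
print(lam_p)                    # two (infinitesimally) distinct eigenvalues

# A is defective, but A_p's eigenvectors are independent:
_, V = np.linalg.eig(A_p)
print(abs(np.linalg.det(V)) > 1e-9)   # True
```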

Equation (14.42) brings us to the nondiagonalizable matrix of the subsection's title. Section 14.6 and its diagonalization formula (14.22) diagonalize any matrix with distinct eigenvalues and even any matrix with repeated eigenvalues but distinct eigenvectors, but fail where eigenvectors repeat. Equation (14.42) separates eigenvalues, thus also eigenvectors (for according to § 14.5 eigenvectors of distinct eigenvalues never depend on one another), permitting a nonunique but still sometimes usable form of diagonalization in the limit ε → 0 even when the matrix in question is strictly nondiagonalizable.

The finding that every matrix is arbitrarily nearly diagonalizable illuminates a question the chapter has evaded up to the present point. The question: does a p-fold root in the characteristic polynomial (14.20) necessarily imply a p-fold eigenvalue in the corresponding matrix? The existence of the nondiagonalizable matrix casts a shadow of doubt until one realizes that every nondiagonalizable matrix is arbitrarily nearly diagonalizable and, better, is arbitrarily nearly diagonalizable with distinct eigenvalues. If you claim that a matrix has a triple eigenvalue and someone disputes the claim, then you can show him a nearly identical matrix with three infinitesimally distinct eigenvalues. That is the essence of the idea. We shall leave the answer in that form.

Generalizing the nondiagonalizability concept leads one eventually to the ideas of the generalized eigenvector²⁴ (which solves the higher-order linear system [A − λI]^k v = 0) and the Jordan canonical form,²⁵ which together roughly track the sophisticated conventional pole-separation technique of § 9.6.5. Then there is a kind of sloppy Schur form called a Hessenberg form, which allows content in US along one or more subdiagonals just beneath the main diagonal. One could profitably propose and prove any number of useful theorems concerning the nondiagonalizable matrix and its generalized eigenvectors, or concerning the eigenvalue problem²⁶ more broadly, in more and less rigorous ways; but for the time being we shall let the matter rest there.

24. [17, Ch. 7]
25. [15, Ch. 5]
26. [53]

14.11 The Hermitian matrix

An m×m square matrix A that is its own adjoint,

    A∗ = A,    (14.43)

is called a Hermitian or self-adjoint matrix.

Properties of the Hermitian matrix include that

• its eigenvalues are real;
• its eigenvectors corresponding to distinct eigenvalues lie orthogonal to one another; and
• it is unitarily diagonalizable (§§ 13.11 and 14.6) such that

    A = V ΛV ∗.    (14.44)

That the eigenvalues are real is proved by letting (λ, v) represent an eigensolution of A and constructing the product v∗Av, for which

    λ∗ v∗v = (Av)∗v = v∗Av = v∗(Av) = λ v∗v.

That is, λ∗ = λ, which naturally is possible only if λ is real.

That eigenvectors corresponding to distinct eigenvalues lie orthogonal to one another is proved²⁷ by letting (λ1, v1) and (λ2, v2) represent eigensolutions of A and constructing the product v2∗Av1, for which

    λ2∗ v2∗v1 = (Av2)∗v1 = v2∗Av1 = v2∗(Av1) = λ1 v2∗v1.

That is,

    λ2∗ = λ1 or v2∗v1 = 0.

But according to the last paragraph all eigenvalues are real; the eigenvalues λ1 and λ2 are no exceptions. Hence,

    λ2 = λ1 or v2∗v1 = 0.

To prove the last hypothesis of the three needs first some definitions, as follows. Given an m×m matrix A, let the s columns of the m×s matrix Vo represent the s independent eigenvectors of A, such that (i) each column has unit magnitude and (ii) columns whose eigenvectors share the same eigenvalue lie orthogonal to one another. Let the s×s diagonal matrix Λo carry the eigenvalues on its main diagonal, such that

    A Vo = Vo Λo,

where the distinction between the matrix Λo and the full eigenvalue matrix Λ of (14.22) is that the latter always includes a p-fold eigenvalue p times, whereas the former includes a p-fold eigenvalue only as many times as the eigenvalue enjoys independent eigenvectors. Let the m−s columns of the m×(m−s) matrix Vo⊥ represent the complete orthogonal complement (§ 13.9) to Vo, perpendicular to all eigenvectors, each column of unit magnitude, such that

    Vo⊥∗ Vo = 0  and  Vo⊥∗ Vo⊥ = Im−s.

Recall from § 14.5 that s ≠ 0 but rather 0 < s ≤ m, because every square matrix has at least one eigensolution; and recall from § 14.6 that s = m if and only if A is diagonalizable.²⁸

With these definitions in hand, we can now prove by contradiction that all Hermitian matrices are diagonalizable, falsely supposing a nondiagonalizable Hermitian matrix A, whose Vo⊥ (since A is supposed to be nondiagonalizable, implying that s < m) would have at least one column. For such a matrix A, s×(m−s) and (m−s)×(m−s) auxiliary matrices F and G necessarily would exist such that

    A Vo⊥ = Vo F + Vo⊥ G,

not due to any unusual property of the product AVo⊥ but for the mundane reason that the columns of Vo and Vo⊥ together by definition addressed the space of all m-element vectors, including the columns of AVo⊥.

27. [31, § 8.1]
28. A concrete example: the invertible but nondiagonalizable matrix

    A = [ −1   0    0     0  ]
        [ −6   5   5/2  −5/2 ]
        [  0   0    5     0  ]
        [  0   0    0     5  ]

has a single eigenvalue at λ = −1 and a triple eigenvalue at λ = 5, the latter of whose eigenvector space is fully characterized by two eigenvectors rather than three, such that

    Vo = [ 1/√2   0    0   ]      Λo = [ −1  0  0 ]      Vo⊥ = [   0   ]
         [ 1/√2   1    0   ]           [  0  5  0 ]            [   0   ]
         [  0     0  1/√2  ]           [  0  0  5 ]            [  1/√2 ]
         [  0     0  1/√2  ]                                   [ −1/√2 ]

In the example, m = 4 and s = 3. All vectors in the example are reported with unit magnitude; the two λ = 5 eigenvectors are reported in mutually orthogonal form. The orthogonal complement Vo⊥ supplies the missing vector, not an eigenvector but perpendicular to them all. Notice that eigenvectors corresponding to distinct eigenvalues need not be orthogonal when A is not Hermitian.
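The footnote's concrete example can be verified numerically. This NumPy sketch (illustrative only; tolerances are loosened because the triple eigenvalue is numerically delicate) confirms the eigenvalues −1 and the triple 5, that λ = 5 enjoys only two independent eigenvectors, and that the footnote's Vo⊥ is perpendicular to every eigenvector:

```python
import numpy as np

A = np.array([[-1.0, 0.0, 0.0,  0.0],
              [-6.0, 5.0, 2.5, -2.5],
              [ 0.0, 0.0, 5.0,  0.0],
              [ 0.0, 0.0, 0.0,  5.0]])

lam, V = np.linalg.eig(A)
print(np.allclose(np.sort_complex(lam), [-1, 5, 5, 5], atol=1e-5))  # True

# lambda = 5 enjoys only two independent eigenvectors: the null space of
# (A - 5I) has dimension 4 - rank(A - 5I) = 2, so s = 3 < m = 4.
print(4 - np.linalg.matrix_rank(A - 5.0 * np.eye(4)))   # 2

# The orthogonal-complement vector is perpendicular to all eigenvectors:
v_perp = np.array([0.0, 0.0, 1.0, -1.0]) / np.sqrt(2.0)
print(np.allclose(V.conj().T @ v_perp, 0, atol=1e-5))   # True
```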

Left-multiplying by Vo∗, we would have by successive steps that

    Vo∗ A Vo⊥ = Vo∗ Vo F + Vo∗ Vo⊥ G,
    (A Vo)∗ Vo⊥ = Is F + Vo∗ Vo⊥ G,
    (Vo Λo)∗ Vo⊥ = F + Vo∗ Vo⊥ G,
    Λo∗ Vo∗ Vo⊥ = F + Vo∗ Vo⊥ G,
    Λo∗ (0) = F + (0) G,
    0 = F,

where we had relied on the assumption that A were Hermitian and thus that, as proved above, its distinctly eigenvalued eigenvectors lay orthogonal to one another; in consequence of which A∗ = A and Vo∗ Vo = Is.

The finding that F = 0 reduces the AVo⊥ equation above to read

    A Vo⊥ = Vo⊥ G.

In the reduced equation the matrix G would have at least one eigensolution, not due to any unusual property of G but because according to § 14.5 every square matrix, 1×1 or larger, has at least one eigensolution. Let (µ, w) represent an eigensolution of G. Right-multiplying by the (m−s)-element vector w ≠ 0, we would have by successive steps that

    A Vo⊥ w = Vo⊥ G w,
    A (Vo⊥ w) = µ (Vo⊥ w).

The last equation claims that (µ, Vo⊥ w) were an eigensolution of A, when we had supposed that all of A's eigenvectors lay in the space addressed by the columns of Vo, thus by construction did not lie in the space addressed by the columns of Vo⊥. The contradiction proves false the assumption that gave rise to it. The assumption: that a nondiagonalizable Hermitian A existed. We conclude that all Hermitian matrices are diagonalizable, and conclude further that they are unitarily diagonalizable on the ground that their eigenvectors lie orthogonal to one another, as was to be demonstrated.

Having proven that all Hermitian matrices are diagonalizable and have real eigenvalues and orthogonal eigenvectors, one wonders whether the converse holds: are all diagonalizable matrices with real eigenvalues and orthogonal eigenvectors Hermitian? To show that they are, one can construct the matrix described by the diagonalization formula (14.22),

    A = V ΛV ∗,

where V⁻¹ = V ∗ because this V is unitary (§ 13.11). The equation's adjoint is

    A∗ = V Λ∗V ∗.

But all the eigenvalues here are real, which means that Λ∗ = Λ and the right sides of the two equations are the same; that is, A∗ = A, as was to be demonstrated. All diagonalizable matrices with real eigenvalues and orthogonal eigenvectors are Hermitian.

This section brings properties that greatly simplify many kinds of matrix analysis. The properties demand a Hermitian matrix, which might seem a severe and unfortunate restriction, except that one can left-multiply any exactly determined linear system Cx = d by C∗ to get the equivalent Hermitian system

    [C∗C] x = [C∗ d],    (14.45)

in which A = C∗C and b = C∗d, for which the properties obtain.²⁹

29. The device (14.45) worsens a matrix's condition and may be undesirable for this reason, but it works in theory at least.

14.12 The singular value decomposition

Occasionally an elegant idea awaits discovery, almost in plain sight, overlooked. If the unlikely thought occurred to you to take the square root of a matrix, then the following idea is one you might discover.³⁰

Consider the n×n product A∗A of a tall or square, m×n matrix A of full column rank

    r = n ≤ m

and its adjoint A∗. The product A∗A is invertible and positive definite according to § 13.6; and, since (A∗A)∗ = A∗A, is clearly Hermitian according to § 14.11; thus is unitarily diagonalizable according to (14.44) as

    A∗A = V ΛV ∗.    (14.46)

Here, the n×n matrices Λ and V represent respectively the eigenvalues and eigenvectors not of A but of the product A∗A. Though nothing requires the product's eigenvectors to be real, the fact that the product is positive definite does require all of its eigenvalues to be real and moreover positive, which means among other things that the eigenvalue matrix Λ has full rank.

30. [52, "Singular value decomposition," 14:29, 18 Oct. 2007]
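The device (14.45) and the Hermitian properties it buys are easy to check numerically. A NumPy sketch (C and d are arbitrary illustrative choices):

```python
import numpy as np

# An exactly determined, non-Hermitian system C x = d, left-multiplied by
# C*, becomes the equivalent Hermitian system [C*C] x = [C*d] of (14.45).
C = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 3.0],
              [4.0, 0.0, 1.0]])
d = np.array([1.0, 2.0, 3.0])
A = C.conj().T @ C
b = C.conj().T @ d

print(np.allclose(A, A.conj().T))                 # True: A is Hermitian
print(np.allclose(np.linalg.solve(A, b),
                  np.linalg.solve(C, d)))         # True: same solution x

# The Hermitian properties then obtain: real eigenvalues and the unitary
# diagonalization A = V Lambda V* of (14.44).
lam, V = np.linalg.eigh(A)
print(np.allclose(V @ np.diag(lam) @ V.conj().T, A))   # True
print(np.allclose(V.conj().T @ V, np.eye(3)))          # True

# The price, per the footnote: the condition worsens (here it squares).
print(np.isclose(np.linalg.cond(A), np.linalg.cond(C) ** 2))  # True
```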

That the eigenvalues, the diagonal elements of Λ, are real and positive is a useful fact; for, just as a real, positive scalar has a real, positive square root, so equally has Λ a real, positive square root under these conditions. Let the symbol Σ = √Λ represent the n×n real, positive square root of the eigenvalue matrix Λ, such that

    Λ = Σ∗Σ,    (14.47)
    Σ∗ = Σ = diag{ +√λ1, +√λ2, ..., +√λ(n−1), +√λn },

where the singular values of A populate Σ's diagonal. Applying (14.47) to (14.46) then yields

    A∗A = V Σ∗ΣV ∗,
    V ∗A∗AV = Σ∗Σ.    (14.48)

Now consider the m×m matrix U such that

    A V Σ⁻¹ = U In,
    A V = U Σ,
    A = U ΣV ∗.    (14.49)

Substituting (14.49)'s second line into (14.48)'s second line gives the equation

    Σ∗ U∗U Σ = Σ∗Σ;

but ΣΣ⁻¹ = In, so left- and right-multiplying respectively by Σ⁻∗ and Σ⁻¹ leaves that

    In U∗U In = In,

which says neither more nor less than that the first n columns of U are orthonormal. Equation (14.49) does not constrain the last m−n columns of U, leaving us free to make them anything we want. Why not use Gram-Schmidt to make them orthonormal, too, thus making U a unitary matrix? If we do this, then the surprisingly simple (14.49) constitutes the singular value decomposition of A. Apparently every full-rank matrix has a singular value decomposition. If A happens to have broad shape, then we can decompose A∗ instead, so this case poses no special trouble.

But what of the matrix of less than full rank r < n? In this case the product A∗A is singular and has only s < n nonzero eigenvalues (it may be that s = r, but this is irrelevant to the proof at hand). However, if the s nonzero eigenvalues are arranged first in Λ, then (14.49) becomes

    A V Σ⁻¹ = U Is,
    A V = U Σ,
    A = U ΣV ∗.    (14.50)

The product A∗A is nonnegative definite in this case and ΣΣ⁻¹ = Is, but the reasoning is otherwise the same as before. Apparently every matrix of less than full rank has a singular value decomposition, too.

If A happens to be an invertible square matrix, then the singular value decomposition evidently inverts it as

    A⁻¹ = V Σ⁻¹U ∗.    (14.51)

14.13 General remarks on the matrix

Chapters 11 through 14 have derived the uncomfortably bulky but, incredibly, approximately minimal knot of theory one needs to grasp the matrix properly and to use it with moderate versatility. As far as the writer knows, no one has yet discovered a satisfactory way to untangle the knot. The choice to learn the basic theory of the matrix is almost an all-or-nothing choice; and how many scientists and engineers would rightly choose the "nothing" if the matrix did not serve so very many applications as it does? Since it does serve so very many, the "all" it must be.³¹ Applied mathematics brings nothing else quite like it.

These several matrix chapters have not covered every topic they might. The topics they omit fall roughly into two classes. One is the class of more advanced and more specialized matrix theory, about which we shall have more to say in a moment. The other is the class of basic matrix theory these chapters do not happen to use. This book has chosen a way that needs some tools like truncators other books omit, but does not need other tools like projectors that other books³² include.

31. Of course, one might avoid true understanding and instead work by memorized rules. That is not always a bad plan, really; but if that were your plan, then it seems spectacularly unlikely that you would be reading a footnote buried beneath the further regions of the hinterland of Chapter 14 in such a book as this.
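A numerical check of the decomposition (NumPy's svd; the tall matrix A below is an illustrative choice), verifying (14.49), the unitarity of U, and the relation of the singular values to the eigenvalues of A∗A per (14.46) and (14.47):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])      # tall, m = 3 > n = 2, full column rank

U, s, Vh = np.linalg.svd(A, full_matrices=True)
Sigma = np.zeros((3, 2))
Sigma[:2, :2] = np.diag(s)      # singular values populate Sigma's diagonal

print(np.allclose(U @ Sigma @ Vh, A))           # True: A = U Sigma V*, (14.49)
print(np.allclose(U.conj().T @ U, np.eye(3)))   # True: U unitary

# The singular values are the positive square roots of the eigenvalues
# of A*A:
lam = np.linalg.eigvalsh(A.conj().T @ A)
print(np.allclose(np.sort(s ** 2), np.sort(lam)))   # True
```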
The essential agents of matrix analysis (multiplicative associativity, rank, inversion, pseudoinversion, the kernel, the orthogonal complement, orthonormalization, the eigenvalue, diagonalization and so on) are the same in practically all books on the subject, but the way the agents are developed differs.
Tools like the projector, not used here, tend to be omitted or deferred to later chapters, not because they are altogether useless but because they are not used here and because the present chapters are already too long. In the end, it is the agents that matter; what has given these chapters their hefty bulk is not so much the immediate development of the essential agents as the preparatory development of theoretical tools used to construct the essential agents, yet most of the tools are of limited interest in themselves. The reader who understands the Moore-Penrose pseudoinverse and/or the Gram-Schmidt process reasonably well can after all pretty easily figure out how to construct a projector without explicit instructions thereto, should the need arise.³³

The theory develops endlessly. From the present pause one could proceed directly to such topics; however, since this is the first proper pause these several matrix chapters have afforded, maybe we ought to take advantage of it to change the subject, since the book is Derivations of Applied Mathematics rather than Derivations of Applied Matrices. More advanced and more specialized matrix theory, though often harder, tends to come in smaller, more manageable increments: the Cholesky decomposition, for instance, or the conjugate-gradient algorithm.

32. Such as [23, § 3.VI], a lengthy but well-knit tutorial this writer recommends.
33. Well, since we have brought it up (though only as an example of tools these chapters have avoided bringing up), briefly: a projector is a matrix that flattens an arbitrary vector b into its nearest shadow b̃ within some restricted subspace. If the columns of A represent the subspace, then x represents b̃ in the subspace basis iff Ax = b̃, which is to say that Ax ≈ b, whereupon x = A†b. That is, per (13.32),

    b̃ = Ax = AA†b = [BC][C∗(CC∗)⁻¹(B∗B)⁻¹B∗]b = B(B∗B)⁻¹B∗b,

in which the matrix B(B∗B)⁻¹B∗ is the projector. Thence it is readily shown that the deviation b − b̃ lies orthogonal to the shadow b̃. More broadly defined, any matrix M for which M² = M is a projector. One can approach the projector in other ways, but there are two ways at least.
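Since the footnote has sketched the projector B(B∗B)⁻¹B∗, here is the same construction exercised numerically (NumPy; B and b are illustrative choices), confirming the projector property M² = M and the orthogonality of the deviation to the subspace:

```python
import numpy as np

# Columns of B span the subspace; the projector onto it is B(B*B)^{-1}B*.
B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
P = B @ np.linalg.inv(B.conj().T @ B) @ B.conj().T

b = np.array([1.0, 2.0, 7.0])
b_shadow = P @ b                    # the nearest shadow of b in the subspace

print(np.allclose(P @ P, P))                        # True: M^2 = M
print(np.allclose(B.conj().T @ (b - b_shadow), 0))  # True: deviation orthogonal

# The shadow agrees with the least-squares route b~ = A x, x = pseudoinverse(B) b:
x = np.linalg.lstsq(B, b, rcond=None)[0]
print(np.allclose(b_shadow, B @ x))                 # True
```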

CHAPTER 14. THE EIGENVALUE

Chapter 15

Vector analysis

Leaving the matrix, this chapter turns to a curiously underappreciated agent of applied mathematics: the three-dimensional geometrical vector, first met in §§ 3.3, 3.4 and 3.9.

A three-dimensional geometrical vector—or, simply, a vector—consists per § 3.3 of an amplitude of some kind plus a direction. Section 3.9 has told that three scalars called coordinates suffice to specify a vector: these three can be rectangular (x, y, z), cylindrical (ρ, φ, z) or spherical (r, θ, φ) coordinates, as Fig. 15.1 illustrates and Table 3.4 on page 63 interrelates, among other, more exotic possibilities.

Seen from one perspective, the three-dimensional geometrical vector is the n = 3 special case of the general, n-dimensional vector of Chs. 11 through 14. However, because its three elements represent the three dimensions of the physical world, the three-dimensional geometrical vector merits closer attention and special treatment.¹ The vector brings an elegant notation, and this chapter details it.

Applied convention finds the full name "three-dimensional geometrical vector" too bulky for common use, so in this chapter and in other appropriate contexts we shall usually, conventionally call it just a "vector." The short name serves, since the three-dimensional geometrical vector brings all the matrix vector's properties and disrupts none of its operations: it adds semantics to the matrix vector but does not subtract them from it. A name like "matrix vector" can disambiguate the vector of Chs. 11 through 14 if and where required, but the need to disambiguate seems seldom to arise in practice.

¹[7, Ch. 2]

Figure 15.1: A point on a sphere, in spherical (r; θ; φ) and cylindrical (ρ; φ; z) coordinates. (The axis labels bear circumflexes in this figure only to disambiguate the ẑ axis from the cylindrical coordinate z. See also Fig. 15.5.)

The vector notation's principal advantage lies in that it frees a model's geometry from reliance on any particular coordinate system. An example measures this advantage when, for instance, one rotates the coordinates of a surface's aspect coefficient about the y axis. Without the notation, one writes an expression like

    (z − z′) − [∂z′/∂x]x=x′,y=y′ (x − x′) − [∂z′/∂y]x=x′,y=y′ (y − y′)
    ──────────────────────────────────────────────────────────────────
    √{[1 + (∂z′/∂x)² + (∂z′/∂y)²]x=x′,y=y′ [(x − x′)² + (y − y′)² + (z − z′)²]}

for the aspect coefficient relative to a local surface normal (and if the sentence's words do not make sense to you yet, don't worry; just look the symbols over). The same coefficient in standard vector notation is n̂ · Δr̂. With the notation, the rotation is almost trivial. Without the notation, such rotation is a lengthy, error-prone exercise, whose spaghetti-like result is as unsuggestive as it is unwieldy.²

Three-dimensional geometrical vectors naturally are not the only interesting kind. Two-dimensional geometrical vectors arise in practical modeling about as often; they are important, too. The two-dimensional however needs little or no additional special treatment, for it is just the three-dimensional with z = 0 or θ = 2π/4.

This chapter begins with the reorientation of axes in § 15.1 and vector multiplication in § 15.2, after which it develops the calculus of vector fields. Vector addition will already be familiar to the reader from Ch. 3 or (more likely) from earlier work outside this book.

15.1 Reorientation

Matrix notation expresses the rotation of axes (3.5) as

    [x̂′; ŷ′; ẑ′] = [cos ψ, sin ψ, 0; −sin ψ, cos ψ, 0; 0, 0, 1] [x̂; ŷ; ẑ].

In three dimensions however one can do more than just to rotate the x and y axes about the z.
²To prefer the expression without the notation to the expression with it would be as to prefer the pseudo-Roman "DCXCIII/M + CXLVII/(M)(M)" to "ln 2," though for vectors some students seem to insist on preferring the expression without anyway. You probably should not.

In concept, to reorient three axes generally, one first rotates or yaws the x axis through an angle φ toward the y, then tilts or pitches it downward through an angle θ away from the z; finally, one spins or rolls the y and z axes through an angle η about the new x, all the while maintaining the three axes rigidly at right angles to one another. With two rotations to point the x axis in the desired direction plus a third rotation to spin the y and z axes into the desired position about the new x axis, one can reorient the three axes generally:

    [x̂′; ŷ′; ẑ′] = [1, 0, 0; 0, cos η, sin η; 0, −sin η, cos η]
                    [cos θ, 0, −sin θ; 0, 1, 0; sin θ, 0, cos θ]
                    [cos φ, sin φ, 0; −sin φ, cos φ, 0; 0, 0, 1] [x̂; ŷ; ẑ];   (15.1)

or, inverting per (3.6),

    [x̂; ŷ; ẑ] = [cos φ, −sin φ, 0; sin φ, cos φ, 0; 0, 0, 1]
                 [cos θ, 0, sin θ; 0, 1, 0; −sin θ, 0, cos θ]
                 [1, 0, 0; 0, cos η, −sin η; 0, sin η, cos η] [x̂′; ŷ′; ẑ′].   (15.2)

To multiply the three matrices of (15.1) or (15.2) to form a single 3 × 3 matrix for each equation is left as an exercise to the interested reader, but notice that the transpose (though curiously not the adjoint) of each of the 3 × 3 rotation matrices is also its inverse.

Equations (15.1) and (15.2) say all one needs to say about reorienting axes in three dimensions. Yet, since the prospect of actually applying the equations in a concrete case tends to confuse the uninitiated, we might discuss the equations here a little further. One does not reorient the vector but rather the axes used to describe the vector. Consider a vector

    v = x̂x + ŷy + ẑz.   (15.3)

Even once one has clearly visualized this sequence of three rotations—envisioning the axes as in Fig. 15.1 with the z axis upward—the prospect of applying (15.2) (which inversely represents the sequence) to (15.3) can still seem confusing until one rewrites the latter equation in the form

    v = [x̂ ŷ ẑ] [x; y; z],   (15.4)

after which the application is straightforward. There results

    v′ = x̂′x′ + ŷ′y′ + ẑ′z′,

where

    [x′; y′; z′] ≡ [1, 0, 0; 0, cos η, sin η; 0, −sin η, cos η]
                   [cos θ, 0, −sin θ; 0, 1, 0; sin θ, 0, cos θ]
                   [cos φ, sin φ, 0; −sin φ, cos φ, 0; 0, 0, 1] [x; y; z],   (15.5)

and where Table 3.4 converts to cylindrical or spherical coordinates if and as desired. The inverse of (15.5) is

    [x; y; z] = [cos φ, −sin φ, 0; sin φ, cos φ, 0; 0, 0, 1]
                [cos θ, 0, sin θ; 0, 1, 0; −sin θ, 0, cos θ]
                [1, 0, 0; 0, cos η, −sin η; 0, sin η, cos η] [x′; y′; z′],   (15.6)

which naturally resembles (15.2), as (15.5) resembles (15.1).

It is tempting incidentally to suppose that the unprimed spherical coordinates of x̂′—that is, its spherical coordinates in the [x̂ ŷ ẑ] system—were (1, θ, φ), but this is incorrect. It is the coordinates (1, θ + 2π/4, φ) that actually belong to x̂′. One should appreciate the temptation but avoid the mistake. The temptation makes a kind of sense when one considers that the coordinates (1, 2π/4, 0) belong to x̂ itself, but the coordinates (1, θ, φ) happen to belong to a not-particularly-interesting unit vector one might label ẑ″, which between the second and third rotations momentarily marks the z axis.

One can reorient three axes arbitrarily by rotating them in pairs about the z, y and x axes in sequence. A firmer grasp of the reorientation of axes in three dimensions comes with practice, but those are the essentials of it.

15.2 Multiplication

One can multiply a vector in any of three ways. The first, scalar multiplication, is trivial: if a vector v is as defined by (15.3), then

    ψv = x̂ψx + ŷψy + ẑψz.   (15.7)

Such scalar multiplication evidently scales a vector's length without diverting it from its direction. The other two forms of vector multiplication involve multiplying a vector by another vector and are the subjects of the two subsections that follow.
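As a numerical aside (a sketch, not from the book; the angle values are arbitrary), Python can confirm the remark that the transpose of each 3 × 3 rotation matrix—and hence of the composed reorientation matrix of (15.1)—is also its inverse:

```python
import math

def matmul(A, B):
    # Multiply two 3x3 matrices.
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def transpose(A):
    return [[A[j][i] for j in range(3)] for i in range(3)]

def reorientation(phi, theta, eta):
    # The three factor matrices of (15.1): yaw phi about z,
    # pitch theta about the new y, roll eta about the new x.
    c, s = math.cos(phi), math.sin(phi)
    Rz = [[c, s, 0], [-s, c, 0], [0, 0, 1]]
    c, s = math.cos(theta), math.sin(theta)
    Ry = [[c, 0, -s], [0, 1, 0], [s, 0, c]]
    c, s = math.cos(eta), math.sin(eta)
    Rx = [[1, 0, 0], [0, c, s], [0, -s, c]]
    return matmul(Rx, matmul(Ry, Rz))

R = reorientation(0.3, 0.8, -1.1)
RtR = matmul(transpose(R), R)   # should be the identity
for i in range(3):
    for j in range(3):
        assert abs(RtR[i][j] - (1 if i == j else 0)) < 1e-12
```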

Figure 15.2: The dot product, a · b = ab cos θ, where b cos θ is the length of b's shadow along a.

15.2.1 The dot product

We first met the dot product in § 13.8. It works similarly for the geometrical vectors of this chapter as for the matrix vectors of Ch. 13:

    v₁ · v₂ = x₁x₂ + y₁y₂ + z₁z₂,   (15.8)

which is the product of the vectors v₁ and v₂ to the extent that they run in the same direction. Fig. 15.2 illustrates the dot product. That the dot product is commutative,

    v₂ · v₁ = v₁ · v₂,

is obvious from (15.8).

Naturally, to be valid, the dot product must not vary under a reorientation of axes; and indeed if we write (15.8) in matrix notation,

    v₁ · v₂ = [x₁ y₁ z₁] [x₂; y₂; z₂],   (15.9)

and then expand each of the two factors on the right according to (15.6), we see that the dot product does not in fact vary. As in (13.43) of § 13.8, here too the relationship

    v₁∗ · v₂ = v₁v₂ cos θ,   (15.10)
    v̂₁∗ · v̂₂ = cos θ,   (15.11)

gives the angle θ between two vectors according to Fig. 3.1's cosine if the vectors are real, by definition hereby if complex.
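As a quick illustration (a sketch, not from the book; the sample vectors are arbitrary), the real dot product of (15.8), its commutativity, and the angle relation (15.10) can be exercised in Python:

```python
import math

def dot(a, b):
    # Dot product of (15.8): the product of two real vectors to the
    # extent that they run in the same direction.
    return sum(ai * bi for ai, bi in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

a = [3.0, 0.0, 4.0]
b = [0.0, 2.0, 1.0]

assert dot(a, b) == dot(b, a)                 # commutativity
cos_theta = dot(a, b) / (norm(a) * norm(b))   # (15.10) for real vectors
theta = math.acos(cos_theta)
assert 0.0 <= theta <= math.pi
```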

15.2.2 The cross product

The dot product of two vectors according to § 15.2.1 is a scalar. One can also multiply two vectors in such a way as to obtain a vector, however, and it is often useful to do so. As the dot product is the product of two vectors to the extent that they run in the same direction, the cross product is the product of two vectors to the extent that they run in different directions. Unlike the dot product the cross product is a vector, defined in rectangular coordinates as

    v₁ × v₂ = | x̂  ŷ  ẑ ; x₁ y₁ z₁ ; x₂ y₂ z₂ |
            ≡ x̂(y₁z₂ − z₁y₂) + ŷ(z₁x₂ − x₁z₂) + ẑ(x₁y₂ − y₁x₂),   (15.12)

where the |·| notation is a mnemonic (actually a pleasant old determinant notation § 14.1 could have but did not happen to use) whose semantics are as shown.

As the dot product, the cross product too is invariant under reorientation. One could demonstrate this fact by multiplying out (15.2) and (15.6) then substituting the results into (15.12): a lengthy, unpleasant exercise. Fortunately, it is also an unnecessary exercise, for, inasmuch as a reorientation consists of three rotations in sequence, it suffices merely that rotation about one axis not alter the cross product. One proves the proposition in the latter form by setting any two of φ, θ and η to zero before multiplying out and substituting.

Several facets of the cross product draw attention to themselves.

• The cyclic progression

    ··· → x → y → z → x → y → z → x → y → ···   (15.13)

of (15.12) arises again and again in vector analysis. Where the progression is honored, as in ẑx₁y₂, the associated term bears a + sign; otherwise a − sign, due to § 11.6's parity principle and the right-hand rule.

• The cross product is not commutative. In fact,

    v₂ × v₁ = −v₁ × v₂,   (15.14)

which is a direct consequence of the previous point regarding parity, or which can be seen more prosaically in (15.12) by swapping the places of v₁ and v₂.

• The cross product is not associative. That is,

    (v₁ × v₂) × v₃ ≠ v₁ × (v₂ × v₃),

as is proved by a suitable counterexample like v₁ = v₂ = x̂, v₃ = ŷ.

• The cross product runs perpendicularly to each of its two factors. That is,

    v₁ · (v₁ × v₂) = 0 = v₂ · (v₁ × v₂),

as is seen by substituting (15.12) into (15.8) with an appropriate change of variables and simplifying.

• Unlike the dot product, the cross product is closely tied to three-dimensional space. Two-dimensional space (a plane) can have a cross product so long as one does not mind that the product points off into the third dimension, but to speak of a cross product in four-dimensional space would require arcane definitions and would otherwise make little sense. Fortunately, the physical world is three-dimensional (or, at least, the space in which we model all but a few, exotic physical phenomena is three-dimensional), so the cross product's limitation as defined here to three dimensions will seldom if ever disturb us.

• Section 15.2.1 has related the cosine of the angle between vectors to the dot product. One can similarly relate the angle's sine to the cross product if the vectors involved are real, as

    |v₁ × v₂| = v₁v₂ sin θ,
    |v̂₁ × v̂₂| = sin θ,   (15.15)

demonstrated by reorienting axes such that v̂₁ = x̂′, that v̂₂ has no component in the ẑ′ direction, and that v̂₂ has only a nonnegative component in the ŷ′ direction; by remembering that reorientation cannot alter a cross product; and finally by applying (15.12) and comparing the result against Fig. 3.1's sine. (If the vectors involved are complex then nothing prevents the operation |v₁∗ × v₂| by analogy with eqn. 15.10—in fact the operation v₁∗ × v₂ without the magnitude sign is used routinely to calculate electromagnetic power flow [21, eqn. 1-51]—but each of the cross product's three rectangular components has its own complex phase which the magnitude operation flattens, so the result's relationship to the sine of an angle is not immediately clear.)
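The facets listed above lend themselves to numerical spot checks. The following Python sketch (not from the book; the sample vectors are arbitrary) verifies the anticommutativity (15.14), the perpendicularity of the product to each factor, and the text's non-associativity counterexample v₁ = v₂ = x̂, v₃ = ŷ:

```python
def cross(a, b):
    # Rectangular cross product per (15.12).
    return [a[1]*b[2] - a[2]*b[1],
            a[2]*b[0] - a[0]*b[2],
            a[0]*b[1] - a[1]*b[0]]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

a, b = [1.0, 2.0, 0.5], [-0.5, 1.0, 3.0]

# Anticommutativity (15.14):
assert cross(b, a) == [-c for c in cross(a, b)]
# The product runs perpendicularly to each of its two factors:
assert abs(dot(a, cross(a, b))) < 1e-12
assert abs(dot(b, cross(a, b))) < 1e-12
# Non-associativity, by the counterexample v1 = v2 = x-hat, v3 = y-hat:
x, y = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]
assert cross(cross(x, x), y) != cross(x, cross(x, y))
```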

Figure 15.3: The cross product, c = a × b = ĉ ab sin θ.

Fig. 15.3 illustrates the cross product.

15.3 Orthogonal bases

A vector exists independently of the components by which one expresses it, for, whether q = x̂x + ŷy + ẑz or q = x̂′x′ + ŷ′y′ + ẑ′z′, it remains the same vector q. On the other hand, where a model involves a circle, a cylinder or a sphere, or where a model involves a contour or a curved surface of some kind, to choose x̂′, ŷ′ and ẑ′ wisely can immensely simplify the model's analysis. Normally one requires that x̂′, ŷ′ and ẑ′ each retain unit length, run perpendicularly to one another, and obey the right-hand rule (§ 3.3), but otherwise any x̂′, ŷ′ and ẑ′ can serve. Moreover, a model can specify various x̂′, ŷ′ and ẑ′ under various conditions, for nothing requires the three to be constant.

Recalling the constants and variables of § 2.7, such a concept is flexible enough to confuse the uninitiated severely and soon. As in § 2.7, here too an example affords perspective. Imagine driving your automobile down a winding road, where q represented your speed³ and ℓ̂ represented the direction the road ran—not generally, but just at the spot along the road at which your automobile momentarily happened to be. That your velocity were ℓ̂q meant that you kept skilfully to your lane.

³Conventionally one would prefer the letter v to represent speed, with velocity as v, which in the present example would happen to be v = ℓ̂v. However, this section will require the letter v for an unrelated purpose.

• Where the model features a contour like the example's winding road, an [ℓ̂ v̂ n̂] basis can be used, where ℓ̂ runs locally along the contour and v̂ runs locally across it—in the example, the direction right-to-left across the road. The letter ℓ here stands for "longitudinal."

• Where the model features a curved surface like the surface of a wavy sea, a [û v̂ n̂] basis (or a [û n̂ ŵ] basis, etc.) can be used, where n̂ points locally perpendicularly to the surface. The letter n here stands for "normal," a synonym for "perpendicular." Observe, incidentally but significantly, that such a unit normal n̂ tells one everything one needs to know about its surface's local orientation.⁴

• Combining the last two, where the model features a contour along a curved surface, an [ℓ̂ v̂ n̂] basis can be used. One need not worry about choosing a direction for v̂ in this case, since necessarily v̂ = n̂ × ℓ̂.

• Where the model features a circle or cylinder, a [ρ̂ φ̂ ẑ] basis can be used, where ẑ is constant and runs along the cylinder's axis (or perpendicularly through the circle's center), ρ̂ is variable and points locally away from the axis, and φ̂ is variable and runs locally along the circle's perimeter in the direction of increasing azimuth φ. Refer to § 3.9 and Fig. 15.4.

• Where the model features a sphere, an [r̂ θ̂ φ̂] basis can be used, where r̂ is variable and points locally away from the sphere's center, θ̂ is variable and runs locally tangentially to the sphere's surface in the direction of increasing elevation θ (that is, as nearly as possible to the −ẑ direction without departing from the sphere's surface, though not usually in the −ẑ direction itself), and φ̂ is variable and runs locally tangentially to the sphere's surface in the direction of increasing azimuth φ (that is, along the sphere's surface perpendicularly to ẑ). Refer to § 3.9 and Fig. 15.5. Standing on the earth's surface, with the earth as the sphere, r̂ would be up, θ̂ south, and φ̂ east.

• Occasionally a model arises with two circles that lie neither in the same plane nor in parallel planes but whose axes stand perpendicular to one another. In such a model one conventionally establishes ẑ as the direction of the principal circle's axis but then is left with x̂ or ŷ as the direction of the secondary circle's axis, upon which an [x̂ ρ̂ˣ φ̂ˣ] or [φ̂ʸ ŷ ρ̂ʸ] basis—a cylindrical basis referred to the x or y axis—can be used locally as appropriate.

⁴The assertion wants a citation, which the author lacks.

Figure 15.4: The cylindrical basis. (The conventional symbols ⊙ and ⊗ respectively represent vectors pointing out of the page toward the reader and into the page away from the reader; the dot in the middle of the ⊙ is supposed to look like the tip of an arrowhead. Thus, this figure shows the constant basis vector ẑ pointing out of the page toward the reader.)

Figure 15.5: The spherical basis (see also Fig. 15.1).

Many other orthogonal bases are possible, but the foregoing are the most common. Whether listed here or not, each orthogonal basis orders its three unit vectors by the right-hand rule (15.16).

15.4 Notation

The vector notation of §§ 15.1 and 15.2 is correct, familiar and often expedient, but sometimes inconveniently prolix. This section augments the notation to render it much more concise.

15.4.1 Components by subscript

The notation

    ax ≡ x̂ · a,     aρ ≡ ρ̂ · a,
    ay ≡ ŷ · a,     ar ≡ r̂ · a,
    az ≡ ẑ · a,     aθ ≡ θ̂ · a,
    an ≡ n̂ · a,     aφ ≡ φ̂ · a,

and so forth abbreviates the indicated dot product. That is to say, the notation represents the component of a vector a in the indicated direction. Generically,

    ai ≡ ı̂ · a.   (15.17)

Applied mathematicians use subscripts for several unrelated or vaguely related purposes, so the full dot-product notation is often clearer in print than the abbreviation is; but the abbreviation especially helps when several such dot products occur together in the same expression.

The abbreviation lends a more amenable notation to the dot and cross products of (15.8) and (15.12):

    a · b = axbx + ayby + azbz;   (15.18)

    a × b = | x̂  ŷ  ẑ ; ax ay az ; bx by bz |.   (15.19)

In fact—because, as we have seen, reorientation of axes cannot alter the dot and cross products—any orthogonal basis [x̂′ ŷ′ ẑ′] (§ 15.3) can serve here (for instance, [ρ̂ φ̂ ẑ] or [r̂ θ̂ φ̂]), so one can write more generally that

    a · b = ax′bx′ + ay′by′ + az′bz′;   (15.20)

    a × b = | x̂′ ŷ′ ẑ′ ; ax′ ay′ az′ ; bx′ by′ bz′ |.   (15.21)

Because all those prime marks burden the notation and for other professional mathematical reasons, the general forms (15.20) and (15.21) are sometimes rendered

    a · b = a₁b₁ + a₂b₂ + a₃b₃,
    a × b = | ê₁ ê₂ ê₃ ; a₁ a₂ a₃ ; b₁ b₂ b₃ |,

but you have to be careful about that in applied usage because people are not always sure whether a symbol like a₃ means "the third component of the vector a" (as it does here) or "the third vector's component in the â direction" (as it would in eqn. 15.8) [25]. Typically, applied mathematicians will write in the manner of (15.18) and (15.19) with the implied understanding that they really mean (15.20) and (15.21)—that is, with the implied understanding that x, y and z could just as well be ρ, φ and z or the coordinates of any other orthogonal, right-handed, three-dimensional basis—but prefer not to burden the notation with extra little strokes.

Some professional mathematicians now write a superscript aⁱ in certain cases in place of a subscript ai, where the superscript bears some additional semantics [52, "Einstein notation," 05:36, 10 February 2008]. Scientists and engineers however tend to prefer Einstein's original, subscript-only notation.

The trouble with vector work is that one has to learn to abbreviate, or the expressions involved grow repetitive and unreadably long. Eventually one accepts the need and takes the trouble to master the conventional vector abbreviation this section presents. If the reader feels that the notation begins to confuse more than it describes, the writer empathizes but regrets to inform the reader that the rest of the section, far from granting the reader a comfortable chance to absorb the elaborated notation as it stands, shall not delay to elaborate the notation yet further! The confusion however is subjective; indeed, the abbreviation is rather elegant once one becomes used to it, and its utility and charm are felt after only a little practice. So, study closely and take heart! The notation is not actually as impenetrable as it at first will seem.

15.4.2 Einstein's summation convention

Einstein's summation convention⁵ is this: that repeated indices are implicitly summed over.⁶ For instance, where the convention is in force, the equation

    a · b = ai bi   (15.22)

means that

    a · b = Σ_{i=x′,y′,z′} ai bi

or more fully that

    a · b = Σᵢ ai bi = ax′bx′ + ay′by′ + az′bz′,

which is (15.20), except that Einstein's form (15.22) expresses it more succinctly. Likewise,

    a × b = ı̂(a_{i+1} b_{i−1} − b_{i+1} a_{i−1}),   (15.23)

where each index cycles through x′ → y′ → z′ → x′, is (15.21)—although an experienced applied mathematician, judging it somewhat more elegant, would probably apply the Levi-Civita epsilon of § 15.4.3, below, to further abbreviate this last equation to the form of (15.24) before presenting it.

What is important to understand about Einstein's summation convention is that, in and of itself, it brings no new mathematics; it is rather a notational convenience.⁷ It asks a reader to regard a repeated index like the i in "ai bi" as a dummy index (§ 2.3) and thus to read "ai bi" as "Σᵢ ai bi." Under the convention, the summational operator Σᵢ is implied, not written, but the operator is still there. The convention does not magically create a summation where none existed; it just hides the summation sign to keep it from cluttering the page. It is the kind of notational trick an accountant might appreciate.

Nothing requires you to invoke Einstein's summation convention everywhere and for all purposes. You can waive the convention, writing the summation symbol out explicitly whenever you like, for the symbol after all exists to be employed. In contexts other than vector work, to invoke the convention at all may make little sense.

⁵[38]
⁶Einstein's summation convention is also called the Einstein notation, a term sometimes taken loosely to include also the Kronecker delta and Levi-Civita epsilon of § 15.4.3.
⁷The book might not present the convention at all were it not so widely used [50, "Einstein summation"]. Admittedly confusing on first encounter, the convention is both serviceable and neat. Besides, who does not want to learn a thing to which the famous name of Albert Einstein is attached?

Nevertheless, you should indeed learn the convention—if only because you must learn it to understand the rest of this chapter—but once having learned it you should naturally use it only where it actually serves to clarify. Fortunately, in vector work, it often does just that.

Quiz: if δij is the Kronecker delta of § 11.2, then what does the symbol δii represent where Einstein's summation convention is in force?

15.4.3 The Kronecker delta and the Levi-Civita epsilon

Einstein's summation convention expresses the dot product (15.22) neatly but, as we have seen in (15.23), does not by itself wholly avoid unseemly repetition in the cross product. The Levi-Civita epsilon⁸ ǫijk mends this, rendering the cross product as

    a × b = ǫijk ı̂ aj bk,   (15.24)

where⁹

    ǫijk ≡  +1 if (i, j, k) = (x′, y′, z′), (y′, z′, x′) or (z′, x′, y′);
            −1 if (i, j, k) = (x′, z′, y′), (y′, x′, z′) or (z′, y′, x′);
             0 otherwise [for instance if (i, j, k) = (x′, x′, y′)].   (15.25)

In the language of § 11.6, the Levi-Civita epsilon quantifies parity. Technically, the Levi-Civita epsilon and Einstein's summation convention are two separate, independent things, but the two tend to go together, and a canny reader takes the Levi-Civita's appearance as a hint that Einstein's convention is probably in force, as it is in (15.24).

The Levi-Civita notation applies in any number of dimensions, not only three as in the present chapter. (Chapters 11 and 14 did not use it.) In this more general sense the Levi-Civita is the determinant of the permutor whose ones hold the indicated positions—which is a formal way of saying that it's a + sign for even parity and a − sign for odd. For instance, in the four-dimensional, 4 × 4 case, ǫ₁₂₃₄ = 1 whereas ǫ₁₂₄₃ = −1: refer to §§ 11.6, 11.7.1 and 14.1. This section, however, as the rest of this chapter, concerns the three-dimensional case only.

⁸Also called the Levi-Civita symbol, tensor, or permutor [37, "Levi-Civita permutation symbol"] [25]. For native English speakers who do not speak Italian, the "ci" in Levi-Civita's name is pronounced as the "chi" in "children."
⁹The writer has heard the apocryphal belief expressed that the letter ǫ, a Greek e, stood in this context for "Einstein." As far as the writer knows, ǫ is merely the letter after δ, which represents the name of Paul Dirac—though the writer does not claim his indirected story to be any less apocryphal than the other one (the capital letter ∆ has a point on top that suggests the pointy nature of the Dirac delta of Fig. 7.10, which makes for yet another plausible story).
For native English speakers who do not speak Italian. then what does the symbol δii represent where Einstein’s summation convention is in force? 15. the Levi-Civita epsilon quantiﬁes parity. k) = (x′ . stood in this context for “Einstein. z ′ . however. The two tend to go together.1 and 14.

2’s quiz. y ′ ). and also either (m. n) or (j. n) = (y ′ .15. k) = (z ′ .1 lists several relevant properties. which makes for yet another plausible story). k) = (n. in the case that i = x′ . thus contributing nothing to the sum). z ′ ) or (m. in each case the several indices can take any values. This implies that either (j. Table 15. In any event. to zero. with Einstein’s summation convention in force. NOTATION 409 Table 15. y ′ ).10. Both delta and epsilon ﬁnd use in vector work.1: Properties of the Kronecker delta and the Levi-Civita epsilon. one sometimes hears Einstein’s summation convention. and similarly in the cases that i = y ′ and i = z ′ (more precisely. the Kronecker delta and the Levi-Civita epsilon together referred to as “the Einstein notation. one can write (15. k) = (m. n) = (z ′ .6 approximately as the cross product relates to the dot product. . but combinations other than the ones listed drive ǫimn or ǫijk . the property that ǫimn ǫijk = δmj δnk − δmk δnj is proved by observing that.22) alternately in the form a · b = δij ai bj . or both.14 each as with Einstein’s summation convention in force. δjk = δkj δij δjk = δik δii = 3 δjk ǫijk = 0 δnk ǫijk = ǫijn ǫijk = ǫjki = ǫkij = −ǫikj = −ǫjik = −ǫkji ǫijk ǫijk = 6 ǫijn ǫijk = 2δnk ǫimn ǫijk = δmj δnk − δmk δnj The Levi-Civita epsilon ǫijk relates to the Kronecker delta δij of § 11.” which though maybe not quite terminologically correct is hardly incorrect enough to argue over and is clear enough in practice. k) = (y ′ .15 Of the table’s several properties. z ′ ) or (j. when one takes parity into after δ. 14 [38] 15 The table incidentally answers § 15.4. either (j. which represents the name of Paul Dirac—though the writer does not claim his indirected story to be any less apocryphal than the other one (the capital letter ∆ has a point on top that suggests the pointy nature of the Dirac delta of Fig.4. For example. m)—which. 7.

(15. and similarly for k = n = y ′ and again for k = n = z′ . the last paragraph likely makes sense to few who do not already know what it means. Unfortunately. A concrete example helps. in any given term of the Einstein sum.n j. j) = (y ′ .24).n ˆ (δmj δnk − δmk δnj )man bj ck . j) = (z ′ . in light of (15. a × (b × c) = b(a · c) − c(a · b). which leaves the third to be shared by both k and n. VECTOR ANALYSIS account. In this section’s notation and with the use of (15. y ′ ) term both contribute positively to the sum. The factor 2 appears because. the compound product is a × (b × c) = a × (ǫijkˆbj ck ) ı ˆ = ǫmni man (ǫijkˆbj ck )i ı ˆ = ǫmni ǫijk man bj ck ˆ = ǫimn ǫijk man bj ck ˆ = (δmj δnk − δmk δnj )man bj ck ˆ ˆ = δmj δnk man bj ck − δmk δnj man bj ck ˆ = ˆak bj ck − kaj bj ck  ˆ = (ˆbj )(ak ck ) − (kck )(aj bj ). z ′ ) term and an (i. . i is either x′ or y ′ or z ′ and that j is one of the remaining two. Consider the compound product a × (b × c).410 CHAPTER 15.22).m.  That is.m. for k = n = x′ .26) a useful vector identity. The property that ǫijn ǫijk = 2δnk is proved by observing that. is exactly what the property in question asserts. Written without the beneﬁt of Einstein’s summation convention. an (i.k. the example’s central step would have been a × (b × c) = = ˆ ǫimn ǫijk man bj ck i.j.k.

x′ .j. x′ ) + ǫy ′ x′ z ′ ǫy ′ x′ z ′ f (x′ . z ′ . z ′ . z ′ . z ′ . x′ ) + f (y ′ . For the property that ǫijn ǫijk = 2δnk . x′ ) + ǫz ′ y ′ x′ ǫz ′ x′ y ′ f (y ′ . x′ . z ′ ) + ǫy ′ x′ z ′ ǫy ′ z ′ x′ f (x′ .k. y ′ . y ′ . y ′ ) + f (z ′ . x′ . y ′ . x′ ) + ǫz ′ y ′ x′ ǫz ′ y ′ x′ f (x′ . y ′ ) ˆ − f (y ′ . y ′ . z ′ . y ′ . z ′ ) ˆ ˜ 2 f (x′ . x′ . k. x′ . x′ . x′ . y ′ ) + f (y ′ . y ′ ) + f (z ′ . z ′ . z ′ ) + f (x′ . z ′ . y ′ . z ′ . z ′ ) − f (y ′ . y ′ ) + f (z ′ . n). y ′ . x′ . z ′ . x′ . x′ . x′ . x′ . x′ ) + f (y ′ . z ′ ) + f (x′ . the corresponding calculation is X ǫijn ǫijk f (k. z ′ . x′ ) + f (x′ . NOTATION 411 which makes sense if you think about it hard enough. x′ . z ′ .15.j. z ′ . X ǫijk ǫijk = ǫ2 ′ y ′ z ′ + ǫ2 ′ z ′ x′ + ǫ2′ x′ y ′ + ǫ2 ′ z ′ y ′ + ǫ2 ′ x′ z ′ + ǫ2′ y ′ x′ = 6. x′ . m. z ′ . y ′ . z ′ . y ′ ) + f (x′ . x′ . z ′ ) + f (z ′ . x′ ) = f (y ′ .4. x′ ) + f (z ′ . x′ . x′ .m. y ′ ) + f (z ′ . x′ ) − f (z ′ . z ′ ) + f (z ′ . x y z x y z i. y ′ ) + f (y ′ . y ′ . y ′ ) + f (z ′ . z ′ ) + f (y ′ . z ′ ) + f (y ′ .n = ǫx′ y ′ z ′ ǫx′ y ′ z ′ f (y ′ . z ′ . z ′ . x′ ) + f (y ′ . x′ . 2 k. y ′ . x′ . y ′ ) + ǫx′ z ′ y ′ ǫx′ y ′ z ′ f (z ′ . z ′ ) + ǫx′ y ′ z ′ ǫx′ z ′ y ′ f (y ′ . y ′ . x′ ) + f (y ′ . x′ ) + ǫy ′ z ′ x′ ǫy ′ x′ z ′ f (z ′ . y ′ ) + f (z ′ . x′ ) + f (y ′ . x′ .16 and justiﬁes the 16 If thinking about it hard enough does not work. z ′ . y ′ ) + f (x′ . y ′ . y ′ . z ′ ) + f (x′ . z ′ . z ′ ) = j. x′ . then here it is in interminable detail: X ǫimn ǫijk f (j. z ′ ) + ǫz ′ x′ y ′ ǫz ′ x′ y ′ f (x′ . y ′ ) + ǫz ′ x′ y ′ ǫz ′ y ′ x′ f (x′ . y ′ ) + ǫy ′ z ′ x′ ǫy ′ z ′ x′ f (z ′ . n). z ′ . y ′ ) + f (x′ . y ′ . x′ . y ′ . k. x′ . x′ ) + f (z ′ . x′ . x′ . y ′ . z ′ . z ′ . z ′ ) + ǫy ′ x′ z ′ ǫy ′ x′ z ′ f (z ′ .j. z ′ . y ′ ) + ǫx′ z ′ y ′ ǫx′ z ′ y ′ f (y ′ . z ′ ) X δnk f (k.k. z ′ ) + f (z ′ . y ′ .n = ǫy ′ z ′ x′ ǫy ′ z ′ x′ f (x′ . x′ ) ˜ ˜ ˜ ˜ = ˆ f (y ′ .m. z ′ . m. z ′ . z ′ ) + f (z ′ . z ′ . z ′ . 
¹⁶If thinking about it hard enough does not work, then here it is in detail, with the interminable term-by-term expansion compressed into words. For the property that ǫimn ǫijk = δmj δnk − δmk δnj: in the sum Σ_{i,j,k,m,n} ǫimn ǫijk f(j, k, m, n), only those terms survive in which (j, k) and likewise (m, n) each comprise the two indices other than i; each i ∈ {x′, y′, z′} thus contributes four terms, of which the two with (j, k) = (m, n) enter with ǫimn ǫijk = +1 and the two with (j, k) = (n, m) enter with ǫimn ǫijk = −1. Gathering the positive and the negative terms over all i gives Σ_{j,k,m,n} (δmj δnk − δmk δnj) f(j, k, m, n), as asserted. For the property that ǫijk ǫijk = 6: the sum has exactly six nonzero terms,

    ǫijk ǫijk = ǫ²x′y′z′ + ǫ²y′z′x′ + ǫ²z′x′y′ + ǫ²x′z′y′ + ǫ²y′x′z′ + ǫ²z′y′x′ = 6.

For the property that ǫijn ǫijk = 2δnk, the corresponding calculation runs over Σ_{i,j,k,n} ǫijn ǫijk f(k, n): for each n = k there are exactly two surviving (i, j) pairs—the two orderings of the remaining two indices—and each contributes ǫijn ǫijk = (±1)² = +1, whence the sum equals 2 Σ_{k,n} δnk f(k, n).

—which makes sense if you think about it hard enough¹⁶ and justifies the table's claim that ǫimn ǫijk = δmj δnk − δmk δnj. (Notice that the compound Kronecker operator δmj δnk includes nonzero terms for the case that j = k = m = n = x′, for the case that j = k = m = n = y′ and for the case that j = k = m = n = z′, whereas the compound Levi-Civita operator ǫimn ǫijk does not. However, the compound Kronecker operator −δmk δnj includes canceling terms for these same three cases. This is why the table's claim is valid as written.)

To belabor the topic further here would serve little purpose. The reader who does not feel entirely sure that he understands what is going on might work out the table's several properties with his own pencil, in something like the style of the example, until he is satisfied that he adequately understands the several properties and their correct use. It is precisely to encapsulate such interminable detail that we use the Kronecker delta, the Levi-Civita epsilon and the properties of Table 15.1.

15.5 Algebraic identities

Vector algebra is not in principle very much harder than scalar algebra is, but with three distinct types of product it has more rules controlling the way its products and sums are combined. Table 15.2 lists several of these.¹⁷

Table 15.2: Algebraic vector identities.

    a · b = ai bi          a × b = ǫijk ı̂ aj bk          ψa = ı̂ ψai
    a∗ · a = |a|²
    (ψ)(a + b) = ψa + ψb
    b · a = a · b                  b × a = −a × b
    a · (b + c) = a · b + a · c    a × (b + c) = a × b + a × c
    a · (ψb) = (ψ)(a · b)          a × (ψb) = (ψ)(a × b)
    a · (b × c) = b · (c × a) = c · (a × b)
    a × (b × c) = b(a · c) − c(a · b)

Most of the table's identities are plain by the formulas (15.7), (15.18) and (15.19) respectively for the scalar, dot and cross products, and one was proved as (15.26). Besides the several vector identities, the table also includes the three vector products in Einstein notation. The remaining identity is proved in the notation of § 15.4 as

    a · (b × c) = a · (ǫijk ı̂ bj ck) = ǫijk ai bj ck = ǫkij ak bi cj = ǫjki aj bk ci
                = b · (c × a) = c · (a × b);   (15.27)

that is,

    ǫijk ai bj ck = ǫijk bi cj ak = ǫijk ci aj bk.

¹⁷[47, Appendix II][21, Appendix A]
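Likewise, the scalar triple product's cyclic property from the table, a · (b × c) = b · (c × a) = c · (a × b), admits a direct numerical check (a sketch, not from the book; the sample vectors are arbitrary):

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    # Rectangular cross product per (15.12).
    return [a[1]*b[2] - a[2]*b[1],
            a[2]*b[0] - a[0]*b[2],
            a[0]*b[1] - a[1]*b[0]]

a, b, c = [1.0, 2.0, -1.0], [3.0, 0.5, 4.0], [-2.0, 1.0, 0.25]

# The three cyclic forms of the scalar triple product agree:
t1 = dot(a, cross(b, c))
t2 = dot(b, cross(c, a))
t3 = dot(c, cross(a, b))
assert abs(t1 - t2) < 1e-12 and abs(t2 - t3) < 1e-12
```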

One identity was proved as (15.26). The remaining identity is proved in the notation of § 15.4 as

    a·(b×c) = a·(ǫijk ı̂ bj ck)
            = ǫijk ai bj ck = ǫijk bi cj ak = ǫijk ci aj bk
            = b·(c×a) = c·(a×b).                                   (15.27)

15.6 Fields and their derivatives

We call a scalar quantity σ(t) or vector quantity f(t) whose value varies over time "a function of time t." We can likewise call a scalar quantity ψ(r) or vector quantity a(r) whose value varies over space "a function of position r," but there is a special, alternate name for such a quantity: we call it a field. A field is a quantity distributed over space or, if you prefer, a function in which spatial position serves as independent variable.

Air pressure p(r) is an example of a scalar field, whose value at a given location r has amplitude but no direction. Wind velocity¹⁸ q(r) is an example of a vector field, whose value at a given location r has both amplitude and direction. These are typical examples. Scalar and vector fields are of utmost use in the modeling of physical phenomena.

A vector field can be thought of as composed of three scalar fields,

    q(r) = x̂ qx(r) + ŷ qy(r) + ẑ qz(r);

but, since

    q(r) = x̂′ qx′(r) + ŷ′ qy′(r) + ẑ′ qz′(r)

for any orthogonal basis [x̂′ ŷ′ ẑ′] as well, the specific scalar fields qx(r), qy(r) and qz(r) are no more essential to the vector field q(r) than the specific scalars bx, by and bz are to the vector b.

As one can take the derivative dσ/dt or df/dt with respect to time t of a function σ(t) or f(t), one can likewise take the derivative with respect to position r of a field ψ(r) or a(r). However, derivatives with respect to position create a notational problem, for it is not obvious what notation like dψ/dr or da/dr would mean. The notation dσ/dt means "the rate of σ as time t advances," but if the notation dψ/dr likewise meant "the rate of ψ as

¹⁸ As earlier, this section also uses the letter q for velocity in place of the conventional v [3, § 18.4].

position r advances," then it would prompt one to ask, "advances in which direction?" The notation offers no hint. In fact dψ/dr and da/dr mean nothing very distinct in most contexts and we shall avoid such notation. If we will speak of a field's derivative with respect to position r then we shall be more precise.

Section 15.2 has given the vector three distinct kinds of product. This section gives the field no fewer than four distinct kinds of derivative: the directional derivative, the gradient, the divergence and the curl.¹⁹ So many derivatives bring the student a conceptual difficulty akin to a caveman's difficulty in conceiving of a building with more than one floor. Once one has grasped the concept of upstairs and downstairs, a building with three floors or thirty is not especially hard to envision, but were you to try to explain such a building to a caveman used to thinking of the ground and the floor as more or less the same thing, he might be conflicted by partly false images of sitting in a tree or of clambering onto (and breaking) the roof of his hut. "There are many trees and antelopes but only one sky and floor. How can one speak of many skies or many floors?" The student's principal conceptual difficulty with the several vector derivatives is of this kind.

15.6.1 The ∇ operator

Consider a vector

    a = x̂ ax + ŷ ay + ẑ az.

Then consider a "vector"

    c = x̂ [Tuesday] + ŷ [Wednesday] + ẑ [Thursday].

If you think that the latter does not look very much like a vector, then the writer thinks as you do, but consider:

    c·a = [Tuesday] ax + [Wednesday] ay + [Thursday] az.

The writer does not know how to interpret a nonsense term like "[Tuesday] ax" any more than the reader does, but the point is that c behaves as though it were a vector insofar as vector operations like the dot product are concerned. What matters in this context is not that c have amplitude and direction (it has neither) but rather that it have the three orthonormal

¹⁹ Vector veterans may notice that the Laplacian is not listed. This is not because the Laplacian were uninteresting but rather because the Laplacian is actually a second-order derivative—a derivative of a derivative. We shall address the Laplacian in [section not yet written].

components it needs to participate formally in relevant vector operations. That the components' amplitudes seem nonsensical is beside the point. Maybe there exists a model in which "[Tuesday]" knows how to operate on a scalar like ax. (Operate on? Yes. Nothing in the dot product's definition requires the component amplitudes of c to multiply those of a. Multiplication is what the component amplitudes of true vectors do, but c is not a true vector, so "[Tuesday]" might do something to ax other than to multiply it. The model might allow it; the dot and cross products in and of themselves do not forbid it.) If there did exist such a model, then the dot product c·a could be licit in that model; and, if so, then, maybe, the cross product c×a too could be licit in that model, composed according to the usual rule for cross products. Section 15.6.2 elaborates the point.

Well, days of the week, partial derivatives, ersatz vectors—it all seems rather abstract. What's the point? The answer is that there wouldn't be any point if the only nonvector "vectors" in question were of c's nonsense kind. Now consider a "vector"

    ∇ = x̂ ∂/∂x + ŷ ∂/∂y + ẑ ∂/∂z.                                 (15.28)

This ∇ is not a true vector any more than c is, but if we treat it as one then we have that

    ∇·a = ∂ax/∂x + ∂ay/∂y + ∂az/∂z.

Such a dot product might or might not prove useful, but, unlike the terms in the earlier dot product, at least we know what this one's terms mean.

The operator ∇ however shares more in common with a true vector than merely having x, y and z components. Like a true vector, the operator ∇ is amenable to having its axes reoriented by (15.1) and (15.2). This is easier to see at first with respect to the true vector a. Consider rotating the x and y axes through an angle φ about the z axis. There follows

    a = x̂ ax + ŷ ay + ẑ az
      = (x̂′ cos φ − ŷ′ sin φ)(ax′ cos φ − ay′ sin φ)
        + (x̂′ sin φ + ŷ′ cos φ)(ax′ sin φ + ay′ cos φ) + ẑ′ az′
      = x̂′ [ax′ cos² φ − ay′ cos φ sin φ + ax′ sin² φ + ay′ cos φ sin φ]
        + ŷ′ [−ax′ cos φ sin φ + ay′ sin² φ + ax′ cos φ sin φ + ay′ cos² φ]
        + ẑ′ az′
      = x̂′ ax′ + ŷ′ ay′ + ẑ′ az′,
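The rotation just derived can be verified numerically. The sketch below (not from the text; a sample angle and sample components assumed) rotates the x and y axes through φ about the z axis and confirms that x̂′ ax′ + ŷ′ ay′ + ẑ′ az′ reproduces the original a:

```python
# Numeric check that reorienting the axes leaves the vector a intact:
# with ax' = ax cos(phi) + ay sin(phi) and ay' = -ax sin(phi) + ay cos(phi),
# recombining x'*ax' + y'*ay' + z'*az' reproduces x*ax + y*ay + z*az.
from math import cos, sin, isclose

phi = 0.7                       # sample rotation angle about the z axis
ax, ay, az = 3.0, -1.0, 2.0     # sample components of a on [x y z]

# Components of a on the rotated basis.
axp = ax * cos(phi) + ay * sin(phi)
ayp = -ax * sin(phi) + ay * cos(phi)
azp = az

# Rotated basis vectors expressed on the original [x y z] basis.
xp = (cos(phi), sin(phi), 0.0)
yp = (-sin(phi), cos(phi), 0.0)
zp = (0.0, 0.0, 1.0)

a_rebuilt = tuple(axp * e1 + ayp * e2 + azp * e3
                  for e1, e2, e3 in zip(xp, yp, zp))
assert all(isclose(u, v) for u, v in zip(a_rebuilt, (ax, ay, az)))
```

The cos² φ and sin² φ cross terms cancel exactly as in the derivation, whatever angle φ is chosen.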

where the final expression has different axes than the original but, relative to those axes, exactly the same form. Further rotation about other axes would further reorient but naturally also would not alter the form. Now consider ∇. The partial differential operators ∂/∂x, ∂/∂y and ∂/∂z change no differently under reorientation than the component amplitudes ax, ay and az do. Hence,

    ∇ = ı̂ ∂/∂i = x̂′ ∂/∂x′ + ŷ′ ∂/∂y′ + ẑ′ ∂/∂z′,                  (15.29)

evidently the same operator regardless of the choice of basis [x̂′ ŷ′ ẑ′]. It is this invariance under reorientation that makes the ∇ operator useful.

If ∇ takes the place of the ambiguous d/dr, then what takes the place of the ambiguous d/dr̃, d/dro, d/dr†, and so on? Answer: ∇̃, ∇o, ∇† and so on. Whatever mark distinguishes the special r, the same mark distinguishes the corresponding special ∇; for example, where ro = ı̂ io, there ∇o = ı̂ ∂/∂io. That is the convention.

Introduced by Oliver Heaviside and informally pronounced "del" (in the author's country at least), the vector differential operator ∇ finds extensive use in the modeling of physical phenomena. After a brief digression to discuss operator notation, the subsections that follow will use the operator to develop and present the four basic kinds of vector derivative.

15.6.2 Operator notation

Section 15.6.1 has introduced operator notation without explaining what it is. This subsection digresses to explain.

Operator notation concerns the representation of unary operators and the operations they specify. Section 7.3 has already broadly introduced the notion of the operator. A unary operator is a mathematical agent that transforms a single discrete quantity, a single distributed quantity, a single field, a single function or another single mathematical object in some definite way. For example,

    J ≡ ∫₀ᵗ (·) dt

is a unary operator²⁰ whose effect is such that, for instance, Jt = t²/2 and J cos ωt = (sin ωt)/ω. Like the matrix row operator A in matrix notation's product Ax (§ 11.1.1), the operator J in operator notation's operation Jt attacks from the left.

Any letter might serve as well as the example's J. In this example the meaning of J is clear enough, but where otherwise unclear one can write an operator's definition in the applied style of J ≡ ∫₀ᵗ (·) dt, where the "·" holds the place of the thing operated upon.

²⁰ [6]
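The operator J can be imitated numerically. The sketch below (not from the text; trapezoidal quadrature with an assumed step count) checks that Jt = t²/2 and J cos ωt = (sin ωt)/ω:

```python
# Numeric sketch of the unary operator J = integral from 0 to t of (.) dt,
# checking Jt = t**2/2 and J cos(wt) = sin(wt)/w by trapezoidal quadrature.
from math import cos, sin, isclose

def J(f, t, n=10_000):
    """Apply J to the function f, evaluating the result at time t."""
    h = t / n
    s = 0.5 * (f(0.0) + f(t)) + sum(f(k * h) for k in range(1, n))
    return s * h

t, w = 1.3, 2.0   # sample time and angular frequency
assert isclose(J(lambda u: u, t), t ** 2 / 2, rel_tol=1e-6)
assert isclose(J(lambda u: cos(w * u), t), sin(w * t) / w, rel_tol=1e-6)
```

As in the text, J here accepts a whole function and returns a value, which is what distinguishes an operator from ordinary multiplication.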

The operators J and A above are examples of linear unary operators, unary operations that scrupulously honor § 7.3.3's rule of linearity. The operator K ≡ · + 3 is not linear and almost certainly should never be represented in operator notation as here, lest an expression like Kt mislead an understandably unsuspecting audience. Whether operator notation can licitly represent any unary operation whatsoever is a definitional question we will leave for the professional mathematicians to answer, but in normal usage operator notation represents only linear unary operations.

Linear unary operators obey a definite algebra, the same algebra matrices obey; it is precisely this algebra that makes operator notation so useful. The matrix actually is a type of unary operator and matrix notation is a specialization of operator notation, so we have met operator notation before. Indeed, we have met operator notation much earlier than that: the product 5t can if you like be regarded as the unary operator "5 times," operating on t. Perhaps you did not know that 5 was an operator—and, indeed, the scalar 5 itself is no operator but just a number—but where no other operation is defined operator notation implies scalar multiplication by default. Seen in this way, 5t and t5 actually mean two different things, though naturally in the specific case of scalar multiplication, which happens to be commutative, it is true that 5t = t5. Generally, however, Jt ≠ tJ if J is a unary operator, though the notation Jt usually formally resembles multiplication in other respects as we shall see.

Linear unary operators generally are not commutative, so J1 J2 ≠ J2 J1, but otherwise linear unary operators follow familiar rules of multiplication like (J2 + J3) J1 = J2 J1 + J3 J1; and a linear unary operator enjoys the same associativity (11.5) a matrix enjoys. The a· in the dot product a·b and the a× in the cross product a×b can profitably be regarded as unary operators.

In operator notation, Jωt = J(ωt), not (Jω)t. Observe however that the confusion and the perceived need for parentheses come only of the purely notational ambiguity as to whether ωt bears the semantics of "the product of ω and t" or those of "the unary operator 'ω times,' operating on t." Both confusion and perceived need would vanish if Ω ≡ (ω)(·) were unambiguously an operator, in which case (JΩ)t = JΩt = J(Ωt). One can compare the distinction in § 11.3.2 between λ and λI against the distinction between ω and Ω here. In practice, rather than go to the trouble of defining extra symbols like Ω, it is usually easier just to write the parentheses, which take little space on the page and are universally understood; or, better, to rely on the right-to-left convention that Jωt = J(ωt). Modern conventional applied mathematical notation, though generally excellent, remains imperfect.
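Matrices give a concrete instance of the algebra just described. The sketch below (not from the text; sample 2×2 matrices assumed) exhibits linear unary operators that fail to commute while still obeying the distributive rule (J2 + J3) J1 = J2 J1 + J3 J1:

```python
# Matrices as linear unary operators: generally J1 J2 != J2 J1, yet the
# distributive rule (J2 + J3) J1 = J2 J1 + J3 J1 holds, as for any
# linear unary operators.

def matmul(A, B):
    """Product of two 2x2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def matadd(A, B):
    """Sum of two 2x2 matrices."""
    return [[A[i][j] + B[i][j] for j in range(2)] for i in range(2)]

J1 = [[1, 2], [0, 1]]
J2 = [[0, 1], [1, 0]]
J3 = [[3, 0], [0, 4]]

assert matmul(J1, J2) != matmul(J2, J1)        # not commutative in general
assert matmul(matadd(J2, J3), J1) == matadd(matmul(J2, J1), matmul(J3, J1))
```

The same two assertions would pass for any square matrices of matching size; only the particular values here are arbitrary.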

One can speak of a unary operator like J, A or Ω without giving it anything in particular to operate upon. One can leave an operation unresolved. For example, tJ is itself a unary operator—it is the operator (t)(∫₀ᵗ (·) dt)—though one can assign no particular value to it until it actually operates on something. In operator notation, when it matters, operators associate from right to left (§ 2.1) except where parentheses group otherwise.

15.6.3 The directional derivative and the gradient

In the calculus of vector fields, the derivative notation d/dr is ambiguous because, as the section's introduction has observed, the notation gives r no specific direction in which to advance. To express the derivative unambiguously, one can compose the directional derivative operator

    (b·∇) = bi ∂/∂i                                                (15.30)

given (15.29) and accorded a reference vector b to supply a direction and, when it matters, to supply a scale. This operator applies equally to the scalar field, (b·∇)ψ(r) = bi ∂ψ/∂i, as to the vector field,

    (b·∇) a(r) = bi ∂a/∂i = ĵ bi ∂aj/∂i.                           (15.31)

In the case (15.31) of the vector field, ∇a(r) itself means nothing coherent,²¹ so the parentheses usually are retained. For the scalar field the parentheses are unnecessary and conventionally are omitted, as

    b·∇ψ(r) = bi ∂ψ/∂i.                                            (15.32)

Equations (15.31) and (15.32) define the directional derivative.

Note that the directional derivative is the derivative not of the reference vector b but only of the field ψ(r) or a(r). The vector b only directs the derivative; it is not the object of it. Nothing requires b to be constant, though; it can be a vector field b(r) that varies from place to place. The directional derivative does not care.

²¹ Well, it does mean something coherent in dyadic analysis, but this book won't treat that.
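The directional derivative (15.32) can be spot-checked by finite differences. A sketch (not from the text; sample field, point and reference vector assumed):

```python
# Finite-difference check of (b.grad)psi = b_i dpsi/di for the sample
# field psi(x, y, z) = x*y + z**2 at a sample point.
from math import isclose

def psi(x, y, z):
    return x * y + z ** 2

b = (1.0, -2.0, 0.5)    # reference vector: supplies direction and scale
r = (0.3, 0.7, -1.1)    # evaluation point
h = 1e-6

# Analytic: b_i dpsi/di with dpsi/dx = y, dpsi/dy = x, dpsi/dz = 2z.
analytic = b[0] * r[1] + b[1] * r[0] + b[2] * (2 * r[2])

# Numeric: the rate of psi along b, by central difference.
ahead = psi(*(ri + h * bi for ri, bi in zip(r, b)))
behind = psi(*(ri - h * bi for ri, bi in zip(r, b)))
numeric = (ahead - behind) / (2 * h)

assert isclose(numeric, analytic, rel_tol=1e-4)
```

The numeric rate is taken along b itself, which is exactly what the operator b·∇ expresses: the field's rate as position advances in the direction of b, scaled by b's magnitude.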

Within (15.32), formally a dot product, the directional derivative operator b·∇ is invariant under reorientation of axes, wherefore the directional derivative is invariant, too.

The quantity

    ∇ψ(r) = ı̂ ∂ψ/∂i                                               (15.33)

is called the gradient of the scalar field ψ(r). The gradient represents the amplitude and direction of a scalar field's locally steepest rate. The result of a ∇ operation, the gradient ∇ψ(r) is likewise invariant under reorientation of axes. Though both scalar and vector fields have directional derivatives, only scalar fields have gradients.

15.6.4 Divergence

There exist other vector derivatives than the directional derivative and gradient of § 15.6.3. One of these is divergence. It is not easy to motivate divergence directly, so we shall approach it indirectly, through the concept of flux as follows.

Flux is flow through a surface: in this case, net flow outward from the region in question. The flux of a vector field a(r) outward from a region in space is

    Φ = ∮S a(r) · ds,                                              (15.34)

where

    ds ≡ n̂ ds                                                      (15.35)

is a vector infinitesimal of amplitude ds, directed normally outward from the closed surface bounding the region—ds being the area of an infinitesimal element of the surface, the area of a tiny patch. (The paragraph says much in relatively few words. If it seems opaque then try to visualize eqn. 15.34's dot product a[r] · ds, in which the vector ds represents the area and orientation of a patch of the region's enclosing surface.) A region of positive flux is a source; of negative flux, a sink.

When something like air flows through any surface—not necessarily a physical barrier but an imaginary surface like the goal line's vertical plane in a football game²²—what matters is not the surface's area as such but rather

²² The author has American football in mind, but other football games have goal lines and goal posts, too. Pick your favorite brand.

the area the surface presents to the flow. The surface presents its full area to a perpendicular flow, but otherwise the flow sees a foreshortened surface, as though the surface were projected onto a plane perpendicular to the flow. Now realize that eqn. 15.34 actually describes flux not through an open surface but through a closed—it could be the surface enclosing the region of football play to goal-post height, where wind blowing through the region, entering and leaving, would constitute zero net flux; but where a positive net flux would have barometric pressure falling and air leaving the region maybe because a storm is coming—and you've got the idea. One can contemplate the flux Φopen = ∫S a(r) · ds through an open surface as well as through a closed, but it is the outward flux (15.34) through a closed surface that will concern us here.

Refer to Fig. 15.2. The outward flux Φ of a vector field a(r) from some definite region in space is evidently

    Φ = ∬ Δax(y, z) dy dz + ∬ Δay(z, x) dz dx + ∬ Δaz(x, y) dx dy,

where

    Δax(y, z) = ∫ from xmin(y, z) to xmax(y, z) of (∂ax/∂x) dx,
    Δay(z, x) = ∫ from ymin(z, x) to ymax(z, x) of (∂ay/∂y) dy,
    Δaz(x, y) = ∫ from zmin(x, y) to zmax(x, y) of (∂az/∂z) dz,

represent the increase across the region respectively of ax, ay or az along an x̂-, ŷ- or ẑ-directed line.²³ If the field has constant derivatives ∂a/∂i, or equivalently if the region in question is small enough that the derivatives are practically constant through it, then these increases are simply

    Δax(y, z) = (∂ax/∂x) Δx(y, z),
    Δay(z, x) = (∂ay/∂y) Δy(z, x),
    Δaz(x, y) = (∂az/∂z) Δz(x, y),

²³ Naturally, if the region's boundary happens to be concave, then some lines might enter and exit the region more than once, but this merely elaborates the limits of integration along those lines. It changes the problem in no essential way.

upon which

    Φ = (∂ax/∂x) ∬ Δx(y, z) dy dz + (∂ay/∂y) ∬ Δy(z, x) dz dx
        + (∂az/∂z) ∬ Δz(x, y) dx dy.

But each of the last equation's three integrals represents the region's volume V, so

    Φ = (V)(∂ax/∂x + ∂ay/∂y + ∂az/∂z),

or, dividing through by the volume,

    Φ/V = ∂ax/∂x + ∂ay/∂y + ∂az/∂z = ∂ai/∂i = ∇·a(r).              (15.36)

We give this ratio of outward flux to volume,

    ∇·a(r) = ∂ai/∂i,                                               (15.37)

the name divergence, representing the intensity of a source or sink. Formally a dot product, divergence is invariant under reorientation of axes.

15.6.5 Curl

Curl is to divergence as the cross product is to the dot product. Curl is a little trickier to visualize, though, than divergence. It needs first the concept of circulation as follows.

The circulation of a vector field a(r) about a closed contour in space is

    Γ = ∮ a(r) · dℓ,                                               (15.38)

where, unlike the ∮S of (15.34) which represented a double integration over a surface, the ∮ here represents only a single integration. One can in general contemplate circulation about any closed contour, but it suits our purpose here to consider specifically a closed contour that happens not to depart from a single, flat plane in space.

Let [û v̂ n̂] be an orthogonal basis with n̂ normal to the contour's plane, such that travel positively along the contour tends from û toward v̂ rather than the reverse. The circulation Γ of a vector field a(r) about this contour is evidently

    Γ = ∫ Δav(v) dv − ∫ Δau(u) du,

where

    Δav(v) = ∫ from umin(v) to umax(v) of (∂av/∂u) du,
    Δau(u) = ∫ from vmin(u) to vmax(u) of (∂au/∂v) dv,

represent the increase across the contour's interior respectively of av or au along a û- or v̂-directed line. If the field has constant derivatives ∂a/∂i, or equivalently if the contour in question is short enough that the derivatives are practically constant over it, then these increases are simply

    Δav(v) = (∂av/∂u) Δu(v),
    Δau(u) = (∂au/∂v) Δv(u),

upon which

    Γ = (∂av/∂u) ∫ Δu(v) dv − (∂au/∂v) ∫ Δv(u) du.

But each of the last equation's two integrals represents the area A within the contour, so

    Γ = (A)(∂av/∂u − ∂au/∂v),

or, dividing through by the area,

    Γ/A = ∂av/∂u − ∂au/∂v

              | û     v̂     n̂    |
        = n̂ · | ∂/∂u  ∂/∂v  ∂/∂n |                                 (15.39)
              | au    av    an   |

        = n̂ · ǫijk ı̂ ∂ak/∂j = n̂ · [∇ × a(r)].                      (15.40)
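The ratio (15.39) of circulation to area can be spot-checked numerically. The sketch below (not from the text; sample field and contour assumed) computes the counterclockwise circulation about a small square in the x-y plane and compares it with ẑ·(∇×a):

```python
# Numeric sketch of n.(curl a) as circulation per unit area: a small square
# of side h in the x-y plane (n = z), for the sample field
# a = (-y**2, x*y, 0), whose z.(curl a) = d(ay)/dx - d(ax)/dy = y + 2y = 3y.
from math import isclose

def a(x, y, z):
    return (-y * y, x * y, 0.0)

x0, y0, z0 = 0.5, 0.8, 0.0   # center of the square contour
h = 1e-3                     # side of the square
A = h * h                    # enclosed area

# Counterclockwise circulation, one quadrature point per edge (the midpoint).
circ = h * (
    a(x0, y0 - h / 2, z0)[0]       # bottom edge, +x travel
    + a(x0 + h / 2, y0, z0)[1]     # right edge,  +y travel
    - a(x0, y0 + h / 2, z0)[0]     # top edge,    -x travel
    - a(x0 - h / 2, y0, z0)[1])    # left edge,   -y travel

curl_z = 3 * y0
assert isclose(circ / A, curl_z, rel_tol=1e-4)
```

Shrinking h further changes nothing essential here, since the midpoint sampling already captures the constant-derivative limit the derivation assumes.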

We give this ratio of circulation to area the name directional curl, representing the intensity of circulation, the degree of twist so to speak, about a specified axis. Directional curl, a scalar, is a property of the field and the plane. The cross product in (15.40),

    ∇ × a(r) = ǫijk ı̂ ∂ak/∂j,                                      (15.41)

however, we call curl. Curl, a vector, unexpectedly is a property of the field only. Observe that although we have developed (15.41) with respect to a contour in some specified plane, the cross product in (15.41) itself turns out to be altogether independent of any particular plane. We might have chosen another plane, and though n̂ would then be different the same (15.41) would necessarily result.

We have needed n̂ and (15.40) to motivate and develop the concept (15.41) of curl. Once developed, the concept of curl stands on its own, whereupon one can return to define directional curl more generally than (15.40). As in (15.31), here too any reference vector b or vector field b(r) can serve to direct the curl, not only n̂:

    b · [∇ × a(r)] = b · ǫijk ı̂ ∂ak/∂j = ǫijk bi ∂ak/∂j.           (15.42)

This would be the actual definition of directional curl. Note however that directional curl so defined is not a distinct kind of derivative but rather is just curl, dot-multiplied by a reference vector. An ordinary dot product, directional curl evidently cannot exceed curl in magnitude, but will equal it when n̂ points in its direction, so it may be said that curl is the locally greatest directional curl, oriented normally to the locally greatest directional curl's plane.

Formally a cross product, curl is invariant under reorientation of axes; hence, directional curl is likewise invariant.

15.7 Integral forms

The vector's distinctive maneuver is the shift between integral forms. This shift comes in two kinds. The two subsections that follow explain.

15.7.1 The divergence theorem

Section 15.6.4 contemplated the flux of a vector field a(r) from a volume small enough that the divergence ∇·a was practically constant through

it.²⁴ One would like to treat larger volumes as well. One can treat a larger volume by subdividing the volume into infinitesimal volume elements dv, each element small enough that the constant-divergence assumption holds across it. According to (15.34) and (15.36), the flux from any one of these volume elements is

    ∇·a dv = ∫ over Selement of a · ds.

Even a single volume element however can have two distinct kinds of surface area: inner surface area shared with another element, and outer surface area shared with no other element because it belongs to the surface of the larger, overall volume. Interior elements naturally have only the former kind but boundary elements have both kinds of surface area, where Selement = Sinner + Souter, so we can elaborate the last equation to read

    ∇·a dv = ∫ over Sinner of a · ds + ∫ over Souter of a · ds

for a single element. Adding all the elements together, we have that

    ∫V ∇·a dv = Σ over elements of ∫ over Sinner of a · ds
                + Σ over elements of ∫ over Souter of a · ds;

but the inner sum is null, because it includes every interior surface twice, because each interior surface is shared by two elements such that ds₂ = −ds₁ (in other words, such that the one volume element's ds on the surface the two elements share points oppositely to the other volume element's ds on the same surface), so

    ∫V ∇·a dv = Σ over elements of ∫ over Souter of a · ds.

That is,

    ∫V ∇·a dv = ∮S a · ds.                                         (15.43)

²⁴ Where waves propagate through a material interface, the fields that describe the waves can be discontinuous, whereas the narrative depends on fields not only to be continuous but even to have continuous derivatives. Fortunately, the apparent difficulty is dismissed by imagining an appropriately graded, thin transition layer in which the fields and their derivatives vary smoothly rather than jumping sharply. A professional mathematician might develop this footnote into a formidable discursion, but to us it is admittedly a distraction so we shall let the matter rest in that form.
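The divergence theorem (15.43) can be verified numerically for a simple case. A sketch (not from the text; sample field on the unit cube assumed, midpoint-rule quadrature):

```python
# Numeric check of the divergence theorem on the unit cube for the sample
# field a = (x**2, y, 0): volume integral of div a vs. outward surface flux.
from math import isclose

def a(x, y, z):
    return (x * x, y, 0.0)

def div_a(x, y, z):
    return 2 * x + 1.0      # d(ax)/dx + d(ay)/dy + d(az)/dz

n = 50
h = 1.0 / n
mids = [(k + 0.5) * h for k in range(n)]

# Left side: integral of div a over the volume (midpoint rule).
vol = sum(div_a(x, y, z) for x in mids for y in mids for z in mids) * h ** 3

# Right side: outward flux through the six faces (midpoint rule).
flux = 0.0
for u in mids:
    for v in mids:
        flux += (a(1.0, u, v)[0] - a(0.0, u, v)[0]) * h * h   # x faces
        flux += (a(u, 1.0, v)[1] - a(u, 0.0, v)[1]) * h * h   # y faces
        flux += (a(u, v, 1.0)[2] - a(u, v, 0.0)[2]) * h * h   # z faces

assert isclose(vol, flux, rel_tol=1e-6)
```

For this particular field both integrals equal 2 exactly, so the agreement is limited only by floating-point rounding; a rougher field would also need a finer grid.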

Equation (15.43) is the divergence theorem,²⁵ neatly relating the divergence within a volume to the flux from it. It is the vector's version of the fundamental theorem of calculus (7.2). The integral on the equation's left and the one on its right each arise in vector analysis more often than one might expect. When they do, eqn. (15.43) swaps the one integral for the other, often a profitable maneuver, so it is an important result.

15.7.2 Stokes' theorem

Corresponding to the divergence theorem of § 15.7.1 is a second, related theorem for directional curl, developed as follows. If an open surface—whether the surface be confined to a plane or be warped (as for example the surface of a bowl) in three dimensions—is subdivided into infinitesimal surface elements ds, each element small enough to be regarded as planar and to experience essentially constant curl, then the circulation about any one of these elements according to (15.38) and (15.39) is

    (∇ × a) · ds = ∮ over ℓelement of a · dℓ.

Even a single surface element however can have two distinct kinds of edge: inner edge shared with another element, and outer edge shared with no other element because it belongs to the edge of the larger, overall surface. Interior elements naturally have only the former kind but boundary elements have both kinds of edge, where ℓelement = ℓinner + ℓouter, so we can elaborate the last equation to read

    (∇ × a) · ds = ∫ over ℓinner of a · dℓ + ∫ over ℓouter of a · dℓ

for a single element. Adding all the elements together, we have that

    ∫S (∇ × a) · ds = Σ over elements of ∫ over ℓinner of a · dℓ
                      + Σ over elements of ∫ over ℓouter of a · dℓ;

but the inner sum is null, because it includes every interior edge twice, because each interior edge is shared by two elements such that dℓ₂ = −dℓ₁, so

    ∫S (∇ × a) · ds = Σ over elements of ∫ over ℓouter of a · dℓ.

²⁵ [47]

That is,

    ∫S (∇ × a) · ds = ∮ a · dℓ.                                    (15.44)

Equation (15.44) is Stokes' theorem,²⁶ neatly relating the directional curl over a (possibly nonplanar) surface to the circulation about it. Like the divergence theorem (15.43), Stokes' theorem (15.44) serves to swap one integral for the other.

[Chapter to be continued.]

²⁶ [47]
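Stokes' theorem (15.44) likewise admits a numeric spot-check. A sketch (not from the text; sample field on the unit square in the x-y plane assumed):

```python
# Numeric check of Stokes' theorem on the unit square in the x-y plane for
# the sample field a = (-y**2, x*y, 0): surface integral of z.(curl a)
# vs. counterclockwise circulation about the boundary.
from math import isclose

def curl_z(x, y):
    return 3 * y     # d(ay)/dx - d(ax)/dy = y + 2y

n = 200
h = 1.0 / n
mids = [(k + 0.5) * h for k in range(n)]

# Left side: surface integral of z.(curl a) (midpoint rule).
surface = sum(curl_z(x, y) for x in mids for y in mids) * h * h

# Right side: circulation about the boundary, one midpoint sample per step.
circ = 0.0
for t in mids:
    circ += -(0.0 ** 2) * h     # bottom: y = 0, +x travel, ax = -y**2
    circ += (1.0 * t) * h       # right:  x = 1, +y travel, ay = x*y
    circ -= -(1.0 ** 2) * h     # top:    y = 1, -x travel, ax = -1
    circ -= (0.0 * t) * h       # left:   x = 0, -y travel, ay = 0

assert isclose(surface, circ, rel_tol=1e-6)
```

Both sides come to 3/2 for this field, the surface integral gathering the twist spread over the interior and the circulation gathering it along the boundary.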

Plan

The following chapters are tentatively planned to complete the book.

16. Fourier and Laplace transforms
17. Probability
18. Transformations to speed series convergence
19. Basic special functions
20. The wave equation
21. Bessel functions

The following chapters are further tentatively planned but may have to await the book's second edition.

22. Orthogonal polynomials
23. Spatial Fourier transforms¹
24. Numerical integration
25. Iterative techniques²

The following chapters are yet more remotely planned.

26. Feedback control
27. Energy conservation
28. Stochastics
29. Statistical mechanics

¹ Weyl and Sommerfeld forms are to be included here.
² The conjugate-gradient algorithm is to be treated here.


Appendix A

Hexadecimal and other notational matters

The importance of conventional mathematical notation is hard to overstate. Such notation serves two distinct purposes: it conveys mathematical ideas from writer to reader, and it concisely summarizes complex ideas on paper to the writer himself. Without the notation, one would find it difficult even to think clearly about the math; to discuss it with others, nearly impossible.

The right notation is not always found at hand, of course. New mathematical ideas occasionally find no adequate preëstablished notation, when it falls to the discoverer and his colleagues to establish new notation to meet the need. A more difficult problem arises when old notation exists but is inelegant in modern use.

Convention is a hard hill to climb, and rightly so. Nevertheless, slavish devotion to convention does not serve the literature well; for how else can notation improve over time, if writers will not incrementally improve it? Consider the notation of the algebraist Girolamo Cardano in his 1539 letter to Tartaglia:

    [T]he cube of one-third of the coefficient of the unknown is greater
    in value than the square of one-half of the number. [35]

If Cardano lived today, surely he would express the same thought in the form

    (a/3)³ > (x/2)².

Good notation matters.

Although this book has no brief to overhaul applied mathematical notation generally, it does seek to aid the honorable cause of notational evolution



subscript. For example, the sum

        M   N
    S = Σ   Σ   a²ij
       i=1 j=1

might be written less thoroughly but more readably as

    S = Σ   a²ij
       i,j

if the meaning of the latter were clear from the context. When to omit subscripts and such is naturally a matter of style and subjective judgment, but in practice such judgment is often not hard to render. The balance is between showing few enough symbols that the interesting parts of an equation are not obscured visually in a tangle and a haze of redundant little letters, strokes and squiggles, on the one hand; and on the other hand showing enough detail that the reader who opens the book directly to the page has a fair chance to understand what is written there without studying the whole book carefully up to that point. Where appropriate, this book often condenses notation and omits redundant symbols.

Appendix B

The Greek alphabet
Mathematical experience ﬁnds the Roman alphabet to lack suﬃcient symbols to write higher mathematics clearly. Although not completely solving the problem, the addition of the Greek alphabet helps. See Table B.1. When ﬁrst seen in mathematical writing, the Greek letters take on a wise, mysterious aura. Well, the aura is ﬁne—the Greek letters are pretty— but don’t let the Greek letters throw you. They’re just letters. We use them not because we want to be wise and mysterious1 but rather because
¹ Well, you can use them to be wise and mysterious if you want to. It's kind of fun, actually, when you're dealing with someone who doesn't understand math—if what you want is for him to go away and leave you alone. Otherwise, we tend to use Roman and Greek letters in various conventional ways: Greek minuscules (lower-case letters) for angles; Roman capitals for matrices; e for the natural logarithmic base; f and g for unspecified functions; i, j, k, m, n, M and N for integers; P and Q for metasyntactic elements (the mathematical equivalents of foo and bar); t, T and τ for time; d, δ and ∆ for change; A, B and C for unknown coefficients; J, Y and H for Bessel functions; etc. Even with the Greek, there still are not enough letters, so each letter serves multiple conventional roles: for example, i as an integer, an a-c electric current, or—most commonly—the imaginary unit, depending on the context. Cases even arise in which a quantity falls back to an alternate traditional letter because its primary traditional letter is already in use: for example, the imaginary unit falls back from i to j where the former represents an a-c electric current. This is not to say that any letter goes. If someone wrote e² + π² = O² for some reason instead of the traditional a² + b² = c² for the Pythagorean theorem, you would not find that person's version so easy to read, would you? Mathematically, maybe it doesn't matter, but the choice of letters is not a matter of arbitrary utility only but also of convention, tradition and style: one of the early



Table B.1: The Roman and Greek alphabets.

ROMAN
Aa Bb Cc Dd Ee Ff Gg Hh Ii Jj Kk Lℓ Mm
Nn Oo Pp Qq Rr Ss Tt Uu Vv Ww Xx Yy Zz

GREEK
Aα  alpha      Hη  eta       Nν  nu        Tτ  tau
Bβ  beta       Θθ  theta     Ξξ  xi        Υυ  upsilon
Γγ  gamma      Iι  iota      Oo  omicron   Φφ  phi
∆δ  delta      Kκ  kappa     Ππ  pi        Xχ  chi
Eǫ  epsilon    Λλ  lambda    Pρ  rho       Ψψ  psi
Zζ  zeta       Mµ  mu        Σσ  sigma     Ωω  omega

we simply do not have enough Roman letters. An equation like α2 + β 2 = γ 2 says no more than does an equation like a2 + b2 = c2 , after all. The letters are just diﬀerent. Applied as well as professional mathematicians tend to use Roman and Greek letters in certain long-established conventional sets: abcd; f gh; ijkℓ; mn (sometimes followed by pqrst as necessary); pqr; st; uvw; xyzw. For
writers in a ﬁeld has chosen some letter—who knows why?—then the rest of us follow. This is how it usually works. When writing notes for your own personal use, of course, you can use whichever letter you want. Probably you will ﬁnd yourself using the letter you are used to seeing in print, but a letter is a letter; any letter will serve. Excepting the rule, for some obscure reason almost never broken, that i not follow h, the mathematical letter convention consists of vague stylistic guidelines rather than clear, hard rules. You can bend the convention at need. When unsure of which letter to use, just pick one; you can always go back and change the letter in your notes later if you need to.

437 the Greek: αβγ; δǫ; κλµνξ; ρστ ; φψωθχ (the letters ζηθξ can also form a set, but these oftener serve individually). Greek letters are frequently paired with their Roman congeners as appropriate: aα; bβ; cgγ; dδ; eǫ; f φ; kκ; ℓλ; mµ; nν; pπ; rρ; sσ; tτ ; xχ; zζ.2 Some applications (even in this book) group the letters slightly diﬀerently, but most tend to group them approximately as shown. Even in Western languages other than English, mathematical convention seems to group the letters in about the same way. Naturally, you can use whatever symbols you like in your own private papers, whether the symbols are letters or not; but other people will ﬁnd your work easier to read if you respect the convention when you can. It is a special stylistic error to let the Roman letters ijkℓmn, which typically represent integers, follow the Roman letters abcdef gh directly when identifying mathematical quantities. The Roman letter i follows the Roman letter h only in the alphabet, almost never in mathematics. If necessary, p can follow h (curiously, p can alternatively follow n, but never did convention claim to be logical). Mathematicians usually avoid letters like the Greek capital H (eta), which looks just like the Roman capital H, even though H (eta) is an entirely proper member of the Greek alphabet. The Greek minuscule υ (upsilon) is avoided for like reason, for mathematical symbols are useful only insofar as we can visually tell them apart.3
2 The capital pair Y Υ is occasionally seen but is awkward both because the Greek minuscule υ is visually almost indistinguishable from the unrelated (or distantly related) Roman minuscule v; and because the ancient Romans regarded the letter Y not as a congener but as the Greek letter itself, seldom used but to spell Greek words in the Roman alphabet. To use Y and Υ as separate symbols is to display an indiﬀerence to, easily misinterpreted as an ignorance of, the Graeco-Roman sense of the thing—which is silly, really, if you think about it, since no one objects when you diﬀerentiate j from i, or u and w from v—but, anyway, one is probably the wiser to tend to limit the mathematical use of the symbol Υ to the very few instances in which established convention decrees it. (In English particularly, there is also an old typographical ambiguity between Y and a Germanic, non-Roman letter named “thorn” that has practically vanished from English today, to the point that the typeface in which you are reading these words lacks a glyph for it—but which suﬃciently literate writers are still expected to recognize on sight. This is one more reason to tend to avoid Υ when you can, a Greek letter that makes you look ignorant when you use it wrong and pretentious when you use it right. You can’t win.) The history of the alphabets is extremely interesting. Unfortunately, a footnote in an appendix to a book on derivations of applied mathematics is probably not the right place for an essay on the topic, so we’ll let the matter rest there. 3 No citation supports this appendix, whose contents (besides the Roman and Greek alphabets themselves, which are what they are) are inferred from the author’s subjective observation of seemingly consistent practice in English-language applied mathematical

438

APPENDIX B. THE GREEK ALPHABET

publishing, plus some German and a little French, dating back to the 1930s. From the thousands of readers of drafts of the book, the author has yet to receive a single serious suggestion that the appendix were wrong—a lack that constitutes a sort of negative (though admittedly unveriﬁable) citation if you like. If a mathematical style guide exists that formally studies the letter convention’s origins, the author is unaware of it (and if a graduate student in history or the languages who, for some reason, happens to be reading these words seeks an esoteric, maybe untouched topic on which to write a master’s thesis, why, there is one).

Appendix C

Manuscript history
The book in its present form is based on various unpublished drafts and notes of mine, plus some of my wife Kristie’s (née Hancock), going back to 1983 when I was fifteen years of age. What prompted the contest I can no longer remember, but the notes began one day when I challenged a high-school classmate to prove the quadratic formula. The classmate responded that he didn’t need to prove the quadratic formula because the proof was in the class math textbook, then counterchallenged me to prove the Pythagorean theorem. Admittedly obnoxious (I was fifteen, after all) but not to be outdone, I whipped out a pencil and paper on the spot and started working. But I found that I could not prove the theorem that day. The next day I did find a proof in the school library,¹ writing it down, adding to it the proof of the quadratic formula plus a rather inefficient proof of my own invention of the law of cosines. Soon thereafter the school’s chemistry instructor happened to mention that the angle between the tetrahedrally arranged four carbon-hydrogen bonds in a methane molecule was 109°, so from a symmetry argument I proved that result to myself, too, adding it to my little collection of proofs. That is how it started.²

The book actually has earlier roots than these. In 1979, when I was twelve years old, my father bought our family’s first eight-bit computer. The computer’s built-in BASIC programming-language interpreter exposed

¹ A better proof is found in § 2.10.
² Fellow gear-heads who lived through that era at about the same age might want to date me against the disappearance of the slide rule. Answer: in my country, or at least at my high school, I was three years too young to use a slide rule. The kids born in 1964 learned the slide rule; those born in 1965 did not. I wasn’t born till 1967, so for better or for worse I always had a pocket calculator in high school. My family had an eight-bit computer at home, too, as we shall see.
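The book itself does not spell out the methane computation mentioned above, but the symmetry argument can be sketched numerically: place the four C-H bonds along alternating corners of a cube centered on the carbon atom, and every pair of bonds meets at the same angle, arccos(−1/3) ≈ 109.47° (the “109°” of the chemistry instructor, rounded down). The code below is an illustrative sketch, not the author’s original proof.

```python
import math

# Place the four C-H bonds of methane along alternating corners of a cube
# centered on the carbon atom; by symmetry every pair of bonds then meets
# at the same angle.
bonds = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]

def angle_deg(u, v):
    """Angle between vectors u and v, in degrees."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return math.degrees(math.acos(dot / norm))

# Each pair gives cos(theta) = -1/3, hence theta = arccos(-1/3).
theta = angle_deg(bonds[0], bonds[1])
print(round(theta, 2))  # -> 109.47
```

Checking the other five bond pairs gives the same angle, which is the whole of the symmetry argument.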


it up.” The stormclouds dissipated from the first sergeant’s face. No one ever said anything to me about my boots (in fact, as far as I remember, the first sergeant—who saw me seldom in any case—never spoke to me again). The platoon sergeant thereafter explicitly permitted me to read the Lectures on duty after midnight on nights when there was nothing else to do, so in the last several months of my military service I did read a number of them. It is fair to say that I also kept my boots better polished.

In Volume I, Chapter 6, of the Lectures there is a lovely introduction to probability theory. It discusses the classic problem of the “random walk” in some detail, then states without proof that the generalization of the random walk leads to the Gaussian distribution

    p(x) = exp(−x²/2σ²) / (σ√(2π)).

For the derivation of this remarkable theorem, I scanned the book in vain. One had no Internet access in those days, but besides a well equipped gym the Army post also had a tiny library, and in one yellowed volume in the library—who knows how such a book got there?—I did find a derivation of the 1/(σ√(2π)) factor.³ The exponential factor, the volume did not derive. Several days later, I chanced to find myself in Munich with an hour or two to spare, which I spent in the university library seeking the missing part of the proof, but lack of time and unfamiliarity with such a German site defeated me. Back at the Army post, I had to sweat the proof out on my own over the ensuing weeks. Nevertheless, eventually I did obtain a proof which made sense to me. Writing the proof down carefully, I pulled the old high-school math notes out of my military footlocker (for some reason I had kept the notes and even brought them to Germany), dusted them off, and added to them the new Gaussian proof.

That is how it has gone. To the old notes, I have added new proofs from time to time; and although somehow I have misplaced the original high-school leaves I took to Germany with me, the notes have nevertheless grown with the passing years. After I had left the Army, married, taken my degree at the university, begun work as a building construction engineer, and started a family—when the latest proof to join my notes was a mathematical justification of the standard industrial construction technique for measuring the resistance-to-ground of a new building’s electrical grounding system—I was delighted to discover that Eric W. Weisstein had compiled
³ The citation is now unfortunately long lost.
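The random-walk claim the Lectures state can at least be checked numerically, if not proved: the net displacement of a walk of n unit strides is approximately Gaussian with σ = √n. The following sketch does not appear in the book and is only an illustration of the claim, not a derivation.

```python
import math
import random

def gaussian(x, sigma):
    """The density the Lectures state: p(x) = exp(-x^2/2 sigma^2) / (sigma sqrt(2 pi))."""
    return math.exp(-x * x / (2.0 * sigma * sigma)) / (sigma * math.sqrt(2.0 * math.pi))

def random_walk(steps, rng):
    """Net displacement after `steps` independent unit strides, each +1 or -1."""
    return sum(rng.choice((-1, 1)) for _ in range(steps))

rng = random.Random(0)   # fixed seed so the sketch is repeatable
steps = 100              # strides per walk
walks = [random_walk(steps, rng) for _ in range(4000)]

# For a walk of n unit strides the theory gives sigma = sqrt(n).
sigma = math.sqrt(steps)
mean = sum(walks) / len(walks)
var = sum((w - mean) ** 2 for w in walks) / len(walks)

# The sample statistics should approach the Gaussian's: mean near 0,
# standard deviation near sigma = 10.
print(round(mean, 2), round(math.sqrt(var), 2))
```

Binning the walks into a histogram and overlaying `gaussian(x, sigma)` shows the fit directly; the derivation itself is the part the author had to sweat out.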


and published [51] a wide-ranging collection of mathematical results in a spirit not entirely dissimilar to that of my own notes. A significant difference remained, however, between Weisstein’s work and my own. The difference was and is fourfold:

1. Number theory, mathematical recreations and odd mathematical names interest Weisstein much more than they interest me; my own tastes run toward math directly useful in known physical applications. The selection of topics in each body of work reflects this difference.

2. Weisstein often includes results without proof. This is fine, but for my own part I happen to like proofs.

3. Weisstein lists results encyclopedically, alphabetically by name. I organize results more traditionally by topic, leaving alphabetization to the book’s index, that readers who wish to do so can coherently read the book from front to back.⁴

4. I have eventually developed an interest in the free-software movement, joining it as a Debian Developer [10]; and by these lights and by the standard of the Debian Free Software Guidelines (DFSG) [11], Weisstein’s work is not free. No objection to non-free work as such is raised here, but the book you are reading is free in the DFSG sense.

A different mathematical reference, even better in some ways than Weisstein’s and (if I understand correctly) indeed free in the DFSG sense, is emerging at the time of this writing in the on-line pages of the general-purpose encyclopedia Wikipedia [52]. Although Wikipedia reads unevenly, remains generally uncitable,⁵ forms no coherent whole, and seems to suffer a certain competition among some of its mathematical editors as to which

⁴ There is an ironic personal story in this. As children in the 1970s, my brother and I had a 1959 World Book encyclopedia in our bedroom, about twenty volumes. It was then a bit outdated (in fact the world had changed tremendously in the fifteen or twenty years following 1959, so the encyclopedia was more than a bit outdated), but the two of us still used it sometimes. Only years later did I learn that my father, who in 1959 was fourteen years old, had bought the encyclopedia with money he had earned delivering newspapers daily before dawn, and then had read the entire encyclopedia, front to back. My father played linebacker on the football team and worked a job after school, too, so where he found the time or the inclination to read an entire encyclopedia, I’ll never know. Nonetheless, it does prove that even an encyclopedia can be read from front to back.
⁵ Some “Wikipedians” do seem actively to be working on making Wikipedia authoritatively citable. The underlying philosophy and basic plan of Wikipedia admittedly tend to thwart their efforts, but their efforts nevertheless seem to continue to progress. We shall see. Wikipedia is a remarkable, monumental creation.

use the math to model something.

The ideal author of such a book as this would probably hold two doctorates: one in mathematics and the other in engineering or the like. The ideal author lacking, I have written the book.

So here you have my old high-school notes, extended over twenty years and through the course of two-and-a-half university degrees, now partly typed and revised for the first time as a LaTeX manuscript. Where this manuscript will go in the future is hard to guess. Perhaps the revision you are reading is the last. Who can say? The manuscript met an uncommonly enthusiastic reception at Debconf 6 [10] May 2006 at Oaxtepec, Mexico, and in August of the same year it warmly welcomed Karl Sarnow and Xplora Knoppix [54] aboard as the second official distributor of the book. Such developments augur well for the book’s future at least.

But in the meantime, if anyone should challenge you to prove the Pythagorean theorem on the spot, why, whip this book out and turn to § 2.10. That should confound ’em.

THB

Bibliography

[1] Larry C. Andrews. Special Functions of Mathematics for Engineers. Macmillan, New York, 1985.
[2] Christopher Beattie, John Rossi, and Robert C. Rogers. Notes on Matrix Theory. Unpublished, Department of Mathematics, Virginia Polytechnic Institute and State University, Blacksburg, Va., 6 Dec. 2001.
[3] R. Byron Bird, Warren E. Stewart, and Edwin N. Lightfoot. Transport Phenomena. John Wiley & Sons, New York, 1960.
[4] Kristie H. Black. Private conversation, 1996.
[5] Thaddeus H. Black. Derivations of Applied Mathematics. The Debian Project, http://www.debian.org/, 19 April 2008.
[6] Gary S. Brown. Lecture, Virginia Polytechnic Institute and State University, Blacksburg, Va., 2004–08.
[7] David K. Cheng. Field and Wave Electromagnetics. Series in Electrical Engineering. Addison-Wesley, Reading, Mass., 2nd edition, 1989.
[8] Richard Courant and David Hilbert. Methods of Mathematical Physics. Interscience (Wiley), New York, first English edition, 1953.
[9] Eric de Sturler. Lecture, Virginia Polytechnic Institute and State University, Blacksburg, Va., 2007.
[10] The Debian Project. http://www.debian.org/.
[11] The Debian Project. Debian Free Software Guidelines, version 1.1. http://www.debian.org/social_contract#guidelines.
[12] G. Doetsch. Guide to the Applications of the Laplace and z-Transforms. Van Nostrand Reinhold, London, 1971. Referenced indirectly by way of [36].
[13] Richard P. Feynman, Robert B. Leighton, and Matthew Sands. The Feynman Lectures on Physics. Addison-Wesley, Reading, Mass., 1963–65. Three volumes.
[14] Stephen D. Fisher. Complex Variables. Books on Mathematics. Dover, Mineola, N.Y., 2nd edition, 1990.
[15] Joel N. Franklin. Matrix Theory. Books on Mathematics. Dover, Mineola, N.Y., 1968.
[16] The Free Software Foundation. GNU General Public License, version 2. /usr/share/common-licenses/GPL-2 on a Debian system. The Debian Project: http://www.debian.org/. The Free Software Foundation: 51 Franklin St, Fifth Floor, Boston, Mass. 02110-1301, USA.
[17] Stephen H. Friedberg, Arnold J. Insel, and Lawrence E. Spence. Linear Algebra. Pearson Education/Prentice-Hall, Upper Saddle River, N.J., 4th edition, 2003.
[18] Edward Gibbon. The History of the Decline and Fall of the Roman Empire. 1788.
[19] William Goldman. The Princess Bride. Ballantine, New York, 1973.
[20] Richard W. Hamming. Methods of Mathematics Applied to Calculus, Probability, and Statistics. Books on Mathematics. Dover, Mineola, N.Y., 1985.
[21] Roger F. Harrington. Time-harmonic Electromagnetic Fields. Texts in Electrical Engineering. McGraw-Hill, New York, 1961.
[22] Harvard University (author unknown). “Math 21B review”. http://www.math.harvard.edu/archive/21b_fall_03/final/21breview.pdf, 13 Jan. 2004.
[23] Jim Hefferon. Linear Algebra. Mathematics, St. Michael’s College, Colchester, Vt., 20 May 2006. (The book is free software, and besides is the best book of the kind the author of the book you hold has encountered. As of 3 Nov. 2007 at least, one can download it from http://joshua.smcvt.edu/linearalgebra/.)
[24] Francis B. Hildebrand. Advanced Calculus for Applications. Prentice-Hall, Englewood Cliffs, N.J., 2nd edition, 1976.
[25] Theo Hopman. “Introduction to indicial notation”. http://www.uoguelph.ca/~thopman/246/indicial.pdf, 28 Aug. 2002.
[26] Intel Corporation. IA-32 Intel Architecture Software Developer’s Manual, 19th edition, March 2006.
[27] Brian W. Kernighan and Dennis M. Ritchie. The C Programming Language. Software Series. Prentice Hall PTR, Englewood Cliffs, N.J., 2nd edition, 1988.
[28] Konrad Knopp. Theory and Application of Infinite Series. Hafner, New York, 2nd English ed., revised in accordance with the 4th German edition, 1947.
[29] Werner E. Kohler. Lecture, Virginia Polytechnic Institute and State University, Blacksburg, Va., 2007.
[30] The Labor Law Talk. http://encyclopedia.laborlawtalk.com/Applied_mathematics. As retrieved 1 Sept. 2005.
[31] David C. Lay. Linear Algebra and Its Applications. Addison-Wesley, Reading, Mass., 2nd edition, 1994.
[32] N. N. Lebedev. Special Functions and Their Applications. Books on Mathematics. Dover, Mineola, N.Y., revised English edition, 1965.
[33] David McMahon. Quantum Mechanics Demystified. Demystified Series. McGraw-Hill, New York, 2005.
[34] Ali H. Nayfeh and Balakumar Balachandran. Applied Nonlinear Dynamics: Analytical, Computational and Experimental Methods. Series in Nonlinear Science. Wiley, New York, 1995.
[35] John J. O’Connor and Edmund F. Robertson. The MacTutor History of Mathematics. School of Mathematics and Statistics, University of St. Andrews, Scotland, http://www-history.mcs.st-andrews.ac.uk/. As retrieved 12 Oct. 2005 through 2 Nov. 2007.
[36] Charles L. Phillips and John M. Parr. Signals, Systems and Transforms. Prentice-Hall, Englewood Cliffs, N.J., 1995.
[37] PlanetMath.org. Planet Math. http://www.planetmath.org/. As retrieved 21 Sept. 2007 through 20 Feb. 2008.
[38] Reed College (author unknown). “Levi-Civita symbol, Lecture 1, Physics 321: Electrodynamics”. http://academic.reed.edu/physics/courses/Physics321/page1/files/LCHandout.pdf, Portland, Ore., 27 Aug. 2007.
[39] Carl Sagan. Cosmos. Random House, New York, 1980.
[40] Wayne A. Scales. Lecture, Virginia Polytechnic Institute and State University, Blacksburg, Va., 2004.
[41] Adel S. Sedra and Kenneth C. Smith. Microelectronic Circuits. Series in Electrical Engineering. Oxford University Press, New York, 3rd edition, 1991.
[42] Al Shenk. Calculus and Analytic Geometry. Scott, Foresman & Co., Glenview, Ill., 3rd edition, 1984.
[43] William L. Shirer. The Rise and Fall of the Third Reich. Simon & Schuster, New York, 1960.
[44] Murray R. Spiegel. Complex Variables: with an Introduction to Conformal Mapping and Its Applications. Schaum’s Outline Series. McGraw-Hill, New York, 1964.
[45] Susan Stepney. “Euclid’s proof that there are an infinite number of primes”. http://www-users.cs.york.ac.uk/susan/cyc/p/primeprf.htm. As retrieved 28 April 2006.
[46] James Stewart, Lothar Redlin, and Saleem Watson. Precalculus: Mathematics for Calculus. Brooks/Cole, Pacific Grove, Calif., 3rd edition, 1993.
[47] Julius Adams Stratton. Electromagnetic Theory. International Series in Pure and Applied Physics. McGraw-Hill, New York, 1941.
[48] Bjarne Stroustrup. The C++ Programming Language. Addison-Wesley, Boston, “special” (third-and-a-half?) edition, 2000.
[49] Henk A. van der Vorst. Iterative Krylov Methods for Large Linear Systems. Number 13 in Monographs on Applied and Computational Mathematics. Cambridge University Press, Cambridge, 2003.
[50] Eric W. Weisstein. Mathworld. http://mathworld.wolfram.com/. As retrieved 29 May 2006 through 20 Feb. 2008.
[51] Eric W. Weisstein. CRC Concise Encyclopedia of Mathematics. Chapman & Hall/CRC, Boca Raton, Fla., 2nd edition, 2003.
[52] The Wikimedia Foundation. Wikipedia. http://en.wikipedia.org/.
[53] J. H. Wilkinson. The Algebraic Eigenvalue Problem. Monographs on Numerical Analysis. Clarendon Press, Oxford, 1965.
[54] Xplora. Xplora Knoppix. http://www.xplora.org/downloads/Knoppix/.


83 homogeneous solution.INDEX diﬀerences of against the GaussJordan. 103 prime. 7 iﬀ. 111 hyperbolic functions. 55 inner surface. 280. variable. 130 156 independent variable. 353 factor Q of. 401 induction. 111 hypotenuse. 219 inﬁnity. 374 indeﬁnite integral. 3 inﬂection. 245. 353 factor S of. 161 145. 36 integral. 312 half angle. 374 imaginary number. 229 guessing the form of a solution. 125 hypothesis. 130 Hessenberg. 178 identity as accretion or area. 248 identity. 76 handwriting. 122 index. 112 properties of. 108. 248 balanced form. 68 Hilbert. 353 inverting a matrix by. 333 inner edge. 330 inner product. 168 Hessenberg matrix. 249 impossibility of promoting. 127 r-dimensional. 55 integer. 130 identity matrix. 356 Gram-Schmidt kernel formula. 240. (1915–1998). 435 Greenwich. 423 hour angle. 307 rank-r. 41 imaginary unit. 416 inﬁnite dimensionality. 134. 328 and series summation. 384 dropping when negligible. inﬁnite diﬀerentiability. 3. 305. 13 headwind. 10. 347 grapes. 197 gunpowder. 384 practical size of. 119. 145 inequality.and higher-order. 102 compositional uniqueness of. 384 independent. 345 hour. 425 horse. 41. 353 Gram-Schmidt process. arithmetic. 330 459 commutation of. 341 Heaviside. 125. 154 Heaviside unit step function. Oliver (1850–1925). 248 as shortcut to a sum. 15 hut. 155 ill conditioned matrix. 69 higher-order algebra. 68 and the Leibnitz notation. 367. David (1862–1943). 126 vector. 39 imaginary part. 179 index of summation. 412 as antiderivative. 129 . 109 Greek alphabet. 242 Hermite. 384 inﬁnitesimal. 236 harmonic series. 55 independent inﬁnitesimal Hamming. reﬂected. 109 indeterminate form. 76 Hermitian matrix. Richard W. Gerhard (1874–1925). 68 hexadecimal. 143 independence. 85 harmonic mean. Charles (1822–1901). variable. 414 composite. 432 second. 349 imprecise quantity. algebraic. 39 implementation eﬃcient. 55 guessing roots.

198 interior element. Wilhelm (1842–1899). 84 o labor union. 364 existence of. Leopold (1823–1891). 138. 257 of a matrix transpose or adjoint. Guillaume de (1661–1704). 408 l’Hˆpital’s rule. 290 interchange. 144 by antiderivative. 80. 103 inverse trigonometric family of functions. 423 integral swapping. powers and logarithms. 357. 408 properties of. 21. 138 integral forms of vector calculus. 322 inverse complex exponential derivative of. 399 inverse INDEX determinant of. 254. 7 symbolic. 197 checking. 282. 2 iteration. 353 kernel matrix. 320 uniqueness of. 328 Gauss-Jordan formula for. 143 multiple. 322 of a matrix product. 325 converting between two of. 187 . 194 by Taylor series. 425 invariance under the reorientation of axes. 167. 103 inversion. 320 by determinant. 138 surface. 374 Jacobi. 349 interchange operator elementary. 241 Kronecker. 143 of a product of exponentials. 205 by parts. 261 interchange quasielementary. 138 vector contour. 337. 144 closed surface. 252 irrational number. 276. arithmetic. 335 Laurent series. 140 triple. Pierre Alphonse (1813–1854). 21. 209 sixfold. 326 kernel. 168 concept of. 193 Intel. 409 sifting property of. 326 Gram-Schmidt formula for. 398. 326. 359 refusing an. 365 inversion. 7 invertibility.460 closed complex contour. Carl Gustav Jacob (1804–1851). 423. 137 of a rational function. 142 complex contour. 339 of the elementary operator. 138 integrand. 330 Kronecker delta. 193 integration analytical versus numeric. 195 by substitution. 319. 351. 330 kernel space. 125 contour. 241. 365 multiplicative. 187 o l’Hˆpital. 86. 241. 138 indeﬁnite. 193 by closed contour. 257 rank-r. 84. 145 volume. 276 Jacobian derivative. 138 fourfold. 215 integration techniques. 144 deﬁnite. 344 Jordan. 315. 115 irreducibility. 216 by unknown coeﬃcients. 200 by partial-fraction expansion. 261 interest. 200 closed contour. 250 general. 187 Laurent. 143 double. 325 alternate formula for. 322 mutual.

249 long division. 134. 25. 116 procedure for. 350 loop counter. 109 logarithmic derivative. 312 linear dependence. 408 properties of. 215 natural. 51. logical. 374 preservation of. 1 . 160 Maclaurin. 94 and the antiderivative. 313. 118 mass density. 177. 138 matching coeﬃcients. 236 linearity. 417 linear superposition. natural. 336 taxonomy of.INDEX law of cosines. 80 logic. 408 limit. 156. 134. 335 linear algebra. 333 overdetermined. 134 of a function. 11. 356 unit. 59 least squares. 324 nonoverdetermined. 358 mason. 178 geometric. 242. 172 linear system classiﬁcation of. 97 logarithm. 337 leftward multitarget addition operator. 350 majorization. 38 marking permutor. 358 quasielementary. 335. 313 underdetermined. 356 Levi-Civita epsilon. general solution of. 13 lower triangular matrix. 180 properties of. 134 unary. 33 resemblance of to x0 . 332 nonoverdetermined. 306 mantissa. 114 lone-element matrix. error of. 60 law of sines. 27 loop. 312 linear expression. 36 Leibnitz notation. 179 maneuver. 22. 382 majority. 280. 32 integrating a product of a power and. 130 Leibnitz. 76 length curved. 346 of an eigenvalue. 341 logical ﬂaw. 76. 26 mathematician applied. 374 magnitude. 335 least-squares solution. 306 reverse. 290 mapping. 332 ﬁtting a. 219 linear independence. 417 loan. 134 of an operator. 313 linear transformation. 313 exactly determined. 117 461 logarithm. 64 of a vector. Gottfried Wilhelm (1646–1716). 265 leg. 334 nonoverdetermined. 187 by z − α. 198 locus. 134. 69 line. 40. 45 preservation of. 313 degenerate. 312 linear operator. 280. 233 linear combination. 409 Levi-Civita. 95. 67. 160 magniﬁcation. 280. particular solution of. 265 Maclaurin series. Colin (1698–1746). Tullio (1873–1941). 374 main diagonal. 96 derivative of. 194 compared to xa .

269 identity. 306 memory. 146 projection. 312.462 INDEX padding with zeros. computer. 258 matrix forms. 248 unit triangular. 315 exact. applied provenance of. 323 broad. 279 lone-element. noncommutivity of. 114. 245 self-adjoint. 245. 146. 373 metric. 265 ill conditioned. 83 null column of. 237 geometric. 236 real. 367. 366 matrix rudiments. 236 arithmetic. 246 dimension-limited. 242 harmonic. 233 raised to a complex power. 212 parallel unit triangular. 291. 236 foundations of. 237 multiplication. 307 unit upper triangular. 254. 390 mathematics. 274 ing. 327. 245 unit lower triangular. 354 inversion of. 323. construction of. 312. 122 multiplication by a scalar. 320. coincident properties of. partial. 365 matrix operator. 351 scalar. 338 null. 315. 312 truncating. 121 motivation to. 178 orthogonally complementary. 374 inversion properties of. 267 full-rank. 165. 417 large. 258 arithmetic of. 83 motivation for. 351 minorization. associativity of. 237 basic operations of. 393 main diagonal of. 156 perpendicular. 242. 242. the impossibility of promotunit triangular. 334. 347 mirror. 382 maximum. 109 . 265 identity. 249 general identity. 332 row of. 273 applied. 265 Hermitian. 312. 291 column of. 385 sparse. 372 matrix. 2. 244 square. justiﬁed by the end. 384 unit parallel triangular. 388 diagonalizable. 337 basic use of. properties mathematics of. 249 matrix vector. 237. 366 eigenvalue. 384 commutation of. 2. 242 professional. 234 rank of. 122 multiplication of. 234. 122 multiplication. 374 unitary. 114. 347 professional or pure. 254 singular. 382 nondiagonalizable versus singular. 236 rectangular. 244 minimum. 310 addition. 1. 320 dimensionality. 374 degenerate. 323 singular value. 233 mean. 349 237 meromorphic function. 242 inverting. 305 triangular. 233. means. 187 nondiagonalizable. 312. 316 square. 319 matrix condition. 371 tall. 237 rank-r inverse of. 233. 242.

91 natural exponential family of functions. 265 rightward. 340. 40 Moore. 111 numerator. Old Royal. 280 number. 139 normalization. 351. 344 Newton-Raphson iteration. particular solution of. 399 model. 344 nobleman. 334 nonoverdetermined linear system. 115 rational. 264 multivariate Newton-Raphson iteration. Sir Isaac (1642–1727). 335. 205 Observatory. 332 multiple pole. 91 derivative of. 91 real. 114. E. 103 neighborhood. 55 . 210 enclosing. 109 error of. 103 natural logarithm. 91 compared to xa . 393 null column. 322 nonlinearity. 162. 173 multiple-valued function. 330 notation for the vector. 22. 115 real. 68 number theory. 405 of the operator. 335 Moore-Penrose pseudoinverse. 86. 350 463 Newton. 96 complex. 353 motivation. (1862–1932). 344 Napoleon. 397 of a vector by a scalar. 233 multitarget addition operator. 67. 324 mountain road. 92. 162.INDEX mnemonic. 38. 333 normal unit vector. 65 complex.H. 331 nonnegative deﬁniteness. 263 downward. 7 multiplier. 351 null matrix. 219 multivariate. 237 noncommutivity of the cross product. 2. 374 north. 39 complex. 403 normal vector or line. 347. 180 of a complex number. 265 upward. 107 error of. 397 of matrices. 371 nonoverdetermined linear system. 65 of a vector. 382 versus a singular matrix. 373 noninvertibility. 95. 264 leftward. 96 derivative of. 164 multiplication. 41. 75. 206. actuality of. 244 null vector. 44 nonanalytic point. 39 irrational. 39 very large or very small. 39. 5. 242 multiplicative inversion. 13. 45. concise. 400 noncommutivity of matrix multiplication. general solution of. 194 compared to xa . 332 nonoverdetermined linear system. 101 natural logarithmic family of functions. 180 existence of. 305 imaginary. 108 exact. 330 natural exponential. 237. 146 nonanalytic function. 416 of the vector. 402 modulus. 171 nonassociativity of the cross product. 399 nondiagonalizable matrix. 94 and the antiderivative. 161 nesting. 86.

55 one. 71 Pascal. Marc-Antoine (1755–1836). 261 permutor. 264 using a variable up. 336 INDEX padding a matrix with zeros. 134. 274 partial-fraction expansion. 62 orthonormalization. 205 particular solution. 262 leftward multitarget addition. enclosing a. 383 Pfufnik. 72 neighbors in. 164. 62. 313. 401 variable. 403 perturbation. 84. 110 odd function. 269 properties of. 70 permutation matrix. 417 unresolved. Gorbag. 263 nonlinear. 347 perpendicular unit vector. 328 Penrose. Max (1858–1947). 250 Old Royal Observatory. 357. 332 in vector notation. 265 truncation. 51 origin. 162. 40 physical world. 198 pencil. 265 linear. 258. 393 physicist. 258 oﬀ-diagonal entries. 121 parallel unit triangular matrix. 347. 357 marking. 399 and the Levi-Civita epsilon. Roger (1931–). 415. 335. 331 pitch. 356 outer edge. 11 orientation. 416 unary linear. 332 plausible assumption. 7. 51. 319. 38. 408 Parseval’s principle. 175 partial derivative. 171. 133 downward multitarget addition. 416 optimality. 187 multiple. 264 elementary. 118. 132. 334 Planck. 261 general scaling. 242 parallel addition. 299. 51 pole. 425 outer surface. 245 operator. 51. 3 pilot. 29 plane. 347 orthonormalizing decomposition. 396 pivot. 220 parallel subtraction. 51 orthogonal basis. 185 oddness. 45. 415 partial unit triangular matrix.464 Occam’s razor abusing. 402 orthogonal complement. 206. 417 multitarget addition. 418 upward multitarget addition. 273 parity. 72 path integration. 347 orthogonal vector. 416 + and − as. 333 Pascal’s triangle. 352 orthonormal rows and columns. 133 operator notation. 250 general interchange. 358 perpendicular matrix. 335 permutation. 134 quasielementary. 423 overdetermined linear system. 338 order. 355 orthonormal vectors. 260 rightward multitarget addition. Blaise (1623–1662). 164 phase. 206 Parseval. 173 circle of. 289 small. 206 . 346 orthogonalization. 144 payment rate. 45. 349 inverting a matrix by. 112 point. 249 unary. 261. 78. 206 partial sum.

74 electromagnetic. 222 quadratic formula. 112. 231 . 29 derivative of. 45 quadratic expression. 210 of a trigonometric function. 12 quadratics. 397 product rule. 262 quintic expression. 500 B. 382. 51. 277 productivity. 38. 16 of a power. 206. 195. 11. cyclic. 331 power. 118. 1 by contradiction. 38 quadrant. 390 prolixity. 146 progression of coordinates. 117 of order N has N roots. 118. 36 and the hyperbolic functions. 206 polygon. 114 pyramid volume of. 580–c. 118. 16 real. 75 dividing. 263 interchange. 384 has at least one root. 117 positive deﬁniteness. 206.C. derivative. 397 465 of determinants. 22 dividing by matching coeﬃcients. 227 resolvent cubic of. 219. 15 complex. 46 in three dimensions. 102 and the sine and cosine functions. 20 power series. 260 addition. 11 quantity exact. 2. 210 separation of. 19 of a product. 2. 405 proof. 116 characteristic. 358 row addition. 29 multiplying. 345 of a vector and a scalar. 36 Pythagorean theorem. 353 pure mathematics. 13 determinant of. 230 quartic formula. 371 potentiometer. 340.). 261 scaling. 121 professional mathematician. 18 sum of. 405 processor. 21 pressure. 290 product.INDEX double. 21. 397. 154 by sketch. 2. 16 notation for. 26 extending the technique. 305 quartic expression. 43 bounds on. 400 fractional. 114. 21. 186 repeated. 22 shifting the expansion point of. 228 roots of. 305 imprecise. 112 prime mark (′ ). computer. 11. 41. 111 inﬁnite supply of. 405 prime number. 231 primed coordinate. 111 relative. 399 projector. 123 proviso. 81. 206. 335. 364 dot or inner. 33 polynomial. 210 multiple. 139 Pythagoras (c. 326 pseudoinverse. 11. 368. 175 common quotients of. 157 with negative powers. 212 professional mathematics. 219. 19 properties. 230 quasielementary marking. 17 integral. 307 by induction. 34 proving backward. 413 prime factorization. 264 quasielementary operator. 364 of vectors.

instantaneous. 358 right triangle. 136. 7. 383 radius. 116 reorientation. 209 Roman numeral. 231 root. 399. 210 range contour. 229 18 rational. 163 repetition. 67 265 relative. 22 addition of. 310 uniqueness of. 205 INDEX zero. 39 ﬁnding of numerically. 399 radian. 41 superﬂuous. 310 residual. 26. 360 . 393 from a quartic polynomial. 361 after division by z − α. unseemly. 84. 398. 393 rational number. 334 and independent rows. 258. 307 resolvent cubic. 228 maximum. 45 rank-n determinant. 344 approximating as a ratio of integers. 334 relative rate. 296 reverse logic. 67 rise. 341 row. 230 rectangular matrix. 18. guessing. 205 mountain. 313. 401 rational function. 315. 116. 51. 206. 156 rate of change. 38. 313. 226 reciprocal pair. 172. 340 column. 231 rounding error. 116 null. rate. 86. 80 rigor. 34. 45 rank-r inverse. 45. 414 rational root. 408 quotient. 86. 205 impossibility of promoting. 11. 172. 223 splitting down the diagonal. 310 revolution. 34 from a quadratic polynomial. 115 roof. Joseph (1648–1715). 323 triple. 332 fully reduced. 312 residue. 435 integral of. 282. 296 minimizing the. 17. 80 row. 344 rightward multitarget addition operator. 335 full. 408 rank. 323 rotation. 22. 370. 55 repeated eigensolution. 12 rectangular coordinates. 339 squared norm of. 205 roll. 45. 327 repeated pole. 393 regular part. 45 repeated eigenvalue. 383 range. 395 invariance under. 38. 219 real exponential. 402 Raphson. 325. 115 winding. 322 root extraction rectangle. 2. 231 real part. 91 double. 234. 305. 7 from a cubic polynomial.466 quiz. 373. 347 reversibility. 210 Roman alphabet. 45 rate of change. 67 road ratio. 396 derivatives of. 53 relative primeness. 223 reciprocal. 187 angle of. 225 real number. 291 remainder. 62. 320 right-hand rule. 51.

377 singular value decomposition.INDEX null. 164 elementary. 102 scaling. 403 sixfold integral. 179 multiplication order of. 251 singular matrix. 241 Saturday. schematic. 375 scalar ﬁeld. 103. 129 directional derivative of. 118 slow convergence. 262 determinant of. 51 singularity. 276. 206 slope. 29 harmonic. 400 scalar multiplication derivative of. 403 sink. Old. variations on. 2. 181 series. Erhard (1876–1959). 70 sketch. 363 scaling quasielementary. 175 complex. gradient of. 262 versus a nondiagonalizable matrix. 413 Simpson’s rule. 264 row operator. 239 row rank full. 266. 83 serial addition. 347 singular value. 282. 345 Royal Observatory. 414 separation of poles. 374 general. 50 similarity. 388 screw. 397 in complex exponential form. 15 solid convergence of. 328 373 Schmidt. 146. 64 surface area of. 323 orthonormal. 403 skepticism. 118 shape area of. 47. 420 second derivative. 233. 13 sum of. 83 sinusoid. 179 geometric. 165. 254. Issai (1875–1941). 299. 190 wavy surface of. 323. 377 singular value matrix. 107 of a vector. 55 rudder. 313 row vector. 14 notation for. 335 sign scalar. appending a. 355 scaled and repeated. 13 series addition. 375 similarity transformation. 38 sea essential. 245 sine. 13 product of. 138 secondary cylindrical basis. 59 scaling operator single-valued function. 45 467 geometric. 360 row addition quasielementary. proof by. 418 237. 384 sky. 324 scalar matrix. 34 self-adjoint matrix. 96 arithmetic. 162. 331 rudiments. 326 shifting an eigenvalue. 388 Schur decomposition. 376 condition of. 234 alternating. 45. 418 simultaneous system of linear equations. 45. 279 run. 374 selection from among wooden blocks. 13 slow function. 370 sifting property. 39. 359 law of sines. 388 Schur. 140 . 47 secondary circle. 29. 139 shift operator.

270. 45. 216 multidimensional. completing the. 13 partial. 332 general. 423 orientation of. 174 in 1/z. 405 subtraction parallel. 92 tapered strip. 139 solution. 334 guessing the form of. 425 strictly triangular matrix. 333 particular and homogeneous. 138 spare column. 403 surface area. 403 spherical coordinates. tapered. 420 south. 63. 242. 74 Taylor series. 366 degenerate. 12 squared residual norm. 121 sum. 140 style. 315 tangent. 140 surface element. 266 strip. 316 three-dimensional. 30 source. 403 space. 109. 140 target. 295. 401 sphere. 142 spherical basis. 223 superposition. 330. 280 summation. 270 Tartaglia. 148 INDEX subscript indicating the components of a vector by. 178 convergence of. 246 speed. 18. 140 closed. 138 surface integration. 56 tilted. 400 two-dimensional. Einstein’s. 313 Taylor expansion. 45 derivative of. 51 tall matrix. 332 symmetry. 172. 224 steepest rate. 335 squares. 337 particular. 191 . 107 in complex exponential form. 360 surface. 320 coincident properties of. least. 64 summation convention. 86. 175 weighted. 159 converting a power series to. o 219 taxonomy of the linear system. sum or diﬀerence of. 394. 11 squaring. 419 Stokes’ theorem. 88 square. 403 surface area of. 114. 425 Stokes.468 volume of. 36 square matrix. 319 error in. 333 sound. 62. Sir George Gabriel (1819–1903). 425 surface integral. 140 volume of. 101 tangent line. 157 for speciﬁc functions. 197 of least-squares. 190 integration by. 304 symmetry. ﬁrst-order. 333 square. 186. 3. 374 family of. 403 swimming pool. 335 squares. 151. 406 superﬂuous root. 302. 137 inner and outer. appeal to. 400 space and time. 142 surface normal. 315 sparsity. 323 square root. Niccol` Fontana (1499–1557). 51. 326 addressing. 315. 13 compared to integration. 393 split form. 39 calculation by Newton-Raphson.

265 construction of. 161 Taylor. 175 theory.INDEX transposing to a diﬀerent expansion point. 397 unresolved operator. 64 vector. 403 upper triangular matrix. 382 construction of. properties of. 418 unsureness. 266. 414 triangle. derivative of. 273 partial. 45. 191 ﬁnite number of. 47 unknown coeﬃcient. 403 unitary matrix. logical. 416 unresolved. 313 uniqueness. 236 transitivity summational and integrodiﬀerential. 7. 45 unit lower triangular matrix. 45. 393 three-dimensional space. 269 trigonometric family of functions. 47 cylindrical. 364 unitary similarity. 274 unit parallel. 135 transpose. 298 operator. 56 right. 55 of a sum or diﬀerence of angles. 39. 197 unprimed coordinate. 53 poles of. 33. 45 properties. 47. 323 of matrix rank. 48. 265 upstairs. 151. 186 trigonometrics. 374 unit normal. 159 technician. 109 of a double or half angle. properties of. Heaviside. 45 derivative of. 395 two-dimensional space. 240 of a matrix inverse. 249 tuning. 138 469 triple root. Brook (1685–1731). 310 unit. 331 two-dimensional geometrical vector. 257 trapezoid rule. 212. 274 unit upper triangular matrix. 62 unit circle. 226 triviality. 45 inverse. 129 tree. 417 unary operator. 103 trigonometric function. 331 term cross-. 61 triple integral. 62 variable. 267 parallel. 45 triangle inequalities. 34. linear. 273 partial. 240. 47 imaginary. 376 unity. 418 underdetermined linear system. 361 conjugate. 114 up. 414 . 400 unary linear operator. 64 triangular matrix. 45. 280 truncation. 138 transformation. 267 parallel. 39 real. 400 time and space. 145 unit triangular matrix. 101 trigonometry. complex. 332. 62 spherical. 107 inverse. 39 unit basis vector. 354 determinant of. 34 equilateral. 34 complex. 59 area of. 403 unit step function. 346 normal or perpendicular. 55 of a hour angle. 376 unitary transformation. 265 unit vector. 265 unit magnitude. 324 three-dimensional geometrical vector.

401. 413 dot or inner product of two. 419 vector notation. 47 orientation of. 130 vector. 393 addition of. 393 two-dimensional. 47 unit basis. 346 matrix. 191. 346 orthogonalization of. 76 independent. 413 curl of. spherical. 49 two-dimensional geometrical. 420 volume element. 45. 234. 413 . 413 integral forms of. 47. 345 elementary. 393 multiplication of. left-over. 345 orthogonal. 302. 413 vertex. 59 variable independent inﬁnitesimal. 302 building from basis vectors. 220. 345 concise notation for. 403 Weierstrass. 221 Vieta. 401. variable. 330 vector. 326. Karl Wilhelm Theodor (1815– 1897). 47. 280. 191 normalization of. 316. driving. 10 complex. 393 vector analysis. 345 INDEX row of. 330 wind. 109 propagating. 405 derivative of. 138 wave complex. 49 three-dimensional geometrical. 393 vector calculus. 346 unit basis. 5.470 upward multitarget addition operator. 330 rotation of. 418 divergence of. 325. 347 notation for. 220 Vieta’s substitution. 30 assignment. 62 orthonormalization of. 7. 307. 220 Vieta’s transform. 191 magnitude of. 347 point. 130 variable dτ . cylindrical. 423 vector ﬁeld. 412 angle between two. 62 unit basis. 309 ersatz. 281 volume. 62 zero or null. 139. 415 generalized. 76 utility. 234 scalar multiplication of. 234. 191. Franciscus (Fran¸ois Vi`te. 234 integer. 333 velocity. 327 addressing. 307. 280 vector algebra. 398 driving. 397 nonnegative integer. 62 unit basis. 302. 393 vector space. 109 wavy sea. 346 arbitrary. 59 variable. 347 orthonormal. 280 west. 397 three-dimensional. 1540– c e 1603). 51 row. 220 visualization. 139 Vieta’s parallel transform. 219. 78 deﬁnition notation for. 51 replacement of. 395 unit. 395 algebraic identities of. 264 utility variable. 423 volume integral. 10 change of. 11 dependent. 327 elementary. 249. geometrical. 30. 375 column. 421 directional derivative of. 155 weighted sum. 30. 137. 345 dot product of.

INDEX winding road. 7. 393 x86-class computer processor. 244 zero vector. 38 dividing by. 401 wooden block. 242 vector. 396 zero. 244. 244 padding a matrix with. 68 matrix. 244 zero matrix. 280 471 . 70 worker. 335 world physical. 290 yaw.
