Sections

  • 1.1 Applied mathematics
  • 1.2 Rigor
  • 1.2.1 Axiom and definition
  • 1.2.2 Mathematical extension
  • 1.3 Complex numbers and complex variables
  • 1.4 On the text
  • 2.1 Basic arithmetic relationships
  • 2.1.2 Negative numbers
  • 2.1.4 The change of variable
  • 2.2 Quadratics
  • 2.3 Notation for series sums and products
  • 2.4 The arithmetic series
  • 2.5 Powers and roots
  • 2.5.1 Notation and integral powers
  • 2.5.2 Roots
  • 2.5.3 Powers of products and powers of powers
  • 2.5.4 Sums of powers
  • 2.5.5 Summary and remarks
  • 2.6 Multiplying and dividing power series
  • 2.6.1 Multiplying power series
  • 2.6.2 Dividing power series
  • 2.6.3 Dividing power series by matching coefficients
  • 2.6.5 Variations on the geometric series
  • 2.8 Exponentials and logarithms
  • 2.8.1 The logarithm
  • 2.8.2 Properties of the logarithm
  • 2.9 Triangles and other polygons: simple facts
  • 2.9.1 Triangle area
  • 2.9.2 The triangle inequalities
  • 2.9.3 The sum of interior angles
  • 2.10 The Pythagorean theorem
  • 2.11 Functions
  • 2.12 Complex numbers (introduction)
  • 2.12.2 Complex conjugation
  • 2.12.3 Power series and analytic functions (preview)
  • Trigonometry
  • 3.1 Definitions
  • 3.2 Simple properties
  • 3.3 Scalars, vectors, and vector notation
  • 3.4 Rotation
  • 3.5.1 Variations on the sums and differences
  • 3.5.2 Trigonometric functions of double and half angles
  • 3.7 The laws of sines and cosines
  • 3.8 Summary of properties
  • 3.9 Cylindrical and spherical coordinates
  • 3.10 The complex triangle inequalities
  • 3.11 De Moivre’s theorem
  • The derivative
  • 4.1 Infinitesimals and limits
  • 4.1.1 The infinitesimal
  • 4.1.2 Limits
  • 4.2 Combinatorics
  • 4.2.1 Combinations and permutations
  • 4.2.2 Pascal’s triangle
  • 4.3 The binomial theorem
  • 4.3.1 Expanding the binomial
  • 4.3.2 Powers of numbers near unity
  • 4.3.3 Complex powers of numbers near unity
  • 4.4 The derivative
  • 4.4.1 The derivative of the power series
  • 4.4.2 The Leibnitz notation
  • 4.4.3 The derivative of a function of a complex variable
  • 4.4.4 The derivative of z^a
  • 4.4.5 The logarithmic derivative
  • 4.5 Basic manipulation of the derivative
  • 4.5.1 The derivative chain rule
  • 4.5.2 The derivative product rule
  • 4.6 Extrema and higher derivatives
  • 4.7 L’Hôpital’s rule
  • 4.8 The Newton-Raphson iteration
  • The complex exponential
  • 5.1 The real exponential
  • 5.2 The natural logarithm
  • 5.3 Fast and slow functions
  • 5.4 Euler’s formula
  • 5.5 Complex exponentials and de Moivre
  • 5.6 Complex trigonometrics
  • 5.7 Summary of properties
  • 5.8 Derivatives of complex exponentials
  • 5.8.1 Derivatives of sine and cosine
  • 5.8.2 Derivatives of the trigonometrics
  • 5.8.3 Derivatives of the inverse trigonometrics
  • 5.9 The actuality of complex quantities
  • Primes, roots and averages
  • 6.1 Prime numbers
  • 6.1.1 The infinite supply of primes
  • 6.1.2 Compositional uniqueness
  • 6.1.3 Rational and irrational numbers
  • 6.2.1 Polynomial roots
  • 6.2.2 The fundamental theorem of algebra
  • 6.3 Addition and averages
  • 6.3.1 Serial and parallel addition
  • 6.3.2 Averages
  • The integral
  • 7.1 The concept of the integral
  • 7.1.1 An introductory example
  • 7.1.2 Generalizing the introductory example
  • 7.1.3 The balanced definition and the trapezoid rule
  • 7.3 Operators, linearity and multiple integrals
  • 7.3.1 Operators
  • 7.3.2 A formalism
  • 7.3.3 Linearity
  • 7.3.4 Summational and integrodifferential transitivity
  • 7.3.5 Multiple integrals
  • 7.4 Areas and volumes
  • 7.4.1 The area of a circle
  • 7.4.2 The volume of a cone
  • 7.4.3 The surface area and volume of a sphere
  • 7.5 Checking an integration
  • 7.6 Contour integration
  • 7.7 Discontinuities
  • 7.8 Remarks (and exercises)
  • The Taylor series
  • 8.1 The power-series expansion of 1/(1−z)^(n+1)
  • 8.1.1 The formula
  • 8.1.2 The proof by induction
  • 8.1.3 Convergence
  • 8.1.4 General remarks on mathematical induction
  • 8.2 Shifting a power series’ expansion point
  • 8.3 Expanding functions in Taylor series
  • 8.4 Analytic continuation
  • 8.5 Branch points
  • 8.6 Entire and meromorphic functions
  • 8.7 Extrema over a complex domain
  • 8.8 Cauchy’s integral formula
  • 8.8.1 The meaning of the symbol dz
  • 8.8.2 Integrating along the contour
  • 8.8.3 The formula
  • 8.8.4 Enclosing a multiple pole
  • 8.9 Taylor series for specific functions
  • 8.10 Error bounds
  • 8.10.1 Examples
  • 8.10.2 Majorization
  • 8.10.3 Geometric majorization
  • 8.10.4 Calculation outside the fast convergence domain
  • 8.10.5 Nonconvergent series
  • 8.10.6 Remarks
  • 8.11 Calculating 2π
  • 8.12 Odd and even functions
  • 8.13 Trigonometric poles
  • 8.14 The Laurent series
  • 8.15 Taylor series in 1/z
  • 8.16 The multidimensional Taylor series
  • Integration techniques
  • 9.1 Integration by antiderivative
  • 9.2 Integration by substitution
  • 9.3 Integration by parts
  • 9.4 Integration by unknown coefficients
  • 9.5 Integration by closed contour
  • 9.6 Integration by partial-fraction expansion
  • 9.6.1 Partial-fraction expansion
  • 9.6.2 Repeated poles
  • 9.6.3 Integrating a rational function
  • 9.6.4 The derivatives of a rational function
  • 9.6.5 Repeated poles (the conventional technique)
  • 9.6.6 The existence and uniqueness of solutions
  • 9.7 Frullani’s integral
  • 9.9 Integration by Taylor series
  • Cubics and quartics
  • 10.1 Vieta’s transform
  • 10.2 Cubics
  • 10.3 Superfluous roots
  • 10.4 Edge cases
  • 10.5 Quartics
  • 10.6 Guessing the roots
  • The matrix
  • 11.1 Provenance and basic use
  • 11.1.1 The linear transformation
  • 11.1.2 Matrix multiplication (and addition)
  • 11.1.3 Row and column operators
  • 11.1.4 The transpose and the adjoint
  • 11.2 The Kronecker delta
  • 11.3 Dimensionality and matrix forms
  • 11.3.1 The null and dimension-limited matrices
  • 11.3.3 The active region
  • 11.3.4 Other matrix forms
  • 11.3.5 The rank-r identity matrix
  • 11.3.6 The truncation operator
  • 11.3.7 The elementary vector and the lone-element matrix
  • 11.3.8 Off-diagonal entries
  • 11.4 The elementary operator
  • 11.4.1 Properties
  • 11.4.2 Commutation and sorting
  • 11.5 Inversion and similarity (introduction)
  • 11.6 Parity
  • 11.7 The quasielementary operator
  • 11.7.3 Addition quasielementaries
  • 11.8 The unit triangular matrix
  • 11.8.1 Construction
  • 11.8.2 The product of like unit triangular matrices
  • 11.8.3 Inversion
  • 11.8.4 The parallel unit triangular matrix
  • 11.8.5 The partial unit triangular matrix
  • 11.9 The shift operator
  • 11.10 The Jacobian derivative
  • 12.1 Linear independence
  • 12.2 The elementary similarity transformation
  • 12.3 The Gauss-Jordan decomposition
  • 12.3.1 Motive
  • 12.3.2 Method
  • 12.3.3 The algorithm
  • 12.3.4 Rank and independent rows
  • 12.3.5 Inverting the factors
  • 12.3.6 Truncating the factors
  • 12.3.7 Properties of the factors
  • 12.3.8 Marginalizing the factor In
  • 12.3.9 Decomposing an extended operator
  • 12.4 Vector replacement
  • 12.5 Rank
  • 12.5.1 A logical maneuver
  • 12.5.2 The impossibility of identity-matrix promotion
  • 12.5.3 General matrix rank and its uniqueness
  • 12.5.4 The full-rank matrix
  • 12.5.5 Underdetermined and overdetermined linear systems (introduction)
  • 12.5.6 The full-rank factorization
  • 12.5.8 The significance of rank uniqueness
  • 13.1 Inverting the square matrix
  • 13.2 The exactly determined linear system
  • 13.3 The kernel
  • 13.3.1 The Gauss-Jordan kernel formula
  • 13.3.2 Converting between kernel matrices
  • 13.3.3 The degree of freedom
  • 13.4 The nonoverdetermined linear system
  • 13.4.1 Particular and homogeneous solutions
  • 13.4.2 A particular solution
  • 13.4.3 The general solution
  • 13.5 The residual
  • 13.6 The pseudoinverse and least squares
  • 13.6.1 Least squares in the real domain
  • 13.6.2 The invertibility of A∗A
  • 13.6.3 Positive definiteness
  • 13.6.4 The Moore-Penrose pseudoinverse
  • 13.7 The multivariate Newton-Raphson iteration
  • 13.8 The dot product
  • 13.9 The orthogonal complement
  • 13.10 Gram-Schmidt orthonormalization
  • 13.10.1 Efficient implementation
  • 13.10.2 The Gram-Schmidt decomposition
  • 13.10.3 The Gram-Schmidt kernel formula
  • 13.11 The unitary matrix
  • The eigenvalue
  • 14.1 The determinant
  • 14.1.1 Basic properties
  • 14.1.2 The determinant and the elementary operator
  • 14.1.3 The determinant of a singular matrix
  • 14.1.4 The determinant of a matrix product
  • 14.1.5 Determinants of inverse and unitary matrices
  • 14.1.6 Inverting the square matrix by determinant
  • 14.2 Coincident properties
  • 14.3 The eigenvalue itself
  • 14.4 The eigenvector
  • 14.5 Eigensolution facts
  • 14.6 Diagonalization
  • 14.7 Remarks on the eigenvalue
  • 14.8 Matrix condition
  • 14.9 The similarity transformation
  • 14.10 The Schur decomposition
  • 14.10.1 Derivation
  • 14.10.2 The nondiagonalizable matrix
  • 14.11 The Hermitian matrix
  • 14.12 The singular value decomposition
  • 14.13 General remarks on the matrix
  • Vector analysis
  • 15.1 Reorientation
  • 15.2 Multiplication
  • 15.2.1 The dot product
  • 15.2.2 The cross product
  • 15.3 Orthogonal bases
  • 15.4 Notation
  • 15.4.1 Components by subscript
  • 15.4.2 Einstein’s summation convention
  • 15.4.3 The Kronecker delta and the Levi-Civita epsilon
  • 15.5 Algebraic identities
  • 15.6 Fields and their derivatives
  • 15.6.1 The ∇ operator
  • A.1 Hexadecimal numerals
  • A.2 Avoiding notational clutter

Derivations of Applied Mathematics

Thaddeus H. Black Revised 19 April 2008


Thaddeus H. Black, 1967–. Derivations of Applied Mathematics. 19 April 2008. U.S. Library of Congress class QA401. Copyright © 1983–2008 by Thaddeus H. Black, thb@derivations.org. Published by the Debian Project [10]. This book is free software. You can redistribute and/or modify it under the terms of the GNU General Public License [16], version 2.

Contents

Preface
1  Introduction
2  Classical algebra and geometry
3  Trigonometry
4  The derivative
5  The complex exponential
6  Primes, roots and averages
7  The integral
8  The Taylor series
9  Integration techniques
10 Cubics and quartics
11 The matrix
12 Rank and the Gauss-Jordan
13 Inversion and orthonormalization
14 The eigenvalue
15 Vector analysis
A  Hex and other notational matters
B  The Greek alphabet
C  Manuscript history

List of Tables

Basic properties of arithmetic
Power properties and definitions
Dividing power series through successively smaller powers
Dividing power series through successively larger powers
General properties of the logarithm
Simple properties of the trigonometric functions
Further properties of the trigonometric functions
Trigonometric functions of the hour angles
Rectangular, cylindrical and spherical coordinate relations
Complex exponential properties
Derivatives of the trigonometrics
Derivatives of the inverse trigonometrics
Parallel and serial addition identities
Basic derivatives for the antiderivative
Taylor series
Antiderivatives of products of exps, powers and logs
The method to extract the three roots of the general cubic
The method to extract the four roots of the general quartic
Elementary operators: interchange
Elementary operators: scaling
Elementary operators: addition
Matrix inversion properties
Properties of the parallel unit triangular matrix
Some elementary similarity transformations
A few properties of the Gauss-Jordan factors
The symmetrical equations of § 12.4
Properties of the Kronecker delta and Levi-Civita epsilon
Algebraic vector identities
The Roman and Greek alphabets

List of Figures

Two triangles
Multiplicative commutivity
The sum of a triangle's inner angles: turning at the corner
A right triangle
The Pythagorean theorem
The complex (or Argand) plane
The sine and the cosine
The sine function
A two-dimensional vector u = x̂x + ŷy
A three-dimensional vector v = x̂x + ŷy + ẑz
Vector basis rotation
The 0x18 hours in a circle
Calculating the hour trigonometrics
The laws of sines and cosines
A point on a sphere
The plan for Pascal's triangle
Pascal's triangle
A local extremum
A level inflection
The Newton-Raphson iteration
The natural exponential
The natural logarithm
The complex exponential and Euler's formula
The derivatives of the sine and cosine functions
Areas representing discrete sums
An area representing an infinite sum of infinitesimals
Integration by the trapezoid rule
The area of a circle
The volume of a cone
A sphere
An element of a sphere's surface
A contour of integration
The Heaviside unit step u(t)
The Dirac delta δ(t)
A complex contour of integration in two segments
A Cauchy contour integral
Majorization
Integration by closed contour
Vieta's transform
Fitting a line to measured data
A point on a sphere
The dot product
The cross product
The cylindrical basis
The spherical basis

Preface

I never meant to write this book. It emerged unheralded, unexpectedly.

The book began in 1983 when a high-school classmate challenged me to prove the Pythagorean theorem on the spot. I lost the dare, but looking the proof up later I recorded it on loose leaves, adding to it the derivations of a few other theorems of interest to me. From such a kernel the notes grew over time, until family and friends suggested that the notes might make the material for a book.

The book, a work yet in progress but a complete, entire book as it stands, first frames coherently the simplest, most basic derivations of applied mathematics, treating quadratics, trigonometrics, exponentials, derivatives, integrals, series, complex variables and, of course, the aforementioned Pythagorean theorem. These and others establish the book's foundation in Chapters 2 through 9. Later chapters build upon the foundation, deriving results less general or more advanced. Such is the book's plan.

The book is neither a tutorial on the one hand nor a bald reference on the other. It is a study reference, in the tradition of, for instance, Kernighan's and Ritchie's The C Programming Language [27]. In the book, you can look up some particular result directly, or you can begin on page one and read—with toil and commensurate profit—straight through to the end of the last chapter. The book is so organized.

The book generally follows established applied mathematical convention but where worthwhile occasionally departs therefrom. One particular respect in which the book departs requires some defense here, I think: the book employs hexadecimal numerals.

There is nothing wrong with decimal numerals as such. Decimal numerals are fine in history and anthropology (man has ten fingers), finance and accounting (dollars, cents, pounds, shillings, pence: the base hardly matters), law and engineering (the physical units are arbitrary anyway); but they are merely serviceable in mathematical theory, never aesthetic. There unfortunately really is no gradual way to bridge the gap to hexadecimal (shifting to base eleven, thence to twelve, etc., is no use). If one wishes to reach hexadecimal ground, one must leap. Twenty years of keeping my own private notes in hex have persuaded me that the leap justifies the risk. In other matters, by contrast, the book leaps seldom. The book in general walks a tolerably conventional applied mathematical line.

The book belongs to the emerging tradition of open-source software, where at the time of this writing it fills a void. Nevertheless it is a book, not a program. Lore among open-source developers holds that open development inherently leads to superior work. Well, maybe. Often it does in fact. Personally with regard to my own work, I should rather not make too many claims. It would be vain to deny that professional editing and formal peer review, neither of which the book enjoys, had substantial value. On the other hand, it does not do to despise the amateur (literally, one who does for the love of it: not such a bad motive, after all¹) on principle, either—unless one would on the same principle despise a Washington or an Einstein, or a Debian Developer [10]. Open source has a spirit to it which leads readers to be far more generous with their feedback than ever could be the case with a traditional, proprietary book. Such readers, among whom a surprising concentration of talent and expertise are found, enrich the work freely. This has value, too.

The book's peculiar mission and program lend it an unusual quantity of discursive footnotes. These footnotes offer nonessential material which, while edifying, coheres insufficiently well to join the main narrative. The footnote is an imperfect messenger, of course. Catching the reader's eye, it can break the flow of otherwise good prose. Modern publishing offers various alternatives to the footnote—numbered examples, sidebars, special fonts, colored inks, etc. Some of these are merely trendy. Others, like numbered examples, long sanctioned by an earlier era of publishing, really do help the right kind of book; but for this book the humble footnote, extensively employed by such sages as Gibbon [18] and Shirer [43], seems the most able messenger. In this book it shall have many messages to bear.

In typical science/engineering style, the book numbers its sections, figures, tables and formulas, but not its theorems, the last of which it generally sets in italic type. A theorem is referenced by the number of the section that states it. The book naturally subjoins an index. The canny reader will avoid using the index (of this and most other books), tending rather to consult the table of contents.

The book provides a bibliography listing other books I have referred to while writing. Mathematics by its very nature promotes queer bibliographies, however, for its methods and truths are established by derivation rather than authority. Much of the book consists of common mathematical knowledge or of proofs I have worked out with my own pencil from various ideas gleaned who knows where over the years. The latter proofs are perhaps original or semi-original from my personal point of view, but it is unlikely that many if any of them are truly new, and even where a proof is new the idea proven probably is not.

Among the bibliography's entries stands a reference [6] to my doctoral adviser G. S. Brown, though the book's narrative seldom explicitly invokes the reference. Prof. Brown had nothing directly to do with the book's development, for a substantial bulk of the manuscript, or of the notes underlying it, had been drafted years before I had ever heard of Prof. Brown or he of me, and my work under him did not regard the book in any case. However, to the initiated, the ways in which a good doctoral adviser influences his student are both complex and deep. Prof. Brown's style and insight touch the book in many places and in many ways, usually too implicitly coherently to cite.

My own peculiar heterogeneous background in military service, building construction, electrical engineering, electromagnetic analysis and Debian development, and my nativity, residence and citizenship in the United States, undoubtedly bias the selection and presentation to some degree. How other authors go about writing their books, I do not know, but I suppose that what is true for me is true for many of them also: we begin by organizing notes for our own use, then observe that the same notes might prove useful to others, and then undertake to revise the notes and to bring them into a form which actually is useful to others.

As to a grand goal, underlying purpose or hidden plan, the book has none, other than to derive as many useful mathematical results as possible and to record the derivations together in an orderly manner in a single volume. What constitutes "useful" or "orderly" is a matter of perspective and judgment, of course. Whether this book succeeds in the last point is for the reader to judge.

THB

¹ The expression is derived from an observation I seem to recall George F. Will making.

Chapter 1

Introduction

This is a book of applied mathematical proofs. If you have seen a mathematical result, if you want to know why the result is so, you can look for the proof here.

The book's purpose is to convey the essential ideas underlying the derivations of a large number of mathematical results useful in the modeling of physical systems. To this end, the book emphasizes main threads of mathematical argument and the motivation underlying the main threads, deëmphasizing formal mathematical rigor. It derives mathematical results from the purely applied perspective of the scientist and the engineer.

The book's chapters are topical. This first chapter treats a few introductory matters of general interest.

1.1 Applied mathematics

What is applied mathematics?

Applied mathematics is a branch of mathematics that concerns itself with the application of mathematical knowledge to other domains. . . . The question of what is applied mathematics does not answer to logical classification so much as to the sociology of professionals who use mathematics. [30]

That is about right, on both counts. In this book we shall define applied mathematics to be correct mathematics useful to scientists, engineers and the like, proceeding not from reduced, well defined sets of axioms but rather directly from a nebulous mass of natural arithmetical, geometrical and classical-algebraic idealizations of physical systems, demonstrable but generally lacking the detailed rigor of the professional mathematician.

1.2 Rigor

It is impossible to write such a book as this without some discussion of mathematical rigor. Applied and professional mathematics differ principally and essentially in the layer of abstract definitions the latter subimposes beneath the physical ideas the former seeks to model. Notions of mathematical rigor fit far more comfortably in the abstract realm of the professional mathematician; they do not always translate so gracefully to the applied realm. The applied mathematical reader and practitioner needs to be aware of this difference.

1.2.1 Axiom and definition

Ideally, a professional mathematician knows or precisely specifies in advance the set of fundamental axioms he means to use to derive a result. A prime aesthetic here is irreducibility: no axiom in the set should overlap the others or be specifiable in terms of the others. Geometrical argument—proof by sketch—is distrusted. The professional mathematical literature discourages undue pedantry indeed, but its readers do implicitly demand a convincing assurance that its writers could derive results in pedantic detail if called upon to do so. Precise definition here is critically important, which is why the professional mathematician tends not to accept blithe statements such as that

1/0 = ∞,

without first inquiring as to exactly what is meant by symbols like 0 and ∞.

The applied mathematician begins from a different base. His ideal lies not in precise definition or irreducible axiom, but rather in the elegant modeling of the essential features of some physical system. Here, mathematical definitions tend to be made up ad hoc along the way, based on previous experience solving similar problems, adapted implicitly to suit the model at hand. If you ask the applied mathematician exactly what his axioms are, which symbolic algebra he is using, he usually doesn't know; what he knows is that the bridge has its footings in certain soils with specified tolerances, suffers such-and-such a wind load, etc. To avoid error, the applied mathematician relies not on abstract formalism but rather on a thorough mental grasp of the essential physical features of the phenomenon he is trying to model. An equation like

1/0 = ∞


may make perfect sense without further explanation to an applied mathematical readership, depending on the physical context in which the equation is introduced. Geometrical argument—proof by sketch—is not only trusted but treasured. Abstract definitions are wanted only insofar as they smooth the analysis of the particular physical problem at hand; such definitions are seldom promoted for their own sakes. The irascible Oliver Heaviside, responsible for the important applied mathematical technique of phasor analysis, once said, It is shocking that young people should be addling their brains over mere logical subtleties, trying to understand the proof of one obvious fact in terms of something equally . . . obvious. [35] Exaggeration, perhaps, but from the applied mathematical perspective Heaviside nevertheless had a point. The professional mathematicians Richard Courant and David Hilbert put it more soberly in 1924 when they wrote, Since the seventeenth century, physical intuition has served as a vital source for mathematical problems and methods. Recent trends and fashions have, however, weakened the connection between mathematics and physics; mathematicians, turning away from the roots of mathematics in intuition, have concentrated on refinement and emphasized the postulational side of mathematics, and at times have overlooked the unity of their science with physics and other fields. In many cases, physicists have ceased to appreciate the attitudes of mathematicians. [8, Preface] Although the present book treats “the attitudes of mathematicians” with greater deference than some of the unnamed 1924 physicists might have done, still, Courant and Hilbert could have been speaking for the engineers and other applied mathematicians of our own day as well as for the physicists of theirs. To the applied mathematician, the mathematics is not principally meant to be developed and appreciated for its own sake; it is meant to be used. This book adopts the Courant-Hilbert perspective. The introduction you are now reading is not the right venue for an essay on why both kinds of mathematics—applied and professional (or pure)— are needed. Each kind has its place; and although it is a stylistic error to mix the two indiscriminately, clearly the two have much to do with one another. However this may be, this book is a book of derivations of applied mathematics. The derivations here proceed by a purely applied approach.

Figure 1.1: Two triangles.

1.2.2 Mathematical extension

Profound results in mathematics are occasionally achieved simply by extending results already known. For example, negative integers and their properties can be discovered by counting backward—3, 2, 1, 0—then asking what follows (precedes?) 0 in the countdown and what properties this new, negative integer must have to interact smoothly with the already known positives. The astonishing Euler's formula (§ 5.4) is discovered by a similar but more sophisticated mathematical extension.

More often, however, the results achieved by extension are unsurprising and not very interesting in themselves. Such extended results are the faithful servants of mathematical rigor.

Consider for example the triangle on the left of Fig. 1.1. This triangle is evidently composed of two right triangles of areas

A1 = b1 h / 2,
A2 = b2 h / 2

(each right triangle is exactly half a rectangle). Hence the main triangle's area is

A = A1 + A2 = (b1 + b2) h / 2 = bh / 2.

Very well. What about the triangle on the right? Its b1 is not shown on the figure, and what is that −b2, anyway? Answer: the triangle is composed of the difference of two right triangles, with b1 the base of the larger, overall one: b1 = b + (−b2). The b2 is negative because the sense of the small right triangle's area in the proof is negative: the small area is subtracted from


the large rather than added. By extension on this basis, the main triangle’s area is again seen to be A = bh/2. The proof is exactly the same. In fact, once the central idea of adding two right triangles is grasped, the extension is really rather obvious—too obvious to be allowed to burden such a book as this. Excepting the uncommon cases where extension reveals something interesting or new, this book generally leaves the mere extension of proofs— including the validation of edge cases and over-the-edge cases—as an exercise to the interested reader.

1.3 Complex numbers and complex variables

More than a mastery of mere logical details, it is an holistic view of the mathematics and of its use in the modeling of physical systems which is the mark of the applied mathematician. A feel for the math is the great thing. Formal definitions, axioms, symbolic algebras and the like, though often useful, are felt to be secondary. The book’s rapidly staged development of complex numbers and complex variables is planned on this sensibility. Sections 2.12, 3.10, 3.11, 4.3.3, 4.4, 6.2, 9.5 and 9.6.5, plus all of Chs. 5 and 8, constitute the book’s principal stages of complex development. In these sections and throughout the book, the reader comes to appreciate that most mathematical properties which apply for real numbers apply equally for complex, that few properties concern real numbers alone. Pure mathematics develops an abstract theory of the complex variable.1 The abstract theory is quite beautiful. However, its arc takes off too late and flies too far from applications for such a book as this. Less beautiful but more practical paths to the topic exist;2 this book leads the reader along one of these.

1.4 On the text

The book gives numerals in hexadecimal. It denotes variables in Greek letters as well as Roman. Readers unfamiliar with the hexadecimal notation will find a brief orientation thereto in Appendix A. Readers unfamiliar with the Greek alphabet will find it in Appendix B. Licensed to the public under the GNU General Public Licence [16], version 2, this book meets the Debian Free Software Guidelines [11].
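By way of a brief illustration before Appendix A: a hexadecimal numeral carries sixteen counts per place rather than ten, so the numeral 0x10 denotes sixteen and 0x18 denotes (1)(16) + 8, that is, twenty-four.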
¹ [14][44][24]
² See Ch. 8's footnote 8.

If you cite an equation, section, chapter, figure or other item from this book, it is recommended that you include in your citation the book's precise draft date as given on the title page. The reason is that equation numbers, chapter numbers and the like are numbered automatically by the LaTeX typesetting software: such numbers can change arbitrarily from draft to draft. If an example citation helps, see [5] in the bibliography.

Chapter 2

Classical algebra and geometry
Arithmetic and the simplest elements of classical algebra and geometry, we learn as children. Few readers will want this book to begin with a treatment of 1 + 1 = 2; or of how to solve 3x − 2 = 7. However, there are some basic points which do seem worth touching. The book starts with these.

2.1 Basic arithmetic relationships

This section states some arithmetical rules.

2.1.1 Commutivity, associativity, distributivity, identity and inversion

Table 2.1 lists several arithmetical rules, each of which applies not only to real numbers but equally to the complex numbers of § 2.12. Most of the rules are appreciated at once if the meaning of the symbols is understood. In the case of multiplicative commutivity, one imagines a rectangle with sides of lengths a and b, then the same rectangle turned on its side, as in Fig. 2.1: since the area of the rectangle is the same in either case, and since the area is the length times the width in either case (the area is more or less a matter of counting the little squares), evidently multiplicative commutivity holds. A similar argument validates multiplicative associativity, except that here we compute the volume of a three-dimensional rectangular box, which box we turn various ways.1
¹ [44, Ch. 1]


Table 2.1: Basic properties of arithmetic.

a + b = b + a                     Additive commutivity
a + (b + c) = (a + b) + c         Additive associativity
a + 0 = 0 + a = a                 Additive identity
a + (−a) = 0                      Additive inversion
ab = ba                           Multiplicative commutivity
(a)(bc) = (ab)(c)                 Multiplicative associativity
(a)(1) = (1)(a) = a               Multiplicative identity
(a)(1/a) = 1                      Multiplicative inversion
(a)(b + c) = ab + ac              Distributivity

Figure 2.1: Multiplicative commutivity.



Multiplicative inversion lacks an obvious interpretation when a = 0. Loosely,

    1/0 = ∞.

But since 3/0 = ∞ also, surely either the zero or the infinity, or both, somehow differ in the latter case. Looking ahead in the book, we note that the multiplicative properties do not always hold for more general linear transformations. For example, matrix multiplication is not commutative and vector cross-multiplication is not associative. Where associativity does not hold and parentheses do not otherwise group, right-to-left association is notationally implicit:^2,3

    A × B × C = A × (B × C).

The sense of it is that the thing on the left (A×) operates on the thing on the right (B × C). (In the rare case in which the question arises, you may want to use parentheses anyway.)
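The non-commutativity of matrix multiplication remarked above is easy to observe numerically. The following is a minimal sketch, not part of the book's text; it assumes the NumPy library and two arbitrary square matrices.

    # Two generic 2x2 matrices: A@B and B@A generally differ.
    import numpy as np

    A = np.array([[1, 2], [3, 4]])
    B = np.array([[0, 1], [1, 0]])
    print(A @ B)                          # [[2 1], [4 3]]
    print(B @ A)                          # [[3 4], [1 2]]
    print(np.array_equal(A @ B, B @ A))   # False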

2.1.2 Negative numbers

Consider that

    (+a)(+b) = +ab,
    (−a)(+b) = −ab,
    (+a)(−b) = −ab,
    (−a)(−b) = +ab.

The first three of the four equations are unsurprising, but the last is interesting. Why would a negative count −a of a negative quantity −b come to
2 The fine C and C++ programming languages are unfortunately stuck with the reverse order of association, along with division inharmoniously on the same level of syntactic precedence as multiplication. Standard mathematical notation is more elegant: abc/uvw = [(a)(bc)]/[(u)(vw)].
3 The nonassociative cross product B × C is introduced in § 15.2.2.


a positive product +ab? To see why, consider the progression

    . . .
    (+3)(−b) = −3b,
    (+2)(−b) = −2b,
    (+1)(−b) = −1b,
    (0)(−b)  =  0b,
    (−1)(−b) = +1b,
    (−2)(−b) = +2b,
    (−3)(−b) = +3b,
    . . .

The logic of arithmetic demands that the product of two negative numbers be positive for this reason.

2.1.3 Inequality

If^4

    a < b,

then necessarily

    a + x < b + x.

However, the relationship between ua and ub depends on the sign of u:

    ua < ub if u > 0;
    ua > ub if u < 0.

Also, provided that a and b are both positive or both negative,

    1/a > 1/b.

2.1.4 The change of variable

The applied mathematician very often finds it convenient to change variables, introducing new symbols to stand in place of old. For this we have
4 Few readers attempting this book will need to be reminded that < means "is less than," that > means "is greater than," or that ≤ and ≥ respectively mean "is less than or equal to" and "is greater than or equal to."

the change of variable or assignment notation^5

    Q ← P.

This means, "in place of P, put Q"; or, "let Q now equal P." For example, if a² + b² = c², then the change of variable 2µ ← a yields the new form (2µ)² + b² = c².

Similar to the change of variable notation is the definition notation

    Q ≡ P.

This means, "let the new symbol Q represent P."^6

The two notations logically mean about the same thing. Subjectively, Q ≡ P identifies a quantity P sufficiently interesting to be given a permanent name Q, whereas Q ← P implies nothing especially interesting about P or Q; it just introduces a (perhaps temporary) new symbol Q to ease the algebra. The concepts grow clearer as examples of the usage arise in the book.

5 There appears to exist no broadly established standard mathematical notation for the change of variable, other than the = equal sign, which regrettably does not fill the role well. One can indeed use the equal sign, but then what does the change of variable k = k + 1 mean? It looks like a claim that k and k + 1 are the same, which is impossible. The notation k ← k + 1 by contrast is unambiguous: it means to increment k by one. However, the latter notation admittedly has seen only scattered use in the literature. The C and C++ programming languages use == for equality and = for assignment (change of variable).
6 One would never write k ≡ k + 1. Even k ← k + 1 can confuse readers inasmuch as it appears to imply two different values for the same symbol k, but the latter notation is sometimes used anyway when new symbols are unwanted or because more precise alternatives (like k_n = k_{n−1} + 1) seem overwrought. Still, usually it is better to introduce a new symbol, as in j ← k + 1. In some books, ≡ is printed as ≜.
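Footnote 5 above contrasts the mathematical equal sign with the assignment of programming languages. The short sketch below is not from the book; Python is used merely to illustrate the two roles side by side.

    k = 5              # assignment: the change of variable k <- 5
    k = k + 1          # the change of variable k <- k + 1: increment k by one
    print(k)           # 6
    print(k == k + 1)  # equality test: False, since k and k + 1 are never the same number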

2.2 Quadratics

Differences and sums of squares are conveniently factored as

    a² − b² = (a + b)(a − b),
    a² + b² = (a + ib)(a − ib),
    a² + 2ab + b² = (a + b)²,
    a² − 2ab + b² = (a − b)²,     (2.1)

where, as the reader may be aware, i is the imaginary unit, a number defined such that i² = −1, introduced in more detail in § 2.12 below. Useful as these four forms are, however, none of them can directly factor the more general quadratic^7 expression

    z² − 2βz + γ².

To factor this, we complete the square, writing

    z² − 2βz + γ² = z² − 2βz + γ² + (β² − γ²) − (β² − γ²)
                  = z² − 2βz + β² − (β² − γ²)
                  = (z − β)² − (β² − γ²).

The expression evidently has roots^8 where

    (z − β)² = (β² − γ²),

or in other words where

    z = β ± √(β² − γ²).     (2.2)

This suggests the factoring^9

    z² − 2βz + γ² = (z − z1)(z − z2),     (2.3)

where z1 and z2 are the two values of z given by (2.2). It follows that the two solutions of the quadratic equation

    z² = 2βz − γ²     (2.4)

are those given by (2.2), which is called the quadratic formula.^10 (Cubic and quartic formulas also exist respectively to extract the roots of polynomials of third and fourth order, but they are much harder. See Ch. 10 and its Tables 10.1 and 10.2.)

7 The adjective quadratic refers to the algebra of expressions in which no term has greater than second order. Examples of quadratic expressions include x², 2x² − 7x + 3 and x² + 2xy + y². By contrast, the expressions x³ − 1 and 5x²y are cubic not quadratic because they contain third-order terms. First-order expressions like x + 1 are linear; zeroth-order expressions like 3 are constant. Expressions of fourth and fifth order are quartic and quintic, respectively. (If not already clear from the context, order basically refers to the number of variables multiplied together in a term. The term 5x²y = 5[x][x][y] is of third order, for instance.)
8 A root of f(z) is a value of z for which f(z) = 0.
9 It suggests it because the expressions on the left and right sides of (2.3) are both quadratic (the highest power is z²) and have the same roots. Substituting into the equation the values of z1 and z2 and simplifying proves the suggestion correct.
10 The form of the quadratic formula which usually appears in print is

    x = [−b ± √(b² − 4ac)] / 2a,

which solves the quadratic ax² + bx + c = 0. However, this writer finds the form (2.2) easier to remember.
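A quick numerical check of the quadratic formula in the form (2.2), z = β ± √(β² − γ²), which solves z² = 2βz − γ² per (2.4). The sketch below is not from the book; the function name and the use of Python's cmath module are merely illustrative.

    import cmath

    def quadratic_roots(beta, gamma_sq):
        """Roots of z**2 = 2*beta*z - gamma_sq, per the form (2.2) of the formula."""
        d = cmath.sqrt(beta**2 - gamma_sq)   # complex sqrt also covers beta**2 < gamma_sq
        return beta + d, beta - d

    z1, z2 = quadratic_roots(1.5, 2.0)       # for instance, z**2 = 3z - 2
    print(z1, z2)                            # (2+0j) (1+0j)
    print(abs(z1**2 - (3*z1 - 2)), abs(z2**2 - (3*z2 - 2)))   # both ~0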

For example, by (2.2) in light of (2.4), the quadratic

    z² = 3z − 2

has the solutions

    z = 3/2 ± √[(3/2)² − 2] = 1 or 2.

2.3 Notation for series sums and products

Sums and products of series arise so frequently in mathematical work that one finds it convenient to define terse notations to express them. The summation notation

    Σ_{k=a}^{b} f(k)

means to let k equal each of a, a + 1, a + 2, . . . , b in turn, evaluating the function f(k) at each k, then adding the several f(k). For example,^11

    Σ_{k=3}^{6} k² = 3² + 4² + 5² + 6² = 0x56.

The similar multiplication notation

    Π_{j=a}^{b} f(j)

means to multiply the several f(j) rather than to add them. The symbols Σ and Π come respectively from the Greek letters for S and P, and may be regarded as standing for "Sum" and "Product." The j or k is a dummy variable, index of summation or loop counter, a variable with no independent existence, used only to facilitate the addition or multiplication of the series.^12 (Nothing prevents one from writing k rather than j, incidentally. For a dummy variable, one can use any letter one likes. However, the general habit of writing k and j proves convenient at least in § 4.5.2 and Ch. 8, so we start now.)

11 The hexadecimal numeral 0x56 represents the same number the decimal numeral 86 represents. The book's preface explains why the book represents such numbers in hexadecimal. Appendix A tells how to read the numerals.
12 Section 7.3 speaks further of the dummy variable.
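The Σ and Π notations translate directly into summation and product loops. A minimal Python rendering, not from the book, of the section's 0x56 example:

    from math import prod

    total = sum(k**2 for k in range(3, 7))   # Sigma_{k=3}^{6} k^2
    print(total, hex(total))                 # 86 0x56

    factorial_5 = prod(range(1, 6))          # Pi_{j=1}^{5} j = 5! = 120
    print(factorial_5)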

The product shorthand

    n! ≡ Π_{j=1}^{n} j,
    n!/m! ≡ Π_{j=m+1}^{n} j,

is very frequently used. The notation n! is pronounced "n factorial." Regarding the notation n!/m!, this can of course be regarded correctly as n! divided by m!, but it usually proves more amenable to regard the notation as a single unit.^13

Because multiplication in its more general sense as linear transformation (§ 11.1) is not always commutative, we specify that

    Π_{j=a}^{b} f(j) = [f(b)][f(b − 1)][f(b − 2)] · · · [f(a + 2)][f(a + 1)][f(a)]

rather than the reverse order of multiplication.^14 Multiplication proceeds from right to left. In the event that the reverse order of multiplication is needed, we shall use the notation

    ∐_{j=a}^{b} f(j) = [f(a)][f(a + 1)][f(a + 2)] · · · [f(b − 2)][f(b − 1)][f(b)].

Note that for the sake of definitional consistency,

    Σ_{k=N+1}^{N} f(k) = 0 + Σ_{k=N+1}^{N} f(k) = 0,
    Π_{j=N+1}^{N} f(j) = (1) Π_{j=N+1}^{N} f(j) = 1.

This means among other things that

    0! = 1.

13 One reason among others for this is that factorials rapidly multiply to extremely large sizes, overflowing computer registers during numerical computation. If you can avoid unnecessary multiplication by regarding n!/m! as a single unit, this is a win.
14 The extant mathematical literature lacks an established standard on the order of multiplication implied by the "Π" symbol, but this is the order we shall use in this book.
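Footnote 13's point can be made concrete: computing n!/m! as the single product of j = m + 1, . . . , n touches far smaller intermediate numbers than forming the two factorials separately. A sketch, not from the book:

    from math import factorial, prod

    def falling_product(n, m):
        """n!/m!, computed as the product of j = m+1 .. n (the 'single unit' view)."""
        return prod(range(m + 1, n + 1))

    n, m = 20, 17
    print(falling_product(n, m))           # 18 * 19 * 20 = 6840
    print(factorial(n) // factorial(m))    # same value, via two much larger factorials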

On first encounter, such Σ and Π notation seems a bit overwrought. Admittedly it is easier for the beginner to read "f(1) + f(2) + · · · + f(N)" than "Σ_{k=1}^{N} f(k)." However, experience shows the latter notation to be extremely useful in expressing more sophisticated mathematical ideas. We shall use such notation extensively in this book.

2.4 The arithmetic series

A simple yet useful application of the series sum of § 2.3 is the arithmetic series

    Σ_{k=a}^{b} k = a + (a + 1) + (a + 2) + · · · + b.

Pairing a with b, then a + 1 with b − 1, then a + 2 with b − 2, etc., the average of each pair is [a + b]/2; thus the average of the entire series is [a + b]/2. (The pairing may or may not leave an unpaired element at the series midpoint k = [a + b]/2, but this changes nothing.) The series has b − a + 1 terms. Hence,

    Σ_{k=a}^{b} k = (b − a + 1) (a + b)/2.     (2.6)

Success with this arithmetic series leads one to wonder about the geometric series Σ_{k=0}^{∞} z^k. Section 2.6.4 addresses that point.

2.5 Powers and roots

This necessarily tedious section discusses powers and roots. It offers no surprises. Table 2.2 summarizes its definitions and results. Readers seeking more rewarding reading may prefer just to glance at the table then to skip directly to the start of the next section.

In this section, the exponents k, m, n, p, q, r and s are integers,^15 but the exponents a and b are arbitrary real numbers.

15 Traditionally, the letters i, j, k, m, n, M and N are used to represent integers (i is sometimes avoided because the same letter represents the imaginary unit), but this section needs more integer symbols so it uses p, q, r and s, too. In case the word is unfamiliar to the reader who has learned arithmetic in another language than English, the integers are the negative, zero and positive counting numbers . . . , −3, −2, −1, 0, 1, 2, 3, . . . . The corresponding adjective is integral (although the word "integral" is also used as a noun and adjective indicating an infinite sum of infinitesimals; see Ch. 7).
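A one-line check, not from the book, of the arithmetic-series result (2.6):

    def arithmetic_sum(a, b):
        """Sum of the integers a, a+1, ..., b per eqn (2.6): (b - a + 1)(a + b)/2."""
        return (b - a + 1) * (a + b) // 2

    a, b = 3, 100
    print(arithmetic_sum(a, b))                           # 5047
    print(arithmetic_sum(a, b) == sum(range(a, b + 1)))   # True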

16 n z ≡ n z.2: Power properties and definitions. j=1 (2. n z n ≡ j=1 z.4. n ≥ 0 √ z = (z 1/n )n = (z n )1/n z ≡ z 1/2 (uv)a = ua v a z p/q = (z 1/q )p = (z p )1/q z ab = (z a )b = (z b )a z a+b = z a z b za z a−b = zb 1 z −b = zb 2. when the exponent n is a nonnegative integer. but it further usually indicates that the expression on its right serves to define the expression on its left. multiplied by itself n times. Refer to § 2. More formally.7) 16 The symbol “≡” means “=”.16 CHAPTER 2.1. . CLASSICAL ALGEBRA AND GEOMETRY Table 2.5.1 Notation and integral powers The power notation zn indicates the number z.

17 z 3 = (z)(z)(z). zn Notice that in general. 17 (2.5. The case 00 is interesting because it lacks an obvious interpretation. For interest. z mn = (z m )n = (z n )m . so for consistency with (2. POWERS AND ROOTS For example. z for any integral m and n.11) 2. z 0 = 1.8) (2. 1 .2 Roots Fractional powers are not something we have defined yet. E→∞ E→∞ E→∞ E ǫ→0+ 17 . For similar reasons.10) we let (u1/n )n = u. Setting v = u1/n . z 1 = z.10) On the other hand from multiplicative associativity and commutivity. (2. raised to the nth power. z 2 = (z)(z). if E ≡ 1/ǫ.2. zm z m−n = n . (uv)n = un v n . then „ «1/E 1 lim ǫǫ = lim = lim E −1/E = lim e−(ln E)/E = e0 = 1. This has u1/n as the number which. yields u. z This leads us to extend the definition to negative integral powers with z n−1 = z −n = From the foregoing it is plain that z m+n = z m z n . The specific interpretation depends on the nature and meaning of the two zeros.5.9) (2. zn .

then wpq = z p . In any case. § 2. When z is real and nonnegative. What about powers expressible neither as n nor as 1/n. Extending (2. often written √ z.3 will find it useful if we can also show here that (z 1/q )1/s = z 1/qs = (z 1/s )1/q . (z 1/n )n = z = (z n )1/n (2.14) . the square root of z.13) Since any real number can be approximated arbitrarily closely by a ratio of integers. (2. Equation (2.13) is this subsection’s main result. the power and root operations mutually invert one another. so this is (z 1/q )p = (z p )1/q . (2.13) implies a power definition for all real exponents. The number z 1/n is called the nth root of z—or in the very common case n = 2.5. wp = (z p )1/q . But w = z 1/q . (v n )1/n = u1/n . the result is the same. the last notation is usually implicitly taken to mean the real. Taking the u and v formulas together. (2. which says that it does not matter whether one applies the power or the root first. nonnegative square root.18 CHAPTER 2.12) for any z and integral n. then. However. Taking the qth root. such as the 3/2 power? If z and w are numbers related by wq = z. we define z p/q such that (z 1/q )p = z p/q = (z p )1/q . (v n )1/n = v.10) therefore. CLASSICAL ALGEBRA AND GEOMETRY it follows by successive steps that v n = u.

Raising this equation to the 1/q power. then raising to the qs power yields (ws )q = z.3 Powers of products and powers of powers (uv)p = up v p .5. the last two equations imply (2.13) and (2. Per (2. as we have sought. Raising this equation to the 1/qs power and applying (2.10). w = (z 1/s )1/q . If w ≡ z 1/qs .10).11).5. In other words (uv)a = ua v a for any real a. we have that z (p/q)(r/s) = (z p/q )r/s . per (2. 2. POWERS AND ROOTS The proof is straightforward. On the other hand. z pr = (z p )r . 19 But since w ≡ z 1/qs .2.14) to reorder the powers. we have that (uv)p/q = [up v p ]1/q = = = (up )q/q (v p )q/q (up/q )q (v p/q )q (up/q )(v p/q ) 1/q 1/q q/q = up/q v p/q .14). (2. (2. Successively taking the qth and sth roots gives w = (z 1/q )1/s .15) . By identical reasoning.

With such a definition the results apply not only for all bases but also for all exponents.18) 2.18) is z a−b = za . CLASSICAL ALGEBRA AND GEOMETRY By identical reasoning. which according to (2. 2. a.5. § 3. real or complex. p/q. Still.11 and Ch. zb (2.5 Summary and remarks Table 2. if complex (§ 2. which implies that 1 .4 Sums of powers z (p/q)+(r/s) = (z ps+rq )1/qs = (z ps z rq )1/qs = z p/q z r/s . 1 = z −b+b = z −b z b . we shall remove later. zb But then replacing −b ← b in (2. the formulas remain valid.20 CHAPTER 2. In the case that a = −b.19) (2. b and so on are real numbers. 5. n.16) for any real a and b. purposely defining the action of a complex exponent to comport with the results found here. This restriction. Looking ahead to § 2. (2.12. one can reason that or in other words that z a+b = z a z b .17) leads to z −b = z a−b = z a z −b . this implies that (z a )b = z ab = (z b )a (2. u and v to be real numbers. . Since p/q and r/s can approximate any real numbers with arbitrary precision. w.5.9).15) and (2.17) (2. With (2.16). we observe that nothing in the foregoing analysis requires the base variables z.2 on page 16 summarizes the section’s definitions and results. z (p/q)(r/s) = (z r/s )p/q . the analysis does imply that the various exponents m.12).

2. The experienced scientist or engineer may notice that the above vocabulary omits the name “Taylor series.” Semantics. This section discusses the multiplication and division of power series. not if it includes negative powers of z.20). depending on whether the speaker is male or female. 18 Another name for the power series is polynomial. We tend to call (2. technically.20) a “power series with negative powers. they call a “power series” only if ak = 0 for all k < 0—in other words.20) in general a Laurent series. If that is not what you seek. to the point that we applied mathematicians (at least in the author’s country) seem to feel somehow unlicensed actually to use the name. 8. They call it a “polynomial” only if it is a “power series” with a finite number of terms.6 Multiplying and dividing power series A power series 18 is a weighted sum of integral powers: ∞ k=−∞ A(z) = ak z k . That is after all exactly what it is. but the two names in fact refer to essentially the same thing.” The vocabulary omits the name because that name fortunately remains unconfused in usage—it means quite specifically a power series without negative powers and tends to connote a representation of some particular function of interest—as we shall see in Ch. Equation (2.20) a Laurent series if you prefer (and if you pronounce it right: “lor-ON”). The name “Laurent series” is a name we shall meet again in § 8. you can snow him with the big word “Laurent series.14. They call (2. by the even humbler name of “polynomial.” or just “a power series. but one is expected most carefully always to give the right name in the right place. if it has a finite number of terms. The power-series terminology seems to share a spirit of that kin. to sketch some annulus in the Argand plane. You however can call (2. then you may find it better just to call the thing by the less lofty name of “power series”—or better. and/or to engage in other maybe unnecessary formalities.6. What a bother! (Someone once told the writer that the Japanese language can give different names to the same object. In the meantime however we admit that the professionals have vaguely daunted us by adding to the name some pretty sophisticated connotations. Professional mathematicians use the terms more precisely. (2. When someone does object. The word “polynomial” usually connotes a power series with a finite number of terms.” instead. .” be prepared for people subjectively— for no particular reason—to expect you to establish complex radii of convergence. All these names mean about the same thing. MULTIPLYING AND DIVIDING POWER SERIES 21 2.20) where the several ak are arbitrary constants.” This book follows the last usage. the writer recommends that you call it a “power series” and then not worry too much about it until someone objects.) If you seek just one word for the thing. Nevertheless if you do use the name “Laurent series.

2.6.1 Multiplying power series

Given two power series

    A(z) = Σ_{k=−∞}^{∞} a_k z^k,
    B(z) = Σ_{k=−∞}^{∞} b_k z^k,     (2.21)

the product of the two series is evidently

    P(z) ≡ A(z)B(z) = Σ_{k=−∞}^{∞} Σ_{j=−∞}^{∞} a_j b_{k−j} z^k.     (2.22)

2.6.2 Dividing power series

The quotient Q(z) = B(z)/A(z) of two power series is a little harder to calculate, and there are at least two ways to do it. Section 2.6.3 below will do it by matching coefficients, but this subsection does it by long division. For example,

    (2z² − 3z + 3)/(z − 2) = (2z² − 4z)/(z − 2) + (z + 3)/(z − 2)
                           = 2z + (z + 3)/(z − 2)
                           = 2z + (z − 2)/(z − 2) + 5/(z − 2)
                           = 2z + 1 + 5/(z − 2).

The strategy is to take the dividend^19 B(z) piece by piece, purposely choosing pieces easily divided by A(z). If you feel that you understand the example, then that is really all there is to it, and you can skip the rest of the subsection if you like. One sometimes wants to express the long division of power series more formally, however. That is what the rest of the subsection is about. (Be advised however that the cleverer technique of § 2.6.3, though less direct, is often easier and faster.)

Formally, we prepare the long division B(z)/A(z) by writing

    B(z) = A(z)Q_n(z) + R_n(z),     (2.23)

19 If Q(z) is a quotient and R(z) a remainder, then B(z) is a dividend (or numerator) and A(z) a divisor (or denominator). Such are the Latin-derived names of the parts of a long division.
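Equation (2.22) says that the coefficients of a product of power series are the convolution of the factors' coefficients. The sketch below, not from the book, spells this out for finite series whose lists hold the coefficients of z⁰, z¹, z², . . . :

    def multiply_series(a, b):
        """Coefficients of A(z)*B(z) per eqn (2.22), for finite coefficient lists."""
        p = [0] * (len(a) + len(b) - 1)
        for j, aj in enumerate(a):
            for k, bk in enumerate(b):
                p[j + k] += aj * bk    # a_j b_k contributes to the z^(j+k) term
        return p

    A = [1, 2]        # 1 + 2z
    B = [3, 0, 1]     # 3 + z^2
    print(multiply_series(A, B))   # [3, 6, 1, 2], i.e. 3 + 6z + z^2 + 2z^3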

N − 2. where K and N identify the greatest orders k of z k present in A(z) and B(z). MULTIPLYING AND DIVIDING POWER SERIES 23 where Rn (z) is a remainder (being the part of B[z] remaining to be divided). + Rn (z) − aK aK Matching this equation against the desired iterate B(z) = A(z)Qn−1 (z) + Rn−1 (z) and observing from the definition of Qn (z) that Qn−1 (z) = Qn (z) + . At start. What does it mean? The key to understanding it lies in understanding (2. . respectively. . The goal is to grind Rn (z) away to nothing.23). . Rn (z) = k=−∞ Qn (z) = N −K k=n−K+1 qk z k . where n = N. but the quotient Qn (z) and the remainder Rn (z) change. which is not one but several equations—one equation for each value of n. N − 1. which we add and subtract from (2. Well.24) rnk z k . that is a lot of symbology. k=−∞ B(z) = RN (z) = B(z).6. As in the example. The piece we choose is (rnn /aK )z n−K A(z). bk z k . aK = 0. . to make it disappear as n → −∞. and K A(z) = k=−∞ N ak z k .2. n (2. QN (z) = 0. but the thrust of the long division process is to build Qn (z) up by wearing Rn (z) down. QN (z) = 0 while RN (z) = B(z). The dividend B(z) and the divisor A(z) stay the same from one n to the next.23) to obtain B(z) = A(z) Qn (z) + rnn n−K rnn n−K z z A(z) . we pursue the goal by choosing from Rn (z) an easily divisible piece containing the whole high-order term of Rn (z).

Rn (z) B(z) = Qn (z) + . then so long as Rn (z) tends to vanish as n → −∞. § 3.2] The notations Ko . A(z) Iterating only a finite number of times leaves a remainder. Table 2.20 In its qn−K equation. Then we iterate per (2.” “a sub k” and “z to the k” (or.24 CHAPTER 2. k=No B(z) = then n Rn (z) = k=n−(K−Ko )+1 20 21 rnk z k for all n < No + (K − Ko ). bk z k . the table includes also the result of § 2. (2. qn−K = where no term remains in Rn−1 (z) higher than a z n−1 term. as “K naught.3 summarizes the long-division procedure.23) that B(z) = Q−∞ (z).25) except that it may happen that Rn (z) = 0 for sufficiently small n. respectively.3 below.23) is trivially true.27) (2. aK Rn−1 (z) = Rn (z) − qn−K z n−K A(z).25) as many times as desired. If an infinite number of times. To begin the actual long division. we find that rnn .28) [46. “z to the kth power”)—at least in the author’s country. it follows from (2. A(z) A(z) (2. we initialize RN (z) = B(z). ak and z k are usually pronounced. for which (2.26) (2. CLASSICAL ALGEBRA AND GEOMETRY qn−K z n−K . more fully.6. It should be observed in light of Table 2. .3 that if21 K A(z) = k=Ko N ak z k .

3: Dividing power series through successively smaller powers.6.2. MULTIPLYING AND DIVIDING POWER SERIES 25 Table 2. B(z) = A(z)Qn (z) + Rn (z) K A(z) = k=−∞ N ak z k . aK = 0 bk z k k=−∞ B(z) = RN (z) = B(z) QN (z) = 0 n Rn (z) = k=−∞ rnk z k N −K k=n−K+1 Qn (z) = qk z k N −K qn−K = 1 rnn = aK aK bn − an−k qk Rn−1 (z) = Rn (z) − qn−K z B(z) = Q−∞ (z) A(z) k=n−K+1 n−K A(z) .

which is exactly the claim (2. The greatest-order term of Rn (z) is by definition a z n term. Often. then according to the equation so also must the leastorder term of Rm−1 (z) be a z No term.26 CHAPTER 2. bk z k ∞ k=N −K qk z k If a more formal demonstration of (2. 2. however. In this case. otherwise a z No term.28) is wanted.29) ∞ k=K ∞ k=N ak z k . Under these conditions. evidently the least-order term of Rm−1 (z) is a z m−(K−Ko ) term when m − (K − Ko ) ≤ No . CLASSICAL ALGEBRA AND GEOMETRY That is. One can divide them by matching coefficients.3 extends the quotient Qn (z) through successively smaller powers of z. of course. one prefers to extend the quotient through successively larger powers of z. This is better stated after the change of variable n + 1 ← m: the least-order term of Rn (z) is a z n−(K−Ko )+1 term when n < No + (K − Ko ).28) makes. the terms of Rn (z) run from z n−(K−Ko )+1 through z n . unless an even lower-order term be contributed by the product z m−K A(z). aK = 0. in summary.2.6.22 The long-division procedure of Table 2. is that we have strategically planned the long-division iteration precisely to cause the leading term of the divisor to cancel the leading term of the remainder at each step. otherwise a z No term. But that very product’s term of least order is a z m−(K−Ko ) term.6.25) that rmm m−K Rm−1 (z) = Rm (z) − z A(z).5] . The reason for this. the long division goes by the complementary rules of Table 2. 23 [29][14. when n < No + (K − Ko ). So. the remainder has order one less than the divisor has. A(z) (2.4. then consider per (2. § 2. sometimes quicker way to divide power series than by the long division of § 2. where a z K term is A(z)’s term of least order. aK If the least-order term of Rm (z) is a z No term (as clearly is the case at least for the initial remainder RN [z] = B[z]).3 Dividing power series by matching coefficients There is another.23 If Q∞ (z) = where A(z) = B(z) = are known and Q∞ (z) = 22 B(z) .

aK = 0 bk z k RN (z) = B(z) QN (z) = 0 Rn (z) = Qn (z) = k=N −K ∞ rnk z k qk z k k=n n−K−1 qn−K = rnn 1 = aK aK n−K−1 bn − an−k qk A(z) Rn+1 (z) = Rn (z) − qn−K z B(z) = Q∞ (z) A(z) k=N −K n−K .2.4: Dividing power series through successively larger powers. MULTIPLYING AND DIVIDING POWER SERIES 27 Table 2.6. B(z) = A(z)Qn (z) + Rn (z) A(z) = B(z) = ∞ k=K ∞ k=N ak z k .

30) yields a sequence of coefficients does not necessarily mean that the resulting power series Q∞ (z) converges to some definite value over a given domain.3 and 2. even though all its coefficients are known. often what interest us are only the series’ first several terms n−K−1 Qn (z) = In this case. The adaptation is left as an exercise to the interested reader. (2.29) through by A(z) to obtain the form A(z)Q∞ (z) = B(z). we have that qn−K = 1 aK n−K−1 bn − an−k qk k=N −K . Admittedly.30) is correct when Q∞ (z) does converge. then one can multiply (2. each coefficient depending on the coefficients earlier computed.22) and changing the index n ← k on the right side. n=N k=N −K n−K k=N −K But for this to hold for all z. ∞ n−K an−k qk z n = ∞ n=N bn z n . rather than increasing. Transferring all terms but aK qn−K to the equation’s right side and dividing by aK .4 incorporate the technique both ways.31) for Rn (z). the fact that (2. Q∞ (z) = k=N −K qk z k . but Tables 2. the coefficients must match for each n: an−k qk = bn . At least (2. Even when Q∞ (z) as such does not converge. Rn (z) = B(z) − A(z)Qn (z). Solving (2.30) computes the coefficients of Q(z). however. Rn (z) B(z) = Qn (z) + A(z) A(z) and convergence is not an issue.31) (2. (2.28 CHAPTER 2.34). Expanding the left side according to (2. which diverges when |z| > 1. powers of z if needed or desired. CLASSICAL ALGEBRA AND GEOMETRY is to be calculated. n ≥ N. The coefficient-matching technique of this subsection is easily adapted to the division of series in decreasing.30) Equation (2.32) . Consider for instance (2. n ≥ N.

2.6.4 Common power-series quotients and the geometric series

Frequently encountered power-series quotients, calculated by the long division of § 2.6.2, computed by the coefficient matching of § 2.6.3, and/or verified by multiplying, include^24

    1/(1 ± z) =  Σ_{k=0}^{∞} (∓)^k z^k,      |z| < 1;
    1/(1 ± z) = −Σ_{k=−∞}^{−1} (∓)^k z^k,    |z| > 1.     (2.33)

Equation (2.33) almost incidentally answers a question which has arisen in § 2.4 and which often arises in practice: to what total does the infinite geometric series Σ_{k=0}^{∞} z^k, |z| < 1, sum? Answer: it sums exactly to 1/(1 − z). However, there is a simpler, more aesthetic way to demonstrate the same thing, as follows. Let

    S ≡ Σ_{k=0}^{∞} z^k,  |z| < 1.

Multiplying by z yields

    zS ≡ Σ_{k=1}^{∞} z^k.

Subtracting the latter equation from the former leaves

    (1 − z)S = 1,

which, after dividing by 1 − z, implies that

    S ≡ Σ_{k=0}^{∞} z^k = 1/(1 − z),  |z| < 1,     (2.34)

as was to be demonstrated.

2.6.5 Variations on the geometric series

Besides being more aesthetic than the long division of § 2.6.2, the difference technique of § 2.6.4 permits one to extend the basic geometric series in

24 The notation |z| represents the magnitude of z. For example, |5| = 5, but also |−5| = 5.
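A numeric sanity check, not from the book, of (2.34): for |z| < 1 the partial sums of the geometric series approach 1/(1 − z).

    z = 0.37 + 0.20j                        # any point with |z| < 1
    partial = sum(z**k for k in range(200))
    print(partial)
    print(1 / (1 - z))                      # agrees with the partial sum to machine precision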

the model 25 [33] . independent variables and dependent variables. it doesn’t vary). vsound is an indeterminate constant (given particular atmospheric conditions. among others. k (k + 1)(k)z k . We multiply the unknown S1 by z. independent variables and dependent variables Mathematical models use indeterminate constants. |z| < 1. we can compute as follows.7 Indeterminate constants. |z| < 1 (which arises in. so. Here. k k3 z k . 1−z where we have used (2. We then subtract zS1 from S1 . The three are best illustrated by example. we arrive at ∞ z . The model gives t as a function of ∆r. and t is a dependent variable. where ∆r is the distance from source to listener and vsound is the speed of sound. the sum S1 ≡ ∞ k=0 kz k . Dividing by 1 − z. Planck’s quantum blackbody radiation calculation25 ). 2.34) to collapse the last sum. (2. can be calculated in like manner as the need for them arises. etc. Further series of the kind. Consider the time t a sound needs to travel from its source to a distant listener: t= ∆r vsound .35) kz k = S1 ≡ (1 − z)2 k=0 which was to be found.30 CHAPTER 2. CLASSICAL ALGEBRA AND GEOMETRY several ways.. leaving (1 − z)S1 = ∞ k=0 kz − k ∞ k=1 (k − 1)z = k ∞ k=1 z =z k ∞ k=0 zk = z . ∆r is an independent variable. For instance. if you tell the model how far the listener sits from the sound source. producing zS1 = ∞ kz k+1 = ∞ k=0 k=1 (k − 1)z k . such as k k2 z k .

The chance that the typical reader will ever specify the dimensions of a real musical concert hall is of course vanishingly small. an independent variable or a dependent variable often depends upon one’s immediate point of view. now regarding as an independent variable what a moment ago we had regarded as an indeterminate constant. because the chance that the typical reader will ever specify something technical is quite large. One could calculate the quantity of water in a kitchen mixing bowl just as well. you just have to recalculate numerically).7. Such examples remind one of the kind of calculation one encounters in a childhood arithmetic textbook. ∆r. The same model in the example would remain valid if atmospheric conditions were changing (vsound would then be an independent variable) or if the model were used in designing a musical concert hall26 to suffer a maximum acceptable sound time lag from the stage to the hall’s back row (t would then be an independent variable. The point is that conceptually there pre¨xists some right figure for the indeterminate e constant. day-to-day decisions where small sums of money and negligible risk to life are at stake—are done with models hardly more sophisticated than the one shown here. Such a shift of viewpoint is fine.2. Occasionally we go so far as deliberately to change our point of view in mid-analysis. Note that the abstract validity of the model does not necessarily depend on whether we actually know the right figure for vsound (if I tell you that sound goes at 500 m/s. Knowing the figure is not the point. However. § 9. after all. dependent). so long as we remember that there is a difference between the three kinds of quantity and we keep track of which quantity is which kind to us at the moment. The main reason it matters which symbol represents which of the three kinds of quantity is that in calculus. it probably doesn’t ruin the theoretical part of your analysis. 26 . for instance (a typical case of this arises in the solution of differential equations by the method of unknown coefficients. Although there exists a definite philosophical distinction between the three kinds of quantity. as of the quantity of air contained in an astronaut’s round helmet. nevertheless it cannot be denied that which particular quantity is an indeterminate constant. maybe the concert-hall example is not so unreasonable.4). one analyzes how change in independent variables affects dependent variables as indeterminate constants remain Math books are funny about examples like this. but later you find out that the real figure is 331 m/s. CONSTANTS AND VARIABLES 31 returns the time needed for the sound to propagate from one to the other. Although sophisticated models with many factors and terms do indeed play a major role in engineering. but astronauts’ helmets are so much more interesting than bowls. you see. So. that sound goes at some constant speed—whatever it is—and that we can calculate the delay in terms of this. the great majority of practical engineering calculations—for quick. it is the idea of the example that matters here.

(2.3 has introduced the dummy variable. Refer to §§ 2. Within the expression. the inverse operation is not the root but rather the logarithm: loga (az ) = z.1 The logarithm The exponential operation follows the same laws the power operation follows. not on some other variable within the expression.8 Exponentials and logarithms In § 2. . where (in § 2.37) Hence.” of course. in fact. However. With the change of variable w ← az .5 we have considered the power operation z a . One can view it as the exponential operation az . it fills no role because logically it does not exist there. without. There is another way to view the power operation. this is aloga w = w. the dummy variable fills the role of an independent variable.7’s language) the independent variable z is the base and the indeterminate constant a is the exponent. we have that aloga (a z) = az . but its dependence is on the operator controlling the expression. 2.) 2.8. Such a dummy variable does not seem very “independent. but because the variable of interest is now the exponent rather than the base.32 CHAPTER 2.3 and 7. most dummy variables are just independent variables—a few are dependent variables— whose scope is restricted to a particular expression. the exponential and logarithmic operations mutually invert one another. “What power must I raise a to.36) The logarithm loga w answers the question. to get w?” Raising a to the power of the last equation. which the present section’s threefold taxonomy seems to exclude. however. where the variable z is the exponent and the constant a is the base.3. CLASSICAL ALGEBRA AND GEOMETRY fixed. (2. (Section 2.

2.8.2 Properties of the logarithm

The basic properties of the logarithm include that

    log_a uv = log_a u + log_a v,           (2.38)
    log_a (u/v) = log_a u − log_a v,        (2.39)
    log_a (w^z) = z log_a w,                (2.40)
    w^z = a^{z log_a w},                    (2.41)
    log_b w = log_a w / log_a b.            (2.42)

Of these, (2.38) follows from the steps

    (uv) = (u)(v),
    a^{log_a uv} = (a^{log_a u})(a^{log_a v}),
    a^{log_a uv} = a^{log_a u + log_a v};

and (2.39) follows by similar reasoning. Equations (2.40) and (2.41) follow from the steps

    w^z = (w^z) = (w)^z,
    w^z = a^{log_a (w^z)} = (a^{log_a w})^z = a^{z log_a w}.

Equation (2.42) follows from the steps

    w = b^{log_b w},
    log_a w = log_a (b^{log_b w}),
    log_a w = log_b w log_a b.

Among other purposes, (2.38) through (2.42) serve respectively to transform products to sums, quotients to differences, powers to products, exponentials to differently based exponentials, and logarithms to differently based logarithms. Table 2.5 repeats the equations along with (2.36) and (2.37) (which also emerge as restricted forms of eqns. 2.38 through 2.42), thus summarizing the general properties of the logarithm.
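A quick numeric spot-check, not from the book, of (2.38), (2.40) and (2.42). The helper log(base, x) is merely illustrative, built on the natural logarithm.

    import math

    def log(base, x):
        return math.log(x) / math.log(base)

    a, b, u, v, w, z = 10.0, 2.0, 3.0, 7.0, 5.0, 1.3
    print(abs(log(a, u * v) - (log(a, u) + log(a, v))))   # ~0, eqn (2.38)
    print(abs(log(a, w**z) - z * log(a, w)))              # ~0, eqn (2.40)
    print(abs(log(b, w) - log(a, w) / log(a, b)))         # ~0, eqn (2.42)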

Table 2.5: General properties of the logarithm.

    log_a uv = log_a u + log_a v
    log_a (u/v) = log_a u − log_a v
    log_a (w^z) = z log_a w
    w^z = a^{z log_a w}
    log_b w = log_a w / log_a b
    log_a (a^z) = z
    w = a^{log_a w}

2.9.1 Triangle area

The area of a right triangle^27 is half the area of the corresponding rectangle. This is seen by splitting a rectangle down its diagonal into a pair of right triangles of equal size. The fact that any triangle's area is half its base length times its height is seen by dropping a perpendicular from one point of the triangle to the opposite side (see Fig. 1.1 on page 4), dividing the triangle into two right triangles, for each of which the fact is true. In algebraic symbols,

    A = bh/2,     (2.43)

where A stands for area, b for base length, and h for perpendicular height.

2.9.2 The triangle inequalities

Any two sides of a triangle together are longer than the third alone, which itself is longer than the difference between the two. In symbols,

    |a − b| < c < a + b,     (2.44)

where a, b and c are the lengths of a triangle's three sides. These are the triangle inequalities. The truth of the sum inequality c < a + b is seen by sketching some triangle on a sheet of paper and asking: if c is the direct route between two points and a + b is an indirect route, then how can a + b not be longer? Of course the sum inequality is equally good on any of the

27 A right triangle is a triangle, one of whose three angles is perfectly square.

Figure 2.2: The sum of a triangle's inner angles: turning at the corner.

triangle's three sides, so one can write a < c + b and b < c + a just as well as c < a + b. Rearranging the a and b inequalities, we have that a − b < c and b − a < c, which together say that |a − b| < c. The last is the difference inequality, completing (2.44)'s proof.

2.9.3 The sum of interior angles

A triangle's three interior angles^28 sum to 2π/2. One way to see the truth of this fact is to imagine a small car rolling along one of the triangle's sides. Reaching the corner, the car turns to travel along the next side, and so on round all three corners to complete a circuit, returning to the start. Since the car again faces the original direction, we reason that it has turned a total of 2π: a full revolution. But the angle φ the car turns at a corner and the triangle's inner angle ψ there together form the straight angle 2π/2 (the sharper the inner angle, the more the car turns: see Fig. 2.2). In mathematical notation,

    φ1 + φ2 + φ3 = 2π,
    φk + ψk = 2π/2,  k = 1, 2, 3,

where ψk and φk are respectively the triangle's inner angles and the angles through which the car turns. Solving the latter equations for φk and

28 Many or most readers will already know the notation 2π and its meaning as the angle of full revolution. For those who do not, the notation is introduced more properly in §§ 3.1, 3.6 and 8.11 below. Briefly, the symbol 2π represents a complete turn, a full circle, a spin to face the same direction as before. Hence 2π/4 represents a square turn or right angle. You may be used to the notation 360° in place of 2π, but for the reasons explained in Appendix A and in footnote 15 of Ch. 3, this book tends to avoid the former notation.

substituting into the former yields

    ψ1 + ψ2 + ψ3 = 2π/2,     (2.45)

which was to be demonstrated. Extending the same technique to the case of an n-sided polygon, we have that

    Σ_{k=1}^{n} φk = 2π,
    φk + ψk = 2π/2.

Solving the latter equations for φk and substituting into the former yields

    Σ_{k=1}^{n} [2π/2 − ψk] = 2π,

or in other words

    Σ_{k=1}^{n} ψk = (n − 2)(2π/2),     (2.46)

which was to be demonstrated. Equation (2.45) is then seen to be a special case of (2.46) with n = 3.

2.10 The Pythagorean theorem

Along with Euler's formula (5.11), the fundamental theorem of calculus (7.2) and Cauchy's integral formula (8.29), the Pythagorean theorem is one of the most famous results in all of mathematics. The theorem holds that

    a² + b² = c²,     (2.47)

where a, b and c are the lengths of the legs and diagonal of a right triangle, as in Fig. 2.3. Many proofs of the theorem are known. One such proof posits a square of side length a + b with a tilted square of side length c inscribed as in Fig. 2.4. The area of the large outer square is (a + b)². The area of the tilted inner square is c². The area of each of the four triangles in the figure is evidently ab/2. But the large outer square is comprised of the tilted inner square plus the four triangles, hence the area

Figure 2.3: A right triangle.

Figure 2.4: The Pythagorean theorem.

Be that as it may. thus also to c. A root (or zero) of a function is a domain point at which the function evaluates to zero (the example has roots at x = ±1). it follows that c2 + h2 = r 2 . A singularity of a function is a domain point at which the function’s output This elegant proof is far simpler than the one famously given by the ancient geometer Euclid. Unfortunately the citation is now long lost. which simplifies directly to (2. a function is a mapping from one number (or vector of several numbers) to another. but it is possible [52.” 02:32. f (x) = x2 − 1 is a function which maps 1 to 0 and −3 to 8. “Pythagorean theorem. among others. Briefly. however. Inasmuch as (2. In mathematical symbols.47) applies to any right triangle. 29 .29 The Pythagorean theorem is readily extended to three dimensions as a2 + b2 + h2 = r 2 .47). if the domain is restricted to real x such that |x| ≤ 3. so can claim no credit for originating it. One often speaks of domains and ranges when discussing functions. In the example. CLASSICAL ALGEBRA AND GEOMETRY of the large outer square equals the area of the tilted inner square plus the areas of the four triangles. For example. this is (a + b)2 = c2 + 4 ab 2 . The domain of a function is the set of numbers one can put into it.11 Functions This book is not the place for a gentle introduction to the concept of the function.48). Whether Euclid was acquainted with the simple proof given here this writer does not know. 2. singularity and pole. yet more appealing than alternate proofs often found in print. The range of a function is the corresponding set of numbers one can get out of it. Other terms which arise when discussing functions are root (or zero). then the corresponding range is −1 ≤ f (x) ≤ 8. and where r is the corresponding three-dimensional diagonal: the diagonal of the right triangle whose legs are c and h. the present writer encountered the proof this section gives somewhere years ago and has never seen it in print since. which equation expands directly to yield (2. (2.38 CHAPTER 2. 31 March 2006] that Euclid chose his proof because it comported better with the restricted set of geometrical elements he permitted himself to work with. A current source for the proof is [52] as cited earlier in this footnote.48) where h is an altitude perpendicular to both a and b.

(Besides the root. the singularity and the pole. the function h(x) = 1/(x2 − 1) has poles at x = ±1. Since there exists no real number i such that i2 = −1 (2.49) and since the quantity i thus defined is found to be critically important across broad domains of higher mathematics.32 30 Here is one example of the book’s deliberate lack of formal mathematical rigor. however. or otherwise to frame the problem such that one need not approach the singularity. an infamous example of which is z = 0 in the function √ g[z] = z.5.— neither of us has ever met just 1). 32 The English word imaginary is evocative. A more precise formalism to say that “the function’s output is infinite” might be x→xo lim |f (x)| = ∞.49) as the definition of a fundamentally new kind of quantity: the imaginary number. What it has not done is to tell us √ how to regard a quantity such as −1. however. say. that the number 1/2 never emerges directly from counting things. The example function f (x) has no singularities for finite x. then the period separating your fourteenth and twenty-first birthdays could have .2.6. where the function’s output is infinite. The word imaginary in the mathematical sense is thus more of a technical term than a descriptive adjective. there is also the troublesome branch point. one chair. which (§ 9.5. However.12 Complex numbers (introduction) Section 2. Notice. that is. but perhaps not of quite the right concept in this usage. The applied mathematician tends to avoid such formalism where there seems no immediate use for it. Imaginary numbers are not to mathematics as. of course. If for some reason the iyear were offered as a unit of time. for example. but the book must lay a more extensive foundation before introducing them properly in § 8. either. Branch points are important. imaginary elfs are (presumably) not substantial objects. as w ← 1/z. in the mathematical realm. one summer afternoon. never directly from counting things.12. COMPLEX NUMBERS (INTRODUCTION) 39 diverges. imaginary numbers are substantial. but the best way to handle such unreasonable singularities is almost always to change a variable. but then so is the number 1 (though you and I have often met one of something—one apple. we accept (2. etc. This book will have little to say of such singularities. 31 There is further the essential singularity. like 1/ x).30 A pole is a sin√ gularity that behaves locally like 1/x (rather than.2) can be thought of as N poles. In the physical world.2 has introduced square roots.31 ) 2. The number i is just a concept. an example of which is z = 0 in p(z) = exp(1/z). A singularity that behaves as 1/xN is a multiple pole. imaginary elfs are to the physical world. The reason imaginary numbers are called “imaginary” probably has to do with the fact that they emerge from mathematical operations only.

let us call it a useful formalism.40 CHAPTER 2. which per the Pythagorean theorem (§ 2.133 is such that y (2. x been measured as −i7 iyears.5: The complex (or Argand) plane. If the equation doesn’t make sense to you yet for this reason. The unpersuaded reader is asked to suspend judgment a while.50) The phase arg z of the complex number is defined to be the angle φ in the figure. He will soon see the use. rather. plotted at right angles to the familiar real number line as in Fig. The conjugate z ∗ of this complex number is defined to be z ∗ = x − iy. The important point is that arg z is the angle φ in the figure. 2. Madness? No. 2. skip it for now. which in terms of the trigonometric functions of § 3.10) is such that |z|2 = x2 + y 2 .5.5. . and a complex number z therein. The magnitude (or modulus. iℑ(z) i2 i z ρ φ −2 −1 −i −i2 1 2 ℜ(z) Imaginary numbers are given their own number line. let us not call it that. (2. 33 This is a forward reference. CLASSICAL ALGEBRA AND GEOMETRY Figure 2. or absolute value) |z| of the complex number is defined to be the length ρ in Fig. The sum of a real number x and an imaginary number iy is the complex number z = x + iy.51) tan(arg z) = .

2. 34 [13. = 2 x2 + y 2 2 It is a curious fact that It is a useful fact that 1 = −i. but would perfectly mirror one another in every respect. particularly when printed by hand).54) (2.12. 2. COMPLEX NUMBERS (INTRODUCTION) 41 Specifically to extract the real and imaginary parts of a complex number.56) (the curious fact. claiming that j not i were the true imaginary unit.34 then one would find that (−j)2 = −1 = j 2 . including that z1 z2 = (x1 x2 − y1 y2 ) + i(y1 x2 + x1 y2 ).2.53) (2.2 Complex conjugation An important property of complex numbers descends subtly from the fact that i2 = −1 = (−i)2 . the notations ℑ(z) = y.1 Multiplication and division of complex numbers in rectangular form Several elementary properties of complex numbers are readily seen if the fact that i2 = −1 is kept in mind. (2.55.12. z1 x2 − iy2 x1 + iy1 x1 + iy1 = = z2 x2 + iy2 x2 − iy2 x2 + iy2 (x1 x2 + y1 y2 ) + i(y1 x2 − x1 y2 ) . ℜ(z) = x. 2.52) are conventionally recognized (although often the symbols ℜ(·) and ℑ(·) are written Re(·) and Im(·). eqn. § I:22-5] . The units i and j would differ indeed. is useful. i z ∗ z = x2 + y 2 = |z|2 (2.12.55) (2. too). If one defined some number j ≡ −i. and thus that all the basic properties of complex numbers in the j system held just as well as they did in the i system.


That is the basic idea. To establish it symbolically needs a page or so of slightly abstract algebra as follows, the goal of which will be to show that [f(z)]* = f(z*) for some unspecified function f(z) with specified properties. To begin with, if

    z = x + iy,

then by definition

    z* = x − iy.

Proposing that (z^{k−1})* = (z*)^{k−1} (which may or may not be true but for the moment we assume it), we can write

    z^{k−1} = s_{k−1} + it_{k−1},
    (z*)^{k−1} = s_{k−1} − it_{k−1},

where s_{k−1} and t_{k−1} are symbols introduced to represent the real and imaginary parts of z^{k−1}. Multiplying the former equation by z = x + iy and the latter by z* = x − iy, we have that

    z^k = (xs_{k−1} − yt_{k−1}) + i(ys_{k−1} + xt_{k−1}),
    (z*)^k = (xs_{k−1} − yt_{k−1}) − i(ys_{k−1} + xt_{k−1}).

With the definitions s_k ≡ xs_{k−1} − yt_{k−1} and t_k ≡ ys_{k−1} + xt_{k−1}, this is written more succinctly

    z^k = s_k + it_k,
    (z*)^k = s_k − it_k.

In other words, if (z^{k−1})* = (z*)^{k−1}, then it necessarily follows that (z^k)* = (z*)^k. Solving the definitions of s_k and t_k for s_{k−1} and t_{k−1} yields the reverse definitions s_{k−1} = (xs_k + yt_k)/(x² + y²) and t_{k−1} = (−ys_k + xt_k)/(x² + y²). Therefore, except when z = x + iy happens to be null or infinite, the implication is reversible by reverse reasoning, so by mathematical induction^35 we have that

    (z^k)* = (z*)^k     (2.57)

for all integral k. We have also from (2.53) that

    (z1 z2)* = z1* z2*     (2.58)

35 Mathematical induction is an elegant old technique for the construction of mathematical proofs. Section 8.1 elaborates on the technique and offers a more extensive example. Beyond the present book, a very good introduction to mathematical induction is found in [20].

for any complex z1 and z2. Consequences of (2.57) and (2.58) include that if

    f(z) ≡ Σ_{k=−∞}^{∞} (a_k + ib_k)(z − z_o)^k,     (2.59)
    f*(z) ≡ Σ_{k=−∞}^{∞} (a_k − ib_k)(z − z_o*)^k,    (2.60)

where a_k and b_k are real and imaginary parts of the coefficients peculiar to the function f(·), then

    [f(z)]* = f*(z*).     (2.61)

In the common case where all b_k = 0 and z_o = x_o is a real number, then f(·) and f*(·) are the same function, so (2.61) reduces to the desired form

    [f(z)]* = f(z*),     (2.62)

which says that the effect of conjugating the function’s input is merely to conjugate its output. Equation (2.62) expresses a significant, general rule of complex numbers and complex variables which is better explained in words than in mathematical symbols. The rule is this: for most equations and systems of equations used to model physical systems, one can produce an equally valid alternate model simply by simultaneously conjugating all the complex quantities present.36
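The conjugation property (2.62) is easy to observe numerically for a power series with real coefficients. The short polynomial below is an arbitrary illustration, not an example taken from the book.

    def f(z):
        # a power series (here a short polynomial) with real coefficients a_k
        return 2 + 3*z - 5*z**2 + z**3

    z = 1.7 - 0.9j
    print(f(z).conjugate())    # conjugating the output ...
    print(f(z.conjugate()))    # ... equals evaluating at the conjugated input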

2.12.3 Power series and analytic functions (preview)

Equation (2.59) is a general power series37 in z − zo . Such power series have broad application.38 It happens in practice that most functions of interest
36 [20][44]
37 [24, § 10.8]
38 That is a pretty impressive-sounding statement: "Such power series have broad application." However, air, words and molecules also have "broad application"; merely stating the fact does not tell us much. In fact the general power series is a sort of one-size-fits-all mathematical latex glove, which can be stretched to fit around almost any function. The interesting part is not so much in the general form (2.59) of the series as it is in the specific choice of a_k and b_k, which this section does not discuss. Observe that the Taylor series (which this section also does not discuss; see § 8.3) is a power series with a_k = b_k = 0 for k < 0.


in modeling physical phenomena can conveniently be constructed as power series (or sums of power series)39 with suitable choices of ak , bk and zo . The property (2.61) applies to all such functions, with (2.62) also applying to those for which bk = 0 and zo = xo . The property the two equations represent is called the conjugation property. Basically, it says that if one replaces all the i in some mathematical model with −i, then the resulting conjugate model is equally as valid as the original.40 Such functions, whether bk = 0 and zo = xo or not, are analytic functions (§ 8.4). In the formal mathematical definition, a function is analytic which is infinitely differentiable (Ch. 4) in the immediate domain neighborhood of interest. However, for applications a fair working definition of the analytic function might be “a function expressible as a power series.” Chapter 8 elaborates. All power series are infinitely differentiable except at their poles. There nevertheless exist one common group of functions which cannot be constructed as power series. These all have to do with the parts of complex numbers and have been introduced in this very section: the magnitude |·|; the phase arg(·); the conjugate (·)∗ ; and the real and imaginary parts ℜ(·) and ℑ(·). These functions are not analytic and do not in general obey the conjugation property. Also not analytic are the Heaviside unit step u(t) and the Dirac delta δ(t) (§ 7.7), used to model discontinuities explicitly. We shall have more to say about analytic functions in Ch. 8. We shall have more to say about complex numbers in §§ 3.11, 4.3.3, and 4.4, and much more yet in Ch. 5.

39 The careful reader might observe that this statement neglects the Gibbs' phenomenon, but that curious matter will be dealt with in [section not yet written].
40 To illustrate, from the fact that (1 + i2)(2 + i3) + (1 − i) = −3 + i6 the conjugation property infers immediately that (1 − i2)(2 − i3) + (1 + i) = −3 − i6. Observe however that no such property holds for the real parts: (−1 + i2)(−2 + i3) + (−1 − i) ≠ 3 + i6.

Chapter 3

Trigonometry
Trigonometry is the branch of mathematics which relates angles to lengths. This chapter introduces the trigonometric functions and derives their several properties.

3.1 Definitions

Consider the circle-inscribed right triangle of Fig. 3.1. In considering the circle, we shall find some terminology useful: the angle φ in the diagram is measured in radians, where a radian is the angle which, when centered in a unit circle, describes an arc of unit length.^1 Measured in radians, an angle φ intercepts an arc of curved length ρφ on a circle of radius ρ (that is, of distance ρ from the circle's center to its perimeter). An angle in radians is a dimensionless number, so one need not write "φ = 2π/4 radians"; it suffices to write "φ = 2π/4." In mathematical theory, we express angles in radians. The angle of full revolution is given the symbol 2π, which thus is the circumference of a unit circle.^2 A quarter revolution, 2π/4, is then the right angle, or square angle. The trigonometric functions sin φ and cos φ (the "sine" and "cosine" of φ) relate the angle φ to the lengths shown in Fig. 3.1. The tangent function is then defined as

    tan φ ≡ sin φ / cos φ,     (3.1)
1 The word "unit" means "one" in this context. A unit length is a length of 1 (not one centimeter or one mile, just an abstract 1). A unit circle is a circle of radius 1.
2 Section 8.11 computes the numerical value of 2π.


Figure 3.1: The sine and the cosine (shown on a circle-inscribed right triangle, with the circle centered at the triangle’s point).

which is the "rise" per unit "run," or slope, of the triangle's diagonal. Inverses of the three trigonometric functions can also be defined:

    arcsin(sin φ) = φ,
    arccos(cos φ) = φ,
    arctan(tan φ) = φ.

When the last of these is written in the form

    arctan(y/x),

it is normally implied that x and y are to be interpreted as rectangular coordinates^3 and that the arctan function is to return φ in the correct quadrant −π < φ ≤ π (for example, arctan[1/(−1)] = [+3/8][2π], whereas arctan[(−1)/1] = [−1/8][2π]). This is similarly the usual interpretation when an equation like

    tan φ = y/x

is written. By the Pythagorean theorem (§ 2.10), it is seen generally that^4

    cos² φ + sin² φ = 1.     (3.2)

3 Rectangular coordinates are pairs of numbers (x, y) which uniquely specify points in a plane. Conventionally, the x coordinate indicates distance eastward; the y coordinate, northward. For instance, the coordinates (3, −4) mean the point three units eastward and four units southward (that is, −4 units northward) from the origin (0, 0). A third rectangular coordinate can also be added—(x, y, z)—where the z indicates distance upward. 4 The notation cos2 φ means (cos φ)2 .
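The quadrant-correct arctangent described above is what most programming languages expose as atan2(y, x). A small sketch, not from the book, reproducing the two examples in the text:

    import math

    two_pi = 2 * math.pi
    print(math.atan2(1, -1) / two_pi)    # +0.375 = +3/8, the text's arctan[1/(-1)]
    print(math.atan2(-1, 1) / two_pi)    # -0.125 = -1/8, the text's arctan[(-1)/1]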

Figure 3.2: The sine function. [The plot shows sin(t) against t over −2π/2 ≤ t ≤ 2π/2, with unit amplitude.]

Fig. 3.2 plots the sine function. The shape in the plot is called a sinusoid.

3.2 Simple properties

Inspecting Fig. 3.1 and observing (3.1) and (3.2), one readily discovers the several simple trigonometric properties of Table 3.1.

3.3 Scalars, vectors, and vector notation

In applied mathematics, a vector is an amplitude of some kind coupled with a direction.5 For example, “55 miles per hour northwestward” is a vector, as is the entity u depicted in Fig. 3.3. The entity v depicted in Fig. 3.4 is also a vector, in this case a three-dimensional one. Many readers will already find the basic vector concept familiar, but for those who do not, a brief review: Vectors such as the

u = x̂x + ŷy,
v = x̂x + ŷy + ẑz
5 The same word vector is also used to indicate an ordered set of N scalars (§ 8.16) or an N × 1 matrix (Ch. 11), but those are not the uses of the word meant here.


Table 3.1: Simple properties of the trigonometric functions.

sin(−φ) = − sin φ          cos(−φ) = + cos φ          tan(−φ) = − tan φ
sin(2π/4 − φ) = + cos φ    cos(2π/4 − φ) = + sin φ    tan(2π/4 − φ) = +1/ tan φ
sin(2π/2 − φ) = + sin φ    cos(2π/2 − φ) = − cos φ    tan(2π/2 − φ) = − tan φ
sin(φ ± 2π/4) = ± cos φ    cos(φ ± 2π/4) = ∓ sin φ    tan(φ ± 2π/4) = −1/ tan φ
sin(φ ± 2π/2) = − sin φ    cos(φ ± 2π/2) = − cos φ    tan(φ ± 2π/2) = + tan φ
sin(φ + n2π) = sin φ       cos(φ + n2π) = cos φ       tan(φ + n2π) = tan φ

tan φ = sin φ / cos φ
cos² φ + sin² φ = 1
1 + tan² φ = 1/ cos² φ
1 + 1/ tan² φ = 1/ sin² φ
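The table’s entries are easy to spot-check numerically. A minimal illustration (Python; the test angle is an arbitrary choice):

    import math

    phi = 0.73  # an arbitrary test angle, in radians

    assert math.isclose(math.sin(-phi), -math.sin(phi))
    assert math.isclose(math.sin(math.pi/2 - phi), math.cos(phi))   # 2pi/4 = pi/2
    assert math.isclose(math.cos(phi + math.pi), -math.cos(phi))    # 2pi/2 = pi
    assert math.isclose(math.tan(phi), math.sin(phi) / math.cos(phi))
    assert math.isclose(math.cos(phi)**2 + math.sin(phi)**2, 1.0)
    assert math.isclose(1 + math.tan(phi)**2, 1 / math.cos(phi)**2)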


Figure 3.3: A two-dimensional vector u = x̂x + ŷy, shown with its rectangular components x̂x and ŷy.

Figure 3.4: A three-dimensional vector v = x̂x + ŷy + ẑz.


of the figures are composed of multiples of the unit basis vectors x̂, ŷ and ẑ, which themselves are vectors of unit length pointing in the cardinal directions their respective symbols suggest.6 Any vector a can be factored into an amplitude a and a unit vector â, as

a = âa,

where the â represents direction only and has unit magnitude by definition, and where the a represents amplitude only and carries the physical units if any.7 For example, a = 55 miles per hour, â = northwestward. The unit vector â itself can be expressed in terms of the unit basis vectors: for example, if x̂ points east and ŷ points north, then â = −x̂(1/√2) + ŷ(1/√2), where per the Pythagorean theorem (−1/√2)² + (1/√2)² = 1².

A single number which is not a vector or a matrix (Ch. 11) is called a scalar. In the example, a = 55 miles per hour is a scalar. Though the scalar a in the example happens to be real, scalars can be complex, too—which might surprise one, since scalars by definition lack direction and the Argand phase φ of Fig. 2.5 so strongly resembles a direction. However, phase is not an actual direction in the vector sense (the real number line in the Argand plane cannot be said to run west-to-east, or anything like that).

6 Printing by hand, one customarily writes a general vector like u as “ u ” or just “ u ”, ˆ and a unit vector like x as “ x ”. ˆ 7 The word “unit” here is unfortunately overloaded. As an adjective in mathematics, or in its noun form “unity,” it refers to the number one (1)—not one mile per hour, one kilogram, one Japanese yen or anything like that; just an abstract 1. The word “unit” itself as a noun however usually signifies a physical or financial reference quantity of measure, like a mile per hour, a kilogram or even a Japanese yen. There is no inherent mathematical unity to 1 mile per hour (otherwise known as 0.447 meters per second, among other names). By contrast, a “unitless 1”—a 1 with no physical unit attached— does represent mathematical unity. Consider the ratio r = h1 /ho of your height h1 to my height ho . Maybe you are taller than I am and so r = 1.05 (not 1.05 cm or 1.05 feet, just 1.05). Now consider the ratio h1 /h1 of your height to your own height. That ratio is of course unity, exactly 1. There is nothing ephemeral in the concept of mathematical unity, nor in the concept of unitless quantities in general. The concept is quite straightforward and entirely practical. That r > 1 means neither more nor less than that you are taller than I am. In applications, one often puts physical quantities in ratio precisely to strip the physical units from them, comparing the ratio to unity without regard to physical units.
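The factoring a = âa is simple to carry out numerically. Below is a minimal sketch (Python; the sample vector is the 55-miles-per-hour northwestward example above, with x̂ pointing east and ŷ north; everything else is an illustrative choice):

    import math

    # 55 mph toward the northwest: equal parts westward (-x) and northward (+y).
    speed = 55.0
    v = (-speed / math.sqrt(2), speed / math.sqrt(2))

    a = math.hypot(*v)                # the amplitude a, here 55.0
    a_hat = (v[0] / a, v[1] / a)      # the unit vector, direction only
    print(a, a_hat)
    print(math.hypot(*a_hat))         # 1.0, per the Pythagorean theorem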

The x, y and z of Fig. 3.4 are each (possibly complex) scalars; v = x̂x + ŷy + ẑz is a vector. If x, y and z are complex, then8

|v|² = |x|² + |y|² + |z|² = x∗x + y∗y + z∗z
     = [ℜ(x)]² + [ℑ(x)]² + [ℜ(y)]² + [ℑ(y)]² + [ℜ(z)]² + [ℑ(z)]².   (3.3)

A point is sometimes identified by the vector expressing its distance and direction from the origin of the coordinate system. That is, the point (x, y) can be identified with the vector x̂x + ŷy. However, in the general case vectors are not associated with any particular origin; they represent distances and directions, not fixed positions.

Notice the relative orientation of the axes in Fig. 3.4. The axes are oriented such that if you point your flat right hand in the x direction, then bend your fingers in the y direction and extend your thumb, the thumb then points in the z direction. This is orientation by the right-hand rule. A left-handed orientation is equally possible, but as neither orientation has a natural advantage over the other, we arbitrarily but conventionally accept the right-handed one as standard.9

Sections 3.4 and 3.9 and Ch. 15 speak further of the vector.

8 Some books print |v| as ‖v‖ or even ‖v‖2 to emphasize that it represents the real, scalar magnitude of a complex vector. The reason the last notation subscripts a numeral 2 is obscure, having to do with the professional mathematician’s generalized definition of a thing he calls the “norm.” This book just renders it |v|.
9 The writer does not know the etymology for certain, but verbal lore in American engineering has it that the name “right-handed” comes from experience with a standard right-handed wood screw or machine screw. If you hold the screwdriver in your right hand and turn the screw in the natural manner clockwise, turning the screw slot from the x orientation toward the y, the screw advances away from you in the z direction into the wood or hole. If somehow you came across a left-handed screw, you’d probably find it easier to drive that screw with the screwdriver in your left hand.

3.4 Rotation

A fundamental problem in trigonometry arises when a vector

u = x̂x + ŷy   (3.4)

must be expressed in terms of alternate unit vectors x̂′ and ŷ′, where x̂′ and ŷ′ stand at right angles to one another and lie in the plane10 of x̂ and ŷ, but are rotated from the latter by an angle ψ as depicted in Fig. 3.5.11 In terms of the trigonometric functions of § 3.1, evidently

x̂′ = +x̂ cos ψ + ŷ sin ψ,
ŷ′ = −x̂ sin ψ + ŷ cos ψ;   (3.5)

and by appeal to symmetry it stands to reason that

x̂ = +x̂′ cos ψ − ŷ′ sin ψ,
ŷ = +x̂′ sin ψ + ŷ′ cos ψ.   (3.6)

Substituting (3.6) into (3.4) yields

u = x̂′(x cos ψ + y sin ψ) + ŷ′(−x sin ψ + y cos ψ),   (3.7)

which was to be derived.

Figure 3.5: Vector basis rotation. [The figure shows the vector u, the original axes x̂ and ŷ, and the rotated axes x̂′ and ŷ′, separated from the original by the angle ψ.]

Equation (3.7) finds general application where rotations in rectangular coordinates are involved. If the question is asked, “what happens if I rotate not the unit basis vectors but rather the vector u instead?” the answer is

10 A plane, as the reader on this tier undoubtedly knows, is a flat (but not necessarily level) surface, infinite in extent unless otherwise specified. A point is zero-dimensional. A line is one-dimensional. A plane is two-dimensional. Space is three-dimensional. The plane belongs to this geometrical hierarchy.
11 The “ ′ ” mark is pronounced “prime” or “primed” (for no especially good reason of which the author is aware, but anyway, that’s how it’s pronounced). Mathematical writing employs the mark for a variety of purposes. Here, the mark merely distinguishes the new unit vector x̂′ from the old x̂.
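Equation (3.7) lends itself to a quick numerical check. The sketch below (Python; the components x, y and the angle ψ are arbitrary test values) computes the rotated-basis components of u per (3.7) and then recombines them per (3.6) to recover the original components:

    import math

    x, y = 3.0, -2.0   # components of u = (x-hat)x + (y-hat)y
    psi = 0.4          # rotation angle (arbitrary)

    # Components of the same u along the rotated basis, per (3.7).
    xp = x * math.cos(psi) + y * math.sin(psi)
    yp = -x * math.sin(psi) + y * math.cos(psi)

    # Recombining per (3.6) recovers the original components.
    assert math.isclose(xp * math.cos(psi) - yp * math.sin(psi), x)
    assert math.isclose(xp * math.sin(psi) + yp * math.cos(psi), y)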

we would have to rotate ˆ it by ψ = α − β. When one actually rotates a physical body.3. of course. TRIGONOMETRIC SUMS AND DIFFERENCES 53 that it amounts to the same thing. ˆ ≡ x cos β + y sin β. If we wanted b to coincide with a. 12 . ˆ ˆ Since we have deliberately chosen the angle of rotation such that b′ = a.4 in hand. (3. the body experiences forces during rotation which might or might not change the body internally in some relevant way. According to (3. with respect to the vectors themselves.8) Whether it is the basis or the vector which rotates thus depends on your point of view.12 3. if we did this we would obtain ˆ ˆ b′ = x[cos β cos(α − β) − sin β sin(α − β)] ˆ + y[cos β sin(α − β) + sin β cos(α − β)].5 Trigonometric functions of sums and differences of angles With the results of § 3. ˆ ˆ b be vectors of unit length in the xy plane. we ˆ ˆ ˆ ˆ can separately equate the x and y terms in the expressions for a and b′ to obtain the pair of equations cos α = cos β cos(α − β) − sin β sin(α − β).8) and the definition of b. This is only true. Let ˆ ˆ ˆ a ≡ x cos α + y sin α. we now stand in a position to consider trigonometric functions of sums and differences of angles. sin α = cos β sin(α − β) + sin β cos(α − β). respectively at angles α and β ˆ ˆ from the x axis. except that the sense of the rotation is reversed: ˆ ˆ u′ = x(x cos ψ − y sin ψ) + y(x sin ψ + y cos ψ).5.

(3. then to solve the result for sin(α − β). • to add cos β times the first equation to sin β times the second. In this book alone. 14 Refer to footnote 13 above for the technique.14 These include cos(α − β) − cos(α + β) . 2 sin(α − β) + sin(α + β) sin α cos β = . (3. TRIGONOMETRY Solving the last pair simultaneously13 for sin(α − β) and cos(α − β) and observing that sin2 (·) + cos2 (·) = 1 yields sin(α − β) = sin α cos β − cos α sin β.5.10) Equations (3. 2 cos(α − β) + cos(α + β) cos α cos β = . 2 sin α sin β = 13 (3. eqns. (3.54 CHAPTER 3.10) are achieved by combining the equations in various straightforward ways.9) become sin(α + β) = sin α cos β + cos α sin β. 3. then to solve the result for cos(α − β).1 Variations on the sums and differences Several useful variations on (3.9) With the change of variable β ← −β and the observations from Table 3.1 that sin(−φ) = − sin φ and cos(−φ) = + cos(φ).10) are the basic formulas for trigonometric functions of sums and differences of angles. cos(α + β) = cos α cos β − sin α sin β. it proves useful many times. .9) and (3.11) The easy way to do this is • to subtract sin β times the first equation from cos β times the second. cos(α − β) = cos α cos β + sin α sin β.9) and (3. This shortcut technique for solving a pair of equations simultaneously for a pair of variables is well worth mastering.
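One can likewise verify (3.9), (3.10) and (3.11) numerically. A brief sketch (Python; the two test angles are arbitrary):

    import math

    alpha, beta = 0.9, 0.35   # arbitrary test angles

    # Sum and difference formulas (3.9)/(3.10).
    assert math.isclose(math.sin(alpha + beta),
                        math.sin(alpha)*math.cos(beta) + math.cos(alpha)*math.sin(beta))
    assert math.isclose(math.cos(alpha - beta),
                        math.cos(alpha)*math.cos(beta) + math.sin(alpha)*math.sin(beta))

    # A product formula from (3.11).
    assert math.isclose(math.sin(alpha)*math.sin(beta),
                        (math.cos(alpha - beta) - math.cos(alpha + beta)) / 2)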

6 Trigonometric functions of the hour angles In general one uses the Taylor series of Ch.10) become γ−δ γ+δ cos − cos 2 2 γ+δ γ−δ cos δ = cos cos + sin 2 2 γ+δ γ−δ sin γ = sin cos + cos 2 2 γ+δ γ−δ cos γ = cos cos − sin 2 2 sin δ = sin Combining these in various ways. cos 2α = 2 cos2 α − 1 = cos2 α − sin2 α = 1 − 2 sin2 α.13) for sin2 α and cos2 α yields the half-angle formulas 1 − cos 2α . 8 to calculate trigonometric functions of specific angles. (3. Solving (3. 2 2 γ+δ γ−δ sin γ − sin δ = 2 cos sin . TRIGONOMETRICS OF THE HOUR ANGLES 55 With the change of variables δ ← α − β and γ ← α + β. (3. . 2 sin2 α = (3. then eqns.13) (3. 2 2 γ+δ γ−δ cos δ + cos γ = 2 cos cos . However. we have that γ−δ γ+δ cos .5.3.14) 3. (3. for angles which happen to be integral .9) and (3. .6.12) 3. 2 1 + cos 2α cos2 α = .2 Trigonometric functions of double and half angles If α = β. cos δ − cos γ = 2 sin 2 2 sin γ + sin δ = 2 sin γ+δ 2 γ+δ 2 γ+δ 2 γ+δ 2 sin sin sin sin γ−δ 2 γ−δ 2 γ−δ 2 γ−δ 2 . 2 2 γ−δ γ+δ sin .10) become the double-angle formulas sin 2α = 2 sin α cos α. .

[4] The hex and hour notations are recommended mostly only for theoretical math work. the reader will approve the choice. . Figure 3. • There are 360 degrees in a circle.3). TRIGONOMETRY multiples of an hour —there are twenty-four or 0x18 hours in a circle. 16 An equilateral triangle is. look at the square and the equilateral triangle16 of Fig. the angle between hours on the Greenwich clock is indeed an honest hour of arc.10) then supplies the various other lengths in the figure. the first sentence nevertheless says the thing rather better. If you have ever been to the Old Royal Observatory at Greenwich. This is so because the clock face’s geometry is artificial.6 shows the angles.7. so the angle between eleven o’clock and twelve on the clock face is not an hour of arc! That angle is two hours of arc. Consider: • There are 0x18 hours in a circle. Since such angles arise very frequently in practice. 15 Hence an hour is 15◦ . you’d probably not like the reception the memo got.56 CHAPTER 3. The familiar. 3. the improved notation fits a book of this kind so well that the author hazards it. The reader is urged to invest the attention and effort to master the notation. Also by symmetry. just as there are twenty-four or 0x18 hours in a day15 —for such angles simpler expressions exist. if you were. you’re in good company.” were you? Well. and the diagonal splits the square’s corner into equal halves of three hours each. It is hoped that after trying the notation a while. a triangle whose three sides all have the same length. the perpendicular splits the triangle’s top angle into equal halves of two hours each and its bottom leg into equal segments of length 1/2 each. Nonetheless. The Pythagorean theorem (§ 2. Table 3. It’d be a bit hard to read the time from such a crowded clock face were it not so big. Both sentences say the same thing.80 hours instead of 22. standard clock face shows only twelve hours not twenty-four. you may have seen the big clock face there with all twenty-four hours on it.5 degrees. 3. To see how the values in the table are calculated. There is a psychological trap regarding the hour. after which we observe from Fig. for example. and since a triangle’s angles always total twelve hours (§ 2. The author is fully aware of the barrier the unfamiliar notation poses for most first-time readers of the book. It is not claimed that they offer much benefit in most technical work of the less theoretical kinds. England. by symmetry each of the angles of the equilateral triangle in the figure measures four.9. but you weren’t going to write your angles in such inelegant conventional notation as “15◦ . If you wrote an engineering memo describing a survey angle as 0x1.1 that • the sine of a non-right angle in a right triangle is the opposite leg’s length divided by the diagonal’s. The barrier is erected neither lightly nor disrespectfully. as the name and the figure suggest. Each of the square’s four angles naturally measures six hours.2 tabulates the trigonometric functions of these hour angles. it seems worth our while to study them specially. but anyway. don’t they? But even though the “0x” hex prefix is a bit clumsy.

Figure 3.6: The 0x18 hours in a circle.

Figure 3.7: A square and an equilateral triangle for calculating trigonometric functions of the hour angles. [The square has unit sides and diagonal √2; the equilateral triangle has unit sides, a perpendicular of height √3/2, and half-segments of length 1/2 along its bottom leg.]

Table 3.2: Trigonometric functions of the hour angles.

ANGLE φ
[radians]        [hours]   sin φ            tan φ             cos φ
0                0         0                0                 1
2π/0x18          1         (√3−1)/(2√2)     (√3−1)/(√3+1)     (√3+1)/(2√2)
2π/0xC           2         1/2              1/√3              √3/2
2π/8             3         1/√2             1                 1/√2
2π/6             4         √3/2             √3                1/2
(5)(2π)/0x18     5         (√3+1)/(2√2)     (√3+1)/(√3−1)     (√3−1)/(2√2)
2π/4             6         1                ∞                 0
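The closed forms in Table 3.2 can be checked against a numerical sine and tangent. A small sketch (Python; “hour” here is just 2π/24, written 2π/0x18 in the text’s hexadecimal notation):

    import math

    hour = 2 * math.pi / 24   # one hour of arc; 0x18 = 24 hours in a circle

    assert math.isclose(math.sin(1 * hour), (math.sqrt(3) - 1) / (2 * math.sqrt(2)))
    assert math.isclose(math.sin(2 * hour), 1 / 2)
    assert math.isclose(math.sin(3 * hour), 1 / math.sqrt(2))
    assert math.isclose(math.sin(4 * hour), math.sqrt(3) / 2)
    assert math.isclose(math.tan(1 * hour), (math.sqrt(3) - 1) / (math.sqrt(3) + 1))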

9) and (3. difference and half-angle formulas from the preceding sections to the values already in the table. However. The values for one and five hours are found by applying (3.8: The laws of sines and cosines. 17 . With this observation and the lengths in the figure. and • the cosine is the adjacent leg’s length divided by the diagonal’s.7 The laws of sines and cosines Refer to the triangle of Fig. THE LAWS OF SINES AND COSINES Figure 3.10) against the values for two and three hours just calculated. of course. the Taylor series (§ 8.9) offers a cleaner. one can calculate the sine. 59 y x γ b α h a c β • the tangent is the opposite leg’s length divided by the adjacent leg’s.17 3. 3. b c The creative reader may notice that he can extend the table to any angle by repeated application of the various sum. one can write that c sin β = h = b sin γ.8. quicker way to calculate trigonometrics of non-hour angles. The values for zero and six hours are. or in other words that sin γ sin β = .3. three and four hours. seen by inspection. By the definition of the sine function. tangent and cosine of angles of two.7.

γ is an angle not a point. too.15) This equation is known as the law of sines. Table 3. On the other hand. 3.1 on page 48 has summarized simple properties of the trigonometric functions. the writer suspects in light of Fig. ˆ ˆ b = xb cos γ + yb sin γ.8 Summary of properties Table 3.60 CHAPTER 3.3 summarizes further properties. what is true for them must be true for α. if one expresses a and b as vectors emanating from the point γ. However.4. then c2 = |b − a|2 = a2 + (b2 )(cos2 γ + sin2 γ) − 2ab cos γ.8 that few readers will be confused as to which point is meant.16) = (b cos γ − a)2 + (b sin γ)2 3. this is c2 = a2 + b2 − 2ab cos γ. if we like. gathering them from §§ 3.18 Hence. Nothing prevents us from dropping additional perpendiculars hβ and hγ from the other two corners and using those as utility variables. Of course there is no “point γ”. “there is something special about α. . However. 19 Here is another example of the book’s judicious relaxation of formal rigor. Table 3. The perpendicular h drops from it. we’re not interested in h itself. Since cos2 (·) + sin2 (·) = 1. TRIGONOMETRY But there is nothing special about β and γ. 18 “But. The skillful applied mathematician does not multiply labels without need.7. too. (3. 3.” True. sin α sin β sin γ = = . the h is just a utility variable to help us to manipulate the equation into the desired form. We can use any utility variables we want.19 ˆ a = xa. known as the law of cosines.2 on page 58 has listed the values of trigonometric functions of the hour angles.5 and 3. a b c (3.” it is objected.
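Together, the laws of sines and cosines suffice to solve a triangle from two sides and the included angle. A sketch (Python; the side lengths and angle are arbitrary test values, and the asin step is valid here because the angle sought is opposite the shortest side and hence acute):

    import math

    a, b = 4.0, 7.0
    gamma = 2 * math.pi / 6   # the included angle: 60 degrees, or four hours

    # Law of cosines (3.16): the third side.
    c = math.sqrt(a**2 + b**2 - 2*a*b*math.cos(gamma))

    # Law of sines (3.15): the angle opposite side a; the third angle follows
    # because a triangle's interior angles total 2*pi/2 (twelve hours).
    alpha = math.asin(a * math.sin(gamma) / c)
    beta = math.pi - gamma - alpha

    assert math.isclose(math.sin(alpha)/a, math.sin(gamma)/c)
    assert math.isclose(math.sin(beta)/b, math.sin(gamma)/c)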

ˆ ˆ u = x′ (x cos ψ + y sin ψ) + y′ (−x sin ψ + y cos ψ) cos(α ± β) = cos α cos β ∓ sin α sin β cos(α − β) − cos(α + β) sin α sin β = 2 sin(α − β) + sin(α + β) sin α cos β = 2 cos(α − β) + cos(α + β) cos α cos β = 2 γ−δ γ+δ cos sin γ + sin δ = 2 sin 2 2 γ+δ γ−δ sin γ − sin δ = 2 cos sin 2 2 γ+δ γ−δ cos δ + cos γ = 2 cos cos 2 2 γ+δ γ−δ cos δ − cos γ = 2 sin sin 2 2 sin 2α = 2 sin α cos α cos 2α = 2 cos2 α − 1 = cos2 α − sin2 α = 1 − 2 sin2 α 1 − cos 2α sin2 α = 2 1 + cos 2α cos2 α = 2 sin γ sin α sin β = = c a b c2 = a2 + b2 − 2ab cos γ sin(α ± β) = sin α cos β ± cos α sin β .3. SUMMARY OF PROPERTIES 61 Table 3.8.3: Further properties of the trigonometric functions.

x ˆ ≡ +ˆ cos θ + ρ sin θ. 7 May 2006] 21 Notice that the φ is conventionally written second in cylindrical (ρ. z). or the illumination a lamp sheds on a printed page of this book. θ. Consider for example an electric wire’s magnetic field. Such coordinates are related to one another and to the rectangular coordinates by the formulas of Table 3. θ.9 Cylindrical and spherical coordinates ˆ ˆ ˆ v = xx + yy + zz. This odd-seeming convention is to maintain proper righthanded coordinate rotation. the cylindrical coordinates (ρ. and are convenient for many purposes. That is. “Orthonormality. y. 15 is read.62 CHAPTER 3.” 14:19. There are no constant unit basis vectors to match them.4 on page 49). ˆ ˆ ˆ ˆ ˆ ˆ ˆ ˆ v = xx + yy + zz = ρρ + φφ + zz = ˆr + θθ + φφ.17) Such variable unit basis vectors point locally in the directions in which their respective coordinates advance. z) can be used instead of the rectangular coordinates (x. φ) coordinates.21 Refer to Fig. However. z (3.9. 3. xˆz Such rectangular coordinates are simple and general. φ.” [52.) 20 .4. rectangular coordinates—which given a specific orthonormal20 set of unit basis vectors [ˆ y ˆ] uniquely identify a point (see Fig. there are at least two broad classes of conceptually simple problems for which rectangular coordinates tend to be inconvenient: problems in which an axis or a point dominates. TRIGONOMETRY 3. r It doesn’t work that way. Orthonormal in this context means “of unit length and at right angles to the other vectors in the set. To attack a problem dominated by an axis. To attack a problem dominated by a point. Section 3.3 has introduced the concept of the vector The coefficients (x. φ) can be used. which depends on the book’s distance from the lamp (a point). Nevertheless variable unit basis vectors are defined: ˆ ˆ ρ ≡ +ˆ cos φ + y sin φ. Cylindrical and spherical coordinates can greatly simplify the analyses of the kinds of problems they respectively fit. z) on the equation’s right side are coordinates—specifically. but they come at a price. y. the spherical coordinates (r. φ. whose intensity varies with distance from the wire (an axis). ˆ r z ˆ ˆ θ ≡ −ˆ sin θ + ρ cos θ. (The explanation will seem clearer once Ch. 3. z) but third in spherical (r. x ˆ ˆ φ ≡ −ˆ sin φ + y cos φ.

Figure 3.9: A point on a sphere, in spherical (r, θ, φ) and cylindrical (ρ, φ, z) coordinates. (The axis labels bear circumflexes in this figure only to disambiguate the z axis from the cylindrical coordinate z.)

Table 3.4: Relations among the rectangular, cylindrical and spherical coordinates.

ρ² = x² + y²
r² = ρ² + z² = x² + y² + z²
tan θ = ρ/z
tan φ = y/x
z = r cos θ
ρ = r sin θ
x = ρ cos φ = r sin θ cos φ
y = ρ sin φ = r sin θ sin φ
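The formulas of Table 3.4 translate directly into a numerical conversion and back. A minimal sketch (Python; the rectangular test point is arbitrary):

    import math

    x, y, z = 1.0, 2.0, 3.0   # an arbitrary rectangular test point

    # Cylindrical (rho, phi, z) and spherical (r, theta, phi), per Table 3.4.
    rho = math.sqrt(x**2 + y**2)
    phi = math.atan2(y, x)
    r = math.sqrt(rho**2 + z**2)
    theta = math.atan2(rho, z)

    # Converting back recovers the rectangular coordinates.
    assert math.isclose(x, r * math.sin(theta) * math.cos(phi))
    assert math.isclose(y, r * math.sin(theta) * math.sin(phi))
    assert math.isclose(z, r * math.cos(theta))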

(3. 2. but you can use symbols like (ρx )2 = y 2 + z 2 . when some of the numbers summed are positive and others negative.2. y (ρy )2 = z 2 + x2 .23 But if the triangle inequalities hold for vectors in a plane.” but that’s all right. (3. See § 1.22 ρx . The writer is not aware of any conventionally established symbols for quantities like these. 22 . we have that k zk ≤ k |zk | .2 in vector notation. TRIGONOMETRY ˆ Convention usually orients z in the direction of a problem’s axis. Evidently. b and c represent the three sides of a triangle such that a + b + c = 0.64 CHAPTER 3.20) Naturally. y x .19) for any two complex numbers z1 and z2 . These are just the triangle inequalities of § 2.9.18) 3.19) and (3.5 on page 40. θ and φ is usually not a good idea.2.2 uses the “<” sign rather than the “≤. for example. Symbols like ρx are logical but. not standard. tan θ x = tan φx = instead if needed. one may find the latter formula useful for sums of real numbers.10 The complex triangle inequalities If a.9.20) hold equally well for real numbers as for complex. one might note that § 2. 23 Reading closely. Changing the meanings of known symbols like ρ. then per (2. z (3. |z1 | − |z2 | ≤ |z1 + z2 | ≤ |z1 | + |z2 | (3. tan θ y = tan φy = ρy .44) |a| − |b| ≤ |a + b| ≤ |a| + |b| . x z . Extending the sum inequality. then why not equally for complex numbers? Consider the geometric interpretation of the Argand plane of Fig. Occaˆ sionally however a problem arises in which it is more convenient to orient x ˆ or y in the direction of the problem’s axis (usually because ˆ has already z been established in the direction of some other pertinent axis). as far as this writer is aware.

3.21) . it suffices (3.10.23) to the equation yields z1 z2 = (cos φ1 cos φ2 − sin φ1 sin φ2 ) + i(sin φ1 cos φ2 + cos φ1 sin φ2 ). ρ1 ρ2 But according to (3.20) is that if |zk | converges.24) is an important result. Per (2. If z = x + iy.53). It says that if you want to multiply complex numbers. 3.5 we can write z = (ρ)(cos φ + i sin φ) = ρ cis φ. then evidently x = ρ cos φ.11.24) Equation (3.10). 2. where cis φ ≡ cos φ + i sin φ. DE MOIVRE’S THEOREM 65 An important consequence of (3. (3.3. ρ1 ρ2 or in other words z1 z2 = ρ1 ρ2 cis(φ1 + φ2 ).20) implies convergence of zk .3 (page 49). 2.11 De Moivre’s theorem Compare the Argand-plotted complex number of Fig. y = ρ sin φ. Such a consequence is important because mathematical derivations sometimes need the convergence of zk established. With reference to Fig. this is just z1 z2 = cos(φ1 + φ2 ) + i sin(φ1 + φ2 ). Convergence of |zk |. is often easier to establish.5 (page 40) against the vector of Fig.22) (3. then zk also converges. z1 z2 = (x1 x2 − y1 y2 ) + i(y1 x2 + x1 y2 ). which per (3. Although complex numbers are scalars not vectors. Equation (3.3. which can be hard to do directly.20) will find use among other places in § 8. the figures do suggest an analogy between complex phase and vector direction.23) (3. Applying (3.

if you refer to any of them as de Moivre’s theorem then you are unlikely to be misunderstood. but shall in § 5. TRIGONOMETRY It follows by parallel reasoning (or by extension) that z1 ρ1 = cis(φ1 − φ2 ) z2 ρ2 and by extension that z a = ρa cis aφ. (3. but since the three equations express essentially the same idea.26) Equations (3.25) and (3.25) Also called de Moivre’s formula.24). that cis φ = exp iφ = eiφ . 25 [44][52] 24 . (3.26). (3.66 • to multiply their magnitudes and • to add their phases. 5. Some authors apply the name of de Moivre directly only to (3.4. where exp(·) is the natural exponential function and e is the natural logarithmic base.25 We have not shown yet.26) are known as de Moivre’s theorem. De Moivre’s theorem is most useful in this light. both defined in Ch.24 . CHAPTER 3. or to some variation thereof.
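De Moivre’s theorem is easy to watch in action numerically: multiplying complex numbers multiplies their magnitudes and adds their phases. A short sketch (using Python’s standard cmath module; the magnitudes, phases and the power a = 3 are arbitrary test values):

    import cmath

    z1 = cmath.rect(2.0, 0.5)   # rho1 = 2, phi1 = 0.5
    z2 = cmath.rect(3.0, 1.1)   # rho2 = 3, phi2 = 1.1

    # Per (3.24): magnitudes multiply, phases add.
    product = z1 * z2
    assert abs(abs(product) - 2.0 * 3.0) < 1e-12
    assert abs(cmath.phase(product) - (0.5 + 1.1)) < 1e-12

    # Per (3.26): z**a = rho**a cis(a*phi), here with a = 3.
    a = 3
    assert abs(z1**a - cmath.rect(2.0**a, a * 0.5)) < 1e-9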

It cannot be the role of a book like this one to lead the beginner gently toward an apprehension of the basic calculus concept. f ′ (t)? • Interpreting some function f ′ (t) as an instantaneous rate of change. the calculus beginner’s first task is quantitatively to understand the pair’s interrelationship. 4.1 Infinitesimals and limits Calculus systematically treats numbers so large and so small. Leibnitz cleared this hurdle for us in the seventeenth century. to understand this pair of questions. f (t)? This chapter builds toward a basic understanding of the first question. generality and significance. the concept is simple and briefly stated. so now at least we know the right pair of questions to ask. In this book we necessarily state the concept briefly. dedicated beginner could perhaps obtain the basic calculus concept directly here. 1 67 . they lie beyond the reach of our mundane number system. then move along.Chapter 4 The derivative The mathematics of calculus concerns a complementary pair of questions:1 • Given some function f (t). so briefly stated. or derivative. he would probably find it quicker and more pleasant to begin with a book like the one referenced. Although a sufficiently talented. The greatest conceptual hurdle—the stroke of brilliance—probably lies simply in stating the pair of questions clearly. what is the corresponding accretion. Once grasped. Although once grasped the concept is relatively simple. is no trivial thing. Sir Isaac Newton and G.W. or integral. what is the function’s instantaneous rate of change. With the pair in hand. They are the pair which eluded or confounded the most brilliant mathematical minds of the ancient world. Such an understanding constitutes the basic calculus concept. Many instructional textbooks—[20] is a worthy example—have been written to lead the beginner gently.

very small. Nevertheless.” “Smaller than 0x0.” “Zero. etc. then 1/ǫ can be regarded as an infinity: a very large number much larger than any mundane number one can name.” “Smaller than 0x0.” “Oh. My infinitesimal is definitely bigger than zero.0001? Is it smaller than that?” “Much smaller. my infinitesimal is smaller still. “Very. Bigger than that. then the quotient δ/ǫ is not some unfathomable 0/0.68 CHAPTER 4.1 The infinitesimal A number ǫ is an infinitesimal if it is so small that 0 < |ǫ| < a for all possible mundane positive numbers a. “How small?” “Very small. to divide them. You said that we should use hexadecimal notation in this book. In physical applications. the infinitesimals are often not true mathematical infinitesimals but rather relatively very small quantities such as the mass of a wood screw compared to the mass . right. Yes. The principal advantage of using symbols like ǫ rather than 0 for infinitesimals is in that it permits us conveniently to compare one infinitesimal against another. For instance.01.” “Smaller than 2−0x1 0000 0000 0000 0000 ?” “Now that is an impressively small number.01?” “Smaller than what?” “Than 2−8 . then. rather it is δ/ǫ = 3. This is somewhat a difficult concept.” “What about 0x0. smaller than 0x0. THE DERIVATIVE 4. “How big is your infinitesimal?” you ask.1. If ǫ is an infinitesimal. to add them together. Let me propose to you that I have an infinitesimal. but its smallness conceptually lies beyond the reach of our mundane number system.0000 0000 0000 0001?” “Smaller. It is a definite number of a certain nonzero magnitude.” This is the idea of the infinitesimal. if δ = 3ǫ is another infinitesimal. no. remember?” “Sorry.” I reply. so if it is not immediately clear then let us approach the matter colloquially.

a quantity less than 1/0x10000 of the principal is indeed infinitesimal for most practical purposes (but not all: for example. z→0 2z 2 lim The symbol “limQ ” is short for “in the limit as Q. INFINITESIMALS AND LIMITS 69 of a wooden house frame.6).” Notice that lim is not a function like log or sin.1. one common way to specify that ǫ be infinitesimal is to write that ǫ ≪ 1.2 The second-order infinitesimal ǫ2 is so small on the scale of the common. the implication is that z draws toward zo from + the positive side such that z > zo . used when saying that the quantity 2 Among scientists and engineers who study wave phenomena. or v ≫ u. The reason for the notation is to provide a way to handle expressions like 3z 2z as z vanishes: 3z 3 = . indicates that u is much less than v. depending on your point of view. the − implication is that z draws toward zo from the negative side. even very roughly speaking.1. 4. . When written limz→zo . It is just a reminder that a quantity approaches some value. On the other hand. positions of spacecraft and concentrations of chemical impurities must sometimes be accounted more precisely). In any case. typically such that one can regard the quantity u/v to be an infinitesimal. whatever “negligible” means in the context.and higher-order infinitesimals are likewise possible. Similarly. when written limz→zo .4. The ǫ2 is an infinitesimal to the infinitesimals. we should probably render the rule as twelve points per wavelength here. first-order infinitesimal ǫ that the even latter cannot measure it. In fact. For quantities between 1/0xC and 1/0x10000. The key point is that the infinitesimal quantity be negligible by comparison. The notation u ≪ v. there is an old rule of thumb that sinusoidal waveforms be discretized not less finely than ten points per wavelength. The additional cost of inviting one more guest to the wedding may or may not be infinitesimal. or the audio power of your voice compared to that of a jet engine. In keeping with this book’s adecimal theme (Appendix A) and the concept of the hour of arc (§ 3.2 Limits The notation limz→zo indicates that z draws as near to zo as it possibly can. Third. it depends on the accuracy one seeks. a quantity greater then 1/0xC of the principal to which it compares probably cannot rightly be regarded as infinitesimal.

There are evidently n P ≡ n!/(n − k)! (4. or permutations. However. because you can accept or reject the first block. I have several small wooden blocks of various shapes and sizes. there are n − 1 options (why not n options? because you have already taken one block from me.2 Combinatorics In its general form. Then you select your second favorite: for this. whereas red-white-blue is a different combination entirely. This section treats the problem in its basic form. some of these distinct permutations put exactly the same combination of blocks in your hand. Then you select your third favorite—for this there are n − 2 options—and so on until you have k blocks. neither more nor fewer. and so on.1) k ordered ways. but it is not magical. the permutations redgreen-blue and green-red-blue constitute the same combination. For a single combination 3 [20] . you select your favorite block first: there are n options for this. Consider that to say z→2− lim (z + 2) = 4 is just a fancy way of saying that 2 + 2 = 4. the problem of selecting k specific items out of a set of n available items belongs to probability theory ([chapter not yet written]). In its basic form. for instance. then the third. however.1 Combinations and permutations Consider the following scenario. Now. the same problem also applies to the handling of polynomials or power series. If I offer you the blocks and you are free to take all. some or none of them at your option. suppose that what you want is exactly k blocks. I have only n − 1 blocks left). then accept or reject the second. 4.3 4. then how many distinct choices of blocks do you have? Answer: you have 2n choices. available for you to select exactly k blocks. painted different colors so that you can clearly tell each block from the others. Don’t let it confuse you.70 CHAPTER 4. THE DERIVATIVE equaled the value would be confusing.2. if you can take whichever blocks you want. Desiring exactly k blocks. The lim notation is convenient to use sometimes.

9) n−k+1 n k−1 k k+1 n n−k k+1 n k n−1 k−1 Equation (4.2). blue-green-red). evidently k! permutations are possible (redgreen-blue.2) n n n−k n k (4.2).4. you can either choose k from the original set.7) (4. to choose k blocks then. (4. red-blue-green. k=0 n−1 k−1 + n−1 k n k (4.9) come directly from the definition (4. k! of combinations include that = n k .5) is seen when an nth block—let us say that it is a black block—is added to an existing set of n − 1 blocks. green-blue-red.4) = 2n . (4.3) (4. (4.6) through (4. Because one can choose neither fewer than zero nor more than n from n blocks. Equation (4.10) k For „ n k « n n−k n−1 k when n < 0. they relate combinatoric coefficients to their neighbors in Pascal’s triangle (§ 4. there is no obvious definition. blue-red-green. green-red-blue. COMBINATORICS 71 of k blocks (red. Equation (4. Hence dividing the number of permutations (4. = = = = = n k . blue).8) . green.4) comes directly from the observation (made at the head of this section) that 2n total combinations are possible if any k is allowed. .5) (4. Equations (4.3) results from changing the variable k ← n − k in (4.6) (4.2. n = 0 unless 0 ≤ k ≤ n.2).2.1) by k! yields the number of combinations n k Properties of the number „ n k ≡ « n!/(n − k)! . or the black block plus k − 1 from the original set.
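The counts (4.1) and (4.2), and the property (4.4), can be confirmed by brute force for small n. An illustrative sketch (Python; n and k are arbitrary small choices):

    import math
    from itertools import combinations

    n, k = 5, 3   # five distinct blocks, choose exactly three

    n_perm = math.factorial(n) // math.factorial(n - k)                          # (4.1)
    n_comb = math.factorial(n) // (math.factorial(n - k) * math.factorial(k))    # (4.2)
    print(n_perm, n_comb)   # 60 ordered ways, 10 distinct combinations

    # Brute-force check of the combination count, and of property (4.4).
    assert n_comb == len(list(combinations(range(n), k)))
    assert sum(math.comb(n, j) for j in range(n + 1)) == 2**n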

4. the extension is not covered in this book. (In fact this is the easy way to fill Pascal’s triangle out: for each entry.11) The author is given to understand that. 4.2.) 4.3 The binomial theorem This section presents the binomial theorem and one of its significant consequences. „ „ 3 0 «„ „ 2 0 «„ 4 1 1 0 «„ 3 1 «„ 0 0 «„ 2 1 «„ 4 2 « 1 1 «„ 3 2 «„ « 2 2 «„ 4 3 « 3 3 «„ „ „ 4 0 « 4 4 « . .5) predicts. just add the two entries above. . by an heroic derivational effort.1 Expanding the binomial The binomial theorem holds that4 n (a + b) = k=0 4 n n k an−k bk .1: The plan for Pascal’s triangle.72 CHAPTER 4. However. Notice how each entry in the triangle is the sum of the two entries immediately above.1 of the various possible n . (4.11) can be extended to nonintegral n. 4. Pascal’s triangle.2 Pascal’s triangle „ « Consider the triangular layout in Fig. since applied mathematics does not usually concern itself with hard theorems of little known practical use.2. this yields Fig. k Evaluated. (4. as (4. .3. 4. THE DERIVATIVE Figure 4.

each (a+b) factor corresponds to one of the “wooden blocks. 4.3. .12) (actually this holds for any ǫ. we have that (1 + δ)1/m ≈ 1 + δ .2 Since „ Powers of numbers near unity n 0 « = 1 and „ n 1 « = n. Since (a + b)n = (a + b)(a + b) · · · (a + b)(a + b). m .” where a means rejecting the block and b. this is n (1 + ǫ) = k=0 n n k ǫk (4. THE BINOMIAL THEOREM Figure 4. b = ǫ.2.3. 73 1 1 1 1 2 1 1 3 3 1 1 4 6 4 1 1 5 A A 5 1 1 6 F 14 F 6 1 1 7 15 23 23 15 7 1 . In either form. but the typical case of interest has |ǫ| ≪ 1). it follows from (4. In the common case that a = 1. small or large. the binomial theorem is a direct consequence of the combinatorics of § 4. raising the equation to the 1/n power then changing δ ← nǫo and m ← n.12) that (1 + ǫo )n ≈ 1 + nǫo . .2: Pascal’s triangle. accepting it. to excellent precision for nonnegative integral n if ǫo is sufficiently small. |ǫ| ≪ 1. Furthermore.4.
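Pascal’s triangle is generated by exactly the rule (4.5): each entry is the sum of the two entries immediately above it. A minimal sketch (Python; stopping after eight rows is an arbitrary choice):

    # Build rows 0 through 7 of Pascal's triangle by the rule (4.5).
    rows = [[1]]
    for n in range(1, 8):
        prev = rows[-1]
        rows.append([1] + [prev[k - 1] + prev[k] for k in range(1, n)] + [1])

    for row in rows:
        print(row)
    # The n = 4 row reads [1, 4, 6, 4, 1], matching Fig. 4.2 and the
    # coefficients of (a + b)**4 in the binomial theorem (4.11).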

The Taylor expansion as such will be introduced in Ch.74 CHAPTER 4.14) grows large. we have that (1 + ǫ)n/m ≈ 1 + Inverting this equation yields (1 + ǫ)−n/m ≈ [1 − (n/m)ǫ] n 1 = ≈ 1 − ǫ.13). are complex? Changing the symbol z ← x and observing that the infinitesimal ǫ may also be complex. one would like (4. THE DERIVATIVE Changing 1 + δ ← (1 + ǫ)n and observing from the (1 + ǫo )n equation above that this implies that δ ≈ nǫ. The equation offers a simple. because although a complex infinitesimal ǫ poses no particular problem. the action of a complex power z remains undefined. the last two equations imply that (1 + ǫ)x ≈ 1 + xǫ (4. 5 Other than “the first-order Taylor expansion. or both. m Taken together.” but such an unwieldy name does not fit the present context. The writer knows of no conventional name5 for (4. for consistency’s sake. 4. one wants to know whether (1 + ǫ)z ≈ 1 + zǫ (4. as follows.13) raises the question: what if ǫ or x.3.4 will investigate the extremely interesting effects which arise when ℜ(ǫ) = 0 and the power z in (4. accurate way of approximating any real power of numbers in the near neighborhood of 1. which we now do.14) does hold. . Still. but for the moment we shall use the equation in a more ordinary manner to develop the concept and basic application of the derivative.13) into the new domain.14) still holds.14) to hold. 8. Section 5. logically extending the known result (4. No work we have yet done in the book answers the question. but named or unnamed it is an important equation.13) for any real x. 1 + (n/m)ǫ [1 − (n/m)ǫ][1 + (n/m)ǫ] m n ǫ.3 Complex powers of numbers near unity Equation (4. In fact nothing prevents us from defining the action of a complex power such that (4.

15) says that f (t) = = ′ ∞ k=−∞ ∞ k=−∞ ǫ→0+ lim (ck )(t + ǫ/2)k − (ck )(t − ǫ/2)k ǫ (1 + ǫ/2t)k − (1 − ǫ/2t)k . ǫ ǫ→0+ lim ck tk Applying (4. f ′ (t) ≡ lim f (t + ǫ/2) − f (t − ǫ/2) .14).15). In mathematical symbols and for the moment using real numbers. (4. There is no helping this. . ǫ ǫ→0+ but this book generally prefers the more elegant balanced form (4. The reader is advised to tread through these sections line by stubborn line.4. (4. THE DERIVATIVE 75 4. we now stand in a position properly to introduce the chapter’s subject. which we will now use in developing the derivative’s several properties through the rest of the chapter.4 The derivative Having laid down (4.6 4. the mathematical notation grows a little thick. in the good trust that the math thus gained will prove both interesting and useful. ǫ From this section through § 4.4. ǫ (4. What is the derivative? The derivative is the instantaneous rate or slope of a function.15) ǫ→0+ Alternately. one can define the same derivative in the unbalanced form f ′ (t) = lim f (t + ǫ) − f (t) .1 The derivative of the power series In the very common case that f (t) is the power series f (t) = ∞ k=−∞ ck tk .16) where the ck are in general complex coefficients.14).7.4. the derivative. this is f ′ (t) = 6 ∞ k=−∞ ǫ→0+ lim ck tk (1 + kǫ/2t) − (1 − kǫ/2t) .

15) and (4.3 will remedy the oversight. . it’s important. = f (t + dt/2) − f (t − dt/2). In (4. The quantities dt and df represent coordinated infinitesimal changes in t and f respectively. the meaning is clear enough: d(·) signifies how much (·) changes when the independent variable t increments by dt.18) until you do.2 The Leibnitz notation The f ′ (t) notation used above for the derivative is due to Sir Isaac Newton. many users of applied mathematics have never developed a clear understanding as to precisely what the individual symbols mean. and df is a dependent infinitesimal whose size relative to dt depends on the independent variable t.4. conceptually. Usually better on the whole. one can choose any infinitesimal size ǫ. however. and is easier to start with. (4. is G. Because there is significant practical benefit in learning how to handle the Leibnitz notation correctly—particularly in applied complex variable theory—this subsection seeks to present each Leibnitz element in its correct light.17) Equation (4. like ∂f /∂z. The meaning of the symbol d unfortunately depends on the context. 10 This is difficult. reread it carefully with reference to (4.4. 9 If you do not fully understand this sentence.15). 8 This subsection is likely to confuse many readers the first time they read it. For the independent infinitesimal dt. df such that per (4. As a result.17) admittedly has not explicitly considered what happens when the real t becomes the complex z.7 4. that the notation dt itself has two distinct meanings:10 f ′ (t) = 7 Equation (4. Often they have developed positive misunderstandings.18).17) gives the general derivative of the power series.9 Notice. df . Usually the exact choice of size does not matter. THE DERIVATIVE ∞ k=−∞ ck ktk−1 . more concise way to state it.18) dt Here dt is the infinitesimal. (4.76 which simplifies to f (t) = ′ CHAPTER 4. yet the author can think of no clearer. The reason is that Leibnitz elements like dt and ∂f usually tend to appear in practice in certain specific relations to one another. however. but occasionally when there are two independent variables it helps the analysis to adjust the size of one of the independent infinitesimals with respect to the other.W. Leibnitz’s notation8 dt = ǫ. but § 4.
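The result (4.17) can be checked against the defining limit (4.15) by taking ǫ small but finite. A brief numerical sketch (Python; the coefficients ck and the evaluation point are arbitrary illustrative choices):

    coeffs = {0: 2.0, 1: -1.0, 3: 0.5}   # c_0 = 2, c_1 = -1, c_3 = 0.5

    def f(t):
        return sum(c * t**k for k, c in coeffs.items())

    def f_prime(t):   # per (4.17): sum_k c_k k t**(k-1)
        return sum(c * k * t**(k - 1) for k, c in coeffs.items() if k != 0)

    t, eps = 1.7, 1e-6
    balanced = (f(t + eps/2) - f(t - eps/2)) / eps   # the balanced form (4.15)
    assert abs(balanced - f_prime(t)) < 1e-6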

For instance. Maybe it would be clearer in some cases to write dt f instead of df . nor do we usually worry about the precise value of dt. and 77 At first glance. Now. By contrast. then df or d(f ) is the amount by which f changes as t changes by dt. This leads one to forget that dt does indeed have a precise value. t). such that dt has prior independence to ds. The point is to develop a clear picture in your mind of what a Leibnitz infinitesimal really is. However. The df is infinitesimal but not constant. The important point is that dv and dx be coordinated so that the ratio dv/dx has a definite value no matter which of the two be regarded as independent. Changing perspective is allowed and perfectly proper. but it really denies the entire point of the Leibnitz notation. whose specific size could be a function of t but more likely is just a constant. Maybe it would be clearer in some cases to write ǫ instead of dt. Because the book is a book of applied mathematics. What confuses is when one changes perspective in mid-analysis. the math can get a little tricky. • d(t). However. then we can (and in complex analysis sometimes do) say that ds = dt = ǫ. Sometimes when writing a differential equation like the potential-kinetic energy equation ma dx = mv dv.4. this writer recommends that you learn the Leibnitz notation properly. now regarding f as the independent variable. . what we are interested in is not dt or df as such. which is how much (t) changes as t increments by dt. the distinction between dt and d(t) seems a distinction without a difference. the ratio df /dt remains the same in any case. then dt does not fundamentally have anything to do with t as such. so there is usually no trouble with treating dt and df as though they were the same kind of thing. developing the ability to treat the infinitesimals individually. and for most practical cases of interest. Many applied mathematicians do precisely that. so indeed it is. One can avoid the confusion simply by keeping the dv/dx or df /dt always in ratio. but the latter is how it is conventionally written. it has not yet pointed out (but does so now) that even if s and t are equally independent variables. we do not necessarily have either v or x in mind as the independent variable. If t is an independent variable. if f is a dependent variable. However. but one must take care: the dt and df after the change are not the same as the dt and df before the change. The point is not to fathom all the possible implications from the start. In fact. ds = δ(s. THE DERIVATIVE • the independent infinitesimal dt = ǫ. Once you have the picture. For this reason. you can go from there. That is okay as far as it goes. or when changing multiple independent complex variables simultaneously. but rather the R P ratio df /dt or the sum k f (k dt) dt = f (t) dt.4. most of the time. you can do that as the need arises. when switching perspective in mid-analysis as to which variables are dependent and which are independent. but for most cases the former notation is unnecessarily cluttered. One might as well just stay with the Newton notation in that case. this footnote does not attempt to say everything there is to say about infinitesimals. Instead. at the fundamental level they really aren’t. If a constant. after which nothing prevents us from using the symbols ds and dt interchangeably. it is a function of t. never treating the infinitesimals individually. or whether the independent be some third variable (like t) not in the equation. 
we do not usually worry about which of df and dt is the independent infinitesimal. if s and t are both independent variables. This is fine. one can have dt = ǫ(t). the latter is how it is conventionally written. then dt is just an infinitesimal of some kind.

if δ is a positive real infinitesimal. 4. but he accepts the notation as conventional nevertheless.19) rather than (4. then it should be equally valid to let ǫ = δ. In any case you should appreciate the conceptual difference between dt = ǫ and d(t). Use discretion. One should like the derivative (4. ǫ = iδ.19) to come out the same regardless of the Argand direction from which ǫ approaches 0 (see Fig. ǫ = −δ. introducing some unambiguous symbol like ǫ to represent the independent infinitesimal. ǫ = (4 − i3)δ or any other infinitesimal value. In fact for the sake of robustness. it is conventional to use the symbol ∂ instead of d. THE DERIVATIVE In such cases. it may be wise to use the symbol dt to mean d(t) only. or second derivative. and (4.19) one should like it to evaluate the same in the limit regardless of the complex phase of ǫ. one normally demands that derivatives do come out the same regardless of the Argand direction. ∂s (·) when tracking s. 2. Where two or more independent variables are at work in the same equation.78 CHAPTER 4. though.) Conventional shorthand for d(df ) is d2 f . the notation dk f dtk represents the kth derivative. That is. By extension. so d2 f d(df /dt) = 2 dt dt is a derivative of a derivative. (If needed or desired. Such notation appears only rarely in the literature. so your audience might not understand it when you write it. dz ǫ→0 ǫ (4.15) to be robust.3 The derivative of a function of a complex variable For (4. one can write ∂t (·) when tracking t. written here in the slightly more general form f (z + ǫ/2) − f (z − ǫ/2) df = lim . . dt2 . for (dt)2 . ǫ = −iδ. etc.5). so long as 0 < |ǫ| ≪ 1. as a reminder that the reader needs to pay attention to which ∂ tracks which independent variable.11 A derivative ∂f /∂t or ∂f /∂s in this case is sometimes called by the slightly misleading name of partial derivative.4.19) is 11 The writer confesses that he remains unsure why this minor distinction merits the separate symbol ∂. Where the limit (4.15) is the definition we normally use for the derivative for this reason.

4.12. d(z a ) dz (z + ǫ/2)a − (z − ǫ/2)a ǫ→0 ǫ (1 + ǫ/2z)a − (1 − ǫ/2z)a = lim z a ǫ→0 ǫ (1 + aǫ/2z) − (1 − aǫ/2z) . Excepting the nonanalytic parts of complex numbers (|·|.3).19) except at isolated nonanalytic points (like z = 0 in h[z] = 1/z √ or g[z] = z). THE DERIVATIVE 79 works without modification when ǫ is complex. .4 The derivative of z a Inspection of § 4.14). plus the Heaviside unit step u(t) and the Dirac delta δ(t) (§ 7. real ǫ and integral k of that section with arbitrary complex z. such functions are fully differentiable except at their poles (where the derivative goes infinite in any case) and other nonanalytic points. ck z k = ∞ k=−∞ ck kz k−1 (4. Particularly. ǫ and a. so the derivative (4.4. [·]∗ . d dz ∞ k=−∞ sensitive to the Argand direction or complex phase of ǫ.12).4. Meeting the criterion.19) does exist—where the derivative is finite and insensitive to Argand direction—there we say that the function f (z) is differentiable. see § 2. That is. Where the derivative (4.7).21) which simplifies to for any complex z and a.17) of the general power series. How exactly to evaluate z a or z a−1 when a is complex is another matter.14) reveals that nothing prevents us from replacing the real t. arg[·]. the key formula (4. = lim z a ǫ→0 ǫ = lim d(z a ) = az a−1 dz (4. ℜ[·] and ℑ[·].4.4 and its (5. 4. but in any case you can use (4.1’s logic in light of (4. there we normally say that the derivative does not exist. most functions encountered in applications do meet the criterion (4.21) for real a right now. written here as (1 + ǫ)w ≈ 1 + wǫ. treated in § 5.20) holds equally well for complex z as for real.

if you inform me instead that you earn 10 percent a year on the same bond. like 10 percent annual interest on a bond. For example. The latter figure is a relative rate. then I may commend you vaguely for your thrift but otherwise the information does not tell me much. .22) The investment principal grows at the absolute rate df /dt. It expresses the significant concept of a relative rate.5 Basic manipulation of the derivative This section introduces the derivative chain and product rules. f (t) dt (4. if you inform me that you earn $ 1000 a year on a bond you hold.2. 4. or logarithmic derivative. The natural logarithmic notation ln f (t) may not mean much to you yet. df /dt d = ln f (t). then I might want to invest.22) for the moment. but the bond’s interest rate is (df /dt)/f (t).5 The logarithmic derivative Sometimes one is more interested in knowing the rate of f (t) relative to the value of f (t) than in knowing the absolute rate itself.4. but the equation’s left side at least should make sense to you. THE DERIVATIVE 4. so you can ignore the right side of (4. as we’ll not introduce it formally until § 5.80 CHAPTER 4. However.

There is nothing more to the proof of the derivative chain rule than that.1 The derivative chain rule df = dz df dw dw dz If f is a function of w.13 4. df = dz 13 = = = = «„ dw dz 1 1 = √ . « 6z 3z = √ = √ . BASIC MANIPULATION OF THE DERIVATIVE 81 4. d j fj (z) = dz 2  fj z + j dz 2 − j fj z − dz 2 . d j fj (z) = fj (z) + j dfj 2 − j fj (z) − dfj 2 . 2w1/2 2 3z 2 − 1 6z. 3z 2 − 1.5. fj z ± so. . then12 .4. a thing divided by itself is 1.23) Equation (4.  12 ≈ fj (z) ± dfj dz dz 2 = fj (z) ± dfj . one can rewrite f (z) = p 3z 2 − 1 w1/2 . 2 For example. That’s it.23) is the derivative chain rule.5. 2 −1 2 3z 3z 2 − 1 „ df dw It bears emphasizing to readers who may inadvertently have picked up unhelpful ideas about the Leibnitz notation in the past: the dw factor in the denominator cancels the dw factor in the numerator.2 The derivative product rule In general per (4. which itself is a function of z.19).5. (4. in the limit.   But to first order.23). in the form f (w) w(z) Then df dw dw dz so by (4.

25) fj =   j fj   k dfk . this comes to d(f1 f2 ) = f2 df1 + f1 df2 . An example may help: » 2 3 – » 2 3 –» – u v −5t u v −5t du dv dz ds d e ln 7s = e ln 7s 2 +3 − − 5 dt + . d f g = g df − f dg .24) is the derivative product rule. 5. then by the derivative chain rule (4.26) Equation (4. this simplifies to       dfk −dfk d  fj (z) =  fj (z) −  fj (z) . df2 = −dg/g2 .23).15 This paragraph is extra.27) × ak k where the ak . pk ln ck pk (4.24) in the slightly specialized but often useful form14   d gj j a ebj hj j j j =   ln cj pj  j gj j j j a ebj hj dgk + gk ln cj pj  bk dhk + k k  dpk . g2 (4.82 CHAPTER 4. so. After studying the complex exponential in Ch. fk (4. THE DERIVATIVE Since the product of two or more dfj is negligible compared to the first-order infinitesimals to which they are added here. bk and ck are arbitrary complex coefficients and the gk . if f1 (z) = f (z) and f2 (z) = 1/g(z). (4. hk and pk are arbitrary functions.24) On the other hand. The subsection is sufficiently abstract that it is a little hard to understand unless one already knows what it means. z z u v z s ln 7s 15 14 . You can skip it for now if you prefer. we shall stand in a position to write (4. 2fk (z) 2fk (z) j j k j k or in other words d j In the common case of only two fj .

x=xo (4. x=xo 16 The notation P |Q means “P when Q. The almost distinctive characteristic of the extremum f (xo ) is that16 df dx = 0. But the derivative of the slope is just the derivative of the derivative.28) At the extremum. If from downward to upward. Whether the extremum be a minimum or a maximum depends on whether the curve turn from a downward slope to an upward.” Sometimes it is alternately written P |Q or [P ]Q .6. Refer to Fig.3: A local extremum. or second derivative. respectively. then negative. a local minimum or maximum—of a real-valued function f (x). 4.28) to find the extremum.3. EXTREMA AND HIGHER DERIVATIVES Figure 4. Hence if df /dx = 0 at x = xo .6 Extrema and higher derivatives One problem which arises very frequently in applied mathematics is the problem of finding a local extremum—that is.” or “P evaluated at Q. . The curve momentarily runs level there. y 83 f (xo ) f (x) xo x 4. then d2 f dx2 d2 f dx2 > 0 implies a local minimum at xo . the slope is zero.” “P . given Q. One solves (4. if from upward to downward. or from an upward slope to a downward. then the derivative of the slope is evidently positive.4. x=xo < 0 implies a local maximum at xo .

too.29) Of course if the first and second derivatives are zero not just at x = xo but everywhere. would be a matter of definition. or both or neither. being rather a level inflection point as depicted in Fig. THE DERIVATIVE Figure 4. best established not by prescription but rather by the needs of the model at hand. Whether one chooses to call some random point on a level straight line an inflection point or an extremum. 4. if both functions go to zero or infinity together at z = zo —then l’Hˆpital’s rule holds that o lim df /dz f (z) = g(z) dg/dz . . 4.4.) 4.17 (In general the term inflection point signifies a point at which the second derivative is zero. x=xo this might be either a minimum or a maximum but more probably is neither.7 L’Hˆpital’s rule o If z = zo is a root of both f (z) and g(z).4 is level because its first derivative is zero. but you knew that already. or alternately if z = zo is a pole of both functions—that is. z=zo z→zo 17 (4.84 CHAPTER 4. then f (x) = yo is just a level straight line.4: A level inflection. y f (xo ) f (x) xo x Regarding the case d2 f dx2 = 0. The inflection point of Fig.

7. l’Hˆpital’s rule is proved by reasoning18 o f (z) − 0 f (z) = lim z→zo g(z) − 0 z→zo g(z) f (z) − f (zo ) df df /dz = lim = lim = lim . we have that z→zo lim dg g(z) = lim . z→zo −df /f 2 g(z) z→zo F (z) z→zo dF where we have used the fact from (4. but just to make the point let us apply l’Hˆpital’s rule instead. Canceling the minus signs and multiplying by g2 /f 2 . which is still 0/0.19 o Partly with reference to [52. z→zo g(z) − g(zo ) z→zo dg z→zo dg/dz lim In the case where z = zo is a pole.21) that d(1/u) = −du/u2 for any u. L’HOPITAL’S RULE 85 In the case where z = zo is a root. z→zo dg z→zo dg/dz g(z) And if zo itself is infinite? Then. whether it represents a root or a pole. should the occasion arise. “L’Hopital’s rule. reducing the ratio to limx→0 2(x3 + o x)(3x2 + 1)/2x. Consider for example the ratio limx→0 (x3 + x)2 /x2 . z→zo lim f (z) df df /dz = lim = lim . f (z) z→zo df Inverting. with which we apply l’Hˆpital’s rule for o Z → 0 to obtain Φ(Z) dΦ/dZ df /dZ f (z) = lim = lim = lim z→∞ g(z) Z→0 Γ(Z) Z→0 dΓ/dZ Z→0 dg/dZ (df /dz)(−z 2 ) df /dz (df /dz)(dz/dZ) = lim = lim . Applying l’Hˆpital’s rule again to the result yields o limx→0 2[(3x2 +1)2 +(x3 +x)(6x)]/2 = 2/2 = 1. 5 and [not yet written]) appear in ratio. new functions F (z) ≡ 1/f (z) and G(z) ≡ 1/g(z) of which z = zo is a root are defined. (dg/dz)(dz/dZ) lim Z→0 Nothing in the derivation requires that z or zo be real. Where expressions involving trigonometric or special functions (Chs. = lim z→∞ (dg/dz)(−z 2 ) z→∞ dg/dz z→∞. we define the new variable Z = 1/z and the new functions Φ(Z) = f (1/Z) = f (z) and Γ(Z) = g(1/Z) = g(z). a recursive application of l’Hˆpital’s rule can be just the thing one needs. which is 0/0. applying the rule a third time would have ruined the result. In the example. o Observe that one must stop applying l’Hˆpital’s rule once the ratio is no longer 0/0 or o ∞/∞. 5 April 2006]. 3.” 03:40.ˆ 4. with which z→zo lim f (z) G(z) dG −dg/g2 = lim = lim = lim . Nothing prevents one from applying l’Hˆpital’s rule recursively. 19 18 . The easier way to resolve this particular ratio would naturally be to cancel a factor of x2 from it.

L'Hôpital's rule is used in evaluating indeterminate forms of the kinds 0/0 and ∞/∞, plus related forms like (0)(∞) which can be recast into either of the two main forms. Good examples of the use require math from Ch. 5 and later, but if we may borrow from (5.7) the natural logarithmic function and its derivative,[20]

    d/dx ln x = 1/x,

then a typical l'Hôpital example is[21]

    lim_{x→∞} (ln x)/√x = lim_{x→∞} (1/x)/(1/(2√x)) = lim_{x→∞} 2/√x = 0.

The example incidentally shows that natural logarithms grow slower than square roots, an instance of a more general principle we shall meet in § 5.3. Section 5.3 will put l'Hôpital's rule to work.

4.8 The Newton-Raphson iteration

The Newton-Raphson iteration is a powerful, fast converging, broadly applicable method for finding roots numerically. Given a function f(z) of which the root is desired, the Newton-Raphson iteration is

    z_{k+1} = [ z − f(z)/( d f(z)/dz ) ]_{z=zk}.                      (4.30)

One begins the iteration by guessing the root and calling the guess z0. Then z1, z2, z3, etc., calculated in turn by the iteration (4.30), give successively better estimates of the true root z∞.

To understand the Newton-Raphson iteration, consider the function y = f(x) of Fig. 4.5. The iteration approximates the curve f(x) by its tangent line[22] (shown as the dashed line in the figure).

[20] This paragraph is optional reading for the moment. You can read Ch. 5 first, then come back here and read the paragraph if you prefer.
[21] [42, § 10-2]

Figure 4.5: The Newton-Raphson iteration. [The figure plots f(x) against x, marking xk and the improved estimate xk+1 where the dashed tangent line crosses the x axis.]

That tangent line is

    f̃k(x) = f(xk) + [ d f(x)/dx ]_{x=xk} (x − xk).

It then approximates the root xk+1 as the point at which f̃k(xk+1) = 0:

    f̃k(xk+1) = 0 = f(xk) + [ d f(x)/dx ]_{x=xk} (xk+1 − xk).

Solving for xk+1, we have that

    xk+1 = [ x − f(x)/( d f(x)/dx ) ]_{x=xk},

which is (4.30) with x ← z. Although the illustration uses real numbers, nothing forbids complex z and f(z). The Newton-Raphson iteration works just as well for these.

[22] A tangent line, also just called a tangent, is the line which most nearly approximates a curve at a given point. The tangent touches the curve at the point, and in the neighborhood of the point it goes in the same direction the curve goes. The dashed line in Fig. 4.5 is a good example of a tangent line. The trigonometric tangent function is named from a variation on Fig. 3.1 in which the triangle's bottom leg is extended to unit length, leaving the rightward leg tangent to the circle. The relationship between the tangent line and the trigonometric tangent function of Ch. 3 is slightly obscure, maybe more of linguistic interest than of mathematical.
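A minimal sketch of the iteration (4.30) in code (my own illustration, not from the text; the tolerance, the iteration cap and the sample function are arbitrary choices):

    # Newton-Raphson per (4.30): z_{k+1} = z_k - f(z_k)/f'(z_k).
    def newton_raphson(f, df, z0, tol=1e-12, max_iter=50):
        z = z0
        for _ in range(max_iter):
            step = f(z) / df(z)
            z = z - step
            if abs(step) < tol:
                break
        return z

    # Example: seek the positive root of f(x) = x**2 - 2, i.e. the square root of 2.
    print(newton_raphson(lambda x: x*x - 2, lambda x: 2*x, 1.0))   # 1.41421356...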

The principal limitation of the Newton-Raphson arises when the function has more than one root, as most interesting functions do. The iteration often converges on the root nearest the initial guess zo but does not always, and in any case there is no guarantee that the root it finds is the one you wanted. The most straightforward way to beat this problem is to find all the roots: first you find some root α, then you remove that root (without affecting any of the other roots) by dividing f(z)/(z − α), then you find the next root by iterating on the new function f(z)/(z − α), and so on until you have found all the roots. If this procedure is not practical (perhaps because the function has a large or infinite number of roots), then you should probably take care to make a sufficiently accurate initial guess if you can.

A second limitation of the Newton-Raphson is that, if you happen to guess z0 especially unfortunately, then the iteration might never converge at all. For example, the roots of f(z) = z² + 2 are z = ±i√2, but if you guess that z0 = 1 then the iteration has no way to leave the real number line, so it never converges[23] (and if you guess that z0 = √2—well, try it with your pencil and see what z2 comes out to be). You can fix the problem with a different, possibly complex initial guess.

A third limitation arises where there is a multiple root. In this case, the Newton-Raphson normally still converges, but relatively slowly. For instance, the Newton-Raphson converges relatively slowly on the triple root of f(z) = z³. However, even the relatively slow convergence is still pretty fast and is usually adequate, even for calculations by hand.

Usually in practice, the Newton-Raphson iteration works very well. For most functions, once the Newton-Raphson finds the root's neighborhood, it converges on the actual root remarkably quickly. Figure 4.5 shows why: in the neighborhood, the curve hardly departs from the straight line.

The Newton-Raphson iteration is a champion square root calculator, incidentally. Consider

    f(x) = x² − p,

whose roots are x = ±√p. Per (4.30), the Newton-Raphson iteration for this is

    x_{k+1} = (1/2)[ x_k + p/x_k ].                                   (4.31)

If you start by guessing x0 = 1

[23] It is entertaining to try this on a computer. Then try again with z0 = 1 + i2^{−0x10}.

and iterate several times, the iteration (4.31) converges on x∞ = √p fast.

To calculate the nth root x = p^{1/n}, let

    f(x) = xⁿ − p

and iterate[24][25]

    x_{k+1} = (1/n)[ (n − 1)x_k + p/x_k^{n−1} ].                      (4.32)

Equations (4.31) and (4.32) work not only for real p but also usually for complex; however, they converge reliably and orderly only for real, nonnegative p. (To see why, sketch f[x] in the fashion of Fig. 4.5.) If reliable, orderly convergence is needed for complex p = u + iv = σ cis ψ, σ ≥ 0, you can decompose p^{1/n} per de Moivre's theorem (3.26) as p^{1/n} = σ^{1/n} cis(ψ/n), in which cis(ψ/n) = cos(ψ/n) + i sin(ψ/n) is calculated by the Taylor series of Table 8.1. Then σ is real and nonnegative, upon which (4.32) reliably, orderly computes σ^{1/n}.

The Newton-Raphson iteration however excels as a practical root-finding technique, so it often pays to be a little less theoretically rigid in applying it. If so, then don't bother to decompose; seek p^{1/n} directly, using complex zk in place of the real xk. Given x0 = 1, this saves effort and usually works. In the uncommon event that the direct iteration does not seem to converge, start over again with some randomly chosen complex z0.

Section 13.7 generalizes the Newton-Raphson iteration to handle vector-valued functions.

This concludes the chapter. Chapter 8, treating the Taylor series, will continue the general discussion of the derivative.

[24][25] [42, § 4-9][34, § 6.1][51]
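Sketching (4.32) directly in complex arithmetic (my own illustration, not from the text; the sample radicands, the step count and the default guess x0 = 1 are arbitrary choices):

    # x_{k+1} = ((n - 1)*x_k + p/x_k**(n - 1)) / n, per (4.32), seeking p**(1/n).
    def nth_root(p, n, x0=1.0, steps=60):
        x = complex(x0)
        for _ in range(steps):
            x = ((n - 1) * x + p / x**(n - 1)) / n
        return x

    print(nth_root(2, 2))      # ~1.4142, the square root of 2
    print(nth_root(1j, 2))     # ~0.7071+0.7071j, a square root of i, found directly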


Chapter 5

The complex exponential

In higher mathematics, the complex natural exponential is almost impossible to avoid. It seems to appear just about everywhere. This chapter introduces the concept of the natural exponential and of its inverse, the natural logarithm. It derives the functions' basic properties and explains their close relationship to the trigonometrics. It works out the functions' derivatives and the derivatives of the basic members of the trigonometric and inverse trigonometric families to which they respectively belong, and shows how the two operate on complex arguments.

5.1 The real exponential

Consider the factor

    (1 + ǫ)^N.

This is the overall factor by which a quantity grows after N iterative rounds of multiplication by (1 + ǫ). What happens when ǫ is very small but N is very large? The really interesting question is, what happens in the limit, as ǫ → 0 and N → ∞, while x = ǫN remains a finite number? The answer is that the factor becomes

    exp x ≡ lim_{ǫ→0} (1 + ǫ)^{x/ǫ}.                                  (5.1)

Equation (5.1) defines the natural exponential function—commonly, more briefly named the exponential function. Another way to write the same definition is

    exp x = e^x,                                                      (5.2)
    e ≡ lim_{ǫ→0} (1 + ǫ)^{1/ǫ}.                                      (5.3)
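A throwaway numerical look at the limit (5.3) (my own check, not part of the text; the particular values of ǫ are arbitrary):

    # (1 + eps)**(1/eps) approaches e = 2.71828... as eps shrinks, per (5.3).
    for eps in (1e-1, 1e-3, 1e-6, 1e-9):
        print(eps, (1 + eps) ** (1 / eps))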

1 we observe per (4. To show this.92 CHAPTER 5. the next step toward putting a concrete bound on e is to show that y(x) ≤ exp x for all real x. implying that the straight line which best approximates the exponential function in that neighborhood—the tangent line. but even at the applied level cannot think of a logically permissible way to do it.ǫ→0 δ (1 + ǫ)+δ/2ǫ − (1 + ǫ)−δ/2ǫ δ. ǫ→0 which is to say that d exp x = exp x. For the moment. what interests us is that d exp 0 = exp 0 = lim (1 + ǫ)0 = 1. THE COMPLEX EXPONENTIAL Whichever form we write it in. that is. dx (5. Since 1 + ǫ > 1. It seems nonobvious that the limit limǫ→0 (1 + ǫ)1/ǫ actually does exist. the slope and height of the exponential function are everywhere equal. To show that we can.ǫ→0 δ (1 + δ/2) − (1 − δ/2) = lim (1 + ǫ)x/ǫ δ. to divide repeatedly by 1 + ǫ as x decreases. ǫ→0 dx which says that the slope and height of the exponential function are both unity at x = 0. the question remains as to whether the limit actually exists. however.19) that the derivative of the exponential function is exp(x + δ/2) − exp(x − δ/2) d exp x = lim δ→0 dx δ (x+δ/2)/ǫ − (1 + ǫ)(x−δ/2)/ǫ (1 + ǫ) = lim δ.4) This is a curious. whether 0 < e < ∞. With the tangent line y(x) found. this Excepting (5. The rest of this section shows why it does. 1 . which just grazes the curve—is y(x) = 1 + x. important result: the derivative of the exponential function is the exponential function itself.ǫ→0 δ = lim (1 + ǫ)x/ǫ = lim (1 + ǫ)x/ǫ . the author would prefer to omit much of the rest of this section. we observe per (5. whether in fact we can put some concrete bound on e.1) that the essential action of the exponential function is to multiply repeatedly by 1 + ǫ as x increases. that the curve runs nowhere below the line.4).

d y(0) = 1. Leftward. Either way.2). Rightward. . 93 However. 1 2 ≤ exp (1) . the curve never crosses below the line for real x. But we have purposely defined the tangent line y(x) = 1 + x such that exp 0 = d exp 0 = dx y(0) = 1.1. THE REAL EXPONENTIAL action means for real x that exp x1 ≤ exp x2 if x1 ≤ x2 .5. dx if x1 ≤ x2 . Evaluating the last inequality at x = −1/2 and x = 1. In light of (5. such that the line just grazes the curve of exp x at x = 0.1 depicts. bending upward away from the line. ≤ e1 . at x < 0. a positive number remains positive no matter how many times one multiplies or divides it by 1 + ǫ. evidently the curve’s slope only increases. we have that 1 2 2 But per (5. again bending upward away from the line. at x > 0. so 1 2 2 ≤ e−1/2 . y(x) ≤ exp x. dx that is. the last two equations imply further that d exp x1 ≤ dx 0 ≤ d exp x2 dx d exp x.4). Figure 5. evidently the curve’s slope only decreases. ≤ exp − . In symbols. exp x = ex . so the same action also means that 0 ≤ exp x for all real x.

such that we define for it the special notation ln(·) = loge (·). thus the natural logarithm inverts the natural exponential and vice versa: ln exp x = ln ex = x. (5.1. where e is the constant introduced in (5. so also for base b = e.1: The natural exponential.94 CHAPTER 5. Just as for any other base b. however. 2 ≤ e ≤ 4. one can choose any base b.2 The natural logarithm In the general exponential expression bx .2) puts the desired bound on the exponential function. is the most interesting choice of all. exp x 1 x −1 or in other words.6) exp ln x = eln x = x. As we shall see in § 5.5) which in consideration of (5. (5. For this reason among others. By the Taylor series of Table 8.4.B7E1 can readily be calculated. b = 2 is an interesting choice. it turns out that b = e. but the derivation of that series does not come until Ch.3). for example. and call it the natural logarithm. 5. THE COMPLEX EXPONENTIAL Figure 5. 8. the base-e logarithm is similarly interesting. The limit does exist. . the value e ≈ 0x2.

dx = exp y.2. If y = ln x. here is another rather significant result.4).2 2 Besides the result itself. THE NATURAL LOGARITHM Figure 5.2 plots the natural logarithm. We shall use the technique more than once in this book. ln x 95 1 x −1 Figure 5. and per (5. the technique which leads to the result is also interesting and is worth mastering. d 1 ln x = . dy 1 dy = .7) dx x Like many of the equations in these early chapters. . dx x the inverse of which is In other words. dy But this means that dx = x.2: The natural logarithm. then x = exp y.5. (5.

One can specialize Table 2.5's logarithmic base-conversion identity to read

    logb w = ln w / ln b.                                             (5.8)

This equation converts any logarithm to a natural logarithm. Base b = 2 logarithms are interesting, so we note here that

    ln 2 = −ln(1/2) ≈ 0x0.B172,

which Ch. 8 and its Table 8.1 will show how to calculate.

5.3 Fast and slow functions

The exponential exp x is a fast function. The logarithm ln x is a slow function. These functions grow, diverge or decay respectively faster and slower than xᵃ. Such claims are proved by l'Hôpital's rule (4.29). Applying the rule, we have that

    lim_{x→∞} (ln x)/xᵃ = lim_{x→∞} 1/(a xᵃ) = 0 if a > 0,  +∞ if a ≤ 0;
    lim_{x→0}  (ln x)/xᵃ = lim_{x→0}  1/(a xᵃ) = 0 if a < 0,  −∞ if a ≥ 0;       (5.9)

which reveals the logarithm to be a slow function. Since the exp(·) and ln(·) functions are mutual inverses, we can leverage (5.9) to show also that

    lim_{x→∞} exp(±x)/xᵃ = lim_{x→∞} exp[±x − a ln x]
                          = lim_{x→∞} exp[(x)(±1 − a (ln x)/x)]
                          = lim_{x→∞} exp[(x)(±1 − 0)]
                          = lim_{x→∞} exp[±x].

That is,

    lim_{x→∞} exp(+x)/xᵃ = ∞,
    lim_{x→∞} xᵃ exp(−x) = 0.                                         (5.10)
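A quick numerical illustration of (5.9) and (5.10) (my own check, not from the text; the exponent a and the sample values of x are arbitrary):

    import math

    # ln x is slow and exp x is fast relative to x**a: the first ratio sinks
    # toward 0 as x grows, and exp(-x)*x**a collapses toward 0 far faster.
    a = 0.5
    for x in (1e2, 1e4, 1e8):
        print(x, math.log(x) / x**a, math.exp(-x) * x**a)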

It’s a long. x4 x x x 1! 2! 3! 4! 1! 0! x0 x1 x2 x3 x4 3! 2! .3. 3 . The exception is peculiar. . The reader can detail these as the need arises. Also.. each term in each sequence is the derivative with respect to x of the term to its right—except left of the middle element in the first sequence and right of the middle element in the second. and indeed one can write ln(x + u) . .. As we have seen. .. The logarithm can in some situations profitably be thought of as a “diverging constant” of sorts. ln x. . What is going on here? The answer is that x0 (which is just a constant) and ln x both are of zeroth order in x. The natural logarithm does indeed eventually diverge to infinity. but the point nevertheless remains that x0 and ln x often play analogous roles in mathematics.3 It is interesting and worthwhile to contrast the sequence .. 3 .4 Figure 5. .− 3! 2! 1! 0! x1 x2 x3 x4 . but the divergence of the former is extremely slow— so slow.5. regardless of the base. There are of course some degenerate edge cases like b = 0 and b = 1. x0 = lim u→∞ ln u which casts x0 as a logarithm shifted and scaled. FAST AND SLOW FUNCTIONS 97 which reveals the exponential to be a fast function. each sequence increases in magnitude going rightward. in fact. it takes practically forever just to reach 0x100. 1 . . in the literal sense that there is no height it does not eventually reach.9) limx→∞ (ln x)/xǫ = 0 for any positive ǫ no matter how small. Admittedly.− against the sequence . x4 x x x 0! 1! 2! 3! 4! As x → +∞. 1. . This seems strange at first because ln x diverges as x → ∞ whereas x0 does not. ... Consider for instance how far out x must run to make ln x = 0x100. 3.2 has plotted ln x only for x ∼ 1. logarithms diverge slower. Thus exponentials generally are fast and logarithms generally are slow. that per (5.− 2. . Such conclusions are extended to bases other than the natural base e simply by observing that logb x = ln x/ ln b and that bx = exp(x ln b). . .. Exponentials grow or decay faster than powers. but beyond the figure’s window the curve (whose slope is 1/x) flattens rapidly rightward. 4 One does not grasp how truly slow the divergence is until one calculates a few concrete values... − 2 . but it certainly does not hurry. one ought not strain such logic too far. because ln x is not in fact a constant. . long way. to the extent that it locally resembles the plot of a constant value.

.98 CHAPTER 5. THE COMPLEX EXPONENTIAL Less strange-seeming perhaps is the consequence of (5.10). So let us multiply a complex number z = x + iy by 1 + iǫ. It befits an applied mathematician subjectively to internalize (5. ǫ→0 but from here it is not obvious where to go. The resulting change in z is ∆z = (1 + iǫ)(x + iy) − (x + iy) = (ǫ)(−y + ix).4 Euler’s formula The result of § 5. then exp(iǫ) = (1 + iǫ)ǫ/ǫ = 1 + iǫ. How can one evaluate exp iθ = lim (1 + ǫ)iθ/ǫ . helps one to grasp mentally the essential features of many mathematical models one encounters in practice. that x∞ and exp x play analogous roles. where the factor is not quite exactly unity—in this case. if we don’t quite know where to go with this yet. fast.14) to write the last equation in the form exp iθ = lim (1 + iǫ)θ/ǫ . to remember that ln x resembles x0 and that exp x resembles x∞ . by 1 + iǫ. one can take advantage of (4. Now leaving aside fast and slow functions for the moment.1 leads to one of the central questions in all of mathematics.10) that exp x is of infinite order in x. In fact it appears that the interpretation of exp iθ remains for us to define. 5. But per § 5. A qualitative sense that logarithms are slow and exponentials. obtaining (1 + iǫ)(x + iy) = (x − ǫy) + i(y + ǫx).9) and (5. the essential operation of the exponential function is to multiply repeatedly by some factor. we turn our attention in the next section to the highly important matter of the exponential of a complex argument. The book’s development up to the present point gives no obvious direction. So. ǫ→0 where i2 = −1 is the imaginary unit introduced in § 2. if we can find a way to define it which fits sensibly with our existing notions of the real exponential.1. what do we know? One thing we know is that if θ = ǫ.12? To begin.

it follows from the equations that |exp iθ| = lim (1 + iǫ)θ/ǫ ǫ→0 arg [exp iθ] = lim arg (1 + iǫ)θ/ǫ ǫ→0 θ = 1. EULER’S FORMULA Figure 5.5.3. arg(∆z) = arctan −y 4 |∆z| = (ǫ) In other words. These two equations—implying an unvarying. Since the change is directed neither inward nor outward from the dashed circle in the figure. evidently its effect is to move z a distance (ǫ) |z| counterclockwise about the circle. In other words. steady counterclockwise progression about the circle in the diagram—are somewhat unexpected. Given the steadiness of the progression.4. iℑ(z) i2 ∆z i ρ φ −2 −1 −i −i2 1 2 z ℜ(z) 99 in which it is seen that y 2 + x2 = (ǫ) |z| . ∆φ = ǫ. referring to the figure. 5. the change is proportional to the magnitude of z. but at a right angle to z’s arm in the complex plane. Refer to Fig. ǫ→0 ǫ .3: The complex exponential and Euler’s formula. 2π x = arg z + . = |exp 0| + lim ∆ρ ǫ→0 ǫ θ = arg [exp 0] + lim ∆φ = θ. the change ∆z leads to ∆ρ = 0.

the last equation is wz = exp(x ln σ − ψy) exp i(y ln σ + ψx). where cis(·) is as defined in § 3. exp iθ is the complex number which lies on the Argand unit circle at phase angle θ. cos φ and i sin φ.29).11). we have for real φ that arg [exp iφ] = φ.11) is one of the most famous results in all of mathematics. . If a complex base w is similarly expressed in the form w = u + iv = σ exp iψ. How? Consider in light of Fig.47). (5. Had we known that θ was an Argand phase angle. then adds the latter two to show them equal to the first of the three.11) that one can express any complex number in the form z = x + iy = ρ exp iφ.5. perhaps.11) |exp iφ| = 1. It is called Euler’s formula.100 CHAPTER 5. THE COMPLEX EXPONENTIAL That is. Such an alternate derivation lends little insight. 5 (5.6 and it opens the exponential domain fully to complex numbers. Since exp(α + β) = eα+β = exp α exp β.3 and (5.1 the Taylor series for exp iφ. the fundamental theorem of calculus (7.” 6 An alternate derivation of Euler’s formula (5. Changing φ ← θ now. not just for the natural base e but for any base. Along with the Pythagorean theorem (2.2) and Cauchy’s integral formula (8. then it follows that wz = exp[ln wz ] = exp[z ln w] = exp[(x + iy)(iψ + ln σ)] = exp[(x ln σ − ψy) + i(y ln σ + ψx)]. 5. Leonhard Euler’s name is pronounced as “oiler.12) For native English speakers who do not speak German. which says neither more nor less than that exp iφ = cos φ + i sin φ = cis φ. but at least it builds confidence that we actually knew what we were doing when we came up with the incredible (5. (5.11)—less intuitive and requiring slightly more advanced mathematics. naturally we should have represented it by the symbol φ from the start.11. but briefer—constructs from Table 8.
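Euler's formula (5.11) and the complex power (5.12) above are easy to spot-check numerically (my own sketch, not from the text; the sample values of φ, w and z are arbitrary):

    import cmath, math

    phi = 0.7
    print(cmath.exp(1j * phi))                   # per (5.11) ...
    print(math.cos(phi) + 1j * math.sin(phi))    # ... equals cos(phi) + i*sin(phi)

    w, z = 2 + 1j, 0.5 - 0.3j
    print(w ** z)                                # a complex base to a complex power
    print(cmath.exp(z * cmath.log(w)))           # the same, built along (5.12)'s route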

5.12) serves to raise any complex number to a complex power. e±i2π/2 = −1.14) ln w = ln σeiψ = ln σ + iψ.11) implies that complex numbers z1 and z2 can be written (5.11.11) include that σ = e±i2π/4 = ±i.2. z2 = ρ2 eiφ2 . exp(−iφ) = cos φ − i sin φ.11) to +φ then to −φ. z2 ρ2 ρ2 z a = ρa eiaφ = ρa exp[iaφ]. we have that . we have that (5.5. tan ψ = u Equation (5.15) By the basic power properties of Table 2. y = ρ sin φ. (5. 101 u2 + v 2 .5.16) 5. ρ1 i(φ1 −φ2 ) ρ1 z1 = e = exp[i(φ1 − φ2 )]. Applying Euler’s formula (5. Euler’s formula (5. v .6 Complex trigonometrics exp(+iφ) = cos φ + i sin φ. ein2π = 1. Curious consequences of Euler’s formula (5.5 Complex exponentials and de Moivre’s theorem z1 = ρ1 eiφ1 . then. introduced in § 3. z1 z2 = ρ1 ρ2 ei(φ1 +φ2 ) = ρ1 ρ2 exp[i(φ1 + φ2 )]. This is de Moivre’s theorem. (5.13) For the natural logarithm of a complex number in light of Euler’s formula. COMPLEX EXPONENTIALS AND DE MOIVRE where x = ρ cos φ.

Such confusion probably tempts few readers unfamiliar with the material of Ch. Although less commonly seen than the other two.17) and (5.19) and (5.22)’s first line from that figure. Their inverses arccosh. The figure is quite handy for real φ. (5.22) hold for complex φ as well as for real. so you can ignore this footnote for now. in the notation of that chapter—which tempts one incorrectly to suppose v ˆ by analogy that cos∗ φ cos φ + sin∗ φ sin φ = 1 and that cosh∗ φ cosh φ − sinh∗ φ sinh φ = 1 when the angle φ is complex. cosh φ (5. 7 .102 CHAPTER 5. 2 exp(+φ) − exp(−φ) . etc. and from (5.7 The notation exp i(·) or ei(·) is sometimes felt to be too bulky. 15 and if the confusion then arises.22) holds exactly as written for complex φ as well as for real. then such thoughts may serve to clarify the matter.20) can generally be true only if (5. (5. 15. However. but what if anything the figure means when φ is complex is not obvious.20) (5. whereas we originally derived (5. are defined in the obvious way. then consider that the angle φ of Fig. 3.2) is that cos2 φ + sin2 φ = 1. if later you return after reading Ch.18) Thus are the trigonometrics expressed in terms of complex exponentials.22) Both lines of (5.17) Subtracting the second equation from the first and solving for sin φ yields sin φ = (5.20) one can derive the hyperbolic analog: cos2 φ + sin2 φ = 1.. the notation cis(·) ≡ exp i(·) = cos(·) + i sin(·) Chapter 15 teaches that the “dot product” of a unit vector and its own conjugate is unity—ˆ ∗ · v = 1. THE COMPLEX EXPONENTIAL Adding the two equations and solving for cos φ yields cos φ = exp(+iφ) + exp(−iφ) .19) (5. 2 exp(+iφ) − exp(−iφ) . 2 sinh φ .1 is a real angle. The Pythagorean theorem for trigonometrics (3. i2 (5. Hence in fact cos∗ φ cos φ + sin∗ φ sin φ = 1 and cosh∗ φ cosh φ − sinh∗ φ sinh φ = 1.21) These are called the hyperbolic functions. cosh2 φ − sinh2 φ = 1. The forms (5.17) through (5. If the confusion descends directly or indirectly from the figure.18) suggest the definition of new functions cosh φ ≡ sinh φ ≡ tanh φ ≡ exp(+φ) + exp(−φ) . However.

Conventional names for these two mutually inverse families of functions are unknown to the author. 5. At this point in the development one begins to notice that the sin.7.7 Summary of properties Table 5. Or.17) and (5. and likewise for the several other trigs.11 and 4. by which one can immediately adapt the many trigonometric properties of Tables 3. cosh and sinh functions are each really just different facets of the same mathematical phenomenon. Better applied style is to find the derivatives by observing directly the circle from which the sine and cosine functions come. cis.1 Derivatives of sine and cosine One can compute derivatives of the sine and cosine functions from (5. SUMMARY OF PROPERTIES 103 is also conventionally recognized.8 Derivatives of complex exponentials This section computes the derivatives of the various trigonometric and inverse trigonometric functions.19) and (5. Likewise their respective inverses: arcsin. arccosh and arcsinh. if the various tangent functions were included. but one might call them the natural exponential and natural logarithmic families.1 and 3. −i ln. i sinh z = sin iz. but to do it in that way doesn’t seem sporting. ln. 3. Also conventionally recognized are sin−1 (·) and occasionally asin(·) for arcsin(·). as earlier seen in § 3.18) respectively to (5.23) 5. comparing (5. (5. exp.20). cos.3 to hyperbolic use. arccos.8.1 gathers properties of the complex exponential from this chapter and from §§ 2. Then.12. then one might call them the trigonometric and inverse trigonometric families.4. .17) and (5. we have that cosh z = cos iz. Replacing z ← φ in this section’s several equations implies a coherent definition for trigonometric functions of a complex variable. 5.18).5.11. i tanh z = tan iz.


Table 5.1: Complex exponential properties.

    i² = −1 = (−i)²
    1/i = −i
    e^{iφ} = cos φ + i sin φ
    e^{iz} = cos z + i sin z

    z1 z2 = ρ1 ρ2 e^{i(φ1+φ2)} = (x1 x2 − y1 y2) + i(y1 x2 + x1 y2)
    z1/z2 = (ρ1/ρ2) e^{i(φ1−φ2)} = [(x1 x2 + y1 y2) + i(y1 x2 − x1 y2)] / (x2² + y2²)
    zᵃ = ρᵃ e^{iaφ}
    w^z = e^{x ln σ − ψy} e^{i(y ln σ + ψx)}
    ln w = ln σ + iψ

    sin z = (e^{iz} − e^{−iz})/i2        sinh z = (e^{z} − e^{−z})/2
    cos z = (e^{iz} + e^{−iz})/2         cosh z = (e^{z} + e^{−z})/2
    tan z = sin z / cos z                tanh z = sinh z / cosh z

    sin iz = i sinh z      cos iz = cosh z      tan iz = i tanh z

    cos² z + sin² z = 1 = cosh² z − sinh² z

    z ≡ x + iy = ρ e^{iφ}
    w ≡ u + iv = σ e^{iψ}
    exp z ≡ e^z
    cis z ≡ cos z + i sin z = e^{iz}

    d/dz exp z = exp z
    d/dw ln w = 1/w
    d/dz ln f(z) = (df/dz) / f(z)
    logb w = ln w / ln b

Figure 5.4: The derivatives of the sine and cosine functions. [The figure shows the Argand plane with axes ℜ(z) and ℑ(z); a point z sits at radius ρ and phase ωt + φo on a circle about the origin, and the velocity dz/dt is drawn at right angles to the arm of ρ.]

Refer to Fig. 5.4. Suppose that the point z in the figure is not fixed but travels steadily about the circle such that

    z(t) = (ρ)[ cos(ωt + φo) + i sin(ωt + φo) ].                      (5.24)

How fast then is the rate dz/dt, and in what Argand direction?

    dz/dt = (ρ)[ (d/dt) cos(ωt + φo) + i (d/dt) sin(ωt + φo) ].       (5.25)

Evidently,

  • the speed is |dz/dt| = (ρ)(dφ/dt) = ρω;
  • the direction is at right angles to the arm of ρ, which is to say that arg(dz/dt) = φ + 2π/4.

With these observations we can write that

    dz/dt = (ρω)[ cos(ωt + φo + 2π/4) + i sin(ωt + φo + 2π/4) ]
          = (ρω)[ −sin(ωt + φo) + i cos(ωt + φo) ].                   (5.26)

Matching the real and imaginary parts of (5.25) against those of (5.26), we have that

    (d/dt) cos(ωt + φo) = −ω sin(ωt + φo),
    (d/dt) sin(ωt + φo) = +ω cos(ωt + φo).                            (5.27)

If ω = 1 and φo = 0, these are

    (d/dt) cos t = −sin t,
    (d/dt) sin t = +cos t.                                            (5.28)

5.8.2 Derivatives of the trigonometrics

Equations (5.4) and (5.28) give the derivatives of exp(·), sin(·) and cos(·). From these, with the help of (5.22) and the derivative chain and product rules (§ 4.5), we can calculate the several derivatives of Table 5.2.[8]

5.8.3 Derivatives of the inverse trigonometrics

Observe the pair

    d/dz exp z = exp z,
    d/dw ln w = 1/w.

The natural exponential exp z belongs to the trigonometric family of functions, as does its derivative. The natural logarithm ln w, by contrast, belongs to the inverse trigonometric family of functions; but its derivative is simpler, not a trigonometric or inverse trigonometric function at all. In Table 5.2, one notices that all the trigonometrics have trigonometric derivatives. By analogy with the natural logarithm, do all the inverse trigonometrics have simpler derivatives? It turns out that they do.

Refer to the account of the natural logarithm's derivative in § 5.2. Following a similar procedure, we have by successive steps that
[8] [42, back endpaper]

Table 5.2: Derivatives of the trigonometrics.

    d/dz exp z = +exp z
    d/dz sin z = +cos z
    d/dz cos z = −sin z

    d/dz (1/exp z) = −1/exp z
    d/dz (1/sin z) = −(1/tan z)(1/sin z)
    d/dz (1/cos z) = +(tan z)(1/cos z)

    d/dz tan z = +(1 + tan² z) = +1/cos² z
    d/dz (1/tan z) = −(1 + 1/tan² z) = −1/sin² z

    d/dz sinh z = +cosh z
    d/dz cosh z = +sinh z

    d/dz (1/sinh z) = −(1/tanh z)(1/sinh z)
    d/dz (1/cosh z) = −(tanh z)(1/cosh z)

    d/dz tanh z = 1 − tanh² z = +1/cosh² z
    d/dz (1/tanh z) = 1 − 1/tanh² z = −1/sinh² z


    arcsin w = z,
           w = sin z,
       dw/dz = cos z,
       dw/dz = ±√(1 − sin² z),
       dw/dz = ±√(1 − w²),
       dz/dw = ±1/√(1 − w²),

    d/dw arcsin w = ±1/√(1 − w²).                                     (5.29)

Similarly,

    arctan w = z,
           w = tan z,
       dw/dz = 1 + tan² z,
       dw/dz = 1 + w²,
       dz/dw = 1/(1 + w²),

    d/dw arctan w = 1/(1 + w²).                                       (5.30)

Derivatives of the other inverse trigonometrics are found in the same way. Table 5.3 summarizes.
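A throwaway numerical spot-check of (5.29) and (5.30) (my own, not from the text; the sample point and step are arbitrary):

    import math

    # Centered differences against the closed forms summarized in Table 5.3.
    w, h = 0.3, 1e-6
    print((math.asin(w + h) - math.asin(w - h)) / (2 * h), 1 / math.sqrt(1 - w * w))
    print((math.atan(w + h) - math.atan(w - h)) / (2 * h), 1 / (1 + w * w))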

5.9 The actuality of complex quantities

Doing all this neat complex math, the applied mathematician can lose sight of some questions he probably ought to keep in mind: Is there really such a thing as a complex quantity in nature? If not, then hadn’t we better avoid these complex quantities, leaving them to the professional mathematical theorists? As developed by Oliver Heaviside in 1887,9 the answer depends on your point of view. If I have 300 g of grapes and 100 g of grapes, then I have 400 g
[9] [35]

Table 5.3: Derivatives of the inverse trigonometrics.

    d/dw ln w       = 1/w
    d/dw arcsin w   = ±1/√(1 − w²)
    d/dw arccos w   = ∓1/√(1 − w²)
    d/dw arctan w   = 1/(1 + w²)
    d/dw arcsinh w  = ±1/√(w² + 1)
    d/dw arccosh w  = ±1/√(w² − 1)
    d/dw arctanh w  = 1/(1 − w²)

altogether. Alternately, if I have 500 g of grapes and −100 g of grapes, again I have 400 g altogether. (What does it mean to have −100 g of grapes? Maybe that I ate some!) But what if I have 200 + i100 g of grapes and 200 − i100 g of grapes? Answer: again, 400 g.

Probably you would not choose to think of 200 + i100 g of grapes and 200 − i100 g of grapes, but because of (5.17) and (5.18), one often describes wave phenomena as linear superpositions (sums) of countervailing complex exponentials. Consider for instance the propagating wave

    A cos[ωt − kz] = (A/2) exp[+i(ωt − kz)] + (A/2) exp[−i(ωt − kz)].

The benefit of splitting the real cosine into two complex parts is that while the magnitude of the cosine changes with time t, the magnitude of either exponential alone remains steady (see the circle in Fig. 5.3). It turns out to be much easier to analyze two complex wave quantities of constant magnitude than to analyze one real wave quantity of varying magnitude. Better yet, since each complex wave quantity is the complex conjugate of the other, the analyses thereof are mutually conjugate, too; so you normally needn't actually analyze the second. The one analysis suffices for both.[10] (It's like

reflecting your sister’s handwriting. To read her handwriting backward, you needn’t ask her to try writing reverse with the wrong hand; you can just hold her regular script up to a mirror. Of course, this ignores the question of why one would want to reflect someone’s handwriting in the first place; but anyway, reflecting—which is to say, conjugating—complex quantities often is useful.) Some authors have gently denigrated the use of imaginary parts in physical applications as a mere mathematical trick, as though the parts were not actually there. Well, that is one way to treat the matter, but it is not the way this book recommends. Nothing in the mathematics requires you to regard the imaginary parts as physically nonexistent. You need not abuse Occam’s razor! (Occam’s razor, “Do not multiply objects without necessity,”11 is a fine philosophical tool as far as it goes, but is overused in some circles. More often than one likes to believe, the necessity remains hidden until one has ventured to multiply the objects, nor reveals itself to the one who wields the razor, whose hand humility should stay.) It is true by Euler’s formula (5.11) that a complex exponential exp iφ can be decomposed into a sum of trigonometrics. However, it is equally true by the complex trigonometric formulas (5.17) and (5.18) that a trigonometric can be decomposed into a sum of complex exponentials. So, if each can be decomposed into the other, then which of the two is the real decomposition? Answer: it depends on your point of view. Experience seems to recommend viewing the complex exponential as the basic element—as the element of which the trigonometrics are composed—rather than the other way around. From this point of view, it is (5.17) and (5.18) which are the real decomposition. Euler’s formula itself is secondary. The complex exponential method of offsetting imaginary parts offers an elegant yet practical mathematical way to model physical wave phenomena. So go ahead: regard the imaginary parts as actual. It doesn’t hurt anything, and it helps with the math.

[10] If the point is not immediately clear, an example: Suppose that by the Newton-Raphson iteration (§ 4.8) you have found a root of the polynomial x³ + 2x² + 3x + 4 at x ≈ −0x0.2D + i0x1.8C. Where is there another root? Answer: at the complex conjugate, x ≈ −0x0.2D − i0x1.8C. One need not actually run the Newton-Raphson again to find the conjugate root.
[11] [48, Ch. 12]

Chapter 6

Primes, roots and averages
This chapter gathers a few significant topics, each of whose treatment seems too brief for a chapter of its own.

6.1 Prime numbers

A prime number —or simply, a prime—is an integer greater than one, divisible only by one and itself. A composite number is an integer greater than one and not prime. A composite number can be composed as a product of two or more prime numbers. All positive integers greater than one are either composite or prime. The mathematical study of prime numbers and their incidents constitutes number theory, and it is a deep area of mathematics. The deeper results of number theory seldom arise in applications,1 however, so we shall confine our study of number theory in this book to one or two of its simplest, most broadly interesting results.

6.1.1 The infinite supply of primes

The first primes are evidently 2, 3, 5, 7, 0xB, . . . . Is there a last prime? To show that there is not, suppose that there were. More precisely, suppose that there existed exactly N primes, with N finite, letting p1 , p2 , . . . , pN represent these primes from least to greatest. Now consider the product of
all the primes,

    C = ∏_{j=1}^{N} p_j.

[1] The deeper results of number theory do arise in cryptography, or so the author has been led to understand. Although cryptography is literally an application of mathematics, its spirit is that of pure mathematics rather than of applied. If you seek cryptographic derivations, this book is probably not the one you want.

What of C + 1? Since p1 = 2 divides C, it cannot divide C + 1. Similarly, since p2 = 3 divides C, it also cannot divide C + 1. The same goes for p3 = 5, p4 = 7, p5 = 0xB, etc. Apparently none of the primes in the pj series divides C + 1, which implies either that C + 1 itself is prime, or that C + 1 is composed of primes not in the series. But the latter is assumed impossible on the ground that the pj series includes all primes; and the former is assumed impossible on the ground that C + 1 > C > pN , with pN the greatest prime. The contradiction proves false the assumption which gave rise to it. The false assumption: that there were a last prime. Thus there is no last prime. No matter how great a prime number one finds, a greater can always be found. The supply of primes is infinite.2 Attributed to the ancient geometer Euclid, the foregoing proof is a classic example of mathematical reductio ad absurdum, or as usually styled in English, proof by contradiction.3
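The construction in the proof invites a quick numerical experiment (my own sketch, not from the text; the short prime list is an arbitrary stand-in for p1, . . . , pN):

    # Form C as the product of the listed primes; C + 1 then leaves remainder 1
    # on division by every one of them, just as the proof argues.
    primes = [2, 3, 5, 7, 0xB]
    C = 1
    for p in primes:
        C *= p
    print(C + 1)                            # 2311
    print([(C + 1) % p for p in primes])    # [1, 1, 1, 1, 1]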

6.1.2 Compositional uniqueness

Occasionally in mathematics, plausible assumptions can hide subtle logical flaws. One such plausible assumption is the assumption that every positive integer has a unique prime factorization. It is readily seen that the first several positive integers—1 = (), 2 = (21 ), 3 = (31 ), 4 = (22 ), 5 = (51 ), 6 = (21 )(31 ), 7 = (71 ), 8 = (23 ), . . . —each have unique prime factorizations, but is this necessarily true of all positive integers? To show that it is true, suppose that it were not.4 More precisely, suppose that there did exist positive integers factorable each in two or more distinct ways, with the symbol C representing the least such integer. Noting that C must be composite (prime numbers by definition are each factorable
[2] [45]
[3] [39, Appendix 1][52, "Reductio ad absurdum," 02:36, 28 April 2006]
[4] Unfortunately the author knows no more elegant proof than this, yet cannot even cite this one properly. The author encountered the proof in some book over a decade ago. The identity of that book is now long forgotten.

Nq > 1. that the same prime cannot appear in both factorizations—because if the same prime r did appear in both then C/r either would be prime (in which case both factorizations would be [r][C/r]. let Np 113 Cp ≡ Cq ≡ pj .1. Cp = Cq = C. qk ≤ qk+1 . We see that pj = q k for any j and k—that is. j=2 Nq qk . Among other effects. k=1 Cp = Cq = C. where Cp and Cq represent two distinct prime factorizations of the same number C and where the pj and qk are the respective primes ordered from least to greatest. p1 ≤ q 1 . k=2 .6. pj ≤ pj+1 . Let us now rewrite the two factorizations in the form Cp = p1 Ap . Np Ap ≡ Aq ≡ pj . PRIME NUMBERS only one way. like 5 = [51 ]). Np > 1. Cq = q1 Aq . defying our assumption that the two differed) or would constitute an ambiguously factorable composite integer less than C when we had already defined C to represent the least such. the fact that pj = qk strengthens the definition p1 ≤ q1 to read p1 < q 1 . j=1 Nq qk .

Indeed this is so. The last inequality lets us compose the new positive integer B = C − p1 q 1 . we have that √ 1 < p1 < q1 ≤ C ≤ Aq < Ap < C. Let E represent the positive integer which results from dividing C by p1 q1 : C . with C the least positive integer factorable two ways. q1 That Eq1 = Ap says that q1 divides Ap . the plausible assumption of the present subsection has turned out absolutely correct. so Ap ’s prime factorization is unique—and we see above that Ap ’s factorization does not include any qk . which might be prime or composite (or unity). The contradiction proves false the assumption which gave rise to it. Interestingly however. Observing that some integer s which divides C necessarily also divides C ± ns. Such effects .114 CHAPTER 6. But Ap < C. p1 C = Aq . Thus no positive integer is ambiguously factorable. We have observed at the start of this subsection that plausible assumptions can hide subtle logical flaws. E≡ p1 q 1 Then. then it divides B + p1 q1 = C. Since C is composite and since p1 < q1 . The false assumption: that there existed a least composite number C prime-factorable in two distinct ways. which further means that the product p1 q1 divides B. not even q1 . PRIMES. ROOTS AND AVERAGES where p1 and q1 are the least primes in their respective factorizations. also. Eq1 = Ep1 = C = Ap . This means that B’s unique factorization includes both p1 and q1 . we note that each of p1 and q1 necessarily divides B. we have just had to do some extra work to prove it. But if p1 q1 divides B. but which either way enjoys a unique prime factorization because B < C. which implies that p1 q1 < C. Prime factorizations are always unique.

1. PRIME NUMBERS 115 are typical on the shadowed frontier where applied shades into pure mathematics: with sufficient experience and with a firm grasp of the model at hand. Judging when to delve into the mathematics anyway. q > 1. but nevertheless. Hence there √ exists no rational. whereas 2/3 is.3 Rational and irrational numbers p x = . when one lets logical minutiae distract him to too great a degree. q2 which form is evidently also fully reduced. q A rational number is a finite real number expressible as a ratio of integers The ratio is fully reduced if p and q have no prime factors in common. he admittedly begins to drift out of the applied mathematical realm that is the subject of this book. The proof is readily extended to show that any x = nj/k is irrational if nonintegral. then the fully reduced n = p2 /q 2 is not an integer as we had assumed that it was. as was to be demonstrated. q = 0. The author does judge the present subsection’s proof to be worth the applied effort. and more importantly on whether the unsureness felt is true uncertainty or is just an unaccountable desire for more precise mathematical definition (if the latter. 4/6 is not fully reduced. For √ √ example. It depends on how sure one feels. 6. To prove5 the last point. is a matter of applied mathematical style. p > 0. the 5 A proof somewhat like the one presented here is found in [39. we have that p2 = n. then it probably is. 2 is irrational.6. then unlike the author you may have the right temperament to become a professional mathematician). The contradiction proves false the assumption which gave rise to it. . Squaring the equation. But if q > 1. seeking a more rigorous demonstration of a proposition one feels pretty sure is correct. suppose that there did exist a fully reduced x= p √ = n. if you think that it’s true. For instance. nonintegral n. Appendix 1]. q and n are all integers. In fact any x = n is irrational unless integral. √ there is no such thing as a n which is not an integer but is rational. An irrational number is a finite real number which is not rational. q where p.1.

N > 0. then the quotient Q(z) = B(z)/(z − α) has exactly order N − 1. . z N −1 term of the quotient is never null. The reason is that if the leading term were null. which is to say that z − α exactly divides every polynomial B(z) of which z = α is a root. ROOTS AND AVERAGES extension by writing pk /q k = nj then following similar steps as those this paragraph outlines.2 The existence and number of polynomial roots This section shows that an N th-order polynomial must have exactly N roots. so ρ = 0. this reduces to B(α) = ρ. Note that if the polynomial B(z) has order N . a remainder. Thus substituting yields B(z) = (z − α)Q0 (z) + ρ. Now onward we go to other topics. first-order divisors leave zeroth-order. where Q0 (z) is the quotient and R0 (z). the leading. if Q(z) had order less than N − 1. But B(α) = 0 by assumption. N B(z) = k=0 bk z k . In this case the divisor A(z) = z − α has first order. where A(z) = z − α. Evidently the division leaves no remainder ρ. In the long-division symbology of Table 2.1 Polynomial roots Consider the quotient B(z)/A(z). constant remainders R0 (z) = ρ. 6. bN = 0.116 CHAPTER 6. 6. That is. so little will take you pretty far. That’s all the number theory the book treats. but in applied math. When z = α. B(z) = A(z)Q0 (z) + R0 (z). PRIMES. and as § 2.6. then B(z) = (z − α)Q(z) could not possibly have order N as we have assumed. B(α) = 0.2.2 has observed.3.

where z = ρeiφ .5). the bN z N term dominates B(z). then according to § 6.2 The fundamental theorem of algebra The fundamental theorem of algebra holds that any polynomial B(z) of order N can be factored N N B(z) = k=0 bk z = bN k j=1 (z − αj ). On the other hand.1 one can divide the polynomial by z − αN to obtain a new polynomial of order N − 1. the locus of all points in a plane at distance ρ from a point O is a circle. factoring the polynomial step by step into the desired form bN N (z − αj ). then one can divide it by z−αN −1 to obtain yet another polynomial of order N − 2. For example. it suffices to show that all polynomials of order N > 0 have at least one root. Prob. Because ei(φ+n2π) = eiφ and no fractional powers of z appear in (6. (6. THE EXISTENCE AND NUMBER OF ROOTS 117 6.2. 74] 7 A locus is the geometric collection of points which satisfy a given criterion. this locus forms a closed loop. when ρ = 0 the entire locus collapses on the single point B(0) = b0 . the locus of all points in three-dimensional space equidistant from two points P and Q is a plane. 2. the locus is nearly but not quite a circle at radius bN ρN from the Argand origin B(z) = 0. [24. 10. etc. so the locus there evidently has the general character of bN ρN eiN φ . joined at the ends and looped in N great loops. one root extracted at each step. At very large ρ. Ch. Eventually ρ disappears and the entire string collapses on the point B(0) = b0 . j=1 It remains however to show that there exists no polynomial B(z) of order N > 0 lacking roots altogether. bN = 0. for if a polynomial of order N has a root αN . As ρ shrinks smoothly.6 To prove the theorem. They also prove it in rather a different way. and so on. the string’s shape changes smoothly. consider the locus7 of all B(ρeiφ ) in the Argand range plane (Fig. but at the end has collapsed on a single point. then at some time between it must have swept through the origin and every other point within the original loops.2. Now consider the locus at very large ρ again. Since the string originally has looped N times at great distance about the Argand origin. Watch the locus as ρ shrinks. As such. but this time let ρ slowly shrink. To the new polynomial the same logic applies: if it has at least one root αN −1 . revolving N times at that great distance before exactly repeating.6. and φ is variable. . 6 Professional mathematicians typically state the theorem in a slightly different form. ρ is held constant. To show that there is no such polynomial.2. The locus is like a great string or rubber band.1) where the αk are the N roots of the polynomial.1).

After all, the Argand origin lies inside the loops at the start but outside at the end; and since B(z) is everywhere differentiable, the string can only sweep as ρ decreases: it can never skip. If so, then the values of ρ and φ precisely where the string has swept through the origin by definition constitute a root B(ρe^{iφ}) = 0. Thus as we were required to show, B(z) does have at least one root, which observation completes the applied demonstration of the fundamental theorem of algebra.

The fact that the roots exist is one thing. Actually finding the roots numerically is another matter. For a quadratic (second order) polynomial, (2.2) gives the roots. For cubic (third order) and quartic (fourth order) polynomials, formulas for the roots are known (see Ch. 10) though seemingly not so for quintic (fifth order) and higher-order polynomials;[8] but the Newton-Raphson iteration (§ 4.8) can be used to locate a root numerically in any case. The Newton-Raphson is used to extract one root (any root) at each step as described above, reducing the polynomial step by step until all the roots are found.

The reverse problem, finding the polynomial given the roots, is much easier: one just multiplies out ∏_j (z − αj), as in (6.1).

6.3 Addition and averages

This section discusses the two basic ways to add numbers and the three basic ways to calculate averages of them.

6.3.1 Serial and parallel addition

Consider the following problem. There are three masons. The strongest and most experienced of the three, Adam, lays 120 bricks per hour.[9] Next is Brian who lays 90. Charles is new; he lays only 60. Given eight hours, how many bricks can the three men lay? Answer:

    (8 hours)(120 + 90 + 60 bricks per hour) = 2160 bricks.

Now suppose that we are told that Adam can lay a brick every 30 seconds; Brian, every 40 seconds; Charles, every 60 seconds. How much time do the

[8] In a celebrated theorem of pure mathematics [50, "Abel's impossibility theorem"], it is said to be shown that no such formula even exists, given that the formula be constructed according to certain rules. Undoubtedly the theorem is interesting to the professional mathematician, but to the applied mathematician it probably suffices to observe merely that no such formula is known.
[9] The figures in the example are in decimal notation.

three men need to lay 2160 bricks? Answer:

    (2160 bricks) / [(1/30 + 1/40 + 1/60) bricks per second] = 28,800 seconds = 8 hours.

The two problems are precisely equivalent. Neither is stated in simpler terms than the other. The notation used to solve the second is less elegant, but fortunately there exists a better notation:

    (2160 bricks)(30 ∥ 40 ∥ 60 seconds per brick) = 8 hours,

where

    1/(30 ∥ 40 ∥ 60) = 1/30 + 1/40 + 1/60.

The operator ∥ is called the parallel addition operator. It works according to the law

    1/(a ∥ b) = 1/a + 1/b,                                            (6.2)

where the familiar operator + is verbally distinguished from the ∥ when necessary by calling the + the serial addition or series addition operator. With (6.2) and a bit of arithmetic, the several parallel-addition identities of Table 6.1 are soon derived.

The writer knows of no conventional notation for parallel sums of series, but suggests that the notation which appears in the table,

    ∥∑_{k=a}^{b} f(k) ≡ f(a) ∥ f(a+1) ∥ f(a+2) ∥ · · · ∥ f(b),

might serve if needed.

Assuming that none of the values involved is negative, one can readily show that[10]

    a ∥ x ≤ b ∥ x  iff  a ≤ b.

This is intuitive. Counterintuitive, perhaps, is that a ∥ x ≤ a.

Because we have all learned as children to count in the sensible manner 1, 2, 3, 4, 5, . . .—rather than as 1, 1/2, 1/3, 1/4, 1/5, . . .—serial addition (+) seems

[10] The word iff means, "if and only if."
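Parallel addition is easy to mechanize; the following sketch is my own (not from the text; the function name is an arbitrary choice), reusing the masons' figures:

    # Parallel addition per (6.2), extended to several terms.
    def par(*terms):
        return 1.0 / sum(1.0 / t for t in terms)

    rate = par(30, 40, 60)        # 13.33... seconds per brick, three masons together
    print(2160 * rate / 3600)     # 8.0 hours to lay 2160 bricks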

Table 6.1: Parallel and serial addition identities.

    1/(a ∥ b) = 1/a + 1/b                   1/(a + b) = (1/a) ∥ (1/b)
    a ∥ b = ab/(a + b)                      a + b = ab/(a ∥ b)
    a ∥ (1/b) = a/(1 + ab)                  a + 1/b = (1 + ab)/b
    a ∥ b = b ∥ a                           a + b = b + a
    a ∥ (b ∥ c) = (a ∥ b) ∥ c               a + (b + c) = (a + b) + c
    a ∥ ∞ = ∞ ∥ a = a                       a + 0 = 0 + a = a
    a ∥ (−a) = ∞                            a + (−a) = 0
    (a)(b ∥ c) = ab ∥ ac                    (a)(b + c) = ab + ac
    1/(∥∑_k a_k) = ∑_k (1/a_k)              1/(∑_k a_k) = ∥∑_k (1/a_k)

more natural than parallel addition (∥) does. The psychological barrier is hard to breach, yet for many purposes parallel addition is in fact no less fundamental. Its rules are inherently neither more nor less complicated, as Table 6.1 illustrates; yet outside the electrical engineering literature the parallel addition notation is seldom seen. Now that you have seen it, you can use it. There is profit in learning to think both ways. (Exercise: counting from zero serially goes 0, 1, 2, 3, 4, 5, . . .; how does the parallel analog go?)[12]

Convention brings no special symbol for parallel subtraction, incidentally. One merely writes a ∥ (−b), which means exactly what it appears to mean.

In electric circuits, loads are connected in parallel as often as, in fact probably more often than, they are connected in series. Parallel addition gives the electrical engineer a neat way of adding the impedances of parallel-connected loads.

6.3.2 Averages

Let us return to the problem of the preceding section. Among the three masons, what is their average productivity? The answer depends on how you look at it. On the one hand,

    (120 + 90 + 60 bricks per hour)/3 = 90 bricks per hour.

On the other hand,

    (30 + 40 + 60 seconds per brick)/3 = 43⅓ seconds per brick.

These two figures are not the same; that is, 1/(43⅓ seconds per brick) ≠ 90 bricks per hour. Yet both figures are valid. Which figure you choose depends on what you want to calculate. A common mathematical error among businesspeople is not to realize that both averages are possible and that they yield different numbers (if the businessperson quotes in bricks per hour, the productivities average one way; if in seconds per brick, the other way; yet some businesspeople will never clearly consider the difference). Realizing this, the clever businessperson might negotiate a contract so that the average used worked to his own advantage.[13]

[12] [41, eqn. 1.27]
[13] "And what does the author know about business?" comes the rejoinder. The rejoinder is fair enough. If the author wanted to demonstrate his business acumen (or lack thereof) he'd do so elsewhere not here! There are a lot of good business books out there and this is not one of them. Business demands other talents. Some businesspeople are mathematically rather sharp—as you presumably are if you are in business and are reading these words—but as for most: when real mathematical ability is needed, that's what they hire engineers, architects and the like for. The fact remains nevertheless that businesspeople sometimes use mathematics in peculiar ways, making relatively easy problems harder and more mysterious than the problems need to be. If you have ever encountered the little monstrosity of an approximation banks (at least in the author's country) actually use in place of (9.12) to accrue interest and amortize loans, then you have met the difficulty. Trying to convince businesspeople that their math is wrong, incidentally, is in the author's experience usually a waste of time. The author is not sure, but somehow he doubts that many boards of directors would be willing to bet the company on a financial formula containing some mysterious-looking e^x.

When it is unclear which of the two averages is more appropriate, a third average is available, the geometric mean

    [(120)(90)(60)]^{1/3} bricks per hour.

The inverse geometric mean

    [(30)(40)(60)]^{1/3} seconds per brick

implies the same average productivity. The geometric mean does not have the problem either of the two averages discussed above has. The mathematically savvy sometimes prefer the geometric mean over either of the others for this reason.

Generally, the arithmetic, geometric and harmonic means are defined

    μ   ≡ (∑_k w_k x_k) / (∑_k w_k),                                  (6.5)
    μ_Π ≡ [ ∏_j x_j^{w_j} ]^{1/∑_k w_k},                              (6.6)
    μ_∥ ≡ (∑_k w_k) / (∑_k w_k/x_k),                                  (6.7)

where the x_k are the several samples and the w_k are weights. For two samples weighted equally, these are

    μ   = (a + b)/2,                                                  (6.8)
    μ_Π = √(ab),                                                      (6.9)
    μ_∥ = 2(a ∥ b).                                                   (6.10)
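A small sketch of the three means (my own, not from the text; the helper names are arbitrary, and math.prod requires Python 3.8 or later), applied to the masons' hourly rates with equal weights:

    import math

    def arithmetic(xs): return sum(xs) / len(xs)
    def geometric(xs):  return math.prod(xs) ** (1 / len(xs))
    def harmonic(xs):   return len(xs) / sum(1 / x for x in xs)

    rates = [120, 90, 60]   # bricks per hour
    print(arithmetic(rates), geometric(rates), harmonic(rates))
    # 90.0, about 86.5, about 83.1 -- ordered as the inequality (6.11) requires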

√ 2 ab ≤ a + b. 4ab ≤ a2 + 2ab + b2 . but the motivation behind them remains inscrutable until the reader realizes that the writer originally worked the steps out backward with his pencil. where the word “average” signifies the arithmetic. Starting there and recursing back. 0 ≤ a2 − 2ab + b2 . Only then did he reverse the order and write the steps formally here.14 0 ≤ (a − b)2 .11) holds for each subset individually then surely it holds for the whole set (this is so because the average of the whole set is itself the average of the two subset averages. and so on until each smallest group has two members only—in which case we already know that (6. ADDITION AND AVERAGES If a ≥ 0 and b ≥ 0. then each subsubset can be divided in half again. Does (6. The writer had no idea that he was supposed to start from 0 ≤ (a−b)2 until his pencil working backward showed him. The same reading strategy often clarifies inscrutable math. 2 (6. then by successive steps. consider the case of N = 2m nonnegative samples of equal weight. geometric or harmonic mean as appropriate).” the saying goes. But each subset can further be divided in half. we have that (6. 123 a+b √ .3.11) hold when there are several nonnegative samples of various nonnegative weights? To show that it does.11) The arithmetic mean is greatest and the harmonic mean. Nothing prevents one from dividing such a set of samples in half. µ ≤ µΠ ≤ µ. 2 ab a+b . with the geometric mean falling between. “Begin with the end in mind. least.11) obtains.6.11) The steps are logical enough. 2 a+b . considering each subset separately. In this case the saying is right. from the last step to the first. √ 2 ab ≤ 1 ≤ a+b √ 2ab ≤ ab ≤ a+b √ 2(a b) ≤ ab ≤ That is. When you can follow the logic but cannot understand what could possibly have inspired the writer to conceive the logic in the first place. try reading backward. 14 . for if (6.

. which was to be demonstrated. PRIMES. Now consider that a sample of any weight can be approximated arbitrarily closely by several samples of weight 1/2m .124 CHAPTER 6. (6.11) holds for any nonnegative weights of nonnegative samples. By this reasoning. ROOTS AND AVERAGES obtains for the entire set. provided that m is sufficiently large.

125 . Concision can be a virtue—and by design. 7. f (t)? Chapter 4 has built toward a basic understanding of the first question. The understanding of the second question constitutes the concept of the integral.1 The concept of the integral An integral is a finite accretion or sum of an infinite number of infinitesimal elements. or derivative. This chapter builds toward a basic understanding of the second. is undeniably a hard chapter. yet the book you are reading cannot be that kind of book. This chapter. [20] is warmly recommended. what is the function’s instantaneous rate of change. one of the profoundest ideas in all of mathematics. Experience knows no reliable way to teach the integral adequately to the uninitiated except through dozens or hundreds of pages of suitable examples and exercises.Chapter 7 The integral Chapter 4 has observed that the mathematics of calculus concerns a complementary pair of questions: • Given some function f (t). what is the corresponding accretion. or integral. However. It can be done. This section introduces the concept. The sections of the present chapter concisely treat matters which elsewhere rightly command chapters or whole books of their own. which introduces the integral. for less intrepid readers who quite reasonably prefer a gentler initiation. nothing essential is omitted here—but the bold novice who wishes to learn the integral from these pages alone faces a daunting challenge. f ′ (t)? • Interpreting some function f ′ (t) as an instantaneous rate of change.

1: Areas representing discrete sums.1. THE INTEGRAL Figure 7. of rectangles of width 1/2 and height . Sn = 1 n 1 2 1 4 1 8 k=0 0x20−1 k=0 0x40−1 k=0 0x80−1 k=0 k. . k . thin rectangles of width 1 and height k. 8 (0x10)n−1 k=0 k .126 CHAPTER 7. 7. S1 is composed of several tall. 4 k . S2 . 2 k . n What do these sums represent? One way to think of them is in terms of the shaded areas of Fig.1 An introductory example Consider the sums 0x10−1 S1 = S2 = S4 = S8 = .1. f1 (τ ) 0x10 ∆τ = 1 f2 (τ ) 0x10 1 ∆τ = 2 S1 τ 0x10 S2 τ 0x10 7. . In the figure.

k/2.

As n grows, the shaded region in the figure looks more and more like a triangle of base length b = 0x10 and height h = 0x10. In fact it appears that

\[
\lim_{n\to\infty} S_n = \frac{bh}{2} = 0x80,
\]

or more tersely,

\[
S_\infty = 0x80,
\]

is the area the increasingly fine stairsteps approach. Notice how we have evaluated S∞, the sum of an infinite number of infinitely narrow rectangles, without actually adding anything up. We have taken a shortcut directly to the total.

If the reader does not fully understand this paragraph's illustration, if the relation of the sum to the area seems unclear, the reader is urged to pause and consider the illustration carefully until he does understand it. If it still seems unclear, then the reader should probably suspend reading here and go study a good basic calculus text like [20]. The concept is important.

In the equation

\[
S_n = \frac{1}{n}\sum_{k=0}^{(0x10)n-1} \frac{k}{n},
\]

let us now change the variables

\[
\tau \leftarrow \frac{k}{n}, \qquad \Delta\tau \leftarrow \frac{1}{n},
\]

to obtain the representation

\[
S_n = \Delta\tau \sum_{k=0}^{(0x10)n-1} \tau;
\]

or more properly,

\[
S_n = \sum_{k=0}^{(k|_{\tau=0x10})-1} \tau\,\Delta\tau,
\]

where the notation k|_{τ=0x10} indicates the value of k when τ = 0x10. Then

\[
S_\infty = \lim_{\Delta\tau\to 0^+} \sum_{k=0}^{(k|_{\tau=0x10})-1} \tau\,\Delta\tau,
\]

0 This means.” See [52.2: An area representing an infinite sum of infinitesimals. denoting discrete summation.2. “stepping in infinitesimal intervals of dτ . 2 . THE INTEGRAL Figure 7.) f (τ ) 0x10 S∞ τ 0x10 in which it is conventional as ∆τ vanishes to change the symbol dτ ← ∆τ . k=0 τ is cumbersome. the sum of all τ dτ from τ = 0 to τ = 0x10. Compare against ∆τ in Fig. “Long s.” 14:54. where dτ is the infinitesimal of Ch.128 CHAPTER 7. (Observe that the infinitesimal dτ is now too narrow to show on this scale.1. . 7. so we replace it with the The symbol limdτ →0+ k=0=0x10 0x10 new symbol2 0 to obtain the form 0x10 S∞ = τ dτ. 7 April 2006]. .” English “sum.” Graphically. stands for Latin “summa. P Like the Greek S. the seventeenth century-styled R Roman S. 4: (k|τ =0x10 )−1 S∞ = lim (k| dτ →0+ )−1 τ dτ. 7. it is the shaded area of Fig.
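Since each S_n is a finite sum, nothing prevents one from evaluating a few of them directly by machine and watching the total approach 0x80. The following Python lines are a small illustration only, not part of the development; the particular values of n are arbitrary.

def S(n):
    # S_n = (1/n) * sum_{k=0}^{0x10*n - 1} (k/n): the stairstep sum of Fig. 7.1.
    return sum(k / n for k in range(0x10 * n)) / n

for n in (1, 2, 4, 8, 0x10, 0x100, 0x1000):
    print(n, S(n))

# Exactly, S_n = 0x80 - 8/n, so the printed values climb toward
# the triangle's area b*h/2 = 0x80 = 128 as n grows.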

7.1.2 Generalizing the introductory example

Now consider a generalization of the example of § 7.1.1:

\[
S_n = \frac{1}{n}\sum_{k=an}^{bn-1} f\!\left(\frac{k}{n}\right).
\]

(In the example of § 7.1.1, f[τ] was the simple f[τ] = τ, but in general it could be any function.) With the change of variables

\[
\tau \leftarrow \frac{k}{n}, \qquad \Delta\tau \leftarrow \frac{1}{n},
\]

this is

\[
S_n = \sum_{k=(k|_{\tau=a})}^{(k|_{\tau=b})-1} f(\tau)\,\Delta\tau.
\]

In the limit,

\[
S_\infty = \lim_{\Delta\tau\to 0^+} \sum_{k=(k|_{\tau=a})}^{(k|_{\tau=b})-1} f(\tau)\,d\tau = \int_a^b f(\tau)\,d\tau.
\]

This is the integral of f(τ) in the interval a < τ < b. It represents the area under the curve of f(τ) in that interval.

7.1.3 The balanced definition and the trapezoid rule

Actually, just as we have defined the derivative in the balanced form (4.15), we do well to define the integral in balanced form, too:

\[
\int_a^b f(\tau)\,d\tau \equiv \lim_{d\tau\to 0^+}\left\{ \frac{f(a)\,d\tau}{2} + \sum_{k=(k|_{\tau=a})+1}^{(k|_{\tau=b})-1} f(\tau)\,d\tau + \frac{f(b)\,d\tau}{2} \right\}. \tag{7.1}
\]

Here, the first and last integration samples are each balanced "on the edge," half within the integration domain and half without. Equation (7.1) is known as the trapezoid rule. Figure 7.3 depicts it. The name "trapezoid" comes of the shapes of the shaded integration elements in the figure. Observe however that it makes no difference whether one regards the shaded trapezoids or the dashed rectangles as the actual integration

130 CHAPTER 7.3: Integration by the trapezoid rule (7. refer to the treatment of the Leibnitz notation in § 4.2. general. The only requirement is that dτ remain infinitesimal. a The trapezoid rule (7. f (τ ) a dτ b τ elements. Notice that the shaded and dashed areas total the same. but other schemes are possible. the total integration area is the same either way. incidentally. THE INTEGRAL Figure 7.2 If The antiderivative and the fundamental theorem of calculus x S(x) ≡ 3 g(τ ) dτ. one can give the integration elements quadratically curved tops which more nearly track the actual curve.1).) 7.1) is perhaps the most straightforward.3 The important point to understand is that the integral is conceptually just a sum. That scheme is called Simpson’s rule. too.4. Constant widths are usually easiest to handle but variable widths find use in some cases.] . but a sum nevertheless. nothing more. (For further discussion of the point. [A section on Simpson’s rule might be added to the book at some later date. It is a sum of an infinite number of infinitesimal elements as dτ tends to vanish. robust way to define the integral. Nothing actually requires the integration element width dτ to remain constant from element to element. For example.
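The balanced definition (7.1) translates directly into a numerical recipe. Here is a minimal Python sketch of the trapezoid rule, offered only as an illustration; the integrand, limits and step count below are arbitrary choices, and dτ is of course merely small rather than infinitesimal.

def trapezoid(f, a, b, n=1000):
    """Approximate the integral of f from a to b with n equal steps,
    weighting the first and last samples by 1/2 as in (7.1)."""
    dt = (b - a) / n
    total = (f(a) + f(b)) / 2
    for k in range(1, n):
        total += f(a + k * dt)
    return total * dt

print(trapezoid(lambda t: t * t, 0.0, 2.0))   # ~2.666..., the exact value being 8/3

A quadratically curved-top variant of the same idea would give Simpson's rule mentioned in the footnote, but the straight-topped version shown suffices to make the point that the integral is conceptually just a sum.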

then what is the derivative dS/dx? After some reflection, one sees that the derivative must be

\[
\frac{dS}{dx} = g(x).
\]

This is so because the action of the integral is to compile or accrete the area under a curve. The integral accretes area at a rate proportional to the curve's height f(τ): the higher the curve, the faster the accretion. In this way one sees that the integral and the derivative are inverse operators; the one inverts the other. The integral is the antiderivative.

More precisely,

\[
\int_a^b \frac{df}{d\tau}\,d\tau = f(\tau)\Big|_a^b, \tag{7.2}
\]

where the notation f(τ)|_a^b or [f(τ)]_a^b means f(b) − f(a).

The importance of (7.2), fittingly named the fundamental theorem of calculus,[footnote 4] can hardly be overstated. As the formula which ties together the complementary pair of questions asked at the chapter's start, (7.2) is of utmost importance in the practice of mathematics. The idea behind the formula is indeed simple once grasped, but to grasp the idea firmly in the first place is not entirely trivial.[footnote 5] The idea is simple but big. The reader

Footnote 4: [20, § 11.6][42, § 5-4][52, "Fundamental theorem of calculus," 06:29, 23 May 2006]

Footnote 5: Having read from several calculus books and, like millions of others perhaps including the reader, having sat years ago in various renditions of the introductory calculus lectures in school, the author has never yet met a more convincing demonstration of (7.2) than the formula itself. Somehow the underlying idea is too simple, too profound to explain. It's like trying to explain how to drink water, or how to count or to add. Elaborate explanations and their attendant constructs and formalities are indeed possible to contrive, but the idea itself is so simple that somehow such contrivances seem to obscure the idea more than to reveal it. One ponders the formula (7.2) a while, then the idea dawns on him.
If you want some help pondering, try this: Sketch some arbitrary function f(τ) on a set of axes at the bottom of a piece of paper (some squiggle of a curve, as in the small sketch labeled f(τ), τ, a, b in the text, will do nicely), then on a separate set of axes directly above the first, sketch the corresponding slope function df/dτ. Mark two points a and b on the common horizontal axis; then, on the upper, df/dτ plot, shade the integration area under the curve. Now consider (7.2) in light of your sketch. Does the idea not dawn?
Another way to see the truth of the formula begins by canceling its (1/dτ) dτ to obtain the form ∫_{τ=a}^{b} df = f(τ)|_a^b. If this way works better for you, fine, but make sure that you understand it the other way, too.

Section 9. 7. and introduces the multiple integral. consider that because (d/dτ )(τ 3 /6) = τ 2 /2. discusses linearity and its consequences.2 and 5. sin τ. 6 Gathering elements from (4.21) and from Tables 5.1 lists a handful of the simplest. exp τ. .3. treats the transitivity of the summational and integrodifferential operators. As an example of the formula’s use. . One is unlikely to do much higher mathematics without this formula. THE INTEGRAL Table 7.132 CHAPTER 7.1: Basic derivatives for the antiderivative. linearity and multiple integrals This section presents the operator concept.1 speaks further of the antiderivative.1 Operators An operator is a mathematical agent that combines several values of a function. b a df dτ = f (τ )|b a dτ τa a ln τ. most useful derivatives for antiderivative use. a = 0 ln 1 = 0 exp 0 = 1 sin 0 = 0 cos 0 = 1 τ a−1 = 1 = τ exp τ = cos τ = sin τ = d dτ d dτ d dτ d dτ d dτ is urged to pause now and ponder the formula thoroughly until he feels reasonably confident that indeed he does grasp it and the important idea it represents. Table 7.3.3 Operators. (− cos τ ) . it follows that x 2 τ 2 dτ = 2 x 2 d dτ τ3 6 dτ = τ3 6 x = 2 x3 − 8 . 7.
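The example just worked, ∫_2^x (τ²/2) dτ = (x³ − 8)/6, is easy to confirm by machine. The few Python lines below are a minimal illustration, not part of the text's argument; the choice x = 5 and the crude midpoint sum are arbitrary.

def integral(f, a, b, n=100000):
    # Crude numerical sum of f(tau) dtau from a to b (midpoint sampling).
    dt = (b - a) / n
    return sum(f(a + (k + 0.5) * dt) for k in range(n)) * dt

x = 5.0
numeric = integral(lambda t: t * t / 2, 2.0, x)
formula = (x**3 - 8) / 6
print(numeric, formula)   # both approximately 19.5, as (7.2) and Table 7.1 predict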

f (k) ≡ 5 if k = 1. the + operator can indeed be said to use a dummy variable up. k=0 By such admittedly excessive formalism. unfortunately. Such a variable. is a dummy variable. The operator has used the variable up. Notice that the operator has acted to remove the variable j from the expression 2j − 1. 7. .   undefined otherwise. used up by an operator.3. LINEARITY AND MULTIPLE INTEGRALS 133 Such a definition. S= f (k) = f (0) + f (1) = 3 + 5 = 8. where if k = 0. A better way to introduce the operator is by giving examples.7. it depends on how you look at it. −. The essential action of an operator is to take several values of a function and combine them in some way. do they? Well. . multiplication. is an operator in 5 j=1 (2j − 1) = (1)(3)(5)(7)(9) = 0x3B1. The point is that + is in fact an operator just like the others. Operators include +.2 A formalism But then how are + and − operators? They don’t use any dummy variables up. is extraordinarily unilluminating to those who do not already know what it means. 1  3  Then.3. One can write this as 1 S= k=0 f (k). The j appears on the equation’s left side but not on its right. For example.3. as encountered earlier in § 2. Consider the sum S = 3 + 5. and ∂. OPERATORS. division. .

otherwise. When you see the linear expression 3z + 1. if k = 2. f (u. That’s the sense of it. k=0 = where Such unedifying formalism is essentially useless in applications.3 Linearity A function f (z) is linear iff (if and only if) it has the properties f (z1 + z2 ) = f (z1 ) + f (z2 ). It doesn’t help much. Nonlinear functions include6 f (z) = z 2 . z) − Φ(3. 7. partly of semantics. the −1 is the constant value it targets. .3.134 Another example of the kind: CHAPTER 7. Once you understand why + and − are operators just as and are. THE INTEGRAL D = g(z) − h(z) + p(z) + q(z) = Φ(0. f (z) = 3z + 1 and even f (z) = 1. z) + Φ(4. z). z) ≡ 0    q(z)     undefined if k = 0. v) = 2u − v and f (z) = 0 are examples of √ linear functions. so the expression 3z + 1 is literally “linear” in this sense. you can forget the formalism. z) − Φ(1. f (0) = 0. f (αz) = αf (z). z) 4 = g(z) − h(z) + p(z) − 0 + q(z) (−)k Φ(k. The functions f (z) = 3z. f (t) = cos ωt. think 3z + 1 = 0. The equation y = 3x + 1 plots a line. if k = 1. if k = 3. except as a vehicle for definition.  g(z)    h(z)     p(z) Φ(k. then g(z) = 3z = −1. then how is not f (z) = 3z + 1 a linear function? Answer: it is partly a matter of purposeful definition. but the definition has more purpose to it than merely this. 6 If 3z + 1 is a linear expression. The g(z) = 3z is linear. z) + Φ(2. if k = 4. v) = uv. f (u.

Integrals and derivatives must 7 You don’t see d in the list of linear operators? But d in this context is really just another way of writing ∂. Now consider that an integral is just a sum of many elements. k) in the indicated domain. Apparently S1 = S2 .2. See § 4. so. d is linear.7 . k).6. too. 135 . f (j. Reflection along these lines must soon lead the reader to the conclusion that. and that a derivative is just a difference of two elements. For d df1 df2 [f1 (z) + f2 (z)] = + . S2 = j=p xk .4 Summational and integrodifferential transitivity b Consider the sum S1 = k=a   q j=p xk j!  This is a sum of the several values of the expression xk /j!. evaluated at every possible pair (j.7. − and ∂ are examples of linear operators. Now consider the sum q b k=a .3. . L(0) = 0. LINEARITY AND MULTIPLE INTEGRALS An operator L is linear iff it has the properties L(f1 + f2 ) = Lf1 + Lf2 . OPERATORS. division and the various trigonometric functions. The operators instance. k) = k j j k f (j. +. 7. yes. only added in a different order.2 will have more to say about operators and their notation. j! This is evidently a sum of the same values. Section 15. among others.3. dz dz dz Nonlinear operators include multiplication. in general. L(αf ) = αLf.4.

For example. 2k + j + 1 diverge once reordered.136 CHAPTER 7. 2k + j + 1 One cannot blithely swap operators here. v) du dv = u=a u=a ∞ v=−∞ f (u. but rather because the inner sum after the swap diverges. fk (v) dv = k k fk (v) dv. where L is any of the linear operators Some convergent summations. Paradoxically. k=0 j=0 (−)j . f du = Lv Lu f (u. hence the outer sum after the swap has no concrete summand on which to work. as 1 ∞ j=0 k=0 (−)j . v) = Lu Lv f (u. § 1. which associates two negative terms with each positive.3] . ∂f du. v).) For a more twisted example of the same phenomenon. ∞ v=−∞ b b f (u.10. like ∞ 1 .3) ∂ ∂v In general.5. consider8 1− 1 1 1 + − + ··· = 2 3 4 1− 1 1 − 2 4 + 1 1 1 − − 3 6 8 + ··· . ∂v (7. THE INTEGRAL then have the same transitive property discrete sums have. v) dv du. or ∂. This is not because swapping is wrong. then. See also § 8. 1− 1 1 1 + − + ··· = 2 3 4 = = 8 1 1 1 1 − − + + ··· 2 4 6 8 1 1 1 1 − + − + ··· 2 4 6 8 1 1 1 1 1 − + − + ··· .2. (Why does the inner sum after the swap diverge? Answer: 1 + 1/3 + 1/5 + · · · = [1] + [1/3 + 1/5] + [1/7 + 1/9 + 1/0xB + 1/0xD] + · · · > 1[1/4] + 2[1/8] + 4[1/0x10] + · · · = 1/4 + 1/4 + 1/4 + · · · . 2 2 3 4 [1. but still seems to omit no term.
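The reordering paradox just described is concrete enough to watch happen numerically. The Python sketch below is an added illustration only; the truncation point N is arbitrary. It sums 1 − 1/2 + 1/3 − 1/4 + ··· first in its natural order and then in the reordered pattern which pairs each positive term with two negative ones, still using every term exactly once; the first total approaches ln 2 while the second approaches half of it.

import math

N = 200000  # number of positive terms to use

# Natural order: 1 - 1/2 + 1/3 - 1/4 + ...
natural = sum((-1)**(k + 1) / k for k in range(1, 2 * N + 1))

# Reordered: (1 - 1/2 - 1/4) + (1/3 - 1/6 - 1/8) + ...: one positive term,
# then two negative terms, omitting no term of the original series.
reordered = sum(1/(2*k - 1) - 1/(4*k - 2) - 1/(4*k) for k in range(1, N + 1))

print(natural, math.log(2))        # ~0.693
print(reordered, math.log(2) / 2)  # ~0.347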

Professional mathematics does bring an elegant notation and a set of formalisms which serve ably to spotlight certain limited kinds of blunders. maybe. preceded if necessary by [20] and/or [1. but these are blunders no less by the applied approach. the professional approach is worth study if you have the time. The stalwart Leonhard Euler—arguably the greatest series-smith in mathematical history—wielded his heavy analytical hammer in thunderous strokes before professional mathematics had conceived the notation or the formalisms. Ch. v) = Consider the function u2 . but cannot be.3. § 16] . Well. but rather as a curved surface in a three-dimensional space. V = v1 v u1 u2 This is a double integral.9 So specifying would have prevented the error. or so it would seem. Integrating the function seeks not the area under the curve but rather the volume under the surface: v2 2 u2 u dv du. A better way to have handled the example might have been to write the series as 1 1 1 1 1 lim 1 − + − + · · · + − n→∞ 2 3 4 2n − 1 2n 7. v Such a function would not be plotted as a curved line in a plane. 1]. Recommended introductions include [28]. thus explicitly specifying equal numbers of positive and negative terms. LINEARITY AND MULTIPLE INTEGRALS 137 in the first place. u2 dv. If the great Euler did without. This writer however does not feel sure that rigor is quite the right word for what was lacking here. One can normally swap summational and integrodifferential operators with little worry. for it claims falsely that the sum is half itself. Inasmuch as it can be written in the form V = u1 v2 g(u) du.7. OPERATORS. v g(u) ≡ 9 v1 Some students of professional mathematics would assert that the false conclusion was reached through lack of rigor. On the other hand. seldom poses much of a dilemma in practice.5 Multiple integrals f (u. The conditional convergence 10 of the last paragraph. 10 [28. then you and I might not always be forbidden to follow his robust example. which can occur in integrals as well as in sums.3. The reader however should at least be aware that conditional convergence troubles can arise where a summand or integrand varies in sign or phase.

another triple: a sixfold integration altogether. V = S f (ρ) dρ. Manifold nesting of integrals is thus not just a theoretical mathematical topic. it arises in sophisticated real-world engineering models. Hence v2 u2 u1 V = v1 u2 du dv. doesn’t it? What difference does it make whether we add the towers by rows first then by columns.11 then the total soil mass in some rectangular volume is x2 y2 y1 z2 M= x1 z1 µ(x. upright slices. then v. Triple integrations arise about as often. if µ(r) = µ(x. For instance. where the V stands for “volume” and is understood to imply a triple integration. . y. v And indeed this makes sense.4.3. 11 Conventionally the Greek letter ρ not µ is used for density. Double integrations arise very frequently in applications. z) dz dy dx. and its inverse. where the S stands for “surface” and is understood to imply a double integration. In light of § 7. As a concise notational convenience.138 CHAPTER 7. then the slices over u to constitute the volume. THE INTEGRAL its effect is to cut the area under the surface into flat. y. If we integrated over time as well as space. but it happens that we need the letter ρ for a different purpose later in the paragraph. The topic concerns us here for this reason. Similarly for the double integral. the last is often written M= V µ(r) dr. A spatial Fourier transform ([section not yet written]) implies a triple integration. then the slices crosswise into tall. evidently nothing prevents us from swapping the integrations: u first. Even more than three nested integrations are possible. or by columns first then by rows? The total volume is the same in any case. the integration would be fourfold. The towers are integrated over v to constitute the slice. thin towers. z) represents the variable mass density of some soil.

areas and volumes of interesting common shapes and solids. but inasmuch as the wedge is infinitesimally narrow.5. Integrating the many triangles. We shall calculate it in § 8. the wedge is indistinguishable from a triangle of base length ρ dφ and height ρ.2 The volume of a cone One can calculate the volume of any cone (or pyramid) if one knows its base area B and its altitude h measured normal12 to the base.4) (The numerical value of 2π—the circumference or perimeter of the unit circle—we have not calculated yet.4. one can calculate the perimeters.1 The area of a circle Figure 7.” .11.4.7.4 Areas and volumes By composing and solving appropriate integrals. 7. The area of such a triangle is Atriangle = ρ2 dφ/2.4 depicts an element of a circle’s area. The element has wedge shape. 2 2 (7. we find the circle’s area to be π π Acircle = φ=−π Atriangle = −π 2πρ2 ρ2 dφ = .4.4: The area of a circle. y 139 dφ ρ x 7. AREAS AND VOLUMES Figure 7.) 7. 7. Refer to Fig. 12 Normal here means “at right angles.

5) 7. For the surface area. well characterized part of a square? (With a pair of scissors one can cut any shape from a square piece of paper. then z the cross-sectional area is evidently13 (B)(z/h)2 . Fig. then ponder Fig. If it is not yet evident to you. THE INTEGRAL Figure 7.5: The volume of a cone. h B A cross-section of a cone. has the same shape the base has but a different scale.4. 13 . and how cross-sections cut nearer a cone’s vertex are smaller though the same shape.6.) Thinking along such lines must soon lead one to the insight that the parallel-cut cross-sectional area of a cone can be nothing other than (B)(z/h)2 . after all. What if the base were square? Would the cross-sectional area not be (B)(z/h)2 in that case? What if the base were a right triangle with equal legs—in other words.7. half a square? What if the base were some other strange shape like the base depicted in Fig. 7. regardless of the base’s shape. cut parallel to the cone’s base. the sphere’s surface is sliced vertically down the z axis into narrow constant-φ tapered strips (each strip broadest at the sphere’s equator.5 a moment. as in Fig. 3 (7. A surface element so produced (seen as shaded in the latter figure) evidently has the area dS = (r dθ)(ρ dφ) = r 2 sin θ dθ dφ. If coordinates are chosen such that the altitude h runs in the ˆ direction with z = 0 at the cone’s vertex. 7. 7. Consider what it means to cut parallel to a cone’s base a cross-section of the cone. tapering to points at the sphere’s ±z poles) and horizontally across the z axis into narrow constant-θ rings.140 CHAPTER 7. one wants to calculate both the surface area and the volume. The fact may admittedly not be evident to the reader at first glance. For this reason.3 The surface area and volume of a sphere Of a sphere. 7.5? Could such a strange shape not also be regarded as a definite. the cone’s volume is h Vcone = 0 (B) z h 2 dz = B h2 h 0 z 2 dz = B h2 h3 3 = Bh .
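The cross-sectional argument behind (7.5) invites a quick numerical test. The Python lines below are an illustration only; the base area B and altitude h are arbitrary numbers. The code sums the parallel cross-sectional areas B(z/h)² over thin slices and compares the total against Bh/3.

def cone_volume(B, h, n=100000):
    # Sum the cross-sectional areas B*(z/h)**2 over thin slices dz.
    dz = h / n
    return sum(B * ((k + 0.5) * dz / h)**2 for k in range(n)) * dz

B, h = 3.7, 2.5
print(cone_volume(B, h), B * h / 3)   # both approximately 3.083

Any base shape whatever gives the same result, since only the base area B enters the slice areas; that is the insight the text is after.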

6).6: A sphere.7: An element of the sphere’s surface (see Fig. z ˆ θ φ x ˆ r z y ˆ ρ Figure 7. z ˆ r dθ y ˆ ρ dφ . AREAS AND VOLUMES 141 Figure 7.4. 7.7.

The circular area derivation of § 7.142 CHAPTER 7.1 that sin τ = (d/dτ )(− cos τ ). each cone with base area dS and altitude r. In light of (7. the volume of one such cone is Vcone = r dS/3. however.4. with the vertices of all the cones meeting at the sphere’s center. Per (7.6). where we have used the fact from Table 7. 3 S where the useful symbol S indicates integration over a closed surface. Hence.7) Vsphere = 3 (One can compute the same spherical volume more prosaically. Vsphere = S Vcone = S r dS r = 3 3 r dS = Ssphere .1 lends an analogous insight: that a circle can sometimes be viewed as a great triangle rolled up about its own vertex. THE INTEGRAL The sphere’s total surface area then is the sum of all such elements over the sphere’s entire surface: π π Ssphere = φ=−π π θ=0 π θ=0 dS r 2 sin θ dθ dφ = = r φ=−π π 2 = r2 φ=−π π φ=−π 2 [− cos θ]π dφ 0 [2] dφ (7. without reference to cones. The derivation given above.6) = 4πr .4. the total volume is 4πr 3 . by writing dV = r 2 sin θ dr dθ dφ then integrating V dV . one divides the sphere into many narrow cones. (7. Having computed the sphere’s surface area.1 has found a circle’s area—except that instead of dividing the circle into many narrow triangles. is preferred because it lends the additional insight that a sphere can sometimes be viewed as a great cone rolled up about its own vertex.) . one can find its volume just as § 7.5).

7.5 Checking an integration

Dividing 0x46B/0xD = 0x57 with a pencil, how does one check the result?[footnote 14] Answer: by multiplying (0x57)(0xD) = 0x46B. Multiplication inverts division. Easier than division, multiplication provides a quick, reliable check.

Likewise, integrating

\[
\int_a^b \frac{\tau^2}{2}\,d\tau = \frac{b^3 - a^3}{6}
\]

with a pencil, how does one check the result? Answer: by differentiating

\[
\left[\frac{\partial}{\partial b}\left(\frac{b^3 - a^3}{6}\right)\right]_{b=\tau} = \frac{\tau^2}{2}.
\]

Differentiation inverts integration. Easier than integration, differentiation like multiplication provides a quick, reliable check.

More formally, according to (7.2),

\[
S \equiv \int_a^b \frac{df}{d\tau}\,d\tau = f(b) - f(a). \tag{7.8}
\]

Differentiating (7.8) with respect to b and a,

\[
\left.\frac{\partial S}{\partial b}\right|_{b=\tau} = \frac{df}{d\tau}, \qquad
\left.\frac{\partial S}{\partial a}\right|_{a=\tau} = -\frac{df}{d\tau}. \tag{7.9}
\]

Either line of (7.9) can be used to check an integration. Evaluating (7.8) at b = a yields

\[
S\big|_{b=a} = 0, \tag{7.10}
\]

which can be used to check further.[footnote 15]

As useful as (7.9) and (7.10) are, they nevertheless serve only integrals with variable limits. They are of little use to check definite integrals

Footnote 14: Admittedly, few readers will ever have done much such multidigit hexadecimal arithmetic with a pencil, but, hey, go with it. If you have never tried it, you might; the insight gained is worth the trial. Actually, hexadecimal is just proxy for binary (see Appendix A), and long division in straight binary is kind of fun. It is simpler than decimal or hexadecimal division, and it's how computers divide. In decimal, it's 1131/13 = 87.

Footnote 15: Using (7.10) to check the example, (b³ − a³)/6|_{b=a} = 0.
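Equation (7.9) is itself easy to exercise by machine. The Python fragment below is an added illustration only, reusing the τ²/2 example just given; the numbers a, b and the finite-difference step are arbitrary. It differentiates the claimed result S(a, b) = (b³ − a³)/6 numerically and confirms that the derivatives match ±τ²/2 as (7.9) requires.

def S(a, b):
    # Claimed result of the integration: integral of tau**2/2 from a to b.
    return (b**3 - a**3) / 6

a, b, eps = 1.0, 3.0, 1e-6
dS_db = (S(a, b + eps) - S(a, b - eps)) / (2 * eps)   # numerical d/db
print(dS_db, b**2 / 2)    # both approximately 4.5

dS_da = (S(a + eps, b) - S(a - eps, b)) / (2 * eps)   # numerical d/da
print(dS_da, -a**2 / 2)   # both approximately -0.5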

10) do serve such indefinite integrals. b. 9 will bring much harder ones. Yet many other paths of integration from xρ to yρ are possible. a + 2dτ. 7.9) and (7. a f (τ ) dτ . Equations (7.8. many or most integrals one meets in practice have or can be given variable limits. but they differ in between. . then from there to r = yρ? Or does it mean to integrate along the arc of Fig. Consider the integral ˆ yρ S= r=ˆ ρ x (x2 + y 2 ) dℓ. Analytically.8? The two paths of integration begin and end at the same points. where C stands for “contour” and means in this example the specific contour of Fig. not just these two. So far the book has introduced only easy integrals. which lack variable limits to differentiate. in which the function f (τ ) is evaluated at τ = a. . However. a + dτ. 7. 7. although numerically differentiation is indeed harder than integration. . so 2π/4 2π 3 ρ . ˆ What does this integral mean? Does it mean to integrate from r = xρ to ˆ r = 0.6 Contour integration To this point we have considered only integrations in which the variable of integration advances in a straight line from one point to another: for b instance. THE INTEGRAL like (9.144 CHAPTER 7. The integration variable is a real-valued scalar which can do nothing but make a straight line from a to b.14) below. ρ3 dφ = ρ2 dℓ = S= 4 0 C . x2 + y 2 = ρ2 (by the Pythagorean theorem) and dℓ = ρ dφ. In the example. analytically precisely the opposite is true. Because multiple paths are possible. Reversing an integration by taking an easy derivative is thus an excellent way to check a hard-earned integration result. we must be more specific: S= C (x2 + y 2 ) dℓ. and the integral certainly does not come ˆ ˆ out the same both ways. where dℓ is the infinitesimal length of a step along the path of integration. differentiation is the easier. Even experienced mathematicians are apt to err in analyzing these. It is a rare irony of mathematics that. Such is not the case when the integration variable is a vector. . but Ch.
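To see concretely that the two paths of integration just mentioned give different totals, one can evaluate both numerically. The Python sketch below is an illustration only; ρ = 1 and the step counts are arbitrary. It integrates (x² + y²) dℓ along the quarter arc of Fig. 7.8 and along the alternate path which runs from x̂ρ in to the origin and back out to ŷρ.

import math

rho, n = 1.0, 200000

# Path 1: the quarter arc of radius rho, where x^2 + y^2 = rho^2 and dl = rho*dphi.
dphi = (math.pi / 2) / n
arc = sum(rho**2 * rho * dphi for _ in range(n))            # = rho^3 * pi/2

# Path 2: straight in along the x axis to the origin, then straight out along
# the y axis.  On each leg x^2 + y^2 reduces to the single coordinate squared,
# and dl is just the coordinate step.
dl = rho / n
legs = 2 * sum(((k + 0.5) * dl)**2 * dl for k in range(n))  # = 2*rho^3/3

print(arc, math.pi / 2)   # ~1.5708, agreeing with the 2*pi*rho^3/4 of the text
print(legs, 2 / 3)        # ~0.6667: a genuinely different total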

as we shall see in §§ 8. but closed contours which begin and end at the same point are also possible.5. It means that the contour ends where it began: the loop is closed. Consider a mechanical valve opened at time t = to . DISCONTINUITIES Figure 7. In the latter case some interesting mathematics emerge. but one thing they do not model gracefully is the simple discontinuity. .7 Discontinuities The polynomials and trigonometrics studied to this point in the book offer flexible means to model many physical phenomena of interest.7. 7. xo . t > t o . contour integration applies equally where the variable of integration is a complex scalar. The contour of Fig.7. 7. t < to .8 would be closed. The useful symbol indicates integration over a closed contour.8 and 9. if it continued to r = 0 and then back to r = xρ. indeed common. ˆ for instance. The flow x(t) past the valve is x(t) = 0. y C ρ φ x 145 In the example the contour is open.8: A contour of integration. Besides applying where the variable of integration is a vector.

Figure 7.9: The Heaviside unit step u(t). [The plot shows u(t) against t, rising from 0 to 1 at t = 0.]

Figure 7.10: The Dirac delta δ(t). [The plot shows a single spike at t = 0.]

One can write this more concisely in the form

\[
x(t) = u(t - t_o)\,x_o, \tag{7.11}
\]

where u(t) is the Heaviside unit step,

\[
u(t) \equiv \begin{cases} 0, & t < 0, \\ 1, & t > 0, \end{cases} \tag{7.12}
\]

plotted in Fig. 7.9. The derivative of the Heaviside unit step is the curious Dirac delta

\[
\delta(t) \equiv \frac{d}{dt}\,u(t), \tag{7.13}
\]

plotted in Fig. 7.10. This function is zero everywhere except at t = 0, where it is infinite, with the property that

\[
\int_{-\infty}^{\infty} \delta(t)\,dt = 1,
\]

14) for any function f (t). V 16 It seems inadvisable for the narrative to digress at this point to explore u(z) and δ(z). then you would expect me to adapt my definition. but that was the professionals’ problem not ours. The professionals who had established the theoretical framework before 1930 justifiably felt reluctant to throw the whole framework away because some scientists and engineers like us came along one day with a useful new function which didn’t quite fit. The objection is not so much that δ(t) is not allowed as it is that professional mathematics for years after 1930 lacked a fully coherent theory for it. 25 May 2006] and slapped his disruptive δ(t) down on the table. The internal discussion of words and means. when the six-fingered Count Rugen appeared on the scene. (Equation 7. coupled with the difficulty we applied mathematicians experience in trying to understand the reason the dispute even exists. such that δ(r) dr = 1. To us the Dirac delta δ(t) is just a function. an elusive sense that δ(t) hides subtle mysteries—when what it really hides is an internal discussion of words and means among the professionals. For the book’s present purpose the interesting action of the two functions is with respect to the real argument t. however. inasmuch as it lacks certain properties common to functions as they define them [36. If I had established a definition of “nobleman” which subsumed “human. In the author’s country at least. Even if not. is not for this writer to judge. the fact of the Dirac delta dispute.4][12]. the six-fingered count is “not a nobleman”. has unfortunately surrounded the Dirac delta with a kind of mysterious aura. a sort of debate seems to have run for decades between professional and applied mathematicians over the Dirac delta δ(t). wouldn’t you? By my pre¨xisting definition.)16 The Dirac delta is defined for vectors. although by means of Fourier analysis ([chapter not yet written]) or by conceiving the Dirac delta as an infinitely narrow Gaussian pulse ([chapter not yet written]) it could perhaps do so. too. until one realizes that it is more a dispute over methods and definitions than over facts. of course. “Paul Dirac.7.” whose relevant traits in my definition included five fingers on each hand. Whether the professional mathematician’s definition of the function is flawed. (7. e but such exclusion really tells one more about flaws in the definition than it does about the count. The book has more pressing topics to treat.14 is the sifting property of the Dirac delta.7. DISCONTINUITIES and the interesting consequence that ∞ −∞ 147 δ(t − to )f (t) dt = f (to ) (7. we leave to the professionals. From the applied point of view the objection is admittedly a little hard to understand. Some professional mathematicians seem to have objected that δ(t) is not a function.” 05:48. the unit step and delta of a complex argument. § 2.15) . strictly speaking. What the professionals seem to be saying is that δ(t) does not fit as neatly as they would like into the abstract mathematical framework they had established for functions in general before Paul Dirac came along in 1930 [52. who know whereof they speak. It’s a little like the six-fingered man in Goldman’s The Princess Bride [19].
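One practical way to handle δ(t) by machine is to model it, much as the narrative suggests, by a pulse narrow enough for the purpose at hand. The Python sketch below is an added illustration only; the Gaussian shape, its width and the test function f(t) are arbitrary choices. It approximates δ(t − t_o) by a narrow unit-area Gaussian and checks the sifting property (7.14) against f(t_o).

import math

def delta_approx(t, width):
    # Normalized Gaussian pulse: unit area, approaching delta(t) as width -> 0.
    return math.exp(-t**2 / (2 * width**2)) / (width * math.sqrt(2 * math.pi))

def f(t):
    return math.cos(3 * t) + t**2   # an arbitrary smooth test function

t_o, width = 0.7, 1e-3
a, b, n = t_o - 1.0, t_o + 1.0, 200000
dt = (b - a) / n
sifted = sum(delta_approx(a + (k + 0.5) * dt - t_o, width) * f(a + (k + 0.5) * dt)
             for k in range(n)) * dt
print(sifted, f(t_o))   # both approximately f(0.7), as the sifting property asserts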

∗ Evaluate the integral of the example of § 7. [42. Even if this book is not an instructional textbook. 1. ∗ Evaluate19 (a) 0 [1/(1 + τ 2 )] dτ (answer: arctan x). § 5-6] 19 [42. (b) x k ∞ k=0 0 τ dτ . x3 /3. 4.148 CHAPTER 7. This chapter is short.6 along the alternate conˆ ˆ tour suggested there. Work the exercise by hand in hexadecimal and give the answer in hexadecimal. Work the exercises if you like. The reader who desires a gentler introduction to the integral might consult among others the textbook the chapter’s introduction has recommended. (b) 0 [(4 + √ i3)/ 2 − 3τ 2 ] dτ (hint: the answer involves another inverse trigonometric). deep and hard. (b) ∞ 2 −∞ exp[−τ /2] dτ . but its implications are broad. so you should not expect to be able to complete them all now. § 8-2] [42. 11. One reason introductory calculus texts run so long is that they include many. 8. 3. ∗ Evaluate (a) dτ . (b) 9. −2 5. Evaluate (a) 1 (1/τ 2 ) dτ . The harder ones are marked with ∗ asterisks. (b) 1 x x 0 sin ωτ dτ . (b) a 3τ −2 dτ . Evaluate x 0 ∞ k k=0 τ (d) x 0 dτ . Evaluate (a) x 0 τ x dτ . Here are a few. it seems not meet that it should include no exercises at all here. (c) ∞ k k=0 (τ /k!) x 0 exp ατ 5 dτ . ∗ Evaluate18 (a) x√ 1 + 2τ dτ . (b) 5 (3τ 2 − 2τ 3 ) dτ . x x 0 τ 10. many pages of integration examples and exercises. x (Answer: x2 /2. THE INTEGRAL 7. (c) x (a2 τ 2 + a1 τ ) dτ . √ τ )/ τ ] dτ. x 0 2. Some of them do need material from later chapters. 7.) x n a Cτ dτ . Evaluate (a) −2 (3τ 2 − 2τ 3 ) dτ . back endpaper] . 17 18 ∗∗ Evaluate (a) x 2 −∞ exp[−τ /2] dτ .8 Remarks (and exercises) The concept of the integral is relatively simple once grasped. 6. from xρ to 0 to yρ. Evaluate (a) x 0 cos ωτ dτ . ∗ (c)17 a x [(cos √ sin ωτ dτ . (b) x 2 0 τ dτ . ∗ (e) 1 (1/τ ) dτ . Evaluate ∞ 2 1 (3/τ ) dτ .

and in fact how easy it is to come up with an integral which no one really knows how to solve very well. Chapter 9 introduces some of the basic.8. faster to numerically calculate. the more such integrals you can solve. you should be able to work right now. The mathematical art of solving diverse integrals is well worth cultivating. as promised earlier we turn our attention in Chapter 8 back to the derivative. REMARKS (AND EXERCISES) 149 The last exercise in particular requires some experience to answer. however. Before addressing techniques of integration. integrals which arise in practice often can be solved very well with sufficient cleverness—and the more cleverness you develop. The point of the exercises is to illustrate how hard integrals can be to solve. The ways to solve them are myriad. Some of the easier exercises. On the other hand. etc.) yet not even the masters can solve them all in practical ways. Some solutions to the same integral are better than others (easier to manipulate. of course. Moreover.7. . most broadly useful integralsolving techniques. it requires a developed sense of applied mathematical style to put the answer in a pleasing form (the right form for part b is very different from that for part a). applied in the form of the Taylor series.


6 would cease to be necessary. 8. 8. most of §§ 8. plus the introduction of § 8. to read only the following sections might not be an unreasonable way to shorten the chapter: §§ 8. For the impatient. This chapter introduces the Taylor series and some of its incidents. the chapter errs maybe toward too little rigor. The chapter’s early sections prepare the ground for the treatment of the Taylor series proper in § 8.8. The chapter errs maybe toward too much rigor. Fitting a function in such a way brings two advantages: • it lets us take derivatives and integrals in the same straightforward way (4. 1 151 . and • it implies a simple procedure to calculate the function numerically. Some pretty constructs of pure mathematics serve the Taylor series and Cauchy’s integral formula. It also derives Cauchy’s integral formula.1. However.1 Because even at the applied level the proper derivation of the Taylor series involves mathematical induction.9 and 8. such constructs drive the applied mathematician on too long a detour.2. no balance of rigor the chapter might strike seems wholly satisfactory.1.3. The chapter as written represents the most nearly satisfactory compromise the writer has been able to attain.3. with a little less.4 and 8.11. From another point of view. 8. 8.20) we take them with any power series. for.5. 8.Chapter 8 The Taylor series The Taylor series is a power series which fits a function in a limited domain neighborhood. analytic continuation and the matter of convergence domains.

8.1 The power-series expansion of 1/(1 − z)^{n+1}

Before approaching the Taylor series proper in § 8.3, we shall find it both interesting and useful to demonstrate that

\[
\frac{1}{(1-z)^{n+1}} = \sum_{k=0}^{\infty} \binom{n+k}{n} z^k \tag{8.1}
\]

for n ≥ 0. The demonstration comes in three stages. Of the three, it is the second stage (§ 8.1.2) which actually proves (8.1). The first stage (§ 8.1.1) comes up with the formula for the second stage to prove. The third stage (§ 8.1.3) establishes the sum's convergence.

8.1.1 The formula

In § 2.6.4 we found that

\[
\frac{1}{1-z} = \sum_{k=0}^{\infty} z^k = 1 + z + z^2 + z^3 + \cdots \quad \text{for } |z| < 1.
\]

What about 1/(1 − z)², 1/(1 − z)³, 1/(1 − z)⁴, and so on? By the long-division procedure of Table 2.4, one can calculate the first few terms of 1/(1 − z)² to be

\[
\frac{1}{(1-z)^2} = \frac{1}{1 - 2z + z^2} = 1 + 2z + 3z^2 + 4z^3 + \cdots,
\]

whose coefficients 1, 2, 3, 4, . . . happen to be the numbers down the first diagonal of Pascal's triangle (Fig. 4.2 on page 73; see also Fig. 4.1). Dividing 1/(1 − z)³ seems to produce the coefficients 1, 3, 6, 0xA, . . . down the second diagonal; dividing 1/(1 − z)⁴, the coefficients down the third. A curious pattern seems to emerge, worth investigating more closely. The pattern recommends the conjecture (8.1).

To motivate the conjecture a bit more formally (though without actually proving it yet), suppose that 1/(1 − z)^{n+1}, n ≥ 0, is expandable in the power series

\[
\frac{1}{(1-z)^{n+1}} = \sum_{k=0}^{\infty} a_{nk} z^k, \tag{8.2}
\]

where the a_{nk} are coefficients to be determined. Multiplying by 1 − z, we have that

\[
\frac{1}{(1-z)^{n}} = \sum_{k=0}^{\infty} \left(a_{nk} - a_{n(k-1)}\right) z^k.
\]

3).4) better match (8. this is n + (k − 1) n + (n − 1) + k n−1 = n+k n . applied to (8. Transforming according to the rule (4. yield (8. (8.8.1.3). all we can say is that (8. (8.2).3) reminds one of (4.1) seems right.3) perfectly. .4) gives n+k−1 k−1 + n+k−1 k = n+k k .3) is not a(m−1)(j−1) + a(m−1)j = amj . this much is easy to test. or in other words that an(k−1) + a(n−1)k = ank . (8. But to seem is not to be. (8. 153 (8.5) which fits (8.3) Thinking of Pascal’s triangle.5). THE POWER-SERIES EXPANSION OF 1/(1 − Z)N +1 This is to say that a(n−1)k = ank − an(k−1) . We shall establish that it is right in the next subsection. Various changes of variable are possible to make (8. Thus changing in (8.6) which coefficients.1) is thus suggestive.1). It works at least for the important case of n = 0. At this point. it seems to imply a relationship between the 1/(1 − z)n+1 series and the 1/(1 − z)n series for any n. We might try at first a few false ones. recommends itself. Hence we conjecture that ank = n+k n . In light of (8.3). transcribed here in the symbols m−1 m−1 m + = .4) j−1 j j except that (8. but eventually the change n + k ← m. Equation (8. k ← j.

8) Now suppose that (8.7) n+k n − n + (k − 1) n zk .5).34) we know to be true. Applying (8. i+k i zk . (n − 1) + k n−1 zk . this one needs at least one start case to be valid (many inductions actually need a consecutive pair of start cases).1) is true for n = i − 1. if (8. which per (2. if it is true for any one n. Consider the sum Sn ≡ Multiplying by 1 − z yields (1 − z)Sn = Per (8. this is (1 − z)Sn = ∞ k=0 ∞ k=0 n+k n zk .9) implies (8.154 CHAPTER 8. Thus by induction. then it is also true for all greater n.10) Evidently (8.8). this means that 1 = (1 − z)Si .10). (8. (8.2 The proof by induction ∞ k=0 Equation (8. The “if” in the last sentence is important. In other words. (8.7). (8. Like all inductions. .1. (1 − z)i 1 = Si .1) is proved by induction as follows. then it is also true for n = i.1) is true for n = i − 1 (where i denotes an integer rather than the imaginary unit): 1 = (1 − z)i ∞ k=0 (i − 1) + k i−1 zk . THE TAYLOR SERIES 8. The n = 0 supplies the start case 1 = (1 − z)0+1 ∞ k=0 k 0 z = k ∞ k=0 zk .9) In light of (8. (1 − z)i+1 1 = (1 − z)i+1 ∞ k=0 Dividing by 1 − z.
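Though the induction settles the matter, a machine check of (8.1) costs little and may reassure the reader. The Python lines below are an added illustration only; the choices of n, z and the number of terms retained are arbitrary, with |z| < 1 as § 8.1.3 will require. They compare a partial sum of Σ_k C(n+k, n) z^k against 1/(1 − z)^{n+1} for a complex z.

from math import comb

def series(n, z, terms=200):
    # Partial sum of (8.1): sum over k of C(n+k, n) * z**k.
    return sum(comb(n + k, n) * z**k for k in range(terms))

n, z = 3, 0.4 + 0.3j
print(series(n, z))
print(1 / (1 - z)**(n + 1))   # the two agree to many digits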

A more rigorous way of saying the same thing is as follows: the series ∞ X τk S= k=0 .7. THE POWER-SERIES EXPANSION OF 1/(1 − Z)N +1 155 8.” distinguishing it through a test devised by Weierstrass from the weaker “pointwise convergence” [1. this means that n+k n or more tersely. but the effect of trying to teach the professional view in a book like this would not be pleasing. The applied mathematician can profit substantially by learning the professional view in the matter. for all possible positive constants ǫ. but if explanation here helps: a series converges if and only if it approaches a specific. n ← j.1. m j = m m−j m−1 j for any integers m > 0 and j.8. With the substitution n + k ← m.1).5]. § 1. ˛ ˛ ˛ k=K+1 2 The meaning of the verb to converge may seem clear enough from the context and from earlier references.9). ank an(k−1) = n+k n =1+ . are the coefficients of the power series (8.1) converges. finite value after many terms.11) for all n ≥ K (of course it is also required that the τk be finite. converges iff (if and only if). The professional mathematical literature calls such convergence “uniform convergence. Here. there exists a finite K ≥ −1 such that ˛ ˛ n ˛ X ˛ ˛ ˛ τk ˛ < ǫ. It is interesting nevertheless to consider an example of an integral for which convergence is not so simple. Rearranging factors. but you knew that already). we avoid error by keeping a clear view of the physical phenomena the mathematics is meant to model.3 Convergence The question remains as to the domain over which the sum (8.1.2 To answer the question. consider that per (4. such as Frullani’s integral of § 9. k n+k n = n+k k n + (k − 1) n . ank = where ank ≡ n+k an(k−1) . k k (8.

20).11) by z k /z k−1 gives the ratio ank z k n z. The theoretical difficulty the student has with mathematical induction arises from the reluctance to ask seriously. as in § 8. 3 .10. 3. the reader may nevertheless wonder why the simpler |(1 + n/k)z| < 1 is not given as a criterion. (8. The P surprising answer is that not all series τk with |τk /τk−1 | < 1 converge! For example. P P the extremely simple 1/k does not converge. = 1+ k−1 an(k−1) z k which is to say that the kth term of (8.4 General remarks on mathematical induction We have proven (8. The virtue of induction as practiced in § 8. 8. Hamming once said of mathematical induction. So long as the criterion3 1+ n z ≤1−δ k is satisfied for all sufficiently large k > K—where 0 < δ ≪ 1 is a small positive constant—then the series evidently converges (see § 2. The distinction is P subtle but rather important.4 and eqn. maybe you can prove it by induction. all series τk with |τk /τk−1 | < 1 − δ do converge.1. but the induction probably does not help you to obtain the formula! A good inductive proof usually begins by motivating the formula proven. Once you obtain a formula somehow. As we see however.1. Its vice is that it conceals the subjective process which has led the mathematician to consider the formula in the first place. Richard W. Answer: it Rx majorizes 1 (1/τ ) dτ = ln x.1) by means of a mathematical induction. airtight case for a formula.1. The really curious reader may now ask why 1/k does not converge.2 is that it makes a logically clean. so to meet the criterion it suffices that |z| < 1. See (5.1.1). “How could I prove a formula for an infinite number of cases when I know that testing a finite number of cases is not enough?” Once you Although one need not ask the question to understand the proof.1) is (1 + n/k)z times the (k − 1)th term.156 CHAPTER 8.6.7) and § 8.12) The bound (8.12) thus establishes a sure convergence domain for (8. But we can bind 1+n/k as close to unity as desired by making K sufficiently large. THE TAYLOR SERIES Multiplying (8.

We shall not always write inductions out as explicitly in this book as we have done in the present section—often we shall leave the induction as an implicit exercise for the interested reader—but this section’s example at least lays out the general pattern of the technique. but he does not disdain mathematical rigor on principle. [20.2. you will understand the ideas behind mathematical induction.3 concerns the shifting of a power series’ expansion point. Mathematical induction is a broadly applicable technique for constructing mathematical proofs. [one cannot teach] a uniform level of rigor. SHIFTING A POWER SERIES’ EXPANSION POINT really face this question. but psychologically there is little else that can be done. Hamming. § 1. [20. wrote further. but Hamming nevertheless makes a pertinent point.8. It is only when you grasp the problem clearly that the method becomes clear. on the ground that the math serves the model. How can the expansion point of the power series f (z) = ∞ k=K (ak )(z − zo )k . which is needed to protect us against careless thinking. Logically. The style lies in exercising rigor at the right level for the problem at hand. K ≤ 0.6] 157 The applied mathematician may tend to avoid rigor for which he finds no immediate use. § 2. 8. (8. [20. It is necessary to require a gradually rising level of rigor so that when faced with a real need for it you are not left helpless. when teaching a topic the degree of rigor should follow the student’s perceived need for it. but rather a gradually rising level. § 1. . Ideally.13) .2 Shifting a power series’ expansion point One more question we should treat before approaching the Taylor series proper in § 8. Rigor is the hygiene of mathematics.3] Hamming also wrote. .6] Applied mathematics holds that the practice is defensible. The function of rigor is mainly critical and is seldom constructive. a professional mathematician who sympathized with the applied mathematician’s needs. this is indefensible. As a result. .

12). THE TAYLOR SERIES which may have terms of negative order.1). . (8.18) . (cn ) n k (8. (c[−(n+1)] ) The f+ (z) is even simpler to expand: one need only multiply the series out term by term per (4.13) in the form f (z) = then changes the variables z − z1 . (8. be shifted from z = zo to z = z1 ? The first step in answering the question is straightforward: one rewrites (8. (1 − w)k+1 (ck )(1 − w)k .16) c[−(k+1)] .17) n+k n . we have that f (z) = f− (z) + f+ (z).158 CHAPTER 8. f− (z) ≡ f+ (z) ≡ −(K+1) k=0 ∞ k=0 (8. w← to obtain f (z) = ∞ k=K ∞ k=K (ak )([z − z1 ] − [zo − z1 ])k . after which combining like powers of w yields the form f− (z) = qk ≡ ∞ k=0 −(K+1) n=0 qk w k .14) (ck )(1 − w)k . (8. the f− (z) is expanded term by term using (8.15) Splitting the k < 0 terms from the k ≥ 0 terms in (8. combining like powers of w to reach the form f+ (z) = pk ≡ ∞ k=0 ∞ n=k pk w k . zo − z1 ck ← [−(zo − z1 )]k ak .15). Of the two subseries.

18) can be nothing other than the Taylor series about z = z1 of the function f+ (z) in any event. it also must lie within the convergence domain of the original. anyway. 4 .17) nor of f+ (z) in (8. however. See §§ 8.9) and its brethren. Rather.8.4 through 8. The convergence domain vanishes as z1 approaches such a forbidden point. which ratio approaches unity with increasing n.8. is harder to establish in the abstract because that subseries has an infinite number of terms.16) converges for |w| < 1. one shifts the expansion point of some concrete function like sin z or ln(1 − z). The treatment begins with a question: if you had to express A rigorous argument can be constructed without appeal to § 8.3.13) through (8.3. The attentive reader might observe that we have formally established the convergence neither of f− (z) in (8. shifting the expansion point away from the pole at z = zo has resolved the k < 0 terms. z = zo series for the shift to be trustworthy as derived. we have strategically framed the problem so that one needn’t worry about it. enjoying the same convergence domain any such Taylor series enjoys. The method fails if z = z1 happens to be a pole or other nonanalytic point of f (z). running the sum in (8. The imagined difficulty (if any) vanishes in the concrete case.12) each term of the original f− (z) of (8. one does not normally try to shift the expansion point of an unspecified function f (z). that of f+ (z).3. it has no terms of negative order. the reconstituted f− (z) of (8. given those of a power series about z = zo . A more elegant rigorous argument can be made indirectly by way of a complex contour integral. we now stand in position to treat the Taylor series proper. z = zo power series—the new.12) of restricting the convergence domain to |w| < 1.3 if desired.) Furthermore. and since according to (8. In applied mathematics. that of f− (z).18) serve to shift a power series’ expansion point. Appealing to § 8. At the price per (8.3 Expanding functions in Taylor series Having prepared the ground.13) from the finite k = K ≤ 0 rather than from the infinite k = −∞. however. As we shall see by pursuing a different line of argument in § 8. The latter convergence.17) safely converges in the same domain. z = z1 power series has terms (z − z1 )k only for k ≥ 0. EXPANDING FUNCTIONS IN TAYLOR SERIES 159 Equations (8. from the ratio n/(n − k) of (4. Notice that—unlike the original. even if z1 does represent a fully analytic point of f (z). (Examples of such forbidden points include z = 0 in √ h[z] = 1/z and in g[z] = z.4 8.18). calculating the coefficients of a power series for f (z) about z = z1 . the important point is the one made in the narrative: f+ (z) can be nothing other than the Taylor series in any event. Regarding the former convergence. the f+ (z) of (8.

some function f(z) by a power series

\[
f(z) = \sum_{k=0}^{\infty} (a_k)(z - z_o)^k,
\]

with terms of nonnegative order k ≥ 0 only, how would you do it? The procedure of § 8.1 worked well enough in the case of f(z) = 1/(1 − z)^{n+1}, but it is not immediately obvious that the same procedure works more generally. What if f(z) = sin z, for example?[footnote 5] Fortunately a different way to attack the power-series expansion problem is known. It works by asking the question: what power series, having terms of nonnegative order only, most resembles f(z) in the immediate neighborhood of z = z_o? To resemble f(z), the desired power series should have a_0 = f(z_o); otherwise it would not have the right value at z = z_o. Then it should have a_1 = f′(z_o) for the right slope, a_2 = f′′(z_o)/2 for the right second derivative, and so on. With this procedure,

\[
f(z) = \sum_{k=0}^{\infty} \left.\frac{d^k f}{dz^k}\right|_{z=z_o} \frac{(z - z_o)^k}{k!}. \tag{8.19}
\]

Equation (8.19) is the Taylor series. Where it converges, it has all the same derivatives f(z) has, so if f(z) is infinitely differentiable then the Taylor series is an exact representation of the function.[footnote 6]

Footnote 5: The actual Taylor series for sin z is given in § 8.9.

Footnote 6: A more elegant rigorous proof, preferred by the professional mathematicians [29][14] but needing significant theoretical preparation, involves integrating over a complex contour about the expansion point. We will not detail that proof here. However, for readers who want a little more rigor nevertheless, the argument goes briefly as follows. Consider an infinitely differentiable function F(z) and its Taylor series f(z) about z_o. Let ΔF(z) ≡ F(z) − f(z) be the part of F(z) not representable as a Taylor series about z_o. If ΔF(z) is the part of F(z) not representable as a Taylor series, then ΔF(z_o) and all its derivatives at z_o must be identically zero (otherwise by the Taylor series formula of eqn. 8.19 one could construct a nonzero Taylor series for ΔF[z_o] from the nonzero derivatives). However, if F(z) is infinitely differentiable and if all the derivatives of ΔF(z) are zero at z = z_o, then all the derivatives must also be zero at z = z_o ± ε, hence also at z = z_o ± 2ε, and so on. This means that ΔF(z) = 0; in other words, there is no part of F(z) not representable as a Taylor series. Further proof details may be too tiresome to inflict on applied mathematical readers, but the interested reader can fill the details in; basically that is how the more rigorous proof goes. The reason the rigorous proof is confined to a footnote is not a deprecation of rigor as such. It is a deprecation of rigor which serves little purpose in applications. Applied mathematicians
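To see (8.19) at work on the sin z mentioned above, one can build the series directly from the known derivatives of the sine, which cycle through sin, cos, −sin, −cos. The Python sketch below is an added illustration only; the expansion point, evaluation point and truncation are arbitrary choices.

import cmath
from math import factorial

def sin_taylor(z, zo, terms=20):
    # d^k/dz^k sin z cycles through sin, cos, -sin, -cos at z = zo.
    derivs = [cmath.sin(zo), cmath.cos(zo), -cmath.sin(zo), -cmath.cos(zo)]
    return sum(derivs[k % 4] * (z - zo)**k / factorial(k) for k in range(terms))

zo, z = 1.0 + 0.5j, 1.6 - 0.2j
print(sin_taylor(z, zo))
print(cmath.sin(z))   # the truncated series already agrees closely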

8.2. maybe more appropriately for our applied purpose.4 Analytic continuation As earlier mentioned in § 2. With such an implicit definition. except that the transposed series may there enjoy a different convergence domain. In the swapped notation. When zo = 0.3. However. ANALYTIC CONTINUATION 161 The Taylor series is not guaranteed to converge outside some neighborhood near z = zo . a function expressible as a Taylor series in that neighborhood.12. the two series evidently describe the selfsame underlying analytic function. so long as the expansion point z = zo lies fully within (neither outside nor right on the edge of) the z = z1 series’ convergence domain. Since an analytic function f (z) is infinitely differentiable and enjoys a unique Taylor expansion fo (z −zo ) = f (z) about each point zo in its domain.8. transposing rather from expansion about z = z1 to expansion about z = zo . Since the functions are imprecise analogs in any case. then the two can normally regard mathematical functions to be imprecise analogs of physical quantities of interest. to restrict the set of infinitely differentiable functions used in the model to the subset of such functions representable as Taylor series. nothing prevents one from transposing the series to a different expansion point z = z1 by the method of § 8. In applied mathematics. that is. it follows that if two Taylor series f1 (z − z1 ) and f2 (z − z2 ) find even a small neighborhood |z − zo | < ǫ which lies in the domain of both. the series is a construct of great importance and tremendous practical value. (It is entertaining to consider [51. but where it does converge it is precise. this section’s purpose finds it convenient to swap symbols zo ↔ z1 .) . as we shall soon see. As we have seen. By either name. As it happens. the applied mathematician is logically free implicitly to define the functions he uses as Taylor series in the first place. an analytic function is a function which is infinitely differentiable in the domain neighborhood of interest—or. the series is also called the Maclaurin series.4. whether there actually exist any infinitely differentiable functions not representable as Taylor series is more or less beside the point. only one Taylor series about zo is possible for a given function f (z): f (z) = ∞ k=0 (ak )(z − zo )k . “Extremum”] the Taylor series of the function sin[1/x]—although in practice this particular function is readily expanded after the obvious change of variable u ← 1/x. not the other way around. the definitions serve the model.

extends the analytic continuation principle to cover power series in general. The series f1 and the series fn represent the same analytic function even if their domains do not directly overlap at all. The subject of analyticity is rightly a matter of deep concern to the professional mathematician.162 CHAPTER 8. . One can construct 7 The writer hesitates to mention that he is given to understand [44] that the domain neighborhood can technically be reduced to a domain contour of nonzero length but zero width. The result of § 8. one need not actually expand every analytic function in a power series.” he means any function at all. observe: though all convergent power series are indeed analytic. so that is all right. is an analytic function of analytic functions. The domain neighborhood |z − zo | < ǫ suffices. because the function is not differentiable there. then they are the same everywhere. THE TAYLOR SERIES both be transposed to the common z = zo expansion point. then the whole chain of overlapping series necessarily represents the same underlying analytic function f (z).7 Observe however that the principle fails at poles and other nonanalytic points. by the derivative chain rule. and if each pair in the chain matches at least in a small neighborhood in its region of overlap. if a series f3 is found whose domain overlaps that of f2 . which shows general power series to be expressible as Taylor series except at their poles and other nonanalytic points. then a series f4 whose domain overlaps that of f3 . Now. This is a manifestation of the principle of analytic continuation. If the two are found to have the same Taylor series there. When the professional mathematician speaks generally of a “function. Besides. Moreover. It is also a long wedge which drives pure and applied mathematics apart. He does not especially recommend that the reader worry over the point. given Taylor series for g(z) and h(z) one can make a power series for f (z) by long division if desired. Section 8. the writer has neither researched the extension’s proof nor asserted its truth. where g(z) and h(z) are analytic. then f1 and f2 both represent the same function. Having never met a significant application of this extension of the principle. The principle holds that if two analytic functions are the same within some domain neighborhood |z − zo | < ǫ.2. including power series with terms of negative order. and so on. Sums. products and ratios of analytic functions are no less differentiable than the functions themselves—as also. For example.15 speaks further on the point. there also is f (z) ≡ g(z)/h(z) analytic (except perhaps at isolated points where h[z] = 0).

Maybe a future release of the book will trace the pure theory’s main thread in an appendix. troublesome function. BRANCH POINTS some pretty unreasonable functions if one wants to. reduces. However. The foundations of the pure theory of a complex variable. where k and m are integers. replaces and/or avoids them. if one encircles9 the point once alone (that is. he encounters several. without also encircling some other branch point) by a closed contour in the Argand domain plane. Such functions are nonanalytic either because they lack proper derivatives in the Argand plane according to (4. are beautiful. given a function f (z) with such a point at z = zo . the pure theory is probably best appreciated after one already understands its chief conclusions.8 This does not mean that the scientist or engineer never encounters nonanalytic functions. Refer to §§ 2. δ(t). ℜ(z). approximates. geometrical circle.8.5 Branch points √ The function g(z) = z is an interesting. Though not for the explicit purpose of serving the pure theory. even an applied mathematician can profit substantially by studying them.5. one transforms. When such functions do arise. probably the most fitting shape to imagine when thinking of the concept abstractly. ℑ(z). The full theory which classifies and encompasses—or explicitly excludes—such functions is thus of limited interest to the applied mathematician. 163 f (z) ≡ 0 otherwise. and though they do not comfortably fit a book like this. any closed shape suffices.12 and 7. its derivative is not finite there.” The verb does not require the boundary to have the shape of an actual.7. 9 For readers whose native language is not English. though abstract. z ∗ . such as f ([2k + 1]2m ) ≡ (−)m . neither functions like this f (z) nor more subtly unreasonable functions normally arise in the modeling of physical phenomena. Evidently g(z) has a nonanalytic point at z = 0.19) or because one has defined them only over a real domain. However that may be. However. On the contrary. arg z. but they are not subtle: |z|. 8 . The defining characteristic of the branch point is that. so even though the function is finite at z = 0. the present chapter does develop just such an understanding. including [14][44][24] and numerous others. while simultaneously tracking f (z) in the Argand range plane—and if one demands that z and f (z) move Many books do cover it in varying degrees. “to encircle” means “to surround” or “to enclose. the circle is a typical shape. What is it? We call it a branch point. u(t). and this book does not cover it. 8. Its deriva√ tive is dg/dz = 1/2 z. yet the point is not a pole.

is single-valued even though it has a pole. Poles do not cause their functions to be multiple-valued and thus are not branch points. Evidently f (z) ≡ (z − zo )a has a branch point at z = zo if and only if a is not an integer. An analytic function like h(z) = 1/z. there really is no distinction between z1 = zo + ρeiφ and z2 = zo + ρei(φ+2π) —none at all. Where an integral follows a closed contour as in § 8. that neither suddenly skip from one spot to another—then one finds that f (z) ends in a different place than it began. by contrast. even though the two are exactly the same number. whereupon for some reason he began calling me Gorbag Pfufnik. is it? It’s in my colleague. even though z itself has returned precisely to its own starting point. It is multiple-valued. but paradoxically f (z1 ) = f (z2 ). As far as z is concerned. 16 May 2006] √ An analytic function like g(z) = z having a branch point evidently is not single-valued. THE TAYLOR SERIES smoothly. It is confusing. hadn’t I. The change isn’t really in me. What draws the distinction is the multiplevalued function f (z) which uses the argument. “Pfufnik!” then I had better keep a running count of how many times I have turned about his desk. even though the number of turns is personally of no import to me. a branch point may be thought of informally as a point zo at which a “multiple-valued function” changes values when one winds once around zo . but now the colleague calls me by a different name. until one realizes that the fact of a branch point says nothing whatsoever about the argument z. too. . For a single z more than one distinct g(z) is possible. I had not changed at all. When a domain contour encircles a pole. who seems to suffer a branch point. It is as though I had a mad colleague who called me Thaddeus Black. the corresponding range contour is properly closed.8. If f (z) does have a branch point—if a is not an integer— then the mathematician must draw a distinction between z1 = zo + ρeiφ and z2 = zo + ρei(φ+2π) . The range contour remains open even though the domain contour is closed. In complex analysis. If it is important to me to be sure that my colleague really is addressing me when he cries. This function does not suffer the syndrome described. Indeed z1 = z2 . The usual analysis strategy when one encounters a branch point is simply to avoid the point. until one day I happened to walk past behind his desk (rather than in front as I usually did). “Branch point.164 CHAPTER 8.” 18:10. [52. This is difficult.

and that point is no branch point. pure mathematics nevertheless brings some technical definitions the applied mathematician can use. isolated nonanalytic point at z = 0. but not f (z) = 1/z which has a pole at z = 0. but are not central to the analysis as developed in this book and are not covered here. Even the function f (z) = exp(1/z) is not meromorphic: it has only the one. A function f (z) which is analytic for all finite z except at isolated poles (which can be n-fold poles if n is a finite integer). is a meromorphic function. Examples include f (z) = z 2 and f (z) = exp z. ENTIRE AND MEROMORPHIC FUNCTIONS 165 the strategy is to compose the contour to exclude the branch point.6 Entire and meromorphic functions Though an applied mathematician is unwise to let abstract definitions enthrall his thinking. but of which the poles nowhere cluster in infinite numbers. Examples include f (z) = 1/z. incidentally. These ideas are interesting. f (z) = 1/(z + 2) + 1/(z − 1)3 + 2z 2 and f (z) = tan z—the last of which has an infinite number of poles. which has no branch points. 4 10 Traditionally associated with branch points in complex variable theory are the notions of branch cuts and Riemann sheets. then consider that tan z = cos w sin z =− .12 If it seems unclear that the singularities of tan z are actual poles. to shut it out.10 8.11 A function f (z) which is analytic for all finite z is an entire function. Such a strategy of avoidance usually prospers. of which no circle of finite radius in the Argand domain plane encompasses an infinity of poles. The function f (z) = tan(1/z) is not meromorphic since it has an infinite number of poles within the Argand unit circle. having the character of an infinitifold (∞-fold) pole. The interested reader might consult a book on complex variables or advanced calculus like [24]. cos z sin w wherein we have changed the variable w ← z − (2n + 1) 2π . Two such are the definitions of entire and meromorphic functions.6. but the point is an essential singularity. among many others. 11 [51] 12 [29] .8.

but ∞ k=1 (ak )(z − zo )k .7 Extrema over a complex domain If a function f (z) is expanded by (8. if it is unclear that z = [2n + 1][2π/4] are the only singularities tan z has—that it has no singularities of which ℑ[z] = 0—then consider that the singularities of tan z occur where cos z = 0. in the immediate neighborhood of the expansion point. below. Specifically according to (8. |z − zo | ≪ 1.) Sections 8. to shift f (z) by ∆f ≈ ǫeiψ . ′ ′ (8. this is f (z) ≈ f (zo ) + aK ρ′K eiKφ . w/3 − w3 /0x1E + · · · 1 + . such that aK . 5. 8. 8.6 speak further of the matter.19) or by other means about an analytic expansion point z = zo such that f (z) = f (zo ) + and if ak = 0 aK = 0.20). THE TAYLOR SERIES Section 8. This in turn is possible only if |exp[+iz]| = |exp[−iz]|.9 and its Table 8.166 CHAPTER 8.15 and 9.17.14. is the series’ first nonzero coefficient. give Taylor series for cos z and sin z.20) Evidently one can shift the output of an analytic function f (z) slightly in any desired Argand direction by shifting slightly the function’s input z. then. 0 < K < ∞. occurs where exp[+iz] = exp[−iz]. which happens only for real z. one can shift z . w − w3 /6 + w5 /0x78 − · · · By long division. for 0 < k < K < ∞. eqn. with which −1 + w2 /2 − w4 /0x18 − · · · tan z = . w 1 − w2 /6 + w4 /0x78 − · · · tan z = − (On the other hand. f (z) ≈ f (zo ) + (aK )(z − zo )K .1. Changing ρ′ eiφ ← z − zo . 0 ≤ ρ′ ≪ 1. which by Euler’s formula.

Inasmuch as z is complex.21) If z were always a real number.5. or contour. Consider the integral z2 Sn = z1 z n−1 dz. then by the antiderivative (§ 7. this always works—even where [df /dz]z=zo = 0. so the contour problem does not arise in that case. (8.8. 14 [9][29] 13 . Except at a nonanalytic point of f (z) or in the trivial case that f (z) were everywhere constant. It does however arise where the variable of integration is a complex scalar. to ln(z2 /z1 ). One must also consider the meaning of the symbol dz. Refer to the Argand plane of Fig. 2.14 8. That one can shift an analytic function’s output smoothly in any Argand direction whatsoever has the significant consequence that neither the real nor the imaginary part of the function—nor for that matter any linear combination ℜ[e−iω f (z)] of the real and imaginary parts—can have an extremum within the interior of a domain over which the function is fully analytic. one must consider some specific path of integration in the Argand plane.13.8 Cauchy’s integral formula In § 7. over which the integration is done. 7. as in Fig. Professional mathematicians tend to define the domain and its boundary more carefully. That is. because there again different paths are possible.2) this inten n gral would evaluate to (z2 − z1 )/n. in the case of n = 0. however.8.8. Because real scalars are confined to a single line. the correct evaluation is less obvious. no alternate choice of path is possible where the variable of integration is a real scalar. in which the sum value of an integration depends not only on the integration’s endpoints but also on the path.6 we considered the problem of vector contour integration. or. CAUCHY’S INTEGRAL FORMULA 167 by ∆z ≈ (ǫ/aK )1/K ei(ψ+n2π)/K . a function’s extrema over a bounded analytic domain never lie within the domain’s interior but always on its boundary. To evaluate the integral sensibly in the latter case.

you’ll know it.8. One cannot always drop them.8. The math itself will tell you. Occasionally one encounters a sum in which not only do the finite terms cancel. 8. we can drop the dρ dφ term.1. An example of the type is (1 − 3ǫ + 3ǫ2 ) + (3 + 3ǫ) − 4 (1 − ǫ)3 + 3(1 + ǫ) − 4 = lim = 3. Otherwise there isn’t much point in carrying second-order infinitesimals around. the second-order infinitesimals dominate and cannot be dropped. In such a case.22) 8. added to first order infinitesimals like dρ. (8.15 After canceling finite terms. Since the product of two infinitesimals is negligible even on infinitesimal scale. however. .1 The meaning of the symbol dz The symbol dz represents an infinitesimal step in some direction in the Argand plane: dz = [z + dz] − [z] = = = (ρ + dρ)ei(φ+dφ) − ρeiφ (ρ + dρ)ei dφ eiφ − ρeiφ (ρ + dρ)(1 + i dφ)eiφ − ρeiφ . then you proceed from there. In the relatively uncommon event that you need them. but also the first-order infinitesimals. Integrat15 The dropping of second-order infinitesimals like dρ dφ. THE TAYLOR SERIES 8. we are left with the peculiar but excellent formula dz = (dρ + iρ dφ)eiφ . you simply go back to the step where you dropped the infinitesimal and you restore it. 2 ǫ→0 ǫ ǫ2 ǫ→0 lim One typically notices that such a case has arisen when the dropping of second-order infinitesimals has left an ambiguous 0/0.2 Integrating along the contour Now consider the integration (8. To fix the problem. is a standard calculus technique.21) along the contour of Fig.168 CHAPTER 8.

8.1: A contour of integration in the Argand plane. and constant-φ (zb to zc ). zc zb z n−1 dz = = ρc ρb ρc ρb (ρeiφ )n−1 (dρ + iρ dφ)eiφ (ρeiφ )n−1 (dρ)eiφ ρc ρb = einφ = = ρn−1 dρ einφ n (ρc − ρn ) b n n − zn zc b . CAUCHY’S INTEGRAL FORMULA 169 Figure 8. y = ℑ(z) zb ρ zc φ za x = ℜ(z) ing along the constant-φ segment. in two segments: constant-ρ (za to zb ).8. n .

ρ ρ1 (8. However. we really didn’t know it before deriving it. we find that ρ2 ρ1 dz = z ρ2 ρ1 (dρ + iρ dφ)eiφ = ρeiφ ρ2 ρ1 dρ ρ2 = ln . (8. we have that zc za einφ dφ iρn inφb − einφa e in n n zb − za . Equation (8. It’s the same as for real numbers. n z n−1 dz = surprisingly the same as for real z.7) and (2. but the point is well taken nevertheless. However. zb za z n−1 dz = = φb φa φb φa (ρeiφ )n−1 (dρ + iρ dφ)eiφ (ρeiφ )n−1 (iρ dφ)eiφ φb φa = iρn = = Adding the two.” Well. notice the exemption of n = 0.24) This is always real-valued. Since any path of integration between any two complex numbers z1 and z2 is approximated arbitrarily closely by a succession of short constant-ρ and constant-φ segments. n n n zc − za . Consider the n = 0 integral z2 S0 = z1 dz .23) does not hold in that case. it follows generally that z2 n z n − z1 z n−1 dz = 2 . but otherwise it brings no surprise. n = 0.39). z Following the same steps as before and using (5. φ2 φ1 dz = z φ2 φ1 (dρ + iρ dφ)eiφ =i ρeiφ φ2 φ1 dφ = i(φ2 − φ1 ).23) n z1 The applied mathematician might reasonably ask.23) really worth the trouble? We knew that already. “Was (8. (8.25) .170 CHAPTER 8. THE TAYLOR SERIES Integrating along the constant-ρ arc.

even though the integration ends where it begins. (z − zo )a−1 dz = 0. Notice that the formula’s i2π does not depend on the precise path of integration. a = 0. and where it is implied that the contour loops positively (counterclockwise.26) does not hold in the latter case.27) has that z a−1 dz = 0. its integral comes to zero for nonintegral a only if the contour does not enclose the branch point at z = zo . But because any analytic function can be expanded in the form f (z) = ak −1 (which is just a Taylor series if the a happen to be k k (ck )(z − zo ) positive integers). n = 0.26) where as in § 7.28) The careful reader will observe that (8.28)’s derivation does not explicitly handle . dz z − zo = i2π. a (8. (8.27) However.16 16 (8. so one can write z2 z1 z a−1 dz = a a z2 − z1 . φ2 = φ1 + 2π. CAUCHY’S INTEGRAL FORMULA 171 The odd thing about this is in what happens when the contour closes a complete loop in the Argand plane about the z = 0 pole. In this case. thus S0 = i2π. (8.23) actually requires that n be an integer. this means that f (z) dz = 0 if f (z) is everywhere analytic within the contour. or with the change of variable z − zo ← z. Notice also that nothing in the derivation of (8.8. For a closed contour which encloses no pole or other nonanalytic point. (8.6 the symbol represents integration about a closed contour that ends where it begins. but only on the fact that the path loops once positively about the pole.8. in the direction of increasing φ) exactly once about the z = zo pole. Generalizing. we have that (z − zo )n−1 dz = 0.

fo (z) + k fk (z) z − zk dz = i2π k fk (zk ).6. Thus. Because the cells share boundaries within the contour’s interior. (8. The original contour—each piece of which is an exterior boundary of some cell—is integrated once piecewise. each interior boundary is integrated twice. Let the contour’s interior be divided into several cells. 8. once in each direction.29) is Cauchy’s integral formula. If the contour were a tiny circle of infinitesimal radius about the pole.2). THE TAYLOR SERIES 8. and then per (8. then the integrand would reduce to f (zo )/(z − zo ). f [z] = ln[1 − z]).172 CHAPTER 8. z − zo (8.3.29) holds for any positivelydirected contour which once encloses a pole and no other nonanalytic point.8.1] .3 The formula The combination of (8.30) where the fo (z) is a regular part. 17 [32. and similarly as we have just reasoned. each of which is small enough to enjoy a single convergence domain. If the contour encloses multiple poles (§§ 2. f (z) dz = i2πf (zo ). Equation (8.2 one can transpose such a series from zo to an overlapping convergence domain about z1 . the reverse-directed integral about the tiny detour circle is −i2πf (zo ).17 and again. This is the basis on which a more rigorous proof is constructed. then according to (8. Consider the closed contour integral f (z) dz. where neither fo (z) nor any of the several fk (z) has a pole or other nonanalytic point within (or on) the an f (z) represented by a Taylor series with an infinite number of terms and a finite convergence domain (for example. then by the principle of linear superposition (§ 7. § 1.26) and (8.11 and 9.28) the resulting integral totals zero. but the two straight integral segments evidently cancel. canceling.2? In this case. if the dashed detour which excludes the pole is taken. by § 8. so to bring the total integral to zero the integral about the main contour must be i2πf (zo ).26). whether the contour be small or large.29) But if the contour were not an infinitesimal circle but rather the larger contour of Fig. (8.3). However. Integrate about each cell. z − zo where the contour encloses no nonanalytic point of f (z) itself but does enclose the pole of f (z)/(z − zo ) at z = zo .28) is powerful.

§ 10. (z − zo )m+1 where m + 1 = n. Expanding f (z) in a Taylor series (8. The values fk (zk ). triple or other n-fold pole. CAUCHY’S INTEGRAL FORMULA Figure 8.5. “Cauchy’s integral formula. 20 April 2006] . (k!)(z − zo )m−k+1 [24.2: A Cauchy contour integral. (Note however that eqn. the integration can be written S= f (z) dz.” 14:13.18 8.8. ℑ(z) zo 173 ℜ(z) contour. are called residues. (8.8.30 does not handle branch points. In words.8. which represent the strengths of the poles.30) says that an integral about a closed contour in the Argand plane comes to i2π times the sum of the residues of the poles (if any) thus enclosed. the contour must exclude it or the formula will not work.30) Cauchy’s integral formula is an extremely useful result.29) or of (8.4 Enclosing a multiple pole When a complex contour of integration encloses a double.6][44][52. S= 18 ∞ k=0 dk f dz k z=zo dz . If there is a branch point.) As we shall see in § 9. whether in the form of (8.19) about z = zo . 8.

But according to (8.26), only the k = m term contributes, so

    S = [d^m f/dz^m]_{z=zo} ∮ dz / [(m!)(z − zo)]
      = (1/m!) [d^m f/dz^m]_{z=zo} ∮ dz / (z − zo)
      = (i2π/m!) [d^m f/dz^m]_{z=zo},

where the integral is evaluated in the last step according to (8.29). Altogether,

    ∮ f(z) dz / (z − zo)^(m+1) = (i2π/m!) [d^m f/dz^m]_{z=zo},   m ≥ 0.   (8.31)

Equation (8.31) evaluates a contour integral about an n-fold pole as (8.29) does about a single pole. (When m = 0, the two equations are the same.)19

8.9 Taylor series for specific functions

With the general Taylor series formula (8.19), the derivatives of Tables 5.2 and 5.3, and the observation from (4.21) that

    d(z^a)/dz = a z^(a−1),

one can calculate Taylor series for many functions. For example, expanding about z = 1,

    ln z |_{z=1} = 0,
    (d/dz) ln z |_{z=1} = (1/z) |_{z=1} = 1,
    (d²/dz²) ln z |_{z=1} = (−1/z²) |_{z=1} = −1,
    (d³/dz³) ln z |_{z=1} = (2/z³) |_{z=1} = 2,
     ⋮
    (d^k/dz^k) ln z |_{z=1} = [−(−)^k (k − 1)!/z^k] |_{z=1} = −(−)^k (k − 1)!,   k > 0.

19 [29][44]

With these derivatives, the Taylor series about z = 1 is

    ln z = Σ_{k=1}^∞ [−(−)^k (k − 1)!/k!] (z − 1)^k = −Σ_{k=1}^∞ (1 − z)^k / k,

evidently convergent for |1 − z| < 1. (And if z lies outside the convergence domain? Several strategies are then possible. One can expand the Taylor series about a different point; but cleverer and easier is to take advantage of some convenient relationship like ln w = −ln[1/w]. Section 8.10.4 elaborates.)

Using such Taylor series, one can relatively efficiently calculate actual numerical values for ln z and many other functions. Table 8.1 lists Taylor series for a few functions of interest. All the series converge for |z| < 1. The exp z, sin z and cos z series converge for all complex z. Among the several series, the series for arctan z is computed indirectly20 by way of Table 5.3 and (2.33):

    arctan z = ∫_0^z dw/(1 + w²) = ∫_0^z Σ_{k=0}^∞ (−)^k w^(2k) dw = Σ_{k=0}^∞ (−)^k z^(2k+1)/(2k + 1).

8.10 Error bounds

One naturally cannot actually sum a Taylor series to an infinite number of terms. One must add some finite number of terms, then quit—which raises the question: how many terms are enough? How can one know that one has added adequately many terms; that the remaining terms, which constitute the tail of the series, are sufficiently insignificant? How can one set error bounds on the truncated sum?

8.10.1 Examples

Some series alternate sign. For these it is easy if the numbers involved happen to be real. For example, from Table 8.1,

    ln(3/2) = ln(1 + 1/2) = 1/[(1)(2¹)] − 1/[(2)(2²)] + 1/[(3)(2³)] − 1/[(4)(2⁴)] + ···

20 [42, § 11-7]

Table 8.1: Taylor series.

    f(z) = Σ_{k=0}^∞ [d^k f/dz^k]_{z=zo} Π_{j=1}^k (z − zo)/j

    (1 + z)^(a−1) = Σ_{k=0}^∞ Π_{j=1}^k (a/j − 1) z

    exp z = Σ_{k=0}^∞ Π_{j=1}^k z/j = Σ_{k=0}^∞ z^k/k!

    sin z = Σ_{k=0}^∞ [ z Π_{j=1}^k −z²/((2j)(2j + 1)) ]

    cos z = Σ_{k=0}^∞ Π_{j=1}^k −z²/((2j − 1)(2j))

    sinh z = Σ_{k=0}^∞ [ z Π_{j=1}^k z²/((2j)(2j + 1)) ]

    cosh z = Σ_{k=0}^∞ Π_{j=1}^k z²/((2j − 1)(2j))

    −ln(1 − z) = Σ_{k=1}^∞ (1/k) Π_{j=1}^k z = Σ_{k=1}^∞ z^k/k

    arctan z = Σ_{k=0}^∞ [1/(2k + 1)] z Π_{j=1}^k (−z²) = Σ_{k=0}^∞ (−)^k z^(2k+1)/(2k + 1)

Each term is smaller in magnitude than the last, so the true value of ln(3/2) necessarily lies between the sum of the series to n terms and the sum to n + 1 terms. The last and next partial sums bound the result. After three terms,

    S3 − 1/[(4)(2⁴)] < ln(3/2) < S3,
    S3 = 1/[(1)(2¹)] − 1/[(2)(2²)] + 1/[(3)(2³)].

Other series however do not alternate sign. For example,

    ln 2 = −ln(1/2) = −ln(1 − 1/2) = S4 + R4,
    S4 = 1/[(1)(2¹)] + 1/[(2)(2²)] + 1/[(3)(2³)] + 1/[(4)(2⁴)],
    R4 = 1/[(5)(2⁵)] + 1/[(6)(2⁶)] + ···

The basic technique in such a case is to find a replacement series (or integral) R′n which one can collapse analytically, each of whose terms equals or exceeds in magnitude the corresponding term of Rn. For the example, one might choose

    R′4 = (1/5) Σ_{k=5}^∞ 1/2^k = 2/[(5)(2⁵)],

wherein (2.34) had been used to collapse the summation. Then,

    S4 < ln 2 < S4 + R′4.

For real 0 ≤ x < 1 generally,

    Sn ≡ Σ_{k=1}^n x^k/k,
    R′n ≡ Σ_{k=n+1}^∞ x^k/(n + 1) = x^(n+1) / [(n + 1)(1 − x)].

Then,

    Sn < −ln(1 − x) < Sn + R′n.

Many variations and refinements are possible, some of which we shall meet in the rest of the section, but that is the basic technique: to add several terms of the series to establish a lower bound, then to overestimate the remainder of the series to establish an upper bound.
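The bracketing behavior of an alternating series is easy to watch numerically. The following Python sketch is not part of the original text; it merely sums the ln(3/2) series above term by term and confirms that consecutive partial sums bracket the true value (the term count of eight is an arbitrary choice).

    import math

    # Alternating series for ln(3/2) = ln(1 + 1/2):
    #   1/(1*2^1) - 1/(2*2^2) + 1/(3*2^3) - 1/(4*2^4) + ...
    true_value = math.log(3 / 2)

    partial = 0.0
    for k in range(1, 9):
        previous = partial
        partial += (-1) ** (k + 1) / (k * 2 ** k)
        if k > 1:
            low, high = sorted((previous, partial))
            # Consecutive partial sums bound the true value from both sides.
            assert low < true_value < high

    print(partial, true_value)   # differ only in the fourth decimal place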

As the plot shows. S < 2. Reflecting.178 CHAPTER 8. 21 .10. everywhere at least as great as. THE TAYLOR SERIES terms of the series to establish a lower bound. known quantity.1) I= 1 ∞ dx 1 =− x2 x ∞ 1 = 1. The technique usually works well in practice for this reason. but the summation does rather resemble the integral (refer to Table 7. Figure 8. then to overestimate the ′ remainder of the series to establish an upper bound. Consider the summation S= ∞ k=1 1 1 1 1 = 1 + 2 + 2 + 2 + ··· 2 k 2 3 4 The exact value this summation totals to is unknown to us. such as in representing the terms graphically—not as flat-topped rectangles—but as slant-topped trapezoids.3 plots S and I together as areas—or more precisely. The overestimate Rn ′ in the example majorizes the series’ true remainder Rn . n = 0x40 would have bound ln 2 tighter than the limit of a computer’s typical double-type floating-point accuracy). and that it would have been a lot smaller yet had we included a few more terms in Sn (for instance. 8. This is best explained by example. but as a reflection of majorization the word seems logical. (Of course S < 2 is a very loose limit. shifted in the figure a half unit rightward. The quantities in question The author does not remember ever encountering the word minorization heretofore in print. This book at least will use the word where needed. Even so. one would let a computer add many terms of the series first numerically. plots S − 1 and I together as areas (the summation’s first term is omitted). minorization 21 serves surely to bound an unknown quantity by a smaller. Notice that the Rn is a fairly small number. You can use it too if you like. known quantity. thus guaranteeing the absolute upper limit on S. the unknown area S − 1 cannot possibly be as great as the known area I.) Majorization serves surely to bound an unknown quantity by a larger. cleverer ways to majorize the remainder of this particular series will occur to the reader. or. S − 1 < I = 1. In symbols. and only then majorize the remainder. The integral I majorizes the summation S − 1. In practical calculation. or to replace by virtue of being. but that isn’t the point of the example.2 Majorization To majorize in mathematics is to be.

which are easier to sum in that they include a z k factor. The area I between the dashed curve and the x axis majorizes the area S − 1 between the stairstep curve and the x axis.10. but clever majorization can help (and if majorization does not help enough.10. because the height of the dashed curve is everywhere at least as great as that of the stairstep curve. [not yet written] can help even more).8. it does have a z k factor.3 illustrates. y 1 1/22 1/32 1/42 1 2 y = 1/x2 x 3 are often integrals and/or series summations. the point of establishing a bound is not to sum a power series exactly but rather to fence the sum within some . the two of which are akin as Fig.3: Majorization. 8. and the ratio of adjacent terms’ magnitudes approaches unity as k grows. but more common than harmonic series are true power series. However. because although its terms do decrease in magnitude it has no z k factor (or seen from another point of view. the series transformations of Ch. There is no one. Harmonic series can be hard to sum accurately. but z = 1). ERROR BOUNDS 179 Figure 8. ideal bound which works equally well for all power series.3 has observed. 8. It is a harmonic series rather than a power series.10. incidentally. The choice of whether to majorize a particular unknown quantity by an integral or by a series summation depends on the convenience of the problem at hand.3 Geometric majorization Harmonic series can be hard to sum as § 8. The series S of this subsection is interesting.

then (2. k=0 (8.34) and (3. .1’s series for exp z. Here. τk = z k /[k!]). (8. τk represents the power series’ kth-order term (in Table 8. THE TAYLOR SERIES sufficiently (rather than optimally) small neighborhood.32) Here. exact (but uncalculatable.32). for all k > n.34) (8. for example. 1 − |ρ| (8.1 is truncated after the nth-order term. A simple. general bound which works quite adequately for most power series encountered in practice.180 CHAPTER 8. Given these definitions. The |ρ| is a positive real number chosen. unknown) infinite series sum. is the geometric majorization |ǫn | < |τn+1 | . then n − ln(1 − z) ≈ |ǫn | < k=1 z n+1 zk . k /(n + 1) . a pair of concrete examples should serve to clarify. such that each term in the series’ tail is smaller than the last by at least a factor of |ρ|. First. preferably as small as possible.20) imply the geometric majorization (8. if n Sn ≡ τk .35) ǫn ≡ S n − S ∞ . |τk+1 /τk | = [k/(k+1)] |z| < |z|. such that |ρ| ≥ τk+1 τk τk+1 |ρ| > τk |ρ| < 1. where S∞ represents the true.1. so we have chosen |ρ| = |z|. for at least one k > n. including among many others all the Taylor series of Table 8. more or less.33) which is to say. if the Taylor series − ln(1 − z) = ∞ k=1 zk k of Table 8. If the last paragraph seems abstract. 1 − |z| where ǫn is the error in the truncated sum.
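A quick numerical check of the geometric majorization is straightforward. The sketch below is not from the original text; the sample point z and truncation order n are arbitrary. It sums the −ln(1 − z) series to n terms and verifies that the actual truncation error stays under the bound |z^(n+1)/(n + 1)| / (1 − |z|).

    import cmath

    z = 0.3 + 0.4j     # sample point, |z| = 0.5, well inside the convergence domain
    n = 10             # truncation order

    # Partial sum of -ln(1 - z) = sum_{k=1..n} z^k / k.
    partial = sum(z ** k / k for k in range(1, n + 1))

    exact = -cmath.log(1 - z)
    error = abs(partial - exact)

    # Geometric majorization of the tail, with |rho| = |z|.
    bound = abs(z) ** (n + 1) / (n + 1) / (1 - abs(z))

    print(error, bound)
    assert error < bound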

whose maximum value for all k > n occurs when k = n + 1. whereas each series diverges when asked to compute a quantity like − ln 3 or 3a−1 directly. Let f (1 + ζ) be a function whose Taylor series about ζ = 0 converges for .10. |τk+1 /τk | = |z| /(k + 1).1 is truncated after the nth-order term.6) like these a more probably profitable strategy is to find and exploit some property of the functions to transform their arguments. γ 1 γ a−1 = . (1/γ)a−1 which leave the respective Taylor series to compute quantities like − ln(1/3) and (1/3)a−1 they can handle. but for nonentire functions (§ 8.8. if the Taylor series ∞ k 181 exp z = k=0 j=1 z = j ∞ k=0 zk k! also of Table 8. 8. ERROR BOUNDS Second.10. so we have chosen |ρ| = |z| /(n + 2). k! + 1)! . To shift the series’ expansion points per § 8. The series for − ln(1 − z) and (1 + z)a−1 for instance each converge for |z| < 1 (though slowly for |z| ≈ 1).2 is one way to seek convergence. then n k exp z ≈ |ǫn | < k=0 j=1 z n+1 /(n z = j n k=0 zk . where n + 2 > |z|. such as 1 − ln γ = ln .4 Calculation outside the fast convergence domain Used directly. 1 − |z| /(n + 2) Here.1 tend to converge slowly for some values of z and not at all for others. the Taylor series of Table 8.

·] are functions we know how to compute like g[·] = −[·] or g[·] = 1/[·]. γ we have that f (γ) = g f 1+ 1−γ γ . γ 1 . . f (γ) = g f (8. consider that the function f (1 + ζ) = − ln(1 − z) has ζ = −z.182 CHAPTER 8. ·] = [·][·]. and like h[·. f (γ) = g[f (1/γ)] = 1/f (1/γ) and f (αγ) = h[f (α).38) (8. ·] property of (8. f (γ)] = f (α)f (γ).37) whose convergence domain |ζ| < 1 is |1 − γ| / |γ| < 1.36) where g[·] and h[·. f (γ)] .39) This paragraph’s notation is necessarily abstract. ·] = [·] + [·] or h[·. and that the function f (1 + ζ) = (1 + z)a−1 has ζ = z. one can take advantage of the h[·. (8. which is |γ − 1| < |γ| or in other words 1 ℜ(γ) > . (8. γ= 1+ζ 1−γ = ζ. f (γ) = g[f (1/γ)] = −f (1/γ) and f (αγ) = h[f (α). 1 ≤ ℜ(γ) < 2. 22 |ℑ(γ)| ≤ ℜ(γ). 2 Although the transformation from ζ to γ has not lifted the convergence limit altogether. One nonoptimal but entirely effective tactic is represented by the equations ω ≡ in 2m γ. computing f (ω) indirectly for any ω = 0 by any of several tactics. Identifying 1 = 1 + ζ.36) to sidestep the limit. To make it seem more concrete. f (γ)] = f (α) + f (γ). Though this writer knows no way to lift the convergence limit altogether that does not cause more problems than it solves. we see that it has apparently opened the limit to a broader domain. γ f (αγ) = h [f (α). THE TAYLOR SERIES |ζ| < 1 and which obeys properties of the forms22 1 .

24 To draw another example from Table 8. the tactic (8. For the logarithm. For the power. but shall in § 8. For example.39) fences γ within a comfortable zone. but such functions may have other properties one can analogously exploit. most nonentire functions lack properties of the specific kinds that (8. Axes are rotated per (3. Besides allowing all ω = 0.40) calculates f (ω) fast for any ω = 0.24 8. The method and tactic of (8.5 Nonconvergent series Variants of this section’s techniques can be used to prove that a series does not converge at all.40) leaves open the question of how to compute f (in 2m ).36) demands.11. shrinking a function’s argument by some convenient means before feeding the argument to the function’s Taylor series. In theory all finite γ rightward of the frontier let the Taylor series converge. as are the numbers ln(1/2) and (1/2)a−1 . τ Equation (8. Of course. we have not calculated yet. The sine and cosine in the cis function are each calculated directly by Taylor series. . ω sin α + cos α ˆ ˆ ˆ where arctan ω is interpreted as the geometrical angle the vector x + yω makes with x. − ln(in 2m ) = m ln(1/2) − in(2π/4). but for the example functions at least this is not hard. keeping γ moderately small in magnitude but never too near the ℜ(γ) = 1/2 frontier in the Argand plane. thus causing the Taylor series to converge faster or indeed to converge at all.39) also thus significantly speeds series convergence.40) are useful in themselves and also illustrative generally. ERROR BOUNDS whereupon the formula23 f (ω) = h[f (in 2m ). consider that arctan ω = α + arctan ζ.10. but extreme γ of any kind let the series converge only slowly (and due to compound floating-point rounding error inaccurately) inasmuch as they imply that |ζ| ≈ 1.8.7) through some angle α to reduce the tangent from ω to ζ. Notice how (8.10. ω cos α − sin α ζ≡ . ∞ k=1 1 k does not converge because 1 > k 23 k+1 k dτ .1.36) through (8. (in 2m )a−1 = cis[(n2π/4)(a − 1)]/[(1/2)a−1 ]m . f (γ)] 183 (8. Any number of further examples and tactics of the kind will occur to the creative reader. The number 2π.
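Returning to the argument-shrinking tactic of (8.36) through (8.40): the sketch below is a deliberately simplified, real-valued rendering of the same idea, a simplification of mine rather than the book's exact i^n 2^m machinery. A positive real ω is factored as ω = 2^m γ; the property f(αγ) = f(α) + f(γ) reassembles ln ω from m·ln 2 and a series evaluation whose argument stays near 1, and the property f(γ) = −f(1/γ) keeps even that argument comfortably small.

    import math

    LN2 = 0.6931471805599453   # ln 2, supplied here as a constant for brevity

    def ln_series(g, terms=40):
        # ln g = -sum_{k>=1} (1 - g)^k / k, from Table 8.1; fast for g near 1.
        z = 1.0 - g
        return -sum(z ** k / k for k in range(1, terms + 1))

    def ln(omega):
        if omega <= 0:
            raise ValueError("positive argument required")
        m, g = 0, omega
        while g >= 2.0:          # shrink large arguments: ln(2 g') = ln 2 + ln g'
            g /= 2.0
            m += 1
        while g < 1.0:           # grow small arguments
            g *= 2.0
            m -= 1
        if g > 4.0 / 3.0:        # flip so the series sees |1 - argument| <= 1/2
            return m * LN2 - ln_series(1.0 / g)
        return m * LN2 + ln_series(g)

    for x in (0.01, 0.5, 3.0, 100.0):
        print(x, ln(x), math.log(x))   # agree to about a dozen decimal places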

but since eliminating the error entirely also requires recording the sum to infinite precision. A series sum converges to a definite value. Perfect precision is impossible. There is normally no such thing as an optimal error bound—with sufficient cleverness. but adequate precision is usually not hard to achieve. ∞ k=1 CHAPTER 8. to increase n a little). merely by including a sufficient number of terms in the sum. and to the same value every time the series is summed. An infamous example is S=− 1 1 1 (−)k √ = 1 − √ + √ − √ + ··· . some tighter bound can usually be discovered—but often easier and more effective than cleverness is simply to add a few extra terms into the series before truncating it (that is. and we can make that neighborhood as tight as we want. 2 3 4 k k=1 ∞ which obviously converges. suggestions and tactics. which is impossible. The “error” in a series summation’s error bounds is unrelated to the error of probability theory. Occasionally nonetheless a series arises for which even adequate precision is quite hard to achieve. The English word “error” is thus overloaded here. To eliminate the error to the 0x34-bit (sixteen-decimal place) precision of a computer’s double-type floating-point representation typically requires something like 0x34 terms—if the series be wisely composed and if care be taken to keep z moderately small and reasonably distant from the edge of the series’ convergence domain. THE TAYLOR SERIES 1 > k ∞ k=1 k k+1 dτ = τ ∞ 1 dτ = ln ∞.184 hence.10. few engineering applications really use much more than 0x10 bits (five decimal places) in any case. is to establish a definite neighborhood in which the unknown value is sure to lie. . It is just that we do not necessarily know exactly what that value is. eliminating the error entirely is not normally a goal one seeks. by this section’s techniques or perhaps by other methods. but sum it if you can! It is not easy to do. To eliminate the error entirely usually demands adding an infinite number of terms.6 Remarks The study of error bounds is not a matter of rules and formulas so much as of ideas. which is impossible anyway. What we can do. Before closing the section. we ought to arrest one potential agent of terminological confusion. Besides. τ 8. no chance is involved.

The topic of series error bounds is what G. S. Brown refers to as “trick-based.”25 There is no general answer to the error-bound problem, but there are several techniques which help, some of which this section has introduced. Other techniques, we shall meet later in the book as the need for them arises.

8.11 Calculating 2π

The Taylor series for arctan z in Table 8.1 implies a neat way of calculating the constant 2π. We already know that tan 2π/8 = 1, or in other words that

    arctan 1 = 2π/8.

Applying the Taylor series, we have that

    2π = 8 Σ_{k=0}^∞ (−)^k/(2k + 1).   (8.41)

The series (8.41) is simple but converges extremely slowly. Much faster convergence is given by angles smaller than 2π/8. For example, from Table 3.2,

    2π/0x18 = arctan[(√3 − 1)/(√3 + 1)].

Applying the Taylor series at this angle, we have that26

    2π = 0x18 Σ_{k=0}^∞ [(−)^k/(2k + 1)] [(√3 − 1)/(√3 + 1)]^(2k+1) ≈ 0x6.487F.   (8.42)

8.12 Odd and even functions

An odd function is one for which f(−z) = −f(z). Any function whose Taylor series about zo = 0 includes only odd-order terms is an odd function. Examples of odd functions include z³ and sin z.

25 [6]
26 The writer is given to understand that clever mathematicians have invented subtle, still much faster-converging iterative schemes toward 2π. However, there is fast and there is fast. The relatively straightforward series this section gives converges to the best accuracy of your computer’s floating-point register within a paltry 0x40 iterations or so—and, after all, you only need to compute the numerical value of 2π once. Admittedly, the writer supposes that useful lessons lurk in the clever mathematics underlying the subtle schemes, but such schemes are not covered here.
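Returning to (8.42) above: the following sketch is not part of the original text. It sums that series in Python and compares it against the math library's value of 2π; the 0x40-term cutoff is deliberately generous, the series reaching double precision long before that.

    import math

    # 2pi = 0x18 * arctan[(sqrt(3) - 1)/(sqrt(3) + 1)], per (8.42); 0x18 = 24.
    x = (math.sqrt(3) - 1) / (math.sqrt(3) + 1)

    total = 0.0
    for k in range(0x40):
        total += (-1) ** k * x ** (2 * k + 1) / (2 * k + 1)

    two_pi = 0x18 * total
    print(two_pi, 2 * math.pi)   # agree to the last floating-point bit, or nearly so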

Specifically. tanh z z = ikπ [z − i(k − 1/2)π] tanh z|z = i(k − 1/2)π = 1. z = i(k − 1/2)π z − ikπ = 1. For example. along the imaginary. exp z = sinh z + cosh z. z = (k − 1/2)π z − kπ = 1. of the eight trigonometric functions 1 1 1 . .186 CHAPTER 8.43) = i(−)k . tan z z = kπ [z − (k − 1/2)π] tan z|z = (k − 1/2)π = −1. 8. Any function whose Taylor series about zo = 0 includes only even-order terms is an even function. z = kπ = (−)k . the plot of a real-valued even function being symmetric about a line.13 Trigonometric poles The singularities of the trigonometric functions are single poles of residue ±1 or ±i. . sin z cos z tan z 1 1 1 . z = ikπ k (8. . tanh z. but one can always split an analytic function into two components—one odd. Odd and even functions are interesting because of the symmetry they bring—the plot of a real-valued odd function being symmetric about a point. tan z. For the circular trigonometrics. all the poles lie along the real number line. the other even—by the simple expedient of sorting the odd-order terms from the even-order terms in the function’s Taylor series. Examples of even functions include z 2 and cos z. . Many functions are neither odd nor even. z − ikπ sinh z z − i(k − 1/2)π cosh z = (−) . THE TAYLOR SERIES An even function is one for which f (−z) = f (z). . of course. sinh z cosh z tanh z the poles and their respective residues are z − kπ sin z z − (k − 1/2)π cos z = (−)k . for the hyperbolic trigonometrics.
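The residues (8.43) are easy to spot-check numerically: near z = kπ the ratio (z − kπ)/sin z should approach (−)^k. A small sketch, not from the original text; the poles sampled and the step size are arbitrary.

    import math

    def residue_estimate(k, h=1e-6):
        # Estimate the residue of 1/sin z at z = k*pi by evaluating
        # (z - k*pi)/sin z a small step h away from the pole.
        zo = k * math.pi
        return h / math.sin(zo + h)

    for k in range(-2, 3):
        print(k, residue_estimate(k), (-1) ** k)   # estimates approach (-)^k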

but for real z. sinh z cosh z tanh z 1/ tanh z to reach (8. .2 plus l’Hˆpital’s rule (4.44) [29] . These simpler trigonometrics are not only meromorphic but also entire.1 and 5. which is possible only for real z. o however. single poles.1). exp z and cis z—conspicuously excluded from this section’s gang of eight—have no poles for finite z. . the sine function’s very definition establishes the poles z = kπ (refer to Fig. therefore. cos z. Consider for instance the function 1/ sin z = i2/(eiz − e−iz ). we should like to verify that the poles (8. we finally apply l’Hˆpital’s rule to each of the ratios o z − kπ z − (k − 1/2)π z − kπ z − (k − 1/2)π . 8. Consider for example expanding f (z) = 27 e−z 1 − cos z (8. The trigonometrics are meromorphic functions (§ 8. similar reasoning for each of the eight trigonometrics forbids poles other than those (8. and thus are not meromorphic at all. . This function evidently goes infinite only when eiz = e−iz . . With the observations from Table 5. sin z cos z tan z 1/ tan z z − ikπ z − i(k − 1/2)π z − ikπ z − i(k − 1/2)π .43) lists are in fact the only poles that there are. because ez .43)’s claims. we shall marshal the identities of Tables 5. Sometimes one nevertheless wants to expand at least about a pole.29). . sinh z.1 that i sinh z = sin iz and cosh z = cos iz.6) for this reason. Before calculating residues and such. eiz .8.43). subject to Cauchy’s integral formula and so on. cosh z.43) lists. The poles are ordinary. with residues. but never about a pole or branch point of the function. THE LAURENT SERIES 187 To support (8. Trigonometric poles evidently are special only in that a trigonometric function has an infinite number of them.27 The six simpler trigonometrics sin z. Satisfied that we have forgotten no poles. ez ± e−z and eiz ± e−iz then likewise are finite. that we have forgotten no poles. 3.14 The Laurent series Any analytic function can be expanded in a Taylor series.14. Observe however that the inverse trigonometrics are multiple-valued and have branch points.

Expanding dividend and divisor separately. By long division. THE TAYLOR SERIES about the function’s pole at z = 0. f (z) = 2 2 − + z2 z 2 2 − 2+ z z + ∞ k=0 ∞ k=1 (−)k z 2k (2k)! ∞ k=1 − z 2k+1 z 2k + (2k)! (2k + 1)! (−)k z 2k (2k)! = 2 2 − + z2 z ∞ k=1 − ∞ k=0 (−)k 2z 2k−2 (−)k 2z 2k−1 + (2k)! (2k)! z 2k+1 z 2k + − (2k)! (2k + 1)! − (−)k 2z 2k+1 (2k + 2)! ∞ k=1 ∞ k=1 + = 2 2 − + 2 z z ∞ k=0 (−)k z 2k (2k)! (−)k 2z 2k (2k + 2)! ∞ k=0 + = 2 2 − + 2 z z ∞ k=0 − z 2k+1 z 2k + (2k)! (2k + 1)! (−)k 2 z 2k ∞ k=1 (−)k z 2k (2k)! −(2k + 1)(2k + 2) + (2k + 2)! + (2k + 2) − (−)k 2 2k+1 z (2k + 2)! (−)k z 2k .188 CHAPTER 8. (2k)! . f (z) = = = 1 − z + z 2 /2 − z 3 /6 + · · · z 2 /2 − z 4 /0x18 + · · · ∞ j j j=0 (−) z /j! − ∞ k 2k k=1 (−) z /(2k)! ∞ 2k /(2k)! + z 2k+1 /(2k k=0 −z ∞ k 2k k=1 (−) z /(2k)! + 1)! .

8. the Laurent series remedies this defect. fo (z − zo ) is analytic at z = zo . Handling the pole separately. and if all the terms 2 2 7 z f (z) = 2 − + − + · · · z z 6 2 are extracted the result is the Laurent series proper.45) the partial Laurent series f (z) = −1 k=K (ak )(z − zo )k + ∞ k=0 (bk )(z ∞ k=0 (ck )(z − zo )k .6 tell more about poles generally. unlike f (z). In either form. From an applied point of view however what interests us is the basic technique to remove the poles from an otherwise analytic function.5] .48) where. (8.8. c0 = 0. The ordinary Taylor series diverges at a function’s pole.46) However for many purposes (as in eqn.28. Further rigor is left to the professionals. (8. 28 The professional mathematician’s treatment of the Laurent series usually begins by defining an annular convergence domain (a convergence domain bounded without by a large circle and within by a small) in the Argand plane. K ≤ 0.47) suffices and may even be preferable. − zo )k (8. (2k + 2)! (8. 29 [24. THE LAURENT SERIES 189 The remainder’s k = 0 terms now disappear as intended.14. factoring z 2 /z 2 from the division leaves f (z) = 2 2 − + 2 z z ∞ k=0 (2k + 3)(2k + 4) + (−)k 2 2k z (2k + 4)! (2k + 4) + (−)k 2 2k+1 z (2k + 4)! ∞ k=0 − (−)k z 2k .48) is f (z)’s regular part at z = zo . The fo (z − zo ) of (8. K ≤ 0. f (z) = −1 k=K (ak )(z − zo )k + fo (z − zo ). K ≤ 0. § 10. so.5 and 9. f (z) = ∞ k=K (ak )(z − zo )k .8][14. including multiple poles like the one in the example here.29 Sections 9.45) One can continue dividing to extract further terms if desired. § 2.

it may suffice to write f (z) = f (z) = This is f (z) = 1 sin w . As an example of a different kind.190 CHAPTER 8. One could expand the function directly about some other point like z = 1. the resulting series would suffer a straitly limited convergence domain. even then. The typical trouble one has with the Taylor series is that a function’s poles and branch points limit the series’ convergence domain. Consider the function sin(1/z) . cos z z which is all one needs to calculate f (z) numerically—and may be all one needs for analysis. consider g(z) = 1 . and one cannot expand the function directly about it.15 Taylor series in 1/z A little imagination helps the Taylor series a lot. one needs no Taylor series to handle such a function (one simply does the indicated arithmetic). (z − 2)2 z −1 − z −3 /3! + z −5 /5! − · · · . but calculating the Taylor coefficients would take a lot of effort and. All that however tries too hard. Depending on the application.1) and (4. Then by (8. Suppose however that a Taylor series specifically about z = 0 were indeed needed for some reason. The point is an essential singularity. Thinking flexibly. THE TAYLOR SERIES 8. Several other ways are possible. zk . too. cos z This function has a nonanalytic point of a most peculiar nature at z = 0. 2k+2 That expansion is good only for |z| < 2.2). but for |z| > 2 we also have that g(z) = 1/z 2 1 = 2 2 (1 − 2/z) z ∞ k=0 1+k 1 2 z k = ∞ k=2 2k−2 (k − 1) . w≡ . The Laurent series of § 8. 1 − z 2 /2! + z 4 /4! − · · · Most often. g(z) = 1 1/4 = 2 (1 − z/2) 4 ∞ k=0 1+k 1 z 2 k = ∞ k=0 k+1 k z . however.14 represents one way to extend the Taylor series. one can often evade the trouble.
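Both expansions of g(z) = 1/(z − 2)² just derived are easy to confirm at a sample point or two. The Python sketch below is not part of the original text; the test points and term counts are arbitrary, chosen simply to land one point inside and one outside the circle |z| = 2.

    def g_exact(z):
        return 1 / (z - 2) ** 2

    def g_inside(z, terms=80):
        # sum_{k>=0} (k + 1) z^k / 2^(k+2), valid for |z| < 2
        return sum((k + 1) * z ** k / 2 ** (k + 2) for k in range(terms))

    def g_outside(z, terms=80):
        # sum_{k>=2} 2^(k-2) (k - 1) / z^k, valid for |z| > 2
        return sum(2 ** (k - 2) * (k - 1) / z ** k for k in range(2, terms))

    print(g_inside(0.7 + 0.5j), g_exact(0.7 + 0.5j))    # point inside |z| = 2
    print(g_outside(3.0 - 2.0j), g_exact(3.0 - 2.0j))   # point outside |z| = 2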

v2 = y z and v3 = z. Note that we have computed the two series for g(z) without ever actually taking a derivative. v1 = x. . too. . consider the function f (z1 . then take the Taylor series either of the whole function at once or of pieces of it separately. though taking derivatives per (8. What does it mean? • The z is a vector 30 incorporating the several independent variables z1 . One can expand in negative powers of z equally validly as in positive powers. is a vector with N = 3. . Where two or more independent variables are involved. which has terms z1 and 2z2 —these we understand—but also has the cross-term z1 z2 for which the relevant derivative is the cross-derivative ∂ 2 f /∂z1 ∂z2 .16 The multidimensional Taylor series Equation (8.19) has given the Taylor series for functions of a single variable. one must account for the crossderivatives. . the multidimensional Taylor series is f (z) = k ∂kf ∂zk z=zo (z − zo )k . One is not required immediately to take the Taylor series of a function as it presents itself.19) may be the canonical way to determine Taylor coefficients. k2 . k! (8. THE MULTIDIMENSIONAL TAYLOR SERIES 191 which expands in negative rather than positive powers of z. that’s neat. only the details are a little more complicated. • The k is a nonnegative integer vector of N counters—k1 . And. With this idea in mind. For example. z2 ) = z1 +z1 z2 +2z2 . Each of the kn runs independently from 0 to ∞.16.3.49) Well. Neither of the section’s examples is especially interesting in itself. . 30 . .8. The ˆ ˆ geometrical vector v = xx + yy + ˆz of § 3. 8. . The idea of the Taylor series does not differ where there are two or more independent variables. kN — one for each of the independent variables. z2 . In this generalized sense of the word. one can first change variables or otherwise rewrite the function in some convenient way. For 2 2 example. but their point is that it often pays to think flexibly in extending and applying the Taylor series. a vector is an ordered set of N elements. . zN . any effective means to find the coefficients suffices. and every permutation is possible. then.

. (1.19) represents a function f (z). 0). . 1). Thus within some convergence domain about z = zo . . (0.192 if N = 2 then CHAPTER 8. 2). (3.49) represents a function f (z) as accurately as the simple Taylor series (8. . meaning that ∂kf ≡ ∂zk • The (z − zo )k represents N N n=1 ∂ kn (∂zn )kn f. . n=1 With these definitions. (z − zo ) ≡ • The k! represents k n=1 (zn − zon )kn . 2). 1). 3). (3. and for the same reason. (2. 3). (2. 0). . . . . 0). k2 ) = (0. (1. . the multidimensional Taylor series (8..49) yields all the right derivatives and cross-derivatives at the expansion point z = zo . . N k! ≡ kn !. . 3). 2). THE TAYLOR SERIES k = (k1 . . (1. the multidimensional Taylor series (8. . (3. (0. (3. 2). 0). (2. . . • The ∂ k f /∂zk represents the kth cross-derivative of f (z). (0. 3). 1). (1. . 1). (2. ..
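For the quadratic example f(z1, z2) = z1² + z1z2 + 2z2² mentioned above, the multidimensional series terminates after the second-order terms, so a direct check is easy. The sketch below is not part of the original text; the expansion point and evaluation point are arbitrary, and the derivatives are entered by hand.

    def f(z1, z2):
        return z1 ** 2 + z1 * z2 + 2 * z2 ** 2

    zo1, zo2 = 1.5, -0.5        # expansion point (arbitrary)
    z1, z2 = 3.2, 2.7           # evaluation point (arbitrary)
    d1, d2 = z1 - zo1, z2 - zo2

    # Derivatives of f at the expansion point, found by hand.
    f00 = f(zo1, zo2)           # k = (0, 0)
    f10 = 2 * zo1 + zo2         # k = (1, 0)
    f01 = zo1 + 4 * zo2         # k = (0, 1)
    f20, f11, f02 = 2, 1, 4     # k = (2, 0), (1, 1), (0, 2); f11 is the cross-derivative

    # Multidimensional Taylor sum (8.49): sum over k of (d^k f)(z - zo)^k / k!.
    taylor = (f00 + f10 * d1 + f01 * d2
              + f20 * d1 ** 2 / 2 + f11 * d1 * d2 + f02 * d2 ** 2 / 2)

    print(taylor, f(z1, z2))    # identical, apart from rounding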

a dτ (9.2. recognizing its integrand to be the derivative of something already known:1 z a df dτ = f (τ )|z .1) For instance. implies a general technique only for calculating an integral numerically—and even for this purpose it is imperfect. 9. This chapter surveys several weapons of the intrepid mathematician’s arsenal against the integral. It is hard to guess in advance which technique might work best. then directly writes down the solution ln τ |x . Its counterpart (7.1 Integration by antiderivative The simplest way to solve an integral is just to look at it. how is one actually to do the sum? It turns out that there is no one general answer to this question. recognizing it to be the derivative of ln τ . 1 1 The notation f (τ )|z or [f (τ )]z means f (z) − f (a). 1 τ One merely looks at the integrand 1/τ . when it comes to adding an infinite number of infinitesimal elements. some by another. a a 193 . Some functions are best integrated by one technique.19) implies a general technique for calculating a derivative symbolically.Chapter 9 Integration techniques Equation (4. for. Refer to § 7.1). unfortunately. 1 x 1 dτ = ln τ |x = ln x.

whose differential is (by successive steps) d(u) = d(1 + x2 ). 9. (9. with the change of variable u ← 1 + x2 . However. z ρ1 (9.3) This helps.24) and (8. nonobvious. when z1 and z2 are real but negative numbers. . However.1 provide several further good derivatives this antiderivative technique can use. 1 + x2 This integral is not in a form one immediately recognizes.2.3 and 9. for example. 5.2) Tables 7.194 CHAPTER 9. One particular. du = 2x dx. Besides the essential τ a−1 = d dτ τa a .2 Integration by substitution Consider the integral x2 S= x1 x dx . INTEGRATION TECHNIQUES The technique by itself is pretty limited. If z = ρeiφ . useful variation on the antiderivative technique seems worth calling out specially here. then (8. the frequent object of other integration techniques is to transform an integral into a form to which this basic technique can be applied. 5.25) have that z2 z1 dz ρ2 = ln + i(φ2 − φ1 ).1.

the integral is

    S = ∫_{x=x1}^{x2} x dx/u
      = ∫_{x=x1}^{x2} (2x dx)/(2u)
      = ∫_{u=1+x1²}^{1+x2²} du/(2u)
      = (1/2) ln u |_{u=1+x1²}^{1+x2²}
      = (1/2) ln[(1 + x2²)/(1 + x1²)].

To check the result, we can take the derivative per § 7.5 of the final expression with respect to x2:

    ∂/∂x2 { (1/2) ln[(1 + x2²)/(1 + x1²)] } |_{x2=x}
      = (1/2) ∂/∂x2 [ln(1 + x2²) − ln(1 + x1²)] |_{x2=x}
      = x/(1 + x²),

which indeed has the form of the integrand we started with.

The technique is integration by substitution. It does not solve all integrals but it does solve many, whether alone or in combination with other techniques.

9.3 Integration by parts

Integration by parts is a curious but very broadly applicable technique which begins with the derivative product rule (4.25),

    d(uv) = u dv + v du,

where u(τ) and v(τ) are functions of an independent variable τ. Reordering terms,

    u dv = d(uv) − v du.

Integrating,

    ∫_{τ=a}^b u dv = uv |_{τ=a}^b − ∫_{τ=a}^b v du.   (9.4)
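Returning to the substitution example of § 9.2: its closed form is simple to confirm by brute force. The following sketch is not part of the original text; it applies a crude midpoint rule with arbitrary limits and step count.

    import math

    x1, x2 = 0.5, 3.0
    closed_form = 0.5 * math.log((1 + x2 ** 2) / (1 + x1 ** 2))

    n = 100000
    dx = (x2 - x1) / n
    numeric = 0.0
    for i in range(n):
        x = x1 + (i + 0.5) * dx          # midpoint rule
        numeric += x / (1 + x ** 2) * dx

    print(closed_form, numeric)          # agree to several decimal places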

not just for C = 0. sin ατ v = . dv ← cos ατ dτ. we can begin by integrating part of it. What it does is to integrate one part of an integral separately—whichever part one has chosen to identify as dv—while contrarily differentiating the other part u. Unsure how to integrate this. then. but one should understand clearly what it does and does not do. nothing in the integration by parts technique requires us to consider all possible v. consider the integral x S(x) = 0 τ cos ατ dτ. The technique is powerful for this reason. Any convenient v suffices. consider the definite integral3 Γ(z) ≡ 2 ∞ 0 e−τ τ z−1 dτ. For an example of the rule’s operation. INTEGRATION TECHNIQUES Equation (9.5) The careful reader will observe that v = (sin ατ )/α + C matches the chosen dv for any value of C.4). Letting u ← τ. We can begin by integrating the cos ατ dτ part. The new integral may or may not be easier to integrate than was the original u dv. It does not just integrate each part of an integral separately. α According to (9. we find that2 du = dτ.4) is the rule of integration by parts. (9. ℜ(z) > 0. It isn’t that simple. However. S(x) = τ sin ατ α x 0 x − 0 sin ατ x dτ = sin αx + cos αx − 1. The virtue of the technique lies in that one often can find a part dv which does yield an easier v du. This is true. For another kind of example of the rule’s operation. we choose v = (sin ατ )/α.196 CHAPTER 9. In this case. upon which it rewards the mathematician only with a whole new integral v du. 3 [32] . α α Integration by parts is a powerful technique.

5). It is best illustrated by example.9. z Γ(z + 1) = zΓ(z). z Substituting these according to (9.4 Integration by unknown coefficients One of the more powerful integration techniques is relatively inelegant. The technique is the method of unknown coefficients. it follows by induction on (9. 197 dv ← τ z−1 dτ. INTEGRATION BY UNKNOWN COEFFICIENTS Letting u ← e−τ .1) plus intelligent guessing.6) this is an interesting result. can be taken as an extended definition of the factorial (z − 1)! for all z.6) that (n − 1)! = Γ(n). called the gamma function. Since per (9. . τz v = .5) Γ(1) = 0 ∞ e−τ dτ = −e−τ ∞ 0 = 1. yet it easily cracks some integrals that give other techniques trouble.7) Thus (9. ℜ(z) > 0. Integration by parts has made this finding possible.4. and it is based on the antiderivative (9. we evidently have that du = −e−τ dτ. (9.4) into (9. 9. (9.5) yields Γ(z) = e−τ τz z ∞ = [0 − 0] + = When written τ =0 ∞ 0 − ∞ 0 τz z −e−τ dτ τ z −τ e dτ z Γ(z + 1) .
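Back to the gamma function of § 9.3 for a moment: the relation (n − 1)! = Γ(n) invites a brute-force numerical check. The sketch below is not from the original text; the truncation point and step count are arbitrary, chosen merely so that the neglected tail of the integral is negligible for the small integers tested.

    import math

    def gamma_numeric(z, upper=50.0, steps=200000):
        # Midpoint-rule estimate of Gamma(z) = integral_0^inf e^(-tau) tau^(z-1) dtau,
        # truncated at tau = upper (the tail beyond is negligible for small z).
        dtau = upper / steps
        total = 0.0
        for i in range(steps):
            tau = (i + 0.5) * dtau
            total += math.exp(-tau) * tau ** (z - 1) * dtau
        return total

    for n in range(1, 7):
        print(n, gamma_numeric(n), math.factorial(n - 1))   # Gamma(n) = (n-1)!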

where I is the loan’s interest rate and P is the borrower’s payment rate. dρ with which one can solve the integral. one can guess a likely-seeming antiderivative form. one has no guarantee that the guess is right. after all). x|t=0 = xo . too. concluding that S(x) = −σ 2 e−(ρ/σ) 2 /2 x 0 = σ2 1 − e−(x/σ) 2 /2 . Having guessed.198 CHAPTER 9.10) which conceptually represents4 the changing balance x of a bank loan account over time t. too. but see: if the guess were right. one can write the specific antiderivative e−(ρ/σ) 2 /2 ρ= d 2 −σ 2 e−(ρ/σ) /2 . x|t=T = 0. Consider for example the differential equation dx = (Ix − P ) dt. Using this value for a. then the antiderivative would have the form e−(ρ/σ) 2 /2 ρ = d −(ρ/σ)2 /2 ae dρ aρ 2 = − 2 e−(ρ/σ) /2 . dρ where the a is an unknown coefficient.9) The same technique solves differential equations. If it is desired to find the correct payment rate P which pays Real banks (in the author’s country. (9. at least) by law or custom actually use a needlessly more complicated formula—and not only more complicated. such as e−(ρ/σ) 2 /2 ρ= d −(ρ/σ)2 /2 ae . INTEGRATION TECHNIQUES Consider the integral (which arises in probability theory) x S(x) = 0 e−(ρ/σ) 2 /2 ρ dρ. (9.8) If one does not know how to solve the integral in a more elegant way. (9. but mathematically slightly incorrect. σ implying that a = −σ 2 (evidently the guess is right. 4 .

the loan off in the time T, then (perhaps after some bad guesses) we guess the form

    x(t) = A e^(αt) + B,

where α, A and B are unknown coefficients. The guess' derivative is

    dx = αA e^(αt) dt.

Substituting the last two equations into (9.10) and dividing by dt yields

    αA e^(αt) = IA e^(αt) + IB − P,

which at least is satisfied if both of the equations

    αA e^(αt) = IA e^(αt),
    0 = IB − P,

are satisfied. Evidently good choices for α and B, then, are

    α = I,
    B = P/I.

Substituting these coefficients into the x(t) equation above yields the general solution

    x(t) = A e^(It) + P/I   (9.11)

to (9.10). The constants A and P, we establish by applying the given boundary conditions x|t=0 = xo and x|t=T = 0. For the former condition,

    xo = A e^((I)(0)) + P/I = A + P/I;

and for the latter condition,

    0 = A e^(IT) + P/I.

Solving the last two equations simultaneously, we have that

    A = −e^(−IT) xo / (1 − e^(−IT)),
    P = I xo / (1 − e^(−IT)).   (9.12)
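Equation (9.12) lends itself to a quick numerical sanity check. The sketch below is not part of the original text; the loan parameters are arbitrary sample values. It computes P from (9.12), then Euler-integrates dx = (Ix − P) dt and confirms that the balance indeed reaches (approximately) zero at t = T.

    import math

    I_rate, x_o, T = 0.05, 1000.0, 10.0              # sample interest rate, balance, term

    P = I_rate * x_o / (1 - math.exp(-I_rate * T))   # payment rate per (9.12)

    steps = 100000
    dt = T / steps
    x = x_o
    for _ in range(steps):
        x += (I_rate * x - P) * dt                   # dx = (I x - P) dt

    print(P, x)   # x ends within a hundredth or so of zero, out of an initial 1000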

Applying these to the general solution (9.11) yields the specific solution

    x(t) = [xo/(1 − e^(−IT))] [1 − e^((I)(t−T))]   (9.13)

to (9.10) meeting the boundary conditions, with the payment rate P required of the borrower given by (9.12).

Such are the kinds of problems the method can solve. The virtue of the method of unknown coefficients lies in that it permits one to try an entire family of candidate solutions at once, with the family members distinguished by the values of the coefficients. If a solution exists anywhere in the family, the method usually finds it. The method of unknown coefficients is an elephant. Slightly inelegant the method may be, but it is pretty powerful, too—and it has surprise value (for some reason people seem not to expect it).

9.5 Integration by closed contour

We pass now from the elephant to the falcon, from the inelegant to the sublime. Consider the definite integral5

    S = ∫_0^∞ τ^a/(τ + 1) dτ,   −1 < a < 0.

This is a hard integral. No obvious substitution, no evident factoring into parts, seems to solve the integral; but there is a way. The integrand has a pole at τ = −1. Observing that τ is only a dummy integration variable, if one writes the same integral using the complex variable z in place of the real variable τ, then Cauchy's integral formula (8.29) has that integrating once counterclockwise about a closed complex contour, with the contour enclosing the pole at z = −1 but shutting out the branch point at z = 0, yields

    I = ∮ z^a/(z + 1) dz = i2π z^a |_{z=−1} = i2π (e^(i2π/2))^a = i2π e^(i2πa/2).

The trouble, of course, is that the integral S does not go about a closed complex contour. One can however construct a closed complex contour I of which S is a part, as in Fig. 9.1. If the outer circle in the figure is of infinite

5 [32, § 1.2]

9. Even though z itself takes on the same values along I3 as along I1 . Indeed. but besides being incorrect this defeats the purpose of the closed contour technique. then the closed contour I is composed of the four parts I = I1 + I2 + I3 + I4 = (I1 + I3 ) + I2 + I4 . so. One must take care to interpret the four parts correctly. The integrand z a /(z + 1) is multiple-valued. The figure tempts one to make the mistake of writing that I1 = S = −I3 . ℑ(z) 201 I2 z = −1 I4 I1 I3 ℜ(z) radius and the inner. ρ+1 ρa dρ = ei2πa S. which. ρ+1 . of infinitesimal.5. the multiple-valued integrand z a /(z + 1) does not. the two parts I1 + I3 = 0 do not cancel. INTEGRATION BY CLOSED CONTOUR Figure 9.1: Integration by closed contour. I1 = 0 ∞ ∞ 0 −I3 = (ρei0 )a dρ = (ρei0 ) + 1 (ρei2π )a dρ = ei2πa (ρei2π ) + 1 ∞ 0 ∞ 0 ρa dρ = S. The integrand has a branch point at z = 0. More subtlety is needed. in passing from I3 through I4 to I1 . in fact. the contour has circled.

the first limit vanishes.14) an astonishing result. the second limit vanishes. too. Substituting this expression into the last equation yields. leaving I = (1 − ei2πa )S. Such results vindicate the effort we have spent in deriving Cauchy’s integral formula (8.202 Therefore. ℑ(a) = 0. |ℜ(a)| < 1. i2πei2πa/2 = (1 − ei2πa )S. −1 < a < 0. that one is unlikely to believe it at first encounter. But by Cauchy’s integral formula we have already found an expression for I.6 Another example7 is 2π T = 0 6 dθ . straightforward (though computationally highly inefficient) numerical integration per (7. S = i2πei2πa/2 . INTEGRATION TECHNIQUES I = I1 + I2 + I3 + I4 = (I1 + I3 ) + I2 + I4 = (1 − ei2πa )S + lim 2π ρ→∞ φ=0 2π za dz − lim ρ→0 z+1 z a−1 dz − lim − lim ρ→0 2π φ=0 2π za dz z+1 = (1 − ei2πa )S + lim = (1 − ei2πa )S + lim ρ→∞ φ=0 a 2π ρ→∞ z a φ=0 z a+1 ρ→0 φ=0 a+1 2π φ=0 z a dz .1) confirms the result. τ +1 sin(2πa/2) (9. However. Since a < 0. 1 − ei2πa i2π S = −i2πa/2 . as the interested reader and his computer can check. e − ei2πa/2 2π/2 . 1 + a cos θ So astonishing is the result. by successive steps. 0 ∞ 2π/2 τa dτ = − . 7 [29] . S = − sin(2πa/2) That is.29). CHAPTER 9. and because a > −1.

unlike θ. One thus obtains the equivalent integral T = dz/iz i2 dz =− 2 + 2z/a + 1 1 + (a/2)(z + 1/z) a z dz i2 = − . begins and ends the integration at the same point. whose magnitudes are such that √ 2 − a2 ∓ 2 1 − a2 . Rather. In this example there is no branch point to exclude.5.9. The integrand evidently has poles at −1 ± √ a 1 − a2 z= . here again the contour is not closed. nor need one extend the contour. |z| = a2 2 One of the two magnitudes is less than unity and one is greater. excluding the branch point. one changes the variable z ← eiθ and takes advantage of the fact that z. meaning that one of the two poles lies within the contour and one lies without. as is . INTEGRATION BY CLOSED CONTOUR 203 As in the previous example. The previous example closed the contour by extending it. √ √ a z − −1 + 1 − a2 /a z − −1 − 1 − a2 /a whose contour is the unit circle in the Argand plane.

It is a great technique. If it seems unreasonable to the reader to expect so flamboyant a technique actually to work. The technique. is found in practice to solve many integrals other techniques find almost impossible to crack. √ 2 − a2 + 2 1 − a2 . . > > 1 − a2 . integrating about the pole within the contour yields T = i2π −i2/a √ z − −1 − 1 − a2 /a =√ 2π . √ 2 + 2 1 − a2 . 1 − a2 √ z=(−1+ 1−a2 )/a Observe that by means of a complex variable of integration. nevertheless. 8 These steps are perhaps best read from bottom to top. a2 Per Cauchy’s integral formula (8. 1 − a2 > 1 − 2a2 + a4 . See Ch. The key to making the technique work lies in closing a contour one knows how to treat. integration by closed contour. 6’s footnote 14. INTEGRATION TECHNIQUES seen by the successive steps8 a2 < 1. so long as the contour encloses appropriate poles in the Argand domain plane while shutting branch points out. √ < −(1 − a2 ) < < < < < 2a2 a2 1 1 − a2 .204 CHAPTER 9.29). √ 2 − a2 + 2 1 − a2 . 1 − a2 √ 1 − a2 √ − 1 − a2 √ 1 − 1 − a2 √ 2 − 2 1 − a2 √ 2 − a2 − 2 1 − a2 √ 2 − a2 − 2 1 − a2 a2 . (−a2 )(0) > (−a2 )(1 − a2 ). The robustness of the technique lies in that any contour of any shape will work. < < < < a2 1 − a2 2 0 > −a2 + a4 . √ 1 + 1 − a2 . each example has indirectly evaluated an integral whose integrand is purely real. this seems equally unreasonable to the writer—but work it does. 0 < 1 − a2 .

B is the numerator and A is the denominator. §§ 2. z−1 z−2 Combining the two fractions over a common denominator10 yields f (z) = z+3 .9 9.9. Appendix F][24. the former is probably the more amenable to analysis. 0 0 f (τ ) dτ −1 = −1 = [−4 ln(1 − τ ) + 5 ln(2 − τ )]0 .16) by N j=1 (z N j=1 (z 9 − αj ) /(z − αk ) − αj ) /(z − αk ) [36. (9.15) in which no two of the several poles αj are the same.1 Partial-fraction expansion f (z) = 5 −4 + . the partial-fraction expansion has the form N Ak f (z) = .6. In the fraction. INTEGRATION BY PARTIAL-FRACTION EXPANSION 205 9. −1 −4 dτ + τ −1 0 −1 5 dτ τ −2 The trouble is that one is not always given the function in the amenable form. 10 . using (9. The quotient is Q = B/A.12] Terminology (you probably knew this already): A fraction is the ratio of two numbers or expressions B/A.6 Integration by partial-fraction expansion This section treats integration by partial-fraction expansion. (z − 1)(z − 2) Consider the function Of the two forms.6.16) z − αk k=1 where multiplying each fraction of (9.7 and 10. For example. Given a rational function f (z) = N −1 k k=0 bk z N j=1 (z − αj ) (9. It introduces the expansion itself first.3).

but because at least to this writer that way does not lend much applied insight.18) where C is a real-valued constant.15) by (9. 1 = lim N −1 k k=0 bk z N j=1 (z z→αm − αj ) Am . 1 − ǫei2πk/N /z . INTEGRATION TECHNIQUES puts the several fractions over a common denominator.17) together give the partial-fraction expansion of (9.206 CHAPTER 9.1 is that it cannot directly handle repeated poles. The conventional way to expand a fraction with repeated poles is presented in § 9. then the residue formula (9. i = j. the value of f (z) with the pole canceled. Consider the function g(z) = N −1 k=0 Cei2πk/N . is called the residue of f (z) at the pole z = αm . we separate the poles. Dividing (9.2 Repeated poles The weakness of the partial-fraction expansion of § 9. 9. if αi = αj . This function evidently has a small circle of poles in the Argand plane at αk = ǫei2πk/N .15)’s rational function f (z).17) where Am . z→αm z=αm (9. the present subsection treats the matter in a different way. That is. z − αm Rearranging factors. yielding (9. we have that Am = N −1 k k=0 bk z N j=1 (z − αj ) /(z − αm ) = lim [(z − αm )f (z)] .17) finds an uncanceled pole remaining in its denominator and thus fails for Ai = Aj (it still works for the other Am ). Factoring.16) gives the ratio 1= N −1 k k=0 bk z N j=1 (z N k=1 − αj ) Ak .6.6. the mth term Am /(z − αm ) dominates the summation of (9.6.15). N > 1.16) and (9. Equations (9.5 below. 0 < ǫ ≪ 1. g(z) = C z N −1 k=0 ei2πk/N . z − αk In the immediate neighborhood of z = αm . Hence. z − ǫei2πk/N (9.16). Here.

in the other cases they cancel. zN Having achieved this approximation. Do the same for j = 2 then j = 8. |z| ≫ ǫ. 0 < ǫ ≪ 1. so g(z) = N C ǫmN −1 . then for N = 8 and j = 3 plot the several (ei2πj/N )k in the Argand plane.9. z − ǫei2πk/N If you don’t see why. if we strategically choose C= then 1 . Only in the j = 8 case do the terms add coherently.6. |z| ≫ ǫ. canceling otherwise—is a classic manifestation of Parseval’s principle.  N −1 C ei2πk/N g(z) = z k=0 207 ∞ j=0 ǫei2πk/N z j   = C N −1 ∞ k=0 j=1 ǫj−1 ei2πjk/N zj N −1 k=0 = C But11 ∞ j=1 ǫj−1 zj ei2πj/N k . N ǫN −1 1 . N −1 k=0 ei2πj/N k = N 0 ∞ if j = mN.34) to expand the fraction. This effect—reinforcing when j = nN . Hence. otherwise. except in the immediate neighborhood of the small circle of poles—the first term of the summation dominates. zN But given the chosen value of C. z mN m=1 For |z| ≫ ǫ—that is. (9. INTEGRATION BY PARTIAL-FRACTION EXPANSION Using (2. N > 1.18) is g(z) ≈ g(z) = 11 1 N ǫN −1 N −1 k=0 ei2πk/N . g(z) ≈ N C ǫN −1 . .

z − zo + ǫei2πk/N (9. changing z − zo ← z. INTEGRATION TECHNIQUES Joining the last two equations together. An example to illustrate the technique. we have that 1 1 = lim ǫ→0 N ǫN −1 (z − zo )N N −1 k=0 ei2πk/N .1 we already know how to handle. . the pole hiding under dominant. double pole there.19) is that it lets one replace an N -fold pole with a small circle of ordinary poles.19) The significance of (9. Notice how the calculation has discovered an additional. separating a double pole: f (z) = z2 − z + 6 (z − 1)2 (z + 2) z2 − z + 6 ǫ→0 (z − [1 + ǫei2π(0)/2 ])(z − [1 + ǫei2π(1)/2 ])(z + 2) z2 − z + 6 = lim ǫ→0 (z − [1 + ǫ])(z − [1 − ǫ])(z + 2) 1 z2 − z + 6 = lim ǫ→0 z − [1 + ǫ] (z − [1 − ǫ])(z + 2) z=1+ǫ = lim + + = lim 1 z − [1 − ǫ] 1 z+2 z2 − z + 6 (z − [1 + ǫ])(z + 2) z2 z=1−ǫ ǫ→0 = lim ǫ→0 = lim ǫ→0 6+ǫ 1 z − [1 + ǫ] 6ǫ + 2ǫ2 1 6−ǫ + z − [1 − ǫ] −6ǫ + 2ǫ2 0xC 1 + z+2 9 1/ǫ − 1/6 −1/ǫ − 1/6 4/3 + + z − [1 + ǫ] z − [1 − ǫ] z+2 1/ǫ −1/ǫ −1/3 4/3 + + + z − [1 + ǫ] z − [1 − ǫ] z − 1 z + 2 −z+6 (z − [1 + ǫ])(z − [1 − ǫ]) z=−2 . Notice incidentally that 1/N ǫN −1 is a large number not a small. The poles are close together but very strong.208 CHAPTER 9. single pole at z = 1. and writing more formally.6. N > 1. which per § 9.

(τ − 1)2 (τ + 2) which indeed has the form of the integrand we started with. for 0 ≤ x < 1. Continuing the example of § 9.5) that the result is correct.9.16) and (9. then one can use (9.17)—and.6.3 Integrating a rational function If one can find the poles of a rational function of the form (9. confirming the . x f (τ ) dτ 0 = = = = = = = = = τ2 − τ + 6 dτ 2 0 (τ − 1) (τ + 2) x −1/ǫ −1/3 4/3 1/ǫ lim + + + dτ ǫ→0 0 τ − [1 + ǫ] τ − [1 − ǫ] τ − 1 τ + 2 1 1 lim ln([1 + ǫ] − τ ) − ln([1 − ǫ] − τ ) ǫ→0 ǫ ǫ x 4 1 − ln(1 − τ ) + ln(τ + 2) 3 3 0 x 1 [1 + ǫ] − τ 4 1 lim ln − ln(1 − τ ) + ln(τ + 2) ǫ→0 ǫ [1 − ǫ] − τ 3 3 0 x [1 − τ ] + ǫ 4 1 1 ln − ln(1 − τ ) + ln(τ + 2) lim ǫ→0 ǫ [1 − τ ] − ǫ 3 3 0 x 2ǫ 4 1 1 ln 1 + − ln(1 − τ ) + ln(τ + 2) lim ǫ→0 ǫ 1−τ 3 3 0 x 1 2ǫ 4 1 lim − ln(1 − τ ) + ln(τ + 2) ǫ→0 ǫ 1−τ 3 3 0 x 1 4 2 − ln(1 − τ ) + ln(τ + 2) lim ǫ→0 1 − τ 3 3 0 2 x+2 1 4 − 2 − ln(1 − x) + ln . (9. each of which one can integrate individually.2.6. if needed. INTEGRATION BY PARTIAL-FRACTION EXPANSION 209 9.19)—to expand the function into a sum of partial fractions. we can take the derivative of the final expression: d dx x+2 1 4 2 − 2 − ln(1 − x) + ln 1−x 3 3 2 x=τ 2 −1/3 4/3 = + + (τ − 1)2 τ − 1 τ + 2 τ2 − τ + 6 = .6.15). 1−x 3 3 2 x To check (§ 7.

A related property is that dk Φ dwk =0 w=0 for 0 ≤ k < p.21) is (9. when 0 ≤ k < p and w = 0.21) where g and hk are polynomials in nonnegative powers of w. the function and its first p − 1 derivatives are all zero at w = 0.21) holds for all 0 ≤ k ≤ p. (9. di Φ dwi = d di−1 Φ d = i−1 dw dw dw wp−i+1 hi−1 (w) [g(w)]i = wp−i hi (w) [g(w)]i+1 . By induction on this basis.21) is good at least for this case. (9. of course.3 affords extra insight. The reason is that (9. (9. 9. When k = 0. g(0) = 0. hi (w) ≡ wg dg dhi−1 − iwhi−1 + (p − i + 1)ghi−1 .20). as was to be demonstrated. whereas its numerator has a wp−k = 0 factor.2 and 9. (Notice incidentally how much easier it is symbolically to differentiate than to integrate!) 9.6. but the derivatives do bring some noteworthy properties. 0 < i ≤ p.6.21)’s denominator is [g(w)]k+1 = 0. The property is proved by induction. its derivatives interest us. (9. INTEGRATION TECHNIQUES result.4 The derivatives of a rational function Not only the integral of a rational function interests us.5 Repeated poles (the conventional technique) Though the technique of §§ 9. too. 0 ≤ k ≤ p. dwk [g(w)]k+1 (9.21) holds for k = i − 1.20) Φ(w) = g(w) enjoys derivatives in the general rational form dk Φ wp−k hk (w) = .210 CHAPTER 9. Then. if (9. One needs no special technique to compute such derivatives. so (9. First of interest is the property that a function in the general rational form wp h0 (w) . it is not the conventional technique to expand in partial fractions a rational function .6.6.22) That is. dw dw which makes hi (like hi−1 ) a polynomial in nonnegative powers of w.

ℓ=0 j=m (Ajℓ )(z − αm )pm + (z − αj )pj −ℓ pm −1 ℓ=0 (Amℓ )(z − αm )ℓ . one multiplies (9.6. (9.22) with w = z − αm reveals the double summation and its first pm − 1 derivatives all to be null at z = αm . cannot be expanded solely in the first-order fractions of § 9.9. (z − αj )pj −ℓ (9. 1 ≤ m ≤ M .24) by (z − αm )pm to obtain the form M (z − αm ) pm f (z) = pj −1 j=1. j=1 ≥ 0.24) lacks are the values of its several coefficients Ajℓ . k. where j. f (z) = M N −1 k k=0 bk z M j=1 (z − αj )pj . M . But (9. but can indeed be expanded if higherorder fractions are allowed: f (z) = M pj −1 j=1 ℓ=0 Ajℓ . dk dz k M pj −1 j=1. ℓ=0 j=m (Ajℓ )(z − αm )pm (z − αj )pj −ℓ z=αm = 0.6. INTEGRATION BY PARTIAL-FRACTION EXPANSION 211 having a repeated pole.1. = αj if j ′ = j. N and the several pj are integers.24) What the partial-fraction expansion (9.23) N pj αj ′ ≡ pj . A rational function with repeated poles. . 0 ≤ k < pm . The conventional technique is worth learning not only because it is conventional but also because it is usually quicker to apply in practice. This subsection derives it. One can determine the coefficients with respect to one (possibly repeated) pole at a time. To determine them with respect to the pm -fold pole at z = αm . that is.

“Why should we prove that a solution exists.25)’s coefficients actually come from the same expansion. yes. some from the other. “The present book is a book of applied mathematics. once we have actually found it?” Ah. and uniquely. On the other hand. one derivative per repetition of the pole. “But these are quibbles. Maybe there exist two distinct expansions. but on this occasion let us nonetheless follow the professional’s line of reasoning. z=αj 0 ≤ ℓ < p.5’s logic discovers no guarantee that all of (9. A professional mathematician might object however that it has done so without first proving that a unique solution actually exists. INTEGRATION TECHNIQUES so. A careful review of § 9. the (z − αm )pm f (z) equation’s kth derivative reduces at that point to dk (z − αm )pm f (z) dz k = z=αm pm −1 ℓ=0 dk (Amℓ )(z − αm )ℓ dz k z=αm = k!Amk . but the professional’s point is that we have found the solution only if in fact it does exist. In case of a repeated pole.6 The existence and uniqueness of solutions Equation (9. Uniqueness is proved by positing two solutions M pj −1 j=1 ℓ=0 f (z) = Ajℓ = (z − αj )pj −ℓ M pj −1 j=1 ℓ=0 Bjℓ (z − αj )pj −ℓ . if only a short way. Comes from us the reply. in which event it is not even clear what (9. (9. 9.6.24)’s partial fractions.6.24).25) has solved (9. and some of the coefficients come from the one. 0 ≤ k < pm . these coefficients evidently depend not only on the residual function itself but also on its several derivatives. cavils and nitpicks!” we are inclined to grumble.23) and (9.” Well. maybe there exists no expansion at all. Changing j ← m and ℓ ← k and solving for Ajℓ then produces the coefficients Ajℓ = 1 ℓ! dℓ (z − αj )pj f (z) dz ℓ .25) means.212 CHAPTER 9.25) to weight the expansion (9. otherwise what we have found is a phantom.

9. From the N coefficients bk and the N coefficients Ajℓ .23). We shall not often in this book prove existence and uniqueness explicitly.9. Logically this difference must be zero for all z if the two solutions are actually to represent the same function f (z). because each half-integral . one cannot. But uniqueness.7. FRULLANI’S INTEGRAL and computing the difference M pj −1 j=1 ℓ=0 213 Bjℓ − Ajℓ (z − αj )pj −ℓ between them. there always exist an alternate set of bk for which the same system has multiple solutions.7 Frullani’s integral ∞ 0 One occasionally meets an integral of the form S= f (bτ ) − f (aτ ) dτ. This however is seen to be possible only if Bjℓ = Ajℓ for each (j. which we have already established. say. where the combination’s weights depend solely on the locations αj and multiplicities pj of f (z)’s several poles. but such proofs when desired tend to fit the pattern outlined here. positive coefficients and f (τ ) is an arbitrary complex expression in τ . Therefore. one wants to split in two as [f (bτ )/τ ] dτ − [f (aτ )/τ ] dτ . the solution necessarily exists. Such an integral. but if f (0+ ) = f (+∞). 11 through 14 that when such a system has no solution. Therefore it is not possible for the system to have no solution—which is to say.24) over a common denominator and comparing the resulting numerator against the numerator of (9. N = 3) look like b0 = −2A00 + A01 + 3A10 . b1 = A00 + A01 + A10 . ℓ). Existence comes of combining the several fractions of (9. the two solutions are one and the same. forbids such multiple solutions in all cases. τ where a and b are real. Each coefficient bk is seen thereby to be a linear combination of the several Ajℓ . b2 = 2A01 − 5A10 . We shall show in Chs. an N × N system of N linear equations in N unknowns results—which might for example (if.

Nonetheless. this is b/ǫ S = = ǫ→0+ lim f (+∞) a/ǫ dσ − f (0+ ) σ bǫ aǫ dσ σ ǫ→0+ lim f (+∞) ln b/ǫ bǫ − f (0+ ) ln a/ǫ aǫ b = [f (τ )]∞ ln .5.3][1. 0 a Thus we have Frullani’s integral. So long as each of f (ǫ) and f (1/ǫ) approaches a constant value as ǫ vanishes. but in fact it does not matter which of a and b is the greater. INTEGRATION TECHNIQUES alone diverges. works for any f (τ ) which has definite f (0+ ) and f (+∞). as is easy to verify). § 2.12 12 [32.1][51. provided that one first relaxes the limits of integration as 1/ǫ S = lim ǫ→0+ ǫ f (bτ ) dτ − τ 1/ǫ ǫ f (aτ ) dτ τ . “Frullani’s integral”] . 0 τ a (9. Changing σ ← bτ in the left integral and σ ← aτ in the right yields b/ǫ S = = = ǫ→0+ lim bǫ bǫ f (σ) dσ − σ a/ǫ aǫ f (σ) dσ σ f (σ) − f (σ) dσ + σ b/ǫ a/ǫ ǫ→0+ lim aǫ −f (σ) dσ + σ f (σ) dσ − σ a/ǫ bǫ bǫ aǫ f (σ) dσ σ b/ǫ ǫ→0+ lim a/ǫ f (σ) dσ σ (here on the face of it. ∞ 0 b f (bτ ) − f (aτ ) dτ = [f (τ )]∞ ln . § 1. if a and b are both real and positive. splitting the integral in two is the right idea.214 CHAPTER 9.26) which. we have split the integration as though a ≤ b.

α ak+1 = − .14 exp(ατ )τ n = d dτ n k=0 1 . Remembering however § 5. eqn. (n!/k!)αn−k+1 1 α n j=k+1 (−jα) = (−)n−k exp(ατ )τ k . however. ln βτ = ln β + ln τ . 14 [42. (k + 1)(α) (−)n−k . 0 ≤ k < n. = One could write the latter product more generally as τ a−1 ln βτ . α = 0.3’s observation that ln τ is of zeroth order in τ . wherein ln β is just a constant. The two occur often enough to merit investigation here. by § 9.9. 0 ≤ k ≤ n. ak = Therefore.27) The right form to guess for the antiderivative of τ a−1 ln τ is less obvious. = αan exp(ατ )τ n + k=0 αak + If so. Concerning exp(ατ )τ n . POWERS AND LOGS 215 9. after maybe some false tries we eventually do strike the right form τ a−1 ln τ d a τ [B ln τ + C] dτ = τ a−1 [aB ln τ + (B + aC)].5. n ≥ 0. Appendix 2. PRODUCTS OF EXPONENTIALS. then evidently an = ak That is. powers and logarithms The products exp(ατ )τ n and τ a−1 ln τ tend to arise13 among other places in integrands related to special functions ([chapter not yet written]). (n!/k!)αn−k+1 (9. 73] 13 .8.8 Integrating products of exponentials.4’s method of unknown coefficients we guess its antiderivative to fit the form exp(ατ )τ n = = k=0 n−1 d dτ n n ak exp(ατ )τ k k=0 n αak exp(ατ )τ k + k=1 ak exp(ατ )τ k−1 k ak+1 k+1 exp(ατ )τ k . According to Table 2.

as sometimes it does.28) fails when a = 0. the Taylor series of Ch. When all else fails.29).28) as a → 0.1: Antiderivatives of products of exponentials. 15 [42. Therefore.29) If (9. α = 0 (n!/k!)αn−k+1 . would yield the same (9.41) that τ a = exp(a ln τ ). a=0 d 1 ln τ − dτ a a 2 d (ln τ ) dτ 2 which demands that B = 1/a and that C = −1/a2 . many integrals.1 summarizes. o applied to (9. But not all. eqn. but in this case with a little imagination the antiderivative is not hard to guess: d (ln τ )2 ln τ = .29). with the observation from (2. τ dτ 2 (9.216 CHAPTER 9.28) Antiderivatives of terms like τ a−1 (ln τ )2 . 8 and the antiderivative of § 9. d dτ n k=0 τa exp(ατ )τ n = τ a−1 ln τ ln τ τ = = (−)n−k exp(ατ )τ k . then l’Hˆpital’s rule (4. Appendix 2. (9. (9. INTEGRATION TECHNIQUES Table 9.29) seemed hard to guess nevertheless. a = 0.15 τ a−1 ln τ = d τa dτ a ln τ − 1 a . Table 9. powers and logarithms.9 Integration by Taylor series With sufficient cleverness the techniques of the foregoing sections solve many.30) 9. n ≥ 0. 74] . exp(ατ )τ n ln τ and so on can be computed in like manner as the need arises. Equation (9.1 together offer a concise.

however. at the price of losing the functions’ known closed analytic forms. it’s just a function you hadn’t heard of before. too. . After all. 0 ∞ k=0 (−)k z 2k+1 = (z) (2k + 1)2k k! x ∞ k=0 1 2k + 1 k j=1 −z 2 . The myf z is no less a function than sin z is.1 is used to calculate sin z. You can plot the function. For example. x 0 τ2 exp − 2 dτ = = = x ∞ 0 k=0 x ∞ 0 k=0 ∞ k=0 (−τ 2 /2)k dτ k! (−)k τ 2k dτ 2k k! x (−)k τ 2k+1 (2k + 1)2k k! 0 ∞ k=0 = ∞ k=0 (−)k x2k+1 (2k + 1)2k k! = (x) 1 2k + 1 k j=1 −x2 . It works just the same.9.9. or do with it whatever else one does with functions. 2j The result is no function one recognizes. The series above converges just as accurately and just as fast. or calculate its value. when a Taylor series from Table 8. This is not necessarily bad. 2j exp − τ2 2 dτ = myf x. then sin z is just a series. Sometimes it helps to give the series a name like myf z ≡ Then. it is just a series. or take its derivative τ2 d myf τ = exp − dτ 2 . INTEGRATION BY TAYLOR SERIES 217 practical way to integrate some functions.

INTEGRATION TECHNIQUES .218 CHAPTER 9.

and as we have seen in § 4. one likes to lay down one’s burden and rest a spell in the shade. 219 . But in this short chapter which rests between. 11.2) extracts with little effort.30) locates swiftly. The expression z + a0 is a linear polynomial. The roots of higher-order polynomials. No general formula to extract the roots of the nth-order polynomial seems to be known. and Ch. which though not plain to see the quadratic formula (2. 6’s footnote 8.2). but that is an approximate iteration rather than an exact formula like (2. One would prefer an actual formula to extract the roots. z 4 + a3 z 3 + a2 z 2 + a1 z + a0 . the Newton-Raphson iteration (4. hefting the weighty topic of the matrix. will indeed begin to build on those foundations. The quadratic polynomial z 2 + a1 z + a0 has of course two roots. to extract the roots of the cubic and quartic polynomials z 3 + a2 z 2 + a1 z + a0 . between the hard work of the morning and the heavy lifting of the afternoon. Chapters 2 through 9 have established the applied mathematical foundations upon which coming chapters will build. So much algebra has been known since antiquity. we shall refresh ourselves with an interesting but lighter mathematical topic: the topic of cubics and quartics.1 However. the lone root z = −a0 of which is plain to see.Chapter 10 Cubics and quartics Under the heat of noonday.8 it can occasionally fail to converge. 1 Refer to Ch.

2 10.” 05:17. “Fran¸ois Vi`te. w+ An interesting alternative to Vieta’s transform is w 2 wo ← z. but as |w| approaches |wo | this 2 ceases to be true. 31 Oct.3) 2 [51.4 For |w| ≫ |wo |. “Quartic equation”][52.” [51. Figure 10. 4 Also called “Vieta’s substitution. “Quartic equation. and so forth. Section 10. z ≈ wo /w. For |w| ≪ |wo |. 1/3 resembles 3. 9 Nov. (10. we have that z ≈ w. “Gerolamo Cardano.2 Cubics The general cubic polynomial is too hard to extract the roots of directly. The constant wo is the corner value. so one begins by changing the variable x+h←z (10. 1 ← z.5] 3 This change of variable broadly recalls the sum-of-exponentials form (5. one can transform a function f (z) into a function f (w) by the change of variable3 w+ or.1) w Equation (10.3 might be named Vieta’s parallel transform. 2006][52. “Cubic equation”][51. 10. c e 2006][46.1 plots Vieta’s transform for real w in the case wo = 1. The 16thcentury algebraists Ferrari.1) is Vieta’s transform. 1 Nov. Vieta. w (10.2) which in light of § 6. To capture this sense. “Vieta’s substitution”] .” 22:35. § 1. 2006][52. inasmuch as exp[−φ] = 1/ exp φ.” 00:26.2 shows how Vieta’s transform can be used.1 Vieta’s transform There is a sense to numbers by which 1/2 resembles 2. This chapter explains. formulas do exist. in the neighborhood of which w transitions from the one domain to the other. Tartaglia and Cardano have given us the clever technique. more generally. 1/4 resembles 4.19) of the cosh(·) function. CUBICS AND QUARTICS though the ancients never discovered how.220 CHAPTER 10. w 2 wo ← z.

what to do next is not so obvious.5) . where p ≡ −a1 + a2 2 .10. 3 a1 a2 a2 q ≡ −a0 + −2 3 3 a2 a1 a2 a2 2 +2 x + a0 − 3 3 3 3 (10. are the cubic polynomial’s three roots. So we have struck the a2 z 2 term. (10.2. plotted logarithmically.6) then.4) .1: Vieta’s transform (10. ln z ln w to obtain the polynomial x3 + (a2 + 3h)x2 + (a1 + 2ha2 + 3h2 )x + (a0 + ha1 + h2 a2 + h3 ). That was the easy part. If one could strike the px term as well. 3 (10.1) for wo = 1. The choice a2 3 casts the polynomial into the improved form h≡− x3 + a1 − or better yet x3 − px − q. The solutions to the equation x3 = px + q. then the . CUBICS 221 Figure 10.

This works.3.3) achieves this—or rather. with the idea of balancing the equation between offsetting w and 1/w terms. To define inscrutable coefficients unnecessarily before the need for them is apparent seems poor applied mathematical style. The careful reader will observe that (10.7) (10.6) by the change of variable w+ we get the new equation 2 2 w3 + (3wo − p)w + (3wo − p) 2 w6 wo + o = q. none of which seems to help much. double the three the fundamental theorem of algebra (§ 6. Vieta-transforming (10. but at the price of reintroducing an unwanted x2 or z 2 term. we should like to improve the notation by defining5 p P ←− .2. but after weeks or months of such frustration one might eventually discover Vieta’s transform (10. various substitutions. we have the quadratic equation (w3 )2 = 2 p q w3 − 2 3 3 . For the moment.2) allows a cubic polynomial to have.10) seems to imply six roots. CUBICS AND QUARTICS roots would follow immediately.9) reducing (10. such a substitution does achieve it.11) Why did we not define P and Q so to begin with? Well. Lacking guidance. (10. We shall return to this point in § 10. one might try many. but no very simple substitution like (10. 3 q Q←+ . Vieta’s transform has reduced the original cubic to a quadratic. 3 (10.222 CHAPTER 10.10).10) which by (2.8) to read w3 + (p/3)3 = q. we lacked motivation to do so. That way is no good.2) we know how to solve. w3 Multiplying by w3 and rearranging terms. before unveiling (10. however.8) which invites the choice 2 wo ≡ p .1). 2 5 (10. w w3 2 wo ←x w (10. .

what the equations really imply is not six distinct roots but six distinct w.1: The method to extract the three roots of the general cubic polynomial.4 below. otherwise. which three do we really need and which three can we ignore as superfluous? The six w naturally come in two groups of three: one group of three from the one w3 and a second from the other. 10. x ≡ 0 w − P/w a2 z = x− 3 3 if P = 0 and Q = 0. one can choose either sign.10) are written 3 2 (w ) x3 = 2Q − 3P x. 3 (10. (In the definition of w3 .2) allows a cubic polynomial to have. we shall guess—and logically it is only a guess—that a single w3 generates three distinct x and thus (because z differs from x only by a constant offset) all three roots z. treated in §§ 10.1 seem to imply six roots.2. If . so in fact the equations imply only three x and thus three roots z.2 has observed. For this reason.13) = 2Qw + P .12) (10. However. SUPERFLUOUS ROOTS 223 Table 10.1 summarizes the complete cubic polynomial root extraction method in the revised notation—including a few fine points regarding superfluous roots and edge cases. The definition x ≡ w − P/w maps two w to any one x. the equations of Table 10.) 0 = z 3 + a2 z 2 + a1 z + a0 a1 a2 2 P ≡ − 3 3 a2 a2 a1 1 −2 −a0 + 3 Q ≡ 2 3 3 3 2Q if P = 0.6) and (10. The question then is: of the six w. with which (10.3. w3 ≡ Q ± Q2 + P 3 otherwise.10. 3 Table 10. double the three the fundamental theorem of algebra (§ 6.3 and 10.3 Superfluous roots As § 10.

Q2 + P 3 . just as the verb to square means “to raise to the second power. 6 w1 = 2Q2 + P 3 ± 2Q 6 Combining the last two on w1 . Q4 + 2Q2 P 3 + P 6 = Q4 + Q2 P 3 . then the second w3 cannot but yield the same three roots. (P 3 )(Q2 + P 3 ) = 0. rearranging terms and halving. then canceling offsetting terms and factoring. 6 The verb to cube in this context means “to raise to the third power. by successive steps. Cubing6 the last equation.” as to change y to y 3 . 6 w1 = −P 3 . CUBICS AND QUARTICS the guess is right. which means that the second w3 is superfluous and can safely be overlooked.224 CHAPTER 10. e+i2π/3 w1 − P e+i2π/3 w1 P e+i2π/3 w1 + −i2π/3 e w1 P e+i2π/3 w1 + w1 . But is the guess right? Does a single w3 in fact generate three distinct x? To prove that it does. e−i2π/3 w1 P = e−i2π/3 w1 + +i2π/3 . e w1 P = e−i2π/3 w1 + . −P 3 = 2Q2 + P 3 ± 2Q or. Squaring. w1 = e−i2π/3 w1 − P which can only be true if 2 w1 = −P. Q2 + P 3 = ∓Q Q2 + P 3 . Letting the symbol w1 represent the third w. Let us suppose that a single w3 did generate two w which led to the same x. Q2 + P 3 . Because x ≡ w − P/w. but squaring the table’s w3 definition for w = w1 . then (since all three w come from the same w3 ) the two w are e+i2π/3 w1 and e−i2π/3 w1 . let us suppose that it did not.” .

just as it ought to do. when the two w3 differ in magnitude one might choose the larger.4 Edge cases Section 10.1. 1 and 2. as was to be demonstrated. This certainly is possible. One can choose either sign. but e±i2π/3 = (−1 ± i 3)/2.32). the three x descending from a single w3 are indeed distinct.1 gives the quadratic solution w 3 ≡ Q ± Q2 + P 3 . EDGE CASES 225 The last equation demands rigidly that either P = 0 or P 3 = −Q2 . 7 Numerically. the Taylor series of Table 8. then. For example. of the two signs in the table’s quadratic solution w3 ≡ Q ± Q2 + P 3 demands to be considered. that nothing prevents two actual roots of a cubic polynomial from having the same value. In calculating the three w from w3 . if for no other reason than to offer the reader a model of how to think about edge cases on his own. Some cubic polynomials do meet the demand—§ 10. When this happens. They are e 1 √ −1 ± i 3 w1 .4 will treat these and the reader is asked to set them aside for the moment—but most cubic polynomials do not meet it.4. Therefore.10. it matters not which. the method of Table 10.3 excepts the edge cases P = 0 and P 3 = −Q2 . The assumption: that the three x descending from a single w3 were not distinct. Mostly the book does not worry much about edge cases. 10. incidentally. For most cubic polynomials. the cubic polynomial (z − 1)(z − 1)(z − 2) = z 3 − 4z 2 + 5z − 2 has roots at 1. and it does not mean that one of the two roots is superfluous or that the polynomial has fewer than three roots. so two roots come easier. 2 We should observe. provided that P = 0 and P 3 = −Q2 . with a single root at z = 2 and a double root—that is. (10.1 properly yields the single root once and the double root twice. Then√ other the ±i2π/3 w . two roots—at z = 1. or any other convenient root3 finding technique to find a single root w1 such that w1 = w3 . The conclusion: either. the contradiction proves false the assumption which gave rise to it. one can apply the Newton-Raphson iteration (4. Table 10. it can matter.14) w = w1 . because w appears in the denominator of x’s definition. As a simple rule. not both. but the effects of these cubic edge cases seem sufficiently nonobvious that the book might include here a few words about them.7 The one sign alone yields all three roots of the general cubic polynomial. .

In the edge case P = 0. and does not worry about choosing one w3 over the other. by sidestepping it. the trouble is that w3 = 0 and that no alternate w3 is available. w (2Q)1/3 . At the corner. w3 = 2Q or 0. This is fine. (2Q)1/3 ǫ ǫ x = w− =− + (2Q)1/3 = (2Q)1/3 . we shall consider first the edge cases themselves. or equivalently where P = 0 and Q = 0. This implies the triple root z = −a2 /3. or corner case. However. gives two distinct quadratic solutions w3 . w3 = Q. w3 = Q − w = − Q2 + ǫ3 = Q − (Q) 1 + ǫ3 2Q2 =− ǫ3 . which puts an awkward 0/0 in the table’s definition of x.3. so the one w3 suffices by default because the other w3 brings nothing different. changing P infinitesimally from P = 0 to P = ǫ. Both edge cases are interesting. The edge case P = 0 however does give two distinct w3 .3 has excluded the edge cases from its proof of the sufficiency of a single w3 . choosing the − sign in the definition of w3 . one of which is w3 = 0. no other x being possible. In the edge case P 3 = −Q2 . In this section. The edge case P 3 = −Q2 gives only the one quadratic solution w3 = Q.3 generally finds it sufficient to consider either of the two signs. or more precisely. which in this case means that x3 = 0 and thus that x = 0 absolutely. One of the two however is w3 = Q − Q = 0. The double edge case.1’s definition that x ≡ w − P/w. in applying the table’s method when P = 0. The edge case P = 0. Then. like the general non-edge case. which is awkward in light of Table 10. according to (10. both w3 are the same. CUBICS AND QUARTICS in which § 10. Let us now add the edge cases to the proof. 2Q ǫ .226 CHAPTER 10. it gives two quadratic solutions which happen to have the same value. We address this edge in the spirit of l’Hˆpital’s o rule. one chooses the other quadratic solution. One merely accepts that w3 = Q. then their effect on the proof of § 10. For this reason.12). w3 = Q + Q = 2Q. Section 10. arises where the two edges meet— where P = 0 and P 3 = −Q2 . In the edge case P 3 = −Q2 . x3 = 2Q − 3P x.

though. historically Ferrari discovered it earlier [51. this writer does not know. which he could not solve until Tartaglia applied Vieta’s transform to it. 8 Even stranger. but one supposes that it might make an interesting story. This completes the proof. What motivated Ferrari to chase the quartic solution while the cubic solution remained still unknown. whereas (1)1/3 = 1. 6’s footnote 8. we now turn our attention to the general quartic.10. Apparently Ferrari discovered the quartic’s resolvent cubic (10. . w (2Q)1/3 Evidently the roots come out the same. either way. +3 a3 4 4 . The details differ. ±i. strangely enough. The kernel of the quartic technique lies likewise in reducing the quartic to a cubic. and. “Quartic equation”].8 As with the cubic.17) . This may also be why the quintic is intractable—but here we trespass the professional mathematician’s territory and stray from the scope of this book. 10.5 Quartics Having successfully extracted the roots of the general cubic polynomial. the roots lying nicely along the Argand axes. (−1±i 3)/2. See Ch. 1/3 227 .5. 4 a3 4 2 (10. ǫ ǫ x = w − = (2Q)1/3 − = (2Q)1/3 .22).16) s ≡ −a2 + 6 . one begins solving the quartic by changing the variable x+h←z to obtain the equation x4 = sx2 + px + q. The (1)1/4 brings a much neater result. 3 2 p ≡ −a1 + 2a2 q ≡ −a0 + a1 a3 a3 −8 4 4 a3 a3 − a2 4 4 (10. The reason the quartic is simpler to reduce is probably related to the fact that (1)1/4 = √ ±1.15) (10. The kernel of the cubic technique lay in reducing the cubic to a quadratic. where h≡− a3 . w3 = Q + w = (2Q) Q2 + ǫ3 = 2Q. QUARTICS But choosing the + sign. in some ways the quartic reduction is actually the simpler.

one must regard (10. but not so u. look at (10. The left side of that equation is a perfect square. but with respect to x2 rather than x. Ferrari9 supplies the cleverness. where u remains to be chosen. what choice for u would be wise? Well.18) Now. And though j 2 and k2 depend on u. The variable u is completely free.18).16) further. too. better expressed. one must be cleverer. or. we have introduced it ourselves and can assign it any value we like. even after specifying u we remain free at least to choose signs for j and k. (10.17). though no choice would truly be wrong. (10. j 2 ≡ u2 + q. The clever idea is to transfer some but not all of the sx2 term to the equation’s left side by x4 + 2ux2 = (2u + s)x2 + px + q. we have that p2 = 4(2u + s)(u2 + q). that 4sq − p2 s . 0 = u3 + u2 + qu + 2 8 9 (10. CUBICS AND QUARTICS To reduce (10. (10.20) and substituting for j 2 and k2 from (10.21) Squaring (10.228 CHAPTER 10.19).18) easier to simplify. rearranging terms and scaling. So. j or k.19) properly. “Quartic equation”] . if it were that p = ±2jk.20) or.18) and (10. In these equations. p and q have definite values fixed by (10.2.19) 2 = k2 x2 + px + j 2 . As for u. then to complete the square on the equation’s left side as in § 2.22) [51. j= p . still. we propose the constraint that p = 2jk. The right side would be. as x2 + u where k2 ≡ 2u + s. one supposes that a wise choice might at least render (10. arbitrarily choosing the + sign. so. after distributing factors. 2k (10. s.

Equation (10.22) are both honored. 10. which we know by Table 10.20) into (10.23) implies the quadratic x2 = ±(kx + j) − u.25) wherein the two ± signs are tied together but the third. but the resolvent cubic is a voluntary constraint.22) is the resolvent cubic.25). 2 2 = kx + j . If the constraints (10. which (2. reveals the four roots of the general quartic polynomial. and (10. (10.22) of course yields three u not one. (10. Using the improved notation.21) then gives j.2) solves as x=± k ±o 2 k 2 2 (10. (10. then we can safely substitute (10. Table 10. .19) then gives k (again.23) The resolvent cubic (10.26) improves the notation. so we can just pick one u and ignore the other two. we can just pick one of the two signs).2 summarizes the complete quartic polynomial root extraction method. ±o sign is independent of the two. the change of variables K← k . 2 J ← j.21) and (10. GUESSING THE ROOTS 229 Equation (10.25). With u.24) ± j − u.6 Guessing the roots It is entertaining to put pencil to paper and use Table 10. (10. with the other equations and definitions of this section. j and k established.6. Equation (10.18) to reach the form x2 + u which is x2 + u 2 = k2 x2 + 2jkx + j 2 . and which we now specify as a second constraint.1’s method to extract the roots of the cubic polynomial 0 = [z − 1][z − i][z + i] = z 3 − z 2 + z − 1.1 how to solve for u. In view of (10.10.

CUBICS AND QUARTICS Table 10.2: The method to extract the four roots of the general quartic polynomial. the ±o is independent but the other two are tied together. J ≡ p/4K otherwise. where any one of the three resulting u serves.) 0 = z 4 + a3 z 3 + a2 z 2 + a1 z + a0 a3 2 s ≡ −a2 + 6 4 a3 a3 3 p ≡ −a1 + 2a2 −8 4 4 a3 2 a3 a3 − a2 +3 q ≡ −a0 + a1 4 4 4 s 2 4sq − p2 0 = u3 + u + qu + 8 √ 2 2u + s K ≡ ± 2 ± u2 + q if K = 0.1.230 CHAPTER 10. the four resulting combinations giving the four roots of the general quartic. Of the three ± signs in x’s definition. the resolvent cubic is solved for u by the method of Table 10. (In the table. x ≡ ±K ±o a3 z = x− 4 K2 ± J − u 4 . Either of the two K similarly serves.

1 and 10. ± . one finds that 1. 2 2 Doubling to make the coefficients all integers produces the polynomial 2z 5 − 7z 4 + 8z 3 + 1z 2 − 0xAz + 6. but if any of the roots happens to be real and rational. 10 .10. it must belong to the set 1 2 3 6 ±1. ±3 and ±6) divided by the factors of the polynomial’s leading coefficient (in the example. No other real. whose factors are ±1 and ±2). One has found the number 1 but cannot recognize it. rational root is possible. 3 3 w √ 2 5 + 33 . ±2. ± 2 2 2 2 . but why? The root’s symbolic form gives little clue. Figuring the square and cube roots in the expression numerically. GUESSING THE ROOTS One finds that z = w+ w3 ≡ 1 2 − 2 . rational candidates are the factors of the polynomial’s trailing coefficient (in the example. which naturally has the same roots. a quartic. but just you try to simplify it! A more baroque. ±3.10 we are stuck with the cubic baroquity. ± . 6. Consider for example the quintic polynomial 1 7 z 5 − z 4 + 4z 3 + z 2 − 5z + 3. the author would like to hear of it.2 and guess the roots directly. a quintic or any other polynomial has real. However. ±2. to the extent that a cubic. Dividing these out leaves a quadratic which is easy to solve for the remaining roots. rational root is At least. The reason no other real. more impenetrable way to write the number 1 is not easy to conceive. The real.0000. whose factors are ±1. no better way is known to this author. rational roots. If the roots are complex or irrational. the root of the polynomial comes mysteriously to 1. ± .6. 2. 33 231 which says indeed that z = 1. a trick is known to sidestep Tables 10. −1 and 3/2 are indeed roots. In general no better way is known. Trying the several candidates on the polynomial. ±i. ±6. If any reader can straightforwardly simplify the expression without solving a cubic polynomial of some kind. they are hard to guess.

12 [46.12 Such root-guessing is little more than an algebraic trick. By similar reasoning. rational root is possible except a factor of a0 divided by a factor of an . We conclude for this reason. Moving the q n term to the equation’s right side. as was to be demonstrated. which implies that a0 q n is a multiple of p. We do not want to spend many pages on this. The next several chapters turn to the topic of the matrix. harder but much more profitable. that no real. The presentation here is quite informal. we have that an pn−1 + an−1 pn−2 q + · · · + a1 q n−1 p = −a0 q n . § 3. but now that the reader has tasted the topic he may feel inclined to agree that. though the general methods this chapter has presented to solve cubics and quartics are interesting. we have defined p and q to be relatively prime to one another—that is. we have defined them to have no factors but ±1 in common—so. then p and q are factors of a0 and an respectively. a multiple of q. But by demanding that the fraction p/q be fully reduced. toward which we mean to put substantial effort. One could write much more about higher-order algebra. of course. and an . CUBICS AND QUARTICS possible is seen11 by writing z = p/q—where p and q are integers and the fraction p/q is fully reduced—then multiplying the nth-order polynomial by q n to reach the form an pn + an−1 pn−1 q + · · · + a1 pq n−1 + a0 q n = 0. further effort were nevertheless probably better spent elsewhere.2] 11 . but it can be a pretty useful trick if it saves us the embarrassment of inadvertently expressing simple rational numbers in ridiculous ways. where all the coefficients ak are integers. But if a0 is a multiple of p.232 CHAPTER 10. an is a multiple of q. not only a0 q n but a0 itself is a multiple of p.

or how C might scale something. the √ eigenvalues of C happen to be −1 and [7 ± 0x49]/2). but prior to this point in the book it was usually trivial—the eigenvalue of 5 is just 5. for example. But. just what is an eigenvalue? Answer: an eigenvalue is the value by which an object like C scales an eigenvector without altering the eigenvector’s direction. but it is to answer precisely such questions that this chapter and the three which follow it are written. we have not yet said what an eigenvector is. 14. most of the foundational methods of the earlier chapters have handled only one or at most a few numbers (or functions) at a time. Where we have used single numbers as coefficients or multipliers 233 . So. we are getting ahead of ourselves. either. Taken by themselves. as. It serves as a generalized coefficient or multiplier. However. Regarding the eigenvalue: the eigenvalue was always there. Let’s back up. This chapter begins to build on those foundations. for instance—so we didn’t bother much to talk about it. demanding some heavier mathematical lifting. the eigenvalue of Ch. It is when numbers are laid out in orderly grids like C= 2 6 4 3 3 4 0 1 3 0 1 5 0 that nontrivial eigenvalues arise (though you cannot tell just by looking. An object like C is called a matrix. in practical applications the need to handle large arrays of numbers at once arises often.Chapter 11 The matrix Chapters 2 through 9 have laid solidly the basic foundations of applied mathematics. Some nonobvious effects emerge then. Of course.

so in this example m = 2 and n = 3). 6 1. This book does not treat them. the chief advantage of the standard matrix is that it neatly represents the linear transformation of one vector into another. THE MATRIX heretofore. Fifth. which can be written in-line with the notation A = [0 6 2. an m-element vector does not differ for most purposes from an m×1 matrix. a column of m scalars like u= » 5 −4 + i3 – .1. Third. “Matrix stacks” bring no such advantage. vector. matrix suggests next a “matrix stack” or stack of p matrices. in square brackets. the three-element (that is. The technical name for the “single number” is the scalar. even infinity. is called a scalar because its action alone during multiplication is simply to scale the thing it multiplies. Such a number.1 Where one needs visually to distinguish a symbol like A representing a matrix. one can with sufficient care often use matrices instead. First.234 CHAPTER 11. though the progression scalar.2 Normally however a simple A suffices. such objects in fact are seldom used. • the vector. even one. which can be written in-line with the notation u = [5 −4 + i3]T (here there are two scalar elements. so in this example m = 2). 1 . Fourth. As we shall see in § 11. generally the two can be regarded as the same thing. 5 and −4+i3. like – » 0 6 2 A = 1 1 −1 . a single number like α = 5 or β = −4 + i3. three-dimensional) geometrical vector of § 3. or equivalently a row of n vectors. The matrix interests us for this reason among others. m and n can be any nonnegative integers. despite the geometrical Argand interpretation of the complex number. Besides acting alone. scalars can also act in concert—in orderly formations—thus constituting any of three basic kinds of arithmetical object: • the scalar itself. Second. one can write it [A].3 is just an m-element vector with m = 3. 1 1 −1] or the notation A = [0 1. • the matrix. even zero. as for instance 5 or −4 + i3. 2 −1]T (here there are two rows and three columns of scalar elements. Several general points are immediately to be observed about these various objects. an m × n grid of scalars. 2 Alternate notations seen in print include A and A. therefore any or all of a vector’s or matrix’s scalar elements can be complex. however. a complex number is not a two-element vector but a scalar.

no more likely to be mastered by mere theoretical study than was the classical arithmetic of childhood. and using a sword. The reward is worth the effort. 11 and 12 as need arises. At least. Would the rebel consider alternate counsel? If so.3. The reader who has previously drilled matrix arithmetic will meet here the essential applied theory of the matrix. Only the doggedly determined beginner will learn the matrix here alone. is likely to find these chapters positively hostile. decomposing each carefully by pencil per the Gauss-Jordan method of § 12. The way of the warrior is hard. necessary the chapters may be. the earlier parts are not very exciting (later parts are better). Part of the trouble with the matrix is that its arithmetic is just that. The matrix upsets this balance. surprisingly less dull [23] for instance. Applied mathematics brings nothing else quite like it. using a machine defeats the point of the exercise. an arithmetic. still determined to learn the matrix here alone. logical. one must drill it. yet the book you hold is fundamentally one of theory not drill. the author salutes his honorable defiance. The most basic body of matrix theory. however. That reader will find this chapter and the next three tedious enough. the beginner can expect to find these chapters still tedious but no longer impenetrable. The reader who has not previously drilled matrix arithmetic. square and tall. 13 and 14. others will find it more amenable to drill matrix arithmetic first in the early chapters of an introductory linear algebra textbook. dull though such chapters be (see [31] or better yet the fine. Substantial. without which little or no useful matrix work can be done. but exciting they are not. even interminable at first encounter. no one has ever devised a way to introduce the matrix which does not seem shallow. The idea of the matrix is deceptively simple. referring back to Chs. yet the matrix is too important to ignore.235 The matrix is a notoriously hard topic to motivate. though the early chapters of almost any such book give the needed arithmetical drill. but to understand the matrix one must first understand the tedium and clutter the matrix encapsulates. the young warrior with face painted and sword agleam. checking results (again by pencil. As far as the author is aware.) Returning here thereafter. Several hours of such drill should build the young warrior the practical arithmetical foundation to master—with commensurate effort—the theory these chapters bring. then the rebel might compose a dozen matrices of various sizes and shapes. That is the approach the author recommends. but conquest is not impossible. . The mechanics of matrix arithmetic are deceptively intricate.3 Chapters 11 through 14 treat the matrix and its algebra. To the mathematical rebel. well. As a reasonable compromise. tiresome. 3 In most of its chapters. it won’t work) by multiplying factors to restore the original matrices. broad. the veteran seeking more interesting reading might skip directly to Chs. To master matrix arithmetic. To the matrix veteran. This chapter. is deceptively extensive. irksome. the author presents these four chapters with grim enthusiasm. the book seeks a balance between terseness the determined beginner cannot penetrate and prolixity the seasoned veteran will not abide. The matrix neatly encapsulates a substantial knot of arithmetical tedium and clutter.

The linear transformation 5 is the operation of an m × n matrix A.1) to transform an n-element vector x into an m-element vector b. 2 and 5] has much to recommend it.3.1 The linear transformation Section 7. (11. We begin there.2) is the 2 × 3 matrix which transforms a three-element vector x into a twoelement vector b such that Ax = where x= 4 5 » 2 0x1 + 6x2 + 2x3 1x1 + 1x2 − 1x3 – = b.236 CHAPTER 11. 1 and 2][31. The professional approach [2. introduces the rudiments of the matrix itself.4 11. . while respecting the rules of linearity A(x1 + x2 ) = Ax1 + Ax2 = b1 + b2 . 1. 3 x1 4 x2 5. (11. but it is not the approach we will follow here. the basis set and the simultaneous system of linear equations—proving from suitable axioms that the three amount more or less to the same thing.1. A= » 0 1 6 1 2 −1 – = αb. [2][15][23][31] Professional mathematicians conventionally are careful to begin by drawing a clear distinction between the ideas of the linear transformation. 11.3 has introduced the idea of linearity. as in Ax = b. 11. A(αx) = αAx A(0) = 0.1 Provenance and basic use It is in the study of linear transformations that the concept of the matrix first arises. rather than implicitly assuming the fact. THE MATRIX Ch. Chs. x3 b= » b1 b2 – . For example. Chs.

there are unfortunately not enough distinct Roman and Greek letters available to serve the needs of higher mathematics.1. industrial mass production-style. however. the operation of a matrix A is that6. is compactly represented as Ax = b.3. matrices can also represent simultaneous systems of linear equations. 11. one can use ℓjk or some other convenient letters for the indices. the system 0x1 + 6x2 + 2x3 = 2. following mathematical convention in the matter for this reason. P the meaning is usually clear from the context: i in i or aij is an index.11. 1x1 + 1x2 − 1x3 = 4. PROVENANCE AND BASIC USE In general. with A as given above and b = [2 4]T . and aij ≡ [A]ij is the element at the ith row and jth column of A.7 n 237 bi = j=1 aij xj . Seen from this point of view. In computers. The book you are reading tends to let the index run from 1 to n. Here. so the computer’s indexing convention poses no dilemma in this case. which is not an index and has nothing to do with indices. but the same letter i also serves as the imaginary unit. a simultaneous system of linear equations is itself neither more nor less than a linear transformation.1. a 0 index normally implies something special or basic about the object it identifies.3) where xj is the jth element of x. i in −4 + i3 or eiφ is the imaginary unit. transforming them at once into the corresponding As observed in Appendix B. a12 = 6). an m × n matrix can be considered an ∞ × ∞ matrix with zeros in the unused cells. the index normally runs from 0 to n − 1. counting from top left (in the example for instance. both indices i and j run from −∞ to +∞ anyway. and in many ways this really is the more sensible way to do it. Besides representing linear transformations as such. 6 . 7 Whether to let the index j run from 0 to n−1 or from 1 to n is an awkward question of applied mathematical style.2 Matrix multiplication (and addition) Nothing prevents one from lining several vectors xk up in a row. In matrix work. bi is the ith element of b. In mathematical theory. Fortunately. Should a case arise in which the meaning is not clear. See § 11. the Roman letters ijk conventionally serve as indices. Conceived more generally. For example. (11.

one multiplies each of the matrix’s elements individually by the scalar: [αA]ij = αaij .4) implies a definition for matrix multiplication. however. (A + C)(X) = AX + CX. (11. matrix addition is indeed distributive: [X + Y ]ij = xij + yij .4). (11. X ≡ [ x1 x2 · · · B ≡ [ b1 b2 · · · AX = B.4) bik = j=1 aij xjk . (11.6) as one can show by a suitable counterexample like A = [0 1.7) Evidently multiplication by a scalar is commutative: αAx = Aαx.5) (11. Such matrix multiplication is associative since n [(A)(XY )]ik = j=1 n aij [XY ]jk p = aij j=1 p n ℓ=1 xjℓ yℓk aij xjℓ yℓk = ℓ=1 j=1 = [(AX)(Y )]ik . X = [1 0. under multiplication. In this case. Equation (11. (A)(X + Y ) = AX + AY . (11. AX = XA. bp ]. Matrix addition works in the way one would expect. To multiply a matrix by a scalar. element by element. THE MATRIX vectors bk by the same matrix A. and as one can see from (11.238 CHAPTER 11. n xp ]. 0 0]. Matrix multiplication is not generally commutative. 0 0].8) .

the concept is important. however. from the left.10) or a row operation (11. the same matrix equation represents something else.1. It operates on A’s columns. all columns of X”—that is. row operators. the jth column of A. one writes (11. it represents a weighted sum of the columns of A. The ∗ here means “any” or “all. In this view. Since matrix multiplication produces the same result whether one views it as a linear transformation (11. By virtue of multiplying A from the right. If several vectors xk line up in a row to form a matrix X. with the elements of x as the weights. the matrix X operates on A’s columns. (11. This rule is worth memorizing. the vector x is a column operator acting on A. Similarly.3 Row and column operators The matrix equation Ax = b represents the linear transformation of x into b. (11.11. the matrix A operates on X’s rows. besides (11.11). If a matrix multiplying from the right is a column operator. is n [B]i∗ = j=1 aij [X]j∗ . is a matrix multiplying from the left a row operator? Indeed it is.” Hence [X]j∗ means “jth row. a column operation (11.10). (11. PROVENANCE AND BASIC USE 239 11. the jth row of X.4). such that AX = B. as we have seen.11) The ith row of A weights the several rows of X to yield the ith row of B. In AX = B.3) as n b= j=1 [A]∗j xj . [A]∗j means “all rows. jth column of A”—that is. (Observe the notation. it is also an operator.1. then the matrix X is likewise a column operator: n [B]∗k = j=1 [A]∗j xjk .) Column operators attack from the right. it is not so much for the sake . However. Another way to write AX = B. The matrix A is a row operator. one might wonder what purpose lies in defining matrix multiplication three separate ways. Viewed from another perspective.10) The kth column of X weights the several columns of A to yield the kth column of B.9) where [A]∗j is the jth column of A. Here x is not only a vector.

It is worth developing the mental agility to view and handle matrices all three ways for this reason.240 CHAPTER 11. 11.4 The transpose and the adjoint One function peculiar to matrix algebra is the transpose C = AT . A tedious. but algebraically the transpose is artificial.4). the latter two do indeed expand to yield (11.1. −1 (11.63). The transpose is convenient notationally to write vectors and matrices inline and to express certain matrix-arithmetical mechanics. Alternate notations sometimes seen in print for the adjoint include A† (a notation which in this book means something unrelated) and AH (a notation which recalls the name of the mathematician Charles Hermite). the transpose and the adjoint amount to the same thing. the book you are reading writes the adjoint only as A∗ . of course. It is the adjoint rather which mirrors a matrix properly. nonintuitive matrix theorem from one perspective can appear suddenly obvious from another (see for example eqn.12) Similar and even more useful is the conjugate transpose or adjoint 8 C = A∗ . We do it for ourselves.” whereas the adjoint would be “snoitavired. cij = a∗ . then the transpose of “derivations” would be “snoitavired. ji (11. but as written the three represent three different perspectives on the matrix. For example. Results hard to visualize one way are easy to visualize another.” See the difference?) On real-valued matrices like the A in the example. a notation which better captures the sense of the thing in the author’s view. THE MATRIX of the mathematics that we define it three ways as it is for the sake of the mathematician. cij = aji . (If the transpose and adjoint functions applied to words as to matrices. However.13) which again mirrors an m × n matrix into an n × m matrix. 11. AT = 2 0 4 6 2 3 1 1 5. but conjugates each element as it goes. 8 . Mathematically. which mirrors an m × n matrix into an n × m matrix.

0 otherwise. Later. “Kronecker delta. 31 May 2006] . THE KRONECKER DELTA 241 If one needed to conjugate the elements of a matrix without transposing the matrix itself. The discrete analog of the Dirac delta is the Kronecker delta 10 δi ≡ or δij ≡ ∞ i=−∞ ∞ i=−∞ ∞ j=−∞ 1 if i = 0. Observe that (A2 A1 )T = AT AT . [52.13) and (7.2.14). 1 2 (A2 A1 )∗ = A∗ A∗ . however.17) The Kronecker delta enjoys the Dirac-like properties that δi = δij = δij = 1 (11.3 that k Ak = · · · A3 A2 A1 . whereas k Ak = A1 A2 A3 · · · .2 The Kronecker delta Section 7.3 will revisit the Kronecker delta in another light. Such a need seldom arises. Chs. 9 10 Q ‘ Recall from § 2.19) parallel the Dirac equations (7. (11. k k Ak k = 11.4.18) and that δij ajk = aik .15) A∗ . 1 2 and more generally that9 T (11.19) the latter of which is the Kronecker sifting property.” 15:59.18) and (11.11.7 has introduced the Dirac delta. 0 otherwise. ∞ j=−∞ (11. k (11. 11 and 14 will find frequent use for the Kronecker delta. one could contrive notation like A∗T . 1 if i = j.14) Ak k ∗ = k AT . The Kronecker equations (11.16) (11. § 15.

. . if we want. . . for instance. (11. .. We can also define A5 . .3 Dimensionality and matrix forms X= 2 −4 4 1 2 3 0 2 5 −1 . 7 7 7 7 7 7 5 aij xjk . the matrix multiplication rule (11. x(−1)(−1) = 0. . . 0 0 0 0 0 0 0 .20) For square matrices whose purpose is to manipulate other matrices or vectors in place. 0 0 0 0 0 0 0 . As before. . . . . THE MATRIX 11. 0 0 0 0 0 0 0 . . .” 1 6 5 6 4 0 0 0 1 0 0 0 0 1 0 0 0 7 7. merely padding with zeros often does not suit. all these express the same operation: “to add to the second row. . 3 An m × n matrix like can be viewed as the ∞ × ∞ matrix 2 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 4 . its effect is to add to X’s second row. Further consider 3 2 A4 = which does the same to a 4 × p matrix X. A7 . . . bik = ∞ j=−∞ 7 7 7 7 7 7 7 7 7. with zeros in the unused cells. indeed for all matrices.242 CHAPTER 11. 0 0 0 0 0 0 0 . . ··· ··· ··· ··· ··· ··· ··· . . . . really. x11 = −4 and x32 = −1. Consider for example the square matrix A3 = 2 1 4 5 0 0 1 0 3 0 0 5. . .4) generalizes to B = AX. ··· ··· ··· ··· ··· ··· ··· . . 0 0 0 0 0 0 0 −4 0 0 1 2 0 2 −1 0 0 0 0 0 0 . . X= . . .. . For such a matrix. 0 5 1 . .. . but. . . but now xij exists for all integral i and j. 5 times the first. . . . . but when it acts A3 X as a row operator on some 3 × p matrix X. 1 This A3 is indeed a matrix. . A6 . 5 times the first. . .

11. . ··· ··· ··· ··· ··· ··· ··· . .1. A adds to the second row of X. DIMENSIONALITY AND MATRIX FORMS The ∞ × ∞ matrix 2 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 4 . . And really. . . . the idea of an ∞ × 1 vector or an ∞ × ∞ matrix should not seem so strange. . . 0 0 0 1 0 0 0 . 12 The idea of infinite dimensionality is sure to discomfit some readers. 0 0 0 0 1 0 0 .. . .20) that when A acts AX on a matrix X of any dimensionality whatsoever. only that that view does not suit the present book’s development. among others.11 This section explains. which holds all values of the function sin θ of a real argument θ. . that the idea thereof does not threaten to overturn the reader’s pre¨xisting matrix knowledge. As before. 1 0 0 0 0 0 0 . 0 0 0 0 0 1 0 . any more than one actually writes down or stores all the bits (or digits) of 2π. . Traditionally in matrix work and elsewhere in the book. . ··· ··· ··· ··· ··· ··· ··· . 3 7 7 7 7 7 7 7 7 7 7 7 7 7 7 7 5 243 A= expresses the operation generally. and has no second row? Then the operation AX creates a new second row. . that. who have studied matrices before and are used to thinking of a matrix as having some definite size. a11 = 1 and a21 = 5. . (But what if X is a 1 × p matrix. . . Different ways of looking at the same mathematics can be extremely useful to the applied mathematician. . . By running ones infinitely both ways out the main diagonal. 5 times the first—and affects no other row. .) In the infinite-dimensional view. 5 times the first—or rather so fills in X’s previously null second row. . the matrices A and X differ essentially. 0 0 1 5 0 0 0 . Such usage admittedly proves awkward in other contexts. consider the vector u such that uℓ = sin ℓǫ.. . . After all. the letter A does not necessarily represent an extended operator as it does here. but now also a(−1)(−1) = 1 and a09 = 0. There is nothing wrong with thinking of a matrix as having some definite size. .12 This particular section happens to use the symbols A and X to represent certain specific matrix forms because such usage flows naturally from the usage Ax = b of § 11. . Writing them down or storing them is not the point. . developing some nonstandard formalisms the derivations of later sections and chapters can use. . . . The applied mathematical reader who has never heretofore considered 11 . . but rather an arbitrary matrix of no particular form. we guarantee by (11.3. 0 0 0 0 0 0 1 . where 0 < ǫ ≪ 1 and ℓ is an integer. . . Of course one does not actually write down or store all the elements of an infinite-dimensional vector or matrix. no fundamental conceptual barrier rises against it. . . The point is that infinite dimensionality is all right. 0 1 0 0 0 0 0 . . though the construct seem e unfamiliar. .

. too. What really counts is not a matrix’s m × n dimensionality but rather its rank. inasmuch as X differs from the null matrix only in the mn elements xij . but usually a simple 0 suffices. . Basically. . . Special symbols like 0. .244 CHAPTER 11. but those aren’t what we were speaking of here. . . 0 or O are possible for the null matrix. 7 7 7 7 7 7 7. ··· ··· ··· ··· ··· . The same symbol 0 used for the null scalar (zero) and the null matrix is used for the null vector. or to summing an infinity of zeros as in Ch. . . 12. THE MATRIX 11. There are no surprises here. 7. to retain complete information about such a matrix. as a variation on the null matrix. the null matrix brings all the expected properties of a zero. one need record only the mn elements. 3 0= 6 6 6 6 6 6 6 6 6 6 6 4 ··· ··· ··· ··· ··· . . 13 Well. . . . . So the semantics are these: when we call a matrix X an m × n matrix. or more precisely a dimension-limited matrix with an m × n infinite dimensionality in vectors and matrices would be well served to take the opportunity to do so here. 0 0 0 0 0 .. 7 7 7 7 5 or more compactly. dimensionality is a poor measure of a matrix’s size in any case. the vector 0 and the matrix 0 actually represent different things is a matter of semantics. there’s not much else to it. 0 0 0 0 0 .3. . Whether the scalar 0. As we shall discover in Ch. 1 ≤ i ≤ m. plus the values of m and n. when it comes to dividing by zero as in Ch. . but the three are interchangeable for most practical purposes in any case. . . like 0 + A = A. 0 0 0 0 0 . 0 0 0 0 0 . 0 0 0 0 0 . 1 ≤ j ≤ n. infinitedimensionally. .13 Now a formality: the ordinary m × n matrix X can be viewed. [0][X] = 0. . Though the theoretical dimensionality of X be ∞ × ∞. . a zero is a zero is a zero. . . 4.. there’s a lot else to it. [0]ij = 0.1 The null and dimension-limited matrices The null matrix is just what its name implies: 2 . . of course. . . .

. . . . . . .2 The identity and scalar matrices and the extended operator The general identity matrix —or simply. . . we shall mean formally that X is an ∞ × ∞ matrix whose elements are all zero outside the m × n rectangle: xij = 0 except where 1 ≤ i ≤ m and 1 ≤ j ≤ n. . . . . .22) where δij is the Kronecker delta of § 11. the identity matrix —is 2 6 6 6 6 6 6 6 6 6 6 6 4 . . .23) λI = ··· ··· ··· ··· ··· ..3. The I can be regarded as standing for “identity” or as the Roman numeral I. . . .2. or more compactly. . The identity matrix I is a matrix 1. . .3. . but a 4 × 4 matrix is not in general a 3 × 2 matrix. .24) 14 In fact you can write it as 1 if you like. DIMENSIONALITY AND MATRIX FORMS 245 active region. . 0 λ 0 0 0 . . .14 bringing the essential property one expects of a 1: IX = X = XI. 7 7 7 7 5 (11. . 0 0 0 1 0 . . . . The scalar matrix is 2 6 6 6 6 6 6 6 6 6 6 6 4 . .. [λI]ij = λδij . . 3 7 7 7 7 7 7 7. . 3 7 7 7 7 7 7 7. That is essentially what it is. . . . . . . .. 0 0 0 0 λ . . .11. (11. ··· ··· ··· ··· ··· . (11. 7 7 7 7 5 I= ··· ··· ··· ··· ··· . . 0 0 0 λ 0 . every 3 × 2 matrix (for example) is also a formally a 4 × 4 matrix. . . 0 0 λ 0 0 . 0 1 0 0 0 . . 0 0 1 0 0 . .. . [I]ij = δij . 1 0 0 0 0 . as it were. or more compactly. 0 0 0 0 1 . ··· ··· ··· ··· ··· .21) By these semantics. . . 11. . (11. . λ 0 0 0 0 .

26) λ = 0. . (11. . an extended n × n operator is one whose several αk all lie within the n × n square. 7 7 7 7 5 A= ··· ··· ··· ··· ··· . (Often in practice for smaller operators—especially in the typical case that λ = 1—one finds it easier just to record all the n × n elements of the active region.246 CHAPTER 11. inasmuch as A differs from λI only in p specific elements. to retain complete information about such a matrix. . 0 0 0 0 λ . Large matrix operators however tend to be . then the scalar matrix λI is a matrix λ.27) That is. 1). . plus the scale λ. otherwise. When we call a matrix A an extended n × n operator. such that [λI]X = λX = X[λI]. . . . . . The several αk control how the extended operator A differs from λI. with p a finite number. 4) and (4. this information alone implies the entire ∞ × ∞ matrix A. . (3. or an extended operator with an n × n active region. etc. The A in the example is an extended 4 × 4 operator (and also a 5 × 5. 1 ≤ k ≤ p. 3 7 7 7 7 7 7 7. α2 and α3 . . λ = 0. . 0 λ 0 0 0 . we shall mean formally that A is an ∞ × ∞ matrix and is further an extended operator for which 1 ≤ ik ≤ n and 1 ≤ jk ≤ n for all 1 ≤ k ≤ p. j) = (ik . aij = (λ)(δij + αk ) λδij if (i. 4). This is fine. for an extended operator fitting the pattern 2 6 6 6 6 6 6 6 6 6 6 6 4 . One need record only the several αk along with their respective addresses (ik . 0 0 λ 0 0 . . a 6 × 6. . . . jk ). (11. . Symbolically. . THE MATRIX If the identity matrix I is a matrix 1. . . one need record only the values of α1 .. but not a 3 × 3). . For example. the respective addresses (2. jk ).25) The identity matrix is (to state the obvious) just the scalar matrix with λ = 1. .. . . 0 0 λα2 λ(1 + α3 ) 0 .. The extended operator A is a variation on the scalar matrix λI. ··· ··· ··· ··· ··· . (11. and the value of the scale λ. . λ λα1 0 0 0 . .

bearing repeated entries not only along the main diagonal but also along the diagonals just above and just below. 2 × 2. meaning that they depart from λI in only a very few of their many elements. If λ = 0. j) ≤ n.3. here it is in outline: aij bij [AB]ij = = = = = λa δij unless 1 ≤ (i. j) ≤ n. otherwise the product is an extended operator.4 Other matrix forms Besides the dimension-limited form of § 11.3.1 and the extended-operational form of § 11. so one normally stores just the few elements.3 The active region Though maybe obvious. Section 11. λ = 0. however. etc. j) ≤ n.). the dimensionThe section’s earlier subsections formally define the term active region with respect to each of the two matrix forms. the scalar matrix λI is just the null matrix. are extended operators with 0 × 0 active regions (and also 1 × 1. DIMENSIONALITY AND MATRIX FORMS 247 sparse. 15 λb δij unless 1 ≤ (i.2.9 introduces one worthwhile matrix which fits neither the dimension-limited nor the extended-operational form.3.16 11.) If any of the factors has dimension-limited form then so does the product. it bears stating explicitly that a product of dimension-limited and/or extended-operational matrices with n × n active regions itself has an n × n active region.) Implicit in the definition of the extended operator is that the identity matrix I and the scalar matrix λI.15 (Remember that a matrix with an m′ × n′ active region also by definition has an n × n active region if m′ ≤ n and n′ ≤ n. . 16 If symbolic proof of the subsection’s claims is wanted. 11. Pk (λa δik )bkj = λa bij unless 1 ≤ i ≤ n k aik (λb δkj ) = λb aij unless 1 ≤ j ≤ n It’s probably easier just to sketch the matrices and look at them.11. recording only nonzero elements and their addresses in an otherwise null matrix. though. other infinite-dimensional matrix forms are certainly possible. instead. Still. X aik bkj k (P λa λb δij unless 1 ≤ (i. which is no extended operator but rather by definition a 0 × 0 dimension-limited matrix. One could for example advantageously define a “null sparse” form.3.3. or a “tridiagonal extended” form. It would waste a lot of computer memory explicitly to store all those zeros.

The areas of I3 not shown are all zero.) The rank r can be any nonnegative integer. otherwise.3. Further reasons to have defined such forms will soon occur. The name “rank-r” implies that Ir has a “rank” of r. even along the main diagonal. I3 . and they are the ones we shall principally be handling in this book.5 defines rank for matrices more generally.1.29) where X is an m × n matrix and x. we shall discern the attribute of rank only in the rank-r identity matrix itself. (11. though a 3 × 3 matrix. and indeed it does. The effect of Ir is that Im X = X = XIn . . One reason to have defined specific infinite-dimensional matrix forms is to show how straightforwardly one can fully represent a practical matrix of an infinity of elements by a modest. 11. Examples of Ir include I3 = 1 4 0 0 0 1 0 (Remember that in the infinite-dimensional view.28) where either the “and” or the “or” can be regarded (it makes no difference). It has only the three ones and fits the 3 × 3 dimension-limited form of § 11. finite quantity of information. even zero (though the rankzero identity matrix I0 is in fact the null matrix.30) can be used. Section 12. If alternate indexing limits are needed (for instance for a computer-indexed b identity matrix whose indices run from 0 to r − 1). Im x = x. however.248 CHAPTER 11. is formally an ∞ × ∞ matrix with zeros in the unused cells. (11. 1 (11. which is just the count of ones along the matrix’s main diagonal. normally just written 0). otherwise.3. where b [Ia ]ij ≡ δij 0 if a ≤ i ≤ b and/or a ≤ j ≤ b. the notation Ia . the rank in this case is r = b − a + 1.5 The rank-r identity matrix The rank-r identity matrix Ir is the dimension-limited matrix for which [Ir ]ij = δij 0 if 1 ≤ i ≤ r and/or 1 ≤ j ≤ r. THE MATRIX limited and extended-operational forms are normally the most useful. For the moment. an m × 1 vector. 2 3 0 0 5.

(After all.) 17 . Attacking from the right. which has a one as the mth element: 1 if i = m.7 The elementary vector and the lone-element matrix The lone-element matrix Emn is the matrix with a one in the mnth cell and zeros elsewhere: [Emn ]ij ≡ δim δjn = 1 if i = m and j = n. Attacking from the left. [I]∗j = ej and [I]i∗ = eT . 0 otherwise.6 The truncation operator The rank-r identity matrix Ir is also the truncation operator. (11.32) By this definition.j cij Eij for any matrix C. The vector analog of the lone-element matrix is the elementary vector em . it retains the first through rth columns. • The rank-r identity matrix Ir commutes freely past C. 11.31) says at least two things: • It is superfluous to truncate both rows and columns.3. as in Ir A. (11. Whether a matrix C has dimension-limited or extended-operational form (though not necessarily if it has some other form). There would be nothing there to truncate.3. (11. By this definition. if it were always all zero outside. That a matrix has an m × n active region does not necessarily mean that it is all zero outside the m × n rectangle.3. then there would be little point in applying a truncation operator. (11. n ≤ r.1 and 11.11. it suffices to truncate one or the other.33) [em ]i ≡ δim = 0 otherwise. C = i. i Refer to the definitions of active region in §§ 11.2. if it has an m × n active region17 and m ≤ r.31) then Ir C = Ir CIr = CIr . Evidently big identity matrices commute freely where small ones cannot (and the general identity matrix I = I∞ commutes freely past everything).3. For such a matrix. Such truncation is useful symbolically to reduce an extended operator to dimension-limited form.3. DIMENSIONALITY AND MATRIX FORMS 249 11. as in AIr . it retains the first through rth rows of A but cancels other rows.

3. .18 • The first is the interchange elementary T[i↔j] = I − (Eii + Ejj ) + (Eij + Eji ). Conventionally denoted T .35) The product of matrices has off-diagonal entries in a row or column only if at least one of the factors itself has off-diagonal entries in that row or column. The elementary operator T comes in three kinds.3 has introduced the general row or column operator. where j = i.1.6). the ith row or jth column of the product of matrices can depart from eT or ej . which is to say that the operator doesn’t actually do anything. The reason is that in (11. that if C1 ’s ith row is eT . There exist legitimate tactical reasons to forbid (as in § 11.19 18 In § 11. which itself is i just eT . then its action is merely to duplicate C2 ’s ith row. Or. some authors [31] forbid T[i↔i] as an elementary operator.4 The elementary operator Section 11. then also [C1 C2 ]∗j = ej . but normally this book permits. 19 As a matter of definition. THE MATRIX 11. C1 acts as a row operator on C2 .36) which by operating T[i↔j]A or AT[i↔j] respectively interchanges A’s ith row or column with its jth. (11. Parallel logic naturally applies to (11. the elementary operator is a simple extended row or column operator from sequences of which more complicated extended operators can be built. (11.8 Off-diagonal entries It is interesting to observe and useful to note that if [C1 ]i∗ = [C2 ]i∗ = eT .3.34) 11.35). the symbol A specifically represented an extended operator. but here and generally the symbol represents any matrix. i (11.250 CHAPTER 11. less readably but more precisely. since after all T[i↔i] = I. only if the i corresponding row or column of at least one of the factors so departs. respectively.34). i then also [C1 C2 ]i∗ = eT . i and likewise that if [C1 ]∗j = [C2 ]∗j = ej .

··· ··· ··· ··· ··· . . . . 7 7 7 7 5 T[1↔2] = 2 T5[4] = 2 6 6 6 6 6 6 6 6 6 6 6 4 . . . . respectively. 0 0 0 0 1 . . . ··· ··· ··· ··· ··· . 7 7 7 7 5 7 7 7 7 7 7 7. . . 0 0 0 1 0 . the applied mathematician considers whether it were not worth the effort to adapt the definition accordingly. However. . ··· ··· ··· ··· ··· ··· ··· ··· ··· ··· .. 3 T5[21] = It is good to define a concept aesthetically.. 3 7 7 7 7 7 7 7. . 3 7 7 7 7 7 7 7. 1 5 0 0 0 . 1 0 0 0 0 . . . . α times the ith column. 7 7 7 7 5 .37) which by operating Tα[i] A or ATα[i] scales (multiplies) A’s ith row or column. . . . . 0 1 0 0 0 . an applied mathematician ought not to let a mere definition entangle him. . • The third and last is the addition elementary Tα[ij] = I + αEij . . . . . . . 0 0 0 1 0 .. . . . . 6 6 6 6 6 6 6 6 6 6 6 4 .11. 0 0 0 0 1 .4. . 251 (11. . . . 0 0 0 0 1 .. 0 1 0 0 0 . . What matters is the underlying concept.. . . . . . . ··· ··· ··· ··· ··· . 0 0 0 5 0 . . . . 1 0 0 0 0 . and indeed in this case one might reasonably promote either definition on aesthetic grounds. THE ELEMENTARY OPERATOR • The second is the scaling elementary Tα[i] = I + (α − 1)Eii . . . α times the jth row. . 0 0 1 0 0 . . . . . . Where the definition does not serve the concept well. 0 0 1 0 0 . . . . . ..38) which by operating Tα[ij] A adds to the ith row of A. . by the factor α. i = j. . . . . . Examples of the elementary operators include 2 6 6 6 6 6 6 6 6 6 6 6 4 . . . . . One should usually do so when one can. . or which by operating ATα[ij] adds to the jth column of A. . . . ··· ··· ··· ··· ··· . 0 0 1 0 0 . . (11. . . . 0 1 0 0 0 . . α = 0. . . .

20 This means that any sequence of elementaries −1 safely be undone by the reverse sequence k Tk : −1 Tk k k k (11. The last can be considered a distinct.42) for any elementary operator T which operates within the given bounds. extended-operational form.42) lets an identity matrix with sufficiently high rank pass through a sequence of elementaries as needed. Equation (11. The rank-r identity matrix Ir is no elementary operator.40) Tk can (11. THE MATRIX Note that none of these. with −1 T[i↔j] = T[j↔i] = T[i↔j]. but the general identity matrix I is indeed an elementary operator.41) Tk = I = k Tk k −1 Tk . −1 Tα[ij] = T−α[ij] . . Curiously. 11. The other books are not wrong. −1 Tα[i] = T(1/α)[i] . fourth kind of elementary operator if desired. elementary operators as defined above are always invertible (which is to say.39) being themselves elementary operators such that T −1 T = I = T T −1 in each case. In general.252 CHAPTER 11. (11. but it is probably easier just to regard it as an elementary of any of the first three kinds. From (11. the transpose of an elementary row operator is the corresponding elementary column operator.1 Properties Significantly. This book finds it convenient to define the elementary operator in infinite-dimensional. 21 If the statement seems to contradict statements of some other books. it is only a matter of definition. (11.43) 20 The addition elementary Tα[ii] and the scaling elementary T0[i] are forbidden precisely because they are not generally invertible.21 nor is the lone-element matrix Emn . and in fact no elementary operator of any kind. we have that Ir T = Ir T Ir = T Ir if 1 ≤ i ≤ r and 1 ≤ j ≤ r (11.4. their underlying definitions just differ slightly.31). since I = T[i↔i] = T1[i] = T0[ij] . reversible in effect). the interchange elementary is its own transpose and adjoint: ∗ T T[i↔j] = T[i↔j] = T[i↔j]. differs from I in more than four elements.

THE ELEMENTARY OPERATOR 253 11. it can alter any elementary by commuting past. but all other pairs commute in this way. Itself subject to alteration only by another interchange elementary. however. it changes the kind of neither.4. • The interchange elementary is the strongest. where T1 and T2 are elementaries of the same kinds respectively as T1 and T2 . (ii) scaling. elementaries of different kinds always commute. though ′ indeed T1 T2 = T2 T1 . we should like to observe a natural hierarchy among the three kinds of elementary: (i) interchange.11. and for most pairs T1 and T2 of elementary operators. with several elementaries of all kinds intermixed. Significantly. they yield the same A and thus represent exactly the same matrix operation. among other possible orderings. Some applications demand that the elementaries be sorted and grouped by kind. First. One sorts a chain of elementary operators by repeatedly exchanging adjacent pairs.2 Commutation and sorting Elementary operators often occur in long chains like A = T−4[32] T[2↔3] T(1/5)[3] T(1/2)[31] T5[21] T[1↔3] . Though you probably cannot tell just by looking. as A = T[2↔3] T[1↔3] or as A = T−4[32] T(1/0xA)[21] T5[31] T(1/5)[2] T[2↔3] T[1↔3] . or whatever symbols T−4[21] T(1/0xA)[13] T5[23] T(1/5)[1] .4. However. at the moment we are dealing in elementary operators only. but has changed the kind of none of them. it so happens that there exists either a T1 such that ′ ′ ′ ′ ′ T1 T2 = T2 T1 or a T2 such that T1 T2 = T2 T1 . we shall list these exhaustively in a moment. Many qualitatively distinct pairs of elementaries exist. When an interchange elementary commutes past another elementary of any kind. the three products above are different orderings of the same elementary chain. though commutation can alter one (never both) of the two elementaries. Interesting is that the act of reordering the elementaries has altered some of them into other elementaries of the same kind. (iii) addition. which seems impossible since matrix multiplication is not commutative: A1 A2 = A2 A1 . This of course supposes that one can exchange adjacent pairs. what it alters are the other elementary’s indices i and/or j (or m and/or n. And. The attempt sometimes fails when both T1 and T2 are addition elementaries.

what it alters is the latter’s scale α (or β. or whatever symbol happens to represent the scale in question). itself having no power to alter any elementary during commutation.3 list all possible pairs of elementary operators.) Refer to Table 11. When a scaling elementary commutes past an addition elementary. Only an interchange elementary can alter it. .1. Refer to Table 11. then what can one write of the commutation of an elementary past the general matrix A? With some matrix algebra. commuting leftward. When two interchange elementaries commute past one another. A pair of addition elementaries are the only pair that can altogether fail to commute—they fail when the row index of one equals the column index of the other—but when they do commute.2.1. neither alters the other.3 exhaustively describe the commutation of one elementary past another elementary. The mathematician chooses. is subject to alteration by either of the other two. T A = (T A)(I) = (T A)(T −1 T ).3. (11. as the reader can check. • Next in strength is the scaling elementary.2 and 11. one can write that T A = [T AT −1 ]T.254 CHAPTER 11.3.5 Inversion and similarity (introduction) If Tables 11. to T −1 AT . Refer to Table 11.44) where T −1 is given by (11. The only pairs that fail to commute are the last three of Table 11. • The addition elementary. Scaling elementaries do not alter one another during commutation. An elementary commuting rightward changes A to T AT −1 . 11. AT = T [T −1 AT ]. 11. AT = (I)(AT ) = (T T −1 )(AT ). 11. (Which one? Either. and it in turn can alter only an addition elementary.2 and 11. last and weakest. THE MATRIX happen to represent the indices in question). only one of the two is altered.39).1. Tables 11.

11. In the table. combining and expanding elementary operators: interchange.5. is simply to replace m by n and n by m among the indices of the other elementary. even another interchange elementary. i = j = m = n. Notice that the effect an interchange elementary T[m↔n] has in passing any other elementary. INVERSION AND SIMILARITY (INTRODUCTION) 255 Table 11. no two indices are the same. commuting. T[m↔n] = T[n↔m] T[m↔m] = I IT[m↔n] = T[m↔n] I T[m↔n] T[m↔n] = T[m↔n] T[n↔m] = T[n↔m] T[m↔n] = I T[m↔n] T[i↔n] = T[i↔n] T[m↔i] = T[i↔m] T[m↔n] = T[i↔n] T[m↔n] 2 T[m↔n] T[i↔j] = T[i↔j] T[m↔n] T[m↔n] Tα[m] = Tα[n] T[m↔n] T[m↔n] Tα[i] = Tα[i] T[m↔n] T[m↔n] Tα[ij] = Tα[ij] T[m↔n] T[m↔n] Tα[in] = Tα[im] T[m↔n] T[m↔n] Tα[mj] = Tα[nj] T[m↔n] T[m↔n] Tα[mn] = Tα[nm] T[m↔n] .1: Inverting.

i = j = m = n. i = j = m = n. T1[m] = I ITβ[m] = Tβ[m] I T(1/β)[m] Tβ[m] = I Tβ[m] Tα[m] = Tα[m] Tβ[m] = Tαβ[m] Tβ[m] Tα[i] = Tα[i] Tβ[m] Tβ[m] Tα[ij] = Tα[ij] Tβ[m] Tβ[m] Tαβ[im] = Tα[im] Tβ[m] Tβ[m] Tα[mj] = Tαβ[mj] Tβ[m] Table 11. THE MATRIX Table 11. In the table.2: Inverting. combining and expanding elementary operators: addition. T0[ij] = I ITα[ij] = Tα[ij] I T−α[ij] Tα[ij] = I Tβ[ij] Tα[ij] = Tα[ij] Tβ[ij] = T(α+β)[ij] Tβ[mj] Tα[ij] = Tα[ij] Tβ[mj] Tβ[in] Tα[ij] = Tα[ij] Tβ[in] Tβ[mn] Tα[ij] = Tα[ij] Tβ[mn] Tβ[mi] Tα[ij] = Tα[ij] Tαβ[mj] Tβ[mi] Tβ[jn] Tα[ij] = Tα[ij] T−αβ[in] Tβ[jn] Tβ[ji] Tα[ij] = Tα[ij] Tβ[ji] . In the table. The last three lines give pairs of addition elementaries that do not commute. commuting.3: Inverting. combining and expanding elementary operators: scaling. commuting. no two indices are the same.256 CHAPTER 11. no two indices are the same.

) The broad question of how to invert a general matrix C. the null matrix has no such inverse.48) This rule emerges upon repeated application of (11. First. INVERSION AND SIMILARITY (INTRODUCTION) 257 First encountered in § 11. AC = C[C −1 AC].9 speak further of this. Matrix inversion is not for elementary operators only. though.11. −1 Ck k = k −1 Ck . (11. CA = [CAC −1 ]C. we leave for Chs. . which yields −1 Ck k k Ck = I = k Ck k −1 Ck .45).44) actually requires the matrix T there to be an elementary operator. Second.47) is a consequence of (11. Hence.4. (11. 12 and 13 to address. Many more general matrices C also have inverses such that C −1 C = I = CC −1 . (11.2 and 14. ∗ −1 −1 ∗ (11. For example. nothing in the logic leading to (11. such that T −1 T = I = T T −1 . For the moment however we should like to observe three simple rules involving matrix inversion. Third. Sections 12. CT C −1 = C −T = C −1 = C −∗ = C T .47) where C −∗ is condensed notation for conjugate transposition and inversion in either order and C −T is of like style. Equation (11. .45) (Do all matrices have such inverses? No. since for conjugate transposition C −1 ∗ C ∗ = CC −1 ∗ = [I]∗ = I = [I]∗ = C −1 C ∗ = C ∗ C −1 ∗ and similarly for nonconjugate transposition.46) The transformation CAC −1 or C −1 AC is called a similarity transformation.14). the notation T −1 means the inverse of the elementary operator T .5. Any matrix C for which C −1 is known can fill the role.

In either notation. one can achieve any desired permutation (§ 4. This is the rank-r inverse.4 summarizes. 11. 2.45).31). (11. (11. . a matrix C −1(r) such that C −1(r) C = Ir = CC −1(r) .) Table 11.2. with which many or most readers will already be familiar. beginning . we shall first learn how to compute it reliably in Ch. THE MATRIX Table 11.49) The full notation C −1(r) is not standard and usually is not needed. usually is not needed. 13.48) apply equally for the rank-r inverse as for the infinite-dimensional inverse. For example. (The properties work equally for C −1(r) as for C −1 if A honors an r ×r active region. not just adjacent pairs). .258 CHAPTER 11. Because of (11. 3. By successively interchanging pairs of the objects (any pairs. The full notation C −1(r) for the rank-r inverse incidentally is not standard. 12.4: Matrix inversion properties. one can abbreviate the notation to C −1 . This is a famous way to use the inverse. and normally is not used.47) and (11.2 uses the rank-r inverse to solve an exactly determined linear system. . (11. n.46) too applies for the rank-r inverse if A’s active region is limited to r × r.) C −1 C = C −1(r) C = CT C∗ −1 −1 I Ir C −∗ = CC −1 = CC −1(r) = = C −1 C −1 T ∗ = C −T = CA = [CAC −1 ]C AC = C[C −1 AC] −1 Ck k = k −1 Ck A more limited form of the inverse exists than the infinite-dimensional form of (11.1). (Section 13. When so. since the context usually implies the rank. but before using it so in Ch.6 Parity Consider the sequence of integers or other objects 1. .

some pairs will appear in correct order with respect to one another. v) (u. v. Complementarily. −2. thus the interchange affects the ordering of no other pair). The interchange reverses with respect to one another just the pairs (u. . then we say that the permutation has even or positive parity. am the elements lying between. a1 . n). If two elements are adjacent and their order is correct. while others will appear in incorrect order. . but the pair [1. think about it. . . . thus reversing parity. Either way. Before the interchange: . . 3) (2. am . 3. a2 ) · · · (a1 . (3. .6. 2) (1. . 3] appears in incorrect order in that the larger 3 stands to the left of the smaller 1. 4. v. 7. PARITY 259 with 1. am . . v) 22 For readers who learned arithmetic in another language than English. 5. 1. . then odd or negative parity. . 0. v) (a2 . v) (am .11. the odd integers are . (n − 1. . 1. Why? Well. . then the 2 and 5. the even integers are . am−1 . n). (2. reversing parity. −1. 3) (1. . . . . 4. . . . 4) · · · (3. 3. one can achieve the permutation 3. (1. . n).. a1 ) (u. . and if p is even. then interchanging rectifies the order. am ) (am−1 . a1 . .) If p is the number of pairs which appear in incorrect order (in the example. p = 6). Now contemplate all possible pairs: (1. an adjacent interchange alters p by exactly ±1. . n). a2 . a2 . . . . . with a1 . 5. . . 2 by interchanging first the 1 and 3. 2. . am−1 ) (u. . −3. 4. too? To answer the question. 6. u. v) · · · (u. if odd. 2. What about nonadjacent elements? Does interchanging a pair of these reverse parity. . . 1. 4) · · · . but only of that pair (no other element interposes. (In 3. In a given permutation (like 3. After the interchange: . . u. . the pair [1. 2. 4. 5. . am−1 . 2). . if the order is incorrect. let u and v represent the two elements interchanged. 2] appears in correct order in that the larger 2 stands to the right of the smaller 1. a2 . 4. −4. . . 4) · · · (2.. .22 Now consider: every interchange of adjacent elements must either increment or decrement p by one. 5. 5. . 1. then interchanging falsifies the order.

1 and 14. This does not change parity. the net change in p apparently also is odd. explained in § 11. reversing parity. there are some extra rules. The sole exception arises when an element is interchanged with itself. The rows or columns of a matrix can be considered elements in a sequence. so in parity calculations we ignore it.7. more complicated than elementary operators but less so than arbitrary matrices. interchanging rows or columns and thus reversing parity. however. then q T[ik ↔jk ] = I. it is possible that q T[ik ↔jk ] = I k=1 k=1 if q is even. called in this book the quasielementary operators. With respect to interchange and scaling.7. However. even q implies even p. which means even (positive) parity. Since each reversal alters p by ±1. It seems that regardless of how distant the pair. any sequences of elementaries of the respective kinds are allowed.41) are always invertible. We discuss parity in this. as explained in footnote 19.1. If so.260 CHAPTER 11. With respect to addition. scaling and addition—to match the three kinds of elementary. but it does not change anything else. so one finds it convenient to define an intermediate class of operators. We shall have more to say about parity in §§ 11. The three subsections which follow respectively introduce the three kinds of quasielementary operator. Such complicated operators are not trivial to analyze. which means odd (negative) parity.3. acts precisely in the manner described. either. A quasielementary operator is composed of elementaries only of a single kind.4. THE MATRIX The number of pairs reversed is odd. It follows that if ik = jk and q is odd. a chapter on matrices. then the interchange operator T[i↔j] . . There are thus three kinds of quasielementary—interchange. one can form much more complicated operators.7 The quasielementary operator Multiplying sequences of the elementary operators of § 11. because parity concerns the elementary interchange operator of § 11. i = j. 23 This is why some authors forbid self-interchanges. which per (11. 11.4.23 All other interchanges reverse parity. odd q implies odd p. In any event. interchanging any pair of elements reverses the permutation’s parity.

. P = k T[ik ↔jk ] . By (11. THE QUASIELEMENTARY OPERATOR 261 11. (11. . . . ..39). without altering any of the rows or columns it shuffles. permutation matrix. . permutor or general interchange operator. The effect of the operator is to shuffle the rows or columns of the matrix it operates on. the inverse of the general interchange operator is P −1 −1 = k T[ik ↔jk ] = k −1 T[ik ↔jk ] = k T[ik ↔jk ] ∗ T[ik ↔jk ] ∗ = k = k T[ik ↔jk ] (11.51) = P∗ = PT (where P ∗ = P T because P has only real elements). . .41). (11. 1 0 0 0 0 .15). 0 0 0 0 1 .50) constitutes an interchange quasielementary. . 7 7 7 7 5 This operator resembles I in that it has a single one in each row and in each column. 0 1 0 0 0 .11. . . . .7. 7 7 7 7 7 7 7.” . 3 P = T[2↔5] T[1↔3] = 6 6 6 6 6 6 6 6 6 6 6 4 ··· ··· ··· ··· ··· . .1 The interchange quasielementary or general interchange operator Any product P of zero or more interchange elementaries. . but the ones here do not necessarily run along the main diagonal. The inverse. . ..52) The letter P here recalls the verb “to permute.7. 0 0 0 1 0 . . 0 0 1 0 0 . (11.24 An example is 2 .43) and (11. . . . . . 24 (11. . transpose and adjoint of the general interchange operator are thus the same: P T P = P ∗P = I = P P ∗ = P P T . . ··· ··· ··· ··· ··· . .

. . . . 3 7 7 7 7 7 7 7 7 7 7 7 5 D= ∞ i=−∞ Tαi [i] = ∞ i=−∞ Tαi [i] = ∞ i=−∞ αi Eii = (11. negative or odd parity if the number is odd. the matrix is nonetheless composable as a definite sequence of interchange elementaries. THE MATRIX A significant attribute of the general interchange operator P is its parity: positive or even parity if the number of interchange elementaries T[ik ↔jk ] which compose it is even. The parity of the general interchange operator P concerns us for this reason. is a property of the matrix P itself. 0 0 0 ∗ 0 . 0 0 ∗ 0 0 . The letter D here recalls the adjective “diagonal. . It is the number of interchanges. most or even all i. Thus the example P above has even parity (two interchanges). 0 0 0 0 ∗ . which determines P ’s parity. The reason is that. .. For the purpose of parity determination. αi = 0 is forbidden by the definition of the scaling 25 6 6 6 6 6 6 6 6 6 6 6 4 ··· ··· ··· ··· ··· . 11. however. the positive (even) and negative (odd) parities sometimes lend actual positive and negative senses to the matrices they describe. . . . hence that Tαi [i] = I.1. . ∗ 0 0 0 0 . This works precisely as described in § 11. only interchange elementaries T[ik ↔jk ] for which ik = jk are counted.7. in this case elementary scaling operators:25 2 . diagonal matrix or general scaling operator D consists of a product of zero or more elementary operators. . incidentally. regardless of whether one ultimately means to use P as a row or column operator. ··· ··· ··· ··· ··· . .. as does I itself (zero interchanges). . . . . 0 ∗ 0 0 0 . .7.262 CHAPTER 11. .1. . any T[i↔i] = I noninterchanges are ignored. As we shall see in § 14. . but T[i↔j] alone (one interchange) has odd parity if i = j. for some. . . . not the use. .2 The scaling quasielementary or general scaling operator Like the interchange quasielementary P of § 11. No interchange quasielementary P has positive parity as a row operator but negative as a column operator.53) (of course it might be that αi = 1. . .6. not just of the operation P represents.” . Parity. the scaling quasielementary.

. . .11. . qualifies as a quasielementary operator. .7. 3 7 7 7 7 7 7 7. . . .. . . A superset of the general scaling operator is the diagonal matrix. . 0 0 0 0 x5 . An example is 2 6 6 6 6 6 6 6 6 6 6 6 4 . 7 7 7 7 5 diag{x} = converts a vector x into a diagonal matrix. 2 6 6 6 6 6 6 6 6 6 6 6 4 . . They are nonzero scaling factors. . Not so. . . . . . This operator resembles I in that all its entries run down the main diagonal. 0 x2 0 0 0 . . 0 0 0 0 1 . where zeros along the main diagonal are allowed. . . . 7 7 7 7 5 263 D = T−5[4] T4[2] T7[1] = ··· ··· ··· ··· ··· . 0 0 0 0 1 0 0 −5 0 0 . . . . .. 11. The effect of the operator is to scale the rows or columns of the matrix it operates on. any . 0 4 0 0 0 . though never zeros. .. . but these entries. . . . . . The conventional notation [diag{x}]ij ≡ δij xi = δij xj .54) where each element down the main diagonal is individually inverted. either. but is sometimes useful nevertheless. Its inverse is evidently D −1 = ∞ i=−∞ T(1/αi )[i] = ∞ i=−∞ T(1/αi )[i] = ∞ i=−∞ Eii . THE QUASIELEMENTARY OPERATOR elementary). . .. 0 0 x3 0 0 . . . . .55) . (11.3 Addition quasielementaries Any product of interchange elementaries (§ 11. . The diagonal matrix in general is not invertible and is no quasielementary operator. 7 0 0 0 0 . . any product of scaling elementaries (§ 11. . .7.7. ··· ··· ··· ··· ··· . . αi (11. . . defined less restrictively that [A]ij = 0 for i = j.7. . The general scaling operator is a particularly simple matrix.1). . . . 0 0 0 x4 0 . . 3 7 7 7 7 7 7 7. are not necessarily ones. ··· ··· ··· ··· ··· .2). ··· ··· ··· ··· ··· . . x1 0 0 0 0 .

7 7 7 7 5 whose inverse is L−1 = [j] ∞ i=j+1 T−αij [ij] = ∞ ∞ i=j+1 T−αij [ij] (11.57) = I− i=j+1 αij Eij = 2I − L[j]. . . . .. = 6 6 6 6 6 6 6 6 6 6 6 4 . . .56) = I+ 2 . . 0 1 0 0 0 . a product of elementary addition operators must meet some additional restrictions. . 1 0 0 0 0 . . . Tαij [ij] (11. . . . . 0 0 0 1 0 . . 0 0 0 0 1 . 3 7 7 7 7 7 7 7. but the pattern is similar. . 0 0 1 ∗ ∗ . . . .27 ∞ i=j+1 ∞ i=j+1 ∞ i=j+1 L[j] = Tαij [ij] = αij Eij . THE MATRIX product of addition elementaries. The reader can fill in the details. In this subsection the explanations are briefer than in the last two. . Four types of addition quasielementary are defined:26 • the downward multitarget row addition operator. To qualify as a quasielementary.” 26 .. ··· ··· ··· ··· ··· . . . .264 CHAPTER 11. ··· ··· ··· ··· ··· . 27 The letter L here recalls the adjective “lower. .

and [j] • the leftward multitarget column addition operator. which is the transT pose U[j] of the upward operator. . . . . 0 0 0 1 0 . 3 2 = 6 6 6 6 6 6 6 6 6 6 6 4 . THE UNIT TRIANGULAR MATRIX • the upward multitarget row addition operator. .” . .28 j−1 j−1 265 U[j] = i=−∞ Tαij [ij] = i=−∞ j−1 Tαij [ij] (11. 7 7 7 7 5 whose inverse is j−1 −1 U[j] j−1 = i=−∞ T−αij [ij] = i=−∞ T−αij [ij] (11. ··· ··· ··· ··· ··· . .. .7 is the unit triangular matrix. . . which is the transpose LT of the downward operator.11. 11. . .8 The unit triangular matrix Yet more complicated than the quasielementary of § 11. . . • the rightward multitarget column addition operator. ··· ··· ··· ··· ··· . . 1 0 0 0 0 . . . 0 1 0 0 0 . .. ∗ ∗ 1 0 0 .59) j−1 = I− i=−∞ αij Eij = 2I − U[j] . 0 0 0 0 1 . with which we draw this necessary but tedious chapter toward 28 The letter U here recalls the adjective “upper. 7 7 7 7 7 7 7. . . . .58) = I+ i=−∞ αij Eij . . .8. .

7 7 7 7 5 U = I+ 2 . ··· ··· ··· ··· ··· . . .29 The strictly triangular matrix L − I or U − I is likewise sometimes of interest. 0 0 1 ∗ ∗ . . αij Eij = I + .. . ∞ .5. . This section presents and derives the basic properties of the unit triangular matrix. . ∗ ∗ ∗ ∗ 1 . ∗ ∗ ∗ 1 0 . 0 0 0 1 ∗ . ··· ··· ··· ··· ··· . The general triangular matrix LS or US . which by definition can have any values along its main diagonal. . Other books typically use the symbols L and U for the general triangular matrix of Schur. . 0 0 0 0 1 . .60) i=−∞ j=−∞ j=−∞ i=j+1 = 6 6 6 6 6 6 6 6 6 6 6 4 . .61) i=−∞ j=i+1 j=−∞ i=−∞ = 6 6 6 6 6 6 6 6 6 6 6 4 . ··· ··· ··· ··· ··· . “Schur decomposition. . The unit triangular matrix is a generalized addition quasielementary. THE MATRIX L = I+ 2 . 3 ∞ j−1 αij Eij (11. . . . . . . 30 August 2007] 29 . . . . 30 [52. . . . 1 0 0 0 0 . 3 7 7 7 7 7 7 7. . . ∞ ∞ ∞ αij Eij (11. . ··· ··· ··· ··· ··· . as in the Schur decomposition of § 14... . The subscript S here stands for Schur. 7 7 7 7 5 The former is a unit lower triangular matrix. αij Eij = I + . the latter. which adds not only to multiple targets but also from multiple sources—but in one direction only: downward or leftward for L or U T (or U ∗ ). a unit upper triangular matrix.” 00:32.. .266 a long close: ∞ i−1 CHAPTER 11. . . . as in Table 11. . . . such matrices cannot in general be expressed as products of elementary operators and this section does not treat them. ∗ ∗ 1 0 0 .10. . . . is sometimes of interest. 7 7 7 7 7 7 7. . . 1 ∗ ∗ ∗ ∗ . . ∗ 1 0 0 0 . . . . . 0 1 ∗ ∗ ∗ . . upward or rightward for U or LT (or L∗ ). .30 However. . but this book distinguishes by the subscript.

as row operators k to C. (11. Equation (11. to build any unit triangular matrix desired.62) immediately and directly.31 then conveniently. A3 and so on. rightward. 31 .63) is most easily seen if the several L[j] and U[j] are regarded as column operators acting sequentially on I:  ∞ j=−∞ ∞ j=−∞ L = (I)  U = (I)   L[j]  . U[j]  .11. but just thinking about how L[j] adds columns leftward and U[j] . without calculation. whereas (C)( Qk Ak ) ‘ applies first A1 .63) follows at once.   The reader can construct an inductive proof symbolically on this basis without too much difficulty if desired.1 Construction To make a unit triangular matrix is straightforward: L= U= ∞ j=−∞ ∞ j=−∞ L[j] .62) U[j] . A3 and so on. then A2 . (11. The symbols and as this book uses them can thus be thought of respectively as row and column sequencers. (11. So long as the multiplication is done in the order indicated. then considering the order in which the several L[j] and U[j] act.8. as column operators to C. Q ‘ Recall again from § 2.8. THE UNIT TRIANGULAR MATRIX 267 11. Q This means that ( ‘ Ak )(C) applies first A1 . The correctness of (11. ij ij ij ij which is to say that the entries of L and U are respectively nothing more than the relevant entries of the several L[j] and U[j].63) . L U = L[j] = U[j] . then A2 .63) enables one to use (11. whereas k Ak = A1 A2 A3 · · · .3 that k Ak = · · · A3 A2 A1 .

11. The proof for unit lower and unit upper triangular matrices is the same.8. Inasmuch as this is true.64) is another unit triangular matrix of the same type. But this is just [L1 L2 ]ij = 0 if i < j.62).2 The product of like unit triangular matrices The product of like unit triangular matrices.64).63) supplies the specific quasielementaries.268 CHAPTER 11. and [L2 ]mj is null when m < j. [L1 L2 ]ij = i m=j [L1 ]im [L2 ]mj if i = j. if i ≥ j.3 Inversion Inasmuch as any unit triangular matrix can be constructed from addition quasielementaries by (11. Therefore. [L1 ]ij [L2 ]ij = [L1 ]ii [L2 ]ii = (1)(1) = 1 if i = j.59) gives the inverse of each .8. Hence (11. But as we have just observed. (11. U1 U2 = U. [L1 ]ij or [L2 ]ij = 1 if i = j. [L1 L2 ]ij = ∞ m=−∞ [L1 ]im [L2 ]mj . L1 L2 = L. one starts from a form of the definition of a unit lower triangular matrix: 0 if i < j. which again is the very definition of a unit lower triangular matrix.57) or (11. THE MATRIX 11. and inasmuch as (11. In the unit lower triangular case. [L1 ]im is null when i < m. [L1 L2 ]ij = 0 i m=j [L1 ]im [L2 ]mj if i < j. nothing prevents us from weakening the statement to read 0 if i < j. inasmuch as (11. Then.

The adjoint reverses the sense of the triangle. . . . It is plain to see but still interesting to note that—unlike the inverse— the adjoint or transpose of a unit lower triangular matrix is a unit upper triangular matrix.65) −1 U[j] .66) j=−∞ i=k+1 = 6 6 6 6 6 6 6 6 6 6 6 4 . another unit upper triangular matrix. . [j] (11. . therefore. 0 0 0 1 0 . . . . 0 0 1 ∗ ∗ . ··· ··· ··· ··· ··· . and the inverse of a unit upper triangular matrix. .64).8. . . 0 1 0 ∗ ∗ . 0 0 0 0 1 . . .. . −1 In view of (11. . the inverse of a unit lower triangular matrix is another unit lower triangular matrix. . restricted form L {k} k = I+ 2 . . . . 3 7 7 7 7 7 7 7 7 7 7 7 5 (11. ··· ··· ··· ··· ··· . . . one can always invert a unit triangular matrix easily by L U −1 = = ∞ j=−∞ ∞ j=−∞ L−1 . 11.8. . THE UNIT TRIANGULAR MATRIX 269 such quasielementary.11..4 The parallel unit triangular matrix If a unit triangular matrix fits the special. and that the adjoint or transpose of a unit upper triangular matrix is a unit lower triangular matrix. ∞ αij Eij . . . 1 0 0 ∗ ∗ . .

∗ ∗ 0 1 0 . . in matrix arithmetic generally. The reason we care about the separation of source from target is that. . . to A’s (k + 1)th row onward. . the sequence matters in which rows or columns are added. That is. and the sequence ceases to matter (refer to Table 11. ∗ ∗ 1 0 0 . . The danger is that i1 = j2 or i2 = j1 . . . A horizontal frontier separates source from target. 3 (11. .3).. that U Each separates source from target in the matrix A it operates on.270 or U {k} CHAPTER 11. . ∗ ∗ 0 0 1 . . The addition is from A’s rows through the kth. That is exactly what the parallel unit triangular matrix does: it separates source from target and thus removes the danger. . . . . but with the useful restriction L that it makes no row of A both source and target. 7 7 7 7 5 and U {k}T . Tα1 [i1 j1 ] Tα2 [i2 j2 ] = I + α1 Ei1 j1 + α2 Ei2 j2 = Tα2 [i2 j2 ] Tα1 [i1 j1 ] . . and also with respect to L {k}T 6 6 6 6 6 6 6 6 6 6 6 4 ··· ··· ··· ··· ··· . which acting AL {k}T and AU {k}T {k}T add columns respectively rightward and leftward (remembering that L {k}T is the lower). The general unit lower triangular matrix L acting LA on a matrix A adds the rows of A downward. . ∞ k−1 αij Eij . The parallel unit lower triangular matrix {k} {k} acting L A also adds rows downward. . ··· ··· ··· ··· ··· . . is no unit lower but a unit upper triangular matrix. It is for this reason that . in general. . 0 1 0 0 0 . It makes a difference whether the one addition comes before. . Remove this danger. 1 0 0 0 0 . .67) j=k i=−∞ = confining its nonzero elements to a rectangle within the triangle as shown. during or after the other—but only because the target of the one addition might be the source of the other. . 7 7 7 7 7 7 7.. . THE MATRIX = I+ 2 . where source and target are not separate but remain intermixed. . then it is a parallel unit triangular matrix and has some special properties the general unit triangular matrix lacks. which acting U A adds rows upward. . Similar observations naturally apply with respect to the parallel unit {k} {k} upper triangular matrix U . which thus march in A as separate squads.

the inverse of the parallel unit . which says that one can build a parallel unit triangular matrix equally well in any sequence—in contrast to the case of the general unit triangular matrix.68 does not show them. The multiplication is fully commutative.62) one must sequence carefully. You can scramble the factors’ ordering any random way you like. THE UNIT TRIANGULAR MATRIX the parallel unit triangular matrix brings the useful property that 271 L {k} k =I+ k ∞ ∞ αij Eij k ∞ j=−∞ i=k+1 = k Tαij [ij] = k Tαij [ij] Tαij [ij] Tαij [ij] j=−∞ i=k+1 j=−∞ i=k+1 = ∞ ∞ Tαij [ij] = Tαij [ij] = ∞ ∞ j=−∞ i=k+1 k j=−∞ i=k+1 k = (11. (Though eqn.68) i=k+1 j=−∞ i=k+1 j=−∞ = {k} ∞ k Tαij [ij] = ∞ k−1 ∞ k Tαij [ij] .) Under such conditions. whose construction per (11. even more sequences are possible. 11. i=k+1 j=−∞ i=k+1 j=−∞ U =I+ ∞ αij Eij j=k i=−∞ k−1 = j=k i=−∞ Tαij [ij] = · · · .8.11.

. . Pictorially. . . . . . ··· ··· ··· ··· ··· .. . 0 0 0 0 1 . . . .5 records a few properties that come immediately of the last observation and from the parallel unit triangular matrix’s basic layout.69) represents is indeed simple: that one can multiply constituent elementaries in any order and still reach the same parallel unit triangular matrix. only with each element off the main diagonal negated. U {k} −1 = 6 6 6 6 6 6 6 6 6 6 6 4 . . . .. . .69) “particularly simple. ··· ··· ··· ··· ··· . . . . in the present context the idea (11. . 1 0 0 0 0 . .. . . . .” Nevertheless. (11. 0 0 0 1 0 . . . 0 −∗ −∗ −∗ 1 −∗ −∗ −∗ 0 1 0 0 0 0 1 0 0 0 0 1 . . . . . . 1 0 0 0 1 0 0 0 1 −∗ −∗ −∗ −∗ −∗ −∗ . . . 3 7 7 7 7 7 7 7. . THE MATRIX =I− k ∞ j=−∞ i=k+1 ∞ αij Eij = 2I − L {k} = {k} −1 ∞ j=−∞ i=k+1 T−αij [ij] = · · · . . . . . 7 7 7 7 5 7 7 7 7 7 7 7. . . . 2 . . where again the elementaries can be multiplied in any order. Table 11. 3 L {k} −1 = 2 6 6 6 6 6 6 6 6 6 6 6 4 ··· ··· ··· ··· ··· . . .. . .272 triangular matrix is particularly simple:32 L {k} −1 k CHAPTER 11. ··· ··· ··· ··· ··· . . 32 . that the elementaries in this case do not interfere. . . . . There is some odd parochiality at play in applied mathematics when one calls such collections of symbols as (11. . . . 7 7 7 7 5 The inverse of a parallel unit triangular matrix is just the matrix itself. . .69) αij Eij = 2I − U {k} k−1 U =I− = ∞ j=k i=−∞ k−1 j=k i=−∞ T−αij [ij] = · · · .

) L ∞ Ik+1 L {k} {k} −1 {k}′ +L 2 {k} −1 =I = {k} {k} U {k} +U 2 {k} −1 {k} k I−∞ k−1 {k} ∞ I−∞ U Ik = L = U −I ∞ = Ik+1 (L {k} {k} k−1 − I = I−∞ (U k − I)I−∞ ∞ − I)Ik If L {k} and (I − In )(L If U {k} (In − Ik )L honors an n × n active region. (In the table. − I)(In − Ik−1 ) .5: Properties of the parallel unit triangular matrix. {k} − I)Ik Ik−1 U honors an n × n active region. THE UNIT TRIANGULAR MATRIX 273 Table 11. then {k} and (I − In )(U (In − Ik−1 ) = U {k} − I) = 0 = (U − I = Ik−1 (U {k} − I)(I − In ). 11. then {k} Ik = L {k} {k} − I) = 0 = (L {k} − I = (In − Ik )(L {k} {k} − I)(I − In ).8. Note that the inverses L =U are parallel unit triangular matrices themselves. such that U the table’s properties hold for them.11. b the notation Ia represents the generalized dimension-limited indentity ma{k}′ {k} −1 =L and trix or truncator of eqn. too.30.

. 0 0 0 0 1 . . 1 0 0 0 0 . . . . k . .5 The partial unit triangular matrix Besides the notation L and U for the general unit lower and unit upper {k} {k} for the parallel unit and U triangular matrices and the notation L lower and unit upper triangular matrices. . . . . ∞ ∞ αij Eij . 0 0 0 0 1 . . 0 0 0 1 0 . . . . 0 0 1 ∗ ∗ . ∗ 1 0 0 0 . . ··· ··· ··· ··· ··· . .71) 2 = 6 6 6 6 6 6 6 6 6 6 6 4 . 7 7 7 7 5 U [k] = I+ j=−∞ i=−∞ αij Eij . 0 0 0 1 ∗ . . . j−1 7 7 7 7 7 7 7. 3 7 7 7 7 7 7 7 7 7 7 7 5 (11. .274 CHAPTER 11. . . ∗ ∗ 1 0 0 . . . . . .8. 3 (11. . . .. . 1 0 0 0 0 . THE MATRIX 11.70) j=k i=j+1 = 6 6 6 6 6 6 6 6 6 6 6 4 . . . . . . . we shall find it useful to introduce the additional notation L [k] = I+ 2 . . . . ··· ··· ··· ··· ··· . . ··· ··· ··· ··· ··· . . . . . . .. . . . .. 0 1 0 0 0 . ··· ··· ··· ··· ··· .

··· ··· ··· ··· ··· . major partial unit triangular matrices. 7 7 7 7 5 U {k} = I+ 2 . . . . . . 1 0 0 0 0 . . and can be denoted L[k]T . ∗ ∗ 1 0 0 . . . . . . . THE UNIT TRIANGULAR MATRIX 275 for unit triangular matrices whose off-diagonal content is confined to a narrow wedge and k L{k} = I + 2 . . . L{k}T and U {k}T . U [k]T . 1 ∗ ∗ ∗ ∗ . . The conventional notation b f (k) + c f (k) = c f (k) k=a k=b k=a suffers the same arguable imperfection. but it serves a purpose in this book and is introduced here for this reason. j−1 . 7 7 7 7 7 7 7. . the partial unit triangular matrix is a matrix which leftward or rightward of the kth column resembles I. . . .8. 0 0 0 1 0 . ∗ ∗ ∗ ∗ 1 . the former pair can be called minor partial unit triangular matrices.. {k} 33 . . 0 0 0 0 1 .. . . . L{k} and U {k} . 3 (11. . αij Eij . .. . and the latter pair. . . ··· ··· ··· ··· ··· . . . .4 are in fact also major partial unit triangular matrices. . . {k} {k} Observe that the parallel unit triangular matrices L and U of § 11. . .72) j=−∞ i=j+1 . U [k] . . as the notation suggests. = 6 6 6 6 6 6 6 6 6 6 6 4 . . 3 7 7 7 7 7 7 7 7 7 7 7 5 (11. . .. Whether minor or major.73) j=k i=−∞ = 6 6 6 6 6 6 6 6 6 6 6 4 ··· ··· ··· ··· ··· . . . . . The notation is arguably imperfect in that L{k} + L[k] − P = L but rather that I P P L + L[k+1] − I = L. . . ∗ ∗ ∗ 1 0 .8. . for the supplementary forms. 0 0 1 ∗ ∗ .11. 0 1 ∗ ∗ ∗ . Of course partial unit triangular matrices which resemble I above or below the kth row are equally possible. If names are needed for L[k]. 0 1 0 0 0 . . ··· ··· ··· ··· ··· ∞ ∞ αij Eij . .33 Such notation is not standard in the literature.

. 7 ∂f2 5 ∂x3 . its active region is ∞ × ∞ in extent.76) 11. . 0 0 1 0 0 0 0 . Further obvious but useful identities include that H−k (Iℓ − Ik ) = Iℓ−k H−k . . . 0 0 0 0 0 0 0 . . . 0 0 0 0 1 0 0 . . . . . 0 0 0 0 0 1 0 . . . An exception is the shift operator Hk . transpose and adjoint are the same: T ∗ ∗ T Hk Hk = Hk Hk = I = Hk Hk = Hk Hk . . . (Iℓ − Ik )Hk = Hk Iℓ−k . . Inasmuch as the shift operator shifts all rows or columns of the matrix it operates on. . and the function itself can be vector-valued.10 The Jacobian derivative Chapter 4 has introduced the derivative of a function with respect to a scalar variable. 0 0 0 0 0 0 0 . THE MATRIX 11. (11. ··· ··· ··· ··· ··· ··· ··· 3 7 7 7 7 7 7 7 7 7. Hk shifts A’s rows downward k steps. . . 0 0 0 1 0 0 0 . .74) H2 = Operating Hk A. 2 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 4 ··· ··· ··· ··· ··· ··· ··· . . defined that [Hk ]ij = δi(j+k) . .9 The shift operator Not all useful matrices fit the dimension-limited and extended-operational forms of § 11. .3. . the shift operator’s inverse.77) = dx ij ∂xj For instance. 7 7 7 7 7 7 5 (11. . .75) −1 ∗ T Hk = Hk = Hk = H−k . For example. if x has three elements and f has two. . . . . . Hk shifts A’s columns leftward k steps.276 CHAPTER 11. The derivative is df ∂fi . 0 0 0 0 0 0 1 . . then df = dx 2 ∂f1 6 ∂x 1 6 4 ∂f 2 ∂x1 ∂f1 ∂x2 ∂f2 ∂x2 3 ∂f1 ∂x3 7. One can also take the derivative of a function with respect to a vector variable. Obviously. . (11. (11. . . Operating AHk .

34 35 [52. however.25) in the form35 d T g Af = gTA dx d ∗ g Af = g∗ A dx df dx df dx + dg dx dg dx T T Af ∗ T .19) of the derivative. + Af valid for any constant matrix A—as is seen by applying the definition (4.9 and the Jacobian derivative of this section complete the family of matrix rudiments we shall need to begin to do increasingly interesting things with matrices in Chs. 12. ∂xj →0 ∂xj ∂xj and simplifying.11. one could vary xn+1 in principle.78) The derivative is not In as one might think.10. and ∂xn+1 /∂xn+1 = 0. the Jacobian matrix. . which here is (g + ∂g/2)∗ A(f + ∂f /2) − (g − ∂g/2)∗ A(f − ∂f /2) ∂ (g∗ Af ) = lim . Before doing interesting things. we must treat two more foundational matrix matters. “Jacobian. 2007] Notice that the last term on (11. which will be the subjects of Ch. (11. THE JACOBIAN DERIVATIVE 277 This is called the Jacobian derivative. next.79)’s second line is transposed. The Jacobian derivative obeys the derivative product rule (4. 13 and 14.79) . still. The shift operator of § 11. 15 Sept. The Jacobian derivative of a vector with respect to itself is dx = I.” 00:50. or just the Jacobian. because. dx (11. even if x has only n elements. not adjointed.34 Each of its columns is the derivative with respect to one element of x. The two are the Gauss-Jordan decomposition and the matter of matrix rank.

278 CHAPTER 11. THE MATRIX .

and • the unit triangular matrix L or U (§ 11.1).3. L[k] or U[k] (§ 11. The general matrix A does not necessarily have any of these properties. the latter including • lone-element matrix E (§ 11.7). • the elementary operator T (§ 11. 279 . and that several orderly procedures are known to do so. • the rank-r identity matrix Ir (§ 11.4). and indeed one of the more useful. However. • the quasielementary operator P . Section 11.Chapter 12 Matrix rank and the Gauss-Jordan decomposition Chapter 11 has brought the matrix and its rudiments.2). • the general identity matrix I and the scalar matrix λI (§ 11. but it turns out that one can factor any matrix whatsoever into a product of rudiments which do have the properties. This chapter explains. The simplest of these. • the null matrix 0 (§ 11. Such rudimentary forms have useful properties. In fact all matrices have rank.3.7). that section has actually defined rank only for the rank-r identity matrix Ir .3. as we have seen.5). supplying in its place the new concept of matrix rank. D.3. This chapter introduces it. is the Gauss-Jordan decomposition.8).3 has de¨mphasized the concept of matrix dimensionality m× e n.

Technically the property applies to scalars. the several ak are linearly independent iff α1 a1 + α2 a2 + α3 a3 + · · · + αn an = 0 (12. we shall find it helpful to prepare two preliminaries thereto: (i) the matter of the linear independence of vectors. the chapter demands more rigor than one likes in such a book as this. More formally. a4 . Significantly but less obviously. . by contrast. too. That is. any nonzero scalar alone is linearly independent—but there is no such thing as a linearly independent pair of scalars. a3 . there is also no such thing as a linearly independent set which includes the null vector. and (ii) the elementary similarity transformation. a2 . strictly speaking. Except in § 12. 13 and 14. Paradoxically. One . at least one of which is nonzero (trivial αk . n = 1 set consisting only of a1 = 0 is. where “nontrivial αk ” means the several αk . a5 .1 Linear independence Linear independence is a significant possible property of a set of vectors— whether the set be the several columns of a matrix. (12. RANK AND THE GAUSS-JORDAN Before treating the Gauss-Jordan decomposition and the matter of matrix rank as such. . We shall drive through the chapter in as few pages as can be managed. For consistency of definition. Vectors which can combine nontrivially to reach the null vector are by definition linearly dependent. However. The chapter begins with these.2.1) for all nontrivial αk .280 CHAPTER 12. on the technical ground that the only possible linear combination of the empty set is trivial. inasmuch as a scalar resembles a one-element vector— so. because one of the pair can always be expressed as a complex multiple of the other.1) forbids it. not linearly independent.1 1 This is the kind of thinking which typically governs mathematical edge cases. the several rows. Linear independence is a property of vectors. however. would be α1 = α2 = α3 = · · · = αn = 0). n = 0 set as linearly independent. and then onward to the more interesting matrix topics of Chs. and logically the chapter cannot be omitted. the n vectors of the set {a1 . an } are linearly independent if and only if none of them can be expressed as a linear combination—a weighted sum—of the others. A vector is linearly independent if its role cannot be served by the other vectors in the set. or some other vectors—the property being defined as follows. . . it is hard to see how to avoid the rigor here. even the singlemember. we regard the empty. 12.

only if β1 − β1 = 0. then it cannot be expressed as any other combination of the same vectors. among other examples. .1. the important property of matrix rank depends on the number of linearly independent columns or rows a matrix has. but it helps to visualize the concept geometrically in three dimensions. Similar thinking makes 0! = 1.5. β3 − ′ β3 = 0. then one might ask: can there exist a different linear combination of the same vectors ak which also forms b? That is. . As we shall see in § 12. this could only be so if the coefficients in the last ′ ′ ′ equation where trivial—that is. were in fact one and the same. The combination is unique. if a vector b can be expressed as a linear combination of several linearly independent vectors ak . because (§ 11. The difference of the two equations then would be ′ ′ ′ ′ (β1 − β1 )a1 + (β2 − β2 )a2 + (β3 − β3 )a3 + · · · + (βn − βn )an = 0. and 2 not 1 k=0 the least prime. Two such vectors are independent so long as they do not lie along the same line. One concludes therefore that. a chapter on matrices. β2 − β2 = 0. LINEAR INDEPENDENCE 281 If a linear combination of several independent vectors ak forms a vector b. βn − βn = 0. which we had supposed to differ. but what then of the observation that adding a vector to a linearly dependent set never renders the set independent? Surely in this light it is preferable justP define the empty set as to independent in the first place. then is ′ ′ ′ ′ β1 a1 + β2 a2 + β3 a3 + · · · + βn an = b possible? To answer the question. .1) a matrix is essentially a sequence of vectors—either of column vectors or of row vectors. depending on one’s point of view.1). could define the empty set to be linearly dependent if one really wanted to. if β1 a1 + β2 a2 + β3 a3 + · · · + βn an = b. −1 ak z k = 0.12. According to (12. using the threedimensional geometrical vectors of § 3. But this says no less than that the two linear combinations. suppose that it were possible. .3. Linear independence can apply in any dimensionality. A third such vector is independent of the first two so long as it does not lie in their common plane. where the several ak satisfy (12. A fourth such vector (unless it points off into some unvisualizable fourth dimension) cannot possibly then be independent of the three. We discuss the linear independence of vectors in this.1). .

1). without fundamentally altering the character of the quasielementary or unit triangular matrix. . The symbols P ′ . L and U do. m × n matrix A is2 A = G> Ir G< = P DLU Ir KS. The similarity transformation has several interesting properties. typically merging D into L and omitting K and S. but they reveal little or nothing sketching the matrices does not. D.282 CHAPTER 12.3 The Gauss-Jordan decomposition The Gauss-Jordan decomposition of an arbitrary. only not necessarily the same ones P .2 The elementary similarity transformation Section 11. Perhaps the reader will agree that the decomposition is cleaner as presented here. RANK AND THE GAUSS-JORDAN 12.1 obtain. and sometimes without altering the matrix at all. The rules find use among other places in the Gauss-Jordan decomposition of § 12. dimension-limited. Most of the table’s rules are fairly obvious if the meaning of the symbols is understood. D.3. Most introductory linear algebra texts this writer has met call the Gauss-Jordan decomposition instead the “LU decomposition” and include fewer factors in it. Of course rigorous symbolic proofs can be constructed after the pattern of § 11.2) G> ≡ P DLU. The symbols P . where • P and S are general interchange operators (§ 11. the several rules of Table 12.7. They also omit Ir . though to grasp some of the rules it helps to sketch the relevant matrices on a sheet of paper.8. D ′ . some of which we are now prepared to discuss.8. L and U of course represent the quasielementaries and unit triangular matrices of §§ 11. C = T .46) have introduced the similarity transformation CAC −1 or C −1 AC. since their matrices have pre-defined dimensionality.2. which arises when an operator C commutes respectively rightward or leftward past a matrix A. particularly in the case in which the operator happens to be an elementary. In this case.5 and its (11. L′ and U ′ also represent quasielementaries and unit triangular matrices.7 and 11. G< ≡ KS. 12. 2 (12. The rules permit one to commute some but not all elementaries past a quasielementary or unit triangular matrix.

U. U Tα[ij] DT−α[ij] = D + ([D]jj − [D]ii ) αEij = D ′ Tα[ij] DT−α[ij] = D Tα[ij] U T−α[ij] = if [D]ii = [D]jj if i > j if i < j Tα[ij] LT−α[ij] = L′ U′ .1: Some elementary similarity transformations. U [k] . U {k} . L . T[i↔j] IT[i↔j] = I T[i↔j]P T[i↔j] = P ′ T[i↔j]DT[i↔j] = D ′ = D + ([D]jj − [D]ii ) Eii + ([D]ii − [D]jj ) Ejj T[i↔j]DT[i↔j] = D [k] if [D]ii = [D]jj T[i↔j]L T[i↔j] = L[k] {k} ′ {k} ′ if i < k and j < k if i > k and j > k if i > k and j > k if i < k and j < k if i > k and j > k if i < k and j < k T[i↔j]U [k] T[i↔j] = U [k] T[i↔j] L{k} T[i↔j] = L T[i↔j] U T[i↔j] L T[i↔j] U {k} {k} {k} T[i↔j] = U T[i↔j] = L {k} ′ {k} ′ T[i↔j] = U Tα[i] IT(1/α)[i] = I Tα[i] DT(1/α)[i] = D Tα[i] AT(1/α)[i] = A′ Tα[ij] IT−α[ij] = I where A is any of {k} {k} L. L[k] . THE GAUSS-JORDAN DECOMPOSITION 283 Table 12.3.12. L{k} .

RANK AND THE GAUSS-JORDAN • D is a general scaling operator (§ 11. > < S −1 K −1 = G−1 . > the Gauss-Jordan’s complementary form.3) 12. We shall meet some of the others in §§ 13. The equation itself is easy enough to read.1 Motive Equation (12. is the transpose of a parallel unit lower triangular matrix. for example). one can left-multiply the equation A = G> Ir G< by G−1 and right> multiply it by G−1 to obtain < U −1 L−1 D−1 P −1 AS −1 K −1 = G−1 AG−1 = Ir . 14.3.” The letter G itself can be regarded as standing for “Gauss-Jordan.2) seems inscrutable. but just as there are many ways to factor a scalar (0xC = [4][3] = [2]2 [3] = [2][6].284 CHAPTER 12. there are likewise many ways to factor a matrix.6.8. • K=L being thus a parallel unit upper triangular matrix (§ 11. However—at least for matrices which do have one—because G> and G< are composed of invertible factors.2).7. 14. and • r is an unspecified rank.8). (12. in fact) is a matter this section addresses.” but admittedly it is chosen as much because otherwise we were running out of available Roman capitals! 3 .12. The Gauss-Jordan decomposition we meet here however has both significant theoretical properties and useful practical applications.10 and 14. and in any case needs less advanced preparation to appreciate One can pronounce G> and G< respectively as “G acting rightward” and “G acting leftward. Whether all possible matrices A have a Gauss-Jordan decomposition (they do. • G> and G< are composites3 as defined by (12. • L and U are respectively unit lower and unit upper triangular matrices (§ 11. Why choose this particular way? There are indeed many ways.10.4). {r}T The Gauss-Jordan decomposition is also called the Gauss-Jordan factorization.2). < U −1 L−1 D −1 P −1 = G−1 .

for which A−1 A = In . that is only for square A.49. how to determine it is not immediately obvious. where A is known and A−1 is to be determined. it is only supposed here that A−1 A = In .3. left-multiplying by In and observing that In = In . itself constitutes A−1 . which implies that A−1 = (In ) T . (The A−1 here is the A−1(n) of eqn. it is not yet claimed that AA−1 = In . (In ) T (A) = In . and (at least as developed in this book) precedes the others logically. then we shall have that T (A) = In . so for the moment let us suppose a square A for which A−1 does exist. and even if it does exist. The matrix A−1 such that A−1 A = In may or may not exist (usually it does exist if A is square. as we shall soon see). When In is finally achieved. This observation is what motivates the Gauss-Jordan decomposition. what if A is not square? In the present subsection however we are not trying to prove anything. each of which makes the matrix more nearly resemble In .6) to n × n dimensionality. . truncated (§ 11. if one can determine A−1 . It emerges naturally when one posits a pair of square. THE GAUSS-JORDAN DECOMPOSITION 285 than the others. only to motivate. The product of elementaries which transforms A to In .3.12.) To determine A−1 is not an entirely trivial problem. 11. A and A−1 . However. n × n matrices. And still. and let us seek A−1 by left-multiplying A by a sequence T of elementary row operators. but even then it may not. 2 or.

or. all elementary operators including the ones here have extendedoperational form (§ 11.2).2). to group like operators. RANK AND THE GAUSS-JORDAN By successive steps. multiplying the two scaling elementaries to merge them into a single general scaling operator (§ 11. from Table 11. Only the 2 × 2 active regions are shown here. but all those · · · ellipses clutter the page too much.3. Using the elementary commutation identity that Tβ[m] Tα[mj] = Tαβ[mj] Tβ[m] . we have that A −1 = " 1 0 0 1 #" 1 0 2 1 #" 1 −3 5 0 1 #" 1 0 0 1 5 #" 1 2 0 1 0 # = " 1 −A 3 −A 2 5 1 5 # . .2. A −1 = " 1 0 0 1 #" 1 0 2 1 #" 1 0 0 1 5 #" 1 −3 0 1 #" 1 2 0 1 0 # = " 3 −A 1 −A 2 5 1 5 # . A −1 = " 1 0 0 1 #" 1 0 2 1 #" 1 −3 5 0 1 #" 1 2 0 1 5 0 # = " 3 −A 1 −A 2 5 1 5 # . from which A = DLU I2 = » 2 0 0 5 –» 1 3 5 0 1 –» 1 0 −2 1 –» 1 0 0 1 – = » 2 3 −4 −1 – . 4 . . Theoretically.7. . . The last equation is written symbolically as A−1 = I2 U −1 L−1 D−1 .286 CHAPTER 12. . – – – – – –» 0 1 5 Hence.4 a concrete example: A = » 1 2 1 2 1 2 1 2 1 2 0 0 0 0 0 0 1 0 1 0 1 0 1 0 1 » 1 0 0 1 –» » 1 0 1 0 2 1 2 1 –» » 1 0 1 0 1 0 0 1 5 0 1 5 –» –» –» » 1 −3 1 −3 1 −3 1 −3 0 1 0 1 0 1 0 1 –» –» –» –» – – – – – » 2 3 1 3 1 0 1 0 1 0 1 0 A = A = A = A = A = » » » » » −4 −1 −2 −1 −2 5 −2 1 0 1 0 1 – .

Intervening factors are multiplied by both T and T −1 .3. step 12. so if the reader will accept the other factors and suspend judgment on K until the actual need for it emerges in § 12. By successive elementary operations. it is (12. ˜ Each step of the transformation goes as follows. (12. the A on the right is gradually transformed into Ir . It begins with the equation A = IIIIAII. U . particularly to avoid dividing by zero when they encounter a zero in an inconvenient cell of the matrix (the reader might try reducing A = [0 1. The matrix I is left. this factor comes into play when A has broad rectangular rather than square shape. are all I. for instance. In between. elementary by elementary. . which is (12. S and K in the first place? The answer regarding P and S is that these factors respectively gather row and column interchange elementaries. where the six I hold the places of the six Gauss-Jordan factors P . but only in the special case that r = 2 and P = S = K = I—which begs the question: why do we need the factors P .2. 12.or left-multiplied by T −1 . K and S of (12. we are not quite ready to detail yet. the equation is represented as ˜ ˜ ˜ ˜ ˜˜ ˜ A = P D LU I K S.2). D. For example. THE GAUSS-JORDAN DECOMPOSITION 287 Now. a row or column interchange is needed here).3. of which the example given has used none but which other examples sometimes need or want. but at present we are only motivating not proving.2).2 Method The Gauss-Jordan decomposition of a matrix A is not discovered at one stroke but rather is gradually built up.3. the equation A = DLU I2 is not (12..2). Regarding K.4) ˜ ˜ ˜ where the initial value of I is A and the initial values of P .2)—or rather. The decomposition thus ends with the equation A = P DLU Ir KS. etc. To compensate. which multiplication constitutes an elementary similarity transformation as described in § 12. and also sometimes when one of the rows of A happens to be a linear combination of the others. while the several matrices are gradually being transformed.12. L. 1 0] to I2 .or right-multiplied by an elementary operator T . The last point. admittedly. ˜ ˜ A = P DT(1/α)[i] ˜ Tα[i] LT(1/α)[i] ˜ Tα[i] U T(1/α)[i] ˜ ˜˜ Tα[i] I K S. one of the six factors is right. while the six I are gradually transformed into the six Gauss-Jordan factors. then we shall proceed on this basis. D.3.

3 The algorithm Having motivated the Gauss-Jordan decomposition in § 12. the algorithm decrees the following steps.2. ˜ ˜ L ← Tα[i] LT(1/α)[i] . ˜ = Ir .4) has become the Gauss-Jordan decomposition (12.1 and having proposed a basic method to pursue it in § 12. ˜ thus associating the operation with the appropriate factor—in this case.) . D. The remarks are unnecessary to execute the algorithm’s steps as such. Specifically. at Such elementary row and column operations are repeated until I which point (12. RANK AND THE GAUSS-JORDAN which is just (12. then. failproof algorithm to achieve it. 12. the algorithm ˜ • copies A into the variable working matrix I (step 1 below).2). orderly. • establishes a rank r (step 8).4). ˜ • reduces I by suitable row (and maybe column) operations to unit upper triangular form (steps 2 through 7).3. (The steps as written include many parenthetical remarks—so many that some steps seem to consist more of parenthetical remarks than of actual algorithm. ˜ ˜ D ← DT(1/α)[i] .288 CHAPTER 12. we shall now establish a definite. ˜ ˜ U ← Tα[i] U T(1/α)[i] . inasmuch as the adjacent elementaries cancel one another. Broadly.3. and ˜ • reduces the now unit triangular I further to the rank-r identity matrix Ir (steps 9 through 13).3. ˜ ˜ I ← Tα[i] I. They are however necessary to explain and to justify the algorithm’s steps to the reader.

Regarding the ˜ other factors.8). The other six variable working matrices each have extendedoperational form. What is less clear until one has read the whole algorithm. The ˜ relates not to the ı ı ˜ ˜ doubled. n × n for K and S. leaving ˜ = Ir . the ı matrix has proper unit upper triangular form (§ 11. The need grows clear once ˜ one has read through step 7. so e this step 2 though logical seems unneeded. The notation ˜ii thus means [I]ii —in other words. subscribed index ii but to I. 289 ˜ U ← I. Begin by initializing ˜ P ← I. where i is a row index. but this is accidental. ˜ S ← I. ˜ where I holds the part of A remaining to be decomposed. that I is all-zero below-leftward of and directly leftward of (though not directly below) the pivot element ˜ii .3. notice that L enjoys the major partial unit triangular The notation ˜ii looks interesting.) Observe that neither the ith row of I ˜ nor any row below it has an entry left of the ith column. one naturally need not store A beyond the m × n active region. From step 1. L and U . ˜ K ← I. One need store none of the matrices beyond these bounds. is that ˜ one also need not store the dimension-limited I beyond the m×n active region. 5 . and where the others are the variable working matrices ˜ of (12. i ← 1. the algorithm also ˜ ˜ re¨nters here from step 7 below. I = A and L = I. (Besides arriving at this point from step 1 above. but nevertheless true.5 Observe also that above the ith row. ˜ D ← I. ı ˜ it means the current iith element of the variable working matrix I.4).) 2.12. ˜ L ← I. ˜ I ← A. (The eventual goal will be to factor all of I away. THE GAUSS-JORDAN DECOMPOSITION 1. but they also confine their activity to well defined ˜ ˜ ˜ ˜ ˜ ˜ regions: m × m for P . I Since A by definition is a dimension-limited m×n matrix. D. though the precise value of r will not be known until step 8.

. . . . . . 0 1 ∗ ∗ ∗ ∗ ∗ . . 0 0 0 0 0 0 1 . . . . . but any nonzero element from the ith row downward ı can in general be chosen. . . 7 7 7 7 7 7 5 3 7 7 7 7 7 7 7 7 7. 0 0 0 0 1 0 0 . 7 7 7 7 7 7 5 where the ith row and ith column are depicted at center. . . . ··· ··· ··· ··· ··· ··· ··· . . . RANK AND THE GAUSS-JORDAN ˜ form L{i−1} (§ 11. Pictorially. ∗ ∗ ∗ ∗ ∗ ∗ ∗ . . ··· ··· ··· ··· ··· ··· ··· . . . . . . . . . . . . 0 0 ∗ 0 0 0 0 . . When one has no reason to choose otherwise. .290 CHAPTER 12. . .. . . . . . 0 0 0 0 1 0 0 . . that is as good a choice as any. . 2 . . if ˜ii = 0. . . . . Choose a nonzero element ˜pq = 0 on or below the pivot row. . 0 0 0 0 0 1 0 . . . . . . . . . . . ∗ ∗ ∗ ∗ ∗ ∗ ∗ . . ∗ ∗ ∗ ∗ ∗ ∗ ∗ .. . .. 7 7 7 7 7 7 7 7 7. ∗ 1 0 0 0 0 0 . where p = ı q = i. 0 0 0 0 0 0 1 . . . . ··· ··· ··· ··· ··· ··· ··· ··· ··· ··· ··· ··· ··· ··· . There is however no actual need to choose so. . 0 ∗ 0 0 0 0 0 . (The easiest choice may simply be ˜ii . ∗ 0 0 0 0 0 0 . . . . .8. . . . . . where ı p ≥ i and q ≥ i. . ∗ ∗ 1 0 0 0 0 . . . 1 0 0 0 0 0 0 .. . . ∗ ∗ ∗ ∗ ∗ ∗ ∗ . . . . Beginning students of the Gauss-Jordan or LU decomposition are conventionally taught to choose first the least possible q then the least possible p.5) and that dkk = 1 for all k ≥ i. 0 0 0 1 0 0 0 . 2 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 4 ˜ I = . . . . . . . . . 0 0 0 1 0 0 0 . . . . . . In fact alternate choices can sometimes improve . . ··· ··· ··· ··· ··· ··· ··· .. 1 ∗ ∗ ∗ ∗ ∗ ∗ . . . . . . . 0 0 1 ∗ ∗ ∗ ∗ . 3 ˜ D = 2 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 4 ··· ··· ··· ··· ··· ··· ··· . .. 3. 3 7 7 7 7 7 7 7 7 7. . 7 7 7 7 7 7 5 ˜ L = L{i−1} = 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 4 . . . . . . . 0 0 0 0 0 1 0 .

Smallness however is relative: a small pivot in a row and a column each populated by even smaller elements is unlikely to cause as much error as is a large pivot in a row and a column each populated by even larger elements.3. (The mantissa’s high-order bit. If two are equally least. is implied not stored. A typical Intel or AMD x86-class computer processor represents a C/C++ doubletype floating-point number. the choice is quite arbitrary. Such errors are naturally avoided by avoiding small pivots. then choose first the lesser column ˜2 index q.0 ≤ b < 2.12.7 Theoretically nonetheless. THE GAUSS-JORDAN DECOMPOSITION 291 practical numerical accuracy. 0xB are for the number’s exponent −0x3FF ≤ p ≤ 0x3FE. ı ı ı∗ ı ′ =i ˜p′ q˜p′ q + p q ′ =i ˜pq ′˜pq ′ 6 Choose the p and q of least ηpq .4) can be expanded to read A = ˜ P T[p↔i] ˜ T[p↔i] DT[p↔i] ˜ T[p↔i]LT[p↔i] ˜ T[p↔i]U T[p↔i] ˜ × T[p↔i] IT[i↔q] = ˜ T[i↔q] KT[i↔q] ˜ T[i↔q] S ˜ ˜ ˜ ˜ P T[p↔i] D T[p↔i] LT[p↔i] U ˜ ˜ ˜ × T[p↔i] IT[i↔q] K T[i↔q] S . which is always 1. so long as ˜pq = 0. The following heuristic if programmed intelligently might not be too computationally expensive: Define the pivotsmallness metric 2˜∗ ˜pq ı pqı Pn ηpq ≡ Pm ∗ ˜2 . ˜ ˜ S ← T[i↔q] S. [26. but of course it is not generally exact.) If ı no nonzero element is available—if all remaining rows p ≥ i are now null—then skip directly to step 8. To choose a pivot. 4. when doing exact arithmetic. let ˜ ˜ P ← P T[p↔i] . § 1-4. All this is standard computing practice.6. any of several heuristics are reasonable. .0 (not 1. then if necessary the lesser row index p.) The out-of-bounds exponents p = −0x400 and p = 0x3FF serve specially respectively to encode 0 and ∞.0 as one might expect). ˜ ˜ L ← T[p↔i] LT[p↔i]. in 0x40 bits of computer memory. Observing that (12.2.0 ≤ b < 4. 0x34 are for the number’s mantissa 2.2] 7 The Gauss-Jordan’s floating-point errors come mainly from dividing by small pivots. x = 2p b. at least until as late in the algorithm as possible. Of the 0x40 bits. ˜ ˜ I ← T[p↔i] IT[i↔q] . Such a floating-point representation is easily accurate enough for most practical purposes. and one is for the number’s ± sign. thus is one neither of the 0x34 nor of the 0x40 bits.

. ∗ ∗ ∗ ∗ ∗ ∗ ∗ . . . . ˜ This forces ˜ii = 1. ı ˜ D = 2 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 4 ˜ I = . still ˜ ˜ ˜ U = K = I. Regarding the L transformation. . . 0 0 0 0 0 1 0 . . . to bring the chosen element to the pivot position. . . ∗ ∗ 1 0 0 0 0 .292 CHAPTER 12.1 for the similarity transformations. . . . which form according to Table 12. 7 7 7 7 7 7 5 ˜ ˜ I ← T(1/˜ii )[i] I. . . . . . . . . . . . . . . . . . . . ˜ ˜ ı L ← T(1/˜ )[i] LT˜ ı ii ii [i] . ··· ··· ··· ··· ··· ··· ··· . ∗ ∗ ∗ 1 ∗ ∗ ∗ . . . . . (Re˜ ˜ fer to Table 12. . The U and K transformations disappear because at this stage of the algorithm. . .) 5. . .. . . . 0 0 0 ∗ 0 0 0 . The D transformation disappears because p ≥ i and ˜kk = 1 for all k ≥ i.. .4) can be expanded to read ˜ ˜ ı A = P DT˜ii [i] ˜ ı T(1/˜ii )[i] LT˜ii [i] ı ˜ ı T(1/˜ii )[i] U T˜ii [i] ı ˜ ˜˜ × T(1/˜ii )[i] I K S ı ˜ ˜ ı = P DT˜ii [i] ˜ ˜ ı ˜ ˜˜ T(1/˜ii )[i] LT˜ii [i] U T(1/˜ii )[i] I K S. . 0 0 0 0 0 0 1 . . but L has major partial unit triangular form L{i−1} . . Observing that (12. ··· ··· ··· ··· ··· ··· ··· ··· ··· ··· ··· ··· ··· ··· . . . . .. It also changes the value of dii . . . 0 0 ∗ 0 0 0 0 . 3 7 7 7 7 7 7 7 7 7. ∗ 1 0 0 0 0 0 . ı ı normalize the new ˜ii pivot by letting ı ˜ ˜ ı D ← DT˜ii [i] . RANK AND THE GAUSS-JORDAN thus interchanging the pth with the ith row and the qth with the ith column. ∗ ∗ ∗ ∗ ∗ ∗ ∗ . 0 0 0 0 1 0 0 . . . . . 0 ∗ 0 0 0 0 0 . 7 7 7 7 7 7 5 3 7 7 7 7 7 7 7 7 7. ∗ 0 0 0 0 0 0 . ∗ ∗ ∗ ∗ ∗ ∗ ∗ . . . ··· ··· ··· ··· ··· ··· ··· . .. . . . . 1 0 0 0 0 0 0 .1 it retains since i − 1 < i ≤ p. . 2 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 4 . . it does ˜ because d ˜ not disappear. . . Pictorially after ı this step. . . . . .

3 7 7 7 7 7 7 7 7 7. .3. . .) 7 7 7 7 7 7 7 7 7. . ··· ··· ··· ··· ··· ··· ··· . . . . ı ˜ L = L{i} = 2 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 4 ˜ I = . . .7. . . . . . It also fills in L’s ith column below the ı {i−1} form to the L{i} form. 7 7 7 7 7 7 5 ˜ I ←   ˜ L  m p=i+1 T˜pi [pi]  . . . . Observing that (12. again it leaves L in the major partial unit triangular form L{i−1} . . . 1 ∗ ∗ ∗ ∗ ∗ ∗ .. . .12. 2 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 4 . ∗ ∗ 1 0 0 0 0 . pivot. . Refer to Table 12. THE GAUSS-JORDAN DECOMPOSITION 293 ˜ ˜ (Though the step changes L. ∗ ∗ ∗ ∗ ∗ ∗ ∗ . 7 7 7 7 7 7 5 . . .. . . . . . 1 0 0 0 0 0 0 . ∗ ∗ ∗ ∗ ∗ ∗ ∗ . 0 0 0 1 ∗ ∗ ∗ . See § 11. . . ∗ 1 0 0 0 0 0 . . . . . .. . . ı ˜ clear I’s ith column below the pivot by letting   m ˜ L ← ˜ This forces ˜ip = 0 for all p > i. .3. . . . . ··· ··· ··· ··· ··· ··· ··· . . . . thus can be applied all at once. ∗ ∗ ∗ 1 0 0 0 . ··· ··· ··· ··· ··· ··· ··· ··· ··· ··· ··· ··· ··· ··· . . . .4) can be expanded to read ˜˜ ˜ ı A = P D LT˜pi [pi] ˜ ı T−˜pi [pi] U T˜pi [pi] ı ˜ ˜˜ T−˜pi [pi] I K S ı ˜˜ ˜ ı ˜ ˜ ˜˜ = P D LT˜pi [pi] U T−˜pi [pi] I K S. 0 0 0 0 0 1 0 . . ∗ ∗ ∗ ∗ ∗ ∗ ∗ . because i − 1 < i. . . too. . . . 3 (Note that it is not necessary actually to apply the addition elementaries here one by one. 0 0 0 0 1 0 0 . . . . . 0 0 0 0 0 0 1 . . . advancing that matrix from the L Pictorially. . . . . .) 6. 0 1 ∗ ∗ ∗ ∗ ∗ . 0 0 1 ∗ ∗ ∗ ∗ . . ı  p=i+1 ˜ T−˜pi [pi]  I . .1.. . . Together they easily form an addition quasielementary L[i] . . . .

then skip directly to e step 12. thus letting i point to the matrix’s last nonzero row. ı ˜ This forces ˜ip = 0 for all p = i. Notice that.4) can be expanded to read ˜˜˜ ˜ ı A = P DL U T˜pi [pi] ˜ ˜˜ T−˜pi [pi] I K S. It also fills in U ’s ith column above the ı {i+1} form to the U {i} form. Go back to step 2. 10. r ≤ m and r ≤ n.) If i = 0. Increment CHAPTER 12. pivot. certainly. (Besides arriving at this point from step 8 above. the algorithm also re¨nters here from step 11 below. 9. Decrement i← i−1 to undo the last instance of step 7 (even if there never was an instance of step 7). 8. let the rank r ≡ i. ı   ˜ T−˜pi [pi]  I . Observing that (12. advancing that matrix from the U . After decrementing.294 7. ı ˜ clear I’s ith column above the pivot by letting  i−1 p=1 ˜ U ← ˜ I ←   ˜ U  i−1 p=1 T˜pi [pi]  . RANK AND THE GAUSS-JORDAN i ← i + 1.

. 3 7 7 7 7 7 7 7 7 7.. . ··· ··· ··· ··· ··· ··· ··· ··· ··· ··· ··· ··· ··· ··· . . . . . . 0 0 0 1 0 0 0 . . . . . except with n − r extra columns dressing its right edge (often r = n however.4) can be expanded to read ˜˜˜˜ ˜ ı A = P D LU IT−˜pq [pq] ˜ ˜ T˜pq [pq] K S. . . . . ∗ ∗ ∗ ∗ ∗ 1 0 . . ∗ 1 0 0 0 0 0 . . .12. . . . 1 0 0 0 0 0 0 . ··· ··· ··· ··· ··· ··· ··· . . . . . . . ∗ ∗ ∗ ∗ 0 0 0 . . . ∗ ∗ ∗ ∗ 1 0 0 . 0 0 0 1 0 0 0 . . . . . . . Together they easily form an addition quasielementary U[i] . ∗ ∗ 1 0 0 0 0 . . . . . 2 ˜ I = (As in step 6. . . . 7 7 7 7 7 7 5 ˜ I= ··· ··· ··· ··· ··· ··· ··· . . . 0 0 1 0 0 0 0 . . 0 1 0 0 0 0 0 .. . .7. 2 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 4 . See § 11. . . ∗ ∗ ∗ ∗ ∗ ∗ 1 . ˜ 12. . . Observing that (12. 1 0 0 0 0 0 0 . . . 0 0 0 0 0 0 1 . . 1 0 0 0 0 0 0 . .. . . . . . 7 7 7 7 7 7 5 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 4 . . . .) 11. . . then there are no extra columns). . . . 0 1 0 0 0 0 0 . . Decrement i ← i − 1. ∗ ∗ ∗ ∗ 0 0 0 . . 3 7 7 7 7 7 7 7 7 7. . 2 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 4 . . ··· ··· ··· ··· ··· ··· ··· .. 7 7 7 7 7 7 5 295 ˜ U = U {i} = ··· ··· ··· ··· ··· ··· ··· . Go back to step 9. . . THE GAUSS-JORDAN DECOMPOSITION Pictorially. . . here again it is not necessary actually to apply the addition elementaries one by one. . 0 0 0 0 1 0 0 . . . . . .3. . . ı . . . ∗ ∗ ∗ ∗ 0 0 0 . . 0 0 0 0 0 1 0 . 0 0 1 0 0 0 0 . . . . . .3. . . . . ∗ ∗ ∗ 1 0 0 0 . . .. . . . . Pictorially. . . .. 3 7 7 7 7 7 7 7 7 7. . . . Notice that I now has the form of a rank-r identity matrix. .

See § 11.8. End.4 Rank and independent rows Observe that the Gauss-Jordan algorithm of § 12. here again it is not necessary actually to apply the addition elementaries one by one. Let ˜ ≡ P. entering this step. We can safely ignore this unproven fact however for the immediate moment. Together they easily form a parallel unit upper—not lower—triangular matrix {r}T .296 CHAPTER 12. Therefore. (12. regardless of which pivots ˜pq = 0 ı one happens to choose in step 3 along the way. ˜ Never stalling. that the algorithm always produces the same Ir . ˜ D ≡ D.) 12.3. RANK AND THE GAUSS-JORDAN ˜ use the now conveniently elementarized columns of I’s main body to suppress the extra columns on its right edge by   n r ˜ I ← ˜ ˜ (Actually. r ≤ n.5) . ı  q=r+1 p=1 ˜ T˜pq [pq] K . Notice now that I = Ir . r ≤ m. As in steps 6 and 10. but shall in § 12.5. ˜ K ≡ K. ı ˜ ≡ U.4.) L ˜ 13. the same rank r ≥ 0. ˜ L ≡ L. P U ˜ K ←   ˜ I  n q=r+1 p=1 r T−˜pq [pq]  . (We have not yet proven. though what value the rank r might turn out to have is not normally known to us in advance. so in fact K becomes just the product above. the algorithm cannot fail to achieve I = Ir and thus a complete Gauss-Jordan decomposition of the form (12.3.2).3 operates always within the bounds of the original m × n matrix A. it was that K = I. necessarily. ˜ S ≡ S.

r < m. one can invert. Refer to (11. but according to § 12. Observe also however that the rank always fully reaches r = m if the rows of the original matrix A are linearly independent. Step 6 in the ith iteration adds multiples of the ith pivot row downward. U .1). which already include multiples of the first through (i − 1)th. one can drown the assertion rigorously in symbols to prove it.41). K and S—is composed of a sequence T of elementary operators. a multiple of itself—until step 10. Step 6 in the second iteration therefore adds multiples of the second pivot row downward. in fact. If one merely records the sequence of elementaries used to build each of the six factors—if one reverses each sequence. 12.12. and multiplies—then the six inverse factors result. One actually need not record the individual elementaries. but find only rows which already include multiples of the first pivot row. How do we know that step 6 can never create a null row? We know this because the action of step 6 is to add multiples only of current and earlier pivot rows ˜ to rows in I which have not yet been on pivot. such action has no power to cancel the independent rows it targets. D−1 .7 and 11. inverts each elementary. but before going to that extreme consider: The action of steps 3 and 4 is to choose a pivot row p ≥ i and to shift it upward to the ith position.8 have shown how. L.3. One need not however go even to that much trouble. To no row is ever added. The reason for this observation is that the rank can fall short. This being so. such null rows never were linearly independent in the first place). only if step 3 finds a null row i ≤ m.8 According to (12. but step 3 can find such a null row only if step 6 has created one (or if there were a null row in the original matrix A. L−1 . Sections 11. steps 3 and 4 in the second iteration find no unmixed rows available to choose as second pivot. The action of step 6 then is to add multiples of the chosen pivot row downward only—that is. Each of the six factors—P . which already include multiples of the first pivot row. only to rows which have not yet been on pivot.1. So it comes to pass that multiples only of current and earlier pivot rows are added to rows which have not yet been on pivot.3. directly or indirectly. it isn’t even that hard. multiply and forget them in stream. which does not belong to the algorithm’s main loop and has nothing to do with the availability of nonzero rows to step 3. This is unsurprising. Each of the six inverse factors—P −1 . D. And. This means starting the algorithm from step 1 with six extra variable work8 If the truth of the sentence’s assertion regarding the action of step 6 seems nonobvious. Indeed the narrative of the algorithm’s step 8 has already noticed the fact. THE GAUSS-JORDAN DECOMPOSITION 297 The rank r exceeds the number neither of the matrix’s rows nor of its columns. .5 Inverting the factors Inverting the six Gauss-Jordan factors is easy. U −1 . K −1 and S −1 —is therefore composed of the reverse sequence T −1 of inverse elementary operators.

RANK AND THE GAUSS-JORDAN ing matrices (besides the seven already there): ˜ P −1 ← I. we left-multiply (12. ˜ U −1 ← I. by their n × n.3). actions which have no effect on A since it is already a dimension-limited m×n 9 Section 11. anyway. To truncate the six operators formally. Indeed.3.) Then. the Im and In respectively truncate rows and columns. row operators. ˜ L−1 ← I. so there is nothing for the six operators to act upon beyond those bounds in any event.3. ˜ ˜ ı D ← DT˜ii [i] . unknown at algorithm’s start. ı ˜ ˜ I ← T(1/˜ii )[i] I. ı ˜ ˜ D−1 ← T(1/˜ii )[i] D−1 . ˜ K −1 ← I.2) actually needs to retain its entire extendedoperational form (§ 11. not because it would not be useful. ı ı With this simple extension. in step 5.298 CHAPTER 12. K or S. for each ˜ ˜ ˜ ˜ ˜ ˜ operation on any of P . For example. obtaining Im AIn = Im P DLU Ir KSIn .1) for this reason if we want. U . the algorithm yields all the factors not only of the Gauss-Jordan decomposition (12. the two on the right. According to § 11. neither Ir nor A has anything but zeros outside the m × n rectangle. We can truncate all six operators to dimension-limited forms (§ 11. ˜ D−1 ← I. column operators. but because its initial value would be9 A−1(r) . act wholly by their m × m squares. The four factors on the left.2).2) but simultaneously also of the Gauss-Jordan’s complementary form (12.6 Truncating the factors None of the six factors of (12. ˜ ˜ ı L ← T(1/˜ii )[i] LT˜ii [i] .3.3. ˜ (There is no I −1 .2) by Im and right-multiply it by In .5 explains the notation. ı ˜ ˜ L−1 ← T(1/˜ii )[i] L−1 T˜ii [i] . ˜ S −1 ← I. . D. 12. L. one operates inversely on the corresponding inverse matrix.6.

Comes the objection.6) expresses any dimension-limited rectangular matrix A as the product of six particularly simple. having infinite rank. the dimensionality m × n of the matrix is a distraction. serves the latter case. m × r. A = Im P DLU Ir KSIn 7 2 3 = Im P DLU Ir KSIn . then. dimension-limited rectangular factors. 299 and finally. whose rank matters. Equation (12. and theoretically in the author’s view. too.2 lists a few. Table 12.31) or (11. (12. m × m. or a reversible row or column operation. one need not be too impressed with either. A = (Im P Im )(Im DIm )(Im LIm )(Im U Ir )(Ir KIn )(In SIn ).3. By similar reasoning from (12. by using (11.7) where the dimensionalities of the two factors are m × r and r × n. Anyway. a matrix displaying an infinite field of zeros resembles a shipment delivering an infinite supply of nothing. why do you make it more complicated than it needs to be? For what reason must all your matrices have infinite dimensionality. whose rank does not. The answer is that this book is a book of applied mathematical theory.7 Properties of the factors One would expect such neatly formed operators as the factors of the GaussJordan to enjoy some useful special properties.42) repeatedly.3 manifest the sense that a matrix can represent a linear transformation. The extended-operational form. It is the rank r. The book will seldom point it out again explicitly. but one can straightforwardly truncate not only the Gauss-Jordan factors but most other factors and operators. (12. if any. 10 . however.2). nor is there any real difference between T5[21] when it row-operates on a 3 × p matrix and the same elementary when it row-operates on a 4 × p. infinite-dimensional matrices are significantly neater to handle. THE GAUSS-JORDAN DECOMPOSITION matrix. “Black.12. The relevant theoretical constructs ought to reflect such insights. To append a null row or a null column to a dimension-limited matrix is to alter the matrix in no essential way.10 12. m × m.3. Indeed they do. anyway? They don’t do it that way in my other linear algebra book. by the method of this subsection. The table’s properties formally come from (11.52) and Table 11. Hence infinite dimensionality.” It is a fair question. A = (Im G> Ir )(Ir G< In ). The two matrix forms of § 11.5. that counts. In either case. By successive steps. r × n and n × n.6) where the dimensionalities of the six factors on the equation’s right side are respectively m × m.

The table’s properties regarding P and S express a general advantage all permutors share. Further properties of the several Gauss-Jordan factors can be gleaned from the respectively relevant subsections of §§ 11. 12. if one firmly grasps the matrix forms involved and comprehends the notation (neither of which is trivial to do).8. then the properties are plainly seen without reference to Ch. then one can take advantage of (11.8 Marginalizing the factor In If A happens to be a square.3.8) thus marginalizing the factor In . The table’s properties regarding K are admittedly less significant. n × n matrix and if it develops that the rank r = n. and if one sketches the relevant factors schematically with a pencil.300 CHAPTER 12.31) to rewrite the Gauss-Jordan decomposition (12.2) in the form P DLU KSIn = A = In P DLU KS. RANK AND THE GAUSS-JORDAN Table 12. Still.2: A few properties of the Gauss-Jordan factors. P ∗ = P −1 = P T S ∗ = S −1 P −∗ = S −∗ = P S = ST = P −T = S −T = I = Ir (K − I)(In − Ir ) = (K − I)(I − In ) K + K −1 2 Ir K −1 (In (I − In Ir K(In − Ir ) = − Ir ) = − I) = K −1 K −I 0 0 (I − In )(K − I) = )(K −1 − I = Ir (K −1 − I)(In − Ir ) = (K −1 − I)(I − In ) but. They might find other uses. if one understands that the operator (In − Ir ) is a truncator that selects columns r + 1 through n of the matrix it operates leftward upon. (12.7 and 11. This is to express the Gauss-Jordan solely in row operations or solely in column operations.3 will need them. It does not change the . 11 as such. included mostly only because § 13. even the K properties are always true.

9 Decomposing an extended operator Once one has derived the Gauss-Jordan decomposition. the extended-operational matrix form is hardly more than a formality. the preceding equation results. if you like. Or. too. this may be a distinction without a difference.12.6 and 11. of (11. Indeed this was also the point of § 12.” as the narrative says. 13.3. eqn. To put them all in dimension-limited form however brings at least three effects the book you are reading prefers to avoid. one decomposes the n × n dimension-limited matrix In A = In AIn = AIn as AIn = P DLU In KS = P DLU KSIn . wherein the Ir has become an I. because the vector on which the operator ultimately acts is probably null there in any event. The last of the three is arguably most significant: matrix rank is such an important attribute that one prefers to impute it only to those operators about which it actually says something interesting. it leaves shift-and-truncate operations hard to express cleanly (refer to §§ 11. it leaves one to consider the ranks of reversible operators like T[1↔2] that naturally should have no rank.6 and. whereas a dimension-limited operator would have truncated whatever it found there.31). then one might just put all matrices in dimension-limited form. THE GAUSS-JORDAN DECOMPOSITION 301 algorithm and it does not alter the factors. Nevertheless.9 and. n × n extended operator A (where per § 11. Since what is found outside the operational domain is often uninteresting.3. One can decompose only reversible extended operators so. It fails however if A is rectangular or r < n. it merely reorders the factors after the algorithm has determined them. is that what an operator does outside some appropriately delimited active region is seldom interesting. 11 . One merely writes A = P DLU KS.3. This subsection’s equations remain unnumbered because they say little new. really. to extend it to decompose a reversible.3. In such a context it may not matter whether one truncates the operator.3. See § 12. as a typical example of the usage.11 If “it may not matter. it confuses the otherwise natural extension of discrete vectors into continuous functions. Second. inasmuch as all the factors present but In are n × n extended operators. which one can safely ignore. First. 12. for which the rank of the truncated form AIn is r < n. Their only point. equivalently. Third. All it says is that the extended operator unobtrusively leaves untouched anything it happens to find outside its operational domain. from which.7).5.2 A outside the n × n active region resembles the infinite-dimensional identity matrix I) is trivial. The GaussJordan fails on irreversible extended operators. Many books do.

13. . for which ψo = 0. However. the Gauss-Jordan decomposition is a significant achievement. RANK AND THE GAUSS-JORDAN Regarding the present section as a whole. the space consists of all vectors b formable as βo u + β1 a1 + β2 a2 + · · · + βm am = b.12.302 CHAPTER 12. That is. Solving (12. before closing the present chapter we should like finally. ψo u + ψ1 a1 + ψ2 a2 + · · · + ψm am = v. . As a definition.10 and the singular value of § 14. we shall first need one more preliminary: the technique of vector replacement. ψo ψo ψo ψo (12. We shall put the Gauss-Jordan to good use in Ch. the diagonal of § 14. 12. Now consider a specific vector v in the space. which we had not known how to handle very well.9) . into a product of unit triangular matrices and quasielementaries. am }. To do that.10) (12. It is not the only matrix decomposition— further interesting decompositions include the Gram-Schmidt of § 13. we find that ψ1 ψ2 ψm 1 v− a1 − a2 − · · · − am = u. . squarely to define and to establish the concept of matrix rank. which we do. a2 . a1 . among others—but the Gauss-Jordan nonetheless reliably factors an arbitrary m × n matrix A. the space these vectors address consists of all linear combinations of the set’s several vectors.4 Vector replacement Consider a set of m + 1 (not necessarily independent) vectors {u.10) for u. . the Schur of § 14.10. not only for Ir but for all matrices.6.

. Now further consider an arbitrary vector b which lies in the space addressed by the vectors {u. assuming that ψo is finite it even appears that φo = 0. φo φ2 = − . only with the symbols u ↔ v and ψ ↔ φ swapped. it happens that ψo = ψ1 ψ2 1 . φm = − . a2 . .3 summarizes. . . the symmetry is complete. . Table 12.10). φo ψm the solution is φo v + φ1 a1 + φ2 a2 + · · · + φm am = u. a2 . ψo . Does the same b also lie in the space addressed by the vectors {v. . ψm .11) has identical form to (12. am }? . φo φ1 = − . φo . .4. ψo ψ1 ← − . (12. a1 . . ψo ψ2 ← − . . . quite symmetrically. am }. VECTOR REPLACEMENT With the change of variables φo ← φ1 φ2 1 . .11) Equation (12.12. so. a1 . . ← − ψo 303 φm for which. Since φo = 1/ψo .

one can safely replace the u by a new vector v. . . . a2 . . provided that the replacement vector v includes at least a little of the replaced vector u (ψo = 0 in eqn. 12. never just in one or the other. one and the same. . we substitute into (12. Given a set of vectors {u. . obtaining the form (βo )(φo v + φ1 a1 + φ2 a2 + · · · + φm am ) + β1 a1 + β2 a2 + · · · + βm am = b.4. ψo u + ψ1 a1 + ψ2 a2 + · · · + ψm am = v 0 = 1 ψo ψ1 − ψo ψ2 − ψo = φo = φ1 = φ2 . in which we see that. . a1 . RANK AND THE GAUSS-JORDAN Table 12. . obtaining the new set {v. a vector b must lie in both spaces or neither. The two spaces are.9) the expression for u from (12. a1 .304 CHAPTER 12. . − ψm ψo = φm φo v + φ1 a1 + φ2 a2 + · · · + φm am = u 0 = 1 φo φ1 − φo φ2 − φo = ψo = ψ1 = ψ2 . Collecting terms. . . am }. Therefore.3: The symmetrical equations of § 12. b does indeed also lie in the latter space. This leads to the following useful conclusion. that an arbitrary vector b which lies in the latter space also lies in the former. this is βo φo v + (βo φ1 + β1 )a1 + (βo φ2 + β2 )a2 + · · · + (βo φm + βm )am = b. − φm φo = ψm To show that it does. a2 . am }.11). Naturally the problem’s u ↔ v symmetry then guarantees the converse. .10) and that v is otherwise an honest . in fact. yes.

10) would still also explicitly make v a linear combination of the same ak plus a nonzero multiple of u. for.6 have introduced the rank-r identity matrix Ir . As a corollary. The contradiction proves false the assumption which gave rise to it: that the vectors of the new set were linearly dependent. because according to § 12. The new space is exactly the same as the old. where the integer r is the number of ones the matrix has along its main diagonal.1).5 and 11. If a matrix element . if it were that γo v + γ1 a1 + γ2 a2 + · · · + γm am = 0 for nontrivial γo and γk . To forestall potential confusion in the matter. Also.12. unambiguous rank. Commonly. then the vectors of the new set are linearly independent. the third row is just two-thirds the second. 6 The third column of this matrix is the sum of the first and second columns. in which case v would be a linear combination of the several ak alone. an n × n matrix has rank r = n. too. then (12. then either γo = 0—impossible since that would make the several ak themselves linearly dependent—or γo = 0. the matrix has only two independent vectors. Either way.5. we should immediately observe that—like the rest of this chapter but unlike some other parts of the book—this section explicitly trades in exact numbers.1.5 Rank Sections 11.3. The section demonstrates that every matrix has a definite. Such vector replacement does not in any way alter the space addressed. if the vectors of the original set happen to be linearly independent (§ 12. The rank of this 3 × 3 matrix is not r = 3 but r = 2. But if v were a linear combination of the several ak alone. Yet both combinations cannot be. and shows how this rank can be calculated.3. RANK 305 linear combination of the several vectors of the original set. untainted by foreign contribution. but consider the matrix 2 5 4 3 2 1 6 4 3 6 9 5. by columns or by rows. Hence the vectors of the new set are equally as independent as the vectors of the old. 12. too. two distinct combinations of independent vectors can never target the same v. This section establishes properly the important concept of matrix rank. Other matrices have rank.

then it is exactly 5. the numbers are exact. The author has exactly one brother. but that is not the point here. and so on.5. “the end justifies the means. which thereby one can and does prove.2 days). so this subsection is to prepare the reader to expect the maneuver.0 ± 0. If P3 then Q. Herein.306 CHAPTER 12.12 12. then.1 sides! A triangle has exactly three sides. it is the applied mathematician’s responsibility to distinguish between the two kinds of quantity.0±0. would be to suppose P1 and show that it led to Q. one can still conclude that their common object Q is true. On the contrary. none of which one actually means to prove. which one might name.5.2 we shall execute a pretty logical maneuver. Where the distinction matters. although one can draw no valid conclusion regarding any one of the three predicates P1 . of course—especially matrices populated by measured data—can never truly be exact. Once the reader feels that he grasps its logic.1 inches. Exact quantities are every bit as valid in applied mathematics as imprecise ones are. The ratio of a circle’s circumference to its radius is exactly 2π. Many real-world matrices. the length of your thumb may indeed be 3. The final step would be to show somehow that P1 .1 A logical maneuver In § 12.1 inches for the length of your thumb. the maneuver if unexpected can confuse. One valid way to prove Q. It is false to suppose that because applied mathematics permits imprecise quantities. 13 The maneuver’s name rings a bit sinister. it also requires them. the means is to assert several individually suspicious claims.0±0. but surely no triangle has 3. then again to suppose P3 and show that it also led to Q. P2 and P3 could not possibly all be false at once. P2 and P3 are true is not known. RANK AND THE GAUSS-JORDAN here is 5. A construction contract might require the builder to finish within exactly 180 days (though the actual construction time might be an inexact t = 172. Therefore.6 ± 0. Here. Which of P1 . 12 . If P2 then Q.2.”13 When embedded within a larger logical construct as in § 12. P2 or P3 . does it not? The author recommends no such maneuver in social or family life! Logically here however it helps the math. It is a subtle maneuver. then exactly 0. if 0. The logical maneuver follows this basic pattern. like 3.5. but what is known is that at least one of the three is true: P1 or P2 or P3 . If P1 then Q. The end which justifies the means is the conclusion Q. then alternately to suppose P2 and show that it separately led to Q. he can proceed to the next subsection where the maneuver is put to use.

7).1 among others has already illustrated the technique. then. e5 . 15 As the reader will have observed by this point in the book. e2 . In fact it is impossible.14) exists for each ej . e4 . but to prove this. . . a4 .12) can be written in the form (AIr )B = Is . then reasons toward a contradiction which proves the assumption false. here we have not necessarily assumed that the several ak are linearly independent. Observe that unlike as in § 12. This subsection proves the impossibility. . a3 . . .3. but the technique’s use here is more sophisticated. If r < s. because Ir attacking from the right is the column truncation operator (§ 11. more precisely. ever transform an identity matrix into another identity matrix of greater rank (§ 11. B operates on the r columns of AIr to produce the s columns of Is . .3.15 and it runs as follows.3.12) holds: A = Is = B. RANK 307 12. are nothing more than the elementary vectors e1 .2 The impossibility of identity-matrix promotion Consider the matrix equation AIr B = Is .1.14) will turn out to be false because there are too many ej .6). One assumes the falsehood to be true. 14 . (12. we shall assume for the moment that the claim were true. . ar denote these columns.1. that a linear combination14 b1j a1 + b2j a2 + b3j a3 + · · · + brj ar = ej (12. reversible or otherwise. a5 .5. a2 . the technique—also called reductio ad absurdum—is the usual mathematical technique to prove impossibility. the product AIr is a matrix with an unspecified number of rows but only r columns—or.5). The proof then is by contradiction.13) makes is thus that the several vectors ak together address each of the several elementary vectors ej —that is. . 1 ≤ j ≤ s.1. (12.13) where. Viewed this way. The s columns of Is . Equation (12. The claim (12. it is not so easy. Let the symbols a1 . however.12) If r ≥ s. Section 6. per § 11. The r columns of AIr are nothing more than the first through rth columns of A.5.3. es (§ 11. with no more than r nonzero columns. then it is trivial to find matrices A and B for which (12. e3 . The claim (12. It shows that one cannot by any row and column operations.12.

lead ultimately alike to the same identical end remains to be determined.4 is that e1 contain at least a little of the vector ak it replaces—that bk1 = 0. then a3 might be available instead because b31 = 0. . even though it is illegal to replace an ak by an e1 which contains none of it. true and false. then all the logic requires is an assurance that there does exist at least one true choice. ar }. {e1 . Therefore. a5 . then according to (12. Whether as the maneuver also demands. . so for e1 to replace a1 might not be allowed. according to § 12. but surely it contains at least one of them. . . provided that the false choice and the true choice lead ultimately alike to the same identical end. .1 comes in. For j = 1. even if we remain ignorant as to which choice that is. The claim (12. even though we have no idea which of the several ak the vector e1 contains.14) guarantees at least one true choice. Some of the ak might indeed be forbidden. . ar }. Of course there is no guarantee specifically that b11 = 0. a5 . . RANK AND THE GAUSS-JORDAN Consider the elementary vector e1 .14) at least one of the several bk1 also is nonzero. However. even though replacing the wrong ak logically invalidates any conclusion which flows from the replacement. Because e1 is a linear combination. still we can proceed with the proof. If they do. inasmuch as e1 is nonzero. there is always at least one ak which e1 can replace. e1 . In this case the new set would be {a1 . a4 . (For example. all the choices. replacing a1 by e1 .4 one can safely replace any of the vectors in the set by e1 without altering the space addressed. The only restriction per § 12. According to (12. which says that the elementary vector e1 is a linear combination of the several vectors {a1 . . . a2 .14).14) is b11 a1 + b21 a2 + b31 a3 + · · · + br1 ar = e1 . a4 . and if bk1 is nonzero then e1 can replace ak . . a3 . a2 . For example. but never all. a2 .5. the vector e1 might contain some of the several ak or all of them. a4 . a3 . a5 .) Here is where the logical maneuver of § 12. but that section depends logically on this one. . The book to this point has established no general method to tell which of the several ak the elementary vector e1 actually contains (§ 13.308 CHAPTER 12.2 gives the method. . . (12. if a1 were forbidden because b11 = 0. ar }. so we cannot licitly appeal to it here).

e2 lies in the space addressed by the original set {a1 . . a3 . a5 . ar }. e3 . . Therefore as we have seen. for (as should seem plain to the reader who grasps the concept of the elementary vector. ar }. a2 . or whatever the new set happens to be). a2 . which thereby is properly established.7) no elementary vector can ever be a linear combination of other elementary vectors alone! The linear combination which forms e2 evidently includes a nonzero multiple of at least one of the remaining ak . And so it goes. § 11. Again it is impossible for all the coefficients βk2 to be zero. . This newer set addresses precisely the same space as the previous set. . which is attached to e1 ) must be nonzero.5. e2 . addresses precisely the same space as did the original set {a1 . . . . a5 . All intermediate choices. a3 . . . e2 also lies in the space addressed by the new set {e1 . a4 . ultimately lead to the single conclusion of this paragraph. . . .3.5. . ar }. . a2 . thus also as the original set. we now choose an ak with a nonzero coefficient βk2 = 0 and replace it by e2 . RANK 309 Now consider the elementary vector e2 . a3 . . . . obtaining an even newer set of vectors like {e1 . a2 . a5 . not only do coefficients bk2 exist such that b12 a1 + b22 a2 + b32 a3 + · · · + br2 ar = e2 .12. At least one of the βk2 attached to an ak (not β12 . . . but moreover. which. That is. e2 . . . until all the ak are gone and our set has become {e1 . a5 . a4 . . . replacing one ak by an ej at a time. a4 . true and false. e5 . ar }.14). According to (12. And this is the one identical end the maneuver of § 12. a4 . ar } (or {a1 . Therefore by the same reasoning as before. a3 . but also coefficients βk2 exist such that β12 e1 + β22 a2 + β32 a3 + · · · + βr2 ar = e2 . er }. . a2 .1 has demanded. a5 . . e4 . e1 . as we have reasoned. it is impossible for β12 to be the sole nonzero coefficient.

are evidently left over. than there are ak . e2 . (This is from § 11. e2 . e4 . 12. . e4 . 1 ≤ j ≤ s. as we have stipulated.13). a5 . even when r < s.3 General matrix rank and its uniqueness Step 8 of the Gauss-Jordan algorithm (§ 12. asserts that B is a column operator which does precisely what we have just shown impossible: to combine the r columns of AIr to yield the s columns of Is . e5 . Plainly this is impossible with respect to the left-over ej . . r < j ≤ s. r < j ≤ s. 1 ≤ k ≤ r. but until now we have lacked the background theory to prove it. they can never promote them. . the claim (12. The false claim: that the several ak . r < s.5. . . we conclude that although row and column operations can demote identity matrices in rank. because. The proof begins with a formal definition of the quantity whose uniqueness we are trying to prove. 1 ≤ k ≤ r. RANK AND THE GAUSS-JORDAN The conclusion leaves us with a problem. Equation (12. . The contradiction proves false the claim which gave rise to it. . ar } together addressed each of the several elementary vectors ej . .310 CHAPTER 12. es .3. In other words. 1 ≤ j ≤ s. e3 .14) made was that {a1 . Here is the proof. addressed all the ej . Back at the beginning of the section. There are more ej . Hence finally we conclude that no matrices A and B exist which satisfy (12. But as we have seen.12) when r < s. The promotion of identity matrices is impossible. a3 . One should like to think that this rank r were a definite property of the matrix itself rather than some unreliable artifact of the algorithm. a2 . Some elementary vectors ej . Now we have the theory. . e3 . er } together addressed each of the several elementary vectors ej . .12) written differently. the latter of which are just the elementary vectors e1 . .) • The rank r of a general matrix A is the rank of an identity matrix Ir to which A can be reduced by reversible row and column operations. • The rank r of an identity matrix Ir is the number of ones along its main diagonal.3) discovers a rank r for any matrix A. . however.3. .5. e5 . which is just (12. a4 . this amounts to a claim that {e1 .

A matrix A has rank r if and only if matrices B> and B< exist such that

B> A B< = Ir,
B>^{-1} B> = I = B> B>^{-1},
B<^{-1} B< = I = B< B<^{-1}.    (12.15)

The question is whether in (12.15) only a single rank r is possible.

To answer the question, we suppose that another rank were possible, that A had not only rank r but also rank s. Then,

A = B>^{-1} Ir B<^{-1},
A = G>^{-1} Is G<^{-1}.

Combining these equations,

B>^{-1} Ir B<^{-1} = G>^{-1} Is G<^{-1}.

Solving first for Ir, then for Is,

(B> G>^{-1}) Is (G<^{-1} B<) = Ir,
(G> B>^{-1}) Ir (B<^{-1} G<) = Is.

Were it that r ≠ s, then one of these two equations would constitute the demotion of an identity matrix and the other, a promotion. But according to § 12.5.2 and its (12.12), promotion is impossible. Therefore r ≠ s is also impossible, and r = s is guaranteed. No matrix has two different ranks. Matrix rank is unique.

This finding has two immediate implications:

• Reversible row and/or column operations exist to change any matrix of rank r to any other matrix of the same rank. The reason is that, according to (12.15), reversible operations exist to change both matrices to Ir and back.

• No reversible operation can change a matrix's rank.

The discovery that every matrix has a single, unambiguous rank and the establishment of a failproof algorithm—the Gauss-Jordan—to ascertain that rank have not been easy to achieve, but they are important achievements nonetheless, worth the effort thereto. Section 12.5.8 comments further.
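The uniqueness of rank can be illustrated numerically. A small Python sketch follows; it is not part of the text's own development and leans on the numpy library (whose numpy.linalg.matrix_rank determines rank by the singular value decomposition rather than by Gauss-Jordan reduction), but it makes the point that reversible row and column operations—permutations and nonzero scalings, for example—never change the rank they report. The 3 × 3 example matrix is built deliberately from rank-2 factors.

```python
import numpy as np

# A 3x3 matrix built as a product of a 3x2 and a 2x3 factor,
# so its true rank is 2 however large its 3x3 dimensionality looks.
A = np.array([[1., 0.],
              [0., 1.],
              [1., 1.]]) @ np.array([[5., 1., 6.],
                                     [3., 6., 9.]])

rng = np.random.default_rng(0)
for _ in range(5):
    P = np.eye(3)[rng.permutation(3)]            # reversible row permutation
    Q = np.eye(3)[:, rng.permutation(3)]         # reversible column permutation
    D = np.diag(rng.uniform(1.0, 2.0, size=3))   # reversible (nonzero) row scalings
    print(np.linalg.matrix_rank(P @ D @ A @ Q))  # prints 2 every time
```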

The reason these achievements matter is that the mere dimensionality of a matrix is a chimerical measure of the matrix's true size—as for instance for the 3 × 3 example matrix at the head of the section. Matrix rank by contrast is an entirely solid, dependable measure. We shall rely on it often.

12.5.4 The full-rank matrix

According to (12.5), the rank r of a matrix can exceed the number neither of the matrix's rows nor of its columns. The greatest rank possible for an m × n matrix is the lesser of m and n. A full-rank matrix, then, is defined to be an m × n matrix with maximum rank r = m or r = n—or, if m = n, both. A matrix of less than full rank is a degenerate matrix.

Consider a tall m × n matrix C, m ≥ n, one of whose n columns is a linear combination (§ 12.1) of the others. One could by definition target the dependent column with addition elementaries, using multiples of the other columns to wipe the dependent column out. Having zeroed the dependent column, one could then interchange it over to the matrix's extreme right, effectively throwing the column away, shrinking the matrix to m × (n − 1) dimensionality. Shrinking the matrix necessarily also shrinks the bound on the matrix's rank to r ≤ n − 1—which is to say, to r < n. But the shrink, done by reversible column operations, is itself reversible, by which § 12.5.3 binds the rank of the original, m × n matrix C likewise to r < n. The matrix C, one of whose columns is a linear combination of the others, is necessarily degenerate for this reason.

Now consider a tall matrix A with the same m × n dimensionality, but with a full n independent columns. The transpose AT of such a matrix has a full n independent rows. One of the conclusions of § 12.3 was that a matrix of independent rows always has rank equal to the number of rows. Since AT is such a matrix, its rank is a full r = n. But formally, what this says is that there exist operators B<^T and B>^T such that In = B<^T AT B>^T, the transpose of which equation is B> A B< = In—which in turn says that not only AT but also A itself has full rank r = n.

Parallel reasoning rules the rows of broad matrices, m ≤ n, of course. To square matrices, m = n, both lines of reasoning apply.

Gathering findings, we have that

• a tall m × n matrix, m ≥ n, has full rank if and only if its columns are linearly independent;

• a broad m × n matrix, m ≤ n, has full rank if and only if its rows are linearly independent;

• a square n × n matrix, m = n, has full rank if and only if its columns and/or its rows are linearly independent; and

• a square matrix has both independent columns and independent rows, or neither, never just one or the other.

To say that a matrix has full row rank is to say that it is broad or square and has full rank r = m ≤ n. To say that a matrix has full column rank is to say that it is tall or square and has full rank r = n ≤ m. Only a square matrix can have full column rank and full row rank at the same time, because a tall or broad matrix cannot but include, respectively, more columns or more rows than Ir.

12.5.5 Underdetermined and overdetermined linear systems (introduction)

The last paragraph of § 12.5.4 provokes yet further terminology. A linear system Ax = b is underdetermined if A lacks full column rank—that is, if r < n—because inasmuch as some of A's columns then depend linearly on the others, such a system maps multiple n-element vectors x to the same m-element vector b, meaning that knowledge of b does not suffice to determine x uniquely. Complementarily, a linear system Ax = b is overdetermined if A lacks full row rank—that is, if r < m. If A happily has both, m = n, then the system is exactly determined. If A lacks both, then the system is paradoxically both underdetermined and overdetermined and is thereby degenerate. Section 13.2 solves the exactly determined linear system; § 13.4 solves the nonoverdetermined linear system; § 13.6 analyzes the unsolvable overdetermined linear system among others. Further generalities await Ch. 13; regarding the overdetermined system specifically, however, the present subsection would observe at least the few following facts.

An overdetermined linear system Ax = b cannot have a solution for every possible m-element driving vector b. The truth of this claim can be seen by decomposing the system's matrix A by Gauss-Jordan and then left-multiplying the decomposed system by G>^{-1} to reach the form

Ir G< x = G>^{-1} b.

If the m-element vector c ≡ G>^{-1} b, then Ir G< x = c, which is impossible unless the last m − r elements of c happen to be zero. But since G> is

invertible, each b corresponds to a unique c and vice versa; so, if b is an unrestricted m-element vector then so also is c, which verifies the claim.

Complementarily, a nonoverdetermined linear system Ax = b does have a solution for every possible m-element driving vector b. This is so because in this case the last m − r elements of c do happen to be zero; or, better stated, because c in this case has no last m − r elements, for the trivial reason that r = m.

It is an analytical error, and an easy one innocently to commit, to require that Ax = b for unrestricted b when A lacks full row rank. The error is easy to commit because the equation looks right, because such an equation is indeed valid over a broad domain of b and might very well have been written correctly in that context, only not in the context of unrestricted b. Analysis including such an error can lead to subtly absurd conclusions. It is never such an analytical error however to require that Ax = 0 because, whatever other solutions such a system might have, it has at least the solution x = 0.

12.5.6 The full-rank factorization

One sometimes finds dimension-limited matrices of less than full rank inconvenient to handle. However, every dimension-limited, m × n matrix of rank r can be expressed as the product of two full-rank matrices, one m × r and the other r × n, both also of rank r:

A = BC.    (12.16)

The truncated Gauss-Jordan (12.7) constitutes one such full-rank factorization: B = Im G> Ir, C = Ir G< In, good for any matrix. Other full-rank factorizations are possible, however, including among others the truncated Gram-Schmidt (13.54). The full-rank factorization is not unique.16

Of course, if an m × n matrix already has full rank r = m or r = n, then the full-rank factorization is trivial: A = Im A or A = A In.

Section 13.6.4 uses the full-rank factorization.

16 [2, § 3.3][37, "Moore-Penrose generalized inverse"]
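As a practical aside—not part of the text, and not the truncated Gauss-Jordan factorization named above—one of the many alternative full-rank factorizations can be computed from the singular value decomposition. The Python sketch below (numpy-based, with an arbitrary rank-2 example matrix) splits a matrix of rank r into an m × r factor of full column rank times an r × n factor of full row rank, illustrating (12.16) and the non-uniqueness of the factorization.

```python
import numpy as np

def full_rank_factorization(A, tol=1e-12):
    """Split an m x n matrix of rank r into B (m x r) times C (r x n).

    Uses the SVD rather than the text's truncated Gauss-Jordan; it is one
    of many possible full-rank factorizations, not the only one.
    """
    U, s, Vh = np.linalg.svd(A)
    r = int(np.sum(s > tol * s.max())) if s.size else 0
    B = U[:, :r] * s[:r]          # m x r, full column rank
    C = Vh[:r, :]                 # r x n, full row rank
    return B, C

A = np.array([[1., 2., 3.],
              [2., 4., 6.],
              [1., 0., 1.]])      # an arbitrary rank-2 example
B, C = full_rank_factorization(A)
print(B.shape, C.shape)           # (3, 2) (2, 3)
print(np.allclose(B @ C, A))      # True
```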

12.5.7 Full column rank and the Gauss-Jordan factors K and S

The Gauss-Jordan decomposition (12.2),

A = P D L U Ir K S,

of a tall or square m × n matrix A of full column rank r = n ≤ m always finds the factor K = I. Moreover, if one happens always to choose q = i as pivot column during the Gauss-Jordan algorithm's step 3, then S = I, too, regardless of the pivots one chooses during the algorithm's step 3 otherwise.

That K = I is seen by the algorithm's step 12, which creates K. Step 12 nulls the spare columns q > r that dress Ĩ's right, but in this case Ĩ has only r columns and therefore has no spare columns to null. Hence step 12 does nothing and K = I.

That S = I comes immediately of choosing q = i for pivot column during each iterative instance of the algorithm's step 3. But, one must ask, can one choose so? What if column q = i were unusable? That is, what if the only nonzero elements remaining in Ĩ's ith column stood above the main diagonal, unavailable for step 4 to bring to pivot? Were it so, then one would indeed have to choose q ≠ i to swap the unusable column away rightward; but see: nothing in the algorithm later fills such a column's zeros with anything else—they remain zeros—so swapping the column away rightward could only delay the crisis. The column would remain unusable. Eventually the column would reappear on pivot when no usable column rightward remained available to swap it with, which contrary to our assumption would mean precisely that r < n. Such contradiction can only imply that if r = n then no unusable column can ever appear. One need not swap. We conclude that though one might voluntarily choose q ≠ i during the algorithm's step 3, the algorithm cannot force one to do so if r = n. Yet if one always does choose q = i, as the full-column-rank matrix A evidently leaves one free to do, then indeed S = I.

Theoretically, the Gauss-Jordan decomposition (12.2) includes the factors K and S precisely to handle matrices with more columns than rank. Matrices of full column rank r = n, common in applications, by definition have no such problem. Therefore, the Gauss-Jordan decomposition theoretically needs no K or S for such matrices, which fact lets us abbreviate the decomposition for such matrices to read

A = P D L U In.    (12.17)

Observe however that just because one theoretically can set S = I does not mean that one actually should. The column permutor S exists to be used, after all—especially numerically to avoid small pivots during early invocations of the algorithm's step 5. Equation (12.17) is not mandatory but optional for a matrix A of full column rank (though still r = n and thus K = I for such a matrix, even when the unabbreviated eqn. 12.2 is used). There are however times when it is nice to know that one theoretically could, if doing exact arithmetic, set S = I if one wanted to.

Since P D L U acts as a row operator, (12.17) implies that each row of the full-rank matrix A lies in the space the rows of In address. This is obvious and boring, but interesting is the converse implication of (12.17)'s complementary form,

U^{-1} L^{-1} D^{-1} P^{-1} A = In,

that each row of In lies in the space the rows of A address. The rows of In and the rows of A evidently address the same space. One can moreover say the same of A's columns, since B = AT has full rank just as A does. In the whole, if a matrix A is square and has full rank r = n, then A's columns together, A's rows together, In's columns together and In's rows together each address the same, complete n-dimensional space.

12.5.8 The significance of rank uniqueness

The result of § 12.5.3, that matrix rank is unique, is an extremely important matrix theorem. It constitutes the chapter's chief result, which we have spent so many pages to attain. Without this theorem, the very concept of matrix rank must remain in doubt, along with all that attends to the concept. The theorem is the rock upon which the general theory of the matrix is built.

The concept underlying the theorem promotes the useful sensibility that a matrix's rank, much more than its mere dimensionality or the extent of its active region, represents the matrix's true size. Dimensionality can deceive, after all. For example, the honest 2 × 2 matrix

[ 5  1 ]
[ 3  6 ]

has two independent rows or, alternately, two independent columns, and hence, rank r = 2. One can easily construct a phony 3 × 3 matrix from the honest 2 × 2, however, simply by applying some 3 × 3 row and column

elementaries:

T(2/3)[32] [ 5  1 ] T1[13] T1[23] = [ 5  1  6 ]
           [ 3  6 ]                 [ 3  6  9 ]
                                    [ 2  4  6 ].

The 3 × 3 matrix on the equation's right is the one we met at the head of the section. It looks like a rank-three matrix, but really has only two independent columns and two independent rows. Its true rank is r = 2. We have here caught a matrix impostor pretending to be bigger than it really is.

An applied mathematician with some matrix experience actually probably recognizes this particular 3 × 3 matrix as a fraud on sight, but it is a very simple example. No one can just look at some arbitrary matrix and instantly perceive its true rank. Consider for instance the 5 × 5 example matrix, given in hexadecimal notation:

[5 × 5 example matrix in hexadecimal notation, with entries including F, D and E among its digits]

As the reader can verify by the Gauss-Jordan algorithm, the matrix's rank is not r = 5 but r = 4.17

That is the sense of matrix rank. The rank of a matrix helps one to recognize how many truly independent vectors, dimensions or equations one actually has available to work with, rather than how many seem available at first glance. When one models a physical phenomenon by a set of equations, one sometimes is dismayed to discover that one of the equations, thought to be independent, is really just a useless combination of the others. This can happen in matrix work, too.

17 Now, admittedly, adjectives like "honest" and "phony," terms like "impostor," are a bit hyperbolic. The last paragraph has used them to convey the subjective sense of the matter, but of course there is nothing mathematically improper or illegal about a matrix of less than full rank, so long as the true rank is correctly recognized.


Chapter 13

Inversion and orthonormalization

The undeniably tedious Chs. 11 and 12 have piled the matrix theory deep while affording scant practical reward. Building upon the two tedious chapters, this chapter brings the first rewarding matrix work.

One might be forgiven for forgetting after so many pages of abstract theory that the matrix afforded any reward or had any use at all. Uses however it has. Sections 11.1.1 and 12.5.5 have already broached1 the matrix's most basic use, the primary subject of this chapter: to represent a system of m linear scalar equations in n unknowns neatly as Ax = b and to solve the whole system at once by inverting the matrix A that characterizes it.

Now, before we go on, we want to confess that such a use alone, on the surface of it—though interesting—might not have justified the whole uncomfortable bulk of Chs. 11 and 12. We already knew how to solve a simultaneous system of linear scalar equations in principle without recourse to the formality of a matrix, after all, as in the last step to derive (3.9) as far back as Ch. 3. Why should we have suffered two bulky chapters, if only to prepare to do here something we already knew how to do? The question is a fair one, but admits at least four answers. First, as this chapter

1 The reader who has skipped Ch. 12 might at least review § 12.5.

explains, the matrix neatly solves a linear system not only for a particular driving vector b but for all possible driving vectors b at one stroke. Second and yet more impressively, to solve the linear system neatly is only the primary and most straightforward use of the matrix, not its only use: the even more interesting eigenvalue and its incidents await Ch. 14. Third, specific applications aside, one should never underestimate the blunt practical benefit of reducing an arbitrarily large grid of scalars to a single symbol A, which one can then manipulate by known algebraic rules. Fourth, the matrix allows § 13.6 to introduce the pseudoinverse to approximate the solution to an unsolvable linear system and, moreover, to do so both optimally and efficiently, whereas such overdetermined systems arise commonly in applications.

The matrix finally begins to show its worth here. Most students first learning the matrix have wondered at this stage whether it were worth all the tedium; so, if the reader now wonders, then he stands in good company.

The chapter opens in § 13.1 by inverting the square matrix to solve the exactly determined, n × n linear system in § 13.2. It continues in § 13.3 by computing the rectangular matrix's kernel to solve the nonoverdetermined, m × n linear system in § 13.4. In § 13.6, it brings forth the aforementioned pseudoinverse, which rightly approximates the solution to the unsolvable overdetermined linear system. After briefly revisiting the Newton-Raphson iteration in § 13.7, it concludes by introducing the concept and practice of vector orthonormalization in §§ 13.8 through 13.11.

13.1 Inverting the square matrix

Consider an n × n square matrix A of full rank r = n. Suppose that extended operators G>, G<, G>^{-1} and G<^{-1} can be found, each with an n × n active

region (§ 11.3.2), such that2

G>^{-1} G> = I = G> G>^{-1},
G<^{-1} G< = I = G< G<^{-1},
A = G> In G<.    (13.1)

Observing from (11.31) that

In A = A = A In,
In G<^{-1} G>^{-1} = G<^{-1} G>^{-1} In,

we find by successive steps that

A = G> In G<,
In A = G> G< In,
G<^{-1} G>^{-1} In A = In,
(G<^{-1} In G>^{-1})(A) = In;

2 The symbology and associated terminology might disorient a reader who had skipped Chs. 11 and 12. In this book, all matrices are theoretically ∞ × ∞, anyway, so one can legally multiply any two matrices (though whether it makes any sense in a given circumstance to multiply mismatched matrices is another question; sometimes it does make sense, but more often it does not—which naturally is why the other books tend to forbid such multiplication). If interpreted as an ∞ × ∞ matrix, the matrix A of the m × n system Ax = b has nonzero content only within the m × n rectangle. In this book, the symbol I theoretically represents an ∞ × ∞ identity matrix. The symbol Ir contrarily represents an identity matrix of only r ones, though it too can be viewed as an ∞ × ∞ matrix with zeros in the unused regions. Outside the m × m or n × n square, the operators G> and G< each resemble the ∞ × ∞ identity matrix I, which means that the operators affect respectively only the first m rows or n columns of the thing they operate on. (In the present section it happens that m = n because the matrix A of interest is square, but this footnote uses both symbols because generally m ≠ n.) The definitions do however necessarily, slightly diverge from definitions the reader may have been used to seeing in other books. The purpose of the divergence is merely to separate the essential features of a reversible operation like G> or G< from the dimensionality of the vector or matrix on which the operation happens to operate. None of this is complicated, really. To the extent that the definitions confuse, the reader might briefly review the earlier chapters, especially § 11.3.

or alternately that

A = G> In G<,
A In = In G> G<,
A In G<^{-1} G>^{-1} = In,
(A)(G<^{-1} In G>^{-1}) = In.

Either way, we have that

A^{-1} A = In = A A^{-1},
A^{-1} ≡ G<^{-1} In G>^{-1}.    (13.2)

Of course, for this to work, G>, G<, G>^{-1} and G<^{-1} must exist, be known and honor n × n active regions, which might seem a practical hurdle. However, (12.2), (12.3) and the body of § 12.3 have shown exactly how to find just such a G>, G<, G>^{-1} and G<^{-1} for any square matrix A of full rank, so there is no trouble here. The factors do exist, and indeed we know how to find them.

Equation (13.2) features the important matrix A^{-1}, the rank-n inverse of A. We have not yet much studied the rank-n inverse, but we have defined it in (11.49), where we gave it the fuller, nonstandard notation A^{-1(n)}. When naming the rank-n inverse in words one usually says simply, "the inverse," because the rank is implied by the size of the square active region of the matrix inverted. On the other hand, the rank-n inverse from (11.49) is not quite the infinite-dimensional inverse from (11.45), which is what G>^{-1} and G<^{-1} are. According to (13.2), the product of A^{-1} and A—or, written more fully, the product of A^{-1(n)} and A—is, not I, but In.

Properties that emerge from (13.2) include the following.

• Like A, the rank-n inverse A^{-1} (more fully written A^{-1(n)}) too is an n × n square matrix of full rank r = n.

• Since A is square and has full rank (§ 12.5.4), its rows and, separately, its columns are linearly independent, so it has only the one, unique inverse A^{-1}. No other rank-n inverse of A exists.

• If B = A^{-1} then B^{-1} = A. That is, A is itself the rank-n inverse of A^{-1}. The matrices A and A^{-1} thus form an exclusive, reciprocal pair.

• If B is an n × n square matrix and either BA = In or AB = In, then both equalities in fact hold; thus, B = A^{-1}. One can have neither equality without the other.

• Only a square, n × n matrix of full rank r = n has a rank-n inverse. A matrix A′ which is not square, or whose rank falls short of a full r = n, is not invertible in the rank-n sense of (13.2).
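A brief numerical check of these properties—not part of the text, numpy-based, and using an arbitrary full-rank example matrix—confirms that A^{-1} A = In = A A^{-1} and that A is itself the inverse of A^{-1}:

```python
import numpy as np

A = np.array([[2., 1., 0.],
              [1., 3., 1.],
              [0., 1., 4.]])              # an arbitrary 3x3 full-rank example

Ainv = np.linalg.inv(A)
print(np.allclose(Ainv @ A, np.eye(3)))   # True
print(np.allclose(A @ Ainv, np.eye(3)))   # True
print(np.allclose(np.linalg.inv(Ainv), A))  # True: the inverse of the inverse is A
```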

That A^{-1} is an n × n square matrix of full rank and that A is itself the inverse of A^{-1} proceed from the definition (13.2) of A^{-1} plus § 12.5.3's finding that reversible operations like G>^{-1} and G<^{-1} cannot change In's rank.

That the inverse exists is plain, inasmuch as the Gauss-Jordan decomposition plus (13.2) reliably calculate it. That the inverse is unique begins from § 12.5.4's observation that the columns (like the rows) of A are linearly independent because A is square and has full rank. From this beginning and the fact that In = AA^{-1}, it follows that [A^{-1}]∗1 represents3 the one and only possible combination of A's columns which achieves e1; that [A^{-1}]∗2 represents the one and only possible combination of A's columns which achieves e2; and so on through en. One could observe likewise respecting the independent rows of A. Either way, A^{-1} is unique. Moreover, no other n × n matrix B ≠ A^{-1} satisfies either requirement of (13.2)—that BA = In or that AB = In—much less both.

It is not claimed that the matrix factors G> and G< themselves are unique, incidentally. On the contrary, many different pairs of matrix factors G> and G< can yield A = G> In G<, no less than that many different pairs of scalar factors γ> and γ< can yield α = γ> 1 γ<. Though the Gauss-Jordan decomposition is a convenient means to G> and G<, it is hardly the only means, and any proper G> and G< found by any means will serve so long as they satisfy (13.1). What are unique are not the factors but the A and A^{-1} they produce.

What of the degenerate n × n square matrix A′, of rank r < n? Rank promotion is impossible as §§ 12.5.2 and 12.5.3 have shown, so in the sense of (13.2) such a matrix has no inverse; for, if it had, then A′^{-1} would by definition represent a row or column operation which impossibly promoted A′ to the full rank r = n of In. Indeed, in that it has no inverse such a degenerate matrix closely resembles the scalar 0, which has no reciprocal. Mathematical convention owns a special name for a square matrix which is degenerate and thus has no inverse; it calls it a singular matrix.

3 The notation [A^{-1}]∗j means "the jth column of A^{-1}." Refer to § 11.1.3.

And what of a rectangular matrix? Is it degenerate? Well, no, not exactly, not necessarily. The definitions of the present particular section are meant for square matrices; they do not neatly apply to nonsquare ones. However, appending the right number of null rows or columns to a nonsquare matrix does turn it into a degenerate square, in which case the preceding argument applies. See also § 12.5.5.

13.2 The exactly determined linear system

Section 11.1.1 has shown how the single matrix equation

Ax = b    (13.3)

concisely represents an entire simultaneous system of linear scalar equations. If the system has n scalar equations and n scalar unknowns, then the matrix A has square, n × n dimensionality. Furthermore, if the n scalar equations are independent of one another, then the rows of A are similarly independent, which gives A full rank and makes it invertible. Under these conditions, one can solve (13.3) and the corresponding system of linear scalar equations by left-multiplying (13.3) by the A^{-1} of (13.2) and (13.1) to reach the famous formula

x = A^{-1} b.    (13.4)

Inverting the square matrix A of scalar coefficients, (13.4) concisely solves a simultaneous system of n linear scalar equations in n scalar unknowns. It is the classic motivational result of matrix theory.

It has taken the book two long chapters to reach (13.4). If one omits first to prepare the theoretical ground sufficiently to support more advanced matrix work, then one can indeed reach (13.4) with rather less effort than the book has done.4 However, as the chapter's introduction has observed, we shall soon meet additional interesting applications of the matrix which in any case require the theoretical ground to have been prepared.

4 For motivational reasons, tutorial, introductory linear algebra textbooks like [23] and [31] rightly yet invariably invert the general square matrix of full rank much earlier, reaching (13.4) with less effort, instead teaching the less elegant but easier to grasp "row echelon form" of "Gaussian elimination" [23, Ch. 1][31, § 1.2]—which makes good matrix-arithmetic drill but leaves the student imperfectly prepared when the time comes to study kernels and eigensolutions or to read and write matrix-handling computer code. The deferred price the student pays for the simpler-seeming approach of the tutorials is twofold. First, the student fails to develop the Gauss-Jordan decomposition properly. Second, in the long run the tutorials save no effort, because the student still must at some point develop the theory underlying matrix rank and supporting each of the several coincident properties of § 14.2. What the tutorials do is pedagogically necessary—it is how the writer first learned the matrix and probably how the reader first learned it, too—but it is appropriate to a tutorial, not to a study reference like this book. In this book, where derivations prevail, the proper place to invert the general square matrix of full rank is here, because Chs. 11 and 12 have laid under it a firm foundation upon which—and supplied it the right tools with which—to work.
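A concrete numerical illustration of (13.4)—not from the text, numpy-based, with a made-up 3 × 3 system—shows the exactly determined case solved both by explicit inversion and by a direct solver (the latter being the usual practice in floating-point work, since it avoids forming A^{-1}):

```python
import numpy as np

A = np.array([[3., 1., 2.],
              [1., 4., 0.],
              [2., 0., 5.]])     # full rank, so the system is exactly determined
b = np.array([5., 6., 7.])

x1 = np.linalg.inv(A) @ b        # the x = A^{-1} b of (13.4)
x2 = np.linalg.solve(A, b)       # same answer, computed without forming A^{-1}
print(np.allclose(x1, x2))       # True
print(np.allclose(A @ x1, b))    # True: the solution satisfies the system
```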

Where the inverse does not exist, where the square matrix A is singular, the rows of the matrix are linearly dependent, meaning that the corresponding system actually contains fewer than n useful scalar equations. Depending on the value of the driving vector b, the superfluous equations either merely reproduce or flatly contradict information the other equations already supply. Either way, no unique solution to a linear system described by a singular square matrix is possible—though a good approximate solution is given by the pseudoinverse of § 13.6. In the language of § 12.5.5, the singular square matrix characterizes a system that is both underdetermined and overdetermined.

13.3 The kernel

If a matrix A has full column rank (§ 12.5.4), then the columns of A are linearly independent and

Ax = 0    (13.5)

is impossible if In x ≠ 0. If the matrix however lacks full column rank then (13.5) is possible even if In x ≠ 0. In either case, any n-element x (including x = 0) that satisfies (13.5) belongs to the kernel of A.

Let A be an m × n matrix of rank r. A second matrix,5 AK, minimally represents the kernel of A if and only if

• AK has n × (n − r) dimensionality (which gives AK tall rectangular form unless r = 0),

5 The conventional mathematical notation for the kernel of A is ker{A}, null{A} or something nearly resembling one of the two—the notation seems to vary from editor to editor—which technically represent the kernel space itself, as opposed to the notation AK which represents a matrix whose columns address the kernel space. This book deëmphasizes the distinction and prefers the kernel matrix notation AK. If we were really precise, we might write not AK but AK(n) to match the A^{-1(r)} of (11.49). The abbreviated notation AK is probably clear enough for most practical purposes, though, and surely more comprehensible to those who do not happen to have read this particular book.

• AK has full rank n − r (that is, the columns of AK are linearly independent, which gives AK full column rank), and

• AK satisfies the equation

AAK = 0.    (13.6)

The n − r independent columns of the kernel matrix AK address the complete space x = AK a of vectors in the kernel, where the (n − r)-element vector a can have any value. In symbols,

Ax = A(AK a) = (AAK)a = 0.

The definition does not pretend that the kernel matrix AK is unique. Except when A has full column rank the kernel matrix is not unique; there are infinitely many kernel matrices AK to choose from for a given matrix A. What is unique is not the kernel matrix but rather the space its columns address, and it is the latter space rather than AK as such that is technically the kernel (if you forget and call AK "a kernel," though, you'll be all right).

The Gauss-Jordan kernel formula6

AK = S^{-1} K^{-1} Hr In−r = G<^{-1} Hr In−r    (13.7)

gives a complete kernel AK of A, where S^{-1}, K^{-1} and G<^{-1} are the factors their respective symbols indicate of the Gauss-Jordan decomposition's complementary form (12.3) and Hr is the shift operator of § 11.9. Section 13.3.1 derives the formula.

6 The name Gauss-Jordan kernel formula is not standard as far as the writer is aware, but we would like a name for (13.7). This name seems as fitting as any.
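Before the derivation, a practical cross-check may help fix the idea. The Python sketch below is not the text's construction—it obtains a kernel matrix from the singular value decomposition rather than from the Gauss-Jordan factors of (13.7)—but it produces a matrix AK with the three defining properties: n × (n − r) dimensionality, full column rank, and AAK = 0.

```python
import numpy as np

def kernel(A, tol=1e-12):
    """Return a matrix AK whose n - r columns satisfy A @ AK = 0.

    Uses the SVD rather than the Gauss-Jordan factors of the text; the
    kernel matrix is not unique, but the space its columns address is.
    """
    A = np.atleast_2d(A)
    U, s, Vh = np.linalg.svd(A)
    r = int(np.sum(s > tol * (s.max() if s.size else 1.0)))
    return Vh[r:, :].conj().T            # n x (n - r), full column rank

A = np.array([[1., 2., 3., 4.],
              [2., 4., 6., 8.],
              [1., 0., 1., 0.]])         # rank 2, so the kernel has 4 - 2 = 2 columns
AK = kernel(A)
print(AK.shape)                          # (4, 2)
print(np.allclose(A @ AK, 0))            # True
```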

13.3.1 The Gauss-Jordan kernel formula

To derive (13.7), we begin from the statement of the linear system

Ax = b,    (13.8)

where b and x are respectively m- and n-element vectors, A is an m × n matrix of rank r, and, by the given proviso, b = 0 or r = m, or both. This statement is broader than (13.5) requires but it serves § 13.4.2, too; so, for generality's sake, for the moment we leave b unspecified but by the given proviso. Gauss-Jordan factoring A, by successive steps,

G> Ir K S x = b,
Ir K S x = G>^{-1} b.

Applying an identity from Table 12.2 on page 300,

Ir K(In − Ir)Sx + Ir Sx = G>^{-1} b.

Rearranging terms,

Ir Sx = G>^{-1} b − Ir K(In − Ir)Sx.    (13.9)

Equation (13.9) is interesting. It has Sx on both sides, where Sx is the vector x with elements reordered in some particular way. The equation has however on the left only Ir Sx, which is the first r elements of Sx, and on the right only (In − Ir)Sx, which is the remaining n − r elements.7 No element of Sx appears on both sides. Naturally this is no accident: we have (probably after some trial and error not recorded here) planned the steps leading to (13.9) to achieve precisely this effect. Equation (13.9) implies that one can choose the last n − r elements of Sx freely, but that the choice then determines the first r elements.

The implication is significant. To express the implication more clearly we can rewrite (13.9) in the improved form

f = G>^{-1} b − Ir K Hr a,
a ≡ H−r (In − Ir) Sx,
f ≡ Ir Sx,
Sx = f + Hr a,    (13.10)

where a represents the n − r free elements of Sx and f represents the r dependent elements. This makes f and thereby also x functions of the free parameter a and the driving vector b:

f(a, b) = G>^{-1} b − Ir K Hr a,
Sx(a, b) = f(a, b) + Hr a.    (13.11)

If b = 0 as (13.5) requires, then

f(a, 0) = −Ir K Hr a,
Sx(a, 0) = f(a, 0) + Hr a.

7 Notice how we now associate the factor (In − Ir) rightward as a row truncator, though it had first entered acting leftward as a column truncator. The flexibility to reassociate operators in such a way is one of many good reasons Chs. 11 and 12 have gone to such considerable trouble to develop the basic theory of the matrix.

Substituting the first line into the second,

Sx(a, 0) = (I − Ir K)Hr a.    (13.12)

In the event that a = ej, where 1 ≤ j ≤ n − r,

Sx(ej, 0) = (I − Ir K)Hr ej.

For all the ej at once,

Sx(In−r, 0) = (I − Ir K)Hr In−r.    (13.13)

But if all the ej at once, the columns of In−r, exactly address the domain of a, then the columns of x(In−r, 0) likewise exactly address the range of x(a, 0). Equation (13.6) has already named this range AK, by which8

SAK = (I − Ir K)Hr In−r.

Left-multiplying by S^{-1} = S∗ = ST produces the alternate kernel formula

AK = S^{-1}(I − Ir K)Hr In−r.    (13.14)

The alternate kernel formula (13.14) is correct but not as simple as it could be. By the identity (11.76),

SAK = (I − Ir K)(In − Ir)Hr
    = [(In − Ir) − Ir K(In − Ir)]Hr
    = [(In − Ir) − (K − I)]Hr,    (13.15)

where we have used Table 12.2 again in the last step.

8 These are difficult steps. How does one justify replacing a by ej, then ej by In−r, then x by AK? One justifies them in that the columns of In−r are the several ej, 1 ≤ j ≤ n − r, of which any (n − r)-element vector a can be constructed as the linear combination

a = In−r a = [ e1 e2 e3 · · · en−r ] a = Σ_{j=1}^{n−r} aj ej,

weighted by the elements of a. Seen from one perspective, this seems trivial; from another perspective, baffling; until one grasps what is really going on here. The idea is that if we can solve the problem for each elementary vector ej—that is, in aggregate, if we can solve the problem for the identity matrix In−r—then we shall implicitly have solved it for every a because a is a weighted combination of the ej and the whole problem is linear. The solution x = AK a for a given choice of a becomes a weighted combination of the solutions for each ej, with the elements of a again as the weights. And what are the solutions for each ej? Answer: the corresponding columns of AK, which by definition are the independent values of x that cause b = 0.

How to proceed symbolically from (13.15) is not obvious, but if one sketches the matrices of (13.15) schematically with a pencil, and if one remembers that K^{-1} is just K with elements off the main diagonal negated, then it appears that

SAK = K^{-1} Hr In−r.    (13.16)

The appearance is not entirely convincing,9 but (13.16) though unproven still helps because it posits a hypothesis toward which to target the analysis.

Two variations on the identities of Table 12.2 also help. First, from the identity that K + K^{-1} = 2I, we have that

K − I = I − K^{-1}.    (13.17)

Second, right-multiplying by Ir the identity that Ir K^{-1}(In − Ir) = K^{-1} − I and canceling terms, we have that

K^{-1} Ir = Ir    (13.18)

(which actually is pretty obvious if you think about it, since all of K's interesting content lies by construction right of its rth column). Now we have enough to go on with. Substituting (13.17) and (13.18) into (13.15) yields

SAK = [(In − K^{-1} Ir) − (I − K^{-1})]Hr.

Adding 0 = K^{-1} In Hr − K^{-1} In Hr and rearranging terms,

SAK = K^{-1}(In − Ir)Hr + [K^{-1} − K^{-1} In − I + In]Hr.

Factoring,

SAK = K^{-1}(In − Ir)Hr + [(K^{-1} − I)(I − In)]Hr.

According to Table 12.2, the quantity in square brackets is zero, so

SAK = K^{-1}(In − Ir)Hr,

9 Well, actually, the appearance pretty much is entirely convincing, but let us finish the proof symbolically nonetheless.

which, considering that the identity (11.76) has that (In − Ir)Hr = Hr In−r, proves (13.16). The final step is to left-multiply (13.16) by S^{-1} = S∗ = ST, reaching (13.7) that was to be derived.

One would like to feel sure that the columns of (13.7)'s AK actually addressed the whole kernel space of A rather than only part. One would further like to feel sure that AK had no redundant columns; that is, that it had full rank. Indeed, the definition of AK in the section's introduction demands both of these features. In general such features would be hard to establish, but here the factors conveniently are Gauss-Jordan factors. Regarding the whole kernel space, AK addresses it because AK comes from all a. Regarding redundancy, AK lacks it because SAK lacks it, and SAK lacks it because according to (13.13) the last rows of SAK are Hr In−r.

13.3.2 Converting between kernel matrices

If C is a reversible (n − r) × (n − r) operator by which we right-multiply (13.6), then the matrix

A′K = AK C    (13.19)

like AK evidently represents the kernel of A:

AA′K = A(AK C) = (AAK)C = 0.

Indeed this makes sense: because the columns of AK C address the same space the columns of AK address, the two matrices necessarily represent the same underlying kernel. Moreover, some C exists to convert AK into every alternate kernel matrix A′K of A. We know this because the replacement pattern of §§ 12.5.1 and 12.5.2 lets one replace the columns of AK with those of A′K, reversibly, one column at a time, without altering the space addressed. (It might not let one replace the columns in sequence, but if out of sequence then a reversible permutation at the end corrects the order.) The orthonormalizing column operator R^{-1} of (13.52) below incidentally tends to make a good choice for C.

13.3.3 The degree of freedom

A slightly vague but extraordinarily useful concept has emerged in this section, worth pausing briefly to appreciate. The concept is the concept of the degree of freedom.

A degree of freedom is a parameter one remains free to determine within some continuous domain.

For example, Napoleon's artillerist10 might have enjoyed as many as six degrees of freedom in firing a cannonball: two in where he chose to set up his cannon (one degree in north-south position, one in east-west), two in aim (azimuth and elevation), one in muzzle velocity (as governed by the quantity of gunpowder used to propel the ball), and one in time. A seventh potential degree of freedom, the height from which the artillerist fires, is of course restricted by the lay of the land: the artillerist can fire from a high place only if the place he has chosen to fire from happens to be up on a hill, for Napoleon had no flying cannon. Yet even among the six remaining degrees of freedom, the artillerist might find some impractical to exercise. The artillerist probably preloads the cannon always with a standard charge of gunpowder because, when he finds his target in the field, he cannot spare the time to unload the cannon and alter the charge: this costs one degree of freedom. Likewise, the artillerist must limber up the cannon and hitch it to a horse to shift it to better ground; for this too he cannot spare time in the heat of battle: this costs two degrees. And Napoleon might yell, "Fire!" canceling the time degree as well. Two degrees of freedom remain to the artillerist; but, since exactly two degrees are needed to hit some particular target on the battlefield, the two are enough.

Now consider what happens if the artillerist loses one of his last two remaining degrees of freedom. Maybe the cannon's carriage wheel is broken and the artillerist can no longer turn the cannon; he can still choose firing elevation but no longer azimuth. In such a strait, to hit some particular target on the battlefield the artillerist needs somehow to recover another degree of freedom, for he needs two but has only one. If he disregards Napoleon's order, "Fire!" (maybe not a wise thing to do, but, anyway . . . ) and waits for the target to traverse the cannon's fixed line of fire, then he can still hope to hit even with the broken carriage wheel, for could he choose neither azimuth nor the moment to fire, then he would almost surely miss.

Some apparent degrees of freedom are not real. For example, muzzle velocity gives the artillerist little control firing elevation does not also give. On the other hand, too much gunpowder might break the cannon. Other degrees of freedom are nonlinear in effect: a certain firing elevation gives maximum range; nearer targets can be hit by firing either higher or lower at the artillerist's discretion.

10 The author, who has never fired an artillery piece (unless an arrow from a Boy Scout bow counts), invites any real artillerist among the readership to write in to improve the example.

In geometry, a point brings no degree of freedom; a line brings a single degree of freedom; a plane brings two. If the line bends and turns like a mountain road, it still brings a single degree of freedom. And if the road reaches an intersection? Answer: still one degree. A degree of freedom has some continuous nature, not merely a discrete choice to turn left or right. The driver on the mountain road cannot claim a second degree of freedom at the mountain intersection (he can indeed claim a choice, but the choice being discrete lacks the proper character of a degree of freedom), but he might plausibly claim a second degree of freedom upon reaching the city, where the web or grid of streets is dense enough to approximate access to any point on the city's surface. Just how many streets it takes to turn the driver's "line" experience into a "plane" experience is a matter for the mathematician's discretion. On the other hand, a swimmer in a swimming pool enjoys three degrees of freedom (up-down, north-south, east-west) even though his domain in any of the three is limited to the small volume of the pool.

All of this is hard to generalize in unambiguous mathematical terms, but the count of the degrees of freedom in a system is of high conceptual importance to the engineer nonetheless. Basically, the count captures the idea that to control n output variables of some system takes at least n independent input variables. Engineers of all kinds think in this way: an aeronautical engineer knows in advance that an airplane needs at least n ailerons, rudders and other control surfaces for the pilot adequately to control the airplane; an electrical engineer knows in advance that a circuit needs at least n potentiometers for the technician adequately to tune the circuit; and so on. The n may possibly for various reasons still not suffice—it might be wise in some cases to allow n + 1 or n + 2—but in no event will fewer than n do.

Reviewing (13.11), we find n − r degrees of freedom in the general underdetermined linear system, represented by the n − r free elements of a. If the underdetermined system is not also overdetermined, if it is nondegenerate such that r = m, then it is guaranteed to have a family of solutions x. This family is the topic of the next section.

13.4 The nonoverdetermined linear system

The exactly determined linear system of § 13.2 is common, but also common is the more general, nonoverdetermined linear system

Ax = b,    (13.20)

in which b is a known, m-element vector, x is an unknown, n-element vector, and A is a square or broad, m × n matrix of full row rank (§ 12.5.4)

r = m ≤ n.    (13.21)

Except in the exactly determined edge case r = m = n of § 13.2, the nonoverdetermined linear system has no unique solution but rather a family of solutions. This section delineates the family.

13.4.1 Particular and homogeneous solutions

The nonoverdetermined linear system (13.20) by definition admits more than one solution x for a given driving vector b. Such a system is hard to solve all at once, however, so we prefer to split the system as

Ax1 = b,
A(AK a) = 0,
x = x1 + AK a,    (13.22)

which, when the second line is added to the first and the third is substituted, makes the whole form (13.22) equivalent to (13.20). Splitting the system does not change it, but it does let us treat the system's first and second lines in (13.22) separately. In the split form, the symbol x1 represents any one n-element vector that happens to satisfy the form's first line—many are possible; the mathematician just picks one—and is called a particular solution of (13.20). The (n − r)-element vector a remains unspecified, whereupon AK a represents the complete family of n-element vectors that satisfy the form's second line. The family of vectors expressible as AK a is called the homogeneous solution of (13.20). Notice the italicized articles a and the.

The Gauss-Jordan kernel formula (13.7) has given us AK and thereby the homogeneous solution, which renders the analysis of (13.20) already half done. To complete the analysis, it remains in § 13.4.2 to find a particular solution.

13.4.2 A particular solution

Any particular solution will do. Equation (13.11) has that

(S)[x1(a, b)] = f(a, b) + Hr a,
f(a, b) = G>^{-1} b − Ir K Hr a,

where we have substituted the last line of (13.22) for x. This holds for any a and b. We are not free to choose the driving vector b, but since we need only one particular solution, a can be anything we want. Why not

a = 0?

Then

f(0, b) = G>^{-1} b,
Sx1(0, b) = f(0, b) + Hr · 0 = f(0, b) = G>^{-1} b.

That is,

x1 = S^{-1} G>^{-1} b.    (13.23)

13.4.3 The general solution

Assembling (13.7), (13.22) and (13.23) yields the general solution

x = S^{-1}(G>^{-1} b + K^{-1} Hr In−r a)    (13.24)

to the nonoverdetermined linear system (13.20).

In exact arithmetic (13.24) solves the nonoverdetermined linear system in theory exactly. Of course, practical calculations are usually done in limited precision, in which compounded rounding error in the last bit eventually disrupts (13.24) for matrices larger than some moderately large size. Avoiding unduly small pivots early in the Gauss-Jordan extends (13.24)'s reach to larger matrices, and for yet larger matrices a bewildering variety of more sophisticated techniques exists to mitigate the problem, which can be vexing because the problem arises even when the matrix A is exactly known. Equation (13.24) is useful and correct, but one should at least be aware that it can in practice lose floating-point accuracy when the matrix it attacks grows too large. (It can also lose accuracy when the matrix's rows are almost dependent, but that is more the fault of the matrix than of the formula.) See § 14.8, which addresses a related problem.
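The particular-plus-homogeneous structure of (13.22) is easy to demonstrate numerically. The Python sketch below is not from the text and does not use the Gauss-Jordan factors of (13.24); it builds one particular solution from numpy's pseudoinverse and a kernel matrix from the SVD, then verifies that every member x = x1 + AK a of the family solves the broad, full-row-rank system.

```python
import numpy as np

A = np.array([[1., 2., 0., 1.],
              [0., 1., 1., 3.]])     # 2 x 4, full row rank r = m = 2
b = np.array([3., 5.])

x1 = np.linalg.pinv(A) @ b           # one particular solution (here the least-norm one)
_, s, Vh = np.linalg.svd(A)
AK = Vh[2:, :].T                     # 4 x 2 kernel matrix: A @ AK = 0

rng = np.random.default_rng(1)
for _ in range(3):
    a = rng.standard_normal(2)       # any free parameter a yields another solution
    x = x1 + AK @ a
    print(np.allclose(A @ x, b))     # True every time
```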

13.5 The residual

Equations (13.2) and (13.4) solve the exactly determined linear system Ax = b. Equation (13.24) broadens the solution to include the nonoverdetermined linear system. None of those equations however can handle the overdetermined linear system, because for general b the overdetermined linear system

Ax ≈ b    (13.25)

has no exact solution. (See § 12.5.5 for the definitions of underdetermined, overdetermined, etc.)

One is tempted to declare the overdetermined system uninteresting because it has no solution and to leave the matter there, but this would be a serious mistake. In fact the overdetermined system is especially interesting, and the more so because it arises so frequently in applications. One seldom trusts a minimal set of data for important measurements, yet extra data imply an overdetermined system. We need to develop the mathematics to handle the overdetermined system properly.

The quantity11,12

r(x) ≡ Ax − b    (13.26)

measures how nearly some candidate solution x solves the system (13.25). We call this quantity the residual, and the smaller, the better. More precisely, the smaller the nonnegative real scalar

[r(x)]∗ [r(x)] = Σi |ri(x)|²,    (13.27)

called the squared residual norm, the more favorably we regard the candidate solution x.

11 Alas, the alphabet has only so many letters (see Appendix B). The r here is unrelated to matrix rank r.

12 Some books [49] prefer to define r(x) ≡ b − Ax, instead.

13.6 The Moore-Penrose pseudoinverse and the least-squares problem

A typical problem is to fit a straight line to some data. For example, suppose that we are building-construction contractors with a unionized work force, whose labor union can supply additional, fully trained labor on demand. Suppose further that we are contracted to build a long freeway and have been adding workers to the job in recent weeks to speed construction. On Saturday morning at the end of the second week, we gather and plot the production data on the left of Fig. 13.1. If ui and bi respectively represent the number of workers and the length of freeway completed during week i,

[Figure 13.1: Fitting a line to measured data. Both panels plot the length newly completed during the week against the number of workers; the right panel includes the fitted line.]

then we can fit a straight line b = σu + γ to the measured production data such that

[ u1  1 ] [ σ ]   [ b1 ]
[ u2  1 ] [ γ ] = [ b2 ],

inverting the matrix per §§ 13.1 and 13.2 to solve for x ≡ [σ γ]T, in the hope that the resulting line will predict future production accurately.

That is all mathematically irreproachable. By the fifth Saturday however we shall have gathered more production data, plotted on the figure's right, to which we should like to fit a better line to predict production more accurately. The added data present a problem. Statistically, the added data are welcome, but geometrically we need only two points to specify a line; what are we to do with the other three? The five points together overdetermine the linear system

[ u1  1 ]         [ b1 ]
[ u2  1 ] [ σ ]   [ b2 ]
[ u3  1 ] [ γ ] = [ b3 ]
[ u4  1 ]         [ b4 ]
[ u5  1 ]         [ b5 ].

There is no way to draw a single straight line b = σu + γ exactly through all five, for in placing the line we enjoy only two degrees of freedom.13

The proper approach is to draw among the data points a single straight line that misses the points as narrowly as possible. More precisely, the proper

13 Section 13.3.3 characterized a line as enjoying only one degree of freedom. Why now two? The answer is that § 13.3.3 discussed travel along a line rather than placement of a line as here. Though both involve lines, they differ as driving an automobile differs from washing one. Do not let this confuse you.

approach chooses parameters σ and γ to minimize the squared residual norm [r(x)]∗[r(x)] of § 13.5, given that

      [ u1  1 ]                   [ b1 ]
      [ u2  1 ]                   [ b2 ]
A =   [ u3  1 ],   x = [ σ ],   b = [ b3 ]
      [ u4  1 ]        [ γ ]        [ b4 ]
      [ u5  1 ]                   [ b5 ]
      [  .  . ]                   [  .  ]

Such parameters constitute a least-squares solution.

The matrix A in the example has two columns, data marching on the left, all ones on the right. This is a typical structure for A, but in general any matrix A with any number of columns of any content might arise (because there were more than two relevant variables or because some data merited heavier weight than others, among many further reasons). Whatever matrix A might arise from whatever source, this section attacks the difficult but important problem of approximating optimally a solution to the general, possibly unsolvable linear system (13.25), Ax ≈ b.

13.6.1 Least squares in the real domain

The least-squares problem is simplest when the matrix A enjoys full column rank and no complex numbers are involved. In this case, we seek to minimize the squared residual norm

[r(x)]T [r(x)] = (Ax − b)T (Ax − b)
               = xT AT (Ax − 2b) + bT b
               = xT AT Ax + bT b − 2xT AT b
               = xT AT Ax + bT b − xT AT b − bT Ax,

in which the transpose is used interchangeably for the adjoint because all the numbers involved happen to be real. The norm is minimized where

d/dx (rT r) = 0

(in which d/dx is the Jacobian operator of § 11.10). A requirement that

d/dx [xT AT (Ax − 2b) + bT b] = 0

comes of combining the last two equations. Differentiating by the Jacobian product rule (11.79) yields the equation

xT AT A + [AT (Ax − 2b)]T = 0;

rearranging terms and dividing by 2, after transposing the equation, the simplified equation

AT Ax = AT b.

Assuming (as warranted by § 13.6.2, next) that the n × n square matrix AT A is invertible, the simplified equation implies the approximate but optimal least-squares solution

x = (AT A)^{-1} AT b    (13.28)

to the unsolvable linear system (13.25), in the restricted but quite typical case that A and b are real and A has full column rank.

Equation (13.28) plots the line on Fig. 13.1's right. As the reader can see, the line does not pass through all the points, but it does pass pretty convincingly nearly among them. In fact it passes optimally nearly among them: no line can pass more nearly, in the squared-residual norm sense of (13.27).14

14 Here is a nice example of the use of the mathematical adjective optimal in its adverbial form. "Optimal" means "best." Many problems in applied mathematics involve discovering the best of something. What constitutes the best however can be a matter of judgment, even of dispute. We shall leave to the philosopher and the theologian the important question of what constitutes objective good, for applied mathematics is a poor guide to such mysteries. The role of applied mathematics is to construct suitable models to calculate quantities needed to achieve some definite good; its role is not, usually, to identify the good as good in the first place.

One generally establishes mathematical optimality by some suitable, nonnegative, real cost function or metric, and the smaller the metric, the better. The metric (13.27) is so chosen. "But," comes the objection, "what if some more complicated metric is better?" Well, if the other metric really, objectively is better, then one should probably use it. In general however the mathematical question is: what does one mean by "better?" Better by which metric? Each metric is better according to itself. This is where the mathematician's experience, taste and judgment come in; the mathematics cannot tell us which metric to use, but where no other consideration prevails the applied mathematician tends to choose the metric that best simplifies the mathematics at hand—and, really, that is about as good a way to choose a metric as any.

In the present section's example, too much labor on the freeway job might actually slow construction rather than speed it. One could therefore seek to fit not a line but some downward-turning curve to the data. Mathematics offers many downward-turning curves. A circle, maybe? Not likely. An experienced mathematician would probably reject the circle on the aesthetic yet practical ground that the parabola b = αu² + σu + γ lends itself to easier analysis. Yet even fitting a mere straight line offers choices. One might fit the line to the points (bi, ui) or (ln ui, ln bi) rather than to the points (ui, bi). The three resulting lines differ subtly; they predict production differently. The adjective "optimal" alone evidently does not always tell us all we need to know. Section 6.3 offers a choice between averages that resembles in spirit this footnote's choice between metrics.
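A short numerical sketch of (13.28) follows. It is not from the text: the weekly data below are invented purely for illustration, and the fit is computed with numpy both by the explicit formula (AT A)^{-1} AT b and by numpy's own least-squares routine for comparison.

```python
import numpy as np

u = np.array([4., 6., 8., 11., 13.])          # hypothetical workers per week
b = np.array([1.1, 1.8, 2.2, 3.1, 3.4])       # hypothetical length completed per week

A = np.column_stack([u, np.ones_like(u)])     # data on the left, ones on the right
x = np.linalg.inv(A.T @ A) @ A.T @ b          # the (sigma, gamma) of (13.28)

x_np, *_ = np.linalg.lstsq(A, b, rcond=None)  # same answer from numpy's solver
print(np.allclose(x, x_np))                   # True

r = A @ x - b                                 # the residual (13.26)
print(r @ r)                                  # the minimized squared residual norm (13.27)
```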

13.6.2 The invertibility of A∗A

Section 13.6.1 has assumed correctly but unwarrantedly that the product AT A were invertible for real A of full column rank. For real A, it happens that AT = A∗, so it only broadens the same assumption to suppose that the product A∗A were invertible for complex A of full column rank.15 This subsection warrants the latter assumption, thereby incidentally also warranting the former.

Let A be a complex, m × n matrix of full column rank r = n ≤ m. Suppose falsely that A∗A were not invertible but singular. Since the product A∗A is a square, n × n matrix, this is to suppose (§ 13.1) that the product's rank r′ < n were less than full, implying (§ 12.5.4) that its columns (as its rows) depended on one another. This would mean that there existed a nonzero, n-element vector u for which

A∗Au = 0,  In u ≠ 0.

Left-multiplying by u∗ would give that

u∗A∗Au = 0,  In u ≠ 0,

or in other words that

(Au)∗(Au) = Σi |[Au]i|² = 0,  In u ≠ 0.

But this could only be so if

Au = 0,  In u ≠ 0,

impossible when the columns of A are independent. The contradiction proves false the assumption which gave rise to it. The false assumption: that A∗A were singular.

Thus, the n × n product A∗A is invertible for any tall or square, m × n matrix A of full column rank r = n ≤ m.

15 Notice that if A is tall, then A∗A is a compact, n × n square, whereas AA∗ is a big, m × m square. It is the compact square that concerns this section. The big square is not very interesting and in any case is not invertible.

13.6.3 Positive definiteness

An n x n matrix C is positive definite if and only if

    ℑ(u∗Cu) = 0 and ℜ(u∗Cu) > 0 for all In u ≠ 0.    (13.29)

An n x n matrix C is nonnegative definite if and only if

    ℑ(u∗Cu) = 0 and ℜ(u∗Cu) ≥ 0 for all u.    (13.30)

Such definitions might seem opaque, but their sense is that a positive definite operator never reverses the thing it operates on; the product Au points more in the direction of u than of -u. A positive definite operator resembles a positive scalar in this sense. Section 13.8 explains further.

As in § 13.6.2, here also when a matrix A has full column rank r = n ≤ m the product u∗A∗Au = (Au)∗(Au) is real and positive for all nonzero, n-element vectors u; thus per (13.29) the product A∗A is positive definite for any matrix A of full column rank. By reasoning like the last paragraph's, the product A∗A is nonnegative definite for any matrix A whatsoever.

13.6.4 The Moore-Penrose pseudoinverse

Not every m x n matrix A enjoys full rank. According to (12.16), however, every m x n matrix A of rank r can be factored into a product (footnote 16)

    A = BC

of an m x r tall or square matrix B and an r x n broad or square matrix C, both of which factors themselves enjoy full rank r. (If A happens to have full row or column rank, then one can just choose B = Im or C = In; but even if A lacks full rank, the Gauss-Jordan decomposition of eqn. 12.2 finds at least the full-rank factorization B = G> Ir, C = Ir G<.) This being so, a conjecture seems warranted. Suppose that, inspired by (13.28), we manipulated (13.25) by the successive steps

    Ax ≈ b,
    BCx ≈ b,
    (B∗B)^{-1} B∗ BCx ≈ (B∗B)^{-1} B∗ b,
    Cx ≈ (B∗B)^{-1} B∗ b.

Footnote 16: This subsection uses the symbols B and b for unrelated purposes, which is unfortunate but conventional. See footnote 11.

Then suppose that we changed

    C∗u ← x,

thus restricting x to the space addressed by the independent columns of C∗. Continuing,

    CC∗u ≈ (B∗B)^{-1} B∗ b,
    u ≈ (CC∗)^{-1} (B∗B)^{-1} B∗ b.

Changing the variable back and (because we are conjecturing and can do as we like) altering the "≈" sign to "=",

    x = C∗ (CC∗)^{-1} (B∗B)^{-1} B∗ b.    (13.31)

Equation (13.31) has a pleasingly symmetrical form, and we know from § 13.6.2 at least that the two matrices it tries to invert are invertible. So here is our conjecture:

  • no x enjoys a smaller squared residual norm r∗r than the x of (13.31) does; and

  • among all x that enjoy the same, minimal squared residual norm, the x of (13.31) is strictly least in magnitude.

The conjecture is bold, but if you think about it in the right way it is not unwarranted under the circumstance. After all, (13.31) does resemble (13.28), the latter of which admittedly requires real A of full column rank but does minimize the residual when its requirements are met; and, even if there were more than one x which minimized the residual, one of them might be smaller than the others: why not the x of (13.31)? One can but investigate.

The first point of the conjecture is symbolized

    r∗(x) r(x) ≤ r∗(x + ∆x) r(x + ∆x),

where ∆x represents the deviation, whether small, moderate or large, of some alternate x from the x of (13.31). According to (13.26), this is

    [Ax - b]∗ [Ax - b] ≤ [A(x + ∆x) - b]∗ [A(x + ∆x) - b].

Reorganizing,

    [Ax - b]∗ [Ax - b] ≤ [(Ax - b) + A ∆x]∗ [(Ax - b) + A ∆x].

Distributing factors and canceling like terms,

    0 ≤ (Ax - b)∗ A ∆x + ∆x∗ A∗ (Ax - b) + ∆x∗ A∗ A ∆x.

But according to (13.31) and the full-rank factorization A = BC,

    A∗(Ax - b) = A∗Ax - A∗b
               = [C∗B∗][BC][C∗(CC∗)^{-1}(B∗B)^{-1}B∗b] - [C∗B∗][b]
               = C∗(B∗B)(CC∗)(CC∗)^{-1}(B∗B)^{-1}B∗b - C∗B∗b
               = C∗B∗b - C∗B∗b = 0,

which reveals two of the inequality's remaining three terms to be zero, leaving an assertion that

    0 ≤ ∆x∗ A∗ A ∆x.

The assertion in the last form is correct, because the product of any matrix and its adjoint is, according to § 13.6.3, a nonnegative definite operator. Moreover, each step in the present paragraph is reversible (footnote 17), so the assertion in the last form is logically equivalent to the conjecture's first point, with which the paragraph began, thus establishing the conjecture's first point.

The conjecture's first point, now established, has it that no x + ∆x enjoys a smaller squared residual norm than the x of (13.31) does. It does not claim that no x + ∆x enjoys the same, minimal squared residual norm. The latter case is symbolized

    r(x) = r(x + ∆x),

or equivalently by the last paragraph's logic,

    0 = ∆x∗ A∗ A ∆x,

or in other words,

    A ∆x = 0.

But A = BC, so this is to claim that B(C ∆x) = 0, which since B has full column rank is possible only if

    C ∆x = 0.

Footnote 17: The paragraph might inscrutably but logically instead have ordered the steps in reverse, as in §§ 6.2 and 9.3. See Ch. 6's footnote 14.

Considering the product ∆x∗x in light of (13.31) and the last equation, we observe that

    ∆x∗x = ∆x∗ [C∗(CC∗)^{-1}(B∗B)^{-1}B∗b]
         = [C ∆x]∗ [(CC∗)^{-1}(B∗B)^{-1}B∗b],

which is to observe that ∆x∗x = 0 for any ∆x for which x + ∆x achieves minimal squared residual norm.

Returning attention to the conjecture, its second point is symbolized

    x∗x < (x + ∆x)∗(x + ∆x)

for any ∆x ≠ 0 for which x + ∆x achieves minimal squared residual norm (note that it's "<" this time, not "≤" as in the conjecture's first point). Distributing factors and canceling like terms,

    0 < x∗∆x + ∆x∗x + ∆x∗∆x.

But the last paragraph has found that ∆x∗x = 0 for precisely such ∆x as we are considering here, so the last inequality reduces to read

    0 < ∆x∗∆x,

which naturally for ∆x ≠ 0 is true. Since each step in the paragraph is reversible, reverse logic establishes the conjecture's second point. With both its points established, the conjecture is true.

If A = BC is a full-rank factorization, then the matrix (footnote 18)

    A† ≡ C∗ (CC∗)^{-1} (B∗B)^{-1} B∗    (13.32)

of (13.31) is called the Moore-Penrose pseudoinverse of A, or more briefly the pseudoinverse of A. Whether underdetermined, exactly determined, overdetermined or even degenerate, every matrix has a Moore-Penrose pseudoinverse, yielding the optimal approximation

    x = A† b.    (13.33)

Footnote 18: Some books print A† as A+.
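A brief numerical sketch of (13.32) and (13.33) may help; it is not part of the original text, NumPy is assumed, and the rank-deficient matrix below is invented, built deliberately as a product of full-rank factors so that the formula applies directly. The result is compared against a library pseudoinverse.

import numpy as np

# An invented rank-2 matrix A (3x4), constructed as A = BC with
# full-rank factors B (3x2, tall) and C (2x4, broad).
B = np.array([[1.0, 0.0],
              [2.0, 1.0],
              [0.0, 3.0]])
C = np.array([[1.0, 2.0, 0.0, 1.0],
              [0.0, 1.0, 1.0, 2.0]])
A = B @ C

# Moore-Penrose pseudoinverse per (13.32): A† = C*(CC*)^{-1}(B*B)^{-1}B*.
Bh, Ch = B.conj().T, C.conj().T
A_dagger = Ch @ np.linalg.inv(C @ Ch) @ np.linalg.inv(Bh @ B) @ Bh

print(np.allclose(A_dagger, np.linalg.pinv(A)))   # True

# The optimal approximation x = A† b of (13.33):
b = np.array([1.0, 0.0, 2.0])
x = A_dagger @ b
print(x)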

The Moore-Penrose solves the linear system (13.25) as well as the system can be solved: exactly if possible, with minimal squared residual norm if impossible. If A is square and invertible, then the Moore-Penrose A† = A^{-1} is just the inverse, and then of course (13.33) solves the system uniquely and exactly. Nothing can solve the system uniquely if A has broad shape, but the Moore-Penrose still solves the system exactly in that case as long as A has full row rank, moreover minimizing the solution's squared magnitude x∗x (which the solution of eqn. 13.23 fails to do). If A lacks full row rank, then the Moore-Penrose solves the system as nearly as the system can be solved (as in Fig. 13.1) and as a side-benefit also minimizes x∗x. The Moore-Penrose is thus a general-purpose solver and approximator for linear systems. It is a significant discovery (footnote 19).

Footnote 19: [2, § 3.3][37, "Moore-Penrose generalized inverse"]

13.7 The multivariate Newton-Raphson iteration

When we first met the Newton-Raphson iteration in § 4.8 we lacked the matrix notation and algebra to express and handle vector-valued functions adeptly. Now that we have the notation and algebra, we can write down the multivariate Newton-Raphson iteration almost at once.

The iteration approximates the nonlinear vector function f(x) by its tangent

    f~k(x) = f(xk) + [df/dx at x = xk] (x - xk),

where df/dx is the Jacobian derivative of § 11.10. It then approximates the root xk+1 as the point at which f~k(xk+1) = 0:

    f~k(xk+1) = 0 = f(xk) + [df/dx at x = xk] (xk+1 - xk).

Solving for xk+1 (approximately if necessary), we have that

    xk+1 = [ x - (df/dx)† f(x) ] evaluated at x = xk,    (13.34)

where [·]† is the Moore-Penrose pseudoinverse of § 13.6, which is just the ordinary inverse [·]^{-1} of § 13.1 if f and x happen each to have the same number of elements. Refer to § 4.8 and Fig. 4.5 (footnote 20).

Footnote 20: [40]
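The iteration (13.34) translates nearly line for line into code. The sketch below is illustrative only: NumPy is assumed, the test function and starting point are invented, and a library pseudoinverse stands in for the Jacobian's (df/dx)†.

import numpy as np

def f(x):
    # An invented two-element test function whose root is sought.
    return np.array([x[0]**2 + x[1]**2 - 4.0,
                     x[0] - x[1]])

def jacobian(x):
    # Jacobian derivative df/dx of the test function, worked out by hand.
    return np.array([[2.0 * x[0], 2.0 * x[1]],
                     [1.0,       -1.0]])

x = np.array([1.0, 0.5])                          # arbitrary starting guess
for _ in range(20):
    x = x - np.linalg.pinv(jacobian(x)) @ f(x)    # eqn. (13.34)

print(x)   # converges toward [sqrt(2), sqrt(2)]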

Despite the Moore-Penrose notation of (13.34), the Newton-Raphson iteration is not normally meant to be applied at a value of x for which the Jacobian is degenerate. The iteration intends rather, in light of (13.32), that

    [df/dx]† = [df/dx]∗ ([df/dx][df/dx]∗)^{-1}    if r = m ≤ n,
    [df/dx]† = [df/dx]^{-1}                       if r = m = n,
    [df/dx]† = ([df/dx]∗[df/dx])^{-1} [df/dx]∗    if r = n ≤ m,    (13.35)

where B = Im in the first case and C = In in the last. It does not intend to use the full (13.32). If both r < m and r < n, which is to say if the Jacobian is degenerate, then (13.35) fails, as though the curve of Fig. 4.5 ran horizontally at the test point; when this happens one quits, restarting the iteration from another point.

13.8 The dot product

The dot product of two vectors, also called the inner product (footnote 21), is the product of the two vectors to the extent that they run in the same direction. It is written a · b. In general,

    a · b = (a1 e1 + a2 e2 + · · · + an en) · (b1 e1 + b2 e2 + · · · + bn en).

But if the dot product is to mean anything, it must be that

    ei · ej = δij.    (13.36)

Therefore,

    a · b = a1 b1 + a2 b2 + · · · + an bn,

or, more concisely,

    a · b ≡ a^T b = Σ (j = -∞ to ∞) aj bj.    (13.37)

Footnote 21: The term inner product is often used to indicate a broader class of products than the one defined here, especially in some of the older literature. Where used, the notation usually resembles ⟨a, b⟩ or (b, a), both of which mean a∗ · b (or, more broadly, some similar product), except that which of a and b is conjugated depends on the author. Most recently, at least in the author's country, the usage ⟨a, b⟩ ≡ a∗ · b seems to be emerging as standard where the dot is not used [2, § 3.1][15, Ch. 4]. This book prefers the dot.

The dot notation does not worry whether its arguments are column or row vectors, incidentally:

    a · b = a · b^T = a^T · b = a^T · b^T = a^T b.

(The more orderly notation a^T b, by contrast, assumes that both are proper column vectors; if either vector is wrongly oriented, the notation implicitly reorients it before using it.) Where vectors may have complex elements, usually one is not interested in a · b so much as in

    a∗ · b ≡ a∗ b = Σ (j = -∞ to ∞) a∗j bj.    (13.38)

The reason is that

    ℜ(a∗ · b) = ℜ(a) · ℜ(b) + ℑ(a) · ℑ(b),

with the product of the imaginary parts added not subtracted, thus honoring the right Argand sense of "the product of the two vectors to the extent that they run in the same direction."

By the Pythagorean theorem, the dot product

    |a|² = a∗ · a    (13.39)

gives the square of a vector's magnitude, always real, never negative. The unit vector in a's direction then is

    â ≡ a / |a| = a / √(a∗ · a),    (13.40)

from which

    |â|² = â∗ · â = 1.    (13.41)

When two vectors do not run in the same direction at all, such that

    a∗ · b = 0,    (13.42)

the two vectors are said to lie orthogonal to one another. Geometrically this puts them at right angles. For other angles θ between two vectors,

    â∗ · b̂ = cos θ,    (13.43)

which formally defines the angle θ even when a and b have more than three elements each.
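Equations (13.38) through (13.43) are simple to check numerically. A small sketch (not from the original text; NumPy assumed; the vectors are invented):

import numpy as np

a = np.array([3.0, 4.0, 0.0])
b = np.array([1.0, 2.0, 2.0])

dot   = np.vdot(a, b)                    # a* . b, eqn. (13.38); vdot conjugates a
mag2  = np.vdot(a, a).real               # |a|^2 = a* . a, eqn. (13.39)
a_hat = a / np.sqrt(mag2)                # unit vector, eqn. (13.40)
b_hat = b / np.sqrt(np.vdot(b, b).real)

cos_theta = np.vdot(a_hat, b_hat).real   # cos(theta), eqn. (13.43)
print(mag2, cos_theta)                   # 25.0 and 11/15

# Orthogonality per (13.42): these two vectors lie at right angles.
print(np.vdot(np.array([1.0, 1.0]), np.array([1.0, -1.0])))   # 0.0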

13.9 The orthogonal complement

The m x (m - r) kernel (§ 13.3) (footnote 22)

    A⊥ ≡ A∗K    (13.44)

is an interesting matrix. By definition of the kernel, the columns of A∗K are the independent vectors uj for which A∗uj = 0, which, inasmuch as the rows of A∗ are the adjoints of the columns of A, is possible only when each uj lies orthogonal to every column of A. This says that the columns of A⊥ ≡ A∗K address the complete space of vectors that lie orthogonal to A's columns, such that

    A⊥∗ A = 0 = A∗ A⊥.    (13.45)

The matrix A⊥ is called the orthogonal complement (footnote 23) or perpendicular matrix to A. Among other uses, the orthogonal complement A⊥ supplies the columns A lacks to reach full row rank. Further properties include that

    A∗K = A⊥,
    A∗⊥ = AK.    (13.46)

Footnote 22: The symbol A⊥ [23][2][31] can be pronounced "A perp," short for "A perpendicular," since by (13.45) A⊥ is in some sense perpendicular to A. If we were really precise, we might write not A⊥ but A⊥(m). Refer to footnote 5.

Footnote 23: [23, § 3.VI.3]

13.10 Gram-Schmidt orthonormalization

If a vector x = AK a belongs to a kernel space AK (§ 13.3), then so equally does any αx. If the vectors x1 = AK a1 and x2 = AK a2 both belong, then so does α1 x1 + α2 x2. Kernel vectors have no inherent scale. If I claim AK = [3 4 5; -1 1 0]^T to represent a kernel, then you are not mistaken arbitrarily to rescale each column of my AK by a separate nonzero factor, instead for instance representing the same kernel as AK = [6 8 0xA; 7 -7 0]^T (the first column doubled, the second rescaled by another nonzero factor). Style generally asks the applied mathematician to remove the false appearance of scale by using (13.40) to normalize the columns of a kernel matrix to unit magnitude before reporting them. The same goes for the eigenvectors of Ch. 14 to come.

Where a kernel matrix AK has two or more columns (or a repeated eigenvalue has two or more eigenvectors), style generally asks the applied mathematician not only to normalize but also to orthogonalize the columns

before reporting them.

One orthogonalizes a vector b with respect to a vector a by subtracting from b a multiple of a such that

    a∗ · b⊥ = 0,
    b⊥ ≡ b - βa,    (13.47)

where the symbol b⊥ represents the orthogonalized vector. Substituting the second of these equations into the first and solving for β yields

    β = (a∗ · b) / (a∗ · a).

Hence,

    a∗ · b⊥ = 0,
    b⊥ ≡ b - [(a∗ · b) / (a∗ · a)] a.    (13.48)

But according to (13.40), a = â √(a∗ · a); and according to (13.41), â∗ · â = 1; so,

    b⊥ = b - â (â∗ · b),    (13.49)

or, in matrix notation,

    b⊥ = b - â (â∗)(b).

This is arguably better written

    b⊥ = [I - (â)(â∗)] b

(observe that it's [â][â∗], a matrix, rather than the scalar [â∗][â]).

One orthonormalizes a set of vectors by orthogonalizing them with respect to one another, then by normalizing each of them to unit magnitude. The procedure to orthonormalize several vectors

    {x1, x2, x3, . . . , xn}

therefore is as follows. First, normalize x1 by (13.40); call the result x̂1⊥. Second, orthogonalize x2 with respect to x̂1⊥ by (13.48) or (13.49), then normalize it; call the result x̂2⊥. Third, orthogonalize x3 with respect to x̂1⊥ then to x̂2⊥, then normalize it; call the result x̂3⊥. Proceed in this manner through the several xj. Symbolically,

    x̂j⊥ = xj⊥ / √(x∗j⊥ xj⊥),
    xj⊥ ≡ [ product over i from 1 to j-1 of (I - x̂i⊥ x̂∗i⊥) ] xj.    (13.50)
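The procedure just described is easy to express in code. The following is a minimal sketch, not the book's prescription verbatim: NumPy is assumed, the matrix is invented, and each column is orthogonalized against its predecessors by the projection-and-normalize steps of (13.47) through (13.50).

import numpy as np

def orthonormalize(X):
    """Gram-Schmidt: orthonormalize the (independent) columns of X."""
    Q = np.array(X, dtype=float, copy=True)
    n = Q.shape[1]
    for j in range(n):
        for i in range(j):
            # Subtract from column j its component along column i, as in (13.49).
            Q[:, j] -= np.vdot(Q[:, i], Q[:, j]) * Q[:, i]
        Q[:, j] /= np.sqrt(np.vdot(Q[:, j], Q[:, j]))   # normalize, as in (13.40)
    return Q

X = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
Q = orthonormalize(X)
print(np.allclose(Q.T @ Q, np.eye(3)))   # True: the columns are orthonormal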

By the vector replacement principle of § 12.4 in light of (13.47), the resulting orthonormal set of vectors

    {x̂1⊥, x̂2⊥, x̂3⊥, . . . , x̂n⊥}

addresses the same space as did the original set.

Orthonormalization naturally works equally for any linearly independent set of vectors, not only for kernel vectors or eigenvectors. By the technique, one can conveniently replace a set of independent vectors by an equivalent, neater, orthonormal set which addresses precisely the same space.

13.10.1 Efficient implementation

To turn an equation like the latter line of (13.50) into an efficient numerical algorithm sometimes demands some extra thought, in perspective of whatever it happens to be that one is trying to accomplish. If all one wants is some vectors orthonormalized, then the equation as written is neat but is overkill, because the product x̂i⊥ x̂∗i⊥ is a matrix, whereas the product x̂∗i⊥ xj implied by (13.48) is just a scalar. Fortunately, one need not apply the latter line of (13.50) exactly as written. One can instead introduce intermediate vectors xji, representing the multiplication in the admittedly messier form

    xj1 ≡ xj,
    xj(i+1) ≡ xji - (x̂∗i⊥ · xji) x̂i⊥,
    xj⊥ = xjj.    (13.51)

Besides obviating the matrix I - x̂i⊥ x̂∗i⊥ and the associated matrix multiplication, the messier form (13.51) has the significant additional practical virtue that it lets one forget each intermediate vector xji immediately after using it. (A well written orthonormalizing computer program reserves memory for one intermediate vector only, which memory it repeatedly overwrites; actually, it probably does not even reserve that much, working rather in the memory space it has already reserved for x̂j⊥.) (footnote 24)

Other equations one algorithmizes can likewise benefit from thoughtful rendering.

Footnote 24: [52, "Gram-Schmidt process," 04:48, 11 August 2007]

13.10.2 The Gram-Schmidt decomposition

The orthonormalization technique this section has developed is named the Gram-Schmidt process. One can turn it into the Gram-Schmidt decomposition

bringing the very same index pairs in a different sequence.1—where the latter three working matrices ultimately become the extended-operational factors U . such that the several (i. The i-major nesting (1. Borrowing the language of computer science we observe that the indices i and j of (13. By elementary column operations based on (13. The equations suggest j-major nesting.50) ˜ and (13. The algorithm. distributing the inverse elementaries ˜ ˜ ˜ to U . INVERSION AND ORTHONORMALIZATION A = QR = QU DS. m×r matrix Q of orthonormal columns. (13.350 tion CHAPTER 13. 4) (2. We choose i-major nesting on the subtle ground that it affords better information to the choice of column index p during the algorithm’s step 3. is just as valid. by an algorithm that somewhat resembles the Gauss-Jordan algorithm of § 12. D and S of (13. j) pair follows which. 4) .. one level looping over j and the other over i. with the loop over j at the outer level and the loop over i at the inner.53) ˜ and initially Q ← A.. 4) (3. 4) · · · .51)’s middle line requires only that no xi⊥ be used before it is fully calculated.52) also called the orthonormalizing or QR decomposition. in detail: . otherwise that line does not care which (i. 3) (1. ··· ··· ··· ˆ In reality.4) here becomes ˜˜ ˜˜ A = QU D S (13. 3) (1. 4) · · · (2. . D and S according to Table 12. however. 3) (2.3. except that (12. 2) (1.52). 4) · · · (3.50) and (13. j) index pairs occur in the sequence (reading left to right then top to bottom) (1.3. (13.51). 3) (2. the algorithm gradually transforms Q into a dimension-limited. R ≡ U DS. 2) (1. .51) imply a two-level nested loop.

among other ˜ heuristics.1. 5. but one might instead prefer to choose the column of greatest magnitude.7.2) with djj = 1 for all j ≥ i.13. the algorithm also ˜ re¨nters here from step 9 below. Begin by initializing ˜ Q ← A. ˜ S ← I.) If Q is null in and rightward of its ith column such that no column p ≥ i remains available to choose. ˜ ˜ U ← T[i↔p] U T[i↔p] . ˜ ˜ S ← T[i↔p] S.7. Observing that (13. interchange the chosen pth column to the ith position by ˜ ˜ Q ← QT[i↔p] . Choose a column p ≥ i of Q containing at least one nonzero element.1).53) can be expanded to read A = = ˜ QT[i↔p] ˜ QT[i↔p] ˜ T[i↔p] U T[i↔p] ˜ T[i↔p] DT[i↔p] ˜ T[i↔p]S ˜ ˜ ˜ T[i↔p] U T[i↔p] D T[i↔p]S . GRAM-SCHMIDT ORTHONORMALIZATION 1.) Observe that U enjoys the major e {i−1}T (§ 11. Observing that (13. ˜ 3. or to choose randomly. and that the first through (i − 1)th columns of Q consist of mutually orthonormal unit vectors.5). ˜ D ← I. 351 2.8. . that D is a general ˜ partial unit triangular form L ˜ ˜ scaling operator (§ 11.53) can be expanded to read ˜ A = QT(1/α)[i] ˜ Tα[i] U T(1/α)[i] ˜ ˜ Tα[i] D S. 4. that S is permutor ˜ (§ 11. (The simplest choice is perhaps p = i as long as the ith column does not happen to be null. where the latter line has applied a rule from Table 12. then skip directly to step 10.10. i ← 1. (Besides arriving at this point from step 1 above. ˜ U ← I.

observing that (13. 6. ˜ S ≡ S. where α= ˜ Q ∗ ∗i ˜ · Q ∗i . where ˜ β= Q ∗ ∗i ˜ · Q ∗j . INVERSION AND ORTHONORMALIZATION ˜ normalize the ith column of Q by ˜ ˜ Q ← QT(1/α)[i] . Let ˜ Q ≡ Q.352 CHAPTER 13.51) with respect to the ith column by ˜ ˜ Q ← QT−β[ij] . 8. e Otherwise. 10.) If j > n then skip directly to step 9. Increment and return to step 7. ˜ U ≡ U. ˜ ˜ U ← Tα[i] U T(1/α)[i] . 7. the algorithm also re¨nters here from step 8 below. Increment and return to step 2. (Besides arriving at this point from step 6 above.53) can be expanded to read ˜ A = QT−β[ij] ˜ ˜˜ Tβ[ij]U DS. i← i+1 j ←j+1 ˜ D ≡ D. 9. ˜ orthogonalize the jth column of Q per (13. . ˜ ˜ U ← Tβ[ij] U . r = i − 1. ˜ ˜ D ← Tα[i] D. Initialize j ← i + 1.
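In floating-point practice one rarely codes the full algorithm above by hand; a library QR routine delivers an orthonormalized Q and an upper-triangular R of the same kind (up to sign conventions). A quick sketch, not part of the original text, assuming NumPy, an invented matrix A of full column rank, and therefore S = I:

import numpy as np

A = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [1.0, 0.0]])

# "Reduced" QR: Q is m x r with orthonormal columns, R is r x n upper triangular.
Q, R = np.linalg.qr(A)                    # counterpart of A = QR, eqn. (13.52)

print(np.allclose(Q.T @ Q, np.eye(2)))    # True: the columns of Q are orthonormal
print(np.allclose(Q @ R, A))              # True: the factors recombine to give A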

Observing that the r independent columns of the m × r matrix Q address the same space the columns of A address. achieving square.56) . here also one sometimes prefers that S = I. 13. if one always chooses p = i during the algorithm’s step 3.52) and its algorithm in § 13. (ii) since Q is itself dimension-limited.10. GRAM-SCHMIDT ORTHONORMALIZATION End.7).10. below).13.52) needs and has no explicit factor Ir .16) is a proper full-rank factorization with which one can compute the pseudoinverse A† of A (see eqn. this is A = (Q)(Ir R).5. the matrix Q = QIr has only r columns.32. the Gram-Schmidt decomposition too brings a kernel formula. above. Reassociating factors. but see also eqn.11.3. The algorithm optionally supports this preference if the m × n matrix A has full column rank r = n. next.2.10. then it becomes a unitary matrix —the subject of § 13.10. never on the rows. 13.54) which per (12. (13. If the Gram-Schmidt decomposition (13. If Q reaches the maximum possible rank r = m. however.52) looks useful. whose orthonormal columns address the same space the columns of A themselves address.3 The Gram-Schmidt kernel formula Like the Gauss-Jordan decomposition in (13. when null columns cannot arise. Such optional discipline maintains S = I when desired. m × m shape.7. the Gram-Schmidt decomposition (13. one decomposes an m × n matrix A = QR (13.55) per the Gram-Schmidt (13. Whether S = I or not. one then constructs the m × (r + m) matrix A′ ≡ Q + Im H−r = Q Im (13.63. 353 Though the Gram-Schmidt algorithm broadly resembles the GaussJordan. 13. at least two significant differences stand out: (i) the Gram-Schmidt ˜ is one-sided because it operates only on the columns of Q. To develop and apply it. The most interesting of its several factors is the m × r orthonormalized matrix Q.52) as A = (QIr )(R). so one can write (13. Before treating the unitary matrix. it is even more useful than it looks. As in § 12. let us pause to extract a kernel from the Gram-Schmidt decomposition in § 13.

it brings one .354 CHAPTER 13.57) again by Gram-Schmidt—with the differences that.58) or the form (13. The resulting m × m. A∗K = A⊥ = Q′ Hr Im−r . .59). To compute the kernel of a matrix B by Gram-Schmidt one sets A = B ∗ and applies (13. full-rank square matrix Q′ = Q + A⊥ H−r = consists of • r columns on the left that address the same space the columns of A address and • m−r columns on the right that give a complete orthogonal complement (§ 13. A′ = Q′ R′ .11 The unitary matrix When the orthonormalized matrix Q of the Gram-Schmidt decomposition (13. after all. are just Q.58) extracts the rightward columns that express the kernel.11 that follows. if one wants a Gauss-Jordan kernel orthonormalized.46). as the last paragraph of § 13. one chooses p = 1.58) is probably the more useful form. and that one skips the unnecessary step 7 for all j ≤ r.9) A⊥ of A. INVERSION AND ORTHONORMALIZATION and decomposes it too.59) Q A⊥ (13. The unitary matrix is the subject of § 13. then one must orthonormalize it as an extra step. the m × m matrix Q′ is a unitary matrix. but of A∗ .55) has already orthonormalized first r columns of A′ . the Gram-Schmidt kernel formula does everything the Gauss-Jordan kernel formula (13. . having the maximum possible rank r = m. 3. . Refer to (13. left and right.7) does and in at least one sense does it better.10. 2. (13. on the ground that the earlier Gram-Schmidt application of (13. Being square.55) through (13.2 has alluded. which columns. but the GramSchmidt kernel formula as such. 13. not of A. this time.59).52) is square. In either the form (13. (13. Equation (13. Each column has unit magnitude and conveniently lies orthogonal to every other column. whereas the Gram-Schmidt kernel comes already orthonormalized. r during the first r instances of the algorithm’s step 3. for. .

60) is that a square matrix with either orthonormal columns or orthonormal rows is unitary and has both. until one realizes25 that the equation Q∗ Q = Im characterizes Q∗ to be the rank-m inverse of Q. and that the very definition of orthonormality demands that the dot product [Q]∗ · ∗i [Q]∗j of orthonormal columns be zero unless i = j.1. qj = i qbij qai . (Note that the permutor of § 11. qaj and qbj respectively represent the jth columns of Q. when the dot product of a unit vector with itself is unity. however. The product of two or more unitary matrices is itself unitary if the matrices are of the same dimensionality. The adjoint dot product of any two of Q’s columns then is q∗′ · qj = j i. Thus. By the columnwise interpretation (§ 11. To prove it.13.11. Q−1 = Q∗ . is called a unitary matrix.1 lets any rank-m inverse (orthonormal or otherwise) attack just as well from the right as from the left.1 enjoys the property of eqn. and that § 13.61) a very useful property. ai But q∗ ′ · qai = δi′ i because Qa is unitary.3) of matrix multiplication.i′ ∗ qbi′ j ′ qbij q∗ ′ · qai .4] This is true only for 1 ≤ i ≤ m.26 so ai q∗′ · qj = j 25 26 i ∗ qbij ′ qbij = q∗ ′ · qbj = δj ′ j . consider the product Q = Qa Qb (13.61 precisely because it is unitary. (13. bj [15.) One immediate consequence of (13. whether derived from the Gram-Schmidt or from elsewhere.60) The reason that Q∗ Q = Im is that Q’s columns are orthonormal. Let the symbols qj . That Im = QQ∗ is unexpected.62) of m × m unitary matrices Qa and Qb . . THE UNITARY MATRIX 355 property so interesting that the property merits a section of its own. Qa and Qb and let the symbol qbij represent the ith element of qbj .60).7. (13. 13. A matrix Q that satisfies (13. § 4. The property is that Q∗ Q = Im = QQ∗ . but you knew that already.

Unitary operations preserve length. To prove it, consider the system

    Qx = b.

Multiplying the system by its own adjoint yields

    x∗ Q∗ Q x = b∗ b.

But according to (13.60), Q∗Q = Im; so,

    x∗ x = b∗ b,

as was to be demonstrated. That is, operating on an m-element vector by an m x m unitary matrix does not alter the vector's magnitude.

Equation (13.61) lets one use the Gram-Schmidt decomposition (13.52) to invert a square matrix as

    A^{-1} = R^{-1} Q∗ = S∗ D^{-1} U^{-1} Q∗.    (13.63)

Unitary extended operators are certainly possible, for if Q is an m x m dimension-limited matrix, then the extended operator

    Q∞ = Q + (I - Im),

which is just Q with ones running out the main diagonal from its active region, itself meets the unitary criterion (13.60) for m = ∞.

Unitary matrices are so easy to handle that they can sometimes justify significant effort to convert a model to work in terms of them if possible. We shall meet the unitary matrix again in §§ 14.10 and 14.12.

The chapter as a whole has demonstrated, at least in theory and usually in practice, techniques to solve any linear system characterized by a matrix of finite dimensionality, whatever the matrix's rank or shape. It has explained how to orthonormalize a set of vectors and has derived from the explanation the useful Gram-Schmidt decomposition. As the chapter's introduction had promised, the matrix has shown its worth here; for without the matrix's notation, arithmetic and algebra, most of the chapter's findings would have lain beyond practical reach. And even so, the single most interesting agent of matrix arithmetic remains yet to be treated. This last is the eigenvalue, and it is the subject of Ch. 14, next.
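Before leaving the chapter, the unitary criterion (13.60), the length-preservation property, and the QR-based inversion (13.63) can all be spot-checked in a few lines; the sketch below is illustrative only, assuming NumPy and an invented, invertible matrix.

import numpy as np

A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 1.0]])

Q, R = np.linalg.qr(A)       # a square, full-rank A gives a square, unitary Q

print(np.allclose(Q.conj().T @ Q, np.eye(3)))               # Q*Q = I, eqn. (13.60)

x = np.array([1.0, -2.0, 0.5])
print(np.isclose(np.linalg.norm(Q @ x), np.linalg.norm(x)))  # length preserved

A_inv = np.linalg.inv(R) @ Q.conj().T     # A^{-1} = R^{-1} Q*, as in (13.63)
print(np.allclose(A_inv @ A, np.eye(3)))  # True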

Chapter 14

The eigenvalue

The eigenvalue is a scalar by which a square matrix scales a vector without otherwise changing it, such that

    Av = λv.

This chapter analyzes the eigenvalue and the associated eigenvector it scales.

Before treating the eigenvalue proper, the chapter gathers from across Chs. 11 through 14 several properties all invertible square matrices share, assembling them in § 14.2 for reference. One of these regards the determinant, which opens the chapter.

14.1 The determinant

Through Chs. 11, 12 and 13 the theory of the matrix has developed slowly but pretty straightforwardly. Here comes the first unexpected turn. It begins with an arbitrary-seeming definition. The determinant of an n x n square matrix A is the sum of n! terms, each term the product of n elements, no two elements from the same row or column, terms of positive parity adding to and terms of negative parity subtracting from the sum; a term's parity (§ 11.6) being the parity of the permutor (§ 11.7.1) marking the positions of the term's elements.

Unless you already know about determinants, the definition alone might seem hard to parse, so try this. The inverse of the general 2 x 2 square matrix

    A2 = [ a11  a12
           a21  a22 ],

by the Gauss-Jordan method or any other convenient technique, is found to be

    A2^{-1} = 1/(a11 a22 - a12 a21) [  a22  -a12
                                      -a21   a11 ].

The quantity (footnote 1)

    det A2 = a11 a22 - a12 a21

in the denominator is defined to be the determinant of A2. Each of the determinant's terms includes one element from each column of the matrix and one from each row, with parity giving the term its ± sign. The determinant of the general 3 x 3 square matrix by the same rule is

    det A3 = (a11 a22 a33 + a12 a23 a31 + a13 a21 a32)
           - (a13 a22 a31 + a12 a21 a33 + a11 a23 a32);

and indeed, if we tediously invert such a matrix symbolically, we do find that quantity in the denominator there.

The parity rule merits a more careful description. The parity of a term like a12 a23 a31 is positive because the parity of the permutor, or interchange quasielementary (§ 11.7.1),

    P = [ 0 1 0
          0 0 1
          1 0 0 ]

marking the positions of the term's elements is positive. The parity of a term like a13 a22 a31 is negative for the same reason. The determinant comprehends all possible such terms, n! in number, half of positive parity and half of negative. (How do we know that exactly half are of positive and half, negative? Answer: by pairing the terms. For every term like a12 a23 a31 whose marking permutor is P, there is a corresponding a13 a22 a31 whose marking permutor is T[1↔2] P, necessarily of opposite parity. The sole exception to the rule is the 1 x 1 square matrix, which has no second term to pair.)

Footnote 1: The determinant det A used to be written |A|, an appropriately terse notation for which the author confesses some nostalgia. The older notation |A| however unluckily suggests "the magnitude of A," which though not quite the wrong idea is not quite the right idea, either. The magnitude |z| of a scalar or |u| of a vector is a real-valued, nonnegative, nonanalytic function of the elements of the quantity in question, whereas the determinant det A is a complex-valued, analytic function. The book follows convention by denoting the determinant as det A for this reason among others.
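For small n the n!-term definition can be coded directly, which makes a useful sanity check against a library determinant. A sketch only, assuming NumPy and an invented 3 x 3 matrix; for real work one uses the library routine, since n! grows quickly.

import numpy as np
from itertools import permutations

def det_by_definition(A):
    """Sum of n! terms, each signed by the parity of its marking permutor."""
    n = A.shape[0]
    total = 0.0
    for perm in permutations(range(n)):
        # Parity: count the inversions of the permutation.
        inversions = sum(1 for i in range(n) for j in range(i + 1, n)
                         if perm[i] > perm[j])
        sign = -1.0 if inversions % 2 else 1.0
        term = 1.0
        for row, col in enumerate(perm):
            term *= A[row, col]     # one element from each row and each column
        total += sign * term
    return total

A3 = np.array([[1.0, 2.0, 3.0],
               [0.0, 4.0, 5.0],
               [1.0, 0.0, 6.0]])
print(det_by_definition(A3), np.linalg.det(A3))   # both 22.0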

Historically the determinant probably emerged not from abstract considerations but for the mundane reason that the quantity it represents occurred frequently in practice (as in the A−1 example above).2 ) It is admitted3 that we have not. 13’s footnotes 5 and 22.14. otherwise. when j = j ′ . Nothing however log2 ically prevents one from simply defining some quantity which. Interchanging rows or columns negates the determinant. we have merely motivated and defined it. but the nonstandard notation det(n) A is available especially to call the rank out.1) αai∗ ai∗ when i = i′ . when i = i′′ . or if where i′′ = i′ and j ′′ = j ′ . 1] .1. [15. So we do here. stating explicitly that the determinant has exactly n! terms. when j = j ′′ . THE DETERMINANT 359 Normally the context implies a determinant’s rank n.2] 4 [15.5 and 11.49. one merely suspects will later prove useful. then det C = − det A.1 Basic properties The determinant det A enjoys several useful basic properties. Ch. as yet. (See also §§ 11. actually shown the determinant to be a generally useful quantity. And further Ch. § 1.1. • If ci∗ = 2 3 (14. 11. • If  ai′′ ∗  ci∗ = ai′ ∗   ai∗ c∗j  a∗j ′′  = a∗j ′   a∗j when i = i′ . otherwise.4 14.5 and eqn. at first. otherwise.3.

• If or if c∗j ′′ = γc∗j ′ . otherwise.2. (Equation 14.3. Scaling a single row or column of a matrix scales the matrix’s determinant by the same factor.3) If one row or column of a matrix C is the sum of the corresponding rows or columns of two other matrices A and B.2.2) det C = α det A. ci′ ∗ = 0. (14.360 or if c∗j = then CHAPTER 14. otherwise. then det C = 0. 11.3 tracks the linear superposition property of § 7. (14. then the determinant of the one matrix is the sum of the determinants of the other two. (Equation 14.3. (14.2 tracks the linear scaling property of § 7. .4) A matrix with a null row or column also has a null determinant.3 and of eqn.3 and of eqn. otherwise. a∗j + b∗j a∗j = b∗j when j = j ′ . THE EIGENVALUE αa∗j a∗j when j = j ′ . 11. ci′′ ∗ = γci′ ∗ . while the three matrices remain otherwise identical.) • If ci∗ = or if c∗j = then det C = det A + det B.) • If or if c∗j ′ = 0. ai∗ + bi∗ ai∗ = bi∗ when i = i′ .

(14. Hence the net effect of the procedure is to negate the determinant—to negate the very determinant the procedure is not permitted to alter. the procedure does not alter the determinant either. (14. However. then det C = 0.5) comes because the following procedure does not alter the matrix: (i) scale row i′′ or column j ′′ by 1/γ.6) comes because according to § 11.2). more formally.3) and (14.7. (14. Equation (14. when j = j ′′ . Or. and the determinant of the transpose is just the determinant itself: det C ∗ = (det C)∗ . the permutors P and P ∗ always have the same parity. (ii) scale row i′ or column j ′ by γ.6) These basic properties are all fairly easy to see if the definition of the determinant is clearly understood. . Equation (14. and indeed according to (14.1. The apparent contradiction can be reconciled only if the determinant is zero to begin with. Not altering the matrix.1.2). according to (14. in this case.1).14. step (iii) negates the determinant. • If ci∗ = or if c∗j = a∗j + αa∗j ′ a∗j ai∗ + αai′ ∗ ai∗ when i = i′′ . and because the adjoint operation individually conjugates each element of C. • The determinant of the adjoint is just the determinant’s conjugate. THE DETERMINANT where i′′ = i′ and j ′′ = j ′ .5) comes because.4) come because each of the n! terms in the determinant’s expansion has exactly one element from row i′ or column j ′ . Finally. otherwise. (14. step (ii)’s effect on the determinant cancels that of step (i). 361 (14. every term in the determinant’s expansion finds an equal term of opposite parity to offset it. (iii) interchange rows i′ ↔ i′′ or columns j ′ ↔ j ′′ .5) The determinant is zero if one row or column of the matrix is a multiple of another. det C T = det C.1) comes because a row or column interchange reverses parity. Equations (14. From the foregoing properties the following further property can be deduced. otherwise.

according to (14. scales or does not alter the matrix’s determinant. bi∗ ≡ ai∗ otherwise.9) . det Tα[ij] A = det A = det ATα[ij] . (14. det In A = det A = det AIn . bi′′ ∗ = αai′ ∗ whereas bi′ ∗ = ai′ ∗ . j) ≤ n.2 The determinant and the elementary operator Section 14. Obviously also. where [C]i′′ ∗ = [A]i′′ ∗ + [B]i′′ ∗ . then CHAPTER 14. so bi′′ ∗ = αbi′ ∗ .7) Adding to a row or column of a matrix a multiple of another row or column does not change the matrix’s determinant.4.3). B and C differ only in the (i′′ )th row. On the other hand. (14.1 has it that interchanging.7) results from combining the last two equations. THE EIGENVALUE det C = det A. which by (14.1.7) for rows (the column proof is similar). one defines a matrix B such that αai′ ∗ when i = i′′ . det Tα[i] A = α det A = det ATα[j] . det C = det A + det B. To derive (14.1. det I = 1 = det In .362 where i′′ = i′ and j ′′ = j ′ . But the three operations named are precisely the operations of the three elementaries of § 11. i = j. From this definition.8) 1 ≤ (i. so. Equation (14. for any n × n square matrix A. (14. 14. the three matrices A. det T[i↔j]A = − det A = det AT[i↔j].5) guarantees that det B = 0. det IA = det A = det AI. Therefore. scaling or adding rows or columns of a matrix respectively negates.

Elementary operators being reversible have no power to breach this barrier. an elementary operator which honors an n × n active region. In . Nevertheless.} restricts Mk to be any of the things between the braces.8) gives elementary operators the power to alter a matrix’s determinant almost arbitrarily—almost arbitrarily. Another thing n × n elementaries cannot do according to § 12. In symbols. is that the determinant of a product is the product of the determinants. I. This matters because. full-rank square matrices never do.9).1. a determinant remains zero under the action of elementary operators. applied recursively. j) ≤ n. Once zero. where r ≤ n is the matrix’s rank. . What an n × n elementary operator6 cannot do is to change an n × n matrix’s determinant to or from zero.11) below is going to erase the restriction. ∈ 14. by the Gauss-Jordan algorithm of § 12.4 will put the rule (14.8) and (14. Once nonzero. then a significant consequence of (14. always nonzero.3. As it happens though. . one can build up any square matrix of full rank by applying elementary operators to In . The notation Mk ∈ {. T[i↔j] . so it follows that the n × n determinant of every rank-r matrix is similarly zero if r < n. (14.10) 1 ≤ (i.1. Singular matrices always have zero determinants.3 is to change an n × n matrix’s rank. Equation (14. but not quite. as the Gauss-Jordan decomposition of § 12. One can evidently tell the singularity or invertibility of a square matrix from its determinant alone.2. at least where identity matrices and elementary operators are concerned. Tα[i] . but it does help here.1. and complementarily that the n × n determinant of a rank-n matrix is never zero. such elementaries can reduce any n × n matrix reversibly to Ir . Section 14.3 The determinant of a singular matrix Equation (14.5 det k Mk Mk = k det Mk .3 has shown.3.4) has that the n × n determinant of Ir is zero if r < n. THE DETERMINANT 363 If A is taken to represent an arbitrary product of identity matrices (In and/or I) and elementary operators. 5 . Notation like “∈” can be too fancy for applied mathematics. (14. Tα[ij] . 6 That is.14. in this case.5. See § 11.10) to good use.

The first case is that A is singular. Here. the determinant of the product is the product of the determinants. The determinant of a matrix product is the product of the matrix determinants.12) .10) merely that since A and B are products of identity matrices and elementaries.11) holds in the first case. which says that AB is no less singular than A.5 Determinants of inverse and unitary matrices From (14. for which det AB = det {[A] [B]} = det = = det T In det T det In T In T T det T det T In T det T det T det In T In T = det A det B.1. which is a schematic way of pointing out in light of (14.1.11) holds in all three cases. THE EIGENVALUE 14. The third case is that neither matrix is singular.1. (14. So it is that (14.4 The determinant of a matrix product Sections 14.364 CHAPTER 14. 14. Evidently (14. whereas according to § 12.3 suggest the useful rule that det AB = det A det B. B acts as a column operator on A. as was to be demonstrated.11) To prove the rule.5.2 no operator has the power to promote A in rank. which implies that AB like A has a null determinant. Hence the product AB is no higher in rank than A. we use GaussJordan to decompose both matrices into sequences of elementary operators and rank-n identity matrices.2 and 14. 1 det A (14. In this case. we consider three distinct cases. The second case is that B is singular. The proof here resembles that of the first case.11) it follows that det A−1 = because A−1 A = In and det In = 1.1.
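Both rules just established, det AB = det A det B in (14.11) and det A^{-1} = 1/det A in (14.12), are easy to spot-check numerically. A brief sketch, not part of the original text, assuming NumPy and randomly generated matrices:

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))

print(np.isclose(np.linalg.det(A @ B),
                 np.linalg.det(A) * np.linalg.det(B)))   # (14.11): True
print(np.isclose(np.linalg.det(np.linalg.inv(A)),
                 1.0 / np.linalg.det(A)))                # (14.12): True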

365 (14. |det Q| = 1. ∗ ∗ 0 ∗ ∗ .6) it follows that if Q is a unitary matrix (§ 13. but swamps one in nearly interminable algebra when symbolically inverting general matrices larger than the A2 at the section’s head. ··· ··· ··· ··· ··· 3 7 7 7 7 7 7 7. 14. 0 0 1 0 0 . The matrix C. .   ai′ j ′ otherwise.14) is [C T A]ij = (det A)δij = [AC T ]ij .14) Pictorially. . Another way to write (14. .11). . . . . . . . cij ≡ det Rij . 7 7 7 7 5 (14.  [Rij ]i′ j ′ ≡ 0 if i′ = i or j ′ = j but not both. . . THE DETERMINANT From (14. . called the cofactor of A. . .1. then consists of the determinants of the various Rij . then det Q∗ det Q = 1. one quite incidentally discovers a clever way to factor the determinant: C T A = (det A)In = AC T .1. . In the case that i = j. Slogging through the algebra to invert A3 symbolically nevertheless (the reader need not actually do this unless he desires a long exercise). [AC T ]ij = [AC T ]ii = ℓ (14.  1 if i′ = i and j ′ = j. . .13) This reason is that |det Q|2 = (det Q)∗ (det Q) = det Q∗ det Q = det Q∗ Q = det Q−1 Q = det In = 1. ··· ··· ··· ··· ··· ∗ ∗ 0 ∗ ∗ . ∗ ∗ 0 ∗ ∗ . . .14. which comprises two cases. Rij = same as A except in the ith row and jth column. .6 Inverting the square matrix by determinant The Gauss-Jordan algorithm comfortably inverts concrete matrices of moderate size. ∗ ∗ 0 ∗ ∗ . . . 2 6 6 6 6 6 6 6 6 6 6 6 4 . . .15) aiℓ ciℓ = ℓ aiℓ det Riℓ = det A = (det A)δij .

14) by det A.16) still holds in theory for yet larger matrices. too—especially with the help of a computer to do the arithmetic. so this book does not treat Cramer’s rule as such. One does This is a bit subtle.7 In the case that i = j. In practice. but rather of A with the jth row replaced by a copy of the ith. through about 4 × 4 dimensionality (the A−1 equation 2 at the head of the section is just eqn.16 for n = 2).15) on them. even the Gauss-Jordan grows impractical. for concrete matrices much bigger than 4 × 4 to 6 × 6 or so its several determinants begin to grow too great and too many for practical calculation. (14. CT .16) inverts a matrix by determinant. Cramer’s rule is really nothing more than (14. Iterative techniques ([chapter not yet written]) serve to invert such matrices approximately.16) to (13.16) A−1 = det A Equation (14. where aiℓ provides the ith row and Riℓ provides the other rows. being a determinant. of which the reader may have heard. 7 . plus this chapter up to the present point.9 14. Similar equations can be written for [C T A]ij in both cases. wherein ℓ aiℓ det Rjℓ is the determinant. THE EIGENVALUE wherein the equation ℓ aiℓ det Riℓ = det A states that det A. n × n matrix concisely whose entries remain unspecified. 12 and 13. due to compound floating-point rounding error and the maybe large but nonetheless limited quantity of available computer memory. not of A itself. it inverts small matrices nicely. results from applying (14. hence also (14. each term including one factor from each row of A. Though (14. [AC T ]ij = ℓ aiℓ cjℓ = ℓ aiℓ det Rjℓ = 0 = (det A)(0) = (det A)δij . Dividing (14.5) evaluates to zero. consists of several terms.366 CHAPTER 14.6]. 8 Cramer’s rule [15.2 Coincident properties Chs.4). The Gauss-Jordan technique (or even the Gram-Schmidt technique) is preferred to invert concrete matrices above a certain size for this reason. § 1. and though symbolically it expresses the inverse of an abstract. The two cases together prove (14. have discovered several coincident properties of the invertible n × n square matrix. we have that8 A−1 A = In = AA−1 . 11.16) in a less pleasing form. 14. which according to (14. but if you actually write out A3 and its cofactor C3 symbolically. However.14).15). It inverts 5 × 5 and even 6 × 6 matrices reasonably. 9 For very large matrices. trying (14. then you will soon see what is meant.

given any specific n-element driving vector b (§ 13.3.3. but a matrix can be almost singular just as a scalar can be almost zero.5.1). the computers which invert them tend to do arithmetic imprecisely in floating-point. COINCIDENT PROPERTIES 367 not feel the full impact of the coincidence when these properties are left scattered across the long chapters.2. • The linear system Ax = b it represents has a unique n-element solution x.7).4). per § 12.4).1. Whether the distinction is always useful is another matter. Assuming exact arithmetic.5. and its rows address the same space the rows of In address (§ 12. no matter how small.4). The square matrix which lacks one. • Its columns are linearly independent (§ 12. Matrices which live .3). (In this. never both. the choice of pivots does not matter. Practical matrices however often have entries whose values are imprecisely known. let us gather and summarize the properties here. the Gram-Schmidt algorithm achieves a fully square.5. has all of them. unitary. implies a theoretically invertible matrix.) • Decomposing it.10. Now it is true: in exact arithmetic.1 and 12. • Its rows are linearly independent (§§ 12.14. below). n × n matrix evidently has either all of the following properties or none of them.3. Such a matrix is known. • The determinant det A = 0 (§ 14. a square matrix is either invertible. • The matrix is invertible (§ 13.3). • Its columns address the same space the columns of In address.2). a nonzero determinant. by its unexpectedly small determinant. with all that that implies. • It has full rank r = n (§ 12. so. or singular. The distinction between invertible and singular matrices is theoretically as absolute as (and indeed is analogous to) the distinction between nonzero and zero scalars. • The Gauss-Jordan algorithm reduces it to In (§ 12. • None of its eigenvalues is zero (§ 14. A square. and even when they don’t. among other ways.2). n × n factor Q (§ 13.3.5. The square matrix which has one of these properties. lacks all. never some but not others. Usually the distinction is indeed useful.

on the hazy frontier between invertibility and singularity resemble the infinitesimals of § 4.1.1. They are called ill conditioned matrices. Section 14.8 develops the topic.

14.3 The eigenvalue itself

We stand ready at last to approach the final major agent of matrix arithmetic, the eigenvalue. Suppose a square, n x n matrix A, a nonzero n-element vector

    v = In v ≠ 0,    (14.17)

and a scalar λ, together such that

    Av = λv,    (14.18)

or in other words such that Av = λ In v. If so, then

    [A - λIn] v = 0.    (14.19)

Since In v is nonzero, the last equation is true if and only if the matrix [A - λIn] is singular, which in light of § 14.1.3 is to demand that

    det(A - λIn) = 0.    (14.20)

The left side of (14.20) is an nth-order polynomial in λ, the characteristic polynomial, whose n roots are the eigenvalues (footnote 10) of the matrix A.

Footnote 10: An example:

    A = [ 2  0
          3 -1 ],

    det(A - λIn) = det [ 2-λ   0
                          3   -1-λ ]
                 = (2 - λ)(-1 - λ) - (0)(3)
                 = λ² - λ - 2 = 0,

    λ = -1 or 2.

What is an eigenvalue, really? An eigenvalue is a scalar a matrix resembles under certain conditions. When a matrix happens to operate on the right eigenvector v, it is all the same whether one applies the entire matrix or just the eigenvalue to the vector. The matrix scales the eigenvector by the eigenvalue without otherwise altering the vector, changing the vector's

magnitude but not its direction. The eigenvalue alone takes the place of the whole, hulking matrix. This is what (14.18) means. Of course it works only when v happens to be the right eigenvector, which § 14.4 discusses.

When λ = 0, (14.20) makes det A = 0, which as we have said is the sign of a singular matrix. Zero eigenvalues and singular matrices always travel together. Singular matrices each have at least one zero eigenvalue; nonsingular matrices never do.

The eigenvalues of a matrix's inverse are the inverses of the matrix's eigenvalues. That is,

    λ′j λj = 1 for all 1 ≤ j ≤ n if A′A = In = AA′.    (14.21)

The reason behind (14.21) comes from answering the question: if Avj scales vj by the factor λj, then what does A′Avj = Ivj do to vj?

Naturally one must solve (14.20)'s nth-order polynomial to locate the actual eigenvalues. One solves it by the same techniques by which one solves any polynomial: the quadratic formula (2.2), the cubic and quartic methods of Ch. 10, the Newton-Raphson iteration (4.30) and so on. On the other hand, the determinant (14.20) can be impractical to expand for a large matrix; here iterative techniques help: see [chapter not yet written] (footnote 11).

Footnote 11: The inexpensive [15] also covers the topic competently and readably.

14.4 The eigenvector

It is an odd fact that (14.19) and (14.20) reveal the eigenvalues λ of a square matrix A while obscuring the associated eigenvectors v. Once one has calculated an eigenvalue, though, one can feed it back to calculate the associated eigenvector. According to (14.19), the eigenvectors are the n-element vectors for which

    [A - λIn] v = 0,

which is to say that the eigenvectors are the vectors of the kernel space of the degenerate matrix [A - λIn], which one can calculate (among other ways) by the Gauss-Jordan kernel formula (13.7) or the Gram-Schmidt kernel formula (13.59).

An eigenvalue and its associated eigenvector, taken together, are sometimes called an eigensolution.
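For the 2 x 2 example of footnote 10, a few lines of code recover both the eigenvalues and, through (14.18), the associated eigenvectors. A sketch only, assuming NumPy; a library eigensolver computes both at once.

import numpy as np

A = np.array([[2.0, 0.0],
              [3.0, -1.0]])

# Roots of the characteristic polynomial det(A - lambda*I) = 0, eqn. (14.20):
lambdas = np.roots(np.poly(A))    # np.poly gives the characteristic coefficients
print(sorted(lambdas))            # [-1.0, 2.0]

# A library eigensolver returns eigenvalues and eigenvectors together:
w, V = np.linalg.eig(A)
for lam, v in zip(w, V.T):
    print(np.allclose(A @ v, lam * v))   # True: A v = lambda v, eqn. (14.18)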

. which is to say. vj ). then any n-element vector can be expressed as a unique linear combination of the eigenvectors. v2 . vk−1 respectively with distinct eigenvalues λ1 . Unfortunately some matrices with repeated eigenvalues also have repeated eigenvectors—as for example. did nontrivial coefficients cj exist such that k−1 vk = j=1 cj vj .5 Eigensolution facts Many useful or interesting mathematical facts concern the eigensolution. Refer to (14. Thus vk too is independent. . . . THE EIGENVALUE 14. . • If the eigensolutions of A are (λj . then the eigensolutions of A + αIn are (λj + α. whereupon by induction from a start case of k = 1 we conclude that there exists no dependent eigenvector with a distinct eigenvalue. This is seen by adding αvj to both sides of the definition Avj = λvj . . impossible since the k − 1 eigenvectors are independent. • Eigenvectors corresponding to distinct eigenvalues are always linearly independent of one another. • A matrix and its inverse share the same eigenvectors with inverted eigenvalues. then left-multiplying the equation by A − λk would yield k−1 0= j=1 (λj − λk )cj vj .370 CHAPTER 14. λ2 . This is a simple consequence of the fact that the n × n matrix V whose columns are the several eigenvectors vj has full rank r = n.3. . The eigenvalues move over by αIn while the eigenvectors remain fixed. . and further consider another eigenvector vk which might or might not be independent but which too has a distinct eigenvalue λk . λk−1 . vj ). . consider several independent eigenvectors v1 . To prove this fact. • If an n × n square matrix A has n independent eigenvectors (which is always so if the matrix has n distinct eigenvalues and often so even otherwise).21) and its explanation in § 14. Were vk dependent. among them the following.

6. . Were it not so..3.22) where Λ= 2 6 6 6 6 6 4 λ1 0 . every n×n matrix with n distinct eigenvalues) can be diagonalized as A = V ΛV −1 .2 speaks of matrices of the last kind. Section 14. but is not limited to. which in turn is possible only if A1 = A2 . λn−1 0 0 0 . .10.6 Diagonalization Any n × n matrix with n independent eigenvectors (which class per § 14.12 [1 0. nonnegative eigenvalues.1) the matrix’s characteristic polynomial (14. 1 1]T . . This fact proceeds from the last point. 14. . which by definition would be no eigenvalue if it had no eigenvector to scale.19) necessarily admits at least one nonzero solution v because its matrix A − λIn is degenerate. then v∗ Av = λv∗ v (in which v∗ v naturally is a positive real scalar) would violate the criterion for positive or nonnegative definiteness. • An n × n square matrix whose eigenvectors are linearly independent of one another cannot share all eigensolutions with any other n×n square matrix. (14. ··· ··· 0 0 . • Every n × n square matrix has at least one eigensolution if n > 0. whose double eigenvalue λ = 1 has the single eigenvector [1 0]T . positive eigenvalues. . because according to the fundamental theorem of algebra (6.20) has at least one root. that every n-element vector x is a unique linear combination of independent eigenvectors. See § 13. . • A positive definite matrix has only real. .6. A nonnegative definite matrix has only real. DIAGONALIZATION 371 curiously. . and for which (14. Neither of the two proposed matrices A1 and A2 could scale any of the eigenvector components of x differently than the other matrix did.5 includes. 0 λn 3 7 7 7 7 7 5 12 [22] . 0 0 ··· ··· . so A1 x − A2 x = (A1 − A2 )x = 0 for all x. . 0 0 0 λ2 . an eigenvalue.14.

is an otherwise empty n x n matrix with the eigenvalues of A set along its main diagonal, and

    V = [ v1  v2  · · ·  vn-1  vn ]

is an n x n matrix whose columns are the eigenvectors of A (footnote 13). This is so because the identity Avj = vj λj holds for all 1 ≤ j ≤ n; or, expressed more concisely, because the identity

    AV = V Λ

holds. (If this seems confusing, then consider that the jth column of the product AV is Avj, whereas the jth column of Λ, having just the one element, acts to scale V's jth column only.) The matrix V is invertible because its columns, the eigenvectors, are independent, from which (14.22) follows.

Equation (14.22) is called the eigenvalue decomposition, the diagonal decomposition or the diagonalization of the square matrix A. An n x n matrix with n independent eigenvectors (which class, again, includes every n x n matrix with n distinct eigenvalues and also includes many matrices with fewer) is called a diagonalizable matrix. Besides factoring a diagonalizable matrix by (14.22), one can apply the same formula to compose a diagonalizable matrix with desired eigensolutions. The diagonal matrix diag{x} of (11.55) is trivially diagonalizable as diag{x} = In diag{x} In.

It is a curious and useful fact that

    A² = (V Λ V^{-1})(V Λ V^{-1}) = V Λ² V^{-1},

and by extension that

    A^k = V Λ^k V^{-1}    (14.23)

for any diagonalizable matrix A. The diagonal matrix Λ^k is nothing more than the diagonal matrix Λ with each element individually raised to the kth power, such that

    [Λ^k]ij = δij λj^k.

Footnote 13: One might object that we had shown only how to compose some matrix V Λ V^{-1} with the correct eigenvalues and independent eigenvectors, but had failed to show that the matrix was actually A. However, we need not show this, because § 14.5 has already demonstrated that two matrices with the same eigenvalues and independent eigenvectors are in fact the same matrix, whereby the product V Λ V^{-1} can be nothing other than A.
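A short numerical sketch of (14.22) and (14.23) follows; it is illustrative only, assuming NumPy and an invented matrix with distinct (hence independent) eigensolutions.

import numpy as np

A = np.array([[2.0, 1.0],
              [0.0, 3.0]])

w, V = np.linalg.eig(A)          # eigenvalues w, eigenvector columns V
Lam = np.diag(w)

print(np.allclose(V @ Lam @ np.linalg.inv(V), A))        # A = V Lam V^{-1}, (14.22)

k = 5
print(np.allclose(V @ np.diag(w**k) @ np.linalg.inv(V),
                  np.linalg.matrix_power(A, k)))         # A^k = V Lam^k V^{-1}, (14.23)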

The nondiagonalizable matrix vaguely resembles the singular matrix in that both represent edge cases and can be hard to handle numerically.2 will have more to say about the nondiagonalizable matrix. Should the dominant eigenvalue have greater than unit magnitude. tends then to transform v into the associated eigenvector. REMARKS ON THE EIGENVALUE Changing z ← k implies the generalization14 Az = V Λz V −1 . All this is fairly deep mathematics. 14 It may not be clear however according to (5. a nondiagonalizable matrix is a matrix with an n-fold eigenvalue whose corresponding eigenvector space fewer than n eigenvectors fully characterize. then to A2 v. largest in magnitude. thus one can sometimes judge the stability of a physical process indirectly by examining the eigenvalues of the matrix which describes it. . Λz ij 373 = δij λz .24) good for any diagonalizable A and complex z. The idea that a matrix resembles a humble scalar in the right circumstance is powerful.10. 14. The n × n null matrix for example is singular but still diagonalizable. but the resemblance ends there. operating repeatedly on a vector v to change it first to Av. Section 14. A3 v and so on. Then there is the edge case of the nondiagonalizable matrix. which matrix surprisingly covers only part of its domain with eigenvectors. twice or more. and a matrix can be either without being the other. it destabilizes the iteration.2 and 14. Remarks continue in §§ 14.12) which branch of λz one should choose j at each index j. It brings an appreciation of the matrix for reasons which were anything but apparent from the outset of Ch.10. gradually but relatively eliminating all other components of v.14. Nondiagonalizable matrices are troublesome and interesting. 11. What a nondiagonalizable matrix is in essence is a matrix with a repeated eigensolution: the same eigenvalue with the same eigenvector.7 Remarks on the eigenvalue Eigenvalues and their associated eigenvectors stand among the principal reasons one goes to the considerable trouble to develop matrix theory as we have done in recent chapters. More formally.7. especially if A has negative or complex eigenvalues. j (14.13. The dominant eigenvalue of A. Among the reasons for this is that a matrix can represent an iterative process.
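The repeated-operation idea described above, where Av, A²v, A³v and so on come to be dominated by the eigenvector of the dominant eigenvalue, is the basis of the power iteration, a standard numerical way to estimate a matrix's dominant eigenvalue. A minimal sketch, not part of the original text, assuming NumPy and an invented matrix and starting vector; the rescaling at each step merely keeps the iterate bounded.

import numpy as np

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
v = np.array([1.0, 0.0])             # arbitrary nonzero starting vector

for _ in range(50):
    v = A @ v
    v = v / np.linalg.norm(v)        # rescale so the iterate stays bounded

lam_est = v @ A @ v                  # Rayleigh-quotient estimate of lambda_max
print(lam_est)                       # approaches 5.0, the dominant eigenvalue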

14.8 Matrix condition

The largest in magnitude of the several eigenvalues of a diagonalizable operator A, denoted here λmax, tends to dominate the iteration A^k x. Section 14.7 has named λmax the dominant eigenvalue for this reason.

One sometimes finds it convenient to normalize a dominant eigenvalue by defining a new operator A′ ≡ A/|λmax|, whose own dominant eigenvalue λmax/|λmax| has unit magnitude. In terms of the new operator, the iteration becomes

A^k x = |λmax|^k A′^k x,

leaving one free to carry the magnifying effect |λmax|^k separately if one prefers to do so. However, the scale factor 1/|λmax| scales all eigenvalues equally; thus, if A's eigenvalue of smallest magnitude is denoted λmin, then the corresponding eigenvalue of A′ is λmin/|λmax|. If zero, then both matrices according to § 14.3 are singular; if nearly zero, then both matrices are ill conditioned.

Such considerations lead us to define the condition of a diagonalizable matrix quantitatively as15

κ ≡ | λmax / λmin |,     (14.25)

by which

κ ≥ 1     (14.26)

is always a real number of no less than unit magnitude. For best invertibility, κ = 1 would be ideal (it would mean that all eigenvalues had the same magnitude), though in practice quite a broad range of κ is usually acceptable. Could we always work in exact arithmetic, the value of κ might not interest us much as long as it stayed finite; but in computer floating point, or where the elements of A are known only within some tolerance, infinite κ tends to emerge imprecisely rather as large κ ≫ 1. An ill conditioned matrix by definition16 is a matrix of large κ ≫ 1. The applied mathematician handles such a matrix with due skepticism.

Matrix condition so defined turns out to have another useful application. Suppose that a diagonalizable matrix A is precisely known but that the corresponding driving vector b is not. If

A(x + δx) = b + δb,

15 [9]
16 There is of course no definite boundary, no particular edge value of κ, less than which a matrix is well conditioned, at and beyond which it turns ill conditioned. If I tried to claim that a matrix with a fine κ = 3 were ill conditioned, or that one with a wretched κ = 2^0x18 were well conditioned, then you might not credit me—but the mathematics nevertheless can only give the number; it remains to the mathematician to interpret it. But you knew that already.

max λ−1 δb λmax |δb| |δx| ≤ min . For example. then the vector B » 5 1 – = 3 3 2 −1 1 54 0 5 + 14 2 5 1 0 2 = 3 4 4 2 5 1 2 . but ill condition remains a property of matrices alone. Subtracting x = A−1 b and taking the magnitude. x + δx = A−1 (b + δb). |δb| |δx| ≤κ .25). then one should like to bound the ratio |δx| / |x| to ascertain the reliability of x as a solution. = −1 |x| λmin |b| λmax b That is. Dividing by |x| = A−1 b . 14.27) Condition. |x| |b| (14.9 The similarity transformation Any collection of vectors assembled into a matrix can serve as a basis by which other vectors can be expressed. the condition of every nonzero scalar is happily κ = 1. might technically be said to apply to scalars as well as to matrices. |x| |A−1 b| The quantity A−1 δb cannot exceed λ−1 δb . A−1 δb |δx| = . According to (14. if the columns of B= 2 1 4 0 0 3 −1 2 5 1 are regarded as a basis. incidentally.14. Thus. THE SIMILARITY TRANSFORMATION 375 where δb is the error in b and δx is the resultant error in x. Transferring A to the equation’s right side. |δx| = A−1 δb .9. The quantity A−1 b cannot min fall short of λ−1 b .
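A minimal numerical sketch, assuming NumPy and an arbitrarily chosen diagonal example not drawn from the text, of the condition κ of (14.25) and of the bound (14.27) it implies. (Library routines usually define a matrix condition number through singular values rather than eigenvalues, so for non-Hermitian input the library's number can differ from the eigenvalue ratio used here.)

    # Compute kappa = |lambda_max / lambda_min| and verify that a small
    # error db in the driving vector obeys |dx|/|x| <= kappa |db|/|b|.
    import numpy as np

    A  = np.array([[1.0, 0.0],
                   [0.0, 1.0e-3]])
    b  = np.array([1.0, 1.0e-3])
    db = np.array([1.0e-6, 1.0e-6])          # a small error in b
    x, dx = np.linalg.solve(A, b), np.linalg.solve(A, db)
    lam = np.linalg.eigvals(A)
    kappa = np.abs(lam).max() / np.abs(lam).min()
    lhs = np.linalg.norm(dx) / np.linalg.norm(x)
    rhs = kappa * np.linalg.norm(db) / np.linalg.norm(b)
    print(kappa, lhs, rhs, lhs <= rhs)        # the bound holds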

Left-multiplication by B −1 . only transformed to work within the basis17. One can therefore convert any operator A to work within a complete basis B by the successive steps Ax = b. converting into the basis. That is. e3 . Such semantical distinctions seem a little too fine for applied use.28) The reader may need to ponder the basis concept a while to grasp it. though. 17 (14. in which the n basis vectors are independent and address the same full space the columns of In address. by successive steps. then does the reverse. B −1 A(BB −1 )x = λB −1 x.376 CHAPTER 14.18 B. unitarily similar ) to the matrix A. if B is unitary. by which the operator B −1 AB is seen to be the operator A. not only from the standard building blocks e1 . . [B −1 AB]u = B −1 b. Probably the most important property of the similarity transformation is that it alters no eigenvalues. but the concept is simple once grasped and little purpose would be served by dwelling on it here.—except that the right word for the relevant “building block” is basis vector. The basis provides the units from which other vectors can be built. If x = Bu then u represents x in the basis B. one might consult one of those if needed. Now we have the theory to appreciate it properly. Basically. The books [23] and [31] introduce the basis more gently. 1) in the basis B: five times the first basis vector plus once the second. [B −1 AB]u = λu. If B happens to be unitary (§ 13. if Ax = λx. invertible complete basis B. We have already met the similarity transformation in §§ 11. THE EIGENVALUE is (5. etc. The conversion from A into B −1 AB is called a similarity transformation.2. the idea is that one can build the same vector from alternate building blocks.11). then. e2 . ABu = b. u = B −1 x.5 and 12. The matrix B −1 AB the transformation produces is said to be similar (or. Left-multiplication by B evidently converts out of the basis. Particularly interesting is the n×n. 18 The professional matrix literature sometimes distinguishes by typeface between the matrix B and the basis B its columns represent. then the conversion is also called a unitary transformation. This book just uses B.
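A brief check, assuming NumPy and arbitrary illustrative matrices, that the similarity transformation B^−1 AB indeed alters no eigenvalues:

    # The similar matrix B^-1 A B has the same spectrum as A.
    import numpy as np

    A = np.array([[4.0, 1.0],
                  [2.0, 3.0]])
    B = np.array([[1.0, 2.0],
                  [0.0, 1.0]])                     # an invertible basis matrix
    similar = np.linalg.inv(B) @ A @ B
    print(np.sort(np.linalg.eigvals(A)))
    print(np.sort(np.linalg.eigvals(similar)))     # the same eigenvalues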

. is somewhat tedious to derive and is of limited use in itself.30) The alternative is to develop the interesting but difficult Jordan canonical form. ∗ ∗ ∗ 0 0 0 0 . ··· ··· ··· ··· ··· ··· ··· . n × n matrix A and any invertible. . n × n square matrix A is A = QUS Q∗ . which will shortly grow clear) we have a matrix B of the form 2 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 4 . . ∗ ∗ ∗ ∗ ∗ ∗ ∗ . . as for any unitary matrix (§ 13. ∗ ∗ ∗ ∗ ∗ ∗ ∗ .10. . . . . of which this subsection uses too many to be allowed to reserve any. . 20 This subsection assigns various capital Roman letters to represent the several matrices and submatrices it manipulates. is Q−1 = Q∗ .29) is not standard and carries no meaning elsewhere. ∗ 0 0 0 0 0 0 .11). ∗ ∗ ∗ ∗ ∗ ∗ ∗ . . . and where US is a general upper triangular matrix which can have any values (even zeros) along its main diagonal. . . The writer had to choose some letters and these are ones he chose. . Its choice of letters except in (14. . . 3 B= ··· ··· ··· ··· ··· ··· ··· .29) where Q is an n × n unitary matrix whose inverse. square. . . . ∗ ∗ 0 0 0 0 0 . not because the latter is wrong but because it would be extremely confusing). This footnote mentions the fact because good mathematical style avoid assigning letters that already bear a conventional meaning in a related context (for example. . which for brevity’s sake this chapter prefers to omit.1 Derivation Suppose that20 (for some reason. 7 7 7 7 7 7 7 7 7. 19 . . . . .. See Appendix B. . 7 7 7 7 7 7 5 (14. .10. . ∗ ∗ ∗ ∗ 0 0 0 . this book avoids writing Ax = b as T e = i. The Schur decomposition is slightly obscure. THE SCHUR DECOMPOSITION 377 The eigenvalues of A and the similar B −1 AB are the same for any square.. . . . n × n matrix B.19 We derive it here for this reason. . 14. 14. . .14. .10 The Schur decomposition The Schur decomposition of an arbitrary. . though. but serves a theoretical purpose. . (14. The Roman alphabet provides only twenty-six capitals.

. . THE EIGENVALUE where the ith row and ith column are depicted at center. (14. ∗ ∗ ∗ ∗ ∗ ∗ ∗ . ··· ··· ··· ··· ··· ··· ··· . 7 7 7 7 7 7 5 7 7 7. . ∗ ∗ ∗ ∗ ∗ 0 0 . ∗ ∗ 0 0 0 0 0 . 3 2 6 4 ∗ 0 0 . . 3 Equation (14. . because we need not find every W that satisfies the form but only one such W . . . . . . . . . . . . . 5 Co = 6 6 ∗ ∗ ∗ . . ··· ··· ··· . Let Bo and Co represent the (n − i) × (n − i) submatrices in the lower right corners respectively of B and C... . . 0 0 0 0 ∗ ∗ ∗ . . . . . To narrow the search. ··· ··· ··· ··· ··· ··· ··· . .378 CHAPTER 14.. .33) . . . . let us look first for a W that fits the restricted template 2 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 4 . . . . ··· ··· ··· ··· ··· ··· ··· . 1 0 0 0 0 0 0 . The question remains as to whether a unitary W exists that satisfies the form and whether for general B we can discover a way to calculate it. 0 0 0 0 ∗ ∗ ∗ . .31) seeks an n × n unitary matrix W to transform the matrix B into a new matrix C ≡ W ∗ BW such that C fits the form (14. . 0 0 1 0 0 0 0 . . . . . . 7 7 7 7 7 7 5 C ≡ W ∗ BW = (14. . . . . Suppose further that we wish to transform B not only similarly but unitarily into 2 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 4 . . . ∗ ∗ ∗ 0 0 0 0 . . . . . . 7 7 7. . 0 0 0 0 ∗ ∗ ∗ . Pictorially. ∗ ∗ ∗ . .. . .9. . . ∗ ∗ ∗ ∗ ∗ ∗ ∗ . . .31) where W is an n × n unitary matrix. . . . 0 0 0 1 0 0 0 . Co ≡ In−i H−i CHi In−i . ∗ ∗ ∗ . . . . . . . .. . . . .32) where Hk is the shift operator of § 11. 3 7 7 7 7 7 7 7 7 7. ∗ 0 0 0 0 0 0 . . . . . 5 W = Ii + Hi Wo H−i = (14. Bo = 6 6 4 2 6 ∗ ∗ ∗ . . . . ∗ ∗ ∗ ∗ 0 0 0 . . ∗ ∗ ∗ . such that Bo ≡ In−i H−i BHi In−i . . . .31) stipulates. ··· ··· ··· ··· ··· ··· ··· .. . 3 7 7 7 7 7 7 7 7 7. and where we do not mind if any or all of the ∗ elements change in going from B to C but we require zeros in the indicated spots. ··· ··· ··· . . . 0 1 0 0 0 0 0 .

That leaves the lower right. THE SCHUR DECOMPOSITION 379 which contains a smaller. so In−i Wo = Wo = Wo In−i . and the upper right term does not bother us because we have not restricted the content of C’s upper right area. The four terms on the equation’s right. (n − i) × (n − i) unitary submatrix Wo in its lower right corner and resembles In elsewhere. Thus.14.10. Apparently any (n − i) × (n − i) unitary submatrix Wo whatsoever obeys (14. upper left and upper right.31) by the truncator (In −Ii ) to focus solely on the lower right area.and right-multiplying (14.76). ∗ = (Ii + Hi Wo H−i )(B)(Ii + Hi Wo H−i ) The unitary submatrix Wo has only n−i columns and n−i rows. each term with rows and columns neatly truncated.34) . (14. represent the four quarters of C ≡ W ∗ BW —upper left. upper right. Beginning from (14. The lower left term is null because ∗ ∗ (In − Ii )[Hi Wo H−i B]Ii = (In − Ii )[Hi Wo In−i H−i BIi ]Ii ∗ = (In − Ii )[Hi Wo H−i ][0]Ii = 0.31) in the lower left. ∗ C = Ii BIi + Ii BHi Wo In−i H−i + Hi In−i Wo H−i BIi ∗ = Ii [B]Ii + Ii [BHi Wo H−i ](In − Ii ) + (In − Ii )[Hi Wo H−i B]Ii ∗ + (In − Ii )[Hi Wo Bo Wo H−i ](In − Ii ). ∗ = (In − Ii )[Hi Wo H−i ][(In − Ii )BIi ]Ii leaving C = Ii [B]Ii + Ii [BHi Wo H−i ](In − Ii ) ∗ + (In − Ii )[Hi Wo Bo Wo H−i ](In − Ii ). ∗ + Hi In−i Wo In−i H−i BHi In−i Wo In−i H−i where the last step has used (14.32) and the identity (11. we have the reduced requirement that (In − Ii )C(In − Ii ) = (In − Ii )W ∗ BW (In − Ii ). Left.31). we have by successive. lower left and lower right. reversible steps that C = W ∗ BW ∗ = Ii BIi + Ii BHi Wo H−i + Hi Wo H−i BIi ∗ + Hi Wo H−i BHi Wo H−i . But the upper left term makes the upper left areas of B and C the same. respectively.

∗ Co = In−i H−i (Ii + Hi Wo H−i )B(Ii + Hi Wo H−i )Hi In−i . Noting that the Gram-Schmidt algorithm orthogonalizes only rightward. . or. or.35) The steps from (14. Expanding W per (14. THE EIGENVALUE Further left-multiplying by H−i . therefore. The increasingly well-stocked armory of matrix theory we now have to draw from makes satisfying (14. ∗ Co = Wo Bo Wo .35) possible as follows. Let (λo . it suffices to satisfy (14. To achieve a unitary transformation of the form (14.35). this is (14.5 that every square matrix has at least one eigensolution. since In−i H−i Ii = 0 = Ii Hi In−i .33). Decompose F by the Gram-Schmidt technique of § 13. vo ) represent an eigensolution of Bo —any eigensolution of Bo —with vo normalized to unit magnitude.10. (n − i) × (n − i + 1) matrix F ≡ vo e1 e2 e3 · · · en−i . ∗ Co = In−i H−i (Hi Wo H−i )B(Hi Wo H−i )Hi In−i ∗ = In−i Wo H−i BHi Wo In−i ∗ = Wo In−i H−i BHi In−i Wo . Co = In−i H−i W ∗ BW HiIn−i . Form the broad. so the latter is as good a way to state the reduced requirement as the former is.76) yields In−i H−i CHi In−i = In−i H−i W ∗ BW HiIn−i . and applying the identity (11. Observe per § 14.31).2. right-multiplying by Hi . substituting from (14. Per (14. observe that the first column of the (n − i) × (n − i) unitary matrix QF remains simply the first column of F .32). which is the unit eigenvector vo : [QF ]∗1 = QF e1 = vo .35) are reversible.34) to (14. choosing p = 1 during the first instance of the algorithm’s step 3 (though choosing any permissible p thereafter).380 CHAPTER 14.32). to obtain F = Q F RF .

36) completes a failsafe technique to transform unitarily any square matrix B of the form (14. 1 ≤ i′ ≤ n. 3 Ge1 = λo Q∗ vo = λo e1 = 6 6 F which means that G=6 6 4 2 6 λo 0 0 . together constitute a valid choice for Wo and Co . 5 7 7 7. Left-multiplying by Q∗ .36) where QF and G are as this paragraph develops. F Noting that the Gram-Schmidt process has rendered orthogonal to vo all columns of QF but the first.. . satisfying the reduced requirement (14.35) and thus also the original requirement (14. observe that QF Ge1 = λo vo .37) . .10. Co = G. . 5 which fits the very form (14. ··· ··· ··· . (14. . . which is vo . THE SCHUR DECOMPOSITION Transform Bo unitarily by QF to define the new matrix G ≡ Q ∗ Bo Q F .31).31). 7 7 7. Naturally the technique can be applied recursively as B|i=i′ = C|i=i′ −1 . F Ge1 = λo Q∗ vo . (14. F then transfer factors to reach the equation QF GQ∗ = Bo .14. . Conclude therefore that W o = QF . . Equation (14. observe that 2 6 4 3 λo 0 0 . ∗ ∗ ∗ . .32) the submatrix Co is required to have. F 381 Right-multiplying by QF e1 = vo and noting that Bo vo = λo vo . ∗ ∗ ∗ . .30) into a square matrix C of the form (14.

else the permutor’s bottom row would be empty which is not allowed.30) the matrix US has the general upper triangular form the Schur decomposition (14. If P marked no position below the main diagonal. Moreover.41) The determinant’s definition in § 14. (ii) that the only permutor that marks no position below the main diagonal is the one which also marks no position above.21 Hence no element above the main diagonal of US even influences the eigenvalues. p(n−2)(n−2) = 1. 14. we have that n−1 Q= i′ =0 (W |i=i′ ) .10. In either form. if we let B|i=0 = A. this determinant brings only the one term n det(US − λIn ) = i=1 (uSii − λ) = 0 whose factors run straight down the main diagonal. Unlike most determinants. (14. In the next row up. because the product of unitary matrices according to (13. then it would necessarily have pnn = 1.39) (14. THE EIGENVALUE because the form (14. which apparently are λi = uSii .29) requires.30) of B at i = i′ is nothing other than the form (14. 21 (14. then it follows by induction that B|i=n = US .40) which along with (14.39) accomplishes the Schur decomposition. Therefore. . In the next-to-bottom row.1 makes the following two propositions equivalent: (i) that a determinant’s term which includes one or more factors above the main diagonal also includes one or more factors below.38) where per (14. the proposition’s truth might seem less than obvious until viewed from the proper angle.20) of the general upper triangular matrix US is det(US − λIn ) = 0. where the determinant’s n! − 1 other terms are all zero because each of them includes at least one zero factor from below the main diagonal.62) is itself a unitary matrix. p(n−1)(n−1) = 1. thus affirming the proposition. (14. because the nth column is already occupied.31) of C at i = i′ − 1. and so on.382 CHAPTER 14.2 The nondiagonalizable matrix The characteristic equation (14. Consider a permutor P .
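The following sketch, which assumes the SciPy library's schur routine and an arbitrary illustrative matrix, exhibits the Schur decomposition numerically and confirms (14.41), that the eigenvalues lie along the main diagonal of US:

    # A = Q US Q*, Q unitary, US upper triangular with the eigenvalues
    # of A along its main diagonal.
    import numpy as np
    from scipy.linalg import schur

    A = np.array([[0.0, 2.0, 1.0],
                  [1.0, 1.0, 0.0],
                  [0.0, 1.0, 3.0]])
    US, Q = schur(A, output='complex')
    print(np.allclose(Q @ US @ Q.conj().T, A))      # reassembles A
    print(np.sort_complex(np.diag(US)))             # diagonal of US ...
    print(np.sort_complex(np.linalg.eigvals(A)))    # ... equals the spectrum of A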

however. By the Schur decomposition. Though infinitesimally near A.22 One might think that the Schur decomposition offered an easy way to calculate eigenvalues.28).2 has analyzed perturbed poles. ≡ US + ǫ diag{u}. 23 Equation (11. whose triangular factor US openly lists all eigenvalues along its main diagonal. one can construct a second square matrix A′ . simply by perturbing the main diagonal of US to23 ′ US ui′ = ui if λi′ = λi .42) where |ǫ| ≪ 1 and where u is an arbitrary vector that meets the criterion ′ given. This theoretical benefit pays when some of the n eigenvalues of an n × n square matrix A repeat.55) defines the diag{·} notation. and. 22 An unusually careful reader might worry that A and US had the same eigenvalues with different multiplicities.10. still. . The Schur decomposition (14. as we have seen. but.11) and (14. thus further the same eigenvalue multiplicities.41): every square matrix without exception has a Schur decomposition. one can analyze such perturbed eigenvalues and their associated eigenvectors similarly as § 9. similarity transformations preserve eigenvalues. which says that A and US have not only the same eigenvalues but also the same characteristic polynomials. then the eigenvalues of A are just the values along the main diagonal of US . as near as desired to A but having n distinct eigenvalues.29) is in fact a similarity transformation. THE SCHUR DECOMPOSITION 383 the main-diagonal elements. With sufficient toil. If therefore A = QUS Q∗ .14. one would like to give a sounder reason than the participle “surprising.13).6. it brings at least the theoretical benefit of (14. but it is less easy than it first appears because one must calculate eigenvalues to reach the Schur decomposition in the first place. (14. this equation’s determinant is = Q[US − λ(Q∗ In Q)]Q∗ = Q[US − λIn ]Q∗ . It would be surprising if it actually were so.” Consider however that A − λIn = QUS Q∗ − λIn = Q[US − Q∗ (λIn )Q]Q∗ According to (14. Whatever practical merit the Schur decomposition might have or lack. every matrix A has a Schur decomposition. the modified matrix A′ = QUS Q∗ unlike A has n (maybe infinitesimally) distinct eigenvalues. According to (14. det[A − λIn ] = det{Q[US − λIn ]Q∗ } = det Q det[US − λIn ] det Q∗ = det[US − λIn ].

Ch. or concerning the eigenvalue problem26 more broadly.5.384 CHAPTER 14. but fail where eigenvectors repeat. Ch. Then there is a kind of sloppy Schur form called a Hessenberg form which allows content in US along one or more subdiagonals just beneath the main diagonal. The finding that every matrix is arbitrarily nearly diagonalizable illuminates a question the chapter has evaded up to the present point.22) diagonalize any matrix with distinct eigenvalues and even any matrix with repeated eigenvalues but distinct eigenvectors. One could profitably propose and prove any number of useful theorems concerning the nondiagonalizable matrix and its generalized eigenvectors.42) brings us to the nondiagonalizable matrix of the subsection’s title. Section 14. 7] [15.6 and its diagonalization formula (14. 24 25 [17. is arbitrarily nearly diagonalizable with distinct eigenvalues. (14.6. in more and less rigorous ways. The question: does a p-fold root in the characteristic polynomial (14. 14.11 The Hermitian matrix A∗ = A.42) separates eigenvalues.25 which together roughly track the sophisticated conventional pole-separation technique of § 9. 5] 26 [53] . but for the time being we shall let the matter rest there. THE EIGENVALUE Equation (14. We shall leave the answer in that form.5 eigenvectors of distinct eigenvalues never depend on one another—permitting a nonunique but still sometimes usable form of diagonalization in the limit ǫ → 0 even when the matrix in question is strictly nondiagonalizable. Generalizing the nondiagonalizability concept leads one eventually to the ideas of the generalized eigenvector 24 (which solves the higher-order linear system [A − λI]k v = 0) and the Jordan canonical form.20) necessarily imply a p-fold eigenvalue in the corresponding matrix? The existence of the nondiagonalizable matrix casts a shadow of doubt until one realizes that every nondiagonalizable matrix is arbitrarily nearly diagonalizable— and. Equation (14. better.43) An m × m square matrix A that is its own adjoint. thus also eigenvectors—for according to § 14. That is the essence of the idea. then you can show him a nearly identical matrix with three infinitesimally distinct eigenvalues. If you claim that a matrix has a triple eigenvalue and someone disputes the claim.

let the s columns of the m × s matrix Vo represent the s independent eigenvectors of A such that (i) each column has unit magnitude and (ii) columns whose eigenvectors share the same eigenvalue lie orthogonal to one another.11 and 14. • its eigenvectors corresponding to distinct eigenvalues lie orthogonal to one another. and • it is unitarily diagonalizable (§§ 13. for which λ∗ v∗ v = (Av)∗ v = v∗ Av = v∗ (Av) = λv∗ v. 27 [31. the eigenvalues λ1 and λ2 are no exceptions. ∗ λ2 = λ1 or v2 v1 = 0. Let the s × s diagonal matrix Λo carry the eigenvalues on its main diagonal such that AVo = Vo Λo . which naturally is possible only if λ is real. 2 But according to the last paragraph all eigenvalues are real. § 8.1] . λ∗ = λ. Properties of the Hermitian matrix include that • its eigenvalues are real. 2 That is. v) represent an eigensolution of A and constructing the product v∗ Av. Given an m × m matrix A. Hence. v2 ) represent eigensolu∗ tions of A and constructing the product v2 Av1 . That is. THE HERMITIAN MATRIX 385 is called a Hermitian or self-adjoint matrix.6) such that A = V ΛV ∗ .14.44) That the eigenvalues are real is proved by letting (λ. (14. That eigenvectors corresponding to distinct eigenvalues lie orthogonal to one another is proved27 by letting (λ1 . v1 ) and (λ2 . ∗ λ∗ = λ1 or v2 v1 = 0. for which ∗ ∗ ∗ ∗ λ∗ v2 v1 = (Av2 )∗ v1 = v2 Av1 = v2 (Av1 ) = λ1 v2 v1 .11. To prove the last hypothesis of the three needs first some definitions as follows.
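A short numerical check of the three properties listed above, assuming NumPy (whose eigh routine is specialized to Hermitian input) and an arbitrarily chosen Hermitian example:

    # Hermitian A: real eigenvalues, orthonormal eigenvectors, and the
    # unitary diagonalization A = V Lambda V*.
    import numpy as np

    A = np.array([[2.0, 1.0 - 1.0j],
                  [1.0 + 1.0j, 3.0]])
    lam, V = np.linalg.eigh(A)
    print(lam)                                           # real eigenvalues
    print(np.allclose(V.conj().T @ V, np.eye(2)))        # orthonormal eigenvectors
    print(np.allclose(V @ np.diag(lam) @ V.conj().T, A)) # A = V Lam V*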

9) to Vo —perpendicular to all eigenvectors. Λo = 4 0 5 0 5. Recall from § 14.22) is that the latter always includes a p-fold eigenvalue p times. Vo = 6 √ 7. not an eigenvector but perpendicular to them all. whereas the former includes a p-fold eigenvalue only as many times as the eigenvalue enjoys independent eigenvectors. the latter of whose eigenvector space is fully characterized by two eigenvectors rather than three such that 2 1 3 3 2 √ 0 0 2 3 0 2 −1 0 0 1 6 √ 7 6 1 0 7 0 7 6 ⊥ Vo = 6 2 7. Let the m−s columns of the m× (m − s) matrix Vo⊥ represent the complete orthogonal complement (§ 13. . falsely supposing a nondiagonalizable Hermitian matrix A.28 With these definitions in hand. each column of unit magnitude— such that Vo⊥∗ Vo = 0 and Vo⊥∗ Vo⊥ = Im−s . s × (m − s) and (m − s) × (m − s) auxiliary matrices F and G necessarily would exist such that AVo⊥ = Vo F + Vo⊥ G. In the example. For such a matrix A. The two λ = 5 eigenvectors are reported in mutually orthogonal form.386 CHAPTER 14. we can now prove by contradiction that all Hermitian matrices are diagonalizable. implying that s < m) would have at least one column. THE EIGENVALUE where the distinction between the matrix Λo and the full eigenvalue matrix Λ of (14. whose Vo⊥ (since A is supposed to be nondiagonalizable. 1 1 5 4 0 √2 5 4 0 2 0 0 5 1 1 √ − 2 0 0 √2 A concrete example: the invertible but 2 −1 6 −6 A=6 4 0 0 nondiagonalizable matrix 3 0 0 0 5 5 2 −5 7 2 7 0 5 0 5 0 0 5 The orthogonal complement Vo⊥ supplies the missing vector.6 that s = m if and only if A is diagonalizable. but notice that eigenvectors corresponding to distinct eigenvalues need not be orthogonal when A is not Hermitian. All vectors in the example are reported with unit magnitude.5 that s = 0 but 0 < s ≤ m because every square matrix has at least one eigensolution. m = 4 and s = 3. Recall from § 14. not due to any unusual property of the product AVo⊥ but for the mundane reason that the columns of Vo and Vo⊥ together by definition addressed 28 has a single eigenvalue at λ = −1 and a triple eigenvalue at λ = 5.

In the reduced equation the matrix G would have at least one eigensolution. o Λ∗ (0) = F + (0)G.11. one wonders whether the converse holds: are all diagonalizable matrices with real eigenvalues and orthogonal eigenvectors Hermitian? To show that they are. We conclude that all Hermitian matrices are diagonalizable—and conclude further that they are unitarily diagonalizable on the ground that their eigenvectors lie orthogonal to one another—as was to be demonstrated. The finding that F = 0 reduces the AVo⊥ equation above to read AVo⊥ = Vo⊥ G. we would have by successive steps that AVo⊥ w = Vo⊥ Gw. w) represent an eigensolution of G. Right-multiplying by the (m − s)-element vector w = 0. its distinctly eigenvalued eigenvectors lay orthogonal to one another. The assumption: that a nondiagonalizable Hermitian A existed. Vo⊥ w) were an eigensolution of A. Λ∗ Vo∗ Vo⊥ = F + Vo∗ Vo⊥ G. The contradiction proves false the assumption that gave rise to it. o 0 = F.14. A(Vo⊥ w) = µ(Vo⊥ w). not due to any unusual property of G but because according to § 14.22). we would have by successive steps that Vo∗ AVo⊥ = Vo∗ Vo F + Vo∗ Vo⊥ G. A = V ΛV ∗ . (AVo )∗ Vo⊥ = Is F + Vo∗ Vo⊥ G. The last equation claims that (µ. (Vo Λo )∗ Vo⊥ = F + Vo∗ Vo⊥ G. . has at least one eigensolution. thus by construction did not lie in the space addressed by the columns of Vo⊥ . in consequence of which A∗ = A and Vo∗ Vo = Is . 1 × 1 or larger. where we had relied on the assumption that A were Hermitian and thus that. Having proven that all Hermitian matrices are diagonalizable and have real eigenvalues and orthogonal eigenvectors.5 every square matrix. as proved above. THE HERMITIAN MATRIX 387 the space of all m-element vectors—including the columns of AVo⊥ . Let (µ. when we had supposed that all of A’s eigenvectors lay in the space addressed by the columns of Vo . one can construct the matrix described by the diagonalization formula (14. Leftmultiplying by Vo∗ .

2. 18 Oct. The equation’s adjoint is A∗ = V Λ∗ V ∗ . but it works in theory at least. (14.3. thus is unitarily diagonalizable according to (14. 30 [52. which means that Λ∗ = Λ and the right sides of the two equations are the same. then the following idea is one you might discover. All diagonalizable matrices with real eigenvalues and orthogonal eigenvectors are Hermitian. is clearly Hermitian according to § 14. 2007] 29 . That the eigenvalues.45) in which A = C ∗ C and b = C ∗ d.5 does require all of its eigenvalues to be real and moreover positive— which means among other things that the eigenvalue matrix Λ has full rank.46) Here.” 14:29. almost in plain sight. the n × n matrices Λ and V represent respectively the eigenvalues and eigenvectors not of A but of the product A∗ A.6.11). (14.11. overlooked. since (A∗ A)∗ = A∗ A. The properties demand a Hermitian matrix.12 The singular value decomposition Occasionally an elegant idea awaits discovery. for which the properties obtain. THE EIGENVALUE where V −1 = V ∗ because this V is unitary (§ 13. “Singular value decomposition. and.45) worsens a matrix’s condition and may be undesirable for this reason. But all the eigenvalues here are real. Though nothing requires the product’s eigenvectors to be real. the diagonal elements of Λ. That is.388 CHAPTER 14. are real and positive is The device (14. This section brings properties that greatly simplify many kinds of matrix analysis. m × n matrix A of full column rank r=n≤m and its adjoint A∗ . A∗ = A as was to be demonstrated. If the unlikely thought occurred to you to take the square root of a matrix.29 14. which might seem a severe and unfortunate restriction—except that one can left-multiply any exactly determined linear system Cx = d by C ∗ to get the equivalent Hermitian system [C ∗ C]x = [C ∗ d]. The product A∗ A is invertible according to § 13. because the product is positive definite § 14.30 Consider the n × n product A∗ A of a tall or square.6.44) as A∗ A = V ΛV ∗ . is positive definite according to § 13.

··· ··· 0 0 . + λn−1 0 0 0 . . 0 √ + λn 3 7 7 7 7. so this case poses no special trouble. instead.12. thus making U a unitary matrix? If we do this. positive square root under these conditions. 7 5 where the singular values of A populate Σ’s diagonal. which says neither more nor less than that the first n columns of U are orthonormal.48) (14. then the surprisingly simple (14. AV = U Σ. positive square root of the eigenvalue matrix Λ such that Λ = Σ∗ Σ.48)’s second line gives the equation Σ∗ U ∗ U Σ = Σ∗ Σ. . V ∗ A∗ AV = Σ∗ Σ. A = U ΣV . . . for just as a real. positive scalar has a real. If A happens to have broad shape then we can decompose A∗ .49) constitutes the singular value decomposition of A. but ΣΣ−1 = In .46) then yields A∗ A = V Σ∗ ΣV ∗ . 0 0 √ (14. Apparently every full-rank matrix has a singular value decomposition.47) 0 √ + λ2 . Now consider the m × m matrix U such that AV Σ−1 = U In . ∗ Σ =Σ = 2 6 6 6 6 6 4 + λ1 0 . √. .49) . Substituting (14. Λ Let the symbol Σ = Λ represent the n × n real. THE SINGULAR VALUE DECOMPOSITION 389 a useful fact. . Why not use GramSchmidt to make them orthonormal. . too.. so equally has √ a real. . 0 0 ··· ··· .49) does not constrain the last m − n columns of U . so left. Applying (14.49)’s second line into (14. Equation (14.14.and right-multiplying respectively by Σ−∗ and Σ−1 leaves that In U ∗ U In = In . leaving us free to make them anything we want. But what of the matrix of less than full rank r < n? In this case the product A∗ A is singular and has only s < n nonzero eigenvalues (it may be ∗ (14. positive square root.47) to (14.
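A numerical sketch of (14.49), assuming NumPy and an arbitrary tall matrix of full column rank; the last line checks that the singular values are the positive square roots of the eigenvalues of A*A, as Σ = √Λ asserts.

    # A = U Sigma V*, with the singular values on Sigma's diagonal.
    import numpy as np

    A = np.array([[1.0, 2.0],
                  [0.0, 1.0],
                  [1.0, 0.0]])
    U, s, Vh = np.linalg.svd(A, full_matrices=True)
    Sigma = np.zeros(A.shape)
    Sigma[:2, :2] = np.diag(s)
    print(np.allclose(U @ Sigma @ Vh, A))
    # Singular values squared equal the eigenvalues of A*A.
    print(np.sort(s**2), np.sort(np.linalg.eigvals(A.T @ A).real))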

51) ∗ (14. Apparently every matrix of less than full rank has a singular value decomposition. This book has chosen a way that needs some tools like truncators other books omit. If A happens to be an invertible square matrix. (14. the orthogonal complement. then the singular value decomposition evidently inverts it as A−1 = V Σ−1 U ∗ .390 CHAPTER 14. These several matrix chapters have not covered every topic they might. The other is the class of basic matrix theory these chapters do not happen to use. However. then (14. orthonormalization. one might avoid true understanding and instead work by memorized rules. A = U ΣV . and how many scientists and engineers would rightly choose the “nothing” if the matrix did not serve so very many applications as it does? Since it does serve so very many. but the reasoning is otherwise the same as before. about which we shall have more to say in a moment. the “all” it must be. 31 . but this is irrelevant to the proof at hand). The choice to learn the basic theory of the matrix is almost an all-or-nothing choice. THE EIGENVALUE that s = r. One is the class of more advanced and more specialized matrix theory. the kernel. but does not need other tools like Of course.31 Applied mathematics brings nothing else quite like it. pseudoinversion. That is not always a bad plan.50) 14. if the s nonzero eigenvalues are arranged first in Λ. The essential agents of matrix analysis— multiplicative associativity. The topics they omit fall roughly into two classes.49) becomes AV Σ−1 = U Is . really. The product A∗ A is nonnegative definite in this case and ΣΣ−1 = Is . diagonalization and so on—are the same in practically all books on the subject. AV = U Σ. too. rank. no one has yet discovered a satisfactory way to untangle the knot. but if that were your plan then it seems spectacularly unlikely that you would be reading a footnote buried beneath the further regions of the hinterland of Chapter 14 in such a book as this. but the way the agents are developed differs. As far as the writer knows. the eigenvalue.13 General remarks on the matrix Chapters 11 through 14 have derived the uncomfortably bulky but—incredibly—approximately minimal knot of theory one needs to grasp the matrix properly and to use it with moderate versatility. inversion.

VI. which is to † say that Ax ≈ b. should the need arise. However. Such as [23. yet most of the tools are of limited interest in themselves. it is the agents that matter. From the present pause one could proceed directly to such topics.32). since this is the first proper pause these several matrix chapters have afforded. § 3. GENERAL REMARKS ON THE MATRIX 391 projectors other books32 include. Tools like the projector not used here tend to be omitted here or deferred to later chapters. more advanced and more specialized matrix theory though often harder tends to come in smaller. briefly: a projector is a matrix that flattens an arbitrary ˜ vector b into its nearest shadow b within some restricted subspace. More broadly defined. per (13. in which the matrix B(B ∗ B)−1 B ∗ is the projector. whereupon x = A b. since the book is Derivations of Applied Mathematics rather than Derivations of Applied Matrices. If the columns of A ˜ ˜ represent the subspace. then x represents b in the subspace basis iff Ax = b. The theory develops endlessly. Thence it is readily shown that the ˜ ˜ deviation b − b lies orthogonal to the shadow b. Well.3]. not because they are altogether useless but because they are not used here and because the present chapters are already too long. The reader who understands the Moore-Penrose pseudoinverse and/or the Gram-Schmidt process reasonably well can after all pretty easily figure out how to construct a projector without explicit instructions thereto. for instance. That is.13. . maybe we ought to take advantage to change the subject. since we have brought it up (though only as an example of tools these chapters have avoided bringing up). or the conjugategradient algorithm. One can approach the projector in other ways. What has given these chapters their hefty bulk is not so much the immediate development of the essential agents as the preparatory development of theoretical tools used to construct the essential agents. but there are two ways at least. any matrix M for 2 which M = M is a projector. 33 32 ˜ b = Ax = AA† b = [BC][C ∗ (CC ∗ )−1 (B ∗ B)−1 B ∗ ]b = B(B ∗ B)−1 B ∗ b.33 Paradoxically and thankfully. more manageable increments: the Cholesky decomposition.14. a lengthy but well-knit tutorial this writer recommends.
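Since the footnote has brought the projector up, here is a minimal sketch, assuming NumPy and arbitrary illustrative values, of the projector B(B*B)^−1 B* it describes:

    # The projector flattens b onto the subspace spanned by B's columns;
    # the deviation lies orthogonal to that subspace, and M^2 = M.
    import numpy as np

    B = np.array([[1.0, 0.0],
                  [1.0, 1.0],
                  [0.0, 2.0]])                       # two independent columns in 3-space
    M = B @ np.linalg.inv(B.T @ B) @ B.T
    b = np.array([1.0, 2.0, 3.0])
    b_shadow = M @ b                                 # nearest shadow of b
    print(np.allclose(M @ M, M))                     # idempotent: a projector
    print(np.allclose(B.T @ (b - b_shadow), 0.0))    # deviation orthogonal to B's columns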


11 through 14.4 and 3. the three-dimensional geometrical vector is the n = 3 special case of the general.4 on page 63 interrelates. z) or spherical (r.Chapter 15 Vector analysis Leaving the matrix. the need to disambiguate seems seldom to arise in practice. A three-dimensional geometrical vector—or. 2] 393 .” A name like “matrix vector” can disambiguate the vector of Chs. more exotic possibilities. among other. the three-dimensional geometrical vector merits closer attention and special treatment.3 of an amplitude of some kind plus a direction. Without 1 [7. because its three elements represent the three dimensions of the physical world. Ch. a vector—consists per § 3. 15. Seen from one perspective. However. x). This chapter details it. 3. cylindrical (ρ. conventionally call it just a “vector. y. this chapter turns to a curiously underappreciated agent of applied mathematics: the three-dimensional geometrical vector. In this chapter and in other appropriate contexts we shall usually. φ) coordinates. The vector brings an elegant notation. simply. The short name “vector” serves. φ. first met in §§ 3.3. Section 3. but since the three-dimensional geometrical vector brings all the matrix vector’s properties and disrupts none of its operations. since it adds semantics to but does not subtract them from it.1 Applied convention finds the full name “three-dimensional geometrical vector” too bulky for common use.9. as Fig.1 illustrates and Table 3. θ. 11 through 14 if and where required. n-dimensional vector of Chs.9 has told that three scalars called coordinates suffice to specify a vector: these three can be rectangular (x.

Figure 15.1: A point on a sphere, in spherical (r, θ, φ) and cylindrical (ρ, φ, z) coordinates. (The axis labels bear circumflexes in this figure only to disambiguate the ẑ axis from the cylindrical coordinate z. See also Fig. 15.5.) [Figure omitted in this extraction: it marks the circumflexed x, y and z axes, the radius r, the elevation θ, the azimuth φ and the cylindrical radius ρ.]

1 Reorientation 3 ˆ x′ 4 y′ 5 ˆ ˆ z′ 2 2 32 3 ˆ x 0 ˆ 0 54 y 5. after which it develops the calculus of vector fields.15.1. The vector notation’s principal advantage lies in that it frees a model’s geometry from reliance on any particular coordinate system. Two-dimensional geometrical vectors arise in practical modeling about as often. one rotates the coordinates of its aspect coefficient about the y axis. ˆ 1 z Matrix notation expresses the rotation of axes (3. With the notation. The example measures this advantage when.2.” though for vectors some students seem to insist on preferring the expression without anyway. whose spaghetti-like result is as unsuggestive as it is unwieldy. This chapter therefore begins with the reorientation of axes in § 15. The two-dimensional however needs little or no additional special treatment. The same coefficient in standard vector notation is ˆ n · ∆ˆ. REORIENTATION the notation. such rotation is a lengthy. don’t worry. for instance. Vector addition will already be familiar to the reader from Ch. one writes an expression like (z − z ′ ) − [∂z ′ /∂x]x=x′ . too. Three-dimensional geometrical vectors naturally are not the only interesting kind. 15.y=y′ (x − x′ ) − [∂z ′ /∂y]x=x′ . the rotation is almost trivial.y=y′ [(x − x′ )2 + (y − y ′ )2 + (z − z ′ )2 ] for the aspect coefficient relative to a local surface normal (and if the sentence’s words do not make sense to you yet. just look the symbols over).1 and vector multiplication in § 15.5) as = cos ψ 4 − sin ψ 0 sin ψ cos ψ 0 In three dimensions however one can do more than just to rotate the x and y axes about the z. You probably should not.y=y′ (y − y ′ ) 395 [1 + (∂z ′ /∂x)2 + (∂z ′ /∂y)2 ]x=x′ . They are important. 3 or (more likely) from earlier work outside this book. for it is just the threedimensional with z = 0 or θ = 2π/4. With two rotations to point the x axis in the desired . error-prone exercise. Without the notation. r To prefer the expression without to the expression with the notation would be as to to prefer the pseudo-Roman “DCXCIII/M + CXLVII/(M)(M)” to “ln 2.

ˆ z 1 (15.1) or (15.2) To multiply the three matrices of (15. but notice that the transpose (though curiously not the adjoint) of each of the 3 × 3 rotation matrices is also its inverse.3) can still seem confusing until one rewrites the latter equation in the form   x  y . (15. Consider a vector ˆ ˆ ˆ v = xx + yy + zz. but. since the prospect of actually applying the equations in a concrete case tends to confuse the uninitiated.2) say all one needs to say about reorienting axes in three dimensions. one spins or rolls the y and z axes through an angle η about the new x. even once one has clearly visualized this sequence of three rotations. the prospect of applying (15. inverting per (3. Envisioning the axes as in Fig. 15. .3) cos φ 4 sin φ 0 − sin φ cos φ 0 32 cos θ 0 0 0 54 − sin θ 1 0 1 0 32 1 sin θ 0 54 0 0 cos θ 0 cos η sin η 32 ′ 3 ˆ x 0 ˆ − sin η 54 y′ 5.396 CHAPTER 15. all the while maintaining the three axes rigidly at right angles to one another. VECTOR ANALYSIS direction plus a third rotation to spin the y and z axes into the desired position about the new x axis. z v= ˆ ˆ ˆ x y z (15.1 with the z axis upward. Finally. one can reorient the three axes generally: 3 ˆ x′ 4 y′ 5 ˆ ˆ z′ 2 = 2 1 4 0 0 0 cos η − sin η 32 cos θ 0 sin η 54 0 sin θ cos η 0 1 0 32 cos φ − sin θ 54 − sin φ 0 0 cos θ sin φ cos φ 0 3 32 ˆ x 0 ˆ 0 54 y 5. we might discuss the equations here a little further. (15. one first rotates or yaws the x axis through an angle φ toward the y then tilts or pitches it downward through an angle θ away from the z.4) after which the application is straightforward.2) (which inversely represents the sequence) to (15.2) to form a single 3 × 3 matrix for each equation is left as an exercise to the interested reader. In concept. There results ˆ ˆ ˆ v′ = x′ x′ + y′ y ′ + z′ z ′ . 3 ˆ x 4 y 5 ˆ ˆ z 2 = 2 (15.6). Yet.1) or. ˆ′ z cos η One does not reorient the vector but rather the axes used to describe the vector.1) and (15.

but those are the essentials of it. Indeed. The other two forms of vector multiplication involve multiplying a vector by another vector and are the subjects of the two subsections that follow.5) naturally resembles (15. MULTIPLICATION where 3 x′ 4 y′ 5 z′ 2 397 ≡ 2 (15.2.15. which is the inverse of (15. scalar multiplication. this makes sense.7) Such scalar multiplication evidently scales a vector’s length without diverting it from its direction. 0) belong to x.6) It is tempting incidentally to suppose the unprimed spherical coordinates ˆ of x′ —that is. θ. φ) happen to belong to a not-particularlyˆ interesting unit vector one might label z′′ .2). One can reorient three axes arbitrarily by rotating them in pairs about the z.5) and where Table 3. its spherical coordinates in the [ˆ y ˆ] system—were (1. is trivial: if a vector v is as defined by (15.3). θ + ˆ 2π/4. cos η z′ 1 4 0 0 0 cos η − sin η 32 cos θ 0 sin η 54 0 sin θ cos η 0 1 0 32 cos φ − sin θ 54 − sin φ 0 0 cos θ sin φ cos φ 0 3 32 x 0 0 54 y 5. z 1 = (15. which between the second and third rotations momentarily marks the z axis.1). y and x axes in sequence. φ) that actually belong to x′ . So. A firmer grasp of the reorientation of axes in three dimensions comes with practice. It is 3 x 4 y 5 z 2 2 cos φ 4 sin φ 0 − sin φ cos φ 0 32 cos θ 0 0 0 54 − sin θ 1 0 1 0 32 1 sin θ 0 54 0 0 cos θ 0 cos η sin η 32 ′ 3 x 0 − sin η 54 y ′ 5. θ. . xˆz but this is incorrect. 15.2 Multiplication One can multiply a vector in any of three ways. It is the coordinates (1. The first. One should appreciate the temptation to avoid the mistake. (15. then ˆ ˆ ˆ ψv = xψx + yψy + zψz.4 converts to cylindrical or spherical coordinates if and as desired. φ). when one considers that the ˆ coordinates (1. 2π/4. The inverse of (15. that’s it. The coordinates (1.
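Before leaving the reorientation of § 15.1, here is a small computational sketch of (15.1), assuming NumPy and following the sign conventions printed there; the angles are arbitrary, and the final line checks that the composed matrix's transpose is its inverse, which is what lets (15.2) undo (15.1).

    # Yaw by phi about z, pitch by theta about the new y, roll by eta
    # about the new x, composed into one 3x3 reorientation matrix.
    import numpy as np

    def reorient(phi, theta, eta):
        c, s = np.cos, np.sin
        Rz = np.array([[ c(phi),   s(phi),  0       ],
                       [-s(phi),   c(phi),  0       ],
                       [ 0,        0,       1       ]])
        Ry = np.array([[ c(theta), 0,      -s(theta)],
                       [ 0,        1,       0       ],
                       [ s(theta), 0,       c(theta)]])
        Rx = np.array([[ 1,        0,       0       ],
                       [ 0,        c(eta),  s(eta)  ],
                       [ 0,       -s(eta),  c(eta)  ]])
        return Rx @ Ry @ Rz

    R = reorient(0.3, 0.2, 0.1)
    print(np.allclose(R @ R.T, np.eye(3)))   # rotation: transpose is inverse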

and indeed if we write (15.9) z2 and then expand each of the two factors on the right according to (15. by definition hereby if complex.8). (15. (15. is obvious from (15.8.8) in matrix notation. 13: v1 · v2 = x1 x2 + y1 y2 + z1 z2 .1’s cosine if the vectors are real. (15. we see that the dot product does not in fact vary.43) of § 13. 3. Fig. v2 · v1 = v1 · v2 . As in (13.10) gives the angle θ between two vectors according Fig. (15.11) . ˆ∗ ˆ v1 · v2 = cos θ. That the dot product is commutative.   x2 v1 · v2 = x1 y1 z1  y2  .1 The dot product We first met the dot product in § 13. here too the relationship ∗ ∗ v1 · v2 = v1 v2 cos θ. VECTOR ANALYSIS Figure 15.8. to be valid.2: The dot product b θ b cos θ a · b = ab cos θ a 15. 15.2.6). the dot product must not vary under a reorientation of axes. It works similarly for the geometrical vectors of this chapter as for the matrix vectors of Ch.398 CHAPTER 15. Naturally.8) which is the product of the vectors v1 and v2 to the extent that they run in the same direction.2 illustrates the dot product.

z otherwise a − sign. as in ˆx1 y2 . Unlike the dot product the cross product is a vector. Where the progression is honored.2. inasmuch as a reorientation consists of three rotations in sequence.1 is a scalar. v2 × v1 = −v1 × v2 .2 The cross product The dot product of two vectors according to § 15. MULTIPLICATION 399 15. the cross product is the product of two vectors to the extent that they run in different directions.6) then substituting the results into (15.12) arises again and again in vector analysis.1 could have but did not happen to use) whose semantics are as shown.6’s parity principle and the right-hand rule.14) which is a direct consequence of the previous point regarding parity.2) and (15. One can also multiply two vectors to obtain a vector. however. where the |·| notation is a mnemonic (actually a pleasant old determinant notation § 14. the associated term bears a + sign.12) x2 y2 z2 ˆ ˆ ˆ ≡ x(y1 z2 − z1 y2 ) + y(z1 x2 − x1 z2 ) + z(x1 y2 − y1 x2 ).2. Several facets of the cross product draw attention to themselves. . • The cyclic progression ··· → x → y → z → x → y → z → x → y → ··· (15. or which can be seen more prosaically in (15. As the dot product.12) by swapping the places of v1 and v2 . it suffices merely that rotation about one axis not alter the dot product. Fortunately. In fact. it is also an unnecessary exercise. • The cross product is not commutative. due to § 11.15. and it is often useful to do so. unpleasant exercise.12): a lengthy.2. the cross product too is invariant under reorientation. As the dot product is the product of two vectors to the extent that they run in the same direction. It is defined in rectangular coordinates as v1 × v2 = ˆ ˆ z x y ˆ x1 y1 z1 (15. (15. for. One proves the proposition in the latter form by setting any two of φ. θ and η to zero before multiplying out and substituting.13) of (15. One could demonstrate this fact by multiplying out (15.

ˆ ˆ as is proved by a suitable counterexample like v1 = v2 = x. exotic physical phenomena is three-dimensional). 15.12) into (15. and finally by applying (15.10—in fact the operation v1 2 is used routinely to calculate electromagnetic power flow2 —but each of the cross product’s three rectangular components has its own complex phase which the magnitude operation flattens. the physical world is threedimensional (or.8) with an appropriate change of variables and simplifying. Fortunately. (v1 × v2 ) × v3 = v1 × (v2 × v3 ). One can similarly relate the angle’s sine to the cross product if the vectors involved are real. as is seen by substituting (15. as ˆ |ˆ 1 × v2 | = sin θ.12) and comparing the result against Fig. 1-51] . • Section 15. v1 · (v1 × v2 ) = 0 = v2 · (v1 × v2 ).400 CHAPTER 15. 3. • The cross product runs perpendicularly to each of its two factors.1 has related the cosine of the angle between vectors to the dot product. (If the vectors involved are com∗ plex then nothing prevents the operation |v1 × v2 | by analogy with ∗ × v without the magnitude sign eqn. so the cross product’s limitation as defined here to three dimensions will seldom if ever disturb us. the space in which we model all but a few. the cross product is closely tied to threedimensional space.2. but to speak of a cross product in fourdimensional space would require arcane definitions and would otherwise make little sense. That is. and that v2 has only a nonnegative comz ˆ ponent in the y′ direction. so the result’s relationship to the sine of an angle is not immediately clear. at least.) 2 [21. by remembering that reorientation cannot alter a cross product. that v2 has no ˆ component in the ˆ′ direction. That is. VECTOR ANALYSIS • The cross product is not associative. Two-dimensional space (a plane) can have a cross product so long as one does not mind that the product points off into the third dimension. (15. eqn.15) ˆ ˆ ˆ demonstrated by reorienting axes such that v1 = x′ . v3 = y. • Unlike the dot product.1’s sine. v |v1 × v2 | = v1 v2 sin θ.

such a concept is flexible enough to confuse the uninitiated severely and soon. here too an example affords perspective. 15. for. Normally one requires that x′ . y′ and ˆ′ can serve.7. Imagine driving your automobile down a winding road.3. That your velocity were ˆ meant ℓq that you kept skilfully to your lane. and obey the right-hand rule (§ 3.7. However. However. not generally. but just at the spot along the road at which your automobile momentarily happened to be. Recalling the constants and variables of § 2. 3 . to choose x′ . Moreover. y′ and z′ wisely can immensely simplify the model’s ˆ ˆ ˆ analysis.15. whether q = xx + yy + ˆz or q = x′ x′ + y′ y ′ + z′ z ′ . with velocity as v which in the present example would happen to be v = ˆ ℓv. 15. it remains z the same vector q. a cylinder or a sphere.3 illustrates the cross product. a model can specify z ′ . ˆ ˆ but otherwise any x′ . ORTHOGONAL BASES Figure 15. y′ and z′ under various conditons. that your velocity Conventionally one would prefer the letter v to represent speed. where q represented your speed3 and ˆ represented the direction ℓ the road ran.3). 401 c = a×b = ˆab sin θ c b θ a Fig. As in § 2. y′ and z′ each retain unit length. where a model involves a circle.3: The cross product. this section will require the letter v for an unrelated purpose. run perpendiclarly to one another. for nothing requires the three ˆ ˆ various x ˆ to be constant. on the other hand.3 Orthogonal bases A vector exists independently of the components by which one expresses ˆ ˆ ˆ ˆ ˆ it. where a model involves a contour or a curved surface of some ˆ ˆ ˆ kind.

where β repℓ resented the difference (assuming that the other driver kept skilfully to his own lane) between the direction of the road a mile ahead and the direction at your spot. whereas the wind’s velocity had not changed while the expression representing it had. often a variable one. where ℓ ˆ ˆ ˆ and w (or u and v. A headwind had velocity −ˆ wind . after you had rounded a bend in the road.16) constitutes such an orthogonal basis. like . The geometries of other models however do x ˆ z suggest a particular basis. then the three together would ℓˆ ˆ constitute an orthogonal basis. ˆ ˆ z ˆ ˆ x′ × y′ = ˆ′ .402 CHAPTER 15. z (15. ±ˆ qwind . ˆ ˆ ˆ y′ × z′ = x′ . which location changes as you drive. from which other vectors can be built. such that w lay in the plane of ˆ ˆ (ˆ × ℓ) ˆ z z ˆ ˆ ˆ and ˆ ℓ)—but if the geometry suggests a particular v or w (or u). for which x ˆ ˆ ˆ ˆ y′ · z′ = 0. ˆ ˆ z′ · x′ = 0. Any right-handedly mutually perpendicular [ˆ ′ y′ z′ ] in three dimensions. ℓ However. • Where the model features a contour like the example’s winding road. ˆ′ × x′ = y′ . at right angles to ˆ represented the ℓ ℓ. Rather. an [ˆ v w] basis (or a [ˆ v ˆ basis or even a [ˆ ˆ w] basis) can be ℓ ˆ ˆ u ˆ ℓ] u ℓ ˆ ˆ locally follows the contour. it is because ˆ is defined relative to the road at ℓ your automobile’s location. The geometries of some models suggest no particular basis. the wind would no longer be a headwind. And this is where confusion can arise: your own velocity had changed while the expression representing it had not.) can be defined in any convenient way so long as they remain perpendicular to one another and to ˆ ℓ—such that ˆ · w = 0 for instance (that is. For all these purposes the unit vector ˆ would remain constant. with respect to which one would express your new velocity as ˆ but ℓq would no longer express the headwind’s velocity as −ˆ wind because. direction right-to-left across the road—would have you drifting out of your lane at an angle ψ. This is not because ˆ differs from place to place at a given moment. ˆ the symbols ˆ and v would by definition represent different vectors than ℓ before. The variable unit vectors v ˆ used. for like any ℓ other vector the vector ˆ (as defined in this particular example) is the same ℓ vector everywhere. whether constant or variable. perpendicular both to ˆ and to v ℓ such that [ˆ v w] obeyed the right-hand rule. ˆ ˆ If a third unit vector w were defined. ℓq v ˆ A car a mile ahead of you had velocity (q2 )(ˆ cos β + v sin β). VECTOR ANALYSIS ˆ ˆ were (q)(ˆ cos ψ + v sin ψ)—where v. ˆ ˆ x′ · y′ = 0. when one usually just uses a constant [ˆ y ˆ]. a crosswind. fifteen seconds later. etc. since ℓq the road had turned while the wind had not.

etc. that such a unit normal n tells one everything one needs to know about its surface’s local orientation. Refer to § 3. Refer to § 3. where the model features a contour along a curved surface.3. Standing on the earth’s surface. • Combining the last two. θ is variable and runs locally tangentially to the sphere’s surface in the direction of increasing elevation θ (that is. 4 The assertion wants a citation. In such a model one conventionally establishes z ˆ as the direction of the principal circle’s axis but then is left with x ˆ or y as the direction of the secondary circle’s axis.) can be used. • Occasionally a model arises with two circles that lie neither in the same plane nor in parallel planes but whose axes stand perpendicuˆ lar to one another.4.9. and φ east. . along the sphere’s surface perpendicularly to ˆ).” Observe.”4 • Where the model features a curved surface like the surface of a wavy ˆ sea. then that v or w should probably be used. where n u ˆ ˆ u ˆ ˆ points locally perpendicularly to the surface. [φ y ρy ] or [θ φ ˆ] basis can be used locally x ˆ ˆ as appropriate. a [ˆ φ z] basis can ρ ˆ ˆ ˆ be used. θ south. a [ˆ v n] basis (or a [ˆ n w] basis. which the author lacks. an [ˆ θ φ] basis can be used. Refer to § 3. ˆ z r ˆ ˆ would be up. • Where the model features a sphere.5. The letter n here stands for “normal. The letter ℓ here stands for “longitudinal. 15. incidentally ˆ but significantly. an [ˆ v n] basis can be used. as nearly as possible to the −ˆ direction z z ˆ without departing from the sphere’s surface). upon which an x ˆ x r ˆx ˆ y ˆ ˆ ˆy ˆ y r [ˆ ρx φ ]. 15. ORTHOGONAL BASES 403 ˆ ˆ the direction right-to-left across the example’s road. choosing a direction for v in this case since necessarily v = n × ˆ • Where the model features a circle or cylinder. and φ is variable and runs locally tangentially to the sphere’s surface in the direction of increasing azimuth φ (that is. and φ is variable and runs locally along the circle’s perimeter in the direction of increasing azimuth φ. though not usually in the −ˆ direction itself. ρ is variable and points ˆ locally away from the axis.9 and Fig. where z is constant and runs along the cylinder’s axis (or ˆ perpendicularly through the circle’s center).” a synonym for “perpendicular. r ˆ ˆ where ˆ is variable and points locally away from the sphere’s cenr ˆ ter. One need not worry about ℓˆ ˆ ˆ ˆ ˆ ℓ.15. [φ ˆ θ ]. with the earth as the sphere.9 and Fig.

The dot in the is supposed to look like the tip of an arrowhead. ˆ r ˆ θ ˆ φ .1). VECTOR ANALYSIS and Figure 15.404 CHAPTER 15. this figure shows the constant ˆ basis vector z pointing out of the page toward the reader. 15.) middle of ˆ φ ˆ ρ ρ φ ˆ z Figure 15. (The conventional symbols respectively represent vectors pointing out of the page toward the reader and into the page away from the reader.4: The cylindrical basis. Thus.5: The spherical basis (see also Fig.

(15. aρ ar aθ aφ ˆ ≡ ρ · a.16).1 Components by subscript ax ay az an ≡ ≡ ≡ ≡ ˆ x · a. a×b = ˆ ˆ ˆ x y z ax ay az bx by bz . 15. This section augments the notation to render it much more concise. r ˆ · a. as we have seen.20) (15. Generically. z ˆ n · a.4 Notation The vector notation of §§ 15.12): a · b = ax bx + ay by + az bz . ai ≡ ˆ · a.4.19) In fact—because.8) and (15.18) (15. ˆ · a. but the foregoing are the most common. ˆ y · a.21) . That is to say. familiar and often expedient but sometimes inconveniently prolix. each orthogonal basis orders its three unit vectors by the right-hand rule (15.4. the notation represents the component of a vector a in the indicated direction.15. The notation and so forth abbreviates the indicated dot product. (15. so one can write more generally that ˆ ˆ (for instance. reorientation of axes cannot alter the dot and cross products—any orthogonal basis [ˆ ′ y′ ˆ′ ] (§ 15. but the abbreviation especially helps when several such dot products occur together in the same expression. 15.2 is correct. ≡ ˆ · a. The abbreviation lends a more amenable notation to the dot and cross products of (15. Whether listed here or not. so the full dot-product notation is often clearer in print than the abbreviation is. ı (15.17) Applied mathematicians use subscripts for several unrelated or vaguely related purposes. ˆ x′ ˆ y′ ˆ′ z a×b = ax′ bx ′ ay′ az ′ by′ bz ′ .1 and 15. ≡ θ ˆ ≡ φ · a. NOTATION 405 Many other orthogonal bases are possible. [ˆ φ z ρ r a · b = ax′ bx′ + ay′ by′ + az ′ bz ′ .3) can serve here x ˆ z ˆ ˆ] or [ˆ θ φ]).

2 Einstein’s summation convention Einstein’s summation convention is this: that repeated indices are implicitly summed over. right-handed. 10 February 2008]. VECTOR ANALYSIS Because all those prime marks burden the notation and for professional mathematical reasons. If the reader feels that the notation begins to confuse more than it describes. y and z could just as well be ρ. the general forms (15.406 CHAPTER 15. the abbreviated notation really is the proper notation. with the implied understanding that x.20) and (15. Typically. study closely and take heart! The notation is not actually as impenetrable as it at first will seem. For vectors. “Einstein notation. far from granting the reader a comfortable chance to absorb the elaborated notation as it stands. indeed. So. three-dimensional basis.21) but prefer not to burden the notation with extra little strokes—that is. 15. where the superscript bears some additional semantics [52.8).18) and (15.4. b1 b2 b3 but you have to be careful about that in applied usage because people are not always sure whether a symbol like a3 means “the third component of ˆ the vector a” (as it does here) or “the third vector’s component in the a direction” (as it would in eqn.22) [25] Some professional mathematicians now write a superscript ai in certain cases in place of a subscript ai . the equation6 a · b = ai bi 5 6 (15. The trouble with vector work is that one has to learn to abbreviate or the expressions involved grow repetitive and unreadably long. the writer empathizes but regrets to inform the reader that the rest of the section. ˆ ˆ ˆ e1 e2 e3 a×b = a1 a2 a3 .19) with the implied understanding that they really mean (15.20) and (15. . and.21) are sometimes rendered a · b = a1 b1 + a2 b2 + a3 b3 . the abbreviation is rather elegant once one becomes used to it. 15. where the convention is in force.5 For instance.” 05:36. Eventually one accepts the need and takes the trouble to master the conventional vector abbreviation this section presents. Scientists and engineers however tend to prefer Einstein’s original. shall not delay to elaborate the notation yet further! The confusion however is subjective. φ and z or the coordinates of any other orthogonal. applied mathematicians will write in the manner of (15. subscript-only notation.

judging it somewhat elegant but not worth devoting a subsection to. “Einstein summation”] The book might not present the convention at all were it not so widely used. Admittedly confusing on first encounter.20). below.24) before presenting it.3. the convention is both serviceable and neat. Einstein’s summation convention is also called the Einstein notation. you should indeed learn the convention—if only because you must learn it to understand the [50.21)—although an experienced applied mathematician would probably apply the Levi-Civita epsilon of § 15. however. it brings no new mathematics. It is rather a notational convenience. and. Under the convention. a term sometimes taken loosely to include also the Kronecker delta and LeviCivita epsilon of § 15. Besides. who does not want to learn a thing to which the famous name of Albert Einstein is attached? 9 [38] 8 7 . in and of itself. It is the kind of notational trick an accountant might appreciate. but the operator is still there.7 . The convention is indeed widely used in more advanced applied circles. NOTATION means that a·b= or more fully that a·b = ai bi = ax′ bx′ + ay′ by′ + az ′ bz ′ .” It does not magically create a summation where none existed. You can waive the convention.22) expresses it more succinctly. writing the summation symbol out explicitly whenever you like.9 for the symbol after all exists to be employed.4.3.4.15.8 It asks a reader to regard a repeated index like the i in “ai bi ” as a dummy index (§ 2. Nothing requires you to invoke Einstein’s summation convention everywhere and for all purposes. after all. it just hides the summation sign to keep it from cluttering the page. What is important to understand about Einstein’s summation convention is that.23) is (15. i=x′ .z ′ 407 ai bi i which is (15. to further abbreviate this last equation to the form of (15. except that Einstein’s form (15.3) and thus to read “ai bi ” as “ i ai bi .4. the convention’s utility and charm are felt after only a little practice.y ′ . to invoke the convention at all may make little sense. the summational operator i is implied not written. a × b = ˆ(ai+1 bi−1 − bi+1 ai−1 ) ı (15. Likewise. In contexts other than vector work. Nevertheless.

k) = (x′ . not only three as in the present chapter.3 The Kronecker delta and the Levi-Civita epsilon Einstein’s summation convention expresses the dot product (15. Table 15. (y ′ . x′ . y ′ . ı where12  +1  ≡ −1   0 if (i.6. VECTOR ANALYSIS rest of this chapter—but once having learned it you should naturally use it only where it actually serves to clarify. x′ .7.” As far as the writer knows. concerns the three-dimensional case only. it often does just that.) Technically. y ′ . 11. In this more general sense the Levi-Civita is the determinant of the permutor whose ones hold the indicated positions—which is a formal way of saying that it’s a + sign for even parity and a − sign for odd. x′ . (Chapters 11 and 14 did not use it. does not by itself wholly avoid unseemly repetition in the cross product. ǫ is merely the letter 11 10 . otherwise [for instance if (i. The Levi-Civita epsilon 11 ǫijk mends this.23). the “ci” in Levi-Civita’s name is pronounced as the “chi” in “children. 4 × 4 case ǫ1234 = 1 whereas ǫ1243 = −1: refer to §§ 11. (y ′ . rendering the cross-product as a × b = ǫijkˆaj bk . in the fourdimensional. “Levi-Civita permutation symbol”] 13 The writer has heard the apocryphal belief expressed that the letter ǫ. x′ ).24). the Levi-Civita epsilon and Einstein’s summation convention are two separate.1. Fortunately. y ′ ). k) = (x′ . y ′ )].22) neatly but. z ′ ) or (z ′ . For instance. tensor. as the rest of this section and chapter. independent things. j. a Greek e.” 12 [37. j. as in (15. z ′ . if (i. Quiz:10 if δij is the Kronecker delta of § 11. but a canny reader takes the Levi-Civita’s appearance as a hint that Einstein’s convention is probably in force.408 CHAPTER 15.4. z ′ ).6.13 [25] Also called the Levi-Civita symbol. in vector work. but the Levi-Civita notation applies in any number of dimensions. (15.2. y ′ ).25) In the language of § 11. x′ ) or (z ′ . j.24) ǫijk (15.1. as we have seen in (15. or permutor. For native English speakers who do not speak Italian. then what does the symbol δii represent where Einstein’s summation convention is in force? 15. the Levi-Civita epsilon quantifies parity. k) = (x′ . stood in this context for “Einstein. z ′ . however. The two tend to go together.1 and 14.

2’s quiz. y ′ ). and also either (m. n) or (j. n) = (y ′ .15. k) = (z ′ .1 lists several relevant properties. which makes for yet another plausible story). k) = (n. in the case that i = x′ . thus contributing nothing to the sum). z ′ ) or (m. in each case the several indices can take any values. This implies that either (j. Table 15. In any event. to zero. with Einstein’s summation convention in force. NOTATION 409 Table 15. y ′ ).10. Both delta and epsilon find use in vector work.1: Properties of the Kronecker delta and the Levi-Civita epsilon. one sometimes hears Einstein’s summation convention. and similarly in the cases that i = y ′ and i = z ′ (more precisely. the Kronecker delta and the Levi-Civita epsilon together referred to as “the Einstein notation. one can write (15. k) = (m. n) = (z ′ .6 approximately as the cross product relates to the dot product. . but combinations other than the ones listed drive ǫimn or ǫijk . the property that ǫimn ǫijk = δmj δnk − δmk δnj is proved by observing that.22) alternately in the form a · b = δij ai bj . or both.14 each as with Einstein’s summation convention in force. δjk = δkj δij δjk = δik δii = 3 δjk ǫijk = 0 δnk ǫijk = ǫijn ǫijk = ǫjki = ǫkij = −ǫikj = −ǫjik = −ǫkji ǫijk ǫijk = 6 ǫijn ǫijk = 2δnk ǫimn ǫijk = δmj δnk − δmk δnj The Levi-Civita epsilon ǫijk relates to the Kronecker delta δij of § 11.” which though maybe not quite terminologically correct is hardly incorrect enough to argue over and is clear enough in practice. k) = (y ′ .15 Of the table’s several properties. z ′ ) or (j. when one takes parity into after δ. 14 [38] 15 The table incidentally answers § 15.4. either (j. which represents the name of Paul Dirac—though the writer does not claim his indirected story to be any less apocryphal than the other one (the capital letter ∆ has a point on top that suggests the pointy nature of the Dirac delta of Fig.4. For example. m)—which. 7.

(15. and similarly for k = n = y ′ and again for k = n = z′ . the last paragraph likely makes sense to few who do not already know what it means. Unfortunately. A concrete example helps. in any given term of the Einstein sum.n j. j) = (y ′ .24).n ˆ (δmj δnk − δmk δnj )man bj ck . j) = (z ′ . in light of (15. a × (b × c) = b(a · c) − c(a · b). which leaves the third to be shared by both k and n. VECTOR ANALYSIS account. In this section’s notation and with the use of (15. y ′ ) term both contribute positively to the sum. The factor 2 appears because. the compound product is a × (b × c) = a × (ǫijkˆbj ck ) ı ˆ = ǫmni man (ǫijkˆbj ck )i ı ˆ = ǫmni ǫijk man bj ck ˆ = ǫimn ǫijk man bj ck ˆ = (δmj δnk − δmk δnj )man bj ck ˆ ˆ = δmj δnk man bj ck − δmk δnj man bj ck ˆ = ˆak bj ck − kaj bj ck  ˆ = (ˆbj )(ak ck ) − (kck )(aj bj ). z ′ ) term and an (i. . i is either x′ or y ′ or z ′ and that j is one of the remaining two. Consider the compound product a × (b × c).410 CHAPTER 15.22).m.  That is.m. for k = n = x′ .26) a useful vector identity. The property that ǫijn ǫijk = 2δnk is proved by observing that. is exactly what the property in question asserts. Written without the benefit of Einstein’s summation convention. an (i.k. the example’s central step would have been a × (b × c) = = ˆ ǫimn ǫijk man bj ck i.j.k.

x′ .j. x′ ) + ǫy ′ x′ z ′ ǫy ′ x′ z ′ f (x′ . z ′ . z ′ . z ′ . z ′ . x′ ) + f (y ′ . For the property that ǫijn ǫijk = 2δnk . x′ ) + ǫz ′ y ′ x′ ǫz ′ x′ y ′ f (y ′ . x′ . z ′ ) + ǫy ′ x′ z ′ ǫy ′ z ′ x′ f (x′ .k. y ′ . y ′ . y ′ ) + f (z ′ . x′ . y ′ . x′ ) + ǫz ′ y ′ x′ ǫz ′ y ′ x′ f (x′ . y ′ ) ˆ − f (y ′ . y ′ . z ′ . y ′ . z ′ ) ˆ ˜ 2 f (x′ . x′ . k. x′ . x′ . x′ . y ′ ) + f (y ′ . y ′ ) + f (z ′ . z ′ . z ′ ) + f (x′ . z ′ . y ′ . z ′ . z ′ ) − f (y ′ . y ′ ) + f (z ′ . n). y ′ . x′ . z ′ . x′ . x′ . x′ . x′ . x′ ) + f (y ′ . z ′ ) + f (x′ . the corresponding calculation is X ǫijn ǫijk f (k. z ′ . x′ ) + f (x′ . NOTATION 411 which makes sense if you think about it hard enough. x′ . z ′ .15.j. z ′ . X ǫijk ǫijk = ǫ2 ′ y ′ z ′ + ǫ2 ′ z ′ x′ + ǫ2′ x′ y ′ + ǫ2 ′ z ′ y ′ + ǫ2 ′ x′ z ′ + ǫ2′ y ′ x′ = 6. x′ . m. z ′ . y ′ . z ′ . y ′ ) + f (x′ . x′ . z ′ ) + f (z ′ . x′ ) = f (y ′ .4. x′ ) + f (z ′ . x′ . x′ .m. y ′ ) + f (z ′ . x′ ) − f (z ′ . z ′ ) + f (z ′ . x y z x y z i. y ′ ) + f (y ′ . y ′ . y ′ ) + f (z ′ . z ′ ) + f (y ′ . z ′ ) + f (y ′ .n = ǫx′ y ′ z ′ ǫx′ y ′ z ′ f (y ′ . z ′ . z ′ . x′ ) + f (y ′ . x′ . 2 k. y ′ . x′ . y ′ ) + ǫx′ z ′ y ′ ǫx′ y ′ z ′ f (z ′ . z ′ ) + ǫx′ y ′ z ′ ǫx′ z ′ y ′ f (y ′ . y ′ . x′ ) + f (y ′ . x′ ) + ǫy ′ z ′ x′ ǫy ′ x′ z ′ f (z ′ . y ′ ) + f (z ′ . x′ ) + f (y ′ . x′ .16 and justifies the 16 If thinking about it hard enough does not work. z ′ . y ′ ) + f (x′ . y ′ . y ′ . z ′ ) + f (x′ . z ′ . z ′ ) = j. x′ . then here it is in interminable detail: X ǫimn ǫijk f (j. z ′ ) + ǫz ′ x′ y ′ ǫz ′ x′ y ′ f (x′ . y ′ ) + ǫz ′ x′ y ′ ǫz ′ y ′ x′ f (x′ . y ′ ) + ǫy ′ z ′ x′ ǫy ′ z ′ x′ f (z ′ . n). z ′ . y ′ ) + f (x′ . y ′ . x′ . y ′ . k. x′ . x′ ) + f (z ′ . x′ . x′ . y ′ . z ′ . z ′ . z ′ ) + ǫy ′ x′ z ′ ǫy ′ x′ z ′ f (z ′ .j. z ′ . y ′ ) + ǫx′ z ′ y ′ ǫx′ z ′ y ′ f (y ′ . z ′ ) X δnk f (k.k. z ′ ) + f (z ′ . y ′ .n = ǫy ′ z ′ x′ ǫy ′ z ′ x′ f (x′ . x′ ) ˜ ˜ ˜ ˜ = ˆ f (y ′ .m. z ′ . m. z ′ . z ′ ) + f (z ′ . z ′ . z ′ . y ′ ) + f (x′ . x′ . y ′ ) + ǫz ′ y ′ x′ ǫz ′ y ′ x′ f (y ′ . y ′ . x′ . z ′ ) = = = f (x′ . y ′ .n X (δmj δnk − δmk δnj )f (j. z ′ . x′ ) ˆ + f (z ′ . x′ ) − f (y ′ . x′ ) + f (x′ . x′ . z ′ . n) i. x′ ) + ǫz ′ x′ y ′ ǫz ′ x′ y ′ f (y ′ . x′ . y ′ . y ′ . z ′ ) − f (y ′ . y ′ . x′ . y ′ . y ′ . y ′ .k . z ′ . z ′ ) = ˆ f (y ′ . x′ . y ′ . x′ . z ′ ) − f (x′ . y ′ . y ′ . That is for the property that ǫimn ǫijk = δmj δnk −δmk δnj . x′ . z ′ ) + ǫx′ z ′ y ′ ǫx′ z ′ y ′ f (z ′ . y ′ . y ′ . x′ . z ′ . z ′ . z ′ . x′ . y ′ . z ′ .k. y ′ . y ′ ) − f (z ′ . z ′ . y ′ ) + ǫx′ y ′ z ′ ǫx′ y ′ z ′ f (z ′ . y ′ ) + f (z ′ . z ′ ) + f (x′ . z ′ . n) i. x′ ) + f (y ′ . x′ ) + f (x′ . x′ . y ′ ) − f (x′ . x′ ) + f (x′ . z ′ . y ′ ) + f (z ′ . y ′ . y ′ . y ′ .n For the property that ǫijk ǫijk = 6. y ′ . x′ ) + f (x′ .

dot and cross products. but with three distinct types of product it has more rules controlling the way its products and sums are combined.18) and (15. until he is satisfied that he adequately understands the several properties and their correct use.2: Algebraic vector identities. the Levi-Civita epsilon and the properties of Table 15. the compound Kronecker operator −δmk δnj includes canceling terms for these same three cases. 17 [47.19) respectively for the scalar. ψa = a∗ · a b·a a · (b + c) a · (ψb) ˆψai ı a · b = ai bi a × b = ǫijkˆaj bk ı = |a|2 (ψ)(a + b) = ψa + ψb = a·b b × a = −a × b = a·b+a·c a × (b + c) = a × b + a × c = (ψ)(a · b) a × (ψb) = (ψ)(a × b) a · (b × c) = b · (c × a) = c · (a × b) a × (b × c) = b(a · c) − c(a · b) table’s claim that ǫimn ǫijk = δmj δnk − δmk δnj . Appendix A] .5 Algebraic identities Vector algebra is not in principle very much harder than scalar algebra is. VECTOR ANALYSIS Table 15. (15. in something like the style of the example. Table 15.412 CHAPTER 15. for the case that j = k = m = n = y ′ and for the case that j = k = m = n = z ′ . Appendix II][21. whereas the compound Levi-Civita operator ǫimn ǫijk does not.7).17 Most of the table’s identities are plain by the formulas (15. The reader who does not feel entirely sure that he understands what is going on might work out the table’s several properties with his own pencil.2 lists several of these. This is why the table’s claim is valid as written. (Notice that the compound Kronecker operator δmj δnk includes nonzero terms for the case that j = k = m = n = x′ . However.) To belabor the topic further here would serve little purpose. and one was proved It is precisely to encapsulate such interminable detail that we use the Kronecker delta. 15.1.

since ˆ ˆ z q(r) = x′ qx′ (r) + y′ qy′ (r) + ˆ′ qz ′ (r) for any orthogonal basis [x′ y′ z′ ] as well. the specific scalar fields qx (r).6. FIELDS AND THEIR DERIVATIVES 413 as (15. The notation dσ/dt means “the rate of σ as time t advances.15. The remaining identity is proved in the notation of § 15. Air pressure p(r) is an example of a scalar field. § 18.” We can likewise call a scalar quantity ψ(r) or vector quantity a(r) whose varies over space “a function of position r. derivatives with respect to position create a notational problem.3. We call it a field. Wind velocity18 q(r) is an example of a vector field. These are typical examples. the table also includes the three vector products in Einstein notation. As one can take the derivative dσ/dt or df /dt with respect to time t of a function σ(t) or f (t). which it needs for another purpose. a function in which spatial position serves as independent variable. this section also uses the letter q for velocity in place of the conventional v [3. qy (r) and qz (r) are no more essential to the vector field q(r) than the specific scalars bx .” but if the notation dψ/dr likewise meant “the rate of ψ as 18 As § 15.27) = ǫijk ai bj ck = ǫijk bi cj ak = ǫijk ci aj bk Besides the several vector identities. one can likewise take the derivative with respect to position r of a field ψ(r) or a(r). A field is a quantity distributed over space or. if you prefer.26). A vector field can be thought of as composed of three scalar fields ˆ ˆ ˆ q(r) = xqx (r) + yqy (r) + zqz (r). Scalar and vector fields are of utmost use in the modeling of physical phenomena. whose value at a given location r has amplitude but no direction.4]. (15. 15. However.” but there is a special. for it is not obvious what notation like dψ/dr or da/dr would mean. but. . by and bz are to the vector b.6 Fields and their derivatives We call a scalar quantity φ(t) or vector quantity f (t) whose value varies over time “a function of time t. whose value at a given location r has both amplitude and direction. alternate name for such a quantity.4 as a · (b × c) = a · (ǫijkˆbj ck ) ı = ǫijk ai bj ck = ǫkij ak bi cj = ǫjki aj bk ci = b · (c × a) = c · (a × b).

but were you to try to explain such a building to a caveman used to thinking of the ground and the floor as more or less the same thing.2 has given the vector three distinct kinds of product. z If you think that the latter does not look very much like a vector. z Consider a vector Then consider a “vector” ˆ ˆ c = x[Tuesday] + y[Wednesday] + ˆ[Thursday].1 The ∇ operator ˆ ˆ a = xax + yay + ˆaz . Section 15. VECTOR ANALYSIS position r advances” then it would prompt one to ask. 19 . and the curl. If we will speak of a field’s derivative with respect to position r then we shall be more precise. 15. The writer does not know how to interpret a nonsense term like “[Tuesday]ax ” any more than the reader does. then the writer thinks as you do. Once one has grasped the concept of upstairs and downstairs. “advances in which direction?” The notation offers no hint.414 CHAPTER 15. the gradient. a building with three floors or thirty is not especially hard to envision. In fact dψ/dr and da/dr mean nothing very distinct in most contexts and we shall avoid such notation. We shall address the Laplacian in [section not yet written]. What matters in this context is not that c have amplitude and direction (it has neither) but rather that it have the three orthonormal Vector veterans may notice that the Laplacian is not listed. he might be conflicted by partly false images of sitting in a tree or of clambering onto (and breaking) the roof of his hut. This section gives the field no fewer than four distinct kinds of derivative: the directional derivative.19 So many derivatives bring the student a conceptual difficulty akin to a caveman’s difficulty in conceiving of a building with more than one floor. This is not because the Laplacian were uninteresting but rather because the Laplacian is actually a second-order derivative—a derivative of a derivative. but the point is that c behaves as though it were a vector insofar as vector operations like the dot product are concerned. “There are many trees and antelopes but only one sky and floor.6. the divergence. but consider: c · a = [Tuesday]ax + [Wednesday]ay + [Thursday]az . How can one speak of many skies or many floors?” The student’s principal conceptual difficulty with the several vector derivatives is of this kind.

The operator ∇ however shares more in common with a true vector than merely having x.15. partial derivatives. Now consider a “vector” ˆ ∇=x ∂ ∂ ∂ ˆ ˆ +y +z . It has these. ∂x ∂y ∂z (15. unlike the terms in the earlier dot product. That the components’ amplitudes seem nonsensical is beside the point. ˆ + y′ [−ax′ cos φ sin φ + ay′ sin2 φ + ax′ cos φ sin φ + ay′ cos2 φ] . Maybe there exists a model in which “[Tuesday]” knows how to operate on a scalar like ax . As if this were not enough. so “[Tuesday]” might do something to ax other than to multiply it.2).6. then the dot product c · a could be licit in that model. but. maybe. Nothing in the dot’s product’s definition requires the component amplitudes of c to multiply those of a. This is easier to see at first with respect the true vector a. ∂x ∂y ∂z Such a dot product might or might not prove useful.1) and (15. The model might allow it. but c is not a true vector. the cross product c × a too could be licit in that model. The dot and cross products in and of themselves do not forbid it. like a true vector. Multiplication is what the component amplitudes of true vectors do.28) This ∇ is not a true vector any more than c is.) If there did exist such a model. at least we know what this one’s terms mean. Section 15. days of the week. Well. There follows ˆ ˆ ˆ a = xax + yay + zaz ′ ′ ˆ = (ˆ cos φ − y sin φ)(ax′ cos φ − ay′ sin φ) x ˆ ˆ + (ˆ ′ sin φ + y′ cos φ)(ax′ sin φ + ay′ cos φ) + z′ az ′ x ˆ = x′ [ax′ cos2 φ − ay′ cos φ sin φ + ax′ sin2 φ + ay′ cos φ sin φ] ˆ ˆ z = x′ ax′ + y′ ay′ + ˆ′ az ′ . Consider rotating the x and y axes through an angle φ about the z axis. What’s the point? The answer is that there wouldn’t be any point if the only nonvector “vectors” in question were of c’s nonsense kind. composed according to the usual rule for cross products. but if we treat it as one then we have that ∇·a= ∂ax ∂ay ∂az + + . for. the operator ∇ is amenable to having its axes reoriented by (15. y and z components.6. (Operate on? Yes.2 elaborates the point. FIELDS AND THEIR DERIVATIVES 415 components it needs to participate formally in relevant vector operations. ersatz vectors—it all seems rather abstract.

ı Introduced by Oliver Heaviside. ∇ =ˆ ı ∂ ∂ ∂ ∂ ˆ ˆ ˆ = x′ ′ + y′ ′ + z′ ′ . The partial differential operators ∂/∂x.416 CHAPTER 15. VECTOR ANALYSIS where the final expression has different axes than the original but.6. where the “·” holds the place of the thing operated upon. relative to those axes. exactly the same form. then what takes the place ˜ of the ambiguous d/dr′ . where ro = ˆio . a single field.1). [6] 20 . That is the convention. Any letter might serve as well as the example’s J. It is x ˆ ˆ this invariance under reorientation that makes the ∇ operator useful. ay and az do. Now consider ∇. Further rotation about other axes would further reorient but naturally also would not alter the form.6. r † and so on.2 Operator notation Section 15. ∇o . ı there ∇o = ˆ ∂/∂io . the same mark ∇ distinguishes the corresponding special ∇. for instance. but what distinguishes operator notation is that. Operator notation concerns the representation of unary operators and the operations they specify. the operator J in operator notation’s operation Jt attacks from the left. For example. ∂i ∂x ∂y ∂z (15. For example. ∇. the vector differential operator ∇ finds extensive use in the modeling of physical phenomena. A unary operator is a mathematical agent that transforms a single discrete quantity. informally pronounced “del” (in the author’s country at least). ∂/∂y and ∂/∂z change no differently under reorientation than the component amplitudes ax .1. d/d˜. Section 7. Jt = t2 /2 and J cos ωt = (sin ωt)/ω. d/dro . a single function or another single mathematical object in some definite t way. In this example it is clear enough.1 has introduced operator notation without explaining what it is. but where otherwise unclear one can write an Rt operator’s definition in the applied style of J ≡ 0 · dt. After a brief digression to discuss operator notation.3 has already broadly introduced the notion of the operator. a single distributed quantity. J ≡ 0 dt is a unary operator20 whose effect is such that. This subsection digresses to explain. 15.29) evidently the same operator regardless of the choice of basis [ˆ ′ y′ z′ ]. d/dr† and so on? Answer: ∇′ . Whatever mark distinguishes the special r. Hence. If ∇ takes the place of the ambiguous d/dr. the subsections that follow will use the operator to develop and present the four basic kinds of vector derivative. like the matrix row operator A in matrix notation’s product Ax (§ 11.

and for the same reason.” Both confusion and perceived need would vanish if Ω ≡ (ω)(·) were unambiguously an operator. The matrix actually is a type of unary operator and matrix notation is a specialization of operator notation. not (Jω)t. Jt = tJ if J is a unary operator. Seen in this way. In operator notation. FIELDS AND THEIR DERIVATIVES 417 Thus. better.6. One can compare the distinction in § 11. we have met operator notation much earlier than that. The product 5t can if you like be regarded as the unary operator “5 times. though the notation Jt usually formally resembles multiplication in other respects as we shall see. the scalar 5 itself is no operator but just a number—but where no other operation is defined operator notation implies scalar multiplication by default.5) a matrix enjoys. the operator K ≡ · + 3 is not linear and almost certainly should never be represented in operator notation as here. so J1 J2 = J2 J1 . the same algebra matrices obey. Whether operator notation can can licitly represent any unary operation whatsoever is a definitional question we will leave for the professional mathematicians to answer. It is precisely this algebra that makes operator notation so useful. Linear unary operators generally are not commutative.’ operating on t. rather than go to the trouble of defining extra symbols like Ω.3. unary operations that scrupulously honor § 7. And. but otherwise linear unary operators follow familiar rules of multiplication like (J2 + J3 )J1 = J2 J1 + J3 J1 .2 between λ and λI against the distinction between ω and Ω here. in which case (JΩ)t = JΩt = J(Ωt). so we have met operator notation before.3.15.3’s rule of linearity. to rely on the right-to-left convention that Jωt = J(ωt). Still. Perhaps you did not know that 5 was an operator—and. or. which take little space on the page and are universally understood. Linear unary operators obey a definite algebra. Jωt = J(ωt). Observe however that the confusion and the perceived need for parentheses come only of the purely notational ambiguity as to whether ωt bears the semantics of “the product of ω and t” or those of “the unary operator ‘ω times. generally. for a linear unary operator enjoys the same associativity (11. in fact.” operating on t. Modern conventional applied mathematical notation though generally excellent re- . it is usually easier just to write the parentheses. The operators J and A above are examples of linear unary operators. lest an expression like Kt mislead an understandably unsuspecting audience. The a · in the dot product a · b and the a × in the cross product a × b can profitably be regarded as unary operators. though naturally in the specific case of scalar multiplication. which happens to be commutative. indeed. 5t and t5 actually mean two different things. but in normal usage operator notation represents only linear unary operations. it is true that 5t = t5.

tJ is itself a unary operator—it is the operator t 0 dt—though one can assign no particular value to it until it actually operates on something. the notation gives r no specific direction in which to advance.1) except where parentheses group otherwise. The vector b only directs the derivative. A or Ω without giving it anything in particular to operate upon. 21 Well. (b · ∇)a(r) = bi ∂a ∂aj = ˆbi  .32) b · ∇ψ(r) = bi ∂i In the case (15. In operator notation. when it matters. so. incidentally. ∇a(r) itself means nothing coherent21 so the parentheses usually are retained.29) is an unresolved unary operator of the same kind.32) define the directional derivative. VECTOR ANALYSIS mains imperfect. ∂i ∂i (15. however.3 The directional derivative and the gradient In the calculus of vector fields.29) and accorded a reference vector b to supply a direction and.31) For the scalar field the parentheses are unnecessary and conventionally are omitted.30) (b · ∇) = bi ∂i to express the derivative unambiguously. This operator applies equally to the scalar field. as the section’s introduction has observed. One can leave an operation unresolved. it does mean something coherent in dyadic analysis.1. The operator ∇ of (15. one can compose the directional derivative operator ∂ (15. . however.418 CHAPTER 15. 15. given (15. but this book won’t treat that. it is not the object of it. Nothing requires b to be constant. Equations (15. t For example. One can speak of a unary operator like J.6.31) and (15. Note that the directional derivative is the derivative not of the reference vector b but only of the field ψ(r) or a(r). ∂i as to the vector field. ∂ψ (b · ∇)ψ(r) = bi . the derivative notation d/dr is ambiguous because.31) of the vector field. to supply a scale. as ∂ψ . notationally. (15. operators associate from right to left (§ 2.

Pick your favorite brand. the directional derivative does not care. in which the vector ds represents the area and orientation of a patch of the region’s enclosing surface.35) is a vector infinitesimal of amplitude ds. It is not easy to motivate divergence directly. One of these is divergence.4 Divergence There exist other vector derivatives than the directional derivative and gradient of § 15.15.34) where ˆ ds ≡ n · ds (15. The result of a ∇ operation. but other football games have goal lines and goal posts. directed normally outward from the closed surface bounding the region—ds being the area of an infinitesimal element of the surface. too. through the concept of flux as follows. (15. the directional derivative operator b · ∇ is invariant under reorientation of axes. however. Flux is flow through a surface: in this case. The flux of a vector field a(r) outward from a region in space is Φ= S a(r) · ds. wherefore the directional derivative is invariant.3. so we shall approach it indirectly. . 15. The gradient represents the amplitude and direction of a scalar field’s locally steepest rate.33) is called the gradient of the scalar field ψ(r). FIELDS AND THEIR DERIVATIVES 419 though. too.6. If it seems opaque then try to visualize eqn. Formally a dot product. only scalar fields have gradients. When something like air flows through any surface—not necessarily a physical barrier but an imaginary surface like the goal line’s vertical plane in a football game22 —what matters is not the surface’s area as such but rather 22 The author has American football in mind.6. (The paragraph says much in relatively few words.34’s dot product a[r] · ds. the quantity ∇ψ(r) = ˆ ı ∂ψ ∂i (15. It can be a vector field b(r) that varies from place to place.6. the area of a tiny patch. 15. the gradient ∇ψ(r) is likewise invariant. Within (15. net flow outward from the region in question.32). Though both scalar and vector fields have directional derivatives.

but otherwise the flow sees a foreshortened surface. y) = ∆z(x. One can contemplate the flux Φopen = S a(r) · ds through an open surface as well as through a closed. a sink. where wind blowing through the region. Refer to Fig. ∂y ∂az dz ∂z ∆ay (z.x) zmax (x. then some lines might enter and exit the region more than once. as though the surface were projected onto a plane perpendicular to the flow. z). ∂x ∂ay dy. 15. 15.or ˆ-directed line. entering and leaving. It changes the problem in no essential way. x) = ∆y(z.z) ymax (z. then these increases are simply ∂ax ∆x(y.y) ∆az (x.y) represent the increase across the region respectively of ax . ∆ax (y. but where a positive net flux would have barometric pressure falling and air leaving the region maybe because a storm is coming—and you’ve got the idea. x) dz dx + ∆az (x. z) dy dz + ∆ay (z. z) = xmin (y. y) dx dy. ∂z Naturally.) A region of positive flux is a source. Now realize that eqn. 23 .34 actually describes flux not through an open surface but through a closed—it could be the surface enclosing the region of football play to goal-post height. VECTOR ANALYSIS the area the surface presents to the flow. but this merely elaborates the limits of integration along those lines.z) ∆ax (y. but it is the outward flux (15. ∆ax (y. The outward flux Φ of a vector field a(r) from some definite region in space is evidently Φ= where xmax (y. z) = ∂x ∂ay ∆ay (z.2. would constitute zero net flux.x) ∂ax dx. x).23 If the field has constant derivatives ∂a/∂i. ay or az along ˆ ˆ an x-.420 CHAPTER 15. if the region’s boundary happens to be concave. y). y. ∂y ∂az ∆az (x. y) = zmin (x.34) through a closed surface that will concern us here. of negative flux. x) = ymin (z. The surface presents its full area to a perpendicular flow. z or equivalently if the region in question is small enough that the derivatives are practically constant through it.

Formally a dot product. y) dx dy. so ∂az ∂ax ∂ay + + . Φ = (V ) ∂x ∂y ∂z or. divergence is invariant under reorientation of axes. z) dy dz + ∂ay ∂y ∆y(z.6. representing the intensity of a source or sink. though. x) dz dx 421 ∆z(x. The circulation of a vector field a(r) about a closed contour in space is Γ= a(r) · dℓ. flat plane in space. 15.36) the name divergence.34) which represented a double integration over a surface. ˆ Let [ˆ v n] be an orthogonal basis with n normal to the contour’s plane uˆ ˆ ˆ ˆ such that travel positively along the contour tends from u toward v rather . ∇ · a(r) = ∂ai . ∂i (15. V ∂x ∂y ∂z ∂i We give this ratio of outward flux to volume. One can in general contemplate circulation about any closed contour.38) where.37) (15. dividing through by the volume. ∂ax ∂ay ∂az ∂ai Φ = + + = = ∇ · a(r). FIELDS AND THEIR DERIVATIVES upon which Φ = ∂ax ∂x + ∂az ∂z ∆x(y. It needs first the concept of circulation as follows. But each of the last equation’s three integrals represents the region’s volume V . the here represents only a single integration. Curl is a little trickier to visualize.15. unlike the S of (15. (15.5 Curl Curl is to divergence as the cross product is to the dot product. but it suits our purpose here to consider specifically a closed contour that happens not to depart from a single.6.

or equivalently if the contour in question is short enough that the derivatives are practically constant over it. so ∂au ∂av − . Γ A = ∂av ∂au − ∂u ∂v ∂ak ˆ =n· ∂j ˆ ˆ ˆ u v n ∂/∂u ∂/∂v ∂/∂n au av an (15. ∂u ∂au ∆v(u). = ∂j ∂u ∂v (15. then these increases are simply ∆av (v) = ∆au (u) = upon which Γ= ∂av ∂u ∆u(v) dv − ∂au ∂v ∆v(u) du. ˆ ˆ n · [∇ × a(r)] = n · ǫijkˆ ı ∂au ∂av ∂ak − . Γ = (A) ∂u ∂v or. ∆av (v) = umin (v) vmax (u) ∂av du. VECTOR ANALYSIS than the reverse.422 CHAPTER 15. We give this ratio of circulation to area.or v-directed line.39) ˆ = n · ǫijkˆ ı ˆ = n · [∇ × a(r)]. dividing through by the area.40) . ∂u ∂au dv ∂v ∆au (u) = vmin (u) represent the increase across the contour’s interior respectively of av or au ˆ ˆ along a u. If the field has constant derivatives ∂a/∂i. The circulation Γ of a vector field a(r) about this contour is evidently Γ= where umax (v) ∆av (v) dv − ∆au (u) du. ∂av ∆u(v). ∂v But each of the last equation’s two integrals represents the area A within the contour.

7. 15. ∇ × a(r) = ǫijkˆ ı ∂ak . about a specified axis. a vector. Observe that. whereupon one can return to define directional curl more generally than (15. Formally a cross product. so it may be said that curl is the locally greatest directional curl. not only n.6. b · [∇ × a(r)] = b · ǫijkˆ ı ∂ak ∂ak . Directional curl.40) has defined it. representing the intensity of circulation.40) with respect to a contour in some specified plane.41) would necessarily result.4 contemplated the flux of a vector field a(r) from a volume small enough that the divergence ∇ · a was practically constant through .41) of curl.40) to motivate and develop the concept (15. Once developed.31) here too any reference vector b or ˆ vector field b(r) can serve to direct the curl. although we have developed directional curl (15.1 The divergence theorem Section 15. curl (15. 15. ˆ We have needed n and (15. Directional curl evidently cannot exceed curl in magnitude. ∂j (15. This shift comes in two kinds. which we are now prepared to treat. the degree of twist so to speak. We might have chosen ˆ another plane and though n would then be different the same (15. curl is invariant under reorientation of axes. directional curl is likewise invariant. dot-multiplied by a reference vector. a scalar. As in (15. Curl. The two subsections that follow explain.7 Integral forms The vector’s distinctive maneuver is the shift between integral forms. Note however that directional curl so defined is not a distinct kind of derivative but rather is just curl. INTEGRAL FORMS 423 the name directional curl.40). = ǫijk bi ∂j ∂j (15. An ordinary dot product. the concept of curl stands on its own. unexpectedly is a property of the field only.15. but will equal ˆ it when n points in its direction. oriented normally to the locally greatest directional curl’s plane.42) This would be the actual definition of directional curl. Hence.7. The cross product in (15.41) itself turns out to be altogether independent of any particular plane. however. is a property of the field and the plane.41) we call curl.

Fortunately. so ∇ · a dv = a · ds.24 According to (15. we have that ∇ · a dv = a · ds + a · ds.43) Where waves propagate through a material interface the fields that describe the waves can be discontinuous. overall volume.424 CHAPTER 15. each element small enough that the constant-divergence assumption holds across it. V 24 ∇ · a dv = S a · ds. thin transition layer in which the fields and their derivatives vary smoothly rather than jumping sharply. Adding all the elements together. VECTOR ANALYSIS it.36). such that the one volume element’s ds on the surface the two elements share points oppositely to the other volume element’s ds on the same surface). the flux from any one of these volume elements is a · ds. . and outer surface area shared with no other element because it belongs to the surface of the larger. Interior elements naturally have only the former kind but boundary elements have both kinds of surface area. One would like to treat larger volumes as well. whereas the narrative depends on fields not only to be continuous but even to have continuous derivatives. so we can elaborate the last equation to read ∇ · a dv = a · ds + a · ds Sinner Souter for a single element. where Selement = Sinner + Souter . elements elements Sinner elements Souter but the inner sum is null because it includes every interior surface twice.34) and (15. the apparent difficulty is dismissed by imagining an appropriately graded. but to us it is admittedly a distraction so we shall let the matter rest in that form. ∇ · a dv = Selement Even a single volume element however can have two distinct kinds of surface area: inner surface area shared with another element. because each interior surface is shared by two elements such that ds2 = −ds1 (in other words. A professional mathematician might develop this footnote into a formidable discursion. elements elements Souter That is. (15. One can treat a larger volume by subdividing the volume into infinitesimal volume elements dv.

and outer edge shared with no other element because it belongs to the edge of the larger.43) is the divergence theorem.7. INTEGRAL FORMS 425 Equation (15. overall surface. If an open surface— whether the surface be confined to a plane or be warped (as for example the surface of a bowl) in three dimensions—is subdivided into infinitesimal surface elements ds.2. (∇ × a) · ds = elements elements outer 25 [47. developed as follows. so a · dℓ. eqn.7. 15. where element = inner + outer . then the circulation about any one of these elements according to (15.38) and (15.1 is a second.39) is (∇ × a) · ds = a · dℓ. each element small enough to be regarded as planar and to experience essentially constant curl.43) swaps the one integral for the other. related theorem for directional curl. (15.2 Stokes’ theorem Corresponding to the divergence theorem of § 15.15. 1. we have that (∇ × a) · ds = a · dℓ + a · dℓ. elements elements inner elements outer but the inner sum is null because it includes every interior edge twice. so we can elaborate the last equation to read (∇ × a) · ds = a · dℓ + a · dℓ inner outer for a single element. so it is an important result. When they do. The integral on the equation’s left and the one on its right each arise in vector analysis more often than one might expect. because each interior edge is shared by two elements such that dℓ2 = −dℓ1 . Interior elements naturally have only the former kind but boundary elements have both kinds of edge. It is the vector’s version of the fundamental theorem of calculus (7.25 neatly relating the divergence within a volume to the flux from it.2).7.8] . often a profitable maneuver. element Even a single surface element however can have two distinct kinds of edge: inner edge shared with another element. Adding all the elements together.

20] .43). [Chapter to be continued. VECTOR ANALYSIS (∇ × a) · ds = a · dℓ.4. S CHAPTER 15.426 That is.44) Equation (15.44) is Stokes’ theorem.26 neatly relating the directional curl over a (possibly nonplanar) surface to the circulation about it.44) serves to swap one integral for the other. 1. eqn. Like the divergence theorem (15. (15. Stokes’ theorem (15.] 26 [47.

Orthogonal polynomials 23. Basic special functions 20. Fourier and Laplace transforms 17. Feedback control 27. Bessel functions The following chapters are further tentatively planned but may have to await the book’s second edition. The conjugate-gradient algorithm is to be treated here. Energy conservation 28.Plan The following chapters are tentatively planned to complete the book. 26. 16. Iterative techniques2 The following chapters are yet more remotely planned. Transformations to speed series convergence 19. Spatial Fourier transforms1 24. 427 . Numerical integration 25. The wave equation 21. Stochastics 29. Probability 18. Statistical mechanics 1 2 Weyl and Sommerfeld forms are to be included here. 22.

Remarks Tentatively planned new appendices include “Additional derivations” (to prove a few obscure results) and maybe “A summary of pure complexvariable theory. the author would rather not see a four-digit page number here. however. Maxwell’s equations happen to lie squarely within the author’s own professional area of expertise. Several of the tentatively planned chapters from Ch. and it does provide another way to think about complex numbers. Scientists and engineers with advanced credentials occasionally expect you to be acquainted with the pure theory for technical-social reasons. it is not impossible that the author would yet amend the plan to include it. Differential geometry and Kepler’s laws 32. Information theory 34. First. and a good chapter is not a bad thing. the book is a book. and to develop the Poynting vector that quantifies the electromagnetic delivery of energy. Even to the applied mathematician.428 30. As for the Navier-Stokes equation. Maxwell’s equations are the equations of radio waves and light. For these reasons. a future edition might tersely outline the pure theory’s main line of argument in an appendix. It is not planned to develop electromagnetics generally but it is planned to derive the vector Helmholtz equation from the differential forms of Amp`re’s and Faraday’s laws. The inclusion of such a chapter in the plan prompts one to ask why chapters on any number of other physical topics are not also included—on the Navier-Stokes equation. allowing him to deliver a better chapter on this particular topic than on some others. few if any of these chapters would treat much more than a few general results from their respective fields.”4 either or both to precede the existing appendix “Manuscript history” in a future edition of the book. Maxwell’s equations happen to provide a beautiful instance of a vector wave equation’s scalarization. for example. 25 onward represent deep fields of study each wanting full books of their own. The answer is threefold. 2 through 9 will have noticed that the author loves complex numbers but lacks commensurate enthusiasm for the standard development of the pure theory of the complex variable. but even if so it is not yet clear to the author just where in the book to put it or how extensively to treat it. on the other hand. Any reader who does not find such things inherently interesting holds a different notion of “interesting” than the author does. Second. regardless of its practical use. Besides. Maxwell’s equations3 33. If written according to plan. The pure theory though maybe unnecessarily abstract is pretty. Rotational dynamics CHAPTER 15. to e scalarize the result. Third. VECTOR ANALYSIS 31. 4 Any reader who has read Chs. not a library: if it tried to treat all physical topics it would never end. the standard development of the pure theory does have its proper place. 3 . To earn a technical degree you are not unlikely to have to obtain a passing grade from a professional mathematician teaching the pure theory.

7. If you find a part of the book insufficiently rigorous. or how you have cited it. if it fails to do so in your view then maybe you and I should discuss the matter. what you have learned from it.org. as far as it has yet gone. o or whatever.org. Write as appropriate. so no such correction is too small. which means that you as reader have a stake in it if you wish. then my response is likely to be that there is already a surfeit of fine professional mathematics books in print. readers are downloading the book at the rate of about four thousand copies per year directly through derivations. such errors only mar the manuscript. actually have read it. THB . I do not discourage such criticism and would be glad to hear it. or a substantial fraction of it. Finding the right balance is not always easy. then that is another matter. Check http://www. On the other hand. More to come. if you have found any part of the book unnecessarily confusing. I would most expressly solicit your feedback on typos. then write at your discretion. At the time of this writing. On a higher plane. but this book may not be well placed to meet it (the book might compromise by including a footnote that briefly suggests the outline of a more rigorous proof.15. then you can help to improve it. false or missing symbols and the like. the book does intend to derive every one of its results adequately from an applied perspective. On no particular plane. If you want to detail H¨lder spaces and Galois theory.derivations. Some fraction of those. INTEGRAL FORMS A personal note to the reader 429 Derivations of Applied Mathematics belongs to the open-source tradition. misprints. If you have read the book. if you would tell me what you have done with your copy of the book. plus others who have installed the book as a Debian package or have acquired the book through secondary channels. but it tries not to distract the narrative by formalities that do not serve applications). please tell how so. this just isn’t that kind of book. now you stand among them. then write me at thb@derivations.org/ for the latest revision.

430 CHAPTER 15. VECTOR ANALYSIS .

and it concisely summarizes complex ideas on paper to the writer himself. if writers will not incrementally improve it? Consider the notation of the algebraist Girolamo Cardano in his 1539 letter to Tartaglia: [T]he cube of one-third of the coefficient of the unknown is greater in value than the square of one-half of the number. when it e falls to the discoverer and his colleagues to establish new notation to meet the need. one would find it difficult even to think clearly about the math. Nevertheless. and rightly so. A more difficult problem arises when old notation exists but is inelegant in modern use. surely he would express the same thought in the form x 2 a 3 > .Appendix A Hexadecimal and other notational matters The importance of conventional mathematical notation is hard to overstate. 3 2 Good notation matters. Without the notation. New mathematical ideas occasionally find no adequate pre¨stablished notation. Convention is a hard hill to climb. [35] If Cardano lived today. The right notation is not always found at hand. Such notation serves two distinct purposes: it conveys mathematical ideas from writer to reader. slavish devotion to convention does not serve the literature well. of course. for how else can notation improve over time. Although this book has no brief to overhaul applied mathematical notation generally. it does seek to aid the honorable cause of notational evolution 431 . nearly impossible. to discuss it with others.

483. the introduction of new symbols ξ and such. plus five times sixteen. or in other words 20x1F − 1.32 million or 9. However. sixteens cubed. this is 0x7FFF FFFF. the aesthetic advantage may not seem immediately clear from the one example. so that (for instance) the quarter revolution or right angle is expressed as 2π/4 rather than as the less evocative π/2. sixteens. we choose to leap.1 Hexadecimal numerals Treating 2π as a single symbol is a small step. To the reader who is not a computer scientist. the sixteen symbols 0123456789ABCDEF respectively represent the numbers zero through . the next. For example. but whose decimal notation does nothing to convey the notion of the largest signed integer storable in a byte. and so on. plus once sixteen times sixteen times sixteen. to explain very briefly: hexadecimal represents numbers not in tens but rather in sixteens. but its iterative tenfold structure meets little or no aesthetic support in mathematical theory.147.7 miles. In this book. Traditional decimal notation is unobjectionable for measured quantities like 63. the book sometimes treats 2π implicitly as a single symbol. If this book treats 2π sometimes as a single symbol—if such treatment meets the approval of slowly evolving convention—then further steps. HEX AND OTHER NOTATIONAL MATTERS in a few specifics. In hexadecimal notation. whose number suggests a significant idea to the computer scientist.432 APPENDIX A. One wants to introduce some new symbol ξ = 2π thereto. The rightmost place in a hexadecimal numeral represents ones. of course. A. 2π remains a bit awkward. the hexadecimal numeral 0x1357 means “seven. either we leap the ditch or we remain on the wrong side. sixteens squared. A bolder step is to adopt from the computer science literature the important notational improvement of the hexadecimal numeral.81 m/s2 . As a single symbol.647. Much better is the base-sixteen hexadecimal notation 0x7F. Consider for instance the decimal numeral 127. can safely be left incrementally to future writers. No incremental step is possible here. which is the largest signed integer storable in a standard thirty-two bit word. it is neither necessary nor practical nor desirable to leap straight to notational Utopia in one great bound.” In hexadecimal. which clearly expresses the idea of 27 − 1. unlikely to trouble readers much. but consider the decimal number 2. It suffices in print to improve the notation incrementally. The question is: which notation more clearly captures the idea? To readers unfamiliar with the hexadecimal notation. plus thrice sixteen times sixteen. $ 1. For instance. the next place leftward. the next place leftward.

The reason it is unfamiliar is that it is not often encountered outside the computer science literature. but it is not encountered because it is not used. one quarter. Combining the hexadecimal and 2π ideas. and so on in a cycle. In base twelve. such as where hexadecimal numbers are arrayed in matrices.06)(2π) in base twelve. with sixteen being written 0x10. If you have never yet used the hexadecimal system. AVOIDING NOTATIONAL CLUTTER 433 fifteen.2 Avoiding notational clutter Good applied mathematical notation is not cluttered. but where they do naturally decimal not hexadecimal is used: vsound = 331 m/s rather than the silly-looking vsound = 0x14B m/s.6.3. that this particular cycle is worth breaking. so hexadecimal (base sixteen) is found to offer a convenient shorthand for binary (base two. the fundamental. All this raises the sensible question: why sixteen?1 The answer is that sixteen is 24 .2. however. so this book uses the hexadecimal for integers larger than 9.A. on aesthetic grounds. Good notation does not necessarily include every possible limit. the hour angles (§ 3. smallest possible base). it is worth your while to learn it. letters which when set in italics usually represent coefficients not digits. It seems to this writer.6) come in neat increments of (0. one third and one half are respectively written 0. For the sake of elegance. A. Observe that in some cases. we note here for interest’s sake that 2π ≈ 0x6.4 and 0. Also. Hexadecimal. However. Binary is inherently theoretically interesting. 0. is preferred for its straightforward proxy of binary. at the risk of challenging entrenched convention. superscript and An alternative advocated by some eighteenth-century writers was twelve. The conventional hexadecimal notation is admittedly a bit bulky and unfortunately overloads the letters A through F. this book employs hexadecimal throughout. the real problem with the hexadecimal notation is not in the notation itself but rather in the unfamiliarity with it. qualification. but direct binary notation is unwieldy (the hexadecimal number 0x1357 is binary 0001 0011 0101 0111). 1 .487F. this book may omit the cumbersome hexadecimal prefix “0x. so there are some real advantages to that base. and it is not used because it is not familiar. so hexadecimal is written in proxy. Each of the sixteen hexadecimal digits represents a unique sequence of exactly four bits (binary digits).” Specific numbers with physical dimensions attached appear seldom in this book. besides having momentum from the computer science literature.

434

APPENDIX A. HEX AND OTHER NOTATIONAL MATTERS

subscript. For example, the sum
M N

S=
i=1 j=1

a2 ij

might be written less thoroughly but more readably as S=
i,j

a2 ij

if the meaning of the latter were clear from the context. When to omit subscripts and such is naturally a matter of style and subjective judgment, but in practice such judgment is often not hard to render. The balance is between showing few enough symbols that the interesting parts of an equation are not obscured visually in a tangle and a haze of redundant little letters, strokes and squiggles, on the one hand; and on the other hand showing enough detail that the reader who opens the book directly to the page has a fair chance to understand what is written there without studying the whole book carefully up to that point. Where appropriate, this book often condenses notation and omits redundant symbols.

Appendix B

The Greek alphabet
Mathematical experience finds the Roman alphabet to lack sufficient symbols to write higher mathematics clearly. Although not completely solving the problem, the addition of the Greek alphabet helps. See Table B.1. When first seen in mathematical writing, the Greek letters take on a wise, mysterious aura. Well, the aura is fine—the Greek letters are pretty— but don’t let the Greek letters throw you. They’re just letters. We use them not because we want to be wise and mysterious1 but rather because
Well, you can use them to be wise and mysterious if you want to. It’s kind of fun, actually, when you’re dealing with someone who doesn’t understand math—if what you want is for him to go away and leave you alone. Otherwise, we tend to use Roman and Greek letters in various conventional ways: Greek minuscules (lower-case letters) for angles; Roman capitals for matrices; e for the natural logarithmic base; f and g for unspecified functions; i, j, k, m, n, M and N for integers; P and Q for metasyntactic elements (the mathematical equivalents of foo and bar); t, T and τ for time; d, δ and ∆ for change; A, B and C for unknown coefficients; J, Y and H for Bessel functions; etc. Even with the Greek, there still are not enough letters, so each letter serves multiple conventional roles: for example, i as an integer, an a-c electric current, or—most commonly—the imaginary unit, depending on the context. Cases even arise in which a quantity falls back to an alternate traditional letter because its primary traditional letter is already in use: for example, the imaginary unit falls back from i to j where the former represents an a-c electric current. This is not to say that any letter goes. If someone wrote e2 + π 2 = O 2 for some reason instead of the traditional a2 + b2 = c2 for the Pythagorean theorem, you would not find that person’s version so easy to read, would you? Mathematically, maybe it doesn’t matter, but the choice of letters is not a matter of arbitrary utility only but also of convention, tradition and style: one of the early
1

435

436

APPENDIX B. THE GREEK ALPHABET Table B.1: The Roman and Greek alphabets. Aa Bb Cc Dd Ee Ff Aa Bb Cc Dd Ee Ff Gg Hh Ii Jj Kk Lℓ Gg Hh Ii Jj Kk Ll ROMAN Mm Nn Oo Pp Qq Rr Ss Mm Nn Oo Pp Qq Rr Ss Tt Uu Vv Ww Xx Yy Zz Tt Uu Vv Ww Xx Yy Zz

Aα Bβ Γγ ∆δ Eǫ Zζ

alpha beta gamma delta epsilon zeta

Hη Θθ Iι Kκ Λλ Mµ

GREEK eta Nν theta Ξξ iota Oo kappa Ππ lambda Pρ mu Σσ

nu xi omicron pi rho sigma

Tτ Υυ Φφ Xχ Ψψ Ωω

tau upsilon phi chi psi omega

we simply do not have enough Roman letters. An equation like α2 + β 2 = γ 2 says no more than does an equation like a2 + b2 = c2 , after all. The letters are just different. Applied as well as professional mathematicians tend to use Roman and Greek letters in certain long-established conventional sets: abcd; f gh; ijkℓ; mn (sometimes followed by pqrst as necessary); pqr; st; uvw; xyzw. For
writers in a field has chosen some letter—who knows why?—then the rest of us follow. This is how it usually works. When writing notes for your own personal use, of course, you can use whichever letter you want. Probably you will find yourself using the letter you are used to seeing in print, but a letter is a letter; any letter will serve. Excepting the rule, for some obscure reason almost never broken, that i not follow h, the mathematical letter convention consists of vague stylistic guidelines rather than clear, hard rules. You can bend the convention at need. When unsure of which letter to use, just pick one; you can always go back and change the letter in your notes later if you need to.

437 the Greek: αβγ; δǫ; κλµνξ; ρστ ; φψωθχ (the letters ζηθξ can also form a set, but these oftener serve individually). Greek letters are frequently paired with their Roman congeners as appropriate: aα; bβ; cgγ; dδ; eǫ; f φ; kκ; ℓλ; mµ; nν; pπ; rρ; sσ; tτ ; xχ; zζ.2 Some applications (even in this book) group the letters slightly differently, but most tend to group them approximately as shown. Even in Western languages other than English, mathematical convention seems to group the letters in about the same way. Naturally, you can use whatever symbols you like in your own private papers, whether the symbols are letters or not; but other people will find your work easier to read if you respect the convention when you can. It is a special stylistic error to let the Roman letters ijkℓmn, which typically represent integers, follow the Roman letters abcdef gh directly when identifying mathematical quantities. The Roman letter i follows the Roman letter h only in the alphabet, almost never in mathematics. If necessary, p can follow h (curiously, p can alternatively follow n, but never did convention claim to be logical). Mathematicians usually avoid letters like the Greek capital H (eta), which looks just like the Roman capital H, even though H (eta) is an entirely proper member of the Greek alphabet. The Greek minuscule υ (upsilon) is avoided for like reason, for mathematical symbols are useful only insofar as we can visually tell them apart.3
2 The capital pair Y Υ is occasionally seen but is awkward both because the Greek minuscule υ is visually almost indistinguishable from the unrelated (or distantly related) Roman minuscule v; and because the ancient Romans regarded the letter Y not as a congener but as the Greek letter itself, seldom used but to spell Greek words in the Roman alphabet. To use Y and Υ as separate symbols is to display an indifference to, easily misinterpreted as an ignorance of, the Graeco-Roman sense of the thing—which is silly, really, if you think about it, since no one objects when you differentiate j from i, or u and w from v—but, anyway, one is probably the wiser to tend to limit the mathematical use of the symbol Υ to the very few instances in which established convention decrees it. (In English particularly, there is also an old typographical ambiguity between Y and a Germanic, non-Roman letter named “thorn” that has practically vanished from English today, to the point that the typeface in which you are reading these words lacks a glyph for it—but which sufficiently literate writers are still expected to recognize on sight. This is one more reason to tend to avoid Υ when you can, a Greek letter that makes you look ignorant when you use it wrong and pretentious when you use it right. You can’t win.) The history of the alphabets is extremely interesting. Unfortunately, a footnote in an appendix to a book on derivations of applied mathematics is probably not the right place for an essay on the topic, so we’ll let the matter rest there.

3 No citation supports this appendix, whose contents (besides the Roman and Greek alphabets themselves, which are what they are) are inferred from the author’s subjective observation of seemingly consistent practice in English-language applied mathematical publishing, plus some German and a little French, dating back to the 1930s. From the thousands of readers of drafts of the book, the author has yet to receive a single serious suggestion that the appendix were wrong—a lack that constitutes a sort of negative (though admittedly unverifiable) citation if you like. If a mathematical style guide exists that formally studies the letter convention’s origins, the author is unaware of it (and if a graduate student in history or the languages who, for some reason, happens to be reading these words seeks an esoteric, maybe untouched topic on which to write a master’s thesis, why, there is one).

Appendix C

Manuscript history
The book in its present form is based on various unpublished drafts and notes of mine, plus some of my wife Kristie’s (née Hancock), going back to 1983 when I was fifteen years of age. What prompted the contest I can no longer remember, but the notes began one day when I challenged a high-school classmate to prove the quadratic formula. The classmate responded that he didn’t need to prove the quadratic formula because the proof was in the class math textbook, then counterchallenged me to prove the Pythagorean theorem. Admittedly obnoxious (I was fifteen, after all) but not to be outdone, I whipped out a pencil and paper on the spot and started working. But I found that I could not prove the theorem that day. The next day I did find a proof in the school library,1 writing it down, adding to it the proof of the quadratic formula plus a rather inefficient proof of my own invention of the law of cosines. Soon thereafter the school’s chemistry instructor happened to mention that the angle between the four tetrahedrally arranged carbon-hydrogen bonds in a methane molecule was 109°, so from a symmetry argument I proved that result to myself, too, adding it to my little collection of proofs. That is how it started.2

1 A better proof is found in § 2.10.
2 Fellow gear-heads who lived through that era at about the same age might want to date me against the disappearance of the slide rule. Answer: in my country, or at least at my high school, I was three years too young to use a slide rule. The kids born in 1964 learned the slide rule; those born in 1965 did not. I wasn’t born till 1967, so for better or for worse I always had a pocket calculator in high school. My family had an eight-bit computer at home, too, as we shall see.

The book actually has earlier roots than these. In 1979, when I was twelve years old, my father bought our family’s first eight-bit computer. The computer’s built-in BASIC programming-language interpreter exposed
functions for calculating sines and cosines of angles. The interpreter’s manual included a diagram much like Fig. 3.1 showing what sines and cosines were, but it never explained how the computer went about calculating such quantities. This bothered me at the time. Many hours with a pencil I spent trying to figure it out, yet the computer’s trigonometric functions remained mysterious to me. When later in high school I learned of the use of the Taylor series to calculate trigonometrics, into my growing collection of proofs the series went. Five years after the Pythagorean incident I was serving the U.S. Army as an enlisted troop in the former West Germany. Although those were the last days of the Cold War, there was no shooting war at the time, so the duty was peacetime duty. My duty was in military signal intelligence, frequently in the middle of the German night when there often wasn’t much to do. The platoon sergeant wisely condoned neither novels nor cards on duty, but he did let the troops read the newspaper after midnight when things were quiet enough. Sometimes I used the time to study my German—the platoon sergeant allowed this, too—but I owned a copy of Richard P. Feynman’s Lectures on Physics [13] which I would sometimes read instead. Late one night the battalion commander, a lieutenant colonel and West Point graduate, inspected my platoon’s duty post by surprise. A lieutenant colonel was a highly uncommon apparition at that hour of a quiet night, so when that old man appeared suddenly with the sergeant major, the company commander and the first sergeant in tow—the last two just routed from their sleep, perhaps—surprise indeed it was. The colonel may possibly have caught some of my unlucky fellows playing cards that night—I am not sure— but me, he caught with my boots unpolished, reading the Lectures. I snapped to attention. The colonel took a long look at my boots without saying anything, as stormclouds gathered on the first sergeant’s brow at his left shoulder, then asked me what I had been reading. “Feynman’s Lectures on Physics, sir.” “Why?” “I am going to attend the university when my three-year enlistment is up, sir.” “I see.” Maybe the old man was thinking that I would do better as a scientist than as a soldier? Maybe he was remembering when he had had to read some of the Lectures himself at West Point. Or maybe it was just the singularity of the sight in the man’s eyes, as though he were a medieval knight at bivouac who had caught one of the peasant levies, thought to be illiterate, reading Cicero in the original Latin. The truth of this, we shall never know. What the old man actually said was, “Good work, son. Keep
it up.” The stormclouds dissipated from the first sergeant’s face. No one ever said anything to me about my boots (in fact as far as I remember, the first sergeant—who saw me seldom in any case—never spoke to me again). The platoon sergeant thereafter explicitly permitted me to read the Lectures on duty after midnight on nights when there was nothing else to do, so in the last several months of my military service I did read a number of them. It is fair to say that I also kept my boots better polished.

In Volume I, Chapter 6, of the Lectures there is a lovely introduction to probability theory. It discusses the classic problem of the “random walk” in some detail, then states without proof that the generalization of the random walk leads to the Gaussian distribution

    p(x) = exp(−x²/2σ²) / (σ√(2π)).

For the derivation of this remarkable theorem, I scanned the book in vain. One had no Internet access in those days, but besides a well equipped gym the Army post also had a tiny library, and in one yellowed volume in the library—who knows how such a book got there?—I did find a derivation of the 1/(σ√(2π)) factor.3 The exponential factor, the volume did not derive. Several days later, I chanced to find myself in Munich with an hour or two to spare, which I spent in the university library seeking the missing part of the proof, but lack of time and unfamiliarity with such a German site defeated me. Back at the Army post, I had to sweat the proof out on my own over the ensuing weeks. Nevertheless, eventually I did obtain a proof which made sense to me. Writing the proof down carefully, I pulled the old high-school math notes out of my military footlocker (for some reason I had kept the notes and even brought them to Germany), dusted them off, and added to them the new Gaussian proof.

3 The citation is now unfortunately long lost.

That is how it has gone. To the old notes, I have added new proofs from time to time; and although somehow I have misplaced the original high-school leaves I took to Germany with me, the notes have nevertheless grown with the passing years. After I had left the Army, married, taken my degree at the university, begun work as a building construction engineer, and started a family—when the latest proof to join my notes was a mathematical justification of the standard industrial construction technique for measuring the resistance-to-ground of a new building’s electrical grounding system—I was delighted to discover that Eric W. Weisstein had compiled
and published [51] a wide-ranging collection of mathematical results in a spirit not entirely dissimilar to that of my own notes. A significant difference remained, however, between Weisstein’s work and my own. The difference was and is fourfold:

1. Number theory, mathematical recreations and odd mathematical names interest Weisstein much more than they interest me; my own tastes run toward math directly useful in known physical applications. The selection of topics in each body of work reflects this difference.

2. Weisstein often includes results without proof. This is fine, but for my own part I happen to like proofs.

3. Weisstein lists results encyclopedically, alphabetically by name. I organize results more traditionally by topic, leaving alphabetization to the book’s index, that readers who wish to do so can coherently read the book from front to back.4

4. I have eventually developed an interest in the free-software movement, joining it as a Debian Developer [10]; and by these lights and by the standard of the Debian Free Software Guidelines (DFSG) [11], Weisstein’s work is not free. No objection to non-free work as such is raised here, but the book you are reading is free in the DFSG sense.

A different mathematical reference, even better in some ways than Weisstein’s and (if I understand correctly) indeed free in the DFSG sense, is emerging at the time of this writing in the on-line pages of the general-purpose encyclopedia Wikipedia [52]. Although Wikipedia reads unevenly, remains generally uncitable,5 forms no coherent whole, and seems to suffer a certain competition among some of its mathematical editors as to which of them can explain a thing most reconditely, it is laden with mathematical knowledge.
4 There is an ironic personal story in this. As children in the 1970s, my brother and I had a 1959 World Book encyclopedia in our bedroom, about twenty volumes. It was then a bit outdated (in fact the world had changed tremendously in the fifteen or twenty years following 1959, so the encyclopedia was more than a bit outdated) but the two of us still used it sometimes. Only years later did I learn that my father, who in 1959 was fourteen years old, had bought the encyclopedia with money he had earned delivering newspapers daily before dawn, and then had read the entire encyclopedia, front to back. My father played linebacker on the football team and worked a job after school, too, so where he found the time or the inclination to read an entire encyclopedia, I’ll never know. Nonetheless, it does prove that even an encyclopedia can be read from front to back. 5 Some “Wikipedians” do seem actively to be working on making Wikipedia authoritatively citable. The underlying philosophy and basic plan of Wikipedia admittedly tend to thwart their efforts, but their efforts nevertheless seem to continue to progress. We shall see. Wikipedia is a remarkable, monumental creation.

Convention, in all its richness, in all its profundity, is a peculiar thing: at its best, it evolves or accumulates only gradually, patiently storing up the long, hidden wisdom of generations past; yet, sometimes, even in such an inherently unconservative discipline as mathematics, it can stagnate at a local maximum, a hillock whence higher ground is achievable not by gradual ascent but only by descent first—or by a leap. A leap risks a fall. Descent risks a bog. One ought not run such risks without cause, yet herein arises the ancient dilemma: a book can follow convention or depart from it, and though occasional departure might render a book original, frequent departure seldom renders a book good. Whether this particular book is original or good, neither or both, is for the reader to tell, but in any case the book does both follow and depart. It risks one leap at least: it employs hexadecimal numerals.

This book is bound to lose at least a few readers for its unorthodox use of hexadecimal notation (“The first primes are 2, 3, 5, 7, 0xB, . . . ”). Perhaps it will gain a few readers for the same reason; time will tell. I started keeping my own theoretical math notes in hex a long time ago, at first to prove to myself that I could do hexadecimal arithmetic routinely and accurately with a pencil, later from aesthetic conviction that it was the right thing to do. Like other applied mathematicians, I’ve several own private notations, and in general these are not permitted to burden the published text. The hex notation is not my own, though. It existed before I arrived on the scene, and since I know of no math book better positioned to risk its use, I have with hesitation and no little trepidation resolved to let this book use it. Some readers will approve; some will tolerate; undoubtedly some will do neither. The views of the last group must be respected, but in the meantime the book has a mission, and crass popularity can be only one consideration, to be balanced against other factors. Well, the book might gain even more readers, after all, had it no formulas, and painted landscapes in place of geometric diagrams! I like landscapes, too, but anyway you can see where that line of logic leads.

More substantively: despite the book’s title, adverse criticism from some quarters for lack of rigor is probably inevitable; nor is such criticism necessarily improper from my point of view. Still, serious books by professional mathematicians tend to be for professional mathematicians, which is understandable but does not always help the scientist or engineer who wants to use the math to model something. The ideal author of such a book as this would probably hold two doctorates: one in mathematics and the other in engineering or the like. The ideal author lacking, I have written the book.

So here you have my old high-school notes, extended over twenty years and through the course of two-and-a-half university degrees, now partly typed and revised for the first time as a LaTeX manuscript. Where this manuscript will go in the future is hard to guess. Perhaps the revision you are reading is the last. Who can say? The manuscript met an uncommonly enthusiastic reception at Debconf 6 [10] May 2006 at Oaxtepec, Mexico, and in August of the same year it warmly welcomed Karl Sarnow and Xplora Knoppix [54] aboard as the second official distributor of the book. Such developments augur well for the book’s future at least. But in the meantime, if anyone should challenge you to prove the Pythagorean theorem on the spot, why, whip this book out and turn to § 2.10. That should confound ’em.

THB

Bibliography

[1] Larry C. Andrews. Special Functions of Mathematics for Engineers. Macmillan, New York, 1985.
[2] Christopher Beattie and John Rossi. Notes on Matrix Theory. Unpublished, Department of Mathematics, Virginia Polytechnic Institute and State University, Blacksburg, Va., 2007.
[3] R. Byron Bird, Warren E. Stewart, and Edwin N. Lightfoot. Transport Phenomena. John Wiley & Sons, New York, 1960.
[4] Kristie H. Black. Private conversation.
[5] Thaddeus H. Black. Derivations of Applied Mathematics. The Debian Project, http://www.debian.org/, 19 April 2008.
[6] Gary S. Brown. Lecture, Virginia Polytechnic Institute and State University, Blacksburg, Va., 2004–08.
[7] David K. Cheng. Field and Wave Electromagnetics. Series in Electrical Engineering. Addison-Wesley, Reading, Mass., 2nd edition, 1989.
[8] Richard Courant and David Hilbert. Methods of Mathematical Physics. Interscience (Wiley), New York, first English edition, 1953.
[9] Eric de Sturler. Lecture, Virginia Polytechnic Institute and State University, Blacksburg, Va., 2007.
[10] The Debian Project. http://www.debian.org/.
[11] The Debian Project. Debian Free Software Guidelines, version 1.1. http://www.debian.org/social_contract#guidelines.
[12] G. Doetsch. Guide to the Applications of the Laplace and z-Transforms. Van Nostrand Reinhold, London, 1971. Referenced indirectly by way of [36].
[13] Richard P. Feynman, Robert B. Leighton, and Matthew Sands. The Feynman Lectures on Physics. Addison-Wesley, Reading, Mass., 1963–65. Three volumes.
[14] Stephen D. Fisher. Complex Variables. Books on Mathematics. Dover, Mineola, N.Y., 2nd edition, 1990.
[15] Joel N. Franklin. Matrix Theory. Books on Mathematics. Dover, Mineola, N.Y., 1968.
[16] The Free Software Foundation. GNU General Public License, version 2. /usr/share/common-licenses/GPL-2 on a Debian system. The Debian Project: http://www.debian.org/. The Free Software Foundation: 51 Franklin St., Fifth Floor, Boston, Mass. 02110-1301, USA.
[17] Stephen H. Friedberg, Arnold J. Insel, and Lawrence E. Spence. Linear Algebra. Pearson Education/Prentice-Hall, Upper Saddle River, N.J., 4th edition, 2003.
[18] Edward Gibbon. The History of the Decline and Fall of the Roman Empire. 1788.
[19] William Goldman. The Princess Bride. Ballantine, New York, 1973.
[20] Richard W. Hamming. Methods of Mathematics Applied to Calculus, Probability, and Statistics. Books on Mathematics. Dover, Mineola, N.Y., 2004.
[21] Roger F. Harrington. Time-harmonic Electromagnetic Fields. Texts in Electrical Engineering. McGraw-Hill, New York, 1961.
[22] Harvard University (author unknown). “Math 21B review”. http://www.math.harvard.edu/archive/21b_fall_03/final/21breview.pdf, 13 Jan. 2004.
[23] Jim Hefferon. Linear Algebra. Mathematics, St. Michael’s College, Colchester, Vt., 20 May 2006. (The book is free software, and besides is the best book of the kind the author of the book you hold has encountered. As of 3 Nov. 2007 at least, one can download it from http://joshua.smcvt.edu/linearalgebra/.)
[24] Francis B. Hildebrand. Advanced Calculus for Applications. Prentice-Hall, Englewood Cliffs, N.J., 2nd edition, 1976.
[25] Theo Hopman. “Introduction to indicial notation”. http://www.uoguelph.ca/~thopman/246/indicial.pdf, 28 Aug. 2002.
[26] Intel Corporation. IA-32 Intel Architecture Software Developer’s Manual, 19th edition, March 2006.
[27] Brian W. Kernighan and Dennis M. Ritchie. The C Programming Language. Software Series. Prentice Hall PTR, Englewood Cliffs, N.J., 2nd edition, 1988.
[28] Konrad Knopp. Theory and Application of Infinite Series. Hafner, New York, 2nd English edition, revised in accordance with the 4th German edition, 1947.
[29] Werner E. Kohler. Lecture, Virginia Polytechnic Institute and State University, Blacksburg, Va., 2007.
[30] The Labor Law Talk. http://encyclopedia.laborlawtalk.com/Applied_mathematics. As retrieved 1 Sept. 2005.
[31] David C. Lay. Linear Algebra and Its Applications. Addison-Wesley, Reading, Mass., 2nd edition, 1994.
[32] N. N. Lebedev. Special Functions and Their Applications. Books on Mathematics. Dover, Mineola, N.Y., revised English edition, 1965.
[33] David McMahon. Quantum Mechanics Demystified. Demystified Series. McGraw-Hill, New York, 2005.
[34] Ali H. Nayfeh and Balakumar Balachandran. Applied Nonlinear Dynamics: Analytical, Computational and Experimental Methods. Series in Nonlinear Science. Wiley, New York, 1995.
[35] John J. O’Connor and Edmund F. Robertson. The MacTutor History of Mathematics. School of Mathematics and Statistics, University of St. Andrews, Scotland, http://www-history.mcs.st-andrews.ac.uk/. As retrieved 12 Oct. 2005 through 2 Nov. 2005.
[36] Charles L. Phillips and John M. Parr. Signals, Systems and Transforms. Prentice-Hall, Englewood Cliffs, N.J., 2nd edition, 1995.
[37] PlanetMath.org. Planet Math. http://www.planetmath.org/. As retrieved 21 Sept. 2007 through 20 Feb. 2008.
[38] Reed College (author unknown). “Levi-Civita symbol, Lecture 1, Physics 321: Electrodynamics”. http://academic.reed.edu/physics/courses/Physics321/page1/files/LCHandout.pdf, 27 Aug. 2007.
[39] Carl Sagan. Cosmos. Random House, New York, 1980.
[40] Wayne A. Scales. Lecture, Virginia Polytechnic Institute and State University, Blacksburg, Va., 2004.
[41] Adel S. Sedra and Kenneth C. Smith. Microelectronic Circuits. Series in Electrical Engineering. Oxford University Press, New York, 3rd edition, 1991.
[42] Al Shenk. Calculus and Analytic Geometry. Scott, Foresman & Co., Glenview, Ill., 3rd edition, 1984.
[43] William L. Shirer. The Rise and Fall of the Third Reich. Simon & Schuster, New York, 1960.
[44] Murray R. Spiegel. Complex Variables: with an Introduction to Conformal Mapping and Its Applications. Schaum’s Outline Series. McGraw-Hill, New York, 1964.
[45] Susan Stepney. “Euclid’s proof that there are an infinite number of primes”. http://www-users.cs.york.ac.uk/susan/cyc/p/primeprf.htm. As retrieved 28 April 2006.
[46] James Stewart, Lothar Redlin, and Saleem Watson. Precalculus: Mathematics for Calculus. Brooks/Cole, Pacific Grove, Calif., 3rd edition, 1993.
[47] Julius Adams Stratton. Electromagnetic Theory. International Series in Pure and Applied Physics. McGraw-Hill, New York, 1941.
[48] Bjarne Stroustrup. The C++ Programming Language. Addison-Wesley, Boston, “special” (third-and-a-half?) edition, 2000.
[49] Henk A. van der Vorst. Iterative Krylov Methods for Large Linear Systems. Number 13 in Monographs on Applied and Computational Mathematics. Cambridge University Press, Cambridge, 2003.
[50] Eric W. Weisstein. Mathworld. http://mathworld.wolfram.com/. As retrieved 29 May 2006 through 20 Feb. 2008.
[51] Eric W. Weisstein. CRC Concise Encyclopedia of Mathematics. Chapman & Hall/CRC, Boca Raton, Fla., 2nd edition, 2003.
[52] The Wikimedia Foundation. Wikipedia. http://en.wikipedia.org/.
[53] J. H. Wilkinson. The Algebraic Eigenvalue Problem. Monographs on Numerical Analysis. Clarendon Press, Oxford, 1965.
[54] Xplora. Xplora Knoppix. http://www.xplora.org/downloads/Knoppix/.


