
Version 0.11

(UCSD Math 174, Fall 2004)

Steven E. Pav (Department of Mathematics, MC0112, University of California at San Diego, La Jolla, CA 92093-0112; <spav@ucsd.edu>)

October 13, 2005

This document is Copyright © 2004 Steven E. Pav. Permission is granted to copy, distribute and/or modify this document under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled “GNU Free Documentation License”.

Preface

These notes were originally prepared during Fall quarter 2003 for UCSD Math 174, Numerical Methods. In writing these notes, it was not my intention to add to the glut of Numerical Analysis texts; they were designed to complement the course text, Numerical Mathematics and Computing, Fourth edition, by Cheney and Kincaid [7]. As such, these notes follow the conventions of that text fairly closely. If you are at all serious about pursuing study of Numerical Analysis, you should consider acquiring that text, or any one of a number of other fine texts, such as those by Epperson, Hamming, and others [3, 4, 5].

Figure 1: The chapter dependency of this text, though some dependencies are weak. [Dependency graph omitted; its nodes are labeled 1.1, 1.4, 3.1, 3.2, 3.3, 3.4, 4.2, 4.3, 5.1, 5.2, 7.1, 7.2, 8.2, 8.3, 8.4, 9.1, 9.2, 9.3, 10.1, 10.2, and 10.3.]

Special thanks go to the students of Math 174, 2003–2004, who suffered through early versions of these notes, which were riddled with (more) errors.

Revision History

0.0 Transcription of course notes for Math 174, Fall 2003.

0.1 As used in Math 174, Fall 2004.

0.11 Added material on functional analysis and Orthogonal Least Squares.

Todo

• More homework questions and example problems.

• Chapter on optimization.

• Chapters on basic finite difference and finite element methods?

• Section on root finding for functions of more than one variable.

Contents

Preface

1 Introduction
  1.1 Taylor’s Theorem
  1.2 Loss of Significance
  1.3 Vector Spaces, Inner Products, Norms
    1.3.1 Vector Space
    1.3.2 Inner Products
    1.3.3 Norms
  1.4 Eigenvalues
    1.4.1 Matrix Norms
  Exercises

2 A “Crash” Course in octave/Matlab
  2.1 Getting Started
  2.2 Useful Commands
  2.3 Programming and Control
    2.3.1 Logical Forks and Control
  2.4 Plotting
  Exercises

3 Solving Linear Systems
  3.1 Gaussian Elimination with Naïve Pivoting
    3.1.1 Elementary Row Operations
    3.1.2 Algorithm Terminology
    3.1.3 Algorithm Problems
  3.2 Pivoting Strategies for Gaussian Elimination
    3.2.1 Scaled Partial Pivoting
    3.2.2 An Example
    3.2.3 Another Example and A Real Algorithm
  3.3 LU Factorization
    3.3.1 An Example
    3.3.2 Using LU Factorizations
    3.3.3 Some Theory
    3.3.4 Computing Inverses
  3.4 Iterative Solutions
    3.4.1 An Operation Count for Gaussian Elimination
    3.4.2 Dividing by Multiplying
    3.4.3 Impossible Iteration
    3.4.4 Richardson Iteration
    3.4.5 Jacobi Iteration
    3.4.6 Gauss Seidel Iteration
    3.4.7 Error Analysis
    3.4.8 A Free Lunch?
  Exercises

4 Finding Roots
  4.1 Bisection
    4.1.1 Modifications
    4.1.2 Convergence
  4.2 Newton’s Method
    4.2.1 Implementation
    4.2.2 Problems
    4.2.3 Convergence
    4.2.4 Using Newton’s Method
  4.3 Secant Method
    4.3.1 Problems
    4.3.2 Convergence
  Exercises

5 Interpolation
  5.1 Polynomial Interpolation
    5.1.1 Lagranges Method
    5.1.2 Newton’s Method
    5.1.3 Newton’s Nested Form
    5.1.4 Divided Differences
  5.2 Errors in Polynomial Interpolation
    5.2.1 Interpolation Error Theorem
    5.2.2 Interpolation Error for Equally Spaced Nodes
  Exercises

6 Spline Interpolation
  6.1 First and Second Degree Splines
    6.1.1 First Degree Spline Accuracy
    6.1.2 Second Degree Splines
    6.1.3 Computing Second Degree Splines
  6.2 (Natural) Cubic Splines
    6.2.1 Why Natural Cubic Splines?
    6.2.2 Computing Cubic Splines
  6.3 B Splines
  Exercises

7 Approximating Derivatives
  7.1 Finite Differences
    7.1.1 Approximating the Second Derivative
  7.2 Richardson Extrapolation
    7.2.1 Abstracting Richardson’s Method
    7.2.2 Using Richardson Extrapolation
  Exercises

8 Integrals and Quadrature
  8.1 The Definite Integral
    8.1.1 Upper and Lower Sums
    8.1.2 Approximating the Integral
    8.1.3 Simple and Composite Rules
  8.2 Trapezoidal Rule
    8.2.1 How Good is the Composite Trapezoidal Rule?
    8.2.2 Using the Error Bound
  8.3 Romberg Algorithm
    8.3.1 Recursive Trapezoidal Rule
  8.4 Gaussian Quadrature
    8.4.1 Determining Weights (Lagrange Polynomial Method)
    8.4.2 Determining Weights (Method of Undetermined Coefficients)
    8.4.3 Gaussian Nodes
    8.4.4 Determining Gaussian Nodes
    8.4.5 Reinventing the Wheel
  Exercises

9 Least Squares
  9.1 Least Squares
    9.1.1 The Definition of Ordinary Least Squares
    9.1.2 Linear Least Squares
    9.1.3 Least Squares from Basis Functions
  9.2 Orthonormal Bases
    9.2.1 Alternatives to Normal Equations
    9.2.2 Ordinary Least Squares in octave/Matlab
  9.3 Orthogonal Least Squares
    9.3.1 Computing the Orthogonal Least Squares Approximant
    9.3.2 Principal Component Analysis
  Exercises

10 Ordinary Differential Equations
  10.1 Elementary Methods
    10.1.1 Integration and ‘Stepping’
    10.1.2 Taylor’s Series Methods
    10.1.3 Euler’s Method
    10.1.4 Higher Order Methods
    10.1.5 Errors
    10.1.6 Stability
    10.1.7 Backwards Euler’s Method
  10.2 Runge-Kutta Methods
    10.2.1 Taylor’s Series Redux
    10.2.2 Deriving the Runge-Kutta Methods
    10.2.3 Examples
  10.3 Systems of ODEs
    10.3.1 Larger Systems
    10.3.2 Recasting Single ODE Methods
    10.3.3 It’s Only Systems
    10.3.4 It’s Only Autonomous Systems
  Exercises

A Old Exams
  A.1 First Midterm, Fall 2003
  A.2 Second Midterm, Fall 2003
  A.3 Final Exam, Fall 2003
  A.4 First Midterm, Fall 2004
  A.5 Second Midterm, Fall 2004
  A.6 Final Exam, Fall 2004

B GNU Free Documentation License
  1. APPLICABILITY AND DEFINITIONS
  2. VERBATIM COPYING
  3. COPYING IN QUANTITY
  4. MODIFICATIONS
  5. COMBINING DOCUMENTS
  6. COLLECTIONS OF DOCUMENTS
  7. AGGREGATION WITH INDEPENDENT WORKS
  8. TRANSLATION
  9. TERMINATION
  10. FUTURE REVISIONS OF THIS LICENSE
  ADDENDUM: How to use this License for your documents

Bibliography

Chapter 1

Introduction

1.1 Taylor’s Theorem

Recall from calculus that the Taylor’s series for a function, f(x), expanded about some number, c, is written as

$$f(x) \sim a_0 + a_1 (x - c) + a_2 (x - c)^2 + \ldots.$$

Here the symbol ∼ is used to denote a “formal series,” meaning that convergence is not guaranteed in general. The constants $a_i$ are related to the function f and its derivatives evaluated at c. When c = 0, this is a MacLaurin series.

For example, we have the following Taylor’s series (with c = 0):

$$e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \ldots \tag{1.1}$$

$$\sin(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \ldots \tag{1.2}$$

$$\cos(x) = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \ldots \tag{1.3}$$

Theorem 1.1 (Taylor’s Theorem). If f(x) has derivatives of order 0, 1, 2, . . . , n+1 on the closed interval [a, b], then for any x and c in this interval

$$f(x) = \sum_{k=0}^{n} \frac{f^{(k)}(c)\,(x - c)^k}{k!} + \frac{f^{(n+1)}(\xi)\,(x - c)^{n+1}}{(n+1)!},$$

where ξ is some number between x and c, and $f^{(k)}(x)$ is the $k$th derivative of f at x.

We will use this theorem again and again in this class. The main usage is to approximate a function by the first few terms of its Taylor’s series expansion; the theorem then tells us the approximation is “as good” as the final term, also known as the error term. That is, we can make the following manipulation:

$$f(x) = \sum_{k=0}^{n} \frac{f^{(k)}(c)\,(x - c)^k}{k!} + \frac{f^{(n+1)}(\xi)\,(x - c)^{n+1}}{(n+1)!}$$

$$f(x) - \sum_{k=0}^{n} \frac{f^{(k)}(c)\,(x - c)^k}{k!} = \frac{f^{(n+1)}(\xi)\,(x - c)^{n+1}}{(n+1)!}$$

$$\left| f(x) - \sum_{k=0}^{n} \frac{f^{(k)}(c)\,(x - c)^k}{k!} \right| = \frac{\left| f^{(n+1)}(\xi) \right| \, |x - c|^{n+1}}{(n+1)!}.$$

On the left hand side is the difference between f(x) and its approximation by Taylor’s series. We will then use our knowledge about $f^{(n+1)}(\xi)$ on the interval [a, b] to find some constant M such that

$$\left| f(x) - \sum_{k=0}^{n} \frac{f^{(k)}(c)\,(x - c)^k}{k!} \right| = \frac{\left| f^{(n+1)}(\xi) \right| \, |x - c|^{n+1}}{(n+1)!} \le M\,|x - c|^{n+1}.$$

Example Problem 1.2. Find an approximation for f(x) = sin x, expanded about c = 0, using n = 3.

Solution: Solving for $f^{(k)}$ is fairly easy for this function. We find that

$$f(x) = \sin x = \sin(0) + \frac{\cos(0)\,x}{1!} + \frac{-\sin(0)\,x^2}{2!} + \frac{-\cos(0)\,x^3}{3!} + \frac{\sin(\xi)\,x^4}{4!} = x - \frac{x^3}{6} + \frac{\sin(\xi)\,x^4}{24},$$

so

$$\left| \sin x - \left( x - \frac{x^3}{6} \right) \right| = \left| \frac{\sin(\xi)\,x^4}{24} \right| \le \frac{x^4}{24},$$

because $|\sin(\xi)| \le 1$. ⊣
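The bound in Example Problem 1.2 can be checked numerically. The following is a minimal Python sketch, not part of the original notes; the function names `taylor_sin_n3` and `error_bound` are made up for illustration.

```python
import math

def taylor_sin_n3(x):
    """Degree-3 Taylor polynomial of sin about c = 0 (hypothetical helper)."""
    return x - x**3 / 6.0

def error_bound(x):
    """The bound x^4 / 24 from the error term, using |sin(xi)| <= 1."""
    return x**4 / 24.0

# Compare the actual error with the bound at a few sample points.
for x in (0.1, 0.5, 1.0):
    err = abs(math.sin(x) - taylor_sin_n3(x))
    print(f"x = {x}: error = {err:.3e}, bound = {error_bound(x):.3e}")
```

At each sample point the printed error should sit below the printed bound, and both shrink quickly as x approaches 0.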

Example Problem 1.3. Apply Taylor’s Theorem for the case n = 0.

Solution: Taylor’s Theorem for n = 0 states: Given a function f(x) with a continuous derivative on [a, b], then

$$f(x) = f(c) + f'(\xi)(x - c)$$

for some ξ between x and c, when x and c are in [a, b].

This is the Mean Value Theorem. As a one-liner, the MVT says that at some time during a trip, your velocity is the same as your average velocity for the trip. ⊣

Example Problem 1.4. Apply Taylor’s Theorem to expand $f(x) = x^3 - 21x^2 + 17$ around c = 1.

Solution: Simple calculus gives us

$$f^{(0)}(x) = x^3 - 21x^2 + 17,$$
$$f^{(1)}(x) = 3x^2 - 42x,$$
$$f^{(2)}(x) = 6x - 42,$$
$$f^{(3)}(x) = 6,$$
$$f^{(k)}(x) = 0,$$

with the last holding for k > 3. Evaluating these at c = 1 gives

$$f(x) = -3 + (-39)(x - 1) + \frac{-36\,(x - 1)^2}{2} + \frac{6\,(x - 1)^3}{6}.$$

Note there is no error term, since the higher order derivatives are identically zero. By carrying out simple algebra, you will find that the above expansion is, in fact, the function f(x). ⊣
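The "simple algebra" can also be spot-checked by evaluating both forms at a few points. This is an illustrative Python sketch; the names `f` and `taylor_about_1` are ours.

```python
def f(x):
    """The original cubic from Example Problem 1.4."""
    return x**3 - 21 * x**2 + 17

def taylor_about_1(x):
    """The expansion about c = 1, with coefficients f(1), f'(1), f''(1)/2, f'''(1)/6."""
    d = x - 1
    return -3 - 39 * d - 36 * d**2 / 2 + 6 * d**3 / 6

# The two forms should agree at every x, since the error term is zero.
for x in (-2, 0, 1, 3, 10):
    print(x, f(x), taylor_about_1(x))
```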

There is an alternative form of Taylor’s Theorem, in this case substituting x + h for x, and x for c in the more general version. This gives

Theorem 1.5 (Taylor’s Theorem, Alternative Form). If f(x) has derivatives of order 0, 1, . . . , n+1 on the closed interval [a, b], then for any x in this interval and any h such that x + h is in this interval,

$$f(x + h) = \sum_{k=0}^{n} \frac{f^{(k)}(x)\,h^k}{k!} + \frac{f^{(n+1)}(\xi)\,h^{n+1}}{(n+1)!},$$

where ξ is some number between x and x + h.

We generally apply this form of the theorem with h → 0. This leads to a discussion on the matter of Orders of Convergence. The following definition will suffice for this class:

Definition 1.6. We say that a function f(h) is in the class $\mathcal{O}(h^k)$ (pronounced “big-Oh of $h^k$”) if there is some constant C such that

$$|f(h)| \le C\,|h|^k$$

for all h “sufficiently small,” i.e., smaller than some $h^*$ in absolute value.

For a function $f \in \mathcal{O}(h^k)$ we sometimes write $f = \mathcal{O}(h^k)$. We sometimes also write $\mathcal{O}(h^k)$, meaning some function which is a member of this class.

Roughly speaking, through use of the “Big-O” function we can write an expression without “sweating the small stuff.” This can give us an intuitive understanding of how an approximation works, without losing too many of the details.

Example 1.7. Consider the Taylor expansion of ln x:

$$\ln(x + h) = \ln x + \frac{(1/x)\,h}{1} + \frac{(-1/x^2)\,h^2}{2} + \frac{(2/\xi^3)\,h^3}{6}.$$

Letting x = 1, we have

$$\ln(1 + h) = h - \frac{h^2}{2} + \frac{1}{3\xi^3}\,h^3.$$

Using the fact that ξ is between 1 and 1 + h, as long as h is relatively small (say smaller than $\frac{1}{2}$), the term $\frac{1}{3\xi^3}$ can be bounded by a constant, and thus

$$\ln(1 + h) = h - \frac{h^2}{2} + \mathcal{O}(h^3).$$

Thus we say that $h - \frac{h^2}{2}$ is an $\mathcal{O}(h^3)$ approximation to ln(1 + h). For example

$$\ln(1 + 0.01) \approx 0.009950331 \approx 0.00995 = 0.01 - \frac{0.01^2}{2}.$$
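The order of the approximation in Example 1.7 can be observed numerically: the error divided by $h^3$ should stay bounded as h shrinks. The sketch below is ours, not the author’s; it uses Python’s `math.log1p` for an accurate reference value, and `approx_log1p` is a hypothetical name.

```python
import math

def approx_log1p(h):
    """The O(h^3) approximation h - h^2/2 to ln(1 + h)."""
    return h - h * h / 2.0

# If the error is O(h^3), then error / h^3 stays bounded as h -> 0.
for h in (0.1, 0.01, 0.001):
    err = abs(math.log1p(h) - approx_log1p(h))
    print(f"h = {h}: error = {err:.3e}, error / h^3 = {err / h**3:.4f}")
```

For positive h the ratio should stay at or below 1/3, consistent with the term $\frac{1}{3\xi^3}$ for ξ between 1 and 1 + h.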


1.2 Loss of Signiﬁcance

Generally speaking, a computer stores a number x as a mantissa and exponent, that is $x = \pm r \times 10^k$, where r is a rational number of a given number of digits in [0.1, 1), and k is an integer in a certain range.

The number of significant digits in r is usually determined by the user’s input. Operations on numbers stored in this way follow a “lowest common denominator” type of rule, i.e., precision cannot be gained but can be lost. Thus for example if you add the two quantities 0.171717 and 0.51, then the result should only have two significant digits; the precision of the first measurement is lost in the uncertainty of the second.

This is as it should be. However, a loss of significance can be incurred if two nearly equal quantities are subtracted from one another. Thus if I were to direct my computer to subtract 0.177241 from 0.177589, the result would be $0.348 \times 10^{-3}$, and three significant digits have been lost. This loss is called subtractive cancellation, and can often be avoided by rewriting the expression. This will be made clearer by the examples below.

Errors can also occur when quantities of radically different magnitudes are summed. For example $0.1234 + 5.6789 \times 10^{-20}$ might be rounded to 0.1234 by a system that keeps only 16 significant digits. This may lead to unexpected results.

The usual strategies for rewriting subtractive expressions are completing the square, factoring, or using the Taylor expansions, as the following examples illustrate.

Example Problem 1.8. Consider the stability of $\sqrt{x + 1} - 1$ when x is near 0. Rewrite the expression to rid it of subtractive cancellation.

Solution: Suppose that $x = 1.2345678 \times 10^{-5}$. Then $\sqrt{x + 1} \approx 1.000006173$. If your computer (or calculator) can only keep 8 significant digits, this will be rounded to 1.0000062. When 1 is subtracted, the result is $6.2 \times 10^{-6}$. Thus 6 significant digits have been lost from the original.

To fix this, we rationalize the expression:

$$\sqrt{x + 1} - 1 = \left( \sqrt{x + 1} - 1 \right) \frac{\sqrt{x + 1} + 1}{\sqrt{x + 1} + 1} = \frac{x + 1 - 1}{\sqrt{x + 1} + 1} = \frac{x}{\sqrt{x + 1} + 1}.$$

This expression has no subtractions, and so is not subject to subtractive cancelling. When $x = 1.2345678 \times 10^{-5}$, this expression evaluates approximately as

$$\frac{1.2345678 \times 10^{-5}}{2.0000062} \approx 6.17281995 \times 10^{-6}$$

on a machine with 8 digits; there is no loss of precision. ⊣

Note that nearly all modern computers and calculators store intermediate results of calculations

in higher precision formats. This minimizes, but does not eliminate, problems like those of the

previous example problem.
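The same effect shows up in double precision if x is made small enough. The following Python sketch is our illustration, assuming IEEE double arithmetic; the names `naive` and `rationalized` are hypothetical. It compares the two forms of Example Problem 1.8 at $x = 10^{-15}$, where the true value is very nearly x/2.

```python
import math

def naive(x):
    """sqrt(x + 1) - 1 as written; subject to subtractive cancellation."""
    return math.sqrt(x + 1.0) - 1.0

def rationalized(x):
    """The rewritten form x / (sqrt(x + 1) + 1); no subtraction."""
    return x / (math.sqrt(x + 1.0) + 1.0)

x = 1.0e-15
print("naive        :", naive(x))
print("rationalized :", rationalized(x))
print("x / 2        :", x / 2.0)
```

The naive form keeps only a digit or so of accuracy here, while the rationalized form agrees with x/2 to nearly full precision.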

Example Problem 1.9. Write stable code to find the roots of the equation $x^2 + bx + c = 0$.

Solution: The usual quadratic formula is

$$x_\pm = \frac{-b \pm \sqrt{b^2 - 4c}}{2}.$$

Supposing that $b \gg c > 0$, the expression in the square root might be rounded to $b^2$, giving two roots $x_+ = 0$, $x_- = -b$. The latter root is nearly correct, while the former has no correct digits.

To correct this problem, multiply the numerator and denominator of $x_+$ by $-b - \sqrt{b^2 - 4c}$ to get

$$x_+ = \frac{2c}{-b - \sqrt{b^2 - 4c}}.$$

Now if $b \gg c > 0$, this expression gives root $x_+ = -c/b$, which is nearly correct. This leads to the pair:

$$x_- = \frac{-b - \sqrt{b^2 - 4c}}{2}, \qquad x_+ = \frac{2c}{-b - \sqrt{b^2 - 4c}}.$$

Note that the two roots are nearly reciprocals, and if $x_-$ is computed, $x_+$ can easily be computed with little additional work. ⊣
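One way the pair of formulas might be coded is sketched below in Python. This is an illustration rather than the author’s code: the function names are ours, and the stable version assumes b > 0 and a nonnegative discriminant, as in the example.

```python
import math

def naive_roots(b, c):
    """Both roots from the usual quadratic formula (x_+ may suffer cancellation)."""
    s = math.sqrt(b * b - 4.0 * c)
    return (-b + s) / 2.0, (-b - s) / 2.0

def stable_roots(b, c):
    """The rewritten pair; assumes b > 0 and b*b >= 4*c (our assumption)."""
    s = math.sqrt(b * b - 4.0 * c)
    x_minus = (-b - s) / 2.0        # adds two same-signed terms when b > 0
    x_plus = 2.0 * c / (-b - s)     # rationalized form of x_+
    return x_plus, x_minus

b, c = 1.0e8, 1.0
print("naive  (x_+, x_-):", naive_roots(b, c))
print("stable (x_+, x_-):", stable_roots(b, c))
```

With b = 1e8 and c = 1 the small root is close to −c/b = −1e−8. The naive formula should miss it by a large relative error, while the rewritten formula recovers it to nearly full precision.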

Example Problem 1.10. Rewrite $e^x - \cos x$ to be stable when x is near 0.

Solution: Look at the Taylor’s Series expansion for these functions:

$$e^x - \cos x = \left[ 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \frac{x^4}{4!} + \frac{x^5}{5!} + \ldots \right] - \left[ 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \ldots \right] = x + x^2 + \frac{x^3}{3!} + \mathcal{O}(x^5).$$

This expression has no subtractions, and so is not subject to subtractive cancelling. Note that we propose calculating $x + x^2 + x^3/6$ as an approximation of $e^x - \cos x$, which we cannot calculate exactly anyway. Since we assume x is nearly zero, the approximation should be good. If x is very close to zero, we may only have to take the first one or two terms. If x is not so close to zero, we may need to take all three terms, or even more terms of the expansion; if x is far from zero we should use some other technique. ⊣
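A quick Python sketch (ours, assuming IEEE double arithmetic; `direct` and `taylor_approx` are hypothetical names) compares the direct evaluation with the three-term approximation at $x = 10^{-9}$:

```python
import math

def direct(x):
    """exp(x) - cos(x) evaluated as written; cancels badly near x = 0."""
    return math.exp(x) - math.cos(x)

def taylor_approx(x):
    """The three-term approximation x + x^2 + x^3/6."""
    return x + x * x + x**3 / 6.0

x = 1.0e-9
print("direct :", direct(x))
print("taylor :", taylor_approx(x))
```

At this x the neglected terms are of order $x^5$, far below double precision, so the Taylor form is accurate to rounding. The direct form loses roughly half of its digits to cancellation.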

1.3 Vector Spaces, Inner Products, Norms

We explore some of the basics of functional analysis which may be useful in this text.

1.3.1 Vector Space

A vector space is a collection of objects together with a binary operator which is defined over an algebraic field.¹ The binary operator allows transparent algebraic manipulation of vectors.

Definition 1.11. A collection of vectors, $\mathcal{V}$, with a binary addition operator, +, defined over $\mathcal{V}$, and a scalar multiply over the real field $\mathbb{R}$, forms a vector space if

1. For each $u, v \in \mathcal{V}$, the sum u + v is a vector in $\mathcal{V}$. (i.e., the space is “closed under addition.”)

2. Addition is commutative: u + v = v + u for each $u, v \in \mathcal{V}$.

3. For each $u \in \mathcal{V}$, and each scalar $\alpha \in \mathbb{R}$, the scalar product αu is a vector in $\mathcal{V}$. (i.e., the space is “closed under scalar multiplication.”)

4. There is a zero vector $\mathbf{0} \in \mathcal{V}$ such that for any $u \in \mathcal{V}$, $0u = \mathbf{0}$, where 0 is the zero of $\mathbb{R}$.

5. For any $u \in \mathcal{V}$, 1u = u, where 1 is the multiplicative identity of $\mathbb{R}$.

6. For any $u, v \in \mathcal{V}$, and scalars $\alpha, \beta \in \mathbb{R}$, both $(\alpha +_{\mathbb{R}} \beta)u = \alpha u + \beta u$ and $\alpha(u + v) = \alpha u + \alpha v$ hold, where $+_{\mathbb{R}}$ is addition in $\mathbb{R}$. (i.e., scalar multiplication distributes in both ways.)

¹ For the purposes of this text, this algebraic field will always be the real field, $\mathbb{R}$, though in the real world, the complex field, $\mathbb{C}$, has some currency.

Example 1.12. The most familiar example of a vector space is $\mathbb{R}^n$, which is the collection of n-tuples of real numbers. That is, $u \in \mathbb{R}^n$ is of the form $[u_1, u_2, \ldots, u_n]^\top$, where $u_i \in \mathbb{R}$ for i = 1, 2, . . . , n. Addition and scalar multiplication over $\mathbb{R}$ are defined pairwise:

$$[u_1, u_2, \ldots, u_n]^\top + [v_1, v_2, \ldots, v_n]^\top = [u_1 + v_1, u_2 + v_2, \ldots, u_n + v_n]^\top, \quad \text{and}$$

$$\alpha\,[u_1, u_2, \ldots, u_n]^\top = [\alpha u_1, \alpha u_2, \ldots, \alpha u_n]^\top.$$

Note that some authors distinguish between points in n-dimensional space and vectors in n-dimensional space. We will use $\mathbb{R}^n$ to refer to both of them, as in this text there is no need to distinguish them symbolically.

Example 1.13. Let $X \subseteq \mathbb{R}^k$ be a closed, bounded set, and let H be the collection of all functions from X to $\mathbb{R}$. Then H forms a vector space under the “pointwise” defined addition and scalar multiplication over $\mathbb{R}$. That is, for u, v ∈ H, u + v is the function in H defined by [u + v](x) = u(x) + v(x) for all x ∈ X. And for u ∈ H, α ∈ $\mathbb{R}$, αu is the function in H defined by [αu](x) = α(u(x)).

Example 1.14. Let $X \subseteq \mathbb{R}^k$ be a closed, bounded set, and let $H_0$ be the collection of all functions from X to $\mathbb{R}$ that take the value zero on ∂X. Then $H_0$ forms a vector space under the “pointwise” defined addition and scalar multiplication over $\mathbb{R}$. The only difference between proving $H_0$ is a vector space and the proof required for the previous example is in showing that $H_0$ is indeed closed under addition and scalar multiplication. This is simple because if x ∈ ∂X, then [u + v](x) = u(x) + v(x) = 0 + 0, and thus u + v has the property that it takes value 0 on ∂X. Similarly for αu. This would not have worked if the functions of $H_0$ were required to take some other value on ∂X, like, say, 2 instead of 0.

Example 1.15. Let $\mathcal{P}_n$ be the collection of all ‘formal’ polynomials of degree less than or equal to n with coefficients from $\mathbb{R}$. Then $\mathcal{P}_n$ forms a vector space over $\mathbb{R}$.

Example 1.16. The collection of all real-valued m × n matrices forms a vector space over the reals with the usual scalar multiplication and matrix addition. This space is denoted as $\mathbb{R}^{m \times n}$. Another way of viewing this space is as the space of linear functions which carry vectors of $\mathbb{R}^n$ to vectors of $\mathbb{R}^m$.

1.3.2 Inner Products

An inner product is a way of “multiplying” two vectors from a vector space together to get a scalar from the same field the space is defined over (e.g., a real or a complex number). The inner product should have the following properties:

Definition 1.17. For a vector space, $\mathcal{V}$, defined over $\mathbb{R}$, a binary function, $\langle \cdot, \cdot \rangle$, which takes two vectors of $\mathcal{V}$ to $\mathbb{R}$ is an inner product if

1. It is symmetric: $\langle v, u \rangle = \langle u, v \rangle$.

2. It is linear in both its arguments:

$$\langle \alpha u + \beta v, w \rangle = \alpha \langle u, w \rangle + \beta \langle v, w \rangle \quad \text{and}$$

$$\langle u, \alpha v + \beta w \rangle = \alpha \langle u, v \rangle + \beta \langle u, w \rangle.$$

A binary function for which this holds is sometimes called a bilinear form.

3. It is positive: $\langle v, v \rangle \ge 0$, with equality holding if and only if v is the zero vector of $\mathcal{V}$.

Example 1.18. The most familiar example of an inner product is the $L_2$ (pronounced “L two”) inner product on the vector space $\mathbb{R}^n$. If $u = [u_1, u_2, \ldots, u_n]^\top$, and $v = [v_1, v_2, \ldots, v_n]^\top$, then letting

$$\langle u, v \rangle_2 = \sum_i u_i v_i$$

gives an inner product. This inner product is the usual vector calculus dot product and is sometimes written as $u \cdot v$ or $u^\top v$.

Example 1.19. Let H be the vector space of functions from X to $\mathbb{R}$ from Example 1.13. Then for u, v ∈ H, letting

$$\langle u, v \rangle_H = \int_X u(x)\,v(x)\,dx,$$

gives an inner product. This inner product is like the “limit case” of the $L_2$ inner product on $\mathbb{R}^n$ as n goes to infinity.

1.3.3 Norms

A norm is a way of measuring the “length” of a vector:

Definition 1.20. A function $\|\cdot\|$ from a vector space, $\mathcal{V}$, to $\mathbb{R}^+$ is called a norm if

1. It obeys the triangle inequality: $\|x + y\| \le \|x\| + \|y\|$.

2. It scales positively: $\|\alpha x\| = |\alpha|\,\|x\|$, for scalar α.

3. It is positive: $\|x\| \ge 0$, with equality only holding when x is the zero vector.

The easiest way of constructing a norm is on top of an inner product. If $\langle \cdot, \cdot \rangle$ is an inner product on vector space $\mathcal{V}$, then letting

$$\|u\| = \sqrt{\langle u, u \rangle}$$

gives a norm on $\mathcal{V}$. This is how we construct our most common norms:

Example 1.21. For vector $x \in \mathbb{R}^n$, its $L_2$ norm is defined

$$\|x\|_2 = \left( \sum_{i=1}^{n} x_i^2 \right)^{\frac{1}{2}} = \left( x^\top x \right)^{\frac{1}{2}}.$$

This is constructed on top of the $L_2$ inner product.

Example 1.22. The $L_p$ norm on $\mathbb{R}^n$ generalizes the $L_2$ norm, and is defined, for p ≥ 1 (the triangle inequality fails for 0 < p < 1), as

$$\|x\|_p = \left( \sum_{i=1}^{n} |x_i|^p \right)^{1/p}.$$

Example 1.23. The $L_\infty$ norm on $\mathbb{R}^n$ is defined as

$$\|x\|_\infty = \max_i |x_i|.$$

The $L_\infty$ norm is somehow the “limit” of the $L_p$ norm as p → ∞.
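The sense in which the maximum norm is a “limit” can be seen by computing the p-norm of a fixed vector for growing p. The following Python sketch is ours; `lp_norm` and `linf_norm` are hypothetical helper names, and the sample vector is arbitrary.

```python
def lp_norm(x, p):
    """(sum_i |x_i|^p)^(1/p) for a list of numbers x and p >= 1."""
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)

def linf_norm(x):
    """max_i |x_i|."""
    return max(abs(xi) for xi in x)

x = [3.0, -4.0, 1.0]
for p in (1, 2, 10, 50):
    print(f"p = {p:2d}: {lp_norm(x, p):.6f}")
print("max norm:", linf_norm(x))
```

For this vector the p-norm should decrease from 8 at p = 1 toward the maximum absolute entry, 4, as p grows.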


1.4 Eigenvalues

It is assumed the reader has some familiarity with linear algebra. We review the topic of eigenvalues.

Definition 1.24. A nonzero vector x is an eigenvector of a given matrix A, with corresponding eigenvalue λ, if

$$Ax = \lambda x.$$

Subtracting the right hand side from the left and gathering terms gives

$$(A - \lambda I)\,x = 0.$$

Since x is assumed to be nonzero, the matrix A − λI must be singular. A matrix is singular if and only if its determinant is zero. These steps are reversible, thus we claim λ is an eigenvalue if and only if

$$\det(A - \lambda I) = 0.$$

The left hand side can be expanded to a polynomial in λ, of degree n where A is an n × n matrix. This gives the so-called characteristic equation. Sometimes eigenvectors and eigenvalues are called characteristic vectors and characteristic values.

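Example Problem 1.25. Find the eigenvalues of

$$\begin{bmatrix} 1 & 1 \\ 4 & -2 \end{bmatrix}.$$

Solution: The eigenvalues are roots of

$$0 = \det \begin{bmatrix} 1 - \lambda & 1 \\ 4 & -2 - \lambda \end{bmatrix} = (1 - \lambda)(-2 - \lambda) - 4 = \lambda^2 + \lambda - 6.$$

This equation has roots $\lambda_1 = -3$, $\lambda_2 = 2$. ⊣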

Example Problem 1.26. Find the eigenvalues of $A^2$.

Solution: Let λ be an eigenvalue of A, with corresponding eigenvector x. Then

$$A^2 x = A(Ax) = A(\lambda x) = \lambda A x = \lambda^2 x,$$

so $\lambda^2$ is an eigenvalue of $A^2$, with the same eigenvector x. ⊣
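Both example problems can be checked in octave. This is only a sanity check, using the builtin functions eig and roots on the 2 × 2 matrix of Example Problem 1.25; the variable names are made up for the sketch.

```octave
% Eigenvalues of the matrix from Example Problem 1.25.
A = [1 1; 4 -2];
lambda = sort(eig(A));               % eigenvalues, sorted ascending

% Roots of the characteristic polynomial lambda^2 + lambda - 6.
charroots = sort(roots([1 1 -6]));

% Eigenvalues of A^2 should be the squares of those of A (Example Problem 1.26).
lambda_sq = sort(eig(A * A));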

The eigenvalues of a matrix tell us, roughly, how the linear transform scales a given vector; the

eigenvectors tell us which directions are “purely scaled.” This will make more sense when we talk

about norms of vectors and matrices.

1.4.1 Matrix Norms

Given a norm $\|\cdot\|$ on the vector space, $\mathbb{R}^n$, we can define the matrix norm “subordinate” to it, as follows:

Definition 1.27. Given a norm $\|\cdot\|$ on $\mathbb{R}^n$, we define the subordinate matrix norm on $\mathbb{R}^{n \times n}$ by

$$\|A\| = \max_{x \ne 0} \frac{\|Ax\|}{\|x\|}.$$


We will use the subordinate two-norm for matrices. From the definition of the subordinate norm as a max, we conclude that if x is a nonzero vector then

$$\frac{\|Ax\|_2}{\|x\|_2} \le \|A\|_2,$$

thus,

$$\|Ax\|_2 \le \|A\|_2 \, \|x\|_2.$$

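Example 1.28. Strange but true: If A is symmetric and Λ is the set of eigenvalues of A, then

$$\|A\|_2 = \max_{\lambda \in \Lambda} |\lambda|.$$

(For a matrix that is not symmetric, the right hand side is in general only a lower bound for $\|A\|_2$.)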

Example Problem 1.29. Prove that

$$\|AB\|_2 \le \|A\|_2 \, \|B\|_2.$$

Solution:

$$\|AB\|_2 \overset{\mathrm{df}}{=} \max_{x \ne 0} \frac{\|ABx\|_2}{\|x\|_2} \le \max_{x \ne 0} \frac{\|A\|_2 \, \|Bx\|_2}{\|x\|_2} = \|A\|_2 \, \|B\|_2.$$

⊣
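The inequalities of this subsection are easy to spot-check numerically. The sketch below uses octave's builtin norm, which for a matrix argument gives the norm subordinate to the $L^2$ norm; the matrices A, B, S and the vector x are arbitrary choices for illustration.

```octave
A = [1 2; 3 4];
B = [0 1; -1 2];
x = [1; -1];

normA = norm(A);                    % subordinate 2-norm of A
ratio = norm(A * x) / norm(x);      % one sample of the quotient in Definition 1.27

% Check ||AB|| <= ||A|| ||B|| for this pair.
prod_ok = norm(A * B) <= norm(A) * norm(B);

% For a symmetric matrix the 2-norm matches the largest |eigenvalue|.
S = [2 1; 1 2];
sym_gap = abs(norm(S) - max(abs(eig(S))));
```

A single sample ratio can only come out at or below normA; the max in the definition is attained by some particular x.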


Exercises

(1.1) Suppose $f \in O(h^k)$. Show that $f \in O(h^m)$ for any m with 0 < m < k. (Hint: Take $h^* < 1$.) Note this may appear counterintuitive, unless you remember that $O(h^k)$ is a better approximation than $O(h^m)$ when m < k.

(1.2) Suppose $f \in O(h^k)$, and $g \in O(h^m)$. Show that $fg \in O(h^{k+m})$.

(1.3) Suppose $f \in O(h^k)$, and $g \in O(h^m)$, with m < k. Show that $f + g \in O(h^m)$.

(1.4) Prove that $f(h) = -3h^5$ is in $O(h^5)$.

(1.5) Prove that $f(h) = h^2 + 5h^{17}$ is in $O(h^2)$.

(1.6) Prove that $f(h) = h^3$ is not in $O(h^4)$. (Hint: Proof by contradiction.)

(1.7) Prove that sin(h) is in O(h).

(1.8) Find a $O(h^3)$ approximation to sin h.

(1.9) Find a $O(h^4)$ approximation to ln(1+h). Compare the approximate value to the actual when h = 0.1. How does this approximation compare to the $O(h^3)$ approximate from Example 1.7 for h = 0.1?

(1.10) Suppose that $f \in O(h^k)$. Can you show that $f' \in O(h^{k-1})$?

(1.11) Rewrite $\sqrt{x + 1} - \sqrt{1}$ to get rid of subtractive cancellation when x ≈ 0.

(1.12) Rewrite $\sqrt{x + 1} - \sqrt{x}$ to get rid of subtractive cancellation when x is very large.

(1.13) Use a Taylor’s expansion to rid the expression 1 − cos x of subtractive cancellation for x small. Use a $O(x^5)$ approximate.

(1.14) Use a Taylor’s expansion to rid the expression $1 - \cos^2 x$ of subtractive cancellation for x small. Use a $O(x^6)$ approximate.

(1.15) Calculate cos(π/2 + 0.001) to within 8 decimal places by using the Taylor’s expansion.

(1.16) Prove that if x is an eigenvector of A then αx is also an eigenvector of A, for the same eigenvalue. Here α is a nonzero real number.

(1.17) Prove, by induction, that if λ is an eigenvalue of A then $\lambda^k$ is an eigenvalue of $A^k$ for integer k > 1. The base case was done in Example Problem 1.26.

(1.18) Let $B = \sum_{i=0}^{k} \alpha_i A^i$, where $A^0 = I$. Prove that if λ is an eigenvalue of A, then $\sum_{i=0}^{k} \alpha_i \lambda^i$ is an eigenvalue of B. Thus for polynomial p(x), p(λ) is an eigenvalue of p(A).

(1.19) Suppose A is an invertible matrix with eigenvalue λ. Prove that $\lambda^{-1}$ is an eigenvalue for $A^{-1}$.

(1.20) Suppose that the eigenvalues of A are 1, 10, 100. Give the eigenvalues of $B = 3A^3 - 4A^2 + I$. Show that B is singular.

(1.21) Show that if $\|x\|_2 = r$, then x is on a sphere centered at the origin of radius r, in $\mathbb{R}^n$.

(1.22) If $\|x\|_2 = 0$, what does this say about vector x?

(1.23) Letting $x = [3 \; 4 \; 12]^\top$, what is $\|x\|_2$?

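(1.24) What is the norm of

$$A = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ 0 & 1/2 & 0 & \cdots & 0 \\ 0 & 0 & 1/3 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1/n \end{bmatrix}?$$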

(1.25) Show that $\|A\|_2 = 0$ implies that A is the matrix of all zeros.

(1.26) Show that $\|A^{-1}\|_2$ equals $1/|\lambda_{\min}|$, where $\lambda_{\min}$ is the smallest, in absolute value, eigenvalue of A.

(1.27) Suppose there is some µ > 0 such that, for a given A,

$$\|Av\|_2 \ge \mu \|v\|_2,$$

for all vectors v.

(a) Show that $\mu \le \|A\|_2$. (Should be very simple.)

(b) Show that A is nonsingular. (Recall: A is singular if there is some x ≠ 0 such that Ax = 0.)

(c) Show that $\|A^{-1}\|_2 \le 1/\mu$.

(1.28) If A is singular, is it necessarily the case that $\|A\|_2 = 0$?

(1.29) If $\|A\|_2 \ge \mu > 0$ does it follow that A is nonsingular?

(1.30) Towards proving the equality in Example 1.28, prove that if Λ is the set of eigenvalues of A, then

$$\|A\| \ge \max_{\lambda \in \Lambda} |\lambda|,$$

where $\|\cdot\|$ is any subordinate matrix norm. The inequality in the other direction holds when the norm is $\|\cdot\|_2$ and A is symmetric, but is difficult to prove.


Chapter 2

A “Crash” Course in octave/Matlab

2.1 Getting Started

Matlab is a software package that allows you to program the mathematics of an algorithm without

getting too bogged down in the details of data structures, pointers, and reinventing the wheel.

It also includes graphing capabilities, and there are numerous packages available for all kinds of

functions, enabling relatively high-level programming. Unfortunately, it also costs quite a bit of

money, which is why I recommend the free Matlab clone, octave, available under the GPL,[1] freely

downloadable from http://www.octave.org.

In a lame attempt to retain market share, Mathworks continues to tinker with Matlab to make

it noncompatible with octave; this has the side eﬀect of obsoletizing old Matlab code. I will try

to focus on the intersection of the two systems, except where explicitly noted otherwise. What

follows, then, is an introduction to octave; Matlab users will have to make some changes.

You can ﬁnd a number of octave/Matlab tutorials for free on the web; many of them are

certainly better than this one. A brief web search reveals the following excellent tutorials:

• http://www.math.mtu.edu/~msgocken/intro/intro.html

• http://www.cyclismo.org/tutorial/matlab/vector.html

• http://web.ew.usna.edu/~mecheng/DESIGN/CAD/MATLAB/usna.html

Matlab has some demo programs covering a number of topics– from the most basic functionality

to the more arcane toolboxes. In Matlab, simply type demo.

What follows is a lame demo for octave. Start up octave. You should get something like:

GNU Octave, version 2.1.44 (i686-pc-linux-gnu).

Copyright (C) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003 John W. Eaton.

This is free software; see the source code for copying conditions.

There is ABSOLUTELY NO WARRANTY; not even for MERCHANTIBILITY or

FITNESS FOR A PARTICULAR PURPOSE. For details, type ‘warranty’.

Please contribute if you find this software useful.

For more information, visit http://www.octave.org/help-wanted.html

Report bugs to <bug-octave@bevo.che.wisc.edu>.

octave:1>

[1] GNU General Public License. See http://www.gnu.org/copyleft/gpl.html.


You now have a command line. The basic octavian data structure is a matrix; a scalar is a 1 × 1

matrix, a vector is an n × 1 matrix. Some simple matrix constructions are as follows:

octave:2> a = [1 2 3]

a =

1 2 3

octave:3> b = [5;4;3]

b =

5

4

3

octave:4> c = a'

c =

1

2

3

octave:5> d = 5*c - 2 * b

d =

-5

2

9

You should notice that octave “echoes” the lvalues it creates. This is either a feature or an

annoyance. It can be prevented by appending a semicolon at the end of a command. Thus the

previous becomes

octave:5> d = 5*c - 2 * b;

octave:6>

For illustration purposes, I am leaving the semicolon oﬀ. To access an entry or entries of a

matrix, use parentheses. In the case where the variable is a vector, you only need give a single

index, as shown below; when the variable is a matrix, you need give both indices. You can also

give a range of indices, as in what follows.

WARNING: vectors and matrices in octave/Matlab are indexed starting from 1, and not

from 0, as is more common in modern programming languages. You are warned! Moreover, the

last index of a vector is denoted by the special symbol end.

octave:6> a(1) = 77

a =

77 2 3

octave:7> a(end) = -400

a =

77 2 -400


octave:8> a(2:3) = [22 333]

a =

77 22 333

octave:9> M = diag(a)

M =

77 0 0

0 22 0

0 0 333

octave:10> M(2,1) = 14

M =

77 0 0

14 22 0

0 0 333

octave:11> M(1:2,1:2) = [1 2;3 4]

M =

1 2 0

3 4 0

0 0 333

The command diag(v) returns a matrix with v as the diagonal, if v is a vector. diag(M) returns

the diagonal of matrix M as a vector.

The form c:d returns a row vector of the integers from c to d, as we will examine

later. First we look at matrix manipulation:

octave:12> j = M * b

j =

13

31

999

octave:13> N = rand(3,3)

N =

0.166880 0.027866 0.087402

0.706307 0.624716 0.067067

0.911833 0.769423 0.938714

octave:14> L = M + N

L =

1.166880 2.027866 0.087402

3.706307 4.624716 0.067067

0.911833 0.769423 333.938714

octave:15> P = L * M

P =


7.2505e+00 1.0445e+01 2.9105e+01

1.7580e+01 2.5911e+01 2.2333e+01

3.2201e+00 4.9014e+00 1.1120e+05

octave:16> P = L .* M

P =

1.1669e+00 4.0557e+00 0.0000e+00

1.1119e+01 1.8499e+01 0.0000e+00

0.0000e+00 0.0000e+00 1.1120e+05

octave:17> x = M \ b

x =

-6.0000000

5.5000000

0.0090090

octave:18> err = M * x - b

err =

0

0

0

Note the diﬀerence between L * M and L .* M; the former is matrix multiplication, the latter

is element by element multiplication, i.e.,

$$(L \mathbin{.*} M)_{i,j} = L_{i,j} M_{i,j}.$$

The command rand(m,n) gives an m × n matrix with each element “uniformly distributed” on

[0, 1]. For a zero mean normal distribution with unit variance, use randn(m,n).

In line 17 we asked octave to solve the linear system

Mx = b,

by setting

$$x = M \backslash b = M^{-1} b.$$
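As a quick check of the backslash operator, the sketch below rebuilds the matrix M and vector b from the session above, solves the system with backslash, and compares against the explicit inverse. The names x_bs, x_inv and resid are made up for this illustration.

```octave
% M and b as they stand at line 17 of the session above.
M = [1 2 0; 3 4 0; 0 0 333];
b = [5; 4; 3];

x_bs  = M \ b;            % solve M x = b directly
x_inv = inv(M) * b;       % same answer via the explicit inverse
resid = norm(M * x_bs - b);
```

For a small well-behaved system like this the two answers agree to rounding error; backslash is the usual choice because it does not form the inverse.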

Note that you can construct matrices directly as you did vectors:

octave:19> B = [1 3 4 5;2 -2 2 -2]

B =

1 3 4 5

2 -2 2 -2

You can also create row vectors as a sequence, either using the form c:d or the form c:e:d, which

give, respectively, c, c+1, . . . , d, and c, c+e, . . . , d, (or something like it if e does not divide d−c)

as follows:

octave:20> z = 1:5

z =

1 2 3 4 5


octave:21> z = 5:(-1):1

z =

5 4 3 2 1

octave:22> z = 5:(-2):1

z =

5 3 1

octave:23> z = 2:3:11

z =

2 5 8 11

octave:24> z = 2:3:10

z =

2 5 8

Matrices and vectors can be constructed “blockwise.” Blocks in the same row are separated by a

comma, those in the same column by a semicolon. Thus

octave:2> y=[2 7 9]

y =

2 7 9

octave:3> m = [z;y]

m =

2 5 8

2 7 9

octave:4> k = [(3:4)', m]

k =

3 2 5 8

4 2 7 9

2.2 Useful Commands

Here’s a none too complete listing of useful commands in octave:

• help is the most useful command.

• floor(X) returns the largest integer not greater than X. If X is a vector or matrix, it computes

the ﬂoor element-wise. This behavior is common in octave: many functions which we normally

think of as applicable to scalars can be applied to matrices, with the result computed element-

wise.

• ceil(X) returns the smallest integer not less than X, computed element-wise.

• sin(X), cos(X), tan(X), atan(X), sqrt(X), returns the sine, cosine, tangent, arctangent,

square root of X, computed elementwise.

• exp(X) returns $e^X$, elementwise.

• abs(X) returns |X|, elementwise.


• norm(X) returns the norm of X; if X is a vector, this is the $L^2$ norm:

$$\|X\|_2 = \left( \sum_i X_i^2 \right)^{1/2},$$

if X is a matrix, it is the matrix norm subordinate to the $L^2$ norm.

You can compute other norms with norm(X,p) where p is a number, to get the $L^p$ norm, or with p one of Inf, -Inf, etc.

• zeros(m,n) returns an m × n matrix of all zeros.

• eye(m) returns the m × m identity matrix.

• [m,n] = size(A) returns the number of rows, columns of A. Similarly the functions rows(A)

and columns(A) return the number of rows and columns, respectively.

• length(v) returns the length of vector v, or the larger dimension if v is a matrix.

• find(M) returns the indices of the nonzero elements of M. This may not seem helpful at first,

but it can be very useful for selecting subsets of data because the indices can be used for

selection. Thus, for example, in this code

octave:1> v = round(20*randn(400,3));

octave:2> selectv = v(find(v(:,2) == 7),:)

we have selected the rows of v where the element in the second column equals 7. Now you

see why leading computer scientists refer to octave/Matlab as “semantically suspect.” It is

a very useful language nonetheless, and you should try to learn its quirks rather than resist

them.

• diag(v) returns the diagonal matrix with vector v as diagonal. diag(M) returns as a vector,

the diagonal of matrix M. Thus diag(diag(v)) is v for vector v, but diag(diag(M)) is the

diagonal part of matrix M.

• toeplitz(v) returns the Toeplitz matrix associated with vector v. That is

$$\mathrm{toeplitz}(v) = \begin{bmatrix} v(1) & v(2) & v(3) & \cdots & v(n) \\ v(2) & v(1) & v(2) & \cdots & v(n-1) \\ v(3) & v(2) & v(1) & \cdots & v(n-2) \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ v(n) & v(n-1) & v(n-2) & \cdots & v(1) \end{bmatrix}$$

In the more general form, toeplitz(c,r) can be used to return an asymmetric Toeplitz matrix.

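A matrix which is banded on the cross diagonals is evidently called a “Hankel” matrix:

$$\mathrm{hankel}(u, v) = \begin{bmatrix} u(1) & u(2) & u(3) & \cdots & u(n) \\ u(2) & u(3) & u(4) & \cdots & v(2) \\ u(3) & u(4) & u(5) & \cdots & v(3) \\ \vdots & \vdots & \vdots & & \vdots \\ u(n) & v(2) & v(3) & \cdots & v(n) \end{bmatrix}$$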

• eig(M) returns the eigenvalues of M. [V, LAMBDA] = eig(M) returns the eigenvectors, and

eigenvalues of M.

• kron(M,N) returns the Kronecker product of the two matrices. This is a block construction

which returns a matrix where each block is an element of M as a scalar multiplied by the

whole matrix N.

• flipud(N) ﬂips the vector or matrix N so that its ﬁrst row is last and vice versa. Similarly

fliplr(N) ﬂips left/right.


2.3 Programming and Control

If you are going to do any serious programming in octave, you should keep your commands in a

file. octave loads commands from ‘.m’ files.[2]

If you have the following in a file called myfunc.m:

function [y1,y2] = myfunc(x1,x2)

% comments start with a ‘%’

% this function is useless, except as an example of functions.

% input:

% x1 a number

% x2 another number

% output:

% y1 some output

% y2 some output

y1 = cos(x1) .* sin(x2);

y2 = norm(y1);

then you can call this function from octave, as follows:

octave:1> myfunc(2,3)

ans = -0.058727

octave:2> [a,b] = myfunc(2,3)

a = -0.058727

b = 0.058727

octave:3> [a,b] = myfunc([1 2 3 4],[1 2 3 4])

a =

0.45465 -0.37840 -0.13971 0.49468

b = 0.78366

Note this silly function will throw an error if x1 and x2 are not of the same size.

It is recommended that you write your functions so that they can take scalar and vector input

where appropriate. For example, the octave builtin sine function can take a scalar and output a

scalar, or take a vector and output a vector which is, elementwise, the sine of the input. It is not

too diﬃcult to write functions this way, it often only requires judicious use of .* multiplies instead

of * multiplies. For example, if the ﬁle myfunc.m were changed to read

y1 = cos(x1) * sin(x2);

it could easily crash if x1 and x2 were vectors of the same size because matrix multiplication is not

defined for an n × 1 matrix times another n × 1 matrix.

An .m ﬁle does not have to contain a function, it can merely contain some octave commands.

For example, putting the following into runner.m:

x1 = rand(4,3);

x2 = rand(size(x1));

[a,b] = myfunc(x1,x2)

[2] The ‘m’ stands for ‘octave.’

octave allows you to call this script without arguments:

octave:4> runner

a =

0.245936 0.478054 0.535323

0.246414 0.186454 0.206279

0.542728 0.419457 0.083917

0.257607 0.378558 0.768188

b = 1.3135

octave has to know where your .m ﬁle is. It will look in the directory from which it was called.

You can set this to something else with cd or chdir.

You can also use the octave builtin function feval to evaluate a function by name. For example,

the following is a diﬀerent way of calling myfunc.m:

octave:5> [a,b] = feval("myfunc",2,3)

a = -0.058727

b = 0.058727

In this form feval seems like a way of using more keystrokes to get the same result. However, you

can pass a variable function name as well:

octave:6> fname = "myfunc"

fname = myfunc

octave:7> [a,b] = feval(fname,2,3)

a = -0.058727

b = 0.058727

This allows you to eﬀectively pass functions to other functions.
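To make the idea of passing a function by name concrete, here is a small sketch. The function sum_on_grid is a hypothetical helper invented for this illustration; it is written inside a script file (hence the leading `1;`) rather than in its own .m file.

```octave
1;  % a script file: the function below is defined inline

function s = sum_on_grid(fname, a, b, n)
  % Evaluate the function named fname at n equally spaced points
  % on [a,b], and return the sum of the values.
  x = a + (b - a) .* (0:(n-1)) ./ (n-1);
  s = sum(feval(fname, x));
end

s_cos = sum_on_grid("cos", 0, pi, 5);   % cosines at 0, pi/4, ..., pi cancel
s_abs = sum_on_grid("abs", -1, 1, 3);   % |-1| + |0| + |1|
```

The caller decides which function is summed; sum_on_grid itself never mentions cos or abs.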

2.3.1 Logical Forks and Control

octave has the regular assortment of ‘if-then-else’ and ‘for’ and ‘while’ loops. These take the

following form:

if expr1

statements

elseif expr2

statements

elseif expr3

statements

...

else

statements

end

for var=vector

statements

end

while expr


statements

end

Note that the word end is one of the most overloaded in octave/Matlab. It stands for the last

index of a vector or matrix, as well as the exit point for for loops, if statements, switches, etc. To

simplify debugging, it is also permissible to use endif to end an if statement, endfor to end a

for loop, etc.

The test expressions may use the logical conditionals: >, <, >=, <=, ==, ~=. Do not use the

assignment operator = as a conditional, as this can lead to disastrous results, as in C.

Here are some examples:

%compute the sign of x

if x > 0

s = 1;

elseif x == 0

s = 0;

else

s = -1;

end

%a ‘regular’ for loop

sm = 0;

for i=1:10

sm = sm + i;

end

%an ‘irregular’ for loop

fsm = 0;

for i=[1 2 3 5 8 13 21 34]

fsm = fsm + i;

end

while (sin(x) > 0)

x = x * pi;

end

2.4 Plotting

Plotting is one area in which there are some noticeable diﬀerences between octave and Matlab.

The commands and examples given herein are for octave, but the commands for Matlab are not

too diﬀerent. octave ships its plotting commands to Gnuplot.

The main plot command is plot. You may also use semilogx, semilogy, loglog for 2D plots

with log axes, and contour and mesh for 3D plots. Use the help command to get the speciﬁc

syntax for each command. We present some examples:

n = 100;

X = pi .* ((1:n) ./ n);

Y = sin(X);

%just plot Y

plot(Y);


%plot Y, but with the ‘right’ X axis labels

plot(X,Y);

W = sqrt(Y);

plot(W);

%plot W, but with the ‘right’ X axis labels

plot(Y,W);

The output from these commands is seen in Figure 2.1. In particular, you should note the

diﬀerence between plotting a vector, as in Figure 2.1(c) versus plotting the same vector but with

the appropriate abscissa values, as in Figure 2.1(d).

[Figure 2.1: Four plots from octave. Panels: (a) Y = sin(X); (b) Y versus X; (c) W = √Y; (d) W versus Y.]

Some magic commands are required to plot to a ﬁle. For octave, I recommend the following

magic formula, which replots the ﬁgure to a ﬁle:

%call the plot commands before this line

gset term postscript color;


gset output "filename.ps";

replot;

gset term x11;

gset output "/dev/null";

In Matlab, the commands are something like this:

%call the plot commands before this line

print(gcf,'-deps','filename.eps');


Exercises

(2.1) What do the following pieces of octave/Matlab code accomplish?

(a) x = (0:40) ./ 40;

(b) a = 2;

b = 5;

x = a + (b-a) .* (0:40) ./ 40;

(c) x = a + (b-a) .* (0:40) ./ 40;

y = sin(x);

plot(x,y);

(2.2) Implement the naïve quadratic formula to find the roots of $x^2 + bx + c = 0$, for real b, c. Your code should return

$$\frac{-b \pm \sqrt{b^2 - 4c}}{2}.$$

Your m-file should have header line like:

function [x1,x2] = naivequad(b,c)

Test your code for $(b, c) = (1 \times 10^{15}, 1)$. Do you get a spurious root?

(2.3) Implement a robust quadratic formula (cf. Example Problem 1.9) to find the roots of $x^2 + bx + c = 0$. Your m-file should have header line like:

function [x1,x2] = robustquad(b,c)

Test your code for $(b, c) = (1 \times 10^{15}, 1)$. Do you get a spurious root?

(2.4) Write octave/Matlab code to find a fixed point for the cosine, i.e., some x such that x = cos(x). Do this as follows: pick some initial value $x_0$, then let $x_{i+1} = \cos(x_i)$ for i = 0, 1, . . . , n. Pick n to be reasonably large, or choose some convergence criterion (i.e., terminate if $|x_{i+1} - x_i| < 1 \times 10^{-10}$). Does your code always converge?

(2.5) Write code to implement the factorial function for integers:

function [nfact] = factorial(n)

where n factorial is equal to $1 \cdot 2 \cdot 3 \cdots (n-1) \cdot n$. Either use a ‘for’ loop, or write the function to recursively call itself.

Chapter 3

Solving Linear Systems

A number of problems in numerical analysis can be reduced to, or approximated by, a system of

linear equations.

3.1 Gaussian Elimination with Naïve Pivoting

Our goal is the automatic solution of systems of linear equations:

$$\begin{aligned}
a_{11} x_1 + a_{12} x_2 + a_{13} x_3 + \cdots + a_{1n} x_n &= b_1 \\
a_{21} x_1 + a_{22} x_2 + a_{23} x_3 + \cdots + a_{2n} x_n &= b_2 \\
a_{31} x_1 + a_{32} x_2 + a_{33} x_3 + \cdots + a_{3n} x_n &= b_3 \\
&\;\;\vdots \\
a_{n1} x_1 + a_{n2} x_2 + a_{n3} x_3 + \cdots + a_{nn} x_n &= b_n
\end{aligned}$$

In these equations, the $a_{ij}$ and $b_i$ are given real numbers. We also write this as

$$Ax = b,$$

where A is a matrix, whose element in the $i$th row and $j$th column is $a_{ij}$, and b is a column vector, whose $i$th entry is $b_i$.

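This gives the easier way of writing this equation:

$$\begin{bmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ a_{21} & a_{22} & a_{23} & \cdots & a_{2n} \\ a_{31} & a_{32} & a_{33} & \cdots & a_{3n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & a_{n3} & \cdots & a_{nn} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \\ \vdots \\ b_n \end{bmatrix} \tag{3.1}$$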

3.1.1 Elementary Row Operations

You may remember that one way to solve linear equations is by applying elementary row operations

to a given equation of the system. For example, if we are trying to solve the given system of

equations, they should have the same solution as the following system:


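$$\begin{bmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ a_{21} & a_{22} & a_{23} & \cdots & a_{2n} \\ a_{31} & a_{32} & a_{33} & \cdots & a_{3n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \kappa a_{i1} & \kappa a_{i2} & \kappa a_{i3} & \cdots & \kappa a_{in} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & a_{n3} & \cdots & a_{nn} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_i \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \\ \vdots \\ \kappa b_i \\ \vdots \\ b_n \end{bmatrix}$$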

where κ is some given number which is not zero. It suﬃces to solve this system of linear

equations, as it has the same solution(s) as our original system. Multiplying a row of the system

by a nonzero constant is one of the elementary row operations.

The second elementary row operation is to replace a row by the sum of that row and a constant

times another. Thus, for example, the following system of equations has the same solution as the

original system:

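$$\begin{bmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ a_{21} & a_{22} & a_{23} & \cdots & a_{2n} \\ a_{31} & a_{32} & a_{33} & \cdots & a_{3n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ a_{(i-1)1} & a_{(i-1)2} & a_{(i-1)3} & \cdots & a_{(i-1)n} \\ a_{i1} + \beta a_{j1} & a_{i2} + \beta a_{j2} & a_{i3} + \beta a_{j3} & \cdots & a_{in} + \beta a_{jn} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & a_{n3} & \cdots & a_{nn} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_{(i-1)} \\ x_i \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \\ \vdots \\ b_{(i-1)} \\ b_i + \beta b_j \\ \vdots \\ b_n \end{bmatrix}$$

We have replaced the $i$th row by the $i$th row plus β times the $j$th row.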

The third elementary row operation is to switch rows:

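$$\begin{bmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ a_{31} & a_{32} & a_{33} & \cdots & a_{3n} \\ a_{21} & a_{22} & a_{23} & \cdots & a_{2n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & a_{n3} & \cdots & a_{nn} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} b_1 \\ b_3 \\ b_2 \\ \vdots \\ b_n \end{bmatrix}$$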

We have here switched the second and third rows. The purpose of this e.r.o. is mainly to make

things look nice.
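The claim that each elementary row operation leaves the solution unchanged can be checked numerically. The sketch below applies each of the three operations to a small 3 × 3 system chosen for illustration (the values of κ and β are arbitrary) and solves each modified system with octave's backslash.

```octave
A = [1 1 -1; -3 -4 4; 2 1 1];
b = [2; -7; 7];
x = A \ b;                          % solution of the original system

% 1. Multiply row 2 by a nonzero constant (kappa = 5).
A1 = A;  b1 = b;
A1(2,:) = 5 * A1(2,:);
b1(2)   = 5 * b1(2);

% 2. Replace row 3 by row 3 plus beta times row 1 (beta = -2).
A2 = A;  b2 = b;
A2(3,:) = A2(3,:) + (-2) * A2(1,:);
b2(3)   = b2(3)   + (-2) * b2(1);

% 3. Switch rows 2 and 3.
A3 = A;  b3 = b;
A3([2 3],:) = A3([3 2],:);
b3([2 3])   = b3([3 2]);

x1 = A1 \ b1;
x2 = A2 \ b2;
x3 = A3 \ b3;
```

Note that each operation is applied to the matrix and to the right hand side together.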

Note that none of the e.r.o.’s change the structure of the solution vector x. For this reason, it is

customary to drop the solution vector entirely and to write the matrix A and the vector b together

in augmented form:

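$$\left[ \begin{array}{ccccc|c} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} & b_1 \\ a_{21} & a_{22} & a_{23} & \cdots & a_{2n} & b_2 \\ a_{31} & a_{32} & a_{33} & \cdots & a_{3n} & b_3 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ a_{n1} & a_{n2} & a_{n3} & \cdots & a_{nn} & b_n \end{array} \right]$$

The idea of Gaussian Elimination is to use the elementary row operations to put a system into upper triangular form then use back substitution. We’ll give an example here: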


Example Problem 3.1. Solve the set of linear equations:

$$\begin{aligned} x_1 + x_2 - x_3 &= 2 \\ -3x_1 - 4x_2 + 4x_3 &= -7 \\ 2x_1 + 1x_2 + 1x_3 &= 7 \end{aligned}$$

Solution: We start by rewriting in the augmented form:

$$\left[ \begin{array}{ccc|c} 1 & 1 & -1 & 2 \\ -3 & -4 & 4 & -7 \\ 2 & 1 & 1 & 7 \end{array} \right]$$

We add 3 times the first row to the second, and −2 times the first row to the third to get:

$$\left[ \begin{array}{ccc|c} 1 & 1 & -1 & 2 \\ 0 & -1 & 1 & -1 \\ 0 & -1 & 3 & 3 \end{array} \right]$$

We now add −1 times the second row to the third row to get:

$$\left[ \begin{array}{ccc|c} 1 & 1 & -1 & 2 \\ 0 & -1 & 1 & -1 \\ 0 & 0 & 2 & 4 \end{array} \right]$$

The matrix is now in upper triangular form: there are no nonzero entries below the diagonal. This corresponds to the set of equations:

$$\begin{aligned} x_1 + x_2 - x_3 &= 2 \\ -x_2 + x_3 &= -1 \\ 2x_3 &= 4 \end{aligned}$$

We now solve this by back substitution. Because the matrix is in upper triangular form, we can solve $x_3$ by looking only at the last equation; namely $x_3 = 2$. However, once $x_3$ is known, the second equation involves only one unknown, $x_2$, and can be solved only by $x_2 = 3$. Then the first equation has only one unknown, and is solved by $x_1 = 1$. ⊣
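The back substitution step of Example Problem 3.1 can be written as a short loop. This is a minimal sketch for the upper triangular system reached above; the names U and c for the triangular matrix and its right hand side are chosen here for illustration.

```octave
% Upper triangular system from the end of Example Problem 3.1.
U = [1 1 -1; 0 -1 1; 0 0 2];
c = [2; -1; 4];

n = length(c);
x = zeros(n, 1);
for i = n:-1:1
  % subtract the already known unknowns, then divide by the diagonal entry
  x(i) = (c(i) - U(i, i+1:n) * x(i+1:n)) / U(i, i);
end
```

For i = n the range i+1:n is empty, so the product contributes zero and the last unknown is just c(n)/U(n,n).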

All sorts of funny things can happen when you attempt Gaussian Elimination: it may turn

out that your system has no solution, or has a single solution (as above), or an inﬁnite number

of solutions. We should expect that an algorithm for automatic solution of systems of equations

should detect these problems.

3.1.2 Algorithm Terminology

The method outlined above is ﬁne for solving small systems. We should like to devise an algorithm

for doing the same thing which can be applied to large systems of equations. The algorithm will

take the system (in augmented form):

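$$\left[ \begin{array}{ccccc|c} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} & b_1 \\ a_{21} & a_{22} & a_{23} & \cdots & a_{2n} & b_2 \\ a_{31} & a_{32} & a_{33} & \cdots & a_{3n} & b_3 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ a_{n1} & a_{n2} & a_{n3} & \cdots & a_{nn} & b_n \end{array} \right]$$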


The algorithm then selects the first row as the pivot equation or pivot row, and the first element of the first row, $a_{11}$, is the pivot element. The algorithm then pivots on the pivot element to get the system:

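$$\left[ \begin{array}{ccccc|c} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} & b_1 \\ 0 & a'_{22} & a'_{23} & \cdots & a'_{2n} & b'_2 \\ 0 & a'_{32} & a'_{33} & \cdots & a'_{3n} & b'_3 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & a'_{n2} & a'_{n3} & \cdots & a'_{nn} & b'_n \end{array} \right]$$

Where

$$a'_{ij} = a_{ij} - \frac{a_{i1}}{a_{11}} a_{1j}, \qquad b'_i = b_i - \frac{a_{i1}}{a_{11}} b_1 \qquad (2 \le i \le n,\ 1 \le j \le n)$$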

Effectively we are carrying out the e.r.o. of replacing the $i$th row by the $i$th row minus $\frac{a_{i1}}{a_{11}}$ times the first row. The quantity $\frac{a_{i1}}{a_{11}}$ is the multiplier for the $i$th row.

Hereafter the algorithm will not alter the ﬁrst row or ﬁrst column of the system. Thus, the

algorithm could be written recursively. By pivoting on the second row, the algorithm then generates

the system:

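$$\left[ \begin{array}{ccccc|c} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} & b_1 \\ 0 & a'_{22} & a'_{23} & \cdots & a'_{2n} & b'_2 \\ 0 & 0 & a''_{33} & \cdots & a''_{3n} & b''_3 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & a''_{n3} & \cdots & a''_{nn} & b''_n \end{array} \right]$$

In this case

$$a''_{ij} = a'_{ij} - \frac{a'_{i2}}{a'_{22}} a'_{2j}, \qquad b''_i = b'_i - \frac{a'_{i2}}{a'_{22}} b'_2 \qquad (3 \le i \le n,\ 1 \le j \le n)$$

Note that the multiplier is built from the entries of the system after the first pivot, that is, from the primed quantities.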
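The pivoting steps described above, followed by back substitution, can be sketched in octave as follows. This is only an illustration of the naïve strategy, not production code; the function name naive_gauss is made up, and the function is defined inside a script file (hence the leading `1;`).

```octave
1;  % a script file: the function below is defined inline

function x = naive_gauss(A, b)
  % Naive Gaussian elimination on the augmented system [A | b]:
  % pivot on row k at step k, with no row exchanges.
  n = length(b);
  Ab = [A, b(:)];
  for k = 1:(n-1)
    if Ab(k,k) == 0
      error("naive_gauss: zero pivot at step %d", k);
    end
    for i = (k+1):n
      m = Ab(i,k) / Ab(k,k);                        % multiplier for row i
      Ab(i,k:end) = Ab(i,k:end) - m * Ab(k,k:end);  % zero out entry (i,k)
    end
  end
  % back substitution on the upper triangular system
  x = zeros(n, 1);
  for i = n:-1:1
    x(i) = (Ab(i,n+1) - Ab(i,i+1:n) * x(i+1:n)) / Ab(i,i);
  end
end

% The system of Example Problem 3.1.
x = naive_gauss([1 1 -1; -3 -4 4; 2 1 1], [2; -7; 7]);
```

The loop over k never touches rows 1 through k again, which is the observation behind the remark that the algorithm could be written recursively.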

3.1.3 Algorithm Problems

The pivoting strategy we examined in this section is called ‘naïve’ because a real algorithm is a bit more complicated. The algorithm we have outlined is far too rigid: it always chooses to pivot on the $k$th row during the $k$th step. This would be bad if the pivot element were zero; in this case all the multipliers $\frac{a_{ik}}{a_{kk}}$ are not defined.

Bad things can happen if $a_{kk}$ is merely small instead of zero. Consider the following example:

Example 3.2. Solve the system of equations given by the augmented form:

$$\left[ \begin{array}{cc|c} -0.0590 & 0.2372 & -0.3528 \\ 0.1080 & -0.4348 & 0.6452 \end{array} \right]$$

Note that the exact solution of this system is $x_1 = 10$, $x_2 = 1$. Suppose, however, that the algorithm uses only 4 significant figures for its calculations. The algorithm, naïvely, pivots on the first equation. The multiplier for the second row is

$$\frac{0.1080}{-0.0590} \approx -1.830508\ldots,$$

which will be rounded to −1.831 by the algorithm.


The second entry in the matrix is replaced by

−0.4348 −(−1.831)(0.2372) = −0.4348 + 0.4343 = −0.0005,

where the arithmetic is rounded to four significant figures each time. There is some serious subtractive cancellation going on here. We have lost three figures with this subtraction. The errors get worse from here. Similarly, the second vector entry becomes:

$$0.6452 - (-1.831)(-0.3528) = 0.6452 - 0.6460 = -0.0008,$$

where, again, intermediate steps are rounded to four significant figures, and again there is subtractive cancelling. This puts the system in the form

$$\left[ \begin{array}{cc|c} -0.0590 & 0.2372 & -0.3528 \\ 0 & -0.0005 & -0.0008 \end{array} \right]$$

When the algorithm attempts back substitution, it gets the value

$$x_2 = \frac{-0.0008}{-0.0005} = 1.6.$$

This is a bit off from the actual value of 1. The algorithm now finds

$$x_1 = \frac{-0.3528 - 0.2372 \cdot 1.6}{-0.059} = \frac{-0.3528 - 0.3795}{-0.059} = \frac{-0.7323}{-0.059} = 12.41,$$

where each step has rounding to four significant figures. This is also a bit off.
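The four-figure arithmetic of Example 3.2 can be imitated in octave by rounding after every operation. The helper fl4 below is a hypothetical function written for this sketch (it is not an octave builtin); it rounds its argument to four significant decimal digits.

```octave
1;  % a script file: the function below is defined inline

function y = fl4(x)
  % Round x to four significant decimal digits.
  if x == 0
    y = 0;
  else
    e = floor(log10(abs(x)));              % decimal exponent of x
    y = round(x / 10^(e-3)) * 10^(e-3);
  end
end

m   = fl4(0.1080 / -0.0590);               % multiplier for the second row
a22 = fl4(-0.4348 - fl4(m * 0.2372));      % new second diagonal entry
b2  = fl4(0.6452 - fl4(m * -0.3528));      % new second right hand side
x2  = fl4(b2 / a22);
x1  = fl4(fl4(-0.3528 - fl4(0.2372 * x2)) / -0.0590);
```

Running this reproduces the poor answers of the example rather than the exact solution, since the damage is done by the rounded subtraction that forms a22 and b2.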

3.2 Pivoting Strategies for Gaussian Elimination

Gaussian Elimination can fail when performed in the wrong order. If the algorithm selects a zero

pivot, the multipliers are undeﬁned, which is no good. We also saw that a pivot small in magnitude

can cause failure. As here:

$$\begin{aligned} \epsilon x_1 + x_2 &= 1 \\ x_1 + x_2 &= 2 \end{aligned}$$

The naïve algorithm solves this as

$$x_2 = \frac{2 - \frac{1}{\epsilon}}{1 - \frac{1}{\epsilon}} = 1 - \frac{\epsilon}{1 - \epsilon}, \qquad x_1 = \frac{1 - x_2}{\epsilon} = \frac{1}{1 - \epsilon}.$$

If ε is very small, then $\frac{1}{\epsilon}$ is enormous compared to both 1 and 2. With poor rounding, the algorithm solves $x_2$ as 1. Then it solves $x_1 = 0$. This is nearly correct for $x_2$, but is an awful approximation for $x_1$. Note that this choice of $x_1$, $x_2$ satisfies the first equation, but not the second.

Now suppose the algorithm changed the order of the equations, then solved:

$$\begin{aligned} x_1 + x_2 &= 2 \\ \epsilon x_1 + x_2 &= 1 \end{aligned}$$


The algorithm solves this as

$$x_2 = \frac{1 - 2\epsilon}{1 - \epsilon}, \qquad x_1 = 2 - x_2.$$

There’s no problem with rounding here.
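The effect of the equation order can be seen directly in double precision. In the sketch below the small coefficient of the first equation is called ep (a name chosen here to avoid octave's builtin eps), and it is taken far smaller than machine precision so that the rounding is easy to predict.

```octave
ep = 1e-20;   % tiny coefficient, far below machine epsilon (about 2.2e-16)

% Naive order: pivot on the equation with the tiny coefficient.
x2_naive = (2 - 1/ep) / (1 - 1/ep);   % rounds to exactly 1
x1_naive = (1 - x2_naive) / ep;       % rounds to 0, an awful answer

% Swapped order: pivot on x1 + x2 = 2 instead.
x2_swap = (1 - 2*ep) / (1 - ep);      % also rounds to 1
x1_swap = 2 - x2_swap;                % gives 1, which is nearly exact
```

The true solution has both unknowns within about 1e-20 of 1, so only the swapped order gives an acceptable $x_1$.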

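The problem is not the small entry per se: Suppose we use an e.r.o. to scale the first equation, then use naïve G.E.:

$$\begin{aligned} x_1 + \frac{1}{\epsilon} x_2 &= \frac{1}{\epsilon} \\ x_1 + x_2 &= 2 \end{aligned}$$

This is still solved as

$$x_2 = \frac{2 - \frac{1}{\epsilon}}{1 - \frac{1}{\epsilon}}, \qquad x_1 = \frac{1 - x_2}{\epsilon},$$

and rounding is still a problem.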

3.2.1 Scaled Partial Pivoting

The naïve G.E. algorithm uses the rows 1, 2, . . . , n−1 in order as pivot equations. As shown above, this can cause errors. Better is to pivot first on row $\ell_1$, then row $\ell_2$, etc, until finally pivoting on row $\ell_{n-1}$, for some permutation $\{\ell_i\}_{i=1}^{n}$ of the integers 1, 2, . . . , n. The strategy of scaled partial pivoting is to compute this permutation so that G.E. works well.

In light of our example, we want to pivot on an element which is not small compared to other

elements in its row. So our algorithm ﬁrst determines “smallness” by calculating a scale, row-wise:

s

i

= max

1≤j≤n

[a

ij

[ .

The scales are only computed once.

Then the ﬁrst pivot,

1

, is chosen to be the i such that

[a

i,1

[

s

i

is maximized. The algorithm pivots on row

1

, producing a bunch of zeros in the ﬁrst column. Note

that the algorithm should not rearrange the matrix–this takes too much work.

The second pivot,

2

, is chosen to be the i such that

[a

i,2

[

s

i

is maximized, but without choosing

2

=

1

. The algorithm pivots on row

2

, producing a bunch of

zeros in the second column.

In the k

th

step

k

is chosen to be the i not among

1

,

2

, . . . ,

k−1

such that

[a

i,k

[

s

i

3.2. PIVOTING STRATEGIES FOR GAUSSIAN ELIMINATION 31

is maximized. The algorithm pivots on row

k

, producing a bunch of zeros in the k

th

column.

The slick way to implement this is to ﬁrst set

i

= i for i = 1, 2, . . . , n. Then rearrange this

vector in a kind of “bubble sort”: when you ﬁnd the index that should be

1

, swap them, i.e., ﬁnd

the j such that

j

should be the ﬁrst pivot and switch the values of

1

,

j

.

Then at the k

th

step, search only those indices in the tail of this vector: i.e., only among

j

for

k ≤ j ≤ n, and perform a swap.

3.2.2 An Example

We present an example of using scaled partial pivoting with G.E. It’s hard to come up with an example where the numbers do not come out as ugly fractions. We’ll look at a homework question.

$$\left(\begin{array}{cccc|c} 2 & -1 & 3 & 7 & 15 \\ 4 & 4 & 0 & 7 & 11 \\ 2 & 1 & 1 & 3 & 7 \\ 6 & 5 & 4 & 17 & 31 \end{array}\right)$$

The scales are as follows: $s_1 = 7$, $s_2 = 7$, $s_3 = 3$, $s_4 = 17$.

We pick $\ell_1$. It should be the index which maximizes $|a_{i1}| / s_i$. These values are:

$$\frac{2}{7}, \quad \frac{4}{7}, \quad \frac{2}{3}, \quad \frac{6}{17}.$$

We pick $\ell_1 = 3$, and pivot:

$$\left(\begin{array}{cccc|c} 0 & -2 & 2 & 4 & 8 \\ 0 & 2 & -2 & 1 & -3 \\ 2 & 1 & 1 & 3 & 7 \\ 0 & 2 & 1 & 8 & 10 \end{array}\right)$$

We pick $\ell_2$. It should not be 3, and should be the index which maximizes $|a_{i2}| / s_i$. These values are:

$$\frac{2}{7}, \quad \frac{2}{7}, \quad \frac{2}{17}.$$

We have a tie. In this case we pick the second row, i.e., $\ell_2 = 2$. We pivot:

$$\left(\begin{array}{cccc|c} 0 & 0 & 0 & 5 & 5 \\ 0 & 2 & -2 & 1 & -3 \\ 2 & 1 & 1 & 3 & 7 \\ 0 & 0 & 3 & 7 & 13 \end{array}\right)$$

The matrix is in permuted upper triangular form. We could proceed, but would get a zero multiplier, and no changes would occur.

If we did proceed we would have $\ell_3 = 4$. Then $\ell_4 = 1$. Our row permutation is 3, 2, 4, 1. When we do back substitution, we work in this order reversed on the rows, solving $x_4$, then $x_3, x_2, x_1$. We get $x_4 = 1$, so

$$\begin{aligned} x_3 &= \frac{1}{3}\left(13 - 7 \cdot 1\right) = 2 \\ x_2 &= \frac{1}{2}\left(-3 - 1 \cdot 1 + 2 \cdot 2\right) = 0 \\ x_1 &= \frac{1}{2}\left(7 - 3 \cdot 1 - 1 \cdot 2 - 1 \cdot 0\right) = 1 \end{aligned}$$


3.2.3 Another Example and A Real Algorithm

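Sometimes we want to solve

$$Ax = b$$

for a number of different vectors $b$. It turns out we can run G.E. on the matrix $A$ alone and come up with all the multipliers, which can then be used multiple times on different vectors $b$. We illustrate with an example:

$$M_0 = \begin{pmatrix} 1 & 2 & 4 & 1 \\ 4 & 2 & 1 & 2 \\ 2 & 1 & 2 & 3 \\ 1 & 3 & 2 & 1 \end{pmatrix}, \qquad \ell = \begin{pmatrix} 1 \\ 2 \\ 3 \\ 4 \end{pmatrix}.$$

The scale vector is $s = [4\ 4\ 3\ 3]^\top$.

Our scale choices are $\frac{1}{4}, \frac{4}{4}, \frac{2}{3}, \frac{1}{3}$. We choose $\ell_1 = 2$, and swap $\ell_1, \ell_2$. In the places where there would be zeros in the real matrix, we will put the multipliers. We will illustrate them here boxed:

$$M_1 = \begin{pmatrix} \boxed{\tfrac{1}{4}} & \tfrac{3}{2} & \tfrac{15}{4} & \tfrac{1}{2} \\ 4 & 2 & 1 & 2 \\ \boxed{\tfrac{1}{2}} & 0 & \tfrac{3}{2} & 2 \\ \boxed{\tfrac{1}{4}} & \tfrac{5}{2} & \tfrac{7}{4} & \tfrac{1}{2} \end{pmatrix}, \qquad \ell = \begin{pmatrix} 2 \\ 1 \\ 3 \\ 4 \end{pmatrix}.$$

Our scale choices are $\frac{3}{8}, \frac{0}{3}, \frac{5}{6}$. We choose $\ell_2 = 4$, and so swap $\ell_2, \ell_4$:

$$M_2 = \begin{pmatrix} \boxed{\tfrac{1}{4}} & \boxed{\tfrac{3}{5}} & \tfrac{27}{10} & \tfrac{1}{5} \\ 4 & 2 & 1 & 2 \\ \boxed{\tfrac{1}{2}} & \boxed{0} & \tfrac{3}{2} & 2 \\ \boxed{\tfrac{1}{4}} & \tfrac{5}{2} & \tfrac{7}{4} & \tfrac{1}{2} \end{pmatrix}, \qquad \ell = \begin{pmatrix} 2 \\ 4 \\ 3 \\ 1 \end{pmatrix}.$$

Our scale choices are $\frac{27}{40}, \frac{1}{2}$. We choose $\ell_3 = 1$, and so swap $\ell_3, \ell_4$:

$$M_3 = \begin{pmatrix} \boxed{\tfrac{1}{4}} & \boxed{\tfrac{3}{5}} & \tfrac{27}{10} & \tfrac{1}{5} \\ 4 & 2 & 1 & 2 \\ \boxed{\tfrac{1}{2}} & \boxed{0} & \boxed{\tfrac{5}{9}} & \tfrac{17}{9} \\ \boxed{\tfrac{1}{4}} & \tfrac{5}{2} & \tfrac{7}{4} & \tfrac{1}{2} \end{pmatrix}, \qquad \ell = \begin{pmatrix} 2 \\ 4 \\ 1 \\ 3 \end{pmatrix}.$$

Now suppose we had to solve the linear system for $b = [-1\ 8\ 2\ 1]^\top$.

We scale $b$ by the multipliers in order: $\ell_1 = 2$, so, we sweep through the first column of $M_3$, picking off the boxed numbers (your computer doesn’t really have boxed variables), and scaling $b$ appropriately:

$$\begin{pmatrix} -1 \\ 8 \\ 2 \\ 1 \end{pmatrix} \Rightarrow \begin{pmatrix} -3 \\ 8 \\ -2 \\ -1 \end{pmatrix}$$

This continues:

$$\begin{pmatrix} -3 \\ 8 \\ -2 \\ -1 \end{pmatrix} \Rightarrow \begin{pmatrix} -\tfrac{12}{5} \\ 8 \\ -2 \\ -1 \end{pmatrix} \Rightarrow \begin{pmatrix} -\tfrac{12}{5} \\ 8 \\ -\tfrac{2}{3} \\ -1 \end{pmatrix}$$

We then perform a permuted backwards substitution on the augmented system

$$\left(\begin{array}{cccc|c} 0 & 0 & \tfrac{27}{10} & \tfrac{1}{5} & -\tfrac{12}{5} \\ 4 & 2 & 1 & 2 & 8 \\ 0 & 0 & 0 & \tfrac{17}{9} & -\tfrac{2}{3} \\ 0 & \tfrac{5}{2} & \tfrac{7}{4} & \tfrac{1}{2} & -1 \end{array}\right)$$

This proceeds as

$$\begin{aligned} x_4 &= \frac{-2}{3} \cdot \frac{9}{17} = \frac{-6}{17} \\ x_3 &= \frac{10}{27}\left(-\frac{12}{5} - \frac{1}{5} \cdot \frac{-6}{17}\right) = \ldots \\ x_2 &= \frac{2}{5}\left(-1 - \frac{1}{2} \cdot \frac{-6}{17} - \frac{7}{4} x_3\right) = \ldots \\ x_1 &= \frac{1}{4}\left(8 - 2 \cdot \frac{-6}{17} - x_3 - 2 x_2\right) = \ldots \end{aligned}$$

Fill in your own values here.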
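The procedure of this subsection can be sketched in Python with exact rational arithmetic. This is an illustrative sketch rather than the course's reference code: the names `factor_spp` and `solve_spp` are hypothetical, indices are 0-based internally, and ties in the pivot search are broken by taking the first candidate in the current order of the index vector.

```python
from fractions import Fraction

def factor_spp(A):
    """G.E. with scaled partial pivoting; multipliers are stored where zeros would go."""
    n = len(A)
    M = [[Fraction(v) for v in row] for row in A]
    s = [max(abs(v) for v in row) for row in M]   # scales, computed once
    ell = list(range(n))                          # index vector, 0-based
    for k in range(n - 1):
        # Search the tail of ell for the largest scaled entry in column k.
        best = max(range(k, n), key=lambda j: abs(M[ell[j]][k]) / s[ell[j]])
        ell[k], ell[best] = ell[best], ell[k]
        p = ell[k]
        for j in range(k + 1, n):
            r = ell[j]
            m = M[r][k] / M[p][k]
            M[r][k] = m                           # the "boxed" multiplier
            for c in range(k + 1, n):
                M[r][c] -= m * M[p][c]
    return M, ell

def solve_spp(M, ell, b):
    """Reuse the stored multipliers on b, then do permuted back substitution."""
    n = len(b)
    y = [Fraction(v) for v in b]
    for k in range(n - 1):
        for j in range(k + 1, n):
            y[ell[j]] -= M[ell[j]][k] * y[ell[k]]
    x = [Fraction(0)] * n
    for k in range(n - 1, -1, -1):
        r = ell[k]
        x[k] = (y[r] - sum(M[r][c] * x[c] for c in range(k + 1, n))) / M[r][k]
    return x

M0 = [[1, 2, 4, 1], [4, 2, 1, 2], [2, 1, 2, 3], [1, 3, 2, 1]]
b = [-1, 8, 2, 1]
M, ell = factor_spp(M0)
x = solve_spp(M, ell, b)
print([i + 1 for i in ell])   # pivot order in the 1-based numbering of the text
print(x)
```

Once `factor_spp` has run, `solve_spp` can be called again with any other right-hand side without repeating the elimination.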

3.3 LU Factorization

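We examined G.E. to solve the system

$$Ax = b,$$

where $A$ is a matrix:

$$A = \begin{pmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ a_{21} & a_{22} & a_{23} & \cdots & a_{2n} \\ a_{31} & a_{32} & a_{33} & \cdots & a_{3n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & a_{n3} & \cdots & a_{nn} \end{pmatrix}.$$

We want to show that G.E. actually factors $A$ into lower and upper triangular parts, that is $A = LU$, where

$$L = \begin{pmatrix} 1 & 0 & 0 & \cdots & 0 \\ \ell_{21} & 1 & 0 & \cdots & 0 \\ \ell_{31} & \ell_{32} & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \ell_{n1} & \ell_{n2} & \ell_{n3} & \cdots & 1 \end{pmatrix}, \qquad U = \begin{pmatrix} u_{11} & u_{12} & u_{13} & \cdots & u_{1n} \\ 0 & u_{22} & u_{23} & \cdots & u_{2n} \\ 0 & 0 & u_{33} & \cdots & u_{3n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & u_{nn} \end{pmatrix}.$$

We call this an LU Factorization of $A$.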


3.3.1 An Example

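We consider solution of the following augmented form:

$$\left(\begin{array}{cccc|c} 2 & 1 & 1 & 3 & 7 \\ 4 & 4 & 0 & 7 & 11 \\ 6 & 5 & 4 & 17 & 31 \\ 2 & -1 & 0 & 7 & 15 \end{array}\right) \tag{3.2}$$

The naïve G.E. reduces this to

$$\left(\begin{array}{cccc|c} 2 & 1 & 1 & 3 & 7 \\ 0 & 2 & -2 & 1 & -3 \\ 0 & 0 & 3 & 7 & 13 \\ 0 & 0 & 0 & 12 & 18 \end{array}\right)$$

We are going to run the naïve G.E., and see how it is an LU Factorization. Since this is the naïve version, we first pivot on the first row. Our multipliers are 2, 3, 1. We pivot to get

$$\left(\begin{array}{cccc|c} 2 & 1 & 1 & 3 & 7 \\ 0 & 2 & -2 & 1 & -3 \\ 0 & 2 & 1 & 8 & 10 \\ 0 & -2 & -1 & 4 & 8 \end{array}\right)$$

Careful inspection shows that we’ve merely multiplied $A$ and $b$ by a lower triangular matrix $M_1$:

$$M_1 = \begin{pmatrix} 1 & 0 & 0 & 0 \\ -2 & 1 & 0 & 0 \\ -3 & 0 & 1 & 0 \\ -1 & 0 & 0 & 1 \end{pmatrix}$$

The entries in the first column are the negative e.r.o. multipliers for each row. Thus after the first pivot, it is like we are solving the system

$$M_1 A x = M_1 b.$$

We pivot on the second row to get:

$$\left(\begin{array}{cccc|c} 2 & 1 & 1 & 3 & 7 \\ 0 & 2 & -2 & 1 & -3 \\ 0 & 0 & 3 & 7 & 13 \\ 0 & 0 & -3 & 5 & 5 \end{array}\right)$$

The multipliers are $1, -1$. We can view this pivot as a multiplication by $M_2$, with

$$M_2 = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & -1 & 1 & 0 \\ 0 & 1 & 0 & 1 \end{pmatrix}$$

We are now solving

$$M_2 M_1 A x = M_2 M_1 b.$$

We pivot on the third row, with a multiplier of $-1$. Thus we get

$$\left(\begin{array}{cccc|c} 2 & 1 & 1 & 3 & 7 \\ 0 & 2 & -2 & 1 & -3 \\ 0 & 0 & 3 & 7 & 13 \\ 0 & 0 & 0 & 12 & 18 \end{array}\right)$$

We have multiplied by $M_3$:

$$M_3 = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 1 \end{pmatrix}$$

We are now solving

$$M_3 M_2 M_1 A x = M_3 M_2 M_1 b.$$

But we have an upper triangular form, that is, if we let

$$U = \begin{pmatrix} 2 & 1 & 1 & 3 \\ 0 & 2 & -2 & 1 \\ 0 & 0 & 3 & 7 \\ 0 & 0 & 0 & 12 \end{pmatrix}$$

Then we have

$$\begin{aligned} M_3 M_2 M_1 A &= U, \\ A &= (M_3 M_2 M_1)^{-1} U, \\ A &= M_1^{-1} M_2^{-1} M_3^{-1} U, \\ A &= LU. \end{aligned}$$

We are hoping that $L$ is indeed lower triangular, and has ones on the diagonal. It turns out that the inverse of each $M_i$ matrix has a nice form (See Exercise (3.6)). We write them here:

$$L = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 2 & 1 & 0 & 0 \\ 3 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & -1 & 0 & 1 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & -1 & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 2 & 1 & 0 & 0 \\ 3 & 1 & 1 & 0 \\ 1 & -1 & -1 & 1 \end{pmatrix}$$

This is really crazy: the matrix $L$ looks to be composed of ones on the diagonal and multipliers under the diagonal.

Now we check to see if we made any mistakes:

$$LU = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 2 & 1 & 0 & 0 \\ 3 & 1 & 1 & 0 \\ 1 & -1 & -1 & 1 \end{pmatrix} \begin{pmatrix} 2 & 1 & 1 & 3 \\ 0 & 2 & -2 & 1 \\ 0 & 0 & 3 & 7 \\ 0 & 0 & 0 & 12 \end{pmatrix} = \begin{pmatrix} 2 & 1 & 1 & 3 \\ 4 & 4 & 0 & 7 \\ 6 & 5 & 4 & 17 \\ 2 & -1 & 0 & 7 \end{pmatrix} = A.$$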
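The observation that $L$ collects the multipliers can be checked mechanically. The following Python sketch (the names `lu_naive` and `matmul` are hypothetical, and exact fractions are used only to avoid rounding) runs naïve G.E. on the matrix of this example and records each multiplier below the diagonal of $L$.

```python
from fractions import Fraction

def lu_naive(A):
    """Naive G.E. without pivoting; returns (L, U) with ones on the diagonal of L."""
    n = len(A)
    U = [[Fraction(v) for v in row] for row in A]
    L = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for k in range(n - 1):
        if U[k][k] == 0:
            raise ZeroDivisionError("zero pivot: naive G.E. cannot continue")
        for i in range(k + 1, n):
            m = U[i][k] / U[k][k]   # multiplier for row i
            L[i][k] = m             # stored below the diagonal of L
            for j in range(k, n):
                U[i][j] -= m * U[k][j]
    return L, U

def matmul(X, Y):
    """Plain matrix product of two square lists of lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[2, 1, 1, 3], [4, 4, 0, 7], [6, 5, 4, 17], [2, -1, 0, 7]]
L, U = lu_naive(A)
print(L)
print(U)
print(matmul(L, U) == A)
```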


3.3.2 Using LU Factorizations

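We see that the G.E. algorithm can be used to actually calculate the LU factorization. We will look at this in more detail in another example. We now examine how we can use the LU factorization to solve the equation

$$Ax = b.$$

Since we have $A = LU$, we first solve

$$Lz = b,$$

then solve

$$Ux = z.$$

Since $L$ is lower triangular, we can solve for $z$ with a forward substitution. Similarly, since $U$ is upper triangular, we can solve for $x$ with a back substitution. We drag out the previous example (which we never got around to solving):

$$\left(\begin{array}{cccc|c} 2 & 1 & 1 & 3 & 7 \\ 4 & 4 & 0 & 7 & 11 \\ 6 & 5 & 4 & 17 & 31 \\ 2 & -1 & 0 & 7 & 15 \end{array}\right)$$

We had found the LU factorization of $A$ as

$$A = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 2 & 1 & 0 & 0 \\ 3 & 1 & 1 & 0 \\ 1 & -1 & -1 & 1 \end{pmatrix} \begin{pmatrix} 2 & 1 & 1 & 3 \\ 0 & 2 & -2 & 1 \\ 0 & 0 & 3 & 7 \\ 0 & 0 & 0 & 12 \end{pmatrix}$$

So we solve

$$\begin{pmatrix} 1 & 0 & 0 & 0 \\ 2 & 1 & 0 & 0 \\ 3 & 1 & 1 & 0 \\ 1 & -1 & -1 & 1 \end{pmatrix} z = \begin{pmatrix} 7 \\ 11 \\ 31 \\ 15 \end{pmatrix}$$

We get

$$z = \begin{pmatrix} 7 \\ -3 \\ 13 \\ 18 \end{pmatrix}$$

Now we solve

$$\begin{pmatrix} 2 & 1 & 1 & 3 \\ 0 & 2 & -2 & 1 \\ 0 & 0 & 3 & 7 \\ 0 & 0 & 0 & 12 \end{pmatrix} x = \begin{pmatrix} 7 \\ -3 \\ 13 \\ 18 \end{pmatrix}$$

We get the ugly solution

$$x = \begin{pmatrix} \tfrac{37}{24} \\ -\tfrac{17}{12} \\ \tfrac{5}{6} \\ \tfrac{3}{2} \end{pmatrix}$$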
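The two triangular solves are short enough to write out. This is a minimal Python sketch with hypothetical helper names `forward_substitution` and `back_substitution`; the factors $L$ and $U$ are typed in from the example above rather than recomputed.

```python
from fractions import Fraction

def forward_substitution(L, b):
    """Solve L z = b for lower triangular L, top row first."""
    n = len(b)
    z = [Fraction(0)] * n
    for i in range(n):
        z[i] = (b[i] - sum(L[i][j] * z[j] for j in range(i))) / Fraction(L[i][i])
    return z

def back_substitution(U, z):
    """Solve U x = z for upper triangular U, bottom row first."""
    n = len(z)
    x = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):
        x[i] = (z[i] - sum(U[i][j] * x[j] for j in range(i + 1, n))) / Fraction(U[i][i])
    return x

L = [[1, 0, 0, 0], [2, 1, 0, 0], [3, 1, 1, 0], [1, -1, -1, 1]]
U = [[2, 1, 1, 3], [0, 2, -2, 1], [0, 0, 3, 7], [0, 0, 0, 12]]
b = [7, 11, 31, 15]

z = forward_substitution(L, b)
x = back_substitution(U, z)
print(z)
print(x)
```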


3.3.3 Some Theory

We aren’t doing much proving here. The following theorem has an ugly proof in Cheney & Kincaid [7].

Theorem 3.3. If $A$ is an $n \times n$ matrix, and naïve Gaussian Elimination does not encounter a zero pivot, then the algorithm generates an LU factorization of $A$, where $L$ is the lower triangular part of the output matrix, and $U$ is the upper triangular part.

This theorem relies on us using the fancy version of G.E., which saves the multipliers in the spots where there should be zeros. If correctly implemented, then, $L$ is the lower triangular part but with ones put on the diagonal.

This appears to me to be a case of something which can be better illustrated with an example or two and some informal investigation. The proof is an unilluminating index-chase; read it at your own risk.

3.3.4 Computing Inverses

We consider finding the inverse of $A$. Since

$$A A^{-1} = I,$$

then the $j$th column of the inverse $A^{-1}$ solves the equation

$$Ax = e_j,$$

where $e_j$ is the column matrix of all zeros, but with a one in the $j$th position.

Thus we can find the inverse of $A$ by running $n$ linear solves. Obviously we are only going to run G.E. once, to put the matrix in LU form, then run $n$ solves using forward and backward substitutions.

3.4 Iterative Solutions

Recall we are trying to solve

$$Ax = b.$$

We examine the computational cost of Gaussian Elimination to motivate the search for an alternative algorithm.

3.4.1 An Operation Count for Gaussian Elimination

We consider the number of floating point operations (“flops”) required to solve the system $Ax = b$. Gaussian Elimination first uses row operations to transform the problem into an equivalent problem of the form $Ux = b'$, where $U$ is upper triangular. Then back substitution is used to solve for $x$.

First we look at how many floating point operations are required to reduce

$$\left(\begin{array}{ccccc|c} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} & b_1 \\ a_{21} & a_{22} & a_{23} & \cdots & a_{2n} & b_2 \\ a_{31} & a_{32} & a_{33} & \cdots & a_{3n} & b_3 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ a_{n1} & a_{n2} & a_{n3} & \cdots & a_{nn} & b_n \end{array}\right)$$

to

$$\left(\begin{array}{ccccc|c} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} & b_1 \\ 0 & a'_{22} & a'_{23} & \cdots & a'_{2n} & b'_2 \\ 0 & a'_{32} & a'_{33} & \cdots & a'_{3n} & b'_3 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & a'_{n2} & a'_{n3} & \cdots & a'_{nn} & b'_n \end{array}\right)$$

First a multiplier is computed for each row. Then in each row the algorithm performs $n$ multiplies and $n$ adds. This gives a total of $(n-1) + (n-1)n$ multiplies (counting in the computing of the multiplier in each of the $(n-1)$ rows) and $(n-1)n$ adds. In total this is $2n^2 - n - 1$ floating point operations to do a single pivot on the $n$ by $n$ system.

Then this has to be done recursively on the lower right subsystem, which is an $(n-1)$ by $(n-1)$ system. This requires $2(n-1)^2 - (n-1) - 1$ operations. Then this has to be done on the next subsystem, requiring $2(n-2)^2 - (n-2) - 1$ operations, and so on.

In total, then, we use $I_n$ total floating point operations, with

$$I_n = 2 \sum_{j=1}^{n} j^2 - \sum_{j=1}^{n} j - \sum_{j=1}^{n} 1.$$

Recalling that

$$\sum_{j=1}^{n} j^2 = \frac{1}{6}(n)(n+1)(2n+1), \quad \text{and} \quad \sum_{j=1}^{n} j = \frac{1}{2}(n)(n+1),$$

we find that

$$I_n = \frac{1}{6}(4n - 1)n(n+1) - n \approx \frac{2}{3} n^3.$$

Now consider the costs of back substitution. To solve

$$\left(\begin{array}{ccccc|c} a_{11} & \cdots & a_{1,n-2} & a_{1,n-1} & a_{1n} & b_1 \\ \vdots & \ddots & \vdots & \vdots & \vdots & \vdots \\ 0 & \cdots & a_{n-2,n-2} & a_{n-2,n-1} & a_{n-2,n} & b_{n-2} \\ 0 & \cdots & 0 & a_{n-1,n-1} & a_{n-1,n} & b_{n-1} \\ 0 & \cdots & 0 & 0 & a_{nn} & b_n \end{array}\right)$$

for $x_n$ requires only a single division. Then to solve for $x_{n-1}$ we compute

$$x_{n-1} = \frac{1}{a_{n-1,n-1}} \left[ b_{n-1} - a_{n-1,n} x_n \right],$$

which requires 3 flops. Similarly, solving for $x_{n-2}$ requires 5 flops. Thus in total back substitution requires $B_n$ total floating point operations with

$$B_n = \sum_{j=1}^{n} (2j - 1) = n(n+1) - n = n^2.$$
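The operation counts can be tabulated numerically. The sketch below is an illustration only (the function names are hypothetical): it sums the per-pivot cost $2j^2 - j - 1$ and the per-unknown cost $2j - 1$ directly and prints them next to the closed form $\frac{1}{6}(4n-1)n(n+1) - n$ and the ratio to the leading term $\frac{2}{3}n^3$.

```python
def elimination_flops(n):
    """Sum the per-pivot cost 2j^2 - j - 1 over subsystems of size j = 1..n."""
    return sum(2 * j * j - j - 1 for j in range(1, n + 1))

def elimination_flops_closed(n):
    """Closed form (1/6)(4n - 1) n (n + 1) - n; the product is divisible by 6."""
    return (4 * n - 1) * n * (n + 1) // 6 - n

def back_substitution_flops(n):
    """Sum the per-unknown cost 2j - 1 for j = 1..n."""
    return sum(2 * j - 1 for j in range(1, n + 1))

for n in (10, 100, 1000):
    total = elimination_flops(n)
    ratio = total / (2 * n**3 / 3)   # approaches 1 as n grows
    print(n, total, elimination_flops_closed(n), ratio, back_substitution_flops(n))
```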


3.4.2 Dividing by Multiplying

We saw that Gaussian Elimination requires around $\frac{2}{3} n^3$ operations just to find the LU factorization, then about $n^2$ operations to solve the system, when $A$ is $n \times n$. When $n$ is large, this may take too long to be practical. Additionally, if $A$ is sparse (has few nonzero elements per row), we would like the complexity of our computations to scale with the sparsity of $A$. Thus we look for an alternative algorithm.

First we consider the simplest case, $n = 1$. Suppose we are to solve the equation

$$Ax = b$$

for scalars $A, b$. We solve this by

$$x = \frac{1}{A} b = \frac{1}{\omega A} \omega b = \frac{1}{1 - (1 - \omega A)} \omega b = \frac{1}{1 - r} \omega b,$$

where $\omega \ne 0$ is some real number chosen to “weight” the problem appropriately, and $r = 1 - \omega A$. Now suppose that $\omega$ is chosen such that $|r| < 1$. This can be done so long as $A \ne 0$, which would have been a problem anyway. Now use the geometric expansion:

$$\frac{1}{1 - r} = 1 + r + r^2 + r^3 + \ldots$$

Because of the assumption $|r| < 1$, the terms $r^n$ converge to zero as $n \to \infty$. This gives the approximate solution to our one dimensional problem as

$$\begin{aligned} x &\approx \left[ 1 + r + r^2 + r^3 + \ldots + r^k \right] \omega b \\ &= \omega b + \left[ r + r^2 + r^3 + \ldots + r^k \right] \omega b \\ &= \omega b + r \left[ 1 + r + r^2 + \ldots + r^{k-1} \right] \omega b \end{aligned}$$

This suggests an iterative approach to solving $Ax = b$. First let $x^{(0)} = \omega b$, then let

$$x^{(k)} = \omega b + r x^{(k-1)}.$$

The iterates $x^{(k)}$ will converge to the solution of $Ax = b$ if $|r| < 1$.

You should now convince yourself that because $r^n \to 0$, the choice of the initial iterate $x^{(0)}$ was immaterial, i.e., that under any choice of initial iterate convergence is guaranteed.
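A quick numerical experiment makes the last claim plausible. This Python sketch is an illustration under assumed values (the scalar problem $4x = 2$, the weight $\omega = 0.2$, and the three starting points are all made up for the demonstration; `scalar_iteration` is a hypothetical name).

```python
def scalar_iteration(A, b, omega, x0, steps):
    """Iterate x <- omega*b + r*x with r = 1 - omega*A, for scalar A and b."""
    r = 1.0 - omega * A
    x = x0
    for _ in range(steps):
        x = omega * b + r * x
    return x

# Assumed test problem: 4x = 2, so x = 0.5; omega = 0.2 gives r = 0.2.
for x0 in (0.4, -100.0, 1e6):
    print(x0, scalar_iteration(4.0, 2.0, 0.2, x0, 60))
```

Each starting value is multiplied by $r^{60}$ in the error, which is why the three runs end up at essentially the same number.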

We now translate this scalar result into the vector case. The algorithm proceeds as follows: first fix some initial estimate of the solution, $x^{(0)}$. A good choice might be $\omega b$, but this is not necessary. Then calculate successive approximations to the actual solution by updates of the form

$$x^{(k)} = \omega b + (I - \omega A)\, x^{(k-1)}.$$

It turns out that we can consider a slightly more general form of the algorithm, one in which successive iterates are defined implicitly. That is we consider iterates of the form

$$Q x^{(k+1)} = (Q - \omega A)\, x^{(k)} + \omega b, \tag{3.3}$$

for some matrix $Q$, and some scaling factor $\omega$. Note that this update relies on vector additions and possibly on premultiplication of a vector by $A$ or $Q$. In the case where these two matrices are sparse, such an update can be relatively cheap.


Now suppose that as $k \to \infty$, $x^{(k)}$ converges to some vector $x^*$, which is a fixed point of the iteration. Then

$$\begin{aligned} Q x^* &= (Q - \omega A)\, x^* + \omega b, \\ Q x^* &= Q x^* - \omega A x^* + \omega b, \\ \omega A x^* &= \omega b, \\ A x^* &= b. \end{aligned}$$

We have some freedom in choosing $Q$, but there are two considerations we should keep in mind:

1. Choice of $Q$ affects convergence and speed of convergence of the method. In particular, we want $Q$ to be similar to $A$.

2. Choice of $Q$ affects ease of computing the update. That is, given

$$z = (Q - A)\, x^{(k)} + b,$$

we should pick $Q$ such that the equation

$$Q x^{(k+1)} = z$$

is easy to solve exactly.

These two goals conflict with each other. At one end of the spectrum is the so-called “impossible iteration,” at the other is the Richardson Iteration.

3.4.3 Impossible Iteration

I made up the term “impossible iteration.” But consider the method which takes $Q$ to be $A$. This seems to be the best choice for satisfying the first goal. Letting $\omega = 1$, our method becomes

$$A x^{(k+1)} = (A - A)\, x^{(k)} + b = b.$$

This method should clearly converge in one step. However, the second goal is totally ignored. Indeed, we are considering iterative methods because we cannot easily solve this linear equation in the first place.

3.4.4 Richardson Iteration

At the other end of the spectrum is the Richardson Iteration, which chooses $Q$ to be the identity matrix. Solving the system

$$Q x^{(k+1)} = z$$

is trivial: we just have $x^{(k+1)} = z$.

Example Problem 3.4. Use Richardson Iteration with $\omega = 1$ on the system

$$A = \begin{pmatrix} 6 & 1 & 1 \\ 2 & 4 & 0 \\ 1 & 2 & 6 \end{pmatrix}, \qquad b = \begin{pmatrix} 12 \\ 0 \\ 6 \end{pmatrix}.$$

Solution: We let

$$Q = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad (Q - A) = \begin{pmatrix} -5 & -1 & -1 \\ -2 & -3 & 0 \\ -1 & -2 & -5 \end{pmatrix}.$$

We start with an arbitrary $x^{(0)}$, say $x^{(0)} = [2\ 2\ 2]^\top$. We get $x^{(1)} = [-2\ -10\ -10]^\top$, and $x^{(2)} = [42\ 34\ 78]^\top$.

Note the real solution is $x = [2\ -1\ 1]^\top$. The Richardson Iteration does not appear to converge for this example, unfortunately. ⊣

Example Problem 3.5. Apply Richardson Iteration with $\omega = 1/6$ on the previous system.

Solution: Our iteration becomes

$$x^{(k+1)} = \begin{pmatrix} 0 & -1/6 & -1/6 \\ -1/3 & 1/3 & 0 \\ -1/6 & -1/3 & 0 \end{pmatrix} x^{(k)} + \begin{pmatrix} 2 \\ 0 \\ 1 \end{pmatrix}.$$

We start with the same $x^{(0)}$ as previously, $x^{(0)} = [2\ 2\ 2]^\top$. We get $x^{(1)} = [4/3\ \ 0\ \ 0]^\top$, $x^{(2)} = [2\ -4/9\ \ 7/9]^\top$, and finally $x^{(12)} = [2\ -0.99998\ \ 0.99998]^\top$.

Thus, the choice of $\omega$ has some effect on convergence. ⊣

We can rethink the Richardson Iteration as

$$x^{(k+1)} = (I - \omega A)\, x^{(k)} + \omega b = x^{(k)} + \omega \left( b - A x^{(k)} \right).$$

Thus at each step we are adding some scaled version of the residual, defined as $b - A x^{(k)}$, to the iterate.
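The residual form of the update translates directly into code. The following Python sketch uses exact fractions so that the first few iterates of Example Problems 3.4 and 3.5 can be compared digit for digit; the function name `richardson` and its argument order are assumptions made for this illustration.

```python
from fractions import Fraction

def richardson(A, b, x0, omega, steps):
    """Return x^(steps) for the update x <- x + omega * (b - A x)."""
    x = [Fraction(v) for v in x0]
    n = len(b)
    for _ in range(steps):
        residual = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        x = [x[i] + omega * residual[i] for i in range(n)]
    return x

A = [[6, 1, 1], [2, 4, 0], [1, 2, 6]]
b = [12, 0, 6]
x0 = [2, 2, 2]

print(richardson(A, b, x0, 1, 2))               # omega = 1, two steps
print(richardson(A, b, x0, Fraction(1, 6), 2))  # omega = 1/6, two steps
print([float(v) for v in richardson(A, b, x0, Fraction(1, 6), 12)])
```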

3.4.5 Jacobi Iteration

The Jacobi Iteration chooses $Q$ to be the matrix consisting of the diagonal of $A$. This is more similar to $A$ than the identity matrix, but nearly as simple to invert.

Example Problem 3.6. Use Jacobi Iteration, with $\omega = 1$, to solve the system

$$A = \begin{pmatrix} 6 & 1 & 1 \\ 2 & 4 & 0 \\ 1 & 2 & 6 \end{pmatrix}, \qquad b = \begin{pmatrix} 12 \\ 0 \\ 6 \end{pmatrix}.$$

Solution: We let

$$Q = \begin{pmatrix} 6 & 0 & 0 \\ 0 & 4 & 0 \\ 0 & 0 & 6 \end{pmatrix}, \qquad (Q - A) = \begin{pmatrix} 0 & -1 & -1 \\ -2 & 0 & 0 \\ -1 & -2 & 0 \end{pmatrix}, \qquad Q^{-1} = \begin{pmatrix} \tfrac{1}{6} & 0 & 0 \\ 0 & \tfrac{1}{4} & 0 \\ 0 & 0 & \tfrac{1}{6} \end{pmatrix}.$$

We start with an arbitrary $x^{(0)}$, say $x^{(0)} = [2\ 2\ 2]^\top$. We get $x^{(1)} = \left[\tfrac{4}{3}\ -1\ \ 0\right]^\top$. Then $x^{(2)} = \left[\tfrac{13}{6}\ -\tfrac{2}{3}\ \ \tfrac{10}{9}\right]^\top$. Continuing, we find that $x^{(5)} \approx [1.987\ -1.019\ \ 0.981]^\top$.

Note the real solution is $x = [2\ -1\ 1]^\top$. ⊣

There is an alternative way to describe the Jacobi Iteration for $\omega = 1$. By considering the update elementwise, we see that the operation can be described by

$$x_j^{(k+1)} = \frac{1}{a_{jj}} \left( b_j - \sum_{i=1, i \ne j}^{n} a_{ji} x_i^{(k)} \right).$$

Thus an update takes less than $2n^2$ operations. In fact, if $A$ is sparse, with fewer than $k$ nonzero entries per row, the update should take less than $2nk$ operations.
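The elementwise formula can be coded almost verbatim. This is a small Python sketch for $\omega = 1$ only, with a hypothetical function name `jacobi_step`; it builds every new component from the old iterate and uses exact fractions so the iterates of Example Problem 3.6 can be compared directly.

```python
from fractions import Fraction

def jacobi_step(A, b, x):
    """One Jacobi update (omega = 1): all components use the old iterate x."""
    n = len(b)
    return [
        (b[j] - sum(A[j][i] * x[i] for i in range(n) if i != j)) / Fraction(A[j][j])
        for j in range(n)
    ]

A = [[6, 1, 1], [2, 4, 0], [1, 2, 6]]
b = [12, 0, 6]

x = [2, 2, 2]
iterates = [x]
for _ in range(5):
    x = jacobi_step(A, b, x)
    iterates.append(x)

print(iterates[1], iterates[2])
print([float(v) for v in iterates[5]])
```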


3.4.6 Gauss Seidel Iteration

The Gauss Seidel Iteration chooses $Q$ to be the lower triangular part of $A$, including the diagonal. In this case solving the system

$$Q x^{(k+1)} = z$$

is performed by forward substitution. Here the $Q$ is more like $A$ than for Jacobi Iteration, but involves more work for inverting.

Example Problem 3.7. Use Gauss Seidel Iteration to again solve for

$$A = \begin{pmatrix} 6 & 1 & 1 \\ 2 & 4 & 0 \\ 1 & 2 & 6 \end{pmatrix}, \qquad b = \begin{pmatrix} 12 \\ 0 \\ 6 \end{pmatrix}.$$

Solution: We let

$$Q = \begin{pmatrix} 6 & 0 & 0 \\ 2 & 4 & 0 \\ 1 & 2 & 6 \end{pmatrix}, \qquad (Q - A) = \begin{pmatrix} 0 & -1 & -1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{pmatrix}.$$

We start with an arbitrary $x^{(0)}$, say $x^{(0)} = [2\ 2\ 2]^\top$. We get $x^{(1)} = \left[\tfrac{4}{3}\ -\tfrac{2}{3}\ \ 1\right]^\top$. Then $x^{(2)} = \left[\tfrac{35}{18}\ -\tfrac{35}{36}\ \ 1\right]^\top$.

Already this is fairly close to the actual solution $x = [2\ -1\ 1]^\top$. ⊣

Just as with Jacobi Iteration, there is an easier way to describe the Gauss Seidel Iteration. In this case we will keep a single vector $x$ and overwrite it, element by element. Thus for $j = 1, 2, \ldots, n$, we set

$$x_j \leftarrow \frac{1}{a_{jj}} \left( b_j - \sum_{i=1, i \ne j}^{n} a_{ji} x_i \right).$$

This looks exactly like the Jacobi update. However, in the sum on the right there are some “old” values of $x_i$ and some “new” values; the new values are those $x_i$ for which $i < j$.

Again this takes less than $2n^2$ operations. Or less than $2nk$ if $A$ is sufficiently sparse.
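The overwrite-in-place description is easy to sketch in Python. The function name `gauss_seidel_sweep` is hypothetical; the only difference from a Jacobi update is that `x[j]` is replaced immediately, so later components already see the new values.

```python
from fractions import Fraction

def gauss_seidel_sweep(A, b, x):
    """One Gauss Seidel sweep, overwriting a copy of x for j = 1, ..., n in order."""
    x = list(x)
    n = len(b)
    for j in range(n):
        s = sum(A[j][i] * x[i] for i in range(n) if i != j)  # mixes new and old values
        x[j] = (b[j] - s) / Fraction(A[j][j])
    return x

A = [[6, 1, 1], [2, 4, 0], [1, 2, 6]]
b = [12, 0, 6]

x1 = gauss_seidel_sweep(A, b, [2, 2, 2])
x2 = gauss_seidel_sweep(A, b, x1)
print(x1)
print(x2)
```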

An alteration of the Gauss Seidel Iteration is to make successive “sweeps” of this redefinition, one for $j = 1, 2, \ldots, n$, the next for $j = n, n-1, \ldots, 2, 1$. This amounts to running Gauss Seidel once with $Q$ the lower triangular part of $A$, then running it with $Q$ the upper triangular part. This iterative method is usually called “symmetric Gauss Seidel.”

3.4.7 Error Analysis

Suppose that $x$ is the solution to equation 3.4. Define the error vector:

$$e^{(k)} = x^{(k)} - x.$$

Now notice that

$$\begin{aligned} x^{(k+1)} &= Q^{-1} (Q - \omega A)\, x^{(k)} + Q^{-1} \omega b, \\ x^{(k+1)} &= Q^{-1} Q x^{(k)} - \omega Q^{-1} A x^{(k)} + \omega Q^{-1} A x, \\ x^{(k+1)} &= x^{(k)} - \omega Q^{-1} A \left( x^{(k)} - x \right), \\ x^{(k+1)} - x &= x^{(k)} - x - \omega Q^{-1} A \left( x^{(k)} - x \right), \\ e^{(k+1)} &= e^{(k)} - \omega Q^{-1} A e^{(k)}, \\ e^{(k+1)} &= \left( I - \omega Q^{-1} A \right) e^{(k)}. \end{aligned}$$

Reusing this relation we find that

$$\begin{aligned} e^{(k)} &= \left( I - \omega Q^{-1} A \right) e^{(k-1)}, \\ &= \left( I - \omega Q^{-1} A \right)^2 e^{(k-2)}, \\ &= \left( I - \omega Q^{-1} A \right)^k e^{(0)}. \end{aligned}$$

We want to ensure that $e^{(k+1)}$ is “smaller” than $e^{(k)}$. To do this we recall matrix and vector norms from Subsection 1.4.1.

$$\left\| e^{(k)} \right\|_2 = \left\| \left( I - \omega Q^{-1} A \right)^k e^{(0)} \right\|_2 \le \left\| I - \omega Q^{-1} A \right\|_2^k \left\| e^{(0)} \right\|_2.$$

(See Example Problem 1.29.)

Thus our iteration converges ($e^{(k)}$ goes to the zero vector, i.e., $x^{(k)} \to x$) if

$$\left\| I - \omega Q^{-1} A \right\|_2 < 1.$$

This gives the theorem:

Theorem 3.8. An iterative solution scheme converges for any starting $x^{(0)}$ if and only if all eigenvalues of $I - \omega Q^{-1} A$ are less than 1 in absolute value. In particular, it converges whenever

$$\left\| I - \omega Q^{-1} A \right\|_2 < 1.$$

Another way of stating the eigenvalue condition is “the spectral radius of $I - \omega Q^{-1} A$ is less than 1.” The norm condition is sufficient; it coincides with the eigenvalue condition when $I - \omega Q^{-1} A$ is symmetric.

In fact, the speed of convergence is decided by the spectral radius of the matrix; convergence is faster for smaller values. Recall our introduction to iterative methods in the scalar case, where the result relied on $\omega$ being chosen such that $|1 - \omega A| < 1$. You should now think about how eigenvalues generalize the absolute value of a scalar, and how this relates to the norm of matrices.

Let $y$ be an eigenvector for $Q^{-1} A$, with corresponding eigenvalue $\lambda$. Then

$$\left( I - \omega Q^{-1} A \right) y = y - \omega Q^{-1} A y = y - \omega \lambda y = (1 - \omega \lambda)\, y.$$

This relation may allow us to pick the optimal $\omega$ for given $A$, $Q$. It can also show us that sometimes no choice of $\omega$ will give convergence of the method. There are a number of different related results that show when various methods will work for certain choices of $\omega$. We leave these to the exercises.


Example Problem 3.9. Find conditions on $\omega$ which guarantee convergence of Richardson’s Iteration for finding approximate iterative solutions to the system $Ax = b$, where

$$A = \begin{pmatrix} 6 & 1 & 1 \\ 2 & 4 & 0 \\ 1 & 2 & 6 \end{pmatrix}, \qquad b = \begin{pmatrix} 12 \\ 0 \\ 6 \end{pmatrix}.$$

Solution: By Theorem 3.8, with $Q$ the identity matrix, we have convergence if and only if all eigenvalues of $I - \omega A$ are less than 1 in absolute value.

We now use the fact that “eigenvalues commute with polynomials;” that is if $f(x)$ is a polynomial and $\lambda$ is an eigenvalue of a matrix $A$, then $f(\lambda)$ is an eigenvalue of the matrix $f(A)$. In this case the polynomial we consider is $f(x) = x^0 - \omega x^1$. Using octave or Matlab you will find that the eigenvalues of $A$ are approximately 7.7321, 4.2679, and 4. Thus the eigenvalues of $I - \omega A$ are approximately

$$1 - 7.7321\omega, \quad 1 - 4.2679\omega, \quad 1 - 4\omega.$$

Each value $1 - \omega\lambda$ is less than one in absolute value exactly when $0 < \omega\lambda < 2$, so all three of these values will be less than one in absolute value if and only if

$$0 < \omega < \frac{2}{7.7321} \approx 0.259$$

(See also Exercise (3.10).)

Compare this to the results of Example Problem 3.4, where for this system, $\omega = 1$ apparently did not lead to convergence, while for Example Problem 3.5, with $\omega = 1/6$, convergence was observed. ⊣
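The eigenvalue test is easy to evaluate for any candidate $\omega$. In the Python sketch below, `richardson_spectral_radius` is a hypothetical helper, and the eigenvalues are entered as $6 + \sqrt{3}$, $6 - \sqrt{3}$ and $4$, which agree with the decimals quoted in the solution (their sum is the trace 16 and their product is the determinant 132).

```python
import math

# Eigenvalues of A: approximately 7.7321, 4.2679 and 4.
EIGENVALUES = (6 + math.sqrt(3), 6 - math.sqrt(3), 4.0)

def richardson_spectral_radius(omega, eigenvalues=EIGENVALUES):
    """Largest |1 - omega*lambda| over the eigenvalues of A, i.e. rho(I - omega*A)."""
    return max(abs(1 - omega * lam) for lam in eigenvalues)

for omega in (1.0, 1 / 6):
    rho = richardson_spectral_radius(omega)
    print(omega, rho, "converges" if rho < 1 else "does not converge")
```

The two weights tried here are the ones from Example Problems 3.4 and 3.5.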

3.4.8 A Free Lunch?

The analysis leading to Theorem 3.8 leads to an interesting possible variant of the iterative scheme. For simplicity we will only consider an alteration of Richardson’s Iteration. In the altered algorithm we presuppose the existence, via some oracle, of a sequence of weightings, $\omega_i$, which we use in each iterative update. Thus our algorithm becomes:

1. Select some initial iterate $x^{(0)}$.

2. Given iterate $x^{(k-1)}$, define

$$x^{(k)} = (I - \omega_k A)\, x^{(k-1)} + \omega_k b.$$

Following the analysis for Theorem 3.8, it can be shown that

$$e^{(k)} = (I - \omega_k A)\, e^{(k-1)}$$

where, again, $e^{(k)} = x^{(k)} - x$, with $x$ the actual solution to the linear system. Expanding $e^{(k-1)}$ similarly gives

$$e^{(k)} = (I - \omega_k A)(I - \omega_{k-1} A)\, e^{(k-2)} = \left[ \prod_{i=1}^{k} (I - \omega_i A) \right] e^{(0)}.$$

Remember that we want $e^{(k)}$ to be small in magnitude, or, better yet, to be zero. One way to guarantee that $e^{(k)}$ is zero, regardless of the choice of $e^{(0)}$, is to somehow ensure that the matrix

$$B = \prod_{i=1}^{k} (I - \omega_i A)$$

has all zero eigenvalues.

We again make use of the fact that eigenvalues “commute” with polynomials to claim that if $\lambda_j$ is an eigenvalue of $A$, then

$$\prod_{i=1}^{k} (1 - \omega_i \lambda_j)$$

is an eigenvalue of $B$. This eigenvalue is zero if one of the $\omega_i$ for $1 \le i \le k$ is $1/\lambda_j$. This suggests how we are to pick the weightings: let them be the inverses of the eigenvalues of $A$.

In fact, if $A$ has a small number of distinct eigenvalues, say $m$ eigenvalues, then convergence to the exact solution could be guaranteed after only $m$ iterations, regardless of the size of the matrix.

As you may have guessed from the title of this subsection, this is not exactly a practical algorithm. The problem is that it is not simple to find, for a given arbitrary matrix $A$, one, some, or all its eigenvalues. This problem is of sufficient complexity to outweigh any savings to be gotten from our “free lunch” algorithm.

However, in some limited situations this algorithm might be practical if the eigenvalues of $A$ are known a priori.
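A tiny experiment shows the idea at work when the eigenvalues really are known. The matrix below is an assumption made for illustration and does not come from the notes: it is symmetric, $3 \times 3$, with only two distinct eigenvalues, 1 and 4, so two steps with weights $1$ and $1/4$ should land on the exact solution. The name `richardson_variable` is hypothetical.

```python
from fractions import Fraction

def richardson_variable(A, b, x0, omegas):
    """Richardson's Iteration with a different weight omega_k at each step."""
    x = [Fraction(v) for v in x0]
    n = len(b)
    for omega in omegas:
        residual = [b[i] - sum(A[i][j] * x[j] for j in range(n)) for i in range(n)]
        x = [x[i] + omega * residual[i] for i in range(n)]
    return x

# Assumed example: A = I + (matrix of ones) has eigenvalues 1, 1 and 4.
A = [[2, 1, 1], [1, 2, 1], [1, 1, 2]]
b = [1, 0, 0]
weights = [Fraction(1, 1), Fraction(1, 4)]   # inverses of the distinct eigenvalues

x = richardson_variable(A, b, [0, 0, 0], weights)
residual = [b[i] - sum(A[i][j] * x[j] for j in range(3)) for i in range(3)]
print(x, residual)
```

Changing the starting vector should not change the result after two steps, since the product of the two factors annihilates every error vector for this matrix.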


Exercises

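(3.1) Find the LU decomposition of the following matrices, using naïve Gaussian Elimination

(a) $\begin{pmatrix} 3 & -1 & -2 \\ 9 & -1 & -4 \\ -6 & 10 & 13 \end{pmatrix}$

(b) $\begin{pmatrix} 8 & 24 & 16 \\ 1 & 12 & 11 \\ 4 & 13 & 19 \end{pmatrix}$

(c) $\begin{pmatrix} -3 & 6 & 0 \\ 1 & -2 & 0 \\ -4 & 5 & -8 \end{pmatrix}$

(3.2) Perform back substitution to solve the equation

$$\begin{pmatrix} 1 & 3 & 5 & 3 \\ 0 & 2 & 4 & 3 \\ 0 & 0 & 2 & 1 \\ 0 & 0 & 0 & 2 \end{pmatrix} x = \begin{pmatrix} -1 \\ -1 \\ 1 \\ 2 \end{pmatrix}$$

(3.3) Perform Naïve Gaussian Elimination to prove Cramer’s rule for the 2D case. That is, prove that the solution to

$$\begin{pmatrix} a & b \\ c & d \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} f \\ g \end{pmatrix}$$

is given by

$$y = \frac{\det \begin{pmatrix} a & f \\ c & g \end{pmatrix}}{\det \begin{pmatrix} a & b \\ c & d \end{pmatrix}} \quad \text{and} \quad x = \frac{\det \begin{pmatrix} f & b \\ g & d \end{pmatrix}}{\det \begin{pmatrix} a & b \\ c & d \end{pmatrix}}$$

(3.4) Implement Cramer’s rule to solve a pair of linear equations in 2 variables. Your m-file should have header line like:

function x = cramer2(A,b)

where A is a $2 \times 2$ matrix, and b and x are $2 \times 1$ vectors. Your code should find the x such that Ax = b. (See Exercise (3.3))

Test your code on the following (augmented) systems:

(a) $\left(\begin{array}{cc|c} 3 & -2 & 1 \\ 4 & -3 & -1 \end{array}\right)$

(b) $\left(\begin{array}{cc|c} 1.24 & -3.48 & 1 \\ -0.744 & 2.088 & 2 \end{array}\right)$

(c) $\left(\begin{array}{cc|c} 1.24 & -3.48 & 1 \\ -0.744 & 2.088 & -0.6 \end{array}\right)$

(d) $\left(\begin{array}{cc|c} -0.0590 & 0.2372 & -0.3528 \\ 0.1080 & -0.4348 & 0.6452 \end{array}\right)$

(3.5) Given two lines parametrized by $f(t) = at + b$, and $g(s) = cs + d$, set up a linear $2 \times 2$ system of equations to find the $t, s$ at the point of intersection of the two lines. If you were going to write a program to detect the intersection of two lines, how would you detect whether they are parallel? How is this related to the form of the solution to a system of two linear equations? (See Exercise (3.3))

(3.6) Prove that the inverse of the matrix

$$\begin{pmatrix} 1 & 0 & 0 & \cdots & 0 \\ a_2 & 1 & 0 & \cdots & 0 \\ a_3 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ a_n & 0 & 0 & \cdots & 1 \end{pmatrix}$$

is the matrix

$$\begin{pmatrix} 1 & 0 & 0 & \cdots & 0 \\ -a_2 & 1 & 0 & \cdots & 0 \\ -a_3 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ -a_n & 0 & 0 & \cdots & 1 \end{pmatrix}$$

(Hint: Multiply them together.)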

(3.7) Under the strategy of scaled partial pivoting, which row of the following matrix will be the first pivot row?

$$\begin{pmatrix} 10 & 17 & -10 & 0.1 & 0.9 \\ -3 & 3 & -3 & 0.3 & -4 \\ 0.3 & 0.1 & 0.01 & -1 & 0.5 \\ 2 & 3 & 4 & -3 & 5 \\ 10 & 100 & 1 & 0.1 & 0 \end{pmatrix}$$

(3.8) Let $A$ be a symmetric positive definite $n \times n$ matrix with $n$ distinct eigenvalues. Letting $y^{(0)} = b / \|b\|_2$, consider the iteration

$$y^{(k+1)} = \frac{A y^{(k)}}{\left\| A y^{(k)} \right\|_2}.$$

(a) What is $\left\| y^{(k)} \right\|_2$?

(b) Show that $y^{(k)} = A^k b / \left\| A^k b \right\|_2$.

(c) Show that as $k \to \infty$, $y^{(k)}$ converges to the (normalized) eigenvector associated with the largest eigenvalue of $A$.

(3.9) Consider the equation

$$\begin{pmatrix} 1 & 3 & 5 \\ -2 & 2 & 4 \\ 4 & -3 & -4 \end{pmatrix} x = \begin{pmatrix} -5 \\ -6 \\ 10 \end{pmatrix}$$

Letting $x^{(0)} = [1\ 1\ 0]^\top$, find the iterate $x^{(1)}$ by one step of Richardson’s Method. And by one step of Jacobi Iteration. And by Gauss Seidel.

(3.10) Let $A$ be a symmetric $n \times n$ matrix with eigenvalues in the interval $[\alpha, \beta]$, with $0 < \alpha \le \beta$, and $\alpha + \beta \ne 0$. Consider Richardson’s Iteration

$$x^{(k+1)} = (I - \omega A)\, x^{(k)} + \omega b.$$

Recall that $e^{(k+1)} = (I - \omega A)\, e^{(k)}$.

(a) Show that the eigenvalues of $I - \omega A$ are in the interval $[1 - \omega\beta, 1 - \omega\alpha]$.

(b) Prove that

$$\max \left\{ |\lambda| : 1 - \omega\beta \le \lambda \le 1 - \omega\alpha \right\}$$

is minimized when we choose $\omega$ such that $1 - \omega\beta = -(1 - \omega\alpha)$. (Hint: It may help to look at the graph of something versus $\omega$.)

(c) Show that this relationship is satisfied by $\omega = 2 / (\alpha + \beta)$.

(d) For this choice of $\omega$ show that the spectral radius of $I - \omega A$ is

$$\frac{|\alpha - \beta|}{|\alpha + \beta|}.$$

(e) Show that when $0 < \alpha$, this quantity is always smaller than 1.

(f) Prove that if $A$ is positive definite, then there is an $\omega$ such that Richardson’s Iteration with this $\omega$ will converge for any choice of $x^{(0)}$.

(g) For which matrix do you expect faster convergence of Richardson’s Iteration: $A_1$ with eigenvalues in $[10, 20]$ or $A_2$ with eigenvalues in $[1010, 1020]$? Why?

(3.11) Implement Richardson’s Iteration to solve the system $Ax = b$. Your m-file should have header line like:

function xk = richardsons(A,b,x0,w,k)

Your code should return $x^{(k)}$ based on the iteration

$$x^{(j+1)} = x^{(j)} - \omega \left( A x^{(j)} - b \right).$$

Let w take the place of $\omega$, and let x0 be the initial iterate $x^{(0)}$. Test your code for $A$, $b$ for which you know the actual solution to the problem. (Hint: Start with $A$ and the solution $x$ and generate $b$.) Test your code on the following matrices:

• Let $A$ be the Hilbert Matrix. This is generated by the octave command A = hilb(10), or whatever (smallish) integer. Try different values of $\omega$, including $\omega = 1$.

• Let $A$ be a Toeplitz matrix of the form:

$$A = \begin{pmatrix} -2 & 1 & 0 & \cdots & 0 \\ 1 & -2 & 1 & \cdots & 0 \\ 0 & 1 & -2 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & -2 \end{pmatrix}$$

These can be generated by the octave command A = toeplitz([-2 1 0 0 0 0 0 0]). Try different values of $\omega$, including $\omega = -1/2$.

(3.12) Let $A$ be a nonsingular $n \times n$ matrix. We wish to solve $Ax = b$. Let $x^{(0)}$ be some starting vector, let $T_k$ be $\operatorname{span}\left\{ r^{(0)}, A r^{(0)}, \ldots, A^k r^{(0)} \right\}$, and let $\mathcal{P}_k$ be the set of polynomials, $p(x)$ of degree $k$ with $p(0) = 1$.

Consider the following iterative method: Let $x^{(k+1)}$ be the $x$ that solves

$$\min_{x \in x^{(0)} + T_k} \| b - Ax \|_2.$$

Let $r^{(k)} = b - A x^{(k)}$.

(a) Show that if $x \in x^{(0)} + T_k$, then $b - Ax = p(A) r^{(0)}$ for some $p \in \mathcal{P}_k$.

(b) Prove that, conversely, for any $p \in \mathcal{P}_k$ there is some $x \in x^{(0)} + T_k$, such that $b - Ax = p(A) r^{(0)}$.

(c) Argue that

$$\left\| r^{(k+1)} \right\|_2 = \min_{p \in \mathcal{P}_k} \left\| p(A) r^{(0)} \right\|_2.$$

(d) Prove that this iteration converges in at most $n$ steps. (Hint: Argue for the existence of a polynomial in $\mathcal{P}_n$ that vanishes at all the eigenvalues of $A$. Use this polynomial to show that $\left\| r^{(n)} \right\|_2 \le 0$.)

Chapter 4

Finding Roots

4.1 Bisection

We are now concerned with the problem of ﬁnding a zero for the function f(x), i.e., some c such

that f(c) = 0.

The simplest method is that of bisection. The following theorem, from calculus class, ensures the success of the method:

Theorem 4.1 (Intermediate Value Theorem). If f(x) is continuous on [a, b] then for any y

such that y is between f(a) and f(b) there is a c ∈ [a, b] such that f(c) = y.

The IVT is best illustrated graphically. Note that continuity is really a requirement here: a single point of discontinuity could ruin your whole day, as the following example illustrates.

Example 4.2. The function $f(x) = \frac{1}{x}$ is not continuous at 0. Thus if 0 ∈ [a, b], we cannot apply the IVT. In particular, if a < 0 < b it happens to be the case that for every y between f(a), f(b) there is no c ∈ [a, b] such that f(c) = y.

In particular, the IVT tells us that if f(x) is continuous and we know a, b such that f(a), f(b) have different signs, then there is some root in [a, b]. A decent estimate of the root is $c = \frac{a+b}{2}$.

We can check whether f(c) = 0. If this does not hold then one and only one of the two following

options holds:

1. f(a), f(c) have diﬀerent signs.

2. f(c), f(b) have diﬀerent signs.

We now choose to recursively apply bisection to either [a, c] or [c, b], respectively, depending on which of these two options holds.

4.1.1 Modiﬁcations

Unfortunately, it is impossible for a computer to test whether a given black box function is continuous. Thus malicious or incompetent users could cause a naïvely implemented bisection algorithm to fail. There are a number of easily conceivable problems:

1. The user might give f, a, b such that f(a), f(b) have the same sign. In this case the function f might be legitimately continuous, and might have a root in the interval [a, b]. If, taking $c = \frac{a+b}{2}$, f(a), f(b), f(c) all have the same sign, the algorithm would be at an impasse. We should perform a “sanity check” on the input to make sure f(a), f(b) have different signs.


2. The user might give f, a, b such that f is not continuous on [a, b], moreover has no root in

the interval [a, b]. For a poorly implemented algorithm, this might lead to an inﬁnite search

on smaller and smaller intervals about some discontinuity of f. In fact, the algorithm might

descend to intervals as small as machine precision, in which case the midpoint of the interval

will, due to rounding, be the same as one of the endpoints, resulting in an inﬁnite recursion.

3. The user might give f such that f has no root c that is representable in the computer’s memory. Recall that we think of computers as storing numbers in the form $\pm r \times 10^k$; given

a ﬁnite number of bits to represent a number, only a ﬁnite number of such numbers can be

represented. It may legitimately be the case that none of them is a root to f. In this case,

the behaviour of the algorithm may be like that of the previous case. A well implemented

version of bisection should check the length of its input interval, and give up if the length is

too small, compared to machine precision.

Another common error occurs in the testing of the signs of f(a), f(b). A slick programmer might

try to implement this test in the pseudocode:

if (f(a)f(b) > 0) then ...

Note however, that $|f(a)|$, $|f(b)|$ might be very small, and that $f(a)f(b)$ might be too small to be representable in the computer; this calculation would be rounded to zero, and unpredictable behaviour would ensue. A wiser choice is

if (sign(f(a)) * sign(f(b)) > 0) then ...

where the function sign (x) returns −1, 0, 1 depending on whether x is negative, zero, or positive,

respectively.
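The difference between the two tests can be seen in a short Python sketch. The function names and the sample values 1e-200 and -1e-200 are illustrative assumptions, not part of the notes; they are chosen so that the product falls below the smallest representable double.

```python
def sign(x):
    """Return -1, 0, or 1 according to whether x is negative, zero, or positive."""
    return (x > 0) - (x < 0)

def opposite_signs_by_product(fa, fb):
    # Fragile test: the product fa * fb may underflow to zero.
    return fa * fb < 0

def opposite_signs_by_sign(fa, fb):
    # Robust test: only the signs are multiplied, never the tiny values.
    return sign(fa) * sign(fb) < 0

fa, fb = 1e-200, -1e-200   # hypothetical tiny function values of opposite sign
print(opposite_signs_by_product(fa, fb))  # the product underflows, so the sign change is missed
print(opposite_signs_by_sign(fa, fb))     # the sign change is detected
```

With these values the product test fails to see the sign change, while the sign-based test detects it.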

The pseudocode bisection algorithm is given as Algorithm 1.
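For readers who prefer running code, the following is a minimal Python sketch of the same bisection idea. It assumes a < b, and the default tolerances are arbitrary choices made for illustration; it is one possible rendering, not the notes’ own implementation.

```python
def sign(x):
    return (x > 0) - (x < 0)

def run_bisection(f, a, b, delta=1e-12, eps=1e-12):
    """Bisection sketch: delta is the x-tolerance, eps the y-tolerance."""
    fa, fb = sign(f(a)), sign(f(b))
    if fa * fb > 0:
        raise ValueError("f(a) and f(b) must have different signs")
    if fa == 0:
        return a
    if fb == 0:
        return b
    c = (a + b) / 2
    while b - a > 2 * delta:
        fc = f(c)
        if abs(fc) < eps:
            return c
        if fa * sign(fc) < 0:
            b = c                    # sign change lies in [a, c]
        else:
            a, fa = c, sign(fc)      # sign change lies in [c, b]
        c = (a + b) / 2
    return c

root = run_bisection(lambda x: x * x - 2, 0.0, 2.0)
print(root)   # an approximation to the square root of 2
```

The loop stops on whichever tolerance is met first, so the returned c is not guaranteed to satisfy the y-tolerance when the interval becomes short before |f(c)| becomes small.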

4.1.2 Convergence

We can see that each time recursive_bisection(f, a, b, . . .) is called, $|b - a|$ is half the length of the interval in the previous call. Formally call the first interval $[a_0, b_0]$, and the first midpoint $c_0$. Let the second interval be $[a_1, b_1]$, etc. Note that one of $a_1, b_1$ will be $c_0$, and the other will be either $a_0$ or $b_0$. We are claiming that

\[ b_n - a_n = \frac{b_{n-1} - a_{n-1}}{2} = \frac{b_0 - a_0}{2^n}. \]

Theorem 4.3 (Bisection Method Theorem). If f(x) is a continuous function on [a, b] such that f(a)f(b) < 0, then after n steps, the algorithm run_bisection will return c such that $|c - c'| \le \frac{|b - a|}{2^n}$, where $c'$ is some root of f.
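The bound can be turned around to estimate how many steps are needed for a given accuracy. The sketch below solves $|b - a| / 2^n \le \text{tol}$ for the smallest integer n; the helper name bisection_steps is an assumption made for illustration.

```python
import math

def bisection_steps(a, b, tol):
    """Smallest n with |b - a| / 2**n <= tol, using the bound of Theorem 4.3."""
    return max(0, math.ceil(math.log2(abs(b - a) / tol)))

print(bisection_steps(0.0, 2.0, 1e-6))   # steps needed on [0, 2] for tolerance 1e-6
```

Each additional decimal digit of accuracy therefore costs a little more than three further bisection steps, since $\log_2 10 \approx 3.32$.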

4.2 Newton’s Method

Newton’s method is an iterative method for root finding. That is, starting from some guess at the root, $x_0$, one iteration of the algorithm produces a number $x_1$, which is supposed to be closer to a root; guesses $x_2, x_3, \ldots, x_n$ follow identically.

Newton’s method uses “linearization” to find an approximate root. Recalling Taylor’s Theorem, we know that

\[ f(x + h) \approx f(x) + f'(x) h. \]


Algorithm 1: Algorithm for finding root by bisection.

Input: a function, two endpoints, an x-tolerance, and a y-tolerance

Output: a c such that |f(c)| is smaller than the y-tolerance.

run_bisection(f, a, b, δ, ε)

(1) Let fa ← sign(f(a)).
(2) Let fb ← sign(f(b)).
(3) if fa·fb > 0
(4)     throw an error.
(5) else if fa·fb = 0
(6)     if fa = 0
(7)         return a
(8)     else
(9)         return b
(10) Let c ← (a + b)/2
(11) while b − a > 2δ
(12)     Let fc ← f(c).
(13)     if |fc| < ε
(14)         return c
(15)     if sign(fa) sign(fc) < 0
(16)         Let b ← c, fb ← fc, c ← (a + c)/2.
(17)     else
(18)         Let a ← c, fa ← fc, c ← (c + b)/2.
(19) return c

This approximation is better when $f''(\cdot)$ is “well-behaved” between x and x + h. Newton’s method attempts to find some h such that

\[ 0 = f(x + h) = f(x) + f'(x) h. \]

This is easily solved as

\[ h = \frac{-f(x)}{f'(x)}. \]

An iteration of Newton’s method, then, takes some guess $x_k$ and returns $x_{k+1}$ defined by

\[ x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)}. \tag{4.1} \]

An iteration of Newton’s method is shown in Figure 4.1, along with the linearization of f(x) at $x_k$.

4.2.1 Implementation

Use of Newton’s method requires that the function f(x) be diﬀerentiable. Moreover, the derivative

of the function must be known. This may preclude Newton’s method from being used when f(x)

is a black box. As is the case for the bisection method, our algorithm cannot explicitly check for

continuity of f(x). Moreover, the success of Newton’s method is dependent on the initial guess $x_0$. This was also the case with bisection, but for bisection there was an easy test of the initial interval, i.e., test if $f(a_0) f(b_0) < 0$.

Figure 4.1: One iteration of Newton’s method is shown for a quadratic function f(x). The linearization of f(x) at $x_k$ is shown. It is clear that $x_{k+1}$ is a root of the linearization. It happens to be the case that $|f(x_{k+1})|$ is smaller than $|f(x_k)|$, i.e., $x_{k+1}$ is a better guess than $x_k$.

Our algorithm will test for goodness of the estimate by looking at $|f(x_k)|$. The algorithm will also test for near-zero derivative. Note that if it were the case that $f'(x_k) = 0$ then h would be ill defined.

Algorithm 2: Algorithm for finding root by Newton’s Method.

Input: a function, its derivative, an initial guess, an iteration limit, and a tolerance

Output: a point for which the function has small value.

run_newton(f, f', x0, N, tol)

(1) Let x ← x0, n ← 0.
(2) while n ≤ N
(3)     Let fx ← f(x).
(4)     if |fx| < tol
(5)         return x.
(6)     Let fpx ← f'(x).
(7)     if |fpx| < tol
(8)         Warn “f'(x) is small; giving up.”
(9)         return x.
(10)    Let x ← x − fx/fpx.
(11)    Let n ← n + 1.
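A direct Python transcription of this pseudocode might look as follows. It is a sketch under assumptions: the default values of N and tol are arbitrary, the warning is simply printed, and the final return after the loop is an addition so that the function always returns something.

```python
def run_newton(f, fp, x0, N=50, tol=1e-12):
    """Newton's method sketch: f is the function, fp its derivative."""
    x = x0
    for _ in range(N + 1):          # n = 0, 1, ..., N
        fx = f(x)
        if abs(fx) < tol:
            return x
        fpx = fp(x)
        if abs(fpx) < tol:
            print("f'(x) is small; giving up.")
            return x
        x = x - fx / fpx
    return x   # iteration limit reached; x may not be a good estimate

root = run_newton(lambda x: x * x - 10, lambda x: 2 * x, 3.0)
print(root)   # an approximation to the square root of 10
```

Note that the same tolerance is used for both |f(x)| and |f'(x)|, as in the pseudocode; a practical code might use separate tolerances.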


4.2.2 Problems

As mentioned above, convergence is dependent on f(x), and the initial estimate $x_0$. A number of conceivable problems might come up. We illustrate them here.

Example 4.4. Consider Newton’s method applied to the function $f(x) = x^j$ with j > 1, and with initial estimate $x_0 \ne 0$.

Note that f(x) = 0 has the single root x = 0. Now note that

\[ x_{k+1} = x_k - \frac{x_k^j}{j x_k^{j-1}} = \left( 1 - \frac{1}{j} \right) x_k. \]

Since the equation has the single root x = 0, we find that $x_k$ is converging to the root. However, it is converging at a rate slower than we expect from Newton’s method: at each step the iterate shrinks only by the constant factor $1 - (1/j)$, which is a larger number (and thus worse decrease) when j is larger.
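The constant factor is easy to observe numerically. The sketch below runs a few Newton steps on $f(x) = x^j$ and prints the ratio of successive iterates; the function name and the choices j = 3, $x_0 = 1$ are assumptions made for illustration.

```python
def newton_iterates_power(j, x0, steps):
    """Newton iterates for f(x) = x**j, using f'(x) = j * x**(j - 1)."""
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(x - x**j / (j * x**(j - 1)))
    return xs

xs = newton_iterates_power(j=3, x0=1.0, steps=5)
ratios = [xs[k + 1] / xs[k] for k in range(5)]
print(ratios)   # every ratio is close to 1 - 1/3
```

Because the ratio does not shrink as the iterates approach the root, the convergence is only linear.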

Example 4.5. Consider Newton’s method applied to the function $f(x) = \frac{\ln x}{x}$, with initial estimate $x_0 = 3$.

Note that f(x) is continuous on $\mathbb{R}^+$. It has a single root at x = 1. Our initial guess is not too far from this root. However, consider the derivative:

\[ f'(x) = \frac{x \cdot \frac{1}{x} - \ln x}{x^2} = \frac{1 - \ln x}{x^2}. \]

If $x > e^1$, then $1 - \ln x < 0$, and so $f'(x) < 0$. However, for x > 1, we know f(x) > 0. Thus each Newton step from such an $x_k$ gives

\[ x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)} > x_k. \]

The estimates will “run away” from the root x = 1.

Example 4.6. Consider Newton’s method applied to the function f(x) = sin(x) for the initial estimate $x_0 \ne 0$, where $x_0$ has the odious property $2 x_0 = \tan x_0$.

You should verify that there are an infinite number of such $x_0$. Consider the identity of $x_1$:

\[ x_1 = x_0 - \frac{f(x_0)}{f'(x_0)} = x_0 - \frac{\sin(x_0)}{\cos(x_0)} = x_0 - \tan x_0 = x_0 - 2 x_0 = -x_0. \]

Now consider $x_2$:

\[ x_2 = x_1 - \frac{f(x_1)}{f'(x_1)} = -x_0 - \frac{\sin(-x_0)}{\cos(-x_0)} = -x_0 + \frac{\sin(x_0)}{\cos(x_0)} = -x_0 + \tan x_0 = -x_0 + 2 x_0 = x_0. \]

Thus Newton’s method “cycles” between the two values $x_0$, $-x_0$.
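The cycle can be reproduced numerically. The sketch below first locates one nonzero solution of $\tan x = 2x$ by bisection on the bracket [1, 1.5] (a bracket chosen by inspection, which is an assumption of this example), then takes two Newton steps for sin(x).

```python
import math

def newton_step_sin(x):
    """One Newton step for f(x) = sin(x): x - sin(x)/cos(x)."""
    return x - math.sin(x) / math.cos(x)

# Find a nonzero solution of tan(x) = 2x; tan(x) - 2x changes sign on [1, 1.5].
lo, hi = 1.0, 1.5
for _ in range(60):
    mid = (lo + hi) / 2
    if math.tan(mid) - 2 * mid < 0:
        lo = mid
    else:
        hi = mid
x0 = (lo + hi) / 2

x1 = newton_step_sin(x0)
x2 = newton_step_sin(x1)
print(x0, x1, x2)   # x1 is close to -x0, and x2 is close to x0
```

In floating point the cycle is only approximate: rounding errors grow with each step, so after many iterations the computed iterates drift away from the exact two-point cycle.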

Of course, Newton’s method may find some iterate $x_k$ for which $f'(x_k) = 0$, in which case, there is no well-defined $x_{k+1}$.

4.2.3 Convergence

When Newton’s Method converges, it actually displays quadratic convergence. That is, if $e_k = r - x_k$, where r is the root that the $x_k$ are converging to, then $|e_{k+1}| \le C |e_k|^2$. If, for example, it were the case that C = 1, then we would double the number of accurate digits of our root estimate with each iterate. That is, if $e_0$ were 0.001, we would expect $e_1$ to be on the order of 0.000001. The following theorem formalizes our claim:


Theorem 4.7 (Newton’s Method Convergence). If f(x) has two continuous derivatives, and r is a simple root of f(x), then there is some D such that if $|x_0 - r| < D$, Newton’s method will converge quadratically to r.

The proof is based on arguments from real analysis, and is omitted; see Cheney & Kincaid for

the proof [7]. Take note of the following, though:

1. The proof requires that r be a simple root, that is that $f'(r) \ne 0$. When this does not hold we may get only linear convergence, as in Example 4.4.

2. The key to the proof is using Taylor’s theorem, and the definition of Newton’s method to show that

\[ e_{k+1} = \frac{-f''(\xi_k) e_k^2}{2 f'(x_k)}, \]

where $\xi_k$ is between $x_k$ and $r = x_k + e_k$.

The proof then proceeds by showing that there is some region about r such that

(a) in this region $|f''(x)|$ is not too large and $|f'(x)|$ is not too small, and

(b) if $x_k$ is in the region, then $x_{k+1}$ is in the region.

In particular the region is chosen to be so small that if $x_k$ is in the region, then the factor $e_k^2$ will outweigh the factor $|f''(\xi_k)| / 2 |f'(x_k)|$. You can roughly think of this region as an “attractor” region for the root.

3. The theorem never guarantees that some $x_k$ will fall into the “attractor” region of a root r of the function, as in Example 4.5 and Example 4.6. The theorem that follows gives sufficient conditions for convergence to a particular root.

Theorem 4.8 (Newton’s Method Convergence II [2]). If f(x) has two continuous derivatives on [a, b], and

1. $f(a) f(b) < 0$,

2. $f'(x) \ne 0$ on [a, b],

3. $f''(x)$ does not change sign on [a, b],

4. Both $|f(a)| \le (b - a) |f'(a)|$ and $|f(b)| \le (b - a) |f'(b)|$ hold,

then Newton’s method converges to the unique root of f(x) = 0 for any choice of $x_0 \in [a, b]$.

4.2.4 Using Newton’s Method

Newton’s Method can be used to program more complex functions using only simple functions.

Suppose you had a computer which could perform addition, subtraction, multiplication, division,

and storing and retrieving numbers, and it was your task to write a subroutine to compute some

complex function g(). One way of solving this problem is to have your subroutine use Newton’s

Method to solve some equation equivalent to g(z) −x = 0, where z is the input to the subroutine.

Note that it is assumed the subroutine cannot evaluate g(z) directly, so this equation needs to be

modiﬁed to satisfy the computer’s constraints.

Quite often when dealing with problems of this type, students make the mistake of using Newton’s Method to try to solve a linear equation. This should be an indication that a mistake was made, since Newton’s Method can solve a linear equation in a single step:

Example 4.9. Consider Newton’s method applied to the function f(x) = ax + b. The iteration is given as

\[ x_{k+1} \leftarrow x_k - \frac{a x_k + b}{a}. \]

This can be rewritten simply as $x_{k+1} \leftarrow -b/a$.


The following example problems should illustrate this process of “bootstrapping” via Newton’s

Method.

Example Problem 4.10. Devise a subroutine using only subtraction and multiplication that can

ﬁnd the multiplicative inverse of an input number z, i.e., can ﬁnd (1/z) .

Solution: We are tempted to use the linear function $f(x) = (1/z) - x$. But this is a linear equation for which Newton’s Method would reduce to $x_{k+1} \leftarrow (1/z)$. Since the subroutine can only use subtraction and multiplication, this will not work.

Instead apply Newton’s Method to $f(x) = z - (1/x)$. The Newton step is

\[ x_{k+1} \leftarrow x_k - \frac{z - (1/x_k)}{1/x_k^2} = x_k - z x_k^2 + x_k = x_k (2 - z x_k). \]

Note this step uses only multiplication and subtraction. The subroutine is given in Algorithm 3. ⊣

Algorithm 3: Algorithm for finding a multiplicative inverse using simple operations.

Input: a number

Output: its multiplicative inverse

invs(z)

(1) if z = 0 throw an error.
(2) Let x ← sign(z), n ← 0.
(3) while n ≤ 50
(4)     Let x ← x(2 − zx).
(5)     Let n ← n + 1.
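The inverse-finding iteration can be sketched in Python as below. The explicit return statement and the iterations parameter are additions for illustration; the starting guess sign(z) is taken from the pseudocode, and whether it is a good guess for every z is left open here (see Exercise 4.10).

```python
def sign(x):
    return (x > 0) - (x < 0)

def invs(z, iterations=51):
    """Approximate 1/z using only multiplication and subtraction."""
    if z == 0:
        raise ZeroDivisionError("0 has no multiplicative inverse")
    x = sign(z)                    # initial guess, as in the pseudocode
    for _ in range(iterations):    # corresponds to n = 0, 1, ..., 50
        x = x * (2 - z * x)        # Newton step for f(x) = z - 1/x
    return x

print(invs(0.5))    # close to 2
print(invs(-1.5))   # close to -2/3
```

One way to monitor progress is the quantity $1 - z x_k$, which is squared by each step since $1 - z x_k (2 - z x_k) = (1 - z x_k)^2$.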

Example Problem 4.11. Devise a subroutine using only simple operations which computes the

square root of an input number z.

Solution: The temptation is to find a zero for $f(x) = \sqrt{z} - x$. However, this equation is linear in x. Instead let $f(x) = z - x^2$. You can easily see that if x is a positive root of f(x) = 0, then $x = \sqrt{z}$. The Newton step becomes

\[ x_{k+1} \leftarrow x_k - \frac{z - x_k^2}{-2 x_k}. \]

After some simplification this becomes

\[ x_{k+1} \leftarrow \frac{x_k}{2} + \frac{z}{2 x_k}. \]

Note this relies only on addition, multiplication and division.

The final subroutine is given in Algorithm 4. ⊣
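As a runnable illustration of the square-root iteration derived in this example, here is a minimal Python sketch. The name newton_sqrt (chosen to avoid clashing with library square-root functions), the starting guess of 1, and the fixed iteration count are assumptions of this sketch.

```python
def newton_sqrt(z, iterations=51):
    """Approximate the square root of z with the step x <- x/2 + z/(2x)."""
    if z < 0:
        raise ValueError("square root of a negative number")
    x = 1.0
    for _ in range(iterations):
        x = (x + z / x) / 2
    return x

print(newton_sqrt(10.0))   # an approximation to the square root of 10
```

A fixed iteration count keeps the sketch simple; a more careful routine would stop once successive iterates agree to the desired tolerance.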

Algorithm 4: Algorithm for finding a square root using simple operations.

Input: a number

Output: its square root

sqrt(z)

(1) if z < 0 throw an error.
(2) Let x ← 1, n ← 0.
(3) while n ≤ 50
(4)     Let x ← (x + z/x)/2.
(5)     Let n ← n + 1.

4.3 Secant Method

The secant method for root finding is roughly based on Newton’s method; however, it is not assumed that the derivative of the function is known, rather the derivative is approximated by the value of the function at some of the iterates, $x_k$. More specifically, the slope of the tangent line at $(x_k, f(x_k))$, which is $f'(x_k)$, is approximated by the slope of the secant line passing through $(x_{k-1}, f(x_{k-1}))$ and $(x_k, f(x_k))$, which is

\[ \frac{f(x_k) - f(x_{k-1})}{x_k - x_{k-1}}. \]

Figure 4.2: One iteration of the Secant method is shown for some quadratic function f(x). The secant line through $(x_{k-1}, f(x_{k-1}))$ and $(x_k, f(x_k))$ is shown. It happens to be the case that $|f(x_{k+1})|$ is smaller than $|f(x_k)|$, i.e., $x_{k+1}$ is a better guess than $x_k$.

Thus the iterate $x_{k+1}$ is the root of this secant line. That is, it is a root to the equation

\[ \frac{f(x_k) - f(x_{k-1})}{x_k - x_{k-1}} (x - x_k) = y - f(x_k). \]

Since the root has a y value of 0, we have

\[ \frac{f(x_k) - f(x_{k-1})}{x_k - x_{k-1}} (x_{k+1} - x_k) = -f(x_k), \]

\[ x_{k+1} - x_k = -\frac{x_k - x_{k-1}}{f(x_k) - f(x_{k-1})} f(x_k), \]

\[ x_{k+1} = x_k - \frac{x_k - x_{k-1}}{f(x_k) - f(x_{k-1})} f(x_k). \tag{4.2} \]

You will note this is the recurrence of Newton’s method, but with the slope $f'(x_k)$ substituted by the slope of the secant line. Note also that the secant method requires two initial guesses, $x_0, x_1$, but can be used on a black box function. The secant method can suffer from some of the same problems that Newton’s method does, as we will see.

An iteration of the secant method is shown in Figure 4.2, along with the secant line.

Example 4.12. Consider the secant method used on $x^3 + x^2 - x - 1$, with $x_0 = 2$, $x_1 = \frac{1}{2}$.

Note that this function is continuous and has roots ±1. We give the iterates here:

k    x_k                  f(x_k)
0    2                    9
1    0.5                  −1.125
2    0.666666666666667    −0.925925925925926
3    1.44186046511628     2.63467367653163
4    0.868254072087394    −0.459842466254495
5    0.953491494113659    −0.177482458876898
6    1.00706900811804     0.0284762692197613
7    0.999661272951803    −0.0013544492875992
8    0.999997617569723    −9.52969840528617 × 10^{−06}
9    1.0000000008072      3.22880033820638 × 10^{−09}
10   0.999999999999998    −7.43849426498855 × 10^{−15}

4.3.1 Problems

As with Newton’s method, convergence is dependent on f(x), and the initial estimates $x_0, x_1$. We illustrate a few possible problems:

Example 4.13. Consider the secant method for the function $f(x) = \frac{\ln x}{x}$, with $x_0 = 3$, $x_1 = 4$.

As with Newton’s method, the iterates diverge towards infinity, looking for a nonexistent root. We give some iterates here:

k    x_k                 f(x_k)
0    3                   0.366204096222703
1    4                   0.346573590279973
2    21.6548475770851    0.142011128224341
3    33.9111765137635    0.103911011441661
4    67.3380435135758    0.0625163004418104
5    117.820919458675    0.0404780904944712
6    210.543986613847    0.0254089165873003
7    366.889164762149    0.0160949419231219
8    637.060241341843    0.010135406045582
9    1096.54125113444    0.00638363233543847
10   1878.34688714646    0.00401318169994875
11   3201.94672271613    0.0025208146648422
12   5437.69020766155    0.00158175793894727
13   9203.60222260594    0.000991714984152597
14   15533.1606791089    0.000621298692241343
15   26149.7196085218    0.000388975250950428
16   43924.8466075548    0.000243375589137882
17   73636.673898472     0.000152191807070607

Example 4.14. Consider the secant method applied to the function $f(x) = \frac{1}{1 + x^2} - \frac{1}{17}$, with initial estimates $x_0 = -1$, $x_1 = 1$.

You can easily verify that $f(x_0) = f(x_1)$, and thus the secant line is horizontal. And thus $x_2$ is not defined.

Algorithm 5: Algorithm for finding root by secant method.

Input: a function, initial guesses, an iteration limit, and a tolerance

Output: a point for which the function has small value.

run_secant(f, x0, x1, N, tol)

(1) Let x ← x1, xp ← x0, fh ← f(x1), fp ← f(x0), n ← 0.
(2) while n ≤ N
(3)     if |fh| < tol
(4)         return x.
(5)     Let fpx ← (fh − fp)/(x − xp).
(6)     if |fpx| < tol
(7)         Warn “secant slope is too small; giving up.”
(8)         return x.
(9)     Let xp ← x, fp ← fh.
(10)    Let x ← x − fh/fpx.
(11)    Let fh ← f(x).
(12)    Let n ← n + 1.
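The secant pseudocode translates almost line for line into Python. The sketch below makes a few assumptions: the defaults for N and tol are arbitrary, the warning is printed, a final return is added after the loop, and, like the pseudocode, it does not guard against two iterates coinciding exactly.

```python
def run_secant(f, x0, x1, N=50, tol=1e-12):
    """Secant method sketch following the structure of the pseudocode."""
    xp, x = x0, x1
    fp, fh = f(x0), f(x1)
    for _ in range(N + 1):             # n = 0, 1, ..., N
        if abs(fh) < tol:
            return x
        fpx = (fh - fp) / (x - xp)     # slope of the secant line
        if abs(fpx) < tol:
            print("secant slope is too small; giving up.")
            return x
        xp, fp = x, fh
        x = x - fh / fpx
        fh = f(x)
    return x   # iteration limit reached; x may not be a good estimate

cubic = lambda x: x**3 + x**2 - x - 1
print(run_secant(cubic, 2.0, 0.5))   # the setup of Example 4.12; converges near 1
```

On the function of Example 4.14 with $x_0 = -1$, $x_1 = 1$ the computed slope is zero, so the sketch warns and returns instead of dividing by zero.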

4.3.2 Convergence

We consider convergence analysis as in Newton’s method. We assume that r is a root of f(x), and let $e_n = r - x_n$. Because the secant method involves two iterates, we assume that we will find some relation between $e_{k+1}$ and the previous two errors, $e_k, e_{k-1}$.

Indeed, this is the case. Omitting all the nasty details (see Cheney & Kincaid [7]), we arrive at the imprecise equation:

\[ e_{k+1} \approx -\frac{f''(r)}{2 f'(r)} e_k e_{k-1} = C e_k e_{k-1}. \]

Again, the proof relies on ﬁnding some “attractor” region and using continuity.

We now postulate that the error terms for the secant method follow some power law of the following type:

\[ |e_{k+1}| \sim A |e_k|^{\alpha}. \]

Recall that this held true for Newton’s method, with α = 2. We try to find the α for the secant method. Note that

\[ |e_k| = A |e_{k-1}|^{\alpha}, \]

so

\[ |e_{k-1}| = A^{-\frac{1}{\alpha}} |e_k|^{\frac{1}{\alpha}}. \]

Then we have

\[ A |e_k|^{\alpha} = |e_{k+1}| = C |e_k| |e_{k-1}| = C A^{-\frac{1}{\alpha}} |e_k|^{1 + \frac{1}{\alpha}}. \]

Since this equation is to hold for all $e_k$, we must have

\[ \alpha = 1 + \frac{1}{\alpha}. \]

This is solved by $\alpha = \frac{1}{2} \left( 1 + \sqrt{5} \right) \approx 1.62$. Thus we say that the secant method enjoys superlinear convergence; this is somewhere between the convergence rates for bisection and Newton’s method.
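The predicted exponent can be compared with an empirical estimate. If $|e_{k+1}| \approx A |e_k|^{\alpha}$, then $\alpha \approx \log(e_{k+1}/e_k) / \log(e_k/e_{k-1})$. The sketch below applies this to iterates 7, 8 and 9 listed in Example 4.12, whose root is r = 1; the variable names are assumptions of this example.

```python
import math

# Positive solution of alpha = 1 + 1/alpha.
alpha = (1 + math.sqrt(5)) / 2

# Iterates x_7, x_8, x_9 from the table in Example 4.12 (the root is r = 1).
iterates = [0.999661272951803, 0.999997617569723, 1.0000000008072]
e7, e8, e9 = (abs(1.0 - x) for x in iterates)

# Empirical order from three consecutive errors.
order_estimate = math.log(e9 / e8) / math.log(e8 / e7)

print(alpha)            # about 1.62
print(order_estimate)   # roughly 1.6 for these iterates
```

Only a few iterates are usable for such an estimate: early iterates are not yet in the asymptotic regime, and late ones are dominated by rounding error.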


Exercises

(4.1) Consider the bisection method applied to find the zero of the function $f(x) = x^2 - 5x + 3$, with $a_0 = 0$, $b_0 = 1$. What are $a_1, b_1$? What are $a_2, b_2$?

(4.2) Approximate $\sqrt{10}$ by using two steps of Newton’s method, with an initial estimate of $x_0 = 3$. (cf. Example Problem 4.11) Your answer should be correct to 5 decimal places.

(4.3) Consider bisection for finding the root to cos x = 0. Let the initial interval $I_0$ be [0, 2]. What is the next interval considered, call it $I_1$? What is $I_2$? $I_6$?

(4.4) What does the sequence defined by

\[ x_0 = 1, \quad x_{k+1} = \frac{1}{2} x_k + \frac{1}{x_k} \]

converge to?

(4.5) Devise a subroutine using only simple operations that finds, via Newton’s Method, the cube root of some input number z.

(4.6) Use Newton’s Method to approximate $\sqrt[3]{9}$. Start with $x_0 = 2$. Find $x_2$.

(4.7) Use Newton’s Method to devise a sequence $x_0, x_1, \ldots$ such that $x_k \to \ln 10$. Is this a reasonable way to write a subroutine that, given z, computes ln z? (Hint: such a subroutine would require computation of $e^{x_k}$. Is this possible for rational $x_k$ without using a logarithm? Is it practical?)

(4.8) Give an example (graphical or analytic) of a function, f(x) for which Newton’s Method:

(a) Does not find a root for some choices of $x_0$.

(b) Finds a root for every choice of $x_0$.

(c) Falls into a cycle for some choice of $x_0$.

(d) Converges slowly to the zero of f(x).

(4.9) How will Newton’s Method perform for finding the root to $f(x) = |x| = 0$?

(4.10) Implement the inverse finder described in Example Problem 4.10. Your m-file should have a header line like:

function c = invs(z)

where z is the number to be inverted. You may wish to use the builtin function sign. As an extra termination condition, you should have the subroutine return the current iterate if zx is sufficiently close to 1, say within the interval (0.9999, 1.0001). Can you find some z for which the algorithm performs poorly?

(4.11) Implement the bisection method in octave/Matlab. Your m-file should have a header line like:

function c = run_bisection(f, a, b, tol)

where f is the name of the function. Recall that feval(f,x) when f is a string with the name of some function evaluates that function at x. This works for builtin functions and m-files.

(a) Run your code on the function f(x) = cos x, with $a_0 = 0$, $b_0 = 2$. In this case you can set f = "cos".

(b) Run your code on the function $f(x) = x^2 - 5x + 3$, with $a_0 = 0$, $b_0 = 1$. In this case you will have to set f to the name of an m-file (without the “.m”) which will evaluate the given f(x).

(c) Run your code on the function f(x) = x − cos x, with $a_0 = 0$, $b_0 = 1$.

(4.12) Implement Newton’s Method. Your m-file should have a header line like:

function x = run_newton(f, fp, x0, N, tol)

where f is the name of the function, and fp is its derivative. Run your code to find zeroes of the following functions:

(a) f(x) = tan x − x.

(b) $f(x) = x^2 - (2 + \epsilon) x + 1 + \epsilon$, for $\epsilon$ small.

(c) $f(x) = x^7 - 7x^6 + 21x^5 - 35x^4 + 35x^3 - 21x^2 + 7x - 1$.

(d) $f(x) = (x - 1)^7$.

(4.13) Implement the secant method. Your m-file should have a header line like:

function x = run_secant(f, x0, x1, N, tol)

where f is the name of the function. Run your code to find zeroes of the following functions:

(a) f(x) = tan x −x.

(b) f(x) = x

2

+ 1. (Note: This function has no zeroes.)

(c) f(x) = 2 + sin x. (Note: This function also has no zeroes.)


Chapter 5

Interpolation

5.1 Polynomial Interpolation

We consider the problem of finding a polynomial that interpolates a given set of values:

\[ \begin{array}{c|cccc} x & x_0 & x_1 & \ldots & x_n \\ \hline y & y_0 & y_1 & \ldots & y_n \end{array} \]

where the $x_i$ are all distinct. A polynomial p(x) is said to interpolate these data if $p(x_i) = y_i$ for i = 0, 1, . . . , n. The $x_i$ values are called “nodes.”

Sometimes, we will consider a variant of this problem: we have some black box function, f(x), which we want to approximate with a polynomial p(x). We do this by finding the polynomial interpolant to the data

\[ \begin{array}{c|cccc} x & x_0 & x_1 & \ldots & x_n \\ \hline f(x) & f(x_0) & f(x_1) & \ldots & f(x_n) \end{array} \]

for some choice of distinct nodes $x_i$.

5.1.1 Lagrange’s Method

As you might have guessed, for any such set of data, there is a polynomial of degree at most n that interpolates it. We present a constructive proof of this fact by use of Lagrange Polynomials.

Definition 5.1 (Lagrange Polynomials). For a given set of n + 1 nodes $x_i$, the Lagrange polynomials are the n + 1 polynomials $\ell_i$ defined by

\[ \ell_i(x_j) = \delta_{ij} = \begin{cases} 0 & \text{if } i \ne j \\ 1 & \text{if } i = j \end{cases} \]

Then we define the interpolating polynomial as

\[ p_n(x) = \sum_{i=0}^{n} y_i \ell_i(x). \]

If each Lagrange Polynomial is of degree at most n, then $p_n$ also has this property. The Lagrange Polynomials can be characterized as follows:

\[ \ell_i(x) = \prod_{j=0,\, j \ne i}^{n} \frac{x - x_j}{x_i - x_j}. \tag{5.1} \]


By evaluating this product for each $x_j$, we see that this is indeed a characterization of the Lagrange Polynomials. Moreover, each polynomial is clearly the product of n linear factors, and thus has degree no greater than n.

This gives the theorem:

Theorem 5.2 (Interpolant Existence and Uniqueness). Let $\{x_i\}_{i=0}^{n}$ be distinct nodes. Then for any values at the nodes, $\{y_i\}_{i=0}^{n}$, there is exactly one polynomial, p(x) of degree no greater than n such that $p(x_i) = y_i$ for i = 0, 1, . . . , n.

Proof. The Lagrange Polynomial construction gives existence of such a polynomial p(x) of degree

no greater than n.

Suppose there were two such polynomials, call them p(x), q(x), each of degree no greater than n, both interpolating the data. Let r(x) = p(x) − q(x). Note that r(x) can have degree no greater than n, yet it has roots at $x_0, x_1, \ldots, x_n$. The only polynomial of degree ≤ n that has n + 1 distinct roots is the zero polynomial, i.e., 0 ≡ r(x) = p(x) − q(x). Thus p(x), q(x) are equal everywhere, i.e., they are the same polynomial.

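Example Problem 5.3. Construct the polynomial interpolating the data

\[ \begin{array}{c|ccc} x & 1 & \frac{1}{2} & 3 \\ \hline y & 3 & -10 & 2 \end{array} \]

by using Lagrange Polynomials.

Solution: We construct the Lagrange Polynomials:

\[ \ell_0(x) = \frac{(x - \frac{1}{2})(x - 3)}{(1 - \frac{1}{2})(1 - 3)} = -(x - \tfrac{1}{2})(x - 3) \]

\[ \ell_1(x) = \frac{(x - 1)(x - 3)}{(\frac{1}{2} - 1)(\frac{1}{2} - 3)} = \frac{4}{5}(x - 1)(x - 3) \]

\[ \ell_2(x) = \frac{(x - 1)(x - \frac{1}{2})}{(3 - 1)(3 - \frac{1}{2})} = \frac{1}{5}(x - 1)(x - \tfrac{1}{2}) \]

Then the interpolating polynomial, in “Lagrange Form” is

\[ p_2(x) = -3(x - \tfrac{1}{2})(x - 3) - 8(x - 1)(x - 3) + \frac{2}{5}(x - 1)(x - \tfrac{1}{2}) \]

⊣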
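The Lagrange construction can be checked numerically on the data of this example. The following Python sketch evaluates the product formula (5.1) directly; the function names are assumptions made for illustration.

```python
def lagrange_basis(nodes, i, x):
    """Evaluate the i-th Lagrange polynomial for the given nodes at x."""
    value = 1.0
    for j, xj in enumerate(nodes):
        if j != i:
            value *= (x - xj) / (nodes[i] - xj)
    return value

def lagrange_interpolate(nodes, values, x):
    """Evaluate the interpolant sum_i y_i * l_i(x)."""
    return sum(y * lagrange_basis(nodes, i, x) for i, y in enumerate(values))

nodes, values = [1.0, 0.5, 3.0], [3.0, -10.0, 2.0]
for xi in nodes:
    print(xi, lagrange_interpolate(nodes, values, xi))   # reproduces the data values
```

Each evaluation costs on the order of n² operations, since every one of the n + 1 basis polynomials is a product of n factors.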

Figure 5.1: The 3 Lagrange polynomials for the example are shown. Verify, graphically, that these polynomials have the property $\ell_i(x_j) = \delta_{ij}$ for the nodes 1, $\frac{1}{2}$, 3.

5.1.2 Newton’s Method

There is another way to prove existence. This method is also constructive, and leads to a different algorithm for constructing the interpolant. One way to view this construction is to imagine how one would update the Lagrangian form of an interpolant. That is, suppose some data were given, and the interpolating polynomial calculated using Lagrange Polynomials; then a new point was given $(x_{n+1}, y_{n+1})$, and an interpolant for the augmented set of data is to be found. Each Lagrange Polynomial would have to be updated. This could take a lot of calculation (especially if n is large).

So the alternative method constructs the polynomials iteratively. Thus we create polynomials $p_k(x)$ such that $p_k(x_i) = y_i$ for 0 ≤ i ≤ k. This is simple for k = 0: we simply let

\[ p_0(x) = y_0, \]

the constant polynomial with value $y_0$. Then assume we have a proper $p_k(x)$ and want to construct $p_{k+1}(x)$. The following construction works:

\[ p_{k+1}(x) = p_k(x) + c (x - x_0)(x - x_1) \cdots (x - x_k), \]

for some constant c. Note that the second term will be zero for any $x_i$ for 0 ≤ i ≤ k, so $p_{k+1}(x)$ will interpolate the data at $x_0, x_1, \ldots, x_k$. To get the value of the constant we calculate c such that

\[ y_{k+1} = p_{k+1}(x_{k+1}) = p_k(x_{k+1}) + c (x_{k+1} - x_0)(x_{k+1} - x_1) \cdots (x_{k+1} - x_k). \]

This construction is known as Newton’s Algorithm, and the resultant form is Newton’s form of the interpolant.

Example Problem 5.4. Construct the polynomial interpolating the data

\[ \begin{array}{c|ccc} x & 1 & \frac{1}{2} & 3 \\ \hline y & 3 & -10 & 2 \end{array} \]

by using Newton’s Algorithm.

Solution: We construct the polynomial iteratively:

\[ p_0(x) = 3 \]

\[ p_1(x) = 3 + c(x - 1) \]

We want $-10 = p_1(\frac{1}{2}) = 3 + c(-\frac{1}{2})$, and thus c = 26. Then

\[ p_2(x) = 3 + 26(x - 1) + c(x - 1)(x - \tfrac{1}{2}) \]

We want $2 = p_2(3) = 3 + 26(2) + c(2)(\frac{5}{2})$, and thus $c = \frac{-53}{5}$. Then we get

\[ p_2(x) = 3 + 26(x - 1) + \frac{-53}{5}(x - 1)(x - \tfrac{1}{2}). \]

⊣

Figure 5.2: The interpolating polynomial for the example is shown.
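The node-by-node construction used in this example can be sketched in Python as follows. The function name newton_coefficients is an assumption; the routine solves for each new constant c exactly as in the worked solution, by evaluating the current interpolant and the product of factors at the new node.

```python
def newton_coefficients(nodes, values):
    """Coefficients of Newton's form, built one node at a time."""
    coeffs = []
    for k, (xk, yk) in enumerate(zip(nodes, values)):
        # Evaluate the current interpolant and the product
        # (xk - x_0)(xk - x_1)...(xk - x_{k-1}) at the new node xk.
        p, prod = 0.0, 1.0
        for j in range(k):
            p += coeffs[j] * prod
            prod *= xk - nodes[j]
        coeffs.append((yk - p) / prod)   # solve yk = p + c * prod for c
    return coeffs

print(newton_coefficients([1.0, 0.5, 3.0], [3.0, -10.0, 2.0]))
# approximately [3, 26, -10.6], matching the constants found in the example
```

Adding a further data point only appends one more coefficient; the earlier ones are unchanged, which is the flexibility the text attributes to Newton’s Algorithm.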

Does Newton’s Algorithm give a different polynomial? It is easy to show, by induction, that the degree of $p_n(x)$ is no greater than n. Then, by Theorem 5.2, it must be the same polynomial as the Lagrange interpolant.

Since the two methods give the same interpolant, we may wonder “Which should we use?” Newton’s Algorithm seems more flexible: it can deal with adding new data. There also happens to be a way of storing Newton’s form of the interpolant that makes the polynomial simple to evaluate (in the sense of number of calculations required).

5.1.3 Newton’s Nested Form

Recall the iterative construction of Newton’s Form:

\[ p_{k+1}(x) = p_k(x) + c_{k+1} (x - x_0)(x - x_1) \cdots (x - x_k). \]

The previous iterate $p_k(x)$ was constructed similarly, so we can write:

\[ p_{k+1}(x) = \left[ p_{k-1}(x) + c_k (x - x_0)(x - x_1) \cdots (x - x_{k-1}) \right] + c_{k+1} (x - x_0)(x - x_1) \cdots (x - x_k). \]

Continuing in this way we see that we can write

\[ p_n(x) = \sum_{k=0}^{n} c_k \left[ \prod_{0 \le j < k} (x - x_j) \right], \tag{5.2} \]

where an empty product has value 1 by convention. This can be rewritten in a funny form, where a monomial is factored out of each successive summand:

\[ p_n(x) = c_0 + (x - x_0)[c_1 + (x - x_1)[c_2 + (x - x_2)[\ldots]]] \]

Supposing for an instant that the constants $c_k$ were known, this provides a better way of calculating $p_n(t)$ at arbitrary t. By “better” we mean requiring fewer multiplications and additions. This nested calculation is performed iteratively:

\[ \begin{aligned} v_0 &= c_n \\ v_1 &= c_{n-1} + (t - x_{n-1}) v_0 \\ v_2 &= c_{n-2} + (t - x_{n-2}) v_1 \\ &\;\;\vdots \\ v_n &= c_0 + (t - x_0) v_{n-1} \end{aligned} \]

This requires only n multiplications and 2n additions. Compare this with the number required for using the Lagrange form: at least $n^2$ additions and multiplications.
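The nested evaluation is short enough to write out in full. The Python sketch below assumes the coefficients $c_0, \ldots, c_n$ are already known; the function name and argument order are illustrative choices.

```python
def newton_eval(coeffs, nodes, t):
    """Evaluate Newton's nested form at t: start with c_n and work outward."""
    v = coeffs[-1]
    for k in range(len(coeffs) - 2, -1, -1):
        v = coeffs[k] + (t - nodes[k]) * v
    return v

coeffs, nodes = [3.0, 26.0, -10.6], [1.0, 0.5, 3.0]   # from Example Problem 5.4
for t in nodes:
    print(t, newton_eval(coeffs, nodes, t))   # reproduces 3, -10, 2 up to rounding
```

Only the first n nodes are used in the evaluation; the last node enters through the coefficients alone.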

5.1.4 Divided Diﬀerences

It turns out that the coefficients $c_k$ for Newton’s nested form can be calculated relatively easily by using divided differences. We assume, for the remainder of this section, that we are considering interpolating a function, that is, we have values of $f(x_i)$ at the nodes $x_i$.

Definition 5.5 (Divided Differences). For a given collection of nodes $\{x_i\}_{i=0}^{n}$ and values $\{f(x_i)\}_{i=0}^{n}$, a $k$th order divided difference is a function of $k + 1$ (not necessarily distinct) nodes, written as

$$f[x_{i_0}, x_{i_1}, \ldots, x_{i_k}]$$

The divided differences are defined recursively as follows:

• The 0th order divided differences are simply defined:

$$f[x_i] = f(x_i).$$

• Higher order divided differences are the ratio of differences:

$$f[x_{i_0}, x_{i_1}, \ldots, x_{i_k}] = \frac{f[x_{i_1}, x_{i_2}, \ldots, x_{i_k}] - f[x_{i_0}, x_{i_1}, \ldots, x_{i_{k-1}}]}{x_{i_k} - x_{i_0}}$$

We care about divided differences because coefficients for the Newton nested form are divided differences:

$$c_k = f[x_0, x_1, \ldots, x_k]. \tag{5.3}$$

Because we are only interested in the Newton method coefficients, we will only consider divided differences with successive nodes, i.e., those of the form $f[x_j, x_{j+1}, \ldots, x_{j+k}]$. In this case the higher order differences can more simply be written as

$$f[x_j, x_{j+1}, \ldots, x_{j+k}] = \frac{f[x_{j+1}, x_{j+2}, \ldots, x_{j+k}] - f[x_j, x_{j+1}, \ldots, x_{j+k-1}]}{x_{j+k} - x_j}$$


The graphical way to calculate these things is a “pyramid scheme”¹, where you compute the following table columnwise, from left to right:

$$\begin{array}{c|cccc}
x & f[\;] & f[\;,\;] & f[\;,\;,\;] & f[\;,\;,\;,\;] \\ \hline
x_0 & f[x_0] & & & \\
 & & f[x_0, x_1] & & \\
x_1 & f[x_1] & & f[x_0, x_1, x_2] & \\
 & & f[x_1, x_2] & & f[x_0, x_1, x_2, x_3] \\
x_2 & f[x_2] & & f[x_1, x_2, x_3] & \\
 & & f[x_2, x_3] & & \\
x_3 & f[x_3] & & &
\end{array}$$

Note that by definition, the first column just echoes the data: $f[x_i] = f(x_i)$.

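We drag out our example one last time:

Example Problem 5.6. Find the divided differences for the following data

$$\begin{array}{c|ccc}
x & 1 & \tfrac{1}{2} & 3 \\ \hline
f(x) & 3 & -10 & 2
\end{array}$$

Solution: We start by writing in the data:

$$\begin{array}{c|ccc}
x & f[\;] & f[\;,\;] & f[\;,\;,\;] \\ \hline
1 & 3 & & \\
\tfrac{1}{2} & -10 & & \\
3 & 2 & &
\end{array}$$

Then we calculate:

$$f[x_0, x_1] = \frac{-10 - 3}{(1/2) - 1} = 26, \quad\text{and}\quad f[x_1, x_2] = \frac{2 - (-10)}{3 - (1/2)} = \frac{24}{5}.$$

Adding these to the table, we have

$$\begin{array}{c|ccc}
x & f[\;] & f[\;,\;] & f[\;,\;,\;] \\ \hline
1 & 3 & & \\
 & & 26 & \\
\tfrac{1}{2} & -10 & & \\
 & & \tfrac{24}{5} & \\
3 & 2 & &
\end{array}$$

Then we calculate

$$f[x_0, x_1, x_2] = \frac{24/5 - 26}{3 - 1} = -\frac{53}{5}.$$

So we complete the table:

$$\begin{array}{c|ccc}
x & f[\;] & f[\;,\;] & f[\;,\;,\;] \\ \hline
1 & 3 & & \\
 & & 26 & \\
\tfrac{1}{2} & -10 & & -\tfrac{53}{5} \\
 & & \tfrac{24}{5} & \\
3 & 2 & &
\end{array}$$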

¹ That’s supposed to be a joke.


You should verify that along the top line of this pyramid you can read off the coefficients for Newton’s form, as found in Example Problem 5.4. ⊣
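The pyramid can be computed column by column in a single vector, overwriting entries from the bottom up so that each column only needs the previous one. The sketch below is one way to do this in Octave; the name `divdiff` and the in-place layout are choices made for illustration, not something the notes prescribe.

```octave
1;  % script marker so the function can be defined in this file

% divdiff is a hypothetical name. On return, c(k+1) = f[x_0, ..., x_k],
% i.e. the top edge of the pyramid.
function c = divdiff(xs, fxs)
  n = length(xs);
  c = fxs;                                   % column of 0th order differences
  for k = 1:n-1                              % k is the order of the difference
    for j = n:-1:k+1                         % go bottom-up so c(j-1) is still the old column
      c(j) = (c(j) - c(j-1)) / (xs(j) - xs(j-k));
    end
  end
end

% Data of Example Problem 5.6.
coefs = divdiff([1, 1/2, 3], [3, -10, 2]);
disp(coefs);   % the entries should be close to 3, 26 and -53/5
```

After pass $k$ the entry `c(j)` holds $f[x_{j-1-k}, \ldots, x_{j-1}]$ in the 0-based notation of the text, so only the entries needed for the Newton coefficients survive at the end.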

5.2 Errors in Polynomial Interpolation

We now consider two questions:

1. If we want to interpolate some function f(x) at n + 1 nodes over some closed interval, how

should we pick the nodes?

2. How accurate can we make a polynomial interpolant over a closed interval?

You may be surprised to find that the answer to the first question is not that we should make the $x_i$ equally spaced over the closed interval, as the following example illustrates.

Example 5.7 (Runge Function). Let

$$f(x) = \left(1 + x^2\right)^{-1},$$

(known as the Runge Function), and let $p_n(x)$ interpolate $f$ on $n$ equally spaced nodes, including the endpoints, on $[-5, 5]$. Then

$$\lim_{n \to \infty} \max_{x \in [-5, 5]} |p_n(x) - f(x)| = \infty.$$

This behaviour is shown in Figure 5.3.

It turns out that a much better choice is related to the Chebyshev Polynomials (“of the first kind”). If our closed interval is $[-1, 1]$, then we want to define our nodes as

$$x_i = \cos\left[\frac{2i + 1}{2n + 2}\,\pi\right], \quad 0 \le i \le n.$$

Literally interpreted, these Chebyshev Nodes are the projections of points uniformly spaced on

a semi circle; see Figure 5.4. By using the Chebyshev nodes, a good polynomial interpolant of the

Runge function of Example 5.7 can be found, as shown in Figure 5.5.
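The contrast between the two node choices can be checked numerically. The Octave sketch below interpolates the Runge function on $[-5, 5]$ with 15 equally spaced nodes and with 15 Chebyshev nodes, then measures the largest error on a fine grid. The helper names `divdiff` and `nested_eval`, the grid size, and the choice to scale the $[-1, 1]$ Chebyshev nodes by 5 (a linear change of variables) are all assumptions made for this illustration.

```octave
1;  % script marker so the functions can be defined in this file

function c = divdiff(xs, fxs)          % Newton coefficients by divided differences
  c = fxs;
  for k = 1:length(xs)-1
    for j = length(xs):-1:k+1
      c(j) = (c(j) - c(j-1)) / (xs(j) - xs(j-k));
    end
  end
end

function v = nested_eval(t, c, xs)     % nested evaluation, elementwise in t
  v = c(end) * ones(size(t));
  for k = length(c)-1:-1:1
    v = c(k) + (t - xs(k)) .* v;
  end
end

runge = @(x) 1 ./ (1 + x.^2);
t = linspace(-5, 5, 2001);             % fine grid for measuring the error
n = 14;                                % polynomial degree, so n + 1 = 15 nodes
i = 0:n;
x_equi = linspace(-5, 5, n + 1);
x_cheb = 5 * cos((2*i + 1) / (2*n + 2) * pi);   % Chebyshev nodes scaled to [-5, 5]

err_equi = max(abs(nested_eval(t, divdiff(x_equi, runge(x_equi)), x_equi) - runge(t)));
err_cheb = max(abs(nested_eval(t, divdiff(x_cheb, runge(x_cheb)), x_cheb) - runge(t)));
printf("max error, equally spaced nodes: %g\n", err_equi);
printf("max error, Chebyshev nodes:      %g\n", err_cheb);
```

The second number should come out far smaller than the first, in line with Figures 5.3 and 5.5.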

5.2.1 Interpolation Error Theorem

We did not just invent the Chebyshev nodes. The fact that they are “good” for interpolation

follows from the following theorem:

Theorem 5.8 (Interpolation Error Theorem). Let $p$ be the polynomial of degree at most $n$ interpolating function $f$ at the $n + 1$ distinct nodes $x_0, x_1, \ldots, x_n$ on $[a, b]$. Let $f^{(n+1)}$ be continuous. Then for each $x \in [a, b]$ there is some $\xi \in [a, b]$ such that

$$f(x) - p(x) = \frac{1}{(n + 1)!}\, f^{(n+1)}(\xi) \prod_{i=0}^{n} (x - x_i).$$

You should be thinking that the term on the right hand side resembles the error term in Taylor’s

Theorem.

[Figure 5.3 panels: (a) 3, 7, 10 equally spaced nodes; (b) 15 equally spaced nodes; (c) 50 equally spaced nodes. Each panel plots the Runge function together with its interpolants.]

Figure 5.3: The Runge function is poorly interpolated at $n$ equally spaced nodes, as $n$ gets large. At first, the interpolations appear to improve with larger $n$, as in (a). In (b) it becomes apparent that the interpolant is good in the center of the interval, but bad at the edges. As shown in (c), this gets worse as $n$ gets larger.


Figure 5.4: The Chebyshev nodes are the projections of nodes equally spaced on a semi circle.

[Figure 5.5 panels: (a) 3, 7, 10 Chebyshev nodes; (b) 17 Chebyshev nodes. Each panel plots the Runge function together with its interpolants.]

Figure 5.5: The Chebyshev nodes yield good polynomial interpolants of the Runge function. Compare these figures to those of Figure 5.3. For more than about 25 nodes, the interpolant is indistinguishable from the Runge function by eye.

Proof. First consider when $x$ is one of the nodes $x_i$; in this case both the LHS and RHS are zero. So assume $x$ is not a node. Make the following definitions

$$\begin{aligned}
w(t) &= \prod_{i=0}^{n} (t - x_i), \\
c &= \frac{f(x) - p(x)}{w(x)}, \\
\phi(t) &= f(t) - p(t) - c\,w(t).
\end{aligned}$$

Since $x$ is not a node, $w(x)$ is nonzero. Now note that $\phi(x_i)$ is zero for each node $x_i$, and that by definition of $c$, $\phi(x) = 0$ for our $x$. That is, $\phi(t)$ has $n + 2$ roots.

Some other facts: $f, p, w$ have $n + 1$ continuous derivatives, by assumption and definition; thus $\phi$ has this many continuous derivatives. Apply Rolle’s Theorem to $\phi(t)$ to find that $\phi'(t)$ has $n + 1$ roots. Then apply Rolle’s Theorem again to find $\phi''(t)$ has $n$ roots. In this way we see that $\phi^{(n+1)}(t)$ has a root, call it $\xi$. That is

$$0 = \phi^{(n+1)}(\xi) = f^{(n+1)}(\xi) - p^{(n+1)}(\xi) - c\,w^{(n+1)}(\xi).$$

But $p(t)$ is a polynomial of degree $\le n$, so $p^{(n+1)}$ is identically zero. And $w(t)$ is a polynomial of degree $n + 1$ in $t$, so its $(n + 1)$th derivative is easily seen to be $(n + 1)!$. Thus

$$\begin{aligned}
0 &= f^{(n+1)}(\xi) - c\,(n + 1)! \\
c\,(n + 1)! &= f^{(n+1)}(\xi) \\
\frac{f(x) - p(x)}{w(x)} &= \frac{1}{(n + 1)!}\, f^{(n+1)}(\xi) \\
f(x) - p(x) &= \frac{1}{(n + 1)!}\, f^{(n+1)}(\xi)\, w(x),
\end{aligned}$$

which is what was to be proven.

Thus the error in a polynomial interpolation is given as

$$f(x) - p(x) = \frac{1}{(n + 1)!}\, f^{(n+1)}(\xi) \prod_{i=0}^{n} (x - x_i).$$

We have no control over the function $f(x)$ or its derivatives, and once the nodes and $f$ are fixed, $p$ is determined; thus the only way we can make the error $|f(x) - p(x)|$ small is by judicious choice of the nodes $x_i$.

The Chebyshev nodes on $[-1, 1]$ have the remarkable property that

$$\left|\prod_{i=0}^{n} (t - x_i)\right| \le 2^{-n}$$

for any $t \in [-1, 1]$. Moreover, it can be shown that for any choice of nodes $x_i$ that

$$\max_{t \in [-1, 1]} \left|\prod_{i=0}^{n} (t - x_i)\right| \ge 2^{-n}.$$

Thus the Chebyshev nodes are considered the best for polynomial interpolation.
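Both inequalities are easy to probe numerically. The following Octave sketch samples $\left|\prod_i (t - x_i)\right|$ on a fine grid for Chebyshev nodes and, for comparison, for equally spaced nodes on $[-1, 1]$. The value $n = 8$, the grid size, and the helper name `w` are arbitrary choices for illustration; a grid maximum can only approximate the true maximum from below.

```octave
n = 8;
i = 0:n;
x_cheb = cos((2*i + 1) / (2*n + 2) * pi);   % Chebyshev nodes on [-1, 1]
x_equi = linspace(-1, 1, n + 1);            % equally spaced nodes, for comparison
t = linspace(-1, 1, 5001);                  % sample points in [-1, 1]

% Row k of (t - nodes(:)) holds t - x_k; the product down the rows is prod_i (t - x_i).
w = @(nodes) max(abs(prod(t - nodes(:), 1)));

printf("2^(-n)                  = %g\n", 2^(-n));
printf("Chebyshev nodes, max    = %g\n", w(x_cheb));
printf("equally spaced, max     = %g\n", w(x_equi));
```

The Chebyshev value should sit at (or just under) $2^{-n}$, while the equally spaced value should be noticeably larger.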

Merging this result with Theorem 5.8, the error for polynomial interpolants defined on Chebyshev nodes can be bounded as

$$|f(x) - p(x)| \le \frac{1}{2^n (n + 1)!} \max_{\xi \in [-1, 1]} \left|f^{(n+1)}(\xi)\right|.$$

The Chebyshev nodes can be rescaled and shifted for use on the general interval $[\alpha, \beta]$. In this case they take the form

$$x_i = \frac{\beta - \alpha}{2} \cos\left[\frac{2i + 1}{2n + 2}\,\pi\right] + \frac{\alpha + \beta}{2}, \quad 0 \le i \le n.$$

In this case, the rescaling of the nodes changes the bound on $\prod (t - x_i)$ so the overall error bound becomes

$$|f(x) - p(x)| \le \frac{(\beta - \alpha)^{n+1}}{2^{2n+1} (n + 1)!} \max_{\xi \in [\alpha, \beta]} \left|f^{(n+1)}(\xi)\right|,$$

for $x \in [\alpha, \beta]$.


Example Problem 5.9. How many Chebyshev nodes are required to interpolate the function $f(x) = \sin(x) + \cos(x)$ to within $10^{-8}$ on the interval $[0, \pi]$?

Solution: We first find the derivatives of $f$

$$\begin{aligned}
f'(x) &= \cos(x) - \sin(x) \\
f''(x) &= -\sin(x) - \cos(x) \\
f'''(x) &= -\cos(x) + \sin(x) \\
&\;\;\vdots
\end{aligned}$$

As a crude approximation we can assert that

$$\left|f^{(k)}(x)\right| \le |\cos x| + |\sin x| \le 2.$$

Thus it suffices to take $n$ large enough such that

$$\frac{\pi^{n+1}}{2^{2n+1} (n + 1)!} \cdot 2 \le 10^{-8}.$$

By trial and error we see that $n = 10$ suffices (that is, 11 nodes, since the nodes are indexed $0 \le i \le n$). ⊣
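The trial and error can be delegated to a short loop. This Octave sketch simply increases $n$ until the bound from the example drops below $10^{-8}$; the helper name `bound` is an assumption made here.

```octave
% Bound from Example Problem 5.9: 2 * pi^(n+1) / (2^(2n+1) * (n+1)!)
bound = @(n) 2 * pi^(n + 1) / (2^(2*n + 1) * factorial(n + 1));

n = 0;
while bound(n) > 1e-8
  n = n + 1;        % try the next degree
end
printf("smallest n with bound <= 1e-8: n = %d (bound = %g)\n", n, bound(n));
```

The loop stops at the first $n$ for which the inequality holds, so it also confirms that no smaller $n$ satisfies this particular (crude) bound.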

5.2.2 Interpolation Error for Equally Spaced Nodes

Despite the proven superiority of Chebyshev Nodes, and the problems with the Runge Function, equally spaced nodes are frequently used for interpolation, since they are easy to calculate². We now consider bounding

$$\max_{x \in [a, b]} \prod_{i=0}^{n} |x - x_i|,$$

where

$$x_i = a + hi = a + \frac{(b - a)}{n}\, i, \quad i = 0, 1, \ldots, n.$$

Start by picking an $x$. We can assume $x$ is not one of the nodes, otherwise the product in question is zero. Then $x$ is between some $x_j, x_{j+1}$. We can show that

$$|x - x_j|\,|x - x_{j+1}| \le \frac{h^2}{4}$$

by simple calculus.

Now we claim that $|x - x_i| \le (j - i + 1)\,h$ for $i < j$, and $|x - x_i| \le (i - j)\,h$ for $j + 1 < i$. Then

$$\prod_{i=0}^{n} |x - x_i| \le \frac{h^2}{4} \left[(j + 1)!\,h^{j}\right] \left[(n - j)!\,h^{n-j-1}\right].$$

It can be shown that $(j + 1)!\,(n - j)! \le n!$, and so we get an overall bound

$$\prod_{i=0}^{n} |x - x_i| \le \frac{h^{n+1}\, n!}{4}.$$

² Never underestimate the power of laziness.


The interpolation theorem then gives us

$$|f(x) - p(x)| \le \frac{h^{n+1}}{4\,(n + 1)} \max_{\xi \in [a, b]} \left|f^{(n+1)}(\xi)\right|,$$

where $h = (b - a)/n$.

The reason this result does not seem to apply to Runge’s Function is that $f^{(n)}$ for Runge’s Function becomes unbounded as $n \to \infty$.

Example Problem 5.10. How many equally spaced nodes are required to interpolate the function $f(x) = \sin(x) + \cos(x)$ to within $10^{-8}$ on the interval $[0, \pi]$?

Solution: As in Example Problem 5.9, we make the crude approximation

$$\left|f^{(k)}(x)\right| \le |\cos x| + |\sin x| \le 2.$$

Thus we want to make $n$ sufficiently large such that

$$\frac{h^{n+1}}{4(n + 1)} \cdot 2 \le 10^{-8},$$

where $h = (\pi - 0)/n$. That is, we want to find $n$ large enough such that

$$\frac{\pi^{n+1}}{2\,n^{n+1}\,(n + 1)} \le 10^{-8}.$$

By simply trying small numbers, we can see this is satisfied if $n = 12$ (that is, 13 nodes). ⊣
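As with the Chebyshev case, “simply trying small numbers” can be written as a loop. The Octave sketch below uses the bound derived in this example; the helper name `bound` is an assumption made here.

```octave
% Bound from Example Problem 5.10: pi^(n+1) / (2 * n^(n+1) * (n+1))
bound = @(n) pi^(n + 1) / (2 * n^(n + 1) * (n + 1));

n = 1;                      % h = pi/n needs n >= 1
while bound(n) > 1e-8
  n = n + 1;
end
printf("smallest n with bound <= 1e-8: n = %d (bound = %g)\n", n, bound(n));
```

Comparing the result with Example Problem 5.9 shows how little is lost here by using equally spaced nodes, because all derivatives of this particular $f$ stay bounded.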


Exercises

(5.1) Find the Lagrange Polynomials for the nodes $\{-1, 1\}$.

(5.2) Find the Lagrange Polynomials for the nodes $\{-1, 1, 5\}$.

(5.3) Find the polynomial of degree no greater than 2 that interpolates

x −1 1 5

y 3 3 −2

(5.4) Complete the divided diﬀerences table:

x f[ ] f[ , ] f[ , , ]

−1 3

1 3

5 −2

Find the Newton form of the polynomial interpolant.

(5.5) Find the polynomial of degree no greater than 3 that interpolates

x −1 1 5 −3

y 3 3 −2 4

(Hint: reuse the Newton form of the polynomial from the previous question.)

(5.6) Find the nested form of the polynomial interpolant of the data

x 1 3 4 6

y −3 13 21 1

by completing the following divided diﬀerences table:

x f[ ] f[ , ] f[ , , ] f[ , , , ]

1 −3

3 13

4 21

6 1

(5.7) Find the polynomial of degree no greater than 3 that interpolates

x 1 0 3/2 2

y 3 2 37/8 8

(5.8) Complete the divided diﬀerences table:

x f[ ] f[ , ] f[ , , ] f[ , , , ]

−1 6

0 3

1 2

2 3


(Something a little odd should have happened in the last column.) Find the Newton form of

the polynomial interpolant. Of what degree is the polynomial interpolant?

(5.9) Let $p(x)$ interpolate the function $\cos(x)$ at $n$ equally spaced nodes on the interval $[0, 2]$. Bound the error

$$\max_{0 \le x \le 2} |p(x) - \cos(x)|$$

as a function of $n$. How small is the error when $n = 10$? How small would the error be if Chebyshev nodes were used instead? How about when $n = 10$?

(5.10) Let $p(x)$ interpolate the function $x^{-2}$ at $n$ equally spaced nodes on the interval $[0.5, 1]$. Bound the error

$$\max_{0.5 \le x \le 1} \left|p(x) - x^{-2}\right|$$

as a function of $n$. How small is the error when $n = 10$?

(5.11) How many Chebyshev nodes are required to interpolate the function $\frac{1}{x}$ to within $10^{-6}$ on the interval $[1, 2]$?

(5.12) Write code to calculate the Newton form coefficients, by divided differences, for the nodes $x_i$ and values $f(x_i)$. Your m-file should have header line like:

function coefs = newtonCoef(xs,fxs)

where xs is the vector of n + 1 nodes, and fxs the vector of n + 1 values. Test your code on

the following input:

octave:1> xs = [1 -1 2 -2 3 -3 4 -4];

octave:2> fxs = [1 1 2 3 5 8 13 21];

octave:3> newtonCoef(xs,fxs)

ans =

1.00000 -0.00000 0.33333 -0.08333 0.05000 0.00417 -0.00020 -0.00040

(a) What do you get when you try the following?

octave:4> xs = [1 4 5 9 14 23 37 60];

octave:5> fxs = [3 1 4 1 5 9 2 6];

octave:6> newtonCoef(xs,fxs)

(b) Try the following:

octave:7> xs = [1 3 4 2 8 -2 0 14 23 15];

octave:8> fxs = xs.*xs + xs + 4;

octave:9> newtonCoef(xs,fxs)

(5.13) Write code to calculate a polynomial interpolant from its Newton form coeﬃcients and the

node values. Your m-ﬁle should have header line like:

function y = calcNewton(t,coefs,xs)

where coefs is a vector of the Newton coefficients, xs is a vector of the nodes $x_i$, and y is the value of the interpolating polynomial at t. Check your code against the following values:

octave:1> xs = [1 3 4];

octave:2> coefs = [6 5 1];

octave:3> calcNewton (5,coefs,xs)

ans = 34

octave:4> calcNewton (-3,coefs,xs)

ans = 10

(a) What do you get when you try the following?

octave:5> xs = [3 1 4 5 9 2 6 8 7 0];


octave:6> coefs = [1 -1 2 -2 3 -3 4 -4 5 -5];

octave:7> calcNewton(0.5,coefs,xs)

(b) Try the following

octave:8> xs = [3 1 4 5 9 2 6 8 7 0];

octave:9> coefs = [1 -1 2 -2 3 -3 4 -4 5 -5];

octave:10> calcNewton(1,coefs,xs)


Chapter 6

Spline Interpolation

Splines are used to approximate complex functions and shapes. A spline is a function consisting of

simple functions glued together. In this way a spline is diﬀerent from a polynomial interpolation,

which consists of a single well deﬁned function that approximates a given shape; splines are normally

piecewise polynomial.

6.1 First and Second Degree Splines

Splines make use of partitions, which are a way of cutting an interval into a number of subintervals.

Definition 6.1 (Partition). A partition of the interval $[a, b]$ is an ordered sequence $\{t_i\}_{i=0}^{n}$ such that

$$a = t_0 < t_1 < \cdots < t_{n-1} < t_n = b$$

The numbers $t_i$ are known as knots.

A spline of degree 1, also known as a linear spline, is a function which is linear on each subinterval

deﬁned by a partition:

Deﬁnition 6.2 (Linear Splines). A function S is a spline of degree 1 on [a, b] if

1. The domain of S is [a, b].

2. S is continuous on [a, b].

3. There is a partition $\{t_i\}_{i=0}^{n}$ of $[a, b]$ such that on each $[t_i, t_{i+1}]$, $S$ is a linear polynomial.

A linear spline is defined entirely by its value at the knots. That is, given

$$\begin{array}{c|cccc}
t & t_0 & t_1 & \ldots & t_n \\ \hline
y & y_0 & y_1 & \ldots & y_n
\end{array}$$

there is only one linear spline with these values at the knots and linear on each given subinterval. For a spline with this data, the linear polynomial on each subinterval is defined as

$$S_i(x) = y_i + \frac{y_{i+1} - y_i}{t_{i+1} - t_i}\,(x - t_i).$$

Note that if $x \in (t_i, t_{i+1}]$, then $x - t_i > 0$, but $x - t_{i+1} \le 0$. Thus if we wish to evaluate $S(x)$, we search for the largest $i$ such that $x - t_i > 0$, then evaluate $S_i(x)$.

Example 6.3. The linear spline for the following data

t 0.0 0.1 0.4 0.5 0.75 1.0

y 1.3 4.5 2.0 2.1 5.0 3

is shown in Figure 6.1.
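The search-then-evaluate procedure can be sketched in Octave using the data of Example 6.3. The function name `linear_spline_eval` is hypothetical, and the handling of points at or outside the end knots (clamping to the first or last subinterval) is a choice made here rather than something the notes specify.

```octave
1;  % script marker so the function can be defined in this file

% linear_spline_eval is a hypothetical helper; t holds the knots, y the values.
function s = linear_spline_eval(x, t, y)
  i = find(t < x, 1, 'last');     % largest i with x - t(i) > 0
  if isempty(i)
    i = 1;                        % x is at (or left of) the first knot
  end
  i = min(i, length(t) - 1);      % use the last subinterval if x is past the last knot
  s = y(i) + (y(i+1) - y(i)) / (t(i+1) - t(i)) * (x - t(i));
end

t = [0.0 0.1 0.4 0.5 0.75 1.0];
y = [1.3 4.5 2.0 2.1 5.0 3];
printf("S(0.25) = %g\n", linear_spline_eval(0.25, t, y));
```

For $x = 0.25$ the search selects the subinterval $[0.1, 0.4]$, so the value is $4.5 + \frac{2.0 - 4.5}{0.3}(0.15) = 3.25$.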


Figure 6.1: A linear spline. The spline is piecewise linear, and is linear between each knot.

6.1.1 First Degree Spline Accuracy

As with polynomial functions, splines are used to interpolate tabulated data as well as functions. In the latter case, if the spline is being used to interpolate the function $f$, say, then this is equivalent to interpolating the data

$$\begin{array}{c|cccc}
t & t_0 & t_1 & \ldots & t_n \\ \hline
y & f(t_0) & f(t_1) & \ldots & f(t_n)
\end{array}$$

A function and its linear spline interpolant are shown in Figure 6.2. The spline interpolant in

that ﬁgure is fairly close to the function over some of the interval in question, but it also deviates

greatly from the function at other points of the interval. We are interested in ﬁnding bounds on

the possible error between a function and its spline interpolant.

To find the error bound, we will consider the error on a single interval of the partition, and use a little calculus.¹ Suppose $p(t)$ is the linear polynomial interpolating $f(t)$ at the endpoints of the subinterval $[t_i, t_{i+1}]$; then for $t \in [t_i, t_{i+1}]$,

$$|f(t) - p(t)| \le \max\left\{|f(t) - f(t_i)|,\; |f(t) - f(t_{i+1})|\right\}.$$

That is, $|f(t) - p(t)|$ is no larger than the “maximum variation” of $f(t)$ on this interval.

In particular, if $f'(t)$ exists and is bounded by $M_1$ on $[t_i, t_{i+1}]$, then

$$|f(t) - p(t)| \le \frac{M_1}{2}\,(t_{i+1} - t_i).$$

Similarly, if $f''(t)$ exists and is bounded by $M_2$ on $[t_i, t_{i+1}]$, then

$$|f(t) - p(t)| \le \frac{M_2}{8}\,(t_{i+1} - t_i)^2.$$

Over a given partition, these become, respectively

$$\begin{aligned}
|f(t) - p(t)| &\le \frac{M_1}{2} \max_{0 \le i < n} (t_{i+1} - t_i), \\
|f(t) - p(t)| &\le \frac{M_2}{8} \max_{0 \le i < n} (t_{i+1} - t_i)^2.
\end{aligned} \tag{6.1}$$

¹ Although we could directly claim Theorem 5.8, it is a bit of overkill.

Figure 6.2: The function $f(t) = 4.1 + \sin\left(1/(0.08t + 0.5)\right)$ is shown, along with the linear spline interpolant, for the knots 0, 0.1, 0.4, 0.6, 0.75, 0.9, 1.0. For some values of $t$, the function and its interpolant are very close, while they vary greatly in the middle of the interval.

If equally spaced nodes are used, these bounds guarantee that spline interpolants become better

as the number of nodes is increased. This contrasts with polynomial interpolants, which may get

worse as the number of nodes is increased, cf. Example 5.7.
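The second bound in (6.1) can be checked on a simple case. The Octave sketch below interpolates $f(t) = \sin t$ on $[0, \pi]$, where $|f''| \le 1$ so $M_2 = 1$, with equally spaced knots, and compares the observed error with $\frac{M_2}{8} h^2$. The test function, the knot counts, and the use of Octave's `interp1` (which performs linear interpolation by default) are choices made for this illustration.

```octave
f = @(t) sin(t);                 % |f''| <= 1 on [0, pi], so M_2 = 1
x = linspace(0, pi, 10001);      % fine grid for measuring the error
for n = [4 8 16 32]
  t = linspace(0, pi, n + 1);    % n + 1 equally spaced knots, spacing h = pi/n
  err = max(abs(interp1(t, f(t), x) - f(x)));   % linear spline error on the grid
  bound = (1/8) * (pi / n)^2;    % (M_2 / 8) * h^2
  printf("n = %2d   error = %.3e   bound = %.3e\n", n, err, bound);
end
```

Each doubling of $n$ should cut both columns by roughly a factor of four, with the observed error staying below the bound.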

6.1.2 Second Degree Splines

Piecewise quadratic splines, or splines of degree 2, are defined similarly:

Definition 6.4 (Quadratic Splines). A function $Q$ is a quadratic spline on $[a, b]$ if

1. The domain of $Q$ is $[a, b]$.

2. $Q$ is continuous on $[a, b]$.

3. $Q'$ is continuous on $(a, b)$.

4. There is a partition $\{t_i\}_{i=0}^{n}$ of $[a, b]$ such that on $[t_i, t_{i+1}]$, $Q$ is a polynomial of degree at most 2.

Example 6.5. The following is a quadratic spline:

$$Q(x) = \begin{cases}
-x & x \le 0, \\
x^2 - x & 0 \le x \le 2, \\
-x^2 + 7x - 8 & 2 \le x.
\end{cases}$$

Unlike linear splines, quadratic splines are not defined entirely by their values at the knots. We consider why that is. The spline $Q(x)$ is defined by its piecewise polynomials,

$$Q_i(x) = a_i x^2 + b_i x + c_i.$$


Thus there are $3n$ parameters to define $Q(x)$.

For each of the $n$ subintervals, the data

$$\begin{array}{c|cccc}
t & t_0 & t_1 & \ldots & t_n \\ \hline
y & y_0 & y_1 & \ldots & y_n
\end{array}$$

give two equations regarding $Q_i(x)$, namely that $Q_i(t_i)$ must equal $y_i$ and $Q_i(t_{i+1})$ must equal $y_{i+1}$. This is $2n$ equations. The condition on continuity of $Q'$ gives a single equation for each of the $n - 1$ internal nodes. This totals $3n - 1$ equations, but $3n$ unknowns. This system is underdetermined.

Thus some additional user-chosen condition is required to determine the quadratic spline. One might choose, for example, $Q'(a) = 0$, or $Q''(a) = 0$, or some other condition.

6.1.3 Computing Second Degree Splines

Suppose the data

$$\begin{array}{c|cccc}
t & t_0 & t_1 & \ldots & t_n \\ \hline
y & y_0 & y_1 & \ldots & y_n
\end{array}$$

are given. Let $z_i = Q_i'(t_i)$, and suppose that the additional condition to define the quadratic spline is given by specifying $z_0$. We want to be able to compute the form of $Q_i(x)$.

Because $Q_i(t_i) = y_i$, $Q_i'(t_i) = z_i$, $Q_i'(t_{i+1}) = z_{i+1}$, we see that we can define

$$Q_i(x) = \frac{z_{i+1} - z_i}{2\,(t_{i+1} - t_i)}\,(x - t_i)^2 + z_i\,(x - t_i) + y_i.$$

Use this at $t_{i+1}$:

$$\begin{aligned}
y_{i+1} = Q_i(t_{i+1}) &= \frac{z_{i+1} - z_i}{2\,(t_{i+1} - t_i)}\,(t_{i+1} - t_i)^2 + z_i\,(t_{i+1} - t_i) + y_i, \\
y_{i+1} - y_i &= \frac{z_{i+1} - z_i}{2}\,(t_{i+1} - t_i) + z_i\,(t_{i+1} - t_i), \\
y_{i+1} - y_i &= \frac{z_{i+1} + z_i}{2}\,(t_{i+1} - t_i).
\end{aligned}$$

Thus we can determine, from the data alone, $z_{i+1}$ from $z_i$:

$$z_{i+1} = 2\,\frac{y_{i+1} - y_i}{t_{i+1} - t_i} - z_i.$$
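The recurrence for $z_{i+1}$ and the formula for $Q_i(x)$ translate directly into code. In the Octave sketch below, the function names `quad_spline_slopes` and `quad_spline_eval` are hypothetical, and the evaluator assumes $t_0 \le x \le t_n$. The test data are sampled from the spline of Example 6.5 at $t = 0, 2, 3$, with $z_0 = Q'(0) = -1$.

```octave
1;  % script marker so the functions can be defined in this file

% Slopes z_i at the knots, starting from the user-chosen z_0.
function z = quad_spline_slopes(t, y, z0)
  z = zeros(size(t));
  z(1) = z0;
  for i = 1:length(t)-1
    z(i+1) = 2 * (y(i+1) - y(i)) / (t(i+1) - t(i)) - z(i);
  end
end

% Evaluate Q at a single point x, assuming t(1) <= x <= t(end).
function q = quad_spline_eval(x, t, y, z)
  i = min(find(t <= x, 1, 'last'), length(t) - 1);   % subinterval containing x
  dx = x - t(i);
  q = (z(i+1) - z(i)) / (2 * (t(i+1) - t(i))) * dx^2 + z(i) * dx + y(i);
end

t = [0 2 3];
y = [0 2 4];                       % values of the Example 6.5 spline at the knots
z = quad_spline_slopes(t, y, -1);
disp(z);                           % slopes at the knots
printf("Q(1)   = %g\n", quad_spline_eval(1, t, y, z));
printf("Q(2.5) = %g\n", quad_spline_eval(2.5, t, y, z));
```

Since the spline of Example 6.5 is itself a quadratic spline through these data with the same starting slope, the computed values should agree with $x^2 - x$ on $[0, 2]$ and $-x^2 + 7x - 8$ on $[2, 3]$.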

6.2 (Natural) Cubic Splines

If you recall the definition of the linear and quadratic splines, probably you can guess the definition of the spline of degree $k$:

Definition 6.6 (Splines of Degree k). A function $S$ is a spline of degree $k$ on $[a, b]$ if

1. The domain of $S$ is $[a, b]$.

2. $S, S', S'', \ldots, S^{(k-1)}$ are continuous on $(a, b)$.

3. There is a partition $\{t_i\}_{i=0}^{n}$ of $[a, b]$ such that on $[t_i, t_{i+1}]$, $S$ is a polynomial of degree $\le k$.

You would also expect that a spline of degree $k$ has $k - 1$ “degrees of freedom,” as we show here. If the partition has $n + 1$ knots, the spline of degree $k$ is defined by $n(k + 1)$ parameters. The given data

$$\begin{array}{c|cccc}
t & t_0 & t_1 & \ldots & t_n \\ \hline
y & y_0 & y_1 & \ldots & y_n
\end{array}$$

provide $2n$ equations. The continuity of $S', S'', \ldots, S^{(k-1)}$ at the $n - 1$ internal knots gives $(k - 1)(n - 1)$ equations. This is a total of $n(k + 1) - (k - 1)$ equations. Thus we have $k - 1$ more unknowns than equations. Thus, barring some singularity, we can (and must) add $k - 1$ constraints to uniquely define the spline. These are the degrees of freedom.

Often $k$ is chosen as 3. This yields cubic splines. We must add 2 extra constraints to define the spline. The usual choice is to make

$$S''(t_0) = S''(t_n) = 0.$$

This yields the natural cubic spline.

6.2.1 Why Natural Cubic Splines?

It turns out that natural cubic splines are a good choice in the sense that they are the “interpolant of minimal $H^2$ seminorm.” The corollary following this theorem states this in more easily understandable terms:

Theorem 6.7. Suppose $f$ has two continuous derivatives, and $S$ is the natural cubic spline interpolating $f$ at knots $a = t_0 < t_1 < \ldots < t_n = b$. Then

$$\int_a^b \left[S''(x)\right]^2 dx \le \int_a^b \left[f''(x)\right]^2 dx$$

Proof. We let $g(x) = f(x) - S(x)$. Then $g(x)$ is zero on the $(n + 1)$ knots $t_i$. Derivatives are linear, meaning that

$$f''(x) = S''(x) + g''(x).$$

Then

$$\int_a^b \left[f''(x)\right]^2 dx = \int_a^b \left[S''(x)\right]^2 dx + \int_a^b \left[g''(x)\right]^2 dx + \int_a^b 2\,S''(x)\,g''(x)\,dx.$$

We show that the last integral is zero. Integrating by parts we get

$$\int_a^b S''(x)\,g''(x)\,dx = S''\,g'\,\Big|_a^b - \int_a^b S'''\,g'\,dx = -\int_a^b S'''\,g'\,dx,$$

because $S''(a) = S''(b) = 0$. Then notice that $S$ is a polynomial of degree $\le 3$ on each interval, thus $S'''(x)$ is a piecewise constant function, taking value $c_i$ on each interval $[t_i, t_{i+1}]$. Thus

$$\int_a^b S'''\,g'\,dx = \sum_{i=0}^{n-1} \int_{t_i}^{t_{i+1}} c_i\,g'\,dx = \sum_{i=0}^{n-1} c_i\,g\,\Big|_{t_i}^{t_{i+1}} = 0,$$

with the last equality holding because $g(x)$ is zero at the knots. Hence $\int_a^b [f'']^2\,dx = \int_a^b [S'']^2\,dx + \int_a^b [g'']^2\,dx \ge \int_a^b [S'']^2\,dx$.

Corollary 6.8. The natural cubic spline is the best twice-continuously differentiable interpolant for a twice-continuously differentiable function, under the measure given by the theorem.

Proof. Let $f$ be twice-continuously differentiable, and let $S$ be the natural cubic spline interpolating $f(x)$ at some given nodes $\{t_i\}_{i=0}^{n}$. Let $R(x)$ be some twice-continuously differentiable function which also interpolates $f(x)$ at these nodes. Then $S(x)$ interpolates $R(x)$ at these nodes. Apply the theorem to get

$$\int_a^b \left[S''(x)\right]^2 dx \le \int_a^b \left[R''(x)\right]^2 dx$$


6.2.2 Computing Cubic Splines

First we present an example of computing the natural cubic spline by hand:

Example Problem 6.9. Construct the natural cubic spline for the following data:

$$\begin{array}{c|ccc}
t & -1 & 0 & 2 \\ \hline
y & 3 & -1 & 3
\end{array}$$

Solution: The natural cubic spline is defined by eight parameters:

$$S(x) = \begin{cases}
ax^3 + bx^2 + cx + d & x \in [-1, 0] \\
ex^3 + fx^2 + gx + h & x \in [0, 2]
\end{cases}$$

We interpolate to find that $d = h = -1$ and

$$\begin{aligned}
-a + b - c - 1 &= 3 \\
8e + 4f + 2g - 1 &= 3
\end{aligned}$$

We take the derivative of $S$:

$$S'(x) = \begin{cases}
3ax^2 + 2bx + c & x \in [-1, 0] \\
3ex^2 + 2fx + g & x \in [0, 2]
\end{cases}$$

Continuity at the middle node gives $c = g$. Now take the second derivative of $S$:

$$S''(x) = \begin{cases}
6ax + 2b & x \in [-1, 0] \\
6ex + 2f & x \in [0, 2]
\end{cases}$$

Continuity at the middle node gives $b = f$. The natural cubic spline condition gives $-6a + 2b = 0$ and $12e + 2f = 0$. Solving this by “divide and conquer” gives

$$S(x) = \begin{cases}
x^3 + 3x^2 - 2x - 1 & x \in [-1, 0] \\
-\tfrac{1}{2}x^3 + 3x^2 - 2x - 1 & x \in [0, 2]
\end{cases}$$

⊣

Finding the constants for the previous example was fairly tedious. And this is for the case of

only three nodes. We would like a method easier than setting up the 4n equations and unknowns,

something akin to the description in Subsection 6.1.3. The method is rather tedious, so we leave it

to the exercises.
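Short of the specialized method, the equations of Example Problem 6.9 can simply be handed to a linear solver. The Octave sketch below sets up the six remaining equations (after substituting $d = h = -1$) with the unknowns ordered as $[a, b, c, e, f, g]$; that ordering and the variable names are choices made here for illustration.

```octave
% Rows, in order:
%   S(-1) = 3            ->  -a + b - c       = 4
%   S(2)  = 3            ->  8e + 4f + 2g     = 4
%   S'  continuous at 0  ->  c - g            = 0
%   S'' continuous at 0  ->  b - f            = 0
%   S''(-1) = 0          ->  -6a + 2b         = 0
%   S''(2)  = 0          ->  12e + 2f         = 0
A = [-1  1 -1  0  0  0;
      0  0  0  8  4  2;
      0  0  1  0  0 -1;
      0  1  0  0 -1  0;
     -6  2  0  0  0  0;
      0  0  0 12  2  0];
rhs = [4; 4; 0; 0; 0; 0];

coef = A \ rhs;        % solve for [a; b; c; e; f; g]
disp(coef');
```

The solution should reproduce the coefficients found by hand, $a = 1$, $b = f = 3$, $c = g = -2$, $e = -\tfrac12$. For $n$ subintervals the same approach needs a $4n \times 4n$ system, which is why a more economical method is desirable.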

6.3 B Splines

The B splines form a basis for spline functions, whence the name. We presuppose the existence of an infinite number of knots:

$$\ldots < t_{-2} < t_{-1} < t_0 < t_1 < t_2 < \ldots,$$

with $\lim_{k \to -\infty} t_k = -\infty$ and $\lim_{k \to \infty} t_k = \infty$.

The B splines of degree 0 are defined as single “blocks”:

$$B^0_i(x) = \begin{cases}
1 & t_i \le x < t_{i+1} \\
0 & \text{otherwise}
\end{cases}$$

The zero degree B splines are continuous from the right, are nonzero only on one subinterval $[t_i, t_{i+1})$, and sum to 1 everywhere.

We justify the description of B splines as basis splines: If $S$ is a spline of degree 0 on the given knots and is continuous from the right then

$$S(x) = \sum_i S(t_i)\,B^0_i(x).$$

That is, the basis splines work in the same way that Lagrange Polynomials worked for polynomial interpolation.

The B splines of degree $k$ are defined recursively:

$$B^k_i(x) = \left(\frac{x - t_i}{t_{i+k} - t_i}\right) B^{k-1}_i(x) + \left(\frac{t_{i+k+1} - x}{t_{i+k+1} - t_{i+1}}\right) B^{k-1}_{i+1}(x).$$

Some B splines are shown in Figure 6.3.
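The recursive definition can be evaluated directly, if not efficiently. In the Octave sketch below, `bspline` is a hypothetical function name, a finite knot vector stands in for the infinite sequence, and because Octave indexes from 1 the argument `i` is shifted by one relative to the text (so `i = 1` corresponds to $B^k_0$). The knots are those of Figure 6.3.

```octave
1;  % script marker so the function can be defined in this file

% bspline(k, i, x, t) evaluates the degree-k B spline whose support starts at t(i).
% Requires i + k + 1 <= length(t).
function b = bspline(k, i, x, t)
  if k == 0
    b = double(t(i) <= x && x < t(i+1));      % single "block" on [t(i), t(i+1))
  else
    b = (x - t(i)) / (t(i+k) - t(i)) * bspline(k-1, i, x, t) ...
      + (t(i+k+1) - x) / (t(i+k+1) - t(i+1)) * bspline(k-1, i+1, x, t);
  end
end

t = [1 2.5 3.5 4];                            % knots t_0, ..., t_3 of Figure 6.3
printf("B^1_0(2.5) = %g\n", bspline(1, 1, 2.5, t));   % peak of the hat function
printf("B^2_0(3.0) = %g\n", bspline(2, 1, 3.0, t));
```

The first value is the peak of a hat function and so should equal 1. The plain recursion recomputes lower-degree splines many times, which is acceptable for a demonstration but not for serious use.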

The B splines quickly become unwieldy. We focus on the case $k = 1$. The B spline $B^1_i(x)$ is

• Piecewise linear.

• Continuous.

• Nonzero only on $(t_i, t_{i+2})$.

• 1 at $t_{i+1}$.

These B splines are sometimes called hat functions. Imagine wearing a hat shaped like this! Whatever.

The nice thing about the hat functions is they allow us to use analogy. Harken back to polynomial interpolation and the Lagrange Functions. The hat functions play a similar role because

$$B^1_i(t_j) = \delta_{(i+1)j} = \begin{cases}
1 & (i + 1) = j \\
0 & (i + 1) \ne j
\end{cases}$$

Then if we want to interpolate the following data with splines of degree 1:

$$\begin{array}{c|cccc}
t & t_0 & t_1 & \ldots & t_n \\ \hline
y & y_0 & y_1 & \ldots & y_n
\end{array}$$

We can immediately set

$$S(x) = \sum_{i=0}^{n} y_i\,B^1_{i-1}(x).$$

[Figure 6.3 panels: (a) Degree 0 B spline; (b) Degree 1 B spline; (c) Degree 2 B spline $B^2_0(x)$, with 2 Degree 1 B splines $B^1_0(x)$ and $B^1_1(x)$.]

Figure 6.3: Some B splines for the knots $t_0 = 1$, $t_1 = 2.5$, $t_2 = 3.5$, $t_3 = 4$ are shown. In (a), one of the degree 0 B splines; in (b), a degree 1 B spline; in (c), two of the degree 1 B splines and a degree 2 B spline are shown. The two hat functions “merge” together to give the quadratic B spline.


Exercises

(6.1) Is the following function a linear spline on $[0, 4]$? Why or why not?

$$S(x) = \begin{cases}
3x + 2 & : 0 \le x < 1 \\
-2x + 4 & : 1 \le x \le 4
\end{cases}$$

(6.2) Is the following function a linear spline on $[0, 2]$? Why or why not?

$$S(x) = \begin{cases}
x + 3 & : 0 \le x < 1 \\
3 & : 1 \le x \le 2
\end{cases}$$

(6.3) Is the following function a linear spline on $[0, 4]$? Why or why not?

$$S(x) = \begin{cases}
x^2 + 3 & : 0 \le x < 3 \\
5x - 6 & : 3 \le x \le 4
\end{cases}$$

(6.4) Find constants $\alpha, \beta$ such that the following is a linear spline on $[0, 5]$.

$$S(x) = \begin{cases}
4x - 2 & : 0 \le x < 1 \\
\alpha x + \beta & : 1 \le x < 3 \\
-2x + 10 & : 3 \le x \le 5
\end{cases}$$

(6.5) Is the following function a quadratic spline on $[0, 4]$? Why or why not?

$$Q(x) = \begin{cases}
x^2 + 3 & : 0 \le x < 3 \\
5x - 6 & : 3 \le x \le 4
\end{cases}$$

(6.6) Is the following function a quadratic spline on $[0, 2]$? Why or why not?

$$Q(x) = \begin{cases}
x^2 + 3x + 2 & : 0 \le x < 1 \\
2x^2 + x + 3 & : 1 \le x \le 2
\end{cases}$$

(6.7) Find constants $\alpha, \beta, \gamma$ such that the following is a quadratic spline on $[0, 5]$.

$$Q(x) = \begin{cases}
\tfrac{1}{2}x^2 + 2x + \tfrac{3}{2} & : 0 \le x < 1 \\
\alpha x^2 + \beta x + \gamma & : 1 \le x < 3 \\
3x^2 - 7x + 12 & : 3 \le x \le 5
\end{cases}$$

(6.8) Find the quadratic spline that interpolates the following data:

$$\begin{array}{c|ccc}
t & 0 & 1 & 4 \\ \hline
y & 1 & -2 & 1
\end{array}$$

To resolve the single degree of freedom, assume that $Q'(0) = -Q'(4)$. Assume your solution takes the form

$$Q(x) = \begin{cases}
\alpha_1 (x - 1)^2 + \beta_1 (x - 1) - 2 & : 0 \le x < 1 \\
\alpha_2 (x - 1)^2 + \beta_2 (x - 1) - 2 & : 1 \le x \le 4
\end{cases}$$

Find the constants $\alpha_1, \beta_1, \alpha_2, \beta_2$.


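(6.9) Find the natural cubic spline that interpolates the data

$$\begin{array}{c|ccc}
x & 0 & 1 & 3 \\ \hline
y & 4 & 2 & 7
\end{array}$$

It may help to assume your answer has the form

$$S(x) = \begin{cases}
Ax^3 + Bx^2 + Cx + 4 & : 0 \le x < 1 \\
D(x - 1)^3 + E(x - 1)^2 + F(x - 1) + 2 & : 1 \le x \le 3
\end{cases}$$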

Chapter 7

Approximating Derivatives

7.1 Finite Diﬀerences

Suppose we have some blackbox function $f(x)$ and we wish to calculate $f'(x)$ at some given $x$. Not surprisingly, we start with Taylor’s theorem:

$$f(x + h) = f(x) + f'(x)\,h + \frac{f''(\xi)\,h^2}{2}.$$

Rearranging we get

$$f'(x) = \frac{f(x + h) - f(x)}{h} - \frac{f''(\xi)\,h}{2}.$$

Remember that $\xi$ is between $x$ and $x + h$, but its exact value is not known. If we wish to calculate $f'(x)$, we cannot evaluate $f''(\xi)$, so we approximate the derivative by dropping the last term. That is, we calculate $[f(x + h) - f(x)]/h$ as an approximation¹ to $f'(x)$. If there is a finite bound on $f''(z)$ on the interval in question then the dropped term is bounded by a constant times $h$. That is,

$$f'(x) = \frac{f(x + h) - f(x)}{h} + O(h) \tag{7.1}$$

The error that we incur when we approximate $f'(x)$ by calculating $[f(x + h) - f(x)]/h$ is called truncation error. It has nothing to do with the kind of error that you get when you do calculations with a computer with limited precision; even if you worked in infinite precision, you would still have truncation error.

The truncation error can be made small by making h small. However, as h gets smaller, precision

will be lost in equation 7.1 due to subtractive cancellation. The error in calculation for small h

is called roundoﬀ error. Generally the roundoﬀ error will increase as h decreases. Thus there is

a nonzero h for which the sum of these two errors is minimized. See Example Problem 7.5 for an

example of this.
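The trade-off between the two errors is easy to observe. The Octave sketch below applies the forward difference of equation 7.1 to $f(x) = e^x$ at $x = 1$ for $h = 10^{-1}, \ldots, 10^{-15}$ and reports the error at each step size. The test function, the evaluation point, and the range of $h$ are arbitrary choices made for illustration.

```octave
f  = @(x) exp(x);                 % test function (our choice); f'(1) = e exactly
x  = 1;
hs = 10 .^ (-(1:15));             % step sizes 1e-1, 1e-2, ..., 1e-15
errs = abs((f(x + hs) - f(x)) ./ hs - exp(1));   % forward difference error for each h

for k = 1:length(hs)
  printf("h = 1e-%02d   error = %.3e\n", k, errs(k));
end
[best_err, idx] = min(errs);
printf("smallest error %.3e occurs at h = %g\n", best_err, hs(idx));
```

The error should first shrink roughly in proportion to $h$ (truncation error), then grow again once $h$ is small enough for subtractive cancellation to dominate (roundoff error), so the best $h$ lies somewhere in the middle of the range.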

The truncation error for this approximation is $O(h)$. We may want a more precise approximation. By now, you should know that any calculation starts with Taylor’s Theorem:

$$\begin{aligned}
f(x + h) &= f(x) + f'(x)\,h + \frac{f''(x)}{2}\,h^2 + \frac{f'''(\xi_1)}{3!}\,h^3 \\
f(x - h) &= f(x) - f'(x)\,h + \frac{f''(x)}{2}\,h^2 - \frac{f'''(\xi_2)}{3!}\,h^3
\end{aligned}$$

¹ This approximation for $f'(x)$ should remind you of the definition of $f'(x)$ as a limit.


By subtracting these two lines, we get

f(x +h) −f(x −h) = 2f

t

(x)h +

f

ttt

(ξ

1

) +f

ttt

(ξ

2

)

3!

h

3

.

Thus

$$2f'(x)h = f(x+h) - f(x-h) - \frac{f'''(\xi_1) + f'''(\xi_2)}{3!}h^3$$

$$f'(x) = \frac{f(x+h) - f(x-h)}{2h} - \left[\frac{f'''(\xi_1) + f'''(\xi_2)}{2}\right]\frac{h^2}{6}$$

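If $f'''(x)$ is continuous, then there is some $\xi$ between $\xi_1$ and $\xi_2$ such that $f'''(\xi) = \frac{f'''(\xi_1) + f'''(\xi_2)}{2}$. (This is the Intermediate Value Theorem at work: the average of two values of a continuous function is attained somewhere between them.) Assuming some uniform bound on $f'''(\cdot)$, we get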

$$f'(x) = \frac{f(x+h) - f(x-h)}{2h} + O\left(h^2\right) \tag{7.2}$$
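The difference between $O(h)$ and $O(h^2)$ is easy to observe by halving $h$. The sketch below is an illustration under assumed choices (the test function $\sin x$ at $x = 1$ and the helper names are not from the notes): halving $h$ should roughly halve the forward-difference error and roughly quarter the centered-difference error.

```python
import math

def forward_difference(f, x, h):
    """Equation 7.1: [f(x + h) - f(x)] / h, truncation error O(h)."""
    return (f(x + h) - f(x)) / h

def centered_difference(f, x, h):
    """Equation 7.2: [f(x + h) - f(x - h)] / (2h), truncation error O(h^2)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

# Illustrative test problem (an assumption): f = sin, x = 1, f'(1) = cos(1).
x = 1.0
exact = math.cos(x)
for h in (0.1, 0.05, 0.025):
    e_fwd = abs(forward_difference(math.sin, x, h) - exact)
    e_ctr = abs(centered_difference(math.sin, x, h) - exact)
    print(f"h = {h:<6}  forward error = {e_fwd:.3e}  centered error = {e_ctr:.3e}")
```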

In some situations it may be necessary to use evaluations of a function at “odd” places to

approximate a derivative. These are usually straightforward to derive, involving the use of Taylor’s

Theorem. The following examples illustrate:

Example Problem 7.1. Use evaluations of $f$ at $x+h$ and $x+2h$ to approximate $f'(x)$, assuming $f(x)$ is an analytic function, i.e., one with infinitely many derivatives.

Solution: First use Taylor's Theorem to expand $f(x+h)$ and $f(x+2h)$, then subtract to get some factor of $f'(x)$:

$$\begin{aligned}
f(x+2h) &= f(x) + 2hf'(x) + \frac{4h^2}{2!}f''(x) + \frac{8h^3}{3!}f'''(x) + \frac{16h^4}{4!}f^{(4)}(x) + \dots \\
f(x+h) &= f(x) + hf'(x) + \frac{h^2}{2!}f''(x) + \frac{h^3}{3!}f'''(x) + \frac{h^4}{4!}f^{(4)}(x) + \dots \\
f(x+2h) - f(x+h) &= hf'(x) + \frac{3h^2}{2!}f''(x) + \frac{7h^3}{3!}f'''(x) + \frac{15h^4}{4!}f^{(4)}(x) + \dots \\
\left(f(x+2h) - f(x+h)\right)/h &= f'(x) + \frac{3h}{2!}f''(x) + \frac{7h^2}{3!}f'''(x) + \frac{15h^3}{4!}f^{(4)}(x) + \dots
\end{aligned}$$

Thus $\left(f(x+2h) - f(x+h)\right)/h = f'(x) + O(h)$. ⊣

Example Problem 7.2. Show that

$$\frac{4f(x+h) - f(x+2h) - 3f(x)}{2h} = f'(x) + O\left(h^2\right),$$

for $f$ with a sufficient number of derivatives.

Solution: In this case we do not have to ﬁnd the approximation scheme, it is given to us. We only

have to expand the appropriate terms with Taylor’s Theorem. As before:

$$\begin{aligned}
f(x+h) &= f(x) + hf'(x) + \frac{h^2}{2!}f''(x) + \frac{h^3}{3!}f'''(x) + \dots \\
4f(x+h) &= 4f(x) + 4hf'(x) + \frac{4h^2}{2!}f''(x) + \frac{4h^3}{3!}f'''(x) + \dots \\
f(x+2h) &= f(x) + 2hf'(x) + \frac{4h^2}{2!}f''(x) + \frac{8h^3}{3!}f'''(x) + \dots \\
4f(x+h) - f(x+2h) &= 3f(x) + 2hf'(x) + 0f''(x) + \frac{-4h^3}{3!}f'''(x) + \dots \\
4f(x+h) - f(x+2h) - 3f(x) &= 2hf'(x) + \frac{-4h^3}{3!}f'''(x) + \dots \\
\left(4f(x+h) - f(x+2h) - 3f(x)\right)/2h &= f'(x) + \frac{-2h^2}{3!}f'''(x) + \dots
\end{aligned}$$

⊣


7.1.1 Approximating the Second Derivative

Suppose we want to approximate the second derivative of some blackbox function f(x). Again,

start with Taylor’s Theorem:

$$f(x+h) = f(x) + f'(x)h + \frac{f''(x)}{2}h^2 + \frac{f'''(x)}{3!}h^3 + \frac{f^{(4)}(x)}{4!}h^4 + \dots$$

$$f(x-h) = f(x) - f'(x)h + \frac{f''(x)}{2}h^2 - \frac{f'''(x)}{3!}h^3 + \frac{f^{(4)}(x)}{4!}h^4 - \dots$$

Now add the two series to get

$$f(x+h) + f(x-h) = 2f(x) + h^2 f''(x) + 2\frac{f^{(4)}(x)}{4!}h^4 + 2\frac{f^{(6)}(x)}{6!}h^6 + \dots$$

Then let

$$\begin{aligned}
\psi(h) = \frac{f(x+h) - 2f(x) + f(x-h)}{h^2} &= f''(x) + 2\frac{f^{(4)}(x)}{4!}h^2 + 2\frac{f^{(6)}(x)}{6!}h^4 + \dots, \\
&= f''(x) + \sum_{k=1}^{\infty} b_{2k}h^{2k}.
\end{aligned}$$

Thus we can use Richardson Extrapolation on ψ(h) to get higher order approximations.

This derivation also gives us the centered diﬀerence approximation to the second derivative:

$$f''(x) = \frac{f(x+h) - 2f(x) + f(x-h)}{h^2} + O\left(h^2\right). \tag{7.3}$$
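As a quick check of equation 7.3, here is a small sketch (the function name `second_difference` and the test function $e^x$ at $x = 0$ are assumptions made for illustration). Since the leading error term is $2\frac{f^{(4)}(x)}{4!}h^2$, halving $h$ should cut the error by about a factor of four.

```python
import math

def second_difference(f, x, h):
    """Centered approximation (7.3) to f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

# Illustrative test (an assumption): f = exp at x = 0, so f''(0) = 1.
for h in (0.1, 0.05, 0.025):
    err = abs(second_difference(math.exp, 0.0, h) - 1.0)
    print(f"h = {h:<6}  error = {err:.3e}")
```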

7.2 Richardson Extrapolation

The centered difference approximation gives a truncation error of $O\left(h^2\right)$, which is better than $O(h)$. Can we do better? Let's define

$$\phi(h) = \frac{1}{2h}\left[f(x+h) - f(x-h)\right].$$

Had we expanded the Taylor’s Series for f(x +h), f(x −h) to more terms we would have seen

that

$$\phi(h) = f'(x) + a_2h^2 + a_4h^4 + a_6h^6 + a_8h^8 + \dots$$

The constants $a_i$ are a function of $f^{(i+1)}(x)$ only. (In fact, they should take the value of $\frac{f^{(i+1)}(x)}{(i+1)!}$.)

What happens if we now calculate φ(h/2)?

$$\phi(h/2) = f'(x) + \frac{1}{4}a_2h^2 + \frac{1}{16}a_4h^4 + \frac{1}{64}a_6h^6 + \frac{1}{256}a_8h^8 + \dots$$

But we can combine this with $\phi(h)$ to get better accuracy. We have to be a little tricky, but we can get the $O\left(h^2\right)$ terms to cancel by taking the right multiples of these two approximations:

$$\phi(h) - 4\phi(h/2) = -3f'(x) + \frac{3}{4}a_4h^4 + \frac{15}{16}a_6h^6 + \frac{63}{64}a_8h^8 + \dots$$

$$\frac{4\phi(h/2) - \phi(h)}{3} = f'(x) - \frac{1}{4}a_4h^4 - \frac{5}{16}a_6h^6 - \frac{21}{64}a_8h^8 - \dots$$

This approximation has a truncation error of $O\left(h^4\right)$.

This technique of getting better approximations is known as the Richardson Extrapolation, and

can be repeatedly applied. We will also use this technique later to get better quadrature rules–that

is, ways of approximating the deﬁnite integral of a function.

7.2.1 Abstracting Richardson’s Method

We now discuss Richardson’s Method in a more abstract framework. Suppose you want to calculate

some quantity L, and have found, through theory, some approximation:

$$\phi(h) = L + \sum_{k=1}^{\infty} a_{2k}h^{2k}.$$

Let

$$D(n, 0) = \phi\left(\frac{h}{2^n}\right).$$

Now define

$$D(n, m) = \frac{4^m D(n, m-1) - D(n-1, m-1)}{4^m - 1}. \tag{7.4}$$

We will be interested in calculating $D(n, n)$ for some $n$. We claim that

$$D(n, n) = L + O\left(h^{2(n+1)}\right).$$

First we examine the recurrence for D(n, m). As in divided diﬀerences, we use a pyramid table:

$$\begin{array}{lllll}
D(0,0) \\
D(1,0) & D(1,1) \\
D(2,0) & D(2,1) & D(2,2) \\
\vdots & \vdots & \vdots & \ddots \\
D(n,0) & D(n,1) & D(n,2) & \cdots & D(n,n)
\end{array}$$

By deﬁnition we know how to calculate the ﬁrst column of this table; every other entry in the table

depends on two other entries, one directly to the left, and the other to the left and up one space.

Thus to calculate D(n, n) we have to compute this whole lower triangular array.

We want to show that $D(n, n) = L + O\left(h^{2(n+1)}\right)$, that is, $D(n, n)$ is an $O\left(h^{2(n+1)}\right)$ approximation to $L$. The following theorem gives this result:

Theorem 7.3 (Richardson Extrapolation). There are constants $a_{k,m}$ such that

$$D(n, m) = L + \sum_{k=m+1}^{\infty} a_{k,m}\left(\frac{h}{2^n}\right)^{2k} \qquad (0 \le m \le n).$$

The proof is by an easy, but tedious, induction. We skip the proof.
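The recurrence 7.4 translates almost directly into code. The following is a minimal sketch, assuming a user-supplied function `phi(h)` with an even-power error expansion; the names `richardson_table` and `phi` are hypothetical and not from the notes. It fills the lower triangular array row by row, and is tried here on $\phi(h) = \frac{1}{2h}\left[\log(1+h) - \log(1-h)\right]$ with $h = 0.1$.

```python
import math

def richardson_table(phi, h, n):
    """Return the lower-triangular table D, with D[i][m] = D(i, m) from (7.4)."""
    D = [[0.0] * (i + 1) for i in range(n + 1)]
    for i in range(n + 1):
        # First column: D(i, 0) = phi(h / 2^i).
        D[i][0] = phi(h / 2 ** i)
        # Each further entry uses the entry to its left and the one up-left.
        for m in range(1, i + 1):
            D[i][m] = (4 ** m * D[i][m - 1] - D[i - 1][m - 1]) / (4 ** m - 1)
    return D

def phi(h):
    """Centered difference for f(x) = log x at x = 1 (illustrative choice)."""
    return (math.log(1.0 + h) - math.log(1.0 - h)) / (2.0 * h)

table = richardson_table(phi, 0.1, 2)
for row in table:
    print("  ".join(f"{entry:.9f}" for entry in row))
```

Storing the whole triangle is a design choice for readability; only the previous row is needed to compute the next one.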


7.2.2 Using Richardson Extrapolation

We now try out the technique on an example or two.

Example Problem 7.4. Approximate the derivative of f(x) = log x at x = 1.

Solution: The real answer is $f'(1) = 1/1 = 1$, but our computer doesn't know that. Define

$$\phi(h) = \frac{1}{2h}\left[f(1+h) - f(1-h)\right] = \frac{\log\frac{1+h}{1-h}}{2h}.$$

Let's use $h = 0.1$. We now try to find $D(2, 2)$, which is supposed to be an $O\left(h^6\right)$ approximation to $f'(1) = 1$:

$$\begin{array}{c|lll}
n \backslash m & 0 & 1 & 2 \\ \hline
0 & \frac{\log\frac{1.1}{0.9}}{0.2} \approx 1.003353477 \\
1 & \frac{\log\frac{1.05}{0.95}}{0.1} \approx 1.000834586 & \approx 0.999994954 \\
2 & \frac{\log\frac{1.025}{0.975}}{0.05} \approx 1.000208411 & \approx 0.999999686 & \approx 1.000000002
\end{array}$$

This shows that the Richardson method is pretty good. However, notice that for this simple example, we have, already, that $\phi(0.00001) \approx 0.999999999$. ⊣

Example Problem 7.5. Consider the ugly function:

f(x) = arctan(x).

Attempt to find $f'(\sqrt{2})$. Recall that $f'(x) = \frac{1}{1+x^2}$, so the value that we are seeking is $\frac{1}{3}$.

Solution: Let's use $h = 0.01$. We now try to find $D(2, 2)$, which is supposed to be an $O\left(h^6\right)$ approximation to $\frac{1}{3}$:

$$\begin{array}{c|lll}
n \backslash m & 0 & 1 & 2 \\ \hline
0 & 0.333339506181068 \\
1 & 0.333334876543723 & 0.333333333331274 \\
2 & 0.33333371913582 & 0.333333333333186 & 0.333333333333313
\end{array}$$

Note that we have some motivation to use Richardson's method in this case: If we let

$$\phi(h) = \frac{1}{2h}\left[f(\sqrt{2} + h) - f(\sqrt{2} - h)\right],$$

then making $h$ small gives a good approximation to $f'\left(\sqrt{2}\right)$ until subtractive cancelling takes over. The following table illustrates this:

$$\begin{array}{ll}
h & \phi(h) \\ \hline
1.0 & 0.39269908169872408 \\
0.1 & 0.33395069677431943 \\
0.01 & 0.33333950618106845 \\
0.001 & 0.33333339506169679 \\
0.0001 & 0.33333333395058062 \\
1 \times 10^{-5} & 0.33333333334106813 \\
1 \times 10^{-6} & 0.33333333332441484 \\
1 \times 10^{-7} & 0.33333333315788138 \\
1 \times 10^{-8} & 0.33333332760676626 \\
1 \times 10^{-9} & 0.33333336091345694 \\
1 \times 10^{-10} & 0.333333360913457 \\
1 \times 10^{-11} & 0.333333360913457 \\
1 \times 10^{-12} & 0.33339997429493451 \\
1 \times 10^{-13} & 0.33306690738754696 \\
1 \times 10^{-14} & 0.33306690738754696 \\
1 \times 10^{-15} & 0.33306690738754691 \\
1 \times 10^{-16} & 0
\end{array}$$

The data are illustrated in Figure 7.1. Notice that $\phi(h)$ gives at most 10 decimal places of accuracy, then begins to deteriorate; note, however, that we get 13 decimal places from $D(2, 2)$. ⊣

Figure 7.1: The total error for the centered difference approximation to $f'(\sqrt{2})$ is shown versus $h$ (log-log axes; $h$ from $10^{-16}$ to $1$, total error from $10^{-12}$ to $1$). The total error is the sum of a truncation term which decreases as $h$ decreases, and a roundoff term which increases. The optimal $h$ value is around $1 \times 10^{-5}$. Note that Richardson's $D(2, 2)$ approximation with $h = 0.01$ gives much better results than this optimal $h$.


Exercises

(7.1) Derive the approximation

$$f'(x) \approx \frac{4f(x+h) - 3f(x) - f(x-2h)}{6h}$$

using Taylor's Theorem.

(a) Assuming that $f(x)$ has bounded derivatives, give the accuracy of the above approximation. Your answer should be something like $O\left(h^{?}\right)$.

(b) Let $f(x) = x^3$. Approximate $f'(0)$ with this approximation, using $h = \frac{1}{4}$.

(7.2) Let $f(x)$ be an analytic function, i.e., one which is infinitely differentiable. Let $\psi(h)$ be the centered difference approximation to the first derivative:

$$\psi(h) = \frac{f(x+h) - f(x-h)}{2h}$$

(a) Show that $\psi(h) = f'(x) + \frac{h^2}{3!}f'''(x) + \frac{h^4}{5!}f^{(5)}(x) + \frac{h^6}{7!}f^{(7)}(x) + \dots$

(b) Show that

$$\frac{8\left(\psi(h) - \psi(h/2)\right)}{h^2} = f'''(x) + O\left(h^2\right).$$

(7.3) Derive the approximation

$$f'(x) \approx \frac{4f(x+3h) + 5f(x) - 9f(x-2h)}{30h}$$

using Taylor's Theorem.

(a) What order approximation is this? (Assume $f(x)$ has bounded derivatives of arbitrary order.)

(b) Use this formula to approximate $f'(0)$, where $f(x) = x^4$, and $h = 0.1$.

(7.4) Suppose you want to know quantity $Q$, and can approximate it with some formula, say $\phi(h)$, which depends on parameter $h$, and such that $\phi(h) = Q + a_1h + a_2h^2 + a_3h^3 + a_4h^4 + \dots$ Find some linear combination of $\phi(h)$ and $\phi(-h)$ which is an $O\left(h^2\right)$ approximation to $Q$.

(7.5) Assuming that $\phi(h) = Q + a_2h^2 + a_4h^4 + a_6h^6 + \dots$, find some combination of $\phi(h)$, $\phi(h/3)$ which is an $O\left(h^4\right)$ approximation to $Q$.

(7.6) Let $\lambda$ be some number in $(0, 1)$. Assuming that $\phi(h) = Q + a_2h^2 + a_4h^4 + a_6h^6 + \dots$, find some combination of $\phi(h)$, $\phi(\lambda h)$ which is an $O\left(h^4\right)$ approximation to $Q$. To make the constant associated with the $h^4$ term small in magnitude, what should you do with $\lambda$? Is this practical? Note that the method of Richardson Extrapolation that we considered used the value $\lambda = 1/2$.

(7.7) Assuming that $\phi(h) = Q + a_2h^2 + a_4h^4 + a_6h^6 + \dots$, find some combination of $\phi(h)$, $\phi(h/4)$ which is an $O\left(h^4\right)$ approximation to $Q$.

(7.8) Suppose you have some great computational approximation to the quantity $Q$ such that $\psi(h) = Q + a_3h^3 + a_6h^6 + a_9h^9 + \dots$ Can you find some combination of $\psi(h)$, $\psi(h/2)$ which is an $O\left(h^6\right)$ approximation to $Q$?

(7.9) Complete the following Richardson's Extrapolation Table, assuming the first column consists of values $D(n, 0)$ for $n = 0, 1, 2$:

$$\begin{array}{c|ccc}
n \backslash m & 0 & 1 & 2 \\ \hline
0 & 2 \\
1 & 1.5 & ? \\
2 & 1.25 & ? & ?
\end{array}$$

(See equation 7.4 if you've forgotten the definitions.)

(7.10) Write code to complete a Richardson's Method table, given the first column. Your m-file should have header line like:

function Dnn = richardsons(col0)

where Dnn is the value at the lower right corner of the table, $D(n, n)$, while col0 is the column of $n + 1$ values $D(i, 0)$, for $i = 0, 1, \dots, n$. Test your code on the following input:

octave:1> col0 = [1 0.5 0.25 0.125 0.0625 0.03125];

octave:2> richardsons(col0)

ans = 0.019042

(a) What do you get when you try the following?

octave:5> col0 = [1.5 0.5 1.5 0.5 1.5 0.5 1.5];

octave:6> richardsons(col0)

(b) What do you get when you try the following?

octave:7> col0 = [0.9 0.99 0.999 0.9999 0.99999];

octave:8> richardsons(col0)

Chapter 8

Integrals and Quadrature

8.1 The Deﬁnite Integral

Often enough the numerical analyst is presented with the challenge of ﬁnding the deﬁnite integral

of some function:

$$\int_a^b f(x)\,dx.$$

In your golden years of Calculus, you learned the Fundamental Theorem of Calculus, which claims

that if f(x) is continuous, and F(x) is an antiderivative of f(x), then

$$\int_a^b f(x)\,dx = F(b) - F(a).$$

What you might not have been told in Calculus is there are some functions for which a closed

form antiderivative does not exist or at least is not known to humankind. Nevertheless, you may

ﬁnd yourself in a situation where you have to evaluate an integral for just such an integrand. An

approximation will have to do.

8.1.1 Upper and Lower Sums

We will review the definition of the Riemann integral of a function. A partition of an interval $[a, b]$ is a finite, ordered collection of nodes $x_i$:

$$a = x_0 < x_1 < x_2 < \dots < x_n = b.$$

Given such a partition, $P$, define the upper and lower bounds on each subinterval $[x_i, x_{i+1}]$ as follows:

$$m_i = \inf\left\{f(x) \mid x_i \le x \le x_{i+1}\right\}$$

$$M_i = \sup\left\{f(x) \mid x_i \le x \le x_{i+1}\right\}$$

Then for this function $f$ and partition $P$, define the upper and lower sums:

$$L(f, P) = \sum_{i=0}^{n-1} m_i\,(x_{i+1} - x_i)$$

$$U(f, P) = \sum_{i=0}^{n-1} M_i\,(x_{i+1} - x_i)$$


We can interpret the upper and lower sums graphically as the sums of areas of rectangles deﬁned

by the function f and the partition P, as in Figure 8.1.

[Figure 8.1 panels: (a) The Lower Sum; (b) The Upper Sum]

Figure 8.1: The (a) lower, and (b) upper sums of a function on a given interval are shown. These approximations to the integral are the sums of areas of rectangles. Note that the lower sums are an underestimate, and the upper sums an overestimate of the integral.

Notice a few things about the upper, lower sums:

(i) L(f, P) ≤ U(f, P).

(ii) If we switch to a “better” partition (i.e., a finer one), we expect that $L(f, \cdot)$ increases and $U(f, \cdot)$ decreases.

The notion of integrability familiar from Calculus class (that is Riemann Integrability) is deﬁned

in terms of the upper and lower sums.

Definition 8.1. A function $f$ is Riemann Integrable over interval $[a, b]$ if

$$\sup_P L(f, P) = \inf_P U(f, P),$$

where the supremum and infimum are over all partitions of the interval $[a, b]$. Moreover, in case $f(x)$ is integrable, we define the integral

$$\int_a^b f(x)\,dx = \inf_P U(f, P).$$

You may recall the following

Theorem 8.2. Every continuous function on a closed bounded interval of the real line is Riemann

Integrable (on that interval).

Continuity is suﬃcient, but not necessary.

Example 8.3. Consider the Heaviside function:

$$f(x) = \begin{cases} 0 & x < 0 \\ 1 & 0 \le x \end{cases}$$

This function is not continuous on any interval containing 0, but is Riemann Integrable on every

closed bounded interval.

Example 8.4. Consider the Dirichlet function:

$$f(x) = \begin{cases} 0 & x \text{ rational} \\ 1 & x \text{ irrational} \end{cases}$$

For any partition $P$ of any interval $[a, b]$ with $a < b$, we have $L(f, P) = 0$, while $U(f, P) = b - a$, so

$$\sup_P L(f, P) = 0 \ne b - a = \inf_P U(f, P),$$

so this function is not Riemann Integrable.

8.1.2 Approximating the Integral

The definition of the integral gives a simple method of approximating an integral $\int_a^b f(x)\,dx$. The method cuts the interval $[a, b]$ into a partition of $n$ equal subintervals with nodes $x_i = a + i\,\frac{b-a}{n}$, for $i = 0, 1, \dots, n$. The algorithm then has to somehow find the supremum and infimum of $f(x)$ on each interval $[x_i, x_{i+1}]$. The integral is then approximated by the mean of the lower and upper sums:

$$\int_a^b f(x)\,dx \approx \frac{1}{2}\left(L(f, P) + U(f, P)\right).$$

Because the value of the integral is between $L(f, P)$ and $U(f, P)$, this approximation has error at most

$$\frac{1}{2}\left(U(f, P) - L(f, P)\right).$$

Note that in general, or for a black box function, it is usually not feasible to ﬁnd the suprema

and inﬁma of f(x) on the subintervals, and thus the lower and upper sums cannot be calculated.

However, if some information is known about the function, it becomes easier:

Example 8.5. Consider, for example, using this method on some function $f(x)$ which is monotone increasing, that is, $x \le y$ implies $f(x) \le f(y)$. In this case, the infimum of $f(x)$ on each interval occurs at the leftmost endpoint, while the supremum occurs at the right hand endpoint. Thus for this partition, $P$, we have

$$L(f, P) = \sum_{k=0}^{n-1} m_k\,|x_{k+1} - x_k| = \frac{|b-a|}{n}\sum_{k=0}^{n-1} f(x_k)$$

$$U(f, P) = \sum_{k=0}^{n-1} M_k\,|x_{k+1} - x_k| = \frac{|b-a|}{n}\sum_{k=0}^{n-1} f(x_{k+1}) = \frac{|b-a|}{n}\sum_{k=1}^{n} f(x_k)$$

Then the error of the approximation is at most

$$\frac{1}{2}\left(U(f, P) - L(f, P)\right) = \frac{1}{2}\,\frac{|b-a|}{n}\left[f(x_n) - f(x_0)\right] = \frac{|b-a|\left[f(b) - f(a)\right]}{2n}.$$
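For a monotone increasing function the lower and upper sums need only the function values at the nodes, so they are easy to compute. The sketch below assumes an increasing integrand; the function name `riemann_bounds_increasing` and the test integrand $x^2$ on $[0, 1]$ are illustrative choices, not from the notes.

```python
def riemann_bounds_increasing(f, a, b, n):
    """Lower and upper sums on n equal subintervals, assuming f is increasing."""
    h = (b - a) / n
    values = [f(a + k * h) for k in range(n + 1)]
    lower = h * sum(values[:-1])   # infimum sits at each left endpoint
    upper = h * sum(values[1:])    # supremum sits at each right endpoint
    return lower, upper

# Illustrative use (an assumption): f(x) = x^2 on [0, 1], exact integral 1/3.
lower, upper = riemann_bounds_increasing(lambda x: x * x, 0.0, 1.0, 100)
estimate = 0.5 * (lower + upper)
half_gap = 0.5 * (upper - lower)   # equals |b - a| [f(b) - f(a)] / (2n)
print(lower, upper, estimate, half_gap)
```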

8.1.3 Simple and Composite Rules

For the remainder of this chapter we will study “simple” quadrature rules, i.e., rules which approximate the integral of a function, $f(x)$, over an interval $[a, b]$ by means of a number of evaluations of $f$ at points in this interval. The error of a simple quadrature rule usually depends on the function $f$, and the width of the interval $[a, b]$ to some power which is determined by the rule. That is, we usually think of a simple rule as being applied to a small interval.

To use a simple rule on a larger interval, we usually cast it into a “composite” rule. Thus the

trapezoidal rule, which we will study next becomes the composite trapezoidal rule. The means of

extending a simple rule to a composite rule is straightforward: Partition the given interval into

subintervals, apply the simple rule to each subinterval, and sum the results. Thus, for example, if the interval in question is $[\alpha, \beta]$, and the partition is $\alpha = x_0 < x_1 < x_2 < \dots < x_n = \beta$, we have

$$\text{composite rule on } [\alpha, \beta] = \sum_{i=0}^{n-1} \text{simple rule applied to } [x_i, x_{i+1}].$$

8.2 Trapezoidal Rule

Suppose we are trying to approximate the integral

$$\int_a^b f(x)\,dx,$$

for some unpleasant or black box function $f(x)$.

The trapezoidal rule approximates the integral

$$\int_a^b f(x)\,dx$$

by the (signed) area of the trapezoid through the points $(a, f(a))$, $(b, f(b))$, and with one side the segment from $a$ to $b$. See Figure 8.2.

Figure 8.2: The trapezoidal rule for approximating the integral of a function over $[a, b]$ is shown.

By old school math, we can ﬁnd this signed area easily. This gives the (simple) trapezoidal rule:

$$\int_a^b f(x)\,dx \approx (b - a)\,\frac{f(a) + f(b)}{2}.$$

The composite trapezoidal rule can be written in a simplified form, one which you saw in your calculus class, if the interval in question is partitioned into equal width subintervals. That is, if you let $[\alpha, \beta]$ be partitioned by

$$\alpha = x_0 < x_1 < x_2 < x_3 < \dots < x_n = \beta,$$

with $x_i = \alpha + ih$, where $h = (\beta - \alpha)/n$, then the composite trapezoidal rule is

$$\int_\alpha^\beta f(x)\,dx = \sum_{i=0}^{n-1}\int_{x_i}^{x_{i+1}} f(x)\,dx \approx \frac{1}{2}\sum_{i=0}^{n-1}(x_{i+1} - x_i)\left[f(x_i) + f(x_{i+1})\right]. \tag{8.1}$$

Since each subinterval has equal width, $x_{i+1} - x_i = h$, and we have (writing $a = \alpha$, $b = \beta$)

$$\int_a^b f(x)\,dx \approx \frac{h}{2}\sum_{i=0}^{n-1}\left[f(x_i) + f(x_{i+1})\right]. \tag{8.2}$$

In your calculus class, you saw this in the less comprehensible form:

$$\int_a^b f(x)\,dx \approx h\left[\frac{f(a) + f(b)}{2} + \sum_{i=1}^{n-1} f(x_i)\right].$$

Note that the composite trapezoidal rule for equal subintervals is the same as the approximation

we found for increasing functions in Example 8.5.

Example Problem 8.6. Approximate the integral

$$\int_0^2 \frac{1}{1+x^2}\,dx$$

by the composite trapezoidal rule with a partition of equally spaced points, for $n = 2$.

Solution: We have $h = \frac{2-0}{2} = 1$, and $f(x_0) = 1$, $f(x_1) = \frac{1}{2}$, $f(x_2) = \frac{1}{5}$. Then the composite trapezoidal rule gives the value

$$\frac{1}{2}\left[f(x_0) + f(x_1) + f(x_1) + f(x_2)\right] = \frac{1}{2}\left[1 + 1 + \frac{1}{5}\right] = \frac{11}{10}.$$

The actual value is $\arctan 2 \approx 1.107149$, and our approximation is correct to two decimal places. ⊣
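The composite rule in its "calculus class" form is only a few lines of code. This is a minimal sketch with a hypothetical function name, `composite_trapezoid`; it reproduces the value $\frac{11}{10}$ from the example above and shows how the estimate moves toward $\arctan 2$ as $n$ grows.

```python
import math

def composite_trapezoid(f, a, b, n):
    """Composite trapezoidal rule on n equal subintervals of [a, b]."""
    h = (b - a) / n
    interior = sum(f(a + i * h) for i in range(1, n))
    return h * (0.5 * (f(a) + f(b)) + interior)

def integrand(x):
    return 1.0 / (1.0 + x * x)

for n in (2, 4, 8, 16):
    approx = composite_trapezoid(integrand, 0.0, 2.0, n)
    print(f"n = {n:<3} T = {approx:.6f}  error = {abs(approx - math.atan(2.0)):.2e}")
```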

8.2.1 How Good is the Composite Trapezoidal Rule?

We consider the composite trapezoidal rule for partitions of equal subintervals. Let $p_i(x)$ be the polynomial of degree $\le 1$ that interpolates $f(x)$ at $x_i, x_{i+1}$. Let

$$I_i = \int_{x_i}^{x_{i+1}} f(x)\,dx, \qquad T_i = \int_{x_i}^{x_{i+1}} p_i(x)\,dx = (x_{i+1} - x_i)\,\frac{p_i(x_i) + p_i(x_{i+1})}{2} = \frac{h}{2}\left(f(x_i) + f(x_{i+1})\right).$$

That's right: the composite trapezoidal rule approximates the integral of $f(x)$ over $[x_i, x_{i+1}]$ by the integral of $p_i(x)$ over the same interval.

Now recall our theorem on polynomial interpolation error. For $x \in [x_i, x_{i+1}]$, we have

$$f(x) - p_i(x) = \frac{1}{(2)!}f^{(2)}(\xi_x)\,(x - x_i)(x - x_{i+1}),$$

for some $\xi_x \in [x_i, x_{i+1}]$. Recall that $\xi_x$ depends on $x$. To make things simpler, call it $\xi(x)$.

Now integrate:

$$I_i - T_i = \int_{x_i}^{x_{i+1}} \left(f(x) - p_i(x)\right) dx = \frac{1}{2}\int_{x_i}^{x_{i+1}} f''(\xi(x))\,(x - x_i)(x - x_{i+1})\,dx.$$

We will now attack the integral on the right hand side. Recall the following theorem:

Theorem 8.7 (Mean Value Theorem for Integrals). Suppose $f$ is continuous, $g$ is Riemann Integrable and does not change sign on $[\alpha, \beta]$. Then there is some $\zeta \in [\alpha, \beta]$ such that

$$\int_\alpha^\beta f(x)g(x)\,dx = f(\zeta)\int_\alpha^\beta g(x)\,dx.$$

We use this theorem on our integral. Note that $(x - x_i)(x - x_{i+1})$ is nonpositive on the interval in question, $[x_i, x_{i+1}]$. We assume continuity of $f''(x)$, and wave our hands to get continuity of $f''(\xi(x))$. Then we have

$$I_i - T_i = \frac{1}{2}f''(\xi_i)\int_{x_i}^{x_{i+1}} (x - x_i)(x - x_{i+1})\,dx,$$

for some $\xi_i \in [x_i, x_{i+1}]$. By boring calculus and algebra, we find that

$$\int_{x_i}^{x_{i+1}} (x - x_i)(x - x_{i+1})\,dx = -\frac{h^3}{6}.$$

This gives

$$I_i - T_i = -\frac{h^3}{12}f''(\xi_i),$$

for some $\xi_i \in [x_i, x_{i+1}]$.

We now sum over all subintervals to find the total error of the composite trapezoidal rule. Using $nh = b - a$,

$$E = \sum_{i=0}^{n-1}\left(I_i - T_i\right) = -\frac{h^3}{12}\sum_{i=0}^{n-1} f''(\xi_i) = -\frac{(b-a)h^2}{12}\left[\frac{1}{n}\sum_{i=0}^{n-1} f''(\xi_i)\right].$$

On the far right we have an average value, $\frac{1}{n}\sum_{i=0}^{n-1} f''(\xi_i)$, which lies between the least and greatest values of $f''$ on the interval $[a, b]$, and thus by the IVT, there is some $\xi$ at which $f''$ takes this value. So

$$E = -\frac{(b-a)h^2}{12}f''(\xi)$$

This gives us the theorem:

Theorem 8.8 (Error of the Composite Trapezoidal Rule). Let $f''(x)$ be continuous on $[a, b]$. Let $T$ be the value of the trapezoidal rule applied to $f(x)$ on this interval with a partition of uniform spacing, $h$, and let $I = \int_a^b f(x)\,dx$. Then there is some $\xi \in [a, b]$ such that

$$I - T = -\frac{(b-a)h^2}{12}f''(\xi).$$

Note that this theorem tells us not only the magnitude of the error, but the sign as well. Thus if, for example, $f(x)$ is concave up and thus $f''$ is positive, then $I - T$ will be negative, i.e., the trapezoidal rule gives an overestimate of the integral $I$. See Figure 8.3.

Figure 8.3: The trapezoidal rule is an overestimate for a function which is concave up, i.e., has positive second derivative. (The plot shows a single subinterval $[x_i, x_{i+1}]$.)

8.2.2 Using the Error Bound

Example Problem 8.9. How many intervals are required to approximate the integral

$$\ln 2 = I = \int_0^1 \frac{1}{1+x}\,dx$$

to within $1 \times 10^{-10}$?

Solution: We have $f(x) = \frac{1}{1+x}$, thus $f'(x) = -\frac{1}{(1+x)^2}$, and $f''(x) = \frac{2}{(1+x)^3}$. Thus $f''(\xi)$ is continuous and bounded by 2 on $[0, 1]$. If we use $n$ equal subintervals then Theorem 8.8 tells us the error will be

$$-\frac{1-0}{12}\left(\frac{1-0}{n}\right)^2 f''(\xi) = -\frac{f''(\xi)}{12n^2}.$$

To make this smaller than $1 \times 10^{-10}$, in absolute value, we need only take

$$\frac{1}{6n^2} \le 1 \times 10^{-10},$$

and so $n \ge \sqrt{\frac{1}{6}} \times 10^5$ (about 40,825) suffices. Because $f''(x)$ is positive on this interval, the trapezoidal rule will be an overestimate. ⊣

Example Problem 8.10. How many intervals are required to approximate the integral

$$\int_0^2 \left(x^3 - 1\right) dx$$

to within $1 \times 10^{-6}$?

Solution: We have $f(x) = x^3 - 1$, thus $f'(x) = 3x^2$, and $f''(x) = 6x$. Thus $f''(\xi)$ is continuous and bounded by 12 on $[0, 2]$. If we use $n$ equal subintervals then by Theorem 8.8 the error will be

$$-\frac{2-0}{12}\left(\frac{2-0}{n}\right)^2 f''(\xi) = -\frac{2f''(\xi)}{3n^2}.$$

To make this smaller than $1 \times 10^{-6}$, in absolute value, it suffices to take

$$\frac{24}{3n^2} \le 1 \times 10^{-6},$$

and so $n \ge \sqrt{8} \times 10^3$ (about 2,829) suffices. Because $f''(x)$ is positive on this interval, the trapezoidal rule will be an overestimate. ⊣
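The calculation in the last two examples is the same each time: bound $|f''|$, then solve $\frac{(b-a)^3 \max|f''|}{12 n^2} \le \text{tol}$ for $n$. A small sketch follows; the function name `intervals_needed` is hypothetical, and the bound on $|f''|$ must still be supplied by hand.

```python
import math

def intervals_needed(a, b, max_abs_f2, tol):
    """Smallest n with (b - a)^3 * max|f''| / (12 n^2) <= tol (Theorem 8.8 bound)."""
    return math.ceil(math.sqrt((b - a) ** 3 * max_abs_f2 / (12.0 * tol)))

# Example Problem 8.9: [0, 1], |f''| <= 2, tolerance 1e-10.
print(intervals_needed(0.0, 1.0, 2.0, 1e-10))
# Example Problem 8.10: [0, 2], |f''| <= 12, tolerance 1e-6.
print(intervals_needed(0.0, 2.0, 12.0, 1e-6))
```

Because the bound on $f''$ is a worst case, the $n$ returned is sufficient but may be larger than strictly necessary.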

8.3 Romberg Algorithm

Theorem 8.8 tells us, approximately, that the error of the composite trapezoidal rule approximation is $O\left(h^2\right)$. If we halve $h$, the error is quartered. Sometimes we want to do better than this. We'll use the same trick that we used in Richardson extrapolation. In fact, the forms are exactly the same.

Towards this end, suppose that $f, a, b$ are given. For a given $n$, we are going to use the trapezoidal rule on a partition of $2^n$ equal subintervals of $[a, b]$. That is, $h = \frac{b-a}{2^n}$. Then define

$$\begin{aligned}
\phi(n) &= \frac{1}{2}\,\frac{b-a}{2^n}\sum_{i=0}^{2^n-1}\left[f(x_i) + f(x_{i+1})\right] \\
&= \frac{b-a}{2^n}\left[\frac{f(a)}{2} + \frac{f(b)}{2} + \sum_{i=1}^{2^n-1} f\left(a + i\,\frac{b-a}{2^n}\right)\right].
\end{aligned}$$

The intervals used to calculate $\phi(n+1)$ are half the size of those for $\phi(n)$. As mentioned above, this means the error is one quarter.

It turns out that if we had proved the error theorem differently, we would have proved the relation:

$$\phi(n) = \int_a^b f(x)\,dx + a_2h_n^2 + a_4h_n^4 + a_6h_n^6 + a_8h_n^8 + \dots,$$

where $h_n = \frac{b-a}{2^n}$. The constants $a_i$ depend only on $f$ (through its derivatives) and on the interval $[a, b]$; in particular, they do not depend on $n$. This should look just like something from Chapter 7. What happens if we now calculate $\phi(n+1)$? We have

$$\begin{aligned}
\phi(n+1) &= \int_a^b f(x)\,dx + a_2h_{n+1}^2 + a_4h_{n+1}^4 + a_6h_{n+1}^6 + a_8h_{n+1}^8 + \dots, \\
&= \int_a^b f(x)\,dx + \frac{1}{4}a_2h_n^2 + \frac{1}{16}a_4h_n^4 + \frac{1}{64}a_6h_n^6 + \frac{1}{256}a_8h_n^8 + \dots
\end{aligned}$$

This happens because $h_{n+1} = \frac{b-a}{2^{n+1}} = \frac{1}{2}\,\frac{b-a}{2^n} = \frac{h_n}{2}$. As with Richardson's method for approximating derivatives, we now combine the right multiples of these:

$$\phi(n) - 4\phi(n+1) = -3\int_a^b f(x)\,dx + \frac{3}{4}a_4h_n^4 + \frac{15}{16}a_6h_n^6 + \frac{63}{64}a_8h_n^8 + \dots$$

$$\frac{4\phi(n+1) - \phi(n)}{3} = \int_a^b f(x)\,dx - \frac{1}{4}a_4h_n^4 - \frac{5}{16}a_6h_n^6 - \frac{21}{64}a_8h_n^8 - \dots$$

This approximation has a truncation error of $O\left(h_n^4\right)$.

Like in Richardson's method, we can use this to get better and better approximations to the integral. We do this by constructing a triangular array of approximations, each entry depending on two others. Towards this end, we let

$$R(n, 0) = \phi(n),$$

then define, for $m > 0$,

$$R(n, m) = \frac{4^m R(n, m-1) - R(n-1, m-1)}{4^m - 1}. \tag{8.3}$$

The familiar pyramid table then is:

$$\begin{array}{lllll}
R(0,0) \\
R(1,0) & R(1,1) \\
R(2,0) & R(2,1) & R(2,2) \\
\vdots & \vdots & \vdots & \ddots \\
R(n,0) & R(n,1) & R(n,2) & \cdots & R(n,n)
\end{array}$$

Even though this is exactly the same as Richardson's method, it has another name: this is called the Romberg Algorithm.

Example Problem 8.11. Approximate the integral

$$\int_0^2 \frac{1}{1+x^2}\,dx$$

by Romberg's Algorithm; find $R(1, 1)$.

Solution: The first column is calculated by the trapezoidal rule. Successive columns are found by combining members of previous columns. So we first calculate $R(0, 0)$ and $R(1, 0)$. These are fairly simple: the first is the trapezoidal rule on a single subinterval, the second is the trapezoidal rule on two subintervals. Then

$$R(0, 0) = \frac{2-0}{1}\cdot\frac{1}{2}\left[f(0) + f(2)\right] = \frac{6}{5},$$

$$R(1, 0) = \frac{2-0}{2}\cdot\frac{1}{2}\left[f(0) + f(1) + f(1) + f(2)\right] = \frac{11}{10}.$$

Then, using Romberg's Algorithm we have

$$R(1, 1) = \frac{4R(1, 0) - R(0, 0)}{4 - 1} = \frac{\frac{44}{10} - \frac{12}{10}}{3} = \frac{32}{30} = 1.0\bar{6}.$$

⊣
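The Romberg table can be built in the same way as the Richardson table of Chapter 7. The following is a minimal sketch with hypothetical names (`trapezoid`, `romberg`); for clarity it recomputes each trapezoidal sum from scratch rather than reusing earlier function values.

```python
def trapezoid(f, a, b, n):
    """Composite trapezoidal rule on n equal subintervals of [a, b]."""
    h = (b - a) / n
    return h * (0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n)))

def romberg(f, a, b, n):
    """Return the lower-triangular table R, with R[i][m] = R(i, m) from (8.3)."""
    R = [[0.0] * (i + 1) for i in range(n + 1)]
    for i in range(n + 1):
        R[i][0] = trapezoid(f, a, b, 2 ** i)   # first column: 2^i subintervals
        for m in range(1, i + 1):
            R[i][m] = (4 ** m * R[i][m - 1] - R[i - 1][m - 1]) / (4 ** m - 1)
    return R

R = romberg(lambda x: 1.0 / (1.0 + x * x), 0.0, 2.0, 1)
print(R[0][0], R[1][0], R[1][1])
```

With $n = 1$ this reproduces the three values computed by hand in Example Problem 8.11.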

At this point we are tempted to use Richardson's analysis. This would claim that $R(n, n)$ is an $O\left(h_0^{2(n+1)}\right)$ approximation to the integral. However, $h_0 = b - a$, and need not be smaller than 1. This is a bit different from Richardson's method, where the original $h$ is independently set before starting the triangular array; for Romberg's algorithm, $h_0$ is determined by $a$ and $b$.

We can easily deal with this problem by picking some $k$ such that $\frac{b-a}{2^k}$ is small enough, say smaller than 1. Then calculate the following array:

$$\begin{array}{lllll}
R(k,0) \\
R(k+1,0) & R(k+1,1) \\
R(k+2,0) & R(k+2,1) & R(k+2,2) \\
\vdots & \vdots & \vdots & \ddots \\
R(k+n,0) & R(k+n,1) & R(k+n,2) & \cdots & R(k+n,n)
\end{array}$$

Quite often Romberg's Algorithm is used to compute columns of this array. Subtractive cancelling or unbounded higher derivatives of $f(x)$ can make successive approximations less accurate. For this reason, entries in ever rightward columns are usually not calculated; rather, lower entries in a single column are calculated instead. That is, the user calculates the array:

$$\begin{array}{lllll}
R(k,0) \\
R(k+1,0) & R(k+1,1) \\
R(k+2,0) & R(k+2,1) & R(k+2,2) \\
\vdots & \vdots & \vdots & \ddots \\
R(k+n,0) & R(k+n,1) & R(k+n,2) & \cdots & R(k+n,n) \\
R(k+n+1,0) & R(k+n+1,1) & R(k+n+1,2) & \cdots & R(k+n+1,n) \\
R(k+n+2,0) & R(k+n+2,1) & R(k+n+2,2) & \cdots & R(k+n+2,n) \\
R(k+n+3,0) & R(k+n+3,1) & R(k+n+3,2) & \cdots & R(k+n+3,n) \\
\vdots & \vdots & \vdots & & \vdots
\end{array}$$

Then $R(k+n+l, n)$ makes a fine approximation to the integral as $l \to \infty$. Usually $n$ is small, like 2 or 3.

8.3.1 Recursive Trapezoidal Rule

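It turns out there is an efficient way of calculating $R(n+1, 0)$ given $R(n, 0)$; first notice from the above example that

$$R(0, 0) = \frac{b-a}{1}\cdot\frac{1}{2}\left[f(a) + f(b)\right],$$

$$R(1, 0) = \frac{b-a}{2}\cdot\frac{1}{2}\left[f(a) + f\left(\frac{a+b}{2}\right) + f\left(\frac{a+b}{2}\right) + f(b)\right].$$

It would be best to calculate $R(1, 0)$ without recalculating $f(a)$ and $f(b)$. It turns out this is possible. Let $h_n = \frac{b-a}{2^n}$, and recall that

$$R(n, 0) = \phi(n) = h_n\left[\frac{f(a) + f(b)}{2} + \sum_{i=1}^{2^n-1} f(a + ih_n)\right].$$

Thus, splitting the interior nodes of the finer partition into those with odd index $2i - 1$ and those with even index $2i$,

$$\begin{aligned}
R(n+1, 0) = \phi(n+1) &= h_{n+1}\left[\frac{f(a) + f(b)}{2} + \sum_{i=1}^{2^{n+1}-1} f(a + ih_{n+1})\right], \\
&= \frac{1}{2}h_n\left[\frac{f(a) + f(b)}{2} + \sum_{i=1}^{2^n} f\left(a + (2i-1)h_{n+1}\right) + \sum_{i=1}^{2^n-1} f\left(a + (2i)\tfrac{1}{2}h_n\right)\right], \\
&= \frac{1}{2}h_n\left[\frac{f(a) + f(b)}{2} + \sum_{i=1}^{2^n-1} f(a + ih_n) + \sum_{i=1}^{2^n} f\left(a + (2i-1)h_{n+1}\right)\right], \\
&= \frac{1}{2}R(n, 0) + h_{n+1}\sum_{i=1}^{2^n} f\left(a + (2i-1)h_{n+1}\right).
\end{aligned}$$

Then calculating $R(n+1, 0)$ requires only $2^n$ additional evaluations of $f(x)$ (one at the midpoint of each old subinterval), instead of the $2^{n+1} + 1$ usually required.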
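The recursive update can be sketched as follows. The function name `refine_trapezoid` is hypothetical; it takes the previous value $R(n, 0)$ and adds only the contributions of the new nodes, which are the midpoints of the old subintervals (the odd multiples of $h_{n+1}$).

```python
def refine_trapezoid(f, a, b, n, R_n):
    """Given R_n = R(n, 0), return R(n + 1, 0) using only the new midpoints."""
    h_next = (b - a) / 2 ** (n + 1)
    # New nodes are a + (2i - 1) h_{n+1}, i = 1, ..., 2^n.
    new_points = sum(f(a + (2 * i - 1) * h_next) for i in range(1, 2 ** n + 1))
    return 0.5 * R_n + h_next * new_points

def integrand(x):
    return 1.0 / (1.0 + x * x)

a, b = 0.0, 2.0
R0 = (b - a) * 0.5 * (integrand(a) + integrand(b))   # R(0, 0) = 6/5
R1 = refine_trapezoid(integrand, a, b, 0, R0)        # R(1, 0)
R2 = refine_trapezoid(integrand, a, b, 1, R1)        # R(2, 0)
print(R0, R1, R2)
```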


8.4 Gaussian Quadrature

The word quadrature refers to a method of approximating the integral of a function as the linear combination of the function at certain points, i.e.,

$$\int_a^b f(x)\,dx \approx A_0f(x_0) + A_1f(x_1) + \dots + A_nf(x_n), \tag{8.4}$$

for some collection of nodes $\{x_i\}_{i=0}^{n}$, and weights $\{A_i\}_{i=0}^{n}$. Normally one finds the nodes and weights in a table somewhere; we expect a quadrature rule with more nodes to be more accurate in some sense; the tradeoff is in the number of evaluations of $f(\cdot)$. We will examine how these rules are created.

8.4.1 Determining Weights (Lagrange Polynomial Method)

Suppose that the nodes $\{x_i\}_{i=0}^{n}$ are given. An easy way to find “good” weights $\{A_i\}_{i=0}^{n}$ for these nodes is to rig them so the quadrature rule gives the integral of $p(x)$, the polynomial of degree $\le n$ which interpolates $f(x)$ on these nodes. Recall

$$p(x) = \sum_{i=0}^{n} f(x_i)\,\ell_i(x),$$

where $\ell_i(x)$ is the $i$th Lagrange polynomial. Thus our rigged approximation is the one that gives

$$\int_a^b f(x)\,dx \approx \int_a^b p(x)\,dx = \sum_{i=0}^{n} f(x_i)\int_a^b \ell_i(x)\,dx.$$

If we let

$$A_i = \int_a^b \ell_i(x)\,dx,$$

then we have a quadrature rule.

If $f(x)$ is a polynomial of degree $\le n$ then $f(x) = p(x)$, and the quadrature rule is exact.

Example Problem 8.12. Construct a quadrature rule on the interval [0, 4] using nodes 0, 1, 2.

Solution: The nodes are given; we determine the weights by constructing the Lagrange polynomials and integrating them.

$$\ell_0(x) = \frac{(x-1)(x-2)}{(0-1)(0-2)} = \frac{1}{2}(x-1)(x-2),$$

$$\ell_1(x) = \frac{(x-0)(x-2)}{(1-0)(1-2)} = -x(x-2),$$

$$\ell_2(x) = \frac{(x-0)(x-1)}{(2-0)(2-1)} = \frac{1}{2}x(x-1).$$

Then the weights are

$$A_0 = \int_0^4 \ell_0(x)\,dx = \int_0^4 \frac{1}{2}(x-1)(x-2)\,dx = \frac{8}{3},$$

$$A_1 = \int_0^4 \ell_1(x)\,dx = \int_0^4 -x(x-2)\,dx = -\frac{16}{3},$$

$$A_2 = \int_0^4 \ell_2(x)\,dx = \int_0^4 \frac{1}{2}x(x-1)\,dx = \frac{20}{3}.$$

Thus our quadrature rule is

$$\int_0^4 f(x)\,dx \approx \frac{8}{3}f(0) - \frac{16}{3}f(1) + \frac{20}{3}f(2).$$

We expect this rule to be exact for a quadratic function $f(x)$. To illustrate this, let $f(x) = x^2 + 1$. By calculus we have

$$\int_0^4 x^2 + 1\,dx = \left[\frac{1}{3}x^3 + x\right]_0^4 = \frac{64}{3} + 4 = \frac{76}{3}.$$

The approximation is

$$\int_0^4 x^2 + 1\,dx \approx \frac{8}{3}[0 + 1] - \frac{16}{3}[1 + 1] + \frac{20}{3}[4 + 1] = \frac{76}{3}.$$

⊣
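The Lagrange Polynomial Method is mechanical enough to hand to a computer. The following is a minimal sketch in Python with NumPy, not part of the original notes; the function name `lagrange_weights` is hypothetical. It builds each $\ell_i$ from its roots, integrates it exactly as a polynomial, and should reproduce the weights of Example Problem 8.12.

```python
import numpy as np

def lagrange_weights(nodes, a, b):
    """Weights A_i = integral over [a, b] of the i-th Lagrange polynomial."""
    nodes = np.asarray(nodes, dtype=float)
    weights = []
    for i, xi in enumerate(nodes):
        others = np.delete(nodes, i)
        # ell_i(x) = prod_{j != i} (x - x_j) / (x_i - x_j)
        ell = np.poly1d(others, r=True) / np.prod(xi - others)
        antiderivative = ell.integ()
        weights.append(antiderivative(b) - antiderivative(a))
    return np.array(weights)

# Nodes 0, 1, 2 on the interval [0, 4], as in Example Problem 8.12.
weights = lagrange_weights([0.0, 1.0, 2.0], 0.0, 4.0)
print(weights)  # approximately 8/3, -16/3, 20/3
```

The polynomial is integrated symbolically through its coefficients, so no numerical quadrature is hidden inside the construction of the weights.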

8.4.2 Determining Weights (Method of Undetermined Coeﬃcients)

Using the Lagrange Polynomial Method to find the weights $A_i$ is fine for a computer, but can be tedious (and error-prone) if done by hand (say, on an exam). The method of undetermined coefficients is a good alternative for finding the weights by hand, and for small $n$.

The idea behind the method is to find $n+1$ equations involving the $n+1$ weights. The equations are derived by letting the quadrature rule be exact for $f(x) = x^j$ for $j = 0, 1, \ldots, n$. That is, setting

$$\int_a^b x^j\,dx = \sum_{i=0}^{n} A_i (x_i)^j.$$

For example, we reconsider Example Problem 8.12.

Example Problem 8.13. Construct a quadrature rule on the interval [0, 4] using nodes 0, 1, 2.

Solution: The method of undetermined coefficients gives the equations:

$$\int_0^4 1\,dx = 4 = A_0 + A_1 + A_2,$$

$$\int_0^4 x\,dx = 8 = A_1 + 2A_2,$$

$$\int_0^4 x^2\,dx = 64/3 = A_1 + 4A_2.$$

We perform Naïve Gaussian Elimination on the system:

$$\left[\begin{array}{ccc|c} 1 & 1 & 1 & 4 \\ 0 & 1 & 2 & 8 \\ 0 & 1 & 4 & 64/3 \end{array}\right]$$

We get the same weights as in Example Problem 8.12: $A_2 = \frac{20}{3}$, $A_1 = -\frac{16}{3}$, $A_0 = \frac{8}{3}$. ⊣
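The same system can be set up and solved numerically. The sketch below is an illustration in Python with NumPy under the assumption that the nodes are distinct; the name `undetermined_weights` is hypothetical and does not come from the notes. Row $j$ of the matrix holds the values $x_i^j$, and the right-hand side holds the exact moments $\int_a^b x^j\,dx$.

```python
import numpy as np

def undetermined_weights(nodes, a, b):
    """Solve sum_i A_i x_i^j = integral of x^j over [a, b], j = 0..n."""
    nodes = np.asarray(nodes, dtype=float)
    n_nodes = len(nodes)
    powers = np.arange(n_nodes)
    # np.vander(..., increasing=True) has columns x^0, x^1, ...;
    # transposing puts the power j on the rows.
    V = np.vander(nodes, N=n_nodes, increasing=True).T
    moments = (b ** (powers + 1) - a ** (powers + 1)) / (powers + 1)
    return np.linalg.solve(V, moments)

# Same nodes and interval as Example Problem 8.13.
weights = undetermined_weights([0.0, 1.0, 2.0], 0.0, 4.0)
print(weights)  # approximately 8/3, -16/3, 20/3
```

For many nodes this matrix becomes badly conditioned, which is one more reason the method is best suited to small $n$.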

Notice the diﬀerence compared to the Lagrange Polynomial Method: undetermined coeﬃcients

requires solution of a linear system, while the former method calculates the weights “directly.”

Since we will not consider n to be very large, solving the linear system may not be too burdensome.

Moreover, the method of undetermined coefficients is useful in more general settings, as illustrated by the next example:

Example Problem 8.14. Determine a "quadrature" rule of the form

$$\int_0^1 f(x)\,dx \approx A f(1) + B f'(1) + C f''(1)$$

that is exact for polynomials of highest possible degree. What is the highest degree polynomial for which this rule is exact?

Solution: Since there are three unknown coefficients to be determined, we look for three equations. We get these equations by plugging in successive polynomials. That is, we plug in $f(x) = 1, x, x^2$ and assume the coefficients give equality:

$$\int_0^1 1\,dx = 1 = A \cdot 1 + B \cdot 0 + C \cdot 0 = A,$$

$$\int_0^1 x\,dx = 1/2 = A \cdot 1 + B \cdot 1 + C \cdot 0 = A + B,$$

$$\int_0^1 x^2\,dx = 1/3 = A \cdot 1 + B \cdot 2 + C \cdot 2 = A + 2B + 2C.$$

This is solved by $A = 1$, $B = -1/2$, $C = 1/6$. This rule should be exact for polynomials of degree no greater than 2, but it might be better. We should check:

$$\int_0^1 x^3\,dx = 1/4 \ne 1/2 = 1 - 3/2 + 1 = A \cdot 1 + B \cdot 3 + C \cdot 6,$$

and thus the rule is not exact for cubic polynomials, or those of higher degree. ⊣

8.4.3 Gaussian Nodes

It would seem this is the best we can do: using $n+1$ nodes we can devise a quadrature rule that is exact for polynomials of degree $\le n$ by choosing the weights correctly. It turns out that by choosing the nodes in the right way, we can do far better. Gauss discovered that the right nodes to choose are the $n+1$ roots of the (nontrivial) polynomial, $q(x)$, of degree $n+1$ which has the property

$$\int_a^b x^k q(x)\,dx = 0 \qquad (0 \le k \le n).$$

(If you view the integral as an inner product, you could say that $q(x)$ is orthogonal to the polynomials $x^k$ in the resultant inner product space, but that's just fancy talk.)

Suppose that we have such a $q(x)$; we will not prove existence or uniqueness. Let $f(x)$ be a polynomial of degree $\le 2n+1$. Dividing $f(x)$ by $q(x)$ with remainder, we write

$$f(x) = p(x)q(x) + r(x).$$

Both $p(x)$, $r(x)$ are of degree $\le n$. Because of how we picked $q(x)$ we have

$$\int_a^b p(x)q(x)\,dx = 0.$$

Thus

$$\int_a^b f(x)\,dx = \int_a^b p(x)q(x)\,dx + \int_a^b r(x)\,dx = \int_a^b r(x)\,dx.$$

Now suppose that the $A_i$ are chosen by Lagrange Polynomials so the quadrature rule on the nodes $x_i$ is exact for polynomials of degree $\le n$. Then

$$\sum_{i=0}^{n} A_i f(x_i) = \sum_{i=0}^{n} A_i \left[p(x_i)q(x_i) + r(x_i)\right] = \sum_{i=0}^{n} A_i r(x_i).$$

The last equality holds because the $x_i$ are the roots of $q(x)$. Because of how the $A_i$ are chosen we then have

$$\sum_{i=0}^{n} A_i f(x_i) = \sum_{i=0}^{n} A_i r(x_i) = \int_a^b r(x)\,dx = \int_a^b f(x)\,dx.$$

Thus this rule is exact for $f(x)$. We have (or rather, Gauss has) made quadrature twice as good.

Theorem 8.15 (Gaussian Quadrature Theorem). Let $x_i$ be the $n+1$ roots of a (nontrivial) polynomial, $q(x)$, of degree $n+1$ which has the property

$$\int_a^b x^k q(x)\,dx = 0 \qquad (0 \le k \le n).$$

Let $A_i$ be the coefficients for these nodes chosen by integrating the Lagrange Polynomials. Then the quadrature rule for this choice of nodes and coefficients is exact for polynomials of degree $\le 2n+1$.

8.4.4 Determining Gaussian Nodes

We can determine the Gaussian nodes in the same way we determine coefficients. The following example is illustrative.

Example Problem 8.16. Find the two Gaussian nodes for a quadrature rule on the interval [0, 2].

Solution: We will find the function $q(x)$ of degree 2, which is "orthogonal" to $1, x$ under the inner product of integration over $[0, 2]$. Thus we let $q(x) = c_0 + c_1 x + c_2 x^2$. The orthogonality condition becomes

$$\int_0^2 1 \cdot q(x)\,dx = \int_0^2 x\,q(x)\,dx = 0,$$

that is,

$$\int_0^2 c_0 + c_1 x + c_2 x^2\,dx = \int_0^2 c_0 x + c_1 x^2 + c_2 x^3\,dx = 0.$$

Evaluating these integrals gives the following system of linear equations:

$$2c_0 + 2c_1 + \frac{8}{3}c_2 = 0,$$

$$2c_0 + \frac{8}{3}c_1 + 4c_2 = 0.$$

This system is "underdetermined," that is, there are two equations, but three unknowns. Notice, however, that if $q(x)$ satisfies the orthogonality conditions, then so does $\hat{q}(x) = \alpha q(x)$, for any real number $\alpha$. That is, we can pick the scaling of $q(x)$ as we wish.

With great foresight, we "guess" that we want $c_2 = 3$. This reduces the equations to

$$2c_0 + 2c_1 = -8,$$

$$2c_0 + \frac{8}{3}c_1 = -12.$$

Simple Gaussian Elimination (cf. Chapter 3) yields the answer $c_0 = 2$, $c_1 = -6$, $c_2 = 3$.

Then our nodes are the roots of $q(x) = 2 - 6x + 3x^2$. That is, the roots

$$\frac{6 \pm \sqrt{36 - 24}}{6} = 1 \pm \frac{\sqrt{3}}{3}.$$

These nodes are a bit ugly. Rather than construct the Lagrange Polynomials, we will use the method of undetermined coefficients. Remember, we want to construct $A_0, A_1$ such that

$$\int_0^2 f(x)\,dx \approx A_0 f\!\left(1 - \frac{\sqrt{3}}{3}\right) + A_1 f\!\left(1 + \frac{\sqrt{3}}{3}\right)$$

is exact for polynomial $f(x)$ of degree $\le 1$. It suffices to make this approximation exact for the "building blocks" of such polynomials, that is, for the functions $1$ and $x$. That is, it suffices to find $A_0, A_1$ such that

$$\int_0^2 1\,dx = A_0 + A_1,$$

$$\int_0^2 x\,dx = A_0\left(1 - \frac{\sqrt{3}}{3}\right) + A_1\left(1 + \frac{\sqrt{3}}{3}\right).$$

This gives the equations

$$2 = A_0 + A_1,$$

$$2 = A_0\left(1 - \frac{\sqrt{3}}{3}\right) + A_1\left(1 + \frac{\sqrt{3}}{3}\right).$$

This is solved by $A_0 = A_1 = 1$.

Thus our quadrature rule is

$$\int_0^2 f(x)\,dx \approx f\!\left(1 - \frac{\sqrt{3}}{3}\right) + f\!\left(1 + \frac{\sqrt{3}}{3}\right).$$

We expect this rule to be exact for cubic polynomials. ⊣
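The two steps of this example (orthogonality conditions for $q$, then undetermined coefficients for the weights) can be sketched in Python with NumPy. This is an illustration rather than the notes' own code: the name `gauss_rule` is hypothetical, and the sketch fixes the scaling by making $q$ monic instead of guessing $c_2 = 3$, which gives the same roots.

```python
import numpy as np

def gauss_rule(num_nodes, a, b):
    """Gaussian nodes and weights on [a, b] from the orthogonality conditions."""
    def moment(p):
        # Exact value of the integral of x^p over [a, b].
        return (b ** (p + 1) - a ** (p + 1)) / (p + 1)

    N = num_nodes
    # Monic q(x) = x^N + c_{N-1} x^{N-1} + ... + c_0 must satisfy
    # integral of x^k q(x) = 0 for k = 0, ..., N-1.
    H = np.array([[moment(k + j) for j in range(N)] for k in range(N)])
    rhs = -np.array([moment(k + N) for k in range(N)])
    c = np.linalg.solve(H, rhs)
    # np.roots expects the highest-degree coefficient first.
    nodes = np.sort(np.roots(np.concatenate(([1.0], c[::-1]))).real)
    # Weights by undetermined coefficients: exact for 1, x, ..., x^(N-1).
    V = np.vander(nodes, N=N, increasing=True).T
    weights = np.linalg.solve(V, np.array([moment(j) for j in range(N)]))
    return nodes, weights

nodes, weights = gauss_rule(2, 0.0, 2.0)
print(nodes)    # approximately 1 - sqrt(3)/3 and 1 + sqrt(3)/3
print(weights)  # approximately 1 and 1
```

The moment matrix `H` is poorly conditioned for larger `num_nodes`, so this direct approach is only reasonable for a handful of nodes.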

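Example Problem 8.17. Verify the results of the previous example problem for $f(x) = x^3$.

Solution: We have

$$\int_0^2 f(x)\,dx = \left.\frac{1}{4}x^4\right|_0^2 = 4.$$

The quadrature rule gives

$$f\!\left(1 - \frac{\sqrt{3}}{3}\right) + f\!\left(1 + \frac{\sqrt{3}}{3}\right) = \left(1 - \frac{\sqrt{3}}{3}\right)^3 + \left(1 + \frac{\sqrt{3}}{3}\right)^3$$

$$= \left(1 - 3\frac{\sqrt{3}}{3} + 3\cdot\frac{3}{9} - \frac{3\sqrt{3}}{27}\right) + \left(1 + 3\frac{\sqrt{3}}{3} + 3\cdot\frac{3}{9} + \frac{3\sqrt{3}}{27}\right)$$

$$= \left(2 - \sqrt{3} - \sqrt{3}/9\right) + \left(2 + \sqrt{3} + \sqrt{3}/9\right) = 4.$$

Thus the quadrature rule is exact for $f(x)$. ⊣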


8.4.5 Reinventing the Wheel

While it is good to know the theory, it doesn’t make sense in practice to recompute these things all

the time. There are books full of quadrature rules; any good textbook will list a few. The simplest

ones are given in Table 8.1.

See also http://mathworld.wolfram.com/Legendre-GaussQuadrature.html

    n    x_i                       A_i
    0    $x_0 = 0$                 $A_0 = 2$
    1    $x_0 = -\sqrt{1/3}$       $A_0 = 1$
         $x_1 = \sqrt{1/3}$        $A_1 = 1$
    2    $x_0 = -\sqrt{3/5}$       $A_0 = 5/9$
         $x_1 = 0$                 $A_1 = 8/9$
         $x_2 = \sqrt{3/5}$        $A_2 = 5/9$

Table 8.1: Gaussian Quadrature rules for the interval $[-1, 1]$. Thus $\int_{-1}^{1} f(x)\,dx \approx \sum_{i=0}^{n} A_i f(x_i)$, with this relation holding exactly for all polynomials of degree no greater than $2n+1$.

Quadrature rules are normally given for the interval [−1, 1]. On ﬁrst consideration, it would

seem you need a diﬀerent rule for each interval [a, b]. This is not the case, as the following example

problem illustrates:

Example Problem 8.18. Given a quadrature rule which is good on the interval [−1, 1], derive a

version of the rule to apply to the interval [a, b].

Solution: Consider the substitution:

$$x = \frac{b-a}{2}t + \frac{b+a}{2}, \quad\text{so}\quad dx = \frac{b-a}{2}\,dt.$$

Then

$$\int_a^b f(x)\,dx = \int_{-1}^{1} f\!\left(\frac{b-a}{2}t + \frac{b+a}{2}\right)\frac{b-a}{2}\,dt.$$

Letting

$$g(t) = \frac{b-a}{2}\,f\!\left(\frac{b-a}{2}t + \frac{b+a}{2}\right),$$

if $f(x)$ is a polynomial, $g(t)$ is a polynomial of the same degree. Thus we can use the quadrature rule for $[-1, 1]$ on $g(t)$ to evaluate the integral of $f(x)$ over $[a, b]$. ⊣

Example Problem 8.19. Derive the quadrature rules of Example Problem 8.16 by using the

technique of Example Problem 8.18 and the quadrature rules of Table 8.1.

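Solution: We have $a = 0$, $b = 2$. Thus

$$g(t) = \frac{b-a}{2}\,f\!\left(\frac{b-a}{2}t + \frac{b+a}{2}\right) = f(t + 1).$$

To integrate $f(x)$ over $[0, 2]$, we integrate $g(t)$ over $[-1, 1]$. The standard Gaussian Quadrature rule approximates this as

$$\int_{-1}^{1} g(t)\,dt \approx g\!\left(-\sqrt{1/3}\right) + g\!\left(\sqrt{1/3}\right) = f\!\left(1 - \sqrt{1/3}\right) + f\!\left(1 + \sqrt{1/3}\right).$$

This is the same rule that was derived (with much more work) in Example Problem 8.16, since $\sqrt{1/3} = \sqrt{3}/3$. ⊣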
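The change of variables is short enough to write as code. The sketch below is an illustration in Python: it takes the tabulated rule for $[-1, 1]$ from NumPy's `numpy.polynomial.legendre.leggauss` and applies the substitution of Example Problem 8.18. The wrapper name `gauss_legendre` is hypothetical, and `f` is assumed to accept NumPy arrays.

```python
import numpy as np

def gauss_legendre(f, a, b, num_nodes=2):
    """Apply the [-1, 1] Gauss rule to f on [a, b] via x = (b-a)/2 t + (b+a)/2."""
    t, w = np.polynomial.legendre.leggauss(num_nodes)  # nodes, weights on [-1, 1]
    x = 0.5 * (b - a) * t + 0.5 * (b + a)
    # The factor (b - a)/2 comes from dx = (b - a)/2 dt.
    return 0.5 * (b - a) * np.sum(w * f(x))

# Two nodes should integrate x^3 over [0, 2] exactly (the exact value is 4).
approx = gauss_legendre(lambda x: x ** 3, 0.0, 2.0)
print(approx)
```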


Exercises

(8.1) Use the composite trapezoidal rule, by hand, to approximate $\int_0^3 x^2\,dx\ (= 9)$. Use the partition $\{x_i\}_{i=0}^{2} = \{0, 1, 3\}$. Why is your approximation an overestimate?

(8.2) Use the composite trapezoidal rule, by hand, to approximate $\int_0^1 \frac{1}{x+1}\,dx\ (= \ln 2 \approx 0.693)$. Use the partition $\{x_i\}_{i=0}^{3} = \{0, \frac{1}{4}, \frac{1}{2}, 1\}$. Why is your approximation an overestimate? (Check: I think the answer is 0.7)

(8.3) Use the composite trapezoidal rule, by hand, to approximate $\int_0^1 \frac{4}{1+x^2}\,dx$. Use $n = 4$ subintervals. How good is your answer?

(8.4) Use Theorem 8.8 to bound the error of the composite trapezoidal rule approximation of $\int_0^2 x^3\,dx$ with $n = 10$ intervals. You should find that the approximation is an overestimate.

(8.5) How many equal subintervals of $[0, 1]$ are required to approximate $\int_0^1 \cos x\,dx$ with error smaller than $1 \times 10^{-6}$ by the composite trapezoidal rule? (Use Theorem 8.8.)

(8.6) How many equal subintervals would be required to approximate $\int_0^1 \frac{4}{1+x^2}\,dx$ to within 0.0001 by the composite trapezoidal rule? (Hint: Use the fact that $|f''(x)| \le 8$ on $[0, 1]$ for $f(x) = 4/(1+x^2)$.)

(8.7) How many equal subintervals of $[2, 3]$ are required to approximate $\int_2^3 e^x\,dx$ with error smaller than $1 \times 10^{-3}$ by the composite trapezoidal rule?

(8.8) Simpson's Rule for quadrature is given as

$$\int_a^b f(x)\,dx \approx \frac{\Delta x}{3}\left[f(x_0) + 4f(x_1) + 2f(x_2) + 4f(x_3) + \ldots + 2f(x_{n-2}) + 4f(x_{n-1}) + f(x_n)\right],$$

where $\Delta x = (b-a)/n$, and $n$ is assumed to be even. Show that Simpson's Rule for $n = 2$ is actually given by Romberg's Algorithm as $R(1, 1)$. As such we expect Simpson's Rule to be an $O(h^4)$ approximation to the integral.

(8.9) Find a quadrature rule of the form

$$\int_0^1 f(x)\,dx \approx A f(0) + B f(1/2) + C f(1)$$

that is exact for polynomials of highest possible degree. What is the highest degree polynomial for which this rule is exact?


(8.10) Determine a "quadrature" rule of the form

$$\int_{-1}^{1} f(x)\,dx \approx A f(0) + B f'(-1) + C f'(1)$$

that is exact for polynomials of highest possible degree. What is the highest degree polynomial for which this rule is exact? (Since this rule uses derivatives of $f$, it does not exactly fit our definition of a quadrature rule, but it may be applicable in some situations.)

(8.11) Determine a "quadrature" rule of the form

$$\int_0^1 f(x)\,dx \approx A f(0) + B f'(0) + C f(1)$$

that is exact for polynomials of highest possible degree. What is the highest degree polynomial for which this rule is exact?

(8.12) Consider the so-called order $n$ Chebyshev Quadrature rule:

$$\int_{-1}^{1} f(x)\,dx \approx c_n \sum_{i=0}^{n} f(x_i)$$

Find the weighting $c_n$ and nodes $x_i$ for the case $n = 2$ and the case $n = 3$. For what order polynomials are these rules exact?

(8.13) Find the Gaussian Quadrature rule with 2 nodes for the interval $[1, 5]$, i.e., find a rule

$$\int_1^5 f(x)\,dx \approx A f(x_0) + B f(x_1)$$

Before you solve the problem, consider the following questions: do you expect the nodes to be the endpoints 1 and 5? Do you expect the nodes to be arranged symmetrically around the midpoint of the interval?

(8.14) Find the Gaussian Quadrature rule with 3 nodes for the interval $[-1, 1]$, i.e., find a rule

$$\int_{-1}^{1} f(x)\,dx \approx A f(x_0) + B f(x_1) + C f(x_2)$$

To find the nodes $x_0, x_1, x_2$ you will have to find the zeroes of a cubic equation, which could be difficult. However, you may use the simplifying assumption that the nodes are symmetrically placed in the interval $[-1, 1]$.

(8.15) Write code to approximate the integral of a function f on [a, b] by the composite trapezoidal rule on n equal subintervals. Your m-file should have a header line like:

    function iappx = trapezoidal(f,a,b,n)

You may wish to use the code:

    x = a + (b-a) .* (0:n) ./ n;

If f is defined to work on vectors element-wise, you can probably speed up your computation by using

    bigsum = 0.5 * ( f(x(1)) + f(x(n+1)) ) + sum( f(x(2:(n))) );

(8.16) Write code to implement the Gaussian Quadrature rule for n = 2 to integrate f on the interval [a, b]. Your m-file should have a header line like:

    function iappx = gauss2(f,a,b)

(8.17) Write code to implement composite Gaussian Quadrature based on code from the previous problem. Something like the following probably works:

    function iappx = gaussComp(f,a,b,n)
    % code to approximate integral of f over n equal subintervals of [a,b]
    x = a + (b-a) .* (0:n) ./ n;   % n+1 equally spaced breakpoints
    iappx = 0;
    for i=1:n
      % add the Gaussian Quadrature estimate on the i-th subinterval
      iappx = iappx + gauss2(f,x(i),x(i+1));
    end

Use your code to approximate the error function:

$$\mathrm{erf}(z) = \frac{2}{\sqrt{\pi}} \int_0^z e^{-t^2}\,dt.$$

Compare your results with the octave/Matlab builtin function erf. (Try help erf in octave, or see http://mathworld.wolfram.com/Erf.html)

The error function is used in probability. In particular, the probability that a normal random variable is within $z$ standard deviations from its mean is

$$\mathrm{erf}(z/\sqrt{2}).$$

Thus $\mathrm{erf}(1/\sqrt{2}) \approx 0.683$, and $\mathrm{erf}(2/\sqrt{2}) \approx 0.955$. These numbers should look familiar to you.


Chapter 9

Least Squares

9.1 Least Squares

Least squares is a general class of methods for ﬁtting observed data to a theoretical model function.

In the general setting we are given a set of data

    x | $x_0$  $x_1$  ...  $x_n$
    y | $y_0$  $y_1$  ...  $y_n$

and some class of functions, $T$. The goal then is to find the "best" $f \in T$ to fit the data to $y = f(x)$.

Usually the class of functions T will be determined by some small number of parameters; the number

of parameters will be smaller (usually much smaller) than the number of data points. The theory

here will be concerned with deﬁning “best,” and examining methods for ﬁnding the “best” function

to ﬁt the data.

9.1.1 The Deﬁnition of Ordinary Least Squares

First we consider the data to be two vectors of length $n+1$. That is, we let

$$x = [x_0\ x_1\ \ldots\ x_n]^\top, \quad\text{and}\quad y = [y_0\ y_1\ \ldots\ y_n]^\top.$$

The error, or residual, of a given function with respect to this data is the vector $r = y - f(x)$. That is,

$$r = [r_0\ r_1\ \ldots\ r_n]^\top, \quad\text{where}\quad r_i = y_i - f(x_i).$$

Our goal is to find $f \in T$ such that $r$ is reasonably small. We measure the size of a vector by the use of norms, which were explored in Subsection 1.4.1. The most useful norms are the $\ell_p$ norms. For a given $p$ with $0 < p < \infty$, the $\ell_p$ norm of $r$ is defined as

$$\|r\|_p = \left(\sum_{i=0}^{n} |r_i|^p\right)^{1/p}$$

For the purposes of approximation, the easiest norm to use is the $\ell_2$ norm:

Definition 9.1.1 ((Ordinary) Least Squares Best Approximant). The least-squares best approximant to a set of data, $x, y$ from a class of functions, $T$, is the function $f^* \in T$ that minimizes the $\ell_2$ norm of the error. That is, if $f^*$ is the least squares best approximant, then

$$\|y - f^*(x)\|_2 = \min_{f \in T} \|y - f(x)\|_2$$

We will generally assume uniqueness of the minimum. This method is sometimes called the ordinary

least squares method. It assumes there is no error in the measurement of the data x, and usually

admits a relatively straightforward solution.

9.1.2 Linear Least Squares

We illustrate this definition using the class of linear functions as an example, as this case is reasonably simple. We are assuming

$$T = \{f(x) = ax + b \mid a, b \in \mathbb{R}\}.$$

We can now think of our job as drawing the "best" line through the data.

By Definition 9.1.1, we want to find the function $f(x) = ax + b$ that minimizes

$$\left(\sum_{i=0}^{n} \left[y_i - f(x_i)\right]^2\right)^{1/2}.$$

This is a minimization problem over two variables, $a$ and $b$. As you may recall from calculus class, it suffices to minimize the sum rather than its square root. That is, it suffices to minimize

$$\phi(a, b) = \sum_{i=0}^{n} \left[a x_i + b - y_i\right]^2.$$

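We minimize this function with calculus. We set

$$0 = \frac{\partial \phi}{\partial a} = \sum_{k=0}^{n} 2x_k\left(a x_k + b - y_k\right)$$

$$0 = \frac{\partial \phi}{\partial b} = \sum_{k=0}^{n} 2\left(a x_k + b - y_k\right)$$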

These are called the normal equations. Believe it or not these are two equations in two unknowns. They can be reduced to

$$\left(\sum x_k^2\right) a + \left(\sum x_k\right) b = \sum x_k y_k$$

$$\left(\sum x_k\right) a + (n+1)\,b = \sum y_k$$

The solution is found by naïve Gaussian Elimination, and is ugly. Let

$$d_{11} = \sum x_k^2, \qquad d_{12} = d_{21} = \sum x_k, \qquad d_{22} = n + 1,$$

$$e_1 = \sum x_k y_k, \qquad e_2 = \sum y_k.$$

We want to solve

$$d_{11} a + d_{12} b = e_1$$

$$d_{21} a + d_{22} b = e_2$$

Gaussian Elimination produces

$$a = \frac{d_{22} e_1 - d_{12} e_2}{d_{22} d_{11} - d_{12} d_{21}}, \qquad b = \frac{d_{11} e_2 - d_{21} e_1}{d_{22} d_{11} - d_{12} d_{21}}.$$

The answer is not so enlightening as the means of finding the solution.

We should, for a moment, consider whether this is indeed the solution. Our calculations have only shown an extremum at this choice of (a, b); could it not be a maximum?
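The closed-form solution of the normal equations for a line translates directly into code. The following is a small sketch in Python with NumPy, using the same names $d_{11}, d_{12}, d_{22}, e_1, e_2$ as the text; the function name `fit_line` is hypothetical and the data are made up for illustration.

```python
import numpy as np

def fit_line(x, y):
    """Least squares line y = a*x + b from the 2x2 normal equations."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    d11 = np.sum(x ** 2)
    d12 = d21 = np.sum(x)
    d22 = len(x)                 # this is n + 1 in the notation of the text
    e1 = np.sum(x * y)
    e2 = np.sum(y)
    det = d22 * d11 - d12 * d21
    a = (d22 * e1 - d12 * e2) / det
    b = (d11 * e2 - d21 * e1) / det
    return a, b

# Illustrative data lying exactly on y = 2x + 1.
a, b = fit_line([0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0])
print(a, b)
```

The denominator vanishes only when all the $x_k$ coincide, in which case no unique line exists.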

9.1.3 Least Squares from Basis Functions

In many, but not all cases, the class of functions, T, is the span of a small set of functions. This

case is simpler to explore and we consider it here. In this case T can be viewed as a vector space

over the real numbers. That is, for f, g ∈ T, and α, β ∈ R, then αf +βg ∈ T, where the function

αf is that function such that (αf)(x) = αf(x).

Now let $\{g_j(x)\}_{j=0}^{m}$ be a set of $m+1$ linearly independent functions, i.e.,

$$c_0 g_0(x) + c_1 g_1(x) + \ldots + c_m g_m(x) = 0 \ \ \forall x \quad\Rightarrow\quad c_0 = c_1 = \ldots = c_m = 0$$

Then we say that $T$ is spanned by the functions $\{g_j(x)\}_{j=0}^{m}$ if

$$T = \left\{ f(x) = \sum_j c_j g_j(x) \ \middle|\ c_j \in \mathbb{R},\ j = 0, 1, \ldots, m \right\}.$$

In this case the functions g

j

are basis functions for T. Note the basis functions need not be unique:

a given class of functions will usually have more than one choice of basis functions.

Example 9.1. The class $T = \{f(x) = ax + b \mid a, b \in \mathbb{R}\}$ is spanned by the two functions $g_0(x) = 1$, and $g_1(x) = x$. However, it is also spanned by the two functions $\tilde{g}_0(x) = 2x + 1$, and $\tilde{g}_1(x) = x - 1$.

To find the least squares best approximant of $T$ for a given set of data, we minimize the square of the $\ell_2$ norm of the error; that is, we minimize the function

$$\phi(c_0, c_1, \ldots, c_m) = \sum_{k=0}^{n} \left[\left(\sum_j c_j g_j(x_k)\right) - y_k\right]^2 \tag{9.1}$$

Again we set partials to zero and solve

$$0 = \frac{\partial \phi}{\partial c_i} = \sum_{k=0}^{n} 2\left[\left(\sum_j c_j g_j(x_k)\right) - y_k\right] g_i(x_k)$$

This can be rearranged to get

$$\sum_{j=0}^{m} \left[\sum_k g_j(x_k) g_i(x_k)\right] c_j = \sum_{k=0}^{n} y_k g_i(x_k)$$

If we now let

$$d_{ij} = \sum_{k=0}^{n} g_j(x_k) g_i(x_k), \qquad e_i = \sum_{k=0}^{n} y_k g_i(x_k),$$

then we have reduced the problem to the linear system (again, called the normal equations):

$$\begin{bmatrix} d_{00} & d_{01} & d_{02} & \cdots & d_{0m} \\ d_{10} & d_{11} & d_{12} & \cdots & d_{1m} \\ d_{20} & d_{21} & d_{22} & \cdots & d_{2m} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ d_{m0} & d_{m1} & d_{m2} & \cdots & d_{mm} \end{bmatrix} \begin{bmatrix} c_0 \\ c_1 \\ c_2 \\ \vdots \\ c_m \end{bmatrix} = \begin{bmatrix} e_0 \\ e_1 \\ e_2 \\ \vdots \\ e_m \end{bmatrix} \tag{9.2}$$

The choice of the basis functions can affect how easy it is to solve this system. We explore this in Section 9.2. Note that we are talking about the basis $\{g_j\}_{j=0}^{m}$, and not exactly about the class of functions $T$.

For example, consider what would happen if the system of normal equations were diagonal. In

this case, solving the system would be rather trivial.

Example Problem 9.2. Consider the case where $m = 0$, and $g_0(x) = \ln x$. Find the least squares approximation of the data

    x |  0.50       0.75       1.0        1.50      2.0       2.25      2.75      3.0
    y | -1.187098  -0.452472  -0.068077   0.713938  1.165234  1.436975  1.725919  1.841422

Solution: Essentially we are trying to find the $c$ such that $c \ln x$ best approximates the data. The system (9.2) reduces to the one-dimensional equation:

$$\left(\sum_k \ln^2 x_k\right) c = \sum_k y_k \ln x_k$$

For our data, this reduces to:

$$4.0960\,c = 6.9844$$

so we find $c = 1.7052$. The data and the least squares interpolant are shown in Figure 9.1. ⊣
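The arithmetic of this example can be checked with a few lines of Python and NumPy. The sketch below is only an illustration: it evaluates the single entry $d_{00}$ and right-hand side $e_0$ of the normal equations for the data in the table.

```python
import numpy as np

x = np.array([0.50, 0.75, 1.0, 1.50, 2.0, 2.25, 2.75, 3.0])
y = np.array([-1.187098, -0.452472, -0.068077, 0.713938,
              1.165234, 1.436975, 1.725919, 1.841422])

g0 = np.log(x)              # the single basis function g_0(x) = ln x
d00 = np.sum(g0 * g0)       # sum of ln^2 x_k
e0 = np.sum(y * g0)         # sum of y_k ln x_k
c = e0 / d00
print(d00, e0, c)           # approximately 4.0960, 6.9844, 1.7052
```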

Example 9.3. Consider the awfully chosen basis functions:

$$g_0(x) = \left(\frac{\epsilon}{2} - 1\right)x^2 - \frac{\epsilon}{2}x + 1$$

g

1

(x) = x

3

+

2

−1

x

2

+x

+ 1

where $\epsilon$ is small, around machine precision.

Suppose the data are given at the nodes $x_0 = 0$, $x_1 = 1$, $x_2 = -1$. We want to set up the normal equations, so we compute some of the $d_{ij}$. First we have to evaluate the basis functions at the nodes $x_i$. But this example was rigged to give:

$$g_0(x_0) = 1, \quad g_0(x_1) = 0, \quad g_0(x_2) = \epsilon$$

$$g_1(x_0) = 1, \quad g_1(x_1) = \epsilon, \quad g_1(x_2) = 0$$

After much work we find we want to solve

$$\begin{bmatrix} 1 + \epsilon^2 & 1 \\ 1 & 1 + \epsilon^2 \end{bmatrix} \begin{bmatrix} c_0 \\ c_1 \end{bmatrix} = \begin{bmatrix} y_0 + \epsilon y_2 \\ y_0 + \epsilon y_1 \end{bmatrix}$$

Figure 9.1: The data of Example Problem 9.2 and the least squares interpolant are shown.

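However, the computer would only find this if it had infinite precision. Since it does not, and since $\epsilon$ is rather small, the computer thinks $\epsilon^2 = 0$, and so tries to solve the system

$$\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} c_0 \\ c_1 \end{bmatrix} = \begin{bmatrix} y_0 + \epsilon y_2 \\ y_0 + \epsilon y_1 \end{bmatrix}$$

When $y_1 \ne y_2$, this has no solution. Bummer.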
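The rounding described here is easy to observe in double precision. The following short Python sketch is an illustration only; the value `eps = 1e-9` is an assumed stand-in for "small, around machine precision" in the sense that its square falls below the spacing of doubles near 1.

```python
import numpy as np

eps = 1e-9  # assumed illustrative value; eps**2 = 1e-18 is lost when added to 1

# Normal-equations matrix of Example 9.3 as the computer stores it.
D = np.array([[1 + eps ** 2, 1.0],
              [1.0, 1 + eps ** 2]])

print(1 + eps ** 2 == 1.0)          # True: the eps^2 term is rounded away
print(np.linalg.matrix_rank(D))     # 1: the stored matrix is singular
```

In exact arithmetic the matrix has determinant $2\epsilon^2 + \epsilon^4 > 0$, so the singularity is purely an artifact of finite precision.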

This kind of thing is common in the method of least squares: the coefficients of the normal equations include terms like

$$g_i(x_k)\,g_j(x_k).$$

When the $g_i$ are small at the nodes $x_k$, these coefficients can get really small, since we are squaring.

Now we draw a rough sketch of the basis functions. We ﬁnd they do a pretty poor job of

discriminating around all the nodes.

9.2 Orthonormal Bases

In the previous section we saw that poor choice of basis vectors can lead to numerical problems.

Roughly speaking, if $g_i(x_k)$ is small for some $i$'s and $k$'s, then some $d_{ij}$ can have a loss of precision when two small quantities are multiplied together and rounded to zero.

Poor choice of basis vectors can also lead to numerical problems in solution of the normal

equations, which will be done by Gaussian Elimination.

Consider the case where $T$ is the class of polynomials of degree no greater than $m$. For simplicity we will assume that all $x_i$ are in the interval $[0, 1]$. The most obvious choice of basis functions is $g_j(x) = x^j$. This certainly gives a basis for $T$, but actually a rather poor one. To see why, look at the graph of the basis functions in Figure 9.3. The basis functions look too much alike on the given interval.

A better basis for this problem is the set of Chebyshev Polynomials of the first kind, i.e., $g_j(x) = T_j(x)$, where

$$T_0(x) = 1, \quad T_1(x) = x, \quad T_{i+1}(x) = 2x\,T_i(x) - T_{i-1}(x).$$

Figure 9.2: The basis functions $g_0(x)$ and $g_1(x)$ of Example 9.3 are shown for $\epsilon = 0.05$. Note that around the three nodes 0, 1, −1, these two functions take nearly identical values. This can lead to a system of normal equations with no solution.

Figure 9.3: The polynomials $x^i$ for $i = 0, 1, \ldots, 6$ are shown on $[0, 1]$. These polynomials make a bad basis because they look so much alike, essentially.

These polynomials are illustrated in Figure 9.4.

9.2.1 Alternatives to Normal Equations

It turns out that the Normal Equations method isn't really so great. We consider other methods. First, we define $A$ as the $(n+1) \times (m+1)$ matrix defined by the entries:

$$a_{ij} = g_j(x_i), \quad i = 0, 1, \ldots, n, \quad j = 0, 1, \ldots, m.$$

Figure 9.4: The Chebyshev polynomials $T_j(x)$ for $j = 0, 1, \ldots, 6$ are shown on $[0, 1]$. These polynomials make a better basis for least squares because they are orthogonal under some inner product. Basically, they do not look like each other.

That is

$$A = \begin{bmatrix} g_0(x_0) & g_1(x_0) & g_2(x_0) & \cdots & g_m(x_0) \\ g_0(x_1) & g_1(x_1) & g_2(x_1) & \cdots & g_m(x_1) \\ g_0(x_2) & g_1(x_2) & g_2(x_2) & \cdots & g_m(x_2) \\ g_0(x_3) & g_1(x_3) & g_2(x_3) & \cdots & g_m(x_3) \\ g_0(x_4) & g_1(x_4) & g_2(x_4) & \cdots & g_m(x_4) \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ g_0(x_n) & g_1(x_n) & g_2(x_n) & \cdots & g_m(x_n) \end{bmatrix}$$

We write it in this way since we are thinking of the case where $n \gg m$, so $A$ is "tall."

After some inspection, we find that the Normal Equations can be written as:

$$A^\top A c = A^\top y. \tag{9.3}$$

Now let the vector $c$ be identified, in the natural way, with a function in $T$. That is, $c$ is identified with $f(x) = \sum_{j=0}^{m} c_j g_j(x)$. You should now convince yourself that

$$Ac = f(x).$$

And thus the residual, or error, of this function is $r = y - Ac$.

In our least squares theory we attempted to find that $c$ that minimized

$$\|y - Ac\|_2^2 = (y - Ac)^\top (y - Ac)$$

We can see this as minimizing the Euclidean distance from $y$ to $Ac$. For this reason, we will have that the residual is orthogonal to the column space of $A$, that is, we want

$$A^\top r = A^\top (y - Ac) = 0.$$

This is just the normal equations. We could rewrite this, however, in the following form: find $c$, $r$ such that

$$\begin{bmatrix} I & A \\ A^\top & 0 \end{bmatrix} \begin{bmatrix} r \\ c \end{bmatrix} = \begin{bmatrix} y \\ 0 \end{bmatrix}$$

This is now a system of $(n+1) + (m+1)$ equations and unknowns, which can be solved by specialized means. This is known as the augmented form. We briefly mention that naïve Gaussian Elimination is not appropriate to solve the augmented form, as it turns out to be equivalent to using the normal equations method.

9.2.2 Ordinary Least Squares in octave/Matlab

The ordinary least squares best approximant is so fundamental that it is hard-wired into octave/Matlab's system solve operator. The general equation we wish to solve is

Ac = y.

This system is overdetermined: there are more rows than columns in A, i.e., more equations than

unknown variables. However, in octave/Matlab, this is solved just like the case where A is square:

octave:1> c = A \ y;

Thus, unless specifically told otherwise, you should not implement the ordinary least squares solution method in octave/Matlab. Under the hood, octave/Matlab is not using the normal equations method, and does not suffer the increased condition number of the normal equations method.
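For readers working outside octave/Matlab, a rough Python analogue of the backslash solve is NumPy's `numpy.linalg.lstsq`, which likewise avoids forming $A^\top A$. The sketch below is an illustration on made-up data (a noisy line, with basis $g_0(x) = 1$, $g_1(x) = x$ chosen as an assumption) and compares it with an explicit normal-equations solve.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 20)
y = 3.0 * x - 1.0 + 0.01 * rng.standard_normal(x.size)  # synthetic data

# Tall matrix A with entries a_ij = g_j(x_i) for g_0 = 1, g_1 = x.
A = np.column_stack([np.ones_like(x), x])

# Library least squares solve (does not form A^T A).
c_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)

# Normal equations A^T A c = A^T y, for comparison only.
c_normal = np.linalg.solve(A.T @ A, A.T @ y)

print(c_lstsq, c_normal)
```

On a well-conditioned problem like this one the two answers agree closely; the difference between the methods shows up when the columns of $A$ are nearly dependent.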

9.3 Orthogonal Least Squares

The method described in Section 9.1, sometimes referred to as “ordinary least squares,” assumes

that measurement error is found entirely in the dependent variable, and that there is no error in

the independent variables.

For example, consider the case of two variables, $x$ and $y$, which are thought to be related by an equation of the form $y = mx + b$. A number of measurements are made, giving the two sequences $\{x_i\}_{i=1}^{n}$ and $\{y_i\}_{i=1}^{n}$. Ordinary least squares assumes that

$$y_i = m x_i + b + \epsilon_i,$$

where $\epsilon_i$, the error of the $i$th measurement, is a random variable. It is usually assumed that $E[\epsilon_i] = 0$, i.e., that the measurements are "unbiased." Under the further assumption that the errors are independent and have the same variance, the ordinary least squares solution is a very good one.¹

However, what if it were the case that

$$y_i = m(x_i + \xi_i) + b + \epsilon_i,$$

i.e., that there is actually error in the measurement of the $x_i$? In this case, the orthogonal least squares method is appropriate.

1

This means that the ordinary least squares solution gives an unbiased estimator of m and b, and, moreover, gives

the estimators with the lowest variance among all linear, a priori estimators. For more details on this, locate the

Gauss-Markov Theorem in a good statistics textbook.


The difference between the two methods is illustrated in Figure 9.5. In Figure 9.5(a), the ordinary least squares method is shown; it minimizes the sum of the squared lengths of vertical lines from observations to a line, reflecting the assumption of no error in $x_i$. The orthogonal least squares method will minimize the sum of the squared distances from observations to a line, as shown in Figure 9.5(b).

(a) Ordinary Least Squares    (b) Orthogonal Least Squares

Figure 9.5: The ordinary least squares method, as shown in (a), minimizes the sum of the squared lengths of the vertical lines from the observed points to the regression line. The orthogonal least squares, shown in (b), minimizes the sum of the squared distances from the points to the line. The offsets from points to the regression line in (b) may not look orthogonal due to improper aspect ratio of the figure.

We construct the solution geometrically. Let $\{\mathbf{x}^{(i)}\}_{i=1}^{m}$ be the set of observations, expressed as vectors in $\mathbb{R}^n$. The problem is to find the vector $\mathbf{n}$ and number $d$ such that the hyperplane $\mathbf{n}^\top \mathbf{x} = d$ is the plane that minimizes the sum of the squared distances from the points to the plane. We can solve this problem using vector calculus.

The function

$$\frac{1}{\|\mathbf{n}\|_2^2} \sum_{i=1}^{m} \left\|\mathbf{n}^\top \mathbf{x}^{(i)} - d\right\|_2^2$$

is the one to be minimized with respect to $\mathbf{n}$ and $d$. It gives the sum of the squared distances from the points to the plane described by $\mathbf{n}$ and $d$. To simplify its computation, we will minimize it subject to the constraint that $\|\mathbf{n}\|_2^2 = 1$. Thus our problem is to find $\mathbf{n}$ and $d$ that solve

$$\min_{\mathbf{n}^\top \mathbf{n} = 1} \sum_{i=1}^{m} \left\|\mathbf{n}^\top \mathbf{x}^{(i)} - d\right\|_2^2.$$

We can express this as $\min f(\mathbf{n}, d)$ subject to $g(\mathbf{n}, d) = 1$. The solution of this problem uses the Lagrange multipliers technique.

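Let $L(\mathbf{n}, d, \lambda) = f(\mathbf{n}, d) - \lambda g(\mathbf{n}, d)$. The Lagrange multipliers theory tells us that a necessary condition for $\mathbf{n}, d$ to be the solution is that there exists $\lambda$ such that the following hold:
\[
\begin{aligned}
\frac{\partial L}{\partial d}(\mathbf{n}, d, \lambda) &= 0, \\
\nabla_{\mathbf{n}} L(\mathbf{n}, d, \lambda) &= 0, \\
g(\mathbf{n}, d) &= 1.
\end{aligned}
\]
Let $X$ be the $m \times n$ matrix whose $i$th row is the vector $x^{(i)\top}$. We can rewrite the Lagrange function as
\[
\begin{aligned}
L(\mathbf{n}, d, \lambda) &= (X\mathbf{n} - d\mathbf{1}_m)^\top (X\mathbf{n} - d\mathbf{1}_m) - \lambda \mathbf{n}^\top \mathbf{n} \\
&= \mathbf{n}^\top X^\top X \mathbf{n} - 2d\,\mathbf{1}_m^\top X \mathbf{n} + d^2\,\mathbf{1}_m^\top \mathbf{1}_m - \lambda \mathbf{n}^\top \mathbf{n}.
\end{aligned}
\]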

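We solve the necessary condition on $\partial L / \partial d$:
\[
0 = \frac{\partial L}{\partial d}(\mathbf{n}, d, \lambda) = 2d\,\mathbf{1}_m^\top \mathbf{1}_m - 2\,\mathbf{1}_m^\top X \mathbf{n}
\quad\Rightarrow\quad
d = \frac{\mathbf{1}_m^\top X \mathbf{n}}{\mathbf{1}_m^\top \mathbf{1}_m}.
\]
This essentially tells us that we will zero the first moment of the data around the line. Note that this determination of $d$ requires knowledge of $\mathbf{n}$. However, since this is a necessary condition, we can plug it into the gradient equation:
\[
0 = \nabla_{\mathbf{n}} L(\mathbf{n}, d, \lambda) = 2X^\top X \mathbf{n} - 2d\left(\mathbf{1}_m^\top X\right)^\top - 2\lambda \mathbf{n}
= 2\left[ X^\top X \mathbf{n} - \frac{\mathbf{1}_m^\top X \mathbf{n}}{\mathbf{1}_m^\top \mathbf{1}_m} \left(\mathbf{1}_m^\top X\right)^\top - \lambda \mathbf{n} \right].
\]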

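The middle term is a scalar times a vector, so the multiplication can be commuted. This gives
\[
\begin{aligned}
0 &= X^\top X \mathbf{n} - \frac{X^\top \mathbf{1}_m \mathbf{1}_m^\top X}{\mathbf{1}_m^\top \mathbf{1}_m}\,\mathbf{n} - \lambda \mathbf{n} \\
&= \left[ X^\top X - \frac{X^\top \mathbf{1}_m \mathbf{1}_m^\top X}{\mathbf{1}_m^\top \mathbf{1}_m} - \lambda I \right] \mathbf{n}.
\end{aligned}
\]
Thus $\mathbf{n}$ is an eigenvector of the matrix
\[
M = X^\top X - \frac{X^\top \mathbf{1}_m \mathbf{1}_m^\top X}{\mathbf{1}_m^\top \mathbf{1}_m}.
\]
The final condition of the three Lagrange conditions is that $g(\mathbf{n}, d) = \mathbf{n}^\top \mathbf{n} = 1$. Thus if $\mathbf{n}$ is a minimizer, then it is a unit eigenvector of $M$.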

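It is not clear which eigenvector is the right one, so we might have to check all the eigenvectors. However, further work tells us which eigenvector it is. First we rewrite the function to be minimized, when the optimal $d$ is used: $\hat{f}(\mathbf{n}) = f\left(\mathbf{n}, \mathbf{1}_m^\top X \mathbf{n} / \mathbf{1}_m^\top \mathbf{1}_m\right)$. We have
\[
\begin{aligned}
\hat{f}(\mathbf{n}) &= \mathbf{n}^\top X^\top X \mathbf{n} - 2\,\frac{\mathbf{1}_m^\top X \mathbf{n}}{\mathbf{1}_m^\top \mathbf{1}_m}\,\mathbf{1}_m^\top X \mathbf{n} + \left( \frac{\mathbf{1}_m^\top X \mathbf{n}}{\mathbf{1}_m^\top \mathbf{1}_m} \right)^2 \mathbf{1}_m^\top \mathbf{1}_m \\
&= \mathbf{n}^\top X^\top X \mathbf{n} - 2\,\frac{\mathbf{n}^\top X^\top \mathbf{1}_m \mathbf{1}_m^\top X \mathbf{n}}{\mathbf{1}_m^\top \mathbf{1}_m} + \frac{\mathbf{n}^\top X^\top \mathbf{1}_m \mathbf{1}_m^\top X \mathbf{n}\;\mathbf{1}_m^\top \mathbf{1}_m}{\mathbf{1}_m^\top \mathbf{1}_m\;\mathbf{1}_m^\top \mathbf{1}_m} \\
&= \mathbf{n}^\top X^\top X \mathbf{n} - 2\,\frac{\mathbf{n}^\top X^\top \mathbf{1}_m \mathbf{1}_m^\top X \mathbf{n}}{\mathbf{1}_m^\top \mathbf{1}_m} + \frac{\mathbf{n}^\top X^\top \mathbf{1}_m \mathbf{1}_m^\top X \mathbf{n}}{\mathbf{1}_m^\top \mathbf{1}_m} \\
&= \mathbf{n}^\top X^\top X \mathbf{n} - \frac{\mathbf{n}^\top X^\top \mathbf{1}_m \mathbf{1}_m^\top X \mathbf{n}}{\mathbf{1}_m^\top \mathbf{1}_m} \\
&= \mathbf{n}^\top \left[ X^\top X - \frac{X^\top \mathbf{1}_m \mathbf{1}_m^\top X}{\mathbf{1}_m^\top \mathbf{1}_m} \right] \mathbf{n} \\
&= \mathbf{n}^\top M \mathbf{n}.
\end{aligned}
\]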

This sequence of equations only required that $d$ be the optimal $d$ for the given $\mathbf{n}$, and used the fact that a scalar is its own transpose, thus $\mathbf{1}_m^\top X \mathbf{n} = \mathbf{n}^\top X^\top \mathbf{1}_m$.

Now if $\mathbf{n}$ is a unit eigenvector of $M$, with corresponding eigenvalue $\lambda$, then $\mathbf{n}^\top M \mathbf{n} = \lambda$. Note that because $f(\mathbf{n}, d)$ is defined as $w^\top w$, for some vector $w$, the matrix $M$ must be positive semidefinite, i.e., $\lambda$ must be nonnegative. However, we want to minimize $\lambda$. Thus we take $\mathbf{n}$ to be the unit eigenvector associated with the smallest eigenvalue.

Note that, by roundoff, you may compute that $M$ has some negative eigenvalues, though they are small in magnitude. Thus one should select, as $\mathbf{n}$, the unit eigenvector associated with the smallest eigenvalue in absolute value.

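Example Problem 9.4. Find the equation of the line which best approximates the 2D data $\{(x_i, y_i)\}_{i=1}^{m}$ by the orthogonal least squares method.

Solution: In this case the matrix $X$ is
\[
\begin{bmatrix} x_1 & y_1 \\ x_2 & y_2 \\ x_3 & y_3 \\ \vdots & \vdots \\ x_m & y_m \end{bmatrix}.
\]
Thus we have that
\[
\begin{aligned}
M &= X^\top X - \frac{X^\top \mathbf{1}_m \mathbf{1}_m^\top X}{\mathbf{1}_m^\top \mathbf{1}_m} \\
&= \begin{bmatrix} \sum x_i^2 & \sum x_i y_i \\ \sum x_i y_i & \sum y_i^2 \end{bmatrix}
 - \frac{1}{m} \begin{bmatrix} \left(\sum x_i\right)^2 & \sum x_i \sum y_i \\ \sum x_i \sum y_i & \left(\sum y_i\right)^2 \end{bmatrix} \\
&= \begin{bmatrix} \sum x_i^2 & \sum x_i y_i \\ \sum x_i y_i & \sum y_i^2 \end{bmatrix}
 - m \begin{bmatrix} \bar{x}^2 & \bar{x}\bar{y} \\ \bar{x}\bar{y} & \bar{y}^2 \end{bmatrix} \\
&= \begin{bmatrix} \sum x_i^2 - m\bar{x}^2 & \sum x_i y_i - m\bar{x}\bar{y} \\ \sum x_i y_i - m\bar{x}\bar{y} & \sum y_i^2 - m\bar{y}^2 \end{bmatrix}
 = \begin{bmatrix} 2\alpha & \beta \\ \beta & 2\gamma \end{bmatrix},
\end{aligned}
\tag{9.4}
\]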

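where we use $\bar{x}$, $\bar{y}$ to mean, respectively, the mean of the x- and y-values. The characteristic polynomial of the matrix, whose roots are the eigenvalues of the matrix, is
\[
p(\lambda) = \det \begin{bmatrix} 2\alpha - \lambda & \beta \\ \beta & 2\gamma - \lambda \end{bmatrix} = 4\alpha\gamma - \beta^2 - 2(\alpha + \gamma)\lambda + \lambda^2 .
\]
The roots of this polynomial are given by the quadratic formula:
\[
\lambda_{\pm} = \frac{2(\alpha + \gamma) \pm \sqrt{4(\alpha + \gamma)^2 - 4\left(4\alpha\gamma - \beta^2\right)}}{2} = (\alpha + \gamma) \pm \sqrt{(\alpha - \gamma)^2 + \beta^2}.
\]
We want the smaller eigenvalue, $\lambda_{-} = (\alpha + \gamma) - \sqrt{(\alpha - \gamma)^2 + \beta^2}$. The associated eigenvector is in the null space of the matrix:
\[
0 = \begin{bmatrix} 2\alpha - \lambda_{-} & \beta \\ \beta & 2\gamma - \lambda_{-} \end{bmatrix} v
= \begin{bmatrix} 2\alpha - \lambda_{-} & \beta \\ \beta & 2\gamma - \lambda_{-} \end{bmatrix} \begin{bmatrix} -k \\ 1 \end{bmatrix}
= \begin{bmatrix} \beta - k\left(2\alpha - \lambda_{-}\right) \\ -k\beta + 2\gamma - \lambda_{-} \end{bmatrix}.
\]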

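This is solved by
\[
k = \frac{2\gamma - \lambda_{-}}{\beta} = \frac{(\gamma - \alpha) + \sqrt{(\gamma - \alpha)^2 + \beta^2}}{\beta}
= \frac{\sum \left(y_i^2 - x_i^2\right) - m\left(\bar{y}^2 - \bar{x}^2\right) + \sqrt{\left( \sum \left(y_i^2 - x_i^2\right) - m\left(\bar{y}^2 - \bar{x}^2\right) \right)^2 + 4\left( \sum x_i y_i - m\bar{x}\bar{y} \right)^2}}{2\left( \sum x_i y_i - m\bar{x}\bar{y} \right)}.
\tag{9.5}
\]
Thus the best plane through the data is of the form
\[
-kx + 1y = d \quad\text{or}\quad y = kx + d.
\]
Note that the eigenvector, $v$, that we used is not a unit vector. However, it is parallel to the unit vector we want, and we can still use the regular formula to compute the optimal $d$:
\[
d = \frac{\mathbf{1}_m^\top X \mathbf{n}}{\mathbf{1}_m^\top \mathbf{1}_m} = \frac{1}{m}\,\mathbf{1}_m^\top X \begin{bmatrix} -k \\ 1 \end{bmatrix} = \begin{bmatrix} \bar{x} & \bar{y} \end{bmatrix} \begin{bmatrix} -k \\ 1 \end{bmatrix} = \bar{y} - k\bar{x}.
\]
Thus the optimal line through the data is
\[
y = kx + d = k(x - \bar{x}) + \bar{y} \quad\text{or}\quad y - \bar{y} = k(x - \bar{x}),
\]
with $k$ defined in equation 9.5.

⊣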
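The closed form of Example Problem 9.4 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `orthogonal_line_fit` and the sample points are hypothetical, and the sketch simply refuses the case β = 0, where equation 9.5 divides by zero.

```python
import math

def orthogonal_line_fit(points):
    """Return (k, d) for the orthogonal least squares line y = k*x + d."""
    m = len(points)
    xbar = sum(x for x, _ in points) / m
    ybar = sum(y for _, y in points) / m
    # Entries of M from equation 9.4: M = [[2*alpha, beta], [beta, 2*gamma]].
    two_alpha = sum(x * x for x, _ in points) - m * xbar ** 2
    two_gamma = sum(y * y for _, y in points) - m * ybar ** 2
    beta = sum(x * y for x, y in points) - m * xbar * ybar
    if beta == 0.0:
        # Equation 9.5 divides by beta; this sketch does not treat that case.
        raise ValueError("beta = 0: equation 9.5 does not apply")
    diff = two_gamma - two_alpha  # equals 2*(gamma - alpha)
    # Equation 9.5, written with 2*alpha and 2*gamma.
    k = (diff + math.sqrt(diff ** 2 + 4.0 * beta ** 2)) / (2.0 * beta)
    d = ybar - k * xbar  # the line passes through (xbar, ybar)
    return k, d

# Hypothetical data lying exactly on y = 2x + 1.
k, d = orthogonal_line_fit([(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)])
print(k, d)
```

For data lying exactly on a line, the fit should reproduce that line; noisy data would need the roundoff caveats discussed above.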

9.3.1 Computing the Orthogonal Least Squares Approximant

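Just as the Normal Equations (equation 9.3) define the solution to the ordinary least squares approximant but are not normally used to solve the problem, so too the means described above is not a good way to compute the orthogonal least squares approximant.

First we define
\[
W = X - \frac{\mathbf{1}_m \mathbf{1}_m^\top X}{\mathbf{1}_m^\top \mathbf{1}_m}.
\]
Now note that
\[
\begin{aligned}
W^\top W &= \left( X - \frac{\mathbf{1}_m \mathbf{1}_m^\top X}{\mathbf{1}_m^\top \mathbf{1}_m} \right)^\top \left( X - \frac{\mathbf{1}_m \mathbf{1}_m^\top X}{\mathbf{1}_m^\top \mathbf{1}_m} \right) \\
&= X^\top X - 2\,\frac{X^\top \mathbf{1}_m \mathbf{1}_m^\top X}{\mathbf{1}_m^\top \mathbf{1}_m} + \frac{\left(\mathbf{1}_m \mathbf{1}_m^\top X\right)^\top \mathbf{1}_m \mathbf{1}_m^\top X}{\left(\mathbf{1}_m^\top \mathbf{1}_m\right)^2} \\
&= X^\top X - 2\,\frac{X^\top \mathbf{1}_m \mathbf{1}_m^\top X}{\mathbf{1}_m^\top \mathbf{1}_m} + \frac{X^\top \mathbf{1}_m \left(\mathbf{1}_m^\top \mathbf{1}_m\right) \mathbf{1}_m^\top X}{\left(\mathbf{1}_m^\top \mathbf{1}_m\right)^2} \\
&= X^\top X - 2\,\frac{X^\top \mathbf{1}_m \mathbf{1}_m^\top X}{\mathbf{1}_m^\top \mathbf{1}_m} + \frac{X^\top \mathbf{1}_m \mathbf{1}_m^\top X}{\mathbf{1}_m^\top \mathbf{1}_m} \\
&= X^\top X - \frac{X^\top \mathbf{1}_m \mathbf{1}_m^\top X}{\mathbf{1}_m^\top \mathbf{1}_m} = M.
\end{aligned}
\]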

We now need some linear algebra magic:

Definition 9.3.1. A square matrix $U$ is called unitary if its inverse is its transpose, i.e.,
\[
U^\top U = I = U U^\top.
\]
From the first equation, we see that the columns of $U$ form a collection of orthonormal vectors. The second equation tells us that the rows of $U$ are also such a collection.

Definition 9.3.2. Every $m \times n$ matrix $B$ can be decomposed as
\[
B = U \Sigma V^\top,
\]
where $U$ and $V$ are unitary matrices, $U$ is $m \times m$, $V$ is $n \times n$, and $\Sigma$ is $m \times n$ and has nonzero elements only on its diagonal. The values on the diagonal of $\Sigma$ are the singular values of $B$. The column vectors of $V$ span the row space of $B$, and the column vectors of $U$ contain the column space of $B$.

The singular value decomposition generalizes the notion of eigenvalues and eigenvectors to the case of nonsquare matrices. Let $u^{(i)}$ be the $i$th column vector of $U$, $v^{(i)}$ the $i$th column of $V$, and let $\sigma_i$ be the $i$th element of the diagonal of $\Sigma$. Then if $x = \sum_i \alpha_i v^{(i)}$, we have
\[
Bx = U \Sigma V^\top \sum_i \alpha_i v^{(i)} = U \Sigma \left[ \alpha_1\ \alpha_2\ \ldots\ \alpha_n \right]^\top
= U \begin{bmatrix} \sigma_1 \alpha_1 \\ \sigma_2 \alpha_2 \\ \vdots \\ \sigma_n \alpha_n \\ 0 \\ \vdots \\ 0 \end{bmatrix}
= \sum_i \sigma_i \alpha_i u^{(i)}.
\]
Thus $v^{(i)}$ and $u^{(i)}$ act as an “eigenvector pair” with “eigenvalue” $\sigma_i$.

The singular value decomposition is computed by the octave/Matlab command [U,S,V] = svd(B). The decomposition is computed such that the diagonal elements of S, the singular values, are nonnegative and nonincreasing with increasing index.

Now we can use the singular value decomposition on $W$:
\[
W = U \Sigma V^\top.
\]
Thus
\[
M = W^\top W = V \Sigma^\top U^\top U \Sigma V^\top = V \Sigma^\top I \Sigma V^\top = V S^2 V^\top,
\]
where $S^2$ is the $n \times n$ diagonal matrix whose $i$th diagonal entry is $\sigma_i^2$. Thus we see that the solution to the orthogonal least squares problem, the eigenvector associated with the smallest eigenvalue of $M$, is the column vector of $V$ associated with the smallest, in absolute value, singular value of $W$. In octave/Matlab, this is V(:,n).

This may not appear to be a computational savings, but if $M$ is computed up front, and its eigenvectors are computed, there is loss of precision in its smallest eigenvalues which might lead us to choose the wrong normal direction (especially if there is more than one!). On the other hand, $M$ is an $n \times n$ matrix, whereas $W$ is $m \times n$. If storage is at a premium, and the method is being reapplied with new observations, it may be preferable to keep $M$, rather than keeping $W$. That is, it may be necessary to trade conditioning for space.

9.3.2 Principal Component Analysis

The above analysis leads us to the topic of Principal Component Analysis. Suppose the $m \times n$ matrix $X$ represents $m$ noisy observations of some process, where each observation is a vector in $\mathbb{R}^n$. Suppose the observations nearly lie in some $k$-dimensional subspace of $\mathbb{R}^n$, for $k < n$. How can you find an approximate value of $k$, and how do you find the $k$-dimensional subspace?

As in the previous subsection, let
\[
W = X - \frac{\mathbf{1}_m \mathbf{1}_m^\top X}{\mathbf{1}_m^\top \mathbf{1}_m} = X - \mathbf{1}_m \bar{X}, \quad\text{where}\quad \bar{X} = \frac{\mathbf{1}_m^\top X}{\mathbf{1}_m^\top \mathbf{1}_m}.
\]
Now note that $\bar{X}$ is the mean of the $m$ observations, as a row vector in $\mathbb{R}^n$. Under the assumption that the noise is approximately the same for each observation, we should think that $\bar{X}$ is likely to be in or very close to the $k$-dimensional subspace. Then each row of $W$ should be like a vector which is contained in the subspace. If we take some vector $v$ and multiply $Wv$, we expect the resultant vector to have small norm if $v$ is not in the $k$-dimensional subspace, and to have a large norm if it is in the subspace.

You should now convince yourself that this is related to the singular value decomposition of $W$. Decomposing $v = \sum_i \alpha_i v^{(i)}$, we have $Wv = \sum_i \sigma_i \alpha_i u^{(i)}$, and thus the product vector has small norm if the $\alpha_i$ associated with large $\sigma_i$ are small, and large norm otherwise. That is, the principal directions of the “best” $k$-dimensional subspace associated with $X$ are the $k$ columns of $V$ associated with the largest singular values of $W$.

[Figure 9.6: plot of “cum. sum sqrd. sing. vals.” against index, 0 to 40.]

Figure 9.6: Noisy 5-dimensional data lurking in $\mathbb{R}^{40}$ are treated to Principal Component Analysis. This figure shows the normalized cumulative sum of squared singular values of $W$. There is a sharp “elbow” at $k = 5$, which indicates the data is actually 5-dimensional.

If $k$ is unknown a priori, a good value can be obtained by eyeing a graph of the cumulative sum of squared singular values of $W$, and selecting the $k$ that makes this appropriately large. This is illustrated in Figure 9.6, with data generated by the following octave/Matlab code:

octave:1> X = rand(150,5);
octave:2> sawX = (X * randn(5,40)) + kron(ones(150,1),randn(1,40));
octave:3> sawX += 0.1 * randn(150,40);
octave:4> wun = ones(150,1);
octave:5> W = sawX - (wun * wun' * sawX) / (wun' * wun);
octave:6> [U,S,V] = svd(W);
octave:7> eigs = diag(S'*S);
octave:8> cseig = cumsum(eigs) ./ sum(eigs);
octave:9> plot ((1:40)',cseig,'xr-@;cum. sum sqrd. sing. vals.;');

Exercises

(9.1) How do you know that the choice of constants $c_i$ in our least squares analysis actually finds a minimum of equation 9.1, and not, say, a maximum?

(9.2) Our “linear” least squares might be better called the “affine” least squares. In this exercise you will find the best linear function which approximates a set of data. That is, find the function $f(x) = cx$ which is the ordinary least squares best approximant to the given data
\[
\begin{array}{c|cccc}
x & x_0 & x_1 & \ldots & x_n \\ \hline
y & y_0 & y_1 & \ldots & y_n
\end{array}
\]

(9.3) Find the constant that best approximates, in the ordinary least squares sense, the given data $x, y$. (Hint: you can use equation 9.2 or equation 9.3 using a single basis function $g_0(x) = 1$.) Do the $x_i$ values affect your answer?

(9.4) Find the function f(x) = c which best approximates, in the ordinary least squares sense, the

data

x 1 −2 5

y 1 −2 4

(9.5) Find the function ax +b that best approximates the data

x 0 −1 2

y 0 1 −1

Use the ordinary least squares method.

(9.6) Find the function ax +b that best approximates the data of the previous problem using the

orthogonal least squares method.

(9.7) Find the function $ax^2 + b$ that best approximates the data

x 0 1 2

y 0.3 0.1 0.5

(9.8) Prove that the least squares solution is the solution of the normal equations, equation 9.3, and thus takes the form
\[
c = \left( A^\top A \right)^{-1} A^\top y.
\]

(9.9) For ordinary least squares regression to the line y = mx + b, express the approximate m

in terms of the α, β, and γ parameters from equation 9.4. Compare this to that found in

equation 9.5.

(9.10) Any symmetric positive definite matrix $G$ can be used to define a norm in the following way:
\[
\|v\|_G \overset{\text{df}}{=} \sqrt{v^\top G v}.
\]
The weighted least squares method for $G$ and data $A$ and $y$ finds the $\hat{c}$ that solves
\[
\min_{c} \|Ac - y\|_G^2 .
\]
Prove that the normal equations form of the solution of this problem is
\[
A^\top G A \hat{c} = A^\top G y.
\]

(9.11) (Continuation) Let the symmetric positive definite matrix $G$ have Cholesky Factorization $G = LL^\top$, where $L$ is a lower triangular matrix. Show that the solution to the weighted least squares method is the $\hat{c}$ that is the ordinary (unweighted) least squares best approximate solution to the problem
\[
L^\top A c = L^\top y.
\]

(9.12) The $r^2$ statistic is often used to describe the “goodness-of-fit” of the ordinary least squares approximation to data. It is defined as
\[
r^2 = \frac{\|A\hat{c}\|_2^2}{\|y\|_2^2},
\]
where $\hat{c}$ is the approximant $\left( A^\top A \right)^{-1} A^\top y$.

(a) Prove that we also have
\[
r^2 = 1 - \frac{\|A\hat{c} - y\|_2^2}{\|y\|_2^2}.
\]

(b) Prove that the $r^2$ parameter is “scale-invariant,” that is, if we change the units of our data, the parameter is unchanged (although the value of $\hat{c}$ may change).

(c) Devise a similar statistic for the orthogonal least squares approximation which is scale-invariant and rotation-invariant.

(9.13) As proved in Example Problem 9.4, the orthogonal least squares best approximate line for data $\{(x_i, y_i)\}_{i=1}^{m}$ goes through the point $(\bar{x}, \bar{y})$. Does the same thing hold for the ordinary least squares best approximate line?

(9.14) Find the constant $c$ such that $f(x) = \ln(cx)$ best approximates, in the least squares sense, the given data
\[
\begin{array}{c|cccc}
x & x_0 & x_1 & \ldots & x_n \\ \hline
y & y_0 & y_1 & \ldots & y_n
\end{array}
\]
(Hint: You cannot use basis functions and equation 9.2 to solve this. You must use Definition 9.1.1.) The geometric mean of the numbers $a_1, a_2, \ldots, a_n$ is defined as $\left( \prod a_i \right)^{1/n}$. How does your answer relate to the geometric mean?

(9.15) Write code to find the function $ae^{x} + be^{-x}$ that best interpolates, in the least squares sense, a set of data $\{(x_i, y_i)\}_{i=0}^{n}$. Your m-file should have header line like:

function [a,b] = lsqrexp(xys)

where xys is the $(n + 1) \times 2$ matrix whose rows are the $n + 1$ tuples $(x_i, y_i)$. Use the normal equations method. This should reduce the problem to solving a $2 \times 2$ linear system; let octave/Matlab solve that linear system for you. (Hint: To find x such that Mx = b, use x = M \ b.)

(a) What do you get when you try the following?

octave:1> xs = [-1 -0.5 0 0.5 1]';
octave:2> ys = [1.194 0.430 0.103 0.322 1.034]';
octave:3> xys = [xs ys];
octave:4> [a,b] = lsqrexp(xys)

(b) Try the following:

octave:5> xys = [-0.00001 0.3;0 0.6;0.00001 0.7];
octave:6> [a,b] = lsqrexp(xys)

Does something unexpected happen? Why?

(9.16) Write code to find the plane $\mathbf{n}^\top x = d$ that best interpolates, in the least squares sense, a set of data $\{x^{(i)}\}_{i=1}^{m}$. Your m-file should have header line like:

function [n,d] = orthlsq(X)

where X is the $m \times n$ matrix whose rows are the $m$ tuples $(x_1, x_2, \ldots, x_n)$. Use the octave/Matlab eig function, which returns the eigenvalues and vectors of a matrix.

(a) Create artificially planar data by the following:

octave:1> m = 100;n = 5;epsi = 0.1;offs = 2;
octave:2> Xsub = rand(m,(n-1));
octave:3> T = rand(n-1,n);
octave:4> X = Xsub * T;
octave:5> X += offs .* ones(size(X));
octave:6> X += epsi .* randn(size(X));

Now X contains data which are approximately on an $(n - 1)$-dimensional hyperplane in $\mathbb{R}^n$. The data, when centered, lie in the row space of T. Now test your code.

octave:7> [n,d] = orthlsq(X);

You should have that the vector n is orthogonal to the rows of T. Check this:

octave:8> disp(norm(T * n));

What do you get? Try again with epsi very small ($1 \times 10^{-5}$) or zero.

(b) Compare the orthogonal least squares technique to the ordinary least squares in a two dimensional setting. Consider x, y data nearly on the line $y = mx + b$:

octave:1> obs = 100;xeps = 1.0;yeps = 1.0;m = 0.375;b = 1;
octave:2> Xtrue = randn(obs,1);Ytrue = m .* Xtrue + b;
octave:3> XSaw = Xtrue + xeps * randn(size(Xtrue));
octave:4> YSaw = Ytrue + yeps * randn(size(Ytrue));
octave:5> lsmb = [XSaw, ones(obs,1)] \ YSaw;
octave:6> lsm = lsmb(1); lsb = lsmb(2);
octave:7> [n,d] = orthlsq([XSaw, YSaw]);
octave:8> orthm = -n(1) / n(2);orthb = d / n(2);

Compare the estimated coefficients from both solutions. Plot the two regression lines.

octave:9> hold off;
octave:10> plot(XSaw,YSaw,"*;observed data;");
octave:11> hold on;
octave:12> plot(XSaw,m*XSaw + b,"r-;true line;");
octave:13> plot(XSaw,lsm*XSaw + lsb,"g-;ordinary least squares;");
octave:14> plot(XSaw,orthm*XSaw + orthb,"b-;orthogonal least squares;");
octave:15> hold off;

Try this again with different values of xeps and yeps, which control the sizes of the errors in the x and y dimensions. Try with xeps = 0.1, yeps = 1.0, and with xeps = 1.0, yeps = 0.1. Under which situations does orthogonal least squares outperform ordinary least squares? Do they give equally erroneous results in some situations? Can you think of a way of “reweighting” channel data to make the orthogonal least squares method more robust in cases of unequal error variance?

Chapter 10

Ordinary Diﬀerential Equations

Ordinary Diﬀerential Equations, or ODEs, are used in approximating some physical systems. Some

classical uses include simulation of the growth of populations and trajectory of a particle. They are

usually easier to solve or approximate than the more general Partial Diﬀerential Equations, PDEs,

which can contain partial derivatives.

Much of the background material of this chapter should be familiar to the student of calculus.

We will focus more on the approximation of solutions, than on analytical solutions. For more

background on ODEs, see any of the standard texts, e.g., [8].

10.1 Elementary Methods

A one-dimensional ODE is posed as follows: find some function $x(t)$ such that
\[
\frac{dx(t)}{dt} = f(t, x(t)), \qquad x(a) = c.
\tag{10.1}
\]
In calculus classes, you probably studied the case where $f(t, x(t))$ is independent of its second variable. For example, the ODE given by
\[
\frac{dx(t)}{dt} = t^2 - t, \qquad x(a) = c,
\]
has solution $x(t) = \frac{1}{3}t^3 - \frac{1}{2}t^2 + K$, where $K$ is chosen such that $x(a) = c$.

A more general case is when $f(t, x)$ is separable, that is, $f(t, x) = g(t)h(x)$ for some functions $g, h$. You may recall that the solution to such a separable ODE usually involved the following equation:
\[
\int \frac{dx}{h(x)} = \int g(t)\,dt.
\]
Finding an analytic solution can be considerably more difficult in the general case, thus we turn to approximation schemes.

10.1.1 Integration and ‘Stepping’

We attempt to solve the ODE by integrating both sides. That is,
\[
\begin{aligned}
\frac{dx(t)}{dt} = f(t, x(t)) \quad &\text{yields} \quad \int_{t}^{t+h} dx = \int_{t}^{t+h} f(r, x(r))\,dr, \quad \text{thus} \\
x(t + h) &= x(t) + \int_{t}^{t+h} f(r, x(r))\,dr.
\end{aligned}
\tag{10.2}
\]
If we can approximate the integral then we have a way of ‘stepping’ from $t$ to $t + h$, i.e., if we have a good approximation of $x(t)$ we can approximate $x(t + h)$. Note that this means that all the work we have put into approximating integrals can be used to approximate the solution of ODEs.

Using stepping schemes we can approximate $x(t_{\text{final}})$ given $x(t_{\text{initial}})$ by taking a number of steps. For simplicity we usually use the same step size, $h$, for each step, though nothing precludes us from varying the step size.

If we apply the left-hand rectangle rule for approximating integrals to equation 10.2, we get
\[
x(t + h) \approx x(t) + hf(t, x(t)).
\tag{10.3}
\]
This is called Euler’s Method. Note this is essentially the same as the “forward difference” approximation we made to the derivative, equation 7.1.

Trapezoid rule gives
\[
x(t + h) \approx x(t) + \frac{h}{2}\left[ f(t, x(t)) + f(t + h, x(t + h)) \right].
\]
But we cannot evaluate this exactly, since $x(t + h)$ appears on both sides of the equation. And it is embedded on the right hand side. Bummer. But if we could approximate it, say by using Euler’s Method, maybe the formula would work. This is the idea behind the Runge-Kutta Method (see Section 10.2).

10.1.2 Taylor’s Series Methods

We see that our more accurate integral approximations will be useless since they require information we do not know, i.e., evaluations of $f(t, x)$ for yet unknown $x$ values. Thus we fall back on Taylor’s Theorem (Theorem 1.1). We can also view this as using integral approximations where all information comes from the left-hand endpoint.

By Taylor’s theorem, if $x$ has at least $m + 1$ continuous derivatives on the interval in question, we can write
\[
x(t + h) = x(t) + hx'(t) + \frac{1}{2}h^2 x''(t) + \ldots + \frac{1}{m!}h^m x^{(m)}(t) + \frac{1}{(m + 1)!}h^{m+1} x^{(m+1)}(\tau),
\]
where $\tau$ is between $t$ and $t + h$. Since $\tau$ is essentially unknown, our best calculable approximation to this expression is
\[
x(t + h) \approx x(t) + hx'(t) + \frac{1}{2}h^2 x''(t) + \ldots + \frac{1}{m!}h^m x^{(m)}(t).
\]
This approximate solution to the ODE is called a Taylor’s series method of order $m$. We also say that this approximation is a truncation of the Taylor’s series. The difference between the actual value of $x(t + h)$ and this approximation is called the truncation error. The truncation error exists independently from any error in computing the approximation, called roundoff error. We discuss this more in Subsection 10.1.5.

10.1.3 Euler’s Method

When $m = 1$ we recover Euler’s Method:
\[
x(t + h) = x(t) + hx'(t) = x(t) + hf(t, x(t)).
\]
Example 10.1. Consider the ODE
\[
\frac{dx(t)}{dt} = x, \qquad x(0) = 1.
\]
The actual solution is $x(t) = e^{t}$. Euler’s Method will underestimate $x(t)$ because the curvature of the actual $x(t)$ is positive, and thus the function is always above its linearization. In Figure 10.1, we see that eventually the Euler’s Method approximation is very poor.

[Figure 10.1: plot for $t$ from 0 to 3 with curves labeled “Euler approximation” and “Actual”.]

Figure 10.1: Euler’s Method applied to approximate $x' = x$, $x(0) = 1$. Each step proceeds on the linearization of the curve, and is thus an underestimate, as the actual solution, $x(t) = e^{t}$, has positive curvature. Here step size $h = 0.3$ is used.
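Example 10.1 can be reproduced with a short Python sketch of Euler’s Method (equation 10.3). The function name `euler` and its argument order are assumptions made for illustration, not part of the course text.

```python
import math

def euler(f, t0, x0, h, steps):
    """Advance x' = f(t, x) from (t0, x0) by `steps` Euler steps of size h."""
    t, x = t0, x0
    for _ in range(steps):
        x = x + h * f(t, x)  # equation 10.3: x(t+h) ~ x(t) + h*f(t, x(t))
        t = t + h
    return t, x

# x' = x, x(0) = 1, step size h = 0.3, ten steps to reach t = 3.
t_end, x_euler = euler(lambda t, x: x, 0.0, 1.0, 0.3, 10)
print(x_euler, math.exp(t_end))  # Euler value versus the actual e^t
```

For this ODE each step multiplies the current value by $1 + h$, so the printed approximation falls below the actual solution, in line with the underestimate described in the example.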

10.1.4 Higher Order Methods

Higher order methods are derived by taking more terms of the Taylor’s series expansion. Note

however, that they will require information about the higher derivatives of x(t), and thus may not

be appropriate for the case where f(t, x) is given as a “black box” function.

Example Problem 10.2. Derive the Taylor’s series method for $m = 3$ for the ODE
\[
\frac{dx(t)}{dt} = x + e^{x}, \qquad x(0) = 1.
\]
Solution: By simple calculus:
\[
\begin{aligned}
x'(t) &= x + e^{x}, \\
x''(t) &= x' + e^{x}x', \\
x'''(t) &= x''\left(1 + e^{x}\right) + x'e^{x}x'.
\end{aligned}
\]
We can then write our step like a program:
\[
\begin{aligned}
x'(t) &\leftarrow x(t) + e^{x(t)}, \\
x''(t) &\leftarrow x'(t)\left(1 + e^{x(t)}\right), \\
x'''(t) &\leftarrow x''(t)\left(1 + e^{x(t)}\right) + \left(x'(t)\right)^2 e^{x(t)}, \\
x(t + h) &\approx x(t) + hx'(t) + \frac{1}{2}h^2 x''(t) + \frac{1}{6}h^3 x'''(t).
\end{aligned}
\]

⊣
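The “program” of Example Problem 10.2 translates almost line for line into Python. This is a minimal sketch: the name `taylor3_step`, the step size, and the number of steps are assumptions chosen only for illustration.

```python
import math

def taylor3_step(x, h):
    """One order-3 Taylor's series step for x' = x + e^x."""
    e = math.exp(x)
    x1 = x + e                         # x'(t)
    x2 = x1 * (1.0 + e)                # x''(t)
    x3 = x2 * (1.0 + e) + x1 ** 2 * e  # x'''(t)
    return x + h * x1 + h ** 2 * x2 / 2.0 + h ** 3 * x3 / 6.0

# Start from x(0) = 1 and take ten small steps (h is an arbitrary choice).
x = 1.0
h = 0.01
for _ in range(10):
    x = taylor3_step(x, h)
print(x)
```

Note that the derivatives were worked out by hand for this particular $f$; a different right-hand side would need its own derivation, which is the limitation pointed out for “black box” functions.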

10.1.5 Errors

There are two types of errors: truncation and roundoff. Truncation error is the error of truncating the Taylor’s series after the $m$th term. You can see that the truncation error is $O\left(h^{m+1}\right)$. Roundoff error occurs because of the limited precision of computer arithmetic.

While it is possible these two kinds of error could cancel each other, given our pessimistic worldview, we usually assume they are additive. We also expect some tradeoff between these two errors as $h$ varies. As $h$ is made smaller, the truncation error presumably decreases, while the roundoff error increases. This results in an optimal $h$-value for a given method. See Figure 10.2.

Unfortunately, the optimal $h$-value will usually depend on the given ODE and the approximation method. For an ODE with unknown solution, it is generally impossible to predict the optimal $h$. We have already observed this kind of behaviour when analyzing approximate derivatives; cf. Example Problem 7.5.

Both of these kinds of error can accumulate: since we step from $x(t)$ to $x(t + h)$, an error in the value of $x(t)$ can cause a greater error in the value of $x(t + h)$. It turns out that for some ODEs and some methods, this does not happen. This leads to the topic of stability.

10.1.6 Stability

Suppose you had an exact method of solving an ODE, but had the wrong data. That is, you were trying to solve
\[
\frac{dx(t)}{dt} = f(t, x(t)), \qquad x(a) = c,
\]
but instead fed the wrong data into the computer, which then solved exactly the ODE
\[
\frac{dx(t)}{dt} = f(t, x(t)), \qquad x(a) = c + \epsilon.
\]

[Figure 10.2: plot with curves labeled “truncation”, “roundoff”, and “total error”.]

Figure 10.2: The truncation, roundoff, and total error for a fictional ODE and approximation method are shown. The total error is apparently minimized for an $h$ value of approximately 0.0002. Note that the truncation error appears to be $O(h)$, thus the approximation could be Euler’s Method. In reality, we expect the roundoff error to be more “bumpy,” as seen in Figure 7.1.

This solution can diverge from the correct one because of the different starting conditions. We illustrate this with the usual example.

Example 10.3. Consider the ODE
\[
\frac{dx(t)}{dt} = x.
\]
This has solution $x(t) = x(0)e^{t}$. The curves for different starting conditions diverge, as in Figure 10.3(a).

However, if we instead consider the ODE
\[
\frac{dx(t)}{dt} = -x,
\]
which has solution $x(t) = x(0)e^{-t}$, we see that differences in the initial conditions become immaterial, as in Figure 10.3(b).

Thus the latter ODE exhibits stability: roundoff and truncation errors accrued at a given step will become irrelevant as more steps are taken. The former ODE exhibits the opposite behavior: accrued errors will be amplified.
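The contrast in Example 10.3 can be checked numerically with the exact solutions. In this small Python sketch the helper name `gap` and the perturbation size 0.25 are assumptions chosen for illustration.

```python
import math

def gap(sign, eps, t):
    """Distance at time t between the solutions of x' = sign*x
    started from x(0) = 1 and from x(0) = 1 + eps."""
    return abs((1.0 + eps) * math.exp(sign * t) - math.exp(sign * t))

# Columns: t, gap for x' = x (grows), gap for x' = -x (shrinks).
for t in (0.0, 1.0, 2.0):
    print(t, gap(+1, 0.25, t), gap(-1, 0.25, t))
```

The gap for $x' = x$ grows like $e^{t}$ while the gap for $x' = -x$ decays like $e^{-t}$, which is the divergence and convergence of the curves described in the example.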

[Figure 10.3: (a) $(1 + \epsilon)e^{t}$; (b) $(1 + \epsilon)e^{-t}$; curves for $\epsilon = -0.5, -0.25, 0, 0.25, 0.5$.]

Figure 10.3: Exponential functions with different starting conditions: in (a), the functions $(1 + \epsilon)e^{t}$ are shown, while in (b), $(1 + \epsilon)e^{-t}$ are shown. The latter exhibits stability, while the former exhibits instability.

It turns out there is a simple test for stability. If $f_x > \delta$ for some positive $\delta$, then the ODE is unstable; if $f_x < -\delta$ for a positive $\delta$, the equation is stable. Some ODEs fall through this test, however.

We can prove this without too much pain. Define $x(t, s)$ to be the solution to
\[
\frac{dx(t)}{dt} = f(t, x(t)), \qquad x(a) = s,
\]
for $t \ge a$. Then instability means that
\[
\lim_{t \to \infty} \frac{\partial}{\partial s} x(t, s) = \infty.
\]
Think about this in terms of the curves for our simple example $x' = x$.

If we take the derivative of the ODE we get
\[
\begin{aligned}
\frac{\partial}{\partial t} x(t, s) &= f(t, x(t, s)), \\
\frac{\partial}{\partial s} \frac{\partial}{\partial t} x(t, s) &= \frac{\partial}{\partial s} f(t, x(t, s)), \quad \text{i.e.,} \\
\frac{\partial}{\partial s} \frac{\partial}{\partial t} x(t, s) &= f_t(t, x(t, s)) \frac{\partial t}{\partial s} + f_x(t, x(t, s)) \frac{\partial x(t, s)}{\partial s}.
\end{aligned}
\]
By the independence of $t, s$ the first part of the RHS is zero, so
\[
\frac{\partial}{\partial s} \frac{\partial}{\partial t} x(t, s) = f_x(t, x(t, s)) \frac{\partial x(t, s)}{\partial s}.
\]
We use continuity of $x$ to switch the orders of differentiation:
\[
\frac{\partial}{\partial t} \frac{\partial}{\partial s} x(t, s) = f_x(t, x(t, s)) \frac{\partial}{\partial s} x(t, s).
\]
Defining $u(t) = \frac{\partial}{\partial s} x(t, s)$ and $q(t) = f_x(t, x(t, s))$, we have
\[
\frac{\partial}{\partial t} u(t) = q(t)u(t).
\]
The solution is $u(t) = ce^{Q(t)}$, where
\[
Q(t) = \int_{a}^{t} q(r)\,dr.
\]
If $q(r) = f_x(r, x(r, s)) \ge \delta > 0$, then $\lim Q(t) = \infty$, and so $\lim u(t) = \infty$. But note that $u(t) = \frac{\partial}{\partial s} x(t, s)$, and thus we have instability.

Similarly if $q(r) \le -\delta < 0$, then $\lim Q(t) = -\infty$, and so $\lim u(t) = 0$, giving stability.

10.1.7 Backwards Euler’s Method

Euler’s Method is the most basic numerical technique for solving ODEs. However, it may suffer from instability, which may make the method inappropriate or impractical. It can be recast into a method which may have superior stability at the cost of limited applicability.

This method, the so-called Backwards Euler’s Method, is a stepping method: given a reasonable approximation to $x(t)$, it calculates an approximation to $x(t + h)$. As usual, Taylor’s Theorem is the starting point. Letting $t_f = t + h$, Taylor’s Theorem states that
\[
x(t_f - h) = x(t_f) + (-h)x'(t_f) + \frac{(-h)^2}{2}x''(t_f) + \ldots,
\]
for an analytic function. Assuming that $x$ has two continuous derivatives on the interval in question, the second order term is truncated to give
\[
\begin{aligned}
x(t) = x(t + h - h) = x(t_f - h) &\approx x(t_f) + (-h)x'(t_f), \\
\text{and so,} \quad x(t + h) &\approx x(t) + hx'(t + h).
\end{aligned}
\]
The complication with applying this method is that $x'(t + h)$ may not be computable if $x(t + h)$ is not known, and thus cannot be used directly to calculate an approximation for $x(t + h)$. Since this approximation may contain $x(t + h)$ on both sides, we call this method an implicit method. In contrast, Euler’s Method and the Runge-Kutta methods in the following section are explicit methods: they can be used to directly compute an approximate step. We make this clear by examples.

Example 10.4. Consider the ODE
\[
\frac{dx(t)}{dt} = \cos x, \qquad x(0) = 2\pi.
\]
In this case, if we wish to approximate $x(0 + h)$ using Backwards Euler’s Method, we have to find $x(h)$ such that $x(h) = x(0) + h\cos x(h)$. Taking $h = 1$, this is equivalent to finding a root of the equation
\[
g(y) = 2\pi + \cos y - y = 0.
\]
The function $g(y)$ is nonincreasing ($g'(y)$ is nonpositive), and has a zero between 6 and 8, as seen in Figure 10.4. The techniques of Chapter 4 are needed to find the zero of this function.

[Figure 10.4: plot of $g(y)$ for $y$ from 0 to 12.]

Figure 10.4: The nonincreasing function $g(y) = 2\pi + \cos y - y$ is shown.
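One way to carry out the implicit step of Example 10.4 is bisection on $g$ over the bracket [6, 8]. The Python sketch below assumes step size $h = 1$, which is what makes the residual equal to $g(y) = 2\pi + \cos y - y$; the helper names `g` and `bisect` and the tolerance are hypothetical choices.

```python
import math

def g(y, h=1.0, x0=2.0 * math.pi):
    """Residual of the Backwards Euler equation y = x0 + h*cos(y)."""
    return x0 + h * math.cos(y) - y

def bisect(func, lo, hi, tol=1e-12):
    """Bisection; assumes func(lo) and func(hi) have opposite signs."""
    f_lo = func(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = func(mid)
        if (f_lo > 0) == (f_mid > 0):
            lo, f_lo = mid, f_mid  # root lies in the upper half
        else:
            hi = mid               # root lies in the lower half
    return 0.5 * (lo + hi)

# g(6) > 0 and g(8) < 0, so the zero lies in [6, 8].
x_h = bisect(g, 6.0, 8.0)
print(x_h, g(x_h))
```

Bisection is used here only because it needs nothing beyond the sign change; any of the root-finding techniques the text refers to could take its place.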

Example 10.5. Consider the ODE from Example 10.1:
\[
\frac{dx(t)}{dt} = x, \qquad x(0) = 1.
\]
The actual solution is $x(t) = e^{t}$. Euler’s Method underestimates $x(t)$, as shown in Figure 10.1. Using Backwards Euler’s Method gives the approximation:
\[
\begin{aligned}
x(t + h) &= x(t) + hx(t + h), \\
x(t + h) &= \frac{x(t)}{1 - h}.
\end{aligned}
\]
Using Backwards Euler’s Method gives an overestimate, as shown in Figure 10.5. This is because each step proceeds on the tangent line to a curve $ke^{t}$ to a point on that curve. Generally Backwards Euler’s Method is preferred over vanilla Euler’s Method because it gives equivalent stability for larger stepsize $h$. This effect is not evident in this case.

[Figure 10.5: plot for $t$ from 0 to 3 with curves labeled “Backwards Euler approximation” and “Actual”.]

Figure 10.5: Backwards Euler’s Method applied to approximate $x' = x$, $x(0) = 1$. The approximation is an overestimate, as each step is to a point on a curve $ke^{t}$, through the tangent at that point. Compare this figure to Figure 10.1, which shows the same problem approximated by Euler’s Method, with the same stepsize, $h = 0.3$.
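For the linear ODE $x' = \lambda x$ the implicit equation can be solved by hand, so both methods fit in a few lines of Python. This is a sketch under assumptions: the function names and the parameter `lam` (standing for $\lambda$, equal to 1 in Example 10.5) are hypothetical.

```python
import math

def forward_euler_linear(lam, x0, h, steps):
    """Euler's Method for x' = lam*x."""
    x = x0
    for _ in range(steps):
        x = x + h * lam * x
    return x

def backward_euler_linear(lam, x0, h, steps):
    """Backwards Euler for x' = lam*x: solve x_new = x + h*lam*x_new."""
    x = x0
    for _ in range(steps):
        x = x / (1.0 - h * lam)  # requires h*lam != 1
    return x

h, steps = 0.3, 10
fwd = forward_euler_linear(1.0, 1.0, h, steps)
bwd = backward_euler_linear(1.0, 1.0, h, steps)
exact = math.exp(h * steps)
print(fwd, exact, bwd)  # underestimate, actual value, overestimate
```

With $h = 0.3$ the two approximations bracket the actual value at $t = 3$, matching Figures 10.1 and 10.5.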

10.2 Runge-Kutta Methods

Recall the ODE problem: find some x(t) such that

$$\frac{dx(t)}{dt} = f(t, x(t)), \qquad x(a) = c,$$

where f, a, c are given.

Recall that the Taylor's series method has problems: either you are using the first order method (Euler's Method), which suffers inaccuracy, or you are using a higher order method which requires evaluations of higher order derivatives of x(t), i.e., x''(t), x'''(t), etc. This makes these methods less useful for the general setting of f being a blackbox function. The Runge-Kutta Methods (don't ask me how it's pronounced) seek to resolve this.

10.2.1 Taylor’s Series Redux

We fall back on Taylor’s series, in this case the 2-dimensional version:

$$f(x+h, y+k) = \sum_{i=0}^{\infty} \frac{1}{i!} \left( h\frac{\partial}{\partial x} + k\frac{\partial}{\partial y} \right)^{i} f(x,y).$$

The thing in the middle is an operator on f(x, y). The partial derivative operators are interpreted

exactly as if they were algebraic terms, that is:

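$$\begin{aligned}
\left( h\frac{\partial}{\partial x} + k\frac{\partial}{\partial y} \right)^{0} f(x,y) &= f(x,y),\\
\left( h\frac{\partial}{\partial x} + k\frac{\partial}{\partial y} \right)^{1} f(x,y) &= h\frac{\partial f(x,y)}{\partial x} + k\frac{\partial f(x,y)}{\partial y},\\
\left( h\frac{\partial}{\partial x} + k\frac{\partial}{\partial y} \right)^{2} f(x,y) &= h^2\frac{\partial^2 f(x,y)}{\partial x^2} + 2hk\frac{\partial^2 f(x,y)}{\partial x\,\partial y} + k^2\frac{\partial^2 f(x,y)}{\partial y^2},\\
&\;\;\vdots
\end{aligned}$$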

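There is a truncated version of Taylor's series:

$$f(x+h, y+k) = \sum_{i=0}^{n} \frac{1}{i!} \left( h\frac{\partial}{\partial x} + k\frac{\partial}{\partial y} \right)^{i} f(x,y) + \frac{1}{(n+1)!} \left( h\frac{\partial}{\partial x} + k\frac{\partial}{\partial y} \right)^{n+1} f(\bar{x}, \bar{y}),$$

where $(\bar{x}, \bar{y})$ is a point on the line segment with endpoints (x, y) and (x + h, y + k).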

For practice, we will write this for n = 1:

$$f(x+h, y+k) = f(x,y) + h f_x(x,y) + k f_y(x,y) + \frac{1}{2}\left( h^2 f_{xx}(\bar{x},\bar{y}) + 2hk f_{xy}(\bar{x},\bar{y}) + k^2 f_{yy}(\bar{x},\bar{y}) \right)$$

10.2.2 Deriving the Runge-Kutta Methods

We now introduce the Runge-Kutta Method for stepping from x(t) to x(t + h). We suppose that $\alpha, \beta, w_1, w_2$ are fixed constants. We then compute:

$$\begin{aligned}
K_1 &\leftarrow h f(t, x)\\
K_2 &\leftarrow h f(t + \alpha h, x + \beta K_1).
\end{aligned}$$

Then we approximate:

$$x(t+h) \leftarrow x(t) + w_1 K_1 + w_2 K_2.$$

We now examine the "proper" choices of the constants. Note that $w_1 = 1, w_2 = 0$ corresponds to Euler's Method, and does not require computation of $K_2$. We will pick another choice.

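Also notice that the definition of $K_2$ should be related to Taylor's theorem in two dimensions. Let's look at it, writing f, $f_t$, $f_x$ for the values at (t, x):

$$\begin{aligned}
K_2/h &= f(t + \alpha h, x + \beta K_1)\\
&= f(t + \alpha h, x + \beta h f(t,x))\\
&= f + \alpha h f_t + \beta h f f_x + \frac{1}{2}\left( \alpha^2 h^2 f_{tt}(\bar{t},\bar{x}) + 2\alpha\beta h^2 f f_{tx}(\bar{t},\bar{x}) + \beta^2 h^2 f^2 f_{xx}(\bar{t},\bar{x}) \right).
\end{aligned}$$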

Now reconsider our step:

$$\begin{aligned}
x(t+h) &= x(t) + w_1 K_1 + w_2 K_2\\
&= x(t) + w_1 h f + w_2 h f + w_2 \alpha h^2 f_t + w_2 \beta h^2 f f_x + O(h^3)\\
&= x(t) + (w_1 + w_2)\, h x'(t) + w_2 h^2 \left( \alpha f_t + \beta f f_x \right) + O(h^3).
\end{aligned}$$

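If we happen to choose our constants such that

$$w_1 + w_2 = 1, \qquad \alpha w_2 = \frac{1}{2} = \beta w_2,$$

then we get

$$\begin{aligned}
x(t+h) &= x(t) + h x'(t) + \frac{1}{2} h^2 \left( f_t + f f_x \right) + O(h^3)\\
&= x(t) + h x'(t) + \frac{1}{2} h^2 x''(t) + O(h^3),
\end{aligned}$$

where the last line uses the chain rule, $x''(t) = f_t + f_x x'(t) = f_t + f f_x$. That is, our choice of the constants makes the approximate x(t + h) good up to a $O(h^3)$ term, because we end up with the Taylor's series expansion up to that term. Cool.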

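The usual choice of constants is $\alpha = \beta = 1$, $w_1 = w_2 = \frac{1}{2}$. This gives the second order Runge-Kutta Method:

$$x(t+h) \leftarrow x(t) + \frac{h}{2} f(t,x) + \frac{h}{2} f(t + h, x + h f(t,x)).$$

This can be written (and evaluated) as

$$\begin{aligned}
K_1 &\leftarrow h f(t,x)\\
K_2 &\leftarrow h f(t + h, x + K_1)\\
x(t+h) &\leftarrow x(t) + \frac{1}{2}\left( K_1 + K_2 \right).
\end{aligned} \tag{10.4}$$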

Another choice is $\alpha = \beta = 2/3$, $w_1 = 1/4$, $w_2 = 3/4$. This gives

$$x(t+h) \leftarrow x(t) + \frac{h}{4} f(t,x) + \frac{3h}{4} f\left( t + \frac{2h}{3},\; x + \frac{2h}{3} f(t,x) \right).$$
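The two-stage family above can be sketched as a single routine parameterized by the constants. This is a minimal sketch, not a definitive implementation: the names `rk_two_stage` and `integrate`, and the test problem x' = x, x(0) = 1 integrated to t = 1, are choices made here for illustration.

```python
import math

def rk_two_stage(f, t, x, h, alpha, beta, w1, w2):
    """One step x <- x + w1*K1 + w2*K2 of the two-stage scheme."""
    k1 = h * f(t, x)
    k2 = h * f(t + alpha * h, x + beta * k1)
    return x + w1 * k1 + w2 * k2

def integrate(f, a, c, h, steps, **consts):
    """Take `steps` steps of size h starting from x(a) = c."""
    t, x = a, c
    for _ in range(steps):
        x = rk_two_stage(f, t, x, h, **consts)
        t += h
    return x

def f(t, x):
    return x                      # test problem x' = x, exact x(1) = e

choices = {
    "usual": dict(alpha=1.0, beta=1.0, w1=0.5, w2=0.5),
    "other": dict(alpha=2 / 3, beta=2 / 3, w1=0.25, w2=0.75),
}
ratios = {}
for name, consts in choices.items():
    err_h = abs(integrate(f, 0.0, 1.0, 0.1, 10, **consts) - math.e)
    err_half = abs(integrate(f, 0.0, 1.0, 0.05, 20, **consts) - math.e)
    ratios[name] = err_h / err_half
    print(name, err_h, err_half, ratios[name])
```

Halving h cuts the accumulated error at t = 1 by a factor close to 4 for both choices, as one would expect when each step has error $O(h^3)$. For this linear test problem both choices reduce to multiplying by 1 + h + h²/2, so they agree up to rounding; they differ on nonlinear f.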

The Runge-Kutta Method of order two has error term $O(h^3)$. Sometimes this is not enough and higher-order Runge-Kutta Methods are used. The next Runge-Kutta Method is the order four method:

$$\begin{aligned}
K_1 &\leftarrow h f(t,x)\\
K_2 &\leftarrow h f(t + h/2, x + K_1/2)\\
K_3 &\leftarrow h f(t + h/2, x + K_2/2)\\
K_4 &\leftarrow h f(t + h, x + K_3)\\
x(t+h) &\leftarrow x(t) + \frac{1}{6}\left( K_1 + 2K_2 + 2K_3 + K_4 \right).
\end{aligned} \tag{10.5}$$

This method has error term $O(h^5)$ per step. (See http://mathworld.wolfram.com/Runge-KuttaMethod.html)

The Runge-Kutta Method can be extrapolated to even higher orders. However, the number of function evaluations grows faster than the accuracy of the method. Thus the methods of order higher than four are normally not used.
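Equation (10.5) translates almost line for line into code. The sketch below is one possible transcription; the driver `rk4` and the test problem x' = x, x(0) = 1 are assumptions made here for illustration.

```python
import math

def rk4_step(f, t, x, h):
    """One step of the order four Runge-Kutta Method, equation (10.5)."""
    k1 = h * f(t, x)
    k2 = h * f(t + h / 2, x + k1 / 2)
    k3 = h * f(t + h / 2, x + k2 / 2)
    k4 = h * f(t + h, x + k3)
    return x + (k1 + 2 * k2 + 2 * k3 + k4) / 6

def rk4(f, a, c, h, steps):
    """Take `steps` steps of size h starting from x(a) = c."""
    t, x = a, c
    for _ in range(steps):
        x = rk4_step(f, t, x, h)
        t += h
    return x

def f(t, x):
    return x                      # exact solution e^t, so x(1) = e

err_coarse = abs(rk4(f, 0.0, 1.0, 0.1, 10) - math.e)
err_fine = abs(rk4(f, 0.0, 1.0, 0.05, 20) - math.e)
print(err_coarse, err_fine, err_coarse / err_fine)
```

Halving the step size reduces the accumulated error at t = 1 by a factor near 16, at the price of four evaluations of f per step.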

We already saw that $w_1 = 1, w_2 = 0$ corresponds to Euler's Method. We consider now the case $w_1 = 0, w_2 = 1$. The constraints then force $\alpha = \beta = \frac{1}{2}$, and the method becomes

$$x(t+h) \leftarrow x(t) + h f\left( t + \frac{h}{2},\; x + \frac{h}{2} f(t,x) \right).$$

This is called the modified Euler's Method. Note this gives a different value than if Euler's Method was applied twice with step size h/2.
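The difference between one modified Euler step and two Euler half-steps is easy to see numerically. A minimal sketch, using x' = x, x(0) = 1, h = 0.1 as an illustrative test case (not one taken from the text):

```python
def modified_euler_step(f, t, x, h):
    """x <- x + h*f(t + h/2, x + (h/2)*f(t, x))."""
    return x + h * f(t + h / 2, x + (h / 2) * f(t, x))

def euler_step(f, t, x, h):
    """x <- x + h*f(t, x)."""
    return x + h * f(t, x)

def f(t, x):
    return x

t, x, h = 0.0, 1.0, 0.1

one_modified = modified_euler_step(f, t, x, h)
half = euler_step(f, t, x, h / 2)
two_half_eulers = euler_step(f, t + h / 2, half, h / 2)
print(one_modified, two_half_eulers)   # approximately 1.105 and 1.1025
```

Both use two evaluations of f, but the modified Euler's Method uses the half-step only to sample the slope, then takes the full step from the original point.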

10.2.3 Examples

Example 10.6. Consider the ODE:

$$x' = (tx)^3 - \left( \frac{x}{t} \right)^2, \qquad x(1) = 1.$$

Use h = 0.1 to compute x(1.1) using both Taylor's Series Methods and Runge-Kutta methods of order 2.
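One way to carry out the computation of Example 10.6 is sketched below. It assumes the ODE reads x' = (tx)^3 − (x/t)^2 with x(1) = 1; the partial derivatives `f_t` and `f_x` were worked out by hand for this sketch and are not given in the text.

```python
def f(t, x):
    return (t * x) ** 3 - (x / t) ** 2

def f_t(t, x):
    """Partial derivative of f with respect to t (derived by hand)."""
    return 3 * t**2 * x**3 + 2 * x**2 / t**3

def f_x(t, x):
    """Partial derivative of f with respect to x (derived by hand)."""
    return 3 * t**3 * x**2 - 2 * x / t**2

t, x, h = 1.0, 1.0, 0.1

# Taylor's Series Method of order 2, using x'' = f_t + f * f_x.
x1 = f(t, x)
x2 = f_t(t, x) + f(t, x) * f_x(t, x)
taylor2 = x + h * x1 + h**2 / 2 * x2

# Runge-Kutta Method of order 2, equation (10.4).
k1 = h * f(t, x)
k2 = h * f(t + h, x + k1)
rk2 = x + 0.5 * (k1 + k2)
print(taylor2, rk2)
```

The two approximations agree to about three decimal places, which is consistent with both having error $O(h^3)$ per step. The Runge-Kutta version needs only evaluations of f itself.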

10.3 Systems of ODEs

Recall the regular ODE problem: find some x(t) such that

$$\frac{dx(t)}{dt} = f(t, x(t)), \qquad x(a) = c,$$

where f, a, c are given.

Sometimes the physical systems we are considering are more complex. For example, we might be interested in the system of ODEs:

$$\begin{aligned}
\frac{dx(t)}{dt} &= f(t, x(t), y(t)), & x(a) &= c,\\
\frac{dy(t)}{dt} &= g(t, x(t), y(t)), & y(a) &= d.
\end{aligned}$$

Example 10.7. Consider the following system of ODEs:

$$\begin{aligned}
\frac{dx(t)}{dt} &= t^2 - x, & x(0) &= 1,\\
\frac{dy(t)}{dt} &= y^2 + y - t, & y(0) &= 0.
\end{aligned}$$

You should immediately notice that this is not a system at all, but rather a collection of two ODEs:

$$\frac{dx(t)}{dt} = t^2 - x, \qquad x(0) = 1,$$

and

$$\frac{dy(t)}{dt} = y^2 + y - t, \qquad y(0) = 0.$$

These two ODEs can be solved separately. We call such a system uncoupled. In an uncoupled system the function f(t, x, y) is independent of y, and g(t, x, y) is independent of x. A system which is not uncoupled is, of course, coupled. We will not consider uncoupled systems.

10.3.1 Larger Systems

There is no need to stop at two functions. We may imagine we have to solve the following problem: find $x_1(t), x_2(t), \ldots, x_n(t)$ such that

$$\begin{aligned}
\frac{dx_1(t)}{dt} &= f_1(t, x_1(t), x_2(t), \ldots, x_n(t)),\\
\frac{dx_2(t)}{dt} &= f_2(t, x_1(t), x_2(t), \ldots, x_n(t)),\\
&\;\;\vdots\\
\frac{dx_n(t)}{dt} &= f_n(t, x_1(t), x_2(t), \ldots, x_n(t)),\\
x_1(a) &= c_1,\; x_2(a) = c_2,\; \ldots,\; x_n(a) = c_n.
\end{aligned}$$

The idea is to not be afraid of the notation, write everything as vectors, then do exactly the same thing as for the one dimensional case! That's right, there is nothing new but notation: Let

$$X = \begin{bmatrix} x_1\\ x_2\\ \vdots\\ x_n \end{bmatrix}, \quad
X' = \begin{bmatrix} x_1'\\ x_2'\\ \vdots\\ x_n' \end{bmatrix}, \quad
F = \begin{bmatrix} f_1\\ f_2\\ \vdots\\ f_n \end{bmatrix}, \quad
C = \begin{bmatrix} c_1\\ c_2\\ \vdots\\ c_n \end{bmatrix}.$$

Then we want to find X such that

$$X'(t) = F(t, X(t)), \qquad X(a) = C. \tag{10.6}$$

Compare this to equation 10.1.

10.3.2 Recasting Single ODE Methods

We will consider stepping methods for solving systems of ODEs. That is, we know X(t), and want to find X(t + h). Most of the methods we looked at for solving one dimensional ODEs can be rewritten to solve higher dimensional problems.

For example, Euler's Method, equation 10.3, can be written as

$$X(t+h) \leftarrow X(t) + hF(t, X(t)).$$

As in 1D, this method is derived from approximating X by its linearization.

In fact, our general strategy of using Taylor's Theorem also carries through without change. That is, we can use the k-th order method to step as follows:

$$X(t+h) \leftarrow X(t) + hX'(t) + \frac{h^2}{2} X''(t) + \ldots + \frac{h^k}{k!} X^{(k)}(t).$$

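Additionally we can write the Runge-Kutta Methods as follows:

Order two:

$$\begin{aligned}
K_1 &\leftarrow hF(t, X)\\
K_2 &\leftarrow hF(t + h, X + K_1)\\
X(t+h) &\leftarrow X(t) + \frac{1}{2}\left( K_1 + K_2 \right).
\end{aligned}$$

Order four:

$$\begin{aligned}
K_1 &\leftarrow hF(t, X)\\
K_2 &\leftarrow hF\left( t + \tfrac{1}{2}h, X + \tfrac{1}{2}K_1 \right)\\
K_3 &\leftarrow hF\left( t + \tfrac{1}{2}h, X + \tfrac{1}{2}K_2 \right)\\
K_4 &\leftarrow hF(t + h, X + K_3)\\
X(t+h) &\leftarrow X(t) + \frac{1}{6}\left( K_1 + 2K_2 + 2K_3 + K_4 \right).
\end{aligned}$$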

Example Problem 10.8. Consider the system of ODEs:

$$\begin{aligned}
x_1'(t) &= x_2 - x_3^2, & x_1(0) &= 1,\\
x_2'(t) &= t + x_1 + x_3, & x_2(0) &= 0,\\
x_3'(t) &= x_2 - x_1^2, & x_3(0) &= 1.
\end{aligned}$$

Approximate X(0.1) by taking a single step of size 0.1, for Euler's Method, and the Runge-Kutta Method of order 2.

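Solution: We write

$$F(t, X(t)) = \begin{bmatrix} x_2(t) - x_3^2(t)\\ t + x_1(t) + x_3(t)\\ x_2(t) - x_1^2(t) \end{bmatrix}, \qquad
X(0) = \begin{bmatrix} 1\\ 0\\ 1 \end{bmatrix}.$$

Thus we have

$$F(0, X(0)) = \begin{bmatrix} -1\\ 2\\ -1 \end{bmatrix}.$$

Euler's Method then makes the approximation

$$X(0.1) \leftarrow X(0) + 0.1\,F(0, X(0)) = \begin{bmatrix} 1\\ 0\\ 1 \end{bmatrix} + \begin{bmatrix} -0.1\\ 0.2\\ -0.1 \end{bmatrix} = \begin{bmatrix} 0.9\\ 0.2\\ 0.9 \end{bmatrix}.$$

The Runge-Kutta Method computes:

$$K_1 \leftarrow 0.1\,F(0, X(0)) = \begin{bmatrix} -0.1\\ 0.2\\ -0.1 \end{bmatrix} \quad \text{and} \quad
K_2 \leftarrow 0.1\,F(0.1, X(0) + K_1) = \begin{bmatrix} -0.061\\ 0.19\\ -0.061 \end{bmatrix}.$$

Then

$$X(0.1) \leftarrow X(0) + \frac{1}{2}\left( K_1 + K_2 \right) = \begin{bmatrix} 1\\ 0\\ 1 \end{bmatrix} + \begin{bmatrix} -0.0805\\ 0.195\\ -0.0805 \end{bmatrix} = \begin{bmatrix} 0.9195\\ 0.195\\ 0.9195 \end{bmatrix}.$$

⊣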
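The arithmetic of Example Problem 10.8 can be checked with a short script. This is a minimal sketch using plain Python lists for the vectors; the helper `axpy` is a name introduced here for illustration.

```python
def F(t, X):
    """Right-hand side of the system in Example Problem 10.8."""
    x1, x2, x3 = X
    return [x2 - x3**2, t + x1 + x3, x2 - x1**2]

def axpy(a, U, V):
    """Return a*U + V componentwise."""
    return [a * u + v for u, v in zip(U, V)]

def euler_step(F, t, X, h):
    return axpy(h, F(t, X), X)

def rk2_step(F, t, X, h):
    K1 = [h * v for v in F(t, X)]
    K2 = [h * v for v in F(t + h, axpy(1.0, K1, X))]
    return [x + 0.5 * (a + b) for x, a, b in zip(X, K1, K2)]

X0 = [1.0, 0.0, 1.0]
X_euler = euler_step(F, 0.0, X0, 0.1)
X_rk2 = rk2_step(F, 0.0, X0, 0.1)
print(X_euler)   # close to [0.9, 0.2, 0.9]
print(X_rk2)     # close to [0.9195, 0.195, 0.9195]
```

The step functions are the scalar formulas with every arithmetic operation applied componentwise, which is the whole point of the vector notation.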

10.3.3 It’s Only Systems

The fact we can simply solve systems of ODEs using old methods is good news. It allows us to solve higher order ODEs. For example, consider the following problem: given $f, a, c_0, c_1, \ldots, c_{n-1}$, find x(t) such that

$$\begin{aligned}
x^{(n)}(t) &= f\left( t, x(t), x'(t), \ldots, x^{(n-1)}(t) \right),\\
x(a) &= c_0,\; x'(a) = c_1,\; x''(a) = c_2,\; \ldots,\; x^{(n-1)}(a) = c_{n-1}.
\end{aligned}$$

We can put this in terms of a system by letting

$$x_0 = x,\; x_1 = x',\; x_2 = x'',\; \ldots,\; x_{n-1} = x^{(n-1)}.$$

Then the ODE plus these n − 1 equations can be written as

$$\begin{aligned}
x_{n-1}'(t) &= f(t, x_0(t), x_1(t), x_2(t), \ldots, x_{n-1}(t))\\
x_0'(t) &= x_1(t)\\
x_1'(t) &= x_2(t)\\
&\;\;\vdots\\
x_{n-2}'(t) &= x_{n-1}(t)\\
x_0(a) &= c_0,\; x_1(a) = c_1,\; x_2(a) = c_2,\; \ldots,\; x_{n-1}(a) = c_{n-1}.
\end{aligned}$$

This is just a system of ODEs that we can solve like any other.

Note that this trick also works on systems of higher order ODEs. That is, we can transform any system of higher order ODEs into a (larger) system of first order ODEs.

Example Problem 10.9. Rewrite the following system as a system of first order ODEs:

$$\begin{aligned}
x''(t) &= (t/y(t)) + x'(t) - 1\\
y'(t) &= \frac{1}{x'(t) + y(t)}\\
x(0) &= 1,\; x'(0) = 0,\; y(0) = -1
\end{aligned}$$

Solution: We let $x_0 = x$, $x_1 = x'$, $x_2 = y$, then we can rewrite as

$$\begin{aligned}
x_1'(t) &= (t/x_2(t)) + x_1(t) - 1\\
x_0'(t) &= x_1(t)\\
x_2'(t) &= \frac{1}{x_1(t) + x_2(t)}\\
x_0(0) &= 1,\; x_1(0) = 0,\; x_2(0) = -1
\end{aligned}$$

⊣
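Once the system of Example Problem 10.9 is in first order form, it can be handed to any of the vector stepping methods. A minimal sketch, assuming the ordering X = [x0, x1, x2] = [x, x', y] and taking a single Euler step of size h = 0.1 (the step size is an arbitrary choice made for illustration):

```python
def F(t, X):
    """Right-hand side for X = [x0, x1, x2] = [x, x', y]."""
    x0, x1, x2 = X
    return [
        x1,                    # x0' = x1
        t / x2 + x1 - 1.0,     # x1' = x'' = t/y + x' - 1
        1.0 / (x1 + x2),       # x2' = y' = 1/(x' + y)
    ]

X0 = [1.0, 0.0, -1.0]          # x(0) = 1, x'(0) = 0, y(0) = -1
h = 0.1
slope = F(0.0, X0)
X1 = [x + h * v for x, v in zip(X0, slope)]   # one Euler step
print(slope, X1)
```

The original unknowns are recovered from the components: x(h) is approximated by `X1[0]` and y(h) by `X1[2]`, while `X1[1]` approximates x'(h).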


10.3.4 It’s Only Autonomous Systems

In fact, we can use a similar trick to get rid of the time-dependence of an ODE. Consider the first order system

$$\begin{aligned}
x_1' &= f_1(t, x_1, x_2, \ldots, x_n),\\
x_2' &= f_2(t, x_1, x_2, \ldots, x_n),\\
x_3' &= f_3(t, x_1, x_2, \ldots, x_n),\\
&\;\;\vdots\\
x_n' &= f_n(t, x_1, x_2, \ldots, x_n),\\
x_1(a) &= c_1,\; x_2(a) = c_2,\; \ldots,\; x_n(a) = c_n.
\end{aligned}$$

We can get rid of the time dependence and thus make the system autonomous by making t another variable function to be found. We do this by letting $x_0 = t$, then adding the equations $x_0' = 1$, and $x_0(a) = a$, to get the system:

$$\begin{aligned}
x_0' &= 1,\\
x_1' &= f_1(x_0, x_1, x_2, \ldots, x_n),\\
x_2' &= f_2(x_0, x_1, x_2, \ldots, x_n),\\
x_3' &= f_3(x_0, x_1, x_2, \ldots, x_n),\\
&\;\;\vdots\\
x_n' &= f_n(x_0, x_1, x_2, \ldots, x_n),\\
x_0(a) &= a,\; x_1(a) = c_1,\; x_2(a) = c_2,\; \ldots,\; x_n(a) = c_n.
\end{aligned}$$

An autonomous system can be written in the more elegant form:

$$X' = F(X), \qquad X(a) = C.$$
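The conversion to an autonomous system is mechanical enough to write as a wrapper. The sketch below is illustrative only: `make_autonomous` is a name introduced here, and the one-equation system x' = t − x with x(0) = 1 is a made-up test case, not one from the text.

```python
def make_autonomous(F):
    """Given F(t, X), return G(Y) with Y = [x0, x1, ..., xn] and x0 = t."""
    def G(Y):
        t, X = Y[0], Y[1:]
        return [1.0] + list(F(t, X))      # x0' = 1, the rest as before
    return G

def F(t, X):
    """Hypothetical time-dependent system, used only for illustration."""
    return [t - X[0]]

G = make_autonomous(F)

Y = [0.0, 1.0]                 # x0(a) = a = 0, x1(a) = c1 = 1
h = 0.1
for _ in range(3):             # three Euler steps on the autonomous system
    Y = [y + h * g for y, g in zip(Y, G(Y))]
print(Y)                       # Y[0] tracks the time, here 0.3
```

Euler's Method applied to G reproduces exactly what Euler's Method applied to F would give, with the time carried along as the zeroth component.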

Moreover, we can consider the phase curve of an autonomous system. One can consider X to be the position of a particle which moves through $\mathbb{R}^n$ over time. For an autonomous system of ODEs, the movement of the particle is dependent only on its current position and not on time (footnote 1). A phase curve is a flow line in the vector field F. You can think of the vector field F as being the velocity, at a given point, of a river, the flow of which is time independent. Under this analogy, the phase curve is the path of a vessel placed in the river. In our deterministic world, phase curves never cross. This is because at the point of crossing, there would have to be two different values of the tangent of the curve, i.e., the vector field would have to have two different values at that point.

The following example illustrates the idea of phase curves for autonomous ODEs.

Example 10.10. Consider the autonomous system of ODEs, without an initial value:

$$\begin{aligned}
x_1'(t) &= -2(x_2 - 2.4)\\
x_2'(t) &= 3(x_1 - 1.2)
\end{aligned}$$

We can rewrite this ODE as

$$X'(t) = F(X(t)).$$

Footnote 1: Note that if you have converted a system of n equations with a time dependence to an autonomous system, then the autonomous system should be considered a particle moving through $\mathbb{R}^{n+1}$. In this case time becomes one of the dimensions, and thus we speak of position, instead of time.

This is an autonomous ODE. The associated vector field can be expressed as

$$F(X) = \begin{bmatrix} 0 & -2\\ 3 & 0 \end{bmatrix} X + \begin{bmatrix} 4.8\\ -3.6 \end{bmatrix}.$$

This vector field is plotted in Figure 10.6.

Figure 10.6: The vector field F(X) is shown in $\mathbb{R}^2$. The tips of the vectors are shown as circles. (Due to budget restrictions, arrowheads were not available for this figure.)

You should verify that this ODE is solved by

$$X(t) = r \begin{bmatrix} \sqrt{2}\cos(\sqrt{6}\,t + t_0)\\ \sqrt{3}\sin(\sqrt{6}\,t + t_0) \end{bmatrix} + \begin{bmatrix} 1.2\\ 2.4 \end{bmatrix},$$

for any r ≥ 0, and $t_0 \in (0, 2\pi]$. Given an initial value, the parameters r and $t_0$ are uniquely determined. Thus the trajectory of X over time is that of an ellipse in the plane (footnote 2). Thus the family of such ellipses, taking all r ≥ 0, form the phase curves of this ODE. We would expect the approximation of an ODE to never cross a phase curve. This is because the phase curve represents the

Euler's Method and the second order Runge-Kutta Method were used to approximate the ODE with initial value

$$X(0) = \begin{bmatrix} 1.5\\ 1.5 \end{bmatrix}.$$

Footnote 2: In the case r = 0, the ellipse is the "trivial" ellipse which consists of a point.

Figure 10.7: The simulated phase curves of the system of Example 10.10 are shown for (a) Euler's Method and (b) the Runge-Kutta Method of order 2. The Runge-Kutta Method used larger step size, but was more accurate. The actual solution to the ODE is an ellipse, thus the "spiralling" observed for Euler's Method is spurious.

Euler’s Method was used for step size h = 0.005 for 3000 steps. The Runge-Kutta Method was

employed with step size h = 0.015 for 3000 steps. The approximations are shown in Figure 10.7.

Given that the actual solution of this initial value problem is an ellipse, we see that Euler’s

Method performed rather poorly, spiralling out from the ellipse. This must be the case for any step

size, since Euler’s Method steps in the direction tangent to the actual solution; thus every step of

Euler’s Method puts the approximation on an ellipse of larger radius. Smaller stepsize minimizes

this eﬀect, but at greater computational cost.

The Runge-Kutta Method performs much better, and for larger step size. At the given resolution, no spiralling is observable. Thus the Runge-Kutta Method outperforms Euler's Method, and at lesser computational cost (footnote 3).

Footnote 3: Because the Runge-Kutta Method of order two requires two evaluations of F, whereas Euler's Method requires only one, per step, it is expected that they would be 'evenly matched' when Runge-Kutta Method uses a step size twice that of Euler's Method.
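The experiment of Example 10.10 can be repeated with a short script. This is a sketch under stated assumptions rather than the code behind Figure 10.7: it measures the spiralling through the quantity (x1 − 1.2)²/2 + (x2 − 2.4)²/3, which equals r² on the exact solution and so should stay constant. The name `r_squared` is introduced here for illustration.

```python
def F(X):
    """Vector field of Example 10.10."""
    x1, x2 = X
    return [-2.0 * (x2 - 2.4), 3.0 * (x1 - 1.2)]

def r_squared(X):
    """(x1-1.2)^2/2 + (x2-2.4)^2/3, constant along the exact solution."""
    return (X[0] - 1.2) ** 2 / 2 + (X[1] - 2.4) ** 2 / 3

def euler(X, h, steps):
    for _ in range(steps):
        X = [x + h * v for x, v in zip(X, F(X))]
    return X

def rk2(X, h, steps):
    for _ in range(steps):
        K1 = [h * v for v in F(X)]
        K2 = [h * v for v in F([x + k for x, k in zip(X, K1)])]
        X = [x + 0.5 * (a + b) for x, a, b in zip(X, K1, K2)]
    return X

X0 = [1.5, 1.5]
growth_euler = r_squared(euler(X0, 0.005, 3000)) / r_squared(X0)
growth_rk2 = r_squared(rk2(X0, 0.015, 3000)) / r_squared(X0)
print(growth_euler, growth_rk2)
```

A growth factor well above 1 for Euler's Method corresponds to the outward spiral in panel (a) of Figure 10.7. The Runge-Kutta value stays very close to 1 even though its step size is three times larger.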


Exercises

(10.1) Given the ODE:

$$x'(t) = t^2 - x^2, \qquad x(1) = 1.$$

Approximate x(1.1) by using one step of Euler's Method. Approximate x(1.1) using one step of the second order Runge-Kutta Method.

(10.2) Given the ODE:

$$\begin{aligned}
x_1'(t) &= x_1 x_2 + t, & x_1(1) &= 0,\\
x_2'(t) &= 2x_2 - x_1^2, & x_2(1) &= 3.
\end{aligned}$$

Approximate X(1.1) by using one step of Euler's Method. Approximate X(1.1) using one step of the second order Runge-Kutta Method.

(10.3) Consider the separable ODE

$$x'(t) = x(t)g(t),$$

for some function g. Use Backwards Euler's Method (or, alternatively, the right endpoint rule for approximating integrals) to derive the approximation scheme

$$x(t+h) \leftarrow \frac{x(t)}{1 - h g(t+h)}.$$

What could go wrong with using such a scheme?

(10.4) Consider the ODE

$$x'(t) = x(t) + t^2.$$

Using Taylor's Theorem, derive a higher order approximation scheme to find x(t + h) based on x(t), t, and h.

(10.5) Which of the following ODEs can you show are stable? Which are unstable? Which seem ambiguous?

(a) $x'(t) = -x - \arctan x$
(b) $x'(t) = 2x + \tan x$
(c) $x'(t) = -4x - e^x$
(d) $x'(t) = x + x^3$
(e) $x'(t) = t + \cos x$

(10.6) Rewrite the following higher order ODE as a system of first order ODEs:

$$x''(t) = x(t) - \sin x'(t), \qquad x(0) = 0, \qquad x'(0) = \pi.$$

(10.7) Rewrite the system of ODEs as a system of first order ODEs

$$\begin{aligned}
x''(t) &= y(t) + t - x(t),\\
y'(t) &= x'(t) + x(t) - 4,\\
x'(0) &= 1,\; x(0) = 2,\; y(0) = 0.
\end{aligned}$$

(10.8) Rewrite the following higher order system of ODEs as a system of first order ODEs:

$$\begin{aligned}
x'''(t) &= y''(t) - x'(t),\\
y'''(t) &= x''(t) + y'(t),\\
x(0) &= x''(0) = y'(0) = -1,\\
x'(0) &= y(0) = y''(0) = 1.
\end{aligned}$$

(10.9) Implement Euler's Method for approximating the solution to

$$\frac{dx(t)}{dt} = f(t, x(t)), \qquad x(a) = c.$$

Your m-file should have header line like:

function xfin = euler(f,a,c,h,steps)

where xfin should be the approximate solution to the ODE at time a + steps · h.

(a) Run your code with f(t, x) = −x, using a = 0 = c, and using h · steps = 1.0. In this case the actual solution is

$$\mathtt{xfin} = c e^{-1}.$$

This ODE is stable, so your approximation should be good (cf. Example 10.3). Experiment with different values of h.

(b) Run your code with f(t, x) = x, using a = 0 = c, and using h · steps = 1.0. In this case the actual solution is

$$\mathtt{xfin} = c e.$$

Your computed solution may be a rather poor approximation. Prepare a plot of error versus h for varying values of h. (cf. Figure 7.1 and Figure 10.2)

(c) Run your code with f(t, x) = 1/(1 − x), using a = 0, c = 0.1, and using h · steps = 2.0. Plot your approximations for varying values of steps. For example, try steps values of 35, 36, and 90, 91. Also try large values of steps, like 100, 500, 1000. Can you guess what the actual solution is supposed to look like? Can you explain the poor performance of Euler's Method for this ODE?

(10.10) Implement the Runge-Kutta Method of order two for approximating the solution to

$$\frac{dx(t)}{dt} = f(t, x(t)), \qquad x(a) = c.$$

Your m-file should have header line like:

function xfin = rkm(f,a,c,h,steps)

where xfin should be the approximate solution to the ODE at time a + steps · h.

Test your code with the ODEs from the previous problem. Does the Runge-Kutta Method perform any better than Euler's Method for the last ODE?

Appendix A

Old Exams

A.1 First Midterm, Fall 2003

P1 (7 pnts) Clearly state Taylor’s Theorem for f(x + h). Include all hypotheses, and state the

conditions on the variable that appears in the error term.

P2 (7 pnts) Write out the Lagrange Polynomial $\ell_0(x)$ for nodes $x_0, x_1, \ldots, x_n$. That is, write the polynomial that has value 1 at $x_0$ and has value 0 at $x_1, x_2, \ldots, x_n$.

P3 (7 pnts) Write the Lagrange form of the polynomial of lowest degree that interpolates the following values:

    x_k   −1    1
    y_k   11   17

P4 (14 pnts) Use the method of Bisection for finding the zero of the function $f(x) = x^3 - 3x^2 + 5x - \frac{5}{2}$. Let $a_0 = 0$, and $b_0 = 2$. What are $a_2, b_2$?

P5 (21 pnts)
• Write out the iteration step of Newton's Method for finding a zero of the function f(x). Your iteration should look like $x_{k+1} = ??$
• Write the Newton's Method iteration for finding $\sqrt{5}$.
• Letting $x_0 = 1$, find the $x_1$ and $x_2$ for your iteration. Note that $\sqrt{5} \approx 2.236$.

P6 (7 pnts) Write out the iteration step of the Secant Method for finding a zero of the function f(x). Your iteration should look like $x_{k+1} = ??$

P7 (21 pnts) Consider the following approximation:

$$f'(x) \approx \frac{f(x+h) - 5f(x) + 4f\left(x + \frac{h}{2}\right)}{3h}$$

• Derive this approximation using Taylor's Theorem.
• Assuming that f(x) has bounded derivatives, give the accuracy of the above approximation. Your answer should be something like $O(h^?)$.
• Let $f(x) = x^2$. Approximate f'(0) with this approximation, using $h = \frac{1}{3}$.

P8 (21 pnts) Consider the data:

    k        0    1    2
    x_k      3    1   −1
    f(x_k)   5   15    9

• Construct the divided differences table for the data.
• Construct the Newton form of the polynomial p(x) of lowest degree that interpolates f(x) at these nodes.
• Suppose that these data were generated by the function $f(x) = x^3 - 5x^2 + 2x + 17$. Find a numerical upper bound on the error $|p(x) - f(x)|$ over the interval [−1, 3]. Your answer should be a number.

A.2 Second Midterm, Fall 2003

P1 (6 pnts) Write the Trapezoid rule approximation for $\int_a^b f(x)\,dx$, using the nodes $\{x_i\}_{i=0}^{n}$.

P2 (6 pnts) Suppose Gaussian Elimination with scaled partial pivoting is applied to the matrix

$$A = \begin{bmatrix}
0.0001 & -0.1 & -0.01 & 0.001\\
43 & 91 & -43 & 1\\
-12 & 26 & -1 & 0\\
-7 & \frac{1}{2} & -10 & \pi
\end{bmatrix}.$$

Which row contains the first pivot? You must show your work.

P3 (14 pnts) Let φ(n) be some quadrature rule that divides an interval into $2^n$ equal subintervals. Suppose

$$\int_a^b f(x)\,dx = \phi(n) + a_5 h_n^5 + a_{10} h_n^{10} + a_{15} h_n^{15} + \ldots,$$

where $h_n = \frac{b-a}{2^n}$. Using evaluations of φ(·), devise a "Romberg-style" quadrature rule that is a $O(h_n^{10})$ approximation to the integral.

P4 (14 pnts) Compute the LU factorization of the matrix

$$A = \begin{bmatrix} 1 & 0 & -1\\ 0 & 2 & 0\\ 2 & 2 & -1 \end{bmatrix}$$

using naïve Gaussian Elimination. Show all your work. Your final answer should be two matrices, L and U. Verify your answer, i.e., verify that A = LU.

P5 (14 pnts) Run one iteration of either Jacobi or Gauss Seidel on the system:

$$\begin{bmatrix} 1 & 3 & 4\\ -2 & 2 & 1\\ -1 & 3 & -2 \end{bmatrix} x = \begin{bmatrix} -3\\ -3\\ 1 \end{bmatrix}$$

Start with $x^{(0)} = [1\ 1\ 1]^{\top}$. Clearly indicate your answer $x^{(1)}$. Show your work. You must indicate which of the two methods you are using.

P6 (21 pnts) Derive the two-node Chebyshev Quadrature rule:

$$\int_{-1}^{1} f(x)\,dx \approx c\left[ f(x_0) + f(x_1) \right].$$

Make this rule exact for all polynomials of degree ≤ 2.

P7 (28 pnts) Consider the quadrature rule:

$$\int_0^1 f(x)\,dx \approx z_0 f(0) + z_1 f(1) + z_2 f'(0) + z_3 f'(1).$$

• Write four linear equations involving $z_i$ to make this rule exact for polynomials of the highest possible degree.
• Solve your system for z using naïve Gaussian Elimination.

A.3 Final Exam, Fall 2003

P1 (4 pnts) Clearly state Taylor’s Theorem for f(x + h). Include all hypotheses, and state the

conditions on the variable that appears in the error term.

P2 (4 pnts) State the conditions that characterize S(x) as a natural cubic spline on the interval [a, b]. Let the knots be $a = t_0 < t_1 < t_2 < \ldots < t_n = b$.

P3 (4 pnts) Let $x_0 = 3$, $x_1 = -5$, $x_2 = 0$. Write out the Lagrange Polynomial $\ell_0(x)$ for these nodes; that is, write the polynomial that has value 1 at $x_0$ and has value 0 at $x_1, x_2$.

P4 (4 pnts) Consider the ODE:

$$x'(t) = f(t, x(t)), \qquad x(a) = c$$

Write out the Euler Method to step from x(t) to x(t + h).

P5 (7 pnts) Compute the LU factorization of the matrix

$$A = \begin{bmatrix} -1 & 1 & 0\\ 1 & 1 & -1\\ -2 & 0 & 2 \end{bmatrix}$$

using naïve Gaussian Elimination. Show all your work. Your final answer should be two matrices, L and U. Verify your answer, i.e., verify that A = LU.

P6 (9 pnts) Let α be given. Determine a quadrature rule of the form

$$\int_{-\alpha}^{\alpha} f(x)\,dx \approx A f(0) + B f''(-\alpha) + C f''(\alpha)$$

that is exact for polynomials of highest possible degree. What is the highest degree polynomial for which this rule is exact?

P7 (9 pnts) Suppose that φ(n) is some computational function that approximates a desired quantity, L. Furthermore suppose that

$$\phi(n) = L + a_1 n^{-\frac{1}{2}} + a_2 n^{-\frac{2}{2}} + a_3 n^{-\frac{3}{2}} + \ldots$$

Combine two evaluations of φ(·) in the manner of Richardson to get a $O(n^{-1})$ approximation to L.

P8 (9 pnts) Find the least-squares best approximate of the form y = ln(cx) to the data

    x_0   x_1   ...   x_n
    y_0   y_1   ...   y_n

Caution: this should not be done with basis vectors.

P9 (9 pnts) Let

$$T = \left\{ f(x) = c_0 x + c_1 \log x + c_2 \cos x \;\middle|\; c_0, c_1, c_2 \in \mathbb{R} \right\}.$$

Set up (but do not attempt to solve) the normal equations to find the least-squares best f ∈ T to approximate the data:

    x_0   x_1   ...   x_n
    y_0   y_1   ...   y_n

The normal equations should involve the vector $c = [c_0\ c_1\ c_2]^{\top}$, which uniquely determines a function in T.

P10 (9 pnts) Consider the following approximation:

$$f''(x) \approx \frac{3f(x-h) - 4f(x) + f(x+3h)}{6h^2}$$

• Derive this approximation using Taylor's Theorem.
• Assuming that f(x) has bounded derivatives, give the accuracy of the above approximation. Your answer should be something like $O(h^?)$.

P11 (11 pnts) Consider the ODE:

$$x'(t) = x(t)g(t), \qquad x(0) = 1$$

• Use the Fundamental Theorem of Calculus to write a step update of the form:

$$x(t+h) = x(t) + \int_t^{t+h} ??$$

• Approximate the integral using the Trapezoid Rule to get an explicit step update of the form

$$x(t+h) \leftarrow Q(t, x(t), h)$$

• Suppose that g(t) ≥ m > 0 for all t. Do you see any problem with this stepping method if h > 2/m?

Hint: the actual solution to the ODE is

$$x(t) = e^{\int_0^t g(r)\,dr}.$$

P12 (9 pnts) Consider the ODE:

$$x'(t) = e^{x(t)} + \sin(x(t)), \qquad x(0) = 0$$

Write out the Taylor's series method of order three to approximate x(t + h) given x(t). It is permissible to write the update as a small program like:

$$\begin{aligned}
x'(t) &\leftarrow Q_0(x(t))\\
x''(t) &\leftarrow Q_1(x(t), x'(t))\\
&\;\;\vdots\\
x(t+h) &\leftarrow R(x(t), x'(t), x''(t), \ldots)
\end{aligned}$$

rather than write the update as a monstrous equation of the form

$$x(t+h) \leftarrow M(x(t)).$$

P13 (15 pnts) Consider the data:

    k         0     1     2
    x_k     −0.5    0    0.5
    f(x_k)    5    15     9

• Construct the divided differences table for the data.
• Construct the Newton form of the polynomial p(x) of lowest degree that interpolates f(x) at these nodes.
• Suppose that these data were generated by the function $f(x) = 40x^3 - 32x^2 - 6x + 15$. Find a numerical upper bound on the error $|p(x) - f(x)|$ over the interval [−0.5, 0.5]. Your answer should be a number.

P14 (15 pnts)
• Write out the iteration step of Newton's Method for finding a zero of the function f(x). Your iteration should look like $x_{k+1} = ??$
• Write the Newton's Method iteration for finding $\sqrt[3]{25/4}$.
• Letting $x_0 = \frac{5}{2}$, find the $x_1$ and $x_2$ for your iteration. Note that $\sqrt[3]{25/4} \approx 1.842$.

P15 (15 pnts)
• Let x be some given number, and let f() be a given, "black box" function. Using Taylor's Theorem, derive a O(h) approximation to the derivative f'(x). Call your approximation φ(h); the approximation should be defined in terms of f() evaluated at some values, and should probably involve x and h.
• Let $f(x) = x^2 + 1$. Approximate f'(0) using your φ(h), for $h = \frac{1}{2}$.
• For the same f(x) and h, get a $O(h^2)$ approximation to f'(0) using the method of Richardson's Extrapolation; your answer should probably include a table like this:

    D(0, 0)
    D(1, 0)   D(1, 1)

A.4 First Midterm, Fall 2004

[Deﬁnitions] Answer no more than two of the following.

P1 (12 pnts) Clearly state Taylor’s Theorem for f(x + h). Include all hypotheses, and state the

conditions on the variable that appears in the error term.

P2 (12 pnts) Write the general form of the iterated solution to the problem Ax = b. Let Q be your "splitting matrix," and use the factor ω. Your solution should look something like:

$$??\,x^{(k)} = ??\,x^{(k-1)} + ?? \qquad \text{or} \qquad x^{(k)} = ??\,x^{(k-1)} + ??$$

For one of the following variants, describe what Q is, in relation to A: Richardson's, Jacobi, Gauss-Seidel.

P3 (12 pnts) Write the iteration for Newton's Method or the Secant Method for finding a root to the equation f(x) = 0. Your solution should look something like:

$$x_{k+1} = ?? + \frac{??}{??}$$

[Problems] Answer no more than four of the following.

P1 (14 pnts) Perform bisection on the function graphed below. Let the ﬁrst interval be [0, 256].

Give the second, third, fourth, and ﬁfth intervals.

P2 (14 pnts) Give a $O(h^4)$ approximation to sin(h + π/2). Find a reasonable C such that the truncation error is no greater than $Ch^4$.

P3 (14 pnts) Use Newton's Method to find a good approximation to $\sqrt[3]{24}$. Use $x_0 = 3$. That is, iteratively define a sequence with $x_0 = 3$, such that $x_k \to \sqrt[3]{24}$.

P4 (14 pnts) One of the roots of $x^2 - bx + c = 0$ as found by the quadratic formula is subject to subtractive cancellation when $|b| \gg |c|$. Which root is it? Rewrite the expression for that root to eliminate subtractive cancellation.

[Figure for Problem P1: graph of the function to be bisected; horizontal axis from 0 to 256, vertical axis from −64 to 64.]

P5 (14 pnts) Find the LU decomposition of the following matrix by Gaussian Elimination with naïve pivoting.

$$\begin{bmatrix} 3 & 2 & 4\\ -6 & 1 & -6\\ 12 & -12 & 10 \end{bmatrix}$$

P6 (14 pnts) Consider the linear equation:

$$\begin{bmatrix} 6 & 1 & 1\\ 2 & 4 & 0\\ 1 & 2 & 6 \end{bmatrix} x = \begin{bmatrix} 5\\ -6\\ 3 \end{bmatrix}$$

Letting $x^{(0)} = [1\ 1\ 1]^{\top}$, find $x^{(1)}$ using either the Gauss-Seidel or Jacobi iterative schemes. Use weighting ω = 1. You must indicate which scheme you are using. Show all your work.

[Questions] Answer only one of the following.

P1 (20 pnts) Suppose that the eigenvalues of A are in [α, β] with 0 < α < β. Find the ω that minimizes $\|I - \omega A\|_2^2$. What is the minimal value of $\|I - \omega A\|_2^2$ for this optimal ω? For full credit you need to make some argument why this ω minimizes this norm.

P2 (20 pnts) Let f(x) = 0 have a unique root r. Recall that the form of the error for Newton's Method is

$$e_{k+1} = \frac{-f''(\xi_k)}{2f'(x_k)}\, e_k^2,$$

where $e_k = r - x_k$. Suppose that f(x) has the properties that f'(x) ≥ m > 0 for all x, and |f''(x)| ≤ M for all x. Find δ > 0 such that $|e_k| < \delta$ ensures that $x_n \to r$. Justify your answer.

P3 (20 pnts) Suppose that $f(r) = 0 = f'(r) \neq f''(r)$, and f''(x) is continuous. Letting $e_k = x_k - r$, show that for Newton's Method,

$$e_{k+1} \approx \frac{1}{2} e_k,$$

when $|e_k|$ is sufficiently small. (Hint: Use Taylor's Theorem twice.) Is this faster or slower convergence than for Newton's Method with a simple root?

A.5 Second Midterm, Fall 2004

[Deﬁnitions] Answer no more than two of the following three.

P1 (10 pnts) For a set of distinct nodes $x_0, x_1, \ldots, x_{13}$, what is the Lagrange Polynomial $\ell_5(x)$? You may write your answer as a product. What degree is this polynomial?

P2 (10 pnts) Complete this statement of the polynomial interpolation error theorem: Let p be the polynomial of degree at most n interpolating function f at the n + 1 distinct nodes $x_0, x_1, \ldots, x_n$ on [a, b]. Let $f^{(n+1)}$ be continuous. Then for each x ∈ [a, b] there is some ξ ∈ [a, b] such that

$$f(x) - p(x) = \frac{?}{?} \prod_{i=0}^{n} (?).$$

P3 (10 pnts) State the conditions which define the function S(x) as a natural cubic spline on the interval [a, b].

[Problems] Answer no more than ﬁve of the following six.

P1 (16 pnts) How large must n be to interpolate the function e

x

to within 0.001 on the interval

[0, 2] with the polynomial interpolant at n equally spaced nodes?

P2 (16 pnts) Let φ(h) be some computable approximation to the quantity L such that

φ(h) = L +a

5

h

5

+a

10

h

10

+a

15

h

15

+. . .

Combine evaluations of φ() to devise a O

h

10

approximation to L.

P3 (16 pnts) Derive the constants A and B which make the quadrature rule

$$\int_{-1}^{1} f(x)\,dx \approx A f\left(-\frac{\sqrt{15}}{5}\right) + B f(0) + A f\left(\frac{\sqrt{15}}{5}\right)$$

exact for polynomials of degree ≤ 2. Use this quadrature rule to approximate

$$\pi = \int_{-1}^{1} \frac{2}{1 + x^2}\,dx.$$

P4 (16 pnts) Find the polynomial of degree ≤ 3 interpolating the data

x −1 0 1 2

y −2 8 0 4

(Hint: using Lagrange Polynomials will probably take too long.)

P5 (16 pnts) Find constants α, β such that the following is a quadratic spline on [0, 4].

$$Q(x) = \begin{cases} x^2 - 3x + 4 & : 0 \le x < 1 \\ \alpha (x-1)^2 + \beta (x-1) + 2 & : 1 \le x < 3 \\ x^2 + x - 4 & : 3 \le x \le 4 \end{cases}$$

P6 (16 pnts) Consider the following approximation:

$$f'(x) \approx \frac{f(x+h) - 5 f(x) + 4 f\left(x + \frac{h}{2}\right)}{3h}$$

• Derive this approximation via Taylor’s Theorem.

• Assuming that f(x) has bounded derivatives, give the accuracy of the above approximation. Your answer should be something like $O(h^?)$.

• Let $f(x) = x^2$. Approximate $f'(0)$ with this approximation, using $h = \frac{1}{3}$.

A.6 Final Exam, Fall 2004

[Deﬁnitions] Answer all of the following.

P1 (10 pnts) Clearly state Taylor’s Theorem for f(x + h). Include all hypotheses, and state the

conditions on the variable that appears in the error term.

P2 (10 pnts) Write the iteration for Newton’s Method or the Secant Method for finding a root to the equation f(x) = 0. Your solution should look something like:

$$x_{k+1} = ?? + \frac{??}{??}$$

P3 (10 pnts) Write the general form of the iterated solution to the problem Ax = b. Let Q be your “splitting matrix,” and use the factor ω. Your solution should look something like:

$$??\, x^{(k)} = ??\, x^{(k-1)} + ?? \quad\text{or}\quad x^{(k)} = ??\, x^{(k-1)} + ??$$

For one of the following variants, describe what Q is, in relation to A: Richardson’s, Jacobi, Gauss-Seidel.

P4 (10 pnts) Define what it means for the function f(x) ∈ T to be the least squares best approximant to the data

$$\begin{array}{cccc} x_0 & x_1 & \ldots & x_n \\ y_0 & y_1 & \ldots & y_n \end{array}$$

[Problems] Answer all of the following.

P1 (10 pnts) Rewrite this system of higher order ODEs as a system of first order ODEs:

$$\begin{aligned} x''(t) &= x'(t)\,[x(t) + t\,y(t)] \\ y''(t) &= y'(t)\,[y(t) - t\,x(t)] \\ x'(0) &= 1 \\ x(0) &= 4 \\ y'(0) &= -3 \\ y(0) &= -2 \end{aligned}$$

P2 (10 pnts) Consider the ODE:

$$\begin{aligned} x'(t) &= t\,(x + 1)^2 \\ x(2) &= 1/2 \end{aligned}$$

Use Euler’s Method to find an approximate value of x(2.1) using a single step.
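For readers who want to check their arithmetic, a single Euler step can be sketched in Python as below. The helper names `euler_step` and `rhs` are assumptions introduced for this illustration; only the right-hand side, the initial condition x(2) = 1/2, and the step size h = 0.1 come from the problem statement.

```python
def euler_step(f, t, x, h):
    """Advance x' = f(t, x) from the point (t, x) by one Euler step of size h."""
    return x + h * f(t, x)


# Right-hand side of the ODE in the problem: x'(t) = t (x + 1)^2.
def rhs(t, x):
    return t * (x + 1.0) ** 2


# One step from t = 2 to t = 2.1, starting at x(2) = 1/2.
x_approx = euler_step(rhs, 2.0, 0.5, 0.1)
print(x_approx)
```

The same `euler_step` could be called repeatedly with a smaller h to march across a longer interval, at the cost of more function evaluations.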

P3 (10 pnts) Let φ(h) be some computable approximation to the quantity L such that

$$\phi(h) = L + a_4 h^4 + a_8 h^8 + a_{12} h^{12} + \ldots$$

Combine evaluations of φ(·) to devise an $O(h^8)$ approximation to L.

P4 (15 pnts) Consider the approximation

$$f'(x) \approx \frac{9 f(x+h) - 8 f(x) - f(x-3h)}{12h}$$

(a) Derive this approximation using Taylor’s Theorem.

(b) Assuming that f(x) has bounded derivatives, give the accuracy of the above approximation. Your answer should be something like $O(h^?)$.
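A Taylor derivation of the accuracy can be sanity-checked numerically. The Python sketch below estimates the observed order of the formula by comparing the error at step sizes h and h/2. The helper names `d_approx` and `observed_order`, the test function exp, and the step size 0.1 are assumptions made for this illustration; a numerical estimate of this kind is a check on the derivation, not a replacement for it.

```python
import math


def d_approx(f, x, h):
    """The difference formula from the problem statement."""
    return (9.0 * f(x + h) - 8.0 * f(x) - f(x - 3.0 * h)) / (12.0 * h)


def observed_order(f, df_exact, x, h):
    """Estimate p in error ~ C h^p by comparing step sizes h and h/2."""
    err_h = abs(d_approx(f, x, h) - df_exact)
    err_half = abs(d_approx(f, x, h / 2.0) - df_exact)
    return math.log(err_h / err_half, 2)


# Test function f = exp at x = 0, where f'(0) = 1 is known exactly.
order = observed_order(math.exp, 1.0, 0.0, 0.1)
print(order)
```

The printed value should be close to the integer exponent found in part (b); it will not be exact, since higher-order terms still contribute at finite h.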

P5 (15 pnts) Find constants, α, β, γ such that the following is a natural cubic spline on [1, 3].

$$Q(x) = \begin{cases} \alpha (x-1)^3 + \beta (x-1)^2 + 12 (x-1) + 1 & : 1 \le x < 2 \\ \gamma x^3 + 18 x^2 - 30 x + 19 & : 2 \le x < 3 \end{cases}$$

P6 (15 pnts) Devise a subroutine that, given an input number z, calculates an approximation to $\sqrt[3]{z}$ via Newton’s Method. Write your subroutine in pseudo code, Matlab, or C/C++. Include reasonable termination conditions so your subroutine will terminate if it finds a good approximate solution. Your subroutine should use only the basic arithmetic operations of addition, subtraction, multiplication, division; in particular your subroutine should not have access to a cube root or logarithm operator.
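One possible shape for such a subroutine is sketched below in Python rather than in the languages the problem names. It is a minimal sketch under stated assumptions, not a model answer: the function name `cube_root`, the tolerance, the iteration cap, and the choice of starting guess are all choices made for this illustration. It applies Newton’s Method to f(x) = x^3 − z, written in the rearranged form x ← (2x + z/x^2)/3.

```python
def cube_root(z, tol=1e-12, max_iter=100):
    """Approximate the real cube root of z with Newton's Method on x**3 - |z|."""
    if z == 0:
        return 0.0
    a = abs(z)                      # solve for |z|, restore the sign at the end
    x = a if a >= 1.0 else 1.0      # starting guess at or above the root
    for _ in range(max_iter):
        # Newton update for f(x) = x^3 - a, using only + - * /
        x_new = (2.0 * x + a / (x * x)) / 3.0
        # Stop when the relative change between iterates is tiny.
        if abs(x_new - x) <= tol * abs(x_new):
            x = x_new
            break
        x = x_new
    return x if z > 0 else -x


print(cube_root(27.0), cube_root(-8.0), cube_root(0.001))
```

Starting at or above the root keeps the iterates decreasing toward it, which is why the guess is max(|z|, 1). For very large |z| the cap of 100 iterations may be reached before convergence, so a better starting guess would be needed there.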

P7 (15 pnts) Consider the ODE:

$$\begin{aligned} x'(t) &= \cos(x(t)) + \sin(x(t)) \\ x(0) &= 1 \end{aligned}$$

Use Taylor’s theorem to devise an $O(h^3)$ approximant to x(t + h) which is a function of x, t, and h only. You may write your answer as a program:

$$\begin{aligned} x'(t) &\leftarrow Q_0(x(t)) \\ x''(t) &\leftarrow Q_1(x(t), x'(t)) \\ &\;\;\vdots \\ \tilde{x} &\leftarrow R(x(t), x'(t), x''(t), \ldots) \end{aligned}$$

such that $x(t + h) = \tilde{x} + O(h^3)$.

[Questions] Answer all of the following.

P1 (20 pnts) Consider the “quadrature” rule:

$$\int_0^1 f(x)\,dx \approx A_0 f(0) + A_1 f(1) + A_2 f'(1) + A_3 f''(1)$$

(a) Write four linear equations involving $A_i$ to make this rule exact for polynomials of the highest possible degree.

(b) Find the constants $A_i$ using Gaussian Elimination with naïve pivoting.

P2 (20 pnts) Consider the data:

$$\begin{array}{c|ccc} k & 0 & 1 & 2 \\ \hline x_k & 0 & 0.25 & 0.5 \\ f(x_k) & 1.5 & 1 & 0.5 \end{array}$$

(a) Construct the divided differences table for the data.

(b) Construct the Newton form of the polynomial p(x) of lowest degree that interpolates f(x) at these nodes.

(c) What degree is your polynomial? Is this strange?

(d) Suppose that these data were generated by the function

$$f(x) = 1 + \frac{\cos 2\pi x}{2}.$$

Find an upper bound on the error $|p(x) - f(x)|$ over the interval [0, 0.5]. Your answer should be a number.
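Parts (a) and (b) can be checked mechanically. The Python sketch below builds the top edge of a divided differences table and evaluates the resulting Newton form by nested multiplication. The function names `divided_differences` and `newton_eval` are assumptions introduced for this illustration; the data are those given in the problem.

```python
def divided_differences(xs, ys):
    """Return the Newton-form coefficients f[x0], f[x0,x1], ..., f[x0..xn]."""
    n = len(xs)
    col = list(ys)                  # current column of the divided differences table
    coeffs = [col[0]]
    for order in range(1, n):
        # Each new column entry uses two neighbours from the previous column.
        col = [(col[i + 1] - col[i]) / (xs[i + order] - xs[i])
               for i in range(n - order)]
        coeffs.append(col[0])
    return coeffs


def newton_eval(xs, coeffs, x):
    """Evaluate the Newton form of the interpolant by nested multiplication."""
    result = coeffs[-1]
    for k in range(len(coeffs) - 2, -1, -1):
        result = result * (x - xs[k]) + coeffs[k]
    return result


xs = [0.0, 0.25, 0.5]
ys = [1.5, 1.0, 0.5]
coeffs = divided_differences(xs, ys)
print(coeffs)
```

Inspecting which coefficients come out as zero is one way to approach part (c).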

P3 (20 pnts) Let

$$T = \left\{ f(x) = c_0 \sqrt{x} + c_1 \ell(x) + c_2 e^x \;\middle|\; c_0, c_1, c_2 \in \mathbb{R} \right\},$$

where ℓ(x) is some black box function. We are concerned with finding the least-squares best f ∈ T to approximate the data

$$\begin{array}{cccc} x_0 & x_1 & \ldots & x_n \\ y_0 & y_1 & \ldots & y_n \end{array}$$

(a) Set up (but do not attempt to solve) the normal equations to find the vector $c = [c_0\; c_1\; c_2]^\top$, which determines the least squares approximant in T.

(b) Now suppose that the function ℓ(x) happens to be the Lagrange Polynomial associated with $x_7$, among the values $x_0, x_1, \ldots, x_n$. That is, $\ell(x_j) = \delta_{j7}$. Prove that $f(x_7) = y_7$, where f is the least squares best approximant to the data.

P4 (10 pnts) State some substantive question which you thought might appear on this exam, but

did not. Answer this question (correctly).

Appendix B

GNU Free Documentation License

Version 1.2, November 2002

Copyright (C) 2000, 2001, 2002 Free Software Foundation, Inc.

59 Temple Place, Suite 330, Boston, MA 02111-1307 USA

Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed.

Preamble

The purpose of this License is to make a manual, textbook, or other functional and useful document ”free” in the sense of

freedom: to assure everyone the eﬀective freedom to copy and redistribute it, with or without modifying it, either commercially

or noncommercially. Secondarily, this License preserves for the author and publisher a way to get credit for their work, while

not being considered responsible for modiﬁcations made by others.

This License is a kind of ”copyleft”, which means that derivative works of the document must themselves be free in the

same sense. It complements the GNU General Public License, which is a copyleft license designed for free software.

We have designed this License in order to use it for manuals for free software, because free software needs free documentation:

a free program should come with manuals providing the same freedoms that the software does. But this License is not limited

to software manuals; it can be used for any textual work, regardless of subject matter or whether it is published as a printed

book. We recommend this License principally for works whose purpose is instruction or reference.

1. APPLICABILITY AND DEFINITIONS

This License applies to any manual or other work, in any medium, that contains a notice placed by the copyright holder

saying it can be distributed under the terms of this License. Such a notice grants a world-wide, royalty-free license, unlimited

in duration, to use that work under the conditions stated herein. The ”Document”, below, refers to any such manual or work.

Any member of the public is a licensee, and is addressed as ”you”. You accept the license if you copy, modify or distribute the

work in a way requiring permission under copyright law.

A ”Modiﬁed Version” of the Document means any work containing the Document or a portion of it, either copied

verbatim, or with modiﬁcations and/or translated into another language.

A ”Secondary Section” is a named appendix or a front-matter section of the Document that deals exclusively with the

relationship of the publishers or authors of the Document to the Document’s overall subject (or to related matters) and contains

nothing that could fall directly within that overall subject. (Thus, if the Document is in part a textbook of mathematics, a

Secondary Section may not explain any mathematics.) The relationship could be a matter of historical connection with the

subject or with related matters, or of legal, commercial, philosophical, ethical or political position regarding them.

The ”Invariant Sections” are certain Secondary Sections whose titles are designated, as being those of Invariant Sections,

in the notice that says that the Document is released under this License. If a section does not ﬁt the above deﬁnition of Secondary

then it is not allowed to be designated as Invariant. The Document may contain zero Invariant Sections. If the Document does

not identify any Invariant Sections then there are none.

The ”Cover Texts” are certain short passages of text that are listed, as Front-Cover Texts or Back-Cover Texts, in

the notice that says that the Document is released under this License. A Front-Cover Text may be at most 5 words, and a

Back-Cover Text may be at most 25 words.

A ”Transparent” copy of the Document means a machine-readable copy, represented in a format whose speciﬁcation is

available to the general public, that is suitable for revising the document straightforwardly with generic text editors or (for

images composed of pixels) generic paint programs or (for drawings) some widely available drawing editor, and that is suitable

for input to text formatters or for automatic translation to a variety of formats suitable for input to text formatters. A copy

made in an otherwise Transparent ﬁle format whose markup, or absence of markup, has been arranged to thwart or discourage

subsequent modiﬁcation by readers is not Transparent. An image format is not Transparent if used for any substantial amount

of text. A copy that is not ”Transparent” is called ”Opaque”.

Examples of suitable formats for Transparent copies include plain ASCII without markup, Texinfo input format, LaTeX

input format, SGML or XML using a publicly available DTD, and standard-conforming simple HTML, PostScript or PDF

designed for human modiﬁcation. Examples of transparent image formats include PNG, XCF and JPG. Opaque formats

include proprietary formats that can be read and edited only by proprietary word processors, SGML or XML for which the

DTD and/or processing tools are not generally available, and the machine-generated HTML, PostScript or PDF produced by

some word processors for output purposes only.

The ”Title Page” means, for a printed book, the title page itself, plus such following pages as are needed to hold, legibly,

the material this License requires to appear in the title page. For works in formats which do not have any title page as such,

”Title Page” means the text near the most prominent appearance of the work’s title, preceding the beginning of the body of

the text.

A section ”Entitled XYZ” means a named subunit of the Document whose title either is precisely XYZ or contains XYZ

in parentheses following text that translates XYZ in another language. (Here XYZ stands for a speciﬁc section name mentioned

below, such as ”Acknowledgements”, ”Dedications”, ”Endorsements”, or ”History”.) To ”Preserve the Title” of

such a section when you modify the Document means that it remains a section ”Entitled XYZ” according to this deﬁnition.

The Document may include Warranty Disclaimers next to the notice which states that this License applies to the Document.

These Warranty Disclaimers are considered to be included by reference in this License, but only as regards disclaiming warranties:

any other implication that these Warranty Disclaimers may have is void and has no eﬀect on the meaning of this License.

2. VERBATIM COPYING

You may copy and distribute the Document in any medium, either commercially or noncommercially, provided that this

License, the copyright notices, and the license notice saying this License applies to the Document are reproduced in all copies,

and that you add no other conditions whatsoever to those of this License. You may not use technical measures to obstruct or

control the reading or further copying of the copies you make or distribute. However, you may accept compensation in exchange

for copies. If you distribute a large enough number of copies you must also follow the conditions in section 3.

You may also lend copies, under the same conditions stated above, and you may publicly display copies.

3. COPYING IN QUANTITY

If you publish printed copies (or copies in media that commonly have printed covers) of the Document, numbering more

than 100, and the Document’s license notice requires Cover Texts, you must enclose the copies in covers that carry, clearly and

legibly, all these Cover Texts: Front-Cover Texts on the front cover, and Back-Cover Texts on the back cover. Both covers must

also clearly and legibly identify you as the publisher of these copies. The front cover must present the full title with all words of

the title equally prominent and visible. You may add other material on the covers in addition. Copying with changes limited to

the covers, as long as they preserve the title of the Document and satisfy these conditions, can be treated as verbatim copying

in other respects.

If the required texts for either cover are too voluminous to ﬁt legibly, you should put the ﬁrst ones listed (as many as ﬁt

reasonably) on the actual cover, and continue the rest onto adjacent pages.

If you publish or distribute Opaque copies of the Document numbering more than 100, you must either include a machine-

readable Transparent copy along with each Opaque copy, or state in or with each Opaque copy a computer-network location from

which the general network-using public has access to download using public-standard network protocols a complete Transparent

copy of the Document, free of added material. If you use the latter option, you must take reasonably prudent steps, when you

begin distribution of Opaque copies in quantity, to ensure that this Transparent copy will remain thus accessible at the stated

location until at least one year after the last time you distribute an Opaque copy (directly or through your agents or retailers)

of that edition to the public.

It is requested, but not required, that you contact the authors of the Document well before redistributing any large number

of copies, to give them a chance to provide you with an updated version of the Document.

4. MODIFICATIONS

You may copy and distribute a Modiﬁed Version of the Document under the conditions of sections 2 and 3 above, provided

that you release the Modiﬁed Version under precisely this License, with the Modiﬁed Version ﬁlling the role of the Document,

thus licensing distribution and modiﬁcation of the Modiﬁed Version to whoever possesses a copy of it. In addition, you must

do these things in the Modiﬁed Version:

A. Use in the Title Page (and on the covers, if any) a title distinct from that of the Document, and from those of previous

versions (which should, if there were any, be listed in the History section of the Document). You may use the same title

as a previous version if the original publisher of that version gives permission.

B. List on the Title Page, as authors, one or more persons or entities responsible for authorship of the modiﬁcations in the

Modiﬁed Version, together with at least ﬁve of the principal authors of the Document (all of its principal authors, if it

has fewer than ﬁve), unless they release you from this requirement.

C. State on the Title page the name of the publisher of the Modiﬁed Version, as the publisher.

D. Preserve all the copyright notices of the Document.

E. Add an appropriate copyright notice for your modiﬁcations adjacent to the other copyright notices.

F. Include, immediately after the copyright notices, a license notice giving the public permission to use the Modiﬁed Version

under the terms of this License, in the form shown in the Addendum below.

G. Preserve in that license notice the full lists of Invariant Sections and required Cover Texts given in the Document’s

license notice.

H. Include an unaltered copy of this License.

I. Preserve the section Entitled ”History”, Preserve its Title, and add to it an item stating at least the title, year, new

authors, and publisher of the Modiﬁed Version as given on the Title Page. If there is no section Entitled ”History” in

the Document, create one stating the title, year, authors, and publisher of the Document as given on its Title Page,

then add an item describing the Modiﬁed Version as stated in the previous sentence.

J. Preserve the network location, if any, given in the Document for public access to a Transparent copy of the Document,

and likewise the network locations given in the Document for previous versions it was based on. These may be placed

in the ”History” section. You may omit a network location for a work that was published at least four years before the

Document itself, or if the original publisher of the version it refers to gives permission.

K. For any section Entitled ”Acknowledgements” or ”Dedications”, Preserve the Title of the section, and preserve in the

section all the substance and tone of each of the contributor acknowledgements and/or dedications given therein.

L. Preserve all the Invariant Sections of the Document, unaltered in their text and in their titles. Section numbers or the

equivalent are not considered part of the section titles.

M. Delete any section Entitled ”Endorsements”. Such a section may not be included in the Modiﬁed Version.

N. Do not retitle any existing section to be Entitled ”Endorsements” or to conﬂict in title with any Invariant Section.

O. Preserve any Warranty Disclaimers.

If the Modiﬁed Version includes new front-matter sections or appendices that qualify as Secondary Sections and contain

no material copied from the Document, you may at your option designate some or all of these sections as invariant. To do this,

add their titles to the list of Invariant Sections in the Modiﬁed Version’s license notice. These titles must be distinct from any

other section titles.

You may add a section Entitled ”Endorsements”, provided it contains nothing but endorsements of your Modiﬁed Version by

various parties–for example, statements of peer review or that the text has been approved by an organization as the authoritative

deﬁnition of a standard.

You may add a passage of up to ﬁve words as a Front-Cover Text, and a passage of up to 25 words as a Back-Cover Text,

to the end of the list of Cover Texts in the Modiﬁed Version. Only one passage of Front-Cover Text and one of Back-Cover

Text may be added by (or through arrangements made by) any one entity. If the Document already includes a cover text for

the same cover, previously added by you or by arrangement made by the same entity you are acting on behalf of, you may not

add another; but you may replace the old one, on explicit permission from the previous publisher that added the old one.

The author(s) and publisher(s) of the Document do not by this License give permission to use their names for publicity

for or to assert or imply endorsement of any Modiﬁed Version.

5. COMBINING DOCUMENTS

You may combine the Document with other documents released under this License, under the terms deﬁned in section 4

above for modiﬁed versions, provided that you include in the combination all of the Invariant Sections of all of the original

documents, unmodiﬁed, and list them all as Invariant Sections of your combined work in its license notice, and that you preserve

all their Warranty Disclaimers.

The combined work need only contain one copy of this License, and multiple identical Invariant Sections may be replaced

with a single copy. If there are multiple Invariant Sections with the same name but diﬀerent contents, make the title of each

such section unique by adding at the end of it, in parentheses, the name of the original author or publisher of that section if

known, or else a unique number. Make the same adjustment to the section titles in the list of Invariant Sections in the license

notice of the combined work.

In the combination, you must combine any sections Entitled ”History” in the various original documents, forming one section

Entitled ”History”; likewise combine any sections Entitled ”Acknowledgements”, and any sections Entitled ”Dedications”. You

must delete all sections Entitled ”Endorsements”.

6. COLLECTIONS OF DOCUMENTS

You may make a collection consisting of the Document and other documents released under this License, and replace the

individual copies of this License in the various documents with a single copy that is included in the collection, provided that

you follow the rules of this License for verbatim copying of each of the documents in all other respects.

You may extract a single document from such a collection, and distribute it individually under this License, provided you

insert a copy of this License into the extracted document, and follow this License in all other respects regarding verbatim

copying of that document.

7. AGGREGATION WITH INDEPENDENT WORKS

A compilation of the Document or its derivatives with other separate and independent documents or works, in or on a

volume of a storage or distribution medium, is called an ”aggregate” if the copyright resulting from the compilation is not used

to limit the legal rights of the compilation’s users beyond what the individual works permit. When the Document is included

in an aggregate, this License does not apply to the other works in the aggregate which are not themselves derivative works of

the Document.

If the Cover Text requirement of section 3 is applicable to these copies of the Document, then if the Document is less than

one half of the entire aggregate, the Document’s Cover Texts may be placed on covers that bracket the Document within the

aggregate, or the electronic equivalent of covers if the Document is in electronic form. Otherwise they must appear on printed

covers that bracket the whole aggregate.

8. TRANSLATION

Translation is considered a kind of modiﬁcation, so you may distribute translations of the Document under the terms of

section 4. Replacing Invariant Sections with translations requires special permission from their copyright holders, but you may

include translations of some or all Invariant Sections in addition to the original versions of these Invariant Sections. You may

include a translation of this License, and all the license notices in the Document, and any Warranty Disclaimers, provided that

you also include the original English version of this License and the original versions of those notices and disclaimers. In case

of a disagreement between the translation and the original version of this License or a notice or disclaimer, the original version

will prevail.

If a section in the Document is Entitled ”Acknowledgements”, ”Dedications”, or ”History”, the requirement (section 4) to

Preserve its Title (section 1) will typically require changing the actual title.

9. TERMINATION

You may not copy, modify, sublicense, or distribute the Document except as expressly provided for under this License.

Any other attempt to copy, modify, sublicense or distribute the Document is void, and will automatically terminate your rights

under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses

terminated so long as such parties remain in full compliance.

10. FUTURE REVISIONS OF THIS LICENSE

The Free Software Foundation may publish new, revised versions of the GNU Free Documentation License from time to

time. Such new versions will be similar in spirit to the present version, but may diﬀer in detail to address new problems or

concerns. See http://www.gnu.org/copyleft/.

Each version of the License is given a distinguishing version number. If the Document speciﬁes that a particular numbered

version of this License ”or any later version” applies to it, you have the option of following the terms and conditions either of

that speciﬁed version or of any later version that has been published (not as a draft) by the Free Software Foundation. If the

Document does not specify a version number of this License, you may choose any version ever published (not as a draft) by the

Free Software Foundation.

ADDENDUM: How to use this License for your documents

To use this License in a document you have written, include a copy of the License in the document and put the following

copyright and license notices just after the title page:

Copyright (c) YEAR YOUR NAME. Permission is granted to copy, distribute and/or modify this document

under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free

Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of

the license is included in the section entitled ”GNU Free Documentation License”.

If you have Invariant Sections, Front-Cover Texts and Back-Cover Texts, replace the ”with...Texts.” line with this:

with the Invariant Sections being LIST THEIR TITLES, with the Front-Cover Texts being LIST, and with the

Back-Cover Texts being LIST.

If you have Invariant Sections without Cover Texts, or some other combination of the three, merge those two alternatives

to suit the situation.

If your document contains nontrivial examples of program code, we recommend releasing these examples in parallel under

your choice of free software license, such as the GNU General Public License, to permit their use in free software.

Bibliography

[1] Guillaume Bal. Lecture notes, course on numerical analysis, 2002. URL

http://www.columbia.edu/~gb2030/COURSES/E6302/NumAnal.ps.

[2] Samuel Daniel Conte and Carl de Boor. Elementary Numerical Analysis; An Algorithmic Approach. McGraw-Hill, New York, third edition, 1980. ISBN 0070124477.

[3] James F. Epperson. An Introduction to Numerical Methods and Analysis. Wiley, 2001. ISBN

0-471-31647-4.

[4] Richard Hamming. Numerical Methods for Scientists and Engineers. Dover, New York, 1987.

ISBN 0486652416.

[5] Eugene Isaacson and Herbert Bishop Keller. Analysis of Numerical Methods. Dover, New York,

1966. ISBN 0486680290.

[6] David Kincaid and Ward Cheney. Numerical analysis. Brooks/Cole Publishing Co., Paciﬁc

Grove, CA, second edition, 1996. ISBN 0-534-33892-5. Mathematics of scientiﬁc computing.

[7] David Kincaid and Ward Cheney. Numerical Mathematics and Computing. Brooks/Cole Publishing Co., Pacific Grove, CA, fourth edition, 1999. ISBN 0-534-35184-0.

[8] James Stewart. Calculus, Early Transcendentals. Brooks/Cole-Thomson Learning, Belmont,

CA, ﬁfth edition, 2003. ISBN 0-534-39330-6.

Index

Backwards Euler’s Method, 141

big-O, 3

bisection method, 50

centered diﬀerence, 91

Chebyshev Polynomials, 69, 122

Chebyshev Quadrature, 114

composite quadrature rule, 100

Dirichlet function, 99

divided diﬀerences, 67

eigenvalue, 8

eigenvector, 8

elementary row operations, 25

error

roundoﬀ, 89, 138

truncation, 89, 138

Euler’s Method, 136, 137

Gaussian Elimination, Naïve, 25

Gaussian Quadrature, 107

Heaviside function, 98

Hilbert Matrix, 48

inner product, 6

Intermediate Value Theorem, 49

knot, 79

Lagrange Polynomials, 63

least squares, 119

deﬁnition, 117

linear, 119

LU Factorization, 33

method of undetermined coeﬃcients, 108

multiplier, 28

natural cubic spline, 83

Newton’s Method, 52

norm, 7

subordinate, 9

normal equations, 118

ODE

errors, 138

higher order methods, 137

stability, 138

stepping, 136

partition, 79

phase curve, 150

Richardson Extrapolation, 92, 104

Riemann Integrable, 98

Riemann integral, 97

Romberg Algorithm, 104, 105

Runge Function, 69

Runge-Kutta Method, 143

secant method, 58

splines, 79

basis splines, 84

linear, 79

quadratic, 81

subtractive cancellation, 4

Taylor’s Series Method, 136

Taylor’s Theorem, 1, 3

trapezoidal rule, 100

error, 102

recursive, 106

vector space, 5



. . . . . .3 Gaussian Nodes . . . .2 Ordinary Least Squares in octave/Matlab . . . . . . . . 8. . . . . . . . . . . . . . 10. 9. . . . . . . . . . 9. . . . . . . . . .7 Backwards Euler’s Method . . . . .1 Integration and ‘Stepping’ . . . . . . . . . . 9. . . .2 Taylor’s Series Methods . . . . . . . . . . . . . . . . . . . . . . . . . . 7. . . .3 Simple and Composite Rules . . . .5 Reinventing the Wheel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6 Stability . . .2 Using the Error Bound . . . . . . .3 Least Squares from Basis Functions . . . Exercises . .1. . . .5 Errors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .1. . 8.1. . . . . . . . . . . . v 89 89 91 91 92 93 95 97 97 97 99 99 100 101 103 104 106 107 107 108 109 110 112 113 117 117 117 118 119 121 122 124 124 128 129 132 135 135 136 136 137 137 138 138 141 . . . . . . . . . . . .1 How Good is the Composite Trapezoidal Rule? . . . . . . 8. . . . .3 Orthogonal Least Squares . . . . . . . . . . . . . . . . . .1 Upper and Lower Sums . . . . . . . . . .2. . . . . . . . . . . . . . . . . . . . . . . Exercises . . . . . . . . . . . . . . . . . .4 Determining Gaussian Nodes . . . . . . . . . . . . . . . 10. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7. . . . . . . . . . . . . 10. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .1 Elementary Methods . . . . . . . . . . . . . . . . .1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3 Euler’s Method . . . . .4. . . . . . . . . . . . . . . . . . . . . . . 10. . . . . . . . . . . . . . . . . . .1 Finite Diﬀerences .3 Romberg Algorithm . . . . . . . . . . . . . . . .1. . . . . . . . . . . . . . .4. . . . . . . . . .1. . . . . . . . . .2. . . . . . 10. . . .2. . . . . . . . . . . . . . . . . . . . . .1 The Deﬁnite Integral . . . . . . . .3. . . . . 
. . . . .1 The Deﬁnition of Ordinary Least Squares . . . . . . . . . . . . . . 10. . . . . . . . . . . . . . . . . .2 Approximating the Integral . . . . . . .2 Determining Weights (Method of Undetermined Coeﬃcients) 8. . . . . . . . . . . . . . . . . 7.1 Recursive Trapezoidal Rule .1. . . . . . . . . . 10 Ordinary Diﬀerential Equations 10. . . . . . . . . . . . . . . . . . . . . . . . . .1 Approximating the Second Derivative 7. . . 10. . . . . . . 8. . . . . . . . . . . . 8.4. . . . . . . . .1 Abstracting Richardson’s Method . . 9. . . . . . . . . . . . . . .2 Principal Component Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9. . . . . 9 Least Squares 9. . . . . . . . . 9. . . . . . . . . . . . . . . . .1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8. . 8.2 Linear Least Squares . . 8. . . . 8. . . .CONTENTS 7 Approximating Derivatives 7. . .1. . . . . . . . . . . . . .3. . . . 8. . . . . . . . . . . . . .2. .1. .1. . . .2 Trapezoidal Rule . . . . . . . . . . .4 Higher Order Methods . . . . .3. . . . . . . . . . . . .4 Gaussian Quadrature . . . . . . . 8 Integrals and Quadrature 8. . . . . . .2. . . . . . . . . . . . . . . . . . . .2 Richardson Extrapolation . .4. . . . . . . . . . . . . . . . . . . . . . . .2 Using Richardson Extrapolation . . . . . . . . . . . .1. . . .2 Orthonormal Bases . . . . . . . . . . . . . . . . . . . . . . . . . .1. . .4. 9. . . . . . 9. . . . . . . . . . . . . . . .1 Determining Weights (Lagrange Polynomial Method) . . . . . . . . . . . . . . . . . . 8. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Exercises . . . . . . . . . . .2. . . . . . . . 8. . . . . . . . . . .1 Least Squares . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .1 Computing the Orthogonal Least Squares Approximant 9. . . . . . . . . . . . . . . . . .1 Alternatives to Normal Equations . . . . . . . . . . .1. . . . . . . . . 
. . . . . . .

. . . . . 6. . . . . .3. . . . . . . . . . . . . Fall 2004 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2 Runge-Kutta Methods . . . . . Exercises . . . . . . . . . . . . . . . . . 158 . . . . . . . . . . . . . . . . . . . . . .2 Recasting Single ODE Methods . . . . . . . .3 It’s Only Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2. .2 Deriving the Runge-Kutta Methods 10. . . . . . . . . . . . . . . . . . . . . . . . . . . 10. Fall 2003 . . . . APPLICABILITY AND DEFINITIONS . . . . . First Midterm. . . . . . . . . . . . . . . . . . . .2. . . . . . . . . . . . . . TERMINATION . . . . . . . . . 8. FUTURE REVISIONS OF THIS LICENSE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5. . . . . . . .1 A. . .4 A.4 It’s Only Autonomous Systems . . . . . . . . . . . . .3. . . . . . . . . . Fall 2004 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6 Exams First Midterm. . . . . . .5 A. . . . . . . .3. . . . . 10. . . . . . . Fall 2003 Final Exam. . . . . . . . . . . . . Fall 2003 . . . . . . . . . . . . . TRANSLATION . . . . . . . . . . . . . . . . . . . . . . . . .3. . . . . . . . . . . . . . . . . . . 4. . . . . . . . MODIFICATIONS . . . . . . . . . . . . 157 . . . . . . 10.1 Larger Systems . . 10. . . . . . . . . . . . . . . .3 Examples . .2. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2. . . . . . . . . . . . . . 10. . CONTENTS . . . . . . VERBATIM COPYING . . . 162 . . . . . . . . . . . . . . . . A Old A. . . . . . . . . . . . . . . .2 A. . . 7. . . . AGGREGATION WITH INDEPENDENT WORKS . . . . . . . . . . . .3 A. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . COMBINING DOCUMENTS . . . . . . . . . . . COPYING IN QUANTITY . . 10. . . . . . . 3. . . . . . . . 143 144 144 146 146 147 147 149 150 154 157 . . . . . . . . . . . . . . . . . . . . . . . .3 Systems of ODEs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 
. . . . . ADDENDUM: How to use this License for your documents Bibliography . . 10.1 Taylor’s Series Redux . . . . . . . . . . . . . . . . . . . . . . . . . . 159 . . . . . . 9. . . . . . . . . . . . . 161 . . . . 167 167 168 168 168 169 169 169 170 170 170 170 171 B GNU Free Documentation License 1. . . . . . . . . . . . . . . . . . . . . . . . 164 . . . . Fall 2004 Final Exam. . 10. . . . . . . . . . .vi 10. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Second Midterm. . . . . . . . . COLLECTIONS OF DOCUMENTS . . . . . . . . . . . . . . . . . . . Second Midterm. . . . . . . . . . . . .

Chapter 1

Introduction

1.1 Taylor's Theorem

Recall from calculus the Taylor's series for a function, $f(x)$, expanded about some number, $c$, is written as

\[ f(x) \sim a_0 + a_1 (x - c) + a_2 (x - c)^2 + \ldots \]

Here the symbol $\sim$ is used to denote a “formal series,” meaning that convergence is not guaranteed in general. The constants $a_i$ are related to the function $f$ and its derivatives evaluated at $c$. When $c = 0$, this is a MacLaurin series. For example we have the following Taylor's series (with $c = 0$):

\[ e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \ldots \tag{1.1} \]

\[ \sin(x) = x - \frac{x^3}{3!} + \frac{x^5}{5!} - \ldots \tag{1.2} \]

\[ \cos(x) = 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \ldots \tag{1.3} \]

Theorem 1.1 (Taylor's Theorem). If $f(x)$ has derivatives of order $0, 1, 2, \ldots, n+1$ on the closed interval $[a, b]$, then for any $x$ and $c$ in this interval

\[ f(x) = \sum_{k=0}^{n} \frac{f^{(k)}(c)\,(x - c)^k}{k!} + \frac{f^{(n+1)}(\xi)\,(x - c)^{n+1}}{(n+1)!}, \]

where $\xi$ is some number between $x$ and $c$, and $f^{(k)}(x)$ is the $k$th derivative of $f$ at $x$.

We will use this theorem again and again in this class. The main usage is to approximate a function by the first few terms of its Taylor's series expansion; the theorem then tells us the approximation is “as good” as the final term, also known as the error term. That is, we can make the following manipulation:

\[
\begin{aligned}
f(x) &= \sum_{k=0}^{n} \frac{f^{(k)}(c)\,(x - c)^k}{k!} + \frac{f^{(n+1)}(\xi)\,(x - c)^{n+1}}{(n+1)!}, \\
f(x) - \sum_{k=0}^{n} \frac{f^{(k)}(c)\,(x - c)^k}{k!} &= \frac{f^{(n+1)}(\xi)\,(x - c)^{n+1}}{(n+1)!}, \\
\left| f(x) - \sum_{k=0}^{n} \frac{f^{(k)}(c)\,(x - c)^k}{k!} \right| &= \frac{\left| f^{(n+1)}(\xi) \right| \, |x - c|^{n+1}}{(n+1)!}.
\end{aligned}
\]

On the left hand side is the difference between $f(x)$ and its approximation by Taylor's series. We will then use our knowledge about $f^{(n+1)}(\xi)$ on the interval $[a, b]$ to find some constant $M$ such that

\[ \left| f(x) - \sum_{k=0}^{n} \frac{f^{(k)}(c)\,(x - c)^k}{k!} \right| = \frac{\left| f^{(n+1)}(\xi) \right|}{(n+1)!} \, |x - c|^{n+1} \le M \, |x - c|^{n+1}. \]

Example Problem 1.2. Find an approximation for $f(x) = \sin x$, expanded about $c = 0$, using $n = 3$.

Solution: Solving for $f^{(k)}$ is fairly easy for this function. We find that

\[
\begin{aligned}
f(x) = \sin x &= \sin(0) + \frac{\cos(0)\,x}{1!} + \frac{-\sin(0)\,x^2}{2!} + \frac{-\cos(0)\,x^3}{3!} + \frac{\sin(\xi)\,x^4}{4!} \\
&= x - \frac{x^3}{6} + \frac{\sin(\xi)\,x^4}{24},
\end{aligned}
\]

so

\[ \left| \sin x - \left( x - \frac{x^3}{6} \right) \right| = \left| \frac{\sin(\xi)\,x^4}{24} \right| \le \frac{x^4}{24}, \]

because $|\sin(\xi)| \le 1$.

Example Problem 1.3. Apply Taylor's Theorem for the case $n = 0$.

Solution: Taylor's Theorem for $n = 0$ states: Given a function, $f(x)$, with a continuous derivative on $[a, b]$, then

\[ f(x) = f(c) + f'(\xi)(x - c) \]

for some $\xi$ between $x$ and $c$, when $x, c$ are in $[a, b]$. This is the Mean Value Theorem. As a one-liner, the MVT says that at some time during a trip, your velocity is the same as your average velocity for the trip.

Example Problem 1.4. Apply Taylor's Theorem to expand $f(x) = x^3 - 21x^2 + 17$ around $c = 1$.

Solution: Simple calculus gives us

\[
\begin{aligned}
f^{(0)}(x) &= x^3 - 21x^2 + 17, \\
f^{(1)}(x) &= 3x^2 - 42x, \\
f^{(2)}(x) &= 6x - 42, \\
f^{(3)}(x) &= 6, \\
f^{(k)}(x) &= 0,
\end{aligned}
\]

with the last holding for $k > 3$. Evaluating these at $c = 1$ gives

\[ f(x) = -3 + (-39)(x - 1) + \frac{-36\,(x - 1)^2}{2} + \frac{6\,(x - 1)^3}{6}. \]

Note there is no error term, since the higher order derivatives are identically zero. By carrying out simple algebra, you will find that the above expansion is, in fact, the function $f(x)$.

There is an alternative form of Taylor's Theorem, in this case substituting $x + h$ for $x$, and $x$ for $c$ in the more general version. This gives

Theorem 1.5 (Taylor's Theorem, Alternative Form). If $f(x)$ has derivatives of order $0, 1, \ldots, n+1$ on the closed interval $[a, b]$, then for any $x$ in this interval and any $h$ such that $x + h$ is in this interval,

\[ f(x + h) = \sum_{k=0}^{n} \frac{f^{(k)}(x)\,h^k}{k!} + \frac{f^{(n+1)}(\xi)\,h^{n+1}}{(n+1)!}, \]

where $\xi$ is some number between $x$ and $x + h$.

We generally apply this form of the theorem with $h \to 0$. This leads to a discussion on the matter of Orders of Convergence. The following definition will suffice for this class.

Definition 1.6. We say that a function $f(h)$ is in the class $O(h^k)$ (pronounced “big-Oh of $h^k$”) if there is some constant $C$ such that

\[ |f(h)| \le C\,|h|^k \]

for all $h$ “sufficiently small,” i.e., smaller than some $h^*$ in absolute value. For a function $f \in O(h^k)$ we sometimes write $f = O(h^k)$. We sometimes also write $O(h^k)$, meaning some function which is a member of this class.

Roughly speaking, through use of the “Big-O” function we can write an expression without “sweating the small stuff.” This can give us an intuitive understanding of how an approximation works, without losing too many of the details.

Example 1.7. Consider the Taylor expansion of $\ln x$:

\[ \ln(x + h) = \ln x + \frac{(1/x)\,h}{1} + \frac{(-1/x^2)\,h^2}{2} + \frac{(2/\xi^3)\,h^3}{6}. \]

Letting $x = 1$, we have

\[ \ln(1 + h) = h - \frac{h^2}{2} + \frac{1}{3\xi^3}\,h^3. \]

Using the fact that $\xi$ is between $1$ and $1 + h$, as long as $h$ is relatively small (say smaller than $\tfrac{1}{2}$), the term $\frac{1}{3\xi^3}$ can be bounded by a constant, and thus

\[ \ln(1 + h) = h - \frac{h^2}{2} + O(h^3). \]

Thus we say that $h - \frac{h^2}{2}$ is a $O(h^3)$ approximation to $\ln(1 + h)$. For example

\[ \ln(1 + 0.01) \approx 0.009950331 \approx 0.00995 = 0.01 - \frac{0.01^2}{2}. \]
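The claim that $h - h^2/2$ is a $O(h^3)$ approximation to $\ln(1+h)$ can be checked numerically. The following is a minimal octave/Matlab sketch, not part of the original notes: the variable names and the particular values of $h$ are illustrative choices, and `log1p` is used (rather than `log(1 + h)`) only so that rounding error does not contaminate the tiny differences being measured. If the approximation really is $O(h^3)$, the ratio of the error to $h^3$ should stay bounded as $h$ shrinks.

```octave
% Empirical check that h - h^2/2 approximates ln(1+h) with O(h^3) error.
h = 10.^(-(1:4))';                 % h = 0.1, 0.01, 0.001, 0.0001 (arbitrary choices)
approx = h - h.^2 / 2;             % two-term Taylor approximation
err = abs(log1p(h) - approx);      % actual error; log1p(h) computes ln(1+h) accurately
ratio = err ./ h.^3;               % bounded ratio <=> error is in O(h^3)
disp([h, err, ratio]);             % columns: h, error, error / h^3
```

The ratios should settle near $1/3$, which matches the coefficient $\frac{1}{3\xi^3}$ with $\xi$ close to $1$.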

1.2 Loss of Significance

Generally speaking, a computer stores a number $x$ as a mantissa and exponent, that is $x = \pm r \times 10^k$, where $r$ is a rational number of a given number of digits in $[0.1, 1)$, and $k$ is an integer in a certain range. The number of significant digits in $r$ is usually determined by the user's input. Operations on numbers stored in this way follow a “lowest common denominator” type of rule, i.e., precision cannot be gained but can be lost. Thus for example if you add the two quantities 0.171717 and 0.51, then the result should only have two significant digits; the precision of the first measurement is lost in the uncertainty of the second. This is as it should be.

However, a loss of significance can be incurred if two nearly equal quantities are subtracted from one another. Thus if I were to direct my computer to subtract 0.177241 from 0.177589, the result would be $0.348 \times 10^{-3}$, and three significant digits have been lost. This loss is called subtractive cancellation, and can often be avoided by rewriting the expression. This will be made clearer by the examples below.

Errors can also occur when quantities of radically different magnitudes are summed. For example $0.1234 + 5.6789 \times 10^{-20}$ might be rounded to 0.1234 by a system that keeps only 16 significant digits. This may lead to unexpected results.

The usual strategies for rewriting subtractive expressions are completing the square, factoring, or using the Taylor expansions, as the following examples illustrate.

Example Problem 1.8. Consider the stability of $\sqrt{x + 1} - 1$ when $x$ is near 0. Rewrite the expression to rid it of subtractive cancellation.

Solution: Suppose that $x = 1.2345678 \times 10^{-5}$. Then $\sqrt{x + 1} \approx 1.000006173$. If your computer (or calculator) can only keep 8 significant digits, this will be rounded to 1.0000062. When 1 is subtracted, the result is $6.2 \times 10^{-6}$. Thus 6 significant digits have been lost from the original.

To fix this, we rationalize the expression

\[ \sqrt{x + 1} - 1 = \left( \sqrt{x + 1} - 1 \right) \frac{\sqrt{x + 1} + 1}{\sqrt{x + 1} + 1} = \frac{x + 1 - 1}{\sqrt{x + 1} + 1} = \frac{x}{\sqrt{x + 1} + 1}. \]

This expression has no subtractions, and so is not subject to subtractive cancelling. When $x = 1.2345678 \times 10^{-5}$, this expression evaluates approximately as

\[ \frac{1.2345678 \times 10^{-5}}{2.0000062} \approx 6.17281995 \times 10^{-6} \]

on a machine with 8 digits; there is no loss of precision.

Note that nearly all modern computers and calculators store intermediate results of calculations in higher precision formats. This minimizes, but does not eliminate, problems like those of the previous example problem.

Example Problem 1.9. Write stable code to find the roots of the equation $x^2 + bx + c = 0$.

Solution: The usual quadratic formula is

\[ x_{\pm} = \frac{-b \pm \sqrt{b^2 - 4c}}{2}. \]

Supposing that $b \gg c > 0$, the expression in the square root might be rounded to $b^2$, giving two roots $x_+ = 0$, $x_- = -b$. The latter root is nearly correct, while the former has no correct digits.
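The failure just described is easy to reproduce. The following octave/Matlab sketch is an illustration rather than part of the original notes; the values $b = 10^9$ and $c = 1$ and all variable names are assumptions chosen so that $b^2 - 4c$ rounds to $b^2$ in IEEE double precision.

```octave
% Naive quadratic formula for x^2 + b*x + c = 0 with b >> c > 0.
b = 1e9;                        % illustrative values, chosen so that b >> c
c = 1;
disc = sqrt(b*b - 4*c);         % b^2 - 4c rounds to b^2, so disc equals b
x_plus  = (-b + disc) / 2;      % subtractive cancellation: -b + b
x_minus = (-b - disc) / 2;      % both terms have the same sign: no cancellation
fprintf('x_plus  = %g\n', x_plus);
fprintf('x_minus = %g\n', x_minus);
% Plug the computed x_plus back into the polynomial. A true root gives
% (nearly) zero; here the residual is as large as c itself.
fprintf('residual at x_plus = %g\n', x_plus^2 + b*x_plus + c);
```

On a machine using IEEE double arithmetic the computed `x_plus` is exactly 0, although the true root of smaller magnitude is roughly $-10^{-9}$, so not a single digit is correct.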

To correct this problem, multiply the numerator and denominator of $x_+$ by $-b - \sqrt{b^2 - 4c}$ to get

\[ x_+ = \frac{2c}{-b - \sqrt{b^2 - 4c}}. \]

Now if $b \gg c > 0$, this expression gives root $x_+ = -c/b$, which is nearly correct. This leads to the pair:

\[ x_- = \frac{-b - \sqrt{b^2 - 4c}}{2}, \qquad x_+ = \frac{2c}{-b - \sqrt{b^2 - 4c}}. \]

Note that the two roots are nearly reciprocals (indeed, $x_+ x_- = c$), and if $x_-$ is computed, $x_+$ can easily be computed with little additional work.

Example Problem 1.10. Rewrite $e^x - \cos x$ to be stable when $x$ is near 0.

Solution: Look at the Taylor's Series expansion for these functions:

\[
\begin{aligned}
e^x - \cos x &= \left[ 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \frac{x^4}{4!} + \frac{x^5}{5!} + \ldots \right] - \left[ 1 - \frac{x^2}{2!} + \frac{x^4}{4!} - \frac{x^6}{6!} + \ldots \right] \\
&= x + x^2 + \frac{x^3}{3!} + O(x^5).
\end{aligned}
\]

This expression has no subtractions, and so is not subject to subtractive cancelling. Note that we propose calculating $x + x^2 + x^3/6$ as an approximation of $e^x - \cos x$, which we cannot calculate exactly anyway. Since we assume $x$ is nearly zero, the approximate should be good. If $x$ is very close to zero, we may only have to take the first one or two terms. If $x$ is not so close to zero, we may need to take all three terms, or even more terms of the expansion; if $x$ is far from zero we should use some other technique.

1.3 Vector Spaces, Inner Products, Norms

We explore some of the basics of functional analysis which may be useful in this text.

1.3.1 Vector Space

A vector space is a collection of objects together with a binary operator which is defined over an algebraic field.¹ The binary operator allows transparent algebraic manipulation of vectors.

Definition 1.11. A collection of vectors, $V$, with a binary addition operator, $+$, defined over $V$, and a scalar multiply over the real field $\mathbb{R}$, forms a vector space if

1. For each $u, v \in V$, the sum $u + v$ is a vector in $V$. (i.e., the space is “closed under addition.”)
2. Addition is commutative: $u + v = v + u$ for each $u, v \in V$.
3. For each $u \in V$, and each scalar $\alpha \in \mathbb{R}$ the scalar product $\alpha u$ is a vector in $V$. (i.e., the space is “closed under scalar multiplication.”)
4. There is a zero vector $\mathbf{0} \in V$ such that for any $u \in V$, $0u = \mathbf{0}$, where $0$ is the zero of $\mathbb{R}$.
5. For any $u \in V$, $1u = u$, where $1$ is the multiplicative identity of $\mathbb{R}$.
6. For any $u, v \in V$, and scalars $\alpha, \beta \in \mathbb{R}$, both $(\alpha +_{\mathbb{R}} \beta)u = \alpha u + \beta u$ and $\alpha(u + v) = \alpha u + \alpha v$ hold, where $+_{\mathbb{R}}$ is addition in $\mathbb{R}$. (i.e., scalar multiplication distributes in both ways.)

¹ For the purposes of this text, this algebraic field will always be the real field, $\mathbb{R}$, though in the real world, the complex field, $\mathbb{C}$, has some currency.

Example 1.12. The most familiar example of a vector space is $\mathbb{R}^n$, which is the collection of $n$-tuples of real numbers. That is $u \in \mathbb{R}^n$ is of the form $[u_1, u_2, \ldots, u_n]^\top$, where $u_i \in \mathbb{R}$ for $i = 1, 2, \ldots, n$. Addition and scalar multiplication over $\mathbb{R}$ are defined pairwise:

\[ [u_1, u_2, \ldots, u_n]^\top + [v_1, v_2, \ldots, v_n]^\top = [u_1 + v_1, u_2 + v_2, \ldots, u_n + v_n]^\top, \]

and

\[ \alpha\,[u_1, u_2, \ldots, u_n]^\top = [\alpha u_1, \alpha u_2, \ldots, \alpha u_n]^\top. \]

Note that some authors distinguish between points in $n$-dimensional space and vectors in $n$-dimensional space. We will use $\mathbb{R}^n$ to refer to both of them, as in this text there is no need to distinguish them symbolically.

Example 1.13. Let $X \subseteq \mathbb{R}^k$ be a closed, bounded set, and let $H$ be the collection of all functions from $X$ to $\mathbb{R}$. Then $H$ forms a vector space under the “pointwise” defined addition and scalar multiplication over $\mathbb{R}$. That is, for $u, v \in H$, $u + v$ is the function in $H$ defined by $[u + v](x) = u(x) + v(x)$ for all $x \in X$. And for $u \in H$, $\alpha \in \mathbb{R}$, $\alpha u$ is the function in $H$ defined by $[\alpha u](x) = \alpha\,(u(x))$.

Example 1.14. Let $X \subseteq \mathbb{R}^k$ be a closed, bounded set, and let $H_0$ be the collection of all functions from $X$ to $\mathbb{R}$ that take the value zero on $\partial X$. Then $H_0$ forms a vector space under the “pointwise” defined addition and scalar multiplication over $\mathbb{R}$. The only difference between proving $H_0$ is a vector space and the proof required for the previous example is in showing that $H_0$ is indeed closed under addition and scalar multiplication. This is simple because if $x \in \partial X$, then $[u + v](x) = u(x) + v(x) = 0 + 0$, and thus $u + v$ has the property that it takes value 0 on $\partial X$. Similarly for $\alpha u$. This would not have worked if the functions of $H_0$ were required to take some other value on $\partial X$, like, say, 2 instead of 0.

Example 1.15. Let $P_n$ be the collection of all ‘formal’ polynomials of degree less than or equal to $n$ with coefficients from $\mathbb{R}$. Then $P_n$ forms a vector space over $\mathbb{R}$.

Example 1.16. The collection of all real-valued $m \times n$ matrices forms a vector space over the reals with the usual scalar multiplication and matrix addition. This space is denoted as $\mathbb{R}^{m \times n}$. Another way of viewing this space is that it is the space of linear functions which carry vectors of $\mathbb{R}^n$ to vectors of $\mathbb{R}^m$.

1.3.2 Inner Products

An inner product is a way of “multiplying” two vectors from a vector space together to get a scalar from the same field the space is defined over (e.g., a real or a complex number). The inner product should have the following properties:

Definition 1.17. For a vector space, $V$, defined over $\mathbb{R}$, a binary function, $(\cdot,\cdot)$, which takes two vectors of $V$ to $\mathbb{R}$ is an inner product if

1. It is symmetric: $(v, u) = (u, v)$.
2. It is linear in both its arguments:
   \[ (\alpha u + \beta v, w) = \alpha\,(u, w) + \beta\,(v, w) \quad \text{and} \quad (u, \alpha v + \beta w) = \alpha\,(u, v) + \beta\,(u, w). \]
   A binary function for which this holds is sometimes called a bilinear form.
3. It is positive: $(v, v) \ge 0$, with equality holding if and only if $v$ is the zero vector of $V$.


Example 1.18. The most familiar example of an inner product is the $L^2$ (pronounced “L two”) inner product on the vector space $\mathbb{R}^n$. If $u = [u_1, u_2, \ldots, u_n]^\top$, and $v = [v_1, v_2, \ldots, v_n]^\top$, then letting

\[ (u, v)_2 = \sum_i u_i v_i \]

gives an inner product. This inner product is the usual vector calculus dot product and is sometimes written as $u \cdot v$ or $u^\top v$.

Example 1.19. Let $H$ be the vector space of functions from $X$ to $\mathbb{R}$ from Example 1.13. Then for $u, v \in H$, letting

\[ (u, v)_H = \int_X u(x)\,v(x)\,dx \]

gives an inner product. This inner product is like the “limit case” of the $L^2$ inner product on $\mathbb{R}^n$ as $n$ goes to infinity.
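Both inner products can be tried out numerically. The octave/Matlab sketch below is an illustration only: the vectors, the choice $X = [0, 1]$ with $u(x) = x$ and $v(x) = x^2$, the grid size `n`, and all variable names are assumptions made for the example. It computes the $L^2$ inner product on $\mathbb{R}^3$ two ways, then approximates the function-space inner product with the trapezoidal rule (`trapz`) and with a scaled dot product of sampled values, which hints at the “limit case” remark.

```octave
% L2 inner product on R^3: (u, v)_2 = sum_i u_i v_i = u' * v.
u = [1; 2; 3];
v = [4; 5; 6];
ip_sum = sum(u .* v);            % elementwise products, then sum
ip_mat = u' * v;                 % the same number as a matrix product

% Function-space analogue on X = [0, 1] with u(x) = x, v(x) = x^2.
% The exact value of the integral of x * x^2 over [0, 1] is 1/4.
n = 1000;                        % number of subintervals (arbitrary choice)
x = linspace(0, 1, n + 1)';      % sample points
ux = x;                          % samples of u
vx = x.^2;                       % samples of v
ip_fun = trapz(x, ux .* vx);     % trapezoidal approximation of the integral
ip_dot = (ux' * vx) / n;         % dot product of the samples, scaled by 1/n
fprintf('%g %g %g %g\n', ip_sum, ip_mat, ip_fun, ip_dot);
```

Increasing `n` drives both `ip_fun` and `ip_dot` toward $1/4$; the scaled dot product differs from the trapezoidal value only in how the two endpoint samples are weighted.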

1.3.3 Norms

A norm is a way of measuring the “length” of a vector:

Definition 1.20. A function $\|\cdot\|$ from a vector space, $V$, to $\mathbb{R}^+$ is called a norm if

1. It obeys the triangle inequality: $\|x + y\| \le \|x\| + \|y\|$.
2. It scales positively: $\|\alpha x\| = |\alpha|\,\|x\|$, for scalar $\alpha$.
3. It is positive: $\|x\| \ge 0$, with equality only holding when $x$ is the zero vector.

The easiest way of constructing a norm is on top of an inner product. If $(\cdot,\cdot)$ is an inner product on vector space $V$, then letting

\[ \|u\| = \sqrt{(u, u)} \]

gives a norm on $V$. This is how we construct our most common norms:

Example 1.21. For vector $x \in \mathbb{R}^n$, its $L^2$ norm is defined

\[ \|x\|_2 = \left( \sum_{i=1}^{n} x_i^2 \right)^{1/2} = \left( x^\top x \right)^{1/2}. \]

This is constructed on top of the $L^2$ inner product.

Example 1.22. The $L^p$ norm on $\mathbb{R}^n$ generalizes the $L^2$ norm, and is defined, for $p \ge 1$, as

\[ \|x\|_p = \left( \sum_{i=1}^{n} |x_i|^p \right)^{1/p}. \]

Example 1.23. The $L^\infty$ norm on $\mathbb{R}^n$ is defined as

\[ \|x\|_\infty = \max_i |x_i|. \]

The $L^\infty$ norm is somehow the “limit” of the $L^p$ norm as $p \to \infty$.
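The three norms, and the sense in which $L^\infty$ is a limit of $L^p$, can be seen on a small example. This is a minimal octave/Matlab sketch, not from the original notes: the vector and the list of exponents are arbitrary illustrative choices. It evaluates the definitions directly and compares the $L^2$ value with the built-in `norm`.

```octave
x = [3; 4; 12];                        % example vector (illustrative choice)
n2 = sqrt(x' * x);                     % L2 norm built on the inner product
n2_builtin = norm(x, 2);               % built-in 2-norm, for comparison
ninf = max(abs(x));                    % L-infinity norm: largest |x_i|
ps = [1 2 4 8 16 32 64];               % increasing exponents p
np = zeros(size(ps));
for j = 1:length(ps)
  p = ps(j);
  np(j) = sum(abs(x).^p)^(1/p);        % Lp norm straight from the definition
  fprintf('p = %2d   ||x||_p = %.6f\n', p, np(j));
end
fprintf('||x||_inf = %g\n', ninf);
```

The printed $L^p$ values decrease as $p$ grows and approach the largest entry in absolute value, which is the $L^\infty$ norm.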


1.4 Eigenvalues

It is assumed the reader has some familiarity with linear algebra. We review the topic of eigenvalues.

Definition 1.24. A nonzero vector $x$ is an eigenvector of a given matrix $A$, with corresponding eigenvalue $\lambda$, if

\[ Ax = \lambda x. \]

Subtracting the right hand side from the left and gathering terms gives

\[ (A - \lambda I)\,x = 0. \]

Since $x$ is assumed to be nonzero, the matrix $A - \lambda I$ must be singular. A matrix is singular if and only if its determinant is zero. These steps are reversible, thus we claim $\lambda$ is an eigenvalue if and only if

\[ \det(A - \lambda I) = 0. \]

The left hand side can be expanded to a polynomial in $\lambda$, of degree $n$ where $A$ is an $n \times n$ matrix. This gives the so-called characteristic equation. Eigenvectors and eigenvalues are sometimes called characteristic vectors and characteristic values.

Example Problem 1.25. Find the eigenvalues of

\[ \begin{bmatrix} 1 & 1 \\ 4 & -2 \end{bmatrix}. \]

Solution: The eigenvalues are roots of

\[ 0 = \det \begin{bmatrix} 1 - \lambda & 1 \\ 4 & -2 - \lambda \end{bmatrix} = (1 - \lambda)(-2 - \lambda) - 4 = \lambda^2 + \lambda - 6. \]

This equation has roots $\lambda_1 = -3$, $\lambda_2 = 2$.

Example Problem 1.26. Find the eigenvalues of $A^2$.

Solution: Let $\lambda$ be an eigenvalue of $A$, with corresponding eigenvector $x$. Then

\[ A^2 x = A(Ax) = A(\lambda x) = \lambda Ax = \lambda^2 x, \]

so $\lambda^2$ is an eigenvalue of $A^2$, with the same eigenvector $x$.

The eigenvalues of a matrix tell us, roughly, how the linear transform scales a given vector; the eigenvectors tell us which directions are “purely scaled.” This will make more sense when we talk about norms of vectors and matrices.
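The two example problems above can be checked in octave/Matlab. The sketch below is illustrative and not part of the original notes; the variable names are assumptions. It uses the built-in `eig`, compares against the roots of the characteristic polynomial (for a $2 \times 2$ matrix this is $\lambda^2 - \operatorname{trace}(A)\,\lambda + \det(A)$), and confirms that the eigenvalues of $A^2$ are the squares of those of $A$.

```octave
A = [1 1; 4 -2];                          % matrix from Example Problem 1.25
lambda = sort(eig(A));                    % eigenvalues, sorted ascending
r = sort(roots([1, -trace(A), det(A)]));  % roots of the characteristic polynomial
lambda_sq = sort(eig(A * A));             % eigenvalues of A^2
[V, D] = eig(A);                          % columns of V are eigenvectors
resid = norm(A * V - V * D);              % should be near zero: A*V = V*D
disp(lambda');                            % expected to be close to -3 and 2
disp(lambda_sq');                         % expected to be close to 4 and 9
```

Because the computation is done in floating point, the printed values agree with $-3, 2$ and $4, 9$ only up to rounding.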

1.4.1 Matrix Norms

Given a norm $\|\cdot\|$ on the vector space, $\mathbb{R}^n$, we can define the matrix norm “subordinate” to it, as follows:

Definition 1.27. Given a norm $\|\cdot\|$ on $\mathbb{R}^n$, we define the subordinate matrix norm on $\mathbb{R}^{n \times n}$ by

\[ \|A\| = \max_{x \ne 0} \frac{\|Ax\|}{\|x\|}. \]


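We will use the subordinate two-norm for matrices. From the definition of the subordinate norm as a max, we conclude that if $x$ is a nonzero vector then

\[ \frac{\|Ax\|_2}{\|x\|_2} \le \|A\|_2, \quad \text{thus} \quad \|Ax\|_2 \le \|A\|_2\,\|x\|_2. \]

Example 1.28. Strange but true: If $\Lambda$ is the set of eigenvalues of $A$, then

\[ \|A\|_2 = \max_{\lambda \in \Lambda} |\lambda|. \]

(This equality holds when $A$ is symmetric; for a general matrix one can only say $\|A\|_2 \ge \max_{\lambda \in \Lambda} |\lambda|$.)

Example Problem 1.29. Prove that $\|AB\|_2 \le \|A\|_2\,\|B\|_2$.

Solution:

\[ \|AB\|_2 =_{\mathrm{df}} \max_{x \ne 0} \frac{\|ABx\|_2}{\|x\|_2} \le \max_{x \ne 0} \frac{\|A\|_2\,\|Bx\|_2}{\|x\|_2} = \|A\|_2\,\|B\|_2. \]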
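These statements about the subordinate two-norm can be explored numerically. The following octave/Matlab sketch is an illustration under stated assumptions, not part of the original notes: the matrices `A`, `B`, `C` and all variable names are arbitrary choices, `A` is deliberately symmetric so that the eigenvalue formula applies, and `C` is a non-symmetric matrix included to show that the formula should not be expected to hold without that assumption. The built-in `norm(A, 2)` is compared with a crude sampling of $\|Ax\|_2 / \|x\|_2$ over unit vectors.

```octave
A = [2 1; 1 3];                        % symmetric example matrix
B = [0 1; -1 4];                       % arbitrary second matrix
nA = norm(A, 2);                       % subordinate two-norm of A
rho = max(abs(eig(A)));                % largest eigenvalue in absolute value

% Crude estimate of max ||A x|| / ||x|| by sampling unit vectors in the plane.
t = linspace(0, 2*pi, 2001);
X = [cos(t); sin(t)];                  % each column is a unit vector
est = max(sqrt(sum((A * X).^2, 1)));   % largest ||A x||_2 over the samples

% Submultiplicativity: ||A B||_2 <= ||A||_2 ||B||_2.
lhs = norm(A * B, 2);
rhs = norm(A, 2) * norm(B, 2);

% A non-symmetric matrix: its only eigenvalue is 0, yet its two-norm is 1.
C = [0 1; 0 0];
nC = norm(C, 2);
rhoC = max(abs(eig(C)));
fprintf('%g %g %g | %g <= %g | %g vs %g\n', nA, rho, est, lhs, rhs, nC, rhoC);
```

The sampled estimate can only undershoot the true maximum, so it approaches `nA` from below as the sampling is refined.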

Exercises

(1.1) Suppose $f \in O(h^k)$. Show that $f \in O(h^m)$ for any $m$ with $0 < m < k$. (Hint: Take $h^* < 1$.) Note this may appear counterintuitive, unless you remember that $O(h^k)$ is a better approximation than $O(h^m)$ when $m < k$.

(1.2) Suppose $f \in O(h^k)$, and $g \in O(h^m)$. Show that $fg \in O(h^{k+m})$.

(1.3) Suppose $f \in O(h^k)$, and $g \in O(h^m)$, with $m < k$. Show that $f + g \in O(h^m)$.

(1.4) Prove that $f(h) = -3h^5$ is in $O(h^5)$.

(1.5) Prove that $f(h) = h^2 + 5h^{17}$ is in $O(h^2)$.

(1.6) Prove that $f(h) = h^3$ is not in $O(h^4)$. (Hint: Proof by contradiction.)

(1.7) Prove that $\sin(h)$ is in $O(h)$.

(1.8) Find a $O(h^3)$ approximation to $\sin h$.

(1.9) Find a $O(h^4)$ approximation to $\ln(1 + h)$. Compare the approximate value to the actual when $h = 0.1$. How does this approximation compare to the $O(h^3)$ approximate from Example 1.7 for $h = 0.1$?

(1.10) Suppose that $f \in O(h^k)$. Can you show that $f' \in O(h^{k-1})$?

(1.11) Rewrite $\sqrt{x + 1} - 1$ to get rid of subtractive cancellation when $x \approx 0$.

(1.12) Rewrite $\sqrt{x + 1} - \sqrt{x}$ to get rid of subtractive cancellation when $x$ is very large.

(1.13) Use a Taylor's expansion to rid the expression $1 - \cos x$ of subtractive cancellation for $x$ small. Use a $O(x^5)$ approximate.

(1.14) Use a Taylor's expansion to rid the expression $1 - \cos^2 x$ of subtractive cancellation for $x$ small. Use a $O(x^6)$ approximate.

(1.15) Calculate $\cos(\pi/2 + 0.001)$ to within 8 decimal places by using the Taylor's expansion.

(1.16) Prove that if $x$ is an eigenvector of $A$ then $\alpha x$ is also an eigenvector of $A$, for the same eigenvalue. Here $\alpha$ is a nonzero real number.

(1.17) Prove, by induction, that if $\lambda$ is an eigenvalue of $A$ then $\lambda^k$ is an eigenvalue of $A^k$ for integer $k > 1$. The base case was done in Example Problem 1.26.

(1.18) Let $B = \sum_{i=0}^{k} \alpha_i A^i$, where $A^0 = I$. Prove that if $\lambda$ is an eigenvalue of $A$, then $\sum_{i=0}^{k} \alpha_i \lambda^i$ is an eigenvalue of $B$. Thus for polynomial $p(x)$, $p(\lambda)$ is an eigenvalue of $p(A)$.

(1.19) Suppose $A$ is an invertible matrix with eigenvalue $\lambda$. Prove that $\lambda^{-1}$ is an eigenvalue for $A^{-1}$.

(1.20) Suppose that the eigenvalues of $A$ are 1, 10, 100. Give the eigenvalues of $B = 3A^3 - 4A^2 + I$. Show that $B$ is singular.

(1.21) Show that if $\|x\|_2 = r$, then $x$ is on a sphere centered at the origin of radius $r$, in $\mathbb{R}^n$.

(1.22) If $\|x\|_2 = 0$, what does this say about vector $x$?

(1.23) Letting $x = [3\ 4\ 12]^\top$, what is $\|x\|_2$?

(1.24) What is the norm of

\[ A = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ 0 & 1/2 & 0 & \cdots & 0 \\ 0 & 0 & 1/3 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1/n \end{bmatrix}? \]

(1.25) Show that $\|A\|_2 = 0$ implies that $A$ is the matrix of all zeros.

(1.26) Show that $\|A^{-1}\|_2$ equals $1/|\lambda_{\min}|$, where $\lambda_{\min}$ is the smallest, in absolute value, eigenvalue of $A$.

(1.27) Suppose there is some $\mu > 0$ such that, for a given $A$,

\[ \|Av\|_2 \ge \mu\,\|v\|_2, \]

for all vectors $v$.
    (a) Show that $\mu \le \|A\|_2$. (Should be very simple.)
    (b) Show that $A$ is nonsingular. (Recall: $A$ is singular if there is some $x \ne 0$ such that $Ax = 0$.)
    (c) Show that $\|A^{-1}\|_2 \le 1/\mu$.

(1.28) If $A$ is singular, is it necessarily the case that $\|A\|_2 = 0$?

(1.29) If $\|A\|_2 \ge \mu > 0$ does it follow that $A$ is nonsingular?

(1.30) Towards proving the equality in Example 1.28, prove that if $\Lambda$ is the set of eigenvalues of $A$, then

\[ \|A\| \ge \max_{\lambda \in \Lambda} |\lambda|, \]

where $\|\cdot\|$ is any subordinate matrix norm. The inequality in the other direction holds when the norm is $\|\cdot\|_2$, but is difficult to prove.


Chapter 2

A “Crash” Course in octave/Matlab

2.1 Getting Started

Matlab is a software package that allows you to program the mathematics of an algorithm without getting too bogged down in the details of data structures, pointers, and reinventing the wheel. It also includes graphing capabilities, and there are numerous packages available for all kinds of functions, enabling relatively high-level programming. Unfortunately, it also costs quite a bit of money, which is why I recommend the free Matlab clone, octave, available under the GPL¹, freely downloadable from http://www.octave.org.

In a lame attempt to retain market share, Mathworks continues to tinker with Matlab to make it noncompatible with octave; this has the side effect of obsoletizing old Matlab code. I will try to focus on the intersection of the two systems, except where explicitly noted otherwise. What follows, then, is an introduction to octave; Matlab users will have to make some changes.

You can find a number of octave/Matlab tutorials for free on the web; many of them are certainly better than this one. A brief web search reveals the following excellent tutorials:

• http://www.math.mtu.edu/~msgocken/intro/intro.html
• http://www.cyclismo.org/tutorial/matlab/vector.html
• http://web.ew.usna.edu/~mecheng/DESIGN/CAD/MATLAB/usna.html

Matlab has some demo programs covering a number of topics, from the most basic functionality to the more arcane toolboxes. In Matlab, simply type demo.

What follows is a lame demo for octave. Start up octave. You should get something like:

    GNU Octave, version 2.1.44 (i686-pc-linux-gnu).
    Copyright (C) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003 John W. Eaton.
    This is free software; see the source code for copying conditions.
    There is ABSOLUTELY NO WARRANTY; not even for MERCHANTIBILITY or
    FITNESS FOR A PARTICULAR PURPOSE. For details, type ‘warranty’.

    Please contribute if you find this software useful.
    For more information, visit http://www.octave.org/help-wanted.html

    Report bugs to <bug-octave@bevo.che.wisc.edu>.

    octave:1>

¹ Gnu Public License. See http://www.gnu.org/copyleft/gpl.html.

You now have a command line. The basic octavian data structure is a matrix; a scalar is a 1 × 1 matrix, a vector is an n × 1 matrix. Some simple matrix constructions are as follows:

    octave:2> a = [1 2 3]
    a =
      1  2  3
    octave:3> b = [5;4;3]
    b =
      5
      4
      3
    octave:4> c = a'
    c =
      1
      2
      3
    octave:5> d = 5*c - 2 * b
    d =
      -5
       2
       9

You should notice that octave “echoes” the lvalues it creates. This is either a feature or an annoyance. It can be prevented by appending a semicolon at the end of a command. Thus the previous becomes

    octave:5> d = 5*c - 2 * b;
    octave:6>

For illustration purposes, I am leaving the semicolon off. To access an entry or entries of a matrix, use parentheses. In the case where the variable is a vector, you only need give a single index, as shown below; when the variable is a matrix, you need give both indices. You can also give a range of indices, as in what follows.

WARNING: vectors and matrices in octave/Matlab are indexed starting from 1, and not from 0, as is more common in modern programming languages. You are warned! Moreover, the last index of a vector is denoted by the special symbol end.

    octave:6> a(1) = 77
    a =
      77  2  3
    octave:7> a(end) = -400
    a =
      77  2  -400

    octave:8> a(2:3) = [22 333]
    a =
      77  22  333
    octave:9> M = diag(a)
    M =
      77   0    0
       0  22    0
       0   0  333
    octave:10> M(2,1) = 14
    M =
      77   0    0
      14  22    0
       0   0  333
    octave:11> M(1:2,1:2) = [1 2;3 4]
    M =
      1  2    0
      3  4    0
      0  0  333

The command diag(v) returns a matrix with v as the diagonal, if v is a vector. diag(M) returns the diagonal of matrix M as a vector.

The form c:d returns a row vector of the integers between c and d, as we will examine later. First we look at matrix manipulation:

    octave:12> j = M * b
    j =
       13
       31
      999
    octave:13> N = rand(3,3)
    N =
      0.166880  0.027866  0.087402
      0.706307  0.624716  0.067067
      0.911833  0.769423  0.938714
    octave:14> L = M + N
    L =
        1.166880    2.027866    0.087402
        3.706307    4.624716    0.067067
        0.911833    0.769423  333.938714
    octave:15> P = L * M
    P =

      7.2505e+00  1.0445e+01  2.9105e+01
      1.7580e+01  2.5911e+01  2.2333e+01
      3.2201e+00  4.9014e+00  1.1120e+05
    octave:16> P = L .* M
    P =
      1.1669e+00  4.0557e+00  0.0000e+00
      1.1119e+01  1.8499e+01  0.0000e+00
      0.0000e+00  0.0000e+00  1.1120e+05
    octave:17> x = M \ b
    x =
      -6.0000000
       5.5000000
       0.0090090
    octave:18> err = M * x - b
    err =
      0
      0
      0

Note the difference between L * M and L .* M; the former is matrix multiplication, the latter is element by element multiplication, i.e., $(L \mathbin{.*} M)_{i,j} = L_{i,j} M_{i,j}$.

The command rand(m,n) gives an m × n matrix with each element “uniformly distributed” on [0, 1]. For a zero mean normal distribution with unit variance, use randn(m,n).

In line 17 we asked octave to solve the linear system Mx = b, by setting $x = M \backslash b = M^{-1} b$.

Note that you can construct matrices directly as you did vectors:

    octave:19> B = [1 3 4 5;2 -2 2 -2]
    B =
      1   3  4   5
      2  -2  2  -2

You can also create row vectors as a sequence, either using the form c:d or the form c:e:d, which give, respectively, c, c + 1, . . . , d, and c, c + e, . . . , d (or something like it if e does not divide d − c) as follows:

    octave:20> z = 1:5
    z =
      1  2  3  4  5

    octave:21> z = 5:(-1):1
    z =
      5  4  3  2  1
    octave:22> z = 5:(-2):1
    z =
      5  3  1
    octave:23> z = 2:3:11
    z =
      2  5  8  11
    octave:24> z = 2:3:10
    z =
      2  5  8

Matrices and vectors can be constructed “blockwise.” Blocks in the same row are separated by a comma, those in the same column by a semicolon. Thus

    octave:2> y=[2 7 9]
    y =
      2  7  9
    octave:3> m = [z;y]
    m =
      2  5  8
      2  7  9
    octave:4> k = [(3:4)', m]
    k =
      3  2  5  8
      4  2  7  9

2.2 Useful Commands

Here’s a none too complete listing of useful commands in octave:

• help is the most useful command.
• floor(X) returns the largest integer not greater than X. If X is a vector or matrix, it computes the floor element-wise. This behavior is common in octave: many functions which we normally think of as applicable to scalars can be applied to matrices, with the result computed elementwise.
• ceil(X) returns the smallest integer not less than X, computed element-wise.
• sin(X), cos(X), tan(X), atan(X), sqrt(X) return the sine, cosine, tangent, arctangent, square root of X, computed elementwise.
• exp(X) returns $e^X$, elementwise.
• abs(X) returns |X|, elementwise.

• norm(X) returns the norm of X; if X is a vector, this is the $L^2$ norm:
\[ \|X\|_2 = \Bigl( \sum_i X_i^2 \Bigr)^{1/2}, \]
if X is a matrix, it is the matrix norm subordinate to the $L^2$ norm. You can compute other norms with norm(X,p) where p is a number, to get the $L^p$ norm, or with p one of Inf, -Inf, etc.
• zeros(m,n) returns an m × n matrix of all zeros.
• eye(m) returns the m × m identity matrix.
• [m,n] = size(A) returns the number of rows, columns of A. Similarly the functions rows(A) and columns(A) return the number of rows and columns, respectively.
• length(v) returns the length of vector v, or the larger dimension if v is a matrix.
• find(M) returns the indices of the nonzero elements of M. This may not seem helpful at first, but it can be very useful for selecting subsets of data because the indices can be used for selection. Thus, for example, in this code

    octave:1> v = round(20*randn(400,3));
    octave:2> selectv = v(find(v(:,2) == 7),:)

we have selected the rows of v where the element in the second column equals 7. Now you see why leading computer scientists refer to octave/Matlab as “semantically suspect.” It is a very useful language nonetheless, and you should try to learn its quirks rather than resist them.
• diag(v) returns the diagonal matrix with vector v as diagonal. diag(M) returns, as a vector, the diagonal of matrix M. Thus diag(diag(v)) is v for vector v, but diag(diag(M)) is the diagonal part of matrix M.
• toeplitz(v) returns the Toeplitz matrix associated with vector v. That is
\[ \mathtt{toeplitz}(v) = \begin{bmatrix} v(1) & v(2) & v(3) & \cdots & v(n) \\ v(2) & v(1) & v(2) & \cdots & v(n-1) \\ v(3) & v(2) & v(1) & \cdots & v(n-2) \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ v(n) & v(n-1) & v(n-2) & \cdots & v(1) \end{bmatrix} \]
In the more general form, toeplitz(c,r) can be used to return an asymmetric Toeplitz matrix. A matrix which is banded on the cross diagonals is evidently called a “Hankel” matrix:
\[ \mathtt{hankel}(u,v) = \begin{bmatrix} u(1) & u(2) & u(3) & \cdots & u(n) \\ u(2) & u(3) & u(4) & \cdots & v(2) \\ u(3) & u(4) & u(5) & \cdots & v(3) \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ u(n) & v(2) & v(3) & \cdots & v(n) \end{bmatrix} \]
• eig(M) returns the eigenvalues of M. [V, LAMBDA] = eig(M) returns the eigenvectors and eigenvalues of M.
• kron(M,N) returns the Kronecker product of the two matrices. This is a block construction which returns a matrix where each block is an element of M as a scalar multiplied by the whole matrix N.
• flipud(N) flips the vector or matrix N so that its first row is last and vice versa. Similarly fliplr(N) flips left/right.
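As a rough illustration of a few of the commands listed above, here is a small sketch. The matrix v and the other inputs are made up for the example (they are not from the course text), and the variable names are arbitrary.

```octave
% Select rows by a condition with find, then try a few matrix builders.
v = [3 7 1; 4 7 9; 5 2 7; 6 7 0];   % made-up data
rows7 = find(v(:,2) == 7);          % indices of rows whose 2nd column equals 7
selectv = v(rows7,:);               % those rows of v

d = diag(diag([1 2; 3 4]));         % diagonal part of a matrix
T = toeplitz([1 2 3]);              % symmetric Toeplitz matrix
K = kron([1 2], eye(2));            % the blocks 1*I and 2*I, side by side
F = flipud([1 2; 3 4]);             % rows in reverse order
```

Leaving the semicolons off any of these lines echoes the result, which is a quick way to check what each command does.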

2.3 Programming and Control

If you are going to do any serious programming in octave, you should keep your commands in a file. octave loads commands from ‘.m’ files. (The ‘m’ stands for ‘octave.’) If you have the following in a file called myfunc.m:

    function [y1,y2] = myfunc(x1,x2)
    % comments start with a ‘%’
    % this function is useless, except as an example of functions.
    % input:
    %   x1  a number
    %   x2  another number
    % output:
    %   y1  some output
    %   y2  some output
    y1 = cos(x1) .* sin(x2);
    y2 = norm(y1);

then you can call this function from octave, as follows:

    octave:1> myfunc(2,3)
    ans = -0.058727
    octave:2> [a,b] = myfunc(2,3)
    a = -0.058727
    b = 0.058727
    octave:3> [a,b] = myfunc([1 2 3 4],[1 2 3 4])
    a =
      0.45465  -0.37840  -0.13971  0.49468
    b = 0.78366

Note this silly function will throw an error if x1 and x2 are not of the same size.

It is recommended that you write your functions so that they can take scalar and vector input where appropriate. For example, the octave builtin sine function can take a scalar and output a scalar, or take a vector and output a vector which is, elementwise, the sine of the input. It is not too difficult to write functions this way; it often only requires judicious use of .* multiplies instead of * multiplies. For example, if the file myfunc.m were changed to read

    y1 = cos(x1) * sin(x2);

it could easily crash if x1 and x2 were vectors of the same size because matrix multiplication is not defined for an n × 1 matrix times another n × 1 matrix.

An .m file does not have to contain a function; it can merely contain some octave commands. For example, putting the following into runner.m:

    x1 = rand(4,3);
    x2 = rand(size(x1));
    [a,b] = myfunc(x1,x2)

octave allows you to call this script without arguments:

    octave:4> runner
    a =
      0.245936  0.478054  0.535323
      0.246414  0.186454  0.206279
      0.542728  0.419457  0.083917
      0.257607  0.378558  0.768188
    b = 1.3135

octave has to know where your .m file is. It will look in the directory from which it was called. You can set this to something else with cd or chdir.

You can also use the octave builtin function feval to evaluate a function by name. For example, the following is a different way of calling myfunc.m:

    octave:5> [a,b] = feval("myfunc",2,3)
    a = -0.058727
    b = 0.058727

In this form feval seems like a way of using more keystrokes to get the same result. However, you can pass a variable function name as well:

    octave:6> fname = "myfunc"
    fname = myfunc
    octave:7> [a,b] = feval(fname,2,3)
    a = -0.058727
    b = 0.058727

This allows you to effectively pass functions to other functions.
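To make the last remark concrete, here is a minimal sketch in which the function to evaluate is picked by name at run time. It uses only builtin functions so that it runs without myfunc.m; the cell array fnames is just a convenient container for the names and is not otherwise discussed in these notes.

```octave
% Tabulate several functions, chosen by name, at the same points.
x = [0 pi/2 pi];
fnames = {'sin', 'cos', 'sqrt'};        % names of builtin functions
vals = zeros(length(fnames), length(x));
for k = 1:length(fnames)
  vals(k,:) = feval(fnames{k}, x);      % same call, different function each time
end
```

Row k of vals holds the k-th function evaluated at x. A routine that receives a name such as fname as an argument can call feval in the same way.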

2.3.1 Logical Forks and Control

octave has the regular assortment of ‘if-then-else’ and ‘for’ and ‘while’ loops. These take the following form:

    if expr1
      statements
    elseif expr2
      statements
    elseif expr3
      statements
    ...
    else
      statements
    end

    for var=vector
      statements
    end

    while expr
      statements
    end

Note that the word end is one of the most overloaded in octave/Matlab. It stands for the last index of a vector or matrix, as well as the exit point for for loops, if statements, switches, etc. To simplify debugging, it is also permissible to use endif to end an if statement, endfor to end a for loop, etc.

The test expressions may use the logical conditionals: >, <, >=, <=, ==, ~=. Do not use the assignment operator = as a conditional, as this can lead to disastrous results, as in C.

Here are some examples:

    %compute the sign of x
    if x > 0
      s = 1;
    elseif x == 0
      s = 0;
    else
      s = -1;
    end

    %a ‘regular’ for loop
    for i=1:10
      sm = sm + i;
    end

    %an ‘irregular’ for loop
    for i=[1 2 3 5 8 13 21 34]
      fsm = fsm + i;
    end

    while (sin(x) > 0)
      x = x * pi;
    end

2.4 Plotting

Plotting is one area in which there are some noticeable differences between octave and Matlab. The commands and examples given herein are for octave, but the commands for Matlab are not too different. octave ships its plotting commands to Gnuplot.

The main plot command is plot. You may also use semilogx, semilogy, loglog for 2D plots with log axes, and contour and mesh for 3D plots. Use the help command to get the specific syntax for each command. We present some examples:

    n = 100;
    X = pi .* ((1:n) ./ n);
    Y = sin(X);
    %just plot Y
    plot(Y);

    %plot Y, but with the ‘right’ X axis labels
    plot(X,Y);
    W = sqrt(Y);
    plot(W);
    %plot W, but with the ‘right’ X axis labels
    plot(Y,W);

The output from these commands is seen in Figure 2.1. In particular, you should note the difference between plotting a vector, as in Figure 2.1(c), versus plotting the same vector but with the appropriate abscissa values, as in Figure 2.1(d).

[Figure 2.1: Four plots from octave. (a) Y = sin(X); (b) Y versus X; (c) W = √Y; (d) W versus Y.]

Some magic commands are required to plot to a file. For octave, I recommend the following magic formula, which replots the figure to a file:

    %call the plot commands before this line
    gset term postscript color;
    gset output "filename.ps";
    replot;
    gset term x11;
    gset output "/dev/null";

In Matlab, the commands are something like this:

    %call the plot commands before this line
    print(gcf,'-deps','filename.eps');

Exercises

(2.1) What do the following pieces of octave/Matlab code accomplish?
  (a) x = (0:40) ./ 40;
  (b) a = 2; b = 5; x = a + (b-a) .* (0:40) ./ 40;
  (c) x = a + (b-a) .* (0:40) ./ 40; y = sin(x); plot(x,y);

(2.2) Implement the naïve quadratic formula to find the roots of $x^2 + bx + c = 0$, for real b, c. Your code should return
\[ \frac{-b \pm \sqrt{b^2 - 4c}}{2}. \]
Your m-file should have header line like:

    function [x1,x2] = naivequad(b,c)

Test your code for $(b, c) = (1 \times 10^{15}, 1)$. Do you get a spurious root?

(2.3) Implement a robust quadratic formula (cf. Example Problem 1.9) to find the roots of $x^2 + bx + c = 0$. Your m-file should have header line like:

    function [x1,x2] = robustquad(b,c)

Test your code for $(b, c) = (1 \times 10^{15}, 1)$. Do you get a spurious root?

(2.4) Write octave/Matlab code to find a fixed point for the cosine, i.e., some x such that x = cos(x). Do this as follows: pick some initial value $x_0$, then let $x_{i+1} = \cos(x_i)$ for i = 0, 1, . . . , n. Pick n to be reasonably large, or choose some convergence criterion (i.e., terminate if $|x_{i+1} - x_i| < 1 \times 10^{-10}$). Does your code always converge?

(2.5) Write code to implement the factorial function for integers:

    function [nfact] = factorial(n)

where n factorial is equal to 1 · 2 · 3 · · · (n − 1) · n. Either use a ‘for’ loop, or write the function to recursively call itself.

Chapter 3

**Solving Linear Systems**

A number of problems in numerical analysis can be reduced to, or approximated by, a system of linear equations.

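3.1 Gaussian Elimination with Naïve Pivoting

Our goal is the automatic solution of systems of linear equations:
\[ \begin{array}{ccccccccccc} a_{11} x_1 &+& a_{12} x_2 &+& a_{13} x_3 &+& \cdots &+& a_{1n} x_n &=& b_1 \\ a_{21} x_1 &+& a_{22} x_2 &+& a_{23} x_3 &+& \cdots &+& a_{2n} x_n &=& b_2 \\ a_{31} x_1 &+& a_{32} x_2 &+& a_{33} x_3 &+& \cdots &+& a_{3n} x_n &=& b_3 \\ \vdots & & \vdots & & \vdots & & \ddots & & \vdots & & \vdots \\ a_{n1} x_1 &+& a_{n2} x_2 &+& a_{n3} x_3 &+& \cdots &+& a_{nn} x_n &=& b_n \end{array} \]
In these equations, the $a_{ij}$ and $b_i$ are given real numbers. We also write this as Ax = b, where A is a matrix, whose element in the i-th row and j-th column is $a_{ij}$, and b is a column vector, whose i-th entry is $b_i$. This gives the easier way of writing this equation:
\[ \begin{bmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ a_{21} & a_{22} & a_{23} & \cdots & a_{2n} \\ a_{31} & a_{32} & a_{33} & \cdots & a_{3n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & a_{n3} & \cdots & a_{nn} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \\ \vdots \\ b_n \end{bmatrix} \qquad (3.1) \]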

3.1.1 Elementary Row Operations

You may remember that one way to solve linear equations is by applying elementary row operations to a given equation of the system. For example, if we are trying to solve the given system of equations, they should have the same solution as the following system:
\[ \begin{bmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ a_{21} & a_{22} & a_{23} & \cdots & a_{2n} \\ a_{31} & a_{32} & a_{33} & \cdots & a_{3n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \kappa a_{i1} & \kappa a_{i2} & \kappa a_{i3} & \cdots & \kappa a_{in} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & a_{n3} & \cdots & a_{nn} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_i \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \\ \vdots \\ \kappa b_i \\ \vdots \\ b_n \end{bmatrix} \]
where κ is some given number which is not zero. It suffices to solve this system of linear equations, as it has the same solution(s) as our original system. Multiplying a row of the system by a nonzero constant is one of the elementary row operations.

The second elementary row operation is to replace a row by the sum of that row and a constant times another. Thus, for example, the following system of equations has the same solution as the original system:
\[ \begin{bmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ a_{21} & a_{22} & a_{23} & \cdots & a_{2n} \\ a_{31} & a_{32} & a_{33} & \cdots & a_{3n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ a_{(i-1)1} & a_{(i-1)2} & a_{(i-1)3} & \cdots & a_{(i-1)n} \\ a_{i1} + \beta a_{j1} & a_{i2} + \beta a_{j2} & a_{i3} + \beta a_{j3} & \cdots & a_{in} + \beta a_{jn} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & a_{n3} & \cdots & a_{nn} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_{(i-1)} \\ x_i \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \\ \vdots \\ b_{(i-1)} \\ b_i + \beta b_j \\ \vdots \\ b_n \end{bmatrix} \]
We have replaced the i-th row by the i-th row plus β times the j-th row.

The third elementary row operation is to switch rows:
\[ \begin{bmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ a_{31} & a_{32} & a_{33} & \cdots & a_{3n} \\ a_{21} & a_{22} & a_{23} & \cdots & a_{2n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & a_{n3} & \cdots & a_{nn} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} b_1 \\ b_3 \\ b_2 \\ \vdots \\ b_n \end{bmatrix} \]
We have here switched the second and third rows. The purpose of this e.r.o. is mainly to make things look nice.

Note that none of the e.r.o.’s change the structure of the solution vector x. For this reason, it is customary to drop the solution vector entirely and to write the matrix A and the vector b together in augmented form:
\[ \left[ \begin{array}{ccccc|c} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} & b_1 \\ a_{21} & a_{22} & a_{23} & \cdots & a_{2n} & b_2 \\ a_{31} & a_{32} & a_{33} & \cdots & a_{3n} & b_3 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ a_{n1} & a_{n2} & a_{n3} & \cdots & a_{nn} & b_n \end{array} \right] \]
The idea of Gaussian Elimination is to use the elementary row operations to put a system into upper triangular form then use back substitution. We’ll give an example here:

Example Problem 3.1. Solve the set of linear equations:
\[ \begin{aligned} x_1 + x_2 - x_3 &= 2 \\ -3x_1 - 4x_2 + 4x_3 &= -7 \\ 2x_1 + 1x_2 + 1x_3 &= 7 \end{aligned} \]
Solution: We start by rewriting in the augmented form:
\[ \left[ \begin{array}{ccc|c} 1 & 1 & -1 & 2 \\ -3 & -4 & 4 & -7 \\ 2 & 1 & 1 & 7 \end{array} \right] \]
We add 3 times the first row to the second, and −2 times the first row to the third to get:
\[ \left[ \begin{array}{ccc|c} 1 & 1 & -1 & 2 \\ 0 & -1 & 1 & -1 \\ 0 & -1 & 3 & 3 \end{array} \right] \]
We now add −1 times the second row to the third row to get:
\[ \left[ \begin{array}{ccc|c} 1 & 1 & -1 & 2 \\ 0 & -1 & 1 & -1 \\ 0 & 0 & 2 & 4 \end{array} \right] \]
The matrix is now in upper triangular form: there are no nonzero entries below the diagonal. This corresponds to the set of equations:
\[ \begin{aligned} x_1 + x_2 - x_3 &= 2 \\ -x_2 + x_3 &= -1 \\ 2x_3 &= 4 \end{aligned} \]
We now solve this by back substitution. Because the matrix is in upper triangular form, we can solve $x_3$ by looking only at the last equation; namely $x_3 = 2$. Once $x_3$ is known, the second equation involves only one unknown, $x_2$, and is solved by $x_2 = 3$. Then the first equation has only one unknown, and is solved by $x_1 = 1$.

All sorts of funny things can happen when you attempt Gaussian Elimination: it may turn out that your system has no solution, or has a single solution (as above), or an infinite number of solutions. We should expect that an algorithm for automatic solution of systems of equations should detect these problems.
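The back substitution step can be sketched in a few lines of octave. This is only an illustration on the upper triangular system obtained in Example Problem 3.1; the names U, c and x are chosen here, and the code assumes every diagonal entry of U is nonzero.

```octave
% Back substitution on the upper triangular system of Example Problem 3.1.
U = [1 1 -1; 0 -1 1; 0 0 2];
c = [2; -1; 4];
n = length(c);
x = zeros(n,1);
for i = n:-1:1
  % remove the contribution of the unknowns already found, then divide
  x(i) = (c(i) - U(i,i+1:n) * x(i+1:n)) / U(i,i);
end
```

For i = n the range i+1:n is empty, so the product contributes zero and the last equation is solved by a single division.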

3.1.2 Algorithm Terminology

The method outlined above is fine for solving small systems. We should like to devise an algorithm for doing the same thing which can be applied to large systems of equations. The algorithm will take the system (in augmented form):
\[ \left[ \begin{array}{ccccc|c} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} & b_1 \\ a_{21} & a_{22} & a_{23} & \cdots & a_{2n} & b_2 \\ a_{31} & a_{32} & a_{33} & \cdots & a_{3n} & b_3 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ a_{n1} & a_{n2} & a_{n3} & \cdots & a_{nn} & b_n \end{array} \right] \]
The algorithm then selects the first row as the pivot equation or pivot row, and the first element of the first row, $a_{11}$, is the pivot element. The algorithm then pivots on the pivot element to get the system:
\[ \left[ \begin{array}{ccccc|c} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} & b_1 \\ 0 & a'_{22} & a'_{23} & \cdots & a'_{2n} & b'_2 \\ 0 & a'_{32} & a'_{33} & \cdots & a'_{3n} & b'_3 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & a'_{n2} & a'_{n3} & \cdots & a'_{nn} & b'_n \end{array} \right] \]
where
\[ a'_{ij} = a_{ij} - \frac{a_{i1}}{a_{11}} a_{1j}, \qquad b'_i = b_i - \frac{a_{i1}}{a_{11}} b_1 \qquad (2 \le i \le n,\ 1 \le j \le n). \]
Effectively we are carrying out the e.r.o. of replacing the i-th row by the i-th row minus $a_{i1}/a_{11}$ times the first row. The quantity $a_{i1}/a_{11}$ is the multiplier for the i-th row.

Hereafter the algorithm will not alter the first row or first column of the system. Thus, the algorithm could be written recursively. By pivoting on the second row, the algorithm then generates the system:
\[ \left[ \begin{array}{ccccc|c} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} & b_1 \\ 0 & a'_{22} & a'_{23} & \cdots & a'_{2n} & b'_2 \\ 0 & 0 & a''_{33} & \cdots & a''_{3n} & b''_3 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & a''_{n3} & \cdots & a''_{nn} & b''_n \end{array} \right] \]
In this case
\[ a''_{ij} = a'_{ij} - \frac{a'_{i2}}{a'_{22}} a'_{2j}, \qquad b''_i = b'_i - \frac{a'_{i2}}{a'_{22}} b'_2 \qquad (3 \le i \le n,\ 1 \le j \le n). \]

3.1.3 Algorithm Problems

The pivoting strategy we examined in this section is called ‘naïve’ because a real algorithm is a bit more complicated. The algorithm we have outlined is far too rigid: it always chooses to pivot on the k-th row during the k-th step. This would be bad if the pivot element were zero; in this case all the multipliers $a_{ik}/a_{kk}$ are not defined.

Bad things can happen if $a_{kk}$ is merely small instead of zero. Consider the following example:

Example 3.2. Solve the system of equations given by the augmented form:
\[ \left[ \begin{array}{cc|c} -0.0590 & 0.2372 & -0.3528 \\ 0.1080 & -0.4348 & 0.6452 \end{array} \right] \]
Note that the exact solution of this system is $x_1 = 10$, $x_2 = 1$. Suppose, however, that the algorithm uses only 4 significant figures for its calculations. The algorithm, naïvely, pivots on the first equation. The multiplier for the second row is
\[ \frac{0.1080}{-0.0590} \approx -1.830508\ldots, \]
which will be rounded to −1.831 by the algorithm.

The second entry in the matrix is replaced by
\[ -0.4348 - (-1.831)(0.2372) = -0.4348 + 0.4343 = -0.0005, \]
where the arithmetic is rounded to four significant figures each time. There is some serious subtractive cancellation going on here. We have lost three figures with this subtraction. The errors get worse from here. Similarly, the second vector entry becomes:
\[ 0.6452 - (-1.831)(-0.3528) = 0.6452 - 0.6460 = -0.0008, \]
where, again, intermediate steps are rounded to four significant figures, and again there is subtractive cancelling. This puts the system in the form
\[ \left[ \begin{array}{cc|c} -0.0590 & 0.2372 & -0.3528 \\ 0 & -0.0005 & -0.0008 \end{array} \right] \]
When the algorithm attempts back substitution, it gets the value
\[ x_2 = \frac{-0.0008}{-0.0005} = 1.6. \]
This is a bit off from the actual value of 1. The algorithm now finds
\[ x_1 = \frac{-0.3528 - 0.2372 \cdot 1.6}{-0.059} = \frac{-0.3528 - 0.3795}{-0.059} = \frac{-0.7323}{-0.059} = 12.41, \]
where each step has rounding to four significant figures. This is also a bit off.

3.2 Pivoting Strategies for Gaussian Elimination

Gaussian Elimination can fail when performed in the wrong order. If the algorithm selects a zero pivot, the multipliers are undefined, which is no good. We also saw that a pivot small in magnitude can cause failure. As here:
\[ \begin{aligned} \epsilon x_1 + x_2 &= 1 \\ x_1 + x_2 &= 2 \end{aligned} \]
The naïve algorithm solves this as
\[ x_2 = \frac{2 - \frac{1}{\epsilon}}{1 - \frac{1}{\epsilon}} = 1 - \frac{\epsilon}{1 - \epsilon}, \qquad x_1 = \frac{1 - x_2}{\epsilon} = \frac{1}{1 - \epsilon}. \]
If ε is very small, then 1/ε is enormous compared to both 1 and 2. With poor rounding, the algorithm solves $x_2$ as 1. Then it solves $x_1 = 0$. This is nearly correct for $x_2$, but is an awful approximation for $x_1$. Note that this choice of $x_1$, $x_2$ satisfies the first equation, but not the second.

Now suppose the algorithm changed the order of the equations, then solved:
\[ \begin{aligned} x_1 + x_2 &= 2 \\ \epsilon x_1 + x_2 &= 1 \end{aligned} \]

The algorithm solves this as
\[ x_2 = \frac{1 - 2\epsilon}{1 - \epsilon}, \qquad x_1 = 2 - x_2. \]
There’s no problem with rounding here.

The problem is not the small entry per se: Suppose we use an e.r.o. to scale the first equation, then use naïve G.E.:
\[ \begin{aligned} x_1 + \tfrac{1}{\epsilon} x_2 &= \tfrac{1}{\epsilon} \\ x_1 + x_2 &= 2 \end{aligned} \]
This is still solved as
\[ x_2 = \frac{2 - \frac{1}{\epsilon}}{1 - \frac{1}{\epsilon}}, \qquad x_1 = \frac{1}{\epsilon} - \frac{1}{\epsilon} x_2, \]
and rounding is still a problem.

3.2.1 Scaled Partial Pivoting

The naïve G.E. algorithm uses the rows 1, 2, . . . , n−1 in order as pivot equations. As shown above, this can cause errors. Better is to pivot first on row $\ell_1$, then row $\ell_2$, etc., until finally pivoting on row $\ell_{n-1}$, for some permutation $\{\ell_i\}_{i=1}^{n}$ of the integers 1, 2, . . . , n. The strategy of scaled partial pivoting is to compute this permutation so that G.E. works well.

In light of our example, we want to pivot on an element which is not small compared to other elements in its row. So our algorithm first determines “smallness” by calculating a scale, row-wise:
\[ s_i = \max_{1 \le j \le n} |a_{ij}|. \]
The scales are only computed once.

Then the first pivot, $\ell_1$, is chosen to be the i such that
\[ \frac{|a_{i,1}|}{s_i} \]
is maximized. The algorithm pivots on row $\ell_1$, producing a bunch of zeros in the first column. Note that the algorithm should not rearrange the matrix; this takes too much work.

The second pivot, $\ell_2$, is chosen to be the i such that
\[ \frac{|a_{i,2}|}{s_i} \]
is maximized, but without choosing $\ell_2 = \ell_1$. The algorithm pivots on row $\ell_2$, producing a bunch of zeros in the second column.

In the k-th step $\ell_k$ is chosen to be the i not among $\ell_1, \ell_2, \ldots, \ell_{k-1}$ such that
\[ \frac{|a_{i,k}|}{s_i} \]
is maximized. The algorithm pivots on row $\ell_k$, producing a bunch of zeros in the k-th column.

The slick way to implement this is to first set $\ell_i = i$ for i = 1, 2, . . . , n. Then rearrange this vector in a kind of “bubble sort”: when you find the index that should be $\ell_1$, swap them, i.e., find the j such that $\ell_j$ should be the first pivot and switch the values of $\ell_1$, $\ell_j$. Then at the k-th step, search only those indices in the tail of this vector: i.e., only among $\ell_j$ for k ≤ j ≤ n, and perform a swap.

3.2.2 An Example

We present an example of using scaled partial pivoting with G.E. It’s hard to come up with an example where the numbers do not come out as ugly fractions. We’ll look at a homework question.
\[ \left[ \begin{array}{cccc|c} 2 & -1 & 3 & 7 & 15 \\ 4 & 4 & 0 & 7 & 11 \\ 2 & 1 & 1 & 3 & 7 \\ 6 & 5 & 4 & 17 & 31 \end{array} \right] \]
The scales are as follows: $s_1 = 7$, $s_2 = 7$, $s_3 = 3$, $s_4 = 17$.

We pick $\ell_1$. It should be the index which maximizes $|a_{i1}|/s_i$. These values are:
\[ \frac{2}{7},\ \frac{4}{7},\ \frac{2}{3},\ \frac{6}{17}. \]
We pick $\ell_1 = 3$, and pivot:
\[ \left[ \begin{array}{cccc|c} 0 & -2 & 2 & 4 & 8 \\ 0 & 2 & -2 & 1 & -3 \\ 2 & 1 & 1 & 3 & 7 \\ 0 & 2 & 1 & 8 & 10 \end{array} \right] \]
We pick $\ell_2$. It should not be 3, and should be the index which maximizes $|a_{i2}|/s_i$. These values are:
\[ \frac{2}{7},\ \frac{2}{7},\ \frac{2}{17}. \]
We have a tie. In this case we pick the second row, i.e., $\ell_2 = 2$. We pivot:
\[ \left[ \begin{array}{cccc|c} 0 & 0 & 0 & 5 & 5 \\ 0 & 2 & -2 & 1 & -3 \\ 2 & 1 & 1 & 3 & 7 \\ 0 & 0 & 3 & 7 & 13 \end{array} \right] \]
The matrix is in permuted upper triangular form. We could proceed, but would get a zero multiplier, and no changes would occur. If we did proceed we would have $\ell_3 = 4$. Then $\ell_4 = 1$. Our row permutation is 3, 2, 4, 1. When we do back substitution, we work in this order reversed on the rows, solving $x_4$, then $x_3$, $x_2$, $x_1$. We get $x_4 = 1$, so
\[ \begin{aligned} x_3 &= \tfrac{1}{3}(13 - 7 \cdot 1) = 2 \\ x_2 &= \tfrac{1}{2}(-3 - 1 \cdot 1 + 2 \cdot 2) = 0 \\ x_1 &= \tfrac{1}{2}(7 - 3 \cdot 1 - 1 \cdot 2 - 1 \cdot 0) = 1 \end{aligned} \]
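The whole procedure, including the index-vector trick, might be sketched as follows. This is an illustrative script rather than a tuned implementation: the variable names (ell, s, m, r) are choices made here, ties are broken by taking the first maximal candidate in the tail of ell, and no test for a zero pivot is made.

```octave
% Gaussian Elimination with scaled partial pivoting, run on the example above.
A = [2 -1 3 7; 4 4 0 7; 2 1 1 3; 6 5 4 17];
b = [15; 11; 7; 31];
n = length(b);
s = max(abs(A), [], 2);          % row scales, computed once
ell = 1:n;                       % index vector, initially 1, 2, ..., n
for k = 1:n-1
  % among the rows not yet used, find the largest scaled entry in column k
  [big, j] = max(abs(A(ell(k:n),k)) ./ s(ell(k:n)));
  j = j + k - 1;
  ell([k j]) = ell([j k]);       % swap entries of ell; the rows of A stay put
  for i = ell(k+1:n)
    m = A(i,k) / A(ell(k),k);    % multiplier for row i
    A(i,k:n) = A(i,k:n) - m * A(ell(k),k:n);
    b(i) = b(i) - m * b(ell(k));
  end
end
% back substitution, visiting rows in the order ell(n), ell(n-1), ..., ell(1)
x = zeros(n,1);
for k = n:-1:1
  r = ell(k);
  x(k) = (b(r) - A(r,k+1:n) * x(k+1:n)) / A(r,k);
end
```

On this matrix the script ends with ell = [3 2 4 1] and x = [1; 0; 2; 1], matching the hand computation. Note that A is never reordered; only the entries of ell are swapped.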

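3.2.3 Another Example and A Real Algorithm

Sometimes we want to solve Ax = b for a number of different vectors b. It turns out we can run G.E. on the matrix A alone and come up with all the multipliers, which can then be used multiple times on different vectors b. We illustrate with an example:
\[ M_0 = \begin{bmatrix} 1 & 2 & 4 & 1 \\ 4 & 2 & 1 & 2 \\ 2 & 1 & 2 & 3 \\ 1 & 3 & 2 & 1 \end{bmatrix}, \qquad \ell = \begin{bmatrix} 1 \\ 2 \\ 3 \\ 4 \end{bmatrix}. \]
The scale vector is $s = \begin{bmatrix} 4 & 4 & 3 & 3 \end{bmatrix}^\top$.

Our scale choices are $\frac{1}{4}, \frac{4}{4}, \frac{2}{3}, \frac{1}{3}$. We choose $\ell_1 = 2$, and swap $\ell_1$, $\ell_2$. In the places where there would be zeros in the real matrix, we will put the multipliers. We will illustrate them here boxed:
\[ M_1 = \begin{bmatrix} \boxed{\tfrac{1}{4}} & \tfrac{3}{2} & \tfrac{15}{4} & \tfrac{1}{2} \\ 4 & 2 & 1 & 2 \\ \boxed{\tfrac{1}{2}} & 0 & \tfrac{3}{2} & 2 \\ \boxed{\tfrac{1}{4}} & \tfrac{5}{2} & \tfrac{7}{4} & \tfrac{1}{2} \end{bmatrix}, \qquad \ell = \begin{bmatrix} 2 \\ 1 \\ 3 \\ 4 \end{bmatrix}. \]
Our scale choices are $\frac{3}{8}, 0, \frac{5}{6}$. We choose $\ell_2 = 4$, and so swap $\ell_2$, $\ell_4$:
\[ M_2 = \begin{bmatrix} \boxed{\tfrac{1}{4}} & \boxed{\tfrac{3}{5}} & \tfrac{27}{10} & \tfrac{1}{5} \\ 4 & 2 & 1 & 2 \\ \boxed{\tfrac{1}{2}} & \boxed{0} & \tfrac{3}{2} & 2 \\ \boxed{\tfrac{1}{4}} & \tfrac{5}{2} & \tfrac{7}{4} & \tfrac{1}{2} \end{bmatrix}, \qquad \ell = \begin{bmatrix} 2 \\ 4 \\ 3 \\ 1 \end{bmatrix}. \]
Our scale choices are $\frac{27}{40}, \frac{1}{2}$. We choose $\ell_3 = 1$, and so swap $\ell_3$, $\ell_4$:
\[ M_3 = \begin{bmatrix} \boxed{\tfrac{1}{4}} & \boxed{\tfrac{3}{5}} & \tfrac{27}{10} & \tfrac{1}{5} \\ 4 & 2 & 1 & 2 \\ \boxed{\tfrac{1}{2}} & \boxed{0} & \boxed{\tfrac{5}{9}} & \tfrac{17}{9} \\ \boxed{\tfrac{1}{4}} & \tfrac{5}{2} & \tfrac{7}{4} & \tfrac{1}{2} \end{bmatrix}, \qquad \ell = \begin{bmatrix} 2 \\ 4 \\ 1 \\ 3 \end{bmatrix}. \]
Now suppose we had to solve the linear system for $b = \begin{bmatrix} -1 & 8 & 2 & 1 \end{bmatrix}^\top$.

We scale b by the multipliers in order: $\ell_1 = 2$, so, we sweep through the first column of $M_3$, picking off the boxed numbers (your computer doesn’t really have boxed variables), and scaling b appropriately:
\[ \begin{bmatrix} -1 \\ 8 \\ 2 \\ 1 \end{bmatrix} \Rightarrow \begin{bmatrix} -3 \\ 8 \\ -2 \\ -1 \end{bmatrix} \]
This continues:
\[ \begin{bmatrix} -3 \\ 8 \\ -2 \\ -1 \end{bmatrix} \Rightarrow \begin{bmatrix} -\tfrac{12}{5} \\ 8 \\ -2 \\ -1 \end{bmatrix} \Rightarrow \begin{bmatrix} -\tfrac{12}{5} \\ 8 \\ -\tfrac{2}{3} \\ -1 \end{bmatrix} \]
We then perform a permuted backwards substitution on the augmented system
\[ \left[ \begin{array}{cccc|c} 0 & 0 & \tfrac{27}{10} & \tfrac{1}{5} & -\tfrac{12}{5} \\ 4 & 2 & 1 & 2 & 8 \\ 0 & 0 & 0 & \tfrac{17}{9} & -\tfrac{2}{3} \\ 0 & \tfrac{5}{2} & \tfrac{7}{4} & \tfrac{1}{2} & -1 \end{array} \right] \]
This proceeds as
\[ \begin{aligned} x_4 &= \frac{-2}{3} \cdot \frac{9}{17} = \frac{-6}{17} \\ x_3 &= \frac{10}{27} \left( -\frac{12}{5} - \frac{1}{5} \cdot \frac{-6}{17} \right) = \ldots \\ x_2 &= \frac{2}{5} \left( -1 - \frac{1}{2} \cdot \frac{-6}{17} - \frac{7}{4} x_3 \right) = \ldots \\ x_1 &= \frac{1}{4} \left( 8 - 2 \cdot \frac{-6}{17} - x_3 - 2 x_2 \right) = \ldots \end{aligned} \]
Fill in your own values here.

3.3 LU Factorization

We examined G.E. to solve the system Ax = b, where A is a matrix:
\[ A = \begin{bmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ a_{21} & a_{22} & a_{23} & \cdots & a_{2n} \\ a_{31} & a_{32} & a_{33} & \cdots & a_{3n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & a_{n3} & \cdots & a_{nn} \end{bmatrix}. \]
We want to show that G.E. actually factors A into lower and upper triangular parts, that is A = LU, where
\[ L = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ \ell_{21} & 1 & 0 & \cdots & 0 \\ \ell_{31} & \ell_{32} & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ \ell_{n1} & \ell_{n2} & \ell_{n3} & \cdots & 1 \end{bmatrix}, \qquad U = \begin{bmatrix} u_{11} & u_{12} & u_{13} & \cdots & u_{1n} \\ 0 & u_{22} & u_{23} & \cdots & u_{2n} \\ 0 & 0 & u_{33} & \cdots & u_{3n} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & u_{nn} \end{bmatrix}. \]
We call this a LU Factorization of A.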

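3.3.1 An Example

We consider solution of the following augmented form:
\[ \left[ \begin{array}{cccc|c} 2 & 1 & 1 & 3 & 7 \\ 4 & 4 & 0 & 7 & 11 \\ 6 & 5 & 4 & 17 & 31 \\ 2 & -1 & 0 & 7 & 15 \end{array} \right] \qquad (3.2) \]
The naïve G.E. reduces this to
\[ \left[ \begin{array}{cccc|c} 2 & 1 & 1 & 3 & 7 \\ 0 & 2 & -2 & 1 & -3 \\ 0 & 0 & 3 & 7 & 13 \\ 0 & 0 & 0 & 12 & 18 \end{array} \right] \]
We are going to run the naïve G.E., and see how it is a LU Factorization. Since this is the naïve version, we first pivot on the first row. Our multipliers are 2, 3, 1. We pivot to get
\[ \left[ \begin{array}{cccc|c} 2 & 1 & 1 & 3 & 7 \\ 0 & 2 & -2 & 1 & -3 \\ 0 & 2 & 1 & 8 & 10 \\ 0 & -2 & -1 & 4 & 8 \end{array} \right] \]
Careful inspection shows that we’ve merely multiplied A and b by a lower triangular matrix $M_1$:
\[ M_1 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ -2 & 1 & 0 & 0 \\ -3 & 0 & 1 & 0 \\ -1 & 0 & 0 & 1 \end{bmatrix} \]
The entries in the first column are the negative e.r.o. multipliers for each row. Thus after the first pivot, it is like we are solving the system
\[ M_1 A x = M_1 b. \]
We pivot on the second row to get:
\[ \left[ \begin{array}{cccc|c} 2 & 1 & 1 & 3 & 7 \\ 0 & 2 & -2 & 1 & -3 \\ 0 & 0 & 3 & 7 & 13 \\ 0 & 0 & -3 & 5 & 5 \end{array} \right] \]
The multipliers are 1, −1. We can view this pivot as a multiplication by $M_2$, with
\[ M_2 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & -1 & 1 & 0 \\ 0 & 1 & 0 & 1 \end{bmatrix} \]
We are now solving
\[ M_2 M_1 A x = M_2 M_1 b. \]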

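We pivot on the third row, with a multiplier of −1. Thus we get
\[ \left[ \begin{array}{cccc|c} 2 & 1 & 1 & 3 & 7 \\ 0 & 2 & -2 & 1 & -3 \\ 0 & 0 & 3 & 7 & 13 \\ 0 & 0 & 0 & 12 & 18 \end{array} \right] \]
We have multiplied by $M_3$:
\[ M_3 = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 1 \end{bmatrix} \]
We are now solving
\[ M_3 M_2 M_1 A x = M_3 M_2 M_1 b. \]
But we have an upper triangular form, that is, if we let
\[ U = \begin{bmatrix} 2 & 1 & 1 & 3 \\ 0 & 2 & -2 & 1 \\ 0 & 0 & 3 & 7 \\ 0 & 0 & 0 & 12 \end{bmatrix} \]
then we have
\[ \begin{aligned} M_3 M_2 M_1 A &= U, \\ A &= (M_3 M_2 M_1)^{-1} U, \\ A &= M_1^{-1} M_2^{-1} M_3^{-1} U, \\ A &= LU. \end{aligned} \]
We are hoping that L is indeed lower triangular, and has ones on the diagonal. It turns out that the inverse of each $M_i$ matrix has a nice form (See Exercise (3.6)). We write them here:
\[ L = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 2 & 1 & 0 & 0 \\ 3 & 0 & 1 & 0 \\ 1 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 0 & -1 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & -1 & 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 2 & 1 & 0 & 0 \\ 3 & 1 & 1 & 0 \\ 1 & -1 & -1 & 1 \end{bmatrix} \]
This is really crazy: the matrix L looks to be composed of ones on the diagonal and multipliers under the diagonal.

Now we check to see if we made any mistakes:
\[ LU = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 2 & 1 & 0 & 0 \\ 3 & 1 & 1 & 0 \\ 1 & -1 & -1 & 1 \end{bmatrix} \begin{bmatrix} 2 & 1 & 1 & 3 \\ 0 & 2 & -2 & 1 \\ 0 & 0 & 3 & 7 \\ 0 & 0 & 0 & 12 \end{bmatrix} = \begin{bmatrix} 2 & 1 & 1 & 3 \\ 4 & 4 & 0 & 7 \\ 6 & 5 & 4 & 17 \\ 2 & -1 & 0 & 7 \end{bmatrix} = A. \]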
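The observation that L collects the multipliers can be checked with a short script. The following is a sketch of naïve G.E. that stores each multiplier in L as it is used; it assumes no zero pivot is encountered, and the loop variables are named here for illustration only.

```octave
% Naive Gaussian Elimination that records its multipliers in L.
A = [2 1 1 3; 4 4 0 7; 6 5 4 17; 2 -1 0 7];
n = size(A,1);
L = eye(n);                                % ones on the diagonal
U = A;
for k = 1:n-1
  for i = k+1:n
    L(i,k) = U(i,k) / U(k,k);              % multiplier for row i at step k
    U(i,:) = U(i,:) - L(i,k) * U(k,:);     % zero out entry (i,k)
  end
end
```

For this matrix all the arithmetic is exact, so L * U reproduces A entry for entry, and L and U agree with the matrices found by hand above.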

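3.3.2 Using LU Factorizations

We see that the G.E. algorithm can be used to actually calculate the LU factorization. We will look at this in more detail in another example. We now examine how we can use the LU factorization to solve the equation Ax = b.

Since we have A = LU, we first solve Lz = b, then solve Ux = z. Since L is lower triangular, we can solve for z with a forward substitution. Similarly, since U is upper triangular, we can solve for x with a back substitution. We drag out the previous example (which we never got around to solving):
\[ \left[ \begin{array}{cccc|c} 2 & 1 & 1 & 3 & 7 \\ 4 & 4 & 0 & 7 & 11 \\ 6 & 5 & 4 & 17 & 31 \\ 2 & -1 & 0 & 7 & 15 \end{array} \right] \]
We had found the LU factorization of A as
\[ A = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 2 & 1 & 0 & 0 \\ 3 & 1 & 1 & 0 \\ 1 & -1 & -1 & 1 \end{bmatrix} \begin{bmatrix} 2 & 1 & 1 & 3 \\ 0 & 2 & -2 & 1 \\ 0 & 0 & 3 & 7 \\ 0 & 0 & 0 & 12 \end{bmatrix} \]
So we solve
\[ \begin{bmatrix} 1 & 0 & 0 & 0 \\ 2 & 1 & 0 & 0 \\ 3 & 1 & 1 & 0 \\ 1 & -1 & -1 & 1 \end{bmatrix} z = \begin{bmatrix} 7 \\ 11 \\ 31 \\ 15 \end{bmatrix} \]
We get
\[ z = \begin{bmatrix} 7 \\ -3 \\ 13 \\ 18 \end{bmatrix} \]
Now we solve
\[ \begin{bmatrix} 2 & 1 & 1 & 3 \\ 0 & 2 & -2 & 1 \\ 0 & 0 & 3 & 7 \\ 0 & 0 & 0 & 12 \end{bmatrix} x = \begin{bmatrix} 7 \\ -3 \\ 13 \\ 18 \end{bmatrix} \]
We get the ugly solution
\[ x = \begin{bmatrix} \tfrac{37}{24} \\ -\tfrac{17}{12} \\ \tfrac{5}{6} \\ \tfrac{3}{2} \end{bmatrix} \]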
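The two triangular solves can be sketched as below. The script takes L, U and b from the example above; the loops are written out rather than calling the backslash operator, purely to show the substitutions, and they assume L has ones on its diagonal and U has nonzero diagonal entries.

```octave
% Solve A x = b from A = L U: forward substitution, then back substitution.
L = [1 0 0 0; 2 1 0 0; 3 1 1 0; 1 -1 -1 1];
U = [2 1 1 3; 0 2 -2 1; 0 0 3 7; 0 0 0 12];
b = [7; 11; 31; 15];
n = length(b);
z = zeros(n,1);
for i = 1:n
  z(i) = b(i) - L(i,1:i-1) * z(1:i-1);              % no division: L(i,i) is 1
end
x = zeros(n,1);
for i = n:-1:1
  x(i) = (z(i) - U(i,i+1:n) * x(i+1:n)) / U(i,i);
end
```

Solving for a second right-hand side only requires rerunning the two loops with the new b; the factorization itself is reused.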

3.3.3 Some Theory

We aren’t doing much proving here. The following theorem has an ugly proof in Cheney & Kincaid [7].

Theorem 3.3. If A is an n × n matrix, and naïve Gaussian Elimination does not encounter a zero pivot, then the algorithm generates a LU factorization of A, where L is the lower triangular part of the output matrix, and U is the upper triangular part.

This theorem relies on us using the fancy version of G.E., which saves the multipliers in the spots where there should be zeros. If correctly implemented, then, L is the lower triangular part but with ones put on the diagonal.

This theorem is proved in Cheney & Kincaid [7]. This appears to me to be a case of something which can be better illustrated with an example or two and some informal investigation. The proof is an unilluminating index-chase; read it at your own risk.

3.3.4 Computing Inverses

We consider finding the inverse of A. Since $AA^{-1} = I$, the j-th column of the inverse $A^{-1}$ solves the equation $Ax = e_j$, where $e_j$ is the column matrix of all zeros, but with a one in the j-th position.

Thus we can find the inverse of A by running n linear solves. Obviously we are only going to run G.E. once, to put the matrix in LU form, then run n solves using forward and backward substitutions.

3.4 Iterative Solutions

Recall we are trying to solve
\[ Ax = b. \]
We examine the computational cost of Gaussian Elimination to motivate the search for an alternative algorithm.

3.4.1 An Operation Count for Gaussian Elimination

We consider the number of floating point operations (“flops”) required to solve the system Ax = b. Gaussian Elimination first uses row operations to transform the problem into an equivalent problem of the form $Ux = b'$, where U is upper triangular. Then back substitution is used to solve for x.

First we look at how many floating point operations are required to reduce
\[ \left[ \begin{array}{ccccc|c} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} & b_1 \\ a_{21} & a_{22} & a_{23} & \cdots & a_{2n} & b_2 \\ a_{31} & a_{32} & a_{33} & \cdots & a_{3n} & b_3 \\ \vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\ a_{n1} & a_{n2} & a_{n3} & \cdots & a_{nn} & b_n \end{array} \right] \]

to

\[
\left[\begin{array}{ccccc|c}
a_{11} & a_{12} & a_{13} & \cdots & a_{1n} & b_1 \\
0 & a'_{22} & a'_{23} & \cdots & a'_{2n} & b'_2 \\
0 & a'_{32} & a'_{33} & \cdots & a'_{3n} & b'_3 \\
\vdots & \vdots & \vdots & \ddots & \vdots & \vdots \\
0 & a'_{n2} & a'_{n3} & \cdots & a'_{nn} & b'_n
\end{array}\right]
\]

First a multiplier is computed for each row. Then in each row the algorithm performs $n$ multiplies and $n$ adds. This gives a total of $(n-1) + (n-1)n$ multiplies (counting in the computing of the multiplier in each of the $(n-1)$ rows) and $(n-1)n$ adds. In total this is $2n^2 - n - 1$ floating point operations to do a single pivot on the $n$ by $n$ system.

Then this has to be done recursively on the lower right subsystem, which is an $(n-1)$ by $(n-1)$ system. This requires $2(n-1)^2 - (n-1) - 1$ operations. Then this has to be done on the next subsystem, requiring $2(n-2)^2 - (n-2) - 1$ operations, and so on.

In total, then, we use $I_n$ total floating point operations, with

\[ I_n = 2\sum_{j=1}^{n} j^2 - \sum_{j=1}^{n} j - \sum_{j=1}^{n} 1. \]

Recalling that

\[ \sum_{j=1}^{n} j^2 = \frac{1}{6}\, n(n+1)(2n+1), \qquad \text{and} \qquad \sum_{j=1}^{n} j = \frac{1}{2}\, n(n+1), \]

we find that

\[ I_n = \frac{1}{6}(4n-1)\,n(n+1) - n \approx \frac{2}{3} n^3. \]

Now consider the costs of back substitution. To solve

\[
\left[\begin{array}{ccccc|c}
a_{11} & \cdots & a_{1,n-2} & a_{1,n-1} & a_{1n} & b_1 \\
\vdots & \ddots & \vdots & \vdots & \vdots & \vdots \\
0 & \cdots & a_{n-2,n-2} & a_{n-2,n-1} & a_{n-2,n} & b_{n-2} \\
0 & \cdots & 0 & a_{n-1,n-1} & a_{n-1,n} & b_{n-1} \\
0 & \cdots & 0 & 0 & a_{nn} & b_n
\end{array}\right]
\]

for $x_n$ requires only a single division. Then to solve for $x_{n-1}$ we compute

\[ x_{n-1} = \frac{1}{a_{n-1,n-1}} \left[ b_{n-1} - a_{n-1,n}\, x_n \right], \]

which requires 3 flops. Similarly, solving for $x_{n-2}$ requires 5 flops. Thus in total back substitution requires $B_n$ total floating point operations with

\[ B_n = \sum_{j=1}^{n} (2j - 1) = n(n+1) - n = n^2. \]
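As a sanity check on these counts, the sketch below (an illustration added here, not code from the notes) walks the same loops that naïve Gaussian Elimination and back substitution would execute and tallies one flop per division, multiply, or add. The function names are hypothetical, and the tally follows the counting convention used above: the entries below the pivot are simply set to zero and cost nothing.

```python
def elimination_flops(n):
    """Flops used by naive Gaussian Elimination on an n x n augmented system."""
    flops = 0
    for pivot in range(n - 1):              # pivot column (0-based)
        for row in range(pivot + 1, n):     # each row below the pivot row
            flops += 1                      # one division to form the multiplier
            # remaining matrix entries in the row, plus the right-hand side
            for col in range(pivot + 1, n + 1):
                flops += 2                  # one multiply and one add
    return flops

def back_substitution_flops(n):
    """Flops used by back substitution on an n x n upper triangular system."""
    flops = 0
    for known in range(n):                  # unknowns already solved for
        flops += 2 * known + 1              # 'known' multiply/add pairs, one division
    return flops

for n in (2, 5, 10, 100):
    closed_form = (4 * n - 1) * n * (n + 1) // 6 - n
    print(n, elimination_flops(n), closed_form, back_substitution_flops(n))
```

The loop tallies agree with the closed forms for $I_n$ and $B_n$, and for large $n$ the elimination count is dominated by the $\frac{2}{3}n^3$ term.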

3.4.2 Dividing by Multiplying

We saw that Gaussian Elimination requires around $\frac{2}{3}n^3$ operations just to find the LU factorization, then about $n^2$ operations to solve the system, when $A$ is $n \times n$. When $n$ is large, this may take too long to be practical. Additionally, if $A$ is sparse (has few nonzero elements per row), we would like the complexity of our computations to scale with the sparsity of $A$. Thus we look for an alternative algorithm.

First we consider the simplest case, $n = 1$, i.e., for scalars $A, x, b$. Suppose we are to solve the equation

\[ Ax = b. \]

We solve this by

\[ x = \frac{1}{A}\, b = \frac{1}{\omega A}\, \omega b = \frac{1}{1 - (1 - \omega A)}\, \omega b = \frac{1}{1 - r}\, \omega b, \]

where $\omega \neq 0$ is some real number chosen to "weight" the problem appropriately, and $r = 1 - \omega A$. Now suppose that $\omega$ is chosen such that $|r| < 1$. This can be done so long as $A \neq 0$, which would have been a problem anyway. Now use the geometric expansion:

\[ \frac{1}{1 - r} = 1 + r + r^2 + r^3 + \ldots \]

Because of the assumption $|r| < 1$, the terms $r^n$ converge to zero as $n \to \infty$. This gives the approximate solution to our one dimensional problem as

\[
\begin{aligned}
x &\approx \left[ 1 + r + r^2 + r^3 + \ldots + r^k \right] \omega b \\
  &= \omega b + \left[ r + r^2 + r^3 + \ldots + r^k \right] \omega b \\
  &= \omega b + r \left[ 1 + r + r^2 + \ldots + r^{k-1} \right] \omega b.
\end{aligned}
\]

This suggests an iterative approach to solving $Ax = b$. First let $x^{(0)} = \omega b$, then let

\[ x^{(k)} = \omega b + r\, x^{(k-1)}. \]

The iterates $x^{(k)}$ will converge to the solution of $Ax = b$ if $|r| < 1$.

You should now convince yourself that because $r^n \to 0$, the choice of the initial iterate $x^{(0)}$ was immaterial, i.e., that under any choice of initial iterate convergence is guaranteed.

We now translate this scalar result into the vector case. The algorithm proceeds as follows: first fix some initial estimate of the solution, $x^{(0)}$. A good choice might be $\omega b$, but this is not necessary. Then calculate successive approximations to the actual solution by updates of the form

\[ x^{(k)} = \omega b + (I - \omega A)\, x^{(k-1)}. \]

It turns out that we can consider a slightly more general form of the algorithm, one in which successive iterates are defined implicitly. That is, we consider iterates of the form

\[ Q x^{(k+1)} = (Q - \omega A)\, x^{(k)} + \omega b, \qquad (3.3) \]

for some matrix $Q$, and some scaling factor $\omega$. Note that this update relies on vector additions and possibly on premultiplication of a vector by $A$ or $Q$. In the case where these two matrices are sparse, such an update can be relatively cheap.

Now suppose that as $k \to \infty$, $x^{(k)}$ converges to some vector $x^*$, which is a fixed point of the iteration. Then

\[
\begin{aligned}
Q x^* &= (Q - \omega A)\, x^* + \omega b, \\
Q x^* &= Q x^* - \omega A x^* + \omega b, \\
\omega A x^* &= \omega b, \\
A x^* &= b.
\end{aligned}
\]

We have some freedom in choosing $Q$, but there are two considerations we should keep in mind:

1. Choice of $Q$ affects convergence and speed of convergence of the method. In particular, we want $Q$ to be similar to $A$.

2. Choice of $Q$ affects ease of computing the update. That is, given $z = (Q - A)\, x^{(k)} + b$, we should pick $Q$ such that the equation $Q x^{(k+1)} = z$ is easy to solve exactly.

These two goals conflict with each other. At one end of the spectrum is the so-called "impossible iteration," at the other is the Richardson Iteration.

3.4.3 Impossible Iteration

I made up the term "impossible iteration." But consider the method which takes $Q$ to be $A$. This seems to be the best choice for satisfying the first goal. Letting $\omega = 1$, our method becomes

\[ A x^{(k+1)} = (A - A)\, x^{(k)} + b = b. \]

This method should clearly converge in one step. However, the second goal is totally ignored. Indeed, we are considering iterative methods because we cannot easily solve this linear equation in the first place.

3.4.4 Richardson Iteration

At the other end of the spectrum is the Richardson Iteration, which chooses $Q$ to be the identity matrix. Solving the system $Q x^{(k+1)} = z$ is trivial: we just have $x^{(k+1)} = z$.

Example Problem 3.4. Use Richardson Iteration with $\omega = 1$ on the system

\[
A = \begin{bmatrix} 6 & 1 & 1 \\ 2 & 4 & 0 \\ 1 & 2 & 6 \end{bmatrix}, \qquad
b = \begin{bmatrix} 12 \\ 0 \\ 6 \end{bmatrix}.
\]

Solution: We let

\[
Q = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}, \qquad
(Q - A) = \begin{bmatrix} -5 & -1 & -1 \\ -2 & -3 & 0 \\ -1 & -2 & -5 \end{bmatrix}.
\]

We start with an arbitrary $x^{(0)}$, say $x^{(0)} = [2\ 2\ 2]^\top$. We get $x^{(1)} = [-2\ {-10}\ {-10}]^\top$, and $x^{(2)} = [42\ 34\ 78]^\top$. Note the real solution is $x = [2\ {-1}\ 1]^\top$. The Richardson Iteration does not appear to converge for this example, unfortunately.

Example Problem 3.5. Apply Richardson Iteration with $\omega = 1/6$ on the previous system.

Solution: Our iteration becomes

\[
x^{(k+1)} = \begin{bmatrix} 0 & -1/6 & -1/6 \\ -1/3 & 1/3 & 0 \\ -1/6 & -1/3 & 0 \end{bmatrix} x^{(k)}
+ \begin{bmatrix} 2 \\ 0 \\ 1 \end{bmatrix}.
\]

We start with the same $x^{(0)}$ as previously, $x^{(0)} = [2\ 2\ 2]^\top$. We get $x^{(1)} = [4/3\ 0\ 0]^\top$, $x^{(2)} = [2\ {-4/9}\ 7/9]^\top$, and finally $x^{(12)} = [2\ {-0.99998}\ 0.99998]^\top$.

Thus, the choice of $\omega$ has some effect on convergence.

We can rethink the Richardson Iteration as

\[ x^{(k+1)} = (I - \omega A)\, x^{(k)} + \omega b = x^{(k)} + \omega \left( b - A x^{(k)} \right). \]

Thus at each step we are adding some scaled version of the residual, defined as $b - A x^{(k)}$, to the iterate.

3.4.5 Jacobi Iteration

The Jacobi Iteration chooses $Q$ to be the matrix consisting of the diagonal of $A$. This is more similar to $A$ than the identity matrix, but nearly as simple to invert.

Example Problem 3.6. Use Jacobi Iteration, with $\omega = 1$, to solve the system

\[
A = \begin{bmatrix} 6 & 1 & 1 \\ 2 & 4 & 0 \\ 1 & 2 & 6 \end{bmatrix}, \qquad
b = \begin{bmatrix} 12 \\ 0 \\ 6 \end{bmatrix}.
\]

Solution: We let

\[
Q = \begin{bmatrix} 6 & 0 & 0 \\ 0 & 4 & 0 \\ 0 & 0 & 6 \end{bmatrix}, \qquad
(Q - A) = \begin{bmatrix} 0 & -1 & -1 \\ -2 & 0 & 0 \\ -1 & -2 & 0 \end{bmatrix}, \qquad
Q^{-1} = \begin{bmatrix} 1/6 & 0 & 0 \\ 0 & 1/4 & 0 \\ 0 & 0 & 1/6 \end{bmatrix}.
\]

We start with an arbitrary $x^{(0)}$, say $x^{(0)} = [2\ 2\ 2]^\top$. We get $x^{(1)} = [4/3\ {-1}\ 0]^\top$. Then $x^{(2)} = [13/6\ {-2/3}\ 10/9]^\top$. Continuing, we find that $x^{(5)} \approx [1.987\ {-1.019}\ 0.981]^\top$. Note the real solution is $x = [2\ {-1}\ 1]^\top$.

There is an alternative way to describe the Jacobi Iteration for $\omega = 1$. By considering the update elementwise, we see that the operation can be described by

\[ x_j^{(k+1)} = \frac{1}{a_{jj}} \left( b_j - \sum_{i=1,\, i \neq j}^{n} a_{ji}\, x_i^{(k)} \right). \]

Thus an update takes less than $2n^2$ operations. In fact, if $A$ is sparse, with less than $k$ nonzero entries per row, the update should take less than $2nk$ operations.
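The Richardson and Jacobi updates above are short enough to try directly. What follows is a minimal Python sketch under stated assumptions: the helper names `residual`, `richardson_step`, `jacobi_step` and `iterate` are made up for this illustration, matrices are stored as plain nested lists, and no attempt is made to exploit sparsity.

```python
A = [[6.0, 1.0, 1.0],
     [2.0, 4.0, 0.0],
     [1.0, 2.0, 6.0]]
b = [12.0, 0.0, 6.0]

def residual(A, x, b):
    """Return b - A x."""
    return [bi - sum(aij * xj for aij, xj in zip(row, x))
            for row, bi in zip(A, b)]

def richardson_step(A, x, b, omega):
    """x + omega * (b - A x): the Richardson update in residual form."""
    r = residual(A, x, b)
    return [xi + omega * ri for xi, ri in zip(x, r)]

def jacobi_step(A, x, b):
    """Elementwise Jacobi update (omega = 1), using only the old iterate."""
    n = len(b)
    return [(b[j] - sum(A[j][i] * x[i] for i in range(n) if i != j)) / A[j][j]
            for j in range(n)]

def iterate(step, x0, k):
    """Apply the one-argument update function 'step' k times, starting at x0."""
    x = list(x0)
    for _ in range(k):
        x = step(x)
    return x

x0 = [2.0, 2.0, 2.0]
print(iterate(lambda x: richardson_step(A, x, b, 1.0), x0, 2))       # omega = 1
print(iterate(lambda x: richardson_step(A, x, b, 1.0 / 6.0), x0, 12))
print(iterate(lambda x: jacobi_step(A, x, b), x0, 5))
```

Run on the system of Example Problems 3.4 through 3.6, the first call reproduces the growing iterates for $\omega = 1$, while the other two move toward $[2\ {-1}\ 1]^\top$.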

3.4.6 Gauss Seidel Iteration

The Gauss Seidel Iteration chooses $Q$ to be the lower triangular part of $A$, including the diagonal. In this case solving the system $Q x^{(k+1)} = z$ is performed by forward substitution. Here the $Q$ is more like $A$ than for Jacobi Iteration, but involves more work for inverting.

Example Problem 3.7. Use Gauss Seidel Iteration to again solve for

\[
A = \begin{bmatrix} 6 & 1 & 1 \\ 2 & 4 & 0 \\ 1 & 2 & 6 \end{bmatrix}, \qquad
b = \begin{bmatrix} 12 \\ 0 \\ 6 \end{bmatrix}.
\]

Solution: We let

\[
Q = \begin{bmatrix} 6 & 0 & 0 \\ 2 & 4 & 0 \\ 1 & 2 & 6 \end{bmatrix}, \qquad
(Q - A) = \begin{bmatrix} 0 & -1 & -1 \\ 0 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix}.
\]

We start with an arbitrary $x^{(0)}$, say $x^{(0)} = [2\ 2\ 2]^\top$. We get $x^{(1)} = [4/3\ {-2/3}\ 1]^\top$. Then $x^{(2)} = [35/18\ {-35/36}\ 1]^\top$. Already this is fairly close to the actual solution $x = [2\ {-1}\ 1]^\top$.

Just as with Jacobi Iteration, there is an easier way to describe the Gauss Seidel Iteration. In this case we will keep a single vector $x$ and overwrite it, element by element. Thus for $j = 1, 2, \ldots, n$, we set

\[ x_j \leftarrow \frac{1}{a_{jj}} \left( b_j - \sum_{i=1,\, i \neq j}^{n} a_{ji}\, x_i \right). \]

This looks exactly like the Jacobi update. However, in the sum on the right there are some "old" values of $x_i$ and some "new" values; the new values are those $x_i$ for which $i < j$.

Again this takes less than $2n^2$ operations, or less than $2nk$ if $A$ is sufficiently sparse.

An alteration of the Gauss Seidel Iteration is to make successive "sweeps" of this redefinition, one for $j = 1, 2, \ldots, n$, the next for $j = n, n-1, \ldots, 2, 1$. This amounts to running Gauss Seidel once with $Q$ the lower triangular part of $A$, then running it with $Q$ the upper triangular part. This iterative method is commonly known as "symmetric Gauss Seidel."

3.4.7 Error Analysis

Suppose that $x$ is the solution to the linear system $Ax = b$. Define the error vector:

\[ e^{(k)} = x^{(k)} - x. \]

Now notice that

\[
\begin{aligned}
x^{(k+1)} &= Q^{-1} (Q - \omega A)\, x^{(k)} + Q^{-1} \omega b, \\
x^{(k+1)} &= Q^{-1} Q x^{(k)} - \omega Q^{-1} A x^{(k)} + \omega Q^{-1} A x, \\
x^{(k+1)} &= x^{(k)} - \omega Q^{-1} A \left( x^{(k)} - x \right), \\
x^{(k+1)} - x &= x^{(k)} - x - \omega Q^{-1} A \left( x^{(k)} - x \right), \\
e^{(k+1)} &= e^{(k)} - \omega Q^{-1} A e^{(k)}, \\
e^{(k+1)} &= \left( I - \omega Q^{-1} A \right) e^{(k)}.
\end{aligned}
\]

Reusing this relation we find that

\[
\begin{aligned}
e^{(k)} &= \left( I - \omega Q^{-1} A \right) e^{(k-1)}, \\
        &= \left( I - \omega Q^{-1} A \right)^2 e^{(k-2)}, \\
        &= \left( I - \omega Q^{-1} A \right)^k e^{(0)}.
\end{aligned}
\]

We want to ensure that $e^{(k+1)}$ is "smaller" than $e^{(k)}$. To do this we recall matrix and vector norms from Subsection 1.4.1.

\[
\left\| e^{(k)} \right\|_2 = \left\| \left( I - \omega Q^{-1} A \right)^k e^{(0)} \right\|_2
\le \left\| I - \omega Q^{-1} A \right\|_2^k \left\| e^{(0)} \right\|_2.
\]

(See Example Problem 1.29.)

Thus our iteration converges ($e^{(k)}$ goes to the zero vector, i.e., $x^{(k)} \to x$) if

\[ \left\| I - \omega Q^{-1} A \right\|_2 < 1. \]

This gives the theorem:

Theorem 3.8. An iterative solution scheme converges for any starting $x^{(0)}$ if and only if all eigenvalues of $I - \omega Q^{-1} A$ are less than 1 in absolute value, i.e., if and only if $\left\| I - \omega Q^{-1} A \right\|_2 < 1$.

Another way of saying this is "the spectral radius of $I - \omega Q^{-1} A$ is less than 1." In fact, the speed of convergence is decided by the spectral radius of the matrix; convergence is faster for smaller values.

Recall our introduction to iterative methods in the scalar case, where the result relied on $\omega$ being chosen such that $|1 - \omega A| < 1$. You should now think about how eigenvalues generalize the absolute value of a scalar, and how this relates to the norm of matrices.

Let $y$ be an eigenvector for $Q^{-1} A$, with corresponding eigenvalue $\lambda$. Then

\[ \left( I - \omega Q^{-1} A \right) y = y - \omega Q^{-1} A y = y - \omega \lambda y = (1 - \omega \lambda)\, y. \]

This relation may allow us to pick the optimal $\omega$ for given $A, Q$. It can also show us that sometimes no choice of $\omega$ will give convergence of the method.

There are a number of different related results that show when various methods will work for certain choices of $\omega$. We leave these to the exercises.
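To watch the error vector shrink in practice, here is a small Python sketch (an illustration, not code from the notes) of the overwrite-in-place Gauss Seidel sweep of Section 3.4.6, applied to the system of Example Problem 3.7. It records the 2-norm of $e^{(k)} = x^{(k)} - x$ after each sweep. The names `gauss_seidel_sweep` and `error_norm` are hypothetical, and the exact solution $[2\ {-1}\ 1]^\top$ is supplied by hand purely so the error can be measured.

```python
import math

A = [[6.0, 1.0, 1.0],
     [2.0, 4.0, 0.0],
     [1.0, 2.0, 6.0]]
b = [12.0, 0.0, 6.0]
exact = [2.0, -1.0, 1.0]

def gauss_seidel_sweep(A, x, b):
    """One sweep j = 1, ..., n; entries with i < j already hold new values."""
    n = len(b)
    x = list(x)                      # work on a copy, overwritten in place
    for j in range(n):
        s = sum(A[j][i] * x[i] for i in range(n) if i != j)
        x[j] = (b[j] - s) / A[j][j]
    return x

def error_norm(x, exact):
    """2-norm of the error vector x - exact."""
    return math.sqrt(sum((xi - ei) ** 2 for xi, ei in zip(x, exact)))

x = [2.0, 2.0, 2.0]
errors = [error_norm(x, exact)]
for _ in range(6):
    x = gauss_seidel_sweep(A, x, b)
    errors.append(error_norm(x, exact))

print(x)
print(errors)
```

For this particular system the recorded norms decrease at every sweep, which is the behaviour the bound on $\|e^{(k)}\|_2$ describes when the iteration matrix is contractive.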

2679ω.5. Find conditions on ω which guarantee convergence of Richardson’s Iteration for ﬁnding approximate iterative solutions to the system Ax = b.7321ω. For simplicity we will only consider an alteration of Richardson’s Iteration. 2. it can be shown that e(k) = (I − ωk A) e(k−1) where. 1 − 4.8. In the altered algorithm we presuppose the existence.9.388 7. 4. e(k) = x(k) − x. Given iterate x(k−1) . A = 2 4 0 .) Compare this to the results of Example Problem 3.7321 (See also Exercise (3. convergence was observed. . while for Example Problem 3. 1 − 4ω.” that is if f (x) is a polynomial and λ is an eigenvalue of a matrix A. In this case the polynomial we consider is f (x) = x0 −ωx1 .8 leads to an interesting possible variant of the iterative scheme.7321.4. Thus the eigenvalues of I − ωA are approximately 1 − 7.8 A Free Lunch? The analysis leading to Theorem 3. again. which we use in each iterative update. Using octave or Matlab you will ﬁnd that the eigenvalues of A are approximately 7.4. with x the actual solution to the linear system. and 4.44 CHAPTER 3. 3. with ω = 1/6. via some oracle. SOLVING LINEAR SYSTEMS Example Problem 3. ω i . With some work it can be shown that all three of these values will be less than one in absolute value if and only if 3 0<ω< ≈ 0. Expanding e (k−1) similarly gives e(k) = (I − ωk A) (Iωk−1 A) e(k−2) k = i=1 I − ωi A e(0) . Following the analysis for Theorem 3. we have convergence if and only if I − ωA 2 <1 We now use the fact that “eigenvalues commute with polynomials. deﬁne x(k) = (I − ωk A) x(k−1) + ωk b.2679. 6 1 2 6 Solution: By Theorem 3. ω = 1 apparently did not lead to convergence. where 12 6 1 1 b = 0 . Select some initial iterate x(0) .10). where for this system. Thus our algorithm becomes: 1.8. of a sequence of weightings. then f (λ) is an eigenvalue of the matrix f (A). with Q the identity matrix.

Remember that we want $e^{(k)}$ to be small in magnitude, or, better yet, to be zero. One way to guarantee that $e^{(k)}$ is zero, regardless of the choice of $e^{(0)}$, is to somehow ensure that the matrix

\[ B = \prod_{i=1}^{k} (I - \omega_i A) \]

has all zero eigenvalues. We again make use of the fact that eigenvalues "commute" with polynomials to claim that if $\lambda_j$ is an eigenvalue of $A$, then

\[ \prod_{i=1}^{k} (1 - \omega_i \lambda_j) \]

is an eigenvalue of $B$. This eigenvalue is zero if one of the $\omega_i$ for $1 \le i \le k$ is $1/\lambda_j$. This suggests how we are to pick the weightings: let them be the inverses of the eigenvalues of $A$. In fact, if $A$ has a small number of distinct eigenvalues, say $m$ eigenvalues, then convergence to the exact solution could be guaranteed after only $m$ iterations, regardless of the size of the matrix.

As you may have guessed from the title of this subsection, this is not exactly a practical algorithm. The problem is that it is not simple to find, for a given arbitrary matrix $A$, one, some, or all its eigenvalues. This problem is of sufficient complexity to outweigh any savings to be gotten from our "free lunch" algorithm. However, in some limited situations this algorithm might be practical if the eigenvalues of $A$ are known a priori.
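The "free lunch" can be tried on the $3 \times 3$ matrix used in the examples of this section, whose eigenvalues happen to be available in closed form: $6 + \sqrt{3}$, $4$, and $6 - \sqrt{3}$ (these match the approximate values 7.7321, 4 and 4.2679 quoted in Example Problem 3.9). The Python sketch below is only an illustration under that assumption; the function name `weighted_richardson` is made up, and in general the eigenvalues would not be known in advance.

```python
import math

A = [[6.0, 1.0, 1.0],
     [2.0, 4.0, 0.0],
     [1.0, 2.0, 6.0]]
b = [12.0, 0.0, 6.0]

def weighted_richardson(A, b, x0, weights):
    """Apply x <- (I - w A) x + w b = x + w (b - A x) once per weight w."""
    x = list(x0)
    for w in weights:
        r = [bi - sum(aij * xj for aij, xj in zip(row, x))
             for row, bi in zip(A, b)]
        x = [xi + w * ri for xi, ri in zip(x, r)]
    return x

# Weights are the inverses of the (assumed known) eigenvalues of A.
eigenvalues = [6.0 + math.sqrt(3.0), 4.0, 6.0 - math.sqrt(3.0)]
weights = [1.0 / lam for lam in eigenvalues]

x = weighted_richardson(A, b, [2.0, 2.0, 2.0], weights)
print(x)
```

With three distinct eigenvalues, three updates drive the error to zero up to rounding, so the printed vector agrees with the solution $[2\ {-1}\ 1]^\top$ to many digits.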

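Exercises

(3.1) Find the LU decomposition of the following matrices, using naïve Gaussian Elimination

\[
\text{(a)}\ \begin{bmatrix} 3 & -1 & -2 \\ 9 & -1 & -4 \\ -6 & 10 & 13 \end{bmatrix} \qquad
\text{(b)}\ \begin{bmatrix} 8 & 24 & 16 \\ 1 & 12 & 11 \\ 4 & 13 & 19 \end{bmatrix} \qquad
\text{(c)}\ \begin{bmatrix} -3 & 6 & 0 \\ 1 & -2 & 0 \\ -4 & 5 & -8 \end{bmatrix}
\]

(3.2) Perform back substitution to solve the equation

\[
\begin{bmatrix} 1 & 3 & 5 & 3 \\ 0 & 2 & 4 & 3 \\ 0 & 0 & 2 & 1 \\ 0 & 0 & 0 & 2 \end{bmatrix} x
= \begin{bmatrix} -1 \\ -1 \\ 1 \\ 2 \end{bmatrix}
\]

(3.3) Perform naïve Gaussian Elimination to prove Cramer's rule for the 2D case. That is, prove that the solution to

\[
\begin{bmatrix} a & b \\ c & d \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix}
= \begin{bmatrix} f \\ g \end{bmatrix}
\]

is given by

\[
y = \frac{\det \begin{bmatrix} a & f \\ c & g \end{bmatrix}}{\det \begin{bmatrix} a & b \\ c & d \end{bmatrix}}
\qquad \text{and} \qquad
x = \frac{\det \begin{bmatrix} f & b \\ g & d \end{bmatrix}}{\det \begin{bmatrix} a & b \\ c & d \end{bmatrix}}
\]

(3.4) Implement Cramer's rule to solve a pair of linear equations in 2 variables. Your m-file should have header line like:
function x = cramer2(A,b)
where A is a $2 \times 2$ matrix, and b and x are $2 \times 1$ vectors. Your code should find the $x$ such that $Ax = b$. (See Exercise (3.3))
Test your code on the following (augmented) systems:

\[
\text{(a)}\ \left[\begin{array}{cc|c} 3 & -2 & 1 \\ 4 & -3 & -1 \end{array}\right] \qquad
\text{(b)}\ \left[\begin{array}{cc|c} 1.24 & -3.48 & 1 \\ -0.744 & 2.088 & 2 \end{array}\right]
\]

\[
\text{(c)}\ \left[\begin{array}{cc|c} 1.24 & -3.48 & 1 \\ -0.744 & 2.088 & -0.6 \end{array}\right] \qquad
\text{(d)}\ \left[\begin{array}{cc|c} -0.0590 & 0.2372 & -0.3528 \\ 0.1080 & -0.4348 & 0.6452 \end{array}\right]
\]

(3.5) Given two lines parametrized by $f(t) = at + b$, and $g(s) = cs + d$, set up a linear $2 \times 2$ system of equations to find the $t, s$ at the point of intersection of the two lines. If you were going to write a program to detect the intersection of two lines, how would you detect whether they are parallel? How is this related to the form of the solution to a system of two linear equations? (See Exercise (3.3))

(3.6) Prove that the inverse of the matrix

\[
\begin{bmatrix}
1 & 0 & 0 & \cdots & 0 \\
a_2 & 1 & 0 & \cdots & 0 \\
a_3 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
a_n & 0 & 0 & \cdots & 1
\end{bmatrix}
\]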

is the matrix

\[
\begin{bmatrix}
1 & 0 & 0 & \cdots & 0 \\
-a_2 & 1 & 0 & \cdots & 0 \\
-a_3 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
-a_n & 0 & 0 & \cdots & 1
\end{bmatrix}
\]

(Hint: Multiply them together.)

(3.7) Under the strategy of scaled partial pivoting, which row of the following matrix will be the first pivot row?

\[
\begin{bmatrix}
10 & 17 & -10 & 0.1 & 0.9 \\
-3 & 3 & -3 & 0.3 & -4 \\
0.3 & 0.1 & 0.01 & -1 & 0.5 \\
2 & 3 & 4 & -3 & 5 \\
10 & 100 & 1 & 0.1 & 0
\end{bmatrix}
\]

(3.8) Let $A$ be a symmetric positive definite $n \times n$ matrix with $n$ distinct eigenvalues. Letting $y^{(0)} = b / \|b\|_2$, consider the iteration

\[ y^{(k+1)} = \frac{A y^{(k)}}{\left\| A y^{(k)} \right\|_2}. \]

(a) What is $\left\| y^{(k)} \right\|_2$?
(b) Show that $y^{(k)} = A^k b / \left\| A^k b \right\|_2$.
(c) Show that as $k \to \infty$, $y^{(k)}$ converges to the (normalized) eigenvector associated with the largest eigenvalue of $A$.

(3.9) Consider the equation

\[
\begin{bmatrix} 1 & 3 & 5 \\ -2 & 2 & 4 \\ 4 & -3 & -4 \end{bmatrix} x
= \begin{bmatrix} -5 \\ -6 \\ 10 \end{bmatrix}
\]

Letting $x^{(0)} = [1\ 1\ 0]^\top$, find the iterate $x^{(1)}$ by one step of Richardson's Method. And by one step of Jacobi Iteration. And by Gauss Seidel.

(3.10) Let $A$ be a symmetric $n \times n$ matrix with eigenvalues in the interval $[\alpha, \beta]$, with $0 < \alpha \le \beta$, and $\alpha + \beta \neq 0$. Consider Richardson's Iteration

\[ x^{(k+1)} = (I - \omega A)\, x^{(k)} + \omega b. \]

Recall that $e^{(k+1)} = (I - \omega A)\, e^{(k)}$.
(a) Show that the eigenvalues of $I - \omega A$ are in the interval $[1 - \omega\beta,\ 1 - \omega\alpha]$.
(b) Prove that
\[ \max \left\{ |\lambda| : 1 - \omega\beta \le \lambda \le 1 - \omega\alpha \right\} \]
is minimized when we choose $\omega$ such that $1 - \omega\beta = -(1 - \omega\alpha)$. (Hint: It may help to look at the graph of something versus $\omega$.)
(c) Show that this relationship is satisfied by $\omega = 2/(\alpha + \beta)$.
(d) For this choice of $\omega$ show that the spectral radius of $I - \omega A$ is
\[ \frac{|\alpha - \beta|}{|\alpha + \beta|}. \]

(e) Show that when $0 < \alpha$, this quantity is always smaller than 1.
(f) Prove that if $A$ is positive definite, then there is an $\omega$ such that Richardson's Iteration with this $\omega$ will converge for any choice of $x^{(0)}$.
(g) For which matrix do you expect faster convergence of Richardson's Iteration: $A_1$ with eigenvalues in $[10, 20]$ or $A_2$ with eigenvalues in $[1010, 1020]$? Why?

(3.11) Implement Richardson's Iteration to solve the system $Ax = b$. Your m-file should have header line like:
function xk = richardsons(A,b,x0,w,k)
Your code should return $x^{(k)}$ based on the iteration

\[ x^{(j+1)} = x^{(j)} - \omega \left( A x^{(j)} - b \right). \]

Let w take the place of $\omega$, and let x0 be the initial iterate $x^{(0)}$. Test your code for $A, b$ for which you know the actual solution to the problem. (Hint: Start with $A$ and the solution $x$ and generate $b$.) Test your code on the following matrices:

• Let $A$ be the Hilbert Matrix. This is generated by the octave command A = hilb(10), or whatever (smallish) integer. Try different values of $\omega$, including $\omega = 1$.

• Let $A$ be a Toeplitz matrix of the form:

\[
A = \begin{bmatrix}
-2 & 1 & 0 & \cdots & 0 \\
1 & -2 & 1 & \cdots & 0 \\
0 & 1 & -2 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & -2
\end{bmatrix}
\]

These can be generated by the octave command A = toeplitz([-2 1 0 0 0 0 0 0]). Try different values of $\omega$, including $\omega = -1/2$.

(3.12) Let $A$ be a nonsingular $n \times n$ matrix. We wish to solve $Ax = b$. Let $x^{(0)}$ be some starting vector, let $D_k$ be $\operatorname{span}\left\{ r^{(0)}, A r^{(0)}, \ldots, A^k r^{(0)} \right\}$, and let $P_k$ be the set of polynomials $p(x)$ of degree $k$ with $p(0) = 1$. Consider the following iterative method: Let $x^{(k+1)}$ be the $x$ that solves

\[ \min_{x \in x^{(0)} + D_k} \left\| b - Ax \right\|_2. \]

Let $r^{(k)} = b - A x^{(k)}$.
(a) Show that if $x \in x^{(0)} + D_k$, then $b - Ax = p(A)\, r^{(0)}$ for some $p \in P_k$.
(b) Prove that, conversely, for any $p \in P_k$ there is some $x \in x^{(0)} + D_k$, such that $b - Ax = p(A)\, r^{(0)}$.
(c) Argue that
\[ \left\| r^{(k+1)} \right\|_2 = \min_{p \in P_k} \left\| p(A)\, r^{(0)} \right\|_2. \]
(d) Prove that this iteration converges in at most $n$ steps. (Hint: Argue for the existence of a polynomial in $P_n$ that vanishes at all the eigenvalues of $A$. Use this polynomial to show that $\left\| r^{(n)} \right\|_2 \le 0$.)

Chapter 4

Finding Roots

4.1 Bisection

We are now concerned with the problem of finding a zero for the function $f(x)$, i.e., some $c$ such that $f(c) = 0$. The simplest method is that of bisection. The following theorem, from calculus class, ensures the success of the method.

Theorem 4.1 (Intermediate Value Theorem). If $f(x)$ is continuous on $[a, b]$ then for any $y$ such that $y$ is between $f(a)$ and $f(b)$ there is a $c \in [a, b]$ such that $f(c) = y$.

The IVT is best illustrated graphically. Note that continuity is really a requirement here: a single point of discontinuity could ruin your whole day, as the following example illustrates.

Example 4.2. The function $f(x) = \frac{1}{x}$ is not continuous at 0. Thus if $0 \in [a, b]$, we cannot apply the IVT. In particular, if $0 \in [a, b]$ it happens to be the case that for every $y$ between $f(a), f(b)$ there is no $c \in [a, b]$ such that $f(c) = y$.

In particular, the IVT tells us that if $f(x)$ is continuous and we know $a, b$ such that $f(a), f(b)$ have different sign, then there is some root in $[a, b]$. A decent estimate of the root is $c = \frac{a+b}{2}$. We can check whether $f(c) = 0$. If this does not hold then one and only one of the two following options holds:

1. $f(a), f(c)$ have different signs.

2. $f(c), f(b)$ have different signs.

We now choose to recursively apply bisection to either $[a, c]$ or $[c, b]$, respectively, depending on which of these two options hold.

4.1.1 Modifications

Unfortunately, it is impossible for a computer to test whether a given black box function is continuous. Thus malicious or incompetent users could cause a naïvely implemented bisection algorithm to fail. There are a number of easily conceivable problems:

1. The user might give $f, a, b$ such that $f(a), f(b)$ have the same sign. In this case the function $f$ might be legitimately continuous, and might have a root in the interval $[a, b]$. If, taking $c = \frac{a+b}{2}$, $f(a), f(b), f(c)$ all have the same sign, the algorithm would be at an impasse. We should perform a "sanity check" on the input to make sure $f(a), f(b)$ have different signs.

2. The user might give $f, a, b$ such that $f$ is not continuous on $[a, b]$, and moreover has no root in the interval $[a, b]$. For a poorly implemented algorithm, this might lead to an infinite search on smaller and smaller intervals about some discontinuity of $f$. In fact, the algorithm might descend to intervals as small as machine precision, in which case the midpoint of the interval will, due to rounding, be the same as one of the endpoints, resulting in an infinite recursion.

3. The user might give $f$ such that $f$ has no root $c$ that is representable in the computer's memory. Recall that we think of computers as storing numbers in the form $\pm r \times 10^k$; given a finite number of bits to represent a number, only a finite number of such numbers can be represented. It may legitimately be the case that none of them is a root of $f$. In this case, the behaviour of the algorithm may be like that of the previous case. A well implemented version of bisection should check the length of its input interval, and give up if the length is too small, compared to machine precision.

Another common error occurs in the testing of the signs of $f(a), f(b)$. A slick programmer might try to implement this test in the pseudocode:

    if (f(a)f(b) > 0) then ...

Note however, that $|f(a)|, |f(b)|$ might be very small, and that $f(a)f(b)$ might be too small to be representable in the computer; this calculation would be rounded to zero, and unpredictable behaviour would ensue. A wiser choice is

    if (sign(f(a)) * sign(f(b)) > 0) then ...

where the function $\operatorname{sign}(x)$ returns $-1, 0, 1$ depending on whether $x$ is negative, zero, or positive, respectively.

The pseudocode bisection algorithm is given as Algorithm 1.

4.1.2 Convergence

We can see that each time recursive bisection(f, a, b, ...) is called, $|b - a|$ is half the length of the interval in the previous call. Formally call the first interval $[a_0, b_0]$, and the first midpoint $c_0$. Let the second interval be $[a_1, b_1]$, etc. Note that one of $a_1, b_1$ will be $c_0$, and the other will be either $a_0$ or $b_0$. We are claiming that

\[ b_n - a_n = \frac{b_{n-1} - a_{n-1}}{2} = \frac{b_0 - a_0}{2^n}. \]

Theorem 4.3 (Bisection Method Theorem). If $f(x)$ is a continuous function on $[a, b]$ such that $f(a) f(b) < 0$, then after $n$ steps, the algorithm run bisection will return $c$ such that $|c - c'| \le \frac{|b - a|}{2^n}$, where $c'$ is some root of $f$.

4.2 Newton's Method

Newton's method is an iterative method for root finding. That is, starting from some guess at the root, $x_0$, one iteration of the algorithm produces a number $x_1$, which is supposed to be closer to a root; guesses $x_2, x_3, \ldots, x_n$ follow identically.

Newton's method uses "linearization" to find an approximate root. Recalling Taylor's Theorem, we know that

\[ f(x + h) \approx f(x) + f'(x)\, h. \]

This approximation is better when $f''(\cdot)$ is "well-behaved" between $x$ and $x + h$. Newton's method attempts to find some $h$ such that

\[ 0 = f(x + h) = f(x) + f'(x)\, h. \]

This is easily solved as

\[ h = \frac{-f(x)}{f'(x)}. \]

An iteration of Newton's method, then, takes some guess $x_k$ and returns $x_{k+1}$ defined by

\[ x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)}. \qquad (4.1) \]

An iteration of Newton's method is shown in Figure 4.1, along with the linearization of $f(x)$ at $x_k$.

4.2.1 Implementation

Use of Newton's method requires that the function $f(x)$ be differentiable. Moreover, the derivative of the function must be known. This may preclude Newton's method from being used when $f(x)$ is a black box. As is the case for the bisection method, our algorithm cannot explicitly check for continuity of $f(x)$. Moreover, the success of Newton's method is dependent on the initial guess $x_0$. This was also the case with bisection, but for bisection there was an easy test of the initial interval, i.e., test if $f(a_0) f(b_0) < 0$.

Algorithm 1: Algorithm for finding root by bisection.
Input: a function, two endpoints, an x-tolerance, and a y-tolerance
Output: a c such that |f(c)| is smaller than the y-tolerance.

    run bisection(f, a, b, δ, ε)
        Let fa ← sign(f(a)).
        Let fb ← sign(f(b)).
        if fa fb > 0
            throw an error.
        else if fa fb = 0
            if fa = 0
                return a
            else
                return b
        Let c ← (a + b)/2
        while b − a > 2δ
            Let fc ← f(c).
            if |fc| < ε
                return c
            if sign(fa) sign(fc) < 0
                Let b ← c, fb ← fc, c ← (a + c)/2.
            else
                Let a ← c, fa ← fc, c ← (c + b)/2.
        return c
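Algorithm 1, together with the safeguards discussed in Section 4.1.1, might be sketched in Python as follows. This is an illustration rather than the notes' own code: the names `sign` and `bisection` and the parameter names `x_tol` and `y_tol` are choices made here, the endpoints are assumed to satisfy a < b, and the extra check that the midpoint differs from both endpoints is this sketch's way of giving up once the interval reaches machine precision.

```python
def sign(v):
    """Return -1, 0, or 1 according to whether v is negative, zero, or positive."""
    return (v > 0) - (v < 0)

def bisection(f, a, b, x_tol, y_tol):
    """Find c in [a, b] with |f(c)| < y_tol, or with the interval shorter than 2*x_tol."""
    fa = sign(f(a))
    fb = sign(f(b))
    if fa * fb > 0:
        # sanity check: compare signs, never the product f(a)*f(b) itself
        raise ValueError("f(a) and f(b) must have different signs")
    if fa == 0:
        return a
    if fb == 0:
        return b
    c = (a + b) / 2
    while b - a > 2 * x_tol:
        if c == a or c == b:
            return c                 # interval is at machine precision; give up
        fc = f(c)
        if fc == 0 or abs(fc) < y_tol:
            return c
        if fa * sign(fc) < 0:
            b = c                    # sign change lies in [a, c]
        else:
            a, fa = c, sign(fc)      # sign change lies in [c, b]
        c = (a + b) / 2
    return c

root = bisection(lambda x: x * x - 2.0, 1.0, 2.0, 1e-10, 1e-14)
print(root)
```

Each pass through the loop halves the bracketing interval, so the number of function evaluations grows only with the logarithm of the requested x-tolerance.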

Figure 4.1: One iteration of Newton's method is shown for a quadratic function $f(x)$. The linearization of $f(x)$ at $x_k$ is shown. It is clear that $x_{k+1}$ is a root of the linearization. It happens to be the case that $|f(x_{k+1})|$ is smaller than $|f(x_k)|$, i.e., $x_{k+1}$ is a better guess than $x_k$.

Our algorithm will test for goodness of the estimate by looking at $|f(x_k)|$. The algorithm will also test for near-zero derivative. Note that if it were the case that $f'(x_k) = 0$ then $h$ would be ill defined.

Algorithm 2: Algorithm for finding root by Newton's Method.
Input: a function, its derivative, an initial guess, an iteration limit, and a tolerance
Output: a point for which the function has small value.

    run newton(f, f', x0, N, tol)
        Let x ← x0, n ← 0.
        while n ≤ N
            Let fx ← f(x).
            if |fx| < tol
                return x.
            Let fpx ← f'(x).
            if |fpx| < tol
                Warn "f'(x) is small; giving up."
                return x.
            Let x ← x − fx/fpx.
            Let n ← n + 1.
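A direct transcription of Algorithm 2 into Python could look like the sketch below. It is a sketch only: the name `newton` and its parameter names are assumptions of this example, the warning is reduced to a comment, and, since the pseudocode does not say what to return when the iteration limit is reached, this version simply returns the last iterate.

```python
import math

def newton(f, fprime, x0, max_iter, tol):
    """Newton's method following Algorithm 2; returns the last iterate computed."""
    x = x0
    for _ in range(max_iter + 1):    # the pseudocode loops while n <= N
        fx = f(x)
        if abs(fx) < tol:
            return x                 # |f(x)| is small: accept x
        fpx = fprime(x)
        if abs(fpx) < tol:
            return x                 # f'(x) is small; giving up
        x = x - fx / fpx
    return x

# A well-behaved case: the positive root of x^2 - 2, starting from x0 = 1.
good = newton(lambda x: x * x - 2.0, lambda x: 2.0 * x, 1.0, 20, 1e-12)
print(good)

# A badly chosen start: f(x) = ln(x)/x has its only root at x = 1, but from
# x0 = 3 the derivative is negative while f is positive, so iterates increase.
bad = newton(lambda x: math.log(x) / x,
             lambda x: (1.0 - math.log(x)) / (x * x),
             3.0, 10, 1e-12)
print(bad)
```

The second call illustrates why the returned value should always be checked against $|f(x)|$: hitting the iteration limit does not mean a root was found.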

4. that |ek+1 | ≤ C |ek |2 . However. in which case. with initial estimate x x0 = 3. for example. It has a single root at x = 1. You should verify that there are an inﬁnite number of such x 0 .5.2 Problems As mentioned above.2. Example 4. where r is the root that the xk are converging to. and the initial estimate x 0 . Consider Newton’s method applied to the function f (x) = ln x . Example 4. for x > 1. it is converging at a rate slower than we expect from Newton’s method: at each step we have a constant decrease of 1 − (1/j) . f (x1 ) cos(−x0 ) cos(x0 ) f (x0 ) sin(x0 ) = x0 − = x0 − tan x0 = x0 − 2x0 = −x0 . f (xk ) The estimates will “run away” from the root x = 1.3 Convergence When Newton’s Method converges. Our initial guess is not too far from this root. −x0 . Example 4. However. The following theorem formalizes our claim: . it actually displays quadratic convergence. if e0 were 0.001. A number of conceivable problems might come up. where x0 has the odious property 2x0 = tan x0 . If. Consider Newton’s method applied to the function f (x) = sin (x) for the initial estimate x0 = 0. we know f (x) > 0. then we would double the accuracy of our root estimate with each iterate.4. Of course.000001. if e k = xk −r. f (x0 ) cos(x0 ) Thus Newton’s method “cycles” between the two values x 0 . That is. Thus taking xk+1 = xk − f (xk ) > xk . Consider the identity of x1 : x1 = x 0 − Now consider x2 : x2 = x 1 − f (x1 ) sin(−x0 ) sin(x0 ) = −x0 − − x0 + = −x0 + tan x0 = −x0 + 2x0 = x0 . it were the case that C = 1.2. which is a larger number (and thus worse decrease) when j is larger. We illustrate them here. and with initial estimate x0 = 0. then 1 − ln x < 0. we ﬁnd that x k is converging to the root.6. there is no well-deﬁned xk+1 . convergence is dependent on f (x).2. That is. consider the derivative: f (x) = 1 x x − ln x 1 − ln x = 2 x x2 Since the equation has the single root x = 0. If x > e1 . Now note that xk+1 = xk − xj k j−1 jxk = 1− 1 j xk . However. 
NEWTON’S METHOD 53 4. Newton’s method may ﬁnd some iterate x k for which f (xk ) = 0. Note that f (x) = 0 has the single root x = 0. we would expect e1 to be on the order of 0. Consider Newton’s method applied to the function f (x) = x j with j > 1. Note that f (x) is continuous on R+ . 4. and so f (x) < 0.

Theorem 4.7 (Newton's Method Convergence). If $f(x)$ has two continuous derivatives, and $r$ is a simple root of $f(x)$, then there is some $D$ such that if $|x_0 - r| < D$, Newton's method will converge quadratically to $r$.

The proof is based on arguments from real analysis, and is omitted; see Cheney & Kincaid for the proof [7]. Take note of the following, though:

1. The proof requires that $r$ be a simple root, that is that $f'(r) \neq 0$. When this does not hold we may get only linear convergence, as in Example 4.4.

2. The key to the proof is using Taylor's theorem, and the definition of Newton's method to show that

\[ e_{k+1} = \frac{-f''(\xi_k)\, e_k^2}{2 f'(x_k)}, \]

where $\xi_k$ is between $x_k$ and $r = x_k + e_k$. The proof then proceeds by showing that there is some region about $r$ such that (a) in this region $|f''(x)|$ is not too large and $|f'(x)|$ is not too small, and (b) if $x_k$ is in the region, then $x_{k+1}$ is in the region. In particular the region is chosen to be so small that if $x_k$ is in the region, then the factor $e_k^2$ will outweigh the factor $|f''(\xi_k)| / 2 |f'(x_k)|$. You can roughly think of this region as an "attractor" region for the root.

3. The theorem never guarantees that some $x_k$ will fall into the "attractor" region of a root $r$ of the function, as in Example 4.5 and Example 4.6.

The theorem that follows gives sufficient conditions for convergence to a particular root.

Theorem 4.8 (Newton's Method Convergence II [2]). If $f(x)$ has two continuous derivatives on $[a, b]$, and

1. $f(a) f(b) < 0$,

2. $f'(x) \neq 0$ on $[a, b]$,

3. $f''(x)$ does not change sign on $[a, b]$,

4. Both $|f(a)| \le (b - a) |f'(a)|$ and $|f(b)| \le (b - a) |f'(b)|$ hold,

then Newton's method converges to the unique root of $f(x) = 0$ for any choice of $x_0 \in [a, b]$.

4.2.4 Using Newton's Method

Newton's Method can be used to program more complex functions using only simple functions. Suppose you had a computer which could perform addition, subtraction, multiplication, division, and storing and retrieving numbers, and it was your task to write a subroutine to compute some complex function $g(\cdot)$. One way of solving this problem is to have your subroutine use Newton's Method to solve some equation equivalent to $g(z) - x = 0$, where $z$ is the input to the subroutine. Note that it is assumed the subroutine cannot evaluate $g(z)$ directly, so this equation needs to be modified to satisfy the computer's constraints.

Quite often when dealing with problems of this type, students make the mistake of using Newton's Method to try to solve a linear equation. This should be an indication that a mistake was made, since Newton's Method can solve a linear equation in a single step:

Example 4.9. Consider Newton's method applied to the function $f(x) = ax + b$. The iteration is given as

\[ x_{k+1} \leftarrow x_k - \frac{a x_k + b}{a}. \]

This can be rewritten simply as $x_{k+1} \leftarrow -b/a$.

The Newton step is xk+1 ← xk − z − (1/xk ) 2 = xk − zxk + xk = xk (2 − zxk ) . Instead apply Newton’s Method to f (x) = z − (1/x) .3 Secant Method The secant method for root ﬁnding is roughly based on Newton’s method. Input: a number Output: its multiplicative inverse invs(z) (1) if z = 0 throw an error. 2 2xk z − x2 k . Algorithm 3: Algorithm for ﬁnding a multiplicative inverse using simple operations. Devise a subroutine using only subtraction and multiplication that can ﬁnd the multiplicative inverse of an input number z. 4. i. it is not assumed that the derivative of the function is known.10. however. However. SECANT METHOD 55 The following example problems should illustrate this process of “bootstrapping” via Newton’s Method. The subroutine is given in Algorithm 3. √ Solution: The temptation is to ﬁnd a zero for f (x) = z − x. The Newton step becomes xk+1 ← xk − after some simpliﬁcation this becomes xk+1 ← xk z + . this will not work.11.3. Instead let f (x) = z − x2 . 1/x2 k Note this step uses only multiplication and subtraction.. (5) Let n ← n + 1. Solution: We are tempted to use the linear function f (x) = (1/z) − x. rather the derivative is approximated by the value of the function at some of the iterates.4. (3) while n ≤ 50 (4) Let x ← x (2 − zx) .e. But this is a linear equation for which Newton’s Method would reduce to x k+1 ← (1/z) . Example Problem 4. Since the subroutine can only use subtraction and multiplication. The ﬁnal subroutine is given in Algorithm 4. You can easily see that if x is a positive root of f (x) = 0. −2xk Note this relies only on addition. n ← 0. More speciﬁcally. Example Problem 4. then √ x = z. Devise a subroutine using only simple operations which computes the square root of an input number z. multiplication and division. can ﬁnd (1/z) . (2) Let x ← sign (z) . the slope of the tangent . x k . this equation is linear in x.

Algorithm 4: Algorithm for finding a square root using simple operations.
Input: a number
Output: its square root
sqrt(z)
(1) if z < 0 throw an error.
(2) Let x ← 1, n ← 0.
(3) while n ≤ 50
(4)     Let x ← (x + z/x) /2.
(5)     Let n ← n + 1.

4.3 Secant Method

The secant method for root finding is roughly based on Newton's method; however, it is not assumed that the derivative of the function is known. Rather, the derivative is approximated by the value of the function at some of the iterates, $x_k$. More specifically, the slope of the tangent line at $(x_k, f(x_k))$, which is $f'(x_k)$, is approximated by the slope of the secant line passing through $(x_{k-1}, f(x_{k-1}))$ and $(x_k, f(x_k))$, which is

$$\frac{f(x_k) - f(x_{k-1})}{x_k - x_{k-1}}.$$

[Figure 4.2: One iteration of the Secant method is shown for some quadratic function $f(x)$. The secant line through $(x_{k-1}, f(x_{k-1}))$ and $(x_k, f(x_k))$ is shown. It happens to be the case that $|f(x_{k+1})|$ is smaller than $|f(x_k)|$, i.e., $x_{k+1}$ is a better guess than $x_k$.]

Thus the iterate $x_{k+1}$ is the root of this secant line. That is, it is a root to the equation

$$\frac{f(x_k) - f(x_{k-1})}{x_k - x_{k-1}} (x - x_k) = y - f(x_k).$$

Since the root has a $y$ value of 0, we have

$$\frac{f(x_k) - f(x_{k-1})}{x_k - x_{k-1}} (x_{k+1} - x_k) = -f(x_k),$$

$$x_{k+1} - x_k = -\frac{x_k - x_{k-1}}{f(x_k) - f(x_{k-1})} f(x_k),$$

$$x_{k+1} = x_k - \frac{x_k - x_{k-1}}{f(x_k) - f(x_{k-1})} f(x_k). \qquad (4.2)$$

You will note this is the recurrence of Newton's method, but with the slope $f'(x_k)$ substituted by the slope of the secant line. Note also that the secant method requires two initial guesses, $x_0, x_1$, but can be used on a black box function. An iteration of the secant method is shown in Figure 4.2, along with the secant line. The secant method can suffer from some of the same problems that Newton's method does, as we will see.

4.3.1 Problems

As with Newton's method, convergence is dependent on $f(x)$, and the initial estimates $x_0, x_1$. We illustrate a few possible problems:

Example 4.12. Consider the secant method used on $x^3 + x^2 - x - 1$, with $x_0 = 2$, $x_1 = 1/2$. Note that this function is continuous and has roots $\pm 1$. We give the iterates here:

  k    x_k                   f(x_k)
  0    2                     9
  1    0.5                   -1.125
  2    0.666666666666667     -0.925925925925926
  3    1.44186046511628      2.63467367653163
  4    0.868254072087394     -0.459842466254495
  5    0.953491494113659     -0.177482458876898
  6    1.00706900811804      0.0284762692197613
  7    0.999661272951803     -0.0013544492875992
  8    0.999997617569723     -9.52969840528617 × 10^-06
  9    1.0000000008072       3.22880033820638 × 10^-09
  10   0.999999999999998     -7.43849426498855 × 10^-15

Example 4.13. Consider the secant method for the function $f(x) = \frac{\ln x}{x}$, with $x_0 = 3$, $x_1 = 4$. As with Newton's method, the iterates diverge towards infinity, looking for a nonexistent root.

We give some iterates here:

  k    x_k                   f(x_k)
  0    3                     0.366204096222703
  1    4                     0.346573590279973
  2    21.6548475770851      0.142011128224341
  3    33.9111765137635      0.103911011441661
  4    67.3380435135758      0.0625163004418104
  5    117.820919458675      0.0404780904944712
  6    210.543986613847      0.0254089165873003
  7    366.889164762149      0.0160949419231219
  8    637.060241341843      0.010135406045582
  9    1096.54125113444      0.00638363233543847
  10   1878.34688714646      0.00401318169994875
  11   3201.94672271613      0.0025208146648422
  12   5437.69020766155      0.00158175793894727
  13   9203.60222260594      0.000991714984152597
  14   15533.1606791089      0.000621298692241343
  15   26149.7196085218      0.000388975250950428
  16   43924.8466075548      0.000243375589137882
  17   73636.673898472       0.000152191807070607

Example 4.14. Consider the secant method applied to the function $f(x) = \frac{1}{1+x^2} - \frac{1}{17}$, with initial estimates $x_0 = -1$, $x_1 = 1$. You can easily verify that $f(x_0) = f(x_1)$, and thus the secant line is horizontal. And thus $x_2$ is not defined.

Algorithm 5: Algorithm for finding root by secant method.
Input: a function, initial guesses, an iteration limit, and a tolerance
Output: a point for which the function has small value.
run_secant(f, x0, x1, N, tol)
(1) Let x ← x1, xp ← x0, fh ← f(x1), fp ← f(x0), n ← 0.
(2) while n ≤ N
(3)     if |fh| < tol
(4)         return x.
(5)     Let fpx ← (fh − fp)/(x − xp).
(6)     if |fpx| < tol
(7)         Warn "secant slope is too small; giving up."
(8)         return x.
(9)     Let xp ← x, fp ← fh.
(10)    Let x ← x − fh/fpx.
(11)    Let fh ← f(x).
(12)    Let n ← n + 1.
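The steps of Algorithm 5 translate almost line for line into code. The following Python sketch is one possible reading of the pseudocode, not a definitive implementation. The parameter name `max_iter` stands in for N, and returning the last iterate when the iteration limit is reached is an assumption, since the pseudocode does not say what happens in that case.

```python
import warnings


def run_secant(f, x0, x1, max_iter, tol):
    """Secant iteration following the steps of Algorithm 5."""
    x, xp = x1, x0
    fh, fp = f(x1), f(x0)
    for _ in range(max_iter + 1):  # mirrors "while n <= N"
        if abs(fh) < tol:
            return x
        slope = (fh - fp) / (x - xp)  # slope of the secant line
        if abs(slope) < tol:
            warnings.warn("secant slope is too small; giving up")
            return x
        xp, fp = x, fh
        x = x - fh / slope
        fh = f(x)
    return x  # assumption: hand back the last iterate when the limit is hit
```

Calling `run_secant(lambda x: x**3 + x**2 - x - 1, 2.0, 0.5, 50, 1e-12)` should retrace the iterates of Example 4.12 and stop near the root 1, while the function of Example 4.14 with guesses -1 and 1 triggers the small-slope warning on the first pass.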

4.3.2 Convergence

We consider convergence analysis as in Newton's method. We assume that $r$ is a root of $f(x)$, and let $e_n = r - x_n$. Because the secant method involves two iterates, we assume that we will find some relation between $e_{k+1}$ and the previous two errors, $e_k$, $e_{k-1}$. Indeed, this is the case. Omitting all the nasty details (see Cheney & Kincaid [7]), we arrive at the imprecise equation:

$$e_{k+1} \approx -\frac{f''(r)}{2f'(r)}\, e_k e_{k-1} = C e_k e_{k-1}.$$

Again, the proof relies on finding some "attractor" region and using continuity.

We now postulate that the error terms for the secant method follow some power law of the following type:

$$|e_{k+1}| \sim A |e_k|^{\alpha}.$$

Recall that this held true for Newton's method, with $\alpha = 2$. We try to find the $\alpha$ for the secant method. Note that $|e_k| = A |e_{k-1}|^{\alpha}$, so

$$|e_{k-1}| = A^{-1/\alpha} |e_k|^{1/\alpha}.$$

Then we have

$$A |e_k|^{\alpha} = |e_{k+1}| = C |e_k| |e_{k-1}| = C A^{-1/\alpha} |e_k|^{1 + 1/\alpha}.$$

Since this equation is to hold for all $e_k$, we must have

$$\alpha = 1 + \frac{1}{\alpha}.$$

This is solved by $\alpha = \frac{1}{2}\left(1 + \sqrt{5}\right) \approx 1.62$. Thus we say that the secant method enjoys superlinear convergence. This is somewhere between the convergence rates for bisection and Newton's method.

Exercises

(4.1) Consider the bisection method applied to find the zero of the function $f(x) = x^2 - 5x + 3$, with $a_0 = 0$, $b_0 = 1$. What are $a_1, b_1$? What are $a_2, b_2$?

(4.2) Approximate $\sqrt{10}$ by using two steps of Newton's method, with an initial estimate of $x_0 = 3$. (cf. Example Problem 4.11) Your answer should be correct to 5 decimal places.

(4.3) Consider bisection for finding the root to $\cos x = 0$. Let the initial interval $I_0$ be $[0, 2]$. What is the next interval considered, call it $I_1$? What is $I_2$? $I_6$?

(4.4) What does the sequence defined by
$$x_0 = 1, \qquad x_{k+1} = \frac{1}{2} x_k + \frac{1}{x_k}$$
converge to?

(4.5) Devise a subroutine using only simple operations that finds, via Newton's Method, the cubed root of some input number $z$.

(4.6) Use Newton's Method to approximate $\sqrt[3]{9}$. Start with $x_0 = 2$. Find $x_2$.

(4.7) Use Newton's Method to devise a sequence $x_0, x_1, \ldots$ such that $x_k \to \ln 10$. Is this a reasonable way to write a subroutine that, given $z$, computes $\ln z$? (Hint: such a subroutine would require computation of $e^{x_k}$. Is this possible for rational $x_k$ without using a logarithm? Is it practical?)

(4.8) Give an example (graphical or analytic) of a function, $f(x)$ for which Newton's Method:
  (a) Does not find a root for some choices of $x_0$.
  (b) Finds a root for every choice of $x_0$.
  (c) Falls into a cycle for some choice of $x_0$.
  (d) Converges slowly to the zero of $f(x)$.

(4.9) How will Newton's Method perform for finding the root to $f(x) = |x| = 0$?

(4.10) Implement the inverse finder described in Example Problem 4.10. Your m-file should have header line like:
    function c = invs(z)
where z is the number to be inverted. You may wish to use the builtin function sign. As an extra termination condition, you should have the subroutine return the current iterate if zx is sufficiently close to 1, say within the interval (0.9999, 1.0001). Can you find some z for which the algorithm performs poorly?

(4.11) Implement the bisection method in octave/Matlab. Your m-file should have header line like:
    function c = run_bisection(f, a, b, tol)
where f is the name of the function. Recall that feval(f,x) when f is a string with the name of some function evaluates that function at x. This works for builtin functions and m-files.
  (a) Run your code on the function $f(x) = \cos x$, with $a_0 = 0$, $b_0 = 2$. In this case you can set f = "cos".
  (b) Run your code on the function $f(x) = x^2 - 5x + 3$, with $a_0 = 0$, $b_0 = 1$. In this case you will have to set f to the name of an m-file (without the ".m") which will evaluate the given $f(x)$.
  (c) Run your code on the function $f(x) = x - \cos x$, with $a_0 = 0$, $b_0 = 1$.

(4.12) Implement Newton's Method. Your m-file should have header line like:
    function x = run_newton(f, fp, x0, N, tol)
where f is the name of the function, and fp is its derivative. Run your code to find zeroes of the following functions:
  (a) $f(x) = \tan x - x$.
  (b) $f(x) = x^2 - (2 + \epsilon)x + 1 + \epsilon$, for $\epsilon$ small.
  (c) $f(x) = x^7 - 7x^6 + 21x^5 - 35x^4 + 35x^3 - 21x^2 + 7x - 1$.
  (d) $f(x) = (x - 1)^7$.

(4.13) Implement the secant method. Your m-file should have header line like:
    function x = run_secant(f, x0, x1, N, tol)
where f is the name of the function. Run your code to find zeroes of the following functions:
  (a) $f(x) = \tan x - x$.
  (b) $f(x) = x^2 + 1$. (Note: This function has no zeroes.)
  (c) $f(x) = 2 + \sin x$. (Note: This function also has no zeroes.)


Chapter 5

Interpolation

5.1 Polynomial Interpolation

We consider the problem of finding a polynomial that interpolates a given set of values:

    x    x_0   x_1   ...   x_n
    y    y_0   y_1   ...   y_n

where the $x_i$ are all distinct. A polynomial $p(x)$ is said to interpolate these data if $p(x_i) = y_i$ for $i = 0, 1, \ldots, n$. The $x_i$ values are called "nodes."

Sometimes, we will consider a variant of this problem: we have some black box function, $f(x)$, which we want to approximate with a polynomial $p(x)$. We do this by finding the polynomial interpolant to the data

    x       x_0      x_1      ...   x_n
    f(x)    f(x_0)   f(x_1)   ...   f(x_n)

for some choice of distinct nodes $x_i$.

5.1.1 Lagrange's Method

As you might have guessed, for any such set of data, there is a polynomial of degree at most $n$ that interpolates it. We present a constructive proof of this fact by use of Lagrange Polynomials.

Definition 5.1 (Lagrange Polynomials). For a given set of $n + 1$ nodes $x_i$, the Lagrange polynomials are the $n + 1$ polynomials $\ell_i$ defined by

$$\ell_i(x_j) = \delta_{ij} = \begin{cases} 0 & \text{if } i \neq j, \\ 1 & \text{if } i = j. \end{cases}$$

Then we define the interpolating polynomial as

$$p_n(x) = \sum_{i=0}^{n} y_i \ell_i(x).$$

If each Lagrange Polynomial is of degree at most $n$, then $p_n$ also has this property. The Lagrange Polynomials can be characterized as follows:

$$\ell_i(x) = \prod_{j=0,\, j \neq i}^{n} \frac{x - x_j}{x_i - x_j}. \qquad (5.1)$$

By evaluating this product for each $x_j$, we see that this is indeed a characterization of the Lagrange Polynomials. Moreover, each polynomial is clearly the product of $n$ monomials, and thus has degree no greater than $n$. This gives the theorem

Theorem 5.2 (Interpolant Existence and Uniqueness). Let $\{x_i\}_{i=0}^{n}$ be distinct nodes. Then for any values at the nodes, $\{y_i\}_{i=0}^{n}$, there is exactly one polynomial, $p(x)$ of degree no greater than $n$ such that $p(x_i) = y_i$ for $i = 0, 1, \ldots, n$.

Proof. The Lagrange Polynomial construction gives existence of such a polynomial $p(x)$ of degree no greater than $n$. Suppose there were two such polynomials, call them $p(x)$, $q(x)$, each of degree no greater than $n$, both interpolating the data. Let $r(x) = p(x) - q(x)$. Note that $r(x)$ can have degree no greater than $n$, yet it has roots at $x_0, x_1, \ldots, x_n$. The only polynomial of degree $\le n$ that has $n + 1$ distinct roots is the zero polynomial, i.e., $0 \equiv r(x) = p(x) - q(x)$. Thus $p(x)$, $q(x)$ are equal everywhere, i.e., they are the same polynomial.

Example Problem 5.3. Construct the polynomial interpolating the data

    x    1    1/2    3
    y    3    -10    2

by using Lagrange Polynomials.

Solution: We construct the Lagrange Polynomials:

$$\ell_0(x) = \frac{(x - \frac{1}{2})(x - 3)}{(1 - \frac{1}{2})(1 - 3)} = -(x - \tfrac{1}{2})(x - 3)$$

$$\ell_1(x) = \frac{(x - 1)(x - 3)}{(\frac{1}{2} - 1)(\frac{1}{2} - 3)} = \tfrac{4}{5}(x - 1)(x - 3)$$

$$\ell_2(x) = \frac{(x - 1)(x - \frac{1}{2})}{(3 - 1)(3 - \frac{1}{2})} = \tfrac{1}{5}(x - 1)(x - \tfrac{1}{2})$$

Then the interpolating polynomial, in "Lagrange Form" is

$$p_2(x) = -3(x - \tfrac{1}{2})(x - 3) - 8(x - 1)(x - 3) + \tfrac{2}{5}(x - 1)(x - \tfrac{1}{2}).$$
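The product formula (5.1) and the sum defining the interpolant can be evaluated directly. The Python sketch below is illustrative only: the function names are chosen here, and the data are those of Example Problem 5.3.

```python
def lagrange_basis(nodes, i, x):
    """Value at x of the i-th Lagrange polynomial for the given nodes."""
    value = 1.0
    for j, xj in enumerate(nodes):
        if j != i:
            value *= (x - xj) / (nodes[i] - xj)
    return value


def lagrange_interpolant(nodes, values, x):
    """Value at x of the interpolating polynomial in Lagrange form."""
    return sum(y * lagrange_basis(nodes, i, x) for i, y in enumerate(values))


# Data of Example Problem 5.3.
nodes = [1.0, 0.5, 3.0]
values = [3.0, -10.0, 2.0]
```

Evaluating `lagrange_interpolant(nodes, values, t)` at each node returns the corresponding data value, and `lagrange_basis(nodes, i, nodes[j])` reproduces the Kronecker delta property.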

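[Figure 5.1: The 3 Lagrange polynomials for the example are shown. Verify, graphically, that these polynomials have the property $\ell_i(x_j) = \delta_{ij}$ for the nodes $1, \frac{1}{2}, 3$.]

5.1.2 Newton's Method

There is another way to prove existence. This method is also constructive, and leads to a different algorithm for constructing the interpolant. One way to view this construction is to imagine how one would update the Lagrangian form of an interpolant. That is, suppose some data were given, and the interpolating polynomial calculated using Lagrange Polynomials; then a new point was given $(x_{n+1}, y_{n+1})$, and an interpolant for the augmented set of data is to be found. Each Lagrange Polynomial would have to be updated. This could take a lot of calculation (especially if $n$ is large).

So the alternative method constructs the polynomials iteratively. Thus we create polynomials $p_k(x)$ such that $p_k(x_i) = y_i$ for $0 \le i \le k$. This is simple for $k = 0$: we simply let $p_0(x) = y_0$, the constant polynomial with value $y_0$. Then assume we have a proper $p_k(x)$ and want to construct $p_{k+1}(x)$. The following construction works:

$$p_{k+1}(x) = p_k(x) + c(x - x_0)(x - x_1) \cdots (x - x_k),$$

for some constant $c$. Note that the second term will be zero for any $x_i$ for $0 \le i \le k$, so $p_{k+1}(x)$ will interpolate the data at $x_0, x_1, \ldots, x_k$. To get the value of the constant we calculate $c$ such that

$$y_{k+1} = p_{k+1}(x_{k+1}) = p_k(x_{k+1}) + c(x_{k+1} - x_0)(x_{k+1} - x_1) \cdots (x_{k+1} - x_k).$$

This construction is known as Newton's Algorithm, and the resultant form is Newton's form of the interpolant.

Example Problem 5.4. Construct the polynomial interpolating the data

    x    1    1/2    3
    y    3    -10    2

by using Newton's Algorithm.

Solution: We construct the polynomial iteratively:

$$p_0(x) = 3$$
$$p_1(x) = 3 + c(x - 1)$$

We want $-10 = p_1(\frac{1}{2}) = 3 + c(-\frac{1}{2})$, and thus $c = 26$. Then

$$p_2(x) = 3 + 26(x - 1) + c(x - 1)(x - \tfrac{1}{2}).$$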

We want $2 = p_2(3) = 3 + 26(2) + c(2)(\frac{5}{2})$, and thus $c = \frac{-53}{5}$. Then we get

$$p_2(x) = 3 + 26(x - 1) - \frac{53}{5}(x - 1)(x - \tfrac{1}{2}).$$

Does Newton's Algorithm give a different polynomial? It is easy to show, by induction, that the degree of $p_n(x)$ is no greater than $n$. Then, by Theorem 5.2, it must be the same polynomial as the Lagrange interpolant.

[Figure 5.2: The interpolating polynomial for the example is shown.]

Since the two methods give the same interpolant, we may wonder "Which should we use?" Newton's Algorithm seems more flexible: it can deal with adding new data. There also happens to be a way of storing Newton's form of the interpolant that makes the polynomial simple to evaluate (in the sense of number of calculations required).

5.1.3 Newton's Nested Form

Recall the iterative construction of Newton's Form:

$$p_{k+1}(x) = p_k(x) + c_k(x - x_0)(x - x_1) \cdots (x - x_k).$$

The previous iterate $p_k(x)$ was constructed similarly, so we can write:

$$p_{k+1}(x) = \left[p_{k-1}(x) + c_{k-1}(x - x_0)(x - x_1) \cdots (x - x_{k-1})\right] + c_k(x - x_0)(x - x_1) \cdots (x - x_k).$$

Continuing in this way we see that we can write

$$p_n(x) = \sum_{k=0}^{n} c_k \prod_{0 \le j < k} (x - x_j), \qquad (5.2)$$

where an empty product has value 1 by convention. This can be rewritten in a funny form, where a monomial is factored out of each successive summand:

$$p_n(x) = c_0 + (x - x_0)\left[c_1 + (x - x_1)\left[c_2 + (x - x_2)\left[\ldots\right]\right]\right]$$

Supposing for an instant that the constants $c_k$ were known, this provides a better way of calculating $p_n(t)$ at arbitrary $t$. By "better" we mean requiring few multiplications and additions. This nested calculation is performed iteratively:

$$v_0 = c_n$$
$$v_1 = c_{n-1} + (t - x_{n-1})v_0$$
$$v_2 = c_{n-2} + (t - x_{n-2})v_1$$
$$\vdots$$
$$v_n = c_0 + (t - x_0)v_{n-1}$$

This requires only $n$ multiplications and $2n$ additions. Compare this with the number required for using the Lagrange form: at least $n^2$ additions and multiplications.

5.1.4 Divided Differences

It turns out that the coefficients $c_k$ for Newton's nested form can be calculated relatively easily by using divided differences. We assume, for the remainder of this section, that we are considering interpolating a function, that is, we have values of $f(x_i)$ at the nodes $x_i$.

Definition 5.5 (Divided Differences). For a given collection of nodes $\{x_i\}_{i=0}^{n}$ and values $\{f(x_i)\}_{i=0}^{n}$, a $k$th order divided difference is a function of $k + 1$ (not necessarily distinct) nodes, written as

$$f[x_{i_0}, x_{i_1}, \ldots, x_{i_k}]$$

The divided differences are defined recursively as follows:

• The 0th order divided differences are simply defined: $f[x_i] = f(x_i)$.

• Higher order divided differences are the ratio of differences:

$$f[x_{i_0}, x_{i_1}, \ldots, x_{i_k}] = \frac{f[x_{i_1}, x_{i_2}, \ldots, x_{i_k}] - f[x_{i_0}, x_{i_1}, \ldots, x_{i_{k-1}}]}{x_{i_k} - x_{i_0}}$$

We care about divided differences because coefficients for the Newton nested form are divided differences:

$$c_k = f[x_0, x_1, \ldots, x_k]. \qquad (5.3)$$

Because we are only interested in the Newton method coefficients, we will only consider divided differences with successive nodes, i.e., those of the form $f[x_j, x_{j+1}, \ldots, x_{j+k}]$. In this case the higher order differences can more simply be written as

$$f[x_j, x_{j+1}, \ldots, x_{j+k}] = \frac{f[x_{j+1}, x_{j+2}, \ldots, x_{j+k}] - f[x_j, x_{j+1}, \ldots, x_{j+k-1}]}{x_{j+k} - x_j}.$$

The graphical way to calculate these things is a "pyramid scheme" (that's supposed to be a joke), where you compute the following table columnwise, from left to right:

    x      f[ ]      f[ , ]         f[ , , ]            f[ , , , ]
    x_0    f[x_0]    f[x_0, x_1]    f[x_0, x_1, x_2]    f[x_0, x_1, x_2, x_3]
    x_1    f[x_1]    f[x_1, x_2]    f[x_1, x_2, x_3]
    x_2    f[x_2]    f[x_2, x_3]
    x_3    f[x_3]

Note that by definition, the first column just echoes the data: $f[x_i] = f(x_i)$.

We drag out our example one last time:

Example Problem 5.6. Find the divided differences for the following data

    x       1    1/2    3
    f(x)    3    -10    2

Solution: We start by writing in the data:

    x      f[ ]    f[ , ]    f[ , , ]
    1      3
    1/2    -10
    3      2

Then we calculate:

$$f[x_0, x_1] = \frac{-10 - 3}{(1/2) - 1} = 26, \quad\text{and}\quad f[x_1, x_2] = \frac{2 - (-10)}{3 - (1/2)} = 24/5.$$

Adding these to the table, we have

    x      f[ ]    f[ , ]    f[ , , ]
    1      3       26
    1/2    -10     24/5
    3      2

Then we calculate

$$f[x_0, x_1, x_2] = \frac{24/5 - 26}{3 - 1} = \frac{-53}{5}.$$

So we complete the table:

    x      f[ ]    f[ , ]    f[ , , ]
    1      3       26        -53/5
    1/2    -10     24/5
    3      2

You should verify that along the top line of this pyramid you can read off the coefficients for Newton's form, as found in Example Problem 5.4.
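The pyramid and the nested evaluation can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the notes' own code: the names `newton_coefs` and `newton_eval` are chosen here (loosely echoing the m-file names in the exercises), and the table is stored in a single list that is overwritten column by column, which is one common way to organise the calculation.

```python
def newton_coefs(xs, fxs):
    """Newton-form coefficients c_k = f[x_0, ..., x_k] by divided differences."""
    coefs = list(fxs)  # column 0 of the pyramid: f[x_i] = f(x_i)
    n = len(xs)
    for k in range(1, n):
        # Work from the bottom up so that entry i becomes f[x_{i-k}, ..., x_i]
        # while entries 0..k-1 keep the finished top-line values.
        for i in range(n - 1, k - 1, -1):
            coefs[i] = (coefs[i] - coefs[i - 1]) / (xs[i] - xs[i - k])
    return coefs


def newton_eval(t, coefs, xs):
    """Evaluate the Newton form at t using the nested form."""
    v = coefs[-1]  # v_0 = c_n
    n = len(coefs) - 1
    for c, x in zip(reversed(coefs[:n]), reversed(xs[:n])):
        v = c + (t - x) * v  # v_{m+1} = c_{n-m-1} + (t - x_{n-m-1}) v_m
    return v
```

With the data of Example Problem 5.6, `newton_coefs([1.0, 0.5, 3.0], [3.0, -10.0, 2.0])` should return the top line of the pyramid, and `newton_eval` applied to those coefficients reproduces the data values at the nodes.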

5.2 Errors in Polynomial Interpolation

We now consider two questions:

1. If we want to interpolate some function $f(x)$ at $n + 1$ nodes over some closed interval, how should we pick the nodes?

2. How accurate can we make a polynomial interpolant over a closed interval?

You may be surprised to find that the answer to the first question is not that we should make the $x_i$ equally spaced over the closed interval, as the following example illustrates.

Example 5.7 (Runge Function). Let

$$f(x) = \left(1 + x^2\right)^{-1},$$

(known as the Runge Function), and let $p_n(x)$ interpolate $f$ on $n$ equally spaced nodes, including the endpoints, on $[-5, 5]$. Then

$$\lim_{n \to \infty} \max_{x \in [-5, 5]} |p_n(x) - f(x)| = \infty.$$

This behaviour is shown in Figure 5.3.

It turns out that a much better choice is related to the Chebyshev Polynomials ("of the first kind"). If our closed interval is $[-1, 1]$, then we want to define our nodes as

$$x_i = \cos\left(\frac{2i + 1}{2n + 2}\,\pi\right), \quad 0 \le i \le n.$$

Literally interpreted, these Chebyshev Nodes are the projections of points uniformly spaced on a semi circle; see Figure 5.4. By using the Chebyshev nodes, a good polynomial interpolant of the Runge function of Example 5.7 can be found, as shown in Figure 5.5.

5.2.1 Interpolation Error Theorem

We did not just invent the Chebyshev nodes. The fact that they are "good" for interpolation follows from the following theorem:

Theorem 5.8 (Interpolation Error Theorem). Let $p$ be the polynomial of degree at most $n$ interpolating function $f$ at the $n + 1$ distinct nodes $x_0, x_1, \ldots, x_n$ on $[a, b]$. Let $f^{(n+1)}$ be continuous. Then for each $x \in [a, b]$ there is some $\xi \in [a, b]$ such that

$$f(x) - p(x) = \frac{1}{(n + 1)!} f^{(n+1)}(\xi) \prod_{i=0}^{n} (x - x_i).$$

You should be thinking that the term on the right hand side resembles the error term in Taylor's Theorem.

[Figure 5.3: The Runge function is poorly interpolated at n equally spaced nodes, as n gets large. At first, the interpolations appear to improve with larger n, as in (a). In (b) it becomes apparent that the interpolant is good in center of the interval, but bad at the edges. As shown in (c), this gets worse as n gets larger. Panels: (a) 3, 7, 10 equally spaced nodes; (b) 15 equally spaced nodes; (c) 50 equally spaced nodes.]

[Figure 5.4: The Chebyshev nodes are the projections of nodes equally spaced on a semi circle.]

[Figure 5.5: The Chebyshev nodes yield good polynomial interpolants of the Runge function. Compare these figures to those of Figure 5.3. For more than about 25 nodes, the interpolant is indistinguishable from the Runge function by eye. Panels: (a) 3, 7, 10 Chebyshev nodes; (b) 17 Chebyshev nodes.]

Proof. First consider when $x$ is one of the nodes $x_i$; in this case both the LHS and RHS are zero. So assume $x$ is not a node. Make the following definitions

$$w(t) = \prod_{i=0}^{n} (t - x_i), \qquad c = \frac{f(x) - p(x)}{w(x)}, \qquad \phi(t) = f(t) - p(t) - c\,w(t).$$

Since $x$ is not a node, $w(x)$ is nonzero. Now note that $\phi(x_i)$ is zero for each node $x_i$, and that by definition of $c$, $\phi(x) = 0$ for our $x$. That is, $\phi(t)$ has $n + 2$ roots.

Some other facts: $f, p, w$ have $n + 1$ continuous derivatives, by assumption and definition; thus $\phi$ has this many continuous derivatives. Apply Rolle's Theorem to $\phi(t)$ to find that $\phi'(t)$ has $n + 1$ roots.

Then apply Rolle's Theorem again to find $\phi''(t)$ has $n$ roots. In this way we see that $\phi^{(n+1)}(t)$ has a root, call it $\xi$. That is

$$0 = \phi^{(n+1)}(\xi) = f^{(n+1)}(\xi) - p^{(n+1)}(\xi) - c\,w^{(n+1)}(\xi).$$

But $p(t)$ is a polynomial of degree $\le n$, so $p^{(n+1)}$ is identically zero. And $w(t)$ is a polynomial of degree $n + 1$ in $t$, so its $(n + 1)$th derivative is easily seen to be $(n + 1)!$. Thus

$$0 = f^{(n+1)}(\xi) - c\,(n + 1)!$$
$$c\,(n + 1)! = f^{(n+1)}(\xi)$$
$$\frac{f(x) - p(x)}{w(x)} = \frac{1}{(n + 1)!} f^{(n+1)}(\xi)$$
$$f(x) - p(x) = \frac{1}{(n + 1)!} f^{(n+1)}(\xi)\, w(x),$$

which is what was to be proven.

Thus the error in a polynomial interpolation is given as

$$f(x) - p(x) = \frac{1}{(n + 1)!} f^{(n+1)}(\xi) \prod_{i=0}^{n} (x - x_i).$$

We have no control over the function $f(x)$ or its derivatives, and once the nodes and $f$ are fixed, $p$ is determined; thus the only way we can make the error $|f(x) - p(x)|$ small is by judicious choice of the nodes $x_i$.

The Chebyshev nodes on $[-1, 1]$ have the remarkable property that

$$\left|\prod_{i=0}^{n} (t - x_i)\right| \le 2^{-n}$$

for any $t \in [-1, 1]$. Moreover, it can be shown that for any choice of nodes $x_i$ that

$$\max_{t \in [-1, 1]} \left|\prod_{i=0}^{n} (t - x_i)\right| \ge 2^{-n}.$$

Thus the Chebyshev nodes are considered the best for polynomial interpolation. Merging this result with Theorem 5.8, the error for polynomial interpolants defined on Chebyshev nodes can be bounded as

$$|f(x) - p(x)| \le \frac{1}{2^n (n + 1)!} \max_{\xi \in [-1, 1]} \left|f^{(n+1)}(\xi)\right|.$$

The Chebyshev nodes can be rescaled and shifted for use on the general interval $[\alpha, \beta]$. In this case they take the form

$$x_i = \frac{\beta - \alpha}{2} \cos\left(\frac{2i + 1}{2n + 2}\,\pi\right) + \frac{\alpha + \beta}{2}, \quad 0 \le i \le n.$$

In this case, the rescaling of the nodes changes the bound on $\prod (t - x_i)$ so the overall error bound becomes

$$|f(x) - p(x)| \le \frac{(\beta - \alpha)^{n+1}}{2^{2n+1} (n + 1)!} \max_{\xi \in [\alpha, \beta]} \left|f^{(n+1)}(\xi)\right|,$$

for $x \in [\alpha, \beta]$.
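The contrast between equally spaced and Chebyshev nodes on the Runge function can be checked numerically. The Python sketch below is a rough experiment, not a proof: all function names are chosen here, the interpolant is evaluated straight from its Lagrange form, and the maximum error is only estimated on a finite sample grid.

```python
import math


def chebyshev_nodes(n, alpha=-1.0, beta=1.0):
    """The n + 1 Chebyshev nodes rescaled to [alpha, beta]."""
    return [
        0.5 * (beta - alpha) * math.cos((2 * i + 1) * math.pi / (2 * n + 2))
        + 0.5 * (alpha + beta)
        for i in range(n + 1)
    ]


def equal_nodes(n, alpha, beta):
    """n + 1 equally spaced nodes on [alpha, beta], endpoints included."""
    return [alpha + (beta - alpha) * i / n for i in range(n + 1)]


def interpolate(nodes, values, x):
    """Evaluate the interpolant at x from its Lagrange form."""
    total = 0.0
    for i, xi in enumerate(nodes):
        term = values[i]
        for j, xj in enumerate(nodes):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total


def max_error(f, nodes, alpha, beta, samples=1001):
    """Largest sampled |p(x) - f(x)| on [alpha, beta] (a grid estimate)."""
    values = [f(x) for x in nodes]
    grid = [alpha + (beta - alpha) * k / (samples - 1) for k in range(samples)]
    return max(abs(interpolate(nodes, values, x) - f(x)) for x in grid)


def runge(x):
    """The Runge function of Example 5.7."""
    return 1.0 / (1.0 + x * x)
```

Comparing `max_error(runge, equal_nodes(16, -5.0, 5.0), -5.0, 5.0)` with the same call using `chebyshev_nodes(16, -5.0, 5.0)` should show the equally spaced error to be far larger, in line with Figures 5.3 and 5.5.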

Example Problem 5.9. How many Chebyshev nodes are required to interpolate the function $f(x) = \sin(x) + \cos(x)$ to within $10^{-8}$ on the interval $[0, \pi]$?

Solution: We first find the derivatives of $f$

$$f'(x) = \cos(x) - \sin(x)$$
$$f''(x) = -\sin(x) - \cos(x)$$
$$f'''(x) = -\cos(x) + \sin(x)$$
$$\vdots$$

As a crude approximation we can assert that

$$\left|f^{(k)}(x)\right| \le |\cos x| + |\sin x| \le 2.$$

Thus it suffices to take $n$ large enough such that

$$\frac{\pi^{n+1}}{2^{2n+1} (n + 1)!}\, 2 \le 10^{-8}.$$

By trial and error we see that $n = 10$ suffices.

5.2.2 Interpolation Error for Equally Spaced Nodes

Despite the proven superiority of Chebyshev Nodes, and the problems with the Runge Function, equally spaced nodes are frequently used for interpolation, since they are easy to calculate. (Never underestimate the power of laziness.) We now consider bounding

$$\max_{x \in [a, b]} \prod_{i=0}^{n} |x - x_i|,$$

where

$$x_i = a + hi = a + \frac{(b - a)}{n}\, i, \quad i = 0, 1, \ldots, n.$$

Start by picking an $x$. We can assume $x$ is not one of the nodes, otherwise the product in question is zero. Then $x$ is between some $x_j, x_{j+1}$. We can show that

$$|x - x_j|\,|x - x_{j+1}| \le \frac{h^2}{4},$$

by simple calculus. Now we claim that $|x - x_i| \le (j - i + 1)\,h$ for $i < j$, and $|x - x_i| \le (i - j)\,h$ for $j + 1 < i$. Then

$$\prod_{i=0}^{n} |x - x_i| \le \frac{h^2}{4} \left[(j + 1)!\, h^{j}\right] \left[(n - j)!\, h^{n-j-1}\right].$$

It can be shown that $(j + 1)!(n - j)! \le n!$, and so we get an overall bound

$$\prod_{i=0}^{n} |x - x_i| \le \frac{h^{n+1}\, n!}{4}.$$

The interpolation theorem then gives us

$$|f(x) - p(x)| \le \frac{h^{n+1}}{4(n + 1)} \max_{\xi \in [a, b]} \left|f^{(n+1)}(\xi)\right|,$$

where $h = (b - a)/n$.

The reason this result does not seem to apply to Runge's Function is that $f^{(n)}$ for Runge's Function becomes unbounded as $n \to \infty$.

Example Problem 5.10. How many equally spaced nodes are required to interpolate the function $f(x) = \sin(x) + \cos(x)$ to within $10^{-8}$ on the interval $[0, \pi]$?

Solution: As in Example Problem 5.9, we make the crude approximation

$$\left|f^{(k)}(x)\right| \le |\cos x| + |\sin x| \le 2.$$

Thus we want to make $n$ sufficiently large such that

$$\frac{h^{n+1}}{4(n + 1)}\, 2 \le 10^{-8},$$

where $h = (\pi - 0)/n$. That is we want to find $n$ large enough such that

$$\frac{\pi^{n+1}}{2 n^{n+1} (n + 1)} \le 10^{-8}.$$

By simply trying small numbers, we can see this is satisfied if $n = 12$.
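The "trial and error" in Example Problems 5.9 and 5.10 is easy to mechanise. The Python sketch below simply evaluates the two error bounds for increasing n; the function names and the linear search are choices made here for illustration, and `deriv_bound` stands for the crude bound 2 on the derivatives.

```python
import math


def chebyshev_bound(n, length, deriv_bound):
    """Error bound for n + 1 Chebyshev nodes on an interval of the given length."""
    return length ** (n + 1) * deriv_bound / (2 ** (2 * n + 1) * math.factorial(n + 1))


def equal_spacing_bound(n, length, deriv_bound):
    """Error bound for n + 1 equally spaced nodes, with h = length / n."""
    h = length / n
    return h ** (n + 1) * deriv_bound / (4 * (n + 1))


def smallest_n(bound, length, deriv_bound, target):
    """Smallest n >= 1 whose bound is at most target (simple linear search)."""
    n = 1
    while bound(n, length, deriv_bound) > target:
        n += 1
    return n
```

Calling `smallest_n(chebyshev_bound, math.pi, 2.0, 1e-8)` and `smallest_n(equal_spacing_bound, math.pi, 2.0, 1e-8)` carries out the two searches described in the examples.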

Exercises

(5.1) Find the Lagrange Polynomials for the nodes $\{-1, 1\}$.

(5.2) Find the Lagrange Polynomials for the nodes $\{-1, 1, 5\}$.

(5.3) Find the polynomial of degree no greater than 2 that interpolates

    x    -1    1    5
    y    3     3    -2

(5.4) Complete the divided differences table:

    x     f[ ]    f[ , ]    f[ , , ]
    -1    3
    1     3
    5     -2

Find the Newton form of the polynomial interpolant.

(5.5) Find the polynomial of degree no greater than 3 that interpolates

    x    -1    1    5     -3
    y    3     3    -2    4

(Hint: reuse the Newton form of the polynomial from the previous question.)

(5.6) Find the nested form of the polynomial interpolant of the data

    x    1     3     4     6
    y    -3    13    21    1

by completing the following divided differences table:

    x    f[ ]    f[ , ]    f[ , , ]    f[ , , , ]
    1    -3
    3    13
    4    21
    6    1

(5.7) Find the polynomial of degree no greater than 3 that interpolates

    x    1    0    3/2     2
    y    3    2    37/8    8

(5.8) Complete the divided differences table:

    x     f[ ]    f[ , ]    f[ , , ]    f[ , , , ]
    -1    6
    0     3
    1     2
    2     3

(Something a little odd should have happened in the last column.) Find the Newton form of the polynomial interpolant. Of what degree is the polynomial interpolant?

(5.9) Let $p(x)$ interpolate the function $\cos(x)$ at $n$ equally spaced nodes on the interval $[0, 2]$. Bound the error
$$\max_{0 \le x \le 2} |p(x) - \cos(x)|$$
as a function of $n$. How small is the error when $n = 10$? How small would the error be if Chebyshev nodes were used instead? How about when $n = 10$?

(5.10) Let $p(x)$ interpolate the function $x^{-2}$ at $n$ equally spaced nodes on the interval $[0.5, 1]$. Bound the error
$$\max_{0.5 \le x \le 1} \left|p(x) - x^{-2}\right|$$
as a function of $n$. How small is the error when $n = 10$?

(5.11) How many Chebyshev nodes are required to interpolate the function $\frac{1}{x}$ to within $10^{-6}$ on the interval $[1, 2]$?

(5.12) Write code to calculate the Newton form coefficients, by divided differences, for the nodes $x_i$ and values $f(x_i)$. Your m-file should have header line like:
    function coefs = newtonCoef(xs,fxs)
where xs is the vector of n + 1 nodes, and fxs the vector of n + 1 values. Test your code on the following input:
    octave:1> xs = [1 -1 2 -2 3 -3 4 -4];
    octave:2> fxs = [1 1 2 3 5 8 13 21];
    octave:3> newtonCoef(xs,fxs)
    ans =
      1.00000 -0.00000 0.33333 -0.08333 0.05000 0.00417 -0.00020 -0.00040
  (a) What do you get when you try the following?
    octave:4> xs = [1 4 5 9 14 23 37 60];
    octave:5> fxs = [3 1 4 1 5 9 2 6];
    octave:6> newtonCoef(xs,ys)
  (b) Try the following:
    octave:7> xs = [1 3 4 2 8 -2 0 14 23 15];
    octave:8> fxs = xs.*xs + xs .+ 4;
    octave:9> newtonCoef(xs,fxs)

(5.13) Write code to calculate a polynomial interpolant from its Newton form coefficients and the node values. Your m-file should have header line like:
    function y = calcNewton(t,coefs,xs)
where coefs is a vector of the Newton coefficients, xs is a vector of the nodes $x_i$, and y is the value of the interpolating polynomial at t. Check your code against the following values:
    octave:1> xs = [1 3 4];
    octave:2> coefs = [6 5 1];
    octave:3> calcNewton (5,coefs,xs)
    ans = 34
    octave:4> calcNewton (-3,coefs,xs)
    ans = 10
  (a) What do you get when you try the following?
    octave:5> xs = [3 1 4 5 9 2 6 8 7 0];
    octave:6> coefs = [1 -1 2 -2 3 -3 4 -4 5 -5];
    octave:7> calcNewton(0.5,coefs,xs)
  (b) Try the following
    octave:8> xs = [3 1 4 5 9 2 6 8 7 0];
    octave:9> coefs = [1 -1 2 -2 3 -3 4 -4 5 -5];
    octave:10> calcNewton(1,coefs,xs)


Chapter 6

Spline Interpolation

Splines are used to approximate complex functions and shapes. A spline is a function consisting of simple functions glued together. In this way a spline is different from a polynomial interpolation, which consists of a single well defined function that approximates a given shape; splines are normally piecewise polynomial.

6.1 First and Second Degree Splines

Splines make use of partitions, which are a way of cutting an interval into a number of subintervals.

Definition 6.1 (Partition). A partition of the interval $[a, b]$ is an ordered sequence $\{t_i\}_{i=0}^{n}$ such that

$$a = t_0 < t_1 < \cdots < t_{n-1} < t_n = b.$$

The numbers $t_i$ are known as knots.

A spline of degree 1, also known as a linear spline, is a function which is linear on each subinterval defined by a partition:

Definition 6.2 (Linear Splines). A function $S$ is a spline of degree 1 on $[a, b]$ if

1. The domain of $S$ is $[a, b]$.
2. $S$ is continuous on $[a, b]$.
3. There is a partition $\{t_i\}_{i=0}^{n}$ of $[a, b]$ such that on each $[t_i, t_{i+1}]$, $S$ is a linear polynomial.

A linear spline is defined entirely by its value at the knots. That is, given

    t    t_0   t_1   ...   t_n
    y    y_0   y_1   ...   y_n

there is only one linear spline with these values at the knots and linear on each given subinterval. For a spline with this data, the linear polynomial on each subinterval is defined as

$$S_i(x) = y_i + \frac{y_{i+1} - y_i}{t_{i+1} - t_i}\,(x - t_i).$$

Note that if $x \in [t_i, t_{i+1}]$, then $x - t_i \ge 0$, but $x - t_{i+1} \le 0$. Thus if we wish to evaluate $S(x)$, we search for the largest $i < n$ such that $x - t_i \ge 0$, then evaluate $S_i(x)$.

Example 6.3. The linear spline for the following data

    t    0.0    0.1    0.4    0.5    0.75    1.0
    y    1.3    4.5    2.0    2.1    5.0     3

is shown in Figure 6.1.
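The search-then-evaluate procedure for a linear spline can be sketched as follows. This Python fragment is illustrative: the function name is chosen here, the knots are assumed to be sorted as in a partition, and the standard-library `bisect_right` is used for the search in place of a hand-written loop.

```python
from bisect import bisect_right


def linear_spline(knots, values, x):
    """Evaluate at x the linear spline through (knots[i], values[i])."""
    if not knots[0] <= x <= knots[-1]:
        raise ValueError("x lies outside the domain [a, b] of the spline")
    # Largest i with knots[i] <= x, capped so that x = b uses the last piece.
    i = min(bisect_right(knots, x) - 1, len(knots) - 2)
    slope = (values[i + 1] - values[i]) / (knots[i + 1] - knots[i])
    return values[i] + slope * (x - knots[i])
```

For the hypothetical data `knots = [0.0, 1.0, 3.0]` and `values = [1.0, 3.0, 2.0]`, the spline passes through each data point and varies linearly in between.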

[Figure 6.1: A linear spline. The spline is piecewise linear, and is linear between each knot.]

6.1.1 First Degree Spline Accuracy

As with polynomial functions, splines are used to interpolate tabulated data as well as functions. In the latter case, if the spline is being used to interpolate the function $f$, say, then this is equivalent to interpolating the data

    t    t_0       t_1       ...   t_n
    y    f(t_0)    f(t_1)    ...   f(t_n)

A function and its linear spline interpolant are shown in Figure 6.2. The spline interpolant in that figure is fairly close to the function over some of the interval in question, but it also deviates greatly from the function at other points of the interval. We are interested in finding bounds on the possible error between a function and its spline interpolant.

To find the error bound, we will consider the error on a single interval of the partition, and use a little calculus. (Although we could directly claim Theorem 5.8, it is a bit of overkill.) Suppose $p(t)$ is the linear polynomial interpolating $f(t)$ at the endpoints of the subinterval $[t_i, t_{i+1}]$, then for $t \in [t_i, t_{i+1}]$,

$$|f(t) - p(t)| \le \max\left\{|f(t) - f(t_i)|,\ |f(t) - f(t_{i+1})|\right\}.$$

That is, $|f(t) - p(t)|$ is no larger than the "maximum variation" of $f(t)$ on this interval. In particular, if $f'(t)$ exists and is bounded by $M_1$ on $[t_i, t_{i+1}]$, then

$$|f(t) - p(t)| \le \frac{M_1}{2}\,(t_{i+1} - t_i).$$

Similarly, if $f''(t)$ exists and is bounded by $M_2$ on $[t_i, t_{i+1}]$, then

$$|f(t) - p(t)| \le \frac{M_2}{8}\,(t_{i+1} - t_i)^2.$$

Over a given partition, these become, respectively

\[ |f(t) - p(t)| \le \frac{M_1}{2} \max_{0 \le i < n} (t_{i+1} - t_i), \qquad |f(t) - p(t)| \le \frac{M_2}{8} \max_{0 \le i < n} (t_{i+1} - t_i)^2. \tag{6.1} \]
If equally spaced nodes are used, these bounds guarantee that spline interpolants become better as the number of nodes is increased. This contrasts with polynomial interpolants, which may get worse as the number of nodes is increased, cf. Example 5.9.

[Figure 6.2: The function $f(t) = 4.1 + \sin\left(1/(0.08t + 0.5)\right)$ is shown, along with the linear spline interpolant, for a given set of knots in $[0, 1]$. For some values of $t$, the function and its interpolant are very close, while they vary greatly in the middle of the interval.]

6.1.2 Second Degree Splines

Piecewise quadratic splines, or splines of degree 2, are defined similarly:

Definition 6.4 (Quadratic Splines). A function $Q$ is a quadratic spline on $[a, b]$ if
1. The domain of $Q$ is $[a, b]$.
2. $Q$ is continuous on $[a, b]$.
3. $Q'$ is continuous on $(a, b)$.
4. There is a partition $\{t_i\}_{i=0}^{n}$ of $[a, b]$ such that on $[t_i, t_{i+1}]$, $Q$ is a polynomial of degree at most 2.

Example 6.5. The following is a quadratic spline:
\[ Q(x) = \begin{cases} -x & x \le 0, \\ x^2 - x & 0 \le x \le 2, \\ -x^2 + 7x - 8 & 2 \le x. \end{cases} \]

Unlike linear splines, quadratic splines are not defined entirely by their values at the knots. We consider why that is. The spline $Q(x)$ is defined by its piecewise polynomials, $Q_i(x) = a_i x^2 + b_i x + c_i$.

Thus there are $3n$ parameters to define $Q(x)$. For each of the $n$ subintervals, the data
\[ \begin{array}{c|cccc} t & t_0 & t_1 & \cdots & t_n \\ \hline y & y_0 & y_1 & \cdots & y_n \end{array} \]
give two equations regarding $Q_i(x)$, namely that $Q_i(t_i)$ must equal $y_i$ and $Q_i(t_{i+1})$ must equal $y_{i+1}$. This is $2n$ equations. The condition on continuity of $Q'$ gives a single equation for each of the $n - 1$ internal nodes. This totals $3n - 1$ equations, but $3n$ unknowns. This system is underdetermined.

Thus some additional user-chosen condition is required to determine the quadratic spline. One might choose, for example, $Q'(a) = 0$, or $Q''(a) = 0$, or some other condition.

6.1.3 Computing Second Degree Splines

Suppose the data
\[ \begin{array}{c|cccc} t & t_0 & t_1 & \cdots & t_n \\ \hline y & y_0 & y_1 & \cdots & y_n \end{array} \]
are given. Let $z_i = Q_i'(t_i)$, and suppose that the additional condition to define the quadratic spline is given by specifying $z_0$. We want to be able to compute the form of $Q_i(x)$.

Because $Q_i(t_i) = y_i$, $Q_i'(t_i) = z_i$, $Q_i'(t_{i+1}) = z_{i+1}$, we see that we can define
\[ Q_i(x) = \frac{z_{i+1} - z_i}{2\,(t_{i+1} - t_i)}\,(x - t_i)^2 + z_i\,(x - t_i) + y_i. \]
Use this at $t_{i+1}$:
\[ y_{i+1} = Q_i(t_{i+1}) = \frac{z_{i+1} - z_i}{2\,(t_{i+1} - t_i)}\,(t_{i+1} - t_i)^2 + z_i\,(t_{i+1} - t_i) + y_i, \]
\[ y_{i+1} - y_i = \frac{z_{i+1} - z_i}{2}\,(t_{i+1} - t_i) + z_i\,(t_{i+1} - t_i), \]
\[ y_{i+1} - y_i = \frac{z_{i+1} + z_i}{2}\,(t_{i+1} - t_i). \]
Thus we can determine, from the data alone, $z_{i+1}$ from $z_i$:
\[ z_{i+1} = 2\,\frac{y_{i+1} - y_i}{t_{i+1} - t_i} - z_i. \]

6.2 (Natural) Cubic Splines

If you recall the definition of the linear and quadratic splines, probably you can guess the definition of the spline of degree $k$:

Definition 6.6 (Splines of Degree $k$). A function $S$ is a spline of degree $k$ on $[a, b]$ if
1. The domain of $S$ is $[a, b]$.
2. $S, S', S'', \ldots, S^{(k-1)}$ are continuous on $(a, b)$.
3. There is a partition $\{t_i\}_{i=0}^{n}$ of $[a, b]$ such that on $[t_i, t_{i+1}]$, $S$ is a polynomial of degree $\le k$.

You would also expect that a spline of degree $k$ has $k - 1$ "degrees of freedom," as we show here. If the partition has $n + 1$ knots, the spline of degree $k$ is defined by $n(k + 1)$ parameters. The given data
\[ \begin{array}{c|cccc} t & t_0 & t_1 & \cdots & t_n \\ \hline y & y_0 & y_1 & \cdots & y_n \end{array} \]
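Returning briefly to Subsection 6.1.3: the recurrence for $z_{i+1}$ together with the formula for $Q_i(x)$ amounts to a complete algorithm for the quadratic spline once $z_0$ is chosen. The following Python sketch illustrates it under stated assumptions: the name `quadratic_spline`, the argument names, and the test data are hypothetical and chosen only for this example.

```python
def quadratic_spline(knots, values, z0):
    """Return (Q, z): the quadratic spline and its slopes z_i = Q'(t_i).

    knots, values -- the data t_0 < ... < t_n and y_0, ..., y_n
    z0            -- the user-chosen extra condition Q'(t_0) = z0
    """
    n = len(knots) - 1
    z = [z0]
    for i in range(n):
        # z_{i+1} = 2 (y_{i+1} - y_i) / (t_{i+1} - t_i) - z_i
        step = knots[i + 1] - knots[i]
        z.append(2 * (values[i + 1] - values[i]) / step - z[i])

    def Q(x):
        if not knots[0] <= x <= knots[-1]:
            raise ValueError("x lies outside [a, b]")
        # Linear scan for the subinterval [t_i, t_{i+1}] containing x.
        i = 0
        while i < n - 1 and x >= knots[i + 1]:
            i += 1
        dx = x - knots[i]
        curvature = (z[i + 1] - z[i]) / (2 * (knots[i + 1] - knots[i]))
        return curvature * dx * dx + z[i] * dx + values[i]

    return Q, z


# Data sampled from y = t^2 with Q'(0) = 0; the spline should reproduce t^2.
Q, z = quadratic_spline([0.0, 1.0, 2.0], [0.0, 1.0, 4.0], 0.0)
print(z, Q(1.5))
```

Note that an error in the chosen $z_0$ is never damped by the recurrence; it is carried, with alternating sign, into every later $z_i$.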

provide $2n$ equations. The continuity of $S', S'', \ldots, S^{(k-1)}$ at the $n - 1$ internal knots gives $(k - 1)(n - 1)$ equations. This is a total of $n(k + 1) - (k - 1)$ equations. Thus we have $k - 1$ more unknowns than equations. Thus, barring some singularity, we can (and must) add $k - 1$ constraints to uniquely define the spline. These are the degrees of freedom.

Often $k$ is chosen as 3. This yields cubic splines. We must add 2 extra constraints to define the spline. The usual choice is to make
\[ S''(t_0) = S''(t_n) = 0. \]
This yields the natural cubic spline.

6.2.1 Why Natural Cubic Splines?

It turns out that natural cubic splines are a good choice in the sense that they are the "interpolant of minimal $H^2$ seminorm." The corollary following this theorem states this in more easily understandable terms:

Theorem 6.7. Suppose $f$ has two continuous derivatives, and $S$ is the natural cubic spline interpolating $f$ at knots $a = t_0 < t_1 < \ldots < t_n = b$. Then
\[ \int_a^b \left[S''(x)\right]^2 dx \le \int_a^b \left[f''(x)\right]^2 dx. \]

Proof. We let $g(x) = f(x) - S(x)$. Then $g(x)$ is zero on the $(n + 1)$ knots $t_i$. Derivatives are linear, meaning that
\[ f''(x) = S''(x) + g''(x). \]
Then
\[ \int_a^b \left[f''(x)\right]^2 dx = \int_a^b \left[S''(x)\right]^2 dx + \int_a^b \left[g''(x)\right]^2 dx + \int_a^b 2 S''(x) g''(x)\, dx. \]
We show that the last integral is zero. Integrating by parts we get
\[ \int_a^b S''(x) g''(x)\, dx = S'' g' \Big|_a^b - \int_a^b S''' g'\, dx = - \int_a^b S''' g'\, dx, \]
because $S''(a) = S''(b) = 0$. Then notice that $S$ is a polynomial of degree $\le 3$ on each interval, thus $S'''(x)$ is a piecewise constant function, taking value $c_i$ on each interval $[t_i, t_{i+1}]$. Thus
\[ \int_a^b S''' g'\, dx = \sum_{i=0}^{n-1} \int_{t_i}^{t_{i+1}} c_i\, g'\, dx = \sum_{i=0}^{n-1} c_i\, g \Big|_{t_i}^{t_{i+1}} = 0, \]
with the last equality holding because $g(x)$ is zero at the knots. Since $\int_a^b [g''(x)]^2 dx \ge 0$, the claimed inequality follows.

Corollary 6.8. The natural cubic spline is the best twice-continuously differentiable interpolant for a twice-continuously differentiable function, under the measure given by the theorem.

Proof. Let $f$ be twice-continuously differentiable, and let $S$ be the natural cubic spline interpolating $f(x)$ at some given nodes $\{t_i\}_{i=0}^{n}$. Let $R(x)$ be some twice-continuously differentiable function which also interpolates $f(x)$ at these nodes. Then $S(x)$ interpolates $R(x)$ at these nodes. Apply the theorem to get
\[ \int_a^b \left[S''(x)\right]^2 dx \le \int_a^b \left[R''(x)\right]^2 dx. \]

6.2.2 Computing Cubic Splines

First we present an example of computing the natural cubic spline by hand:

Example Problem 6.9. Construct the natural cubic spline for the following data:
\[ \begin{array}{c|ccc} t & -1 & 0 & 2 \\ \hline y & 3 & -1 & 3 \end{array} \]
Solution: The natural cubic spline is defined by eight parameters:
\[ S(x) = \begin{cases} ax^3 + bx^2 + cx + d & x \in [-1, 0] \\ ex^3 + fx^2 + gx + h & x \in [0, 2] \end{cases} \]
We interpolate to find that $d = h = -1$ and
\[ -a + b - c - 1 = 3, \qquad 8e + 4f + 2g - 1 = 3. \]
We take the derivative of $S$:
\[ S'(x) = \begin{cases} 3ax^2 + 2bx + c & x \in [-1, 0] \\ 3ex^2 + 2fx + g & x \in [0, 2] \end{cases} \]
Continuity at the middle node gives $c = g$. Now take the second derivative of $S$:
\[ S''(x) = \begin{cases} 6ax + 2b & x \in [-1, 0] \\ 6ex + 2f & x \in [0, 2] \end{cases} \]
Continuity at the middle node gives $b = f$. The natural cubic spline condition gives $-6a + 2b = 0$ and $12e + 2f = 0$. Solving this by "divide and conquer" gives
\[ S(x) = \begin{cases} x^3 + 3x^2 - 2x - 1 & x \in [-1, 0] \\ -\frac{1}{2}x^3 + 3x^2 - 2x - 1 & x \in [0, 2] \end{cases} \]

Finding the constants for the previous example was fairly tedious. And this is for the case of only three nodes. We would like a method easier than setting up the $4n$ equations and unknowns, something akin to the description in Subsection 6.1.3. The method is rather tedious, so we leave it to the exercises.

6.3 B Splines

The B splines form a basis for spline functions, whence the name. We presuppose the existence of an infinite number of knots:
\[ \ldots < t_{-2} < t_{-1} < t_0 < t_1 < t_2 < \ldots, \]
with $\lim_{k \to -\infty} t_k = -\infty$ and $\lim_{k \to \infty} t_k = \infty$. The B splines of degree 0 are defined as single "blocks":
\[ B_i^0(x) = \begin{cases} 1 & t_i \le x < t_{i+1} \\ 0 & \text{otherwise} \end{cases} \]

The zero degree B splines are continuous from the right, are nonzero only on one subinterval $[t_i, t_{i+1})$, and sum to 1 everywhere.

We justify the description of B splines as basis splines: If $S$ is a spline of degree 0 on the given knots and is continuous from the right then
\[ S(x) = \sum_i S(t_i)\, B_i^0(x). \]
That is, the basis splines work in the same way that Lagrange Polynomials worked for polynomial interpolation.

The B splines of degree $k$ are defined recursively:
\[ B_i^k(x) = \frac{x - t_i}{t_{i+k} - t_i}\, B_i^{k-1}(x) + \frac{t_{i+k+1} - x}{t_{i+k+1} - t_{i+1}}\, B_{i+1}^{k-1}(x). \]
Some B splines are shown in Figure 6.3.

The B splines quickly become unwieldy. We focus on the case $k = 1$. The B spline $B_i^1(x)$ is
• Piecewise linear.
• Continuous.
• Nonzero only on $(t_i, t_{i+2})$.
• 1 at $t_{i+1}$.
These B splines are sometimes called hat functions. Imagine wearing a hat shaped like this! Whatever.

The nice thing about the hat functions is they allow us to use analogy. Harken back to polynomial interpolation and the Lagrange Functions. The hat functions play a similar role because
\[ B_i^1(t_j) = \delta_{(i+1)j} = \begin{cases} 1 & (i + 1) = j \\ 0 & (i + 1) \ne j \end{cases} \]
Then if we want to interpolate the following data with splines of degree 1:
\[ \begin{array}{c|cccc} t & t_0 & t_1 & \cdots & t_n \\ \hline y & y_0 & y_1 & \cdots & y_n \end{array} \]
We can immediately set
\[ S(x) = \sum_{i=0}^{n} y_i\, B_{i-1}^1(x). \]
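The recursive definition translates almost word for word into code. The sketch below is illustrative only and makes some assumptions that are not in the notes: the knots are held in a finite Python list (so only indices that stay inside the list may be used), the names `bspline` and `hat_interpolant` are hypothetical, and one extra knot is padded onto each end of the data so that the hat function centered at every data knot exists. With that padding the list index is shifted by one relative to the text, so the hat centered at the data knot $t_i$ is `bspline(1, i, x, padded)`.

```python
def bspline(k, i, x, t):
    """Evaluate the B spline B_i^k(x) on the knot list t, recursively."""
    if k == 0:
        # Degree 0 "block": 1 on [t_i, t_{i+1}), continuous from the right.
        return 1.0 if t[i] <= x < t[i + 1] else 0.0
    left = (x - t[i]) / (t[i + k] - t[i]) * bspline(k - 1, i, x, t)
    right = ((t[i + k + 1] - x) / (t[i + k + 1] - t[i + 1])
             * bspline(k - 1, i + 1, x, t))
    return left + right


def hat_interpolant(knots, values):
    """Degree 1 interpolant written as a sum of hat functions."""
    # Pad one knot on each side (an assumption of this sketch).
    padded = [knots[0] - 1.0] + list(knots) + [knots[-1] + 1.0]

    def S(x):
        return sum(y * bspline(1, i, x, padded) for i, y in enumerate(values))

    return S


S = hat_interpolant([0.0, 1.0, 3.0], [1.0, 3.0, -1.0])
print(S(1.0), S(2.0))  # prints: 3.0 1.0
```

The plain recursion re-evaluates lower degree splines many times; that is acceptable for a demonstration at small $k$ but is not how one would evaluate B splines in production code.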

[Figure 6.3, panels: (a) Degree 0 B spline; (b) Degree 1 B spline; (c) Degree 2 B spline, with 2 Degree 1 B splines.]

Figure 6.3: Some B splines for the knots $t_0 = 1$, $t_1 = 2.5$, $t_2 = 3.5$, $t_3 = 4$ are shown. In (a), one of the degree 0 B splines is shown; in (b), a degree 1 B spline; in (c), two of the degree 1 B splines and a degree 2 B spline are shown. The two hat functions "merge" together to give the quadratic B spline.

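Exercises

(6.1) Is the following function a linear spline on $[0, 4]$? Why or why not?
\[ S(x) = \begin{cases} 3x + 2 & : 0 \le x < 1 \\ -2x + 4 & : 1 \le x \le 4 \end{cases} \]

(6.2) Is the following function a linear spline on $[0, 2]$? Why or why not?
\[ S(x) = \begin{cases} x + 3 & : 0 \le x < 1 \\ 3 & : 1 \le x \le 2 \end{cases} \]

(6.3) Is the following function a linear spline on $[0, 4]$? Why or why not?
\[ S(x) = \begin{cases} x^2 + 3 & : 0 \le x < 3 \\ 5x - 6 & : 3 \le x \le 4 \end{cases} \]

(6.4) Find constants, $\alpha, \beta$ such that the following is a linear spline on $[0, 5]$.
\[ S(x) = \begin{cases} 4x - 2 & : 0 \le x < 1 \\ \alpha x + \beta & : 1 \le x < 3 \\ -2x + 10 & : 3 \le x \le 5 \end{cases} \]

(6.5) Is the following function a quadratic spline on $[0, 4]$? Why or why not?
\[ Q(x) = \begin{cases} x^2 + 3 & : 0 \le x < 3 \\ 5x - 6 & : 3 \le x \le 4 \end{cases} \]

(6.6) Is the following function a quadratic spline on $[0, 2]$? Why or why not?
\[ Q(x) = \begin{cases} x^2 + 3x + 2 & : 0 \le x < 1 \\ 2x^2 + x + 3 & : 1 \le x \le 2 \end{cases} \]

(6.7) Find constants, $\alpha, \beta, \gamma$ such that the following is a quadratic spline on $[0, 5]$.
\[ Q(x) = \begin{cases} \frac{1}{2}x^2 + 2x + \frac{3}{2} & : 0 \le x < 1 \\ \alpha x^2 + \beta x + \gamma & : 1 \le x < 3 \\ 3x^2 - 7x + 12 & : 3 \le x \le 5 \end{cases} \]

(6.8) Find the quadratic spline that interpolates the following data:
\[ \begin{array}{c|ccc} t & 0 & 1 & 4 \\ \hline y & 1 & -2 & 1 \end{array} \]
To resolve the single degree of freedom, assume that $Q'(0) = -Q'(4)$. Assume your solution takes the form
\[ Q(x) = \begin{cases} \alpha_1 (x - 1)^2 + \beta_1 (x - 1) - 2 & : 0 \le x < 1 \\ \alpha_2 (x - 1)^2 + \beta_2 (x - 1) - 2 & : 1 \le x \le 4 \end{cases} \]
Find the constants $\alpha_1, \beta_1, \alpha_2, \beta_2$.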

(6.9) Find the natural cubic spline that interpolates the data
\[ \begin{array}{c|ccc} x & 0 & 1 & 3 \\ \hline y & 4 & 2 & 7 \end{array} \]
It may help to assume your answer has the form
\[ S(x) = \begin{cases} Ax^3 + Bx^2 + Cx + 4 & : 0 \le x < 1 \\ D(x - 1)^3 + E(x - 1)^2 + F(x - 1) + 2 & : 1 \le x \le 3 \end{cases} \]

Chapter 7

Approximating Derivatives

7.1 Finite Differences

Suppose we have some blackbox function $f(x)$ and we wish to calculate $f'(x)$ at some given $x$. Not surprisingly, we start with Taylor's theorem:
\[ f(x + h) = f(x) + f'(x) h + \frac{f''(\xi) h^2}{2}. \]
Rearranging we get
\[ f'(x) = \frac{f(x + h) - f(x)}{h} - \frac{f''(\xi) h}{2}. \]
Remember that $\xi$ is between $x$ and $x + h$, but its exact value is not known. If we wish to calculate $f'(x)$, we cannot evaluate $f''(\xi)$, so we approximate the derivative by dropping the last term. That is, we calculate $[f(x + h) - f(x)]/h$ as an approximation to $f'(x)$. (Footnote 1: This approximation for $f'(x)$ should remind you of the definition of $f'(x)$ as a limit.) In so doing, we have dropped the last term. If there is a finite bound on $f''(z)$ on the interval in question then the dropped term is bounded by a constant times $h$. That is,
\[ f'(x) = \frac{f(x + h) - f(x)}{h} + O(h). \tag{7.1} \]
The error that we incur when we approximate $f'(x)$ by calculating $[f(x + h) - f(x)]/h$ is called truncation error. It has nothing to do with the kind of error that you get when you do calculations with a computer with limited precision; even if you worked in infinite precision, you would still have truncation error.

The truncation error can be made small by making $h$ small. However, as $h$ gets smaller, precision will be lost in equation 7.1 due to subtractive cancellation. The error in calculation for small $h$ is called roundoff error. Generally the roundoff error will increase as $h$ decreases. Thus there is a nonzero $h$ for which the sum of these two errors is minimized. See Example Problem 7.5 for an example of this.

The truncation error for this approximation is $O(h)$. We may want a more precise approximation. By now, you should know that any calculation starts with Taylor's Theorem:
\[ f(x + h) = f(x) + f'(x) h + \frac{f''(x)}{2} h^2 + \frac{f'''(\xi_1)}{3!} h^3, \]
\[ f(x - h) = f(x) - f'(x) h + \frac{f''(x)}{2} h^2 - \frac{f'''(\xi_2)}{3!} h^3. \]

By subtracting these two lines, we get
\[ f(x + h) - f(x - h) = 2 f'(x) h + \frac{f'''(\xi_1) + f'''(\xi_2)}{3!} h^3. \]
Thus
\[ 2 f'(x) h = f(x + h) - f(x - h) - \frac{f'''(\xi_1) + f'''(\xi_2)}{3!} h^3, \]
\[ f'(x) = \frac{f(x + h) - f(x - h)}{2h} - \frac{f'''(\xi_1) + f'''(\xi_2)}{2}\, \frac{h^2}{6}. \]
If $f'''(x)$ is continuous, then there is some $\xi$ between $\xi_1, \xi_2$ such that $f'''(\xi) = \frac{f'''(\xi_1) + f'''(\xi_2)}{2}$. (This is the Intermediate Value Theorem at work.) Assuming some uniform bound on $f'''(\cdot)$, we get
\[ f'(x) = \frac{f(x + h) - f(x - h)}{2h} + O(h^2). \tag{7.2} \]
In some situations it may be necessary to use evaluations of a function at "odd" places to approximate a derivative. These are usually straightforward to derive, involving the use of Taylor's Theorem. The following examples illustrate:

Example Problem 7.1. Use evaluations of $f$ at $x + h$ and $x + 2h$ to approximate $f'(x)$, assuming $f(x)$ is an analytic function, i.e., one with infinitely many derivatives.
Solution: First use Taylor's Theorem to expand $f(x + h)$ and $f(x + 2h)$, then subtract to get some factor of $f'(x)$:
\[ f(x + 2h) = f(x) + 2h f'(x) + \frac{4h^2}{2!} f''(x) + \frac{8h^3}{3!} f'''(x) + \frac{16h^4}{4!} f^{(4)}(x) + \ldots \]
\[ f(x + h) = f(x) + h f'(x) + \frac{h^2}{2!} f''(x) + \frac{h^3}{3!} f'''(x) + \frac{h^4}{4!} f^{(4)}(x) + \ldots \]
\[ f(x + 2h) - f(x + h) = h f'(x) + \frac{3h^2}{2!} f''(x) + \frac{7h^3}{3!} f'''(x) + \frac{15h^4}{4!} f^{(4)}(x) + \ldots \]
\[ \left(f(x + 2h) - f(x + h)\right)/h = f'(x) + \frac{3h}{2!} f''(x) + \frac{7h^2}{3!} f'''(x) + \frac{15h^3}{4!} f^{(4)}(x) + \ldots \]
Thus $\left(f(x + 2h) - f(x + h)\right)/h = f'(x) + O(h)$.

Example Problem 7.2. Show that
\[ \frac{4f(x + h) - f(x + 2h) - 3f(x)}{2h} = f'(x) + O(h^2), \]
for $f$ with sufficient number of derivatives.
Solution: In this case we do not have to find the approximation scheme, it is given to us. We only have to expand the appropriate terms with Taylor's Theorem. As before:
\[ f(x + h) = f(x) + h f'(x) + \frac{h^2}{2!} f''(x) + \frac{h^3}{3!} f'''(x) + \ldots \]
\[ 4f(x + h) = 4f(x) + 4h f'(x) + \frac{4h^2}{2!} f''(x) + \frac{4h^3}{3!} f'''(x) + \ldots \]
\[ f(x + 2h) = f(x) + 2h f'(x) + \frac{4h^2}{2!} f''(x) + \frac{8h^3}{3!} f'''(x) + \ldots \]
\[ 4f(x + h) - f(x + 2h) = 3f(x) + 2h f'(x) + 0 f''(x) + \frac{-4h^3}{3!} f'''(x) + \ldots \]
\[ 4f(x + h) - f(x + 2h) - 3f(x) = 2h f'(x) + \frac{-4h^3}{3!} f'''(x) + \ldots \]
\[ \left(4f(x + h) - f(x + 2h) - 3f(x)\right)/2h = f'(x) + \frac{-2h^2}{3!} f'''(x) + \ldots \]
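The orders of accuracy derived above are easy to observe numerically: halving $h$ should roughly halve the error of an $O(h)$ formula and roughly quarter the error of an $O(h^2)$ formula. The following Python sketch checks this for the forward difference (7.1), the centered difference (7.2), and the one-sided formula of Example Problem 7.2. The function names and the choice of test function ($\sin$ at $x = 1$) are assumptions made for this illustration, not part of the notes.

```python
import math


def forward_diff(f, x, h):
    """Forward difference, equation (7.1): O(h)."""
    return (f(x + h) - f(x)) / h


def centered_diff(f, x, h):
    """Centered difference, equation (7.2): O(h^2)."""
    return (f(x + h) - f(x - h)) / (2 * h)


def one_sided_diff(f, x, h):
    """One-sided formula of Example Problem 7.2: O(h^2)."""
    return (4 * f(x + h) - f(x + 2 * h) - 3 * f(x)) / (2 * h)


def error_ratio(rule, f, x, exact, h):
    """Ratio of the errors at step h and step h/2."""
    return abs(rule(f, x, h) - exact) / abs(rule(f, x, h / 2) - exact)


x, exact = 1.0, math.cos(1.0)  # d/dx sin(x) = cos(x)
for rule in (forward_diff, centered_diff, one_sided_diff):
    # Expect a ratio near 2 for O(h) and near 4 for O(h^2).
    print(rule.__name__, error_ratio(rule, math.sin, x, exact, 0.1))
```

The step $h = 0.1$ is deliberately moderate, so that truncation error dominates and roundoff does not distort the ratios.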

7.1.1 Approximating the Second Derivative

Suppose we want to approximate the second derivative of some blackbox function $f(x)$. Again, start with Taylor's Theorem:
\[ f(x + h) = f(x) + f'(x) h + \frac{f''(x)}{2} h^2 + \frac{f'''(x)}{3!} h^3 + \frac{f^{(4)}(x)}{4!} h^4 + \ldots \]
\[ f(x - h) = f(x) - f'(x) h + \frac{f''(x)}{2} h^2 - \frac{f'''(x)}{3!} h^3 + \frac{f^{(4)}(x)}{4!} h^4 - \ldots \]
Now add the two series to get
\[ f(x + h) + f(x - h) = 2f(x) + h^2 f''(x) + 2\frac{f^{(4)}(x)}{4!} h^4 + 2\frac{f^{(6)}(x)}{6!} h^6 + \ldots \]
Then let
\[ \psi(h) = \frac{f(x + h) - 2f(x) + f(x - h)}{h^2} = f''(x) + 2\frac{f^{(4)}(x)}{4!} h^2 + 2\frac{f^{(6)}(x)}{6!} h^4 + \ldots = f''(x) + \sum_{k=1}^{\infty} b_{2k} h^{2k}. \]
Thus we can use Richardson Extrapolation on $\psi(h)$ to get higher order approximations.

This derivation also gives us the centered difference approximation to the second derivative:
\[ f''(x) = \frac{f(x + h) - 2f(x) + f(x - h)}{h^2} + O(h^2). \tag{7.3} \]

7.2 Richardson Extrapolation

The centered difference approximation gives a truncation error of $O(h^2)$, which is better than $O(h)$. Can we do better? Let's define
\[ \phi(h) = \frac{1}{2h}\left[f(x + h) - f(x - h)\right]. \]
Had we expanded the Taylor's Series for $f(x + h)$, $f(x - h)$ to more terms we would have seen that
\[ \phi(h) = f'(x) + a_2 h^2 + a_4 h^4 + a_6 h^6 + a_8 h^8 + \ldots \]
The constants $a_i$ are a function of $f^{(i+1)}(x)$ only. (In fact, they should take the value of $\frac{f^{(i+1)}(x)}{(i+1)!}$.) What happens if we now calculate $\phi(h/2)$?
\[ \phi(h/2) = f'(x) + \frac{1}{4} a_2 h^2 + \frac{1}{16} a_4 h^4 + \frac{1}{64} a_6 h^6 + \frac{1}{256} a_8 h^8 + \ldots \]
But we can combine this with $\phi(h)$ to get better accuracy. We have to be a little tricky, but we can get the $O(h^2)$ terms to cancel by taking the right multiples of these two approximations:
\[ \phi(h) - 4\phi(h/2) = -3f'(x) + \frac{3}{4} a_4 h^4 + \frac{15}{16} a_6 h^6 + \frac{63}{64} a_8 h^8 + \ldots \]
\[ \frac{4\phi(h/2) - \phi(h)}{3} = f'(x) - \frac{1}{4} a_4 h^4 - \frac{5}{16} a_6 h^6 - \frac{21}{64} a_8 h^8 + \ldots \]

This approximation has a truncation error of $O(h^4)$. This technique of getting better approximations is known as the Richardson Extrapolation, and can be repeatedly applied. We will also use this technique later to get better quadrature rules, that is, ways of approximating the definite integral of a function.

7.2.1 Abstracting Richardson's Method

We now discuss Richardson's Method in a more abstract framework. Suppose you want to calculate some quantity $L$, and have found, through theory, some approximation:
\[ \phi(h) = L + \sum_{k=1}^{\infty} a_{2k} h^{2k}. \]
Let
\[ D(n, 0) = \phi\left(\frac{h}{2^n}\right). \]
Now define
\[ D(n, m) = \frac{4^m D(n, m - 1) - D(n - 1, m - 1)}{4^m - 1}. \tag{7.4} \]
We will be interested in calculating $D(n, n)$ for some $n$. We claim that $D(n, n)$ is a $O\left(h^{2(n+1)}\right)$ approximation to $L$. First we examine the recurrence for $D(n, m)$. As in divided differences, we use a pyramid table:
\[ \begin{array}{lllll} D(0, 0) & & & & \\ D(1, 0) & D(1, 1) & & & \\ D(2, 0) & D(2, 1) & D(2, 2) & & \\ \vdots & \vdots & \vdots & \ddots & \\ D(n, 0) & D(n, 1) & D(n, 2) & \cdots & D(n, n) \end{array} \]
By definition we know how to calculate the first column of this table; every other entry in the table depends on two other entries, one directly to the left, and the other to the left and up one space. Thus to calculate $D(n, n)$ we have to compute this whole lower triangular array.

We want to show that $D(n, n) = L + O\left(h^{2(n+1)}\right)$, that is $D(n, n)$ is a $O\left(h^{2(n+1)}\right)$ approximation to $L$. The following theorem gives this result:

Theorem 7.3 (Richardson Extrapolation). There are constants $a_{k,m}$ such that
\[ D(n, m) = L + \sum_{k=m+1}^{\infty} a_{k,m} \left(\frac{h}{2^n}\right)^{2k} \qquad (0 \le m \le n). \]
The proof is by an easy, but tedious, induction. We skip the proof.
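The pyramid table can be filled row by row directly from the recurrence (7.4). The Python sketch below is one way to do so; it is an illustration under stated assumptions rather than a reference implementation. The name `richardson_table` and its arguments are hypothetical, and the test quantity (the centered difference for $f(x) = \log x$ at $x = 1$, whose exact derivative is 1) is chosen only as an example.

```python
import math


def richardson_table(phi, h, n):
    """Return the lower triangular table D, with D[i][m] = D(i, m).

    phi -- approximation with error expansion in even powers of h
    h   -- initial step; row i starts from phi(h / 2**i)
    n   -- index of the last row
    """
    D = []
    for i in range(n + 1):
        row = [phi(h / 2**i)]  # first column: D(i, 0)
        for m in range(1, i + 1):
            # D(i, m) = (4^m D(i, m-1) - D(i-1, m-1)) / (4^m - 1)
            row.append((4**m * row[m - 1] - D[i - 1][m - 1]) / (4**m - 1))
        D.append(row)
    return D


def phi(h):
    """Centered difference for log at x = 1."""
    return (math.log(1 + h) - math.log(1 - h)) / (2 * h)


table = richardson_table(phi, 0.1, 2)
for row in table:
    print(row)
print("error of D(2, 2):", abs(table[2][2] - 1.0))
```

Each new row costs one additional evaluation of $\phi$; the remaining entries are cheap arithmetic on numbers already computed.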

7.2.2 Using Richardson Extrapolation

We now try out the technique on an example or two.

Example Problem 7.4. Approximate the derivative of $f(x) = \log x$ at $x = 1$.
Solution: The real answer is $f'(1) = 1/1 = 1$, but our computer doesn't know that. Define
\[ \phi(h) = \frac{1}{2h}\left[f(1 + h) - f(1 - h)\right] = \frac{\log\frac{1 + h}{1 - h}}{2h}. \]
Let's use $h = 0.1$. We now try to find $D(2, 2)$, which is supposed to be a $O(h^6)$ approximation to $f'(1) = 1$:
\[ \begin{array}{c|lll} n \backslash m & 0 & 1 & 2 \\ \hline 0 & \frac{\log(1.1/0.9)}{0.2} \approx 1.003353477 & & \\ 1 & \frac{\log(1.05/0.95)}{0.1} \approx 1.000834586 & \approx 0.999994954 & \\ 2 & \frac{\log(1.025/0.975)}{0.05} \approx 1.000208411 & \approx 0.999999686 & \approx 1.000000002 \end{array} \]
This shows that the Richardson method is pretty good. However, notice that for this simple example, we have, already, that $\phi(0.00001) \approx 0.999999999$.

Example Problem 7.5. Consider the ugly function:
\[ f(x) = \arctan(x). \]
Attempt to find $f'(\sqrt{2})$. Recall that $f'(x) = \frac{1}{1 + x^2}$, so the value that we are seeking is $\frac{1}{3}$.
Solution: Let's use $h = 0.01$. We now try to find $D(2, 2)$, which is supposed to be a $O(h^6)$ approximation to $\frac{1}{3}$:
\[ \begin{array}{c|lll} n \backslash m & 0 & 1 & 2 \\ \hline 0 & 0.333339506181068 & & \\ 1 & 0.333334876543723 & 0.333333333331274 & \\ 2 & 0.33333371913582 & 0.333333333333186 & 0.333333333333313 \end{array} \]
Note that we have some motivation to use Richardson's method in this case: If we let
\[ \phi(h) = \frac{1}{2h}\left[f(\sqrt{2} + h) - f(\sqrt{2} - h)\right], \]
then making $h$ small gives a good approximation to $f'(\sqrt{2})$ until subtractive cancelling takes over. The following table illustrates this:

\[ \begin{array}{l|l} h & \phi(h) \\ \hline 1.0 & 0.39269908169872408 \\ 0.1 & 0.33395069677431943 \\ 0.01 & 0.33333950618106845 \\ 0.001 & 0.33333339506169679 \\ 0.0001 & 0.33333333395058062 \\ 1 \times 10^{-5} & 0.33333333334106813 \\ 1 \times 10^{-6} & 0.33333333332441484 \\ 1 \times 10^{-7} & 0.33333333315788138 \\ 1 \times 10^{-8} & 0.33333332760676626 \\ 1 \times 10^{-9} & 0.33333336091345694 \\ 1 \times 10^{-10} & 0.333333360913457 \\ 1 \times 10^{-11} & 0.333333360913457 \\ 1 \times 10^{-12} & 0.33339997429493451 \\ 1 \times 10^{-13} & 0.33306690738754696 \\ 1 \times 10^{-14} & 0.33306690738754696 \\ 1 \times 10^{-15} & 0.33306690738754691 \\ 1 \times 10^{-16} & 0 \end{array} \]
The data are illustrated in Figure 7.1. Notice that $\phi(h)$ gives at most 10 decimal places of accuracy, then begins to deteriorate. Note however, we get 13 decimal places from $D(2, 2)$.

[Figure 7.1: The total error for the centered difference approximation to $f'(\sqrt{2})$ is shown versus $h$ (log-log axes; vertical axis: total error). The total error is the sum of a truncation term which decreases as $h$ decreases, and a roundoff term which increases. The optimal $h$ value is around $1 \times 10^{-5}$. Note that Richardson's $D(2, 2)$ approximation with $h = 0.01$ gives much better results than this optimal $h$.]

Exercises

(7.1) Derive the approximation
\[ f'(x) \approx \frac{4f(x + h) - 3f(x) - f(x - 2h)}{6h} \]
using Taylor's Theorem.
(a) Assuming that $f(x)$ has bounded derivatives, give the accuracy of the above approximation. Your answer should be something like $O(h^?)$.
(b) Let $f(x) = x^3$. Approximate $f'(0)$ with this approximation, using $h = \frac{1}{4}$.

(7.2) Let $f(x)$ be an analytic function, i.e., one which is infinitely differentiable. Let $\psi(h)$ be the centered difference approximation to the first derivative:
\[ \psi(h) = \frac{f(x + h) - f(x - h)}{2h} \]
(a) Show that $\psi(h) = f'(x) + \frac{h^2}{3!} f'''(x) + \frac{h^4}{5!} f^{(5)}(x) + \frac{h^6}{7!} f^{(7)}(x) + \ldots$
(b) Show that
\[ \frac{8\left(\psi(h) - \psi(h/2)\right)}{h^2} = f'''(x) + O(h^2). \]

(7.3) Derive the approximation
\[ f'(x) \approx \frac{4f(x + 3h) + 5f(x) - 9f(x - 2h)}{30h} \]
using Taylor's Theorem.
(a) What order approximation is this? (Assume $f(x)$ has bounded derivatives of arbitrary order.)
(b) Use this formula to approximate $f'(0)$, where $f(x) = x^4$, and $h = 0.1$.

(7.4) Suppose you want to know quantity $Q$, and can approximate it with some formula, say $\phi(h)$, which depends on parameter $h$, and such that $\phi(h) = Q + a_1 h + a_2 h^2 + a_3 h^3 + a_4 h^4 + \ldots$ Find some linear combination of $\phi(h)$ and $\phi(-h)$ which is a $O(h^2)$ approximation to $Q$.

(7.5) Assuming that $\phi(h) = Q + a_2 h^2 + a_4 h^4 + a_6 h^6 \ldots$, find some combination of $\phi(h)$, $\phi(h/3)$ which is a $O(h^4)$ approximation to $Q$.

(7.6) Assuming that $\phi(h) = Q + a_2 h^2 + a_4 h^4 + a_6 h^6 \ldots$, find some combination of $\phi(h)$, $\phi(h/4)$ which is a $O(h^4)$ approximation to $Q$.

(7.7) Let $\lambda$ be some number in $(0, 1)$. Assuming that $\phi(h) = Q + a_2 h^2 + a_4 h^4 + a_6 h^6 \ldots$, find some combination of $\phi(h)$, $\phi(\lambda h)$ which is a $O(h^4)$ approximation to $Q$. To make the constant associated with the $h^4$ term small in magnitude, what should you do with $\lambda$? Is this practical? Note that the method of Richardson Extrapolation that we considered used the value $\lambda = 1/2$.

(7.8) Suppose you have some great computational approximation to the quantity $Q$ such that $\psi(h) = Q + a_3 h^3 + a_6 h^6 + a_9 h^9 \ldots$ Can you find some combination of $\psi(h)$, $\psi(h/2)$ which is a $O(h^6)$ approximation to $Q$?

(7.9) Complete the following Richardson's Extrapolation Table, assuming the first column consists of values $D(n, 0)$ for $n = 0, 1, 2$:
\[ \begin{array}{c|lll} n \backslash m & 0 & 1 & 2 \\ \hline 0 & 2 & & \\ 1 & 1.5 & ? & \\ 2 & 1.25 & ? & ? \end{array} \]
(See equation 7.4 if you've forgotten the definitions.)

(7.10) Write code to complete a Richardson's Method table, given the first column. Your m-file should have header line like:
function Dnn = richardsons(col0)
where Dnn is the value at the lower right corner of the table, $D(n, n)$, while col0 is the column of $n + 1$ values $D(i, 0)$, for $i = 0, 1, \ldots, n$. Test your code on the following input:
octave:1> col0 = [1 0.5 0.25 0.125 0.0625 0.03125];
octave:2> richardsons(col0)
ans = 0.019042
(a) What do you get when you try the following?
octave:5> col0 = [1.5 0.5 1.5 0.5 1.5 0.5 1.5];
octave:6> richardsons(col0)
(b) What do you get when you try the following?
octave:7> col0 = [0.9 0.99 0.999 0.9999 0.99999];
octave:8> richardsons(col0)

Chapter 8

Integrals and Quadrature

8.1 The Definite Integral

Often enough the numerical analyst is presented with the challenge of finding the definite integral of some function:
\[ \int_a^b f(x)\, dx. \]
In your golden years of Calculus, you learned the Fundamental Theorem of Calculus, which claims that if $f(x)$ is continuous, and $F(x)$ is an antiderivative of $f(x)$, then
\[ \int_a^b f(x)\, dx = F(b) - F(a). \]
What you might not have been told in Calculus is there are some functions for which a closed form antiderivative does not exist or at least is not known to humankind. Nevertheless, you may find yourself in a situation where you have to evaluate an integral for just such an integrand. An approximation will have to do.

8.1.1 Upper and Lower Sums

We will review the definition of the Riemann integral of a function. A partition of an interval $[a, b]$ is a finite, ordered collection of nodes $x_i$:
\[ a = x_0 < x_1 < x_2 < \cdots < x_n = b. \]
Given such a partition, $P$, define the upper and lower bounds on each subinterval $[x_i, x_{i+1}]$ as follows:
\[ m_i = \inf \{ f(x) \mid x_i \le x \le x_{i+1} \}, \qquad M_i = \sup \{ f(x) \mid x_i \le x \le x_{i+1} \}. \]
Then for this function $f$ and partition $P$, define the upper and lower sums:
\[ L(f, P) = \sum_{i=0}^{n-1} m_i (x_{i+1} - x_i), \qquad U(f, P) = \sum_{i=0}^{n-1} M_i (x_{i+1} - x_i). \]

We can interpret the upper and lower sums graphically as the sums of areas of rectangles defined by the function $f$ and the partition $P$, as in Figure 8.1.

[Figure 8.1, panels: (a) The Lower Sum; (b) The Upper Sum. The (a) lower, and (b) upper sums of a function on a given interval are shown. These approximations to the integral are the sums of areas of rectangles. Note that the lower sums are an underestimate, and the upper sums an overestimate of the integral.]

Notice a few things about the upper, lower sums:
(i) $L(f, P) \le U(f, P)$.
(ii) If we switch to a "better" partition (i.e., a finer one), we expect that $L(f, \cdot)$ increases and $U(f, \cdot)$ decreases.

The notion of integrability familiar from Calculus class (that is Riemann Integrability) is defined in terms of the upper and lower sums.

Definition 8.1. A function $f$ is Riemann Integrable over interval $[a, b]$ if
\[ \sup_P L(f, P) = \inf_P U(f, P), \]
where the supremum and infimum are over all partitions of the interval $[a, b]$. Moreover, in case $f(x)$ is integrable, we define the integral
\[ \int_a^b f(x)\, dx = \inf_P U(f, P). \]

You may recall the following

Theorem 8.2. Every continuous function on a closed bounded interval of the real line is Riemann Integrable (on that interval).

Continuity is sufficient, but not necessary.

Example 8.3. Consider the Heaviside function:
\[ f(x) = \begin{cases} 0 & x < 0 \\ 1 & 0 \le x \end{cases} \]
This function is not continuous on any interval containing 0, but is Riemann Integrable on every closed bounded interval.
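Observations (i) and (ii) about the lower and upper sums can be checked numerically in the one case where the infimum and supremum on each subinterval are trivial to find. The Python sketch below assumes, as a simplification that is not part of the definition, that $f$ is monotone increasing, so that $m_i = f(x_i)$ and $M_i = f(x_{i+1})$; the function names and the test integrand $x^2$ on $[0, 1]$ are likewise choices made only for this illustration.

```python
def lower_upper_sums(f, a, b, n):
    """Return (L, U) for a monotone increasing f on n equal subintervals.

    Assumes f is increasing, so the infimum on [x_i, x_{i+1}] is f(x_i)
    and the supremum is f(x_{i+1}).
    """
    h = (b - a) / n
    nodes = [a + i * h for i in range(n)] + [b]
    lower = sum(f(x) * h for x in nodes[:-1])   # left endpoints
    upper = sum(f(x) * h for x in nodes[1:])    # right endpoints
    return lower, upper


def square(x):
    return x * x


# The exact integral of x^2 over [0, 1] is 1/3.
for n in (4, 8, 16):
    L, U = lower_upper_sums(square, 0.0, 1.0, n)
    print(n, L, U)
```

Each refinement raises the lower sum and lowers the upper sum, squeezing the value $1/3$ between them.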

Example 8.4. Consider the Dirichlet function:
\[ f(x) = \begin{cases} 0 & x \text{ rational} \\ 1 & x \text{ irrational} \end{cases} \]
For any partition $P$ of any interval $[a, b]$, we have $L(f, P) = 0$, while $U(f, P) = b - a$, so
\[ \sup_P L(f, P) = 0 \ne b - a = \inf_P U(f, P), \]
so this function is not Riemann Integrable.

8.1.2 Approximating the Integral

The definition of the integral gives a simple method of approximating an integral $\int_a^b f(x)\, dx$. The method cuts the interval $[a, b]$ into a partition of $n$ equal subintervals $x_i = a + i\,\frac{b - a}{n}$, for $i = 0, 1, \ldots, n$. The algorithm then has to somehow find the supremum and infimum of $f(x)$ on each interval $[x_i, x_{i+1}]$. The integral is then approximated by the mean of the lower and upper sums:
\[ \int_a^b f(x)\, dx \approx \frac{1}{2}\left(L(f, P) + U(f, P)\right). \]
Because the value of the integral is between $L(f, P)$ and $U(f, P)$, this approximation has error at most
\[ \frac{1}{2}\left(U(f, P) - L(f, P)\right). \]
Note that in general, or for a black box function, it is usually not feasible to find the suprema and infima of $f(x)$ on the subintervals, and thus the lower and upper sums cannot be calculated. However, if some information is known about the function, it becomes easier:

Example 8.5. Consider for example, using this method on some function $f(x)$ which is monotone increasing, that is $x \le y$ implies $f(x) \le f(y)$. In this case, the infimum of $f(x)$ on each interval occurs at the leftmost endpoint, while the supremum occurs at the right hand endpoint. Thus for this partition, $P$, we have
\[ L(f, P) = \sum_{k=0}^{n-1} m_k |x_{k+1} - x_k| = \frac{|b - a|}{n} \sum_{k=0}^{n-1} f(x_k), \]
\[ U(f, P) = \sum_{k=0}^{n-1} M_k |x_{k+1} - x_k| = \frac{|b - a|}{n} \sum_{k=0}^{n-1} f(x_{k+1}) = \frac{|b - a|}{n} \sum_{k=1}^{n} f(x_k). \]
Then the error of the approximation is at most
\[ \frac{1}{2}\left(U(f, P) - L(f, P)\right) = \frac{1}{2}\,\frac{|b - a|}{n}\left[f(x_n) - f(x_0)\right] = \frac{|b - a|\left[f(b) - f(a)\right]}{2n}. \]

8.1.3 Simple and Composite Rules

For the remainder of this chapter we will study "simple" quadrature rules, i.e., rules which approximate the integral of a function, $f(x)$ over an interval $[a, b]$ by means of a number of evaluations of $f$ at points in this interval. The error of a simple quadrature rule usually depends on the function

$f$, and the width of the interval $[a, b]$ to some power which is determined by the rule. That is we usually think of a simple rule as being applied to a small interval.

To use a simple rule on a larger interval, we usually cast it into a "composite" rule. Thus the trapezoidal rule, which we will study next, becomes the composite trapezoidal rule. The means of extending a simple rule to a composite rule is straightforward: Partition the given interval into subintervals, apply the simple rule to each subinterval, and sum the results. Thus, for example if the interval in question is $[\alpha, \beta]$, and the partition is $\alpha = x_0 < x_1 < x_2 < \ldots < x_n = \beta$, we have
\[ \text{composite rule on } [\alpha, \beta] = \sum_{i=0}^{n-1} \text{simple rule applied to } [x_i, x_{i+1}]. \]

8.2 Trapezoidal Rule

Suppose we are trying to approximate the integral
\[ \int_a^b f(x)\, dx, \]
for some unpleasant or black box function $f(x)$.

The trapezoidal rule approximates the integral $\int_a^b f(x)\, dx$ by the (signed) area of the trapezoid through the points $(a, f(a))$, $(b, f(b))$, and with one side the segment from $a$ to $b$. See Figure 8.2.

[Figure 8.2: The trapezoidal rule for approximating the integral of a function over $[a, b]$ is shown.]

By old school math, we can find this signed area easily. This gives the (simple) trapezoidal rule:
\[ \int_a^b f(x)\, dx \approx (b - a)\, \frac{f(a) + f(b)}{2}. \]
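The recipe "partition, apply the simple rule to each subinterval, and sum" is short enough to write down generically. The Python sketch below does this for an arbitrary simple rule and then plugs in the simple trapezoidal rule. The function names (`simple_trapezoid`, `composite`, `uniform_partition`) and the test integrand are assumptions introduced for illustration only.

```python
def simple_trapezoid(f, a, b):
    """Simple trapezoidal rule on [a, b]."""
    return (b - a) * (f(a) + f(b)) / 2


def composite(simple_rule, f, partition):
    """Apply a simple rule on every subinterval of the partition and sum."""
    return sum(simple_rule(f, left, right)
               for left, right in zip(partition, partition[1:]))


def uniform_partition(alpha, beta, n):
    """Nodes alpha = x_0 < x_1 < ... < x_n = beta with equal spacing."""
    h = (beta - alpha) / n
    return [alpha + i * h for i in range(n)] + [beta]


def line(x):
    return 2 * x + 1


# The trapezoidal rule is exact for linear integrands:
# the integral of 2x + 1 over [0, 3] is 12.
print(composite(simple_trapezoid, line, uniform_partition(0.0, 3.0, 3)))
```

Because `composite` takes the simple rule as an argument, any other simple rule with the same signature could be substituted without touching the summation logic, and the partition need not be uniform.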

The composite trapezoidal rule can be written in a simplified form, one which you saw in your calculus class, if the interval in question is partitioned into equal width subintervals. That is if you let $[\alpha, \beta]$ be partitioned by
\[ \alpha = x_0 < x_1 < x_2 < x_3 < \ldots < x_n = \beta, \]
with $x_i = \alpha + ih$, where $h = (\beta - \alpha)/n$, then the composite trapezoidal rule is
\[ \int_\alpha^\beta f(x)\, dx = \sum_{i=0}^{n-1} \int_{x_i}^{x_{i+1}} f(x)\, dx \approx \frac{1}{2} \sum_{i=0}^{n-1} (x_{i+1} - x_i)\left[f(x_i) + f(x_{i+1})\right]. \tag{8.1} \]
Since each subinterval has equal width, $x_{i+1} - x_i = h$, and we have
\[ \int_\alpha^\beta f(x)\, dx \approx \frac{h}{2} \sum_{i=0}^{n-1} \left[f(x_i) + f(x_{i+1})\right]. \tag{8.2} \]
In your calculus class, you saw this in the less comprehensible form:
\[ \int_a^b f(x)\, dx \approx h \left[ \frac{f(a) + f(b)}{2} + \sum_{i=1}^{n-1} f(x_i) \right]. \]
Note that the composite trapezoidal rule for equal subintervals is the same as the approximation we found for increasing functions in Example 8.5.

Example Problem 8.6. Approximate the integral
\[ \int_0^2 \frac{1}{1 + x^2}\, dx \]
by the composite trapezoidal rule with a partition of equally spaced points, for $n = 2$.
Solution: We have $h = \frac{2 - 0}{2} = 1$, and $f(x_0) = 1$, $f(x_1) = \frac{1}{2}$, $f(x_2) = \frac{1}{5}$. Then the composite trapezoidal rule gives the value
\[ \frac{1}{2}\left[f(x_0) + f(x_1) + f(x_1) + f(x_2)\right] = \frac{1}{2}\left[1 + \frac{1}{2} + \frac{1}{2} + \frac{1}{5}\right] = \frac{11}{10}. \]
The actual value is $\arctan 2 \approx 1.107149$, and our approximation is correct to two decimal places.

8.2.1 How Good is the Composite Trapezoidal Rule?

We consider the composite trapezoidal rule for partitions of equal subintervals. Let $p_i(x)$ be the polynomial of degree $\le 1$ that interpolates $f(x)$ at $x_i$, $x_{i+1}$. Let
\[ I_i = \int_{x_i}^{x_{i+1}} f(x)\, dx, \qquad T_i = \int_{x_i}^{x_{i+1}} p_i(x)\, dx = (x_{i+1} - x_i)\, \frac{p_i(x_i) + p_i(x_{i+1})}{2} = \frac{h}{2}\left(f(x_i) + f(x_{i+1})\right). \]
That's right: the composite trapezoidal rule approximates the integral of $f(x)$ over $[x_i, x_{i+1}]$ by the integral of $p_i(x)$ over the same interval.

Now recall our theorem on polynomial interpolation error. For $x \in [x_i, x_{i+1}]$, we have
\[ f(x) - p_i(x) = \frac{1}{(2)!} f^{(2)}(\xi_x)\, (x - x_i)(x - x_{i+1}), \]

for some $\xi_x \in [x_i, x_{i+1}]$. Recall that $\xi_x$ depends on $x$. To make things simpler, call it $\xi(x)$.

Now integrate:
\[ I_i - T_i = \int_{x_i}^{x_{i+1}} f(x) - p_i(x)\, dx = \frac{1}{2} \int_{x_i}^{x_{i+1}} f''(\xi(x))\, (x - x_i)(x - x_{i+1})\, dx. \]
We will now attack the integral on the right hand side. Recall the following theorem:

Theorem 8.7 (Mean Value Theorem for Integrals). Suppose $f$ is continuous, $g$ is Riemann Integrable and does not change sign on $[\alpha, \beta]$. Then there is some $\zeta \in [\alpha, \beta]$ such that
\[ \int_\alpha^\beta f(x) g(x)\, dx = f(\zeta) \int_\alpha^\beta g(x)\, dx. \]

We use this theorem on our integral. Note that $(x - x_i)(x - x_{i+1})$ is nonpositive on the interval of question, $[x_i, x_{i+1}]$. We assume continuity of $f''(x)$, and wave our hands to get continuity of $f''(\xi(x))$. Then we have
\[ I_i - T_i = \frac{1}{2} f''(\xi_i) \int_{x_i}^{x_{i+1}} (x - x_i)(x - x_{i+1})\, dx, \]
for some $\xi_i \in [x_i, x_{i+1}]$. By boring calculus and algebra, we find that
\[ \int_{x_i}^{x_{i+1}} (x - x_i)(x - x_{i+1})\, dx = -\frac{h^3}{6}. \]
This gives
\[ I_i - T_i = -\frac{h^3}{12} f''(\xi_i), \]
for some $\xi_i \in [x_i, x_{i+1}]$.

We now sum over all subintervals to find the total error of the composite trapezoidal rule
\[ E = \sum_{i=0}^{n-1} I_i - T_i = -\frac{h^3}{12} \sum_{i=0}^{n-1} f''(\xi_i) = -\frac{(b - a) h^2}{12} \left[ \frac{1}{n} \sum_{i=0}^{n-1} f''(\xi_i) \right]. \]
On the far right we have an average value, $\frac{1}{n} \sum_{i=0}^{n-1} f''(\xi_i)$, which lies between the least and greatest values of $f''$ on the interval $[a, b]$, and thus by the IVT, there is some $\xi$ which takes this value. So
\[ E = -\frac{(b - a) h^2}{12} f''(\xi). \]
This gives us the theorem:

Theorem 8.8 (Error of the Composite Trapezoidal Rule). Let $f''(x)$ be continuous on $[a, b]$. Let $T$ be the value of the trapezoidal rule applied to $f(x)$ on this interval with a partition of uniform spacing, $h$, and let $I = \int_a^b f(x)\, dx$. Then there is some $\xi \in [a, b]$ such that
\[ I - T = -\frac{(b - a) h^2}{12} f''(\xi). \]

Note that this theorem tells us not only the magnitude of the error, but the sign as well. Thus if, for example, $f(x)$ is concave up and thus $f''$ is positive, then $I - T$ will be negative, i.e., the trapezoidal rule gives an overestimate of the integral $I$. See Figure 8.3.
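The error theorem for the composite trapezoidal rule makes two checkable predictions: the signed error $I - T$ lies between $-\frac{(b-a)h^2}{12}\max f''$ and $-\frac{(b-a)h^2}{12}\min f''$, and halving $h$ divides the error by about four. The Python sketch below tests both on $\int_0^1 e^x\, dx = e - 1$, where $f'' = e^x$ lies in $[1, e]$. The integrand and the function name `composite_trapezoid` are choices made for this illustration and do not come from the notes.

```python
import math


def composite_trapezoid(f, a, b, n):
    """Composite trapezoidal rule on n equal subintervals of [a, b]."""
    h = (b - a) / n
    interior = sum(f(a + i * h) for i in range(1, n))
    return h * ((f(a) + f(b)) / 2 + interior)


exact = math.e - 1.0  # integral of exp over [0, 1]
previous_error = None
for n in (4, 8, 16, 32):
    h = 1.0 / n
    error = exact - composite_trapezoid(math.exp, 0.0, 1.0, n)  # I - T
    # Predicted range: -(h^2 / 12) * e <= I - T <= -(h^2 / 12) * 1
    in_range = -h * h / 12 * math.e <= error <= -h * h / 12
    ratio = None if previous_error is None else previous_error / error
    print(n, error, in_range, ratio)
    previous_error = error
```

The error is negative throughout, as expected for a concave-up integrand: the trapezoidal rule overestimates the integral.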

Figure 8.3: The trapezoidal rule is an overestimate for a function which is concave up, i.e., has positive second derivative.

8.2.2 Using the Error Bound

Example Problem 8.9. How many intervals are required to approximate the integral
$$\ln 2 = I = \int_0^1 \frac{1}{1+x}\,dx$$
to within $1 \times 10^{-10}$?

Solution: We have $f(x) = \frac{1}{1+x}$, thus $f'(x) = -\frac{1}{(1+x)^2}$. And $f''(x) = \frac{2}{(1+x)^3}$. Thus $f''(\xi)$ is continuous and bounded by 2 on $[0, 1]$. If we use $n$ equal subintervals then Theorem 8.8 tells us the error will be
$$-\frac{1-0}{12} \left( \frac{1-0}{n} \right)^2 f''(\xi) = -\frac{f''(\xi)}{12 n^2}.$$
To make this smaller than $1 \times 10^{-10}$, in absolute value, we need only take
$$\frac{1}{6n^2} \le 1 \times 10^{-10},$$
and so $n \ge \sqrt{1/6} \times 10^5$ suffices. Because $f''(x)$ is positive on this interval, the trapezoidal rule will be an overestimate.

Example Problem 8.10. How many intervals are required to approximate the integral
$$\int_0^2 x^3 - 1\,dx$$
to within $1 \times 10^{-6}$?

Solution: We have $f(x) = x^3 - 1$, thus $f'(x) = 3x^2$, and $f''(x) = 6x$. Thus $f''(\xi)$ is continuous and bounded by 12 on $[0, 2]$. If we use $n$ equal subintervals then by Theorem 8.8 the error will be
$$-\frac{2-0}{12} \left( \frac{2-0}{n} \right)^2 f''(\xi) = -\frac{2 f''(\xi)}{3n^2}.$$

To make this smaller than $1 \times 10^{-6}$, in absolute value, it suffices to take
$$\frac{24}{3n^2} \le 1 \times 10^{-6},$$
and so $n \ge \sqrt{8} \times 10^3$ suffices. Because $f''(x)$ is positive on this interval, the trapezoidal rule will be an overestimate.

8.3 Romberg Algorithm

Theorem 8.8 tells us, approximately, that the error of the composite trapezoidal rule approximation is $O(h^2)$. If we halve $h$, the error is quartered. Sometimes we want to do better than this. We'll use the same trick that we used for Richardson extrapolation. In fact, the forms are exactly the same.

Towards this end, suppose that $f, a, b$ are given. For a given $n$, we are going to use the trapezoidal rule on a partition of $2^n$ equal subintervals of $[a, b]$. That is $h = \frac{b-a}{2^n}$. Then define
$$\phi(n) = \frac{1}{2} \frac{b-a}{2^n} \sum_{i=0}^{2^n - 1} \left[ f(x_i) + f(x_{i+1}) \right] = \frac{b-a}{2^n} \left[ \frac{f(a)}{2} + \frac{f(b)}{2} + \sum_{i=1}^{2^n - 1} f\left( a + i \frac{b-a}{2^n} \right) \right].$$
The intervals used to calculate $\phi(n+1)$ are half the size of those for $\phi(n)$. As mentioned above, this means the error is one quarter.

It turns out that if we had proved the error theorem differently, we would have proved the relation:
$$\phi(n) = \int_a^b f(x)\,dx + a_2 h_n^2 + a_4 h_n^4 + a_6 h_n^6 + a_8 h_n^8 + \ldots,$$
where $h_n = \frac{b-a}{2^n}$. The constants $a_i$ are a function of $f^{(i)}(x)$ only. This should look just like something from Chapter 7. What happens if we now calculate $\phi(n+1)$? We have
$$\phi(n+1) = \int_a^b f(x)\,dx + a_2 h_{n+1}^2 + a_4 h_{n+1}^4 + a_6 h_{n+1}^6 + a_8 h_{n+1}^8 + \ldots$$
$$= \int_a^b f(x)\,dx + \frac{1}{4} a_2 h_n^2 + \frac{1}{16} a_4 h_n^4 + \frac{1}{64} a_6 h_n^6 + \frac{1}{256} a_8 h_n^8 + \ldots$$
This happens because $h_{n+1} = \frac{b-a}{2^{n+1}} = \frac{1}{2} \frac{b-a}{2^n} = \frac{h_n}{2}$. As with Richardson's method for approximating derivatives, we now combine the right multiples of these:
$$\phi(n) - 4\phi(n+1) = -3 \int_a^b f(x)\,dx + \frac{3}{4} a_4 h_n^4 + \frac{15}{16} a_6 h_n^6 + \frac{63}{64} a_8 h_n^8 + \ldots$$
$$\frac{4\phi(n+1) - \phi(n)}{3} = \int_a^b f(x)\,dx - \frac{1}{4} a_4 h_n^4 - \frac{5}{16} a_6 h_n^6 - \frac{21}{64} a_8 h_n^8 - \ldots$$
This approximation has a truncation error of $O(h_n^4)$.

Like in Richardson's method, we can use this to get better and better approximations to the integral. We do this by constructing a triangular array of approximations, each entry depending on two others. Towards this end, we let
$$R(n, 0) = \phi(n),$$

then define, for $m > 0$,
$$R(n, m) = \frac{4^m R(n, m-1) - R(n-1, m-1)}{4^m - 1}. \qquad (8.3)$$
The familiar pyramid table then is:
$$\begin{array}{lllll}
R(0,0) \\
R(1,0) & R(1,1) \\
R(2,0) & R(2,1) & R(2,2) \\
\vdots & \vdots & \vdots & \ddots \\
R(n,0) & R(n,1) & R(n,2) & \cdots & R(n,n)
\end{array}$$
Even though this is exactly the same as Richardson's method, it has another name: this is called the Romberg Algorithm.

Example Problem 8.11. Approximating the integral
$$\int_0^2 \frac{1}{1 + x^2}\,dx$$
by Romberg's Algorithm, find $R(1, 1)$.

Solution: The first column is calculated by the trapezoidal rule. So we first calculate $R(0, 0)$ and $R(1, 0)$. These are fairly simple: the first is the trapezoidal rule on a single subinterval, the second is the trapezoidal rule on two subintervals. Then
$$R(0, 0) = \frac{2-0}{1} \cdot \frac{1}{2} \left[ f(0) + f(2) \right] = \frac{6}{5},$$
$$R(1, 0) = \frac{2-0}{2} \cdot \frac{1}{2} \left[ f(0) + f(1) + f(1) + f(2) \right] = \frac{11}{10}.$$
Then, using Romberg's Algorithm we have
$$R(1, 1) = \frac{4R(1, 0) - R(0, 0)}{4 - 1} = \frac{\frac{44}{10} - \frac{12}{10}}{3} = \frac{32}{30} = 1.0\bar{6}.$$

At this point we are tempted to use Richardson's analysis. This would claim that $R(n, n)$ is an $O\left(h_0^{2(n+1)}\right)$ approximation to the integral. However, $h_0 = b - a$, and need not be smaller than 1. This is a bit different from Richardson's method, where the original $h$ is independently set before starting the triangular array; for Romberg's algorithm, $h_0$ is determined by $a$ and $b$.

We can easily deal with this problem by picking some $k$ such that $\frac{b-a}{2^k}$ is small enough, say smaller than 1. Then calculating the following array:
$$\begin{array}{lllll}
R(k,0) \\
R(k+1,0) & R(k+1,1) \\
R(k+2,0) & R(k+2,1) & R(k+2,2) \\
\vdots & \vdots & \vdots & \ddots \\
R(k+n,0) & R(k+n,1) & R(k+n,2) & \cdots & R(k+n,n)
\end{array}$$

Quite often Romberg's Algorithm is used to compute columns of this array. Subtractive cancelling or unbounded higher derivatives of $f(x)$ can make successive approximations less accurate. For this reason, entries in ever rightward columns are usually not calculated; rather, lower entries in a single column are calculated instead. That is, the user calculates the array:
$$\begin{array}{lllll}
R(k,0) \\
R(k+1,0) & R(k+1,1) \\
R(k+2,0) & R(k+2,1) & R(k+2,2) \\
\vdots & \vdots & \vdots & \ddots \\
R(k+n,0) & R(k+n,1) & R(k+n,2) & \cdots & R(k+n,n) \\
R(k+n+1,0) & R(k+n+1,1) & R(k+n+1,2) & \cdots & R(k+n+1,n) \\
R(k+n+2,0) & R(k+n+2,1) & R(k+n+2,2) & \cdots & R(k+n+2,n) \\
R(k+n+3,0) & R(k+n+3,1) & R(k+n+3,2) & \cdots & R(k+n+3,n) \\
\vdots & \vdots & \vdots & & \vdots
\end{array}$$
Then $R(k+n+l, n)$ makes a fine approximation to the integral as $l \to \infty$. Usually $n$ is small, like 2 or 3.

8.3.1 Recursive Trapezoidal Rule

It turns out there is an efficient way of calculating $R(n+1, 0)$ given $R(n, 0)$. First notice from the above example that
$$R(0, 0) = \frac{b-a}{1} \cdot \frac{1}{2} \left[ f(a) + f(b) \right],$$
$$R(1, 0) = \frac{b-a}{2} \cdot \frac{1}{2} \left[ f(a) + f\left(\frac{a+b}{2}\right) + f\left(\frac{a+b}{2}\right) + f(b) \right].$$
It would be best to calculate $R(1, 0)$ without recalculating $f(a)$ and $f(b)$. It turns out this is possible. Let $h_n = \frac{b-a}{2^n}$, and recall that
$$R(n, 0) = \phi(n) = h_n \left[ \frac{f(a) + f(b)}{2} + \sum_{i=1}^{2^n - 1} f(a + i h_n) \right].$$
Thus
$$R(n+1, 0) = \phi(n+1) = h_{n+1} \left[ \frac{f(a) + f(b)}{2} + \sum_{i=1}^{2^{n+1} - 1} f(a + i h_{n+1}) \right]$$
$$= \frac{1}{2} h_n \left[ \frac{f(a) + f(b)}{2} + \sum_{i=1}^{2^n - 1} f(a + (2i) h_{n+1}) + \sum_{i=1}^{2^n} f(a + (2i-1) h_{n+1}) \right]$$
$$= \frac{1}{2} h_n \left[ \frac{f(a) + f(b)}{2} + \sum_{i=1}^{2^n - 1} f(a + i h_n) \right] + h_{n+1} \sum_{i=1}^{2^n} f(a + (2i-1) h_{n+1})$$
$$= \frac{1}{2} R(n, 0) + h_{n+1} \sum_{i=1}^{2^n} f(a + (2i-1) h_{n+1}).$$
Then calculating $R(n+1, 0)$ requires only $2^n$ additional evaluations of $f(x)$, instead of the $2^{n+1} + 1$ usually required.
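The recursive trapezoidal rule and the recurrence (8.3) fit together in a few lines of octave/Matlab. The following is a rough sketch rather than a tuned implementation; the variable names and the table size `N = 4` are assumptions chosen for illustration. Because octave/Matlab arrays are indexed from 1, the entry `R(n+1, m+1)` of the array stores the text's $R(n, m)$. The integrand is that of Example Problem 8.11.

```octave
% Romberg table for the integral of f over [a,b]; rows and columns are
% stored 1-based, so R(n+1, m+1) holds the text's R(n, m).
f = @(x) 1 ./ (1 + x.^2);       % integrand of Example Problem 8.11
a = 0; b = 2; N = 4;            % N is the largest row index n (illustrative)
R = zeros(N + 1, N + 1);
h = b - a;                      % h_0
R(1, 1) = h * (f(a) + f(b)) / 2;              % R(0,0): one subinterval
for n = 1:N
  h = h / 2;                                   % h_n = (b - a) / 2^n
  xnew = a + (2 * (1:2^(n-1)) - 1) * h;        % only the new, odd-indexed nodes
  R(n + 1, 1) = R(n, 1) / 2 + h * sum(f(xnew));   % recursive trapezoidal rule
  for m = 1:n
    % equation (8.3)
    R(n + 1, m + 1) = (4^m * R(n + 1, m) - R(n, m)) / (4^m - 1);
  end
end
fprintf('R(0,0) = %.4f, R(1,0) = %.4f, R(1,1) = %.4f\n', R(1,1), R(2,1), R(2,2));
fprintf('R(%d,%d) = %.10f, atan(2) = %.10f\n', N, N, R(N+1,N+1), atan(2));
```

The first line printed reproduces the values $6/5$, $11/10$ and $1.0\bar{6}$ of Example Problem 8.11; the second compares the last diagonal entry with the exact value of the integral, $\arctan 2$.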

8.4 Gaussian Quadrature

The word quadrature refers to a method of approximating the integral of a function as the linear combination of the function at certain points, i.e.,
$$\int_a^b f(x)\,dx \approx A_0 f(x_0) + A_1 f(x_1) + \ldots + A_n f(x_n), \qquad (8.4)$$
for some collection of nodes $\{x_i\}_{i=0}^n$, and weights $\{A_i\}_{i=0}^n$. Normally one finds the nodes and weights in a table somewhere; we expect a quadrature rule with more nodes to be more accurate in some sense, and the tradeoff is in the number of evaluations of $f(\cdot)$. We will examine how these rules are created.

8.4.1 Determining Weights (Lagrange Polynomial Method)

Suppose that the nodes $\{x_i\}_{i=0}^n$ are given. An easy way to find "good" weights $\{A_i\}_{i=0}^n$ for these nodes is to rig them so the quadrature rule gives the integral of $p(x)$, the polynomial of degree $\le n$ which interpolates $f(x)$ on these nodes. Recall
$$p(x) = \sum_{i=0}^n f(x_i)\,\ell_i(x),$$
where $\ell_i(x)$ is the $i$th Lagrange polynomial. Thus our rigged approximation is the one that gives
$$\int_a^b f(x)\,dx \approx \int_a^b p(x)\,dx = \sum_{i=0}^n f(x_i) \int_a^b \ell_i(x)\,dx.$$
If we let
$$A_i = \int_a^b \ell_i(x)\,dx,$$
then we have a quadrature rule.

If $f(x)$ is a polynomial of degree $\le n$ then $f(x) = p(x)$, and the quadrature rule is exact.

Example Problem 8.12. Construct a quadrature rule on the interval $[0, 4]$ using nodes 0, 1, 2.

Solution: The nodes are given; we determine the weights by constructing the Lagrange Polynomials, and integrating them.
$$\ell_0(x) = \frac{(x-1)(x-2)}{(0-1)(0-2)} = \frac{1}{2}(x-1)(x-2),$$
$$\ell_1(x) = \frac{(x-0)(x-2)}{(1-0)(1-2)} = -(x)(x-2),$$
$$\ell_2(x) = \frac{(x-0)(x-1)}{(2-0)(2-1)} = \frac{1}{2}(x)(x-1).$$
Then the weights are
$$A_0 = \int_0^4 \ell_0(x)\,dx = \int_0^4 \frac{1}{2}(x-1)(x-2)\,dx = \frac{8}{3},$$
$$A_1 = \int_0^4 \ell_1(x)\,dx = \int_0^4 -(x)(x-2)\,dx = -\frac{16}{3},$$
$$A_2 = \int_0^4 \ell_2(x)\,dx = \int_0^4 \frac{1}{2}(x)(x-1)\,dx = \frac{20}{3}.$$

Thus our quadrature rule is
$$\int_0^4 f(x)\,dx \approx \frac{8}{3} f(0) - \frac{16}{3} f(1) + \frac{20}{3} f(2).$$
We expect this rule to be exact for a quadratic function $f(x)$. To illustrate this, let $f(x) = x^2 + 1$. By calculus we have
$$\int_0^4 x^2 + 1\,dx = \left. \frac{1}{3} x^3 + x \right|_0^4 = \frac{64}{3} + 4 = \frac{76}{3}.$$
The approximation is
$$\int_0^4 x^2 + 1\,dx \approx \frac{8}{3}[0 + 1] - \frac{16}{3}[1 + 1] + \frac{20}{3}[4 + 1] = \frac{76}{3}.$$

8.4.2 Determining Weights (Method of Undetermined Coefficients)

Using the Lagrange Polynomial Method to find the weights $A_i$ is fine for a computer, but can be tedious (and error-prone) if done by hand (say, on an exam). The method of undetermined coefficients is a good alternative for finding the weights by hand, and for small $n$.

The idea behind the method is to find $n + 1$ equations involving the $n + 1$ weights. The equations are derived by letting the quadrature rule be exact for $f(x) = x^j$ for $j = 0, 1, \ldots, n$. That is, setting
$$\int_a^b x^j\,dx = \sum_{i=0}^n A_i (x_i)^j.$$
For example, we reconsider Example Problem 8.12.

Example Problem 8.13. Construct a quadrature rule on the interval $[0, 4]$ using nodes 0, 1, 2.

Solution: The method of undetermined coefficients gives the equations:
$$\int_0^4 1\,dx = 4 = A_0 + A_1 + A_2$$
$$\int_0^4 x\,dx = 8 = A_1 + 2A_2$$
$$\int_0^4 x^2\,dx = 64/3 = A_1 + 4A_2.$$
We perform Naïve Gaussian Elimination on the system:
$$\left( \begin{array}{ccc|c} 1 & 1 & 1 & 4 \\ 0 & 1 & 2 & 8 \\ 0 & 1 & 4 & 64/3 \end{array} \right)$$
We get the same weights as in Example Problem 8.12: $A_2 = \frac{20}{3}$, $A_1 = -\frac{16}{3}$, $A_0 = \frac{8}{3}$.

Notice the difference compared to the Lagrange Polynomial Method: undetermined coefficients requires solution of a linear system, while the former method calculates the weights "directly." Since we will not consider $n$ to be very large, solving the linear system may not be too burdensome.

Moreover, the method of undetermined coefficients is useful in more general settings, as illustrated by the next example:

Example Problem 8.14. Determine a "quadrature" rule of the form
$$\int_0^1 f(x)\,dx \approx A f(1) + B f'(1) + C f''(1)$$
that is exact for polynomials of highest possible degree. What is the highest degree polynomial for which this rule is exact?

Solution: Since there are three unknown coefficients to be determined, we look for three equations. We get these equations by plugging in successive polynomials. That is, we plug in $f(x) = 1, x, x^2$ and assume the coefficients give equality:
$$\int_0^1 1\,dx = 1 = A \cdot 1 + B \cdot 0 + C \cdot 0 = A$$
$$\int_0^1 x\,dx = 1/2 = A \cdot 1 + B \cdot 1 + C \cdot 0 = A + B$$
$$\int_0^1 x^2\,dx = 1/3 = A \cdot 1 + B \cdot 2 + C \cdot 2 = A + 2B + 2C$$
This is solved by $A = 1$, $B = -1/2$, $C = 1/6$. This rule should be exact for polynomials of degree no greater than 2, but it might be better. We should check:
$$\int_0^1 x^3\,dx = 1/4 \ne 1/2 = 1 - 3/2 + 1 = A \cdot 1 + B \cdot 3 + C \cdot 6,$$
and thus the rule is not exact for cubic polynomials, or those of higher degree.

8.4.3 Gaussian Nodes

It would seem this is the best we can do: using $n + 1$ nodes we can devise a quadrature rule that is exact for polynomials of degree $\le n$ by choosing the weights correctly. It turns out that by choosing the nodes in the right way, we can do far better. Gauss discovered that the right nodes to choose are the $n + 1$ roots of the (nontrivial) polynomial, $q(x)$, of degree $n + 1$ which has the property
$$\int_a^b x^k q(x)\,dx = 0 \qquad (0 \le k \le n).$$
(If you view the integral as an inner product, you could say that $q(x)$ is orthogonal to the polynomials $x^k$ in the resultant inner product space, but that's just fancy talk.)

Suppose that we have such a $q(x)$; we will not prove existence or uniqueness. Let $f(x)$ be a polynomial of degree $\le 2n + 1$. We write
$$f(x) = p(x) q(x) + r(x).$$
Both $p(x)$, $r(x)$ are of degree $\le n$. Because of how we picked $q(x)$ we have
$$\int_a^b p(x) q(x)\,dx = 0.$$
Thus
$$\int_a^b f(x)\,dx = \int_a^b p(x) q(x)\,dx + \int_a^b r(x)\,dx = \int_a^b r(x)\,dx.$$

Now suppose that the $A_i$ are chosen by Lagrange Polynomials so the quadrature rule on the nodes $x_i$ is exact for polynomials of degree $\le n$. Then
$$\sum_{i=0}^n A_i f(x_i) = \sum_{i=0}^n A_i \left[ p(x_i) q(x_i) + r(x_i) \right] = \sum_{i=0}^n A_i r(x_i).$$
The last equality holds because the $x_i$ are the roots of $q(x)$. Because of how the $A_i$ are chosen we then have
$$\sum_{i=0}^n A_i f(x_i) = \sum_{i=0}^n A_i r(x_i) = \int_a^b r(x)\,dx = \int_a^b f(x)\,dx.$$
Thus this rule is exact for $f(x)$. We have (or rather, Gauss has) made quadrature twice as good.

Theorem 8.15 (Gaussian Quadrature Theorem). Let $x_i$ be the $n + 1$ roots of a (nontrivial) polynomial, $q(x)$, of degree $n + 1$ which has the property
$$\int_a^b x^k q(x)\,dx = 0 \qquad (0 \le k \le n).$$
Let $A_i$ be the coefficients for these nodes chosen by integrating the Lagrange Polynomials. Then the quadrature rule for this choice of nodes and coefficients is exact for polynomials of degree $\le 2n + 1$.

8.4.4 Determining Gaussian Nodes

We can determine the Gaussian nodes in the same way we determine coefficients. The example is illustrative.

Example Problem 8.16. Find the two Gaussian nodes for a quadrature rule on the interval $[0, 2]$.

Solution: We will find the function $q(x)$ of degree 2, which is "orthogonal" to $1, x$ under the inner product of integration over $[0, 2]$. Thus we let $q(x) = c_0 + c_1 x + c_2 x^2$. The orthogonality condition becomes
$$\int_0^2 1\,q(x)\,dx = \int_0^2 x\,q(x)\,dx = 0,$$
that is,
$$\int_0^2 c_0 + c_1 x + c_2 x^2\,dx = \int_0^2 c_0 x + c_1 x^2 + c_2 x^3\,dx = 0.$$
Evaluating these integrals gives the following system of linear equations:
$$2c_0 + 2c_1 + \frac{8}{3} c_2 = 0,$$
$$2c_0 + \frac{8}{3} c_1 + 4c_2 = 0.$$
This system is "underdetermined," that is, there are two equations, but three unknowns. Notice, however, that if $q(x)$ satisfies the orthogonality conditions, then so does $\hat{q}(x) = \alpha q(x)$, for any real number $\alpha$. That is, we can pick the scaling of $q(x)$ as we wish. With great foresight, we "guess" that we want $c_2 = 3$. This reduces the equations to
$$2c_0 + 2c_1 = -8,$$
$$2c_0 + \frac{8}{3} c_1 = -12.$$

Simple Gaussian Elimination (cf. Chapter 3) yields the answer $c_0 = 2$, $c_1 = -6$, $c_2 = 3$.

Then our nodes are the roots of $q(x) = 2 - 6x + 3x^2$. That is the roots
$$\frac{6 \pm \sqrt{36 - 24}}{6} = 1 \pm \frac{\sqrt{3}}{3}.$$
These nodes are a bit ugly. Rather than construct the Lagrange Polynomials, we will use the method of undetermined coefficients. Remember, we want to construct $A_0, A_1$ such that
$$\int_0^2 f(x)\,dx \approx A_0 f\left(1 - \frac{\sqrt{3}}{3}\right) + A_1 f\left(1 + \frac{\sqrt{3}}{3}\right)$$
is exact for polynomial $f(x)$ of degree $\le 1$. It suffices to make this approximation exact for the "building blocks" of such polynomials, that is, for the functions 1 and $x$. That is, it suffices to find $A_0, A_1$ such that
$$\int_0^2 1\,dx = A_0 + A_1$$
$$\int_0^2 x\,dx = A_0 \left(1 - \frac{\sqrt{3}}{3}\right) + A_1 \left(1 + \frac{\sqrt{3}}{3}\right)$$
This gives the equations
$$2 = A_0 + A_1$$
$$2 = A_0 \left(1 - \frac{\sqrt{3}}{3}\right) + A_1 \left(1 + \frac{\sqrt{3}}{3}\right)$$
This is solved by $A_0 = A_1 = 1$.

Thus our quadrature rule is
$$\int_0^2 f(x)\,dx \approx f\left(1 - \frac{\sqrt{3}}{3}\right) + f\left(1 + \frac{\sqrt{3}}{3}\right).$$
We expect this rule to be exact for cubic polynomials.

Example Problem 8.17. Verify the results of the previous example problem for $f(x) = x^3$.

Solution: We have
$$\int_0^2 f(x)\,dx = \left. (1/4)\,x^4 \right|_0^2 = 4.$$
The quadrature rule gives
$$f\left(1 - \frac{\sqrt{3}}{3}\right) + f\left(1 + \frac{\sqrt{3}}{3}\right) = \left(1 - \frac{\sqrt{3}}{3}\right)^3 + \left(1 + \frac{\sqrt{3}}{3}\right)^3$$
$$= \left(1 - 3\frac{\sqrt{3}}{3} + 3\frac{3}{9} - \frac{3\sqrt{3}}{27}\right) + \left(1 + 3\frac{\sqrt{3}}{3} + 3\frac{3}{9} + \frac{3\sqrt{3}}{27}\right)$$
$$= \left(2 - \sqrt{3} - \sqrt{3}/9\right) + \left(2 + \sqrt{3} + \sqrt{3}/9\right) = 4$$
Thus the quadrature rule is exact for $f(x)$.
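The computations of Example Problems 8.16 and 8.17 can be reproduced in octave/Matlab. This is a small sketch under the assumption that the builtin functions `roots`, `conv`, `polyint` and `polyval` are acceptable tools here (the text itself works by hand); the variable names are illustrative only. It checks the orthogonality of $q(x)$, finds the nodes as roots of $q(x)$, finds the weights by the method of undetermined coefficients, and applies the rule to $x^3$.

```octave
% Two-node Gaussian quadrature on [0,2], following Example Problems 8.16 and 8.17.
q = [3 -6 2];                   % q(x) = 3x^2 - 6x + 2, highest power first
Q0 = polyint(q);                % antiderivative of q(x)
Q1 = polyint(conv([1 0], q));   % antiderivative of x*q(x)
orth = [polyval(Q0, 2) - polyval(Q0, 0), polyval(Q1, 2) - polyval(Q1, 0)];
nodes = sort(roots(q));         % Gaussian nodes 1 - sqrt(3)/3 and 1 + sqrt(3)/3
% Method of undetermined coefficients: require exactness for 1 and x on [0,2].
V = [ones(1, 2); nodes.'];      % row j holds the nodes raised to the power j-1
rhs = [2; 2];                   % integrals of 1 and of x over [0,2]
A = V \ rhs;                    % weights A_0 and A_1
f = @(x) x.^3;
approx = A.' * f(nodes);        % quadrature rule applied to x^3
fprintf('orthogonality integrals: %.2e %.2e\n', orth);
fprintf('nodes = %.6f %.6f, weights = %.6f %.6f\n', nodes, A);
fprintf('rule gives %.12f for the integral of x^3 over [0,2]\n', approx);
```

The last value agrees with the exact integral, 4, up to rounding, as Theorem 8.15 predicts for a cubic with $n + 1 = 2$ nodes.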

8.4.5 Reinventing the Wheel

While it is good to know the theory, it doesn't make sense in practice to recompute these things all the time. There are books full of quadrature rules; any good textbook will list a few. The simplest ones are given in Table 8.1. See also http://mathworld.wolfram.com/Legendre-GaussQuadrature.html

    n    x_i                      A_i
    0    x_0 = 0                  A_0 = 2
    1    x_0 = -\sqrt{1/3}        A_0 = 1
         x_1 = \sqrt{1/3}         A_1 = 1
    2    x_0 = -\sqrt{3/5}        A_0 = 5/9
         x_1 = 0                  A_1 = 8/9
         x_2 = \sqrt{3/5}         A_2 = 5/9

Table 8.1: Gaussian Quadrature rules for the interval $[-1, 1]$. Thus $\int_{-1}^1 f(x)\,dx \approx \sum_{i=0}^n A_i f(x_i)$, with this relation holding exactly for all polynomials of degree no greater than $2n + 1$.

Quadrature rules are normally given for the interval $[-1, 1]$. On first consideration, it would seem you need a different rule for each interval $[a, b]$. This is not the case, as the following example problem illustrates:

Example Problem 8.18. Given a quadrature rule which is good on the interval $[-1, 1]$, derive a version of the rule to apply to the interval $[a, b]$.

Solution: Consider the substitution:
$$x = \frac{b-a}{2} t + \frac{b+a}{2}, \quad \text{so} \quad dx = \frac{b-a}{2}\,dt.$$
Then
$$\int_a^b f(x)\,dx = \int_{-1}^1 f\left( \frac{b-a}{2} t + \frac{b+a}{2} \right) \frac{b-a}{2}\,dt.$$
Letting
$$g(t) = \frac{b-a}{2} f\left( \frac{b-a}{2} t + \frac{b+a}{2} \right),$$
if $f(x)$ is a polynomial, $g(t)$ is a polynomial of the same degree. Thus we can use the quadrature rule for $[-1, 1]$ on $g(t)$ to evaluate the integral of $f(x)$ over $[a, b]$.

Example Problem 8.19. Derive the quadrature rules of Example Problem 8.16 by using the technique of Example Problem 8.18 and the quadrature rules of Table 8.1.

Solution: We have $a = 0$, $b = 2$. Thus
$$g(t) = \frac{b-a}{2} f\left( \frac{b-a}{2} t + \frac{b+a}{2} \right) = f(t + 1).$$
To integrate $f(x)$ over $[0, 2]$, we integrate $g(t)$ over $[-1, 1]$. The standard Gaussian Quadrature rule approximates this as
$$\int_{-1}^1 g(t)\,dt \approx g\left(-\sqrt{1/3}\right) + g\left(\sqrt{1/3}\right) = f\left(1 - \sqrt{1/3}\right) + f\left(1 + \sqrt{1/3}\right).$$
This is the same rule that was derived (with much more work) in Example Problem 8.16.
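The substitution of Example Problem 8.18 translates directly into code. The sketch below is illustrative only: the helper name `gaussab` and the test integrands are assumptions, not part of the notes. It stores two of the rules of Table 8.1 and applies them on an arbitrary interval $[a, b]$.

```octave
% Gaussian quadrature on [a,b] using the rules of Table 8.1 for [-1,1]
% and the substitution of Example Problem 8.18.
t2 = [-sqrt(1/3), sqrt(1/3)];       w2 = [1, 1];            % n = 1 rule
t3 = [-sqrt(3/5), 0, sqrt(3/5)];    w3 = [5/9, 8/9, 5/9];   % n = 2 rule
% hypothetical helper: apply a [-1,1] rule (nodes t, weights w) to f on [a,b];
% f must accept a vector argument and act element-wise
gaussab = @(f, a, b, t, w) (b - a) / 2 * sum(w .* f((b - a) / 2 * t + (b + a) / 2));
I2 = gaussab(@(x) x.^3, 0, 2, t2, w2);   % exact: degree 3 <= 2n+1 with n = 1
I3 = gaussab(@(x) x.^5, 0, 1, t3, w3);   % exact: degree 5 <= 2n+1 with n = 2
I4 = gaussab(@(x) x.^4, 0, 2, t2, w2);   % not exact: degree 4 > 3
fprintf('%.12f %.12f %.12f\n', I2, I3, I4);
```

The first two values match the exact integrals $4$ and $1/6$ up to rounding. The third is $56/9$ rather than the exact $32/5$, which shows that the two-node rule stops being exact at degree 4.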

Exercises

(8.1) Use the composite trapezoidal rule, by hand, to approximate
$$\int_0^3 x^2\,dx \; (= 9)$$
Use the partition $\{x_i\}_{i=0}^2 = \{0, 1, 3\}$. Why is your approximation an overestimate?

(8.2) Use the composite trapezoidal rule, by hand, to approximate
$$\int_0^1 \frac{1}{x+1}\,dx \; (= \ln 2 \approx 0.693)$$
Use the partition $\{x_i\}_{i=0}^3 = \left\{0, \frac{1}{4}, \frac{1}{2}, 1\right\}$. Why is your approximation an overestimate? (Check: I think the answer is 0.7)

(8.3) Use the composite trapezoidal rule, by hand, to approximate
$$\int_0^1 \frac{4}{1 + x^2}\,dx.$$
Use $n = 4$ subintervals. How good is your answer?

(8.4) Use Theorem 8.8 to bound the error of the composite trapezoidal rule approximation of $\int_0^2 x^3\,dx$ with $n = 10$ intervals. You should find that the approximation is an overestimate.

(8.5) How many equal subintervals of $[0, 1]$ are required to approximate $\int_0^1 \cos x\,dx$ with error smaller than $1 \times 10^{-6}$ by the composite trapezoidal rule? (Use Theorem 8.8.)

(8.6) How many equal subintervals would be required to approximate
$$\int_0^1 \frac{4}{1 + x^2}\,dx$$
to within 0.0001 by the composite trapezoidal rule? (Hint: Use the fact that $|f''(x)| \le 8$ on $[0, 1]$ for $f(x) = 4/(1 + x^2)$.)

(8.7) How many equal subintervals of $[2, 3]$ are required to approximate $\int_2^3 e^x\,dx$ with error smaller than $1 \times 10^{-3}$ by the composite trapezoidal rule?

(8.8) Simpson's Rule for quadrature is given as
$$\int_a^b f(x)\,dx \approx \frac{\Delta x}{3} \left[ f(x_0) + 4f(x_1) + 2f(x_2) + 4f(x_3) + \ldots + 2f(x_{n-2}) + 4f(x_{n-1}) + f(x_n) \right],$$
where $\Delta x = (b - a)/n$, and $n$ is assumed to be even. Show that Simpson's Rule for $n = 2$ is actually given by Romberg's Algorithm as $R(1, 1)$. As such we expect Simpson's Rule to be an $O(h^4)$ approximation to the integral.

(8.9) Find a quadrature rule of the form
$$\int_0^1 f(x)\,dx \approx A f(0) + B f(1/2) + C f(1)$$
that is exact for polynomials of highest possible degree. What is the highest degree polynomial for which this rule is exact?

(8.10) Determine a "quadrature" rule of the form
$$\int_{-1}^1 f(x)\,dx \approx A f(0) + B f'(-1) + C f'(1)$$
that is exact for polynomials of highest possible degree. What is the highest degree polynomial for which this rule is exact? (Since this rule uses derivatives of $f$, it does not exactly fit our definition of a quadrature rule, but it may be applicable in some situations.)

(8.11) Determine a "quadrature" rule of the form
$$\int_0^1 f(x)\,dx \approx A f(0) + B f'(0) + C f(1)$$
that is exact for polynomials of highest possible degree. What is the highest degree polynomial for which this rule is exact?

(8.12) Consider the so-called order $n$ Chebyshev Quadrature rule:
$$\int_{-1}^1 f(x)\,dx \approx c_n \sum_{i=0}^n f(x_i)$$
Find the weighting $c_n$ and nodes $x_i$ for the case $n = 2$ and the case $n = 3$. For what order polynomials are these rules exact?

(8.13) Find the Gaussian Quadrature rule with 2 nodes for the interval $[1, 5]$, i.e., find a rule
$$\int_1^5 f(x)\,dx \approx A f(x_0) + B f(x_1)$$
Before you solve the problem, consider the following questions: do you expect the nodes to be the endpoints 1 and 5? Do you expect the nodes to be arranged symmetrically around the midpoint of the interval?

(8.14) Find the Gaussian Quadrature rule with 3 nodes for the interval $[-1, 1]$, i.e., find a rule
$$\int_{-1}^1 f(x)\,dx \approx A f(x_0) + B f(x_1) + C f(x_2)$$
To find the nodes $x_0, x_1, x_2$ you will have to find the zeroes of a cubic equation, which could be difficult. However, you may use the simplifying assumption that the nodes are symmetrically placed in the interval $[-1, 1]$.

(8.15) Write code to approximate the integral of a function $f$ on $[a, b]$ by the composite trapezoidal rule on $n$ equal subintervals. Your m-file should have a header line like:

    function iappx = trapezoidal(f,a,b,n)

You may wish to use the code:

    x = a + (b-a) .* (0:n) ./ n;

If f is defined to work on vectors element-wise, you can probably speed up your computation by using

    bigsum = 0.5 * ( f(x(1)) + f(x(n+1)) ) + sum( f(x(2:(n))) );

(8.16) Write code to implement the Gaussian Quadrature rule for $n = 2$ to integrate $f$ on the interval $[a, b]$. Your m-file should have a header line like:

    function iappx = gauss2(f,a,b)

(8.17) Write code to implement composite Gaussian Quadrature based on code from the previous problem. Something like the following probably works:

    function iappx = gaussComp(f,a,b,n)
    % code to approximate integral of f over n equal subintervals of [a,b]
    x = a + (b-a) .* (0:n) ./ n;
    iappx = 0;
    for i=1:n
      iappx = iappx + gauss2(f,x(i),x(i+1));
    end

Use your code to approximate the error function:
$$\mathrm{erf}(z) = \frac{2}{\sqrt{\pi}} \int_0^z e^{-t^2}\,dt.$$
Compare your results with the octave/Matlab builtin function erf. (Try help erf in octave, or see http://mathworld.wolfram.com/Erf.html)

The error function is used in probability. In particular, the probability that a normal random variable is within $z$ standard deviations from its mean is
$$\mathrm{erf}(z/\sqrt{2})$$
Thus $\mathrm{erf}(1/\sqrt{2}) \approx 0.683$, and $\mathrm{erf}(2/\sqrt{2}) \approx 0.955$. These numbers should look familiar to you.


Chapter 9

Least Squares

9.1 Least Squares

Least squares is a general class of methods for fitting observed data to a theoretical model function. In the general setting we are given a set of data

    x | x_0  x_1  ...  x_n
    y | y_0  y_1  ...  y_n

and some class of functions, $\mathcal{F}$. The goal then is to find the "best" $f \in \mathcal{F}$ to fit the data to $y = f(x)$. Usually the class of functions $\mathcal{F}$ will be determined by some small number of parameters; the number of parameters will be smaller (usually much smaller) than the number of data points. The theory here will be concerned with defining "best," and examining methods for finding the "best" function to fit the data.

9.1.1 The Definition of Ordinary Least Squares

First we consider the data to be two vectors of length $n + 1$. That is we let
$$\mathbf{x} = [x_0\; x_1\; \ldots\; x_n]^\top, \quad \text{and} \quad \mathbf{y} = [y_0\; y_1\; \ldots\; y_n]^\top.$$
The error, or residual, of a given function with respect to this data is the vector $\mathbf{r} = \mathbf{y} - f(\mathbf{x})$. That is
$$\mathbf{r} = [r_0\; r_1\; \ldots\; r_n]^\top, \quad \text{where } r_i = y_i - f(x_i).$$
Our goal is to find $f \in \mathcal{F}$ such that $\mathbf{r}$ is reasonably small. We measure the size of a vector by the use of norms, which were explored in Subsection 1.4.1. The most useful norms are the $\ell_p$ norms. For a given $p$ with $0 < p < \infty$, the $\ell_p$ norm of $\mathbf{r}$ is defined as
$$\|\mathbf{r}\|_p = \left( \sum_{i=0}^n |r_i|^p \right)^{1/p}$$
For the purposes of approximation, the easiest norm to use is the $\ell_2$ norm:

Definition 9.1 ((Ordinary) Least Squares Best Approximant). The least-squares best approximant to a set of data, $\mathbf{x}, \mathbf{y}$ from a class of functions, $\mathcal{F}$, is the function $f^* \in \mathcal{F}$ that minimizes the $\ell_2$ norm of the error. That is, if $f^*$ is the least squares best approximant, then
$$\|\mathbf{y} - f^*(\mathbf{x})\|_2 = \min_{f \in \mathcal{F}} \|\mathbf{y} - f(\mathbf{x})\|_2$$

We will generally assume uniqueness of the minimum. This method is sometimes called the ordinary least squares method. It assumes there is no error in the measurement of the data $\mathbf{x}$, and usually admits a relatively straightforward solution.

9.1.2 Linear Least Squares

We illustrate this definition using the class of linear functions as an example, as this case is reasonably simple. We are assuming
$$\mathcal{F} = \{ f(x) = ax + b \mid a, b \in \mathbb{R} \}.$$
We can now think of our job as drawing the "best" line through the data.

By Definition 9.1, we want to find the function $f(x) = ax + b$ that minimizes
$$\left( \sum_{i=0}^n \left[ y_i - f(x_i) \right]^2 \right)^{1/2}.$$
This is a minimization problem over two variables, $a$ and $b$. As you may recall from calculus class, it suffices to minimize the sum rather than its square root. That is, it suffices to minimize
$$\phi(a, b) = \sum_{i=0}^n \left[ a x_i + b - y_i \right]^2.$$
We minimize this function with calculus. We set
$$0 = \frac{\partial \phi}{\partial a} = \sum_{k=0}^n 2 x_k (a x_k + b - y_k)$$
$$0 = \frac{\partial \phi}{\partial b} = \sum_{k=0}^n 2 (a x_k + b - y_k)$$
These are called the normal equations. Believe it or not these are two equations in two unknowns. They can be reduced to
$$\left( \sum x_k^2 \right) a + \left( \sum x_k \right) b = \sum x_k y_k$$
$$\left( \sum x_k \right) a + (n + 1)\, b = \sum y_k$$
The solution is found by naïve Gaussian Elimination, and is ugly. Let
$$d_{11} = \sum x_k^2, \quad d_{12} = d_{21} = \sum x_k, \quad d_{22} = n + 1, \quad e_1 = \sum x_k y_k, \quad e_2 = \sum y_k.$$
We want to solve
$$d_{11} a + d_{12} b = e_1$$
$$d_{21} a + d_{22} b = e_2$$

Gaussian Elimination produces
$$a = \frac{d_{22} e_1 - d_{12} e_2}{d_{22} d_{11} - d_{12} d_{21}}$$
$$b = \frac{d_{11} e_2 - d_{21} e_1}{d_{22} d_{11} - d_{12} d_{21}}$$
The answer is not so enlightening as the means of finding the solution.

We should, for a moment, consider whether this is indeed the solution. Our calculations have only shown an extremum at this choice of $(a, b)$; could it not be a maximum?

9.1.3 Least Squares from Basis Functions

In many, but not all cases, the class of functions, $\mathcal{F}$, is the span of a small set of functions. This case is simpler to explore and we consider it here. In this case $\mathcal{F}$ can be viewed as a vector space over the real numbers. That is, for $f, g \in \mathcal{F}$, and $\alpha, \beta \in \mathbb{R}$, then $\alpha f + \beta g \in \mathcal{F}$, where the function $\alpha f$ is that function such that $(\alpha f)(x) = \alpha f(x)$.

Now let $\{g_j(x)\}_{j=0}^m$ be a set of $m + 1$ linearly independent functions, i.e.,
$$c_0 g_0(x) + c_1 g_1(x) + \ldots + c_m g_m(x) = 0 \;\; \forall x \quad \Rightarrow \quad c_0 = c_1 = \ldots = c_m = 0$$
Then we say that $\mathcal{F}$ is spanned by the functions $\{g_j(x)\}_{j=0}^m$ if
$$\mathcal{F} = \left\{ f(x) = \sum_j c_j g_j(x) \;\middle|\; c_j \in \mathbb{R},\; j = 0, 1, \ldots, m \right\}.$$
In this case the functions $g_j$ are basis functions for $\mathcal{F}$. Note the basis functions need not be unique: a given class of functions will usually have more than one choice of basis functions.

Example 9.2. The class $\mathcal{F} = \{ f(x) = ax + b \mid a, b \in \mathbb{R} \}$ is spanned by the two functions $g_0(x) = 1$, and $g_1(x) = x$. However, it is also spanned by the two functions $\tilde{g}_0(x) = 2x + 1$, and $\tilde{g}_1(x) = x - 1$.

To find the least squares best approximant of $\mathcal{F}$ for a given set of data, we minimize the square of the $\ell_2$ norm of the error; that is we minimize the function
$$\phi(c_0, c_1, \ldots, c_m) = \sum_{k=0}^n \left[ \sum_j c_j g_j(x_k) - y_k \right]^2 \qquad (9.1)$$
Again we set partials to zero and solve
$$0 = \frac{\partial \phi}{\partial c_i} = \sum_{k=0}^n 2 \left[ \sum_j c_j g_j(x_k) - y_k \right] g_i(x_k)$$
This can be rearranged to get
$$\sum_{j=0}^m \left[ \sum_{k=0}^n g_j(x_k) g_i(x_k) \right] c_j = \sum_{k=0}^n y_k g_i(x_k).$$

If we now let
$$d_{ij} = \sum_{k=0}^n g_j(x_k) g_i(x_k), \qquad e_i = \sum_{k=0}^n y_k g_i(x_k),$$
then we have reduced the problem to the linear system (again, called the normal equations):
$$\begin{bmatrix} d_{00} & d_{01} & d_{02} & \cdots & d_{0m} \\ d_{10} & d_{11} & d_{12} & \cdots & d_{1m} \\ d_{20} & d_{21} & d_{22} & \cdots & d_{2m} \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ d_{m0} & d_{m1} & d_{m2} & \cdots & d_{mm} \end{bmatrix} \begin{bmatrix} c_0 \\ c_1 \\ c_2 \\ \vdots \\ c_m \end{bmatrix} = \begin{bmatrix} e_0 \\ e_1 \\ e_2 \\ \vdots \\ e_m \end{bmatrix} \qquad (9.2)$$
The choice of the basis functions can affect how easy it is to solve this system. We explore this in Section 9.2. Note that we are talking about the basis $\{g_j\}_{j=0}^m$, and not exactly about the class of functions $\mathcal{F}$.

For example, consider what would happen if the system of normal equations were diagonal. In this case, solving the system would be rather trivial.

Example Problem 9.3. Consider the case where $m = 0$, and $g_0(x) = \ln x$. Find the least squares approximation of the data

    x |  0.50       0.75       1.0        1.50      2.0       2.25      2.75      3.0
    y | -1.187098  -0.452472  -0.068077   0.713938  1.165234  1.436975  1.725919  1.841422

Solution: Essentially we are trying to find the $c$ such that $c \ln x$ best approximates the data. The system of equation 9.2 reduces to the 1-D equation:
$$\left( \sum_k \ln^2 x_k \right) c = \sum_k y_k \ln x_k$$
For our data, this reduces to $4.0960\,c = 6.9844$, so we find $c = 1.7052$. The data and the least squares interpolant are shown in Figure 9.1.

Example 9.4. Consider the awfully chosen basis functions:
$$g_0(x) = \left( \frac{\epsilon}{2} - 1 \right) x^2 - \frac{\epsilon}{2} x + 1$$
$$g_1(x) = x^3 + \left( \frac{\epsilon}{2} - 1 \right) x^2 + \left( \frac{\epsilon}{2} - 1 \right) x + 1$$
where $\epsilon$ is small, around machine precision.

Suppose the data are given at the nodes $x_0 = 0$, $x_1 = 1$, $x_2 = -1$. We want to set up the normal equations, so we compute some of the $d_{ij}$. First we have to evaluate the basis functions at the nodes $x_i$. But this example was rigged to give:
$$g_0(x_0) = 1, \quad g_0(x_1) = 0, \quad g_0(x_2) = \epsilon$$
$$g_1(x_0) = 1, \quad g_1(x_1) = \epsilon, \quad g_1(x_2) = 0$$
After much work we find we want to solve
$$\begin{bmatrix} 1 + \epsilon^2 & 1 \\ 1 & 1 + \epsilon^2 \end{bmatrix} \begin{bmatrix} c_0 \\ c_1 \end{bmatrix} = \begin{bmatrix} y_0 + \epsilon y_2 \\ y_0 + \epsilon y_1 \end{bmatrix}$$

Figure 9.1: The data of Example Problem 9.3 and the least squares interpolant are shown.

However, the computer would only find this if it had infinite precision. Since it does not, and since $\epsilon$ is rather small, the computer thinks $\epsilon^2 = 0$, and so tries to solve the system
$$\begin{bmatrix} 1 & 1 \\ 1 & 1 \end{bmatrix} \begin{bmatrix} c_0 \\ c_1 \end{bmatrix} = \begin{bmatrix} y_0 + \epsilon y_2 \\ y_0 + \epsilon y_1 \end{bmatrix}$$
When $y_1 \ne y_2$, this has no solution. Bummer.

This kind of thing is common in the method of least squares: the coefficients of the normal equations include terms like
$$g_i(x_k) g_j(x_k).$$
When the $g_i$ are small at the nodes $x_k$, these coefficients can get really small, since we are squaring.

Now we draw a rough sketch of the basis functions. We find they do a pretty poor job of discriminating around all the nodes.

9.2 Orthonormal Bases

In the previous section we saw that poor choice of basis vectors can lead to numerical problems. Roughly speaking, if $g_i(x_k)$ is small for some $i$'s and $k$'s, then some $d_{ij}$ can have a loss of precision when two small quantities are multiplied together and rounded to zero.

Poor choice of basis vectors can also lead to numerical problems in solution of the normal equations, which will be done by Gaussian Elimination.

Consider the case where $\mathcal{F}$ is the class of polynomials of degree no greater than $m$. For simplicity we will assume that all $x_i$ are in the interval $[0, 1]$. The most obvious choice of basis functions is $g_j(x) = x^j$. This certainly gives a basis for $\mathcal{F}$, but actually a rather poor one. To see why, look at the graph of the basis functions in Figure 9.3. The basis functions look too much alike on the given interval.

A better basis for this problem is the set of Chebyshev Polynomials of the first kind, i.e., $g_j(x) = T_j(x)$, where
$$T_0(x) = 1, \quad T_1(x) = x, \quad T_{i+1}(x) = 2x T_i(x) - T_{i-1}(x).$$

These polynomials are illustrated in Figure 9.4.

Figure 9.2: The basis functions $g_0(x)$ and $g_1(x)$ of Example 9.4 are shown for $\epsilon = 0.05$. Note that around the three nodes $0, 1, -1$, these two functions take nearly identical values. This can lead to a system of normal equations with no solution.

Figure 9.3: The polynomials $x^i$ for $i = 0, 1, \ldots, 6$ are shown on $[0, 1]$. These polynomials make a bad basis because they look so much alike, essentially.

9.2.1 Alternatives to Normal Equations

It turns out that the Normal Equations method isn't really so great. We consider other methods. First, we define $A$ as the $(n+1) \times (m+1)$ matrix defined by the entries:
$$a_{ij} = g_j(x_i), \quad i = 0, 1, \ldots, n, \quad j = 0, 1, \ldots, m.$$

Figure 9.4: The Chebyshev polynomials $T_j(x)$ for $j = 0, 1, \ldots, 6$ are shown on $[0, 1]$. These polynomials make a better basis for least squares because they are orthogonal under some inner product. Basically, they do not look like each other.

That is
$$A = \begin{bmatrix} g_0(x_0) & g_1(x_0) & g_2(x_0) & \cdots & g_m(x_0) \\ g_0(x_1) & g_1(x_1) & g_2(x_1) & \cdots & g_m(x_1) \\ g_0(x_2) & g_1(x_2) & g_2(x_2) & \cdots & g_m(x_2) \\ g_0(x_3) & g_1(x_3) & g_2(x_3) & \cdots & g_m(x_3) \\ g_0(x_4) & g_1(x_4) & g_2(x_4) & \cdots & g_m(x_4) \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ g_0(x_n) & g_1(x_n) & g_2(x_n) & \cdots & g_m(x_n) \end{bmatrix} \qquad (9.3)$$
We write it in this way since we are thinking of the case where $n \gg m$, so $A$ is "tall."

After some inspection, we find that the Normal Equations can be written as:
$$A^\top A\, \mathbf{c} = A^\top \mathbf{y}.$$
Now let the vector $\mathbf{c}$ be identified, in the natural way, with a function in $\mathcal{F}$. That is $\mathbf{c}$ is identified with $f(x) = \sum_{j=0}^m c_j g_j(x)$. You should now convince yourself that
$$A \mathbf{c} = f(\mathbf{x}).$$
And thus the residual, or error, of this function is $\mathbf{r} = \mathbf{y} - A\mathbf{c}$.

In our least squares theory we attempted to find that $\mathbf{c}$ that minimized
$$\|\mathbf{y} - A\mathbf{c}\|_2^2 = (\mathbf{y} - A\mathbf{c})^\top (\mathbf{y} - A\mathbf{c}).$$
We can see this as minimizing the Euclidean distance from $\mathbf{y}$ to $A\mathbf{c}$. For this reason, we will have that the residual is orthogonal to the column space of $A$, that is we want
$$A^\top \mathbf{r} = A^\top (\mathbf{y} - A\mathbf{c}) = \mathbf{0}.$$

This is just the normal equations. We could rewrite this, however, in the following form: find $c$, $r$ such that
\[ \begin{bmatrix} I & A \\ A^\top & 0 \end{bmatrix} \begin{bmatrix} r \\ c \end{bmatrix} = \begin{bmatrix} y \\ 0 \end{bmatrix}. \]
This is now a system of $n + m$ equations and unknowns, which can be solved by specialized means. This is known as the augmented form. We briefly mention that naïve Gaussian Elimination is not appropriate to solve the augmented form, as it turns out to be equivalent to using the normal equations method.

9.2.2 Ordinary Least Squares in octave/Matlab

The ordinary least squares best approximant is so fundamental that it is hard-wired into octave/Matlab's system solve operator. The general equation we wish to solve is $Ac = y$. This system is overdetermined: there are more rows than columns in $A$, i.e., more equations than unknown variables. However, in octave/Matlab, this is solved just like the case where $A$ is square:

octave:1> c = A \ y;

Thus, unless specifically told otherwise, you should not implement the ordinary least squares solution method in octave/Matlab. Under the hood, octave/Matlab is not using the normal equations method, and does not suffer the increased condition number of the normal equations method.

9.3 Orthogonal Least Squares

The method described in Section 9.1, sometimes referred to as "ordinary least squares," assumes that measurement error is found entirely in the dependent variable, and that there is no error in the independent variables. For example, consider the case of two variables, $x$ and $y$, which are thought to be related by an equation of the form $y = mx + b$. A number of measurements are made, giving the two sequences $\{x_i\}_{i=1}^{n}$ and $\{y_i\}_{i=1}^{n}$. Ordinary least squares assumes that
\[ y_i = m x_i + b + \epsilon_i, \]
where $\epsilon_i$, the error of the $i$th measurement, is a random variable. It is usually assumed that $E[\epsilon_i] = 0$, i.e., that the measurements are "unbiased." Under the further assumption that the errors are independent and have the same variance, the ordinary least squares solution is a very good one.$^1$ However, what if it were the case that
\[ y_i = m (x_i + \xi_i) + b + \epsilon_i, \]
i.e., that there is actually error in the measurement of the $x_i$? In this case, the orthogonal least squares method is appropriate.

$^1$ This means that the ordinary least squares solution gives an unbiased estimator of $m$ and $b$, and, moreover, gives the estimators with the lowest variance among all linear unbiased estimators. For more details on this, locate the Gauss-Markov Theorem in a good statistics textbook.
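The relationship between the backslash operator and the normal equations of Section 9.2 can be checked numerically. The following is a minimal sketch, assuming made-up data and the two basis functions $g_0(x) = x$, $g_1(x) = 1$; the variable names are illustrative only.

```octave
% Sketch: fit y = m*x + b by ordinary least squares in two ways.
% The data below are invented for illustration.
x = [0; 1; 2; 3; 4];
y = [1.1; 2.9; 5.2; 7.1; 8.8];

% Basis functions g_0(x) = x and g_1(x) = 1 give a "tall" 5 x 2 matrix A.
A = [x, ones(size(x))];

% Preferred: let the backslash operator solve the overdetermined system.
c_backslash = A \ y;

% For comparison only: solve the normal equations A'*A*c = A'*y.
c_normal = (A' * A) \ (A' * y);

% The residual should be orthogonal to the columns of A.
r = y - A * c_backslash;

fprintf('slope %.4f, intercept %.4f\n', c_backslash(1), c_backslash(2));
fprintf('norm(A''*r) = %.2e\n', norm(A' * r));
fprintf('cond(A) = %.2f, cond(A''*A) = %.2f\n', cond(A), cond(A' * A));
```

On well-conditioned data such as this the two solutions agree to roundoff; the last line shows that the condition number of $A^\top A$ is the square of that of $A$, which is the reason the normal equations are avoided in practice.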

The difference between the two methods is illustrated in Figure 9.5. In Figure 9.5(a), the ordinary least squares method is shown; it minimizes the sum of the squared lengths of vertical lines from observations to a line, reflecting the assumption of no error in $x_i$. The orthogonal least squares method will minimize the sum of the squared distances from observations to a line, as shown in Figure 9.5(b).

Figure 9.5: (a) Ordinary Least Squares; (b) Orthogonal Least Squares. The ordinary least squares method, as shown in (a), minimizes the sum of the squared lengths of the vertical lines from the observed points to the regression line. The orthogonal least squares, shown in (b), minimizes the sum of the squared distances from the points to the line. The offsets from points to the regression line in (b) may not look orthogonal due to improper aspect ratio of the figure.

We construct the solution geometrically. Let $\{x^{(i)}\}_{i=1}^{m}$ be the set of observations, expressed as vectors in $\mathbb{R}^n$. The problem is to find the vector $n$ and number $d$ such that the hyperplane $n^\top x = d$ is the plane that minimizes the sum of the squared distances from the points to the plane. We can solve this problem using vector calculus. The function
\[ \frac{1}{\|n\|_2^2} \sum_{i=1}^{m} \left\| n^\top x^{(i)} - d \right\|_2^2 \]
is the one to be minimized with respect to $n$ and $d$. It gives the sum of the squared distances from the points to the plane described by $n$ and $d$. To simplify its computation, we will minimize it subject to the constraint that $\|n\|_2^2 = 1$. Thus our problem is to find $n$ and $d$ that solve
\[ \min_{n^\top n = 1} \sum_{i=1}^{m} \left\| n^\top x^{(i)} - d \right\|_2^2. \]
We can express this as $\min f(n, d)$ subject to $g(n, d) = 1$. The solution of this problem uses the Lagrange multipliers technique. Let $L(n, d, \lambda) = f(n, d) - \lambda g(n, d)$. The Lagrange multipliers theory tells us that a necessary

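condition for $n$, $d$ to be the solution is that there exists $\lambda$ such that the following hold:
\[ \frac{\partial L}{\partial d}(n, d, \lambda) = 0, \qquad \nabla_n L(n, d, \lambda) = 0, \qquad g(n, d) = 1. \]
Let $X$ be the $m \times n$ matrix whose $i$th row is the vector $x^{(i)\top}$. We can rewrite the Lagrange function as
\[ L(n, d, \lambda) = (Xn - d 1_m)^\top (Xn - d 1_m) - \lambda n^\top n = n^\top X^\top X n - 2 d\, 1_m^\top X n + d^2\, 1_m^\top 1_m - \lambda n^\top n. \]
We solve the necessary condition on $\partial L / \partial d$:
\[ 0 = \frac{\partial L}{\partial d}(n, d, \lambda) = 2 d\, 1_m^\top 1_m - 2\, 1_m^\top X n \quad \Rightarrow \quad d = \frac{1_m^\top X n}{1_m^\top 1_m}. \]
This essentially tells us that we will zero the first moment of the data around the line. Note that this determination of $d$ requires knowledge of $n$. However, since this is a necessary condition, we can plug it into the gradient equation:
\[ 0 = \nabla_n L(n, d, \lambda) = 2 X^\top X n - 2 d\, X^\top 1_m - 2 \lambda n = 2 \left[ X^\top X n - \frac{1_m^\top X n}{1_m^\top 1_m} X^\top 1_m - \lambda n \right]. \]
The middle term is a scalar times a vector, so the multiplication can be commuted; this gives
\[ 0 = X^\top X n - \frac{X^\top 1_m 1_m^\top X}{1_m^\top 1_m} n - \lambda n = \left[ X^\top X - \frac{X^\top 1_m 1_m^\top X}{1_m^\top 1_m} - \lambda I \right] n. \]
Thus $n$ is an eigenvector of the matrix
\[ M = X^\top X - \frac{X^\top 1_m 1_m^\top X}{1_m^\top 1_m}. \]
The final condition of the three Lagrange conditions is that $g(n, d) = n^\top n = 1$. Thus if $n$ is a minimizer, then it is a unit eigenvector of $M$.

It is not clear which eigenvector is the right one, so we might have to check all the eigenvectors. However, further work tells us which eigenvector it is. First we rewrite the function to be minimized, when the optimal $d$ is used: $\hat{f}(n) = f(n, 1_m^\top X n / 1_m^\top 1_m)$. We have
\begin{align*}
\hat{f}(n) &= n^\top X^\top X n - 2 \frac{1_m^\top X n}{1_m^\top 1_m} 1_m^\top X n + \left( \frac{1_m^\top X n}{1_m^\top 1_m} \right)^2 1_m^\top 1_m \\
&= n^\top X^\top X n - 2 \frac{n^\top X^\top 1_m 1_m^\top X n}{1_m^\top 1_m} + \frac{n^\top X^\top 1_m 1_m^\top X n \; 1_m^\top 1_m}{(1_m^\top 1_m)^2} \\
&= n^\top X^\top X n - 2 \frac{n^\top X^\top 1_m 1_m^\top X n}{1_m^\top 1_m} + \frac{n^\top X^\top 1_m 1_m^\top X n}{1_m^\top 1_m} \\
&= n^\top X^\top X n - \frac{n^\top X^\top 1_m 1_m^\top X n}{1_m^\top 1_m} \\
&= n^\top \left[ X^\top X - \frac{X^\top 1_m 1_m^\top X}{1_m^\top 1_m} \right] n \\
&= n^\top M n.
\end{align*}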

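This sequence of equations only required that the $d$ be the optimal $d$ for the given $n$, and used the fact that a scalar is its own transpose, thus $1_m^\top X n = n^\top X^\top 1_m$.

Now if $n$ is a unit eigenvector of $M$, with corresponding eigenvalue $\lambda$, then $n^\top M n = \lambda$. Note that because $f(n, d)$ is defined as $w^\top w$, for some vector $w$, the matrix $M$ must be positive semidefinite, i.e., $\lambda$ must be nonnegative. However, we want to minimize $\lambda$. Thus we take $n$ to be the unit eigenvector associated with the smallest eigenvalue. Note that, by roundoff, you may compute that $M$ has some negative eigenvalues, though they are small in magnitude. Thus one should select, as $n$, the unit eigenvector associated with the smallest eigenvalue in absolute value.

Example Problem 9.4. Find the equation of the line which best approximates the 2D data $\{(x_i, y_i)\}_{i=1}^{m}$, by the orthogonal least squares method.

Solution: In this case the matrix $X$ is
\[ X = \begin{bmatrix} x_1 & y_1 \\ x_2 & y_2 \\ x_3 & y_3 \\ \vdots & \vdots \\ x_m & y_m \end{bmatrix}. \]
Thus we have that
\begin{align*}
M &= X^\top X - \frac{X^\top 1_m 1_m^\top X}{1_m^\top 1_m} \\
&= \begin{bmatrix} \sum x_i^2 & \sum x_i y_i \\ \sum x_i y_i & \sum y_i^2 \end{bmatrix} - \frac{1}{m} \begin{bmatrix} (\sum x_i)^2 & \sum x_i \sum y_i \\ \sum x_i \sum y_i & (\sum y_i)^2 \end{bmatrix} \\
&= \begin{bmatrix} \sum x_i^2 & \sum x_i y_i \\ \sum x_i y_i & \sum y_i^2 \end{bmatrix} - m \begin{bmatrix} \bar{x}^2 & \bar{x}\bar{y} \\ \bar{x}\bar{y} & \bar{y}^2 \end{bmatrix} \\
&= \begin{bmatrix} \sum x_i^2 - m\bar{x}^2 & \sum x_i y_i - m\bar{x}\bar{y} \\ \sum x_i y_i - m\bar{x}\bar{y} & \sum y_i^2 - m\bar{y}^2 \end{bmatrix}
= \begin{bmatrix} 2\alpha & \beta \\ \beta & 2\gamma \end{bmatrix}, \tag{9.4}
\end{align*}
where we use $\bar{x}$, $\bar{y}$ to mean, respectively, the mean of the $x$- and $y$-values. The characteristic polynomial of the matrix, whose roots are the eigenvalues of the matrix, is
\[ p(\lambda) = \det \begin{bmatrix} 2\alpha - \lambda & \beta \\ \beta & 2\gamma - \lambda \end{bmatrix} = 4\alpha\gamma - \beta^2 - 2(\alpha + \gamma)\lambda + \lambda^2. \]
The roots of this polynomial are given by the quadratic equation
\[ \lambda_\pm = \frac{2(\alpha + \gamma) \pm \sqrt{4(\alpha + \gamma)^2 - 4(4\alpha\gamma - \beta^2)}}{2} = (\alpha + \gamma) \pm \sqrt{(\alpha - \gamma)^2 + \beta^2}. \]
We want the smaller eigenvalue, $\lambda_- = (\alpha + \gamma) - \sqrt{(\alpha - \gamma)^2 + \beta^2}$. The associated eigenvector is in the null space of the matrix:
\[ 0 = \begin{bmatrix} 2\alpha - \lambda_- & \beta \\ \beta & 2\gamma - \lambda_- \end{bmatrix} v = \begin{bmatrix} 2\alpha - \lambda_- & \beta \\ \beta & 2\gamma - \lambda_- \end{bmatrix} \begin{bmatrix} -k \\ 1 \end{bmatrix} = \begin{bmatrix} \beta - k(2\alpha - \lambda_-) \\ -k\beta + 2\gamma - \lambda_- \end{bmatrix}. \]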

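This is solved by
\[ k = \frac{2\gamma - \lambda_-}{\beta} = \frac{(\gamma - \alpha) + \sqrt{(\gamma - \alpha)^2 + \beta^2}}{\beta}
= \frac{\left[ \sum (y_i^2 - x_i^2) - m(\bar{y}^2 - \bar{x}^2) \right] + \sqrt{\left[ \sum (y_i^2 - x_i^2) - m(\bar{y}^2 - \bar{x}^2) \right]^2 + 4 \left( \sum x_i y_i - m\bar{x}\bar{y} \right)^2}}{2 \left( \sum x_i y_i - m\bar{x}\bar{y} \right)}. \tag{9.5} \]
Thus the best plane through the data is of the form
\[ -kx + 1y = d \quad \text{or} \quad y = kx + d. \]
Note that the eigenvector, $v$, that we used is not a unit vector. However it is parallel to the unit vector we want, and we can still use the regular formula to compute the optimal $d$:
\[ d = \frac{1_m^\top X n}{1_m^\top 1_m} = \frac{1}{m} 1_m^\top X \begin{bmatrix} -k \\ 1 \end{bmatrix} = \begin{bmatrix} \bar{x} & \bar{y} \end{bmatrix} \begin{bmatrix} -k \\ 1 \end{bmatrix} = \bar{y} - k\bar{x}. \]
Thus the optimal line through the data is
\[ y = kx + d = k(x - \bar{x}) + \bar{y} \quad \text{or} \quad y - \bar{y} = k(x - \bar{x}), \]
with $k$ defined in equation 9.5.

9.3.1 Computing the Orthogonal Least Squares Approximant

Just as the Normal Equations, equation 9.3, define the solution to the ordinary least squares approximant, but are not normally used to solve the problem, so too is the above described means of computing the orthogonal least squares not a good idea. First we define
\[ W = X - \frac{1_m 1_m^\top X}{1_m^\top 1_m}. \]
Now note that
\begin{align*}
W^\top W &= \left( X - \frac{1_m 1_m^\top X}{1_m^\top 1_m} \right)^\top \left( X - \frac{1_m 1_m^\top X}{1_m^\top 1_m} \right) \\
&= X^\top X - 2 \frac{X^\top 1_m 1_m^\top X}{1_m^\top 1_m} + \frac{X^\top 1_m 1_m^\top 1_m 1_m^\top X}{(1_m^\top 1_m)^2} \\
&= X^\top X - 2 \frac{X^\top 1_m 1_m^\top X}{1_m^\top 1_m} + \frac{X^\top 1_m (1_m^\top 1_m) 1_m^\top X}{(1_m^\top 1_m)^2} \\
&= X^\top X - 2 \frac{X^\top 1_m 1_m^\top X}{1_m^\top 1_m} + \frac{X^\top 1_m 1_m^\top X}{1_m^\top 1_m} \\
&= X^\top X - \frac{X^\top 1_m 1_m^\top X}{1_m^\top 1_m} = M.
\end{align*}
We now need some linear algebra magic:

Definition 9.3.1. A square matrix $U$ is called unitary if its inverse is its transpose, i.e.,
\[ U^\top U = I = U U^\top. \]
From the first equation, we see that the columns of $U$ form a collection of orthonormal vectors. The second equation tells that the rows of $U$ are also such a collection.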

Definition 9.3.2. Every $m \times n$ matrix $B$ can be decomposed as
\[ B = U \Sigma V^\top, \]
where $U$ and $V$ are unitary matrices, $U$ is $m \times m$, $V$ is $n \times n$, and $\Sigma$ is $m \times n$, and has nonzero elements only on its diagonal. The values of the diagonal of $\Sigma$ are the singular values of $B$. The column vectors of $V$ span the row space of $B$, and the column vectors of $U$ contain the column space of $B$.

The singular value decomposition generalizes the notion of eigenvalues and eigenvectors to the case of nonsquare matrices. Let $u^{(i)}$ be the $i$th column vector of $U$, $v^{(i)}$ the $i$th column of $V$, and let $\sigma_i$ be the $i$th element of the diagonal of $\Sigma$. Then if $x = \sum_i \alpha_i v^{(i)}$, we have
\[ Bx = U \Sigma V^\top \sum_i \alpha_i v^{(i)} = U \Sigma \begin{bmatrix} \alpha_1 & \alpha_2 & \cdots & \alpha_n \end{bmatrix}^\top = U \begin{bmatrix} \sigma_1 \alpha_1 & \sigma_2 \alpha_2 & \cdots & \sigma_n \alpha_n & 0 & \cdots & 0 \end{bmatrix}^\top = \sum_i \sigma_i \alpha_i u^{(i)}. \]
Thus $v^{(i)}$ and $u^{(i)}$ act as an "eigenvector pair" with "eigenvalue" $\sigma_i$.

The singular value decomposition is computed by the octave/Matlab command [U,S,V] = svd(B). The decomposition is computed such that the diagonal elements of S, the singular values, are nonnegative, and nonincreasing with increasing index.

Now we can use the singular value decomposition on $W$: $W = U \Sigma V^\top$. Thus
\[ M = W^\top W = V \Sigma^\top U^\top U \Sigma V^\top = V \Sigma^\top I \Sigma V^\top = V S^2 V^\top, \]
where $S^2$ is the $n \times n$ diagonal matrix whose $i$th diagonal is $\sigma_i^2$. Thus we see that the solution to the orthogonal least squares problem, the eigenvector associated with the smallest eigenvalue of $M$, is the column vector of $V$ associated with the smallest, in absolute value, singular value of $W$. In octave/Matlab, this is V(:,n).

This may not appear to be a computational savings, but if $M$ is computed up front, and its eigenvectors are computed, there is loss of precision in its smallest eigenvalues which might lead us to choose the wrong normal direction (especially if there is more than one!). On the other hand, $M$ is an $n \times n$ matrix, whereas $W$ is $m \times n$. If storage is at a premium, and the method is being reapplied with new observations, it may be preferable to keep $M$, rather than keeping $W$. That is, it may be necessary to trade conditioning for space.
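The SVD route can be compared against the closed form of Example Problem 9.4 on a small 2D data set. The following is a sketch under stated assumptions: the data are invented, and the names al, be, ga stand for the $\alpha$, $\beta$, $\gamma$ of equation 9.4.

```octave
% Sketch: orthogonal least squares line through 2D data via the SVD of W.
% The data are invented for illustration.
x = [0; 1; 2; 3; 4];
y = [0.1; 0.9; 2.2; 2.8; 4.1];
X = [x, y];
m = size(X, 1);
onesm = ones(m, 1);

% Center the data: W = X - 1*1'*X / (1'*1).
W = X - onesm * (onesm' * X) / m;

% The normal n is the column of V belonging to the smallest singular value.
[U, S, V] = svd(W);
n = V(:, end);
d = (onesm' * X * n) / m;

% Slope and intercept of the line n(1)*x + n(2)*y = d.
k_svd = -n(1) / n(2);
b_svd = d / n(2);

% Closed form of Example Problem 9.4, with al, be, ga as in equation 9.4.
xbar = mean(x);
ybar = mean(y);
al = (sum(x.^2) - m * xbar^2) / 2;
be = sum(x .* y) - m * xbar * ybar;
ga = (sum(y.^2) - m * ybar^2) / 2;
k_formula = ((ga - al) + sqrt((ga - al)^2 + be^2)) / be;

fprintf('slope via SVD %.6f, via equation 9.5 %.6f\n', k_svd, k_formula);
fprintf('intercept via SVD %.6f, ybar - k*xbar %.6f\n', ...
        b_svd, ybar - k_formula * xbar);
```

The sign of the computed n is arbitrary, but the slope and intercept are ratios of its entries and so do not depend on it. The sketch assumes that n(2) and be are nonzero, i.e., that the best line is neither vertical nor exactly aligned with a coordinate axis.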

9.3.2 Principal Component Analysis

The above analysis leads us to the topic of Principal Component Analysis. Suppose the $m \times n$ matrix $X$ represents $m$ noisy observations of some process, where each observation is a vector in $\mathbb{R}^n$. Suppose the observations nearly lie in some $k$-dimensional subspace of $\mathbb{R}^n$, for $k < n$. How can you find an approximate value of $k$, and how do you find the $k$-dimensional subspace?

As in the previous subsection, let
\[ W = X - \frac{1_m 1_m^\top X}{1_m^\top 1_m} = X - 1_m \bar{X}, \quad \text{where } \bar{X} = \frac{1_m^\top X}{1_m^\top 1_m}. \]
Now note that $\bar{X}$ is the mean of the $m$ observations, as a row vector in $\mathbb{R}^n$. Under the assumption that the noise is approximately the same for each observation, we should think that $\bar{X}$ is likely to be in or very close to the $k$-dimensional subspace. Then each row of $W$ should be like a vector which is contained in the subspace. If we take some vector $v$ and multiply $Wv$, we expect the resultant vector to have small norm if $v$ is not in the $k$-dimensional subspace, and to have a large norm if it is in the subspace.

You should now convince yourself that this is related to the singular value decomposition of $W$. Decomposing $v = \sum_i \alpha_i v^{(i)}$, we have $Wv = \sum_i \sigma_i \alpha_i u^{(i)}$, and thus the product vector has small norm if the $\alpha_i$ associated with large $\sigma_i$ are small, and large norm otherwise. That is, the principal directions of the "best" $k$-dimensional subspace associated with $X$ are the $k$ columns of $V$ associated with the largest singular values of $W$.

If $k$ is unknown a priori, a good value can be obtained by eyeing a graph of the cumulative sum of squared singular values of $W$, and selecting the $k$ that makes this appropriately large. This is illustrated in Figure 9.6, with data generated by the following octave/Matlab code:

octave:1> X = rand(150,5);
octave:2> sawX = (X * randn(5,40)) + kron(ones(150,1),randn(1,40));
octave:3> sawX += 0.1 * randn(150,40);
octave:4> wun = ones(150,1);
octave:5> W = sawX - (wun * wun' * sawX) / (wun' * wun);
octave:6> [U,S,V] = svd(W);
octave:7> eigs = diag(S'*S);
octave:8> cseig = cumsum(eigs) ./ sum(eigs);
octave:9> plot ((1:40)',cseig,'xr-@;cum. sum sqrd. sing. vals.;');

Figure 9.6: Noisy 5-dimensional data lurking in $\mathbb{R}^{40}$ are treated to Principal Component Analysis. This figure shows the normalized cumulative sum of squared singular values of $W$. There is a sharp "elbow" at $k = 5$, which indicates the data is actually 5-dimensional.
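Instead of reading $k$ off the graph, one can take the first index at which the normalized cumulative sum passes a chosen level. The sketch below is illustrative only: the 0.99 threshold is an assumption rather than a rule from the text, and the data are built deterministically (no random numbers) so that they lie near a 2-dimensional subspace of $\mathbb{R}^6$.

```octave
% Sketch: choose k as the first index where the normalized cumulative sum
% of squared singular values of W reaches a threshold (0.99 is an assumption).
m = 50;
t = (1:m)';

% Two directions in R^6 and deterministic coordinates along them.
basis = [1 0 1 0 1 0; 0 1 0 1 0 1];
coords = [cos(t), sin(2 * t)];

% A small deterministic perturbation stands in for measurement noise.
noise = 1e-3 * sin(7.3 * (t * (1:6)));
X = coords * basis + noise + ones(m, 1) * (1:6);

% Center the observations and take the singular values of W.
W = X - ones(m, 1) * mean(X);
s = svd(W);
cseig = cumsum(s.^2) / sum(s.^2);

threshold = 0.99;
k = find(cseig >= threshold, 1);
fprintf('estimated dimension k = %d\n', k);
```

A threshold of this kind is only a stand-in for eyeing the "elbow"; with heavier noise the appropriate level has to be judged from the graph.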


Exercises

(9.1) How do you know that the choice of constants $c_i$ in our least squares analysis actually finds a minimum of equation 9.1, and not, say, a maximum?

(9.2) Our "linear" least squares might be better called the "affine" least squares. In this exercise you will find the best linear function which approximates a set of data. That is, find the function $f(x) = cx$ which is the ordinary least squares best approximant to the given data

x:  x_0  x_1  ...  x_n
y:  y_0  y_1  ...  y_n

(9.3) Find the constant that best approximates, in the ordinary least squares sense, the given data $x$, $y$. (Hint: you can use equation 9.2 or equation 9.3 using a single basis function $g_0(x) = 1$.) Do the $x_i$ values affect your answer?

(9.4) Find the function $f(x) = c$ which best approximates, in the ordinary least squares sense, the data

x:  1  −2  5
y:  1  −2  4

(9.5) Find the function $ax + b$ that best approximates the data

x:  0  −1  2
y:  0  1  −1

Use the ordinary least squares method.

(9.6) Find the function $ax + b$ that best approximates the data of the previous problem using the orthogonal least squares method.

(9.7) Find the function $ax^2 + b$ that best approximates the data

x:  0  1  2
y:  0.3  0.1  0.5

(9.8) Prove that the least squares solution is the solution of the normal equations, equation 9.3, and thus takes the form
\[ c = \left( A^\top A \right)^{-1} A^\top y. \]

(9.9) For ordinary least squares regression to the line $y = mx + b$, express the approximate $m$ in terms of the $\alpha$, $\beta$, and $\gamma$ parameters from equation 9.4. Compare this to that found in equation 9.5.

(9.10) Any symmetric positive definite matrix $G$ can be used to define a norm in the following way:
\[ \|v\|_G =_{df} \sqrt{v^\top G v}. \]
The weighted least squares method for $G$ and data $A$ and $y$ finds the $\hat{c}$ that solves
\[ \min_c \|Ac - y\|_G^2. \]
Prove that the normal equations form of the solution of this problem is
\[ A^\top G A \hat{c} = A^\top G y. \]

00001 0. .7]. (Hint: To ﬁnd x such that Mx = b.4. a set of data {(xi .034]’.3. (b) Prove that the r 2 parameter is “scale-invariant.5 0 0. 2 . Show that the solution to the weighted least squares method is the c that is the ordinary (unweighted) least squares best approximate ˆ solution to the problem L Ac = L y. . the orthogonal least squares best approximate line for x ¯ data {(xi . . (9. ˆ (a) Prove that we also have r2 = 1 − Aˆ − y c y 2 2 2 2 2 Aˆ c y 2 2 2 .00001 0. in the least squares sense. the parameter is unchanged (although the value of c may change.15) Write code to ﬁnd the function ae x + be−x that best interpolates.” that is. yi ). Your m-ﬁle should have header line like: i=0 function [a.14) Find the constant c such that f (x) = ln (cx) best approximates. yi )}i=n . use x = M \ b.16) Write code to ﬁnd the plane n x(i) = d that best interpolates.) ˆ (c) Devise a similar statistic for the orthogonal least squares approximation which is scaleinvariant and rotation-invariant.b] = lsqrexp(xys) where xys is the (n + 1) × 2 matrix whose rows are the n + 1 tuples (x i .b] = lsqrexp(xys) (b) Try the following: octave:5> xys = xys = [-0.6. yi )}m goes through the point (¯. This should reduce the problem to solving a 2 × 2 linear system. x n y y 0 y1 .194 0. Use the normal equations method.322 1.12) The r 2 statistic is often used to describe the “goodness-of-ﬁt” of the ordinary least squares approximation to data. let octave/Matlab solve that linear system for you.2 to solve this. in the least squares sense.0 0. Does the same thing hold for the ordinary i=1 least squares best approximate line? (9.5 1]’.1.) (a) What do you get when you try the following? octave:1> xs = [-1 -0. . .11) (Continuation) Let the symmetric positive deﬁnite matrix G have Cholesky Factorization G = LL . octave:3> xys = [xs ys]. Your m-ﬁle should have header line like: . an is deﬁned as ( ai )1/n . . the given data x x 0 x1 . if we change the units of our data.0.3. y ).1.103 0. 
It is deﬁned as r = where c is the approximant A A−1 A y. You must use the Deﬁnition 9. in the least squares sense. octave:6> [a.430 0.) The geometric mean of the numbers a 1 . a2 . (9. octave:2> ys = [1.9. where L is a lower triangular matrix. ORTHOGONAL LEAST SQUARES 133 (9.13) As proved in Example Problem 9.b] = lsqrexp(xys) Does something unexpected happen? Why? (9. octave:4> [a. a i=m set of data x(i) i=0 . . y n (Hint: You cannot use basis functions and equation 9. . How does your answer relate to the geometric mean? (9.

Now test your code.").yeps = 1. octave:10> plot(XSaw. and with with xeps = 1.orthb = d / n(2). The data. . octave:3> T = rand(n-1.375. octave:6> lsm = lsmb(1).m = 0. Try this again with diﬀerent values of xeps and yeps. octave:15> hold off.epsi = 0. octave:3> XSaw = Xtrue + xeps * randn(size(Xtrue)). which control the sizes of the errors in the x and y dimensions. lsb = lsmb(2). Compare the estimated coeﬃcients from both solutions.1."*.lsm*XSaw . . octave:2> Xsub = rand(m.0. x2 . ones(obs."). (b) Compare the orthogonal least squares technique to the ordinary least squares in a two dimensional setting.0.(n-1)). octave:11> hold on.1. yeps = 1. Consider x. ."). (a) Create artiﬁcially planar data by the following: octave:1> m = 100. octave:5> lsmb = [XSaw.YSaw. octave:6> X += epsi .* randn(size(X)).0. octave:13> plot(XSaw. octave:2> Xtrue = randn(obs.m*XSaw ."). octave:4> YSaw = Ytrue + yeps * randn(size(Ytrue))."g-. Under which situations does orthogonal least squares outperform ordinary least squares? Do they give equally erroneous results in some situations? Can you think of a way of “reweighting” channel data to make the orthogonal least squares method more robust in cases of unequal error variance? . xn ). octave:14> plot(XSaw. octave:9> hold off.d] = orthlsq(X).1).true line.1)] \ YSaw.+ b.observed data.+ orthb.ordinary least squares. Now X contains data which are approximately on a (n − 1)-dimensional hyperplane in Rn ."b-. lay in the rowspace of T. Try with xeps = 0. Check this: octave:8> disp(norm(T * n)).xeps = 1.orthm*XSaw . octave:8> orthm = -n(1) / n(2).orthogonal least squares.n). y data nearly on the line y = mx + b: octave:1> obs = 100."r-.Ytrue = m . yeps = 0. which returns the eigenvalues and vectors of a matrix. Plot the two regression lines. .134 CHAPTER 9. octave:4> X = Xsub * T. YSaw]).0.1. What do you get? Try again with epsi very small (1 × 10 −5 ) or zero. You should have that the vector n is orthogonal to the rows of T. 
octave:7> [n.* Xtrue .+ lsb. Use the octave/Matlab eig function.d] = orthlsq([XSaw.* ones(size(X)).n = 5.d] = orthlsq(X) where X is the m × n matrix whose rows are the m tuples (x 1 .b = 1. octave:5> X += offs . octave:7> [n.+ b.offs = 2. LEAST SQUARES function [n. when centered. octave:12> plot(XSaw.

Chapter 10

Ordinary Differential Equations

Ordinary Differential Equations, or ODEs, are used in approximating some physical systems. Some classical uses include simulation of the growth of populations and trajectory of a particle. They are usually easier to solve or approximate than the more general Partial Differential Equations, PDEs, which can contain partial derivatives.

Much of the background material of this chapter should be familiar to the student of calculus. We will focus more on the approximation of solutions, than on analytical solutions. For more background on ODEs, see any of the standard texts, e.g., [8].

10.1 Elementary Methods

A one-dimensional ODE is posed as follows: find some function $x(t)$ such that
\[ \frac{dx(t)}{dt} = f(t, x(t)), \qquad x(a) = c. \tag{10.1} \]
In calculus classes, you probably studied the case where $f(t, x(t))$ is independent of its second variable. For example the ODE given by
\[ \frac{dx(t)}{dt} = t^2 - t, \qquad x(a) = c, \]
has solution $x(t) = \frac{1}{3} t^3 - \frac{1}{2} t^2 + K$, where $K$ is chosen such that $x(a) = c$.

A more general case is when $f(t, x)$ is separable, that is $f(t, x) = g(t) h(x)$ for some functions $g$, $h$. You may recall that the solution to such a separable ODE usually involved the following equation:
\[ \int \frac{dx}{h(x)} = \int g(t)\, dt. \]
Finding an analytic solution can be considerably more difficult in the general case, thus we turn to approximation schemes.

10.1.1 Integration and 'Stepping'

We attempt to solve the ODE by integrating both sides. That is,
\[ \frac{dx(t)}{dt} = f(t, x(t)) \quad \text{yields} \quad \int_t^{t+h} dx = \int_t^{t+h} f(r, x(r))\, dr, \]
thus
\[ x(t + h) = x(t) + \int_t^{t+h} f(r, x(r))\, dr. \tag{10.2} \]
If we can approximate the integral then we have a way of 'stepping' from $t$ to $t + h$, i.e., if we have a good approximate of $x(t)$ we can approximate $x(t + h)$. Note that this means that all the work we have put into approximating integrals can be used to approximate the solution of ODEs.

Using stepping schemes we can approximate $x(t_{final})$ given $x(t_{initial})$ by taking a number of steps. For simplicity we usually use the same step size, $h$, for each step, though nothing precludes us from varying the step size.

If we apply the left-hand rectangle rule for approximating integrals to equation 10.2, we get
\[ x(t + h) \approx x(t) + h f(t, x(t)). \tag{10.3} \]
This is called Euler's Method. Note this is essentially the same as the "forward difference" approximation we made to the derivative, equation 7.1.

Trapezoid rule gives
\[ x(t + h) \approx x(t) + \frac{h}{2} \left[ f(t, x(t)) + f(t + h, x(t + h)) \right]. \]
But we cannot evaluate this exactly, since $x(t + h)$ appears on both sides of the equation. And it is embedded on the right hand side. Bummer. But if we could approximate it, say by using Euler's Method, maybe the formula would work. This is the idea behind the Runge-Kutta Method (see Section 10.2).

10.1.2 Taylor's Series Methods

We see that our more accurate integral approximations will be useless since they require information we do not know, i.e., evaluations of $f(t, x)$ for yet unknown $x$ values. Thus we fall back on Taylor's Theorem (Theorem 1.1). We can also view this as using integral approximations where all information comes from the left-hand endpoint.

By Taylor's theorem, if $x$ has at least $m + 1$ continuous derivatives on the interval in question, we can write
\[ x(t + h) = x(t) + h x'(t) + \frac{1}{2} h^2 x''(t) + \ldots + \frac{1}{m!} h^m x^{(m)}(t) + \frac{1}{(m+1)!} h^{m+1} x^{(m+1)}(\tau), \]
where $\tau$ is between $t$ and $t + h$. Since $\tau$ is essentially unknown, our best calculable approximation to this expression is
\[ x(t + h) \approx x(t) + h x'(t) + \frac{1}{2} h^2 x''(t) + \ldots + \frac{1}{m!} h^m x^{(m)}(t). \]
This approximate solution to the ODE is called a Taylor's series method of order $m$. We also say that this approximation is a truncation of the Taylor's series. The difference between the actual

value of $x(t + h)$, and this approximation is called the truncation error. The truncation error exists independently from any error in computing the approximation, called roundoff error. We discuss this more in Subsection 10.1.5.

10.1.3 Euler's Method

When $m = 1$ we recover Euler's Method:
\[ x(t + h) = x(t) + h x'(t) = x(t) + h f(t, x(t)). \]

Example 10.1. Consider the ODE
\[ \frac{dx(t)}{dt} = x, \qquad x(0) = 1. \]
The actual solution is $x(t) = e^t$. Euler's Method will underestimate $x(t)$ because the curvature of the actual $x(t)$ is positive, and thus the function is always above its linearization. In Figure 10.1, we see that eventually the Euler's Method approximation is very poor.

Figure 10.1: Euler's Method applied to approximate $x' = x$, $x(0) = 1$ (curves: Euler approximation, Actual). Each step proceeds on the linearization of the curve, and is thus an underestimate, as the actual solution, $x(t) = e^t$, has positive curvature. Here step size $h = 0.3$ is used.

10.1.4 Higher Order Methods

Higher order methods are derived by taking more terms of the Taylor's series expansion. Note however, that they will require information about the higher derivatives of $x(t)$, and thus may not be appropriate for the case where $f(t, x)$ is given as a "black box" function.
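Returning to Example 10.1, Euler's Method can be sketched in a few lines of octave/Matlab. This is a minimal illustration; the step count of ten is an assumption, chosen so that steps of size $h = 0.3$ reach $t = 3$.

```octave
% Sketch of Euler's Method for dx/dt = f(t,x), x(0) = 1, with f(t,x) = x.
f = @(t, x) x;
h = 0.3;
nsteps = 10;              % ten steps of size 0.3 reach t = 3
t = 0;
x = 1;
for i = 1:nsteps
  x = x + h * f(t, x);    % x(t+h) ~ x(t) + h*f(t, x(t))
  t = t + h;
end
fprintf('Euler: x(%g) ~ %.4f, exact e^t = %.4f\n', t, x, exp(t));
```

Each step multiplies the current value by $1 + h$, so the approximation after ten steps is $(1.3)^{10}$, which falls well short of $e^3$, in line with the underestimate described in the example.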

Example Problem 10.2. Derive the Taylor's series method for $m = 3$ for the ODE
\[ \frac{dx(t)}{dt} = x + e^x, \qquad x(0) = 1. \]
Solution: By simple calculus:
\begin{align*}
x'(t) &= x + e^x \\
x''(t) &= x' + e^x x' \\
x'''(t) &= x'' (1 + e^x) + x' e^x x'
\end{align*}
We can then write our step like a program:
\begin{align*}
x'(t) &\leftarrow x(t) + e^{x(t)} \\
x''(t) &\leftarrow x'(t) \left( 1 + e^{x(t)} \right) \\
x'''(t) &\leftarrow x''(t) \left( 1 + e^{x(t)} \right) + \left( x'(t) \right)^2 e^{x(t)} \\
x(t + h) &\approx x(t) + h x'(t) + \frac{1}{2} h^2 x''(t) + \frac{1}{6} h^3 x'''(t)
\end{align*}

10.1.5 Errors

There are two types of errors: truncation and roundoff. Truncation error is the error of truncating the Taylor's series after the $m$th term. You can see that the truncation error is $O(h^{m+1})$. Roundoff error occurs because of the limited precision of computer arithmetic. While it is possible these two kinds of error could cancel each other, given our pessimistic worldview, we usually assume they are additive.

We also expect some tradeoff between these two errors as $h$ varies. As $h$ is made smaller, the truncation error presumably decreases, while the roundoff error increases. This results in an optimal $h$-value for a given method. See Figure 10.2. Unfortunately, the optimal $h$-value will usually depend on the given ODE and the approximation method. For an ODE with unknown solution, it is generally impossible to predict the optimal $h$. We have already observed this kind of behaviour, when analyzing approximate derivatives; cf. Example Problem 7.5.

Both of these kinds of error can accumulate: since we step from $x(t)$ to $x(t + h)$, an error in the value of $x(t)$ can cause a greater error in the value of $x(t + h)$. It turns out that for some ODEs and some methods, this does not happen. This leads to the topic of stability.

10.1.6 Stability

Suppose you had an exact method of solving an ODE, but had the wrong data. That is, you were trying to solve
\[ \frac{dx(t)}{dt} = f(t, x(t)), \qquad x(a) = c, \]
but instead fed the wrong data into the computer, which then solved exactly the ODE
\[ \frac{dx(t)}{dt} = f(t, x(t)), \qquad x(a) = c + \epsilon. \]

This solution can diverge from the correct one because of the different starting conditions. We illustrate this with the usual example.

Figure 10.2: The truncation, roundoff, and total error for a fictional ODE and approximation method are shown. The total error is apparently minimized for an $h$ value of approximately 0.0002. Note that the truncation error appears to be $O(h)$, thus the approximation could be Euler's Method. In reality, we expect the roundoff error to be more "bumpy," as seen in Figure 7.1.

Example 10.3. Consider the ODE
\[ \frac{dx(t)}{dt} = x. \]
This has solution $x(t) = x(0) e^t$. The curves for different starting conditions diverge, as in Figure 10.3(a). However, if we instead consider the ODE
\[ \frac{dx(t)}{dt} = -x, \]
which has solution $x(t) = x(0) e^{-t}$, we see that differences in the initial conditions become immaterial, as in Figure 10.3(b).

Thus the latter ODE exhibits stability: roundoff and truncation errors accrued at a given step will become irrelevant as more steps are taken. The former ODE exhibits the opposite behavior: accrued errors will be amplified.
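The contrast between the two ODEs can be observed numerically by stepping each one from two nearby starting values. The sketch below uses Euler's Method as the stepping scheme; the step size, step count, and perturbation size are assumptions chosen for illustration.

```octave
% Sketch: effect of a perturbed starting value on dx/dt = x and dx/dt = -x,
% each stepped with Euler's Method.  h, nsteps and eps0 are illustrative.
h = 0.1;
nsteps = 20;
eps0 = 0.25;

xa = 1;  xb = 1 + eps0;   % two starting values for dx/dt = x
ya = 1;  yb = 1 + eps0;   % two starting values for dx/dt = -x
for i = 1:nsteps
  xa = xa + h * xa;
  xb = xb + h * xb;
  ya = ya - h * ya;
  yb = yb - h * yb;
end

gap_growing = abs(xb - xa);     % perturbation after stepping dx/dt = x
gap_shrinking = abs(yb - ya);   % perturbation after stepping dx/dt = -x
fprintf('initial gap %.4f\n', eps0);
fprintf('gap for dx/dt =  x: %.4f\n', gap_growing);
fprintf('gap for dx/dt = -x: %.4f\n', gap_shrinking);
```

For the first ODE the gap is multiplied by $1 + h$ at every step, while for the second it is multiplied by $1 - h$, so the initial error is amplified in one case and damped in the other.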

Figure 10.3: Exponential functions with different starting conditions: in (a), the functions $(1 + \epsilon) e^t$ are shown, while in (b), $(1 + \epsilon) e^{-t}$ are shown, for $\epsilon = -0.5, -0.25, 0, 0.25, 0.5$. The latter exhibits stability, while the former exhibits instability.

It turns out there is a simple test for stability. If $f_x > \delta$ for some positive $\delta$, then the ODE is unstable; if $f_x < -\delta$ for a positive $\delta$, the equation is stable. Some ODEs fall through this test, however.

We can prove this without too much pain. Define $x(t, s)$ to be the solution to
\[ \frac{dx(t)}{dt} = f(t, x(t)), \qquad x(a) = s, \]
for $t \ge a$. Then instability means that
\[ \lim_{t \to \infty} \left| \frac{\partial}{\partial s} x(t, s) \right| = \infty. \]
Think about this in terms of the curves for our simple example $x' = x$.

If we take the derivative of the ODE with respect to $s$ we get
\[ \frac{\partial}{\partial t} x(t, s) = f(t, x(t, s)), \quad \text{i.e.,} \quad \frac{\partial}{\partial s} \frac{\partial}{\partial t} x(t, s) = \frac{\partial}{\partial s} f(t, x(t, s)) = f_t(t, x(t, s)) \frac{\partial t}{\partial s} + f_x(t, x(t, s)) \frac{\partial x(t, s)}{\partial s}. \]
By the independence of $t$, $s$ the first part of the RHS is zero, so
\[ \frac{\partial}{\partial s} \frac{\partial}{\partial t} x(t, s) = f_x(t, x(t, s)) \frac{\partial}{\partial s} x(t, s). \]
We use continuity of $x$ to switch the orders of differentiation:
\[ \frac{\partial}{\partial t} \frac{\partial}{\partial s} x(t, s) = f_x(t, x(t, s)) \frac{\partial}{\partial s} x(t, s). \]
Defining $u(t) = \frac{\partial}{\partial s} x(t, s)$, and $q(t) = f_x(t, x(t, s))$, we have
\[ \frac{\partial}{\partial t} u(t) = q(t) u(t). \]
The solution is $u(t) = c e^{Q(t)}$, where
\[ Q(t) = \int_a^t q(r)\, dr. \]
If $q(r) = f_x(r, x(r, s)) \ge \delta > 0$, then $\lim Q(t) = \infty$, and so $\lim u(t) = \infty$. But note that $u(t) = \frac{\partial}{\partial s} x(t, s)$, and thus we have instability. Similarly if $q(r) \le -\delta < 0$, then $\lim Q(t) = -\infty$, and so $\lim u(t) = 0$, giving stability.

10.1.7 Backwards Euler's Method

Euler's Method is the most basic numerical technique for solving ODEs. It may suffer from instability, however, which may make the method inappropriate or impractical. It can be recast into a method which may have superior stability at the cost of limited applicability.

This method, the so-called Backwards Euler's Method, is a stepping method: given a reasonable approximation to $x(t)$, it calculates an approximation to $x(t + h)$. As usual, Taylor's Theorem is the starting point. Letting $t_f = t + h$, Taylor's Theorem states that
\[ x(t_f - h) = x(t_f) + (-h) x'(t_f) + \frac{(-h)^2}{2} x''(t_f) + \ldots, \]
for an analytic function. Assuming that $x$ has two continuous derivatives on the interval in question, the second order term is truncated to give
\[ x(t) = x(t + h - h) = x(t_f - h) \approx x(t_f) + (-h) x'(t_f) \]
and so,
\[ x(t + h) \approx x(t) + h x'(t + h). \]
The complication with applying this method is that $x'(t + h)$ may not be computable if $x(t + h)$ is not known, thus cannot be used to calculate an approximation for $x(t + h)$. Since this approximation may contain $x(t + h)$ on both sides, we call this method an implicit method. In contrast, Euler's Method, and the Runge-Kutta methods in the following section are explicit methods: they can be used to directly compute an approximate step. We make this clear by examples.

Example 10.4. Consider the ODE
\[ \frac{dx(t)}{dt} = \cos x, \qquad x(0) = 2\pi. \]
In this case, if we wish to approximate $x(0 + h)$ using Backwards Euler's Method, we have to find $x(h)$ such that $x(h) = x(0) + h \cos x(h)$. This is equivalent to finding a root of the equation
\[ g(y) = 2\pi + h \cos y - y = 0. \]
For step size $h = 1$ this is $g(y) = 2\pi + \cos y - y$. This function is nonincreasing ($g'(y)$ is nonpositive), and has a zero between 6 and 8, as seen in Figure 10.4. The techniques of Chapter 4 are needed to find the zero of this function.

Figure 10.4: The nonincreasing function $g(y) = 2\pi + \cos y - y$ is shown.
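The implicit step of Example 10.4 can be carried out with any root finder. The sketch below uses plain bisection on the bracket $[6, 8]$ mentioned in the example; the choice of bisection, the step size $h = 1$, and the iteration count are assumptions made for illustration.

```octave
% Sketch: one Backwards Euler step for dx/dt = cos(x), x(0) = 2*pi,
% found by bisection on g(y) = x0 + h*cos(y) - y.  h = 1 is assumed.
h = 1;
x0 = 2 * pi;
g = @(y) x0 + h * cos(y) - y;

lo = 6;                   % g(6) > 0
hi = 8;                   % g(8) < 0
for i = 1:60
  mid = (lo + hi) / 2;
  if g(mid) > 0
    lo = mid;             % root lies to the right of mid
  else
    hi = mid;             % root lies at or to the left of mid
  end
end
xh = (lo + hi) / 2;       % approximation to x(0 + h)

fprintf('Backwards Euler step: x(%g) ~ %.6f, g = %.2e\n', h, xh, g(xh));
```

Because $g$ is nonincreasing for $h = 1$, the sign test in the loop always keeps the root inside the bracket. Every Backwards Euler step on this ODE requires a solve of this kind, which is the "limited applicability" cost mentioned above.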

Example 10.5. Consider the ODE from Example 10.1:
\[ \frac{dx(t)}{dt} = x, \qquad x(0) = 1. \]
The actual solution is $x(t) = e^t$. Euler's Method underestimates $x(t)$, as shown in Figure 10.1. Backwards Euler's Method gives the approximation
\[ x(t+h) = x(t) + h \, x(t+h), \qquad \text{that is,} \qquad x(t+h) = \frac{x(t)}{1-h}. \]
Using Backwards Euler's Method gives an overestimate, as shown in Figure 10.5. This is because each step proceeds on the tangent line to a curve $k e^t$ to a point on that curve. Generally Backwards Euler's Method is preferred over vanilla Euler's Method because it gives equivalent stability for larger stepsize $h$. This effect is not evident in this case.

Figure 10.5: Backwards Euler's Method applied to approximate $x' = x$, $x(0) = 1$. The approximation is an overestimate, as each step is to a point on a curve $k e^t$, through the tangent at that point. Compare this figure to Figure 10.1, which shows the same problem approximated by Euler's Method, with the same stepsize, $h = 0.3$. (Plot: the Backwards Euler approximation and the actual solution for $0 \le t \le 3$.)

10.2 Runge-Kutta Methods

Recall the ODE problem: find some $x(t)$ such that
\[ \frac{dx(t)}{dt} = f(t, x(t)), \qquad x(a) = c, \]

where $f, a, c$ are given. Recall that the Taylor's series method has problems: either you are using the first order method (Euler's Method), which suffers inaccuracy, or you are using a higher order method which requires evaluations of higher order derivatives of $x(t)$, i.e., $x''(t)$, $x'''(t)$, etc. This makes these methods less useful for the general setting of $f$ being a blackbox function. The Runge-Kutta Methods (don't ask me how it's pronounced) seek to resolve this.

10.2.1 Taylor's Series Redux

We fall back on Taylor's series, in this case the 2-dimensional version:
\[ f(x+h, y+k) = \sum_{i=0}^{\infty} \frac{1}{i!} \left( h \frac{\partial}{\partial x} + k \frac{\partial}{\partial y} \right)^i f(x,y). \]
The thing in the middle is an operator on $f(x,y)$. The partial derivative operators are interpreted exactly as if they were algebraic terms, that is:
\[ \left( h \frac{\partial}{\partial x} + k \frac{\partial}{\partial y} \right)^0 f(x,y) = f(x,y), \]
\[ \left( h \frac{\partial}{\partial x} + k \frac{\partial}{\partial y} \right)^1 f(x,y) = h \frac{\partial f(x,y)}{\partial x} + k \frac{\partial f(x,y)}{\partial y}, \]
\[ \left( h \frac{\partial}{\partial x} + k \frac{\partial}{\partial y} \right)^2 f(x,y) = h^2 \frac{\partial^2 f(x,y)}{\partial x^2} + 2hk \frac{\partial^2 f(x,y)}{\partial x \partial y} + k^2 \frac{\partial^2 f(x,y)}{\partial y^2}, \]
and so on.

There is a truncated version of Taylor's series:
\[ f(x+h, y+k) = \sum_{i=0}^{n} \frac{1}{i!} \left( h \frac{\partial}{\partial x} + k \frac{\partial}{\partial y} \right)^i f(x,y) + \frac{1}{(n+1)!} \left( h \frac{\partial}{\partial x} + k \frac{\partial}{\partial y} \right)^{n+1} f(\bar{x}, \bar{y}), \]
where $(\bar{x}, \bar{y})$ is a point on the line segment with endpoints $(x,y)$ and $(x+h, y+k)$.

For practice, we will write this for $n = 1$:
\[ f(x+h, y+k) = f(x,y) + h f_x(x,y) + k f_y(x,y) + \frac{1}{2} \left( h^2 f_{xx}(\bar{x}, \bar{y}) + 2hk f_{xy}(\bar{x}, \bar{y}) + k^2 f_{yy}(\bar{x}, \bar{y}) \right). \]

10.2.2 Deriving the Runge-Kutta Methods

We now introduce the Runge-Kutta Method for stepping from $x(t)$ to $x(t+h)$. We suppose that $\alpha, \beta, w_1, w_2$ are fixed constants. We then compute:
\[ K_1 \leftarrow h f(t, x) \]
\[ K_2 \leftarrow h f(t + \alpha h, x + \beta K_1). \]
Then we approximate:
\[ x(t+h) \leftarrow x(t) + w_1 K_1 + w_2 K_2. \]
We now examine the "proper" choices of the constants. Note that $w_1 = 1$, $w_2 = 0$ corresponds to Euler's Method, and does not require computation of $K_2$. We will pick another choice.

Also notice that the definition of $K_2$ should be related to Taylor's theorem in two dimensions. Let's look at it:
\[ K_2 / h = f(t + \alpha h, x + \beta K_1) = f(t + \alpha h, x + \beta h f(t,x)) \]
\[ = f + \alpha h f_t + \beta h f f_x + \frac{1}{2} \left( \alpha^2 h^2 f_{tt}(\bar{t}, \bar{x}) + 2 \alpha \beta f h^2 f_{tx}(\bar{t}, \bar{x}) + \beta^2 h^2 f^2 f_{xx}(\bar{t}, \bar{x}) \right), \]
where $f$, $f_t$, $f_x$ without arguments are evaluated at $(t, x)$.

Now reconsider our step:
\[ x(t+h) = x(t) + w_1 K_1 + w_2 K_2 \]
\[ = x(t) + w_1 h f + w_2 h f + w_2 \alpha h^2 f_t + w_2 \beta h^2 f f_x + O(h^3) \]
\[ = x(t) + (w_1 + w_2) h x'(t) + w_2 h^2 (\alpha f_t + \beta f f_x) + O(h^3). \]
If we happen to choose our constants such that
\[ w_1 + w_2 = 1, \qquad \alpha w_2 = \frac{1}{2} = \beta w_2, \]
then, since $x''(t) = \frac{d}{dt} f(t, x(t)) = f_t + f f_x$, we get
\[ x(t+h) = x(t) + h x'(t) + \frac{1}{2} h^2 (f_t + f f_x) + O(h^3) \]
\[ = x(t) + h x'(t) + \frac{1}{2} h^2 x''(t) + O(h^3), \]
i.e., our choice of the constants makes the approximate $x(t+h)$ good up to a $O(h^3)$ term, because we end up with the Taylor's series expansion up to that term. Cool.

The usual choice of constants is $\alpha = \beta = 1$, $w_1 = w_2 = \frac{1}{2}$. This gives the second order Runge-Kutta Method:
\[ x(t+h) \leftarrow x(t) + \frac{h}{2} f(t,x) + \frac{h}{2} f(t + h, x + h f(t,x)). \]
This can be written (and evaluated) as
\[ K_1 \leftarrow h f(t, x) \]
\[ K_2 \leftarrow h f(t + h, x + K_1) \qquad (10.4) \]
\[ x(t+h) \leftarrow x(t) + \frac{1}{2} (K_1 + K_2). \]
Another choice is $\alpha = \beta = 2/3$, $w_1 = 1/4$, $w_2 = 3/4$. This gives
\[ x(t+h) \leftarrow x(t) + \frac{h}{4} f(t,x) + \frac{3h}{4} f\left( t + \frac{2h}{3}, x + \frac{2h}{3} f(t,x) \right). \]
The Runge-Kutta Method of order two has error term $O(h^3)$. Sometimes this is not enough and higher-order Runge-Kutta Methods are used. The next Runge-Kutta Method is the order four method:
\[ K_1 \leftarrow h f(t, x) \]
\[ K_2 \leftarrow h f(t + h/2, x + K_1/2) \]
\[ K_3 \leftarrow h f(t + h/2, x + K_2/2) \qquad (10.5) \]
\[ K_4 \leftarrow h f(t + h, x + K_3) \]
\[ x(t+h) \leftarrow x(t) + \frac{1}{6} (K_1 + 2K_2 + 2K_3 + K_4). \]
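The two schemes (10.4) and (10.5) translate almost line for line into code. The following is a minimal sketch for a scalar ODE; the function names `rk2_step`, `rk4_step`, and `integrate`, and the choice of test problem $x' = x$, $x(0) = 1$, are assumptions made for illustration only.

```python
import math

def rk2_step(f, t, x, h):
    """Second order Runge-Kutta step, equation (10.4)."""
    k1 = h * f(t, x)
    k2 = h * f(t + h, x + k1)
    return x + 0.5 * (k1 + k2)

def rk4_step(f, t, x, h):
    """Fourth order Runge-Kutta step, equation (10.5)."""
    k1 = h * f(t, x)
    k2 = h * f(t + h / 2, x + k1 / 2)
    k3 = h * f(t + h / 2, x + k2 / 2)
    k4 = h * f(t + h, x + k3)
    return x + (k1 + 2 * k2 + 2 * k3 + k4) / 6

def integrate(step, f, a, c, h, steps):
    """Apply `step` repeatedly, starting from x(a) = c."""
    t, x = a, c
    for _ in range(steps):
        x = step(f, t, x, h)
        t += h
    return x

# Test problem: x' = x, x(0) = 1, so the exact value at t = 1 is e.
f = lambda t, x: x
errors = {}
for steps in (10, 20, 40):
    h = 1.0 / steps
    err2 = abs(integrate(rk2_step, f, 0.0, 1.0, h, steps) - math.e)
    err4 = abs(integrate(rk4_step, f, 0.0, 1.0, h, steps) - math.e)
    errors[steps] = (err2, err4)
    print(steps, err2, err4)
```

Halving $h$ should shrink the accumulated error at $t = 1$ by a factor of roughly 4 for the order two method and roughly 16 for the order four method: the per-step error terms $O(h^3)$ and $O(h^5)$ are summed over $1/h$ steps.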

This method has error term $O(h^5)$. (See http://mathworld.wolfram.com/Runge-KuttaMethod.html) The Runge-Kutta Method can be extrapolated to even higher orders. However, the number of function evaluations grows faster than the accuracy of the method. Thus the methods of order higher than four are normally not used.

We already saw that $w_1 = 1$, $w_2 = 0$ corresponds to Euler's Method. We consider now the case $w_1 = 0$, $w_2 = 1$, which forces $\alpha = \beta = 1/2$. The method becomes
\[ x(t+h) \leftarrow x(t) + h f\left( t + \frac{h}{2}, x + \frac{h}{2} f(t,x) \right). \]
This is called the modified Euler's Method. Note this gives a different value than if Euler's Method was applied twice with step size $h/2$.

10.2.3 Examples

Example 10.6. Consider the ODE:
\[ x' = (tx)^3 - \left( \frac{x}{t} \right)^2, \qquad x(1) = 1. \]
Use $h = 0.1$ to compute $x(1.1)$ using both Taylor's Series Methods and Runge-Kutta methods of order 2.

10.3 Systems of ODEs

Recall the regular ODE problem: find some $x(t)$ such that
\[ \frac{dx(t)}{dt} = f(t, x(t)), \qquad x(a) = c, \]
where $f, a, c$ are given. Sometimes the physical systems we are considering are more complex. For example, we might be interested in the system of ODEs:
\[ \frac{dx(t)}{dt} = f(t, x(t), y(t)), \qquad \frac{dy(t)}{dt} = g(t, x(t), y(t)), \qquad x(a) = c, \quad y(a) = d. \]

Example 10.7. Consider the following system of ODEs:
\[ \frac{dx(t)}{dt} = t^2 - x, \qquad \frac{dy(t)}{dt} = y^2 + y - t, \qquad x(0) = 1, \quad y(0) = 0. \]
You should immediately notice that this is not a system at all, but rather a collection of two ODEs:
\[ \frac{dx(t)}{dt} = t^2 - x, \qquad x(0) = 1, \]

and
\[ \frac{dy(t)}{dt} = y^2 + y - t, \qquad y(0) = 0. \]
These two ODEs can be solved separately. We call such a system uncoupled. In an uncoupled system the function $f(t,x,y)$ is independent of $y$, and $g(t,x,y)$ is independent of $x$. A system which is not uncoupled, is, of course, coupled. We will not consider uncoupled systems.

10.3.1 Larger Systems

There is no need to stop at two functions. We may imagine we have to solve the following problem: find $x_1(t), x_2(t), \ldots, x_n(t)$ such that
\[ \frac{dx_1(t)}{dt} = f_1(t, x_1(t), x_2(t), \ldots, x_n(t)), \]
\[ \frac{dx_2(t)}{dt} = f_2(t, x_1(t), x_2(t), \ldots, x_n(t)), \]
\[ \vdots \]
\[ \frac{dx_n(t)}{dt} = f_n(t, x_1(t), x_2(t), \ldots, x_n(t)), \]
\[ x_1(a) = c_1, \quad x_2(a) = c_2, \quad \ldots, \quad x_n(a) = c_n. \]
The idea is to not be afraid of the notation, write everything as vectors, then do exactly the same thing as for the one dimensional case! That's right, there is nothing new but notation: Let
\[ X = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}, \quad X' = \begin{bmatrix} x_1' \\ x_2' \\ \vdots \\ x_n' \end{bmatrix}, \quad F = \begin{bmatrix} f_1 \\ f_2 \\ \vdots \\ f_n \end{bmatrix}, \quad C = \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{bmatrix}. \]
Then we want to find $X$ such that
\[ X'(t) = F(t, X(t)), \qquad X(a) = C. \qquad (10.6) \]
Compare this to equation 10.1.

10.3.2 Recasting Single ODE Methods

We will consider stepping methods for solving higher dimensional ODEs. That is, we know $X(t)$, and want to find $X(t+h)$. Most of the methods we looked at for solving one dimensional ODEs can be rewritten to solve higher dimensional problems. For example, Euler's Method, equation 10.3, can be written as
\[ X(t+h) \leftarrow X(t) + h F(t, X(t)). \]
As in 1D, this method is derived from approximating $X$ by its linearization. In fact, our general strategy of using Taylor's Theorem also carries through without change. That is, we can use the $k$th order method to step as follows:
\[ X(t+h) \leftarrow X(t) + h X'(t) + \frac{h^2}{2} X''(t) + \ldots + \frac{h^k}{k!} X^{(k)}(t). \]
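To see that nothing but notation changes, the vector form of Euler's Method can be sketched with plain lists standing in for the vectors $X$ and $F$. The coupled test system $x_1' = x_2$, $x_2' = -x_1$, $X(0) = [1, 0]$ and the name `euler_system_step` are hypothetical choices for this sketch; they are not taken from the text.

```python
import math

def euler_system_step(F, t, X, h):
    """One Euler step X(t+h) <- X(t) + h F(t, X(t)); X and F(t, X) are lists."""
    dX = F(t, X)
    return [x + h * dx for x, dx in zip(X, dX)]

# Hypothetical coupled system: x1' = x2, x2' = -x1, with X(0) = [1, 0].
F = lambda t, X: [X[1], -X[0]]

t, X, h = 0.0, [1.0, 0.0], 0.01
for _ in range(100):
    X = euler_system_step(F, t, X, h)
    t += h

exact = [math.cos(1.0), -math.sin(1.0)]   # exact solution of the test system at t = 1
print(X, exact)
```

The stepping loop is identical to the scalar case; only the arithmetic inside the step is done componentwise.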

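Additionally we can write the Runge-Kutta Methods as follows:

Order two:
\[ K_1 \leftarrow h F(t, X) \]
\[ K_2 \leftarrow h F(t + h, X + K_1) \]
\[ X(t+h) \leftarrow X(t) + \frac{1}{2} (K_1 + K_2). \]

Order four:
\[ K_1 \leftarrow h F(t, X) \]
\[ K_2 \leftarrow h F\left( t + \tfrac{1}{2} h, X + \tfrac{1}{2} K_1 \right) \]
\[ K_3 \leftarrow h F\left( t + \tfrac{1}{2} h, X + \tfrac{1}{2} K_2 \right) \]
\[ K_4 \leftarrow h F(t + h, X + K_3) \]
\[ X(t+h) \leftarrow X(t) + \frac{1}{6} (K_1 + 2K_2 + 2K_3 + K_4). \]

Example Problem 10.8. Consider the system of ODEs:
\[ x_1'(t) = x_2 - x_3^2, \qquad x_2'(t) = t + x_1 + x_3, \qquad x_3'(t) = x_2 - x_1^2, \]
\[ x_1(0) = 1, \quad x_2(0) = 0, \quad x_3(0) = 1. \]
Approximate $X(0.1)$ by taking a single step of size 0.1, for Euler's Method, and the Runge-Kutta Method of order 2.

Solution: We write
\[ F(t, X(t)) = \begin{bmatrix} x_2(t) - x_3^2(t) \\ t + x_1(t) + x_3(t) \\ x_2(t) - x_1^2(t) \end{bmatrix}, \qquad X(0) = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix}. \]
Thus we have
\[ F(0, X(0)) = \begin{bmatrix} -1 \\ 2 \\ -1 \end{bmatrix}. \]
Euler's Method then makes the approximation
\[ X(0.1) \leftarrow X(0) + 0.1 F(0, X(0)) = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} + \begin{bmatrix} -0.1 \\ 0.2 \\ -0.1 \end{bmatrix} = \begin{bmatrix} 0.9 \\ 0.2 \\ 0.9 \end{bmatrix}. \]
The Runge-Kutta Method computes:
\[ K_1 \leftarrow 0.1 F(0, X(0)) = \begin{bmatrix} -0.1 \\ 0.2 \\ -0.1 \end{bmatrix} \quad \text{and} \quad K_2 \leftarrow 0.1 F(0.1, X(0) + K_1) = \begin{bmatrix} -0.061 \\ 0.19 \\ -0.061 \end{bmatrix}. \]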

Then
\[ X(0.1) \leftarrow X(0) + \frac{1}{2} (K_1 + K_2) = \begin{bmatrix} 1 \\ 0 \\ 1 \end{bmatrix} + \begin{bmatrix} -0.0805 \\ 0.195 \\ -0.0805 \end{bmatrix} = \begin{bmatrix} 0.9195 \\ 0.195 \\ 0.9195 \end{bmatrix}. \]

10.3.3 It's Only Systems

The fact we can simply solve systems of ODEs using old methods is good news. It allows us to solve higher order ODEs. For example, consider the following problem: given $f, a, c_0, c_1, \ldots, c_{n-1}$, find $x(t)$ such that
\[ x^{(n)}(t) = f\left( t, x(t), x'(t), \ldots, x^{(n-1)}(t) \right), \]
\[ x(a) = c_0, \quad x'(a) = c_1, \quad x''(a) = c_2, \quad \ldots, \quad x^{(n-1)}(a) = c_{n-1}. \]
We can put this in terms of a system by letting
\[ x_0 = x, \quad x_1 = x', \quad x_2 = x'', \quad \ldots, \quad x_{n-1} = x^{(n-1)}. \]
Then the ODE plus these $n - 1$ equations can be written as
\[ x_{n-1}'(t) = f(t, x_0(t), x_1(t), x_2(t), \ldots, x_{n-1}(t)) \]
\[ x_0'(t) = x_1(t) \]
\[ x_1'(t) = x_2(t) \]
\[ \vdots \]
\[ x_{n-2}'(t) = x_{n-1}(t) \]
\[ x_0(a) = c_0, \quad x_1(a) = c_1, \quad x_2(a) = c_2, \quad \ldots, \quad x_{n-1}(a) = c_{n-1}. \]
This is just a system of ODEs that we can solve like any other. Note that this trick also works on systems of higher order ODEs. That is, we can transform any system of higher order ODEs into a (larger) system of first order ODEs.

Example Problem 10.9. Rewrite the following system as a system of first order ODEs:
\[ x''(t) = (t / y(t)) + x'(t) - 1 \]
\[ y'(t) = \frac{1}{x'(t) + y(t)} \]
\[ x(0) = 1, \quad x'(0) = 0, \quad y(0) = -1. \]
Solution: We let $x_0 = x$, $x_1 = x'$, $x_2 = y$, then we can rewrite as
\[ x_1'(t) = (t / x_2(t)) + x_1(t) - 1 \]
\[ x_0'(t) = x_1(t) \]
\[ x_2'(t) = \frac{1}{x_1(t) + x_2(t)} \]
\[ x_0(0) = 1, \quad x_1(0) = 0, \quad x_2(0) = -1. \]
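Both example problems above can be checked numerically. The sketch below implements the order two Runge-Kutta step for systems using plain lists, reproduces the hand computation of Example Problem 10.8, and then takes one step on the first order system of Example Problem 10.9. The names `rk2_system_step`, `F108`, and `F109` are hypothetical, and the step size 0.1 for the second system is an assumption chosen only to have something to run.

```python
def rk2_system_step(F, t, X, h):
    """Order two Runge-Kutta step for a system; X and F(t, X) are lists."""
    K1 = [h * v for v in F(t, X)]
    K2 = [h * v for v in F(t + h, [x + k for x, k in zip(X, K1)])]
    return [x + 0.5 * (k1 + k2) for x, k1, k2 in zip(X, K1, K2)]

def F108(t, X):
    """Right hand side of Example Problem 10.8."""
    x1, x2, x3 = X
    return [x2 - x3 ** 2, t + x1 + x3, x2 - x1 ** 2]

# One step of size 0.1 from X(0) = [1, 0, 1].
X_108 = rk2_system_step(F108, 0.0, [1.0, 0.0, 1.0], 0.1)
print(X_108)   # compare with the hand computation [0.9195, 0.195, 0.9195]

def F109(t, X):
    """Example Problem 10.9 as a first order system, with x0 = x, x1 = x', x2 = y."""
    x0, x1, x2 = X
    return [x1, t / x2 + x1 - 1.0, 1.0 / (x1 + x2)]

# One step of size 0.1 (an assumed step size) from [x(0), x'(0), y(0)] = [1, 0, -1].
X_109 = rk2_system_step(F109, 0.0, [1.0, 0.0, -1.0], 0.1)
print(X_109)   # approximations to x(0.1), x'(0.1), y(0.1)
```

Note that the stepper never needs to know that `F109` came from a second order ODE; once the system is first order, the same code applies.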

10.3.4 It's Only Autonomous Systems

In fact, we can use a similar trick to get rid of the time-dependence of an ODE. Consider the first order system
\[ x_1' = f_1(t, x_1, x_2, \ldots, x_n), \]
\[ x_2' = f_2(t, x_1, x_2, \ldots, x_n), \]
\[ x_3' = f_3(t, x_1, x_2, \ldots, x_n), \]
\[ \vdots \]
\[ x_n' = f_n(t, x_1, x_2, \ldots, x_n), \]
\[ x_1(a) = c_1, \quad x_2(a) = c_2, \quad \ldots, \quad x_n(a) = c_n. \]
We can get rid of the time dependence and thus make the system autonomous by making $t$ another variable function to be found. We do this by letting $x_0 = t$, then adding the equations $x_0' = 1$, and $x_0(a) = a$, to get the system:
\[ x_0' = 1, \]
\[ x_1' = f_1(x_0, x_1, x_2, \ldots, x_n), \]
\[ x_2' = f_2(x_0, x_1, x_2, \ldots, x_n), \]
\[ x_3' = f_3(x_0, x_1, x_2, \ldots, x_n), \]
\[ \vdots \]
\[ x_n' = f_n(x_0, x_1, x_2, \ldots, x_n), \]
\[ x_0(a) = a, \quad x_1(a) = c_1, \quad x_2(a) = c_2, \quad \ldots, \quad x_n(a) = c_n. \]
An autonomous system can be written in the more elegant form:
\[ X' = F(X), \qquad X(a) = C. \]
Moreover, we can consider the phase curve of an autonomous system. One can consider $X$ to be the position of a particle which moves through $\mathbb{R}^n$ over time. For an autonomous system of ODEs, the movement of the particle is dependent only on its current position and not on time.¹

A phase curve is a flow line in the vector field $F$. You can think of the vector field $F$ as being the velocity, at a given point, of a river, the flow of which is time independent. Under this analogy, the phase curve is the path of a vessel placed in the river. In our deterministic world, phase curves never cross. This is because at the point of crossing, there would have to be two different values of the tangent of the curve, i.e., the vector field would have to have two different values at that point.

¹ Note that if you have converted a system of $n$ equations with a time dependence to an autonomous system, then the autonomous system should be considered a particle moving through $\mathbb{R}^{n+1}$. In this case time becomes one of the dimensions, and thus we speak of position, instead of time.

The following example illustrates the idea of phase curves for autonomous ODE.

Example 10.10. Consider the autonomous system of ODEs, without an initial value:
\[ x_1'(t) = -2(x_2 - 2.4) \]
\[ x_2'(t) = 3(x_1 - 1.2) \]
We can rewrite this ODE as
\[ X'(t) = F(X(t)). \]

This is an autonomous ODE. The associated vector field can be expressed as
\[ F(X) = \begin{bmatrix} 0 & -2 \\ 3 & 0 \end{bmatrix} X + \begin{bmatrix} 4.8 \\ -3.6 \end{bmatrix}. \]
This vector field is plotted in Figure 10.6.

Figure 10.6: The vector field $F(X)$ is shown in $\mathbb{R}^2$. The tips of the vectors are shown as circles. (Due to budget restrictions, arrowheads were not available for this figure.)

You should verify that this ODE is solved by
\[ X(t) = r \begin{bmatrix} \sqrt{2} \cos(\sqrt{6} t + t_0) \\ \sqrt{3} \sin(\sqrt{6} t + t_0) \end{bmatrix} + \begin{bmatrix} 1.2 \\ 2.4 \end{bmatrix}, \]
for any $r \ge 0$, and $t_0 \in (0, 2\pi]$. Given an initial value, the parameters $r$ and $t_0$ are uniquely determined. Thus the trajectory of $X$ over time is that of an ellipse in the plane.² Thus the family of such ellipses, taking all $r \ge 0$, form the phase curves of this ODE.

We would expect the approximation of an ODE to never cross a phase curve. This is because the phase curve represents the ...

Euler's Method and the second order Runge-Kutta Method were used to approximate the ODE with initial value
\[ X(0) = \begin{bmatrix} 1.5 \\ 1.5 \end{bmatrix}. \]

² In the case $r = 0$, the ellipse is the "trivial" ellipse which consists of a point.

5 3 3.5 -0.5 3 2.5 1 1. . ORDINARY DIFFERENTIAL EQUATIONS 4.5 2 2.5 4 3.5 3 2. but was more accurate.5 0 -0.7: The simulated phase curves of the system of Example 10.5 2 1.5 -0.5 0 -0.10 are shown for (a) Euler’s Method and (b) the Runge-Kutta Method of order 2.5 4 3. The Runge-Kutta Method used larger step size.5 1 1.5 (a) Euler’s Method 4.5 2 2. The actual solution to the ODE is an ellipse.5 0 0.152 CHAPTER 10.5 1 0.5 3 3.5 0 0. thus the “spiralling” observed for Euler’s Method is spurious.5 (b) Runge-Kutta Method Figure 10.5 2 1.5 1 0.

Euler's Method was used for step size $h = 0.005$ for 3000 steps. The Runge-Kutta Method was employed with step size $h = 0.015$ for 3000 steps. The approximations are shown in Figure 10.7.

Given that the actual solution of this initial value problem is an ellipse, we see that Euler's Method performed rather poorly, spiralling out from the ellipse. This must be the case for any step size, since Euler's Method steps in the direction tangent to the actual solution; thus every step of Euler's Method puts the approximation on an ellipse of larger radius. Smaller stepsize minimizes this effect, but at greater computational cost.

The Runge-Kutta Method performs much better, and for larger step size. At the given resolution, no spiralling is observable. Thus the Runge-Kutta Method outperforms Euler's Method, and at lesser computational cost.³

³ Because the Runge-Kutta Method of order two requires two evaluations of $F$, per step, whereas Euler's Method requires only one, it is expected that they would be 'evenly matched' when Runge-Kutta Method uses a step size twice that of Euler's Method.
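The spiralling can be measured without a plot. For the system of Example 10.10 the quantity $(x_1 - 1.2)^2 / 2 + (x_2 - 2.4)^2 / 3$ equals $r^2$ and is constant along an exact phase curve, so its growth measures how far an approximation drifts off the ellipse. The sketch below is an illustration under assumptions: the names `euler_step`, `rk2_step`, `r_squared`, and `drift` are hypothetical, and the starting point `X0` is taken to be $[1.5, 1.5]$ (the computed ratios do not depend on it, as long as it is not the center of the ellipses).

```python
def F(X):
    """Vector field of Example 10.10 (autonomous, so there is no t argument)."""
    x1, x2 = X
    return [-2.0 * (x2 - 2.4), 3.0 * (x1 - 1.2)]

def euler_step(X, h):
    """One Euler step for the autonomous system X' = F(X)."""
    return [x + h * v for x, v in zip(X, F(X))]

def rk2_step(X, h):
    """One order two Runge-Kutta step for the autonomous system X' = F(X)."""
    K1 = [h * v for v in F(X)]
    K2 = [h * v for v in F([x + k for x, k in zip(X, K1)])]
    return [x + 0.5 * (k1 + k2) for x, k1, k2 in zip(X, K1, K2)]

def r_squared(X):
    """r^2 of the ellipse through X; constant along an exact phase curve."""
    return (X[0] - 1.2) ** 2 / 2.0 + (X[1] - 2.4) ** 2 / 3.0

def drift(step, X0, h, steps):
    """Ratio of r^2 after `steps` steps to r^2 at the start (1.0 means no drift)."""
    X = list(X0)
    for _ in range(steps):
        X = step(X, h)
    return r_squared(X) / r_squared(X0)

X0 = [1.5, 1.5]                                # assumed starting point
euler_drift = drift(euler_step, X0, 0.005, 3000)
rk2_drift = drift(rk2_step, X0, 0.015, 3000)
print(euler_drift, rk2_drift)
```

For this particular linear field one can check by hand that a single Euler step multiplies $r^2$ by exactly $1 + 6h^2$, which is why the outward spiral appears for every step size, while an order two Runge-Kutta step multiplies it by $1 + 9h^4$, which is far closer to 1.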

Exercises

(10.1) Given the ODE:
\[ x'(t) = t^2 - x^2, \qquad x(1) = 1. \]
Approximate $x(1.1)$ by using one step of Euler's Method. Approximate $x(1.1)$ using one step of the second order Runge-Kutta Method.

(10.2) Given the ODE:
\[ x_1'(t) = x_1 x_2 + t, \qquad x_2'(t) = 2 x_2 - x_1^2, \qquad x_1(1) = 0, \quad x_2(1) = 3. \]
Approximate $X(1.1)$ by using one step of Euler's Method. Approximate $X(1.1)$ using one step of the second order Runge-Kutta Method.

(10.3) Consider the separable ODE $x'(t) = x(t) g(t)$, for some function $g$. Use backwards Euler (or, alternatively, the right endpoint rule for approximating integrals) to derive the approximation scheme
\[ x(t+h) \leftarrow \frac{x(t)}{1 - h g(t+h)}. \]
What could go wrong with using such a scheme?

(10.4) Consider the ODE $x'(t) = x(t) + t^2$. Using Taylor's Theorem, derive a higher order approximation scheme to find $x(t+h)$ based on $x(t)$, $t$, and $h$.

(10.5) Which of the following ODEs can you show are stable? Which are unstable? Which seem ambiguous?
(a) $x'(t) = -x - \arctan x$
(b) $x'(t) = 2x + \tan x$
(c) $x'(t) = -4x - e^x$
(d) $x'(t) = x + x^3$
(e) $x'(t) = t + \cos x$

(10.6) Rewrite the following higher order ODE as a system of first order ODEs:
\[ x''(t) = x(t) - \sin x'(t), \qquad x(0) = 0, \quad x'(0) = \pi. \]

(10.7) Rewrite the system of ODEs as a system of first order ODEs
\[ x''(t) = y(t) + t - x(t), \qquad y'(t) = x'(t) + x(t) - 4, \]
\[ x'(0) = 1, \quad x(0) = 2, \quad y(0) = 0. \]

(10.8) Rewrite the following higher order system of ODEs as a system of first order ODEs:
\[ x'''(t) = y''(t) - x'(t), \qquad y'''(t) = x''(t) + y'(t), \]
\[ x(0) = x''(0) = y'(0) = -1, \qquad x'(0) = y(0) = y''(0) = 1. \]

(10.9) Implement Euler's Method for approximating the solution to
\[ \frac{dx(t)}{dt} = f(t, x(t)), \qquad x(a) = c. \]
Your m-file should have header line like:
function xfin = euler(f,a,c,h,steps)
where xfin should be the approximate solution to the ODE at time a + steps h.
(a) Run your code with f(t, x) = -x, using a = 0 = c, and using h steps = 1.0. In this case the actual solution is xfin = $c e^{-1}$. This ODE is stable, so your approximation should be good (cf. Example 10.2). Experiment with different values of h.
(b) Run your code with f(t, x) = x, using a = 0 = c, and using h steps = 1.0. In this case the actual solution is xfin = $c e$. Your actual solution may be a rather poor approximation. Prepare a plot of error versus h for varying values of h. (cf. Figure 7.1, Example 10.1 and Figure 10.3).
(c) Run your code with f(t, x) = 1/(1 - x), using a = 0, c = 0.1, and using h steps = 2.0. Plot your approximations for varying values of steps. For example, try steps values of 35, 36, and 90, 91. Also try large values of steps, like 100, 500, 1000. Can you guess what the actual solution is supposed to look like? Can you explain the poor performance of Euler's Method for this ODE?

(10.10) Implement the Runge-Kutta Method of order two for approximating the solution to
\[ \frac{dx(t)}{dt} = f(t, x(t)), \qquad x(a) = c. \]
Your m-file should have header line like:
function xfin = rkm(f,a,c,h,steps)
where xfin should be the approximate solution to the ODE at time a + steps h. Test your code with the ODEs from the previous problem. Does the Runge-Kutta Method perform any better than Euler's Method for the last ODE?


Note that 5 ≈ 2. Your answer should be something like O h? . xn . • Assuming that f (x) has bounded derivatives. .236. What are a2 . Let a0 = 0. Fall 2003 P1 (7 pnts) Clearly state Taylor’s Theorem for f (x + h). give the accuracy of the above approximation. P6 (7 pnts) Write out the iteration step of the Secant Method for ﬁnding a zero of the function f (x). . Approximate f (0) with this approximation. using h = 3 . P8 (21 pnts) Consider the data: 0 1 2 k xk 3 1 −1 f (xk ) 5 15 9 157 .Appendix A Old Exams A. . That is. . x1 . ﬁnd the x1 and x2 for your iteration. P3 (7 pnts) Write the Lagrange form of the polynomial of lowest degree that interpolates the following values: xk −1 1 yk 11 17 P4 (14 pnts) Use the method of Bisection for ﬁnding the zero of the function f (x) = x 3 − 3x2 + 5x − 5 .1 First Midterm. P2 (7 pnts) Write out the Lagrange Polynomial 0 (x) for nodes x0 . . √ • Letting x0 = 1. Your iteration should look like xk+1 = ?? P7 (21 pnts) Consider the following approximation: f (x) ≈ f (x + h) − 5f (x) + 4f (x + h ) 2 3h • Derive this approximation using Taylor’s Theorem. x2 . . 1 • Let f (x) = x2 . Include all hypotheses. . and b0 = 2. and state the conditions on the variable that appears in the error term. Your iteration should look like xk+1 = ?? √ • Write the Newton’s Method iteration for ﬁnding 5. b2 ? 2 P5 (21 pnts) • Write out the iteration step of Newton’s Method for ﬁnding a zero of the function f (x). . write the polynomial that has value 1 at x0 and has value 0 at x1 . xn .

• Construct the divided differences table for the data.
• Construct the Newton form of the polynomial $p(x)$ of lowest degree that interpolates $f(x)$ at these nodes.
• Suppose that these data were generated by the function $f(x) = x^3 - 5x^2 + 2x + 17$. Find a numerical upper bound on the error $|p(x) - f(x)|$ over the interval $[-1, 3]$. Your answer should be a number.

A.2 Second Midterm, Fall 2003

P1 (6 pnts) Write the Trapezoid rule approximation for $\int_a^b f(x) \, dx$, using the nodes $\{x_i\}_{i=0}^{n}$.

P2 (6 pnts) Suppose Gaussian Elimination with scaled partial pivoting is applied to the matrix
\[ A = \begin{bmatrix} 0.0001 & -0.1 & -0.01 & 0.001 \\ 43 & 91 & -43 & 1 \\ -12 & 26 & -1 & 0 \\ 1 & -10 & \pi & -7 \end{bmatrix}. \]
Which row contains the first pivot? You must show your work.

P3 (14 pnts) Let $\phi(n)$ be some quadrature rule that divides an interval into $2^n$ equal subintervals. Suppose
\[ \int_a^b f(x) \, dx = \phi(n) + a_5 h_n^5 + a_{10} h_n^{10} + a_{15} h_n^{15} + \ldots, \]
where $h_n = \frac{b-a}{2^n}$. Using evaluations of $\phi(\cdot)$, devise a "Romberg-style" quadrature rule that is a $O(h_n^{10})$ approximation to the integral.

P4 (14 pnts) Compute the LU factorization of the matrix
\[ A = \begin{bmatrix} 1 & 0 & -1 \\ 0 & 2 & 0 \\ 2 & 2 & -1 \end{bmatrix} \]
using naïve Gaussian Elimination. Show all your work. Your final answer should be two matrices, $L$ and $U$. Verify your answer, i.e., verify that $A = LU$.

P5 (14 pnts) Run one iteration of either Jacobi or Gauss Seidel on the system:
\[ \begin{bmatrix} 1 & 3 & 4 \\ -2 & 2 & 1 \\ -1 & 3 & -2 \end{bmatrix} x = \begin{bmatrix} -3 \\ -3 \\ 1 \end{bmatrix} \]
Start with $x^{(0)} = [1 \; 1 \; 1]^\top$. You must indicate which of the two methods you are using. Show your work. Clearly indicate your answer $x^{(1)}$.

P6 (21 pnts) Derive the two-node Chebyshev Quadrature rule:
\[ \int_{-1}^{1} f(x) \, dx \approx c \left[ f(x_0) + f(x_1) \right]. \]
Make this rule exact for all polynomials of degree $\le 2$.

P7 (28 pnts) Consider the quadrature rule:
\[ \int_0^1 f(x) \, dx \approx z_0 f(0) + z_1 f(1) + z_2 f'(0) + z_3 f'(1). \]

• Write four linear equations involving $z_i$ to make this rule exact for polynomials of the highest possible degree.
• Solve your system for $z$ using naïve Gaussian Elimination.

A.3 Final Exam, Fall 2003

P1 (4 pnts) Clearly state Taylor's Theorem for $f(x+h)$. Include all hypotheses, and state the conditions on the variable that appears in the error term.

P2 (4 pnts) State the conditions that characterize $S(x)$ as a natural cubic spline on the interval $[a, b]$. Let the knots be $a = t_0 < t_1 < t_2 < \ldots < t_n = b$.

P3 (4 pnts) Let $x_0 = 3$, $x_1 = -5$, $x_2 = 0$. Write out the Lagrange Polynomial $\ell_0(x)$ for these nodes; that is, write the polynomial that has value 1 at $x_0$ and has value 0 at $x_1, x_2$.

P4 (4 pnts) Consider the ODE:
\[ x'(t) = f(t, x(t)), \qquad x(a) = c. \]
Write out the Euler Method to step from $x(t)$ to $x(t+h)$.

P5 (7 pnts) Compute the LU factorization of the matrix
\[ A = \begin{bmatrix} -1 & 1 & 0 \\ 1 & 1 & -1 \\ -2 & 0 & 2 \end{bmatrix} \]
using naïve Gaussian Elimination. Show all your work. Your final answer should be two matrices, $L$ and $U$. Verify your answer, i.e., verify that $A = LU$.

P6 (9 pnts) Let $\alpha$ be given. Determine a quadrature rule of the form
\[ \int_{-\alpha}^{\alpha} f(x) \, dx \approx A f(0) + B f(-\alpha) + C f(\alpha) \]
that is exact for polynomials of highest possible degree. What is the highest degree polynomial for which this rule is exact?

P7 (9 pnts) Suppose that $\phi(n)$ is some computational function that approximates a desired quantity, $L$. Furthermore suppose that
\[ \phi(n) = L + a_1 n^{-\frac{1}{2}} + a_2 n^{-\frac{2}{2}} + a_3 n^{-\frac{3}{2}} + \ldots \]
Combine two evaluations of $\phi(\cdot)$ in the manner of Richardson to get a $O(n^{-1})$ approximation to $L$.

P8 (9 pnts) Find the least-squares best approximate of the form $y = \ln(cx)$ to the data
    x_0    x_1    ...    x_n
    y_0    y_1    ...    y_n
Caution: this should not be done with basis vectors.

P9 (9 pnts) Let $F = \{ f(x) = c_0 x + c_1 \log x + c_2 \cos x \mid c_0, c_1, c_2 \in \mathbb{R} \}$. Set up (but do not attempt to solve) the normal equations to find the least-squares best $f \in F$ to approximate the data:
    x_0    x_1    ...    x_n
    y_0    y_1    ...    y_n

The normal equations should involve the vector $c = [c_0 \; c_1 \; c_2]^\top$, which uniquely determines a function in $F$.

P10 (9 pnts) Consider the following approximation:
\[ f''(x) \approx \frac{3 f(x-h) - 4 f(x) + f(x+3h)}{6 h^2} \]
• Derive this approximation using Taylor's Theorem.
• Assuming that $f(x)$ has bounded derivatives, give the accuracy of the above approximation. Your answer should be something like $O(h^?)$.

P11 (11 pnts) Consider the ODE:
\[ x'(t) = x(t) g(t), \qquad x(0) = 1. \]
• Use the Fundamental Theorem of Calculus to write a step update of the form:
\[ x(t+h) = x(t) + \int_t^{t+h} ?? \]
• Approximate the integral using the Trapezoid Rule to get an explicit step update of the form
\[ x(t+h) \leftarrow Q(t, x(t), h). \]
• Suppose that $g(t) \ge m > 0$ for all $t$. Do you see any problem with this stepping method if $h > 2/m$? Hint: the actual solution to the ODE is $x(t) = e^{\int_0^t g(r) \, dr}$.

P12 (9 pnts) Consider the ODE:
\[ x'(t) = e^{x(t)} + \sin(x(t)), \qquad x(0) = 0. \]
Write out the Taylor's series method of order three to approximate $x(t+h)$ given $x(t)$. It is permissible to write the update as a small program like:
\[ x'(t) \leftarrow Q_0(x(t)) \]
\[ x''(t) \leftarrow Q_1(x(t), x'(t)) \]
\[ \vdots \]
\[ x(t+h) \leftarrow R(x(t), x'(t), x''(t), \ldots) \]
rather than write the update as a monstrous equation of the form $x(t+h) \leftarrow M(x(t))$.

P13 (15 pnts) Consider the data:
    k         0       1      2
    x_k      -0.5     0      0.5
    f(x_k)    5      15      9
• Construct the divided differences table for the data.

• Construct the Newton form of the polynomial $p(x)$ of lowest degree that interpolates $f(x)$ at these nodes.
• Suppose that these data were generated by the function $f(x) = 40x^3 - 32x^2 - 6x + 15$. Find a numerical upper bound on the error $|p(x) - f(x)|$ over the interval $[-0.5, 0.5]$. Your answer should be a number.

P14 (15 pnts)
• Write out the iteration step of Newton's Method for finding a zero of the function $f(x)$. Your iteration should look like $x_{k+1} = ??$
• Write the Newton's Method iteration for finding $\sqrt[3]{25/4}$.
• Letting $x_0 = \frac{5}{2}$, find the $x_1$ and $x_2$ for your iteration. Note that $\sqrt[3]{25/4} \approx 1.842$.

P15 (15 pnts)
• Let $x$ be some given number, and let $f(\cdot)$ be a given, "black box" function. Using Taylor's Theorem, derive a $O(h)$ approximation to the derivative $f'(x)$. Call your approximation $\phi(h)$; the approximation should be defined in terms of $f(\cdot)$ evaluated at some values, and should probably involve $x$ and $h$.
• Let $f(x) = x^2 + 1$. Approximate $f'(0)$ using your $\phi(h)$, for $h = \frac{1}{2}$.
• For the same $f(x)$ and $h$, get a $O(h^2)$ approximation to $f'(0)$ using the method of Richardson's Extrapolation; your answer should probably include a table like this:
    D(0, 0)
    D(1, 0)    D(1, 1)

A.4 First Midterm, Fall 2004

[Definitions] Answer no more than two of the following.

P1 (12 pnts) Clearly state Taylor's Theorem for $f(x+h)$. Include all hypotheses, and state the conditions on the variable that appears in the error term.

P2 (12 pnts) Write the general form of the iterated solution to the problem $Ax = b$. Let $Q$ be your "splitting matrix," and use the factor $\omega$. Your solution should look something like:
\[ ?? x^{(k)} = ?? x^{(k-1)} + ?? \quad \text{or} \quad x^{(k)} = ?? x^{(k-1)} + ?? \]
For one of the following variants, describe what $Q$ is, in relation to $A$: Richardson's, Jacobi, Gauss-Seidel.

P3 (12 pnts) Write the iteration for Newton's Method or the Secant Method for finding a root to the equation $f(x) = 0$. Your solution should look something like:
\[ x_{k+1} = ?? + \frac{??}{??} \]

[Problems] Answer no more than four of the following.

P1 (14 pnts) Perform bisection on the function graphed below. Let the first interval be $[0, 256]$. Give the second, third, fourth, and fifth intervals.

P2 (14 pnts) Give a $O(h^4)$ approximation to $\sin(h + \pi/2)$. Find a reasonable $C$ such that the truncation error is no greater than $C h^4$.

P3 (14 pnts) Use Newton's Method to find a good approximation to $\sqrt[3]{24}$. Use $x_0 = 3$. That is, iteratively define a sequence with $x_0 = 3$, such that $x_k \to \sqrt[3]{24}$.

P4 (14 pnts) One of the roots of $x^2 - bx + c = 0$ as found by the quadratic formula is subject to subtractive cancellation when $|b| \gg |c|$. Which root is it? Rewrite the expression for that root to eliminate subtractive cancellation.

(Figure for P1: graph of the function on $[0, 256]$; horizontal axis ticks from 0 to 256 in steps of 32, vertical axis ticks from -64 to 64 in steps of 32.)

P5 (14 pnts) Find the LU decomposition of the following matrix by Gaussian Elimination with naïve pivoting.
\[ \begin{bmatrix} 3 & 2 & 4 \\ -6 & 1 & -6 \\ 12 & -12 & 10 \end{bmatrix} \]

P6 (14 pnts) Consider the linear equation:
\[ \begin{bmatrix} 5 & 6 & 1 \\ 2 & 4 & 0 \\ 3 & 1 & 2 \end{bmatrix} x = \begin{bmatrix} 1 \\ -6 \\ 6 \end{bmatrix} \]
Letting $x^{(0)} = [1 \; 1 \; 1]^\top$, find $x^{(1)}$ using either the Gauss-Seidel or Jacobi iterative schemes. Use weighting $\omega = 1$. You must indicate which scheme you are using. Show all your work.

[Questions] Answer only one of the following.

P1 (20 pnts) Suppose that the eigenvalues of $A$ are in $[\alpha, \beta]$ with $0 < \alpha < \beta$. Find the $\omega$ that minimizes $\| I - \omega A^2 \|_2$. What is the minimal value of $\| I - \omega A^2 \|_2$ for this optimal $\omega$? For full credit you need to make some argument why this $\omega$ minimizes this norm.

P2 (20 pnts) Let $f(x) = 0$ have a unique root $r$. Recall that the form of the error for Newton's Method is
\[ e_{k+1} = \frac{-f''(\xi_k)}{2 f'(x_k)} e_k^2, \]
where $e_k = r - x_k$. Suppose that $f(x)$ has the properties that $f'(x) \ge m > 0$ for all $x$, and $|f''(x)| \le M$ for all $x$. Find $\delta > 0$ such that $|e_k| < \delta$ ensures that $x_n \to r$. Justify your answer.

P3 (20 pnts) Suppose that $f(r) = 0 = f'(r) \ne f''(r)$, and $f''(x)$ is continuous. Letting $e_k = x_k - r$, show that for Newton's Method,
\[ e_{k+1} \approx \frac{1}{2} e_k, \]
when $|e_k|$ is sufficiently small. (Hint: Use Taylor's Theorem twice.) Is this faster or slower convergence than for Newton's Method with a simple root?

A.5 Second Midterm, Fall 2004

[Definitions] Answer no more than two of the following three.

P1 (10 pnts) For a set of distinct nodes $x_0, x_1, \ldots, x_{13}$, what is the Lagrange Polynomial $\ell_5(x)$? You may write your answer as a product. What degree is this polynomial?

P2 (10 pnts) Complete this statement of the polynomial interpolation error theorem: Let $p$ be the polynomial of degree at most $n$ interpolating function $f$ at the $n+1$ distinct nodes $x_0, x_1, \ldots, x_n$ on $[a, b]$. Let $f^{(n+1)}$ be continuous. Then for each $x \in [a, b]$ there is some $\xi \in [a, b]$ such that
\[ f(x) - p(x) = \; ? \;\; ? \; \prod_{i=0}^{n} (?). \]

P3 (10 pnts) State the conditions which define the function $S(x)$ as a natural cubic spline on the interval $[a, b]$.

[Problems] Answer no more than five of the following six.

P1 (16 pnts) How large must $n$ be to interpolate the function $e^x$ to within 0.001 on the interval $[0, 2]$ with the polynomial interpolant at $n$ equally spaced nodes?

P2 (16 pnts) Let $\phi(h)$ be some computable approximation to the quantity $L$ such that
\[ \phi(h) = L + a_5 h^5 + a_{10} h^{10} + a_{15} h^{15} + \ldots \]
Combine evaluations of $\phi(\cdot)$ to devise a $O(h^{10})$ approximation to $L$.

P3 (16 pnts) Derive the constants $A$ and $B$ which make the quadrature rule
\[ \int_{-1}^{1} f(x) \, dx \approx A f\left( -\frac{\sqrt{15}}{5} \right) + B f(0) + A f\left( \frac{\sqrt{15}}{5} \right) \]
exact for polynomials of degree $\le 2$. Use this quadrature rule to approximate
\[ \pi = \int_{-1}^{1} \frac{2}{1 + x^2} \, dx. \]

P4 (16 pnts) Find the polynomial of degree $\le 3$ interpolating the data
    x    -1     0     1     2
    y    -2     8     0     4
(Hint: using Lagrange Polynomials will probably take too long.)

P5 (16 pnts) Find constants $\alpha, \beta$ such that the following is a quadratic spline on $[0, 4]$.
\[ Q(x) = \begin{cases} x^2 - 3x + 4 & : 0 \le x < 1 \\ \alpha (x-1)^2 + \beta (x-1) + 2 & : 1 \le x < 3 \\ x^2 + x - 4 & : 3 \le x \le 4 \end{cases} \]

P6 (16 pnts) Consider the following approximation:
\[ f'(x) \approx \frac{f(x+h) - 5 f(x) + 4 f(x + \frac{h}{2})}{3h} \]
• Derive this approximation via Taylor's Theorem.
• Assuming that $f(x)$ has bounded derivatives, give the accuracy of the above approximation. Your answer should be something like $O(h^?)$.
• Let $f(x) = x^2$. Approximate $f'(0)$ with this approximation, using $h = \frac{1}{3}$.

A.6 Final Exam, Fall 2004

[Definitions] Answer all of the following.

P1 (10 pnts) Clearly state Taylor's Theorem for $f(x+h)$. Include all hypotheses, and state the conditions on the variable that appears in the error term.

P2 (10 pnts) Write the iteration for Newton's Method or the Secant Method for finding a root to the equation $f(x) = 0$. Your solution should look something like:
\[ x_{k+1} = ?? + \frac{??}{??} \]

P3 (10 pnts) Write the general form of the iterated solution to the problem $Ax = b$. Let $Q$ be your "splitting matrix," and use the factor $\omega$. Your solution should look something like:
\[ ?? x^{(k)} = ?? x^{(k-1)} + ?? \quad \text{or} \quad x^{(k)} = ?? x^{(k-1)} + ?? \]
For one of the following variants, describe what $Q$ is, in relation to $A$: Richardson's, Jacobi, Gauss-Seidel.

P4 (10 pnts) Define what it means for the function $f(x) \in F$ to be the least squares best approximant to the data
    x_0    x_1    ...    x_n
    y_0    y_1    ...    y_n

[Problems] Answer all of the following.

P1 (10 pnts) Rewrite this system of higher order ODEs as a system of first order ODEs:
\[ x''(t) = x'(t) \left[ x(t) + t y(t) \right] \]
\[ y''(t) = y'(t) \left[ y(t) - t x(t) \right] \]
\[ x'(0) = 1, \quad x(0) = 4, \quad y'(0) = -3, \quad y(0) = -2. \]

P2 (10 pnts) Consider the ODE:
\[ x'(t) = t (x + 1)^2, \qquad x(2) = 1/2. \]
Use Euler's Method to find an approximate value of $x(2.1)$ using a single step.

P3 (10 pnts) Let $\phi(h)$ be some computable approximation to the quantity $L$ such that
\[ \phi(h) = L + a_4 h^4 + a_8 h^8 + a_{12} h^{12} + \ldots \]
Combine evaluations of $\phi(\cdot)$ to devise a $O(h^8)$ approximation to $L$.

P4 (15 pnts) Consider the approximation
\[ f'(x) \approx \frac{9 f(x+h) - 8 f(x) - f(x - 3h)}{12h} \]
(a) Derive this approximation using Taylor's Theorem.
(b) Assuming that $f(x)$ has bounded derivatives, give the accuracy of the above approximation. Your answer should be something like $O(h^?)$.

P5 (15 pnts) Find constants $\alpha, \beta, \gamma$ such that the following is a natural cubic spline on $[1, 3]$:

$$Q(x) = \begin{cases} \alpha (x-1)^3 + \beta (x-1)^2 + 12 (x-1) + 1 & : 1 \le x < 2 \\ \gamma x^3 + 18 x^2 - 30 x + 19 & : 2 \le x < 3 \end{cases}$$

P6 (15 pnts) Devise a subroutine that, given an input number $z$, calculates an approximation to $\sqrt[3]{z}$ via Newton's Method. Write your subroutine in pseudo code, Matlab, or C/C++. Include reasonable termination conditions so your subroutine will terminate if it finds a good approximate solution. Your subroutine should use only the basic arithmetic operations of addition, subtraction, multiplication, and division; in particular your subroutine should not have access to a cubed root or logarithm operator.

P7 (15 pnts) Consider the ODE:

$$x'(t) = \cos(x(t)) + \sin(x(t)), \qquad x(0) = 1.$$

Use Taylor's theorem to devise a $O(h^3)$ approximant to $x(t + h)$ which is a function of $x(t)$, $t$, and $h$ only. You may write your answer as a program:

$$\begin{aligned} x'(t) &\leftarrow Q_0(x(t)) \\ x''(t) &\leftarrow Q_1(x(t), x'(t)) \\ &\;\;\vdots \\ \tilde{x} &\leftarrow R(x(t), x'(t), x''(t), \ldots) \end{aligned}$$

such that $x(t + h) = \tilde{x} + O(h^3)$.

[Questions] Answer all of the following.

P1 (20 pnts) Consider the "quadrature" rule:

$$\int_0^1 f(x)\,dx \approx A_0 f(0) + A_1 f(1) + A_2 f'(1) + A_3 f''(1)$$

(a) Write four linear equations involving $A_i$ to make this rule exact for polynomials of the highest possible degree.
(b) Find the constants $A_i$ using Gaussian Elimination with naïve pivoting.

P2 (20 pnts) Consider the data:

$$\begin{array}{c|ccc} k & 0 & 1 & 2 \\ \hline x_k & 0 & 0.25 & 0.5 \\ f(x_k) & 1.5 & 1 & 0.5 \end{array}$$

(a) Construct the divided differences table for the data.
(b) Construct the Newton form of the polynomial $p(x)$ of lowest degree that interpolates $f(x)$ at these nodes.
(c) What degree is your polynomial? Is this strange?
(d) Suppose that these data were generated by the function $f(x) = 1 + \frac{\cos 2\pi x}{2}$. Find an upper bound on the error $|p(x) - f(x)|$ over the interval $[0, 0.5]$. Your answer should be a number.
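For the divided differences question just above, a small Python sketch can serve as a check on a hand-built table. It assumes the data are read as nodes 0, 0.25, 0.5 with values 1.5, 1, 0.5; the function names `divided_differences` and `newton_eval` are made up for this illustration.

```python
def divided_differences(xs, ys):
    """Return the Newton-form coefficients f[x0], f[x0,x1], ..., f[x0..xn]."""
    coef = list(ys)
    n = len(xs)
    for j in range(1, n):
        # Work from the bottom up so lower-order entries are still available.
        for i in range(n - 1, j - 1, -1):
            coef[i] = (coef[i] - coef[i - 1]) / (xs[i] - xs[i - j])
    return coef

def newton_eval(coef, xs, x):
    """Evaluate the Newton-form interpolant at x by nested multiplication."""
    result = coef[-1]
    for k in range(len(coef) - 2, -1, -1):
        result = result * (x - xs[k]) + coef[k]
    return result

xs = [0.0, 0.25, 0.5]
ys = [1.5, 1.0, 0.5]
coef = divided_differences(xs, ys)
print(coef)
```

With these data both first-order differences equal -2, so the second-order difference vanishes and the interpolant reduces to p(x) = 1.5 - 2x, a polynomial of degree one even though three nodes were given.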

P3 (20 pnts) Let

$$F = \left\{ f(x) = c_0 \sqrt{x} + c_1 \ell(x) + c_2 e^x \;\middle|\; c_0, c_1, c_2 \in \mathbb{R} \right\},$$

where $\ell(x)$ is some black box function. We are concerned with finding the least-squares best $f \in F$ to approximate the data

$$\begin{array}{c|cccc} x & x_0 & x_1 & \cdots & x_n \\ \hline y & y_0 & y_1 & \cdots & y_n \end{array}$$

(a) Set up (but do not attempt to solve) the normal equations to find the vector $c = [c_0\; c_1\; c_2]^{\top}$, which determines the least squares approximant in $F$.
(b) Now suppose that the function $\ell(x)$ happens to be the Lagrange Polynomial associated with $x_7$, among the values $x_0, x_1, \ldots, x_n$. That is, $\ell(x_j) = \delta_{j7}$. Prove that $f(x_7) = y_7$, where $f$ is the least squares best approximant to the data.

P4 (10 pnts) State some substantive question which you thought might appear on this exam, but did not. Answer this question (correctly).
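One way to organize the least squares problem above is to write the three basis functions generically. The block below is a sketch under assumptions: the names $\phi_0, \phi_1, \phi_2$ are notation introduced here, not taken from the exam, and the symbol $\ell$ for the black box function is likewise assumed. It records the general shape of the normal equations and the one row that drives part (b).

```latex
% Assumed notation: \phi_0(x) = \sqrt{x}, \phi_1(x) = \ell(x), \phi_2(x) = e^x,
% so that f(x) = \sum_{j=0}^{2} c_j \phi_j(x).
\[
\sum_{j=0}^{2} \Big( \sum_{k=0}^{n} \phi_i(x_k)\,\phi_j(x_k) \Big) c_j
  = \sum_{k=0}^{n} \phi_i(x_k)\, y_k, \qquad i = 0, 1, 2.
\]
% Row i = 1, rewritten using f(x_k) = \sum_j c_j \phi_j(x_k):
\[
\sum_{k=0}^{n} \ell(x_k)\,\big( f(x_k) - y_k \big) = 0
\quad\Longrightarrow\quad
f(x_7) - y_7 = 0 \quad \text{when } \ell(x_k) = \delta_{k7}.
\]
```

The second display is only the skeleton of an argument: when $\ell$ vanishes at every node except $x_7$, the sum collapses to a single term. A complete answer would still need to justify the normal equations themselves.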

or with modiﬁcations and/or translated into another language. The ”Cover Texts” are certain short passages of text that are listed. and a Back-Cover Text may be at most 25 words. A ”Modiﬁed Version” of the Document means any work containing the Document or a portion of it. as being those of Invariant Sections. But this License is not limited to software manuals. Suite 330. if the Document is in part a textbook of mathematics. A copy that is not ”Transparent” is called ”Opaque”. in any medium. royalty-free license. a Secondary Section may not explain any mathematics.Appendix B GNU Free Documentation License Version 1. philosophical. or other functional and useful document ”free” in the sense of freedom: to assure everyone the eﬀective freedom to copy and redistribute it. ethical or political position regarding them. which means that derivative works of the document must themselves be free in the same sense. Secondarily. An image format is not Transparent if used for any substantial amount of text.) The relationship could be a matter of historical connection with the subject or with related matters. A copy made in an otherwise Transparent ﬁle format whose markup. November 2002 Copyright c 2000. that contains a notice placed by the copyright holder saying it can be distributed under the terms of this License. APPLICABILITY AND DEFINITIONS This License applies to any manual or other work. (Thus. as Front-Cover Texts or Back-Cover Texts. This License is a kind of ”copyleft”. Such a notice grants a world-wide. it can be used for any textual work. We recommend this License principally for works whose purpose is instruction or reference. but changing it is not allowed. and is addressed as ”you”. or absence of markup. has been arranged to thwart or discourage subsequent modiﬁcation by readers is not Transparent. The ”Invariant Sections” are certain Secondary Sections whose titles are designated. or of legal. 
If a section does not ﬁt the above deﬁnition of Secondary then it is not allowed to be designated as Invariant. with or without modifying it. We have designed this License in order to use it for manuals for free software. Boston. It complements the GNU General Public License. Any member of the public is a licensee.2002 Free Software Foundation. 1. in the notice that says that the Document is released under this License. represented in a format whose speciﬁcation is available to the general public. in the notice that says that the Document is released under this License. modify or distribute the work in a way requiring permission under copyright law. 167 . The Document may contain zero Invariant Sections. A ”Transparent” copy of the Document means a machine-readable copy. either copied verbatim. either commercially or noncommercially. refers to any such manual or work. A ”Secondary Section” is a named appendix or a front-matter section of the Document that deals exclusively with the relationship of the publishers or authors of the Document to the Document’s overall subject (or to related matters) and contains nothing that could fall directly within that overall subject. unlimited in duration. and that is suitable for input to text formatters or for automatic translation to a variety of formats suitable for input to text formatters. because free software needs free documentation: a free program should come with manuals providing the same freedoms that the software does. that is suitable for revising the document straightforwardly with generic text editors or (for images composed of pixels) generic paint programs or (for drawings) some widely available drawing editor. to use that work under the conditions stated herein. below.2. while not being considered responsible for modiﬁcations made by others. Preamble The purpose of this License is to make a manual. MA 02111-1307 USA Everyone is permitted to copy and distribute verbatim copies of this license document. 
commercial. Inc. 59 Temple Place.2001. regardless of subject matter or whether it is published as a printed book. The ”Document”. this License preserves for the author and publisher a way to get credit for their work. textbook. You accept the license if you copy. which is a copyleft license designed for free software. If the Document does not identify any Invariant Sections then there are none. A Front-Cover Text may be at most 5 words.

SGML or XML using a publicly available DTD. you must do these things in the Modiﬁed Version: A. the copyright notices. If the required texts for either cover are too voluminous to ﬁt legibly. SGML or XML for which the DTD and/or processing tools are not generally available. and continue the rest onto adjacent pages. In addition. COPYING IN QUANTITY If you publish printed copies (or copies in media that commonly have printed covers) of the Document. free of added material. one or more persons or entities responsible for authorship of the modiﬁcations in the Modiﬁed Version. Add an appropriate copyright notice for your modiﬁcations adjacent to the other copyright notices. as the publisher. Examples of transparent image formats include PNG. under the same conditions stated above. Both covers must also clearly and legibly identify you as the publisher of these copies. ”Dedications”. legibly. or state in or with each Opaque copy a computer-network location from which the general network-using public has access to download using public-standard network protocols a complete Transparent copy of the Document. ”Title Page” means the text near the most prominent appearance of the work’s title. 3. A section ”Entitled XYZ” means a named subunit of the Document whose title either is precisely XYZ or contains XYZ in parentheses following text that translates XYZ in another language. If you publish or distribute Opaque copies of the Document numbering more than 100. clearly and legibly. provided that this License. You may also lend copies. or ”History”. if it has fewer than ﬁve). (Here XYZ stands for a speciﬁc section name mentioned below. but not required. plus such following pages as are needed to hold. either commercially or noncommercially. For works in formats which do not have any title page as such. List on the Title Page. Copying with changes limited to the covers. and from those of previous versions (which should. as authors. 
preceding the beginning of the body of the text. LaTeX input format. that you contact the authors of the Document well before redistributing any large number of copies. You may use the same title as a previous version if the original publisher of that version gives permission. all these Cover Texts: Front-Cover Texts on the front cover. the material this License requires to appear in the title page. as long as they preserve the title of the Document and satisfy these conditions. PostScript or PDF designed for human modiﬁcation. you may accept compensation in exchange for copies. The ”Title Page” means. 4. The front cover must present the full title with all words of the title equally prominent and visible. but only as regards disclaiming warranties: any other implication that these Warranty Disclaimers may have is void and has no eﬀect on the meaning of this License. VERBATIM COPYING You may copy and distribute the Document in any medium. when you begin distribution of Opaque copies in quantity. and Back-Cover Texts on the back cover. with the Modiﬁed Version ﬁlling the role of the Document. to ensure that this Transparent copy will remain thus accessible at the stated location until at least one year after the last time you distribute an Opaque copy (directly or through your agents or retailers) of that edition to the public. and the machine-generated HTML. ”Endorsements”. and that you add no other conditions whatsoever to those of this License. unless they release you from this requirement. to give them a chance to provide you with an updated version of the Document. If you distribute a large enough number of copies you must also follow the conditions in section 3. You may add other material on the covers in addition. C. and the license notice saying this License applies to the Document are reproduced in all copies. However. for a printed book. and you may publicly display copies. can be treated as verbatim copying in other respects. 
you must take reasonably prudent steps. . You may not use technical measures to obstruct or control the reading or further copying of the copies you make or distribute. Opaque formats include proprietary formats that can be read and edited only by proprietary word processors.) To ”Preserve the Title” of such a section when you modify the Document means that it remains a section ”Entitled XYZ” according to this deﬁnition. 2. PostScript or PDF produced by some word processors for output purposes only. you must either include a machinereadable Transparent copy along with each Opaque copy. the title page itself. Use in the Title Page (and on the covers. and the Document’s license notice requires Cover Texts. thus licensing distribution and modiﬁcation of the Modiﬁed Version to whoever possesses a copy of it.168 APPENDIX B. together with at least ﬁve of the principal authors of the Document (all of its principal authors. and standard-conforming simple HTML. if any) a title distinct from that of the Document. MODIFICATIONS You may copy and distribute a Modiﬁed Version of the Document under the conditions of sections 2 and 3 above. GNU FREE DOCUMENTATION LICENSE Examples of suitable formats for Transparent copies include plain ASCII without markup. be listed in the History section of the Document). B. you should put the ﬁrst ones listed (as many as ﬁt reasonably) on the actual cover. you must enclose the copies in covers that carry. such as ”Acknowledgements”. These Warranty Disclaimers are considered to be included by reference in this License. It is requested. State on the Title page the name of the publisher of the Modiﬁed Version. E. Preserve all the copyright notices of the Document. If you use the latter option. numbering more than 100. D. XCF and JPG. if there were any. The Document may include Warranty Disclaimers next to the notice which states that this License applies to the Document. provided that you release the Modiﬁed Version under precisely this License. 
Texinfo input format.

year. and follow this License in all other respects regarding verbatim copying of that document. forming one section Entitled ”History”. provided you insert a copy of this License into the extracted document. These titles must be distinct from any other section titles. if any. You may omit a network location for a work that was published at least four years before the Document itself. M. Delete any section Entitled ”Endorsements”. and a passage of up to 25 words as a Back-Cover Text. likewise combine any sections Entitled ”Acknowledgements”. Preserve all the Invariant Sections of the Document. You may add a passage of up to ﬁve words as a Front-Cover Text. but you may replace the old one. H. make the title of each such section unique by adding at the end of it. create one stating the title. COMBINING DOCUMENTS You may combine the Document with other documents released under this License. unaltered in their text and in their titles. Section numbers or the equivalent are not considered part of the section titles. given in the Document for public access to a Transparent copy of the Document. You may extract a single document from such a collection. a license notice giving the public permission to use the Modiﬁed Version under the terms of this License. The author(s) and publisher(s) of the Document do not by this License give permission to use their names for publicity for or to assert or imply endorsement of any Modiﬁed Version. Only one passage of Front-Cover Text and one of Back-Cover Text may be added by (or through arrangements made by) any one entity. in the form shown in the Addendum below. K. COLLECTIONS OF DOCUMENTS You may make a collection consisting of the Document and other documents released under this License. Include an unaltered copy of this License. to the end of the list of Cover Texts in the Modiﬁed Version. statements of peer review or that the text has been approved by an organization as the authoritative deﬁnition of a standard. 
and multiple identical Invariant Sections may be replaced with a single copy. the name of the original author or publisher of that section if known. in parentheses. under the terms deﬁned in section 4 above for modiﬁed versions. If there are multiple Invariant Sections with the same name but diﬀerent contents. 5. Make the same adjustment to the section titles in the list of Invariant Sections in the license notice of the combined work. You must delete all sections Entitled ”Endorsements”. and add to it an item stating at least the title. then add an item describing the Modiﬁed Version as stated in the previous sentence. If the Modiﬁed Version includes new front-matter sections or appendices that qualify as Secondary Sections and contain no material copied from the Document. on explicit permission from the previous publisher that added the old one. AGGREGATION WITH INDEPENDENT WORKS . G. new authors. Preserve in that license notice the full lists of Invariant Sections and required Cover Texts given in the Document’s license notice. authors. I. Such a section may not be included in the Modiﬁed Version. You may add a section Entitled ”Endorsements”. In the combination. or if the original publisher of the version it refers to gives permission. The combined work need only contain one copy of this License. L. you may not add another. immediately after the copyright notices. previously added by you or by arrangement made by the same entity you are acting on behalf of. provided it contains nothing but endorsements of your Modiﬁed Version by various parties–for example. Preserve the network location. or else a unique number. year. and any sections Entitled ”Dedications”. you may at your option designate some or all of these sections as invariant. N. and likewise the network locations given in the Document for previous versions it was based on. Preserve the Title of the section. Preserve its Title. Include. and publisher of the Document as given on its Title Page. 
To do this. For any section Entitled ”Acknowledgements” or ”Dedications”. If the Document already includes a cover text for the same cover.169 F. unmodiﬁed. provided that you follow the rules of this License for verbatim copying of each of the documents in all other respects. and distribute it individually under this License. and preserve in the section all the substance and tone of each of the contributor acknowledgements and/or dedications given therein. Preserve the section Entitled ”History”. O. and replace the individual copies of this License in the various documents with a single copy that is included in the collection. add their titles to the list of Invariant Sections in the Modiﬁed Version’s license notice. Preserve any Warranty Disclaimers. and that you preserve all their Warranty Disclaimers. These may be placed in the ”History” section. J. you must combine any sections Entitled ”History” in the various original documents. 6. 7. and list them all as Invariant Sections of your combined work in its license notice. and publisher of the Modiﬁed Version as given on the Title Page. If there is no section Entitled ”History” in the Document. Do not retitle any existing section to be Entitled ”Endorsements” or to conﬂict in title with any Invariant Section. provided that you include in the combination all of the Invariant Sections of all of the original documents.

revised versions of the GNU Free Documentation License from time to time. the Document’s Cover Texts may be placed on covers that bracket the Document within the aggregate. If your document contains nontrivial examples of program code. but you may include translations of some or all Invariant Sections in addition to the original versions of these Invariant Sections. and with the Back-Cover Texts being LIST. Replacing Invariant Sections with translations requires special permission from their copyright holders. ADDENDUM: How to use this License for your documents To use this License in a document you have written.. 9. See http://www. However. If you have Invariant Sections. you may choose any version ever published (not as a draft) by the Free Software Foundation. Any other attempt to copy. and will automatically terminate your rights under this License. . from you under this License will not have their licenses terminated so long as such parties remain in full compliance. or ”History”. 8. you have the option of following the terms and conditions either of that speciﬁed version or of any later version that has been published (not as a draft) by the Free Software Foundation. Permission is granted to copy. or distribute the Document except as expressly provided for under this License.gnu. Each version of the License is given a distinguishing version number. If the Cover Text requirement of section 3 is applicable to these copies of the Document. TERMINATION You may not copy. Such new versions will be similar in spirit to the present version. You may include a translation of this License. so you may distribute translations of the Document under the terms of section 4. the requirement (section 4) to Preserve its Title (section 1) will typically require changing the actual title. sublicense or distribute the Document is void. 
is called an ”aggregate” if the copyright resulting from the compilation is not used to limit the legal rights of the compilation’s users beyond what the individual works permit. parties who have received copies. replace the ”with.2 or any later version published by the Free Software Foundation. no Front-Cover Texts. ”Dedications”.Texts. distribute and/or modify this document under the terms of the GNU Free Documentation License. and all the license notices in the Document. If you have Invariant Sections without Cover Texts. we recommend releasing these examples in parallel under your choice of free software license. If a section in the Document is Entitled ”Acknowledgements”. in or on a volume of a storage or distribution medium. with no Invariant Sections. or the electronic equivalent of covers if the Document is in electronic form. such as the GNU General Public License. to permit their use in free software. the original version will prevail. provided that you also include the original English version of this License and the original versions of those notices and disclaimers. and no Back-Cover Texts. Otherwise they must appear on printed covers that bracket the whole aggregate. then if the Document is less than one half of the entire aggregate. with the Front-Cover Texts being LIST. In case of a disagreement between the translation and the original version of this License or a notice or disclaimer. modify. GNU FREE DOCUMENTATION LICENSE A compilation of the Document or its derivatives with other separate and independent documents or works. modify. or rights. TRANSLATION Translation is considered a kind of modiﬁcation. When the Document is included in an aggregate. but may diﬀer in detail to address new problems or concerns. Front-Cover Texts and Back-Cover Texts. and any Warranty Disclaimers.org/copyleft/.170 APPENDIX B. If the Document speciﬁes that a particular numbered version of this License ”or any later version” applies to it. 
include a copy of the License in the document and put the following copyright and license notices just after the title page: Copyright c YEAR YOUR NAME. merge those two alternatives to suit the situation. Version 1.” line with this: with the Invariant Sections being LIST THEIR TITLES. FUTURE REVISIONS OF THIS LICENSE The Free Software Foundation may publish new. sublicense. or some other combination of the three. 10. A copy of the license is included in the section entitled ”GNU Free Documentation License”. If the Document does not specify a version number of this License.. this License does not apply to the other works in the aggregate which are not themselves derivative works of the Document.

Bibliography

[1] Guillaume Bal. Lecture notes, course on numerical analysis, 2002. URL http://www.columbia.edu/~gb2030/COURSES/E6302/NumAnal.ps.

[2] Samuel Daniel Conte and Carl de Boor. Elementary Numerical Analysis; An Algorithmic Approach. McGraw-Hill, New York, third edition, 1980. ISBN 0070124477.

[3] James F. Epperson. An Introduction to Numerical Methods and Analysis. Wiley, 2001. ISBN 0-471-31647-4.

[4] Richard Hamming. Numerical Methods for Scientists and Engineers. Dover, New York, 1987. ISBN 0486652416.

[5] Eugene Isaacson and Herbert Bishop Keller. Analysis of Numerical Methods. Dover, New York, 1966. ISBN 0486680290.

[6] David Kincaid and Ward Cheney. Numerical analysis. Brooks/Cole Publishing Co., Pacific Grove, CA, second edition, 1996. ISBN 0-534-33892-5. Mathematics of scientific computing.

[7] David Kincaid and Ward Cheney. Numerical Mathematics and Computing. Brooks/Cole Publishing Co., Pacific Grove, CA, fourth edition, 1999. ISBN 0-534-35184-0.

[8] James Stewart. Calculus, Early Transcendentals. Brooks/Cole-Thomson Learning, Belmont, CA, fifth edition, 2003. ISBN 0-534-39330-6.

