Numerical Methods Course Notes

Version 0.11
(UCSD Math 174, Fall 2004)
Steven E. Pav
1
October 13, 2005
1
Department of Mathematics, MC0112, University of California at San Diego, La Jolla, CA 92093-0112.
<spav@ucsd.edu> This document is Copyright c 2004 Steven E. Pav. Permission is granted to copy,
distribute and/or modify this document under the terms of the GNU Free Documentation License, Version
1.2 or any later version published by the Free Software Foundation; with no Invariant Sections, no Front-
Cover Texts, and no Back-Cover Texts. A copy of the license is included in the section entitled ”GNU Free
Documentation License”.
2
Preface
These notes were originally prepared during Fall quarter 2003 for UCSD Math 174, Numerical
Methods. In writing these notes, it was not my intention to add to the glut of Numerical Analysis
texts; they were designed to complement the course text, Numerical Mathematics and Computing,
Fourth edition, by Cheney and Kincaid [7]. As such, these notes follow the conventions of that
text fairly closely. If you are at all serious about pursuing study of Numerical Analysis, you should
consider acquiring that text, or any one of a number of other fine texts by e.g., Epperson, Hamming,
etc. [3, 4, 5].
3.1
3.2 8.4
3.3
1.4
3.4
1.1
4.2 7.1 10.1
4.3
5.1
5.2
7.2
8.3
8.2
9.1
9.2 9.3
10.2
10.3
Figure 1: The chapter dependency of this text, though some dependencies are weak.
Special thanks go to the students of Math 174, 2003–2004, who suffered through early versions
of these notes, which were riddled with (more) errors.
Revision History
0.0 Transcription of course notes for Math 174, Fall 2003.
0.1 As used in Math 174, Fall 2004.
0.11 Added material on functional analysis and Orthogonal Least Squares.
Todo
• More homework questions and example problems.
• Chapter on optimization.
• Chapters on basic finite difference and finite element methods?
• Section on root finding for functions of more than one variable.
i
ii
Contents
Preface i
1 Introduction 1
1.1 Taylor’s Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Loss of Significance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3 Vector Spaces, Inner Products, Norms . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3.1 Vector Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.3.2 Inner Products . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3.3 Norms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.4 Eigenvalues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.4.1 Matrix Norms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2 A “Crash” Course in octave/Matlab 13
2.1 Getting Started . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.2 Useful Commands . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.3 Programming and Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.3.1 Logical Forks and Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.4 Plotting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3 Solving Linear Systems 25
3.1 Gaussian Elimination with Na¨ıve Pivoting . . . . . . . . . . . . . . . . . . . . . . . . 25
3.1.1 Elementary Row Operations . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.1.2 Algorithm Terminology . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.1.3 Algorithm Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.2 Pivoting Strategies for Gaussian Elimination . . . . . . . . . . . . . . . . . . . . . . 29
3.2.1 Scaled Partial Pivoting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
3.2.2 An Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
3.2.3 Another Example and A Real Algorithm . . . . . . . . . . . . . . . . . . . . . 32
3.3 LU Factorization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.3.1 An Example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.3.2 Using LU Factorizations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.3.3 Some Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.3.4 Computing Inverses . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.4 Iterative Solutions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.4.1 An Operation Count for Gaussian Elimination . . . . . . . . . . . . . . . . . 37
iii
iv CONTENTS
3.4.2 Dividing by Multiplying . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
3.4.3 Impossible Iteration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.4.4 Richardson Iteration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
3.4.5 Jacobi Iteration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
3.4.6 Gauss Seidel Iteration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
3.4.7 Error Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
3.4.8 A Free Lunch? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
4 Finding Roots 49
4.1 Bisection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
4.1.1 Modifications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
4.1.2 Convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
4.2 Newton’s Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
4.2.1 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
4.2.2 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
4.2.3 Convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
4.2.4 Using Newton’s Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
4.3 Secant Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
4.3.1 Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
4.3.2 Convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
5 Interpolation 63
5.1 Polynomial Interpolation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
5.1.1 Lagranges Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
5.1.2 Newton’s Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
5.1.3 Newton’s Nested Form . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
5.1.4 Divided Differences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
5.2 Errors in Polynomial Interpolation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
5.2.1 Interpolation Error Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
5.2.2 Interpolation Error for Equally Spaced Nodes . . . . . . . . . . . . . . . . . . 73
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
6 Spline Interpolation 79
6.1 First and Second Degree Splines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
6.1.1 First Degree Spline Accuracy . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
6.1.2 Second Degree Splines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
6.1.3 Computing Second Degree Splines . . . . . . . . . . . . . . . . . . . . . . . . 82
6.2 (Natural) Cubic Splines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
6.2.1 Why Natural Cubic Splines? . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
6.2.2 Computing Cubic Splines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
6.3 B Splines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
CONTENTS v
7 Approximating Derivatives 89
7.1 Finite Differences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
7.1.1 Approximating the Second Derivative . . . . . . . . . . . . . . . . . . . . . . 91
7.2 Richardson Extrapolation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
7.2.1 Abstracting Richardson’s Method . . . . . . . . . . . . . . . . . . . . . . . . . 92
7.2.2 Using Richardson Extrapolation . . . . . . . . . . . . . . . . . . . . . . . . . 93
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
8 Integrals and Quadrature 97
8.1 The Definite Integral . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
8.1.1 Upper and Lower Sums . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
8.1.2 Approximating the Integral . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
8.1.3 Simple and Composite Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
8.2 Trapezoidal Rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
8.2.1 How Good is the Composite Trapezoidal Rule? . . . . . . . . . . . . . . . . . 101
8.2.2 Using the Error Bound . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
8.3 Romberg Algorithm . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
8.3.1 Recursive Trapezoidal Rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
8.4 Gaussian Quadrature . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 107
8.4.1 Determining Weights (Lagrange Polynomial Method) . . . . . . . . . . . . . . 107
8.4.2 Determining Weights (Method of Undetermined Coefficients) . . . . . . . . . 108
8.4.3 Gaussian Nodes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
8.4.4 Determining Gaussian Nodes . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
8.4.5 Reinventing the Wheel . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113
9 Least Squares 117
9.1 Least Squares . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
9.1.1 The Definition of Ordinary Least Squares . . . . . . . . . . . . . . . . . . . . 117
9.1.2 Linear Least Squares . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
9.1.3 Least Squares from Basis Functions . . . . . . . . . . . . . . . . . . . . . . . 119
9.2 Orthonormal Bases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
9.2.1 Alternatives to Normal Equations . . . . . . . . . . . . . . . . . . . . . . . . 122
9.2.2 Ordinary Least Squares in octave/Matlab . . . . . . . . . . . . . . . . . . . . 124
9.3 Orthogonal Least Squares . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
9.3.1 Computing the Orthogonal Least Squares Approximant . . . . . . . . . . . . 128
9.3.2 Principal Component Analysis . . . . . . . . . . . . . . . . . . . . . . . . . . 129
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132
10 Ordinary Differential Equations 135
10.1 Elementary Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
10.1.1 Integration and ‘Stepping’ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
10.1.2 Taylor’s Series Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 136
10.1.3 Euler’s Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
10.1.4 Higher Order Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
10.1.5 Errors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
10.1.6 Stability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
10.1.7 Backwards Euler’s Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
vi CONTENTS
10.2 Runge-Kutta Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
10.2.1 Taylor’s Series Redux . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
10.2.2 Deriving the Runge-Kutta Methods . . . . . . . . . . . . . . . . . . . . . . . 144
10.2.3 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
10.3 Systems of ODEs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
10.3.1 Larger Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
10.3.2 Recasting Single ODE Methods . . . . . . . . . . . . . . . . . . . . . . . . . . 147
10.3.3 It’s Only Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
10.3.4 It’s Only Autonomous Systems . . . . . . . . . . . . . . . . . . . . . . . . . . 150
Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
A Old Exams 157
A.1 First Midterm, Fall 2003 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
A.2 Second Midterm, Fall 2003 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
A.3 Final Exam, Fall 2003 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
A.4 First Midterm, Fall 2004 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
A.5 Second Midterm, Fall 2004 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
A.6 Final Exam, Fall 2004 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
B GNU Free Documentation License 167
1. APPLICABILITY AND DEFINITIONS . . . . . . . . . . . . . . . . . . . . . . . . . . 167
2. VERBATIM COPYING . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
3. COPYING IN QUANTITY . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
4. MODIFICATIONS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 168
5. COMBINING DOCUMENTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
6. COLLECTIONS OF DOCUMENTS . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169
7. AGGREGATION WITH INDEPENDENT WORKS . . . . . . . . . . . . . . . . . . . 169
8. TRANSLATION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
9. TERMINATION . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 170
10. FUTURE REVISIONS OF THIS LICENSE . . . . . . . . . . . . . . . . . . . . . . . 170
ADDENDUM: How to use this License for your documents . . . . . . . . . . . . . . . . . 170
Bibliography 171
Chapter 1
Introduction
1.1 Taylor’s Theorem
Recall from calculus the Taylor’s series for a function, f(x), expanded about some number, c, is
written as
f(x) ∼ a
0
+a
1
(x −c) +a
2
(x −c)
2
+. . . .
Here the symbol ∼ is used to denote a “formal series,” meaning that convergence is not guaranteed
in general. The constants a
i
are related to the function f and its derivatives evaluated at c. When
c = 0, this is a MacLaurin series.
For example we have the following Taylor’s series (with c = 0):
e
x
= 1 +x +
x
2
2!
+
x
3
3!
+. . .
sin (x) = x −
x
3
3!
+
x
5
5!
−. . .
cos (x) = 1 −
x
2
2!
+
x
4
4!
−. . .
(1.1)
(1.2)
(1.3)
Theorem 1.1 (Taylor’s Theorem). If f(x) has derivatives of order 0, 1, 2, . . . , n+1 on the closed
interval [a, b], then for any x and c in this interval
f(x) =
n
¸
k=0
f
(k)
(c) (x −c)
k
k!
+
f
(n+1)
(ξ) (x −c)
n+1
(n + 1)!
,
where ξ is some number between x and c, and f
k
(x) is the k
th
derivative of f at x.
We will use this theorem again and again in this class. The main usage is to approximate
a function by the first few terms of its Taylor’s series expansion; the theorem then tells us the
approximation is “as good” as the final term, also known as the error term. That is, we can make
the following manipulation:
1
2 CHAPTER 1. INTRODUCTION
f(x) =
n
¸
k=0
f
(k)
(c) (x −c)
k
k!
+
f
(n+1)
(ξ) (x −c)
n+1
(n + 1)!
f(x) −
n
¸
k=0
f
(k)
(c) (x −c)
k
k!
=
f
(n+1)
(ξ) (x −c)
n+1
(n + 1)!

f(x) −
n
¸
k=0
f
(k)
(c) (x −c)
k
k!

=

f
(n+1)
(ξ)

[x −c[
n+1
(n + 1)!
.
On the left hand side is the difference between f(x) and its approximation by Taylor’s series.
We will then use our knowledge about f
(n+1)
(ξ) on the interval [a, b] to find some constant M such
that

f(x) −
n
¸
k=0
f
(k)
(c) (x −c)
k
k!

=

f
(n+1)
(ξ)

[x −c[
n+1
(n + 1)!
≤ M[x −c[
n+1
.
Example Problem 1.2. Find an approximation for f(x) = sinx, expanded about c = 0, using
n = 3.
Solution: Solving for f
(k)
is fairly easy for this function. We find that
f(x) = sin x = sin(0) +
cos(0) x
1!
+
−sin(0) x
2
2!
+
−cos(0) x
3
3!
+
sin(ξ) x
4
4!
= x −
x
3
6
+
sin(ξ) x
4
24
,
so

sin x −

x −
x
3
6

=

sin(ξ) x
4
24


x
4
24
,
because [sin(ξ)[ ≤ 1. ¬
Example Problem 1.3. Apply Taylor’s Theorem for the case n = 1.
Solution: Taylor’s Theorem for n = 1 states: Given a function, f(x) with a continuous derivative
on [a, b], then
f(x) = f(c) +f
t
(ξ)(x −c)
for some ξ between x, c when x, c are in [a, b].
This is the Mean Value Theorem. As a one-liner, the MVT says that at some time during a trip,
your velocity is the same as your average velocity for the trip. ¬
Example Problem 1.4. Apply Taylor’s Theorem to expand f(x) = x
3
−21x
2
+17 around c = 1.
Solution: Simple calculus gives us
f
(0)
(x) = x
3
−21x
2
+ 17,
f
(1)
(x) = 3x
2
−42x,
f
(2)
(x) = 6x −42,
f
(3)
(x) = 6,
f
(k)
(x) = 0.
1.1. TAYLOR’S THEOREM 3
with the last holding for k > 3. Evaluating these at c = 1 gives
f(x) = −3 +−39(x −1) +
−36 (x −1)
2
2
+
6 (x −1)
3
6
.
Note there is no error term, since the higher order derivatives are identically zero. By carrying out
simple algebra, you will find that the above expansion is, in fact, the function f(x). ¬
There is an alternative form of Taylor’s Theorem, in this case substituting x + h for x, and x
for c in the more general version. This gives
Theorem 1.5 (Taylor’s Theorem, Alternative Form). If f(x) has derivatives of order 0, 1, . . . , n+
1 on the closed interval [a, b], then for any x in this interval and any h such that x + h is in this
interval,
f(x +h) =
n
¸
k=0
f
(k)
(x) (h)
k
k!
+
f
(n+1)
(ξ) (h)
n+1
(n + 1)!
,
where ξ is some number between x and x +h.
We generally apply this form of the theorem with h → 0. This leads to a discussion on the
matter of Orders of Convergence. The following definition will suffice for this class
Definition 1.6. We say that a function f(h) is in the class O

h
k

(pronounced “big-Oh of h
k
”)
if there is some constant C such that
[f(h)[ ≤ C [h[
k
for all h “sufficiently small,” i.e., smaller than some h

in absolute value.
For a function f ∈ O

h
k

we sometimes write f = O

h
k

. We sometimes also write O

h
k

,
meaning some function which is a member of this class.
Roughly speaking, through use of the “Big-O” function we can write an expression without
“sweating the small stuff.” This can give us an intuitive understanding of how an approximation
works, without losing too many of the details.
Example 1.7. Consider the Taylor expansion of ln x:
ln (x +h) = ln x +
(1/x) h
1
+

−1/x
2

h
2
2
+

2/ξ
3

h
3
6
Letting x = 1, we have
ln (1 +h) = h −
h
2
2
+
1

3
h
3
.
Using the fact that ξ is between 1 and 1 +h, as long as h is relatively small (say smaller than
1
2
),
the term
1

3
can be bounded by a constant, and thus
ln (1 +h) = h −
h
2
2
+O

h
3

.
Thus we say that h −
h
2
2
is a O

h
3

approximation to ln(1 +h). For example
ln(1 + 0.01) ≈ 0.009950331 ≈ 0.00995 = 0.01 −
0.01
2
2
.
4 CHAPTER 1. INTRODUCTION
1.2 Loss of Significance
Generally speaking, a computer stores a number x as a mantissa and exponent, that is x = ±r10
k
,
where r is a rational number of a given number of digits in [0.1, 1), and k is an integer in a certain
range.
The number of significant digits in r is usually determined by the user’s input. Operations
on numbers stored in this way follow a “lowest common denominator” type of rule, i.e., precision
cannot be gained but can be lost. Thus for example if you add the two quantities 0.171717 and
0.51, then the result should only have two significant digits; the precision of the first measurement
is lost in the uncertainty of the second.
This is as it should be. However, a loss of significance can be incurred if two nearly equal
quantities are subtracted from one another. Thus if I were to direct my computer to subtract
0.177241 from 0.177589, the result would be .34810
−3
, and three significant digits have been lost.
This loss is called subtractive cancellation, and can often be avoided by rewriting the expression.
This will be made clearer by the examples below.
Errors can also occur when quantities of radically different magnitudes are summed. For exam-
ple 0.1234 + 5.6789 10
−20
might be rounded to 0.1234 by a system that keeps only 16 significant
digits. This may lead to unexpected results.
The usual strategies for rewriting subtractive expressions are completing the square, factoring,
or using the Taylor expansions, as the following examples illustrate.
Example Problem 1.8. Consider the stability of

x + 1 − 1 when x is near 0. Rewrite the
expression to rid it of subtractive cancellation.
Solution: Suppose that x = 1.2345678 10
−5
. Then

x + 1 ≈ 1.000006173. If your computer
(or calculator) can only keep 8 significant digits, this will be rounded to 1.0000062. When 1 is
subtracted, the result is 6.2 10
−6
. Thus 6 significant digits have been lost from the original.
To fix this, we rationalize the expression

x + 1 −1 =

x + 1 −1

x + 1 + 1

x + 1 + 1
=
x + 1 −1

x + 1 + 1
=
x

x + 1 + 1
.
This expression has no subtractions, and so is not subject to subtractive cancelling. When x =
1.2345678 10
−5
, this expression evaluates approximately as
1.2345678 10
−5
2.0000062
≈ 6.17281995 10
−6
on a machine with 8 digits, there is no loss of precision. ¬
Note that nearly all modern computers and calculators store intermediate results of calculations
in higher precision formats. This minimizes, but does not eliminate, problems like those of the
previous example problem.
Example Problem 1.9. Write stable code to find the roots of the equation x
2
+bx +c = 0.
Solution: The usual quadratic formula is
x
±
=
−b ±

b
2
−4c
2
Supposing that b c > 0, the expression in the square root might be rounded to b
2
, giving two
roots x
+
= 0, x

= −b. The latter root is nearly correct, while the former has no correct digits.
1.3. VECTOR SPACES, INNER PRODUCTS, NORMS 5
To correct this problem, multiply the numerator and denominator of x
+
by −b −

b
2
−4c to get
x
+
=
2c
−b −

b
2
−4c
Now if b c > 0, this expression gives root x
+
= −c/b, which is nearly correct. This leads to the
pair:
x

=
−b −

b
2
−4c
2
, x
+
=
2c
−b −

b
2
−4c
Note that the two roots are nearly reciprocals, and if x

is computed, x
+
can easily be computed
with little additional work. ¬
Example Problem 1.10. Rewrite e
x
−cos x to be stable when x is near 0.
Solution: Look at the Taylor’s Series expansion for these functions:
e
x
−cos x =
¸
1 +x +
x
2
2!
+
x
3
3!
+
x
4
4!
+
x
5
5!
+. . .


¸
1 −
x
2
2!
+
x
4
4!

x
6
6!
+. . .

= x +x
2
+
x
3
3!
+O

x
5

This expression has no subtractions, and so is not subject to subtractive cancelling. Note that we
propose calculating x + x
2
+ x
3
/6 as an approximation of e
x
− cos x, which we cannot calculate
exactly anyway. Since we assume x is nearly zero, the approximate should be good. If x is very
close to zero, we may only have to take the first one or two terms. If x is not so close to zero, we
may need to take all three terms, or even more terms of the expansion; if x is far from zero we
should use some other technique. ¬
1.3 Vector Spaces, Inner Products, Norms
We explore some of the basics of functional analysis which may be useful in this text.
1.3.1 Vector Space
A vector space is a collection of objects together with a binary operator which is defined over an
algebraic field.
1
The binary operator allows transparent algebraic manipulation of vectors.
Definition 1.11. A collection of vectors, 1, with a binary addition operator, +, defined over 1,
and a scalar multiply over the real field R, forms a vector space if
1. For each u, v ∈ 1, the sum u+v is a vector in 1. (i.e., the space is “closed under addition.”)
2. Addition is commutative: u +v = v +u for each u, v ∈ 1.
3. For each u ∈ 1, and each scalar α ∈ R the scalar product αu is a vector in 1. (i.e., the space
is “closed under scalar multiplication.”)
4. There is a zero vector 0 ∈ 1 such that for any u ∈ 1, 0u = 0, where 0 is the zero of R.
5. For any u ∈ 1, 1u = u, where 1 is the multiplicative identity of R.
6. For any u, v ∈ 1, and scalars α, β ∈ R, both (α+
R
β)u = αu+βu and α(u +v) = αu+αv
hold, where +
R
is addition in R. (i.e., scalar multiplication distributes in both ways.)
1
For the purposes of this text, this algebraic field will always be the real field, R, though in the real world, the
complex field, C, has some currency.
6 CHAPTER 1. INTRODUCTION
Example 1.12. The most familiar example of a vector space is R
n
, which is the collection of
n-tuples of real numbers. That is u ∈ R
n
is of the form [u
1
, u
2
, . . . , u
n
]
¯
, where u
i
∈ R for
i = 1, 2, . . . , n. Addition and scalar multiplication over R are defined pairwise:
[u
1
, u
2
, . . . , u
n
]
¯
+ [v
1
, v
2
, . . . , v
n
]
¯
= [u
1
+v
1
, u
2
+v
2
, . . . , u
n
+v
n
]
¯
, and
α[u
1
, u
2
, . . . , u
n
]
¯
= [αu
1
, αu
2
, . . . , αu
n
]
¯
Note that some authors distinguish between points in n-dimensional space and vectors in n-
dimensional space. We will use R
n
to refer to both of them, as in this text there is no need to
distinguish them symbolically.
Example 1.13. Let X ⊆ R
k
be an closed, bounded set, and let H be the collection of all functions
from X to R. Then H forms a vector space under the “pointwise” defined addition and scalar
multiplication over R. That is, for u, v ∈ H, u + v is the function in H defined by [u +v] (x) =
u(x) +v(x) for all x ∈ X. And for u ∈ H, α ∈ R, αu is the function in [αu] (x) = α(u(x)).
Example 1.14. Let X ⊆ R
k
be a closed, bounded set, and let H
0
be the collection of all functions
from X to R that take the value zero on ∂X. Then H forms a vector space under the “pointwise”
defined addition and scalar multiplication over R. The only difference between proving H
0
is a
vector space and the proof required for the previous example is in showing that H
0
is indeed closed
under addition and scalar multiplication. This is simple because if x ∈ ∂X, then [u +v] (x) =
u(x) +v(x) = 0 +0, and thus u +v has the property that it takes value 0 on ∂X. Similarly for αu.
This would not have worked if the functions of H
0
were required to take some other value on ∂X,
like, say, 2 instead of 0.
Example 1.15. Let {
n
be the collection of all ‘formal’ polynomials of degree less than or equal to
n with coefficients from R. Then {
n
forms a vector space over R.
Example 1.16. The collection of all real-valued mn matrices forms a vector space over the reals
with the usual scalar multiplication and matrix addition. This space is denoted as R
mn
. Another
way of viewing this space is it is the space of linear functions which carry vectors of R
n
to vectors
of R
m
.
1.3.2 Inner Products
An inner product is a way of “multiplying” two vectors from a vector space together to get a scalar
from the same field the space is defined over (e.g., a real or a complex number). The inner product
should have the following properties:
Definition 1.17. For a vector space, 1, defined over R, a binary function, (, ), which takes two
vectors of 1 to R is an inner product if
1. It is symmetric: (v, u) = (u, v).
2. It is linear in both its arguments:
(αu +βv, w) = α(u, w) +β (v, w) and
(u, αv +βw) = α(u, v) +β (u, w) .
A binary function for which this holds is sometimes called a bilinear form.
3. It is positive: (v, v) ≥ 0, with equality holding if and only if v is the zero vector of 1.
1.3. VECTOR SPACES, INNER PRODUCTS, NORMS 7
Example 1.18. The most familiar example of an inner product is the L
2
(pronounced “L two”)
inner product on the vector space R
n
. If u = [u
1
, u
2
, . . . , u
n
]
¯
, and v = [v
1
, v
2
, . . . , v
n
]
¯
, then
letting
(u, v)
2
=
¸
i
u
i
v
i
gives an inner product. This inner product is the usual vector calculus dot product and is sometimes
written as u v or u
¯
v.
Example 1.19. Let H be the vector space of functions from X to R from Example 1.13. Then
for u, v ∈ H, letting
(u, v)
H
=

X
u(x)v(x) dx,
gives an inner product. This inner product is like the “limit case” of the L
2
inner product on R
n
as n goes to infinity.
1.3.3 Norms
A norm is a way of measuring the “length” of a vector:
Definition 1.20. A function || from a vector space, 1, to R
+
is called a norm if
1. It obeys the triangle inequality: |x +y| ≤ |x| +|y| .
2. It scales positively: |αx| = [α[ |x| , for scalar α.
3. It is positive: |x| ≥ 0, with equality only holding when x is the zero vector.
The easiest way of constructing a norm is on top of an inner product. If (, ) is an inner product
on vector space 1, then letting
|u| =

(u, u)
gives a norm on 1. This is how we construct our most common norms:
Example 1.21. For vector x ∈ R
n
, its L
2
norm is defined
|x|
2
=

n
¸
i=1
x
2
i
1
2
=

x
¯
x
1
2
.
This is constructed on top of the L
2
inner product.
Example 1.22. The L
p
norm on R
n
generalizes the L
2
norm, and is defined, for p > 0, as
|x|
p
=

n
¸
i=1
[x
i
[
p

1/p
.
Example 1.23. The L

norm on R
n
is defined as
|x|

= max
i
[x
i
[ .
The L

norm is somehow the “limit” of the L
p
norm as p →∞.
8 CHAPTER 1. INTRODUCTION
1.4 Eigenvalues
It is assumed the reader has some familiarity with linear algebra. We review the topic of eigenvalues.
Definition 1.24. A nonzero vector x is an eigenvector of a given matrix A, with corresponding
eigenvalue λ if
Ax = λx
Subtracting the right hand side from the left and gathering terms gives
(A −λI) x = 0.
Since x is assumed to be nonzero, the matrix A −λI must be singular. A matrix is singular if and
only if its determinant is zero. These steps are reversible, thus we claim λ is an eigenvalue if and
only if
det (A −λI) = 0.
The left hand side can be expanded to a polynomial in λ, of degree n where A is an nn matrix. This
gives the so-called characteristic equation. Sometimes eigenvectors,-values are called characteristic
vectors,-values.
Example Problem 1.25. Find the eigenvalues of
¸
1 1
4 −2

Solution: The eigenvalues are roots of
0 = det
¸
1 −λ 1
4 −2 −λ

= (1 −λ) (−2 −λ) −4 = λ
2
+λ −6.
This equation has roots λ
1
= −3, λ
2
= 2. ¬
Example Problem 1.26. Find the eigenvalues of A
2
.
Solution: Let λ be an eigenvalue of A, with corresponding eigenvector x. Then
A
2
x = A(Ax) = A(λx) = λAx = λ
2
x.
¬
The eigenvalues of a matrix tell us, roughly, how the linear transform scales a given matrix; the
eigenvectors tell us which directions are “purely scaled.” This will make more sense when we talk
about norms of vectors and matrices.
1.4.1 Matrix Norms
Given a norm || on the vector space, R
n
, we can define the matrix norm “subordinate” to it, as
follows:
Definition 1.27. Given a norm || on R
n
, we define the subordinate matrix norm on R
nn
by
|A| = max
x,=0
|Ax|
|x|
.
1.4. EIGENVALUES 9
We will use the subordinate two-norm for matrices. From the definition of the subordinate
norm as a max, we conclude that if x is a nonzero vector then
|Ax|
2
|x|
2
≤ |A|
2
thus,
|Ax|
2
≤ |A|
2
|x|
2
.
Example 1.28. Strange but true: If Λ is the set of eigenvalues of A, then
|A|
2
= max
λ∈Λ
[λ[ .
Example Problem 1.29. Prove that
|AB|
2
≤ |A|
2
|B|
2
.
Solution:
|AB|
2
=
df
max
x,=0
|ABx|
2
|x|
2
≤ max
x,=0
|A|
2
|Bx|
2
|x|
2
= |A|
2
|B|
2
.
¬
10 CHAPTER 1. INTRODUCTION
Exercises
(1.1) Suppose f ∈ O

h
k

. Show that f ∈ O(h
m
) for any m with 0 < m < k. (Hint: Take
h

< 1.) Note this may appear counterintuitive, unless you remember that O

h
k

is a better
approximation than O(h
m
) when m < k.
(1.2) Suppose f ∈ O

h
k

, and g ∈ O(h
m
) . Show that fg ∈ O

h
k+m

.
(1.3) Suppose f ∈ O

h
k

, and g ∈ O(h
m
) , with m < k. Show that f +g ∈ O(h
m
) .
(1.4) Prove that f(h) = −3h
5
is in O

h
5

.
(1.5) Prove that f(h) = h
2
+ 5h
17
is in O

h
2

.
(1.6) Prove that f(h) = h
3
is not in O

h
4

(Hint: Proof by contradiction.)
(1.7) Prove that sin(h) is in O(h).
(1.8) Find a O

h
3

approximation to sin h.
(1.9) Find a O

h
4

approximation to ln(1+h). Compare the approximate value to the actual when
h = 0.1. How does this approximation compare to the O

h
3

approximate from Example 1.7
for h = 0.1?
(1.10) Suppose that f ∈ O

h
k

. Can you show that f
t
∈ O

h
k−1

?
(1.11) Rewrite

x + 1 −

1 to get rid of subtractive cancellation when x ≈ 0.
(1.12) Rewrite

x + 1 −

x to get rid of subtractive cancellation when x is very large.
(1.13) Use a Taylor’s expansion to rid the expression 1 − cos x of subtractive cancellation for x
small. Use a O

x
5

approximate.
(1.14) Use a Taylor’s expansion to rid the expression 1 − cos
2
x of subtractive cancellation for x
small. Use a O

x
6

approximate.
(1.15) Calculate cos(π/2 + 0.001) to within 8 decimal places by using the Taylor’s expansion.
(1.16) Prove that if x is an eigenvector of A then αx is also an eigenvector of A, for the same
eigenvalue. Here α is a nonzero real number.
(1.17) Prove, by induction, that if λ is an eigenvalue of A then λ
k
is an eigenvalue of A
k
for integer
k > 1. The base case was done in Example Problem 1.26.
(1.18) Let B =
¸
k
i=0
α
i
A
i
, where A
0
= I. Prove that if λ is an eigenvalue of A, then
¸
k
i=0
α
i
λ
i
is
an eigenvalue of B. Thus for polynomial p(x), p(λ) is an eigenvalue of p(A).
(1.19) Suppose A is an invertible matrix with eigenvalue λ. Prove that λ
−1
is an eigenvalue for
A
−1
.
(1.20) Suppose that the eigenvalues of A are 1, 10, 100. Give the eigenvalues of B = 3A
3
−4A
2
+I.
Show that B is singular.
(1.21) Show that if |x|
2
= r, then x is on a sphere centered at the origin of radius r, in R
n
.
(1.22) If |x|
2
= 0, what does this say about vector x?
(1.23) Letting x = [3 4 12]
¯
, what is |x|
2
?
(1.24) What is the norm of
A =

1 0 0 0
0 1/2 0 0
0 0 1/3 0
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
0 0 0 1/n
¸
¸
¸
¸
¸
¸
¸
?
(1.25) Show that |A|
2
= 0 implies that A is the matrix of all zeros.
(1.26) Show that

A
−1

2
equals (1/[λ
min
[) , where λ
min
is the smallest, in absolute value, eigenvalue
of A.
(1.27) Suppose there is some µ > 0 such that, for a given A,
|Av|
2
≥ µ|v|
2
,
1.4. EIGENVALUES 11
for all vectors v.
(a) Show that µ ≤ |A|
2
. (Should be very simple.)
(b) Show that A is nonsingular. (Recall: A is singular if there is some x = 0 such that
Ax = 0.)
(c) Show that

A
−1

2
≤ (1/µ) .
(1.28) If A is singular, is it necessarily the case that |A|
2
= 0?
(1.29) If |A|
2
≥ µ > 0 does it follow that A is nonsingular?
(1.30) Towards proving the equality in Example Problem 1.28, prove that if Λ is the set of eigen-
values of A, then
|A| ≥ max
λ∈Λ
[λ[ ,
where || is any subordinate matrix norm. The inequality in the other direction holds when
the norm is ||
2
, but is difficult to prove.
12 CHAPTER 1. INTRODUCTION
Chapter 2
A “Crash” Course in octave/Matlab
2.1 Getting Started
Matlab is a software package that allows you to program the mathematics of an algorithm without
getting too bogged down in the details of data structures, pointers, and reinventing the wheel.
It also includes graphing capabilities, and there are numerous packages available for all kinds of
functions, enabling relatively high-level programming. Unfortunately, it also costs quite a bit of
money, which is why I recommend the free Matlab clone, octave, available under the GPL
1
, freely
downloadable from http://www.octave.org.
In a lame attempt to retain market share, Mathworks continues to tinker with Matlab to make
it noncompatible with octave; this has the side effect of obsoletizing old Matlab code. I will try
to focus on the intersection of the two systems, except where explicitly noted otherwise. What
follows, then, is an introduction to octave; Matlab users will have to make some changes.
You can find a number of octave/Matlab tutorials for free on the web; many of them are
certainly better than this one. A brief web search reveals the following excellent tutorials:
• http://www.math.mtu.edu/~msgocken/intro/intro.html
• http://www.cyclismo.org/tutorial/matlab/vector.html
• http://web.ew.usna.edu/~mecheng/DESIGN/CAD/MATLAB/usna.html
Matlab has some demo programs covering a number of topics– from the most basic functionality
to the more arcane toolboxes. In Matlab, simply type demo.
What follows is a lame demo for octave. Start up octave. You should get something like:
GNU Octave, version 2.1.44 (i686-pc-linux-gnu).
Copyright (C) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003 John W. Eaton.
This is free software; see the source code for copying conditions.
There is ABSOLUTELY NO WARRANTY; not even for MERCHANTIBILITY or
FITNESS FOR A PARTICULAR PURPOSE. For details, type ‘warranty’.
Please contribute if you find this software useful.
For more information, visit http://www.octave.org/help-wanted.html
Report bugs to <bug-octave@bevo.che.wisc.edu>.
octave:1>
1
Gnu Public License. See http://www.gnu.org/copyleft/gpl.html.
13
14 CHAPTER 2. A “CRASH” COURSE IN OCTAVE/MATLAB
You now have a command line. The basic octavian data structure is a matrix; a scalar is a 11
matrix, a vector is an n 1 matrix. Some simple matrix constructions are as follows:
octave:2> a = [1 2 3]
a =
1 2 3
octave:3> b = [5;4;3]
b =
5
4
3
octave:4> c = a’
c =
1
2
3
octave:5> d = 5*c - 2 * b
d =
-5
2
9
You should notice that octave “echoes” the lvalues it creates. This is either a feature or an
annoyance. It can be prevented by appending a semicolon at the end of a command. Thus the
previous becomes
octave:5> d = 5*c - 2 * b;
octave:6>
For illustration purposes, I am leaving the semicolon off. To access an entry or entries of a
matrix, use parentheses. In the case where the variable is a vector, you only need give a single
index, as shown below; when the variable is a matrix, you need give both indices. You can also
give a range of indices, as in what follows.
WARNING: vectors and matrices in octave/Matlab are indexed starting from 1, and not
from 0, as is more common in modern programming languages. You are warned! Moreover, the
last index of a vector is denoted by the special symbol end.
octave:6> a(1) = 77
a =
77 2 3
octave:7> a(end) = -400
a =
77 2 -400
2.1. GETTING STARTED 15
octave:8> a(2:3) = [22 333]
a =
77 22 333
octave:9> M = diag(a)
M =
77 0 0
0 22 0
0 0 333
octave:10> M(2,1) = 14
M =
77 0 0
14 22 0
0 0 333
octave:11> M(1:2,1:2) = [1 2;3 4]
M =
1 2 0
3 4 0
0 0 333
The command diag(v) returns a matrix with v as the diagonal, if v is a vector. diag(M) returns
the diagonal of matrix M as a vector.
The form c:d gives returns a row vector of the integers between a and d, as we will examine
later. First we look at matrix manipulation:
octave:12> j = M * b
j =
13
31
999
octave:13> N = rand(3,3)
N =
0.166880 0.027866 0.087402
0.706307 0.624716 0.067067
0.911833 0.769423 0.938714
octave:14> L = M + N
L =
1.166880 2.027866 0.087402
3.706307 4.624716 0.067067
0.911833 0.769423 333.938714
octave:15> P = L * M
P =
16 CHAPTER 2. A “CRASH” COURSE IN OCTAVE/MATLAB
7.2505e+00 1.0445e+01 2.9105e+01
1.7580e+01 2.5911e+01 2.2333e+01
3.2201e+00 4.9014e+00 1.1120e+05
octave:16> P = L .* M
P =
1.1669e+00 4.0557e+00 0.0000e+00
1.1119e+01 1.8499e+01 0.0000e+00
0.0000e+00 0.0000e+00 1.1120e+05
octave:17> x = M \ b
x =
-6.0000000
5.5000000
0.0090090
octave:18> err = M * x - b
err =
0
0
0
Note the difference between L * M and L .* M; the former is matrix multiplication, the latter
is element by element multiplication, i.e.,
(L .∗ M)
i,j
= L
i,j
M
i,j
.
The command rand(m,n) gives an m n matrix with each element “uniformly distributed” on
[0, 1]. For a zero mean normal distribution with unit variance, use randn(m,n).
In line 17 we asked octave to solve the linear system
Mx = b,
by setting
x = M`b = M
−1
b.
Note that you can construct matrices directly as you did vectors:
octave:19> B = [1 3 4 5;2 -2 2 -2]
B =
1 3 4 5
2 -2 2 -2
You can also create row vectors as a sequence, either using the form c:d or the form c:e:d, which
give, respectively, c, c+1, . . . , d, and c, c+e, . . . , d, (or something like it if e does not divide d−c)
as follows:
octave:20> z = 1:5
z =
1 2 3 4 5
2.2. USEFUL COMMANDS 17
octave:21> z = 5:(-1):1
z =
5 4 3 2 1
octave:22> z = 5:(-2):1
z =
5 3 1
octave:23> z = 2:3:11
z =
2 5 8 11
octave:24> z = 2:3:10
z =
2 5 8
Matrices and vectors can be constructed “blockwise.” Blocks in the same row are separated by a
comma, those in the same column by a semicolon. Thus
octave:2> y=[2 7 9]
y =
2 7 9
octave:3> m = [z;y]
m =
2 5 8
2 7 9
octave:4> k = [(3:4)’, m]
k =
3 2 5 8
4 2 7 9
2.2 Useful Commands
Here’s a none too complete listing of useful commands in octave:
• help is the most useful command.
• floor(X) returns the largest integer not greater than X. If X is a vector or matrix, it computes
the floor element-wise. This behavior is common in octave: many functions which we normally
think of as applicable to scalars can be applied to matrices, with the result computed element-
wise.
• ceil(X) returns the smallest integer not less than X, computed element-wise.
• sin(X), cos(X), tan(X), atan(X), sqrt(X), returns the sine, cosine, tangent, arctangent,
square root of X, computed elementwise.
• exp(X) returns e
X
, elementwise.
• abs(X) returns [X[ , elementwise.
18 CHAPTER 2. A “CRASH” COURSE IN OCTAVE/MATLAB
• norm(X) returns the norm of X; if X is a vector, this is the L
2
norm:
|X|
2
=

¸
i
X
2
i

1/2
,
if X is a matrix, it is the matrix norm subordinate to the L
2
norm.
You can compute other norms with norm(X,p) where p is a number, to get the L
p
norm, or
with p one of Inf, -Inf, etc.
• zeros(m,n) returns an mn matrix of all zeros.
• eye(m) returns the mm identity matrix.
• [m,n] = size(A) returns the number of rows, columns of A. Similarly the functions rows(A)
and columns(A) return the number of rows and columns, respectively.
• length(v) returns the length of vector v, or the larger dimension if v is a matrix.
• find(M) returns the indices of the nonzero elements of N. This may not seem helpful at first,
but it can be very useful for selecting subsets of data because the indices can be used for
selection. Thus, for example, in this code
octave:1> v = round(20*randn(400,3));
octave:2> selectv = v(find(v(:,2) == 7),:)
we have selected the rows of v where the element in the second column equals 7. Now you
see why leading computer scientists refer to octave/Matlab as “semantically suspect.” It is
a very useful language nonetheless, and you should try to learn its quirks rather than resist
them.
• diag(v) returns the diagonal matrix with vector v as diagonal. diag(M) returns as a vector,
the diagonal of matrix v. Thus diag(diag(v)) is v for vector v, but diag(diag(M)) is the
diagonal part of matrix M.
• toeplitz(v) returns the Toeplitz matrix associated with vector v. That is
toeplitz(v) =

v(1) v(2) v(3) v(n)
v(2) v(1) v(2) v(n −1)
v(3) v(2) v(1) v(n −2)
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
v(n) v(n −1) v(n −2) v(1)
¸
¸
¸
¸
¸
¸
¸
In the more general form, toeplitz(c,r) can be used to return a assymmetric Toeplitz
matrix.
A matrix which is banded on the cross diagonals is evidently called a “Hankel” matrix:
hankel(u, v) =

u(1) u(2) u(3) u(n)
u(2) u(3) u(4) v(2)
u(3) u(4) u(5) v(3)
.
.
.
.
.
.
.
.
.
.
.
.
u(n) v(2) v(3) v(n)
¸
¸
¸
¸
¸
¸
¸
• eig(M) returns the eigenvalues of M. [V, LAMBDA] = eig(M) returns the eigenvectors, and
eigenvalues of M.
• kron(M,N) returns the Kronecker product of the two matrices. This is a blcok construction
which returns a matrix where each block is an element of M as a scalar multiplied by the
whole matrix N.
• flipud(N) flips the vector or matrix N so that its first row is last and vice versa. Similarly
fliplr(N) flips left/right.
2.3. PROGRAMMING AND CONTROL 19
2.3 Programming and Control
If you are going to do any serious programming in octave, you should keep your commands in a
file. octave loads commands from ‘.m’ files.
2
If you have the following in a file called myfunc.m:
function [y1,y2] = myfunc(x1,x2)
% comments start with a ‘%’
% this function is useless, except as an example of functions.
% input:
% x1 a number
% x2 another number
% output:
% y1 some output
% y2 some output
y1 = cos(x1) .* sin(x2);
y2 = norm(y1);
then you can call this function from octave, as follows:
octave:1> myfunc(2,3)
ans = -0.058727
octave:2> [a,b] = myfunc(2,3)
a = -0.058727
b = 0.058727
octave:3> [a,b] = myfunc([1 2 3 4],[1 2 3 4])
a =
0.45465 -0.37840 -0.13971 0.49468
b = 0.78366
Note this silly function will throw an error if x1 and x2 are not of the same size.
It is recommended that you write your functions so that they can take scalar and vector input
where appropriate. For example, the octave builtin sine function can take a scalar and output a
scalar, or take a vector and output a vector which is, elementwise, the sine of the input. It is not
too difficult to write functions this way, it often only requires judicious use of .* multiplies instead
of * multiplies. For example, if the file myfunc.m were changed to read
y1 = cos(x1) * sin(x2);
it could easily crash if x1 and x2 were vectors of the same size because matrix multiplication is not
defined for an n 1 matrix times another n 1 matrix.
An .m file does not have to contain a function, it can merely contain some octave commands.
For example, putting the following into runner.m:
x1 = rand(4,3);
x2 = rand(size(x1));
[a,b] = myfunc(x1,x2)
2
The ‘m’ stands for ‘octave.’
20 CHAPTER 2. A “CRASH” COURSE IN OCTAVE/MATLAB
octave allows you to call this script without arguments:
octave:4> runner
a =
0.245936 0.478054 0.535323
0.246414 0.186454 0.206279
0.542728 0.419457 0.083917
0.257607 0.378558 0.768188
b = 1.3135
octave has to know where your .m file is. It will look in the directory from which it was called.
You can set this to something else with cd or chdir.
You can also use the octave builtin function feval to evaluate a function by name. For example,
the following is a different way of calling myfunc.m:
octave:5> [a,b] = feval("myfunc",2,3)
a = -0.058727
b = 0.058727
In this form feval seems like a way of using more keystrokes to get the same result. However, you
can pass a variable function name as well:
octave:6> fname = "myfunc"
fname = myfunc
octave:7> [a,b] = feval(fname,2,3)
a = -0.058727
b = 0.058727
This allows you to effectively pass functions to other functions.
2.3.1 Logical Forks and Control
octave has the regular assortment of ‘if-then-else’ and ‘for’ and ‘while’ loops. These take the
following form:
if expr1
statements
elseif expr2
statements
elsif expr3
statements
...
else
statements
end
for var=vector
statements
end
while expr
2.4. PLOTTING 21
statements
end
Note that the word end is one of the most overloaded in octave/Matlab. It stands for the last
index of a vector of matrix, as well as the exit point for for loops, if statements, switches, etc. To
simplify debugging, it is also permissible to use endif to end an if statement, endfor to end a
for loop, etc..
The test expressions may use the logical conditionals: >, <, >=, <=, ==, ~=. Do not use the
assignment operator = as a conditional, as this can lead to disastrous results, as in C.
Here are some examples:
%compute the sign of x
if x > 0
s = 1;
elseif x == 0
s = 0;
else
s = -1;
end
%a ‘regular’ for loop
for i=1:10
sm = sm + i;
end
%an ‘irregular’ for loop
for i=[1 2 3 5 8 13 21 34]
fsm = fsm + i;
end
while (sin(x) > 0)
x = x * pi;
end
2.4 Plotting
Plotting is one area in which there are some noticeable differences between octave and Matlab.
The commands and examples given herein are for octave, but the commands for Matlab are not
too different. octave ships its plotting commands to Gnuplot.
The main plot command is plot. You may also use semilogx, semilogy,loglog for 2D plots
with log axes, and contour and mesh for 3D plots. Use the help command to get the specific
syntax for each command. We present some examples:
n = 100;
X = pi .* ((1:n) ./ n);
Y = sin(X);
%just plot Y
plot(Y);
22 CHAPTER 2. A “CRASH” COURSE IN OCTAVE/MATLAB
%plot Y, but with the ‘right’ X axis labels
plot(X,Y);
W = sqrt(Y);
plot(W);
%plot W, but with the ‘right’ X axis labels
plot(Y,W);
The output from these commands is seen in Figure 2.1. In particular, you should note the
difference between plotting a vector, as in Figure 2.1(c) versus plotting the same vector but with
the appropriate abscissa values, as in Figure 2.1(d).
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 10 20 30 40 50 60 70 80 90 100
line 1
(a) Y = sin(X)
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.5 1 1.5 2 2.5 3 3.5
line 1
(b) Y versus X
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 10 20 30 40 50 60 70 80 90 100
line 1
(c) W =

Y
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
line 1
(d) W versus Y
Figure 2.1: Four plots from octave.
Some magic commands are required to plot to a file. For octave, I recommend the following
magic formula, which replots the figure to a file:
%call the plot commands before this line
gset term postscript color;
2.4. PLOTTING 23
gset output "filename.ps";
replot;
gset term x11;
gset output "/dev/null";
In Matlab, the commands are something like this:
%call the plot commands before this line
print(gcf,’-deps’,’filename.eps’);
24 CHAPTER 2. A “CRASH” COURSE IN OCTAVE/MATLAB
Exercises
(2.1) What do the following pieces of octave/Matlab code accomplish?
(a) x = (0:40) ./ 40;
(b) a = 2;
b = 5;
x = a + (b-a) .* (0:40) ./ 40;
(c) x = a + (b-a) .* (0:40) ./ 40;
y = sin(x);
plot(x,y);
(2.2) Implement the na¨ıve quadratic formula to find the roots of x
2
+bx+c = 0, for real b, c. Your
code should return
−b ±

b
2
−4c
2
.
Your m-file should have header line like:
function [x1,x2] = naivequad(b,c)
Test your code for (b, c) =

1 10
15
, 1

. Do you get a spurious root?
(2.3) Implement a robust quadratic formula (cf. Example Problem 1.9) to find the roots of x
2
+
bx +c = 0. Your m-file should have header line like:
function [x1,x2] = robustquad(b,c)
Test your code for (b, c) =

1 10
15
, 1

. Do you get a spurious root?
(2.4) Write octave/Matlab code to find a fixed point for the cosine, i.e., some x such that x =
cos(x). Do this as follows: pick some initial value x
0
, then let x
i+1
= cos(x
i
) for i =
0, 1, . . . , n. Pick n to be reasonably large, or choose some convergence criterion (i.e., ter-
minate if [x
i+1
−x
i
[ < 1 10
−10
). Does your code always converge?
(2.5) Write code to implement the factorial function for integers:
function [nfact] = factorial(n)
where n factorial is equal to 1 2 3 (n−1) n. Either use a ‘for’ loop, or write the function
to recursively call itself.
Chapter 3
Solving Linear Systems
A number of problems in numerical analysis can be reduced to, or approximated by, a system of
linear equations.
3.1 Gaussian Elimination with Na¨ıve Pivoting
Our goal is the automatic solution of systems of linear equations:
a
11
x
1
+ a
12
x
2
+ a
13
x
3
+ + a
1n
x
n
= b
1
a
21
x
1
+ a
22
x
2
+ a
23
x
3
+ + a
2n
x
n
= b
2
a
31
x
1
+ a
32
x
2
+ a
33
x
3
+ + a
3n
x
n
= b
3
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
a
n1
x
1
+ a
n2
x
2
+ a
n3
x
3
+ + a
nn
x
n
= b
n
In these equations, the a
ij
and b
i
are given real numbers. We also write this as
Ax = b,
where A is a matrix, whose element in the i
th
row and j
th
column is a
ij
, and b is a column vector,
whose i
th
entry is b
i
.
This gives the easier way of writing this equation:

a
11
a
12
a
13
a
1n
a
21
a
22
a
23
a
2n
a
31
a
32
a
33
a
3n
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
a
n1
a
n2
a
n3
a
nn
¸
¸
¸
¸
¸
¸
¸

x
1
x
2
x
3
.
.
.
x
n
¸
¸
¸
¸
¸
¸
¸
=

b
1
b
2
b
3
.
.
.
b
n
¸
¸
¸
¸
¸
¸
¸
(3.1)
3.1.1 Elementary Row Operations
You may remember that one way to solve linear equations is by applying elementary row operations
to a given equation of the system. For example, if we are trying to solve the given system of
equations, they should have the same solution as the following system:
25
26 CHAPTER 3. SOLVING LINEAR SYSTEMS

a
11
a
12
a
13
a
1n
a
21
a
22
a
23
a
2n
a
31
a
32
a
33
a
3n
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
κa
i1
κa
i2
κa
i3
κa
in
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
a
n1
a
n2
a
n3
a
nn
¸
¸
¸
¸
¸
¸
¸
¸
¸
¸
¸
¸

x
1
x
2
x
3
.
.
.
x
i
.
.
.
x
n
¸
¸
¸
¸
¸
¸
¸
¸
¸
¸
¸
¸
=

b
1
b
2
b
3
.
.
.
κb
i
.
.
.
b
n
¸
¸
¸
¸
¸
¸
¸
¸
¸
¸
¸
¸
where κ is some given number which is not zero. It suffices to solve this system of linear
equations, as it has the same solution(s) as our original system. Multiplying a row of the system
by a nonzero constant is one of the elementary row operations.
The second elementary row operation is to replace a row by the sum of that row and a constant
times another. Thus, for example, the following system of equations has the same solution as the
original system:

a
11
a
12
a
13
a
1n
a
21
a
22
a
23
a
2n
a
31
a
32
a
33
a
3n
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
a
(i−1)1
a
(i−1)2
a
(i−1)3
a
(i−1)n
a
i1
+βa
j1
a
i2
+βa
j2
a
i3
+βa
j3
a
in
+βa
jn
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
a
n1
a
n2
a
n3
a
nn
¸
¸
¸
¸
¸
¸
¸
¸
¸
¸
¸
¸
¸
¸

x
1
x
2
x
3
.
.
.
x
(i−1)
x
i
.
.
.
x
n
¸
¸
¸
¸
¸
¸
¸
¸
¸
¸
¸
¸
¸
¸
=

b
1
b
2
b
3
.
.
.
b
(i−1)
b
i
+βb
j
.
.
.
b
n
¸
¸
¸
¸
¸
¸
¸
¸
¸
¸
¸
¸
¸
¸
We have replaced the i
th
row by the i
th
row plus β times the j
th
row.
The third elementary row operation is to switch rows:

a
11
a
12
a
13
a
1n
a
31
a
32
a
33
a
3n
a
21
a
22
a
23
a
2n
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
a
n1
a
n2
a
n3
a
nn
¸
¸
¸
¸
¸
¸
¸

x
1
x
2
x
3
.
.
.
x
n
¸
¸
¸
¸
¸
¸
¸
=

b
1
b
3
b
2
.
.
.
b
n
¸
¸
¸
¸
¸
¸
¸
We have here switched the second and third rows. The purpose of this e.r.o. is mainly to make
things look nice.
Note that none of the e.r.o.’s change the structure of the solution vector x. For this reason, it is
customary to drop the solution vector entirely and to write the matrix A and the vector b together
in augmented form:

¸
¸
¸
¸
¸
¸
a
11
a
12
a
13
a
1n
b
1
a
21
a
22
a
23
a
2n
b
2
a
31
a
32
a
33
a
3n
b
3
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
a
n1
a
n2
a
n3
a
nn
b
n
¸

The idea of Gaussian Elimination is to use the elementary row operations to put a system into
upper triangular form then use back substitution. We’ll give an example here:
3.1. GAUSSIAN ELIMINATION WITH NA
¨
IVE PIVOTING 27
Example Problem 3.1. Solve the set of linear equations:
x
1
+x
2
−x
3
= 2
−3x
1
−4x
2
+ 4x
3
= −7
2x
1
+ 1x
2
+ 1x
3
= 7
Solution: We start by rewriting in the augmented form:

¸
1 1 −1 2
−3 −4 4 −7
2 1 1 7
¸

We add 3 times the first row to the second, and −2 times the first row to the third to get:

¸
1 1 −1 2
0 −1 1 −1
0 −1 3 3
¸

We now add −1 times the second row to the third row to get:

¸
1 1 −1 2
0 −1 1 −1
0 0 2 4
¸

The matrix is now in upper triangular form: there are no nonzero entries below the diagonal. This
corresponds to the set of equations:
x
1
+x
2
−x
3
= 2
−x
2
+x
3
= −1
2x
3
= 4
We now solve this by back substitution. Because the matrix is in upper triangular form, we can
solve x
3
by looking only at the last equation; namely x
3
= 2. However, once x
3
is known, the second
equation involves only one unknown, x
2
, and can be solved only by x
2
= 3. Then the first equation
has only one unknown, and is solved by x
1
= 1. ¬
All sorts of funny things can happen when you attempt Gaussian Elimination: it may turn
out that your system has no solution, or has a single solution (as above), or an infinite number
of solutions. We should expect that an algorithm for automatic solution of systems of equations
should detect these problems.
3.1.2 Algorithm Terminology
The method outlined above is fine for solving small systems. We should like to devise an algorithm
for doing the same thing which can be applied to large systems of equations. The algorithm will
take the system (in augmented form):

¸
¸
¸
¸
¸
¸
a
11
a
12
a
13
a
1n
b
1
a
21
a
22
a
23
a
2n
b
2
a
31
a
32
a
33
a
3n
b
3
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
a
n1
a
n2
a
n3
a
nn
b
n
¸

28 CHAPTER 3. SOLVING LINEAR SYSTEMS
The algorithm then selects the first row as the pivot equation or pivot row, and the first element of
the first row, a
11
is the pivot element. The algorithm then pivots on the pivot element to get the
system:

¸
¸
¸
¸
¸
¸
a
11
a
12
a
13
a
1n
b
1
0 a
t
22
a
t
23
a
t
2n
b
t
2
0 a
t
32
a
t
33
a
t
3n
b
t
3
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
0 a
t
n2
a
t
n3
a
t
nn
b
t
n
¸

Where
a
t
ij
= a
ij

a
i1
a
11

a
1j
b
t
i
= b
i

a
i1
a
11

b
1

(2 ≤ i ≤ n, 1 ≤ j ≤ n)
Effectively we are carrying out the e.r.o. of replacing the i
th
row by the i
th
row minus

a
i1
a
11

times
the first row. The quantity

a
i1
a
11

is the multiplier for the i
th
row.
Hereafter the algorithm will not alter the first row or first column of the system. Thus, the
algorithm could be written recursively. By pivoting on the second row, the algorithm then generates
the system:

¸
¸
¸
¸
¸
¸
a
11
a
12
a
13
a
1n
b
1
0 a
t
22
a
t
23
a
t
2n
b
t
2
0 0 a
tt
33
a
tt
3n
b
tt
3
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
0 0 a
tt
n3
a
tt
nn
b
tt
n
¸

In this case
a
tt
ij
= a
t
ij

a

i2
a

22

a
t
2j
b
tt
i
= b
t
i

a

i2
a

22

b
t
2

(3 ≤ i ≤ n, 1 ≤ j ≤ n)
3.1.3 Algorithm Problems
The pivoting strategy we examined in this section is called ‘na¨ıve’ because a real algorithm is a bit
more complicated. The algorithm we have outlined is far too rigid–it always chooses to pivot on
the k
th
row during the k
th
step. This would be bad if the pivot element were zero; in this case all
the multipliers
a
ik
a
kk
are not defined.
Bad things can happen if a
kk
is merely small instead of zero. Consider the following example:
Example 3.2. Solve the system of equations given by the augmented form:

−0.0590 0.2372 −0.3528
0.1080 −0.4348 0.6452

Note that the exact solution of this system is x
1
= 10, x
2
= 1. Suppose, however, that the algorithm
uses only 4 significant figures for its calculations. The algorithm, na¨ıvely, pivots on the first
equation. The multiplier for the second row is
0.1080
−0.0590
≈ −1.830508...,
which will be rounded to −1.831 by the algorithm.
3.2. PIVOTING STRATEGIES FOR GAUSSIAN ELIMINATION 29
The second entry in the matrix is replaced by
−0.4348 −(−1.831)(0.2372) = −0.4348 + 0.4343 = −0.0005,
where the arithmetic is rounded to four significant figures each time. There is some serious sub-
tractive cancellation going on here. We have lost three figures with this subtraction. The errors
get worse from here. Similarly, the second vector entry becomes:
0.6452 −(−1.831)(−0.3528) = 0.6452 −0.6460 = −0.0008,
where, again, intermediate steps are rounded to four significant figures, and again there is subtrac-
tive cancelling. This puts the system in the form

−0.0590 0.2372 −0.3528
0 −0.0005 −0.0008

When the algorithm attempts back substitution, it gets the value
x
2
=
−0.0008
−0.0005
= 1.6.
This is a bit off from the actual value of 1. The algorithm now finds
x
1
= (−0.3528 −0.2372 1.6) /−0.059 = (−0.3528 −0.3795) /−0.059 = (−0.7323) /−0.059 = 12.41,
where each step has rounding to four significant figures. This is also a bit off.
3.2 Pivoting Strategies for Gaussian Elimination
Gaussian Elimination can fail when performed in the wrong order. If the algorithm selects a zero
pivot, the multipliers are undefined, which is no good. We also saw that a pivot small in magnitude
can cause failure. As here:
x
1
+x
2
= 1
x
1
+x
2
= 2
The na¨ıve algorithm solves this as
x
2
=
2 −
1

1 −
1

= 1 −

1 −
x
1
=
1 −x
2

=
1
1 −
If is very small, then
1

is enormous compared to both 1 and 2. With poor rounding, the algorithm
solves x
2
as 1. Then it solves x
1
= 0. This is nearly correct for x
2
, but is an awful approximation
for x
1
. Note that this choice of x
1
, x
2
satisfies the first equation, but not the second.
Now suppose the algorithm changed the order of the equations, then solved:
x
1
+x
2
= 2
x
1
+x
2
= 1
30 CHAPTER 3. SOLVING LINEAR SYSTEMS
The algorithm solves this as
x
2
=
1 −2
1 −
x
1
= 2 −x
2
There’s no problem with rounding here.
The problem is not the small entry per se: Suppose we use an e.r.o. to scale the first equation,
then use na¨ıve G.E.:
x
1
+
1

x
2
=
1

x
1
+x
2
= 2
This is still solved as
x
2
=
2 −
1

1 −
1

x
1
=
1 −x
2

,
and rounding is still a problem.
3.2.1 Scaled Partial Pivoting
The na¨ıve G.E. algorithm uses the rows 1, 2, . . . , n-1 in order as pivot equations. As shown above,
this can cause errors. Better is to pivot first on row
1
, then row
2
, etc, until finally pivoting on
row
n−1
, for some permutation ¦
i
¦
n
i=1
of the integers 1, 2, . . . , n. The strategy of scaled partial
pivoting is to compute this permutation so that G.E. works well.
In light of our example, we want to pivot on an element which is not small compared to other
elements in its row. So our algorithm first determines “smallness” by calculating a scale, row-wise:
s
i
= max
1≤j≤n
[a
ij
[ .
The scales are only computed once.
Then the first pivot,
1
, is chosen to be the i such that
[a
i,1
[
s
i
is maximized. The algorithm pivots on row
1
, producing a bunch of zeros in the first column. Note
that the algorithm should not rearrange the matrix–this takes too much work.
The second pivot,
2
, is chosen to be the i such that
[a
i,2
[
s
i
is maximized, but without choosing
2
=
1
. The algorithm pivots on row
2
, producing a bunch of
zeros in the second column.
In the k
th
step
k
is chosen to be the i not among
1
,
2
, . . . ,
k−1
such that
[a
i,k
[
s
i
3.2. PIVOTING STRATEGIES FOR GAUSSIAN ELIMINATION 31
is maximized. The algorithm pivots on row
k
, producing a bunch of zeros in the k
th
column.
The slick way to implement this is to first set
i
= i for i = 1, 2, . . . , n. Then rearrange this
vector in a kind of “bubble sort”: when you find the index that should be
1
, swap them, i.e., find
the j such that
j
should be the first pivot and switch the values of
1
,
j
.
Then at the k
th
step, search only those indices in the tail of this vector: i.e., only among
j
for
k ≤ j ≤ n, and perform a swap.
3.2.2 An Example
We present an example of using scaled partial pivoting with G.E. It’s hard to come up with an
example where the numbers do not come out as ugly fractions. We’ll look at a homework question.

¸
¸
¸
2 −1 3 7 15
4 4 0 7 11
2 1 1 3 7
6 5 4 17 31
¸

The scales are as follows: s
1
= 7, s
2
= 7, s
3
= 3, s
4
= 17.
We pick
1
. It should be the index which maximizes [a
i1
[ /s
i
. These values are:
2
7
,
4
7
,
2
3
,
6
17
.
We pick
1
= 3, and pivot:

¸
¸
¸
0 −2 2 4 8
0 2 −2 1 −3
2 1 1 3 7
0 2 1 8 10
¸

We pick
2
. It should not be 3, and should be the index which maximizes [a
i2
[ /s
i
. These values
are:
2
7
,
2
7
,
2
17
.
We have a tie. In this case we pick the second row, i.e.,
2
= 2. We pivot:

¸
¸
¸
0 0 0 5 5
0 2 −2 1 −3
2 1 1 3 7
0 0 3 7 13
¸

The matrix is in permuted upper triangular form. We could proceed, but would get a zero
multiplier, and no changes would occur.
If we did proceed we would have
3
= 4. Then
4
= 1. Our row permutation is 3, 2, 4, 1. When
we do back substitution, we work in this order reversed on the rows, solving x
4
, then x
3
, x
2
, x
1
.
We get x
4
= 1, so
x
3
=
1
3
(13 −7 ∗ 1) = 2
x
2
=
1
2
(−3 −1 ∗ 1 + 2 ∗ 2) = 0
x
1
=
1
2
(7 −3 ∗ 1 −1 ∗ 2 −1 ∗ 0) = 1
32 CHAPTER 3. SOLVING LINEAR SYSTEMS
3.2.3 Another Example and A Real Algorithm
Sometimes we want to solve
Ax = b
for a number of different vectors b. It turns out we can run G.E. on the matrix A alone and come up
with all the multipliers, which can then be used multiple times on different vectors b. We illustrate
with an example:
M
0
=

¸
¸
¸
1 2 4 1
4 2 1 2
2 1 2 3
1 3 2 1
¸

, =

1
2
3
4
¸
¸
¸
¸
.
The scale vector is s =

4 4 3 3

¯
.
Our scale choices are
1
4
,
4
4
,
2
3
,
1
3
. We choose
1
= 2, and swap
1
,
2
. In the places where there
would be zeros in the real matrix, we will put the multipliers. We will illustrate them here boxed:
M
1
=

¸
¸
¸
¸
¸
¸
¸
¸
¸
¸
1
4
3
2
15
4
1
2
4 2 1 2
1
2
0
3
2
2
1
4
5
2
7
4
1
2
¸

, =

2
1
3
4
¸
¸
¸
¸
.
Our scale choices are
3
8
,
0
3
,
5
6
. We choose
2
= 4, and so swap
2
,
4
:
M
2
=

¸
¸
¸
¸
¸
¸
¸
¸
¸
¸
1
4
3
5
27
10
1
5
4 2 1 2
1
2
0
3
2
2
1
4
5
2
7
4
1
2
¸

, =

2
4
3
1
¸
¸
¸
¸
.
Our scale choices are
27
40
,
1
2
. We choose
3
= 1, and so swap
3
,
4
:
M
3
=

¸
¸
¸
¸
¸
¸
¸
¸
¸
¸
1
4
3
5
27
10
1
5
4 2 1 2
1
2
0
5
9
17
9
1
4
5
2
7
4
1
2
¸

, =

2
4
1
3
¸
¸
¸
¸
.
Now suppose we had to solve the linear system for b =

−1 8 2 1

¯
.
We scale b by the multipliers in order:
1
= 2, so, we sweep through the first column of M
3
,
picking off the boxed numbers (your computer doesn’t really have boxed variables), and scaling b
3.3. LU FACTORIZATION 33
appropriately:

−1
8
2
1
¸
¸
¸
¸

−3
8
−2
−1
¸
¸
¸
¸
This continues:

−3
8
−2
−1
¸
¸
¸
¸


12
5
8
−2
−1
¸
¸
¸
¸


12
5
8

2
3
−1
¸
¸
¸
¸
We then perform a permuted backwards substitution on the augmented system

¸
¸
¸
0 0
27
10
1
5

12
5
4 2 1 2 8
0 0 0
17
9

2
3
0
5
2
7
4
1
2
−1
¸

This proceeds as
x
4
=
−2
3
9
17
=
−6
17
x
3
=
10
27


12
5

1
5
−6
17

= . . .
x
2
=
2
5

−1 −
1
2
−6
17

7
4
x
3

= . . .
x
1
=
1
4

8 −2
−6
17
−x
3
−2x
2

= . . .
Fill in your own values here.
3.3 LU Factorization
We examined G.E. to solve the system
Ax = b,
where A is a matrix:
A =

a
11
a
12
a
13
a
1n
a
21
a
22
a
23
a
2n
a
31
a
32
a
33
a
3n
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
a
n1
a
n2
a
n3
a
nn
¸
¸
¸
¸
¸
¸
¸
.
We want to show that G.E. actually factors A into lower and upper triangular parts, that is A = LU,
where
L =

1 0 0 0

21
1 0 0

31

32
1 0
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.

n1

n2

n3
1
¸
¸
¸
¸
¸
¸
¸
, U =

u
11
u
12
u
13
u
1n
0 u
22
u
23
u
2n
0 0 u
33
u
3n
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
0 0 0 u
nn
¸
¸
¸
¸
¸
¸
¸
.
We call this a LU Factorization of A.
34 CHAPTER 3. SOLVING LINEAR SYSTEMS
3.3.1 An Example
We consider solution of the following augmented form:

¸
¸
¸
2 1 1 3 7
4 4 0 7 11
6 5 4 17 31
2 −1 0 7 15
¸

(3.2)
The na¨ıve G.E. reduces this to

¸
¸
¸
2 1 1 3 7
0 2 −2 1 −3
0 0 3 7 13
0 0 0 12 18
¸

We are going to run the na¨ıve G.E., and see how it is a LU Factorization. Since this is the na¨ıve
version, we first pivot on the first row. Our multipliers are 2, 3, 1. We pivot to get

¸
¸
¸
2 1 1 3 7
0 2 −2 1 −3
0 2 1 8 10
0 −2 −1 4 8
¸

Careful inspection shows that we’ve merely multiplied A and b by a lower triangular matrix M
1
:
M
1
=

1 0 0 0
−2 1 0 0
−3 0 1 0
−1 0 0 1
¸
¸
¸
¸
The entries in the first column are the negative e.r.o. multipliers for each row. Thus after the first
pivot, it is like we are solving the system
M
1
Ax = M
1
b.
We pivot on the second row to get:

¸
¸
¸
2 1 1 3 7
0 2 −2 1 −3
0 0 3 7 13
0 0 −3 5 5
¸

The multipliers are 1, −1. We can view this pivot as a multiplication by M
2
, with
M
2
=

1 0 0 0
0 1 0 0
0 −1 1 0
0 1 0 1
¸
¸
¸
¸
We are now solving
M
2
M
1
Ax = M
2
M
1
b.
3.3. LU FACTORIZATION 35
We pivot on the third row, with a multiplier of −1. Thus we get

¸
¸
¸
2 1 1 3 7
0 2 −2 1 −3
0 0 3 7 13
0 0 0 12 18
¸

We have multiplied by M
3
:
M
3
=

1 0 0 0
0 1 0 0
0 0 1 0
0 0 1 1
¸
¸
¸
¸
We are now solving
M
3
M
2
M
1
Ax = M
3
M
2
M
1
b.
But we have an upper triangular form, that is, if we let
U =

2 1 1 3
0 2 −2 1
0 0 3 7
0 0 0 12
¸
¸
¸
¸
Then we have
M
3
M
2
M
1
A = U,
A = (M
3
M
2
M
1
)
−1
U,
A = M
1
−1
M
2
−1
M
3
−1
U,
A = LU.
We are hoping that L is indeed lower triangular, and has ones on the diagonal. It turns out that
the inverse of each M
i
matrix has a nice form (See Exercise (3.6)). We write them here:
L =

1 0 0 0
2 1 0 0
3 0 1 0
1 0 0 1
¸
¸
¸
¸

1 0 0 0
0 1 0 0
0 1 1 0
0 −1 0 1
¸
¸
¸
¸

1 0 0 0
0 1 0 0
0 0 1 0
0 0 −1 1
¸
¸
¸
¸
=

1 0 0 0
2 1 0 0
3 1 1 0
1 −1 −1 1
¸
¸
¸
¸
This is really crazy: the matrix L looks to be composed of ones on the diagonal and multipliers
under the diagonal.
Now we check to see if we made any mistakes:
LU =

1 0 0 0
2 1 0 0
3 1 1 0
1 −1 −1 1
¸
¸
¸
¸

2 1 1 3
0 2 −2 1
0 0 3 7
0 0 0 12
¸
¸
¸
¸
=

2 1 1 3
4 4 0 7
6 5 4 17
2 −1 0 7
¸
¸
¸
¸
= A.
36 CHAPTER 3. SOLVING LINEAR SYSTEMS
3.3.2 Using LU Factorizations
We see that the G.E. algorithm can be used to actually calculate the LU factorization. We will look
at this in more detail in another example. We now examine how we can use the LU factorization
to solve the equation
Ax = b,
Since we have A = LU, we first solve
Lz = b,
then solve
Ux = z.
Since L is lower triangular, we can solve for z with a forward substitution. Similarly, since U is
upper triangular, we can solve for x with a back substitution. We drag out the previous example
(which we never got around to solving):

¸
¸
¸
2 1 1 3 7
4 4 0 7 11
6 5 4 17 31
2 −1 0 7 15
¸

We had found the LU factorization of A as
A =

1 0 0 0
2 1 0 0
3 1 1 0
1 −1 −1 1
¸
¸
¸
¸

2 1 1 3
0 2 −2 1
0 0 3 7
0 0 0 12
¸
¸
¸
¸
So we solve

1 0 0 0
2 1 0 0
3 1 1 0
1 −1 −1 1
¸
¸
¸
¸
z =

7
11
31
15
¸
¸
¸
¸
We get
z =

7
−3
13
18
¸
¸
¸
¸
Now we solve

2 1 1 3
0 2 −2 1
0 0 3 7
0 0 0 12
¸
¸
¸
¸
x =

7
−3
13
18
¸
¸
¸
¸
We get the ugly solution
z =

37
24
−17
12
5
6
3
2
¸
¸
¸
¸
3.4. ITERATIVE SOLUTIONS 37
3.3.3 Some Theory
We aren’t doing much proving here. The following theorem has an ugly proof in the Cheney &
Kincaid [7].
Theorem 3.3. If A is an nn matrix, and na¨ıve Gaussian Elimination does not encounter a zero
pivot, then the algorithm generates a LU factorization of A, where L is the lower triangular part of
the output matrix, and U is the upper triangular part.
This theorem relies on us using the fancy version of G.E., which saves the multipliers in the
spots where there should be zeros. If correctly implemented, then, L is the lower triangular part
but with ones put on the diagonal.
This theorem is proved in Cheney & Kincaid [7]. This appears to me to be a case of something
which can be better illustrated with an example or two and some informal investigation. The proof
is an unillustrating index-chase–read it at your own risk.
3.3.4 Computing Inverses
We consider finding the inverse of A. Since
AA
−1
= I,
then the j
th
column of the inverse A
−1
solves the equation
Ax = e
j
,
where e
j
is the column matrix of all zeros, but with a one in the j
th
position.
Thus we can find the inverse of A by running n linear solves. Obviously we are only going
to run G.E. once, to put the matrix in LU form, then run n solves using forward and backward
substitutions.
3.4 Iterative Solutions
Recall we are trying to solve
Ax = b.
We examine the computational cost of Gaussian Elimination to motivate the search for an alter-
native algorithm.
3.4.1 An Operation Count for Gaussian Elimination
We consider the number of floating point operations (“flops”) required to solve the system Ax = b.
Gaussian Elimnation first uses row operations to transform the problem into an equivalent problem
of the form Ux = b

, where U is upper triangular. Then back substitution is used to solve for x.
First we look at how many floating point operations are required to reduce

¸
¸
¸
¸
¸
¸
a
11
a
12
a
13
a
1n
b
1
a
21
a
22
a
23
a
2n
b
2
a
31
a
32
a
33
a
3n
b
3
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
a
n1
a
n2
a
n3
a
nn
b
n
¸

38 CHAPTER 3. SOLVING LINEAR SYSTEMS
to

¸
¸
¸
¸
¸
¸
a
11
a
12
a
13
a
1n
b
1
0 a
t
22
a
t
23
a
t
2n
b
t
2
0 a
t
32
a
t
33
a
t
3n
b
t
3
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
0 a
t
n2
a
t
n3
a
t
nn
b
t
n
¸

First a multiplier is computed for each row. Then in each row the algorithm performs n
multiplies and n adds. This gives a total of (n−1) +(n−1)n multiplies (counting in the computing
of the multiplier in each of the (n−1) rows) and (n−1)n adds. In total this is 2n
2
−n−1 floating
point operations to do a single pivot on the n by n system.
Then this has to be done recursively on the lower right subsystem, which is an (n−1) by (n−1)
system. This requires 2(n − 1)
2
− (n − 1) − 1 operations. Then this has to be done on the next
subsystem, requiring 2(n −2)
2
−(n −2) −1 operations, and so on.
In total, then, we use I
n
total floating point operations, with
I
n
= 2
n
¸
j=1
j
2

n
¸
j=1
j −
n
¸
j=1
1.
Recalling that
n
¸
j=1
j
2
=
1
6
(n)(n + 1)(2n + 1), and
n
¸
j=1
j =
1
2
(n)(n + 1),
We find that
I
n
=
1
6
(4n −1)n(n + 1) −n ≈
2
3
n
3
.
Now consider the costs of back substitution. To solve

¸
¸
¸
¸
¸
¸
a
11
a
1,n−2
a
1,n−1
a
1n
b
1
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
0 a
n−2,n−2
a
n−2,n−1
a
n−2,n
b
n−2
0 0 a
n−1,n−1
a
n−1,n
b
n−1
0 0 0 a
nn
b
n
¸

for x
n
requires only a single division. Then to solve for x
n−1
we compute
x
n−1
=
1
a
n−1,n−1
[b
n−1
−a
n−1,n
x
n
] ,
and requires 3 flops. Similarly, solving for x
n−2
requires 5 flops. Thus in total back substitution
requires B
n
total floating point operations with
B
n
=
n
¸
j=1
2j −1 = n(n −1) −n = n(n −2) ≈ n
2
3.4. ITERATIVE SOLUTIONS 39
3.4.2 Dividing by Multiplying
We saw that Gaussian Elimination requires around
2
3
n
3
operations just to find the LU factorization,
then about n
2
operations to solve the system, when A is nn. When n is large, this may take too
long to be practical. Additionally, if A is sparse (has few nonzero elements per row), we would like
the complexity of our computations to scale with the sparsity of A. Thus we look for an alternative
algorithm.
First we consider the simplest case, n = 1. Suppose we are to solve the equation
Ax = b.
for scalars A, b. We solve this by
x =
1
A
b =
1
ωA
ωb =
1
1 −(1 −ωA)
ωb =
1
1 −r
ωb,
where ω = 0 is some real number chosen to “weight” the problem appropriately, and r = 1 − ωA.
Now suppose that ω is chosen such that [r[ < 1. This can be done so long as A = 0, which would
have been a problem anyway. Now use the geometric expansion:
1
1 −r
= 1 +r +r
2
+r
3
+. . .
Because of the assumption [r[ < 1, the terms r
n
converge to zero as n → ∞. This gives the
approximate solution to our one dimensional problem as
x ≈

1 +r +r
2
+r
3
+. . . +r
k

ωb
= ωb +

r +r
2
+r
3
+. . . +r
k

ωb
= ωb +r

1 +r +r
2
+. . . +r
k−1

ωb
This suggests an iterative approach to solving Ax = b. First let x
(0)
= ωb, then let
x
(k)
= ωb +rx
(k−1)
.
The iterates x
(k)
will converge to the solution of Ax = b if [r[ < 1.
You should now convince yourself that because r
n
→0, that the choice of the initial iterate x
(0)
was immaterial, i.e., that under any choice of initial iterate convergence is guaranteed.
We now translate this scalar result into the vector case. The algorithm proceeds as follows: first
fix some initial estimate of the solution, x
(0)
. A good choice might be ωb, but this is not necessary.
Then calculate successive approximations to the actual solution by updates of the form
x
(k)
= ωb + (I −ωA) x
(k−1)
.
It turns out that we can consider a slightly more general form of the algorithm, one in which
successive iterates are defined implicitly. That is we consider iterates of the form
Qx
(k+1)
= (Q−ωA) x
(k)
+ωb, (3.3)
for some matrix Q, and some scaling factor ω. Note that this update relies on vector additions
and possibly by premultiplication of a vector by A or Q. In the case where these two matrices are
sparse, such an update can be relatively cheap.
40 CHAPTER 3. SOLVING LINEAR SYSTEMS
Now suppose that as k → ∞, x
(k)
converges to some vector x

, which is a fixed point of the
iteration. Then
Qx

= (Q −ωA) x

+ωb,
Qx

= Qx

−ωAx

+ωb,
ωAx

= ωb,
Ax

= b.
We have some freedom in choosing Q, but there are two considerations we should keep in mind:
1. Choice of Q affects convergence and speed of convergence of the method. In particular, we
want Q to be similar to A.
2. Choice of Q affects ease of computing the update. That is, given
z = (Q−A) x
(k)
+b,
we should pick Q such that the equation
Qx
(k+1)
= z
is easy to solve exactly.
These two goals conflict with each other. At one end of the spectrum is the so-called “impossible
iteration,” at the other is the Richardsons.
3.4.3 Impossible Iteration
I made up the term “impossible iteration.” But consider the method which takes Q to be A. This
seems to be the best choice for satisfying the first goal. Letting ω = 1, our method becomes
Ax
(k+1)
= (A −A) x
(k)
+b = b.
This method should clearly converge in one step. However, the second goal is totally ignored.
Indeed, we are considering iterative methods because we cannot easily solve this linear equation in
the first place.
3.4.4 Richardson Iteration
At the other end of the spectrum is the Richardson Iteration, which chooses Q to be the identity
matrix. Solving the system
Qx
(k+1)
= z
is trivial: we just have x
(k+1)
= z.
Example Problem 3.4. Use Richardson Iteration with ω = 1 on the system
A =

6 1 1
2 4 0
1 2 6
¸
¸
, b =

12
0
6
¸
¸
.
Solution: We let
Q =

1 0 0
0 1 0
0 0 1
¸
¸
, (Q−A) =

−5 −1 −1
−2 −3 0
−1 −2 −5
¸
¸
.
3.4. ITERATIVE SOLUTIONS 41
We start with an arbitrary x
(0)
, say x
(0)
= [2 2 2]
¯
. We get x
(1)
= [−2 −10 −10]
¯
, and x
(2)
=
[42 34 78]
¯
.
Note the real solution is x = [2 −1 1]
¯
. The Richardson Iteration does not appear to converge
for this example, unfortunately. ¬
Example Problem 3.5. Apply Richardson Iteration with ω = 1/6 on the previous system.
Solution: Our iteration becomes
x
(k+1)
=

0 −1/6 −1/6
−1/3 1/3 0
−1/6 −1/3 0
¸
¸
x
(k)
+

2
0
1
¸
¸
.
We start with the same x
(0)
as previously, x
(0)
= [2 2 2]
¯
. We get x
(1)
= [4/3 0 0]
¯
, x
(2)
=
[2 −4/9 7/9]
¯
, and finally x
(12)
= [2 −0.99998 0.99998]
¯
.
Thus, the choice of ω has some affect on convergence. ¬
We can rethink the Richardson Iteration as
x
(k+1)
= (I −ωA) x
(k)
+ωb = x
(k)

b −Ax
(k)

.
Thus at each step we are adding some scaled version of the residual, defined as b − Ax
(k)
, to the
iterate.
3.4.5 Jacobi Iteration
The Jacobi Iteration chooses Q to be the matrix consisting of the diagonal of A. This is more
similar to A than the identity matrix, but nearly as simple to invert.
Example Problem 3.6. Use Jacobi Iteration, with ω = 1, to solve the system
A =

6 1 1
2 4 0
1 2 6
¸
¸
, b =

12
0
6
¸
¸
.
Solution: We let
Q =

6 0 0
0 4 0
0 0 6
¸
¸
, (Q−A) =

0 −1 −1
−2 0 0
−1 −2 0
¸
¸
, Q
−1
=

1
6
0 0
0
1
4
0
0 0
1
6
¸
¸
.
We start with an arbitrary x
(0)
, say x
(0)
= [2 2 2]
¯
. We get x
(1)
=

4
3
−1 0

¯
. Then x
(2)
=

13
6

2
3
10
9

¯
. Continuing, we find that x
(5)
≈ [1.987 −1.019 0.981]
¯
.
Note the real solution is x = [2 −1 1]
¯
. ¬
There is an alternative way to describe the Jacobi Iteration for ω = 1. By considering the update
elementwise, we see that the operation can be described by
x
(k+1)
j
=
1
a
jj

¸
b
j

n
¸
i=1,i,=j
a
ji
x
(k)
i
¸

.
Thus an update takes less than 2n
2
operations. In fact, if A is sparse, with less than k nonzero
entries per row, the update should take less than 2nk operations.
42 CHAPTER 3. SOLVING LINEAR SYSTEMS
3.4.6 Gauss Seidel Iteration
The Gauss Seidel Iteration chooses Q to be lower triangular part of A, including the diagonal. In
this case solving the system
Qx
(k+1)
= z
is performed by forward substitution. Here the Q is more like A than for Jacobi Iteration, but
involves more work for inverting.
Example Problem 3.7. Use Gauss Seidel Iteration to again solve for
A =

6 1 1
2 4 0
1 2 6
¸
¸
, b =

12
0
6
¸
¸
.
Solution: We let
Q =

6 0 0
2 4 0
1 2 6
¸
¸
, (Q−A) =

0 −1 −1
0 0 0
0 0 0
¸
¸
.
We start with an arbitrary x
(0)
, say x
(0)
= [2 2 2]
¯
. We get x
(1)
=

4
3

2
3
1

¯
. Then x
(2)
=

35
18

35
36
1

¯
.
Already this is fairly close to the actual solution x = [2 −1 1]
¯
. ¬
Just as with Jacobi Iteration, there is an easier way to describe the Gauss Seidel Iteration. In
this case we will keep a single vector x and overwrite it, element by element. Thus for j = 1, 2, . . . , n,
we set
x
j

1
a
jj

¸
b
j

n
¸
i=1,i,=j
a
ji
x
i
¸

.
This looks exactly like the Jacobi update. However, in the sum on the right there are some “old”
values of x
i
and some “new” values; the new values are those x
i
for which i < j.
Again this takes less than 2n
2
operations. Or less than 2nk if A is sufficiently sparse.
An alteration of the Gauss Seidel Iteration is to make successive “sweeps” of this redefinition,
one for j = 1, 2, . . . , n, the next for j = n, n − 1, . . . , 2, 1. This amounts to running Gauss Seidel
once with Q the lower triangular part of A, then running it with Q the upper triangular part. This
iterative method is known as “red-black Gauss Seidel.”
3.4.7 Error Analysis
Suppose that x is the solution to equation 3.4. Define the error vector:
e
(k)
= x
(k)
−x.
3.4. ITERATIVE SOLUTIONS 43
Now notice that
x
(k+1)
= Q
−1
(Q−ωA) x
(k)
+Q
−1
ωb,
x
(k+1)
= Q
−1
Qx
(k)
−ωQ
−1
Ax
(k)
+ωQ
−1
Ax,
x
(k+1)
= x
(k)
−ωQ
−1
A

x
(k)
−x

,
x
(k+1)
−x = x
(k)
−x −ωQ
−1
A

x
(k)
−x

,
e
(k+1)
= e
(k)
−ωQ
−1
Ae
(k)
,
e
(k+1)
=

I −ωQ
−1
A

e
(k)
.
Reusing this relation we find that
e
(k)
=

I −ωQ
−1
A

e
(k−1)
,
=

I −ωQ
−1
A

2
e
(k−2)
,
=

I −ωQ
−1
A

k
e
(0)
.
We want to ensure that e
(k+1)
is “smaller” than e
(k)
. To do this we recall matrix and vector norms
from Subsection 1.4.1.

e
(k)

2
=

I −ωQ
−1
A

k
e
(0)

2

I −ωQ
−1
A

k
2

e
(0)

2
.
(See Example Problem 1.29.)
Thus our iteration converges (e
(k)
goes to the zero vector, i.e., x
(k)
→x) if

I −ωQ
−1
A

2
< 1.
This gives the theorem:
Theorem 3.8. An iterative solution scheme converges for any starting x
(0)
if and only if all
eigenvalues of I −ωQ
−1
A are less than 1 in absolute value, i.e., if and only if

I −ωQ
−1
A

2
< 1
Another way of saying this is “the spectral radius of I −ωQ
−1
A is less than 1.”
In fact, the speed of convergence is decided by the spectral radius of the matrix–convergence
is faster for smaller values. Recall our introduction to iterative methods in the scalar case, where
the result relied on ω being chosen such that [1 −ωA[ < 1. You should now think about how
eigenvalues generalize the absolute value of a scalar, and how this relates to the norm of matrices.
Let y be an eigenvector for Q
−1
A, with corresponding eigenvalue λ. Then

I −ωQ
−1
A

y = y −ωQ
−1
Ay = y −ωλy = (1 −ωλ) y.
This relation may allow us to pick the optimal ω for given A,Q. It can also show us that
sometimes no choice of ω will give convergence of the method. There are a number of different
related results that show when various methods will work for certain choices of ω. We leave these
to the exercises.
44 CHAPTER 3. SOLVING LINEAR SYSTEMS
Example Problem 3.9. Find conditions on ω which guarantee convergence of Richardson’s Iter-
ation for finding approximate iterative solutions to the system Ax = b, where
A =

6 1 1
2 4 0
1 2 6
¸
¸
, b =

12
0
6
¸
¸
.
Solution: By Theorem 3.8, with Q the identity matrix, we have convergence if and only if
|I −ωA|
2
< 1
We now use the fact that “eigenvalues commute with polynomials;” that is if f(x) is a polynomial
and λ is an eigenvalue of a matrix A, then f(λ) is an eigenvalue of the matrix f(A). In this case the
polynomial we consider is f(x) = x
0
−ωx
1
. Using octave or Matlab you will find that the eigenvalues
of A are approximately 7.7321, 4.2679, and 4. Thus the eigenvalues of I −ωA are approximately
1 −7.7321ω, 1 −4.2679ω, 1 −4ω.
With some work it can be shown that all three of these values will be less than one in absolute
value if and only if
0 < ω <
3
7.7321
≈ 0.388
(See also Exercise (3.10).)
Compare this to the results of Example Problem 3.4, where for this system, ω = 1 apparently did
not lead to convergence, while for Example Problem 3.5, with ω = 1/6, convergence was observed.
¬
3.4.8 A Free Lunch?
The analysis leading to Theorem 3.8 leads to an interesting possible variant of the iterative scheme.
For simplicity we will only consider an alteration of Richardson’s Iteration. In the altered algorithm
we presuppose the existence, via some oracle, of a sequence of weightings, ω
i
, which we use in each
iterative update. Thus our algorithm becomes:
1. Select some initial iterate x
(0)
.
2. Given iterate x
(k−1)
, define
x
(k)
= (I −ω
k
A) x
(k−1)

k
b.
Following the analysis for Theorem 3.8, it can be shown that
e
(k)
= (I −ω
k
A) e
(k−1)
where, again, e
(k)
= x
(k)
− x, with x the actual solution to the linear system. Expanding e
(k−1)
similarly gives
e
(k)
= (I −ω
k
A) (Iω
k−1
A) e
(k−2)
=

k
¸
i=1
I −ω
i
A

e
(0)
.
3.4. ITERATIVE SOLUTIONS 45
Remember that we want e
(k)
to be small in magnitude, or, better yet, to be zero. One way to
guarantee that e
(k)
is zero, regardless of the choice of e
(0)
it to somehow ensure that the matrix
B =
k
¸
i=1
I −ω
i
A
has all zero eigenvalues.
We again make use of the fact that eigenvalues “commute” with polynomials to claim that if
λ
j
is an eigenvalue of A, then
k
¸
i=1
1 −ω
i
λ
j
is an eigenvalue of B. This eigenvalue is zero if one of the ω
i
for 1 ≤ i ≤ k is 1/λ
j
. This suggests
how we are to pick the weightings: let them be the inverses of the eigenvalues of A.
In fact, if A has a small number of distinct eigenvalues, say m eigenvalues, then convergence to
the exact solution could be guaranteed after only m iterations, regardless of the size of the matrix.
As you may have guessed from the title of this subsection, this is not exactly a practical
algorithm. The problem is that it is not simple to find, for a given arbitrary matrix A, one, some,
or all its eigenvalues. This problem is of sufficient complexity to outweigh any savings to be gotten
from our “free lunch” algorithm.
However, in some limited situations this algorithm might be practical if the eigenvalues of A
are known a priori .
46 CHAPTER 3. SOLVING LINEAR SYSTEMS
Exercises
(3.1) Find the LU decomposition of the following matrices, using na¨ıve Gaussian Elimination
(a)

3 −1 −2
9 −1 −4
−6 10 13
¸
¸
(b)

8 24 16
1 12 11
4 13 19
¸
¸
(c)

−3 6 0
1 −2 0
−4 5 −8
¸
¸
(3.2) Perform back substitution to solve the equation

1 3 5 3
0 2 4 3
0 0 2 1
0 0 0 2
¸
¸
¸
¸
x =

−1
−1
1
2
¸
¸
¸
¸
(3.3) Perform Na¨ıve Gaussian Elimination to prove Cramer’s rule for the 2D case. That is, prove
that the solution to
¸
a b
c d
¸
x
y

=
¸
f
g

is given by
y =
det
¸
a f
c g

det
¸
a b
c d
and x =
det
¸
f b
g d

det
¸
a b
c d

(3.4) Implement Cramer’s rule to solve a pair of linear equations in 2 variables. Your m-file should
have header line like:
function x = cramer2(A,b)
where A is a 2 2 matrix, and b and x are 2 1 vectors. Your code should find the x such
that Ax = b. (See Exercise (3.3))
Test your code on the following (augmented) systems:
(a)

3 −2 1
4 −3 −1

(b)

1.24 −3.48 1
−0.744 2.088 2

(c)

1.24 −3.48 1
−0.744 2.088 −0.6

(d)

−0.0590 0.2372 −0.3528
0.1080 −0.4348 0.6452

(3.5) Given two lines parametrized by f(t) = at +b, and g(s) = cs+d, set up a linear 22 system
of equations to find the t, s at the point of intersection of the two lines. If you were going
to write a program to detect the intersection of two lines, how would you detect whether
they are parallel? How is this related to the form of the solution to a system of two linear
equations? (See Exercise (3.3))
(3.6) Prove that the inverse of the matrix

1 0 0 0
a
2
1 0 0
a
3
0 1 0
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
a
n
0 0 1
¸
¸
¸
¸
¸
¸
¸
3.4. ITERATIVE SOLUTIONS 47
is the matrix

1 0 0 0
−a
2
1 0 0
−a
3
0 1 0
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
−a
n
0 0 1
¸
¸
¸
¸
¸
¸
¸
(Hint: Multiply them together.)
(3.7) Under the strategy of scaled partial pivoting, which row of the following matrix will be the
first pivot row?

10 17 −10 0.1 0.9
−3 3 −3 0.3 −4
0.3 0.1 0.01 −1 0.5
2 3 4 −3 5
10 100 1 0.1 0
¸
¸
¸
¸
¸
¸
(3.8) Let A be a symmetric positive definite n n matrix with n distinct eigenvalues. Letting
y
(0)
= b/ |b|
2
, consider the iteration
y
(k+1)
=
Ay
(k)

Ay
(k)

2
.
(a) What is

y
(k)

2
?
(b) Show that y
(k)
= A
k
b/

A
k
b

2
.
(c) Show that as k →∞, y
(k)
converges to the (normalized) eigenvector associated with the
largest eigenvalue of A.
(3.9) Consider the equation

1 3 5
−2 2 4
4 −3 −4
¸
¸
x =

−5
−6
10
¸
¸
Letting x
(0)
= [1 1 0]
¯
, find the iterate x
(1)
by one step of Richardson’s Method. And by
one step of Jacobi Iteration. And by Gauss Seidel.
(3.10) Let A be a symmetric n n matrix with eigenvalues in the interval [α, β], with 0 < α ≤ β,
and α +β = 0. Consider Richardson’s Iteration
x
(k+1)
= (I −ωA) x
(k)
+ωb.
Recall that e
(k+1)
= (I −ωA) e
(k)
.
(a) Show that the eigenvalues of I −ωA are in the interval [1 −ωβ, 1 −ωα].
(b) Prove that
max ¦[λ[ : 1 −ωβ ≤ λ ≤ 1 −ωα¦
is minimized when we choose ω such that 1 − ωβ = −(1 −ωα) . (Hint: It may help to
look at the graph of something versus ω.)
(c) Show that this relationship is satisfied by ω = 2/ (α +β).
(d) For this choice of ω show that the spectral radius of I −ωA is
[α −β[
[α +β[
.
48 CHAPTER 3. SOLVING LINEAR SYSTEMS
(e) Show that when 0 < α, this quantity is always smaller than 1.
(f) Prove that if A is positive definite, then there is an ω such that Richardson’s Iteration
with this ω will converge for any choice of x
(0)
.
(g) For which matrix do you expect faster convergence of Richardson’s Iteration: A
1
with
eigenvalues in [10, 20] or A
2
with eigenvalues in [1010, 1020]? Why?
(3.11) Implement Richardson’s Iteration to solve the system Ax = b. Your m-file should have
header line like:
function xk = richardsons(A,b,x0,w,k)
Your code should return x
(k)
based on the iteration
x
(j+1)
= x
(j)
−ω

Ax
(j)
−b

.
Let w take the place of ω, and let x0 be the initial iterate x
(0)
. Test your code for A, b for
which you know the actual solution to the problem. (Hint: Start with A and the solution x
and generate b.) Test your code on the following matrices:
• Let A be the Hilbert Matrix. This is generated by the octave command A = hilb(10),
or whatever (smallish) integer. Try different values of ω, including ω = 1.
• Let A be a Toeplitz matrix of the form:
A =

−2 1 0 0
1 −2 1 0
0 1 −2 0
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
0 0 0 −2
¸
¸
¸
¸
¸
¸
¸
These can be generated by the octave command A = toeplitz([-2 1 0 0 0 0 0 0]).
Try different values of ω, including ω = −1/2.
(3.12) Let A be a nonsingular n n matrix. We wish to solve Ax = b. Let x
(0)
be some starting
vector, let T
k
be span
¸
r
(0)
, Ar
(0)
, . . . , A
k
r
(0)
¸
, and let {
k
be the set of polynomials, p(x) of
degree k with p(0) = 1.
Consider the following iterative method: Let x
(k+1)
be the x that solves
min
x∈x
(0)
+T
k
|b −Ax|
2
.
Let r
(k)
= b −Ax
(k)
.
(a) Show that if x ∈ x
(0)
+T
k
, then b −Ax = p(A)r
(0)
for some p ∈ {
k
.
(b) Prove that, conversely, for any p ∈ {
k
there is some x ∈ x
(0)
+T
k
, such that b −Ax =
p(A)r
(0)
.
(c) Argue that

r
(k+1)

2
= min
p∈1
k

p(A)r
(0)

2
.
(d) Prove that this iteration converges in at most n steps. (Hint: Argue for the existence
of a polynomial in {
n
that vanishes at all the eigenvalues of A. Use this polynomial to
show that

r
(n)

2
≤ 0.)
Chapter 4
Finding Roots
4.1 Bisection
We are now concerned with the problem of finding a zero for the function f(x), i.e., some c such
that f(c) = 0.
The simplest method is that of bisection. The following theorem, from calculus class, insures
the success of the method
Theorem 4.1 (Intermediate Value Theorem). If f(x) is continuous on [a, b] then for any y
such that y is between f(a) and f(b) there is a c ∈ [a, b] such that f(c) = y.
The IVT is best illustrated graphically. Note that continuity is really a requirement here–a
single point of discontinuity could ruin your whole day, as the following example illustrates.
Example 4.2. The function f(x) =
1
x
is not continuous at 0. Thus if 0 ∈ [a, b] , we cannot apply
the IVT. In particular, if 0 ∈ [a, b] it happens to be the case that for every y between f(a), f(b)
there is no c ∈ [a, b] such that f(c) = y.
In particular, the IVT tells us that if f(x) is continuous and we know a, b such that f(a), f(b)
have different sign, then there is some root in [a, b]. A decent estimate of the root is c =
a+b
2
.
We can check whether f(c) = 0. If this does not hold then one and only one of the two following
options holds:
1. f(a), f(c) have different signs.
2. f(c), f(b) have different signs.
We now choose to recursively apply bisection to either [a, c] or [c, b], respectively, depending on
which of these two options hold.
4.1.1 Modifications
Unfortunately, it is impossible for a computer to test whether a given black box function is contin-
uous. Thus malicious or incompetent users could cause a na¨ıvely implemented bisection algorithm
to fail. There are a number of easily conceivable problems:
1. The user might give f, a, b such that f(a), f(b) have the same sign. In this case the function
f might be legitimately continuous, and might have a root in the interval [a, b]. If, taking
c =
a+b
2
, f(a), f(b), f(c) all have the same sign, the algorithm would be at an impasse. We
should perform a “sanity check” on the input to make sure f(a), f(b) have different signs.
49
50 CHAPTER 4. FINDING ROOTS
2. The user might give f, a, b such that f is not continuous on [a, b], moreover has no root in
the interval [a, b]. For a poorly implemented algorithm, this might lead to an infinite search
on smaller and smaller intervals about some discontinuity of f. In fact, the algorithm might
descend to intervals as small as machine precision, in which case the midpoint of the interval
will, due to rounding, be the same as one of the endpoints, resulting in an infinite recursion.
3. The user might give f such that f has no root c that is representable in the computer’s
memory. Recall that we think of computers as storing numbers in the form ±r 10
k
; given
a finite number of bits to represent a number, only a finite number of such numbers can be
represented. It may legitimately be the case that none of them is a root to f. In this case,
the behaviour of the algorithm may be like that of the previous case. A well implemented
version of bisection should check the length of its input interval, and give up if the length is
too small, compared to machine precision.
Another common error occurs in the testing of the signs of f(a), f(b). A slick programmer might
try to implement this test in the pseudocode:
if (f(a)f(b) > 0) then ...
Note however, that [f(a)[ , [f(b)[ might be very small, and that f(a)f(b) might be too small to
be representable in the computer; this calculation would be rounded to zero, and unpredictable
behaviour would ensue. A wiser choice is
if (sign(f(a)) * sign(f(b)) > 0) then ...
where the function sign (x) returns −1, 0, 1 depending on whether x is negative, zero, or positive,
respectively.
The pseudocode bisection algorithm is given as Algorithm 1.
4.1.2 Convergence
We can see that each time recursive bisection(f, a, b, . . .) is called that [b −a[ is half the length
of the interval in the previous call. Formally call the first interval [a
0
, b
0
], and the first midpoint
c
0
. Let the second interval be [a
1
, b
1
], etc. Note that one of a
1
, b
1
will be c
0
, and the other will be
either a
0
or b
0
. We are claiming that
b
n
−a
n
=
b
n−1
−a
n−1
2
=
b
0
−a
0
2
n
Theorem 4.3 (Bisection Method Theorem). If f(x) is a continuous function on [a, b] such that
f(a)f(b) < 0, then after n steps, the algorithm run bisection will return c such that [c −c
t
[ ≤
[b−a[
2
n
, where c
t
is some root of f.
4.2 Newton’s Method
Newton’s method is an iterative method for root finding. That is, starting from some guess at the
root, x
0
, one iteration of the algorithm produces a number x
1
, which is supposed to be closer to a
root; guesses x
2
, x
3
, . . . , x
n
follow identically.
Newton’s method uses “linearization” to find an approximate root. Recalling Taylor’s Theorem,
we know that
f(x +h) ≈ f(x) +f
t
(x)h.
4.2. NEWTON’S METHOD 51
Algorithm 1: Algorithm for finding root by bisection.
Input: a function, two endpoints, a x-tolerance, and a y-tolerance
Output: a c such that [f(c)[ is smaller than the y-tolerance.
run bisection(f, a, b, δ, )
(1) Let fa ←sign (f(a)) .
(2) Let fb ←sign (f(b)) .
(3) if fafb > 0
(4) throw an error.
(5) else if fafb = 0
(6) if fa = 0
(7) return a
(8) else
(9) return b
(10) Let c ←
a+b
2
(11) while b −a > 2δ
(12) Let fc ←f(c).
(13) if [fc[ <
(14) return c
(15) if sign (fa) sign (fc) < 0
(16) Let b ←c, fb ←fc, c ←
a+c
2
.
(17) else
(18) Let a ←c, fa ←fc, c ←
c+b
2
.
(19) return c
This approximation is better when f
tt
() is “well-behaved” between x and x+h. Newton’s method
attempts to find some h such that
0 = f(x +h) = f(x) +f
t
(x)h.
This is easily solved as
h =
−f(x)
f
t
(x)
.
An iteration of Newton’s method, then, takes some guess x
k
and returns x
k+1
defined by
x
k+1
= x
k

f(x
k
)
f
t
(x
k
)
. (4.1)
An iteration of Newton’s method is shown in Figure 4.1, along with the linearization of f(x) at
x
k
.
4.2.1 Implementation
Use of Newton’s method requires that the function f(x) be differentiable. Moreover, the derivative
of the function must be known. This may preclude Newton’s method from being used when f(x)
is a black box. As is the case for the bisection method, our algorithm cannot explicitly check for
continuity of f(x). Moreover, the success of Newton’s method is dependent on the initial guess
x
0
. This was also the case with bisection, but for bisection there was an easy test of the initial
interval–i.e., test if f(a
0
)f(b
0
) < 0.
52 CHAPTER 4. FINDING ROOTS
P
S
f
r
a
g
r
e
p
l
a
c
e
m
e
n
t
s
x
k
x
k−1
f(x
k
)
f(x
k−1
)
Figure 4.1: One iteration of Newton’s method is shown for a quadratic function f(x). The lin-
earization of f(x) at x
k
is shown. It is clear that x
k+1
is a root of the linearization. It happens to
be the case that [f(x
k+1
)[ is smaller than [f(x
k
)[ , i.e., x
k+1
is a better guess than x
k
.
Our algorithm will test for goodness of the estimate by looking at [f(x
k
)[ . The algorithm will
also test for near-zero derivative. Note that if it were the case that f
t
(x
k
) = 0 then h would be ill
defined.
Algorithm 2: Algorithm for finding root by Newton’s Method.
Input: a function, its derivative, an initial guess, an iteration limit, and a tolerance
Output: a point for which the function has small value.
run newton(f, f
t
, x0, N, tol)
(1) Let x ←x0, n ←0.
(2) while n ≤ N
(3) Let fx ←f(x).
(4) if [fx[ < tol
(5) return x.
(6) Let fpx ←f
t
(x).
(7) if [fpx[ < tol
(8) Warn “f
t
(x) is small; giving up.”
(9) return x.
(10) Let x ←x −fx/fpx.
(11) Let n ←n + 1.
4.2. NEWTON’S METHOD 53
4.2.2 Problems
As mentioned above, convergence is dependent on f(x), and the initial estimate x
0
. A number of
conceivable problems might come up. We illustrate them here.
Example 4.4. Consider Newton’s method applied to the function f(x) = x
j
with j > 1, and with
initial estimate x
0
= 0.
Note that f(x) = 0 has the single root x = 0. Now note that
x
k+1
= x
k

x
j
k
jx
j−1
k
=

1 −
1
j

x
k
.
Since the equation has the single root x = 0, we find that x
k
is converging to the root. However,
it is converging at a rate slower than we expect from Newton’s method: at each step we have a
constant decrease of 1−(1/j) , which is a larger number (and thus worse decrease) when j is larger.
Example 4.5. Consider Newton’s method applied to the function f(x) =
lnx
x
, with initial estimate
x
0
= 3.
Note that f(x) is continuous on R
+
. It has a single root at x = 1. Our initial guess is not too far
from this root. However, consider the derivative:
f
t
(x) =
x
1
x
−ln x
x
2
=
1 −ln x
x
2
If x > e
1
, then 1 −ln x < 0, and so f
t
(x) < 0. However, for x > 1, we know f(x) > 0. Thus taking
x
k+1
= x
k

f(x
k
)
f
t
(x
k
)
> x
k
.
The estimates will “run away” from the root x = 1.
Example 4.6. Consider Newton’s method applied to the function f(x) = sin (x) for the initial
estimate x
0
= 0, where x
0
has the odious property 2x
0
= tan x
0
.
You should verify that there are an infinite number of such x
0
. Consider the identity of x
1
:
x
1
= x
0

f(x
0
)
f
t
(x
0
)
= x
0

sin(x
0
)
cos(x
0
)
= x
0
−tan x
0
= x
0
−2x
0
= −x
0
.
Now consider x
2
:
x
2
= x
1

f(x
1
)
f
t
(x
1
)
= −x
0

sin(−x
0
)
cos(−x
0
)
−x
0
+
sin(x
0
)
cos(x
0
)
= −x
0
+ tan x
0
= −x
0
+ 2x
0
= x
0
.
Thus Newton’s method “cycles” between the two values x
0
, −x
0
.
Of course, Newton’s method may find some iterate x
k
for which f
t
(x
k
) = 0, in which case, there
is no well-defined x
k+1
.
4.2.3 Convergence
When Newton’s Method converges, it actually displays quadratic convergence. That is, if e
k
= x
k
−r,
where r is the root that the x
k
are converging to, that [e
k+1
[ ≤ C [e
k
[
2
. If, for example, it were
the case that C = 1, then we would double the accuracy of our root estimate with each iterate.
That is, if e
0
were 0.001, we would expect e
1
to be on the order of 0.000001. The following theorem
formalizes our claim:
54 CHAPTER 4. FINDING ROOTS
Theorem 4.7 (Newton’s Method Convergence). If f(x) has two continuous derivatives, and
r is a simple root of f(x), then there is some D such that if [x
0
−r[ < D, Newton’s method will
converge quadratically to r.
The proof is based on arguments from real analysis, and is omitted; see Cheney & Kincaid for
the proof [7]. Take note of the following, though:
1. The proof requires that r be a simple root, that is that f
t
(r) = 0. When this does not hold
we may get only linear convergence, as in Example 4.4.
2. The key to the proof is using Taylor’s theorem, and the definition of Newton’s method to
show that
e
k+1
=
−f
tt

k
)e
2
k
2f
t
(x
k
)
,
where ξ
k
is between x
k
and r = x
k
+e
k
.
The proof then proceeds by showing that there is some region about r such that
(a) in this region [f
tt
(x)[ is not too large and [f
t
(x)[ is not too small, and
(b) if x
k
is in the region, then x
k+1
is in the region.
In particular the region is chosen to be so small that if x
k
is in the region, then the factor
e
2
k
will outweigh the factor [f
tt

k
)[ /2 [f
t
(x
k
)[ . You can roughly think of this region as an
“attractor” region for the root.
3. The theorem never guarantees that some x
k
will fall into the “attractor” region of a root r of
the function, as in Example 4.5 and Example 4.6. The theorem that follows gives sufficient
conditions for convergence to a particular root.
Theorem 4.8 (Newton’s Method Convergence II [2]). If f(x) has two continuous derivatives
on [a, b], and
1. f(a)f(b) < 0,
2. f
t
(x) = 0 on [a, b],
3. f
tt
(x) does not change sign on [a, b],
4. Both [f(a)[ ≤ (b −a) [f
t
(a)[ and [f(b)[ ≤ (b −a) [f
t
(b)[ hold,
then Newton’s method converges to the unique root of f(x) = 0 for any choice of x
0
∈ [a, b] .
4.2.4 Using Newton’s Method
Newton’s Method can be used to program more complex functions using only simple functions.
Suppose you had a computer which could perform addition, subtraction, multiplication, division,
and storing and retrieving numbers, and it was your task to write a subroutine to compute some
complex function g(). One way of solving this problem is to have your subroutine use Newton’s
Method to solve some equation equivalent to g(z) −x = 0, where z is the input to the subroutine.
Note that it is assumed the subroutine cannot evaluate g(z) directly, so this equation needs to be
modified to satisfy the computer’s constraints.
Quite often when dealing with problems of this type, students make the mistake of using New-
ton’s Method to try to solve a linear equation. This should be an indication that a mistake was
made, since Newton’s Method can solve a linear equation in a single step:
Example 4.9. Consider Newton’s method applied to the function f(x) = ax +b. The iteration is
given as
x
k+1
←x
k

ax
k
+b
a
.
This can be rewritten simply as x
k+1
←−b/a.
4.3. SECANT METHOD 55
The following example problems should illustrate this process of “bootstrapping” via Newton’s
Method.
Example Problem 4.10. Devise a subroutine using only subtraction and multiplication that can
find the multiplicative inverse of an input number z, i.e., can find (1/z) .
Solution: We are tempted to use the linear function f(x) = (1/z) −x. But this is a linear equation
for which Newton’s Method would reduce to x
k+1
← (1/z) . Since the subroutine can only use
subtraction and multiplication, this will not work.
Instead apply Newton’s Method to f(x) = z −(1/x) . The Newton step is
x
k+1
←x
k

z −(1/x
k
)

1/x
2
k
= x
k
−zx
2
k
+x
k
= x
k
(2 −zx
k
) .
Note this step uses only multiplication and subtraction. The subroutine is given in Algorithm 3.
¬
Algorithm 3: Algorithm for finding a multiplicative inverse using simple operations.
Input: a number
Output: its multiplicative inverse
invs(z)
(1) if z = 0 throw an error.
(2) Let x ←sign (z) , n ←0.
(3) while n ≤ 50
(4) Let x ←x(2 −zx) .
(5) Let n ←n + 1.
Example Problem 4.11. Devise a subroutine using only simple operations which computes the
square root of an input number z.
Solution: The temptation is to find a zero for f(x) =

z − x. However, this equation is linear
in x. Instead let f(x) = z − x
2
. You can easily see that if x is a positive root of f(x) = 0, then
x =

z. The Newton step becomes
x
k+1
←x
k

z −x
2
k
−2x
k
.
after some simplification this becomes
x
k+1

x
k
2
+
z
2x
k
.
Note this relies only on addition, multiplication and division.
The final subroutine is given in Algorithm 4.
¬
4.3 Secant Method
The secant method for root finding is roughly based on Newton’s method; however, it is not
assumed that the derivative of the function is known, rather the derivative is approximated by
the value of the function at some of the iterates, x
k
. More specifically, the slope of the tangent
56 CHAPTER 4. FINDING ROOTS
Algorithm 4: Algorithm for finding a square root using simple operations.
Input: a number
Output: its square root
sqrt(z)
(1) if z < 0 throw an error.
(2) Let x ←1, n ←0.
(3) while n ≤ 50
(4) Let x ←(x +z/x) /2.
(5) Let n ←n + 1.
P
S
f
r
a
g
r
e
p
l
a
c
e
m
e
n
t
s
x
k
x
k−1
f(x
k
)
f(x
k−1
)
x
k+1
f
(
x
k
+
1
)
Figure 4.2: One iteration of the Secant method is shown for some quadratic function f(x). The
secant line through (x
k−1
, f(x
k−1
)) and (x
k
, f(x
k
)) is shown. It happens to be the case that
[f(x
k+1
)[ is smaller than [f(x
k
)[ , i.e., x
k+1
is a better guess than x
k
.
line at (x
k
, f(x
k
)) , which is f
t
(x
k
) is approximated by the slope of the secant line passing through
(x
k−1
, f(x
k−1
)) and (x
k
, f(x
k
)) , which is
f(x
k
) −f(x
k−1
)
x
k
−x
k−1
Thus the iterate x
k+1
is the root of this secant line. That is, it is a root to the equation
f(x
k
) −f(x
k−1
)
x
k
−x
k−1
(x −x
k
) = y −f(x
k
).
4.3. SECANT METHOD 57
Since the root has a y value of 0, we have
f(x
k
) −f(x
k−1
)
x
k
−x
k−1
(x
k+1
−x
k
) = −f(x
k
),
x
k+1
−x
k
= −

x
k
−x
k−1
f(x
k
) −f(x
k−1
)

f(x
k
),
x
k+1
= x
k

x
k
−x
k−1
f(x
k
) −f(x
k−1
)

f(x
k
). (4.2)
You will note this is the recurrence of Newton’s method, but with the slope f
t
(x
k
) substituted
by the slope of the secant line. Note also that the secant method requires two initial guesses, x
0
, x
1
,
but can be used on a black box function. The secant method can suffer from some of the same
problems that Newton’s method does, as we will see.
An iteration of the secant method is shown in Figure 4.2, along with the secant line.
Example 4.12. Consider the secant method used on x
3
+x
2
−x −1, with x
0
= 2, x
1
=
1
2
.
Note that this function is continuous and has roots ±1. We give the iterates here:
k x
k
f(x
k
)
0 2 9
1 0.5 −1.125
2 0.666666666666667 −0.925925925925926
3 1.44186046511628 2.63467367653163
4 0.868254072087394 −0.459842466254495
5 0.953491494113659 −0.177482458876898
6 1.00706900811804 0.0284762692197613
7 0.999661272951803 −0.0013544492875992
8 0.999997617569723 −9.52969840528617 10
−06
9 1.0000000008072 3.22880033820638 10
−09
10 0.999999999999998 −7.43849426498855 10
−15
4.3.1 Problems
As with Newton’s method, convergence is dependent on f(x), and the initial estimates x
0
, x
1
. We
illustrate a few possible problems:
Example 4.13. Consider the secant method for the function f(x) =
lnx
x
, with x
0
= 3, x
1
= 4.
As with Newton’s method, the iterates diverge towards infinity, looking for a nonexistant root. We
58 CHAPTER 4. FINDING ROOTS
give some iterates here:
k x
k
f(x
k
)
0 3 0.366204096222703
1 4 0.346573590279973
2 21.6548475770851 0.142011128224341
3 33.9111765137635 0.103911011441661
4 67.3380435135758 0.0625163004418104
5 117.820919458675 0.0404780904944712
6 210.543986613847 0.0254089165873003
7 366.889164762149 0.0160949419231219
8 637.060241341843 0.010135406045582
9 1096.54125113444 0.00638363233543847
10 1878.34688714646 0.00401318169994875
11 3201.94672271613 0.0025208146648422
12 5437.69020766155 0.00158175793894727
13 9203.60222260594 0.000991714984152597
14 15533.1606791089 0.000621298692241343
15 26149.7196085218 0.000388975250950428
16 43924.8466075548 0.000243375589137882
17 73636.673898472 0.000152191807070607
Example 4.14. Consider Newton’s method applied to the function f(x) =
1
1+x
2

1
17
, with initial
estimates x
0
= −1, x
1
= 1.
You can easily verify that f(x
0
) = f(x
1
), and thus the secant line is horizontal. And thus x
2
is not
defined.
Algorithm 5: Algorithm for finding root by secant method.
Input: a function, initial guesses, an iteration limit, and a tolerance
Output: a point for which the function has small value.
run secant(f, x0, x1, N, tol)
(1) Let x ←x1, xp ←x0, fh ←f(x1), fp ←f(x0), n ←0.
(2) while n ≤ N
(3) if [fh[ < tol
(4) return x.
(5) Let fpx ←(fh −fp)/(x −xp).
(6) if [fpx[ < tol
(7) Warn “secant slope is too small; giving up.”
(8) return x.
(9) Let xp ←x, fp ←fh.
(10) Let x ←x −fh/fpx.
(11) Let fh ←f(x).
(12) Let n ←n + 1.
4.3.2 Convergence
We consider convergence analysis as in Newton’s method. We assume that r is a root of f(x), and
let e
n
= r −x
n
. Because the secant method involves two iterates, we assume that we will find some
4.3. SECANT METHOD 59
relation between e
k+1
and the previous two errors, e
k
, e
k−1
.
Indeed, this is the case. Omitting all the nasty details (see Cheney & Kincaid [7]), we arrive at
the imprecise equation:
e
k+1
≈ −
f
tt
(r)
2f
t
(r)
e
k
e
k−1
= Ce
k
e
k−1
.
Again, the proof relies on finding some “attractor” region and using continuity.
We now postulate that the error terms for the secant method follow some power law of the
following type:
[e
k+1
[ ∼ A[e
k
[
α
.
Recall that this held true for Newton’s method, with α = 2. We try to find the α for the secant
method. Note that
[e
k
[ = A[e
k−1
[
α
,
so
[e
k−1
[ = A

1
α
[e
k
[
1
α
.
Then we have
A[e
k
[
α
= [e
k+1
[ = C [e
k
[ [e
k−1
[ = CA

1
α
[e
k
[
1+
1
α
,
Since this equation is to hold for all e
k
, we must have
α = 1 +
1
α
.
This is solved by α =
1
2

1 +

5

≈ 1.62. Thus we say that the secant method enjoys superlinear
convergence; This is somewhere between the convergence rates for bisection and Newton’s method.
60 CHAPTER 4. FINDING ROOTS
Exercises
(4.1) Consider the bisection method applied to find the zero of the function f(x) = x
2
− 5x + 3,
with a
0
= 0, b
0
= 1. What are a
1
, b
1
? What are a
2
, b
2
?
(4.2) Approximate

10 by using two steps of Newton’s method, with an initial estimate of x
0
= 3.
(cf. Example Problem 4.11) Your answer should be correct to 5 decimal places.
(4.3) Consider bisection for finding the root to cos x = 0. Let the initial interval I
0
be [0, 2]. What
is the next interval considered, call it I
1
? What is I
2
? I
6
?
(4.4) What does the sequence defined by
x
0
= 1, x
k+1
=
1
2
x
k
+
1
x
k
converge to?
(4.5) Devise a subroutine using only simple operations that finds, via Newton’s Method, the cubed
root of some input number z.
(4.6) Use Newton’s Method to approximate
3

9. Start with x
0
= 2. Find x
2
.
(4.7) Use Newton’s Method to devise a sequence x
0
, x
1
, . . . such that x
k
→ln 10. Is this a reasonable
way to write a subroutine that, given z, computes ln z? (Hint: such a subroutine would require
computation of e
x
k
. Is this possible for rational x
k
without using a logarithm? Is it practical?)
(4.8) Give an example (graphical or analytic) of a function, f(x) for which Newton’s Method:
(a) Does not find a root for some choices of x
0
.
(b) Finds a root for every choice of x
0
.
(c) Falls into a cycle for some choice of x
0
.
(d) Converges slowly to the zero of f(x).
(4.9) How will Newton’s Method perform for finding the root to f(x) =

[x[ = 0?
(4.10) Implement the inverse finder described in Example Problem 4.10. Your m-file should have
header line like:
function c = invs(z)
where z is the number to be inverted. You may wish to use the builtin function sign. As
an extra termination condition, you should have the subroutine return the current iterate if
zx is sufficiently close to 1, say within the interval (0.9999, 0.0001). Can you find some z for
which the algorithm performs poorly?
(4.11) Implement the bisection method in octave/Matlab. Your m-file should have header line like:
function c = run_bisection(f, a, b, tol)
where f is the name of the function. Recall that feval(f,x) when f is a string with the
name of some function evaluates that function at x. This works for builtin functions and
m-files.
(a) Run your code on the function f(x) = cos x, with a
0
= 0, b
0
= 2. In this case you can
set f = "cos".
(b) Run your code on the function f(x) = x
2
−5x + 3, with a
0
= 0, b
0
= 1. In this case you
will have to set f to the name of an m-file (without the “.m”) which will evaluate the
given f(x).
(c) Run your code on the function f(x) = x −cos x, with a
0
= 0, b
0
= 1.
(4.12) Implement Newton’s Method. Your m-file should have header line like:
function x = run_newton(f, fp, x0, N, tol)
where f is the name of the function, and fp is its derivative. Run your code to find zeroes of
the following functions:
(a) f(x) = tan x −x.
4.3. SECANT METHOD 61
(b) f(x) = x
2
−(2 +)x + 1 +, for small.
(c) f(x) = x
7
−7x
6
+ 21x
5
−35x
4
+ 35x
3
−21x
2
+ 7x −1.
(d) f(x) = (x −1)
7
.
(4.13) Implement the secant method Your m-file should have header line like:
function x = run_secant(f, x0, x1, N, tol)
where f is the name of the function. Run your code to find zeroes of the following functions:
(a) f(x) = tan x −x.
(b) f(x) = x
2
+ 1. (Note: This function has no zeroes.)
(c) f(x) = 2 + sin x. (Note: This function also has no zeroes.)
62 CHAPTER 4. FINDING ROOTS
Chapter 5
Interpolation
5.1 Polynomial Interpolation
We consider the problem of finding a polynomial that interpolates a given set of values:
x x
0
x
1
. . . x
n
y y
0
y
1
. . . y
n
where the x
i
are all distinct. A polynomial p(x) is said to interpolate these data if p(x
i
) = y
i
for
i = 0, 1, . . . , n. The x
i
values are called “nodes.”
Sometimes, we will consider a variant of this problem: we have some black box function, f(x),
which we want to approximate with a polynomial p(x). We do this by finding the polynomial
interpolant to the data
x x
0
x
1
. . . x
n
f(x) f(x
0
) f(x
1
) . . . f(x
n
)
for some choice of distinct nodes x
i
.
5.1.1 Lagranges Method
As you might have guessed, for any such set of data, there is an n-degree polynomial that interpo-
lates it. We present a constructive proof of this fact by use of Lagrange Polynomials.
Definition 5.1 (Lagrange Polynomials). For a given set of n + 1 nodes x
i
, the Lagrange
polynomials are the n + 1 polynomials
i
defined by

i
(x
j
) = δ
ij
=

0 if i = j
1 if i = j
Then we define the interpolating polynomial as
p
n
(x) =
n
¸
i=0
y
i

i
(x).
If each Lagrange Polynomial is of degree at most n, then p
n
also has this property. The Lagrange
Polynomials can be characterized as follows:

i
(x) =
n
¸
j=0, j,=i
x −x
j
x
i
−x
j
. (5.1)
63
64 CHAPTER 5. INTERPOLATION
By evaluating this product for each x
j
, we see that this is indeed a characterization of the Lagrange
Polynomials. Moreover, each polynomial is clearly the product of n monomials, and thus has degree
no greater than n.
This gives the theorem
Theorem 5.2 (Interpolant Existence and Uniqueness). Let ¦x
i
¦
n
i=0
be distinct nodes. Then
for any values at the nodes, ¦y
i
¦
n
i=0
, there is exactly one polynomial, p(x) of degree no greater than
n such that p(x
i
) = y
i
for i = 0, 1, . . . , n.
Proof. The Lagrange Polynomial construction gives existence of such a polynomial p(x) of degree
no greater than n.
Suppose there were two such polynomials, call them p(x), q(x), each of degree no greater than
n, both interpolating the data. Let r(x) = p(x) −q(x). Note that r(x) can have degree no greater
than n, yet it has roots at x
0
, x
1
, . . . , x
n
. The only polynomial of degree ≤ n that has n+1 distinct
roots is the zero polynomial, i.e., 0 ≡ r(x) = p(x) − q(x). Thus p(x), q(x) are equal everywhere,
i.e., they are the same polynomial.
Example Problem 5.3. Construct the polynomial interpolating the data
x 1
1
2
3
y 3 −10 2
by using Lagrange Polynomials.
Solution: We construct the Lagrange Polynomials:

0
(x) =
(x −
1
2
)(x −3)
(1 −
1
2
)(1 −3)
= −(x −
1
2
)(x −3)

1
(x) =
(x −1)(x −3)
(
1
2
−1)(
1
2
−3)
=
4
5
(x −1)(x −3)

2
(x) =
(x −1)(x −
1
2
)
(3 −1)(3 −
1
2
)
=
1
5
(x −1)(x −
1
2
)
Then the interpolating polynomial, in “Lagrange Form” is
p
2
(x) = −3(x −
1
2
)(x −3) −8(x −1)(x −3) +
2
5
(x −1)(x −
1
2
)
¬
5.1.2 Newton’s Method
There is another way to prove existence. This method is also constructive, and leads to a different
algorithm for constructing the interpolant. One way to view this construction is to imagine how
one would update the Lagrangian form of an interpolant. That is, suppose some data were given,
and the interpolating polynomial calculated using Lagrange Polynomials; then a new point was
given (x
n+1
, y
n+1
), and an interpolant for the augmented set of data is to be found. Each Lagrange
Polynomial would have to be updated. This could take a lot of calculation (especially if n is large).
So the alternative method constructs the polynomials iteratively. Thus we create polynomials
p
k
(x) such that p
k
(x
i
) = y
i
for 0 ≤ i ≤ k. This is simple for k = 0, we simply let
p
0
(x) = y
0
,
5.1. POLYNOMIAL INTERPOLATION 65
-1
-0.5
0
0.5
1
1.5
2
0 0.5 1 1.5 2 2.5 3 3.5
l0(x)
l1(x)
l2(x)
Figure 5.1: The 3 Lagrange polynomials for the example are shown. Verify, graphically, that these
polynomials have the property
i
(x
j
) = δ
ij
for the nodes 1,
1
2
, 3.
the constant polynomial with value y
0
. Then assume we have a proper p
k
(x) and want to construct
p
k+1
(x). The following construction works:
p
k+1
(x) = p
k
(x) +c(x −x
0
)(x −x
1
) (x −x
k
),
for some constant c. Note that the second term will be zero for any x
i
for 0 ≤ i ≤ k, so p
k+1
(x)
will interpolate the data at x
0
, x
1
, . . . , x
k
. To get the value of the constant we calculate c such that
y
k+1
= p
k+1
(x
k+1
) = p
k
(x
k+1
) +c(x
k+1
−x
0
)(x
k+1
−x
1
) (x
k+1
−x
k
).
This construction is known as Newton’s Algorithm, and the resultant form is Newton’s form of
the interpolant
Example Problem 5.4. Construct the polynomial interpolating the data
x 1
1
2
3
y 3 −10 2
by using Newton’s Algorithm.
Solution: We construct the polynomial iteratively:
p
0
(x) = 3
p
1
(x) = 3 +c(x −1)
We want −10 = p
1
(
1
2
) = 3 +c(−
1
2
), and thus c = 26. Then
p
2
(x) = 3 + 26(x −1) +c(x −1)(x −
1
2
)
66 CHAPTER 5. INTERPOLATION
-35
-30
-25
-20
-15
-10
-5
0
5
10
15
0 0.5 1 1.5 2 2.5 3 3.5 4
p(x)
Figure 5.2: The interpolating polynomial for the example is shown.
We want 2 = p
2
(3) = 3 + 26(2) +c(2)(
5
2
), and thus c =
−53
5
. Then we get
p
2
(x) = 3 + 26(x −1) +
−53
5
(x −1)(x −
1
2
).
¬
Does Newton’s Algorithm give a different polynomial? It is easy to show, by induction, that
the degree of p
n
(x) is no greater than n. Then, by Theorem 5.2, it must be the same polynomial
as the Lagrange interpolant.
The two methods give the same interpolant, we may wonder “Which should we use?” Newton’s
Algorithm seems more flexible–it can deal with adding new data. There also happens to be a way
of storing Newton’s form of the interpolant that makes the polynomial simple to evaluate (in the
sense of number of calculations required).
5.1.3 Newton’s Nested Form
Recall the iterative construction of Newton’s Form:
p
k+1
(x) = p
k
(x) +c
k
(x −x
0
)(x −x
1
) (x −x
k
).
The previous iterate p
k
(x) was constructed similarly, so we can write:
p
k+1
(x) = [p
k−1
(x) +c
k−1
(x −x
0
)(x −x
1
) (x −x
k−1
)] +c
k
(x −x
0
)(x −x
1
) (x −x
k
).
Continuing in this way we see that we can write
p
n
(x) =
n
¸
k=0
c
k

¸
0≤j<k
(x −x
j
)
¸
¸
, (5.2)
where an empty product has value 1 by convention. This can be rewritten in a funny form, where
a monomial is factored out of each successive summand:
p
n
(x) = c
0
+ (x −x
0
)[c
1
+ (x −x
1
)[c
2
+ (x −x
2
) [. . .]]]
5.1. POLYNOMIAL INTERPOLATION 67
Supposing for an instant that the constants c
k
were known, this provides a better way of
calculating p
n
(t) at arbitrary t. By “better” we mean requiring few multiplications and additions.
This nested calculation is performed iteratively:
v
0
= c
n
v
1
= c
n−1
+ (t −x
n−1
)v
0
v
2
= c
n−2
+ (t −x
n−2
)v
1
.
.
.
v
n
= c
0
+ (t −x
0
)v
n−1
This requires only n multiplications and 2n additions. Compare this with the number required
for using the Lagrange form: at least n
2
additions and multiplications.
5.1.4 Divided Differences
It turns out that the coefficients c
k
for Newton’s nested form can be calculated relatively easily by
using divided differences. We assume, for the remainder of this section, that we are considering
interpolating a function, that is, we have values of f(x
i
) at the nodes x
i
.
Definition 5.5 (Divided Differences). For a given collection of nodes ¦x
i
¦
n
i=0
and values
¦f(x
i

n
x=0
, a k
th
order divided difference is a function of k + 1 (not necessarily distinct) nodes,
written as
f [x
i
0
, x
i
1
, . . . , x
i
k
]
The divided differences are defined recursively as follows:
• The 0
th
order divided differences are simply defined:
f [x
i
] = f(x
i
).
• Higher order divided differences are the ratio of differences:
f [x
i
0
, x
i
1
, . . . , x
i
k
] =
f [x
i
1
, x
i
2
, . . . , x
i
k
] −f

x
i
0
, x
i
1
, . . . , x
i
k−1

x
i
k
−x
i
0
We care about divided differences because coefficients for the Newton nested form are divided
differences:
c
k
= f [x
0
, x
1
, . . . , x
k
] . (5.3)
Because we are only interested in the Newton method coefficients, we will only consider divided
differences with successive nodes, i.e., those of the form f [x
j
, x
j+1
, . . . , x
j+k
]. In this case the
higher order differences can more simply be written as
f [x
j
, x
j+1
, . . . , x
j+k
] =
f [x
j+1
, x
j+2
, . . . , x
j+k
] −f [x
j
, x
j+1
, . . . , x
j+k−1
]
x
j+k
−x
j
68 CHAPTER 5. INTERPOLATION
The graphical way to calculate these things is a “pyramid scheme”
1
, where you compute the
following table columnwise, from left to right:
x f[ ] f[ , ] f[ , , ] f[ , , , ]
x
0
f[x
0
]
f[x
0
, x
1
]
x
1
f[x
1
] f[x
0
, x
1
, x
2
]
f[x
1
, x
2
] f[x
0
, x
1
, x
2
, x
3
]
x
2
f[x
2
] f[x
1
, x
2
, x
3
]
f[x
2
, x
2
]
x
3
f[x
3
]
Note that by definition, the first column just echoes the data: f[x
i
] = f(x
i
).
We drag out our example one last time:
Example Problem 5.6. Find the divided differences for the following data
x 1
1
2
3
f(x) 3 −10 2
Solution: We start by writing in the data:
x f[ ] f[ , ] f[ , , ]
1 3
1
2
−10
3 2
Then we calculate:
f[x
0
, x
1
] =
−10 −3
(1/2) −1
= 26, and f[x
1
, x
2
] =
2 −−10
3 −(1/2)
= 24/5.
Adding these to the table, we have
x f[ ] f[ , ] f[ , , ]
1 3
26
1
2
−10
24
5
3 2
Then we calculate
f[x
0
, x
1
, x
2
] =
24/5 −26
3 −1
=
−53
5
.
So we complete the table:
x f[ ] f[ , ] f[ , , ]
1 3
26
1
2
−10
−53
5
24
5
3 2
1
That’s supposed to be a joke.
5.2. ERRORS IN POLYNOMIAL INTERPOLATION 69
You should verify that along the top line of this pyramid you can read off the coefficients for
Newton’s form, as found in Example Problem 5.4. ¬
5.2 Errors in Polynomial Interpolation
We now consider two questions:
1. If we want to interpolate some function f(x) at n + 1 nodes over some closed interval, how
should we pick the nodes?
2. How accurate can we make a polynomial interpolant over a closed interval?
You may be surprised to find that the answer to the first question is not that we should make
the x
i
equally spaced over the closed interval, as the following example illustrates.
Example 5.7 (Runge Function). Let
f(x) =

1 +x
2

−1
,
(known as the Runge Function), and let p
n
(x) interpolate f on n equally spaced nodes, including
the endpoints, on [−5, 5]. Then
lim
n→∞
max
x∈[−5,5]
[p
n
(x) −f(x)[ = ∞.
This behaviour is shown in Figure 5.3.
It turns out that a much better choice is related to the Chebyshev Polynomials (“of the first
kind”). If our closed interval is [−1, 1], then we want to define our nodes as
x
i
= cos
¸
2i + 1
2n + 2

π

, 0 ≤ i ≤ n.
Literally interpreted, these Chebyshev Nodes are the projections of points uniformly spaced on
a semi circle; see Figure 5.4. By using the Chebyshev nodes, a good polynomial interpolant of the
Runge function of Example 5.7 can be found, as shown in Figure 5.5.
5.2.1 Interpolation Error Theorem
We did not just invent the Chebyshev nodes. The fact that they are “good” for interpolation
follows from the following theorem:
Theorem 5.8 (Interpolation Error Theorem). Let p be the polynomial of degree at most n
interpolating function f at the n+1 distinct nodes x
0
, x
1
, . . . , x
n
on [a, b]. Let f
(n+1)
be continuous.
Then for each x ∈ [a, b] there is some ξ ∈ [a, b] such that
f(x) −p(x) =
1
(n + 1)!
f
(n+1)
(ξ)
n
¸
i=0
(x −x
i
) .
You should be thinking that the term on the right hand side resembles the error term in Taylor’s
Theorem.
70 CHAPTER 5. INTERPOLATION
-0.4
-0.2
0
0.2
0.4
0.6
0.8
1
-6 -4 -2 0 2 4 6
runge
3 nodes
7 nodes
10 nodes
(a) 3,7,10 equally spaced nodes
-1
0
1
2
3
4
5
6
7
8
-6 -4 -2 0 2 4 6
runge
15 nodes
(b) 15 equally spaced nodes
-700000
-600000
-500000
-400000
-300000
-200000
-100000
0
100000
-6 -4 -2 0 2 4 6
runge
50 nodes
(c) 50 equally spaced nodes
Figure 5.3: The Runge function is poorly interpolated at n equally spaced nodes, as n gets large.
At first, the interpolations appear to improve with larger n, as in (a). In (b) it becomes apparent
that the interpolant is good in center of the interval, but bad at the edges. As shown in (c), this
gets worse as n gets larger.
5.2. ERRORS IN POLYNOMIAL INTERPOLATION 71
bottomright
topleft
P
S
f
r
a
g
r
e
p
l
a
c
e
m
e
n
t
s
1 1 −
Figure 5.4: The Chebyshev nodes are the projections of nodes equally spaced on a semi circle.
-0.4
-0.2
0
0.2
0.4
0.6
0.8
1
-6 -4 -2 0 2 4 6
runge
3 nodes
7 nodes
10 nodes
(a) 3,7,10 Chebyshev nodes
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
-6 -4 -2 0 2 4 6
runge
17 nodes
(b) 17 Chebyshev nodes
Figure 5.5: The Chebyshev nodes yield good polynomial interpolants of the Runge function. Com-
pare these figures to those of Figure 5.3. For more than about 25 nodes, the interpolant is indis-
tinguishable from the Runge function by eye.
Proof. First consider when x is one of the nodes x
i
; in this case both the LHS and RHS are zero.
So assume x is not a node. Make the following definitions
w(t) =
n
¸
i=0
(t −x
i
) ,
c =
f(x) −p(x)
w(x)
,
φ(t) = f(t) −p(t) −cw(t).
Since x is not a node, w(x) is nonzero. Now note that φ(x
i
) is zero for each node x
i
, and that by
definition of c, that φ(x) = 0 for our x. That is φ(t) has n + 2 roots.
Some other facts: f, p, w have n + 1 continuous derivatives, by assumption and definition; thus
φ has this many continuous derivatives. Apply Rolle’s Theorem to φ(t) to find that φ
t
(t) has n+1
72 CHAPTER 5. INTERPOLATION
roots. Then apply Rolle’s Theorem again to find φ
tt
(t) has n roots. In this way we see that φ
(n+1)
(t)
has a root, call it ξ. That is
0 = φ
(n+1)
(ξ) = f
(n+1)
(ξ) −p
(n+1)
(ξ) −cw
(n+1)
(ξ).
But p(t) is a polynomial of degree ≤ n, so p
(n+1)
is identically zero. And w(t) is a polynomial
of degree n + 1 in t, so its n + 1
th
derivative is easily seen to be (n + 1)! Thus
0 = f
(n+1)
(ξ) −c(n + 1)!
c(n + 1)! = f
(n+1)
(ξ)
f(x) −p(x)
w(x)
=
1
(n + 1)!
f
(n+1)
(ξ)
f(x) −p(x) =
1
(n + 1)!
f
(n+1)
(ξ)w(x),
which is what was to be proven.
Thus the error in a polynomial interpolation is given as
f(x) −p(x) =
1
(n + 1)!
f
(n+1)
(ξ)
n
¸
i=0
(x −x
i
) .
We have no control over the function f(x) or its derivatives, and once the nodes and f are fixed, p
is determined; thus the only way we can make the error [f(x) −p(x)[ small is by judicious choice
of the nodes x
i
.
The Chebyshev nodes on [−1, 1] have the remarkable property that

n
¸
i=0
(t −x
i
)

≤ 2
−n
for any t ∈ [−1, 1] . Moreover, it can be shown that for any choice of nodes x
i
that
max
t∈[−1,1]

n
¸
i=0
(t −x
i
)

≥ 2
−n
.
Thus the Chebyshev nodes are considered the best for polynomial interpolation.
Merging this result with Theorem 5.8, the error for polynomial interpolants defined on Cheby-
shev nodes can be bounded as
[f(x) −p(x)[ ≤
1
2
n
(n + 1)!
max
ξ∈[−1,1]

f
(n+1)
(ξ)

.
The Chebyshev nodes can be rescaled and shifted for use on the general interval [α, β]. In this
case they take the form
x
i
=
β −α
2
cos
¸
2i + 1
2n + 2

π

+
α +β
2
, 0 ≤ i ≤ n.
In this case, the rescaling of the nodes changes the bound on
¸
(t −x
i
) so the overall error bound
becomes
[f(x) −p(x)[ ≤
(β −α)
n+1
2
2n+1
(n + 1)!
max
ξ∈[α,β]

f
(n+1)
(ξ)

,
for x ∈ [α, β] .
5.2. ERRORS IN POLYNOMIAL INTERPOLATION 73
Example Problem 5.9. How many Chebyshev nodes are required to interpolate the function
f(x) = sin(x) + cos(x) to within 10
−8
on the interval [0, π]?
Solution: We first find the derivatives of f
f
t
(x) = cos(x) −sin(x)
f
tt
(x) = −sin(x) −cos(x)
f
ttt
(x) = −cos(x) + sin(x)
.
.
.
As a crude approximation we can assert that

f
(k)
(x)

≤ [cos x[ +[sin x[ ≤ 2.
Thus it suffices to take n large enough such that
π
n+1
2
2n+1
(n + 1)!
2 ≤ 10
−8
By trial and error we see that n = 10 suffices. ¬
5.2.2 Interpolation Error for Equally Spaced Nodes
Despite the proven superiority of Chebyshev Nodes, and the problems with the Runge Function,
equally spaced nodes are frequently used for interpolation, since they are easy to calculate
2
. We
now consider bounding
max
x∈[a,b]
n
¸
i=0
[x −x
i
[ ,
where
x
i
= a +hi = a +
(b −a)
n
i, i = 0, 1, . . . , n.
Start by picking an x. We can assume x is not one of the nodes, otherwise the product in
question is zero. Then x is between some x
j
, x
j+1
We can show that
[x −x
j
[ [x −x
j+1
[ ≤
h
2
4
.
by simple calculus.
Now we claim that [x −x
i
[ ≤ (j −i + 1) h for i < j, and [x −x
i
[ ≤ (i −j) h for j +1 < i. Then
n
¸
i=0
[x −x
i
[ ≤
h
2
4

(j + 1)!h
j

(n −j)!h
n−j−1

.
It can be shown that (j + 1)!(n −j)! ≤ n!, and so we get an overall bound
n
¸
i=0
[x −x
i
[ ≤
h
n+1
n!
4
.
2
Never underestimate the power of laziness.
74 CHAPTER 5. INTERPOLATION
The interpolation theorem then gives us
[f(x) −p(x)[ ≤
h
n+1
4 (n + 1)
max
ξ∈[a,b]

f
(n+1)
(ξ)

,
where h = (b −a)/n.
The reason this result does not seem to apply to Runge’s Function is that f
(n)
for Runge’s
Function becomes unbounded as n →∞.
Example Problem 5.10. How many equally spaced nodes are required to interpolate the function
f(x) = sin(x) + cos(x) to within 10
−8
on the interval [0, π]?
Solution: As in Example Problem 5.9, we make the crude approximation

f
(k)
(x)

≤ [cos x[ +[sin x[ ≤ 2.
Thus we want to make n sufficiently large such that
h
n+1
4(n + 1)
2 ≤ 10
−8
,
where h = (π −0)/n. That is we want to find n large enough such that
π
n+1
2n
n+1
(n + 1)
≤ 10
−8
,
By simply trying small numbers, we can see this is satisfied if n = 12. ¬
5.2. ERRORS IN POLYNOMIAL INTERPOLATION 75
Exercises
(5.1) Find the Lagrange Polynomials for the nodes ¦−1, 1¦.
(5.2) Find the Lagrange Polynomials for the nodes ¦−1, 1, 5¦.
(5.3) Find the polynomial of degree no greater than 2 that interpolates
x −1 1 5
y 3 3 −2
(5.4) Complete the divided differences table:
x f[ ] f[ , ] f[ , , ]
−1 3
1 3
5 −2
Find the Newton form of the polynomial interpolant.
(5.5) Find the polynomial of degree no greater than 3 that interpolates
x −1 1 5 −3
y 3 3 −2 4
(Hint: reuse the Newton form of the polynomial from the previous question.)
(5.6) Find the nested form of the polynomial interpolant of the data
x 1 3 4 6
y −3 13 21 1
by completing the following divided differences table:
x f[ ] f[ , ] f[ , , ] f[ , , , ]
1 −3
3 13
4 21
6 1
(5.7) Find the polynomial of degree no greater than 3 that interpolates
x 1 0 3/2 2
y 3 2 37/8 8
(5.8) Complete the divided differences table:
x f[ ] f[ , ] f[ , , ] f[ , , , ]
−1 6
0 3
1 2
2 3
76 CHAPTER 5. INTERPOLATION
(Something a little odd should have happened in the last column.) Find the Newton form of
the polynomial interpolant. Of what degree is the polynomial interpolant?
(5.9) Let p(x) interpolate the function cos(x) at n equally spaced nodes on the interval [0, 2]. Bound
the error
max
0≤x≤2
[p(x) −cos(x)[
as a function of n. How small is the error when n = 10? How small would the error be if
Chebyshev nodes were used instead? How about when n = 10?
(5.10) Let p(x) interpolate the function x
−2
at n equally spaced nodes on the interval [0.5, 1].
Bound the error
max
0.5≤x≤1

p(x) −x
−2

as a function of n. How small is the error when n = 10?
(5.11) How many Chebyshev nodes are required to interpolate the function
1
x
to within 10
−6
on
the interval [1, 2]?
(5.12) Write code to calculate the Newton form coefficients, by divided differences, for the nodes
x
i
and values f(x
i
). Your m-file should have header line like:
function coefs = newtonCoef(xs,fxs)
where xs is the vector of n + 1 nodes, and fxs the vector of n + 1 values. Test your code on
the following input:
octave:1> xs = [1 -1 2 -2 3 -3 4 -4];
octave:2> fxs = [1 1 2 3 5 8 13 21];
octave:3> newtonCoef(xs,fxs)
ans =
1.00000 -0.00000 0.33333 -0.08333 0.05000 0.00417 -0.00020 -0.00040
(a) What do you get when you try the following?
octave:4> xs = [1 4 5 9 14 23 37 60];
octave:5> fxs = [3 1 4 1 5 9 2 6];
octave:6> newtonCoef(xs,ys)
(b) Try the following:
octave:7> xs = [1 3 4 2 8 -2 0 14 23 15];
octave:8> fxs = xs.*xs + xs .+ 4;
octave:9> newtonCoef(xs,fxs)
(5.13) Write code to calculate a polynomial interpolant from its Newton form coefficients and the
node values. Your m-file should have header line like:
function y = calcNewton(t,coefs,xs)
where coefs is a vector of the Newton coefficients, xs is a vector of the nodes x
i
, and y is
the value of the interpolating polynomial at t. Check your code against the following values:
octave:1> xs = [1 3 4];
octave:2> coefs = [6 5 1];
octave:3> calcNewton (5,coefs,xs)
ans = 34
octave:4> calcNewton (-3,coefs,xs)
ans = 10
(a) What do you get when you try the following?
octave:5> xs = [3 1 4 5 9 2 6 8 7 0];
5.2. ERRORS IN POLYNOMIAL INTERPOLATION 77
octave:6> coefs = [1 -1 2 -2 3 -3 4 -4 5 -5];
octave:7> calcNewton(0.5,coefs,xs)
(b) Try the following
octave:8> xs = [3 1 4 5 9 2 6 8 7 0];
octave:9> coefs = [1 -1 2 -2 3 -3 4 -4 5 -5];
octave:10> calcNewton(1,coefs,xs)
78 CHAPTER 5. INTERPOLATION
Chapter 6
Spline Interpolation
Splines are used to approximate complex functions and shapes. A spline is a function consisting of
simple functions glued together. In this way a spline is different from a polynomial interpolation,
which consists of a single well defined function that approximates a given shape; splines are normally
piecewise polynomial.
6.1 First and Second Degree Splines
Splines make use of partitions, which are a way of cutting an interval into a number of subintervals.
Definition 6.1 (Partition). A partition of the interval [a, b] is an ordered sequence ¦t
i
¦
n
i=0
such
that
a = t
0
< t
1
< < t
n−1
< t
n
= b
The numbers t
i
are known as knots.
A spline of degree 1, also known as a linear spline, is a function which is linear on each subinterval
defined by a partition:
Definition 6.2 (Linear Splines). A function S is a spline of degree 1 on [a, b] if
1. The domain of S is [a, b].
2. S is continuous on [a, b].
3. There is a partition ¦t
i
¦
n
i=0
of [a, b] such that on each [t
i
, t
i+1
], S is a linear polynomial.
A linear spline is defined entirely by its value at the knots. That is, given
t t
0
t
1
. . . t
n
y y
0
y
1
. . . y
n
there is only one linear spline with these values at the knots and linear on each given subinterval.
For a spline with this data, the linear polynomial on each subinterval is defined as
S
i
(x) = y
i
+
y
i+1
−y
i
t
i+1
−t
i
(x −t
i
) .
Note that if x ∈ [t
i
, t
i+1
] , then x − t
i
> 0, but x − t
i−1
≤ 0. Thus if we wish to evaluate S(x), we
search for the largest i such that x −t
i
> 0, then evaluate S
i
(x).
Example 6.3. The linear spline for the following data
t 0.0 0.1 0.4 0.5 0.75 1.0
y 1.3 4.5 2.0 2.1 5.0 3
is shown in Figure 6.1.
79
80 CHAPTER 6. SPLINE INTERPOLATION
0
1
2
3
4
5
6
0 0.2 0.4 0.6 0.8 1
Figure 6.1: A linear spline. The spline is piecewise linear, and is linear between each knot.
6.1.1 First Degree Spline Accuracy
As with polynomial functions, splines are used to interpolate tabulated data as well as functions. In
the latter case, if the spline is being used to interpolate the function f, say, then this is equivalent
to interpolating the data
t t
0
t
1
. . . t
n
y f(t
0
) f(t
1
) . . . f(t
n
)
A function and its linear spline interpolant are shown in Figure 6.2. The spline interpolant in
that figure is fairly close to the function over some of the interval in question, but it also deviates
greatly from the function at other points of the interval. We are interested in finding bounds on
the possible error between a function and its spline interpolant.
To find the error bound, we will consider the error on a single interval of the partition, and use
a little calculus.
1
Suppose p(t) is the linear polynomial interpolating f(t) at the endpoints of the
subinterval [t
i
, t
i+1
], then for t ∈ [t
i
, t
i+1
] ,
[f(t) −p(t)[ ≤ max ¦[f(t) −f(t
i
)[ , [f(t) −f(t
i+1
)[¦ .
That is, [f(t) −p(t)[ is no larger than the “maximum variation” of f(t) on this interval.
In particular, if f
t
(t) exists and is bounded by M
1
on [t
i
, t
i+1
], then
[f(t) −p(t)[ ≤
M
1
2
(t
i+1
−t
i
) .
Similarly, if f
tt
(t) exists and is bounded by M
2
on [t
i
, t
i+1
], then
[f(t) −p(t)[ ≤
M
2
8
(t
i+1
−t
i
)
2
.
Over a given partition, these become, respectively
1
Although we could directly claim Theorem 5.8, it is a bit of overkill.
6.1. FIRST AND SECOND DEGREE SPLINES 81
2
2.5
3
3.5
4
4.5
5
5.5
6
0 0.2 0.4 0.6 0.8 1
P
S
f
r
a
g
r
e
p
l
a
c
e
m
e
n
t
s
function
spline interpolant
Figure 6.2: The function f(t) = 4.1 + sin (1/(0.08t + 0.5)) is shown, along with the linear spline
interpolant, for the knots 0, 0.1, 0.4, 0.6, 0.75, 0.9, 1.0 For some values of t, the function and its
interpolant are very close, while they vary greatly in the middle of the interval.
[f(t) −p(t)[ ≤
M
1
2
max
0≤i<n
(t
i+1
−t
i
) ,
[f(t) −p(t)[ ≤
M
2
8
max
0≤i<n
(t
i+1
−t
i
)
2
.
(6.1)
If equally spaced nodes are used, these bounds guarantee that spline interpolants become better
as the number of nodes is increased. This contrasts with polynomial interpolants, which may get
worse as the number of nodes is increased, cf. Example 5.7.
6.1.2 Second Degree Splines
Piecewise quadratic splines, or splines of degree 2, are defined similarly:
Definition 6.4 (Quadratic Splines). A function Q is a quadratic spline on [a, b] if
1. The domain of Q is [a, b].
2. Q is continuous on [a, b].
3. Q
t
is continuous on (a, b).
4. There is a partition ¦t
i
¦
n
i=0
of [a, b] such that on [t
i
, t
i+1
], Q is a polynomial of degree at most
2.
Example 6.5. The following is a quadratic splne:
Q(x) =

−x x ≤ 0,
x
2
−x 0 ≤ x ≤ 2,
−x
2
+ 7x −8 2 ≤ x.
Unlike linear splines, quadratic splines are not defined entirely by their values at the knots. We
consider why that is. The spline Q(x) is defined by its piecewise polynomials,
Q
i
(x) = a
i
x
2
+b
i
x +c
i
.
82 CHAPTER 6. SPLINE INTERPOLATION
Thus there are 3n parameters to define Q(x).
For each of the n subintervals, the data
t t
0
t
1
. . . t
n
y y
0
y
1
. . . y
n
give two equations regarding Q
i
(x), namely that Q
i
(t
i
) must equal y
i
and Q
i
(t
i+1
) must equal y
i+1
.
This is 2n equations. The condition on continuity of Q
t
gives a single equation for each of the n−1
internal nodes. This totals 3n −1 equations, but 3n unknowns. This system is underdetermined.
Thus some additional user-chosen condition is required to determine the quadratic spline. One
might choose, for example, Q
t
(a) = 0, or Q
tt
(a) = 0, or some other condition.
6.1.3 Computing Second Degree Splines
Suppose the data
t t
0
t
1
. . . t
n
y y
0
y
1
. . . y
n
are given. Let z
i
= Q
t
i
(t
i
), and suppose that the additional condition to define the quadratic spline
is given by specifying z
0
. We want to be able to compute the form of Q
i
(x).
Because Q
i
(t
i
) = y
i
, Q
t
i
(t
i
) = z
i
, Q
t
i
(t
i+1
) = z
i+1
, we see that we can define
Q
i
(x) =
z
i+1
−z
i
2 (t
i+1
−t
i
)
(x −t
i
)
2
+z
i
(x −t
i
) +y
i
.
Use this at t
i+1
:
y
i+1
= Q
i
(t
i+1
) =
z
i+1
−z
i
2 (t
i+1
−t
i
)
(t
i+1
−t
i
)
2
+z
i
(t
i+1
−t
i
) +y
i
,
y
i+1
−y
i
=
z
i+1
−z
i
2
(t
i+1
−t
i
) +z
i
(t
i+1
−t
i
) ,
y
i+1
−y
i
=
z
i+1
+z
i
2
(t
i+1
−t
i
) .
Thus we can determine, from the data alone, z
i+1
from z
i
:
z
i+1
= 2
y
i+1
−y
i
t
i+1
−t
i
−z
i
.
6.2 (Natural) Cubic Splines
If you recall the definition of the linear and quadratic splines, probably you can guess the definition
of the spline of degree k:
Definition 6.6 (Splines of Degree k). A function S is a spline of degree k on [a, b] if
1. The domain of S is [a, b].
2. S, S
t
, S
tt
, . . . , S
(k−1)
are continuous on (a, b).
3. There is a partition ¦t
i
¦
n
i=0
of [a, b] such that on [t
i
, t
i+1
], S is a polynomial of degree ≤ k.
You would also expect that a spline of degree k has k − 1 “degrees of freedom,” as we show
here. If the partition has n+1 knots, the spline of degree k is defined by n(k +1) parameters. The
given data
t t
0
t
1
. . . t
n
y y
0
y
1
. . . y
n
6.2. (NATURAL) CUBIC SPLINES 83
provide 2n equations. The continuity of S
t
, S
tt
, . . . , S
(k−1)
at the n − 1 internal knots gives (k −
1)(n − 1) equations. This is a total of n(k + 1) − (k − 1) equations. Thus we have k − 1 more
unknowns than equations. Thus, barring some singularity, we can (and must) add k−1 constraints
to uniquely define the spline. These are the degrees of freedom.
Often k is chosen as 3. This yields cubic splines. We must add 2 extra constraints to define the
spline. The usual choice is to make
S
tt
(t
0
) = S
tt
(t
n
) = 0.
This yields the natural cubic spline.
6.2.1 Why Natural Cubic Splines?
It turns out that natural cubic splines are a good choice in the sense that they are the “inter-
polant of minimal H
2
seminorm.” The corollary following this theorem states this in more easily
understandable terms:
Theorem 6.7. Suppose f has two continuous derivatives, and S is the natural cubic spline inter-
polating f at knots a = t
0
< t
1
< . . . < t
n
= b. Then

b
a

S
tt
(x)

2
dx ≤

b
a

f
tt
(x)

2
dx
Proof. We let g(x) = f(x) −S(x). Then g(x) is zero on the (n+1) knots t
i
. Derivatives are linear,
meaning that
f
tt
(x) = S
tt
(x) +g
tt
(x).
Then

b
a

f
tt
(x)

2
dx =

b
a

S
tt
(x)

2
dx +

b
a

g
tt
(x)

2
dx +

b
a
2S
tt
(x)g
tt
(x) dx.
We show that the last integral is zero. Integrating by parts we get

b
a
S
tt
(x)g
tt
(x) dx = S
tt
g
t

b
a

b
a
S
ttt
g
t
dx = −

b
a
S
ttt
g
t
dx,
because S
tt
(a) = S
tt
(b) = 0. Then notice that S is a polynomial of degree ≤ 3 on each interval, thus
S
ttt
(x) is a piecewise constant function, taking value c
i
on each interval [t
i
, t
i+1
]. Thus

b
a
S
ttt
g
t
dx =
n−1
¸
i=0

b
a
c
i
g
t
dx =
n−1
¸
i=0
c
i
g

t
i+1
t
i
= 0,
with the last equality holding because g(x) is zero at the knots.
Corollary 6.8. The natural cubic spline is best twice-continuously differentiable interpolant for a
twice-continuously differentiable function, under the measure given by the theorem.
Proof. Let f be twice-continuously differentiable, and let S be the natural cubic spline interpolating
f(x) at some given nodes ¦t
i
¦
n
i=0
. Let R(x) be some twice-continuously differentiable function
which also interpolates f(x) at these nodes. Then S(x) interpolates R(x) at these nodes. Apply
the theorem to get

b
a

S
tt
(x)

2
dx ≤

b
a

R
tt
(x)

2
dx
84 CHAPTER 6. SPLINE INTERPOLATION
6.2.2 Computing Cubic Splines
First we present an example of computing the natural cubic spline by hand:
Example Problem 6.9. Construct the natural cubic spline for the following data:
t −1 0 2
y 3 −1 3
Solution: The natural cubic spline is defined by eight parameters:
S(x) =

ax
3
+bx
2
+cx +d x ∈ [−1, 0]
ex
3
+fx
2
+gx +h x ∈ [0, 2]
We interpolate to find that d = h = −1 and
−a +b −c −1 = 3
8e + 4f + 2g −1 = 3
We take the derivative of S:
S
t
(x) =

3ax
2
+ 2bx +c x ∈ [−1, 0]
3ex
2
+ 2fx +g x ∈ [0, 2]
Continuity at the middle node gives c = g. Now take the second derivative of S:
S
tt
(x) =

6ax + 2b x ∈ [−1, 0]
6ex + 2f x ∈ [0, 2]
Continuity at the middle node gives b = f. The natural cubic spline condition gives −6a + 2b = 0
and 12e + 2f = 0. Solving this by “divide and conquer” gives
S(x) =

x
3
+ 3x
2
−2x −1 x ∈ [−1, 0]

1
2
x
3
+ 3x
2
−2x −1 x ∈ [0, 2]
¬
Finding the constants for the previous example was fairly tedious. And this is for the case of
only three nodes. We would like a method easier than setting up the 4n equations and unknowns,
something akin to the description in Subsection 6.1.3. The method is rather tedious, so we leave it
to the exercises.
6.3 B Splines
The B splines form a basis for spline functions, whence the name. We presuppose the existence of
an infinite number of knots:
. . . < t
2
< t
1
< t
0
< t
1
< t
2
< . . . ,
with lim
k→−∞
t
k
= −∞ and lim
k→∞
t
k
= ∞.
The B splines of degree 0 are defined as single “blocks”:
B
0
i
(x) =

1 t
i
≤ x < t
i+1
0 otherwise
6.3. B SPLINES 85
The zero degree B splines are continuous from the right, are nonzero only on one subinterval
[t
i
, t
i+1
), sum to 1 everywhere.
We justify the description of B splines as basis splines: If S is a spline of degree 0 on the given
knots and is continuous from the right then
S(x) =
¸
i
S(x
i
)B
0
i
(x).
That is, the basis splines work in the same way that Lagrange Polynomials worked for polynomial
interpolation.
The B splines of degree k are defined recursively:
B
k
i
(x) =

x −t
i
t
i+k
−t
i

B
k−1
i
(x) +

t
i+k+1
−x
t
i+k+1
−t
i+1

B
k−1
i+1
(x).
Some B splines are shown in Figure 6.3.
The B splines quickly become unwieldy. We focus on the case k = 1. The B spline B
1
i
(x) is
• Piecewise linear.
• Continuous.
• Nonzero only on (t
i
, t
i+2
).
• 1 at t
i+1
.
These B splines are sometimes called hat functions. Imagine wearing a hat shaped like this! What-
ever.
The nice thing about the hat functions is they allow us to use analogy. Harken back to polyno-
mial interpolation and the Lagrange Functions. The hat functions play a similar role because
B
1
i
(t
j
) = δ
(i+1)j
=

1 (i + 1) = j
0 (i + 1) = j
Then if we want to interpolate the following data with splines of degree 1:
t t
0
t
1
. . . t
n
y y
0
y
1
. . . y
n
We can immediately set
S(x) =
n
¸
i=0
y
i
B
1
i−1
(x).
86 CHAPTER 6. SPLINE INTERPOLATION
-1
-0.5
0
0.5
1
1.5
2
2.5
3
0 1 2 3 4 5 6
(a) Degree 0 B spline
-1
-0.5
0
0.5
1
1.5
2
2.5
3
0 1 2 3 4 5 6
(b) Degree 1 B spline
-1
-0.5
0
0.5
1
1.5
2
2.5
3
0 1 2 3 4 5 6
P
S
f
r
a
g
r
e
p
l
a
c
e
m
e
n
t
s
B
1
0
(x)
B
1
1
(x)
B
2
0
(x)
(c) Degree 2 B spline, with 2 Degree 1 B splines
Figure 6.3: Some B splines for the knots t
0
= 1, t
1
= 2.5, t
2
= 3.5, t
3
= 4 are shown. In (a), one of
the degree 0 B splines; in (b), a degree 1 B spline; in (c), two of the degree 1 B splines and a degree
2 B spline are shown. The two hat functions “merge” together to give the quadratic B spline.
6.3. B SPLINES 87
Exercises
(6.1) Is the following function a linear spline on [0, 4]? Why or why not?
S(x) =

3x + 2 : 0 ≤ x < 1
−2x + 4 : 1 ≤ x ≤ 4
(6.2) Is the following function a linear spline on [0, 2]? Why or why not?
S(x) =

x + 3 : 0 ≤ x < 1
3 : 1 ≤ x ≤ 2
(6.3) Is the following function a linear spline on [0, 4]? Why or why not?
S(x) =

x
2
+ 3 : 0 ≤ x < 3
5x −6 : 3 ≤ x ≤ 4
(6.4) Find constants, α, β such that the following is a linear spline on [0, 5].
S(x) =

4x −2 : 0 ≤ x < 1
αx +β : 1 ≤ x < 3
−2x + 10 : 3 ≤ x ≤ 5
(6.5) Is the following function a quadratic spline on [0, 4]? Why or why not?
Q(x) =

x
2
+ 3 : 0 ≤ x < 3
5x −6 : 3 ≤ x ≤ 4
(6.6) Is the following function a quadratic spline on [0, 2]? Why or why not?
Q(x) =

x
2
+ 3x + 2 : 0 ≤ x < 1
2x
2
+x + 3 : 1 ≤ x ≤ 2
(6.7) Find constants, α, β, γ such that the following is a quadratic spline on [0, 5].
Q(x) =

1
2
x
2
+ 2x +
3
2
: 0 ≤ x < 1
αx
2
+βx +γ : 1 ≤ x < 3
3x
2
−7x + 12 : 3 ≤ x ≤ 5
(6.8) Find the quadratic spline that interpolates the following data:
t 0 1 4
y 1 −2 1
To resolve the single degree of freedom, assume that Q
t
(0) = −Q
t
(4). Assume your solution
takes the form
Q(x) =

α
1
(x −1)
2

1
(x −1) −2 : 0 ≤ x < 1
α
2
(x −1)
2

2
(x −1) −2 : 1 ≤ x ≤ 4
Find the constants α
1
, β
1
, α
2
, β
2
.
88 CHAPTER 6. SPLINE INTERPOLATION
(6.9) Find the natural cubic spline that interpolates the data
x 0 1 3
y 4 2 7
It may help to assume your answer has the form
S(x) =

Ax
3
+Bx
2
+Cx + 4 : 0 ≤ x < 1
D(x −1)
3
+E(x −1)
2
+F(x −1) + 2 : 1 ≤ x ≤ 3
Chapter 7
Approximating Derivatives
7.1 Finite Differences
Suppose we have some blackbox function f(x) and we wish to calculate f
t
(x) at some given x. Not
surprisingly, we start with Taylor’s theorem:
f(x +h) = f(x) +f
t
(x)h +
f
tt
(ξ)h
2
2
.
Rearranging we get
f
t
(x) =
f(x +h) −f(x)
h

f
tt
(ξ)h
2
.
Remember that ξ is between x and x +h, but its exact value is not known. If we wish to calculate
f
t
(x), we cannot evaluate f
tt
(ξ), so we approximate the derivative by dropping the last term. That
is, we calculate [f(x +h) −f(x)] /h as an approximation
1
to f
t
(x). In so doing, we have dropped
the last term. If there is a finite bound on f
tt
(z) on the interval in question then the dropped term
is bounded by a constant times h. That is,
f
t
(x) =
f(x +h) −f(x)
h
+O(h) (7.1)
The error that we incur when we approximate f
t
(x) by calculating [f(x +h) −f(x)] /h is called
truncation error. It has nothing to do with the kind of error that you get when you do calculations
with a computer with limited precision; even if you worked in infinite precision, you would still
have truncation error.
The truncation error can be made small by making h small. However, as h gets smaller, precision
will be lost in equation 7.1 due to subtractive cancellation. The error in calculation for small h
is called roundoff error. Generally the roundoff error will increase as h decreases. Thus there is
a nonzero h for which the sum of these two errors is minimized. See Example Problem 7.5 for an
example of this.
The truncation error for this approximation is O(h). We may want a more precise approxima-
tion. By now, you should know that any calculation starts with Taylor’s Theorem:
f(x +h) = f(x) +f
t
(x)h +
f
tt
(x)
2
h
2
+
f
ttt

1
)
3!
h
3
f(x −h) = f(x) −f
t
(x)h +
f
tt
(x)
2
h
2

f
ttt

2
)
3!
h
3
1
This approximation for f

(x) should remind you of the definition of f

(x) as a limit.
89
90 CHAPTER 7. APPROXIMATING DERIVATIVES
By subtracting these two lines, we get
f(x +h) −f(x −h) = 2f
t
(x)h +
f
ttt

1
) +f
ttt

2
)
3!
h
3
.
Thus
2f
t
(x)h = f(x +h) −f(x −h) −
f
ttt

1
) +f
ttt

2
)
3!
h
3
f
t
(x) =
f(x +h) −f(x −h)
2h

¸
f
ttt

1
) +f
ttt

2
)
2

h
2
6
If f
ttt
(x) is continuous, then there is some ξ between ξ
1
, ξ
2
such that f
ttt
(ξ) =
f


1
)+f


2
)
2
. (This
is the MVT at work.) Assuming some uniform bound on f
ttt
(), we get
f
t
(x) =
f(x +h) −f(x −h)
2h
+O

h
2

(7.2)
In some situations it may be necessary to use evaluations of a function at “odd” places to
approximate a derivative. These are usually straightforward to derive, involving the use of Taylor’s
Theorem. The following examples illustrate:
Example Problem 7.1. Use evaluations of f at x+h and x+2h to approximate f
t
(x), assuming
f(x) is an analytic function, i.e., one with infintely many derivatives.
Solution: First use Taylor’s Theorem to expand f(x+h) and f(x+2h), then subtract to get some
factor of f
t
(x):
f(x + 2h) = f(x) + 2hf
t
(x) +
4h
2
2!
f
tt
(x) +
8h
3
3!
f
ttt
(x) +
16h
4
4!
f
(4)
(x) +. . .
f(x +h) = f(x) +hf
t
(x) +
h
2
2!
f
tt
(x) +
h
3
3!
f
ttt
(x) +
h
4
4!
f
(4)
(x) +. . .
f(x + 2h) −f(x +h) = hf
t
(x) +
3h
2
2!
f
tt
(x) +
7h
3
3!
f
ttt
(x) +
15h
4
4!
f
(4)
(x) +. . .
(f(x + 2h) −f(x +h)) /h = f
t
(x) +
3h
2!
f
tt
(x) +
7h
2
3!
f
ttt
(x) +
15h
3
4!
f
(4)
(x) +. . .
Thus (f(x + 2h) −f(x +h)) /h = f
t
(x) +O(h) ¬
Example Problem 7.2. Show that
4f(x +h) −f(x + 2h) −3f(x)
2h
= f
t
(x) +O

h
2

,
for f with sufficient number of derivatives
Solution: In this case we do not have to find the approximation scheme, it is given to us. We only
have to expand the appropriate terms with Taylor’s Theorem. As before:
f(x +h) = f(x) +hf
t
(x) +
h
2
2!
f
tt
(x) +
h
3
3!
f
ttt
(x) +. . .
4f(x +h) = 4f(x) + 4hf
t
(x) +
4h
2
2!
f
tt
(x) +
4h
3
3!
f
ttt
(x) +. . .
f(x + 2h) = f(x) + 2hf
t
(x) +
4h
2
2!
f
tt
(x) +
8h
3
3!
f
ttt
(x) +. . .
4f(x +h) −f(x + 2h) = 3f(x) + 2hf
t
(x) + 0f
tt
(x) +
−4h
3
3!
f
ttt
(x) +. . .
4f(x +h) −f(x + 2h) −3f(x) = 2hf
t
(x) +
−4h
3
3!
f
ttt
(x) +. . .
(4f(x +h) −f(x + 2h) −3f(x)) /2h = f
t
(x) +
−2h
2
3!
f
ttt
(x) +. . .
¬
7.2. RICHARDSON EXTRAPOLATION 91
7.1.1 Approximating the Second Derivative
Suppose we want to approximate the second derivative of some blackbox function f(x). Again,
start with Taylor’s Theorem:
f(x +h) = f(x) +f
t
(x)h +
f
tt
(x)
2
h
2
+
f
ttt
(x)
3!
h
3
+
f
(4)
(x)
4!
h
4
+. . .
f(x −h) = f(x) −f
t
(x)h +
f
tt
(x)
2
h
2

f
ttt
(x)
3!
h
3
+
f
(4)
(x)
4!
h
4
−. . .
Now add the two series to get
f(x +h) +f(x −h) = 2f(x) +h
2
f
tt
(x) + 2
f
(4)
(x)
4!
h
4
+ 2
f
(6)
(x)
6!
h
6
+. . .
Then let
ψ(h) =
f(x +h) −2f(x) +f(x −h)
h
2
= f
tt
(x) + 2
f
(4)
(x)
4!
h
2
+ 2
f
(6)
(x)
6!
h
4
+. . . ,
= f
tt
(x) +

¸
k=1
b
2k
h
2k
.
Thus we can use Richardson Extrapolation on ψ(h) to get higher order approximations.
This derivation also gives us the centered difference approximation to the second derivative:
f
tt
(x) =
f(x +h) −2f(x) +f(x −h)
h
2
+O

h
2

. (7.3)
7.2 Richardson Extrapolation
The centered difference approximation gives a truncation error of O

h
2

, which is better than
O(h) . Can we do better? Let’s define
φ(h) =
1
2h
[f(x +h) −f(x −h)] .
Had we expanded the Taylor’s Series for f(x +h), f(x −h) to more terms we would have seen
that
φ(h) = f
t
(x) +a
2
h
2
+a
4
h
4
+a
6
h
6
+a
8
h
8
+. . .
The constants a
i
are a function of f
(i+1)
(x) only. (In fact, they should take the value of
f
(i+1)
(x)
(i+1)!
.)
What happens if we now calculate φ(h/2)?
φ(h/2) = f
t
(x) +
1
4
a
2
h
2
+
1
16
a
4
h
4
+
1
64
a
6
h
6
+
1
256
a
8
h
8
+. . .
But we can combine this with φ(h) to get better accuracy. We have to be a little tricky, but we
can get the O

h
2

terms to cancel by taking the right multiples of these two approximations:
φ(h) −4φ(h/2) = −3f
t
(x) +
3
4
4h
4
+
15
16
a
6
h
6
+
63
64
a
8
h
8
+. . .
4φ(h/2) −φ(h)
3
= f
t
(x) −
1
4
4h
4

5
16
a
6
h
6

21
64
a
8
h
8
+. . .
92 CHAPTER 7. APPROXIMATING DERIVATIVES
This approximation has a truncation error of O

h
4

.
This technique of getting better approximations is known as the Richardson Extrapolation, and
can be repeatedly applied. We will also use this technique later to get better quadrature rules–that
is, ways of approximating the definite integral of a function.
7.2.1 Abstracting Richardson’s Method
We now discuss Richardson’s Method in a more abstract framework. Suppose you want to calculate
some quantity L, and have found, through theory, some approximation:
φ(h) = L +

¸
k=1
a
2k
h
2k
.
Let
D(n, 0) = φ

h
2
n

.
Now define
D(n, m) =
4
m
D(n, m−1) −D(n −1, m−1)
4
m
−1
. (7.4)
We will be interested in calculating D(n, n) for some n. We claim that
D(n, n) = L +O

h
2(n+1)

.
First we examine the recurrence for D(n, m). As in divided differences, we use a pyramid table:
D(0, 0)
D(1, 0) D(1, 1)
D(2, 0) D(2, 1) D(2, 2)
.
.
.
.
.
.
.
.
.
.
.
.
D(n, 0) D(n, 1) D(n, 2) D(n, n)
By definition we know how to calculate the first column of this table; every other entry in the table
depends on two other entries, one directly to the left, and the other to the left and up one space.
Thus to calculate D(n, n) we have to compute this whole lower triangular array.
We want to show that D(n, n) = L+O

h
2(n+1)

, that is D(n, n) is a O

h
2(n+1)

approximation
to L. The following theorem gives this result:
Theorem 7.3 (Richardson Extrapolation). There are constants a
k,m
such that
D(n, m) = L +

¸
k=m+1
a
k,m

h
2
n

2k
(0 ≤ m ≤ n) .
The proof is by an easy, but tedious, induction. We skip the proof.
7.2. RICHARDSON EXTRAPOLATION 93
7.2.2 Using Richardson Extrapolation
We now try out the technique on an example or two.
Example Problem 7.4. Approximate the derivative of f(x) = log x at x = 1.
Solution: The real answer is f
t
(1) = 1/1 = 1, but our computer doesn’t know that. Define
φ(h) =
1
2h
[f(1 +h) −f(1 −h)] =
log
1+h
1−h
2h
.
Let’s use h = 0.1. We now try to find D(2, 2), which is supposed to be a O

h
6

approximation to
f
t
(1) = 1:
n`m 0 1 2
0
log
1.1
0.9
0.2
≈ 1.003353477
1
log
1.05
0.95
0.1
≈ 1.000834586 ≈ 0.999994954
2
log
1.025
0.975
0.05
≈ 1.000208411 ≈ 0.999999686 ≈ 1.000000002
This shows that the Richardson method is pretty good. However, notice that for this simple
example, we have, already, that φ(0.00001) ≈ 0.999999999. ¬
Example Problem 7.5. Consider the ugly function:
f(x) = arctan(x).
Attempt to find f
t
(

2). Recall that f
t
(x) =
1
1+x
2
, so the value that we are seeking is
1
3
.
Solution: Let’s use h = 0.01. We now try to find D(2, 2), which is supposed to be a O

h
6

approximation to
1
3
:
n`m 0 1 2
0 0.333339506181068
1 0.333334876543723 0.333333333331274
2 0.33333371913582 0.333333333333186 0.333333333333313
Note that we have some motivation to use Richardson’s method in this case: If we let
φ(h) =
1
2h

f(

2 +h) −f(

2 −h)

,
then making h small gives a good approximation to f
t

2

until subtractive cancelling takes over.
The following table illustrates this:
94 CHAPTER 7. APPROXIMATING DERIVATIVES
h φ(h)
1.0 0.39269908169872408
0.1 0.33395069677431943
0.01 0.33333950618106845
0.001 0.33333339506169679
0.0001 0.33333333395058062
1 10
−5
0.33333333334106813
1 10
−6
0.33333333332441484
1 10
−7
0.33333333315788138
1 10
−8
0.33333332760676626
1 10
−9
0.33333336091345694
1 10
−10
0.333333360913457
1 10
−11
0.333333360913457
1 10
−12
0.33339997429493451
1 10
−13
0.33306690738754696
1 10
−14
0.33306690738754696
1 10
−15
0.33306690738754691
1 10
−16
0
The data are illustrated in Figure 7.1. Notice that φ(h) gives at most 10 decimal places of
accuracy, then begins to deteriorate; Note however, we get 13 decimal places from D(2, 2). ¬
1e-12
1e-10
1e-08
1e-06
0.0001
0.01
1
1e-16 1e-14 1e-12 1e-10 1e-08 1e-06 0.0001 0.01 1
P
S
f
r
a
g
r
e
p
l
a
c
e
m
e
n
t
s
total error
Figure 7.1: The total error for the centered difference approximation to f
t
(

2) is shown versus
h. The total error is the sum of a truncation term which decreases as h decreases, and a roundoff
term which increases. The optimal h value is around 1 10
−5
. Note that Richardson’s D(2, 2)
approximation with h = 0.01 gives much better results than this optimal h.
7.2. RICHARDSON EXTRAPOLATION 95
Exercises
(7.1) Derive the approximation
f
t
(x) ≈
4f(x +h) −3f(x) −f(x −2h)
6h
using Taylor’s Theorem.
(a) Assuming that f(x) has bounded derivatives, give the accuracy of the above approxi-
mation. Your answer should be something like O

h
?

.
(b) Let f(x) = x
3
. Approximate f
t
(0) with this approximation, using h =
1
4
.
(7.2) Let f(x) be an analytic function, i.e., one which is infinitely differentiable. Let ψ(h) be the
centered difference approximation to the first derivative:
ψ(h) =
f(x +h) −f(x −h)
2h
(a) Show that ψ(h) = f
t
(x) +
h
2
3!
f
ttt
(x) +
h
4
5!
f
(5)
(x) +
h
6
7!
f
(7)
(x) +. . .
(b) Show that
8 (ψ(h) −ψ(h/2))
h
2
= f
ttt
(x) +O

h
2

.
(7.3) Derive the approximation
f
t
(x) ≈
4f(x + 3h) + 5f(x) −9f(x −2h)
30h
using Taylor’s Theorem.
(a) What order approximation is this? (Assume f(x) has bounded derivatives of arbitrary
order.)
(b) Use this formula to approximate f
t
(0), where f(x) = x
4
, and h = 0.1
(7.4) Suppose you want to know quantity Q, and can approximate it with some formula, say φ(h),
which depends on parameter h, and such that φ(h) = Q + a
1
h + a
2
h
2
+ a
3
h
3
+ a
4
h
4
+ . . .
Find some linear combination of φ(h) and φ(−h) which is a O

h
2

approximation to Q.
(7.5) Assuming that φ(h) = Q + a
2
h
2
+ a
4
h
4
+ a
6
h
6
. . ., find some combination of φ(h), φ(h/3)
which is a O

h
4

approximation to Q.
(7.6) Let λ be some number in (0, 1). Assuming that φ(h) = Q+a
2
h
2
+a
4
h
4
+a
6
h
6
. . ., find some
combination of φ(h), φ(λh) which is a O

h
4

approximation to Q. To make the constant
associated with the h
4
term small in magnitude, what should you do with λ? Is this practical?
Note that the method of Richardson Extrapolation that we considered used the value λ = 1/2.
(7.7) Assuming that φ(h) = Q + a
2
h
2
+ a
4
h
4
+ a
6
h
6
. . ., find some combination of φ(h), φ(h/4)
which is a O

h
4

approximation to Q.
(7.8) Suppose you have some great computational approximation to the quantity Q such that
ψ(h) = Q +a
3
h
3
+ a
6
h
6
+ a
9
h
9
. . . Can you find some combination of ψ(h), ψ(h/2) which is
a O

h
6

approximation to Q?
(7.9) Complete the following Richardson’s Extrapolation Table, assuming the first column consists
of values D(n, 0) for n = 0, 1, 2:
n`m 0 1 2
0 2
1 1.5 ?
2 1.25 ? ?
(See equation 7.4 if you’ve forgotten the definitions.)
96 CHAPTER 7. APPROXIMATING DERIVATIVES
(7.10) Write code to complete a Richardson’s Method table, given the first column. Your m-file
should have header line like:
function Dnn = richardsons(col0)
where Dnn is the value at the lower left corner of the table, D(n, n) while col0 is the column
of n + 1 values D(i, 0), for i = 0, 1, . . . , n. Test your code on the following input:
octave:1> col0 = [1 0.5 0.25 0.125 0.0625 0.03125];
octave:2> richardsons(col0)
ans = 0.019042
(a) What do you get when you try the following?
octave:5> col0 = [1.5 0.5 1.5 0.5 1.5 0.5 1.5];
octave:6> richardsons(col0)
(b) What do you get when you try the following?
octave:7> col0 = [0.9 0.99 0.999 0.9999 0.99999];
octave:8> richardsons(col0)
Chapter 8
Integrals and Quadrature
8.1 The Definite Integral
Often enough the numerical analyst is presented with the challenge of finding the definite integral
of some function:

b
a
f(x) dx.
In your golden years of Calculus, you learned the Fundamental Theorem of Calculus, which claims
that if f(x) is continuous, and F(x) is an antiderivative of f(x), then

b
a
f(x) dx = F(b) −F(a).
What you might not have been told in Calculus is there are some functions for which a closed
form antiderivative does not exist or at least is not known to humankind. Nevertheless, you may
find yourself in a situation where you have to evaluate an integral for just such an integrand. An
approximation will have to do.
8.1.1 Upper and Lower Sums
We will review the definition of the Riemann integral of a function. A partition of an interval [a, b]
is a finite, ordered collection of nodes x
i
:
a = x
0
< x
1
< x
2
< < x
n
= b.
Given such a partition, P, define the upper and lower bounds on each subinterval [x
j
, x
j+1
] as
follows:
m
i
= inf ¦f(x) [ x
i
≤ x ≤ x
i+1
¦
M
i
= sup¦f(x) [ x
i
≤ x ≤ x
i+1
¦
Then for this function f and partition P, define the upper and lower sums:
L(f, P) =
n−1
¸
i=0
m
i
(x
i+1
−x
i
)
U(f, P) =
n−1
¸
i=0
M
i
(x
i+1
−x
i
)
97
98 CHAPTER 8. INTEGRALS AND QUADRATURE
We can interpret the upper and lower sums graphically as the sums of areas of rectangles defined
by the function f and the partition P, as in Figure 8.1.
bottomright

(a) The Lower Sum
bottomright

(b) The Upper Sum
Figure 8.1: The (a) lower, and (b) upper sums of a function on a given interval are shown. These
approximations to the integral are the sums of areas of rectangles. Note that the lower sums are
an underestimate, and the upper sums an overestimate of the integral.
Notice a few things about the upper, lower sums:
(i) L(f, P) ≤ U(f, P).
(ii) If we switch to a “better” partition (i.e., a finer one), we expect that L(f, ) increases and
U(f, ) decreases.
The notion of integrability familiar from Calculus class (that is Riemann Integrability) is defined
in terms of the upper and lower sums.
Definition 8.1. A function f is Riemann Integrable over interval [a, b] if
sup
P
L(f, P) = inf
P
U(f, P),
where the supremum and infimum are over all partitions of the interval [a, b]. Moreover, in case
f(x) is integrable, we define the integral

b
a
f(x) dx = inf
P
U(f, P),
You may recall the following
Theorem 8.2. Every continuous function on a closed bounded interval of the real line is Riemann
Integrable (on that interval).
Continuity is sufficient, but not necessary.
Example 8.3. Consider the Heaviside function:
f(x) =

0 x < 0
1 0 ≤ x
This function is not continuous on any interval containing 0, but is Riemann Integrable on every
closed bounded interval.
8.1. THE DEFINITE INTEGRAL 99
Example 8.4. Consider the Dirichlet function:
f(x) =

0 x rational
1 x irrational
For any partition P of any interval [a, b], we have L(f, P) = 0, while U(f, P) = 1, so
sup
P
L(f, P) = 0 = 1 = inf
P
U(f, P),
so this function is not Riemann Integrable.
8.1.2 Approximating the Integral
The definition of the integral gives a simple method of approximating an integral

b
a
f(x) dx. The
method cuts the interval [a, b] into a partition of n equal subintervals x
i
= a+
b−a
n
, for i = 0, 1, . . . , n.
The algorithm then has to somehow find the supremum and infimum of f(x) on each interval
[x
i
, x
i+1
]. The integral is then approximated by the mean of the lower and upper sums:

b
a
f(x) dx ≈
1
2
(L(f, P) +U(f, P)) .
Because the value of the integral is between L(f, P) and U(f, P), this approximation has error at
most
1
2
(U(f, P) −L(f, P)) .
Note that in general, or for a black box function, it is usually not feasible to find the suprema
and infima of f(x) on the subintervals, and thus the lower and upper sums cannot be calculated.
However, if some information is known about the function, it becomes easier:
Example 8.5. Consider for example, using this method on some function f(x) which is monotone
increasing, that is x ≤ y implies f(x) ≤ f(y). In this case, the infimum of f(x) on each interval
occurs at the leftmost endpoint, while the supremum occurs at the right hand endpoint. Thus for
this partition, P, we have
L(f, P) =
n−1
¸
k=0
m
i
[x
k+1
−x
k
[ =
[b −a[
n
n−1
¸
k=0
f(x
k
)
U(f, P) =
n−1
¸
k=0
M
i
[x
k+1
−x
k
[ =
[b −a[
n
n−1
¸
k=0
f(x
k+1
) =
[b −a[
n
n
¸
k=1
f(x
k
)
Then the error of the approximation is
1
2
(U(f, P) −L(f, P)) =
1
2
[b −a[
n
[f(x
n
) −f(x
0
)] =
[b −a[ [f(b) −f(a)]
2n
.
8.1.3 Simple and Composite Rules
For the remainder of this chapter we will study “simple” quadrature rules, i.e., rules which approx-
imate the integral of a function, f(x) over an interval [a, b] by means of a number of evaluations of
f at points in this interval. The error of a simple quadrature rule usually depends on the function
100 CHAPTER 8. INTEGRALS AND QUADRATURE
f, and the width of the interval [a, b] to some power which is determined by the rule. That is we
usually think of a simple rule as being applied to a small interval.
To use a simple rule on a larger interval, we usually cast it into a “composite” rule. Thus the
trapezoidal rule, which we will study next becomes the composite trapezoidal rule. The means of
extending a simple rule to a composite rule is straightforward: Partition the given interval into
subintervals, apply the simple rule to each subinterval, and sum the results. Thus, for example if
the interval in question is [α, β], and the partition is α = x
0
< x
1
< x
2
< . . . < x
n
= β, we have
composite rule on [α, β] =
n−1
¸
i=0
simple rule applied to [x
i
, x
i+1
].
8.2 Trapezoidal Rule
Suppose we are trying to approximate the integral

b
a
f(x) dx,
for some unpleasant or black box function f(x).
The trapezoidal rule approximates the integral

b
a
f(x) dx
by the (signed) area of the trapezoid through the points (a, f(a)) , (b, f(b)) , and with one side the
segment from a to b. See Figure 8.2.
lleft
uright
P
S
f
r
a
g
r
e
p
l
a
c
e
m
e
n
t
s
a b
Figure 8.2: The trapezoidal rule for approximating the integral of a function over [a, b] is shown.
By old school math, we can find this signed area easily. This gives the (simple) trapezoidal rule:

b
a
f(x) dx ≈ (b −a)
f(a) +f(b)
2
.
8.2. TRAPEZOIDAL RULE 101
The composite trapezoidal rule can be written in a simplified form, one which you saw in your
calculus class, if the interval in question is partitioned into equal width subintervals. That is if you
let [α, β] be partitioned by
α = x
0
< x
1
< x
2
< x
3
< . . . < x
n
= β,
with x
i
= α +ih, where h = (β −α)/n, then the composite trapezoidal rule is

β
α
f(x) dx =
n−1
¸
i=0

x
i+1
x
i
f(x) dx ≈
1
2
n−1
¸
i=0
(x
i+1
−x
i
) [f(x
i
) +f(x
i+1
)] . (8.1)
Since each subinterval has equal width, x
i+1
−x
i
= h, and we have

b
a
f(x) dx ≈
h
2
n−1
¸
i=0
[f(x
i
) +f(x
i+1
)] . (8.2)
In your calculus class, you saw this in the less comprehensible form:

b
a
f(x) dx ≈ h
¸
f(a) +f(b)
2
+
n−1
¸
i=1
f(x
i
)
¸
.
Note that the composite trapezoidal rule for equal subintervals is the same as the approximation
we found for increasing functions in Example 8.5.
Example Problem 8.6. Approximate the integral

2
0
1
1 +x
2
dx
by the composite trapezoidal rule with a partition of equally spaced points, for n = 2.
Solution: We have h =
2−0
2
= 1, and f(x
0
) = 1, f(x
1
) =
1
2
, f(x
2
) =
1
5
. Then the composite
trapezoidal rule gives the value
1
2
[f(x
0
) +f(x
1
) +f(x
1
) +f(x
2
)] =
1
2
¸
1 + 1 +
1
5

=
11
10
.
The actual value is arctan 2 ≈ 1.107149, and our approximation is correct to two decimal places. ¬
8.2.1 How Good is the Composite Trapezoidal Rule?
We consider the composite trapezoidal rule for partitions of equal subintervals. Let p
i
(x) be the
polynomial of degree ≤ 1 that interpolates f(x) at x
i
, x
i+1
. Let
I
i
=

x
i+1
x
i
f(x) dx, T
i
=

x
i+1
x
i
p
i
(x) dx = (x
i+1
−x
i
)
p
i
(x
i
) +p
i
(x
i+1
)
2
=
h
2
(f(x
i
) +f(x
i+1
)) .
That’s right: the composite trapezoidal rule approximates the integral of f(x) over [x
i
, x
i+1
] by
the integral of p
i
(x) over the same interval.
Now recall our theorem on polynomial interpolation error. For x ∈ [x
i
, x
i+1
] , we have
f(x) −p
i
(x) =
1
(2)!
f
(2)

x
) (x −x
i
) (x −x
i+1
) ,
102 CHAPTER 8. INTEGRALS AND QUADRATURE
for some ξ
x
∈ [x
i
, x
i+1
] . Recall that ξ
x
depends on x. To make things simpler, call it ξ(x).
Now integrate:
I
i
−T
i
=

x
i+1
x
i
f(x) −p
i
(x) dx =
1
2

x
i+1
x
i
f
tt
(ξ(x)) (x −x
i
) (x −x
i+1
) dx.
We will now attack the integral on the right hand side. Recall the following theorem:
Theorem 8.7 (Mean Value Theorem for Integrals). Suppose f is continuous, g is Riemann
Integrable and does not change sign on [α, β]. Then there is some ζ ∈ [α, β] such that

β
α
f(x)g(x) dx = f(ζ)

β
α
g(x) dx.
We use this theorem on our integral. Note that (x −x
i
) (x −x
i+1
) is nonpositive on the interval
of question, [x
i
, x
i+1
]. We assume continuity of f
tt
(x), and wave our hands to get continuity of
f
tt
(ξ(x)). Then we have
I
i
−T
i
=
1
2
f
tt
(ξ)

x
i+1
x
i
(x −x
i
) (x −x
i+1
) dx,
for some ξ
i
∈ [x
i
, x
i+1
]. By boring calculus and algebra, we find that

x
i+1
x
i
(x −x
i
) (x −x
i+1
) dx = −
h
3
6
.
This gives
I
i
−T
i
= −
h
3
12
f
tt

i
),
for some ξ
i
∈ [x
i
, x
i+1
].
We now sum over all subintervals to find the total error of the composite trapezoidal rule
E =
n−1
¸
i=0
I
i
−T
i
= −
h
3
12
n−1
¸
i=0
f
tt

i
) = −
(b −a) h
2
12
¸
1
n
n−1
¸
i=0
f
tt

i
)
¸
.
On the far right we have an average value,
1
n
¸
n−1
i=0
f
tt

i
), which lies between the least and greatest
values of f
tt
on the inteval [a, b], and thus by the IVT, there is some ξ which takes this value. So
E = −
(b −a) h
2
12
f
tt
(ξ)
This gives us the theorem:
Theorem 8.8 (Error of the Composite Trapezoidal Rule). Let f
tt
(x) be continuous on [a, b].
Let T be the value of the trapezoidal rule applied to f(x) on this interval with a partition of uniform
spacing, h, and let I =

b
a
f(x) dx. Then there is some ξ ∈ [a, b] such that
I −T = −
(b −a) h
2
12
f
tt
(ξ).
Note that this theorem tells us not only the magnitude of the error, but the sign as well. Thus
if, for example, f(x) is concave up and thus f
tt
is positive, then I − T will be negative, i.e., the
trapezoidal rule gives an overestimate of the integral I. See Figure 8.3.
8.2. TRAPEZOIDAL RULE 103
lleft
uright
P
S
f
r
a
g
r
e
p
l
a
c
e
m
e
n
t
s
x
i
x
i+1
Figure 8.3: The trapezoidal rule is an overestimate for a function which is concave up, i.e., has
positive second derivative.
8.2.2 Using the Error Bound
Example Problem 8.9. How many intervals are required to approximate the integral
ln 2 = I =

1
0
1
1 +x
dx
to within 1 10
−10
?
Solution: We have f(x) =
1
1+x
, thus f
t
(x) = −
1
(1+x)
2
. And f
tt
(x) =
2
(1+x)
3
. Thus f
tt
(ξ) is
continuous and bounded by 2 on [0, 1]. If we use n equal subintervals then Theorem 8.8 tells us the
error will be

1 −0
12

1 −0
n

2
f
tt
(ξ) = −
f
tt
(ξ)
12n
2
.
To make this smaller than 1 10
−10
, in absolute value, we need only take
1
6n
2
≤ 1 10
−10
,
and so n ≥

1
6
10
5
suffices. Because f
tt
(x) is positive on this interval, the trapezoidal rule will
be an overestimate. ¬
Example Problem 8.10. How many intervals are required to approximate the integral

2
0
x
3
−1 dx
to within 1 10
−6
?
Solution: We have f(x) = x
3
− 1, thus f
t
(x) = 3x
2
, and f
tt
(x) = 6x. Thus f
tt
(ξ) is continuous
and bounded by 12 on [0, 2]. If we use n equal subintervals then by Theorem 8.8 the error will be

2 −0
12

2 −0
n

2
f
tt
(ξ) = −
2f
tt
(ξ)
3n
2
.
104 CHAPTER 8. INTEGRALS AND QUADRATURE
To make this smaller than 1 10
−6
, in absolute value, it suffices to take
24
3n
2
≤ 1 10
−6
,
and so n ≥

8 10
3
suffices. Because f
tt
(x) is positive on this interval, the trapezoidal rule will
be an overestimate. ¬
8.3 Romberg Algorithm
Theorem 8.8 tells us, approximately, that the error of the composite trapezoidal rule approximation
is O

h
2

. If we halve h, the error is quartered. Sometimes we want to do better than this. We’ll
use the same trick that we did from Richardson extrapolation. In fact, the forms are exactly the
same.
Towards this end, suppose that f, a, b are given. For a given n, we are going to use the trapezoidal
rule on a partition of 2
n
equal subintervals of [a, b]. That is h =
b−a
2
n
. Then define
φ(n) =
1
2
b −a
2
n
2
n
−1
¸
i=0
f(x
i
) +f(x
i+1
)
=
b −a
2
n
¸
f(a)
2
+
f(b)
2
+
2
n
−1
¸
i=1
f

a +i
b −a
2
n

¸
.
The intervals used to calculate φ(n + 1) are half the size of those for φ(n). As mentioned above,
this means the error is one quarter.
It turns out that if we had proved the error theorem differently, we would have proved the
relation:
φ(n) =

b
a
f(x) dx +a
2
h
2
n
+a
4
h
4
n
+a
6
h
6
n
+a
8
h
8
n
+. . . ,
where h
n
=
b−a
2
n
. The constants a
i
are a function of f
(i)
(x) only. This should look just like something
from Chapter 7. What happens if we now calculate φ(n + 1)? We have
φ(n + 1) =

b
a
f(x) dx +a
2
h
2
n+1
+a
4
h
4
n+1
+a
6
h
6
n+1
+a
8
h
8
n+1
+. . . ,
=

b
a
f(x) dx +
1
4
a
2
h
2
n
+
1
16
a
4
h
4
n
+
1
64
a
6
h
6
n
+
1
256
a
8
h
8
n
+. . . .
This happens because h
n+1
=
b−a
2
n+1
=
1
2
b−a
2
n
=
hn
2
. As with Richardon’s method for approximating
derivatives, we now combine the right multiples of these:
φ(n) −4φ(n + 1) = −3

b
a
f(x) dx +
3
4
4h
4
n
+
15
16
a
6
h
6
n
+
63
64
a
8
h
8
n
+. . .
4φ(n) −φ(n + 1)
3
=

b
a
f(x) dx −
1
4
4h
4
n

5
16
a
6
h
6
n

21
64
a
8
h
8
n
+. . .
This approximation has a truncation error of O

h
4
n

.
Like in Richardson’s method, we can use this to get better and better approximations to the
integral. We do this by constructing a triangular array of approximations, each entry depending
on two others. Towards this end, we let
R(n, 0) = φ(n),
8.3. ROMBERG ALGORITHM 105
then define, for m > 0
R(n, m) =
4
m
R(n, m−1) −R(n −1, m−1)
4
m
−1
. (8.3)
The familiar pyramid table then is:
R(0, 0)
R(1, 0) R(1, 1)
R(2, 0) R(2, 1) R(2, 2)
.
.
.
.
.
.
.
.
.
.
.
.
R(n, 0) R(n, 1) R(n, 2) R(n, n)
Even though this is exactly the same as Richardon’s method, it has another name: this is called
the Romberg Algorithm.
Example Problem 8.11. Approximating the integral

2
0
1
1 +x
2
dx
by Romberg’s Algorithm; find R(1, 1).
Solution: The first column is calculated by the trapezoidal rule. Successive columns are found by
combining members of previous columns. So we first calculate R(0, 0) and R(1, 0). These are fairly
simple, the first is the trapezoidal rule on a single subinterval, the second is the trapezoidal rule on
two subintervals. Then
R(0, 0) =
2 −0
1
1
2
[f(0) +f(2)] =
6
5
,
R(1, 0) =
2 −0
2
1
2
[f(0) +f(1) +f(1) +f(2)] =
11
10
.
Then, using Romberg’s Algorithm we have
R(1, 1) =
4R(1, 0) −R(0, 0)
4 −1
=
44
10

12
10
3
=
32
30
= 1.0
¯
6.
¬
At this point we are tempted to use Richardson’s analysis. This would claim that R(n, n) is a
O

h
2(n+1)
0

approximation to the integral. However, h
0
= b − a, and need not be smaller than 1.
This is a bit different from Richardson’s method, where the original h is independently set before
starting the triangular array; for Romberg’s algorithm, h
0
is determined by a and b.
We can easily deal with this problem by picking some k such that
b−a
2
k
is small enough, say
smaller than 1. Then calculating the following array:
R(k, 0)
R(k + 1, 0) R(k + 1, 1)
R(k + 2, 0) R(k + 2, 1) R(k + 2, 2)
.
.
.
.
.
.
.
.
.
.
.
.
R(k +n, 0) R(k +n, 1) R(k +n, 2) R(k +n, n)
106 CHAPTER 8. INTEGRALS AND QUADRATURE
Quite often Romberg’s Algorithm is used to compute columns of this array. Subtractive can-
celling or unbounded higher derivatives of f(x) can make successive approximations less accurate.
For this reason, entries in ever rightward columns are usually not calculated, rather lower entries
in a single column are calculated instead. That is, the user calculates the array:
R(k, 0)
R(k + 1, 0) R(k + 1, 1)
R(k + 2, 0) R(k + 2, 1) R(k + 2, 2)
.
.
.
.
.
.
.
.
.
.
.
.
R(k +n, 0) R(k +n, 1) R(k +n, 2) R(k +n, n)
R(k +n + 1, 0) R(k +n + 1, 1) R(k +n + 1, 2) R(k +n + 1, n)
R(k +n + 2, 0) R(k +n + 2, 1) R(k +n + 2, 2) R(k +n + 2, n)
R(k +n + 3, 0) R(k +n + 3, 1) R(k +n + 3, 2) R(k +n + 3, n)
.
.
.
.
.
.
.
.
.
.
.
.
Then R(k +n +l, n) makes a fine approximation to the integral as l →∞. Usually n is small,
like 2 or 3.
8.3.1 Recursive Trapezoidal Rule
It turns out there is an efficient way of calculating R(n + 1, 0) given R(n, 0); first notice from the
above example that
R(0, 0) =
b −a
1
1
2
[f(a) +f(b)] ,
R(1, 0) =
b −a
2
1
2
¸
f(a) +f(
a +b
2
) +f(
a +b
2
) +f(b)

.
It would be best to calculate R(1, 0) without recalculating f(a) and f(b). It turns out this is possible.
Let h
n
=
b−a
2
n
, and recall that
R(n, 0) = φ(n) = h
n
¸
f(a) +f(b)
2
+
2
n
−1
¸
i=1
f (a +ih
n
)
¸
.
Thus
R(n + 1, 0) = φ(n + 1) = h
n+1

f(a) +f(b)
2
+
2
n+1
−1
¸
i=1
f (a +ih
n+1
)
¸
¸
,
=
1
2
h
n
¸
f(a) +f(b)
2
+
2
n
−1
¸
i=1
f (a + (2i −1)h
n+1
) +f

a + (2i)
1
2
h
n

¸
,
=
1
2
h
n
¸
f(a) +f(b)
2
+
2
n
−1
¸
i=1
f (a +ih
n
) +
2
n
−1
¸
i=1
f (a + (2i −1)h
n+1
)
¸
,
=
1
2
R(n, 0) +h
n+1
2
n
−1
¸
i=1
f (a + (2i −1)h
n+1
) .
Then calculating R(n + 1, 0) requires only 2
n
−1 additional evaluations of f(x), instead of the
2
n+1
+ 1 usually required.
8.4. GAUSSIAN QUADRATURE 107
8.4 Gaussian Quadrature
The word quadrature refers to a method of approximating the integral of a function as the linear
combination of the function at certain points, i.e.,

b
a
f(x) dx ≈ A
0
f(x
0
) +A
1
f(x
1
) +. . . A
n
f(x
n
), (8.4)
for some collection of nodes ¦x
i
¦
n
i=0
, and weights ¦A
i
¦
n
i=0
. Normally one finds the nodes and weights
in a table somewhere; we expect a quadrature rule with more nodes to be more accurate in some
sense–the tradeoff is in the number of evaluations of f(). We will examine how these rules are
created.
8.4.1 Determining Weights (Lagrange Polynomial Method)
Suppose that the nodes ¦x
i
¦
n
i=0
are given. An easy way to find “good” weights ¦A
i
¦
n
i=0
for these
nodes is to rig them so the quadrature rule gives the integral of p(x), the polynomial of degree ≤ n
which interpolates f(x) on these nodes. Recall
p(x) =
n
¸
i=0
f(x
i
)
i
(x),
where
i
(x) is the i
th
Lagrange polynomial. Thus our rigged approximation is the one that gives

b
a
f(x) dx ≈

b
a
p(x) dx =
n
¸
i=0
f(x
i
)

b
a

i
(x) dx.
If we let
A
i
=

b
a

i
(x) dx,
then we have a quadrature rule.
If f(x) is a polynomial of degree ≤ n then f(x) = p(x), and the quadrature rule is exact.
Example Problem 8.12. Construct a quadrature rule on the interval [0, 4] using nodes 0, 1, 2.
Solution: The nodes are given, we determine the weights by constructing the Lagrange Polyno-
mials, and integrating them.

0
(x) =
(x −1)(x −2)
(0 −1)(0 −2)
=
1
2
(x −1)(x −2),

1
(x) =
(x −0)(x −2)
(1 −0)(1 −2)
= −(x)(x −2),

2
(x) =
(x −0)(x −1)
(2 −0)(2 −1)
=
1
2
(x)(x −1).
Then the weights are
A
0
=

4
0

0
(x) dx =

4
0
1
2
(x −1)(x −2) dx =
8
3
,
A
1
=

4
0

1
(x) dx =

4
0
−(x)(x −2) dx = −
16
3
,
A
2
=

4
0

2
(x) dx =

4
0
1
2
(x)(x −1) dx =
20
3
.
108 CHAPTER 8. INTEGRALS AND QUADRATURE
Thus our quadrature rule is

4
0
f(x) dx ≈
8
3
f(0) −
16
3
f(1) +
20
3
f(2).
We expect this rule to be exact for a quadratic function f(x). To illustrate this, let f(x) = x
2
+1.
By calculus we have

4
0
x
2
+ 1 dx =
1
3
x
3
+x

4
0
=
64
3
+ 4 =
76
3
.
The approximation is

4
0
x
2
+ 1 dx ≈
8
3
[0 + 1] −
16
3
[1 + 1] +
20
3
[4 + 1] =
76
3
.
¬
8.4.2 Determining Weights (Method of Undetermined Coefficients)
Using the Lagrange Polynomial Method to find the weights A
i
is fine for a computer, but can
be tedious (and error-prone) if done by hand (say, on an exam). The method of undetermined
coefficients is a good alternative for finding the weights by hand, and for small n.
The idea behind the method is to find n+1 equations involving the n+1 weights. The equations
are derived by letting the quadrature rule be exact for f(x) = x
j
for j = 0, 1, . . . , n. That is, setting

b
a
x
j
dx =
n
¸
i=0
A
i
(x
i
)
j
.
For example, we reconsider Example Problem 8.12.
Example Problem 8.13. Construct a quadrature rule on the interval [0, 4] using nodes 0, 1, 2.
Solution: The method of undetermined coefficients gives the equations:

4
0
1 dx = 4 = A
0
+A
1
+A
2

4
0
xdx = 8 = A
1
+ 2A
2

4
0
x
2
dx = 64/3 = A
1
+ 4A
2
.
We perform Na¨ıve Gaussian Elimination on the system:

¸
1 1 1 4
0 1 2 8
0 1 4 64/3
¸

We get the same weights as in Example Problem 8.12: A
2
=
20
3
, A
1
= −
16
3
, A
0
=
8
3
. ¬
Notice the difference compared to the Lagrange Polynomial Method: undetermined coefficients
requires solution of a linear system, while the former method calculates the weights “directly.”
Since we will not consider n to be very large, solving the linear system may not be too burdensome.
Moreover, the method of undetermined coefficients is useful in more general settings, as illus-
trated by the next example:
8.4. GAUSSIAN QUADRATURE 109
Example Problem 8.14. Determine a “quadrature” rule of the form

1
0
f(x) dx ≈ Af(1) +Bf
t
(1) +Cf
tt
(1)
that is exact for polynomials of highest possible degree. What is the highest degree polynomial for
which this rule is exact?
Solution: Since there are three unknown coefficients to be determined, we look for three equations.
We get these equations by plugging in successive polynomials. That is, we plug in f(x) = 1, x, x
2
and assuming the coefficients give equality:

1
0
1 dx = 1 = A1 +B0 +C 0 = A

1
0
xdx = 1/2 = A1 +B1 +C 0 = A+B

1
0
x
2
dx = 1/3 = A1 +B2 +C 2 = A+ 2B + 2C
This is solved by A = 1, B = −1/2, C = 1/6. This rule should be exact for polynomials of degree
no greater than 2, but it might be better. We should check:

1
0
x
3
dx = 1/4 = 1/2 = 1 −3/2 + 1 = A1 +B3 +C 6,
and thus the rule is not exact for cubic polynomials, or those of higher degree. ¬
8.4.3 Gaussian Nodes
It would seem this is the best we can do: using n+1 nodes we can devise a quadrature rule that is
exact for polynomials of degree ≤ n by choosing the weights correctly. It turns out that by choosing
the nodes in the right way, we can do far better. Gauss discovered that the right nodes to choose
are the n + 1 roots of the (nontrivial) polynomial, q(x), of degree n + 1 which has the property

b
a
x
k
q(x) dx = 0 (0 ≤ k ≤ n) .
(If you view the integral as an inner product, you could say that q(x) is orthogonal to the polyno-
mials x
k
in the resultant inner product space, but that’s just fancy talk.)
Suppose that we have such a q(x)–we will not prove existence or uniqueness. Let f(x) be a
polynomial of degree ≤ 2n + 1. We write
f(x) = p(x)q(x) +r(x).
Both p(x), r(x) are of degree ≤ n. Because of how we picked q(x) we have

b
a
p(x)q(x) dx = 0.
Thus

b
a
f(x) dx =

b
a
p(x)q(x) dx +

b
a
r(x)dx =

b
a
r(x)dx.
110 CHAPTER 8. INTEGRALS AND QUADRATURE
Now suppose that the A
i
are chosen by Lagrange Polynomials so the quadrature rule on the
nodes x
i
is exact for polynomials of degree ≤ n. Then
n
¸
i=0
A
i
f(x
i
) =
n
¸
i=0
A
i
[p(x
i
)q(x
i
) +r(x
i
)] =
n
¸
i=0
A
i
r(x
i
).
The last equality holds because the x
i
are the roots of q(x). Because of how the A
i
are chosen we
then have
n
¸
i=0
A
i
f(x
i
) =
n
¸
i=0
A
i
r(x
i
) =

b
a
r(x) dx =

b
a
f(x) dx.
Thus this rule is exact for f(x). We have (or rather, Gauss has) made quadrature twice as good.
Theorem 8.15 (Gaussian Quadrature Theorem). Let x
i
be the n + 1 roots of a (nontrivial)
polynomial, q(x), of degree n + 1 which has the property

b
a
x
k
q(x) dx = 0 (0 ≤ k ≤ n) .
Let A
i
be the coefficients for these nodes chosen by integrating the Lagrange Polynomials. Then the
quadrature rule for this choice of nodes and coefficients is exact for polynomials of degree ≤ 2n+1.
8.4.4 Determining Gaussian Nodes
We can determine the Gaussian nodes in the same way we determine coefficients. The example is
illustrative
Example Problem 8.16. Find the two Gaussian nodes for a quadrature rule on the interval [0, 2].
Solution: We will find the function q(x) of degree 2, which is “orthogonal” to 1, x under the inner
product of integration over [0, 2]. Thus we let q(x) = c
0
+c
1
x +c
2
x
2
. The orthogonality condition
becomes

2
0
1q(x) dx =

2
0
xq(x) dx = 0 that is,

2
0
c
0
+c
1
x +c
2
x
2
dx =

2
0
c
0
x +c
1
x
2
+c
2
x
3
dx = 0
Evaluating these integrals gives the following system of linear equations:
2c
0
+ 2c
1
+
8
3
c
2
= 0,
2c
0
+
8
3
c
1
+ 4c
2
= 0.
This system is “underdetermined,” that is, there are two equations, but three unknowns. Notice,
however, that if q(x) satisfies the orthogonality conditions, then so does ˆ q(x) = αq(x), for any real
number α. That is, we can pick the scaling of q(x) as we wish.
With great foresight, we “guess” that we want c
2
= 3. This reduces the equations to
2c
0
+ 2c
1
= −8,
2c
0
+
8
3
c
1
= −12.
8.4. GAUSSIAN QUADRATURE 111
Simple Gaussian Elimination (cf. Chapter 3) yields the answer c
0
= 2, c
1
= −6, c
2
= 3.
Then our nodes are the roots of q(x) = 2 −6x + 3x
2
. That is the roots
6 ±

36 −24
6
= 1 ±

3
3
.
These nodes are a bit ugly. Rather than construct the Lagrange Polynomials, we will use the
method of undetermined coefficients. Remember, we want to construct A
0
, A
1
such that

2
0
f(x) dx ≈ A
0
f(1 −

3
3
) +A
1
f(1 +

3
3
)
is exact for polynomial f(x) of degree ≤ 1. It suffices to make this approximation exact for the
“building blocks” of such polynomials, that is, for the functions 1 and x. That is, it suffices to find
A
0
, A
1
such that

2
0
1 dx = A
0
+A
1

2
0
xdx = A
0
(1 −

3
3
) +A
1
(1 +

3
3
)
This gives the equations
2 = A
0
+A
1
2 = A
0
(1 −

3
3
) +A
1
(1 +

3
3
)
This is solved by A
0
= A
1
= 1.
Thus our quadrature rule is

2
0
f(x) dx ≈ f(1 −

3
3
) +f(1 +

3
3
) .
We expect this rule to be exact for cubic polynomials. ¬
Example Problem 8.17. Verify the results of the previous example problem for f(x) = x
3
.
Solution: We have

2
0
f(x) dx = (1/4) x
4

2
0
= 4.
The quadrature rule gives
f(1 −

3
3
) +f(1 +

3
3
) =

1 −

3
3

3
+

1 +

3
3

3
=

1 −3

3
3
+ 3
3
9

3

3
27

+

1 + 3

3
3
+ 3
3
9
+
3

3
27

=

2 −

3 −

3/9

+

2 +

3 +

3/9

= 4
Thus the quadrature rule is exact for f(x). ¬
112 CHAPTER 8. INTEGRALS AND QUADRATURE
8.4.5 Reinventing the Wheel
While it is good to know the theory, it doesn’t make sense in practice to recompute these things all
the time. There are books full of quadrature rules; any good textbook will list a few. The simplest
ones are given in Table 8.1.
See also http://mathworld.wolfram.com/Legendre-GaussQuadrature.html
n x
i
A
i
0 x
0
= 0 A
0
= 2
1
x
0
= −

1/3 A
0
= 1
x
1
=

1/3 A
1
= 1
x
0
= −

3/5 A
0
= 5/9
2 x
1
= 0 A
1
= 8/9
x
2
=

3/5 A
1
= 5/9
Table 8.1: Gaussian Quadrature rules for the interval [−1, 1]. Thus

1
−1
f(x) dx ≈
¸
n
i=0
A
i
f(x
i
),
with this relation holding exactly for all polynomials of degree no greater than 2n + 1.
Quadrature rules are normally given for the interval [−1, 1]. On first consideration, it would
seem you need a different rule for each interval [a, b]. This is not the case, as the following example
problem illustrates:
Example Problem 8.18. Given a quadrature rule which is good on the interval [−1, 1], derive a
version of the rule to apply to the interval [a, b].
Solution: Consider the substitution:
x =
b −a
2
t +
b +a
2
, so dx =
b −a
2
dt.
Then

b
a
f(x) dx =

1
−1
f

b −a
2
t +
b +a
2

b −a
2
dt.
Letting
g(t) =
b −a
2
f

b −a
2
t +
b +a
2

,
if f(x) is a polynomial, g(t) is a polynomial of the same degree. Thus we can use the quadrature
rule for [−1, 1] on g(t) to evaluate the integral of f(x) over [a, b]. ¬
Example Problem 8.19. Derive the quadrature rules of Example Problem 8.16 by using the
technique of Example Problem 8.18 and the quadrature rules of Table 8.1.
Solution: We have a = 0, b = 2. Thus
g(t) =
b −a
2
f

b −a
2
t +
b +a
2

= f(t + 1).
To integrate f(x) over [0, 2], we integrate g(t) over [−1, 1]. The standard Gaussian Quadrature rule
approximates this as

1
−1
g(t) dt ≈ g(−

1/3) +g(

1/3) = f(1 −

1/3) +f(1 +

1/3).
This is the same rule that was derived (with much more work) in Example Problem 8.16. ¬
8.4. GAUSSIAN QUADRATURE 113
Exercises
(8.1) Use the composite trapezoidal rule, by hand, to approximate

3
0
x
2
dx (= 9)
Use the partition ¦x
i
¦
2
i=0
= ¦0, 1, 3¦ . Why is your approximation an overestimate?
(8.2) Use the composite trapezoidal rule, by hand, to approximate

1
0
1
x + 1
dx (= ln 2 ≈ 0.693)
Use the partition ¦x
i
¦
3
i=0
=
¸
0,
1
4
,
1
2
, 1
¸
. Why is your approximation an overestimate? (Check:
I think the answer is 0.7)
(8.3) Use the composite trapezoidal rule, by hand to approximate

1
0
4
1 +x
2
dx.
Use n = 4 subintervals. How good is your answer?
(8.4) Use Theorem 8.8 to bound the error of the composite trapezoidal rule approximation of

2
0
x
3
dx with n = 10 intervals. You should find that the approximation is an overestimate.
(8.5) How many equal subintervals of [0, 1] are required to approximate

1
0
cos x dx with error
smaller than 1 10
−6
by the composite trapezoidal rule? (Use Theorem 8.8.)
(8.6) How many equal subintervals would be required to approximate

1
0
4
1 +x
2
dx.
to within 0.0001 by the composite trapezoidal rule? (Hint: Use the fact that [f
tt
(x)[ ≤ 8 on
[0, 1] for f(x) = 4/(1 +x
2
))
(8.7) How many equal subintervals of [2, 3] are required to approximate

3
2
e
x
dx with error smaller
than 1 10
−3
by the composite trapezoidal rule?
(8.8) Simpson’s Rule for quadrature is given as

b
a
f(x) dx ≈
∆x
3
[f(x
0
) + 4f(x
1
) + 2f(x
2
) + 4f(x
3
) +. . . + 2f(x
n−2
) + 4f(x
n−1
) +f(x
n
)] ,
where ∆x = (b −a)/n, and n is assumed to be even. Show that Simpson’s Rule for n = 2 is
actually given by Romberg’s Algorithm as R(1, 1). As such we expect Simpson’s Rule to be
a O

h
4

approximation to the integral.
(8.9) Find a quadrature rule of the form

1
0
f(x) dx ≈ Af(0) +Bf(1/2) +Cf(1)
that is exact for polynomials of highest possible degree. What is the highest degree polynomial
for which this rule is exact?
114 CHAPTER 8. INTEGRALS AND QUADRATURE
(8.10) Determine a “quadrature” rule of the form

1
−1
f(x) dx ≈ Af(0) +Bf
t
(−1) +Cf
t
(1)
that is exact for polynomials of highest possible degree. What is the highest degree polynomial
for which this rule is exact? (Since this rule uses derivatives of f, it does not exactly fit our
definition of a quadrature rule, but it may be applicable in some situations.)
(8.11) Determine a “quadrature” rule of the form

1
0
f(x) dx ≈ Af(0) +Bf
t
(0) +Cf(1)
that is exact for polynomials of highest possible degree. What is the highest degree polynomial
for which this rule is exact?
(8.12) Consider the so-called order n Chebyshev Quadrature rule:

1
−1
f(x) dx ≈ c
n
n
¸
i=0
f(x
i
)
Find the weighting c
n
and nodes x
i
for the case n = 2 and the case n = 3. For what order
polynomials are these rules exact?
(8.13) Find the Gaussian Quadrature rule with 2 nodes for the interval [1, 5], i.e., find a rule

5
1
f(x) dx ≈ Af(x
0
) +Bf(x
1
)
Before you solve the problem, consider the following questions: do you expect the nodes to
be the endpoints 1 and 5? do you expect the nodes to be arranged symmetrically around the
midpoint of the interval?
(8.14) Find the Gaussian Quadrature rule with 3 nodes for the interval [−1, 1], i.e., find a rule

1
−1
f(x) dx ≈ Af(x
0
) +Bf(x
1
) +Cf(x
2
)
To find the nodes x
0
, x
1
, x
2
you will have to find the zeroes of a cubic equation, which could be
difficult. However, you may use the simplifying assumption that the nodes are symmetrically
placed in the interval [−1, 1].
(8.15) Write code to approximate the integral of a f on [a, b] by the composite trapezoidal rule on
n equal subintervals. Your m-file should have header line like:
function iappx = trapezoidal(f,a,b,n)
You may wish to use the code:
x = a .+ (b-a) .* (0:n) ./ n;
If f is defined to work on vectors element-wise, you can probably speed up your computation
by using
bigsum = 0.5 * ( f(x(1)) + f(x(n+1)) ) + sum( f(x(2:(n))) );
(8.16) Write code to implement the Gaussian Quadrature rule for n = 2 to integrate f on the
interval [a, b]. Your m-file should have header line like:
function iappx = gauss2(f,a,b)
(8.17) Write code to implement composite Gaussian Quadrature based on code from the previous
problem. Something like the following probably works:
8.4. GAUSSIAN QUADRATURE 115
function iappx = gaussComp(f,a,b,n)
% code to approximate integral of f over n equal subintervals of [a,b]
x = a .+ (b-a) .* (0:n) ./ n;
iappx = 0;
for i=1:n
iappx += gauss2(f,x(i),x(i+1));
end
Use your code to approximate the error function:
erf(z) =
2

π

z
0
e
−t
2
dt.
Compare your results with the octave/Matlab builtin function erf. (Try help erf in octave,
or see http://mathworld.wolfram.com/Erf.html)
The error function is used in probability. In particular, the probability that a normal random
variable is within z standard deviations from its mean is
erf(z/

2)
Thus erf(1/

2) ≈ 0.683, and erf(2/

2) ≈ 0.955. These numbers should look familiar to you.
116 CHAPTER 8. INTEGRALS AND QUADRATURE
Chapter 9
Least Squares
9.1 Least Squares
Least squares is a general class of methods for fitting observed data to a theoretical model function.
In the general setting we are given a set of data
x x
0
x
1
. . . x
n
y y
0
y
1
. . . y
n
and some class of functions, T. The goal then is to find the “best” f ∈ T to fit the data to y = f(x).
Usually the class of functions T will be determined by some small number of parameters; the number
of parameters will be smaller (usually much smaller) than the number of data points. The theory
here will be concerned with defining “best,” and examining methods for finding the “best” function
to fit the data.
9.1.1 The Definition of Ordinary Least Squares
First we consider the data to be two vectors of length n + 1. That is we let
x = [x
0
x
1
. . . x
n
]
¯
, and y = [y
0
y
1
. . . y
n
]
¯
.
The error, or residual, of a given function with respect to this data is the vector r = y − f(x).
That is
r = [r
0
r
1
. . . r
n
]
¯
, where r
i
= y
i
−f(x
i
).
Our goal is to find f ∈ T such that r is reasonably small. We measure the size of a vector
by the use of norms, which were explored in Subsection 1.4.1. The most useful norms are the
p
norms. For a given p with 0 < p < ∞, the
p
norm of r is defined as
|r|
p
=

n
¸
i=0
r
p
i

1/p
For the purposes of approximation, the easiest norm to use is the
2
norm:
Definition 9.1.1 ((Ordinary) Least Squares Best Approximant). The least-squares best
approximant to a set of data, x, y from a class of functions, T, is the function f

∈ T that
minimizes the
2
norm of the error. That is, if f

is the least squares best approximant, then
|y −f

(x)|
2
= min
f∈T
|y −f(x)|
2
117
118 CHAPTER 9. LEAST SQUARES
We will generally assume uniqueness of the minimum. This method is sometimes called the ordinary
least squares method. It assumes there is no error in the measurement of the data x, and usually
admits a relatively straightforward solution.
9.1.2 Linear Least Squares
We illustrate this definition using the class of linear functions as an example, as this case is reason-
ably simple. We are assuming
T = ¦f(x) = ax +b [ a, b ∈ R¦ .
We can now think of our job as drawing the “best” line through the data.
By Definition 9.1.1, we want to find the function f(x) = ax +b that minimizes

n
¸
i=0
[y
i
−f(x
i
)]
2

1/2
.
This is a minimization problem over two variables, x and y. As you may recall from calculus class,
it suffices to minimize the sum rather than its square root. That is, it suffices to minimize
n
¸
i=0
[ax
i
+b −y
i
]
2
.
We minimize this function with calculus. We set
0 =
∂φ
a
=
n
¸
k=0
2x
k
(ax
k
+b −y
k
)
0 =
∂φ
b
=
n
¸
k=0
2 (ax
k
+b −y
k
)
These are called the normal equations. Believe it or not these are two equations in two unknowns.
They can be reduced to
¸
x
2
k
a +
¸
x
k
b =
¸
x
k
y
k
¸
x
k
a + (n + 1)b =
¸
y
k
The solution is found by na¨ıve Gaussian Elimination, and is ugly. Let
d
11
=
¸
x
2
k
d
12
= d
21
=
¸
x
k
d
22
= n + 1
e
1
=
¸
x
k
y
k
e
2
=
¸
y
k
We want to solve
d
11
a +d
12
b = e
1
d
21
a +d
22
b = e
2
9.1. LEAST SQUARES 119
Gaussian Elimination produces
a =
d
22
e
1
−d
12
e
2
d
22
d
11
−d
12
d
21
b =
d
11
e
2
−d
21
e
1
d
22
d
11
−d
12
d
21
The answer is not so enlightening as the means of finding the solution.
We should, for a moment, consider whether this is indeed the solution. Our calculations have
only shown an extrema at this choice of (a, b); could it not be a maxima?
9.1.3 Least Squares from Basis Functions
In many, but not all cases, the class of functions, T, is the span of a small set of functions. This
case is simpler to explore and we consider it here. In this case T can be viewed as a vector space
over the real numbers. That is, for f, g ∈ T, and α, β ∈ R, then αf +βg ∈ T, where the function
αf is that function such that (αf)(x) = αf(x).
Now let ¦g
j
(x)¦
m
j=0
be a set of m+ 1 linearly independent functions, i.e.,
c
0
g
0
(x) +c
1
g
1
(x) +. . . +c
m
g
m
(x) = 0 ∀x ⇒ c
0
= c
1
= . . . = c
m
= 0
Then we say that T is spanned by the functions ¦g
j
(x)¦
m
j=0
if
T =

f(x) =
¸
j
c
j
g
j
(x) [ c
j
∈ R, j = 0, 1, . . . , m

.
In this case the functions g
j
are basis functions for T. Note the basis functions need not be unique:
a given class of functions will usually have more than one choice of basis functions.
Example 9.1. The class T = ¦f(x) = ax +b [ a, b ∈ R¦ is spanned by the two functions g
0
(x) = 1,
and g
1
(x) = x. However, it is also spanned by the two functions ˜ g
0
(x) = 2x+1, and ˜ g
1
(x) = x−1.
To find the least squares best approximant of T for a given set of data, we minimize the square
of the
2
norm of the error; that is we minimize the function
φ(c
0
, c
1
, . . . , c
m
) =
n
¸
k=0

¸
¸
j
c
j
g
j
(x
k
)
¸

−y
k
¸
¸
2
(9.1)
Again we set partials to zero and solve
0 =
∂φ
c
i
=
n
¸
k=0
2

¸
¸
j
c
j
g
j
(x
k
)
¸

−y
k
¸
¸
g
i
(x
k
)
This can be rearranged to get
m
¸
j=0
¸
¸
k
g
j
(x
k
)g
i
(x
k
)
¸
c
j
=
n
¸
k=0
y
k
g
i
(x
k
)
120 CHAPTER 9. LEAST SQUARES
If we now let
d
ij
=
n
¸
k=0
g
j
(x
k
)g
i
(x
k
), e
i
=
n
¸
k=0
y
k
g
i
(x
k
),
Then we have reduced the problem to the linear system (again, called the normal equations):

d
00
d
01
d
02
d
0m
d
10
d
11
d
12
d
1m
d
20
d
21
d
22
d
2m
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
d
m0
d
m1
d
m2
d
mm
¸
¸
¸
¸
¸
¸
¸

c
0
c
1
c
2
.
.
.
c
m
¸
¸
¸
¸
¸
¸
¸
=

e
0
e
1
e
2
.
.
.
e
m
¸
¸
¸
¸
¸
¸
¸
(9.2)
The choice of the basis functions can affect how easy it is to solve this system. We explore this
in Section 9.2. Note that we are talking about the basis ¦g
j
¦
m
j=0
, and not exactly about the class
of functions T.
For example, consider what would happen if the system of normal equations were diagonal. In
this case, solving the system would be rather trivial.
Example Problem 9.2. Consider the case where m = 0, and g
0
(x) = ln x. Find the least squares
approximation of the data
x 0.50 0.75 1.0 1.50 2.0 2.25 2.75 3.0
y −1.187098 −0.452472 −0.068077 0.713938 1.165234 1.436975 1.725919 1.841422
Solution: Essentially we are trying to find the c such that c ln x best approximates the data. The
system of equation 9.2 reduces to the 1-D equation:

Σ
k
ln
2
x
k

c = Σ
k
y
k
lnx
k
For our data, this reduces to:
4.0960c = 6.9844
so we find c = 1.7052. The data and the least squares interpolant are shown in Figure 9.1. ¬
Example 9.3. Consider the awfully chosen basis functions:
g
0
(x) =

2
−1

x
2


2
x + 1
g
1
(x) = x
3
+

2
−1

x
2
+x

+ 1
where is small, around machine precision.
Suppose the data are given at the nodes x
0
= 0, x
1
= 1, x
2
= −1. We want to set up the normal
equations, so we compute some of the d
ij
. First we have to evaluate the basis functions at the nodes
x
i
. But this example was rigged to give:
g
0
(x
0
) = 1, g
0
(x
1
) = 0, g
0
(x
2
) =
g
1
(x
0
) = 1, g
1
(x
1
) = , g
1
(x
2
) = 0
After much work we find we want to solve
¸
1 +
2
1
1 1 +
2
¸
c
0
c
1

=
¸
y
0
+y
2
y
0
+y
1

9.2. ORTHONORMAL BASES 121
-2.5
-2
-1.5
-1
-0.5
0
0.5
1
1.5
2
0.5 1 1.5 2 2.5 3
Figure 9.1: The data of Example Problem 9.2 and the least squares interpolant are shown.
However, the computer would only find this if it had infinite precision. Since it does not, and since
is rather small, the computer thinks
2
= 0, and so tries to solve the system
¸
1 1
1 1
¸
c
0
c
1

=
¸
y
0
+y
2
y
0
+y
1

When y
1
= y
2
, this has no solution. Bummer.
This kind of thing is common in the method of least squares: the coefficients of the normal
equations include terms like
g
i
(x
k
)g
j
(x
k
).
When the g
i
are small at the nodes x
k
, these coefficients can get really small, since we are squaring.
Now we draw a rough sketch of the basis functions. We find they do a pretty poor job of
discriminating around all the nodes.
9.2 Orthonormal Bases
In the previous section we saw that poor choice of basis vectors can lead to numerical problems.
Roughly speaking, if g
i
(x
k
) is small for some i’s and k’s, then some d
ij
can have a loss of precision
when two small quantities are multiplied together and rounded to zero.
Poor choice of basis vectors can also lead to numerical problems in solution of the normal
equations, which will be done by Gaussian Elimination.
Consider the case where T is the class of polynomials of degree no greater than m. For simplicity
we will assume that all x
i
are in the interval [0, 1]. The most obvious choice of basis functions is
g
j
(x) = x
j
. This certainly gives a basis for T, but actually a rather poor one. To see why, look
at the graph of the basis functions in Figure 9.3. The basis functions look too much alike on the
given interval.
A better basis for this problem is the set of Chebyshev Polynomials of the first kind, i.e.,
g
j
(x) = T
j
(x), where
T
0
(x) = 1, T
1
(x) = x, T
i+1
(x) = 2xT
i
(x) −T
i−1
(x).
122 CHAPTER 9. LEAST SQUARES
-1.5
-1
-0.5
0
0.5
1
1.5
-1 -0.5 0 0.5 1
P
S
f
r
a
g
r
e
p
l
a
c
e
m
e
n
t
s
g
0
(x)
g
1
(x)
Figure 9.2: The basis functions g
0
(x) =

2
−1

x
2


2
x +1, and g
1
(x) = x
3
+

2
−1

x
2
+x

+1
are shown for = 0.05. Note that around the three nodes 0, 1, −1, these two functions take nearly
identical values. This can lead to a system of normal equations with no solution.
0
0.2
0.4
0.6
0.8
1
0 0.2 0.4 0.6 0.8 1
P
S
f
r
a
g
r
e
p
l
a
c
e
m
e
n
t
s
x
i
x
i
+
1
Figure 9.3: The polynomials x
i
for i = 0, 1, . . . , 6 are shown on [0, 1]. These polynomials make a
bad basis because they look so much alike, essentially.
These polynomials are illustrated in Figure 9.4.
9.2.1 Alternatives to Normal Equations
It turns out that the Normal Equations method isn’t really so great. We consider other methods.
First, we define A as the n m matrix defined by the entries:
a
ij
= g
j
(x
i
), i = 0, 1, . . . , n, j = 0, 1, . . . , m.
9.2. ORTHONORMAL BASES 123
-1
-0.5
0
0.5
1
0 0.2 0.4 0.6 0.8 1
P
S
f
r
a
g
r
e
p
l
a
c
e
m
e
n
t
s
x
i
x
i
+
1
Figure 9.4: The Chebyshev polynomials T
j
(x) for j = 0, 1, . . . , 6 are shown on [0, 1]. These polyno-
mials make a better basis for least squares because they are orthogonal under some inner product.
Basically, they do not look like each other.
That is
A =

g
0
(x
0
) g
1
(x
0
) g
2
(x
0
) g
m
(x
0
)
g
0
(x
1
) g
1
(x
1
) g
2
(x
1
) g
m
(x
1
)
g
0
(x
2
) g
1
(x
2
) g
2
(x
2
) g
m
(x
2
)
g
0
(x
3
) g
1
(x
3
) g
2
(x
3
) g
m
(x
3
)
g
0
(x
4
) g
1
(x
4
) g
2
(x
4
) g
m
(x
4
)
.
.
.
.
.
.
.
.
.
.
.
.
.
.
.
g
0
(x
n
) g
1
(x
n
) g
2
(x
n
) g
m
(x
n
)
¸
¸
¸
¸
¸
¸
¸
¸
¸
¸
¸
We write it in this way since we are thinking of the case where n m, so A is “tall.”
After some inspection, we find that the Normal Equations can be written as:
A
¯
Ac = A
¯
y. (9.3)
Now let the vector c be identified, in the natural way, with a function in T. That is c is
identified with f(x) =
¸
m
j=0
c
j
g
j
(x). You should now convince yourself that
Ac = f(x).
And thus the residual, or error, of this function is r = y −Ac.
In our least squares theory we attempted to find that c that minimized
|y −Ac|
2
2
= (y −Ac)
¯
(y −Ac)
We can see this as minimizing the Euclidian distance from y to Ac. For this reason, we will have
that the residual is orthogonal to the column space of A, that is we want
A
¯
r = A
¯
(y −Ac) = 0.
124 CHAPTER 9. LEAST SQUARES
This is just the normal equations. We could rewrite this, however, in the following form: find c, r
such that
¸
I A
A
¯
0
¸
r
c

=
¸
y
0

This is now a system of n +m variables and unknowns, which can be solved by specialized means.
This is known as the augmented form. We briefly mention that na¨ıve Gaussian Elimination is not
appropriate to solve the augmented form, as it turns out to be equivalent to using the normal
equations method.
9.2.2 Ordinary Least Squares in octave/Matlab
The ordinary least squares best approximant is so fundamental that it is hard-wired into oc-
tave/Matlab’s system solve operator. The general equation we wish to solve is
Ac = y.
This system is overdetermined: there are more rows than columns in A, i.e., more equations than
unknown variables. However, in octave/Matlab, this is solved just like the case where A is square:
octave:1> c = A \ y;
Thus, unless specifically told otherwise, you should not implement the ordinary least squares solu-
tion method in octave/Matlab. Under the hood, octave/Matlab is not using the normal equations
method, and does not suffer the increased condition number of the normal equations method.
9.3 Orthogonal Least Squares
The method described in Section 9.1, sometimes referred to as “ordinary least squares,” assumes
that measurement error is found entirely in the dependent variable, and that there is no error in
the independent variables.
For example, consider the case of two variables, x and y, which are thought to be related by an
equation of the form y = mx +b. A number of measurements are made, giving the two sequences
¦x
i
¦
n
i=1
and ¦y
i
¦
n
i=1
. Ordinary least squares assumes, that
y
i
= mx
i
+b +
i
,
where
i
, the error of the i
th
measurement, is a random variable. It is usually assumed that
E[
i
] = 0, i.e., that the measurements are “unbiased.” Under the further assumption that the
errors are independent and have the same variance, the ordinary least squares solution is a very
good one.
1
However, what if it were the case that
y
i
= m(x
i

i
) +b +
i
,
i.e., that there is actually error in the measurement of the x
i
? In this case, the orthogonal least
squares method is appropriate.
1
This means that the ordinary least squares solution gives an unbiased estimator of m and b, and, moreover, gives
the estimators with the lowest variance among all linear, a priori estimators. For more details on this, locate the
Gauss-Markov Theorem in a good statistics textbook.
9.3. ORTHOGONAL LEAST SQUARES 125
The difference between the two methods is illustrated in Figure 9.5. In Figure 9.5(a), the
ordinary least squares method is shown; it minimizes the sum of the squared lengths of vertical
lines from observations to a line, reflecting the assumption of no error in x
i
. The orthogonal least
squares method will minimize the sum of the squared distances from observations to a line, as
shown in Figure 9.5(b).
1
2
3
4
5
6
7
0 1 2 3 4 5 6
1
2
3
4
5
6
7
0 1 2 3 4 5 6
(a) Ordinary Least Squares
1
2
3
4
5
6
7
0 1 2 3 4 5 6
1
2
3
4
5
6
7
0 1 2 3 4 5 6
(b) Orthogonal Least Squares
Figure 9.5: The ordinary least squares method, as shown in (a), minimizes the sum of the squared
lengths of the vertical lines from the observed points to the regression line. The orthogonal least
squares, shown in (b), minimizes the sum of the squared distances from the points to the line. The
offsets from points to the regression line in (b) may not look orthogonal due to improper aspect
ratio of the figure.
We construct the solution geometrically. Let
¸
x
(i)
¸
m
i=1
be the set of observations, expressed as
vectors in R
n
. The problem is to find the vector n and number d such that the hyperplane n
¯
x = d
is the plane that minimizes the sum of the squared distances from the points to the plane. We can
solve this problem using vector calculus.
The function
1
|n|
2
2
m
¸
i=1

n
¯
x
(i)
−d

2
2
.
is the one to be minimized with respect to n and d. It gives the sum of the squared distances from
the points to the plane described by n and d. To simplify its computation, we will minimize it
subject to the constraint that |n|
2
2
= 1. Thus our problem is to find n and d that solve
min
n

n=1
m
¸
i=1

n
¯
x
(i)
−d

2
2
.
We can express this as min f(n, d) subject to g(n, d) = 1. The solution of this problem uses the
Lagrange multipliers technique.
Let L(n, d, λ) = f(n, d) − λg(n, d). The Lagrange multipliers theory tells us that a necessary
126 CHAPTER 9. LEAST SQUARES
condition for n, d to be the solution is that there exists λ such that the following hold:

∂/
∂d
(n, d, λ) = 0

n
L(n, d, λ) = 0
g (n, d) = 1
Let X be the m n matrix whose i
th
row is the vector x
(i)
¯
. We can rewrite the Lagrange
function as
L(n, d, λ) = (Xn−d1
m
)
¯
(Xn −d1
m
) −λn
¯
n
= n
¯
X
¯
Xn−2d1
m
¯
Xn +d
2
1
m
¯
1
m
−λn
¯
n
We solve the necessary condition on
∂/
∂d
.
0 =
∂L
∂d
(n, d, λ) = 2d1
m
¯
1
m
−21
m
¯
Xn ⇒ d = 1
m
¯
Xn/1
m
¯
1
m
This essentially tells us that we will zero the first moment of the data around the line. Note that
this determination of d requires knowledge of n. However, since this is a necessary condition, we
can plug it into the gradient equation:
0 = ∇
n
L(n, d, λ) = 2X
¯
Xn −2d

1
m
¯
X

¯
−2λn = 2
¸
X
¯
Xn −
1
m
¯
Xn
1
m
¯
1
m

1
m
¯
X

¯
−λn

The middle term is a scalar times a vector, so the multiplication can be commuted, this gives
0 =
¸
X
¯
Xn −
X
¯
1
m
1
m
¯
X
1
m
¯
1
m
n −λn

=
¸
X
¯
X −
X
¯
1
m
1
m
¯
X
1
m
¯
1
m
−λI

n
Thus n is an eigenvector of the matrix
M = X
¯
X −
X
¯
1
m
1
m
¯
X
1
m
¯
1
m
The final condition of the three Lagrange conditions is that g(n, d) = n
¯
n = 1. Thus if n is a
minimizer, then it is a unit eigenvector of M.
It is not clear which eigenvector is the right one, so we might have to check all the eigenvectors.
However, further work tells us which eigenvector it is. First we rewrite the function to be minimized,
when the optimal d is used:
ˆ
f(n) = f(n, 1
m
¯
Xn/1
m
¯
1
m
). We have
ˆ
f(n) = n
¯
X
¯
Xn −2
1
m
¯
Xn
1
m
¯
1
m
1
m
¯
Xn +

1
m
¯
Xn
1
m
¯
1
m

2
1
m
¯
1
m
= n
¯
X
¯
Xn −2
n
¯
X
¯
1
m
1
m
¯
Xn
1
m
¯
1
m
+
n
¯
X
¯
1
m
1
m
¯
Xn1
m
¯
1
m
1
m
¯
1
m
1
m
¯
1
m
= n
¯
X
¯
Xn −2
n
¯
X
¯
1
m
1
m
¯
Xn
1
m
¯
1
m
+
n
¯
X
¯
1
m
1
m
¯
Xn
1
m
¯
1
m
= n
¯
X
¯
Xn −
n
¯
X
¯
1
m
1
m
¯
Xn
1
m
¯
1
m
= n
¯
¸
X
¯
X −
X
¯
1
m
1
m
¯
X
1
m
¯
1
m

n
= n
¯
Mn
9.3. ORTHOGONAL LEAST SQUARES 127
This sequence of equations only required that the d be the optimal d for the given n, and used the
fact that a scalar is it’s own transpose, thus 1
m
¯
Xn = n
¯
X
¯
1
m
.
Now if n is a unit eigenvector of M, with corresponding eigenvalue λ, then n
¯
Mn = λ. Note
that because f(n, d) is defined as w
¯
w, for some vector w, then the matrix M must be positive
definite, i.e., λ must be positive. However, we want to minimize λ. Thus we take n to be the unit
eigenvector associated with the smallest eigenvalue.
Note that, by roundoff, you may compute that M has some negative eigenvalues, though they
are small in magnitude. Thus one should select, as n, the unit eigenvector associated with the
smallest eigenvalue in absolute value.
Example Problem 9.4. Find the equation of the line which best approximates the 2D data
¦(x
i
, y
i

m
i=1
, by the orthogonal least squares method.
Solution: In this case the matrix X is

x
1
y
1
x
2
y
2
x
3
y
3
.
.
.
.
.
.
x
m
y
m
¸
¸
¸
¸
¸
¸
¸
Thus we have that
M = X
¯
X −
X
¯
1
m
1
m
¯
X
1
m
¯
1
m
,
=
¸ ¸
x
2
i
¸
x
i
y
i
¸
x
i
y
i
¸
y
2
i


1
m
¸
(
¸
x
i
)
2
¸
x
i
¸
y
i
¸
x
i
¸
y
i
(
¸
y
i
)
2

,
=
¸ ¸
x
2
i
¸
x
i
y
i
¸
x
i
y
i
¸
y
2
i

−m
¸
¯ x
2
¯ x¯ y
¯ x¯ y ¯ y
2

,
=
¸ ¸
x
2
i
−m¯ x
2
¸
x
i
y
i
−m¯ x¯ y
¸
x
i
y
i
−m¯ x¯ y
¸
y
2
i
−m¯ y
2

=
¸
2α β
β 2γ

. (9.4)
where we use ¯ x, ¯ y, to mean, respectively, the mean of the x- and y-values. The characteristic
polynomial of the matrix, whose roots are the eigenvalues of the matrix is
p(λ) = det
¸
2α −λ β
β 2γ −λ

= 4αγ −β
2
−2(α +γ)λ +λ
2
.
The roots of this polynomial are given by the quadratic equation
λ
±
=
2(α +γ) ±

4 (α +γ)
2
−4 (4αγ −β
2
)
2
= (α +γ) ±

(α −γ)
2

2
We want the smaller eigenvalue, λ

= (α +γ) −

(α −γ)
2

2
. The associated eigenvector is in
the null space of the matrix
0 =
¸
2α −λ

β
β 2γ −λ

v =
¸
2α −λ

β
β 2γ −λ

¸
−k
1

=
¸
β −k (2α −λ

)
−kβ + 2γ −λ

128 CHAPTER 9. LEAST SQUARES
This is solved by
k =
2γ −λ

β
=
(γ −α) +

(γ −α)
2

2
β
=
¸
(y
2
i
−x
2
i
) −m

¯ y
2
− ¯ x
2

+

¸
(y
2
i
−x
2
i
) −m(¯ y
2
− ¯ x
2
)

2
+ 4 (
¸
x
i
y
i
−m¯ x¯ y)
2
2
¸
x
i
y
i
−m¯ x¯ y
. (9.5)
Thus the best plane through the data is of the form
−kx + 1y = d or y = kx +d
Note that the eigenvector, v, that we used is not a unit vector. However it is parallel to the unit
vector we want, and can still use the regular formula to compute the optimal d.
d = 1
m
¯
Xn/1
m
¯
1
m
=
1
m
1
m
¯
X
¸
−k
1

=

¯ x ¯ y

¸
−k
1

= ¯ y −k¯ x
Thus the optimal line through the data is
y = kx +d = k (x − ¯ x) + ¯ y or y − ¯ y = k (x − ¯ x) ,
with k defined in equation 9.5.
¬
9.3.1 Computing the Orthogonal Least Squares Approximant
Just as the Normal Equations equation 9.3 define the solution to the ordinary least squares ap-
proximant, but are not normally used to solve the problem, so too is the above described means of
computing the orthogonal least squares not a good idea.
First we define
W = X −
1
m
1
m
¯
X
1
m
¯
1
m
.
Now note that
W
¯
W =

X −
1
m
1
m
¯
X
1
m
¯
1
m

¯

X −
1
m
1
m
¯
X
1
m
¯
1
m

= X
¯
X −2
X
¯
1
m
1
m
¯
X
1
m
¯
1
m
+

1
m
1
m
¯
X

¯
1
m
1
m
¯
X

1
m
¯
1
m

2
= X
¯
X −2
X
¯
1
m
1
m
¯
X
1
m
¯
1
m
+
X
¯
1
m
(1
m
¯
1
m
)1
m
¯
X

1
m
¯
1
m

2
= X
¯
X −2
X
¯
1
m
1
m
¯
X
1
m
¯
1
m
+
X
¯
1
m
1
m
¯
X
1
m
¯
1
m
= X
¯
X −
X
¯
1
m
1
m
¯
X
1
m
¯
1
m
= M.
We now need some linear algebra magic:
Definition 9.3.1. A square matrix U is called unitary if its inverse is its transpose, i.e.,
U
¯
U = I = UU
¯
.
From the first equation, we see that the columns of U form a collection of orthonormal vectors.
The second equation tells that the rows of U are also such a collection.
9.3. ORTHOGONAL LEAST SQUARES 129
Definition 9.3.2. Every mn matrix B can be decomposed as
B = UΣV
¯
,
where U and V are unitary matrices, U is m m, V is n n, and Σ is m n, and has nonzero
elements only on its diagonal. The values of the diagonal of Σ are the singular values of B. The
column vectors of V span the row space of B, and the column vectors of U contain the column space
of B.
The singular value decomposition generalizes the notion of eigenvalues and eigenvalues to the
case of nonsquare matrices. Let u
(i)
be the i
th
column vector of U, v
(i)
the i
th
column of V, and
let σ
i
be the i
th
element of the diagonal of Σ. Then if x =
¸
i
α
i
v
(i)
, we have
Bx = UΣV
¯
¸
i
α
i
v
(i)
= UΣ[α
1
α
2
. . . α
n
]
¯
= U

σ
1
α
1
0 . . . 0
0 σ
2
α
2
. . . 0
.
.
.
.
.
.
.
.
.
.
.
.
0 0 . . . σ
n
α
n
0 0 . . . 0
.
.
.
.
.
.
.
.
.
.
.
.
0 0 . . . 0
¸
¸
¸
¸
¸
¸
¸
¸
¸
¸
¸
¸
=
¸
i
σ
i
α
i
u
(i)
.
Thus v
(i)
and u
(i)
act as an “eigenvector pair” with “eigenvalue” σ
i
.
The singular value decomposition is computed by the octave/Matlab command [U,S,V] =
svd(B). The decomposition is computed such that the diagonal elements of S, the singular values,
are positive, and decreasing with increasing index.
Now we can use the singular value decomposition on W:
W = UΣV
¯
.
Thus
M = W
¯
W = VΣ
¯
U
¯
UΣV
¯
= VΣ
¯
IΣV
¯
= VS
2
V
¯
,
where S
2
is the n n diagonal matrix whose i
th
diagonal is σ
2
i
. Thus we see that the solution to
the orthogonal least squares problem, the eigenvector associated with the smallest eigenvalue of M,
is the column vector of V associated with the smallest, in absolute value, singular value of W. In
octave/Matlab, this is V(:,n).
This may not appear to be a computational savings, but if M is computed up front, and its
eigenvectors are computed, there is loss of precision in its smallest eigenvalues which might lead us
to choose the wrong normal direction (especially if there is more than one!). On the other hand,
M is a n n matrix, whereas W is m n. If storage is at a premium, and the method is being
reapplied with new observations, it may be preferrable to keep M, rather than keeping W. That is,
it may be necessary to trade conditioning for space.
9.3.2 Principal Component Analysis
The above analysis leads us to the topic of Principal Component Analysis. Suppose the m n
matrix X represents m noisy observations of some process, where each observation is a vector in
R
n
. Suppose the observations nearly lie in some k-dimensional subspace of R
n
, for k < n. How can
you find an approximate value of k, and how do you find the k-dimensional subspace?
130 CHAPTER 9. LEAST SQUARES
As in the previous subsection, let
W = X −
1
m
1
m
¯
X
1
m
¯
1
m
= X −1
m
¯
X, where
¯
X = 1
m
¯
X/1
m
¯
1
m
.
Now note that
¯
X is the mean of the m observations, as a row vector in R
n
. Under the assumption
that the noise is approximately the same for each observation, we should think that
¯
X is likely
to be in or very close to the k-dimensional subspace. Then each row of W should be like a vector
which is contained in the subspace. If we take some vector v and multiply Wv, we expect the
resultant vector to have small norm if v is not in the k-dimensional subspace, and to have a large
norm if it is in the subspace.
You should now convince yourself that this is related to the singular value decomposition of W.
Decomposing v =
¸
i
α
i
v
(i)
, we have Wv =
¸
i
σ
i
α
i
u
(i)
, and thus product vector has small norm if
the α
i
associated with large σ
i
are small, and large norm otherwise. That is the principal directions
of the “best” k-dimensional subspace associated with X are the k columns of V associated with the
largest singular values of W.
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 5 10 15 20 25 30 35 40
cum. sum sqrd. sing. vals.
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 5 10 15 20 25 30 35 40
cum. sum sqrd. sing. vals.
Figure 9.6: Noisy 5-dimensional data lurking in R
40
are treated to Principal Component Analysis.
This figure shows the normalized cumulative sum of squared singular values of W. There is a sharp
“elbow” at k = 5, which indicates the data is actually 5-dimensional.
If k is unknown a priori , a good value can be obtained by eyeing a graph of the cumulative
sum of squared singular values of W, and selecting the k that makes this appropriately large. This
is illustrated in Figure 9.6, with data generated by the following octave/Matlab code:
octave:1> X = rand(150,5);
octave:2> sawX = (X * randn(5,40)) + kron(ones(150,1),randn(1,40));
octave:3> sawX += 0.1 * randn(150,40);
octave:4> wun = ones(150,1);
9.3. ORTHOGONAL LEAST SQUARES 131
octave:5> W = sawX - (wun * wun’ * sawX) / (wun’ * wun);
octave:6> [U,S,V] = svd(W);
octave:7> eigs = diag(S’*S);
octave:8> cseig = cumsum(eigs) ./ sum(eigs);
octave:9> plot ((1:40)’,cseig,’xr-@;cum. sum sqrd. sing. vals.;’);
132 CHAPTER 9. LEAST SQUARES
Exercises
(9.1) How do you know that the choice of constants c
i
in our least squares analysis actually find a
minimum of equation 9.1, and not, say, a maximum?
(9.2) Our “linear” least squares might be better called the “affine” least squares. In this exercise
you will find the best linear function which approximates a set of data. That is, find the
function f(x) = cx which is the ordinary least squares best approximant to the given data
x x
0
x
1
. . . x
n
y y
0
y
1
. . . y
n
(9.3) Find the constant that best approximates, in the ordinary least squares sense, the given data
x, y. (Hint: you can use equation 9.2 or equation 9.3 using a single basis function g
0
(x) = 1.)
Do the x
i
values affect your answer?
(9.4) Find the function f(x) = c which best approximates, in the ordinary least squares sense, the
data
x 1 −2 5
y 1 −2 4
(9.5) Find the function ax +b that best approximates the data
x 0 −1 2
y 0 1 −1
Use the ordinary least squares method.
(9.6) Find the function ax +b that best approximates the data of the previous problem using the
orthogonal least squares method.
(9.7) Find the function ax
2
+b that best approximates the data
x 0 1 2
y 0.3 0.1 0.5
(9.8) Prove that the least squares solution is the solution of the normal equations, equation 9.3,
and thus takes the form
c =

A
¯
A

−1
A
¯
y.
(9.9) For ordinary least squares regression to the line y = mx + b, express the approximate m
in terms of the α, β, and γ parameters from equation 9.4. Compare this to that found in
equation 9.5.
(9.10) Any symmetric positive definite matrix, G can be used to define a norm in the following
way:
|v|
G
=
df

v
¯
Gv
The weighted least squares method for G and data A and y, finds the ˆ c that solves
min
c
|Ac −y|
2
G
Prove that the normal equations form of the solution of this problem is
A
¯
GAˆ c = A
¯
Gy.
9.3. ORTHOGONAL LEAST SQUARES 133
(9.11) (Continuation) Let the symmetric positive definite matrix G have Cholesky Factorization
G = LL
¯
, where L is a lower triangular matrix. Show that the solution to the weighted least
squares method is the ˆ c that is the ordinary (unweighted) least squares best approximate
solution to the problem
L
¯
Ac = L
¯
y.
(9.12) The r
2
statistic is often used to describe the “goodness-of-fit” of the ordinary least squares
approximation to data. It is defined as
r
2
=
|Aˆ c|
2
2
|y|
2
2
,
where ˆ c is the approximant A
¯
A
−1
A
¯
y.
(a) Prove that we also have
r
2
= 1 −
|Aˆ c −y|
2
2
|y|
2
2
.
(b) Prove that the r
2
parameter is “scale-invariant,” that is, if we change the units of our
data, the parameter is unchanged (although the value of ˆ c may change.)
(c) Devise a similar statistic for the orthogonal least squares approximation which is scale-
invariant and rotation-invariant.
(9.13) As proved in Example Problem 9.4, the orthogonal least squares best approximate line for
data ¦(x
i
, y
i

m
i=1
goes through the point (¯ x, ¯ y). Does the same thing hold for the ordinary
least squares best approximate line?
(9.14) Find the constant c such that f(x) = ln (cx) best approximates, in the least squares sense,
the given data
x x
0
x
1
. . . x
n
y y
0
y
1
. . . y
n
(Hint: You cannot use basis functions and equation 9.2 to solve this. You must use the
Definition 9.1.1.) The geometric mean of the numbers a
1
, a
2
, . . . , a
n
is defined as (
¸
a
i
)
1/n
.
How does your answer relate to the geometric mean?
(9.15) Write code to find the function ae
x
+be
−x
that best interpolates, in the least squares sense,
a set of data ¦(x
i
, y
i

i=n
i=0
. Your m-file should have header line like:
function [a,b] = lsqrexp(xys)
where xys is the (n + 1) 2 matrix whose rows are the n + 1 tuples (x
i
, y
i
). Use the normal
equations method. This should reduce the problem to solving a 2 2 linear system; let
octave/Matlab solve that linear system for you. (Hint: To find x such that Mx = b, use x =
M ` b.)
(a) What do you get when you try the following?
octave:1> xs = [-1 -0.5 0 0.5 1]’;
octave:2> ys = [1.194 0.430 0.103 0.322 1.034]’;
octave:3> xys = [xs ys];
octave:4> [a,b] = lsqrexp(xys)
(b) Try the following:
octave:5> xys = xys = [-0.00001 0.3;0 0.6;0.00001 0.7];
octave:6> [a,b] = lsqrexp(xys)
Does something unexpected happen? Why?
(9.16) Write code to find the plane n
¯
x
(i)
= d that best interpolates, in the least squares sense, a
set of data
¸
x
(i)
¸
i=m
i=0
. Your m-file should have header line like:
134 CHAPTER 9. LEAST SQUARES
function [n,d] = orthlsq(X)
where X is the m n matrix whose rows are the m tuples (x
1
, x
2
, . . . , x
n
). Use the oc-
tave/Matlab eig function, which returns the eigenvalues and vectors of a matrix.
(a) Create artificially planar data by the following:
octave:1> m = 100;n = 5;epsi = 0.1;offs = 2;
octave:2> Xsub = rand(m,(n-1));
octave:3> T = rand(n-1,n);
octave:4> X = Xsub * T;
octave:5> X += offs .* ones(size(X));
octave:6> X += epsi .* randn(size(X));
Now X contains data which are approximately on a (n − 1)-dimensional hyperplane in
R
n
. The data, when centered, lay in the rowspace of T. Now test your code.
octave:7> [n,d] = orthlsq(X);
You should have that the vector n is orthogonal to the rows of T. Check this:
octave:8> disp(norm(T * n));
What do you get? Try again with epsi very small (1 10
−5
) or zero.
(b) Compare the orthogonal least squares technique to the ordinary least squares in a two
dimensional setting. Consider x, y data nearly on the line y = mx +b:
octave:1> obs = 100;xeps = 1.0;yeps = 1.0;m = 0.375;b = 1;
octave:2> Xtrue = randn(obs,1);Ytrue = m .* Xtrue .+ b;
octave:3> XSaw = Xtrue + xeps * randn(size(Xtrue));
octave:4> YSaw = Ytrue + yeps * randn(size(Ytrue));
octave:5> lsmb = [XSaw, ones(obs,1)] \ YSaw;
octave:6> lsm = lsmb(1); lsb = lsmb(2);
octave:7> [n,d] = orthlsq([XSaw, YSaw]);
octave:8> orthm = -n(1) / n(2);orthb = d / n(2);
Compare the estimated coefficients from both solutions. Plot the two regression lines.
octave:9> hold off;
octave:10> plot(XSaw,YSaw,"*;observed data;");
octave:11> hold on;
octave:12> plot(XSaw,m*XSaw .+ b,"r-;true line;");
octave:13> plot(XSaw,lsm*XSaw .+ lsb,"g-;ordinary least squares;");
octave:14> plot(XSaw,orthm*XSaw .+ orthb,"b-;orthogonal least squares;");
octave:15> hold off;
Try this again with different values of xeps and yeps, which control the sizes of the
errors in the x and y dimensions. Try with xeps = 0.1, yeps = 1.0, and with with
xeps = 1.0, yeps = 0.1. Under which situations does orthogonal least squares outperform
ordinary least squares? Do they give equally erroneous results in some situations? Can
you think of a way of “reweighting” channel data to make the orthogonal least squares
method more robust in cases of unequal error variance?
Chapter 10
Ordinary Differential Equations
Ordinary Differential Equations, or ODEs, are used in approximating some physical systems. Some
classical uses include simulation of the growth of populations and trajectory of a particle. They are
usually easier to solve or approximate than the more general Partial Differential Equations, PDEs,
which can contain partial derivatives.
Much of the background material of this chapter should be familiar to the student of calculus.
We will focus more on the approximation of solutions, than on analytical solutions. For more
background on ODEs, see any of the standard texts, e.g., [8].
10.1 Elementary Methods
A one-dimensional ODE is posed as follows: find some function x(t) such that

dx(t)
dt
= f (t, x(t)) ,
x(a) = c.
(10.1)
In calculus classes, you probably studied the case where f (t, x(t)) is independent of its second
variable. For example the ODE given by

dx(t)
dt
= t
2
−t,
x(a) = c.
has solution x(t) =
1
3
t
3

1
2
t
2
+K, where K is chosen such that x(a) = c.
A more general case is when f(t, x) is separable, that is f(t, x) = g(t)h(x) for some functions
g, h. You may recall that the solution to such a separable ODE usually involved the following
equation:

dx
h(x)
=

g(t) dt
Finding an analytic solution can be considerably more difficult in the general case, thus we turn
to approximation schemes.
135
136 CHAPTER 10. ORDINARY DIFFERENTIAL EQUATIONS
10.1.1 Integration and ‘Stepping’
We attempt to solve the ODE by integrating both sides. That is
dx(t)
dt
= f (t, x(t)) , yields

t+h
t
dx =

t+h
t
f (r, x(r)) dr, thus
x(t +h) = x(t) +

t+h
t
f (r, x(r)) dr. (10.2)
If we can approximate the integral then we have a way of ‘stepping’ from t to t +h, i.e., if we have
a good approximate of x(t) we can approximate x(t + h). Note that this means that all the work
we have put into approximating integrals can be used to approximate the solution of ODEs.
Using stepping schemes we can approximate x(t
final
) given x(t
initial
) by taking a number of
steps. For simplicity we usually use the same step size, h, for each step, though nothing precludes
us from varying the step size.
If we apply the left-hand rectangle rule for approximating integrals to equation 10.2, we get
x(t +h) ≈ x(t) +hf(t, x(t)). (10.3)
This is called Euler’s Method. Note this is essentially the same as the “forward difference” approx-
imation we made to the derivative, equation 7.1.
Trapezoid rule gives
x(t +h) ≈ x(t) +
h
2
[f(t, x(t)) +f(t +h, x(t +h))] .
But we cannot evaluate this exactly, since x(t +h) appears on both sides of the equation. And it is
embedded on the right hand side. Bummer. But if we could approximate it, say by using Euler’s
Method, maybe the formula would work. This is the idea behind the Runge-Kutta Method (see
Section 10.2).
10.1.2 Taylor’s Series Methods
We see that our more accurate integral approximations will be useless since they require information
we do not know, i.e., evaluations of f(t, x) for yet unknown x values. Thus we fall back on
Taylor’s Theorem (Theorem 1.1). We can also view this as using integral approximations where all
information comes from the left-hand endpoint.
By Taylor’s theorem, if x has at least m+ 1 continuous derivatives on the interval in question,
we can write
x(t +h) = x(t) +hx
t
(t) +
1
2
h
2
x
tt
(t) +. . . +
1
m!
h
m
x
(m)
(t) +
1
(m+ 1)!
h
m+1
x
(m+1)
(τ),
where τ is between t and t + h. Since τ is essentially unknown, our best calculable approximation
to this expression is
x(t +h) ≈ x(t) +hx
t
(t) +
1
2
h
2
x
tt
(t) +. . . +
1
m!
h
m
x
(m)
(t).
This approximate solution to the ODE is called a Taylor’s series method of order m. We also say
that this approximation is a truncation of the Taylor’s series. The difference between the actual
10.1. ELEMENTARY METHODS 137
value of x(t +h), and this approximation is called the truncation error. The truncation error exists
independently from any error in computing the approximation, called roundoff error. We discuss
this more in Subsection 10.1.5.
10.1.3 Euler’s Method
When m = 1 we recover Euler’s Method:
x(t +h) = x(t) +hx
t
(t) = x(t) +hf(t, x(t)).
Example 10.1. Consider the ODE

dx(t)
dt
= x,
x(0) = 1.
The actual solution is x(t) = e
t
. Euler’s Method will underestimate x(t) because the curvature of
the actual x(t) is positive, and thus the function is always above its linearization. In Figure 10.1,
we see that eventually the Euler’s Method approximation is very poor.
0
2
4
6
8
10
12
14
16
18
20
22
0 0.5 1 1.5 2 2.5 3
P
S
f
r
a
g
r
e
p
l
a
c
e
m
e
n
t
s
Euler approximation
Actual
Figure 10.1: Euler’s Method applied to approximate x
t
= x, x(0) = 1. Each step proceeds on
the linearization of the curve, and is thus an underestimate, as the actual solution, x(t) = e
t
has
positive curvature. Here step size h = 0.3 is used.
10.1.4 Higher Order Methods
Higher order methods are derived by taking more terms of the Taylor’s series expansion. Note
however, that they will require information about the higher derivatives of x(t), and thus may not
be appropriate for the case where f(t, x) is given as a “black box” function.
138 CHAPTER 10. ORDINARY DIFFERENTIAL EQUATIONS
Example Problem 10.2. Derive the Taylor’s series method for m = 3 for the ODE

dx(t)
dt
= x +e
x
,
x(0) = 1.
Solution: By simple calculus:
x
t
(t) = x +e
x
x
tt
(t) = x
t
+e
x
x
t
x
ttt
(t) = x
tt
(1 +e
x
) +x
t
e
x
x
t
We can then write our step like a program:
x
t
(t) ←x(t) +e
x(t)
x
tt
(t) ←x
t
(t)

1 +e
x(t)

x
ttt
(t) ←x
tt
(t)

1 +e
x(t)

+

x
t
(t)

2
e
x(t)
x(t +h) ≈ x(t) +hx
t
(t) +
1
2
h
2
x
tt
(t) +
1
6
h
3
x
ttt
(t)
¬
10.1.5 Errors
There are two types of errors: truncation and roundoff. Truncation error is the error of truncating
the Taylor’s series after the m
th
term. You can see that the truncation error is O

h
m+1

. Roundoff
error occurs because of the limited precision of computer arithmetic.
While it is possible these two kinds of error could cancel each other, given our pessimistic
worldview, we usually assume they are additive. We also expect some tradeoff between these two
errors as h varies. As h is made smaller, the truncation error presumably decreases, while the
roundoff error increases. This results in an optimal h-value for a given method. See Figure 10.2.
Unfortunately, the optimal h-value will usually depend on the given ODE and the approximation
method. For an ODE with unknown solution, it is generally impossible to predict the optimal h. We
have already observed this kind of behaviour, when analyzing approximate derivatives–cf. Example
Problem 7.5.
Both of these kinds of error can accumulate: Since we step from x(t) to x(t + h), an error in
the value of x(t) can cause a greater error in the value of x(t +h). It turns out that for some ODEs
and some methods, this does not happen. This leads to the topic of stability.
10.1.6 Stability
Suppose you had an exact method of solving an ODE, but had the wrong data. That is, you were
trying to solve

dx(t)
dt
= f (t, x(t)) ,
x(a) = c,
but instead fed the wrong data into the computer, which then solved exactly the ODE

dx(t)
dt
= f (t, x(t)) ,
x(a) = c +.
10.1. ELEMENTARY METHODS 139
0.01
0.1
1
10
100
1000
10000
1e-07 1e-06 1e-05 0.0001 0.001 0.01
P
S
f
r
a
g
r
e
p
l
a
c
e
m
e
n
t
s
truncation
roundoff
total error
Figure 10.2: The truncation, roundoff, and total error for a fictional ODE and approximation
method are shown. The total error is apparently minimized for an h value of approximately 0.0002.
Note that the truncation error appears to be O(h), thus the approximation could be Euler’s Method.
In reality, we expect the roundoff error to be more “bumpy,” as seen in Figure 7.1.
This solution can diverge from the correct one because of the different starting conditions. We
illustrate this with the usual example.
Example 10.3. Consider the ODE
dx(t)
dt
= x.
This has solution x(t) = x(0)e
t
. The curves for different starting conditions diverge, as in Fig-
ure 10.3(a).
However, if we instead consider the ODE
dx(t)
dt
= −x,
which has solution x(t) = x(0)e
−t
, we see that differences in the initial conditions become immate-
rial, as in Figure 10.3(b).
Thus the latter ODE exhibits stability: roundoff and truncation errors accrued at a given step
will become irrelevant as more steps are taken. The former ODE exhibits the opposite behavior–
accrued errors will be amplified.
140 CHAPTER 10. ORDINARY DIFFERENTIAL EQUATIONS
0
2
4
6
8
10
12
0 0.5 1 1.5 2
P
S
f
r
a
g
r
e
p
l
a
c
e
m
e
n
t
s
= −0.5
= −0.25
= 0
= 0.25
= 0.5
(a) (1 + )e
t
0
0.2
0.4
0.6
0.8
1
1.2
1.4
1.6
0 0.5 1 1.5 2
P
S
f
r
a
g
r
e
p
l
a
c
e
m
e
n
t
s
= −0.5
= −0.25
= 0
= 0.25
= 0.5
(b) (1 + )e
−t
Figure 10.3: Exponential functions with different starting conditions: in (a), the functions (1+)e
t
are shown, while in (b), (1+)e
−t
are shown. The latter exhibits stability, while the former exhibits
instability.
10.1. ELEMENTARY METHODS 141
It turns out there is a simple test for stability. If f
x
> δ for some positive δ, then the ODE
is instable; if f
x
< −δ for a positive δ, the equation is stable. Some ODEs fall through this test,
however.
We can prove this without too much pain. Define x(t, s) to be the solution to

dx(t)
dt
= f (t, x(t)) ,
x(a) = s,
for t ≥ a.
Then instability means that
lim
t→∞


∂s
x(t, s)

= ∞.
Think about this in terms of the curves for our simple example x
t
= x.
If we take the derivative of the ODE we get

∂t
x(t, s) = f(t, x(t, s)),

∂s

∂t
x(t, s) =

∂s
f(t, x(t, s)), i.e.,

∂s

∂t
x(t, s) = f
t
(t, x(t))
∂t
∂s
+f
x
(t, x(t, s))
∂x(t, s)
∂s
.
By the independence of t, s the first part of the RHS is zero, so

∂s

∂t
x(t, s) = f
x
(t, x(t, s))
∂x(t, s)
∂s
.
We use continuity of x to switch the orders of differentiation:

∂t

∂s
x(t, s) = f
x
(t, x(t, s))

∂s
x(t, s).
Defining u(t) =

∂s
x(t, s), and q(t) = f
x
(t, x(t, s)), we have

∂t
u(t) = q(t)u(t).
The solution is u(t) = ce
Q(t)
, where
Q(t) =

t
a
q(r) dr.
If q(r) = f
x
(r, x(r, s)) ≥ δ > 0, then limQ(t) = ∞, and so limu(t) = ∞. But note that u(t) =

∂s
x(t, s), and thus we have instability.
Similarly if q(r) ≤ −δ < 0, then limQ(t) = −∞, and so limu(t) = 0, giving stability.
10.1.7 Backwards Euler’s Method
Euler’s Method is the most basic numerical technique for solving ODEs. However, it may suffer
from instability, which may make the method inappropriate or impractical. However, it can be
recast into a method which may have superior stability at the cost of limited applicability.
142 CHAPTER 10. ORDINARY DIFFERENTIAL EQUATIONS
This method, the so-called Backwards Euler’s Method, is a stepping method: given a reasonable
approximation to x(t), it calculates an approximation to x(t + h). As usual, Taylor’s Theorem is
the starting point. Letting t
f
= t +h, Taylor’s Theorem states that
x(t
f
−h) = x(t
f
) + (−h)x
t
(t
f
) +
(−h)
2
2
x
tt
(t
f
) +. . . ,
for an analytic function. Assuming that x has two continuous derivatives on the interval in question,
the second order term is truncated to give
x(t) = x(t +h −h) = x(t
f
−h) ≈ x(t
f
) + (−h)x
t
(t
f
)
and so, x(t +h) ≈ x(t) +hx
t
(t +h)
The complication with applying this method is that x
t
(t+h) may not be computable if x(t+h) is
not known, thus cannot be used to calculate an approximation for x(t+h). Since this approximation
may contain x(t + h) on both sides, we call this method an implicit method. In contrast, Euler’s
Method, and the Runge-Kutta methods in the following section are explicit methods: they can be
used to directly compute an approximate step. We make this clear by examples.
Example 10.4. Consider the ODE

dx(t)
dt
= cos x,
x(0) = 2π.
In this case, if we wish to approximate x(0 +h) using Backwards Euler’s Method, we have to find
x(h) such that x(h) = x(0) + cos x(h). This is equivalent to finding a root of the equation
g(y) = 2π + cos y −y = 0
The function g(y) is nonincreasing (g
t
(y) is nonpositive), and has a zero between 6 and 8, as seen
in Figure 10.4. The techniques of Chapter 4 are needed to find the zero of this function.
-6
-4
-2
0
2
4
6
8
0 2 4 6 8 10 12
P
S
f
r
a
g
r
e
p
l
a
c
e
m
e
n
t
s
g(y)
Figure 10.4: The nonincreasing function g(y) = 2π + cos y −y is shown.
10.2. RUNGE-KUTTA METHODS 143
Example 10.5. Consider the ODE from Example 10.1:

dx(t)
dt
= x,
x(0) = 1.
The actual solution is x(t) = e
t
. Euler’s Method underestimates x(t), as shown in Figure 10.1.
Using Backwards Euler’s Method, gives the approximation:
x(t +h) = x(t) +hx(t +h),
x(t +h) =
x(t)
1 −h
.
Using Backwards Euler’s Method gives an overestimate, as shown in Figure 10.5. This is because
each step proceeds on the tangent line to a curve ke
t
to a point on that curve. Generally Backwards
Euler’s Method is preferred over vanilla Euler’s Method because it gives equivalent stability for
larger stepsize h. This effect is not evident in this case.
0
5
10
15
20
25
30
35
40
0 0.5 1 1.5 2 2.5 3
P
S
f
r
a
g
r
e
p
l
a
c
e
m
e
n
t
s
Backwards Euler approximation
Actual
Figure 10.5: Backwards Euler’s Method applied to approximate x
t
= x, x(0) = 1. The approxi-
mation is an overestimate, as each step is to a point on a curve ke
t
, through the tangent at that
point. Compare this figure to Figure 10.1, which shows the same problem approximated by Euler’s
Method, with the same stepsize, h = 0.3.
10.2 Runge-Kutta Methods
Recall the ODE problem: find some x(t) such that

dx(t)
dt
= f (t, x(t)) ,
x(a) = c,
144 CHAPTER 10. ORDINARY DIFFERENTIAL EQUATIONS
where f, a, c are given.
Recall that the Taylor’s series method has problems: either you are using the first order method
(Euler’s Method), which suffers inaccuracy, or you are using a higher order method which requires
evaluations of higher order derivatives of x(t), i.e., x
tt
(t), x
ttt
(t), etc. This makes these methods less
useful for the general setting of f being a blackbox function. The Runge-Kutta Methods (don’t
ask me how it’s pronounced) seek to resolve this.
10.2.1 Taylor’s Series Redux
We fall back on Taylor’s series, in this case the 2-dimensional version:
f(x +h, y +k) =

¸
i=0
1
i!

h

∂x
+k

∂y

i
f(x, y).
The thing in the middle is an operator on f(x, y). The partial derivative operators are interpreted
exactly as if they were algebraic terms, that is:

h

∂x
+k

∂y

0
f(x, y) = f(x, y),

h

∂x
+k

∂y

1
f(x, y) = h
∂f(x, y)
∂x
+k
∂f(x, y)
∂y
,

h

∂x
+k

∂y

2
f(x, y) = h
2

2
f(x, y)
∂x
2
+ 2hk

2
f(x, y)
∂x∂y
+k
2

2
f(x, y)
∂y
2
,
.
.
.
There is a truncated version of Taylors series:
f(x +h, y +k) =
n
¸
i=0
1
i!

h

∂x
+k

∂y

i
f(x, y) +
1
(n + 1)!

h

∂x
+k

∂y

n+1
f(¯ x, ¯ y).
where (¯ x, ¯ y) is a point on the line segment with endpoints (x, y) and (x +h, y +k).
For practice, we will write this for n = 1:
f(x +h, y +k) = f(x, y) +hf
x
(x, y) +kf
y
(x, y) +
1
2

h
2
f
xx
(¯ x, ¯ y) + 2hkf
xy
(¯ x, ¯ y) +k
2
f
yy
(¯ x, ¯ y)

10.2.2 Deriving the Runge-Kutta Methods
We now introduce the Runge-Kutta Method for stepping from x(t) to x(t + h). We suppose that
α, β, w
1
, w
2
are fixed constants. We then compute:
K
1
←hf(t, x)
K
2
←hf(t +αh, x +βK
1
).
Then we approximate:
x(t +h) ←x(t) +w
1
K
1
+w
2
K
2
.
We now examine the “proper” choices of the constants. Note that w
1
= 1, w
2
= 0 corresponds
to Euler’s Method, and does not require computation of K
2
. We will pick another choice.
10.2. RUNGE-KUTTA METHODS 145
Also notice that the definition of K
2
should be related to Taylor’s theorem in two dimensions.
Let’s look at it:
K
2
/h = f(t +αh, x +βK
1
)
= f(t +αh, x +βhf(t, x))
= f +αhf
t
+βhff
x
+
1
2

α
2
h
2
f
tt
(
¯
t, ¯ x) +αβfh
2
f
tx
(
¯
t, ¯ x) +β
2
h
2
f
2
f
x
x(
¯
t, ¯ x)

.
Now reconsider our step:
x(t +h) = x(t) +w
1
K
1
+w
2
K
2
= x(t) +w
1
hf +w
2
hf +w
2
αh
2
f
t
+w
2
βh
2
ff
x
+O

h
3

.
= x(t) + (w
1
+w
2
) hx
t
(t) +w
2
h
2
(αf
t
+βff
x
) +O

h
3

.
= x(t) + (w
1
+w
2
) hx
t
(t) +w
2
h
2
(αf
t
+βff
x
) +O

h
3

.
If we happen to choose our constants such that
w
1
+w
2
= 1, αw
2
=
1
2
= βw
2
,
then we get
x(t +h) = x(t) +hx
t
(t) +
1
2
h
2
(f
t
+ff
x
) +O

h
3

= x(t) +hx
t
(t) +
1
2
h
2
x
tt
(t) +O

h
3

,
i.e., our choice of the constants makes the approximate x(t +h) good up to a O

h
3

term, because
we end up with the Taylor’s series expansion up to that term. cool.
The usual choice of constants is α = β = 1, w
1
= w
2
=
1
2
. This gives the second order Runge-
Kutta Method :
x(t +h) ←x(t) +
h
2
f(t, x) +
h
2
f(t +h, x +hf(t, x)).
This can be written (and evaluated) as
K
1
←hf(t, x)
K
2
←hf(t +h, x +K
1
)
x(t +h) ←x(t) +
1
2
(K
1
+K
2
) .
(10.4)
Another choice is α = β = 2/3, w
1
= 1/4, w
2
= 3/4. This gives
x(t +h) ←x(t) +
h
4
f(t, x) +
3h
4
f

t +
2h
3
, x +
2h
3
f(t, x)

.
The Runge-Kutta Method of order two has error term O

h
3

. Sometimes this is not enough
and higher-order Runge-Kutta Methods are used. The next Runge-Kutta Method is the order four
method:
K
1
←hf(t, x)
K
2
←hf(t +h/2, x +K
1
/2)
K
3
←hf(t +h/2, x +K
2
/2)
K
4
←hf(t +h, x +K
3
)
x(t +h) ←x(t) +
1
6
(K
1
+ 2K
2
+ 2K
3
+K
4
) .
(10.5)
146 CHAPTER 10. ORDINARY DIFFERENTIAL EQUATIONS
This method has order O

h
5

. (See http://mathworld.wolfram.com/Runge-KuttaMethod.html)
The Runge-Kutta Method can be extrapolated to even higher orders. However, the number
of function evaluations grows faster than the accuracy of the method. Thus the methods of order
higher than four are normally not used.
We already saw that w
1
= 1, w
2
= 0 corresponds to Euler’s Method. We consider now the case
w
1
= 0, w
2
= 1. The method becomes
x(t +h) ←x(t) +hf

t +
h
2
, x +
h
2
f(t, x)

.
This is called the modified Euler’s Method . Note this gives a different value than if Euler’s
Method was applied twice with step size h/2.
10.2.3 Examples
Example 10.6. Consider the ODE:

x
t
= (tx)
3

x
t

2
x(1) = 1
Use h = 0.1 to compute x(1.1) using both Taylor’s Series Methods and Runge-Kutta methods
of order 2.
10.3 Systems of ODEs
Recall the regular ODE problem: find some x(t) such that

dx(t)
dt
= f (t, x(t)) ,
x(a) = c,
where f, a, c are given.
Sometimes the physical systems we are considering are more complex. For example, we might
be interested in the system of ODEs:

dx(t)
dt
= f (t, x(t), y(t)) ,
dy(t)
dt
= g (t, x(t), y(t)) ,
x(a) = c,
y(a) = d.
Example 10.7. Consider the following system of ODEs:

dx(t)
dt
= t
2
−x
dy(t)
dt
= y
2
+y −t,
x(0) = 1,
y(0) = 0.
You should immediately notice that this is not a system at all, but rather a collection of two ODEs:

dx(t)
dt
= t
2
−x
x(0) = 1,
10.3. SYSTEMS OF ODES 147
and

dy(t)
dt
= y
2
+y −t,
y(0) = 0.
These two ODEs can be solved separately. We call such a system uncoupled. In an uncoupled
system the function f(t, x, y) is independent of y, and g(t, x, y) is independent of x. A system
which is not uncoupled, is, of course, coupled. We will not consider uncoupled systems.
10.3.1 Larger Systems
There is no need to stop at two functions. We may imagine we have to solve the following problem:
find x
1
(t), x
2
(t), . . . , x
n
(t) such that

dx
1
(t)
dt
= f
1
(t, x
1
(t), x
2
(t), . . . , x
n
(t)) ,
dx
2
(t)
dt
= f
2
(t, x
1
(t), x
2
(t), . . . , x
n
(t)) ,
.
.
.
dxn(t)
dt
= f
n
(t, x
1
(t), x
2
(t), . . . , x
n
(t)) ,
x
1
(a) = c
1
, x
2
(a) = c
2
, . . . , x
n
(a) = c
n
.
The idea is to not be afraid of the notation, write everything as vectors, then do exactly the
same thing as for the one dimensional case! That’s right, there is nothing new but notation:
Let
X =

x
1
x
2
. . .
x
n
¸
¸
¸
¸
, X

=

x
t
1
x
t
2
. . .
x
t
n
¸
¸
¸
¸
, F =

f
1
f
2
. . .
f
n
¸
¸
¸
¸
, C =

c
1
c
2
. . .
c
n
¸
¸
¸
¸
.
Then we want to find X such that

X

(t) = F (t, X(t))
X (a) = C.
(10.6)
Compare this to equation 10.1.
10.3.2 Recasting Single ODE Methods
We will consider stepping methods for solving higher order ODEs. That is, we know X(t), and
want to find X(t + h). Most of the methods we looked at for solving one dimensional ODEs can
be rewritten to solve higher dimensional problems.
For example, Euler’s Method, equation 10.3 can be written as
X(t +h) ←X(t) +hF (t, X(t)) .
As in 1D, this method is derived from approximating X by its linearization.
In fact, our general strategy of using Taylor’s Theorem also carries through without change.
That is we can use the k
th
order method to step as follows:
X(t +h) ←X(t) +hX

(t) +
h
2
2
X

(t) +. . . +
h
k
k!
X
(k)
(t).
148 CHAPTER 10. ORDINARY DIFFERENTIAL EQUATIONS
Additionally we can write the Runge-Kutta Methods as follows:
Order two:
K
1
←hF (t, X)
K
2
←hF (t +h, X +K
1
)
X(t +h) ←X(t) +
1
2
(K
1
+K
2
) .
Order four:
K
1
←hF (t, X)
K
2
←hF

t +
1
2
h, X +
1
2
K
1

K
3
←hF

t +
1
2
h, X +
1
2
K
2

K
4
←hF (t +h, X +K
3
)
X(t +h) ←x(t) +
1
6
(K
1
+ 2K
2
+ 2K
3
+K
4
) .
Example Problem 10.8. Consider the system of ODEs:

x
t
1
(t) = x
2
−x
2
3
x
t
2
(t) = t +x
1
+x
3
x
t
3
(t) = x
2
−x
2
1
x
1
(0) = 1,
x
2
(0) = 0,
x
3
(0) = 1.
Approximate X(0.1) by taking a single step of size 0.1, for Euler’s Method, and the Runge-Kutta
Method of order 2.
Solution: We write
F(t, X(t)) =

x
2
(t) −x
2
3
(t)
t +x
1
(t) +x
3
(t)
x
2
(t) −x
2
1
(t)
¸
¸
X(0) =

1
0
1
¸
¸
Thus we have
F(0, X(0)) =

−1
2
−1
¸
¸
Euler’s Method then makes the approximation
X(0.1) ←X(0) + 0.1F(0, X(0)) =

1
0
1
¸
¸
+

−0.1
0.2
−0.1
¸
¸
=

0.9
0.2
0.9
¸
¸
The Runge-Kutta Method computes:
K
1
←0.1F(0, X(0)) =

−0.1
0.2
−0.1
¸
¸
and K
2
←0.1F(0.1, X(0) +K
1
) =

−0.061
0.19
−0.061
¸
¸
10.3. SYSTEMS OF ODES 149
Then
X(0.1) ←X(0) +
1
2
(K
1
+K
2
) =

1
0
1
¸
¸
+

−0.0805
0.195
−0.0805
¸
¸
=

0.9195
0.195
0.9195
¸
¸
¬
10.3.3 It’s Only Systems
The fact we can simply solve systems of ODEs using old methods is good news. It allows us to
solve higher order ODEs. For example, consider the following problem: given f, a, c
0
, c
1
, . . . , c
n
,
find x(t) such that

x
(n)
(t) = f

t, x(t), x
t
(t), . . . , x
(n−1)
(t)

,
x(a) = c
0
, x
t
(a) = c
1
, x
tt
(a) = c
2
, . . . , x
(n−1)
(a) = c
n−1
.
We can put this in terms of a system by letting
x
0
= x, x
1
= x
t
, x
2
= x
tt
, . . . , x
n−1
= x
(n−1)
.
Then the ODE plus these n −1 equations can be written as

x
t
n−1
(t) = f (t, x
0
(t), x
1
(t), x
2
(t), . . . , x
n−1
(t))
x
t
0
(t) = x
1
(t)
x
t
1
(t) = x
2
(t)
.
.
.
x
t
n−2
(t) = x
n−1
(t)
x
0
(a) = c
0
, x
1
(a) = c
1
, x
2
(a) = c
2
, . . . , x
n−1
(a) = c
n−1
.
This is just a system of ODEs that we can solve like any other.
Note that this trick also works on systems of higher order ODEs. That is, we can transform
any system of higher order ODEs into a (larger) system of first order ODEs.
Example Problem 10.9. Rewrite the following system as a system of first order ODEs:

x
tt
(t) = (t/y(t)) +x
t
(t) −1
y
t
(t) =
1
(x

(t)+y(t))
x(0) = 1, x
t
(0) = 0, y(0) = −1
Solution: We let x
0
= x, x
1
= x
t
, x
2
= y, then we can rewrite as

x
t
1
(t) = (t/x
2
(t)) +x
1
(t) −1
x
t
0
(t) = x
1
(t)
x
t
2
(t) =
1
(x
1
(t)+x
2
(t))
x
0
(0) = 1, x
1
(0) = 0, x
2
(0) = −1
¬
150 CHAPTER 10. ORDINARY DIFFERENTIAL EQUATIONS
10.3.4 It’s Only Autonomous Systems
In fact, we can use a similar trick to get rid of the time-dependence of an ODE. Consider the first
order system

x
t
1
= f
1
(t, x
1
, x
2
, . . . , x
n
) ,
x
t
2
= f
2
(t, x
1
, x
2
, . . . , x
n
) ,
x
t
3
= f
3
(t, x
1
, x
2
, . . . , x
n
) ,
.
.
.
x
t
n
= f
n
(t, x
1
, x
2
, . . . , x
n
) ,
x
1
(a) = c
1
, x
2
(a) = c
2
, . . . , x
n
(a) = c
n
.
We can get rid of the time dependence and thus make the system autonomous by making t
another variable function to be found. We do this by letting x
0
= t, then adding the equations
x
t
0
= 1, and x
0
(a) = a, to get the system:

x
t
0
= 1,
x
t
1
= f
1
(x
0
, x
1
, x
2
, . . . , x
n
) ,
x
t
2
= f
2
(x
0
, x
1
, x
2
, . . . , x
n
) ,
x
t
3
= f
3
(x
0
, x
1
, x
2
, . . . , x
n
) ,
.
.
.
x
t
n
= f
n
(x
0
, x
1
, x
2
, . . . , x
n
) ,
x
0
(a) = a, x
1
(a) = c
1
, x
2
(a) = c
2
, . . . , x
n
(a) = c
n
.
An autonomous system can be written in the more elegant form:

X

= F (X)
X (a) = C.
Moreover, we can consider the phase curve of an autonomous system.
One can consider X to be the position of a particle which moves through R
n
over time. For
an autonomous system of ODEs, the movement of the particle is dependent only on its current
position and not on time.
1
A phase curve is a flow line in the vector field F. You can think of
the vector field F as being the velocity, at a given point, of a river, the flow of which is time
independent. Under this analogy, the phase curve is the path of a vessel placed in the river. In our
deterministic world, phase curves never cross. This is because at the point of crossing, there would
have to be two different values of the tangent of the curve, i.e., the vector field would have to have
two different values at that point.
The following example illustrates the idea of phase curves for autonomous ODE.
Example 10.10. Consider the autonomous system of ODEs, without an initial value:

x
t
1
(t) = −2(x
2
−2.4)
x
t
2
(t) = 3(x
1
−1.2)
We can rewrite this ODE as
X
t
(t) = F (X(t)) .
1
Note that if you have converted a system of n equations with a time dependence to an autonomous system, then
the autonomous system should be considered a particle moving through R
n+1
. In this case time becomes one of the
dimensions, and thus we speak of position, instead of time.
10.3. SYSTEMS OF ODES 151
This is an autonomous ODE. The associated vector field can be expressed as
F (X) =
¸
0 −2
3 0

X +
¸
4.8
−3.6

.
This vector field is plotted in Figure 10.6.
-0.5
0
0.5
1
1.5
2
2.5
3
3.5
4
4.5
-0.5 0 0.5 1 1.5 2 2.5 3 3.5
Figure 10.6: The vector field F (X) is shown in R
2
. The tips of the vectors are shown as circles.
(Due to budget restrictions, arrowheads were not available for this figure.)
You should verify that this ODE is solved by
X (t) = r
¸ √
2 cos(

6t +t
0
)

3 sin(

6t +t
0
)

+
¸
1.2
2.4

,
for any r ≥ 0, and t
0
∈ (0, 2π] . Given an initial value, the parameters r and t
0
are uniquely
determined. Thus the trajectory of X over time is that of an ellipse in the plane.
2
Thus the
family of such ellipses, taking all r ≥ 0, form the phase curves of this ODE. We would expect the
approximation of an ODE to never cross a phase curve. This is because the phase curve represents
the
Euler’s Method and the second order Runge-Kutta Method were used to approximate the ODE
with initial value
X (0) =
¸
1.5
1.5

.
2
In the case r = 0, the ellipse is the “trivial” ellipse which consists of a point.
152 CHAPTER 10. ORDINARY DIFFERENTIAL EQUATIONS
-0.5
0
0.5
1
1.5
2
2.5
3
3.5
4
4.5
-0.5 0 0.5 1 1.5 2 2.5 3 3.5
(a) Euler’s Method
-0.5
0
0.5
1
1.5
2
2.5
3
3.5
4
4.5
-0.5 0 0.5 1 1.5 2 2.5 3 3.5
(b) Runge-Kutta Method
Figure 10.7: The simulated phase curves of the system of Example 10.10 are shown for (a) Euler’s
Method and (b) the Runge-Kutta Method of order 2. The Runge-Kutta Method used larger step
size, but was more accurate. The actual solution to the ODE is an ellipse, thus the “spiralling”
observed for Euler’s Method is spurious.
10.3. SYSTEMS OF ODES 153
Euler’s Method was used for step size h = 0.005 for 3000 steps. The Runge-Kutta Method was
employed with step size h = 0.015 for 3000 steps. The approximations are shown in Figure 10.7.
Given that the actual solution of this initial value problem is an ellipse, we see that Euler’s
Method performed rather poorly, spiralling out from the ellipse. This must be the case for any step
size, since Euler’s Method steps in the direction tangent to the actual solution; thus every step of
Euler’s Method puts the approximation on an ellipse of larger radius. Smaller stepsize minimizes
this effect, but at greater computational cost.
The Runge-Kutta Method performs much better, and for larger step size. At the given resolu-
tion, no spiralling is observable. Thus the Runge-Kutta Method outperforms Euler’s Method, and
at lesser computational cost.
3
3
Because the Runge-Kutta Method of order two requires two evaluations of F, whereas Euler’s Method requires
only one, per step, it is expected that they would be ‘evenly matched’ when Runge-Kutta Method uses a step size
twice that of Euler’s Method.
154 CHAPTER 10. ORDINARY DIFFERENTIAL EQUATIONS
Exercises
(10.1) Given the ODE:

x
t
(t) = t
2
−x
2
x(1) = 1
Approximate x(1.1) by using one step of Euler’s Method. Approximate x(1.1) using one
step of the second order Runge-Kutta Method.
(10.2) Given the ODE:

x
t
1
(t) = x
1
x
2
+t
x
t
2
(t) = 2x
2
−x
2
1
x
1
(1) = 0
x
2
(1) = 3
Approximate X(1.1) by using one step of Euler’s Method. Approximate X(1.1) using one
step of the second order Runge-Kutta Method.
(10.3) Consider the separable ODE
x
t
(t) = x(t)g(t),
for some function g. Use backward’s Euler (or, alternatively, the right endpoint rule for
approximating integrals) to derive the approximation scheme
x(t +h) ←
x(t)
1 −hg(t +h)
.
What could go wrong with using such a scheme?
(10.4) Consider the ODE
x
t
(t) = x(t) +t
2
.
Using Taylor’s Theorem, derive a higher order approximation scheme to find x(t +h) based
on x(t), t, and h.
(10.5) Which of the following ODEs can you show are stable? Which are unstable? Which seem
ambiguous?
(a) x
t
(t) = −x − arctan x (b) x
t
(t) = 2x + tan x (c) x
t
(t) = −4x − e
x
(d) x
t
(t) = x + x
3
(e) x
t
(t) = t + cos x
(10.6) Rewrite the following higher order ODE as a system of first order ODEs:

x
tt
(t) = x(t) −sin x
t
(t),
x(0) = 0,
x
t
(0) = π.
(10.7) Rewrite the system of ODEs as a system of first order ODEs

x
tt
(t) = y(t) +t −x(t),
y
t
(t) = x
t
(t) +x(t) −4,
x
t
(0) = 1,
x(0) = 2,
y(0) = 0.
(10.8) Rewrite the following higher order system of ODEs as a system of first order ODEs:

x
ttt
(t) = y
tt
(t) −x
t
(t),
y
ttt
(t) = x
tt
(t) +y
t
(t),
x(0) = x
tt
(0) = y
t
(0) = −1,
x
t
(0) = y(0) = y
tt
(0) = 1.
10.3. SYSTEMS OF ODES 155
(10.9) Implement Euler’s Method for approximating the solution to

dx(t)
dt
= f (t, x(t)) ,
x(a) = c.
Your m-file should have header line like:
function xfin = euler(f,a,c,h,steps)
where xfin should be the approximate solution to the ODE at time a + steps h.
(a) Run your code with f(t, x) = −x, using a = 0 = c, and using h steps = 1.0. In this
case the actual solution is
xfin = ce
−1
.
This ODE is stable, so your approximation should be good (cf. Example 10.3). Exper-
iment with different values of h.
(b) Run your code with f(t, x) = x, using a = 0 = c, and using h steps = 1.0. In this
case the actual solution is
xfin = ce.
Your actual solution may be a rather poor approximation. Prepare a plot of error
versus h for varying values of h. (cf. Figure 7.1 and Figure 10.2)
(c) Run your code with f(t, x) = 1/ (1 −x) , using a = 0, c = 0.1, and using hsteps = 2.0.
Plot your approximations for varying values of steps. For example, try steps values of
35, 36, and 90, 91. Also try large values of steps, like 100, 500, 1000. Can you guess what
the actual solution is supposed to look like? Can you explain the poor performance of
Euler’s Method for this ODE?
(10.10) Implement the Runge-Kutta Method of order two for approximating the solution to

dx(t)
dt
= f (t, x(t)) ,
x(a) = c.
Your m-file should have header line like:
function xfin = rkm(f,a,c,h,steps)
where xfin should be the approximate solution to the ODE at time a + steps h.
Test your code with the ODEs from the previous problem. Does the Runge-Kutta Method
perform any better than Euler’s Method for the last ODE?
156 CHAPTER 10. ORDINARY DIFFERENTIAL EQUATIONS
Appendix A
Old Exams
A.1 First Midterm, Fall 2003
P1 (7 pnts) Clearly state Taylor’s Theorem for f(x + h). Include all hypotheses, and state the
conditions on the variable that appears in the error term.
P2 (7 pnts) Write out the Lagrange Polynomial
0
(x) for nodes x
0
, x
1
, . . . , x
n
. That is, write the
polynomial that has value 1 at x
0
and has value 0 at x
1
, x
2
, . . . , x
n
.
P3 (7 pnts) Write the Lagrange form of the polynomial of lowest degree that interpolates the
following values:
x
k
−1 1
y
k
11 17
P4 (14 pnts) Use the method of Bisection for finding the zero of the function f(x) = x
3
− 3x
2
+
5x −
5
2
. Let a
0
= 0, and b
0
= 2. What are a
2
, b
2
?
P5 (21 pnts) • Write out the iteration step of Newton’s Method for finding a zero of the func-
tion f(x). Your iteration should look like
x
k+1
=??
• Write the Newton’s Method iteration for finding

5.
• Letting x
0
= 1, find the x
1
and x
2
for your iteration. Note that

5 ≈ 2.236.
P6 (7 pnts) Write out the iteration step of the Secant Method for finding a zero of the function
f(x). Your iteration should look like
x
k+1
=??
P7 (21 pnts) Consider the following approximation:
f
t
(x) ≈
f(x +h) −5f(x) + 4f(x +
h
2
)
3h
• Derive this approximation using Taylor’s Theorem.
• Assuming that f(x) has bounded derivatives, give the accuracy of the above approxima-
tion. Your answer should be something like O

h
?

.
• Let f(x) = x
2
. Approximate f
t
(0) with this approximation, using h =
1
3
.
P8 (21 pnts) Consider the data:
k 0 1 2
x
k
3 1 −1
f(x
k
) 5 15 9
157
158 APPENDIX A. OLD EXAMS
• Construct the divided differences table for the data.
• Construct the Newton form of the polynomial p(x) of lowest degree that interpolates
f(x) at these nodes.
• Suppose that these data were generated by the function f(x) = x
3
− 5x
2
+ 2x + 17.
Find a numerical upper bound on the error [p(x) −f(x)[ over the interval [−1, 3]. Your
answer should be a number.
A.2 Second Midterm, Fall 2003
P1 (6 pnts) Write the Trapezoid rule approximation for

b
a
f(x) dx, using the nodes ¦x
i
¦
n
i=0
.
P2 (6 pnts) Suppose Gaussian Elimination with scaled partial pivoting is applied to the matrix
A =

0.0001 −0.1 −0.01 0.001
43 91 −43 1
−12 26 −1 0
−7
1
2
−10 π
¸
¸
¸
¸
.
Which row contains the first pivot? You must show your work.
P3 (14 pnts) Let φ(n) be some quadrature rule that divides an interval into 2
n
equal subintervals.
Suppose

b
a
f(x) dx = φ(n) +a
5
h
5
n
+a
10
h
10
n
+a
15
h
15
n
+. . . ,
where h
n
=
b−a
2
n
. Using evaluations of φ(), devise a “Romberg-style” quadrature rule that is
a O

h
10
n

approximation to the integral.
P4 (14 pnts) Compute the LU factorization of the matrix
A =

1 0 −1
0 2 0
2 2 −1
¸
¸
using na¨ıve Gaussian Elimination. Show all your work. Your final answer should be two
matrices, L and U. Verify your answer, i.e., verify that A = LU.
P5 (14 pnts) Run one iteration of either Jacobi or Gauss Seidel on the system:

1 3 4
−2 2 1
−1 3 −2
¸
¸
x =

−3
−3
1
¸
¸
Start with x
(0)
= [1 1 1]
¯
. Clearly indicate your answer x
(1)
. Show your work. You must
indicate which of the two methods you are using.
P6 (21 pnts) Derive the two-node Chebyshev Quadrature rule:

1
−1
f(x) dx ≈ c [f(x
0
) +f(x
1
)] .
Make this rule exact for all polynomials of degree ≤ 2.
P7 (28 pnts) Consider the quadrature rule:

1
0
f(x) dx ≈ z
0
f(0) +z
1
f(1) +z
2
f
t
(0) +z
3
f
t
(1).
A.3. FINAL EXAM, FALL 2003 159
• Write four linear equations involving z
i
to make this rule exact for polynomials of the
highest possible degree.
• Solve your system for z using na¨ıve Gaussian Elimination.
A.3 Final Exam, Fall 2003
P1 (4 pnts) Clearly state Taylor’s Theorem for f(x + h). Include all hypotheses, and state the
conditions on the variable that appears in the error term.
P2 (4 pnts) State the conditions that characterize S(x) as a natural cubic spline on the interval
[a, b]. Let the knots be a = t
0
< t
1
< t
2
< . . . < t
n
= b.
P3 (4 pnts) Let x
0
= 3, x
1
= −5, x
2
= 0. Write out the Lagrange Polynomial
0
(x) for these
nodes; that is, write the polynomial that has value 1 at x
0
and has value 0 at x
1
, x
2
.
P4 (4 pnts) Consider the ODE:

x
t
(t) = f(t, x(t))
x(a) = c
Write out the Euler Method to step from x(t) to x(t +h).
P5 (7 pnts) Compute the LU factorization of the matrix
A =

−1 1 0
1 1 −1
−2 0 2
¸
¸
using na¨ıve Gaussian Elimination. Show all your work. Your final answer should be two
matrices, L and U. Verify your answer, i.e., verify that A = LU.
P6 (9 pnts) Let α be given. Determine a quadrature rule of the form

α
−α
f(x) dx ≈ Af(0) +Bf
tt
(−α) +Cf
tt
(α)
that is exact for polynomials of highest possible degree. What is the highest degree polynomial
for which this rule is exact?
P7 (9 pnts) Suppose that φ(n) is some computational function that approximates a desired quan-
tity, L. Furthermore suppose that
φ(n) = L +a
1
n

1
2
+a
2
n

2
2
+a
3
n

3
2
+. . .
Combine two evaluations of φ() in the manner of Richardson to get a O

n
−1

approximation
to L.
P8 (9 pnts) Find the least-squares best approximate of the form y = ln (cx) to the data
x
0
x
1
. . . x
n
y
0
y
1
. . . y
n
Caution: this should not be done with basis vectors.
P9 (9 pnts) Let
T = ¦f(x) = c
0
x +c
1
log x +c
2
cos x[ c
0
, c
1
, c
2
∈ R¦ .
Set up (but do not attempt to solve) the normal equations to find the least-squares best f ∈ T
to approximate the data:
x
0
x
1
. . . x
n
y
0
y
1
. . . y
n
160 APPENDIX A. OLD EXAMS
The normal equations should involve the vector c = [c
0
c
1
c
2
]
¯
, which uniquely determines a
function in T.
P10 (9 pnts) Consider the following approximation:
f
tt
(x) ≈
3f(x −h) −4f(x) +f(x + 3h)
6h
2
• Derive this approximation using Taylor’s Theorem.
• Assuming that f(x) has bounded derivatives, give the accuracy of the above approxima-
tion. Your answer should be something like O

h
?

.
P11 (11 pnts) Consider the ODE:

x
t
(t) = x(t)g(t)
x(0) = 1
• Use the Fundamental Theorem of Calculus to write a step update of the form:
x(t +h) = x(t) +

t+h
t
??
• Approximate the integral using the Trapezoid Rule to get an explicit step update of the
form
x(t +h) ←Q(t, x(t), h)
• Suppose that g(t) ≥ m > 0 for all t. Do you see any problem with this stepping method
if h > 2/m?
Hint: the actual solution to the ODE is
x(t) = e
R
t
0
g(r) dr
.
P12 (9 pnts) Consider the ODE:

x
t
(t) = e
x(t)
+ sin (x(t))
x(0) = 0
Write out the Taylor’s series method of order three to approximate x(t +h) given x(t). It is
permissible to write the update as a small program like:
x
t
(t) ← Q
0
(x(t))
x
tt
(t) ← Q
1
(x(t), x
t
(t))
.
.
.
x(t +h) ← R(x(t), x
t
(t), x
tt
(t), . . .)
rather than write the update as a monstrous equation of the form
x(t +h) ←M(x(t)).
P13 (15 pnts) Consider the data:
k 0 1 2
x
k
−0.5 0 0.5
f(x
k
) 5 15 9
• Construct the divided differences table for the data.
A.4. FIRST MIDTERM, FALL 2004 161
• Construct the Newton form of the polynomial p(x) of lowest degree that interpolates
f(x) at these nodes.
• Suppose that these data were generated by the function f(x) = 40x
3
−32x
2
−6x + 15.
Find a numerical upper bound on the error [p(x) −f(x)[ over the interval [−0.5, 0.5].
Your answer should be a number.
P14 (15 pnts) • Write out the iteration step of Newton’s Method for finding a zero of the
function f(x). Your iteration should look like
x
k+1
=??
• Write the Newton’s Method iteration for finding
3

25/4.
• Letting x
0
=
5
2
, find the x
1
and x
2
for your iteration. Note that
3

25/4 ≈ 1.842.
P15 (15 pnts) • Let x be some given number, and let f() be a given, “black box” function.
Using Taylor’s Theorem, derive a O(h) approximation to the derivative f
t
(x). Call your
approximation φ(h); the approximation should be defined in terms of f() evaluated at
some values, and should probably involve x and h.
• Let f(x) = x
2
+ 1. Approximate f
t
(0) using your φ(h), for h =
1
2
.
• For the same f(x) and h, get a O

h
2

approximation to f
t
(0) using the method of
Richardson’s Extrapolation; your answer should probably include a table like this:
D(0, 0)
D(1, 0) D(1, 1)
A.4 First Midterm, Fall 2004
[Definitions] Answer no more than two of the following.
P1 (12 pnts) Clearly state Taylor’s Theorem for f(x + h). Include all hypotheses, and state the
conditions on the variable that appears in the error term.
P2 (12 pnts) Write the general form of the iterated solution to the problem Ax = b. Let Q be
your “splitting matrix,” and use the factor ω. Your solution should look something like:
??x
(k)
=??x
(k−1)
+?? or x
(k)
=??x
(k−1)
+??
For one of the following variants, describe what Q is, in relation to A : Richardson’s, Jacobi,
Gauss-Seidel.
P3 (12 pnts) Write the iteration for Newton’s Method or the Secant Method for finding a root to
the equation f(x) = 0. Your solution should look something like:
x
k+1
=?? +
??
??
[Problems] Answer no more than four of the following.
P1 (14 pnts) Perform bisection on the function graphed below. Let the first interval be [0, 256].
Give the second, third, fourth, and fifth intervals.
P2 (14 pnts) Give a O

h
4

approximation to sin (h +π/2) . Find a reasonable C such that the
truncation error is no greater than Ch
4
.
P3 (14 pnts) Use Newton’s Method to find a good approximation to
3

24. Use x
0
= 3. That is,
iteratively define a sequence with x
0
= 3, such that x
k

3

24.
P4 (14 pnts) One of the roots of x
2
− bx + c = 0 as found by the quadratic formula is subject
to subtractive cancellation when [b[ [c[ . Which root is it? Rewrite the expression for that
root to eliminate subtractive cancellation.
162 APPENDIX A. OLD EXAMS
0 32 64 96 128 160 192 224 256
-64
-32
0
32
64
P5 (14 pnts) Find the LU decomposition of the following matrix by Gaussian Elimination with
na¨ıve pivoting.

3 2 4
−6 1 −6
12 −12 10
¸
¸
P6 (14 pnts) Consider the linear equation:

6 1 1
2 4 0
1 2 6
¸
¸
x =

5
−6
3
¸
¸
Letting x
(0)
= [1 1 1]
¯
, find x
(1)
using either the Gauss-Seidel or Jacobi iterative schemes.
Use weighting ω = 1. You must indicate which scheme you are using. Show all your work.
[Questions] Answer only one of the following.
P1 (20 pnts) Suppose that the eigenvalues of A are in [α, β] with 0 < α < β. Find the ω that
minimizes

I −ωA
2

2
. What is the minimal value of

I −ωA
2

2
for this optimal ω? For full
credit you need to make some argument why this ω minimizes this norm.
P2 (20 pnts) Let f(x) = 0 have a unique root r. Recall that the form of the error for Newton’s
Method is
e
k+1
=
−f
tt

k
)
2f
t
(x
k
)
e
2
k
,
where e
k
= r − x
k
. Suppose that f(x) has the properties that f
t
(x) ≥ m > 0 for all x, and
[f
tt
(x)[ ≤ M for all x. Find δ > 0 such that [e
k
[ < δ insures that x
n
→r. Justify your answer.
P3 (20 pnts) Suppose that f(r) = 0 = f
t
(r) = f
tt
(r), and f
tt
(x) is continuous. Letting e
k
= x
k
−r,
show that for Newton’s Method,
e
k+1

1
2
e
k
,
when [e
k
[ is sufficiently small. (Hint: Use Taylor’s Theorem twice.) Is this faster or slower
convergence than for Newton’s Method with a simple root?
A.5 Second Midterm, Fall 2004
[Definitions] Answer no more than two of the following three.
P1 (10 pnts) For a set of distinct nodes x
0
, x
1
, . . . , x
13
, what is the Lagrange Polynomial
5
(x)?
You may write your answer as a product. What degree is this polynomial?
A.5. SECOND MIDTERM, FALL 2004 163
P2 (10 pnts) Complete this statement of the polynomial interpolation error theorem: Let p be the
polynomial of degree at most n interpolating function f at the n+1 distinct nodes x
0
, x
1
, . . . , x
n
on [a, b]. Let f
(n+1)
be continuous. Then for each x ∈ [a, b] there is some ξ ∈ [a, b] such that
f(x) −p(x) =
?
?
n
¸
i=0
(?) .
P3 (10 pnts) State the conditions which define the function S(x) as a natural cubic spline on the
interval [a, b].
[Problems] Answer no more than five of the following six.
P1 (16 pnts) How large must n be to interpolate the function e
x
to within 0.001 on the interval
[0, 2] with the polynomial interpolant at n equally spaced nodes?
P2 (16 pnts) Let φ(h) be some computable approximation to the quantity L such that
φ(h) = L +a
5
h
5
+a
10
h
10
+a
15
h
15
+. . .
Combine evaluations of φ() to devise a O

h
10

approximation to L.
P3 (16 pnts) Derive the constants A and B which make the quadrature rule

1
−1
f(x) dx ≈ Af



15
5

+Bf(0) +Af


15
5

exact for polynomials of degree ≤ 2. Use this quadrature rule to approximate
π =

1
−1
2
1 +x
2
dx
P4 (16 pnts) Find the polynomial of degree ≤ 3 interpolating the data
x −1 0 1 2
y −2 8 0 4
(Hint: using Lagrange Polynomials will probably take too long.)
P5 (16 pnts) Find constants α, β such that the following is a quadratic spline on [0, 4].
Q(x) =

x
2
−3x + 4 : 0 ≤ x < 1
α(x −1)
2
+β(x −1) + 2 : 1 ≤ x < 3
x
2
+x −4 : 3 ≤ x ≤ 4
P6 (16 pnts) Consider the following approximation:
f
t
(x) ≈
f(x +h) −5f(x) + 4f(x +
h
2
)
3h
• Derive this approximation via Taylor’s Theorem.
• Assuming that f(x) has bounded derivatives, give the accuracy of the above approxima-
tion. Your answer should be something like O

h
?

.
• Let f(x) = x
2
. Approximate f
t
(0) with this approximation, using h =
1
3
.
164 APPENDIX A. OLD EXAMS
A.6 Final Exam, Fall 2004
[Definitions] Answer all of the following.
P1 (10 pnts) Clearly state Taylor’s Theorem for f(x + h). Include all hypotheses, and state the
conditions on the variable that appears in the error term.
P2 (10 pnts) Write the iteration for Newton’s Method or the Secant Method for finding a root to
the equation f(x) = 0. Your solution should look something like:
x
k+1
=?? +
??
??
P3 (10 pnts) Write the general form of the iterated solution to the problem Ax = b. Let Q be
your “splitting matrix,” and use the factor ω. Your solution should look something like:
??x
(k)
=??x
(k−1)
+?? or x
(k)
=??x
(k−1)
+??
For one of the following variants, describe what Q is, in relation to A : Richardson’s, Jacobi,
Gauss-Seidel.
P4 (10 pnts) Define what it means for the function f(x) ∈ T to be the least squares best approx-
imant to the data
x
0
x
1
. . . x
n
y
0
y
1
. . . y
n
[Problems] Answer all of the following.
P1 (10 pnts) Rewrite this system of higher order ODEs as a system of first order ODEs:

x
tt
(t) = x
t
(t) [x(t) +ty(t)]
y
tt
(t) = y
t
(t) [y(t) −tx(t)]
x
t
(0) = 1
x(0) = 4
y
t
(0) = −3
y(0) = −2
P2 (10 pnts) Consider the ODE:

x
t
(t) = t (x + 1)
2
x(2) = 1/2
Use Euler’s Method to find an approximate value of x(2.1) using a single step.
P3 (10 pnts) Let φ(h) be some computable approximation to the quantity L such that
φ(h) = L +a
4
h
4
+a
8
h
8
+a
12
h
12
+. . .
Combine evaluations of φ() to devise a O

h
8

approximation to L.
P4 (15 pnts) Consider the approximation
f
t
(x) ≈
9f(x +h) −8f(x) −f(x −3h)
12h
(a) Derive this approximation using Taylor’s Theorem.
(b) Assuming that f(x) has bounded derivatives, give the accuracy of the above approxima-
tion. Your answer should be something like O

h
?

.
A.6. FINAL EXAM, FALL 2004 165
P5 (15 pnts) Find constants, α, β, γ such that the following is a natural cubic spline on [1, 3].
Q(x) =

α(x −1)
3
+β (x −1)
2
+ 12 (x −1) + 1 : 1 ≤ x < 2
γx
3
+ 18x
2
−30x + 19 : 2 ≤ x < 3
P6 (15 pnts) Devise a subroutine that, given an input number z, calculates an approximate to
3

z
via Newton’s Method. Write your subroutine in pseudo code, Matlab, or C/C++. Include
reasonable termination conditions so your subroutine will terminate if it finds a good approx-
imate solution. Your subroutine should use only the basic arithmetic operations of addition,
subtraction, multiplication, division; in particular your subroutine should not have access to
a cubed root or logarithm operator.
P7 (15 pnts) Consider the ODE:

x
t
(t) = cos (x(t)) + sin (x(t))
x(0) = 1
Use Taylor’s theorem to devise a O

h
3

approximant to x(t +h) which is a function of x, t,
and h only. You may write your answer as a program:
x
t
(t) ←Q
0
(x(t))
x
tt
(t) ←Q
1
(x(t), x
t
(t))
.
.
.
˜ x ←R(x(t), x
t
(t), x
tt
(t), . . .)
such that x(t +h) = ˜ x +O

h
3

.
[Questions] Answer all of the following.
P1 (20 pnts) Consider the “quadrature” rule:

1
0
f(x) dx ≈ A
0
f(0) +A
1
f(1) +A
2
f
t
(1) +A
3
f
tt
(1)
(a) Write four linear equations involving A
i
to make this rule exact for polynomials of the
highest possible degree.
(b) Find the constants A
i
using Gaussian Elimination with na¨ıve pivoting.
P2 (20 pnts) Consider the data:
k 0 1 2
x
k
0 0.25 0.5
f(x
k
) 1.5 1 0.5
(a) Construct the divided differences table for the data.
(b) Construct the Newton form of the polynomial p(x) of lowest degree that interpolates
f(x) at these nodes.
(c) What degree is your polynomial? Is this strange?
(d) Suppose that these data were generated by the function
f(x) = 1 +
cos 2πx
2
.
Find an upper bound on the error [p(x) −f(x)[ over the interval [0, 0.5]. Your answer
should be a number.
166 APPENDIX A. OLD EXAMS
P3 (20 pnts) Let
T =
¸
f(x) = c
0

x +c
1
(x) +c
2
e
x
[ c
0
, c
1
, c
2
∈ R
¸
,
where (x) is some black box function. We are concerned with finding the least-squares best
f ∈ T to approximate the data
x
0
x
1
. . . x
n
y
0
y
1
. . . y
n
(a) Set up (but do not attempt to solve) the normal equations to find the the vector c =
[c
0
c
1
c
2
]
¯
, which determines the least squares approximant in T.
(b) Now suppose that the function (x) happens to be the Lagrange Polynomial associated
with x
7
, among the values x
0
, x
1
, . . . , x
n
. That is, (x
j
) = δ
j7
. Prove that f(x
7
) = y
7
,
where f is the least squares best approximant to the data.
P4 (10 pnts) State some substantive question which you thought might appear on this exam, but
did not. Answer this question (correctly).
Appendix B
GNU Free Documentation License
Version 1.2, November 2002
Copyright c (2000,2001,2002 Free Software Foundation, Inc.
59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed.
Preamble
The purpose of this License is to make a manual, textbook, or other functional and useful document ”free” in the sense of
freedom: to assure everyone the effective freedom to copy and redistribute it, with or without modifying it, either commercially
or noncommercially. Secondarily, this License preserves for the author and publisher a way to get credit for their work, while
not being considered responsible for modifications made by others.
This License is a kind of ”copyleft”, which means that derivative works of the document must themselves be free in the
same sense. It complements the GNU General Public License, which is a copyleft license designed for free software.
We have designed this License in order to use it for manuals for free software, because free software needs free documentation:
a free program should come with manuals providing the same freedoms that the software does. But this License is not limited
to software manuals; it can be used for any textual work, regardless of subject matter or whether it is published as a printed
book. We recommend this License principally for works whose purpose is instruction or reference.
1. APPLICABILITY AND DEFINITIONS
This License applies to any manual or other work, in any medium, that contains a notice placed by the copyright holder
saying it can be distributed under the terms of this License. Such a notice grants a world-wide, royalty-free license, unlimited
in duration, to use that work under the conditions stated herein. The ”Document”, below, refers to any such manual or work.
Any member of the public is a licensee, and is addressed as ”you”. You accept the license if you copy, modify or distribute the
work in a way requiring permission under copyright law.
A ”Modified Version” of the Document means any work containing the Document or a portion of it, either copied
verbatim, or with modifications and/or translated into another language.
A ”Secondary Section” is a named appendix or a front-matter section of the Document that deals exclusively with the
relationship of the publishers or authors of the Document to the Document’s overall subject (or to related matters) and contains
nothing that could fall directly within that overall subject. (Thus, if the Document is in part a textbook of mathematics, a
Secondary Section may not explain any mathematics.) The relationship could be a matter of historical connection with the
subject or with related matters, or of legal, commercial, philosophical, ethical or political position regarding them.
The ”Invariant Sections” are certain Secondary Sections whose titles are designated, as being those of Invariant Sections,
in the notice that says that the Document is released under this License. If a section does not fit the above definition of Secondary
then it is not allowed to be designated as Invariant. The Document may contain zero Invariant Sections. If the Document does
not identify any Invariant Sections then there are none.
The ”Cover Texts” are certain short passages of text that are listed, as Front-Cover Texts or Back-Cover Texts, in
the notice that says that the Document is released under this License. A Front-Cover Text may be at most 5 words, and a
Back-Cover Text may be at most 25 words.
A ”Transparent” copy of the Document means a machine-readable copy, represented in a format whose specification is
available to the general public, that is suitable for revising the document straightforwardly with generic text editors or (for
images composed of pixels) generic paint programs or (for drawings) some widely available drawing editor, and that is suitable
for input to text formatters or for automatic translation to a variety of formats suitable for input to text formatters. A copy
made in an otherwise Transparent file format whose markup, or absence of markup, has been arranged to thwart or discourage
subsequent modification by readers is not Transparent. An image format is not Transparent if used for any substantial amount
of text. A copy that is not ”Transparent” is called ”Opaque”.
167
168 APPENDIX B. GNU FREE DOCUMENTATION LICENSE
Examples of suitable formats for Transparent copies include plain ASCII without markup, Texinfo input format, LaTeX
input format, SGML or XML using a publicly available DTD, and standard-conforming simple HTML, PostScript or PDF
designed for human modification. Examples of transparent image formats include PNG, XCF and JPG. Opaque formats
include proprietary formats that can be read and edited only by proprietary word processors, SGML or XML for which the
DTD and/or processing tools are not generally available, and the machine-generated HTML, PostScript or PDF produced by
some word processors for output purposes only.
The ”Title Page” means, for a printed book, the title page itself, plus such following pages as are needed to hold, legibly,
the material this License requires to appear in the title page. For works in formats which do not have any title page as such,
”Title Page” means the text near the most prominent appearance of the work’s title, preceding the beginning of the body of
the text.
A section ”Entitled XYZ” means a named subunit of the Document whose title either is precisely XYZ or contains XYZ
in parentheses following text that translates XYZ in another language. (Here XYZ stands for a specific section name mentioned
below, such as ”Acknowledgements”, ”Dedications”, ”Endorsements”, or ”History”.) To ”Preserve the Title” of
such a section when you modify the Document means that it remains a section ”Entitled XYZ” according to this definition.
The Document may include Warranty Disclaimers next to the notice which states that this License applies to the Document.
These Warranty Disclaimers are considered to be included by reference in this License, but only as regards disclaiming warranties:
any other implication that these Warranty Disclaimers may have is void and has no effect on the meaning of this License.
2. VERBATIM COPYING
You may copy and distribute the Document in any medium, either commercially or noncommercially, provided that this
License, the copyright notices, and the license notice saying this License applies to the Document are reproduced in all copies,
and that you add no other conditions whatsoever to those of this License. You may not use technical measures to obstruct or
control the reading or further copying of the copies you make or distribute. However, you may accept compensation in exchange
for copies. If you distribute a large enough number of copies you must also follow the conditions in section 3.
You may also lend copies, under the same conditions stated above, and you may publicly display copies.
3. COPYING IN QUANTITY
If you publish printed copies (or copies in media that commonly have printed covers) of the Document, numbering more
than 100, and the Document’s license notice requires Cover Texts, you must enclose the copies in covers that carry, clearly and
legibly, all these Cover Texts: Front-Cover Texts on the front cover, and Back-Cover Texts on the back cover. Both covers must
also clearly and legibly identify you as the publisher of these copies. The front cover must present the full title with all words of
the title equally prominent and visible. You may add other material on the covers in addition. Copying with changes limited to
the covers, as long as they preserve the title of the Document and satisfy these conditions, can be treated as verbatim copying
in other respects.
If the required texts for either cover are too voluminous to fit legibly, you should put the first ones listed (as many as fit
reasonably) on the actual cover, and continue the rest onto adjacent pages.
If you publish or distribute Opaque copies of the Document numbering more than 100, you must either include a machine-
readable Transparent copy along with each Opaque copy, or state in or with each Opaque copy a computer-network location from
which the general network-using public has access to download using public-standard network protocols a complete Transparent
copy of the Document, free of added material. If you use the latter option, you must take reasonably prudent steps, when you
begin distribution of Opaque copies in quantity, to ensure that this Transparent copy will remain thus accessible at the stated
location until at least one year after the last time you distribute an Opaque copy (directly or through your agents or retailers)
of that edition to the public.
It is requested, but not required, that you contact the authors of the Document well before redistributing any large number
of copies, to give them a chance to provide you with an updated version of the Document.
4. MODIFICATIONS
You may copy and distribute a Modified Version of the Document under the conditions of sections 2 and 3 above, provided
that you release the Modified Version under precisely this License, with the Modified Version filling the role of the Document,
thus licensing distribution and modification of the Modified Version to whoever possesses a copy of it. In addition, you must
do these things in the Modified Version:
A. Use in the Title Page (and on the covers, if any) a title distinct from that of the Document, and from those of previous
versions (which should, if there were any, be listed in the History section of the Document). You may use the same title
as a previous version if the original publisher of that version gives permission.
B. List on the Title Page, as authors, one or more persons or entities responsible for authorship of the modifications in the
Modified Version, together with at least five of the principal authors of the Document (all of its principal authors, if it
has fewer than five), unless they release you from this requirement.
C. State on the Title page the name of the publisher of the Modified Version, as the publisher.
D. Preserve all the copyright notices of the Document.
E. Add an appropriate copyright notice for your modifications adjacent to the other copyright notices.
169
F. Include, immediately after the copyright notices, a license notice giving the public permission to use the Modified Version
under the terms of this License, in the form shown in the Addendum below.
G. Preserve in that license notice the full lists of Invariant Sections and required Cover Texts given in the Document’s
license notice.
H. Include an unaltered copy of this License.
I. Preserve the section Entitled ”History”, Preserve its Title, and add to it an item stating at least the title, year, new
authors, and publisher of the Modified Version as given on the Title Page. If there is no section Entitled ”History” in
the Document, create one stating the title, year, authors, and publisher of the Document as given on its Title Page,
then add an item describing the Modified Version as stated in the previous sentence.
J. Preserve the network location, if any, given in the Document for public access to a Transparent copy of the Document,
and likewise the network locations given in the Document for previous versions it was based on. These may be placed
in the ”History” section. You may omit a network location for a work that was published at least four years before the
Document itself, or if the original publisher of the version it refers to gives permission.
K. For any section Entitled ”Acknowledgements” or ”Dedications”, Preserve the Title of the section, and preserve in the
section all the substance and tone of each of the contributor acknowledgements and/or dedications given therein.
L. Preserve all the Invariant Sections of the Document, unaltered in their text and in their titles. Section numbers or the
equivalent are not considered part of the section titles.
M. Delete any section Entitled ”Endorsements”. Such a section may not be included in the Modified Version.
N. Do not retitle any existing section to be Entitled ”Endorsements” or to conflict in title with any Invariant Section.
O. Preserve any Warranty Disclaimers.
If the Modified Version includes new front-matter sections or appendices that qualify as Secondary Sections and contain
no material copied from the Document, you may at your option designate some or all of these sections as invariant. To do this,
add their titles to the list of Invariant Sections in the Modified Version’s license notice. These titles must be distinct from any
other section titles.
You may add a section Entitled ”Endorsements”, provided it contains nothing but endorsements of your Modified Version by
various parties–for example, statements of peer review or that the text has been approved by an organization as the authoritative
definition of a standard.
You may add a passage of up to five words as a Front-Cover Text, and a passage of up to 25 words as a Back-Cover Text,
to the end of the list of Cover Texts in the Modified Version. Only one passage of Front-Cover Text and one of Back-Cover
Text may be added by (or through arrangements made by) any one entity. If the Document already includes a cover text for
the same cover, previously added by you or by arrangement made by the same entity you are acting on behalf of, you may not
add another; but you may replace the old one, on explicit permission from the previous publisher that added the old one.
The author(s) and publisher(s) of the Document do not by this License give permission to use their names for publicity
for or to assert or imply endorsement of any Modified Version.
5. COMBINING DOCUMENTS
You may combine the Document with other documents released under this License, under the terms defined in section 4
above for modified versions, provided that you include in the combination all of the Invariant Sections of all of the original
documents, unmodified, and list them all as Invariant Sections of your combined work in its license notice, and that you preserve
all their Warranty Disclaimers.
The combined work need only contain one copy of this License, and multiple identical Invariant Sections may be replaced
with a single copy. If there are multiple Invariant Sections with the same name but different contents, make the title of each
such section unique by adding at the end of it, in parentheses, the name of the original author or publisher of that section if
known, or else a unique number. Make the same adjustment to the section titles in the list of Invariant Sections in the license
notice of the combined work.
In the combination, you must combine any sections Entitled ”History” in the various original documents, forming one section
Entitled ”History”; likewise combine any sections Entitled ”Acknowledgements”, and any sections Entitled ”Dedications”. You
must delete all sections Entitled ”Endorsements”.
6. COLLECTIONS OF DOCUMENTS
You may make a collection consisting of the Document and other documents released under this License, and replace the
individual copies of this License in the various documents with a single copy that is included in the collection, provided that
you follow the rules of this License for verbatim copying of each of the documents in all other respects.
You may extract a single document from such a collection, and distribute it individually under this License, provided you
insert a copy of this License into the extracted document, and follow this License in all other respects regarding verbatim
copying of that document.
7. AGGREGATION WITH INDEPENDENT WORKS
170 APPENDIX B. GNU FREE DOCUMENTATION LICENSE
A compilation of the Document or its derivatives with other separate and independent documents or works, in or on a
volume of a storage or distribution medium, is called an ”aggregate” if the copyright resulting from the compilation is not used
to limit the legal rights of the compilation’s users beyond what the individual works permit. When the Document is included
in an aggregate, this License does not apply to the other works in the aggregate which are not themselves derivative works of
the Document.
If the Cover Text requirement of section 3 is applicable to these copies of the Document, then if the Document is less than
one half of the entire aggregate, the Document’s Cover Texts may be placed on covers that bracket the Document within the
aggregate, or the electronic equivalent of covers if the Document is in electronic form. Otherwise they must appear on printed
covers that bracket the whole aggregate.
8. TRANSLATION
Translation is considered a kind of modification, so you may distribute translations of the Document under the terms of
section 4. Replacing Invariant Sections with translations requires special permission from their copyright holders, but you may
include translations of some or all Invariant Sections in addition to the original versions of these Invariant Sections. You may
include a translation of this License, and all the license notices in the Document, and any Warranty Disclaimers, provided that
you also include the original English version of this License and the original versions of those notices and disclaimers. In case
of a disagreement between the translation and the original version of this License or a notice or disclaimer, the original version
will prevail.
If a section in the Document is Entitled ”Acknowledgements”, ”Dedications”, or ”History”, the requirement (section 4) to
Preserve its Title (section 1) will typically require changing the actual title.
9. TERMINATION
You may not copy, modify, sublicense, or distribute the Document except as expressly provided for under this License.
Any other attempt to copy, modify, sublicense or distribute the Document is void, and will automatically terminate your rights
under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses
terminated so long as such parties remain in full compliance.
10. FUTURE REVISIONS OF THIS LICENSE
The Free Software Foundation may publish new, revised versions of the GNU Free Documentation License from time to
time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or
concerns. See http://www.gnu.org/copyleft/.
Each version of the License is given a distinguishing version number. If the Document specifies that a particular numbered
version of this License ”or any later version” applies to it, you have the option of following the terms and conditions either of
that specified version or of any later version that has been published (not as a draft) by the Free Software Foundation. If the
Document does not specify a version number of this License, you may choose any version ever published (not as a draft) by the
Free Software Foundation.
ADDENDUM: How to use this License for your documents
To use this License in a document you have written, include a copy of the License in the document and put the following
copyright and license notices just after the title page:
Copyright c (YEAR YOUR NAME. Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.2 or any later version published by the Free
Software Foundation; with no Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A copy of
the license is included in the section entitled ”GNU Free Documentation License”.
If you have Invariant Sections, Front-Cover Texts and Back-Cover Texts, replace the ”with...Texts.” line with this:
with the Invariant Sections being LIST THEIR TITLES, with the Front-Cover Texts being LIST, and with the
Back-Cover Texts being LIST.
If you have Invariant Sections without Cover Texts, or some other combination of the three, merge those two alternatives
to suit the situation.
If your document contains nontrivial examples of program code, we recommend releasing these examples in parallel under
your choice of free software license, such as the GNU General Public License, to permit their use in free software.
Bibliography
[1] Guillaume Bal. Lecture notes, course on numerical analysis, 2002. URL
http://www.columbia.edu/~gb2030/COURSES/E6302/NumAnal.ps.
[2] Samuel Daniel Conte and Carl de Boor. Elementary Numerical Analysis; An Algorithmic Ap-
proach. McGraw-Hill, New York, third edition, 1980. ISBN 0070124477.
[3] James F. Epperson. An Introduction to Numerical Methods and Analysis. Wiley, 2001. ISBN
0-471-31647-4.
[4] Richard Hamming. Numerical Methods for Scientists and Engineers. Dover, New York, 1987.
ISBN 0486652416.
[5] Eugene Isaacson and Herbert Bishop Keller. Analysis of Numerical Methods. Dover, New York,
1966. ISBN 0486680290.
[6] David Kincaid and Ward Cheney. Numerical analysis. Brooks/Cole Publishing Co., Pacific
Grove, CA, second edition, 1996. ISBN 0-534-33892-5. Mathematics of scientific computing.
[7] David Kincaid and Ward Cheney. Numerical Mathematics and Computing. Brooks/Cole Pub-
lishing Co., Pacific Grove, CA, fourth edition, 1999. ISBN 0-534-35184-0.
[8] James Stewart. Calculus, Early Transcendentals. Brooks/Cole-Thomson Learning, Belmont,
CA, fifth edition, 2003. ISBN 0-534-39330-6.
171
Index
Backwards Euler’s Method, 141
big-O, 3
bisection method, 50
centered difference, 91
Chebyshev Polynomials, 69, 122
Chebyshev Quadrature, 114
composite quadrature rule, 100
Dirichlet function, 99
divided differences, 67
eigenvalue, 8
eigenvector, 8
elementary row operations, 25
error
roundoff, 89, 138
truncation, 89, 138
Euler’s Method, 136, 137
Gaussian Elimination, Na¨ıve, 25
Gaussian Quadrature, 107
Heaviside function, 98
Hilbert Matrix, 48
inner product, 6
Intermediate Value Theorem, 49
knot, 79
Lagrange Polynomials, 63
least squares, 119
definition, 117
linear, 119
LU Factorization, 33
method of undetermined coefficients, 108
multiplier, 28
natural cubic spline, 83
Newton’s Method, 52
norm, 7
subordinate, 9
normal equations, 118
ODE
errors, 138
higher order methods, 137
stability, 138
stepping, 136
partition, 79
phase curve, 150
Richardson Extrapolation, 92, 104
Riemann Integrable, 98
Riemann integral, 97
Romberg Algorithm, 104, 105
Runge Function, 69
Runge-Kutta Method, 143
secant method, 58
splines, 79
basis splines, 84
linear, 79
quadratic, 81
subtractive cancellation, 4
Taylor’s Series Method, 136
Taylor’s Theorem, 1, 3
trapezoidal rule, 100
error, 102
recursive, 106
vector space, 5
172

2

Preface
These notes were originally prepared during Fall quarter 2003 for UCSD Math 174, Numerical Methods. In writing these notes, it was not my intention to add to the glut of Numerical Analysis texts; they were designed to complement the course text, Numerical Mathematics and Computing, Fourth edition, by Cheney and Kincaid [7]. As such, these notes follow the conventions of that text fairly closely. If you are at all serious about pursuing study of Numerical Analysis, you should consider acquiring that text, or any one of a number of other fine texts by e.g., Epperson, Hamming, etc. [3, 4, 5].
3.1 5.1 1.4 1.1 9.1

3.2

8.4

5.2

3.4

4.2

7.1

10.1

9.2

9.3

3.3

4.3

7.2

8.2

10.2

8.3

10.3

Figure 1: The chapter dependency of this text, though some dependencies are weak. Special thanks go to the students of Math 174, 2003–2004, who suffered through early versions of these notes, which were riddled with (more) errors.

Revision History
0.0 Transcription of course notes for Math 174, Fall 2003. 0.1 As used in Math 174, Fall 2004. 0.11 Added material on functional analysis and Orthogonal Least Squares.

Todo
• • • • More homework questions and example problems. Chapter on optimization. Chapters on basic finite difference and finite element methods? Section on root finding for functions of more than one variable.

i

ii .

. . . . . . . .4. . 1. . . . . . . . . 1. . . . . . . . . 3. . . . . . .1. . . . . . .2 An Example . . . . . . . . 3 Solving Linear Systems 3. . . . .3 Programming and Control . . . . . . . . . . . . . . . . . . . . . 2. . .2 Useful Commands . . . i 1 1 4 5 5 6 7 8 8 10 13 13 17 19 20 21 24 25 25 25 27 28 29 30 31 32 33 34 36 37 37 37 37 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4 Plotting . . . . . . . . . . . . . . . 3. . . . . . .3. . . . . 3. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3.1 An Operation Count for Gaussian Elimination iii . . . . . . . . . . . . .4 Iterative Solutions . . . . . . . . . . . . . . 3. .1 Getting Started . . Exercises . . . . . . . . . . . .3 LU Factorization . . . . . . . . . . . .4 Computing Inverses . . . . . . . . . . . . . . . . . 3. . . . . . . 1. . . . . . . . . . . . . . . . 3. . . . . . . . . . . . . .1 Taylor’s Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .1. . . .3. . . . . . . . . . . . . . 3. . . . . . . . .4 Eigenvalues . . . . . . . . . . . . . . .3 Norms . . . . . . . . . . . . . . . . . . . 3. . . 3. . . . . . . . . . . . . Inner Products. . . 1. . . . .1 An Example . . . . . . . . . . . . . . . . . .3. . . . . . . . . . . 3. . . . . . . . . . . Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3 Vector Spaces. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4. . . . . . . . 2. . . . . . . . . . . . . . . . . . . . .1 Scaled Partial Pivoting . . . . . . 2. . . . . . . . . . . . . . . . . .Contents Preface 1 Introduction 1. . . . . . . . . . . . .3. . .3 Another Example and A Real Algorithm . . . . . . . . . . . . . . . . .2. . . . . . . .3 Some Theory . . . . . . . . . . . . . . . . . . . . .2 Using LU Factorizations . . . . . . . . . 1. . . . . . . . . 2. . . . . . . . . . . . .1 Matrix Norms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2. . . . . . . . . . . . . . . 2 A “Crash” Course in octave/Matlab 2. . . . . . . . . . . . . . . 1. . Norms 1.1. . . . . . . . . . . .3. . . . . . . .1 Logical Forks and Control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3. . . . 3. . . . ıve 3. .2 Pivoting Strategies for Gaussian Elimination . . . . . . . . . . . . . . . . . . . . . . . . . . . .1 Elementary Row Operations . . . . . . . . . . . . . . .3 Algorithm Problems . . .2 Loss of Significance . . . . . . . . . .2. . . . . . . . . . . . . . . .2 Algorithm Terminology . . .1 Gaussian Elimination with Na¨ Pivoting . .3. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2 Inner Products . . . . . . . . . . . . . . . . . . . . . . . . . 3. . . .1 Vector Space .

3 Newton’s Nested Form . . . . . . . . . . . . . . . . . . .3 B Splines . . . . . . . . . .2 Newton’s Method . . . . . . . . . . . . . .2. .4. . . . . . . . . . . . . . 3. CONTENTS . . . . . . . . . 6. . . . . 6. . . . . . . 3. . 4. . . .7 Error Analysis . . .2 Newton’s Method . . . . . . . . . . . . .2 Errors in Polynomial Interpolation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4. . . . . . . 5. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3 Computing Second Degree Splines 6. . . . . .3 Impossible Iteration . . . . . . .1 Polynomial Interpolation . . . . . . . . 4. . . . . . . . . . . . . . . . . . . . . . . . . . . 4. . . . . . . . 3. 5. . . . . . . . . . . . . . . . . . .5 Jacobi Iteration . . . . . . . .3. . . . . . . . . . . . . . . . . . .1 Why Natural Cubic Splines? . . . .2. . . . . . . . . . . . . . . . . . . . . . . . . 5. . . . . . . . . . . . . . . . . . . . Exercises . . . . . . . . . .2 Convergence . . . . . . . . . . . . . . .2. . . . . . . . . . . . .4 Using Newton’s 4. . . . . . . . . . . . . . . . . .1. . . . . . . . . . . . . . . . . . . . . . . . .1. . . . . . . . . . . . . . . . . . . . . . . . . .4. . 39 40 40 41 42 42 44 46 49 49 49 50 50 51 53 53 54 55 57 58 60 63 63 63 64 66 67 69 69 73 75 79 79 80 81 82 82 83 84 84 87 .2. . . . . . . . . . . . . . . . . . . . . .4. . . . . . . . . . . . . . . . 4 Finding Roots 4. . .3 Convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6. . . . . . . . . . . . . . . . . . . . . . . . . . . . .3. . . . . .4. . . . . . . . . . . .1 First and Second Degree Splines . . . . . . . . . . .1. . . .iv 3. . . . . .1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Nodes . . . . . . .3 Secant Method .2 Problems .1 Problems . . . . .2. . . . 5. . . . . . . . . . . . . . . . . . . . . . . . . . . . 4. . . .4. . . . . . . . . . . . . . . . . . . . . . . .1 Lagranges Method . . . . . . . . . . . 4. . . . . . .2 Dividing by Multiplying 3. . . . .2 (Natural) Cubic Splines . . . . . . . . . . . . . . . 5 Interpolation 5. . . . . . . . . . . . . . . Method . . . . . . . . . . . . . . . . . . . . . .2 Convergence . . . . . . . . . . . . . . . . . . . 6. . . . . . . . . . . . . . . . . . . . . . . . . . . .2 Computing Cubic Splines . . . . Exercises . . . .1. . . . . . . . . . . . . . . . . . . . 4. . . . . . . . . . . 6. . . . . . . . . . . . . . . . . .1 Interpolation Error Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4 Divided Differences . . . . . . . . . .1 Modifications . . . . .8 A Free Lunch? . . . . . . . 5. 4. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .1. . . . . . . . . . . . . . . . . . . 6. . . . . . . . . . . . . . . . . .2.2. . 5. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3. . . . . . . . . . . . . . 3. . . . . . . . . . . . . . Exercises . . . . . . . . . . . . . .2 Interpolation Error for Equally Spaced Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2. . .1 First Degree Spline Accuracy . . . . . . . . . . . . . . . . . . .4 Richardson Iteration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .1. . . 6 Spline Interpolation 6. . . . . . .1 Implementation 4.1. . . . . . . . . .2 Second Degree Splines . . . . . . . . . . . . . . . . . . . . . .1 Bisection . . . . . . . . . . . . . . . . . . . . . . . .1. . . . . . . . .4. . . . . . . .4. . . . . . . . . . 5. . . . . . . . . . . . . . . . . .6 Gauss Seidel Iteration . . . . . . . . . . .

. . . . . . . .1. . . . . . . . . . . . . . . . . . . . . . .1. . . . . . .1 Alternatives to Normal Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . .4 Gaussian Quadrature . 10. . . . .4. . . . . .4 Determining Gaussian Nodes .7 Backwards Euler’s Method . . . . .3 Gaussian Nodes . . . . . . . . . . . .CONTENTS 7 Approximating Derivatives 7. . . . . . . . . . . . .2. . . . . . . . . . . . . . .1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2. . . . .2. . . . . . . .4. 8 Integrals and Quadrature 8. . . . . . . . . . . . . . .3. . . . . . . . . . . . . . . 10. . . . . .1. . . . . . . . . . . . . . . . . . . . 8. . . . . . . . . 8. . . . . . . . Exercises . . . . . . . . . . . 10. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .1. . . . . . 10. . . . . . . . 8. . . . . . . .2 Using Richardson Extrapolation . . . . . . . . . . . . . . . . . . . . . . . . . 9 Least Squares 9. . . .3 Romberg Algorithm . . . . . . . . 9. . . . . . 8. . . . . . . . . . . . . . . . . .1 Determining Weights (Lagrange Polynomial Method) . . . . .1. . . . . . . . .1. . .2 Approximating the Integral . . . . . . . . . . . . . . . . .6 Stability . . . . . . . . . . . . . . . . . . . . . . .1 Recursive Trapezoidal Rule . . . . .1 Computing the Orthogonal Least Squares Approximant 9. . . . . . . . . . . . . . . .1 Upper and Lower Sums . . . . . . . . . . . . . .1. . . . . .2 Linear Least Squares . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8. . . . . v 89 89 91 91 92 93 95 97 97 97 99 99 100 101 103 104 106 107 107 108 109 110 112 113 117 117 117 118 119 121 122 124 124 128 129 132 135 135 136 136 137 137 138 138 141 . . . . . . . . . . . . . . . . . . . . . . . . . .2. .1 Least Squares .4 Higher Order Methods . . . . . . . . . . . . . . . .2. . . . . . . . .1 Abstracting Richardson’s Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2 Orthonormal Bases . . . . . . . . . 9. .2 Principal Component Analysis . . . . . 9. . . . . . . .1. . . . . . . . . . . . . . . . . . .1. . . . . . .3. . . . . 8. . . . . . . . . . . . . . . .5 Reinventing the Wheel . . . . . . . . . . . . . . .1. . . . . . . . . . . . . . .4. . . . . . . . . . . . . . Exercises . . . . 8. . . . . . . . . . . . . . . . . . . . .2 Richardson Extrapolation . . . . . . . . . . . 7. . . . . . . . . . . . . .1 Finite Differences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3.1 Approximating the Second Derivative 7. . . .2. . . . . . . . . . . . . . . . . . .4. . . . . . . . . . .3 Least Squares from Basis Functions . . .3 Orthogonal Least Squares . 10. . . . 8. . . . . . . .1 The Definite Integral . . . 8. . . . . . . . Exercises . . . . . 8. . . . . . . . 10.1 The Definition of Ordinary Least Squares . . . . . . . . . . . . . . . . . . . . .2 Ordinary Least Squares in octave/Matlab . . . . . . . 7. . . . . . . . . . . . . . . 9. . . . . . . . .2 Using the Error Bound . . . . . . . . . . . . . . . . . . . 10 Ordinary Differential Equations 10. . . . . .1. . . . .3 Simple and Composite Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3 Euler’s Method . . .2 Trapezoidal Rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2 Taylor’s Series Methods . . . . . 8. . . . . . 7. . . . . 9. . . . . 9. . . . . . . . . . . . . . . . . . . . . 10. . . . . . . . . . . . . . . . . . .1 Integration and ‘Stepping’ . . . . . .1. . . . . . . . . . . . . . . . . 9. . . . . . . . . . . . . . . . . .1 How Good is the Composite Trapezoidal Rule? . . . . . . . . . . .1. . . . . . . . . . . . . . . 8. . . . . . . . . . . . . 8. . . . .2 Determining Weights (Method of Undetermined Coefficients) 8. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .1 Elementary Methods . . . . . . . . . . . . . .4. . . .5 Errors . . . 9. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

. . .3. . . . . . . . . . . 10. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Fall 2003 . COPYING IN QUANTITY . . . . . 158 . . . . . . . . . . . 159 . . . . . . . . . . . . . . . . . . . . . . . . 3. . . . . . . 10. . . APPLICABILITY AND DEFINITIONS . . . . . . . 10. . . . . . . . . . . . . . . . First Midterm. COMBINING DOCUMENTS . . . . . . ADDENDUM: How to use this License for your documents Bibliography . . . . . . . . . . . . . . . 167 167 168 168 168 169 169 169 170 170 170 170 171 B GNU Free Documentation License 1. . . 10. . FUTURE REVISIONS OF THIS LICENSE . . 10. . . . . . . . . . . . . . . . . . . . . . . . .4 It’s Only Autonomous Systems . . 2. . . . . . . . . . . . . . . . AGGREGATION WITH INDEPENDENT WORKS . . . . . 7. . . . . . . . . . . . . . . . . . . . . . . . . . . Fall 2003 . . . . . . . . . . . . . . . . . .2. .3. . 5. . . . . . . . . . .2. . . . . . . . . . . .1 Larger Systems . . . . . . . . . . . . . . . . . .3 A. . . . . . . . . . . . . . . . . . . . . . . .3 It’s Only Systems . . Second Midterm. . . . . . . . . . . . . . . . . . . . . . . . 9. . . . . . Exercises . . . 162 . . . . .2. . . . . . . . . . . . . . . . . . . . . . . . . . . . .2 Deriving the Runge-Kutta Methods 10. . . . . . . . MODIFICATIONS . . . . . .2 Runge-Kutta Methods . . . . . . . . . . . Second Midterm. . . . . . . . . . . . . . 143 144 144 146 146 147 147 149 150 154 157 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2 Recasting Single ODE Methods . . . . . .3. . . . Fall 2004 . . . . . . . . . . . . . . . . . .5 A. . . 8. . . . . . . . . . . . . . . 10. .6 Exams First Midterm. . . . . . . . . . . . . . . . . . . . . . . . . . Fall 2003 Final Exam. . . . . . . . . . . . . . .2 A. . . . . . . . . . . . . . . . . . . . . . . . . . . . .3. . 10. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . CONTENTS . . Fall 2004 . . . . . . . . . . . . .4 A. . . . . . . . . . . . . . . . . . . . . 4. . . . . . . . . . . . . . . . . 10. . . . . . . . . . . 157 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3 Systems of ODEs . . . . . . . . . . . . . . . . Fall 2004 Final Exam. . . . . . . . . . . COLLECTIONS OF DOCUMENTS . . . . . . . . . . . . . . .3 Examples . . . . . . . . . . . . . . . . . 161 . . . . . . . . . . . . . . . . . . . .1 A. . 164 . . . . . . . . . . . . . . . . . . . . . . TERMINATION . . . . . . . . TRANSLATION . . . . . . . . . . . . . . . . . . A Old A. . . . . .vi 10. . . . . . . . VERBATIM COPYING . . . . . . . . . . . . . . . . . . . . . 6. . . . . . . . . . . . . . . . .1 Taylor’s Series Redux . .

f (x). . also known as the error term. c. this is a MacLaurin series. 2! 3! x5 + − . and f k (x) is the k th derivative of f at x. 1.2) (1..” meaning that convergence is not guaranteed in general.1) (1.. The main usage is to approximate a function by the first few terms of its Taylor’s series expansion. k! (n + 1)! where ξ is some number between x and c. . 2. We will use this theorem again and again in this class. .3) Theorem 1. then for any x and c in this interval n f (x) = k=0 f (k) (c) (x − c)k f (n+1) (ξ) (x − c)n+1 + . the theorem then tells us the approximation is “as good” as the final term...1 Taylor’s Theorem Recall from calculus the Taylor’s series for a function. The constants ai are related to the function f and its derivatives evaluated at c. . is written as f (x) ∼ a0 + a1 (x − c) + a2 (x − c)2 + . Here the symbol ∼ is used to denote a “formal series. 4! ex = 1 + x + x3 3! x2 cos (x) = 1 − 2! sin (x) = x − (1. That is.1 (Taylor’s Theorem).. For example we have the following Taylor’s series (with c = 0): x2 x3 + + .. expanded about some number. . 5! x4 + − . . we can make the following manipulation: 1 . . If f (x) has derivatives of order 0. When c = 0. b]. n+1 on the closed interval [a.Chapter 1 Introduction 1.

INTRODUCTION n f (x) = n f (x) − f (x) − f (k) (c) k=0 n f (k) (x − c) k! k = = k=0 f (n+1) (ξ) f (k) (c) (x − c)k f (n+1) (ξ) (x − c)n+1 + k! (n + 1)! (x − c)n+1 (n + 1)! k=0 (c) (x − c)k k! f (n+1) (ξ) |x − c|n+1 . (n + 1)! Example Problem 1. f (3) (x) = 6. = x− 6 24 x3 6 sin(ξ) x4 x4 ≤ . We find that f (x) = sin x = sin(0) + − cos(0) x3 sin(ξ) x4 cos(0) x − sin(0) x2 + + + 1! 2! 3! 4! x3 sin(ξ) x4 + . Solution: Solving for f (k) is fairly easy for this function. b]. Solution: Taylor’s Theorem for n = 1 states: Given a function. c are in [a. . Example Problem 1.2 CHAPTER 1. f (1) (x) = 3x2 − 42x. As a one-liner. the MVT says that at some time during a trip. using n = 3. Example Problem 1. f (2) (x) = 6x − 42. Apply Taylor’s Theorem for the case n = 1. This is the Mean Value Theorem. expanded about c = 0.4. Apply Taylor’s Theorem to expand f (x) = x 3 − 21x2 + 17 around c = 1. = f (k) (x) = 0.2. Solution: Simple calculus gives us f (0) (x) = x3 − 21x2 + 17. 24 24 so sin x − x − because |sin(ξ)| ≤ 1. We will then use our knowledge about f (n+1) (ξ) on the interval [a. f (x) with a continuous derivative on [a.3. b]. b] to find some constant M such that n f (x) − k=0 f (k) (c) (x − c)k k! = f (n+1) (ξ) |x − c|n+1 ≤ M |x − c|n+1 . c when x. your velocity is the same as your average velocity for the trip. Find an approximation for f (x) = sin x. (n + 1)! On the left hand side is the difference between f (x) and its approximation by Taylor’s series. then f (x) = f (c) + f (ξ)(x − c) for some ξ between x.

012 . This leads to a discussion on the matter of Orders of Convergence.e. 2 6 3 Note there is no error term. and x for c in the more general version. smaller than some h ∗ in absolute value. We generally apply this form of the theorem with h → 0. b]. we have ln (1 + h) = h − h2 1 + 3 h3 . n+ 1 on the closed interval [a. We say that a function f (h) is in the class O hk (pronounced “big-Oh of hk ”) if there is some constant C such that |f (h)| ≤ C |h|k for all h “sufficiently small. in fact.. as long as h is relatively small (say smaller than 1 the term 3ξ 3 can be bounded by a constant. Evaluating these at c = 1 gives f (x) = −3 + −39(x − 1) + −36 (x − 1)2 6 (x − 1)3 + . .01) ≈ 0. Roughly speaking.01 − .6.1. since the higher order derivatives are identically zero. and thus ln (1 + h) = h − Thus we say that h − h2 2 h2 + O h3 . meaning some function which is a member of this class. −1/x2 h2 2/ξ 3 h3 (1/x) h + + 1 2 6 Using the fact that ξ is between 1 and 1 + h. Alternative Form). k! (n + 1)! k=0 where ξ is some number between x and x + h.” i. We sometimes also write O hk . Consider the Taylor expansion of ln x: ln (x + h) = ln x + Letting x = 1. If f (x) has derivatives of order 0. through use of the “Big-O” function we can write an expression without “sweating the small stuff.” This can give us an intuitive understanding of how an approximation works. 2 3ξ 1 2 ). 1. 2 ln(1 + 0. For example 0.5 (Taylor’s Theorem. . then for any x in this interval and any h such that x + h is in this interval. 2 is a O h3 approximation to ln(1 + h). in this case substituting x + h for x. n f (k) (x) (h)k f (n+1) (ξ) (h)n+1 f (x + h) = + . the function f (x).009950331 ≈ 0. The following definition will suffice for this class Definition 1.1. without losing too many of the details. There is an alternative form of Taylor’s Theorem.00995 = 0.7. By carrying out simple algebra. . TAYLOR’S THEOREM with the last holding for k > 3. For a function f ∈ O hk we sometimes write f = O hk . you will find that the above expansion is. This gives Theorem 1. . Example 1.

1234 by a system that keeps only 16 significant digits. The number of significant digits in r is usually determined by the user’s input. but does not eliminate. x− = −b. and k is an integer in a certain range. the result would be .2 × 10−6 . Consider the stability of x + 1 − 1 when x is near 0.17281995 × 10−6 2. √ Example Problem 1. To fix this. we rationalize the expression √ √ √ x+1+1 x+1−1 x x+1−1 = x + 1 − 1√ =√ =√ .1. When x = 1.8. as the following examples illustrate. the precision of the first measurement is lost in the uncertainty of the second. Operations on numbers stored in this way follow a “lowest common denominator” type of rule. . 1). This will be made clearer by the examples below.177241 from 0. a loss of significance can be incurred if two nearly equal quantities are subtracted from one another. This loss is called subtractive cancellation. and three significant digits have been lost. Solution: The usual quadratic formula is √ −b ± b2 − 4c x± = 2 Supposing that b c > 0.2345678 × 10 −5 . there is no loss of precision. Then x + 1 ≈ 1. If your computer (or calculator) can only keep 8 significant digits. Thus if I were to direct my computer to subtract 0. Thus 6 significant digits have been lost from the original. where r is a rational number of a given number of digits in [0. When 1 is subtracted.177589. Errors can also occur when quantities of radically different magnitudes are summed.6789 × 10−20 might be rounded to 0. and can often be avoided by rewriting the expression.4 CHAPTER 1. giving two roots x+ = 0.171717 and 0.2 Loss of Significance Generally speaking. x+1+1 x+1+1 x+1+1 This expression has no subtractions.e. The usual strategies for rewriting subtractive expressions are completing the square.2345678 × 10−5 ≈ 6. this expression evaluates approximately as 1. and so is not subject to subtractive cancelling. Rewrite the expression to rid it of subtractive cancellation.. precision cannot be gained but can be lost. the result is 6. i.51.000006173. Example Problem 1. This minimizes. Write stable code to find the roots of the equation x 2 + bx + c = 0.1234 + 5. This may lead to unexpected results. that is x = ±r×10 k .0000062 on a machine with 8 digits.9. This is as it should be. factoring. The latter root is nearly correct.0000062. problems like those of the previous example problem.348 × 10 −3 .2345678 × 10−5 . while the former has no correct digits. Note that nearly all modern computers and calculators store intermediate results of calculations in higher precision formats. √ Solution: Suppose that x = 1. INTRODUCTION 1. However. For example 0. or using the Taylor expansions. Thus for example if you add the two quantities 0. the expression in the square root might be rounded to b 2 . a computer stores a number x as a mantissa and exponent. then the result should only have two significant digits. this will be rounded to 1.

1 The binary operator allows transparent algebraic manipulation of vectors. has some currency. For each u ∈ V.. which is nearly correct. v ∈ V. V.e. Addition is commutative: u + v = v + u for each u..”) 4. If x is very close to zero. and if x − is computed. Inner Products. 3.. C. the sum u + v is a vector in V. where 0 is the zero of R. (i.3 Vector Spaces. the complex field. Rewrite ex − cos x to be stable when x is near 0. this algebraic field will always be the real field.. For any u.3. 1u = u. − 1 − + − + . (i. and a scalar multiply over the real field R. 5. where +R is addition in R. defined over V..) For the purposes of this text. which we cannot calculate exactly anyway. the approximate should be good. x+ can easily be computed with little additional work. Note that we propose calculating x + x2 + x3 /6 as an approximation of ex − cos x. +. the space is “closed under scalar multiplication. and so is not subject to subtractive cancelling. For any u ∈ V.e.. Solution: Look at the Taylor’s Series expansion for these functions: ex − cos x = x2 x3 x4 x5 x2 x4 x6 + + + + . this expression gives root x + = −c/b. v ∈ V.10. 1 . There is a zero vector 0 ∈ V such that for any u ∈ V. though in the real world. NORMS To correct this problem. Since we assume x is nearly zero. multiply the numerator and denominator of x + by −b − x+ = Now if b pair: 2c √ −b − b2 − 4c 5 √ b2 − 4c to get c > 0. with a binary addition operator. Definition 1. A collection of vectors. forms a vector space if 1. This leads to the x− = −b − √ b2 − 4c .11. we may need to take all three terms. For each u. and each scalar α ∈ R the scalar product αu is a vector in V. R. INNER PRODUCTS.e.1 Vector Space A vector space is a collection of objects together with a binary operator which is defined over an algebraic field. or even more terms of the expansion. (i.. 2! 3! 4! 5! 2! 4! 6! x3 + O x5 = x + x2 + 3! 1+x+ This expression has no subtractions. and scalars α. v ∈ V.1. 6. 0u = 0. β ∈ R. scalar multiplication distributes in both ways. if x is far from zero we should use some other technique. 1. Norms We explore some of the basics of functional analysis which may be useful in this text. the space is “closed under addition. both (α + R β)u = αu + βu and α (u + v) = αu + αv hold. 2 x+ = 2c √ −b − b2 − 4c Note that the two roots are nearly reciprocals. 1.”) 2. If x is not so close to zero. Example Problem 1. we may only have to take the first one or two terms. VECTOR SPACES.3. where 1 is the multiplicative identity of R.

v) + β (u. . v). . . That is. This space is denoted as R m×n . The most familiar example of a vector space is R n . un ] + [v1 . Similarly for αu. That is u ∈ R n is of the form [u1 . u2 . . u2 + v2 . . Then H forms a vector space under the “pointwise” defined addition and scalar multiplication over R. . The inner product should have the following properties: Definition 1.12. which is the collection of n-tuples of real numbers. un + vn ] . v ∈ H. . . for u. . defined over R. The only difference between proving H 0 is a vector space and the proof required for the previous example is in showing that H 0 is indeed closed under addition and scalar multiplication. Let Pn be the collection of all ‘formal’ polynomials of degree less than or equal to n with coefficients from R. . . Example 1. u) = (u. a real or a complex number).16. V. w) . A binary function for which this holds is sometimes called a bilinear form. α [u1 . w) (u.17.. . bounded set. INTRODUCTION Example 1. αv + βw) = α (u. Addition and scalar multiplication over R are defined pairwise: [u1 . then [u + v] (x) = u(x) + v(x) = 0 + 0. . αu2 . And for u ∈ H. n. . It is positive: (v. 2 instead of 0. . This is simple because if x ∈ ∂X. Example 1. . This would not have worked if the functions of H 0 were required to take some other value on ∂X. Let X ⊆ Rk be an closed. . 2. a binary function. . . Then Pn forms a vector space over R. Let X ⊆ Rk be a closed. αun ] Note that some authors distinguish between points in n-dimensional space and vectors in ndimensional space.15. α ∈ R. and 1.13. . Another way of viewing this space is it is the space of linear functions which carry vectors of R n to vectors of Rm . . un ] . . . Then H forms a vector space under the “pointwise” defined addition and scalar multiplication over R. v) ≥ 0.g.3. (. We will use Rn to refer to both of them. The collection of all real-valued m × n matrices forms a vector space over the reals with the usual scalar multiplication and matrix addition. u2 . and . 2. Example 1. w) = α (u. with equality holding if and only if v is the zero vector of V.14. αu is the function in [αu] (x) = α (u(x)). u + v is the function in H defined by [u + v] (x) = u(x) + v(x) for all x ∈ X. say. v2 . It is symmetric: (v. like. . u2 .6 CHAPTER 1. . and let H be the collection of all functions from X to R. which takes two vectors of V to R is an inner product if 1. and let H0 be the collection of all functions from X to R that take the value zero on ∂X. bounded set. .2 Inner Products An inner product is a way of “multiplying” two vectors from a vector space together to get a scalar from the same field the space is defined over (e. It is linear in both its arguments: (αu + βv. w) + β (v. as in this text there is no need to distinguish them symbolically. where ui ∈ R for i = 1. . 3. Example 1. . un ] = [αu1 . For a vector space. vn ] = [u1 + v1 . ). and thus u + v has the property that it takes value 0 on ∂X.

1.3. VECTOR SPACES, INNER PRODUCTS, NORMS

7

Example 1.18. The most familiar example of an inner product is the L 2 (pronounced “L two”) inner product on the vector space Rn . If u = [u1 , u2 , . . . , un ] , and v = [v1 , v2 , . . . , vn ] , then letting (u, v)2 = ui vi
i

gives an inner product. This inner product is the usual vector calculus dot product and is sometimes written as u · v or u v. Example 1.19. Let H be the vector space of functions from X to R from Example 1.13. Then for u, v ∈ H, letting (u, v)H = u(x)v(x) dx,
X

gives an inner product. This inner product is like the “limit case” of the L 2 inner product on Rn as n goes to infinity.

1.3.3

Norms

A norm is a way of measuring the “length” of a vector: Definition 1.20. A function · from a vector space, V, to R + is called a norm if 1. It obeys the triangle inequality: x + y ≤ x + y . 2. It scales positively: αx = |α| x , for scalar α. 3. It is positive: x ≥ 0, with equality only holding when x is the zero vector. The easiest way of constructing a norm is on top of an inner product. If (, ) is an inner product on vector space V, then letting u = (u, u) gives a norm on V. This is how we construct our most common norms: Example 1.21. For vector x ∈ Rn , its L2 norm is defined
n
1 2

x

2

=
i=1

x2 i

= x x

1 2

.

This is constructed on top of the L2 inner product. Example 1.22. The Lp norm on Rn generalizes the L2 norm, and is defined, for p > 0, as
n 1/p

x

p

=
i=1

|xi |

p

.

Example 1.23. The L∞ norm on Rn is defined as x

= max |xi | .
i

The L∞ norm is somehow the “limit” of the Lp norm as p → ∞.

8

CHAPTER 1. INTRODUCTION

1.4

Eigenvalues

It is assumed the reader has some familiarity with linear algebra. We review the topic of eigenvalues. Definition 1.24. A nonzero vector x is an eigenvector of a given matrix A, with corresponding eigenvalue λ if Ax = λx Subtracting the right hand side from the left and gathering terms gives (A − λI) x = 0. Since x is assumed to be nonzero, the matrix A − λI must be singular. A matrix is singular if and only if its determinant is zero. These steps are reversible, thus we claim λ is an eigenvalue if and only if det (A − λI) = 0. The left hand side can be expanded to a polynomial in λ, of degree n where A is an n×n matrix. This gives the so-called characteristic equation. Sometimes eigenvectors,-values are called characteristic vectors,-values. Example Problem 1.25. Find the eigenvalues of 1 1 4 −2 Solution: The eigenvalues are roots of 0 = det 1−λ 1 4 −2 − λ = (1 − λ) (−2 − λ) − 4 = λ2 + λ − 6.

This equation has roots λ1 = −3, λ2 = 2. Example Problem 1.26. Find the eigenvalues of A 2 . Solution: Let λ be an eigenvalue of A, with corresponding eigenvector x. Then A2 x = A (Ax) = A (λx) = λAx = λ2 x.

The eigenvalues of a matrix tell us, roughly, how the linear transform scales a given matrix; the eigenvectors tell us which directions are “purely scaled.” This will make more sense when we talk about norms of vectors and matrices.

1.4.1

Matrix Norms

Given a norm · on the vector space, R n , we can define the matrix norm “subordinate” to it, as follows: Definition 1.27. Given a norm · on R n , we define the subordinate matrix norm on R n×n by A = max
x=0

Ax . x

1.4. EIGENVALUES

9

We will use the subordinate two-norm for matrices. From the definition of the subordinate norm as a max, we conclude that if x is a nonzero vector then Ax 2 ≤ x 2 Ax 2 ≤ A A
2 2

thus, x 2.

Example 1.28. Strange but true: If Λ is the set of eigenvalues of A, then A Example Problem 1.29. Prove that AB
2 2

= max |λ| .
λ∈Λ

≤ A

2

B 2.

Solution: AB
2

=df max
x=0

A 2 Bx ABx 2 ≤ max x=0 x 2 x 2

2

= A

2

B 2.

and g ∈ O (hm ) . (1. (1. (1. Prove that λ −1 is an eigenvalue for A−1 .22) If x 2 = 0. 0 0 0 · · · 1/n (1. . . Prove that if λ is an eigenvalue of A. Can you show that f ∈ O hk−1 ? √ √ (1.   . that if λ is an eigenvalue of A then λ k is an eigenvalue of Ak for integer k > 1. .27) Suppose there is some µ > 0 such that.4) Prove that f (h) = −3h5 is in O h5 . . Give the eigenvalues of B = 3A 3 − 4A2 + I.6) Prove that f (h) = h3 is not in O h4 (Hint: Proof by contradiction. (Hint: Take h∗ < 1. (1. Use a O x5 approximate. . 10. Show that f + g ∈ O (hm ) . .26. How does this approximation compare to the O h3 approximate from Example 1. The base case was done in Example Problem 1. for the same eigenvalue.17) Prove. .19) Suppose A is an invertible matrix with eigenvalue λ.25) Show that A 2 = 0 implies that A is the matrix of all zeros. (1. . in absolute value.9) Find a O h4 approximation to ln(1+h).18) Let B = k αi Ai . (1.2) Suppose f ∈ O hk . by induction. what is x 2 ? (1. (1. where λmin is the smallest. .12) Rewrite x + 1 − x to get rid of subtractive cancellation when x is very large. Show that B is singular. (1..8) Find a O h3 approximation to sin h. what does this say about vector x? (1.1.  .) (1.15) Calculate cos(π/2 + 0. in R n . (1.16) Prove that if x is an eigenvector of A then αx is also an eigenvector of A.10 Exercises CHAPTER 1. with m < k. . then x is on a sphere centered at the origin of radius r. p(λ) is an eigenvalue of p(A). Compare the approximate value to the actual when h = 0. where A0 = I. (1.7 for h = 0. Show that f g ∈ O hk+m . Use a O x6 approximate.11) Rewrite √x + 1 − 1 to get rid of subtractive cancellation when x ≈ 0.21) Show that if x 2 = r.10) Suppose that f ∈ O hk . INTRODUCTION (1. Av 2 ≥ µ v 2. unless you remember that O hk is a better approximation than O (hm ) when m < k. (1. 100.24) What is the norm of   1 0 0 ··· 0  0 1/2 0 · · · 0     0 ? A =  0 0 1/3 · · ·   .) Note this may appear counterintuitive.7) Prove that sin(h) is in O (h). (1. (1. (1. . for a given A.26) Show that A−1 2 equals (1/|λmin |) .3) Suppose f ∈ O hk . Thus for polynomial p(x). (1. then k αi λi is i=0 i=0 an eigenvalue of B.14) Use a Taylor’s expansion to rid the expression 1 − cos 2 x of subtractive cancellation for x small. (1. . Show that f ∈ O (hm ) for any m with 0 < m < k. (1. eigenvalue of A. (1.1) Suppose f ∈ O hk . and g ∈ O (hm ) .5) Prove that f (h) = h2 + 5h17 is in O h2 .13) Use a Taylor’s expansion to rid the expression 1 − cos x of subtractive cancellation for x small.23) Letting x = [3 4 12] . √ (1.1? (1.20) Suppose that the eigenvalues of A are 1. Here α is a nonzero real number. (1.001) to within 8 decimal places by using the Taylor’s expansion.

. (1. but is difficult to prove.) (b) Show that A is nonsingular.29) If A 2 ≥ µ > 0 does it follow that A is nonsingular? (1. The inequality in the other direction holds when the norm is · 2 .28) If A is singular. EIGENVALUES 11 for all vectors v. (Should be very simple.4. is it necessarily the case that A 2 = 0? (1.28.1. then A ≥ max |λ| . λ∈Λ where · is any subordinate matrix norm. (a) Show that µ ≤ A 2 . (Recall: A is singular if there is some x = 0 such that Ax = 0.) (c) Show that A−1 2 ≤ (1/µ) . prove that if Λ is the set of eigenvalues of A.30) Towards proving the equality in Example Problem 1.

INTRODUCTION .12 CHAPTER 1.

Please contribute if you find this software useful. 2003 John W.1 Getting Started Matlab is a software package that allows you to program the mathematics of an algorithm without getting too bogged down in the details of data structures. Mathworks continues to tinker with Matlab to make it noncompatible with octave. pointers. See http://www.che.html • http://www. For details.org. octave:1> 1 Gnu Public License.edu/~msgocken/intro/intro. visit http://www.org/copyleft/gpl.html • http://web.ew. For more information.math. available under the GPL 1 . In a lame attempt to retain market share. which is why I recommend the free Matlab clone. 1999. many of them are certainly better than this one. this has the side effect of obsoletizing old Matlab code. then. not even for MERCHANTIBILITY or FITNESS FOR A PARTICULAR PURPOSE.html Report bugs to <bug-octave@bevo.cyclismo.html. 13 . and reinventing the wheel.usna.Chapter 2 A “Crash” Course in octave/Matlab 2. You can find a number of octave/Matlab tutorials for free on the web.edu/~mecheng/DESIGN/CAD/MATLAB/usna. You should get something like: GNU Octave. 2002. is an introduction to octave. Start up octave. freely downloadable from http://www. it also costs quite a bit of money.octave. 2000. Eaton.gnu.html Matlab has some demo programs covering a number of topics– from the most basic functionality to the more arcane toolboxes. A brief web search reveals the following excellent tutorials: • http://www. It also includes graphing capabilities.44 (i686-pc-linux-gnu).wisc. In Matlab. What follows is a lame demo for octave. 1997. There is ABSOLUTELY NO WARRANTY.mtu. octave. version 2. 2001. Copyright (C) 1996. This is free software. and there are numerous packages available for all kinds of functions. Matlab users will have to make some changes. simply type demo. What follows. Unfortunately.org/tutorial/matlab/vector.org/help-wanted. except where explicitly noted otherwise.octave. 1998.edu>. I will try to focus on the intersection of the two systems. see the source code for copying conditions. type ‘warranty’. enabling relatively high-level programming.1.

the last index of a vector is denoted by the special symbol end. and not from 0. I am leaving the semicolon off. octave:6> For illustration purposes. as is more common in modern programming languages. as in what follows. You can also give a range of indices. octave:6> a(1) = 77 a = 2 3 77 octave:7> a(end) = -400 a = 77 2 -400 . Thus the previous becomes octave:5> d = 5*c . This is either a feature or an annoyance. when the variable is a matrix. use parentheses. The basic octavian data structure is a matrix. you only need give a single index.14 CHAPTER 2. as shown below. WARNING: vectors and matrices in octave/Matlab are indexed starting from 1. a scalar is a 1 × 1 matrix. It can be prevented by appending a semicolon at the end of a command. You are warned! Moreover. you need give both indices.2 * b d = -5 2 9 You should notice that octave “echoes” the lvalues it creates.2 * b.3] b = 5 4 3 octave:4> c = a’ c = 1 2 3 octave:5> d = 5*c . A “CRASH” COURSE IN OCTAVE/MATLAB You now have a command line. To access an entry or entries of a matrix. Some simple matrix constructions are as follows: octave:2> a = [1 2 3] a = 1 2 3 octave:3> b = [5.4. a vector is an n × 1 matrix. In the case where the variable is a vector.

624716 0.166880 0.769423 0.027866 0.067067 333.706307 4.1:2) = [1 2.027866 3. if v is a vector. diag(M) returns the diagonal of matrix M as a vector. First we look at matrix manipulation: octave:12> j = M * b j = 13 31 999 octave:13> N = rand(3.769423 octave:15> P = L * M P = 0.2.166880 2.3) N = 0.067067 0.624716 0.1.911833 0.938714 octave:14> L = M + N L = 1.087402 0.3 4] M = 1 2 0 3 4 0 0 0 333 The command diag(v) returns a matrix with v as the diagonal.087402 0. as we will examine later.706307 0.938714 . The form c:d gives returns a row vector of the integers between a and d. GETTING STARTED 15 octave:8> a(2:3) = [22 333] a = 77 22 333 octave:9> M = diag(a) M = 77 0 0 0 22 0 0 0 333 octave:10> M(2.1) = 14 M = 77 0 0 14 22 0 0 0 333 octave:11> M(1:2.911833 0.

c + e.16 7.9014e+00 CHAPTER 2.1120e+05 octave:16> P = L .j = Li. the latter is element by element multiplication. (or something like it if e does not divide d − c) as follows: octave:20> z = 1:5 z = 1 2 3 4 5 .e.b err = 0 0 0 0.9105e+01 2.0000e+00 0. For a zero mean normal distribution with unit variance. and c. .∗ M)i. (L . use randn(m. In line 17 we asked octave to solve the linear system Mx = b.n) gives an m × n matrix with each element “uniformly distributed” on [0.0000e+00 octave:17> x = M \ b x = -6. .1119e+01 1. . The command rand(m.2333e+01 1.7580e+01 3. . .1120e+05 Note the difference between L * M and L . the former is matrix multiplication.0445e+01 2. which give. d.1669e+00 4.2201e+00 1. respectively.0000e+00 0. d.n).2505e+00 1.j . c.0000000 5. 1]. either using the form c:d or the form c:e:d. .0090090 octave:18> err = M * x . by setting x = M\b = M−1 b.0557e+00 1. . A “CRASH” COURSE IN OCTAVE/MATLAB 2. i. Note that you can construct matrices directly as you did vectors: octave:19> B = [1 3 4 5.5000000 0.8499e+01 0.5911e+01 4. c + 1.2 -2 2 -2] B = 3 4 5 1 2 -2 2 -2 You can also create row vectors as a sequence.* M..* M P = 1.j Mi. .0000e+00 1.

m] k = 3 2 5 8 4 2 7 9 2. This behavior is common in octave: many functions which we normally think of as applicable to scalars can be applied to matrices. . • ceil(X) returns the smallest integer not less than X. USEFUL COMMANDS octave:21> z = 5:(-1):1 z = 5 4 3 2 1 octave:22> z = 5:(-2):1 z = 5 3 1 octave:23> z = 2:3:11 z = 2 5 8 11 octave:24> z = 2:3:10 z = 2 5 8 17 Matrices and vectors can be constructed “blockwise. • abs(X) returns |X| . arctangent.2. • sin(X). • floor(X) returns the largest integer not greater than X. it computes the floor element-wise. returns the sine.y] m = 2 5 8 2 7 9 octave:4> k = [(3:4)’. tan(X). • exp(X) returns eX . sqrt(X). Thus octave:2> y=[2 7 9] y = 2 7 9 octave:3> m = [z. atan(X).” Blocks in the same row are separated by a comma. cosine.2 Useful Commands Here’s a none too complete listing of useful commands in octave: • help is the most useful command. square root of X. If X is a vector or matrix. computed elementwise.2. elementwise. computed element-wise. tangent. cos(X). with the result computed elementwise. those in the same column by a semicolon. elementwise.

18

CHAPTER 2. A “CRASH” COURSE IN OCTAVE/MATLAB • norm(X) returns the norm of X; if X is a vector, this is the L 2 norm:
1/2

X

2

=
i

X2 i

,

• • • • •

• •

if X is a matrix, it is the matrix norm subordinate to the L 2 norm. You can compute other norms with norm(X,p) where p is a number, to get the L p norm, or with p one of Inf, -Inf, etc. zeros(m,n) returns an m × n matrix of all zeros. eye(m) returns the m × m identity matrix. [m,n] = size(A) returns the number of rows, columns of A. Similarly the functions rows(A) and columns(A) return the number of rows and columns, respectively. length(v) returns the length of vector v, or the larger dimension if v is a matrix. find(M) returns the indices of the nonzero elements of N. This may not seem helpful at first, but it can be very useful for selecting subsets of data because the indices can be used for selection. Thus, for example, in this code octave:1> v = round(20*randn(400,3)); octave:2> selectv = v(find(v(:,2) == 7),:) we have selected the rows of v where the element in the second column equals 7. Now you see why leading computer scientists refer to octave/Matlab as “semantically suspect.” It is a very useful language nonetheless, and you should try to learn its quirks rather than resist them. diag(v) returns the diagonal matrix with vector v as diagonal. diag(M) returns as a vector, the diagonal of matrix v. Thus diag(diag(v)) is v for vector v, but diag(diag(M)) is the diagonal part of matrix M. toeplitz(v) returns the Toeplitz matrix associated with vector v. That is   v(1) v(2) v(3) ··· v(n)  v(2) v(1) v(2) · · · v(n − 1)     v(3) v(2) v(1) · · · v(n − 2)  toeplitz(v) =     . . . . .. . . .   . . . . . . v(n) v(n − 1) v(n − 2) · · · v(1) In the more general form, toeplitz(c,r) can be used to return a assymmetric Toeplitz matrix. A matrix which is banded on the cross diagonals is evidently called a “Hankel” matrix:   u(1) u(2) u(3) · · · u(n)  u(2) u(3) u(4) · · · v(2)      hankel(u, v) =  u(3) u(4) u(5) · · · v(3)   . .  . . .  . .  . . . . . u(n) v(2) v(3) · · · v(n)

• eig(M) returns the eigenvalues of M. [V, LAMBDA] = eig(M) returns the eigenvectors, and eigenvalues of M. • kron(M,N) returns the Kronecker product of the two matrices. This is a blcok construction which returns a matrix where each block is an element of M as a scalar multiplied by the whole matrix N. • flipud(N) flips the vector or matrix N so that its first row is last and vice versa. Similarly fliplr(N) flips left/right.

2.3. PROGRAMMING AND CONTROL

19

2.3

Programming and Control

If you are going to do any serious programming in octave, you should keep your commands in a file. octave loads commands from ‘.m’ files. 2 If you have the following in a file called myfunc.m: function [y1,y2] = myfunc(x1,x2) % comments start with a ‘%’ % this function is useless, except as an example of functions. % input: % x1 a number % x2 another number % output: % y1 some output % y2 some output y1 = cos(x1) .* sin(x2); y2 = norm(y1); then you can call this function from octave, as follows: octave:1> myfunc(2,3) ans = -0.058727 octave:2> [a,b] = myfunc(2,3) a = -0.058727 b = 0.058727 octave:3> [a,b] = myfunc([1 2 3 4],[1 2 3 4]) a = 0.45465 b = 0.78366 Note this silly function will throw an error if x1 and x2 are not of the same size. It is recommended that you write your functions so that they can take scalar and vector input where appropriate. For example, the octave builtin sine function can take a scalar and output a scalar, or take a vector and output a vector which is, elementwise, the sine of the input. It is not too difficult to write functions this way, it often only requires judicious use of .* multiplies instead of * multiplies. For example, if the file myfunc.m were changed to read y1 = cos(x1) * sin(x2); it could easily crash if x1 and x2 were vectors of the same size because matrix multiplication is not defined for an n × 1 matrix times another n × 1 matrix. An .m file does not have to contain a function, it can merely contain some octave commands. For example, putting the following into runner.m: x1 = rand(4,3); x2 = rand(size(x1)); [a,b] = myfunc(x1,x2)
2

-0.37840

-0.13971

0.49468

The ‘m’ stands for ‘octave.’

20

CHAPTER 2. A “CRASH” COURSE IN OCTAVE/MATLAB

octave allows you to call this script without arguments: octave:4> runner a = 0.245936 0.246414 0.542728 0.257607 b = 1.3135 octave has to know where your .m file is. It will look in the directory from which it was called. You can set this to something else with cd or chdir. You can also use the octave builtin function feval to evaluate a function by name. For example, the following is a different way of calling myfunc.m: octave:5> [a,b] = feval("myfunc",2,3) a = -0.058727 b = 0.058727 In this form feval seems like a way of using more keystrokes to get the same result. However, you can pass a variable function name as well: octave:6> fname = "myfunc" fname = myfunc octave:7> [a,b] = feval(fname,2,3) a = -0.058727 b = 0.058727 This allows you to effectively pass functions to other functions. 0.478054 0.186454 0.419457 0.378558 0.535323 0.206279 0.083917 0.768188

2.3.1

Logical Forks and Control

octave has the regular assortment of ‘if-then-else’ and ‘for’ and ‘while’ loops. These take the following form: if expr1 statements elseif expr2 statements elsif expr3 statements ... else statements end for var=vector statements end while expr

semilogy. octave ships its plotting commands to Gnuplot. ==. but the commands for Matlab are not too different. The test expressions may use the logical conditionals: >.* ((1:n) . Use the help command to get the specific syntax for each command. end %an ‘irregular’ for loop for i=[1 2 3 5 8 13 21 34] fsm = fsm + i./ n). etc. It stands for the last index of a vector of matrix..4 Plotting Plotting is one area in which there are some noticeable differences between octave and Matlab. endfor to end a for loop. as well as the exit point for for loops. Y = sin(X).4. end %a ‘regular’ for loop for i=1:10 sm = sm + i. We present some examples: n = 100. switches. %just plot Y plot(Y). >=. end while (sin(x) > 0) x = x * pi. and contour and mesh for 3D plots. Here are some examples: %compute the sign of x if x > 0 s = 1. PLOTTING statements end 21 Note that the word end is one of the most overloaded in octave/Matlab. Do not use the assignment operator = as a conditional. as this can lead to disastrous results. end 2.2. The main plot command is plot. elseif x == 0 s = 0. X = pi . else s = -1. <=. To simplify debugging. as in C.loglog for 2D plots with log axes. The commands and examples given herein are for octave. ~=. . etc. <. You may also use semilogx. if statements. it is also permissible to use endif to end an if statement.

1 0 0 10 20 30 40 50 60 70 80 90 100 1 line 1 0. but with the ‘right’ X axis labels plot(X.9 0.W).1. %plot W. you should note the difference between plotting a vector.3 0.2 0. but with the ‘right’ X axis labels plot(Y.8 0.4 0.5 1 line 1 (a) Y = sin(X) (b) Y versus X 1 line 1 0.Y).1 0 0 0.7 0.22 CHAPTER 2. .7 0. Some magic commands are required to plot to a file.9 0.7 0.8 0.1: Four plots from octave.2 0.9 0.1(d).9 0. In particular.7 0.6 0.8 0.5 0.6 0.5 0.3 0.5 0. as in Figure 2. I recommend the following magic formula.5 1 1.6 0.4 0.4 0.6 0.9 1 √ (c) W = Y (d) W versus Y Figure 2.5 2 2.1 0 0 10 20 30 40 50 60 70 80 90 100 0. which replots the figure to a file: %call the plot commands before this line gset term postscript color.1 0 0 0.1(c) versus plotting the same vector but with the appropriate abscissa values.3 0.2 0.4 0.7 0. The output from these commands is seen in Figure 2.8 0.8 0.5 0.2 0. as in Figure 2.5 3 3.2 0.5 0.3 0. A “CRASH” COURSE IN OCTAVE/MATLAB %plot Y. W = sqrt(Y).6 0.1 0. plot(W).3 0.4 0. 1 line 1 0. For octave.

’filename.’-deps’. PLOTTING gset output "filename. gset term x11. In Matlab.ps".4. replot.eps’).2. 23 . gset output "/dev/null". the commands are something like this: %call the plot commands before this line print(gcf.

/ 40. Pick n to be reasonably large. y = sin(x). . c) = 1 × 1015 . . Your m-file should have header line like: function [x1.x2] = naivequad(b. for real b.c) Test your code for (b. Either use a ‘for’ loop. i.c) Test your code for (b. Do you get a spurious root? (2. . x = a + (b-a) . b = 5.9) to find the roots of x 2 + bx + c = 0.24 Exercises CHAPTER 2.* (0:40) .2) Implement the na¨ quadratic formula to find the roots of x 2 + bx + c = 0.1) What do the following pieces of octave/Matlab code accomplish? (a) x = (0:40) . or choose some convergence criterion (i. . terminate if |xi+1 − xi | < 1 × 10−10 ).y).4) Write octave/Matlab code to find a fixed point for the cosine. Do you get a spurious root? (2. Your ıve code should return √ −b ± b2 − 4c .e. or write the function to recursively call itself. (b) a = 2.* (0:40) . Example Problem 1. 1. some x such that x = cos(x). then let xi+1 = cos(xi ) for i = 0./ 40.e. plot(x. n. (2.. c) = 1 × 1015 .5) Write code to implement the factorial function for integers: function [nfact] = factorial(n) where n factorial is equal to 1 · 2 · 3 · · · (n − 1) · n. A “CRASH” COURSE IN OCTAVE/MATLAB (2. c. Do this as follows: pick some initial value x 0 . (c) x = a + (b-a) .3) Implement a robust quadratic formula (cf.. ./ 40. Does your code always converge? (2. 1 . 1 .x2] = robustquad(b. 2 Your m-file should have header line like: function [x1.

Chapter 3

Solving Linear Systems
A number of problems in numerical analysis can be reduced to, or approximated by, a system of linear equations.

3.1

Gaussian Elimination with Na¨ Pivoting ıve

Our goal is the automatic solution of systems of linear equations: a11 x1 a21 x1 a31 x1 . . . + a12 x2 + a22 x2 + a32 x2 . . . + a13 x3 + a23 x3 + a33 x3 . . . + ··· + ··· + ··· .. . + a1n xn + a2n xn + a3n xn . . . = b1 = b2 = b3 . . .

an1 x1 + an2 x2 + an3 x3 + · · ·

+ ann xn = bn

In these equations, the aij and bi are given real numbers. We also write this as Ax = b, where A is a matrix, whose element in the i th row and j th column is aij , and b is a column vector, whose ith entry is bi . This gives the easier way of writing this equation:        a11 a21 a31 . . . a12 a22 a32 . . . a13 a23 a33 . . . ··· ··· ··· .. . a1n a2n a3n . . . ann        x1 x2 x3 . . . xn   b1 b2 b3 . . . bn       

an1 an2 an3 · · ·

      =    

(3.1)

3.1.1

Elementary Row Operations

You may remember that one way to solve linear equations is by applying elementary row operations to a given equation of the system. For example, if we are trying to solve the given system of equations, they should have the same solution as the following system: 25

26

CHAPTER 3. SOLVING LINEAR SYSTEMS

where κ is some given number which is not zero. It suffices to solve this system of linear equations, as it has the same solution(s) as our original system. Multiplying a row of the system by a nonzero constant is one of the elementary row operations. The second elementary row operation is to replace a row by the sum of that row and a constant times another. Thus, for example, the following system of equations has the same solution as the original system:  a11 a21 a31 . . . a12 a22 a32 . . . a13 a23 a33 . . . ··· ··· ··· .. . a1n a2n a3n . . . a(i−1)n ain + βajn . . . ann  x1 x2 x3 . . .   b1 b2 b3 . . .              

        κai1 κai2 κai3 · · ·   . . . .. . .  . . . . . an1 an2 an3 · · ·

a11 a21 a31 . . .

a12 a22 a32 . . .

a13 a23 a33 . . .

··· ··· ··· .. .

a1n a2n a3n . . . κain . . . ann

           

x1 x2 x3 . . . xi . . . xn

          =          

b1 b2 b3 . . . κbi . . . bn

           

The idea of Gaussian Elimination is to use the elementary row operations to put a system into upper triangular form then use back substitution. We’ll give an example here:

We have here switched the second and third rows. The purpose of this e.r.o. is mainly to make things look nice. Note that none of the e.r.o.’s change the structure of the solution vector x. For this reason, it is customary to drop the solution vector entirely and to write the matrix A and the vector b together in augmented form:   a11 a12 a13 · · · a1n b1  a21 a22 a23 · · · a2n b2     a31 a32 a33 · · · a3n b3      . . . . .. . . . .   . . . . . an1 an2 an3 · · · ann bn

We have replaced the ith row by the ith row plus β times the j th row. The third elementary row operation is to switch rows:      b1 x1 a11 a12 a13 · · · a1n  a31 a32 a33 · · · a3n   x2   b3        a21 a22 a23 · · · a2n   x3   b2   =    . . . .  .   .  .. . . .  .   .   . . . . . . . . an1 an2 an3 · · · ann xn bn

        a(i−1)1 a(i−1)2 a(i−1)3 ···   ai1 + βaj1 ai2 + βaj2 ai3 + βaj3 · · ·   . . . .. . . .  . . . . an1 an2 an3 ···

         x(i−1)    xi   . .  . xn

            =   b(i−1)     bi + βbj     . .   . bn

3.1. GAUSSIAN ELIMINATION WITH NA¨ IVE PIVOTING Example Problem 3.1. Solve the set of linear equations: −3x1 − 4x2 + 4x3 = −7 2x1 + 1x2 + 1x3 = 7 x1 + x 2 − x 3 = 2

27

The matrix is now in upper triangular form: there are no nonzero entries below the diagonal. This corresponds to the set of equations: x1 + x 2 − x 3 = 2 −x2 + x3 = −1 2x3 = 4

We now add −1 times the second row to the third row to get:   1 1 −1 2  0 −1 1 −1  0 0 2 4

We add 3 times the first row to the second, and −2 times the first row to the third to get:   1 1 −1 2  0 −1 1 −1  0 −1 3 3

Solution: We start by rewriting in the augmented form:   1 −1 2 1  −3 −4 4 −7  2 1 1 7

We now solve this by back substitution. Because the matrix is in upper triangular form, we can solve x3 by looking only at the last equation; namely x 3 = 2. However, once x3 is known, the second equation involves only one unknown, x 2 , and can be solved only by x2 = 3. Then the first equation has only one unknown, and is solved by x 1 = 1. All sorts of funny things can happen when you attempt Gaussian Elimination: it may turn out that your system has no solution, or has a single solution (as above), or an infinite number of solutions. We should expect that an algorithm for automatic solution of systems of equations should detect these problems.

3.1.2

Algorithm Terminology

The method outlined above is fine for solving small systems. We should like to devise an algorithm for doing the same thing which can be applied to large systems of equations. The algorithm will take the system (in augmented form):   a11 a12 a13 · · · a1n b1  a21 a22 a23 · · · a2n b2     a31 a32 a33 · · · a3n b3      . . . . .. . . .   . . . . . . an1 an2 an3 · · · ann bn

. . By pivoting on the second row.r. . . Solve the system of equations given by the augmented form: −0. . . . The quantity a11 is the multiplier for the ith row. 1 ≤ j ≤ n) ai1 a11 Effectively we are carrying out the e. . the algorithm could be written recursively. . ..   .6452 Note that the exact solution of this system is x 1 = 10.0590 0. the algorithm then generates the system:   a11 a12 a13 · · · a1n b1  0 a a2n b2  22 a23 · · ·    0 0 a33 · · · a3n b3      . Bad things can happen if akk is merely small instead of zero. . Hereafter the algorithm will not alter the first row or first column of the system. 1 ≤ j ≤ n) 3. The algorithm then pivots on the pivot element to get the system:   a11 a12 a13 · · · a1n b1  0 a a2n b2  22 a23 · · ·    0 a a3n b3  32 a33 · · ·     .830508. SOLVING LINEAR SYSTEMS The algorithm then selects the first row as the pivot equation or pivot row. .28 CHAPTER 3. 0 an2 an3 · · ·  a1j  b1  ann bn Where aij = aij − bi = bi − ai1 a11 ai1 a11 (2 ≤ i ≤ n. and the first element of the first row. however.o. . .3 Algorithm Problems The pivoting strategy we examined in this section is called ‘na¨ because a real algorithm is a bit ıve’ more complicated.0590 which will be rounded to −1. a11 is the pivot element. The algorithm we have outlined is far too rigid–it always chooses to pivot on the k th row during the k th step. .831 by the algorithm..3528 0. This would be bad if the pivot element were zero.1080 ≈ −1. na¨ ıvely.. . in this case all aik the multipliers akk are not defined.1080 −0. x2 = 1. .. pivots on the first equation.. Thus. .2372 −0. 0 0 an3 · · · ann bn In this case aij bi = aij − = bi − ai2 a22 ai2 a22  a2j  b2  (3 ≤ i ≤ n.4348 0. . . . .2. that the algorithm uses only 4 significant figures for its calculations.1. Suppose. Consider the following example: Example 3. of replacing the i th row by the ith row minus times ai1 the first row. The algorithm. . . −0.   . . The multiplier for the second row is 0.

3528 − 0.6452 − (−1.2 Pivoting Strategies for Gaussian Elimination Gaussian Elimination can fail when performed in the wrong order.6) /−0. We have lost three figures with this subtraction.0008. This puts the system in the form −0.059 = (−0.3528 − 0. We also saw that a pivot small in magnitude can cause failure.831)(0. but not the second.6460 = −0. 29 where the arithmetic is rounded to four significant figures each time. it gets the value x2 = −0. Then it solves x1 = 0.3795) /−0. the second vector entry becomes: 0.3. This is nearly correct for x2 . which is no good.2372 −0.4348 − (−1.41.6.3528) = 0. −0.3528 0 −0. then 1 is enormous compared to both 1 and 2. The algorithm now finds x1 = (−0.0008 = 1. and again there is subtractive cancelling.059 = (−0.2. If the algorithm selects a zero pivot.7323) /−0.6452 − 0. x2 satisfies the first equation. There is some serious subtractive cancellation going on here.0005.2372 · 1.4348 + 0.0008 When the algorithm attempts back substitution. The errors get worse from here. With poor rounding. 3. intermediate steps are rounded to four significant figures. This is also a bit off. again. Note that this choice of x1 . Similarly. but is an awful approximation for x1 .4343 = −0.059 = 12. Now suppose the algorithm changed the order of the equations. where each step has rounding to four significant figures. the multipliers are undefined.831)(−0.0005 This is a bit off from the actual value of 1. the algorithm solves x2 as 1. As here: x1 + x2 = 1 x1 + x 2 = 2 The na¨ algorithm solves this as ıve 2− 1 =1− 1− 1− 1 1 − x2 1 x1 = = 1− x2 = If is very small. where.2372) = −0. PIVOTING STRATEGIES FOR GAUSSIAN ELIMINATION The second entry in the matrix is replaced by −0.0005 −0.0590 0. then solved: x1 + x 2 = 2 x1 + x2 = 1 .

2. .1 Scaled Partial Pivoting The na¨ G.E. . The problem is not the small entry per se: Suppose we use an e. . . 3. then row 2 .: ıve 1 1 x1 + x2 = x1 + x 2 = 2 This is still solved as x2 = x1 = and rounding is still a problem. works well.E. . producing a bunch of zeros in the second column. . The algorithm pivots on row 2 .2 | si is maximized. n. . Then the first pivot. . . 1≤j≤n The scales are only computed once. until finally pivoting on row n−1 .1 | si is maximized. SOLVING LINEAR SYSTEMS 1−2 1− x1 = 2 − x 2 There’s no problem with rounding here. to scale the first equation. As shown above. . algorithm uses the rows 1. The algorithm pivots on row 1 . ıve this can cause errors. Note that the algorithm should not rearrange the matrix–this takes too much work. In the k th step k is chosen to be the i not among 1 . producing a bunch of zeros in the first column. The strategy of scaled partial i=1 pivoting is to compute this permutation so that G. row-wise: si = max |aij | .k | si .o. we want to pivot on an element which is not small compared to other elements in its row. then use na¨ G. k−1 such that |ai. The second pivot. n-1 in order as pivot equations.30 The algorithm solves this as x2 = CHAPTER 3. 2 . . 2. is chosen to be the i such that |ai. is chosen to be the i such that |ai. Better is to pivot first on row 1 . 2.E. . So our algorithm first determines “smallness” by calculating a scale. 2 . etc. for some permutation { i }n of the integers 1. 2− 1 1− 1 1 − x2 . but without choosing 2 = 1 .r. 1 . In light of our example.

When we do back substitution.3. and pivot:   0 −2 2 4 8  0 2 −2 1 −3     2 1 1 3 7  0 2 1 8 10 We pick are: 2.. PIVOTING STRATEGIES FOR GAUSSIAN ELIMINATION 31 is maximized. search only those indices in the tail of this vector: i. . we work in this order reversed on the rows. .E. These values are: 2 4 2 6 . These values The matrix is in permuted upper triangular form.2. Then 4 = 1.. s2 = 7. x1 . i. . We get x4 = 1. We could proceed. In this case we pick the second row. 4. If we did proceed we would have 3 = 4. and perform a swap. We’ll look at a homework question. It’s hard to come up with an ugly fractions. . producing a bunch of zeros in the k th column.  3 7 15 0 7 11   1 3 7  4 17 31 We present an example of using scaled partial example where the numbers do not come out as  2 −1  4 4   2 1 6 5 The scales are as follows: s1 = 7. The slick way to implement this is to first set i = i for i = 1. only among j for k ≤ j ≤ n.2 An Example pivoting with G. 7 7 17 We have a tie. and should be the index which maximizes |a i2 | /si . 1. . Our row permutation is 3. . It should not be 3. The algorithm pivots on row k . . so x3 = 1 (13 − 7 ∗ 1) = 2 3 1 x2 = (−3 − 1 ∗ 1 + 2 ∗ 2) = 0 2 1 x1 = (7 − 3 ∗ 1 − 1 ∗ 2 − 1 ∗ 0) = 1 2 2 2 2 . 2 = 2. s4 = 17. x2 .e. 2. j . find the j such that j should be the first pivot and switch the values of 1 . swap them. We pick 1 . 2.e. n. 3. . and no changes would occur.e.2. It should be the index which maximizes |a i1 | /si . Then rearrange this vector in a kind of “bubble sort”: when you find the index that should be 1 .. then x3 . Then at the k th step. but would get a zero multiplier. We pivot:   0 0 0 5 5  0 2 −2 1 −3     2 1 1 3 7  0 0 3 7 13 . . i. solving x 4 . s3 = 3. 7 7 3 17 We pick 1 = 3.

It turns out we can run G. SOLVING LINEAR SYSTEMS 3.  3   0 3 2  2   2   4   1 5 7 1 4 2 4 2 3 5 Our scale choices are 8 . We choose 3 2 = 4.  1  3  Now suppose we had to solve the linear system for b = −1 8 2 1 .  9    1 2   2  4  =  . 2     1 2   2  4  =  . M1 =  1 . We illustrate with an example:     1 1 2 4 1  2   4 2 1 2   =  . 3 .32 CHAPTER 3. and so swap 27 10 1 5 2. picking off the boxed numbers (your computer doesn’t really have boxed variables). In the places where there 4 3 would be zeros in the real matrix. We will illustrate them here boxed:   1 3 15 1  4 2 4 2      2  4 2 1 2     1    =  . we sweep through the first column of M 3 . 4:      M2 =      Our scale choices are 27 1 40 . 4 1 3 2 1 The scale vector is s = 4 4 3 3 . 1 1 Our scale choices are 4 . and so swap 27 10 1 5      M3 =      3 5 2 0 5 2 1 5 9 7 4   2    17  . 2 . 6 .E. 0 .  3  1  3. 2 . so. and scaling b .3 Another Example and A Real Algorithm Ax = b Sometimes we want to solve for a number of different vectors b. 4 . 2 . which can then be used multiple times on different vectors b. M0 =   3   2 1 2 3 . We scale b by the multipliers in order: 1 = 2.  1 4 4 1 2 1 4 3 5 2 0 5 2 1 3 2 7 4   2    . We choose 1 = 2. and swap 1 .2. we will put the multipliers. 4: We choose  1 4 4 1 2 1 4 3 = 1. on the matrix A alone and come up with all the multipliers.

. . . . . LU FACTORIZATION appropriately:  This continues:   −1 −3  8   8     2  ⇒  −2 1 −1  12  −5 −3  8  8      −2  ⇒  −2 −1 −1        33 This proceeds as We then perform a permuted backwards substitution on the augmented system   0 0 27 1 − 12 10 5 5  4 2 1 2 8    17  0 0 0 −2  9 3 1 −1 0 5 7 2 4 2 x4 = −6 −2 9 = 3 17 17 10 12 1 −6 x3 = − − = . actually factors A into lower and upper triangular parts.E. .. . ..3 LU Factorization Ax = b. . x2 = 5 2 17 4 −6 1 − x3 − 2x2 = .  . . to solve the system where A is a matrix: We want to show that G. . a13 a23 a33 . . that is A = LU. ..3. . . . U= 0   .  0 0 0 · · · unn n1 n2 n3 · · · 1 We call this a LU Factorization of A. where     1 0 0 ··· 0 u11 u12 u13 · · · u1n  21 1  0 u22 u23 · · · u2n  0 ··· 0         0 u33 · · · u3n  . . We examined G. a12 a22 a32 .   . . ··· ··· ··· . . L =  31 32 1 · · · 0  . ann  an1 an2 an3 · · ·    . .  .E. . . . . . . . . 8−2 x1 = 4 17  − 12 5     ⇒  82   −   3 −1 Fill in your own values here.. .  .    A=    a11 a21 a31 . 27 5 5 17 1 −6 7 2 −1 − − x3 = .   . .  . 3. . . . .3. a1n a2n a3n . . .

it is like we are solving the system M1 Ax = M1 b.E. 1. We can view this pivot as a multiplication by M 2 .r. . Thus after the first pivot. We pivot to get   2 1 1 3 7  0 2 −2 1 −3     0 2 1 8 10  0 −2 −1 4 8 1  −2 M1 =   −3 −1  0 1 0 0 0 0 1 0  0 0   0  1 Careful inspection shows that we’ve merely multiplied A and b by a lower triangular matrix M 1 : The entries in the first column are the negative e. multipliers for each row. −1.1 An Example We consider solution of the following augmented form:  2 1  4 4   6 5 2 −1 The na¨ G. 3. We pivot on the second row to get:  2  0   0 0  1 1 3 7 2 −2 1 −3   0 3 7 13  0 −3 5 5   0 0   0  1 The multipliers are 1. reduces this to ıve  2  0   0 0  1 1 3 7 2 −2 1 −3   0 3 7 13  0 0 12 18  1 3 7 0 7 11   4 17 31  0 7 15 (3. Since this is the na¨ ıve ıve version. we first pivot on the first row. with 1 0 0  0 1 0 M2 =   0 −1 1 0 1 0 We are now solving M2 M1 Ax = M2 M1 b. SOLVING LINEAR SYSTEMS 3..o. and see how it is a LU Factorization.3. Our multipliers are 2.2) We are going to run the na¨ G.34 CHAPTER 3.E.

that is. A = M1 −1 M2 −1 M3 −1 U.3.3. LU FACTORIZATION We pivot on the third row.  2  0 U=  0 0 Then we have if we let  1 1 3 2 −2 1   0 3 7  0 0 12 M3 M2 M1 A = U. =  6 5 4 17  2 −1 0 7 of ones on the diagonal and multipliers  1 1 3 2 −2 1   0 3 7  0 0 12 . Now we check to see if we made any mistakes:   2 1 0 0 0  0  2 1 0 0  LU =   3 1 1 0  0 0 1 −1 −1 1   2 1 1 3  4 4 0 7   = A. We write them here:     1 0 0 0 1 0 0 0 1 0 0 0  2 1 0 0  0 0 0  1 0 0  0 1    L=  0 0  3 0 1 0  0 1 0  1 1 0 0 0 −1 1 0 −1 0 1 1 0 0 1   1 0 0 0  2 1 0 0   =  3 1 1 0  1 −1 −1 1 This is really crazy: the matrix L looks to be composed under the diagonal. and has ones on the diagonal. A = (M3 M2 M1 )−1 U. But we have an upper triangular form. It turns out that the inverse of each Mi matrix has a nice form (See Exercise (3.6)). Thus we get   2 1 1 3 7  0 2 −2 1 −3     0 0 3 7 13  0 0 0 12 18 1  0 M3 =   0 0  0 1 0 0 0 0 1 1  0 0   0  1 35 We have multiplied by M3 : We are now solving M3 M2 M1 Ax = M3 M2 M1 b. We are hoping that L is indeed lower triangular. with a multiplier of −1. A = LU.

we first solve Lz = b.E. we can solve for z with a forward substitution.36 CHAPTER 3. then solve Ux = z. Since L is lower triangular.2 Using LU Factorizations We see that the G. We now examine how we can use the LU factorization to solve the equation Ax = b.3. Similarly. algorithm can be used to actually calculate the LU factorization. We drag out the previous example (which we never got around to solving):   2 1 1 3 7  4 4 0 7 11     6 5 4 17 31  2 −1 0 7 15 We had found the LU factorization of A as  So we solve 1 0 0  2 1 0 A=  3 1 1 1 −1 −1  1 0 0  2 1 0   3 1 1 1 −1 −1  0 2 1 1 3 0   0 2 −2 1  3 7 0  0 0 0 0 1 0 12   7 0  11 0  z =   31 0  15 1          We get Now we solve  7  −3   z=  13  18  2  0   0 0    7 1 1 3   2 −2 1   x =  −3   13  0 3 7  18 0 0 12  z=   37 24 −17 12 5 6 3 2 We get the ugly solution     . Since we have A = LU. We will look at this in more detail in another example. SOLVING LINEAR SYSTEMS 3. we can solve for x with a back substitution. since U is upper triangular.

then run n solves using forward and backward substitutions.1 An Operation Count for Gaussian Elimination We consider the number of floating point operations (“flops”) required to solve the system Ax = b. where L is the lower triangular part of the output matrix. First we look at how many floating point operations are required to reduce   a11 a12 a13 · · · a1n b1  a21 a22 a23 · · · a2n b2     a31 a32 a33 · · · a3n b3     . and U is the upper triangular part. . once. and na¨ Gaussian Elimination does not encounter a zero ıve pivot.4. Obviously we are only going to run G. This theorem is proved in Cheney & Kincaid [7]. . If A is an n × n matrix.  . Then back substitution is used to solve for x. 3.3.4. This theorem relies on us using the fancy version of G. . . L is the lower triangular part but with ones put on the diagonal. If correctly implemented. Gaussian Elimnation first uses row operations to transform the problem into an equivalent problem of the form Ux = b . Thus we can find the inverse of A by running n linear solves.3 Some Theory We aren’t doing much proving here. Recall we are trying to solve We examine the computational cost of Gaussian Elimination to motivate the search for an alternative algorithm. where U is upper triangular.E.. . The following theorem has an ugly proof in the Cheney & Kincaid [7]. 3. then the j th column of the inverse A−1 solves the equation Ax = ej . . which saves the multipliers in the spots where there should be zeros. but with a one in the j th position..3. . then the algorithm generates a LU factorization of A.3.  .4 Iterative Solutions Ax = b. then. .3. ITERATIVE SOLUTIONS 37 3. Since AA−1 = I. Theorem 3.  . This appears to me to be a case of something which can be better illustrated with an example or two and some informal investigation.E. to put the matrix in LU form. 3. . where ej is the column matrix of all zeros.4 Computing Inverses We consider finding the inverse of A. . an1 an2 an3 · · · ann bn . The proof is an unillustrating index-chase–read it at your own risk.

.  0 ···   0 ··· 0 ··· a1. Similarly.  . CHAPTER 3. . b1 . j=1 Recalling that n j=1 1 j = (n)(n + 1)(2n + 1). which is an (n − 1) by (n − 1) system. . . SOLVING LINEAR SYSTEMS ··· ··· ··· .        an−2. a1n a2n a3n . .  . a13 a23 a33 . In total this is 2n 2 − n − 1 floating point operations to do a single pivot on the n by n system.n xn ] . . . and requires 3 flops. Thus in total back substitution requires Bn total floating point operations with n Bn = j=1 2j − 1 = n(n − 1) − n = n(n − 2) ≈ n2 . b1 b2 b3        an2 an3 · · · ann bn First a multiplier is computed for each row. we use In total floating point operations. . 6 2 n and j=1 1 j = (n)(n + 1). a1n .n−2 an−2. . This requires 2(n − 1)2 − (n − 1) − 1 operations. . Then this has to be done on the next subsystem. .. .n−1 an−1. 0 a12 a22 a32 .n bn−2 0 an−1.n−1 . Then this has to be done recursively on the lower right subsystem.n−2 .. . . requiring 2(n − 2)2 − (n − 2) − 1 operations. . a1. 6 3 Now consider the costs of back substitution. Then to solve for x n−1 we compute xn−1 = 1 an−1.n bn−1 0 0 ann bn for xn requires only a single division. solving for x n−2 requires 5 flops.n−1 [bn−1 − an−1. To solve  a11 · · ·  . and so on. Then in each row the algorithm performs n multiplies and n adds. .n−1 an−2. with n n n In = 2 j=1 j − 2 j=1 j− 1. In total. . 2 We find that 1 2 In = (4n − 1)n(n + 1) − n ≈ n3 .38 to        a11 0 0 . . This gives a total of (n − 1) + (n − 1)n multiplies (counting in the computing of the multiplier in each of the (n − 1) rows) and (n − 1)n adds. then. .

We now translate this scalar result into the vector case. . Note that this update relies on vector additions and possibly by premultiplication of a vector by A or Q. + r k ωb = ωb + r 1 + r + r 2 + .3) for some matrix Q.4.4. The algorithm proceeds as follows: first fix some initial estimate of the solution. for scalars A. and r = 1 − ωA. we would like the complexity of our computations to scale with the sparsity of A. and some scaling factor ω. First let x (0) = ωb. You should now convince yourself that because r n → 0. then let x(k) = ωb + rx(k−1) . if A is sparse (has few nonzero elements per row). this may take too then about n long to be practical. In the case where these two matrices are sparse. . . x (0) . This gives the approximate solution to our one dimensional problem as x ≈ 1 + r + r 2 + r 3 + . We solve this by x= 1 1 1 1 b= ωb = ωb = ωb. .e. This can be done so long as A = 0. . when A is n × n. Then calculate successive approximations to the actual solution by updates of the form x(k) = ωb + (I − ωA) x(k−1) . the terms r n converge to zero as n → ∞. Now use the geometric expansion: 1 = 1 + r + r2 + r3 + . . that under any choice of initial iterate convergence is guaranteed. one in which successive iterates are defined implicitly. First we consider the simplest case. Thus we look for an alternative algorithm.. A good choice might be ωb. 1−r Because of the assumption |r| < 1. When n is large. b. ITERATIVE SOLUTIONS 39 3. that the choice of the initial iterate x (0) was immaterial. + r k−1 ωb This suggests an iterative approach to solving Ax = b. Suppose we are to solve the equation Ax = b. (3. + r k ωb = ωb + r + r 2 + r 3 + . 2 operations to solve the system. i. .2 Dividing by Multiplying 2 We saw that Gaussian Elimination requires around 3 n3 operations just to find the LU factorization. but this is not necessary. A ωA 1 − (1 − ωA) 1−r where ω = 0 is some real number chosen to “weight” the problem appropriately. It turns out that we can consider a slightly more general form of the algorithm. Additionally. . . That is we consider iterates of the form Qx(k+1) = (Q − ωA) x(k) + ωb.3. n = 1. which would have been a problem anyway. The iterates x(k) will converge to the solution of Ax = b if |r| < 1. Now suppose that ω is chosen such that |r| < 1. such an update can be relatively cheap.

40 CHAPTER 3. (Q − A) =  −2 −3 −1 −2 −5  . but there are two considerations we should keep in mind: 1. Letting ω = 1. A =  2 4 0 .” at the other is the Richardsons. x(k) converges to some vector x∗ . SOLVING LINEAR SYSTEMS Now suppose that as k → ∞. Example Problem 3. we should pick Q such that the equation Qx(k+1) = z is easy to solve exactly.4 Richardson Iteration At the other end of the spectrum is the Richardson Iteration. However.3 Impossible Iteration I made up the term “impossible iteration. We have some freedom in choosing Q. This seems to be the best choice for satisfying the first goal. Choice of Q affects ease of computing the update. we want Q to be similar to A. 0 0 1   −5 −1 −1 0 . ωAx∗ = ωb. 1 2 6 6 Solution: We let  1 0 0 Q =  0 1 0 . which is a fixed point of the iteration. which chooses Q to be the identity matrix. This method should clearly converge in one step. Solving the system Qx(k+1) = z is trivial: we just have x(k+1) = z. 2. These two goals conflict with each other. our method becomes Ax(k+1) = (A − A) x(k) + b = b. Ax∗ = b. Use Richardson Iteration with ω = 1 on the system     12 6 1 1 b =  0 . we are considering iterative methods because we cannot easily solve this linear equation in the first place. That is. the second goal is totally ignored.4. At one end of the spectrum is the so-called “impossible iteration. Indeed.” But consider the method which takes Q to be A.4. 3. Choice of Q affects convergence and speed of convergence of the method. given z = (Q − A) x(k) + b. Then Qx∗ = (Q − ωA) x∗ + ωb. 3. Qx∗ = Qx∗ − ωAx∗ + ωb. In particular.4.

say x(0) = [2 2 2] . Example Problem 3. Note the real solution is x = [2 − 1 1] . Thus. The Richardson Iteration does not appear to converge for this example. defined as b − Ax (k) . Continuing. we see that the operation can be described by   n 1  (k+1) (k) xj = bj − aji xi  .4. ajj i=1. 3.99998 0. Use Jacobi Iteration. By considering the update elementwise. −1 0 .i=j .4. with ω = 1.6. −1 −2 0  Q−1 =  0 0 4 3  1 6 0 0 1 4 − 2 10 . unfortunately. and x(2) = [42 34 78] . This is more similar to A than the identity matrix. In fact. We get x(1) = [−2 − 10 − 10] .3. and finally x(12) = [2 − 0. We get x(1) =  0 −1 −1 (Q − A) =  −2 0 0 . we find that x(5) ≈ [1. say x(0) = [2 2 2] . to solve the system     6 1 1 12 A =  2 4 0 . x(0) = [2 2 2] . We can rethink the Richardson Iteration as x(k+1) = (I − ωA) x(k) + ωb = x(k) + ω b − Ax(k) . with less than k nonzero entries per row. but nearly as simple to invert.981] . 1 6 Thus an update takes less than 2n2 operations. 3 9 Note the real solution is x = [2 − 1 1] .99998] . ITERATIVE SOLUTIONS 41 We start with an arbitrary x(0) . Example Problem 3. the update should take less than 2nk operations. 1 We start with the same x(0) as previously. x(2) = [2 − 4/9 7/9] .019 0.5. 0 0 6 13 6 We start with an arbitrary x(0) . There is an alternative way to describe the Jacobi Iteration for ω = 1. Thus at each step we are adding some scaled version of the residual. if A is sparse. Then x(2) =  0 0 .  2 0 .5 Jacobi Iteration The Jacobi Iteration chooses Q to be the matrix consisting of the diagonal of A. 1 2 6 6 Solution: We let   6 0 0 Q =  0 4 0 . Apply Richardson Iteration with ω = 1/6 Solution: Our iteration becomes    0 −1/6 −1/6 x(k+1) =  −1/3 1/3 0  x(k) +  −1/6 −1/3 0 on the previous system. to the iterate. the choice of ω has some affect on convergence. We get x(1) = [4/3 0 0] .987 − 1. b =  0 .

n. . . This iterative method is known as “red-black Gauss Seidel. .4. We get x(1) = Already this is fairly close to the actual solution x = [2 − 1 1] . xj ← ajj i=1. say x(0) = [2 2 2] . but involves more work for inverting. Here the Q is more like A than for Jacobi Iteration. An alteration of the Gauss Seidel Iteration is to make successive “sweeps” of this redefinition. In this case solving the system Qx(k+1) = z is performed by forward substitution. Again this takes less than 2n2 operations. 1 2 6 35 18   12 b =  0 . .7. there is an easier way to describe the Gauss Seidel Iteration. n − 1. Then x(2) = 1 . element by element. SOLVING LINEAR SYSTEMS 3. in the sum on the right there are some “old” values of xi and some “new” values. Just as with Jacobi Iteration. the new values are those x i for which i < j. . Define the error vector: e(k) = x(k) − x. . 2. Example Problem 3. . including the diagonal. one for j = 1. 1 2 6 Solution: We let  6 0 0 Q =  2 4 0 . 0 0 0 4 3 2 −31 We start with an arbitrary x(0) . we set   n 1  bj − aji xi  . . then running it with Q the upper triangular part. .6 Gauss Seidel Iteration The Gauss Seidel Iteration chooses Q to be lower triangular part of A. 2. − 35 36 . . . . 1.42 CHAPTER 3. 2. However.” 3.4. This amounts to running Gauss Seidel once with Q the lower triangular part of A.i=j This looks exactly like the Jacobi update.4. Thus for j = 1. Or less than 2nk if A is sufficiently sparse.7 Error Analysis Suppose that x is the solution to equation 3. In this case we will keep a single vector x and overwrite it. 6     0 −1 −1 (Q − A) =  0 0 0 . the next for j = n. . Use Gauss Seidel Iteration to again solve for  6 1 1 A =  2 4 0 . n.

There are a number of different related results that show when various methods will work for certain choices of ω.” In fact.Q. x(k) → x) if I − ωQ−1 A This gives the theorem: Theorem 3. where the result relied on ω being chosen such that |1 − ωA| < 1.4.. We leave these to the exercises.3. (See Example Problem 1. = I − ωQ−1 A 2 43 x(k+1) = Q−1 Qx(k) − ωQ−1 Ax(k) + ωQ−1 Ax. with corresponding eigenvalue λ. . This relation may allow us to pick the optimal ω for given A.29. e(k+1) = I − ωQ−1 A e(k) . <1 Another way of saying this is “the spectral radius of I − ωQ −1 A is less than 1. To do this we recall matrix and vector norms from Subsection 1.8. An iterative solution scheme converges for any starting x (0) if and only if all eigenvalues of I − ωQ−1 A are less than 1 in absolute value. i. Reusing this relation we find that e(k) = I − ωQ−1 A e(k−1) . It can also show us that sometimes no choice of ω will give convergence of the method.4. x(k+1) − x = x(k) − x − ωQ−1 A x(k) − x .) Thus our iteration converges (e(k) goes to the zero vector.e. the speed of convergence is decided by the spectral radius of the matrix–convergence is faster for smaller values. e(k+1) = e(k) − ωQ−1 Ae(k) . ITERATIVE SOLUTIONS Now notice that x(k+1) = Q−1 (Q − ωA) x(k) + Q−1 ωb.1. e(k−2) . You should now think about how eigenvalues generalize the absolute value of a scalar. Recall our introduction to iterative methods in the scalar case. i. Let y be an eigenvector for Q−1 A. x(k+1) = x(k) − ωQ−1 A x(k) − x .e. Then I − ωQ−1 A y = y − ωQ−1 Ay = y − ωλy = (1 − ωλ) y.. if and only if I − ωQ−1 A 2 2 < 1. and how this relates to the norm of matrices. = I − ωQ−1 A k We want to ensure that e (k+1) is “smaller” than e(k) . e(k) 2 = I − ωQ−1 A k e(0) 2 ≤ I − ωQ−1 A k 2 e(0) 2 . e(0) .

With some work it can be shown that all three of these values will be less than one in absolute value if and only if 3 0<ω< ≈ 0. Thus the eigenvalues of I − ωA are approximately 1 − 7. again. and 4.5. with x the actual solution to the linear system. Using octave or Matlab you will find that the eigenvalues of A are approximately 7.7321 (See also Exercise (3.4. via some oracle.388 7. with ω = 1/6. of a sequence of weightings.4. which we use in each iterative update. 2. In the altered algorithm we presuppose the existence. Find conditions on ω which guarantee convergence of Richardson’s Iteration for finding approximate iterative solutions to the system Ax = b. . 3.” that is if f (x) is a polynomial and λ is an eigenvalue of a matrix A. 6 1 2 6 Solution: By Theorem 3. where     12 6 1 1 b =  0 .2679ω.9. 4. ω i . 1 − 4ω.8. SOLVING LINEAR SYSTEMS Example Problem 3. ω = 1 apparently did not lead to convergence. Thus our algorithm becomes: 1. e(k) = x(k) − x.) Compare this to the results of Example Problem 3. Given iterate x(k−1) . with Q the identity matrix.7321ω. Expanding e (k−1) similarly gives e(k) = (I − ωk A) (Iωk−1 A) e(k−2) k = i=1 I − ωi A e(0) .2679. it can be shown that e(k) = (I − ωk A) e(k−1) where.10).8. In this case the polynomial we consider is f (x) = x0 −ωx1 .7321. define x(k) = (I − ωk A) x(k−1) + ωk b. Select some initial iterate x(0) . convergence was observed.8 leads to an interesting possible variant of the iterative scheme. we have convergence if and only if I − ωA 2 <1 We now use the fact that “eigenvalues commute with polynomials. A =  2 4 0 .44 CHAPTER 3. Following the analysis for Theorem 3. where for this system. then f (λ) is an eigenvalue of the matrix f (A). while for Example Problem 3. For simplicity we will only consider an alteration of Richardson’s Iteration.8 A Free Lunch? The analysis leading to Theorem 3. 1 − 4.

then convergence to the exact solution could be guaranteed after only m iterations. However. to be zero. then k i=1 1 − ω i λj is an eigenvalue of B. regardless of the choice of e (0) it to somehow ensure that the matrix k B= i=1 I − ωi A has all zero eigenvalues. . This suggests how we are to pick the weightings: let them be the inverses of the eigenvalues of A. if A has a small number of distinct eigenvalues. This eigenvalue is zero if one of the ω i for 1 ≤ i ≤ k is 1/λj . As you may have guessed from the title of this subsection. We again make use of the fact that eigenvalues “commute” with polynomials to claim that if λj is an eigenvalue of A. or all its eigenvalues. some.3. in some limited situations this algorithm might be practical if the eigenvalues of A are known a priori. this is not exactly a practical algorithm. or. for a given arbitrary matrix A. The problem is that it is not simple to find. ITERATIVE SOLUTIONS 45 Remember that we want e(k) to be small in magnitude.4. better yet. say m eigenvalues. This problem is of sufficient complexity to outweigh any savings to be gotten from our “free lunch” algorithm. In fact. one. regardless of the size of the matrix. One way to guarantee that e(k) is zero.

6) Prove that the inverse of the matrix        1 a2 a3 . . set up a linear 2 × 2 system of equations to find the t. .2) Perform back substitution to solve the equation  1  0   0 0 3 2 0 0 5 4 2 0   3  3  x =    1 2  −1 −1   1  2 ıve (3. 0 0 ··· 0 0 0 .6 −0.0590 0. .3) Perform Na¨ Gaussian Elimination to prove Cramer’s rule for the 2D case.48 1 1. That is. how would you detect whether they are parallel? How is this related to the form of the solution to a system of two linear equations? (See Exercise (3. If you were going to write a program to detect the intersection of two lines.088 2 −0.744 2.2372 −0.088 −0. SOLVING LINEAR SYSTEMS (3. Your code should find the x such that Ax = b. usingna¨ Gaussian Elimination ıve     −3 6 0 8 24 16 3 −1 −2 (a)  9 −1 −4  (b)  1 12 11  (c)  1 −2 0  −4 5 −8 4 13 19 −6 10 13 (3. .5) Given two lines parametrized by f (t) = at + b.. .6452 (3.744 2. .b) where A is a 2 × 2 matrix.4) Implement Cramer’s rule to solve a pair of linear equations in 2 variables.1080 −0. s at the point of intersection of the two lines.3)) Test your code on the following (augmented) systems: 1 1 3 −2 1. and g(s) = cs + d. . . .24 −3. (See Exercise (3. and b and x are 2 × 1 vectors. .46 Exercises CHAPTER 3.3528 (d) 0. Your m-file should have header line like: function x = cramer2(A. prove that the solution to a b x f = c d y g is given by det y= det a f c g a b c d det and x= det f g b d a b c d (3.1) Find the LU decomposition of the followingmatrices. . 1        . an 0 0 ··· 1 0 ··· 0 1 ··· .4348 0.48 (a) (b) (c) 4 −3 −1 −0.3)) (3.24 −3.

1        (Hint: Multiply them together. Consider Richardson’s Iteration x(k+1) = (I − ωA) x(k) + ωb. (b) Prove that max {|λ| : 1 − ωβ ≤ λ ≤ 1 − ωα} is minimized when we choose ω such that 1 − ωβ = − (1 − ωα) .. |α + β| . 0 0 ··· 0 0 0 . y (k) converges to the (normalized) eigenvector associated with the largest eigenvalue of A. . ITERATIVE SOLUTIONS is the matrix 47        1 −a2 −a3 . with 0 < α ≤ β.5     2 3 4 −3 5  10 100 1 0. which row of the following matrix will be the first pivot row?   10 17 −10 0. . .4.8) Let A be a symmetric positive definite n × n matrix with n distinct eigenvalues. −an 0 0 ··· 1 0 ··· 0 1 ··· .9) Consider the equation     1 3 5 −5  −2 2 4  x =  −6  4 −3 −4 10 Letting x(0) = [1 1 0] . (3. 1 − ωα].3 0.1 0. find the iterate x(1) by one step of Richardson’s Method. (Hint: It may help to look at the graph of something versus ω.) (c) Show that this relationship is satisfied by ω = 2/ (α + β). Recall that e(k+1) = (I − ωA) e(k) . . (a) Show that the eigenvalues of I − ωA are in the interval [1 − ωβ.3 −4     0. And by one step of Jacobi Iteration. Letting y (0) = b/ b 2 .9  −3 3 −3 0.1 0. (3. .) (3. . and α + β = 0.1 0 (3. And by Gauss Seidel. β]. Ay (k) 2 (a) What is y (k) 2 ? (b) Show that y (k) = Ak b/ Ak b 2 .3.01 −1 0. . (d) For this choice of ω show that the spectral radius of I − ωA is |α − β| . consider the iteration y (k+1) = Ay (k) . (c) Show that as k → ∞. . .10) Let A be a symmetric n × n matrix with eigenvalues in the interval [α. . .7) Under the strategy of scaled partial pivoting.

p(x) of degree k with p(0) = 1. . (c) Argue that r (k+1) = min p(A)r (0) . SOLVING LINEAR SYSTEMS (e) Show that when 0 < α. 1020]? Why? (3. .k) Your code should return x(k) based on the iteration x(j+1) = x(j) − ω Ax(j) − b . (a) Show that if x ∈ x(0) + Dk . (b) Prove that. .12) Let A be a nonsingular n × n matrix. Consider the following iterative method: Let x (k+1) be the x that solves x∈x(0) +Dk min b − Ax 2 . Ar (0) . Let w take the place of ω. this quantity is always smaller than 1. b for which you know the actual solution to the problem.x0.  . . Test your code for A. Try different values of ω. . Your m-file should have header line like: function xk = richardsons(A. 20] or A2 with eigenvalues in [1010. . .11) Implement Richardson’s Iteration to solve the system Ax = b. Let r (k) = b − Ax(k) .. (Hint: Start with A and the solution x and generate b.) . and let x0 be the initial iterate x (0) .w. 0 0 0 ··· −2 These can be generated by the octave command A = toeplitz([-2 1 0 0 0 0 0 0]). . .  . including ω = 1. . (3. .  .48 CHAPTER 3. Use this polynomial to show that r (n) 2 ≤ 0. . (f) Prove that if A is positive definite. and let Pk be the set of polynomials. . 2 p∈Pk 2 (d) Prove that this iteration converges in at most n steps.) Test your code on the following matrices: • Let A be the Hilbert Matrix . (g) For which matrix do you expect faster convergence of Richardson’s Iteration: A 1 with eigenvalues in [10. then there is an ω such that Richardson’s Iteration with this ω will converge for any choice of x (0) . We wish to solve Ax = b. or whatever (smallish) integer. Let x (0) be some starting vector. (Hint: Argue for the existence of a polynomial in Pn that vanishes at all the eigenvalues of A. This is generated by the octave command A = hilb(10). including ω = −1/2. • Let A be a Toeplitz matrix of the form:   −2 1 0 ··· 0  1 −2 1 · · · 0     1 −2 · · · 0  A= 0   . Ak r (0) . such that b − Ax = p(A)r (0) . Try different values of ω. conversely.b. . let Dk be span r (0) . for any p ∈ P k there is some x ∈ x(0) + Dk . then b − Ax = p(A)r (0) for some p ∈ Pk .

if 0 ∈ [a. 2.2. b such that f (a). from calculus class. f (b) there is no c ∈ [a. The function f (x) = x is not continuous at 0.. The user might give f. If. The simplest method is that of bisection. the IVT tells us that if f (x) is continuous and we know a. In particular.e. and might have a root in the interval [a. b]. There are a number of easily conceivable problems: 1. 4. Thus if 0 ∈ [a. f (c) have different signs. f (a).1 (Intermediate Value Theorem). If f (x) is continuous on [a. If this does not hold then one and only one of the two following options holds: 1. f (b). The IVT is best illustrated graphically. Note that continuity is really a requirement here–a single point of discontinuity could ruin your whole day. then there is some root in [a. depending on which of these two options hold. A decent estimate of the root is c = a+b . We 2 should perform a “sanity check” on the input to make sure f (a). taking c = a+b . f (b) have different signs. 49 . c] or [c. In this case the function f might be legitimately continuous. 2 We can check whether f (c) = 0. respectively. i. a. b] it happens to be the case that for every y between f (a). f (a). 1 Example 4. it is impossible for a computer to test whether a given black box function is continuous. f (b) have different sign. b] such that f (c) = y. b]. b such that f (a). In particular. the algorithm would be at an impasse. We now choose to recursively apply bisection to either [a. Thus malicious or incompetent users could cause a na¨ ıvely implemented bisection algorithm to fail. The following theorem.1 Modifications Unfortunately. as the following example illustrates. insures the success of the method Theorem 4. b] . some c such that f (c) = 0. b] then for any y such that y is between f (a) and f (b) there is a c ∈ [a.1. we cannot apply the IVT. b]. f (c).Chapter 4 Finding Roots 4. b] such that f (c) = y.1 Bisection We are now concerned with the problem of finding a zero for the function f (x). f (b) have the same sign. f (c) all have the same sign. f (b) have different signs.

0. A well implemented version of bisection should check the length of its input interval.) is called that |b − a| is half the length of the interval in the previous call. b0 ]. The pseudocode bisection algorithm is given as Algorithm 1. that |f (a)| .. b]. where c is some root of f . 3. compared to machine precision. zero. guesses x2 . x0 . Recall that we think of computers as storing numbers in the form ±r × 10 k . the algorithm run bisection will return c such that |c − c | ≤ |b−a| 2n . x3 . . and that f (a)f (b) might be too small to be representable in the computer. Another common error occurs in the testing of the signs of f (a). We are claiming that bn − a n = = bn−1 − an−1 2 b0 − a 0 2n Theorem 4.3 (Bisection Method Theorem). Note that one of a1 . The user might give f such that f has no root c that is representable in the computer’s memory. b. b1 will be c0 . a. or positive. respectively. due to rounding. That is. The user might give f. be the same as one of the endpoints. one iteration of the algorithm produces a number x 1 . FINDING ROOTS 2. this might lead to an infinite search on smaller and smaller intervals about some discontinuity of f . In this case. given a finite number of bits to represent a number. resulting in an infinite recursion. 1 depending on whether x is negative. 4. we know that f (x + h) ≈ f (x) + f (x)h. Recalling Taylor’s Theorem. A wiser choice is if (sign(f(a)) * sign(f(b)) > 0) then . . If f (x) is a continuous function on [a.50 CHAPTER 4.. the behaviour of the algorithm may be like that of the previous case..2 Convergence We can see that each time recursive bisection (f. a. |f (b)| might be very small.2 Newton’s Method Newton’s method is an iterative method for root finding. It may legitimately be the case that none of them is a root to f . moreover has no root in the interval [a. For a poorly implemented algorithm. where the function sign (x) returns −1. Formally call the first interval [a 0 . and unpredictable behaviour would ensue. f (b). A slick programmer might try to implement this test in the pseudocode: if (f(a)f(b) > 0) then . b such that f is not continuous on [a. In fact. . xn follow identically. . b]. and the other will be either a0 or b0 . in which case the midpoint of the interval will. the algorithm might descend to intervals as small as machine precision. . Note however. Newton’s method uses “linearization” to find an approximate root. 4. only a finite number of such numbers can be represented. and the first midpoint c0 . and give up if the length is too small. Let the second interval be [a1 . b1 ]. this calculation would be rounded to zero.1. etc. b] such that f (a)f (b) < 0. then after n steps.. starting from some guess at the root. . which is supposed to be closer to a root. . .

4. f (xk ) (4. Moreover. NEWTON’S METHOD Algorithm 1: Algorithm for finding root by bisection. This may preclude Newton’s method from being used when f (x) is a black box.1.2. along with the linearization of f (x) at xk . c ← a+c . Input: a function. f a ← f c. δ.2. two endpoints. takes some guess x k and returns xk+1 defined by xk+1 = xk − f (xk ) . 2 (19) return c 51 This approximation is better when f (·) is “well-behaved” between x and x + h. (3) if f af b > 0 (4) throw an error. test if f (a0 )f (b0 ) < 0. the derivative of the function must be known. run bisection(f. f (x) An iteration of Newton’s method. (5) else if f af b = 0 (6) if f a = 0 (7) return a (8) else (9) return b (10) Let c ← a+b 2 (11) while b − a > 2δ (12) Let f c ← f (c).1) An iteration of Newton’s method is shown in Figure 4.. This was also the case with bisection. the success of Newton’s method is dependent on the initial guess x0 . This is easily solved as h= −f (x) .1 Implementation Use of Newton’s method requires that the function f (x) be differentiable. then. ) (1) Let f a ← sign (f (a)) . c ← c+b . a. Moreover. our algorithm cannot explicitly check for continuity of f (x). and a y-tolerance Output: a c such that |f (c)| is smaller than the y-tolerance. f b ← f c. (13) if |f c| < (14) return c (15) if sign (f a) sign (f c) < 0 (16) Let b ← c.e. 4. As is the case for the bisection method. . b. but for bisection there was an easy test of the initial interval–i. a x-tolerance. Newton’s method attempts to find some h such that 0 = f (x + h) = f (x) + f (x)h. (2) Let f b ← sign (f (b)) . 2 (17) else (18) Let a ← c.

e. tol) (1) Let x ← x0. f . It is clear that xk+1 is a root of the linearization.Sfrag replacements 52 CHAPTER 4.. i.” (9) return x. giving up. and a tolerance Output: a point for which the function has small value. Note that if it were the case that f (xk ) = 0 then h would be ill defined. (11) Let n ← n + 1. It happens to be the case that |f (xk+1 )| is smaller than |f (xk )| . its derivative. Algorithm 2: Algorithm for finding root by Newton’s Method. xk+1 is a better guess than xk . an iteration limit. Our algorithm will test for goodness of the estimate by looking at |f (x k )| . (7) if |f px| < tol (8) Warn “f (x) is small.1: One iteration of Newton’s method is shown for a quadratic function f (x). run newton(f. (10) Let x ← x − f x/f px. . FINDING ROOTS f (xk−1 ) f (xk ) xk−1 xk Figure 4. (4) if |f x| < tol (5) return x. The algorithm will also test for near-zero derivative. x0. an initial guess. Input: a function. (2) while n ≤ N (3) Let f x ← f (x). N. The linearization of f (x) at xk is shown. (6) Let f px ← f (x). n ← 0.

Consider Newton’s method applied to the function f (x) = sin (x) for the initial estimate x0 = 0. Now note that xk+1 = xk − xj k j−1 jxk = 1− 1 j xk .2. However. it were the case that C = 1. If x > e1 .000001.6.2. for example.3 Convergence When Newton’s Method converges. We illustrate them here. if e0 were 0. we know f (x) > 0. where r is the root that the xk are converging to. Of course. That is.5. Thus taking xk+1 = xk − f (xk ) > xk . Consider Newton’s method applied to the function f (x) = ln x . 4. You should verify that there are an infinite number of such x 0 .2. NEWTON’S METHOD 53 4. Note that f (x) is continuous on R+ . f (xk ) The estimates will “run away” from the root x = 1. Example 4. there is no well-defined xk+1 . we find that x k is converging to the root. f (x0 ) cos(x0 ) Thus Newton’s method “cycles” between the two values x 0 . f (x1 ) cos(−x0 ) cos(x0 ) f (x0 ) sin(x0 ) = x0 − = x0 − tan x0 = x0 − 2x0 = −x0 .2 Problems As mentioned above. where x0 has the odious property 2x0 = tan x0 . However.4. and the initial estimate x 0 .4. Newton’s method may find some iterate x k for which f (xk ) = 0. that |ek+1 | ≤ C |ek |2 . for x > 1. Consider Newton’s method applied to the function f (x) = x j with j > 1. If. Example 4. which is a larger number (and thus worse decrease) when j is larger. convergence is dependent on f (x). it actually displays quadratic convergence. it is converging at a rate slower than we expect from Newton’s method: at each step we have a constant decrease of 1 − (1/j) . −x0 . Example 4. then 1 − ln x < 0. Note that f (x) = 0 has the single root x = 0. in which case. The following theorem formalizes our claim: . That is. Our initial guess is not too far from this root. Consider the identity of x1 : x1 = x 0 − Now consider x2 : x2 = x 1 − f (x1 ) sin(−x0 ) sin(x0 ) = −x0 − − x0 + = −x0 + tan x0 = −x0 + 2x0 = x0 . A number of conceivable problems might come up. and so f (x) < 0. if e k = xk −r. then we would double the accuracy of our root estimate with each iterate. and with initial estimate x0 = 0. consider the derivative: f (x) = 1 x x − ln x 1 − ln x = 2 x x2 Since the equation has the single root x = 0. we would expect e1 to be on the order of 0. However. with initial estimate x x0 = 3.001. It has a single root at x = 1.

One way of solving this problem is to have your subroutine use Newton’s Method to solve some equation equivalent to g(z) − x = 0. see Cheney & Kincaid for the proof [7]. then Newton’s method converges to the unique root of f (x) = 0 for any choice of x 0 ∈ [a. and the definition of Newton’s method to show that ek+1 = −f (ξk )e2 k . Suppose you had a computer which could perform addition. Note that it is assumed the subroutine cannot evaluate g(z) directly. The theorem never guarantees that some x k will fall into the “attractor” region of a root r of the function. 4. students make the mistake of using Newton’s Method to try to solve a linear equation. The key to the proof is using Taylor’s theorem. f (x) = 0 on [a. so this equation needs to be modified to satisfy the computer’s constraints. Newton’s method will converge quadratically to r.2. If f (x) has two continuous derivatives. The theorem that follows gives sufficient conditions for convergence to a particular root.54 CHAPTER 4. then xk+1 is in the region. 3.5 and Example 4.4 Using Newton’s Method Newton’s Method can be used to program more complex functions using only simple functions. The proof requires that r be a simple root. and is omitted. In particular the region is chosen to be so small that if x k is in the region. subtraction. where z is the input to the subroutine. then there is some D such that if |x 0 − r| < D.7 (Newton’s Method Convergence). 3. When this does not hold we may get only linear convergence. The proof is based on arguments from real analysis. that is that f (r) = 0. as in Example 4. 2. as in Example 4. multiplication. 4.6. Take note of the following. Both |f (a)| ≤ (b − a) |f (a)| and |f (b)| ≤ (b − a) |f (b)| hold. and 1. The proof then proceeds by showing that there is some region about r such that (a) in this region |f (x)| is not too large and |f (x)| is not too small. b] . and (b) if xk is in the region. though: 1. Theorem 4. b]. Quite often when dealing with problems of this type. This should be an indication that a mistake was made. division. and it was your task to write a subroutine to compute some complex function g(·). If f (x) has two continuous derivatives on [a. FINDING ROOTS Theorem 4. b]. b]. .8 (Newton’s Method Convergence II [2]). 2. f (a)f (b) < 0. since Newton’s Method can solve a linear equation in a single step: Example 4. The iteration is given as axk + b .4.9. 2f (xk ) where ξk is between xk and r = xk + ek . and storing and retrieving numbers. Consider Newton’s method applied to the function f (x) = ax + b. then the factor 2 ek will outweigh the factor |f (ξk )| /2 |f (xk )| . xk+1 ← xk − a This can be rewritten simply as xk+1 ← −b/a. and r is a simple root of f (x). f (x) does not change sign on [a. You can roughly think of this region as an “attractor” region for the root.

e. n ← 0. 4. Example Problem 4. this will not work. (2) Let x ← sign (z) . The final subroutine is given in Algorithm 4. this equation is linear in x.3 Secant Method The secant method for root finding is roughly based on Newton’s method. However. 2 2xk z − x2 k . Since the subroutine can only use subtraction and multiplication. Devise a subroutine using only simple operations which computes the square root of an input number z. the slope of the tangent . multiplication and division. Example Problem 4. Solution: We are tempted to use the linear function f (x) = (1/z) − x. The Newton step becomes xk+1 ← xk − after some simplification this becomes xk+1 ← xk z + . The Newton step is xk+1 ← xk − z − (1/xk ) 2 = xk − zxk + xk = xk (2 − zxk ) . Instead apply Newton’s Method to f (x) = z − (1/x) . SECANT METHOD 55 The following example problems should illustrate this process of “bootstrapping” via Newton’s Method. it is not assumed that the derivative of the function is known.4. however. x k .10. −2xk Note this relies only on addition. But this is a linear equation for which Newton’s Method would reduce to x k+1 ← (1/z) . Instead let f (x) = z − x2 . then √ x = z.11. Input: a number Output: its multiplicative inverse invs(z) (1) if z = 0 throw an error. can find (1/z) .3. (5) Let n ← n + 1. √ Solution: The temptation is to find a zero for f (x) = z − x. i. Devise a subroutine using only subtraction and multiplication that can find the multiplicative inverse of an input number z. You can easily see that if x is a positive root of f (x) = 0. 1/x2 k Note this step uses only multiplication and subtraction. The subroutine is given in Algorithm 3.. More specifically. Algorithm 3: Algorithm for finding a multiplicative inverse using simple operations. (3) while n ≤ 50 (4) Let x ← x (2 − zx) . rather the derivative is approximated by the value of the function at some of the iterates.

56 CHAPTER 4. (5) Let n ← n + 1. f (xk−1 )) and (xk . f (xk )) . i. xk+1 is a better guess than xk . which is f (xk ) − f (xk−1 ) xk − xk−1 Thus the iterate xk+1 is the root of this secant line. which is f (xk ) is approximated by the slope of the secant line passing through (xk−1 . f (xk )) is shown. It happens to be the case that |f (xk+1 )| is smaller than |f (xk )| . xk − xk−1 . The secant line through (xk−1 .. n ← 0. PSfrag replacements f (xk+1 ) f (xk−1 ) f (xk ) xk xk+1 xk−1 Figure 4.2: One iteration of the Secant method is shown for some quadratic function f (x). That is. f (xk )) . Input: a number Output: its square root sqrt(z) (1) if z < 0 throw an error. it is a root to the equation f (xk ) − f (xk−1 ) (x − xk ) = y − f (xk ). (3) while n ≤ 50 (4) Let x ← (x + z/x) /2. f (xk−1 )) and (xk . (2) Let x ← 1. line at (xk .e. FINDING ROOTS Algorithm 4: Algorithm for finding a square root using simple operations.

the iterates diverge towards infinity. f (xk ) − f (xk−1 ) xk − xk−1 f (xk ).0000000008072 0.3. We illustrate a few possible problems: Example 4. and the initial estimates x 0 .5 0.2. looking for a nonexistant root.43849426498855 × 10 −15 4. We give the iterates here: k 0 1 2 3 4 5 6 7 8 9 10 xk 2 0.4. x1 = 1 . with x0 = 2.177482458876898 0.22880033820638 × 10 −09 −7. Example 4. We .868254072087394 0. The secant method can suffer from some of the same problems that Newton’s method does.13. SECANT METHOD Since the root has a y value of 0.12.953491494113659 1. along with the secant line.52969840528617 × 10 −06 3.666666666666667 1. Consider the secant method for the function f (x) = ln x . An iteration of the secant method is shown in Figure 4. Note also that the secant method requires two initial guesses.00706900811804 0.0013544492875992 −9. we have 57 f (xk ) − f (xk−1 ) (xk+1 − xk ) = −f (xk ). x As with Newton’s method. with x0 = 3.999661272951803 0. x1 . xk+1 = xk − f (xk ) − f (xk−1 ) (4.0284762692197613 −0.125 −0. Consider the secant method used on x 3 + x2 − x − 1. but can be used on a black box function. 2 Note that this function is continuous and has roots ±1.1 Problems As with Newton’s method.459842466254495 −0.999999999999998 f (xk ) 9 −1. convergence is dependent on f (x).999997617569723 1.3.925925925925926 2.44186046511628 0. x1 .2) You will note this is the recurrence of Newton’s method. as we will see. x1 = 4.63467367653163 −0. but with the slope f (xk ) substituted by the slope of the secant line. xk − xk−1 xk − xk−1 xk+1 − xk = − f (xk ). x 0 .

8466075548 73636.366204096222703 0. xp ← x0. (12) Let n ← n + 1. with initial estimates x0 = −1.346573590279973 0.000621298692241343 0.543986613847 366. tol) (1) Let x ← x1.0160949419231219 0. (11) Let f h ← f (x).7196085218 43924. Because the secant method involves two iterates. x1.14.69020766155 9203. f h ← f (x1).010135406045582 0.000152191807070607 1 1 Example 4. run secant(f. We assume that r is a root of f (x).103911011441661 0.6548475770851 33.3380435135758 117.58 give some iterates here: k 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 xk 3 4 21. (2) while n ≤ N (3) if |f h| < tol (4) return x. (6) if |f px| < tol (7) Warn “secant slope is too small.54125113444 1878.00401318169994875 0.142011128224341 0.000991714984152597 0.0254089165873003 0.000388975250950428 0.9111765137635 67. (10) Let x ← x − f h/f px. f p ← f (x0).60222260594 15533. N. Consider Newton’s method applied to the function f (x) = 1+x2 − 17 .” (8) return x.0625163004418104 0. an iteration limit. You can easily verify that f (x0 ) = f (x1 ).673898472 CHAPTER 4. and let en = r − xn .889164762149 637.060241341843 1096. And thus x 2 is not defined. and a tolerance Output: a point for which the function has small value. we assume that we will find some . x0.000243375589137882 0.34688714646 3201.0025208146648422 0. f p ← f h. 4.00158175793894727 0.1606791089 26149.94672271613 5437.820919458675 210. giving up.00638363233543847 0. n ← 0. and thus the secant line is horizontal. FINDING ROOTS f (xk ) 0.2 Convergence We consider convergence analysis as in Newton’s method. initial guesses. Input: a function. Algorithm 5: Algorithm for finding root by secant method.3. (9) Let xp ← x. (5) Let f px ← (f h − f p)/(x − xp).0404780904944712 0. x1 = 1.

this is the case. ek . we arrive at the imprecise equation: ek+1 ≈ − f (r) ek ek−1 = Cek ek−1 . with α = 2. Recall that this held true for Newton’s method. α 1 1 1 1 Since this equation is to hold for all e k . 1 . we must have α =1+ √ This is solved by α = 1 1 + 5 ≈ 1. SECANT METHOD 59 relation between ek+1 and the previous two errors. This is somewhere between the convergence rates for bisection and Newton’s method. A |ek |α = |ek+1 | = C |ek | |ek−1 | = CA− α |ek |1+ α . the proof relies on finding some “attractor” region and using continuity. Omitting all the nasty details (see Cheney & Kincaid [7]).62.4. so Then we have |ek−1 | = A− α |ek | α . Indeed. ek−1 . .3. We now postulate that the error terms for the secant method follow some power law of the following type: |ek+1 | ∼ A |ek |α . 2f (r) Again. Thus we say that the secant method enjoys superlinear 2 convergence. We try to find the α for the secant method. Note that |ek | = A |ek−1 |α .

In this case you will have to set f to the name of an m-file (without the “. (c) Falls into a cycle for some choice of x 0 . Find x2 . you should have the subroutine return the current iterate if zx is sufficiently close to 1. b. with an initial estimate of x 0 = 3. given z. call it I 1 ? What is I2 ? I6 ? (4. b0 = 1. Is this a reasonable way to write a subroutine that. fp. b2 ? √ (4. 1 1 xk+1 = xk + 2 xk converge to? (4. (b) Finds a root for every choice of x 0 .9) How will Newton’s Method perform for finding the root to f (x) = |x| = 0? (4.x) when f is a string with the name of some function evaluates that function at x. . (4. This works for builtin functions and m-files. Your m-file should have header line like: function c = run_bisection(f. with a 0 = 0. via Newton’s Method.11) Implement the bisection method in octave/Matlab. b0 = 1. N. (4. . (4. with a 0 = 0.11) Your answer should be correct to 5 decimal places.0001).7) Use Newton’s Method to devise a sequence x 0 . 0.12) Implement Newton’s Method. 2]. (c) Run your code on the function f (x) = x − cos x.60 Exercises CHAPTER 4. (b) Run your code on the function f (x) = x 2 − 5x + 3. (4. tol) where f is the name of the function. say within the interval (0. Your m-file should have header line like: function c = invs(z) where z is the number to be inverted.10) Implement the inverse finder described in Example Problem 4. √ (4. such that xk → ln 10. What are a1 . (cf. .3) Consider bisection for finding the root to cos x = 0.2) Approximate 10 by using two steps of Newton’s method. tol) where f is the name of the function. FINDING ROOTS (4. Example Problem 4. x0. You may wish to use the builtin function sign.4) What does the sequence defined by x0 = 1. x1 . In this case you can set f = "cos". computes ln z? (Hint: such a subroutine would require computation of exk . a. What is the next interval considered. Let the initial interval I 0 be [0. Can you find some z for which the algorithm performs poorly? (4.8) Give an example (graphical or analytic) of a function. b0 = 1. (a) Run your code on the function f (x) = cos x. with a0 = 0. Start with x0 = 2. b0 = 2. Run your code to find zeroes of the following functions: (a) f (x) = tan x − x. b1 ? What are a2 . Recall that feval(f. Is this possible for rational xk without using a logarithm? Is it practical?) (4. the cubed root of some input number z. .6) Use Newton’s Method to approximate 3 9. and fp is its derivative.10. Your m-file should have header line like: function x = run_newton(f. f (x) for which Newton’s Method: (a) Does not find a root for some choices of x 0 . (d) Converges slowly to the zero of f (x).1) Consider the bisection method applied to find the zero of the function f (x) = x 2 − 5x + 3. As an extra termination condition.9999.m”) which will evaluate the given f (x). with a0 = 0.5) Devise a subroutine using only simple operations that finds.

x0. (c) f (x) = x7 − 7x6 + 21x5 − 35x4 + 35x3 − 21x2 + 7x − 1.13) Implement the secant method Your m-file should have header line like: function x = run_secant(f. N.3. for small. (Note: This function also has no zeroes.4. Run your code to find zeroes of the following functions: (a) f (x) = tan x − x.) (c) f (x) = 2 + sin x. (4.) . x1. (d) f (x) = (x − 1)7 . (Note: This function has no zeroes. (b) f (x) = x2 + 1. SECANT METHOD 61 (b) f (x) = x2 − (2 + )x + 1 + . tol) where f is the name of the function.

62 CHAPTER 4. FINDING ROOTS .

which we want to approximate with a polynomial p(x).” Sometimes. The xi values are called “nodes. y n We consider the problem of finding a polynomial that interpolates a given set of values: where the xi are all distinct. xi − x j (5.1. then p n also has this property. xn f (x) f (x0 ) f (x1 ) . . . .. Definition 5.1) 63 . x n y 0 y1 . we will consider a variant of this problem: we have some black box function. there is an n-degree polynomial that interpolates it. . for any such set of data. . We present a constructive proof of this fact by use of Lagrange Polynomials.. . A polynomial p(x) is said to interpolate these data if p(x i ) = yi for i = 0. . n. 5.1 Polynomial Interpolation x y x 0 x1 . .1 (Lagrange Polynomials). . the Lagrange polynomials are the n + 1 polynomials i defined by i (xj ) = δij = 0 if i = j 1 if i = j Then we define the interpolating polynomial as n pn (x) = i=0 yi i (x). f (xn ) for some choice of distinct nodes xi . We do this by finding the polynomial interpolant to the data x x0 x1 . f (x).Chapter 5 Interpolation 5. . For a given set of n + 1 nodes x i . If each Lagrange Polynomial is of degree at most n. 1. The Lagrange Polynomials can be characterized as follows: n i (x) = j=0.1 Lagranges Method As you might have guessed. j=i x − xj .

both interpolating the data. call them p(x).3.2 (Interpolant Existence and Uniqueness). Let r(x) = p(x) − q(x). x1 . Construct the polynomial interpolating the data x y 1 1 3 2 3 −10 2 by using Lagrange Polynomials. This gives the theorem Theorem 5. Thus we create polynomials pk (x) such that pk (xi ) = yi for 0 ≤ i ≤ k. . q(x) are equal everywhere. Proof. xn . Example Problem 5. . and leads to a different algorithm for constructing the interpolant. . we simply let p0 (x) = y0 .e. and the interpolating polynomial calculated using Lagrange Polynomials. Let {x i }n be distinct nodes. in “Lagrange Form” is 2 1 1 p2 (x) = −3(x − )(x − 3) − 8(x − 1)(x − 3) + (x − 1)(x − ) 2 5 2 5. 0 ≡ r(x) = p(x) − q(x). Note that r(x) can have degree no greater than n. . then a new point was given (xn+1 . ..e. . each of degree no greater than n. Each Lagrange Polynomial would have to be updated. there is exactly one polynomial. This method is also constructive. That is. One way to view this construction is to imagine how one would update the Lagrangian form of an interpolant. p(x) of degree no greater than i=0 n such that p(xi ) = yi for i = 0. yet it has roots at x0 . .2 Newton’s Method There is another way to prove existence. i. This could take a lot of calculation (especially if n is large). they are the same polynomial.. INTERPOLATION By evaluating this product for each x j . So the alternative method constructs the polynomials iteratively. The Lagrange Polynomial construction gives existence of such a polynomial p(x) of degree no greater than n. {yi }n . we see that this is indeed a characterization of the Lagrange Polynomials. The only polynomial of degree ≤ n that has n + 1 distinct roots is the zero polynomial. Thus p(x). Solution: We construct the Lagrange Polynomials: 0 (x) = (x − 1 )(x − 3) 1 2 = −(x − )(x − 3) 2 (1 − 1 )(1 − 3) 2 4 (x − 1)(x − 3) = (x − 1)(x − 3) 1 (x) = 1 1 5 ( 2 − 1)( 2 − 3) = (x − 1)(x − 1 ) 1 1 2 1 = 5 (x − 1)(x − 2 ) (3 − 1)(3 − 2 ) 2 (x) Then the interpolating polynomial. yn+1 ). and thus has degree no greater than n. Moreover. Then i=0 for any values at the nodes. i. suppose some data were given. and an interpolant for the augmented set of data is to be found. . Suppose there were two such polynomials. n. . 1. q(x). each polynomial is clearly the product of n monomials.64 CHAPTER 5.1. This is simple for k = 0.

This construction is known as Newton’s Algorithm. 2 the constant polynomial with value y 0 .5 2 2. 1 . so pk+1 (x) will interpolate the data at x0 . POLYNOMIAL INTERPOLATION 2 l0(x) l1(x) l2(x) 1. and thus c = 26. and the resultant form is Newton’s form of the interpolant Example Problem 5. To get the value of the constant we calculate c such that yk+1 = pk+1 (xk+1 ) = pk (xk+1 ) + c(xk+1 − x0 )(xk+1 − x1 ) · · · (xk+1 − xk ). . The following construction works: pk+1 (x) = pk (x) + c(x − x0 )(x − x1 ) · · · (x − xk ).1. .5.5 65 1 0. .5 0 -0. graphically. xk . Verify.1: The 3 Lagrange polynomials for the example are shown. Then 2 1 p2 (x) = 3 + 26(x − 1) + c(x − 1)(x − ) 2 . Then assume we have a proper pk (x) and want to construct pk+1 (x). 3.5 Figure 5. . Solution: We construct the polynomial iteratively: p0 (x) = 3 p1 (x) = 3 + c(x − 1) 1 We want −10 = p1 ( 1 ) = 3 + c(− 2 ). for some constant c. that these polynomials have the property i (xj ) = δij for the nodes 1. Construct the polynomial interpolating the data x y 1 1 3 2 3 −10 2 by using Newton’s Algorithm.5 3 3.5 1 1.4. Note that the second term will be zero for any x i for 0 ≤ i ≤ k.5 -1 0 0. x1 .

5 1 1. Continuing in this way we see that we can write  n pn (x) = k=0 where an empty product has value 1 by convention.2: The interpolating polynomial for the example is shown. Then.  (5. it must be the same polynomial as the Lagrange interpolant.5 2 2.66 15 CHAPTER 5.1. This can be rewritten in a funny form. we may wonder “Which should we use?” Newton’s Algorithm seems more flexible–it can deal with adding new data.2.2) . The two methods give the same interpolant. We want 2 = p2 (3) = 3 + 26(2) + c(2)( 5 ). The previous iterate pk (x) was constructed similarly. 5. There also happens to be a way of storing Newton’s form of the interpolant that makes the polynomial simple to evaluate (in the sense of number of calculations required).5 3 3. that the degree of pn (x) is no greater than n. . by Theorem 5. where a monomial is factored out of each successive summand: pn (x) = c0 + (x − x0 )[c1 + (x − x1 )[c2 + (x − x2 ) [.3 Newton’s Nested Form Recall the iterative construction of Newton’s Form: pk+1 (x) = pk (x) + ck (x − x0 )(x − x1 ) · · · (x − xk ). by induction. . 5 2 Does Newton’s Algorithm give a different polynomial? It is easy to show. INTERPOLATION p(x) 10 5 0 -5 -10 -15 -20 -25 -30 -35 0 0. Then we get 1 −53 (x − 1)(x − ). so we can write: pk+1 (x) = [pk−1 (x) + ck−1 (x − x0 )(x − x1 ) · · · (x − xk−1 )] + ck (x − x0 )(x − x1 ) · · · (x − xk ).]]] ck  0≤j<k (x − xj ) .5 4 Figure 5. and thus c = 2 p2 (x) = 3 + 26(x − 1) + −53 5 .

(5. 5. .. We assume. . xi1 . xj+k ] − f [xj . a k th order divided difference is a function of k + 1 (not necessarily distinct) nodes. . . for the remainder of this section. . xik ] The divided differences are defined recursively as follows: • The 0th order divided differences are simply defined: f [xi ] = f (xi ). . x=0 written as f [xi0 . xi1 . those of the form f [x j . xj+1 . . . xj+k−1 ] xj+k − xj . that we are considering interpolating a function. By “better” we mean requiring few multiplications and additions. . . . xi2 . In this case the higher order differences can more simply be written as f [xj . vn = c0 + (t − x0 )vn−1 This requires only n multiplications and 2n additions. . xi1 . this provides a better way of calculating pn (t) at arbitrary t. xk ] . . . . xj+2 . we will only consider divided differences with successive nodes. . . xj+k ] = f [xj+1 . . Compare this with the number required for using the Lagrange form: at least n 2 additions and multiplications. xj+1 . .e. .1. For a given collection of nodes {x i }n and values i=0 {f (xi )}n . . we have values of f (x i ) at the nodes xi . . i. . . . . POLYNOMIAL INTERPOLATION 67 Supposing for an instant that the constants c k were known. . . . . . . This nested calculation is performed iteratively: v0 = c n v1 = cn−1 + (t − xn−1 )v0 v2 = cn−2 + (t − xn−2 )v1 . xik ] = f [xi1 . . .5. xj+k ]. . . xik−1 x ik − x i0 We care about divided differences because coefficients for the Newton nested form are divided differences: ck = f [x0 . xj+1 .1.3) Because we are only interested in the Newton method coefficients. Definition 5.5 (Divided Differences). x1 .4 Divided Differences It turns out that the coefficients ck for Newton’s nested form can be calculated relatively easily by using divided differences. . xik ] − f xi0 . • Higher order divided differences are the ratio of differences: f [xi0 . . that is.

x1 . . x3 ] f[ ] 3 −10 2 f[ . . ] 26 24 5 3 Then we calculate f [x0 . ] f [x0 . .68 CHAPTER 5. 3−1 5 f[ . ] f [x0 . Find the divided differences for the following data x f (x) Solution: We start by writing in the data: x 1 1 2 1 1 3 2 3 −10 2 f[ . from left to right: x x0 x1 x2 x3 f[ ] f [x0 ] f [x1 ] f [x1 . the first column just echoes the data: f [x i ] = f (xi ). . x2 ] = 2 − −10 = 24/5. x1 ] = Adding these to the table. x2 ] = So we complete the table: x 1 1 2 24/5 − 26 −53 = . x3 ] f [x1 . x2 ] f [x3 ] Note that by definition. ] f[ . x2 . x2 . ] 3 Then we calculate: f [x0 . we have −10 − 3 = 26. ] 26 f[ ] 3 −10 2 3 1 24 5 −53 5 That’s supposed to be a joke. x1 . where you compute the following table columnwise. ] f[ .6. We drag out our example one last time: Example Problem 5. x1 . ] f[ . x2 ] f [x0 . . 3 − (1/2) f[ ] 3 −10 2 f[ . x2 ] f [x2 ] f [x2 . ] f[ . x1 ] f[ . (1/2) − 1 x 1 1 2 and f [x1 . INTERPOLATION The graphical way to calculate these things is a “pyramid scheme” 1 . . .

8 (Interpolation Error Theorem). Then n→∞ x∈[−5. how should we pick the nodes? 2. as found in Example Problem 5. 5. . This behaviour is shown in Figure 5. (known as the Runge Function). . If our closed interval is [−1. 1].5. on [−5. 5. . Let p be the polynomial of degree at most n interpolating function f at the n+1 distinct nodes x 0 . b] there is some ξ ∈ [a. . Literally interpreted. ERRORS IN POLYNOMIAL INTERPOLATION 69 You should verify that along the top line of this pyramid you can read off the coefficients for Newton’s form. If we want to interpolate some function f (x) at n + 1 nodes over some closed interval. Then for each x ∈ [a.3. a good polynomial interpolant of the Runge function of Example 5. xn on [a. It turns out that a much better choice is related to the Chebyshev Polynomials (“of the first kind”). then we want to define our nodes as xi = cos 2i + 1 2n + 2 π .2. By using the Chebyshev nodes.5. see Figure 5. b]. as the following example illustrates. these Chebyshev Nodes are the projections of points uniformly spaced on a semi circle.2 Errors in Polynomial Interpolation We now consider two questions: 1.4.4. b] such that 1 f (n+1) (ξ) f (x) − p(x) = (n + 1)! n i=0 (x − xi ) .7 can be found. You should be thinking that the term on the right hand side resembles the error term in Taylor’s Theorem. Let f (x) = 1 + x2 −1 . Let f (n+1) be continuous.7 (Runge Function). as shown in Figure 5. including the endpoints. 5]. . x1 .5] lim max |pn (x) − f (x)| = ∞. The fact that they are “good” for interpolation follows from the following theorem: Theorem 5. and let p n (x) interpolate f on n equally spaced nodes. Example 5. 0 ≤ i ≤ n.1 Interpolation Error Theorem We did not just invent the Chebyshev nodes. How accurate can we make a polynomial interpolant over a closed interval? You may be surprised to find that the answer to the first question is not that we should make the xi equally spaced over the closed interval.2.

3: The Runge function is poorly interpolated at n equally spaced nodes.70 CHAPTER 5. as n gets large. the interpolations appear to improve with larger n. this gets worse as n gets larger. .2 -0.4 -6 -4 -2 0 2 4 6 (a) 3. but bad at the edges.8 0.2 0 -0. INTERPOLATION 1 runge 3 nodes 7 nodes 10 nodes 0.6 0. As shown in (c). as in (a). In (b) it becomes apparent that the interpolant is good in center of the interval. At first.10 equally spaced nodes 8 runge 15 nodes 7 100000 runge 50 nodes 0 6 -100000 5 -200000 4 -300000 3 -400000 2 -500000 1 0 -600000 -1 -6 -4 -2 0 2 4 6 -700000 -6 -4 -2 0 2 4 6 (b) 15 equally spaced nodes (c) 50 equally spaced nodes Figure 5.7.4 0.

1 -0.3.4 -6 -4 -2 0 2 4 6 0 -6 -4 -2 0 2 4 6 0.2 -0.5.9 0. Apply Rolle’s Theorem to φ(t) to find that φ (t) has n + 1 .2 0. Some other facts: f.4 0.4 0. So assume x is not a node. w have n + 1 continuous derivatives.2. Proof. Compare these figures to those of Figure 5. Make the following definitions n w(t) = i=0 (t − xi ) . p.10 Chebyshev nodes (b) 17 Chebyshev nodes Figure 5. ERRORS IN POLYNOMIAL INTERPOLATION topleft Sfrag replacements 71 −1 1 Figure 5. by assumption and definition.7 0.5: The Chebyshev nodes yield good polynomial interpolants of the Runge function. w(x) is nonzero. c= Since x is not a node.5 0. Now note that φ(x i ) is zero for each node xi . and that by definition of c. That is φ(t) has n + 2 roots.4: The Chebyshev nodes are the projections of nodes equally spaced on a semi circle. in this case both the LHS and RHS are zero.2 0. First consider when x is one of the nodes x i . the interpolant is indistinguishable from the Runge function by eye.6 0. thus φ has this many continuous derivatives. w(x) φ(t) = f (t) − p(t) − cw(t).3 0. For more than about 25 nodes.7. that φ(x) = 0 for our x. f (x) − p(x) .8 0 (a) 3.8 0.6 0. bottomright 1 runge 3 nodes 7 nodes 10 nodes 1 runge 17 nodes 0.

72 CHAPTER 5. and once the nodes and f are fixed. In this case. call it ξ. That is 0 = φ(n+1) (ξ) = f (n+1) (ξ) − p(n+1) (ξ) − cw(n+1) (ξ). β] . Thus the error in a polynomial interpolation is given as 1 f (x) − p(x) = f (n+1) (ξ) (n + 1)! n i=0 0 = f (n+1) (ξ) − c(n + 1)! (x − xi ) . the rescaling of the nodes changes the bound on (t − xi ) so the overall error bound becomes (β − α)n+1 |f (x) − p(x)| ≤ 2n+1 max f (n+1) (ξ) . And w(t) is a polynomial of degree n + 1 in t. 1] have the remarkable property that n i=0 (t − xi ) ≤ 2−n for any t ∈ [−1. 2 (n + 1)! ξ∈[α. Moreover.β] for x ∈ [α.1] The Chebyshev nodes can be rescaled and shifted for use on the general interval [α.8. (n + 1)! which is what was to be proven. In this way we see that φ (n+1) (t) has a root. Then apply Rolle’s Theorem again to find φ (t) has n roots. thus the only way we can make the error |f (x) − p(x)| small is by judicious choice of the nodes xi . But p(t) is a polynomial of degree ≤ n. 1] . INTERPOLATION roots. it can be shown that for any choice of nodes x i that n t∈[−1. . 2 (n + 1)! ξ∈[−1. so its n + 1th derivative is easily seen to be (n + 1)! Thus c(n + 1)! = f (n+1) (ξ) f (x) − p(x) 1 = f (n+1) (ξ) w(x) (n + 1)! 1 f (x) − p(x) = f (n+1) (ξ)w(x). β]. 2 0 ≤ i ≤ n. The Chebyshev nodes on [−1. the error for polynomial interpolants defined on Chebyshev nodes can be bounded as 1 |f (x) − p(x)| ≤ n max f (n+1) (ξ) . We have no control over the function f (x) or its derivatives.1] max i=0 (t − xi ) ≥ 2−n . p is determined. In this case they take the form xi = β−α cos 2 2i + 1 2n + 2 π + α+β . Thus the Chebyshev nodes are considered the best for polynomial interpolation. Merging this result with Theorem 5. so p (n+1) is identically zero.

f (x) = − sin(x) − cos(x) 5. n. .b] max i=0 |x − xi | . equally spaced nodes are frequently used for interpolation. . since they are easy to calculate 2 . i = 0. 4 Never underestimate the power of laziness. . Now we claim that |x − xi | ≤ (j − i + 1) h for i < j. 4 where by simple calculus. (b − a) i. Thus it suffices to take n large enough such that π n+1 2 ≤ 10−8 22n+1 (n + 1)! By trial and error we see that n = 10 suffices. We can assume x is not one of the nodes. and the problems with the Runge Function.2. . .2 Interpolation Error for Equally Spaced Nodes Despite the proven superiority of Chebyshev Nodes. otherwise the product in question is zero. Then n i=0 |x − xi | ≤ h2 (j + 1)!hj 4 (n − j)!hn−j−1 . We now consider bounding n x∈[a.5. . ERRORS IN POLYNOMIAL INTERPOLATION 73 Example Problem 5. As a crude approximation we can assert that f (k) (x) ≤ |cos x| + |sin x| ≤ 2. xj+1 We can show that xi = a + hi = a + |x − xj | |x − xj+1 | ≤ h2 . Then x is between some x j . How many Chebyshev nodes are required to interpolate the function f (x) = sin(x) + cos(x) to within 10−8 on the interval [0. and so we get an overall bound n i=0 2 |x − xi | ≤ hn+1 n! . and |x − xi | ≤ (i − j) h for j + 1 < i. . n Start by picking an x.2. π]? Solution: We first find the derivatives of f f (x) = cos(x) − sin(x) f (x) = − cos(x) + sin(x) .9. 1. It can be shown that (j + 1)!(n − j)! ≤ n!.

How many equally spaced nodes are required to interpolate the function f (x) = sin(x) + cos(x) to within 10−8 on the interval [0.b] where h = (b − a)/n. Example Problem 5. The reason this result does not seem to apply to Runge’s Function is that f (n) for Runge’s Function becomes unbounded as n → ∞. Thus we want to make n sufficiently large such that hn+1 2 ≤ 10−8 .74 The interpolation theorem then gives us |f (x) − p(x)| ≤ CHAPTER 5. INTERPOLATION hn+1 max f (n+1) (ξ) . . 4 (n + 1) ξ∈[a. we make the crude approximation f (k) (x) ≤ |cos x| + |sin x| ≤ 2. 4(n + 1) where h = (π − 0)/n.10. π]? Solution: As in Example Problem 5. That is we want to find n large enough such that π n+1 ≤ 10−8 . 2nn+1 (n + 1) By simply trying small numbers.9. we can see this is satisfied if n = 12.

] f[ .4) Complete the divided differences table: x −1 1 5 −1 1 5 3 3 −2 f[ ] f[ . ] 3 3 −2 75 Find the Newton form of the polynomial interpolant.2) Find the Lagrange Polynomials for the nodes {−1. ] f[ . ERRORS IN POLYNOMIAL INTERPOLATION Exercises (5. . (5. .2. 5}. ] −3 13 21 1 (5.5) Find the polynomial of degree no greater than 3 that interpolates x y −1 1 5 −3 3 3 −2 4 (Hint: reuse the Newton form of the polynomial from the previous question. . ] f[ .1) Find the Lagrange Polynomials for the nodes {−1. ] 6 3 2 3 1 0 3/2 2 3 2 37/8 8 . 1. (5.8) Complete the divided differences table: x −1 0 1 2 f[ ] f[ . ] f[ . .3) Find the polynomial of degree no greater than 2 that interpolates x y (5.5. . . (5.) (5. ] f[ . 1}.7) Find the polynomial of degree no greater than 3 that interpolates x y (5. .6) Find the nested form of the polynomial interpolant of the data x y x 1 3 4 6 1 3 4 6 −3 13 21 1 by completing the following divided differences table: f[ ] f[ .

12) Write code to calculate the Newton form coefficients.coefs.xs) where coefs is a vector of the Newton coefficients. octave:6> newtonCoef(xs. Your m-file should have header line like: function coefs = newtonCoef(xs. octave:3> newtonCoef(xs.10) Let p(x) interpolate the function x −2 at n equally spaced nodes on the interval [0.33333 -0. . and fxs the vector of n + 1 values.xs) ans = 34 octave:4> calcNewton (-3. for the nodes xi and values f (xi ).*xs + xs . INTERPOLATION (Something a little odd should have happened in the last column. octave:8> fxs = xs.xs) ans = 10 (a) What do you get when you try the following? octave:5> xs = [3 1 4 5 9 2 6 8 7 0]. octave:5> fxs = [3 1 4 1 5 9 2 6]. Your m-file should have header line like: function y = calcNewton(t.11) How many Chebyshev nodes are required to interpolate the function x to within 10−6 on the interval [1.00040 (a) What do you get when you try the following? octave:4> xs = [1 4 5 9 14 23 37 60]. Test your code on the following input: octave:1> xs = [1 -1 2 -2 3 -3 4 -4]. octave:9> newtonCoef(xs.13) Write code to calculate a polynomial interpolant from its Newton form coefficients and the node values.9) Let p(x) interpolate the function cos(x) at n equally spaced nodes on the interval [0.76 CHAPTER 5.ys) (b) Try the following: octave:7> xs = [1 3 4 2 8 -2 0 14 23 15]. octave:3> calcNewton (5.fxs) where xs is the vector of n + 1 nodes. by divided differences.coefs.coefs.5≤x≤1 as a function of n. Of what degree is the polynomial interpolant? (5. Bound the error max p(x) − x−2 0.08333 0.) Find the Newton form of the polynomial interpolant. Check your code against the following values: octave:1> xs = [1 3 4]. octave:2> coefs = [6 5 1]. 1].fxs) ans = 1.5. 2]. Bound the error max |p(x) − cos(x)| 0≤x≤2 as a function of n.fxs) (5. 2]? (5. xs is a vector of the nodes x i .+ 4. octave:2> fxs = [1 1 2 3 5 8 13 21].00417 -0. and y is the value of the interpolating polynomial at t.05000 0.00020 -0. How small is the error when n = 10? 1 (5.00000 -0.00000 0. How small is the error when n = 10? How small would the error be if Chebyshev nodes were used instead? How about when n = 10? (5.

coefs.xs) (b) Try the following octave:8> xs = [3 1 4 5 9 2 6 8 7 0].xs) 77 .coefs. ERRORS IN POLYNOMIAL INTERPOLATION octave:6> coefs = [1 -1 2 -2 3 -3 4 -4 5 -5].5. octave:10> calcNewton(1. octave:9> coefs = [1 -1 2 -2 3 -3 4 -4 5 -5]. octave:7> calcNewton(0.5.2.

INTERPOLATION .78 CHAPTER 5.

ti+1 ]. but x − ti−1 ≤ 0. . A spline of degree 1. then x − ti > 0. For a spline with this data. the linear polynomial on each subinterval is defined as yi+1 − yi Si (x) = yi + (x − ti ) .1 5.1 First and Second Degree Splines Splines make use of partitions. Example 6.5 2. b]. also known as a linear spline. The linear spline for the following data t y is shown in Figure 6. ti+1 − ti .1 0. 3. Thus if we wish to evaluate S(x). Definition 6.0 1.75 1. we search for the largest i such that x − t i > 0.3.5 0. S is a linear polynomial. A spline is a function consisting of simple functions glued together. . b]. S is continuous on [a. i=0 A linear spline is defined entirely by its value at the knots. 6. b] if 1. That is. A partition of the interval [a. given t y t 0 t1 . then evaluate Si (x).0 0.4 0. ti+1 ] . y n Note that if x ∈ [ti . t n y 0 y1 . 79 0.Chapter 6 Spline Interpolation Splines are used to approximate complex functions and shapes. which are a way of cutting an interval into a number of subintervals.0 3 there is only one linear spline with these values at the knots and linear on each given subinterval. .1 (Partition).1. is a function which is linear on each subinterval defined by a partition: Definition 6. In this way a spline is different from a polynomial interpolation. splines are normally piecewise polynomial. 2. A function S is a spline of degree 1 on [a. There is a partition {ti }n of [a. .2 (Linear Splines).3 4. b] is an ordered sequence {t i }n such i=0 that a = t0 < t1 < · · · < tn−1 < tn = b The numbers ti are known as knots. which consists of a single well defined function that approximates a given shape. b] such that on each [ti . The domain of S is [a.0 2.

f (tn ) A function and its linear spline interpolant are shown in Figure 6. ti+1 ]. |f (t) − p(t)| ≤ max {|f (t) − f (ti )| .4 0. 8 Over a given partition. |f (t) − f (ti+1 )|} ..1 Suppose p(t) is the linear polynomial interpolating f (t) at the endpoints of the subinterval [ti . respectively 1 Although we could directly claim Theorem 5. if the spline is being used to interpolate the function f. . if f (t) exists and is bounded by M2 on [ti . . . then |f (t) − p(t)| ≤ M2 (ti+1 − ti )2 . say. then |f (t) − p(t)| ≤ M1 (ti+1 − ti ) . then this is equivalent to interpolating the data t t0 t1 . then for t ∈ [ti . splines are used to interpolate tabulated data as well as functions.8 1 Figure 6.1 First Degree Spline Accuracy As with polynomial functions. To find the error bound.. ti+1 ]. we will consider the error on a single interval of the partition. ti+1 ] . The spline interpolant in that figure is fairly close to the function over some of the interval in question. SPLINE INTERPOLATION 5 4 3 2 1 0 0 0. and is linear between each knot.1. In the latter case. 2 Similarly. 6.8. these become. it is a bit of overkill. if f (t) exists and is bounded by M1 on [ti . and use a little calculus.2 0.1: A linear spline. That is.80 6 CHAPTER 6. but it also deviates greatly from the function at other points of the interval. tn y f (t0 ) f (t1 ) . We are interested in finding bounds on the possible error between a function and its spline interpolant. In particular. |f (t) − p(t)| is no larger than the “maximum variation” of f (t) on this interval. ti+1 ].6 0. The spline is piecewise linear.2.

or splines of degree 2. The following is a quadratic splne:  −x x ≤ 0. b).2 Second Degree Splines Piecewise quadratic splines. Q is continuous on [a. the function and its interpolant are very close.4 0.1. Q(x) =  −x2 + 7x − 8 2 ≤ x. 2 0≤i<n M2 |f (t) − p(t)| ≤ max (ti+1 − ti )2 . 0.75. b] such that on [ti . Unlike linear splines.7.6. Qi (x) = ai x2 + bi x + ci .1.4. 1. Q is continuous on (a. There is a partition {ti }i=0 of [a. Example 5. 0. are defined similarly: Definition 6. 0. 2. 0.9. b] if 1. FIRST AND SECOND DEGREE SPLINES 6 Sfrag replacements 81 function spline interpolant 5.2 0.1) If equally spaced nodes are used. b].5 3 2. 6.4 (Quadratic Splines).1 + sin (1/(0. for the knots 0. b]. cf. We consider why that is. 0.5)) is shown. ti+1 ]. . these bounds guarantee that spline interpolants become better as the number of nodes is increased. quadratic splines are not defined entirely by their values at the knots. The domain of Q is [a.08t + 0.2: The function f (t) = 4.5 5 4.6. |f (t) − p(t)| ≤ M1 max (ti+1 − ti ) . The spline Q(x) is defined by its piecewise polynomials.0 For some values of t. This contrasts with polynomial interpolants.6 0. n 4.  x2 − x 0 ≤ x ≤ 2.1.5.5 4 3. 3. A function Q is a quadratic spline on [a.5 2 0 0. Q is a polynomial of degree at most 2.8 1 Figure 6. which may get worse as the number of nodes is increased. along with the linear spline interpolant. Example 6. while they vary greatly in the middle of the interval. 8 0≤i<n (6.

or some other condition. . S. The given data t t 0 t1 . . There is a partition {ti }n of [a. 6. we see that we can define Qi (x) = Use this at ti+1 : yi+1 = Qi (ti+1 ) = zi+1 − zi (ti+1 − ti )2 + zi (ti+1 − ti ) + yi . For each of the n subintervals. . z i+1 from zi : zi+1 = 2 6. y n give two equations regarding Qi (x). Qi (ti+1 ) = zi+1 . . The condition on continuity of Q gives a single equation for each of the n − 1 internal nodes. 2.1. and suppose that the additional condition to define the quadratic spline is given by specifying z0 . This totals 3n − 1 equations. Qi (ti ) = zi . S . for example. i=0 You would also expect that a spline of degree k has k − 1 “degrees of freedom. 2 (ti+1 − ti ) zi+1 − zi (ti+1 − ti ) + zi (ti+1 − ti ) .2 (Natural) Cubic Splines If you recall the definition of the linear and quadratic splines. 2 yi+1 − yi − zi . yi+1 − yi = 2 zi+1 + zi yi+1 − yi = (ti+1 − ti ) . . .82 Thus there are 3n parameters to define Q(x). . or Q (a) = 0. t n y 0 y1 . . b). . S (k−1) are continuous on (a. SPLINE INTERPOLATION t 0 t1 .3 Computing Second Degree Splines t y t 0 t1 . Thus some additional user-chosen condition is required to determine the quadratic spline. This system is underdetermined.6 (Splines of Degree k). b] such that on [ti . Let zi = Qi (ti ). y n . the spline of degree k is defined by n(k + 1) parameters. We want to be able to compute the form of Q i (x). Q (a) = 0. Because Qi (ti ) = yi .” as we show here. . the data t y CHAPTER 6. t n y 0 y1 . but 3n unknowns. . t n y y 0 y1 . namely that Qi (ti ) must equal yi and Qi (ti+1 ) must equal yi+1 . This is 2n equations. . The domain of S is [a. probably you can guess the definition of the spline of degree k: Definition 6. ti+1 − ti zi+1 − zi (x − ti )2 + zi (x − ti ) + yi . . . S is a polynomial of degree ≤ k. from the data alone. . ti+1 ]. b]. One might choose. S . b] if 1. . y n Suppose the data are given. 3. If the partition has n + 1 knots. 2 (ti+1 − ti ) Thus we can determine. A function S is a spline of degree k on [a.

7. < tn = b. Corollary 6. These are the degrees of freedom. with the last equality holding because g(x) is zero at the knots. Derivatives are linear.1 Why Natural Cubic Splines? It turns out that natural cubic splines are a good choice in the sense that they are the “interpolant of minimal H 2 seminorm. The natural cubic spline is best twice-continuously differentiable interpolant for a twice-continuously differentiable function. 6. This yields cubic splines. Suppose f has two continuous derivatives. ti+1 ]. a because S (a) = S (b) = 0. and let S be the natural cubic spline interpolating f (x) at some given nodes {ti }n . . meaning that f (x) = S (x) + g (x).” The corollary following this theorem states this in more easily understandable terms: Theorem 6. Thus. Integrating by parts we get b b b b S (x)g (x) dx = S g a a − a S g dx = − S g dx. . . . we can (and must) add k − 1 constraints to uniquely define the spline. Thus we have k − 1 more unknowns than equations. barring some singularity. The usual choice is to make S (t0 ) = S (tn ) = 0. Then notice that S is a polynomial of degree ≤ 3 on each interval. This is a total of n(k + 1) − (k − 1) equations. Often k is chosen as 3. . (NATURAL) CUBIC SPLINES 83 provide 2n equations. Then g(x) is zero on the (n + 1) knots t i .2. Then b S (x) a 2 b dx ≤ f (x) a 2 dx Proof. Apply the theorem to get b S (x) a 2 b dx ≤ R (x) a 2 dx . . Thus b n−1 b n−1 ti+1 S g dx = a i=0 a ci g dx = i=0 ci g ti = 0. under the measure given by the theorem. Proof. S .6. The continuity of S .8. thus S (x) is a piecewise constant function. Then S(x) interpolates R(x) at these nodes. taking value c i on each interval [ti . We let g(x) = f (x) − S(x). Let R(x) be some twice-continuously differentiable function i=0 which also interpolates f (x) at these nodes. S (k−1) at the n − 1 internal knots gives (k − 1)(n − 1) equations. This yields the natural cubic spline. We must add 2 extra constraints to define the spline.2. We show that the last integral is zero. Let f be twice-continuously differentiable. Then a b f (x) 2 b dx = a S (x) 2 b dx + a g (x) 2 b dx + a 2S (x)g (x) dx. and S is the natural cubic spline interpolating f at knots a = t0 < t1 < .

The method is rather tedious. . And this is for the case of only three nodes. with limk→−∞ tk = −∞ and limk→∞ tk = ∞. . We would like a method easier than setting up the 4n equations and unknowns.9. 2] − 2 x + 3x Finding the constants for the previous example was fairly tedious. Now take the second derivative of S: S (x) = Continuity at the middle node gives b = f. . 2] 6ax + 2b 6ex + 2f x ∈ [−1.3. Solving this by “divide and conquer” gives S(x) = x3 + 3x2 − 2x − 1 x ∈ [−1. 0] ex3 + f x2 + gx + h x ∈ [0.3 B Splines The B splines form a basis for spline functions.1. 0] 3ex2 + 2f x + g x ∈ [0. . 6. whence the name.2 Computing Cubic Splines First we present an example of computing the natural cubic spline by hand: Example Problem 6. < t 2 < t1 < t0 < t1 < t2 < . The natural cubic spline condition gives −6a + 2b = 0 and 12e + 2f = 0. so we leave it to the exercises. 0] x ∈ [0. 2] −a + b − c − 1 = 3 We interpolate to find that d = h = −1 and 8e + 4f + 2g − 1 = 3 We take the derivative of S: S (x) = 3ax2 + 2bx + c x ∈ [−1.2. SPLINE INTERPOLATION 6. The B splines of degree 0 are defined as single “blocks”: 0 Bi (x) = 1 ti ≤ x < ti+1 0 otherwise . We presuppose the existence of an infinite number of knots: . 2] Continuity at the middle node gives c = g. .84 CHAPTER 6. Construct the natural cubic spline for the following data: t y −1 0 2 3 −1 3 Solution: The natural cubic spline is defined by eight parameters: S(x) = ax3 + bx2 + cx + d x ∈ [−1. 0] 1 3 2 − 2x − 1 x ∈ [0. something akin to the description in Subsection 6.

. . That is. These B splines are sometimes called hat functions. . The hat functions play a similar role because 1 Bi (tj ) = δ(i+1)j = 1 (i + 1) = j 0 (i + 1) = j Then if we want to interpolate the following data with splines of degree 1: t y We can immediately set S(x) = i=0 t 0 t1 . Some B splines are shown in Figure 6.3.3. ti+2 ). . Imagine wearing a hat shaped like this! Whatever. Harken back to polynomial interpolation and the Lagrange Functions. The nice thing about the hat functions is they allow us to use analogy. We focus on the case k = 1. are nonzero only on one subinterval [ti . • Nonzero only on (ti . t n y 0 y1 . We justify the description of B splines as basis splines: If S is a spline of degree 0 on the given knots and is continuous from the right then S(x) = i 0 S(xi )Bi (x). • Continuous. . y n n 1 yi Bi−1 (x). 1 The B splines quickly become unwieldy. ti+1 ). sum to 1 everywhere.6. the basis splines work in the same way that Lagrange Polynomials worked for polynomial interpolation. The B spline B i (x) is • Piecewise linear. • 1 at ti+1 . The B splines of degree k are defined recursively: k Bi (x) = x − ti ti+k − ti k−1 Bi (x) + ti+k+1 − x ti+k+1 − ti+1 k−1 Bi+1 (x). B SPLINES 85 The zero degree B splines are continuous from the right.

5 1.5 -0.5 -1 0 1 2 3 4 5 6 (c) Degree 2 B spline.5 PSfrag replacements 0 0 -0. in (b). in (c). with 2 Degree 1 B splines Figure 6.86 CHAPTER 6. two of the degree 1 B splines and a degree 2 B spline are shown.5 0 -0. t3 = 4 are shown. .3: Some B splines for the knots t 0 = 1.5 1 1 0. t2 = 3.5. t1 = 2.5 2.5. SPLINE INTERPOLATION 3 3 2.5 -1 0 1 2 -1 3 4 5 6 0 1 2 3 4 5 6 (a) Degree 0 B spline (b) Degree 1 B spline 3 1 B0 (x) 1 B1 (x) 2 (x) B0 2.5 2 1. The two hat functions “merge” together to give the quadratic B spline. In (a).5 0. one of the degree 0 B splines.5 1 0.5 2 2 1. a degree 1 B spline.

assume that Q (0) = −Q (4). β2 .1) Is the following function a linear spline on [0. β. β1 . 4]? Why or why not? S(x) = 3x + 2 −2x + 4 :0≤x<1 :1≤x≤4 87 (6. 2]? Why or why not? S(x) = x+3 3 :0≤x<1 :1≤x≤2 (6.3) Is the following function a linear spline on [0. 4]? Why or why not? S(x) = x2 + 3 5x − 6 :0≤x<3 :3≤x≤4 (6.5) Is the following function a quadratic spline on [0.7) Find constants.6. 5].3. 4]? Why or why not? Q(x) = x2 + 3 5x − 6 :0≤x<3 :3≤x≤4 (6. 2]? Why or why not? Q(x) = x2 + 3x + 2 2x2 + x + 3 :0≤x<1 :1≤x≤2 (6.6) Is the following function a quadratic spline on [0.4) Find constants. . B SPLINES Exercises (6. Assume your solution takes the form α1 (x − 1)2 + β1 (x − 1) − 2 : 0 ≤ x < 1 Q(x) = α2 (x − 1)2 + β2 (x − 1) − 2 : 1 ≤ x ≤ 4 Find the constants α1 . γ such that the following is a quadratic spline on [0. α.8) Find the quadratic spline that interpolates the following data: t y 0 1 4 1 −2 1 To resolve the single degree of freedom.  4x − 2 :0≤x<1  S(x) = αx + β :1≤x<3   −2x + 10 : 3 ≤ x ≤ 5 (6. α. α2 . β such that the following is a linear spline on [0.2) Is the following function a linear spline on [0.  3  1 x2 + 2x + 2 : 0 ≤ x < 1 2 Q(x) = αx2 + βx + γ : 1 ≤ x < 3   2 3x − 7x + 12 : 3 ≤ x ≤ 5 (6. 5].

SPLINE INTERPOLATION (6.9) Find the natural cubic spline that interpolates the data x y 0 1 3 4 2 7 It may help to assume your answer has the form S(x) = Ax3 + Bx2 + Cx + 4 D(x − 1)3 + E(x − 1)2 + F (x − 1) + 2 :0≤x<1 :1≤x≤3 .88 CHAPTER 6.

1) f (ξ)h2 . However. h 2 Remember that ξ is between x and x + h. as h gets smaller. Not surprisingly. even if you worked in infinite precision. Generally the roundoff error will increase as h decreases.Chapter 7 Approximating Derivatives 7. If there is a finite bound on f (z) on the interval in question then the dropped term is bounded by a constant times h. we have dropped the last term. By now. you should know that any calculation starts with Taylor’s Theorem: f (x + h) = f (x) + f (x)h + f (x) 2 f (ξ1 ) 3 h + h 2 3! f (x) 2 f (ξ2 ) 3 h − h f (x − h) = f (x) − f (x)h + 2 3! 1 This approximation for f (x) should remind you of the definition of f (x) as a limit. The truncation error for this approximation is O (h). Thus there is a nonzero h for which the sum of these two errors is minimized. we start with Taylor’s theorem: f (x + h) = f (x) + f (x)h + Rearranging we get f (x + h) − f (x) f (ξ)h − . 2 The error that we incur when we approximate f (x) by calculating [f (x + h) − f (x)] /h is called truncation error. See Example Problem 7. we cannot evaluate f (ξ). so we approximate the derivative by dropping the last term.1 due to subtractive cancellation. precision will be lost in equation 7. but its exact value is not known. 89 . f (x) = f (x) = f (x + h) − f (x) + O (h) h (7. The truncation error can be made small by making h small. The error in calculation for small h is called roundoff error.5 for an example of this. we calculate [f (x + h) − f (x)] /h as an approximation 1 to f (x). That is. That is.1 Finite Differences Suppose we have some blackbox function f (x) and we wish to calculate f (x) at some given x. In so doing. If we wish to calculate f (x). It has nothing to do with the kind of error that you get when you do calculations with a computer with limited precision. you would still have truncation error. We may want a more precise approximation.

. (x) + . . 2 If f (x) is continuous. 3! −2h2 3! f (x) + . .2. then there is some ξ between ξ 1 . .. . assuming f (x) is an analytic function.e. ξ2 such that f (ξ) = is the MVT at work. . . .90 By subtracting these two lines. The following examples illustrate: Example Problem 7. . . . 4! f . . involving the use of Taylor’s Theorem. we get f (x) = f (x + h) − f (x − h) + O h2 2h (This (7. 3! f (ξ1 ) + f (ξ2 ) 3 h 3! f (x + h) − f (x − h) f (ξ1 ) + f (ξ2 ) h2 − 2h 2 6 f (ξ1 )+f (ξ2 ) . 2! 3! 2 3 f (x) + 2hf (x) + 4h f (x) + 8h f (x) + . . then subtract to get some factor of f (x): f (x + 2h) = f (x) + 2hf (x) + f (x + h) = f (x) + hf (x) + f (x + 2h) − f (x + h) = hf (x) + 4h2 2! f (x) + 8h3 3! f (x) + h2 2! f (x) + h3 3! f (x) + 16h4 (4) (x) + 4! f h4 (4) (x) + .2) In some situations it may be necessary to use evaluations of a function at “odd” places to approximate a derivative. 2 3 4f (x) + 4hf (x) + 4h f (x) + 4h f (x) + . Use evaluations of f at x + h and x + 2h to approximate f (x). . APPROXIMATING DERIVATIVES f (x + h) − f (x − h) = 2f (x)h + Thus 2f (x)h = f (x + h) − f (x − h) − f (x) = f (ξ1 ) + f (ξ2 ) 3 h . (f (x + 2h) − f (x + h)) /h = f (x) + Example Problem 7.) Assuming some uniform bound on f (·). . . 2h for f with sufficient number of derivatives Solution: In this case we do not have to find the approximation scheme. 2! f (x) + 3! f (x) + 4! f Thus (f (x + 2h) − f (x + h)) /h = f (x) + O (h) 4f (x + h) − f (x + 2h) − 3f (x) = f (x) + O h2 . i. . Show that 7h3 15h4 (4) 3h2 (x) + . 2! f (x) + 3! f (x) + 4! f 2 3 7h 15h 3h (4) (x) + . These are usually straightforward to derive. . .. We only have to expand the appropriate terms with Taylor’s Theorem. As before: f (x + h) = f (x) + hf (x) + 4f (x + h) = f (x + 2h) = h3 h2 2! f (x) + 3! f (x) + .1. Solution: First use Taylor’s Theorem to expand f (x + h) and f (x + 2h). one with infintely many derivatives.. 2! 3! −4h3 3! f (4f (x + h) − f (x + 2h) − 3f (x)) /2h = f (x) + 4f (x + h) − f (x + 2h) − 3f (x) = 2hf (x) 4f (x + h) − f (x + 2h) = 3f (x) + 2hf (x) + 0f (x) + 3 + −4h f (x) + . we get CHAPTER 7. it is given to us.

Can we do better? Let’s define φ(h) = 1 [f (x + h) − f (x − h)] .2 Richardson Extrapolation The centered difference approximation gives a truncation error of O h2 . This derivation also gives us the centered difference approximation to the second derivative: f (x) = f (x + h) − 2f (x) + f (x − h) + O h2 .. . . 2 3! 4! f (x) 2 f (x) 3 f (4) (x) 4 f (x − h) = f (x) − f (x)h + h − h + h − .) . The constants ai are a function of f (i+1) (x) only. 4! 6! f (4) (x) 2 f (6) (x) 4 h +2 h + .. 2 3! 4! f (6) (x) 6 f (4) (x) 4 h +2 h + . .1.1 Approximating the Second Derivative Suppose we want to approximate the second derivative of some blackbox function f (x). . which is better than O (h) .7.. h2 (7.3) 7. RICHARDSON EXTRAPOLATION 91 7. We have to be a little tricky. . 3 4 16 64 f (i+1) (x) (i+1)! .. 4 16 64 256 But we can combine this with φ(h) to get better accuracy. but we can get the O h2 terms to cancel by taking the right multiples of these two approximations: 3 15 63 φ(h) − 4φ(h/2) = −3f (x) + 4h4 + a6 h6 + a8 h8 + . k=1 f (x + h) = f (x) + f (x)h + Now add the two series to get f (x + h) + f (x − h) = 2f (x) + h2 f (x) + 2 Then let ψ(h) = f (x + h) − 2f (x) + f (x − h) h2 = f (x) + 2 = f (x) + ∞ Thus we can use Richardson Extrapolation on ψ(h) to get higher order approximations. 4! 6! b2k h2k . . . Again.... 2h Had we expanded the Taylor’s Series for f (x + h).. 4 16 64 1 4 5 21 4φ(h/2) − φ(h) 6 = f (x) − 4h − a6 h − a8 h8 + . (In fact. .2. start with Taylor’s Theorem: f (x) 2 f (x) 3 f (4) (x) 4 h + h + h + . f (x − h) to more terms we would have seen that φ(h) = f (x) + a2 h2 + a4 h4 + a6 h6 + a8 h8 + .. they should take the value of What happens if we now calculate φ(h/2)? 1 1 1 1 φ(h/2) = f (x) + a2 h2 + a4 h4 + a6 h6 + a8 h8 + .

m − 1) . n) By definition we know how to calculate the first column of this table. Let D(n. 1) D(n. 1) D(2. As in divided differences. every other entry in the table depends on two other entries. . m − 1) − D(n − 1. . The following theorem gives this result: Theorem 7. We claim that D(n. . 2) · · · D(n. and the other to the left and up one space. some approximation: ∞ φ(h) = L + k=1 a2k h2k . n) = L + O h2(n+1) . D(n.m h 2n 2k (0 ≤ m ≤ n) . . We will also use this technique later to get better quadrature rules–that is. m) = 4m D(n. . . n) is a O h2(n+1) approximation to L. and can be repeatedly applied. 0) D(2. . . This technique of getting better approximations is known as the Richardson Extrapolation. ways of approximating the definite integral of a function. 0) D(1. through theory. n) = L+O h2(n+1) . but tedious. We skip the proof. There are constants a k. m) = L + k=m+1 ak. 0) D(n. n) for some n.92 CHAPTER 7. we use a pyramid table: D(0.. 0) = φ Now define D(n. induction. First we examine the recurrence for D(n.3 (Richardson Extrapolation). We will be interested in calculating D(n. 2) .m such that ∞ D(n. 4m − 1 (7. m). 7. one directly to the left. APPROXIMATING DERIVATIVES This approximation has a truncation error of O h4 .2. . . and have found. The proof is by an easy. Thus to calculate D(n. . that is D(n. n) we have to compute this whole lower triangular array. We want to show that D(n.1 Abstracting Richardson’s Method We now discuss Richardson’s Method in a more abstract framework.4) h 2n . 0) D(1. 1) D(2. Suppose you want to calculate some quantity L.

√ 1 Attempt to find f ( 2). Example Problem 7. 2h √ 2 until subtractive cancelling takes over.7.2. that φ(0.1 1.2.9 0 log 0.1. notice that for this simple example. then making h small gives a good approximation to f The following table illustrates this: .333333333331274 0.333334876543723 0.5. We now try to find D(2.333333333333313 Note that we have some motivation to use Richardson’s method in this case: If we let φ(h) = √ √ 1 f ( 2 + h) − f ( 2 − h) . RICHARDSON EXTRAPOLATION 93 7. Solution: The real answer is f (1) = 1/1 = 1.975 ≈ 1. We now try to find D(2.01. 2). but our computer doesn’t know that.4.025 log 0.05 1 ≈ 0. Define 1+h log 1−h 1 [f (1 + h) − f (1 − h)] = . which is supposed to be a O h6 approximation to 1 : 3 n\m 0 1 2 0 1 2 0. Recall that f (x) = 1+x2 .33333371913582 0. Example Problem 7.333339506181068 0. φ(h) = 2h 2h Let’s use h = 0. we have.999999686 ≈ 1.00001) ≈ 0.1 0.000834586 0.2 Using Richardson Extrapolation We now try out the technique on an example or two. Approximate the derivative of f (x) = log x at x = 1.2 ≈ 1.000208411 0.333333333333186 0.95 ≈ 1. already. Consider the ugly function: f (x) = arctan(x). 2). which is supposed to be a O h6 approximation to f (1) = 1: n\m 0 1 2 1. However. 3 Solution: Let’s use h = 0.999994954 2 ≈ 0.999999999. so the value that we are seeking is 1 .05 log 0.000000002 This shows that the Richardson method is pretty good.003353477 1.

33333333334106813 0.01 1 .01 0. PSfrag replacements total error 0.33306690738754691 0 The data are illustrated in Figure 7.94 CHAPTER 7.33306690738754696 0.0 0. The optimal h value is around 1 × 10 −5 .01 gives much better results than this optimal h. we get 13 decimal places from D(2.33306690738754696 0. and a roundoff term which increases.33333336091345694 0.33333950618106845 0.001 0.0001 0. Note that Richardson’s D(2.333333360913457 0. 2) approximation with h = 0.33395069677431943 0.1. 2).39269908169872408 0.0001 1e-06 1e-08 1e-10 1e-12 1e-16 1e-14 1e-12 1e-10 1e-08 1e-06 0.33333333395058062 0. Notice that φ(h) gives at most 10 decimal places of accuracy.33333332760676626 0.33333339506169679 0. The total error is the sum of a truncation term which decreases as h decreases. Note however. then begins to deteriorate.0001 1 × 10−5 1 × 10−6 1 × 10−7 1 × 10−8 1 × 10−9 1 × 10−10 1 × 10−11 1 × 10−12 1 × 10−13 1 × 10−14 1 × 10−15 1 × 10−16 φ(h) 0.1: The total error for the centered difference approximation to f ( 2) is shown versus h.1 0. 1 √ Figure 7. APPROXIMATING DERIVATIVES h 1.33333333315788138 0.01 0.33339997429493451 0.333333360913457 0.33333333332441484 0.

. .2) Let f (x) be an analytic function. Suppose you have some great computational approximation to the quantity Q such that ψ(h) = Q + a3 h3 + a6 h6 + a9 h9 . Assuming that φ(h) = Q + a2 h2 + a4 h4 + a6 h6 .. (7.9) using Taylor’s Theorem. Can you find some combination of ψ(h).25 ? ? (See equation 7. . using h = 4 . . ψ(h/2) which is a O h6 approximation to Q? Complete the following Richardson’s Extrapolation Table. RICHARDSON EXTRAPOLATION Exercises (7. Assuming that φ(h) = Q + a 2 h2 + a4 h4 + a6 h6 . assuming the first column consists of values D(n.4) (7.7. φ(λh) which is a O h4 approximation to Q. . Find some linear combination of φ(h) and φ(−h) which is a O h2 approximation to Q. find some combination of φ(h). find some combination of φ(h). what should you do with λ? Is this practical? Note that the method of Richardson Extrapolation that we considered used the value λ = 1/2.5 ? 2 1. To make the constant associated with the h4 term small in magnitude.8) (7. Let λ be some number in (0. where f (x) = x4 .7) (7.5) (7. Let ψ(h) be the centered difference approximation to the first derivative: ψ(h) = 2 f (x + h) − f (x − h) 2h 4 6 (a) Show that ψ(h) = f (x) + h f (x) + h f (5) (x) + h f (7) (x) + .2. Assuming that φ(h) = Q + a2 h2 + a4 h4 + a6 h6 . φ(h/3) which is a O h4 approximation to Q. i. one which is infinitely differentiable. Your answer should be something like O h? . say φ(h). and can approximate it with some formula.e..) (b) Use this formula to approximate f (0). h2 (7. give the accuracy of the above approximation. Approximate f (0) with this approximation. .) . (a) What order approximation is this? (Assume f (x) has bounded derivatives of arbitrary order. 0) for n = 0. . 1.3) Derive the approximation f (x) ≈ 4f (x + 3h) + 5f (x) − 9f (x − 2h) 30h (7. (a) Assuming that f (x) has bounded derivatives. . . 2: n\m 0 1 2 0 2 1 1. . and such that φ(h) = Q + a 1 h + a2 h2 + a3 h3 + a4 h4 + .1) Derive the approximation f (x) ≈ 4f (x + h) − 3f (x) − f (x − 2h) 6h 95 using Taylor’s Theorem.4 if you’ve forgotten the definitions. . φ(h/4) which is a O h4 approximation to Q.6) (7.. 1 (b) Let f (x) = x3 . 1). .. which depends on parameter h.1 Suppose you want to know quantity Q. find some combination of φ(h). 3! 5! 7! (b) Show that 8 (ψ(h) − ψ(h/2)) = f (x) + O h2 . and h = 0.

10) Write code to complete a Richardson’s Method table. octave:8> richardsons(col0) .125 0. D(n. given the first column. octave:6> richardsons(col0) (b) What do you get when you try the following? octave:7> col0 = [0.25 0. Your m-file should have header line like: function Dnn = richardsons(col0) where Dnn is the value at the lower left corner of the table.5 1. . 0). n) while col0 is the column of n + 1 values D(i.5 0.9999 0.99 0.03125].0625 0.5 1.5 0. octave:2> richardsons(col0) ans = 0.5 0. .5 0. Test your code on the following input: octave:1> col0 = [1 0.5 1. 1. for i = 0. .5].999 0.9 0. APPROXIMATING DERIVATIVES (7.99999].96 CHAPTER 7.019042 (a) What do you get when you try the following? octave:5> col0 = [1. n. .

Nevertheless. you learned the Fundamental Theorem of Calculus.Chapter 8 Integrals and Quadrature 8. P ) = i=0 97 . you may find yourself in a situation where you have to evaluate an integral for just such an integrand. What you might not have been told in Calculus is there are some functions for which a closed form antiderivative does not exist or at least is not known to humankind. An approximation will have to do. ordered collection of nodes x i : a = x0 < x1 < x2 < · · · < xn = b. and F (x) is an antiderivative of f (x). A partition of an interval [a. P. 8.1 The Definite Integral b Often enough the numerical analyst is presented with the challenge of finding the definite integral of some function: f (x) dx. then b a f (x) dx = F (b) − F (a). which claims that if f (x) is continuous. Given such a partition. define the upper and lower bounds on each subinterval [x j . xj+1 ] as follows: mi = inf {f (x) | xi ≤ x ≤ xi+1 } Mi = sup {f (x) | xi ≤ x ≤ xi+1 } Then for this function f and partition P. define the upper and lower sums: n−1 L(f.1 Upper and Lower Sums We will review the definition of the Riemann integral of a function. a In your golden years of Calculus. P ) = i=0 n−1 mi (xi+1 − xi ) Mi (xi+1 − xi ) U (f. b] is a finite.1.

. P ) ≤ U (f. and the upper sums an overestimate of the integral. and (b) upper sums of a function on a given interval are shown.2. (ii) If we switch to a “better” partition (i. Note that the lower sums are an underestimate. Definition 8.1. we define the integral b f (x) dx = inf U (f. P ) = inf U (f. A function f is Riemann Integrable over interval [a. a P You may recall the following Theorem 8. but not necessary. . Consider the Heaviside function: f (x) = 0 x<0 1 0≤x This function is not continuous on any interval containing 0. we expect that L(f. bottomright bottomright (a) The Lower Sum (b) The Upper Sum Figure 8. b]. Notice a few things about the upper. in case f (x) is integrable.1: The (a) lower. P ). b] if sup L(f. a finer one). Moreover. ·) increases and U (f. The notion of integrability familiar from Calculus class (that is Riemann Integrability) is defined in terms of the upper and lower sums. Every continuous function on a closed bounded interval of the real line is Riemann Integrable (on that interval). Continuity is sufficient. P ). as in Figure 8.e. P ). but is Riemann Integrable on every closed bounded interval. P P where the supremum and infimum are over all partitions of the interval [a. lower sums: (i) L(f. Example 8. ·) decreases. These approximations to the integral are the sums of areas of rectangles.98 CHAPTER 8. INTEGRALS AND QUADRATURE We can interpret the upper and lower sums graphically as the sums of areas of rectangles defined by the function f and the partition P .1.3.

the infimum of f (x) on each interval occurs at the leftmost endpoint. P )) = [f (xn ) − f (x0 )] = . we have L(f. P )) . if some information is known about the function. P ). P ) − L(f.1. that is x ≤ y implies f (x) ≤ f (y). P ) − L(f. for i = 0. 1. P ). P P so this function is not Riemann Integrable. Consider the Dirichlet function: f (x) = 0 x rational 1 x irrational 99 For any partition P of any interval [a. b]. or for a black box function. we have n−1 L(f. P ) = k=0 n−1 |b − a| mi |xk+1 − xk | = n Mi |xk+1 − xk | = |b − a| n n−1 f (xk ) k=0 n−1 U (f. . P )) . using this method on some function f (x) which is monotone increasing. THE DEFINITE INTEGRAL Example 8. The method cuts the interval [a. b] by means of a number of evaluations of f at points in this interval.4. P ) = 0. it is usually not feasible to find the suprema and infima of f (x) on the subintervals. while the supremum occurs at the right hand endpoint. 2 Because the value of the integral is between L(f. 2 Note that in general. P ) = 0 = 1 = inf U (f. 2 2 n 2n 8. so sup L(f.1. xi+1 ].8. The error of a simple quadrature rule usually depends on the function .3 Simple and Composite Rules For the remainder of this chapter we will study “simple” quadrature rules. However.1. P.e. this approximation has error at most 1 (U (f. P ) = 1. f (x) over an interval [a. . i. rules which approximate the integral of a function. In this case. while U (f. The integral is then approximated by the mean of the lower and upper sums: b a f (x) dx ≈ 1 (L(f..2 Approximating the Integral b The definition of the integral gives a simple method of approximating an integral a f (x) dx. P ) + U (f. n. Thus for this partition. Consider for example. P ) = k=0 f (xk+1 ) = k=0 |b − a| n n f (xk ) k=1 Then the error of the approximation is 1 1 |b − a| |b − a| [f (b) − f (a)] (U (f. P ) and U (f. b] into a partition of n equal subintervals x i = a+ b−a . 8.5. n The algorithm then has to somehow find the supremum and infimum of f (x) on each interval [xi . . and thus the lower and upper sums cannot be calculated. . it becomes easier: Example 8.

2: The trapezoidal rule for approximating the integral of a function over [a. By old school math. Thus. we can find this signed area easily. we usually cast it into a “composite” rule. a for some unpleasant or black box function f (x).2 Trapezoidal Rule b Suppose we are trying to approximate the integral f (x) dx. for example if the interval in question is [α. This gives the (simple) trapezoidal rule: b a f (x) dx ≈ (b − a) f (a) + f (b) . f (a)) . . and with one side the segment from a to b. To use a simple rule on a larger interval. b] is shown. INTEGRALS AND QUADRATURE f. xi+1 ]. apply the simple rule to each subinterval. . f (b)) . 8. and the partition is α = x 0 < x1 < x2 < . and sum the results. which we will study next becomes the composite trapezoidal rule. β] = i=0 simple rule applied to [xi . uright a b lleft Figure 8. β]. The means of extending a simple rule to a composite rule is straightforward: Partition the given interval into subintervals. < xn = β. Thus the trapezoidal rule. b] to some power which is determined by the rule. The trapezoidal rule approximates the integral PSfrag replacements b f (x) dx a by the (signed) area of the trapezoid through the points (a. we have n−1 composite rule on [α. That is we usually think of a simple rule as being applied to a small interval. See Figure 8.100 CHAPTER 8. and the width of the interval [a. (b.2. 2 .

. β] be partitioned by α = x0 < x1 < x2 < x3 < .107149. Solution: We have h = 2−0 = 1. xi+1 ] . < xn = β. Example Problem 8. one which you saw in your calculus class. xi+1 ] by the integral of pi (x) over the same interval. then the composite trapezoidal rule is β n−1 xi+1 xi f (x) dx = α i=0 1 f (x) dx ≈ 2 n−1 i=0 (xi+1 − xi ) [f (xi ) + f (xi+1 )] .5. Now recall our theorem on polynomial interpolation error. Approximate the integral 2 0 1 dx 1 + x2 by the composite trapezoidal rule with a partition of equally spaced points.6. you saw this in the less comprehensible form: b a f (x) dx ≈ h f (a) + f (b) + 2 n−1 f (xi ) . TRAPEZOIDAL RULE 101 The composite trapezoidal rule can be written in a simplified form. Ti = xi pi (x) dx = (xi+1 − xi ) pi (xi ) + pi (xi+1 ) h = (f (xi ) + f (xi+1 )) . x i+1 − xi = h. and our approximation is correct to two decimal places. Then the composite 2 2 5 trapezoidal rule gives the value 1 11 1 1 1+1+ = . i=0 (8. (2)! . if the interval in question is partitioned into equal width subintervals. and f (x0 ) = 1. 2 2 That’s right: the composite trapezoidal rule approximates the integral of f (x) over [x i . and we have b a f (x) dx ≈ h 2 n−1 [f (xi ) + f (xi+1 )] . (8. where h = (β − α)/n. That is if you let [α. . for n = 2. i=1 Note that the composite trapezoidal rule for equal subintervals is the same as the approximation we found for increasing functions in Example 8. For x ∈ [x i . 8.2.2. [f (x0 ) + f (x1 ) + f (x1 ) + f (x2 )] = 2 2 5 10 The actual value is arctan 2 ≈ 1. f (x2 ) = 1 . Let xi+1 xi+1 Ii = xi f (x) dx.1) Since each subinterval has equal width. xi+1 .1 How Good is the Composite Trapezoidal Rule? We consider the composite trapezoidal rule for partitions of equal subintervals. f (x1 ) = 1 . Let p i (x) be the polynomial of degree ≤ 1 that interpolates f (x) at x i .8. with xi = α + ih.2) In your calculus class. we have f (x) − pi (x) = 1 (2) f (ξx ) (x − xi ) (x − xi+1 ) .

and let I = a f (x) dx. β]. xi+1 ]. xi+1 ]. We use this theorem on our integral.e. [xi . 6 This gives Ii − T i = − for some ξi ∈ [xi .7 (Mean Value Theorem for Integrals). To make things simpler. Recall that ξx depends on x. h. Thus if.102 CHAPTER 8. Suppose f is continuous. the trapezoidal rule gives an overestimate of the integral I. which lies between the least and greatest values of f on the inteval [a. for some ξi ∈ [xi . but the sign as well. Then there is some ξ ∈ [a. xi+1 ]. 12 h3 .8 (Error of the Composite Trapezoidal Rule). b].. f (x) is concave up and thus f is positive. We now sum over all subintervals to find the total error of the composite trapezoidal rule n−1 E= i=0 Ii − T i = − h3 12 n−1 i=0 f (ξi ) = − (b − a) h2 12 1 n n−1 f (ξi ) . So E=− This gives us the theorem: (b − a) h2 f (ξ) 12 Theorem 8. 12 Note that this theorem tells us not only the magnitude of the error. and thus by the IVT. b]. and wave our hands to get continuity of f (ξ(x)). Recall the following theorem: Theorem 8. i. n i=0 f (ξi ). Note that (x − x i ) (x − xi+1 ) is nonpositive on the interval of question. Let T be the value of the trapezoidal rule applied to f (x) on this interval with a partition of uniform b spacing.3. i=0 n−1 1 On the far right we have an average value. Then we have 1 Ii − Ti = f (ξ) 2 xi+1 xi (x − xi ) (x − xi+1 ) dx. Let f (x) be continuous on [a. We assume continuity of f (x). g is Riemann Integrable and does not change sign on [α. there is some ξ which takes this value. Now integrate: xi+1 Ii − T i = xi f (x) − pi (x) dx = 1 2 xi+1 xi f (ξ(x)) (x − xi ) (x − xi+1 ) dx. for example. Then there is some ζ ∈ [α. INTEGRALS AND QUADRATURE for some ξx ∈ [xi . then I − T will be negative. We will now attack the integral on the right hand side. b] such that I −T =− (b − a) h2 f (ξ). call it ξ(x). we find that xi+1 xi (x − xi ) (x − xi+1 ) dx = − h3 f (ξi ). β] such that β β f (x)g(x) dx = f (ζ) α α g(x) dx. See Figure 8. xi+1 ] . . By boring calculus and algebra.

Thus f (ξ) is continuous and bounded by 2 on [0. If we use n equal subintervals then Theorem 8. has positive second derivative.10. 2]. i. thus f (x) = − (1+x)2 .2 Using the Error Bound 1 Example Problem 8. 1]. Example Problem 8..2. 3n2 . 6n2 and so n ≥ 1 × 105 suffices. How many intervals are required to approximate the integral 2 0 x3 − 1 dx to within 1 × Solution: We have f (x) = x3 − 1. How many intervals are required to approximate the integral ln 2 = I = 0 1 dx 1+x to within 1 × 10−10 ? 1 2 1 Solution: We have f (x) = 1+x .8. we need only take 1 ≤ 1 × 10−10 . thus f (x) = 3x2 . in absolute value. And f (x) = (1+x)3 .9. the trapezoidal rule will 6 be an overestimate.3: The trapezoidal rule is an overestimate for a function which is concave up.2. TRAPEZOIDAL RULE uright Sfrag replacements 103 xi xi+1 lleft Figure 8.e. Because f (x) is positive on this interval. − f (ξ) = − 12 n 12n2 To make this smaller than 1 × 10−10 .8 tells us the error will be 1−0 1−0 2 f (ξ) . 8. If we use n equal subintervals then by Theorem 8.8 the error will be − 2−0 12 2−0 n 2 10−6 ? f (ξ) = − 2f (ξ) . Thus f (ξ) is continuous and bounded by 12 on [0. and f (x) = 6x.

the error is quartered. .3 Romberg Algorithm Theorem 8. . This should look just like something 2n from Chapter 7. a. that the error of the composite trapezoidal rule approximation is O h2 . 24 ≤ 1 × 10−6 . in absolute value. b]. Then define 2n 1b−a φ(n) = 2 2n 2n −1 f (xi ) + f (xi+1 ) i=0 2n −1 b − a f (a) f (b) + + = n 2 2 2 f i=1 a+i b−a 2n . n n n n where hn = b−a . INTEGRALS AND QUADRATURE To make this smaller than 1 × 10−6 . . We do this by constructing a triangular array of approximations. . approximately. b are given. n 64 21 a8 h8 + . n 64 This approximation has a truncation error of O h4 . . we let R(n. we would have proved the relation: b φ(n) = a f (x) dx + a2 h2 + a4 h4 + a6 h6 + a8 h8 + . . suppose that f. . the forms are exactly the same. Towards this end. n+1 n+1 n+1 n+1 1 1 1 1 a8 h8 + . . What happens if we now calculate φ(n + 1)? We have b φ(n + 1) = a b f (x) dx + a2 h2 + a4 h4 + a6 h6 + a8 h8 + .8 tells us. As mentioned above. Sometimes we want to do better than this. We’ll use the same trick that we did from Richardson extrapolation. The intervals used to calculate φ(n + 1) are half the size of those for φ(n). this means the error is one quarter.104 CHAPTER 8. It turns out that if we had proved the error theorem differently. . For a given n. . it suffices to take √ and so n ≥ 8 × 103 suffices. That is h = b−a . 0) = φ(n). f (x) dx + a2 h2 + a4 h4 + a6 h6 + n n n n 4 16 64 256 = a b−a 1 This happens because hn+1 = 2n+1 = 2 b−a = h2n . each entry depending on two others. Because f (x) is positive on this interval. . Towards this end. . If we halve h. As with Richardon’s method for approximating 2n derivatives. we now combine the right multiples of these: φ(n) − 4φ(n + 1) = 4φ(n) − φ(n + 1) = 3 −3 3 f (x) dx + 4h4 + 4 n a b 1 f (x) dx − 4h4 − 4 n a b 15 a6 h6 + n 16 5 a6 h6 − n 16 63 a8 h8 + . we are going to use the trapezoidal rule on a partition of 2n equal subintervals of [a. we can use this to get better and better approximations to the integral. 3n2 8. . the trapezoidal rule will be an overestimate. . In fact. n Like in Richardson’s method. The constants ai are a function of f (i) (x) only.

0) R(k + 1. . 0). R(1. 0) . 0) = [f (0) + f (1) + f (1) + f (2)] = . for Romberg’s algorithm. where the original h is independently set before starting the triangular array. 0) R(2. n) Even though this is exactly the same as Richardon’s method. the second is the trapezoidal rule on two subintervals. for m > 0 R(n.3) The familiar pyramid table then is: R(0. . 0) = 4−1 44 10 − 3 12 10 = 32 = 1.. R(n. .. h 0 is determined by a and b. and need not be smaller than 1. h 0 = b − a. . 0) R(n. R(n. 2) . 2) · · · R(k + 2. n) . These are fairly simple. 30 At this point we are tempted to use Richardson’s analysis. 4m − 1 105 (8. R(k + 1. ROMBERG ALGORITHM then define. Solution: The first column is calculated by the trapezoidal rule.11. 1) R(k + 2. R(k + n. 1) . . Then calculating the following array: R(k. 1) . 0) − R(0. .3. 1) = 4R(1. 2) · · · R(2. . . This would claim that R(n. 0) and R(1. . find R(1. 0) = 6 2−01 [f (0) + f (2)] = . So we first calculate R(0. 1) R(2. 2) . . it has another name: this is called the Romberg Algorithm. Approximating the integral 2 0 1 dx 1 + x2 by Romberg’s Algorithm. . . R(k + n. 0) . O h0 This is a bit different from Richardson’s method. the first is the trapezoidal rule on a single subinterval. m − 1) . 2 2 10 Then. 1). 0) R(k + 2.0¯ 6. 1) R(k + n. Then R(0. 1) R(n. using Romberg’s Algorithm we have R(1. . . m − 1) − R(n − 1. 1 2 5 11 2−01 R(1.8. say 2k smaller than 1. However. n) is a 2(n+1) approximation to the integral. . We can easily deal with this problem by picking some k such that b−a is small enough. Successive columns are found by combining members of previous columns. 0) R(k + n. . 0) R(1. Example Problem 8. m) = 4m R(n.

. That is. rather lower entries in a single column are calculated instead. 1) R(k + n + 1. the user calculates the array: R(k. 0) R(k + 2. 1) R(k + 2. Subtractive cancelling or unbounded higher derivatives of f (x) can make successive approximations less accurate. first notice from the above example that R(0. R(k + n. 2 2 2 2 It would be best to calculate R(1. n) makes a fine approximation to the integral as l → ∞. f (a) + f (b) 1 + = hn 2 2 1 = R(n. 0) given R(n. . . and recall that 2n R(n. n) R(k + n + 1. . 0) . INTEGRALS AND QUADRATURE Quite often Romberg’s Algorithm is used to compute columns of this array. 8. . For this reason. i=1 2n −1 i=1 2n −1 i=1 f (a + ihn+1 ) . 0) R(k + n. .106 CHAPTER 8. . 1) R(k + n + 3. Then R(k + n + l. entries in ever rightward columns are usually not calculated. n) R(k + n + 3. . like 2 or 3. 0) without recalculating f (a) and f (b). Usually n is small. R(k + n. 2n+1 Then calculating R(n + 1. Let hn = b−a . . 0) = φ(n) = hn Thus f (a) + f (b) R(n + 1. 2) · · · R(k + n + 2. 0) requires only 2 n − 1 additional evaluations of f (x). n) R(k + n + 2. 0) = φ(n + 1) = hn+1  + 2 = 1 f (a) + f (b) hn + 2 2  2n+1 −1 i=1 f (a) + f (b) + 2 2n −1 f (a + ihn ) . instead of the + 1 usually required. . 2) ··· R(k + n + 1. 2) · · · R(k + n + 3. 0) R(k + n + 1. . 0) R(k + 1. R(k + 2. It turns out this is possible. . . . . 0) = ) + f( ) + f (b) . 1) . 0). . 0) R(k + n + 3. 1 2 a+b b−a1 a+b f (a) + f ( R(1. 0) R(k + n + 2.. 2n −1 i=1 f (a + (2i − 1)hn+1 ) . 0) = b−a1 [f (a) + f (b)] .  f (a + (2i − 1)hn+1 ) + f 2n −1 1 a + (2i) hn 2 . 1) R(k + n. 2) · · · . R(k + 1. 2) . 0) + hn+1 2 f (a + ihn ) + i=1 f (a + (2i − 1)hn+1 ) .3. . .1 Recursive Trapezoidal Rule It turns out there is an efficient way of calculating R(n + 1. n) . . 1) R(k + n + 2.

1 (x) = (1 − 0)(1 − 2) (x − 0)(x − 1) 1 = (x)(x − 1).8. Thus our rigged approximation is the one that gives b a b n b f (x) dx ≈ p(x) dx = a b i=0 f (xi ) a i (x) dx. We will examine how these rules are created. 1. . Normally one finds the nodes and weights i=0 in a table somewhere. the polynomial of degree ≤ n which interpolates f (x) on these nodes. we expect a quadrature rule with more nodes to be more accurate in some sense–the tradeoff is in the number of evaluations of f (·). 8. Construct a quadrature rule on the interval [0. (8. 2 (x) = (2 − 0)(2 − 1) 2 0 (x) = Then the weights are 4 A0 = 0 4 0 (x) dx = A1 = 0 4 A2 = 0 8 1 (x − 1)(x − 2) dx = . If f (x) is a polynomial of degree ≤ n then f (x) = p(x).1 Determining Weights (Lagrange Polynomial Method) n Suppose that the nodes {xi }i=0 are given.12. 4] using nodes 0. An easy way to find “good” weights {A i }n for these i=0 nodes is to rig them so the quadrature rule gives the integral of p(x).4.e. b a f (x) dx ≈ A0 f (x0 ) + A1 f (x1 ) + .4. i. Solution: The nodes are given. and the quadrature rule is exact. . 1 (x − 1)(x − 2) = (x − 1)(x − 2). If we let Ai = a i (x) dx.. and integrating them. An f (xn ). 2 (x) dx = 2 3 0 4 . Example Problem 8. 3 0 2 4 16 −(x)(x − 2) dx = − . 1 (x) dx = 3 0 4 1 20 (x)(x − 1) dx = . we determine the weights by constructing the Lagrange Polynomials. and weights {Ai }n . Recall n p(x) = i=0 f (xi ) i (x). where i (x) is the ith Lagrange polynomial. then we have a quadrature rule. GAUSSIAN QUADRATURE 107 8.4) n for some collection of nodes {xi }i=0 . (0 − 1)(0 − 2) 2 (x − 0)(x − 2) = −(x)(x − 2).4 Gaussian Quadrature The word quadrature refers to a method of approximating the integral of a function as the linear combination of the function at certain points. 2.

1. . The method of undetermined coefficients is a good alternative for finding the weights by hand. The equations are derived by letting the quadrature rule be exact for f (x) = x j for j = 0. setting b a n xj dx = i=0 Ai (xi )j . and for small n. By calculus we have 4 4 1 76 64 x2 + 1 dx = x3 + x = +4= . 2. Construct a quadrature rule on the interval [0.12.4.108 Thus our quadrature rule is 4 0 CHAPTER 8. 3 3 3 3 8. That is. 1. solving the linear system may not be too burdensome. Example Problem 8. as illustrated by the next example: . the method of undetermined coefficients is useful in more general settings. .” Since we will not consider n to be very large. n. let f (x) = x 2 +1. 3 3 Notice the difference compared to the Lagrange Polynomial Method: undetermined coefficients requires solution of a linear system. . 4] using nodes 0. we reconsider Example Problem 8. Moreover. For example.2 Determining Weights (Method of Undetermined Coefficients) Using the Lagrange Polynomial Method to find the weights A i is fine for a computer. 3 3 3 0 0 The approximation is 4 0 x2 + 1 dx ≈ 16 20 76 8 [0 + 1] − [1 + 1] + [4 + 1] = . A0 = 8 . The idea behind the method is to find n+1 equations involving the n+1 weights. A1 = − 16 . the system:  4 1 1 1 2 8  1 4 64/3 We get the same weights as in Example Problem 8. 3 3 3 We expect this rule to be exact for a quadratic function f (x). . while the former method calculates the weights “directly. on an exam). Solution: The method of undetermined coefficients gives the equations: 4 1 dx = 4 = A0 + A1 + A2 0 4 x dx = 8 = A1 + 2A2 0 4 0 x2 dx = 64/3 = A1 + 4A2 . To illustrate this. but can be tedious (and error-prone) if done by hand (say. INTEGRALS AND QUADRATURE 16 20 8 f (x) dx ≈ f (0) − f (1) + f (2).13.12: A 2 = We perform Na¨ Gaussian Elimination on ıve  1  0 0 20 3 .

r(x) are of degree ≤ n. or those of higher degree. a Thus a b b b b f (x) dx = a p(x)q(x) dx + a r(x)dx = a r(x)dx.3 Gaussian Nodes It would seem this is the best we can do: using n + 1 nodes we can devise a quadrature rule that is exact for polynomials of degree ≤ n by choosing the weights correctly. we can do far better. We should check: 1 0 x3 dx = 1/4 = 1/2 = 1 − 3/2 + 1 = A 1 + B 3 + C 6. We get these equations by plugging in successive polynomials. Gauss discovered that the right nodes to choose are the n + 1 roots of the (nontrivial) polynomial. you could say that q(x) is orthogonal to the polynomials xk in the resultant inner product space. but that’s just fancy talk. B = −1/2. and thus the rule is not exact for cubic polynomials. That is. but it might be better.8. x. This rule should be exact for polynomials of degree no greater than 2. C = 1/6. 8.14. Let f (x) be a polynomial of degree ≤ 2n + 1. Both p(x). x 2 and assuming the coefficients give equality: 1 1 dx = 1 = A 1 + B 0 + C 0 = A 0 1 x dx = 1/2 = A 1 + B 1 + C 0 = A + B 0 1 0 x2 dx = 1/3 = A 1 + B 2 + C 2 = A + 2B + 2C This is solved by A = 1.) Suppose that we have such a q(x)–we will not prove existence or uniqueness.4. . we plug in f (x) = 1. Determine a “quadrature” rule of the form 1 0 109 f (x) dx ≈ Af (1) + Bf (1) + Cf (1) that is exact for polynomials of highest possible degree.4. q(x). We write f (x) = p(x)q(x) + r(x). (If you view the integral as an inner product. of degree n + 1 which has the property b a xk q(x) dx = 0 (0 ≤ k ≤ n) . Because of how we picked q(x) we have b p(x)q(x) dx = 0. GAUSSIAN QUADRATURE Example Problem 8. It turns out that by choosing the nodes in the right way. What is the highest degree polynomial for which this rule is exact? Solution: Since there are three unknown coefficients to be determined. we look for three equations.

Notice. Let Ai be the coefficients for these nodes chosen by integrating the Lagrange Polynomials. however. Theorem 8. Then n n n Ai f (xi ) = i=0 i=0 Ai [p(xi )q(xi ) + r(xi )] = i=0 Ai r(xi ). c0 + c1 x + c2 x2 dx = c0 x + c1 x2 + c2 x3 dx = 0 Evaluating these integrals gives the following system of linear equations: 8 2c0 + 2c1 + c2 = 0. That is.” that is. 2]. then so does q (x) = αq(x). The last equality holds because the x i are the roots of q(x). that if q(x) satisfies the orthogonality conditions. Thus this rule is exact for f (x). The example is illustrative Example Problem 8. q(x). Thus we let q(x) = c 0 + c1 x + c2 x2 . of degree n + 1 which has the property b a xk q(x) dx = 0 (0 ≤ k ≤ n) . 8.16.4 Determining Gaussian Nodes We can determine the Gaussian nodes in the same way we determine coefficients. Let x i be the n + 1 roots of a (nontrivial) polynomial. The orthogonality condition becomes 2 2 1q(x) dx = 0 2 0 0 xq(x) dx = 0 2 0 that is. Because of how the A i are chosen we then have n n b b Ai f (xi ) = i=0 i=0 Ai r(xi ) = a r(x) dx = a f (x) dx. there are two equations.110 CHAPTER 8.4. With great foresight. Find the two Gaussian nodes for a quadrature rule on the interval [0. Then the quadrature rule for this choice of nodes and coefficients is exact for polynomials of degree ≤ 2n + 1. but three unknowns. x under the inner product of integration over [0. This reduces the equations to 2c0 + 2c1 = −8. 2]. 3 This system is “underdetermined. for any real ˆ number α. which is “orthogonal” to 1. 3 .15 (Gaussian Quadrature Theorem). we can pick the scaling of q(x) as we wish. 8 2c0 + c1 = −12. Gauss has) made quadrature twice as good. INTEGRALS AND QUADRATURE Now suppose that the Ai are chosen by Lagrange Polynomials so the quadrature rule on the nodes xi is exact for polynomials of degree ≤ n. We have (or rather. 3 8 2c0 + c1 + 4c2 = 0. we “guess” that we want c 2 = 3. Solution: We will find the function q(x) of degree 2.

A1 such that √ √ 2 3 3 f (x) dx ≈ A0 f (1 − ) + A1 f (1 + ) 3 3 0 is exact for polynomial f (x) of degree ≤ 1. 6 3 111 These nodes are a bit ugly. c1 = −6.8. . A1 such that 2 0 2 0 1 dx = A0 + A1 √ √ 3 3 x dx = A0 (1 − ) + A1 (1 + ) 3 3 This gives the equations 2 = A 0 + A1 √ √ 3 3 2 = A0 (1 − ) + A1 (1 + ) 3 3 This is solved by A0 = A1 = 1. It suffices to make this approximation exact for the “building blocks” of such polynomials. That is. we want to construct A 0 . c2 = 3. Remember. Then our nodes are the roots of q(x) = 2 − 6x + 3x 2 . that is. for the functions 1 and x. Rather than construct the Lagrange Polynomials. it suffices to find A0 . Thus our quadrature rule is 2 0 √ √ 3 3 ) + f (1 + ).17. Chapter 3) yields the answer c 0 = 2. Verify the results of the previous example problem for f (x) = x 3 . 0 The quadrature rule gives √ 3 3 f (1 − ) + f (1 + )= 3 3 √ √ 3 √ 3 3 3 1− + 1+ 3 3 √ √ √ √ 3 3 3 3 3 3 3 3 + 1+3 = 1−3 +3 − +3 + 3 9 27 3 9 27 √ √ √ √ = 2 − 3 − 3/9 + 2 + 3 + 3/9 = 4 Thus the quadrature rule is exact for f (x). Solution: We have 2 f (x) dx = (1/4) x4 2 0 = 4. That is the roots √ √ 6 ± 36 − 24 3 =1± . we will use the method of undetermined coefficients. Example Problem 8. GAUSSIAN QUADRATURE Simple Gaussian Elimination (cf.4. f (x) dx ≈ f (1 − 3 3 We expect this rule to be exact for cubic polynomials.

1.16 by using the technique of Example Problem 8. Thus −1 f (x) dx ≈ n Ai f (xi ).16. This is the same rule that was derived (with much more work) in Example Problem 8.18. Derive the quadrature rules of Example Problem 8. it doesn’t make sense in practice to recompute these things all the time. Thus g(t) = b−a f 2 b+a b−a t+ 2 2 = f (t + 1). The standard Gaussian Quadrature rule approximates this as 1 −1 g(t) dt ≈ g(− 1/3) + g( 1/3) = f (1 − 1/3) + f (1 + 1/3).com/Legendre-GaussQuadrature.1.html n 0 1 xi x0 = 0 x0 = − 1/3 x1 = 1/3 x0 = − 3/5 x1 = 0 x2 = 3/5 Ai A0 = 2 A0 = 1 A1 = 1 A0 = 5/9 A1 = 8/9 A1 = 5/9 1 2 Table 8.wolfram. g(t) is a polynomial of the same degree. 1]. any good textbook will list a few. i=0 with this relation holding exactly for all polynomials of degree no greater than 2n + 1. To integrate f (x) over [0. x= 2 2 2 Then b 1 b+a b−a b−a f (x) dx = t+ dt. Quadrature rules are normally given for the interval [−1. .5 Reinventing the Wheel While it is good to know the theory. so dx = dt. 1] on g(t) to evaluate the integral of f (x) over [a. Given a quadrature rule which is good on the interval [−1. b = 2. 2].1: Gaussian Quadrature rules for the interval [−1. b]. Solution: We have a = 0. g(t) = 2 2 2 if f (x) is a polynomial. we integrate g(t) over [−1.112 CHAPTER 8.19. 1]. The simplest ones are given in Table 8. See also http://mathworld. There are books full of quadrature rules. as the following example problem illustrates: Example Problem 8. 1]. On first consideration. it would seem you need a different rule for each interval [a. Thus we can use the quadrature rule for [−1. derive a version of the rule to apply to the interval [a.18 and the quadrature rules of Table 8. Solution: Consider the substitution: b+a b−a b−a t+ . This is not the case.4. b]. b]. INTEGRALS AND QUADRATURE 8. 1]. f 2 2 2 a −1 Letting b−a b+a b−a f t+ . Example Problem 8.

1 . 1] for f (x) = 4/(1 + x2 )) 3 (8. 1 + x2 to within 0.2) Use the composite trapezoidal rule. 3} . 1 . to approximate 3 0 113 x2 dx (= 9) Use the partition {xi }2 = {0. + 2f (xn−2 ) + 4f (xn−1 ) + f (xn )] . Why is your approximation an overestimate? (Check: i=0 4 2 I think the answer is 0. by hand to approximate 1 0 4 dx.7) (8. 1] are required to approximate 0 cos x dx with error smaller than 1 × 10−6 by the composite trapezoidal rule? (Use Theorem 8. You should find that the approximation is an overestimate.8. 1 (8.6) How many equal subintervals would be required to approximate 1 0 4 dx. Why is your approximation an overestimate? i=0 (8.8 to bound the error of the composite trapezoidal rule approximation of 2 3 0 x dx with n = 10 intervals. What is the highest degree polynomial for which this rule is exact? . by hand. 1). by hand.1) Use the composite trapezoidal rule. 1 .4) Use Theorem 8. 3 where ∆x = (b − a)/n.3) Use the composite trapezoidal rule.8.693) x+1 Use the partition {xi }3 = 0. GAUSSIAN QUADRATURE Exercises (8.9) Find a quadrature rule of the form 1 0 f (x) dx ≈ Af (0) + Bf (1/2) + Cf (1) that is exact for polynomials of highest possible degree. to approximate 1 0 1 dx (= ln 2 ≈ 0.7) How many equal subintervals of [2. As such we expect Simpson’s Rule to be a O h4 approximation to the integral. (8. .0001 by the composite trapezoidal rule? (Hint: Use the fact that |f (x)| ≤ 8 on [0. How good is your answer? (8. and n is assumed to be even. 3] are required to approximate 2 ex dx with error smaller than 1 × 10−3 by the composite trapezoidal rule? (8.8) Simpson’s Rule for quadrature is given as b a f (x) dx ≈ ∆x [f (x0 ) + 4f (x1 ) + 2f (x2 ) + 4f (x3 ) + . Show that Simpson’s Rule for n = 2 is actually given by Romberg’s Algorithm as R(1.4. . 1.5) How many equal subintervals of [0.) (8. 1 + x2 Use n = 4 subintervals.

consider the following questions: do you expect the nodes to be the endpoints 1 and 5? do you expect the nodes to be arranged symmetrically around the midpoint of the interval? (8. 5]. 1].a. (8.. Your m-file should have header line like: function iappx = gauss2(f.b. which could be difficult. 1]. b] by the composite trapezoidal rule on n equal subintervals. If f is defined to work on vectors element-wise. (8. find a rule 1 −1 f (x) dx ≈ Af (x0 ) + Bf (x1 ) + Cf (x2 ) To find the nodes x0 .e. x1 ./ n.n) You may wish to use the code: x = a . you can probably speed up your computation by using bigsum = 0. i. it does not exactly fit our definition of a quadrature rule.* (0:n) . x2 you will have to find the zeroes of a cubic equation. INTEGRALS AND QUADRATURE (8. Your m-file should have header line like: function iappx = trapezoidal(f.a. b]. but it may be applicable in some situations.11) Determine a “quadrature” rule of the form 1 0 f (x) dx ≈ Af (0) + Bf (0) + Cf (1) that is exact for polynomials of highest possible degree. i.17) Write code to implement composite Gaussian Quadrature based on code from the previous problem. For what order polynomials are these rules exact? (8. However.16) Write code to implement the Gaussian Quadrature rule for n = 2 to integrate f on the interval [a.13) Find the Gaussian Quadrature rule with 2 nodes for the interval [1.114 CHAPTER 8.10) Determine a “quadrature” rule of the form 1 −1 f (x) dx ≈ Af (0) + Bf (−1) + Cf (1) that is exact for polynomials of highest possible degree. find a rule 5 1 f (x) dx ≈ Af (x0 ) + Bf (x1 ) Before you solve the problem.e. Something like the following probably works: .14) Find the Gaussian Quadrature rule with 3 nodes for the interval [−1.. What is the highest degree polynomial for which this rule is exact? (8.+ (b-a) . What is the highest degree polynomial for which this rule is exact? (Since this rule uses derivatives of f.12) Consider the so-called order n Chebyshev Quadrature rule: 1 −1 n f (x) dx ≈ cn f (xi ) i=0 Find the weighting cn and nodes xi for the case n = 2 and the case n = 3.b) (8.15) Write code to approximate the integral of a f on [a.5 * ( f(x(1)) + f(x(n+1)) ) + sum( f(x(2:(n))) ). you may use the simplifying assumption that the nodes are symmetrically placed in the interval [−1.) (8.

b] x = a .* (0:n) ./ n.683.b. iappx = 0. or see http://mathworld. and erf(2/ 2) ≈ 0. for i=1:n iappx += gauss2(f.x(i+1)).com/Erf.n) % code to approximate integral of f over n equal subintervals of [a. GAUSSIAN QUADRATURE function iappx = gaussComp(f.+ (b-a) .x(i). These numbers should look familiar to you.wolfram. .955.4.a.html) The error function is used in probability. end Use your code to approximate the error function: 2 erf(z) = √ π z 0 115 e−t dt. (Try help erf in octave. 2 Compare your results with the octave/Matlab builtin function erf. the probability that a normal random variable is within z standard deviations from its mean is √ erf(z/ 2) √ √ Thus erf(1/ 2) ≈ 0.8. In particular.

116 CHAPTER 8. INTEGRALS AND QUADRATURE .

1. That is r = [r0 r1 . or residual. For a given p with 0 < p < ∞. . . . if f ∗ is the least squares best approximant. y from a class of functions. the p norm of r is defined as n 1/p p ri i=0 2 r p = For the purposes of approximation. The least-squares best approximant to a set of data. .1. In the general setting we are given a set of data x y x 0 x1 . Usually the class of functions F will be determined by some small number of parameters. is the function f ∗ ∈ F that minimizes the 2 norm of the error. The goal then is to find the “best” f ∈ F to fit the data to y = f (x). That is we let x = [x0 x1 .Chapter 9 Least Squares 9.1 The Definition of Ordinary Least Squares First we consider the data to be two vectors of length n + 1.1. F. The error. x n y 0 y1 .1 Least Squares Least squares is a general class of methods for fitting observed data to a theoretical model function. . We measure the size of a vector by the use of norms. and y = [y0 y1 . F. then y − f ∗ (x) 2 = min y − f (x) f ∈F 2 117 . . 9. the easiest norm to use is the norm: Definition 9. yn ] . x. of a given function with respect to this data is the vector r = y − f (x). The theory here will be concerned with defining “best.1 ((Ordinary) Least Squares Best Approximant). That is. . xn ] .” and examining methods for finding the “best” function to fit the data. which were explored in Subsection 1. Our goal is to find f ∈ F such that r is reasonably small. . rn ] . The most useful norms are the p norms. . . the number of parameters will be smaller (usually much smaller) than the number of data points.4. y n and some class of functions. where ri = yi − f (xi ).

By Definition 9. We are assuming F = {f (x) = ax + b | a. It assumes there is no error in the measurement of the data x. We set 0= 0= ∂φ a ∂φ b n = k=0 n 2xk (axk + b − yk ) 2 (axk + b − yk ) = k=0 These are called the normal equations. we want to find the function f (x) = ax + b that minimizes n i=0 1/2 [yi − f (xi )] 2 . and usually admits a relatively straightforward solution. We minimize this function with calculus. 9. Believe it or not these are two equations in two unknowns. and is ugly. b ∈ R } . That is. LEAST SQUARES We will generally assume uniqueness of the minimum.1.118 CHAPTER 9. x and y. it suffices to minimize n i=0 [axi + b − yi ]2 . This method is sometimes called the ordinary least squares method.2 Linear Least Squares We illustrate this definition using the class of linear functions as an example. As you may recall from calculus class. This is a minimization problem over two variables.1. They can be reduced to x2 a + k xk b = xk yk yk xk a + (n + 1)b = The solution is found by na¨ Gaussian Elimination. Let ıve d11 = d12 = d21 = e1 = e2 = We want to solve d11 a + d12 b = e1 d21 a + d22 b = e2 x2 k xk xk yk yk d22 = n + 1 .1. it suffices to minimize the sum rather than its square root. as this case is reasonably simple. We can now think of our job as drawing the “best” line through the data.

1. where the function αf is that function such that (αf )(x) = αf (x). j=0 c0 g0 (x) + c1 g1 (x) + . then αf + βg ∈ F. However. cm ) = k=0   j This can be rearranged to get Again we set partials to zero and solve  n ∂φ 2  = 0= ci k=0 m cj gj (xk ) − yk     2 (9. that is we minimize the function n φ (c0 . i. ˜ ˜ To find the least squares best approximant of F for a given set of data. Example 9. but not all cases. = c m = 0 Then we say that F is spanned by the functions {g j (x)}m if j=0 F=    f (x) = j cj gj (x) | cj ∈ R. g ∈ F. for a moment. . c1 . consider whether this is indeed the solution. In this case F can be viewed as a vector space over the real numbers. . . Note the basis functions need not be unique: a given class of functions will usually have more than one choice of basis functions. is the span of a small set of functions. This case is simpler to explore and we consider it here. and g1 (x) = x. Our calculations have only shown an extrema at this choice of (a. . 1. and α. LEAST SQUARES Gaussian Elimination produces a = b = d22 e1 − d12 e2 d22 d11 − d12 d21 d11 e2 − d21 e1 d22 d11 − d12 d21 119 The answer is not so enlightening as the means of finding the solution. j = 0.1.1) j cj gj (xk ) − yk  gi (xk ) n gj (xk )gi (xk ) cj = j=0 k k=0 yk gi (xk ) .e.9.. . for f. could it not be a maxima? 9. b ∈ R } is spanned by the two functions g 0 (x) = 1. . . . F. + cm gm (x) = 0 ∀x ⇒ c 0 = c1 = . The class F = {f (x) = ax + b | a. m    . Now let {gj (x)}m be a set of m + 1 linearly independent functions. and g1 (x) = x − 1. the class of functions. β ∈ R. We should. That is. . b). . it is also spanned by the two functions g 0 (x) = 2x + 1. we minimize the square of the 2 norm of the error. .1.3 Least Squares from Basis Functions In many. . In this case the functions gj are basis functions for F.

this reduces to: 4. and g 0 (x) = ln x.  . For example. . Note that we are talking about the basis {g j }m . ei = k=0 yk gi (xk ).75 3. Consider the case where m = 0.713938 1. First we have to evaluate the basis functions at the nodes xi . so we compute some of the d ij .0 2.436975 1. We explore this in Section 9. The data and the least squares interpolant are shown in Figure 9. .   . . dm0 dm1 dm2 · · · to the linear system  d02 · · · d0m d12 · · · d1m    d22 · · · d2m    . consider what would happen if the system of normal equations were diagonal.120 If we now let dij = k=0 CHAPTER 9.50 0. g1 (x1 ) = .0 −1. .25 2.165234 1. and not exactly about the class j=0 of functions F.0960c = 6. g0 (x1 ) = 0.7052. x2 = −1. dmm (again. .75 1.0 1. x1 = 1.2)    . Suppose the data are given at the nodes x 0 = 0.452472 −0.2 reduces to the 1-D equation: Σk ln2 xk c = Σk yk ln xk For our data. Example 9. called the normal equations):    e0 c0 c1   e1     c2  =  e2  (9.   . .2.068077 0. .  .50 2. The choice of the basis functions can affect how easy it is to solve this system. Then we have reduced the problem  d00 d01  d10 d11   d20 d21   . Consider the awfully chosen basis functions: g0 (x) = 2 − 1 x2 − x + 1 2 2 −1 x2 + x + 1 g1 (x) = x3 + where is small. We want to set up the normal equations. LEAST SQUARES n n gj (xk )gi (xk ). Find the least squares approximation of the data x y 0. .187098 −0. around machine precision.1. But this example was rigged to give: g0 (x0 ) = 1.841422 Solution: Essentially we are trying to find the c such that c ln x best approximates the data. . solving the system would be rather trivial. In this case.9844 so we find c = 1. g1 (x2 ) = 0 After much work we find we want to solve 1+ 1 2 1 1+ 2 c0 c1 = y0 + y 2 y0 + y 1 .  .. The system of equation 9. em cm Example Problem 9.725919 1.3.  . g0 (x2 ) = g1 (x0 ) = 1.  . .2.

This certainly gives a basis for F. To see why. Consider the case where F is the class of polynomials of degree no greater than m. gj (x) = Tj (x). 1]. but actually a rather poor one.5 -2 -2. A better basis for this problem is the set of Chebyshev Polynomials of the first kind. For simplicity we will assume that all xi are in the interval [0.5 -1 -1. look at the graph of the basis functions in Figure 9. Roughly speaking. Bummer. this has no solution.2 and the least squares interpolant are shown.5 1 0. The most obvious choice of basis functions is gj (x) = xj . which will be done by Gaussian Elimination. When the gi are small at the nodes xk . if gi (xk ) is small for some i’s and k’s. then some d ij can have a loss of precision when two small quantities are multiplied together and rounded to zero. However.3. Ti+1 (x) = 2xTi (x) − Ti−1 (x).5 0 -0. ORTHONORMAL BASES 2 121 1. This kind of thing is common in the method of least squares: the coefficients of the normal equations include terms like gi (xk )gj (xk ).1: The data of Example Problem 9.e. the computer would only find this if it had infinite precision.5 1 1. the computer thinks 2 = 0. T1 (x) = x. 9. We find they do a pretty poor job of discriminating around all the nodes.2. .9.. The basis functions look too much alike on the given interval.5 2 2. and since is rather small. these coefficients can get really small. Poor choice of basis vectors can also lead to numerical problems in solution of the normal equations. where T0 (x) = 1.5 0. Now we draw a rough sketch of the basis functions.5 3 Figure 9. and so tries to solve the system 1 1 1 1 c0 c1 = y0 + y 2 y0 + y 1 When y1 = y2 . since we are squaring. Since it does not.2 Orthonormal Bases In the previous section we saw that poor choice of basis vectors can lead to numerical problems. i.

. 1. and g1 (x) = x3 + 2 − 1 x2 + x + 1 are shown for = 0.05. These polynomials are illustrated in Figure 9. 1. i = 0. .5 1 0. .5 0 -0. . j = 0. We consider other methods.5 0 0. First. n.4.Sfrag replacements 122 1.5 CHAPTER 9. PSfrag replacements xi xi+1 -1. .2 0.4 0.4 0. we define A as the n × m matrix defined by the entries: aij = gj (xi ).3: The polynomials xi for i = 0.1 Alternatives to Normal Equations It turns out that the Normal Equations method isn’t really so great. m. .8 1 9. .5 1 0. LEAST SQUARES g0 (x) g1 (x) 1 0.6 0. 1. These polynomials make a bad basis because they look so much alike. . Note that around the three nodes 0. . 6 are shown on [0.5 -1 Figure 9. Figure 9.2 0 0 -1 -0.8 0. .2: The basis functions g0 (x) = 2 − 1 x2 − 2 x + 1. . −1. .2. these two functions take nearly identical values. . 1.6 0. 1]. This can lead to a system of normal equations with no solution. essentially.

. . . 1]. that is we want A r = A (y − A c) = 0. That is c is m identified with f (x) = j=0 cj gj (x). Basically. 1.5 -1 0 123 0. . .9. . Sfrag replacements xi xi+1 1 0. . . (9.. so A is “tall.5 0 -0. And thus the residual. . gm (x0 ) gm (x1 ) gm (x2 ) gm (x3 ) gm (x4 ) . g2 (x0 ) g2 (x1 ) g2 (x2 ) g2 (x3 ) g2 (x4 ) .8 1 g0 (xn ) g1 (xn ) g2 (xn ) · · · = (y − A c) (y − A c) .4 0. You should now convince yourself that Ac = f (x). . .2. they do not look like each other. with a function in F. ORTHONORMAL BASES Figure 9. These polynomials make a better basis for least squares because they are orthogonal under some inner product. ··· ··· ··· ··· ··· . gm (xn )            We write it in this way since we are thinking of the case where n m. we find that the Normal Equations can be written as: A A c = A y. or error. . in the natural way. .4: The Chebyshev polynomials T j (x) for j = 0.6 0. of this function is r = y − Ac. For this reason.” After some inspection. 6 are shown on [0. That is      A=      g0 (x0 ) g0 (x1 ) g0 (x2 ) g0 (x3 ) g0 (x4 ) . we will have that the residual is orthogonal to the column space of A. g1 (x0 ) g1 (x1 ) g1 (x2 ) g1 (x3 ) g1 (x4 ) . In our least squares theory we attempted to find that c that minimized y − Ac 2 2 We can see this as minimizing the Euclidian distance from y to A c.2 0.3) Now let the vector c be identified.

The general equation we wish to solve is Ac = y.e. and does not suffer the increased condition number of the normal equations method. Under the hood. However. in the following form: find c. however. is a random variable. LEAST SQUARES This is just the normal equations. consider the case of two variables. and that there is no error in the independent variables.2 Ordinary Least Squares in octave/Matlab The ordinary least squares best approximant is so fundamental that it is hard-wired into octave/Matlab’s system solve operator.2. We could rewrite this. that the measurements are “unbiased. and. For example. the ordinary least squares solution is a very good one. the error of the ith measurement. i.1 However. locate the Gauss-Markov Theorem in a good statistics textbook. that i=1 i=1 yi = mxi + b + i . unless specifically told otherwise. that there is actually error in the measurement of the x i ? In this case. Thus. r such that y r I A = 0 c A 0 This is now a system of n + m variables and unknowns.. which can be solved by specialized means. For more details on this.1. A number of measurements are made. gives the estimators with the lowest variance among all linear. We briefly mention that na¨ Gaussian Elimination is not ıve appropriate to solve the augmented form. .e. i. more equations than unknown variables. 1 This means that the ordinary least squares solution gives an unbiased estimator of m and b. i.124 CHAPTER 9.. which are thought to be related by an equation of the form y = mx + b. It is usually assumed that E [ i ] = 0. This system is overdetermined: there are more rows than columns in A. where i . this is solved just like the case where A is square: octave:1> c = A \ y. This is known as the augmented form. moreover. x and y. 9.e. what if it were the case that yi = m (xi + ξi ) + b + i .” assumes that measurement error is found entirely in the dependent variable. a priori estimators. sometimes referred to as “ordinary least squares.3 Orthogonal Least Squares The method described in Section 9. giving the two sequences {xi }n and {yi }n . Ordinary least squares assumes. 9.” Under the further assumption that the errors are independent and have the same variance. the orthogonal least squares method is appropriate. you should not implement the ordinary least squares solution method in octave/Matlab. octave/Matlab is not using the normal equations method.. in octave/Matlab. as it turns out to be equivalent to using the normal equations method.

Let L (n. as shown in (a). d) = 1. reflecting the assumption of no error in x i . shown in (b). expressed as vectors in Rn . 2 n 2 i=1 2 is the one to be minimized with respect to n and d. Let x(i) i=1 be the set of observations. The function m 2 1 n x(i) − d . d.5: The ordinary least squares method. minimizes the sum of the squared lengths of the vertical lines from the observed points to the regression line.5(b). The offsets from points to the regression line in (b) may not look orthogonal due to improper aspect ratio of the figure.9. Thus our problem is to find n and d that solve 2 m m min n n=1 i=1 n x(i) − d 2 2 . It gives the sum of the squared distances from the points to the plane described by n and d.5.5(a). as shown in Figure 9. We can express this as min f (n. 7 7 6 6 5 5 4 4 3 3 2 2 1 0 1 2 3 4 5 6 1 0 1 2 3 4 5 6 (a) Ordinary Least Squares (b) Orthogonal Least Squares Figure 9. We construct the solution geometrically. we will minimize it subject to the constraint that n 2 = 1. d) − λg(n.3. The solution of this problem uses the Lagrange multipliers technique. minimizes the sum of the squared distances from the points to the line. the ordinary least squares method is shown. To simplify its computation. We can solve this problem using vector calculus. d). The orthogonal least squares. d) subject to g(n. λ) = f (n. The Lagrange multipliers theory tells us that a necessary . In Figure 9. ORTHOGONAL LEAST SQUARES 125 The difference between the two methods is illustrated in Figure 9. The orthogonal least squares method will minimize the sum of the squared distances from observations to a line. The problem is to find the vector n and number d such that the hyperplane n x = d is the plane that minimizes the sum of the squared distances from the points to the plane. it minimizes the sum of the squared lengths of vertical lines from observations to a line.

We have 1m Xn ˆ 1m Xn + f(n) = n X Xn − 2 1m 1m n X 1m 1m Xn = n X Xn − 2 + 1m 1m n X 1m 1m Xn = n X Xn − 2 + 1m 1m n X 1m 1m Xn = n X Xn − 1m 1m X 1m 1m X n =n X X− 1m 1m = n Mn 1m Xn 1m 1m 1m 1m n X 1m 1m Xn1m 1m 1m 1m 1m 1m n X 1m 1m Xn 1m 1m 2 . d. this gives 0 = X Xn − X 1m 1m X X 1m 1m X n − λn = X X − − λI n 1m 1m 1m 1m Thus n is an eigenvector of the matrix M=X X− X 1m 1m X 1m 1m The final condition of the three Lagrange conditions is that g(n. d. d) = 1 L (n. λ) = 2d1m 1m − 21m Xn ∂d ⇒ d = 1m Xn/1m 1m This essentially tells us that we will zero the first moment of the data around the line. ˆ when the optimal d is used: f (n) = f (n. d. so the multiplication can be commuted. LEAST SQUARES condition for n.126 CHAPTER 9. λ) = (Xn − d1m ) (Xn − d1m ) − λn n We solve the necessary condition on 0= ∂L ∂d . d to be the solution is that there exists λ such that the following hold:   ∂L (n. Note that this determination of d requires knowledge of n. We can rewrite the Lagrange function as = n X Xn − 2d1m Xn + d2 1m 1m − λn n ∂L (n. First we rewrite the function to be minimized. then it is a unit eigenvector of M. d. It is not clear which eigenvector is the right one. Thus if n is a minimizer. λ) = 2X Xn − 2d 1m X − 2λn = 2 X Xn − 1m Xn 1m X 1m 1m − λn The middle term is a scalar times a vector. d) = n n = 1. 1m Xn/1m 1m ). λ) = 0  ∂d L (n. further work tells us which eigenvector it is. since this is a necessary condition. However. we can plug it into the gradient equation: 0= n L (n. However. λ) = 0  n  g (n. so we might have to check all the eigenvectors. Let X be the m × n matrix whose ith row is the vector x(i) . d.

whose roots are the eigenvalues of the matrix is p(λ) = det 2α − λ β β 2γ − λ = 4αγ − β 2 − 2(α + γ)λ + λ2 . and used the fact that a scalar is it’s own transpose. by the orthogonal least squares method. Example Problem 9. x2 xy ¯ ¯¯ xy y 2 ¯¯ ¯ 2α β β 2γ 2 x xi − m¯2 xi yi − m¯y x¯ xi yi − m¯y x¯ 2 y yi − m¯2 (9. then the matrix M must be positive definite.e. Thus one should select.and y-values..  .4) . . Solution: In this case the matrix X is   x1 y1  x2 y2     x3 y3     . you may compute that M has some negative eigenvalues. xm ym M = X X− = = = = X 1m 1m X . Note that. where we use x. d) is defined as w w. However. The characteristic ¯ ¯ polynomial of the matrix. . 1m 1m 1 xi yi x2 i − 2 yi xi yi m x2 i xi yi xi yi 2 yi −m Thus we have that ( xi ) 2 xi yi . Note that because f (n. as n.3. The associated eigenvector is in −k 1 = β − k (2α − λ− ) −kβ + 2γ − λ− 2α − λ− β β 2γ − λ− . then n Mn = λ. Thus we take n to be the unit eigenvector associated with the smallest eigenvalue. to mean. thus 1 m Xn = n X 1m . we want to minimize λ. Now if n is a unit eigenvector of M. i. by roundoff. the unit eigenvector associated with the smallest eigenvalue in absolute value. . λ must be positive. y . for some vector w.   . λ− = (α + γ) − the null space of the matrix 0= 2α − λ− β β 2γ − λ− v= (α − γ)2 + β 2 . though they are small in magnitude. respectively. xi ( yi ) 2 yi . yi )}i=1 . The roots of this polynomial are given by the quadratic equation λ± = 2(α + γ) ± 4 (α + γ)2 − 4 (4αγ − β 2 ) 2 = (α + γ) ± (α − γ)2 + β 2 We want the smaller eigenvalue. with corresponding eigenvalue λ.4. ORTHOGONAL LEAST SQUARES 127 This sequence of equations only required that the d be the optimal d for the given n. the mean of the x. Find the equation of the line which best approximates the 2D data m {(xi .9.

A square matrix U is called unitary if its inverse is its transpose. ¯ ¯ 9.1 Computing the Orthogonal Least Squares Approximant Just as the Normal Equations equation 9. The second equation tells that the rows of U are also such a collection. . 1m 1m Now note that W W= X− 1m 1m X 1m 1m X− 1m 1m X 1m 1m = X X−2 1m 1m X 1m 1m X X 1m 1m X + 2 1m 1m 1m 1m =X X−2 =X X− X 1m 1m X X 1m (1m 1m )1m X X 1m 1m X X 1m 1m X + + =X X−2 2 1m 1m 1m 1m 1m 1m 1m 1m X 1m 1m X = M. v. However it is parallel to the unit vector we want. y − y = k (x − x) . and can still use the regular formula to compute the optimal d.1.e. i. LEAST SQUARES β 2 (yi − x2 ) − m y 2 − x2 + ¯ ¯ i 2 2 (yi − x2 ) − m (¯2 − x2 ) y ¯ i 2 + 4( xi yi − m¯y x¯ or xi yi − m¯y )2 x¯ . but are not normally used to solve the problem. (9.. d = 1m Xn/1m 1m = = x y ¯ ¯ −k 1 1 1m X m −k 1 = y − k¯ ¯ x Thus the optimal line through the data is y = kx + d = k (x − x) + y or ¯ ¯ with k defined in equation 9.5) Thus the best plane through the data is of the form −kx + 1y = d y = kx + d Note that the eigenvector. 1m 1m We now need some linear algebra magic: Definition 9. From the first equation. U U = I = UU . that we used is not a unit vector.3. so too is the above described means of computing the orthogonal least squares not a good idea.128 This is solved by (γ − α) + 2γ − λ− = k= β = (γ − α)2 + β 2 CHAPTER 9.3 define the solution to the ordinary least squares approximant. we see that the columns of U form a collection of orthonormal vectors.3.5. First we define 1m 1m X W =X− .

9.3. ORTHOGONAL LEAST SQUARES Definition 9.3.2. Every m × n matrix B can be decomposed as B = UΣV ,

129

where U and V are unitary matrices, U is m × m, V is n × n, and Σ is m × n, and has nonzero elements only on its diagonal. The values of the diagonal of Σ are the singular values of B. The column vectors of V span the row space of B, and the column vectors of U contain the column space of B. The singular value decomposition generalizes the notion of eigenvalues and eigenvalues to the case of nonsquare matrices. Let u(i) be the ith column vector of U, v (i) the ith column of V, and let σi be the ith element of the diagonal of Σ. Then if x = i αi v (i) , we have  0 σ1 α1 0 σ 2 α2 . . . . . . 0 0 0 0 . . . . . . 0 0 ... ... .. . ... ... .. . ... 0 0 . . . 

Bx = UΣV
i

αi v (i)

     = UΣ [α1 α2 . . . αn ] = U      

     σ n αn  =  0   .  .  . 0

σi αi u(i) .
i

Thus v (i) and u(i) act as an “eigenvector pair” with “eigenvalue” σ i . The singular value decomposition is computed by the octave/Matlab command [U,S,V] = svd(B). The decomposition is computed such that the diagonal elements of S, the singular values, are positive, and decreasing with increasing index. Now we can use the singular value decomposition on W: W = UΣV . Thus M = W W = VΣ U UΣV = VΣ IΣV = VS2 V ,
2 where S2 is the n × n diagonal matrix whose ith diagonal is σi . Thus we see that the solution to the orthogonal least squares problem, the eigenvector associated with the smallest eigenvalue of M, is the column vector of V associated with the smallest, in absolute value, singular value of W. In octave/Matlab, this is V(:,n). This may not appear to be a computational savings, but if M is computed up front, and its eigenvectors are computed, there is loss of precision in its smallest eigenvalues which might lead us to choose the wrong normal direction (especially if there is more than one!). On the other hand, M is a n × n matrix, whereas W is m × n. If storage is at a premium, and the method is being reapplied with new observations, it may be preferrable to keep M, rather than keeping W. That is, it may be necessary to trade conditioning for space.

9.3.2

Principal Component Analysis

The above analysis leads us to the topic of Principal Component Analysis. Suppose the m × n matrix X represents m noisy observations of some process, where each observation is a vector in Rn . Suppose the observations nearly lie in some k-dimensional subspace of R n , for k < n. How can you find an approximate value of k, and how do you find the k-dimensional subspace?

130 As in the previous subsection, let W =X− 1m 1m X ¯ = X − 1m X, 1m 1m

CHAPTER 9. LEAST SQUARES

¯ where X = 1m X/1m 1m .

¯ Now note that X is the mean of the m observations, as a row vector in R n . Under the assumption ¯ that the noise is approximately the same for each observation, we should think that X is likely to be in or very close to the k-dimensional subspace. Then each row of W should be like a vector which is contained in the subspace. If we take some vector v and multiply Wv, we expect the resultant vector to have small norm if v is not in the k-dimensional subspace, and to have a large norm if it is in the subspace. You should now convince yourself that this is related to the singular value decomposition of W. Decomposing v = i αi v (i) , we have Wv = i σi αi u(i) , and thus product vector has small norm if the αi associated with large σi are small, and large norm otherwise. That is the principal directions of the “best” k-dimensional subspace associated with X are the k columns of V associated with the largest singular values of W.
1 cum. sum sqrd. sing. vals.

0.9

0.8

0.7

0.6

0.5

0.4

0.3 0 5 10 15 20 25 30 35 40

Figure 9.6: Noisy 5-dimensional data lurking in R 40 are treated to Principal Component Analysis. This figure shows the normalized cumulative sum of squared singular values of W. There is a sharp “elbow” at k = 5, which indicates the data is actually 5-dimensional. If k is unknown a priori, a good value can be obtained by eyeing a graph of the cumulative sum of squared singular values of W, and selecting the k that makes this appropriately large. This is illustrated in Figure 9.6, with data generated by the following octave/Matlab code: octave:1> octave:2> octave:3> octave:4> X = rand(150,5); sawX = (X * randn(5,40)) + kron(ones(150,1),randn(1,40)); sawX += 0.1 * randn(150,40); wun = ones(150,1);

9.3. ORTHOGONAL LEAST SQUARES octave:5> octave:6> octave:7> octave:8> octave:9> W = sawX - (wun * wun’ * sawX) / (wun’ * wun); [U,S,V] = svd(W); eigs = diag(S’*S); cseig = cumsum(eigs) ./ sum(eigs); plot ((1:40)’,cseig,’xr-@;cum. sum sqrd. sing. vals.;’);

131

3 0.10) Any symmetric positive definite matrix. y. express the approximate m in terms of the α.5. y n (9.3. In this exercise you will find the best linear function which approximates a set of data.3) Find the constant that best approximates.1 0.3 using a single basis function g 0 (x) = 1. say. LEAST SQUARES (9.2 or equation 9. finds the c that solves ˆ min Ac − y c 2 G Prove that the normal equations form of the solution of this problem is A GA c = A Gy. the data x 1 −2 5 y 1 −2 4 (9.5 (9. (9.4) Find the function f (x) = c which best approximates. β. the given data x. x n y 0 y1 . ˆ .9) For ordinary least squares regression to the line y = mx + b.132 Exercises CHAPTER 9. a maximum? (9. G can be used to define a norm in the following way: √ v G =df v Gv The weighted least squares method for G and data A and y. in the ordinary least squares sense.1. . (9.2) Our “linear” least squares might be better called the “affine” least squares. and not.1) How do you know that the choice of constants c i in our least squares analysis actually find a minimum of equation 9. find the function f (x) = cx which is the ordinary least squares best approximant to the given data x y x 0 x1 . (Hint: you can use equation 9. .) Do the xi values affect your answer? (9. Compare this to that found in equation 9. . .8) Prove that the least squares solution is the solution of the normal equations. in the ordinary least squares sense. (9. That is. (9.7) Find the function ax2 + b that best approximates the data x y 0 1 2 0.6) Find the function ax + b that best approximates the data of the previous problem using the orthogonal least squares method. equation 9.4. and thus takes the form −1 c= A A A y. and γ parameters from equation 9.5) Find the function ax + b that best approximates the data x y 0 −1 2 0 1 −1 Use the ordinary least squares method.

let octave/Matlab solve that linear system for you.15) Write code to find the function ae x + be−x that best interpolates. a set of data {(xi . 2 . .6.5 0 0. (b) Prove that the r 2 parameter is “scale-invariant. where L is a lower triangular matrix. . an is defined as ( ai )1/n . . (9.13) As proved in Example Problem 9.b] = lsqrexp(xys) (b) Try the following: octave:5> xys = xys = [-0. the orthogonal least squares best approximate line for x ¯ data {(xi . in the least squares sense. It is defined as r = where c is the approximant A A−1 A y.00001 0. octave:4> [a. (Hint: To find x such that Mx = b. octave:2> ys = [1. . Use the normal equations method. in the least squares sense. .12) The r 2 statistic is often used to describe the “goodness-of-fit” of the ordinary least squares approximation to data.3. use x = M \ b. How does your answer relate to the geometric mean? (9. yi )}m goes through the point (¯.034]’.9.7]. octave:3> xys = [xs ys].1.5 1]’.3.322 1. Your m-file should have header line like: . yi )}i=n . the parameter is unchanged (although the value of c may change.1.4.11) (Continuation) Let the symmetric positive definite matrix G have Cholesky Factorization G = LL .0.14) Find the constant c such that f (x) = ln (cx) best approximates. This should reduce the problem to solving a 2 × 2 linear system.b] = lsqrexp(xys) where xys is the (n + 1) × 2 matrix whose rows are the n + 1 tuples (x i . (9.430 0.) (a) What do you get when you try the following? octave:1> xs = [-1 -0. x n y y 0 y1 .16) Write code to find the plane n x(i) = d that best interpolates. Your m-file should have header line like: i=0 function [a. y n (Hint: You cannot use basis functions and equation 9.00001 0. Show that the solution to the weighted least squares method is the c that is the ordinary (unweighted) least squares best approximate ˆ solution to the problem L Ac = L y.0 0. .194 0. octave:6> [a. a2 . Does the same thing hold for the ordinary i=1 least squares best approximate line? (9.) The geometric mean of the numbers a 1 . if we change the units of our data. . . yi ). ORTHOGONAL LEAST SQUARES 133 (9.2 to solve this.” that is.103 0.b] = lsqrexp(xys) Does something unexpected happen? Why? (9. a i=m set of data x(i) i=0 . y ). You must use the Definition 9. the given data x x 0 x1 . ˆ (a) Prove that we also have r2 = 1 − Aˆ − y c y 2 2 2 2 2 Aˆ c y 2 2 2 .) ˆ (c) Devise a similar statistic for the orthogonal least squares approximation which is scaleinvariant and rotation-invariant. in the least squares sense.

b = 1.1. octave:8> orthm = -n(1) / n(2). octave:2> Xsub = rand(m. Now X contains data which are approximately on a (n − 1)-dimensional hyperplane in Rn .1. Use the octave/Matlab eig function."). octave:4> YSaw = Ytrue + yeps * randn(size(Ytrue)). which control the sizes of the errors in the x and y dimensions.+ b.* randn(size(X)). octave:3> T = rand(n-1. octave:12> plot(XSaw. octave:6> X += epsi .orthogonal least squares. which returns the eigenvalues and vectors of a matrix.0. octave:14> plot(XSaw.m*XSaw . lay in the rowspace of T. yeps = 0. octave:15> hold off.YSaw.0.d] = orthlsq(X).d] = orthlsq([XSaw. octave:2> Xtrue = randn(obs. octave:7> [n."*.0.offs = 2. Try with xeps = 0.observed data."r-. What do you get? Try again with epsi very small (1 × 10 −5 ) or zero. YSaw]).orthm*XSaw . when centered.+ lsb.(n-1)). Now test your code. . y data nearly on the line y = mx + b: octave:1> obs = 100.m = 0.1). octave:4> X = Xsub * T.orthb = d / n(2).true line.epsi = 0. octave:13> plot(XSaw. . octave:3> XSaw = Xtrue + xeps * randn(size(Xtrue))."). octave:7> [n. Try this again with different values of xeps and yeps. and with with xeps = 1. (a) Create artificially planar data by the following: octave:1> m = 100.+ orthb. Plot the two regression lines. You should have that the vector n is orthogonal to the rows of T."). ones(obs.d] = orthlsq(X) where X is the m × n matrix whose rows are the m tuples (x 1 . yeps = 1. Consider x.n)."b-.").1.xeps = 1.n = 5. x2 . octave:6> lsm = lsmb(1).375.Ytrue = m .0.lsm*XSaw . octave:11> hold on.ordinary least squares. octave:9> hold off. lsb = lsmb(2). LEAST SQUARES function [n. Compare the estimated coefficients from both solutions. octave:10> plot(XSaw. . Check this: octave:8> disp(norm(T * n)). (b) Compare the orthogonal least squares technique to the ordinary least squares in a two dimensional setting."g-.1)] \ YSaw. xn ).134 CHAPTER 9.* Xtrue . Under which situations does orthogonal least squares outperform ordinary least squares? Do they give equally erroneous results in some situations? Can you think of a way of “reweighting” channel data to make the orthogonal least squares method more robust in cases of unequal error variance? .* ones(size(X)). octave:5> X += offs . The data.+ b.yeps = 1. octave:5> lsmb = [XSaw. .

x(a) = c. They are usually easier to solve or approximate than the more general Partial Differential Equations. For more background on ODEs. 1 has solution x(t) = 3 t3 − 1 t2 + K.1) In calculus classes. For example the ODE given by = t2 − t. Much of the background material of this chapter should be familiar to the student of calculus.g. x(a) = c. which can contain partial derivatives. x(t)) . x(t)) is independent of its second variable. thus we turn to approximation schemes. you probably studied the case where f (t. We will focus more on the approximation of solutions. 2 A more general case is when f (t. 135 . or ODEs.Chapter 10 Ordinary Differential Equations Ordinary Differential Equations. PDEs. e. see any of the standard texts. 10. where K is chosen such that x(a) = c. You may recall that the solution to such a separable ODE usually involved the following equation: dx(t) dt dx = h(x) g(t) dt Finding an analytic solution can be considerably more difficult in the general case. x) is separable.. h.1 Elementary Methods A one-dimensional ODE is posed as follows: find some function x(t) such that = f (t. [8]. than on analytical solutions. that is f (t. dx(t) dt (10. are used in approximating some physical systems. x) = g(t)h(x) for some functions g. Some classical uses include simulation of the growth of populations and trajectory of a particle.

x(t)) . Note this is essentially the same as the “forward difference” approximation we made to the derivative. 2 But we cannot evaluate this exactly. we can write 1 1 m (m) 1 x(t + h) = x(t) + hx (t) + h2 x (t) + .e. maybe the formula would work.1 Integration and ‘Stepping’ We attempt to solve the ODE by integrating both sides.2 Taylor’s Series Methods We see that our more accurate integral approximations will be useless since they require information we do not know. x(r)) dr. we get x(t + h) ≈ x(t) + hf (t. The difference between the actual . We can also view this as using integral approximations where all information comes from the left-hand endpoint.136 CHAPTER 10. x(t + h))] . If we apply the left-hand rectangle rule for approximating integrals to equation 10. dt t+h t+h yields thus (10. Bummer.2. By Taylor’s theorem. (10. + h x (t) + hm+1 x(m+1) (τ ). Thus we fall back on Taylor’s Theorem (Theorem 1.1. This is the idea behind the Runge-Kutta Method (see Section 10. If we can approximate the integral then we have a way of ‘stepping’ from t to t + h. say by using Euler’s Method. Since τ is essentially unknown.1. ORDINARY DIFFERENTIAL EQUATIONS 10. x(r)) dr. But if we could approximate it.. We also say that this approximation is a truncation of the Taylor’s series. And it is embedded on the right hand side. for each step.1). x(t)).. . 10. Note that this means that all the work we have put into approximating integrals can be used to approximate the solution of ODEs. since x(t + h) appears on both sides of the equation. evaluations of f (t. + 2 m! This approximate solution to the ODE is called a Taylor’s series method of order m. x) for yet unknown x values. t+h x(t + h) = x(t) + t f (r. our best calculable approximation to this expression is 1 m (m) 1 h x (t). i. For simplicity we usually use the same step size.2). That is dx(t) = f (t. Using stepping schemes we can approximate x(t f inal ) given x(tinitial ) by taking a number of steps. . i. though nothing precludes us from varying the step size.2) dx = t t f (r. x(t + h) ≈ x(t) + hx (t) + h2 x (t) + . x(t)) + f (t + h. h. equation 7. if x has at least m + 1 continuous derivatives on the interval in question.3) This is called Euler’s Method.1. .e. 2 m! (m + 1)! where τ is between t and t + h. if we have a good approximate of x(t) we can approximate x(t + h). . Trapezoid rule gives x(t + h) ≈ x(t) + h [f (t.

dx(t) dt The actual solution is x(t) = et . Consider the ODE PSfrag replacements = x.1. x(t)). as the actual solution. 10. We discuss this more in Subsection 10.5.5 2 2.1. In Figure 10. Example 10. 22 20 18 16 14 12 10 8 6 4 2 0 0 0.1. called roundoff error.1. x(t) = e t has positive curvature. The truncation error exists independently from any error in computing the approximation. Each step proceeds on the linearization of the curve. 10. Euler’s Method will underestimate x(t) because the curvature of the actual x(t) is positive.1: Euler’s Method applied to approximate x = x.10. x) is given as a “black box” function. and is thus an underestimate. and thus the function is always above its linearization.1. . Here step size h = 0.4 Higher Order Methods Higher order methods are derived by taking more terms of the Taylor’s series expansion.1. x(0) = 1.5 1 1. Note however. x(0) = 1. that they will require information about the higher derivatives of x(t).5 3 Euler approximation Actual Figure 10.3 Euler’s Method When m = 1 we recover Euler’s Method: x(t + h) = x(t) + hx (t) = x(t) + hf (t. and thus may not be appropriate for the case where f (t. ELEMENTARY METHODS 137 value of x(t + h).3 is used. we see that eventually the Euler’s Method approximation is very poor. and this approximation is called the truncation error.

5 Errors There are two types of errors: truncation and roundoff. We have already observed this kind of behaviour. x(0) = 1. x(a) = c + . Roundoff error occurs because of the limited precision of computer arithmetic. x(t)) . the optimal h-value will usually depend on the given ODE and the approximation method. while the roundoff error increases.2. You can see that the truncation error is O hm+1 . Unfortunately. it is generally impossible to predict the optimal h.6 Stability Suppose you had an exact method of solving an ODE. but instead fed the wrong data into the computer. you were trying to solve dx(t) dt = f (t. Truncation error is the error of truncating the Taylor’s series after the mth term. which then solved exactly the ODE = f (t. This leads to the topic of stability. dx(t) dt . an error in the value of x(t) can cause a greater error in the value of x(t + h). but had the wrong data. x(a) = c.5. While it is possible these two kinds of error could cancel each other. 10. we usually assume they are additive.138 CHAPTER 10. x(t)) . It turns out that for some ODEs and some methods. Solution: By simple calculus: x (t) = x + ex x (t) = x + ex x x (t) = x (1 + ex ) + x ex x We can then write our step like a program: x (t) ← x(t) + ex(t) dx(t) dt x (t) ← x (t) 1 + ex(t) 1 1 x(t + h) ≈ x(t) + hx (t) + h2 x (t) + h3 x (t) 2 6 x (t) ← x (t) 1 + ex(t) + x (t) 2 x(t) e 10.2. This results in an optimal h-value for a given method. this does not happen. As h is made smaller. Derive the Taylor’s series method for m = 3 for the ODE = x + ex . Both of these kinds of error can accumulate: Since we step from x(t) to x(t + h). For an ODE with unknown solution. ORDINARY DIFFERENTIAL EQUATIONS Example Problem 10. That is.1. the truncation error presumably decreases. when analyzing approximate derivatives–cf.1. Example Problem 7. See Figure 10. given our pessimistic worldview. We also expect some tradeoff between these two errors as h varies.

We illustrate this with the usual example. The former ODE exhibits the opposite behavior– accrued errors will be amplified. and total error for a fictional ODE and approximation method are shown. Consider the ODE dx(t) = x.2: The truncation.3(a).1. Sfrag replacements 139 truncation roundoff total error 1e-06 1e-05 0. dt This has solution x(t) = x(0)et .10.1. The curves for different starting conditions diverge.” as seen in Figure 7. as in Figure 10.0001 0. dt which has solution x(t) = x(0)e−t .01 . roundoff.3.01 1e-07 Figure 10. This solution can diverge from the correct one because of the different starting conditions. ELEMENTARY METHODS 10000 1000 100 10 1 0. In reality.0002. if we instead consider the ODE dx(t) = −x. However. thus the approximation could be Euler’s Method.1 0. Thus the latter ODE exhibits stability: roundoff and truncation errors accrued at a given step will become irrelevant as more steps are taken. The total error is apparently minimized for an h value of approximately 0.3(b). as in Figure 10. Note that the truncation error appears to be O (h). we see that differences in the initial conditions become immaterial.001 0. Example 10. we expect the roundoff error to be more “bumpy.

5 1.6 0.5 = −0. ORDINARY DIFFERENTIAL EQUATIONS 12 10 = −0.8 0.4 = −0.6 1.5 2 (a) (1 + )et 1.5 1 1.4 0.25 =0 = 0.2 0 0 0. (1+ )e−t are shown.5 = −0.5 8 6 4 PSfrag replacements 2 0 0 0.5 1 1. while in (b). the functions (1 + )e t are shown.25 =0 = 0.PSfrag replacements 140 CHAPTER 10.3: Exponential functions with different starting conditions: in (a).2 1 0.25 = 0. . The latter exhibits stability.25 = 0.5 2 (b) (1 + )e−t Figure 10. while the former exhibits instability.

We can prove this without too much pain. However. s) = fx (t. ∂s ∂t ∂s ∂s By the independence of t. for t ≥ a. If we take the derivative of the ODE we get ∂ x(t. ∂s ∂t ∂s We use continuity of x to switch the orders of differentiation: ∂ ∂ ∂ x(t. if fx < −δ for a positive δ. s). giving stability. we have ∂ u(t) = q(t)u(t). x(t. and so lim u(t) = 0. it may suffer from instability. then lim Q(t) = −∞. ∂t ∂s ∂s Defining u(t) = ∂ ∂s x(t. If f x > δ for some positive δ. Define x(t. x(t. Some ODEs fall through this test. Similarly if q(r) ≤ −δ < 0. and q(t) = fx (t. x(a) = s. then the ODE is instable. ∂t ∂ ∂ ∂ x(t. which may make the method inappropriate or impractical.1. x(t. x(t)) + fx (t. . and so lim u(t) = ∞.. x(t)) . where t Q(t) = a q(r) dr. it can be recast into a method which may have superior stability at the cost of limited applicability. s) = ft (t.7 Backwards Euler’s Method Euler’s Method is the most basic numerical technique for solving ODEs. s) = f (t. ∂s Think about this in terms of the curves for our simple example x = x. ELEMENTARY METHODS 141 It turns out there is a simple test for stability. s). But note that u(t) = ∂ ∂s x(t. and thus we have instability.10.1. s)). i. s) x(t. s) = ∞. so ∂ ∂ ∂x(t. x(r. then lim Q(t) = ∞. 10. s). s)). However. s the first part of the RHS is zero. s) = fx (t. however. If q(r) = fx (r. s) x(t. s)) x(t. s)). the equation is stable.e. ∂t The solution is u(t) = ceQ(t) . x(t. s) = f (t. x(t. s)) ≥ δ > 0. s)) . s) to be the solution to = f (t. x(t. s)) . ∂s ∂t ∂s ∂ ∂ ∂t ∂x(t. Then instability means that t→∞ dx(t) dt lim ∂ x(t.

Letting tf = t + h. is a stepping method: given a reasonable approximation to x(t). x(0) = 2π. . we have to find x(h) such that x(h) = x(0) + cos x(h). 8 g(y) 6 4 2 0 -2 -4 -6 2 4 6 8 10 12 Figure 10. . ORDINARY DIFFERENTIAL EQUATIONS This method. Taylor’s Theorem states that x(tf − h) = x(tf ) + (−h)x (tf ) + (−h)2 x (tf ) + . it calculates an approximation to x(t + h). the second order term is truncated to give x(t) = x(t + h − h) = x(tf − h) ≈ x(tf ) + (−h)x (tf ) and so. and the Runge-Kutta methods in the following section are explicit methods: they can be used to directly compute an approximate step. Example 10. if we wish to approximate x(0 + h) using Backwards Euler’s Method. As usual. The techniques of Chapter 4 are needed to find the zero of this function. Consider the ODE = cos x.4: The nonincreasing function g(y) = 2π + cos y − y is shown. . We make this clear by examples.4.4. This is equivalent to finding a root of the equation PSfrag replacements 0 dx(t) dt g(y) = 2π + cos y − y = 0 The function g(y) is nonincreasing (g (y) is nonpositive). and has a zero between 6 and 8. Assuming that x has two continuous derivatives on the interval in question. . 2 for an analytic function. In this case. thus cannot be used to calculate an approximation for x(t+h). Since this approximation may contain x(t + h) on both sides. we call this method an implicit method. In contrast. x(t + h) ≈ x(t) + hx (t + h) The complication with applying this method is that x (t+h) may not be computable if x(t+h) is not known.142 CHAPTER 10. as seen in Figure 10. Euler’s Method. the so-called Backwards Euler’s Method. Taylor’s Theorem is the starting point.

dx(t) dt 143 The actual solution is x(t) = et . RUNGE-KUTTA METHODS Example 10. x(0) = 1.2 Runge-Kutta Methods dx(t) dt Recall the ODE problem: find some x(t) such that = f (t. gives the approximation: x(t + h) = x(t) + hx(t + h).3. 1−h Using Backwards Euler’s Method gives an over estimate.5. x(a) = c. as shown in Figure 10. through the tangent at that point.1. as shown in Figure 10.5 1 1.1: = x. x(t)) . 10. which shows the same problem approximated by Euler’s Method.5: Backwards Euler’s Method applied to approximate x = x. with the same stepsize. This effect is not evident in this case.5. as each step is to a point on a curve ke t .2.5 2 2. .10. 40 PSfrag replacements Backwards Euler approximation Actual 35 30 25 20 15 10 5 0 0 0. x(0) = 1. Consider the ODE from Example 10. The approximation is an overestimate. Using Backwards Euler’s Method.1.5 3 Figure 10. h = 0. Generally Backwards Euler’s Method is preferred over vanilla Euler’s Method because it gives equivalent stability for larger stepsize h. This is because each step proceeds on the tangent line to a curve ke t to a point on that curve. Compare this figure to Figure 10. x(t) x(t + h) = . Euler’s Method underestimates x(t).

y) + 1 2 h fxx (¯. c are given.e. We will pick another choice. y) ∂ 2 f (x.1 Taylor’s Series Redux We fall back on Taylor’s series. y) + kfy (x. in this case the 2-dimensional version: ∞ f (x + h. x ¯ For practice. 10. ∂x ∂y ∂ 2 f (x. Note that w 1 = 1. x (t). y ) + 2hkfxy (¯. y). y). y) ∂ 2 f (x. y) +k . y) = h 2 ∂f (x. .2 Deriving the Runge-Kutta Methods We now introduce the Runge-Kutta Method for stepping from x(t) to x(t + h). y) = f (x. y + k) = i=0 1 i! h ∂ ∂ +k ∂x ∂y i f (x. y) ∂f (x. The thing in the middle is an operator on f (x.. we will write this for n = 1: f (x + h. y) + 1 (n + 1)! h ∂ ∂ +k ∂x ∂y n+1 f (¯. .2. that is: h h h ∂ ∂ +k ∂x ∂y ∂ ∂ +k ∂x ∂y ∂ ∂ +k ∂x ∂y 0 f (x. y) + k2 + 2hk . . The Runge-Kutta Methods (don’t ask me how it’s pronounced) seek to resolve this. The partial derivative operators are interpreted exactly as if they were algebraic terms. This makes these methods less useful for the general setting of f being a blackbox function. which suffers inaccuracy. w2 = 0 corresponds to Euler’s Method. y) = h2 . or you are using a higher order method which requires evaluations of higher order derivatives of x(t). x) K2 ← hf (t + αh. β. x ¯ where (¯. y ) + k 2 fyy (¯. y + k) = i=0 1 i! h ∂ ∂ +k ∂x ∂y i f (x. i. y) and (x + h. Recall that the Taylor’s series method has problems: either you are using the first order method (Euler’s Method).2. x + βK1 ). There is a truncated version of Taylors series: n f (x + h. and does not require computation of K 2 . ORDINARY DIFFERENTIAL EQUATIONS where f.144 CHAPTER 10. w1 . We now examine the “proper” choices of the constants. etc. y ) is a point on the line segment with endpoints (x. y + k). x (t). y ). We suppose that α. y). w2 are fixed constants. y) + hfx (x. ∂x∂y ∂x2 ∂y 2 f (x. We then compute: K1 ← hf (t. Then we approximate: x(t + h) ← x(t) + w1 K1 + w2 K2 . 1 f (x. a. y ) x ¯ x ¯ x ¯ 2 10. y + k) = f (x.

our choice of the constants makes the approximate x(t + h) good up to a O h3 term. 6 . x + K1 /2) K3 ← hf (t + h/2. x + K3 ) 1 x(t + h) ← x(t) + (K1 + 2K2 + 2K3 + K4 ) . w1 + w2 = 1.2. x + K1 ) 1 x(t + h) ← x(t) + (K1 + K2 ) . x) + f (t + h. 2 Another choice is α = β = 2/3. w1 = 1/4. x) + β 2 h2 f 2 fx x(t. x) .. x) K2 ← hf (t + h/2. The next Runge-Kutta Method is the order four method: K1 ← hf (t. x) + αβf h2 ftx (t. 2 2 This can be written (and evaluated) as K1 ← hf (t. x + βK1 ) = f (t + αh.e.5) K4 ← hf (t + h. RUNGE-KUTTA METHODS 145 Also notice that the definition of K2 should be related to Taylor’s theorem in two dimensions. Sometimes this is not enough and higher-order Runge-Kutta Methods are used.10. because we end up with the Taylor’s series expansion up to that term. x) + f 4 4 t+ 2h 2h . x + βhf (t.x + f (t. This gives x(t + h) ← x(t) + 3h h f (t. 2 Now reconsider our step: x(t + h) = x(t) + w1 K1 + w2 K2 = x(t) + w1 hf + w2 hf + w2 αh2 ft + w2 βh2 f fx + O h3 . x)) 1 2 2 ¯¯ ¯¯ ¯¯ = f + αhft + βhf fx + α h ftt (t. 1 The usual choice of constants is α = β = 1. w2 = 3/4. x + K2 /2) (10. 2 If we happen to choose our constants such that K2 ← hf (t + h. cool. then we get 1 x(t + h) = x(t) + hx (t) + h2 (ft + f fx ) + O h3 2 1 = x(t) + hx (t) + h2 x (t) + O h3 . x)). w 1 = w2 = 2 . 2 i. This gives the second order RungeKutta Method : h h x(t + h) ← x(t) + f (t. x) αw2 = 1 = βw2 . x) . 3 3 (10.4) The Runge-Kutta Method of order two has error term O h3 . x + hf (t. Let’s look at it: K2 /h = f (t + αh. = x(t) + (w1 + w2 ) hx (t) + w2 h2 (αft + βf fx ) + O h3 . = x(t) + (w1 + w2 ) hx (t) + w2 h2 (αft + βf fx ) + O h3 .

6. w2 = 1.2. a. ORDINARY DIFFERENTIAL EQUATIONS This method has order O h5 .3 Systems of ODEs dx(t) dt Recall the regular ODE problem: find some x(t) such that = f (t. Sometimes the physical systems we are considering are more complex. but rather a collection of two ODEs: . For example.3 Examples Example 10.1 to compute x(1.7. where f. y(t)) . 2 2 This is called the modified Euler’s Method .html) The Runge-Kutta Method can be extrapolated to even higher orders. x(t).com/Runge-KuttaMethod. We already saw that w1 = 1. we might be interested in the system of ODEs:   dx(t) = f (t. (See http://mathworld.  x(a) = c.  x(0) = 1. w2 = 0 corresponds to Euler’s Method. Example 10. dx(t) dt You should immediately notice that this is not a system at all. Thus the methods of order higher than four are normally not used.1) using both Taylor’s Series Methods and Runge-Kutta methods of order 2. Consider the following system of ODEs:   dx(t) = t2 − x  dt   dy(t) 2 dt = y + y − t. x + f (t. y(t)) . x(a) = c. = t2 − x x(0) = 1. 10. However.  dt   dy(t) dt = g (t. the number of function evaluations grows faster than the accuracy of the method. c are given. 10.    y(a) = d. Note this gives a different value than if Euler’s Method was applied twice with step size h/2. The method becomes x(t + h) ← x(t) + hf t+ h h . x) . x(t).146 CHAPTER 10.wolfram.    y(0) = 0. We consider now the case w1 = 0. Consider the ODE: x = (tx)3 − x(1) = 1 x 2 t Use h = 0. x(t)) .

3. x1 (t).1 Larger Systems There is no need to stop at two functions. We will not consider uncoupled systems. x2 (t). y) is independent of x.   . is. y(0) = 0.3 can be written as X(t + h) ← X(t) + hF (t. Euler’s Method. Most of the methods we looked at for solving one dimensional ODEs can be rewritten to solve higher dimensional problems. . . coupled. . xn (t) such that                dx1 (t) dt dx2 (t) dt = f1 (t. and g(t. . We call such a system uncoupled. x. . . We may imagine we have to solve the following problem: find x1 (t).. .10.. xn (t)) . xn (a) = cn . .. + X (k) (t). As in 1D. .  . this method is derived from approximating X by its linearization. dt x1 (a) = c1 . . dxn (t) = fn (t..   . Compare this to equation 10. F =  2 . x1 (t). . . there is nothing new but notation: Let         c1 f1 x1 x1  c   f   x   x  X =  2 . dy(t) dt 147 These two ODEs can be solved separately. In fact. . our general strategy of using Taylor’s Theorem also carries through without change. x2 (t). . .3. x2 (t). X =  2 .1.3. In an uncoupled system the function f (t. That is we can use the k th order method to step as follows: X(t + h) ← X(t) + hX (t) + hk h2 X (t) + . Then we want to find X such that The idea is to not be afraid of the notation. of course. For example. . X(t)) X (a) = C.  cn fn xn xn X (t) = F (t. . xn (t)) . 2 k! . x2 (t). . X(t)) .6) 10.   . then do exactly the same thing as for the one dimensional case! That’s right. y) is independent of y. . A system which is not uncoupled. x. and want to find X(t + h). .. . x2 (a) = c2 . 10. (10.2 Recasting Single ODE Methods We will consider stepping methods for solving higher order ODEs. . That is. = f2 (t. we know X(t). . C =  2 . equation 10. xn (t)) ... . x1 (t). write everything as vectors. SYSTEMS OF ODES and = y 2 + y − t. ..

X(0) + K1 ) =  0.061 K2 ← 0. X + K2 2 2 K4 ← hF (t + h.1F (0. X(0)) =  2  −1   Euler’s Method then makes the approximation      0. Consider the system of ODEs:  2  x1 (t) = x2 − x3   x (t) = t + x + x   2 1 3   x3 (t) = x2 − x2 1  x1 (0) = 1.9 −0.1 1 The Runge-Kutta Method computes:   −0. X) K2 ← hF Order four: 1 1 t + h.    x3 (0) = 1. and the Runge-Kutta Method of order 2. 2 K1 ← hF (t. X + K3 ) 1 X(t + h) ← x(t) + (K1 + 2K2 + 2K3 + K4 ) . X(t)) = x2 (t) − x2 (t) 1 1 Thus we have  −1 F (0. X + K1 2 2 1 1 K3 ← hF t + h.1) by taking a single step of size 0. ORDINARY DIFFERENTIAL EQUATIONS Additionally we can write the Runge-Kutta Methods as follows: K1 ← hF (t.1 K1 ← 0. X(0)) =  0.    x2 (0) = 0.1.1 1 X(0.1.1F (0.2  −0.8.1  −0. X) Order two: K2 ← hF (t + h. Approximate X(0.061  and . X(0)) =  0  +  0.2  =  0.1) ← X(0) + 0. X + K1 ) 1 X(t + h) ← X(t) + (K1 + K2 ) . for Euler’s Method. 6 Example Problem 10.148 CHAPTER 10. Solution: We write     x2 (t) − x2 (t) 1 3  t + x1 (t) + x3 (t)  X(0) =  0  F (t.1F (0.19  −0.2  0.9 −0.

. x2 (0) = −1 . x2 (a) = c2 . find x(t) such that x(n) (t) = f t. . . . . .0805 0. then we can rewrite as   x1 (t) = (t/x2 (t)) + x1 (t) − 1   x (t) = x (t) 1 0 1 x2 (t) = (x1 (t)+x2 (t))    x0 (0) = 1. x1 = x .195  2 1 −0.10. x2 = x . . a. It allows us to solve higher order ODEs. . c 0 . . c1 . x (a) = c2 . Example Problem 10. For example. x (t). x(a) = c0 .0805 0. x(n−1) (t) . .195  =  0. cn .3.  . . x (0) = 0. consider the following problem: given f. . y(0) = −1 Solution: We let x0 = x. This is just a system of ODEs that we can solve like any other. x(t). . . x2 = y. xn−1 (t))    x0 (t) = x1 (t)     x (t) = x2 (t) 1 . xn−1 (a) = cn−1 . x1 = x . x (a) = c1 .9195  149 10.    x  n−2 (t) = xn−1 (t)   x0 (a) = c0 . x1 (t). . .1) ← X(0) + (K1 + K2 ) =  0  +  0. . x2 (t). .3 It’s Only Systems The fact we can simply solve systems of ODEs using old methods is good news. . Then the ODE plus these n − 1 equations can be written as   xn−1 (t) = f (t. x(n−1) (a) = cn−1 . Note that this trick also works on systems of higher order ODEs. . That is. we can transform any system of higher order ODEs into a (larger) system of first order ODEs. Rewrite the following system as a system of first order ODEs:   x (t) = (t/y(t)) + x (t) − 1 1 y (t) = (x (t)+y(t))  x(0) = 1. . We can put this in terms of a system by letting x0 = x.3. .9.9195 1 X(0. x1 (a) = c1 . x0 (t). SYSTEMS OF ODES Then      1 −0.  . . x1 (0) = 0. . xn−1 = x(n−1) .

phase curves never cross. . . . . . instead of time. . there would have to be two different values of the tangent of the curve.    x = f2 (x0 .  .2) We can rewrite this ODE as X (t) = F (X(t)) . x1 . x2 . the vector field would have to have two different values at that point.. Example 10. xn ) . . then adding the equations x0 = 1.4) x2 (t) = 3(x1 − 1. . We can get rid of the time dependence and thus make the system autonomous by making t another variable function to be found. xn ) . . and thus we speak of position. This is because at the point of crossing. xn ) . 1 Note that if you have converted a system of n equations with a time dependence to an autonomous system. . x1 . Consider the autonomous system of ODEs. . without an initial value: x1 (t) = −2(x2 − 2. x1 . . xn ) . x1 . . . .  .  . 3 .  n   x0 (a) = a. In our deterministic world. x1 . . Under this analogy.3. x2 . x2 . x1 . the phase curve is the path of a vessel placed in the river. . We do this by letting x 0 = t. . . x2 . . . ORDINARY DIFFERENTIAL EQUATIONS 10.4 It’s Only Autonomous Systems In fact. at a given point. x2 . The following example illustrates the idea of phase curves for autonomous ODE.     x = fn (x0 . xn (a) = cn . we can use a similar trick to get rid of the time-dependence of an ODE. . . An autonomous system can be written in the more elegant form: . . . . xn ) . You can think of the vector field F as being the velocity. . .  .   2 x3 = f3 (x0 . . the flow of which is time independent. . the movement of the particle is dependent only on its current position and not on time. . x2 . of a river.     x1 = f1 (x0 .     x = fn (t.e. and x0 (a) = a. One can consider X to be the position of a particle which moves through R n over time. . .     x = f3 (t. x2 . xn ) . i.10. .    x2 = f2 (t. xn ) . X = F (X) X (a) = C.150 CHAPTER 10. xn (a) = cn . . Moreover. . to get the system:   x0 = 1. . In this case time becomes one of the dimensions. x2 (a) = c2 . xn ) . x1 (a) = c1 . x1 . x2 (a) = c2 . . then the autonomous system should be considered a particle moving through Rn+1 . . x2 . x1 . we can consider the phase curve of an autonomous system. . For an autonomous system of ODEs.   n  x1 (a) = c1 . Consider the first order system   x1 = f1 (t. .1 A phase curve is a flow line in the vector field F .

2 2.5 2 In the case r = 0. 2 Thus the family of such ellipses. 4. taking all r ≥ 0.4 . arrowheads were not available for this figure.) You should verify that this ODE is solved by √ √ 2 cos(√6t + t0 ) X (t) = r √ 3 sin( 6t + t0 ) + 1. 1.6: The vector field F (X) is shown in R 2 .5 3 2.5 151 0 −2 3 0 X+ 4. 2π] . SYSTEMS OF ODES This is an autonomous ODE. The tips of the vectors are shown as circles.5 1 1. the ellipse is the “trivial” ellipse which consists of a point.5 4 3. the parameters r and t 0 are uniquely determined. 0 0.8 −3. This is because the phase curve represents the Euler’s Method and the second order Runge-Kutta Method were used to approximate the ODE with initial value 1.6.5 2 1.5 -0. form the phase curves of this ODE.10. . for any r ≥ 0. Thus the trajectory of X over time is that of an ellipse in the plane. We would expect the approximation of an ODE to never cross a phase curve.5 2 2.6 .5 0 -0.3. and t0 ∈ (0. (Due to budget restrictions. Given an initial value.5 1 0. The associated vector field can be expressed as F (X) = This vector field is plotted in Figure 10.5 3 3.5 X (0) = .5 Figure 10.

5 0 -0.5 1 0.5 -0.5 1 1. The Runge-Kutta Method used larger step size.5 (a) Euler’s Method 4.5 0 -0.5 1 0.5 0 0.5 2 2. .5 3 2.5 -0.5 3 2.5 (b) Runge-Kutta Method Figure 10.5 1 1. but was more accurate.5 3 3.152 CHAPTER 10.5 2 1.10 are shown for (a) Euler’s Method and (b) the Runge-Kutta Method of order 2.5 4 3.7: The simulated phase curves of the system of Example 10. thus the “spiralling” observed for Euler’s Method is spurious.5 3 3.5 0 0.5 4 3.5 2 2. The actual solution to the ODE is an ellipse. ORDINARY DIFFERENTIAL EQUATIONS 4.5 2 1.

7.015 for 3000 steps. . but at greater computational cost. since Euler’s Method steps in the direction tangent to the actual solution.3. whereas Euler’s Method requires only one. The Runge-Kutta Method was employed with step size h = 0. The approximations are shown in Figure 10.005 for 3000 steps.3 3 Because the Runge-Kutta Method of order two requires two evaluations of F . SYSTEMS OF ODES 153 Euler’s Method was used for step size h = 0. thus every step of Euler’s Method puts the approximation on an ellipse of larger radius. we see that Euler’s Method performed rather poorly. per step. This must be the case for any step size.10. and at lesser computational cost. Thus the Runge-Kutta Method outperforms Euler’s Method. Given that the actual solution of this initial value problem is an ellipse. and for larger step size. Smaller stepsize minimizes this effect. The Runge-Kutta Method performs much better. At the given resolution. no spiralling is observable. it is expected that they would be ‘evenly matched’ when Runge-Kutta Method uses a step size twice that of Euler’s Method. spiralling out from the ellipse.

   y (t) = x (t) + x(t) − 4. for some function g.   x(0) = 2. derive a higher order approximation scheme to find x(t + h) based on x(t).1) by using one step of Euler’s Method.154 Exercises (10. x(0) = 0.   y (t) = x (t) + y (t). (10. Approximate X(1. Use backward’s Euler (or. (10.  x (0) = π.3) Consider the separable ODE x (t) = x(t)g(t).    y(0) = 0.4) Consider the ODE x (t) = x(t) + t2 . ORDINARY DIFFERENTIAL EQUATIONS x (t) = t2 − x2 x(1) = 1 Approximate x(1.1) using one step of the second order Runge-Kutta Method. and h. t.  x (0) = y(0) = y (0) = 1. 1 − hg(t + h) What could go wrong with using such a scheme? (10. (10.8) Rewrite the following higher order system of ODEs as a system of first order ODEs:   x (t) = y (t) − x (t).6) Rewrite the following higher order ODE as a system of first order ODEs:   x (t) = x(t) − sin x (t).1) Given the ODE: CHAPTER 10.   x(0) = x (0) = y (0) = −1. . the right endpoint rule for approximating integrals) to derive the approximation scheme x(t + h) ← x(t) .5) Which of the following ODEs can you show are stable? Which are unstable? Which seem ambiguous? (a) x (t) = −x − arctan x (b) x (t) = 2x + tan x (c) x (t) = −4x − ex (d) x (t) = x + x3 (e) x (t) = t + cos x (10.7) Rewrite the system of ODEs as a system of first order ODEs   x (t) = y(t) + t − x(t).1) using one step of the second order Runge-Kutta Method. Using Taylor’s Theorem. alternatively.  x (0) = 1. (10.2) Given the ODE:   x1 (t) = x1 x2 + t   x2 (t) = 2x2 − x2 1  x1 (1) = 0   x2 (1) = 3 Approximate X(1. Approximate x(1. (10.1) by using one step of Euler’s Method.

h. 1000. Your m-file should have header line like: function xfin = rkm(f.0. 91.1. dx(t) dt 155 Your m-file should have header line like: function xfin = euler(f.h. 500. so your approximation should be good (cf.c. using a = 0.3. SYSTEMS OF ODES (10.0.2) (c) Run your code with f(t. For example. x(a) = c. (b) Run your code with f(t. using a = 0 = c. Your actual solution may be a rather poor approximation.0. Plot your approximations for varying values of steps. (a) Run your code with f(t. c = 0. and using h steps = 1. Test your code with the ODEs from the previous problem.9) Implement Euler’s Method for approximating the solution to = f (t. Also try large values of steps. 36. Can you guess what the actual solution is supposed to look like? Can you explain the poor performance of Euler’s Method for this ODE? (10. and 90.3). and using h steps = 2. and using h steps = 1. Experiment with different values of h.c.a.10) Implement the Runge-Kutta Method of order two for approximating the solution to = f (t. Does the Runge-Kutta Method perform any better than Euler’s Method for the last ODE? dx(t) dt . x) = −x. In this case the actual solution is xfin = ce−1 . x(t)) .steps) where xfin should be the approximate solution to the ODE at time a + steps h. (cf. Prepare a plot of error versus h for varying values of h.10. x) = x.1 and Figure 10.a. using a = 0 = c. x) = 1/ (1 − x) . like 100. Figure 7.steps) where xfin should be the approximate solution to the ODE at time a + steps h. Example 10. try steps values of 35. x(t)) . This ODE is stable. In this case the actual solution is xfin = ce. x(a) = c.

ORDINARY DIFFERENTIAL EQUATIONS .156 CHAPTER 10.

and state the conditions on the variable that appears in the error term. . P6 (7 pnts) Write out the iteration step of the Secant Method for finding a zero of the function f (x). . √ • Letting x0 = 1. Your iteration should look like xk+1 = ?? √ • Write the Newton’s Method iteration for finding 5. xn . Let a0 = 0. Fall 2003 P1 (7 pnts) Clearly state Taylor’s Theorem for f (x + h). What are a2 . Note that 5 ≈ 2. give the accuracy of the above approximation. x1 . .236. using h = 3 . xn . 1 • Let f (x) = x2 . Approximate f (0) with this approximation. . P3 (7 pnts) Write the Lagrange form of the polynomial of lowest degree that interpolates the following values: xk −1 1 yk 11 17 P4 (14 pnts) Use the method of Bisection for finding the zero of the function f (x) = x 3 − 3x2 + 5x − 5 .Appendix A Old Exams A. find the x1 and x2 for your iteration. Your answer should be something like O h? . Include all hypotheses. P8 (21 pnts) Consider the data: 0 1 2 k xk 3 1 −1 f (xk ) 5 15 9 157 . . x2 . write the polynomial that has value 1 at x0 and has value 0 at x1 . b2 ? 2 P5 (21 pnts) • Write out the iteration step of Newton’s Method for finding a zero of the function f (x). . That is. and b0 = 2. • Assuming that f (x) has bounded derivatives. Your iteration should look like xk+1 = ?? P7 (21 pnts) Consider the following approximation: f (x) ≈ f (x + h) − 5f (x) + 4f (x + h ) 2 3h • Derive this approximation using Taylor’s Theorem. . . P2 (7 pnts) Write out the Lagrange Polynomial 0 (x) for nodes x0 .1 First Midterm.

. Your answer should be a number. Fall 2003 b P1 (6 pnts) Write the Trapezoid rule approximation for a f (x) dx. Show your work. Find a numerical upper bound on the error |p(x) − f (x)| over the interval [−1. using the nodes {xi }n . Verify your answer. . .2 Second Midterm. P7 (28 pnts) Consider the quadrature rule: 1 0 f (x) dx ≈ z0 f (0) + z1 f (1) + z2 f (0) + z3 f (1).1 −0. Make this rule exact for all polynomials of degree ≤ 2. .e.158 APPENDIX A.001  43 91 −43 1  .0001 −0. Suppose b a f (x) dx = φ(n) + a5 h5 + a10 h10 + a15 h15 + . L and U. Show all your work. n n n where hn = b−a . A. devise a “Romberg-style” quadrature rule that is 2n a O h10 approximation to the integral. P3 (14 pnts) Let φ(n) be some quadrature rule that divides an interval into 2 n equal subintervals. • Suppose that these data were generated by the function f (x) = x 3 − 5x2 + 2x + 17. Clearly indicate your answer x(1) . i. A=  −12 26 −1 0  1 −10 π −7 2 Which row contains the first pivot? You must show your work. Using evaluations of φ(·).01 0. You must indicate which of the two methods you are using. P5 (14 pnts) Run one iteration of either Jacobi or Gauss Seidel on the system:     1 3 4 −3  −2 2 1  x =  −3  −1 3 −2 1 Start with x(0) = [1 1 1] . verify that A = LU. • Construct the Newton form of the polynomial p(x) of lowest degree that interpolates f (x) at these nodes. n P4 (14 pnts) Compute the LU factorization of the matrix   1 0 −1 A= 0 2 0  2 2 −1 using na¨ Gaussian Elimination. OLD EXAMS • Construct the divided differences table for the data. . i=0 P2 (6 pnts) Suppose Gaussian Elimination with scaled partial pivoting is applied to the matrix   0. Your final answer should be two ıve matrices. 3]. P6 (21 pnts) Derive the two-node Chebyshev Quadrature rule: 1 −1 f (x) dx ≈ c [f (x0 ) + f (x1 )] .

Verify your answer. . L. . . • Solve your system for z using na¨ Gaussian Elimination. x n y0 y1 . P9 (9 pnts) Let F = {f (x) = c0 x + c1 log x + c2 cos x | c0 .3 Final Exam. . L and U. c1 . Determine a quadrature rule of the form α −α f (x) dx ≈ Af (0) + Bf (−α) + Cf (α) that is exact for polynomials of highest possible degree. Fall 2003 P1 (4 pnts) Clearly state Taylor’s Theorem for f (x + h). y n Caution: this should not be done with basis vectors. . P8 (9 pnts) Find the least-squares best approximate of the form y = ln (cx) to the data x0 x1 . Your final answer should be two ıve matrices.e.. Let the knots be a = t0 < t1 < t2 < . Write out the Lagrange Polynomial 0 (x) for these nodes. . verify that A = LU. i.3. P2 (4 pnts) State the conditions that characterize S(x) as a natural cubic spline on the interval [a. FALL 2003 159 • Write four linear equations involving z i to make this rule exact for polynomials of the highest possible degree. that is. x(t)) x(a) = c Write out the Euler Method to step from x(t) to x(t + h). and state the conditions on the variable that appears in the error term. Show all your work. x2 = 0. P3 (4 pnts) Let x0 = 3. . P4 (4 pnts) Consider the ODE: x (t) = f (t. . . P5 (7 pnts) Compute the LU factorization of the matrix   −1 1 0 A =  1 1 −1  −2 0 2 using na¨ Gaussian Elimination. x1 = −5. Include all hypotheses. c2 ∈ R } . . Set up (but do not attempt to solve) the normal equations to find the least-squares best f ∈ F to approximate the data: x0 x1 . FINAL EXAM. x2 . . < tn = b. . Furthermore suppose that φ(n) = L + a1 n− 2 + a2 n− 2 + a3 n− 2 + . P6 (9 pnts) Let α be given.A. What is the highest degree polynomial for which this rule is exact? P7 (9 pnts) Suppose that φ(n) is some computational function that approximates a desired quantity. b]. Combine two evaluations of φ(·) in the manner of Richardson to get a O n−1 approximation to L. write the polynomial that has value 1 at x 0 and has value 0 at x1 . ıve A. y n 1 2 3 . x n y0 y1 .

.160 APPENDIX A. x(t + h) ← R(x(t). OLD EXAMS The normal equations should involve the vector c = [c 0 c1 c2 ] . x (t)) . give the accuracy of the above approximation. . Your answer should be something like O h? . It is permissible to write the update as a small program like: x (t) ← Q0 (x(t)) Rt 0 g(r) dr . which uniquely determines a function in F. P11 (11 pnts) Consider the ODE: x (t) = x(t)g(t) x(0) = 1 • Use the Fundamental Theorem of Calculus to write a step update of the form: t+h x(t + h) = x(t) + t ?? • Approximate the integral using the Trapezoid Rule to get an explicit step update of the form x(t + h) ← Q(t.5 0 0. x (t).) rather than write the update as a monstrous equation of the form x(t + h) ← M (x(t)). • Assuming that f (x) has bounded derivatives. P10 (9 pnts) Consider the following approximation: f (x) ≈ 3f (x − h) − 4f (x) + f (x + 3h) 6h2 • Derive this approximation using Taylor’s Theorem. P13 (15 pnts) Consider the data: k xk f (xk ) 0 1 2 −0. .5 5 15 9 • Construct the divided differences table for the data. x (t) ← Q1 (x(t). Do you see any problem with this stepping method if h > 2/m? Hint: the actual solution to the ODE is x(t) = e P12 (9 pnts) Consider the ODE: x (t) = ex(t) + sin (x(t)) x(0) = 0 Write out the Taylor’s series method of order three to approximate x(t + h) given x(t). . x (t). . x(t). . h) • Suppose that g(t) ≥ m > 0 for all t.

1 • Let f (x) = x2 + 1. P4 (14 pnts) One of the roots of x2 − bx + c = 0 as found by the quadratic formula is subject to subtractive cancellation when |b| |c| . . Let the first interval be [0. “black box” function. Jacobi. 256]. get a O h2 approximation to f (0) using the method of Richardson’s Extrapolation.5]. P2 (12 pnts) Write the general form of the iterated solution to the problem Ax = b. 0) D(1. Your iteration should look like xk+1 = ?? • Write the Newton’s Method iteration for finding 3 25/4.4 First Midterm.” and use the factor ω. That is. Use x0 = 3. describe what Q is. Find a reasonable C such that the truncation error is no greater than Ch 4 . Using Taylor’s Theorem. Include all hypotheses. 1) A. P2 (14 pnts) Give a O h4 approximation to sin (h + π/2) . find the x1 and x2 for your iteration. Gauss-Seidel. your answer should probably include a table like this: D(0.842.5. and fifth intervals. such that xk → 3 24. fourth. P1 (14 pnts) Perform bisection on the function graphed below. Give the second. • Suppose that these data were generated by the function f (x) = 40x 3 − 32x2 − 6x + 15. the approximation should be defined in terms of f (·) evaluated at some values. 0) D(1. FALL 2004 161 • Construct the Newton form of the polynomial p(x) of lowest degree that interpolates f (x) at these nodes. √ P3 (14 pnts) Use Newton’s Method to find a good approximation to 3 24. and let f (·) be a given. P15 (15 pnts) • Let x be some given number. Note that 3 25/4 ≈ 1. and state the conditions on the variable that appears in the error term. third. Call your approximation φ(h). Let Q be your “splitting matrix. √ iteratively define a sequence with x 0 = 3. derive a O (h) approximation to the derivative f (x). Your answer should be a number. Your solution should look something like: xk+1 =?? + ?? ?? [Problems] Answer no more than four of the following. • For the same f (x) and h. in relation to A : Richardson’s. 5 • Letting x0 = 2 . for h = 2 . Your solution should look something like: ??x(k) =??x(k−1) +?? or x(k) =??x(k−1) +?? For one of the following variants.A. P1 (12 pnts) Clearly state Taylor’s Theorem for f (x + h). Which root is it? Rewrite the expression for that root to eliminate subtractive cancellation. Fall 2004 [Definitions] Answer no more than two of the following. P3 (12 pnts) Write the iteration for Newton’s Method or the Secant Method for finding a root to the equation f (x) = 0. 0. and should probably involve x and h. Find a numerical upper bound on the error |p(x) − f (x)| over the interval [−0. FIRST MIDTERM.4. P14 (15 pnts) • Write out the iteration step of Newton’s Method for finding a zero of the function f (x). Approximate f (0) using your φ(h).

ıve   3 2 4  −6 1 −6  12 −12 10 P6 (14 pnts) Consider the linear equation:     5 6 1 1  2 4 0  x =  −6  3 1 2 6 Letting x(0) = [1 1 1] .162 64 32 0 -32 -64 0 32 64 96 128 160 192 APPENDIX A. (Hint: Use Taylor’s Theorem twice. . show that for Newton’s Method. Show all your work. OLD EXAMS 224 256 P5 (14 pnts) Find the LU decomposition of the following matrix by Gaussian Elimination with na¨ pivoting. [Questions] Answer only one of the following. 2 when |ek | is sufficiently small. . x13 . Justify your answer. Suppose that f (x) has the properties that f (x) ≥ m > 0 for all x. P1 (10 pnts) For a set of distinct nodes x 0 .) Is this faster or slower convergence than for Newton’s Method with a simple root? A. You must indicate which scheme you are using. and |f (x)| ≤ M for all x. . Fall 2004 5 (x)? [Definitions] Answer no more than two of the following three. What degree is this polynomial? . . Find δ > 0 such that |e k | < δ insures that xn → r. 1 ek+1 ≈ ek . P1 (20 pnts) Suppose that the eigenvalues of A are in [α. What is the minimal value of I − ωA2 2 for this optimal ω? For full credit you need to make some argument why this ω minimizes this norm. P2 (20 pnts) Let f (x) = 0 have a unique root r. x1 . Recall that the form of the error for Newton’s Method is −f (ξk ) 2 e . find x(1) using either the Gauss-Seidel or Jacobi iterative schemes. P3 (20 pnts) Suppose that f (r) = 0 = f (r) = f (r). and f (x) is continuous. Letting ek = xk −r. ek+1 = 2f (xk ) k where ek = r − xk . Use weighting ω = 1. β] with 0 < α < β. what is the Lagrange Polynomial You may write your answer as a product.5 Second Midterm. Find the ω that minimizes I − ωA2 2 .

b] there is some ξ ∈ [a. P1 (16 pnts) How large must n be to interpolate the function e x to within 0. b].  x2 − 3x + 4  Q(x) = α(x − 1)2 + β(x − 1) + 2   2 x +x−4 f (x) ≈ :0≤x<1 :1≤x<3 :3≤x≤4 P6 (16 pnts) Consider the following approximation: f (x + h) − 5f (x) + 4f (x + h ) 2 3h • Derive this approximation via Taylor’s Theorem. Let f (n+1) be continuous. i=0 P3 (10 pnts) State the conditions which define the function S(x) as a natural cubic spline on the interval [a. P3 (16 pnts) Derive the constants A and B which make the quadrature rule 1 −1 f (x) dx ≈ Af √ 15 − 5 + Bf (0) + Af √ 15 5 exact for polynomials of degree ≤ 2.5. 4]. . Your answer should be something like O h? . .A. Then for each x ∈ [a. FALL 2004 163 P2 (10 pnts) Complete this statement of the polynomial interpolation error theorem: Let p be the polynomial of degree at most n interpolating function f at the n+1 distinct nodes x 0 . [Problems] Answer no more than five of the following six. using h = 3 . . . b] such that f (x) − p(x) = ? ? n (?) . . β such that the following is a quadratic spline on [0. . 2] with the polynomial interpolant at n equally spaced nodes? P2 (16 pnts) Let φ(h) be some computable approximation to the quantity L such that φ(h) = L + a5 h5 + a10 h10 + a15 h15 + . Combine evaluations of φ(·) to devise a O h10 approximation to L.) P5 (16 pnts) Find constants α. SECOND MIDTERM. . give the accuracy of the above approximation. b].001 on the interval [0. xn on [a. Approximate f (0) with this approximation. x1 . Use this quadrature rule to approximate 1 π= −1 2 dx 1 + x2 P4 (16 pnts) Find the polynomial of degree ≤ 3 interpolating the data x y −1 0 1 2 −2 8 0 4 (Hint: using Lagrange Polynomials will probably take too long. • Assuming that f (x) has bounded derivatives. 1 • Let f (x) = x2 .

Combine evaluations of φ(·) to devise a O h8 approximation to L.1) using a single step. Your answer should be something like O h? . (b) Assuming that f (x) has bounded derivatives. Your solution should look something like: ??x(k) =??x(k−1) +?? or x(k) =??x(k−1) +?? For one of the following variants. Include all hypotheses. . P3 (10 pnts) Let φ(h) be some computable approximation to the quantity L such that φ(h) = L + a4 h4 + a8 h8 + a12 h12 + . P1 (10 pnts) Rewrite this system of higher order ODEs as a system of first order ODEs:  x (t) = x (t) [x(t) + ty(t)]    y (t) = y (t) [y(t) − tx(t)]     x (0) = 1 x(0) = 4    y (0) = −3     y(0) = −2 P2 (10 pnts) Consider the ODE: x (t) = t (x + 1)2 x(2) = 1/2 Use Euler’s Method to find an approximate value of x(2. . Gauss-Seidel. . P4 (15 pnts) Consider the approximation f (x) ≈ 9f (x + h) − 8f (x) − f (x − 3h) 12h (a) Derive this approximation using Taylor’s Theorem.164 APPENDIX A. give the accuracy of the above approximation. Let Q be your “splitting matrix. Jacobi. Fall 2004 [Definitions] Answer all of the following. OLD EXAMS A. x n y0 y1 .” and use the factor ω. P2 (10 pnts) Write the iteration for Newton’s Method or the Secant Method for finding a root to the equation f (x) = 0. Your solution should look something like: xk+1 =?? + ?? ?? P3 (10 pnts) Write the general form of the iterated solution to the problem Ax = b.6 Final Exam. describe what Q is. . . and state the conditions on the variable that appears in the error term. in relation to A : Richardson’s. P1 (10 pnts) Clearly state Taylor’s Theorem for f (x + h). y n [Problems] Answer all of the following. P4 (10 pnts) Define what it means for the function f (x) ∈ F to be the least squares best approximant to the data x0 x1 . . .

FALL 2004 P5 (15 pnts) Find constants. Write your subroutine in pseudo code. and h only. α. . x (t). . in particular your subroutine should not have access to a cubed root or logarithm operator. P1 (20 pnts) Consider the “quadrature” rule: 1 0 f (x) dx ≈ A0 f (0) + A1 f (1) + A2 f (1) + A3 f (1) (a) Write four linear equations involving A i to make this rule exact for polynomials of the highest possible degree. x (t). (b) Find the constants Ai using Gaussian Elimination with na¨ pivoting. . Your subroutine should use only the basic arithmetic operations of addition. β. t. multiplication. γ such that the following is a natural cubic spline on [1. division.5 f (xk ) 1.5]. calculates an approximate to 3 z via Newton’s Method.5 (a) Construct the divided differences table for the data.) ˜ such that x(t + h) = x + O h3 . Matlab. . FINAL EXAM. . P7 (15 pnts) Consider the ODE: x (t) = cos (x(t)) + sin (x(t)) x(0) = 1 Use Taylor’s theorem to devise a O h3 approximant to x(t + h) which is a function of x. (c) What degree is your polynomial? Is this strange? (d) Suppose that these data were generated by the function f (x) = 1 + cos 2πx . subtraction. 2 Find an upper bound on the error |p(x) − f (x)| over the interval [0. (b) Construct the Newton form of the polynomial p(x) of lowest degree that interpolates f (x) at these nodes. ˜ [Questions] Answer all of the following. given an input number z. x ← R(x(t). ıve P2 (20 pnts) Consider the data: 0 1 2 k xk 0 0. or C/C++.A. You may write your answer as a program: x (t) ← Q0 (x(t)) x (t) ← Q1 (x(t).25 0. Q(x) = α (x − 1)3 + β (x − 1)2 + 12 (x − 1) + 1 γx3 + 18x2 − 30x + 19 :1≤x<2 :2≤x<3 165 √ P6 (15 pnts) Devise a subroutine that. Your answer should be a number. 0. 3]. x (t)) . .5 1 0.6. Include reasonable termination conditions so your subroutine will terminate if it finds a good approximate solution.

. c1 . among the values x0 . . x1 . but did not. P4 (10 pnts) State some substantive question which you thought might appear on this exam. xn . . (xj ) = δj7 . c2 ∈ R . where (x) is some black box function. which determines the least squares approximant in F. . Answer this question (correctly). y n (a) Set up (but do not attempt to solve) the normal equations to find the the vector c = [c0 c1 c2 ] . . . We are concerned with finding the least-squares best f ∈ F to approximate the data x0 x1 . where f is the least squares best approximant to the data. . Prove that f (x7 ) = y7 . . .166 P3 (20 pnts) Let APPENDIX A. (b) Now suppose that the function (x) happens to be the Lagrange Polynomial associated with x7 . That is. OLD EXAMS √ F = f (x) = c0 x + c1 (x) + c2 ex | c0 . x n y0 y1 .

which is a copyleft license designed for free software. philosophical.Appendix B GNU Free Documentation License Version 1. We recommend this License principally for works whose purpose is instruction or reference. or absence of markup. You accept the license if you copy. this License preserves for the author and publisher a way to get credit for their work. A copy that is not ”Transparent” is called ”Opaque”. Preamble The purpose of this License is to make a manual. A ”Secondary Section” is a named appendix or a front-matter section of the Document that deals exclusively with the relationship of the publishers or authors of the Document to the Document’s overall subject (or to related matters) and contains nothing that could fall directly within that overall subject. The Document may contain zero Invariant Sections. We have designed this License in order to use it for manuals for free software. or of legal. refers to any such manual or work. and a Back-Cover Text may be at most 25 words. in any medium. commercial. But this License is not limited to software manuals.) The relationship could be a matter of historical connection with the subject or with related matters. as being those of Invariant Sections. A ”Transparent” copy of the Document means a machine-readable copy. An image format is not Transparent if used for any substantial amount of text. A Front-Cover Text may be at most 5 words. that is suitable for revising the document straightforwardly with generic text editors or (for images composed of pixels) generic paint programs or (for drawings) some widely available drawing editor. as Front-Cover Texts or Back-Cover Texts. which means that derivative works of the document must themselves be free in the same sense. If the Document does not identify any Invariant Sections then there are none. it can be used for any textual work. Such a notice grants a world-wide. textbook. A ”Modified Version” of the Document means any work containing the Document or a portion of it. The ”Invariant Sections” are certain Secondary Sections whose titles are designated. ethical or political position regarding them. If a section does not fit the above definition of Secondary then it is not allowed to be designated as Invariant. below. Any member of the public is a licensee. November 2002 Copyright c 2000. has been arranged to thwart or discourage subsequent modification by readers is not Transparent. in the notice that says that the Document is released under this License. modify or distribute the work in a way requiring permission under copyright law. but changing it is not allowed. that contains a notice placed by the copyright holder saying it can be distributed under the terms of this License. unlimited in duration. with or without modifying it. represented in a format whose specification is available to the general public. Secondarily. a Secondary Section may not explain any mathematics. A copy made in an otherwise Transparent file format whose markup. APPLICABILITY AND DEFINITIONS This License applies to any manual or other work. or other functional and useful document ”free” in the sense of freedom: to assure everyone the effective freedom to copy and redistribute it. 59 Temple Place. regardless of subject matter or whether it is published as a printed book. in the notice that says that the Document is released under this License. 1.2002 Free Software Foundation. royalty-free license. because free software needs free documentation: a free program should come with manuals providing the same freedoms that the software does.2. and is addressed as ”you”. and that is suitable for input to text formatters or for automatic translation to a variety of formats suitable for input to text formatters. either commercially or noncommercially. if the Document is in part a textbook of mathematics. MA 02111-1307 USA Everyone is permitted to copy and distribute verbatim copies of this license document. Inc. The ”Document”. It complements the GNU General Public License. to use that work under the conditions stated herein. Suite 330. or with modifications and/or translated into another language.2001. The ”Cover Texts” are certain short passages of text that are listed. This License is a kind of ”copyleft”. Boston. (Thus. while not being considered responsible for modifications made by others. 167 . either copied verbatim.

such as ”Acknowledgements”. PostScript or PDF designed for human modification. or state in or with each Opaque copy a computer-network location from which the general network-using public has access to download using public-standard network protocols a complete Transparent copy of the Document. The front cover must present the full title with all words of the title equally prominent and visible. Add an appropriate copyright notice for your modifications adjacent to the other copyright notices. XCF and JPG. ”Endorsements”. you may accept compensation in exchange for copies. as the publisher. If you publish or distribute Opaque copies of the Document numbering more than 100. SGML or XML for which the DTD and/or processing tools are not generally available. Copying with changes limited to the covers. unless they release you from this requirement. either commercially or noncommercially. under the same conditions stated above. you must take reasonably prudent steps. VERBATIM COPYING You may copy and distribute the Document in any medium. (Here XYZ stands for a specific section name mentioned below. to give them a chance to provide you with an updated version of the Document. If you use the latter option. but not required. thus licensing distribution and modification of the Modified Version to whoever possesses a copy of it. It is requested. you must do these things in the Modified Version: A. the copyright notices. free of added material. and standard-conforming simple HTML. as long as they preserve the title of the Document and satisfy these conditions. List on the Title Page. However. provided that this License. one or more persons or entities responsible for authorship of the modifications in the Modified Version. E. B. COPYING IN QUANTITY If you publish printed copies (or copies in media that commonly have printed covers) of the Document. A section ”Entitled XYZ” means a named subunit of the Document whose title either is precisely XYZ or contains XYZ in parentheses following text that translates XYZ in another language. you must enclose the copies in covers that carry. the title page itself. GNU FREE DOCUMENTATION LICENSE Examples of suitable formats for Transparent copies include plain ASCII without markup. numbering more than 100. for a printed book. State on the Title page the name of the publisher of the Modified Version. legibly. You may add other material on the covers in addition. ”Title Page” means the text near the most prominent appearance of the work’s title. all these Cover Texts: Front-Cover Texts on the front cover. and the machine-generated HTML. but only as regards disclaiming warranties: any other implication that these Warranty Disclaimers may have is void and has no effect on the meaning of this License. and you may publicly display copies. to ensure that this Transparent copy will remain thus accessible at the stated location until at least one year after the last time you distribute an Opaque copy (directly or through your agents or retailers) of that edition to the public. and that you add no other conditions whatsoever to those of this License. if it has fewer than five). with the Modified Version filling the role of the Document. if there were any. PostScript or PDF produced by some word processors for output purposes only. you should put the first ones listed (as many as fit reasonably) on the actual cover. clearly and legibly. 2. The ”Title Page” means. Opaque formats include proprietary formats that can be read and edited only by proprietary word processors. preceding the beginning of the body of the text. LaTeX input format.168 APPENDIX B. You may also lend copies. when you begin distribution of Opaque copies in quantity. For works in formats which do not have any title page as such. can be treated as verbatim copying in other respects. and the Document’s license notice requires Cover Texts. plus such following pages as are needed to hold. Examples of transparent image formats include PNG. and Back-Cover Texts on the back cover. You may not use technical measures to obstruct or control the reading or further copying of the copies you make or distribute. provided that you release the Modified Version under precisely this License. 4. if any) a title distinct from that of the Document. and the license notice saying this License applies to the Document are reproduced in all copies. If the required texts for either cover are too voluminous to fit legibly. and from those of previous versions (which should. ”Dedications”. Use in the Title Page (and on the covers. Texinfo input format. Both covers must also clearly and legibly identify you as the publisher of these copies. you must either include a machinereadable Transparent copy along with each Opaque copy. that you contact the authors of the Document well before redistributing any large number of copies. together with at least five of the principal authors of the Document (all of its principal authors. In addition. D. or ”History”. SGML or XML using a publicly available DTD. as authors. and continue the rest onto adjacent pages. be listed in the History section of the Document). The Document may include Warranty Disclaimers next to the notice which states that this License applies to the Document. These Warranty Disclaimers are considered to be included by reference in this License. the material this License requires to appear in the title page. If you distribute a large enough number of copies you must also follow the conditions in section 3.) To ”Preserve the Title” of such a section when you modify the Document means that it remains a section ”Entitled XYZ” according to this definition. MODIFICATIONS You may copy and distribute a Modified Version of the Document under the conditions of sections 2 and 3 above. C. . Preserve all the copyright notices of the Document. 3. You may use the same title as a previous version if the original publisher of that version gives permission.

authors. Do not retitle any existing section to be Entitled ”Endorsements” or to conflict in title with any Invariant Section. and publisher of the Document as given on its Title Page. You must delete all sections Entitled ”Endorsements”. I. year. immediately after the copyright notices. N. The combined work need only contain one copy of this License. add their titles to the list of Invariant Sections in the Modified Version’s license notice. given in the Document for public access to a Transparent copy of the Document. but you may replace the old one. You may add a section Entitled ”Endorsements”. provided that you include in the combination all of the Invariant Sections of all of the original documents. In the combination. 6. Preserve its Title. provided it contains nothing but endorsements of your Modified Version by various parties–for example. previously added by you or by arrangement made by the same entity you are acting on behalf of. You may omit a network location for a work that was published at least four years before the Document itself. forming one section Entitled ”History”. Section numbers or the equivalent are not considered part of the section titles. Such a section may not be included in the Modified Version. to the end of the list of Cover Texts in the Modified Version. or if the original publisher of the version it refers to gives permission. and any sections Entitled ”Dedications”. and replace the individual copies of this License in the various documents with a single copy that is included in the collection. and that you preserve all their Warranty Disclaimers. AGGREGATION WITH INDEPENDENT WORKS . in the form shown in the Addendum below. Preserve the section Entitled ”History”. provided you insert a copy of this License into the extracted document. make the title of each such section unique by adding at the end of it. J. likewise combine any sections Entitled ”Acknowledgements”. K. then add an item describing the Modified Version as stated in the previous sentence. These titles must be distinct from any other section titles. and follow this License in all other respects regarding verbatim copying of that document. and publisher of the Modified Version as given on the Title Page. and add to it an item stating at least the title. provided that you follow the rules of this License for verbatim copying of each of the documents in all other respects. if any. and distribute it individually under this License. Preserve all the Invariant Sections of the Document. Make the same adjustment to the section titles in the list of Invariant Sections in the license notice of the combined work. G. a license notice giving the public permission to use the Modified Version under the terms of this License. new authors. Preserve the network location. unaltered in their text and in their titles. and preserve in the section all the substance and tone of each of the contributor acknowledgements and/or dedications given therein. and likewise the network locations given in the Document for previous versions it was based on. unmodified. in parentheses. 5. COLLECTIONS OF DOCUMENTS You may make a collection consisting of the Document and other documents released under this License. and multiple identical Invariant Sections may be replaced with a single copy. statements of peer review or that the text has been approved by an organization as the authoritative definition of a standard. Delete any section Entitled ”Endorsements”. the name of the original author or publisher of that section if known. To do this. you must combine any sections Entitled ”History” in the various original documents. under the terms defined in section 4 above for modified versions. on explicit permission from the previous publisher that added the old one. you may at your option designate some or all of these sections as invariant. L. Include. M. and list them all as Invariant Sections of your combined work in its license notice. The author(s) and publisher(s) of the Document do not by this License give permission to use their names for publicity for or to assert or imply endorsement of any Modified Version. You may add a passage of up to five words as a Front-Cover Text. For any section Entitled ”Acknowledgements” or ”Dedications”. O. Include an unaltered copy of this License. Preserve any Warranty Disclaimers. year. If there are multiple Invariant Sections with the same name but different contents. H. COMBINING DOCUMENTS You may combine the Document with other documents released under this License. Only one passage of Front-Cover Text and one of Back-Cover Text may be added by (or through arrangements made by) any one entity. or else a unique number.169 F. and a passage of up to 25 words as a Back-Cover Text. If the Modified Version includes new front-matter sections or appendices that qualify as Secondary Sections and contain no material copied from the Document. You may extract a single document from such a collection. create one stating the title. you may not add another. If there is no section Entitled ”History” in the Document. If the Document already includes a cover text for the same cover. Preserve in that license notice the full lists of Invariant Sections and required Cover Texts given in the Document’s license notice. These may be placed in the ”History” section. Preserve the Title of the section. 7.

When the Document is included in an aggregate. TERMINATION You may not copy. or some other combination of the three. modify. However. 9. Otherwise they must appear on printed covers that bracket the whole aggregate. with no Invariant Sections. TRANSLATION Translation is considered a kind of modification. modify.2 or any later version published by the Free Software Foundation.gnu. such as the GNU General Public License. Front-Cover Texts and Back-Cover Texts. parties who have received copies.org/copyleft/. Any other attempt to copy. In case of a disagreement between the translation and the original version of this License or a notice or disclaimer. If a section in the Document is Entitled ”Acknowledgements”. or rights. you have the option of following the terms and conditions either of that specified version or of any later version that has been published (not as a draft) by the Free Software Foundation. If the Document does not specify a version number of this License. and all the license notices in the Document. GNU FREE DOCUMENTATION LICENSE A compilation of the Document or its derivatives with other separate and independent documents or works.170 APPENDIX B. and with the Back-Cover Texts being LIST.” line with this: with the Invariant Sections being LIST THEIR TITLES. .Texts. but you may include translations of some or all Invariant Sections in addition to the original versions of these Invariant Sections. is called an ”aggregate” if the copyright resulting from the compilation is not used to limit the legal rights of the compilation’s users beyond what the individual works permit. then if the Document is less than one half of the entire aggregate.. You may include a translation of this License. ADDENDUM: How to use this License for your documents To use this License in a document you have written. the Document’s Cover Texts may be placed on covers that bracket the Document within the aggregate. so you may distribute translations of the Document under the terms of section 4. 8. Such new versions will be similar in spirit to the present version. Each version of the License is given a distinguishing version number. revised versions of the GNU Free Documentation License from time to time. sublicense or distribute the Document is void. to permit their use in free software. you may choose any version ever published (not as a draft) by the Free Software Foundation. merge those two alternatives to suit the situation. the original version will prevail. and no Back-Cover Texts. sublicense. See http://www. no Front-Cover Texts. or distribute the Document except as expressly provided for under this License. If the Document specifies that a particular numbered version of this License ”or any later version” applies to it. ”Dedications”. or the electronic equivalent of covers if the Document is in electronic form. If you have Invariant Sections. provided that you also include the original English version of this License and the original versions of those notices and disclaimers. with the Front-Cover Texts being LIST. replace the ”with. If your document contains nontrivial examples of program code. in or on a volume of a storage or distribution medium. we recommend releasing these examples in parallel under your choice of free software license. 10. A copy of the license is included in the section entitled ”GNU Free Documentation License”. this License does not apply to the other works in the aggregate which are not themselves derivative works of the Document. and any Warranty Disclaimers. but may differ in detail to address new problems or concerns. If you have Invariant Sections without Cover Texts. include a copy of the License in the document and put the following copyright and license notices just after the title page: Copyright c YEAR YOUR NAME. or ”History”. distribute and/or modify this document under the terms of the GNU Free Documentation License. If the Cover Text requirement of section 3 is applicable to these copies of the Document. the requirement (section 4) to Preserve its Title (section 1) will typically require changing the actual title. and will automatically terminate your rights under this License. from you under this License will not have their licenses terminated so long as such parties remain in full compliance. Permission is granted to copy. FUTURE REVISIONS OF THIS LICENSE The Free Software Foundation may publish new. Version 1.. Replacing Invariant Sections with translations requires special permission from their copyright holders.

New York. Early Transcendentals.Bibliography [1] Guillaume Bal. CA. McGraw-Hill. Analysis of Numerical Methods. [7] David Kincaid and Ward Cheney. Calculus. Dover. ISBN 0-534-33892-5. 2003. Dover. [5] Eugene Isaacson and Herbert Bishop Keller. Brooks/Cole Publishing Co.. 1980. fourth edition. 2001. Numerical Mathematics and Computing.columbia. http://www. Wiley. ISBN 0-534-39330-6.ps. New York. 1996. fifth edition. 171 . ISBN 0486652416. course on numerical analysis. Epperson. Numerical analysis. Brooks/Cole-Thomson Learning. third edition. URL [2] Samuel Daniel Conte and Carl de Boor. Numerical Methods for Scientists and Engineers. [4] Richard Hamming. second edition. Belmont. Elementary Numerical Analysis. [8] James Stewart. CA. ISBN 0070124477. Mathematics of scientific computing. 2002. New York. Pacific Grove. [6] David Kincaid and Ward Cheney. Lecture notes. 1966. [3] James F. Pacific Grove. 1999..edu/~gb2030/COURSES/E6302/NumAnal. Brooks/Cole Publishing Co. ISBN 0-534-35184-0. An Introduction to Numerical Methods and Analysis. CA. ISBN 0-471-31647-4. An Algorithmic Approach. ISBN 0486680290. 1987.

119 LU Factorization. 84 linear. 58 splines. 105 Runge Function. 138 truncation. 9 normal equations. 28 natural cubic spline. 25 error roundoff. 122 Chebyshev Quadrature. 100 Dirichlet function. 117 linear. 3 bisection method. 118 ODE errors. 81 subtractive cancellation. 99 divided differences. 138 higher order methods. 91 Chebyshev Polynomials. 106 vector space. 3 trapezoidal rule. 107 Heaviside function. 69. Gaussian Quadrature. 8 eigenvector. 69 Runge-Kutta Method. 89. 52 norm. 97 Romberg Algorithm. 50 centered difference. 79 quadratic. 104. 137 stability. 108 multiplier. 136 partition. 104 Riemann Integrable. 79 Lagrange Polynomials. 63 least squares. 98 Hilbert Matrix. 6 Intermediate Value Theorem. 141 big-O. 4 Taylor’s Series Method. 79 basis splines. 67 eigenvalue. 83 Newton’s Method. 150 Richardson Extrapolation.Index Backwards Euler’s Method. Na¨ 25 ıve. 136. 33 method of undetermined coefficients. 48 inner product. 98 Riemann integral. 1. 137 Gaussian Elimination. 100 error. 143 secant method. 89. 7 172 subordinate. 102 recursive. 92. 49 knot. 5 . 136 Taylor’s Theorem. 8 elementary row operations. 138 stepping. 79 phase curve. 119 definition. 114 composite quadrature rule. 138 Euler’s Method.

Sign up to vote on this title
UsefulNot useful