Numerical Analysis

Doron Levy
Department of Mathematics
Stanford University
December 1, 2005
Preface
Contents
Preface

1 Introduction

2 Interpolation
   2.1 What is Interpolation?
   2.2 The Interpolation Problem
   2.3 Newton's Form of the Interpolation Polynomial
   2.4 The Interpolation Problem and the Vandermonde Determinant
   2.5 The Lagrange Form of the Interpolation Polynomial
   2.6 Divided Differences
   2.7 The Error in Polynomial Interpolation
   2.8 Interpolation at the Chebyshev Points
   2.9 Hermite Interpolation
       2.9.1 Divided differences with repetitions
       2.9.2 The Lagrange form of the Hermite interpolant
   2.10 Spline Interpolation
       2.10.1 Cubic splines
       2.10.2 What is natural about the natural spline?

3 Approximations
   3.1 Background
   3.2 The Minimax Approximation Problem
       3.2.1 Existence of the minimax polynomial
       3.2.2 Bounds on the minimax error
       3.2.3 Characterization of the minimax polynomial
       3.2.4 Uniqueness of the minimax polynomial
       3.2.5 The near-minimax polynomial
       3.2.6 Construction of the minimax polynomial
   3.3 Least-squares Approximations
       3.3.1 The least-squares approximation problem
       3.3.2 Solving the least-squares problem: a direct method
       3.3.3 Solving the least-squares problem: with orthogonal polynomials
       3.3.4 The weighted least squares problem
       3.3.5 Orthogonal polynomials
       3.3.6 Another approach to the least-squares problem
       3.3.7 Properties of orthogonal polynomials

4 Numerical Differentiation
   4.1 Basic Concepts
   4.2 Differentiation Via Interpolation
   4.3 The Method of Undetermined Coefficients
   4.4 Richardson's Extrapolation

5 Numerical Integration
   5.1 Basic Concepts
   5.2 Integration via Interpolation
   5.3 Composite Integration Rules
   5.4 Additional Integration Techniques
       5.4.1 The method of undetermined coefficients
       5.4.2 Change of an interval
       5.4.3 General integration formulas
   5.5 Simpson's Integration
       5.5.1 The quadrature error
       5.5.2 Composite Simpson rule
   5.6 Gaussian Quadrature
       5.6.1 Maximizing the quadrature's accuracy
       5.6.2 Convergence and error analysis
   5.7 Romberg Integration

6 Methods for Solving Nonlinear Problems
   6.1 The Bisection Method
   6.2 Newton's Method
   6.3 The Secant Method

Bibliography
1 Introduction
2 Interpolation
2.1 What is Interpolation?
Imagine that there is an unknown function f(x) for which someone supplies you with its (exact) values at n + 1 distinct points x_0 < x_1 < ⋯ < x_n, i.e., f(x_0), ..., f(x_n) are given. The interpolation problem is to construct a function Q(x) that passes through these points. One easy way of finding such a function is to connect the points with straight lines. While this is a legitimate solution of the interpolation problem, usually (though not always) we are interested in a different kind of solution, e.g., a smoother function. We therefore always specify a certain class of functions from which we would like to find one that solves the interpolation problem. For example, we may look for a polynomial, Q(x), that passes through these points. Alternatively, we may look for a trigonometric function or a piecewise-smooth polynomial such that the interpolation requirements

    Q(x_j) = f(x_j),    0 ≤ j ≤ n,    (2.1)

are satisfied (see Figure 2.1).
Figure 2.1: The function f(x), the interpolation points x_0, x_1, x_2, and the interpolating polynomial Q(x)
As a simple example let's consider values of a function that are prescribed at two points: (x_0, f(x_0)) and (x_1, f(x_1)). There are infinitely many functions that pass through these two points. However, if we limit ourselves to polynomials of degree less than or equal to one, there is only one such function that passes through these two points: the line that connects them. A line, in general, is a polynomial of degree one, but if we want to keep the discussion general enough, it could be that f(x_0) = f(x_1), in which case the line that connects the two points is the constant Q_0(x) ≡ f(x_0), which is a polynomial of degree zero. This is why we say that there is a unique polynomial of degree ≤ 1 that connects these two points (and not "a polynomial of degree 1").
The points x_0, ..., x_n are called the interpolation points. The property of "passing through these points" is referred to as interpolating the data. The function that interpolates the data is an interpolant or an interpolating polynomial (or whatever function is being used).

There are cases where the interpolation problem has no solution, has a unique solution, or has more than one solution. What we are going to study in this section is precisely how to distinguish between these cases. We are also going to present various ways of actually constructing the interpolant.

In general, there is little hope that the interpolant will be identical to the unknown function f(x). The function Q(x) that interpolates f(x) at the interpolation points will still be identical to f(x) at these points, because there we satisfy the interpolation conditions (2.1). In general, at any other point, Q(x) and f(x) will not have the same values. The interpolation error is a measure of how different these two functions are. We will study ways of estimating the interpolation error. We will also discuss strategies for minimizing this error.

It is important to note that it is possible to formulate an interpolation problem without referring to (or even assuming the existence of) any underlying function f(x). For example, you may have a list of interpolation points x_0, ..., x_n, and data given at these points, y_0, y_1, ..., y_n, which you would like to interpolate. The solution to this interpolation problem is identical to the one where the values are taken from an underlying function.
2.2 The Interpolation Problem

We begin our study with the problem of polynomial interpolation: given n + 1 distinct points x_0, ..., x_n, we seek a polynomial Q_n(x) of the lowest degree such that the following interpolation conditions are satisfied:

    Q_n(x_j) = f(x_j),    j = 0, ..., n.    (2.2)

Note that we do not assume any ordering between the points x_0, ..., x_n, as such an order will make no difference for the present discussion. If we do not limit the degree of the interpolation polynomial, it is easy to see that there are infinitely many polynomials that interpolate the data. However, limiting the degree to ≤ n singles out precisely one interpolant that will do the job. For example, if n = 1, there are infinitely many polynomials that interpolate between (x_0, f(x_0)) and (x_1, f(x_1)). There is only one polynomial of degree ≤ 1 that does the job. This result is formally stated in the following theorem:

Theorem 2.1 If x_0, ..., x_n ∈ R are distinct, then for any f(x_0), ..., f(x_n) there exists a unique polynomial Q_n(x) of degree ≤ n such that the interpolation conditions (2.2) are satisfied.
Proof. We start with the existence part and prove the result by induction. For n = 0, Q_0 = f(x_0). Suppose that Q_{n−1} is a polynomial of degree ≤ n − 1, and suppose also that

    Q_{n−1}(x_j) = f(x_j),    0 ≤ j ≤ n − 1.

Let us now construct from Q_{n−1}(x) a new polynomial, Q_n(x), in the following way:

    Q_n(x) = Q_{n−1}(x) + c(x − x_0)⋯(x − x_{n−1}).    (2.3)

The constant c in (2.3) is yet to be determined. Clearly, the construction of Q_n(x) implies that deg(Q_n(x)) ≤ n. In addition, the polynomial Q_n(x) satisfies the interpolation requirements Q_n(x_j) = f(x_j) for 0 ≤ j ≤ n − 1. All that remains is to determine the constant c in such a way that the last interpolation condition, Q_n(x_n) = f(x_n), is satisfied, i.e.,

    Q_n(x_n) = Q_{n−1}(x_n) + c(x_n − x_0)⋯(x_n − x_{n−1}).    (2.4)

The condition (2.4) defines c as

    c = ( f(x_n) − Q_{n−1}(x_n) ) / ∏_{j=0}^{n−1} (x_n − x_j),    (2.5)

and we are done with the proof of existence.
As for uniqueness, suppose that there are two polynomials Q_n(x), P_n(x) of degree ≤ n that satisfy the interpolation conditions (2.2). Define a polynomial H_n(x) as the difference

    H_n(x) = Q_n(x) − P_n(x).

The degree of H_n(x) is at most n, which means that it can have at most n zeros (unless it is identically zero). However, since both Q_n(x) and P_n(x) satisfy the interpolation requirements (2.2), we have

    H_n(x_j) = (Q_n − P_n)(x_j) = 0,    0 ≤ j ≤ n,

which means that H_n(x) has n + 1 distinct zeros. This leads to a contradiction that can be resolved only if H_n(x) is the zero polynomial, i.e.,

    P_n(x) = Q_n(x),

and uniqueness is established. ∎
2.3 Newton’s Form of the Interpolation Polynomial
One good thing about the proof of Theorem 2.1 is that it is constructive. In other words, we can use the proof to write down a formula for the interpolation polynomial. Following the procedure given by (2.3)-(2.5), we construct the interpolation polynomial in the following way:
• Let

      Q_0(x) = a_0,

  where a_0 = f(x_0).

• Let

      Q_1(x) = a_0 + a_1(x − x_0).

  Following (2.5) we have

      a_1 = ( f(x_1) − Q_0(x_1) ) / (x_1 − x_0) = ( f(x_1) − f(x_0) ) / (x_1 − x_0).

  We note that Q_1(x) is nothing but the straight line connecting the two points (x_0, f(x_0)) and (x_1, f(x_1)).

• In general, let

      Q_n(x) = a_0 + a_1(x − x_0) + … + a_n(x − x_0)⋯(x − x_{n−1})
             = a_0 + ∑_{j=1}^{n} a_j ∏_{k=0}^{j−1} (x − x_k).    (2.6)

  The coefficients a_j in (2.6) are given by

      a_0 = f(x_0),
      a_j = ( f(x_j) − Q_{j−1}(x_j) ) / ∏_{k=0}^{j−1} (x_j − x_k),    1 ≤ j ≤ n.    (2.7)
We refer to the interpolation polynomial when written in the form (2.6)–(2.7) as the
Newton form of the interpolation polynomial. As we shall see below, there are
various ways of writing the interpolation polynomial. The uniqueness of the interpola-
tion polynomial as guaranteed by Theorem 2.1 implies that we will only be rewriting
the same polynomial in different ways.
Example 2.2
The Newton form of the polynomial that interpolates (x_0, f(x_0)) and (x_1, f(x_1)) is

    Q_1(x) = f(x_0) + ( f(x_1) − f(x_0) ) / (x_1 − x_0) · (x − x_0).
Example 2.3
The Newton form of the polynomial that interpolates the three points (x_0, f(x_0)), (x_1, f(x_1)), and (x_2, f(x_2)) is

    Q_2(x) = f(x_0) + ( f(x_1) − f(x_0) ) / (x_1 − x_0) · (x − x_0)
             + [ f(x_2) − f(x_0) − ( f(x_1) − f(x_0) ) / (x_1 − x_0) · (x_2 − x_0) ] / [ (x_2 − x_0)(x_2 − x_1) ] · (x − x_0)(x − x_1).
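The construction above maps directly to a short computation. The following sketch (ours, not part of the original notes; the function names are made up) computes the coefficients a_j of (2.6) from (2.7) and evaluates Q_n(x) by nested multiplication. The data (−1, 9), (0, 5), (1, 3) used in the quick check reappears in Example 2.7 below.

```python
def newton_coefficients(x, f):
    """Compute a_0, ..., a_n of (2.6) via (2.7):
    a_j = (f(x_j) - Q_{j-1}(x_j)) / prod_{k<j} (x_j - x_k)."""
    a = [f[0]]                                   # a_0 = f(x_0)
    for j in range(1, len(x)):
        q, basis = a[0], 1.0                     # evaluate Q_{j-1}(x_j)
        for k in range(1, j):
            basis *= (x[j] - x[k - 1])
            q += a[k] * basis
        denom = 1.0
        for k in range(j):
            denom *= (x[j] - x[k])
        a.append((f[j] - q) / denom)
    return a

def newton_eval(a, x, t):
    """Evaluate Q_n(t) = a_0 + sum_j a_j prod_{k<j} (t - x_k) by nested multiplication."""
    result = a[-1]
    for j in range(len(a) - 2, -1, -1):
        result = result * (t - x[j]) + a[j]
    return result

x = [-1.0, 0.0, 1.0]
f = [9.0, 5.0, 3.0]
a = newton_coefficients(x, f)                    # [9.0, -4.0, 1.0]
print([newton_eval(a, x, t) for t in x])         # reproduces 9, 5, 3
```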
2.4 The Interpolation Problem and the Vandermonde Determinant

An alternative approach to the interpolation problem is to consider directly a polynomial of the form

    Q_n(x) = ∑_{k=0}^{n} b_k x^k,    (2.8)

and require that the following interpolation conditions are satisfied:

    Q_n(x_j) = f(x_j),    0 ≤ j ≤ n.    (2.9)

In view of Theorem 2.1 we already know that this problem has a unique solution, so we should be able to compute directly the coefficients of the polynomial as given in (2.8). Indeed, the interpolation conditions, (2.9), imply that the following equations should hold:

    b_0 + b_1 x_j + … + b_n x_j^n = f(x_j),    j = 0, ..., n.    (2.10)

In matrix form, (2.10) can be rewritten as

    [ 1  x_0  ⋯  x_0^n ] [ b_0 ]   [ f(x_0) ]
    [ 1  x_1  ⋯  x_1^n ] [ b_1 ]   [ f(x_1) ]
    [ ⋮    ⋮        ⋮  ] [  ⋮  ] = [   ⋮    ]    (2.11)
    [ 1  x_n  ⋯  x_n^n ] [ b_n ]   [ f(x_n) ]

In order for the system (2.11) to have a unique solution, it has to be nonsingular. This means, e.g., that the determinant of its coefficient matrix must not vanish, i.e.,

    | 1  x_0  ⋯  x_0^n |
    | 1  x_1  ⋯  x_1^n |
    | ⋮    ⋮        ⋮  |  ≠ 0.    (2.12)
    | 1  x_n  ⋯  x_n^n |

The determinant (2.12) is known as the Vandermonde determinant. We leave it as an exercise to verify that

    | 1  x_0  ⋯  x_0^n |
    | 1  x_1  ⋯  x_1^n |
    | ⋮    ⋮        ⋮  |  = ∏_{i>j} (x_i − x_j).    (2.13)
    | 1  x_n  ⋯  x_n^n |

Since we assume that the points x_0, ..., x_n are distinct, the determinant (2.13) is indeed nonzero. Hence, the system (2.11) has a solution that is also unique, which confirms what we already know according to Theorem 2.1.
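As an illustration of this approach (our sketch, not from the notes), the coefficients b_0, ..., b_n of (2.8) can be obtained by assembling the matrix in (2.11) and solving the linear system. Note that numpy's `vander` orders the powers from highest to lowest unless `increasing=True` is passed, and that for larger n this matrix tends to be badly conditioned, which is one practical reason to prefer the other forms of the interpolation polynomial.

```python
import numpy as np

def interpolate_vandermonde(x, f):
    """Solve the Vandermonde system (2.11) for the coefficients b_0, ..., b_n of (2.8)."""
    V = np.vander(x, increasing=True)       # row j is [1, x_j, x_j^2, ..., x_j^n]
    return np.linalg.solve(V, f)            # [b_0, b_1, ..., b_n]

x = np.array([-1.0, 0.0, 1.0])
f = np.array([9.0, 5.0, 3.0])
b = interpolate_vandermonde(x, f)           # [5., -3., 1.], i.e. Q_2(x) = 5 - 3x + x^2
print(np.polyval(b[::-1], x))               # reproduces the data: [9. 5. 3.]
```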
2.5 The Lagrange Form of the Interpolation Polynomial

The form of the interpolation polynomial that we used in (2.8) assumed a linear combination of polynomials of degrees 0, ..., n, in which the coefficients were unknown. In this section we take a different approach and assume that the interpolation polynomial is given as a linear combination of n + 1 polynomials of degree n. This time, we set the coefficients as the interpolated values, {f(x_j)}_{j=0}^{n}, while the unknowns are the polynomials. We thus let

    Q_n(x) = ∑_{j=0}^{n} f(x_j) l_j^n(x),    (2.14)

where l_j^n(x) are n + 1 polynomials of degree n. We use two indices in these polynomials: the subscript j enumerates l_j^n(x) from 0 to n, and the superscript n is used to remind us that the degree of l_j^n(x) is n. Note that in this particular case, the polynomials l_j^n(x) are precisely of degree n (and not ≤ n). However, Q_n(x), given by (2.14), may have a lower degree. In either case, the degree of Q_n(x) is n at the most. We now require that Q_n(x) satisfies the interpolation conditions

    Q_n(x_i) = f(x_i),    0 ≤ i ≤ n.    (2.15)

By substituting x_i for x in (2.14) we have

    Q_n(x_i) = ∑_{j=0}^{n} f(x_j) l_j^n(x_i),    0 ≤ i ≤ n.

In view of (2.15) we may conclude that l_j^n(x) must satisfy

    l_j^n(x_i) = δ_ij,    (2.16)

where δ_ij is the Kronecker delta,

    δ_ij = 1 if i = j,    δ_ij = 0 if i ≠ j.

One obvious way of constructing polynomials l_j^n of degree n that satisfy (2.16) is the following:

    l_j^n(x) = [ (x − x_0)⋯(x − x_{j−1})(x − x_{j+1})⋯(x − x_n) ] / [ (x_j − x_0)⋯(x_j − x_{j−1})(x_j − x_{j+1})⋯(x_j − x_n) ],    0 ≤ j ≤ n.    (2.17)

Note that the denominator in (2.17) does not vanish since we assume that all interpolation points are distinct, and hence the polynomials l_j^n(x) are well defined. The Lagrange form of the interpolation polynomial is the polynomial Q_n(x) given by (2.14), where the polynomials l_j^n(x) of degree n are given by

    l_j^n(x) = ∏_{i=0, i≠j}^{n} (x − x_i) / ∏_{i=0, i≠j}^{n} (x_j − x_i),    j = 0, ..., n.    (2.18)
Example 2.4
We are interested in finding the Lagrange form of the interpolation polynomial that interpolates two points: (x_0, f(x_0)) and (x_1, f(x_1)). We know that the unique interpolation polynomial through these two points is the line that connects them. Such a line can be written in many different forms. In order to obtain the Lagrange form we let

    l_0^1(x) = (x − x_1)/(x_0 − x_1),    l_1^1(x) = (x − x_0)/(x_1 − x_0).

The desired polynomial is therefore given by the familiar formula

    Q_1(x) = f(x_0) l_0^1(x) + f(x_1) l_1^1(x) = f(x_0) (x − x_1)/(x_0 − x_1) + f(x_1) (x − x_0)/(x_1 − x_0).
Example 2.5
This time we are looking for the Lagrange form of the interpolation polynomial, Q_2(x), that interpolates three points: (x_0, f(x_0)), (x_1, f(x_1)), (x_2, f(x_2)). Unfortunately, the Lagrange form of the interpolation polynomial does not let us use the interpolation polynomial through the first two points, Q_1(x), as a building block for Q_2(x). This means that we have to compute everything from scratch. We start with

    l_0^2(x) = (x − x_1)(x − x_2) / [ (x_0 − x_1)(x_0 − x_2) ],
    l_1^2(x) = (x − x_0)(x − x_2) / [ (x_1 − x_0)(x_1 − x_2) ],
    l_2^2(x) = (x − x_0)(x − x_1) / [ (x_2 − x_0)(x_2 − x_1) ].

The interpolation polynomial is therefore given by

    Q_2(x) = f(x_0) l_0^2(x) + f(x_1) l_1^2(x) + f(x_2) l_2^2(x)
           = f(x_0) (x − x_1)(x − x_2)/[(x_0 − x_1)(x_0 − x_2)] + f(x_1) (x − x_0)(x − x_2)/[(x_1 − x_0)(x_1 − x_2)] + f(x_2) (x − x_0)(x − x_1)/[(x_2 − x_0)(x_2 − x_1)].

It is easy to verify that indeed Q_2(x_j) = f(x_j) for j = 0, 1, 2, as desired.
Remarks.
1. One instance where the Lagrange form of the interpolation polynomial may seem to be advantageous when compared with the Newton form is when you are interested in solving several interpolation problems, all given at the same interpolation points x_0, ..., x_n but with different values f(x_0), ..., f(x_n). In this case, the polynomials l_j^n(x) are identical for all problems since they depend only on the points but not on the values of the function at these points. Therefore, they have to be constructed only once.

2. An alternative form for l_j^n(x) can be obtained in the following way. If we define

       w_n(x) = ∏_{i=0}^{n} (x − x_i),

   then

       w_n′(x) = ∑_{j=0}^{n} ∏_{i=0, i≠j}^{n} (x − x_i).    (2.19)

   When we then evaluate w_n′(x) at any interpolation point, x_j, there is only one term in the sum in (2.19) that does not vanish:

       w_n′(x_j) = ∏_{i=0, i≠j}^{n} (x_j − x_i).

   Hence, l_j^n(x) can be rewritten as

       l_j^n(x) = w_n(x) / [ (x − x_j) w_n′(x_j) ],    0 ≤ j ≤ n.    (2.20)

3. For future reference we note that the coefficient of x^n in the interpolation polynomial Q_n(x) is

       ∑_{j=0}^{n} f(x_j) / ∏_{k=0, k≠j}^{n} (x_j − x_k).    (2.21)

   For example, the coefficient of x in Q_1(x) in Example 2.4 is

       f(x_0)/(x_0 − x_1) + f(x_1)/(x_1 − x_0).
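The following short sketch (ours; the helper names are hypothetical) evaluates Q_n(x) directly from (2.14) and (2.18). It also illustrates remark 1: the basis values depend only on the interpolation points, so the same l_j^n can be reused for any data given at those points.

```python
def lagrange_basis(x_nodes, j, t):
    """Evaluate l_j^n(t) as in (2.18)."""
    num, den = 1.0, 1.0
    for i, xi in enumerate(x_nodes):
        if i != j:
            num *= (t - xi)
            den *= (x_nodes[j] - xi)
    return num / den

def lagrange_eval(x_nodes, f_values, t):
    """Evaluate Q_n(t) = sum_j f(x_j) l_j^n(t) as in (2.14)."""
    return sum(fj * lagrange_basis(x_nodes, j, t) for j, fj in enumerate(f_values))

# the line through (0, 1) and (2, 5), evaluated at the midpoint (Example 2.4 with data)
print(lagrange_eval([0.0, 2.0], [1.0, 5.0], 1.0))    # 3.0
```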
2.6 Divided Differences

We recall that Newton's form of the interpolation polynomial is

    Q_n(x) = a_0 + a_1(x − x_0) + … + a_n(x − x_0)⋯(x − x_{n−1}),

with a_0 = f(x_0) and

    a_j = ( f(x_j) − Q_{j−1}(x_j) ) / ∏_{k=0}^{j−1} (x_j − x_k),    1 ≤ j ≤ n.

We name the j-th coefficient, a_j, the j-th-order divided difference. The j-th-order divided difference, a_j, is based on the points x_0, ..., x_j and on the values of the function at these points, f(x_0), ..., f(x_j). To emphasize this dependence, we use the following notation:

    a_j = f[x_0, ..., x_j],    1 ≤ j ≤ n.

We also denote the zeroth-order divided difference as

    a_0 = f[x_0],

where

    f[x_0] = f(x_0).

When written in terms of the divided differences, the Newton form of the interpolation polynomial becomes

    Q_n(x) = f[x_0] + f[x_0, x_1](x − x_0) + … + f[x_0, ..., x_n] ∏_{k=0}^{n−1} (x − x_k).    (2.22)

There is a simple way of computing the j-th-order divided difference from lower-order divided differences. This is given by the following lemma.

Lemma 2.6 The divided differences satisfy:

    f[x_0, ..., x_n] = ( f[x_1, ..., x_n] − f[x_0, ..., x_{n−1}] ) / (x_n − x_0).    (2.23)
Proof. For any k, we denote by Q_k(x) a polynomial of degree ≤ k that interpolates f(x) at x_0, ..., x_k, i.e.,

    Q_k(x_j) = f(x_j),    0 ≤ j ≤ k.

We now consider the unique polynomial P(x) of degree ≤ n − 1 that interpolates f(x) at x_1, ..., x_n. It is easy to verify that

    Q_n(x) = P(x) + (x − x_n)/(x_n − x_0) · [ P(x) − Q_{n−1}(x) ].    (2.24)

The coefficient of x^n on the left-hand side of (2.24) is f[x_0, ..., x_n]. The coefficient of x^{n−1} in P(x) is f[x_1, ..., x_n] and the coefficient of x^{n−1} in Q_{n−1}(x) is f[x_0, ..., x_{n−1}]. Hence, the coefficient of x^n on the right-hand side of (2.24) is

    ( f[x_1, ..., x_n] − f[x_0, ..., x_{n−1}] ) / (x_n − x_0),

which means that

    f[x_0, ..., x_n] = ( f[x_1, ..., x_n] − f[x_0, ..., x_{n−1}] ) / (x_n − x_0). ∎
Example 2.7
The second-order divided difference is

    f[x_0, x_1, x_2] = ( f[x_1, x_2] − f[x_0, x_1] ) / (x_2 − x_0)
                     = [ (f(x_2) − f(x_1))/(x_2 − x_1) − (f(x_1) − f(x_0))/(x_1 − x_0) ] / (x_2 − x_0).

Hence, the unique polynomial that interpolates (x_0, f(x_0)), (x_1, f(x_1)), and (x_2, f(x_2)) is

    Q_2(x) = f[x_0] + f[x_0, x_1](x − x_0) + f[x_0, x_1, x_2](x − x_0)(x − x_1)
           = f(x_0) + ( f(x_1) − f(x_0) ) / (x_1 − x_0) · (x − x_0)
             + [ (f(x_2) − f(x_1))/(x_2 − x_1) − (f(x_1) − f(x_0))/(x_1 − x_0) ] / (x_2 − x_0) · (x − x_0)(x − x_1).

For example, if we want to find the polynomial of degree ≤ 2 that interpolates (−1, 9), (0, 5), and (1, 3), we have

    f(−1) = 9,
    f[−1, 0] = (5 − 9)/(0 − (−1)) = −4,    f[0, 1] = (3 − 5)/(1 − 0) = −2,
    f[−1, 0, 1] = ( f[0, 1] − f[−1, 0] ) / (1 − (−1)) = (−2 + 4)/2 = 1,

so that

    Q_2(x) = 9 − 4(x + 1) + (x + 1)x = 5 − 3x + x^2.
The relations between the divided differences are schematically portrayed in Table 2.1 (up to third order). We note that the divided differences that are used as the coefficients in the interpolation polynomial are those located at the top of every column. The recursive structure of the divided differences implies that we must compute all the lower-order coefficients in the table in order to obtain the higher-order ones.
x_0   f(x_0)
                 f[x_0, x_1]
x_1   f(x_1)                    f[x_0, x_1, x_2]
                 f[x_1, x_2]                        f[x_0, x_1, x_2, x_3]
x_2   f(x_2)                    f[x_1, x_2, x_3]
                 f[x_2, x_3]
x_3   f(x_3)

Table 2.1: Divided differences
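A compact way (our sketch, not from the notes) to build the table of Table 2.1 column by column with the recursion (2.23). The first entry of each column is the coefficient f[x_0, ..., x_j] that appears in the Newton form (2.22).

```python
def divided_differences(x, f):
    """columns[j][i] = f[x_i, ..., x_{i+j}], computed with the recursion (2.23)."""
    columns = [list(f)]                              # zeroth order: f[x_i] = f(x_i)
    for j in range(1, len(x)):
        prev = columns[-1]
        columns.append([(prev[i + 1] - prev[i]) / (x[i + j] - x[i])
                        for i in range(len(prev) - 1)])
    return columns

table = divided_differences([-1.0, 0.0, 1.0], [9.0, 5.0, 3.0])
print([col[0] for col in table])     # [9.0, -4.0, 1.0], the coefficients found in Example 2.7
```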
One important property of any divided difference is that it is a symmetric function of its arguments. This means that if we assume that y_0, ..., y_n is any permutation of x_0, ..., x_n, then

    f[y_0, ..., y_n] = f[x_0, ..., x_n].

This property can be clearly explained by recalling that f[x_0, ..., x_n] plays the role of the coefficient of x^n in the polynomial that interpolates f(x) at x_0, ..., x_n. At the same time, f[y_0, ..., y_n] is the coefficient of x^n in the polynomial that interpolates f(x) at the same points. Since the interpolation polynomial is unique for any given data, the order of the points does not matter, and hence these two coefficients must be identical.
2.7 The Error in Polynomial Interpolation

In this section we would like to provide estimates on the "error" we make when interpolating data that is taken from sampling an underlying function f(x). While the interpolant and the function agree with each other at the interpolation points, there is, in general, no reason to expect them to be close to each other elsewhere. Nevertheless, we can estimate the difference between them, a difference which we refer to as the interpolation error. We let Π_n denote the space of polynomials of degree ≤ n.

Theorem 2.8 Let f(x) ∈ C^{n+1}[a, b]. Let Q_n(x) ∈ Π_n be such that it interpolates f(x) at the n + 1 distinct points x_0, ..., x_n ∈ [a, b]. Then ∀x ∈ [a, b], ∃ξ_n ∈ (a, b) such that

    f(x) − Q_n(x) = 1/(n + 1)! · f^{(n+1)}(ξ_n) ∏_{j=0}^{n} (x − x_j).    (2.25)
Proof. We fix a point x ∈ [a, b]. If x is one of the interpolation points x_0, ..., x_n, then the left-hand side and the right-hand side of (2.25) are both zero, and hence the result holds trivially. We therefore assume that x ≠ x_j for 0 ≤ j ≤ n, and let

    w(x) = ∏_{j=0}^{n} (x − x_j).

We now let

    F(y) = f(y) − Q_n(y) − λ w(y),

where λ is chosen so as to guarantee that F(x) = 0, i.e.,

    λ = ( f(x) − Q_n(x) ) / w(x).

Since the interpolation points x_0, ..., x_n and x are distinct, w(x) does not vanish and λ is well defined. We now note that since f ∈ C^{n+1}[a, b] and since Q_n and w are polynomials, then also F ∈ C^{n+1}[a, b]. In addition, F vanishes at n + 2 points: x_0, ..., x_n and x. According to Rolle's theorem, F′ has at least n + 1 distinct zeros in (a, b), F″ has at least n distinct zeros in (a, b), and similarly, F^{(n+1)} has at least one zero in (a, b), which we denote by ξ_n. We have

    0 = F^{(n+1)}(ξ_n) = f^{(n+1)}(ξ_n) − Q_n^{(n+1)}(ξ_n) − λ w^{(n+1)}(ξ_n)    (2.26)
      = f^{(n+1)}(ξ_n) − ( f(x) − Q_n(x) ) / w(x) · (n + 1)!

Here, we used the fact that Q_n^{(n+1)} ≡ 0 since deg(Q_n) ≤ n, and that the leading term of w(x) is x^{n+1}, which guarantees that its (n + 1)-th derivative equals

    w^{(n+1)}(x) = (n + 1)!    (2.27)

Reordering the terms in (2.26) we conclude with

    f(x) − Q_n(x) = 1/(n + 1)! · f^{(n+1)}(ξ_n) w(x). ∎
In addition to the interpretation of the divided difference of order n as the coefficient of x^n in some interpolation polynomial, there is another important characterization which we will comment on now. Consider, e.g., the first-order divided difference

    f[x_0, x_1] = ( f(x_1) − f(x_0) ) / (x_1 − x_0).

Since the order of the points does not change the value of the divided difference, we can assume, without any loss of generality, that x_0 < x_1. If we assume, in addition, that f(x) is continuously differentiable in the interval [x_0, x_1], then this divided difference equals the derivative of f(x) at an intermediate point, i.e.,

    f[x_0, x_1] = f′(ξ),    ξ ∈ (x_0, x_1).

In other words, the first-order divided difference can be viewed as an approximation of the first derivative in the interval. It is important to note that while this interpretation is based on additional smoothness requirements on f(x) (i.e., its being differentiable), the divided differences are well defined also for non-differentiable functions.

This notion can be extended to divided differences of higher order, as stated by the following theorem.
Theorem 2.9 Let x, x_0, ..., x_{n−1} be n + 1 distinct points. Let a = min(x, x_0, ..., x_{n−1}) and b = max(x, x_0, ..., x_{n−1}). Assume that f(y) has a continuous derivative of order n in the interval (a, b). Then

    f[x_0, ..., x_{n−1}, x] = f^{(n)}(ξ) / n!,    (2.28)

where ξ ∈ (a, b).
Proof. Let Q_n(y) interpolate f(y) at x_0, ..., x_{n−1}, x. Then according to the construction of the Newton form of the interpolation polynomial (2.22), we know that

    Q_n(y) = Q_{n−1}(y) + f[x_0, ..., x_{n−1}, x] ∏_{j=0}^{n−1} (y − x_j).

Since Q_n(y) interpolates f(y) at x, we have

    f(x) = Q_{n−1}(x) + f[x_0, ..., x_{n−1}, x] ∏_{j=0}^{n−1} (x − x_j).

By Theorem 2.8 we know that the interpolation error is given by

    f(x) − Q_{n−1}(x) = 1/n! · f^{(n)}(ξ_{n−1}) ∏_{j=0}^{n−1} (x − x_j),

which implies the result (2.28). ∎
Remark. In equation (2.28), we could just as well think of the interpolation point x as any other interpolation point, and name it, e.g., x_n. In this case, equation (2.28) takes the somewhat more natural form

    f[x_0, ..., x_n] = f^{(n)}(ξ) / n!.

In other words, the n-th-order divided difference is an n-th derivative of the function f(x) at an intermediate point, assuming that the function has n continuous derivatives. Similarly to the first-order divided difference, we would like to emphasize that the n-th-order divided difference is also well defined in cases where the function is not as smooth as required in the theorem, though if this is the case, we can no longer consider this divided difference to represent an n-th-order derivative of the function.
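To see the error formula (2.25) at work numerically, here is a small check (our sketch; the function and nodes are an arbitrary choice, not from the notes): for f(x) = e^x on [0, 1] interpolated at 0, 0.5, 1, the observed error stays below the pointwise bound (1/3!) max|f‴| |∏_j (x − x_j)|.

```python
import numpy as np

nodes = np.array([0.0, 0.5, 1.0])       # f(x) = exp(x) sampled at three points
fvals = np.exp(nodes)

def q2(t):
    """Degree-2 interpolant through the three nodes, written in the Lagrange form (2.14)."""
    total = 0.0
    for j in range(3):
        lj = 1.0
        for i in range(3):
            if i != j:
                lj *= (t - nodes[i]) / (nodes[j] - nodes[i])
        total += fvals[j] * lj
    return total

xs = np.linspace(0.0, 1.0, 401)
err = np.abs(np.exp(xs) - np.array([q2(t) for t in xs]))
# pointwise bound from (2.25): (1/3!) max|f'''| |prod_j (x - x_j)|, with max|f'''| = e on [0, 1]
bound = (np.e / 6.0) * np.abs(xs * (xs - 0.5) * (xs - 1.0))
print(float(err.max()), float(bound.max()))   # the maximal observed error sits below the bound
```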
2.8 Interpolation at the Chebyshev Points

In the entire discussion so far, we assumed that the interpolation points are given. There may be cases where one has the flexibility of choosing the interpolation points. If this is the case, it would be reasonable to use this degree of freedom to minimize the interpolation error.

We recall that if we are interpolating values of a function f(x) that has n + 1 continuous derivatives, the interpolation error is of the form

    f(x) − Q_n(x) = 1/(n + 1)! · f^{(n+1)}(ξ_n) ∏_{j=0}^{n} (x − x_j).    (2.29)

Here, Q_n(x) is the interpolating polynomial and ξ_n is an intermediate point in the interval of interest (see (2.25)).

It is important to note that the interpolation points influence two terms on the right-hand side of (2.29). The obvious one is the product

    ∏_{j=0}^{n} (x − x_j).    (2.30)

The second one is f^{(n+1)}(ξ_n), since ξ_n depends on x_0, ..., x_n. Due to the implicit dependence of ξ_n on the interpolation points, minimizing the interpolation error is not an easy task. We will return to this "full" problem later on in the context of the minimax approximation. For the time being, we are going to focus on a simpler problem, namely, how to choose the interpolation points x_0, ..., x_n such that the product (2.30) is minimized. The solution of this problem is the topic of this section. Once again, we would like to emphasize that a solution of this problem does not (in general) provide an optimal choice of interpolation points that will minimize the interpolation error. All that it guarantees is that the product part of the interpolation error is minimal.
The tool that we are going to use is the Chebyshev polynomials. The solution of the problem will be to interpolate at the Chebyshev points. We will first introduce the Chebyshev polynomials and the Chebyshev points, and then show why interpolating at these points minimizes (2.30).
We start by defining the Chebyshev polynomials using the following recursion relation:

    T_0(x) = 1,
    T_1(x) = x,
    T_{n+1}(x) = 2x T_n(x) − T_{n−1}(x),    n ≥ 1.    (2.31)

For example, T_2(x) = 2x T_1(x) − T_0(x) = 2x^2 − 1, and T_3(x) = 4x^3 − 3x. The polynomials T_1(x), T_2(x) and T_3(x) are shown in Figure 2.2.

Instead of writing the recursion formula, (2.31), it is possible to write an explicit formula for the Chebyshev polynomials:

Lemma 2.10 For x ∈ [−1, 1],

    T_n(x) = cos(n cos^{−1} x),    n ≥ 0.    (2.32)
Figure 2.2: The Chebyshev polynomials T_1(x), T_2(x) and T_3(x)
Proof. Standard trigonometric identities imply that

    cos((n + 1)θ) = cos θ cos nθ − sin θ sin nθ,
    cos((n − 1)θ) = cos θ cos nθ + sin θ sin nθ.

Hence

    cos((n + 1)θ) = 2 cos θ cos nθ − cos((n − 1)θ).    (2.33)

We now let θ = cos^{−1} x, i.e., x = cos θ, and define

    t_n(x) = cos(n cos^{−1} x) = cos(nθ).

Then by (2.33)

    t_0(x) = 1,
    t_1(x) = x,
    t_{n+1}(x) = 2x t_n(x) − t_{n−1}(x),    n ≥ 1.

Hence t_n(x) = T_n(x). ∎
What is so special about the Chebyshev polynomials, and what is the connection between these polynomials and minimizing the interpolation error? We are about to answer these questions, but before doing so, there is one more issue that we must clarify.

We define a monic polynomial as a polynomial for which the coefficient of the leading term is one, i.e., a polynomial of degree n is monic if it is of the form

    x^n + a_{n−1} x^{n−1} + … + a_1 x + a_0.
Note that Chebyshev polynomials are not monic: the definition (2.31) implies that the Chebyshev polynomial of degree n is of the form

    T_n(x) = 2^{n−1} x^n + …

This means that T_n(x) divided by 2^{n−1} is monic, i.e.,

    2^{1−n} T_n(x) = x^n + …

A general result about monic polynomials is the following.

Theorem 2.11 If p_n(x) is a monic polynomial of degree n, then

    max_{−1 ≤ x ≤ 1} |p_n(x)| ≥ 2^{1−n}.    (2.34)
Proof. We prove (2.34) by contradiction. Suppose that

    |p_n(x)| < 2^{1−n},    |x| ≤ 1.

Let

    q_n(x) = 2^{1−n} T_n(x),

and let x_j be the following n + 1 points:

    x_j = cos(jπ/n),    0 ≤ j ≤ n.

Since

    T_n(cos(jπ/n)) = (−1)^j,

we have

    (−1)^j q_n(x_j) = 2^{1−n}.

Hence

    (−1)^j p_n(x_j) ≤ |p_n(x_j)| < 2^{1−n} = (−1)^j q_n(x_j).

This means that

    (−1)^j ( q_n(x_j) − p_n(x_j) ) > 0,    0 ≤ j ≤ n.

Hence, the polynomial (q_n − p_n)(x) oscillates (n + 1) times in the interval [−1, 1], which means that (q_n − p_n)(x) has at least n distinct roots in the interval. However, p_n(x) and q_n(x) are both monic polynomials, which means that their difference is a polynomial of degree n − 1 at most. Such a polynomial cannot have more than n − 1 distinct roots, which leads to a contradiction. Note that p_n − q_n cannot be the zero polynomial, because that would imply that p_n(x) and q_n(x) are identical, which again is not possible due to the assumptions on their maximum values. ∎
We are now ready to use Theorem 2.11 to figure out how to reduce the interpolation error. We know by Theorem 2.8 that if the interpolation points x_0, ..., x_n ∈ [−1, 1], then there exists ξ_n ∈ (−1, 1) such that the distance between the function whose values we interpolate, f(x), and the interpolation polynomial, Q_n(x), satisfies

    max_{|x| ≤ 1} |f(x) − Q_n(x)| ≤ 1/(n + 1)! · max_{|x| ≤ 1} |f^{(n+1)}(x)| · max_{|x| ≤ 1} |∏_{j=0}^{n} (x − x_j)|.

We are interested in minimizing

    max_{|x| ≤ 1} |∏_{j=0}^{n} (x − x_j)|.

We note that ∏_{j=0}^{n} (x − x_j) is a monic polynomial of degree n + 1, and hence by Theorem 2.11

    max_{|x| ≤ 1} |∏_{j=0}^{n} (x − x_j)| ≥ 2^{−n}.

The minimal value of 2^{−n} can actually be obtained if we set

    2^{−n} T_{n+1}(x) = ∏_{j=0}^{n} (x − x_j),

which is equivalent to choosing x_j as the roots of the Chebyshev polynomial T_{n+1}(x). Here, we have used the obvious fact that |T_n(x)| ≤ 1.
What are the roots of the Chebyshev polynomial T_{n+1}(x)? By Lemma 2.10,

    T_{n+1}(x) = cos((n + 1) cos^{−1} x).

The roots of T_{n+1}(x), x_0, ..., x_n, are therefore obtained if

    (n + 1) cos^{−1}(x_j) = (j + 1/2) π,    0 ≤ j ≤ n,

i.e., the (n + 1) roots of T_{n+1}(x) are

    x_j = cos( (2j + 1)π / (2n + 2) ),    0 ≤ j ≤ n.    (2.35)
The roots of the Chebyshev polynomials are sometimes referred to as the Chebyshev points. The formula (2.35) for the roots of the Chebyshev polynomial has the following geometrical interpretation. In order to find the roots of T_n(x), define α = π/n. Divide the upper half of the unit circle into n + 1 parts such that the two side angles are α/2 and the other angles are α. The Chebyshev points are then obtained by projecting these points onto the x-axis. This procedure is demonstrated in Figure 2.3 for T_4(x).

Figure 2.3: The roots of the Chebyshev polynomial T_4(x), x_0, ..., x_3. Note that they become dense next to the boundary of the interval
The following theorem summarizes the discussion on interpolation at the Chebyshev points. It also provides an estimate of the error for this case.

Theorem 2.12 Assume that Q_n(x) interpolates f(x) at x_0, ..., x_n. Assume also that these (n + 1) interpolation points are the (n + 1) roots of the Chebyshev polynomial of degree n + 1, T_{n+1}(x), i.e.,

    x_j = cos( (2j + 1)π / (2n + 2) ),    0 ≤ j ≤ n.

Then ∀|x| ≤ 1,

    |f(x) − Q_n(x)| ≤ 1/( 2^n (n + 1)! ) · max_{|ξ| ≤ 1} |f^{(n+1)}(ξ)|.    (2.36)
Example 2.13
Problem: Let f(x) = sin(πx) in the interval [−1, 1]. Find Q_2(x) which interpolates f(x) at the Chebyshev points. Estimate the error.

Solution: Since we are asked to find an interpolation polynomial of degree ≤ 2, we need 3 interpolation points. We are also asked to interpolate at the Chebyshev points, and hence we first need to compute the 3 roots of the Chebyshev polynomial of degree 3,

    T_3(x) = 4x^3 − 3x.

The roots of T_3(x) can be easily found from x(4x^2 − 3) = 0, i.e.,

    x_0 = −√3/2,    x_1 = 0,    x_2 = √3/2.

The corresponding values of f(x) at these interpolation points are

    f(x_0) = sin(−(√3/2)π) ≈ −0.4086,
    f(x_1) = 0,
    f(x_2) = sin((√3/2)π) ≈ 0.4086.

The first-order divided differences are

    f[x_0, x_1] = ( f(x_1) − f(x_0) ) / (x_1 − x_0) ≈ 0.4718,
    f[x_1, x_2] = ( f(x_2) − f(x_1) ) / (x_2 − x_1) ≈ 0.4718,

and the second-order divided difference is

    f[x_0, x_1, x_2] = ( f[x_1, x_2] − f[x_0, x_1] ) / (x_2 − x_0) = 0.

The interpolation polynomial is

    Q_2(x) = f(x_0) + f[x_0, x_1](x − x_0) + f[x_0, x_1, x_2](x − x_0)(x − x_1) ≈ 0.4718x.

The original function f(x) and the interpolant at the Chebyshev points, Q_2(x), are plotted in Figure 2.4.

As for the error estimate, ∀|x| ≤ 1,

    |sin(πx) − Q_2(x)| ≤ 1/(2^2 · 3!) · max_{|t| ≤ 1} |(sin(πt))‴| = π^3/(2^2 · 3!) ≈ 1.292.

A brief examination of Figure 2.4 reveals that while this error estimate is correct, it is far from being sharp.
Remark. In the more general case where the interpolation interval for the function f(x) is x ∈ [a, b], it is still possible to use the previous results by proceeding as follows. Start by converting the interpolation interval to y ∈ [−1, 1] via

    x = ( (b − a)y + (a + b) ) / 2.

This converts the interpolation problem for f(x) on [a, b] into an interpolation problem for g(y) = f(x(y)) on y ∈ [−1, 1]. The Chebyshev points in the interval y ∈ [−1, 1] are the roots of the Chebyshev polynomial T_{n+1}(y), i.e.,

    y_j = cos( (2j + 1)π / (2n + 2) ),    0 ≤ j ≤ n.
Figure 2.4: The function f(x) = sin(πx) and the interpolation polynomial Q_2(x) that interpolates f(x) at the Chebyshev points. See Example 2.13.
The corresponding n + 1 interpolation points in the interval [a, b] are

    x_j = ( (b − a)y_j + (a + b) ) / 2,    0 ≤ j ≤ n.

We now have

    max_{x ∈ [a,b]} |∏_{j=0}^{n} (x − x_j)| = ((b − a)/2)^{n+1} max_{|y| ≤ 1} |∏_{j=0}^{n} (y − y_j)|,

so that the interpolation error is

    |f(x) − Q_n(x)| ≤ 1/( 2^n (n + 1)! ) · ((b − a)/2)^{n+1} max_{ξ ∈ [a,b]} |f^{(n+1)}(ξ)|.    (2.37)
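The following sketch (ours; the helper names are made up) computes the Chebyshev points (2.35), maps them to a general interval [a, b] as in the remark above, and reproduces the interpolant of Example 2.13 using the divided-difference machinery of Section 2.6.

```python
import numpy as np

def chebyshev_points(n, a=-1.0, b=1.0):
    """The n+1 roots of T_{n+1}, see (2.35), mapped from [-1, 1] to [a, b]."""
    j = np.arange(n + 1)
    y = np.cos((2 * j + 1) * np.pi / (2 * n + 2))
    return ((b - a) * y + (a + b)) / 2.0

def newton_interp(x_nodes, f_values):
    """Return a callable Q_n built from the divided differences of (2.22)."""
    a = np.array(f_values, dtype=float)
    for j in range(1, len(x_nodes)):
        a[j:] = (a[j:] - a[j - 1:-1]) / (x_nodes[j:] - x_nodes[:-j])
    def q(t):
        result = a[-1]
        for k in range(len(a) - 2, -1, -1):
            result = result * (t - x_nodes[k]) + a[k]
        return result
    return q

# Example 2.13: f(x) = sin(pi x) interpolated at the roots of T_3
nodes = chebyshev_points(2)
q2 = newton_interp(nodes, np.sin(np.pi * nodes))
print(q2(0.5))                       # approximately 0.4718 * 0.5 = 0.2359
```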
2.9 Hermite Interpolation

We now turn to a slightly different interpolation problem, in which we assume that in addition to interpolating the values of the function at certain points, we are also interested in interpolating its derivatives. Interpolation that involves the derivatives is called Hermite interpolation. Such an interpolation problem is demonstrated in the following example:

Example 2.14
Problem: Find a polynomial p(x) such that p(1) = −1, p′(1) = −1, and p(0) = 1.

Solution: Since three conditions have to be satisfied, we can use these conditions to determine three degrees of freedom, which means that it is reasonable to expect that these conditions uniquely determine a polynomial of degree ≤ 2. We therefore let

    p(x) = a_0 + a_1 x + a_2 x^2.

The conditions of the problem then imply that

    a_0 + a_1 + a_2 = −1,
    a_1 + 2a_2 = −1,
    a_0 = 1.

Hence, there is indeed a unique polynomial that satisfies the interpolation conditions, and it is

    p(x) = x^2 − 3x + 1.
In general, we may have to interpolate high-order derivatives and not only first-order derivatives. Also, we assume that for any point x_j at which we have to satisfy an interpolation condition of the form

    p^{(l)}(x_j) = f^{(l)}(x_j)

(with p^{(l)} being the l-th-order derivative of p(x)), we are also given all the values of the lower-order derivatives up to l as part of the interpolation requirements, i.e.,

    p^{(i)}(x_j) = f^{(i)}(x_j),    0 ≤ i ≤ l.

If this is not the case, it may not be possible to find a unique interpolant, as demonstrated in the following example.
Example 2.15
Problem: Find p(x) such that p′(0) = 1 and p′(1) = −1.

Solution: Since we are asked to interpolate two conditions, we may expect them to uniquely determine a linear function, say

    p(x) = a_0 + a_1 x.

However, both conditions specify the derivative of p(x), at two distinct points, to be of different values, which amounts to contradictory information on the value of a_1. Hence, a linear polynomial cannot interpolate the data and we must consider higher-order polynomials. Unfortunately, a polynomial of degree 2 will no longer be unique because not enough information is given. Note that even if the prescribed values of the derivatives were identical, we would not have problems with the coefficient of the linear term a_1, but we would still not have enough information to determine the constant a_0.
A simple case that you are probably already familiar with is the Taylor series. When viewed from the point of view that we advocate in this section, one can consider the Taylor series as an interpolation problem in which one has to interpolate the value of the function and its first n derivatives at a given point, say x_0, i.e., the interpolation conditions are:

    p^{(j)}(x_0) = f^{(j)}(x_0),    0 ≤ j ≤ n.

The unique solution of this problem in terms of a polynomial of degree ≤ n is

    p(x) = f(x_0) + f′(x_0)(x − x_0) + … + f^{(n)}(x_0)/n! · (x − x_0)^n = ∑_{j=0}^{n} f^{(j)}(x_0)/j! · (x − x_0)^j,

which is the Taylor series of f(x) expanded about x = x_0.
2.9.1 Divided differences with repetitions

We are now ready to consider the Hermite interpolation problem. The first form we study is the Newton form of the Hermite interpolation polynomial. We start by extending the definition of divided differences in such a way that they can handle derivatives. We already know that the first derivative is connected with the first-order divided difference by

    f′(x_0) = lim_{x→x_0} ( f(x) − f(x_0) ) / (x − x_0) = lim_{x→x_0} f[x, x_0].

Hence, it is natural to extend the notion of divided differences by the following definition.

Definition 2.16 The first-order divided difference with repetitions is defined as

    f[x_0, x_0] = f′(x_0).    (2.38)

In a similar way, we can extend the notion of divided differences to higher-order derivatives, as stated in the following lemma (which we leave without a proof).
Lemma 2.17 Let x_0 ≤ x_1 ≤ … ≤ x_n. Then the divided differences satisfy

    f[x_0, ..., x_n] = ( f[x_1, ..., x_n] − f[x_0, ..., x_{n−1}] ) / (x_n − x_0),    if x_n ≠ x_0,
    f[x_0, ..., x_n] = f^{(n)}(x_0) / n!,    if x_n = x_0.    (2.39)
We now consider the following Hermite interpolation problem: the interpolation points are x_0, ..., x_l (which we assume are ordered from small to large). At each interpolation point x_j, we have to satisfy the interpolation conditions:

    p^{(i)}(x_j) = f^{(i)}(x_j),    0 ≤ i ≤ m_j.
Here, m_j denotes the number of derivatives that we have to interpolate for each point x_j (with the standard notation that zero derivatives refers to the value of the function only). In general, the number of derivatives that we have to interpolate may change from point to point. The extended notion of divided differences allows us to write the solution to this problem in the following way:

We let n denote the total number of points including their multiplicities (that correspond to the number of derivatives we have to interpolate at each point), i.e.,

    n = m_1 + m_2 + … + m_l.

We then list all the points including their multiplicities (that correspond to the number of derivatives we have to interpolate). To simplify the notation we identify these points with a new ordered list of points y_i:

    {y_0, ..., y_{n−1}} = { x_0, ..., x_0 (m_1 times), x_1, ..., x_1 (m_2 times), ..., x_l, ..., x_l (m_l times) }.

The interpolation polynomial p_{n−1}(x) is given by

    p_{n−1}(x) = f[y_0] + ∑_{j=1}^{n−1} f[y_0, ..., y_j] ∏_{k=0}^{j−1} (x − y_k).    (2.40)

Whenever a point repeats in f[y_0, ..., y_j], we interpret this divided difference in terms of the extended definition (2.39). In practice, there is no need to shift the notation to y's, and we work directly with the original points. We demonstrate this interpolation procedure in the following example.
Example 2.18
Problem: Find an interpolation polynomial p(x) that satisfies

    p(x_0) = f(x_0),
    p(x_1) = f(x_1),
    p′(x_1) = f′(x_1).

Solution: The interpolation polynomial p(x) is

    p(x) = f(x_0) + f[x_0, x_1](x − x_0) + f[x_0, x_1, x_1](x − x_0)(x − x_1).

The divided differences:

    f[x_0, x_1] = ( f(x_1) − f(x_0) ) / (x_1 − x_0),

    f[x_0, x_1, x_1] = ( f[x_1, x_1] − f[x_0, x_1] ) / (x_1 − x_0)
                     = ( f′(x_1) − (f(x_1) − f(x_0))/(x_1 − x_0) ) / (x_1 − x_0).

Hence

    p(x) = f(x_0) + ( f(x_1) − f(x_0) ) / (x_1 − x_0) · (x − x_0)
           + [ (x_1 − x_0) f′(x_1) − ( f(x_1) − f(x_0) ) ] / (x_1 − x_0)^2 · (x − x_0)(x − x_1).
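A small sketch (ours, with made-up names) of the same computation done mechanically: the repeated node is handled through the extended definition (2.38)-(2.39), here only for the pattern of Example 2.18 (one value at x_0, a value and a first derivative at x_1).

```python
def hermite_newton(x0, f0, x1, f1, df1):
    """Coefficients of p(x) = c0 + c1 (x - x0) + c2 (x - x0)(x - x1)
    matching p(x0) = f0, p(x1) = f1, p'(x1) = df1, as in Example 2.18."""
    c0 = f0                              # f[x0]
    c1 = (f1 - f0) / (x1 - x0)           # f[x0, x1]
    c2 = (df1 - c1) / (x1 - x0)          # f[x0, x1, x1], using f[x1, x1] = f'(x1)
    return c0, c1, c2

def p(t, x0, x1, c):
    c0, c1, c2 = c
    return c0 + c1 * (t - x0) + c2 * (t - x0) * (t - x1)

# quick check with f(x) = x^2, x0 = 0, x1 = 1: the data is matched exactly
c = hermite_newton(0.0, 0.0, 1.0, 1.0, 2.0)
print([p(t, 0.0, 1.0, c) for t in (0.0, 0.5, 1.0)])    # [0.0, 0.25, 1.0]
```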
2.9.2 The Lagrange form of the Hermite interpolant

In this section we are interested in writing the Lagrange form of the Hermite interpolant in the special case in which the nodes are x_0, ..., x_n and the interpolation conditions are

    p(x_i) = f(x_i),    p′(x_i) = f′(x_i),    0 ≤ i ≤ n.    (2.41)

We look for an interpolant of the form

    p(x) = ∑_{i=0}^{n} f(x_i) A_i(x) + ∑_{i=0}^{n} f′(x_i) B_i(x).    (2.42)

In order to satisfy the 2n + 2 interpolation conditions (2.41), the polynomials A_i(x) and B_i(x) in (2.42) must satisfy

    A_i(x_j) = δ_ij,    B_i(x_j) = 0,
    A_i′(x_j) = 0,      B_i′(x_j) = δ_ij,        i, j = 0, ..., n.    (2.43)

We thus expect to have a unique polynomial p(x) that satisfies the constraints (2.43), assuming that we limit its degree to be ≤ 2n + 1.
It is convenient to start the construction with the functions we have used in the Lagrange form of the standard interpolation problem (Section 2.5). We already know that

    l_i(x) = ∏_{j=0, j≠i}^{n} (x − x_j)/(x_i − x_j)

satisfy l_i(x_j) = δ_ij. In addition, for i ≠ j,

    l_i^2(x_j) = 0,    (l_i^2)′(x_j) = 0.

The degree of l_i(x) is n, which means that the degree of l_i^2(x) is 2n. We will thus assume that the unknown polynomials A_i(x) and B_i(x) in (2.43) can be written as

    A_i(x) = r_i(x) l_i^2(x),
    B_i(x) = s_i(x) l_i^2(x).

The functions r_i(x) and s_i(x) are both assumed to be linear, which implies that deg(A_i) = deg(B_i) = 2n + 1, as desired. Now, according to (2.43),

    δ_ij = A_i(x_j) = r_i(x_j) l_i^2(x_j) = r_i(x_j) δ_ij.

Hence

    r_i(x_i) = 1.    (2.44)
Also,

    0 = A_i′(x_j) = r_i′(x_j)[l_i(x_j)]^2 + 2 r_i(x_j) l_i(x_j) l_i′(x_j) = r_i′(x_j) δ_ij + 2 r_i(x_j) δ_ij l_i′(x_j),

and thus

    r_i′(x_i) + 2 l_i′(x_i) = 0.    (2.45)

Assuming that r_i(x) is linear, r_i(x) = ax + b, equations (2.44)-(2.45) imply that

    a = −2 l_i′(x_i),    b = 1 + 2 l_i′(x_i) x_i.

Therefore

    A_i(x) = [ 1 + 2 l_i′(x_i)(x_i − x) ] l_i^2(x).
As for B_i(x) in (2.42), the conditions (2.43) imply that

    0 = B_i(x_j) = s_i(x_j) l_i^2(x_j)   ⟹   s_i(x_i) = 0,    (2.46)

and

    δ_ij = B_i′(x_j) = s_i′(x_j) l_i^2(x_j) + s_i(x_j)(l_i^2)′(x_j)   ⟹   s_i′(x_i) = 1.    (2.47)

Combining (2.46) and (2.47), we obtain

    s_i(x) = x − x_i,

so that

    B_i(x) = (x − x_i) l_i^2(x).

To summarize, the Lagrange form of the Hermite interpolation polynomial is given by

    p(x) = ∑_{i=0}^{n} f(x_i)[ 1 + 2 l_i′(x_i)(x_i − x) ] l_i^2(x) + ∑_{i=0}^{n} f′(x_i)(x − x_i) l_i^2(x).    (2.48)

The error in the Hermite interpolation (2.48) is given by the following theorem.
Theorem 2.19 Let x_0, ..., x_n be distinct nodes in [a, b] and f ∈ C^{2n+2}[a, b]. If p ∈ Π_{2n+1} is such that ∀ 0 ≤ i ≤ n,

    p(x_i) = f(x_i),    p′(x_i) = f′(x_i),

then ∀x ∈ [a, b], there exists ξ ∈ (a, b) such that

    f(x) − p(x) = f^{(2n+2)}(ξ) / (2n + 2)! · ∏_{i=0}^{n} (x − x_i)^2.    (2.49)
Proof. The proof follows the same techniques we used in proving Theorem 2.8. If x is one of the interpolation points, the result trivially holds. We thus fix x as a non-interpolation point and define

    w(y) = ∏_{i=0}^{n} (y − x_i)^2.

We also let

    φ(y) = f(y) − p(y) − λ w(y),

and select λ such that φ(x) = 0, i.e.,

    λ = ( f(x) − p(x) ) / w(x).

φ has (at least) n + 2 zeros in [a, b]: x, x_0, ..., x_n. By Rolle's theorem, we know that φ′ has (at least) n + 1 zeros that are different from x, x_0, ..., x_n. Also, φ′ vanishes at x_0, ..., x_n, which means that φ′ has at least 2n + 2 zeros in [a, b]. Similarly, Rolle's theorem implies that φ″ has at least 2n + 1 zeros in (a, b), and by induction, φ^{(2n+2)} has at least one zero in (a, b), say ξ. Hence

    0 = φ^{(2n+2)}(ξ) = f^{(2n+2)}(ξ) − p^{(2n+2)}(ξ) − λ w^{(2n+2)}(ξ).

Since the leading term in w(y) is y^{2n+2}, w^{(2n+2)}(ξ) = (2n + 2)!. Also, since p(x) ∈ Π_{2n+1}, p^{(2n+2)}(ξ) = 0. We recall that x was an arbitrary (non-interpolation) point and hence we have

    f(x) − p(x) = f^{(2n+2)}(ξ) / (2n + 2)! · ∏_{i=0}^{n} (x − x_i)^2. ∎
Example 2.20
Assume that we would like to find the Hermite interpolation polynomial that satisfies:

    p(x_0) = y_0,    p′(x_0) = d_0,    p(x_1) = y_1,    p′(x_1) = d_1.

In this case n = 1, and

    l_0(x) = (x − x_1)/(x_0 − x_1),    l_0′(x) = 1/(x_0 − x_1),
    l_1(x) = (x − x_0)/(x_1 − x_0),    l_1′(x) = 1/(x_1 − x_0).

According to (2.48), the desired polynomial is given by (check!)

    p(x) = y_0 [ 1 + 2(x_0 − x)/(x_0 − x_1) ] ( (x − x_1)/(x_0 − x_1) )^2 + y_1 [ 1 + 2(x_1 − x)/(x_1 − x_0) ] ( (x − x_0)/(x_1 − x_0) )^2
           + d_0 (x − x_0) ( (x − x_1)/(x_0 − x_1) )^2 + d_1 (x − x_1) ( (x − x_0)/(x_1 − x_0) )^2.
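A quick numerical sanity check (ours) of the formula in Example 2.20: we evaluate p(x) built from (2.48) for n = 1 and verify the four Hermite conditions, using data sampled from f(x) = x^3 so that the answer is known exactly.

```python
def hermite_two_point(x0, x1, y0, y1, d0, d1):
    """The cubic of Example 2.20, assembled directly from (2.48)."""
    def p(x):
        l0 = (x - x1) / (x0 - x1)
        l1 = (x - x0) / (x1 - x0)
        a0 = (1 + 2 * (x0 - x) / (x0 - x1)) * l0 ** 2
        a1 = (1 + 2 * (x1 - x) / (x1 - x0)) * l1 ** 2
        b0 = (x - x0) * l0 ** 2
        b1 = (x - x1) * l1 ** 2
        return y0 * a0 + y1 * a1 + d0 * b0 + d1 * b1
    return p

# data from f(x) = x^3 on [0, 2]: values 0 and 8, slopes 0 and 12
p = hermite_two_point(0.0, 2.0, 0.0, 8.0, 0.0, 12.0)
print(p(0.0), p(1.0), p(2.0))          # 0.0, 1.0, 8.0 -- the cubic is recovered exactly
h = 1e-6
print((p(h) - p(0.0)) / h, (p(2.0) - p(2.0 - h)) / h)   # approximately 0 and 12
```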
2.10 Spline Interpolation

So far, the only type of interpolation we have dealt with is polynomial interpolation. In this section we discuss a different type of interpolation: piecewise-polynomial interpolation. A simple example of such an interpolant is the function we get by connecting data points with straight lines (see Figure 2.5). Of course, we would like to generate functions that are somewhat smoother than piecewise-linear functions and still interpolate the data. The functions we will discuss in this section are splines.

Figure 2.5: A piecewise-linear spline. In every subinterval the function is linear. Overall it is continuous, and the regularity is lost at the knots
You may still wonder why we are interested in such functions at all. It is easy to motivate this discussion by looking at Figure 2.6. In this figure we demonstrate what a high-order interpolant looks like. Even though the data that we interpolate has only one extremum in the domain, we have no control over the oscillatory nature of the high-order interpolating polynomial. In general, high-order polynomials are oscillatory, which rules them out as impractical for many applications. That is why we focus our attention in this section on splines.
Splines should be thought of as polynomials on subintervals that are connected in a "smooth way". We will be more rigorous when we define precisely what we mean by smooth. First, we pick n + 1 points which we refer to as the knots: t_0 < t_1 < ⋯ < t_n. A spline of degree k having knots t_0, ..., t_n is a function s(x) that satisfies the following two properties:

1. On [t_{i−1}, t_i), s(x) is a polynomial of degree ≤ k, i.e., s(x) is a polynomial on every subinterval that is defined by the knots.

2. Smoothness: s(x) has a continuous (k − 1)-th derivative on the interval [t_0, t_n].
Figure 2.6: An interpolant "goes bad". In this example we interpolate 11 equally spaced samples of f(x) = 1/(1 + x^2) with a polynomial of degree 10, Q_10(x)
Figure 2.7: A zeroth-order (piecewise-constant) spline. The knots are at the interpolation points. Since the spline is of degree zero, the function is not even continuous
A spline of degree 0 is a piecewise-constant function (see Figure 2.7). A spline of degree 1 is a piecewise-linear function that can be explicitly written as

    s(x) = s_0(x) = a_0 x + b_0,                      x ∈ [t_0, t_1),
           s_1(x) = a_1 x + b_1,                      x ∈ [t_1, t_2),
             ⋮
           s_{n−1}(x) = a_{n−1} x + b_{n−1},          x ∈ [t_{n−1}, t_n]

(see Figure 2.5, where the knots {t_i} and the interpolation points {x_i} are assumed to be identical). It is now obvious why the points t_0, ..., t_n are called knots: these are the points that connect the different polynomials with each other. To qualify as an interpolating function, s(x) will have to satisfy interpolation conditions that we will discuss below. We would like to comment already at this point that knots should not be confused with the interpolation points. Sometimes it is convenient to choose the knots to coincide with the interpolation points, but this is only optional, and other choices can be made.
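As a small illustration (ours, not from the notes), a spline of degree 1 with knots equal to the data points can be evaluated as follows: `bisect` locates the subinterval containing x, and the value comes from the straight line on that piece.

```python
from bisect import bisect_right

def linear_spline(t, y):
    """Degree-1 spline through (t_i, y_i); here the knots coincide with the data points."""
    def s(x):
        # locate the subinterval [t_i, t_{i+1}) that contains x
        i = min(max(bisect_right(t, x) - 1, 0), len(t) - 2)
        slope = (y[i + 1] - y[i]) / (t[i + 1] - t[i])
        return y[i] + slope * (x - t[i])
    return s

s = linear_spline([0.0, 1.0, 3.0], [1.0, 2.0, 0.0])
print(s(0.5), s(2.0))      # 1.5 on the first piece, 1.0 on the second
```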
2.10.1 Cubic splines

A special case (which is the most common spline function used in practice) is the cubic spline. A cubic spline is a spline for which the function is a polynomial of degree ≤ 3 on every subinterval, and a function with two continuous derivatives overall (see Figure 2.8).

Let's denote such a function by s(x), i.e.,

    s(x) = s_0(x),        x ∈ [t_0, t_1),
           s_1(x),        x ∈ [t_1, t_2),
             ⋮
           s_{n−1}(x),    x ∈ [t_{n−1}, t_n],

where ∀i, the degree of s_i(x) is ≤ 3.
We now assume that some data (that s(x) should interpolate) is given at the knots, i.e.,

    s(t_i) = y_i,    0 ≤ i ≤ n.    (2.50)

The interpolation conditions (2.50), in addition to requiring that s(x) is continuous, imply that

    s_{i−1}(t_i) = y_i = s_i(t_i),    1 ≤ i ≤ n − 1.    (2.51)

We also require the continuity of the first and the second derivatives, i.e.,

    s_i′(t_{i+1}) = s_{i+1}′(t_{i+1}),    0 ≤ i ≤ n − 2,    (2.52)
    s_i″(t_{i+1}) = s_{i+1}″(t_{i+1}),    0 ≤ i ≤ n − 2.
Before actually computing the spline, let's check if we have enough equations to determine a unique solution for the problem. There are n subintervals, and in each subinterval we have to determine a polynomial of degree ≤ 3. Each such polynomial has 4 coefficients, which leaves us with 4n coefficients to determine. The interpolation and continuity conditions (2.51) for s_i(t_i) and s_i(t_{i+1}) amount to 2n equations. The continuity of the first and the second derivatives (2.52) adds 2(n − 1) = 2n − 2 equations. Altogether we have 4n − 2 equations but 4n unknowns, which leaves us with 2 degrees of freedom. These indeed are two degrees of freedom that can be determined in various ways, as we shall see below.

Figure 2.8: A cubic spline. In every subinterval [t_{i−1}, t_i], the function is a polynomial of degree ≤ 3. The polynomials on the different subintervals are connected to each other in such a way that the spline has a second-order continuous derivative. In this example we use the not-a-knot condition.
We are now ready to compute the spline. We will use the following notation:

    h_i = t_{i+1} − t_i.

We also set

    z_i = s″(t_i).

Since the second derivative of a cubic function is linear, we observe that s_i″(x) is the line connecting (t_i, z_i) and (t_{i+1}, z_{i+1}), i.e.,

    s_i″(x) = (x − t_i)/h_i · z_{i+1} − (x − t_{i+1})/h_i · z_i.    (2.53)
Integrating (2.53) once, we have

    s_i′(x) = (x − t_i)^2 z_{i+1}/(2h_i) − (x − t_{i+1})^2 z_i/(2h_i) + c̃.

Integrating again,

    s_i(x) = z_{i+1}/(6h_i) (x − t_i)^3 + z_i/(6h_i) (t_{i+1} − x)^3 + C(x − t_i) + D(t_{i+1} − x).

The interpolation condition, s(t_i) = y_i, implies that

    y_i = z_i/(6h_i) · h_i^3 + D h_i,

i.e.,

    D = y_i/h_i − z_i h_i/6.

Similarly, s_i(t_{i+1}) = y_{i+1} implies that

    y_{i+1} = z_{i+1}/(6h_i) · h_i^3 + C h_i,

i.e.,

    C = y_{i+1}/h_i − z_{i+1} h_i/6.
This means that we can rewrite s_i(x) as

    s_i(x) = z_{i+1}/(6h_i) (x − t_i)^3 + z_i/(6h_i) (t_{i+1} − x)^3 + ( y_{i+1}/h_i − z_{i+1} h_i/6 )(x − t_i) + ( y_i/h_i − z_i h_i/6 )(t_{i+1} − x).
All that remains to determine are the second derivatives of s(x), z_0, ..., z_n. We can set z_1, ..., z_{n−1} using the continuity conditions on s′(x), i.e., s_i′(t_i) = s_{i−1}′(t_i). We first compute s_i′(x) and s_{i−1}′(x):

    s_i′(x) = z_{i+1}/(2h_i) (x − t_i)^2 − z_i/(2h_i) (t_{i+1} − x)^2 + y_{i+1}/h_i − z_{i+1} h_i/6 − y_i/h_i + z_i h_i/6,

    s_{i−1}′(x) = z_i/(2h_{i−1}) (x − t_{i−1})^2 − z_{i−1}/(2h_{i−1}) (t_i − x)^2 + y_i/h_{i−1} − z_i h_{i−1}/6 − y_{i−1}/h_{i−1} + z_{i−1} h_{i−1}/6.

So that

    s_i′(t_i) = −z_i h_i/2 + y_{i+1}/h_i − z_{i+1} h_i/6 − y_i/h_i + z_i h_i/6
              = −(h_i/3) z_i − (h_i/6) z_{i+1} − y_i/h_i + y_{i+1}/h_i,

    s_{i−1}′(t_i) = z_i h_{i−1}/2 + y_i/h_{i−1} − z_i h_{i−1}/6 − y_{i−1}/h_{i−1} + z_{i−1} h_{i−1}/6
                  = (h_{i−1}/6) z_{i−1} + (h_{i−1}/3) z_i − y_{i−1}/h_{i−1} + y_i/h_{i−1}.

Hence, for 1 ≤ i ≤ n − 1, we obtain the system of equations

    (h_{i−1}/6) z_{i−1} + ((h_i + h_{i−1})/3) z_i + (h_i/6) z_{i+1} = (y_{i+1} − y_i)/h_i − (y_i − y_{i−1})/h_{i−1}.    (2.54)
These are n − 1 equations for the n + 1 unknowns, z_0, ..., z_n, which means that we have 2 degrees of freedom. Without any additional information about the problem, the only way to proceed is by making an arbitrary choice. There are several standard ways to proceed. One option is to set the end values to zero, i.e.,

    z_0 = z_n = 0.    (2.55)

This choice of the second derivative at the end points leads to the so-called natural cubic spline. We will explain later in what sense this spline is "natural". In this case, we end up with the following linear system of equations:

    [ (h_0+h_1)/3   h_1/6                                               ] [ z_1     ]   [ (y_2 − y_1)/h_1 − (y_1 − y_0)/h_0                       ]
    [ h_1/6         (h_1+h_2)/3   h_2/6                                 ] [ z_2     ]   [ (y_3 − y_2)/h_2 − (y_2 − y_1)/h_1                       ]
    [        ⋱              ⋱             ⋱                            ] [  ⋮      ] = [                      ⋮                                  ]
    [       h_{n−3}/6   (h_{n−3}+h_{n−2})/3   h_{n−2}/6                 ] [ z_{n−2} ]   [ (y_{n−1} − y_{n−2})/h_{n−2} − (y_{n−2} − y_{n−3})/h_{n−3} ]
    [                     h_{n−2}/6   (h_{n−2}+h_{n−1})/3               ] [ z_{n−1} ]   [ (y_n − y_{n−1})/h_{n−1} − (y_{n−1} − y_{n−2})/h_{n−2}     ]

The coefficient matrix is symmetric, tridiagonal, and diagonally dominant (i.e., |a_ii| > ∑_{j=1, j≠i}^{n} |a_ij| ∀i), which means that it can always be (efficiently) inverted.
In the special case where the points are equally spaced, i.e., h
i
= h, ∀i, the system
becomes

¸
¸
¸
¸
¸
¸
4 1
1 4 1
.
.
.
.
.
.
.
.
.
1 4 1
1 4

¸
¸
¸
¸
¸
¸
z
1
z
2
.
.
.
z
n−2
z
n−1

=
6
h
2

¸
¸
¸
¸
¸
¸
y
2
−2y
1
+ y
0
y
3
−2y
2
+ y
1
.
.
.
y
n−1
−2y
n−2
+ y
n−3
y
n
−2y
n−1
+ y
n−2

(2.56)
In addition to the natural spline (2.55), there are other standard options:
1. If the values of the derivatives at the endpoints are known, one can specify them
s

(t
0
) = y

0
, s

(t
n
) = y

n
.
2. The not-a-knot condition. Here, we require the third-derivative s
(3)
(x) to be
continuous at the points t
1
, t
n−1
. In this case we end up with a cubic spline with
knots t
0
, t
2
, t
3
, . . . , t
n−2
, t
n
. The points t
1
and t
n−1
no longer function as knots.
The interpolation requirements are still satisfied at t
0
, t
1
, . . . , t
n−1
, t
n
. Figure 2.9
shows two different cubic splines that interpolate the same initial data. The spline
that is plotted with a solid line is the not-a-knot spline. The spline that is plotted
with a dashed line is obtained by setting the derivatives at both end-points to
zero.
33
2.10 Spline Interpolation D. Levy
(t
0
, f(t
0
))
(t
1
, f(t
1
))
(t
2
, f(t
2
))
(t
3
, f(t
3
))
(t
4
, f(t
4
))
x
Figure 2.9: Two cubic splines that interpolate the same data. Solid line: a not-a-knot
spline; Dashed line: the derivative is set to zero at both end-points
2.10.2 What is natural about the natural spline?
The following theorem states that the natural spline can not have a larger L
2
-norm of
the second-derivative than the function it interpolates (assuming that that function has
a continuous second-derivative). In fact, we are minimizing the L
2
-norm of the second-
derivative not only with respect to the “original” function which we are interpolating,
but with respect to any function that interpolates the data (and has a continuous second-
derivative). In that sense, we refer to the natural spline as “natural”.
Theorem 2.21 Assume that f

(x) is continuous in [a, b], and let a = t
0
< t
1
< <
t
n
= b. If s(x) is the natural cubic spline interpolating f(x) at the knots ¦t
i
¦ then

b
a
(s

(x))
2
dx

b
a
(f

(x))
2
dx.
Proof. Define g(x) = f(x) −s(x). Then since s(x) interpolates f(x) at the knots ¦t
i
¦
their difference vanishes at these points, i.e.,
g(t
i
) = 0, 0 i n.
Now

b
a
(f

)
2
dx =

b
a
(s

)
2
dx +

b
a
(g

)
2
dx + 2

b
a
s

g

dx. (2.57)
We will show that the last term on the right-hand-side of (2.57) is zero, which will
conclude the proof as the other two terms on the right-hand-side of (2.57) are
34
D. Levy 2.10 Spline Interpolation
non-negative. Splitting that term into a sum of integrals on the subintervals and
integrating by parts on every subinterval, we have

b
a
s

g

dx =
n
¸
i=1

t
i
t
i−1
s

g

dx =
n
¸
i=1
¸
(s

g

)

t
i
t
i−1

t
i
t
i−1
s

g

dx
¸
.
Since we are dealing with the “natural” choice s

(t
0
) = s

(t
n
) = 0, and since s

(x) is
constant on [t
i−1
, t
i
] (say c
i
), we end up with

b
a
s

g

dx = −
n
¸
i=1

t
i
t
i−1
s

g

dx = −
n
¸
i=1
c
i

t
i
t
i−1
g

dx = −
n
¸
i=1
c
i
(g(t
i
)−g(t
i−1
)) = 0.

We note that f

(x) can be viewed as a linear approximation of the curvature
[f

(x)[
(1 + (f

(x))
2
)
3
2
.
From that point of view, minimizing

b
a
(f

(x))
2
dx, can be viewed as finding the curve
with a minimal [f

(x)[ over an interval.
35
D. Levy
3 Approximations
3.1 Background
In this chapter we are interested in approximation problems. Generally speaking, start-
ing from a function f(x) we would like to find a different function g(x) that belongs
to a given class of functions and is “close” to f(x) in some sense. As far as the class
of functions that g(x) belongs to, we will typically assume that g(x) is a polynomial
of a given degree (though it can be a trigonometric function, or any other function).
A typical approximation problem, will therefore be: find the “closest” polynomial of
degree n to f(x).
What do we mean by “close”? There are different ways of measuring the “distance”
between two functions. We will focus on two such measurements (among many): the L

-
norm and the L
2
-norm. We chose to focus on these two examples because of the different
mathematical techniques that are required to solve the corresponding approximation
problems.
We start with several definitions. We recall that a norm on a vector space V over
R is a function | | : V →R with the following properties:
1. λ|f| = [λ[|f|, ∀λ ∈ R and ∀f ∈ V .
2. |f| 0, ∀f ∈ V . Also |f| = 0 iff f is the zero element of V .
3. The triangle inequality: |f + g| |f| +|g|, ∀f, g ∈ V .
We assume that the function f(x) ∈ C
0
[a, b] (continuous on [a, b]). A continuous
function on a closed interval obtains a maximum in the interval. We can therefore define
the L

norm (also known as the maximum norm) of such a function by
|f|

= max
axb
[f(x)[. (3.1)
The L

-distance between two functions f(x), g(x) ∈ C
0
[a, b] is thus given by
|f −g|

= max
axb
[f(x) −g(x)[. (3.2)
We note that the definition of the L

-norm can be extended to functions that are less
regular than continuous functions. This generalization requires some subtleties that
we would like to avoid in the following discussion, hence, we will limit ourselves to
continuous functions.
We proceed by defining the L
2
-norm of a continuous function f(x) as
|f|
2
=

b
a
[f(x)[
2
dx. (3.3)
The L
2
function space is the collection of functions f(x) for which |f|
2
< ∞. Of
course, we do not have to assume that f(x) is continuous for the definition (3.3) to
make sense. However, if we allow f(x) to be discontinuous, we then have to be more
36
D. Levy 3.1 Background
rigorous in terms of the definition of the interval so that we end up with a norm (the
problem is, e.g., in defining what is the “zero” element in the space). We therefore limit
ourselves also in this case to continuous functions only. The L
2
-distance between two
functions f(x) and g(x) is
|f −g|
2
=

b
a
[f(x) −g(x)[
2
dx. (3.4)
At this point, a natural question is how important is the choice of norm in terms of
the solution of the approximation problem. It is easy to see that the value of the norm
of a function may vary substantially based on the function as well as the choice of the
norm. For example, assume that |f|

< ∞. Then, clearly
|f|
2
=

b
a
[f[
2
dx ≤ (b −a)|f|

.
On the other hand, it is easy to construct a function with an arbitrary small |f|
2
and
an arbitrarily large |f|

. Hence, the choice of norm may have a significant impact on
the solution of the approximation problem.
As you have probably already anticipated, there is a strong connection between some
approximation problems and interpolation problems. For example, one possible method
of constructing an approximation to a given function is by sampling it at certain points
and then interpolating the sampled data. Is that the best we can do? Sometimes the
answer is positive, but the problem still remains difficult because we have to determine
the best sampling points. We will address these issues in the following sections.
The following theorem, the Weierstrass approximation theorem, plays a central role
in any discussion of approximations of functions. Loosely speaking, this theorem states
that any continuous function can be approached as close as we want to with polynomials,
assuming that the polynomials can be of any degree. We formulate this theorem in the
L

norm and note that a similar theorem holds also in the L
2
sense. We let Π
n
denote
the space of polynomials of degree n.
Theorem 3.1 (Weierstrass Approximation Theorem) Let f(x) be a continuous
function on [a, b]. Then there exists a sequence of polynomials P
n
(x) that converges
uniformly to f(x) on [a, b], i.e., ∀ε > 0, there exists an N ∈ N and polynomials P
n
(x) ∈
Π
n
, such that ∀x ∈ [a, b]
[f(x) −P
n
(x)[ < ε, ∀n N.
We will provide a constructive proof of the Weierstrass approximation theorem: first,
we will define a family of polynomials, known as the Bernstein polynomials, and then
we will show that they uniformly converge to f(x).
We start with the definition. Given a continuous function f(x) in [0, 1], we define
the Bernstein polynomials as
(B
n
f)(x) =
n
¸
j=0
f

j
n

n
j

x
j
(1 −x)
n−j
, 0 x 1.
37
3.1 Background D. Levy
We emphasize that the Bernstein polynomials depend on the function f(x).
Example 3.2
Three Bernstein polynomials B
6
(x), B
10
(x), and B
20
(x) for the function
f(x) =
1
1 + 10(x −0.5)
2
on the interval [0, 1] are shown in Figure 3.1. Note the gradual convergence of B
n
(x) to
f(x).
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
x
f(x)
B
6
(x)
B
10
(x)
B
20
(x)
Figure 3.1: The Bernstein polynomials B
6
(x), B
10
(x), and B
20
(x) for the function f(x) =
1
1+10(x−0.5)
2
on the interval [0, 1]
We now state and prove several properties of B
n
(x) that will be used when we prove
Theorem 3.1.
Lemma 3.3 The following relations hold:
1. (B
n
1)(x) = 1
2. (B
n
x)(x) = x
3. (B
n
x
2
)(x) =
n −1
n
x
2
+
x
n
.
Proof.
(B
n
1)(x) =
n
¸
j=0

n
j

x
j
(1 −x)
n−j
= (x + (1 −x))
n
= 1.
38
D. Levy 3.1 Background
(B
n
x)(x) =
n
¸
j=0
j
n

n
j

x
j
(1 −x)
n−j
= x
n
¸
j=1

n −1
j −1

x
j−1
(1 −x)
n−j
= x
n−1
¸
j=0

n −1
j

x
j
(1 −x)
n−1−j
= x[x + (1 −x)]
n−1
= x.
Finally,

j
n

2

n
j

=
j
n
(n −1)!
(n −j)!(j −1)!
=
n −1
n −1
j −1
n
(n −1)!
(n −j)!(j −1)!
+
1
n
(n −1)!
(n −j)!(j −1)!
=
n −1
n

n −2
j −2

+
1
n

n −1
j −1

.
Hence
(B
n
x
2
)(x) =
n
¸
j=0

j
n

2

n
j

x
j
(1 −x)
n−j
=
n −1
n
x
2
n
¸
j=2

n −2
j −2

x
j−2
(1 −x)
n−j
+
1
n
x
n
¸
j=1

n −1
j −1

x
j−1
(1 −x)
n−j
=
n −1
n
x
2
(x + (1 −x))
n−2
+
1
n
x(x + (1 −x))
n−1
=
n −1
n
x
2
+
x
n
.

In the following lemma we state several additional properties of the Bernstein poly-
nomials. The proof is left as an exercise.
Lemma 3.4 For all functions f(x), g(x) that are continuous in [0, 1], and ∀α ∈ R
1. Linearity.
(B
n
(αf + g))(x) = α(B
n
f)(x) + (B
n
g)(x).
2. Monotonicity. If f(x) g(x) ∀x ∈ [0, 1], then
(B
n
f)(x) (B
n
g)(x).
Also, if [f(x)[ g(x) ∀x ∈ [0, 1] then
[(B
n
f)(x)[ (B
n
g)(x).
3. Positivity. If f(x) 0 then
(B
n
f)(x) 0.
We are now ready to prove the Weierstrass approximation theorem, Theorem 3.1.
39
3.1 Background D. Levy
Proof. We will prove the theorem in the interval [0, 1]. The extension to [a, b] is left as
an exercise. Since f(x) is continuous on a closed interval, it is uniformly continuous.
Hence ∀x, y ∈ [0, 1], such that [x −y[ δ,
[f(x) −f(y)[ ε. (3.5)
In addition, since f(x) is continuous on a closed interval, it is also bounded. Let
M = max
x∈[0,1]
[f(x)[.
Fix any point a ∈ [0, 1]. If [x −a[ δ then (3.5) holds. If [x −a[ > δ then
[f(x) −f(a)[ 2M 2M

x −a
δ

2
.
(at first sight this seems to be a strange way of upper bounding a function. We will
use it later on to our advantage). Combining the estimates for both cases we have
[f(x) −f(a)[ ε +
2M
δ
2
(x −a)
2
.
We would now like to estimate the difference between B
n
f and f. The linearity of B
n
and the property (B
n
1)(x) = 1 imply that
B
n
(f −f(a))(x) = (B
n
f)(x) −f(a).
Hence using the monotonicity of B
n
and the mapping properties of x and x
2
, we have,
[B
n
f(x) −f(a)[ B
n

ε +
2M
δ
2
(x −a)
2

= ε +
2M
δ
2

n −1
n
x
2
+
x
n
−2ax + a
2

= ε +
2M
δ
2
(x −a)
2
+
2M
δ
2
x −x
2
n
.
Evaluating at x = a we have (observing that max
a∈[0,1]
(a −a
2
) =
1
4
)
[B
n
f(a) −f(a)[ ε +
2M
δ
2
a −a
2
n
ε +
M

2
n
. (3.6)
The point a was arbitrary so the result (3.6) holds for any point a ∈ [0, 1]. Choosing
N
M

2
ε
we have ∀n N,
|B
n
f −f|

ε +
M

2
N
2ε.

• Is interpolation a good way of approximating functions in the ∞-norm? Not
necessarily. Discuss Runge’s example...
40
D. Levy 3.2 The Minimax Approximation Problem
3.2 The Minimax Approximation Problem
We assume that the function f(x) is continuous on [a, b], and assume that P
n
(x) is a
polynomial of degree n. We recall that the L

-distance between f(x) and P
n
(x) on
the interval [a, b] is given by
|f −P
n
|

= max
axb
[f(x) −P
n
(x)[. (3.7)
Clearly, we can construct polynomials that will have an arbitrary large distance from
f(x). The question we would like to address is how close can we get to f(x) (in the L

sense) with polynomials of a given degree. We define d
n
(f) as the infimum of (3.7) over
all polynomials of degree n, i.e.,
d
n
(f) = inf
Pn∈Πn
|f −P
n
|

(3.8)
The goal is to find a polynomial P

n
(x) for which the infimum (3.8) is actually ob-
tained, i.e.,
d
n
(f) = |f −P

n
(x)|

. (3.9)
We will refer to a polynomial P

n
(x) that satisfies (3.9) as a polynomial of best
approximation or the minimax polynomial. The minimal distance in (3.9) will
be referred to as the minimax error.
The theory we will explore in the following sections will show that the minimax
polynomial always exists and is unique. We will also provide a characterization of
the minimax polynomial that will allow us to identify it if we actually see it. The
general construction of the minimax polynomial will not be addressed in this text as it
is relatively technically involved. We will limit ourselves to simple examples.
Example 3.5
We let f(x) be a monotonically increasing and continuous function on the interval [a, b]
and are interested in finding the minimax polynomial of degree zero to f(x) in that
interval. We denote this minimax polynomial by
P

0
(x) ≡ c.
Clearly, the smallest distance between f(x) and P

0
in the L

-norm will be obtained if
c =
f(a) + f(b)
2
.
The maximal distance between f(x) and P

0
will be attained at both edges and will be
equal to
±
f(b) −f(a)
2
.
41
3.2 The Minimax Approximation Problem D. Levy
3.2.1 Existence of the minimax polynomial
The existence of the minimax polynomial is provided by the following theorem.
Theorem 3.6 (Existence) Let f ∈ C
0
[a, b]. Then for any n ∈ N there exists P

n
(x) ∈
Π
n
, that minimizes |f(x) −P
n
(x)|

among all polynomials P(x) ∈ Π
n
.
Proof. We follow the proof as given in [7]. Let η = (η
0
, . . . , η
n
) be an arbitrary point in
R
n+1
and let
P
n
(x) =
n
¸
i=0
η
i
x
i
∈ Π
n
.
We also let
φ(η) = φ(η
0
, . . . , η
n
) = |f −P
n
|

.
Our goal is to show that φ obtains a minimum in R
n+1
, i.e., that there exists a point
η

= (η

0
, . . . , η

n
) such that
φ(η

) = min
η∈R
n+1
φ(η).
Step 1. We first show that φ(η) is a continuous function on R
n+1
. For an arbitrary
δ = (δ
0
, . . . , δ
n
) ∈ R
n+1
, define
q
n
(x) =
n
¸
i=0
δ
i
x
i
.
Then
φ(η + δ) = |f −(P
n
+ q
n
)|

≤ |f −P
n
|

+|q
n
|

= φ(η) +|q
n
|

.
Hence
φ(η + δ) −φ(η) ≤ |q
n
|

≤ max
x∈[a,b]
([δ
0
[ +[δ
1
[[x[ + . . . +[δ
n
[[x[
n
).
For any ε > 0, let
˜
δ = ε/(1 + c + . . . + c
n
), where c = max([a[, [b[). Then for any
δ = (δ
0
, . . . , δ
n
) such that max [δ
i
[
˜
δ, 0 i n,
φ(η + δ) −φ(η) ε. (3.10)
Similarly
φ(η) = |f−P
n
|

= |f−(P
n
+q
n
)+q
n
|

|f−(P
n
+q
n
)|

+|q
n
|

= φ(η+δ)+|q
n
|

,
which implies that under the same conditions as in (3.10) we also get
φ(η) −φ(η + δ) ε,
Altogether,
[φ(η + δ) −φ(η)[ ε,
which means that φ is continuous at η. Since η was an arbitrary point in R
n+1
, φ is
continuous in the entire R
n+1
.
42
D. Levy 3.2 The Minimax Approximation Problem
Step 2. We now construct a compact set in R
n+1
on which φ obtains a minimum. We
let
S =
¸
η ∈ R
n+1

φ(η) ≤ |f|

¸
.
We have
φ(0) = |f|

,
hence, 0 ∈ S, and the set S is nonempty. We also note that the set S is bounded and
closed (check!). Since φ is continuous on the entire R
n+1
, it is also continuous on S,
and hence it must obtain a minimum on S, say at η

∈ R
n+1
, i.e.,
min
η∈S
φ(a) = φ(η

).
Step 3. Since 0 ∈ S, we know that
min
η∈S
φ(η) φ(0) = |f|

.
Hence, if η ∈ R
n+1
but η ∈ S then
φ(η) > |f|

min
η∈S
φ(η).
This means that the minimum of φ over S is the same as the minimum over the entire
R
n+1
. Therefore
P

n
(x) =
n
¸
i=0
η

i
x
i
, (3.11)
is the best approximation of f(x) in the L

norm on [a, b], i.e., it is the minimax
polynomial, and hence the minimax polynomial exists.

We note that the proof of Theorem 3.6 is not a constructive proof. The proof does
not tell us what the point η

is, and hence, we do not know the coefficients of the
minimax polynomial as written in (3.11). We will discuss the characterization of the
minimax polynomial and some simple cases of its construction in the following sections.
3.2.2 Bounds on the minimax error
It is trivial to obtain an upper bound on the minimax error, since by the definition of
d
n
(f) in (3.8) we have
d
n
(f) |f −P
n
|

, ∀P
n
(x) ∈ Π
n
.
A lower bound is provided by the following theorem.
43
3.2 The Minimax Approximation Problem D. Levy
Theorem 3.7 (de la Vall´ee-Poussin) Let a x
0
< x
1
< < x
n+1
b. Let P
n
(x)
be a polynomial of degree n. Suppose that
f(x
j
) −P
n
(x
j
) = (−1)
j
e
j
, j = 0, . . . , n + 1,
where all e
j
= 0 and are of an identical sign. Then
min
j
[e
j
[ d
n
(f).
Proof. By contradiction. Assume for some Q
n
(x) that
|f −Q
n
|

< min
j
[e
j
[.
Then the polynomial
(Q
n
−P
n
)(x) = (f −P
n
)(x) −(f −Q
n
)(x),
is a polynomial of degree n that has the same sign at x
j
as does f(x) −P
n
(x). This
implies that (Q
n
− P
n
)(x) changes sign at least n + 2 times, and hence it has at least
n + 1 zeros. Being a polynomial of degree n this is possible only if it is identically
zero, i.e., if P
n
(x) ≡ Q
n
(x), which contradicts the assumptions on Q
n
(x) and P
n
(x).

3.2.3 Characterization of the minimax polynomial
The following theorem provides a characterization of the minimax polynomial in terms
of its oscillations property.
Theorem 3.8 (The oscillating theorem) Suppose that f(x) is continuous in [a, b].
The polynomial P

n
(x) ∈ Π
n
is the minimax polynomial of degree n to f(x) in [a, b] if
and only if f(x) −P

n
(x) assumes the values ±|f −P

n
|

with an alternating change of
sign at least n + 2 times in [a, b].
Proof. We prove here only the sufficiency part of the theorem. For the necessary part
of the theorem we refer to [7].
Without loss of generality, suppose that
(f −P

n
)(x
i
) = (−1)
i
|f −P

n
|

, 0 i n + 1.
Let
D

= |f −P

n
|

,
and let
d
n
(f) = min
Pn∈Πn
|f −P
n
|

.
We replace the infimum in the original definition of d
n
(f) by a minimum because we
already know that a minimum exists. de la Vall´ee-Poussin’s theorem (Theorem 3.7)
implies that D

d
n
. On the other hand, the definition of d
n
implies that d
n
D

.
Hence D

= d
n
and P

n
(x) is the minimax polynomial.

Remark. In view of these theorems it is obvious why the Taylor expansion is a poor
uniform approximation. The sum is non oscillatory.
44
D. Levy 3.2 The Minimax Approximation Problem
3.2.4 Uniqueness of the minimax polynomial
Theorem 3.9 (Uniqueness) Let f(x) be continuous on [a, b]. Then its minimax poly-
nomial P

n
(x) ∈ Π
n
is unique.
Proof. Let
d
n
(f) = min
Pn∈Πn
|f −P
n
|

.
Assume that Q
n
(x) is also a minimax polynomial. Then
|f −P

n
|

= |f −Q
n
|

= d
n
(f).
The triangle inequality implies that
|f −
1
2
(P

n
+ Q
n
)|


1
2
|f −P

n
|

+
1
2
|f −Q
n
|

= d
n
(f).
Hence,
1
2
(P

n
+ Q
n
) ∈ Π
n
is also a minimax polynomial. The oscillating theorem
(Theorem 3.8) implies that there exist x
0
, . . . , x
n+1
∈ [a, b] such that
[f(x
i
) −
1
2
(P

n
(x
i
) + Q
n
(x
i
))[ = d
n
(f), 0 i n + 1. (3.12)
Equation (3.12) can be rewritten as
[f(x
i
) −P

n
(x
i
) + f(x
i
) −Q
n
(x
i
)[ = 2d
n
(f), 0 i n + 1. (3.13)
Since P

n
(x) and Q
n
(x) are both minimax polynomials, we have
[f(x
i
) −P

n
(x
i
)[ ≤ |f −P

n
|

= d
n
(f), 0 i n + 1. (3.14)
and
[f(x
i
) −Q
n
(x
i
)[ ≤ |f −Q
n
|

= d
n
(f), 0 i n + 1. (3.15)
For any i, equations (3.13)–(3.15) mean that the absolute value of two numbers that
are d
n
(f) add up to 2d
n
(f). This is possible only if they are equal to each other, i.e.,
f(x
i
) −P

n
(x
i
) = f(x
i
) −Q
n
(x
i
), 0 i n + 1,
i.e.,
(P

n
−Q
n
)(x
i
) = 0, 0 i n + 1.
Hence, the polynomial (P

n
−Q
n
)(x) ∈ Π
n
has n + 2 distinct roots which is possible for
a polynomial of degree n only if it is identically zero. Hence
Q
n
(x) ≡ P

n
(x),
and the uniqueness of the minimax polynomial is established.

45
3.2 The Minimax Approximation Problem D. Levy
3.2.5 The near-minimax polynomial
We now connect between the minimax approximation problem and polynomial interpo-
lation. In order for f(x) − P
n
(x) to change its sign n + 2 times, there should be n + 1
points on which f(x) and P
n
(x) agree with each other. In other words, we can think
of P
n
(x) as a function that interpolates f(x) at (least in) n + 1 points, say x
0
, . . . , x
n
.
What can we say about these points?
We recall that the interpolation error is given by (2.25),
f(x) −P
n
(x) =
1
(n + 1)!
f
(n+1)
(ξ)
n
¸
i=0
(x −x
i
).
If P
n
(x) is indeed the minimax polynomial, we know that the maximum of
f
(n+1)
(ξ)
n
¸
i=0
(x −x
i
), (3.16)
will oscillate with equal values. Due to the dependency of f
(n+1)
(ξ) on the intermediate
point ξ, we know that minimizing the error term (3.16) is a difficult task. We recall that
interpolation at the Chebyshev points minimizes the multiplicative part of the error
term, i.e.,
n
¸
i=0
(x −x
i
).
Hence, choosing x
0
, . . . , x
n
to be the Chebyshev points will not result with the minimax
polynomial, but nevertheless, this relation motivates us to refer to the interpolant at
the Chebyshev points as the near-minimax polynomial. We note that the term
“near-minimax” does not mean that the near-minimax polynomial is actually close to
the minimax polynomial.
3.2.6 Construction of the minimax polynomial
The characterization of the minimax polynomial in terms of the number of points in
which the maximum distance should be obtained with oscillating signs allows us to
construct the minimax polynomial in simple cases by a direct computation.
We are not going to deal with the construction of the minimax polynomial in the
general case. The algorithm for doing so is known as the Remez algorithm, and we refer
the interested reader to [2] and the references therein.
A simple case where we can demonstrate a direct construction of the polynomial is
when the function is convex, as done in the following example.
Example 3.10
Problem: Let f(x) = e
x
, x ∈ [1, 3]. Find the minimax polynomial of degree 1, P

1
(x).
Solution: Based on the characterization of the minimax polynomial, we will be looking
for a linear function P

1
(x) such that its maximal distance between P

1
(x) and f(x) is
46
D. Levy 3.2 The Minimax Approximation Problem
obtained 3 times with alternative signs. Clearly, in the case of the present problem,
since the function is convex, the maximal distance will be obtained at both edges and
at one interior point. We will use this observation in the construction that follows.
The construction itself is graphically shown in Figure 3.2.
1 a 3
x
e
1
f(a)
¯ y
l
1
(a)
e
3
←− l
2
(x)
e
x
l
1
(x) −→
P

1
(x)
Figure 3.2: A construction of the linear minimax polynomial for the convex function e
x
on [1, 3]
We let l
1
(x) denote the line that connects the endpoints (1, e) and (3, e
3
), i.e.,
l
1
(x) = e + m(x −1).
Here, the slope m is given by
m =
e
3
−e
2
. (3.17)
Let l
2
(x) denote the tangent to f(x) at a point a that is identified such that the slope
is m. Since f

(x) = e
x
, we have e
a
= m, i.e.,
a = log m.
Now
f(a) = e
log m
= m,
and
l
1
(a) = e + m(log m−1).
47
3.3 Least-squares Approximations D. Levy
Hence, the average between f(a) and l
1
(a) which we denote by ¯ y is given by
¯ y =
f(a) + l
1
(a)
2
=
m + e + mlog m−m
2
=
e + mlog m
2
.
The minimax polynomial P

1
(x) is the line of slope m that passes through (a, ¯ y),
P

1
(x) −
e + mlog m
2
= m(x −log m),
i.e.,
P

1
(x) = mx +
e −mlog m
2
,
where the slope m is given by (3.17). We note that the maximal difference between
P

1
(x) and f(x) is obtained at x = 1, a, 3.
3.3 Least-squares Approximations
3.3.1 The least-squares approximation problem
We recall that the L
2
-norm of a function f(x) is defined as
|f|
2
=

b
a
[f(x)[
2
dx.
As before, we let Π
n
denote the space of all polynomials of degree n. The least-
squares approximation problem is to find the polynomial that is the closest to f(x)
in the L
2
-norm among all polynomials of degree n, i.e., to find Q

n
∈ Π
n
such that
|f −Q

n
|
2
= min
Qn∈Πn
|f −Q
n
|
2
.
3.3.2 Solving the least-squares problem: a direct method
Let
Q
n
(x) =
n
¸
i=0
a
i
x
i
.
We want to minimize |f(x) −Q
n
(x)|
2
among all Q
n
∈ Π
n
. For convenience, instead of
minimizing the L
2
norm of the difference, we will minimize its square. We thus let φ
denote the square of the L
2
-distance between f(x) and Q
n
(x), i.e.,
φ(a
0
, . . . , a
n
) =

b
a
(f(x) −Q
n
(x))
2
dx
=

b
a
f
2
(x)dx −2
n
¸
i=0
a
i

b
a
x
i
f(x)dx +
n
¸
i=0
n
¸
j=0
a
i
a
j

b
a
x
i+j
dx.
48
D. Levy 3.3 Least-squares Approximations
φ is a function of the n + 1 coefficients in the polynomial Q
n
(x). This means that we
want to find a point ˆ a = (ˆ a
0
, . . . , ˆ a
n
) ∈ R
n+1
for which φ obtains a minimum. At this
point
∂φ
∂a
k

a=ˆ a
= 0. (3.18)
The condition (3.18) implies that
0 = −2

b
a
x
k
f(x)dx +
n
¸
i=0
ˆ a
i

b
a
x
i+k
dx +
n
¸
j=0
ˆ a
j

b
a
x
j+k
dx (3.19)
= 2
¸
n
¸
i=0
ˆ a
i

b
a
x
i+k
dx −

b
a
x
k
f(x)dx
¸
.
This is a linear system for the unknowns (ˆ a
0
, . . . , ˆ a
n
):
n
¸
i=0
ˆ a
i

b
a
x
i+k
dx =

b
a
x
k
f(x), k = 0, . . . , n. (3.20)
We thus know that the solution of the least-squares problem is the polynomial
Q

n
(x) =
n
¸
i=0
ˆ a
i
x
i
,
where the coefficients ˆ a
i
, i = 0, . . . , n, are the solution of (3.20), assuming that this
system can be solved. Indeed, the system (3.20) always has a unique solution, which
proves that not only the least-squares problem has a solution, but that it is also unique.
We let H
n+1
(a, b) denote the (n+1) (n+1) coefficients matrix of the system (3.20)
on the interval [a, b], i.e.,
(H
n+1
(a, b))
i,k
=

b
a
x
i+k
dx, 0 i, k n.
For example, in the case where [a, b] = [0, 1],
H
n
(0, 1) =

¸
¸
¸
¸
1/1 1/2 . . . 1/n
1/2 1/3 . . . 1/(n + 1)
.
.
.
.
.
.
1/n 1/(n + 1) . . . 1/(2n −1)

(3.21)
The matrix (3.21) is known as the Hilbert matrix.
Lemma 3.11 The Hilbert matrix is invertible.
49
3.3 Least-squares Approximations D. Levy
Proof. We leave it is an exercise to show that the determinant of H
n
is given by
det(H
n
) =
(1!2! (n −1)!)
4
1!2! (2n −1)!
.
Hence, det(H
n
) = 0 and H
n
is invertible.

Is inverting the Hilbert matrix a good way of solving the least-squares problem? No.
There are numerical instabilities that are associated with inverting H. We demonstrate
this with the following example.
Example 3.12
The Hilbert matrix H
5
is
H
5
=

¸
¸
¸
¸
¸
1/1 1/2 1/3 1/4 1/5
1/2 1/3 1/4 1/5 1/6
1/3 1/4 1/5 1/6 1/7
1/4 1/5 1/6 1/7 1/8
1/5 1/6 1/7 1/8 1/9

The inverse of H
5
is
H
5
=

¸
¸
¸
¸
¸
25 −300 1050 −1400 630
−300 4800 −18900 26880 −12600
1050 −18900 79380 −117600 56700
−1400 26880 −117600 179200 −88200
630 −12600 56700 −88200 44100

The condition number of H
5
is 4.77 10
5
, which indicates that it is ill-conditioned. In
fact, the condition number of H
n
increases with the dimension n so inverting it becomes
more difficult with an increasing dimension.
3.3.3 Solving the least-squares problem: with orthogonal polynomials
Let ¦P
k
¦
n
k=0
be polynomials such that
deg(P
k
(x)) = k.
Let Q
n
(x) be a linear combination of the polynomials ¦P
k
¦
n
k=0
, i.e.,
Q
n
(x) =
n
¸
j=0
c
j
P
j
(x). (3.22)
Clearly, Q
n
(x) is a polynomial of degree n. Define
φ(c
0
, . . . , c
n
) =

b
a
[f(x) −Q
n
(x)]
2
dx.
50
D. Levy 3.3 Least-squares Approximations
We note that the function φ is a quadratic function of the coefficients of the linear
combination (3.22), ¦c
k
¦. We would like to minimize φ. Similarly to the calculations
done in the previous section, at the minimum, ˆ c = (ˆ c
0
, . . . , ˆ c
n
), we have
0 =
∂φ
∂c
k

c=ˆ c
= −2

b
a
P
k
(x)f(x)dx + 2
n
¸
j=0
ˆ c
j

b
a
P
j
(x)P
k
(x)dx,
i.e.,
n
¸
j=0
ˆ c
j

b
a
P
j
(x)P
k
(x)dx =

b
a
P
k
(x)f(x)dx, k = 0, . . . , n. (3.23)
Note the similarity between equation (3.23) and (3.20). There, we used the basis func-
tions ¦x
k
¦
n
k=0
(a basis of Π
n
), while here we work with the polynomials ¦P
k
(x)¦
n
k=0
instead. The idea now is to choose the polynomials ¦P
k
(x)¦
n
k=0
such that the system
(3.23) can be easily solved. This can be done if we choose them in such a way that

b
a
P
i
(x)P
j
(x)dx = δ
ij
=

1, i = j,
0, j = j.
(3.24)
Polynomials that satisfy (3.24) are called orthonormal polynomials. If, indeed, the
polynomials ¦P
k
(x)¦
n
k=0
are orthonormal, then (3.23) implies that
ˆ c
j
=

b
a
P
j
(x)f(x)dx, j = 0, . . . , n. (3.25)
The solution of the least-squares problem is a polynomial
Q

n
(x) =
n
¸
j=0
ˆ c
j
P
j
(x), (3.26)
with coefficients ˆ c
j
, j = 0, . . . , n, that are given by (3.25).
Remark. Polynomials that satisfy

b
a
P
i
(x)P
j
(x)dx =

b
a
(P
i
(x))
2
, i = j,
0, i = j,
with

b
a
(P
i
(x))
2
dx that is not necessarily 1 are called orthogonal polynomials. In
this case, the solution of the least-squares problem is given by the polynomial Q

n
(x) in
(3.26) with the coefficients
ˆ c
j
=

b
a
P
j
(x)f(x)dx

b
a
(P
j
(x))
2
dx
, j = 0, . . . , n. (3.27)
51
3.3 Least-squares Approximations D. Levy
3.3.4 The weighted least squares problem
A more general least-squares problem is the weighted least squares approximation
problem. We consider a weight function, w(x), to be a continuous on (a, b), non-
negative function with a positive mass, i.e.,

b
a
w(x)dx > 0.
Note that w(x) may be singular at the edges of the interval since we do not require
it to be continuous on the closed interval [a, b]. For any weight w(x), we define the
corresponding weighted L
2
-norm of a function f(x) as
|f|
2,w
=

b
a
(f(x))
2
w(x)dx.
The weighted least-squares problem is to find the closest polynomial Q

n
∈ Π
n
to f(x),
this time in the weighted L
2
-norm sense, i.e., we look for a polynomial Q

n
(x) of degree
n such that
|f −Q

n
|
2,w
= min
Qn∈Πn
|f −Q
n
|
2,w
. (3.28)
In order to solve the weighted least-squares problem (3.28) we follow the methodology
described in Section 3.3.3, and consider polynomials ¦P
k
¦
n
k=0
such that deg(P
k
(x)) = k.
We then consider a polynomial Q
n
(x) that is written as their linear combination:
Q
n
(x) =
n
¸
j=0
c
j
P
j
(x).
By repeating the calculations of Section 3.3.3, we obtain
n
¸
j=0
ˆ c
j

b
a
w(x)P
j
(x)P
k
(x)dx =

b
a
w(x)P
k
(x)f(x)dx, k = 0, . . . , n, (3.29)
(compare with (3.23)). The system (3.29) can be easily solved if we choose ¦P
k
(x)¦ to
be orthonormal with respect to the weight w(x), i.e.,

b
a
P
i
(x)P
j
(x)w(x)dx = δ
ij
.
Hence, the solution of the weighted least-squares problem is given by
Q

n
(x) =
n
¸
j=0
ˆ c
j
P
j
(x), (3.30)
where the coefficients are given by
ˆ c
j
=

b
a
P
j
(x)f(x)w(x)dx, j = 0, . . . , n. (3.31)
52
D. Levy 3.3 Least-squares Approximations
Remark. In the case where ¦P
k
(x)¦ are orthogonal but not necessarily normalized,
the coefficients of the solution (3.30) of the weighted least-squares problem are given by
ˆ c
j
=

b
a
P
j
(x)f(x)dx

b
a
(P
j
(x))
2
w(x)dx
, j = 0, . . . , n.
3.3.5 Orthogonal polynomials
At this point we already know that orthogonal polynomials play a central role in the
solution of least-squares problems. In this section we will focus on the construction of
orthogonal polynomials. The properties of orthogonal polynomials will be studies in
Section 3.3.7.
We start by defining the weighted inner product between two functions f(x) and
g(x) (with respect to the weight w(x)):
'f, g`
w
=

b
a
f(x)g(x)w(x)dx.
To simplify the notations, even in the weighted case, we will typically write 'f, g` instead
of 'f, g`
w
. Some properties of the weighted inner product include
1. 'αf, g` = 'f, αg` = α'f, g` , ∀α ∈ R.
2. 'f
1
+ f
2
, g` = 'f
1
, g` +'f
2
, g`.
3. 'f, g` = 'g, f`
4. 'f, f` 0 and 'f, f` = 0 iff f ≡ 0. Here we must assume that f(x) is continuous
in the interval [a, b]. If it is not continuous, we can have 'f, f` = 0 and f(x) can
still be non-zero (e.g., in one point).
The weighted L
2
-norm can be obtained from the weighted inner product by
|f|
2,w
=

'f, f`
w
.
Given a weight w(x), we are interested in constructing orthogonal (or orthonor-
mal) polynomials. This can be done using the Gram-Schmidt orthogonalization
process, which we now describe in detail.
In the general context of linear algebra, the Gram-Schmidt process is being used
to convert one set of linearly independent vectors to an orthogonal set of vectors that
spans the same vector space. In our context, we should think about the process as
converting one set of polynomials that span the space of polynomials of degree n
to an orthogonal set of polynomials that spans the same space Π
n
. Typically, the
initial set of polynomials will be ¦1, x, x
2
, . . . , x
n
¦, which we would like to convert to
orthogonal polynomials with respect to the weight w(x). However, to keep the discussion
slightly more general, we start with n+1 linearly independent functions (all in L
2
w
[a, b],
53
3.3 Least-squares Approximations D. Levy
¦g
i
(x)¦
n
i=0
, i.e.,

b
a
(g(x))
2
w(x)dx < ∞). The functions ¦g
i
¦ will be converted into
orthonormal vectors ¦f
i
¦.
We thus consider

f
0
(x) = d
0
g
0
(x),
f
1
(x) = d
1
(g
1
(x) −c
0
1
f
0
(x)),
.
.
.
f
n
(x) = d
n
(g
n
(x) −c
0
n
f
0
(x) −. . . −c
n−1
n
f
n−1
(x)).
The goal is to find the coefficients d
k
and c
j
k
such that ¦f
i
¦
n
i=0
is orthonormal with
respect to the weighted L
2
-norm over [a, b], i.e.,
'f
i
, f
j
`
w
=

b
a
f
i
(x)f
j
(x)w(x)dx = δ
ij
.
We start with f
0
(x):
'f
0
, f
0
`
w
= d
2
0
'g
0
, g
0
`
w
.
Hence,
d
0
=
1

'g
0
, g
0
`
w
.
For f
1
(x), we require that it is orthogonal to f
0
(x), i.e., 'f
0
, f
1
`
w
= 0. Hence
0 = d
1

f
0
, g
1
−c
0
1
f
0

w
= d
1
('f
0
, g
1
`
w
−c
0
1
),
i.e.,
c
0
1
= 'f
0
, g
1
`
w
.
The normalization condition 'f
1
, f
1
`
w
= 1 now implies
d
2
1

g
1
−c
0
1
f
0
, g
1
−c
0
1
f
0

w
= 1.
Hence
d
1
=
1

'g
1
−c
0
1
f
0
, g
1
−c
0
1
f
0
`
w
.
The denominator cannot be zero due to the assumption that g
i
(x) are linearly indepen-
dent. In general
f
k
(x) = d
k
(g
k
−c
0
k
f
0
−. . . −c
k−1
k
f
k−1
).
For i = 0, . . . , k −1 we require the orthogonality conditions
0 = 'f
k
, f
i
`
w
.
54
D. Levy 3.3 Least-squares Approximations
Hence
0 =

d
k
(g
k
−c
i
k
f
i
), f
i

w
= d
k
('g
k
, f
i
`
w
−c
i
k
),
i.e.,
c
i
k
= 'g
k
, f
i
`
w
, 0 i k −1.
The coefficient d
k
is obtained from the normalization condition 'f
k
, f
k
`
w
= 1.
Example 3.13
Let w(x) ≡ 1 on [−1, 1]. Start with g
i
(x) = x
i
, i = 0, . . . , n. We follow the Gram-
Schmidt orthogonalization process to generate from this list, a set of orthonormal poly-
nomials with respect to the given weight on [−1, 1]. Since g
0
(x) ≡ 1, we have
f
0
(x) = d
0
g
0
(x) = d
0
.
Hence
1 =

1
−1
f
2
0
(x)dx = 2d
2
0
,
which means that
d
0
=
1

2
=⇒ f
0
=
1

2
.
Now
f
1
(x)
d
1
= g
1
−c
0
1
f
0
= x −c
0
1

1
2
.
Hence
c
0
1
= 'g
1
, f
0
` =

x,

1
2
¸
=

1
2

1
−1
xdx = 0.
This implies that
f
1
(x)
d
1
= x =⇒ f
1
(x) = d
1
x.
The normalization condition 'f
1
, f
1
` = 1 reads
1 =

1
−1
d
2
1
x
2
dx =
2
3
d
2
1
.
Therefore,
d
1
=

3
2
=⇒ f
1
(x) =

3
2
x.
Similarly,
f
2
(x) =
1
2

5
2
(3x
2
−1),
and so on.
55
3.3 Least-squares Approximations D. Levy
We are now going to provide several important examples of orthogonal polynomials.
1. Legendre polynomials. We start with the Legendre polynomials. This is a
family of polynomials that are orthogonal with respect to the weight
w(x) ≡ 1,
on the interval [−1, 1]. The Legendre polynomials can be obtained from the re-
currence relation
(n + 1)P
n+1
(x) −(2n + 1)xP
n
(x) + nP
n−1
(x) = 0, n 1, (3.32)
starting from
P
0
(x) = 1, P
1
(x) = x.
It is possible to calculate them directly by Rodrigues’ formula
P
n
(x) =
1
2
n
n!
d
n
dx
n

(x
2
−1)
n

, n 0. (3.33)
The Legendre polynomials satisfy the orthogonality condition
'P
n
, P
m
` =
2
2n + 1
δ
nm
. (3.34)
2. Chebyshev polynomials. Our second example is of the Chebyshev polynomi-
als. These polynomials are orthogonal with respect to the weight
w(x) =
1

1 −x
2
,
on the interval [−1, 1]. They satisfy the recurrence relation
T
n+1
(x) = 2xT
n
(x) −T
n−1
(x), n 1, (3.35)
together with T
0
(x) = 1 and T
1
(x) = x (see (2.31)). They and are explicitly given
by
T
n
(x) = cos(ncos
−1
x), n 0. (3.36)
(see (2.32)). The orthogonality relation that they satisfy is
'T
n
, T
m
` =

0, n = m,
π, n = m = 0,
π
2
, n = m = 0.
(3.37)
56
D. Levy 3.3 Least-squares Approximations
3. Laguerre polynomials. We proceed with the Laguerre polynomials. Here the
interval is given by [0, ∞) with the weight function
w(x) = e
−x
.
The Laguerre polynomials are given by
L
n
(x) =
e
x
n!
d
n
dx
n
(x
n
e
−x
), n 0. (3.38)
The normalization condition is
|L
n
| = 1. (3.39)
A more general form of the Laguerre polynomials is obtained when the weight is
taken as
e
−x
x
α
,
for an arbitrary real α > −1, on the interval [0, ∞).
4. Hermite polynomials. The Hermite polynomials are orthogonal with respect
to the weight
w(x) = e
−x
2
,
on the interval (−∞, ∞). The can be explicitly written as
H
n
(x) = (−1)
n
e
x
2 d
n
e
−x
2
dx
n
, n 0. (3.40)
Another way of expressing them is by
H
n
(x) =
[n/2]
¸
k=0
(−1)
k
n!
k!(n −2k)!
(2x)
n−2k
, (3.41)
where [x] denotes the largest integer that is x. The Hermite polynomials satisfy
the recurrence relation
H
n+1
(x) −2xH
n
(x) + 2nH
n−1
(x) = 0, n 1, (3.42)
together with
H
0
(x) = 1, H
1
(x) = 2x.
They satisfy the orthogonality relation


−∞
e
−x
2
H
n
(x)H
m
(x)dx = 2
n
n!

πδ
nm
. (3.43)
57
3.3 Least-squares Approximations D. Levy
3.3.6 Another approach to the least-squares problem
In this section we present yet another way of deriving the solution of the least-squares
problem. Along the way, we will be able to derive some new results. We recall that our
goal is to minimize

b
a
w(x)(f(x) −Q
n
(x))
2
dx
among all the polynomials Q
n
(x) of degree n.
Assume that ¦P
k
(x)¦
k0
is an orthonormal family of polynomials with respect to
w(x), and let
Q
n
(x) =
n
¸
j=0
b
j
P
j
(x).
Then
|f −Q
n
|
2
2,w
=

b
a
w(x)

f(x) −
n
¸
j=0
b
j
P
j
(x)

2
dx.
Hence
0

f −
n
¸
j=0
b
j
P
j
, f −
n
¸
j=0
b
j
P
j
¸
w
= 'f, f`
w
−2
n
¸
j=0
b
j
'f, P
j
`
w
+
n
¸
i=0
n
¸
j=0
b
i
b
j
'P
i
, P
j
`
w
= |f|
2
2,w
−2
n
¸
j=0
'f, P
j
`
w
b
j
+
n
¸
j=0
b
2
j
= |f|
2
2,w

n
¸
j=0
'f, P
j
`
2
w
+
n
¸
j=0

'f, P
j
`
w
−b
j

2
.
The last expression is minimal iff ∀0 j n
b
j
= 'f, P
j
`
w
.
Hence, there exists a unique least-squares approximation which is given by
Q

n
(x) =
n
¸
j=0
'f, P
j
`
w
P
j
(x). (3.44)
Remarks.
1. We have
|f −Q

n
|
2
2,w
= |f|
2
2,w

n
¸
j=0
'f, P
j
`
2
w
.
Hence
|Q

n
|
2
=
n
¸
j=0

'f, P
j
`
w

2
= |f|
2
−|f −Q

n
|
2
|f|
2
,
58
D. Levy 3.3 Least-squares Approximations
i.e.,
n
¸
j=0

'f, P
j
`
w

2
|f|
2
2,w
. (3.45)
The inequality (3.45) is called Bessel’s inequality.
2. Assuming that [a, b] is finite, we have
lim
n→∞
|f −Q

n
|
2,w
= 0.
Hence
|f|
2
2,w
=

¸
j=0

'f, P
j
`
w

2
, (3.46)
which is known as Parseval’s equality.
Example 3.14
Problem: Let f(x) = cos x on [−1, 1]. Find the polynomial in Π
2
, that minimizes

1
−1
[f(x) −Q
2
(x)]
2
dx.
Solution: The weight w(x) ≡ 1 on [−1, 1] implies that the orthogonal polynomials we
need to use are the Legendre polynomials. We are seeking for polynomials of degree
2 so we write the first three Legendre polynomials
P
0
(x) ≡ 1, P
1
(x) = x, P
2
(x) =
1
2
(3x
2
−1).
The normalization factor satisfies, in general,

1
−1
P
2
n
(x) =
2
2n + 1
.
Hence

1
−1
P
2
0
(x)dx = 2,

1
−1
P
1
(x)dx =
2
3
,

1
−1
P
2
2
(x)dx =
2
5
.
We can then replace the Legendre polynomials by their normalized counterparts:
P
0
(x) ≡
1

2
, P
1
(x) =

3
2
x, P
2
(x) =

5
2

2
(3x
2
−1).
We now have
'f, P
0
` =

1
−1
cos x
1

2
dx =
1

2
sin x

1
−1
=

2 sin 1.
59
3.3 Least-squares Approximations D. Levy
Hence
Q

0
(x) ≡ sin 1.
We also have
'f, P
1
` =

1
−1
cos x

3
2
xdx = 0.
which means that Q

1
(x) = Q

0
(x). Finally,
'f, P
2
` =

1
−1
cos x

5
2
3x
2
−1
2
=
1
2

5
2
(12 cos 1 −8 sin 1),
and hence the desired polynomial, Q

2
(x), is given by
Q

2
(x) = sin 1 +

15
2
cos 1 −5 sin 1

(3x
2
−1).
In Figure 3.3 we plot the original function f(x) = cos x (solid line) and its approximation
Q

2
(x) (dashed line). We zoom on the interval x ∈ [0, 1].
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
0.5
0.55
0.6
0.65
0.7
0.75
0.8
0.85
0.9
0.95
1
x
Figure 3.3: A second-order L
2
-approximation of f(x) = cos x. Solid line: f(x); Dashed
line: its approximation Q

2
(x)
If the weight is w(x) ≡ 1 but the interval is [a, b], we can still use the Legendre
polynomials if we make the following change of variables. Define
x =
b + a + (b −a)t
2
.
60
D. Levy 3.3 Least-squares Approximations
Then the interval −1 t 1 is mapped to a x b. Now, define
F(t) = f

b + a + (b −a)t
2

= f(x).
Hence

b
a
[f(x) −Q
n
(x)]
2
dx =
b −a
2

1
−1
[F(t) −q
n
(t)]
2
dt.
Example 3.15
Problem: Let f(x) = cos x on [0, π]. Find the polynomial in Π
1
that minimizes

π
0
[f(x) −Q
1
(x)]
2
dx.
Solution:

π
0
(f(x) −Q

1
(x))
2
dx =
π
2

1
−1
[F(t) −q
n
(t)]
2
dt.
Letting
x =
π + πt
2
=
π
2
(1 + t),
we have
F(t) = cos

π
2
(1 + t)

= −sin
πt
2
.
We already know that the first two normalized Legendre polynomials are
P
0
(t) =
1

2
, P
1
(t) =

3
2
t.
Hence
'F, P
0
` = −

1
−1
1

2
sin
πt
2
dt = 0,
which means that Q

0
(t) = 0. Also
'F, P
1
` = −

1
−1
sin
πt
2

3
2
tdt = −

3
2
¸
sin
πt
2

π
2

2

t cos
πt
2
π
2
¸
1
−1
= −

3
2

8
π
2
.
Hence
q

1
(t) = −
3
2

8
π
2
t = −
12
π
2
t =⇒ Q

1
(x) = −
12
π
2

2
π
x −1

.
In Figure 3.4 we plot the original function f(x) = cos x (solid line) and its approximation
Q

1
(x) (dashed line).
61
3.3 Least-squares Approximations D. Levy
0 0.5 1 1.5 2 2.5 3
−1
−0.5
0
0.5
1
x
Figure 3.4: A first-order L
2
-approximation of f(x) = cos x on the interval [0, π]. Solid
line: f(x), Dashed line: its approximation Q

1
(x)
Example 3.16
Problem: Let f(x) = cos x in [0, ∞). Find the polynomial in Π
1
that minimizes


0
e
−x
[f(x) −Q
1
(x)]
2
dx.
Solution: The family of orthogonal polynomials that correspond to this weight on
[0, ∞) are Laguerre polynomials. Since we are looking for the minimizer of the
weighted L
2
norm among polynomials of degree 1, we will need to use the first two
Laguerre polynomials:
L
0
(x) = 1, L
1
(x) = 1 −x.
We thus have
'f, L
0
`
w
=


0
e
−x
cos xdx =
e
−x
(−cos x + sin x)
2


0
=
1
2
.
Also
'f, L
1
`
w
=


0
e
−x
cos x(1−x)dx =
1
2

¸
xe
−x
(−cos x + sin x)
2

e
−x
(−2 sin x)
4


0
= 0.
This means that
Q

1
(x) = 'f, L
0
`
w
L
0
(x) +'f, L
1
`
w
L
1
(x) =
1
2
.
62
D. Levy 3.3 Least-squares Approximations
3.3.7 Properties of orthogonal polynomials
We start with a theorem that deals with some of the properties of the roots of orthogonal
polynomials. This theorem will become handy when we discuss Gaussian quadratures
in Section 5.6. We let ¦P
n
(x)¦
n0
be orthogonal polynomials in [a, b] with respect to
the weight w(x).
Theorem 3.17 The roots x
j
, j = 1, . . . , n of P
n
(x) are all real, simple, and are in
(a, b).
Proof. Let x
1
, . . . , x
r
be the roots of P
n
(x) in (a, b). Let
Q
r
(x) = (x −x
1
) . . . (x −x
r
).
Then Q
r
(x) and P
n
(x) change their signs together in (a, b). Also
deg(Q
r
(x)) = r n.
Hence (P
n
Q
r
)(x) is a polynomial with one sign in (a, b). This implies that

b
a
P
n
(x)Q
r
(x)w(x)dx = 0,
and hence r = n since P
n
(x) is orthogonal to polynomials of degree less than n.
Without loss of generality we now assume that x
1
is a multiple root, i.e.,
P
n
(x) = (x −x
1
)
2
P
n−2
(x).
Hence
P
n
(x)P
n−2
(x) =

P
n
(x)
x −x
1

2
0,
which implies that

b
a
P
n
(x)P
n−2
(x)dx > 0.
This is not possible since P
n
is orthogonal to P
n−2
. Hence roots can not repeat.

Another important property of orthogonal polynomials is that they can all be written
in terms of recursion relations. We have already seen specific examples of such relations
for the Legendre, Chebyshev, and Hermite polynomials (see (3.32), (3.35), and (3.42)).
The following theorem states such relations always hold.
Theorem 3.18 (Triple Recursion Relation) Any three consecutive orthonormal poly-
nomials are related by a recursion formula of the form
P
n+1
(x) = (A
n
x + B
n
)P
n
(x) −C
n
P
n−1
(x).
If a
k
and b
k
are the coefficients of the terms of degree k and degree k −1 in P
k
(x), then
A
n
=
a
n+1
a
n
, B
n
=
a
n+1
a
n

b
n+1
a
n+1

b
n
a
n

, C
n
=
a
n+1
a
n−1
a
2
n
.
63
3.3 Least-squares Approximations D. Levy
Proof. For
A
n
=
a
n+1
a
n
,
let
Q
n
(x) = P
n+1
(x) −A
n
xP
n
(x)
= (a
n+1
x
n+1
+ b
n+1
x
n
+ . . .) −
a
n+1
a
n
x(a
n
x
n
+ b
n
x
n−1
+ . . .)
=

b
n+1

a
n+1
b
n
a
n

x
n
+ . . .
Hence deg(Q(x)) n, which means that there exists α
0
, . . . , α
n
such that
Q(x) =
n
¸
i=0
α
i
P
i
(x).
For 0 i n −2,
α
i
=
'Q, P
i
`
'P
i
, P
i
`
= 'Q, P
i
` = 'P
n+1
−A
n
xP
n
, P
i
` = −A
n
'xP
n
, P
i
` = 0.
Hence
Q
n
(x) = α
n
P
n
(x) + α
n−1
P
n−1
(x).
Set α
n
= B
n
and α
n−1
= −C
n
. Then, since
xP
n−1
=
a
n−1
a
n
P
n
+ q
n−1
,
we have
C
n
= A
n
'xP
n
, P
n−1
` = A
n
'P
n
, xP
n−1
` = A
n

P
n
,
a
n−1
a
n
P
n
+ q
n−1

= A
n
a
n−1
a
n
.
Finally
P
n+1
= (A
n
x + B
n
)P
n
−C
n
P
n−1
,
can be explicitly written as
a
n+1
x
n+1
+b
n+1
x
n
+. . . = (A
n
x+B
n
)(a
n
x
n
+b
n
x
n−1
+. . .)−C
n
(a
n−1
x
n−1
+b
n−1
x
n−2
+. . .).
The coefficient of x
n
is
b
n+1
= A
n
b
n
+ B
n
a
n
,
which means that
B
n
= (b
n+1
−A
n
b
n
)
1
a
n
.

64
D. Levy
4 Numerical Differentiation
4.1 Basic Concepts
This chapter deals with numerical approximations of derivatives. The first questions
that comes up to mind is: why do we need to approximate derivatives at all? After
all, we do know how to analytically differentiate every function. Nevertheless, there are
several reasons as of why we still need to approximate derivatives:
• Even if there exists an underlying function that we need to differentiate, we might
know its values only at a sampled data set without knowing the function itself.
• There are some cases where it may not be obvious that an underlying function
exists and all that we have is a discrete data set. We may still be interested in
studying changes in the data, which are related, of course, to derivatives.
• There are times in which exact formulas are available but they are very complicated
to the point that an exact computation of the derivative requires a lot of function
evaluations. It might be significantly simpler to approximate the derivative instead
of computing its exact value.
• When approximating solutions to ordinary (or partial) differential equations, we
typically represent the solution as a discrete approximation that is defined on a
grid. Since we then have to evaluate derivatives at the grid points, we need to be
able to come up with methods for approximating the derivatives at these points,
and again, this will typically be done using only values that are defined on a lattice.
The underlying function itself (which in this cased is the solution of the equation)
is unknown.
A simple approximation of the first derivative is
f

(x) ≈
f(x + h) −f(x)
h
, (4.1)
where we assume that h > 0. What do we mean when we say that the expression on
the right-hand-side of (4.1) is an approximation of the derivative? For linear functions
(4.1) is actually an exact expression for the derivative. For almost all other functions,
(4.1) is not the exact derivative.
Let’s compute the approximation error. We write a Taylor expansion of f(x + h)
about x, i.e.,
f(x + h) = f(x) + hf

(x) +
h
2
2
f

(ξ), ξ ∈ (x, x + h). (4.2)
For such an expansion to be valid, we assume that f(x) has two continuous derivatives.
The Taylor expansion (4.2) means that we can now replace the approximation (4.1) with
an exact formula of the form
f

(x) =
f(x + h) −f(x)
h

h
2
f

(ξ), ξ ∈ (x, x + h). (4.3)
65
4.1 Basic Concepts D. Levy
Since this approximation of the derivative at x is based on the values of the function at
x and x + h, the approximation (4.1) is called a forward differencing or one-sided
differencing. The approximation of the derivative at x that is based on the values of
the function at x −h and x, i.e.,
f

(x) ≈
f(x) −f(x −h)
h
,
is called a backward differencing (which is obviously also a one-sided differencing
formula).
The second term on the right-hand-side of (4.3) is the error term. Since the ap-
proximation (4.1) can be thought of as being obtained by truncating this term from the
exact formula (4.3), this error is called the truncation error. The small parameter h
denotes the distance between the two points x and x+h. As this distance tends to zero,
i.e., h →0, the two points approach each other and we expect the approximation (4.1)
to improve. This is indeed the case if the truncation error goes to zero, which in turn is
the case if f

(ξ) is well defined in the interval (x, x+h). The “speed” in which the error
goes to zero as h →0 is called the rate of convergence. When the truncation error is
of the order of O(h), we say that the method is a first order method. We refer to a
methods as a p
th
-order method if the truncation error is of the order of O(h
p
).
It is possible to write more accurate formulas than (4.3) for the first derivative. For
example, a more accurate approximation for the first derivative that is based on the
values of the function at the points f(x−h) and f(x+h) is the centered differencing
formula
f

(x) ≈
f(x + h) −f(x −h)
2h
. (4.4)
Let’s verify that this is indeed a more accurate formula than (4.1). Taylor expansions
of the terms on the right-hand-side of (4.4) are
f(x + h) = f(x) + hf

(x) +
h
2
2
f

(x) +
h
3
6
f


1
),
f(x −h) = f(x) −hf

(x) +
h
2
2
f

(x) −
h
3
6
f


2
).
Here ξ
1
∈ (x, x + h) and ξ
2
∈ (x −h, x). Hence
f

(x) =
f(x + h) −f(x −h)
2h

h
2
12
[f


1
) + f


2
)],
which means that the truncation error in the approximation (4.4) is

h
2
12
[f


1
) + f


2
)].
If the third-order derivative f

(x) is a continuous function in the interval [x−h, x+h],
then the intermediate value theorem implies that there exists a point ξ ∈ (x −h, x +h)
such that
f

(ξ) =
1
2
[f


1
) + f


2
)].
66
D. Levy 4.2 Differentiation Via Interpolation
Hence
f

(x) =
f(x + h) −f(x −h)
2h

h
2
6
f

(ξ), (4.5)
which means that the expression (4.4) is a second-order approximation of the first deriva-
tive.
In a similar way we can approximate the values of higher-order derivatives. For
example, it is easy to verify that the following is a second-order approximation of the
second derivative
f

(x) ≈
f(x + h) −2f(x) + f(x −h)
h
2
. (4.6)
To verify the consistency and the order of approximation of (4.6) we expand
f(x ±h) = f(x) ±hf

(x) +
h
2
2
f

(x) ±
h
3
6
f

(x) +
h
4
24
f
(4)

±
).
Here, ξ

∈ (x −h, x) and ξ
+
∈ (x, x + h). Hence
f(x + h) −2f(x) + f(x −h)
h
2
= f

(x)+
h
2
24

f
(4)


) + f
(4)

+
)

= f

(x)+
h
2
12
f
(4)
(ξ),
where we assume that ξ ∈ (x −h, x + h) and that f(x) has four continuous derivatives
in the interval. Hence, the approximation (4.6) is indeed a second-order approximation
of the derivative, with a truncation error that is given by

h
2
12
f
(4)
(ξ), ξ ∈ (x −h, x + h).
4.2 Differentiation Via Interpolation
In this section we demonstrate how to generate differentiation formulas by differentiating
an interpolant. The idea is straightforward: the first stage is to construct an interpo-
lating polynomial from the data. An approximation of the derivative at any point can
be then obtained by a direct differentiation of the interpolant.
We follow this procedure and assume that f(x
0
), . . . , f(x
n
) are given. The Lagrange
form of the interpolation polynomial through these points is
Q
n
(x) =
n
¸
j=0
f(x
j
)l
j
(x).
Here we simplify the notation and replace l
n
i
(x) which is the notation we used in Sec-
tion 2.5 by l
i
(x). According to the error analysis of Section 2.7 we know that the
interpolation error is
f(x) −Q
n
(x) =
1
(n + 1)!
f
(n+1)

n
)
n
¸
j=0
(x −x
j
),
67
4.2 Differentiation Via Interpolation D. Levy
where ξ
n
∈ (min(x, x
0
, . . . , x
n
), max(x, x
0
, . . . , x
n
)). Since here we are assuming that the
points x
0
, . . . , x
n
are fixed, we would like to emphasize the dependence of ξ
n
on x and
hence replace the ξ
n
notation by ξ
x
. We that have:
f(x) =
n
¸
j=0
f(x
j
)l
j
(x) +
1
(n + 1)!
f
(n+1)

x
)w(x), (4.7)
where
w(x) =
n
¸
i=0
(x −x
i
).
Differentiating the interpolant (4.7):
f

(x) =
n
¸
j=0
f(x
j
)l

j
(x) +
1
(n + 1)!
f
(n+1)

x
)w

(x) +
1
(n + 1)!
w(x)
d
dx
f
(n+1)

x
). (4.8)
We now assume that x is one of the interpolation points, i.e., x ∈ ¦x
0
, . . . , x
n
¦, say x
k
,
so that
f

(x
k
) =
n
¸
j=0
f(x
j
)l

j
(x
k
) +
1
(n + 1)!
f
(n+1)

x
k
)w

(x
k
). (4.9)
Now,
w

(x) =
n
¸
i=0
n
¸
j=0
j=i
(x −x
j
) =
n
¸
i=0
[(x −x
0
) . . . (x −x
i−1
)(x −x
i+1
) . . . (x −x
n
)].
Hence, when w

(x) is evaluated at an interpolation point x
k
, there is only one term in
w

(x) that does not vanish, i.e.,
w

(x
k
) =
n
¸
j=0
j=k
(x
k
−x
j
).
The numerical differentiation formula, (4.9), then becomes
f

(x
k
) =
n
¸
j=0
f(x
j
)l

j
(x
k
) +
1
(n + 1)!
f
(n+1)

x
k
)
¸
j=0
j=k
(x
k
−x
j
). (4.10)
We refer to the formula (4.10) as a differentiation by interpolation algorithm.
Example 4.1
We demonstrate how to use the differentiation by integration formula (4.10) in the case
where n = 1 and k = 0. This means that we use two interpolation points (x
0
, f(x
0
)) and
68
D. Levy 4.2 Differentiation Via Interpolation
(x
1
, f(x
1
)), and want to approximate f

(x
0
). The Lagrange interpolation polynomial in
this case is
f(x) = f(x
0
)l
0
(x) + f(x
1
)l
1
(x),
where
l
0
(x) =
x −x
1
x
0
−x
1
, l
1
(x) =
x −x
0
x
1
−x
0
.
Hence
l

0
(x) =
1
x
0
−x
1
, l

1
(x) =
1
x
1
−x
0
.
We thus have
f

(x
0
) =
f(x
0
)
x
0
−x
1
+
f(x
1
)
x
1
−x
0
+
1
2
f

(ξ)(x
0
−x
1
) =
f(x
1
) −f(x
0
)
x
1
−x
0

1
2
f

(ξ)(x
1
−x
0
).
Here, we simplify the notation and assume that ξ ∈ (x
0
, x
1
). If we now let x
1
= x
0
+h,
then
f

(x
0
) =
f(x
0
+ h) −f(x
0
)
h

h
2
f

(ξ),
which is the (first-order) forward differencing approximation of f

(x
0
), (4.3).
Example 4.2
We repeat the previous example in the case n = 2 and k = 0. This time
f(x) = f(x
0
)l
0
(x) + f(x
1
)l
1
(x) + f(x
2
)l
2
(x),
with
l
0
(x) =
(x −x
1
)(x −x
2
)
(x
0
−x
1
)(x
0
−x
2
)
, l
1
(x) =
(x −x
0
)(x −x
2
)
(x
1
−x
0
)(x
1
−x
2
)
, l
2
(x) =
(x −x
0
)(x −x
1
)
(x
2
−x
0
)(x
2
−x
1
)
.
Hence
l

0
(x) =
2x −x
1
−x
2
(x
0
−x
1
)(x
0
−x
2
)
, l

1
(x) =
2x −x
0
−x
2
(x
1
−x
0
)(x
1
−x
2
)
, l

2
(x) =
2x −x
0
−x
1
(x
2
−x
0
)(x
2
−x
1
)
.
Evaluating l

j
(x) for j = 1, 2, 3 at x
0
we have
l

0
(x
0
) =
2x
0
−x
1
−x
2
(x
0
−x
1
)(x
0
−x
2
)
, l

1
(x
0
) =
x
0
−x
2
(x
1
−x
0
)(x
1
−x
2
)
, l

2
(x
0
) =
x
0
−x
1
(x
2
−x
0
)(x
2
−x
1
)
Hence
f

(x
0
) = f(x
0
)
2x
0
−x
1
−x
2
(x
0
−x
1
)(x
0
−x
2
)
+ f(x
1
)
x
0
−x
2
(x
1
−x
0
)(x
1
−x
2
)
(4.11)
+f(x
2
)
x
0
−x
1
(x
2
−x
0
)(x
2
−x
1
)
+
1
6
f
(3)
(ξ)(x
0
−x
1
)(x
0
−x
2
).
69
4.3 The Method of Undetermined Coefficients D. Levy
Here, we assume ξ ∈ (x
0
, x
2
). For x
i
= x + ih, i = 0, 1, 2, equation (4.11) becomes
f

(x) = −f(x)
3
2h
+ f(x + h)
2
h
+ f(x + 2h)


1
2h

+
f

(ξ)
3
h
2
=
−3f(x) + 4f(x + h) −f(x + 2h)
2h
+
f

(ξ)
3
h
2
,
which is a one-sided, second-order approximation of the first derivative.
Remark. In a similar way, if we were to repeat the last example with n = 2 while
approximating the derivative at x
1
, the resulting formula would be the second-order
centered approximation of the first-derivative (4.5)
f

(x) =
f(x + h) −f(x −h)
2h

1
6
f

(ξ)h
2
.
4.3 The Method of Undetermined Coefficients
In this section we present the method of undetermined coefficients, which is a very
practical way for generating approximations of derivatives (as well as other quantities
as we shall see, e.g., when we discuss integration).
Assume, for example, that we are interested in finding an approximation of the
second derivative f

(x) that is based on the values of the function at three equally
spaced points, f(x −h), f(x), f(x + h), i.e.,
f

(x) ≈ Af(x + h) + B(x) + Cf(x −h). (4.12)
The coefficients A, B, and C are to be determined in such a way that this linear
combination is indeed an approximation of the second derivative. The Taylor expansions
of the terms f(x ±h) are
f(x ±h) = f(x) ±hf

(x) +
h
2
2
f

(x) ±
h
3
6
f

(x) +
h
4
24
f
(4)

±
), (4.13)
where (assuming that h > 0)
x −h ξ

x ξ
+
x + h.
Using the expansions in (4.13) we can rewrite (4.12) as
f''(x) ≈ Af(x + h) + Bf(x) + Cf(x − h)   (4.14)
      = (A + B + C)f(x) + h(A − C)f'(x) + \frac{h^2}{2}(A + C)f''(x) + \frac{h^3}{6}(A − C)f^{(3)}(x) + \frac{h^4}{24}[Af^{(4)}(ξ^+) + Cf^{(4)}(ξ^−)].

Equating the coefficients of f(x), f'(x), and f''(x) on both sides of (4.14) we obtain the linear system
A + B + C = 0,
A − C = 0,
A + C = \frac{2}{h^2}.
   (4.15)

The system (4.15) has the unique solution:

A = C = \frac{1}{h^2},   B = −\frac{2}{h^2}.

In this particular case, since A and C are equal to each other, the coefficient of f^{(3)}(x) on the right-hand-side of (4.14) also vanishes and we end up with

f''(x) = \frac{f(x + h) − 2f(x) + f(x − h)}{h^2} − \frac{h^2}{24}[f^{(4)}(ξ^+) + f^{(4)}(ξ^−)].
We note that the last two terms can be combined into one using an intermediate value theorem (assuming that f(x) has four continuous derivatives), i.e.,

\frac{h^2}{24}[f^{(4)}(ξ^+) + f^{(4)}(ξ^−)] = \frac{h^2}{12} f^{(4)}(ξ),   ξ ∈ (x − h, x + h).
Hence we obtain the familiar second-order approximation of the second derivative:
f''(x) = \frac{f(x + h) − 2f(x) + f(x − h)}{h^2} − \frac{h^2}{12} f^{(4)}(ξ).
In terms of an algorithm, the method of undetermined coefficients follows what was
just demonstrated in the example:
1. Assume that the derivative can be written as a linear combination of the values
of the function at certain points.
2. Write the Taylor expansions of the function at the approximation points.
3. Equate the coefficients of the function and its derivatives on both sides.
The only question that remains open is how many terms we should use in the Taylor expansion. This question has, unfortunately, no simple answer. In the example, we have already seen that even though we used data that is taken from three points, we could satisfy four equations. In other words, the coefficient of the third derivative vanished as well. If we were to stop the Taylor expansions at the third derivative instead of at the fourth derivative, we would have missed this cancellation, and would have mistakenly concluded that the approximation method is only first-order accurate. The number of terms in the Taylor expansion should be sufficient to rule out additional cancellations. In other words, one should truncate the Taylor series only after the leading term in the error has been identified.
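As a sketch of the three-step algorithm above (our own illustration; the use of numpy and the test function cos x are arbitrary choices, not part of the notes), one can assemble and solve the linear system (4.15) numerically and then test the resulting stencil:

    import numpy as np

    h = 0.1
    # rows of (4.15): A + B + C = 0, A - C = 0, A + C = 2/h^2; unknowns (A, B, C)
    M = np.array([[1.0, 1.0,  1.0],
                  [1.0, 0.0, -1.0],
                  [1.0, 0.0,  1.0]])
    rhs = np.array([0.0, 0.0, 2.0 / h**2])
    A, B, C = np.linalg.solve(M, rhs)
    print(A, B, C)                         # 1/h^2, -2/h^2, 1/h^2

    f, x = np.cos, 0.3                      # f''(0.3) = -cos(0.3)
    approx = A*f(x + h) + B*f(x) + C*f(x - h)
    print(approx, -np.cos(x))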
4.4 Richardson’s Extrapolation
Richardson’s extrapolation can be viewed as a general procedure for improving the
accuracy of approximations when the structure of the error is known. While we study
it here in the context of numerical differentiation, it is by no means limited only to
differentiation and we will get back to it later on when we study methods for numerical
integration.
We start with an example in which we show how to turn a second-order approximation of the first derivative into a fourth-order approximation of the same quantity. We already know that we can write a second-order approximation of f'(x) given its values in f(x ± h). In order to improve this approximation we will need some more insight on the internal structure of the error. We therefore start with the Taylor expansions of f(x ± h) about the point x, i.e.,

f(x + h) = \sum_{k=0}^{∞} \frac{f^{(k)}(x)}{k!} h^k,

f(x − h) = \sum_{k=0}^{∞} (−1)^k \frac{f^{(k)}(x)}{k!} h^k.

Hence

f'(x) = \frac{f(x + h) − f(x − h)}{2h} − [ \frac{h^2}{3!} f^{(3)}(x) + \frac{h^4}{5!} f^{(5)}(x) + . . . ].   (4.16)

We rewrite (4.16) as

L = D(h) + e_2 h^2 + e_4 h^4 + . . . ,   (4.17)

where L denotes the quantity that we are interested in approximating, i.e.,

L = f'(x),

and D(h) is the approximation, which in this case is

D(h) = \frac{f(x + h) − f(x − h)}{2h}.

The error is

E = e_2 h^2 + e_4 h^4 + . . . ,

where e_i denotes the coefficient of h^i in (4.16). The important property of the coefficients e_i's is that they do not depend on h. We note that the formula

L ≈ D(h),

is a second-order approximation of the first derivative which is based on the values of f(x) at x ± h. We assume here that in general e_i ≠ 0. In order to improve the
approximation of L our strategy will be to eliminate the term e_2 h^2 from the error. How can this be done? One possibility is to write another approximation that is based on the values of the function at different points. For example, we can write

L = D(2h) + e_2(2h)^2 + e_4(2h)^4 + . . . .   (4.18)

This, of course, is still a second-order approximation of the derivative. However, the idea is to combine (4.17) with (4.18) such that the h^2 term in the error vanishes. Indeed, subtracting the following equations from each other,

4L = 4D(h) + 4e_2 h^2 + 4e_4 h^4 + . . . ,
L = D(2h) + 4e_2 h^2 + 16e_4 h^4 + . . . ,

we have

L = \frac{4D(h) − D(2h)}{3} − 4e_4 h^4 + . . .

In other words, a fourth-order approximation of the derivative is

f'(x) = \frac{−f(x + 2h) + 8f(x + h) − 8f(x − h) + f(x − 2h)}{12h} + O(h^4).   (4.19)
Note that (4.19) improves the accuracy of the approximation (4.16) by using more
points.
This process can be repeated over and over as long as the structure of the error is
known. For example, we can write (4.19) as
L = S(h) + a_4 h^4 + a_6 h^6 + . . . ,   (4.20)

where

S(h) = \frac{−f(x + 2h) + 8f(x + h) − 8f(x − h) + f(x − 2h)}{12h}.

Equation (4.20) can be turned into a sixth-order approximation of the derivative by eliminating the term a_4 h^4. We carry out such a procedure by writing

L = S(2h) + a_4(2h)^4 + a_6(2h)^6 + . . .   (4.21)

Combining (4.21) with (4.20) we end up with a sixth-order approximation of the derivative:

L = \frac{16S(h) − S(2h)}{15} + O(h^6).
Remarks.
1. In (4.18), instead of using D(2h), it is possible to use other approximations, e.g.,
D(h/2). If this is what is done, instead of (4.19) we would get a fourth-order
approximation of the derivative that is based on the values of f at
x −h, x −h/2, x + h/2, x + h.
2. Once again we would like to emphasize that Richardson’s extrapolation is a
general procedure for improving the accuracy of numerical approximations that
can be used when the structure of the error is known. It is not specific for
numerical differentiation.
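The extrapolation step is only a few lines of code. The following Python sketch is our own illustration (the helper names and the test function e^x are arbitrary choices); it combines D(h) and D(2h) as in (4.19) and the error of the combined value shrinks roughly by a factor of 16 when h is halved:

    import math

    def D(f, x, h):
        # second-order centered difference, the approximation D(h) in (4.17)
        return (f(x + h) - f(x - h)) / (2*h)

    def richardson(f, x, h):
        # eliminate the h^2 error term: the fourth-order combination (4.19)
        return (4*D(f, x, h) - D(f, x, 2*h)) / 3

    f, x = math.exp, 0.0              # f'(0) = 1
    for h in [0.2, 0.1, 0.05]:
        print(h, abs(D(f, x, h) - 1.0), abs(richardson(f, x, h) - 1.0))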
5 Numerical Integration
5.1 Basic Concepts
In this chapter we are going to explore various ways for approximating the integral of
a function over a given domain. Since we can not analytically integrate every function,
the need for approximate integration formulas is obvious. In addition, there might be
situations where the given function can be integrated analytically, and still, an approximation formula may end up being a more efficient alternative to evaluating the exact
expression of the integral.
In order to gain some insight on numerical integration, it will be natural to recall
the notion of Riemann integration. We assume that f(x) is a bounded function defined on [a, b] and that {x_0, . . . , x_n} is a partition (P) of [a, b]. For each i we let

M_i(f) = \sup_{x ∈ [x_{i−1}, x_i]} f(x),

and

m_i(f) = \inf_{x ∈ [x_{i−1}, x_i]} f(x).
Letting Δx_i = x_i − x_{i−1}, the upper (Darboux) sum of f(x) with respect to the partition P is defined as

U(f, P) = \sum_{i=1}^{n} M_i Δx_i,   (5.1)

while the lower (Darboux) sum of f(x) with respect to the partition P is defined as

L(f, P) = \sum_{i=1}^{n} m_i Δx_i.   (5.2)
The upper integral of f(x) on [a, b] is defined as

U(f) = \inf(U(f, P)),

and the lower integral of f(x) is defined as

L(f) = \sup(L(f, P)),

where both the infimum and the supremum are taken over all possible partitions, P, of the interval [a, b]. If the upper and lower integral of f(x) are equal to each other, their common value is denoted by \int_a^b f(x)dx and is referred to as the Riemann integral of f(x).
For the purpose of the present discussion we can think of the upper and lower Darboux sums (5.1), (5.2), as two approximations of the integral (assuming that the function is indeed integrable). Of course, these sums are not defined in the most convenient way for an approximation algorithm. This is because we need to find the extrema of the function in every subinterval. Finding the extrema of the function may be a complicated task on its own, which we would like to avoid.
Instead, one can think of approximating the value of \int_a^b f(x)dx by multiplying the value of the function at one of the end-points of the interval by the length of the interval, i.e.,

\int_a^b f(x)dx ≈ f(a)(b − a).   (5.3)

The approximation (5.3) is called the rectangular method (see Figure 5.1). Numerical integration formulas are also referred to as integration rules or quadratures, and hence we can refer to (5.3) as the rectangular rule or the rectangular quadrature.
Figure 5.1: A rectangular quadrature
A variation on the rectangular rule is the midpoint rule. Similarly to the rectangular rule, we approximate the value of the integral \int_a^b f(x)dx by multiplying the length of the interval by the value of the function at one point. Only this time, we replace the value of the function at an endpoint by the value of the function at the center point \frac{1}{2}(a + b), i.e.,

\int_a^b f(x)dx ≈ (b − a) f(\frac{a + b}{2})   (5.4)

(see also Figure 5.2). As we shall see below, the midpoint quadrature (5.4) is a more accurate quadrature than the rectangular rule (5.3).
Figure 5.2: A midpoint quadrature
In order to compute the quadrature error for the midpoint rule (5.4), we consider the primitive function F(x),

F(x) = \int_a^x f(t)dt,

and expand

\int_a^{a+h} f(x)dx = F(a + h) = F(a) + hF'(a) + \frac{h^2}{2}F''(a) + \frac{h^3}{6}F'''(a) + O(h^4)   (5.5)
 = hf(a) + \frac{h^2}{2}f'(a) + \frac{h^3}{6}f''(a) + O(h^4).

If we let b = a + h, we have (expanding f(a + h/2)) for the quadrature error, E,

E = \int_a^{a+h} f(x)dx − hf(a + \frac{h}{2})
 = hf(a) + \frac{h^2}{2}f'(a) + \frac{h^3}{6}f''(a) + O(h^4) − h[ f(a) + \frac{h}{2}f'(a) + \frac{h^2}{8}f''(a) + O(h^3) ],

which means that the error term is of order O(h^3), so we should stop the expansions there and write

E = h^3 f''(ξ)(\frac{1}{6} − \frac{1}{8}) = \frac{(b − a)^3}{24} f''(ξ),   ξ ∈ (a, b).   (5.6)
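A quick numerical illustration of the two rules (our own sketch, not part of the notes; the integrand e^x and the shrinking intervals [0, b] are arbitrary choices) shows the rectangle error decaying like the square of the interval length while the midpoint error decays like its cube, as predicted by (5.6):

    import math

    def rectangle(f, a, b):
        return f(a) * (b - a)              # rule (5.3)

    def midpoint(f, a, b):
        return f((a + b) / 2) * (b - a)    # rule (5.4)

    f = math.exp
    for k in range(4):
        b = 1.0 / 2**k                     # integrate e^x over [0, b] as b shrinks
        exact = math.exp(b) - 1.0
        print(f"b = {b:7.4f}  rectangle err = {abs(rectangle(f, 0, b) - exact):.2e}"
              f"  midpoint err = {abs(midpoint(f, 0, b) - exact):.2e}")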
Remark. Throughout this section we assumed that all functions we are interested in
integrating are actually integrable in the domain of interest. We also assumed that they
are bounded and that they are defined at every point, so that whenever we need to
evaluate a function at a point, we can do it. We will go on and use these assumptions
throughout the chapter.
5.2 Integration via Interpolation
In this section we will study how to derive quadratures by integrating an interpolant.
As always, our goal is to evaluate I = \int_a^b f(x)dx. We select nodes x_0, . . . , x_n ∈ [a, b], and write the Lagrange interpolant (of degree n) through these points, i.e.,

P_n(x) = \sum_{i=0}^{n} f(x_i) l_i(x),

with

l_i(x) = \prod_{j=0, j≠i}^{n} \frac{x − x_j}{x_i − x_j},   0 ≤ i ≤ n.

Hence, we can approximate

\int_a^b f(x)dx ≈ \int_a^b P_n(x)dx = \sum_{i=0}^{n} f(x_i) \int_a^b l_i(x)dx = \sum_{i=0}^{n} A_i f(x_i).   (5.7)

The quadrature coefficients A_i in (5.7) are given by

A_i = \int_a^b l_i(x)dx.   (5.8)
Note that if we want to integrate several different functions at the same points, the
quadrature coefficients (5.8) need to be computed only once, since they do not depend
on the function that is being integrated. If we change the interpolation/integration
points, then we must recompute the quadrature coefficients.
For equally spaced points, x_0, . . . , x_n, a numerical integration formula of the form

\int_a^b f(x)dx ≈ \sum_{i=0}^{n} A_i f(x_i),   (5.9)

is called a Newton-Cotes formula.
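The coefficients (5.8) can be computed once and reused, as noted above. The sketch below is our own illustration of that computation (the helper name and the use of numpy's polynomial routines are our choices, not part of the notes); for three equally spaced nodes on [0, 1] it reproduces the weights 1/6, 2/3, 1/6 that will reappear in Section 5.4:

    import numpy as np

    def newton_cotes_weights(nodes, a, b):
        # A_i = integral over [a, b] of the i-th Lagrange basis polynomial, as in (5.8)
        weights = []
        for i, xi in enumerate(nodes):
            others = [xj for j, xj in enumerate(nodes) if j != i]
            li = np.poly(others) / np.prod([xi - xj for xj in others])  # coefficients of l_i
            P = np.polyint(li)                                          # antiderivative of l_i
            weights.append(np.polyval(P, b) - np.polyval(P, a))
        return weights

    print(newton_cotes_weights([0.0, 0.5, 1.0], 0.0, 1.0))   # approximately [1/6, 2/3, 1/6]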
Example 5.1
We let n = 1 and consider two interpolation points which we set as

x_0 = a,   x_1 = b.

In this case

l_0(x) = \frac{b − x}{b − a},   l_1(x) = \frac{x − a}{b − a}.

Hence

A_0 = \int_a^b l_0(x)dx = \int_a^b \frac{b − x}{b − a} dx = \frac{b − a}{2}.

Similarly,

A_1 = \int_a^b l_1(x)dx = \int_a^b \frac{x − a}{b − a} dx = \frac{b − a}{2} = A_0.

The resulting quadrature is the so-called trapezoidal rule,

\int_a^b f(x)dx ≈ \frac{b − a}{2}[f(a) + f(b)],   (5.10)

(see Figure 5.3).
Figure 5.3: A trapezoidal quadrature
We can now use the interpolation error to compute the error in the quadrature (5.10). The interpolation error is

f(x) − P_1(x) = \frac{1}{2} f''(ξ_x)(x − a)(x − b),   ξ_x ∈ (a, b),

and hence (using the integral intermediate value theorem)

E = \int_a^b \frac{1}{2} f''(ξ_x)(x − a)(x − b)dx = \frac{f''(ξ)}{2} \int_a^b (x − a)(x − b)dx = −\frac{f''(ξ)}{12}(b − a)^3,   (5.11)

with ξ ∈ (a, b).
Remarks.
1. We note that the quadratures (5.7), (5.8), are exact for polynomials of degree ≤ n. For if p(x) is such a polynomial, it can be written as (check!)

p(x) = \sum_{i=0}^{n} p(x_i) l_i(x).

Hence

\int_a^b p(x)dx = \sum_{i=0}^{n} p(x_i) \int_a^b l_i(x)dx = \sum_{i=0}^{n} A_i p(x_i).
2. As for the opposite direction, assume that the quadrature

\int_a^b f(x)dx ≈ \sum_{i=0}^{n} A_i f(x_i),

is exact for all polynomials of degree ≤ n. We know that

deg(l_j(x)) = n,

and hence

\int_a^b l_j(x)dx = \sum_{i=0}^{n} A_i l_j(x_i) = \sum_{i=0}^{n} A_i δ_{ij} = A_j.
5.3 Composite Integration Rules
In a composite quadrature, we divide the interval into subintervals and apply an integration rule to each subinterval. We demonstrate this idea with a couple of examples.
Example 5.2
Consider the points

a = x_0 < x_1 < · · · < x_n = b.
The composite trapezoidal rule is obtained by applying the trapezoidal rule in each subinterval [x_{i−1}, x_i], i = 1, . . . , n, i.e.,

\int_a^b f(x)dx = \sum_{i=1}^{n} \int_{x_{i−1}}^{x_i} f(x)dx ≈ \frac{1}{2} \sum_{i=1}^{n} (x_i − x_{i−1})[f(x_{i−1}) + f(x_i)],   (5.12)

(see Figure 5.4).
A particular case is when these points are uniformly spaced, i.e., when all intervals have an equal length. For example, if

x_i = a + ih,
Figure 5.4: A composite trapezoidal rule
where

h = \frac{b − a}{n},

then

\int_a^b f(x)dx ≈ \frac{h}{2}[ f(a) + 2\sum_{i=1}^{n−1} f(a + ih) + f(b) ] = h {\sum_{i=0}^{n}}'' f(a + ih).   (5.13)

The notation of a sum with two primes, Σ'', means that we sum over all the terms with the exception of the first and last terms that are being divided by 2.
We can also compute the error term as a function of the distance between neighboring points, h. We know from (5.11) that in every subinterval the quadrature error is

−\frac{h^3}{12} f''(ξ_i).

Hence, the overall error is obtained by summing over n such terms:

\sum_{i=1}^{n} −\frac{h^3}{12} f''(ξ_i) = −\frac{h^3 n}{12} [ \frac{1}{n} \sum_{i=1}^{n} f''(ξ_i) ].

Here, we use the notation ξ_i to denote an intermediate point that belongs to the i-th interval. Let

M = \frac{1}{n} \sum_{i=1}^{n} f''(ξ_i).

Clearly

\min_{x ∈ [a,b]} f''(x) ≤ M ≤ \max_{x ∈ [a,b]} f''(x).

If we assume that f''(x) is continuous in [a, b] (which we anyhow do in order for the interpolation error formula to be valid) then there exists a point ξ ∈ [a, b] such that

f''(ξ) = M.

Hence (recalling that (b − a)/n = h), we have

E = −\frac{(b − a)h^2}{12} f''(ξ),   ξ ∈ [a, b].   (5.14)
This means that the composite trapezoidal rule is second-order accurate.
Example 5.3
In the interval [a, b] we assume n subintervals and let

h = \frac{b − a}{n}.

The quadrature points are

x_j = a + (j − \frac{1}{2}) h,   j = 1, 2, . . . , n.

The composite midpoint rule is given by applying the midpoint rule (5.4) in every subinterval, i.e.,

\int_a^b f(x)dx ≈ h \sum_{j=1}^{n} f(x_j).   (5.15)
Equation (5.15) is known as the composite midpoint rule.
In order to obtain the quadrature error in the approximation (5.15) we recall that in each subinterval the error is given according to (5.6), i.e.,

E_j = \frac{h^3}{24} f''(ξ_j),   ξ_j ∈ (x_j − \frac{h}{2}, x_j + \frac{h}{2}).

Hence

E = \sum_{j=1}^{n} E_j = \frac{h^3}{24} \sum_{j=1}^{n} f''(ξ_j) = \frac{h^3 n}{24} [ \frac{1}{n} \sum_{j=1}^{n} f''(ξ_j) ] = \frac{h^2(b − a)}{24} f''(ξ),   (5.16)
where ξ ∈ (a, b). This means that the composite midpoint rule is also second-order
accurate (just like the composite trapezoidal rule).
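Both composite rules are easy to implement and the second-order behavior is easy to observe. The following sketch is our own illustration (the function names and the test integral of sin x over [0, π] are arbitrary choices); each error drops by about a factor of four when n doubles:

    import math

    def composite_trapezoid(f, a, b, n):
        # rule (5.13)
        h = (b - a) / n
        return h * (0.5*(f(a) + f(b)) + sum(f(a + i*h) for i in range(1, n)))

    def composite_midpoint(f, a, b, n):
        # rule (5.15)
        h = (b - a) / n
        return h * sum(f(a + (j - 0.5)*h) for j in range(1, n + 1))

    f, a, b = math.sin, 0.0, math.pi
    exact = 2.0                                   # integral of sin over [0, pi]
    for n in [4, 8, 16, 32]:
        et = abs(composite_trapezoid(f, a, b, n) - exact)
        em = abs(composite_midpoint(f, a, b, n) - exact)
        print(f"n = {n:3d}   trapezoid err = {et:.2e}   midpoint err = {em:.2e}")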
5.4 Additional Integration Techniques
5.4.1 The method of undetermined coefficients
The method of undetermined coefficients for deriving quadratures is the following:
1. Select the quadrature points.
2. Write a quadrature as a linear combination of the values of the function at the
chosen quadrature points.
3. Determine the coefficients of the linear combination by requiring that the quadrature is exact for as many polynomials as possible from the ordered set {1, x, x^2, . . .}.
We demonstrate this technique with the following example.
Example 5.4
Problem: Find a quadrature of the form

\int_0^1 f(x)dx ≈ A_0 f(0) + A_1 f(\frac{1}{2}) + A_2 f(1),

that is exact for all polynomials of degree ≤ 2.

Solution: Since the quadrature has to be exact for all polynomials of degree ≤ 2, it has to be exact for the polynomials 1, x, and x^2. Hence we obtain the system of linear equations

1 = \int_0^1 1\,dx = A_0 + A_1 + A_2,

\frac{1}{2} = \int_0^1 x\,dx = \frac{1}{2} A_1 + A_2,

\frac{1}{3} = \int_0^1 x^2\,dx = \frac{1}{4} A_1 + A_2.

Therefore, A_0 = A_2 = \frac{1}{6} and A_1 = \frac{2}{3}, and the desired quadrature is

\int_0^1 f(x)dx ≈ \frac{f(0) + 4f(\frac{1}{2}) + f(1)}{6}.   (5.17)

Since the resulting formula (5.17) is linear, its being exact for 1, x, and x^2 implies that it is exact for any polynomial of degree ≤ 2. In fact, we will show in Section 5.5.1 that this approximation is actually exact for polynomials of degree ≤ 3.
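The moment-matching system of this example can be solved mechanically. The sketch below is our own illustration (numpy and the names are our choices, not part of the notes); it recovers the weights of (5.17) and also checks the degree-3 exactness that Section 5.5.1 will explain:

    import numpy as np

    # require exactness for 1, x, x^2 at the nodes 0, 1/2, 1 (Example 5.4)
    nodes = np.array([0.0, 0.5, 1.0])
    V = np.vstack([nodes**k for k in range(3)])      # rows: sum_i A_i x_i^k
    moments = np.array([1.0, 1.0/2, 1.0/3])          # integrals of 1, x, x^2 over [0, 1]
    A = np.linalg.solve(V, moments)
    print(A)                                         # approximately [1/6, 2/3, 1/6]

    # the same weights happen to integrate x^3 exactly as well
    print(A @ nodes**3, 1.0/4)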
5.4.2 Change of an interval
Suppose that we have a quadrature formula on the interval [c, d] of the form

\int_c^d f(t)dt ≈ \sum_{i=0}^{n} A_i f(t_i).   (5.18)

We would like to use (5.18) to find a quadrature on the interval [a, b] that approximates

\int_a^b f(x)dx.

The mapping between the intervals [c, d] → [a, b] can be written as a linear transformation of the form

λ(t) = \frac{b − a}{d − c} t + \frac{ad − bc}{d − c}.

Hence

\int_a^b f(x)dx = \frac{b − a}{d − c} \int_c^d f(λ(t))dt ≈ \frac{b − a}{d − c} \sum_{i=0}^{n} A_i f(λ(t_i)).

This means that

\int_a^b f(x)dx ≈ \frac{b − a}{d − c} \sum_{i=0}^{n} A_i f( \frac{b − a}{d − c} t_i + \frac{ad − bc}{d − c} ).   (5.19)
We note that if the quadrature (5.18) was exact for polynomials of degree m, so is (5.19).
Example 5.5
We want to write the result of the previous example,

\int_0^1 f(x)dx ≈ \frac{f(0) + 4f(\frac{1}{2}) + f(1)}{6},

as a quadrature on the interval [a, b]. According to (5.19),

\int_a^b f(x)dx ≈ \frac{b − a}{6} [ f(a) + 4f( \frac{a + b}{2} ) + f(b) ].   (5.20)

The approximation (5.20) is known as the Simpson quadrature.
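The mapping (5.19) itself is a one-line computation. The sketch below is our own illustration (the helper map_rule and the target interval [2, 5] are arbitrary choices, not part of the notes); it maps the [0, 1] rule (5.17) to [2, 5] and, as expected from the exactness property, reproduces the integral of x^3 exactly:

    def map_rule(nodes, weights, c, d, a, b):
        # apply (5.19): map a rule from [c, d] to [a, b]
        s = (b - a) / (d - c)
        new_nodes = [s*t + (a*d - b*c)/(d - c) for t in nodes]
        new_weights = [s*A for A in weights]
        return new_nodes, new_weights

    # rule (5.17) on [0, 1] mapped to [2, 5] gives Simpson's rule (5.20) there
    nodes, weights = map_rule([0.0, 0.5, 1.0], [1/6, 4/6, 1/6], 0.0, 1.0, 2.0, 5.0)
    approx = sum(A * x**3 for x, A in zip(nodes, weights))   # integrate x^3 over [2, 5]
    print(approx, (5**4 - 2**4) / 4)                          # both equal 152.25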
5.4.3 General integration formulas
We recall that a weight function is a continuous, non-negative function with a positive
mass. We assume that such a weight function w(x) is given and would like to write a
quadrature of the form

\int_a^b f(x)w(x)dx ≈ \sum_{i=0}^{n} A_i f(x_i).   (5.21)

Such quadratures are called general (weighted) quadratures.
Previously, for the case w(x) ≡ 1, we wrote a quadrature of the form

\int_a^b f(x)dx ≈ \sum_{i=0}^{n} A_i f(x_i),

where

A_i = \int_a^b l_i(x)dx.

Repeating the derivation we carried out in Section 5.2, we construct an interpolant Q_n(x) of degree n that passes through the points x_0, . . . , x_n. Its Lagrange form is

Q_n(x) = \sum_{i=0}^{n} f(x_i) l_i(x),

with the usual

l_i(x) = \prod_{j=0, j≠i}^{n} \frac{x − x_j}{x_i − x_j},   0 ≤ i ≤ n.

Hence

\int_a^b f(x)w(x)dx ≈ \int_a^b Q_n(x)w(x)dx = \sum_{i=0}^{n} f(x_i) \int_a^b l_i(x)w(x)dx = \sum_{i=0}^{n} A_i f(x_i),

where the coefficients A_i are given by

A_i = \int_a^b l_i(x)w(x)dx.   (5.22)

To summarize, the general quadrature is

\int_a^b f(x)w(x)dx ≈ \sum_{i=0}^{n} A_i f(x_i),   (5.23)

with quadrature coefficients, A_i, that are given by (5.22).
5.5 Simpson’s Integration
In the last example we obtained Simpson’s quadrature (5.20). An alternative derivation
is the following: start with a polynomial Q_2(x) that interpolates f(x) at the points a, c = (a + b)/2, and b. Then approximate

\int_a^b f(x)dx ≈ \int_a^b [ \frac{(x − c)(x − b)}{(a − c)(a − b)} f(a) + \frac{(x − a)(x − b)}{(c − a)(c − b)} f(c) + \frac{(x − a)(x − c)}{(b − a)(b − c)} f(b) ] dx
 = . . . = \frac{b − a}{6} [ f(a) + 4f( \frac{a + b}{2} ) + f(b) ],

which is Simpson's rule (5.20). Figure 5.5 demonstrates this process of deriving Simpson's quadrature for the specific choice of approximating \int_1^3 \sin x\,dx.
Figure 5.5: An example of Simpson's quadrature. The approximation of \int_1^3 \sin x\,dx is obtained by integrating the quadratic interpolant Q_2(x) over [1, 3]
5.5.1 The quadrature error
Surprisingly, Simpson’s quadrature is exact for polynomials of degree 3 and not only
for polynomials of degree ≤ 2. We will see that by studying the error term. We let h denote half the length of the interval [a, b], i.e.,

h = \frac{b − a}{2}.
Then

\int_a^b f(x)dx = \int_a^{a+2h} f(x)dx ≈ \frac{h}{3}[f(a) + 4f(a + h) + f(a + 2h)]
 = \frac{h}{3}[ f(a) + 4f(a) + 4hf'(a) + \frac{4}{2}h^2 f''(a) + \frac{4}{6}h^3 f'''(a) + \frac{4}{24}h^4 f^{(4)}(a) + . . .
   + f(a) + 2hf'(a) + \frac{(2h)^2}{2}f''(a) + \frac{(2h)^3}{6}f'''(a) + \frac{(2h)^4}{24}f^{(4)}(a) + . . . ]
 = 2hf(a) + 2h^2 f'(a) + \frac{4}{3}h^3 f''(a) + \frac{2}{3}h^4 f'''(a) + \frac{100}{3 \cdot 5!}h^5 f^{(4)}(a) + . . .

We now define F(x) to be the primitive function of f(x), i.e.,

F(x) = \int_a^x f(t)dt.

Hence

F(a + 2h) = \int_a^{a+2h} f(x)dx = F(a) + 2hF'(a) + \frac{(2h)^2}{2}F''(a) + \frac{(2h)^3}{6}F'''(a) + \frac{(2h)^4}{4!}F^{(4)}(a) + \frac{(2h)^5}{5!}F^{(5)}(a) + . . .
 = 2hf(a) + 2h^2 f'(a) + \frac{4}{3}h^3 f''(a) + \frac{2}{3}h^4 f'''(a) + \frac{32}{5!}h^5 f^{(4)}(a) + . . . ,

which implies that

F(a + 2h) − \frac{h}{3}[f(a) + 4f(a + h) + f(a + 2h)] = −\frac{1}{90} h^5 f^{(4)}(a) + . . .

This means that the quadrature error for Simpson's rule is

E = −\frac{1}{90} ( \frac{b − a}{2} )^5 f^{(4)}(ξ),   ξ ∈ [a, b].   (5.24)
Since the fourth derivative of any polynomial of degree 3 is identically zero, the
quadrature error formula (5.24) implies that Simpson's quadrature is exact for polynomials of degree 3.
5.5.2 Composite Simpson rule
To derive a composite version of Simpson’s quadrature, we divide the interval [a, b] into
an even number of subintervals, n, and let
x_i = a + ih,   0 ≤ i ≤ n,

where

h = \frac{b − a}{n}.
Hence, if we replace the integral in every subinterval by Simpson's rule (5.20), we obtain

\int_a^b f(x)dx = \int_{x_0}^{x_2} f(x)dx + . . . + \int_{x_{n−2}}^{x_n} f(x)dx = \sum_{i=1}^{n/2} \int_{x_{2i−2}}^{x_{2i}} f(x)dx
 ≈ \frac{h}{3} \sum_{i=1}^{n/2} [f(x_{2i−2}) + 4f(x_{2i−1}) + f(x_{2i})].

The composite Simpson quadrature is thus given by

\int_a^b f(x)dx ≈ \frac{h}{3} [ f(x_0) + 2\sum_{i=1}^{n/2−1} f(x_{2i}) + 4\sum_{i=1}^{n/2} f(x_{2i−1}) + f(x_n) ].   (5.25)
Summing the error terms (that are given by (5.24)) over all subintervals, the quadrature error takes the form

E = −\frac{h^5}{90} \sum_{i=1}^{n/2} f^{(4)}(ξ_i) = −\frac{h^5}{90} \cdot \frac{n}{2} [ \frac{2}{n} \sum_{i=1}^{n/2} f^{(4)}(ξ_i) ].

Since

\min_{x ∈ [a,b]} f^{(4)}(x) ≤ \frac{2}{n} \sum_{i=1}^{n/2} f^{(4)}(ξ_i) ≤ \max_{x ∈ [a,b]} f^{(4)}(x),

we can conclude that

E = −\frac{h^5}{90} \cdot \frac{n}{2} f^{(4)}(ξ) = −\frac{h^4}{180}(b − a) f^{(4)}(ξ),   ξ ∈ [a, b],   (5.26)
i.e., the composite Simpson quadrature is fourth-order accurate.
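The fourth-order behavior of (5.25) is easy to observe numerically. The sketch below is our own illustration (the function name and the test integral of e^x over [0, 1] are arbitrary choices, not part of the notes); the error ratio between successive rows approaches 16, consistent with (5.26):

    import math

    def composite_simpson(f, a, b, n):
        # rule (5.25); n must be even
        h = (b - a) / n
        s = f(a) + f(b)
        s += 4 * sum(f(a + i*h) for i in range(1, n, 2))   # odd-indexed points
        s += 2 * sum(f(a + i*h) for i in range(2, n, 2))   # interior even-indexed points
        return h * s / 3

    f, a, b = math.exp, 0.0, 1.0
    exact = math.e - 1.0
    prev = None
    for n in [2, 4, 8, 16]:
        err = abs(composite_simpson(f, a, b, n) - exact)
        ratio = prev / err if prev else float('nan')
        print(f"n = {n:3d}   error = {err:.3e}   ratio = {ratio:.1f}")
        prev = err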
5.6 Gaussian Quadrature
5.6.1 Maximizing the quadrature’s accuracy
So far, all the quadratures we encountered were of the form

\int_a^b f(x)dx ≈ \sum_{i=0}^{n} A_i f(x_i).   (5.27)

An approximation of the form (5.27) was shown to be exact for polynomials of degree ≤ n for an appropriate choice of the quadrature coefficients A_i. In all cases, the quadrature points x_0, . . . , x_n were given up front. In other words, given a set of nodes x_0, . . . , x_n, the coefficients {A_i}_{i=0}^{n} were determined such that the approximation was exact in Π_n.
We are now interested in investigating the possibility of writing more accurate
quadratures without increasing the total number of quadrature points. This will be
possible if we allow for the freedom of choosing the quadrature points. The quadrature problem now becomes a problem of choosing the quadrature points, in addition to determining the corresponding coefficients, in a way that the quadrature is exact for polynomials of a maximal degree. Quadratures that are obtained that way are called Gaussian quadratures.
Example 5.6
The quadrature formula

\int_{−1}^{1} f(x)dx ≈ f(−\frac{1}{\sqrt{3}}) + f(\frac{1}{\sqrt{3}}),

is exact for polynomials of degree ≤ 3(!) We will revisit this problem and prove this result in Example 5.9 below.
An equivalent problem can be stated for the more general weighted quadrature case. Here,

\int_a^b f(x)w(x)dx ≈ \sum_{i=0}^{n} A_i f(x_i),   (5.28)

where w(x) ≥ 0 is a weight function. Equation (5.28) is exact for f ∈ Π_n if and only if

A_i = \int_a^b w(x) \prod_{j=0, j≠i}^{n} \frac{x − x_j}{x_i − x_j} dx.
In both cases (5.27) and (5.28), the number of quadrature nodes, x_0, . . . , x_n, is n + 1, and so is the number of quadrature coefficients, A_i. Hence, if we have the flexibility of determining the location of the points in addition to determining the coefficients, we have altogether 2n + 2 degrees of freedom, and hence we can expect to be able to derive quadratures that are exact for polynomials in Π_{2n+1}. This is indeed the case as we shall see below. We will show that the general solution of this integration problem is connected with the roots of orthogonal polynomials. We start with the following theorem.
Theorem 5.7 Let q(x) be a nonzero polynomial of degree n + 1 that is w-orthogonal to Π_n, i.e., ∀p(x) ∈ Π_n,

\int_a^b p(x)q(x)w(x)dx = 0.

If x_0, . . . , x_n are the zeros of q(x) then (5.28) is exact ∀f ∈ Π_{2n+1}.
Proof. For f(x) ∈ Π_{2n+1}, write f(x) = q(x)p(x) + r(x). We note that p(x), r(x) ∈ Π_n. Since x_0, . . . , x_n are the zeros of q(x) then

f(x_i) = r(x_i).

Hence,

\int_a^b f(x)w(x)dx = \int_a^b [q(x)p(x) + r(x)]w(x)dx = \int_a^b r(x)w(x)dx = \sum_{i=0}^{n} A_i r(x_i) = \sum_{i=0}^{n} A_i f(x_i).   (5.29)

The second equality in (5.29) holds since q(x) is w-orthogonal to Π_n. The third equality in (5.29) holds since (5.28) is exact for polynomials in Π_n.  □
According to Theorem 5.7 we already know that the quadrature points that will
provide the most accurate quadrature rule are the n+1 roots of an orthogonal polynomial
of degree n + 1 (where the orthogonality is with respect to the weight function w(x)).
We recall that the roots of q(x) are real, simple and lie in (a, b), something we know
from our previous discussion on orthogonal polynomials (see Theorem 3.17). In other
words, we need n + 1 quadrature points in the interval, and an orthogonal polynomial
of degree n +1 does have n +1 distinct roots in the interval. We now restate the result
regarding the roots of orthogonal functions with an alternative proof.
Theorem 5.8 Let w(x) be a weight function. Assume that f(x) is a continuous function on [a, b] that is not the zero function, and that f(x) is w-orthogonal to Π_n. Then f(x) changes sign at least n + 1 times on (a, b).
Proof. Since 1 ∈ Π_n,

\int_a^b f(x)w(x)dx = 0.

Hence, f(x) changes sign at least once. Now suppose that f(x) changes sign only r times, where r ≤ n. Choose {t_i}_{i≥0} such that

a = t_0 < t_1 < · · · < t_{r+1} = b,

and f(x) is of one sign on (t_0, t_1), (t_1, t_2), . . . , (t_r, t_{r+1}). The polynomial

p(x) = \prod_{i=1}^{r} (x − t_i),

has the same sign property. Hence

\int_a^b f(x)p(x)w(x)dx ≠ 0,

which leads to a contradiction since p(x) ∈ Π_n.  □
Example 5.9
We are looking for a quadrature of the form

\int_{−1}^{1} f(x)dx ≈ A_0 f(x_0) + A_1 f(x_1).

A straightforward computation will amount to making this quadrature exact for all polynomials of degree ≤ 3. The linearity of the quadrature means that it is sufficient to make the quadrature exact for 1, x, x^2, and x^3. Hence we write the system of equations

\int_{−1}^{1} x^i dx = A_0 x_0^i + A_1 x_1^i,   i = 0, 1, 2, 3.

From this we can write

A_0 + A_1 = 2,
A_0 x_0 + A_1 x_1 = 0,
A_0 x_0^2 + A_1 x_1^2 = \frac{2}{3},
A_0 x_0^3 + A_1 x_1^3 = 0.

Solving for A_0, A_1, x_0, and x_1 we get

A_0 = A_1 = 1,   x_0 = −x_1 = \frac{1}{\sqrt{3}},

so that the desired quadrature is

\int_{−1}^{1} f(x)dx ≈ f(−\frac{1}{\sqrt{3}}) + f(\frac{1}{\sqrt{3}}).   (5.30)
Example 5.10
We repeat the previous problem using orthogonal polynomials. Since n = 1, we expect to find a quadrature that is exact for polynomials of degree 2n + 1 = 3. The polynomial of degree n + 1 = 2 which is orthogonal to Π_n = Π_1 with weight w(x) ≡ 1 is the Legendre polynomial of degree 2, i.e.,

P_2(x) = \frac{1}{2}(3x^2 − 1).

The integration points will then be the zeros of P_2(x), i.e.,

x_0 = −\frac{1}{\sqrt{3}},   x_1 = \frac{1}{\sqrt{3}}.

All that remains is to determine the coefficients A_0, A_1. This is done in the usual way, assuming that the quadrature

\int_{−1}^{1} f(x)dx ≈ A_0 f(x_0) + A_1 f(x_1),

is exact for polynomials of degree ≤ 1. The simplest will be to use 1 and x, i.e.,

2 = \int_{−1}^{1} 1\,dx = A_0 + A_1,

and

0 = \int_{−1}^{1} x\,dx = −A_0 \frac{1}{\sqrt{3}} + A_1 \frac{1}{\sqrt{3}}.

Hence A_0 = A_1 = 1, and the quadrature is the same as (5.30) (as it should be).
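The two-point rule (5.30) is simple enough to test directly. The sketch below is our own illustration (the function name gauss2 and the test polynomials are arbitrary choices, not part of the notes); it confirms exactness up to degree 3 and shows the failure on a quartic:

    import math

    def gauss2(f):
        # two-point Gauss-Legendre rule (5.30) on [-1, 1], weights A_0 = A_1 = 1
        x = 1.0 / math.sqrt(3.0)
        return f(-x) + f(x)

    # exact for cubics: integral of x^3 + x^2 over [-1, 1] is 2/3
    print(gauss2(lambda t: t**3 + t**2), 2.0/3)
    # not exact for quartics: integral of x^4 over [-1, 1] is 2/5
    print(gauss2(lambda t: t**4), 2.0/5)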
5.6.2 Convergence and error analysis
Lemma 5.11 In a Gaussian quadrature formula, the coefficients are positive and their sum is \int_a^b w(x)dx.
Proof. Fix n. Let q(x) ∈ Π_{n+1} be w-orthogonal to Π_n. Also assume that q(x_i) = 0 for i = 0, . . . , n, and take {x_i}_{i=0}^{n} to be the quadrature points, i.e.,

\int_a^b f(x)w(x)dx ≈ \sum_{i=0}^{n} A_i f(x_i).   (5.31)

Fix 0 ≤ j ≤ n. Let p(x) ∈ Π_n be defined as

p(x) = \frac{q(x)}{x − x_j}.

Since x_j is a root of q(x), p(x) is indeed a polynomial of degree n. The degree of p^2(x) is ≤ 2n, which means that the Gaussian quadrature (5.31) is exact for it. Hence, since p(x_i) = q(x_i)/(x_i − x_j) = 0 for every i ≠ j,

0 < \int_a^b p^2(x)w(x)dx = \sum_{i=0}^{n} A_i p^2(x_i) = A_j p^2(x_j),

which means that ∀j, A_j > 0. In addition, since the Gaussian quadrature is exact for f(x) ≡ 1, we have

\int_a^b w(x)dx = \sum_{i=0}^{n} A_i.  □
In order to estimate the error in the Gaussian quadrature we would first like to present an alternative way of deriving the Gaussian quadrature. Our starting point is the Lagrange form of the Hermite polynomial that interpolates f(x) and f'(x) at x_0, . . . , x_n. It is given by (2.42), i.e.,

p(x) = \sum_{i=0}^{n} f(x_i) a_i(x) + \sum_{i=0}^{n} f'(x_i) b_i(x),

with

a_i(x) = (l_i(x))^2 [1 + 2 l_i'(x_i)(x_i − x)],   b_i(x) = (x − x_i) l_i^2(x),   0 ≤ i ≤ n,

and

l_i(x) = \prod_{j=0, j≠i}^{n} \frac{x − x_j}{x_i − x_j}.

We now assume that w(x) is a weight function in [a, b] and approximate

\int_a^b w(x)f(x)dx ≈ \int_a^b w(x)p_{2n+1}(x)dx = \sum_{i=0}^{n} A_i f(x_i) + \sum_{i=0}^{n} B_i f'(x_i),   (5.32)

where

A_i = \int_a^b w(x)a_i(x)dx,   (5.33)

and

B_i = \int_a^b w(x)b_i(x)dx.   (5.34)
In some sense, it seems to be rather strange to deal with the Hermite interpolant when we do not explicitly know the values of f'(x) at the interpolation points. However, we can eliminate the derivatives from the quadrature (5.32) by setting B_i = 0 in (5.34). Indeed (assuming n ≠ 0):

B_i = \int_a^b w(x)(x − x_i)l_i^2(x)dx = \frac{1}{\prod_{j=0, j≠i}^{n}(x_i − x_j)} \int_a^b w(x) \prod_{j=0}^{n}(x − x_j) l_i(x)dx.

Hence, B_i = 0 if the product \prod_{j=0}^{n}(x − x_j) is w-orthogonal to l_i(x). Since l_i(x) is a polynomial in Π_n, all that we need is to set the points x_0, . . . , x_n as the roots of a polynomial of degree n + 1 that is w-orthogonal to Π_n. This is precisely what we defined as a Gaussian quadrature.
We are now ready to formally establish the fact that the Gaussian quadrature is
exact for polynomials of degree 2n + 1.
Theorem 5.12 Let f ∈ C^{2n+2}[a, b] and let w(x) be a weight function. Consider the Gaussian quadrature

\int_a^b f(x)w(x)dx ≈ \sum_{i=0}^{n} A_i f(x_i).

Then there exists ζ ∈ (a, b) such that

\int_a^b f(x)w(x)dx − \sum_{i=0}^{n} A_i f(x_i) = \frac{f^{(2n+2)}(ζ)}{(2n + 2)!} \int_a^b \prod_{j=0}^{n}(x − x_j)^2 w(x)dx.
Proof. We use the characterization of the Gaussian quadrature as the integral of a Hermite interpolant. We recall that the error formula for the Hermite interpolation is given by (2.49),

f(x) − p_{2n+1}(x) = \frac{f^{(2n+2)}(ξ)}{(2n + 2)!} \prod_{j=0}^{n}(x − x_j)^2,   ξ ∈ (a, b).

Hence according to (5.32) we have

\int_a^b f(x)w(x)dx − \sum_{i=0}^{n} A_i f(x_i) = \int_a^b f(x)w(x)dx − \int_a^b p_{2n+1}(x)w(x)dx
 = \int_a^b w(x) \frac{f^{(2n+2)}(ξ)}{(2n + 2)!} \prod_{j=0}^{n}(x − x_j)^2 dx.

The integral mean value theorem then implies that there exists ζ ∈ (a, b) such that

\int_a^b f(x)w(x)dx − \sum_{i=0}^{n} A_i f(x_i) = \frac{f^{(2n+2)}(ζ)}{(2n + 2)!} \int_a^b \prod_{j=0}^{n}(x − x_j)^2 w(x)dx.  □
We conclude this section with a convergence theorem which states that for continuous functions, the Gaussian quadrature converges to the exact value of the integral as the number of quadrature points tends to infinity. This theorem is not of great practical value because it does not provide an estimate on the rate of convergence. A proof of the theorem, which is based on the Weierstrass approximation theorem, can be found, e.g., in [7].
Theorem 5.13 Let w(x) be a weight function and assume that f(x) is a continuous function on [a, b]. For each n ∈ N we let {x_i^n}_{i=0}^{n} be the n + 1 roots of the polynomial of degree n + 1 that is w-orthogonal to Π_n, and consider the corresponding Gaussian quadrature:

\int_a^b f(x)w(x)dx ≈ \sum_{i=0}^{n} A_i^n f(x_i^n).   (5.35)

Then the right-hand-side of (5.35) converges to the left-hand-side as n → ∞.
5.7 Romberg Integration
We have introduced Richardson’s extrapolation in Section 4.4 in the context of numerical
differentiation. We can use a similar principle with numerical integration.
We will demonstrate this principle with a particular example. Let I denote the exact
integral that we would like to approximate, i.e.,
I = \int_a^b f(x)dx.

Let's assume that this integral is approximated with a composite trapezoidal rule on a uniform grid with mesh spacing h (5.13),

T(h) = h {\sum_{i=0}^{n}}'' f(a + ih).
We know that the composite trapezoidal rule is second-order accurate (see (5.14)). A more detailed study of the quadrature error reveals that the difference between I and T(h) can be written as

I = T(h) + c_1 h^2 + c_2 h^4 + . . . + c_k h^{2k} + O(h^{2k+2}).
The exact values of the coefficients, c_k, are of no interest to us as long as they do not depend on h (which is indeed the case). We can now write a similar quadrature that is based on half the number of points, i.e., T(2h). Hence

I = T(2h) + c_1(2h)^2 + c_2(2h)^4 + . . .

This enables us to eliminate the h^2 error term:

I = \frac{4T(h) − T(2h)}{3} + ĉ_2 h^4 + . . .

Therefore

\frac{4T(h) − T(2h)}{3} = \frac{1}{3}[ 4h( \frac{1}{2}f_0 + f_1 + . . . + f_{n−1} + \frac{1}{2}f_n ) − 2h( \frac{1}{2}f_0 + f_2 + . . . + f_{n−2} + \frac{1}{2}f_n ) ]
 = \frac{h}{3}(f_0 + 4f_1 + 2f_2 + . . . + 2f_{n−2} + 4f_{n−1} + f_n) = S(n).
Here, S(n) denotes the composite Simpson’s rule with n subintervals. The procedure
of increasing the accuracy of the quadrature by eliminating the leading error term is
known as Romberg integration. In some places, Romberg integration is used to
describe the specific case of turning the composite trapezoidal rule into Simpson’s rule
(and so on). The quadrature that is obtained from Simpson’s rule by eliminating the
leading error term is known as the super Simpson rule.
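The repeated-elimination process is naturally organized in a triangular table. The sketch below is our own illustration (the names romberg and trapezoid and the test integral of e^x are arbitrary choices, not part of the notes); each column of the table removes one more even power of h from the error:

    import math

    def trapezoid(f, a, b, n):
        h = (b - a) / n
        return h * (0.5*f(a) + sum(f(a + i*h) for i in range(1, n)) + 0.5*f(b))

    def romberg(f, a, b, levels):
        # R[k][0] is the composite trapezoidal rule with 2^k subintervals;
        # each extrapolation step removes the leading h^(2j) error term
        R = [[trapezoid(f, a, b, 2**k)] for k in range(levels)]
        for k in range(1, levels):
            for j in range(1, k + 1):
                R[k].append((4**j * R[k][j-1] - R[k-1][j-1]) / (4**j - 1))
        return R

    f, a, b = math.exp, 0.0, 1.0
    R = romberg(f, a, b, 5)
    print(R[-1][-1] - (math.e - 1.0))    # error of the most extrapolated value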
6 Methods for Solving Nonlinear Problems
6.1 The Bisection Method
In this section we present the “bisection method” which is probably the most intuitive
approach to root finding. We are looking for a root of a function f(x) which we assume
is continuous on the interval [a, b]. We also assume that it has opposite signs at both
edges of the interval, i.e., f(a)f(b) < 0. We then know that f(x) has at least one zero
in [a, b]. Of course f(x) may have more than one zero in the interval. The bisection
method is only going to converge to one of the zeros of f(x). There will also be no
indication of how many zeros f(x) has in the interval, and no hints regarding where we can actually hope to find more roots, if indeed there are additional roots.
The first step is to divide the interval into two equal subintervals,

c = \frac{a + b}{2}.
This generates two subintervals, [a, c] and [c, b], of equal lengths. We want to keep the subinterval that is guaranteed to contain a root. Of course, in the rare event where f(c) = 0 we are done. Otherwise, we check if f(a)f(c) < 0. If yes, we keep the left subinterval [a, c]. If f(a)f(c) > 0, we keep the right subinterval [c, b]. This procedure repeats until the stopping criterion is satisfied: we fix a small parameter ε > 0 and stop when |f(c)| < ε. To simplify the notation, we denote the successive intervals by [a_0, b_0], [a_1, b_1], . . . The first two iterations in the bisection method are shown in Figure 6.1. Note that in the case that is shown in the figure, the function f(x) has multiple roots but the method converges to only one of them.
Figure 6.1: The first two iterations in a bisection root-finding method
We would now like to understand if the bisection method always converges to a root. We would also like to figure out how close we are to a root after iterating the algorithm several times. We first note that

a_0 ≤ a_1 ≤ a_2 ≤ . . . ≤ b_0,

and

b_0 ≥ b_1 ≥ b_2 ≥ . . . ≥ a_0.

We also know that every iteration shrinks the length of the interval by a half, i.e.,

b_{n+1} − a_{n+1} = \frac{1}{2}(b_n − a_n),   n ≥ 0,

which means that

b_n − a_n = 2^{−n}(b_0 − a_0).
The sequences {a_n}_{n≥0} and {b_n}_{n≥0} are monotone and bounded, and hence converge. Also

\lim_{n→∞} b_n − \lim_{n→∞} a_n = \lim_{n→∞} 2^{−n}(b_0 − a_0) = 0,

so that both sequences converge to the same value. We denote that value by r, i.e.,

r = \lim_{n→∞} a_n = \lim_{n→∞} b_n.

Since f(a_n)f(b_n) ≤ 0, we know that (f(r))^2 ≤ 0, which means that f(r) = 0, i.e., r is a root of f(x).
We now assume that we stop in the interval [a_n, b_n]. This means that r ∈ [a_n, b_n]. Given such an interval, if we have to guess where the root is (which we know is in the interval), it is easy to see that the best estimate for the location of the root is the center of the interval, i.e.,

c_n = \frac{a_n + b_n}{2}.

In this case, we have

|r − c_n| ≤ \frac{1}{2}(b_n − a_n) = 2^{−(n+1)}(b_0 − a_0).
We summarize this result with the following theorem.

Theorem 6.1 If [a_n, b_n] is the interval that is obtained in the n-th iteration of the bisection method, then the limits \lim_{n→∞} a_n and \lim_{n→∞} b_n exist, and

\lim_{n→∞} a_n = \lim_{n→∞} b_n = r,

where f(r) = 0. In addition, if

c_n = \frac{a_n + b_n}{2},

then

|r − c_n| ≤ 2^{−(n+1)}(b_0 − a_0).
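The procedure described above translates almost line by line into code. The sketch below is our own illustration (the function name, the tolerance, and the test equation x^3 = 2 are arbitrary choices, not part of the notes):

    def bisection(f, a, b, eps=1e-10, max_iter=200):
        # assumes f(a) * f(b) < 0 so that [a, b] brackets a root
        if f(a) * f(b) >= 0:
            raise ValueError("f must change sign on [a, b]")
        for _ in range(max_iter):
            c = (a + b) / 2
            if abs(f(c)) < eps:
                return c
            if f(a) * f(c) < 0:
                b = c                    # keep the left subinterval [a, c]
            else:
                a = c                    # keep the right subinterval [c, b]
        return (a + b) / 2

    # root of x^3 - 2 on [1, 2] is the cube root of 2
    print(bisection(lambda x: x**3 - 2.0, 1.0, 2.0))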
6.2 Newton’s Method
Newton’s method is a relatively simple, practical, and widely-used root finding method.
It is easy to see that while in some cases the method rapidly converges to a root of the
function, in some other cases it may fail to converge at all. This is one reason why it is so important not only to understand the construction of the method, but also to understand its limitations.
As always, we assume that f(x) has at least one (real) root, and denote it by r. We start with an initial guess for the location of the root, say x_0. We then let l(x) be the tangent line to f(x) at x_0, i.e.,

l(x) − f(x_0) = f'(x_0)(x − x_0).

The intersection of l(x) with the x-axis serves as the next estimate of the root. We denote this point by x_1 and write

0 − f(x_0) = f'(x_0)(x_1 − x_0),

which means that

x_1 = x_0 − \frac{f(x_0)}{f'(x_0)}.   (6.1)
In general, the Newton method (also known as the Newton-Raphson method) for finding a root is given by iterating (6.1) repeatedly, i.e.,

x_{n+1} = x_n − \frac{f(x_n)}{f'(x_n)}.   (6.2)

Two sample iterations of the method are shown in Figure 6.2. Starting from a point x_n, we find the next approximation of the root x_{n+1}, from which we find x_{n+2} and so on. In this case, we do converge to the root of f(x).
It is easy to see that Newton’s method does not always converge. We demonstrate
such a case in Figure 6.3. Here we consider the function f(x) = tan^{−1}(x) and show what happens if we start with a point which is a fixed point of Newton's method, iterated twice. In this case, x_0 ≈ 1.3917 is such a point.
In order to analyze the error in Newton's method we let the error in the n-th iteration be

e_n = x_n − r.

We assume that f''(x) is continuous and that f'(r) ≠ 0, i.e., that r is a simple root of f(x). We will show that the method has a quadratic convergence rate, i.e.,

e_{n+1} ≈ c e_n^2.   (6.3)

A convergence rate estimate of the type (6.3) makes sense, of course, only if the method converges. Indeed, we will prove the convergence of the method for certain functions
Figure 6.2: Two iterations in Newton's root-finding method. r is the root of f(x) we approach by starting from x_n, computing x_{n+1}, then x_{n+2}, etc.
Figure 6.3: Newton's method does not always converge. In this case, the starting point is a fixed point of Newton's method iterated twice
f(x), but before we get to the convergence issue, let's derive the estimate (6.3). We rewrite e_{n+1} as

e_{n+1} = x_{n+1} − r = x_n − \frac{f(x_n)}{f'(x_n)} − r = e_n − \frac{f(x_n)}{f'(x_n)} = \frac{e_n f'(x_n) − f(x_n)}{f'(x_n)}.
Writing a Taylor expansion of f(r) about x = x_n we have

0 = f(r) = f(x_n − e_n) = f(x_n) − e_n f'(x_n) + \frac{1}{2} e_n^2 f''(ξ_n),

which means that

e_n f'(x_n) − f(x_n) = \frac{1}{2} f''(ξ_n) e_n^2.
Hence, the relation (6.3), e_{n+1} ≈ c e_n^2, holds with

c = \frac{1}{2} \frac{f''(ξ_n)}{f'(x_n)}.   (6.4)

Since we assume that the method converges, in the limit as n → ∞ we can replace (6.4) by

c = \frac{1}{2} \frac{f''(r)}{f'(r)}.   (6.5)
We now return to the issue of convergence and prove that for certain functions
Newton’s method converges regardless of the starting point.
Theorem 6.2 Assume that f(x) has two continuous derivatives, is monotonically increasing, convex, and has a zero. Then the zero is unique and Newton's method will
converge to it from every starting point.
Proof. The assumptions on the function f(x) imply that ∀x, f'(x) > 0 and f''(x) > 0. By (6.4), the error at the (n + 1)-th iteration, e_{n+1}, is given by

e_{n+1} = \frac{1}{2} \frac{f''(ξ_n)}{f'(x_n)} e_n^2,

and hence it is positive, i.e., e_{n+1} > 0. This implies that ∀n ≥ 1, x_n > r. Since f'(x) > 0, we have

f(x_n) > f(r) = 0.

Now, subtracting r from both sides of (6.2) we may write

e_{n+1} = e_n − \frac{f(x_n)}{f'(x_n)},   (6.6)
which means that e_{n+1} < e_n (and hence x_{n+1} < x_n). Hence, both {e_n}_{n≥0} and {x_n}_{n≥0} are decreasing and bounded from below. This means that both sequences converge, i.e., there exists e^* such that

e^* = \lim_{n→∞} e_n,

and there exists x^* such that

x^* = \lim_{n→∞} x_n.

By (6.6) we have

e^* = e^* − \frac{f(x^*)}{f'(x^*)},

so that f(x^*) = 0, and hence x^* = r.  □
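The iteration (6.2) is a few lines of code once f and f' are available. The sketch below is our own illustration (the function name, the stopping rule based on |f(x_n)|, and the test function x^2 − 2 are arbitrary choices, not part of the notes); the test function is increasing and convex for x > 0, so by Theorem 6.2 the iteration converges from any positive starting point:

    def newton(f, fprime, x0, eps=1e-12, max_iter=50):
        # iterate (6.2); stop when |f(x_n)| is small or after max_iter steps
        x = x0
        for _ in range(max_iter):
            x = x - f(x) / fprime(x)
            if abs(f(x)) < eps:
                break
        return x

    # root of x^2 - 2 starting from x_0 = 5: converges to sqrt(2)
    print(newton(lambda x: x*x - 2.0, lambda x: 2*x, 5.0))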
6.3 The Secant Method
We recall that Newton's root finding method is given by equation (6.2), i.e.,

x_{n+1} = x_n − \frac{f(x_n)}{f'(x_n)}.
We now assume that we do not know that the function f(x) is differentiable at x_n, and thus can not use Newton's method as is. Instead, we can replace the derivative f'(x_n) that appears in Newton's method by a difference approximation. A particular choice of such an approximation,

f'(x_n) ≈ \frac{f(x_n) − f(x_{n−1})}{x_n − x_{n−1}},

leads to the secant method, which is given by

x_{n+1} = x_n − f(x_n) [ \frac{x_n − x_{n−1}}{f(x_n) − f(x_{n−1})} ],   n ≥ 1.   (6.7)
A geometric interpretation of the secant method is shown in Figure 6.4. Given two points, (x_{n−1}, f(x_{n−1})) and (x_n, f(x_n)), the line l(x) that connects them satisfies

l(x) − f(x_n) = \frac{f(x_{n−1}) − f(x_n)}{x_{n−1} − x_n}(x − x_n).

The next approximation of the root, x_{n+1}, is defined as the intersection of l(x) and the x-axis, i.e.,

0 − f(x_n) = \frac{f(x_{n−1}) − f(x_n)}{x_{n−1} − x_n}(x_{n+1} − x_n).   (6.8)
Figure 6.4: The secant root-finding method. The points x_{n−1} and x_n are used to obtain x_{n+1}, which is the next approximation of the root r
Rearranging the terms in (6.8) we end up with the secant method (6.7).
We note that the secant method (6.7) requires two initial points. While this is an
extra requirement compared with, e.g., Newton’s method, we note that in the secant
method there is no need to evaluate any derivatives. In addition, if implemented properly, every stage requires only one new function evaluation.
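The one-new-evaluation-per-step property is visible in the sketch below, which is our own illustration of (6.7) (the function name and the test equation x^2 = 2 are arbitrary choices, not part of the notes); only f(x2) is computed at each iteration, while the previous value is reused:

    def secant(f, x0, x1, eps=1e-12, max_iter=60):
        # iterate (6.7); only one new function evaluation per step
        f0, f1 = f(x0), f(x1)
        for _ in range(max_iter):
            x2 = x1 - f1 * (x1 - x0) / (f1 - f0)
            x0, f0, x1, f1 = x1, f1, x2, f(x2)
            if abs(f1) < eps:
                break
        return x1

    # root of x^2 - 2 starting from x_0 = 1, x_1 = 2
    print(secant(lambda x: x*x - 2.0, 1.0, 2.0))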
We now proceed with an error analysis for the secant method. As usual, we denote the error at the n-th iteration by e_n = x_n − r. We claim that the rate of convergence of the secant method is superlinear (meaning, better than linear but less than quadratic). More precisely, we will show that it is given by

|e_{n+1}| ≈ |e_n|^α,   (6.9)

with

α = \frac{1 + \sqrt{5}}{2}.   (6.10)
We start by rewriting e_{n+1} as

e_{n+1} = x_{n+1} − r = \frac{f(x_n)x_{n−1} − f(x_{n−1})x_n}{f(x_n) − f(x_{n−1})} − r = \frac{f(x_n)e_{n−1} − f(x_{n−1})e_n}{f(x_n) − f(x_{n−1})}.

Hence

e_{n+1} = e_n e_{n−1} [ \frac{x_n − x_{n−1}}{f(x_n) − f(x_{n−1})} ] [ \frac{\frac{f(x_n)}{e_n} − \frac{f(x_{n−1})}{e_{n−1}}}{x_n − x_{n−1}} ].   (6.11)
A Taylor expansion of f(x_n) about x = r reads

f(x_n) = f(r + e_n) = f(r) + e_n f'(r) + \frac{1}{2} e_n^2 f''(r) + O(e_n^3),

and hence

\frac{f(x_n)}{e_n} = f'(r) + \frac{1}{2} e_n f''(r) + O(e_n^2).
We thus have

\frac{f(x_n)}{e_n} − \frac{f(x_{n−1})}{e_{n−1}} = \frac{1}{2}(e_n − e_{n−1}) f''(r) + O(e_{n−1}^2) + O(e_n^2)
 = \frac{1}{2}(x_n − x_{n−1}) f''(r) + O(e_{n−1}^2) + O(e_n^2).

Therefore,

\frac{\frac{f(x_n)}{e_n} − \frac{f(x_{n−1})}{e_{n−1}}}{x_n − x_{n−1}} ≈ \frac{1}{2} f''(r),

and

\frac{x_n − x_{n−1}}{f(x_n) − f(x_{n−1})} ≈ \frac{1}{f'(r)}.
The error expression (6.11) can now be simplified to

e_{n+1} ≈ \frac{1}{2} \frac{f''(r)}{f'(r)} e_n e_{n−1} = c e_n e_{n−1}.   (6.12)
Equation (6.12) expresses the error at iteration n + 1 in terms of the errors at iterations n and n − 1. In order to turn this into a relation between the error at the (n + 1)-th iteration and the error at the n-th iteration, we now assume that the order of convergence is α, i.e.,

|e_{n+1}| ∼ A|e_n|^α.   (6.13)

Since (6.13) also means that |e_n| ∼ A|e_{n−1}|^α, we have

A|e_n|^α ∼ C|e_n| A^{−1/α}|e_n|^{1/α}.

This implies that

A^{1+1/α} C^{−1} ∼ |e_n|^{1−α+1/α}.   (6.14)

The left-hand-side of (6.14) is non-zero while the right-hand-side of (6.14) tends to zero as n → ∞ (assuming, of course, that the method converges). This is possible only if

1 − α + \frac{1}{α} = 0,

which, in turn, means that

α = \frac{1 + \sqrt{5}}{2}.
The constant A in (6.13) is thus given by

A = C^{\frac{1}{1 + 1/α}} = C^{\frac{1}{α}} = C^{α−1} = [ \frac{f''(r)}{2f'(r)} ]^{α−1}.
We summarize this result with the following theorem:

Theorem 6.3 Assume that f''(x) is continuous ∀x in an interval I. Assume that f(r) = 0 and that f'(r) ≠ 0. If x_0, x_1 are sufficiently close to the root r, then x_n → r. In this case, the convergence is of order \frac{1 + \sqrt{5}}{2}.
References
[1] Atkinson K., An introduction to numerical analysis, Second edition, John Wiley & Sons, New York, NY, 1989.
[2] Cheney E.W., Introduction to approximation theory, Second edition, Chelsea Publishing Company, New York, NY, 1982.
[3] Dahlquist G., Björck A., Numerical methods, Prentice-Hall, Englewood Cliffs, NJ, 1974.
[4] Davis P.J., Interpolation and approximation, Second edition, Dover, New York, NY, 1975.
[5] Isaacson E., Keller H.B., Analysis of numerical methods, Second edition, Dover, Mineola, NY, 1994.
[6] Stoer J., Bulirsch R., Introduction to numerical analysis, Second edition, Springer-Verlag, New York, NY, 1993.
[7] Süli E., Mayers D., An introduction to numerical analysis, Cambridge University Press, Cambridge, UK, 2003.
Index
L_2-norm . . . . . . . . . . . . . . . . . . . . . . . . . 36, 48
weighted. . . . . . . . . . . . . . . . . . . . . 52, 53
L_∞-norm . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
approximation
best approximation . . . . . . . . . . . . . . 42
existence . . . . . . . . . . . . . . . . . . . . . . 42
least-squares. . . . . . . . . . . . . . . . . . . . . 48
Hilbert matrix. . . . . . . . . . . . . . . . . 49
orthogonal polynomials . . . . . . . . 50
solution . . . . . . . . . . . . . . . . . . . . . . . 48
minimax. . . . . . . . . . . . . . . . . . . . . . . . . 41
oscillating theorem . . . . . . . . . . . . 44
remez . . . . . . . . . . . . . . . . . . . . . . . . . 46
uniqueness. . . . . . . . . . . . . . . . . . . . . 45
near minimax . . . . . . . . . . . . . . . . . . . 46
Weierstrass . . . . . . . . . . . . . . . . . . . . . . 37
Bernstein polynomials . . . . . . . . . . . . . . . 37
Bessel’s inequality . . . . . . . . . . . . . . . . . . . 59
Chebyshev
near minimax interpolation . . . . . . 46
points . . . . . . . . . . . . . . . . . . . . . . . 18, 46
polynomials. . . . . . . . . . . . . . . . . . 15, 56
Chebyshev uniqueness theorem . . . . . . 45
de la Vall´ee-Poussin . . . . . . . . . . . . . . . . . 44
differentiation . . . . . . . . . . . . . . . . . . . . . . . 65
accuracy. . . . . . . . . . . . . . . . . . . . . . . . . 66
backward differencing. . . . . . . . . . . . 66
centered differencing. . . . . . . . . . . . . 66
forward differencing . . . . . . . . . . . . . 66
one-sided differencing. . . . . . . . . . . . 66
Richardson’s extrapolation. . . . . . . 72
truncation error . . . . . . . . . . . . . . . . . 66
undetermined coefficients . . . . . . . . 70
via interpolation . . . . . . . . . . . . . 67, 68
divided differences . . . . . . . . . . . . . . . 10, 14
with repetitions . . . . . . . . . . . . . . . . . 23
Gram-Schmidt . . . . . . . . . . . . . . . . . . . . . . 53
Hermite polynomials . . . . . . . . . . . . . . . . 57
Hilbert matrix. . . . . . . . . . . . . . . . . . . . . . . 49
inner product . . . . . . . . . . . . . . . . . . . . . . . 53
weighted . . . . . . . . . . . . . . . . . . . . . . . . 53
integration
Gaussian . . . . . . . . . . . . . . . . . . . . . . . . 87
orthogonal polynomials . . . . . . . . 88
midpoint rule. . . . . . . . . . . . . . . . . . . . 75
composite . . . . . . . . . . . . . . . . . . . . . 81
Newton-Cotes . . . . . . . . . . . . . . . . . . . 77
quadratures . . . . . . . . . . . . . . . . . . . . . 75
weighted. . . . . . . . . . . . . . . . . . . . . . . 84
rectangular rule . . . . . . . . . . . . . . . . . 75
Riemann . . . . . . . . . . . . . . . . . . . . . . . . 74
Romberg. . . . . . . . . . . . . . . . . . . . . 93, 94
Simpson’s rule . . . . . . . . . . . . . . . 83, 85
composite . . . . . . . . . . . . . . . . . . . . . 86
error . . . . . . . . . . . . . . . . . . . . . . . 85, 87
super Simpson. . . . . . . . . . . . . . . . . . . 94
trapezoidal rule . . . . . . . . . . . . . . 77, 78
composite . . . . . . . . . . . . . . . . . . . . . 79
undetermined coefficients . . . . . . . . 82
interpolation
Chebyshev points . . . . . . . . . . . . 15, 18
divided differences . . . . . . . . . . . 10, 14
with repetitions. . . . . . . . . . . . . . . . 23
error . . . . . . . . . . . . . . . . . . . . . . . . . 12, 15
existence . . . . . . . . . . . . . . . . . . . . . . . . . 3
Hermite . . . . . . . . . . . . . . . . . . . . . . . . . 21
Lagrange form. . . . . . . . . . . . . . . . . 25
Newton’s form. . . . . . . . . . . . . . . . . 23
interpolation error . . . . . . . . . . . . . . . . 3
interpolation points . . . . . . . . . . . . . . . 3
Lagrange form. . . . . . . . . . . . . . . . . . . . 8
near minimax . . . . . . . . . . . . . . . . . . . 46
Newton’s form . . . . . . . . . . . . . . . . 5, 10
divided differences . . . . . . . . . . . . . 10
polynomial interpolation. . . . . . . . . . 3
splines. . . . . . . . . . . . . . . . . . . . . . . . . . . 28
cubic . . . . . . . . . . . . . . . . . . . . . . . . . . 30
degree . . . . . . . . . . . . . . . . . . . . . . . . . 28
knots . . . . . . . . . . . . . . . . . . . . . . . . . . 28
natural . . . . . . . . . . . . . . . . . . . . 33, 34
not-a-knot . . . . . . . . . . . . . . . . . . . . . 33
uniqueness . . . . . . . . . . . . . . . . . . . . . 3, 7
Vandermonde determinant . . . . . . . . 6
weighted least squares . . . . . . . . . . . 52
Lagrange form . . . . . . . . see interpolation
Laguerre polynomials . . . . . . . . . . . . 57, 62
least-squares . . . . . . . . see approximation
weighted. . . . . . . . . see approximation
Legendre polynomials . . . . . . . . 56, 59, 60
maximum norm . . . . . . . . . . . . . . . . . . . . . 36
minimax error . . . . . . . . . . . . . . . . . . . 41, 43
monic polynomial . . . . . . . . . . . . . . . . . . . 16
Newton’s form . . . . . . . . see interpolation
norm. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
orthogonal polynomials . . . 50, 51, 53, 88
Bessel’s inequality . . . . . . . . . . . . . . . 59
Chebyshev . . . . . . . . . . . . . . . . . . . . . . 56
Gram-Schmidt. . . . . . . . . . . . . . . . . . . 53
Hermite . . . . . . . . . . . . . . . . . . . . . . . . . 57
Laguerre . . . . . . . . . . . . . . . . . . . . . 57, 62
Legendre . . . . . . . . . . . . . . . . . 56, 59, 60
Parseval’s equality. . . . . . . . . . . . . . . 59
roots of . . . . . . . . . . . . . . . . . . . . . . . . . . 63
triple recursion relation. . . . . . . . . . 63
oscillating theorem . . . . . . . . . . . . . . . . . . 44
Parseval’s equality. . . . . . . . . . . . . . . . . . . 59
quadratures . . . . . . . . . . . . . see integration
Remez algorithm . . . . . . . . . . . . . . . . . . . . 46
Richardson’s extrapolation. . . . . . . 72, 93
Riemann sums . . . . . . . . . . . . . . . . . . . . . . 74
Romberg integration. . . . . . . . . . . . . 93, 94
root finding
Newton’s method. . . . . . . . . . . . . . . . 97
the bisection method . . . . . . . . . . . . 95
the secant method. . . . . . . . . . . . . . 100
splines . . . . . . . . . . . . . . . . see interpolation
Taylor series. . . . . . . . . . . . . . . . . . . . . . . . . 23
triangle inequality . . . . . . . . . . . . . . . . . . . 36
Vandermonde determinant . . . . . . . . . . . . 6
Weierstrass approximation theorem. . 37

· (x − xn−1 ) n j−1 (2.D. We follow the procedure given by (2. + an (x − x0 ) · . Example 2. where a0 = f (x0 ). We do it in the following way: • Let Q0 (x) = a0 .5) we have a1 = f (x1 ) − Q0 (x1 ) f (x1 ) − f (x0 ) = .6) are given by a0 = f (x0 ). • Let Q1 (x) = a0 + a1 (x − x0 ).1 implies that we will only be rewriting the same polynomial in different ways.7) as the Newton form of the interpolation polynomial. . f (x0 )) and (x1 .1 is that it is constructive.3 Newton’s Form of the Interpolation Polynomial 2. aj = j−1 k=0 (2. let Qn (x) = a0 + a1 (x − x0 ) + .6)–(2. . • In general. As we shall see below. x 1 − x0 5 . we can use the proof to write down a formula for the interpolation polynomial. The coefficients aj in (2. . Levy 2. there are various ways of writing the interpolation polynomial.2 The Newton form of the polynomial that interpolates (x0 . f (xj ) − Qj−1 (xj ) . x 1 − x0 x 1 − x0 We note that Q1 (x) is nothing but the straight line connecting the two points (x0 . . Following (2. f (x0 )) and (x1 .3 Newton’s Form of the Interpolation Polynomial One good thing about the proof of Theorem 2.6) = a0 + j=1 aj k=0 (x − xk ).7) (xj − xk ) We refer to the interpolation polynomial when written in the form (2. f (x1 )). The uniqueness of the interpolation polynomial as guaranteed by Theorem 2. In other words.4) for reconstructing the interpolation polynomial. f (x1 )) is Q1 (x) = f (x0 ) + f (x1 ) − f (x0 ) (x − x0 ).
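As an illustration, the recursive construction of the coefficients aj above can be coded directly. The following Python sketch (the names newton_coefficients and newton_eval are ours, and NumPy is assumed) computes aj exactly as in (2.6)–(2.7) and evaluates Qn(x) by nested multiplication.

    import numpy as np

    def newton_coefficients(x, y):
        # a_j = (f(x_j) - Q_{j-1}(x_j)) / prod_{k<j} (x_j - x_k), as in (2.7)
        x, y = np.asarray(x, float), np.asarray(y, float)
        a = [y[0]]
        for j in range(1, len(x)):
            q, w = 0.0, 1.0
            for k in range(j):          # accumulate Q_{j-1}(x_j) and the product
                q += a[k] * w
                w *= x[j] - x[k]
            a.append((y[j] - q) / w)
        return np.array(a)

    def newton_eval(a, x_nodes, t):
        # Q_n(t) = a_0 + (t-x_0)(a_1 + (t-x_1)(a_2 + ...)), nested form of (2.6)
        result = a[-1]
        for j in range(len(a) - 2, -1, -1):
            result = a[j] + (t - x_nodes[j]) * result
        return result

    x_nodes, y_vals = [0.0, 1.0], [1.0, 3.0]       # two data points
    a = newton_coefficients(x_nodes, y_vals)       # [1., 2.]  ->  Q_1(x) = 1 + 2x
    print(newton_eval(a, x_nodes, 0.5))            # 2.0

Adding a new interpolation point only appends one more coefficient; the previously computed ones are unchanged, which is the practical advantage of the Newton form.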

Indeed. . the interpolation conditions. xn 0 . 1 xn . . ..g.9) In view of Theorem 2. xn 0 . (2.  . 1 xn . (2. . . . 1 x0 1 x1 . . f (x0 ))... that the determinant of its coefficients matrix must not vanish. i. xn 1 . . ..9). . + bn xn = f (xj ). so we should be able to compute directly the coefficients of the polynomial as given in (2.   . xn n i>j (xi − xj ). 1 j = 0. . .12). f (x1 )). . xn b0 f (x0 ) 0 x1 . . n. . (2. . imply that the following equations should hold: b0 + b1 xj + . f (x2 )) is f (x2 ) − f (x0 ) + x1 −x0 (x2 − x0 ) f (x1 ) − f (x0 ) Q2 (x) = f (x0 )+ (x−x0 )+ (x−x0 )(x−x1 ). e. . . .8).11) (2. . . . it has to be nonsingular. and (x2 . . xn 1 .e..4 The Interpolation Problem and the Vandermonde Determinant D. . . .10) can be rewritten as     x0 . (x1 .  =  . . .8) and require that the following interpolation conditions are satisfied Qn (xj ) = f (xj ). is known as the Vandermonde determinant.2. (2. .. Levy Example 2. xn bn f (xn ) n (2.10) In order for the system (2. . .4 The Interpolation Problem and the Vandermonde Determinant An alternative approach to the interpolation problem is to consider directly a polynomial of the form n Qn (x) = k=0 bk xk .1 we already know that this problem has a unique solution. . We leave it as an exercise to verify that 1 x0 1 x1 . . xn . xn   b1   f (x1 )  1     . . . . .12) The determinant (2. .11) to have a unique solution. (2. . .13) 6 . . x 1 − x0 (x2 − x0 )(x2 − x1 ) f (x1 )−f (x0 ) 2.  . (2. . . = . . . j In matrix  1 1  . .. . . . . xn n form. This means.3 The Newton form of the polynomial that interpolates the three points (x0 . = 0.  . 0 j n. .
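In practice the Vandermonde system (2.11) can be assembled and solved with any linear-algebra routine. A minimal NumPy sketch (our own naming; NumPy assumed) is given below for the data (−1, 9), (0, 5), (1, 3), the same values that reappear in Example 2.7. Note, however, that Vandermonde matrices become badly conditioned as the number of nodes grows, which is one reason to prefer the Newton or Lagrange constructions.

    import numpy as np

    def interpolate_via_vandermonde(x, y):
        # rows of V are (1, x_j, x_j^2, ..., x_j^n); solve V b = f as in (2.11)
        V = np.vander(np.asarray(x, float), increasing=True)
        return np.linalg.solve(V, np.asarray(y, float))

    b = interpolate_via_vandermonde([-1.0, 0.0, 1.0], [9.0, 5.0, 3.0])
    print(b)    # [ 5. -3.  1.]  ->  Q_2(x) = 5 - 3x + x^2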

However.14) we have n Qn (xi ) = j=0 n f (xj )lj (xi ).17) does not vanish since we assume that all intern polation points are distinct. in which the coefficients were unknown. (2. i = j. . (2. · (x − xn ) .11) has a solution that is also unique. The 7 . · (xj − xj−1 )(xj − xj+1 ) · . 2.17) Note that the denominator in (2. .14) n where lj (x) are n+1 polynomials of degree n. (xj − x0 ) · .8) assumed a linear combination of polynomials of degrees 0. We now require that Qn (x) satisfies the interpolation conditions Qn (xi ) = f (xi ). which confirms what we already know according to Theorem 2.5 The Lagrange Form of the Interpolation Polynomial Since we assume that the points x0 . . Levy 2.16) is the n One obvious way of constructing polynomials lj of degree following: n lj (x) = (x − x0 ) · . .15) By substituting xi for x in (2. We use two indices in these polynomials: n the subscript j enumerates lj (x) from 0 to n and the superscript n is used to remind n n us that the degree of lj (x) is n. the system (2. 0. . . .13) is indeed non zero. In this section we take a different approach and assume that the interpolation polynomial is given as a linear combination of n + 1 polynomials of degree n. · (xj − xn ) 0 j n. . · (x − xj−1 )(x − xj+1 ) · . {f (xj )}n . Qn (x). n In view of (2. 0 i n. n. Hence. the polynomials lj (x) are precisely of degree n (and not n). Note that in this particular case. the degree of Qn (x) is n at the most.16) where δij is the Kr¨necker delta. . and hence the polynomials lj (x) are well defined. . given by (2.D. . We thus let n Qn (x) = j=0 n f (xj )lj (x). we set the coefficients as the interpolated values. . .15) we may conclude that lj (x) must satisfy n lj (xi ) = δij . In either case.5 The Lagrange Form of the Interpolation Polynomial The form of the interpolation polynomial that we used in (2.1. This time. (2. (2. 0 i n. xn are distinct. i = j. . the determinant (2.14) may have a lower degree. n that satisfy (2. . . o δij = 1. while the unknowns are the j=0 polynomials.

We know that the unique interpolation polynomial through these two points is the line that connects the two points.5 The Lagrange Form of the Interpolation Polynomial D. where the polynomials lj (x) of degree n are given by n n lj (x) = i=0 i=j (x − xi ) . (x1 − x0 )(x1 − x2 ) (x − x0 )(x − x1 ) 2 l2 (x) = . 1.2. Such a line can be written in many different forms.4 We are interested in finding the Lagrange form of the interpolation polynomial that interpolates two points: (x0 . f (x0 )) and (x1 . Q2 (x). (x2 − x0 )(x2 − x1 ) 2 l0 (x) = The interpolation polynomial is therefore given by 2 2 2 Q2 (x) = f (x0 )l0 (x) + f (x1 )l1 (x) + f (x2 )l2 (x) (x − x1 )(x − x2 ) (x − x0 )(x − x2 ) (x − x0 )(x − x1 ) = f (x0 ) + f (x1 ) + f (x2 ) . (x1 . f (x1 )). Levy Lagrange form of the interpolation polynomial is the polynomial Qn (x) given by n (2.18) n i=0 i=j (xj − xi ) Example 2. the Lagrange form of the interpolation polynomial does not let us use the interpolation polynomial through the first two points. . . (2. In order to obtain the Lagrange form we let 1 l0 (x) = x − x1 . . Remarks. x 0 − x1 x 1 − x0 The desired polynomial is therefore given by the familiar formula 1 1 Q1 (x) = f (x0 )l0 (x) + f (x1 )l1 (x) = f (x0 ) Example 2. . (x0 − x1 )(x0 − x2 ) (x1 − x0 )(x1 − x2 ) (x2 − x0 )(x2 − x1 ) It is easy to verify that indeed Q2 (xj ) = f (xj ) for j = 0. f (x2 )).5 This time we are looking for the Lagrange form of the interpolation polynomial. f (x1 )). This means that we have to compute everything from scratch. n. f (x0 )). 2.14). x 1 − x0 x − x1 x − x0 + f (x1 ) . as a building block for Q2 (x). We start with (x − x1 )(x − x2 ) . as desired. (x0 − x1 )(x0 − x2 ) (x − x0 )(x − x2 ) 2 l1 (x) = . 8 . Q1 (x). x 0 − x1 1 l1 (x) = x − x0 . that interpolates three points: (x0 . j = 0. Unfortunately. (x2 .
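The Lagrange form translates almost verbatim into code. The short sketch below (illustrative names; NumPy assumed) builds each basis polynomial from (2.18) and sums according to (2.14).

    import numpy as np

    def lagrange_basis(x_nodes, j, t):
        # l_j^n(t) = prod_{i != j} (t - x_i) / (x_j - x_i), as in (2.18)
        return np.prod([(t - xi) / (x_nodes[j] - xi)
                        for i, xi in enumerate(x_nodes) if i != j])

    def lagrange_interpolate(x_nodes, y_vals, t):
        # Q_n(t) = sum_j f(x_j) l_j^n(t), as in (2.14)
        return sum(y * lagrange_basis(x_nodes, j, t) for j, y in enumerate(y_vals))

    # the line through (0, 1) and (2, 5), evaluated at t = 1
    print(lagrange_interpolate([0.0, 2.0], [1.0, 5.0], 1.0))   # 3.0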

(x − xj )wn (xj ) 0 j n.21) For example. . If we define n wn (x) = i=0 (x − xi ). then n n wn (x) = j=0 i=0 i=j (x − xi ). An alternative form for lj (x) can be obtained in the following way. One instance where the Lagrange form of the interpolation polynomial may seem to be advantageous when compared with the Newton form is when you are interested in solving several interpolation problems. xj . In this n case. the polynomials lj (x) are identical for all problems since they depend only on the points but not on the values of the function at these points. n 2. (2.19) that does not vanish: n wn (xj ) = i=0 i=j (xj − xi ). . .D. Levy 2. Therefore. all given at the same interpolation points x0 .4 is f (x1 ) f (x0 ) + . For future reference we note that the coefficient of xn in the interpolation polynomial Qn (x) is n n j=0 k=0 k=j f (xj ) (xj − xk ) . x 0 − x1 x 1 − x0 9 . there is only one term in the sum in (2. . . lj (x) can be rewritten as n lj (x) = wn (x) . . (2. (2.20) 3. .19) When we then evaluate wx (x) at any interpolation point. they have to be constructed only once. f (xn ). xn but with different values f (x0 ). n Hence. the coefficient of x in Q1 (x) in Example 2.5 The Lagrange Form of the Interpolation Polynomial 1.

We also denote the zeroth-order divided difference as a0 = f [x0 ]. .2.e. we use the following notation: aj = f [x0 . . x n − x0 10 n − 1 that interpolates f (x) (2. + f [x0 . To emphasize this dependence. + an (x − x0 ) · . . xn ] − f [x0 . that interpolates Proof. When written in terms of the divided differences. Levy 2. xj and on the values of the function at these points f (x0 ). . . . . . where f [x0 ] = f (x0 ). . . . This is given by the following lemma. f (xj ). a polynomial of degree f (x) at x0 . Qk (xj ) = f (xj ). .6 Divided Differences Qn (x) = a0 + a1 (x − x0 ) + . . . (2. . i. . . xn . we denote by Qk (x). . · (x − xn−1 ). x n − x0 (2. x1 ](x − x0 ) + .6 The divided differences satisfy: f [x0 . 1 j n.24) .22) There is a simple way of computing the j th -order divided difference from lower order divided differences. . Lemma 2. .23) k. We now consider the unique polynomial P (x) of degree at x1 . .6 Divided Differences D. xn ] k=0 (x − xk ). The j th -order divided difference. . . xj ]. aj . xn−1 ] . 1 j n. . .. aj . . xk . . . 0 j k. . as the j th -order divided difference. xn ] = f [x1 . For any k. the Newton form of the interpolation polynomial becomes n−1 Qn (x) = f [x0 ] + f [x0 . . . . We recall that Newton’s form of the interpolation polynomial is with a0 = f (x0 ) and aj = f (xj ) − Qj−1 (xj ) j−1 k=0 . is based on the points x0 . . (xj − xk ) We name the j th coefficient. It is easy to verify that Qn (x) = P (x) + x − xn [P (x) − Qn−1 (x)]. . . . . .
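The recursion of Lemma 2.6 gives a simple way to build the divided-difference table column by column; the top entry of each column is one of the coefficients in (2.23). The following Python sketch (our own naming; NumPy assumed) does this for the data of Example 2.7 below.

    import numpy as np

    def divided_differences(x, y):
        # returns f[x_0], f[x_0,x_1], ..., f[x_0,...,x_n]
        # (the top entries of the columns of the divided-difference table)
        x = np.asarray(x, float)
        col = np.asarray(y, float).copy()
        coeffs = [col[0]]
        for order in range(1, len(x)):
            col = (col[1:] - col[:-1]) / (x[order:] - x[:-order])   # Lemma 2.6
            coeffs.append(col[0])
        return np.array(coeffs)

    print(divided_differences([-1.0, 0.0, 1.0], [9.0, 5.0, 3.0]))   # [ 9. -4.  1.]

so that Q2(x) = 9 − 4(x + 1) + (x + 1)x, in agreement with the computation carried out in Example 2.7.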

.D. x2 ] − f [x0 . 1] = so that f [0.7 The second-order divided difference is f [x1 . . . if we want to find the polynomial of degree (0. x1 . . .1 (up to third-order). 5). The coefficient of xn−1 in P (x) is f [x1 . . x2 ] = = x 2 − x0 f (x2 )−f (x1 ) x2 −x1 x 2 − x0 − f (x1 )−f (x0 ) x1 −x0 . . f (x1 )). xn−1 ]. .24) is 1 (f [x1 . 1] − f [−1. 1] = 3−5 = −2. . x n − x0 which means that f [x0 . 1 − (−1) 2 Q2 (x) = 9 − 4(x + 1) + (x + 1)x = 5 − 3x + x2 . . The recursive structure of the divided differences implies that it is required to compute all the low order coefficients in the table in order to get the high-order ones. x1 ] f [x0 . xn−1 ] . 0] = 5−9 = −4. x2 ](x − x0 )(x − x1 ) f (x1 ) − f (x0 ) (x − x0 ) + = f (x0 ) + x 1 − x0 f (x2 )−f (x1 ) x2 −x1 x 2 − x0 − f (x1 )−f (x0 ) x1 −x0 (x − x0 )(x − x1 ). . 9). f [−1.6 Divided Differences The coefficient of xn on the left-hand-side of (2. f [−1. . x1 . 0 − (−1) f [0. 1−0 2 that interpolates (−1. the coefficient of xn on the right-hand-side of (2. . xn ].24) is f [x0 . . we have f (−1) = 9. The relations between the divided differences are schematically portrayed in Table 2. . We note that the divided differences that are being used as the coefficients in the interpolation polynomial are those that are located in the top of every column. . and (1. . xn−1 ]). xn ] − f [x0 . . . the unique polynomial that interpolates (x0 . Hence. . . x1 ](x − x0 ) + f [x0 . f (x2 )) is Q2 (x) = f [x0 ] + f [x0 . (x1 . x n − x0 Example 2. . . . 0. . Hence. xn ] and the coefficient of xn−1 in Qn−1 (x) is f [x0 . 0] −2 + 4 = = 1. . f (x0 )). . and (x2 . Levy 2. 3). 11 . . For example. . xn ] − f [x0 . xn ] = f [x1 .

x2 ] f [x0 . . . x2 ] x2 f (x2 ) f [x2 . Then ∀x ∈ [a. . x3 ] f [x0 . . b]. .2. . . xn ]. . . . Levy x0 f (x0 ) f [x0 . xn . This means that if we assume that y0 . and hence these two coefficients must be identical. there is.7 The Error in Polynomial Interpolation D. x3 ] x3 f (x3 ) Table 2. . we can estimate the difference between them. While the interpolant and the function agree with each other at the interpolation points.7 The Error in Polynomial Interpolation In this section we would like to provide estimates on the “error” we make when interpolating data that is taken from sampling an underlying function f (x).25) . . . f [x1 . . . f [y0 . . then f [y0 . yn ] = f [x0 .1: Divided Differences One important property of any divided difference is that it is a symmetric function of its arguments. in general. xn ] plays the role of the coefficient of xn in the polynomial that interpolates f (x) at x0 . (n + 1)! j=0 12 n (2. xn . Theorem 2. x1 . . . . . . b) such that f (x) − Qn (x) = 1 f (n+1) (ξn ) (x − xj ). yn is any permutation of x0 . . x1 ] x1 f (x1 ) f [x1 . . ∃ξn ∈ (a. the order of the points does not matter. no reason to expect them to be close to each other elsewhere. xn ∈ [a.8 Let f (x) ∈ C n+1 [a. We let Πn denote the space of polynomials of degree n. b]. At the same time. . yn ] is the coefficient of xn at the polynomial that interpolates f (x) at the same points. . x3 ] 2. b]. . . . x2 . This property can be clearly explained by recalling that f [x0 . x1 . x2 . . . Since the interpolation polynomial is unique for any given data. . Nevertheless. Let Qn (x) ∈ Πn such that it interpolates f (x) at the n + 1 distinct points x0 . a difference which we refer to as the interpolation error. .

. .7 The Error in Polynomial Interpolation Proof. and let n w(x) = j=0 (x − xj ). we used the fact that the leading term of w(x) is xn+1 . i. without any loss of generality. F has at least n distinct zeros in (a. We now note that since f ∈ C n+1 [a. . in addition. If x is one of the interpolation points x0 .g. there is another important characterization which we will comment on now. xn and x. If we assume. w(x) does not vanish and λ is well defined. . .D. we can assume. the first-order divided difference f [x0 .26) we conclude with f (x) − Qn (x) = 1 f (n+1) (ξn )w(x). In addition. b] and since Qn and w are polynomials. x1 ] = f (x1 ) − f (x0 ) .e. . xn . . λ= f (x) − Qn (x) . which guarantees that its (n + 1)th derivative equals w(n+1) (x) = (n + 1)! Reordering the terms in (2. that x0 < x1 . F vanishes at n + 2 points: x0 . that 13 . We therefore assume that x = xj 0 j n. which we denote by ξn . According to Rolle’s theorem. and hence this result holds trivially. b]. . b).26) Here. e. then the left-hand-side and the right-hand-side of (2. then also F ∈ C n+1 [a. w(x) Since the interpolation points x0 .. F (n+1) has at least one zero in (a. We have 0 = F (n+1) (ξn ) = f (n+1) (ξn ) − Q(n+1) (ξn ) − λ(x)w(n+1) (ξn ) n f (x) − Qn (x) = f (n+1) (ξn ) − (n + 1)! w(x) (2.27) In addition to the interpretation of the divided difference of order n as the coefficient of xn in some interpolation polynomial. b). where λ is chosen as to guarantee that F (x) = 0. (n + 1)! (2. . x 1 − x0 Since the order of the points does not change the value of the divided difference. b]. Consider. xn and x are distinct. F has at least n + 1 distinct zeros in (a. . . b). . Levy 2. . We now let F (y) = f (y) − Qn (y) − λw(y). We fix a point x ∈ [a.25) are both zero. and similarly.

Levy f (x) is continuously differentiable in the interval [x0 . . n! In other words. then this divided difference equals to the derivative of f (x) at an intermediate point. xn−1 . x1 ]. . x] j=0 (y − xj ). xn−1 . we could as well think of the interpolation point x as any other interpolation point. . we can no longer consider this divided difference to represent a nth -order derivative of the function. xn−1 ) and b = max(x. x1 ).8 we know that the interpolation error is given by 1 f (x) − Qn−1 (x) = f (n) (ξn−1 ) (x − xj ). By Theorem 2. . This notion can be extended to divided differences of higher order as stated by the following theorem. b). the nth -order divided difference is an nth -derivative of the function f (x) at an intermediate point. though if this is the case.. . . . ξ ∈ (x0 . assuming that the function has n continuous derivatives. x] = where ξ ∈ (a. . . and name it.. xn−1 . xn ] = . i. Since Qn (y) interpolated f (y) at x. . . Remark. f [x0 . . In this case.28).22). n! j=0 which implies the result (2. . b). Similarly to the first-order divided difference. . e. the equation (2. . its being differentiable). In equation (2. n! (2. .28) takes the somewhat more natural form of f (n) (ξ) f [x0 . x0 .e.7 The Error in Polynomial Interpolation D. x0 . xn−1 ). . . It is important to note that while this interpretation is based on additional smoothness requirements from f (x) (i. the first-order divided difference can be viewed as an approximation of the first derivative in the interval. . Then f [x0 . 14 n−1 . . xn−1 be n + 1 distinct points. . we know that n−1 f (n) (ξ) . we would like to emphasize that the nth -order divided difference is also well defined in cases where the function is not as smooth as required in the theorem. . the divided differences are well defined also for non-differentiable functions. Let Qn+1 (y) interpolate f (y) at x0 . . we have n−1 f (x) = Qn−1 (x) + f [x0 . Let a = min(x.e.g. x] j=0 (x − xj ). . x. In other words. Assume that f (y) has a continuous derivative of order n in the interval (a. . Then according to the construction of the Newton form of the interpolation polynomial (2. . . Proof. . x0 . . . Theorem 2. .9 Let x. xn .2. x1 ] = f (ξ). xn−1 . .28).28) Qn (y) = Qn−1 (y) + f [x0 .

8 Interpolation at the Chebyshev Points In the entire discussion so far. T2 (x) = 2xT1 (x)−T0 (x) = 2x2 −1. Once again. It is important to note that the interpolation points influence two terms on the right-hand-side of (2. minimizing the interpolation error is not an easy task. For the time being.25)). T1 (x) = x. All that it guarantees is that the product part of the interpolation error is minimal.2.30) is minimized. how to choose the interpolation points x0 . . (2.30). The obvious one is the product n j=0 (x − xj ). . 1]. it is possible to write an explicit formula for the Chebyshev polynomials: Lemma 2. For example. 15 (2. The tool that we are going to use is the Chebyshev polynomials. it would be reasonable to use this degree of freedom to minimize the interpolation error. Due to the implicit dependence of ξn on the interpolation points. We will return to this “full” problem later on in the context of the minimax approximation. n 0. . the interpolation error is of the form 1 f (x) − Qn (x) = f (n+1) (ξn ) (x − xj ).8 Interpolation at the Chebyshev Points 2. If this is the case. The solution of the problem will be to interpolate at Chebyshev points.29) Here. Levy 2. We start by defining the Chebyshev polynomials using the following recursion relation:   T0 (x) = 1. (2. . we would like to emphasize that a solution of this problem does not (in general) provide an optimal choice of interpolation points that will minimize the interpolation error. We will first introduce the Chebyshev polynomials and the Chebyshev points and then show why interpolating at these points minimizes (2. (n + 1)! j=0 n (2. .30) The second one is f (n+1) (ξn ) as ξn depends on x0 . . T2 (x) and T3 (x) are shown in Figure 2. There may be cases where one may have the flexibility of choosing the interpolation points.10 For x ∈ [−1. we are going to focus on a simpler problem. Instead of writing the recursion formula. xn such that the product (2.32) . .31)  Tn+1 (x) = 2xTn (x) − Tn−1 (x). The polynomials T1 (x). xn . we assumed that the interpolation points are given. The solution of this problem is the topic of this section. n 1.29). Tn (x) = cos(n cos−1 x). (2. We recall that if we are interpolating values of a function f (x) that has n continuous derivatives. .31). namely. and T3 (x) = 4x3 −3x. Qn (x) is the interpolating polynomial and ξn is an intermediate point in the interval of interest (see (2.D.
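The recursion (2.31) is also the most convenient way to evaluate Tn(x) numerically, and the closed form of Lemma 2.10 provides an easy check. A brief sketch (illustrative naming; NumPy assumed):

    import numpy as np

    def chebyshev_T(n, x):
        # three-term recursion (2.31): T_{k+1}(x) = 2x T_k(x) - T_{k-1}(x)
        x = np.asarray(x, float)
        t_prev, t_curr = np.ones_like(x), x
        if n == 0:
            return t_prev
        for _ in range(n - 1):
            t_prev, t_curr = t_curr, 2 * x * t_curr - t_prev
        return t_curr

    x = np.linspace(-1.0, 1.0, 5)
    print(np.allclose(chebyshev_T(3, x), np.cos(3 * np.arccos(x))))   # True, as in Lemma 2.10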

We define a monic polynomial as a polynomial for which the coefficient of the leading term is one.4 −0. We now let θ = cos−1 x. T2 (x) and T3 (x) Proof. a polynomial of degree n is monic.4 0. there is one more issue that we must clarify. if it is of the form xn + an−1 xn−1 + .8 −0.4 0.. Levy 0. What is so special about the Chebyshev polynomials. n (2.2 0 −0.2 −0.  tn+1 (x) = 2xtn (x) − tn−1 (x). .6 −0.6 −0. + a1 x + a0 .8 Interpolation at the Chebyshev Points 1 D.8 T3(x) T1(x) 0. and define tn (x) = cos(n cos−1 x) = cos(nθ).8 1 −1 −1 x Figure 2.6 0. Hence tn (x) = Tn (x). . Then by (2.e. i.2.33) 1.8 T2(x) −0. t1 (x) = x. 16 .2 0 0. x = cos θ.4 −0.2: The Chebyshev polynomials T1 (x). cos(n − 1)θ = cos θ cos nθ + sin θ sin nθ.e. i.33)   t0 (x) = 1. Standard trigonometric identities imply that cos(n + 1)θ = cos θ cos nθ − sin θ sin nθ. Hence cos(n + 1)θ = 2 cos θ cos nθ − cos(n − 1)θ.6 0. but before doing so. and what is the connection between these polynomials and minimizing the interpolation error? We are about to answer these questions.2 0..

. A general result about monic polynomials is the following Theorem 2. 0 j n. However. We prove (2. and let xj be the following n + 1 points xj = cos Since Tn cos we have (−1)j qn (xj ) = 21−n . 21−n Tn (x) = xn + . Hence (−1)j pn (xj ) This means that (−1)j (qn (xj ) − pn (xj )) > 0. |x| 1. Levy 2.e. 0 j n. Note that pn −qn can not be the zero polynomial because 17 . . the polynomial (qn − pn )(x) oscillates (n + 1) times in the interval [−1. then −1 x 1 max |pn (x)| 21−n . Hence. Such a polynomial can not have more than n − 1 distinct roots. jπ n .34) Proof.D. i.11 If pn (x) is a monic polynomial of degree n. Let qn (x) = 21−n Tn (x).8 Interpolation at the Chebyshev Points Note that Chebyshev polynomials are not monic: the definition (2..34) by contradiction. . which means that (qn − pn )(x) has at least n distinct roots in the interval.31) implies that the Chebyshev polynomial of degree n is of the form Tn (x) = 2n−1 xn + . . Suppose that |pn (x)| < 21−n . This means that Tn (x) divided by 2n−1 is monic. (2. jπ n = (−1)j . which leads to a contradiction. pn (x) and qn (x) are both monic polynomials which means that their difference is a polynomial of degree n − 1 at most. 1]. |pn (xj )| < 21−n = (−1)j qn (xj ).

1].8 Interpolation at the Chebyshev Points D. Here. i. . The roots of Tn+1 (x). x0 .10 Tn+1 (x) = cos((n + 1) cos−1 x). and the interpolation polynomial. the (n + 1) roots of Tn+1 (x) are xj = cos 2j + 1 π .11 to figure out how to reduce the interpolation error. which is equivalent to choosing xj as the roots of the Chebyshev polynomial Tn+1 (x).11 n − xj ) is a monic polynomial of degree n + 1 and hence by Theo2−n .35) for the roots of the Chebyshev polynomial has the following geometrical interpretation. . 2n + 2 0 j n. . xn ∈ [−1. xn ..35) The roots of the Chebyshev polynomials are sometimes referred to as the Chebyshev points. . 0 j n. define α = π/n. In order to find the roots of Tn (x). The formula (2. max |x| 1 j=0 (x − xj ) The minimal value of 2−n can be actually obtained if we set n 2 −n Tn+1 (x) = j=0 (x − xj ). 1) such that the distance between the function whose values we interpolate. f (x). . then there exists ξn ∈ (−1. we have used the obvious fact that |Tn (x)| 1. n j=0 (x We note that rem 2. . We are interested in minimizing n max |x| 1 j=0 (x − xj ) .2. Qn (x). (2. Divide 18 . is max |f (x) − Qn (x)| |x| 1 1 max |f (n+1) (x)| max |x| 1 (n + 1)! |x| 1 n j=0 (x − xj ) . are therefore obtained if (n + 1) cos−1 (xj ) = j+ 1 2 π. What are the roots of the Chebyshev polynomial Tn+1 (x)? By Lemma 2.8 that if the interpolation points x0 . . We know by Theorem 2. . We are now ready to use Theorem 2. Levy that will imply that pn (x) and qn (x) are identical which again is not possible due to the assumptions on their maximum values.e.

.e. The Chebyshev points are then obtained by projecting these points on the x-axis.D. Find Q2 (x) which interpolates f (x) in the Chebyshev points. i. we need 3 interpolation points. Assume also that these (n + 1) interpolation points are the (n + 1) roots of the Chebyshev polynomial of degree n + 1. 2n + 2 0 j n. . Solution: Since we are asked to find an interpolation polynomial of degree 2. .36) 2j + 1 π . x3 . xj = cos Then ∀|x| 1. 2n (n 1 max f (n+1) (ξ) . + 1)! |ξ| 1 (2. Note that they become dense next to the boundary of the interval the upper half of the unit circle into n + 1 parts such that the two side angles are α/2 and the other angles are α. T3 (x) = 4x3 − 3x. 19 . The following theorem summarizes the discussion on interpolation at the Chebyshev points. 1]. .. and hence we first need to compute the 3 roots of the Chebyshev polynomial of degree 3. . We are also asked to interpolate at the Chebyshev points.3: The roots of the Chebyshev polynomial T4 (x). x0 . . Theorem 2. xn . . |f (x) − Qn (x)| Example 2.3 for T4 (x). Tn+1 (x). This procedure is demonstrated in Figure 2.13 Problem: Let f (x) = sin(πx) in the interval [−1. Estimate the error. It also provides an estimate of the error for this case.12 Assume that Qn (x) interpolates f (x) at x0 . .8 Interpolation at the Chebyshev Points The unit circle 5π 8 3π 8 7π 8 π 8 -1 x0 x1 0 x2 x3 1 x Figure 2. Levy 2.

2n + 2 0 j n. x1 . x1 ] = 0. | sin πx − Q2 (x)| 1 max |(sin πt)(3) | 2 3! |ξ| 1 2 π3 22 3! 1.4086. ∀|x| 1.8 Interpolation at the Chebyshev Points The roots of T3 (x) can be easily found from x(4x2 − 3) = 0. Q2 (x). x1 = 0. √ √ 3 3 x0 = − . In the more general case where the interpolation interval for the function f (x) is x ∈ [a.e. it is still possible to use the previous results by following the following steps: Start by converting the interpolation interval to y ∈ [−1.4.. x2 = . Levy ≈ 0.2. 2 f (x1 ) = 0. x2 ] = ≈ 0. yj = cos 2j + 1 π . f (x2 ) = sin √ 3 π 2 D.292 A brief examination of Figure 2. are plotted in Figure 2. 20 . 2 2 The corresponding values of f (x) at these interpolation points are √ 3 f (x0 ) = sin − π ≈ −0. f [x0 . x1 ] = x 1 − x0 f (x2 ) − f (x1 ) f [x1 . As of the error estimate. i. Remark.4086. it is far from being sharp. b] into an interpolation problem for f (x) = g(x(y)) in y ∈ [−1. x2 ](x − x0 )(x − x1 ) ≈ 0. The original function f (x) and the interpolant at the Chebyshev points. The Chebyshev points in the interval y ∈ [−1. 1] are the roots of the Chebyshev polynomial Tn+1 (x). x2 ] − f [x0 . 2 This converts the interpolation problem for f (x) on [a. x1 ](x − x0 ) + f [x0 . b]. x2 ] = f [x1 . . x 2 − x0 The interpolation polynomial is Q2 (x) = f (x0 ) + f [x0 . x1 .e.4718.4718. i..4718x.4 reveals that while this error estimate is correct. x 2 − x1 and the second-order divided difference is f [x0 . 1]: x= (b − a)y + (a + b) . The first-order divided differences are f (x1 ) − f (x0 ) ≈ 0. 1].
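Example 2.13 is easy to reproduce numerically. In the sketch below (our own naming; NumPy assumed) the interpolant is obtained with np.polyfit, which by the uniqueness of the interpolation polynomial yields the same Q2. The measured error is roughly 0.78, well below the bound π³/(2²·3!) ≈ 1.29 of Theorem 2.12, consistent with the remark that the estimate is correct but far from sharp.

    import numpy as np

    def chebyshev_nodes(n):
        # the n+1 roots of T_{n+1}: x_j = cos((2j+1) pi / (2n+2)), as in (2.35)
        j = np.arange(n + 1)
        return np.cos((2 * j + 1) * np.pi / (2 * n + 2))

    f = lambda x: np.sin(np.pi * x)
    nodes = chebyshev_nodes(2)                 # 0 and +-sqrt(3)/2, as in Example 2.13
    coeffs = np.polyfit(nodes, f(nodes), 2)    # Q_2 in power form (unique interpolant)
    x = np.linspace(-1.0, 1.0, 2001)
    print(np.max(np.abs(f(x) - np.polyval(coeffs, x))))   # roughly 0.78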

4 −0. Levy 1 2.8 f(x) 0. Interpolation that involves the derivatives is called Hermite interpolation. See Example 2.6 0.2 Q2(x) 0 −0. and p(0) = 1. The corresponding n + 1 interpolation points in the interval [a.b] 2 j=0 so that the interpolation error is |f (y) − Qn (y)| 1 b−a n (n + 1)! 2 2 n+1 ξ∈[a.D.2 0.13.b] n+1 n max |x| 1 j=0 (x − xj ) .4 0.6 0.8 1 x Figure 2.4: The function f (x) = sin(π(x)) and the interpolation polynomial Q2 (x) that interpolates f (x) at the Chebyshev points.9 Hermite Interpolation 0.2 −0.8 −0. 21 . 2 n 0 j n. we are also interested in interpolating its derivatives.4 0. Such an interpolation problem is demonstrated in the following example: Example 2. p (1) = −1.14 Problem: Find a polynomials p(x) such that p(1) = −1.6 −0.4 −0. max f (n+1) (ξ) .8 −1 −1 −0. (2.6 −0. b] are xj = (b − a)yj + (a + b) .37) 2. We now have b−a (y − yj ) = max y∈[a.2 0 0.9 Hermite Interpolation We now turn to a slightly different interpolation problem in which we assume that in addition to interpolating the values of the function at certain points.

Levy Solution: Since three conditions have to be satisfied. However. Example 2. Hence. p(i) (xj ) = f (i) (xj ).9 Hermite Interpolation D. In general. which amounts to a contradicting information on the value of a1 . a linear polynomial can not interpolate the data and we must consider higherorder polynomials.. we assume that for any point xj in which we have to satisfy an interpolation condition of the form p(l) (xj ) = f (xj ). we may have to interpolate high-order derivatives and not only firstorder derivatives. but we will still not have enough information to determine the constant a0 . 0 i l.  a0 = 1. Unfortunately. Hence. a polynomial of order 2 will no longer be unique because not enough information is given. we are also given all the values of the lower-order derivatives up to l as part of the interpolation requirements. both conditions specify the derivative of p(x) at two distinct points to be of different values. say p(x) = a0 + a1 x. we may expect them to uniquely determine a linear function. The conditions of the problem then imply that   a0 + a1 + a2 = −1. there is indeed a unique polynomial that satisfies the interpolation conditions and it is If this is not the case. Note that even if the prescribed values of the derivatives were identical. we can use these conditions to determine three degrees of freedom. 22 . which means that it is reasonable to expect that these conditions uniquely determine a polynomial of degree 2. We therefore let p(x) = a0 + a1 x + a2 x2 . we will not have problems with the coefficient of the linear term a1 . Solution: Since we are asked to interpolate two conditions. i. p(x) = x2 − 3x + 1. it may not be possible to find a unique interpolant as demonstrated in the following example.2. Also.15 Problem: Find p(x) such that p (0) = 1 and p (1) = −1. a1 + 2a2 = −1.e. (with p(l) being the lth -order derivative of p(x)).

n! (2. . i.e. The first form we study is the Newton form of the Hermite interpolation polynomial. 0 j n. . x = x . . 23 . x0 ]. 2. Definition 2. Levy 2. . x0 ] = f (x0 ). . . We start by extending the definition of divided differences in such a way that they can handle derivatives. xn .D. . .16 The first-order divided difference with repetitions is defined as f [x0 . xn ] − f [x0 .38) In a similar way. . we have to satisfy the interpolation conditions: p(i) (xj ) = f (i) (xj ). + f (n) (x0 ) (x − x0 )n = n! n j=0 f (j) (x0 ) (x − x0 )j . it is natural to extend the notion of divided differences by the following definition. Then the divided differences satisfy   f [x1 . j! which is the Taylor series of f (x) expanded about x = x0 . we can extend the notion of divided differences to high-order derivatives as stated in the following lemma (which we leave without a proof). (2.  n 0 x n − x0 f [x0 . xn ] =  f (n) (x0 )  . x→x0 x − x0 x→x0 Hence. . . n is The unique solution of this problem in terms of a polynomial of degree p(x) = f (x0 ) + f (x0 )(x − x0 ) + . At each interpolation point xj . the interpolation conditions are: p(j) (x0 ) = f (j) (x0 ). xn = x0 . 0 i mj . one can consider the Taylor series as an interpolation problem in which one has to interpolate the value of the function and its first n derivatives at a given point. xl (which we assume are ordered from small to large). say x0 . . When viewed from the point of view that we advocate in this section. We already know that the first derivative is connected with the first-order divided difference by f (x0 ) = lim f (x) − f (x0 ) = lim f [x.9 Hermite Interpolation A simple case that you are probably already familiar with is the Taylor series.1 Divided differences with repetitions We are now ready to consider the Hermite interpolation problem. ..17 Let x0 x1 .9. . . . . . . Lemma 2. xn−1 ] .39) We now consider the following Hermite interpolation problem: The interpolation points are x0 .

We then list all the points including their multiplicities (that correspond to the number of derivatives we have to interpolate). Example 2. . we interpret this divided difference in terms of the extended definition (2. n = m1 + m2 + . x1 ] = = . x1 . . x1 ](x − x0 ) + f [x0 . Levy Here.9 Hermite Interpolation D. . x1 ](x − x0 )(x − x1 ). .2. (2. . x1 . . x1 ] − f [x1 . . . x1 . . . + ml . m1 m2 ml The interpolation polynomial pn−1 (x) is given by n−1 j−1 pn−1 (x) = f [y0 ] + j=1 f [y0 . yj ] k=0 (x − yk ). x1 ] = f (x1 ) − f (x0 ) . x 1 − x0 (x1 − x0 )2 24 . p(x) = f (x0 ) + f [x0 .40) Whenever a point repeats in f [y0 . xl . i. . . x0 ] f [x0 . In general. x0 . x 1 − x0 x 1 − x0 Hence p(x) = f (x0 )+ f (x1 ) − f (x0 ) (x1 − x0 )f (x1 ) − [f (x1 ) − f (x0 )] (x−x0 )+ (x−x0 )(x−x1 ). . The divided differences: f [x0 . . . x1 . x 1 − x0 Solution: The interpolation polynomial p(x) is 1 )−f (x f (x1 ) − f (xx1 −x0 0 ) f [x1 . p(x1 ) = f (x1 ). xl }.39). . . . . . . the number of derivatives that we have to interpolate may change from point to point. To simplify the notations we identify these points with a new ordered list of points yi : {y0 . .. yn−1 } = {x0 . In practice. mj denotes the number of derivatives that we have to interpolate for each point xj (with the standard notation that zero derivatives refers to the value of the function only).  p (x1 ) = f (x1 ). . . . yj ].e. there is no need to shift the notations to y’s and we work directly with the original points. . The extended notion of divided differences allows us to write the solution to this problem in the following way: We let n denote the total number of points including their multiplicities (that correspond to the number of derivatives we have to interpolate at each point). . .18 Problem: Find an interpolation polynomial p(x) that satisfies   p(x0 ) = f (x0 ). . We demonstrate this interpolation procedure in the following example. .

41) We look for an interpolant of the form n n p(x) = i=0 f (xi )Ai (x) + i=0 f (xi )Bi (x).5). 0 i n. We will thus assume that the unknown polynomials Ai (x) and Bi (x) in (2. . Hence ri (xi ) = 1. 2 (li (xj )) = 0. (2. 2 The degree of li (x) is n. . 2 Bi (x) = si (x)li (x). according to (2. j = 0. xn and the interpolation conditions are p(xi ) = f (xi ). as desired. for i = j. In addition.44) . .41). i.9 Hermite Interpolation The Lagrange form of the Hermite interpolant In this section we are interested in writing the Lagrange form of the Hermite interpolant in the special case in which the nodes are x0 . . which implies that deg(Ai ) = deg(Bi ) = 2n + 1. (2. The functions ri (x) and si (x) are both assumed to be linear.D.43) can be written as 2 Ai (x) = ri (x)li (x). It is convenient to start the construction with the functions we have used in the Lagrange form of the standard interpolation problem (Section 2. li (x) = j=0 j=i x − xj .9.42) We thus expect to have a unique polynomial p(x) that satisfies the constraints (2.43)  Ai (xj ) = 0. p (xi ) = f (xi ). . (2. .42) must satisfy the 2n + 2 conditions:   Ai (xj ) = δij . the polynomials p(x) in (2. 2 li (xj ) = 0. .43) 2 δij = Ai (xj ) = ri (xj )li (xj ) = ri (xj )δij . 25 (2. Bi (xj ) = δij . Levy 2. x i − xj satisfy li (xj ) = δij . Now.43) assuming that we limit its degree to be 2n + 1.2 2. We already know that n In order to satisfy the interpolation conditions (2. Bi (xj ) = 0. which means that the degree of li (x) is 2n. . n.

. p(xi ) = f (xi ). then ∀x ∈ [a. p(x) = i=0 (2. (2. the conditions (2.2. equations (2.46) and (2.46) and 2 2 δij = Bi (xj ) = si (xj )li (xj ) + 2si (xj )(li (xj )) =⇒ si (xi ) = 1.47). . xn be distinct nodes in [a. Levy 0 = Ai (xj ) = ri (xj )[li (xj )]2 + 2ri (xj )li (xJ )li (xj ) = ri (xj )δij + 2ri (xj )δij li (xj ).43) imply that 2 0 = Bi (xj ) = si (xj )li (xj ) =⇒ si (xi ) = 0. p (xi ) = f (xi ). If p ∈ Π2n+1 .45) b = 1 + 2li (xi )xi . Theorem 2. the Lagrange form of the Hermite interpolation polynomial is given by n n 2 f (xi )[1 + 2li (xi )(xi − x)]li (x) + 2 f (xi )(x − xi )li (x). As of Bi (x) in (2. Assuming that ri (x) is linear. To summarize. b) such that f (2n+2) (ξ) f (x) − p(x) = (2n + 2)! n i=0 (x − xi )2 . D. such that ∀0 i n.49) . ri (x) = ax + b. b].44).47) Combining (2. and thus ri (xi ) + 2li (xi ) = 0.(2.48) i=0 The error in the Hermite interpolation (2.48) is given by the following theorem.19 Let x0 . we obtain si (x) = x − xi . Therefore 2 Ai (x) = [1 + 2li (xi )(xi − x)]li (x).9 Hermite Interpolation Also. there exists ξ ∈ (a. imply that a = −2li (xi ). b] and f ∈ C 2n+2 [a. b]. .42). so that 2 Bi (x) = (x − xi )li (x). 26 (2. . (2.45). (2.

λ= f (x) − p(x) . say ξ. and by induction. x 1 − x0 x − x0 x 1 − x0 2 According to (2. . p(2n+2) (ξ) = 0. Similarly. Levy 2. . b]. . . Example 2.48). p (x1 ) = d1 . If x is one of the interpolation points. x 1 − x0 l1 (x) = 1 .9 Hermite Interpolation Proof. . φ vanishes at x0 .D. Since the leading term in w(y) is x2n+2 . . Also. b]: (x. . x 0 − x1 l1 (x) = x − x0 . w(2n+2) (ξ) = (2n + 2)!. which means that φ has at least 2n + 2 zeros in [a. and select λ such that φ(x) = 0. b). w(x) φ has (at least) n + 2 zeros in [a. We recall that x was an arbitrary (non-interpolation) point and hence we have f (2n+2) (ξ) f (x) − p(x) = (2n + 2)! n i=0 (x − xi )2 . . p (x0 ) = d0 . the result trivially holds. We also have φ(y) = f (y) − p(y) − λw(y). we know that φ has (at least) n + 1 zeros that are different than (x. xn . φ(2n+2) has at least one zero in (a. x 0 − x1 l0 (x) = 1 . since p(x) ∈ Π2n+1 . .20 Assume that we would like to find the Hermite interpolation polynomial that satisfies: p(x0 ) = y0 .. . p(x1 ) = y1 . i. We thus fix x as a non-interpolation point and define n w(y) = i=0 (y − xi )2 . b). Rolle’s theorem implies that φ has at least 2n + 1 zeros in (a.8. and l0 (x) = x − x1 . The proof follows the same techniques we used in proving Theorem 2. . the desired polynomial is given by (check!) p(x) = y0 2 1+ (x0 − x) x 0 − x1 x − x1 x 0 − x1 x − x1 x 0 − x1 2 + y1 2 +d0 (x − x0 ) + d1 (x − x1 ) 27 x − x0 x 1 − x0 2 1+ (x1 − x) x 1 − x0 2 . x0 . Hence 0 = φ(2n+2) (ξ) = f (2n+2) (ξ) − p(2n+2) (ξ) − λw(2n+2) (ξ). x0 . xn ). xn ). . In this case n = 1. . Also.e. By Rolle’s theorem.
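The Lagrange form (2.48) of the Hermite interpolant can be evaluated directly from the basis polynomials li. In the sketch below (our own naming; NumPy assumed) the derivative li'(xi) is computed as the sum of 1/(xi − xj) over j ≠ i. As a check, interpolating the values and slopes of f(x) = x³ at two nodes reproduces x³ exactly, since this cubic belongs to Π2n+1 with n = 1.

    import numpy as np

    def hermite_interpolate(x_nodes, f_vals, df_vals, t):
        # p(t) = sum_i f(x_i) A_i(t) + sum_i f'(x_i) B_i(t), with
        # A_i(t) = [1 + 2 l_i'(x_i)(x_i - t)] l_i(t)^2 and B_i(t) = (t - x_i) l_i(t)^2
        x = np.asarray(x_nodes, float)
        p = 0.0
        for i in range(len(x)):
            others = np.delete(x, i)
            li = np.prod((t - others) / (x[i] - others))    # l_i(t)
            dli = np.sum(1.0 / (x[i] - others))             # l_i'(x_i)
            p += f_vals[i] * (1 + 2 * dli * (x[i] - t)) * li**2
            p += df_vals[i] * (t - x[i]) * li**2
        return p

    # values and slopes of f(x) = x^3 at x = 0 and x = 1
    print(hermite_interpolate([0.0, 1.0], [0.0, 1.0], [0.0, 3.0], 0.5))   # 0.125 = 0.5^3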

5: A piecewise-linear spline. A simple example of such interpolants will be the function we get by connecting data with straight lines (see Figure 2.e. . 28 .5). should be thought of as polynomials on subintervals that are connected in a “smooth way”. Overall it is continuous where the regularity is lost at the knots You may still wonder why are we interested in such functions at all? It is easy to motivate this discussion by looking at Figure 2. we would like to generate functions that are somewhat smoother than piecewise-linear functions.10 Spline Interpolation D. Even though the data that we interpolate has only one extrema in the domain. we pick n + 1 points which we refer to as the knots: t0 < t1 < · · · < tn . f(x1 )) (x3 . In every subinterval the function is linear. i.. A spline of degree k having knots t0 . tn is a function s(x) that satisfies the following two properties: 1. That is why we focus our attention in this section on splines. Levy 2. Of course. k. s(x) is a polynomial on every 2. In this section we discuss a different type of interpolation: piecewise-polynomial interpolation. f(x0 )) x Figure 2.6. Smoothness: s(x) has a continuous (k − 1)th derivative on the interval [t0 . ti ) s(x) is a polynomial of degree subinterval that is defined by the knots. and still interpolate the data.10 Spline Interpolation So far. tn ]. The functions we will discuss in this section are splines. . In general. which rules them as non-practical for many applications. we have no control over the oscillatory nature of the high-order interpolating polynomial.2. On [ti−1 . f(x3 )) (x4 . f(x4 )) (x2 . (x1 . In this figure we demonstrate what a high-order interpolant looks like. First. f(x2 )) (x0 . high-order polynomials are oscillatory. . We will be more rigorous when we define precisely what we mean by smooth. . Splines. the only type of interpolation we were dealing with was polynomial interpolation.

6: An interpolant “goes bad”. Q10 (x) (t1 .7: A zeroth-order (piecewise-constant) spline. f(t1 )) (t3 . f(t2 )) (t0 . Levy 2.D.10 Spline Interpolation 2 1. f(t4 )) (t2 .5 −5 −4 −3 −2 −1 0 1 2 3 4 5 x Figure 2. the function is not even continuous 29 . The knots are at the interpolation points. Since the spline is of degree zero. f(t3 )) (t4 . In this example we interpolate 11 equally spaced 1 samples of f (x) = 1+x2 with a polynomial of degree 10.5 Q10(x) 1 0.5 0 1 1+x2 −0. f(t0 )) x Figure 2.
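The behavior shown in Figure 2.6 is easy to reproduce: interpolating f(x) = 1/(1 + x²) at 11 equally spaced points in [−5, 5] gives a degree-10 polynomial whose maximum error is close to 2 near the ends of the interval, even though f itself stays between 0 and 1. A short check (NumPy assumed; np.polyfit is used only as a convenient way of obtaining the unique interpolant):

    import numpy as np

    f = lambda x: 1.0 / (1.0 + x**2)
    nodes = np.linspace(-5.0, 5.0, 11)            # 11 equally spaced samples
    coeffs = np.polyfit(nodes, f(nodes), 10)      # the degree-10 interpolant Q_10
    x = np.linspace(-5.0, 5.0, 2001)
    print(np.max(np.abs(f(x) - np.polyval(coeffs, x))))   # about 1.9, near the endpoints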

52) Before actually computing the spline.   s (x). 2.8). t2 ). s(x) will have to satisfy interpolation conditions that we will discuss below. . s(ti ) = yi .10. t1 ). 1 i n − 1.   s (x) = a x + b .. t ].    s1 (x). tn are called knots: these are the points that connect the different polynomials with each other. t ].    s1 (x) = a1 x + b1 . t2 ). n−1 n−1 n−1 n−1 n (see Figure 2. s(x) = .7). There are n subintervals. and in each 30 . We now assume that some data (that s(x) should interpolate) is given at the knots.. To qualify as an interpolating function.50) in addition to requiring that s(x) is continuous.  . (2. imply that si−1 (ti ) = yi = si (ti ).1 Cubic splines where ∀i.51) We also require the continuity of the first and the second derivatives. i.  . It is now obvious why the points t0 . x ∈ [t0 . i. and other choices can be made. n−1 n−1 n The interpolation conditions (2.e. (2.2. We would like to comment already at this point that knots should not be confused with the interpolation points.  . A spline of degree 1 is a piecewise-linear function that can be explicitly written as   s0 (x) = a0 x + b0 . . n − 2. . . 0 i n. the degree of si (x) is 3.5 where the knots {ti } and the interpolation points {xi } are assumed to be identical). and a function with two continuous derivatives overall (see Figure 2. x ∈ [t0 . si (ti+1 ) = si+1 (ti+1 ). si (ti+1 ) = si+1 (ti+1 ). 0 0 (2. s(x) = . i. Let’s denote such a function by s(x). A cubic spline is a spline for which the function is a polynomial of degree 3 on every subinterval. let’s check if we have enough equations to determine a unique solution for the problem. . .. x ∈ [t . .10 Spline Interpolation D. x ∈ [t .e. x ∈ [t1 . t1 ). x ∈ [t1 . . i i n − 2.   s0 (x). Sometimes it is convenient to choose the knots to coincide with the interpolation points but this is only optional.e. . .  .50) A special case (which is the most common spline function that is used in practice) is the cubic spline. Levy A spline of degree 0 is a piecewise-constant function (see Figure 2.

we have zi+1 1 zi 1 − (x − ti+1 )2 + c. The interpolation and continuity conditions (2. These indeed are two degrees of freedom that can be determined in various ways as we shall see below. We will use the following notation: hi = ti+1 − ti . si (x) = x − ti x − ti+1 zi+1 − zi .52) add 2(n − 1) = 2n − 2 equations.8: A cubic spline.. f(t2 )) (t0 . In every subinterval [ti−1 .53) once. subinterval we have to determine a polynomial of degree 3. f(t0 )) x Figure 2. The continuity of the first and the second derivatives (2. Levy 2. ti . ˜ si (x) = (x − ti )2 2 hi 2 hi 31 . Each such polynomial has 4 coefficients.51) for si (ti ) and si (ti+1 ) amount to 2n equations. In this example we use the not-a-knot condition. We are now ready to compute the spline.10 Spline Interpolation (t1 . hi hi (2. Altogether we have 4n − 2 equations but 4n unknowns which leaves us with 2 degrees of freedom. f(t3 )) (t4 . f(t1 )) (t3 . i. We also set zi = s (ti ).53) Integrating (2.e. we observe that si (x) is the line connecting (ti . The polynomials on the different subintervals are connected to each other in such a way that the spline has a second-order continuous derivative. zi+1 ). f(t4 )) (t2 .D. zi ) and (ti+1 . Since the second derivative of a cubic function is linear. which leaves us with 4n coefficients to determine. the function is a polynomial of degree 2.

Levy Similarly. si (ti ) = si−1 (ti ). 6 3 hi−1 h i−1 si (ti ) = − Hence. (x − ti )2 − (ti+1 − x)2 + − 2hi 2hi hi 6 hi 6 yi−1 zi−1 hi−1 zi zi−1 yi zi . s(ti ) = yi .54) hi−1 hi + hi−1 hi 1 1 zi−1 + zi + zi+1 = (yi+1 − yi ) − (yi − yi−1 ). zn . . We first compute si (x) and si−1 (x): yi zi hi zi yi+1 zi+1 zi+1 hi − + .. we obtain the system of equations (2. We can set z1 . C= yi+1 zi+1 − hi . si (x) = 6hi 6hi The interpolation condition. zn−1 using the continuity conditions on s (x). i. 6 3 6 hi hi−1 32 . D= yi zi hi − . . 6hi i D. .e. implies that yi = i. implies that yi+1 = i. z0 .e. hi 6 zi+1 3 h + Chi .. 6hi 6hi hi 6 hi 6 All that remains to determine is the second derivatives of s(x).10 Spline Interpolation Integrating again zi zi+1 (x − ti )3 + (ti+1 − x)3 + C(x − ti ) + D(ti+1 − x). . 3 6 hi hi zi 2 yi−1 zi−1 hi−1 yi zi si−1 (ti ) = hi−1 + − hi−1 − + 2hi−1 hi−1 6 hi−1 6 hi−1 hi−1 yi−1 yi = zi−1 + zi − + .e. si−1 (x) = (x − ti−1 )2 − (ti − x)2 + − hi−1 − + 2hi−1 2hi−1 hi−1 6 hi−1 6 si (x) = So that zi 2 yi+1 zi+1 yi zi hi hi + − hi − + 2hi hi 6 hi 6 hi hi yi yi+1 = − zi − zi+1 − + . .. hi 6 This means that we can rewrite si (x) as si (x) = yi+1 zi+1 zi+1 yi zi zi hi (x−ti )+ (x−ti )3 + (ti+1 −x)3 + − − hi (ti+1 −x). .2. si (ti+1 ) = yi+1 . for 1 i n − 1. . . 6hi i zi 3 h + Dhi .

t1 . .  . In the special case where the points are equally spaced. .. hn−3 6 h2 6 . 33 . The not-a-knot condition. = (2. tridiagonal. y1 −y0 h0 y2 −y1 h1  The coefficients matrix is symmetric. the only way to proceed is by making an arbitrary choice.   y2 −y1 h1 y3 −y2 h2 − − . tn . (2. there are other standard options: 1. One option is to set the end values to zero. . . i. . . The spline that is plotted with a dashed line is obtained by setting the derivatives at both end-points to zero. . . Figure 2...10 Spline Interpolation These are n − 1 equations for the n + 1 unknowns. 2      h yn−1 − 2yn−2 + yn−3   1 4 1 zn−2  yn − 2yn−1 + yn−2 1 4 zn−1 In addition to the natural spline (2. . tn−1 .  .e. z0 .55) This choice of the second derivative at the end points leads to the so-called. ... . . We will explain later in what sense this spline is “natural”. tn−2 . which means that it can always be (efficiently) inverted. The spline that is plotted with a solid line is the not-a-knot spline. and diagonally dominant (i.D. Without any additional information about the problem..56)    . 2. If the values of the derivatives at the endpoints are known. . tn−1 . hi = h. zn ... In this case we end up with a cubic spline with knots t0 . . we require the third-derivative s(3) (x) to be continuous at the points t1 . . Here. . The interpolation requirements are still satisfied at t0 . . . i. tn . t3 .e. . In this case. . hn−3 +hn−2 3 hn−2 6               =    yn−1 −yn−2 yn−2 −yn−3  hn−2  zn−2   hn−2 − hn−3  6 yn −yn−1 hn−2 +hn−1 zn−1 − yn−1 −yn−2 h h 3 n−1 n−2  z1 z2 . ∀i). The points t1 and tn−1 no longer function as knots. |aii | > n j=1. z0 = zn = 0.  ... . ∀i. we end up with the following linear system of equations  h0 +h1       3 h1 6 h1 6 h1 +h2 3 .9 shows two different cubic splines that interpolate the same initial data. t2 . the system becomes      y2 − 2y1 + y0 4 1 z1  y3 − 2y2 + y1  1 4 1   z2      6     .. Levy 2. natural cubic spline. which means that we have 2 degrees of freedom. . There are several standard ways to proceed.55). s (tn ) = yn .j=i |aij |.. . one can specify them s (t0 ) = y0 . .e.
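For the natural choice z0 = zn = 0, the system (2.54) for the interior second derivatives is tridiagonal and can be assembled and solved directly. A minimal sketch (our own naming; NumPy assumed; a dense solve is used here for brevity, although a dedicated tridiagonal solver would be preferable):

    import numpy as np

    def natural_spline_moments(t, y):
        # solve (2.54) for z_1, ..., z_{n-1}, with the natural conditions z_0 = z_n = 0
        t, y = np.asarray(t, float), np.asarray(y, float)
        n = len(t) - 1
        h = np.diff(t)
        A = np.zeros((n - 1, n - 1))
        rhs = np.zeros(n - 1)
        for i in range(1, n):
            A[i - 1, i - 1] = (h[i - 1] + h[i]) / 3.0
            if i > 1:
                A[i - 1, i - 2] = h[i - 1] / 6.0
            if i < n - 1:
                A[i - 1, i] = h[i] / 6.0
            rhs[i - 1] = (y[i + 1] - y[i]) / h[i] - (y[i] - y[i - 1]) / h[i - 1]
        z = np.zeros(n + 1)
        z[1:n] = np.linalg.solve(A, rhs)
        return z        # the z_i determine each cubic piece s_i(x) through the formulas above

    t = np.linspace(0.0, 4.0, 5)
    print(natural_spline_moments(t, [0.0, 1.0, 0.0, 1.0, 0.0]))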

Then since s(x) interpolates f (x) at the knots {ti } their difference vanishes at these points. which will conclude the proof as the other two terms on the right-hand-side of (2.2.21 Assume that f (x) is continuous in [a. g(ti ) = 0. f(t1 )) (t3 . b]. Define g(x) = f (x) − s(x). f(t3 )) (t4 .9: Two cubic splines that interpolate the same data. (2. Proof..57) are 34 . Dashed line: the derivative is set to zero at both end-points 2. Solid line: a not-a-knot spline.e. Now b b b b 0 i n. f(t4 )) (t2 . If s(x) is the natural cubic spline interpolating f (x) at the knots {ti } then b b (s (x)) dx a a 2 (f (x))2 dx. and let a = t0 < t1 < · · · < tn = b.57) We will show that the last term on the right-hand-side of (2.57) is zero. Theorem 2. we are minimizing the L2 -norm of the secondderivative not only with respect to the “original” function which we are interpolating. but with respect to any function that interpolates the data (and has a continuous secondderivative). Levy (t1 . i.2 What is natural about the natural spline? The following theorem states that the natural spline can not have a larger L2 -norm of the second-derivative than the function it interpolates (assuming that that function has a continuous second-derivative). we refer to the natural spline as “natural”. (f )2 dx = a a (s )2 dx + a (g )2 dx + 2 a s g dx. In fact. f(t2 )) (t0 . In that sense.10 Spline Interpolation D. f(t0 )) x Figure 2.10.

Levy 2. we end up with b a n ti ti−1 n ti n s g dx = − i=1 s g dx = − ci i=1 ti−1 g dx = − ci (g(ti )−g(ti−1 )) = 0. 35 .10 Spline Interpolation non-negative.D. and since s (x) is constant on [ti−1 . ti−1 Since we are dealing with the “natural” choice s (t0 ) = s (tn ) = 0. Splitting that term into a sum of integrals on the subintervals and integrating by parts on every subinterval. b From that point of view. we have b n ti n ti ti s g dx = a i=1 ti−1 s g dx = i=1 (s g ) ti−1 − s g dx . minimizing a (f (x))2 dx. ti ] (say ci ). i=1 We note that f (x) can be viewed as a linear approximation of the curvature |f (x)| 3 (1 + (f (x))2 ) 2 . can be viewed as finding the curve with a minimal |f (x)| over an interval.

We can therefore define the L∞ norm (also known as the maximum norm) of such a function by f ∞ = max |f (x)|. g(x) ∈ C 0 [a. 2. a x b (3. 3. Levy 3 3.3) to make sense. We will focus on two such measurements (among many): the L∞ norm and the L2 -norm. we do not have to assume that f (x) is continuous for the definition (3. b] (continuous on [a. We start with several definitions. We recall that a norm on a vector space V over R is a function · : V → R with the following properties: 1. However. we will typically assume that g(x) is a polynomial of a given degree (though it can be a trigonometric function. if we allow f (x) to be discontinuous.1) The L∞ -distance between two functions f (x). ∀f ∈ V . b]). will therefore be: find the “closest” polynomial of degree n to f (x). Of course. we will limit ourselves to continuous functions.D. a x b (3. (3. This generalization requires some subtleties that we would like to avoid in the following discussion.2) We note that the definition of the L∞ -norm can be extended to functions that are less regular than continuous functions. We proceed by defining the L2 -norm of a continuous function f (x) as b f 2 = a |f (x)|2 dx. The triangle inequality: f + g We assume that the function f (x) ∈ C 0 [a. What do we mean by “close”? There are different ways of measuring the “distance” between two functions. As far as the class of functions that g(x) belongs to. Generally speaking. A continuous function on a closed interval obtains a maximum in the interval. f ∀λ ∈ R and ∀f ∈ V .3) The L2 function space is the collection of functions f (x) for which f 2 < ∞. f + g . hence. b] is thus given by f −g ∞ = max |f (x) − g(x)|. starting from a function f (x) we would like to find a different function g(x) that belongs to a given class of functions and is “close” to f (x) in some sense.1 Approximations Background In this chapter we are interested in approximation problems. ∀f. g ∈ V . We chose to focus on these two examples because of the different mathematical techniques that are required to solve the corresponding approximation problems. we then have to be more 36 . or any other function). λ f = |λ| f . 0. A typical approximation problem. Also f = 0 iff f is the zero element of V .

Loosely speaking. The following theorem. 1]. It is easy to see that the value of the norm of a function may vary substantially based on the function as well as the choice of the norm. the choice of norm may have a significant impact on the solution of the approximation problem. For example. On the other hand. Then. For example. The L2 -distance between two functions f (x) and g(x) is b f −g 2 = a |f (x) − g(x)|2 dx. Then there exists a sequence of polynomials Pn (x) that converges uniformly to f (x) on [a.e. We formulate this theorem in the L∞ norm and note that a similar theorem holds also in the L2 sense. ∀ε > 0. We therefore limit ourselves also in this case to continuous functions only. it is easy to construct a function with an arbitrary small f 2 and an arbitrarily large f ∞ . ∀n N. this theorem states that any continuous function can be approached as close as we want to with polynomials. e. assume that f ∞ < ∞. b].. a natural question is how important is the choice of norm in terms of the solution of the approximation problem. We start with the definition. Given a continuous function f (x) in [0. and then we will show that they uniformly converge to f (x). assuming that the polynomials can be of any degree. Is that the best we can do? Sometimes the answer is positive. there is a strong connection between some approximation problems and interpolation problems. Theorem 3. We will provide a constructive proof of the Weierstrass approximation theorem: first. As you have probably already anticipated..1 Background rigorous in terms of the definition of the interval so that we end up with a norm (the problem is. plays a central role in any discussion of approximations of functions. such that ∀x ∈ [a. j 37 0 x 1. we will define a family of polynomials. Levy 3. clearly b f 2 = a |f |2 dx ≤ (b − a) f ∞. in defining what is the “zero” element in the space).g. . the Weierstrass approximation theorem. i.1 (Weierstrass Approximation Theorem) Let f (x) be a continuous function on [a. Hence. we define the Bernstein polynomials as n (Bn f )(x) = j=0 f j n n j x (1 − x)n−j . but the problem still remains difficult because we have to determine the best sampling points. b] |f (x) − Pn (x)| < ε. there exists an N ∈ N and polynomials Pn (x) ∈ Πn .D. known as the Bernstein polynomials. (3. We let Πn denote the space of polynomials of degree n.4) At this point. one possible method of constructing an approximation to a given function is by sampling it at certain points and then interpolating the sampled data. b]. We will address these issues in the following sections.
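The Bernstein polynomials (3.4) are straightforward to evaluate, and doing so illustrates both the uniform convergence promised by Theorem 3.1 and how slow that convergence is. A brief sketch (our own naming; NumPy and math.comb assumed), applied to the function of Example 3.2 below:

    import numpy as np
    from math import comb

    def bernstein(f, n, x):
        # (B_n f)(x) = sum_j f(j/n) C(n,j) x^j (1-x)^(n-j), as in (3.4)
        x = np.asarray(x, float)
        return sum(f(j / n) * comb(n, j) * x**j * (1 - x)**(n - j)
                   for j in range(n + 1))

    f = lambda x: 1.0 / (1.0 + 10.0 * (x - 0.5)**2)
    x = np.linspace(0.0, 1.0, 201)
    for n in (6, 10, 20, 100):
        print(n, np.max(np.abs(bernstein(f, n, x) - f(x))))   # the error decreases, but slowly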

We emphasize that the Bernstein polynomials depend on the function $f(x)$.

Example 3.2
Three Bernstein polynomials $B_6(x)$, $B_{10}(x)$, and $B_{20}(x)$ for the function
\[
  f(x) = \frac{1}{1 + 10(x - 0.5)^2}
\]
on the interval $[0,1]$ are shown in Figure 3.1. Note the gradual convergence of $B_n(x)$ to $f(x)$.

[Figure 3.1: The Bernstein polynomials $B_6(x)$, $B_{10}(x)$, and $B_{20}(x)$ for $f(x) = 1/(1+10(x-0.5)^2)$ on $[0,1]$.]

We now state and prove several properties of $B_n(x)$ that will be used when we prove Theorem 3.1.

Lemma 3.3 The following relations hold:
1. $(B_n 1)(x) = 1$
2. $(B_n x)(x) = x$
3. $(B_n x^2)(x) = \dfrac{n-1}{n}\,x^2 + \dfrac{x}{n}$.

Proof.
\[
  (B_n 1)(x) = \sum_{j=0}^{n} \binom{n}{j} x^j (1-x)^{n-j} = \bigl(x + (1-x)\bigr)^n = 1.
\]

j n 2 n j = j (n − 1)! (n − 1)! (n − 1)! n−1j −1 1 = + n (n − j)!(j − 1)! n − 1 n (n − j)!(j − 1)! n (n − j)!(j − 1)! 1 n−1 n−1 n−2 + . g(x) ∀x ∈ [0.D. and ∀α ∈ R 1.1 Background j n j x (1 − x)n−j = x n j j=1 n−1 n (Bn x)(x) = j=0 n − 1 j−1 x (1 − x)n−j j−1 =x j=0 n−1 j x (1 − x)n−1−j = x[x + (1 − x)]n−1 = x.1. if |f (x)| |(Bn f )(x)| 3. Theorem 3.4 For all functions f (x). Monotonicity. If f (x) (Bn f )(x) 0. 1]. 1] then (Bn g)(x). j Finally. Lemma 3. 1]. n n n n 1 n − 2 j−2 x (1 − x)n−j + x n j=1 j−2 n n − 1 j−1 x (1 − x)n−j j−1 In the following lemma we state several additional properties of the Bernstein polynomials. Positivity. Linearity. (Bn (αf + g))(x) = α(Bn f )(x) + (Bn g)(x). 0 then We are now ready to prove the Weierstrass approximation theorem. 2. If f (x) (Bn f )(x) Also. Levy n 3. g(x) that are continuous in [0. The proof is left as an exercise. g(x) ∀x ∈ [0. = j−2 n n j−1 j n 2 Hence n (Bn x )(x) = j=0 2 n j x (1 − x)n−j j n n−1 2 = x n = j=2 1 n−1 2 x n−1 2 x (x + (1 − x))n−2 + x(x + (1 − x))n−1 = x + . 39 . then (Bn g)(x).

40 . Since f (x) is continuous on a closed interval. we have. 1]. 2δ 2 n (3. Hence ∀x. = ε + 2 (x − a)2 + 2 δ δ n Bn ε + n−1 2 x x + − 2ax + a2 n n Evaluating at x = a we have (observing that maxa∈[0. Choosing M N we have ∀n N . 1]. We will use it later on to our advantage).6) holds for any point a ∈ [0. The extension to [a. The linearity of Bn and the property (Bn 1)(x) = 1 imply that Bn (f − f (a))(x) = (Bn f )(x) − f (a). it is also bounded. Let M = max |f (x)|. Discuss Runge’s example.. δ2 We would now like to estimate the difference between Bn f and f .. x∈[0. it is uniformly continuous. Combining the estimates for both cases we have |f (x) − f (a)| ε+ 2M (x − a)2 .1 Background D.5) In addition. 1].1] (a − a2 ) = 1 ) 4 |Bn f (a) − f (a)| ε+ 2M a − a2 δ2 n ε+ M .6) The point a was arbitrary so the result (3. 2δ 2 ε Bn f − f ∞ ε+ M 2δ 2 N 2ε. Hence using the monotonicity of Bn and the mapping properties of x and x2 . We will prove the theorem in the interval [0. (3. Levy Proof.5) holds. since f (x) is continuous on a closed interval. If |x − a| |f (x) − f (a)| 2M 2M δ then (3. b] is left as an exercise. y ∈ [0. • Is interpolation a good way of approximating functions in the ∞-norm? Not necessarily. |Bn f (x) − f (a)| 2M 2M (x − a)2 = ε + 2 2 δ δ 2M x − x2 2M . |f (x) − f (y)| ε.1] Fix any point a ∈ [0. 1]. such that |x − y| δ. (at first sight this seems to be a strange way of upper bounding a function. If |x − a| > δ then x−a δ 2 .3.

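The constructive proof above can be turned directly into a computation. The following Python sketch (ours, not part of the notes; it assumes Python 3.8 or later for math.comb, and the sample function is the one from Example 3.2 above) evaluates $(B_n f)(x)$ on a grid and reports the maximum error, illustrating the uniform convergence guaranteed by Theorem 3.1.

import math

def bernstein(f, n, x):
    # (B_n f)(x) = sum_{j=0}^n f(j/n) * C(n,j) * x^j * (1-x)^(n-j)
    return sum(f(j / n) * math.comb(n, j) * x**j * (1 - x)**(n - j)
               for j in range(n + 1))

f = lambda x: 1.0 / (1.0 + 10.0 * (x - 0.5)**2)      # the function of Example 3.2

grid = [i / 200 for i in range(201)]
for n in (6, 10, 20, 80, 320):
    print(n, max(abs(f(x) - bernstein(f, n, x)) for x in grid))

The observed errors decay slowly with n, which is consistent with the 1/n factor that appears in the estimate at the end of the proof.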

3.2  The Minimax Approximation Problem

We assume that the function $f(x)$ is continuous on $[a,b]$, and assume that $P_n(x)$ is a polynomial of degree $\le n$. We recall that the $L^\infty$-distance between $f(x)$ and $P_n(x)$ on the interval $[a,b]$ is given by
\[
  \|f - P_n\|_\infty = \max_{a \le x \le b} |f(x) - P_n(x)|.   \tag{3.7}
\]

Clearly, we can construct polynomials that have an arbitrarily large distance from $f(x)$. The question we would like to address is how close we can get to $f(x)$ (in the $L^\infty$ sense) with polynomials of a given degree. We define $d_n(f)$ as the infimum of (3.7) over all polynomials of degree $\le n$, i.e.,
\[
  d_n(f) = \inf_{P_n \in \Pi_n} \|f - P_n\|_\infty.   \tag{3.8}
\]
The goal is to find a polynomial $P_n^*(x)$ for which the infimum (3.8) is actually attained, i.e.,
\[
  d_n(f) = \|f - P_n^*\|_\infty.   \tag{3.9}
\]
We will refer to a polynomial $P_n^*(x)$ that satisfies (3.9) as a polynomial of best approximation or the minimax polynomial. The minimal distance in (3.9) will be referred to as the minimax error.

The theory we will explore in the following sections will show that the minimax polynomial always exists and is unique. We will also provide a characterization of the minimax polynomial that will allow us to identify it if we actually see it. The general construction of the minimax polynomial will not be addressed in this text as it is relatively involved technically. We will limit ourselves to simple examples.

Example 3.5
We let $f(x)$ be a monotonically increasing and continuous function on the interval $[a,b]$ and are interested in finding the minimax polynomial of degree zero to $f(x)$ in that interval. We denote this minimax polynomial by
\[
  P_0^*(x) \equiv c.
\]
Clearly, the smallest distance between $f(x)$ and $P_0^*$ in the $L^\infty$-norm will be obtained if
\[
  c = \frac{f(a) + f(b)}{2}.
\]
The maximal distance between $f(x)$ and $P_0^*$ will be attained at both edges and will be equal to
\[
  \pm\,\frac{f(b) - f(a)}{2}.
\]

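The characterization in Example 3.5 is easy to confirm numerically. A small sketch (ours; the choice $f(x) = e^x$ on $[0,1]$ is arbitrary, any monotone continuous function works):

import math

f, a, b = math.exp, 0.0, 1.0
grid = [a + (b - a) * i / 1000 for i in range(1001)]
max_err = lambda c: max(abs(f(x) - c) for x in grid)

c_star = (f(a) + f(b)) / 2                    # the minimax constant of Example 3.5
print(max_err(c_star), (f(b) - f(a)) / 2)     # both print (essentially) the same minimax error
print(max_err(c_star) < max_err(c_star + 0.05), max_err(c_star) < max_err(c_star - 0.05))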
3.2.1  Existence of the minimax polynomial

The existence of the minimax polynomial is provided by the following theorem.

Theorem 3.6 (Existence) Let $f \in C^0[a,b]$. Then for any $n \in \mathbb{N}$ there exists $P_n^*(x) \in \Pi_n$ that minimizes $\|f - P_n\|_\infty$ among all polynomials $P_n(x) \in \Pi_n$.

Proof. We follow the proof as given in [7]. Let $\eta = (\eta_0, \ldots, \eta_n)$ be an arbitrary point in $\mathbb{R}^{n+1}$ and let
\[
  P_n(x) = \sum_{i=0}^{n} \eta_i x^i \in \Pi_n.
\]
We also let
\[
  \phi(\eta) = \phi(\eta_0, \ldots, \eta_n) = \|f - P_n\|_\infty.
\]
Our goal is to show that $\phi$ attains a minimum in $\mathbb{R}^{n+1}$, i.e., that there exists a point $\eta^* = (\eta_0^*, \ldots, \eta_n^*)$ such that
\[
  \phi(\eta^*) = \min_{\eta \in \mathbb{R}^{n+1}} \phi(\eta).
\]

Step 1. We first show that $\phi(\eta)$ is a continuous function on $\mathbb{R}^{n+1}$. For an arbitrary $\delta = (\delta_0, \ldots, \delta_n) \in \mathbb{R}^{n+1}$, define
\[
  q_n(x) = \sum_{i=0}^{n} \delta_i x^i.
\]
Then
\[
  \phi(\eta + \delta) = \|f - (P_n + q_n)\|_\infty \le \|f - P_n\|_\infty + \|q_n\|_\infty = \phi(\eta) + \|q_n\|_\infty.
\]
Hence
\[
  \phi(\eta + \delta) - \phi(\eta) \le \|q_n\|_\infty \le \max_{x \in [a,b]} \bigl( |\delta_0| + |\delta_1||x| + \ldots + |\delta_n||x|^n \bigr).
\]
For any $\varepsilon > 0$, let $\tilde{\delta} = \varepsilon/(1 + c + \ldots + c^n)$, where $c = \max(|a|,|b|)$. Then for any $\delta = (\delta_0, \ldots, \delta_n)$ such that $\max_i |\delta_i| \le \tilde{\delta}$, $0 \le i \le n$,
\[
  \phi(\eta + \delta) - \phi(\eta) \le \varepsilon.   \tag{3.10}
\]
Similarly
\[
  \phi(\eta) = \|f - P_n\|_\infty = \|f - (P_n + q_n) + q_n\|_\infty \le \|f - (P_n + q_n)\|_\infty + \|q_n\|_\infty = \phi(\eta + \delta) + \|q_n\|_\infty,
\]
which implies that under the same conditions as in (3.10) we also get
\[
  \phi(\eta) - \phi(\eta + \delta) \le \varepsilon.
\]
Altogether,
\[
  |\phi(\eta + \delta) - \phi(\eta)| \le \varepsilon,
\]
which means that $\phi$ is continuous at $\eta$. Since $\eta$ was an arbitrary point in $\mathbb{R}^{n+1}$, $\phi$ is continuous in the entire $\mathbb{R}^{n+1}$.


Step 2. We now construct a compact set in $\mathbb{R}^{n+1}$ on which $\phi$ attains a minimum. We let
\[
  S = \bigl\{ \eta \in \mathbb{R}^{n+1} \,:\, \phi(\eta) \le \|f\|_\infty \bigr\}.
\]
We have
\[
  \phi(0) = \|f\|_\infty,
\]
hence $0 \in S$, and the set $S$ is nonempty. We also note that the set $S$ is bounded and closed (check!). Since $\phi$ is continuous on the entire $\mathbb{R}^{n+1}$, it is also continuous on $S$, and hence it must attain a minimum on $S$, say at $\eta^* \in \mathbb{R}^{n+1}$, i.e.,
\[
  \min_{\eta \in S} \phi(\eta) = \phi(\eta^*).
\]

Step 3. Since $0 \in S$, we know that
\[
  \min_{\eta \in S} \phi(\eta) \le \phi(0) = \|f\|_\infty.
\]
Hence, if $\eta \in \mathbb{R}^{n+1}$ but $\eta \notin S$, then
\[
  \phi(\eta) > \|f\|_\infty \ge \min_{\eta \in S} \phi(\eta).
\]
This means that the minimum of $\phi$ over $S$ is the same as the minimum over the entire $\mathbb{R}^{n+1}$. Therefore
\[
  P_n^*(x) = \sum_{i=0}^{n} \eta_i^* x^i   \tag{3.11}
\]
is the best approximation of $f(x)$ in the $L^\infty$-norm on $[a,b]$, i.e., it is the minimax polynomial, and hence the minimax polynomial exists.

We note that the proof of Theorem 3.6 is not a constructive proof. The proof does not tell us what the point $\eta^*$ is, and hence we do not know the coefficients of the minimax polynomial as written in (3.11). We will discuss the characterization of the minimax polynomial and some simple cases of its construction in the following sections.

3.2.2  Bounds on the minimax error

It is trivial to obtain an upper bound on the minimax error, since by the definition of $d_n(f)$ in (3.8) we have
\[
  d_n(f) \le \|f - P_n\|_\infty, \qquad \forall P_n(x) \in \Pi_n.
\]
A lower bound is provided by the following theorem.

n + 1. Proof. Assume for some Qn (x) that f − Qn ∞ < min |ej |. The sum is non oscillatory. 3. min |ej | j D.7) e ∗ implies that D dn .7 (de la Vall´e-Poussin) Let a e be a polynomial of degree n.e. On the other hand. We replace the infimum in the original definition of dn (f ) by a minimum because we already know that a minimum exists. Let Pn (x) j = 0. By contradiction. where all ej = 0 and are of an identical sign. suppose that ∗ ∗ (f − Pn )(xi ) = (−1)i f − Pn ∞. and hence it has at least n + 1 zeros. . the definition of dn implies that dn D∗ .2 The Minimax Approximation Problem Theorem 3. Levy x0 < x1 < · · · < xn+1 b. For the necessary part of the theorem we refer to [7].8 (The oscillating theorem) Suppose that f (x) is continuous in [a. .3 Characterization of the minimax polynomial The following theorem provides a characterization of the minimax polynomial in terms of its oscillations property. Let ∗ D∗ = f − Pn ∞.2. ∗ ∗ Hence D = dn and Pn (x) is the minimax polynomial. if Pn (x) ≡ Qn (x). Remark. is a polynomial of degree n that has the same sign at xj as does f (x) − Pn (x). This implies that (Qn − Pn )(x) changes sign at least n + 2 times. Suppose that f (xj ) − Pn (xj ) = (−1)j ej . j Then the polynomial (Qn − Pn )(x) = (f − Pn )(x) − (f − Qn )(x).. We prove here only the sufficiency part of the theorem. which contradicts the assumptions on Qn (x) and Pn (x). Without loss of generality. Then dn (f ). b]. .3. and let dn (f ) = min Pn ∈Πn f − Pn ∞. 44 . b]. i. In view of these theorems it is obvious why the Taylor expansion is a poor uniform approximation. Proof. ∗ The polynomial Pn (x) ∈ Πn is the minimax polynomial of degree n to f (x) in [a. b] if ∗ ∗ and only if f (x) − Pn (x) assumes the values ± f − Pn ∞ with an alternating change of sign at least n + 2 times in [a. 0 i n + 1. . Theorem 3. de la Vall´e-Poussin’s theorem (Theorem 3. Being a polynomial of degree n this is possible only if it is identically zero.

8) implies that there exist x0 . The triangle inequality implies that 1 ∗ f − (Pn + Qn ) 2 ∞ ≤ 1 ∗ f − Pn 2 ∞ + 1 f − Qn 2 ∞ = dn (f ).. ∗ (Pn − Qn )(xi ) = 0.. 0 i n + 1. i.4 3. (3. (3.14) and |f (xi ) − Qn (xi )| ≤ f − Qn ∞ = dn (f ).2. Assume that Qn (x) is also a minimax polynomial. The oscillating theorem 2 (Theorem 3. 0 i n + 1. Hence ∗ Qn (x) ≡ Pn (x). 2 Equation (3.12) ∗ |f (xi ) − Pn (xi ) + f (xi ) − Qn (xi )| = 2dn (f ). xn+1 ∈ [a. equations (3.e.13)–(3. .e. (3. 1 (Pn + Qn ) ∈ Πn is also a minimax polynomial. ∗ f (xi ) − Pn (xi ) = f (xi ) − Qn (xi ). Levy 3. Then its minimax poly∗ nomial Pn (x) ∈ Πn is unique.12) can be rewritten as 0 i n + 1. 0 i n + 1. 0 i n + 1. Proof. the polynomial (Pn − Qn )(x) ∈ Πn has n + 2 distinct roots which is possible for a polynomial of degree n only if it is identically zero. Then ∗ f − Pn ∞ = f − Qn ∞ = dn (f ).13) ∗ Since Pn (x) and Qn (x) are both minimax polynomials. . ∗ Hence.D. This is possible only if they are equal to each other. i. . 0 i n + 1. (3. 45 . Let dn (f ) = min Pn ∈Πn f − Pn ∞. b] such that 1 ∗ |f (xi ) − (Pn (xi ) + Qn (xi ))| = dn (f ).15) For any i. we have ∗ ∗ |f (xi ) − Pn (xi )| ≤ f − Pn ∞ = dn (f ).15) mean that the absolute value of two numbers that are dn (f ) add up to 2dn (f ). b]. . and the uniqueness of the minimax polynomial is established.2 The Minimax Approximation Problem Uniqueness of the minimax polynomial Theorem 3. ∗ Hence.9 (Uniqueness) Let f (x) be continuous on [a.

2 The Minimax Approximation Problem 3. We are not going to deal with the construction of the minimax polynomial in the general case. i. . we can think of Pn (x) as a function that interpolates f (x) at (least in) n + 1 points. but nevertheless. .2. as done in the following example.6 Construction of the minimax polynomial The characterization of the minimax polynomial in terms of the number of points in which the maximum distance should be obtained with oscillating signs allows us to construct the minimax polynomial in simple cases by a direct computation. A simple case where we can demonstrate a direct construction of the polynomial is when the function is convex. . we will be looking ∗ ∗ for a linear function P1 (x) such that its maximal distance between P1 (x) and f (x) is 46 . 3]. there should be n + 1 points on which f (x) and Pn (x) agree with each other. . In other words. . . say x0 . We note that the term “near-minimax” does not mean that the near-minimax polynomial is actually close to the minimax polynomial. Example 3.3. We recall that interpolation at the Chebyshev points minimizes the multiplicative part of the error term. Due to the dependency of f (n+1) (ξ) on the intermediate point ξ. x ∈ [1.2. Solution: Based on the characterization of the minimax polynomial.. and we refer the interested reader to [2] and the references therein. 3. xn .16) is a difficult task. 1 f (x) − Pn (x) = f (n+1) (ξ) (x − xi ). What can we say about these points? We recall that the interpolation error is given by (2. . In order for f (x) − Pn (x) to change its sign n + 2 times. Levy We now connect between the minimax approximation problem and polynomial interpolation.5 The near-minimax polynomial D. we know that the maximum of n n f (n+1) (ξ) i=0 (x − xi ). Find the minimax polynomial of degree ∗ 1. The algorithm for doing so is known as the Remez algorithm.e.25). (3. Hence. xn to be the Chebyshev points will not result with the minimax polynomial. choosing x0 . .16) will oscillate with equal values. n i=0 (x − xi ).10 Problem: Let f (x) = ex . (n + 1)! i=0 If Pn (x) is indeed the minimax polynomial. this relation motivates us to refer to the interpolant at the Chebyshev points as the near-minimax polynomial. P1 (x). we know that minimizing the error term (3.

i. Now f (a) = elog m = m.. Clearly. 47 . The construction itself is graphically shown in Figure 3. Levy 3. the maximal distance will be obtained at both edges and at one interior point.e. e) and (3.. Since f (x) = ex .2. e3 ex l1 (x) −→ l1 (a) y ¯ f(a) ∗ P1 (x) e1 ←− l2 (x) 1 a 3 x Figure 3. We will use this observation in the construction that follows. l1 (x) = e + m(x − 1). we have ea = m. e3 ). 3] We let l1 (x) denote the line that connects the endpoints (1. 2 (3. and l1 (a) = e + m(log m − 1).17) Let l2 (x) denote the tangent to f (x) at a point a that is identified such that the slope is m. i. since the function is convex.D. a = log m.2: A construction of the linear minimax polynomial for the convex function ex on [1.e. the slope m is given by m= e3 − e . in the case of the present problem.2 The Minimax Approximation Problem obtained 3 times with alternative signs. Here.

b φ(a0 .e. . we let Πn denote the space of all polynomials of degree n. ∗ P1 (x) = mx + e − m log m .. 2 where the slope m is given by (3. to find Q∗ ∈ Πn such that n f − Q∗ n 3. . As before. an ) = a b (f (x) − Qn (x))2 dx n b n n b = a f (x)dx − 2 2 ai i=0 a x f (x)dx + i=0 j=0 i ai aj a xi+j dx.3. 3. Solving the least-squares problem: a direct method Qn (x) = i=0 ai x i .17). The leastsquares approximation problem is to find the polynomial that is the closest to f (x) in the L2 -norm among all polynomials of degree n. We thus let φ denote the square of the L2 -distance between f (x) and Qn (x). the average between f (a) and l1 (a) which we denote by y is given by ¯ y= ¯ f (a) + l1 (a) m + e + m log m − m e + m log m = = .3 Least-squares Approximations Hence. 2 2 2 D. Levy ∗ The minimax polynomial P1 (x) is the line of slope m that passes through (a.3. We note that the maximal difference between ∗ P1 (x) and f (x) is obtained at x = 1.3. ¯ ∗ P1 (x) − e + m log m = m(x − log m). we will minimize its square. We want to minimize f (x) − Qn (x) 2 among all Qn ∈ Πn .e.3 3.e..2 Let n 2 = min Qn ∈Πn f − Qn 2 . 2 i.1 Least-squares Approximations The least-squares approximation problem We recall that the L2 -norm of a function f (x) is defined as b f 2 = a |f (x)|2 dx. instead of minimizing the L2 norm of the difference. . 3. . i. i. a. For convenience.. y ). 48 .

ˆ where the coefficients ai .. the system (3. . Indeed.20) We thus know that the solution of the least-squares problem is the polynomial n Q∗ (x) n = i=0 ai x i .   1/n 1/(n + 1)     1/n 1/(n + 1) . . which proves that not only the least-squares problem has a solution. 1) =  . . .21) The matrix (3. b (Hn+1 (a.18) implies that b n b n b 0 = −2 = 2 x f (x)dx + a n b i=0 k ai ˆ a b x i+k dx + j=0 aj ˆ a xj+k dx (3. .21) is known as the Hilbert matrix. b]. . ..D. .19) ai ˆ i=0 a xi+k dx − xk f (x)dx . an ) ∈ Rn+1 for which φ obtains a minimum.e. .20). i. a This is a linear system for the unknowns (ˆ0 . 49 . . b))i. . .  . an ): a ˆ n b b ai ˆ i=0 a xi+k dx = a xk f (x).. Levy 3. but that it is also unique. This means that we want to find a point a = (ˆ0 . are the solution of (3.11 The Hilbert matrix is invertible.3 Least-squares Approximations φ is a function of the n + 1 coefficients in the polynomial Qn (x). .18) The condition (3. k = 0. b] = [0. . . At this ˆ a ˆ point ∂φ ∂ak = 0. 1/(2n − 1) 1/2 1/3 . n. For example. a=ˆ a (3. a 0 i. .k = xi+k dx. (3. . . We let Hn+1 (a. Lemma 3. k n. b) denote the (n + 1) × (n + 1) coefficients matrix of the system (3.20) on the interval [a.. assuming that this ˆ system can be solved. .. 1]. . i = 0. n.20) always has a unique solution. in the case where [a. 1/1  1/2  Hn (0. . . (3. .

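For completeness, here is a minimal computational sketch of the direct method (ours, not from the notes; numpy is assumed, and the right-hand-side integrals are approximated by a trapezoid sum since f is arbitrary). It assembles and solves the normal equations (3.20) on [0,1], where the coefficient matrix is the Hilbert matrix (3.21), and also reports that matrix's condition number.

import numpy as np

def lsq_direct(f, n, a=0.0, b=1.0, m=2000):
    # Normal equations (3.20): sum_i c_i * int_a^b x^(i+k) dx = int_a^b x^k f(x) dx.
    # On [0,1] the coefficient matrix is exactly the Hilbert matrix (3.21).
    H = np.array([[(b**(i + k + 1) - a**(i + k + 1)) / (i + k + 1)
                   for k in range(n + 1)] for i in range(n + 1)])
    x = np.linspace(a, b, m)
    w = np.full(m, (b - a) / (m - 1)); w[0] /= 2; w[-1] /= 2   # trapezoid weights for the rhs
    rhs = np.array([np.sum(w * x**k * f(x)) for k in range(n + 1)])
    return np.linalg.solve(H, rhs), np.linalg.cond(H)

coeffs, cond = lsq_direct(np.cos, 4)
print(coeffs)   # coefficients of the degree-4 least-squares polynomial for cos on [0,1]
print(cond)     # the condition number grows quickly with n, as noted in the text

As discussed next, the rapid growth of this condition number is precisely why inverting the Hilbert matrix is not the recommended way of solving the least-squares problem.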
We demonstrate this with the following example. det(Hn ) = 0 and Hn is invertible. k=0 n Qn (x) = j=0 cj Pj (x). . In fact. Is inverting the Hilbert matrix a good way of solving the least-squares problem? No.77 · 105 .12 The Hilbert matrix H5 is  1/1 1/2  H5 = 1/3  1/4 1/5 1/2 1/3 1/4 1/5 1/6 is 1/3 1/4 1/5 1/6 1/7 1/4 1/5 1/6 1/7 1/8  1/5 1/6  1/7  1/8 1/9 The condition number of H5 is 4.3 Solving the least-squares problem: with orthogonal polynomials The inverse of H5  25  −300  H5 =  1050  −1400 630  −300 1050 −1400 630 4800 −18900 26880 −12600  −18900 79380 −117600 56700   26880 −117600 179200 −88200 −12600 56700 −88200 44100 n Let {Pk }k=0 be polynomials such that deg(Pk (x)) = k. cn ) = a [f (x) − Qn (x)]2 dx. Let Qn (x) be a linear combination of the polynomials {Pk }n . n. . 3.3.3 Least-squares Approximations D.3.e. . Levy Proof. Define (3. i. which indicates that it is ill-conditioned. . 50 .22) Clearly.. Example 3. the condition number of Hn increases with the dimension n so inverting it becomes more difficult with an increasing dimension. 1!2! · · · (2n − 1)! Hence. We leave it is an exercise to show that the determinant of Hn is given by det(Hn ) = (1!2! · · · (n − 1)!)4 . Qn (x) is a polynomial of degree b φ(c0 . There are numerical instabilities that are associated with inverting H.

n b b ∂φ ∂ck b c=ˆ c n b = −2 Pk (x)f (x)dx + 2 a j=0 cj ˆ a Pj (x)Pk (x)dx. .27) 51 . j = 0.20). j = 0. i = j. we have ˆ c ˆ 0= i. . Polynomials that satisfy  b  a (Pi (x))2 . . cj ˆ j=0 a Pj (x)Pk (x)dx = a Pk (x)f (x)dx.25). . ˆ (3.22). {ck }.24) are called orthonormal polynomials. n. Similarly to the calculations done in the previous section. (3. b (Pj (x))2 dx a j = 0.26) with coefficients cj .D. c = (ˆ0 . The idea now is to choose the polynomials {Pk (x)}n such that the system k=0 (3. . n. . . Levy 3. . indeed. We would like to minimize φ.23) can be easily solved. n. the solution of the least-squares problem is given by the polynomial Qn (x) in (3. . .24) Polynomials that satisfy (3..23) Note the similarity between equation (3. . . (3. . b Pi (x)Pj (x)dx =  a 0. 0. n. .23) and (3. (3. . while here we work with the polynomials {Pk (x)}n k=0 k=0 instead. then (3. i = j. This can be done if we choose them in such a way that b Pi (x)Pj (x)dx = δij = a 1. that are given by (3. k = 0. j = j. If. .23) implies that k=0 b cj = ˆ a Pj (x)f (x)dx. ˆ Remark. b with a (Pi (x))2 dx that is not necessarily 1 are called orthogonal polynomials.3 Least-squares Approximations We note that the function φ is a quadratic function of the coefficients of the linear combination (3. cn ). i = j. the polynomials {Pk (x)}n are orthonormal.26) with the coefficients cj = ˆ b a Pj (x)f (x)dx . we used the basis functions {xk }n (a basis of Πn ). . .e. (3. . There.25) The solution of the least-squares problem is a polynomial n Q∗ (x) = n j=0 cj Pj (x). at the minimum. . In ∗ this case.

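The orthonormal-expansion solution is equally simple to code. Below is a sketch (ours, assuming numpy and the weight w ≡ 1 on [−1,1], so that normalized Legendre polynomials can be generated by their recurrence; the integrals defining the coefficients ĉ_j are approximated by a trapezoid sum).

import numpy as np

def normalized_legendre(j, x):
    # P_j via the recurrence (k+1) P_{k+1} = (2k+1) x P_k - k P_{k-1},
    # then scaled by sqrt((2j+1)/2) so that int_{-1}^{1} P_j(x)^2 dx = 1.
    p_prev, p = np.ones_like(x), x
    if j == 0:
        p = p_prev
    for k in range(1, j):
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return np.sqrt((2 * j + 1) / 2.0) * p

def lsq_orthonormal(f, n, m=4000):
    x = np.linspace(-1.0, 1.0, m)
    w = np.full(m, 2.0 / (m - 1)); w[0] /= 2; w[-1] /= 2        # trapezoid weights
    coeffs = [np.sum(w * normalized_legendre(j, x) * f(x)) for j in range(n + 1)]
    return lambda t: sum(c * normalized_legendre(j, np.asarray(t, dtype=float))
                         for j, c in enumerate(coeffs))

Q2 = lsq_orthonormal(np.cos, 2)
print(Q2(0.0), np.cos(0.0))

Note that, unlike the direct method, raising the degree by one only requires computing one additional coefficient; the previously computed ĉ_j are unchanged.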
w .e. n. (3.w = a (f (x))2 w(x)dx.3. .4 The weighted least squares problem D. . i. w(x). .3. For any weight w(x). Levy A more general least-squares problem is the weighted least squares approximation problem. n. (3.23)).3. we define the corresponding weighted L2 -norm of a function f (x) as b f 2. (3. By repeating the calculations of Section 3. .. i. b Pi (x)Pj (x)w(x)dx = δij . nonnegative function with a positive mass.28) In order to solve the weighted least-squares problem (3. to be a continuous on (a. We consider a weight function.3. we obtain n b b cj ˆ j=0 a w(x)Pj (x)Pk (x)dx = a w(x)Pk (x)f (x)dx. . ∗ The weighted least-squares problem is to find the closest polynomial Qn ∈ Πn to f (x). the solution of the weighted least-squares problem is given by n Q∗ (x) = n j=0 ˆ cj Pj (x).3. k = 0. . we look for a polynomial Q∗ (x) of degree n n such that f − Q∗ n 2.31) . b). k=0 We then consider a polynomial Qn (x) that is written as their linear combination: n Qn (x) = j=0 cj Pj (x).28) we follow the methodology described in Section 3.30) where the coefficients are given by b cj = ˆ a Pj (x)f (x)w(x)dx. i. a Note that w(x) may be singular at the edges of the interval since we do not require it to be continuous on the closed interval [a. b w(x)dx > 0.. j = 0.3. b].29) (compare with (3. this time in the weighted L2 -norm sense. 52 (3.29) can be easily solved if we choose {Pk (x)} to be orthonormal with respect to the weight w(x). .w = min Qn ∈Πn f − Qn 2.e.3 Least-squares Approximations 3. . a Hence.. and consider polynomials {Pk }n such that deg(Pk (x)) = k. The system (3.e.

5 Orthogonal polynomials At this point we already know that orthogonal polynomials play a central role in the solution of least-squares problems. f 4. w 53 . 3. in one point). we can have f. the Gram-Schmidt process is being used to convert one set of linearly independent vectors to an orthogonal set of vectors that spans the same vector space. the initial set of polynomials will be {1. to keep the discussion slightly more general. Typically. we should think about the process as converting one set of polynomials that span the space of polynomials of degree n to an orthogonal set of polynomials that spans the same space Πn . even in the weighted case.7. Here we must assume that f (x) is continuous in the interval [a. .3. In the case where {Pk (x)} are orthogonal but not necessarily normalized. In the general context of linear algebra. we will typically write f. In this section we will focus on the construction of orthogonal polynomials. 3. The properties of orthogonal polynomials will be studies in Section 3. g = f.. b]. However. This can be done using the Gram-Schmidt orthogonalization process. .w ∀α ∈ R. Given a weight w(x). g w .3. αf. n.3 Least-squares Approximations Remark. = f. f w. g = f1 . The weighted L2 -norm can be obtained from the weighted inner product by f 2. f1 + f2 . the coefficients of the solution (3. f = 0 and f (x) can still be non-zero (e. g instead of f. αg = α f. which we now describe in detail. f 0 and f. xn }. x2 .30) of the weighted least-squares problem are given by cj = ˆ Pj (x)f (x)dx . g + f2 . If it is not continuous. . . Some properties of the weighted inner product include 1. x. b]. f. Levy 3. g . g = g. we start with n + 1 linearly independent functions (all in L2 [a. .g. . . g w = a f (x)g(x)w(x)dx. which we would like to convert to orthogonal polynomials with respect to the weight w(x). g . b (Pj (x))2 w(x)dx a b a j = 0. We start by defining the weighted inner product between two functions f (x) and g(x) (with respect to the weight w(x)): b f. f. To simplify the notations. f = 0 iff f ≡ 0. In our context. we are interested in constructing orthogonal (or orthonormal) polynomials. 2. .D.

i.3 Least-squares Approximations b D. n n n n−1 n 0 n fi . f0 Hence. k − 1 we require the orthogonality conditions 0 = fk . i. b]. g1 1 w w = 0. f1 0 = d1 f0 .    f1 (x) = d1 (g1 (x) − c0 f0 (x)).e. .e. For f1 (x). b {gi (x)}n . In general k−1 fk (x) = dk (gk − c0 f0 − . . .. f0 .    f (x) = d (g (x) − c0 f (x) − .. Levy The goal is to find the coefficients dk and cj such that {fi }n is orthonormal with i=0 k respect to the weighted L2 -norm over [a. w The normalization condition f1 . g1 − c0 f0 1 1 1 Hence d1 = 1 g1 − c0 f0 . a (g(x))2 w(x)dx < ∞). The denominator cannot be zero due to the assumption that gi (x) are linearly independent. d0 = 1 g0 . w w w 2 = d0 g0 . .e. g0 w . we require that it is orthogonal to f0 (x). 1 . i. We thus consider   f0 (x) = d0 g0 (x).. c0 = f0 . The functions {gi } will be converted into i=0 orthonormal vectors {fi }. . 54 . . g0 . Hence = d1 ( f0 . g1 w − c0 ). g1 − c0 f0 1 i.3.e.  . . g1 1 − c0 f0 1 w w = 1 now implies = 1. 1 . f1 d2 g1 − c0 f0 . . We start with f0 (x): f0 . . − ck fk−1 ).. fi w . k For i = 0. fj w = a fi (x)fj (x)w(x)dx = δij . . − cn−1 f (x)).

a set of orthonormal polynomials with respect to the given weight on [−1. 1]. 2 3 2 =⇒ f1 (x) = 3 x. 1 3 1 −1 . 2 1 2 1 f1 (x) = g1 − c0 f0 = x − c0 1 1 d1 Hence 0 c1 = g1 . fi k i.e. f2 (x) = and so on. we have f0 (x) = d0 g0 (x) = d0 .D.13 Let w(x) ≡ 1 on [−1. Levy Hence 0 = dk (gk − ci fi ). f1 = 1 reads 1= Therefore. n. fk w = 1. i = 0. . w − ci ). 1 2 = xdx = 0. 2 2 d2 x2 dx = d2 . . 2 1 . Example 3. 55 1 2 5 (3x2 − 1). k .. 1]. −1 This implies that f1 (x) =x d1 1 =⇒ f1 (x) = d1 x. Since g0 (x) ≡ 1. . f0 = x. Start with gi (x) = xi . d1 = Similarly. 0 which means that 1 =⇒ d0 = √ 2 Now 1 f0 = √ . fi w w 3.3 Least-squares Approximations = dk ( gk . . The normalization condition f1 . fi i k − 1. Hence 1 1= −1 2 f0 (x)dx = 2d2 . 0 The coefficient dk is obtained from the normalization condition fk . i ck = gk . We follow the GramSchmidt orthogonalization process to generate from this list.

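The Gram-Schmidt process just described can be carried out numerically by discretizing the weighted inner product. A sketch (ours, assuming numpy; the quadrature grid and the helper names are our own) that reproduces the orthonormal polynomials of Example 3.13 for w ≡ 1 on [−1,1]:

import numpy as np

def gram_schmidt_polys(n, w, a=-1.0, b=1.0, m=4001):
    # Orthonormalize g_i(x) = x^i, i = 0..n, with respect to the weighted inner
    # product <u,v> = int_a^b u(x) v(x) w(x) dx, approximated on a uniform grid.
    x = np.linspace(a, b, m)
    quad = np.full(m, (b - a) / (m - 1)); quad[0] /= 2; quad[-1] /= 2
    inner = lambda u, v: np.sum(quad * w(x) * u * v)
    basis = []
    for i in range(n + 1):
        g = x**i
        for f_prev in basis:                      # subtract projections on f_0,...,f_{i-1}
            g = g - inner(g, f_prev) * f_prev
        basis.append(g / np.sqrt(inner(g, g)))    # normalize
    return x, basis

x, fs = gram_schmidt_polys(2, lambda x: np.ones_like(x))
# Compare with Example 3.13: f1 = sqrt(3/2) x and f2 = sqrt(5/2) (3x^2 - 1)/2.
print(np.max(np.abs(fs[1] - np.sqrt(1.5) * x)))
print(np.max(np.abs(fs[2] - np.sqrt(2.5) * (3 * x**2 - 1) / 2)))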
n 1. Tn . Levy We are now going to provide several important examples of orthogonal polynomials.32) It is possible to calculate them directly by Rodrigues’ formula Pn (x) = 1 dn (x2 − 1)n . Tm =     π  . This is a family of polynomials that are orthogonal with respect to the weight w(x) ≡ 1. 1]. starting from P0 (x) = 1. 2 56 (3. P1 (x) = x. 1 − x2 on the interval [−1. Our second example is of the Chebyshev polynomials. 2n + 1 (3.     π. (3. Legendre polynomials.31)).36) (see (2.33) The Legendre polynomials satisfy the orthogonality condition Pn . Pm = 2 δnm . They and are explicitly given by Tn (x) = cos(n cos−1 x). n 0. They satisfy the recurrence relation Tn+1 (x) = 2xTn (x) − Tn−1 (x).3. (3. n = m = 0. (3. (3.34) 2. on the interval [−1. 1]. The Legendre polynomials can be obtained from the recurrence relation (n + 1)Pn+1 (x) − (2n + 1)xPn (x) + nPn−1 (x) = 0.32)). 2n n! dxn n 0. We start with the Legendre polynomials.35) together with T0 (x) = 1 and T1 (x) = x (see (2. The orthogonality relation that they satisfy is   0. n 1.3 Least-squares Approximations D.37) . Chebyshev polynomials. 1. n = m. These polynomials are orthogonal with respect to the weight w(x) = √ 1 . n = m = 0.

∞). Levy 3. (3.43) . The can be explicitly written as Hn (x) = n −x2 n x2 d e (−1) e . on the interval [0. k!(n − 2k)! (3.42) where [x] denotes the largest integer that is the recurrence relation Hn+1 (x) − 2xHn (x) + 2nHn−1 (x) = 0. together with H0 (x) = 1. dxn 2 n 0. for an arbitrary real α > −1.40) Another way of expressing them is by [n/2] Hn (x) = k=0 (−1)k n! (2x)n−2k . We proceed with the Laguerre polynomials. ∞) with the weight function w(x) = e−x . (3.3 Least-squares Approximations 3. The Hermite polynomials are orthogonal with respect to the weight w(x) = e−x . The Hermite polynomials satisfy n 1. on the interval (−∞. They satisfy the orthogonality relation ∞ −∞ √ 2 e−x Hn (x)Hm (x)dx = 2n n! πδnm . (3. The Laguerre polynomials are given by Ln (x) = ex dn n −x (x e ). Laguerre polynomials.39) A more general form of the Laguerre polynomials is obtained when the weight is taken as e−x xα .D.38) The normalization condition is Ln = 1. n! dxn n 0. (3. ∞). 57 (3. 4. Here the interval is given by [0.41) x. H1 (x) = 2x. Hermite polynomials.

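Each of these families is most conveniently evaluated through its recurrence rather than through the explicit formulas. A sketch (ours) based on the three-term relations (3.32), (3.35) and (3.43) quoted above:

import math

def legendre(n, x):
    # (k+1) P_{k+1}(x) = (2k+1) x P_k(x) - k P_{k-1}(x),  P_0 = 1, P_1 = x   (3.32)
    p0, p1 = 1.0, x
    if n == 0:
        return p0
    for k in range(1, n):
        p0, p1 = p1, ((2 * k + 1) * x * p1 - k * p0) / (k + 1)
    return p1

def chebyshev(n, x):
    # T_{k+1}(x) = 2x T_k(x) - T_{k-1}(x),  T_0 = 1, T_1 = x   (3.35)
    t0, t1 = 1.0, x
    if n == 0:
        return t0
    for _ in range(1, n):
        t0, t1 = t1, 2 * x * t1 - t0
    return t1

def hermite(n, x):
    # H_{k+1}(x) = 2x H_k(x) - 2k H_{k-1}(x),  H_0 = 1, H_1 = 2x   (3.43)
    h0, h1 = 1.0, 2.0 * x
    if n == 0:
        return h0
    for k in range(1, n):
        h0, h1 = h1, 2 * x * h1 - 2 * k * h0
    return h1

print(chebyshev(7, 0.3), math.cos(7 * math.acos(0.3)))   # matches the explicit formula (3.36)
print(legendre(3, 0.5))    # P_3(x) = (5x^3 - 3x)/2, so this prints -0.4375
print(hermite(2, 1.0))     # H_2(x) = 4x^2 - 2, so this prints 2.0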
We recall that our goal is to minimize b a w(x)(f (x) − Qn (x))2 dx among all the polynomials Qn (x) of degree n.w = a w(x) f (x) − bj Pj (x) j=0 dx. n Q∗ 2 n = j=0 f. Pj j=0 2 w + w − bj 2 . Hence. We have n f− Hence Q∗ 2 n 2. . and let n Qn (x) = j=0 bj Pj (x). Pj j=0 w −2 f. = j=0 f.w = f 2 2. Pj j=0 + j=0 b2 j j = f n − f. The last expression is minimal iff ∀0 bj = f. n n n n n 0 f− = f j=0 2 2.w bj Pj . f n w −2 2 2.w − f. we will be able to derive some new results. Pj 2 w = f 2 − f − Q∗ n 58 2 f 2 .3 Least-squares Approximations 3. Pj j=0 n w + i=0 j=0 n bi bj Pi .6 Another approach to the least-squares problem D. Along the way. Pj f. (3.3.44) 1. Assume that {Pk (x)}k 0 is an orthonormal family of polynomials with respect to w(x). there exists a unique least-squares approximation which is given by n Q∗ (x) n Remarks. Pj w Pj (x). f − n bj Pj j=0 w bj w = f. Then b n 2 f− Hence 2 Qn 2. Levy In this section we present yet another way of deriving the solution of the least-squares problem.3.w bj f. Pj j=0 2 w . Pj w .

2 The normalization factor satisfies. P1 (x) = x. n 3.w . 1 P2 (x) = (3x2 − 1).w = 0. P1 (x) = x. (3.3 Least-squares Approximations f.. 2. b] is finite. Find the polynomial in Π2 . 1] implies that the orthogonal polynomials we need to use are the Legendre polynomials. (3. 59 .46) which is known as Parseval’s equality. 1 −1 2 Pn (x) = 2 .w = f.D. Levy i. Solution: The weight w(x) ≡ 1 on [−1. We are seeking for polynomials of degree 2 so we write the first three Legendre polynomials P0 (x) ≡ 1. P0 1 1 cos x √ dx = √ sin x = 2 2 −1 1 1 = −1 √ 2 sin 1. we have n→∞ lim f − Q∗ n ∞ j=0 2. 2 P1 (x)dx = .45) is called Bessel’s inequality. 2 2 2 2 We now have f. Example 3. Hence f 2 2.14 Problem: Let f (x) = cos x on [−1. 2n + 1 Hence 1 −1 2 P0 (x)dx = 2.45) The inequality (3. P2 (x) = √ (3x2 − 1). in general. Pj 2 w . 3 −1 1 2 2 P2 (x)dx = . that minimizes 1 −1 [f (x) − Q2 (x)]2 dx. 5 −1 1 We can then replace the Legendre polynomials by their normalized counterparts: √ 1 3 5 P0 (x) ≡ √ . Assuming that [a. Pj j=0 2 w f 2 2.e. 1].

2 60 .65 0. P1 = −1 cos x 3 xdx = 0. 0 We also have 1 D. 2 In Figure 3. Finally. 2 1 0.7 0.7 0.6 0.3 Least-squares Approximations Hence Q∗ (x) ≡ sin 1. 2 which means that Q∗ (x) = Q∗ (x).2 0.3.95 0.8 0.3 we plot the original function f (x) = cos x (solid line) and its approximation Q∗ (x) (dashed line).75 0.4 0.8 0. Solid line: f (x).3 0. We zoom on the interval x ∈ [0. b]. 0 1 1 f.1 0. 2 and hence the desired polynomial. P2 = −1 cos x 5 3x2 − 1 1 = 2 2 2 5 (12 cos 1 − 8 sin 1).5 0 0.9 1 x Figure 3. Define x= b + a + (b − a)t . Dashed line: its approximation Q∗ (x) 2 If the weight is w(x) ≡ 1 but the interval is [a. we can still use the Legendre polynomials if we make the following change of variables. is given by 2 Q∗ (x) = sin 1 + 2 15 cos 1 − 5 sin 1 (3x2 − 1).5 0.3: A second-order L2 -approximation of f (x) = cos x. Levy f.9 0. Q∗ (x).55 0.6 0.85 0. 1].

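The change of variables just introduced is easily checked in code. The sketch below (ours, assuming numpy; the trapezoid quadrature is our choice) computes the best linear L2 approximation of cos x on [0, π] by working in the mapped variable t in [−1,1], and can be compared with the hand computation of Example 3.15.

import numpy as np

a, b = 0.0, np.pi
t = np.linspace(-1.0, 1.0, 4001)
w = np.full(t.size, 2.0 / (t.size - 1)); w[0] /= 2; w[-1] /= 2   # trapezoid weights on [-1,1]

x = (b + a + (b - a) * t) / 2          # the change of variables from the text
F = np.cos(x)                          # F(t) = f(x(t)) = -sin(pi*t/2) here

P0 = np.full_like(t, 1 / np.sqrt(2))   # normalized Legendre polynomials of degree 0 and 1
P1 = np.sqrt(1.5) * t

c0 = np.sum(w * P0 * F)
c1 = np.sum(w * P1 * F)
q1 = c0 * P0 + c1 * P1                 # best linear approximation in the t variable
print(c0, c1)                          # c0 is (numerically) 0; c1*P1 reproduces q1*(t) = -12t/pi^2
print(np.max(np.abs(F - q1)))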
2 Hence 1 P1 (t) = 3 t. Example 3. P1 Hence 12 3 8 ∗ q1 (t) = − · 2 t = − 2 t 2 π π =⇒ Q∗ (x) = − 1 12 π2 2 x−1 . Also 0 F.4 we plot the original function f (x) = cos x (solid line) and its approximation ∗ Q1 (x) (dashed line). π]. Now.3 Least-squares Approximations t 1 is mapped to a = f (x). P0 = − −1 1 πt √ sin dt = 0. Find the polynomial in Π1 that minimizes π 0 [f (x) − Q1 (x)]2 dx. Levy Then the interval −1 F (t) = f Hence b a 3. define b + a + (b − a)t 2 [f (x) − Qn (x)]2 dx = b−a 2 1 −1 [F (t) − qn (t)]2 dt. 2 π2 In Figure 3.D. π πt =− sin 2 −1 1 3 tdt = − 2 t cos πt 3 sin πt 2 2 − π π 2 2 2 2 1 −1 =− 3 8 · . 2 2 π π + πt = (1 + t).15 Problem: Let f (x) = cos x on [0. 0 Letting x= we have F (t) = cos π πt (1 + t) = − sin . 61 . 2 2 which means that Q∗ (t) = 0. 2 2 We already know that the first two normalized Legendre polynomials are 1 P0 (t) = √ . π 1 −1 Solution: (f (x) − Q∗ (x))2 dx = 1 π 2 [F (t) − qn (t)]2 dt. x b. 2 F.

−x = e e−x (− cos x + sin x) cos xdx = 2 ∞ 0 1 = .5 1 1. L1 w . L1 w w ∞ 0 L1 (x) = 1 − x.5 3 x Figure 3. 2 62 = 0. This means that ∗ Q1 (x) = f. Dashed line: its approximation Q∗ (x) 1 Example 3. ∞) are Laguerre polynomials.5 2 2. 2 ∞ 0 = 0 ∞ e−x cos x(1−x)dx = xe−x (− cos x + sin x) e−x (−2 sin x) 1 − − 2 2 4 1 L1 (x) = . Since we are looking for the minimizer of the weighted L2 norm among polynomials of degree 1.3. we will need to use the first two Laguerre polynomials: L0 (x) = 1. Solid line: f (x). L0 w L0 (x) + f. Find the polynomial in Π1 that minimizes ∞ 0 e−x [f (x) − Q1 (x)]2 dx. ∞).5 −1 0 0. Levy 1 0.3 Least-squares Approximations D. π].5 0 −0.4: A first-order L2 -approximation of f (x) = cos x on the interval [0. L0 Also f. We thus have f.16 Problem: Let f (x) = cos x in [0. Solution: The family of orthogonal polynomials that correspond to this weight on [0.

Cn = an+1 an−1 . . Hence roots can not repeat. b] with respect to the weight w(x). This theorem will become handy when we discuss Gaussian quadratures in Section 5. Theorem 3. b). Hence Pn (x)Pn−2 (x) = which implies that b Pn (x) x − x1 2 0. (3. . a This is not possible since Pn is orthogonal to Pn−2 . b).3. . Pn (x)Pn−2 (x)dx > 0. . an Bn = an+1 an bn+1 bn − an+1 an 63 . · (x − xr ). n of Pn (x) are all real. xr be the roots of Pn (x) in (a. Let Qr (x) = (x − x1 ) · . If ak and bk are the coefficients of the terms of degree k and degree k − 1 in Pk (x). Let x1 .7 Properties of orthogonal polynomials 3. Also deg(Qr (x)) = r b n. . . simple.18 (Triple Recursion Relation) Any three consecutive orthonormal polynomials are related by a recursion formula of the form Pn+1 (x) = (An x + Bn )Pn (x) − Cn Pn−1 (x). We have already seen specific examples of such relations for the Legendre. and are in (a. We let {Pn (x)}n 0 be orthogonal polynomials in [a.D.e. . Another important property of orthogonal polynomials is that they can all be written in terms of recursion relations.42)). . Chebyshev. i. b).32). The following theorem states such relations always hold. and (3.35). then An = an+1 .17 The roots xj . Then Qr (x) and Pn (x) change their signs together in (a. Without loss of generality we now assume that x1 is a multiple root. This implies that Pn (x)Qr (x)w(x)dx = 0. Theorem 3. and Hermite polynomials (see (3. Proof.. b). a and hence r = n since Pn (x) is orthogonal to polynomials of degree less than n. . a2 n .3 Least-squares Approximations We start with a theorem that deals with some of the properties of the roots of orthogonal polynomials. Levy 3. Hence (Pn Qr )(x) is a polynomial with one sign in (a. j = 1. Pn (x) = (x − x1 )2 Pn−2 (x). .6.

an 64 . . which means that there exists α0 . Levy an+1 . an an−1 Pn + qn−1 an = An an−1 . . Then. xPn−1 = An Pn . . . can be explicitly written as an+1 xn+1 +bn+1 xn +. n. = (An x+Bn )(an xn +bn xn−1 +. .) − = Hence deg(Q(x)) n D. . Q. an an+1 x(an xn + bn xn−1 + .). . . Q(x) = i=0 For 0 i αi = n − 2. . . Pi . . Pi = 0. Finally Pn+1 = (An x + Bn )Pn − Cn Pn−1 .3. Set αn = Bn and αn−1 = −Cn . . The coefficient of xn is bn+1 = An bn + Bn an . . αn such that αi Pi (x). which means that Bn = (bn+1 − An bn ) 1 . . an an−1 Pn + qn−1 . Pi = Q. since xPn−1 = we have Cn = An xPn . . Pi = Pn+1 − An xPn . Pi Hence Qn (x) = αn Pn (x) + αn−1 Pn−1 (x). . For An = let Qn (x) = Pn+1 (x) − An xPn (x) an+1 bn an = (an+1 xn+1 + bn+1 xn + . Pi = −An xPn .3 Least-squares Approximations Proof. Pn−1 = An Pn .)−Cn (an−1 xn−1 +bn−1 xn−2 +.) an bn+1 − xn + .

A simple approximation of the first derivative is f (x) ≈ f (x + h) − f (x) . We write a Taylor expansion of f (x + h) about x. It might be significantly simpler to approximate the derivative instead of computing its exact value.3) . we typically represent the solution as a discrete approximation that is defined on a grid. (4. x + h).2) For such an expansion to be valid. What do we mean when we say that the expression on the right-hand-side of (4. we do know how to analytically differentiate every function. which are related. to derivatives. Let’s compute the approximation error. h 2 65 ξ ∈ (x. we assume that f (x) has two continuous derivatives. there are several reasons as of why we still need to approximate derivatives: • Even if there exists an underlying function that we need to differentiate. We may still be interested in studying changes in the data.1 Numerical Differentiation Basic Concepts This chapter deals with numerical approximations of derivatives.e. h2 f (x + h) = f (x) + hf (x) + f (ξ). we need to be able to come up with methods for approximating the derivatives at these points. (4. we might know its values only at a sampled data set without knowing the function itself. • There are some cases where it may not be obvious that an underlying function exists and all that we have is a discrete data set. The first questions that comes up to mind is: why do we need to approximate derivatives at all? After all. For almost all other functions.. x + h). this will typically be done using only values that are defined on a lattice. of course.1) is an approximation of the derivative? For linear functions (4. (4.1) is not the exact derivative.D. The Taylor expansion (4.1) is actually an exact expression for the derivative. • There are times in which exact formulas are available but they are very complicated to the point that an exact computation of the derivative requires a lot of function evaluations. Levy 4 4.2) means that we can now replace the approximation (4. Since we then have to evaluate derivatives at the grid points. i. • When approximating solutions to ordinary (or partial) differential equations. and again. Nevertheless. h (4.1) where we assume that h > 0. The underlying function itself (which in this cased is the solution of the equation) is unknown. 2 ξ ∈ (x.1) with an exact formula of the form f (x) = f (x + h) − f (x) h − f (ξ).

a more accurate approximation for the first derivative that is based on the values of the function at the points f (x − h) and f (x + h) is the centered differencing formula f (x) ≈ f (x + h) − f (x − h) . h → 0. which in turn is the case if f (ξ) is well defined in the interval (x.e.. This is indeed the case if the truncation error goes to zero. x + h) such that 1 f (ξ) = [f (ξ1 ) + f (ξ2 )].4) is − h2 [f (ξ1 ) + f (ξ2 )]. The small parameter h denotes the distance between the two points x and x + h. 2h 12 which means that the truncation error in the approximation (4.4) are h2 h3 f (x) + f (ξ1 ). The “speed” in which the error goes to zero as h → 0 is called the rate of convergence. 12 If the third-order derivative f (x) is a continuous function in the interval [x − h. 2 66 . 2h (4. 2 6 Here ξ1 ∈ (x. i. When the truncation error is of the order of O(h). For example. the approximation (4.1) is called a forward differencing or one-sided differencing. As this distance tends to zero. Taylor expansions of the terms on the right-hand-side of (4. The approximation of the derivative at x that is based on the values of the function at x − h and x.3).. 2 6 2 h h3 f (x − h) = f (x) − hf (x) + f (x) − f (ξ2 ). x + h) and ξ2 ∈ (x − h. i.4) Let’s verify that this is indeed a more accurate formula than (4. We refer to a methods as a pth -order method if the truncation error is of the order of O(hp ). then the intermediate value theorem implies that there exists a point ξ ∈ (x − h. Levy Since this approximation of the derivative at x is based on the values of the function at x and x + h. x + h). x). It is possible to write more accurate formulas than (4. we say that the method is a first order method. f (x) ≈ f (x) − f (x − h) . x + h]. the two points approach each other and we expect the approximation (4. The second term on the right-hand-side of (4. this error is called the truncation error.e.1) to improve.1).3) is the error term. h is called a backward differencing (which is obviously also a one-sided differencing formula).3) for the first derivative. Hence f (x + h) = f (x) + hf (x) + f (x + h) − f (x − h) h2 f (x) = − [f (ξ1 ) + f (ξ2 )].1) can be thought of as being obtained by truncating this term from the exact formula (4.1 Basic Concepts D.4. Since the approximation (4.

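The first- and second-order rates just derived are easy to observe numerically. A short sketch (ours; the test function sin x and the point x = 1 are arbitrary choices) compares the forward difference (4.1) with the centered difference (4.4):

import math

f, x0 = math.sin, 1.0
exact = math.cos(x0)

for h in (0.1, 0.05, 0.025, 0.0125):
    forward = (f(x0 + h) - f(x0)) / h                 # first order, eq. (4.1)
    centered = (f(x0 + h) - f(x0 - h)) / (2 * h)      # second order, eq. (4.4)
    print(h, abs(forward - exact), abs(centered - exact))

Halving h roughly halves the forward-difference error and divides the centered-difference error by four, as expected.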
x) and ξ+ ∈ (x. 2h 6 4. 2 6 24 Here. n Here we simplify the notation and replace li (x) which is the notation we used in Section 2.6) we expand h3 h4 (4) h2 f (x ± h) = f (x) ± hf (x) + f (x) ± f (x) + f (ξ± ). We follow this procedure and assume that f (x0 ).5 by li (x). f (xn ) are given.5) which means that the expression (4. For example.2 Differentiation Via Interpolation In this section we demonstrate how to generate differentiation formulas by differentiating an interpolant. (n + 1)! j=0 67 n . . x + h) and that f (x) has four continuous derivatives in the interval. x + h).7 we know that the interpolation error is 1 f (x) − Qn (x) = f (n+1) (ξn ) (x − xj ). the approximation (4. 4.D. Levy Hence f (x) = f (x + h) − f (x − h) h2 − f (ξ). Hence f (x + h) − 2f (x) + f (x − h) h2 (4) h2 = f (x) + f (ξ− ) + f (4) (ξ+ ) = f (x) + f (4) (ξ). . 12 ξ ∈ (x − h. . The Lagrange form of the interpolation polynomial through these points is n Qn (x) = j=0 f (xj )lj (x). h2 (4.6) To verify the consistency and the order of approximation of (4.6) is indeed a second-order approximation of the derivative. The idea is straightforward: the first stage is to construct an interpolating polynomial from the data. x + h). According to the error analysis of Section 2.4) is a second-order approximation of the first derivative. An approximation of the derivative at any point can be then obtained by a direct differentiation of the interpolant. ξ− ∈ (x − h. In a similar way we can approximate the values of higher-order derivatives. .2 Differentiation Via Interpolation (4. Hence. with a truncation error that is given by h2 (4) − f (ξ). h2 24 12 where we assume that ξ ∈ (x − h. it is easy to verify that the following is a second-order approximation of the second derivative f (x) ≈ f (x + h) − 2f (x) + f (x − h) .

(4. .9). Hence.10) as a differentiation by interpolation algorithm. Levy where ξn ∈ (min(x. xn )). max(x. we would like to emphasize the dependence of ξn on x and hence replace the ξn notation by ξx . then becomes n f (xk ) = j=0 f (xj )lj (xk ) + 1 f (n+1) (ξxk ) (xk − xj ).. . so that n f (xk ) = j=0 f (xj )lj (xk ) + 1 f (n+1) (ξxk )w (xk ). (n + 1)! (4. . . f (x0 )) and 68 . . · · · (x − xn )]. . i.1 We demonstrate how to use the differentiation by integration formula (4. . x0 .9) Now. xn are fixed. .2 Differentiation Via Interpolation D.8) (n + 1)! (n + 1)! dx We now assume that x is one of the interpolation points. This means that we use two interpolation points (x0 . . when w (x) is evaluated at an interpolation point xk . .7) where n w(x) = i=0 (x − xi ). . Since here we are assuming that the points x0 . Example 4. . say xk . n n n w (x) = i=0 j=0 j=i (x − xj ) = i=0 [(x − x0 ) · . . x0 . . i. We that have: n f (x) = j=0 f (xj )lj (x) + 1 f (n+1) (ξx )w(x).10) in the case where n = 1 and k = 0. The numerical differentiation formula. . there is only one term in w (x) that does not vanish. · (x − xi−1 )(x − xi+1 ) · . . . xn }. Differentiating the interpolant (4. . (n + 1)! (4.e.7): n f (x) = j=0 f (xj )lj (x) + 1 d 1 f (n+1) (ξx )w (x) + w(x) f (n+1) (ξx ).4. (4. . x ∈ {x0 . . xn ).10) We refer to the formula (4.. (n + 1)! j=0 j=k (4. n w (xk ) = j=0 j=k (xk − xj ).e.

D. x 1 − x0 Here. x 0 − x1 l1 (x) = 1 . (x0 − x1 )(x0 − x2 ) 2x0 − x1 − x2 . x1 ). then f (x0 ) = f (x0 + h) − f (x0 ) h − f (ξ). (x2 − x0 )(x2 − x1 ) x 0 − x1 (x2 − x0 )(x2 − x1 ) (4. and want to approximate f (x0 ). x 1 − x0 x − x1 . where l0 (x) = Hence l0 (x) = We thus have f (x0 ) = f (x1 ) 1 f (x1 ) − f (x0 ) 1 f (x0 ) + + f (ξ)(x0 − x1 ) = − f (ξ)(x1 − x0 ).11) (x − x1 )(x − x2 ) . (x1 − x0 )(x1 − x2 ) l2 (x) = (x − x0 )(x − x1 ) . Levy 4. Example 4.2 Differentiation Via Interpolation (x1 . x 0 − x1 x 1 − x0 2 x 1 − x0 2 1 . (x1 − x0 )(x1 − x2 ) x 0 − x2 . (4. (x2 − x0 )(x2 − x1 ) 6 69 l1 (x0 ) = l2 (x0 ) = .3). 3 at x0 we have l0 (x0 ) = Hence f (x0 ) = f (x0 ) x 0 − x2 2x0 − x1 − x2 + f (x1 ) (x0 − x1 )(x0 − x2 ) (x1 − x0 )(x1 − x2 ) x 0 − x1 1 +f (x2 ) + f (3) (ξ)(x0 − x1 )(x0 − x2 ). (x2 − x0 )(x2 − x1 ) Evaluating lj (x) for j = 1. This time f (x) = f (x0 )l0 (x) + f (x1 )l1 (x) + f (x2 )l2 (x). If we now let x1 = x0 + h. 2. h 2 which is the (first-order) forward differencing approximation of f (x0 ). we simplify the notation and assume that ξ ∈ (x0 .2 We repeat the previous example in the case n = 2 and k = 0. f (x1 )). (x0 − x1 )(x0 − x2 ) l1 (x) = 2x − x0 − x2 . (x0 − x1 )(x0 − x2 ) l1 (x) = (x − x0 )(x − x2 ) . x 0 − x1 l1 (x) = x − x0 . with l0 (x) = Hence l0 (x) = 2x − x1 − x2 . The Lagrange interpolation polynomial in this case is f (x) = f (x0 )l0 (x) + f (x1 )l1 (x). (x1 − x0 )(x1 − x2 ) l2 (x) = 2x − x0 − x1 .

which is a very practical way for generating approximations of derivatives (as well as other quantities as we shall see. 2 6 24 (4. f (x) ≈ Af (x + h) + B(x) + Cf (x − h).12) The coefficients A.5) f (x) = f (x + h) − f (x − h) 1 − f (ξ)h2 . 1. h2 h3 h4 f (x) ± f (x) + f (4) (ξ± ).3 The Method of Undetermined Coefficients In this section we present the method of undetermined coefficients. 2. The Taylor expansions of the terms f (x ± h) are f (x ± h) = f (x) ± hf (x) + where (assuming that h > 0) x−h ξ− x ξ+ x + h. second-order approximation of the first derivative. Assume. the resulting formula would be the second-order centered approximation of the first-derivative (4. i = 0. i.11) becomes f (x) = −f (x) f (ξ) 2 3 2 1 + + f (x + h) + f (x + 2h) − h 2h h 2h 3 −3f (x) + 4f (x + h) − f (x + 2h) f (ξ) 2 + h. 2h 6 4. 6 24 70 . x2 ).4..e. (4. f (x + h).13) we can rewrite (4.14) h (A + C)f (x) 2 2 h4 h3 (A − C)f (3) (x) + [Af (4) (ξ+ ) + Cf (4) (ξ− )]. and C are to be determined in such a way that this linear combination is indeed an approximation of the second derivative. if we were to repeat the last example with n = 2 while approximating the derivative at x1 .13) Using the expansions in (4. B. Remark. f (x − h). that we are interested in finding an approximation of the second derivative f (x) that is based on the values of the function at three equally spaced points. f (x). we assume ξ ∈ (x0 . In a similar way. for example.. For xi = x + ih.12) as f (x) ≈ Af (x + h) + Bf (x) + Cf (x − h) = (A + B + C)f (x) + h(A − C)f (x) + + (4. Levy Here. = 2h 3 which is a one-sided.g. equation (4.3 The Method of Undetermined Coefficients D. e. when we discuss integration).

h2 24 We note that the last two terms can be combined into one using an intermediate values theorem (assuming that f (x) has four continuous derivatives). (4.D. Levy 4. h2 B=− 2 . Write the Taylor expansions of the function at the approximation points. the method of undetermined coefficients follows what was just demonstrated in the example: 1. h2 12 In terms of an algorithm. we would have missed on this cancellation. Equate the coefficients of the function and its derivatives on both sides. The number of terms in the Taylor expansion should be sufficient to rule out additional cancellations. no simple answer. the coefficient of the third-derivative vanished as well. h2 h2 (4) [f (ξ+ ) + f (4) (ξ− )] = f (4) (ξ). unfortunately. i. 71 . the coefficient of f (3) (x) on the right-hand-side of (4.. The only question that remains open is how many terms should we use in the Taylor expansion. since A and C are equal to each other. and f (x) on both sides of (4. f (x). h2 In this particular case. one should truncate the Taylor series after the leading term in the error has been identified.14) we obtain the linear system   A + B + C = 0. In other words. 3. we could satisfy four equations.15)   A+C = 2 .e. In the example. Assume that the derivative can be written as a linear combination of the values of the function at certain points. and would have mistakenly concluded that the approximation method is only first-order accurate. If we were to stop the Taylor expansions at the third derivative instead of at the fourth derivative. This question has. x + h). 2. h2 The system (4. we have already seen that even though we used data that is taken from three points. 24 12 ξ ∈ (x − h. In other words.  A − C = 0. Hence we obtain the familiar second-order approximation of the second derivative: f (x) = f (x + h) − 2f (x) + f (x − h) h2 (4) − f (ξ).14) also vanishes and we end up with f (x) = f (x + h) − 2f (x) + f (x − h) h2 (4) − [f (ξ+ ) + f (4) (ξ− )].15) has the unique solution: A=C= 1 .3 The Method of Undetermined Coefficients Equating the coefficients of f (x).

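The same linear-system idea extends to any stencil: requiring that the linear combination reproduces the Taylor coefficients of the desired derivative yields a small system for the weights. A sketch (ours, assuming numpy; the helper name fd_weights is our own):

import numpy as np
from math import factorial

def fd_weights(offsets, deriv, h):
    # Find A_i such that sum_i A_i f(x + s_i h) approximates f^(deriv)(x).
    # Matching Taylor coefficients gives: sum_i A_i (s_i h)^k / k! = delta_{k,deriv}.
    s = np.asarray(offsets, dtype=float)
    M = np.array([[(si * h)**k / factorial(k) for si in s] for k in range(len(s))])
    rhs = np.zeros(len(s)); rhs[deriv] = 1.0
    return np.linalg.solve(M, rhs)

h = 0.1
print(fd_weights([-1, 0, 1], 2, h) * h**2)   # expect [1, -2, 1]: the centered second-derivative formula
print(fd_weights([0, 1, 2], 1, h) * 2 * h)   # expect [-3, 4, -1]: the one-sided three-point formula above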
In order to improve this approximation we will need some more insight on the internal structure of the error..4. where L denotes the quantity that we are interested in approximating. We note that the formula L ≈ D(h). L = f (x). k! (−1)k f (k) (x) k h . We therefore start with the Taylor expansions of f (x ± h) about the point x. . We start with an example in which we show how to turn a second-order approximation of the first derivative into a fourth order approximation of the same quantity. . The important property of the coefficients ei ’s is that they do not depend on h. which in this case is D(h) = The error is E = e2 h2 + e4 h4 + .. . 2h (4.e. . .16) as L = D(h) + e2 h2 + e4 h4 + .16) ∞ k=0 ∞ k=0 f (k) (x) k h . . f (x + h) = f (x − h) = Hence f (x) = f (x + h) − f (x − h) h4 h2 (3) − f (x) + f (5) (x) + .17) . is a second-order approximation of the first-derivative which is based on the values of f (x) at x ± h. In order to improve the 72 f (x + h) − f (x − h) . 2h 3! 5! (4. While we study it here in the context of numerical differentiation. We assume here that in general ei = 0. Levy 4. where ei denotes the coefficient of hi in (4.e. . and D(h) is the approximation. it is by no means limited only to differentiation and we will get back to it later on when we study methods for numerical integration. i.16).4 Richardson’s Extrapolation Richardson’s extrapolation can be viewed as a general procedure for improving the accuracy of approximations when the structure of the error is known. k! We rewrite (4. . We already know that we can write a second-order approximation of f (x) given its values in f (x ± h). i.4 Richardson’s Extrapolation D.

2. . where −f (x + 2h) + 8f (x + h) − 8f (x − h) + f (x − 2h) .21) Combining (4.20) .D.19) improves the accuracy of the approximation (4. x + h/2.21) with (4. However. we can write (4. . 1. Indeed.17) with (4. 12h Equation (4. . . a fourth-order approximation of the derivative is −f (x + 2h) + 8f (x + h) − 8f (x − h) + f (x − 2h) + O(h4 ). . 15 Remarks.20) we end up with a sixth-order approximation of the derivative: 16S(h) − S(2h) L= + O(h6 ). It is not specific for numerical differentiation.g. it is possible to use other approximations. (4. . How can this be done? one possibility is to write another approximation that is based on the values of the function at different points. Levy 4.18). . We carry out such a procedure by writing S(h) = L = S(2h) + a4 (2h)4 + a6 (2h)6 + . . x + h. For example. . subtracting the following equations from each other 4L = 4D(h) + 4e2 h2 + 4e4 h4 + . (4. . the idea is to combine (4. instead of (4. In (4. If this is what is done. we have 4D(h) − D(2h) − 4e4 h4 + . For example. instead of using D(2h). . D(h/2). e. L = D(2h) + 4e2 h2 + 16e4 h4 + . 73 (4. . .20) can be turned into a sixth-order approximation of the derivative by eliminating the term a4 h4 . is still a second-order approximation of the derivative. 3 In other words. .19) as L= L = S(h) + a4 h4 + a6 h6 + . of course.19) we would get a fourth-order approximation of the derivative that is based on the values of f at x − h. we can write L = D(2h) + e2 (2h)2 + e4 (2h)4 + .18) such that the h2 term in the error vanishes.4 Richardson’s Extrapolation approximation of L our strategy will be to eliminate the term e2 h2 from the error. x − h/2. (4. This process can be repeated over and over as long as the structure of the error is known..18) This.19) f (x) = 12h Note that (4.16) by using more points. Once again we would like to emphasize that Richardson’s extrapolation is a general procedure for improving the accuracy of numerical approximations that can be used when the structure of the error is known. .

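A short sketch (ours) of the extrapolation step: the combination (4D(h) − D(2h))/3 eliminates the h^2 term and reproduces the fourth-order formula (4.19).

import math

f, x0 = math.sin, 1.0
exact = math.cos(x0)
D = lambda h: (f(x0 + h) - f(x0 - h)) / (2 * h)    # second-order centered difference

for h in (0.2, 0.1, 0.05):
    richardson = (4 * D(h) - D(2 * h)) / 3          # fourth-order accurate, cf. (4.19)
    print(h, abs(D(h) - exact), abs(richardson - exact))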
(5. inf f (x). b].2) The upper integral of f (x) on [a. . it will be natural to recall the notion of Riemann integration. . (5. of the interval [a. In order to gain some insight on numerical integration. In addition. and the lower integral of f (x) is defined as L(f ) = sup(L(f.1). . P )). If the upper and lower integral of f (x) are equal to each other. their b common value is denoted by a f (x)dx and is referred to as the Riemann integral of f (x). .2). the need for approximate integration formulas is obvious.1) while the lower (Darboux) sum of f (x) with respect to the partition P is defined as n L(f. b] is defined as U (f ) = inf(U (f. (5. as two approximations of the integral (assuming that the function is indeed integrable). P ) = i=1 mi ∆xi .1 Numerical Integration Basic Concepts In this chapter we are going to explore various ways for approximating the integral of a function over a given domain. and still. For each i we let Mi (f ) = and mi (f ) = x∈[xi−1 . these sums are not defined in the most convenient way 74 . Since we can not analytically integrate every function. b] and that {x0 . P . xn } is a partition (P ) of [a. b]. the upper (Darboux) sum of f (x) with respect to the partition P is defined as n U (f. Of course. We assume that f (x) is a bounded function defined on [a.xi ] sup x∈[xi−1 . there might be situations where the given function can be integrated analytically. For the purpose of the present discussion we can think of the upper and lower Darboux sums (5.xi ] f (x). P )).D. Levy 5 5. Letting ∆xi = xi − xi−1 . where both the infimum and the supremum are taken over all possible partitions. an approximation formula may end up being a more efficient alternative to evaluating the exact expression of the integral. P ) = i=1 Mi ∆xi .

by the value of the function at the center point 1 (a + b).1: A rectangular quadrature A variation on the rectangular rule is the midpoint rule.3) is called the rectangular method (see Figure 5. i. and hence we can refer to (5.e. (5. the midpoint quadrature (5. Numerical integration formulas are also referred to as integration rules or quadratures.3) as the rectangular rule or the rectangular quadrature. may be a complicated task on its own.2). 2 b a f (x)dx ≈ (b − a)f a+b 2 . Only this time.1). i.4) (see also Fig 5. 75 .3) The approximation (5. (5. Finding the extrema of the function. we replace the value of the function at an endpoint..3). we approximate the value of the integral a f (x)dx by multiplying the length of the interval by the value of the function at one point. which we would like to avoid. b Instead.D. f(b) f(x) f(a) a b x Figure 5. one can think of approximating the value of a f (x)dx by multiplying the value of the function at one of the end-points of the interval by the length of the interval..e.1 Basic Concepts for an approximation algorithm. This is because we need to find the extrema of the function in every subinterval. Levy 5. Similarly to the rectanb gular rule. b a f (x)dx ≈ f (a)(b − a). As we shall see below.4) is a more accurate quadrature than the rectangular rule (5.

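Before deriving the error terms, it is instructive to compare the two rules numerically. A sketch (ours; the integrand e^x on [0,b] is an arbitrary choice):

import math

def rectangle(f, a, b):
    return (b - a) * f(a)                # eq. (5.3)

def midpoint(f, a, b):
    return (b - a) * f((a + b) / 2)      # eq. (5.4)

f = math.exp
exact = lambda a, b: math.exp(b) - math.exp(a)

for b in (1.0, 0.5, 0.25):
    a = 0.0
    print(b - a,
          abs(rectangle(f, a, b) - exact(a, b)),
          abs(midpoint(f, a, b) - exact(a, b)))

The midpoint errors are visibly smaller and shrink faster as the interval is halved, in line with the cubic error term derived next.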
x F (x) = a f (x)dx. we have (expanding f (a + h/2)) for the quadrature error.1 Basic Concepts D. Levy f(b) f((a+b)/2) f(x) f(a) a (a+b)/2 b x Figure 5. b).4). we consider the primitive function F (x). a+h E= a f (x)dx − hf a+ h 2 = hf (a) + h2 h3 f (a) + f (a) + O(h4 ) 2 6 h2 h −h f (a) + f (a) + f (a) + O(h3 ) . 24 76 ξ ∈ (a. and expand a+h f (x)dx = F (a + h) = F (a) + hF (a) + a h2 h3 F (a) + F (a) + O(h4 ) 2 6 (5. (5. 2 8 which means that the error term is of order O(h3 ) so we should stop the expansions there and write E = h3 f (ξ) 1 1 − 6 8 = (b − a)3 f (ξ).2: A midpoint quadrature In order to compute the quadrature error for the midpoint rule (5.5.6) .5) = hf (a) + h3 h2 f (a) + f (a) + O(h4 ) 2 6 If we let b = a + h. E.

and write the Lagrange interpolant (of degree n) through these points. we can do it. For equally spaced points. x0 .D.8) Note that if we want to integrate several different functions at the same points.8) need to be computed only once. n Pn (x) = i=0 f (xi )li (x). . If we change the interpolation/integration points. b As always. 77 . . . x1 = b. Hence. since they do not depend on the function that is being integrated. we can approximate b a b n b n f (x)dx ≈ Pn (x)dx = a i=0 f (xi ) a li (x)dx = i=0 Ai f (xi ).9) is called a Newton-Cotes formula. b]. We select nodes x0 . (5. then we must recompute the quadrature coefficients. the quadrature coefficients (5. (5.2 Integration via Interpolation In this section we will study how to derive quadratures by integrating an interpolant. i. We will go on and use these assumptions throughout the chapter. 5.. xn ∈ [a. i=0 (5. We also assumed that they are bounded and that they are defined at every point.1 We let n = 1 and consider two interpolation points which we set as x0 = a.7) are given by b Ai = a li (x)dx. Example 5. . Levy 5. . a numerical integration formula of the form b a n f (x)dx ≈ Ai f (xi ). so that whenever we need to evaluate a function at a point.e. . our goal is to evaluate I = a f (x)dx. . .7) The quadrature coefficients Ai in (5. Throughout this section we assumed that all functions we are interested in integrating are actually integrable in the domain of interest.2 Integration via Interpolation Remark. x i − xj 0 i n. with n li (x) = j=0 j=i x − xj . xn .

In this case,

l_0(x) = \frac{b-x}{b-a},     l_1(x) = \frac{x-a}{b-a}.

Hence

A_0 = \int_a^b l_0(x)dx = \int_a^b \frac{b-x}{b-a}dx = \frac{b-a}{2},     A_1 = \int_a^b l_1(x)dx = \int_a^b \frac{x-a}{b-a}dx = \frac{b-a}{2} = A_0.

The resulting quadrature is the so-called trapezoidal rule,

\int_a^b f(x)dx \approx \frac{b-a}{2}\left[f(a)+f(b)\right]     (5.10)

(see Figure 5.3).

Figure 5.3: A trapezoidal quadrature

We can now use the interpolation error to compute the error in the quadrature (5.10). The interpolation error is

f(x) - P_1(x) = \frac{1}{2} f''(\xi_x)(x-a)(x-b),     \xi_x \in (a,b),

and hence (using the integral intermediate value theorem)

E = \int_a^b \frac{1}{2} f''(\xi_x)(x-a)(x-b)dx = \frac{f''(\xi)}{2}\int_a^b (x-a)(x-b)dx = -\frac{f''(\xi)}{12}(b-a)^3,     (5.11)

with \xi \in (a,b).
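The coefficients (5.8) can be computed for any choice of nodes by integrating the Lagrange basis polynomials. A short sketch, assuming NumPy is available (the function name newton_cotes_weights is an illustrative addition, not part of the notes), recovers the trapezoidal weights of Example 5.1:

import numpy as np
from numpy.polynomial import polynomial as P

def newton_cotes_weights(nodes, a, b):
    # A_i = int_a^b l_i(x) dx, cf. (5.8), computed by exact polynomial integration.
    nodes = np.asarray(nodes, dtype=float)
    weights = []
    for i, xi in enumerate(nodes):
        li = np.array([1.0])                 # coefficients of l_i, lowest degree first
        for j, xj in enumerate(nodes):
            if j != i:
                li = P.polymul(li, np.array([-xj, 1.0])) / (xi - xj)
        Li = P.polyint(li)                   # an antiderivative of l_i
        weights.append(P.polyval(b, Li) - P.polyval(a, Li))
    return np.array(weights)

print(newton_cotes_weights([0.0, 1.0], 0.0, 1.0))   # [0.5 0.5], i.e., (b-a)/2 each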

Remarks.

1. We note that the quadratures obtained by integrating a Lagrange interpolant, i.e., (5.7) with the coefficients (5.8), are exact for polynomials of degree \le n. For if p(x) is such a polynomial, it can be written as (check!)

p(x) = \sum_{i=0}^{n} p(x_i) l_i(x).

Hence

\int_a^b p(x)dx = \sum_{i=0}^{n} p(x_i) \int_a^b l_i(x)dx = \sum_{i=0}^{n} A_i p(x_i).

2. As for the opposite direction, assume that the quadrature

\int_a^b f(x)dx \approx \sum_{i=0}^{n} A_i f(x_i)

is exact for all polynomials of degree \le n. Since deg(l_j(x)) = n, we have

\int_a^b l_j(x)dx = \sum_{i=0}^{n} A_i l_j(x_i) = \sum_{i=0}^{n} A_i \delta_{ij} = A_j.

5.3 Composite Integration Rules

In a composite quadrature, we divide the interval into subintervals and apply an integration rule to each subinterval. We demonstrate this idea with a couple of examples.

Example 5.2
Consider the points a = x_0 < x_1 < ... < x_n = b. The composite trapezoidal rule is obtained by applying the trapezoidal rule in each subinterval [x_{i-1}, x_i], i = 1, ..., n, i.e.,

\int_a^b f(x)dx = \sum_{i=1}^{n} \int_{x_{i-1}}^{x_i} f(x)dx \approx \frac{1}{2}\sum_{i=1}^{n} (x_i - x_{i-1})\left[f(x_{i-1}) + f(x_i)\right]     (5.12)

(see Figure 5.4). A particular case is when the points are uniformly spaced, i.e., when all intervals have an equal length. For example, if x_i = a + ih,

where

h = \frac{b-a}{n},

then

\int_a^b f(x)dx \approx \frac{h}{2}\left[f(a) + 2\sum_{i=1}^{n-1} f(a+ih) + f(b)\right] = h \sum{}''_{i=0}^{n} f(a+ih).     (5.13)

The notation of a sum with two primes, \sum'', means that we sum over all the terms with the exception of the first and last terms, which are divided by 2.

Figure 5.4: A composite trapezoidal rule

We can also compute the error term as a function of the distance between neighboring points, h. We know from (5.11) that in every subinterval the quadrature error is

-\frac{h^3}{12} f''(\xi_i).

Hence, the overall error is obtained by summing over n such terms:

\sum_{i=1}^{n} \left(-\frac{h^3}{12} f''(\xi_i)\right) = -\frac{h^3 n}{12}\left(\frac{1}{n}\sum_{i=1}^{n} f''(\xi_i)\right).

Here, we use the notation \xi_i to denote an intermediate point that belongs to the ith interval. Let

M = \frac{1}{n}\sum_{i=1}^{n} f''(\xi_i).

Clearly,

\min_{x \in [a,b]} f''(x) \le M \le \max_{x \in [a,b]} f''(x).

If we assume that f''(x) is continuous in [a,b] (which we anyhow do in order for the interpolation error formula to be valid), then there exists a point \xi \in [a,b] such that f''(\xi) = M. Hence (recalling that (b-a)/n = h), we have

E = -\frac{(b-a)h^2}{12} f''(\xi),     \xi \in [a,b].     (5.14)

This means that the composite trapezoidal rule is second-order accurate.

Example 5.3
In the interval [a,b] we assume n subintervals and let

h = \frac{b-a}{n}.

The composite midpoint rule is given by applying the midpoint rule (5.4) in every subinterval, i.e.,

\int_a^b f(x)dx \approx h \sum_{j=1}^{n} f(x_j),     (5.15)

where the quadrature points are x_j = a + (j - \frac{1}{2})h, j = 1, 2, ..., n. Equation (5.15) is known as the composite midpoint rule. In order to obtain the quadrature error in the approximation (5.15) we recall that in each subinterval the error is given according to (5.6), i.e.,

E_j = \frac{h^3}{24} f''(\xi_j),     \xi_j \in \left(x_j - \frac{h}{2},\, x_j + \frac{h}{2}\right).

Hence

E = \sum_{j=1}^{n} E_j = \frac{h^3}{24}\sum_{j=1}^{n} f''(\xi_j) = \frac{h^3}{24}\, n \left(\frac{1}{n}\sum_{j=1}^{n} f''(\xi_j)\right) = \frac{h^2(b-a)}{24} f''(\xi),     (5.16)

where \xi \in (a,b). This means that the composite midpoint rule is also second-order accurate (just like the composite trapezoidal rule).
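Both composite rules are a few lines of code, and their second-order accuracy is easy to confirm numerically: halving h should divide the error by about 4. The sketch below is an illustrative addition (not part of the original notes) using only the Python standard library:

import math

def composite_trapezoid(f, a, b, n):
    # (5.13): h * [f_0/2 + f_1 + ... + f_{n-1} + f_n/2] on n equal subintervals.
    h = (b - a) / n
    return h * (0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n)))

def composite_midpoint(f, a, b, n):
    # (5.15): h times the sum of f at the midpoints x_j = a + (j - 1/2) h.
    h = (b - a) / n
    return h * sum(f(a + (j - 0.5) * h) for j in range(1, n + 1))

exact = 1.0 - math.cos(1.0)   # integral of sin x over [0, 1]
for n in [10, 20, 40]:
    print(n, exact - composite_trapezoid(math.sin, 0.0, 1.0, n),
             exact - composite_midpoint(math.sin, 0.0, 1.0, n))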

5.4 Additional Integration Techniques

5.4.1 The method of undetermined coefficients

The method of undetermined coefficients for deriving quadratures is the following:

1. Select the quadrature points.
2. Write a quadrature as a linear combination of the values of the function at the chosen quadrature points.
3. Determine the coefficients of the linear combination by requiring that the quadrature is exact for as many polynomials as possible from the ordered set {1, x, x^2, ...}.

We demonstrate this technique with the following example.

Example 5.4
Problem: Find a quadrature of the form

\int_0^1 f(x)dx \approx A_0 f(0) + A_1 f\left(\frac{1}{2}\right) + A_2 f(1)

that is exact for all polynomials of degree \le 2.

Solution: Since the quadrature has to be exact for all polynomials of degree \le 2, it has to be exact for the polynomials 1, x, and x^2. Hence we obtain the system of linear equations

1 = \int_0^1 1\, dx = A_0 + A_1 + A_2,
\frac{1}{2} = \int_0^1 x\, dx = \frac{1}{2}A_1 + A_2,
\frac{1}{3} = \int_0^1 x^2 dx = \frac{1}{4}A_1 + A_2.

Therefore, A_0 = A_2 = \frac{1}{6} and A_1 = \frac{2}{3}, and the desired quadrature is

\int_0^1 f(x)dx \approx \frac{f(0) + 4f(\frac{1}{2}) + f(1)}{6}.     (5.17)

Since the resulting formula (5.17) is linear, its being exact for 1, x, and x^2 implies that it is exact for any polynomial of degree \le 2. In fact, we will show in Section 5.5.1 that this approximation is actually exact for polynomials of degree \le 3.
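The same computation can be phrased as a small linear system and solved numerically. The sketch below (an illustrative addition, assuming NumPy is available; the variable names are ours) reproduces the coefficients of Example 5.4:

import numpy as np

nodes = np.array([0.0, 0.5, 1.0])
V = np.vander(nodes, 3, increasing=True).T      # row k holds nodes**k
moments = np.array([1.0, 1.0 / 2, 1.0 / 3])     # int_0^1 x**k dx for k = 0, 1, 2
A = np.linalg.solve(V, moments)
print(A)    # [1/6, 2/3, 1/6], as in Example 5.4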

5.4.2 Change of an interval

Suppose that we have a quadrature formula on the interval [c,d] of the form

\int_c^d f(t)dt \approx \sum_{i=0}^{n} A_i f(t_i).     (5.18)

We would like to use (5.18) to find a quadrature on the interval [a,b] that approximates

\int_a^b f(x)dx.

The mapping between the intervals [c,d] \to [a,b] can be written as a linear transformation of the form

\lambda(t) = \frac{b-a}{d-c}\, t + \frac{ad-bc}{d-c}.

Hence

\int_a^b f(x)dx = \frac{b-a}{d-c}\int_c^d f(\lambda(t))dt \approx \frac{b-a}{d-c}\sum_{i=0}^{n} A_i f(\lambda(t_i)).

This means that

\int_a^b f(x)dx \approx \frac{b-a}{d-c}\sum_{i=0}^{n} A_i f\left(\frac{b-a}{d-c}\, t_i + \frac{ad-bc}{d-c}\right).     (5.19)

We note that if the quadrature (5.18) was exact for polynomials of degree m, so is (5.19).

Example 5.5
We want to write the result of the previous example,

\int_0^1 f(x)dx \approx \frac{f(0) + 4f(\frac{1}{2}) + f(1)}{6},

as a quadrature on the interval [a,b]. According to (5.19),

\int_a^b f(x)dx \approx \frac{b-a}{6}\left[f(a) + 4f\left(\frac{a+b}{2}\right) + f(b)\right].     (5.20)

The approximation (5.20) is known as the Simpson quadrature.
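As a quick illustration (an addition to the notes, not part of the original text), the mapped rule (5.20) can be applied to \int_1^3 \sin x\, dx and compared with the exact value \cos 1 - \cos 3:

import math

def simpson(f, a, b):
    # (5.20): the [0, 1] rule of Example 5.4 mapped to [a, b] via (5.19).
    return (b - a) / 6 * (f(a) + 4 * f((a + b) / 2) + f(b))

approx = simpson(math.sin, 1.0, 3.0)
exact = math.cos(1.0) - math.cos(3.0)
print(approx, exact, approx - exact)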

5.4.3 General integration formulas

We recall that a weight function is a continuous, non-negative function with a positive mass. We assume that such a weight function w(x) is given and would like to write a quadrature of the form

\int_a^b f(x)w(x)dx \approx \sum_{i=0}^{n} A_i f(x_i).     (5.21)

Such quadratures are called general (weighted) quadratures. Previously, for the case w(x) \equiv 1, we wrote a quadrature of the form

\int_a^b f(x)dx \approx \sum_{i=0}^{n} A_i f(x_i),     where     A_i = \int_a^b l_i(x)dx.

Repeating the derivation we carried out in Section 5.2, we construct an interpolant Q_n(x) of degree \le n that passes through the points x_0, ..., x_n. Its Lagrange form is

Q_n(x) = \sum_{i=0}^{n} f(x_i)l_i(x),

with the usual

l_i(x) = \prod_{j=0, j \ne i}^{n} \frac{x-x_j}{x_i-x_j},   0 \le i \le n.

Hence

\int_a^b f(x)w(x)dx \approx \int_a^b Q_n(x)w(x)dx = \sum_{i=0}^{n} f(x_i)\int_a^b l_i(x)w(x)dx = \sum_{i=0}^{n} A_i f(x_i),

where the coefficients A_i are given by

A_i = \int_a^b l_i(x)w(x)dx.     (5.22)

To summarize, the general quadrature is

\int_a^b f(x)w(x)dx \approx \sum_{i=0}^{n} A_i f(x_i),     (5.23)

with quadrature coefficients, A_i, that are given by (5.22).
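When a closed form for the integrals (5.22) is inconvenient, the coefficients can be evaluated numerically. The following sketch is an illustrative addition (not part of the original notes); it assumes SciPy's adaptive quadrature is available, and the function name weighted_coefficients is ours:

import numpy as np
from scipy.integrate import quad

def weighted_coefficients(nodes, w, a, b):
    # A_i = int_a^b l_i(x) w(x) dx, cf. (5.22), evaluated by adaptive quadrature.
    nodes = np.asarray(nodes, dtype=float)
    def l(i, x):
        return np.prod([(x - xj) / (nodes[i] - xj)
                        for j, xj in enumerate(nodes) if j != i])
    return [quad(lambda x: l(i, x) * w(x), a, b)[0] for i in range(len(nodes))]

# With w(x) = 1 this reduces to the unweighted coefficients of Section 5.2.
print(weighted_coefficients([0.0, 0.5, 1.0], lambda x: 1.0, 0.0, 1.0))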

5.5 Simpson's Integration

In the last example we obtained Simpson's quadrature (5.20). An alternative derivation is the following: start with a polynomial Q_2(x) that interpolates f(x) at the points a, c = (a+b)/2, and b. Then approximate

\int_a^b f(x)dx \approx \int_a^b \left[\frac{(x-c)(x-b)}{(a-c)(a-b)}f(a) + \frac{(x-a)(x-b)}{(c-a)(c-b)}f(c) + \frac{(x-a)(x-c)}{(b-a)(b-c)}f(b)\right]dx = \ldots = \frac{b-a}{6}\left[f(a) + 4f\left(\frac{a+b}{2}\right) + f(b)\right],

which is Simpson's rule (5.20). Figure 5.5 demonstrates this process of deriving Simpson's quadrature for the specific choice of approximating \int_1^3 \sin x\, dx: the approximation is obtained by integrating the quadratic interpolant of \sin x over [1,3].

Figure 5.5: An example of Simpson's quadrature: the quadratic interpolant P_2(x) of \sin x on [1,3] is integrated in place of \sin x.

5.5.1 The quadrature error

Surprisingly, Simpson's quadrature is exact for polynomials of degree \le 3 and not only for polynomials of degree \le 2. We will see this by studying the error term. We let h denote half of the interval [a,b], i.e.,

h = \frac{b-a}{2}.

Then

\int_a^b f(x)dx = \int_a^{a+2h} f(x)dx.

We now define F(x) to be the primitive function of f(x), i.e.,

F(x) = \int_a^x f(t)dt.

Hence

F(a+2h) = \int_a^{a+2h} f(x)dx = F(a) + 2hF'(a) + \frac{(2h)^2}{2}F''(a) + \frac{(2h)^3}{6}F'''(a) + \frac{(2h)^4}{4!}F^{(4)}(a) + \frac{(2h)^5}{5!}F^{(5)}(a) + \ldots
        = 2hf(a) + 2h^2 f'(a) + \frac{4}{3}h^3 f''(a) + \frac{2}{3}h^4 f'''(a) + \frac{32}{5!}h^5 f^{(4)}(a) + \ldots

On the other hand, expanding f(a+h) and f(a+2h) about a,

\frac{h}{3}\left[f(a) + 4f(a+h) + f(a+2h)\right] = 2hf(a) + 2h^2 f'(a) + \frac{4}{3}h^3 f''(a) + \frac{2}{3}h^4 f'''(a) + \frac{100}{3\cdot 5!}h^5 f^{(4)}(a) + \ldots,

which implies that

F(a+2h) - \frac{h}{3}\left[f(a) + 4f(a+h) + f(a+2h)\right] = \left(\frac{32}{5!} - \frac{100}{3\cdot 5!}\right)h^5 f^{(4)}(a) + \ldots = -\frac{1}{90}h^5 f^{(4)}(a) + \ldots

This means that the quadrature error for Simpson's rule is

E = -\frac{1}{90}\left(\frac{b-a}{2}\right)^5 f^{(4)}(\xi),     \xi \in [a,b].     (5.24)

Since the fourth derivative of any polynomial of degree \le 3 is identically zero, the quadrature error formula (5.24) implies that Simpson's quadrature is exact for polynomials of degree \le 3.

5.5.2 Composite Simpson rule

To derive a composite version of Simpson's quadrature, we divide the interval [a,b] into an even number of subintervals, n, and let

x_i = a + ih,   0 \le i \le n,     where     h = \frac{b-a}{n}.

In this case, if we replace the integral over every pair of consecutive subintervals by Simpson's rule (5.20), we obtain

\int_a^b f(x)dx = \int_{x_0}^{x_2} f(x)dx + \ldots + \int_{x_{n-2}}^{x_n} f(x)dx = \sum_{i=1}^{n/2} \int_{x_{2i-2}}^{x_{2i}} f(x)dx \approx \sum_{i=1}^{n/2} \frac{h}{3}\left[f(x_{2i-2}) + 4f(x_{2i-1}) + f(x_{2i})\right].

The composite Simpson quadrature is thus given by

\int_a^b f(x)dx \approx \frac{h}{3}\left[f(x_0) + 4\sum_{i=1}^{n/2} f(x_{2i-1}) + 2\sum_{i=1}^{n/2-1} f(x_{2i}) + f(x_n)\right].     (5.25)

Summing the error terms (that are given by (5.24)) over all sub-intervals, the quadrature error takes the form

E = -\frac{h^5}{90}\sum_{i=1}^{n/2} f^{(4)}(\xi_i) = -\frac{h^5}{90}\cdot\frac{n}{2}\cdot\frac{2}{n}\sum_{i=1}^{n/2} f^{(4)}(\xi_i).

Since

\min_{x \in [a,b]} f^{(4)}(x) \le \frac{2}{n}\sum_{i=1}^{n/2} f^{(4)}(\xi_i) \le \max_{x \in [a,b]} f^{(4)}(x),

we can conclude that

E = -\frac{h^5}{90}\cdot\frac{n}{2}\, f^{(4)}(\xi) = -\frac{h^4}{180}(b-a) f^{(4)}(\xi),     \xi \in [a,b],     (5.26)

i.e., the composite Simpson quadrature is fourth-order accurate.
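A sketch of the composite Simpson rule (an illustrative addition, not part of the original notes) that also exhibits the fourth-order accuracy predicted by (5.26), i.e., halving h divides the error by about 16:

import math

def composite_simpson(f, a, b, n):
    # (5.25): n must be even; weights follow the pattern 1, 4, 2, 4, ..., 2, 4, 1.
    assert n % 2 == 0
    h = (b - a) / n
    s = f(a) + f(b)
    s += 4 * sum(f(a + (2 * i - 1) * h) for i in range(1, n // 2 + 1))
    s += 2 * sum(f(a + 2 * i * h) for i in range(1, n // 2))
    return h / 3 * s

exact = 1.0 - math.cos(1.0)
for n in [4, 8, 16]:
    print(n, exact - composite_simpson(math.sin, 0.0, 1.0, n))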

5.6 Gaussian Quadrature

5.6.1 Maximizing the quadrature's accuracy

So far, all the quadratures we encountered were of the form

\int_a^b f(x)dx \approx \sum_{i=0}^{n} A_i f(x_i).     (5.27)

An approximation of the form (5.27) was shown to be exact for polynomials of degree \le n for an appropriate choice of the quadrature coefficients A_i. In all cases, the quadrature points x_0, ..., x_n were given up front. In other words, given a set of nodes x_0, ..., x_n, the coefficients {A_i}_{i=0}^{n} were determined such that the approximation was exact in \Pi_n.

We are now interested in investigating the possibility of writing more accurate quadratures without increasing the total number of quadrature points. This will be possible if we allow for the freedom of choosing the quadrature points. The quadrature problem becomes now a problem of choosing the quadrature points, in addition to determining the corresponding coefficients, in a way that the quadrature is exact for polynomials of a maximal degree. Quadratures that are obtained that way are called Gaussian quadratures.

Example 5.6
The quadrature formula

\int_{-1}^{1} f(x)dx \approx f\left(-\frac{1}{\sqrt{3}}\right) + f\left(\frac{1}{\sqrt{3}}\right)

is exact for polynomials of degree \le 3(!) We will revisit this problem and prove this result in Example 5.9 below.

An equivalent problem can be stated for the more general weighted quadrature case. Here,

\int_a^b f(x)w(x)dx \approx \sum_{i=0}^{n} A_i f(x_i),     (5.28)

where w(x) \ge 0 is a weight function. The quadrature (5.28) is exact for f \in \Pi_n if and only if

A_i = \int_a^b w(x)\prod_{j=0, j \ne i}^{n} \frac{x-x_j}{x_i-x_j}\, dx.

In both cases, (5.27) and (5.28), the number of quadrature nodes, x_0, ..., x_n, is n+1, and so is the number of quadrature coefficients, A_i. Hence, if we have the flexibility of determining the location of the points in addition to determining the coefficients, we have altogether 2n+2 degrees of freedom, and hence we can expect to be able to derive quadratures that are exact for polynomials in \Pi_{2n+1}. This is indeed the case, as we shall see below. We will show that the general solution of this integration problem is connected with the roots of orthogonal polynomials. We start with the following theorem.

Theorem 5.7 Let q(x) be a nonzero polynomial of degree n+1 that is w-orthogonal to \Pi_n, i.e., \forall p(x) \in \Pi_n,

\int_a^b p(x)q(x)w(x)dx = 0.

If x_0, ..., x_n are the zeros of q(x), then (5.28) is exact \forall f \in \Pi_{2n+1}.

Proof. For f(x) \in \Pi_{2n+1}, write f(x) = q(x)p(x) + r(x). We note that p(x), r(x) \in \Pi_n. Since x_0, ..., x_n are the zeros of q(x), f(x_i) = r(x_i). Hence,

\int_a^b f(x)w(x)dx = \int_a^b [q(x)p(x) + r(x)]w(x)dx = \int_a^b r(x)w(x)dx = \sum_{i=0}^{n} A_i r(x_i) = \sum_{i=0}^{n} A_i f(x_i).     (5.29)

The second equality in (5.29) holds since q(x) is w-orthogonal to \Pi_n. The third equality in (5.29) holds since (5.28) is exact for polynomials in \Pi_n.

According to Theorem 5.7 we already know that the quadrature points that will provide the most accurate quadrature rule are the n+1 roots of an orthogonal polynomial of degree n+1 (where the orthogonality is with respect to the weight function w(x)). We recall that the roots of q(x) are real, simple and lie in (a,b), something we know from our previous discussion on orthogonal polynomials (see Theorem 3.17). In other words, we need n+1 quadrature points in the interval, and an orthogonal polynomial of degree n+1 does have n+1 distinct roots in the interval. We now restate the result regarding the roots of orthogonal functions with an alternative proof.

Theorem 5.8 Let w(x) be a weight function. Assume that f(x) is continuous in [a,b], that it is not the zero function, and that f(x) is w-orthogonal to \Pi_n. Then f(x) changes sign at least n+1 times on (a,b).

Proof. Since 1 \in \Pi_n,

\int_a^b f(x)w(x)dx = 0.

Hence, f(x) changes sign at least once. Now suppose that f(x) changes sign only r times, where r \le n. Choose {t_i} such that

a = t_0 < t_1 < \ldots < t_{r+1} = b,

and f(x) is of one sign on each of (t_0,t_1), (t_1,t_2), ..., (t_r,t_{r+1}). The polynomial

p(x) = \prod_{i=1}^{r} (x - t_i)

has the same sign property, so the product f(x)p(x) does not change sign on (a,b). Hence

\int_a^b f(x)p(x)w(x)dx \ne 0,

which leads to a contradiction, since p(x) \in \Pi_n and f(x) is w-orthogonal to \Pi_n.

Example 5.9
We are looking for a quadrature of the form

\int_{-1}^{1} f(x)dx \approx A_0 f(x_0) + A_1 f(x_1).

A straightforward computation will amount to making this quadrature exact for polynomials of degree \le 3. The linearity of the quadrature means that it is sufficient to make the quadrature exact for 1, x, x^2, and x^3. Hence we write the system of equations

\int_{-1}^{1} x^i dx = A_0 x_0^i + A_1 x_1^i,     i = 0, 1, 2, 3.

From this we can write

A_0 + A_1 = 2,
A_0 x_0 + A_1 x_1 = 0,
A_0 x_0^2 + A_1 x_1^2 = \frac{2}{3},
A_0 x_0^3 + A_1 x_1^3 = 0.

Solving for A_0, A_1, x_0, and x_1, we get

A_0 = A_1 = 1,     x_0 = -x_1 = \frac{1}{\sqrt{3}},

so that the desired quadrature is

\int_{-1}^{1} f(x)dx \approx f\left(-\frac{1}{\sqrt{3}}\right) + f\left(\frac{1}{\sqrt{3}}\right).     (5.30)

Example 5.10
We repeat the previous problem, this time using orthogonal polynomials. Since n = 1, we expect to find a quadrature that is exact for polynomials of degree 2n+1 = 3. The polynomial of degree n+1 = 2 which is orthogonal to \Pi_n = \Pi_1 with weight w(x) \equiv 1 is the Legendre polynomial of degree 2, i.e.,

P_2(x) = \frac{1}{2}(3x^2 - 1).

The integration points will then be the zeros of P_2(x), i.e.,

x_0 = -\frac{1}{\sqrt{3}},     x_1 = \frac{1}{\sqrt{3}}.

All that remains is to determine the coefficients A_0, A_1. This is done in the usual way, by requiring that the quadrature

\int_{-1}^{1} f(x)dx \approx A_0 f(x_0) + A_1 f(x_1)

is exact for polynomials of degree \le 1. The simplest will be to use 1 and x, i.e.,

2 = \int_{-1}^{1} 1\, dx = A_0 + A_1,     and     0 = \int_{-1}^{1} x\, dx = -A_0\frac{1}{\sqrt{3}} + A_1\frac{1}{\sqrt{3}}.

Hence A_0 = A_1 = 1, and the quadrature is the same as (5.30) (as it should be).
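A few lines of Python (an illustrative addition, not part of the original notes) confirm that the two-point rule (5.30) is exact for cubics and already accurate for smooth integrands:

import math

def gauss2(f):
    # (5.30): the two-point Gauss-Legendre rule on [-1, 1].
    x = 1.0 / math.sqrt(3.0)
    return f(-x) + f(x)

print(gauss2(lambda x: x**3 + x**2 + 1.0))          # 8/3, exactly the integral
print(gauss2(math.exp), math.exp(1) - math.exp(-1))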

5.6.2 Convergence and error analysis

Lemma 5.11 In a Gaussian quadrature formula, the coefficients are positive and their sum is \int_a^b w(x)dx.

Proof. Fix n. Let q(x) \in \Pi_{n+1} be w-orthogonal to \Pi_n, let x_0, ..., x_n be its roots (so that q(x_i) = 0 for i = 0, ..., n), and take {x_i}_{i=0}^{n} to be the quadrature points, i.e.,

\int_a^b f(x)w(x)dx \approx \sum_{i=0}^{n} A_i f(x_i).     (5.31)

Fix 0 \le j \le n. Let p(x) \in \Pi_n be defined as

p(x) = \frac{q(x)}{x - x_j}.

Since x_j is a root of q(x), p(x) is indeed a polynomial of degree \le n. The degree of p^2(x) is \le 2n, which means that the Gaussian quadrature (5.31) is exact for it. Hence

0 < \int_a^b p^2(x)w(x)dx = \sum_{i=0}^{n} A_i p^2(x_i) = \sum_{i=0}^{n} A_i \frac{q^2(x_i)}{(x_i - x_j)^2} = A_j p^2(x_j),

which means that \forall j, A_j > 0. In addition, since the Gaussian quadrature is exact for f(x) \equiv 1, we have

\int_a^b w(x)dx = \sum_{i=0}^{n} A_i.

In order to estimate the error in the Gaussian quadrature we would first like to present an alternative way of deriving the Gaussian quadrature. Our starting point is the Lagrange form of the Hermite polynomial that interpolates f(x) and f'(x) at x_0, ..., x_n. It is given by (2.42), i.e.,

p(x) = \sum_{i=0}^{n} f(x_i)a_i(x) + \sum_{i=0}^{n} f'(x_i)b_i(x),

with

a_i(x) = (l_i(x))^2 [1 + 2l_i'(x_i)(x_i - x)],     b_i(x) = (x - x_i) l_i^2(x),     0 \le i \le n,

and

l_i(x) = \prod_{j=0, j \ne i}^{n} \frac{x-x_j}{x_i-x_j},   0 \le i \le n.

We now assume that w(x) is a weight function in [a,b] and approximate

\int_a^b w(x)f(x)dx \approx \int_a^b w(x)p_{2n+1}(x)dx = \sum_{i=0}^{n} A_i f(x_i) + \sum_{i=0}^{n} B_i f'(x_i),     (5.32)

where

A_i = \int_a^b w(x)a_i(x)dx,     (5.33)

and

B_i = \int_a^b w(x)b_i(x)dx.     (5.34)

In some sense, it seems to be rather strange to deal with the Hermite interpolant when we do not explicitly know the values of f'(x) at the interpolation points. However, we can eliminate the derivatives from the quadrature (5.32) by setting B_i = 0 in (5.34). Indeed,

B_i = \int_a^b w(x)(x - x_i)l_i^2(x)dx = \frac{1}{\prod_{j \ne i}(x_i - x_j)}\int_a^b w(x)\prod_{j=0}^{n}(x - x_j)\, l_i(x)dx,

so that B_i = 0 if the product \prod_{j=0}^{n}(x - x_j) is w-orthogonal to l_i(x). Since l_i(x) is a polynomial in \Pi_n, all that we need is to set the points x_0, ..., x_n as the roots of a polynomial of degree n+1 that is w-orthogonal to \Pi_n. This is precisely what we defined as a Gaussian quadrature.

We are now ready to formally establish the fact that the Gaussian quadrature is exact for polynomials of degree \le 2n+1.

Theorem 5.12 Let f \in C^{2n+2}[a,b] and let w(x) be a weight function. Consider the Gaussian quadrature

\int_a^b f(x)w(x)dx \approx \sum_{i=0}^{n} A_i f(x_i).

Then there exists \zeta \in (a,b) such that

\int_a^b f(x)w(x)dx - \sum_{i=0}^{n} A_i f(x_i) = \frac{f^{(2n+2)}(\zeta)}{(2n+2)!}\int_a^b \prod_{j=0}^{n}(x - x_j)^2\, w(x)dx.

Hence according to (5. and consider the corresponding Gaussian quadrature: b a n f (x)w(x)dx ≈ Ani f (xni ).32) we have b a n b b f (x)w(x)dx − Ai f (xi ) = i=0 a b f (x)w(x)dx − p2n+1 w(x)dx a n = a f (2n+2) (ξ) w(x) (x − xj )2 dx. the Gaussian quadrature converges to the exact value of the integral as the number of quadrature points tends to infinity. We use the characterization of the Gaussian quadrature as the integral of a Hermite interpolant. f (x) − p2n+1 (x) = f (2n+2) (ξ) (x − xj )2 . e. 93 . b) such that b a n f (x)w(x)dx − Ai f (xi ) = i=0 f (2n+2) (ζ) (2n + 2)! b n a j=0 (x − xj )2 (x)w(x)dx. We will demonstrate this principle with a particular example. i=0 (5. Levy 5.g.13 We let w(x) be a weight function and assuming that f (x) is a continuous function on [a. i.e. Let I denote the exact integral that we would like to approximate. in [7].. We recall that the error formula for the Hermite interpolation is given by (2.35) Then the right-hand-side of (5. For each n ∈ N we let {xni }n be the n + 1 roots of the i=0 polynomial of degree n + 1 that is w-orthogonal to Πn . (2n + 2)! j=0 The integral mean value theorem then implies that there exists ζ ∈ (a.7 Romberg Integration Proof.7 Romberg Integration We have introduced Richardson’s extrapolation in Section 4.35) converges to the left-hand-side as n → ∞. b I= a f (x)dx.. We can use a similar principle with numerical integration.D. A proof of the theorem that is based on the Weierstrass approximation theorem can be found in. This theorem is not of a great practical value because it does not provide an estimate on the rate of convergence. b). (2n + 2)! j=0 n ξ ∈ (a. b].4 in the context of numerical differentiation. 5.49). We conclude this section with a convergence theorem that states that for continuous functions. Theorem 5.

Let us assume that this integral is approximated with the composite trapezoidal rule on a uniform grid with mesh spacing h, (5.13),

T(h) = h \sum{}''_{i=0}^{n} f(a+ih).

We know that the composite trapezoidal rule is second-order accurate (see (5.14)). A more detailed study of the quadrature error reveals that the difference between I and T(h) can be written as

I = T(h) + c_1 h^2 + c_2 h^4 + \ldots + c_k h^{2k} + O(h^{2k+2}).

The exact values of the coefficients c_k are of no interest to us as long as they do not depend on h (which is indeed the case). We can now write a similar quadrature that is based on half the number of points, i.e., T(2h). Hence

I = T(2h) + c_1 (2h)^2 + c_2 (2h)^4 + \ldots

This enables us to eliminate the h^2 error term:

I = \frac{4T(h) - T(2h)}{3} + \hat{c}_2 h^4 + \ldots

Therefore

\frac{4T(h) - T(2h)}{3} = \frac{1}{3}\left[4h\left(\frac{1}{2}f_0 + f_1 + \ldots + f_{n-1} + \frac{1}{2}f_n\right) - 2h\left(\frac{1}{2}f_0 + f_2 + \ldots + f_{n-2} + \frac{1}{2}f_n\right)\right]
 = \frac{h}{3}\left(f_0 + 4f_1 + 2f_2 + \ldots + 2f_{n-2} + 4f_{n-1} + f_n\right) = S(n).

Here, S(n) denotes the composite Simpson's rule with n subintervals. The procedure of increasing the accuracy of a quadrature by eliminating the leading error term is known as Romberg integration. In some places, Romberg integration is used to describe only the specific case of turning the composite trapezoidal rule into Simpson's rule (and so on). The quadrature that is obtained from Simpson's rule by eliminating the leading error term is known as the super Simpson rule.
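A sketch of one Romberg step (an illustrative addition; the function names are ours): the extrapolated value (4T(h) - T(2h))/3 indeed behaves like Simpson's rule and is far more accurate than T(h) itself.

import math

def trapezoid(f, a, b, n):
    h = (b - a) / n
    return h * (0.5 * f(a) + sum(f(a + i * h) for i in range(1, n)) + 0.5 * f(b))

def romberg_step(f, a, b, n):
    # Eliminate the h**2 term: (4 T(h) - T(2h)) / 3, which equals S(n).
    return (4 * trapezoid(f, a, b, n) - trapezoid(f, a, b, n // 2)) / 3

exact = 1.0 - math.cos(1.0)
for n in [4, 8, 16]:
    print(n, exact - trapezoid(math.sin, 0.0, 1.0, n),
             exact - romberg_step(math.sin, 0.0, 1.0, n))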

6 Methods for Solving Nonlinear Problems

6.1 The Bisection Method

In this section we present the “bisection method”, which is probably the most intuitive approach to root finding. We are looking for a root of a function f(x) which we assume is continuous on the interval [a,b]. We also assume that it has opposite signs at both edges of the interval, i.e., f(a)f(b) < 0. We then know that f(x) has at least one zero in [a,b]. Of course, f(x) may have more than one zero in the interval. The bisection method is only going to converge to one of the zeros of f(x). There will also be no indication of how many zeros f(x) has in the interval, and no hints regarding where we can actually hope to find more roots, if indeed there are additional roots.

The first step is to divide the interval into two equal subintervals at

c = \frac{a+b}{2}.

This generates two subintervals, [a,c] and [c,b], of equal lengths. We want to keep the subinterval that is guaranteed to contain a root. Of course, in the rare event where f(c) = 0 we are done. Otherwise, we check if f(a)f(c) < 0. If yes, we keep the left subinterval [a,c]. If f(a)f(c) > 0, we keep the right subinterval [c,b]. This procedure repeats until the stopping criterion is satisfied: we fix a small parameter \varepsilon > 0 and stop when |f(c)| < \varepsilon. To simplify the notation, we denote the successive intervals by [a_0,b_0], [a_1,b_1], .... The first two iterations in the bisection method are shown in Figure 6.1. Note that in the case that is shown in the figure, the function f(x) has multiple roots, but the method converges to only one of them.

Figure 6.1: The first two iterations in a bisection root-finding method

bn ]. an + b n cn = . we have 1 |r − cn | (bn − an ) = 2−(n+1) (b0 − a0 ). 2 We summarize this result with the following theorem.1 The Bisection Method D. so that both sequences converge to the same value. Given such an interval. 2 . 1 n 0. Theorem 6. We now assume that we stop in the interval [an . In addition. 2 In this case. which means that f (r) = 0.e. then the limits limn→∞ an and limn→∞ bn exist. b0 . if we have to guess where is the root (which we know is in the interval). We would also like to figure out how close we are to a root after iterating the algorithm several times. r = lim an = lim bn . i.. bn ]. a0 . it is easy to see that the best estimate for the location of the root is the center of the interval. i.. Levy We would now like to understand if the bisection method always converges to a root. n→∞ n→∞ Since f (an )f (bn ) 0. We also know that every iteration shrinks the length of the interval by a half. 96 an + b n ... i. This means that r ∈ [an ..1 If [an . The sequences {an }n Also n→∞ n→∞ 0 a1 a2 . lim bn − lim an = lim 2−n (b0 − a0 ) = 0.e. We denote that value by r. n→∞ where f (r) = 0. bn ] is the interval that is obtained in the nth iteration of the bisection method.. if cn = then |r − cn | 2−(n+1) (b0 − a0 ). and n→∞ lim an = lim bn = r. and hence converge. and {bn }n n→∞ 0 are monotone and bounded. we know that (f (r))2 0. r is a root of f (x).e.6.. We first note that a0 and b0 b1 b2 . i. bn+1 − an+1 = (bn − an ).e. 2 which means that bn − an = 2−n (b0 − a0 )..

6.2 Newton's Method

Newton's method is a relatively simple, practical, and widely-used root finding method. It is easy to see that while in some cases the method rapidly converges to a root of the function, in other cases it may fail to converge at all. This is one reason why it is so important not only to understand the construction of the method, but also to understand its limitations.

As always, we assume that f(x) has at least one (real) root, and denote it by r. We start with an initial guess for the location of the root, say x_0. We then let l(x) be the tangent line to f(x) at x_0, i.e.,

l(x) - f(x_0) = f'(x_0)(x - x_0).

The intersection of l(x) with the x-axis serves as the next estimate of the root. We denote this point by x_1 and write

0 - f(x_0) = f'(x_0)(x_1 - x_0),

which means that

x_1 = x_0 - \frac{f(x_0)}{f'(x_0)}.     (6.1)

In general, starting from a point x_n, we find the next approximation of the root, x_{n+1}, from which we find x_{n+2}, and so on. Hence, Newton's method (also known as the Newton-Raphson method) for finding a root is given by iterating (6.1) repeatedly, i.e.,

x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}.     (6.2)

Two sample iterations of the method are shown in Figure 6.2; in this case, we do converge to the root of f(x). However, it is easy to see that Newton's method does not always converge. We demonstrate such a case in Figure 6.3. Here we consider the function f(x) = \tan^{-1}(x) and show what happens if we start with a point which is a fixed point of Newton's method iterated twice; x_0 \approx 1.3917 is such a point.

In order to analyze the error in Newton's method, we let the error in the nth iteration be

e_n = x_n - r.

We assume that f''(x) is continuous and that f'(r) \ne 0, i.e., that r is a simple root of f(x). We will show that the method has a quadratic convergence rate, i.e.,

e_{n+1} \approx c e_n^2.     (6.3)

A convergence rate estimate of the type (6.3) makes sense, of course, only if the method converges. As always, we will prove the convergence of the method only for certain functions f(x), but before we get to the convergence issue, let us derive the estimate (6.3).


Figure 6.2: Two iterations in Newton's root-finding method. r is the root of f(x) we approach by starting from x_n, computing x_{n+1}, then x_{n+2}, etc.

Figure 6.3: Newton's method does not always converge. In this case, the starting point is a fixed point of Newton's method iterated twice.


We rewrite e_{n+1} as

e_{n+1} = x_{n+1} - r = x_n - \frac{f(x_n)}{f'(x_n)} - r = e_n - \frac{f(x_n)}{f'(x_n)} = \frac{e_n f'(x_n) - f(x_n)}{f'(x_n)}.

Writing a Taylor expansion of f(r) about x = x_n we have

0 = f(r) = f(x_n - e_n) = f(x_n) - e_n f'(x_n) + \frac{1}{2} e_n^2 f''(\xi_n),

which means that

e_n f'(x_n) - f(x_n) = \frac{1}{2} f''(\xi_n) e_n^2.

Hence, the relation (6.3), e_{n+1} \approx c e_n^2, holds with

c = \frac{1}{2}\frac{f''(\xi_n)}{f'(x_n)}.     (6.4)

Since we assume that the method converges, in the limit as n \to \infty we can replace (6.4) by

c = \frac{1}{2}\frac{f''(r)}{f'(r)}.     (6.5)

We now return to the issue of convergence and prove that for certain functions Newton's method converges regardless of the starting point.

Theorem 6.2 Assume that f(x) has two continuous derivatives, is monotonically increasing, convex, and has a zero. Then the zero is unique and Newton's method will converge to it from every starting point.

Proof. The assumptions on the function f(x) imply that \forall x, f''(x) > 0 and f'(x) > 0. By (6.4), the error at the (n+1)th iteration, e_{n+1}, is given by

e_{n+1} = \frac{1}{2}\frac{f''(\xi_n)}{f'(x_n)} e_n^2,

and hence it is positive, i.e., e_{n+1} > 0. This implies that \forall n \ge 1, x_n > r. Since f'(x) > 0, we have f(x_n) > f(r) = 0. Now, subtracting r from both sides of (6.2) we may write

e_{n+1} = e_n - \frac{f(x_n)}{f'(x_n)},     (6.6)


which means that e_{n+1} < e_n (and hence x_{n+1} < x_n). Hence, both {e_n}_{n \ge 0} and {x_n}_{n \ge 0} are decreasing and bounded from below. This means that both sequences converge, i.e., there exists e^* such that

e^* = \lim_{n \to \infty} e_n,

and there exists x^* such that

x^* = \lim_{n \to \infty} x_n.

By (6.6) we have

e^* = e^* - \frac{f(x^*)}{f'(x^*)},

so that f(x^*) = 0, and hence x^* = r.
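A minimal implementation of the iteration (6.2) (an illustrative addition, not part of the original notes; the stopping criterion based on |x_{n+1} - x_n| is our choice):

import math

def newton(f, fprime, x0, eps=1e-12, max_iter=50):
    # (6.2): x_{n+1} = x_n - f(x_n) / f'(x_n).
    x = x0
    for _ in range(max_iter):
        x_new = x - f(x) / fprime(x)
        if abs(x_new - x) < eps:
            return x_new
        x = x_new
    return x

# f(x) = exp(x) - 2 is increasing and convex with a unique zero at log(2),
# so by Theorem 6.2 the iteration converges from every starting point.
print(newton(lambda x: math.exp(x) - 2.0, math.exp, 0.0), math.log(2.0))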

6.3 The Secant Method

We recall that Newton's root finding method is given by equation (6.2), i.e.,

x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}.

We now assume that we do not know that the function f(x) is differentiable at x_n, and thus cannot use Newton's method as is. Instead, we can replace the derivative f'(x_n) that appears in Newton's method by a difference approximation. A particular choice of such an approximation,

f'(x_n) \approx \frac{f(x_n) - f(x_{n-1})}{x_n - x_{n-1}},

leads to the secant method, which is given by

x_{n+1} = x_n - f(x_n)\frac{x_n - x_{n-1}}{f(x_n) - f(x_{n-1})},     n \ge 1.     (6.7)

A geometric interpretation of the secant method is shown in Figure 6.4. Given two points, (x_{n-1}, f(x_{n-1})) and (x_n, f(x_n)), the line l(x) that connects them satisfies

l(x) - f(x_n) = \frac{f(x_{n-1}) - f(x_n)}{x_{n-1} - x_n}(x - x_n).

The next approximation of the root, x_{n+1}, is defined as the intersection of l(x) and the x-axis, i.e.,

0 - f(x_n) = \frac{f(x_{n-1}) - f(x_n)}{x_{n-1} - x_n}(x_{n+1} - x_n).     (6.8)

Figure 6.4: The secant root-finding method. The points x_{n-1} and x_n are used to obtain x_{n+1}, which is the next approximation of the root r.

Rearranging the terms in (6.8) we end up with the secant method (6.7). We note that the secant method (6.7) requires two initial points. While this is an extra requirement compared with, e.g., Newton's method, we note that in the secant method there is no need to evaluate any derivatives. In addition, if implemented properly, every stage requires only one new function evaluation.

We now proceed with an error analysis for the secant method. As usual, we denote the error at the nth iteration by e_n = x_n - r. We claim that the rate of convergence of the secant method is superlinear (meaning, better than linear but less than quadratic). More precisely, we will show that it is given by

|e_{n+1}| \approx |e_n|^{\alpha},     (6.9)

with

\alpha = \frac{1+\sqrt{5}}{2}.     (6.10)

We start by rewriting e_{n+1} as

e_{n+1} = x_{n+1} - r = \frac{f(x_n)x_{n-1} - f(x_{n-1})x_n}{f(x_n) - f(x_{n-1})} - r = \frac{f(x_n)e_{n-1} - f(x_{n-1})e_n}{f(x_n) - f(x_{n-1})}.

Hence

e_{n+1} = e_n e_{n-1} \left[\frac{\frac{f(x_n)}{e_n} - \frac{f(x_{n-1})}{e_{n-1}}}{x_n - x_{n-1}}\right] \cdot \frac{x_n - x_{n-1}}{f(x_n) - f(x_{n-1})}.     (6.11)

A Taylor expansion of f(x_n) about x = r reads

f(x_n) = f(r + e_n) = f(r) + e_n f'(r) + \frac{1}{2} e_n^2 f''(r) + O(e_n^3),

and hence (recalling that f(r) = 0)

\frac{f(x_n)}{e_n} = f'(r) + \frac{1}{2} e_n f''(r) + O(e_n^2).

We thus have

\frac{f(x_n)}{e_n} - \frac{f(x_{n-1})}{e_{n-1}} = \frac{1}{2}(e_n - e_{n-1}) f''(r) + O(e_{n-1}^2) + O(e_n^2) = \frac{1}{2}(x_n - x_{n-1}) f''(r) + O(e_{n-1}^2) + O(e_n^2).

Therefore,

\frac{\frac{f(x_n)}{e_n} - \frac{f(x_{n-1})}{e_{n-1}}}{x_n - x_{n-1}} \approx \frac{1}{2} f''(r),     and     \frac{x_n - x_{n-1}}{f(x_n) - f(x_{n-1})} \approx \frac{1}{f'(r)}.

The error expression (6.11) can be now simplified to

e_{n+1} \approx \frac{1}{2}\frac{f''(r)}{f'(r)} e_n e_{n-1} = c e_n e_{n-1}.     (6.12)

Equation (6.12) expresses the error at iteration n+1 in terms of the errors at iterations n and n-1. In order to turn this into a relation between the error at the (n+1)th iteration and the error at the nth iteration, we now assume that the order of convergence is \alpha, i.e.,

|e_{n+1}| \sim A|e_n|^{\alpha}.     (6.13)

Since (6.13) also means that |e_n| \sim A|e_{n-1}|^{\alpha}, we have

A|e_n|^{\alpha} \sim C|e_n| A^{-1/\alpha} |e_n|^{1/\alpha}.

This implies that

A^{1+1/\alpha} C^{-1} \sim |e_n|^{1-\alpha+1/\alpha}.     (6.14)

The left-hand side of (6.14) is non-zero, while the right-hand side of (6.14) tends to zero as n \to \infty (assuming, of course, that the method converges). This is possible only if

1 - \alpha + \frac{1}{\alpha} = 0,

which, in turn, means that

\alpha = \frac{1+\sqrt{5}}{2}.

The constant A in (6.13) is thus given by

A = C^{\frac{1}{1+1/\alpha}} = C^{\frac{\alpha}{\alpha+1}} = C^{\alpha-1} = \left(\frac{f''(r)}{2f'(r)}\right)^{\alpha-1}.

In this case, the convergence is of order \frac{1+\sqrt{5}}{2} \approx 1.618, i.e., superlinear. We summarize this result with the following theorem:

Theorem 6.3 Assume that f''(x) is continuous \forall x in an interval I. Assume that f(r) = 0 and that f'(r) \ne 0. If x_0, x_1 are sufficiently close to the root r, then x_n \to r. In this case, the convergence is of order \frac{1+\sqrt{5}}{2}.
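A minimal implementation of the secant iteration (6.7) (an illustrative addition, not part of the original notes; the stopping criterion is our choice), which indeed uses only one new function evaluation per step:

import math

def secant(f, x0, x1, eps=1e-12, max_iter=100):
    # (6.7): no derivatives are needed, only the two most recent function values.
    f0, f1 = f(x0), f(x1)
    for _ in range(max_iter):
        x2 = x1 - f1 * (x1 - x0) / (f1 - f0)
        if abs(x2 - x1) < eps:
            return x2
        x0, f0, x1, f1 = x1, f1, x2, f(x2)
    return x1

# Same problem as for Newton's method: the root of exp(x) - 2 is log(2);
# the errors decay with order (1 + sqrt(5))/2.
print(secant(lambda x: math.exp(x) - 2.0, 0.0, 1.0), math.log(2.0))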

References

[1] Atkinson K., An introduction to numerical analysis, Second edition, John Wiley & Sons, New York, NY, 1989.

[2] Cheney E.W., Introduction to approximation theory, Second edition, Chelsea Publishing Company, New York, NY, 1982.

[3] Dahlquist G., Björck Å., Numerical methods, Prentice-Hall, Englewood Cliffs, NJ, 1974.

[4] Davis P.J., Interpolation and approximation, Dover, New York, NY, 1975.

[5] Isaacson E., Keller H.B., Analysis of numerical methods, Second edition, Dover, Mineola, NY, 1994.

[6] Stoer J., Bulirsch R., Introduction to numerical analysis, Second edition, Springer-Verlag, New York, NY, 1993.

[7] Süli E., Mayers D., An introduction to numerical analysis, Cambridge University Press, Cambridge, UK, 2003.
