Doron Levy
Department of Mathematics
Stanford University
December 1, 2005
Contents

1 Introduction
2 Interpolation
   2.1 What is Interpolation?
   2.2 The Interpolation Problem
   2.3 Newton's Form of the Interpolation Polynomial
   2.4 The Interpolation Problem and the Vandermonde Determinant
   2.5 The Lagrange Form of the Interpolation Polynomial
   2.6 Divided Differences
   2.7 The Error in Polynomial Interpolation
   2.8 Interpolation at the Chebyshev Points
   2.9 Hermite Interpolation
       2.9.1 Divided differences with repetitions
       2.9.2 The Lagrange form of the Hermite interpolant
   2.10 Spline Interpolation
       2.10.1 Cubic splines
       2.10.2 What is natural about the natural spline?
3 Approximations
   3.1 Background
   3.2 The Minimax Approximation Problem
       3.2.1 Existence of the minimax polynomial
       3.2.2 Bounds on the minimax error
       3.2.3 Characterization of the minimax polynomial
       3.2.4 Uniqueness of the minimax polynomial
       3.2.5 The near-minimax polynomial
       3.2.6 Construction of the minimax polynomial
   3.3 Least-squares Approximations
       3.3.1 The least-squares approximation problem
       3.3.2 Solving the least-squares problem: a direct method
       3.3.3 Solving the least-squares problem with orthogonal polynomials
       3.3.4 The weighted least-squares problem
       3.3.5 Orthogonal polynomials
       3.3.6 Another approach to the least-squares problem
       3.3.7 Properties of orthogonal polynomials
4 Numerical Differentiation
   4.1 Basic Concepts
   4.2 Differentiation Via Interpolation
   4.3 The Method of Undetermined Coefficients
   4.4 Richardson's Extrapolation
5 Numerical Integration
   5.1 Basic Concepts
   5.2 Integration via Interpolation
   5.3 Composite Integration Rules
   5.4 Additional Integration Techniques
       5.4.1 The method of undetermined coefficients
       5.4.2 Change of an interval
       5.4.3 General integration formulas
   5.5 Simpson's Integration
       5.5.1 The quadrature error
       5.5.2 Composite Simpson rule
   5.6 Gaussian Quadrature
       5.6.1 Maximizing the quadrature's accuracy
       5.6.2 Convergence and error analysis
   5.7 Romberg Integration
6 Methods for Solving Nonlinear Problems
   6.1 The Bisection Method
   6.2 Newton's Method
   6.3 The Secant Method
Bibliography
1 Introduction
2 Interpolation
2.1 What is Interpolation?
Imagine that there is an unknown function f(x) for which someone supplies you with its (exact) values at (n+1) distinct points x_0 < x_1 < ... < x_n, i.e., f(x_0), ..., f(x_n) are given. The interpolation problem is to construct a function Q(x) that passes through these points. One easy way of finding such a function is to connect them with straight lines. While this is a legitimate solution of the interpolation problem, usually (though not always) we are interested in a different kind of solution, e.g., a smoother function. We therefore always specify a certain class of functions from which we would like to find one that solves the interpolation problem. For example, we may look for a polynomial, Q(x), that passes through these points. Alternatively, we may look for a trigonometric function or a piecewise-smooth polynomial such that the interpolation requirements

    Q(x_j) = f(x_j),    0 ≤ j ≤ n,    (2.1)

are satisfied (see Figure 2.1).
[Figure 2.1: The function f(x), the interpolation points x_0, x_1, x_2, and the interpolating polynomial Q(x).]
As a simple example let's consider values of a function that are prescribed at two points: (x_0, f(x_0)) and (x_1, f(x_1)). There are infinitely many functions that pass through these two points. However, if we limit ourselves to polynomials of degree less than or equal to one, there is only one such function that passes through these two points, which is nothing but the line that connects them. A line, in general, is a polynomial of degree one, but if we want to keep the discussion general enough, it could be that f(x_0) = f(x_1), in which case the line that connects the two points is the constant Q_0(x) ≡ f(x_0), which is a polynomial of degree zero. This is why we say that there is a unique polynomial of degree ≤ 1 that connects these two points (and not "a polynomial of degree 1").
The points x_0, ..., x_n are called the interpolation points. The property of "passing through these points" is referred to as interpolating the data. The function that interpolates the data is an interpolant or an interpolating polynomial (or whatever function is being used).
An interpolation problem may have no solution, a unique solution, or more than one solution, depending on the setting. What we are going to study in this section is precisely how to distinguish between these cases. We are also going to present various ways of actually constructing the interpolant.

In general, there is little hope that the interpolant will be identical to the unknown function f(x). The function Q(x) that interpolates f(x) will still be identical to f(x) at the interpolation points, because there we enforce the interpolation conditions (2.1). At any other point, however, Q(x) and f(x) will in general not have the same values. The interpolation error is a measure of how different these two functions are. We will study ways of estimating the interpolation error, and will also discuss strategies for minimizing it.
It is important to note that it is possible to formulate an interpolation problem without referring to (or even assuming the existence of) any underlying function f(x). For example, you may have a list of interpolation points x_0, ..., x_n, and data that is given at these points, y_0, y_1, ..., y_n, which you would like to interpolate. The solution to this interpolation problem is identical to the one where the values are taken from an underlying function.
2.2 The Interpolation Problem
We begin our study with the problem of polynomial interpolation: Given n+1 distinct points x_0, ..., x_n, we seek a polynomial Q_n(x) of the lowest degree such that the following interpolation conditions are satisfied:

    Q_n(x_j) = f(x_j),    j = 0, ..., n.    (2.2)
Note that we do not assume any ordering between the points x_0, ..., x_n, as such an order will make no difference for the present discussion. If we do not limit the degree of the interpolation polynomial, it is easy to see that there are infinitely many polynomials that interpolate the data. However, limiting the degree to ≤ n singles out precisely one interpolant that will do the job. For example, if n = 1, there are infinitely many polynomials that interpolate between (x_0, f(x_0)) and (x_1, f(x_1)); there is only one polynomial of degree ≤ 1 that does the job. This result is formally stated in the following theorem:
Theorem 2.1 If x_0, ..., x_n ∈ ℝ are distinct, then for any f(x_0), ..., f(x_n) there exists a unique polynomial Q_n(x) of degree ≤ n such that the interpolation conditions (2.2) are satisfied.
Proof. We start with the existence part and prove the result by induction. For n = 0, Q_0 = f(x_0). Suppose that Q_{n-1} is a polynomial of degree ≤ n-1, and suppose also that

    Q_{n-1}(x_j) = f(x_j),    0 ≤ j ≤ n-1.

Let us now construct from Q_{n-1}(x) a new polynomial, Q_n(x), in the following way:

    Q_n(x) = Q_{n-1}(x) + c(x - x_0) ⋯ (x - x_{n-1}).    (2.3)

The constant c in (2.3) is yet to be determined. Clearly, the construction of Q_n(x) implies that deg(Q_n(x)) ≤ n. In addition, the polynomial Q_n(x) satisfies the interpolation requirements Q_n(x_j) = f(x_j) for 0 ≤ j ≤ n-1. All that remains is to determine the constant c in such a way that the last interpolation condition, Q_n(x_n) = f(x_n), is satisfied, i.e.,

    Q_n(x_n) = Q_{n-1}(x_n) + c(x_n - x_0) ⋯ (x_n - x_{n-1}).    (2.4)

The condition (2.4) defines c as

    c = (f(x_n) - Q_{n-1}(x_n)) / ∏_{j=0}^{n-1} (x_n - x_j),    (2.5)

and we are done with the proof of existence.

As for uniqueness, suppose that there are two polynomials Q_n(x), P_n(x) of degree ≤ n that satisfy the interpolation conditions (2.2). Define a polynomial H_n(x) as the difference

    H_n(x) = Q_n(x) - P_n(x).

The degree of H_n(x) is at most n, which means that it can have at most n zeros (unless it is identically zero). However, since both Q_n(x) and P_n(x) satisfy the interpolation requirements (2.2), we have

    H_n(x_j) = (Q_n - P_n)(x_j) = 0,    0 ≤ j ≤ n,

which means that H_n(x) has n+1 distinct zeros. This leads to a contradiction that can be resolved only if H_n(x) is the zero polynomial, i.e.,

    P_n(x) = Q_n(x),

and uniqueness is established.
2.3 Newton’s Form of the Interpolation Polynomial
One good thing about the proof of Theorem 2.1 is that it is constructive. In other
words, we can use the proof to write down a formula for the interpolation polynomial.
We follow the procedure given by (2.4) for reconstructing the interpolation polynomial.
We do it in the following way:
• Let

    Q_0(x) = a_0,

where a_0 = f(x_0).

• Let

    Q_1(x) = a_0 + a_1(x - x_0).

Following (2.5) we have

    a_1 = (f(x_1) - Q_0(x_1)) / (x_1 - x_0) = (f(x_1) - f(x_0)) / (x_1 - x_0).

We note that Q_1(x) is nothing but the straight line connecting the two points (x_0, f(x_0)) and (x_1, f(x_1)).

• In general, let

    Q_n(x) = a_0 + a_1(x - x_0) + ... + a_n(x - x_0) ⋯ (x - x_{n-1})
           = a_0 + ∑_{j=1}^{n} a_j ∏_{k=0}^{j-1} (x - x_k).    (2.6)

The coefficients a_j in (2.6) are given by

    a_0 = f(x_0),
    a_j = (f(x_j) - Q_{j-1}(x_j)) / ∏_{k=0}^{j-1} (x_j - x_k),    1 ≤ j ≤ n.    (2.7)
We refer to the interpolation polynomial when written in the form (2.6)-(2.7) as the Newton form of the interpolation polynomial. As we shall see below, there are various ways of writing the interpolation polynomial. The uniqueness of the interpolation polynomial as guaranteed by Theorem 2.1 implies that we will only be rewriting the same polynomial in different ways.
Example 2.2
The Newton form of the polynomial that interpolates (x_0, f(x_0)) and (x_1, f(x_1)) is

    Q_1(x) = f(x_0) + ((f(x_1) - f(x_0)) / (x_1 - x_0)) (x - x_0).
Example 2.3
The Newton form of the polynomial that interpolates the three points (x_0, f(x_0)), (x_1, f(x_1)), and (x_2, f(x_2)) is

    Q_2(x) = f(x_0) + ((f(x_1) - f(x_0)) / (x_1 - x_0)) (x - x_0)
           + [ ( f(x_2) - f(x_0) - ((f(x_1) - f(x_0)) / (x_1 - x_0)) (x_2 - x_0) ) / ((x_2 - x_0)(x_2 - x_1)) ] (x - x_0)(x - x_1).
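The incremental construction (2.6)-(2.7) translates almost line by line into code. The following is a minimal Python sketch (the function names `newton_eval` and `newton_coeffs` are mine, not from the notes); the data (-1, 9), (0, 5), (1, 3) used below is the worked example that appears later in Section 2.6, where the coefficients come out as 9, -4, 1.

```python
def newton_eval(a, xs, x):
    """Evaluate Q(x) = a_0 + sum_{j>=1} a_j * prod_{k<j} (x - x_k)  -- eq. (2.6)."""
    result, prod = a[0], 1.0
    for j in range(1, len(a)):
        prod *= x - xs[j - 1]
        result += a[j] * prod
    return result

def newton_coeffs(xs, ys):
    """Build a_0, ..., a_n one at a time via eq. (2.7):
    a_j = (f(x_j) - Q_{j-1}(x_j)) / prod_{k<j} (x_j - x_k)."""
    a = [ys[0]]                       # a_0 = f(x_0)
    for j in range(1, len(xs)):
        denom = 1.0
        for k in range(j):
            denom *= xs[j] - xs[k]
        # newton_eval with the first j coefficients is exactly Q_{j-1}
        a.append((ys[j] - newton_eval(a, xs, xs[j])) / denom)
    return a

coeffs = newton_coeffs([-1.0, 0.0, 1.0], [9.0, 5.0, 3.0])   # [9.0, -4.0, 1.0]
```

Note that adding one more data point only appends one more coefficient; the previously computed a_j are untouched, which is the practical advantage of the Newton form.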
2.4 The Interpolation Problem and the Vandermonde Determinant
An alternative approach to the interpolation problem is to consider directly a polynomial of the form

    Q_n(x) = ∑_{k=0}^{n} b_k x^k,    (2.8)

and require that the following interpolation conditions are satisfied:

    Q_n(x_j) = f(x_j),    0 ≤ j ≤ n.    (2.9)

In view of Theorem 2.1 we already know that this problem has a unique solution, so we should be able to compute directly the coefficients of the polynomial as given in (2.8). Indeed, the interpolation conditions (2.9) imply that the following equations should hold:

    b_0 + b_1 x_j + ... + b_n x_j^n = f(x_j),    j = 0, ..., n.    (2.10)
In matrix form, (2.10) can be rewritten as

    [ 1  x_0  ⋯  x_0^n ] [ b_0 ]   [ f(x_0) ]
    [ 1  x_1  ⋯  x_1^n ] [ b_1 ]   [ f(x_1) ]
    [ ⋮   ⋮        ⋮   ] [  ⋮  ] = [   ⋮    ]    (2.11)
    [ 1  x_n  ⋯  x_n^n ] [ b_n ]   [ f(x_n) ]
In order for the system (2.11) to have a unique solution, it has to be nonsingular. This means, e.g., that the determinant of its coefficient matrix must not vanish, i.e.,

    | 1  x_0  ⋯  x_0^n |
    | 1  x_1  ⋯  x_1^n |
    | ⋮   ⋮        ⋮   | ≠ 0.    (2.12)
    | 1  x_n  ⋯  x_n^n |

The determinant (2.12) is known as the Vandermonde determinant. We leave it as an exercise to verify that

    | 1  x_0  ⋯  x_0^n |
    | 1  x_1  ⋯  x_1^n |
    | ⋮   ⋮        ⋮   | = ∏_{i>j} (x_i - x_j).    (2.13)
    | 1  x_n  ⋯  x_n^n |
Since we assume that the points x_0, ..., x_n are distinct, the determinant (2.13) is indeed nonzero. Hence, the system (2.11) has a solution that is also unique, which confirms what we already know according to Theorem 2.1.
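To make the Vandermonde route concrete, here is a small self-contained Python sketch that assembles the matrix of (2.11) and solves (2.10) by Gaussian elimination; all function names are illustrative, and the data is again the three points (-1, 9), (0, 5), (1, 3) worked out in Section 2.6. As a design note, this is mainly of theoretical interest: Vandermonde matrices become severely ill-conditioned as n grows, so in practice the Newton or Lagrange forms are preferred.

```python
def vandermonde(xs):
    """Row j is [1, x_j, x_j^2, ..., x_j^n], the coefficient matrix of (2.11)."""
    n = len(xs)
    return [[x ** k for k in range(n)] for x in xs]

def solve(A, rhs):
    """Gaussian elimination with partial pivoting (adequate for this small demo)."""
    n = len(A)
    M = [row[:] + [r] for row, r in zip(A, rhs)]      # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    b = [0.0] * n
    for r in range(n - 1, -1, -1):                     # back substitution
        b[r] = (M[r][n] - sum(M[r][c] * b[c] for c in range(r + 1, n))) / M[r][r]
    return b

xs = [-1.0, 0.0, 1.0]
fs = [9.0, 5.0, 3.0]

# (2.13): the determinant is prod_{i>j} (x_i - x_j) = 2 here, so (2.11) is nonsingular.
det = 1.0
for i in range(len(xs)):
    for j in range(i):
        det *= xs[i] - xs[j]

b = solve(vandermonde(xs), fs)    # coefficients b_0, b_1, b_2 of (2.8)
# Q_2(x) = 5 - 3x + x^2 for this data
```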
2.5 The Lagrange Form of the Interpolation Polynomial
The form of the interpolation polynomial that we used in (2.8) assumed a linear combination of polynomials of degrees 0, ..., n, in which the coefficients were unknown. In this section we take a different approach and assume that the interpolation polynomial is given as a linear combination of n+1 polynomials of degree n. This time, we set the coefficients as the interpolated values, {f(x_j)}_{j=0}^{n}, while the unknowns are the polynomials. We thus let

    Q_n(x) = ∑_{j=0}^{n} f(x_j) l_j^n(x),    (2.14)
where l_j^n(x) are n+1 polynomials of degree n. We use two indices in these polynomials: the subscript j enumerates l_j^n(x) from 0 to n, and the superscript n is used to remind us that the degree of l_j^n(x) is n. Note that in this particular case, the polynomials l_j^n(x) are precisely of degree n (and not ≤ n). However, Q_n(x), given by (2.14), may have a lower degree. In either case, the degree of Q_n(x) is n at the most. We now require that Q_n(x) satisfies the interpolation conditions

    Q_n(x_i) = f(x_i),    0 ≤ i ≤ n.    (2.15)
By substituting x_i for x in (2.14) we have

    Q_n(x_i) = ∑_{j=0}^{n} f(x_j) l_j^n(x_i),    0 ≤ i ≤ n.

In view of (2.15) we may conclude that l_j^n(x) must satisfy

    l_j^n(x_i) = δ_ij,    (2.16)

where δ_ij is the Kronecker delta,

    δ_ij = 1 if i = j,  and  δ_ij = 0 if i ≠ j.
One obvious way of constructing polynomials l_j^n of degree n that satisfy (2.16) is the following:

    l_j^n(x) = [(x - x_0) ⋯ (x - x_{j-1})(x - x_{j+1}) ⋯ (x - x_n)] / [(x_j - x_0) ⋯ (x_j - x_{j-1})(x_j - x_{j+1}) ⋯ (x_j - x_n)],    0 ≤ j ≤ n.    (2.17)

Note that the denominator in (2.17) does not vanish since we assume that all interpolation points are distinct, and hence the polynomials l_j^n(x) are well defined. The Lagrange form of the interpolation polynomial is the polynomial Q_n(x) given by (2.14), where the polynomials l_j^n(x) of degree n are given by

    l_j^n(x) = ∏_{i=0, i≠j}^{n} (x - x_i) / ∏_{i=0, i≠j}^{n} (x_j - x_i),    j = 0, ..., n.    (2.18)
Example 2.4
We are interested in finding the Lagrange form of the interpolation polynomial that interpolates two points: (x_0, f(x_0)) and (x_1, f(x_1)). We know that the unique interpolation polynomial through these two points is the line that connects them. Such a line can be written in many different forms. In order to obtain the Lagrange form we let

    l_0^1(x) = (x - x_1)/(x_0 - x_1),    l_1^1(x) = (x - x_0)/(x_1 - x_0).

The desired polynomial is therefore given by the familiar formula

    Q_1(x) = f(x_0) l_0^1(x) + f(x_1) l_1^1(x) = f(x_0) (x - x_1)/(x_0 - x_1) + f(x_1) (x - x_0)/(x_1 - x_0).
Example 2.5
This time we are looking for the Lagrange form of the interpolation polynomial, Q_2(x), that interpolates three points: (x_0, f(x_0)), (x_1, f(x_1)), (x_2, f(x_2)). Unfortunately, the Lagrange form of the interpolation polynomial does not let us use the interpolation polynomial through the first two points, Q_1(x), as a building block for Q_2(x). This means that we have to compute everything from scratch. We start with

    l_0^2(x) = (x - x_1)(x - x_2) / ((x_0 - x_1)(x_0 - x_2)),
    l_1^2(x) = (x - x_0)(x - x_2) / ((x_1 - x_0)(x_1 - x_2)),
    l_2^2(x) = (x - x_0)(x - x_1) / ((x_2 - x_0)(x_2 - x_1)).

The interpolation polynomial is therefore given by

    Q_2(x) = f(x_0) l_0^2(x) + f(x_1) l_1^2(x) + f(x_2) l_2^2(x)
           = f(x_0) (x - x_1)(x - x_2)/((x_0 - x_1)(x_0 - x_2))
           + f(x_1) (x - x_0)(x - x_2)/((x_1 - x_0)(x_1 - x_2))
           + f(x_2) (x - x_0)(x - x_1)/((x_2 - x_0)(x_2 - x_1)).

It is easy to verify that indeed Q_2(x_j) = f(x_j) for j = 0, 1, 2, as desired.
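Formulas (2.14) and (2.18) transcribe directly into code. Below is a minimal Python sketch (function names are mine); it checks the cardinality property (2.16) and reproduces Q_2 for the data (-1, 9), (0, 5), (1, 3) used in the worked example of Section 2.6. Note that each evaluation costs O(n^2) operations, since every basis polynomial is rebuilt from scratch.

```python
def lagrange_basis(xs, j, x):
    """l_j^n(x) of (2.18): product over i != j of (x - x_i)/(x_j - x_i)."""
    val = 1.0
    for i, xi in enumerate(xs):
        if i != j:
            val *= (x - xi) / (xs[j] - xi)
    return val

def lagrange_eval(xs, ys, x):
    """Q_n(x) = sum_j f(x_j) * l_j^n(x)  -- eq. (2.14)."""
    return sum(ys[j] * lagrange_basis(xs, j, x) for j in range(len(xs)))

# Cardinality (2.16): l_0^2 is 1 at x_0 and 0 at the other nodes.
assert lagrange_basis([-1.0, 0.0, 1.0], 0, -1.0) == 1.0
assert lagrange_basis([-1.0, 0.0, 1.0], 0, 0.0) == 0.0
```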
Remarks.
1. One instance where the Lagrange form of the interpolation polynomial may seem advantageous compared with the Newton form is when you are interested in solving several interpolation problems, all given at the same interpolation points x_0, ..., x_n but with different values f(x_0), ..., f(x_n). In this case, the polynomials l_j^n(x) are identical for all problems, since they depend only on the points and not on the values of the function at these points. Therefore, they have to be constructed only once.
2. An alternative form for l_j^n(x) can be obtained in the following way. If we define

    w_n(x) = ∏_{i=0}^{n} (x - x_i),

then

    w_n'(x) = ∑_{j=0}^{n} ∏_{i=0, i≠j}^{n} (x - x_i).    (2.19)

When we then evaluate w_n'(x) at any interpolation point, x_j, there is only one term in the sum in (2.19) that does not vanish:

    w_n'(x_j) = ∏_{i=0, i≠j}^{n} (x_j - x_i).

Hence, l_j^n(x) can be rewritten as

    l_j^n(x) = w_n(x) / ((x - x_j) w_n'(x_j)),    0 ≤ j ≤ n.    (2.20)
3. For future reference we note that the coefficient of x^n in the interpolation polynomial Q_n(x) is

    ∑_{j=0}^{n} f(x_j) / ∏_{k=0, k≠j}^{n} (x_j - x_k).    (2.21)

For example, the coefficient of x in Q_1(x) in Example 2.4 is

    f(x_0)/(x_0 - x_1) + f(x_1)/(x_1 - x_0).
2.6 Divided Differences
We recall that Newton's form of the interpolation polynomial is

    Q_n(x) = a_0 + a_1(x - x_0) + ... + a_n(x - x_0) ⋯ (x - x_{n-1}),

with a_0 = f(x_0) and

    a_j = (f(x_j) - Q_{j-1}(x_j)) / ∏_{k=0}^{j-1} (x_j - x_k),    1 ≤ j ≤ n.
We name the j-th coefficient, a_j, the j-th order divided difference. The j-th order divided difference, a_j, is based on the points x_0, ..., x_j and on the values of the function at these points, f(x_0), ..., f(x_j). To emphasize this dependence, we use the following notation:

    a_j = f[x_0, ..., x_j],    1 ≤ j ≤ n.

We also denote the zeroth-order divided difference as

    a_0 = f[x_0],

where

    f[x_0] = f(x_0).

When written in terms of the divided differences, the Newton form of the interpolation polynomial becomes

    Q_n(x) = f[x_0] + f[x_0, x_1](x - x_0) + ... + f[x_0, ..., x_n] ∏_{k=0}^{n-1} (x - x_k).    (2.22)
There is a simple way of computing the j-th order divided difference from lower-order divided differences. This is given by the following lemma.

Lemma 2.6 The divided differences satisfy:

    f[x_0, ..., x_n] = (f[x_1, ..., x_n] - f[x_0, ..., x_{n-1}]) / (x_n - x_0).    (2.23)
Proof. For any k, we denote by Q_k(x) a polynomial of degree ≤ k that interpolates f(x) at x_0, ..., x_k, i.e.,

    Q_k(x_j) = f(x_j),    0 ≤ j ≤ k.

We now consider the unique polynomial P(x) of degree ≤ n-1 that interpolates f(x) at x_1, ..., x_n. It is easy to verify that

    Q_n(x) = P(x) + ((x - x_n)/(x_n - x_0)) [P(x) - Q_{n-1}(x)].    (2.24)

The coefficient of x^n on the left-hand side of (2.24) is f[x_0, ..., x_n]. The coefficient of x^{n-1} in P(x) is f[x_1, ..., x_n] and the coefficient of x^{n-1} in Q_{n-1}(x) is f[x_0, ..., x_{n-1}]. Hence, the coefficient of x^n on the right-hand side of (2.24) is

    (f[x_1, ..., x_n] - f[x_0, ..., x_{n-1}]) / (x_n - x_0),

which means that

    f[x_0, ..., x_n] = (f[x_1, ..., x_n] - f[x_0, ..., x_{n-1}]) / (x_n - x_0).
Example 2.7
The second-order divided difference is

    f[x_0, x_1, x_2] = (f[x_1, x_2] - f[x_0, x_1]) / (x_2 - x_0)
                     = ( (f(x_2) - f(x_1))/(x_2 - x_1) - (f(x_1) - f(x_0))/(x_1 - x_0) ) / (x_2 - x_0).

Hence, the unique polynomial that interpolates (x_0, f(x_0)), (x_1, f(x_1)), and (x_2, f(x_2)) is

    Q_2(x) = f[x_0] + f[x_0, x_1](x - x_0) + f[x_0, x_1, x_2](x - x_0)(x - x_1)
           = f(x_0) + ((f(x_1) - f(x_0))/(x_1 - x_0)) (x - x_0)
           + [ ( (f(x_2) - f(x_1))/(x_2 - x_1) - (f(x_1) - f(x_0))/(x_1 - x_0) ) / (x_2 - x_0) ] (x - x_0)(x - x_1).

For example, if we want to find the polynomial of degree ≤ 2 that interpolates (-1, 9), (0, 5), and (1, 3), we have

    f[-1] = f(-1) = 9,
    f[-1, 0] = (5 - 9)/(0 - (-1)) = -4,    f[0, 1] = (3 - 5)/(1 - 0) = -2,
    f[-1, 0, 1] = (f[0, 1] - f[-1, 0])/(1 - (-1)) = (-2 + 4)/2 = 1,

so that

    Q_2(x) = 9 - 4(x + 1) + (x + 1)x = 5 - 3x + x^2.
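The recursion (2.23) yields a compact tabular algorithm: start from the function values and repeatedly form difference quotients, keeping the top entry of each column as a Newton coefficient. A Python sketch (the function name is mine), checked against the example above:

```python
def divided_differences(xs, ys):
    """Return [f[x_0], f[x_0,x_1], ..., f[x_0,...,x_n]] via the recursion (2.23)."""
    table = list(ys)                       # column of zeroth-order differences
    coeffs = [table[0]]
    for order in range(1, len(xs)):
        # overwrite in place: table[i] becomes f[x_i, ..., x_{i+order}]
        for i in range(len(xs) - order):
            table[i] = (table[i + 1] - table[i]) / (xs[i + order] - xs[i])
        coeffs.append(table[0])            # top entry of the new column
    return coeffs

coeffs = divided_differences([-1.0, 0.0, 1.0], [9.0, 5.0, 3.0])   # [9.0, -4.0, 1.0]
```

The cost is O(n^2) operations and O(n) extra storage, which is why this is the standard way of assembling the Newton form.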
The relations between the divided differences are schematically portrayed in Table 2.1 (up to third order). We note that the divided differences that are used as the coefficients in the interpolation polynomial are those located at the top of each column. The recursive structure of the divided differences implies that we must compute all the lower-order coefficients in the table in order to obtain the higher-order ones.
    x_0   f(x_0)
                     f[x_0, x_1]
    x_1   f(x_1)                   f[x_0, x_1, x_2]
                     f[x_1, x_2]                      f[x_0, x_1, x_2, x_3]
    x_2   f(x_2)                   f[x_1, x_2, x_3]
                     f[x_2, x_3]
    x_3   f(x_3)

Table 2.1: Divided Differences
One important property of any divided difference is that it is a symmetric function of its arguments. This means that if we assume that y_0, ..., y_n is any permutation of x_0, ..., x_n, then

    f[y_0, ..., y_n] = f[x_0, ..., x_n].

This property can be clearly explained by recalling that f[x_0, ..., x_n] plays the role of the coefficient of x^n in the polynomial that interpolates f(x) at x_0, ..., x_n. At the same time, f[y_0, ..., y_n] is the coefficient of x^n in the polynomial that interpolates f(x) at the same points. Since the interpolation polynomial is unique for any given data, the order of the points does not matter, and hence these two coefficients must be identical.
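The symmetry claim is easy to check numerically. The sketch below (names are mine) computes the highest-order divided difference for every ordering of the data (-1, 9), (0, 5), (1, 3) from Example 2.7 and confirms that all six orderings agree:

```python
import itertools

def top_divided_difference(xs, ys):
    """f[x_0, ..., x_n] computed with the recursion (2.23)."""
    t = list(ys)
    for order in range(1, len(xs)):
        t = [(t[i + 1] - t[i]) / (xs[i + order] - xs[i]) for i in range(len(t) - 1)]
    return t[0]

pts = [(-1.0, 9.0), (0.0, 5.0), (1.0, 3.0)]      # data of Example 2.7
vals = {top_divided_difference([x for x, _ in perm], [y for _, y in perm])
        for perm in itertools.permutations(pts)}
# every ordering gives the same second-order divided difference, 1
```

(For less benign data the six values would agree only up to rounding, so a tolerance-based comparison would be the robust choice.)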
2.7 The Error in Polynomial Interpolation

In this section we would like to provide estimates on the "error" we make when interpolating data that is taken from sampling an underlying function f(x). While the interpolant and the function agree with each other at the interpolation points, there is, in general, no reason to expect them to be close to each other elsewhere. Nevertheless, we can estimate the difference between them, a difference which we refer to as the interpolation error. We let Π_n denote the space of polynomials of degree ≤ n.
Theorem 2.8 Let f(x) ∈ C^{n+1}[a, b]. Let Q_n(x) ∈ Π_n be such that it interpolates f(x) at the n+1 distinct points x_0, ..., x_n ∈ [a, b]. Then for every x ∈ [a, b] there exists ξ_n ∈ (a, b) such that

    f(x) - Q_n(x) = (1/(n+1)!) f^{(n+1)}(ξ_n) ∏_{j=0}^{n} (x - x_j).    (2.25)
Proof. We fix a point x ∈ [a, b]. If x is one of the interpolation points x_0, ..., x_n, then the left-hand side and the right-hand side of (2.25) are both zero, and hence the result holds trivially. We therefore assume that x ≠ x_j, 0 ≤ j ≤ n, and let

    w(x) = ∏_{j=0}^{n} (x - x_j).

We now let

    F(y) = f(y) - Q_n(y) - λ w(y),

where λ is chosen so as to guarantee that F(x) = 0, i.e.,

    λ = (f(x) - Q_n(x)) / w(x).

Since the interpolation points x_0, ..., x_n and x are distinct, w(x) does not vanish and λ is well defined. We now note that since f ∈ C^{n+1}[a, b] and since Q_n and w are polynomials, also F ∈ C^{n+1}[a, b]. In addition, F vanishes at n+2 points: x_0, ..., x_n and x. According to Rolle's theorem, F' has at least n+1 distinct zeros in (a, b), F'' has at least n distinct zeros in (a, b), and similarly, F^{(n+1)} has at least one zero in (a, b), which we denote by ξ_n. We have

    0 = F^{(n+1)}(ξ_n) = f^{(n+1)}(ξ_n) - Q_n^{(n+1)}(ξ_n) - λ w^{(n+1)}(ξ_n)
      = f^{(n+1)}(ξ_n) - ((f(x) - Q_n(x)) / w(x)) (n+1)!.    (2.26)

Here, we used the fact that Q_n^{(n+1)} ≡ 0 (since deg Q_n ≤ n) and that the leading term of w(x) is x^{n+1}, which guarantees that its (n+1)-th derivative equals

    w^{(n+1)}(x) = (n+1)!.    (2.27)

Rearranging the terms in (2.26), we conclude with

    f(x) - Q_n(x) = (1/(n+1)!) f^{(n+1)}(ξ_n) w(x).
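The bound implied by (2.25) can be checked numerically. The sketch below (self-contained, so it re-implements Lagrange evaluation; names and the choice f(x) = e^x are mine) interpolates e^x at four equispaced points in [0, 1] and verifies that the sampled interpolation error never exceeds max|f^{(4)}| · max|w(x)| / 4!, using f^{(4)}(x) = e^x ≤ e on [0, 1]. Both maxima are only sampled on a grid, so this is a spot check rather than a proof.

```python
import math

def interp_eval(xs, ys, x):
    """Evaluate the interpolant in Lagrange form (kept local for self-containedness)."""
    total = 0.0
    for j in range(len(xs)):
        lj = 1.0
        for i in range(len(xs)):
            if i != j:
                lj *= (x - xs[i]) / (xs[j] - xs[i])
        total += ys[j] * lj
    return total

# Interpolate f(x) = e^x at n + 1 = 4 equispaced points in [0, 1].
xs = [0.0, 1.0 / 3.0, 2.0 / 3.0, 1.0]
ys = [math.exp(x) for x in xs]

grid = [k / 1000.0 for k in range(1001)]
err = max(abs(math.exp(x) - interp_eval(xs, ys, x)) for x in grid)

# Right-hand side of (2.25): max |f^{(4)}| <= e on [0, 1], times max |w(x)| / 4!.
w_max = max(abs(math.prod(x - xj for xj in xs)) for x in grid)
bound = math.e * w_max / math.factorial(4)

assert 0.0 < err <= bound   # the sampled error respects the theoretical bound
```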
In addition to the interpretation of the divided difference of order n as the coefficient of x^n in some interpolation polynomial, there is another important characterization which we will comment on now. Consider, e.g., the first-order divided difference

    f[x_0, x_1] = (f(x_1) - f(x_0)) / (x_1 - x_0).

Since the order of the points does not change the value of the divided difference, we can assume, without any loss of generality, that x_0 < x_1. If we assume, in addition, that f(x) is continuously differentiable in the interval [x_0, x_1], then this divided difference equals the derivative of f(x) at an intermediate point, i.e.,

    f[x_0, x_1] = f'(ξ),    ξ ∈ (x_0, x_1).

In other words, the first-order divided difference can be viewed as an approximation of the first derivative in the interval. It is important to note that while this interpretation is based on additional smoothness requirements on f(x) (i.e., its being differentiable), the divided differences are well defined also for non-differentiable functions.
This notion can be extended to divided differences of higher order, as stated by the following theorem.

Theorem 2.9 Let x, x_0, ..., x_{n-1} be n+1 distinct points. Let a = min(x, x_0, ..., x_{n-1}) and b = max(x, x_0, ..., x_{n-1}). Assume that f(y) has a continuous derivative of order n in the interval (a, b). Then

    f[x_0, ..., x_{n-1}, x] = f^{(n)}(ξ) / n!,    (2.28)

where ξ ∈ (a, b).
Proof. Let Q_n(y) interpolate f(y) at x_0, ..., x_{n-1}, x. Then, according to the construction of the Newton form of the interpolation polynomial (2.22), we know that

    Q_n(y) = Q_{n-1}(y) + f[x_0, ..., x_{n-1}, x] ∏_{j=0}^{n-1} (y - x_j).

Since Q_n(y) interpolates f(y) at x, we have

    f(x) = Q_{n-1}(x) + f[x_0, ..., x_{n-1}, x] ∏_{j=0}^{n-1} (x - x_j).

By Theorem 2.8 we know that the interpolation error is given by

    f(x) - Q_{n-1}(x) = (1/n!) f^{(n)}(ξ_{n-1}) ∏_{j=0}^{n-1} (x - x_j),

which implies the result (2.28).
Remark. In equation (2.28), we could just as well think of the interpolation point x as any other interpolation point, and name it, e.g., x_n. In this case, equation (2.28) takes the somewhat more natural form

    f[x_0, ..., x_n] = f^{(n)}(ξ) / n!.

In other words, the n-th order divided difference is an n-th derivative of the function f(x) at an intermediate point, assuming that the function has n continuous derivatives. Similarly to the first-order divided difference, we would like to emphasize that the n-th order divided difference is also well defined in cases where the function is not as smooth as required in the theorem, though if this is the case, we can no longer consider this divided difference to represent an n-th order derivative of the function.
2.8 Interpolation at the Chebyshev Points
In the entire discussion so far, we assumed that the interpolation points are given. There may be cases where one has the flexibility of choosing the interpolation points. If this is the case, it would be reasonable to use this degree of freedom to minimize the interpolation error.

We recall that if we are interpolating values of a function f(x) that has n+1 continuous derivatives, the interpolation error is of the form

    f(x) - Q_n(x) = (1/(n+1)!) f^{(n+1)}(ξ_n) ∏_{j=0}^{n} (x - x_j).    (2.29)

Here, Q_n(x) is the interpolating polynomial and ξ_n is an intermediate point in the interval of interest (see (2.25)).
It is important to note that the interpolation points influence two terms on the right-hand side of (2.29). The obvious one is the product

    ∏_{j=0}^{n} (x - x_j).    (2.30)

The second one is f^{(n+1)}(ξ_n), as ξ_n depends on x_0, ..., x_n. Due to this implicit dependence of ξ_n on the interpolation points, minimizing the interpolation error is not an easy task. We will return to this "full" problem later on in the context of the minimax approximation. For the time being, we are going to focus on a simpler problem, namely, how to choose the interpolation points x_0, ..., x_n such that the product (2.30) is minimized. The solution of this problem is the topic of this section. Once again, we would like to emphasize that a solution of this problem does not (in general) provide an optimal choice of interpolation points that will minimize the interpolation error. All that it guarantees is that the product part of the interpolation error is minimal.

The tool that we are going to use is the Chebyshev polynomials. The solution of the problem will be to interpolate at the Chebyshev points. We will first introduce the Chebyshev polynomials and the Chebyshev points, and then show why interpolating at these points minimizes (2.30).
We start by defining the Chebyshev polynomials using the following recursion relation:

    T_0(x) = 1,
    T_1(x) = x,
    T_{n+1}(x) = 2x T_n(x) - T_{n-1}(x),    n ≥ 1.    (2.31)

For example, T_2(x) = 2x T_1(x) - T_0(x) = 2x^2 - 1, and T_3(x) = 4x^3 - 3x. The polynomials T_1(x), T_2(x), and T_3(x) are shown in Figure 2.2.
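The recursion (2.31) is straightforward to implement on coefficient lists. The Python sketch below (names are mine) reproduces T_2 and T_3 as computed above, and spot-checks the explicit formula of Lemma 2.10 that follows:

```python
import math

def chebyshev_T(n):
    """Coefficient list [c_0, ..., c_n] of T_n, built from the recursion (2.31)."""
    T = [[1.0], [0.0, 1.0]]                  # T_0 = 1, T_1 = x
    for k in range(1, n):
        nxt = [0.0] * (k + 2)
        for i, c in enumerate(T[k]):         # 2x * T_k(x) shifts coefficients up
            nxt[i + 1] += 2.0 * c
        for i, c in enumerate(T[k - 1]):     # ... - T_{k-1}(x)
            nxt[i] -= c
        T.append(nxt)
    return T[n]

def poly_eval(coeffs, x):
    return sum(c * x ** i for i, c in enumerate(coeffs))

# T_2 = 2x^2 - 1 and T_3 = 4x^3 - 3x, as in the text
assert chebyshev_T(2) == [-1.0, 0.0, 2.0]
assert chebyshev_T(3) == [0.0, -3.0, 0.0, 4.0]

# Spot-check the explicit formula (2.32): T_n(x) = cos(n arccos x) on [-1, 1]
for n in range(6):
    for x in (-0.9, -0.3, 0.0, 0.5, 0.99):
        assert abs(poly_eval(chebyshev_T(n), x) - math.cos(n * math.acos(x))) < 1e-12
```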
Instead of using the recursion formula (2.31), it is possible to write an explicit formula for the Chebyshev polynomials:

Lemma 2.10 For x ∈ [-1, 1],

    T_n(x) = cos(n cos^{-1} x),    n ≥ 0.    (2.32)
[Figure 2.2: The Chebyshev polynomials T_1(x), T_2(x), and T_3(x) on [-1, 1].]
Proof. Standard trigonometric identities imply that

    cos(n+1)θ = cos θ cos nθ - sin θ sin nθ,
    cos(n-1)θ = cos θ cos nθ + sin θ sin nθ.

Hence,

    cos(n+1)θ = 2 cos θ cos nθ - cos(n-1)θ.    (2.33)

We now let θ = cos^{-1} x, i.e., x = cos θ, and define

    t_n(x) = cos(n cos^{-1} x) = cos(nθ).

Then, by (2.33),

    t_0(x) = 1,
    t_1(x) = x,
    t_{n+1}(x) = 2x t_n(x) - t_{n-1}(x),    n ≥ 1.

Hence t_n(x) = T_n(x).
What is so special about the Chebyshev polynomials, and what is the connection between these polynomials and minimizing the interpolation error? We are about to answer these questions, but before doing so, there is one more issue that we must clarify.

We define a monic polynomial as a polynomial for which the coefficient of the leading term is one, i.e., a polynomial of degree n is monic if it is of the form

    x^n + a_{n-1} x^{n-1} + ... + a_1 x + a_0.
Note that Chebyshev polynomials are not monic: the definition (2.31) implies that the Chebyshev polynomial of degree n is of the form

    T_n(x) = 2^{n-1} x^n + ...

This means that T_n(x) divided by 2^{n-1} is monic, i.e.,

    2^{1-n} T_n(x) = x^n + ...
A general result about monic polynomials is the following:

Theorem 2.11 If p_n(x) is a monic polynomial of degree n, then

max_{−1 ≤ x ≤ 1} |p_n(x)| ≥ 2^{1−n}.    (2.34)
Proof. We prove (2.34) by contradiction. Suppose that

|p_n(x)| < 2^{1−n},    |x| ≤ 1.

Let

q_n(x) = 2^{1−n} T_n(x),

and let x_j be the following n + 1 points:

x_j = cos(jπ/n),    0 ≤ j ≤ n.

Since

T_n(cos(jπ/n)) = (−1)^j,

we have

(−1)^j q_n(x_j) = 2^{1−n}.
Hence

(−1)^j p_n(x_j) ≤ |p_n(x_j)| < 2^{1−n} = (−1)^j q_n(x_j).

This means that

(−1)^j (q_n(x_j) − p_n(x_j)) > 0,    0 ≤ j ≤ n.

Hence, the polynomial (q_n − p_n)(x) oscillates (n + 1) times in the interval [−1, 1], which
means that (q_n − p_n)(x) has at least n distinct roots in the interval. However, p_n(x) and
q_n(x) are both monic polynomials, which means that their difference is a polynomial of
degree n − 1 at most. Such a polynomial cannot have more than n − 1 distinct roots,
which leads to a contradiction. Note that p_n − q_n cannot be the zero polynomial because
that would imply that p_n(x) and q_n(x) are identical, which again is not possible due to the
assumptions on their maximum values.
We are now ready to use Theorem 2.11 to figure out how to reduce the interpolation
error. We know by Theorem 2.8 that if the interpolation points x_0, ..., x_n ∈ [−1, 1],
then there exists ξ_n ∈ (−1, 1) such that the distance between the function whose values
we interpolate, f(x), and the interpolation polynomial, Q_n(x), is

max_{|x| ≤ 1} |f(x) − Q_n(x)| ≤ (1/(n + 1)!) max_{|x| ≤ 1} |f^{(n+1)}(x)| · max_{|x| ≤ 1} |∏_{j=0}^{n} (x − x_j)|.
We are interested in minimizing

max_{|x| ≤ 1} |∏_{j=0}^{n} (x − x_j)|.

We note that ∏_{j=0}^{n} (x − x_j) is a monic polynomial of degree n + 1, and hence by Theorem 2.11,

max_{|x| ≤ 1} |∏_{j=0}^{n} (x − x_j)| ≥ 2^{−n}.
The minimal value of 2^{−n} can actually be obtained if we set

2^{−n} T_{n+1}(x) = ∏_{j=0}^{n} (x − x_j),

which is equivalent to choosing x_j as the roots of the Chebyshev polynomial T_{n+1}(x).
Here, we have used the obvious fact that |T_n(x)| ≤ 1.
What are the roots of the Chebyshev polynomial T_{n+1}(x)? By Lemma 2.10,

T_{n+1}(x) = cos((n + 1) cos^{−1} x).

The roots of T_{n+1}(x), x_0, ..., x_n, are therefore obtained if

(n + 1) cos^{−1}(x_j) = (j + 1/2)π,    0 ≤ j ≤ n,

i.e., the (n + 1) roots of T_{n+1}(x) are

x_j = cos((2j + 1)π/(2n + 2)),    0 ≤ j ≤ n.    (2.35)
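Formula (2.35) can be sketched in a few lines, with a numerical check that T_{n+1} indeed vanishes at these points (using the cosine form of Lemma 2.10); the function name is ours:

```python
import math

def chebyshev_points(n):
    # The n+1 roots of T_{n+1}(x), formula (2.35).
    return [math.cos((2 * j + 1) * math.pi / (2 * n + 2)) for j in range(n + 1)]

n = 5
for xj in chebyshev_points(n):
    # T_{n+1}(x_j) = cos((n+1) * arccos(x_j)) should vanish
    assert abs(math.cos((n + 1) * math.acos(xj))) < 1e-12
```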
The roots of the Chebyshev polynomials are sometimes referred to as the Chebyshev
points. The formula (2.35) for the roots of the Chebyshev polynomial has the following
geometrical interpretation. In order to ﬁnd the roots of T
n
(x), deﬁne α = π/n. Divide
Figure 2.3: The roots of the Chebyshev polynomial T_4(x), x_0, ..., x_3. Note that they
become dense next to the boundary of the interval
the upper half of the unit circle into n + 1 parts such that the two side angles are α/2
and the other angles are α. The Chebyshev points are then obtained by projecting these
points onto the x-axis. This procedure is demonstrated in Figure 2.3 for T_4(x).
The following theorem summarizes the discussion on interpolation at the Chebyshev
points. It also provides an estimate of the error for this case.
Theorem 2.12 Assume that Q_n(x) interpolates f(x) at x_0, ..., x_n. Assume also that
these (n + 1) interpolation points are the (n + 1) roots of the Chebyshev polynomial of
degree n + 1, T_{n+1}(x), i.e.,

x_j = cos((2j + 1)π/(2n + 2)),    0 ≤ j ≤ n.

Then for all |x| ≤ 1,

|f(x) − Q_n(x)| ≤ (1/(2^n (n + 1)!)) max_{|ξ| ≤ 1} |f^{(n+1)}(ξ)|.    (2.36)
Example 2.13
Problem: Let f(x) = sin(πx) in the interval [−1, 1]. Find Q_2(x) which interpolates f(x)
at the Chebyshev points. Estimate the error.
Solution: Since we are asked to ﬁnd an interpolation polynomial of degree 2, we need
3 interpolation points. We are also asked to interpolate at the Chebyshev points, and
hence we ﬁrst need to compute the 3 roots of the Chebyshev polynomial of degree 3,
T_3(x) = 4x^3 − 3x.
The roots of T_3(x) can be easily found from x(4x^2 − 3) = 0, i.e.,

x_0 = −√3/2,    x_1 = 0,    x_2 = √3/2.
The corresponding values of f(x) at these interpolation points are

f(x_0) = sin(−(√3/2)π) ≈ −0.4086,
f(x_1) = 0,
f(x_2) = sin((√3/2)π) ≈ 0.4086.
The first-order divided differences are

f[x_0, x_1] = (f(x_1) − f(x_0))/(x_1 − x_0) ≈ 0.4718,
f[x_1, x_2] = (f(x_2) − f(x_1))/(x_2 − x_1) ≈ 0.4718,

and the second-order divided difference is

f[x_0, x_1, x_2] = (f[x_1, x_2] − f[x_0, x_1])/(x_2 − x_0) = 0.
The interpolation polynomial is

Q_2(x) = f(x_0) + f[x_0, x_1](x − x_0) + f[x_0, x_1, x_2](x − x_0)(x − x_1) ≈ 0.4718x.
The original function f(x) and the interpolant at the Chebyshev points, Q_2(x), are
plotted in Figure 2.4.
As for the error estimate, for all |x| ≤ 1,

|sin(πx) − Q_2(x)| ≤ (1/(2^2 · 3!)) max_{|ξ| ≤ 1} |(sin(πξ))^{(3)}| ≤ π^3/(2^2 · 3!) ≈ 1.292.
A brief examination of Figure 2.4 reveals that while this error estimate is correct, it is
far from being sharp.
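Example 2.13 can be reproduced in a few lines; the sketch below (variable names are ours) rebuilds Q_2 from the divided differences and confirms that the actual error stays below the bound 1.292:

```python
import math

xs = [-math.sqrt(3) / 2, 0.0, math.sqrt(3) / 2]   # roots of T_3
ys = [math.sin(math.pi * x) for x in xs]

d01 = (ys[1] - ys[0]) / (xs[1] - xs[0])           # ~0.4718
d12 = (ys[2] - ys[1]) / (xs[2] - xs[1])
d012 = (d12 - d01) / (xs[2] - xs[0])              # 0 by symmetry

def Q2(x):
    # Newton form of the interpolant of Example 2.13
    return ys[0] + d01 * (x - xs[0]) + d012 * (x - xs[0]) * (x - xs[1])

err = max(abs(math.sin(math.pi * x) - Q2(x))
          for x in (-1 + i / 500 for i in range(1001)))
assert abs(d012) < 1e-12
assert err < 1.292   # the estimate (2.36) holds, but is not sharp
```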
Remark. In the more general case where the interpolation interval for the function
f(x) is x ∈ [a, b], it is still possible to use the previous results by following these
steps. Start by converting the interpolation interval to y ∈ [−1, 1]:

x = ((b − a)y + (a + b))/2.

This converts the interpolation problem for f(x) on [a, b] into an interpolation problem
for g(y) = f(x(y)) in y ∈ [−1, 1]. The Chebyshev points in the interval y ∈ [−1, 1] are
the roots of the Chebyshev polynomial T_{n+1}(y), i.e.,

y_j = cos((2j + 1)π/(2n + 2)),    0 ≤ j ≤ n.
Figure 2.4: The function f(x) = sin(πx) and the interpolation polynomial Q_2(x) that
interpolates f(x) at the Chebyshev points. See Example 2.13.
The corresponding n + 1 interpolation points in the interval [a, b] are

x_j = ((b − a)y_j + (a + b))/2,    0 ≤ j ≤ n.
We now have

max_{x ∈ [a,b]} |∏_{j=0}^{n} (x − x_j)| = ((b − a)/2)^{n+1} max_{|y| ≤ 1} |∏_{j=0}^{n} (y − y_j)|,

so that the interpolation error is

|f(x) − Q_n(x)| ≤ (1/(2^n (n + 1)!)) ((b − a)/2)^{n+1} max_{ξ ∈ [a,b]} |f^{(n+1)}(ξ)|.    (2.37)
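The whole procedure on a general interval can be sketched as follows (the helper names are ours, not from the text): map the Chebyshev points to [a, b], interpolate there with a standard Newton form, and check the error against the bound (2.37) for f(x) = e^x on [0, 2]:

```python
import math

def chebyshev_points_ab(n, a, b):
    # Roots of T_{n+1} mapped from [-1, 1] to [a, b], as in the remark above.
    ys = [math.cos((2 * j + 1) * math.pi / (2 * n + 2)) for j in range(n + 1)]
    return [((b - a) * y + (a + b)) / 2 for y in ys]

def newton_interp(xs, ys):
    # Standard Newton divided differences; returns an evaluator.
    n = len(xs)
    coef = list(ys)
    for k in range(1, n):
        for i in range(n - 1, k - 1, -1):
            coef[i] = (coef[i] - coef[i - 1]) / (xs[i] - xs[i - k])
    def p(x):
        acc = coef[-1]
        for i in range(n - 2, -1, -1):
            acc = acc * (x - xs[i]) + coef[i]
        return acc
    return p

# Degree-5 interpolation of e^x on [0, 2] at 6 Chebyshev points.
a, b, n = 0.0, 2.0, 5
xs = chebyshev_points_ab(n, a, b)
p = newton_interp(xs, [math.exp(x) for x in xs])
err = max(abs(math.exp(x) - p(x))
          for x in (a + (b - a) * i / 400 for i in range(401)))
# Bound (2.37): (1/(2^5 * 6!)) * ((b-a)/2)^6 * max|f^(6)| = e^2 / 23040
assert err < math.exp(2) / (2**5 * math.factorial(6))
```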
2.9 Hermite Interpolation
We now turn to a slightly diﬀerent interpolation problem in which we assume that
in addition to interpolating the values of the function at certain points, we are also
interested in interpolating its derivatives. Interpolation that involves the derivatives is
called Hermite interpolation. Such an interpolation problem is demonstrated in the
following example:
Example 2.14
Problem: Find a polynomial p(x) such that p(1) = −1, p′(1) = −1, and p(0) = 1.
Solution: Since three conditions have to be satisﬁed, we can use these conditions to
determine three degrees of freedom, which means that it is reasonable to expect that
these conditions uniquely determine a polynomial of degree 2. We therefore let
p(x) = a_0 + a_1 x + a_2 x^2.

The conditions of the problem then imply that

a_0 + a_1 + a_2 = −1,
a_1 + 2a_2 = −1,
a_0 = 1.

Hence, there is indeed a unique polynomial that satisfies the interpolation conditions,
and it is

p(x) = x^2 − 3x + 1.
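A quick sanity check of this solution:

```python
def p(x):
    return x * x - 3 * x + 1   # the polynomial found above

def dp(x):
    return 2 * x - 3           # its derivative

# the three interpolation conditions of Example 2.14
assert p(1) == -1 and dp(1) == -1 and p(0) == 1
```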
In general, we may have to interpolate high-order derivatives and not only first-order
derivatives. Also, we assume that for any point x_j at which we have to satisfy an
interpolation condition of the form

p^{(l)}(x_j) = f^{(l)}(x_j)

(with p^{(l)} being the l-th-order derivative of p(x)), we are also given all the values of the
lower-order derivatives up to l as part of the interpolation requirements, i.e.,

p^{(i)}(x_j) = f^{(i)}(x_j),    0 ≤ i ≤ l.
If this is not the case, it may not be possible to ﬁnd a unique interpolant as demonstrated
in the following example.
Example 2.15
Problem: Find p(x) such that p′(0) = 1 and p′(1) = −1.
Solution: Since we are asked to interpolate two conditions, we may expect them to
uniquely determine a linear function, say

p(x) = a_0 + a_1 x.

However, the two conditions specify the derivative of p(x) at two distinct points to be
of different values, which amounts to contradictory information on the value of a_1.
Hence, a linear polynomial cannot interpolate the data and we must consider higher-order
polynomials. Unfortunately, a polynomial of degree 2 will no longer be unique
because not enough information is given. Note that even if the prescribed values of the
derivatives were identical, we would not have a problem with the coefficient of the linear
term a_1, but we would still not have enough information to determine the constant a_0.
A simple case that you are probably already familiar with is the Taylor series.
When viewed from the point of view that we advocate in this section, one can consider
the Taylor series as an interpolation problem in which one has to interpolate the value
of the function and its first n derivatives at a given point, say x_0, i.e., the interpolation
conditions are:

p^{(j)}(x_0) = f^{(j)}(x_0),    0 ≤ j ≤ n.
The unique solution of this problem in terms of a polynomial of degree n is

p(x) = f(x_0) + f′(x_0)(x − x_0) + ... + (f^{(n)}(x_0)/n!)(x − x_0)^n = Σ_{j=0}^{n} (f^{(j)}(x_0)/j!)(x − x_0)^j,

which is the Taylor series of f(x) expanded about x = x_0.
2.9.1 Divided diﬀerences with repetitions
We are now ready to consider the Hermite interpolation problem. The first form we
study is the Newton form of the Hermite interpolation polynomial. We start by extending
the definition of divided differences in such a way that they can handle derivatives.
We already know that the first derivative is connected with the first-order divided
difference by

f′(x_0) = lim_{x→x_0} (f(x) − f(x_0))/(x − x_0) = lim_{x→x_0} f[x, x_0].
Hence, it is natural to extend the notion of divided diﬀerences by the following deﬁnition.
Definition 2.16 The first-order divided difference with repetitions is defined as

f[x_0, x_0] = f′(x_0).    (2.38)
In a similar way, we can extend the notion of divided differences to higher-order derivatives,
as stated in the following lemma (which we leave without a proof).

Lemma 2.17 Let x_0 ≤ x_1 ≤ ... ≤ x_n. Then the divided differences satisfy

f[x_0, ..., x_n] = (f[x_1, ..., x_n] − f[x_0, ..., x_{n−1}])/(x_n − x_0),    x_n ≠ x_0,
f[x_0, ..., x_n] = f^{(n)}(x_0)/n!,    x_n = x_0.    (2.39)
We now consider the following Hermite interpolation problem: the interpolation
points are x_0, ..., x_l (which we assume are ordered from small to large). At each interpolation
point x_j, we have to satisfy the interpolation conditions:

p^{(i)}(x_j) = f^{(i)}(x_j),    0 ≤ i ≤ m_j.
Here, m_j denotes the number of derivatives that we have to interpolate for each point
x_j (with the standard notation that zero derivatives refers to the value of the function
only). In general, the number of derivatives that we have to interpolate may change
from point to point. The extended notion of divided differences allows us to write the
solution to this problem in the following way:
We let n denote the total number of points including their multiplicities (that correspond
to the number of derivatives we have to interpolate at each point), i.e.,

n = m_1 + m_2 + ... + m_l.
We then list all the points including their multiplicities (that correspond to the number
of derivatives we have to interpolate). To simplify the notation we identify these points
with a new ordered list of points y_i:

{y_0, ..., y_{n−1}} = {x_0, ..., x_0 (m_1 times), x_1, ..., x_1 (m_2 times), ..., x_l, ..., x_l (m_l times)}.
The interpolation polynomial p_{n−1}(x) is given by

p_{n−1}(x) = f[y_0] + Σ_{j=1}^{n−1} f[y_0, ..., y_j] ∏_{k=0}^{j−1} (x − y_k).    (2.40)

Whenever a point repeats in f[y_0, ..., y_j], we interpret this divided difference in terms
of the extended definition (2.39). In practice, there is no need to shift the notation to
the y's, and we work directly with the original points. We demonstrate this interpolation
procedure in the following example.
Example 2.18
Problem: Find an interpolation polynomial p(x) that satisfies

p(x_0) = f(x_0),
p(x_1) = f(x_1),
p′(x_1) = f′(x_1).
Solution: The interpolation polynomial p(x) is

p(x) = f(x_0) + f[x_0, x_1](x − x_0) + f[x_0, x_1, x_1](x − x_0)(x − x_1).

The divided differences:

f[x_0, x_1] = (f(x_1) − f(x_0))/(x_1 − x_0),

f[x_0, x_1, x_1] = (f[x_1, x_1] − f[x_0, x_1])/(x_1 − x_0) = (f′(x_1) − (f(x_1) − f(x_0))/(x_1 − x_0))/(x_1 − x_0).
Hence

p(x) = f(x_0) + ((f(x_1) − f(x_0))/(x_1 − x_0))(x − x_0) + (((x_1 − x_0)f′(x_1) − [f(x_1) − f(x_0)])/(x_1 − x_0)^2)(x − x_0)(x − x_1).
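The same procedure can be coded for an arbitrary mix of values and derivatives. The sketch below (interface and names are ours, not from the text) builds the Newton-form table for (2.40), applying (2.39) whenever consecutive points coincide, and is checked on the data of Example 2.18 with f(x) = x^3:

```python
from math import factorial

def hermite_newton(data):
    # data: list of (x_j, [f(x_j), f'(x_j), ..., f^(m_j - 1)(x_j)]),
    # with distinct x_j.  Returns an evaluator for the interpolant (2.40).
    zs, derivs = [], []
    for x, vals in data:
        for _ in vals:
            zs.append(x)
            derivs.append(vals)
    n = len(zs)
    dd = [[0.0] * n for _ in range(n)]    # dd[i][j] = f[z_i, ..., z_j]
    for i in range(n):
        dd[i][i] = derivs[i][0]
    for length in range(1, n):
        for i in range(n - length):
            j = i + length
            if zs[i] == zs[j]:
                # repeated point: use (2.39), f[x, ..., x] = f^(k)(x)/k!
                dd[i][j] = derivs[i][length] / factorial(length)
            else:
                dd[i][j] = (dd[i + 1][j] - dd[i][j - 1]) / (zs[j] - zs[i])
    coef = [dd[0][j] for j in range(n)]
    def p(x):
        acc = coef[-1]
        for k in range(n - 2, -1, -1):
            acc = acc * (x - zs[k]) + coef[k]
        return acc
    return p

# Example 2.18 with f(x) = x^3: interpolate f(0), f(1), and f'(1) = 3.
p = hermite_newton([(0.0, [0.0]), (1.0, [1.0, 3.0])])
assert abs(p(0.0)) < 1e-12 and abs(p(1.0) - 1.0) < 1e-12
```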
2.9.2 The Lagrange form of the Hermite interpolant
In this section we are interested in writing the Lagrange form of the Hermite interpolant
in the special case in which the nodes are x_0, ..., x_n and the interpolation conditions
are

p(x_i) = f(x_i),    p′(x_i) = f′(x_i),    0 ≤ i ≤ n.    (2.41)
We look for an interpolant of the form

p(x) = Σ_{i=0}^{n} f(x_i) A_i(x) + Σ_{i=0}^{n} f′(x_i) B_i(x).    (2.42)

In order to satisfy the interpolation conditions (2.41), the polynomial p(x) in (2.42)
must satisfy the 2n + 2 conditions:

A_i(x_j) = δ_{ij},    B_i(x_j) = 0,
A_i′(x_j) = 0,        B_i′(x_j) = δ_{ij},    i, j = 0, ..., n.    (2.43)

We thus expect to have a unique polynomial p(x) that satisfies the constraints (2.43),
assuming that we limit its degree to be ≤ 2n + 1.
It is convenient to start the construction with the functions we have used in the
Lagrange form of the standard interpolation problem (Section 2.5). We already know
that

l_i(x) = ∏_{j=0, j≠i}^{n} (x − x_j)/(x_i − x_j)

satisfy l_i(x_j) = δ_{ij}. In addition, for i ≠ j,

l_i^2(x_j) = 0,    (l_i^2)′(x_j) = 0.
The degree of l_i(x) is n, which means that the degree of l_i^2(x) is 2n. We will thus
assume that the unknown polynomials A_i(x) and B_i(x) in (2.43) can be written as

A_i(x) = r_i(x) l_i^2(x),
B_i(x) = s_i(x) l_i^2(x).

The functions r_i(x) and s_i(x) are both assumed to be linear, which implies that
deg(A_i) = deg(B_i) = 2n + 1, as desired. Now, according to (2.43),

δ_{ij} = A_i(x_j) = r_i(x_j) l_i^2(x_j) = r_i(x_j) δ_{ij}.

Hence

r_i(x_i) = 1.    (2.44)
Also,

0 = A_i′(x_j) = r_i′(x_j)[l_i(x_j)]^2 + 2 r_i(x_j) l_i′(x_j) l_i(x_j) = r_i′(x_j) δ_{ij} + 2 r_i(x_j) δ_{ij} l_i′(x_j),

and thus

r_i′(x_i) + 2 l_i′(x_i) = 0.    (2.45)

Assuming that r_i(x) is linear, r_i(x) = ax + b, equations (2.44), (2.45) imply that

a = −2 l_i′(x_i),    b = 1 + 2 l_i′(x_i) x_i.

Therefore

A_i(x) = [1 + 2 l_i′(x_i)(x_i − x)] l_i^2(x).
As for B_i(x) in (2.42), the conditions (2.43) imply that

0 = B_i(x_j) = s_i(x_j) l_i^2(x_j)    ⟹    s_i(x_i) = 0,    (2.46)

and

δ_{ij} = B_i′(x_j) = s_i′(x_j) l_i^2(x_j) + 2 s_i(x_j) l_i(x_j) l_i′(x_j)    ⟹    s_i′(x_i) = 1.    (2.47)

Combining (2.46) and (2.47), we obtain

s_i(x) = x − x_i,

so that

B_i(x) = (x − x_i) l_i^2(x).
To summarize, the Lagrange form of the Hermite interpolation polynomial is given by

p(x) = Σ_{i=0}^{n} f(x_i)[1 + 2 l_i′(x_i)(x_i − x)] l_i^2(x) + Σ_{i=0}^{n} f′(x_i)(x − x_i) l_i^2(x).    (2.48)
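Formula (2.48) translates directly into code. A sketch (our helper names), verified against f(x) = sin(πx) and its derivative at three nodes:

```python
import math

def hermite_lagrange(xs, fs, dfs):
    # Evaluator for the Lagrange form (2.48) of the Hermite interpolant.
    n = len(xs) - 1
    def l(i, x):
        out = 1.0
        for j in range(n + 1):
            if j != i:
                out *= (x - xs[j]) / (xs[i] - xs[j])
        return out
    def dl_at_node(i):
        # l_i'(x_i) = sum_{j != i} 1/(x_i - x_j), since l_i(x_i) = 1
        return sum(1.0 / (xs[i] - xs[j]) for j in range(n + 1) if j != i)
    def p(x):
        total = 0.0
        for i in range(n + 1):
            li2 = l(i, x) ** 2
            A = (1 + 2 * dl_at_node(i) * (xs[i] - x)) * li2   # A_i(x)
            B = (x - xs[i]) * li2                             # B_i(x)
            total += fs[i] * A + dfs[i] * B
        return total
    return p

xs = [0.0, 0.5, 1.0]
fs = [math.sin(math.pi * x) for x in xs]
dfs = [math.pi * math.cos(math.pi * x) for x in xs]
p = hermite_lagrange(xs, fs, dfs)
assert all(abs(p(x) - f) < 1e-12 for x, f in zip(xs, fs))
```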
The error in the Hermite interpolation (2.48) is given by the following theorem.

Theorem 2.19 Let x_0, ..., x_n be distinct nodes in [a, b] and f ∈ C^{2n+2}[a, b]. If p ∈ Π_{2n+1} is such that for all 0 ≤ i ≤ n,

p(x_i) = f(x_i),    p′(x_i) = f′(x_i),

then for all x ∈ [a, b] there exists ξ ∈ (a, b) such that

f(x) − p(x) = (f^{(2n+2)}(ξ)/(2n + 2)!) ∏_{i=0}^{n} (x − x_i)^2.    (2.49)
Proof. The proof follows the same techniques we used in proving Theorem 2.8. If x is
one of the interpolation points, the result trivially holds. We thus fix x as a
non-interpolation point and define

w(y) = ∏_{i=0}^{n} (y − x_i)^2.

We also define

φ(y) = f(y) − p(y) − λw(y),

and select λ such that φ(x) = 0, i.e.,

λ = (f(x) − p(x))/w(x).

φ has (at least) n + 2 zeros in [a, b]: (x, x_0, ..., x_n). By Rolle's theorem, we know that
φ′ has (at least) n + 1 zeros that are different from (x, x_0, ..., x_n). Also, φ′ vanishes at
x_0, ..., x_n, which means that φ′ has at least 2n + 2 zeros in [a, b].
Similarly, Rolle's theorem implies that φ″ has at least 2n + 1 zeros in (a, b), and by
induction, φ^{(2n+2)} has at least one zero in (a, b), say ξ.
Hence

0 = φ^{(2n+2)}(ξ) = f^{(2n+2)}(ξ) − p^{(2n+2)}(ξ) − λw^{(2n+2)}(ξ).

Since the leading term in w(y) is y^{2n+2}, w^{(2n+2)}(ξ) = (2n + 2)!. Also, since
p(x) ∈ Π_{2n+1}, p^{(2n+2)}(ξ) = 0. We recall that x was an arbitrary (non-interpolation)
point, and hence we have

f(x) − p(x) = (f^{(2n+2)}(ξ)/(2n + 2)!) ∏_{i=0}^{n} (x − x_i)^2.
Example 2.20
Assume that we would like to find the Hermite interpolation polynomial that satisfies:

p(x_0) = y_0,    p′(x_0) = d_0,    p(x_1) = y_1,    p′(x_1) = d_1.
In this case n = 1, and

l_0(x) = (x − x_1)/(x_0 − x_1),    l_0′(x) = 1/(x_0 − x_1),    l_1(x) = (x − x_0)/(x_1 − x_0),    l_1′(x) = 1/(x_1 − x_0).
According to (2.48), the desired polynomial is given by (check!)

p(x) = y_0 [1 + (2/(x_0 − x_1))(x_0 − x)] ((x − x_1)/(x_0 − x_1))^2 + y_1 [1 + (2/(x_1 − x_0))(x_1 − x)] ((x − x_0)/(x_1 − x_0))^2
     + d_0 (x − x_0)((x − x_1)/(x_0 − x_1))^2 + d_1 (x − x_1)((x − x_0)/(x_1 − x_0))^2.
2.10 Spline Interpolation
So far, the only type of interpolation we were dealing with was polynomial interpolation.
In this section we discuss a diﬀerent type of interpolation: piecewisepolynomial interpo
lation. A simple example of such interpolants will be the function we get by connecting
data with straight lines (see Figure 2.5). Of course, we would like to generate functions
that are somewhat smoother than piecewiselinear functions, and still interpolate the
data. The functions we will discuss in this section are splines.
Figure 2.5: A piecewise-linear spline. In every subinterval the function is linear. Overall
it is continuous, but the regularity is lost at the knots
You may still wonder why we are interested in such functions at all. It is easy to
motivate this discussion by looking at Figure 2.6. In this figure we demonstrate what a
high-order interpolant looks like. Even though the data that we interpolate has only one
extremum in the domain, we have no control over the oscillatory nature of the high-order
interpolating polynomial. In general, high-order polynomials are oscillatory, which rules
them out as impractical for many applications. That is why we focus our attention in this
section on splines.
Splines should be thought of as polynomials on subintervals that are connected in
a “smooth way”. We will be more rigorous when we define precisely what we mean by
smooth. First, we pick n + 1 points which we refer to as the knots: t_0 < t_1 < ... < t_n.
A spline of degree k having knots t_0, ..., t_n is a function s(x) that satisfies the
following two properties:

1. On [t_{i−1}, t_i), s(x) is a polynomial of degree ≤ k, i.e., s(x) is a polynomial on every
subinterval that is defined by the knots.

2. Smoothness: s(x) has a continuous (k − 1)-th derivative on the interval [t_0, t_n].
Figure 2.6: An interpolant “goes bad”. In this example we interpolate 11 equally spaced
samples of f(x) = 1/(1 + x^2) with a polynomial of degree 10, Q_10(x)
Figure 2.7: A zeroth-order (piecewise-constant) spline. The knots are at the interpolation
points. Since the spline is of degree zero, the function is not even continuous
A spline of degree 0 is a piecewise-constant function (see Figure 2.7). A spline of
degree 1 is a piecewise-linear function that can be explicitly written as

s(x) = s_0(x) = a_0 x + b_0,            x ∈ [t_0, t_1),
       s_1(x) = a_1 x + b_1,            x ∈ [t_1, t_2),
       ...
       s_{n−1}(x) = a_{n−1} x + b_{n−1},    x ∈ [t_{n−1}, t_n]

(see Figure 2.5, where the knots {t_i} and the interpolation points {x_i} are assumed to
be identical). It is now obvious why the points t_0, ..., t_n are called knots: these are
the points that connect the different polynomials with each other. To qualify as an
interpolating function, s(x) will have to satisfy interpolation conditions that we will
discuss below. We would like to comment already at this point that knots should not be
confused with the interpolation points. Sometimes it is convenient to choose the knots
to coincide with the interpolation points, but this is only optional, and other choices can
be made.
2.10.1 Cubic splines
A special case (which is the most common spline function that is used in practice) is
the cubic spline. A cubic spline is a spline for which the function is a polynomial of
degree 3 on every subinterval, and a function with two continuous derivatives overall
(see Figure 2.8).
Let’s denote such a function by s(x), i.e.,

s(x) = s_0(x),        x ∈ [t_0, t_1),
       s_1(x),        x ∈ [t_1, t_2),
       ...
       s_{n−1}(x),    x ∈ [t_{n−1}, t_n],

where for every i, the degree of s_i(x) is ≤ 3.
We now assume that some data (that s(x) should interpolate) is given at the knots,
i.e.,

s(t_i) = y_i,    0 ≤ i ≤ n.    (2.50)

The interpolation conditions (2.50), in addition to requiring that s(x) is continuous,
imply that

s_{i−1}(t_i) = y_i = s_i(t_i),    1 ≤ i ≤ n − 1.    (2.51)
We also require the continuity of the first and the second derivatives, i.e.,

s_i′(t_{i+1}) = s_{i+1}′(t_{i+1}),    0 ≤ i ≤ n − 2,    (2.52)
s_i″(t_{i+1}) = s_{i+1}″(t_{i+1}),    0 ≤ i ≤ n − 2.
Before actually computing the spline, let’s check if we have enough equations to
determine a unique solution for the problem. There are n subintervals, and in each
Figure 2.8: A cubic spline. In every subinterval [t_{i−1}, t_i], the function is a polynomial of
degree ≤ 3. The polynomials on the different subintervals are connected to each other
in such a way that the spline has a second-order continuous derivative. In this example
we use the not-a-knot condition.
subinterval we have to determine a polynomial of degree ≤ 3. Each such polynomial
has 4 coefficients, which leaves us with 4n coefficients to determine. The interpolation
and continuity conditions (2.51) for s_i(t_i) and s_i(t_{i+1}) amount to 2n equations. The
continuity of the first and the second derivatives (2.52) adds 2(n − 1) = 2n − 2 equations.
Altogether we have 4n − 2 equations but 4n unknowns, which leaves us with 2 degrees
of freedom. These indeed are two degrees of freedom that can be determined in various
ways, as we shall see below.
We are now ready to compute the spline. We will use the following notation:

h_i = t_{i+1} − t_i.

We also set

z_i = s″(t_i).

Since the second derivative of a cubic function is linear, we observe that s_i″(x) is the line
connecting (t_i, z_i) and (t_{i+1}, z_{i+1}), i.e.,

s_i″(x) = ((x − t_i)/h_i) z_{i+1} − ((x − t_{i+1})/h_i) z_i.    (2.53)
Integrating (2.53) once, we have

s_i′(x) = (1/2)((x − t_i)^2/h_i) z_{i+1} − (1/2)((x − t_{i+1})^2/h_i) z_i + c̃.
Integrating again,

s_i(x) = (z_{i+1}/(6h_i))(x − t_i)^3 + (z_i/(6h_i))(t_{i+1} − x)^3 + C(x − t_i) + D(t_{i+1} − x).

The interpolation condition, s(t_i) = y_i, implies that

y_i = (z_i/(6h_i)) h_i^3 + D h_i,

i.e.,

D = y_i/h_i − (z_i h_i)/6.

Similarly, s_i(t_{i+1}) = y_{i+1} implies that

y_{i+1} = (z_{i+1}/(6h_i)) h_i^3 + C h_i,

i.e.,

C = y_{i+1}/h_i − (z_{i+1}/6) h_i.

This means that we can rewrite s_i(x) as

s_i(x) = (z_{i+1}/(6h_i))(x − t_i)^3 + (z_i/(6h_i))(t_{i+1} − x)^3 + (y_{i+1}/h_i − (z_{i+1}/6)h_i)(x − t_i) + (y_i/h_i − (z_i/6)h_i)(t_{i+1} − x).
All that remains to determine are the second derivatives of s(x), z_0, ..., z_n. We can set
z_1, ..., z_{n−1} using the continuity conditions on s′(x), i.e., s_i′(t_i) = s_{i−1}′(t_i). We first
compute s_i′(x) and s_{i−1}′(x):
s_i′(x) = (z_{i+1}/(2h_i))(x − t_i)^2 − (z_i/(2h_i))(t_{i+1} − x)^2 + y_{i+1}/h_i − (z_{i+1}/6)h_i − y_i/h_i + (z_i h_i)/6,

s_{i−1}′(x) = (z_i/(2h_{i−1}))(x − t_{i−1})^2 − (z_{i−1}/(2h_{i−1}))(t_i − x)^2 + y_i/h_{i−1} − (z_i/6)h_{i−1} − y_{i−1}/h_{i−1} + (z_{i−1} h_{i−1})/6.
So that

s_i′(t_i) = −(z_i/(2h_i))h_i^2 + y_{i+1}/h_i − (z_{i+1}/6)h_i − y_i/h_i + (z_i h_i)/6 = −(h_i/3)z_i − (h_i/6)z_{i+1} − y_i/h_i + y_{i+1}/h_i,

s_{i−1}′(t_i) = (z_i/(2h_{i−1}))h_{i−1}^2 + y_i/h_{i−1} − (z_i/6)h_{i−1} − y_{i−1}/h_{i−1} + (z_{i−1} h_{i−1})/6 = (h_{i−1}/6)z_{i−1} + (h_{i−1}/3)z_i − y_{i−1}/h_{i−1} + y_i/h_{i−1}.
Hence, for 1 ≤ i ≤ n − 1, we obtain the system of equations

(h_{i−1}/6) z_{i−1} + ((h_i + h_{i−1})/3) z_i + (h_i/6) z_{i+1} = (1/h_i)(y_{i+1} − y_i) − (1/h_{i−1})(y_i − y_{i−1}).    (2.54)
These are n − 1 equations for the n + 1 unknowns z_0, ..., z_n, which means that we have
2 degrees of freedom. Without any additional information about the problem, the only
way to proceed is by making an arbitrary choice. There are several standard ways to
proceed. One option is to set the end values to zero, i.e.,

z_0 = z_n = 0.    (2.55)

This choice of the second derivative at the endpoints leads to the so-called natural
cubic spline. We will explain later in what sense this spline is “natural”. In this case,
we end up with the following linear system of equations:

[(h_0+h_1)/3    h_1/6                                           ] [z_1    ]   [(y_2 − y_1)/h_1 − (y_1 − y_0)/h_0]
[h_1/6          (h_1+h_2)/3    h_2/6                            ] [z_2    ]   [(y_3 − y_2)/h_2 − (y_2 − y_1)/h_1]
[    ...            ...            ...                          ] [ ...   ] = [              ...                ]
[     h_{n−3}/6   (h_{n−3}+h_{n−2})/3   h_{n−2}/6               ] [z_{n−2}]   [(y_{n−1} − y_{n−2})/h_{n−2} − (y_{n−2} − y_{n−3})/h_{n−3}]
[                 h_{n−2}/6     (h_{n−2}+h_{n−1})/3             ] [z_{n−1}]   [(y_n − y_{n−1})/h_{n−1} − (y_{n−1} − y_{n−2})/h_{n−2}]

The coefficient matrix is symmetric, tridiagonal, and diagonally dominant (i.e., |a_ii| > Σ_{j≠i} |a_ij| for every i), which means that it can always be (efficiently) inverted.
In the special case where the points are equally spaced, i.e., h_i = h for every i, the system
becomes

[4 1        ] [z_1    ]         [y_2 − 2y_1 + y_0            ]
[1 4 1      ] [z_2    ]    6    [y_3 − 2y_2 + y_1            ]
[  ... ...  ] [ ...   ] = ───── [            ...             ]    (2.56)
[   1 4 1   ] [z_{n−2}]   h^2   [y_{n−1} − 2y_{n−2} + y_{n−3}]
[     1 4   ] [z_{n−1}]         [y_n − 2y_{n−1} + y_{n−2}    ]
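The natural spline computation above fits in a short routine. The sketch below (function and variable names are ours) assembles the tridiagonal system (2.54) with z_0 = z_n = 0, solves it by elimination on the three diagonals, and evaluates s via the formula for s_i(x) derived above:

```python
import bisect

def natural_cubic_spline(ts, ys):
    n = len(ts) - 1
    h = [ts[i + 1] - ts[i] for i in range(n)]
    # rows k = 0..n-2 correspond to the unknowns z_1..z_{n-1}
    sub = [h[i - 1] / 6 for i in range(1, n)]
    diag = [(h[i - 1] + h[i]) / 3 for i in range(1, n)]
    sup = [h[i] / 6 for i in range(1, n)]
    rhs = [(ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1]
           for i in range(1, n)]
    m = n - 1
    for k in range(1, m):                      # forward elimination
        w = sub[k] / diag[k - 1]
        diag[k] -= w * sup[k - 1]
        rhs[k] -= w * rhs[k - 1]
    z = [0.0] * (n + 1)                        # z_0 = z_n = 0 (natural)
    if m > 0:
        z[m] = rhs[m - 1] / diag[m - 1]
        for i in range(m - 1, 0, -1):          # back substitution
            z[i] = (rhs[i - 1] - sup[i - 1] * z[i + 1]) / diag[i - 1]
    def s(x):
        i = min(max(bisect.bisect_right(ts, x) - 1, 0), n - 1)
        C = ys[i + 1] / h[i] - z[i + 1] * h[i] / 6
        D = ys[i] / h[i] - z[i] * h[i] / 6
        return (z[i + 1] / (6 * h[i]) * (x - ts[i]) ** 3
                + z[i] / (6 * h[i]) * (ts[i + 1] - x) ** 3
                + C * (x - ts[i]) + D * (ts[i + 1] - x))
    return s

ts, ys = [0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 0.0, 1.0]
s = natural_cubic_spline(ts, ys)
assert all(abs(s(t) - y) < 1e-12 for t, y in zip(ts, ys))
```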
In addition to the natural spline (2.55), there are other standard options:

1. If the values of the derivatives at the endpoints are known, one can specify them:

   s′(t_0) = y_0′,    s′(t_n) = y_n′.

2. The not-a-knot condition. Here, we require the third derivative s^{(3)}(x) to be
continuous at the points t_1, t_{n−1}. In this case we end up with a cubic spline with
knots t_0, t_2, t_3, ..., t_{n−2}, t_n. The points t_1 and t_{n−1} no longer function as knots.
The interpolation requirements are still satisfied at t_0, t_1, ..., t_{n−1}, t_n. Figure 2.9
shows two different cubic splines that interpolate the same initial data. The spline
that is plotted with a solid line is the not-a-knot spline. The spline that is plotted
with a dashed line is obtained by setting the derivatives at both endpoints to
zero.
Figure 2.9: Two cubic splines that interpolate the same data. Solid line: a not-a-knot
spline; dashed line: the derivative is set to zero at both endpoints
2.10.2 What is natural about the natural spline?
The following theorem states that the natural spline cannot have a larger L^2-norm of
the second derivative than the function it interpolates (assuming that that function has
a continuous second derivative). In fact, we are minimizing the L^2-norm of the second
derivative not only with respect to the “original” function which we are interpolating,
but with respect to any function that interpolates the data (and has a continuous second
derivative). In that sense, we refer to the natural spline as “natural”.
Theorem 2.21 Assume that f″(x) is continuous in [a, b], and let a = t_0 < t_1 < ... <
t_n = b. If s(x) is the natural cubic spline interpolating f(x) at the knots {t_i}, then

∫_a^b (s″(x))^2 dx ≤ ∫_a^b (f″(x))^2 dx.
Proof. Define g(x) = f(x) − s(x). Then since s(x) interpolates f(x) at the knots {t_i},
their difference vanishes at these points, i.e.,

g(t_i) = 0,    0 ≤ i ≤ n.

Now

∫_a^b (f″)^2 dx = ∫_a^b (s″)^2 dx + ∫_a^b (g″)^2 dx + 2 ∫_a^b s″g″ dx.    (2.57)
We will show that the last term on the right-hand side of (2.57) is zero, which will
conclude the proof, as the other two terms on the right-hand side of (2.57) are
nonnegative. Splitting that term into a sum of integrals on the subintervals and
integrating by parts on every subinterval, we have
∫_a^b s″g″ dx = Σ_{i=1}^{n} ∫_{t_{i−1}}^{t_i} s″g″ dx = Σ_{i=1}^{n} [ (s″g′)|_{t_{i−1}}^{t_i} − ∫_{t_{i−1}}^{t_i} s‴g′ dx ].
Since we are dealing with the “natural” choice s″(t_0) = s″(t_n) = 0, and since s‴(x) is
constant on [t_{i−1}, t_i] (say c_i), we end up with

∫_a^b s″g″ dx = −Σ_{i=1}^{n} ∫_{t_{i−1}}^{t_i} s‴g′ dx = −Σ_{i=1}^{n} c_i ∫_{t_{i−1}}^{t_i} g′ dx = −Σ_{i=1}^{n} c_i (g(t_i) − g(t_{i−1})) = 0.
We note that f″(x) can be viewed as a linear approximation of the curvature

|f″(x)| / (1 + (f′(x))^2)^{3/2}.

From that point of view, minimizing ∫_a^b (f″(x))^2 dx can be viewed as finding the curve
with a minimal |f″(x)| over an interval.
3 Approximations
3.1 Background
In this chapter we are interested in approximation problems. Generally speaking, start
ing from a function f(x) we would like to ﬁnd a diﬀerent function g(x) that belongs
to a given class of functions and is “close” to f(x) in some sense. As far as the class
of functions that g(x) belongs to, we will typically assume that g(x) is a polynomial
of a given degree (though it can be a trigonometric function, or any other function).
A typical approximation problem will therefore be: find the “closest” polynomial of
degree n to f(x).
What do we mean by “close”? There are different ways of measuring the “distance”
between two functions. We will focus on two such measurements (among many): the
L^∞-norm and the L^2-norm. We chose to focus on these two examples because of the different
mathematical techniques that are required to solve the corresponding approximation
problems.
We start with several definitions. We recall that a norm on a vector space V over
R is a function ‖·‖ : V → R with the following properties:

1. ‖λf‖ = |λ| ‖f‖, for all λ ∈ R and f ∈ V.

2. ‖f‖ ≥ 0 for all f ∈ V; also, ‖f‖ = 0 iff f is the zero element of V.

3. The triangle inequality: ‖f + g‖ ≤ ‖f‖ + ‖g‖, for all f, g ∈ V.
We assume that the function f(x) ∈ C^0[a, b] (continuous on [a, b]). A continuous
function on a closed interval attains a maximum in the interval. We can therefore define
the L^∞-norm (also known as the maximum norm) of such a function by

‖f‖_∞ = max_{a ≤ x ≤ b} |f(x)|.    (3.1)

The L^∞-distance between two functions f(x), g(x) ∈ C^0[a, b] is thus given by

‖f − g‖_∞ = max_{a ≤ x ≤ b} |f(x) − g(x)|.    (3.2)

We note that the definition of the L^∞-norm can be extended to functions that are less
regular than continuous functions. This generalization requires some subtleties that
we would like to avoid in the following discussion; hence, we will limit ourselves to
continuous functions.
We proceed by defining the L^2-norm of a continuous function f(x) as

‖f‖_2 = ( ∫_a^b |f(x)|^2 dx )^{1/2}.    (3.3)
The L^2 function space is the collection of functions f(x) for which ‖f‖_2 < ∞. Of
course, we do not have to assume that f(x) is continuous for the definition (3.3) to
make sense. However, if we allow f(x) to be discontinuous, we then have to be more
rigorous in terms of the definition of the integral so that we end up with a norm (the
problem is, e.g., in defining what is the “zero” element in the space). We therefore limit
ourselves also in this case to continuous functions only. The L^2-distance between two
functions f(x) and g(x) is

‖f − g‖_2 = ( ∫_a^b |f(x) − g(x)|^2 dx )^{1/2}.    (3.4)
At this point, a natural question is how important the choice of norm is in terms of
the solution of the approximation problem. It is easy to see that the value of the norm
of a function may vary substantially based on the function as well as on the choice of the
norm. For example, assume that ‖f‖_∞ < ∞. Then, clearly

‖f‖_2 = ( ∫_a^b |f|^2 dx )^{1/2} ≤ √(b − a) ‖f‖_∞.

On the other hand, it is easy to construct a function with an arbitrarily small ‖f‖_2 and
an arbitrarily large ‖f‖_∞. Hence, the choice of norm may have a significant impact on
the solution of the approximation problem.
As you have probably already anticipated, there is a strong connection between some
approximation problems and interpolation problems. For example, one possible method
of constructing an approximation to a given function is by sampling it at certain points
and then interpolating the sampled data. Is that the best we can do? Sometimes the
answer is positive, but the problem still remains difficult because we have to determine
the best sampling points. We will address these issues in the following sections.
The following theorem, the Weierstrass approximation theorem, plays a central role
in any discussion of approximations of functions. Loosely speaking, this theorem states
that any continuous function can be approximated as closely as we want by polynomials,
assuming that the polynomials can be of any degree. We formulate this theorem in the
$L^\infty$ norm and note that a similar theorem holds also in the $L^2$ sense. We let $\Pi_n$ denote
the space of polynomials of degree $\le n$.

Theorem 3.1 (Weierstrass Approximation Theorem) Let $f(x)$ be a continuous
function on $[a,b]$. Then there exists a sequence of polynomials $P_n(x)$ that converges
uniformly to $f(x)$ on $[a,b]$, i.e., $\forall\varepsilon>0$ there exist an $N\in\mathbb{N}$ and polynomials $P_n(x)\in\Pi_n$
such that $\forall x\in[a,b]$
$$|f(x)-P_n(x)| < \varepsilon, \qquad \forall n \ge N.$$
We will provide a constructive proof of the Weierstrass approximation theorem: first,
we will define a family of polynomials, known as the Bernstein polynomials, and then
we will show that they uniformly converge to $f(x)$.
We start with the definition. Given a continuous function $f(x)$ in $[0,1]$, we define
the Bernstein polynomials as
$$(B_n f)(x) = \sum_{j=0}^{n} f\!\left(\frac{j}{n}\right)\binom{n}{j}\,x^j(1-x)^{n-j}, \qquad 0 \le x \le 1.$$
We emphasize that the Bernstein polynomials depend on the function $f(x)$.
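The definition translates directly into code. The following sketch (our own, not from the notes) evaluates $(B_nf)(x)$, checks the reproduction of $1$ and $x$ (Lemma 3.3 below), and measures the decay of the uniform error for a sample smooth function; the sample grid of 201 points is an arbitrary choice.

```python
from math import comb

def bernstein(f, n, x):
    """Evaluate the Bernstein polynomial (B_n f)(x) for x in [0, 1]."""
    return sum(f(j / n) * comb(n, j) * x ** j * (1 - x) ** (n - j)
               for j in range(n + 1))

# B_n reproduces the functions 1 and x exactly (Lemma 3.3):
assert abs(bernstein(lambda t: 1.0, 10, 0.3) - 1.0) < 1e-12
assert abs(bernstein(lambda t: t, 10, 0.3) - 0.3) < 1e-12

# The uniform error for a sample smooth function decreases with n:
f = lambda x: 1.0 / (1.0 + 10.0 * (x - 0.5) ** 2)
xs = [i / 200 for i in range(201)]
err = lambda n: max(abs(f(x) - bernstein(f, n, x)) for x in xs)
print(err(6), err(10), err(20))
```

The convergence is slow (the error behaves like $O(1/n)$ for smooth $f$), which is why Bernstein polynomials serve as a proof device rather than a practical approximation method.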
Example 3.2
Three Bernstein polynomials $B_6(x)$, $B_{10}(x)$, and $B_{20}(x)$ for the function
$$f(x) = \frac{1}{1 + 10(x-0.5)^2}$$
on the interval $[0,1]$ are shown in Figure 3.1. Note the gradual convergence of $B_n(x)$ to
$f(x)$.

Figure 3.1: The Bernstein polynomials $B_6(x)$, $B_{10}(x)$, and $B_{20}(x)$ for the function
$f(x) = \frac{1}{1+10(x-0.5)^2}$ on the interval $[0,1]$
We now state and prove several properties of $B_n(x)$ that will be used when we prove
Theorem 3.1.

Lemma 3.3 The following relations hold:
1. $(B_n 1)(x) = 1$
2. $(B_n x)(x) = x$
3. $(B_n x^2)(x) = \dfrac{n-1}{n}\,x^2 + \dfrac{x}{n}$.

Proof.
$$(B_n 1)(x) = \sum_{j=0}^{n}\binom{n}{j} x^j (1-x)^{n-j} = (x + (1-x))^n = 1.$$
$$(B_n x)(x) = \sum_{j=0}^{n}\frac{j}{n}\binom{n}{j} x^j (1-x)^{n-j}
= x\sum_{j=1}^{n}\binom{n-1}{j-1} x^{j-1}(1-x)^{n-j}
= x\sum_{j=0}^{n-1}\binom{n-1}{j} x^j (1-x)^{n-1-j} = x\,[x+(1-x)]^{n-1} = x.$$
Finally,
$$\left(\frac{j}{n}\right)^2\binom{n}{j} = \frac{j}{n}\,\frac{(n-1)!}{(n-j)!\,(j-1)!}
= \left(\frac{n-1}{n}\,\frac{j-1}{n-1} + \frac{1}{n}\right)\frac{(n-1)!}{(n-j)!\,(j-1)!}
= \frac{n-1}{n}\binom{n-2}{j-2} + \frac{1}{n}\binom{n-1}{j-1}.$$
Hence
$$(B_n x^2)(x) = \sum_{j=0}^{n}\left(\frac{j}{n}\right)^2\binom{n}{j} x^j (1-x)^{n-j}
= \frac{n-1}{n}\,x^2\sum_{j=2}^{n}\binom{n-2}{j-2} x^{j-2}(1-x)^{n-j}
+ \frac{1}{n}\,x\sum_{j=1}^{n}\binom{n-1}{j-1} x^{j-1}(1-x)^{n-j}$$
$$= \frac{n-1}{n}\,x^2(x+(1-x))^{n-2} + \frac{1}{n}\,x\,(x+(1-x))^{n-1}
= \frac{n-1}{n}\,x^2 + \frac{x}{n}. \qquad\blacksquare$$
In the following lemma we state several additional properties of the Bernstein
polynomials. The proof is left as an exercise.

Lemma 3.4 For all functions $f(x), g(x)$ that are continuous in $[0,1]$, and $\forall\alpha\in\mathbb{R}$:
1. Linearity.
$$(B_n(\alpha f + g))(x) = \alpha (B_n f)(x) + (B_n g)(x).$$
2. Monotonicity. If $f(x) \le g(x)$ $\forall x\in[0,1]$, then
$$(B_n f)(x) \le (B_n g)(x).$$
Also, if $|f(x)| \le g(x)$ $\forall x\in[0,1]$, then
$$|(B_n f)(x)| \le (B_n g)(x).$$
3. Positivity. If $f(x) \ge 0$, then
$$(B_n f)(x) \ge 0.$$
We are now ready to prove the Weierstrass approximation theorem, Theorem 3.1.
Proof. We will prove the theorem in the interval $[0,1]$. The extension to $[a,b]$ is left as
an exercise. Since $f(x)$ is continuous on a closed interval, it is uniformly continuous.
Hence, given $\varepsilon>0$ there exists a $\delta>0$ such that $\forall x,y\in[0,1]$ with $|x-y|\le\delta$,
$$|f(x)-f(y)| \le \varepsilon. \tag{3.5}$$
In addition, since $f(x)$ is continuous on a closed interval, it is also bounded. Let
$$M = \max_{x\in[0,1]} |f(x)|.$$
Fix any point $a\in[0,1]$. If $|x-a|\le\delta$ then (3.5) holds. If $|x-a|>\delta$ then
$$|f(x)-f(a)| \le 2M \le 2M\left(\frac{x-a}{\delta}\right)^2$$
(at first sight this seems to be a strange way of bounding the function from above; we
will use it later on to our advantage). Combining the estimates for both cases we have
$$|f(x)-f(a)| \le \varepsilon + \frac{2M}{\delta^2}(x-a)^2.$$
We would now like to estimate the difference between $B_nf$ and $f$. The linearity of $B_n$
and the property $(B_n 1)(x) = 1$ imply that
$$B_n(f - f(a))(x) = (B_n f)(x) - f(a).$$
Hence, using the monotonicity of $B_n$ and the mapping properties of $x$ and $x^2$, we have
$$|(B_n f)(x) - f(a)| \le B_n\left(\varepsilon + \frac{2M}{\delta^2}(x-a)^2\right)
= \varepsilon + \frac{2M}{\delta^2}\left(\frac{n-1}{n}\,x^2 + \frac{x}{n} - 2ax + a^2\right)
= \varepsilon + \frac{2M}{\delta^2}(x-a)^2 + \frac{2M}{\delta^2}\,\frac{x-x^2}{n}.$$
Evaluating at $x=a$ we have (observing that $\max_{a\in[0,1]}(a-a^2) = \frac{1}{4}$)
$$|(B_n f)(a) - f(a)| \le \varepsilon + \frac{2M}{\delta^2}\,\frac{a-a^2}{n} \le \varepsilon + \frac{M}{2\delta^2 n}. \tag{3.6}$$
The point $a$ was arbitrary, so the result (3.6) holds for any point $a\in[0,1]$. Choosing
$$N \ge \frac{M}{2\delta^2\varepsilon},$$
we have, $\forall n \ge N$,
$$\|B_n f - f\|_\infty \le \varepsilon + \frac{M}{2\delta^2 N} \le 2\varepsilon. \qquad\blacksquare$$
• Is interpolation a good way of approximating functions in the $\infty$-norm? Not
necessarily. Discuss Runge's example...
3.2 The Minimax Approximation Problem
We assume that the function $f(x)$ is continuous on $[a,b]$, and that $P_n(x)$ is a
polynomial of degree $\le n$. We recall that the $L^\infty$ distance between $f(x)$ and $P_n(x)$ on
the interval $[a,b]$ is given by
$$\|f - P_n\|_\infty = \max_{a\le x\le b} |f(x) - P_n(x)|. \tag{3.7}$$
Clearly, we can construct polynomials that have an arbitrarily large distance from
$f(x)$. The question we would like to address is how close we can get to $f(x)$ (in the $L^\infty$
sense) with polynomials of a given degree. We define $d_n(f)$ as the infimum of (3.7) over
all polynomials of degree $\le n$, i.e.,
$$d_n(f) = \inf_{P_n\in\Pi_n} \|f - P_n\|_\infty. \tag{3.8}$$
The goal is to find a polynomial $P_n^*(x)$ for which the infimum (3.8) is actually
obtained, i.e.,
$$d_n(f) = \|f - P_n^*\|_\infty. \tag{3.9}$$
We will refer to a polynomial $P_n^*(x)$ that satisfies (3.9) as a polynomial of best
approximation or the minimax polynomial. The minimal distance in (3.9) will
be referred to as the minimax error.
The theory we will explore in the following sections will show that the minimax
polynomial always exists and is unique. We will also provide a characterization of
the minimax polynomial that will allow us to identify it if we actually see it. The
general construction of the minimax polynomial will not be addressed in this text as it
is relatively technically involved. We will limit ourselves to simple examples.
Example 3.5
We let $f(x)$ be a monotonically increasing and continuous function on the interval $[a,b]$
and are interested in finding the minimax polynomial of degree zero to $f(x)$ in that
interval. We denote this minimax polynomial by
$$P_0^*(x) \equiv c.$$
Clearly, the smallest distance between $f(x)$ and $P_0^*$ in the $L^\infty$ norm will be obtained if
$$c = \frac{f(a) + f(b)}{2}.$$
The maximal distance between $f(x)$ and $P_0^*$ will be attained at both edges and will be
equal to
$$\pm\frac{f(b) - f(a)}{2}.$$
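A quick numerical check of this example (our own sketch; the choice of $f(x)=e^x$ on $[1,2]$ and the sampling grid are arbitrary):

```python
import math

# f is monotone increasing on [1, 2]; the best constant approximation in the
# max norm is the average of the endpoint values (Example 3.5).
f = math.exp
a, b = 1.0, 2.0
xs = [a + i * (b - a) / 1000 for i in range(1001)]

def max_dist(c):
    """Max-norm distance between f and the constant c, sampled on [a, b]."""
    return max(abs(f(x) - c) for x in xs)

c_star = (f(a) + f(b)) / 2
best = max_dist(c_star)            # equals (f(b) - f(a)) / 2
assert abs(best - (f(b) - f(a)) / 2) < 1e-9
# Any perturbed constant does worse:
assert max_dist(c_star + 0.1) > best
assert max_dist(c_star - 0.1) > best
print(c_star, best)
```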
3.2.1 Existence of the minimax polynomial
The existence of the minimax polynomial is provided by the following theorem.
Theorem 3.6 (Existence) Let $f \in C^0[a,b]$. Then for any $n\in\mathbb{N}$ there exists
$P_n^*(x)\in\Pi_n$ that minimizes $\|f(x) - P_n(x)\|_\infty$ among all polynomials $P_n(x)\in\Pi_n$.

Proof. We follow the proof as given in [7]. Let $\eta = (\eta_0,\ldots,\eta_n)$ be an arbitrary point in
$\mathbb{R}^{n+1}$ and let
$$P_n(x) = \sum_{i=0}^{n}\eta_i x^i \in \Pi_n.$$
We also let
$$\phi(\eta) = \phi(\eta_0,\ldots,\eta_n) = \|f - P_n\|_\infty.$$
Our goal is to show that $\phi$ obtains a minimum in $\mathbb{R}^{n+1}$, i.e., that there exists a point
$\eta^* = (\eta_0^*,\ldots,\eta_n^*)$ such that
$$\phi(\eta^*) = \min_{\eta\in\mathbb{R}^{n+1}}\phi(\eta).$$
Step 1. We first show that $\phi(\eta)$ is a continuous function on $\mathbb{R}^{n+1}$. For an arbitrary
$\delta = (\delta_0,\ldots,\delta_n)\in\mathbb{R}^{n+1}$, define
$$q_n(x) = \sum_{i=0}^{n}\delta_i x^i.$$
Then
$$\phi(\eta+\delta) = \|f - (P_n + q_n)\|_\infty \le \|f - P_n\|_\infty + \|q_n\|_\infty = \phi(\eta) + \|q_n\|_\infty.$$
Hence
$$\phi(\eta+\delta) - \phi(\eta) \le \|q_n\|_\infty \le \max_{x\in[a,b]}\left(|\delta_0| + |\delta_1||x| + \ldots + |\delta_n||x|^n\right).$$
For any $\varepsilon > 0$, let $\tilde\delta = \varepsilon/(1 + c + \ldots + c^n)$, where $c = \max(|a|,|b|)$. Then for any
$\delta = (\delta_0,\ldots,\delta_n)$ such that $\max_i|\delta_i| \le \tilde\delta$, $0\le i\le n$,
$$\phi(\eta+\delta) - \phi(\eta) \le \varepsilon. \tag{3.10}$$
Similarly,
$$\phi(\eta) = \|f - P_n\|_\infty = \|f - (P_n + q_n) + q_n\|_\infty \le \|f - (P_n + q_n)\|_\infty + \|q_n\|_\infty = \phi(\eta+\delta) + \|q_n\|_\infty,$$
which implies that under the same conditions as in (3.10) we also get
$$\phi(\eta) - \phi(\eta+\delta) \le \varepsilon.$$
Altogether,
$$|\phi(\eta+\delta) - \phi(\eta)| \le \varepsilon,$$
which means that $\phi$ is continuous at $\eta$. Since $\eta$ was an arbitrary point in $\mathbb{R}^{n+1}$, $\phi$ is
continuous in the entire $\mathbb{R}^{n+1}$.
Step 2. We now construct a compact set in $\mathbb{R}^{n+1}$ on which $\phi$ obtains a minimum. We
let
$$S = \left\{\eta\in\mathbb{R}^{n+1} \;:\; \phi(\eta) \le \|f\|_\infty\right\}.$$
We have
$$\phi(0) = \|f\|_\infty;$$
hence, $0\in S$, and the set $S$ is nonempty. We also note that the set $S$ is bounded and
closed (check!). Since $\phi$ is continuous on the entire $\mathbb{R}^{n+1}$, it is also continuous on $S$,
and hence it must obtain a minimum on $S$, say at $\eta^*\in S$, i.e.,
$$\min_{\eta\in S}\phi(\eta) = \phi(\eta^*).$$
Step 3. Since $0\in S$, we know that
$$\min_{\eta\in S}\phi(\eta) \le \phi(0) = \|f\|_\infty.$$
Hence, if $\eta\in\mathbb{R}^{n+1}$ but $\eta\notin S$, then
$$\phi(\eta) > \|f\|_\infty \ge \min_{\eta\in S}\phi(\eta).$$
This means that the minimum of $\phi$ over $S$ is the same as the minimum over the entire
$\mathbb{R}^{n+1}$. Therefore
$$P_n^*(x) = \sum_{i=0}^{n}\eta_i^* x^i \tag{3.11}$$
is the best approximation of $f(x)$ in the $L^\infty$ norm on $[a,b]$, i.e., it is the minimax
polynomial, and hence the minimax polynomial exists. $\blacksquare$

We note that the proof of Theorem 3.6 is not a constructive proof. The proof does
not tell us what the point $\eta^*$ is, and hence we do not know the coefficients of the
minimax polynomial as written in (3.11). We will discuss the characterization of the
minimax polynomial and some simple cases of its construction in the following sections.

3.2.2 Bounds on the minimax error
It is trivial to obtain an upper bound on the minimax error, since by the definition of
$d_n(f)$ in (3.8) we have
$$d_n(f) \le \|f - P_n\|_\infty, \qquad \forall P_n(x)\in\Pi_n.$$
A lower bound is provided by the following theorem.
Theorem 3.7 (de la Vallée-Poussin) Let $a \le x_0 < x_1 < \cdots < x_{n+1} \le b$. Let $P_n(x)$
be a polynomial of degree $\le n$. Suppose that
$$f(x_j) - P_n(x_j) = (-1)^j e_j, \qquad j = 0,\ldots,n+1,$$
where all $e_j \neq 0$ and are of an identical sign. Then
$$\min_j |e_j| \le d_n(f).$$

Proof. By contradiction. Assume for some $Q_n(x)$ that
$$\|f - Q_n\|_\infty < \min_j |e_j|.$$
Then the polynomial
$$(Q_n - P_n)(x) = (f - P_n)(x) - (f - Q_n)(x)$$
is a polynomial of degree $\le n$ that has the same sign at $x_j$ as does $f(x) - P_n(x)$. This
implies that $(Q_n - P_n)(x)$ changes sign at least $n+2$ times, and hence it has at least
$n+1$ zeros. Being a polynomial of degree $\le n$, this is possible only if it is identically
zero, i.e., if $P_n(x) \equiv Q_n(x)$, which contradicts the assumptions on $Q_n(x)$ and $P_n(x)$. $\blacksquare$
3.2.3 Characterization of the minimax polynomial
The following theorem provides a characterization of the minimax polynomial in terms
of its oscillation property.

Theorem 3.8 (The oscillating theorem) Suppose that $f(x)$ is continuous in $[a,b]$.
The polynomial $P_n^*(x)\in\Pi_n$ is the minimax polynomial of degree $\le n$ to $f(x)$ in $[a,b]$ if
and only if $f(x) - P_n^*(x)$ assumes the values $\pm\|f - P_n^*\|_\infty$ with an alternating change of
sign at least $n+2$ times in $[a,b]$.

Proof. We prove here only the sufficiency part of the theorem. For the necessary part
of the theorem we refer to [7].
Without loss of generality, suppose that
$$(f - P_n^*)(x_i) = (-1)^i\|f - P_n^*\|_\infty, \qquad 0 \le i \le n+1.$$
Let
$$D^* = \|f - P_n^*\|_\infty,$$
and let
$$d_n(f) = \min_{P_n\in\Pi_n}\|f - P_n\|_\infty.$$
We replace the infimum in the original definition of $d_n(f)$ by a minimum because we
already know that a minimum exists. De la Vallée-Poussin's theorem (Theorem 3.7)
implies that $D^* \le d_n$. On the other hand, the definition of $d_n$ implies that $d_n \le D^*$.
Hence $D^* = d_n$ and $P_n^*(x)$ is the minimax polynomial. $\blacksquare$

Remark. In view of these theorems it is obvious why the Taylor expansion is a poor
uniform approximation: the sum is non-oscillatory.
3.2.4 Uniqueness of the minimax polynomial
Theorem 3.9 (Uniqueness) Let $f(x)$ be continuous on $[a,b]$. Then its minimax
polynomial $P_n^*(x)\in\Pi_n$ is unique.

Proof. Let
$$d_n(f) = \min_{P_n\in\Pi_n}\|f - P_n\|_\infty.$$
Assume that $Q_n(x)$ is also a minimax polynomial. Then
$$\|f - P_n^*\|_\infty = \|f - Q_n\|_\infty = d_n(f).$$
The triangle inequality implies that
$$\left\|f - \tfrac{1}{2}(P_n^* + Q_n)\right\|_\infty \le \tfrac{1}{2}\|f - P_n^*\|_\infty + \tfrac{1}{2}\|f - Q_n\|_\infty = d_n(f).$$
Hence, $\tfrac{1}{2}(P_n^* + Q_n)\in\Pi_n$ is also a minimax polynomial. The oscillating theorem
(Theorem 3.8) implies that there exist $x_0,\ldots,x_{n+1}\in[a,b]$ such that
$$\left|f(x_i) - \tfrac{1}{2}\left(P_n^*(x_i) + Q_n(x_i)\right)\right| = d_n(f), \qquad 0 \le i \le n+1. \tag{3.12}$$
Equation (3.12) can be rewritten as
$$\left|f(x_i) - P_n^*(x_i) + f(x_i) - Q_n(x_i)\right| = 2d_n(f), \qquad 0 \le i \le n+1. \tag{3.13}$$
Since $P_n^*(x)$ and $Q_n(x)$ are both minimax polynomials, we have
$$|f(x_i) - P_n^*(x_i)| \le \|f - P_n^*\|_\infty = d_n(f), \qquad 0 \le i \le n+1, \tag{3.14}$$
and
$$|f(x_i) - Q_n(x_i)| \le \|f - Q_n\|_\infty = d_n(f), \qquad 0 \le i \le n+1. \tag{3.15}$$
For any $i$, equations (3.13)–(3.15) mean that two numbers whose absolute values are at
most $d_n(f)$ add up, in absolute value, to $2d_n(f)$. This is possible only if they are equal
to each other, i.e.,
$$f(x_i) - P_n^*(x_i) = f(x_i) - Q_n(x_i), \qquad 0 \le i \le n+1,$$
i.e.,
$$(P_n^* - Q_n)(x_i) = 0, \qquad 0 \le i \le n+1.$$
Hence, the polynomial $(P_n^* - Q_n)(x)\in\Pi_n$ has $n+2$ distinct roots, which is possible for
a polynomial of degree $\le n$ only if it is identically zero. Hence
$$Q_n(x) \equiv P_n^*(x),$$
and the uniqueness of the minimax polynomial is established. $\blacksquare$
3.2.5 The near-minimax polynomial
We now connect the minimax approximation problem with polynomial interpolation.
In order for $f(x) - P_n(x)$ to change its sign $n+2$ times, there should be $n+1$
points at which $f(x)$ and $P_n(x)$ agree with each other. In other words, we can think
of $P_n(x)$ as a function that interpolates $f(x)$ at (at least) $n+1$ points, say $x_0,\ldots,x_n$.
What can we say about these points?
We recall that the interpolation error is given by (2.25),
$$f(x) - P_n(x) = \frac{1}{(n+1)!}\,f^{(n+1)}(\xi)\prod_{i=0}^{n}(x - x_i).$$
If $P_n(x)$ is indeed the minimax polynomial, we know that the maximum of
$$f^{(n+1)}(\xi)\prod_{i=0}^{n}(x - x_i) \tag{3.16}$$
will oscillate with equal values. Due to the dependency of $f^{(n+1)}(\xi)$ on the intermediate
point $\xi$, we know that minimizing the error term (3.16) is a difficult task. We recall that
interpolation at the Chebyshev points minimizes the multiplicative part of the error
term, i.e.,
$$\prod_{i=0}^{n}(x - x_i).$$
Hence, choosing $x_0,\ldots,x_n$ to be the Chebyshev points will not result in the minimax
polynomial, but nevertheless, this relation motivates us to refer to the interpolant at
the Chebyshev points as the near-minimax polynomial. We note that the term
"near-minimax" does not mean that the near-minimax polynomial is actually close to
the minimax polynomial.
3.2.6 Construction of the minimax polynomial
The characterization of the minimax polynomial in terms of the number of points in
which the maximum distance should be obtained with oscillating signs allows us to
construct the minimax polynomial in simple cases by a direct computation.
We are not going to deal with the construction of the minimax polynomial in the
general case. The algorithm for doing so is known as the Remez algorithm, and we refer
the interested reader to [2] and the references therein.
A simple case where we can demonstrate a direct construction of the polynomial is
when the function is convex, as done in the following example.
Example 3.10
Problem: Let $f(x) = e^x$, $x\in[1,3]$. Find the minimax polynomial of degree $\le 1$, $P_1^*(x)$.
Solution: Based on the characterization of the minimax polynomial, we will be looking
for a linear function $P_1^*(x)$ such that the maximal distance between $P_1^*(x)$ and $f(x)$ is
obtained 3 times with alternating signs. Clearly, in the case of the present problem,
since the function is convex, the maximal distance will be obtained at both edges and
at one interior point. We will use this observation in the construction that follows.
The construction itself is shown graphically in Figure 3.2.

Figure 3.2: A construction of the linear minimax polynomial for the convex function
$e^x$ on $[1,3]$

We let $l_1(x)$ denote the line that connects the endpoints $(1,e)$ and $(3,e^3)$, i.e.,
$$l_1(x) = e + m(x - 1).$$
Here, the slope $m$ is given by
$$m = \frac{e^3 - e}{2}. \tag{3.17}$$
Let $l_2(x)$ denote the tangent to $f(x)$ at a point $a$ that is identified such that the slope
is $m$. Since $f'(x) = e^x$, we have $e^a = m$, i.e.,
$$a = \log m.$$
Now
$$f(a) = e^{\log m} = m,$$
and
$$l_1(a) = e + m(\log m - 1).$$
Hence, the average between $f(a)$ and $l_1(a)$, which we denote by $\bar y$, is given by
$$\bar y = \frac{f(a) + l_1(a)}{2} = \frac{m + e + m\log m - m}{2} = \frac{e + m\log m}{2}.$$
The minimax polynomial $P_1^*(x)$ is the line of slope $m$ that passes through $(a,\bar y)$,
$$P_1^*(x) - \frac{e + m\log m}{2} = m(x - \log m),$$
i.e.,
$$P_1^*(x) = mx + \frac{e - m\log m}{2},$$
where the slope $m$ is given by (3.17). We note that the maximal difference between
$P_1^*(x)$ and $f(x)$ is obtained at $x = 1, a, 3$.
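The equioscillation of the error in this example can be verified numerically. The following sketch (our own, not part of the notes) checks that $e^x - P_1^*(x)$ attains the values $+E, -E, +E$ at $x = 1, a, 3$:

```python
import math

# Minimax line for f(x) = exp(x) on [1, 3] (Example 3.10):
# slope m = (e^3 - e)/2, intercept (e - m*log(m))/2.
m = (math.exp(3) - math.exp(1)) / 2
p = lambda x: m * x + (math.exp(1) - m * math.log(m)) / 2

err = lambda x: math.exp(x) - p(x)
a = math.log(m)  # the interior extremum

# Equioscillation: the error attains +E, -E, +E at x = 1, a, 3.
E = err(1.0)
assert E > 0
assert abs(err(3.0) - E) < 1e-9
assert abs(err(a) + E) < 1e-9
print(E)  # the minimax error d_1(exp) on [1, 3]
```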
3.3 Least-squares Approximations
3.3.1 The least-squares approximation problem
We recall that the $L^2$ norm of a function $f(x)$ is defined as
$$\|f\|_2 = \sqrt{\int_a^b |f(x)|^2\,dx}.$$
As before, we let $\Pi_n$ denote the space of all polynomials of degree $\le n$. The least-squares
approximation problem is to find the polynomial that is closest to $f(x)$
in the $L^2$ norm among all polynomials of degree $\le n$, i.e., to find $Q_n^*\in\Pi_n$ such that
$$\|f - Q_n^*\|_2 = \min_{Q_n\in\Pi_n}\|f - Q_n\|_2.$$

3.3.2 Solving the least-squares problem: a direct method
Let
$$Q_n(x) = \sum_{i=0}^{n} a_i x^i.$$
We want to minimize $\|f(x) - Q_n(x)\|_2$ among all $Q_n\in\Pi_n$. For convenience, instead of
minimizing the $L^2$ norm of the difference, we will minimize its square. We thus let $\phi$
denote the square of the $L^2$ distance between $f(x)$ and $Q_n(x)$, i.e.,
$$\phi(a_0,\ldots,a_n) = \int_a^b (f(x) - Q_n(x))^2\,dx
= \int_a^b f^2(x)\,dx - 2\sum_{i=0}^{n} a_i\int_a^b x^i f(x)\,dx + \sum_{i=0}^{n}\sum_{j=0}^{n} a_i a_j\int_a^b x^{i+j}\,dx.$$
$\phi$ is a function of the $n+1$ coefficients of the polynomial $Q_n(x)$. This means that we
want to find a point $\hat a = (\hat a_0,\ldots,\hat a_n)\in\mathbb{R}^{n+1}$ for which $\phi$ obtains a minimum. At this
point
$$\left.\frac{\partial\phi}{\partial a_k}\right|_{a=\hat a} = 0. \tag{3.18}$$
The condition (3.18) implies that
$$0 = -2\int_a^b x^k f(x)\,dx + \sum_{i=0}^{n}\hat a_i\int_a^b x^{i+k}\,dx + \sum_{j=0}^{n}\hat a_j\int_a^b x^{j+k}\,dx
= 2\left[\sum_{i=0}^{n}\hat a_i\int_a^b x^{i+k}\,dx - \int_a^b x^k f(x)\,dx\right]. \tag{3.19}$$
This is a linear system for the unknowns $(\hat a_0,\ldots,\hat a_n)$:
$$\sum_{i=0}^{n}\hat a_i\int_a^b x^{i+k}\,dx = \int_a^b x^k f(x)\,dx, \qquad k = 0,\ldots,n. \tag{3.20}$$
We thus know that the solution of the least-squares problem is the polynomial
$$Q_n^*(x) = \sum_{i=0}^{n}\hat a_i x^i,$$
where the coefficients $\hat a_i$, $i = 0,\ldots,n$, are the solution of (3.20), assuming that this
system can be solved. Indeed, the system (3.20) always has a unique solution, which
proves not only that the least-squares problem has a solution, but also that it is unique.
We let $H_{n+1}(a,b)$ denote the $(n+1)\times(n+1)$ coefficient matrix of the system (3.20)
on the interval $[a,b]$, i.e.,
$$(H_{n+1}(a,b))_{i,k} = \int_a^b x^{i+k}\,dx, \qquad 0 \le i,k \le n.$$
For example, in the case where $[a,b] = [0,1]$,
$$H_n(0,1) = \begin{pmatrix} 1/1 & 1/2 & \ldots & 1/n \\ 1/2 & 1/3 & \ldots & 1/(n+1) \\ \vdots & & & \vdots \\ 1/n & 1/(n+1) & \ldots & 1/(2n-1) \end{pmatrix}. \tag{3.21}$$
The matrix (3.21) is known as the Hilbert matrix.
Lemma 3.11 The Hilbert matrix is invertible.
Proof. We leave it as an exercise to show that the determinant of $H_n$ is given by
$$\det(H_n) = \frac{(1!\,2!\cdots(n-1)!)^4}{1!\,2!\cdots(2n-1)!}.$$
Hence, $\det(H_n)\neq 0$ and $H_n$ is invertible. $\blacksquare$

Is inverting the Hilbert matrix a good way of solving the least-squares problem? No.
There are numerical instabilities associated with inverting $H$. We demonstrate
this with the following example.
Example 3.12
The Hilbert matrix $H_5$ is
$$H_5 = \begin{pmatrix} 1/1 & 1/2 & 1/3 & 1/4 & 1/5 \\ 1/2 & 1/3 & 1/4 & 1/5 & 1/6 \\ 1/3 & 1/4 & 1/5 & 1/6 & 1/7 \\ 1/4 & 1/5 & 1/6 & 1/7 & 1/8 \\ 1/5 & 1/6 & 1/7 & 1/8 & 1/9 \end{pmatrix}.$$
The inverse of $H_5$ is
$$H_5^{-1} = \begin{pmatrix} 25 & -300 & 1050 & -1400 & 630 \\ -300 & 4800 & -18900 & 26880 & -12600 \\ 1050 & -18900 & 79380 & -117600 & 56700 \\ -1400 & 26880 & -117600 & 179200 & -88200 \\ 630 & -12600 & 56700 & -88200 & 44100 \end{pmatrix}.$$
The condition number of $H_5$ is $4.77\times 10^5$, which indicates that it is ill-conditioned.
In fact, the condition number of $H_n$ increases with the dimension $n$, so inverting it
becomes more difficult as the dimension increases.
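One symptom of this ill-conditioning is how quickly $\det(H_n)$ collapses toward zero. The sketch below (our own, using exact rational arithmetic so no rounding intervenes) computes the determinant by Gaussian elimination and checks it against the closed form stated in Lemma 3.11:

```python
from fractions import Fraction
from math import factorial

def hilbert(n):
    """The n-by-n Hilbert matrix with exact rational entries."""
    return [[Fraction(1, i + j + 1) for j in range(n)] for i in range(n)]

def det(mat):
    """Determinant via exact Gaussian elimination (no pivoting needed:
    the Hilbert matrix is positive definite)."""
    m = [row[:] for row in mat]
    n = len(m)
    d = Fraction(1)
    for k in range(n):
        d *= m[k][k]
        for i in range(k + 1, n):
            r = m[i][k] / m[k][k]
            for j in range(k, n):
                m[i][j] -= r * m[k][j]
    return d

def lemma_det(n):
    """det(H_n) = (1! 2! ... (n-1)!)^4 / (1! 2! ... (2n-1)!), as in Lemma 3.11."""
    num = 1
    for k in range(1, n):
        num *= factorial(k)
    den = 1
    for k in range(1, 2 * n):
        den *= factorial(k)
    return Fraction(num ** 4, den)

for n in (2, 3, 4, 5):
    assert det(hilbert(n)) == lemma_det(n)
    print(n, float(det(hilbert(n))))  # shrinks extremely fast with n
```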
3.3.3 Solving the least-squares problem: with orthogonal polynomials
Let $\{P_k\}_{k=0}^{n}$ be polynomials such that
$$\deg(P_k(x)) = k.$$
Let $Q_n(x)$ be a linear combination of the polynomials $\{P_k\}_{k=0}^{n}$, i.e.,
$$Q_n(x) = \sum_{j=0}^{n} c_j P_j(x). \tag{3.22}$$
Clearly, $Q_n(x)$ is a polynomial of degree $\le n$. Define
$$\phi(c_0,\ldots,c_n) = \int_a^b [f(x) - Q_n(x)]^2\,dx.$$
We note that the function $\phi$ is a quadratic function of the coefficients $\{c_k\}$ of the linear
combination (3.22). We would like to minimize $\phi$. Similarly to the calculations
done in the previous section, at the minimum, $\hat c = (\hat c_0,\ldots,\hat c_n)$, we have
$$0 = \left.\frac{\partial\phi}{\partial c_k}\right|_{c=\hat c} = -2\int_a^b P_k(x)f(x)\,dx + 2\sum_{j=0}^{n}\hat c_j\int_a^b P_j(x)P_k(x)\,dx,$$
i.e.,
$$\sum_{j=0}^{n}\hat c_j\int_a^b P_j(x)P_k(x)\,dx = \int_a^b P_k(x)f(x)\,dx, \qquad k = 0,\ldots,n. \tag{3.23}$$
Note the similarity between equation (3.23) and (3.20). There, we used the basis
functions $\{x^k\}_{k=0}^{n}$ (a basis of $\Pi_n$), while here we work with the polynomials $\{P_k(x)\}_{k=0}^{n}$
instead. The idea now is to choose the polynomials $\{P_k(x)\}_{k=0}^{n}$ such that the system
(3.23) can be easily solved. This can be done if we choose them in such a way that
$$\int_a^b P_i(x)P_j(x)\,dx = \delta_{ij} = \begin{cases} 1, & i = j, \\ 0, & i \neq j. \end{cases} \tag{3.24}$$
Polynomials that satisfy (3.24) are called orthonormal polynomials. If, indeed, the
polynomials $\{P_k(x)\}_{k=0}^{n}$ are orthonormal, then (3.23) implies that
$$\hat c_j = \int_a^b P_j(x)f(x)\,dx, \qquad j = 0,\ldots,n. \tag{3.25}$$
The solution of the least-squares problem is the polynomial
$$Q_n^*(x) = \sum_{j=0}^{n}\hat c_j P_j(x), \tag{3.26}$$
with coefficients $\hat c_j$, $j = 0,\ldots,n$, that are given by (3.25).

Remark. Polynomials that satisfy
$$\int_a^b P_i(x)P_j(x)\,dx = \begin{cases} \int_a^b (P_i(x))^2\,dx, & i = j, \\ 0, & i \neq j, \end{cases}$$
with $\int_a^b (P_i(x))^2\,dx$ not necessarily equal to 1, are called orthogonal polynomials. In
this case, the solution of the least-squares problem is given by the polynomial $Q_n^*(x)$ in
(3.26) with the coefficients
$$\hat c_j = \frac{\int_a^b P_j(x)f(x)\,dx}{\int_a^b (P_j(x))^2\,dx}, \qquad j = 0,\ldots,n. \tag{3.27}$$
3.3.4 The weighted least-squares problem
A more general least-squares problem is the weighted least-squares approximation
problem. We consider a weight function, $w(x)$, which is continuous and non-negative
on $(a,b)$ and has a positive mass, i.e.,
$$\int_a^b w(x)\,dx > 0.$$
Note that $w(x)$ may be singular at the edges of the interval since we do not require
it to be continuous on the closed interval $[a,b]$. For any weight $w(x)$, we define the
corresponding weighted $L^2$ norm of a function $f(x)$ as
$$\|f\|_{2,w} = \sqrt{\int_a^b (f(x))^2 w(x)\,dx}.$$
The weighted least-squares problem is to find the polynomial $Q_n^*\in\Pi_n$ that is closest to
$f(x)$, this time in the weighted $L^2$ norm sense, i.e., we look for a polynomial $Q_n^*(x)$ of
degree $\le n$ such that
$$\|f - Q_n^*\|_{2,w} = \min_{Q_n\in\Pi_n}\|f - Q_n\|_{2,w}. \tag{3.28}$$
In order to solve the weighted least-squares problem (3.28) we follow the methodology
described in Section 3.3.3, and consider polynomials $\{P_k\}_{k=0}^{n}$ such that $\deg(P_k(x)) = k$.
We then consider a polynomial $Q_n(x)$ that is written as their linear combination:
$$Q_n(x) = \sum_{j=0}^{n} c_j P_j(x).$$
By repeating the calculations of Section 3.3.3, we obtain
$$\sum_{j=0}^{n}\hat c_j\int_a^b w(x)P_j(x)P_k(x)\,dx = \int_a^b w(x)P_k(x)f(x)\,dx, \qquad k = 0,\ldots,n \tag{3.29}$$
(compare with (3.23)). The system (3.29) can be easily solved if we choose $\{P_k(x)\}$ to
be orthonormal with respect to the weight $w(x)$, i.e.,
$$\int_a^b P_i(x)P_j(x)w(x)\,dx = \delta_{ij}.$$
Hence, the solution of the weighted least-squares problem is given by
$$Q_n^*(x) = \sum_{j=0}^{n}\hat c_j P_j(x), \tag{3.30}$$
where the coefficients are given by
$$\hat c_j = \int_a^b P_j(x)f(x)w(x)\,dx, \qquad j = 0,\ldots,n. \tag{3.31}$$
Remark. In the case where the $\{P_k(x)\}$ are orthogonal but not necessarily normalized,
the coefficients of the solution (3.30) of the weighted least-squares problem are given by
$$\hat c_j = \frac{\int_a^b P_j(x)f(x)w(x)\,dx}{\int_a^b (P_j(x))^2 w(x)\,dx}, \qquad j = 0,\ldots,n.$$
3.3.5 Orthogonal polynomials
At this point we already know that orthogonal polynomials play a central role in the
solution of least-squares problems. In this section we will focus on the construction of
orthogonal polynomials. The properties of orthogonal polynomials will be studied in
Section 3.3.7.
We start by defining the weighted inner product between two functions $f(x)$ and
$g(x)$ (with respect to the weight $w(x)$):
$$\langle f,g\rangle_w = \int_a^b f(x)g(x)w(x)\,dx.$$
To simplify the notation, even in the weighted case, we will typically write $\langle f,g\rangle$
instead of $\langle f,g\rangle_w$. Some properties of the weighted inner product include:
1. $\langle\alpha f,g\rangle = \langle f,\alpha g\rangle = \alpha\langle f,g\rangle$, $\forall\alpha\in\mathbb{R}$.
2. $\langle f_1 + f_2, g\rangle = \langle f_1,g\rangle + \langle f_2,g\rangle$.
3. $\langle f,g\rangle = \langle g,f\rangle$.
4. $\langle f,f\rangle \ge 0$, and $\langle f,f\rangle = 0$ iff $f \equiv 0$. Here we must assume that $f(x)$ is continuous
in the interval $[a,b]$. If it is not continuous, we can have $\langle f,f\rangle = 0$ while $f(x)$ is
still nonzero (e.g., at one point).
The weighted $L^2$ norm can be obtained from the weighted inner product by
$$\|f\|_{2,w} = \sqrt{\langle f,f\rangle_w}.$$
Given a weight $w(x)$, we are interested in constructing orthogonal (or orthonormal)
polynomials. This can be done using the Gram-Schmidt orthogonalization
process, which we now describe in detail.
In the general context of linear algebra, the Gram-Schmidt process is used
to convert one set of linearly independent vectors into an orthogonal set of vectors that
spans the same vector space. In our context, we should think of the process as
converting one set of polynomials that spans the space of polynomials of degree $\le n$
into an orthogonal set of polynomials that spans the same space $\Pi_n$. Typically, the
initial set of polynomials will be $\{1, x, x^2, \ldots, x^n\}$, which we would like to convert to
orthogonal polynomials with respect to the weight $w(x)$. However, to keep the discussion
slightly more general, we start with $n+1$ linearly independent functions (all in $L^2_w[a,b]$,
$\{g_i(x)\}_{i=0}^{n}$, i.e., $\int_a^b (g_i(x))^2 w(x)\,dx < \infty$). The functions $\{g_i\}$ will be converted into
orthonormal functions $\{f_i\}$.
We thus consider
$$f_0(x) = d_0 g_0(x),$$
$$f_1(x) = d_1\left(g_1(x) - c_1^0 f_0(x)\right),$$
$$\vdots$$
$$f_n(x) = d_n\left(g_n(x) - c_n^0 f_0(x) - \ldots - c_n^{n-1} f_{n-1}(x)\right).$$
The goal is to find the coefficients $d_k$ and $c_k^j$ such that $\{f_i\}_{i=0}^{n}$ is orthonormal with
respect to the weighted $L^2$ norm over $[a,b]$, i.e.,
$$\langle f_i,f_j\rangle_w = \int_a^b f_i(x)f_j(x)w(x)\,dx = \delta_{ij}.$$
We start with $f_0(x)$:
$$\langle f_0,f_0\rangle_w = d_0^2\langle g_0,g_0\rangle_w.$$
Hence,
$$d_0 = \frac{1}{\sqrt{\langle g_0,g_0\rangle_w}}.$$
For $f_1(x)$, we require that it be orthogonal to $f_0(x)$, i.e., $\langle f_0,f_1\rangle_w = 0$. Hence
$$0 = d_1\left\langle f_0,\; g_1 - c_1^0 f_0\right\rangle_w = d_1\left(\langle f_0,g_1\rangle_w - c_1^0\right),$$
i.e.,
$$c_1^0 = \langle f_0,g_1\rangle_w.$$
The normalization condition $\langle f_1,f_1\rangle_w = 1$ now implies
$$d_1^2\left\langle g_1 - c_1^0 f_0,\; g_1 - c_1^0 f_0\right\rangle_w = 1.$$
Hence
$$d_1 = \frac{1}{\sqrt{\left\langle g_1 - c_1^0 f_0,\; g_1 - c_1^0 f_0\right\rangle_w}}.$$
The denominator cannot be zero due to the assumption that the $g_i(x)$ are linearly
independent. In general,
$$f_k(x) = d_k\left(g_k - c_k^0 f_0 - \ldots - c_k^{k-1} f_{k-1}\right).$$
For $i = 0,\ldots,k-1$ we require the orthogonality conditions
$$0 = \langle f_k,f_i\rangle_w.$$
Hence
$$0 = \left\langle d_k\left(g_k - c_k^i f_i\right), f_i\right\rangle_w = d_k\left(\langle g_k,f_i\rangle_w - c_k^i\right),$$
i.e.,
$$c_k^i = \langle g_k,f_i\rangle_w, \qquad 0 \le i \le k-1.$$
The coefficient $d_k$ is obtained from the normalization condition $\langle f_k,f_k\rangle_w = 1$.
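The recursion above is easy to implement once the inner product is replaced by a quadrature rule. The sketch below (our own illustration; the midpoint rule, the resolution $n=20000$, and the tolerances are arbitrary choices) orthonormalizes $\{1, x, x^2\}$ for $w(x)\equiv 1$ on $[-1,1]$ and recovers $f_0 = 1/\sqrt{2}$ and $f_1 = \sqrt{3/2}\,x$:

```python
import math

def inner(p, q, a=-1.0, b=1.0, n=20000):
    """<p, q>_w with w(x) = 1, by the composite midpoint rule."""
    step = (b - a) / n
    return step * sum(p(a + (i + 0.5) * step) * q(a + (i + 0.5) * step)
                      for i in range(n))

def gram_schmidt(gs):
    """Orthonormalize the functions gs following the recursion in the text."""
    fs = []
    for g in gs:
        cs = [inner(g, f) for f in fs]              # c_k^i = <g_k, f_i>_w
        h = lambda x, g=g, cs=cs, fs=tuple(fs): (
            g(x) - sum(c * f(x) for c, f in zip(cs, fs)))
        d = 1.0 / math.sqrt(inner(h, h))            # normalization d_k
        fs.append(lambda x, h=h, d=d: d * h(x))
    return fs

f0, f1, f2 = gram_schmidt([lambda x: 1.0, lambda x: x, lambda x: x * x])
# f0 = 1/sqrt(2) and f1 = sqrt(3/2) x, up to quadrature error:
print(f0(0.7), 1 / math.sqrt(2))
print(f1(0.7), math.sqrt(1.5) * 0.7)
print(inner(f2, f1), inner(f2, f2))  # ~0 and ~1
```

In floating-point practice the modified Gram-Schmidt variant is preferred for stability; the plain version is shown here because it mirrors the derivation above.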
Example 3.13
Let $w(x)\equiv 1$ on $[-1,1]$. Start with $g_i(x) = x^i$, $i = 0,\ldots,n$. We follow the Gram-Schmidt
orthogonalization process to generate from this list a set of orthonormal polynomials
with respect to the given weight on $[-1,1]$. Since $g_0(x)\equiv 1$, we have
$$f_0(x) = d_0 g_0(x) = d_0.$$
Hence
$$1 = \int_{-1}^{1} f_0^2(x)\,dx = 2d_0^2,$$
which means that
$$d_0 = \frac{1}{\sqrt{2}} \;\Longrightarrow\; f_0 = \frac{1}{\sqrt{2}}.$$
Now
$$\frac{f_1(x)}{d_1} = g_1 - c_1^0 f_0 = x - \frac{c_1^0}{\sqrt{2}}.$$
Hence
$$c_1^0 = \langle g_1,f_0\rangle = \left\langle x,\frac{1}{\sqrt{2}}\right\rangle = \frac{1}{\sqrt{2}}\int_{-1}^{1} x\,dx = 0.$$
This implies that
$$\frac{f_1(x)}{d_1} = x \;\Longrightarrow\; f_1(x) = d_1 x.$$
The normalization condition $\langle f_1,f_1\rangle = 1$ reads
$$1 = \int_{-1}^{1} d_1^2 x^2\,dx = \frac{2}{3}\,d_1^2.$$
Therefore,
$$d_1 = \sqrt{\frac{3}{2}} \;\Longrightarrow\; f_1(x) = \sqrt{\frac{3}{2}}\,x.$$
Similarly,
$$f_2(x) = \frac{1}{2}\sqrt{\frac{5}{2}}\,(3x^2 - 1),$$
and so on.
We are now going to provide several important examples of orthogonal polynomials.
1. Legendre polynomials. We start with the Legendre polynomials. This is a
family of polynomials that are orthogonal with respect to the weight
$$w(x) \equiv 1$$
on the interval $[-1,1]$. The Legendre polynomials can be obtained from the
recurrence relation
$$(n+1)P_{n+1}(x) - (2n+1)xP_n(x) + nP_{n-1}(x) = 0, \qquad n \ge 1, \tag{3.32}$$
starting from
$$P_0(x) = 1, \qquad P_1(x) = x.$$
It is possible to calculate them directly by Rodrigues' formula
$$P_n(x) = \frac{1}{2^n\,n!}\,\frac{d^n}{dx^n}\left(x^2 - 1\right)^n, \qquad n \ge 0. \tag{3.33}$$
The Legendre polynomials satisfy the orthogonality condition
$$\langle P_n,P_m\rangle = \frac{2}{2n+1}\,\delta_{nm}. \tag{3.34}$$
2. Chebyshev polynomials. Our second example is the Chebyshev polynomials.
These polynomials are orthogonal with respect to the weight
$$w(x) = \frac{1}{\sqrt{1-x^2}}$$
on the interval $[-1,1]$. They satisfy the recurrence relation
$$T_{n+1}(x) = 2xT_n(x) - T_{n-1}(x), \qquad n \ge 1, \tag{3.35}$$
together with $T_0(x) = 1$ and $T_1(x) = x$ (see (2.31)). They are explicitly given by
$$T_n(x) = \cos(n\cos^{-1}x), \qquad n \ge 0 \tag{3.36}$$
(see (2.32)). The orthogonality relation that they satisfy is
$$\langle T_n,T_m\rangle = \begin{cases} 0, & n \neq m, \\ \pi, & n = m = 0, \\ \frac{\pi}{2}, & n = m \neq 0. \end{cases} \tag{3.37}$$
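A quick check (our own sketch) that the recurrence (3.35) and the closed form (3.36) agree on a few sample points:

```python
import math

def cheb(n, x):
    """T_n(x) via the recurrence T_{n+1} = 2x T_n - T_{n-1}."""
    t_prev, t = 1.0, x
    if n == 0:
        return t_prev
    for _ in range(n - 1):
        t_prev, t = t, 2 * x * t - t_prev
    return t

# Recurrence and the closed form T_n(x) = cos(n arccos x) agree on [-1, 1]:
for n in range(6):
    for x in (-0.9, -0.3, 0.0, 0.5, 1.0):
        assert abs(cheb(n, x) - math.cos(n * math.acos(x))) < 1e-12
print("ok")
```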
3. Laguerre polynomials. We proceed with the Laguerre polynomials. Here the
interval is given by $[0,\infty)$ with the weight function
$$w(x) = e^{-x}.$$
The Laguerre polynomials are given by
$$L_n(x) = \frac{e^x}{n!}\,\frac{d^n}{dx^n}\left(x^n e^{-x}\right), \qquad n \ge 0. \tag{3.38}$$
The normalization condition is
$$\|L_n\| = 1. \tag{3.39}$$
A more general form of the Laguerre polynomials is obtained when the weight is
taken as
$$e^{-x}x^\alpha,$$
for an arbitrary real $\alpha > -1$, on the interval $[0,\infty)$.
4. Hermite polynomials. The Hermite polynomials are orthogonal with respect
to the weight
$$w(x) = e^{-x^2}$$
on the interval $(-\infty,\infty)$. They can be explicitly written as
$$H_n(x) = (-1)^n e^{x^2}\,\frac{d^n e^{-x^2}}{dx^n}, \qquad n \ge 0. \tag{3.40}$$
Another way of expressing them is by
$$H_n(x) = \sum_{k=0}^{[n/2]}\frac{(-1)^k\,n!}{k!\,(n-2k)!}\,(2x)^{n-2k}, \tag{3.41}$$
where $[x]$ denotes the largest integer that is $\le x$. The Hermite polynomials satisfy
the recurrence relation
$$H_{n+1}(x) - 2xH_n(x) + 2nH_{n-1}(x) = 0, \qquad n \ge 1, \tag{3.42}$$
together with
$$H_0(x) = 1, \qquad H_1(x) = 2x.$$
They satisfy the orthogonality relation
$$\int_{-\infty}^{\infty} e^{-x^2}H_n(x)H_m(x)\,dx = 2^n\,n!\,\sqrt{\pi}\,\delta_{nm}. \tag{3.43}$$
3.3.6 Another approach to the least-squares problem
In this section we present yet another way of deriving the solution of the least-squares
problem. Along the way, we will be able to derive some new results. We recall that our
goal is to minimize
$$\int_a^b w(x)\,(f(x) - Q_n(x))^2\,dx$$
among all the polynomials $Q_n(x)$ of degree $\le n$.
Assume that $\{P_k(x)\}_{k\ge 0}$ is an orthonormal family of polynomials with respect to
$w(x)$, and let
$$Q_n(x) = \sum_{j=0}^{n} b_j P_j(x).$$
Then
$$\|f - Q_n\|_{2,w}^2 = \int_a^b w(x)\left(f(x) - \sum_{j=0}^{n} b_j P_j(x)\right)^2 dx.$$
Hence
$$0 \le \left\langle f - \sum_{j=0}^{n} b_j P_j,\; f - \sum_{j=0}^{n} b_j P_j\right\rangle_w
= \langle f,f\rangle_w - 2\sum_{j=0}^{n} b_j\langle f,P_j\rangle_w + \sum_{i=0}^{n}\sum_{j=0}^{n} b_i b_j\langle P_i,P_j\rangle_w$$
$$= \|f\|_{2,w}^2 - 2\sum_{j=0}^{n}\langle f,P_j\rangle_w\,b_j + \sum_{j=0}^{n} b_j^2
= \|f\|_{2,w}^2 - \sum_{j=0}^{n}\langle f,P_j\rangle_w^2 + \sum_{j=0}^{n}\left(\langle f,P_j\rangle_w - b_j\right)^2.$$
The last expression is minimal iff $\forall\,0\le j\le n$
$$b_j = \langle f,P_j\rangle_w.$$
Hence, there exists a unique least-squares approximation, which is given by
$$Q_n^*(x) = \sum_{j=0}^{n}\langle f,P_j\rangle_w\,P_j(x). \tag{3.44}$$
Remarks.
1. We have
$$\|f - Q_n^*\|_{2,w}^2 = \|f\|_{2,w}^2 - \sum_{j=0}^{n}\langle f,P_j\rangle_w^2.$$
Hence
$$\|Q_n^*\|^2 = \sum_{j=0}^{n}\langle f,P_j\rangle_w^2 = \|f\|^2 - \|f - Q_n^*\|^2 \le \|f\|^2,$$
i.e.,
$$\sum_{j=0}^{n}\langle f,P_j\rangle_w^2 \le \|f\|_{2,w}^2. \tag{3.45}$$
The inequality (3.45) is called Bessel's inequality.
2. Assuming that $[a,b]$ is finite, we have
$$\lim_{n\to\infty}\|f - Q_n^*\|_{2,w} = 0.$$
Hence
$$\|f\|_{2,w}^2 = \sum_{j=0}^{\infty}\langle f,P_j\rangle_w^2, \tag{3.46}$$
which is known as Parseval's equality.
Example 3.14
Problem: Let f(x) = cos x on [−1, 1]. Find the polynomial in Π
2
, that minimizes
1
−1
[f(x) −Q
2
(x)]
2
dx.
Solution: The weight $w(x) \equiv 1$ on $[-1,1]$ implies that the orthogonal polynomials we need to use are the Legendre polynomials. We seek a polynomial of degree $\leq 2$, so we write the first three Legendre polynomials:
$$P_0(x) \equiv 1, \qquad P_1(x) = x, \qquad P_2(x) = \frac{1}{2}(3x^2 - 1).$$
The normalization factor satisfies, in general,
$$\int_{-1}^1 P_n^2(x)\,dx = \frac{2}{2n+1}.$$
Hence
$$\int_{-1}^1 P_0^2(x)\,dx = 2, \qquad \int_{-1}^1 P_1^2(x)\,dx = \frac{2}{3}, \qquad \int_{-1}^1 P_2^2(x)\,dx = \frac{2}{5}.$$
We can then replace the Legendre polynomials by their normalized counterparts:
$$P_0(x) \equiv \frac{1}{\sqrt{2}}, \qquad P_1(x) = \sqrt{\frac{3}{2}}\,x, \qquad P_2(x) = \frac{\sqrt{5}}{2\sqrt{2}}(3x^2 - 1).$$
We now have
$$\langle f, P_0\rangle = \int_{-1}^1 \cos x\cdot\frac{1}{\sqrt{2}}\,dx = \frac{1}{\sqrt{2}}\sin x\Big|_{-1}^{1} = \sqrt{2}\sin 1.$$
Hence
$$Q_0^*(x) \equiv \sin 1.$$
We also have
$$\langle f, P_1\rangle = \int_{-1}^1 \cos x\cdot\sqrt{\frac{3}{2}}\,x\,dx = 0,$$
which means that $Q_1^*(x) = Q_0^*(x)$. Finally,
$$\langle f, P_2\rangle = \int_{-1}^1 \cos x\cdot\frac{\sqrt{5}}{2\sqrt{2}}(3x^2 - 1)\,dx = \frac{1}{2}\sqrt{\frac{5}{2}}\,(12\cos 1 - 8\sin 1),$$
and hence the desired polynomial, $Q_2^*(x)$, is given by
$$Q_2^*(x) = \sin 1 + \left(\frac{15}{2}\cos 1 - 5\sin 1\right)(3x^2 - 1).$$
In Figure 3.3 we plot the original function $f(x) = \cos x$ (solid line) and its approximation $Q_2^*(x)$ (dashed line). We zoom in on the interval $x \in [0,1]$.

Figure 3.3: A second-order $L^2$ approximation of $f(x) = \cos x$. Solid line: $f(x)$; dashed line: its approximation $Q_2^*(x)$.
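The projections in Example 3.14 can be checked numerically. The sketch below (our own helper names; the composite-Simpson routine is not from the text) computes $\langle f, P_j\rangle$ for the normalized Legendre polynomials and compares against the closed-form values derived above.

```python
import math

# Numerical check of Example 3.14: project f(x) = cos(x) onto the
# normalized Legendre polynomials on [-1, 1] and compare with the
# closed-form least-squares coefficients from the text.

def simpson(g, a, b, n=2000):
    """Composite Simpson rule with n (even) subintervals."""
    h = (b - a) / n
    s = g(a) + g(b)
    s += 4 * sum(g(a + (2 * i - 1) * h) for i in range(1, n // 2 + 1))
    s += 2 * sum(g(a + 2 * i * h) for i in range(1, n // 2))
    return s * h / 3

# Normalized Legendre polynomials on [-1, 1]
P = [lambda x: 1 / math.sqrt(2),
     lambda x: math.sqrt(3 / 2) * x,
     lambda x: math.sqrt(5) / (2 * math.sqrt(2)) * (3 * x**2 - 1)]

coeffs = [simpson(lambda x, p=p: math.cos(x) * p(x), -1.0, 1.0) for p in P]

exact = [math.sqrt(2) * math.sin(1),
         0.0,
         0.5 * math.sqrt(5 / 2) * (12 * math.cos(1) - 8 * math.sin(1))]

for c, e in zip(coeffs, exact):
    assert abs(c - e) < 1e-10
```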
If the weight is $w(x) \equiv 1$ but the interval is $[a,b]$, we can still use the Legendre polynomials if we make the following change of variables. Define
$$x = \frac{b + a + (b-a)t}{2}.$$
Then the interval $-1 \leq t \leq 1$ is mapped to $a \leq x \leq b$. Now, define
$$F(t) = f\left(\frac{b + a + (b-a)t}{2}\right) = f(x).$$
Hence
$$\int_a^b [f(x) - Q_n(x)]^2\,dx = \frac{b-a}{2}\int_{-1}^1 [F(t) - q_n(t)]^2\,dt.$$
Example 3.15
Problem: Let $f(x) = \cos x$ on $[0,\pi]$. Find the polynomial in $\Pi_1$ that minimizes
$$\int_0^{\pi} [f(x) - Q_1(x)]^2\,dx.$$
Solution:
$$\int_0^{\pi} (f(x) - Q_1^*(x))^2\,dx = \frac{\pi}{2}\int_{-1}^1 [F(t) - q_1(t)]^2\,dt.$$
Letting
$$x = \frac{\pi + \pi t}{2} = \frac{\pi}{2}(1+t),$$
we have
$$F(t) = \cos\left(\frac{\pi}{2}(1+t)\right) = -\sin\frac{\pi t}{2}.$$
We already know that the first two normalized Legendre polynomials are
$$P_0(t) = \frac{1}{\sqrt{2}}, \qquad P_1(t) = \sqrt{\frac{3}{2}}\,t.$$
Hence
$$\langle F, P_0\rangle = -\int_{-1}^1 \frac{1}{\sqrt{2}}\sin\frac{\pi t}{2}\,dt = 0,$$
which means that $Q_0^*(t) = 0$. Also
$$\langle F, P_1\rangle = -\int_{-1}^1 \sin\frac{\pi t}{2}\cdot\sqrt{\frac{3}{2}}\,t\,dt
= -\sqrt{\frac{3}{2}}\left[\frac{\sin\frac{\pi t}{2}}{\left(\frac{\pi}{2}\right)^2} - \frac{t\cos\frac{\pi t}{2}}{\frac{\pi}{2}}\right]_{-1}^{1} = -\sqrt{\frac{3}{2}}\,\frac{8}{\pi^2}.$$
Hence
$$q_1^*(t) = \langle F, P_1\rangle P_1(t) = -\frac{3}{2}\cdot\frac{8}{\pi^2}\,t = -\frac{12}{\pi^2}\,t
\;\Longrightarrow\; Q_1^*(x) = -\frac{12}{\pi^2}\left(\frac{2}{\pi}x - 1\right).$$
In Figure 3.4 we plot the original function $f(x) = \cos x$ (solid line) and its approximation $Q_1^*(x)$ (dashed line).

Figure 3.4: A first-order $L^2$ approximation of $f(x) = \cos x$ on the interval $[0,\pi]$. Solid line: $f(x)$; dashed line: its approximation $Q_1^*(x)$.
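The result of Example 3.15 can also be recovered without the Legendre machinery by solving the normal equations for the best linear fit directly. The sketch below does that (the Simpson helper and variable names are ours, not from the text) and compares against $Q_1^*(x) = -\frac{12}{\pi^2}\left(\frac{2}{\pi}x - 1\right)$.

```python
import math

# Solve the least-squares problem for f(x) = cos(x) on [0, pi] among
# linear polynomials Q(x) = c0 + c1*x via the normal equations, and
# compare with the closed form from Example 3.15.

def simpson(g, a, b, n=2000):
    h = (b - a) / n
    s = g(a) + g(b)
    s += 4 * sum(g(a + (2 * i - 1) * h) for i in range(1, n // 2 + 1))
    s += 2 * sum(g(a + 2 * i * h) for i in range(1, n // 2))
    return s * h / 3

a, b = 0.0, math.pi
# Gram matrix of the basis {1, x} and moments of f
g00 = b - a
g01 = (b**2 - a**2) / 2
g11 = (b**3 - a**3) / 3
m0 = simpson(math.cos, a, b)                   # integral of cos x  (= 0)
m1 = simpson(lambda x: x * math.cos(x), a, b)  # integral of x cos x (= -2)
det = g00 * g11 - g01**2
c0 = (g11 * m0 - g01 * m1) / det
c1 = (g00 * m1 - g01 * m0) / det

# Closed form: Q1*(x) = 12/pi^2 - (24/pi^3) x
assert abs(c0 - 12 / math.pi**2) < 1e-8
assert abs(c1 + 24 / math.pi**3) < 1e-8
```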
Example 3.16
Problem: Let $f(x) = \cos x$ on $[0,\infty)$. Find the polynomial in $\Pi_1$ that minimizes
$$\int_0^{\infty} e^{-x}[f(x) - Q_1(x)]^2\,dx.$$
Solution: The family of orthogonal polynomials that corresponds to this weight on $[0,\infty)$ is the family of Laguerre polynomials. Since we are looking for the minimizer of the weighted $L^2$ norm among polynomials of degree $\leq 1$, we will need the first two Laguerre polynomials:
$$L_0(x) = 1, \qquad L_1(x) = 1 - x.$$
We thus have
$$\langle f, L_0\rangle_w = \int_0^{\infty} e^{-x}\cos x\,dx = \left[\frac{e^{-x}(-\cos x + \sin x)}{2}\right]_0^{\infty} = \frac{1}{2}.$$
Also
$$\langle f, L_1\rangle_w = \int_0^{\infty} e^{-x}\cos x\,(1-x)\,dx
= \frac{1}{2} - \left[\frac{xe^{-x}(-\cos x + \sin x)}{2} - \frac{e^{-x}(-2\sin x)}{4}\right]_0^{\infty} = \frac{1}{2} - 0 = \frac{1}{2}.$$
This means that
$$Q_1^*(x) = \langle f, L_0\rangle_w L_0(x) + \langle f, L_1\rangle_w L_1(x) = \frac{1}{2} + \frac{1}{2}(1-x) = 1 - \frac{x}{2}.$$
3.3.7 Properties of orthogonal polynomials
We start with a theorem that deals with some of the properties of the roots of orthogonal polynomials. This theorem will come in handy when we discuss Gaussian quadratures in Section 5.6. We let $\{P_n(x)\}_{n\geq 0}$ be orthogonal polynomials on $[a,b]$ with respect to the weight $w(x)$.

Theorem 3.17 The roots $x_j$, $j = 1,\dots,n$, of $P_n(x)$ are all real, simple, and lie in $(a,b)$.
Proof. Let $x_1,\dots,x_r$ be the roots of $P_n(x)$ in $(a,b)$. Let
$$Q_r(x) = (x - x_1)\cdots(x - x_r).$$
Then $Q_r(x)$ and $P_n(x)$ change their signs together in $(a,b)$. Also,
$$\deg(Q_r(x)) = r \leq n.$$
Hence $(P_nQ_r)(x)$ is a polynomial of one sign in $(a,b)$. This implies that
$$\int_a^b P_n(x)Q_r(x)w(x)\,dx \neq 0,$$
and hence $r = n$, since $P_n(x)$ is orthogonal to polynomials of degree less than $n$.
Without loss of generality, we now assume that $x_1$ is a multiple root, i.e.,
$$P_n(x) = (x - x_1)^2 P_{n-2}(x).$$
Hence
$$P_n(x)P_{n-2}(x) = \left(\frac{P_n(x)}{x - x_1}\right)^2 \geq 0,$$
which implies that
$$\int_a^b P_n(x)P_{n-2}(x)w(x)\,dx > 0.$$
This is not possible since $P_n$ is orthogonal to $P_{n-2}$. Hence roots cannot repeat.
Another important property of orthogonal polynomials is that they can all be written in terms of recursion relations. We have already seen specific examples of such relations for the Legendre, Chebyshev, and Hermite polynomials (see (3.32), (3.35), and (3.42)). The following theorem states that such relations always hold.
Theorem 3.18 (Triple Recursion Relation) Any three consecutive orthonormal polynomials are related by a recursion formula of the form
$$P_{n+1}(x) = (A_nx + B_n)P_n(x) - C_nP_{n-1}(x).$$
If $a_k$ and $b_k$ are the coefficients of the terms of degree $k$ and degree $k-1$ in $P_k(x)$, then
$$A_n = \frac{a_{n+1}}{a_n}, \qquad B_n = \frac{a_{n+1}}{a_n}\left(\frac{b_{n+1}}{a_{n+1}} - \frac{b_n}{a_n}\right), \qquad C_n = \frac{a_{n+1}a_{n-1}}{a_n^2}.$$
Proof. For
$$A_n = \frac{a_{n+1}}{a_n},$$
let
$$Q_n(x) = P_{n+1}(x) - A_nxP_n(x)
= (a_{n+1}x^{n+1} + b_{n+1}x^n + \dots) - \frac{a_{n+1}}{a_n}x(a_nx^n + b_nx^{n-1} + \dots)
= \left(b_{n+1} - \frac{a_{n+1}b_n}{a_n}\right)x^n + \dots$$
Hence $\deg(Q(x)) \leq n$, which means that there exist $\alpha_0,\dots,\alpha_n$ such that
$$Q(x) = \sum_{i=0}^n \alpha_iP_i(x).$$
For $0 \leq i \leq n-2$,
$$\alpha_i = \frac{\langle Q, P_i\rangle}{\langle P_i, P_i\rangle} = \langle Q, P_i\rangle = \langle P_{n+1} - A_nxP_n,\, P_i\rangle = -A_n\langle xP_n, P_i\rangle = 0.$$
Hence
$$Q_n(x) = \alpha_nP_n(x) + \alpha_{n-1}P_{n-1}(x).$$
Set $\alpha_n = B_n$ and $\alpha_{n-1} = -C_n$. Then, since
$$xP_{n-1} = \frac{a_{n-1}}{a_n}P_n + q_{n-1},$$
where $q_{n-1}$ is a polynomial of degree $\leq n-1$, we have
$$C_n = A_n\langle xP_n, P_{n-1}\rangle = A_n\langle P_n, xP_{n-1}\rangle
= A_n\left\langle P_n,\, \frac{a_{n-1}}{a_n}P_n + q_{n-1}\right\rangle = A_n\frac{a_{n-1}}{a_n}.$$
Finally,
$$P_{n+1} = (A_nx + B_n)P_n - C_nP_{n-1}$$
can be explicitly written as
$$a_{n+1}x^{n+1} + b_{n+1}x^n + \dots = (A_nx + B_n)(a_nx^n + b_nx^{n-1} + \dots) - C_n(a_{n-1}x^{n-1} + b_{n-1}x^{n-2} + \dots).$$
The coefficient of $x^n$ is
$$b_{n+1} = A_nb_n + B_na_n,$$
which means that
$$B_n = (b_{n+1} - A_nb_n)\frac{1}{a_n}.$$
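Theorem 3.18 can be verified numerically for a concrete orthonormal family. The sketch below (our own construction, assuming numpy's Legendre-to-power-basis conversion) checks the triple recursion for the orthonormal Legendre polynomials on $[-1,1]$, where the normalization factor $\sqrt{(2n+1)/2}$ follows from $\int_{-1}^1 P_n^2\,dx = 2/(2n+1)$.

```python
import numpy as np
from numpy.polynomial import legendre as L

# Verify the triple recursion relation of Theorem 3.18 for the
# orthonormal Legendre polynomials on [-1, 1] (weight w = 1).

def orthonormal_legendre_coeffs(n):
    """Power-basis coefficients (low to high degree) of the n-th
    orthonormal Legendre polynomial."""
    c = np.zeros(n + 1)
    c[n] = 1.0
    return L.leg2poly(c) * np.sqrt((2 * n + 1) / 2)

for n in range(1, 6):
    pm1, p, pp1 = (orthonormal_legendre_coeffs(k) for k in (n - 1, n, n + 1))
    a = lambda q: q[-1]      # leading coefficient a_k
    b = lambda q: q[-2]      # next coefficient b_k (0 here by parity)
    A_n = a(pp1) / a(p)
    B_n = (a(pp1) / a(p)) * (b(pp1) / a(pp1) - b(p) / a(p))
    C_n = a(pp1) * a(pm1) / a(p) ** 2
    # build (A_n x + B_n) P_n - C_n P_{n-1} in the power basis
    rhs = np.zeros(n + 2)
    rhs[1:] += A_n * p       # A_n * x * P_n shifts coefficients up by one
    rhs[:n + 1] += B_n * p   # B_n * P_n
    rhs[:n] -= C_n * pm1     # -C_n * P_{n-1}
    assert np.allclose(rhs, pp1)
```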
4 Numerical Diﬀerentiation
4.1 Basic Concepts
This chapter deals with numerical approximations of derivatives. The first question that comes to mind is: why do we need to approximate derivatives at all? After all, we do know how to differentiate every function analytically. Nevertheless, there are several reasons why we still need to approximate derivatives:
• Even if there exists an underlying function that we need to diﬀerentiate, we might
know its values only at a sampled data set without knowing the function itself.
• There are some cases where it may not be obvious that an underlying function
exists and all that we have is a discrete data set. We may still be interested in
studying changes in the data, which are related, of course, to derivatives.
• There are times in which exact formulas are available but they are very complicated
to the point that an exact computation of the derivative requires a lot of function
evaluations. It might be signiﬁcantly simpler to approximate the derivative instead
of computing its exact value.
• When approximating solutions to ordinary (or partial) diﬀerential equations, we
typically represent the solution as a discrete approximation that is deﬁned on a
grid. Since we then have to evaluate derivatives at the grid points, we need to be
able to come up with methods for approximating the derivatives at these points,
and again, this will typically be done using only values that are deﬁned on a lattice.
The underlying function itself (which in this case is the solution of the equation)
is unknown.
A simple approximation of the first derivative is
$$f'(x) \approx \frac{f(x+h) - f(x)}{h}, \tag{4.1}$$
where we assume that $h > 0$. What do we mean when we say that the expression on the right-hand side of (4.1) is an approximation of the derivative? For linear functions (4.1) is actually an exact expression for the derivative. For almost all other functions, (4.1) is not the exact derivative.
Let's compute the approximation error. We write a Taylor expansion of $f(x+h)$ about $x$, i.e.,
$$f(x+h) = f(x) + hf'(x) + \frac{h^2}{2}f''(\xi), \qquad \xi \in (x, x+h). \tag{4.2}$$
For such an expansion to be valid, we assume that $f(x)$ has two continuous derivatives. The Taylor expansion (4.2) means that we can now replace the approximation (4.1) with an exact formula of the form
$$f'(x) = \frac{f(x+h) - f(x)}{h} - \frac{h}{2}f''(\xi), \qquad \xi \in (x, x+h). \tag{4.3}$$
Since this approximation of the derivative at $x$ is based on the values of the function at $x$ and $x+h$, the approximation (4.1) is called a forward differencing or one-sided differencing. The approximation of the derivative at $x$ that is based on the values of the function at $x-h$ and $x$, i.e.,
$$f'(x) \approx \frac{f(x) - f(x-h)}{h},$$
is called backward differencing (which is obviously also a one-sided differencing formula).
The second term on the right-hand side of (4.3) is the error term. Since the approximation (4.1) can be thought of as being obtained by truncating this term from the exact formula (4.3), this error is called the truncation error. The small parameter $h$ denotes the distance between the two points $x$ and $x+h$. As this distance tends to zero, i.e., $h \to 0$, the two points approach each other and we expect the approximation (4.1) to improve. This is indeed the case if the truncation error goes to zero, which in turn is the case if $f''(\xi)$ is well defined in the interval $(x, x+h)$. The "speed" at which the error goes to zero as $h \to 0$ is called the rate of convergence. When the truncation error is of the order of $O(h)$, we say that the method is a first-order method. We refer to a method as a $p$th-order method if the truncation error is of the order of $O(h^p)$.
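The first-order behavior of (4.1) is easy to observe experimentally. The following sketch (our own test function and step sizes) checks that halving $h$ roughly halves the error, consistent with an $O(h)$ truncation error.

```python
import math

# Observe first-order convergence of the forward difference (4.1)
# for f(x) = sin(x) at x = 1, where f'(1) = cos(1).

def forward_diff(f, x, h):
    return (f(x + h) - f(x)) / h

x = 1.0
errors = [abs(forward_diff(math.sin, x, h) - math.cos(x))
          for h in (1e-1, 5e-2, 2.5e-2)]

# error ratios close to 2 indicate O(h) behavior
ratios = [e1 / e2 for e1, e2 in zip(errors, errors[1:])]
assert all(1.8 < r < 2.2 for r in ratios)
```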
It is possible to write more accurate formulas than (4.3) for the first derivative. For example, a more accurate approximation of the first derivative, based on the values of the function at the points $x-h$ and $x+h$, is the centered differencing formula
$$f'(x) \approx \frac{f(x+h) - f(x-h)}{2h}. \tag{4.4}$$
Let's verify that this is indeed a more accurate formula than (4.1). Taylor expansions of the terms on the right-hand side of (4.4) are
$$f(x+h) = f(x) + hf'(x) + \frac{h^2}{2}f''(x) + \frac{h^3}{6}f'''(\xi_1),$$
$$f(x-h) = f(x) - hf'(x) + \frac{h^2}{2}f''(x) - \frac{h^3}{6}f'''(\xi_2).$$
Here $\xi_1 \in (x, x+h)$ and $\xi_2 \in (x-h, x)$. Hence
$$f'(x) = \frac{f(x+h) - f(x-h)}{2h} - \frac{h^2}{12}\left[f'''(\xi_1) + f'''(\xi_2)\right],$$
which means that the truncation error in the approximation (4.4) is
$$-\frac{h^2}{12}\left[f'''(\xi_1) + f'''(\xi_2)\right].$$
If the third-order derivative $f'''(x)$ is a continuous function in the interval $[x-h, x+h]$, then the intermediate value theorem implies that there exists a point $\xi \in (x-h, x+h)$ such that
$$f'''(\xi) = \frac{1}{2}\left[f'''(\xi_1) + f'''(\xi_2)\right].$$
Hence
$$f'(x) = \frac{f(x+h) - f(x-h)}{2h} - \frac{h^2}{6}f'''(\xi), \tag{4.5}$$
which means that the expression (4.4) is a second-order approximation of the first derivative.
In a similar way we can approximate the values of higher-order derivatives. For example, it is easy to verify that the following is a second-order approximation of the second derivative:
$$f''(x) \approx \frac{f(x+h) - 2f(x) + f(x-h)}{h^2}. \tag{4.6}$$
To verify the consistency and the order of approximation of (4.6) we expand
$$f(x \pm h) = f(x) \pm hf'(x) + \frac{h^2}{2}f''(x) \pm \frac{h^3}{6}f'''(x) + \frac{h^4}{24}f^{(4)}(\xi_{\pm}).$$
Here, $\xi_- \in (x-h, x)$ and $\xi_+ \in (x, x+h)$. Hence
$$\frac{f(x+h) - 2f(x) + f(x-h)}{h^2} = f''(x) + \frac{h^2}{24}\left[f^{(4)}(\xi_-) + f^{(4)}(\xi_+)\right] = f''(x) + \frac{h^2}{12}f^{(4)}(\xi),$$
where we assume that $\xi \in (x-h, x+h)$ and that $f(x)$ has four continuous derivatives in the interval. Hence, the approximation (4.6) is indeed a second-order approximation of the derivative, with a truncation error that is given by
$$-\frac{h^2}{12}f^{(4)}(\xi), \qquad \xi \in (x-h, x+h).$$
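Both centered formulas can be checked numerically. The sketch below (our own choice of test function) verifies the second-order convergence of (4.4) and (4.6): halving $h$ should divide the error by roughly four.

```python
import math

# Check second-order convergence of the centered formulas (4.4), (4.6)
# for f(x) = exp(x) at x = 0.5, where f' = f'' = exp.

def centered_first(f, x, h):
    return (f(x + h) - f(x - h)) / (2 * h)

def centered_second(f, x, h):
    return (f(x + h) - 2 * f(x) + f(x - h)) / h**2

x, exact = 0.5, math.exp(0.5)
for approx in (centered_first, centered_second):
    e1 = abs(approx(math.exp, x, 1e-2) - exact)
    e2 = abs(approx(math.exp, x, 5e-3) - exact)
    assert 3.8 < e1 / e2 < 4.2   # O(h^2): halving h divides error by ~4
```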
4.2 Differentiation Via Interpolation
In this section we demonstrate how to generate differentiation formulas by differentiating an interpolant. The idea is straightforward: the first stage is to construct an interpolating polynomial from the data. An approximation of the derivative at any point can then be obtained by a direct differentiation of the interpolant.
We follow this procedure and assume that $f(x_0),\dots,f(x_n)$ are given. The Lagrange form of the interpolation polynomial through these points is
$$Q_n(x) = \sum_{j=0}^n f(x_j)\,l_j(x).$$
Here we simplify the notation and replace $l_i^n(x)$, which is the notation we used in Section 2.5, by $l_i(x)$. According to the error analysis of Section 2.7 we know that the interpolation error is
$$f(x) - Q_n(x) = \frac{1}{(n+1)!}f^{(n+1)}(\xi_n)\prod_{j=0}^n(x - x_j),$$
where $\xi_n \in (\min(x, x_0,\dots,x_n),\, \max(x, x_0,\dots,x_n))$. Since here we are assuming that the points $x_0,\dots,x_n$ are fixed, we would like to emphasize the dependence of $\xi_n$ on $x$ and hence replace the $\xi_n$ notation by $\xi_x$. We thus have
$$f(x) = \sum_{j=0}^n f(x_j)\,l_j(x) + \frac{1}{(n+1)!}f^{(n+1)}(\xi_x)w(x), \tag{4.7}$$
where
$$w(x) = \prod_{i=0}^n (x - x_i).$$
Differentiating the interpolant (4.7):
$$f'(x) = \sum_{j=0}^n f(x_j)\,l_j'(x) + \frac{1}{(n+1)!}f^{(n+1)}(\xi_x)w'(x) + \frac{1}{(n+1)!}w(x)\frac{d}{dx}f^{(n+1)}(\xi_x). \tag{4.8}$$
We now assume that $x$ is one of the interpolation points, i.e., $x \in \{x_0,\dots,x_n\}$, say $x_k$, so that
$$f'(x_k) = \sum_{j=0}^n f(x_j)\,l_j'(x_k) + \frac{1}{(n+1)!}f^{(n+1)}(\xi_{x_k})w'(x_k). \tag{4.9}$$
Now,
$$w'(x) = \sum_{i=0}^n \prod_{\substack{j=0\\ j\neq i}}^n (x - x_j) = \sum_{i=0}^n \left[(x-x_0)\cdots(x-x_{i-1})(x-x_{i+1})\cdots(x-x_n)\right].$$
Hence, when $w'(x)$ is evaluated at an interpolation point $x_k$, there is only one term in $w'(x)$ that does not vanish, i.e.,
$$w'(x_k) = \prod_{\substack{j=0\\ j\neq k}}^n (x_k - x_j).$$
The numerical differentiation formula, (4.9), then becomes
$$f'(x_k) = \sum_{j=0}^n f(x_j)\,l_j'(x_k) + \frac{1}{(n+1)!}f^{(n+1)}(\xi_{x_k})\prod_{\substack{j=0\\ j\neq k}}^n (x_k - x_j). \tag{4.10}$$
We refer to the formula (4.10) as a differentiation by interpolation algorithm.
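The weights $l_j'(x_k)$ in (4.10) can be computed for arbitrary nodes. The sketch below (our own helper; the closed-form expressions for $l_j'(x_k)$ follow from differentiating the Lagrange basis) tests the weights on $f(x) = x^3$, where the formula is exact since four nodes reproduce cubics.

```python
# Differentiation by interpolation (4.10): weights w_j = l_j'(x_k)
# so that f'(x_k) is approximated by sum_j f(x_j) * w_j.

def lagrange_deriv_weights(nodes, k):
    n = len(nodes)
    xk = nodes[k]
    weights = []
    for j in range(n):
        if j == k:
            # l_k'(x_k) = sum over m != k of 1 / (x_k - x_m)
            wj = sum(1.0 / (xk - nodes[m]) for m in range(n) if m != k)
        else:
            # l_j'(x_k) = prod_{m != j,k}(x_k - x_m) / prod_{m != j}(x_j - x_m)
            num = 1.0
            for m in range(n):
                if m != j and m != k:
                    num *= xk - nodes[m]
            den = 1.0
            for m in range(n):
                if m != j:
                    den *= nodes[j] - nodes[m]
            wj = num / den
        weights.append(wj)
    return weights

nodes = [0.0, 0.3, 0.7, 1.1]
w = lagrange_deriv_weights(nodes, k=1)
approx = sum(x**3 * wj for x, wj in zip(nodes, w))
assert abs(approx - 3 * nodes[1] ** 2) < 1e-10  # exact for cubics
```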
Example 4.1
We demonstrate how to use the differentiation by interpolation formula (4.10) in the case where $n = 1$ and $k = 0$. This means that we use two interpolation points $(x_0, f(x_0))$ and $(x_1, f(x_1))$, and want to approximate $f'(x_0)$. The Lagrange interpolation polynomial in this case is
$$f(x) = f(x_0)l_0(x) + f(x_1)l_1(x),$$
where
$$l_0(x) = \frac{x - x_1}{x_0 - x_1}, \qquad l_1(x) = \frac{x - x_0}{x_1 - x_0}.$$
Hence
$$l_0'(x) = \frac{1}{x_0 - x_1}, \qquad l_1'(x) = \frac{1}{x_1 - x_0}.$$
We thus have
$$f'(x_0) = \frac{f(x_0)}{x_0 - x_1} + \frac{f(x_1)}{x_1 - x_0} + \frac{1}{2}f''(\xi)(x_0 - x_1) = \frac{f(x_1) - f(x_0)}{x_1 - x_0} - \frac{1}{2}f''(\xi)(x_1 - x_0).$$
Here, we simplify the notation and assume that $\xi \in (x_0, x_1)$. If we now let $x_1 = x_0 + h$, then
$$f'(x_0) = \frac{f(x_0 + h) - f(x_0)}{h} - \frac{h}{2}f''(\xi),$$
which is the (first-order) forward differencing approximation of $f'(x_0)$, (4.3).
Example 4.2
We repeat the previous example in the case $n = 2$ and $k = 0$. This time
$$f(x) = f(x_0)l_0(x) + f(x_1)l_1(x) + f(x_2)l_2(x),$$
with
$$l_0(x) = \frac{(x-x_1)(x-x_2)}{(x_0-x_1)(x_0-x_2)}, \qquad l_1(x) = \frac{(x-x_0)(x-x_2)}{(x_1-x_0)(x_1-x_2)}, \qquad l_2(x) = \frac{(x-x_0)(x-x_1)}{(x_2-x_0)(x_2-x_1)}.$$
Hence
$$l_0'(x) = \frac{2x - x_1 - x_2}{(x_0-x_1)(x_0-x_2)}, \qquad l_1'(x) = \frac{2x - x_0 - x_2}{(x_1-x_0)(x_1-x_2)}, \qquad l_2'(x) = \frac{2x - x_0 - x_1}{(x_2-x_0)(x_2-x_1)}.$$
Evaluating $l_j'(x)$ for $j = 0, 1, 2$ at $x_0$ we have
$$l_0'(x_0) = \frac{2x_0 - x_1 - x_2}{(x_0-x_1)(x_0-x_2)}, \qquad l_1'(x_0) = \frac{x_0 - x_2}{(x_1-x_0)(x_1-x_2)}, \qquad l_2'(x_0) = \frac{x_0 - x_1}{(x_2-x_0)(x_2-x_1)}.$$
Hence
$$f'(x_0) = f(x_0)\frac{2x_0 - x_1 - x_2}{(x_0-x_1)(x_0-x_2)} + f(x_1)\frac{x_0 - x_2}{(x_1-x_0)(x_1-x_2)} + f(x_2)\frac{x_0 - x_1}{(x_2-x_0)(x_2-x_1)} + \frac{1}{6}f^{(3)}(\xi)(x_0-x_1)(x_0-x_2). \tag{4.11}$$
Here, we assume $\xi \in (x_0, x_2)$. For $x_i = x + ih$, $i = 0, 1, 2$, equation (4.11) becomes
$$f'(x) = -f(x)\frac{3}{2h} + f(x+h)\frac{2}{h} + f(x+2h)\left(-\frac{1}{2h}\right) + \frac{f'''(\xi)}{3}h^2
= \frac{-3f(x) + 4f(x+h) - f(x+2h)}{2h} + \frac{f'''(\xi)}{3}h^2,$$
which is a one-sided, second-order approximation of the first derivative.
Remark. In a similar way, if we were to repeat the last example with $n = 2$ while approximating the derivative at $x_1$, the resulting formula would be the second-order centered approximation of the first derivative (4.5):
$$f'(x) = \frac{f(x+h) - f(x-h)}{2h} - \frac{1}{6}f'''(\xi)h^2.$$
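The one-sided formula from Example 4.2 can be verified numerically. The sketch below (our own test function) confirms its second-order convergence: halving $h$ divides the error by roughly four.

```python
import math

# Check that the one-sided formula from Example 4.2,
# f'(x) ~ (-3f(x) + 4f(x+h) - f(x+2h)) / (2h),
# is second-order accurate for f(x) = sin(x) at x = 0.8.

def one_sided(f, x, h):
    return (-3 * f(x) + 4 * f(x + h) - f(x + 2 * h)) / (2 * h)

x, exact = 0.8, math.cos(0.8)
e1 = abs(one_sided(math.sin, x, 1e-2) - exact)
e2 = abs(one_sided(math.sin, x, 5e-3) - exact)
assert 3.7 < e1 / e2 < 4.3   # halving h divides the error by ~4
```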
4.3 The Method of Undetermined Coefficients
In this section we present the method of undetermined coefficients, which is a very practical way of generating approximations of derivatives (as well as other quantities, as we shall see, e.g., when we discuss integration).
Assume, for example, that we are interested in finding an approximation of the second derivative $f''(x)$ that is based on the values of the function at three equally spaced points, $f(x-h)$, $f(x)$, $f(x+h)$, i.e.,
$$f''(x) \approx Af(x+h) + Bf(x) + Cf(x-h). \tag{4.12}$$
The coefficients $A$, $B$, and $C$ are to be determined in such a way that this linear combination is indeed an approximation of the second derivative. The Taylor expansions of the terms $f(x \pm h)$ are
$$f(x \pm h) = f(x) \pm hf'(x) + \frac{h^2}{2}f''(x) \pm \frac{h^3}{6}f'''(x) + \frac{h^4}{24}f^{(4)}(\xi_{\pm}), \tag{4.13}$$
where (assuming that $h > 0$)
$$x - h \leq \xi_- \leq x \leq \xi_+ \leq x + h.$$
Using the expansions in (4.13) we can rewrite (4.12) as
$$f''(x) \approx Af(x+h) + Bf(x) + Cf(x-h) \tag{4.14}$$
$$= (A + B + C)f(x) + h(A - C)f'(x) + \frac{h^2}{2}(A + C)f''(x) + \frac{h^3}{6}(A - C)f^{(3)}(x) + \frac{h^4}{24}\left[Af^{(4)}(\xi_+) + Cf^{(4)}(\xi_-)\right].$$
Equating the coefficients of $f(x)$, $f'(x)$, and $f''(x)$ on both sides of (4.14), we obtain the linear system
$$A + B + C = 0, \qquad A - C = 0, \qquad A + C = \frac{2}{h^2}. \tag{4.15}$$
The system (4.15) has the unique solution
$$A = C = \frac{1}{h^2}, \qquad B = -\frac{2}{h^2}.$$
In this particular case, since $A$ and $C$ are equal to each other, the coefficient of $f^{(3)}(x)$ on the right-hand side of (4.14) also vanishes, and we end up with
$$f''(x) = \frac{f(x+h) - 2f(x) + f(x-h)}{h^2} - \frac{h^2}{24}\left[f^{(4)}(\xi_+) + f^{(4)}(\xi_-)\right].$$
We note that the last two terms can be combined into one using an intermediate value theorem (assuming that $f(x)$ has four continuous derivatives), i.e.,
$$\frac{h^2}{24}\left[f^{(4)}(\xi_+) + f^{(4)}(\xi_-)\right] = \frac{h^2}{12}f^{(4)}(\xi), \qquad \xi \in (x-h, x+h).$$
Hence we obtain the familiar second-order approximation of the second derivative:
$$f''(x) = \frac{f(x+h) - 2f(x) + f(x-h)}{h^2} - \frac{h^2}{12}f^{(4)}(\xi).$$
In terms of an algorithm, the method of undetermined coefficients follows what was just demonstrated in the example:
1. Assume that the derivative can be written as a linear combination of the values of the function at certain points.
2. Write the Taylor expansions of the function at the approximation points.
3. Equate the coefficients of the function and its derivatives on both sides.
The only question that remains open is how many terms we should use in the Taylor expansion. This question has, unfortunately, no simple answer. In the example, we have already seen that even though we used data taken from three points, we could satisfy four equations. In other words, the coefficient of the third derivative vanished as well. If we were to stop the Taylor expansions at the third derivative instead of at the fourth derivative, we would have missed this cancellation, and would have mistakenly concluded that the approximation method is only first-order accurate. The number of terms in the Taylor expansion should be sufficient to rule out additional cancellations. In other words, one should truncate the Taylor series only after the leading term in the error has been identified.
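The three steps above can be sketched in code. The following (our own small Gaussian-elimination helper, used to stay dependency-free; a library solver would do the same job) recovers the weights $A$, $B$, $C$ of (4.12) by solving the system (4.15).

```python
# Method of undetermined coefficients: recover the weights of (4.12)
# for the second derivative on the stencil {x+h, x, x-h}.

def solve3(M, rhs):
    """Gaussian elimination with partial pivoting for a 3x3 system."""
    A = [row[:] + [v] for row, v in zip(M, rhs)]
    n = 3
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= factor * A[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (A[r][n] - sum(A[r][c] * x[c] for c in range(r + 1, n))) / A[r][r]
    return x

h = 0.1
# rows: Taylor coefficients of f(x), f'(x), f''(x) in
# A*f(x+h) + B*f(x) + C*f(x-h)
M = [[1.0, 1.0, 1.0],            # A + B + C = 0
     [h, 0.0, -h],               # h(A - C) = 0
     [h**2 / 2, 0.0, h**2 / 2]]  # (h^2/2)(A + C) = 1
A_, B_, C_ = solve3(M, [0.0, 0.0, 1.0])
assert abs(A_ - 1 / h**2) < 1e-9
assert abs(B_ + 2 / h**2) < 1e-9
assert abs(C_ - 1 / h**2) < 1e-9
```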
4.4 Richardson’s Extrapolation
Richardson’s extrapolation can be viewed as a general procedure for improving the
accuracy of approximations when the structure of the error is known. While we study
it here in the context of numerical diﬀerentiation, it is by no means limited only to
diﬀerentiation and we will get back to it later on when we study methods for numerical
integration.
We start with an example in which we show how to turn a second-order approximation of the first derivative into a fourth-order approximation of the same quantity. We already know that we can write a second-order approximation of $f'(x)$ given its values at $x \pm h$. In order to improve this approximation we will need some more insight into the internal structure of the error. We therefore start with the Taylor expansions of $f(x \pm h)$ about the point $x$, i.e.,
$$f(x+h) = \sum_{k=0}^{\infty} \frac{f^{(k)}(x)}{k!}h^k, \qquad f(x-h) = \sum_{k=0}^{\infty} (-1)^k\frac{f^{(k)}(x)}{k!}h^k.$$
Hence
$$f'(x) = \frac{f(x+h) - f(x-h)}{2h} - \left[\frac{h^2}{3!}f^{(3)}(x) + \frac{h^4}{5!}f^{(5)}(x) + \dots\right]. \tag{4.16}$$
We rewrite (4.16) as
$$L = D(h) + e_2h^2 + e_4h^4 + \dots, \tag{4.17}$$
where $L$ denotes the quantity that we are interested in approximating, i.e.,
$$L = f'(x),$$
and $D(h)$ is the approximation, which in this case is
$$D(h) = \frac{f(x+h) - f(x-h)}{2h}.$$
The error is
$$E = e_2h^2 + e_4h^4 + \dots,$$
where $e_i$ denotes the coefficient of $h^i$ in (4.16). The important property of the coefficients $e_i$ is that they do not depend on $h$. We note that the formula
$$L \approx D(h)$$
is a second-order approximation of the first derivative that is based on the values of $f(x)$ at $x \pm h$. We assume here that in general $e_i \neq 0$. In order to improve the
approximation of $L$ our strategy will be to eliminate the term $e_2h^2$ from the error. How can this be done? One possibility is to write another approximation that is based on the values of the function at different points. For example, we can write
$$L = D(2h) + e_2(2h)^2 + e_4(2h)^4 + \dots \tag{4.18}$$
This, of course, is still a second-order approximation of the derivative. However, the idea is to combine (4.17) with (4.18) such that the $h^2$ term in the error vanishes. Indeed, subtracting the following equations from each other,
$$4L = 4D(h) + 4e_2h^2 + 4e_4h^4 + \dots,$$
$$L = D(2h) + 4e_2h^2 + 16e_4h^4 + \dots,$$
we have
$$L = \frac{4D(h) - D(2h)}{3} - 4e_4h^4 + \dots$$
In other words, a fourth-order approximation of the derivative is
$$f'(x) = \frac{-f(x+2h) + 8f(x+h) - 8f(x-h) + f(x-2h)}{12h} + O(h^4). \tag{4.19}$$
Note that (4.19) improves the accuracy of the approximation (4.16) by using more points.
This process can be repeated over and over as long as the structure of the error is known. For example, we can write (4.19) as
$$L = S(h) + a_4h^4 + a_6h^6 + \dots, \tag{4.20}$$
where
$$S(h) = \frac{-f(x+2h) + 8f(x+h) - 8f(x-h) + f(x-2h)}{12h}.$$
Equation (4.20) can be turned into a sixth-order approximation of the derivative by eliminating the term $a_4h^4$. We carry out such a procedure by writing
$$L = S(2h) + a_4(2h)^4 + a_6(2h)^6 + \dots \tag{4.21}$$
Combining (4.21) with (4.20) we end up with a sixth-order approximation of the derivative:
$$L = \frac{16S(h) - S(2h)}{15} + O(h^6).$$
Remarks.
1. In (4.18), instead of using $D(2h)$, it is possible to use other approximations, e.g., $D(h/2)$. If this is what is done, instead of (4.19) we would get a fourth-order approximation of the derivative that is based on the values of $f$ at $x-h$, $x-h/2$, $x+h/2$, $x+h$.
2. Once again we would like to emphasize that Richardson's extrapolation is a general procedure for improving the accuracy of numerical approximations that can be used when the structure of the error is known. It is not specific to numerical differentiation.
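The extrapolation step can be observed directly. The sketch below (our own test function and step sizes) forms $(4D(h) - D(2h))/3$ from the centered difference $D(h)$ and checks that the combination behaves like a fourth-order method: halving $h$ divides the error by roughly sixteen.

```python
import math

# Richardson extrapolation of the centered difference D(h), as in (4.19).

def D(f, x, h):
    return (f(x + h) - f(x - h)) / (2 * h)

def richardson(f, x, h):
    return (4 * D(f, x, h) - D(f, x, 2 * h)) / 3

x, exact = 0.3, math.exp(0.3)
e1 = abs(richardson(math.exp, x, 0.1) - exact)
e2 = abs(richardson(math.exp, x, 0.05) - exact)
assert e1 / e2 > 12   # fourth order: halving h divides the error by ~16
```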
5 Numerical Integration
5.1 Basic Concepts
In this chapter we are going to explore various ways of approximating the integral of a function over a given domain. Since we cannot analytically integrate every function, the need for approximate integration formulas is obvious. In addition, there might be situations where the given function can be integrated analytically, and still, an approximation formula may end up being a more efficient alternative to evaluating the exact expression of the integral.
In order to gain some insight into numerical integration, it is natural to recall the notion of Riemann integration. We assume that $f(x)$ is a bounded function defined on $[a,b]$ and that $\{x_0,\dots,x_n\}$ is a partition ($P$) of $[a,b]$. For each $i$ we let
$$M_i(f) = \sup_{x\in[x_{i-1},x_i]} f(x)$$
and
$$m_i(f) = \inf_{x\in[x_{i-1},x_i]} f(x).$$
Letting $\Delta x_i = x_i - x_{i-1}$, the upper (Darboux) sum of $f(x)$ with respect to the partition $P$ is defined as
$$U(f,P) = \sum_{i=1}^n M_i\,\Delta x_i, \tag{5.1}$$
while the lower (Darboux) sum of $f(x)$ with respect to the partition $P$ is defined as
$$L(f,P) = \sum_{i=1}^n m_i\,\Delta x_i. \tag{5.2}$$
The upper integral of $f(x)$ on $[a,b]$ is defined as
$$U(f) = \inf(U(f,P)),$$
and the lower integral of $f(x)$ is defined as
$$L(f) = \sup(L(f,P)),$$
where both the infimum and the supremum are taken over all possible partitions, $P$, of the interval $[a,b]$. If the upper and lower integrals of $f(x)$ are equal to each other, their common value is denoted by $\int_a^b f(x)dx$ and is referred to as the Riemann integral of $f(x)$.
For the purpose of the present discussion we can think of the upper and lower Darboux sums (5.1), (5.2), as two approximations of the integral (assuming that the function is indeed integrable). Of course, these sums are not defined in the most convenient way for an approximation algorithm. This is because we need to find the extrema of the function in every subinterval. Finding the extrema of the function may be a complicated task on its own, which we would like to avoid.
Instead, one can think of approximating the value of $\int_a^b f(x)dx$ by multiplying the value of the function at one of the endpoints of the interval by the length of the interval, i.e.,
$$\int_a^b f(x)dx \approx f(a)(b-a). \tag{5.3}$$
The approximation (5.3) is called the rectangular method (see Figure 5.1). Numerical integration formulas are also referred to as integration rules or quadratures, and hence we can refer to (5.3) as the rectangular rule or the rectangular quadrature.
Figure 5.1: A rectangular quadrature
A variation on the rectangular rule is the midpoint rule. Similarly to the rectangular rule, we approximate the value of the integral $\int_a^b f(x)dx$ by multiplying the length of the interval by the value of the function at one point. Only this time, we replace the value of the function at an endpoint by the value of the function at the center point $\frac{1}{2}(a+b)$, i.e.,
$$\int_a^b f(x)dx \approx (b-a)f\left(\frac{a+b}{2}\right) \tag{5.4}$$
(see also Figure 5.2). As we shall see below, the midpoint quadrature (5.4) is a more accurate quadrature than the rectangular rule (5.3).
Figure 5.2: A midpoint quadrature
In order to compute the quadrature error for the midpoint rule (5.4), we consider the primitive function $F(x)$,
$$F(x) = \int_a^x f(t)dt,$$
and expand
$$\int_a^{a+h} f(x)dx = F(a+h) = F(a) + hF'(a) + \frac{h^2}{2}F''(a) + \frac{h^3}{6}F'''(a) + O(h^4)
= hf(a) + \frac{h^2}{2}f'(a) + \frac{h^3}{6}f''(a) + O(h^4). \tag{5.5}$$
If we let $b = a + h$, we have (expanding $f(a + h/2)$) for the quadrature error, $E$,
$$E = \int_a^{a+h} f(x)dx - hf\left(a + \frac{h}{2}\right)
= hf(a) + \frac{h^2}{2}f'(a) + \frac{h^3}{6}f''(a) + O(h^4) - h\left[f(a) + \frac{h}{2}f'(a) + \frac{h^2}{8}f''(a) + O(h^3)\right],$$
which means that the error term is of order $O(h^3)$, so we should stop the expansions there and write
$$E = h^3f''(\xi)\left(\frac{1}{6} - \frac{1}{8}\right) = \frac{(b-a)^3}{24}f''(\xi), \qquad \xi \in (a,b). \tag{5.6}$$
Remark. Throughout this section we assumed that all functions we are interested in
integrating are actually integrable in the domain of interest. We also assumed that they
are bounded and that they are deﬁned at every point, so that whenever we need to
evaluate a function at a point, we can do it. We will go on and use these assumptions
throughout the chapter.
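The rules (5.3) and (5.4) and the bound (5.6) are easy to check on a concrete function. The sketch below (our own choice of $f$ and interval) compares the two rules and verifies that the midpoint error stays within the bound $\frac{(b-a)^3}{24}\max|f''|$.

```python
import math

# Compare the rectangular rule (5.3) with the midpoint rule (5.4) on a
# single interval for f(x) = exp(x), and check the error bound (5.6).

a, b = 0.0, 0.5
exact = math.exp(b) - math.exp(a)

rect = math.exp(a) * (b - a)               # (5.3)
mid = (b - a) * math.exp((a + b) / 2)      # (5.4)

err_rect = abs(exact - rect)
err_mid = abs(exact - mid)
bound = (b - a) ** 3 / 24 * math.exp(b)    # max |f''| on [a, b] is exp(b)

assert err_mid < err_rect   # midpoint is more accurate here
assert err_mid <= bound
```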
5.2 Integration via Interpolation
In this section we will study how to derive quadratures by integrating an interpolant. As always, our goal is to evaluate $I = \int_a^b f(x)dx$. We select nodes $x_0,\dots,x_n \in [a,b]$ and write the Lagrange interpolant (of degree $\leq n$) through these points, i.e.,
$$P_n(x) = \sum_{i=0}^n f(x_i)l_i(x),$$
with
$$l_i(x) = \prod_{\substack{j=0\\ j\neq i}}^n \frac{x - x_j}{x_i - x_j}, \qquad 0 \leq i \leq n.$$
Hence, we can approximate
$$\int_a^b f(x)dx \approx \int_a^b P_n(x)dx = \sum_{i=0}^n f(x_i)\int_a^b l_i(x)dx = \sum_{i=0}^n A_if(x_i). \tag{5.7}$$
The quadrature coefficients $A_i$ in (5.7) are given by
$$A_i = \int_a^b l_i(x)dx. \tag{5.8}$$
Note that if we want to integrate several diﬀerent functions at the same points, the
quadrature coeﬃcients (5.8) need to be computed only once, since they do not depend
on the function that is being integrated. If we change the interpolation/integration
points, then we must recompute the quadrature coeﬃcients.
For equally spaced points, $x_0,\dots,x_n$, a numerical integration formula of the form
$$\int_a^b f(x)dx \approx \sum_{i=0}^n A_if(x_i) \tag{5.9}$$
is called a Newton-Cotes formula.
Example 5.1
We let $n = 1$ and consider two interpolation points which we set as
$$x_0 = a, \qquad x_1 = b.$$
In this case
$$l_0(x) = \frac{b-x}{b-a}, \qquad l_1(x) = \frac{x-a}{b-a}.$$
Hence
$$A_0 = \int_a^b l_0(x)dx = \int_a^b \frac{b-x}{b-a}dx = \frac{b-a}{2}.$$
Similarly,
$$A_1 = \int_a^b l_1(x)dx = \int_a^b \frac{x-a}{b-a}dx = \frac{b-a}{2} = A_0.$$
The resulting quadrature is the so-called trapezoidal rule,
$$\int_a^b f(x)dx \approx \frac{b-a}{2}[f(a) + f(b)] \tag{5.10}$$
(see Figure 5.3).
Figure 5.3: A trapezoidal quadrature
We can now use the interpolation error to compute the error in the quadrature (5.10). The interpolation error is
$$f(x) - P_1(x) = \frac{1}{2}f''(\xi_x)(x-a)(x-b), \qquad \xi_x \in (a,b),$$
and hence (using the integral intermediate value theorem)
$$E = \int_a^b \frac{1}{2}f''(\xi_x)(x-a)(x-b)\,dx = \frac{f''(\xi)}{2}\int_a^b (x-a)(x-b)\,dx = -\frac{f''(\xi)}{12}(b-a)^3, \tag{5.11}$$
with $\xi \in (a,b)$.
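The weights (5.8) can be computed exactly by integrating each Lagrange basis polynomial. The sketch below (our own rational-arithmetic helpers, not from the text) reproduces the trapezoidal weights for the nodes $\{a, b\}$, and for $\{0, \frac12, 1\}$ the weights $\frac16, \frac23, \frac16$ of Simpson's rule.

```python
from fractions import Fraction

# Compute quadrature weights A_i = integral of l_i(x) over [a, b],
# as in (5.7)-(5.8), with exact rational arithmetic.

def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists (low to high)."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

def poly_integrate(p, a, b):
    return sum(c / (k + 1) * (b ** (k + 1) - a ** (k + 1))
               for k, c in enumerate(p))

def quadrature_weights(nodes, a, b):
    weights = []
    for i, xi in enumerate(nodes):
        li = [Fraction(1)]                   # build l_i(x) as a product
        for j, xj in enumerate(nodes):
            if j != i:
                li = poly_mul(li, [-xj / (xi - xj), Fraction(1) / (xi - xj)])
        weights.append(poly_integrate(li, a, b))
    return weights

a, b = Fraction(0), Fraction(1)
assert quadrature_weights([a, b], a, b) == [Fraction(1, 2), Fraction(1, 2)]
assert quadrature_weights([a, Fraction(1, 2), b], a, b) == [Fraction(1, 6), Fraction(2, 3), Fraction(1, 6)]
```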
Remarks.
1. We note that the quadratures (5.7)-(5.8) are exact for polynomials of degree $\leq n$. For if $p(x)$ is such a polynomial, it can be written as (check!)
$$p(x) = \sum_{i=0}^n p(x_i)l_i(x).$$
Hence
$$\int_a^b p(x)dx = \sum_{i=0}^n p(x_i)\int_a^b l_i(x)dx = \sum_{i=0}^n A_ip(x_i).$$
2. As for the opposite direction, assume that the quadrature
$$\int_a^b f(x)dx \approx \sum_{i=0}^n A_if(x_i)$$
is exact for all polynomials of degree $\leq n$. We know that
$$\deg(l_j(x)) = n,$$
and hence
$$\int_a^b l_j(x)dx = \sum_{i=0}^n A_il_j(x_i) = \sum_{i=0}^n A_i\delta_{ij} = A_j.$$
5.3 Composite Integration Rules
In a composite quadrature, we divide the interval into subintervals and apply an integration rule to each subinterval. We demonstrate this idea with a couple of examples.
Example 5.2
Consider the points
$$a = x_0 < x_1 < \dots < x_n = b.$$
The composite trapezoidal rule is obtained by applying the trapezoidal rule in each subinterval $[x_{i-1}, x_i]$, $i = 1,\dots,n$, i.e.,
$$\int_a^b f(x)dx = \sum_{i=1}^n \int_{x_{i-1}}^{x_i} f(x)dx \approx \frac{1}{2}\sum_{i=1}^n (x_i - x_{i-1})[f(x_{i-1}) + f(x_i)] \tag{5.12}$$
(see Figure 5.4).
A particular case is when these points are uniformly spaced, i.e., when all intervals have an equal length. For example, if
$$x_i = a + ih,$$
Figure 5.4: A composite trapezoidal rule
where
$$h = \frac{b-a}{n},$$
then
$$\int_a^b f(x)dx \approx \frac{h}{2}\left[f(a) + 2\sum_{i=1}^{n-1}f(a+ih) + f(b)\right] = h\,{\sum_{i=0}^{n}}'' f(a+ih). \tag{5.13}$$
The notation of a sum with two primes, $\sum''$, means that we sum over all the terms with the exception of the first and last terms, which are divided by 2.
We can also compute the error term as a function of the distance between neighboring points, $h$. We know from (5.11) that in every subinterval the quadrature error is
$$-\frac{h^3}{12}f''(\xi_i).$$
Hence, the overall error is obtained by summing over $n$ such terms:
$$\sum_{i=1}^n \left(-\frac{h^3}{12}f''(\xi_i)\right) = -\frac{h^3n}{12}\left[\frac{1}{n}\sum_{i=1}^n f''(\xi_i)\right].$$
Here, we use the notation $\xi_i$ to denote an intermediate point that belongs to the $i$th interval. Let
$$M = \frac{1}{n}\sum_{i=1}^n f''(\xi_i).$$
Clearly,
$$\min_{x\in[a,b]} f''(x) \leq M \leq \max_{x\in[a,b]} f''(x).$$
If we assume that $f''(x)$ is continuous in $[a,b]$ (which we anyhow do in order for the interpolation error formula to be valid), then there exists a point $\xi \in [a,b]$ such that
$$f''(\xi) = M.$$
Hence (recalling that $(b-a)/n = h$), we have
$$E = -\frac{(b-a)h^2}{12}f''(\xi), \qquad \xi \in [a,b]. \tag{5.14}$$
This means that the composite trapezoidal rule is second-order accurate.
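The composite rule (5.13) and the error estimate (5.14) can be checked in a few lines. The sketch below (our own test integral $\int_0^\pi \sin x\,dx = 2$) verifies both the second-order convergence and the error bound.

```python
import math

# Composite trapezoidal rule (5.13) with the error estimate (5.14),
# tested on f(x) = sin(x) over [0, pi] (exact integral 2).

def composite_trapezoid(f, a, b, n):
    h = (b - a) / n
    s = 0.5 * (f(a) + f(b)) + sum(f(a + i * h) for i in range(1, n))
    return h * s

a, b = 0.0, math.pi
e1 = abs(2.0 - composite_trapezoid(math.sin, a, b, 50))
e2 = abs(2.0 - composite_trapezoid(math.sin, a, b, 100))
assert 3.9 < e1 / e2 < 4.1        # second order: doubling n quarters the error

h = (b - a) / 50
# bound from (5.14): |E| <= (b-a) h^2 / 12 * max|f''|, and max|sin| = 1
assert e1 <= (b - a) * h**2 / 12 + 1e-12
```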
Example 5.3
In the interval $[a,b]$ we assume $n$ subintervals and let
$$h = \frac{b-a}{n}.$$
The quadrature points are
$$x_j = a + \left(j - \frac{1}{2}\right)h, \qquad j = 1, 2,\dots,n.$$
The composite midpoint rule is given by applying the midpoint rule (5.4) in every subinterval, i.e.,
$$\int_a^b f(x)dx \approx h\sum_{j=1}^n f(x_j). \tag{5.15}$$
Equation (5.15) is known as the composite midpoint rule.
In order to obtain the quadrature error in the approximation (5.15) we recall that in each subinterval the error is given according to (5.6), i.e.,
$$E_j = \frac{h^3}{24}f''(\xi_j), \qquad \xi_j \in \left(x_j - \frac{h}{2},\, x_j + \frac{h}{2}\right).$$
Hence
$$E = \sum_{j=1}^n E_j = \frac{h^3}{24}\sum_{j=1}^n f''(\xi_j) = \frac{h^3n}{24}\left[\frac{1}{n}\sum_{j=1}^n f''(\xi_j)\right] = \frac{h^2(b-a)}{24}f''(\xi), \tag{5.16}$$
where $\xi \in (a,b)$. This means that the composite midpoint rule is also second-order accurate (just like the composite trapezoidal rule).
5.4 Additional Integration Techniques
5.4.1 The method of undetermined coeﬃcients
The method of undetermined coefficients for deriving quadratures is the following:
1. Select the quadrature points.
2. Write a quadrature as a linear combination of the values of the function at the chosen quadrature points.
3. Determine the coefficients of the linear combination by requiring that the quadrature is exact for as many polynomials as possible from the ordered set $\{1, x, x^2, \ldots\}$.
We demonstrate this technique with the following example.
Example 5.4
Problem: Find a quadrature of the form

$$\int_0^1 f(x)\,dx \approx A_0 f(0) + A_1 f\left(\frac{1}{2}\right) + A_2 f(1),$$

that is exact for all polynomials of degree $\leq 2$.
Solution: Since the quadrature has to be exact for all polynomials of degree $\leq 2$, it has to be exact for the polynomials $1$, $x$, and $x^2$. Hence we obtain the system of linear equations

$$1 = \int_0^1 1\,dx = A_0 + A_1 + A_2,$$

$$\frac{1}{2} = \int_0^1 x\,dx = \frac{1}{2}A_1 + A_2,$$

$$\frac{1}{3} = \int_0^1 x^2\,dx = \frac{1}{4}A_1 + A_2.$$
Therefore, $A_0 = A_2 = \frac{1}{6}$ and $A_1 = \frac{2}{3}$, and the desired quadrature is

$$\int_0^1 f(x)\,dx \approx \frac{f(0) + 4f\left(\frac{1}{2}\right) + f(1)}{6}. \tag{5.17}$$

Since the resulting formula (5.17) is linear, its being exact for $1$, $x$, and $x^2$ implies that it is exact for any polynomial of degree $\leq 2$. In fact, we will show in Section 5.5.1 that this approximation is actually exact for polynomials of degree $\leq 3$.
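The small system above can be solved in exact arithmetic. The sketch below (using Python's `fractions` module, our choice for illustration) carries out the elimination: subtracting the $x^2$-equation from the $x$-equation isolates $A_1$:

```python
from fractions import Fraction as Fr

# Moments of 1, x, x^2 over [0, 1]: the right-hand sides of the system.
m0, m1, m2 = Fr(1), Fr(1, 2), Fr(1, 3)

# Subtract the third equation from the second: A1/2 - A1/4 = m1 - m2.
A1 = (m1 - m2) * 4          # A1 = 2/3
A2 = m1 - A1 / 2            # back-substitute: A2 = 1/6
A0 = m0 - A1 - A2           # first equation: A0 = 1/6
```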
5.4.2 Change of an interval
Suppose that we have a quadrature formula on the interval $[c,d]$ of the form

$$\int_c^d f(t)\,dt \approx \sum_{i=0}^{n} A_i f(t_i). \tag{5.18}$$

We would like to use (5.18) to find a quadrature on the interval $[a,b]$ that approximates

$$\int_a^b f(x)\,dx.$$
The mapping between the intervals $[c,d] \to [a,b]$ can be written as a linear transformation of the form

$$\lambda(t) = \frac{b-a}{d-c}\,t + \frac{ad-bc}{d-c}.$$

Hence

$$\int_a^b f(x)\,dx = \frac{b-a}{d-c}\int_c^d f(\lambda(t))\,dt \approx \frac{b-a}{d-c}\sum_{i=0}^{n} A_i f(\lambda(t_i)).$$

This means that

$$\int_a^b f(x)\,dx \approx \frac{b-a}{d-c}\sum_{i=0}^{n} A_i f\left(\frac{b-a}{d-c}\,t_i + \frac{ad-bc}{d-c}\right). \tag{5.19}$$
We note that if the quadrature (5.18) was exact for polynomials of degree m, so is (5.19).
Example 5.5
We want to write the result of the previous example,

$$\int_0^1 f(x)\,dx \approx \frac{f(0) + 4f\left(\frac{1}{2}\right) + f(1)}{6},$$

as a quadrature on the interval $[a,b]$. According to (5.19),

$$\int_a^b f(x)\,dx \approx \frac{b-a}{6}\left[f(a) + 4f\left(\frac{a+b}{2}\right) + f(b)\right]. \tag{5.20}$$
The approximation (5.20) is known as the Simpson quadrature.
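The change of variables (5.19) is mechanical; the sketch below (illustrative names) maps the nodes and weights of the $[0,1]$ rule (5.17) onto $[1,3]$, recovering Simpson's nodes $1, 2, 3$ with weights $\frac{1}{3}, \frac{4}{3}, \frac{1}{3}$:

```python
def map_quadrature(nodes, weights, c, d, a, b):
    """Transform a quadrature on [c, d] into one on [a, b] via (5.19)."""
    scale = (b - a) / (d - c)
    shift = (a * d - b * c) / (d - c)
    new_nodes = [scale * t + shift for t in nodes]
    new_weights = [scale * A for A in weights]
    return new_nodes, new_weights

# The [0, 1] rule of Example 5.4 mapped onto [1, 3].
nodes, weights = map_quadrature([0.0, 0.5, 1.0], [1/6, 2/3, 1/6],
                                0.0, 1.0, 1.0, 3.0)
```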
5.4.3 General integration formulas
We recall that a weight function is a continuous, nonnegative function with a positive
mass. We assume that such a weight function w(x) is given and would like to write a
quadrature of the form
$$\int_a^b f(x)w(x)\,dx \approx \sum_{i=0}^{n} A_i f(x_i). \tag{5.21}$$
Such quadratures are called general (weighted) quadratures.
Previously, for the case $w(x) \equiv 1$, we wrote a quadrature of the form

$$\int_a^b f(x)\,dx \approx \sum_{i=0}^{n} A_i f(x_i),$$

where

$$A_i = \int_a^b l_i(x)\,dx.$$
Repeating the derivation we carried out in Section 5.2, we construct an interpolant $Q_n(x)$ of degree $n$ that passes through the points $x_0, \ldots, x_n$. Its Lagrange form is

$$Q_n(x) = \sum_{i=0}^{n} f(x_i) l_i(x),$$

with the usual

$$l_i(x) = \prod_{\substack{j=0 \\ j\neq i}}^{n} \frac{x-x_j}{x_i-x_j}, \qquad 0 \leq i \leq n.$$
Hence

$$\int_a^b f(x)w(x)\,dx \approx \int_a^b Q_n(x)w(x)\,dx = \sum_{i=0}^{n} f(x_i)\int_a^b l_i(x)w(x)\,dx = \sum_{i=0}^{n} A_i f(x_i),$$

where the coefficients $A_i$ are given by

$$A_i = \int_a^b l_i(x)w(x)\,dx. \tag{5.22}$$

To summarize, the general quadrature is

$$\int_a^b f(x)w(x)\,dx \approx \sum_{i=0}^{n} A_i f(x_i), \tag{5.23}$$

with quadrature coefficients, $A_i$, that are given by (5.22).
5.5 Simpson’s Integration
In the last example we obtained Simpson's quadrature (5.20). An alternative derivation is the following: start with a polynomial $Q_2(x)$ that interpolates $f(x)$ at the points $a$, $c = (a+b)/2$, and $b$. Then approximate

$$\int_a^b f(x)\,dx \approx \int_a^b \left[\frac{(x-c)(x-b)}{(a-c)(a-b)}f(a) + \frac{(x-a)(x-b)}{(c-a)(c-b)}f(c) + \frac{(x-a)(x-c)}{(b-a)(b-c)}f(b)\right]dx$$

$$= \ldots = \frac{b-a}{6}\left[f(a) + 4f\left(\frac{a+b}{2}\right) + f(b)\right],$$

which is Simpson's rule (5.20). Figure 5.5 demonstrates this process of deriving Simpson's quadrature for the specific choice of approximating $\int_1^3 \sin x\,dx$.
Figure 5.5: An example of Simpson's quadrature. The approximation of $\int_1^3 \sin x\,dx$ is obtained by integrating the quadratic interpolant $Q_2(x)$ over $[1,3]$.
5.5.1 The quadrature error
Surprisingly, Simpson’s quadrature is exact for polynomials of degree 3 and not only
for polynomials of degree 2. We will see that by studying the error term. We let h
denote half of the interval [a, b], i.e.,
h =
b −a
2
.
Then

$$\int_a^b f(x)\,dx = \int_a^{a+2h} f(x)\,dx \approx \frac{h}{3}\left[f(a) + 4f(a+h) + f(a+2h)\right]$$

$$= \frac{h}{3}\Big[f(a) + 4f(a) + 4hf'(a) + \frac{4}{2}h^2 f''(a) + \frac{4}{6}h^3 f'''(a) + \frac{4}{24}h^4 f^{(4)}(a) + \ldots$$

$$\qquad + f(a) + 2hf'(a) + \frac{(2h)^2}{2}f''(a) + \frac{(2h)^3}{6}f'''(a) + \frac{(2h)^4}{24}f^{(4)}(a) + \ldots\Big]$$

$$= 2hf(a) + 2h^2 f'(a) + \frac{4}{3}h^3 f''(a) + \frac{2}{3}h^4 f'''(a) + \frac{100}{3\cdot 5!}h^5 f^{(4)}(a) + \ldots$$
We now define $F(x)$ to be the primitive function of $f(x)$, i.e.,

$$F(x) = \int_a^x f(t)\,dt.$$
Hence

$$F(a+2h) = \int_a^{a+2h} f(x)\,dx = F(a) + 2hF'(a) + \frac{(2h)^2}{2}F''(a) + \frac{(2h)^3}{6}F'''(a) + \frac{(2h)^4}{4!}F^{(4)}(a) + \frac{(2h)^5}{5!}F^{(5)}(a) + \ldots$$

$$= 2hf(a) + 2h^2 f'(a) + \frac{4}{3}h^3 f''(a) + \frac{2}{3}h^4 f'''(a) + \frac{32}{5!}h^5 f^{(4)}(a) + \ldots,$$

which implies that

$$F(a+2h) - \frac{h}{3}\left[f(a) + 4f(a+h) + f(a+2h)\right] = -\frac{1}{90}h^5 f^{(4)}(a) + \ldots$$
This means that the quadrature error for Simpson's rule is

$$E = -\frac{1}{90}\left(\frac{b-a}{2}\right)^5 f^{(4)}(\xi), \qquad \xi \in [a,b]. \tag{5.24}$$

Since the fourth derivative of any polynomial of degree $\leq 3$ is identically zero, the quadrature error formula (5.24) implies that Simpson's quadrature is exact for polynomials of degree $\leq 3$.
5.5.2 Composite Simpson rule
To derive a composite version of Simpson's quadrature, we divide the interval $[a,b]$ into an even number of subintervals, $n$, and let

$$x_i = a + ih, \qquad 0 \leq i \leq n,$$

where

$$h = \frac{b-a}{n}.$$
Hence, if we replace the integral on every pair of adjacent subintervals by Simpson's rule (5.20), we obtain

$$\int_a^b f(x)\,dx = \int_{x_0}^{x_2} f(x)\,dx + \ldots + \int_{x_{n-2}}^{x_n} f(x)\,dx = \sum_{i=1}^{n/2}\int_{x_{2i-2}}^{x_{2i}} f(x)\,dx$$

$$\approx \frac{h}{3}\sum_{i=1}^{n/2}\left[f(x_{2i-2}) + 4f(x_{2i-1}) + f(x_{2i})\right].$$
The composite Simpson quadrature is thus given by

$$\int_a^b f(x)\,dx \approx \frac{h}{3}\left[f(x_0) + 2\sum_{i=2}^{n/2} f(x_{2i-2}) + 4\sum_{i=1}^{n/2} f(x_{2i-1}) + f(x_n)\right]. \tag{5.25}$$
Summing the error terms (that are given by (5.24)) over all subintervals, the quadrature error takes the form

$$E = -\frac{h^5}{90}\sum_{i=1}^{n/2} f^{(4)}(\xi_i) = -\frac{h^5}{90}\cdot\frac{n}{2}\left[\frac{2}{n}\sum_{i=1}^{n/2} f^{(4)}(\xi_i)\right].$$
Since

$$\min_{x\in[a,b]} f^{(4)}(x) \leq \frac{2}{n}\sum_{i=1}^{n/2} f^{(4)}(\xi_i) \leq \max_{x\in[a,b]} f^{(4)}(x),$$

we can conclude that

$$E = -\frac{h^5}{90}\cdot\frac{n}{2}\, f^{(4)}(\xi) = -\frac{(b-a)h^4}{180} f^{(4)}(\xi), \qquad \xi \in [a,b], \tag{5.26}$$

i.e., the composite Simpson quadrature is fourth-order accurate.
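The composite rule (5.25) can be sketched as follows (illustrative names; the weight pattern $1, 4, 2, 4, \ldots, 2, 4, 1$ is the one displayed above). Doubling $n$ should shrink the error by roughly $2^4 = 16$, as (5.26) predicts:

```python
import math

def composite_simpson(f, a, b, n):
    """Composite Simpson rule (5.25); n must be even."""
    if n % 2 != 0:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        # odd-indexed interior points get weight 4, even-indexed get 2
        total += (4.0 if i % 2 == 1 else 2.0) * f(a + i * h)
    return h / 3.0 * total

e1 = abs(composite_simpson(math.sin, 0.0, math.pi, 10) - 2.0)
e2 = abs(composite_simpson(math.sin, 0.0, math.pi, 20) - 2.0)
```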
5.6 Gaussian Quadrature
5.6.1 Maximizing the quadrature’s accuracy
So far, all the quadratures we encountered were of the form
$$\int_a^b f(x)\,dx \approx \sum_{i=0}^{n} A_i f(x_i). \tag{5.27}$$

An approximation of the form (5.27) was shown to be exact for polynomials of degree $\leq n$ for an appropriate choice of the quadrature coefficients $A_i$. In all cases, the quadrature points $x_0, \ldots, x_n$ were given up front. In other words, given a set of nodes $x_0, \ldots, x_n$, the coefficients $\{A_i\}_{i=0}^{n}$ were determined such that the approximation was exact in $\Pi_n$.
We are now interested in investigating the possibility of writing more accurate
quadratures without increasing the total number of quadrature points. This will be
possible if we allow for the freedom of choosing the quadrature points. The quadrature problem now becomes a problem of choosing the quadrature points, in addition to determining the corresponding coefficients, in such a way that the quadrature is exact for polynomials of a maximal degree. Quadratures that are obtained this way are called Gaussian quadratures.
Example 5.6
The quadrature formula

$$\int_{-1}^{1} f(x)\,dx \approx f\left(-\frac{1}{\sqrt{3}}\right) + f\left(\frac{1}{\sqrt{3}}\right),$$

is exact for polynomials of degree $\leq 3$(!) We will revisit this problem and prove this result in Example 5.9 below.
An equivalent problem can be stated for the more general weighted quadrature case. Here,

$$\int_a^b f(x)w(x)\,dx \approx \sum_{i=0}^{n} A_i f(x_i), \tag{5.28}$$

where $w(x) \geq 0$ is a weight function. Equation (5.28) is exact for $f \in \Pi_n$ if and only if

$$A_i = \int_a^b w(x)\prod_{\substack{j=0 \\ j\neq i}}^{n} \frac{x-x_j}{x_i-x_j}\,dx.$$
In both cases (5.27) and (5.28), the number of quadrature nodes, $x_0, \ldots, x_n$, is $n+1$, and so is the number of quadrature coefficients, $A_i$. Hence, if we have the flexibility of determining the location of the points in addition to determining the coefficients, we have altogether $2n+2$ degrees of freedom, and hence we can expect to be able to derive quadratures that are exact for polynomials in $\Pi_{2n+1}$. This is indeed the case as we shall see below. We will show that the general solution of this integration problem is connected with the roots of orthogonal polynomials. We start with the following theorem.
Theorem 5.7 Let $q(x)$ be a nonzero polynomial of degree $n+1$ that is $w$-orthogonal to $\Pi_n$, i.e., $\forall p(x) \in \Pi_n$,

$$\int_a^b p(x)q(x)w(x)\,dx = 0.$$

If $x_0, \ldots, x_n$ are the zeros of $q(x)$ then (5.28) is exact $\forall f \in \Pi_{2n+1}$.
Proof. For $f(x) \in \Pi_{2n+1}$, write $f(x) = q(x)p(x) + r(x)$. We note that $p(x), r(x) \in \Pi_n$. Since $x_0, \ldots, x_n$ are the zeros of $q(x)$,

$$f(x_i) = r(x_i).$$
Hence,

$$\int_a^b f(x)w(x)\,dx = \int_a^b [q(x)p(x) + r(x)]w(x)\,dx = \int_a^b r(x)w(x)\,dx \tag{5.29}$$

$$= \sum_{i=0}^{n} A_i r(x_i) = \sum_{i=0}^{n} A_i f(x_i).$$

The second equality in (5.29) holds since $q(x)$ is $w$-orthogonal to $\Pi_n$. The third equality in (5.29) holds since (5.28) is exact for polynomials in $\Pi_n$.
According to Theorem 5.7 we already know that the quadrature points that will
provide the most accurate quadrature rule are the n+1 roots of an orthogonal polynomial
of degree n + 1 (where the orthogonality is with respect to the weight function w(x)).
We recall that the roots of q(x) are real, simple and lie in (a, b), something we know
from our previous discussion on orthogonal polynomials (see Theorem 3.17). In other
words, we need n + 1 quadrature points in the interval, and an orthogonal polynomial
of degree n +1 does have n +1 distinct roots in the interval. We now restate the result
regarding the roots of orthogonal functions with an alternative proof.
Theorem 5.8 Let $w(x)$ be a weight function. Assume that $f(x)$ is continuous in $[a,b]$, is not the zero function, and is $w$-orthogonal to $\Pi_n$. Then $f(x)$ changes sign at least $n+1$ times on $(a,b)$.
Proof. Since $1 \in \Pi_n$,

$$\int_a^b f(x)w(x)\,dx = 0.$$

Hence, $f(x)$ changes sign at least once. Now suppose that $f(x)$ changes sign only $r$ times, where $r \leq n$. Choose $\{t_i\}_{i\geq 0}$ such that

$$a = t_0 < t_1 < \ldots < t_{r+1} = b,$$

and $f(x)$ is of one sign on $(t_0,t_1), (t_1,t_2), \ldots, (t_r,t_{r+1})$. The polynomial

$$p(x) = \prod_{i=1}^{r}(x-t_i),$$

has the same sign property. Hence

$$\int_a^b f(x)p(x)w(x)\,dx \neq 0,$$

which leads to a contradiction since $p(x) \in \Pi_n$.
Example 5.9
We are looking for a quadrature of the form

$$\int_{-1}^{1} f(x)\,dx \approx A_0 f(x_0) + A_1 f(x_1).$$

A straightforward computation will amount to making this quadrature exact for polynomials of degree $\leq 3$. The linearity of the quadrature means that it is sufficient to make the quadrature exact for $1$, $x$, $x^2$, and $x^3$. Hence we write the system of equations

$$\int_{-1}^{1} x^i\,dx = A_0 x_0^i + A_1 x_1^i, \qquad i = 0, 1, 2, 3.$$
From this we can write

$$A_0 + A_1 = 2,$$
$$A_0 x_0 + A_1 x_1 = 0,$$
$$A_0 x_0^2 + A_1 x_1^2 = \frac{2}{3},$$
$$A_0 x_0^3 + A_1 x_1^3 = 0.$$

Solving for $A_0$, $A_1$, $x_0$, and $x_1$ we get

$$A_0 = A_1 = 1, \qquad x_0 = -x_1 = \frac{1}{\sqrt{3}},$$

so that the desired quadrature is

$$\int_{-1}^{1} f(x)\,dx \approx f\left(-\frac{1}{\sqrt{3}}\right) + f\left(\frac{1}{\sqrt{3}}\right). \tag{5.30}$$
Example 5.10
We repeat the previous problem using orthogonal polynomials. Since $n = 1$, we expect to find a quadrature that is exact for polynomials of degree $2n+1 = 3$. The polynomial of degree $n+1 = 2$ which is orthogonal to $\Pi_n = \Pi_1$ with weight $w(x) \equiv 1$ is the Legendre polynomial of degree 2, i.e.,

$$P_2(x) = \frac{1}{2}(3x^2 - 1).$$

The integration points will then be the zeros of $P_2(x)$, i.e.,

$$x_0 = -\frac{1}{\sqrt{3}}, \qquad x_1 = \frac{1}{\sqrt{3}}.$$

All that remains is to determine the coefficients $A_0$, $A_1$. This is done in the usual way, assuming that the quadrature

$$\int_{-1}^{1} f(x)\,dx \approx A_0 f(x_0) + A_1 f(x_1),$$
is exact for polynomials of degree $\leq 1$. The simplest will be to use $1$ and $x$, i.e.,

$$2 = \int_{-1}^{1} 1\,dx = A_0 + A_1,$$

and

$$0 = \int_{-1}^{1} x\,dx = -A_0\frac{1}{\sqrt{3}} + A_1\frac{1}{\sqrt{3}}.$$

Hence $A_0 = A_1 = 1$, and the quadrature is the same as (5.30) (as it should be).
5.6.2 Convergence and error analysis
Lemma 5.11 In a Gaussian quadrature formula, the coefficients are positive and their sum is

$$\int_a^b w(x)\,dx.$$
Proof. Fix $n$. Let $q(x) \in \Pi_{n+1}$ be $w$-orthogonal to $\Pi_n$. Also assume that $q(x_i) = 0$ for $i = 0, \ldots, n$, and take $\{x_i\}_{i=0}^{n}$ to be the quadrature points, i.e.,

$$\int_a^b f(x)w(x)\,dx \approx \sum_{i=0}^{n} A_i f(x_i). \tag{5.31}$$
Fix $0 \leq j \leq n$. Let $p(x) \in \Pi_n$ be defined as

$$p(x) = \frac{q(x)}{x-x_j}.$$

Since $x_j$ is a root of $q(x)$, $p(x)$ is indeed a polynomial of degree $n$. The degree of $p^2(x)$ is $\leq 2n$, which means that the Gaussian quadrature (5.31) is exact for it. Hence

$$0 < \int_a^b p^2(x)w(x)\,dx = \sum_{i=0}^{n} A_i p^2(x_i) = \sum_{i=0}^{n} A_i\frac{q^2(x_i)}{(x_i-x_j)^2} = A_j p^2(x_j),$$

which means that $\forall j$, $A_j > 0$. In addition, since the Gaussian quadrature is exact for $f(x) \equiv 1$, we have

$$\int_a^b w(x)\,dx = \sum_{i=0}^{n} A_i.$$
In order to estimate the error in the Gaussian quadrature we would first like to present an alternative way of deriving the Gaussian quadrature. Our starting point is the Lagrange form of the Hermite polynomial that interpolates $f(x)$ and $f'(x)$ at $x_0, \ldots, x_n$. It is given by (2.42), i.e.,

$$p(x) = \sum_{i=0}^{n} f(x_i)a_i(x) + \sum_{i=0}^{n} f'(x_i)b_i(x),$$
with

$$a_i(x) = (l_i(x))^2\left[1 + 2l_i'(x_i)(x_i - x)\right], \qquad b_i(x) = (x-x_i)l_i^2(x), \qquad 0 \leq i \leq n,$$

and

$$l_i(x) = \prod_{\substack{j=0 \\ j\neq i}}^{n} \frac{x-x_j}{x_i-x_j}.$$
We now assume that $w(x)$ is a weight function in $[a,b]$ and approximate

$$\int_a^b w(x)f(x)\,dx \approx \int_a^b w(x)p_{2n+1}(x)\,dx = \sum_{i=0}^{n} A_i f(x_i) + \sum_{i=0}^{n} B_i f'(x_i), \tag{5.32}$$

where

$$A_i = \int_a^b w(x)a_i(x)\,dx, \tag{5.33}$$

and

$$B_i = \int_a^b w(x)b_i(x)\,dx. \tag{5.34}$$
In some sense, it seems rather strange to deal with the Hermite interpolant when we do not explicitly know the values of $f'(x)$ at the interpolation points. However, we can eliminate the derivatives from the quadrature (5.32) by setting $B_i = 0$ in (5.34). Indeed:

$$B_i = \int_a^b w(x)(x-x_i)l_i^2(x)\,dx = \frac{1}{\prod_{\substack{j=0 \\ j\neq i}}^{n}(x_i-x_j)}\int_a^b w(x)\prod_{j=0}^{n}(x-x_j)\,l_i(x)\,dx.$$
Hence, $B_i = 0$ if the product $\prod_{j=0}^{n}(x-x_j)$ is orthogonal to $l_i(x)$. Since $l_i(x)$ is a polynomial in $\Pi_n$, all that we need is to set the points $x_0, \ldots, x_n$ as the roots of a polynomial of degree $n+1$ that is $w$-orthogonal to $\Pi_n$. This is precisely what we defined as a Gaussian quadrature.
We are now ready to formally establish the fact that the Gaussian quadrature is
exact for polynomials of degree 2n + 1.
Theorem 5.12 Let $f \in C^{2n+2}[a,b]$ and let $w(x)$ be a weight function. Consider the Gaussian quadrature

$$\int_a^b f(x)w(x)\,dx \approx \sum_{i=0}^{n} A_i f(x_i).$$

Then there exists $\zeta \in (a,b)$ such that

$$\int_a^b f(x)w(x)\,dx - \sum_{i=0}^{n} A_i f(x_i) = \frac{f^{(2n+2)}(\zeta)}{(2n+2)!}\int_a^b \prod_{j=0}^{n}(x-x_j)^2\, w(x)\,dx.$$
Proof. We use the characterization of the Gaussian quadrature as the integral of a Hermite interpolant. We recall that the error formula for the Hermite interpolation is given by (2.49),

$$f(x) - p_{2n+1}(x) = \frac{f^{(2n+2)}(\xi)}{(2n+2)!}\prod_{j=0}^{n}(x-x_j)^2, \qquad \xi \in (a,b).$$

Hence according to (5.32) we have

$$\int_a^b f(x)w(x)\,dx - \sum_{i=0}^{n} A_i f(x_i) = \int_a^b f(x)w(x)\,dx - \int_a^b p_{2n+1}(x)w(x)\,dx$$

$$= \int_a^b w(x)\,\frac{f^{(2n+2)}(\xi)}{(2n+2)!}\prod_{j=0}^{n}(x-x_j)^2\,dx.$$

The integral mean value theorem then implies that there exists $\zeta \in (a,b)$ such that

$$\int_a^b f(x)w(x)\,dx - \sum_{i=0}^{n} A_i f(x_i) = \frac{f^{(2n+2)}(\zeta)}{(2n+2)!}\int_a^b \prod_{j=0}^{n}(x-x_j)^2\, w(x)\,dx.$$
We conclude this section with a convergence theorem that states that for continuous functions, the Gaussian quadrature converges to the exact value of the integral as the number of quadrature points tends to infinity. This theorem is not of great practical value because it does not provide an estimate on the rate of convergence. A proof of the theorem, which is based on the Weierstrass approximation theorem, can be found, e.g., in [7].
Theorem 5.13 Let $w(x)$ be a weight function and assume that $f(x)$ is a continuous function on $[a,b]$. For each $n \in \mathbb{N}$ let $\{x_i^n\}_{i=0}^{n}$ be the $n+1$ roots of the polynomial of degree $n+1$ that is $w$-orthogonal to $\Pi_n$, and consider the corresponding Gaussian quadrature:

$$\int_a^b f(x)w(x)\,dx \approx \sum_{i=0}^{n} A_i^n f(x_i^n). \tag{5.35}$$

Then the right-hand side of (5.35) converges to the left-hand side as $n \to \infty$.
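The convergence of Theorem 5.13 is easy to observe numerically. The sketch below uses NumPy's `leggauss` (an external-library choice of ours, not part of the notes) for the nodes and weights of the $w \equiv 1$ (Legendre) case, and watches the error for $\int_{-1}^1 e^x\,dx = e - e^{-1}$ collapse as $n$ grows:

```python
import numpy as np

def gauss_legendre(f, n):
    """(n+1)-point Gauss-Legendre rule: nodes are the roots of P_{n+1}, w = 1."""
    x, w = np.polynomial.legendre.leggauss(n + 1)
    return float(np.dot(w, f(x)))

exact = np.e - 1.0 / np.e
errs = [abs(gauss_legendre(np.exp, n) - exact) for n in range(1, 8)]
```

Note that the convergence here is far faster than any fixed algebraic rate, consistent with the $f^{(2n+2)}$ factor in Theorem 5.12.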
5.7 Romberg Integration
We have introduced Richardson’s extrapolation in Section 4.4 in the context of numerical
diﬀerentiation. We can use a similar principle with numerical integration.
We will demonstrate this principle with a particular example. Let $I$ denote the exact integral that we would like to approximate, i.e.,

$$I = \int_a^b f(x)\,dx.$$

Let's assume that this integral is approximated with a composite trapezoidal rule on a uniform grid with mesh spacing $h$ (5.13),

$$T(h) = h\,{\sum_{i=0}^{n}}'' f(a+ih).$$
We know that the composite trapezoidal rule is second-order accurate (see (5.14)). A more detailed study of the quadrature error reveals that the difference between $I$ and $T(h)$ can be written as

$$I = T(h) + c_1 h^2 + c_2 h^4 + \ldots + c_k h^{2k} + O(h^{2k+2}).$$

The exact values of the coefficients, $c_k$, are of no interest to us as long as they do not depend on $h$ (which is indeed the case). We can now write a similar quadrature that is based on half the number of points, i.e., $T(2h)$. Hence

$$I = T(2h) + c_1(2h)^2 + c_2(2h)^4 + \ldots$$

This enables us to eliminate the $h^2$ error term:

$$I = \frac{4T(h) - T(2h)}{3} + \hat{c}_2 h^4 + \ldots$$
Therefore

$$\frac{4T(h) - T(2h)}{3} = \frac{1}{3}\left[4h\left(\frac{1}{2}f_0 + f_1 + \ldots + f_{n-1} + \frac{1}{2}f_n\right) - 2h\left(\frac{1}{2}f_0 + f_2 + \ldots + f_{n-2} + \frac{1}{2}f_n\right)\right]$$

$$= \frac{h}{3}\left(f_0 + 4f_1 + 2f_2 + \ldots + 2f_{n-2} + 4f_{n-1} + f_n\right) = S(n).$$
Here, S(n) denotes the composite Simpson’s rule with n subintervals. The procedure
of increasing the accuracy of the quadrature by eliminating the leading error term is
known as Romberg integration. In some places, Romberg integration is used to
describe the speciﬁc case of turning the composite trapezoidal rule into Simpson’s rule
(and so on). The quadrature that is obtained from Simpson’s rule by eliminating the
leading error term is known as the super Simpson rule.
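One Romberg step can be sketched directly from the formula above (illustrative names; `n // 2` points corresponds to the coarser rule $T(2h)$):

```python
import math

def trapezoid(f, a, b, n):
    """Composite trapezoidal rule T(h), h = (b - a)/n, as in (5.13)."""
    h = (b - a) / n
    return h * (0.5 * f(a) + sum(f(a + i * h) for i in range(1, n))
                + 0.5 * f(b))

def romberg_step(f, a, b, n):
    """Eliminate the h^2 term: (4 T(h) - T(2h)) / 3, which equals S(n)."""
    return (4.0 * trapezoid(f, a, b, n) - trapezoid(f, a, b, n // 2)) / 3.0

err_trap = abs(trapezoid(math.sin, 0.0, math.pi, 100) - 2.0)
err_romb = abs(romberg_step(math.sin, 0.0, math.pi, 100) - 2.0)
```

The extrapolated value is markedly more accurate than the trapezoidal value it was built from, as expected of the resulting fourth-order (Simpson) rule.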
6 Methods for Solving Nonlinear Problems
6.1 The Bisection Method
In this section we present the "bisection method" which is probably the most intuitive approach to root finding. We are looking for a root of a function $f(x)$ which we assume is continuous on the interval $[a,b]$. We also assume that it has opposite signs at both edges of the interval, i.e., $f(a)f(b) < 0$. We then know that $f(x)$ has at least one zero in $[a,b]$. Of course $f(x)$ may have more than one zero in the interval. The bisection method is only going to converge to one of the zeros of $f(x)$. There will also be no indication as to how many zeros $f(x)$ has in the interval, and no hints regarding where we can actually hope to find more roots, if indeed there are additional roots.
The first step is to divide the interval into two equal subintervals,

$$c = \frac{a+b}{2}.$$
This generates two subintervals, $[a,c]$ and $[c,b]$, of equal lengths. We want to keep the subinterval that is guaranteed to contain a root. Of course, in the rare event where $f(c) = 0$ we are done. Otherwise, we check if $f(a)f(c) < 0$. If yes, we keep the left subinterval $[a,c]$. If $f(a)f(c) > 0$, we keep the right subinterval $[c,b]$. This procedure repeats until the stopping criterion is satisfied: we fix a small parameter $\varepsilon > 0$ and stop when $|f(c)| < \varepsilon$. To simplify the notation, we denote the successive intervals by $[a_0,b_0]$, $[a_1,b_1]$, ... The first two iterations in the bisection method are shown in Figure 6.1. Note that in the case that is shown in the figure, the function $f(x)$ has multiple roots but the method converges to only one of them.
Figure 6.1: The first two iterations in a bisection root-finding method.
We would now like to understand if the bisection method always converges to a root. We would also like to figure out how close we are to a root after iterating the algorithm several times. We first note that

$$a_0 \leq a_1 \leq a_2 \leq \ldots \leq b_0,$$

and

$$b_0 \geq b_1 \geq b_2 \geq \ldots \geq a_0.$$
We also know that every iteration shrinks the length of the interval by a half, i.e.,

$$b_{n+1} - a_{n+1} = \frac{1}{2}(b_n - a_n), \qquad n \geq 0,$$

which means that

$$b_n - a_n = 2^{-n}(b_0 - a_0).$$
The sequences $\{a_n\}_{n\geq 0}$ and $\{b_n\}_{n\geq 0}$ are monotone and bounded, and hence converge. Also

$$\lim_{n\to\infty} b_n - \lim_{n\to\infty} a_n = \lim_{n\to\infty} 2^{-n}(b_0 - a_0) = 0,$$

so that both sequences converge to the same value. We denote that value by $r$, i.e.,

$$r = \lim_{n\to\infty} a_n = \lim_{n\to\infty} b_n.$$

Since $f(a_n)f(b_n) \leq 0$, we know that $(f(r))^2 \leq 0$, which means that $f(r) = 0$, i.e., $r$ is a root of $f(x)$.
We now assume that we stop in the interval $[a_n,b_n]$. This means that $r \in [a_n,b_n]$. Given such an interval, if we have to guess where the root is (which we know is in the interval), it is easy to see that the best estimate for the location of the root is the center of the interval, i.e.,

$$c_n = \frac{a_n + b_n}{2}.$$

In this case, we have

$$|r - c_n| \leq \frac{1}{2}(b_n - a_n) = 2^{-(n+1)}(b_0 - a_0).$$
We summarize this result with the following theorem.
Theorem 6.1 If $[a_n,b_n]$ is the interval that is obtained in the $n$th iteration of the bisection method, then the limits $\lim_{n\to\infty} a_n$ and $\lim_{n\to\infty} b_n$ exist, and

$$\lim_{n\to\infty} a_n = \lim_{n\to\infty} b_n = r,$$

where $f(r) = 0$. In addition, if

$$c_n = \frac{a_n + b_n}{2},$$

then

$$|r - c_n| \leq 2^{-(n+1)}(b_0 - a_0).$$
6.2 Newton’s Method
Newton’s method is a relatively simple, practical, and widelyused root ﬁnding method.
It is easy to see that while in some cases the method rapidly converges to a root of the
function, in some other cases it may fail to converge at all. This is one reason as of why
it is so important not only to understand the construction of the method, but also to
understand its limitations.
As always, we assume that $f(x)$ has at least one (real) root, and denote it by $r$. We start with an initial guess for the location of the root, say $x_0$. We then let $l(x)$ be the tangent line to $f(x)$ at $x_0$, i.e.,

$$l(x) - f(x_0) = f'(x_0)(x - x_0).$$
The intersection of $l(x)$ with the $x$-axis serves as the next estimate of the root. We denote this point by $x_1$ and write

$$0 - f(x_0) = f'(x_0)(x_1 - x_0),$$

which means that

$$x_1 = x_0 - \frac{f(x_0)}{f'(x_0)}. \tag{6.1}$$
In general, the Newton method (also known as the Newton-Raphson method) for finding a root is given by iterating (6.1) repeatedly, i.e.,

$$x_{n+1} = x_n - \frac{f(x_n)}{f'(x_n)}. \tag{6.2}$$
Two sample iterations of the method are shown in Figure 6.2. Starting from a point $x_n$, we find the next approximation of the root $x_{n+1}$, from which we find $x_{n+2}$ and so on. In this case, we do converge to the root of $f(x)$.

It is easy to see that Newton's method does not always converge. We demonstrate such a case in Figure 6.3. Here we consider the function $f(x) = \tan^{-1}(x)$ and show what happens if we start with a point which is a fixed point of Newton's method iterated twice. In this case, $x_0 \approx 1.3917$ is such a point.
In order to analyze the error in Newton's method we let the error in the $n$th iteration be

$$e_n = x_n - r.$$

We assume that $f''(x)$ is continuous and that $f'(r) \neq 0$, i.e., that $r$ is a simple root of $f(x)$. We will show that the method has a quadratic convergence rate, i.e.,

$$e_{n+1} \approx ce_n^2. \tag{6.3}$$

A convergence rate estimate of the type (6.3) makes sense, of course, only if the method converges. Indeed, we will prove the convergence of the method for certain functions
Figure 6.2: Two iterations in Newton's root-finding method. $r$ is the root of $f(x)$ we approach by starting from $x_n$, computing $x_{n+1}$, then $x_{n+2}$, etc.
Figure 6.3: Newton's method does not always converge. In this case, the starting point is a fixed point of Newton's method iterated twice.
$f(x)$, but before we get to the convergence issue, let's derive the estimate (6.3). We rewrite $e_{n+1}$ as

$$e_{n+1} = x_{n+1} - r = x_n - \frac{f(x_n)}{f'(x_n)} - r = e_n - \frac{f(x_n)}{f'(x_n)} = \frac{e_n f'(x_n) - f(x_n)}{f'(x_n)}.$$
Writing a Taylor expansion of $f(r)$ about $x = x_n$ we have

$$0 = f(r) = f(x_n - e_n) = f(x_n) - e_n f'(x_n) + \frac{1}{2}e_n^2 f''(\xi_n),$$

which means that

$$e_n f'(x_n) - f(x_n) = \frac{1}{2}f''(\xi_n)e_n^2.$$
Hence, the relation (6.3), $e_{n+1} \approx ce_n^2$, holds with

$$c = \frac{1}{2}\,\frac{f''(\xi_n)}{f'(x_n)}. \tag{6.4}$$

Since we assume that the method converges, in the limit as $n \to \infty$ we can replace (6.4) by

$$c = \frac{1}{2}\,\frac{f''(r)}{f'(r)}. \tag{6.5}$$
We now return to the issue of convergence and prove that for certain functions
Newton’s method converges regardless of the starting point.
Theorem 6.2 Assume that $f(x)$ has two continuous derivatives, is monotonically increasing, convex, and has a zero. Then the zero is unique and Newton's method will converge to it from every starting point.
Proof. The assumptions on the function $f(x)$ imply that $\forall x$, $f'(x) > 0$ and $f''(x) > 0$. By (6.4), the error at the $(n+1)$th iteration, $e_{n+1}$, is given by

$$e_{n+1} = \frac{1}{2}\,\frac{f''(\xi_n)}{f'(x_n)}\,e_n^2,$$

and hence it is positive, i.e., $e_{n+1} > 0$. This implies that $\forall n \geq 1$, $x_n > r$. Since $f'(x) > 0$, we have

$$f(x_n) > f(r) = 0.$$

Now, subtracting $r$ from both sides of (6.2) we may write

$$e_{n+1} = e_n - \frac{f(x_n)}{f'(x_n)}, \tag{6.6}$$
which means that $e_{n+1} < e_n$ (and hence $x_{n+1} < x_n$). Hence, both $\{e_n\}_{n\geq 0}$ and $\{x_n\}_{n\geq 0}$ are decreasing and bounded from below. This means that both sequences converge, i.e., there exists $e^*$ such that

$$e^* = \lim_{n\to\infty} e_n,$$

and there exists $x^*$ such that

$$x^* = \lim_{n\to\infty} x_n.$$

By (6.6) we have

$$e^* = e^* - \frac{f(x^*)}{f'(x^*)},$$

so that $f(x^*) = 0$, and hence $x^* = r$.
6.3 The Secant Method
We recall that Newton’s root ﬁnding method is given by equation (6.2), i.e.,
x
n+1
= x
n
−
f(x
n
)
f
(x
n
)
.
We now assume that we do not know that the function f(x) is diﬀerentiable at x
n
, and
thus can not use Newton’s method as is. Instead, we can replace the derivative f
(x
n
)
that appears in Newton’s method by a diﬀerence approximation. A particular choice of
such an approximation,
f
(x
n
) ≈
f(x
n
) −f(x
n−1
)
x
n
−x
n−1
,
leads to the secant method which is given by

$$x_{n+1} = x_n - f(x_n)\left[\frac{x_n - x_{n-1}}{f(x_n) - f(x_{n-1})}\right], \qquad n \geq 1. \tag{6.7}$$
A geometric interpretation of the secant method is shown in Figure 6.4. Given two points, $(x_{n-1}, f(x_{n-1}))$ and $(x_n, f(x_n))$, the line $l(x)$ that connects them satisfies

$$l(x) - f(x_n) = \frac{f(x_{n-1}) - f(x_n)}{x_{n-1} - x_n}(x - x_n).$$

The next approximation of the root, $x_{n+1}$, is defined as the intersection of $l(x)$ and the $x$-axis, i.e.,

$$0 - f(x_n) = \frac{f(x_{n-1}) - f(x_n)}{x_{n-1} - x_n}(x_{n+1} - x_n). \tag{6.8}$$
Figure 6.4: The secant root-finding method. The points $x_{n-1}$ and $x_n$ are used to obtain $x_{n+1}$, which is the next approximation of the root $r$.
Rearranging the terms in (6.8) we end up with the secant method (6.7).

We note that the secant method (6.7) requires two initial points. While this is an extra requirement compared with, e.g., Newton's method, we note that in the secant method there is no need to evaluate any derivatives. In addition, if implemented properly, every stage requires only one new function evaluation.
We now proceed with an error analysis for the secant method. As usual, we denote the error at the $n$th iteration by $e_n = x_n - r$. We claim that the rate of convergence of the secant method is superlinear (meaning, better than linear but less than quadratic). More precisely, we will show that it is given by

$$|e_{n+1}| \approx |e_n|^{\alpha}, \tag{6.9}$$

with

$$\alpha = \frac{1+\sqrt{5}}{2}. \tag{6.10}$$
We start by rewriting $e_{n+1}$ as

$$e_{n+1} = x_{n+1} - r = \frac{f(x_n)x_{n-1} - f(x_{n-1})x_n}{f(x_n) - f(x_{n-1})} - r = \frac{f(x_n)e_{n-1} - f(x_{n-1})e_n}{f(x_n) - f(x_{n-1})}.$$

Hence

$$e_{n+1} = e_n e_{n-1}\left[\frac{x_n - x_{n-1}}{f(x_n) - f(x_{n-1})}\right]\left[\frac{\frac{f(x_n)}{e_n} - \frac{f(x_{n-1})}{e_{n-1}}}{x_n - x_{n-1}}\right]. \tag{6.11}$$
A Taylor expansion of $f(x_n)$ about $x = r$ reads

$$f(x_n) = f(r + e_n) = f(r) + e_n f'(r) + \frac{1}{2}e_n^2 f''(r) + O(e_n^3),$$

and hence

$$\frac{f(x_n)}{e_n} = f'(r) + \frac{1}{2}e_n f''(r) + O(e_n^2).$$

We thus have

$$\frac{f(x_n)}{e_n} - \frac{f(x_{n-1})}{e_{n-1}} = \frac{1}{2}(e_n - e_{n-1})f''(r) + O(e_{n-1}^2) + O(e_n^2)$$

$$= \frac{1}{2}(x_n - x_{n-1})f''(r) + O(e_{n-1}^2) + O(e_n^2).$$
Therefore,

$$\frac{\frac{f(x_n)}{e_n} - \frac{f(x_{n-1})}{e_{n-1}}}{x_n - x_{n-1}} \approx \frac{1}{2}f''(r),$$

and

$$\frac{x_n - x_{n-1}}{f(x_n) - f(x_{n-1})} \approx \frac{1}{f'(r)}.$$

The error expression (6.11) can now be simplified to

$$e_{n+1} \approx \frac{1}{2}\,\frac{f''(r)}{f'(r)}\,e_n e_{n-1} = ce_n e_{n-1}. \tag{6.12}$$
Equation (6.12) expresses the error at iteration $n+1$ in terms of the errors at iterations $n$ and $n-1$. In order to turn this into a relation between the error at the $(n+1)$th iteration and the error at the $n$th iteration, we now assume that the order of convergence is $\alpha$, i.e.,

$$|e_{n+1}| \sim A|e_n|^{\alpha}. \tag{6.13}$$

Since (6.13) also means that $|e_n| \sim A|e_{n-1}|^{\alpha}$, we have

$$A|e_n|^{\alpha} \sim c|e_n|A^{-\frac{1}{\alpha}}|e_n|^{\frac{1}{\alpha}}.$$

This implies that

$$A^{1+\frac{1}{\alpha}}c^{-1} \sim |e_n|^{1-\alpha+\frac{1}{\alpha}}. \tag{6.14}$$

The left-hand side of (6.14) is nonzero while the right-hand side of (6.14) tends to zero as $n \to \infty$ (assuming, of course, that the method converges). This is possible only if

$$1 - \alpha + \frac{1}{\alpha} = 0,$$
102
D. Levy 6.3 The Secant Method
which, in turn, means that

$$\alpha = \frac{1+\sqrt{5}}{2}.$$

The constant $A$ in (6.13) is thus given by

$$A = c^{\frac{1}{1+\frac{1}{\alpha}}} = c^{\frac{1}{\alpha}} = c^{\alpha-1} = \left[\frac{f''(r)}{2f'(r)}\right]^{\alpha-1}.$$
We summarize this result with the theorem:

Theorem 6.3 Assume that $f''(x)$ is continuous $\forall x$ in an interval $I$. Assume that $f(r) = 0$ and that $f'(r) \neq 0$. If $x_0$, $x_1$ are sufficiently close to the root $r$, then $x_n \to r$. In this case, the convergence is of order $\frac{1+\sqrt{5}}{2}$.
References

[1] Atkinson K., An introduction to numerical analysis, Second edition, John Wiley & Sons, New York, NY, 1989

[2] Cheney E.W., Introduction to approximation theory, Second edition, Chelsea Publishing Company, New York, NY, 1982

[3] Dahlquist G., Björck Å., Numerical methods, Prentice-Hall, Englewood Cliffs, NJ, 1974

[4] Davis P.J., Interpolation and approximation, Second edition, Dover, New York, NY, 1975

[5] Isaacson E., Keller H.B., Analysis of numerical methods, Second edition, Dover, Mineola, NY, 1994

[6] Stoer J., Bulirsch R., Introduction to numerical analysis, Second edition, Springer-Verlag, New York, NY, 1993

[7] Süli E., Mayers D., An introduction to numerical analysis, Cambridge University Press, Cambridge, UK, 2003.
Index
L2 norm . . . . . . . . . . . . . . . . . . . . . . . . . 36, 48
weighted . . . . . . . . . . . . . . . . . . . . . 52, 53
L∞ norm . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
approximation
best approximation . . . . . . . . . . . . . . 42
existence . . . . . . . . . . . . . . . . . . . . . . 42
least-squares . . . . . . . . . . . . . . . . . . . . 48
Hilbert matrix. . . . . . . . . . . . . . . . . 49
orthogonal polynomials . . . . . . . . 50
solution . . . . . . . . . . . . . . . . . . . . . . . 48
minimax. . . . . . . . . . . . . . . . . . . . . . . . . 41
oscillating theorem . . . . . . . . . . . . 44
Remez . . . . . . . . . . . . . . . . . . . . . . . . . 46
uniqueness. . . . . . . . . . . . . . . . . . . . . 45
near minimax . . . . . . . . . . . . . . . . . . . 46
Weierstrass . . . . . . . . . . . . . . . . . . . . . . 37
Bernstein polynomials . . . . . . . . . . . . . . . 37
Bessel’s inequality . . . . . . . . . . . . . . . . . . . 59
Chebyshev
near minimax interpolation . . . . . . 46
points . . . . . . . . . . . . . . . . . . . . . . . 18, 46
polynomials. . . . . . . . . . . . . . . . . . 15, 56
Chebyshev uniqueness theorem . . . . . . 45
de la Vallée-Poussin . . . . . . . . . . . . . . . . . 44
diﬀerentiation . . . . . . . . . . . . . . . . . . . . . . . 65
accuracy. . . . . . . . . . . . . . . . . . . . . . . . . 66
backward diﬀerencing. . . . . . . . . . . . 66
centered diﬀerencing. . . . . . . . . . . . . 66
forward diﬀerencing . . . . . . . . . . . . . 66
one-sided differencing . . . . . . . . . . . . 66
Richardson’s extrapolation. . . . . . . 72
truncation error . . . . . . . . . . . . . . . . . 66
undetermined coeﬃcients . . . . . . . . 70
via interpolation . . . . . . . . . . . . . 67, 68
divided diﬀerences . . . . . . . . . . . . . . . 10, 14
with repetitions . . . . . . . . . . . . . . . . . 23
Gram-Schmidt . . . . . . . . . . . . . . . . . . . . . 53
Hermite polynomials . . . . . . . . . . . . . . . . 57
Hilbert matrix. . . . . . . . . . . . . . . . . . . . . . . 49
inner product . . . . . . . . . . . . . . . . . . . . . . . 53
weighted . . . . . . . . . . . . . . . . . . . . . . . . 53
integration
Gaussian . . . . . . . . . . . . . . . . . . . . . . . . 87
orthogonal polynomials . . . . . . . . 88
midpoint rule. . . . . . . . . . . . . . . . . . . . 75
composite . . . . . . . . . . . . . . . . . . . . . 81
Newton-Cotes . . . . . . . . . . . . . . . . . . 77
quadratures . . . . . . . . . . . . . . . . . . . . . 75
weighted. . . . . . . . . . . . . . . . . . . . . . . 84
rectangular rule . . . . . . . . . . . . . . . . . 75
Riemann . . . . . . . . . . . . . . . . . . . . . . . . 74
Romberg. . . . . . . . . . . . . . . . . . . . . 93, 94
Simpson’s rule . . . . . . . . . . . . . . . 83, 85
composite . . . . . . . . . . . . . . . . . . . . . 86
error . . . . . . . . . . . . . . . . . . . . . . . 85, 87
super Simpson. . . . . . . . . . . . . . . . . . . 94
trapezoidal rule . . . . . . . . . . . . . . 77, 78
composite . . . . . . . . . . . . . . . . . . . . . 79
undetermined coeﬃcients . . . . . . . . 82
interpolation
Chebyshev points . . . . . . . . . . . . 15, 18
divided diﬀerences . . . . . . . . . . . 10, 14
with repetitions. . . . . . . . . . . . . . . . 23
error . . . . . . . . . . . . . . . . . . . . . . . . . 12, 15
existence . . . . . . . . . . . . . . . . . . . . . . . . . 3
Hermite . . . . . . . . . . . . . . . . . . . . . . . . . 21
Lagrange form. . . . . . . . . . . . . . . . . 25
Newton’s form. . . . . . . . . . . . . . . . . 23
interpolation error . . . . . . . . . . . . . . . . 3
interpolation points . . . . . . . . . . . . . . . 3
Lagrange form. . . . . . . . . . . . . . . . . . . . 8
near minimax . . . . . . . . . . . . . . . . . . . 46
Newton’s form . . . . . . . . . . . . . . . . 5, 10
divided diﬀerences . . . . . . . . . . . . . 10
polynomial interpolation. . . . . . . . . . 3
splines. . . . . . . . . . . . . . . . . . . . . . . . . . . 28
cubic . . . . . . . . . . . . . . . . . . . . . . . . . . 30
degree . . . . . . . . . . . . . . . . . . . . . . . . . 28
knots . . . . . . . . . . . . . . . . . . . . . . . . . . 28
natural . . . . . . . . . . . . . . . . . . . . 33, 34
not-a-knot . . . . . . . . . . . . . . . . . . . . 33
uniqueness . . . . . . . . . . . . . . . . . . . . . 3, 7
Vandermonde determinant . . . . . . . . 6
weighted least squares . . . . . . . . . . . 52
Lagrange form . . . . . . . . see interpolation
Laguerre polynomials . . . . . . . . . . . . 57, 62
least-squares . . . . . . . see approximation
weighted . . . . . . . . . . see approximation
Legendre polynomials . . . . . . . . 56, 59, 60
maximum norm . . . . . . . . . . . . . . . . . . . . . 36
minimax error . . . . . . . . . . . . . . . . . . . 41, 43
monic polynomial . . . . . . . . . . . . . . . . . . . 16
Newton’s form . . . . . . . . see interpolation
norm. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
orthogonal polynomials . . . 50, 51, 53, 88
Bessel’s inequality . . . . . . . . . . . . . . . 59
Chebyshev . . . . . . . . . . . . . . . . . . . . . . 56
Gram-Schmidt . . . . . . . . . . . . . . . . . . . 53
Hermite . . . . . . . . . . . . . . . . . . . . . . . . . 57
Laguerre . . . . . . . . . . . . . . . . . . . . . 57, 62
Legendre . . . . . . . . . . . . . . . . . 56, 59, 60
Parseval’s equality. . . . . . . . . . . . . . . 59
roots of . . . . . . . . . . . . . . . . . . . . . . . . . . 63
triple recursion relation. . . . . . . . . . 63
oscillating theorem . . . . . . . . . . . . . . . . . . 44
Parseval’s equality. . . . . . . . . . . . . . . . . . . 59
quadratures . . . . . . . . . . . . . see integration
Remez algorithm . . . . . . . . . . . . . . . . . . . . 46
Richardson’s extrapolation. . . . . . . 72, 93
Riemann sums . . . . . . . . . . . . . . . . . . . . . . 74
Romberg integration. . . . . . . . . . . . . 93, 94
root ﬁnding
Newton’s method. . . . . . . . . . . . . . . . 97
the bisection method . . . . . . . . . . . . 95
the secant method. . . . . . . . . . . . . . 100
splines . . . . . . . . . . . . . . . . see interpolation
Taylor series. . . . . . . . . . . . . . . . . . . . . . . . . 23
triangle inequality . . . . . . . . . . . . . . . . . . . 36
Vandermonde determinant . . . . . . . . . . . . 6
Weierstrass approximation theorem. . 37
Contents

Preface . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . i

1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

2 Interpolation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
2.1 What is Interpolation? . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
2.2 The Interpolation Problem . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2.3 Newton's Form of the Interpolation Polynomial . . . . . . . . . . . . . . 5
2.4 The Interpolation Problem and the Vandermonde Determinant . . . . . . 6
2.5 The Lagrange Form of the Interpolation Polynomial . . . . . . . . . . . . 7
2.6 Divided Differences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.7 The Error in Polynomial Interpolation . . . . . . . . . . . . . . . . . . . 12
2.8 Interpolation at the Chebyshev Points . . . . . . . . . . . . . . . . . . . 15
2.9 Hermite Interpolation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
2.9.1 Divided differences with repetitions . . . . . . . . . . . . . . . . . 23
2.9.2 The Lagrange form of the Hermite interpolant . . . . . . . . . . . 25
2.10 Spline Interpolation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.10.1 Cubic splines . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.10.2 What is natural about the natural spline? . . . . . . . . . . . . . 34

3 Approximations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.2 The Minimax Approximation Problem . . . . . . . . . . . . . . . . . . . 41
3.2.1 Existence of the minimax polynomial . . . . . . . . . . . . . . . . 42
3.2.2 Bounds on the minimax error . . . . . . . . . . . . . . . . . . . . 43
3.2.3 Characterization of the minimax polynomial . . . . . . . . . . . . 44
3.2.4 Uniqueness of the minimax polynomial . . . . . . . . . . . . . . . 45
3.2.5 The near-minimax polynomial . . . . . . . . . . . . . . . . . . . . 46
3.2.6 Construction of the minimax polynomial . . . . . . . . . . . . . . 46
3.3 Least-squares Approximations . . . . . . . . . . . . . . . . . . . . . . . . 48
3.3.1 The least-squares approximation problem . . . . . . . . . . . . . 48
3.3.2 Solving the least-squares problem: a direct method . . . . . . . . 48
3.3.3 Solving the least-squares problem: with orthogonal polynomials . 50
3.3.4 The weighted least squares problem . . . . . . . . . . . . . . . . . 52
3.3.5 Orthogonal polynomials . . . . . . . . . . . . . . . . . . . . . . . 53
3.3.6 Another approach to the least-squares problem . . . . . . . . . . 58
3.3.7 Properties of orthogonal polynomials . . . . . . . . . . . . . . . . 63

4 Numerical Differentiation . . . . . . . . . . . . . . . . . . . . . . . . . . 65
4.1 Basic Concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
4.2 Differentiation Via Interpolation . . . . . . . . . . . . . . . . . . . . . . 67
4.3 The Method of Undetermined Coefficients . . . . . . . . . . . . . . . . . 70
4.4 Richardson's Extrapolation . . . . . . . . . . . . . . . . . . . . . . . . . 72

5 Numerical Integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
5.1 Basic Concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
5.2 Integration via Interpolation . . . . . . . . . . . . . . . . . . . . . . . . . 77
5.3 Composite Integration Rules . . . . . . . . . . . . . . . . . . . . . . . . . 79
5.4 Additional Integration Techniques . . . . . . . . . . . . . . . . . . . . . 82
5.4.1 The method of undetermined coefficients . . . . . . . . . . . . . . 82
5.4.2 Change of an interval . . . . . . . . . . . . . . . . . . . . . . . . . 83
5.4.3 General integration formulas . . . . . . . . . . . . . . . . . . . . . 84
5.5 Simpson's Integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
5.5.1 The quadrature error . . . . . . . . . . . . . . . . . . . . . . . . . 85
5.5.2 Composite Simpson rule . . . . . . . . . . . . . . . . . . . . . . . 86
5.6 Gaussian Quadrature . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
5.6.1 Maximizing the quadrature's accuracy . . . . . . . . . . . . . . . 87
5.6.2 Convergence and error analysis . . . . . . . . . . . . . . . . . . . 91
5.7 Romberg Integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93

6 Methods for Solving Nonlinear Problems . . . . . . . . . . . . . . . . . 95
6.1 The Bisection Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
6.2 Newton's Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
6.3 The Secant Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100

Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
1 Introduction
2 Interpolation

2.1 What is Interpolation?

Imagine that there is an unknown function f(x) for which someone supplies you with its (exact) values at (n + 1) distinct points x0 < x1 < · · · < xn, i.e., f(x0), . . . , f(xn) are given. The interpolation problem is to construct a function Q(x) that passes through these points, i.e., to find a function Q(x) such that the interpolation requirements

Q(xj) = f(xj),   0 ≤ j ≤ n,   (2.1)

are satisfied (see Figure 2.1). One easy way of finding such a function is to connect the given points with straight lines. While this is a legitimate solution of the interpolation problem, usually (though not always) we are interested in a different kind of a solution, e.g., a smoother function. We therefore always specify a certain class of functions from which we would like to find one that solves the interpolation problem. For example, we may look for a polynomial, Q(x), that passes through these points. Alternatively, we may look for a trigonometric function or a piecewise-smooth polynomial such that the interpolation requirements (2.1) are satisfied.

[Figure 2.1: The function f(x), the interpolation points x0, x1, x2, and the interpolating polynomial Q(x)]

As a simple example let's consider values of a function that are prescribed at two points: (x0, f(x0)) and (x1, f(x1)). There are infinitely many functions that pass through these two points. However, if we limit ourselves to polynomials of degree less than or equal to one, there is only one such function that passes through these two points, which is nothing but the line that connects them. A line, in general, is a polynomial of degree
one. However, it could be that f(x0) = f(x1), in which case the line that connects the two points is the constant Q0(x) ≡ f(x0), which is a polynomial of degree zero. This is why we say that there is a unique polynomial of degree ≤ 1 that connects these two points (and not "a polynomial of degree 1").

The points x0, . . . , xn are called the interpolation points. The property of "passing through these points" is referred to as interpolating the data. The function that interpolates the data is an interpolant or an interpolating polynomial (or whatever function is being used).

If we do not limit the degree of the interpolation polynomial, it is easy to see that there are infinitely many polynomials that interpolate the data; for example, there are infinitely many polynomials that interpolate between (x0, f(x0)) and (x1, f(x1)). However, limiting the degree singles out precisely one interpolant that will do the job; e.g., there is only one polynomial of degree ≤ 1 that does the job.

It is important to note that it is possible to formulate the interpolation problem without referring to (or even assuming the existence of) any underlying function f(x). For example, you may have a list of interpolation points x0, . . . , xn, and data that is given at these points, y0, y1, . . . , yn, which you would like to interpolate. The solution to this interpolation problem is identical to the one where the values are taken from an underlying function.

Sometimes the interpolation problem has a solution that is unique; there are also cases where the interpolation problem has no solution, or has more than one solution. What we are going to study in this section is precisely how to distinguish between these cases.

In general, at any point other than the interpolation points, Q(x) and f(x) will not have the same values. The function Q(x) that interpolates f(x) at the interpolation points will still be identical to f(x) at these points, because there we satisfy the interpolation conditions (2.1); at any other point, however, there is little hope that the interpolant will be identical to the unknown function f(x). The interpolation error is a measure of how different these two functions are. We will study ways of estimating the interpolation error. We will also discuss strategies on how to minimize this error.

2.2 The Interpolation Problem

We begin our study with the problem of polynomial interpolation: Given n + 1 distinct points x0, . . . , xn, we seek a polynomial Qn(x) of the lowest degree such that the following interpolation conditions are satisfied:

Qn(xj) = f(xj),   j = 0, . . . , n.   (2.2)

Note that we do not assume any ordering between the points x0, . . . , xn, as such an order will make no difference for the present discussion. It turns out that limiting the degree to ≤ n guarantees that the interpolation problem (2.2) has a unique solution. This result is formally stated in the following theorem:

Theorem 2.1 If x0, . . . , xn ∈ R are distinct, then for any f(x0), . . . , f(xn) there exists a unique polynomial Qn(x) of degree ≤ n such that the interpolation conditions (2.2) are satisfied.
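Theorem 2.1 can also be checked numerically. The following sketch is not part of the original notes; it uses NumPy's `polyfit` merely as a stand-in for any valid construction of Qn (the points and values are arbitrary choices of ours), and verifies that the degree-≤ 3 polynomial through four distinct points reproduces the data:

```python
# Illustrative only: through n + 1 = 4 distinct points there is a unique
# polynomial of degree <= 3, so any correct construction must reproduce
# the prescribed data exactly (up to rounding).
import numpy as np

xs = np.array([-1.0, 0.0, 1.0, 2.0])          # distinct interpolation points
fs = np.array([9.0, 5.0, 3.0, 7.0])           # prescribed values f(x_j)

coeffs = np.polyfit(xs, fs, deg=len(xs) - 1)  # coefficients of Q_3
Q = np.poly1d(coeffs)

# the interpolation conditions Q(x_j) = f(x_j) of (2.2)
assert np.allclose(Q(xs), fs)
```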
Proof. We start with the existence part and prove the result by induction. For n = 0, Q0 = f(x0). Suppose that Qn−1 is a polynomial of degree ≤ n − 1, and suppose also that Qn−1(xj) = f(xj), 0 ≤ j ≤ n − 1. Let us now construct from Qn−1(x) a new polynomial, Qn(x), in the following way:

Qn(x) = Qn−1(x) + c(x − x0) · . . . · (x − xn−1).   (2.3)

The constant c in (2.3) is yet to be determined. Clearly, the construction of Qn(x) implies that deg(Qn(x)) ≤ n. In addition, the polynomial Qn(x) satisfies the interpolation requirements Qn(xj) = f(xj) for 0 ≤ j ≤ n − 1. All that remains is to determine the constant c in such a way that the last interpolation condition, Qn(xn) = f(xn), is satisfied, i.e.,

Qn(xn) = Qn−1(xn) + c(xn − x0) · . . . · (xn − xn−1).   (2.4)

The condition (2.4) defines c as

c = (f(xn) − Qn−1(xn)) / ∏_{j=0}^{n−1} (xn − xj),   (2.5)

and we are done with the proof of existence.

As for uniqueness, suppose that there are two polynomials Qn(x), Pn(x) of degree ≤ n that satisfy the interpolation conditions (2.2). Define a polynomial Hn(x) as the difference Hn(x) = Qn(x) − Pn(x). The degree of Hn(x) is at most n, which means that it can have at most n zeros (unless it is identically zero). However, since both Qn(x) and Pn(x) satisfy the interpolation requirements (2.2), we have Hn(xj) = (Qn − Pn)(xj) = 0 for 0 ≤ j ≤ n, which means that Hn(x) has n + 1 distinct zeros. This leads to a contradiction that can be resolved only if Hn(x) is the zero polynomial, i.e., Pn(x) ≡ Qn(x), and uniqueness is established.
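The existence part of the proof is constructive, and it can be carried out directly in code. The sketch below is ours, not the notes'; it implements the recursion (2.3)–(2.5): at each step the next coefficient c is obtained from the mismatch at the newest point.

```python
def newton_coefficients(xs, fs):
    """Coefficients a_j of the incremental construction (2.3)-(2.5):
    a_j = (f(x_j) - Q_{j-1}(x_j)) / prod_{k<j} (x_j - x_k)."""
    a = [fs[0]]
    for j in range(1, len(xs)):
        # evaluate Q_{j-1}(x_j) from the coefficients found so far
        qj = 0.0
        for i, ai in enumerate(a):
            term = ai
            for k in range(i):
                term *= xs[j] - xs[k]
            qj += term
        denom = 1.0
        for k in range(j):
            denom *= xs[j] - xs[k]
        a.append((fs[j] - qj) / denom)
    return a

def newton_eval(a, xs, x):
    """Evaluate a_0 + a_1(x-x_0) + a_2(x-x_0)(x-x_1) + ... by nesting."""
    result = a[-1]
    for j in range(len(a) - 2, -1, -1):
        result = result * (x - xs[j]) + a[j]
    return result
```

For the data (−1, 9), (0, 5), (1, 3) used later in Example 2.7, this produces the coefficients 9, −4, 1, i.e. Q2(x) = 9 − 4(x + 1) + (x + 1)x.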
2.3 Newton's Form of the Interpolation Polynomial

One good thing about the proof of Theorem 2.1 is that it is constructive. In other words, we can use the proof to write down a formula for the interpolation polynomial. We follow the procedure given by (2.4) for reconstructing the interpolation polynomial. We do it in the following way:

• Let Q0(x) = a0, where a0 = f(x0).

• Let Q1(x) = a0 + a1(x − x0). Following (2.5) we have

a1 = (f(x1) − Q0(x1))/(x1 − x0) = (f(x1) − f(x0))/(x1 − x0).

We note that Q1(x) is nothing but the straight line connecting the two points (x0, f(x0)) and (x1, f(x1)).

• In general, let

Qn(x) = a0 + a1(x − x0) + . . . + an(x − x0) · . . . · (x − xn−1) = a0 + Σ_{j=1}^{n} aj ∏_{k=0}^{j−1} (x − xk).   (2.6)

The coefficients aj in (2.6) are given by

a0 = f(x0),   aj = (f(xj) − Qj−1(xj)) / ∏_{k=0}^{j−1} (xj − xk),   1 ≤ j ≤ n.   (2.7)

We refer to the interpolation polynomial when written in the form (2.6)–(2.7) as the Newton form of the interpolation polynomial. As we shall see below, there are various ways of writing the interpolation polynomial. The uniqueness of the interpolation polynomial as guaranteed by Theorem 2.1 implies that we will only be rewriting the same polynomial in different ways.

Example 2.2
The Newton form of the polynomial that interpolates (x0, f(x0)) and (x1, f(x1)) is

Q1(x) = f(x0) + ((f(x1) − f(x0))/(x1 − x0)) (x − x0).
Example 2.3
The Newton form of the polynomial that interpolates the three points (x0, f(x0)), (x1, f(x1)), and (x2, f(x2)) is

Q2(x) = f(x0) + ((f(x1) − f(x0))/(x1 − x0)) (x − x0) + [ (f(x2) − f(x0) − ((f(x1) − f(x0))/(x1 − x0)) (x2 − x0)) / ((x2 − x0)(x2 − x1)) ] (x − x0)(x − x1).

2.4 The Interpolation Problem and the Vandermonde Determinant

An alternative approach to the interpolation problem is to consider directly a polynomial of the form

Qn(x) = Σ_{k=0}^{n} bk x^k,   (2.8)

and require that the following interpolation conditions are satisfied:

Qn(xj) = f(xj),   0 ≤ j ≤ n.   (2.9)

In view of Theorem 2.1 we already know that this problem has a unique solution, so we should be able to compute directly the coefficients of the polynomial as given in (2.8). Indeed, the interpolation conditions (2.9) imply that the following equations should hold:

b0 + b1 xj + . . . + bn xj^n = f(xj),   j = 0, . . . , n.   (2.10)

In matrix form, (2.10) can be rewritten as

| 1  x0  . . .  x0^n |   | b0 |   | f(x0) |
| 1  x1  . . .  x1^n |   | b1 |   | f(x1) |
| .   .   . . .   .  | · | .  | = |   .   |   (2.11)
| 1  xn  . . .  xn^n |   | bn |   | f(xn) |

In order for the system (2.11) to have a unique solution, it has to be nonsingular. This means, e.g., that the determinant of its coefficients matrix must not vanish, i.e.,

det( 1  xj  . . .  xj^n )_{j=0,...,n} ≠ 0.   (2.12)

The determinant (2.12) is known as the Vandermonde determinant. We leave it as an exercise to verify that it equals

∏_{i>j} (xi − xj).   (2.13)
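The linear system (2.11) can be assembled and solved directly; the short sketch below (ours, with an arbitrary choice of data) builds the Vandermonde matrix, checks that its determinant matches the product formula (2.13), and solves for the coefficients bk:

```python
# Illustrative sketch of the Vandermonde approach (2.8)-(2.13).
import numpy as np

xs = np.array([-1.0, 0.0, 1.0])
fs = np.array([9.0, 5.0, 3.0])

# Row j of (2.11) is (1, x_j, x_j^2, ...); increasing=True gives that order.
V = np.vander(xs, increasing=True)

# Distinct points => det V = prod_{i>j} (x_i - x_j) != 0, so we may solve.
prod_formula = np.prod([xs[i] - xs[j]
                        for i in range(len(xs)) for j in range(i)])
assert np.isclose(np.linalg.det(V), prod_formula)

b = np.linalg.solve(V, fs)   # coefficients b_0, b_1, b_2 of Q_2
```

For this data the solution is b = (5, −3, 1), i.e. Q2(x) = 5 − 3x + x², the same polynomial that the Newton form produces, as Theorem 2.1 requires.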
Since we assume that the points x0, . . . , xn are distinct, the determinant (2.13) is indeed nonzero. Hence, the system (2.11) has a solution that is also unique, which confirms what we already know according to Theorem 2.1.

2.5 The Lagrange Form of the Interpolation Polynomial

The form of the interpolation polynomial that we used in (2.8) assumed a linear combination of polynomials of degrees 0, . . . , n, in which the coefficients were unknown. In this section we take a different approach and assume that the interpolation polynomial is given as a linear combination of n + 1 polynomials of degree ≤ n. This time, we set the coefficients as the interpolated values, {f(xj)}_{j=0}^{n}, while the unknowns are the polynomials. We thus let

Qn(x) = Σ_{j=0}^{n} f(xj) lj^n(x),   (2.14)

where lj^n(x) are n + 1 polynomials of degree ≤ n. We use two indices in these polynomials: the subscript j enumerates lj^n(x) from 0 to n, and the superscript n is used to remind us that the degree of lj^n(x) is ≤ n. Note that in this particular case, the polynomials lj^n(x) are precisely of degree n (and not ≤ n). However, Qn(x), given by (2.14), may have a lower degree. In either case, the degree of Qn(x) is ≤ n at the most. We now require that Qn(x) satisfies the interpolation conditions

Qn(xi) = f(xi),   0 ≤ i ≤ n.   (2.15)

By substituting xi for x in (2.14) we have

Qn(xi) = Σ_{j=0}^{n} f(xj) lj^n(xi).

In view of (2.15) we may conclude that lj^n(x) must satisfy

lj^n(xi) = δij,   0 ≤ i, j ≤ n,   (2.16)

where δij is the Kronecker delta: δij = 1 if i = j, and δij = 0 if i ≠ j. One obvious way of constructing polynomials lj^n of degree ≤ n that satisfy (2.16) is the following:

lj^n(x) = [(x − x0) · . . . · (x − xj−1)(x − xj+1) · . . . · (x − xn)] / [(xj − x0) · . . . · (xj − xj−1)(xj − xj+1) · . . . · (xj − xn)],   0 ≤ j ≤ n.   (2.17)

Note that the denominator in (2.17) does not vanish since we assume that all interpolation points are distinct, and hence the polynomials lj^n(x) are well defined.
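The defining property (2.16) is easy to confirm in code. This sketch (not from the notes) evaluates the basis polynomials of (2.17) and assembles Qn(x) as in (2.14):

```python
def lagrange_basis(xs, j, x):
    """l_j^n(x) of (2.17): the product of (x - x_i)/(x_j - x_i) over i != j."""
    value = 1.0
    for i, xi in enumerate(xs):
        if i != j:
            value *= (x - xi) / (xs[j] - xi)
    return value

def lagrange_interpolate(xs, fs, x):
    """Q_n(x) = sum_j f(x_j) l_j^n(x), the Lagrange form (2.14)."""
    return sum(f * lagrange_basis(xs, j, x) for j, f in enumerate(fs))
```

At the interpolation points, lagrange_basis reproduces the Kronecker delta of (2.16), so lagrange_interpolate returns the prescribed data exactly.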
The Lagrange form of the interpolation polynomial is the polynomial Qn(x) given by (2.14), where the polynomials lj^n(x) of degree ≤ n are given by

lj^n(x) = ∏_{i=0, i≠j}^{n} (x − xi) / (xj − xi),   0 ≤ j ≤ n.   (2.18)

Example 2.4
We are interested in finding the Lagrange form of the interpolation polynomial that interpolates two points: (x0, f(x0)) and (x1, f(x1)). We know that the unique interpolation polynomial through these two points is the line that connects the two points. Such a line can be written in many different forms. In order to obtain the Lagrange form we let

l0^1(x) = (x − x1)/(x0 − x1),   l1^1(x) = (x − x0)/(x1 − x0).

The desired polynomial is therefore given by the familiar formula

Q1(x) = f(x0) l0^1(x) + f(x1) l1^1(x) = f(x0) (x − x1)/(x0 − x1) + f(x1) (x − x0)/(x1 − x0).

Example 2.5
This time we are looking for the Lagrange form of the interpolation polynomial, Q2(x), that interpolates three points: (x0, f(x0)), (x1, f(x1)), (x2, f(x2)). Unfortunately, the Lagrange form of the interpolation polynomial does not let us use the interpolation polynomial through the first two points, Q1(x), as a building block for Q2(x). This means that we have to compute everything from scratch. We start with

l0^2(x) = (x − x1)(x − x2) / ((x0 − x1)(x0 − x2)),
l1^2(x) = (x − x0)(x − x2) / ((x1 − x0)(x1 − x2)),
l2^2(x) = (x − x0)(x − x1) / ((x2 − x0)(x2 − x1)).

The interpolation polynomial is therefore given by

Q2(x) = f(x0) l0^2(x) + f(x1) l1^2(x) + f(x2) l2^2(x)
      = f(x0) (x − x1)(x − x2)/((x0 − x1)(x0 − x2)) + f(x1) (x − x0)(x − x2)/((x1 − x0)(x1 − x2)) + f(x2) (x − x0)(x − x1)/((x2 − x0)(x2 − x1)).

It is easy to verify that indeed Q2(xj) = f(xj) for j = 0, 1, 2, as desired.
Remarks.

1. One instance where the Lagrange form of the interpolation polynomial may seem to be advantageous when compared with the Newton form is when you are interested in solving several interpolation problems, all given at the same interpolation points x0, . . . , xn but with different values f(x0), . . . , f(xn). In this case, the polynomials lj^n(x) are identical for all problems since they depend only on the points but not on the values of the function at these points. Therefore, they have to be constructed only once.

2. An alternative form for lj^n(x) can be obtained in the following way. If we define

wn(x) = ∏_{i=0}^{n} (x − xi),   (2.19)

then when we differentiate wn(x) and evaluate the derivative at any interpolation point, xj, there is only one term in the sum that does not vanish:

wn′(xj) = ∏_{i=0, i≠j}^{n} (xj − xi).

Hence, lj^n(x) can be rewritten as

lj^n(x) = wn(x) / ((x − xj) wn′(xj)),   0 ≤ j ≤ n.   (2.20)

3. For future reference we note that the coefficient of x^n in the interpolation polynomial Qn(x) is

Σ_{j=0}^{n} f(xj) / ∏_{k=0, k≠j}^{n} (xj − xk).   (2.21)

For example, the coefficient of x in Q1(x) in Example 2.4 is

f(x0)/(x0 − x1) + f(x1)/(x1 − x0).
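The equivalence of the two expressions for the basis polynomials, (2.17) and (2.20), can be spot-checked numerically. The sketch below is ours (the sample point 0.3 and index j = 1 are arbitrary choices):

```python
# Check that l_j^n(x) from (2.20) agrees with the direct product (2.17).
import numpy as np

xs = np.array([-1.0, 0.0, 1.0])
j, x = 1, 0.3

wn = np.prod(x - xs)                                   # w_n(x) of (2.19)
wn_prime_at_xj = np.prod([xs[j] - xi                   # w_n'(x_j)
                          for i, xi in enumerate(xs) if i != j])

lj_via_wn = wn / ((x - xs[j]) * wn_prime_at_xj)        # form (2.20)
lj_direct = np.prod([(x - xi) / (xs[j] - xi)           # form (2.17)
                     for i, xi in enumerate(xs) if i != j])

assert np.isclose(lj_via_wn, lj_direct)
```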
2.6 Divided Differences

We recall that Newton's form of the interpolation polynomial is

Qn(x) = a0 + a1(x − x0) + . . . + an(x − x0) · . . . · (x − xn−1),

with a0 = f(x0) and

aj = (f(xj) − Qj−1(xj)) / ∏_{k=0}^{j−1} (xj − xk),   1 ≤ j ≤ n.

We name the j-th coefficient, aj, as the j-th order divided difference. The j-th order divided difference, aj, is based on the points x0, . . . , xj and on the values of the function at these points, f(x0), . . . , f(xj). To emphasize this dependence, we use the following notation:

aj = f[x0, . . . , xj],   1 ≤ j ≤ n.   (2.22)

We also denote the zeroth-order divided difference as a0 = f[x0], where f[x0] = f(x0). When written in terms of the divided differences, the Newton form of the interpolation polynomial becomes

Qn(x) = f[x0] + f[x0, x1](x − x0) + . . . + f[x0, . . . , xn] ∏_{k=0}^{n−1} (x − xk).   (2.23)

There is a simple way of computing the j-th order divided difference from lower-order divided differences. This is given by the following lemma.

Lemma 2.6 The divided differences satisfy:

f[x0, . . . , xn] = (f[x1, . . . , xn] − f[x0, . . . , xn−1]) / (xn − x0).

Proof. For any k, 0 ≤ k ≤ n, we denote by Qk(x) the unique polynomial of degree ≤ k that interpolates f(x) at x0, . . . , xk, i.e., Qk(xj) = f(xj), 0 ≤ j ≤ k. We now consider the unique polynomial P(x) of degree ≤ n − 1 that interpolates f(x) at x1, . . . , xn. It is easy to verify that

Qn(x) = P(x) + ((x − xn)/(xn − x0)) [P(x) − Qn−1(x)].   (2.24)
The coefficient of x^n on the left-hand side of (2.24) is f[x0, . . . , xn]. The coefficient of x^{n−1} in P(x) is f[x1, . . . , xn] and the coefficient of x^{n−1} in Qn−1(x) is f[x0, . . . , xn−1]. Hence, the coefficient of x^n on the right-hand side of (2.24) is

(1/(xn − x0)) (f[x1, . . . , xn] − f[x0, . . . , xn−1]),

which means that

f[x0, . . . , xn] = (f[x1, . . . , xn] − f[x0, . . . , xn−1]) / (xn − x0).

Example 2.7
The second-order divided difference is

f[x0, x1, x2] = (f[x1, x2] − f[x0, x1]) / (x2 − x0) = [ (f(x2) − f(x1))/(x2 − x1) − (f(x1) − f(x0))/(x1 − x0) ] / (x2 − x0).

Hence, the unique polynomial that interpolates (x0, f(x0)), (x1, f(x1)), and (x2, f(x2)) is

Q2(x) = f[x0] + f[x0, x1](x − x0) + f[x0, x1, x2](x − x0)(x − x1)
      = f(x0) + ((f(x1) − f(x0))/(x1 − x0)) (x − x0) + [ ((f(x2) − f(x1))/(x2 − x1) − (f(x1) − f(x0))/(x1 − x0)) / (x2 − x0) ] (x − x0)(x − x1).

For example, if we want to find the polynomial of degree ≤ 2 that interpolates (−1, 9), (0, 5), and (1, 3), we have

f(−1) = 9,
f[−1, 0] = (5 − 9)/(0 − (−1)) = −4,
f[0, 1] = (3 − 5)/(1 − 0) = −2,

so that

f[−1, 0, 1] = (f[0, 1] − f[−1, 0])/(1 − (−1)) = (−2 + 4)/2 = 1.

Hence,

Q2(x) = 9 − 4(x + 1) + (x + 1)x = 5 − 3x + x².
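The recursion of Lemma 2.6 is how divided-difference tables are computed in practice. The following sketch (ours, not the notes') fills the table column by column, keeping only the top entry of each column, i.e. exactly the Newton coefficients of (2.23); run on the data of Example 2.7 it reproduces 9, −4, 1:

```python
def divided_differences(xs, fs):
    """Return [f[x0], f[x0,x1], ..., f[x0,...,xn]] using the recursion
    f[x_i,...,x_j] = (f[x_{i+1},...,x_j] - f[x_i,...,x_{j-1}]) / (x_j - x_i)."""
    coeffs = list(fs)
    for order in range(1, len(xs)):
        # update backwards so lower-order entries are still available
        for j in range(len(xs) - 1, order - 1, -1):
            coeffs[j] = (coeffs[j] - coeffs[j - 1]) / (xs[j] - xs[j - order])
    return coeffs
```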
Table 2.1: Divided Differences

x0   f(x0)
               f[x0, x1]
x1   f(x1)                   f[x0, x1, x2]
               f[x1, x2]                       f[x0, x1, x2, x3]
x2   f(x2)                   f[x1, x2, x3]
               f[x2, x3]
x3   f(x3)

The relations between the divided differences are schematically portrayed in Table 2.1 (up to third order). We note that the divided differences that are being used as the coefficients in the interpolation polynomial are those that are located at the top of every column. The recursive structure of the divided differences implies that it is required to compute all the low-order coefficients in the table in order to get the high-order ones.

One important property of any divided difference is that it is a symmetric function of its arguments. This means that if we assume that y0, . . . , yn is any permutation of x0, . . . , xn, then

f[y0, . . . , yn] = f[x0, . . . , xn].

This property can be clearly explained by recalling that f[x0, . . . , xn] plays the role of the coefficient of x^n in the polynomial that interpolates f(x) at x0, . . . , xn. At the same time, f[y0, . . . , yn] is the coefficient of x^n in the polynomial that interpolates f(x) at the same points. Since the interpolation polynomial is unique for any given data, the order of the points does not matter, and hence these two coefficients must be identical.

2.7 The Error in Polynomial Interpolation

In this section we would like to provide estimates on the "error" we make when interpolating data that is taken from sampling an underlying function f(x). While the interpolant and the function agree with each other at the interpolation points, there is, in general, no reason to expect them to be close to each other elsewhere. Nevertheless, we can estimate the difference between them, a difference which we refer to as the interpolation error. We let Πn denote the space of polynomials of degree ≤ n.

Theorem 2.8 Let f(x) ∈ C^{n+1}[a, b]. Let Qn(x) ∈ Πn be such that it interpolates f(x) at the n + 1 distinct points x0, . . . , xn ∈ [a, b]. Then ∀x ∈ [a, b], ∃ξn ∈ (a, b) such that

f(x) − Qn(x) = (1/(n + 1)!) f^{(n+1)}(ξn) ∏_{j=0}^{n} (x − xj).   (2.25)
Proof. We fix a point x ∈ [a, b]. If x is one of the interpolation points x0, . . . , xn, then the left-hand side and the right-hand side of (2.25) are both zero, and hence this result holds trivially. We therefore assume that x ≠ xj for 0 ≤ j ≤ n, and let

w(x) = ∏_{j=0}^{n} (x − xj).

We now let

F(y) = f(y) − Qn(y) − λw(y),

where λ is chosen so as to guarantee that F(x) = 0, i.e.,

λ = (f(x) − Qn(x)) / w(x).

Since the interpolation points x0, . . . , xn and x are distinct, w(x) does not vanish and λ is well defined. We now note that since f ∈ C^{n+1}[a, b], and since Qn and w are polynomials, then also F ∈ C^{n+1}[a, b]. In addition, F vanishes at n + 2 points: x0, . . . , xn and x. According to Rolle's theorem, F′ has at least n + 1 distinct zeros in (a, b), F″ has at least n distinct zeros in (a, b), and similarly, F^{(n+1)} has at least one zero in (a, b), which we denote by ξn. We have

0 = F^{(n+1)}(ξn) = f^{(n+1)}(ξn) − Qn^{(n+1)}(ξn) − λ w^{(n+1)}(ξn) = f^{(n+1)}(ξn) − ((f(x) − Qn(x))/w(x)) (n + 1)!   (2.26)

Here, we used the fact that the leading term of w(x) is x^{n+1}, which guarantees that its (n + 1)-th derivative equals

w^{(n+1)}(x) = (n + 1)!

Reordering the terms in (2.26) we conclude with

f(x) − Qn(x) = (1/(n + 1)!) f^{(n+1)}(ξn) w(x).   (2.27)

In addition to the interpretation of the divided difference of order n as the coefficient of x^n in some interpolation polynomial, there is another important characterization which we will comment on now. Consider, e.g., the first-order divided difference

f[x0, x1] = (f(x1) − f(x0)) / (x1 − x0).

Since the order of the points does not change the value of the divided difference, we can assume, without any loss of generality, that x0 < x1.
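The bound implied by (2.25) can be observed numerically. The sketch below (our example, not the notes') interpolates f(x) = sin x at three points of [0, 1]; since |f‴| ≤ 1 there, the error can never exceed |w(x)|/3! at any x in the interval:

```python
# Numerical check of the error formula (2.25) for f = sin, n = 2.
import numpy as np

f = np.sin
xs = np.array([0.0, 0.5, 1.0])

Q2 = np.poly1d(np.polyfit(xs, f(xs), 2))   # the interpolating quadratic

x = np.linspace(0.0, 1.0, 201)
w = (x - xs[0]) * (x - xs[1]) * (x - xs[2])

err = np.abs(f(x) - Q2(x))
bound = np.abs(w) / 6.0                    # |f'''(xi)| = |cos xi| <= 1 on [0, 1]

assert np.all(err <= bound + 1e-12)        # small slack for rounding
```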
If we assume, in addition, that f(x) is continuously differentiable in the interval [x0, x1], then this divided difference equals the derivative of f(x) at an intermediate point, i.e.,

f[x0, x1] = f′(ξ),   ξ ∈ (x0, x1).

In other words, the first-order divided difference can be viewed as an approximation of the first derivative in the interval. It is important to note that while this interpretation is based on additional smoothness requirements from f(x) (i.e., its being differentiable), the divided differences are well defined also for non-differentiable functions.

This notion can be extended to divided differences of higher order as stated by the following theorem.

Theorem 2.9 Let x, x0, . . . , xn−1 be n + 1 distinct points. Let a = min(x, x0, . . . , xn−1) and b = max(x, x0, . . . , xn−1). Assume that f(y) has a continuous derivative of order n in the interval (a, b). Then

f[x0, . . . , xn−1, x] = f^{(n)}(ξ)/n!,   (2.28)

where ξ ∈ (a, b).

Remark. In equation (2.28), we could as well think of the interpolation point x as any other interpolation point, and name it, e.g., xn. In this case, the equation (2.28) takes the somewhat more natural form of

f[x0, . . . , xn] = f^{(n)}(ξ)/n!.

In other words, the n-th order divided difference is an n-th derivative of the function f(x) at an intermediate point, assuming that the function has n continuous derivatives. Similarly to the first-order divided difference, we would like to emphasize that the n-th order divided difference is also well defined in cases where the function is not as smooth as required in the theorem, though if this is the case, we can no longer consider this divided difference to represent an n-th order derivative of the function.
.8 we know that the interpolation error is given by 1 f (x) − Qn−1 (x) = f (n) (ξn−1 ) (x − xj ). the ﬁrstorder divided diﬀerence can be viewed as an approximation of the ﬁrst derivative in the interval. we have n−1 f (x) = Qn−1 (x) + f [x0 .28) Qn (y) = Qn−1 (y) + f [x0 .
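Theorem 2.9 is easy to check numerically. The sketch below (not part of the notes; the helper name is ours) computes divided differences by the standard recursion; for $f(x) = x^3$, every third-order divided difference must equal $f'''(\xi)/3! = 6/6 = 1$, regardless of the choice of distinct points.

```python
def divided_difference(xs, f):
    """Compute f[x_0, ..., x_n] for distinct points xs by the usual recursion."""
    table = [f(x) for x in xs]
    n = len(xs)
    for k in range(1, n):
        # after this pass, table[i] holds f[x_i, ..., x_{i+k}]
        table = [(table[i + 1] - table[i]) / (xs[i + k] - xs[i])
                 for i in range(n - k)]
    return table[0]

# Any 4-point third-order divided difference of x^3 equals 1 (the leading
# coefficient of the interpolating cubic), consistent with Theorem 2.9.
print(divided_difference([0.0, 0.5, 1.2, 2.0], lambda x: x**3))
```

The same routine also illustrates the remark above: it is well defined for non-smooth $f$, even though the derivative interpretation is then lost.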
D.25)). .29). . Due to the implicit dependence of ξn on the interpolation points. (2. T2 (x) and T3 (x) are shown in Figure 2. .10 For x ∈ [−1. For the time being. It is important to note that the interpolation points inﬂuence two terms on the righthandside of (2. Levy 2.8 Interpolation at the Chebyshev Points 2. . we would like to emphasize that a solution of this problem does not (in general) provide an optimal choice of interpolation points that will minimize the interpolation error. The solution of the problem will be to interpolate at Chebyshev points. We start by deﬁning the Chebyshev polynomials using the following recursion relation: T0 (x) = 1.29) Here.31). Instead of writing the recursion formula. We recall that if we are interpolating values of a function f (x) that has n continuous derivatives.30).8 Interpolation at the Chebyshev Points In the entire discussion so far. We will return to this “full” problem later on in the context of the minimax approximation.30) is minimized. T2 (x) = 2xT1 (x)−T0 (x) = 2x2 −1. and T3 (x) = 4x3 −3x.31) Tn+1 (x) = 2xTn (x) − Tn−1 (x). . . .2. minimizing the interpolation error is not an easy task. The polynomials T1 (x). T1 (x) = x. We will ﬁrst introduce the Chebyshev polynomials and the Chebyshev points and then show why interpolating at these points minimizes (2. it is possible to write an explicit formula for the Chebyshev polynomials: Lemma 2. There may be cases where one may have the ﬂexibility of choosing the interpolation points. we assumed that the interpolation points are given. For example. it would be reasonable to use this degree of freedom to minimize the interpolation error. 1]. n 1. . (n + 1)! j=0 n (2. The tool that we are going to use is the Chebyshev polynomials.30) The second one is f (n+1) (ξn ) as ξn depends on x0 . xn . n 0.32) . Qn (x) is the interpolating polynomial and ξn is an intermediate point in the interval of interest (see (2. namely. 
the interpolation error is of the form 1 f (x) − Qn (x) = f (n+1) (ξn ) (x − xj ). Once again. If this is the case. All that it guarantees is that the product part of the interpolation error is minimal. how to choose the interpolation points x0 . The solution of this problem is the topic of this section. we are going to focus on a simpler problem. (2. Tn (x) = cos(n cos−1 x). 15 (2. xn such that the product (2. (2. The obvious one is the product n j=0 (x − xj ).
[Figure 2.2: The Chebyshev polynomials $T_1(x)$, $T_2(x)$ and $T_3(x)$.]

Proof. Standard trigonometric identities imply that

$$\cos(n+1)\theta = \cos\theta\cos n\theta - \sin\theta\sin n\theta, \qquad \cos(n-1)\theta = \cos\theta\cos n\theta + \sin\theta\sin n\theta.$$

Hence

$$\cos(n+1)\theta = 2\cos\theta\cos n\theta - \cos(n-1)\theta. \qquad (2.33)$$

We now let $x = \cos\theta$, i.e., $\theta = \cos^{-1} x$, and define

$$t_n(x) = \cos(n \cos^{-1} x) = \cos(n\theta).$$

Then by (2.33),

$$t_0(x) = 1, \qquad t_1(x) = x, \qquad t_{n+1}(x) = 2x\, t_n(x) - t_{n-1}(x), \quad n \ge 1.$$

Hence $t_n(x) = T_n(x)$. $\blacksquare$

What is so special about the Chebyshev polynomials, and what is the connection between these polynomials and minimizing the interpolation error? We are about to answer these questions, but before doing so, there is one more issue that we must clarify.

We define a monic polynomial as a polynomial for which the coefficient of the leading term is one, i.e., a polynomial of degree $n$ is monic if it is of the form

$$x^n + a_{n-1}x^{n-1} + \dots + a_1 x + a_0.$$
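The identity of Lemma 2.10 is also easy to verify numerically. A small sketch (not from the notes) compares the three-term recursion (2.31) against the closed form $\cos(n \cos^{-1} x)$:

```python
import math

def chebyshev_T(n, x):
    """Evaluate T_n(x) by the three-term recursion (2.31)."""
    t_prev, t_curr = 1.0, x  # T_0 and T_1
    if n == 0:
        return t_prev
    for _ in range(n - 1):
        t_prev, t_curr = t_curr, 2 * x * t_curr - t_prev
    return t_curr

# The recursion agrees with cos(n arccos x) on [-1, 1].
for n in range(6):
    for x in (-0.9, -0.3, 0.0, 0.4, 1.0):
        assert abs(chebyshev_T(n, x) - math.cos(n * math.acos(x))) < 1e-12
print("recursion agrees with cos(n arccos x)")
```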
Note that Chebyshev polynomials are not monic: the definition (2.31) implies that the Chebyshev polynomial of degree $n$ is of the form

$$T_n(x) = 2^{n-1} x^n + \dots$$

This means that $T_n(x)$ divided by $2^{n-1}$ is monic, i.e.,

$$2^{1-n} T_n(x) = x^n + \dots$$

A general result about monic polynomials is the following.

Theorem 2.11 If $p_n(x)$ is a monic polynomial of degree $n$, then

$$\max_{-1 \le x \le 1} |p_n(x)| \ge 2^{1-n}. \qquad (2.34)$$

Proof. We prove (2.34) by contradiction. Suppose that

$$|p_n(x)| < 2^{1-n}, \qquad -1 \le x \le 1.$$

Let $q_n(x) = 2^{1-n} T_n(x)$, and let $x_j$ be the following $n+1$ points:

$$x_j = \cos\frac{j\pi}{n}, \qquad 0 \le j \le n.$$

Since

$$T_n\!\left(\cos\frac{j\pi}{n}\right) = (-1)^j,$$

we have

$$(-1)^j q_n(x_j) = 2^{1-n}.$$

Hence

$$(-1)^j p_n(x_j) \le |p_n(x_j)| < 2^{1-n} = (-1)^j q_n(x_j).$$

This means that

$$(-1)^j \big(q_n(x_j) - p_n(x_j)\big) > 0, \qquad 0 \le j \le n.$$

Hence, the polynomial $(q_n - p_n)(x)$ oscillates in sign $n+1$ times in the interval $[-1,1]$, which means that $(q_n - p_n)(x)$ has at least $n$ distinct roots in the interval. However, $p_n(x)$ and $q_n(x)$ are both monic polynomials, which means that their difference is a polynomial of degree $n-1$ at most. Such a polynomial cannot have more than $n-1$ distinct roots, which leads to a contradiction. Note that $p_n - q_n$ cannot be the zero polynomial, because
that would imply that $p_n(x)$ and $q_n(x)$ are identical, which again is not possible due to the assumptions on their maximum values. $\blacksquare$

We are now ready to use Theorem 2.11 to figure out how to reduce the interpolation error. We know by Theorem 2.8 that if the interpolation points $x_0,\dots,x_n \in [-1,1]$, then there exists $\xi_n \in (-1,1)$ such that the distance between the function whose values we interpolate, $f(x)$, and the interpolation polynomial, $Q_n(x)$, satisfies

$$\max_{|x| \le 1} |f(x) - Q_n(x)| \le \frac{1}{(n+1)!} \max_{|x| \le 1} \big|f^{(n+1)}(x)\big| \, \max_{|x| \le 1} \left| \prod_{j=0}^{n} (x - x_j) \right|.$$

We are interested in minimizing

$$\max_{|x| \le 1} \left| \prod_{j=0}^{n} (x - x_j) \right|.$$

We note that $\prod_{j=0}^{n} (x - x_j)$ is a monic polynomial of degree $n+1$, and hence by Theorem 2.11,

$$\max_{|x| \le 1} \left| \prod_{j=0}^{n} (x - x_j) \right| \ge 2^{-n}.$$

The minimal value of $2^{-n}$ can actually be obtained if we set

$$2^{-n} T_{n+1}(x) = \prod_{j=0}^{n} (x - x_j),$$

which is equivalent to choosing $x_j$ as the roots of the Chebyshev polynomial $T_{n+1}(x)$. Here, we have used the obvious fact that $|T_n(x)| \le 1$.

What are the roots of the Chebyshev polynomial $T_{n+1}(x)$? By Lemma 2.10,

$$T_{n+1}(x) = \cos\big((n+1)\cos^{-1} x\big).$$

The roots of $T_{n+1}(x)$, $x_0,\dots,x_n$, are therefore obtained if

$$(n+1)\cos^{-1}(x_j) = \left(j + \frac{1}{2}\right)\pi, \qquad 0 \le j \le n,$$

i.e., the $n+1$ roots of $T_{n+1}(x)$ are

$$x_j = \cos\frac{(2j+1)\pi}{2n+2}, \qquad 0 \le j \le n. \qquad (2.35)$$

The roots of the Chebyshev polynomials are sometimes referred to as the Chebyshev points. The formula (2.35) for the roots of the Chebyshev polynomial has the following geometrical interpretation. In order to find the roots of $T_n(x)$, define $\alpha = \pi/n$. Divide the upper half of the unit circle into $n+1$ parts such that the two side angles are $\alpha/2$ and the other angles are $\alpha$. The Chebyshev points are then obtained by projecting these points on the $x$-axis.
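The claim that the Chebyshev points minimize the product term can be checked by brute force. The following sketch (ours, not from the notes) samples the monic node polynomial $w(x) = \prod_{j=0}^{n}(x - x_j)$ on a fine grid: at the Chebyshev points its maximum sits at the bound $2^{-n}$, while equally spaced points do noticeably worse.

```python
import math

def node_poly_max(nodes, samples=20001):
    """Approximate max |prod (x - x_j)| over [-1, 1] by dense sampling."""
    best = 0.0
    for k in range(samples):
        x = -1.0 + 2.0 * k / (samples - 1)
        w = 1.0
        for xj in nodes:
            w *= (x - xj)
        best = max(best, abs(w))
    return best

n = 7  # degree of the interpolant; n + 1 nodes
cheb = [math.cos((2 * j + 1) * math.pi / (2 * n + 2)) for j in range(n + 1)]
equi = [-1.0 + 2.0 * j / n for j in range(n + 1)]
print(node_poly_max(cheb), 2.0 ** (-n), node_poly_max(equi))
```

The first two printed numbers agree (up to the sampling resolution), while the third, for equally spaced nodes, is larger.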
[Figure 2.3: The roots of the Chebyshev polynomial $T_4(x)$, $x_0,\dots,x_3$, obtained by projecting equally spaced points on the upper half of the unit circle onto the $x$-axis. Note that they become dense next to the boundary of the interval.]

This procedure is demonstrated in Figure 2.3 for $T_4(x)$.

The following theorem summarizes the discussion on interpolation at the Chebyshev points. It also provides an estimate of the error for this case.

Theorem 2.12 Assume that $Q_n(x)$ interpolates $f(x)$ at $x_0,\dots,x_n$. Assume also that these $n+1$ interpolation points are the $n+1$ roots of the Chebyshev polynomial of degree $n+1$, $T_{n+1}(x)$, i.e.,

$$x_j = \cos\frac{(2j+1)\pi}{2n+2}, \qquad 0 \le j \le n.$$

Then $\forall |x| \le 1$,

$$|f(x) - Q_n(x)| \le \frac{1}{2^n (n+1)!} \max_{|\xi| \le 1} \big| f^{(n+1)}(\xi) \big|. \qquad (2.36)$$

Example 2.13 Problem: Let $f(x) = \sin(\pi x)$ in the interval $[-1,1]$. Find $Q_2(x)$ which interpolates $f(x)$ at the Chebyshev points. Estimate the error.

Solution: Since we are asked to find an interpolation polynomial of degree $\le 2$, we need 3 interpolation points. We are also asked to interpolate at the Chebyshev points, and hence we first need to compute the 3 roots of the Chebyshev polynomial of degree 3,

$$T_3(x) = 4x^3 - 3x.$$
The roots of $T_3(x)$ can be easily found from $x(4x^2 - 3) = 0$, i.e.,

$$x_0 = -\frac{\sqrt{3}}{2}, \qquad x_1 = 0, \qquad x_2 = \frac{\sqrt{3}}{2}.$$

The corresponding values of $f(x)$ at these interpolation points are

$$f(x_0) = \sin\left(-\frac{\sqrt{3}}{2}\pi\right) \approx -0.4086, \qquad f(x_1) = 0, \qquad f(x_2) = \sin\left(\frac{\sqrt{3}}{2}\pi\right) \approx 0.4086.$$

The first-order divided differences are

$$f[x_0,x_1] = \frac{f(x_1)-f(x_0)}{x_1-x_0} \approx 0.4718, \qquad f[x_1,x_2] = \frac{f(x_2)-f(x_1)}{x_2-x_1} \approx 0.4718,$$

and the second-order divided difference is

$$f[x_0,x_1,x_2] = \frac{f[x_1,x_2] - f[x_0,x_1]}{x_2-x_0} = 0.$$

The interpolation polynomial is

$$Q_2(x) = f(x_0) + f[x_0,x_1](x-x_0) + f[x_0,x_1,x_2](x-x_0)(x-x_1) \approx 0.4718\,x.$$

The original function $f(x)$ and the interpolant at the Chebyshev points, $Q_2(x)$, are plotted in Figure 2.4.

As for the error estimate, by (2.36), $\forall |x| \le 1$,

$$|\sin \pi x - Q_2(x)| \le \frac{1}{2^2\, 3!} \max_{|\xi| \le 1} \big| (\sin \pi t)^{(3)} \big| \le \frac{\pi^3}{2^2\, 3!} \approx 1.292.$$

A brief examination of Figure 2.4 reveals that while this error estimate is correct, it is far from being sharp.

Remark. In the more general case where the interpolation interval for the function $f(x)$ is $x \in [a,b]$, it is still possible to use the previous results by following these steps: Start by converting the interpolation interval to $y \in [-1,1]$ via

$$x = \frac{(b-a)y + (a+b)}{2}.$$

This converts the interpolation problem for $f(x)$ on $[a,b]$ into an interpolation problem for $f(x) = g(x(y))$ in $y \in [-1,1]$. The Chebyshev points in the interval $y \in [-1,1]$ are the roots of the Chebyshev polynomial $T_{n+1}(x)$, i.e.,

$$y_j = \cos\frac{(2j+1)\pi}{2n+2}, \qquad 0 \le j \le n.$$
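As a cross-check of Example 2.13, the same computation can be reproduced in a few lines (a sketch, not from the notes): the first divided difference comes out near $0.4718$, the second-order one vanishes, and $Q_2(x) \approx 0.4718\,x$.

```python
import math

# The 3 roots of T_3 and the sampled values of f(x) = sin(pi x).
xs = [-math.sqrt(3) / 2, 0.0, math.sqrt(3) / 2]
ys = [math.sin(math.pi * x) for x in xs]

d01 = (ys[1] - ys[0]) / (xs[1] - xs[0])   # f[x0, x1]
d12 = (ys[2] - ys[1]) / (xs[2] - xs[1])   # f[x1, x2]
d012 = (d12 - d01) / (xs[2] - xs[0])      # f[x0, x1, x2]

def q2(x):
    """Newton form of the interpolant, as in the example."""
    return ys[0] + d01 * (x - xs[0]) + d012 * (x - xs[0]) * (x - xs[1])

print(round(d01, 4), round(d012, 4))  # ~0.4718 and ~0.0
```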
[Figure 2.4: The function $f(x) = \sin(\pi x)$ and the interpolation polynomial $Q_2(x)$ that interpolates $f(x)$ at the Chebyshev points. See Example 2.13.]

The corresponding $n+1$ interpolation points in the interval $[a,b]$ are

$$x_j = \frac{(b-a)y_j + (a+b)}{2}, \qquad 0 \le j \le n.$$

We now have

$$\max_{y \in [a,b]} \left| \prod_{j=0}^{n} (y - y_j) \right| = \left(\frac{b-a}{2}\right)^{n+1} \max_{|x| \le 1} \left| \prod_{j=0}^{n} (x - x_j) \right|,$$

so that the interpolation error is

$$|f(y) - Q_n(y)| \le \frac{1}{(n+1)!\, 2^n} \left(\frac{b-a}{2}\right)^{n+1} \max_{\xi \in [a,b]} \big| f^{(n+1)}(\xi) \big|. \qquad (2.37)$$

2.9 Hermite Interpolation

We now turn to a slightly different interpolation problem, in which we assume that in addition to interpolating the values of the function at certain points, we are also interested in interpolating its derivatives. Interpolation that involves the derivatives is called Hermite interpolation. Such an interpolation problem is demonstrated in the following example:

Example 2.14 Problem: Find a polynomial $p(x)$ such that $p(1) = -1$, $p'(1) = -1$, and $p(0) = 1$.
Solution: Since three conditions have to be satisfied, we can use these conditions to determine three degrees of freedom, which means that it is reasonable to expect that these conditions uniquely determine a polynomial of degree $\le 2$. We therefore let

$$p(x) = a_0 + a_1 x + a_2 x^2.$$

The conditions of the problem then imply that

$$a_0 + a_1 + a_2 = -1, \qquad a_1 + 2a_2 = -1, \qquad a_0 = 1.$$

Hence, there is indeed a unique polynomial that satisfies the interpolation conditions, and it is

$$p(x) = x^2 - 3x + 1.$$

In general, we may have to interpolate high-order derivatives and not only first-order derivatives. Also, we assume that for any point $x_j$ at which we have to satisfy an interpolation condition of the form $p^{(l)}(x_j) = f^{(l)}(x_j)$ (with $p^{(l)}$ being the $l$th-order derivative of $p(x)$), we are also given all the values of the lower-order derivatives up to $l$ as part of the interpolation requirements, i.e.,

$$p^{(i)}(x_j) = f^{(i)}(x_j), \qquad 0 \le i \le l.$$

If this is not the case, it may not be possible to find a unique interpolant, as demonstrated in the following example.

Example 2.15 Problem: Find $p(x)$ such that $p'(0) = 1$ and $p'(1) = -1$.

Solution: Since we are asked to interpolate two conditions, we may expect them to uniquely determine a linear function, say

$$p(x) = a_0 + a_1 x.$$

However, both conditions specify the derivative of $p(x)$ at two distinct points to be of different values, which amounts to contradicting information on the value of $a_1$. Hence, a linear polynomial cannot interpolate the data and we must consider higher-order polynomials. Unfortunately, a polynomial of order $\ge 2$ will no longer be unique, because not enough information is given. Note that even if the prescribed values of the derivatives were identical, we would not have problems with the coefficient of the linear term $a_1$, but we would still not have enough information to determine the constant $a_0$.
A simple case that you are probably already familiar with is the Taylor series. When viewed from the point of view that we advocate in this section, one can consider the Taylor series as an interpolation problem in which one has to interpolate the value of the function and its first $n$ derivatives at a given point, say $x_0$, i.e., $x_0 = x_1 = \dots = x_n$. In this case, the interpolation conditions are

$$p^{(j)}(x_0) = f^{(j)}(x_0), \qquad 0 \le j \le n.$$

The unique solution of this problem in terms of a polynomial of degree $\le n$ is

$$p(x) = f(x_0) + f'(x_0)(x-x_0) + \dots + \frac{f^{(n)}(x_0)}{n!}(x-x_0)^n = \sum_{j=0}^{n} \frac{f^{(j)}(x_0)}{j!}(x-x_0)^j,$$

which is the Taylor series of $f(x)$ expanded about $x = x_0$.

2.9.1 Divided differences with repetitions

We are now ready to consider the Hermite interpolation problem. We start by extending the definition of divided differences in such a way that they can handle derivatives. We already know that the first derivative is connected with the first-order divided difference by

$$f'(x_0) = \lim_{x \to x_0} \frac{f(x)-f(x_0)}{x-x_0} = \lim_{x \to x_0} f[x, x_0].$$

Hence, it is natural to extend the notion of divided differences by the following definition.

Definition 2.16 The first-order divided difference with repetitions is defined as

$$f[x_0, x_0] = f'(x_0). \qquad (2.38)$$

In a similar way, we can extend the notion of divided differences to high-order derivatives, as stated in the following lemma (which we leave without a proof).

Lemma 2.17 Let $x_0 \le x_1 \le \dots \le x_n$. Then the divided differences satisfy

$$f[x_0,\dots,x_n] = \begin{cases} \dfrac{f[x_1,\dots,x_n] - f[x_0,\dots,x_{n-1}]}{x_n - x_0}, & x_n \ne x_0, \\[2mm] \dfrac{f^{(n)}(x_0)}{n!}, & x_n = x_0. \end{cases} \qquad (2.39)$$

We now consider the following Hermite interpolation problem: The interpolation points are $x_0,\dots,x_l$ (which we assume are ordered from small to large). At each interpolation point $x_j$, we have to satisfy the interpolation conditions

$$p^{(i)}(x_j) = f^{(i)}(x_j), \qquad 0 \le i \le m_j, \quad 0 \le j \le l.$$
Here, $m_j$ denotes the number of derivatives that we have to interpolate for each point $x_j$ (with the standard notation that zero derivatives refers to the value of the function only). In general, the number of derivatives that we have to interpolate may change from point to point.

The extended notion of divided differences allows us to write the solution to this problem in the following way: We let $n$ denote the total number of points including their multiplicities (which correspond to the number of derivatives we have to interpolate at each point), i.e., $n = m_1 + m_2 + \dots + m_l$. We then list all the points including their multiplicities. To simplify the notations, we identify these points with a new ordered list of points $y_i$:

$$\{y_0,\dots,y_{n-1}\} = \{\underbrace{x_0,\dots,x_0}_{m_1}, \underbrace{x_1,\dots,x_1}_{m_2}, \dots, \underbrace{x_l,\dots,x_l}_{m_l}\}.$$

The interpolation polynomial $p_{n-1}(x)$ is given by

$$p_{n-1}(x) = f[y_0] + \sum_{j=1}^{n-1} f[y_0,\dots,y_j] \prod_{k=0}^{j-1} (x - y_k). \qquad (2.40)$$

Whenever a point repeats in $f[y_0,\dots,y_j]$, we interpret this divided difference in terms of the extended definition (2.39). In practice, there is no need to shift the notations to $y$'s, and we can work directly with the original points. We demonstrate this interpolation procedure in the following example.

Example 2.18 Problem: Find an interpolation polynomial $p(x)$ that satisfies

$$p(x_0) = f(x_0), \qquad p(x_1) = f(x_1), \qquad p'(x_1) = f'(x_1).$$

Solution: The divided differences are

$$f[x_0,x_1] = \frac{f(x_1)-f(x_0)}{x_1-x_0}, \qquad f[x_1,x_1] = f'(x_1),$$

and

$$f[x_0,x_1,x_1] = \frac{f[x_1,x_1] - f[x_0,x_1]}{x_1-x_0} = \frac{f'(x_1) - \frac{f(x_1)-f(x_0)}{x_1-x_0}}{x_1-x_0}.$$

The interpolation polynomial is $p(x) = f(x_0) + f[x_0,x_1](x-x_0) + f[x_0,x_1,x_1](x-x_0)(x-x_1)$. Hence

$$p(x) = f(x_0) + \frac{f(x_1)-f(x_0)}{x_1-x_0}(x-x_0) + \frac{(x_1-x_0)f'(x_1) - [f(x_1)-f(x_0)]}{(x_1-x_0)^2}(x-x_0)(x-x_1).$$
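This construction can be sketched in code. The helper below (a hypothetical name, not from the notes, and handling only first-derivative repetitions) builds the divided-difference table with the convention $f[x,x] = f'(x)$ and returns the Newton coefficients of (2.40).

```python
def hermite_newton(points, f, df):
    """points: ordered list with repetitions, e.g. [x0, x1, x1].

    Returns the Newton coefficients [f[y0], f[y0,y1], ...]. Only single
    repetitions (first derivatives) are supported in this sketch.
    """
    n = len(points)
    col = [f(x) for x in points]  # zeroth column of the table
    coeffs = [col[0]]
    for k in range(1, n):
        new = []
        for i in range(n - k):
            if points[i + k] == points[i]:
                # repeated node: use the extended definition f[x, x] = f'(x)
                new.append(df(points[i]))
            else:
                new.append((col[i + 1] - col[i]) / (points[i + k] - points[i]))
        col = new
        coeffs.append(col[0])
    return coeffs

# Sanity check: for f(x) = x^2, the interpolant matching p(0)=0, p(1)=1,
# p'(1)=2 must recover x^2 = 0 + 1*(x-0) + 1*(x-0)*(x-1) itself.
c = hermite_newton([0.0, 1.0, 1.0], lambda x: x * x, lambda x: 2 * x)
print(c)  # [0.0, 1.0, 1.0]
```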
2.9.2 The Lagrange form of the Hermite interpolant

In this section we are interested in writing the Lagrange form of the Hermite interpolant in the special case in which the nodes are $x_0,\dots,x_n$ and the interpolation conditions are

$$p(x_i) = f(x_i), \qquad p'(x_i) = f'(x_i), \qquad 0 \le i \le n. \qquad (2.41)$$

We look for an interpolant of the form

$$p(x) = \sum_{i=0}^{n} f(x_i) A_i(x) + \sum_{i=0}^{n} f'(x_i) B_i(x). \qquad (2.42)$$

In order to satisfy the interpolation conditions (2.41), the polynomials $A_i(x)$ and $B_i(x)$ in (2.42) must satisfy the $2n+2$ conditions

$$A_i(x_j) = \delta_{ij}, \qquad B_i(x_j) = 0, \qquad A_i'(x_j) = 0, \qquad B_i'(x_j) = \delta_{ij}, \qquad i,j = 0,\dots,n. \qquad (2.43)$$

We thus expect to have a unique polynomial $p(x)$ that satisfies the constraints (2.43), assuming that we limit its degree to be $\le 2n+1$.

It is convenient to start the construction with the functions we have used in the Lagrange form of the standard interpolation problem (Section 2.5). We already know that

$$l_i(x) = \prod_{\substack{j=0 \\ j \ne i}}^{n} \frac{x - x_j}{x_i - x_j}$$

satisfy $l_i(x_j) = \delta_{ij}$. In addition, for $i \ne j$,

$$l_i^2(x_j) = 0, \qquad (l_i^2)'(x_j) = 2\,l_i(x_j)\,l_i'(x_j) = 0.$$

The degree of $l_i(x)$ is $n$, which means that the degree of $l_i^2(x)$ is $2n$. We will thus assume that the unknown polynomials $A_i(x)$ and $B_i(x)$ in (2.43) can be written as

$$A_i(x) = r_i(x)\, l_i^2(x), \qquad B_i(x) = s_i(x)\, l_i^2(x), \qquad (2.44)$$

where the functions $r_i(x)$ and $s_i(x)$ are both assumed to be linear, which implies that $\deg(A_i) = \deg(B_i) = 2n+1$, as desired. Now, according to (2.43),

$$\delta_{ij} = A_i(x_j) = r_i(x_j)\, l_i^2(x_j) = r_i(x_j)\,\delta_{ij}.$$

Hence $r_i(x_i) = 1$.
Also, by (2.43),

$$0 = A_i'(x_j) = r_i'(x_j)[l_i(x_j)]^2 + 2 r_i(x_j)\, l_i(x_j)\, l_i'(x_j) = r_i'(x_j)\delta_{ij} + 2 r_i(x_j)\delta_{ij}\, l_i'(x_j),$$

and thus

$$r_i'(x_i) + 2 l_i'(x_i) = 0.$$

Assuming that $r_i(x)$ is linear, $r_i(x) = ax + b$, these two conditions imply that

$$a = -2 l_i'(x_i), \qquad b = 1 + 2 l_i'(x_i)\, x_i. \qquad (2.45)$$

Therefore

$$A_i(x) = \big[1 + 2 l_i'(x_i)(x_i - x)\big]\, l_i^2(x). \qquad (2.46)$$

As for $B_i(x)$ in (2.44), the conditions (2.43) imply that

$$0 = B_i(x_j) = s_i(x_j)\, l_i^2(x_j) \;\Longrightarrow\; s_i(x_i) = 0, \qquad (2.47)$$

and

$$\delta_{ij} = B_i'(x_j) = s_i'(x_j)\, l_i^2(x_j) + 2 s_i(x_j)\, l_i(x_j)\, l_i'(x_j) \;\Longrightarrow\; s_i'(x_i) = 1.$$

Combining (2.47) with the linearity of $s_i(x)$, we obtain $s_i(x) = x - x_i$, so that

$$B_i(x) = (x - x_i)\, l_i^2(x).$$

To summarize, the Lagrange form of the Hermite interpolation polynomial is given by

$$p(x) = \sum_{i=0}^{n} f(x_i)\big[1 + 2 l_i'(x_i)(x_i - x)\big] l_i^2(x) + \sum_{i=0}^{n} f'(x_i)(x - x_i)\, l_i^2(x). \qquad (2.48)$$

The error in the Hermite interpolation (2.48) is given by the following theorem.

Theorem 2.19 Let $x_0,\dots,x_n$ be distinct nodes in $[a,b]$ and $f \in C^{2n+2}[a,b]$. If $p \in \Pi_{2n+1}$ is such that $\forall\, 0 \le i \le n$,

$$p(x_i) = f(x_i), \qquad p'(x_i) = f'(x_i),$$

then $\forall x \in [a,b]$, there exists $\xi \in (a,b)$ such that

$$f(x) - p(x) = \frac{f^{(2n+2)}(\xi)}{(2n+2)!} \prod_{i=0}^{n} (x - x_i)^2. \qquad (2.49)$$
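Before turning to the proof of the error estimate, the formula (2.48) can be sanity-checked numerically. The sketch below (ours, not from the notes) evaluates $p(x)$ directly, using $l_i'(x_i) = \sum_{j \ne i} 1/(x_i - x_j)$; with nodes $\{0, 1\}$ the Hermite interpolant is a cubic matching $f$ and $f'$ at both nodes, so for $f(x) = x^3$ it must reproduce $f$ exactly.

```python
def hermite_lagrange(xs, fs, dfs, x):
    """Evaluate the Lagrange form (2.48) of the Hermite interpolant at x."""
    n = len(xs)
    total = 0.0
    for i in range(n):
        li = 1.0
        dli = 0.0  # l_i'(x_i) = sum over j != i of 1 / (x_i - x_j)
        for j in range(n):
            if j != i:
                li *= (x - xs[j]) / (xs[i] - xs[j])
                dli += 1.0 / (xs[i] - xs[j])
        Ai = (1.0 + 2.0 * dli * (xs[i] - x)) * li * li
        Bi = (x - xs[i]) * li * li
        total += fs[i] * Ai + dfs[i] * Bi
    return total

# f(x) = x^3 with nodes {0, 1}: f-values [0, 1], f'-values [0, 3].
print(hermite_lagrange([0.0, 1.0], [0.0, 1.0], [0.0, 3.0], 0.5))  # 0.125
```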
Proof. The proof follows the same techniques we used in proving Theorem 2.8. If $x$ is one of the interpolation points, the result trivially holds. We thus fix $x$ as a non-interpolation point and define

$$w(y) = \prod_{i=0}^{n} (y - x_i)^2.$$

We also let

$$\varphi(y) = f(y) - p(y) - \lambda w(y),$$

and select $\lambda$ such that $\varphi(x) = 0$, i.e.,

$$\lambda = \frac{f(x) - p(x)}{w(x)}.$$

$\varphi$ has (at least) $n+2$ zeros in $[a,b]$: $x, x_0, \dots, x_n$. By Rolle's theorem, $\varphi'$ has (at least) $n+1$ zeros that are different from $x, x_0, \dots, x_n$. Also, $\varphi'$ vanishes at $x_0, \dots, x_n$, which means that $\varphi'$ has at least $2n+2$ zeros in $[a,b]$. Rolle's theorem then implies that $\varphi''$ has at least $2n+1$ zeros in $(a,b)$, and by induction, $\varphi^{(2n+2)}$ has at least one zero in $(a,b)$, say $\xi$. Hence

$$0 = \varphi^{(2n+2)}(\xi) = f^{(2n+2)}(\xi) - p^{(2n+2)}(\xi) - \lambda w^{(2n+2)}(\xi).$$

Since the leading term in $w(y)$ is $y^{2n+2}$, $w^{(2n+2)}(\xi) = (2n+2)!$. Also, since $p(x) \in \Pi_{2n+1}$, $p^{(2n+2)}(\xi) = 0$. We recall that $x$ was an arbitrary (non-interpolation) point, and hence we have

$$f(x) - p(x) = \frac{f^{(2n+2)}(\xi)}{(2n+2)!} \prod_{i=0}^{n} (x - x_i)^2. \qquad \blacksquare$$

Example 2.20 Assume that we would like to find the Hermite interpolation polynomial that satisfies

$$p(x_0) = y_0, \qquad p'(x_0) = d_0, \qquad p(x_1) = y_1, \qquad p'(x_1) = d_1.$$

In this case $n = 1$, and

$$l_0(x) = \frac{x - x_1}{x_0 - x_1}, \qquad l_0'(x) = \frac{1}{x_0 - x_1}, \qquad l_1(x) = \frac{x - x_0}{x_1 - x_0}, \qquad l_1'(x) = \frac{1}{x_1 - x_0}.$$
According to (2.48), the desired polynomial is given by (check!)

$$p(x) = y_0\left[1 + \frac{2}{x_0-x_1}(x_0-x)\right]\left(\frac{x-x_1}{x_0-x_1}\right)^2 + y_1\left[1 + \frac{2}{x_1-x_0}(x_1-x)\right]\left(\frac{x-x_0}{x_1-x_0}\right)^2 + d_0(x-x_0)\left(\frac{x-x_1}{x_0-x_1}\right)^2 + d_1(x-x_1)\left(\frac{x-x_0}{x_1-x_0}\right)^2.$$

2.10 Spline Interpolation

So far, the only type of interpolation we were dealing with was polynomial interpolation. In this section we discuss a different type of interpolation: piecewise-polynomial interpolation. A simple example of such an interpolant is the function we get by connecting data points with straight lines (see Figure 2.5). Of course, we would like to generate functions that are somewhat smoother than piecewise-linear functions and still interpolate the data. The functions we will discuss in this section are splines.

You may still wonder why we are interested in such functions at all. It is easy to motivate this discussion by looking at Figure 2.6. In this figure we demonstrate what a high-order interpolant may look like. Even though the data that we interpolate has only one extremum in the domain, we have no control over the oscillatory nature of the high-order interpolating polynomial. In general, high-order polynomials are oscillatory, which rules them out as non-practical for many applications. That is why we focus our attention in this section on splines.

Splines should be thought of as polynomials on subintervals that are connected in a "smooth way". We will be more rigorous when we define precisely what we mean by smooth. First, we pick $n+1$ points which we refer to as the knots: $t_0 < t_1 < \dots < t_n$. A spline of degree $k$ having knots $t_0,\dots,t_n$ is a function $s(x)$ that satisfies the following two properties:

1. On every subinterval $[t_{i-1}, t_i)$ that is defined by the knots, $s(x)$ is a polynomial of degree $\le k$.
2. Smoothness: $s(x)$ has a continuous $(k-1)$th derivative on the interval $[t_0, t_n]$.

[Figure 2.5: A piecewise-linear spline. In every subinterval the function is linear. Overall it is continuous, but the regularity is lost at the knots.]
[Figure 2.6: An interpolant "goes bad". In this example we interpolate 11 equally spaced samples of $f(x) = \frac{1}{1+x^2}$ with a polynomial of degree 10, $Q_{10}(x)$.]

[Figure 2.7: A zeroth-order (piecewise-constant) spline. The knots are at the interpolation points. Since the spline is of degree zero, the function is not even continuous.]
A spline of degree 0 is a piecewise-constant function (see Figure 2.7). A spline of degree 1 is a piecewise-linear function that can be explicitly written as

$$s(x) = \begin{cases} s_0(x) = a_0 x + b_0, & x \in [t_0, t_1), \\ s_1(x) = a_1 x + b_1, & x \in [t_1, t_2), \\ \quad\vdots \\ s_{n-1}(x) = a_{n-1} x + b_{n-1}, & x \in [t_{n-1}, t_n], \end{cases}$$

(see Figure 2.5, where the knots $\{t_i\}$ and the interpolation points $\{x_i\}$ are assumed to be identical). It is now obvious why the points $t_0,\dots,t_n$ are called knots: these are the points that connect the different polynomials with each other.

We would like to comment already at this point that knots should not be confused with the interpolation points. Sometimes it is convenient to choose the knots to coincide with the interpolation points, but this is only optional, and other choices can be made.

To qualify as an interpolating function, $s(x)$ will have to satisfy interpolation conditions that we will discuss below. We now assume that some data (that $s(x)$ should interpolate) is given at the knots, i.e.,

$$s(t_i) = y_i, \qquad 0 \le i \le n. \qquad (2.50)$$

A special case (which is the most common spline function used in practice) is the cubic spline. A cubic spline is a spline for which the function is a polynomial of degree $\le 3$ on every subinterval, i.e.,

$$s(x) = s_i(x), \qquad x \in [t_i, t_{i+1}), \quad 0 \le i \le n-1,$$

where $\deg(s_i) \le 3$. In addition to requiring that $s(x)$ is continuous, i.e.,

$$s_i(t_{i+1}) = s_{i+1}(t_{i+1}), \qquad 0 \le i \le n-2, \qquad (2.51)$$

we also require the continuity of the first and the second derivatives:

$$s_i'(t_{i+1}) = s_{i+1}'(t_{i+1}), \qquad s_i''(t_{i+1}) = s_{i+1}''(t_{i+1}), \qquad 0 \le i \le n-2. \qquad (2.52)$$

Before actually computing the spline, let us check whether we have enough equations to determine a unique solution for the problem. There are $n$ subintervals, and in each
subinterval we have to determine a polynomial of degree $\le 3$. Each such polynomial has 4 coefficients, which leaves us with $4n$ coefficients to determine. The interpolation and continuity conditions (2.50)–(2.51) for $s_i(t_i)$ and $s_i(t_{i+1})$ amount to $2n$ equations. The continuity of the first and the second derivatives (2.52) adds $2(n-1) = 2n-2$ equations. Altogether we have $4n-2$ equations but $4n$ unknowns, which leaves us with 2 degrees of freedom. These two degrees of freedom can be determined in various ways, as we shall see below.

The polynomials on the different subintervals are connected to each other in such a way that the spline has a second-order continuous derivative: the outcome is a cubic polynomial on every subinterval and a function with two continuous derivatives overall (see Figure 2.8).

[Figure 2.8: A cubic spline. In this example we use the not-a-knot condition.]

We are now ready to compute the spline. We will use the following notation: $h_i = t_{i+1} - t_i$. We also set $z_i = s''(t_i)$. Since the second derivative of a cubic function is linear, we observe that $s_i''(x)$ is the line connecting $(t_i, z_i)$ and $(t_{i+1}, z_{i+1})$, i.e.,

$$s_i''(x) = \frac{z_i}{h_i}(t_{i+1} - x) + \frac{z_{i+1}}{h_i}(x - t_i). \qquad (2.53)$$

Integrating (2.53) once, we have

$$s_i'(x) = -\frac{z_i}{2h_i}(t_{i+1} - x)^2 + \frac{z_{i+1}}{2h_i}(x - t_i)^2 + c.$$

Integrating again,

$$s_i(x) = \frac{z_i}{6h_i}(t_{i+1} - x)^3 + \frac{z_{i+1}}{6h_i}(x - t_i)^3 + C(x - t_i) + D(t_{i+1} - x).$$

The interpolation condition $s_i(t_i) = y_i$ implies that

$$y_i = \frac{z_i}{6h_i} h_i^3 + D h_i, \qquad \text{i.e.,} \qquad D = \frac{y_i}{h_i} - \frac{z_i h_i}{6}.$$

Similarly, $s_i(t_{i+1}) = y_{i+1}$ implies that

$$y_{i+1} = \frac{z_{i+1}}{6h_i} h_i^3 + C h_i, \qquad \text{i.e.,} \qquad C = \frac{y_{i+1}}{h_i} - \frac{z_{i+1} h_i}{6}.$$

This means that we can rewrite $s_i(x)$ as

$$s_i(x) = \frac{z_i}{6h_i}(t_{i+1}-x)^3 + \frac{z_{i+1}}{6h_i}(x-t_i)^3 + \left(\frac{y_{i+1}}{h_i} - \frac{z_{i+1}h_i}{6}\right)(x-t_i) + \left(\frac{y_i}{h_i} - \frac{z_i h_i}{6}\right)(t_{i+1}-x).$$

All that remains to determine are the second derivatives of $s(x)$: $z_0,\dots,z_n$. We can set $z_1,\dots,z_{n-1}$ using the continuity conditions on $s'(x)$, i.e., $s_i'(t_i) = s_{i-1}'(t_i)$. We first compute $s_i'(t_i)$ and $s_{i-1}'(t_i)$:

$$s_i'(t_i) = -\frac{h_i}{3} z_i - \frac{h_i}{6} z_{i+1} - \frac{y_i}{h_i} + \frac{y_{i+1}}{h_i},$$

and similarly

$$s_{i-1}'(t_i) = \frac{h_{i-1}}{6} z_{i-1} + \frac{h_{i-1}}{3} z_i - \frac{y_{i-1}}{h_{i-1}} + \frac{y_i}{h_{i-1}}.$$

Hence, for $1 \le i \le n-1$, the continuity condition $s_i'(t_i) = s_{i-1}'(t_i)$ yields the system of equations

$$\frac{h_{i-1}}{6} z_{i-1} + \frac{h_i + h_{i-1}}{3} z_i + \frac{h_i}{6} z_{i+1} = \frac{1}{h_i}(y_{i+1} - y_i) - \frac{1}{h_{i-1}}(y_i - y_{i-1}). \qquad (2.54)$$
These are $n-1$ equations for the $n+1$ unknowns $z_0,\dots,z_n$, which means that we have 2 degrees of freedom. Without any additional information about the problem, the only way to proceed is by making an arbitrary choice. There are several standard ways to do so. One option is to set the second derivatives at the end points to zero, i.e.,

$$z_0 = z_n = 0. \qquad (2.55)$$

This choice of the second derivative at the end points leads to the so-called natural cubic spline. We will explain later in what sense this spline is "natural".

In this case, we end up with the following linear system of equations:

$$\begin{pmatrix} \frac{h_0+h_1}{3} & \frac{h_1}{6} & & & \\ \frac{h_1}{6} & \frac{h_1+h_2}{3} & \frac{h_2}{6} & & \\ & \ddots & \ddots & \ddots & \\ & & \frac{h_{n-3}}{6} & \frac{h_{n-3}+h_{n-2}}{3} & \frac{h_{n-2}}{6} \\ & & & \frac{h_{n-2}}{6} & \frac{h_{n-2}+h_{n-1}}{3} \end{pmatrix} \begin{pmatrix} z_1 \\ z_2 \\ \vdots \\ z_{n-2} \\ z_{n-1} \end{pmatrix} = \begin{pmatrix} \frac{y_2-y_1}{h_1} - \frac{y_1-y_0}{h_0} \\ \frac{y_3-y_2}{h_2} - \frac{y_2-y_1}{h_1} \\ \vdots \\ \frac{y_{n-1}-y_{n-2}}{h_{n-2}} - \frac{y_{n-2}-y_{n-3}}{h_{n-3}} \\ \frac{y_n-y_{n-1}}{h_{n-1}} - \frac{y_{n-1}-y_{n-2}}{h_{n-2}} \end{pmatrix} \qquad (2.56)$$

The coefficient matrix is symmetric, tridiagonal, and diagonally dominant (i.e., $|a_{ii}| > \sum_{j \ne i} |a_{ij}|$, $\forall i$), which means that it can always be (efficiently) inverted.

In the special case where the points are equally spaced, i.e., $h_i = h$, $\forall i$, the system (2.56) becomes

$$\begin{pmatrix} 4 & 1 & & & \\ 1 & 4 & 1 & & \\ & \ddots & \ddots & \ddots & \\ & & 1 & 4 & 1 \\ & & & 1 & 4 \end{pmatrix} \begin{pmatrix} z_1 \\ z_2 \\ \vdots \\ z_{n-2} \\ z_{n-1} \end{pmatrix} = \frac{6}{h^2} \begin{pmatrix} y_2 - 2y_1 + y_0 \\ y_3 - 2y_2 + y_1 \\ \vdots \\ y_{n-1} - 2y_{n-2} + y_{n-3} \\ y_n - 2y_{n-1} + y_{n-2} \end{pmatrix}.$$

In addition to the natural spline (2.55), there are other standard options:

1. If the values of the derivatives at the endpoints are known, one can specify them, i.e.,

$$s'(t_0) = y_0', \qquad s'(t_n) = y_n'.$$

2. The not-a-knot condition. Here, we require the third derivative $s^{(3)}(x)$ to be continuous at the points $t_1$ and $t_{n-1}$. In this case we end up with a cubic spline with knots $t_0, t_2, t_3, \dots, t_{n-2}, t_n$. The points $t_1$ and $t_{n-1}$ no longer function as knots. The interpolation requirements are still satisfied at $t_0, t_1, \dots, t_n$.

Figure 2.9 shows two different cubic splines that interpolate the same initial data. The spline that is plotted with a solid line is the not-a-knot spline. The spline that is plotted with a dashed line is obtained by setting the derivatives at both endpoints to zero.
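The system (2.54) with the natural choice $z_0 = z_n = 0$ can be assembled and solved with the standard tridiagonal (Thomas) elimination. The sketch below is ours, not from the notes; for data sampled from a straight line, all second derivatives must come out zero.

```python
def natural_spline_second_derivs(t, y):
    """Solve (2.54) for z_1..z_{n-1} with z_0 = z_n = 0 (natural spline)."""
    n = len(t) - 1
    h = [t[i + 1] - t[i] for i in range(n)]
    a = [h[i - 1] / 6 for i in range(1, n)]           # sub-diagonal
    b = [(h[i - 1] + h[i]) / 3 for i in range(1, n)]  # diagonal
    c = [h[i] / 6 for i in range(1, n)]               # super-diagonal
    r = [(y[i + 1] - y[i]) / h[i] - (y[i] - y[i - 1]) / h[i - 1]
         for i in range(1, n)]
    # forward elimination (the matrix is diagonally dominant, so no pivoting)
    for i in range(1, n - 1):
        m = a[i] / b[i - 1]
        b[i] -= m * c[i - 1]
        r[i] -= m * r[i - 1]
    # back substitution; z[n] = 0 makes the last super-diagonal term vanish
    z = [0.0] * (n + 1)
    for i in range(n - 2, -1, -1):
        z[i + 1] = (r[i] - c[i] * z[i + 2]) / b[i]
    return z

# Linear data: every second divided difference is zero, so z = 0 throughout.
print(natural_spline_second_derivs([0.0, 1.0, 2.5, 4.0], [1.0, 3.0, 6.0, 9.0]))
```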
Levy (t1 . In that sense.10. i. In fact. f(t3 )) (t4 . which will conclude the proof as the other two terms on the righthandside of (2. f(t2 )) (t0 ..9: Two cubic splines that interpolate the same data. Solid line: a notaknot spline. Deﬁne g(x) = f (x) − s(x).e. (f )2 dx = a a (s )2 dx + a (g )2 dx + 2 a s g dx. g(ti ) = 0.10 Spline Interpolation D. but with respect to any function that interpolates the data (and has a continuous secondderivative).57) We will show that the last term on the righthandside of (2. and let a = t0 < t1 < · · · < tn = b.2 What is natural about the natural spline? The following theorem states that the natural spline can not have a larger L2 norm of the secondderivative than the function it interpolates (assuming that that function has a continuous secondderivative).2. we are minimizing the L2 norm of the secondderivative not only with respect to the “original” function which we are interpolating. (2. b]. Proof. f(t4 )) (t2 . Now b b b b 0 i n. Theorem 2. we refer to the natural spline as “natural”.21 Assume that f (x) is continuous in [a. f(t1 )) (t3 .57) are 34 . Then since s(x) interpolates f (x) at the knots {ti } their diﬀerence vanishes at these points. f(t0 )) x Figure 2.57) is zero. Dashed line: the derivative is set to zero at both endpoints 2. If s(x) is the natural cubic spline interpolating f (x) at the knots {ti } then b b (s (x)) dx a a 2 (f (x))2 dx.
non-negative. Splitting that term into a sum of integrals on the subintervals and integrating by parts on every subinterval, we have

$$\int_a^b s'' g''\,dx = \sum_{i=1}^{n} \int_{t_{i-1}}^{t_i} s'' g''\,dx = \sum_{i=1}^{n} \left[ \big(s'' g'\big)\Big|_{t_{i-1}}^{t_i} - \int_{t_{i-1}}^{t_i} s''' g'\,dx \right].$$

Since we are dealing with the "natural" choice $s''(t_0) = s''(t_n) = 0$, the boundary terms telescope and vanish. And since $s'''(x)$ is constant on each $[t_{i-1}, t_i]$ (say, equal to $c_i$), we end up with

$$\int_a^b s'' g''\,dx = -\sum_{i=1}^{n} \int_{t_{i-1}}^{t_i} s''' g'\,dx = -\sum_{i=1}^{n} c_i \int_{t_{i-1}}^{t_i} g'\,dx = -\sum_{i=1}^{n} c_i \big(g(t_i) - g(t_{i-1})\big) = 0. \qquad \blacksquare$$

We note that $f''(x)$ can be viewed as a linear approximation of the curvature

$$\frac{f''(x)}{\big(1 + (f'(x))^2\big)^{3/2}}.$$

From that point of view, minimizing $\int_a^b (f''(x))^2\,dx$ can be viewed as finding the curve with a minimal (approximate) curvature over the interval.
3 Approximations

3.1 Background

In this chapter we are interested in approximation problems. Generally speaking, starting from a function $f(x)$ we would like to find a different function $g(x)$ that belongs to a given class of functions and is "close" to $f(x)$ in some sense. As far as the class of functions that $g(x)$ belongs to, we will typically assume that $g(x)$ is a polynomial of a given degree (though it can be a trigonometric function, or any other function). A typical approximation problem will therefore be: find the "closest" polynomial of degree $\le n$ to $f(x)$.

What do we mean by "close"? There are different ways of measuring the "distance" between two functions. We will focus on two such measurements (among many): the $L^\infty$ norm and the $L^2$ norm. We chose to focus on these two examples because of the different mathematical techniques that are required to solve the corresponding approximation problems.

We start with several definitions. We recall that a norm on a vector space $V$ over $\mathbb{R}$ is a function $\|\cdot\| : V \to \mathbb{R}$ with the following properties:

1. $\|\lambda f\| = |\lambda|\,\|f\|$, $\forall \lambda \in \mathbb{R}$ and $\forall f \in V$.
2. $\|f\| \ge 0$, $\forall f \in V$. Also, $\|f\| = 0$ iff $f$ is the zero element of $V$.
3. The triangle inequality: $\|f + g\| \le \|f\| + \|g\|$, $\forall f, g \in V$.

We assume that the function $f(x) \in C^0[a,b]$ (continuous on $[a,b]$). A continuous function on a closed interval attains a maximum in the interval. We can therefore define the $L^\infty$ norm (also known as the maximum norm) of such a function by

$$\|f\|_\infty = \max_{a \le x \le b} |f(x)|. \qquad (3.1)$$

The $L^\infty$ distance between two functions $f(x), g(x) \in C^0[a,b]$ is thus given by

$$\|f - g\|_\infty = \max_{a \le x \le b} |f(x) - g(x)|. \qquad (3.2)$$

We note that the definition of the $L^\infty$ norm can be extended to functions that are less regular than continuous functions. If we allow $f(x)$ to be discontinuous, we then have to be more rigorous in terms of the definition of the interval, so that we end up with a norm. This generalization requires some subtleties that we would like to avoid in the following discussion, and hence we will limit ourselves to continuous functions.

We proceed by defining the $L^2$ norm of a continuous function $f(x)$ as

$$\|f\|_2 = \sqrt{\int_a^b |f(x)|^2\,dx}. \qquad (3.3)$$

The $L^2$ function space is the collection of functions $f(x)$ for which $\|f\|_2 < \infty$. Of course, we do not have to assume that $f(x)$ is continuous for the definition (3.3) to make sense; however, as before, we will limit ourselves to continuous functions.
It is easy to see that the value of the norm of a function may vary substantially based on the function as well as the choice of the norm. For example, assume that ‖f‖∞ < ∞. Then, clearly

    ‖f‖₂² = ∫_a^b |f(x)|² dx ⩽ (b − a) ‖f‖∞².

On the other hand, it is easy to construct a function with an arbitrarily small ‖f‖₂ and an arbitrarily large ‖f‖∞.

The L2 distance between two functions f(x) and g(x) is

    ‖f − g‖₂ = ( ∫_a^b |f(x) − g(x)|² dx )^{1/2}.  (3.4)

At this point, a natural question is how important is the choice of norm in terms of the solution of the approximation problem. As you have probably already anticipated, the choice of norm may have a significant impact on the solution of the approximation problem.

There is also a strong connection between some approximation problems and interpolation problems. For example, one possible method of constructing an approximation to a given function is by sampling it at certain points and then interpolating the sampled data. Is that the best we can do? Sometimes the answer is positive, but the problem still remains difficult because we have to determine the best sampling points. We will address these issues in the following sections.

The following theorem, the Weierstrass approximation theorem, plays a central role in any discussion of approximations of functions. Loosely speaking, this theorem states that any continuous function can be approached as close as we want to with polynomials, assuming that the polynomials can be of any degree. We formulate this theorem in the L∞ norm and note that a similar theorem holds also in the L2 sense. We let Πn denote the space of polynomials of degree ⩽ n.

Theorem 3.1 (Weierstrass Approximation Theorem) Let f(x) be a continuous function on [a,b]. Then there exists a sequence of polynomials Pn(x) that converges uniformly to f(x) on [a,b], i.e., ∀ε > 0 there exists an N ∈ N and polynomials Pn(x) ∈ Πn, such that ∀x ∈ [a,b]

    |f(x) − Pn(x)| < ε,  ∀n ⩾ N.

We will provide a constructive proof of the Weierstrass approximation theorem: first, we will define a family of polynomials, known as the Bernstein polynomials, and then we will show that they converge uniformly to f(x).

We start with the definition. Given a continuous function f(x) in [0,1], we define the Bernstein polynomials as

    (Bn f)(x) = Σ_{j=0}^n f(j/n) C(n,j) x^j (1 − x)^{n−j},  0 ⩽ x ⩽ 1,

where C(n,j) denotes the binomial coefficient.
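The definition above translates directly into a few lines of code. The following sketch (an illustration added here, not part of the original notes) evaluates (Bn f)(x) term by term and checks the first two identities of the upcoming Lemma 3.3, (Bn 1)(x) = 1 and (Bn x)(x) = x:

```python
from math import comb

def bernstein(f, n, x):
    """Evaluate the Bernstein polynomial (B_n f)(x) on [0, 1] directly from
    the definition: sum over j of f(j/n) C(n,j) x^j (1-x)^(n-j)."""
    return sum(f(j / n) * comb(n, j) * x**j * (1 - x)**(n - j)
               for j in range(n + 1))

print(bernstein(lambda t: 1.0, 10, 0.3))  # (B_n 1)(x) = 1, up to rounding
print(bernstein(lambda t: t, 10, 0.3))    # (B_n x)(x) = x, up to rounding
```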
We emphasize that the Bernstein polynomials depend on the function f(x).

Example 3.2
Three Bernstein polynomials B6(x), B10(x), and B20(x) for the function

    f(x) = 1 / (1 + 10(x − 0.5)²)

on the interval [0,1] are shown in Figure 3.1. Note the gradual convergence of Bn(x) to f(x).

[Figure 3.1: The Bernstein polynomials B6(x), B10(x), and B20(x) for the function f(x) = 1/(1 + 10(x − 0.5)²) on the interval [0,1].]

We now state and prove several properties of Bn(x) that will be used when we prove Theorem 3.1.

Lemma 3.3 The following relations hold:

1. (Bn 1)(x) = 1
2. (Bn x)(x) = x
3. (Bn x²)(x) = ((n−1)/n) x² + x/n.

Proof.

    (Bn 1)(x) = Σ_{j=0}^n C(n,j) x^j (1 − x)^{n−j} = (x + (1 − x))^n = 1.
D. 1]. 1]. If f (x) (Bn f )(x) 0. if f (x) (Bn f )(x) 3. g(x) ∀x ∈ [0. Linearity. then (Bn g)(x). 0 then We are now ready to prove the Weierstrass approximation theorem. 1] then (Bn g)(x). Lemma 3. 2. If f (x) (Bn f )(x) Also. Positivity. Monotonicity. Theorem 3. n n n n 1 n − 2 j−2 x (1 − x)n−j + x n j=1 j−2 n n − 1 j−1 x (1 − x)n−j j−1 In the following lemma we state several additional properties of the Bernstein polynomials. 39 . g(x) ∀x ∈ [0. (Bn (αf + g))(x) = α(Bn f )(x) + (Bn g)(x). and ∀α ∈ R 1. j Finally.4 For all functions f (x). g(x) that are continuous in [0. j n 2 n j = j (n − 1)! (n − 1)! (n − 1)! n−1j −1 1 = + n (n − j)!(j − 1)! n − 1 n (n − j)!(j − 1)! n (n − j)!(j − 1)! 1 n−1 n−1 n−2 + . Levy n 3. The proof is left as an exercise.1 Background j n j x (1 − x)n−j = x n j j=1 n−1 n (Bn x)(x) = j=0 n − 1 j−1 x (1 − x)n−j j−1 =x j=0 n−1 j x (1 − x)n−1−j = x[x + (1 − x)]n−1 = x.1. = j−2 n n j−1 j n 2 Hence n (Bn x )(x) = j=0 2 n j x (1 − x)n−j j n n−1 2 = x n = j=2 1 n−1 2 x n−1 2 x (x + (1 − x))n−2 + x(x + (1 − x))n−1 = x + .
Proof. We will prove the theorem in the interval [0,1]. The extension to [a,b] is left as an exercise. Since f(x) is continuous on a closed interval, it is uniformly continuous. Hence, ∀ε > 0 there exists a δ > 0 such that ∀x, y ∈ [0,1] with |x − y| ⩽ δ,

    |f(x) − f(y)| ⩽ ε.  (3.5)

In addition, since f(x) is continuous on a closed interval, it is also bounded. Let

    M = max_{x∈[0,1]} |f(x)|.

Fix any point a ∈ [0,1]. If |x − a| ⩽ δ then (3.5) holds. If |x − a| > δ then ((x − a)/δ)² ⩾ 1, and hence

    |f(x) − f(a)| ⩽ 2M ⩽ 2M ((x − a)/δ)².

(At first sight this seems to be a strange way of upper bounding a function. We will use it later on to our advantage.) Combining the estimates for both cases we have

    |f(x) − f(a)| ⩽ ε + (2M/δ²)(x − a)².

We would now like to estimate the difference between Bn f and f. The linearity of Bn and the property (Bn 1)(x) = 1 imply that

    Bn(f − f(a))(x) = (Bn f)(x) − f(a).

Hence, using the monotonicity of Bn and the mapping properties of x and x², we have

    |(Bn f)(x) − f(a)| ⩽ Bn( ε + (2M/δ²)(x − a)² ) = ε + (2M/δ²)( ((n−1)/n) x² + x/n − 2ax + a² ).

Evaluating at x = a we have (observing that max_{a∈[0,1]}(a − a²) = 1/4)

    |(Bn f)(a) − f(a)| ⩽ ε + (2M/δ²)·(a − a²)/n ⩽ ε + M/(2δ²n).  (3.6)

The point a was arbitrary so the result (3.6) holds for any point a ∈ [0,1], i.e.,

    ‖Bn f − f‖∞ ⩽ ε + M/(2δ²n).

Choosing N ⩾ M/(2δ²ε) we have, ∀n ⩾ N,

    ‖Bn f − f‖∞ ⩽ ε + M/(2δ²N) ⩽ 2ε.

Remarks.
• Discuss Runge's example.
• Is interpolation a good way of approximating functions in the ∞-norm? Not necessarily.
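The uniform convergence just proved is slow in practice: the proof's bound decays only like 1/n. The following sketch (an added illustration; the grid-based sup-norm estimate is a crude stand-in for the true maximum) measures ‖Bn f − f‖∞ for the function of Example 3.2 at increasing n:

```python
from math import comb

def bernstein(f, n, x):
    # (B_n f)(x) = sum over j of f(j/n) C(n,j) x^j (1-x)^(n-j)
    return sum(f(j / n) * comb(n, j) * x**j * (1 - x)**(n - j)
               for j in range(n + 1))

def sup_error(f, n, samples=200):
    # crude estimate of ||B_n f - f||_inf on an equispaced grid of [0, 1]
    xs = [i / samples for i in range(samples + 1)]
    return max(abs(bernstein(f, n, x) - f(x)) for x in xs)

f = lambda x: 1 / (1 + 10 * (x - 0.5) ** 2)   # the function of Example 3.2
errs = [sup_error(f, n) for n in (6, 10, 20, 40)]
print(errs)  # the errors decrease as n grows, but only slowly
```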
3.2 The Minimax Approximation Problem

We assume that the function f(x) is continuous on [a,b], and assume that Pn(x) is a polynomial of degree ⩽ n. We recall that the L∞ distance between f(x) and Pn(x) on the interval [a,b] is given by

    ‖f − Pn‖∞ = max_{a⩽x⩽b} |f(x) − Pn(x)|.  (3.7)

Clearly, we can construct polynomials that will have an arbitrarily large distance from f(x). The question we would like to address is how close can we get to f(x) (in the L∞ sense) with polynomials of a given degree. We define dn(f) as the infimum of (3.7) over all polynomials of degree ⩽ n, i.e.,

    dn(f) = inf_{Pn∈Πn} ‖f − Pn‖∞.  (3.8)

The goal is to find a polynomial P*n(x) for which the infimum (3.8) is actually obtained, i.e.,

    dn(f) = ‖f − P*n(x)‖∞.  (3.9)

We will refer to a polynomial P*n(x) that satisfies (3.9) as a polynomial of best approximation or the minimax polynomial. The minimal distance in (3.9) will be referred to as the minimax error.

The theory we will explore in the following sections will show that the minimax polynomial always exists and is unique. We will also provide a characterization of the minimax polynomial that will allow us to identify it if we actually see it. The general construction of the minimax polynomial will not be addressed in this text as it is relatively technically involved. We will limit ourselves to simple examples.

Example 3.5
We let f(x) be a monotonically increasing and continuous function on the interval [a,b] and are interested in finding the minimax polynomial of degree zero to f(x) in that interval. We denote this minimax polynomial by

    P*0(x) ≡ c.

Clearly, the smallest distance between f(x) and P*0 in the L∞ norm will be obtained if

    c = (f(a) + f(b)) / 2.

The maximal distance between f(x) and P*0 will be attained at both edges and will be equal to

    ± (f(b) − f(a)) / 2.
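Example 3.5 is easy to verify numerically. In the sketch below (added for illustration; the grid search is only an approximation of the true maximum), the best constant for the monotone function e^x on [0,1] is c = (1 + e)/2, and the error (e − 1)/2 is attained at the two endpoints with opposite signs — exactly n + 2 = 2 alternation points for n = 0:

```python
from math import exp

def minimax_deg0(f, a, b, samples=1000):
    """Best constant approximation of a monotone continuous f on [a, b]:
    c = (f(a) + f(b)) / 2, with error (f(b) - f(a)) / 2 at both endpoints."""
    c = (f(a) + f(b)) / 2
    xs = [a + (b - a) * i / samples for i in range(samples + 1)]
    err = max(abs(f(x) - c) for x in xs)
    return c, err

c, err = minimax_deg0(exp, 0.0, 1.0)
print(c, err)  # c = (1 + e)/2, err = (e - 1)/2
```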
3.2.1 Existence of the minimax polynomial

The existence of the minimax polynomial is provided by the following theorem.

Theorem 3.6 (Existence) Let f ∈ C⁰[a,b]. Then for any n ∈ N there exists P*n(x) ∈ Πn that minimizes ‖f(x) − Pn(x)‖∞ among all polynomials Pn(x) ∈ Πn.

Proof. We follow the proof as given in [7]. Let η = (η0, …, ηn) be an arbitrary point in R^{n+1} and let

    Pn(x) = Σ_{i=0}^n ηi x^i ∈ Πn.

We also let

    φ(η) = φ(η0, …, ηn) = ‖f − Pn‖∞.

Our goal is to show that φ obtains a minimum in R^{n+1}, i.e., that there exists a point η* = (η*0, …, η*n) such that

    φ(η*) = min_{η∈R^{n+1}} φ(η).

Step 1. We first show that φ(η) is a continuous function on R^{n+1}. For an arbitrary δ = (δ0, …, δn) ∈ R^{n+1}, define

    qn(x) = Σ_{i=0}^n δi x^i.

Then

    φ(η + δ) = ‖f − (Pn + qn)‖∞ ⩽ ‖f − Pn‖∞ + ‖qn‖∞ = φ(η) + ‖qn‖∞.

Hence

    φ(η + δ) − φ(η) ⩽ ‖qn‖∞ ⩽ max_{x∈[a,b]} ( |δ0| + |δ1||x| + … + |δn||x|^n ).

For any ε > 0, let δ̃ = ε/(1 + c + … + c^n), where c = max(|a|, |b|). Then for any δ = (δ0, …, δn) such that max_i |δi| ⩽ δ̃, 0 ⩽ i ⩽ n,

    φ(η + δ) − φ(η) ⩽ ε.  (3.10)

Similarly

    φ(η) = ‖f − Pn‖∞ = ‖f − (Pn + qn) + qn‖∞ ⩽ ‖f − (Pn + qn)‖∞ + ‖qn‖∞ = φ(η + δ) + ‖qn‖∞,

which implies that under the same conditions as in (3.10) we also get

    φ(η) − φ(η + δ) ⩽ ε.

Altogether,

    |φ(η + δ) − φ(η)| ⩽ ε,

which means that φ is continuous at η. Since η was an arbitrary point in R^{n+1}, φ is continuous in the entire R^{n+1}.

Step 2. We now construct a compact set in R^{n+1} on which φ obtains a minimum. We let

    S = { η ∈ R^{n+1} : φ(η) ⩽ ‖f‖∞ }.

We have

    φ(0) = ‖f‖∞,

hence 0 ∈ S, and the set S is nonempty. We also note that the set S is bounded and closed (check!). Since φ is continuous on the entire R^{n+1}, it is also continuous on S, and hence it must obtain a minimum on S, say at η* ∈ R^{n+1}, i.e.,

    min_{η∈S} φ(η) = φ(η*).

Step 3. Since 0 ∈ S, we know that

    min_{η∈S} φ(η) ⩽ φ(0) = ‖f‖∞.

Hence, if η ∈ R^{n+1} but η ∉ S, then

    φ(η) > ‖f‖∞ ⩾ min_{η∈S} φ(η).

This means that the minimum of φ over S is the same as the minimum over the entire R^{n+1}. Therefore

    P*n(x) = Σ_{i=0}^n η*i x^i  (3.11)

is the best approximation of f(x) in the L∞ norm on [a,b], i.e., it is the minimax polynomial, and hence the minimax polynomial exists.

We note that the proof of Theorem 3.6 is not a constructive proof. The proof does not tell us what the point η* is, and hence, we do not know the coefficients of the minimax polynomial as written in (3.11). We will discuss the characterization of the minimax polynomial and some simple cases of its construction in the following sections.

3.2.2 Bounds on the minimax error
It is trivial to obtain an upper bound on the minimax error, since by the definition of dn(f) in (3.8) we have

    dn(f) ⩽ ‖f − Pn‖∞,  ∀Pn(x) ∈ Πn.

A lower bound is provided by the following theorem.
Theorem 3.7 (de la Vallée-Poussin) Let a ⩽ x0 < x1 < ··· < xn+1 ⩽ b. Let Pn(x) be a polynomial of degree ⩽ n. Suppose that

    f(xj) − Pn(xj) = (−1)^j ej,  j = 0, …, n+1,

where all ej ≠ 0 and are of an identical sign. Then

    min_j |ej| ⩽ dn(f).

Proof. By contradiction. Assume for some Qn(x) that

    ‖f − Qn‖∞ < min_j |ej|.

Then the polynomial

    (Qn − Pn)(x) = (f − Pn)(x) − (f − Qn)(x)

is a polynomial of degree ⩽ n that has the same sign at xj as does f(xj) − Pn(xj). This implies that (Qn − Pn)(x) alternates its sign at the n+2 points xj, and hence it has at least n+1 zeros. Being a polynomial of degree ⩽ n, this is possible only if it is identically zero, which contradicts the assumptions on Qn(x) and Pn(x).

Remark. In view of these theorems it is obvious why the Taylor expansion is a poor uniform approximation. The sum is non-oscillatory.

3.2.3 Characterization of the minimax polynomial

The following theorem provides a characterization of the minimax polynomial in terms of its oscillations property.

Theorem 3.8 (The oscillating theorem) Suppose that f(x) is continuous in [a,b]. The polynomial P*n(x) ∈ Πn is the minimax polynomial of degree ⩽ n to f(x) in [a,b] if and only if f(x) − P*n(x) assumes the values ±‖f − P*n‖∞ with an alternating change of sign at least n+2 times in [a,b].

Proof. We prove here only the sufficiency part of the theorem. For the necessary part of the theorem we refer to [7]. Without loss of generality, suppose that

    (f − P*n)(xi) = (−1)^i ‖f − P*n‖∞,  0 ⩽ i ⩽ n+1.

Let

    D* = ‖f − P*n‖∞,

and let

    dn(f) = min_{Pn∈Πn} ‖f − Pn‖∞.

We replace the infimum in the original definition of dn(f) by a minimum because we already know that a minimum exists. de la Vallée-Poussin's theorem (Theorem 3.7) implies that D* ⩽ dn. On the other hand, the definition of dn implies that dn ⩽ D*. Hence D* = dn and P*n(x) is the minimax polynomial.
3.2.4 Uniqueness of the minimax polynomial

Theorem 3.9 (Uniqueness) Let f(x) be continuous on [a,b]. Then its minimax polynomial P*n(x) ∈ Πn is unique.

Proof. Let

    dn(f) = min_{Pn∈Πn} ‖f − Pn‖∞.

Assume that Qn(x) is also a minimax polynomial. Then

    ‖f − P*n‖∞ = ‖f − Qn‖∞ = dn(f).

The triangle inequality implies that

    ‖f − (1/2)(P*n + Qn)‖∞ ⩽ (1/2)‖f − P*n‖∞ + (1/2)‖f − Qn‖∞ = dn(f).

Hence, (1/2)(P*n + Qn) ∈ Πn is also a minimax polynomial. The oscillating theorem (Theorem 3.8) implies that there exist x0, …, xn+1 ∈ [a,b] such that

    |f(xi) − (1/2)(P*n(xi) + Qn(xi))| = dn(f),  0 ⩽ i ⩽ n+1.  (3.12)

Equation (3.12) can be rewritten as

    |f(xi) − P*n(xi) + f(xi) − Qn(xi)| = 2dn(f),  0 ⩽ i ⩽ n+1.  (3.13)

Since P*n(x) and Qn(x) are both minimax polynomials, we have

    |f(xi) − P*n(xi)| ⩽ ‖f − P*n‖∞ = dn(f),  0 ⩽ i ⩽ n+1,  (3.14)

and

    |f(xi) − Qn(xi)| ⩽ ‖f − Qn‖∞ = dn(f),  0 ⩽ i ⩽ n+1.  (3.15)

For any i, equations (3.13)–(3.15) mean that the absolute value of two numbers that are ⩽ dn(f) add up to 2dn(f). This is possible only if they are equal to each other, i.e.,

    f(xi) − P*n(xi) = f(xi) − Qn(xi),  0 ⩽ i ⩽ n+1,

i.e.,

    (P*n − Qn)(xi) = 0,  0 ⩽ i ⩽ n+1.

Hence, the polynomial (P*n − Qn)(x) ∈ Πn has n+2 distinct roots, which is possible for a polynomial of degree ⩽ n only if it is identically zero. Hence

    Qn(x) ≡ P*n(x),

and the uniqueness of the minimax polynomial is established.
3.2.5 The near-minimax polynomial

We now connect between the minimax approximation problem and polynomial interpolation. In order for f(x) − P*n(x) to attain its extremal values with alternating signs at n+2 points, there should be (at least) n+1 points at which f(x) and P*n(x) agree with each other. In other words, we can think of P*n(x) as a function that interpolates f(x) at (least in) n+1 points, say x0, …, xn. What can we say about these points?

We recall that the interpolation error is given by (2.25),

    f(x) − Pn(x) = (1/(n+1)!) f^{(n+1)}(ξ) Π_{i=0}^n (x − xi).  (3.16)

Due to the dependency of f^{(n+1)}(ξ) on the intermediate point ξ, we know that minimizing the error term (3.16) is a difficult task. Hence, choosing x0, …, xn to be the Chebyshev points will not, in general, result with the minimax polynomial. Nevertheless, we recall that interpolation at the Chebyshev points minimizes the multiplicative part of the error term, max |Π_{i=0}^n (x − xi)|. This relation motivates us to refer to the interpolant at the Chebyshev points as the near-minimax polynomial. We note that the term "near-minimax" does not mean that the near-minimax polynomial is actually close to the minimax polynomial.

3.2.6 Construction of the minimax polynomial

The characterization of the minimax polynomial in terms of the number of points in which the maximum distance should be obtained with oscillating signs allows us to construct the minimax polynomial in simple cases by a direct computation. We are not going to deal with the construction of the minimax polynomial in the general case. The algorithm for doing so is known as the Remez algorithm, and we refer the interested reader to [2] and the references therein. A simple case where we can demonstrate a direct construction of the polynomial is when the function is convex, as done in the following example.

Example 3.10
Problem: Let f(x) = e^x, x ∈ [1,3]. Find the minimax polynomial of degree ⩽ 1, P*1(x).

Solution: Based on the characterization of the minimax polynomial, we will be looking for a linear function P*1(x) such that the maximal distance between P*1(x) and f(x) is obtained 3 times with alternating signs.
Since the function is convex, the maximal distance will be obtained at both edges and at one interior point. We will use this observation in the construction that follows. The construction itself is graphically shown in Figure 3.2.

We let l1(x) denote the line that connects the endpoints (1, e) and (3, e³), i.e.,

    l1(x) = e + m(x − 1).

Here, the slope m is given by

    m = (e³ − e) / 2.  (3.17)

Let l2(x) denote the tangent to f(x) at a point a that is identified such that the slope is m. Since f′(x) = e^x, we have e^a = m, i.e.,

    a = log m.

Now

    f(a) = e^{log m} = m,

and

    l1(a) = e + m(log m − 1).

[Figure 3.2: A construction of the linear minimax polynomial for the convex function e^x on [1,3].]
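Before completing the algebra, the construction can be sanity-checked numerically. The sketch below (an added illustration) uses the line of slope m that the example arrives at, P*1(x) = mx + (e − m·log m)/2, and confirms that the error e^x − P*1(x) takes equal magnitudes with alternating signs at x = 1, a, and 3:

```python
from math import e, exp, log

m = (e**3 - e) / 2                              # slope of the secant l1
a = log(m)                                      # tangency point of l2
p1 = lambda x: m * x + (e - m * log(m)) / 2     # candidate minimax line

# the error e^x - p1(x) should equioscillate at x = 1, a, 3
errs = [exp(x) - p1(x) for x in (1.0, a, 3.0)]
print(errs)  # equal magnitudes, alternating signs
```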
The minimax polynomial P*1(x) is the line of slope m that passes through (a, ȳ), where ȳ is the average between f(a) and l1(a):

    ȳ = (f(a) + l1(a)) / 2 = (m + e + m log m − m) / 2 = (e + m log m) / 2.

Hence

    P*1(x) − (e + m log m)/2 = m(x − log m),

i.e.,

    P*1(x) = mx + (e − m log m)/2.

We note that the maximal difference between P*1(x) and f(x) is obtained at x = 1, x = a, and x = 3.

3.3 Least-squares Approximations

3.3.1 The least-squares approximation problem

We recall that the L2 norm of a function f(x) is defined as

    ‖f‖₂ = ( ∫_a^b |f(x)|² dx )^{1/2}.

As before, we let Πn denote the space of all polynomials of degree ⩽ n. The least-squares approximation problem is to find the polynomial that is the closest to f(x) in the L2 norm among all polynomials of degree ⩽ n, i.e., to find Q*n ∈ Πn such that

    ‖f − Q*n‖₂ = min_{Qn∈Πn} ‖f − Qn‖₂.

3.3.2 Solving the least-squares problem: a direct method

Let

    Qn(x) = Σ_{i=0}^n ai x^i.

We want to minimize ‖f(x) − Qn(x)‖₂ among all Qn ∈ Πn. For convenience, instead of minimizing the L2 norm of the difference, we will minimize its square. We thus let φ denote the square of the L2 distance between f(x) and Qn(x), i.e.,

    φ(a0, …, an) = ∫_a^b (f(x) − Qn(x))² dx
                 = ∫_a^b f²(x) dx − 2 Σ_{i=0}^n ai ∫_a^b x^i f(x) dx + Σ_{i=0}^n Σ_{j=0}^n ai aj ∫_a^b x^{i+j} dx.

φ is a quadratic function of the n+1 coefficients in the polynomial Qn(x). This means that we want to find a point â = (â0, …, ân) ∈ R^{n+1} for which φ obtains a minimum. At this point

    ∂φ/∂ak |_{a=â} = 0,  0 ⩽ k ⩽ n.  (3.18)

The condition (3.18) implies that

    0 = −2 ∫_a^b x^k f(x) dx + Σ_{i=0}^n âi ∫_a^b x^{i+k} dx + Σ_{j=0}^n âj ∫_a^b x^{j+k} dx
      = 2 [ Σ_{i=0}^n âi ∫_a^b x^{i+k} dx − ∫_a^b x^k f(x) dx ].  (3.19)

This is a linear system for the unknowns (â0, …, ân):

    Σ_{i=0}^n âi ∫_a^b x^{i+k} dx = ∫_a^b x^k f(x) dx,  k = 0, …, n.  (3.20)

We let Hn+1(a,b) denote the (n+1)×(n+1) coefficients matrix of the system (3.20) on the interval [a,b], i.e.,

    (Hn+1(a,b))_{i,k} = ∫_a^b x^{i+k} dx,  0 ⩽ i, k ⩽ n.

For example, in the case where [a,b] = [0,1],

    Hn(0,1) = [ 1/1      1/2      …  1/n
                1/2      1/3      …  1/(n+1)
                …        …        …  …
                1/n      1/(n+1)  …  1/(2n−1) ].  (3.21)

The matrix (3.21) is known as the Hilbert matrix.

Lemma 3.11 The Hilbert matrix is invertible.

Hence, the system (3.20) always has a unique solution, which proves that not only does the least-squares problem have a solution, but that it is also unique. We thus know that the solution of the least-squares problem is the polynomial

    Q*n(x) = Σ_{i=0}^n âi x^i,

where the coefficients âi, i = 0, …, n, are the solution of (3.20), assuming that this system can be solved.
Proof. We leave it as an exercise to show that the determinant of Hn is given by

    det(Hn) = (1!·2!···(n−1)!)⁴ / (1!·2!···(2n−1)!).

Hence, det(Hn) ≠ 0 and Hn is invertible.

Is inverting the Hilbert matrix a good way of solving the least-squares problem? No. There are numerical instabilities that are associated with inverting H. We demonstrate this with the following example.

Example 3.12
The Hilbert matrix H5 is

    H5 = [ 1/1  1/2  1/3  1/4  1/5
           1/2  1/3  1/4  1/5  1/6
           1/3  1/4  1/5  1/6  1/7
           1/4  1/5  1/6  1/7  1/8
           1/5  1/6  1/7  1/8  1/9 ].

The inverse of H5 is

    H5⁻¹ = [   25     −300     1050    −1400     630
             −300     4800   −18900    26880  −12600
             1050   −18900    79380  −117600   56700
            −1400    26880  −117600   179200  −88200
              630   −12600    56700   −88200   44100 ].

The condition number of H5 is 4.77·10⁵, which indicates that it is ill-conditioned. In fact, the condition number of Hn increases with the dimension n, so inverting it becomes more difficult with an increasing dimension.

3.3.3 Solving the least-squares problem: with orthogonal polynomials

Let {Pk}_{k=0}^n be polynomials such that

    deg(Pk(x)) = k.

Let Qn(x) be a linear combination of the polynomials {Pk}_{k=0}^n, i.e.,

    Qn(x) = Σ_{j=0}^n cj Pj(x).  (3.22)

Clearly, Qn(x) is a polynomial of degree ⩽ n. Define

    φ(c0, …, cn) = ∫_a^b [f(x) − Qn(x)]² dx.
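Before continuing, here is a small illustration of the direct method of Section 3.3.2 (an added sketch, not part of the original notes). It assembles the Hilbert system (3.20) on [0,1] for f(x) = x² with exact rational arithmetic and solves it by Gauss-Jordan elimination; the least-squares fit of degree 2 reproduces x² exactly, i.e., â = (0, 0, 1):

```python
from fractions import Fraction as Fr

def solve(A, rhs):
    # Gauss-Jordan elimination with exact rational arithmetic
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, rhs)]
    for col in range(n):
        piv = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col] / M[col][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

n = 2  # polynomials of degree <= 2 on [0, 1]
H = [[Fr(1, i + k + 1) for i in range(n + 1)] for k in range(n + 1)]  # Hilbert matrix
rhs = [Fr(1, k + 3) for k in range(n + 1)]  # integral of x^k * x^2 = 1/(k+3)
coeffs = solve(H, rhs)
print(coeffs)  # the exact solution is (0, 0, 1)
```

In floating point the same system becomes unreliable already for moderate n, which is exactly the ill-conditioning the example above describes.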
We note that the function φ is a quadratic function of the coefficients of the linear combination (3.22). We would like to minimize φ. Similarly to the calculations done in the previous section, at the minimum, ĉ = (ĉ0, …, ĉn), we have

    0 = ∂φ/∂ck |_{c=ĉ} = −2 ∫_a^b Pk(x)f(x) dx + 2 Σ_{j=0}^n ĉj ∫_a^b Pj(x)Pk(x) dx,

i.e.,

    Σ_{j=0}^n ĉj ∫_a^b Pj(x)Pk(x) dx = ∫_a^b Pk(x)f(x) dx,  k = 0, …, n.  (3.23)

Note the similarity between equation (3.23) and (3.20). There, we used the basis functions {x^k}_{k=0}^n (a basis of Πn), while here we work with the polynomials {Pk(x)}_{k=0}^n instead.

The idea now is to choose the polynomials {Pk(x)}_{k=0}^n such that the system (3.23) can be easily solved. This can be done if we choose them in such a way that

    ∫_a^b Pi(x)Pj(x) dx = δij = { 1, i = j;  0, i ≠ j }.  (3.24)

Polynomials that satisfy (3.24) are called orthonormal polynomials. If, indeed, the polynomials {Pk(x)}_{k=0}^n are orthonormal, then (3.23) implies that

    ĉj = ∫_a^b Pj(x)f(x) dx,  j = 0, …, n.  (3.25)

The solution of the least-squares problem is then the polynomial

    Q*n(x) = Σ_{j=0}^n ĉj Pj(x),  (3.26)

with the coefficients ĉj, j = 0, …, n, that are given by (3.25).

Remark. Polynomials that satisfy

    ∫_a^b Pi(x)Pj(x) dx = { ∫_a^b (Pi(x))² dx ≠ 0, i = j;  0, i ≠ j },

with ∫_a^b (Pi(x))² dx that is not necessarily 1, are called orthogonal polynomials. In this case, the solution of the least-squares problem is given by the polynomial Q*n(x) in (3.26) with the coefficients

    ĉj = ∫_a^b Pj(x)f(x) dx / ∫_a^b (Pj(x))² dx,  j = 0, …, n.  (3.27)

3.3.4 The weighted least-squares problem

A more general least-squares problem is the weighted least-squares approximation problem. We consider a weight function, w(x), to be a continuous on (a,b), nonnegative function with a positive mass, i.e.,

    ∫_a^b w(x) dx > 0.

Note that w(x) may be singular at the edges of the interval since we do not require it to be continuous on the closed interval [a,b]. For any weight w(x), we define the corresponding weighted L2 norm of a function f(x) as

    ‖f‖_{2,w} = ( ∫_a^b (f(x))² w(x) dx )^{1/2}.

The weighted least-squares problem is to find the closest polynomial Q*n ∈ Πn to f(x), this time in the weighted L2 norm sense, i.e., we look for a polynomial Q*n(x) of degree ⩽ n such that

    ‖f − Q*n‖_{2,w} = min_{Qn∈Πn} ‖f − Qn‖_{2,w}.  (3.28)

In order to solve the weighted least-squares problem (3.28) we follow the methodology described in Section 3.3.3, and consider polynomials {Pk}_{k=0}^n such that deg(Pk(x)) = k. We then consider a polynomial Qn(x) that is written as their linear combination:

    Qn(x) = Σ_{j=0}^n cj Pj(x).

By repeating the calculations of Section 3.3.3, we obtain

    Σ_{j=0}^n ĉj ∫_a^b w(x)Pj(x)Pk(x) dx = ∫_a^b w(x)Pk(x)f(x) dx,  k = 0, …, n,  (3.29)

(compare with (3.23)). The system (3.29) can be easily solved if we choose {Pk(x)} to be orthonormal with respect to the weight w(x), i.e.,

    ∫_a^b Pi(x)Pj(x)w(x) dx = δij.

Hence, the solution of the weighted least-squares problem is given by

    Q*n(x) = Σ_{j=0}^n ĉj Pj(x),  (3.30)

where the coefficients are given by

    ĉj = ∫_a^b Pj(x)f(x)w(x) dx,  j = 0, …, n.  (3.31)
Remark. In the case where {Pk(x)} are orthogonal but not necessarily normalized, the coefficients of the solution (3.30) of the weighted least-squares problem are given by

    ĉj = ∫_a^b Pj(x)f(x)w(x) dx / ∫_a^b (Pj(x))² w(x) dx,  j = 0, …, n.

3.3.5 Orthogonal polynomials

At this point we already know that orthogonal polynomials play a central role in the solution of least-squares problems. In this section we will focus on the construction of orthogonal polynomials. The properties of orthogonal polynomials will be studied in a later section.

Orthogonal polynomials can be constructed using the Gram-Schmidt orthogonalization process, which we now describe in detail. In the general context of linear algebra, the Gram-Schmidt process is being used to convert one set of linearly independent vectors to an orthogonal set of vectors that spans the same vector space. In our context, we should think about the process as converting one set of polynomials that span the space of polynomials of degree ⩽ n to an orthogonal set of polynomials that spans the same space Πn.

Typically, we start with n+1 linearly independent functions (all in L²[a,b]), which we would like to convert to orthogonal polynomials with respect to the weight w(x). In our case the initial set of polynomials will be {1, x, x², …, x^n}. However, to keep the discussion slightly more general, we work with arbitrary linearly independent functions {gi(x)}_{i=0}^n.

We start by defining the weighted inner product between two functions f(x) and g(x) (with respect to the weight w(x)):

    ⟨f, g⟩_w = ∫_a^b f(x)g(x)w(x) dx.

To simplify the notations, we will typically write ⟨f, g⟩ instead of ⟨f, g⟩_w. Some properties of the weighted inner product include:

1. ⟨αf, g⟩ = ⟨f, αg⟩ = α⟨f, g⟩, ∀α ∈ R.
2. ⟨f1 + f2, g⟩ = ⟨f1, g⟩ + ⟨f2, g⟩.
3. ⟨f, g⟩ = ⟨g, f⟩.
4. ⟨f, f⟩ ⩾ 0, and ⟨f, f⟩ = 0 iff f ≡ 0. Here we must assume that f(x) is continuous in the interval [a,b]. If it is not continuous, we can have ⟨f, f⟩ = 0 and f(x) can still be nonzero (e.g., in one point).

The weighted L² norm can be obtained from the weighted inner product by

    ‖f‖_{2,w} = √⟨f, f⟩_w.

Given a weight w(x), we are interested in constructing orthogonal (or orthonormal) polynomials. The functions {gi} will be converted into orthonormal functions {fi} of the form

    f0(x) = d0 g0(x),
    f1(x) = d1 (g1(x) − c0 f0(x)),
    …
    fn(x) = dn (gn(x) − c0 f0(x) − … − c_{n−1} f_{n−1}(x)),

where the coefficients ck differ from line to line. The goal is to find the coefficients dk and ck such that {fi}_{i=0}^n is orthonormal with respect to the weighted L² norm over [a,b], i.e.,

    ⟨fi, fj⟩_w = ∫_a^b fi(x)fj(x)w(x) dx = δij,  0 ⩽ i, j ⩽ n.

We start with f0(x):

    1 = ⟨f0, f0⟩_w = d0² ⟨g0, g0⟩_w,

hence

    d0 = 1 / √⟨g0, g0⟩_w.

For f1(x), we require that it is orthogonal to f0(x), i.e., ⟨f0, f1⟩_w = 0:

    0 = ⟨f0, d1(g1 − c0 f0)⟩_w = d1 ( ⟨f0, g1⟩_w − c0 ),

i.e.,

    c0 = ⟨f0, g1⟩_w.

The normalization condition ⟨f1, f1⟩_w = 1 now implies

    d1² ⟨g1 − c0 f0, g1 − c0 f0⟩_w = 1,

hence

    d1 = 1 / ‖g1 − c0 f0‖_{2,w}.

The denominator cannot be zero due to the assumption that the gi(x) are linearly independent. In general,

    fk(x) = dk (gk − c0 f0 − … − c_{k−1} f_{k−1}).

For i = 0, …, k−1 we require the orthogonality conditions

    ⟨fk, fi⟩_w = 0.
D. . we have f0 (x) = d0 g0 (x) = d0 . n. 2 3 2 =⇒ f1 (x) = 3 x. fi k i. d1 = Similarly. 2 1 . . 1]. f1 = 1 reads 1= Therefore. fi i k − 1. 1 3 1 −1 . 1]. i = 0. Since g0 (x) ≡ 1. fi w w 3. Hence 1 1= −1 2 f0 (x)dx = 2d2 . 0 The coeﬃcient dk is obtained from the normalization condition fk . f0 = x. 1 2 = xdx = 0. . f2 (x) = and so on.3 Leastsquares Approximations = dk ( gk . i ck = gk . Example 3.e. Levy Hence 0 = dk (gk − ci fi ). 2 1 2 1 f1 (x) = g1 − c0 f0 = x − c0 1 1 d1 Hence 0 c1 = g1 .13 Let w(x) ≡ 1 on [−1. −1 This implies that f1 (x) =x d1 1 =⇒ f1 (x) = d1 x. We follow the GramSchmidt orthogonalization process to generate from this list. w − ci ). The normalization condition f1 . fk w = 1. 0 which means that 1 =⇒ d0 = √ 2 Now 1 f0 = √ . Start with gi (x) = xi . k . a set of orthonormal polynomials with respect to the given weight on [−1. 2 2 d2 x2 dx = d2 . 55 1 2 5 (3x2 − 1). ..
We are now going to provide several important examples of orthogonal polynomials.

1. Legendre polynomials. We start with the Legendre polynomials. This is a family of polynomials that are orthogonal with respect to the weight

    w(x) ≡ 1,

on the interval [−1,1]. The Legendre polynomials can be obtained from the recurrence relation

    (n+1)P_{n+1}(x) − (2n+1)x Pn(x) + n P_{n−1}(x) = 0,  n ⩾ 1,  (3.32)

starting from P0(x) = 1 and P1(x) = x. It is possible to calculate them directly by Rodrigues' formula

    Pn(x) = (1/(2^n n!)) dⁿ/dxⁿ (x² − 1)ⁿ,  n ⩾ 0.  (3.33)

The Legendre polynomials satisfy the orthogonality condition

    ⟨Pn, Pm⟩ = (2/(2n+1)) δnm.  (3.34)

2. Chebyshev polynomials. Our second example is of the Chebyshev polynomials. These polynomials are orthogonal with respect to the weight

    w(x) = 1/√(1 − x²),

on the interval [−1,1]. They satisfy the recurrence relation

    T_{n+1}(x) = 2x Tn(x) − T_{n−1}(x),  n ⩾ 1,  (3.35)

together with T0(x) = 1 and T1(x) = x (see (2.32)). They are explicitly given by

    Tn(x) = cos(n cos⁻¹ x),  n ⩾ 0,  (3.36)

(see (2.31)). The orthogonality relation that they satisfy is

    ⟨Tn, Tm⟩ = { 0, n ≠ m;  π, n = m = 0;  π/2, n = m ≠ 0 }.  (3.37)
3. Laguerre polynomials. We proceed with the Laguerre polynomials. Here the interval is given by [0, ∞) with the weight function

    w(x) = e^{−x}.

The Laguerre polynomials are given by

    Ln(x) = (e^x / n!) dⁿ/dxⁿ (xⁿ e^{−x}),  n ⩾ 0.  (3.38)

The normalization condition is

    ‖Ln‖ = 1.  (3.39)

A more general form of the Laguerre polynomials is obtained when the weight is taken as e^{−x} x^α, for an arbitrary real α > −1.

4. Hermite polynomials. The Hermite polynomials are orthogonal with respect to the weight

    w(x) = e^{−x²},

on the interval (−∞, ∞). They can be explicitly written as

    Hn(x) = (−1)ⁿ e^{x²} dⁿ/dxⁿ e^{−x²},  n ⩾ 0.  (3.40)

They satisfy the orthogonality relation

    ∫_{−∞}^∞ e^{−x²} Hn(x)Hm(x) dx = 2ⁿ n! √π δnm.  (3.41)

Another way of expressing them is by

    Hn(x) = Σ_{k=0}^{[n/2]} (−1)^k (n! / (k!(n−2k)!)) (2x)^{n−2k},  (3.42)

where [x] denotes the largest integer that is ⩽ x. The Hermite polynomials satisfy the recurrence relation

    H_{n+1}(x) − 2x Hn(x) + 2n H_{n−1}(x) = 0,  n ⩾ 1,  (3.43)

together with H0(x) = 1 and H1(x) = 2x.
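These formulas cross-check each other. The sketch below (an added illustration) verifies that the Chebyshev recurrence (3.35) reproduces the closed form (3.36), and that the Hermite recurrence (3.43) agrees with the explicit sum (3.42):

```python
from math import acos, cos, factorial, isclose

def cheb_T(n, x):
    # T_{n+1} = 2x T_n - T_{n-1}, with T_0 = 1, T_1 = x   (eq. (3.35))
    a, b = 1.0, x
    if n == 0:
        return a
    for _ in range(n - 1):
        a, b = b, 2 * x * b - a
    return b

def hermite_H(n, x):
    # H_{n+1} = 2x H_n - 2n H_{n-1}, with H_0 = 1, H_1 = 2x   (eq. (3.43))
    a, b = 1.0, 2 * x
    if n == 0:
        return a
    for k in range(1, n):
        a, b = b, 2 * x * b - 2 * k * a
    return b

def hermite_sum(n, x):
    # explicit sum (3.42)
    return sum((-1) ** k * factorial(n) / (factorial(k) * factorial(n - 2 * k))
               * (2 * x) ** (n - 2 * k) for k in range(n // 2 + 1))

for n in range(7):
    for x in (-0.8, 0.1, 0.5):
        assert isclose(cheb_T(n, x), cos(n * acos(x)), abs_tol=1e-9)
        assert isclose(hermite_H(n, x), hermite_sum(n, x),
                       rel_tol=1e-9, abs_tol=1e-9)
print("recurrences verified")
```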
6 Another approach to the leastsquares problem D. n n n n n 0 f− = f j=0 2 2. = j=0 f. Levy In this section we present yet another way of deriving the solution of the leastsquares problem.w = f 2 2. Pj j=0 2 w + w − bj 2 . Assume that {Pk (x)}k 0 is an orthonormal family of polynomials with respect to w(x).3. Pj w . Pj j=0 n w + i=0 j=0 n bi bj Pi . (3. Pj j=0 w −2 f. Pj j=0 2 w .3 Leastsquares Approximations 3. Pj j=0 + j=0 b2 j j = f n − f. f − n bj Pj j=0 w bj w = f.w = a w(x) f (x) − bj Pj (x) j=0 dx. .w − f. f n w −2 2 2.w bj Pj . Along the way. The last expression is minimal iﬀ ∀0 bj = f. We have n f− Hence Q∗ 2 n 2. Hence. we will be able to derive some new results. and let n Qn (x) = j=0 bj Pj (x). We recall that our goal is to minimize b a w(x)(f (x) − Qn (x))2 dx among all the polynomials Qn (x) of degree n.w bj f. n Q∗ 2 n = j=0 f. Pj w Pj (x). there exists a unique leastsquares approximation which is given by n Q∗ (x) n Remarks. Then b n 2 f− Hence 2 Qn 2.3. Pj f.44) 1. Pj 2 w = f 2 − f − Q∗ n 58 2 f 2 .
i.e.,

\sum_{j=0}^{n} \langle f, P_j \rangle_w^2 \leq \|f\|_{2,w}^2.   (3.45)

The inequality (3.45) is called Bessel's inequality.

2. Assuming that [a, b] is finite, we have

\lim_{n \to \infty} \|f - Q_n^*\|_{2,w} = 0.

Hence

\sum_{j=0}^{\infty} \langle f, P_j \rangle_w^2 = \|f\|_{2,w}^2,   (3.46)

which is known as Parseval's equality.

Example 3.14
Problem: Let f(x) = \cos x on [-1, 1]. Find the polynomial in \Pi_2 that minimizes

\int_{-1}^{1} [f(x) - Q_2(x)]^2 dx.

Solution: The weight w(x) \equiv 1 on [-1, 1] implies that the orthogonal polynomials we need to use are the Legendre polynomials. We are seeking a polynomial of degree \leq 2, so we write the first three Legendre polynomials:

P_0(x) \equiv 1,   P_1(x) = x,   P_2(x) = \frac{1}{2}(3x^2 - 1).

The normalization factor satisfies, in general,

\int_{-1}^{1} P_n^2(x) dx = \frac{2}{2n+1}.

Hence

\int_{-1}^{1} P_0^2(x) dx = 2,   \int_{-1}^{1} P_1^2(x) dx = \frac{2}{3},   \int_{-1}^{1} P_2^2(x) dx = \frac{2}{5}.

We can then replace the Legendre polynomials by their normalized counterparts:

P_0(x) \equiv \frac{1}{\sqrt{2}},   P_1(x) = \sqrt{\frac{3}{2}}\, x,   P_2(x) = \sqrt{\frac{5}{2}}\, \frac{3x^2 - 1}{2}.

We now have

\langle f, P_0 \rangle = \int_{-1}^{1} \cos x \cdot \frac{1}{\sqrt{2}} dx = \frac{1}{\sqrt{2}} \sin x \Big|_{-1}^{1} = \sqrt{2} \sin 1.
We also have

\langle f, P_1 \rangle = \int_{-1}^{1} \cos x \cdot \sqrt{\frac{3}{2}}\, x\, dx = 0,

since the integrand is odd, and hence Q_1^*(x) = Q_0^*(x) \equiv \sin 1. Finally,

\langle f, P_2 \rangle = \int_{-1}^{1} \cos x \cdot \sqrt{\frac{5}{2}}\, \frac{3x^2 - 1}{2} dx = \frac{1}{2}\sqrt{\frac{5}{2}}\,(12 \cos 1 - 8 \sin 1),

and hence the desired polynomial, Q_2^*(x), is given by

Q_2^*(x) = \sin 1 + \frac{5}{8}(12 \cos 1 - 8 \sin 1)(3x^2 - 1) = \sin 1 + \frac{15 \cos 1 - 10 \sin 1}{2}(3x^2 - 1).

In Figure 3.3 we plot the original function f(x) = \cos x (solid line) and its approximation Q_2^*(x) (dashed line). We zoom in on the interval x \in [0, 1].

[Figure 3.3: A second-order L^2 approximation of f(x) = \cos x. Solid line: f(x). Dashed line: its approximation Q_2^*(x).]
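As a numerical check of Example 3.14 (a sketch, assuming only the projection formula (3.44); the helper `simpson` is any accurate quadrature and, like all names here, is our choice), we can compute the coefficients b_j = \langle f, P_j \rangle by quadrature and compare the resulting Q_2^* with the closed form, written here with the constant (15 \cos 1 - 10 \sin 1)/2, which the numerical projection confirms:

```python
import math

def simpson(g, a, b, n=2000):
    # composite Simpson rule with n (even) subintervals
    h = (b - a) / n
    s = g(a) + g(b) + sum((4 if i % 2 else 2) * g(a + i * h) for i in range(1, n))
    return s * h / 3

# normalized Legendre polynomials of degrees 0, 1, 2 on [-1, 1]
P = [lambda x: 1 / math.sqrt(2),
     lambda x: math.sqrt(3 / 2) * x,
     lambda x: math.sqrt(5 / 2) * (3 * x * x - 1) / 2]

b = [simpson(lambda x, p=p: math.cos(x) * p(x), -1, 1) for p in P]

def Q2(x):                      # least-squares approximation (3.44)
    return sum(bj * p(x) for bj, p in zip(b, P))

def closed(x):                  # closed form of Example 3.14
    return math.sin(1) + (15 * math.cos(1) - 10 * math.sin(1)) / 2 * (3 * x * x - 1)
```

Note in particular that b_1 vanishes, as the odd-integrand argument predicts.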
If the weight is w(x) \equiv 1 but the interval is [a, b], we can still use the Legendre polynomials if we make the following change of variables. Define

x = \frac{b + a + (b - a)t}{2}.

Then the interval -1 \leq t \leq 1 is mapped to a \leq x \leq b. Now, define

F(t) = f\left(\frac{b + a + (b - a)t}{2}\right) = f(x).

Hence

\int_a^b [f(x) - Q_n(x)]^2 dx = \frac{b - a}{2} \int_{-1}^{1} [F(t) - q_n(t)]^2 dt.

Example 3.15
Problem: Let f(x) = \cos x on [0, \pi]. Find the polynomial in \Pi_1 that minimizes

\int_0^{\pi} [f(x) - Q_1(x)]^2 dx.

Solution: Letting

x = \frac{\pi + \pi t}{2} = \frac{\pi}{2}(1 + t),

we have

F(t) = \cos \frac{\pi}{2}(1 + t) = -\sin \frac{\pi t}{2}.

Also,

\int_0^{\pi} (f(x) - Q_1^*(x))^2 dx = \frac{\pi}{2} \int_{-1}^{1} [F(t) - q_1(t)]^2 dt.

We already know that the first two normalized Legendre polynomials are

P_0(t) = \frac{1}{\sqrt{2}},   P_1(t) = \sqrt{\frac{3}{2}}\, t.

Hence

\langle F, P_0 \rangle = -\int_{-1}^{1} \frac{1}{\sqrt{2}} \sin \frac{\pi t}{2} dt = 0,

which means that Q_0^*(t) = 0. Also,

\langle F, P_1 \rangle = -\int_{-1}^{1} \sqrt{\frac{3}{2}}\, t \sin \frac{\pi t}{2} dt = -\sqrt{\frac{3}{2}} \left[\frac{4}{\pi^2} \sin \frac{\pi t}{2} - \frac{2}{\pi}\, t \cos \frac{\pi t}{2}\right]_{-1}^{1} = -\sqrt{\frac{3}{2}} \cdot \frac{8}{\pi^2}.

Hence

q_1^*(t) = -\frac{3}{2} \cdot \frac{8}{\pi^2}\, t = -\frac{12}{\pi^2}\, t   \Longrightarrow   Q_1^*(x) = -\frac{12}{\pi^2}\left(\frac{2}{\pi} x - 1\right).

In Figure 3.4 we plot the original function f(x) = \cos x (solid line) and its approximation Q_1^*(x) (dashed line).
[Figure 3.4: A first-order L^2 approximation of f(x) = \cos x on the interval [0, \pi]. Solid line: f(x). Dashed line: its approximation Q_1^*(x).]

Example 3.16
Problem: Let f(x) = \cos x in [0, \infty). Find the polynomial in \Pi_1 that minimizes

\int_0^{\infty} e^{-x} [f(x) - Q_1(x)]^2 dx.

Solution: The family of orthogonal polynomials that correspond to this weight on [0, \infty) are the Laguerre polynomials. Since we are looking for the minimizer of the weighted L^2 norm among polynomials of degree \leq 1, we will need to use the first two Laguerre polynomials:

L_0(x) = 1,   L_1(x) = 1 - x.

We thus have

\langle f, L_0 \rangle_w = \int_0^{\infty} e^{-x} \cos x\, dx = \frac{e^{-x}(-\cos x + \sin x)}{2}\Big|_0^{\infty} = \frac{1}{2}.

Also,

\langle f, L_1 \rangle_w = \int_0^{\infty} e^{-x} \cos x (1 - x) dx = \frac{1}{2} - \left[\frac{x e^{-x}(-\cos x + \sin x)}{2} - \frac{e^{-x}(-2 \sin x)}{4}\right]_0^{\infty} = \frac{1}{2} - 0 = \frac{1}{2}.

This means that

Q_1^*(x) = \langle f, L_0 \rangle_w L_0(x) + \langle f, L_1 \rangle_w L_1(x) = \frac{1}{2} + \frac{1}{2}(1 - x) = 1 - \frac{x}{2}.
3.3.7 Properties of orthogonal polynomials

We start with a theorem that deals with some of the properties of the roots of orthogonal polynomials. This theorem will become handy when we discuss Gaussian quadratures in Section 5.6. We let \{P_n(x)\}_{n \geq 0} be orthogonal polynomials in [a, b] with respect to the weight w(x).

Theorem 3.17 The roots x_j, j = 1, ..., n, of P_n(x) are all real, simple, and are in (a, b).

Proof. Let x_1, ..., x_r be the roots of P_n(x) in (a, b) at which P_n(x) changes its sign, and let

Q_r(x) = (x - x_1) \cdot ... \cdot (x - x_r).

Then Q_r(x) and P_n(x) change their signs together in (a, b), and deg(Q_r(x)) = r \leq n. Hence (P_n Q_r)(x) is a polynomial with one sign in (a, b), which implies that

\int_a^b P_n(x) Q_r(x) w(x) dx \neq 0.

Since P_n(x) is orthogonal to all polynomials of degree less than n, this is only possible if r = n.

Without loss of generality we now assume that x_1 is a multiple root, i.e.,

P_n(x) = (x - x_1)^2 P_{n-2}(x).

Hence

P_n(x) P_{n-2}(x) = \left(\frac{P_n(x)}{x - x_1}\right)^2 \geq 0,

which implies that

\int_a^b P_n(x) P_{n-2}(x) dx > 0.

This is not possible since P_n is orthogonal to P_{n-2}. Hence roots can not repeat.

Another important property of orthogonal polynomials is that they can all be written in terms of recursion relations. We have already seen specific examples of such relations for the Legendre, Chebyshev, and Hermite polynomials (see (3.32), (3.35), and (3.42)). The following theorem states that such relations always hold.

Theorem 3.18 (Triple Recursion Relation) Any three consecutive orthonormal polynomials are related by a recursion formula of the form

P_{n+1}(x) = (A_n x + B_n) P_n(x) - C_n P_{n-1}(x).

If a_k and b_k are the coefficients of the terms of degree k and degree k - 1 in P_k(x), then

A_n = \frac{a_{n+1}}{a_n},   B_n = \frac{a_{n+1}}{a_n}\left(\frac{b_{n+1}}{a_{n+1}} - \frac{b_n}{a_n}\right),   C_n = \frac{a_{n+1} a_{n-1}}{a_n^2}.
Proof. For

A_n = \frac{a_{n+1}}{a_n},

let

Q_n(x) = P_{n+1}(x) - A_n x P_n(x) = (a_{n+1} x^{n+1} + b_{n+1} x^n + ...) - \frac{a_{n+1}}{a_n} x (a_n x^n + b_n x^{n-1} + ...) = \left(b_{n+1} - \frac{a_{n+1} b_n}{a_n}\right) x^n + ....

Hence deg(Q_n(x)) \leq n, which means that there exist \alpha_0, ..., \alpha_n such that

Q_n(x) = \sum_{i=0}^{n} \alpha_i P_i(x).

For 0 \leq i \leq n - 2,

\alpha_i = \frac{\langle Q_n, P_i \rangle}{\langle P_i, P_i \rangle} = \langle Q_n, P_i \rangle = \langle P_{n+1} - A_n x P_n, P_i \rangle = -A_n \langle x P_n, P_i \rangle = -A_n \langle P_n, x P_i \rangle = 0,

since deg(x P_i) \leq n - 1. Hence Q_n(x) = \alpha_n P_n(x) + \alpha_{n-1} P_{n-1}(x). Set \alpha_n = B_n and \alpha_{n-1} = -C_n. Then

P_{n+1} = (A_n x + B_n) P_n - C_n P_{n-1}.

Comparing the coefficient of x^n on both sides gives b_{n+1} = A_n b_n + B_n a_n, which means that

B_n = (b_{n+1} - A_n b_n) \frac{1}{a_n}.

Finally, since

x P_{n-1} = \frac{a_{n-1}}{a_n} P_n + q_{n-1},

with deg(q_{n-1}) \leq n - 1, we have

C_n = -\alpha_{n-1} = A_n \langle x P_n, P_{n-1} \rangle = A_n \langle P_n, x P_{n-1} \rangle = A_n \frac{a_{n-1}}{a_n} = \frac{a_{n+1} a_{n-1}}{a_n^2}.
4 Numerical Differentiation

4.1 Basic Concepts

This chapter deals with numerical approximations of derivatives. The first question that comes to mind is: why do we need to approximate derivatives at all? After all, we do know how to analytically differentiate every function. Nevertheless, there are several reasons why we still need to approximate derivatives:

• Even if there exists an underlying function that we need to differentiate, we might know its values only at a sampled data set without knowing the function itself.

• There are some cases where it may not be obvious that an underlying function exists and all that we have is a discrete data set. We may still be interested in studying changes in the data, which are related, of course, to derivatives.

• There are times in which exact formulas are available but they are very complicated to the point that an exact computation of the derivative requires a lot of function evaluations. It might be significantly simpler to approximate the derivative instead of computing its exact value.

• When approximating solutions to ordinary (or partial) differential equations, we typically represent the solution as a discrete approximation that is defined on a grid. Since we then have to evaluate derivatives at the grid points, we need to be able to come up with methods for approximating the derivatives at these points, and again, this will typically be done using only values that are defined on a lattice. The underlying function itself (which in this case is the solution of the equation) is unknown.

A simple approximation of the first derivative is

f'(x) \approx \frac{f(x + h) - f(x)}{h},   (4.1)

where we assume that h > 0. What do we mean when we say that the expression on the right-hand side of (4.1) is an approximation of the derivative? For linear functions, (4.1) is actually an exact expression for the derivative. For almost all other functions, (4.1) is not the exact derivative.

Let's compute the approximation error. We write a Taylor expansion of f(x + h) about x, i.e.,

f(x + h) = f(x) + h f'(x) + \frac{h^2}{2} f''(\xi),  \xi \in (x, x + h).   (4.2)

For such an expansion to be valid, we assume that f(x) has two continuous derivatives. The Taylor expansion (4.2) means that we can now replace the approximation (4.1) with an exact formula of the form

f'(x) = \frac{f(x + h) - f(x)}{h} - \frac{h}{2} f''(\xi),  \xi \in (x, x + h).   (4.3)
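The truncation error in (4.3) is easy to observe numerically. In the sketch below (our code, not from the notes), the forward-difference error is computed for a sequence of halved step sizes; halving h should roughly halve the error, which is exactly the first-order behavior predicted by (4.3):

```python
import math

def forward(f, x, h):
    # one-sided approximation (4.1); truncation error is -(h/2) f''(xi)
    return (f(x + h) - f(x)) / h

x = 0.5
hs = [1e-2, 5e-3, 2.5e-3]
errors = [abs(forward(math.exp, x, h) - math.exp(x)) for h in hs]
ratios = [errors[i] / errors[i + 1] for i in range(len(errors) - 1)]
```

For f(x) = e^x the error behaves like (h/2) e^{\xi}, so each successive ratio is close to 2.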
Since this approximation of the derivative at x is based on the values of the function at x and x + h, the approximation (4.1) is called a forward differencing or one-sided differencing. The approximation of the derivative at x that is based on the values of the function at x - h and x, i.e.,

f'(x) \approx \frac{f(x) - f(x - h)}{h},

is called a backward differencing (which is obviously also a one-sided differencing formula).

The second term on the right-hand side of (4.3) is the error term. Since the approximation (4.1) can be thought of as being obtained by truncating this term from the exact formula (4.3), this error is called the truncation error. The small parameter h denotes the distance between the two points x and x + h. As this distance tends to zero, i.e., h \to 0, the two points approach each other and we expect the approximation (4.1) to improve. This is indeed the case if the truncation error goes to zero, which in turn is the case if f''(\xi) is well defined in the interval (x, x + h). The "speed" in which the error goes to zero as h \to 0 is called the rate of convergence. When the truncation error is of the order of O(h), we say that the method is a first-order method. We refer to a method as a pth-order method if the truncation error is of the order of O(h^p).

It is possible to write more accurate formulas than (4.3) for the first derivative. For example, a more accurate approximation for the first derivative that is based on the values of the function at the points f(x - h) and f(x + h) is the centered differencing formula

f'(x) \approx \frac{f(x + h) - f(x - h)}{2h}.   (4.4)

Let's verify that this is indeed a more accurate formula than (4.1). Taylor expansions of the terms on the right-hand side of (4.4) are

f(x + h) = f(x) + h f'(x) + \frac{h^2}{2} f''(x) + \frac{h^3}{6} f'''(\xi_1),
f(x - h) = f(x) - h f'(x) + \frac{h^2}{2} f''(x) - \frac{h^3}{6} f'''(\xi_2).

Here \xi_1 \in (x, x + h) and \xi_2 \in (x - h, x). Hence

f'(x) = \frac{f(x + h) - f(x - h)}{2h} - \frac{h^2}{12}\left[f'''(\xi_1) + f'''(\xi_2)\right],

which means that the truncation error in the approximation (4.4) is

-\frac{h^2}{12}\left[f'''(\xi_1) + f'''(\xi_2)\right].

If the third-order derivative f'''(x) is a continuous function in the interval [x - h, x + h], then the intermediate value theorem implies that there exists a point \xi \in (x - h, x + h) such that

f'''(\xi) = \frac{1}{2}\left[f'''(\xi_1) + f'''(\xi_2)\right].

Hence

f'(x) = \frac{f(x + h) - f(x - h)}{2h} - \frac{h^2}{6} f'''(\xi),   (4.5)

which means that the expression (4.4) is a second-order approximation of the first derivative.

In a similar way we can approximate the values of higher-order derivatives. For example, it is easy to verify that the following is a second-order approximation of the second derivative:

f''(x) \approx \frac{f(x + h) - 2 f(x) + f(x - h)}{h^2}.   (4.6)

To verify the consistency and the order of approximation of (4.6) we expand

f(x \pm h) = f(x) \pm h f'(x) + \frac{h^2}{2} f''(x) \pm \frac{h^3}{6} f'''(x) + \frac{h^4}{24} f^{(4)}(\xi_\pm).

Here, \xi_- \in (x - h, x) and \xi_+ \in (x, x + h). Hence

\frac{f(x + h) - 2 f(x) + f(x - h)}{h^2} = f''(x) + \frac{h^2}{24}\left[f^{(4)}(\xi_-) + f^{(4)}(\xi_+)\right] = f''(x) + \frac{h^2}{12} f^{(4)}(\xi),

where we assume that \xi \in (x - h, x + h) and that f(x) has four continuous derivatives in the interval. Hence, the approximation (4.6) is indeed a second-order approximation of the second derivative, with a truncation error that is given by

-\frac{h^2}{12} f^{(4)}(\xi),  \xi \in (x - h, x + h).
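Both (4.4) and (4.6) claim second-order accuracy, so halving h should reduce the error by about a factor of 4. A minimal numerical check (our Python, not part of the notes):

```python
import math

def centered(f, x, h):          # first derivative, (4.4)
    return (f(x + h) - f(x - h)) / (2 * h)

def second(f, x, h):            # second derivative, (4.6)
    return (f(x + h) - 2 * f(x) + f(x - h)) / h ** 2

x = 0.5
e1 = [abs(centered(math.sin, x, h) - math.cos(x)) for h in (1e-2, 5e-3)]
e2 = [abs(second(math.sin, x, h) + math.sin(x)) for h in (1e-2, 5e-3)]
```

For f(x) = \sin x both error ratios land near 4, confirming the O(h^2) truncation errors derived above.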
4.2 Differentiation Via Interpolation

In this section we demonstrate how to generate differentiation formulas by differentiating an interpolant. The idea is straightforward: the first stage is to construct an interpolating polynomial from the data. An approximation of the derivative at any point can then be obtained by a direct differentiation of the interpolant.

We follow this procedure and assume that f(x_0), ..., f(x_n) are given. The Lagrange form of the interpolation polynomial through these points is

Q_n(x) = \sum_{j=0}^{n} f(x_j) l_j(x).

Here we simplify the notation and write l_i(x) for l_i^n(x), which is the notation we used in Section 2.5. According to the error analysis of Section 2.7 we know that the interpolation error is

f(x) - Q_n(x) = \frac{1}{(n+1)!} f^{(n+1)}(\xi_n) \prod_{j=0}^{n} (x - x_j),
where \xi_n \in (\min(x, x_0, ..., x_n), \max(x, x_0, ..., x_n)). Since here we are assuming that the points x_0, ..., x_n are fixed, we would like to emphasize the dependence of \xi_n on x, and hence replace the \xi_n notation by \xi_x. We then have:

f(x) = \sum_{j=0}^{n} f(x_j) l_j(x) + \frac{1}{(n+1)!} f^{(n+1)}(\xi_x) w(x),   (4.7)

where

w(x) = \prod_{i=0}^{n} (x - x_i).

Differentiating the interpolant (4.7):

f'(x) = \sum_{j=0}^{n} f(x_j) l_j'(x) + \frac{1}{(n+1)!} f^{(n+1)}(\xi_x) w'(x) + \frac{1}{(n+1)!} w(x) \frac{d}{dx} f^{(n+1)}(\xi_x).   (4.8)

We now assume that x is one of the interpolation points, i.e., x \in \{x_0, ..., x_n\}, say x_k, so that

f'(x_k) = \sum_{j=0}^{n} f(x_j) l_j'(x_k) + \frac{1}{(n+1)!} f^{(n+1)}(\xi_{x_k}) w'(x_k).   (4.9)

Now,

w'(x) = \sum_{i=0}^{n} \prod_{j=0, j \neq i}^{n} (x - x_j) = \sum_{i=0}^{n} [(x - x_0) \cdot ... \cdot (x - x_{i-1})(x - x_{i+1}) \cdot ... \cdot (x - x_n)].

Hence, when w'(x) is evaluated at an interpolation point x_k, there is only one term in w'(x) that does not vanish, i.e.,

w'(x_k) = \prod_{j=0, j \neq k}^{n} (x_k - x_j).

The numerical differentiation formula, (4.9), then becomes

f'(x_k) = \sum_{j=0}^{n} f(x_j) l_j'(x_k) + \frac{1}{(n+1)!} f^{(n+1)}(\xi_{x_k}) \prod_{j=0, j \neq k}^{n} (x_k - x_j).   (4.10)

We refer to the formula (4.10) as a differentiation by interpolation algorithm.

Example 4.1
We demonstrate how to use the differentiation by interpolation formula (4.10) in the case where n = 1 and k = 0. This means that we use two interpolation points, (x_0, f(x_0)) and (x_1, f(x_1)), and want to approximate f'(x_0).
The Lagrange interpolation polynomial in this case is

f(x) = f(x_0) l_0(x) + f(x_1) l_1(x),

where

l_0(x) = \frac{x - x_1}{x_0 - x_1},   l_1(x) = \frac{x - x_0}{x_1 - x_0}.

Hence

l_0'(x) = \frac{1}{x_0 - x_1},   l_1'(x) = \frac{1}{x_1 - x_0}.

We thus have

f'(x_0) = \frac{f(x_0)}{x_0 - x_1} + \frac{f(x_1)}{x_1 - x_0} + \frac{1}{2} f''(\xi)(x_0 - x_1) = \frac{f(x_1) - f(x_0)}{x_1 - x_0} - \frac{1}{2} f''(\xi)(x_1 - x_0).

Here, we simplify the notation and assume that \xi \in (x_0, x_1). If we now let x_1 = x_0 + h, then

f'(x_0) = \frac{f(x_0 + h) - f(x_0)}{h} - \frac{h}{2} f''(\xi),

which is the (first-order) forward differencing approximation of f'(x_0), (4.3).

Example 4.2
We repeat the previous example in the case n = 2 and k = 0. This time

f(x) = f(x_0) l_0(x) + f(x_1) l_1(x) + f(x_2) l_2(x),

with

l_0(x) = \frac{(x - x_1)(x - x_2)}{(x_0 - x_1)(x_0 - x_2)},   l_1(x) = \frac{(x - x_0)(x - x_2)}{(x_1 - x_0)(x_1 - x_2)},   l_2(x) = \frac{(x - x_0)(x - x_1)}{(x_2 - x_0)(x_2 - x_1)}.

Hence

l_0'(x) = \frac{2x - x_1 - x_2}{(x_0 - x_1)(x_0 - x_2)},   l_1'(x) = \frac{2x - x_0 - x_2}{(x_1 - x_0)(x_1 - x_2)},   l_2'(x) = \frac{2x - x_0 - x_1}{(x_2 - x_0)(x_2 - x_1)}.

Evaluating l_j'(x) for j = 0, 1, 2 at x_0 we have

l_0'(x_0) = \frac{2x_0 - x_1 - x_2}{(x_0 - x_1)(x_0 - x_2)},   l_1'(x_0) = \frac{x_0 - x_2}{(x_1 - x_0)(x_1 - x_2)},   l_2'(x_0) = \frac{x_0 - x_1}{(x_2 - x_0)(x_2 - x_1)}.

Hence

f'(x_0) = f(x_0) \frac{2x_0 - x_1 - x_2}{(x_0 - x_1)(x_0 - x_2)} + f(x_1) \frac{x_0 - x_2}{(x_1 - x_0)(x_1 - x_2)} + f(x_2) \frac{x_0 - x_1}{(x_2 - x_0)(x_2 - x_1)} + \frac{1}{6} f^{(3)}(\xi)(x_0 - x_1)(x_0 - x_2),   (4.11)

where we assume that \xi \in (x_0, x_2).
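The derivatives l_j'(x_k) in (4.10) can be assembled programmatically: for j \neq k only one term of the product rule survives, while l_k'(x_k) = \sum_{i \neq k} 1/(x_k - x_i). A sketch that turns this into differentiation weights (Python and the name `diff_weights` are ours):

```python
import math

def diff_weights(nodes, k):
    """Weights w_j = l_j'(x_k), so that f'(x_k) is approximated by sum_j w_j f(x_j)."""
    xk = nodes[k]
    w = []
    for j, xj in enumerate(nodes):
        if j == k:
            w.append(sum(1.0 / (xk - xi) for i, xi in enumerate(nodes) if i != k))
        else:
            num = den = 1.0
            for i, xi in enumerate(nodes):
                if i != j and i != k:
                    num *= xk - xi
                if i != j:
                    den *= xj - xi
            w.append(num / den)
    return w

h = 0.1
w = diff_weights([0.0, h, 2 * h], 0)   # expect -3/(2h), 2/h, -1/(2h)
approx = sum(wj * math.sin(xj) for wj, xj in zip(w, [0.0, h, 2 * h]))
```

With equally spaced nodes x, x + h, x + 2h this reproduces the weights of the one-sided second-order formula derived from (4.11) below, and the result is within O(h^2) of f'(0) = \cos 0 = 1.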
For x_i = x + ih, i = 0, 1, 2, equation (4.11) becomes

f'(x) = -f(x) \frac{3}{2h} + f(x + h) \frac{2}{h} - f(x + 2h) \frac{1}{2h} + \frac{f'''(\xi)}{3} h^2 = \frac{-3 f(x) + 4 f(x + h) - f(x + 2h)}{2h} + \frac{f'''(\xi)}{3} h^2,

which is a one-sided, second-order approximation of the first derivative.

Remark. In a similar way, if we were to repeat the last example with n = 2 while approximating the derivative at x_1, the resulting formula would be the second-order centered approximation of the first derivative (4.5):

f'(x) = \frac{f(x + h) - f(x - h)}{2h} - \frac{1}{6} f'''(\xi) h^2.

4.3 The Method of Undetermined Coefficients

In this section we present the method of undetermined coefficients, which is a very practical way for generating approximations of derivatives (as well as other quantities, as we shall see, e.g., when we discuss integration).

Assume, for example, that we are interested in finding an approximation of the second derivative f''(x) that is based on the values of the function at three equally spaced points, f(x - h), f(x), f(x + h), i.e.,

f''(x) \approx A f(x + h) + B f(x) + C f(x - h).   (4.12)

The coefficients A, B, and C are to be determined in such a way that this linear combination is indeed an approximation of the second derivative. The Taylor expansions of the terms f(x \pm h) are

f(x \pm h) = f(x) \pm h f'(x) + \frac{h^2}{2} f''(x) \pm \frac{h^3}{6} f'''(x) + \frac{h^4}{24} f^{(4)}(\xi_\pm),   (4.13)

where (assuming that h > 0)

x - h \leq \xi_- \leq x \leq \xi_+ \leq x + h.

Using the expansions in (4.13) we can rewrite (4.12) as

A f(x + h) + B f(x) + C f(x - h) = (A + B + C) f(x) + h (A - C) f'(x) + \frac{h^2}{2}(A + C) f''(x) + \frac{h^3}{6}(A - C) f'''(x) + \frac{h^4}{24}\left[A f^{(4)}(\xi_+) + C f^{(4)}(\xi_-)\right].   (4.14)
Equating the coefficients of f(x), f'(x), and f''(x) on both sides of (4.14), we obtain the linear system

A + B + C = 0,
A - C = 0,
A + C = \frac{2}{h^2}.   (4.15)

The system (4.15) has the unique solution:

A = C = \frac{1}{h^2},   B = -\frac{2}{h^2}.

In this particular case, since A and C are equal to each other, the coefficient of f'''(x) on the right-hand side of (4.14) also vanishes, and we end up with

f''(x) = \frac{f(x + h) - 2 f(x) + f(x - h)}{h^2} - \frac{h^2}{24}\left[f^{(4)}(\xi_+) + f^{(4)}(\xi_-)\right].

We note that the last two terms can be combined into one using an intermediate value theorem (assuming that f(x) has four continuous derivatives), i.e.,

\frac{h^2}{24}\left[f^{(4)}(\xi_+) + f^{(4)}(\xi_-)\right] = \frac{h^2}{12} f^{(4)}(\xi),  \xi \in (x - h, x + h).

Hence we obtain the familiar second-order approximation of the second derivative:

f''(x) = \frac{f(x + h) - 2 f(x) + f(x - h)}{h^2} - \frac{h^2}{12} f^{(4)}(\xi).

In terms of an algorithm, the method of undetermined coefficients follows what was just demonstrated in the example:

1. Assume that the derivative can be written as a linear combination of the values of the function at certain points.
2. Write the Taylor expansions of the function at the approximation points.
3. Equate the coefficients of the function and its derivatives on both sides.

The only question that remains open is how many terms to use in the Taylor expansions. This question has, unfortunately, no simple answer. In the example, we have already seen that even though we used data that is taken from three points, we could satisfy four equations; in other words, the coefficient of the third derivative vanished as well. If we were to stop the Taylor expansions at the third derivative instead of at the fourth derivative, we would have missed this cancellation, and would have mistakenly concluded that the approximation method is only first-order accurate. In other words, one should truncate the Taylor series only after the leading term in the error has been identified: the number of terms in the Taylor expansion should be sufficient to rule out additional cancellations.
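Step 3 of the algorithm is just linear algebra. The sketch below (our code; the tiny elimination routine is generic, not anything from the notes) solves the system (4.15) exactly with rational arithmetic:

```python
from fractions import Fraction as F

def solve3(M, rhs):
    """Gauss-Jordan elimination for a 3x3 linear system, exact over Fraction."""
    A = [[F(v) for v in row] + [F(b)] for row, b in zip(M, rhs)]
    for c in range(3):
        p = next(r for r in range(c, 3) if A[r][c] != 0)   # pivot row
        A[c], A[p] = A[p], A[c]
        A[c] = [v / A[c][c] for v in A[c]]                 # normalize
        for r in range(3):
            if r != c and A[r][c] != 0:                    # eliminate column c
                A[r] = [v - A[r][c] * w for v, w in zip(A[r], A[c])]
    return [row[3] for row in A]

h = F(1, 10)
# rows of (4.15): A + B + C = 0,  A - C = 0,  A + C = 2/h^2
A_, B_, C_ = solve3([[1, 1, 1], [1, 0, -1], [1, 0, 1]], [0, 0, 2 / h ** 2])
```

For h = 1/10 this returns A = C = 100 = 1/h^2 and B = -200 = -2/h^2, as derived above.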
4.4 Richardson's Extrapolation

Richardson's extrapolation can be viewed as a general procedure for improving the accuracy of approximations when the structure of the error is known. While we study it here in the context of numerical differentiation, it is by no means limited only to differentiation, and we will get back to it later on when we study methods for numerical integration.

We start with an example in which we show how to turn a second-order approximation of the first derivative into a fourth-order approximation of the same quantity. We already know that we can write a second-order approximation of f'(x) given its values at x \pm h. In order to improve this approximation we will need some more insight on the internal structure of the error. We therefore start with the Taylor expansions of f(x \pm h) about the point x, i.e.,

f(x + h) = \sum_{k=0}^{\infty} \frac{f^{(k)}(x)}{k!} h^k,
f(x - h) = \sum_{k=0}^{\infty} \frac{(-1)^k f^{(k)}(x)}{k!} h^k.

Hence

f'(x) = \frac{f(x + h) - f(x - h)}{2h} - \left[\frac{h^2}{3!} f^{(3)}(x) + \frac{h^4}{5!} f^{(5)}(x) + ...\right].   (4.16)

We rewrite (4.16) as

L = D(h) + e_2 h^2 + e_4 h^4 + ...,   (4.17)

where L denotes the quantity that we are interested in approximating, i.e.,

L = f'(x),

and D(h) is the approximation, which in this case is

D(h) = \frac{f(x + h) - f(x - h)}{2h}.

The error is

E = e_2 h^2 + e_4 h^4 + ...,

where e_i denotes the coefficient of h^i in (4.16). The important property of the coefficients e_i is that they do not depend on h. We note that the formula L \approx D(h) is a second-order approximation of the first derivative which is based on the values of f(x) at x \pm h. We assume here that in general e_i \neq 0.

In order to improve the approximation of L, our strategy will be to eliminate the term e_2 h^2 from the error.
How can this be done? One possibility is to write another approximation that is based on the values of the function at different points. For example, instead of using D(h), we can write

L = D(2h) + e_2 (2h)^2 + e_4 (2h)^4 + ...,   (4.18)

i.e.,

L = D(2h) + 4 e_2 h^2 + 16 e_4 h^4 + ....

This, of course, is still a second-order approximation of the derivative. However, the idea is to combine (4.17) with (4.18) such that the h^2 term in the error vanishes. Indeed, subtracting the following equations from each other,

4L = 4 D(h) + 4 e_2 h^2 + 4 e_4 h^4 + ...,
L = D(2h) + 4 e_2 h^2 + 16 e_4 h^4 + ...,

we have

L = \frac{4 D(h) - D(2h)}{3} - 4 e_4 h^4 + ....

In other words, a fourth-order approximation of the derivative is

f'(x) = \frac{-f(x + 2h) + 8 f(x + h) - 8 f(x - h) + f(x - 2h)}{12 h} + O(h^4).   (4.19)

Remarks.

1. In (4.18), instead of using D(2h), it is possible to use other approximations, e.g., D(h/2). If this is what is done, instead of (4.19) we would get a fourth-order approximation of the derivative that is based on the values of f at x - h, x - h/2, x + h/2, x + h.

2. Once again we would like to emphasize that Richardson's extrapolation is a general procedure for improving the accuracy of numerical approximations that can be used when the structure of the error is known. It is not specific to numerical differentiation.

This process can be repeated over and over as long as the structure of the error is known. For example, we can write (4.19) as

L = S(h) + a_4 h^4 + a_6 h^6 + ...,   (4.20)

where

S(h) = \frac{-f(x + 2h) + 8 f(x + h) - 8 f(x - h) + f(x - 2h)}{12 h}.

The approximation (4.20) can be turned into a sixth-order approximation of the derivative by eliminating the term a_4 h^4. We carry out such a procedure by writing

L = S(2h) + a_4 (2h)^4 + a_6 (2h)^6 + ....   (4.21)

Combining (4.21) with (4.20) we end up with a sixth-order approximation of the derivative:

L = \frac{16 S(h) - S(2h)}{15} + O(h^6).
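A short numerical experiment (our code, not from the notes) makes the gain visible: halving h reduces the centered-difference error by a factor of about 4, while the error of the extrapolated value S(h) drops by about 16, as a fourth-order method should:

```python
import math

def D(f, x, h):                  # centered difference (4.4), second order
    return (f(x + h) - f(x - h)) / (2 * h)

def S(f, x, h):                  # one Richardson step, (4.19): fourth order
    return (4 * D(f, x, h) - D(f, x, 2 * h)) / 3

x = 0.3
eD = [abs(D(math.exp, x, h) - math.exp(x)) for h in (0.1, 0.05)]
eS = [abs(S(math.exp, x, h) - math.exp(x)) for h in (0.1, 0.05)]
```

The step sizes are kept moderately large on purpose: for very small h, rounding error would contaminate the O(h^4) term.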
5 Numerical Integration

5.1 Basic Concepts

In this chapter we are going to explore various ways for approximating the integral of a function over a given domain. Since we can not analytically integrate every function, the need for approximate integration formulas is obvious. In addition, there might be situations where the given function can be integrated analytically, and still, an approximation formula may end up being a more efficient alternative to evaluating the exact expression of the integral.

In order to gain some insight on numerical integration, it will be natural to recall the notion of Riemann integration. We assume that f(x) is a bounded function defined on [a, b] and that \{x_0, ..., x_n\} is a partition (P) of [a, b]. For each i we let

M_i(f) = \sup_{x \in [x_{i-1}, x_i]} f(x),

and

m_i(f) = \inf_{x \in [x_{i-1}, x_i]} f(x).

Letting \Delta x_i = x_i - x_{i-1}, the upper (Darboux) sum of f(x) with respect to the partition P is defined as

U(f, P) = \sum_{i=1}^{n} M_i \Delta x_i,   (5.1)

while the lower (Darboux) sum of f(x) with respect to the partition P is defined as

L(f, P) = \sum_{i=1}^{n} m_i \Delta x_i.   (5.2)

The upper integral of f(x) on [a, b] is defined as

U(f) = \inf(U(f, P)),

and the lower integral of f(x) is defined as

L(f) = \sup(L(f, P)),

where both the infimum and the supremum are taken over all possible partitions P of the interval [a, b]. If the upper and lower integrals of f(x) are equal to each other, their common value is denoted by \int_a^b f(x)dx and is referred to as the Riemann integral of f(x).

For the purpose of the present discussion we can think of the upper and lower Darboux sums (5.1), (5.2), as two approximations of the integral (assuming that the function is indeed integrable). Of course, these sums are not defined in the most convenient way for an approximation algorithm. This is because we need to find the extrema of the function in every subinterval. Finding the extrema of the function may be a complicated task on its own, which we would like to avoid.
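The Darboux sums (5.1)-(5.2) bracket the integral from above and below. A small sketch (Python and the name `darboux` are ours; the extrema are estimated by dense sampling rather than computed exactly, which is precisely the inconvenience noted above):

```python
def darboux(f, a, b, n, samples=50):
    """Approximate upper and lower Darboux sums on a uniform partition."""
    h = (b - a) / n
    U = L = 0.0
    for i in range(n):
        vals = [f(a + i * h + j * h / samples) for j in range(samples + 1)]
        U += max(vals) * h
        L += min(vals) * h
    return L, U

L, U = darboux(lambda x: x * x, 0.0, 1.0, 100)   # true integral: 1/3
```

For f(x) = x^2 the sampled extrema are the true ones (f is monotone on each subinterval), so L \leq 1/3 \leq U, and the gap is U - L = h (f(1) - f(0)) = 0.01.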
Instead, we can think of approximating the value of \int_a^b f(x)dx by multiplying the value of the function at one of the endpoints of the interval by the length of the interval, i.e.,

\int_a^b f(x)dx \approx f(a)(b - a).   (5.3)

The approximation (5.3) is called the rectangular method (see Figure 5.1). Numerical integration formulas are also referred to as integration rules or quadratures, and hence we can refer to (5.3) as the rectangular rule or the rectangular quadrature.

[Figure 5.1: A rectangular quadrature.]

A variation on the rectangular rule is the midpoint rule. Similarly to the rectangular rule, we approximate the value of the integral \int_a^b f(x)dx by multiplying the length of the interval by the value of the function at one point. Only this time, we replace the value of the function at an endpoint by the value of the function at the center point \frac{1}{2}(a + b), i.e.,

\int_a^b f(x)dx \approx (b - a) f\left(\frac{a + b}{2}\right)   (5.4)

(see also Figure 5.2). As we shall see below, the midpoint quadrature (5.4) is a more accurate quadrature than the rectangular rule (5.3).

[Figure 5.2: A midpoint quadrature.]

In order to compute the quadrature error for the midpoint rule (5.4), we consider the primitive function F(x),

F(x) = \int_a^x f(t)dt,

and expand

\int_a^{a+h} f(x)dx = F(a + h) = F(a) + h F'(a) + \frac{h^2}{2} F''(a) + \frac{h^3}{6} F'''(a) + O(h^4) = h f(a) + \frac{h^2}{2} f'(a) + \frac{h^3}{6} f''(a) + O(h^4).   (5.5)

If we let b = a + h, we have (expanding f(a + h/2)) for the quadrature error, E,

E = \int_a^{a+h} f(x)dx - h f\left(a + \frac{h}{2}\right) = h f(a) + \frac{h^2}{2} f'(a) + \frac{h^3}{6} f''(a) + O(h^4) - h\left[f(a) + \frac{h}{2} f'(a) + \frac{h^2}{8} f''(a) + O(h^3)\right],

which means that the error term is of order O(h^3), so we should stop the expansions there and write

E = h^3 f''(\xi)\left(\frac{1}{6} - \frac{1}{8}\right) = \frac{(b - a)^3}{24} f''(\xi),  \xi \in (a, b).   (5.6)

Remark. Throughout this section we assume that all functions we are interested in integrating are actually integrable in the domain of interest. We also assume that they are bounded and that they are defined at every point, so that whenever we need to evaluate a function at a point, we can do it. We will go on and use these assumptions throughout the chapter.

5.2 Integration via Interpolation

In this section we will study how to derive quadratures by integrating an interpolant. As always, our goal is to evaluate I = \int_a^b f(x)dx. We select nodes x_0, ..., x_n \in [a, b], and write the Lagrange interpolant (of degree \leq n) through these points, i.e.,

P_n(x) = \sum_{i=0}^{n} f(x_i) l_i(x),

with

l_i(x) = \prod_{j=0, j \neq i}^{n} \frac{x - x_j}{x_i - x_j},  0 \leq i \leq n.

Hence, we can approximate

\int_a^b f(x)dx \approx \int_a^b P_n(x)dx = \sum_{i=0}^{n} f(x_i) \int_a^b l_i(x)dx = \sum_{i=0}^{n} A_i f(x_i).   (5.7)

The quadrature coefficients A_i in (5.7) are given by

A_i = \int_a^b l_i(x)dx.   (5.8)

Note that if we want to integrate several different functions at the same points, the quadrature coefficients (5.8) need to be computed only once, since they do not depend on the function that is being integrated. If we change the interpolation/integration points, then we must recompute the quadrature coefficients. For equally spaced points, a numerical integration formula of the form

\int_a^b f(x)dx \approx \sum_{i=0}^{n} A_i f(x_i)   (5.9)

is called a Newton-Cotes formula.

Example 5.1
We let n = 1 and consider two interpolation points which we set as x_0 = a, x_1 = b. In this case

l_0(x) = \frac{b - x}{b - a},   l_1(x) = \frac{x - a}{b - a}.

Hence

A_0 = \int_a^b l_0(x)dx = \int_a^b \frac{b - x}{b - a} dx = \frac{b - a}{2}.

Similarly,

A_1 = \int_a^b l_1(x)dx = \int_a^b \frac{x - a}{b - a} dx = \frac{b - a}{2} = A_0.

The resulting quadrature is the so-called trapezoidal rule,

\int_a^b f(x)dx \approx \frac{b - a}{2}[f(a) + f(b)]   (5.10)

(see Figure 5.3).

[Figure 5.3: A trapezoidal quadrature.]

We can now use the interpolation error to compute the error in the quadrature (5.10). The interpolation error is

f(x) - P_1(x) = \frac{1}{2} f''(\xi_x)(x - a)(x - b),  \xi_x \in (a, b),

and hence (using the integral intermediate value theorem)

E = \int_a^b \frac{1}{2} f''(\xi_x)(x - a)(x - b)dx = \frac{f''(\xi)}{2} \int_a^b (x - a)(x - b)dx = -\frac{f''(\xi)}{12}(b - a)^3,   (5.11)

with \xi \in (a, b).

Remarks.

1. We note that the quadratures (5.7)-(5.8) are exact for polynomials of degree \leq n. For if p(x) is such a polynomial, it can be written as (check!)

p(x) = \sum_{i=0}^{n} p(x_i) l_i(x).

Hence

\int_a^b p(x)dx = \sum_{i=0}^{n} p(x_i) \int_a^b l_i(x)dx = \sum_{i=0}^{n} A_i p(x_i).

2. As for the opposite direction, assume that the quadrature

\int_a^b f(x)dx \approx \sum_{i=0}^{n} A_i f(x_i)

is exact for all polynomials of degree \leq n. Since deg(l_j(x)) = n, we then know that

\int_a^b l_j(x)dx = \sum_{i=0}^{n} A_i l_j(x_i) = \sum_{i=0}^{n} A_i \delta_{ij} = A_j.

5.3 Composite Integration Rules

In a composite quadrature, we divide the interval into subintervals and apply an integration rule to each subinterval. We demonstrate this idea with a couple of examples.

Example 5.2
Consider the points a = x_0 < x_1 < \cdots < x_n = b. The composite trapezoidal rule is obtained by applying the trapezoidal rule in each subinterval [x_{i-1}, x_i], i = 1, ..., n, i.e.,

\int_a^b f(x)dx = \sum_{i=1}^{n} \int_{x_{i-1}}^{x_i} f(x)dx \approx \frac{1}{2} \sum_{i=1}^{n} (x_i - x_{i-1})[f(x_{i-1}) + f(x_i)]   (5.12)

(see Figure 5.4).

[Figure 5.4: A composite trapezoidal rule.]

A particular case is when these points are uniformly spaced, i.e., when all intervals have an equal length: x_i = a + ih, where

h = \frac{b - a}{n}.

Then

\int_a^b f(x)dx \approx \frac{h}{2}\left[f(a) + 2 \sum_{i=1}^{n-1} f(a + ih) + f(b)\right] = h \sum_{i=0}^{n}{}'' f(a + ih).   (5.13)

The notation of a sum with two primes, \sum'', means that we sum over all the terms with the exception of the first and last terms, which are divided by 2.
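The uniform-grid rule (5.13) is only a few lines of code. A sketch (our names, not from the notes):

```python
def comp_trapezoid(f, a, b, n):
    """Composite trapezoidal rule (5.13) with n uniform subintervals."""
    h = (b - a) / n
    return h * (f(a) / 2 + sum(f(a + i * h) for i in range(1, n)) + f(b) / 2)

t_lin = comp_trapezoid(lambda x: 2 * x + 1, 0.0, 1.0, 7)   # exact: 2
t_sq = comp_trapezoid(lambda x: x * x, 0.0, 1.0, 10)       # exact: 1/3
```

The rule is exact for linear integrands (each trapezoid matches the area exactly), while for f(x) = x^2 with h = 0.1 the computed value exceeds 1/3 by h^2(b - a)f''/12 = 1/600, matching the error formula derived next.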
We can also compute the error term as a function of the distance between neighboring points, h. We know from (5.11) that in every subinterval the quadrature error is

-\frac{h^3}{12} f''(\xi_i).

Hence, the overall error is obtained by summing over n such terms:

\sum_{i=1}^{n} \left(-\frac{h^3}{12} f''(\xi_i)\right) = -\frac{h^3 n}{12} \left[\frac{1}{n} \sum_{i=1}^{n} f''(\xi_i)\right].

Here, we use the notation \xi_i to denote an intermediate point that belongs to the ith interval. Let

M = \frac{1}{n} \sum_{i=1}^{n} f''(\xi_i).

Clearly

\min_{x \in [a, b]} f''(x) \leq M \leq \max_{x \in [a, b]} f''(x).

If we assume that f''(x) is continuous in [a, b] (which we anyhow do in order for the interpolation error formula to be valid), then there exists a point \xi \in [a, b] such that

f''(\xi) = M.

Hence (recalling that (b - a)/n = h), we have

E = -\frac{(b - a) h^2}{12} f''(\xi),  \xi \in [a, b].   (5.14)

This means that the composite trapezoidal rule is second-order accurate.

Example 5.3
In the interval [a, b] we assume n subintervals and let

h = \frac{b - a}{n}.

The composite midpoint rule is given by applying the midpoint rule (5.4) in every subinterval, i.e.,

\int_a^b f(x)dx \approx h \sum_{j=1}^{n} f(x_j),   (5.15)

where the quadrature points are

x_j = a + \left(j - \frac{1}{2}\right) h,  j = 1, 2, ..., n.

Equation (5.15) is known as the composite midpoint rule. In order to obtain the quadrature error in the approximation (5.15), we recall that in each subinterval the error is given according to (5.6), i.e.,

E_j = \frac{h^3}{24} f''(\xi_j),  \xi_j \in \left(x_j - \frac{h}{2}, x_j + \frac{h}{2}\right).

Hence

E = \sum_{j=1}^{n} E_j = \frac{h^3}{24} \sum_{j=1}^{n} f''(\xi_j) = \frac{h^3 n}{24} \left[\frac{1}{n} \sum_{j=1}^{n} f''(\xi_j)\right] = \frac{h^2 (b - a)}{24} f''(\xi),   (5.16)

where \xi \in (a, b). This means that the composite midpoint rule is also second-order accurate (just like the composite trapezoidal rule).
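The error estimates (5.14) and (5.16) predict that doubling n cuts the error by about a factor of 4 for both composite rules. A quick check (our code, not from the notes):

```python
import math

def comp_trap(f, a, b, n):
    h = (b - a) / n
    return h * (f(a) / 2 + sum(f(a + i * h) for i in range(1, n)) + f(b) / 2)

def comp_mid(f, a, b, n):        # composite midpoint rule (5.15)
    h = (b - a) / n
    return h * sum(f(a + (j - 0.5) * h) for j in range(1, n + 1))

exact = math.sin(1.0)            # integral of cos over [0, 1]
et = [abs(comp_trap(math.cos, 0.0, 1.0, n) - exact) for n in (10, 20)]
em = [abs(comp_mid(math.cos, 0.0, 1.0, n) - exact) for n in (10, 20)]
```

Both error ratios land near 4, and (5.14) versus (5.16) also explains why the midpoint errors are roughly half the trapezoidal ones in magnitude.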
5.4 Additional Integration Techniques

5.4.1 The method of undetermined coefficients

The method of undetermined coefficients for deriving quadratures is the following:

1. Select the quadrature points.
2. Write a quadrature as a linear combination of the values of the function at the chosen quadrature points.
3. Determine the coefficients of the linear combination by requiring that the quadrature is exact for as many polynomials as possible from the ordered set {1, x, x², . . .}.

We demonstrate this technique with the following example.

Example 5.4
Problem: Find a quadrature of the form

  ∫_0^1 f(x)dx ≈ A0 f(0) + A1 f(1/2) + A2 f(1),

that is exact for all polynomials of degree ≤ 2.

Solution: Since the quadrature has to be exact for all polynomials of degree ≤ 2, it has to be exact for the polynomials 1, x, and x². Hence we obtain the system of linear equations

  1 = ∫_0^1 1 dx = A0 + A1 + A2,
  1/2 = ∫_0^1 x dx = (1/2)A1 + A2,
  1/3 = ∫_0^1 x² dx = (1/4)A1 + A2.

Therefore, A0 = A2 = 1/6 and A1 = 2/3, and the desired quadrature is

  ∫_0^1 f(x)dx ≈ [f(0) + 4f(1/2) + f(1)]/6.   (5.17)

Since the resulting formula (5.17) is linear, its being exact for 1, x, and x² implies that it is exact for any polynomial of degree ≤ 2. In fact, we will show in Section 5.5.1 that this approximation is actually exact for polynomials of degree ≤ 3.
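The 3×3 system in Example 5.4 can be solved mechanically with exact rational arithmetic from Python's standard library. The sketch below (all names are ours, hypothetical helpers) recovers A0 = A2 = 1/6, A1 = 2/3, and also checks the degree-3 bonus fact quoted above.

```python
from fractions import Fraction as Fr

def solve3(M, rhs):
    """Tiny Gauss-Jordan elimination for a 3x3 rational system."""
    A = [row[:] + [r] for row, r in zip(M, rhs)]
    for k in range(3):
        # pivot: find a row (from k down) with a nonzero entry in column k
        p = next(i for i in range(k, 3) if A[i][k] != 0)
        A[k], A[p] = A[p], A[k]
        for i in range(3):
            if i != k and A[i][k] != 0:
                factor = A[i][k] / A[k][k]
                A[i] = [a - factor * b for a, b in zip(A[i], A[k])]
    return [A[i][3] / A[i][i] for i in range(3)]

# nodes 0, 1/2, 1; require exactness for 1, x, x^2 on [0, 1]
nodes = [Fr(0), Fr(1, 2), Fr(1)]
M = [[x**k for x in nodes] for k in range(3)]   # moment matrix, row k: x_i^k
rhs = [Fr(1, k + 1) for k in range(3)]          # integral of x^k over [0, 1]
A0, A1, A2 = solve3(M, rhs)
```

The same three steps derive any interpolatory quadrature once the nodes are fixed; only the moment matrix and the right-hand side change.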
5.4.2 Change of an interval

Suppose that we have a quadrature formula on the interval [c, d] of the form

  ∫_c^d f(t)dt ≈ Σ_{i=0}^n A_i f(t_i).   (5.18)

We would like to use (5.18) to find a quadrature on the interval [a, b] that approximates ∫_a^b f(x)dx. The mapping between the intervals [c, d] → [a, b] can be written as a linear transformation of the form

  λ(t) = ((b − a)t + (ad − bc))/(d − c).

Hence

  ∫_a^b f(x)dx = ((b − a)/(d − c)) ∫_c^d f(λ(t))dt ≈ ((b − a)/(d − c)) Σ_{i=0}^n A_i f(λ(t_i)).

This means that

  ∫_a^b f(x)dx ≈ ((b − a)/(d − c)) Σ_{i=0}^n A_i f( ((b − a)/(d − c)) t_i + (ad − bc)/(d − c) ).   (5.19)

We note that if the quadrature (5.18) was exact for polynomials of degree m, so is (5.19).

Example 5.5
We want to write the result of the previous example,

  ∫_0^1 f(x)dx ≈ [f(0) + 4f(1/2) + f(1)]/6,

as a quadrature on the interval [a, b]. According to (5.19),

  ∫_a^b f(x)dx ≈ ((b − a)/6)[ f(a) + 4f((a + b)/2) + f(b) ].   (5.20)

The approximation (5.20) is known as the Simpson quadrature.
5.4.3 General integration formulas

We recall that a weight function is a continuous, nonnegative function with a positive mass. We assume that such a weight function w(x) is given and would like to write a quadrature of the form

  ∫_a^b f(x)w(x)dx ≈ Σ_{i=0}^n A_i f(x_i).   (5.21)

Such quadratures are called general (weighted) quadratures. Previously, for the case w(x) ≡ 1, we wrote a quadrature of the form

  ∫_a^b f(x)dx ≈ Σ_{i=0}^n A_i f(x_i),

where

  A_i = ∫_a^b l_i(x)dx.

Repeating the derivation we carried out before, we construct an interpolant Q_n(x) of degree ≤ n that passes through the points x0, . . . , xn. Its Lagrange form is

  Q_n(x) = Σ_{i=0}^n f(x_i) l_i(x),

with the usual

  l_i(x) = Π_{j=0, j≠i}^n (x − x_j)/(x_i − x_j),  0 ≤ i ≤ n.

Hence

  ∫_a^b f(x)w(x)dx ≈ ∫_a^b Q_n(x)w(x)dx = Σ_{i=0}^n f(x_i) ∫_a^b l_i(x)w(x)dx = Σ_{i=0}^n A_i f(x_i).

To summarize, the general quadrature is

  ∫_a^b f(x)w(x)dx ≈ Σ_{i=0}^n A_i f(x_i),   (5.22)

with quadrature coefficients that are given by

  A_i = ∫_a^b l_i(x)w(x)dx.   (5.23)
5.5 Simpson's Integration

In the last example we obtained Simpson's quadrature (5.20). An alternative derivation is the following: start with a polynomial Q2(x) that interpolates f(x) at the points a, c = (a + b)/2, and b. Then approximate

  ∫_a^b f(x)dx ≈ ∫_a^b [ ((x − c)(x − b))/((a − c)(a − b)) f(a) + ((x − a)(x − b))/((c − a)(c − b)) f(c) + ((x − a)(x − c))/((b − a)(b − c)) f(b) ] dx
   = . . . = ((b − a)/6)[ f(a) + 4f((a + b)/2) + f(b) ],

which is Simpson's rule (5.20). Figure 5.5 demonstrates this process of deriving Simpson's quadrature for the specific choice of approximating ∫_1^3 sin x dx.

[Figure 5.5: An example of Simpson's quadrature. The approximation of ∫_1^3 sin x dx is obtained by integrating the quadratic interpolant P2(x) over [1, 3].]

5.5.1 The quadrature error

Surprisingly, Simpson's quadrature is exact for polynomials of degree ≤ 3 and not only for polynomials of degree ≤ 2. We will see that by studying the error term. We let h denote half of the interval [a, b], i.e.,

  h = (b − a)/2.
. . 3 3 3 · 5! f (x)dx ≈ a+2h F (x) = a f (t)dt. . where h= b−a . n 86 0 i n. (5.24) implies that Simpson’s quadrature is exact for polynomials of degree 3. the quadrature error formula (5. x h [f (a) + 4f (a + h) + f (a + 2h)] 3 a h 4 4 4 = f (a) + 4f (a) + 4hf (a) + h2 f (a) + h3 f (a) + h4 f (4) (a) + .. Hence a+2h F (a + 2h) = a f (x)dx = F (a) + 2hF (a) + + (2h)2 (2h)3 F (a) + F (a) 2 6 (2h)5 (5) (2h)4 (4) F (a) + F (a) + . . 5. . n. + f (a) + 2hf (a) + 2 6 24 4 2 100 5 (4) = 2hf (a) + 2h2 f (a) + h3 f (a) + h4 f (a) + h f (a) + . . .e. we divide the interval [a. b]. i. 3 2 6 24 3 2 (2h) (2h)4 (4) (2h) f (a) + f (a) + f (a) + .5 Simpson’s Integration Then b a D. 3 3 5! which implies that F (a + 2h) − 1 h [f (a) + 4f (a + h) + f (a + 2h)] = − h5 f (4) (a) + .2 Composite Simpson rule To derive a composite version of Simpson’s quadrature. . Levy f (x)dx = We now deﬁne F (x) to be the primitive function of f (x). . ξ ∈ [a. . .24) Since the fourth derivative of any polynomial of degree 3 is identically zero. 4! 5! 4 2 32 = 2hf (a) + 2h2 f (a) + h3 f (a) + h4 f (a) + h5 f (4) (a) + . and let xi = a + ih.5. .5. 3 90 This means that the quadrature error for Simpson’s rule is 1 E=− 90 b−a 2 5 f (4) (ξ). . b] into an even number of subintervals.
In this case, if we replace the integral in every pair of subintervals by Simpson's rule (5.20), we obtain

  ∫_a^b f(x)dx = ∫_{x0}^{x2} f(x)dx + . . . + ∫_{x_{n−2}}^{x_n} f(x)dx = Σ_{i=1}^{n/2} ∫_{x_{2i−2}}^{x_{2i}} f(x)dx ≈ Σ_{i=1}^{n/2} (h/3)[ f(x_{2i−2}) + 4f(x_{2i−1}) + f(x_{2i}) ].   (5.25)

The composite Simpson quadrature is thus given by

  ∫_a^b f(x)dx ≈ (h/3)[ f(x0) + 2 Σ_{i=1}^{n/2−1} f(x_{2i}) + 4 Σ_{i=1}^{n/2} f(x_{2i−1}) + f(x_n) ].   (5.26)

Summing the error terms (that are given by (5.24)) over all subintervals, the quadrature error takes the form

  E = −(h⁵/90) Σ_{i=1}^{n/2} f⁽⁴⁾(ξ_i) = −(h⁵/90) · (n/2) · (2/n) Σ_{i=1}^{n/2} f⁽⁴⁾(ξ_i).

Since

  min_{x∈[a,b]} f⁽⁴⁾(x) ≤ (2/n) Σ_{i=1}^{n/2} f⁽⁴⁾(ξ_i) ≤ max_{x∈[a,b]} f⁽⁴⁾(x),

we can conclude that

  E = −(h⁵ n/180) f⁽⁴⁾(ξ) = −(h⁴(b − a)/180) f⁽⁴⁾(ξ),  ξ ∈ [a, b].

Hence, the composite Simpson quadrature is fourth-order accurate.

5.6 Gaussian Quadrature

5.6.1 Maximizing the quadrature's accuracy

So far, all the quadratures we encountered were of the form

  ∫_a^b f(x)dx ≈ Σ_{i=0}^n A_i f(x_i).   (5.27)

An approximation of the form (5.27) was shown to be exact for polynomials of degree ≤ n for an appropriate choice of the quadrature coefficients A_i. In all cases, the quadrature points x0, . . . , xn were given up front. In other words, given a set of nodes x0, . . . , xn, the coefficients {A_i}_{i=0}^n were determined such that the approximation was exact in Π_n.

We are now interested in investigating the possibility of writing more accurate quadratures without increasing the total number of quadrature points. This will be
possible if we allow for the freedom of choosing the quadrature points. The quadrature problem now becomes a problem of choosing the quadrature points, in addition to determining the corresponding coefficients, in a way that the quadrature is exact for polynomials of a maximal degree. Quadratures that are obtained that way are called Gaussian quadratures.

Example 5.6
The quadrature formula

  ∫_{−1}^1 f(x)dx ≈ f(−1/√3) + f(1/√3),

is exact for polynomials of degree ≤ 3(!). We will revisit this problem and prove this result in Example 5.9 below.

Since the number of quadrature nodes, x0, . . . , xn, is n + 1, and so is the number of quadrature coefficients, A_i, we have altogether 2n + 2 degrees of freedom. Hence, if we have the flexibility of determining the location of the points in addition to determining the coefficients, we can expect to be able to derive quadratures that are exact for polynomials in Π_{2n+1}. This is indeed the case as we shall see below. We will show that the general solution of this integration problem is connected with the roots of orthogonal polynomials.

An equivalent problem can be stated for the more general weighted quadrature case. Here,

  ∫_a^b f(x)w(x)dx ≈ Σ_{i=0}^n A_i f(x_i),   (5.28)

where w(x) ≥ 0 is a weight function. Equation (5.28) is exact for f ∈ Π_n if and only if

  A_i = ∫_a^b w(x) Π_{j=0, j≠i}^n (x − x_j)/(x_i − x_j) dx.

In both cases, (5.27) and (5.28), the number of quadrature nodes is n + 1. We start with the following theorem.

Theorem 5.7
Let q(x) be a nonzero polynomial of degree n + 1 that is w-orthogonal to Π_n, i.e.,

  ∀p(x) ∈ Π_n,  ∫_a^b p(x)q(x)w(x)dx = 0.

If x0, . . . , xn are the zeros of q(x), then (5.28) is exact ∀f ∈ Π_{2n+1}.

Proof.
For f(x) ∈ Π_{2n+1}, write f(x) = q(x)p(x) + r(x). We note that p(x), r(x) ∈ Π_n. Since x0, . . . , xn are the zeros of q(x), f(x_i) = r(x_i).
t2 ). 89 .6 Gaussian Quadrature f (x)w(x)dx = a n a [q(x)p(x) + r(x)]w(x)dx = a n r(x)w(x)dx (5. Levy Hence. Assume that f (x) is continuous in [a. The polynomial n p(x) = i=1 (x − ti ).8 Let w(x) be a weight function. f (x) changes sign at least once. In other words. b] that is not the zero function. . .17). b b b 5. b). and that f (x) is worthogonal to Πn . t1 ).29) = i=0 Ai r(xi ) = i=0 Ai f (xi ).29) holds since (5. where r n.7 we already know that the quadrature points that will provide the most accurate quadrature rule are the n+1 roots of an orthogonal polynomial of degree n + 1 (where the orthogonality is with respect to the weight function w(x)). a which leads to a contradiction since p(x) ∈ Πn . Now suppose that f (x) changes size only r times. We recall that the roots of q(x) are real. The second equality in (5.28) is exact for polynomials in Πn . (t1 . simple and lie in (a. b). Proof. Choose {ti }i 0 such that a = t0 < t1 < · · · < tr+1 = b. We now restate the result regarding the roots of orthogonal functions with an alternative proof. we need n + 1 quadrature points in the interval.D. tr+1 ). . Hence b f (x)p(x)w(x)dx = 0. a Hence.29) holds since q(x) is worthogonal to Πn . has the same sign property. and f (x) is of one sign on (t0 . Since 1 ∈ Πn . Theorem 5. The third equality (5. b f (x)w(x)dx = 0. something we know from our previous discussion on orthogonal polynomials (see Theorem 3. and an orthogonal polynomial of degree n + 1 does have n + 1 distinct roots in the interval. According to Theorem 5. (tr . Then f (x) changes sign at least n + 1 times on (a. .
Example 5.9
We are looking for a quadrature of the form

  ∫_{−1}^1 f(x)dx ≈ A0 f(x0) + A1 f(x1).

Since n = 1, we expect to find a quadrature that is exact for polynomials of degree 2n + 1 = 3. A straightforward computation will amount to making this quadrature exact for polynomials of degree ≤ 3. The linearity of the quadrature means that it is sufficient to make the quadrature exact for 1, x, x², and x³. Hence we write the system of equations

  ∫_{−1}^1 x^i dx = A0 x0^i + A1 x1^i,  i = 0, 1, 2, 3.

From this we can write

  A0 + A1 = 2,
  A0 x0 + A1 x1 = 0,
  A0 x0² + A1 x1² = 2/3,
  A0 x0³ + A1 x1³ = 0.

Solving for A0, A1, x0, and x1 we get

  A0 = A1 = 1,  x0 = −x1 = 1/√3,

so that the desired quadrature is

  ∫_{−1}^1 f(x)dx ≈ f(−1/√3) + f(1/√3).   (5.30)

Example 5.10
We repeat the previous problem using orthogonal polynomials. The polynomial of degree n + 1 = 2 which is orthogonal to Π_n = Π_1 with weight w(x) ≡ 1 is the Legendre polynomial of degree 2,

  P2(x) = (3x² − 1)/2.

The integration points will then be the zeros of P2(x), i.e.,

  x0 = −1/√3,  x1 = 1/√3.

All that remains is to determine the coefficients A0 and A1; this is done in the usual way.
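The two-point rule (5.30) is easy to check numerically. Mapped to a general interval [a, b] via the change of interval (5.19), it is exact for cubics but not for x⁴; a minimal sketch (the function name is ours):

```python
import math

def gauss2(f, a=-1.0, b=1.0):
    """Two-point Gauss-Legendre rule (5.30), mapped from [-1, 1] to [a, b]."""
    t = 1 / math.sqrt(3)
    mid, half = (a + b) / 2, (b - a) / 2
    return half * (f(mid - half * t) + f(mid + half * t))
```

With just two function evaluations, `gauss2(math.exp, 0.0, 1.0)` already approximates e − 1 to a few parts in ten thousand.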
We assume that the quadrature

  ∫_{−1}^1 f(x)dx ≈ A0 f(x0) + A1 f(x1),   (5.31)

is exact for polynomials of degree ≤ 1. The simplest choice is to use 1 and x:

  2 = ∫_{−1}^1 1 dx = A0 + A1,

and

  0 = ∫_{−1}^1 x dx = −A0/√3 + A1/√3.

Hence A0 = A1 = 1, and the quadrature is the same as (5.30) (as it should be).

5.6.2 Convergence and error analysis

Lemma 5.11
In a Gaussian quadrature formula, the coefficients are positive and their sum is ∫_a^b w(x)dx.

Proof.
Fix n. Let q(x) ∈ Π_{n+1} be w-orthogonal to Π_n, and take {x_i}_{i=0}^n, the roots of q(x), to be the quadrature points. Fix 0 ≤ j ≤ n, and let p(x) ∈ Π_n be defined as

  p(x) = q(x)/(x − x_j).

Since x_j is a root of q(x), p(x) is indeed a polynomial of degree ≤ n. The degree of p²(x) is ≤ 2n, which means that the Gaussian quadrature is exact for it. In addition, p(x_i) = 0 for i ≠ j. Hence

  0 < ∫_a^b p²(x)w(x)dx = Σ_{i=0}^n A_i p²(x_i) = A_j p²(x_j),

which means that ∀j, A_j > 0. Also, since the Gaussian quadrature is exact for f(x) ≡ 1, we have

  ∫_a^b w(x)dx = Σ_{i=0}^n A_i.

In order to estimate the error in the Gaussian quadrature we would first like to present an alternative way of deriving the Gaussian quadrature. Our starting point is the Lagrange form of the Hermite polynomial that interpolates f(x) and f′(x) at x0, . . . , xn. It is given by (2.42), i.e.,

  p(x) = Σ_{i=0}^n f(x_i)a_i(x) + Σ_{i=0}^n f′(x_i)b_i(x),
with

  a_i(x) = (l_i(x))²[1 + 2l_i′(x_i)(x_i − x)],
  b_i(x) = (x − x_i) l_i²(x),  0 ≤ i ≤ n,

and

  l_i(x) = Π_{j=0, j≠i}^n (x − x_j)/(x_i − x_j).

We now assume that w(x) is a weight function in [a, b] and approximate

  ∫_a^b w(x)f(x)dx ≈ ∫_a^b w(x)p_{2n+1}(x)dx = Σ_{i=0}^n A_i f(x_i) + Σ_{i=0}^n B_i f′(x_i),   (5.32)

where

  A_i = ∫_a^b w(x)a_i(x)dx,   (5.33)

and

  B_i = ∫_a^b w(x)b_i(x)dx.   (5.34)

In some sense, it seems to be rather strange to deal with the Hermite interpolant when we do not explicitly know the values of f′(x) at the interpolation points. However, we can eliminate the derivatives from the quadrature (5.32) by setting B_i = 0 in (5.34). Indeed (assuming n ≠ 0):

  B_i = ∫_a^b w(x)(x − x_i) l_i²(x)dx = (1/Π_{j≠i}(x_i − x_j)) ∫_a^b w(x) Π_{j=0}^n (x − x_j) l_i(x)dx.

Since l_i(x) is a polynomial in Π_n, B_i = 0 if the product Π_{j=0}^n (x − x_j) is w-orthogonal to Π_n. Hence, all that we need is to set the points x0, . . . , xn as the roots of a polynomial of degree n + 1 that is w-orthogonal to Π_n. This is precisely what we defined as a Gaussian quadrature.

We are now ready to formally establish the fact that the Gaussian quadrature is exact for polynomials of degree ≤ 2n + 1.

Theorem 5.12
Let f ∈ C^{2n+2}[a, b] and let w(x) be a weight function. Consider the Gaussian quadrature

  ∫_a^b f(x)w(x)dx ≈ Σ_{i=0}^n A_i f(x_i).

Then there exists ζ ∈ (a, b) such that

  ∫_a^b f(x)w(x)dx − Σ_{i=0}^n A_i f(x_i) = (f^{(2n+2)}(ζ)/(2n + 2)!) ∫_a^b Π_{j=0}^n (x − x_j)² w(x)dx.
Proof.
We use the characterization of the Gaussian quadrature as the integral of a Hermite interpolant. We recall that the error formula for the Hermite interpolation is given by (2.49), i.e.,

  f(x) − p_{2n+1}(x) = (f^{(2n+2)}(ξ)/(2n + 2)!) Π_{j=0}^n (x − x_j)²,  ξ ∈ (a, b).

Hence, according to (5.32) we have

  ∫_a^b f(x)w(x)dx − Σ_{i=0}^n A_i f(x_i) = ∫_a^b f(x)w(x)dx − ∫_a^b p_{2n+1}(x)w(x)dx = ∫_a^b (f^{(2n+2)}(ξ)/(2n + 2)!) w(x) Π_{j=0}^n (x − x_j)² dx.

The integral mean value theorem then implies that there exists ζ ∈ (a, b) such that

  ∫_a^b f(x)w(x)dx − Σ_{i=0}^n A_i f(x_i) = (f^{(2n+2)}(ζ)/(2n + 2)!) ∫_a^b Π_{j=0}^n (x − x_j)² w(x)dx.

We conclude this section with a convergence theorem that states that for continuous functions, the Gaussian quadrature converges to the exact value of the integral as the number of quadrature points tends to infinity. This theorem is not of great practical value because it does not provide an estimate on the rate of convergence.

Theorem 5.13
Let w(x) be a weight function and assume that f(x) is a continuous function on [a, b]. For each n ∈ ℕ let {x_{ni}}_{i=0}^n be the n + 1 roots of the polynomial of degree n + 1 that is w-orthogonal to Π_n, and consider the corresponding Gaussian quadrature:

  ∫_a^b f(x)w(x)dx ≈ Σ_{i=0}^n A_{ni} f(x_{ni}).   (5.35)

Then the right-hand-side of (5.35) converges to the left-hand-side as n → ∞. A proof of the theorem that is based on the Weierstrass approximation theorem can be found, e.g., in [7].

5.7 Romberg Integration

We have introduced Richardson's extrapolation in Section 4.4 in the context of numerical differentiation. We can use a similar principle with numerical integration, i.e., increasing the accuracy of a quadrature by eliminating the leading error term. We will demonstrate this principle with a particular example. Let I denote the exact integral that we would like to approximate,

  I = ∫_a^b f(x)dx.
Let's assume that this integral is approximated with a composite trapezoidal rule on a uniform grid with mesh spacing h (5.13),

  T(h) = h Σ″_{i=0}^n f(a + ih).

We know that the composite trapezoidal rule is second-order accurate (see (5.14)). A more detailed study of the quadrature error reveals that the difference between I and T(h) can be written as

  I = T(h) + c1 h² + c2 h⁴ + . . . + c_k h^{2k} + O(h^{2k+2}).

The exact values of the coefficients c_k are of no interest to us as long as they do not depend on h (which is indeed the case). We can now write a similar quadrature that is based on half the number of points, i.e., T(2h). Hence

  I = T(2h) + c1(2h)² + c2(2h)⁴ + . . .

This enables us to eliminate the h² error term:

  I = (4T(h) − T(2h))/3 + ĉ2 h⁴ + . . .

Therefore

  (4T(h) − T(2h))/3 = (1/3)[ 4h( (1/2)f0 + f1 + . . . + f_{n−1} + (1/2)f_n ) − 2h( (1/2)f0 + f2 + . . . + f_{n−2} + (1/2)f_n ) ]
   = (h/3)(f0 + 4f1 + 2f2 + . . . + 2f_{n−2} + 4f_{n−1} + f_n) = S(n).

Here, S(n) denotes the composite Simpson's rule with n subintervals. The procedure of increasing the accuracy of the quadrature by eliminating the leading error term is known as Romberg integration. In some places, Romberg integration is used to describe the specific case of turning the composite trapezoidal rule into Simpson's rule (and so on). The quadrature that is obtained from Simpson's rule by eliminating the leading error term is known as the "super Simpson" rule.
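The identity (4T(h) − T(2h))/3 = S(n), and the full Romberg table it seeds, can be sketched in a few lines of Python (function names are ours; a minimal sketch, not an optimized implementation):

```python
import math

def composite_simpson(f, a, b, n):
    """Composite Simpson rule S(n), equation (5.26); n must be even."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    odd = sum(f(a + i * h) for i in range(1, n, 2))
    even = sum(f(a + i * h) for i in range(2, n - 1, 2))
    return (h / 3) * (f(a) + 4 * odd + 2 * even + f(b))

def romberg(f, a, b, levels=5):
    """Romberg table: R[i][0] = T((b-a)/2^i); each extra column
    eliminates one more term of the h^2, h^4, ... error expansion."""
    R = [[0.0] * (i + 1) for i in range(levels)]
    h = b - a
    R[0][0] = h * (f(a) + f(b)) / 2
    for i in range(1, levels):
        h /= 2
        # halved spacing: reuse the coarser trapezoidal value, add new midpoints
        R[i][0] = R[i - 1][0] / 2 + h * sum(f(a + (2 * k + 1) * h)
                                            for k in range(2 ** (i - 1)))
        for j in range(1, i + 1):
            R[i][j] = (4**j * R[i][j - 1] - R[i - 1][j - 1]) / (4**j - 1)
    return R
```

Column 1 of the table reproduces S(n) exactly, matching the algebra above, and the bottom-right entry is the most extrapolated (most accurate) estimate.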
6 Methods for Solving Nonlinear Problems

6.1 The Bisection Method

In this section we present the "bisection method", which is probably the most intuitive approach to root finding. We are looking for a root of a function f(x) which we assume is continuous on the interval [a, b]. We also assume that it has opposite signs at both edges of the interval, i.e., f(a)f(b) < 0. We then know that f(x) has at least one zero in [a, b]. Of course f(x) may have more than one zero in the interval. The bisection method is only going to converge to one of the zeros of f(x). There will also be no indication of how many zeros f(x) has in the interval, and no hints regarding where we can actually hope to find more roots, if indeed there are additional roots.

The first step is to divide the interval into two equal subintervals,

  c = (a + b)/2.

This generates two subintervals, [a, c] and [c, b], of equal lengths. We want to keep the subinterval that is guaranteed to contain a root. Of course, in the rare event where f(c) = 0 we are done. Otherwise, we check if f(a)f(c) < 0. If yes, we keep the left subinterval [a, c]. If f(a)f(c) > 0, we keep the right subinterval [c, b]. This procedure repeats until the stopping criterion is satisfied: we fix a small parameter ε > 0 and stop when |f(c)| < ε. To simplify the notation, we denote the successive intervals by [a0, b0], [a1, b1], . . .

The first two iterations in the bisection method are shown in Figure 6.1. Note that in the case that is shown in the figure, the function f(x) has multiple roots but the method converges to only one of them.

[Figure 6.1: The first two iterations in a bisection root-finding method.]
We would now like to understand if the bisection method always converges to a root. We would also like to figure out how close we are to a root after iterating the algorithm several times. We first note that

  a0 ≤ a1 ≤ a2 ≤ . . . ≤ b0,

and

  b0 ≥ b1 ≥ b2 ≥ . . . ≥ a0.

The sequences {a_n}_{n≥0} and {b_n}_{n≥0} are monotone and bounded, and hence converge. We also know that every iteration shrinks the length of the interval by a half, i.e.,

  b_{n+1} − a_{n+1} = (1/2)(b_n − a_n),

which means that

  b_n − a_n = 2^{−n}(b0 − a0).

Hence

  lim_{n→∞} b_n − lim_{n→∞} a_n = lim_{n→∞} 2^{−n}(b0 − a0) = 0,

so that both sequences converge to the same value. We denote that value by r, i.e.,

  r = lim_{n→∞} a_n = lim_{n→∞} b_n.

Since f(a_n)f(b_n) ≤ 0, we know that (f(r))² ≤ 0, which means that f(r) = 0, i.e., r is a root of f(x).

We now assume that we stop in the interval [a_n, b_n]. This means that r ∈ [a_n, b_n]. Given such an interval, if we have to guess where the root is (which we know is in the interval), it is easy to see that the best estimate for its location is the center of the interval, i.e.,

  c_n = (a_n + b_n)/2.

In this case, we have

  |r − c_n| ≤ (1/2)(b_n − a_n) = 2^{−(n+1)}(b0 − a0).

We summarize this result with the following theorem.

Theorem 6.1
If [a_n, b_n] is the interval that is obtained in the n-th iteration of the bisection method, then the limits lim_{n→∞} a_n and lim_{n→∞} b_n exist, and

  lim_{n→∞} a_n = lim_{n→∞} b_n = r,

where f(r) = 0. In addition, if

  c_n = (a_n + b_n)/2,

then

  |r − c_n| ≤ 2^{−(n+1)}(b0 − a0).
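A direct Python transcription of the procedure (names are ours); the last check below uses the error bound of Theorem 6.1.

```python
def bisect(f, a, b, eps=1e-12, max_iter=200):
    """Bisection: keep the half interval whose endpoints bracket a sign change."""
    fa, fb = f(a), f(b)
    if fa * fb > 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    for _ in range(max_iter):
        c = (a + b) / 2
        fc = f(c)
        if fc == 0 or (b - a) / 2 < eps:
            return c
        if fa * fc < 0:
            b = c          # sign change in [a, c]: keep the left half
        else:
            a, fa = c, fc  # otherwise keep the right half
    return (a + b) / 2
```

The stopping test here is on the interval half-length rather than on |f(c)| < ε; both variants are common, and the interval-based one ties directly to the bound |r − c_n| ≤ 2^{−(n+1)}(b0 − a0).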
6.2 Newton's Method

Newton's method is a relatively simple, practical, and widely-used root-finding method. It is easy to see that while in some cases the method rapidly converges to a root of the function, in some other cases it may fail to converge at all. This is one reason why it is so important not only to understand the construction of the method, but also to understand its limitations.

As always, we assume that f(x) has at least one (real) root, and denote it by r. We start with an initial guess for the location of the root, say x0. We then let l(x) be the tangent line to f(x) at x0, i.e.,

  l(x) − f(x0) = f′(x0)(x − x0).

The intersection of l(x) with the x-axis serves as the next estimate of the root. We denote this point by x1 and write

  0 − f(x0) = f′(x0)(x1 − x0),

which means that

  x1 = x0 − f(x0)/f′(x0).   (6.1)

In general, starting from a point x_n, we find the next approximation of the root, x_{n+1}, from which we find x_{n+2}, and so on. Hence, the Newton method (also known as the Newton-Raphson method) for finding a root is given by iterating (6.1) repeatedly, i.e.,

  x_{n+1} = x_n − f(x_n)/f′(x_n).   (6.2)

Two sample iterations of the method are shown in Figure 6.2.

In order to analyze the error in Newton's method we let the error in the n-th iteration be

  e_n = x_n − r.

We assume that f′(x) is continuous and that f′(r) ≠ 0, i.e., that r is a simple root of f(x). We will show that the method has a quadratic convergence rate, i.e.,

  e_{n+1} ≈ c e_n².   (6.3)

A convergence rate estimate of the type (6.3) makes sense, of course, only if the method converges. It is easy to see that Newton's method does not always converge. Here we consider the function f(x) = tan⁻¹(x) and show what happens if we start with a point which is a fixed point of Newton's method iterated twice. In this case, x0 ≈ 1.3917 is such a point. Indeed, we will prove the convergence of the method for certain functions f(x).
We demonstrate such a case in Figure 6.3.

[Figure 6.2: Two iterations in Newton's root-finding method. r is the root of f(x) we approach by starting from x_n, computing x_{n+1}, then x_{n+2}, etc.]

[Figure 6.3: Newton's method does not always converge. Here f(x) = tan⁻¹(x), and the starting point is a fixed point of Newton's method iterated twice.]
But before we get to the convergence issue, let's derive the estimate (6.3). We rewrite e_{n+1} as

  e_{n+1} = x_{n+1} − r = x_n − f(x_n)/f′(x_n) − r = e_n − f(x_n)/f′(x_n) = (e_n f′(x_n) − f(x_n))/f′(x_n).
Writing a Taylor expansion of f(r) about x = x_n we have

  0 = f(r) = f(x_n − e_n) = f(x_n) − e_n f′(x_n) + (1/2)e_n² f″(ξ_n),

which means that

  e_n f′(x_n) − f(x_n) = (1/2) f″(ξ_n) e_n².

Hence, the relation (6.3), e_{n+1} ≈ c e_n², holds with

  c = (1/2) f″(ξ_n)/f′(x_n).   (6.4)
Since we assume that the method converges, in the limit as n → ∞ we can replace (6.4) by

  c = (1/2) f″(r)/f′(r).   (6.5)

We now return to the issue of convergence and prove that for certain functions Newton's method converges regardless of the starting point.

Theorem 6.2
Assume that f(x) has two continuous derivatives, is monotonically increasing, convex, and has a zero. Then the zero is unique and Newton's method will converge to it from every starting point.

Proof.
The assumptions on the function f(x) imply that ∀x, f″(x) > 0 and f′(x) > 0. By (6.4), the error at the (n + 1)-th iteration, e_{n+1}, is given by

  e_{n+1} = (1/2) (f″(ξ_n)/f′(x_n)) e_n²,
and hence it is positive, i.e., e_{n+1} > 0. This implies that ∀n ≥ 1, x_n > r. Since f′(x) > 0, we have f(x_n) > f(r) = 0. Now, subtracting r from both sides of (6.2) we may write

  e_{n+1} = e_n − f(x_n)/f′(x_n),   (6.6)

which means that e_{n+1} < e_n (and hence x_{n+1} < x_n). Hence, both {e_n}_{n≥0} and {x_n}_{n≥0} are decreasing and bounded from below. This means that both sequences converge, i.e., there exists e* such that

  e* = lim_{n→∞} e_n,

and there exists x* such that

  x* = lim_{n→∞} x_n.

By (6.6) we have

  e* = e* − f(x*)/f′(x*),

so that f(x*) = 0, and hence x* = r.
6.3 The Secant Method

We recall that Newton's root-finding method is given by equation (6.2), i.e.,

  x_{n+1} = x_n − f(x_n)/f′(x_n).
We now assume that we do not know that the function f(x) is differentiable at x_n, and thus cannot use Newton's method as is. Instead, we can replace the derivative f′(x_n) that appears in Newton's method by a difference approximation. A particular choice of such an approximation,

  f′(x_n) ≈ (f(x_n) − f(x_{n−1}))/(x_n − x_{n−1}),

leads to the secant method, which is given by

  x_{n+1} = x_n − f(x_n) (x_n − x_{n−1})/(f(x_n) − f(x_{n−1})),  n ≥ 1.   (6.7)
A geometric interpretation of the secant method is shown in Figure 6.4. Given two points, (x_{n−1}, f(x_{n−1})) and (x_n, f(x_n)), the line l(x) that connects them satisfies

  l(x) − f(x_n) = ((f(x_{n−1}) − f(x_n))/(x_{n−1} − x_n)) (x − x_n).

The next approximation of the root, x_{n+1}, is defined as the intersection of l(x) and the x-axis, i.e.,

  0 − f(x_n) = ((f(x_{n−1}) − f(x_n))/(x_{n−1} − x_n)) (x_{n+1} − x_n).   (6.8)
[Figure 6.4: The secant root-finding method. The points x_{n−1} and x_n are used to obtain x_{n+1}, which is the next approximation of the root r.]

Rearranging the terms in (6.8) we end up with the secant method (6.7). We note that the secant method (6.7) requires two initial points. While this is an extra requirement compared with, e.g., Newton's method, we note that in the secant method there is no need to evaluate any derivatives. In addition, if implemented properly, every stage requires only one new function evaluation.

We now proceed with an error analysis for the secant method. As usual, we denote the error at the n-th iteration by e_n = x_n − r. We claim that the rate of convergence of the secant method is superlinear (meaning, better than linear but less than quadratic). More precisely, we will show that it is given by

  |e_{n+1}| ≈ |e_n|^α,   (6.9)

with

  α = (1 + √5)/2.

We start by rewriting e_{n+1} as

  e_{n+1} = x_{n+1} − r = (f(x_n)x_{n−1} − f(x_{n−1})x_n)/(f(x_n) − f(x_{n−1})) − r = (f(x_n)e_{n−1} − f(x_{n−1})e_n)/(f(x_n) − f(x_{n−1})).   (6.10)

Hence

  e_{n+1} = e_n e_{n−1} · ((x_n − x_{n−1})/(f(x_n) − f(x_{n−1}))) · ( (f(x_n)/e_n − f(x_{n−1})/e_{n−1}) / (x_n − x_{n−1}) ).   (6.11)
A Taylor expansion of f(x_n) about x = r reads

  f(x_n) = f(r + e_n) = f(r) + e_n f′(r) + (1/2)e_n² f″(r) + O(e_n³),

and hence, since f(r) = 0,

  f(x_n)/e_n = f′(r) + (1/2)e_n f″(r) + O(e_n²).

We thus have

  f(x_n)/e_n − f(x_{n−1})/e_{n−1} = (1/2)(e_n − e_{n−1})f″(r) + O(e_{n−1}²) + O(e_n²) = (1/2)(x_n − x_{n−1})f″(r) + O(e_{n−1}²) + O(e_n²),

and also

  (x_n − x_{n−1})/(f(x_n) − f(x_{n−1})) ≈ 1/f′(r).

The error expression (6.11) can now be simplified to

  e_{n+1} ≈ (1/2)(f″(r)/f′(r)) e_n e_{n−1} = c e_n e_{n−1}.   (6.12)

Equation (6.12) expresses the error at iteration n + 1 in terms of the errors at iterations n and n − 1. In order to turn this into a relation between the error at the (n + 1)-th iteration and the error at the n-th iteration, we now assume that the order of convergence is α, i.e.,

  |e_{n+1}| ∼ A|e_n|^α.   (6.13)

Since (6.13) also means that |e_n| ∼ A|e_{n−1}|^α, i.e., |e_{n−1}| ∼ A^{−1/α}|e_n|^{1/α}, we have

  A|e_n|^α ∼ C|e_n| A^{−1/α} |e_n|^{1/α}.

This implies that

  A^{1+1/α} C^{−1} ∼ |e_n|^{1−α+1/α}.   (6.14)

The left-hand-side of (6.14) is a nonzero constant, while the right-hand-side of (6.14) tends to zero as n → ∞ (assuming, of course, that the method converges). This is possible only if

  1 − α + 1/α = 0,
which, in turn, means that

  α = (1 + √5)/2.

In this case, the convergence is of order (1 + √5)/2 ≈ 1.618. The constant A in (6.13) is thus given by

  A = C^{1/(1+1/α)} = C^{α/(α+1)} = C^{1/α} = C^{α−1} = (f″(r)/(2f′(r)))^{α−1}.

We summarize this result with the theorem:

Theorem 6.3
Assume that f″(x) is continuous ∀x in an interval I that contains a root r of f(x). Assume that f(r) = 0 and that f′(r) ≠ 0. If x0, x1 are sufficiently close to the root r, then x_n → r. In this case, the convergence is of order (1 + √5)/2.
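A minimal sketch of the secant iteration (6.7) (names are ours). Note that each pass stores the latest function value, so only one new evaluation of f is needed per step, as remarked above.

```python
def secant(f, x0, x1, tol=1e-12, max_iter=100):
    """Secant iteration (6.7): Newton's method with f'(x_n) replaced
    by the difference quotient through the last two iterates."""
    f0, f1 = f(x0), f(x1)
    for _ in range(max_iter):
        if f1 == f0:
            break  # flat secant line: cannot proceed
        x2 = x1 - f1 * (x1 - x0) / (f1 - f0)
        if abs(x2 - x1) < tol:
            return x2
        x0, f0, x1, f1 = x1, f1, x2, f(x2)  # one new evaluation per step
    return x1
```

Starting from x0 = 1, x1 = 2 for f(x) = x² − 2, the iterates converge to √2 at the superlinear rate of Theorem 6.3.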
References

[1] Atkinson K., An introduction to numerical analysis, Second edition, John Wiley & Sons, New York, NY, 1989

[2] Cheney E.W., Introduction to approximation theory, Second edition, Chelsea publishing company, New York, NY, 1982

[3] Dahlquist G., Björck Å., Numerical methods, Prentice-Hall, Englewood Cliffs, NJ, 1974

[4] Davis P.J., Interpolation and approximation, Dover, Mineola, NY, 1975

[5] Isaacson E., Keller H.B., Analysis of numerical methods, Dover, New York, NY, 1994

[6] Stoer J., Bulirsch R., Introduction to numerical analysis, Second edition, Springer-Verlag, New York, NY, 1993

[7] Süli E., Mayers D., An introduction to numerical analysis, Cambridge University Press, Cambridge, UK, 2003
see interpolation Laguerre polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . 6 weighted least squares . . . . see integration Remez algorithm . . . . . . . Levy Vandermonde determinant .