2 Linear Systems
2.1 Regular Linear Systems
2.1.1 LU Decomposition
2.1.2 Matrix Norms, Inner Products and Condition Numbers
2.2 Nonsquare Linear Systems
2.2.1 Projections
2.2.2 Condition of Least Squares Problems
2.2.3 Orthogonal factorizations
2.2.4 Householder Reflections and Givens Rotations
2.2.5 Rank Deficient Least Squares Problems
3 Signal Processing
3.1 Discrete Fourier Transformation
4 Iterative Methods
4.1 Computation of Eigenvalues
4.1.1 Power iteration
4.2 Fixed Point Iteration
4.3 Newton's Method
4.3.1 Numerical Computation of Jacobians
4.3.2 Simplified Newton Method
4.4 Continuation Methods in Equilibrium Computation
4.5 Gauß-Newton method
4.6 Iterative Methods for Linear Systems
Notational conventions
• indices, integer numbers: small Latin letters in the range i, . . . , n, cf. the Fortran 60 convention
• vectors: small Latin letters mainly from the end of the alphabet, e.g. u, v, w
Topics

Topic                                                      Course   Hours
Polynomial Interpolation                                   all      4
Bézier Curves                                              D        2
Spline Interpolation                                       all      4
Bézier Splines                                             D        2
Norms, Stability, Condition                                all      2
Linear Systems of Equations                                all      2
Least Squares (Data fitting)                               all      2
Orthogonal Factorization                                   I        2
Numerical Signal Processing: FFT                           all      4
Nonlinear Systems: Fixed point problems                    all      2
Nonlinear Systems: Newton Iteration                        all      2
Nonlinear Data Fitting: Gauss-Newton                       I        2
Ordinary Differential Equations: Initial Value Problems    all      8
Ordinary Differential Equations: Boundary Value Problems   F,I,K    4
Basics of Parameter Estimation                             I        2
Chapter 1
Interpolation and Curve Design
Before reading this paragraph make sure that you are familiar with the basics of Linear Algebra. Reread Chapter 6.2 in [Spa94].
Definition 1
Given data points (t_i, y_i), i = 1, . . . , n. A function f is said to interpolate these data if
f(t_i) = y_i,   i = 1, . . . , n.
You might think of the independent variable t_i as time points while the y_i denote measurements taken at these points.
Other examples are pressure versus temperature, current versus voltage, etc.
There can be several measurements at a given time t_i; thus y_i might be a vector with several components.
If we only require the residuals f(t_i) − y_i = r_i to be small or even minimal in some sense, then we speak about approximation instead of interpolation. To this end we have to discuss what we mean by "small" or "minimal", and we have to introduce norms and inner products. This will be the topic of a later chapter.
Another important topic in this context is curve design. Curve design is the task
covered by many modern drawing programs like FREEHAND, COREL DRAW
etc. It is also the basis for font generation for example in METAFONT and in
the POSTSCRIPT language.
and (t_i, f(t_i))
is a particular point of that graph. Note, that the independent variable t is
always the first component of the points if we consider a graph of a function. t
parameterizes the graph and the resulting curve is called a non parametric curve.
General curves are parametric, i.e. the describing parameter has to be given
separately and is not a component of the points. The graph of a parametric
2D-curve has the form:
Γ := { (f_1(t), f_2(t))^T | t ∈ [t_0, t_e] }
where f1 and f2 are functions over the given interval. A given graph can have
different parameterizations, i.e.
Γ := { (f_1(t), f_2(t))^T | t ∈ [t_0, t_e] } = { (ϕ_1(τ), ϕ_2(τ))^T | τ ∈ [τ_0, τ_e] }.
An example of the graph of a non-parametric curve is the plot of the sine function, while the plot of the letter "S" in a given font is an example of a parametric curve.
In this course we will consider only 2D-graphs together with interpolation, approximation and curve design.
x = A \ b
and find the ai as components of the solution vector x.
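As an illustration (not part of the original text), a minimal MATLAB sketch of this Vandermonde approach might look as follows; the data vectors ti, yi are hypothetical and chosen only for the example.

ti = [0 1 2 3];  yi = [1 2 -1 -2];          % hypothetical data points
n  = length(ti) - 1;                        % degree of the interpolation polynomial
A  = zeros(n+1);
for j = 0:n
    A(:, j+1) = ti(:).^j;                   % Vandermonde matrix, columns 1, t, t^2, ...
end
a = A \ yi(:);                              % coefficients a_0, ..., a_n
p = @(t) polyval(flipud(a), t);             % polyval expects the highest degree first

Evaluating p(ti) then reproduces yi up to round-off.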
The interesting question in this context is the solvability of the system, which is
answered by the following
p(t_i) = y_i,   i = 0, . . . , n
[Figure: Lagrange basis polynomials L_0(t), L_2(t) and L_4(t) plotted on [0, 1].]
where ci are the coefficients which have to be determined according to the inter-
polation task.
Thus, the polynomials are represented in the basis
ω_j(t) = ∏_{k=0}^{j−1} (t − t_k),   j = 0, . . . , n,     (1.6)
with ω_0(t) := 1.
When comparing this representation to the monomial formulation (1.1) one rec-
ognizes that the coefficients in front of the highest degree basis functions are the
same, i.e. cn = an . This fact will be used later, when designing an algorithm for
computing the ci ’s.
Note: ω j (ti ) = 0 for all i < j. The coefficients ci of the interpolation polynomial
• and so on.
c_j = (y_j − p_{j−1}(t_j)) / ω_j(t_j)
Definition 9 We denote by p(f |t0 , . . . , tj−1 ) ∈ Pj−1 the polynomial which inter-
polates ti , yi := f (ti ), i = 0, . . . , j − 1.
Its leading coefficient, i.e. the coefficient in front of tj−1 in its monomial repre-
sentation is correspondingly denoted by
f [t0 , . . . , tj−1 ].
p(f |t0 , . . . , tj )(t) = f [t0 ]ω 0 (t) + f [t0 , t1 ]ω 1 (t) + . . . + f [t0 , . . . , tj ]ω j (t). (1.10)
Thus the coefficients cj of the interpolation polynomial are given by the divided
differences, which can easily be computed recursively:
Note, the MATLAB command diff is a useful tool for forming differences of
MATLAB vector components.
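A small sketch of such a recursive computation is given below; the function name newtoncoeff and the in-place scheme are assumptions made here for illustration, not taken from the text.

function c = newtoncoeff(t, y)
  % divided differences computed column by column with MATLAB's diff
  t = t(:); y = y(:);
  n = length(t);
  c = y;                                   % c(j) will become f[t_1, ..., t_j]
  for k = 2:n
      c(k:n) = diff(c(k-1:n)) ./ (t(k:n) - t(1:n-k+1));
  end
end

The coefficients c then enter the Newton form (1.10) as the divided differences f[t_0, . . . , t_j].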
Theorem 10 Let f ∈ C n+1 (a, b) and let p(f |t0 , . . . , tn ) be the polynomial interpo-
lating the points (ti , f (ti )), i = 0, . . . , n and denote by I(t0 , . . . , tn , t) the smallest
interval containing t0 , . . . , tn and t.
Then there exists for all t ∈ (a, b) a ξ ∈ I(t0 , . . . , tn , t) such that
r(t) = (1/(n + 1)!) f^(n+1)(ξ) ω_{n+1}(t)     (1.11)
holds.
Before we prove this theorem we discuss its consequences. The error is essentially composed of two components, one depending on the function f and the other depending on ω(t) and consequently on the location of the t_i and t. If the function f and the polynomial degree are given, the only parameter which can be influenced to decrease the error is the location of the t_i. An equidistant grid of t_i's is not optimal. An optimal placing of the interpolation points will be discussed in the advanced course, when Chebyshev polynomials are introduced.
If t is outside I(t_0, . . . , t_n) we speak about extrapolation. Extrapolation is often used for predicting the behavior of a process, e.g. the development of an investment fund. As can be seen from the exercise, the Newton interpolation polynomials and with them the interpolation error grow rapidly outside the data interval. Extrapolation is therefore a numerically dangerous process.
Proof (of Theorem 10):
We fix a t̄ ≠ t_i and set F(t) := r(t) − K ω_{n+1}(t) and determine K so that F(t̄) = 0.
Then F has at least n + 2 zeros in I[t_0, t_1, . . . , t_n, t̄]. Thus, by Rolle's theorem, F′ has at least n + 1 zeros, F″ has n zeros and finally F^(n+1) has at least one zero, say ξ ∈ I[t_0, t_1, . . . , t_n, t̄].
As p^(n+1) ≡ 0 it follows F^(n+1)(ξ) = f^(n+1)(ξ) − K (n + 1)! = 0. Thus
K = f^(n+1)(ξ) / (n + 1)!
from which we obtain the expression for the error. ✷
So far we were interested in the error committed when interpolating a function f by a polynomial of fixed degree. Can we decrease the error by increasing the degree of the polynomial, i.e. by adding more and more interpolation points?
This question is discussed in the exercises (see Runge's phenomenon), from which we conclude that interpolation with high degree polynomials can lead to a highly oscillatory error behavior and large errors.
The answers can be given directly by evaluating (1.11). Using the fact |sin^(6)(t)| ≤ 1 we can answer the questions by
• |r(0.1)| ≤ (1/720) |ω_6(0.1)| ≤ (1/720) · 0.017 < 2.3 · 10^{−5}
• |r(3π/4)| ≤ (1/720) |ω_6(3π/4)| ≤ (1/720) · 10.15 < 1.2 · 10^{−2}
• max_{t∈[0,π/2]} |r(t)| ≤ (1/720) max_{t∈[0,π/2]} |ω_6(t)| ≤ 2.3 · 10^{−5}.
Figure 1.2: Interpolation error: Sine function interpolated by a fifth degree poly-
nomial
Example 12 To interpolate the data points (0, 1), (1, 2), (2, −1), (3, −2) by a third order polynomial and to plot the result at 100 points in [0, 3] these two commands are used as follows:
ti=[0,1,2,3]; yi=[1,2,-1,-2];
coeff=polyfit(ti,yi,3)
p=polyval(coeff,linspace(0,3,100));
plot(linspace(0,3,100),p)
Note that the last parameter of polyfit is the degree of the desired polynomial. If you provide a number k < n − 1, where n is the number of data points, then polyfit returns a polynomial which fits the data points in the least squares sense. The polynomial will in general no longer interpolate the data.
Polynomial data fitting is the topic of Sec. 2.2.
polyfit uses the Vandermonde approach.
From
B_i^n(t) = (n choose i) (1 − t)^{n−i} t^i
         = (1 − t) (n−1 choose i) (1 − t)^{n−1−i} t^i + t (n−1 choose i−1) (1 − t)^{n−i} t^{i−1}
         = (1 − t) B_i^{n−1}(t) + t B_{i−1}^{n−1}(t),
[Figure: the cubic Bernstein polynomials B_0^3(λ), B_1^3(λ), B_2^3(λ), B_3^3(λ) on [0, 1].]
with B00 (t) = 1 and we set Bjn (t) = 0 for j > n and j < 0.
We briefly summarize some properties of Bernstein polynomials:
1. Σ_{i=0}^{n} B_i^n(t) = 1
2. t = 0 is a root of B_i^n of multiplicity i
3. t = 1 is a root of B_i^n of multiplicity n − i
5. B_i^n(t) ≥ 0 for t ∈ [0, 1]
Due to the last property we can write every polynomial in P_n([t_min, t_max]) as a linear combination of Bernstein polynomials:
p(t) = Σ_{i=0}^{n} b_i B_i^n(t)
Note, the entries of the governing matrix are all in [0, 1] by construction.
b = Σ_{i=0}^{n} α_i b_i   with b_i ∈ E², α_i ∈ R and Σ_{i=0}^{n} α_i = 1
Definition 15
A map Φ : E² → E² is called an affine map if it leaves barycentric combinations invariant, i.e.
b = Σ_{i=0}^{n} α_i b_i  ⇒  Φ(b) = Σ_{i=0}^{n} α_i Φ(b_i)
Much more detail on this topic can be found in [Far88].
Φ(b) = Ab + v
Example 16
• The identity: A = I, v = 0
• Scaling: v = 0, A diagonal
• Rotation: v = 0 and
A = ( cos α  −sin α ; sin α  cos α )     (1.12)
• Shearing: v = 0 and
A = ( 1  α ; 0  1 )
Maps which leave angles and lengths unchanged are called orthogonal maps (or rigid body motions). They are characterized by AᵀA = I. The set
g := { (1 − t) a + t b | t ∈ R }
is called a straight line through a and b. All points are obtained by a barycentric combination of a and b.
Note, a straight line can be viewed as the result of an affine map applied to the real axis. In particular the interval [0, 1] is mapped to the line segment [a, b].
We call α = (1 − t) and β = t the barycentric coordinates of the point c = αa + βb and c(t) a linear interpolation of a and b.
Linear interpolation is affine invariant, i.e.
The points a, b, c are called collinear, if they are related by linear interpolation.
Given three collinear points we note
Definition 17
A sequence of straight lines, where each segment interpolates two given points
bi , bi+1 is called a polygon or a piecewise linear interpolant of b0 , b1 , . . . , bN .
which is a parabola. Note the special representation of the parabola in terms of the (basis) functions obtained from 1 = ((1 − t) + t)².
ratio(b_0, b_0^1(t), b_1) = ratio(b_1, b_1^1(t), b_2) = ratio(b_0^1(t), b_0^2(t), b_1^1(t)) = ratio(0, t, 1) = t/(1 − t)
[Figure: de Casteljau construction of a quadratic Bézier curve from the control points b_0, b_1, b_2, with the intermediate points b_0^1(t), b_1^1(t) and the curve point b_0^2(t).]
We can generalize this construction principle to generate higher degree polynomials:
Given b_0, b_1, . . . , b_n ∈ E². Set b_i^0(t) := b_i and recurse:
b_i^r(t) := (1 − t) b_i^{r−1}(t) + t b_{i+1}^{r−1}(t)   for r = 1, . . . , n and i = 0, . . . , n − r     (1.13)
Definition 18
The b_i^r(t) are called partial Bézier curves of degree r. They are controlled by the points b_i, . . . , b_{i+r}.
The final curve b^n(t) := b_0^n(t) ∈ P_n[0, 1] is called a Bézier curve and the polygon defined by b_0, b_1, . . . , b_n is called its control polygon with control points b_i.
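A minimal MATLAB sketch of the recursion (1.13) is given below; the function name decasteljau and the storage of the control points as the columns of a 2 x (n+1) matrix are assumptions made here for illustration.

function p = decasteljau(b, t)
  % de Casteljau recursion: overwrite b_i^(r-1) by b_i^r for r = 1, ..., n
  n = size(b, 2) - 1;
  for r = 1:n
      for i = 1:n-r+1                      % MATLAB indices start at 1
          b(:, i) = (1-t)*b(:, i) + t*b(:, i+1);
      end
  end
  p = b(:, 1);                             % = b_0^n(t), the point on the Bezier curve
end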
• Bézier curves are affine invariant, i.e. applying Φ to the control points yields
the same result as applying it to the complete Bézier curve.
In particular (set i = 0, r = n)
b^n(t) = Σ_{j=0}^{n} b_j B_j^n(t)
Proof: (Induction)
b_i^r(t) = (1 − t) b_i^{r−1}(t) + t b_{i+1}^{r−1}(t)
         = (1 − t) Σ_{j=i}^{i+r−1} b_j B_{j−i}^{r−1}(t) + t Σ_{j=i+1}^{i+r} b_j B_{j−i−1}^{r−1}(t)
         = (1 − t) Σ_{j=i}^{i+r} b_j B_{j−i}^{r−1}(t) + t Σ_{j=i}^{i+r} b_j B_{j−i−1}^{r−1}(t),
note B_r^{r−1} = B_{−1}^{r−1} = 0 per construction.
Thus,
b_i^r(t) = Σ_{j=i}^{i+r} b_j [ (1 − t) B_{j−i}^{r−1}(t) + t B_{j−i−1}^{r−1}(t) ]
         = Σ_{j=i}^{i+r} b_j B_{j−i}^r(t)
         = Σ_{j=0}^{r} b_{j+i} B_j^r(t)   ✷
Properties:
1. Affine invariance
Due to Σ_{i=0}^{n} B_i^n(t) = 1 the values of the Bernstein polynomials can be viewed as barycentric coordinates of b^n(t).
2. Convex hull property
From Bin (t) ≥ 0 we see again bn (t) is in the convex hull of the control
polygon defined by the bi . (review the definition of a convex combination!)
3. Linear precision, i.e.
Σ_{i=0}^{n} (i/n) B_i^n(t) = t.
Note: (1 − i/n) a + (i/n) b, i = 0, . . . , n, are uniformly spaced points on the straight line between a and b.
4. Invariance under parameter transformation
Σ_{i=0}^{n} b_i B_i^n(t) = Σ_{i=0}^{n} b_i B_i^n((τ − a)/(b − a)),   t = (τ − a)/(b − a), τ ∈ [a, b]
We end this section with some examples. In Fig. 1.5 a Bézier curve with its five control points
b_0 := (0, 1)ᵀ,  b_1 := (0.25, 2)ᵀ,  b_2 := (0.5, 2)ᵀ,  b_3 := (0.75, 1)ᵀ,  b_4 := (1, 1.5)ᵀ,
and the corresponding control polygon are displayed. Note that the abscissae of the Bézier points are equally spaced, which corresponds to a non-parametric curve, see also the linear precision property above.
In Fig. 1.6 the corresponding partial polynomials are plotted along with the Bézier curve. These are b_0^1(t), b_1^1(t), b_2^1(t), b_3^1(t), b_0^2(t), b_1^2(t), b_2^2(t) and finally b_0^3(t), b_1^3(t).
For example,
b_0^3(t) := bezier(b_0, b_1, b_2, b_3, t) = bezier(b_0^2(t), b_1^2(t), t).
We modify now the curve and replace b_4 by
b_4 := (0.4, 1.5)ᵀ.
Note, the abscissae of the Bézier points are no longer equidistant. The effect of
this change is seen in Fig. 1.7. We can visualize this figure as a so-called crossplot,
cf. Fig. 1.8
Figure 1.7: Bézier curve representing a parametric graph and the corresponding
convex hull
To see that these are indeed polynomials, set t := cos α and consider cos((n + 1)α) + cos((n − 1)α) = 2 cos α cos(nα), which yields the recursion T_{n+1}(t) = 2t T_n(t) − T_{n−1}(t).
Chebyshev polynomials have special properties, which make them useful for our purposes:
Theorem 22
Proof:([DH95])
The first part will be proven by contradiction:
Let P ∈ P_n be a polynomial with leading coefficient a_n = 2^{n−1} and |P(t)| < 1 for all t ∈ [−1, 1]. Then P − T_n ∈ P_{n−1}, as both polynomials have the same leading coefficient. We consider now this difference at t_k := cos(kπ/n):
T_n(t_{2k}) = 1 ∧ P(t_{2k}) < 1 ⇒ P(t_{2k}) − T_n(t_{2k}) < 0
T_n(t_{2k+1}) = −1 ∧ P(t_{2k+1}) > −1 ⇒ P(t_{2k+1}) − T_n(t_{2k+1}) > 0.
Figure 1.9: Chebyshev Polynomials
Thus, the difference polynomial changes its sign at least n times in the interval [−1, 1] and consequently has n roots in that interval. This contradicts the fact that P − T_n ∈ P_{n−1}. By this we showed that for each polynomial P ∈ P_n with leading coefficient a_n = 2^{n−1} there exists a ξ ∈ [−1, 1] with |P(ξ)| ≥ 1.
By scaling we finally see that for a general polynomial with a_n ≠ 0 there exists a ξ ∈ [−1, 1] with |P(ξ)| ≥ |a_n| / 2^{n−1}.
The second part of the theorem then follows directly. ✷
We apply this theorem to the result on the approximation error (cf. Th. 10) of polynomial interpolation and conclude for [a, b] = [−1, 1]:
The approximation error
f(t) − p(f | t_0, . . . , t_n)(t) = (1/(n + 1)!) f^(n+1)(τ) · ω_{n+1}(t)
is minimal if ω_{n+1} = T_{n+1}/2^n, i.e. if the t_i are the roots of the (n + 1)st Chebyshev polynomial, the so-called Chebyshev points.
In case [a, b] ≠ [−1, 1] we have to consider the maps
[a, b] → [−1, 1],   t ↦ τ = 2 (t − a)/(b − a) − 1
and
[−1, 1] → [a, b],   τ ↦ t = ((1 − τ)/2) a + ((1 + τ)/2) b.
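As a small illustration (not from the original text), the Chebyshev points can be mapped to a general interval and used for interpolation as follows; the function f, the interval [a, b] and the degree n are assumptions chosen for the example.

n  = 10;  a = 0;  b = pi;  f = @(t) sin(t);      % hypothetical example
k  = 0:n;
tau = cos((2*k+1)*pi/(2*(n+1)));                 % roots of T_{n+1} on [-1,1]
ti  = (1-tau)/2*a + (1+tau)/2*b;                 % mapped Chebyshev points in [a,b]
coeff = polyfit(ti, f(ti), n);                   % interpolate at the Chebyshev points
tt  = linspace(a, b, 200);
err = max(abs(f(tt) - polyval(coeff, tt)))       % maximal error on a fine grid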
• Two functions f, g are called orthogonal with respect to the inner product
< ·, · >w if
< f, g >w = 0 (1.15)
Theorem 24
There exists a unique sequence of normalized orthogonal polynomials p_k with
p_k(t) − t p_{k−1}(t) = Σ_{j=0}^{k−1} c_j p_j(t)     (1.16)
with
c_j = <p_k − t p_{k−1}, p_j>_w / <p_j, p_j>_w
(why?).
As <p_k − t p_{k−1}, p_j>_w = <p_k, p_j>_w − <t p_{k−1}, p_j>_w we obtain, when requiring that p_k is orthogonal to all lower degree polynomials,
c_j = − <t p_{k−1}, p_j>_w / <p_j, p_j>_w = − <p_{k−1}, t p_j>_w / <p_j, p_j>_w
which results in c_0 = . . . = c_{k−3} = 0 and
c_{k−1} = − <t p_{k−1}, p_{k−1}>_w / <p_{k−1}, p_{k−1}>_w
and
c_{k−2} = − <p_{k−1}, t p_{k−2}>_w / <p_{k−2}, p_{k−2}>_w.
As t p_{k−2} = p_{k−1} + (lower degree polynomial) we get
c_{k−2} = − <p_{k−1}, t p_{k−2}>_w / <p_{k−2}, p_{k−2}>_w = − <p_{k−1}, p_{k−1}>_w / <p_{k−2}, p_{k−2}>_w.
From (1.16) we then obtain
p_k(t) = (t + c_{k−1}) p_{k−1} + c_{k−2} p_{k−2} = (t − β_k) p_{k−1} − γ_k² p_{k−2}
with β_k := −c_{k−1} and γ_k² := −c_{k−2}.
Example 25
The Chebyshev polynomials are orthogonal polynomials on [−1, 1] with respect to the weight function w(t) = (1 − t²)^{−1/2}.
Example 26
For a = −1, b = 1 and ω(t) = 1 we obtain the Legendre polynomials Pk , which
can be constructed e.g. by the following MAPLE code:
p_m:=0;
p_0:=1;p_1:=t;
beta_2:=int(t*p_1*p_1,t=-1..1)/int(p_1*p_1,t=-1..1);
gamma2_2:=int(p_1*p_1,t=-1..1)/int(p_0*p_0,t=-1..1);
p_2:=(t-beta_2)*p_1-gamma2_2*p_0;
beta_3:=int(t*p_2*p_2,t=-1..1)/int(p_2*p_2,t=-1..1);
gamma2_3:=int(p_2*p_2,t=-1..1)/int(p_1*p_1,t=-1..1);
p_3:=(t-beta_3)*p_2-gamma2_3*p_1;
Figure 1.10: Legendre polynomials
Theorem 27
Let pk ∈ Pk be orthogonal to all p ∈ Pk−1 .
Then pk has k simple real roots in the open interval (a, b).
Proof:
Let t_0, . . . , t_{m−1} be the distinct points in (a, b) where p_k changes sign.
Then Q_m(t) := (t − t_0)(t − t_1) · · · (t − t_{m−1}) changes sign at the same points. Thus, w Q_m p_k does not change sign in (a, b) and we get
<Q_m, p_k>_w = ∫_a^b w(t) Q_m(t) p_k(t) dt ≠ 0.
If m < k this contradicts the orthogonality of p_k to all polynomials in P_{k−1}. Hence m ≥ k and p_k has k simple real roots in (a, b). ✷
ẏ = f(t, y),   y(0) = y_0
or equivalently
y(t) = ∫_{t_0}^{t} f(τ, y(τ)) dτ + y_0.
Furthermore, numerical integration is important in its own right, e.g. when computing element matrices in FEM (finite element method) applications.
We introduce the following short notation:
I_a^b(f) := ∫_a^b f(τ) dτ
Example 28
The approximation
Ĩ_{t_0}^{t_e}(f) := Σ_{i=1}^{n} Ĩ_{t_{i−1}}^{t_i}(f)
with
Ĩ_{t_{i−1}}^{t_i}(f) := h_i ( (1/2) f(t_{i−1}) + (1/2) f(t_i) )
and step size h_i := t_i − t_{i−1} is called the trapezoidal rule.
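A minimal MATLAB sketch of this composite trapezoidal rule is given below; the function name trapezoid is an assumption, and f is assumed to accept vector arguments.

function I = trapezoid(f, t)
  % composite trapezoidal rule on the (possibly non-uniform) grid t
  h  = diff(t);                            % step sizes h_i = t_i - t_{i-1}
  ft = f(t);
  I  = sum(h .* (ft(1:end-1) + ft(2:end)) / 2);
end

For example, trapezoid(@sin, linspace(0, pi, 100)) should be close to 2.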
A general quadrature formula for one subinterval has the form
Ĩ_{t_{i−1}}^{t_i}(f) = h_i Σ_{j=1}^{s} b_j f(t_{i−1} + c_j h_i),
where s is the number of stages, the b_j are the weights and the c_j the knots of the quadrature formula.
I_{t_0}^{t_e}(f) − Ĩ_{t_0}^{t_e}(f)
Consequently, b_j ≥ 0.
Ĩ(f) := I(f̃).
f̃(t) := P(f | τ_1, . . . , τ_s)(t) = Σ_{j=1}^{s} f(τ_j) L_j^{s−1}(t).
Here L_j^{s−1}(t) is the j-th Lagrange polynomial defined by the knot points (cf. Section 1.2.1)
L_j^{s−1}(t) = ∏_{i=1, i≠j}^{s} (t − τ_i) / (τ_j − τ_i).
Thus,
Ĩ_{t_{i−1}}^{t_i}(f) = Σ_{j=1}^{s} f(τ_j) I_{t_{i−1}}^{t_i}(L_j^{s−1}).
Set
b_j^s := (1/h_i) ∫_{t_{i−1}}^{t_i} L_j^{s−1}(t) dt = ∫_0^1 L_j^{s−1}(t_{i−1} + σ h_i) dσ,
then
Ĩ_{t_{i−1}}^{t_i}(f) = h_i Σ_{j=1}^{s} b_j^s f(τ_j).
Thus, given the c_j the weights b_j are fixed. The methods are consistent due to Σ_{j=1}^{s} L_j^{s−1}(t) = 1. They are exact for polynomials at least up to order s − 1, i.e.
p ∈ P_{s−1} ⇒ Ĩ(p) = I(p)
Lemma 30
Given s distinct points τ_1, . . . , τ_s ∈ [0, 1], there is a unique functional
Ĩ_0^1(f) = Σ_{j=1}^{s} b_j f(τ_j)
which is exact for all polynomials p ∈ P_{s−1}, i.e. Ĩ_0^1(p) = I_0^1(p).
Proof:
By construction and the uniqueness of interpolating polynomials. ✷
By coordinate transformation this result applies analogously to any finite time interval.
We investigate now the approximation error and define
Definition 31
If Ĩ(p) = I(p) ∀ p ∈ P_{k−1} and if there is a p_0 ∈ P_k with Ĩ(p_0) ≠ I(p_0), then the method has order k.
Note, consistent methods have at least order 1.
A criterion for the order of a scheme is given by the following theorem:
Theorem 32
If Σ_{i=1}^{s} b_i c_i^{q−1} = 1/q for q = 1, . . . , k, then the method Ĩ has order k.
Proof:
Taylor expansion of f about t_i. ✷
Example 33
It can be easily checked by Taylor expansion that for the trapezoidal rule the local error is
I_{t_{i−1}}^{t_i}(f) − Ĩ_{t_{i−1}}^{t_i}(f) = (1/12) f″(t_{i−1}) h_i³ + O(h_i⁴)
with h_i = t_i − t_{i−1}. The global error is bounded by
|I_{t_0}^{t_e}(f) − Ĩ_{t_0}^{t_e}(f)| = |Σ_{i=1}^{n} ( I_{t_{i−1}}^{t_i}(f) − Ĩ_{t_{i−1}}^{t_i}(f) )| ≤ ((t_e − t_0)/12) max_{ξ∈[t_0,t_e]} |f″(ξ)| h² + O(h³)
with h = (t_e − t_0)/n.
The power of h in this expression corresponds to the order of the method.
• Can we place the knots cj so that the method gets an order k > s ?
• What is the optimal (maximal) order ?
Theorem 34
Define Ĩ_0^1 by (c_i, b_i)_{i=1}^{s} with order k ≥ s, and set
M(t) := (t − c_1)(t − c_2) · · · (t − c_s) ∈ P_s[0, 1].
The order of Ĩ_0^1 is larger than s + m iff
∫_0^1 M(t) p(t) dt = 0   ∀ p ∈ P_{m−1}[0, 1],     (1.19)
where the second term vanishes due to M(c_i) = 0. Thus Ĩ_0^1(f) = I_0^1(f). ✷
Example 35
Consider m = 1, s = 3:
0 = ∫_0^1 (t − c_1)(t − c_2)(t − c_3) · 1 dt
  = 1/4 − (c_1 + c_2 + c_3)/3 + (c_1 c_2 + c_1 c_3 + c_2 c_3)/2 − c_1 c_2 c_3
⇒
c_3 = ( 1/4 − (c_1 + c_2)/3 + c_1 c_2 / 2 ) / ( 1/3 − (c_1 + c_2)/2 + c_1 c_2 )
Thus, there are two degrees of freedom in designing such a method.
Theorem 36
A method with s stages has maximal order 2s.
Proof:
Assume order k ≥ 2s + 1. Then by the preceding theorem
0 = ∫_0^1 M(t) p(t) dt   ∀ p ∈ P_s[0, 1]     (1.20)
In particular, p = M ∈ P_s[0, 1] gives ∫_0^1 M(t)² dt = 0, which is impossible since M ≢ 0. This is a contradiction. ✷
Theorem 37
There is a method of order 2s. It is uniquely defined by taking cj as the roots of
the sth Legendre polynomial Ps (2t − 1).
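As an illustration (not from the original text), the two-stage rule of this family on [0, 1] has the knots 1/2 ± 1/(2√3), the roots of P_2(2t − 1), and the weights 1/2, 1/2. A small MATLAB sketch:

c = [1/2 - 1/(2*sqrt(3)), 1/2 + 1/(2*sqrt(3))];   % roots of the shifted Legendre polynomial
b = [1/2, 1/2];                                   % corresponding weights
gauss2 = @(f, lo, hi) (hi-lo) * (b * f(lo + c.'*(hi-lo)));
% order 2s = 4: polynomials up to degree 3 are integrated exactly, e.g.
gauss2(@(t) t.^3, 0, 1)                           % returns 0.25 = int_0^1 t^3 dt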
Example 38
(d^k/dt^k) s_i(t_{i+1}) = (d^k/dt^k) s_{i+1}(t_{i+1}),   k = 0, . . . , r − 1.
Again, we consider the interpolation task, i.e. we look for a spline function s
satisfying the interpolation conditions:
s(ti ) = yi i = 0, . . . , n
si (t) = ai (t − ti )3 + bi (t − ti )2 + ci (t − ti ) + di (1.26)
Hence,
b_i = S_i / 2,   a_i = (S_{i+1} − S_i) / (6 h_i)     (1.33)
Inserting these relations into (1.29) gives
y_{i+1} = ((S_{i+1} − S_i)/(6 h_i)) h_i³ + (S_i/2) h_i² + c_i h_i + y_i.
From that we get c_i:
c_i = (y_{i+1} − y_i)/h_i − ((2 S_i + S_{i+1})/6) h_i.
Now we use condition (1.27c) and finally get
h_{i−1} S_{i−1} + 2 (h_{i−1} + h_i) S_i + h_i S_{i+1} = 6 ( (y_{i+1} − y_i)/h_i − (y_i − y_{i−1})/h_{i−1} )     (1.34)
with i = 1, . . . , n − 1.
These are n−1 equations for the n+1 unknown second derivatives Si . We have to
ask for two more conditions, which are boundary conditions if we put conditions
on S0 and Sn .
The easiest is to ask for
S0 = Sn = 0. (1.35)
A cubic spline fulfilling this condition is called a natural spline. We will first
consider this possibility and then discuss other common choices of boundary
conditions.
Equations (1.34) and (1.35) give us a square linear system of equations which can be solved to determine the S_i:

( 2(h_0+h_1)   h_1                                        ) ( S_1     )     ( (y_2−y_1)/h_1 − (y_1−y_0)/h_0                     )
( h_1          2(h_1+h_2)   h_2                           ) ( S_2     )     ( (y_3−y_2)/h_2 − (y_2−y_1)/h_1                     )
(              h_2          . . .       . . .             ) ( S_3     ) = 6 ( . . .                                              )     (1.36)
(                           . . .       . . .   h_{n−2}   ) ( . . .   )     ( . . .                                              )
(                           h_{n−2}   2(h_{n−2}+h_{n−1})  ) ( S_{n−1} )     ( (y_n−y_{n−1})/h_{n−1} − (y_{n−1}−y_{n−2})/h_{n−2} )
Note, the "empty" entries in the coefficient matrix are zeros. The matrix has a banded structure. It is a tridiagonal matrix, and furthermore it is symmetric. How this structure can be exploited when solving the system will be discussed in Chapter 2. Here we just use the corresponding MATLAB command
S=A\b
for solving the system. For defining the coefficient matrix we can use the fact that the matrix is banded and apply MATLAB's command diag, cf. help diag and the exercises.
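A minimal sketch of how the system (1.36) can be assembled with diag and solved is given below; the data vectors t and y (columns, with at least four data points) are assumed to be given.

h   = diff(t);                                   % h_i = t_{i+1} - t_i
n   = length(t) - 1;
d   = 2*(h(1:n-1) + h(2:n));                     % main diagonal 2(h_{i-1} + h_i)
e   = h(2:n-1);                                  % off-diagonals h_1, ..., h_{n-2}
A   = diag(d) + diag(e, 1) + diag(e, -1);        % tridiagonal, symmetric matrix of (1.36)
rhs = 6*diff(diff(y)./h);                        % right hand side of (1.36)
S   = [0; A\rhs; 0];                             % natural boundary conditions S_0 = S_n = 0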
As pointed out before the definition of a cubic spline leaves two degrees of freedom.
These are normally described in terms of boundary conditions. There are several
common choices
• natural spline: We take S0 = Sn = 0. This choice is often taken, if we have
no other specific information available.
• end slope condition: We might have knowledge about the slopes at the boundary points, i.e. s′(t_0) and s′(t_n) are known. From that, conditions for S_0 and S_1 can be derived and the linear system corresponding to (1.36) can be set up. We leave this as an exercise.
In MATLAB's spline toolbox there are many additional tools for computing and evaluating splines. A command that computes spline coefficients for various end conditions is csape.
(up to physical constants, like the elasticity coefficient), where the minimum is
taken over all C 2 functions satisfying the interpolation conditions.
In this subsection we will show, that the cubic spline functions indeed share this
property.
We denote by V the set of all C 2 functions which interpolate the points (ti , yi )
with i = 0, . . . , l + 1.
Theorem 40
Let s* ∈ V be a cubic spline satisfying the natural boundary conditions. Then
‖(s*)″‖_2 ≤ ‖s″‖_2   ∀ s ∈ V.
Proof:
Let s ∈ V. Then there is an h ∈ C² with h(t_i) = 0 such that s(t) = s*(t) + h(t). We then obtain
‖s″‖_2² = ‖(s*)″ + h″‖_2² = ‖(s*)″‖_2² + 2 <(s*)″, h″> + ‖h″‖_2²
with
<(s*)″, h″> := ∫_{t_0}^{t_e} (s*)″(t) h″(t) dt.
We have to show that <(s*)″, h″> = 0:
Integration by parts gives
<(s*)″, h″> = [ (s*)″(t) h′(t) ]_{t_0}^{t_e} − ∫_{t_0}^{t_e} (s*)‴(t) h′(t) dt.
1.5.2 B-Splines
In this subsection we study the linear space of splines like we did before for polynomial spaces, and we look for a basis of this space which gives us spline representations with "nice" coefficients. By "nice" we mean, in the context of graphics, coefficients which have a direct geometrical interpretation. By changing the coefficients we want to influence the shape of the spline only locally. We saw this task already when discussing the Bernstein basis for polynomials. For the interpolation task the interpretation of the coefficients does not play a particular role, but when using splines for design purposes the coefficients can serve as "handles" to influence the shape by positioning them through mouse clicks or other computer input devices.
Let ∆ := {a = t0 , t1 , . . . , tl+1 = b} with ti < ti+1 denote a partitioning (or a grid)
of the interval [a, b].
The space of all splines of degree k − 1 with respect to ∆ is denoted by Sk,∆ . It
is easily checked that Sk,∆ is a linear space and evidently Pk−1 ⊂ Sk,∆ holds.
Thus a basis of Sk,∆ consists of a basis of the polynomial space plus some addi-
tional functions. Let us consider first the monomial basis of the polynomial space
and extend it to a basis of the spline space.
To this end we define
Definition 41
(t − t_i)_+^{k−1} := { (t − t_i)^{k−1} if t ≥ t_i;  0 else }
Theorem 42
B := {1, t, . . . , t^{k−1}, (t − t_1)_+^{k−1}, . . . , (t − t_l)_+^{k−1}} is a basis of S_{k,∆} and dim S_{k,∆} = k + l.
and
N_i^k(t) := ((t − τ_i)/(τ_{i+k−1} − τ_i)) N_i^{k−1}(t) + ((τ_{i+k} − t)/(τ_{i+k} − τ_{i+1})) N_{i+1}^{k−1}(t),
where we use the convention 0/0 = 0 if nodes coincide.
Examples of these functions are depicted in Fig. 1.12. There one observes the in-
creasing degree of smoothness when raising the order of these functions. Without
proof we collect some important properties of these functions:
1. N_i^k(t) ≠ 0 only for t ∈ [τ_i, τ_{i+k}]: local support
[Figure 1.12: the B-spline basis functions N_i^1, N_i^2, N_i^3, N_i^4 over the knots τ_i, . . . , τ_{i+4}.]
s = Σ_{i=1}^{l+k} d_i N_i^k
and in particular
1 = Σ_{i=1}^{l+k} N_i^k.
The coefficients di are called de Boor points. See also their role in the context of
Bézier splines.
Changing the di ’s influences only a local part of the total spline due to the local
support property of the B-splines. The degree of the B-spline determines the
number of intervals influenced by this change.
More on this subject can be found in [de 78, Far88].
Chapter 2
Linear Systems
We saw in the preceding sections the need for solving linear systems. They oc-
curred when we wanted to solve the Vandermonde or Bézier system for polynomial
interpolation and in a special form (tridiagonal system) when computing cubic
interpolation splines.
Solving linear systems occurs in nearly all algorithms in numerical analysis as a
subproblem. It has the following form
Ax = b with A ∈ Rn×m
In this course we study the following cases
• n = m square systems
• n > m overdetermined systems
As n and m can be very large (up to 10⁴ unknowns) computing time for solving these systems can become crucial.
So far in this course we just solved these systems by using the MATLAB command¹
x = A \ b
Now, we go into details and look what this command actually does.
We first review some facts on solvability of linear systems.
Definition 46
The linear space
N (A) = {x ∈ Rm |Ax = 0}
is called the nullspace or kernel of A and
R(A) = {z ∈ Rn |∃x ∈ Rm Ax = z}
is called the range space or image space of A.
¹ For MATLAB help on the "\" command, type help mldivide
with
y := U x (2.1d)
This suggests the following algorithm
b_1 = l_11 y_1
b_2 = l_21 y_1 + l_22 y_2
...
b_5 = l_51 y_1 + l_52 y_2 + l_53 y_3 + l_54 y_4 + l_55 y_5
From the first equation you immediately get y1 . Using this value, you easily
obtain y2 from the next equation and so on.
We describe the procedure for a general lower triangular matrix by the following piece of MATLAB code:
for i=1:n;
  for j=1:i-1;
    b(i)=b(i)-l(i,j)*b(j);
  end;
  b(i)=b(i)/l(i,i);
end;
y=b;
For solving this system, we start with the last equation and solve for y5 and
proceed in a similar way but backwards.
In MATLAB code this reads
for i=n:-1:1;
  for j=i+1:n
    y(i)=y(i)-u(i,j)*y(j);
  end;
  y(i)=y(i)/u(i,i);
end;
x=y;
Counting operations gives n²/2 + O(n) multiplications and as many additions for the backward or forward substitution methods.
with l_ik := a_ik^{(k−1)} / a_kk^{(k−1)}, A^{(k−1)} := M_{k−1} · · · M_1 A and A^{(0)} := A.
Elementary transformations are regular matrices and their inverses have a similar
structure:
             ( 1                                 )
             (    . . .                          )
M_k^{−1}  =  (          1                        )     (2.3)
             (          l_{k+1,k}   . . .        )
             (          l_{n,k}             1    )
We note two important facts in this context (which can be checked easily):
• Products of triangular matrices are triangular.
A = LU (2.4)
for i=1:N,
  pivot=A(i,i);
  if pivot==0
    disp('Matrix has zero pivot elements')
    break
  end
  for j=i:N,
    L(j,i)=A(j,i)/pivot;
  end
  for k=i+1:N,
    for j=i+1:N,
      A(k,j)=A(k,j)-L(k,i)*A(i,j);
    end
  end
end
U=zeros(N,N);
for i=1:N,for j=i:N, U(i,j)=A(i,j);end;end;
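A quick sanity check of such a factorization (an assumed test case, not from the text) can be made against MATLAB's built-in lu, which additionally returns the permutation matrix of the pivoting discussed below:

A0 = [4 1 0; 1 4 1; 0 1 4];      % diagonally dominant example matrix
[Lb, Ub, P] = lu(A0);            % built-in LU factorization with pivoting
norm(P*A0 - Lb*Ub)               % should be of the order of round-off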
Note that not every regular matrix can be LU factorized. The pivot elements might be zero, which will lead to a break in the algorithm. This can be seen from the following (regular) example:
( 0  1 ) ( x_1 )   ( b_1 )
( 1  0 ) ( x_2 ) = ( b_2 )
We will give a criterion for matrices which are LU factorizable.
Definition 49
A is called diagonally row dominant iff
|a_ii| > Σ_{j≠i} |a_ij|   ∀ i = 1, . . . , n
Theorem 50
Every diagonally dominant matrix A has an LU decomposition.
From the example above we see, that a matrix which has no LU factorization
can be transformed into a matrix which has an LU factorization by permuting
the rows:
( 1  0 ) ( x_1 )   ( b_2 )
( 0  1 ) ( x_2 ) = ( b_1 )
A^{(k+1)} = M_k P_k A^{(k)}
with a permutation matrix P_k which interchanges the rows in such a way that the pivot element becomes the largest element (in modulus) in the column segment A(k:n, k) (MATLAB notation). Consequently,
|l_ik| ≤ 1,   i = k + 1, . . . , n.
Looking for the largest element in a column and then interchanging rows is called partial pivoting, in contrast to complete pivoting, which is a less frequently applied strategy. There one attempts to interchange both rows and columns to obtain a pivot element which is the largest element of the actual submatrix in the k-th step.
Σ_{i=1}^{N−1} (N − i)² = Σ_{k=1}^{N−1} k² = Σ_{k=1}^{N} k² − N²,
we finally get
Σ_{k=1}^{N} k² − N² = N³/3 − N²/2 + N/6.
• Round-off errors
Later we will meet a third error source, the truncation error, when solving things iteratively.
In order to study the effects of errors we have to be able to measure the size of errors. Errors are often described as relative quantities, i.e.
relative error = absolute error / exact solution
and as the exact solution often is not available we consider instead
relative error = absolute error / obtained solution.
An error in the result of a linear system is a vector, thus we have to be able to measure sizes of vectors. To this end we introduce norms.
Definition 52
A vector norm is a mapping ‖·‖ : Rⁿ → R with
• ‖x‖ ≥ 0
• ‖x‖ = 0 ⇔ x = 0
• ‖x + y‖ ≤ ‖x‖ + ‖y‖
• ‖αx‖ = |α| ‖x‖,   α ∈ R
Example 53
‖x‖_1 = |x_1| + . . . + |x_n|
‖x‖_2 = (|x_1|² + . . . + |x_n|²)^{1/2}   (Euclidean norm)
‖x‖_∞ = max_i |x_i|
Theorem 54
All norms on Rⁿ are equivalent in the sense:
For any two norms ‖·‖_a and ‖·‖_b there are constants c_1, c_2 > 0 such that for all x
c_1 ‖x‖_a ≤ ‖x‖_b ≤ c_2 ‖x‖_a
holds.
Example 55
‖x‖_∞ ≤ ‖x‖_2 ≤ √n ‖x‖_∞
‖x‖_2 ≤ ‖x‖_1 ≤ √n ‖x‖_2
‖x‖_∞ ≤ ‖x‖_1 ≤ n ‖x‖_∞
Recall from your calculus course, that the definition of convergence is based on
norms. The ultimate consequence of this theorem is that an iteration process
in a finite dimensional space converging in one norm is also converging in any
other norm. For proving convergence we just can select a norm which is the most
convenient for the particular proof. Note, that in infinite dimensional spaces
(function spaces) this nice property is lost.
We relate now vector norms to matrices. The concept is highly based on viewing
matrices as linear maps
A : Rn −→ Rn .
50 CHAPTER 2. LINEAR SYSTEMS
Definition 56
‖A‖_p = max_{x≠0} ‖Ax‖_p / ‖x‖_p = max_{‖x‖_p = 1} ‖Ax‖_p
defines a matrix norm, which is called subordinate to the vector norm ‖x‖_p.
Some matrix norms:
‖A‖_1 = max_j Σ_i |a_ij|   (column sum norm)
‖A‖_2 = (max_i λ_i(AᵀA))^{1/2}, where λ_i(A) denotes the i-th eigenvalue of A
‖A‖_∞ = max_i Σ_j |a_ij|   (row sum norm)
‖A‖_F = (Σ_i Σ_j |a_ij|²)^{1/2}   (Frobenius norm)
Vector and matrix norms can be computed in MATLAB by the command norm, which takes an additional argument to select the type of norm, e.g. 'inf' stands for the infinity norm.
We consider now the sensitivity of the linear system Ax = b with respect to
perturbations ∆b of the input data b.
Note that the right inequality is a direct consequence of Def. 56. Analogously we get
b = Ax ⇒ ‖b‖ ≤ ‖A‖ ‖x‖.
This leads to
‖∆x‖/‖x‖ ≤ ‖A^{−1}‖ ‖∆b‖ / ‖x‖ = ‖A‖ ‖A^{−1}‖ ‖∆b‖ / (‖A‖ ‖x‖).
Thus,
‖∆x‖/‖x‖ ≤ ‖A‖ ‖A^{−1}‖ ‖∆b‖/‖b‖ =: κ(A) ‖∆b‖/‖b‖.
Condition numbers can be obtained in MATLAB by using the commands cond and rcond. The first command computes the condition number exactly and takes as argument a specification of the type of norm used for computing this number. rcond estimates the inverse of the condition number κ(A)^{−1} with respect to the 1-norm.
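A small illustration of the bound above (an assumed test case, not from the text) uses the notoriously ill-conditioned Hilbert matrix:

n  = 10;
A  = hilb(n);                    % Hilbert matrix, extremely ill-conditioned
x  = ones(n, 1);
b  = A*x;
xh = A \ b;                      % computed solution of Ax = b
relerr = norm(xh - x)/norm(x)    % observed relative error
bound  = cond(A)*eps             % kappa(A) times the machine epsilon

The observed error typically stays below, but within a few orders of magnitude of, the bound κ(A)ε.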
We consider also perturbations of the matrix A:
(A + ∆A)(x + ∆x) = b
and define for this end: A(t) := A + t∆A and x(t) := x + t∆x, with t ∈ R.
Consider:
A(t)x(t) = b
and take the derivative w.r.t. t:
Ȧ(t) x(t) + A(t) ẋ(t) = 0, i.e. ẋ = −A^{−1} Ȧ x.
Thus,
‖ẋ‖/‖x‖ ≤ ‖A^{−1}‖ ‖Ȧ‖ = ‖A^{−1}‖ ‖A‖ ‖Ȧ‖/‖A‖,
and to first order
‖∆x‖/‖x‖ ≤ κ(A) ‖∆A‖/‖A‖.
Example 58 The relative error due to round-off is the so-called machine epsilon. Usually we have ε ≈ 10^{−16} when using double precision arithmetic. If the only error in the data is due to round-off, we thus obtain
‖∆x‖/‖x‖ ≤ κ(A) ε.
The number κ(A)−1 can be viewed as the distance from A to the nearest singular
matrix.
Ax = b with A ∈ R^{m×n} and m ≫ n. This kind of problem often occurs when performing data fitting.
Example 59
We would like to fit the data
i 0 1 2 3 4
ti -1.0 -0.5 0.0 0.5 1.0
yi 1.0 0.5 0.0 0.5 2.0
p(t) = a2 t2 + a1 t + a0 .
the system solvable would be the wrong way to attack the problem, because a single erroneous measurement would get too strong an influence on the result.
Therefore we formulate the problem in a different way:
Find an x̂ with
‖A x̂ − b‖_2 = min_x ‖A x − b‖_2.
From the necessary condition that the gradient of ‖Ax − b‖_2² vanishes we obtain
Aᵀ A x̂ − Aᵀ b = 0     (2.8)
These equations are called normal equations and their solution a least squares
solution of the overdetermined linear system.
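For the data of Example 59 a small MATLAB sketch might read as follows (the variable names are assumptions); the backslash operator and the normal equations give the same least squares coefficients here:

ti = [-1.0 -0.5 0.0 0.5 1.0].';
yi = [ 1.0  0.5 0.0 0.5 2.0].';
A  = [ti.^2, ti, ones(5,1)];             % columns belong to a2, a1, a0
x_backslash = A \ yi                     % least squares solution via "\"
x_normal    = (A.'*A) \ (A.'*yi)         % solution of the normal equations (2.8)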
Normal equations have a geometric interpretation: Consider the range space
R(A). It is spanned by the columns of A.
By writing the normal equations as
AT (b − Ax) = AT r(x) = 0
we see that the residual corresponding to the least squares solution has to be
normal (orthogonal) to the columns of A or, in other words, to the range space
of A. This justifies the name ”normal” equations. This result can be generalized
as follows
Theorem 60
Let V be a finite dimensional linear space with an inner product <·,·>. Let U ⊂ V be a subspace and
U⊥ := {v ∈ V | <v, u> = 0 ∀ u ∈ U}.
Then for v ∈ V and u* ∈ U:
‖v − u*‖ = min_{u∈U} ‖v − u‖  ⇔  v − u* ∈ U⊥     (2.9)
with the norm ‖v‖ = (<v, v>)^{1/2} induced by the inner product <·,·>.
[Figure 2.1: the residual r = b − Ax of the least squares solution is orthogonal to Im(A).]
Proof:
Let u∗ ∈ U be the unique point with v − u∗ ∈ U ⊥ . (Why is this point unique?)
Then for all u ∈ U we have
To compute the least squares solution from the normal equations requires to first form the matrix AᵀA. It can be shown that the condition number is squared by this process, which results in an unnecessarily high sensitivity with respect to perturbations. This can be avoided by using special techniques like orthogonal factorization of A or the singular value decomposition [Hea97].
Overdetermined systems are solved in MATLAB with the same command as square systems, i.e. by using "\". Note that totally different algorithms are performed by one and the same command. In the overdetermined case MATLAB does not solve the least squares problem by directly setting up and solving the normal equations. For stability reasons, which will be explained later, MATLAB uses orthogonal factorizations (see Sec. 2.2.3) instead.
2.2.1 Projections
From Fig. 2.1 it is intuitively clear that the normal equations and the least squares approach are related to projections. In linear algebra projections are defined by
Definition 61
An n × n matrix P is called an orthogonal projection if it satisfies
• P² = P
• Pᵀ = P
For an orthogonal projection P and arbitrary vectors v, x we then have
<v − Pv, Px> = vᵀPx − vᵀPᵀPx = vᵀPx − vᵀP²x = 0
Example 62
• Let Q_1 = (q_1, . . . , q_r) be an m × r matrix with Q_1ᵀ Q_1 = I. Then
Q_1 Q_1ᵀ is a projector onto R(Q_1) and
I − Q_1 Q_1ᵀ is a projector onto R(Q_1)⊥ = N(Q_1ᵀ).
Definition 63
The angle δ between b ∈ X and a subspace U ⊂ X is defined by
sin δ = ‖b − Pb‖_2 / ‖b‖_2,
where P is the orthogonal projection onto U.
The following theorem relates this angle to the condition of the least squares problem with respect to perturbations of the data (perturbations in measurements = perturbations of b, perturbations in measurement time = perturbations in A):
κ ≤ κ_2(A) / cos δ
Here κ_2(A) := ( max_i λ_i(AᵀA) / min_i λ_i(AᵀA) )^{1/2} is the condition number of A with respect to ‖·‖_2 and δ is the angle between b and R(A) as defined above.
In the extreme case δ = π/2 the condition becomes infinite, which expresses the fact that the data has no relation to the problem. This indicates a wrong model of the physical problem.
On the other hand, if δ is small, then the condition number is of the size of κ_2(A). It is worthwhile to compare this fact to the condition of the normal equations, which is
κ_2(AᵀA) = κ_2(A)².
Thus, often the condition of the least squares problem is significantly smaller than the condition of the normal equations, and one would introduce an "artificial" sensitivity with respect to perturbations if one attempted to solve the least squares problem via the normal equations. So we seek an alternative characterization of the least squares solution which avoids forming the matrix AᵀA. This alternative way may be computationally more expensive but it will be more stable, i.e. less sensitive to perturbations.
To this end we will discuss orthogonal factorizations of A in the next subsection.
QT Q = I
First, note that an orthogonal projection is only in the trivial case P = I described by an orthogonal matrix. So do not confuse the two terms.
A direct consequence of the definition is det(Q) = ±1 (see the determinant multiplication theorem). Furthermore we see from the definition of the 2-norm that for orthogonal matrices ‖Q‖_2 = 1 holds. This makes orthogonal matrices so important in numerical analysis: transformations by orthogonal matrices do not change the condition of linear systems.
Example 66
A = Q R = ( Q_1  Q_2 ) ( R_1 ; 0 )
Inserting this factorization into the normal equations gives
Aᵀ A x − Aᵀ b = Rᵀ Qᵀ Q R x − Rᵀ Qᵀ b = R_1ᵀ R_1 x − R_1ᵀ Q_1ᵀ b,
hence
R_1ᵀ R_1 x = R_1ᵀ Q_1ᵀ b
and
R_1 x = Q_1ᵀ b.
So, instead of solving the normal equations we can solve
R_1 x = Q_1ᵀ b     (2.10)
We even obtain an expression for the norm of the residual of the least squares solution:
‖r‖_2 := min_x ‖Ax − b‖_2 = ‖Q_2ᵀ b‖_2
In MATLAB there is a command qr performing the QR-factorization. The numerical algorithm is based on either successive rotations of the coordinate system or successive reflections, corresponding to the geometric interpretation of orthogonal matrices given above.
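A minimal sketch of solving the least squares problem via (2.10) with qr is given below; A (with full column rank) and b are assumed to be given:

[Q, R] = qr(A, 0);           % "economy size" factorization: Q = Q1, R = R1
x = R \ (Q.'*b);             % back substitution for R1 x = Q1' b
resnorm = norm(A*x - b);     % equals ||Q2' b||_2, the residual of the fit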
We select a vector v in such a way that it reflects a given vector a such that
Ha = H (a_1, a_2, . . . , a_m)ᵀ = (ã_1, 0, . . . , 0)ᵀ.
function [v]=house(a1,m)
% [v]=house(a1,m)
% computes a householdermatrix to
% transform the m-vector a1 into
% sigma e_1, where e_1 is the
% first unit vector and sigma is
% up to a sign norm(a1)
%
sigma=norm(a1);
e1=zeros(m,1);
e1(1)=1;
alpha=a1’*e1;
v=a1+sign(alpha)*sigma*e1;
gamma=sigma*(sigma+abs(alpha));
sigma=-sign(alpha)*sigma;
This code applied to the vector a := (1, 2, 3)ᵀ gives the following result: Ha = (−3.7417, 0, 0)ᵀ.
It is important to note that multiplications with Householder matrices can be done with n + 1 multiplications and additions as
Ha = (I − vvᵀ/γ) a = a − (1/γ)(vᵀa) v
Definition 67
An n × m matrix A⁺ is called the Moore-Penrose pseudoinverse of the m × n matrix A if the following properties hold
A⁺ = (AᵀA)^{−1} Aᵀ
These examples show that, if the linear system has a unique solution in the "classical" or in the "least squares" sense, then it can be expressed by x* = A⁺b.
Furthermore we note that AA⁺ is an orthogonal projector onto R(A). Thus it follows by Theorem 60 that x* = A⁺b is a solution of min ‖Ax − b‖_2, i.e. every solution has the form
x = x* + v = A⁺b + v with v ∈ N(A).
Furthermore we note that if b ≠ 0 then x* ∉ N(A) (see property (3) in Def. 67). Thus,
min_{x∈L(b)} ‖x‖_2 = min_{x=A⁺b+v} ‖x‖_2 = min_{v∈N(A)} ‖x* + v‖_2
is attained at x = A⁺b.
The pseudoinverse A+ can be computed via the singular value decomposition of
A, which is a generalization of the diagonalization of a symmetric matrix by a
similarity transformation with orthogonal matrices:
Theorem 70
Any matrix A ∈ Rm×n can be factorized in
A = U ΣV T
with U ∈ Rm×m and V ∈ Rn×n being orthogonal matrices and Σ ∈ Rm×n with
Σ = diag(σ1 , . . . , σmin(m,n) )
and σi ≥ 0.
This factorization is called singular value decomposition and the σi are called
singular values.
In this course we will not present an algorithm for numerically performing the
singular value decomposition. It is very much related to algorithms for computing
eigenvalues of a general real matrix. We refer to standard text books like [Gv96].
We note some properties of the singular values, which can easily be checked:
• A⁺ = V Σ⁺ Uᵀ.
From the last property we see how the pseudoinverse can be constructed via the singular value decomposition (svd). In MATLAB the singular value decomposition is obtained by running the command
[U,S,V] = svd(A)
and the pseudoinverse directly by
Aplus = pinv(A)
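As a small illustration (not from the original text), the pseudoinverse can be assembled from the SVD by inverting only the singular values above a tolerance; the example matrix and the tolerance are assumptions:

A = [1 2; 2 4; 0 1];                              % hypothetical example
[U, S, V] = svd(A);
s   = diag(S);
tol = max(size(A)) * eps(norm(A));                % treat tiny singular values as zero
r   = sum(s > tol);                               % numerical rank
Aplus = V(:,1:r) * diag(1./s(1:r)) * U(:,1:r).';  % should agree with pinv(A)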
Chapter 3
Signal Processing
Example 72 In MATLAB you can generate the samples from a sound file. You
will find on the homepage of the course a sound file kaktus.au. By applying the
MATLAB commands
NM1=auread(’kaktus’,’size’)
[y,rate]=auread(’kaktus’);
you get the number of samples, here N = 480720, the sampling rate r = 8012, and finally the samples y ∈ R^N. We can complete the data vector by y_N = y_0. Thus playing the sound file takes T = N/r ≈ 60 sec.
In the following we will assume, that the time scale is normalized in such a way
that T = 1.
The interpolation task requires to determine complex coefficients c_j ∈ C such that
ϕ(t_k) = Σ_{j=0}^{N−1} c_j e^{i2πj t_k} = y_k     (3.1)
For real data we have
y_k = ϕ(t_k) = Σ_{j=0}^{N−1} c_j e^{i2πj t_k} and ȳ_k = Σ_{j=0}^{N−1} c̄_j e^{−i2πj t_k} = Σ_{j=0}^{N−1} c̄_{N−j} e^{i2πj t_k}.
Thus, when y_i ∈ R,
c_j = c̄_{N−j}.     (3.2)
= a_0/2 + Σ_{j=1}^{n} ( a_j cos(2πj t_k) + b_j sin(2πj t_k) )
This gives us now the interpretation of the coefficients: the measurements are signals composed of trigonometric functions. If t ∈ [0, 1], then (a_j² + b_j²)^{1/2} gives the amplitude at the frequency j Hz and arctan(−b_j/a_j) the corresponding phase.
[Figure 3.1: amplitude and phase of the Fourier coefficients over frequency (Hz) for the signal of Example 74.]
Example 74 Consider the function f (t) = sin(44 · 2πt + 1) + 0.2 sin(10 · 2πt)
and assume that 100 samples are taken equidistantly in [0, 1]. From the Fourier
coefficients cj we obtain the amplitude and phase depicted in Fig. 3.1.
The corresponding MATLAB code to generate this picture is MATLAB
N=100;
t=linspace(0,1,N+1);
t=t(1:N); %Erase the last point
signal=sin(44*2*pi*t+1)+0.2*sin(10*2*pi*t);% generate the signal
c=fft(signal)/N; % Compute the Fourier coefficients
amplitude=sqrt((2*real(c)).^2+(-2*imag(c)).^2);
phase=atan((2*imag(c))./(2*real(c)));
% erase phase values caused by round-off errors
phase(find(amplitude<1.e-5))=0; %check: help find
figure(1)
stem([0:N-1],amplitude)
figure(2)
stem([0:N-1],phase)
The figure clearly reflects the two frequencies, 44 Hz and 10 Hz, contained in the signal. Additionally one observes that the picture is symmetric and the frequencies are reflected at 50 Hz. This is a consequence of the property (3.2) of the Fourier coefficients. The phase plot reflects the phase shifts, −π/2 at 10 Hz and 1 − π/2 at 44 Hz. Note that the phase shift is related to the phase of the cosine function.
In the MATLAB code the Fourier coefficients are computed via the command fft, which stands for Fast Fourier Transform, an algorithm which will be explained in the rest of this chapter. Note the division by N in the MATLAB code. MATLAB uses a slightly different definition of the Fourier transformation than we use in this course; the definitions differ by this factor.
For solving Eq. (3.1) we note first an important property of the complex trigonometric basis functions:
Theorem 75 Let t_k = k/N and ω_k := e^{i2π t_k} = e^{i2πk/N}. Then
Σ_{j=0}^{N−1} ω_j^k ω_j^{−l} = N δ_kl     (3.6)
Splitting the resulting sum for c_{2l} into its first and second half (with M := N/2) gives
c_{2l} = (1/N) Σ_{k=0}^{M−1} y_k ω_1^{−2kl} + (1/N) Σ_{k=0}^{M−1} y_{k+M} ω_1^{−2(k+M)l}.
Figure 3.2: Unit roots
We note ω_1^{−2(k+M)l} = ω_1^{−2kl} and ω_1² = ω_2 (cf. Fig. 3.2). Consequently,
c_{2l} = (1/N) Σ_{k=0}^{M−1} (y_k + y_{k+M}) ω_2^{−kl}.     (3.9)
Similarly, for the odd-indexed coefficients,
c_{2l+1} = (1/N) Σ_{k=0}^{M−1} (y_k − y_{k+M}) ω_1^{−k} ω_2^{−kl}.     (3.10)
for the data in the "odd step" (that the step is "odd" is indicated by the superscript "1").
With these definitions Eqs. (3.9) and (3.10) read
repeated and the number of sums can be halved another time. Now we have to distinguish the cases l even and l odd. We add another superscript to α to mark which case we considered (see example below). The optimal situation occurs if N = 2^p. Then this transformation can be iterated until we have only a single term.
We will describe the procedure first by an example:
c_5 = (1/N) Σ_{k=0}^{3} (y_k − y_{k+4}) ω_1^{−k} ω_1^{−4k} = (1/N) Σ_{k=0}^{3} α_k^[1] ω_2^{−2k}   (odd),   with α_k^[1] := (y_k − y_{k+4}) ω_1^{−k}
    = (1/N) Σ_{k=0}^{1} (α_k^[1] + α_{k+2}^[1]) ω_2^{−2k} = (1/N) Σ_{k=0}^{1} α_k^[10] ω_4^{−k}   (even),   with α_k^[10] := α_k^[1] + α_{k+2}^[1]
    = (1/N) Σ_{k=0}^{0} (α_k^[10] − α_{k+1}^[10]) ω_4^{−k} = (1/N) α_0^[101]   (odd),   with α_0^[101] := (α_0^[10] − α_1^[10]) ω_4^{0}
Similarly, we get c_j = (1/N) α_0^[mirror2(j)], where mirror2 just reverses the binary representation of j, e.g.
The general idea of the FFT algorithm (FFT=fast Fourier transformation) can
best be described by the following MATLAB code:
function c=dfft(y)
% c=dfft(y)
% discrete fourier transformation of y
%
N=length(y);
omega_N=exp(-i*2*pi/N);
c=zeros(1,N);
%
p=log2(N);
if round(p) ~=p
  error('N is not a power of 2')
end
NRED=N;
for ind=1:p
  NRED_old=NRED;
  NRED=NRED/2;
  NSEG=2^(ind-1); %Number of even/odd segments
  for ISEG=0:NSEG-1
    fac=1;
    for kk=1:NRED;
      k=kk+ISEG*NRED_old;
      alpha_even=y(k)+y(k+NRED);
      alpha_odd =(y(k)-y(k+NRED))*fac;
      fac=fac*omega_N;
      y(k)=alpha_even;
      y(k+NRED)=alpha_odd;
    end;
  end;
  omega_N=omega_N^2;
end;
% Sorting the indices and normalizing by N
% (this could be done by a simple bit-handling instead)
for j=0:N-1,
  jbin=dec2bin(j,p+1);
  jbininv=jbin(p+1:-1:2);
  c(j+1)=y(bin2dec(jbininv)+1)/N;
end;
Figure 3.4: Computational effort for FFT depending on the prime factors of N
The basic idea of this algorithm is due to Cooley and Tukey. Its success is based on the fact that it requires only O(N log₂ N) multiplications if N is a power of two.
If the number of samples N is not a power of two, the iteration no longer follows a binary tree and the sums (3.8) are split corresponding to the prime factors of N. The computational work increases with the size of the prime factors and, in the extreme case when N is prime, the computational effort becomes O(N²), which is just the work which has to be done for the matrix-vector multiplication in (3.7), cf. Fig. 3.4.
Chapter 4
Iterative Methods
All methods we discussed so far were finite in nature, i.e. the result was obtained in a finite number of computational steps. The computational effort for obtaining a numerical solution can be predicted, as the number of operations depends only on the problem type, not on the particular data. For example, performing the LU factorization of a non-sparse matrix always requires the same number of operations. If we assumed that computations could be carried out without any round-off error, these methods would give the exact answer to the given problem in a finite number of arithmetic operations (+, −, ·, /). However, that is an exceptional situation.
In most cases solutions of mathematical problems can not be computed exactly. This is particularly the case if the solution is an irrational number like √2, for example. Therefore, solutions of nonlinear equations, e.g.
x² − 2 = 0,
can not be computed exactly. As eigenvalues are defined as solutions of the characteristic polynomial, they are not computable in a finite number of operations.
The methods we will consider now are based on iterative processes. The numerical solution is the limit of a convergent sequence {x_n}. Iteration means that one computes x_i based on previous elements x_{i−1}, x_{i−2}, . . . and estimates the distance ‖x_i − x*‖. If this quantity is small enough, x_i is taken as the numerical approximation to the problem at hand. How many iterates are needed to reach this tolerance limit depends highly on the data. Consequently, the computational effort depends on the data and not only on the type of problem. Even assuming that computations can be carried out at infinite precision (no round-off), the result will in general not be the exact solution, due to the truncation of the limiting process. We obtain only approximate solutions and we need a good error estimation along with the method.
We discuss in this course iterative methods to compute eigenvalues and zeros of nonlinear functions.
X^{−1} A X = diag(λ_1, . . . , λ_n).
• If A = Aᵀ then all eigenvalues are real. The eigenvectors are linearly independent and form an orthogonal system, i.e. X Xᵀ = I.
Proof :
Let λ be an eigenvalue of A and x a corresponding eigenvector. Choose the index i such that |x_i| = ‖x‖_∞. We write the i-th component of the relation Ax − λx = 0 as
(Ax)_i = λ x_i.
Subtracting a_ii x_i gives
(Ax)_i − a_ii x_i = (λ − a_ii) x_i.
Consequently,
|λ − a_ii| |x_i| = |(Ax)_i − a_ii x_i| ≤ Σ_{j≠i} |a_ij x_j| = Σ_{j≠i} |a_ij| |x_j| ≤ Σ_{j≠i} |a_ij| |x_i|.
Thus λ ∈ B_i. ✷
Furthermore it can be shown that if the union of r circles Bi is not intersecting
the remaining Bj , then the union contains exactly r eigenvalues.
We will now relate the Rayleigh quotient to the eigenvalue of A that is largest in modulus:
Theorem 80 Let λ_1, λ_2, . . . , λ_N be the eigenvalues of A with |λ_1| > |λ_i|, i = 2, . . . , N, and let x_1 be the eigenvector corresponding to λ_1. The iterates x^(n) generated by the recursion (4.2) have the properties
lim_{n→∞} µ^(n) = λ_1
if x_1ᵀ x^(0) ≠ 0.
Proof :
Let x_1, . . . , x_N be an orthonormal system of eigenvectors of A. Then there are coefficients α_i such that
x^(0) = Σ_{i=1}^{N} α_i x_i.
Due to the orthogonality of the eigenvectors we get
x_iᵀ x^(0) = α_i
and especially, by the assumption on x^(0),
α_1 ≠ 0.
Furthermore,
x^(1) = A x^(0) = Σ_{i=1}^{N} α_i A x_i = Σ_{i=1}^{N} α_i λ_i x_i
and by iterating
x^(n) = A x^(n−1) = A^n x^(0) = Σ_{i=1}^{N} α_i λ_i^n x_i.
Due to the orthonormality of the x_i:
x^(n)ᵀ x^(n) = ‖x^(n)‖_2² = Σ_{i=1}^{N} α_i² λ_i^{2n}
and
x^(n)ᵀ A x^(n) = Σ_{i=1}^{N} α_i² λ_i^{2n+1}.
Thus,
µ^(n) = ( Σ_{i=1}^{N} α_i² λ_i^{2n+1} ) / ( Σ_{i=1}^{N} α_i² λ_i^{2n} ) = λ_1 · ( Σ_{i=1}^{N} α_i² (λ_i/λ_1)^{2n+1} ) / ( Σ_{i=1}^{N} α_i² (λ_i/λ_1)^{2n} ).
• The better the eigenvalues are separated, the faster the convergence (see exercises).
• If λ_1 = λ_2, then
lim_{n→∞} x_kᵀ x^(n) / ‖x^(n)‖ = 0,   k = 3, 4, . . . , N     (4.5)
So, if we apply the power iteration to A^{−1} we obtain the largest eigenvalue of A^{−1}, which is just the inverse of the smallest eigenvalue of A. This way of applying power iteration is called the inverse power iteration method.
Applying the second statement of Theorem 81 enables us to compute also other eigenvalues, not only the largest or smallest. If we assume that s_i is a good guess of the i-th eigenvalue of A, then A − s_i I has an eigenvalue which is near zero and consequently we can expect that (A − s_i I)^{−1} has (λ_i − s_i)^{−1} as its largest eigenvalue in modulus. Often Gerschgorin's theorem can be applied to obtain a good guess for a certain eigenvalue. This technique to compute the i-th eigenvalue of A is called the eigenvalue shift technique.
The inverse power iteration is a good example for a problem, where the same
LU factorization is applied to many different right hand side vectors as we can
rewrite the iteration in the following way in order to avoid direct inversion of A:
Note, eigenvectors are unique up to scaling. The scaling factor tends to grow
during the inverse iteration process. That is the reason, why the iterates are
normalized after each step.
We summarize the algorithm for the inverse power iteration:
• Compute µ^(1).
• Iterate these steps, i.e. solve A x^(k) = x^(k−1) for x^(k), set x^(k) := x^(k)/‖x^(k)‖ and compute the Rayleigh quotient µ^(k).
• If for some n the difference |µ^(n) − µ^(n−1)| is sufficiently small, then set λ = (µ^(n))^{−1}.
• Apply the eigenvalue shift to repeat the process for the next eigenvalue.
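A minimal MATLAB sketch of this procedure is given below; the function name invpower, the shift s, the tolerance and the random start vector are assumptions made for illustration. Here the Rayleigh quotient is formed with A directly, so the eigenvalue estimate needs no final inversion:

function [lambda, x] = invpower(A, s, tol, maxit)
  n = size(A, 1);
  [L, U, P] = lu(A - s*eye(n));          % one LU factorization for all iterations
  x = randn(n, 1);  x = x/norm(x);
  mu = 0;
  for k = 1:maxit
      x = U \ (L \ (P*x));               % solve (A - sI) x_new = x_old
      x = x / norm(x);
      mu_old = mu;
      mu = x.' * (A*x);                  % Rayleigh quotient (A symmetric)
      if abs(mu - mu_old) < tol, break, end
  end
  lambda = mu;                           % eigenvalue of A closest to the shift s
end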
Recall that this algorithm is applied to symmetric matrices and that its convergence depends on how well the eigenvalues are separated from each other. The main application of this technique can be found in eigenvalue problems for boundary value problems, where, depending on the particular discretization method, symmetric matrices of the type (1.36) occur.
There are also iteration methods for general, non-symmetric matrices. These are based on an iterative similarity transformation of A to block triangular form (Schur form), cf. [Gv96].
Figure 4.1: Control circuit
1. There is exactly one desired state x*. If the system is in this state, the controller must not change anything:
x* = ϕ(x*).
In that sense, the desired state x* is called a fixed point of the controller ϕ.
2. Do they converge ?
for all x, y ∈ D0 .
Otherwise the function is called dissipative.
x, y ∈ D ⇒ x + t(y − x) ∈ D ∀t ∈ [0, 1]
Theorem 84 For every A ∈ R^{n×n} and every ε > 0 there exists a norm ‖·‖ such that
‖A‖ ≤ ρ(A) + ε.
By this theorem and Eq. (4.11) we can check contractivity of continuously differentiable functions ϕ by checking eigenvalues:
The condition
ρ(ϕ′(x*)) ≤ δ < 1
is sufficient for ϕ being a contraction in a neighborhood of a point x*.
We are now ready for one of the most central theorems in Numerical Analysis.
Proof:
For all x^(0) ∈ D_0 holds
and consequently
‖x^(i+1) − x^(i)‖ ≤ L^i ‖x^(1) − x^(0)‖.     (4.13)
x* = lim_{i→∞} x^(i).
Thus, x* = ϕ(x*). ✷
We turn now to the question about the speed (rate) of convergence and about the error we make when we stop iterating after a finite number of iterations.
Let x* be the fixed point. Then we get for contractive functions ϕ:
and consequently:
‖x^(i) − x*‖ ≤ (L(ϕ)/(1 − L(ϕ))) ‖x^(i−1) − x^(i)‖     (4.15)
This inequality is called an a posteriori error bound. With this we can decide on the quality of the k-th iterate after having computed it.
If we want to know in advance how many iterates we would need to achieve a certain accuracy, we apply an a priori error bound: Inserting
A sequence is said to converge with order p if
‖x^(i) − x*‖ ≤ C ‖x^(i−1) − x*‖^p
• p = 2 quadratic convergence. We will see later that this is the ideal order
of convergence for Newton’s method.
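As a small illustration of a fixed point iteration with the a posteriori bound (4.15) (an assumed example, not from the text), consider x = cos(x); near the fixed point |ϕ′| ≈ 0.67, so L = 0.68 is a valid Lipschitz bound there:

phi = @(x) cos(x);
L   = 0.68;  tol = 1e-10;  x = 1;
for i = 1:100
    xnew = phi(x);
    err_bound = L/(1-L) * abs(xnew - x);   % a posteriori bound for ||x^(i) - x*||
    x = xnew;
    if err_bound < tol, break, end
end
x                                          % approx. 0.739085, the fixed point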
These high costs can only be compensated by fast convergence. The convergence
properties of Newton's method are stated by the Newton–Kantorovitch
Theorem. It says that Newton's method is locally quadratically convergent under
some conditions on the smoothness of F and on the topological properties of
D, i.e. for every x^(0) sufficiently near the solution x∗ a sequence x^(i) is generated
with
‖x^(i) − x∗‖ ≤ C_N ‖x^(i−1) − x∗‖².
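For later comparison with its variants, a bare-bones Newton iteration could be coded as follows; the stopping test on the increment, the dense linear solver and the small test system are illustrative assumptions.

```python
import numpy as np

def newton(F, dF, x0, tol=1e-12, maxit=25):
    """Newton's method for F(x) = 0: solve dF(x) dx = -F(x), update x += dx."""
    x = np.array(x0, dtype=float)
    for _ in range(maxit):
        dx = np.linalg.solve(dF(x), -F(x))
        x = x + dx
        if np.linalg.norm(dx) < tol:
            break
    return x

# usage: intersection of a circle and a parabola
F  = lambda x: np.array([x[0]**2 + x[1]**2 - 4.0, x[1] - x[0]**2])
dF = lambda x: np.array([[2*x[0], 2*x[1]], [-2*x[0], 1.0]])
print(newton(F, dF, [1.0, 1.0]))
```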
In many applications no information about the Jacobian F′(x^(i)) is available.
Often F is known only as a set of complex subroutines, which are generated
automatically by special purpose programs in optimization or engineering. There
are modern techniques, called automatic differentiation, which generate a subroutine
for the corresponding Jacobian. These techniques can be viewed as a
pre-compiler¹.
The columns of the Jacobian can be approximated by forward differences,
Δ_{e_k} F(x) := ( F(x + η e_k) − F(x) ) / η,
with e_k being the kth unit vector and η ∈ R a sufficiently small number. The
increment η has to be chosen such that the influence of the approximation error
ε(η) can be neglected. It consists of truncation errors and roundoff errors in
the evaluation of F. Let ε_F be an upper bound for the error in the numerical
computation of F; then
|ε_{ij}(η)| = | Δ_{e_j} F_i − ∂F_i/∂x_j | ≤ ( 2 ε_F + (1/2) |∂²F_i/∂x_j²| η² + O(η³) ) / η.    (4.19)
In Fig. 4.2 the overall error for the example sin(1) is given. In the left part of
the figure the roundoff error dominates, in the right part the truncation error.
The slopes in the double logarithmic representation are −1 and +1 for the roundoff
and approximation errors, respectively, as expected from (4.19).
When neglecting η² and higher order terms, this bound is minimized if η is selected
according to the rule of thumb
η = 2 √( ε_F |∂²F_i/∂x_j²|^(−1) ).
¹see http://www.mcs.anl.gov/adifor
[Figure 4.2: Error |ε_{ij}(η)| in the numerical computation of the derivative of sin at 1; perturbation η versus error |ε(η)|, double logarithmic scale.]
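A possible realization of this forward difference approximation of the Jacobian is sketched below. Taking ε_F as the machine epsilon, setting the unknown second derivative to one in the rule of thumb and scaling the increment with |x_j| are assumptions made only for this example.

```python
import numpy as np

def numerical_jacobian(F, x, eps_F=np.finfo(float).eps):
    """Forward-difference approximation of the Jacobian of F at x.
    The increment follows the rule of thumb eta = 2*sqrt(eps_F/|F''|),
    with the (unknown) second derivative simply taken as 1."""
    x = np.asarray(x, dtype=float)
    f0 = np.asarray(F(x))
    m, n = f0.size, x.size
    J = np.empty((m, n))
    eta = 2.0 * np.sqrt(eps_F)                  # rule-of-thumb increment
    for j in range(n):
        xj = x.copy()
        xj[j] += eta * max(1.0, abs(x[j]))      # scale the step with |x_j|
        J[:, j] = (np.asarray(F(xj)) - f0) / (xj[j] - x[j])
    return J
```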
This method is called the simplified Newton method. Every single step is much cheaper
to compute, because the Jacobian and its LU factorization are already known.
However, this method is only linearly convergent.
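A sketch of the simplified Newton method, reusing a single LU factorization of the initial Jacobian in every step (the names and the stopping criterion are again illustrative):

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def simplified_newton(F, dF, x0, tol=1e-10, maxit=100):
    """Simplified Newton: keep the Jacobian F'(x0) (and its LU factorization)
    fixed for all iterations; only linearly convergent, but each step is cheap."""
    x = np.array(x0, dtype=float)
    lu, piv = lu_factor(dF(x))              # factorize F'(x0) once
    for _ in range(maxit):
        dx = lu_solve((lu, piv), -F(x))     # reuse the factorization in every step
        x = x + dx
        if np.linalg.norm(dx) < tol:
            break
    return x
```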
The following local convergence theorem is applicable also to the simplified Newton method (set B(x) = F′(x^(0))^(−1) in the theorem) and to the Gauß-Newton method, which we will consider later (set B(x) = F′(x)⁺).
Theorem 88
Let D ⊂ Rⁿ be open and convex and x^(0) ∈ D.
Let F ∈ C¹(D, Rᵐ) and B ∈ C⁰(D, R^{n×m}).
Assume that there exist constants r, ω, κ, δ₀ such that for all x, y ∈ D the following
properties hold:
1. Curvature condition
‖B(y) ( F′(x + τ(y − x)) − F′(x) ) (y − x)‖ ≤ τ ω ‖y − x‖²    (4.20)
2. Compatibility condition, formulated in terms of the compatibility residual
R(x) := F(x) − F′(x)B(x)F(x) and a constant κ < 1.
3. Contraction condition
δ₀ := κ + (ω/2) ‖x^(1) − x^(0)‖ < 1
with x^(1) = x^(0) − B(x^(0)) F(x^(0)) and
r := ‖x^(1) − x^(0)‖ / (1 − δ₀).
Then the iteration
x^(k+1) := x^(k) − B(x^(k)) F(x^(k))
is well-defined with x^(k) ∈ D₀ and converges to a solution x∗ ∈ D₀ of B(x)F(x) = 0.
The speed of convergence can be estimated by
‖x^(k+j) − x∗‖ ≤ δ_k^j / (1 − δ_k) · ‖Δx^(k)‖    (4.22)
with δ_k := κ + (ω/2) ‖Δx^(k)‖, and the increments decay according to
‖Δx^(k+1)‖ ≤ κ ‖Δx^(k)‖ + (ω/2) ‖Δx^(k)‖².    (4.23)
In particular, for k = 0 the estimate
‖x^(j) − x∗‖ ≤ δ₀^j / (1 − δ₀) · ‖x^(1) − x^(0)‖    (4.24)
can be obtained.
This theorem with its constants needs some interpretation:
H(x, s_i) = 0,   i = 1, . . . , m,
by Newton's method, where the solution x_i of the ith problem is taken as starting
value for the iteration in the next problem. The key point is that if Δs_i = s_{i+1} − s_i
is sufficiently small, then the iteration process will converge, since the starting
value x_i will hopefully lie in the region of convergence of the next subproblem
H(x, s_{i+1}) = 0.
From the mechanical point of view −F (x0 ) is a force which has to be added to
F (x) in order to keep the system in a non-equilibrium position. The goal of
the homotopy is then to successively reduce this force to zero by incrementally
changing s.
The embedding chosen in (4.25) is called a global homotopy. It is a special case
of a more general class of embeddings, the so-called convex homotopy
H : Rⁿ × R → Rⁿ.
Writing H_x := (d/dx) H and H_s := (d/ds) H, we arrive, if H_x(x, s) is regular, at the so-called Davidenko differential equation
x′(s) = −H_x(x(s), s)^(−1) H_s(x(s), s).
Its solution with initial value
• x(0) = x₀
satisfies
• H(x(s), s) = 0
for all s ∈ I₀.
The direction x′(s) given by the right hand side of the Davidenko differential
equation is just the tangent to the solution curve x at s.
Fig. 4.3 motivates the following definition
[Figure 4.3: Turning points; the solution curve x(s) with tangent x′(s_i) at the parameter values s₀, s₁, . . . , s_i.]
This is just a step of the explicit Euler method for ODEs applied to the Davi-
denko differential equation, cf. Sec. 5.7.1.
The number of corrector iteration steps depends on the quality of the prediction.
First, we have to require that the predicted value is within the convergence region
of the corrector problem. This region depends on the constants given by Theorem
88, mainly on the nonlinearity of F. Even if the predicted value is within
the domain of convergence, it should for reasons of efficiency be such that not
too many corrector iteration steps are needed. Both requirements demand
an elaborate strategy for the step size control, i.e. for the location of the points
s_i. For other strategies we refer to [AG90].
A central example for an application of a homotopy method will be given in the
project homework of this course.
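A schematic continuation loop with a Newton corrector might be organized as below. The concrete form of the global homotopy, H(x, s) = F(x) − (1 − s) F(x₀), the uniform partition of [0, 1] and the small test system are assumptions of this sketch; the essential idea is only that each solution serves as starting value for the next subproblem.

```python
import numpy as np

def newton(F, dF, x, tol=1e-12, maxit=25):
    """Plain Newton iteration, used as corrector."""
    for _ in range(maxit):
        dx = np.linalg.solve(dF(x), -F(x))
        x = x + dx
        if np.linalg.norm(dx) < tol:
            return x
    raise RuntimeError("corrector did not converge; reduce the step in s")

def continuation(F, dF, x0, m=10):
    """Trace H(x, s) = F(x) - (1-s)*F(x0) from s = 0 to s = 1, taking the
    solution of subproblem i as starting value for subproblem i+1."""
    x = np.array(x0, dtype=float)
    Fx0 = F(x)                                    # F at the starting point, so H(x0, 0) = 0
    for s in np.linspace(1.0 / m, 1.0, m):        # s_1, ..., s_m = 1
        H = lambda x, s=s: F(x) - (1.0 - s) * Fx0
        x = newton(H, dF, x)                      # dH/dx = dF/dx
    return x                                      # for s = 1 this solves F(x) = 0

# usage: an artificial 2x2 test system
F  = lambda x: np.array([np.exp(x[0]) - 2.0 + x[1], x[0] + np.cos(x[1]) - 1.5])
dF = lambda x: np.array([[np.exp(x[0]), 1.0], [1.0, -np.sin(x[1])]])
print(continuation(F, dF, np.zeros(2)))
```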
4.5 Gauß-Newton Method
N(x) = N₀ e^(−αx)
‖F(x)‖₂² = min
with F : Rⁿ → Rᵐ and m ≥ n.
A necessary criterion for
g(x) := F(x)ᵀ F(x) having a minimum is
∇g(x) = 2 F′(x)ᵀ F(x) = 0.
One linearizes F about the current iterate x^(k), solves the linear least squares problem
‖F(x^(k)) + F′(x^(k)) Δx‖₂ = min,
and sets x^(k+1) := x^(k) + Δx. This method is called the Gauß-Newton method.
Note that we obtain in every iteration step a linear least squares problem with
the corresponding normal equations:
Δx = −F′(x^(k))⁺ F(x^(k)),
which shows a formal similarity with Newton's method for nonlinear equations.
When comparing with the first approach, we note that we have now neglected the Hessian.
This is justified if F(x∗) is sufficiently small, which corresponds
to the requirement that the measurement errors are small and unbiased. In that
case Newton's convergence theorem assures locally linear convergence, and if even
F(x∗) = 0, locally quadratic convergence. For a detailed practical
example see the exercises.
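A compact Gauss–Newton loop for a problem of this kind is sketched below; the decay model N = N₀ e^(−αt) from above is used with synthetic data, and lstsq plays the role of the pseudoinverse F′(x^(k))⁺. All names, data and tolerances are illustrative.

```python
import numpy as np

def gauss_newton(F, dF, x0, tol=1e-10, maxit=50):
    """Gauss-Newton: in each step solve the linear least squares problem
    min || dF(x) dx + F(x) ||_2, i.e. dx = -dF(x)^+ F(x), and update x."""
    x = np.array(x0, dtype=float)
    for _ in range(maxit):
        dx = np.linalg.lstsq(dF(x), -F(x), rcond=None)[0]
        x = x + dx
        if np.linalg.norm(dx) < tol:
            break
    return x

# usage: fit N(t) = N0*exp(-alpha*t) to data points (t_i, N_i); p = (N0, alpha)
t = np.linspace(0.0, 4.0, 9)
N = 2.0 * np.exp(-0.7 * t) + 0.01 * np.sin(5 * t)        # synthetic, slightly perturbed data
F  = lambda p: p[0] * np.exp(-p[1] * t) - N              # residual vector, m = 9, n = 2
dF = lambda p: np.column_stack((np.exp(-p[1] * t),
                                -p[0] * t * np.exp(-p[1] * t)))
print(gauss_newton(F, dF, [1.0, 1.0]))
```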
Ax = b
• PDEs are solved only within some accuracy bound. The error depends on the
density of the discretization mesh. It would be an unnecessary waste of
computation time to solve a subproblem, like a linear system, to a
higher accuracy than the one required for the overall process.
All these points motivate the construction of iterative methods.
We first reformulate the problem into a fixed point problem, following the same
idea which we applied when considering nonlinear problems, see p. 79: for a regular matrix Q,
Ax = b ⇔ Q^(−1)(b − Ax) + x = x ⇔ (I − Q^(−1)A) x + Q^(−1) b = x,
and we abbreviate G := I − Q^(−1)A and c := Q^(−1)b.
ϕ(x) = Gx + c = x.
From the fixed point theorem (Th. 85) we conclude that the iteration
x(i+1) := Gx(i) + c
converges if
ρ(G) < 1,
where ρ(G) denotes the spectral radius of G, see p. 80.
We consider three important choices of Q:
• Q = I: Richardson iteration
• Q = D, the diagonal of A: Jacobi iteration
• Q = D + L, the lower triangular part of A including the diagonal: Gauss-Seidel iteration

Richardson Iteration
If we just choose Q = I we obtain the iteration
x^(i+1) := (I − A) x^(i) + b,
which converges if
ρ(I − A) < 1,
i.e. if |1 − λ(A)| < 1 for all eigenvalues of A, which in particular requires |λ(A)| < 2. This severely restricts the class of problems
for which this method is applicable.
Jacobi Iteration
We write A as the sum of a diagonal matrix D and strictly lower and upper triangular
matrices L and U,
A = L + D + U,
and choose Q = D. This gives the iteration
x^(i+1) := D^(−1) ( b − (L + U) x^(i) ).
Gauss-Seidel Iteration
Here, we choose Q = D + L and obtain the iteration
x^(i+1) := (D + L)^(−1) ( b − U x^(i) ).
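Both splittings translate almost verbatim into code. The sketch below (dense matrices only, residual-based stopping test and the small test system as assumptions) implements the two iterations:

```python
import numpy as np

def jacobi(A, b, x0, tol=1e-10, maxit=500):
    """Jacobi iteration: Q = D, i.e. x_new = D^{-1} (b - (L+U) x)."""
    D = np.diag(A)                       # diagonal entries as a vector
    R = A - np.diag(D)                   # L + U
    x = np.array(x0, dtype=float)
    for _ in range(maxit):
        x = (b - R @ x) / D
        if np.linalg.norm(b - A @ x) < tol:
            break
    return x

def gauss_seidel(A, b, x0, tol=1e-10, maxit=500):
    """Gauss-Seidel iteration: Q = D + L, components are updated in place."""
    n = len(b)
    x = np.array(x0, dtype=float)
    for _ in range(maxit):
        for i in range(n):
            x[i] = (b[i] - A[i, :i] @ x[:i] - A[i, i+1:] @ x[i+1:]) / A[i, i]
        if np.linalg.norm(b - A @ x) < tol:
            break
    return x

# usage: a small diagonally dominant system, both iterations converge
A = np.array([[4.0, 1.0, 0.0], [1.0, 4.0, 1.0], [0.0, 1.0, 4.0]])
b = np.array([1.0, 2.0, 3.0])
print(jacobi(A, b, np.zeros(3)), gauss_seidel(A, b, np.zeros(3)))
```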
Chapter 5
Ordinary Differential Equations
Here m is the mass of a body, a(t) = ẍ(t) is its acceleration and f(t) is a force
acting on the body. If one is interested in the position as a function of time, the
governing equation is the second order differential equation
m ẍ(t) = f(x(t), t).
Introducing y₁(t) = x(t) and y₂(t) = ẋ(t), the second order ODE can be rewritten
as a first order system of two equations:
ẏ₁(t) = y₂(t),
ẏ₂(t) = ẍ(t) = f(y₁(t), t)/m.
In vector notation this system reads
ẏ(t) = ( ẏ₁(t), ẏ₂(t) )ᵀ = ( y₂(t), f(y₁(t), t)/m )ᵀ.
When solving an initial value problem ẏ(t) = f(t, y(t)) numerically, the operation which
obviously cannot be evaluated is the limit h → 0 that defines the derivative,
ẏ(t) = lim_{h→0} ( y(t + h) − y(t) ) / h.
However, for any positive (small) h, the finite difference
( y(t + h) − y(t) ) / h
can easily be evaluated. By definition, it is an approximation of the derivative
ẏ(t). Let us therefore approximate the differential equation ẏ(t) = f (t, y(t)) by
the difference equation
( u(t + h) − u(t) ) / h = f(t, u(t)).
Given u at time t, one can compute u at the later time t + h, by solving the
difference equation
u(t + h) = u(t) + hf (t, u(t)).
This is exactly one step of the explicit Euler method. Introducing the notation
t_{n+1} = t_n + h and u_n = u(t_n), it reads
u_{n+1} = u_n + h f(t_n, u_n).
We shall see in Sect. 5.4 that u_n really is a first order approximation to the exact
solution y(t_n):
u_n − y(t_n) = O(h)   as h → 0.
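In code the whole method is a single loop; the sketch below assumes f takes arguments (t, y) with y a NumPy array, and the test problem is an illustrative choice.

```python
import numpy as np

def explicit_euler(f, t0, y0, h, n_steps):
    """Explicit Euler: u_{n+1} = u_n + h*f(t_n, u_n); returns times and approximations."""
    t = t0 + h * np.arange(n_steps + 1)
    u = np.empty((n_steps + 1, np.size(y0)))
    u[0] = y0
    for n in range(n_steps):
        u[n + 1] = u[n] + h * f(t[n], u[n])
    return t, u

# usage: the test equation y' = lambda*y, y(0) = 1, lambda = -2
t, u = explicit_euler(lambda t, y: -2.0 * y, 0.0, np.array([1.0]), h=0.1, n_steps=20)
```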
Then any of the quadrature rules of Chapter 1.4 can be applied to approximate
the integral. Choosing the rectangular rule,
∫_{t_n}^{t_{n+1}} f(τ, y(τ)) dτ ≈ h f(t_n, y(t_n)),
we arrive again at Euler's method
u_{n+1} = u_n + h f(t_n, u_n).
Ignoring the second order term and using the differential equation to express the
derivative ẏ(tn ) leads also to Euler’s method
un+1 = un + hf (tn , un ).
ẏ = −100y.
The exact solution is y(t) = y0 e−100t . With a positive initial value y0 it is positive
and rapidly decreasing as t → ∞. The explicit Euler method applied to this
differential equation reads
un+1 = (1 − 100h)un .
For the step size h = 0.1 the numerical solution
u_n = (−9)ⁿ u₀
oscillates with rapidly growing amplitude, whereas for h = 0.001 the solution
u_n = (0.9)ⁿ u₀
is smoothly decaying.
Another test example is the initial value problem
ẏ = λ(y − sin t) + cos t,   y(π/4) = 1/√2,
where λ is a parameter. First we set λ = −0.2 and compare the results for
Euler’s method with two different step sizes h = π/10 and h = π/20, see Fig. 5.2.
Obviously, the errors decrease with the step size. Setting now λ = −10 the
numerical results for h = π/10 oscillate around the true solution and the errors
grow rapidly in every single step. For the reduced step size h = π/20 however,
Euler’s method gives a quite good approximation to the true solution, see Fig. 5.3.
Figure 5.2: Explicit Euler Method, λ = −0.2, h = π/10 (left), h = π/20 (right)
Figure 5.3: Explicit Euler Method, λ = −10, h = π/10 (left), h = π/20 (right)
ẏ = λy, (5.3)
Its exact solution y(t) = y₀ e^{λt} with λ = α + iβ remains bounded,
|y(t)| = |y₀| · |e^{(α+iβ)t}| = |y₀| · |e^{αt}|,
if α = Reλ is non positive. In this case it is reasonable to ask that the numerical
solution remains bounded too. For the explicit Euler method,
un+1 = (1 + hλ)un
this demand requires that the amplification factor is bounded by one
|1 + hλ| ≤ 1. (5.4)
The explicit Euler method is called stable for the test equation (5.3) if the step
size h satisfies the condition (5.4). In the case of real and negative λ, this means
h ≤ −2/λ, cf. the experiments in the previous section.
The set
S = {hλ ∈ C : |1 + hλ| ≤ 1}
is called the stability region of the Euler method. It is a disc of radius 1 centered
at (−1, 0), see Fig. 5.4.
[Figure 5.4: Stability region of the explicit Euler method: the disc of radius 1 centered at (−1, 0) in the complex hλ-plane.]
The difference ε_{n+1} := u_{n+1} − ŷ(t_{n+1}) is called the local error, where ŷ is the solution of dŷ/dt = f(t, ŷ) with initial condition
ŷ(t_n) = u_n.
Note that the local error is an error in the numerical approximation that is introduced
in one single time step; at time t_n the values u_n and ŷ(t_n) are identical.
The global error however, is an error at time tn that has accumulated during
n steps of integration. That is the error that one naturally observes when per-
forming numerical calculation. In order to estimate the global error, we first
analyze the local error and then study how local errors accumulate during many
integration steps.
Local errors can be analyzed by Taylor expansion. We demonstrate this for the
explicit Euler method
un+1 = un + hf (tn , un ).
Inserting the initial condition for ŷ yields
u_{n+1} = ŷ(t_n) + h f(t_n, ŷ(t_n)).    (5.5)
Taylor expansion of ŷ about t_n gives
ŷ(t_{n+1}) = ŷ(t_n) + h ŷ′(t_n) + (h²/2) ŷ″(t_n) + . . .
Using the differential equation for ŷ we find
ŷ(t_{n+1}) = ŷ(t_n) + h f(t_n, ŷ(t_n)) + (h²/2) ŷ″(t_n) + . . .    (5.6)
Subtracting (5.6) from (5.5) gives the local error for the explicit Euler method:
ε_{n+1} = −(h²/2) ŷ″(t_n) + . . .
The accumulation of all local errors during the time stepping procedure deter-
mines the global error which is observed after many iteration steps. To investigate
the global error, we subtract the Taylor expansion of the true solution
y(t_{n+1}) = y(t_n) + h ẏ(t_n) + (h²/2) ÿ(t_n) + . . .
           = y(t_n) + h f(t_n, y(t_n)) + (h²/2) ÿ(t_n) + . . .
from the explicit Euler method
un+1 = un + hf (tn , un ).
For the global error e_n := u_n − y(t_n) one obtains e_n = O(h) as h → 0.
5.5 Stiffness
The explicit Euler method is stable for the test equation ẏ = λy, λ < 0,
provided the step size h is small enough,
h < −2/λ.
However, for strongly negative λ ≪ −1 this leads to extremely small step sizes.
Small step sizes may be reasonable and hence acceptable if the right hand side
of the differential equation is large and the solution has a large gradient. But
a strongly negative λ does not necessarily imply large gradients (the right hand
side depends on y(t) also). Consider the example
ẏ = λ(y − sin t) + cos t, λ = −50. (5.8)
A particular solution is
yp (t) = sin t.
The general solution of the homogeneous equation is
yh (t) = ceλt ,
thus the general solution of the nonhomogeneous differential equation (5.8) is
y(t) = (y0 − sin t0 )eλ(t−t0 ) + sin t. (5.9)
This solution consists of a slowly varying part, sin t and an exponentially fast
decaying initial layer
(y0 − sin t0 )eλ(t−t0 ) .
Generally, differential equations which admit very fast initial layers as well as
slow solution components are called stiff problems.
When in (5.9) y₀ ≠ sin(t₀) and λ ≪ −1, the layer has a large derivative and
it is plausible that a numerical method requires small step sizes. However, after
only a short time the initial layer has decayed to zero and is no longer visible
in the solution (5.9). Then y(t) ≈ sin t and it would be reasonable to use much
larger time steps. Unfortunately, the explicit Euler method does not allow time
steps larger than the stability bound h < −2/λ. Even if we start the initial value
problem for t₀ = π/4 exactly on the slow component y₀ = sin(π/4) (that means
there is no initial layer present), the explicit Euler approximation diverges with
h = 0.3 > −2/λ, see Fig. 5.5. The reason for this effect is as follows. In the first
step the method introduces a local error, so that u₁ ≠ sin(t₁). Then in the second
step the initial layer is activated, the step size is too large, and the error gets
amplified.
As it is impossible to avoid local errors, the only way out of this problem is
to construct methods with better stability properties. This leads to implicit
methods.
[Figure 5.5: Explicit Euler method applied to the stiff problem (5.8) with step size h = 0.3; the numerical solution (plotted over time) drifts away from the slow solution sin t.]
thus
( y(t) − y(t − h) ) / h ≈ f(t, y(t)),
leading to the scheme
un+1 = un + hf (tn+1 , un+1 ). (5.10)
This method is known as the implicit Euler method. Given un ≈ y(tn ), a new
approximation un+1 ≈ y(tn+1 ) is defined by formula (5.10). However, this is an
implicit definition for un+1 and one has to solve the nonlinear equation (5.10) to
compute un+1 . Clearly, the methods of Ch. 4 can be applied for that task. For
example, a fixed point iteration applied to (5.10),
u^(j+1)_{n+1} = u_n + h f(t_{n+1}, u^(j)_{n+1}),   j = 0, 1, 2, . . . ,
is easy to compute once an initial guess u^(0)_{n+1} is known. To find this
guess, which may be a rough approximation to u_{n+1}, an explicit Euler step is
good enough:
u^(0)_{n+1} = u_n + h f(t_n, u_n).
In this context the explicit Euler step is called a predictor to the fixed point iteration,
which is then used as a corrector of the approximation. So-called predictor–corrector
algorithms will be discussed in more detail in Sect. 5.7.1.
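As a sketch, one step of the implicit Euler method with an explicit Euler predictor and a fixed number m of fixed point corrections could read as follows; m = 3 and the test problem with λ = −0.2 are illustrative choices.

```python
import numpy as np

def implicit_euler_pece_step(f, t_n, u_n, h, m=3):
    """One implicit Euler step u_{n+1} = u_n + h*f(t_{n+1}, u_{n+1}), solved by
    fixed point iteration started from an explicit Euler predictor."""
    t_next = t_n + h
    u = u_n + h * f(t_n, u_n)              # predictor: explicit Euler step
    for _ in range(m):                     # corrector: fixed point iteration on (5.10)
        u = u_n + h * f(t_next, u)
    return u

# usage: one step for y' = lambda*(y - sin t) + cos t with lambda = -0.2
f = lambda t, y: -0.2 * (y - np.sin(t)) + np.cos(t)
u1 = implicit_euler_pece_step(f, np.pi / 4, np.array([1.0 / np.sqrt(2.0)]), h=np.pi / 10)
```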
Before analyzing the implicit Euler method let us first give a second explanation.
We have seen in Sec. 5.2.3 that numerical methods can also be derived from the
integral formulation of the initial value problem
y(t) = y₀ + ∫_{t₀}^{t} f(τ, y(τ)) dτ.
[Figure: Stability region of the implicit Euler method: the exterior of the disc of radius 1 centered at (1, 0) in the complex hλ-plane.]
Applied to the test equation ẏ = λy, the implicit Euler method reads
u_{n+1} = u_n + hλ u_{n+1},   i.e.   u_{n+1} = u_n / (1 − hλ),
and boundedness of the numerical solution requires |1 − hλ| ≥ 1.
This condition is satisfied for any positive step size h whenever Re λ ≤ 0. Hence the implicit Euler
method is unconditionally stable and the stability region
S = {hλ ∈ C : |1 − hλ| ≥ 1}
contains the entire left half plane. We reconsider the test problem
which could not be approximated by the explicit Euler method in the case λ =
−10 and h = π/10, see Sect. 5.2.4.
Figs. 5.8 and 5.9 show a stable behaviour of the implicit method independent
of the parameters λ and h. Also the errors obviously decrease as h → 0 (with
λ fixed). In fact, the unconditionally stable implicit Euler method produces
qualitatively correct approximations for all (reasonable) step sizes. Of course the
robustness of the method has its price: solving a nonlinear equation in every
single step.
Figure 5.8: Implicit Euler Method, λ = −0.2, h = π/10 (left), h = π/20 (right)
Figure 5.9: Implicit Euler Method, λ = −10, h = π/10 (left), h = π/20 (right)
with the step size h_i at step i + 1. Let us assume for the moment that k solution
values at successive time points,
u_{n+1−i} := y(t_{n+1−i}),   i = 1, . . . , k,
are known and can be used to define an interpolation polynomial π^p_k of degree
k − 1 with the property
π^p_k(t_{n+1−i}) = f(t_{n+1−i}, u_{n+1−i}),   i = 1, . . . , k.
By this requirement this polynomial is uniquely defined, though there are many
different ways to represent it. For theoretical purposes the Lagrange formulation
is convenient. There, π^p_k is built from the Lagrange basis polynomials L^{k−1}_i(t) (cf.
Sec. 1.2.1 and Sec. 1.4):
π^p_k(t) = Σ_{i=1}^{k} L^{k−1}_i(t) f(t_{n+1−i}, u_{n+1−i})    (5.13)
with
L^{k−1}_i(t) := Π_{j=1, j≠i}^{k} (t − t_{n+1−j}) / (t_{n+1−i} − t_{n+1−j}).
They fulfill
L^{k−1}_i(t_{n+1−j}) = δ_{ij}   (Kronecker symbol).
u^p_{n+1} = u_n + h_n Σ_{i=1}^{k} β^p_{k−i} f(t_{n+1−i}, u_{n+1−i})    (5.14)
with
β^p_{k−i} = (1/h_n) ∫_{t_n}^{t_{n+1}} L^{k−1}_i(t) dt.
The number of previous values needed to approximate y(t_{n+1}) is called the number
of steps of the method, and all previous values and their derivatives are sometimes
called the trail of the method. In the sequel we will denote by u_n the numerical
approximation to y(t_n) and set f_n := f(t_n, u_n).
Example 97 For equal (constant) step sizes the Adams–Bashforth methods are
given by the following formulas.
Including the new point t_{n+1} in the interpolation leads to the implicit scheme
u_{n+1} = u_n + h_n Σ_{i=0}^{k} β^c_{k−i} f(t_{n+1−i}, u_{n+1−i})    (5.15)
with
β^c_{k−i} := (1/h_n) ∫_{t_n}^{t_{n+1}} L^k_i(t) dt,
L^k_i(t) := Π_{j=0, j≠i}^{k} (t − t_{n+1−j}) / (t_{n+1−i} − t_{n+1−j}).
Example 98 For equal (constant) step sizes the Adams–Moulton methods are
given by the following formulas.
Combining the explicit formula (5.14) as predictor with the implicit formula (5.15) as corrector gives the scheme
Predict (P):   u^p_{n+1} = u_n + h_n Σ_{i=1}^{k} β^p_{k−i} f(t_{n+1−i}, u_{n+1−i})
Evaluate (E):  f(t_{n+1}, u^p_{n+1})
Correct (C):   u_{n+1} = u_n + h_n ( β^c_k f(t_{n+1}, u^p_{n+1}) + Σ_{i=1}^{k} β^c_{k−i} f(t_{n+1−i}, u_{n+1−i}) )    (5.16)
Evaluate (E):  f(t_{n+1}, u_{n+1}).
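As a concrete, minimal instance of such a scheme (an assumption of this sketch, not the general formula above), the two-step Adams–Bashforth predictor can be combined with the trapezoidal one-step Adams–Moulton corrector for constant step sizes:

```python
import numpy as np

def adams_pece(f, t0, y0, h, n_steps):
    """P(EC)E scheme: 2-step Adams-Bashforth predictor and trapezoidal
    (1-step Adams-Moulton) corrector, constant step size."""
    t = t0 + h * np.arange(n_steps + 1)
    u = np.empty((n_steps + 1, np.size(y0)))
    u[0] = y0
    u[1] = u[0] + h * f(t[0], u[0])                      # start-up: one explicit Euler step
    f_old, f_new = f(t[0], u[0]), f(t[1], u[1])
    for n in range(1, n_steps):
        up = u[n] + h * (1.5 * f_new - 0.5 * f_old)      # Predict  (Adams-Bashforth 2)
        fp = f(t[n + 1], up)                             # Evaluate
        u[n + 1] = u[n] + h * 0.5 * (fp + f_new)         # Correct  (trapezoidal corrector)
        f_old, f_new = f_new, f(t[n + 1], u[n + 1])      # Evaluate
    return t, u

# usage
t, u = adams_pece(lambda t, y: -2.0 * y, 0.0, np.array([1.0]), h=0.1, n_steps=20)
```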
The BDF methods are instead based on interpolating the solution values themselves,
u_{n+1−i},   i = 1, . . . , k.
The interpolation polynomial of these values,
π^p_k(t) = Σ_{i=1}^{k} L^{k−1}_i(t) u_{n+1−i},
evaluated at t_{n+1} gives the predictor
u^p_{n+1} = π^p_k(t_{n+1}) = Σ_{i=1}^{k} L^{k−1}_i(t_{n+1}) u_{n+1−i}.
Introducing the coefficients α^p_{k−i} := −L^{k−1}_i(t_{n+1}), the predictor equation reads
u^p_{n+1} = − Σ_{i=1}^{k} α^p_{k−i} u_{n+1−i}.
The first conditions are interpolation conditions using the unknown value u_{n+1},
which is defined implicitly by considering (5.18b).
With the coefficients
α^c_{k−i} := L̇^k_i(t_{n+1}) / L̇^k_0(t_{n+1}),   β^c_k := 1 / ( h_n L̇^k_0(t_{n+1}) ),
equation (5.18b) can be expressed as
u_{n+1} = − Σ_{i=1}^{k} α^c_{k−i} u_{n+1−i} + h_n β^c_k f(t_{n+1}, u_{n+1}),    (5.19)
We will see later that this method is of particular interest for stiff problems. The predictor-corrector
scheme for a BDF method has the form:
Predict (P):   u^p_{n+1} = − Σ_{i=1}^{k} α^p_{k−i} u_{n+1−i}
Evaluate (E):  f(t_{n+1}, u^p_{n+1})    (5.20)
Correct (C):   u_{n+1} = − Σ_{i=1}^{k} α^c_{k−i} u_{n+1−i} + h_n β^c_k f(t_{n+1}, u^p_{n+1})
Again, the implicit formula can be solved iteratively by applying the scheme
P(EC)^m with m ≥ 1, though in practice BDF methods are mainly implemented
together with Newton's method. We will see the reason for this later, when
discussing stiff ODEs.
In both cases the corrector equation can be written in the form
u_{n+1} = ξ + h_n β^c_k f(t_{n+1}, u_{n+1})    (5.21)
with
ξ = u_n + h_n Σ_{i=1}^{k} β^c_{k−i} f(t_{n+1−i}, u_{n+1−i})
in the case of Adams–Moulton methods, cf. (5.15), and
ξ = − Σ_{i=1}^{k} α^c_{k−i} u_{n+1−i}
in the case of implicit BDF methods, cf. (5.20). Eq. (5.21) describes a fixed point
iteration. By the fixed point theorem a necessary condition for the convergence
of this iteration is that the corresponding mapping
ϕ(u) := ξ + h_n β^c_k f(t_{n+1}, u)
is a contraction, cf. Def. 4.9. As the Lipschitz constants of ϕ and f are related by
L(ϕ) = h_n |β^c_k| L(f),
it is obvious that ϕ is contractive and the iteration convergent if the step size
is sufficiently small. In many cases L(f) is of moderate size, so that the step
size required for the local accuracy is small enough to ensure fast convergence
of the fixed point iteration. On the other hand, if the Jacobian (d/dx) f(t, x) has
eigenvalues which are large in modulus, the step size might be restricted much more
by the demand for contractivity than by the required tolerance and stability.
This situation is to be expected when dealing with stiff systems. In this case it is
appropriate to switch from fixed point iteration to Newton iteration for solving
the implicit corrector equation. When applying Newton’s method, the nonlinear
equation
F(u_{n+1}) = u_{n+1} − ( ξ + h_n β^c_k f(t_{n+1}, u_{n+1}) ) = 0    (5.22)
is considered. Newton's method then defines the iteration
J(t_{n+1}, u^(i)_{n+1}) Δu^(i) = − ( u^(i)_{n+1} − ( ξ + h_n β^c_k f(t_{n+1}, u^(i)_{n+1}) ) )    (5.23)
with u^(i+1)_{n+1} := u^(i)_{n+1} + Δu^(i) and
J(t, u) := I − h β^c_k (d/du) f(t, u).
As in the case of fixed point iteration, the predictor solution is taken as starting
value: u^(0)_{n+1} := u^p_{n+1}. The method demands a high computational effort, which
is mostly spent on computing the Jacobian J and solving the linear system.
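For the simplest case k = 1 (the implicit Euler/BDF1 method) the Newton iteration (5.23) for the corrector can be sketched as follows; the analytic Jacobian of f is assumed to be available, and tolerances and the test problem are illustrative.

```python
import numpy as np

def bdf1_newton_step(f, fy, t_n, u_n, h, tol=1e-12, maxit=10):
    """One implicit Euler / BDF1 step: solve F(u) = u - (u_n + h*f(t_{n+1}, u)) = 0
    by Newton's method with J(t, u) = I - h * df/du(t, u)."""
    t_next = t_n + h
    u = u_n + h * f(t_n, u_n)                        # predictor: explicit Euler
    I = np.eye(np.size(u_n))
    for _ in range(maxit):
        F = u - (u_n + h * f(t_next, u))             # corrector residual
        J = I - h * fy(t_next, u)                    # Newton matrix
        du = np.linalg.solve(J, -F)
        u = u + du
        if np.linalg.norm(du) < tol:
            break
    return u

# usage: stiff test equation y' = -50*(y - sin t) + cos t
f  = lambda t, y: -50.0 * (y - np.sin(t)) + np.cos(t)
fy = lambda t, y: np.array([[-50.0]])
u1 = bdf1_newton_step(f, fy, 0.0, np.array([0.0]), h=0.3)
```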
The automatic order variation is also used for starting multistep methods: The
starting points are successively obtained by starting with a one step method, then
proceeding with a two step method and so on until an order is reached which is
appropriate for the given problem.
c_1
c_2   a_21
c_3   a_31  a_32                               c | A
 ⋮     ⋮     ⋮    ⋱              or             --+----
c_s   a_s1  a_s2  ⋯  a_{s,s−1}                    | bᵀ
      b_1   b_2   ⋯  b_{s−1}   b_s
with A = (a_ij) and a_ij = 0 for j ≥ i.
The classical 4-stage Runge–Kutta method reads in this notation

0    |
1/2  | 1/2
1/2  | 0     1/2                    (5.27)
1    | 0     0     1
     | 1/6   2/6   2/6   1/6
If a method meets the requirement (5.28), the function evaluation for the last stage at t_n can be reused for the first stage at t_{n+1}.
The higher number of function evaluations per step in Runge–Kutta methods
compared to multistep methods is often compensated by the fact that Runge–Kutta
methods may be able to use larger step sizes.
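A direct transcription of the classical tableau (5.27) into one step of the method (a sketch; f is assumed to take arguments (t, y)):

```python
import numpy as np

def rk4_step(f, t, y, h):
    """One step of the classical 4-stage Runge-Kutta method (5.27)."""
    k1 = f(t, y)
    k2 = f(t + 0.5 * h, y + 0.5 * h * k1)
    k3 = f(t + 0.5 * h, y + 0.5 * h * k2)
    k4 = f(t + h, y + h * k3)
    return y + h / 6.0 * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

# usage: one step for y' = -2y, y(0) = 1
y1 = rk4_step(lambda t, y: -2.0 * y, 0.0, np.array([1.0]), h=0.1)
```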
A non-autonomous differential equation ẏ = f(t, y) can be written in autonomous
form, where the right hand side of the differential equation does not depend explicitly
on time, by augmenting the system with the trivial equation ṫ = 1:
ẏ = ( ṫ, ẋ )ᵀ = ( 1, f(t, x) )ᵀ =: F(y).
We will consider only methods fulfilling this condition and assume for the rest
of this chapter autonomous differential equations for ease of notation.
Setting ε(t, y, h) := y(t + h) − y(t) − h φ_h(t, y(t)) and applying the mean value
theorem to φ gives
Φn (h) = 1 + O(h).
ε is called the local error or, due to its role in (5.30), the global error increment
of the Runge–Kutta method.
Viewing the error propagation formula (5.30) we have to require
Example 100 For the Runge–Kutta method (5.26) we get by Taylor expansion
ε(t_{n−1}, y, h) = (h³/24) ( f_yy f² + 4 f_y² f ) + O(h⁴),    (5.32)
using the notation f_y := (d/dy) f(y) evaluated at y(t_{n−1}) for the elementary differentials.
Thus (5.26) is a second order method.
u^(p+1)_{n+1} − u^(p)_{n+1} = C^(p)(t, y) h^{p+1} + O(h^{p+2}) = ε^(p)(t_n, y(t_n)) + O(h^{p+2}),
where the two methods use the same stages k_i and differ only in their weights:
u^(p)_{n+1} = u_n + h Σ_{i=1}^{s} b^p_i k_i,
u^(p+1)_{n+1} = u_n + h Σ_{i=1}^{s} b^{p+1}_i k_i.
Example 101 One method of low order in that class is the RKF2(3) method

0    |
1    | 1
1/2  | 1/4   1/4                    (5.33)
     | 1/2   1/2   0
     | 1/6   1/6   4/6
Local Extrapolation
Though it is always the error of the lower order method which is estimated,
one often uses the higher order and more accurate method for continuing the
integration process. This procedure is called local extrapolation. This is also
reflected in naming the method, i.e.
• in RK2(3) the integration process is carried out with the second order method, while
• in RK3(2) it is carried out with the third order method.
An example of such an embedded pair, used for Fig. 5.10, is the Dormand–Prince method DOPRI5(4):

0     |
1/5   | 1/5
3/10  | 3/40         9/40
4/5   | 44/45        −56/15        32/9
8/9   | 19372/6561   −25360/2187   64448/6561    −212/729
1     | 9017/3168    −355/33       46732/5247    49/176         −5103/18656
1     | 35/384       0             500/1113      125/192        −2187/6784     11/84
------+------------------------------------------------------------------------------------
      | 35/384       0             500/1113      125/192        −2187/6784     11/84      0
      | 5179/57600   0             7571/16695    393/640        −92097/339200  187/2100   1/40
                                                                                         (5.34)
This method uses six stages for 5th order result and one more for obtaining the
4th order result. One clearly sees that the method saves one function evaluation
by meeting the requirement (5.28) for the 5th order method which is used for local
extrapolation.
Applied to the test equation ẏ = λy, an explicit Runge–Kutta method yields the stages
U_1 = u_n,
U_i = u_n + h Σ_{j=1}^{i−1} a_ij λ U_j,   i = 2, . . . , s,
and the new approximation
u_{n+1} = u_n + h Σ_{i=1}^{s} b_i λ U_i.
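Numerically, this recursion can be used to check for which values of hλ the amplification factor has modulus at most one. The sketch below does this for the classical method (5.27); using that tableau instead of the DOPRI pair of Fig. 5.10 is an assumption made only to keep the example short.

```python
import numpy as np

# Butcher tableau of the classical 4-stage method (5.27)
A = np.array([[0.0, 0.0, 0.0, 0.0],
              [0.5, 0.0, 0.0, 0.0],
              [0.0, 0.5, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])
b = np.array([1.0, 2.0, 2.0, 1.0]) / 6.0

def amplification(z, A=A, b=b):
    """R(z) = u_{n+1}/u_n for y' = lambda*y with z = h*lambda:
    solve the stage recursion (I - z A) U = 1 and form 1 + z*b^T U."""
    z = complex(z)
    s = len(b)
    U = np.linalg.solve(np.eye(s) - z * A, np.ones(s, dtype=complex))
    return 1.0 + z * (b @ U)

# a point z = h*lambda belongs to the stability region if |R(z)| <= 1
print(abs(amplification(-1.0)), abs(amplification(-3.0)))   # inside, outside
```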
Figure 5.10: Stability regions for the Runge–Kutta pair DOPRI4 and DOPRI5. The methods are stable inside the gray areas.