
Department of Computer Science June 10, 2005

Aarhus University
Numerical Analysis and
Parabolic Equations
A Second Course
Ole Østerby
Spring 2005
Contents
1 Error analysis 7
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.2 Number representation and computational errors . . . . . . . . . 7
1.3 Bisection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.4 Bisection in practice . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.5 An approximation to e . . . . . . . . . . . . . . . . . . . . . . . . 11
1.6 Floating-point numbers . . . . . . . . . . . . . . . . . . . . . . . . 12
1.7 A model for machine numbers . . . . . . . . . . . . . . . . . . . . 13
1.8 IEEE Standard for Binary Floating-Point Arithmetic . . . . . . . 15
1.9 Computational errors . . . . . . . . . . . . . . . . . . . . . . . . . 17
1.10 More computational errors . . . . . . . . . . . . . . . . . . . . . . 19
1.11 Condition and stability . . . . . . . . . . . . . . . . . . . . . . . . 21
1.12 On adding many numbers . . . . . . . . . . . . . . . . . . . . . . 24
1.13 Some good advice . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
1.14 Truncation errors vs. computational errors . . . . . . . . . . . . . 29
2 The global error - theoretical aspects 31
2.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
2.2 The explicit method . . . . . . . . . . . . . . . . . . . . . . . . . 32
2.3 The initial condition . . . . . . . . . . . . . . . . . . . . . . . . . 33
2.4 Dirichlet boundary conditions . . . . . . . . . . . . . . . . . . . . 33
2.5 The error for the explicit method . . . . . . . . . . . . . . . . . . 33
2.6 The implicit method . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.7 An example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
2.8 Crank-Nicolson . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
2.9 Example continued . . . . . . . . . . . . . . . . . . . . . . . . . . 37
2.10 Upwind schemes . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
2.11 Boundary conditions with a derivative . . . . . . . . . . . . . . . 38
2.12 A first order boundary approximation . . . . . . . . . . . . . . . . 39
2.13 The symmetric second order approximation . . . . . . . . . . . . 40
2.14 An asymmetric second order approximation . . . . . . . . . . . . 40
3 Estimating the global error and order 43
3.1 The local error . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
3.2 The global error . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
3.3 Limitations of the technique. . . . . . . . . . . . . . . . . . . . . . 51
3.4 An example . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
3.5 Literature . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
4 Two space dimensions 57
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
4.2 The explicit method . . . . . . . . . . . . . . . . . . . . . . . . . 58
4.3 Implicit methods . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
4.4 ADI methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
4.5 The Peaceman-Rachford Method . . . . . . . . . . . . . . . . . . 61
4.6 Practical considerations . . . . . . . . . . . . . . . . . . . . . . . 62
4.7 Stability of Peaceman-Rachford . . . . . . . . . . . . . . . . . . . 63
4.8 D'Yakonov . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
4.9 Douglas-Rachford . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
4.10 Stability of Douglas-Rachford . . . . . . . . . . . . . . . . . . . . 66
4.11 The local error . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
4.12 The global error . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
5 Equations with mixed derivative terms 73
5.1 Practical considerations . . . . . . . . . . . . . . . . . . . . . . . 74
5.2 Stability with mixed derivative . . . . . . . . . . . . . . . . . . . 75
5.3 Stability of ADI-methods . . . . . . . . . . . . . . . . . . . . . . . 77
5.4 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
6 Two-factor models - two examples 81
6.1 The Brennan-Schwartz model . . . . . . . . . . . . . . . . . . . . 81
6.2 Practicalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
6.3 A Traditional Douglas-Rachford step . . . . . . . . . . . . . . . . 87
6.4 The Peaceman-Rachford method . . . . . . . . . . . . . . . . . . 88
6.5 Fine points on efficiency . . . . . . . . . . . . . . . . . . . . . . . 89
6.6 Convertible bonds . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
7 Ill-posed problems 93
7.1 Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
7.2 Practice . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
7.3 Variable coefficients - An example . . . . . . . . . . . . . . . . . . 98
8 A free boundary problem 99
8.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
8.2 The mathematical model . . . . . . . . . . . . . . . . . . . . . . . 99
8.3 The boundary condition at infinity . . . . . . . . . . . . . . . . . 102
8.4 Finite Difference Schemes . . . . . . . . . . . . . . . . . . . . . . 106
8.5 Varying the Time Steps . . . . . . . . . . . . . . . . . . . . . . . 109
8.6 The Implicit Method . . . . . . . . . . . . . . . . . . . . . . . . . 110
8.7 The Crank-Nicolson Method . . . . . . . . . . . . . . . . . . . . . 115
8.8 Determining the order . . . . . . . . . . . . . . . . . . . . . . . . 119
8.9 Efficiency of the methods . . . . . . . . . . . . . . . . . . . . . . . 123
Chapter 1
Error analysis
1.1 Introduction
As the main title suggests, these notes cannot stand alone but are meant as a
supplement to an ordinary textbook in Numerical Analysis. This chapter treats
various aspects of error analysis which are usually not found in such textbooks,
but which the author has found useful in practical applications.
1.2 Number representation and computational errors
An old rule within Numerical Analysis says: No result is better than the accompanying
error estimate. Therefore error analysis is an important subject, and the
first step is to identify the disease and localize the sources of contamination.
The error is usually defined as the true value minus the calculated value. If ỹ is
an approximation to y, then the error in ỹ is

    error = y − ỹ.

The error is a signed number and it is sometimes as important to know the sign
of the error as its magnitude.
Example: More than 2000 years ago Archimedes used circumscribed polygons
to calculate an approximation to π: π ≈ 22/7. Archimedes did not have the tools
for error estimation, but by arranging his calculations carefully he made sure that
the error was negative: π < 22/7. Using inscribed polygons and equally careful
calculations he arrived at an approximation with positive error such that he
altogether could demonstrate:

    3 10/71 < π < 3 1/7.   □
Example: exp(x) can be approximated by the first terms of the MacLaurin
series, e.g.

    e^x ≈ 1 + x + x²/2 + x³/6.

The error is the sum of the remainder series:

    error = x⁴/24 + x⁵/120 + ⋯

If −1 < x < 0, then this series is alternating with decreasing terms, so the error
has the same sign as the first term in the remainder series and is smaller in
absolute value. So we have an error bound.
If x > 0 then the error is positive, but we have no immediate error bound.
If 0 < x < 1 then the first term in the remainder series will provide a useful error
estimate, i.e. a number which has the right sign and the right order of magnitude
without being a safe bound.   □
In many cases the important quantity is the relative error defined as

    relative error = error / y,   (y ≠ 0).

But where and how do these errors originate?
1. One type of error is the so-called truncation error, which appears when an
infinite series defining the solution to a mathematical problem is truncated to
a finite number of terms (cf. the example above). We also use the expression
discretization error which reminds us that a mathematical (continuous) problem
must be discretized (made finite) in order to be attacked numerically. In many
cases these two considerations amount to the same thing, and the terms are used
interchangeably.
Example: To solve the differential equation y′ = f(x, y) one can use Euler's
method

    y_{n+1} = y_n + h·f(x_n, y_n)

where x_n = x_0 + n·h and y_n is an approximation to the solution value at x_n.
The error is independent of whether we choose to consider Euler's formula as a
discretized version of the differential equation or we consider the right-hand side
as a truncated Taylor series for y(x_n + h).   □
But it is important to distinguish between this local error which is committed
in a single step and the global error which we observe in a given point x as the
difference between the true solution value and the calculated value, and which
is the accumulated effect of the errors in each individual step. It is clear that
we can reduce the local error by reducing the step size h, but it is not quite as
obvious what will happen with the global error because there will be more steps
and more contributions to the error; and these cannot be added directly because
each of them propagates independently through Euler's formula.
To estimate the magnitude of the truncation error, or to find the step size or the
number of terms to include in order that the truncation error becomes smaller
than a desired error tolerance, is a mathematical problem. Our computers
cannot always help us here.
2. Quite another type of error, which does not bother the pure mathematician,
but which can have a considerable influence on our results - which are often the
result of millions of unsupervised calculations - is the rounding error or computational
error which we shall take a closer look at in the next sections.
3. Finally we have the regular blunders, such as 2 + 2 = 5, or y = x/0. These
errors cannot be analysed; they must be eradicated.
1.3 Bisection
From mathematics we know the theorem:
If a function, f, is continuous in a closed interval [a, b], and f(a) and f(b) have
different signs, then f has (at least) one zero in (a, b).
We should like to find this zero, and this can be done by successively computing
the value of f at the midpoint, m, and replacing [a, b] by [a, m] or [m, b]
depending on whether f(m) has the same sign as f(b) or not. If this process
is not stopped by f(m) = 0 at some stage then after k steps we shall have an
interval of width (b − a)·2^{−k} which contains the zero. We can therefore determine
the zero with arbitrary accuracy, and at any stage we have a safe error bound.
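As a small illustration, the procedure can be sketched in a few lines of Pascal.
The test function (x² − 2 on [1, 2]) and the fixed number of steps are arbitrary
choices for the sketch, and a compiler such as Free Pascal is assumed:

program bisect;
{ sketch of the bisection method described above }
var
  a, b, m : real;
  k       : integer;
function f(x : real) : real;
begin
  f := x*x - 2.0        { illustrative test function with zero sqrt(2) }
end;
begin
  a := 1.0; b := 2.0;    { f(a) and f(b) have different signs }
  for k := 1 to 40 do
  begin
    m := a + (b - a)/2;              { midpoint }
    if f(m)*f(b) > 0 then b := m     { same sign as f(b): zero in [a,m] }
                     else a := m;    { otherwise: zero in [m,b] }
    writeln(k:4, m:22:16, (b - a):22:16)
  end
end.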
1.4 Bisection in practice
Example: We have performed bisection on

    p_5(x) = x^5 − 5.5x^4 + 12.1x^3 − 13.31x^2 + 7.3205x − 1.61051

with a = 0 and b = 2. This polynomial has the root 1.1 in [a, b]. We got the
following results:

  i   m                     p_5(m)                  error
  1   1.00000000000000000   -0.00001000000000095     0.1000000
  2   1.50000000000000000    0.01023999999999559    -0.4000000
  3   1.25000000000000000    0.00007593749999857    -0.1500000
  4   1.12500000000000000    0.00000000976562253    -0.0250000
  5   1.06250000000000000   -0.00000007415771619     0.0375000
  6   1.09375000000000000   -0.00000000000953770     0.0062500
  7   1.10937500000000000    0.00000000007241940    -0.0093750
  8   1.10156250000000000    0.00000000000000733    -0.0015625
  9   1.09765625000000000   -0.00000000000007172     0.0023438
 10   1.09960937500000000   -0.00000000000000089     0.0003906
 11   1.10058593750000000   -0.00000000000000044    -0.0005859
 12   1.10107421875000000    0.00000000000000067    -0.0010742
 13   1.10083007812500000   -0.00000000000000133    -0.0008301
  .
  .
 52   1.10101168873318978   -0.00000000000000133    -0.0010117
 53   1.10101168873319000    0.00000000000000000    -0.0010117
 54   1.10101168873319022    0.00000000000000044    -0.0010117
 55   1.10101168873319022    0.00000000000000044    -0.0010117

We can now make the following observations:
After 54 iterations there is no change/improvement.
The zero is determined to about 1.1010.
There is no significant improvement after iteration 12.
We could have stopped the process at iteration 53, since p_5(m) was calculated to 0.
An error of 0.001 is rather poor when we compute with 16 decimals.   □
In practice there is a limit to how fine we can subdivide, and this is not even
a measure of the error when we, as here, make a wrong decision, since f(m) is
calculated with the wrong sign at iteration 11.
1.5 An approximation to e
From mathematics we know that lim_{n→∞} (1 + 1/n)^n = e ≈ 2.718281828459…
We compute the subsequence corresponding to n = 2^k:

  k   1 + 2^{-k}            (1 + 2^{-k})^{2^k}
  1   1.50000000000000000   2.25000000000000000
  2   1.25000000000000000   2.44140625000000000
  3   1.12500000000000000   2.56578451395034790
  4   1.06250000000000000   2.63792849736659995
  5   1.03125000000000000   2.67699012937818237
  6   1.01562500000000000   2.69734495256509987
  .
  .
 23   1.00000011920928955   2.71828166642075297
 24   1.00000005960464477   2.71828174742680062
 25   1.00000002980232238   2.71828178793237640
 26   1.00000001490116119   2.71828180818247311
 27   1.00000000745058059   2.71828180818247311
  .
  .
 51   1.00000000000000044   2.71828180818247311
 52   1.00000000000000022   2.71828180818247311
 53   1.00000000000000000   1.00000000000000000
 54   1.00000000000000000   1.00000000000000000

We observe the following:
After 26 iterations we have 8 correct digits (7 decimals).
Then there is no change for a long time.
At iteration 53 the product becomes equal to 1.   □
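The table can be reproduced, up to details in the last digits (which depend on how
the power is evaluated and on the compiler's intermediate precision), by repeated
squaring, as in the following sketch (double precision, Free Pascal assumed):

program eapprox;
{ sketch: compute (1 + 2^(-k))^(2^k) by k repeated squarings }
var
  base, p : double;
  k, j    : integer;
begin
  for k := 1 to 54 do
  begin
    base := 1.0;
    for j := 1 to k do base := base/2.0;    { base = 2^(-k) }
    p := 1.0 + base;                        { rounds to 1 when k >= 53 }
    for j := 1 to k do p := p*p;            { p = (1 + 2^(-k))^(2^k) }
    writeln(k:4, (1.0 + base):22:17, p:22:17)
  end
end.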
These two examples show that there is a difference between usual mathematical
calculations and whatever happens within a computer. In the next sections we
shall take a closer look at what goes on when we use computers to perform
calculations with real numbers.
1.6 Floating-point numbers
Most modern computers represent (a subset of) the real numbers using the so-called
floating-point numbers.
A floating-point number system is characterized by
    a base β ∈ N
    a number of digits s ∈ N
    a smallest exponent m ∈ Z
    a largest exponent M ∈ Z
A floating-point number y can be written

    y = ±d_1.d_2…d_s · β^e

where 1 ≤ d_1 ≤ β − 1, 0 ≤ d_k ≤ β − 1, m ≤ e ≤ M.
The number d_1.d_2…d_s is interpreted as d_1 + d_2·β^{−1} + ⋯ + d_s·β^{−s+1} and is
called the mantissa or the number part, while β^e is called the exponent part.
Remark: Most floating-point number systems also contain the number
0 = 0.00…0 · β^e, where e can be, but does not have to be, 0.   □
A floating-point number system with parameters β, s, m and M is denoted
F(β, s, m, M).
On an electronic computer a practical choice is β = 2, and then the s digits can
be stored in s bits. The sign takes up another bit. The exponent e can be stored
as an integer, e.g. in the interval [m, M] = [−2^t, +2^t − 1] which takes t + 1 bits,
such that the number altogether takes up s + t + 2 bits.
Remark: The condition d_1 ≥ 1 ensures that we work with normalised floating-point
numbers, and our estimates of rounding errors in the next section are only
valid for these. If β = 2 it follows that d_1 = 1, and this redundant information
can be left out such that we actually have one more bit available.   □
Example: Fig. 1.1 shows the numbers in F(2, 3, −2, 1), corresponding to what
can be represented in 6 bits. Note how the distance between adjacent floating-point
numbers changes throughout the interval. Note also the large (relatively
speaking) interval from 0 to the smallest positive floating-point number, β^m.   □
The actual physical representation of digits in bits and bytes can vary quite a
bit(!), but the above notation captures the details essential for our considerations.
If the result of a computation becomes β^{M+1} or bigger then we have floating-point
overflow and the computations will either be stopped, or a flag will be set with
information on what happened. If a number becomes smaller in absolute value
than β^m, then we have underflow. The result will most often be set equal to 0,
and the computations can continue, possibly with an indication that an underflow
has occurred.

Figure 1.1: F(2, 3, −2, 1)

Example:
The pocket calculator HP11C uses F(10, 10, −99, 99)
while SUN Pascal is based on F(2, 53, −1022, 1023).   □
It happens very seldom that meaningful calculations take us outside the limits of
the exponent (overflow or underflow), so we shall disregard these in what follows
and instead consider the system F(β, s).
1.7 A model for machine numbers
Floating-point numbers are by far the most common, but not the only, way to
represent something that looks like real numbers on a computer. To view things
a little more generally we introduce the following model.
Let M ⊂ R be a discrete set of numbers which contains 0,
and let fl : R → M be a mapping with the following properties:

    x ∈ M  ⟹  fl(x) = x,
    ∃ε > 0 ∀x ∈ R :  |x − fl(x)| ≤ ε|x|.

The ε above is often denoted the machine accuracy or machine-ε.
The error, x − fl(x), is called the representation error of x, and from the above
property it follows that the relative representation error is bounded by ε. We
furthermore have

    fl(x) − x = ε_1·x,   |ε_1| ≤ ε

or

    fl(x) = x(1 + ε_1),   |ε_1| ≤ ε,

a fundamental relation which is rather important for what follows.
Remark: This model opens wide possibilities for the choice of fl. It is quite
useful, however, to require a monotonicity condition:

    a, b ∈ R, a < b  ⟹  fl(a) ≤ fl(b).

From this very natural requirement it follows implicitly that a real number is
always represented by one of the two closest machine numbers and therefore that
the representation error is bounded by the distance between adjacent machine
numbers.   □
Example: Let M = F(β, s). If x ∈ R, then x can be written as

    x = ±d_1.d_2…d_s d_{s+1}… · β^e.

Let fl be the mapping which cuts off the digits d_{s+1}, …, such that

    fl(x) = ±d_1.d_2…d_s · β^e.

We then have

    0 ≤ error ≤ β^{e−s+1}

and

    0 ≤ relative error ≤ β^{e−s+1}/x ≤ β^{e−s+1}/β^e = β^{1−s}.

Instead of this truncation of the infinite decimal fraction one might choose correct
rounding to the nearest (s−1)-decimal fraction. In this way the error bound
becomes half as big; but what is far more important: the error now has mean
value 0, which has a favourable effect on the accumulated error after many
computations.   □
Remark: Since the mantissa can vary between 1 and β(1 − β^{−s}) the relative
accuracy can vary with a factor β. We wish for a uniform accuracy and it is
therefore an advantage to choose β small. β = 2 is optimal in this respect. This
observation shall not detain us from using β = 10 for daily use; but β = 16,
which has been used on some IBM computers, is not recommended.   □
Remark: On the very first computers it was common to represent real numbers
in the interval (−1, +1) as binary fractions y = ±0.d_1 d_2 … d_{n−1}. With n bits
you could represent numbers with an absolute accuracy of 2^{−n+1}, but in a very
confined interval. With floating-point numbers you give up the high accuracy in
a small interval for a slightly reduced accuracy in a much larger interval, since
some of the n bits are now used to represent the exponent part.   □
Example: Fig. 1.2 shows the numbers in F(2, 5, −1, 1), which corresponds to
what can be represented with 6 bits in fixed-point notation.   □

Figure 1.2: F(2, 5, −1, 1)

Example: With 32 bits and fixed-point representation we have an absolute error
of at most 2^{−31} ≈ 4.7·10^{−10}, but only for x ∈ (−1, 1). If we use 8 bits for the
exponent, then the relative representation error is bounded by 2^{−23} ≈ 1.2·10^{−7};
but we can choose m = −128 and M = 127 and therefore represent numbers in
the interval from 2^{−128} ≈ 2.9·10^{−39} to 2^{128} ≈ 3.4·10^{38}.   □
1.8 IEEE Standard for Binary Floating-Point Arithmetic
Throughout the years there has been a fair amount of confusion about the implementation
of floating-point numbers, and many decisions seem to have been left
to individual engineers and designers of arithmetic units; numerical properties
have therefore played second fiddle to considerations regarding speed and design.
In 1985 the IEEE (Institute of Electrical and Electronics Engineers, USA) issued
a standard for binary floating-point numbers [6], and this standard has more or
less been adopted in most modern processors. The IEEE standard includes two
formats which in our notation (roughly) correspond to

    Single precision   F(2, 24, −126, 127)
    Double precision   F(2, 53, −1022, 1023)

The IEEE standard furthermore prescribes correct rounding with the extra finesse
that if x lies exactly between two machine numbers then x is rounded to
that machine number whose least significant bit is 0. The standard also prescribes
three possibilities for directional rounding: towards 0, towards +∞ and
towards −∞. These come in handy when implementing interval arithmetic.
A count of the number of bits yields 33 resp. 65 bits for the two representations
and we can conclude that d_1 is not supposed to be explicitly stored.
We also note that two possible values for the exponent have been taken out for
special purposes. The high value has been reserved for representations of overflow:
Inf (= Infinity) and NaN (= Not a Number, e.g. 0/0 or Inf − Inf). There
are also rules for arithmetic with these generalized numbers such that a program
does not need to stop because of a division by 0 or any other form of floating
overflow.
The large gap between 0 and β^m is also mended by allowing the mantissa
at this particular exponent to be non-normalised. One drawback with this trick
is that our relative error estimates of the previous section do not hold in this
interval - but then again, they didn't hold before, either.
Example: Fig. 1.3 shows the numbers in an IEEE-like number system
F(2, 4, −1, 1), where the leading bit in a normalised number is not stored and
where the smallest exponent is reserved for non-normalised numbers in the neighbourhood
of 0. We have, however, not reserved the largest exponent for special
purposes. As in the previous figures the number system corresponds to what can
be represented using 6 bits.   □

Figure 1.3: F(2, 4, −1, 1)

The various characteristics of the floating-point number system, such as word
length, relative representation error, etc., used in one's computer are often documented
in the accompanying manual; but a considerable amount of information
can be gained from program fragments such as the following.
Example: On a Sun Sparc ELC the following Pascal program is run:

eps := 1; n := 0; sum := 2;
while sum > 1 do
begin eps := eps/2; n := n + 1; sum := 1 + eps end;
writeln(n:6, eps:22:18, ' + 1 = 1');
twoeps := eps*2; n := n - 1; sum := 1 + twoeps;
writeln(n:6, twoeps:22:18, ' + 1 =', sum:22:18, ' = x');
writeln(' ':6, eps:22:18, ' + x =', sum+eps:22:18);

The results are:

    53  0.00000000000000011 + 1 = 1
    52  0.00000000000000022 + 1 = 1.00000000000000022 = x
        0.00000000000000011 + x = 1.00000000000000044
These results are in agreement with the IEEE standard for double precision:

    1 ⊕ 2^{−53} = 1
    1 ⊕ 2^{−52} = 1 + 2^{−52}
    (1 ⊕ 2^{−52}) ⊕ 2^{−53} = 1 + 2^{−51}

The second relation shows that we actually have 53 bits at our disposal in the
mantissa. The first and third relations show that rounding at 0.5 can go up or
down depending on the value of the least significant bit.
Running

eps := 1; n := 0;
while eps > 0 do
begin eps := eps/2; n := n + 1 end;

gives IEEE Warning: Underflow, and a printout of n shows that 2^{−1074} is the
smallest positive machine number, and that fl(2^{−1075}) = 0.
Further investigations show that

    (1 + 2^{−52}) · 2^{−1022} = (1 + 2^{−52}) · 2^{−1022}
    (1 + 2^{−52}) · 2^{−1023} = 2^{−1023}

The last calculation, which is accompanied by IEEE Warning: Inexact, shows
how the relative representation error becomes larger when the exponent is below
−1022. In similar ways one can show that

    1.25 · 2^{−1072} = 2^{−1072} + 2^{−1074}
    1.25 · 2^{−1073} = 2^{−1073}

and that a computation of 2^{1024} gives Infinity with IEEE Warning: Overflow,
and that the largest representable number is

    (1 + 2^{−1} + ⋯ + 2^{−52}) · 2^{1023}.   □
1.9 Computational errors
Based on the estimates of the relative representation error we shall now estimate
the errors associated with addition, subtraction, multiplication and division
within a machine number system.
Example: In this and later examples we shall demonstrate properties of floating-point
number systems on the system F(10, 4), i.e. a common base 10 system
where numbers are written with 4 significant digits. Instead of the correct way
of writing, like 2.891·10^1 or 6.146·10^{−4}, we shall allow ourselves to use the more
reader-friendly 28.91 and 0.0006146. Note that these two numbers are written
with 2 resp. 7 decimals, but that they both have 4 significant digits.
The numbers 1.573 and 0.1824 are valid machine numbers, but

    1.573 + 0.1824 = 1.7554
    1.573 − 0.1824 = 1.3906
    1.573 · 0.1824 = 0.2869152
    1.573 / 0.1824 = 8.6239035…

cannot be represented in F(10, 4). This shortcoming is not due to floating-point
numbers being a particularly clumsy way of representation but follows directly
from the fact that the set of machine numbers is finite. Our first conclusion,
based on these examples, is that the set of machine numbers is not closed w.r.t.
plus, minus, multiply, and divide.   □
Of course we should like to perform calculations anyway within our number system,
so instead we modify the arithmetic.
We shall make the following
Assumption: The result of adding (subtracting, multiplying, dividing) two machine
numbers on a computer is the same as the representation of the exact sum
(difference, product, quotient), i.e.

    a, b ∈ M  ⟹  a ⊕ b = fl(a + b) = (a + b)(1 + ε_1),   |ε_1| ≤ ε.   □

Remark: This assumption is quite realistic. Most computers have internal registers
of length 2s. They can therefore internally store a product of two s-digit
numbers or a sum of two numbers whose exponents differ by at most s. But also
in those cases - typically division - where an exact result cannot be represented,
do we have sufficient information to find fl(a/b) correctly.
A closer analysis will show that just one extra digit (in addition to an overflow
digit) is sufficient for the assumption to hold, if the arithmetic unit in the
computer is designed carefully.   □
It follows from the above that we can represent the result of a simple arithmetic
operation involving two machine numbers with a relative error of the same order
of magnitude as the representation error of the number system. But how about
three numbers?
Example:

    (1.418 ⊕ 2937) ⊖ 2936 = 2938 ⊖ 2936 = 2.000
    1.418 ⊕ (2937 ⊖ 2936) = 1.418 ⊕ 1.000 = 2.418
    1.418 ⊗ (2001 ⊖ 2000) = 1.418 ⊗ 1.000 = 1.418
    (1.418 ⊗ 2001) ⊖ (1.418 ⊗ 2000) = 2837 ⊖ 2836 = 1.000
□
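These effects can also be reproduced on an ordinary computer by simulating
F(10, 4), i.e. by rounding every intermediate result to 4 significant digits. The
following sketch does this for the first two lines of the example; the helper
function fl4 is an ad hoc illustration, not a library routine:

program f104demo;
{ sketch: simulate F(10,4) by rounding to 4 significant digits }
var
  s1, s2 : double;
function fl4(x : double) : double;
var e : double;
begin
  if x = 0 then fl4 := 0
  else
  begin
    e := 1.0;
    while abs(x) >= 10.0 do begin x := x/10.0; e := e*10.0 end;
    while abs(x) <  1.0  do begin x := x*10.0; e := e/10.0 end;
    fl4 := round(x*1000.0)/1000.0*e    { keep 4 significant digits }
  end
end;
begin
  s1 := fl4(fl4(1.418 + 2937.0) - 2936.0);   { gives 2.000 }
  s2 := fl4(1.418 + fl4(2937.0 - 2936.0));   { gives 2.418 }
  writeln('(1.418 + 2937) - 2936 = ', s1:8:3);
  writeln('1.418 + (2937 - 2936) = ', s2:8:3)
end.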
We observe two of the most essential consequences of the fact that the machine
numbers are not closed w.r.t. the simple arithmetic operations: the associative
and the distributive laws do not hold for machine numbers. Moreover we note
that we cannot guarantee a small relative error when more than two numbers are
involved in the computations.
A closer analysis shows

    a, b, c ∈ M  ⟹  (a ⊕ b) ⊕ c = fl(fl(a + b) + c)
                                 = ((a + b)(1 + ε_1) + c)(1 + ε_1′)
                                 = (a + b)(1 + ε_1)(1 + ε_1′) + c(1 + ε_1′)
                                 = a(1 + ε_2) + b(1 + ε_2) + c(1 + ε_1′)
                                 = a + b + c + a·ε_2 + b·ε_2 + c·ε_1′

where |ε_1|, |ε_1′| ≤ ε and |ε_2| ≤ 2ε + ε².
The last expression is an ordinary (forward) error analysis where the computed
result is compared to the true value. If a, b and c have the same sign then this
analysis indicates that the relative error is small, but if we have different signs
then the relative error can be arbitrarily large.
The expression right above is of a type which has proved much more useful when
analysing computational errors. It expresses the fact that the computed sum of
a, b, and c is equal to the true sum of three slightly perturbed numbers,
a(1 + ε_2), b(1 + ε_2) and c(1 + ε_1′), that are pretty close to a, b, and c,
respectively, in the sense that they deviate by no more than the representation
errors. Such a backward error analysis shows (in this case) that this computation
could not really be performed much better.
1.10 More computational errors
When analysing computational errors in arithmetic expressions involving several
operations we shall encounter error factors of the form

    (1 + ε_1)(1 + ε_2) / (1 + ε_3),   |ε_i| ≤ ε.

We shall now show that the essential part in such expressions is the number of
ε's, and that the above expression can be written as 1 + θ_3, where |θ_3| ≤ 3ε -
almost.

Theorem. Let |ε_i| ≤ ε for i = 1, 2, …, n, and 0 ≤ k ≤ n.
If nε = b < 1, then there exists a δ, |δ| < ε/(1 − b), such that

    u = (1 + ε_1)(1 + ε_2)⋯(1 + ε_k) / ((1 + ε_{k+1})⋯(1 + ε_n)) = 1 + nδ.

Remark: Since ε usually is very small (2^{−24}, 10^{−9}, or the like), b will almost
always be small such that 1 − b ≈ 1, and |δ| ⪅ ε.   □
The proof of the theorem is split into a series of lemmas.
Lemma 1.  (1 − ε)^k ≤ ∏_{i=1}^{k} (1 + ε_i) ≤ (1 + ε)^k.   □

Lemma 2.  0 < a < 1  ⟹  1 + a < 1/(1 − a).
Proof: (1 + a)(1 − a) = 1 − a² < 1.   □

Corollary 3.  0 < a < 1  ⟹  1 − a < 1/(1 + a).

Lemma 4.  0 < kε < 1  ⟹  1 − kε ≤ (1 − ε)^k < (1 + ε)^{−k}.
The last inequality follows from Corollary 3, and the first inequality is trivial for
k = 1. For k ≥ 2 it follows from

    (1 − ε)^k = 1 − kε + (1/2)k(k − 1)·ε̃²·(1 − ε̃)^{k−2} ≥ 1 − kε   (0 < ε̃ < ε).   □

Lemma 5.  0 < kε < 1  ⟹  (1 + ε)^k < (1 − ε)^{−k} ≤ 1 + kε/(1 − kε).
The first inequality follows from Lemma 2, and the second inequality follows from
Lemma 4:

    (1 − ε)^{−k} ≤ 1/(1 − kε) = 1 + kε/(1 − kε).   □

The proof of the theorem now follows by combining these inequalities:

    u ≤ (1 + ε)^k (1 − ε)^{−(n−k)} < (1 − ε)^{−n} ≤ 1 + nε/(1 − b)
    u ≥ (1 − ε)^k (1 + ε)^{−(n−k)} > (1 − ε)^n ≥ 1 − nε.   □

Inspired by the theorem we now introduce the following
Notation: By θ_n we denote a real number which satisfies |θ_n| ≤ nε/(1 − nε) ≈ nε.
Remark: This notation says nothing about the sign, and two occurrences of the
same index do not necessarily correspond to the same number. We therefore have
some unusual arithmetic rules for θ_n such as

    1/(1 + θ_3) = 1 + θ_3   and   (1 + θ_2)/(1 + θ_2) = 1 + θ_4.   □

Example:  a_1 ⊗ a_2 ⊗ ⋯ ⊗ a_n = a_1 a_2 ⋯ a_n (1 + θ_{n−1}).   □
Example:

    a_n ⊕ a_{n−1} ⊕ ⋯ ⊕ a_1 = a_n(1 + θ_{n−1}) + a_{n−1}(1 + θ_{n−1}) + a_{n−2}(1 + θ_{n−2}) + ⋯
                              + a_2(1 + θ_2) + a_1(1 + θ_1)

Note that it is not possible to give an estimate for the relative error for a sum
where the terms may have different signs. The backward error analysis suggests
that the first terms undergo the most perturbations. It is therefore a good rule-of-thumb
to start with the smallest terms. It must be mentioned here that this
arrangement gives the smallest error estimate, but not necessarily the smallest
error.   □
Example:

    a_n² ⊕ a_{n−1}² ⊕ ⋯ ⊕ a_1² = a_n²(1 + θ_n) + a_{n−1}²(1 + θ_n) + a_{n−2}²(1 + θ_{n−1}) + ⋯
                                 + a_2²(1 + θ_3) + a_1²(1 + θ_2)
                               = (a_n² + a_{n−1}² + ⋯ + a_1²)(1 + θ_n)

where each square also costs one machine multiplication, which is why the indices
are one higher than in the previous example. Since all terms are positive it is
possible here to give an estimate of the relative error. The error estimate (and the
error) can be diminished further by doing a binary-tree addition, i.e. adding the
terms two and two, adding the partial sums two and two, etc.   □
1.11 Condition and stability
The condition of a problem and the stability of an algorithm are two concepts
which are important for the understanding of what factors matter for the accuracy
of our computations.
Loosely formulated, a mapping f is said to be well conditioned (in a region Ω), if
x* close to x implies f(x*) close to f(x).
This definition looks a bit like the concept of continuity:

    ∀ε > 0 ∃δ > 0 :  ‖x* − x‖ < δ  ⟹  ‖f(x*) − f(x)‖ < ε.

A well conditioned problem is continuous, but for a continuous problem to be
well conditioned, δ must not be too small relative to ε.
We can define a condition number for the mapping f in a region Ω as

    cond(f) = sup_{x, x* ∈ Ω} ‖f(x*) − f(x)‖ / ‖x* − x‖.

The condition number is closely related to the concepts of Lipschitz constant and
modulus of continuity for real functions.
A small condition number (⪅ 10) means that the problem is well conditioned
while a large condition number (> 1 000 000) means that the problem is ill
conditioned, and there is a smooth transition from the very well conditioned
problems to the very poorly conditioned ones.
An algorithm or computational formula f* is a practical realisation of the mapping
f. An algorithm is stable in a region Ω if
    for every x there is an x* close to x such that f*(x) is close to f(x*).
We have again stated the definition very loosely. This reflects the smooth transition
from (very) stable to (clearly) unstable algorithms.
We shall not require that f*(x) = f(x*), since we have no guarantee that f(x*)
can be represented exactly in the number system at hand.
We note that if f* is stable and f is well conditioned then f*(x) will be close to
f(x), and that is really what we want.
There is no tradition for characterizing a good or bad stability by a stability
number, but we sometimes come close when we perform a backward error analysis
of an algorithm.
If a problem is ill conditioned then it is a good idea to try to reformulate it, and
if this seems impossible then the least we can do is to focus attention on the
great sensitivity of the problem. The trouble with ill conditioned problems is a
mathematical one, not a computational one, and there is very little we can do
about it.
If an algorithm is unstable then we shouldn't use it, but replace it with a stable
one.
Example: Finding the roots of a quadratic equation

    x² − 2bx + c = 0

with coefficients b and c is an ill conditioned problem if b² ≈ c, i.e. when there is
a double root or two close roots. If for instance b = 1 and c = 1, then both roots
are equal to 1, but b = 1 and c = 0.9999 9999 gives the roots 0.9999 and 1.0001,
so the shift of the roots is 10000 times larger than the change in c. Actually this
problem has condition number ∞ for (b, c) close to (1, 1).
But it should be stressed that when the roots of the quadratic equation are not
close then the root-finding problem is a well conditioned one.
This example illustrates that when there is a mathematical exception (determinant
= 0, double root, …) then there is a neighbourhood around it where the
condition is poor.
This example also illustrates that a problem can be ill conditioned for some values
of the data and well conditioned in other places. Care should be taken to use
it only in the other places.   □
Example: The usual formula for the roots

    x_{1,2} = b ± √(b² − c)

is an unstable algorithm for the smaller root if this is much smaller (in absolute
value) than the larger, i.e. when |c| ≪ b². If for instance b = 1 and c = 0.000001,
and we use the number system F(10, 4), then fl(x_1) = 0 and fl(x_2) = 2. While
the larger root is represented as well as possible, the relative error in the smaller
root is 100 %. The computational formula for the larger root is stable; but for
the smaller root we should rather use the fact that the product of the roots is
equal to c, leading to the formula

    fl(x_1) = fl(c/x_2) = 0.0000005000,

a result with a relative error less than machine-ε. We note in passing that this
problem is well conditioned since the condition number is ≈ 1/2.   □
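In ordinary double precision the same idea can be sketched as follows (the values
of b and c are those of the example):

program quadroots;
{ sketch: stable computation of both roots of x^2 - 2bx + c = 0 }
var
  b, c, d, x1, x2 : double;
begin
  b := 1.0; c := 0.000001;
  d := sqrt(b*b - c);
  if b >= 0 then x2 := b + d      { the root of largest magnitude }
            else x2 := b - d;
  x1 := c/x2;                     { the small root from x1*x2 = c }
  writeln('x1 = ', x1, '   x2 = ', x2)
end.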
Example: Compute the average of 6.231 and 6.233 in F(10, 4).
From the formula m = (a + b)/2 we get

    fl(fl(6.231 + 6.233)/2) = fl(12.46/2) = 6.230.

The error is reasonably small, but we note that the computed result does not lie
in the interval [a, b].
If we instead use m = a + (b − a)/2 we find

    fl(6.231 + fl(fl(6.233 − 6.231)/2)) = 6.231 + 0.002/2 = 6.232.

We shall return to this way of rearranging the computations to achieve better
stability properties in Section 1.13 on page 27.   □
The bible for rounding errors, where much of the previous notation has been
introduced and many results given, is [14].
1.12 On adding many numbers
One often encounters the problem of summing an infinite series. The finite machine
accuracy can here be an advantage because it implicitly sets a limit to how
many terms in the series are needed. Even if you don't know the machine accuracy,
or if you write a program to be run on different platforms, you can make
the program adapt to different surroundings. One simple technique is to keep
adding terms until the sum does not change any more. This technique works fine
with swiftly converging series, but there are pitfalls.
We can for instance sum the harmonic series ∑_{n=1}^{∞} 1/n in single precision to
15.403 682 71 using 2^{21} = 2097152 terms. We note that 1/(2^{21} + 1) is so small
that adding it to 15.403 682 71 gives no change. But the harmonic series is divergent
and therefore has no finite sum. More generally, the fact that one term is too small
does not imply that the effect of several terms cannot be felt.
We now redefine the problem to the one of summing the first 2097152 terms of
the harmonic series. Can we trust the calculated sum?
  k        2^k   S_k             D_k
1 2 1.500 000 00 1.500 000 00
2 4 2.083 333 49 0.583 333 49
3 8 2.717 857 36 0.634 523 87
4 16 3.380 728 96 0.662 871 60
5 32 4.058 495 52 0.677 766 56
6 64 4.743 891 72 0.685 396 19
7 128 5.433 147 43 0.689 255 71
8 256 6.124 345 78 0.691 198 35
9 512 6.816 517 35 0.692 171 57
10 1024 7.509 182 93 0.692 665 58
11 2048 8.202 081 68 0.692 898 75
12 4096 8.895 108 22 0.693 026 54
13 8192 9.588 195 80 0.693 087 58
14 16384 10.281 306 27 0.693 110 47
15 32768 10.974 409 10 0.693 102 84
16 65536 11.667 428 02 0.693 018 91
17 131072 12.360 085 49 0.692 657 47
18 262144 13.051 303 86 0.691 218 38
19 524288 13.737 017 63 0.685 713 77
20 1048576 14.403 683 66 0.666 666 03
21 2097152 15.403 682 71 0.999 999 05
In the table above we have listed the partial sums S_k = ∑_{n=1}^{2^k} 1/n and the
differences D_k = S_k − S_{k−1}.
We can show that

    ln 2 − 2^{−k} < D_k < ln 2 ≈ 0.693 147 18,

and these inequalities are fulfilled up to k = 14, but clearly not for k > 16.
Especially the last difference is completely wrong with a value of almost 1.0. In
other words: there must be an error in the first decimal (= third digit) in S_{21}.
Rather poor considering that we have 7 digits machine accuracy.
But we have also broken the good rule of summing the small terms first. In the
table below we have performed the sum backwards and again listed both the
partial sums S′_k = ∑_{n=2^{k−1}}^{2^{21}} 1/n and the differences D′_k = S′_k − S′_{k+1}. The latter can
be compared with the similar ones in the previous table.

  k        2^k   S′_k            D′_k
21 2097152 0.693 266 15 0.693 266 15
20 1048576 1.386 155 01 0.692 888 86
19 524288 2.079 162 84 0.693 007 83
18 262144 2.772 186 52 0.693 023 68
17 131072 3.465 300 08 0.693 113 57
16 65536 4.158 451 08 0.693 151 00
15 32768 4.851 575 37 0.693 124 29
14 16384 5.544 691 56 0.693 116 19
13 8192 6.237 781 05 0.693 089 49
12 4096 6.930 810 45 0.693 029 40
11 2048 7.623 712 06 0.692 901 61
10 1024 8.316 372 87 0.692 660 81
9 512 9.008 551 60 0.692 178 73
8 256 9.699 749 95 0.691 198 35
7 128 10.389 008 52 0.689 258 58
6 64 11.074 402 81 0.685 394 29
5 32 11.752 169 61 0.677 766 80
4 16 12.415 040 97 0.662 871 36
3 8 13.049 565 32 0.634 524 35
2 4 13.632 898 33 0.583 333 02
1 2 15.132 898 33 1.500 000 00
The results look better but not overwhelmingly so. Apparently we must be
satisfied with 3 correct decimals in the first differences where more than 100000
terms participate.
In this problem, where we have many terms of the same order of magnitude, a
third adding strategy can be advantageous. Arrange the terms as the leaves of a
balanced binary tree (this works very well when n is a power of 2), add the terms
2 and 2, add the partial sums 2 and 2, etc. In a summation of n = 2^k terms each
term participates in only k additions and this leads to a smaller error estimate.
The table below shows that it also leads to smaller errors.

  k        2^k   S_k             D_k
1 2 1.500 000 00 1.500 000 00
2 4 2.083 333 49 0.583 333 49
3 8 2.717 857 36 0.634 523 87
4 16 3.380 729 20 0.662 871 84
5 32 4.058 495 52 0.677 766 32
6 64 4.743 891 24 0.685 395 72
7 128 5.433 147 43 0.689 256 19
8 256 6.124 345 30 0.691 197 87
9 512 6.816 516 88 0.692 171 57
10 1024 7.509 176 25 0.692 659 38
11 2048 8.202 079 77 0.692 903 52
12 4096 8.895 105 36 0.693 025 59
13 8192 9.588 191 99 0.693 086 62
14 16384 10.281 309 13 0.693 117 14
15 32768 10.974 441 53 0.693 132 40
16 65536 11.667 581 56 0.693 140 03
17 131072 12.360 725 40 0.693 143 84
18 262144 13.053 871 15 0.693 145 75
19 524288 13.747 016 91 0.693 145 75
20 1048576 14.440 163 61 0.693 146 71
21 2097152 15.133 310 32 0.693 146 71
The differences suggest that the error is of the order of 10^{−6} ≈ 2^{−20}, which is
close to machine-ε relative to a sum of about 15.
To check the results we have performed yet another calculation in double precision
and using binary addition. The results are given below:

  summation   precision   computed sum    error
  binary      double      15.133 306 70
  binary      single      15.133 310 32   -0.000 003 62
  backwards   single      15.132 898 33   +0.000 408 37
  forwards    single      15.403 682 71   -0.270 376 01

The results show clearly that it is far better to add the small terms first than
last. Even better, however, is binary addition. The relative error here is about
4ε, which is rather good considering that we have added two million terms.
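The three strategies are easily compared in a small program. The sketch below
assumes a compiler with a 4-byte single type (e.g. Free Pascal); the last digits
will depend on whether intermediate results are kept in higher precision:

program sumorders;
{ sketch: sum the first 2^21 terms of the harmonic series in three ways }
const
  nterms = 2097152;                       { 2^21 }
var
  i      : longint;
  sf, sb : single;
function pairsum(lo, hi : longint) : single;
{ binary-tree (pairwise) addition of 1/lo + ... + 1/hi }
var mid : longint;
begin
  if lo = hi then pairsum := 1.0/lo
  else
  begin
    mid := (lo + hi) div 2;
    pairsum := pairsum(lo, mid) + pairsum(mid + 1, hi)
  end
end;
begin
  sf := 0.0;
  for i := 1 to nterms do sf := sf + 1.0/i;          { forwards }
  sb := 0.0;
  for i := nterms downto 1 do sb := sb + 1.0/i;      { backwards }
  writeln('forwards  ', sf:14:8);
  writeln('backwards ', sb:14:8);
  writeln('binary    ', pairsum(1, nterms):14:8)
end.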
1.13 Some good advice
It is easy to give examples showing that computer calculations with real numbers can go
wrong. It is more difficult (and dangerous) to give general advice on how to avoid
the pitfalls. But it is rather important to do so, so we try anyway.
1. A (good) approximation plus a (small) correction term.
In many calculations, typically iterations, we compute better and (hopefully)
better approximations to the solution. If the calculations can be arranged as
mentioned above, this will always be advantageous. For instance

    a + (b − a)/2                        is better than   (a + b)/2,

    x − (x² − a)/(2x)                    is better than   (1/2)(x + a/x),

    x_2 − y_2·(x_2 − x_1)/(y_2 − y_1)    is better than   (y_2·x_1 − x_2·y_1)/(y_2 − y_1).
2. Add the small terms first.
When a series with decreasing terms is to be summed, it is a good idea to start
with the small terms.
If you must add many numbers then you should consider binary addition.
3. Be careful when subtracting almost equal numbers.
I am not saying that such subtractions should be avoided. We have actually used
them to advantage in #1. They are perfectly OK in correction terms, but we
must make sure that we do not lose essential information.
4. Avoid large partial results on the road to a small final answer.
The MacLaurin series for exp(x) is not to be recommended when x = −10, where
it contains terms of the order 2700, while the result is 0.000 045. A solution here
is (cf. #5) to compute exp(+10), where the series can be used, and take the
reciprocal.
5. Use mathematical reformulations to avoid 3. and 4.
For small values of v we have that

    2 sin²(v/2)   is better than   1 − cos v.

For values of v in the neighbourhood of π/2 we have that

    2 sin²((π/2 − v)/2)   is better than   1 − sin v,

and especially so if it is possible to find an explicit expression for π/2 − v.
(A small sketch illustrating the first of these reformulations is given at the end
of this section.)
6. Series expansions can supplement 5.
For small x we have that

    x + x²/2 + x³/6 + ⋯    is better than   e^x − 1,

    x³/6 − x⁵/120 + ⋯      is better than   x − sin x,

    x/2 + x²/8 + ⋯         is better than   1 − √(1 − x).
7. Use integer calculations when possible.
Even when h is defined by a/n we shall often move one step too few or too many
if we try to march the distance a in steps of h. It is better to keep the integer
n as the primary parameter and go n steps. If we afterwards wish to halve the
step size then we can just double n and redefine the step size from this.
8. Look at the numbers once in a while.
It is a good idea to (write and) check intermediate results, e.g. while testing the
program, and judge whether they make sense. Sound judgment and common
sense are invaluable helpers.
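The sketch below (the values of v are arbitrary choices) illustrates the first
reformulation in #5:

program cancellation;
{ sketch: 1 - cos v versus the reformulation 2*sin^2(v/2) for small v }
var
  v, direct, reformed : double;
  i : integer;
begin
  v := 1.0;
  for i := 1 to 8 do
  begin
    v := v/100.0;
    direct   := 1.0 - cos(v);        { suffers from cancellation }
    reformed := 2.0*sqr(sin(v/2));   { no cancellation }
    writeln('v = ', v, '   1-cos v = ', direct, '   2 sin^2(v/2) = ', reformed)
  end
end.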
1.14 Truncation errors vs. computational errors
From Taylor's formula

    f(x + h) = f(x) + h·f′(x) + (1/2)h²·f″(x) + (1/6)h³·f‴(x) + ⋯

we have that

    f′(x) = (f(x + h) − f(x))/h − (1/2)h·f″(x) − (1/6)h²·f‴(x) − ⋯

When we approximate f′(x) by the divided difference then this value has a truncation
error of the form c_1·h + c_2·h² + ⋯
We can make the truncation error small by using a small value of h. But when h is
small then f(x + h) is close to f(x), and we can expect a considerable cancellation
in the difference:

    fl(f(x + h) − f(x)) = (fl(f(x + h)) − fl(f(x)))(1 + ε_1)
                        = (f(x + h)(1 + δ_1) − f(x)(1 + δ_2))(1 + ε_1)
                        = f(x + h) − f(x) + f(x + h)·δ_1 − f(x)·δ_2 + (⋯)·ε_1
We know that a difference between two machine numbers can be computed with
small relative error, but if the two terms are the results of computations with
accompanying errors then the cancellation can become catastrophic. If we assume
that f(x) can be computed with a relative error of δ = p·ε, where p is a small
integer, then the error can be estimated by

    |error| ⪅ 2|f(x)|·p·ε + |f′(x)|·h·ε ≈ (2p + 1)·ε,

if we assume that f(x) and f′(x) have order of magnitude 1. This error is now
magnified in the division by h.
While the truncation error is a nice and differentiable function, the contribution
from the computational error is more random, but with a standard deviation
inversely proportional to h. The total error can therefore be expressed as

    total error ≈ c_1·h + d_1·(1/h),

where c_1 is of order of magnitude 1 and d_1 of order of magnitude machine-ε. The
expression on the right-hand side has its minimum for h = √(d_1/c_1) ≈ √ε with a
minimum value of 2√(c_1·d_1) ≈ 2√ε.
From these considerations we deduce that there is a lower bound for how small
values of h it is reasonable to use, that there is a lower limit for the error
we can achieve with a given formula and a given machine accuracy, and that this
error limit is considerably larger than the machine accuracy.
A formula where the truncation error has the form c_1·h^p + ⋯ is said to be of order
p, and for such a formula we have

    total error ≈ c_1·h^p + d_1·(1/h).

Now the minimum occurs for h = (d_1/(p·c_1))^{1/(p+1)} ≈ ε^{1/(p+1)} with a minimum value
of order of magnitude ε^{p/(p+1)}.
If for instance p = 2 and ε = 10^{−15}, then the optimal h will be in the neighbourhood
of 10^{−5}, and the optimal error about 10^{−10}. We therefore prefer formulae
of high order as long as they don't involve too much extra computation or have
other unfavourable properties compared to the low order formula.
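The interplay between the two error contributions is easily observed in practice.
The following sketch (f = exp at x = 1, so that f′(x) = e is known exactly; the
step sizes are arbitrary choices) shows the total error of the divided difference
first decreasing and then increasing again as h is reduced:

program differr;
{ sketch: truncation error versus computational error for (f(x+h)-f(x))/h }
var
  h, approx, err : double;
  i : integer;
begin
  h := 1.0;
  for i := 1 to 15 do
  begin
    h := h/10.0;
    approx := (exp(1.0 + h) - exp(1.0))/h;
    err := exp(1.0) - approx;
    writeln('h = ', h, '   error = ', err)
  end
end.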
Chapter 2
The global error - theoretical aspects
2.1 Introduction
We study the linear, parabolic equation

    u_t = b·u_xx − a·u_x + κ·u + φ                                         (2.1)

or, as we prefer to write it here,

    Pu ≡ u_t − b·u_xx + a·u_x − κ·u = φ                                    (2.2)

introducing the partial differential operator P. The coefficients b, a, κ, and φ
may depend on t and x. We produce a numerical solution v(t, x) and our basic
assumption is that the global error can be expressed in terms of a series expansion
in the step sizes k and h:

    v(t, x) = u(t, x) − h·c − k·d − hk·e − h²·f − k²·g                     (2.3)

The auxiliary functions c, d, e, f, and g need not all be present in any particular
situation. Often we shall observe that c or e or d are identically zero such that
the numerical solution is second order accurate in one or both of the step sizes.
Strictly speaking v(t, x) is only defined on a discrete set of grid points but we
shall imagine that it is possible to extend it in a differentiable manner to the
whole region. Actually this can be done in many ways. The same considerations
apply to the auxiliary functions and we shall in the following see a concrete way
of extending these.
We can get information on the auxiliary functions by studying the difference
equations and by using Taylor expansions. We first look at the explicit scheme.
2.2 The explicit method
We introduce the difference operators

    δ²v^n_m = (v^n_{m+1} − 2v^n_m + v^n_{m−1}) / h²,                       (2.4)

    δ₀v^n_m = (v^n_{m+1} − v^n_{m−1}) / (2h).                              (2.5)

The explicit scheme for (2.2) can now be written as

    (v^{n+1}_m − v^n_m)/k − b^n_m·δ²v^n_m + a^n_m·δ₀v^n_m − κ^n_m·v^n_m = φ^n_m.    (2.6)
We apply (2.3) and Taylor expand around (nk, mh):

    (v^{n+1}_m − v^n_m)/k = u_t + (1/2)k·u_tt + (1/6)k²·u_ttt − h·c_t − (1/2)hk·c_tt
                            − k·d_t − (1/2)k²·d_tt − hk·e_t − h²·f_t − k²·g_t
                            + O(k³ + k²h + kh² + h³)                        (2.7)

    δ²v^n_m = u_xx + (1/12)h²·u_4x − h·c_xx − k·d_xx − hk·e_xx
              − h²·f_xx − k²·g_xx + O(⋯)                                    (2.8)

    δ₀v^n_m = u_x + (1/6)h²·u_xxx − h·c_x − k·d_x − hk·e_x
              − h²·f_x − k²·g_x + O(⋯)                                      (2.9)

    v^n_m = u − h·c − k·d − hk·e − h²·f − k²·g + O(⋯)                      (2.10)

We insert (2.7) - (2.10) in (2.6) and equate terms with the same powers of h and
k:

    1  :  Pu = φ                                                           (2.11)
    h  :  Pc = 0                                                           (2.12)
    k  :  Pd = (1/2)u_tt                                                   (2.13)
    hk :  Pe = −(1/2)c_tt                                                  (2.14)
    h² :  Pf = −(1/12)b·u_4x + (1/6)a·u_xxx                                (2.15)
    k² :  Pg = (1/6)u_ttt − (1/2)d_tt                                      (2.16)

The first thing we notice is that in (2.11) we recover the original equation for u,
indicating that the difference scheme (and our assumption (2.3)) is consistent.
The auxiliary functions are actually only defined on the grid points but, inspired
by (2.12) - (2.16), it seems natural to extend them between the grid points such
that these differential equations are satisfied at all points in the region. We note
that each of the auxiliary functions should satisfy a differential equation very
similar to the original one, the only difference lying in the inhomogeneous terms.
2.3 The initial condition
In order to secure a unique solution to (2.1) we must impose some side conditions.
One of these is an initial condition, typically of the form

    u(0, x) = u_0(x),   A ≤ x ≤ B,                                         (2.17)

where u_0(x) is a given function of x. It is natural to expect that we start our
numerical solution as accurately as possible, i.e. we set v^0_m = v(0, mh) = u_0(mh)
for all grid points between A and B. But we would like to extend v between grid
points as well, and the natural thing would be to set v(0, x) = u_0(x), A ≤ x ≤ B.
With this assumption we see from (2.3) that

    c(0, x) = d(0, x) = e(0, x) = f(0, x) = g(0, x) = ⋯ = 0,   A ≤ x ≤ B.  (2.18)
2.4 Dirichlet boundary conditions
In order to secure uniqueness we must, in addition to the initial condition, impose
two boundary conditions which could look like

    u(t, A) = u_A(t),   u(t, B) = u_B(t),   t > 0,                         (2.19)

where u_A(t) and u_B(t) are two given functions of t. Just as for the initial
condition it is natural to require v(t, x) to satisfy these conditions not only at the
grid points on the boundary but on the whole boundary, and as a consequence
the auxiliary functions will all assume the value 0 on the boundary:

    c(t, A) = d(t, A) = e(t, A) = f(t, A) = g(t, A) = ⋯ = 0,   t > 0,      (2.20)
    c(t, B) = d(t, B) = e(t, B) = f(t, B) = g(t, B) = ⋯ = 0,   t > 0.      (2.21)
2.5 The error for the explicit method
If we have an initial-boundary value problem for (2.1) with Dirichlet boundary
conditions, and if we use the explicit method for the numerical solution, then we
have the following results for the auxiliary functions:
The differential equation (2.12) for c(t, x) is homogeneous and so are the side
conditions according to (2.18), (2.20), and (2.21). c(t, x) ≡ 0 is a solution, and
by uniqueness the only one. It follows that c(t, x) ≡ 0 and therefore that there
is no h-contribution to the global error in (2.3).
The differential equation (2.14) for e(t, x) is apparently inhomogeneous, but since
c(t, x) ≡ 0 so is c_tt and the equation is homogeneous after all. So are the side
conditions and we can conclude that e(t, x) ≡ 0.
The global error expression (2.3) for the explicit method therefore takes the form

    v(t, x) = u(t, x) − k·d − h²·f − k²·g                                  (2.22)

and we deduce that the explicit method is indeed first order in time and second
order in space.
For d we have from (2.13) that Pd = (1/2)u_tt, so we must require the problem to
be such that u is twice differentiable w.r.t. t. This is usually no problem except
possibly in small neighbourhoods around isolated points on the boundary.
2.6 The implicit method
For the implicit method

    (v^{n+1}_m − v^n_m)/k − b^{n+1}_m·δ²v^{n+1}_m + a^{n+1}_m·δ₀v^{n+1}_m − κ^{n+1}_m·v^{n+1}_m = φ^{n+1}_m    (2.23)

it is natural to choose ((n + 1)k, mh) as the expansion point. Equations (2.8) -
(2.10) stay the same with the exception that all functions should be evaluated at
((n + 1)k, mh). In equation (2.7) three terms on the r.h.s. change sign:

    (v^{n+1}_m − v^n_m)/k = u_t − (1/2)k·u_tt + (1/6)k²·u_ttt − h·c_t + (1/2)hk·c_tt
                            − k·d_t + (1/2)k²·d_tt − hk·e_t − h²·f_t − k²·g_t
                            + O(k³ + k²h + kh² + h³)                        (2.24)

Equating terms as before we get a set of equations rather similar to (2.11) -
(2.16). (2.11) and (2.12) are unchanged, there is a single sign change in (2.14),
and we can still conclude that c(t, x) ≡ e(t, x) ≡ 0. The remaining equations are

    k  :  Pd = −(1/2)u_tt                                                  (2.25)
    h² :  Pf = −(1/12)b·u_4x + (1/6)a·u_xxx                                (2.26)
    k² :  Pg = (1/6)u_ttt + (1/2)d_tt                                      (2.27)
and the error expansion for the implicit method has the same form as (2.22).
Since there is a sign change in (2.25) as compared to (2.13) we can conclude
that d_Im(t, x) = −d_Ex(t, x). The r.h.s. of (2.26) is the same as in (2.15) and the
sign change in the r.h.s. of (2.27) is compensated by d being of opposite sign.
We therefore have that f(t, x) and g(t, x) are the same for the explicit and the
implicit method.
2.7 An example
Consider

    u_t = u_xx,   −1 ≤ x ≤ 1,   t > 0,
    u(0, x) = u_0(x) = cos x,   −1 ≤ x ≤ 1,
    u(t, −1) = u(t, 1) = e^{−t}·cos 1,   t > 0.

The true solution is u(t, x) = e^{−t}·cos x.
For the explicit method we have

    Pd = d_t − d_xx = (1/2)u_tt = (1/2)u.

For f we have similarly

    Pf = f_t − f_xx = −(1/12)u_4x = −(1/12)u.

We conclude that for this problem f(t, x) = −(1/6)d(t, x) or d(t, x) = −6f(t, x).
For the explicit method we must have k = λh² with λ ≤ 1/2. Keeping λ fixed, the
leading terms in the error expansion are

    k·d + h²·f = −6λh²·f + h²·f = (1 − 6λ)h²·f.

If we choose λ = 1/2, as is common, we get the leading term of the error to be
−2h²·f(t, x). There is an obvious advantage in choosing λ = 1/6, in which case we
obtain fourth order (in h) accuracy.
If we use the implicit method f stays the same and d changes sign, and the leading
terms of the error expansion become

    6λh²·f + h²·f = (6λ + 1)h²·f.

With λ = 1/2 the error becomes 4h²·f(t, x), i.e. twice as big (and of opposite sign)
as for the explicit method. There is no value of λ that will secure a higher order
accuracy.
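The effect of the choice of λ can be checked numerically. The sketch below (Free
Pascal; the grid size and end time are arbitrary choices) runs the explicit method
on the example above; switching λ between 1/2 and 1/6 and halving h should
exhibit second and fourth order behaviour, respectively:

program explicitheat;
{ sketch: explicit method for u_t = u_xx, u(0,x) = cos x, on [-1,1] }
const
  mmax   = 40;                      { number of space intervals }
  lambda = 1/6;                     { k = lambda*h^2; try 1/2 as well }
  tend   = 1.0;
var
  v, w : array[0..mmax] of real;
  h, k, t, x, err : real;
  m, n, nsteps : integer;
begin
  h := 2.0/mmax;
  k := lambda*h*h;
  nsteps := round(tend/k);
  k := tend/nsteps;                 { integer number of steps (advice #7) }
  for m := 0 to mmax do v[m] := cos(-1.0 + m*h);
  for n := 1 to nsteps do
  begin
    t := n*k;
    for m := 1 to mmax-1 do
      w[m] := v[m] + (k/(h*h))*(v[m+1] - 2*v[m] + v[m-1]);
    w[0] := exp(-t)*cos(1.0); w[mmax] := exp(-t)*cos(1.0);
    v := w
  end;
  err := 0.0;
  for m := 0 to mmax do
  begin
    x := -1.0 + m*h;
    if abs(exp(-tend)*cos(x) - v[m]) > err then
      err := abs(exp(-tend)*cos(x) - v[m])
  end;
  writeln('max error at t = 1: ', err)
end.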
2.8 Crank-Nicolson
The Crank-Nicolson method can be written as

    (v^{n+1}_m − v^n_m)/k − (1/2)b^{n+1}_m·δ²v^{n+1}_m − (1/2)b^n_m·δ²v^n_m
        + (1/2)a^{n+1}_m·δ₀v^{n+1}_m + (1/2)a^n_m·δ₀v^n_m
        − (1/2)κ^{n+1}_m·v^{n+1}_m − (1/2)κ^n_m·v^n_m = (1/2)(φ^{n+1}_m + φ^n_m).    (2.28)

The natural expansion point is now ((n + 1/2)k, mh). For the approximations to
the x-derivatives it is worthwhile to Taylor-expand first in the x-direction to get
formulas like (2.8) and (2.9) referring to (nk, mh) and ((n + 1)k, mh), and then
use the formula

    (1/2)u^{n+1} + (1/2)u^n = u^{n+1/2} + (1/8)k²·u_tt + O(k⁴)              (2.29)

on all the individual terms. The resulting equations are

    (v^{n+1}_m − v^n_m)/k = u_t + (1/24)k²·u_ttt − h·c_t − k·d_t − hk·e_t
                            − h²·f_t − k²·g_t + O(k³ + k²h + kh² + h³)      (2.30)

    (1/2)(b^{n+1}_m·δ²v^{n+1}_m + b^n_m·δ²v^n_m)
        = b^{n+1/2}_m·[u_xx + (1/12)h²·u_4x − h·c_xx − k·d_xx − hk·e_xx
                       − h²·f_xx − k²·g_xx] + (1/8)k²·(b·u_xx)_tt + O(⋯)    (2.31)

    (1/2)(a^{n+1}_m·δ₀v^{n+1}_m + a^n_m·δ₀v^n_m)
        = a^{n+1/2}_m·[u_x + (1/6)h²·u_xxx − h·c_x − k·d_x − hk·e_x
                       − h²·f_x − k²·g_x] + (1/8)k²·(a·u_x)_tt + O(⋯)       (2.32)

    (1/2)(κ^{n+1}_m·v^{n+1}_m + κ^n_m·v^n_m)
        = κ^{n+1/2}_m·[u − h·c − k·d − hk·e − h²·f − k²·g]
          + (1/8)k²·(κu)_tt + O(⋯)                                          (2.33)

    (1/2)(φ^{n+1}_m + φ^n_m) = φ + (1/8)k²·φ_tt + O(⋯)                      (2.34)

We insert (2.30) - (2.34) in (2.28) and equate terms with the same powers of h
and k:

    1  :  Pu = φ                                                           (2.35)
    h  :  Pc = 0                                                           (2.36)
    k  :  Pd = 0                                                           (2.37)
    hk :  Pe = 0                                                           (2.38)
    h² :  Pf = −(1/12)b·u_4x + (1/6)a·u_xxx                                (2.39)
    k² :  Pg = (1/24)u_ttt − (1/8)(b·u_xx)_tt + (1/8)(a·u_x)_tt
               − (1/8)(κu)_tt − (1/8)φ_tt                                  (2.40)

The r.h.s. in (2.40) looks rather complicated, but if the solution to (2.1) is smooth
enough such that we can differentiate (2.1) twice w.r.t. t, then we can combine
the last four terms in (2.40) to −(1/8)u_ttt and the equation becomes

    k² :  Pg = −(1/12)u_ttt                                                (2.41)

If the inhomogeneous term φ(t, x) in the equation (2.1) can be evaluated at the
mid-points ((n + 1/2)k, mh) then it is tempting to use φ^{n+1/2}_m instead of
(1/2)(φ^{n+1}_m + φ^n_m) in (2.28). We shall then miss the term with (1/8)φ_tt in (2.40) and therefore not have
complete advantage of the reduction leading to (2.41). Instead we shall have

    k² :  Pg = −(1/12)u_ttt + (1/8)φ_tt.

It is impossible to say in general which is better, but certainly (2.41) is simpler.
Looking at equations (2.35) - (2.41) we again recognize the original equation for
u in (2.35), and from (2.36) - (2.38) we may conclude that c(t, x) ≡ d(t, x) ≡
e(t, x) ≡ 0, showing that Crank-Nicolson is indeed second order in both k and h.
We also note from (2.39) that f(t, x) for Crank-Nicolson is the same function as
for the explicit and the implicit method.
2.9 Example continued
For our example we have

    Pg = g_t − g_xx = −(1/12)u_ttt = (1/12)u.

We conclude that f(t, x) = −g(t, x) and that the leading terms of the error are

    h²·f + k²·g = (h² − k²)·f.

There is a distinct advantage in choosing k = h, in which case the second order
terms in the error expansion will cancel.
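For completeness, a Crank-Nicolson sketch for the same example, with k = h as
recommended above, is given below (Free Pascal; the tridiagonal systems are
solved by straightforward elimination, and the grid size is an arbitrary choice):

program cnheat;
{ sketch: Crank-Nicolson for u_t = u_xx, u = exp(-t)*cos x, with k = h }
const
  mmax = 40;
  tend = 1.0;
var
  v, rhs, diag, sub, sup : array[0..mmax] of real;
  h, k, r, t, x, err : real;
  m, n, nsteps : integer;
begin
  h := 2.0/mmax; k := h; r := k/(h*h);
  nsteps := round(tend/k);
  for m := 0 to mmax do v[m] := cos(-1.0 + m*h);
  for n := 1 to nsteps do
  begin
    t := n*k;
    { right-hand side: explicit half step on the old values }
    for m := 1 to mmax-1 do
      rhs[m] := v[m] + 0.5*r*(v[m+1] - 2*v[m] + v[m-1]);
    { boundary values at the new time level }
    v[0] := exp(-t)*cos(1.0); v[mmax] := exp(-t)*cos(1.0);
    rhs[1] := rhs[1] + 0.5*r*v[0];
    rhs[mmax-1] := rhs[mmax-1] + 0.5*r*v[mmax];
    { tridiagonal system for the interior values at the new time level }
    for m := 1 to mmax-1 do
    begin
      diag[m] := 1.0 + r; sub[m] := -0.5*r; sup[m] := -0.5*r
    end;
    for m := 2 to mmax-1 do          { forward elimination }
    begin
      diag[m] := diag[m] - sub[m]*sup[m-1]/diag[m-1];
      rhs[m] := rhs[m] - sub[m]*rhs[m-1]/diag[m-1]
    end;
    v[mmax-1] := rhs[mmax-1]/diag[mmax-1];
    for m := mmax-2 downto 1 do      { back substitution }
      v[m] := (rhs[m] - sup[m]*v[m+1])/diag[m]
  end;
  err := 0.0;
  for m := 0 to mmax do
  begin
    x := -1.0 + m*h;
    if abs(exp(-tend)*cos(x) - v[m]) > err then
      err := abs(exp(-tend)*cos(x) - v[m])
  end;
  writeln('max error at t = 1: ', err)
end.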
2.10 Upwind schemes
When |a| is large compared to b we occasionally observe oscillations in the numerical
solution. One remedy is to reduce the step sizes, but this costs computer
time. Another option is to use an upwind scheme such as, in the explicit case (for
a > 0),

    (v^{n+1}_m − v^n_m)/k − b^n_m·δ²v^n_m + a^n_m·(v^n_m − v^n_{m−1})/h − κ^n_m·v^n_m = φ^n_m.    (2.42)

To analyze the effect on the error we use

    (v^n_m − v^n_{m−1})/h = u_x − (1/2)h·u_xx + (1/6)h²·u_xxx − h·c_x + (1/2)h²·c_xx
                            − k·d_x + (1/2)hk·d_xx − hk·e_x − h²·f_x − k²·g_x + O(⋯)    (2.43)

together with (2.7), (2.8) and (2.10). Equating terms with the same powers of h
and k now gives

    1  :  Pu = φ                                                           (2.44)
    h  :  Pc = −(1/2)a·u_xx                                                (2.45)
    k  :  Pd = (1/2)u_tt                                                   (2.46)
    hk :  Pe = −(1/2)c_tt + (1/2)a·d_xx                                    (2.47)

From (2.45) and (2.47) we conclude that c(t, x) and e(t, x) are no longer identically
zero and that the method is now first order in both k and h. A similar result
holds for the implicit scheme. For Crank-Nicolson the order in h is also reduced
to 1, but we keep second order accuracy in k.
2.11 Boundary conditions with a derivative
If one of the boundary conditions involves a derivative then the discretization of
this has an effect on the global error of the numerical solution. Assume that the
condition on the left boundary is

    α·u(t, A) − β·u_x(t, A) = γ,   t > 0,                                  (2.48)

where α, β and γ may depend on t. A similar condition might be imposed on the
other boundary and the considerations would be completely similar, so we shall
just consider a derivative condition on one boundary. We shall in turn study
three different discretizations of the derivative in (2.48):

    (v^n_1 − v^n_0)/h                          (first order)               (2.49)

    (v^n_1 − v^n_{−1})/(2h)                    (second order, symmetric)   (2.50)

    (−v^n_2 + 4v^n_1 − 3v^n_0)/(2h)            (second order, asymmetric)  (2.51)
2.12 A rst order boundary approximation
We rst use the approximation (2.49) in (2.48). If the coecients , and
depend on t they should be evaluated at t = nk:
v
n
0

v
n
1
v
n
0
h
= , t > 0. (2.52)
We now use the assumption (2.3) and Taylor-expand v
n
1
around (nk, 0):
u hc kd hke h
2
f k
2
g u
x
+
1
2
hu
xx
+
1
6
h
2
u
xxx
hc
x

1
2
h
2
c
xx
kd
x

1
2
hkd
xx
hke
x
h
2
f
x
k
2
g
x
= O( ) (2.53)
Collecting terms with 1, h, k, hk, h
2
, and k
2
as before we get
1 : u u
x
= (2.54)
h : c c
x
=
1
2
u
xx
(2.55)
k : d d
x
= 0 (2.56)
hk : e e
x
=
1
2
d
xx
(2.57)
h
2
: f f
x
=
1
6
u
xxx
+
1
2
c
xx
(2.58)
k
2
: g g
x
= 0 (2.59)
We recognize the condition (2.48) for u in (2.54). As for c the boundary condition
(2.55) is no longer homogeneous and we shall expect c to be nonzero. This holds
independently of which method is used for the discretization of the equation (2.1).
So if we use a rst order boundary approximation we get a global error which is
rst order in h.
39
2.13 The symmetric second order approxima-
tion
We now apply the symmetric approximation (2.50) in (2.48):
v
n
0

v
n
1
v
n
1
2h
= , t > 0. (2.60)
We again use the assumption (2.3) and Taylor-expand v
n
1
and v
n
1
around (nk, 0):
u hc kd hke h
2
f k
2
g u
x
+
1
6
h
2
u
xxx
hc
x
kd
x
hke
x
h
2
f
x
k
2
g
x
= O(h
3
+ h
2
k + hk
2
+ k
3
) (2.61)
Collecting terms with 1, h, k, hk, h
2
, and k
2
as before we get
1 : u u
x
= (2.62)
h : c c
x
= 0 (2.63)
k : d d
x
= 0 (2.64)
hk : e e
x
= 0 (2.65)
h
2
: f f
x
=
1
6
u
xxx
(2.66)
k
2
: g g
x
= 0 (2.67)
We recognize the condition (2.48) for u in (2.62). We now have a homogeneous
condition (2.63) for c and this will assure that c(t, x) 0 when we combine (2.60)
with the explicit, the implicit or the Crank-Nicolson method. We shall also have
e(t, x) 0, but in order to have d(t, x) 0 we must use the Crank-Nicolson
method.
2.14 An asymmetric second order approxima-
tion
We nally apply the approximation (2.51) in (2.48):
v
n
0

v
n
2
+ 4v
n
1
3v
n
0
2h
= , t > 0. (2.68)
We again use the assumption (2.3) and Taylor-expand v
n
1
and v
n
2
around (nk, 0):
u hc kd hke h
2
f k
2
g u
x

1
3
h
2
u
xxx
hc
x
kd
x
hke
x
h
2
f
x
k
2
g
x
= O(h
3
+ h
2
k + hk
2
+ k
3
) (2.69)
40
Collecting terms with 1, h, k, hk, h
2
, and k
2
as before we get
1 : u u
x
= (2.70)
h : c c
x
= 0 (2.71)
k : d d
x
= 0 (2.72)
hk : e e
x
= 0 (2.73)
h
2
: f f
x
=
1
3
u
xxx
(2.74)
k
2
: g g
x
= 0 (2.75)
All the same conclusions as for the symmetric case will hold also in this asymmet-
ric case. One disadvantage which does not show in equations (2.70) (2.75) is
that the next h-term is now third order and therefore can be expected to interfere
more than the fourth order term which is present in the symmetric case.
41
42
Chapter 3
Estimating the global error and
order
3.1 The local error
The error of a nite dierence scheme for solving a partial dierential equation
is often given by means of the local error which is the error committed in one
step given correct starting values, or more frequently as the local truncation error
given in terms of a Taylor expansion again for a single step and with presumed
correct starting values. Rather than giving numerical values one often resorts to
giving the order of the scheme in terms of the step size such as O(h) or O(h
2
).
The interesting issues, however, are the magnitude of the global error i.e. the
dierence between the true solution and the computed value at a specied point,
in a sense the cumulated value of all the local errors up to this point, and the
order of this error in terms of the step size used.
3.2 The global error
We shall rst dene what we mean by the global error being of order say O(h).
Let u(x) be the true solution, and let v(x) be the computed solution. v(x) is
actually only dened on a discrete set of grid points but if need be it can be
extended in a dierentiable fashion. We shall assume that the computed solution
can be written as
v(x) = u(x) hc(x) h
2
d(x) h
3
f(x) , (3.1)
where c(x), d(x) and f(x) are dierentiable functions. These auxiliary functions
are strictly speaking also dened only on the discrete set of grid points but can
43
likewise be extended in a dierentiable fashion to the whole interval in question
if we wish.
If the function c(x) happens to be identically 0 then the method is (at least) of
second order, otherwise it is of rst order. Even if c(x) is not identically 0 then
it might very well have isolated zeroes. At such places our analysis might give
results which are dicult to interpret correctly. Therefore the analysis should
always be performed for a substantial set of points in order to give trustworthy
results.
In the following we shall show how we by performing calculations with various
values of the step size, h, can extract information not only about the true solution
but also about the order and magnitude of the error.
We shall begin our analysis in one dimension and later extend it to functions of
two or more variables. A calculation with step size h will thus result in
v
1
= u hc h
2
d h
3
f
A second calculation with twice as large a step size gives
v
2
= u 2hc 4h
2
d 8h
3
f
We can now eliminate u by subtraction:
v
1
v
2
= hc + 3h
2
d + 7h
3
f +
A third calculation with 4h is necessary to retrieve information about the order
v
3
= u 4hc 16h
2
d 64h
3
f ,
whence
v
2
v
3
= 2hc + 12h
2
d + 56h
3
f +
and a division gives
v
2
v
3
v
1
v
2
= 2
c + 6hd + 28h
2
f +
c + 3hd + 7h
2
f +
. (3.2)
This quotient can be computed in all those points where we have information from
all three calculations, i.e. all grid points corresponding to the last calculation
If c ,= 0 and h is suitably small we shall observe numbers in the neighbourhood of
2 in all points, and this would indicate that the method is of rst order. If c = 0
and d ,= 0, then the quotient will assume values close to 4 and if this happens for
44
many points and not just at isolated spots then we can deduce that the method
is of second order. The smaller h the smaller inuence for the next terms in the
numerator and the denominator and the picture should become clearer.
The error in the rst calculation, v
1
, is given by
e

1
= u v
1
= hc + h
2
d + h
3
f + .
If we observe many numbers of the ratio (3.2) in the neighbourhood of 2 indicating
that [c[ is substantially larger than h[d[, and that the method therefore is of rst
order, then e
1
is represented reasonably well by v
1
v
2
:
e

1
= v
1
v
2
2h
2
d 6h
3
f , (3.3)
and v
1
v
2
can be used as an estimate of the error in v
1
.
One could choose to add this result to v
1
and thereby get more accurate results.
This process is called Richardson extrapolation and can be done for all grid points
involved in the calculation of v
2
. If the error (estimate) behaves nicely we might
even consider interpolating to the intermediate points and thus get extrapolated
values with spacing h. Interpolation or not, we cannot at the same time, i.e.
without doing some extra work, get a realistic estimate of the error in this im-
proved value. The old estimate can of course still be used but it is expected to
be rather pessimistic.
If in contrast we observe many numbers in the neighbourhood of 4 then [c[ is
substantially less than h[d[ and is probably 0. At the same time [d[ will be larger
than h[f[, and the method would be of second order with an error
e

1
= u v
1
= h
2
d + h
3
f + .
This error will be estimated nicely by (v
1
v
2
)/3:
e

1
=
1
3
(v
1
v
2
)
4
3
h
3
f . (3.4)
It is thus important to check the order before calculating an estimate of the error
and certainly before making any corrections using this estimate. If in doubt it
is usually safer to estimate the order on the low side. If the order is 2 and the
correct error estimate therefore (v
1
v
2
)/3, then misjudging the order to be 1
and using v
1
v
2
for the error estimate would not be terribly bad, and actually
on the safe side. But if one wants to attempt Richardson extrapolation it is very
important to have the right order.
45
-1 1
2
4
Figure 3.1: The function w(x) = 2
1+2x
1+x
.
If our task is to compute function values with a prescribed error tolerance then the
error estimates can also be used to predict a suitable step size which would satisfy
this requirement and in the second round to check that the ensuing calculations
are satisfactory.
Can we trust these observations? Yes, if we really observe values of (3.2) between
say 1.8 and 2.2 for all relevant grid points then the method is of rst order and
the rst term in the remainder series dominates the rest. Discrepancies from this
pattern in small areas are also allowed. They may be due to the fact that c(x)
has an isolated zero. This can be checked by observing the values of v
1
v
2
in a neighbourhood. These numbers which are dominated by the term hc will
then display a change of sign. In such a region the error estimate v
1
v
2
might
be pessimistic but since the absolute value at the same time is rather small this
should not give reason to worry.
If a method is of rst order and we choose to exploit the error estimate to adjust
the calculated value (i.e. to perform Richardson extrapolation) then it might be
reasonable to assume that the resulting method is of second order. This of course
can be tested by repeating the above process.
Once more it should be stressed that the order must be checked before carrying
on. Therefore we need an extra calculation, v
4
, such that we can compute three
extrapolated values on the basis of which we can get information about the order.
We of course expect the order to be at least 2. If the results do not conform then
it might be an idea to review the calculations.
46
0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 t
0.0
0.1
0.2
0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1.0
x
Figure 3.2: c(t, x).
What will actually happen if one performs a Richardson extrapolation based on
a wrong assumption about the order? Usually not too much. If one attempts to
eliminate a second order term in a rst order calculation then the result will still
be of rst order; and if one attempts to eliminate a rst order term which is not
there then the absolute value of the error might double but the result will retain
its high order.
If one wants to understand in detail what might happen to the ratio (3.2) in the
strange areas, i.e. how the ratio might vary when h[d[ is not small compared to
[c[, then one can consider the behaviour of the function
w(x) = 2
1 + 2x
1 + x
, (3.5)
where x = 3h
d
c
If x is positive, then 2 < w(x) < 4, and w(x) 4, when x .
This corresponds to c and d having the same sign.
If x is small then w(x) 2.
If x is large and negative then w(x) > 4, and w(x) 4 when x .
The situation x corresponds to c = 0, i.e. that the method is of second
order.
The picture becomes rather blurred when x is close to 1, i.e. when c and d have
opposite sign and c 3hd:
x 1 w +
x 1 w
47
0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 t
0.00
0.01
0.02
0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1.0
x
Figure 3.3: g(t, x).
1 < x <
1
2
w < 0
But in these cases we are far away from [c[ h[d[.
Reducing the step size by one half corresponds to reducing x by one half.
If 0 < w(x) < 4 then w(
x
2
) will be closer to 2.
If w(x) < 0 then 0 < w(
x
2
) < 2.
If 6 < w(x) then w(
x
2
) < 0.
If 4 < w(x) < 6 then w(
x
2
) > w(x).
If c and d have opposite sign and c is not dominant the picture will be rather
chaotic, but a suitable reduction of h will result in a clearer picture if the funda-
mental assumptions are valid.
We have been rather detailed in our analysis of rst order methods with a non-
vanishing second order term. Quite similar analyses can be made for second and
third order, or for second and fourth order or for higher orders. If the ratio (3.2)
is close to 2
p
then our method is of order p.
If u is a function of two or more variables then we can perform similar analyses
taking one variable at a time. If say u(t, x) is a function of two variables, t and
x, and v is a numerical approximation based on step sizes k and h then our
assumption would be
v
1
= u hc kd h
2
f k
2
g
48
t \ x 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
0.1 2.1 2.1 2.1 2.1 2.1 2.1 2.1 2.1 2.1 2.1
0.2 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0
0.3 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0
0.4 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0
0.5 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0
0.6 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0
0.7 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0
0.8 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0
0.9 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0
1.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0 2.0
Figure 3.4: h-ratio for rst order boundary condition.
A calculation with 2h and k gives
v
2
= u 2hc kd 4h
2
f k
2
g ,
such that
v
1
v
2
= hc + 3h
2
f + ,
and we are back in the former case. We compute
v
3
= u 4hc kd 16h
2
f k
2
g ,
and can check the order of approximation in h using the ratio (3.2).
For the k-dependence we compute
v
4
= u hc 2kd h
2
f 4k
2
g
and
v
5
= u hc 4kd h
2
f 16k
2
g
and using v
1
, v
4
and v
5
we can determine the order of approximation in k and
the corresponding term in the error of v
1
.
These error terms can then be used to decide how the step sizes might be reduced
in order to achieve a given error tolerance. Richardson extrapolation is also a
possibility here to increase the order and accuracy. In particular it would be
tempting to improve an O(k + h
2
)-method to O(k
2
+ h
2
) by extrapolation in
the k-direction. In order to check that such extrapolation(s) give the expected
49
t \ x 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
0.1 1.9 14.7 3.9 1.2 4.3 5.2 4.5 3.4 2.8 2.8
0.2 1.8 17.1 2.7 4.1 4.2 4.0 4.0 4.0 4.0 4.0
0.3 1.7 6.1 3.6 4.1 4.0 4.0 4.0 4.0 4.0 4.0
0.4 1.5 4.2 4.0 4.0 4.0 4.0 4.0 4.0 4.0 4.0
0.5 1.3 3.7 4.1 4.0 4.0 4.0 4.0 4.0 4.0 4.0
0.6 1.2 3.5 4.1 4.0 4.0 4.0 4.0 4.0 4.0 4.0
0.7 1.0 3.5 4.1 4.0 4.0 4.0 4.0 4.0 4.0 4.0
0.8 0.9 3.5 4.0 4.0 4.0 4.0 4.0 4.0 4.0 4.0
0.9 0.8 3.6 4.0 4.0 4.0 4.0 4.0 4.0 4.0 4.0
1.0 0.8 3.7 4.0 4.0 4.0 4.0 4.0 4.0 4.0 4.0
Figure 3.5: k-ratio for second order boundary condition.
results, it is again necessary to supplement with further calculations (and with a
more advanced numbering system for these v

s).
So what is the cost in terms of work or computer time to get this extra informa-
tion? We shall compare with the computational work for v
1
under the assumption
that the work is proportional to the number of grid points. Therefore v
2
costs
half as much as v
1
, and v
3
costs one fourth. The work involved in calculating
v
1
v
2
, v
2
v
3
and their quotient which is done for 1/4 of the grid points will not
be considered since it is assumed to be considerably less than the fundamental
dierence calculations.
The work involved in nding v
1
, v
2
and v
3
is therefore 1.75, i.e. an extra cost of
75%, and that is actually very inexpensive for an error estimate. If the numbers
allow an extrapolation then the result of this is expected to be better than a
calculation with half the step size and we are certainly better o. If the compu-
tational work increases faster than the number of grid points then the result is
even more in favour of the present method.
If u is a function of two variables with two independent step sizes then the cost of
the ve necessary calculations is 2.5 times the cost of v
1
. This is still a reasonable
price to pay. Knowing the magnitude of the error and its dependence on the step
sizes enables us to choose near-optimal combinations of these and thus avoid
redundant calculations, and a possible extrapolation might improve the results
considerably more than halving the step sizes and quadrupling the work.
50
t \ x 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
0.1 3.2 3.4 3.6 3.7 3.8 3.9 3.9 3.9 3.9 4.0
0.2 3.4 3.5 3.6 3.7 3.7 3.8 3.8 3.8 3.9 3.9
0.3 3.5 3.6 3.6 3.7 3.7 3.8 3.8 3.8 3.8 3.8
0.4 3.5 3.6 3.6 3.7 3.7 3.7 3.8 3.8 3.8 3.8
0.5 3.6 3.6 3.6 3.7 3.7 3.7 3.8 3.8 3.8 3.8
0.6 3.6 3.6 3.7 3.7 3.7 3.7 3.7 3.8 3.8 3.8
0.7 3.6 3.6 3.7 3.7 3.7 3.7 3.7 3.7 3.8 3.8
0.8 3.6 3.6 3.7 3.7 3.7 3.7 3.7 3.7 3.8 3.8
0.9 3.6 3.6 3.7 3.7 3.7 3.7 3.7 3.7 3.8 3.8
1.0 3.6 3.6 3.7 3.7 3.7 3.7 3.7 3.7 3.7 3.8
Figure 3.6: h-ratio for asymmetric second order.
3.3 Limitations of the technique.
It is essential for the technique to give satisfactory results that the leading term
in the error expansion is the dominant one. This will always be the case when
the step size is small, but how can we know that the step size is small enough?
If we compute with step size h and if the result is rst order as indicated in
formula (3.1) then we assume that the error term h c is small. It is therefore
reasonable to assume that the second term, h
2
d is very small because it contains
the factor h
2
and therefore that the ratio (3.2) will be close to 2. In this case
the estimate of the error will also be reliable. If the method is second order the
leading term is h
2
d. It is often the case with symmetric formulae that the next
term is fourth order, and if h
2
d is small then we may assume that h
4
g is very
small. Thus the ratio (3.2) will give numbers close to 4 and we should be able to
trust the error estimate.
If, however, a method is second order and the next term in the error expansion
is third order the situation is quite dierent. Now we can expect the third or-
der term to interfere signicantly making the order determination dicult and
extrapolation a dubious aair.
Even if we can safely determine the order we should not expect too much of the
extrapolation. Going from rst to second or from second to fourth order we shall
usually double the number of correct decimals, but going from second to third or
from fourth to sixth order this number will only increase by 50 %.
51
t \ x 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
0.1 4.0 4.0 4.0 4.0 4.0 4.0 4.1 4.2 4.4 4.9
0.2 4.0 4.0 4.0 4.0 4.0 4.0 4.0 4.0 3.9 3.9
0.3 4.0 4.0 4.0 4.0 4.0 4.0 4.0 4.0 4.0 4.2
0.4 4.0 4.0 4.0 4.0 4.0 4.0 4.0 4.0 4.0 3.9
0.5 4.0 4.0 4.0 4.0 4.0 4.0 4.0 4.0 4.0 4.1
0.6 4.0 4.0 4.0 4.0 4.0 4.0 4.0 4.0 4.0 4.0
0.7 4.0 4.0 4.0 4.0 4.0 4.0 4.0 4.0 4.0 4.0
0.8 4.0 4.0 4.0 4.0 4.0 4.0 4.0 4.0 4.0 4.0
0.9 4.0 4.0 4.0 4.0 4.0 4.0 4.0 4.0 4.0 4.0
1.0 4.0 4.0 4.0 4.0 4.0 4.0 4.0 4.0 4.0 4.0
Figure 3.7: k-ratio for asymmetric second order.
So the main area of application is to rst (and second) order methods, but of
course here the need is also the greatest.
3.4 An example
We shall illustrate our techniques on a simple example involving the heat equa-
tion:
u
t
= u
xx
0 x 1 t 0,
with initial condition
u(0, x) = cos x 0 x 1
and boundary conditions
u
x
(t, 0) = 0 t 0,
u(t, 1) = cos 1 t 0.
The solution is u(t, x) = e
t
cos x.
We have solved numerically using Crank-Nicolson and wish to study the be-
haviour of the global error using various discretizations of the derivative boundary
condition. First of all we have solved the initial-boundary value problems (using
h = k = 0.025) for the functions c(t, x) and g(t, x) corresponding to the rst or-
52
t \ x 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
0.1 15.9 15.9 15.9 16.0 16.0 16.1 16.2 16.5 17.2 18.6
0.2 16.0 16.0 16.0 16.1 16.1 16.1 16.1 16.0 15.8 15.5
0.3 16.1 16.1 16.1 16.1 16.0 16.0 16.0 16.0 16.1 16.5
0.4 16.0 16.0 16.0 16.0 16.1 16.1 16.1 16.1 16.0 15.8
0.5 16.0 16.0 16.0 16.0 16.0 16.0 16.0 16.0 16.1 16.2
0.6 16.0 16.0 16.0 16.0 16.0 16.0 16.0 16.1 16.0 15.9
0.7 16.0 16.0 16.0 16.0 16.0 16.0 16.0 16.0 16.0 16.1
0.8 16.0 16.0 16.0 16.0 16.0 16.0 16.0 16.0 16.0 15.9
0.9 16.0 16.0 16.0 16.0 16.0 16.0 16.0 16.0 16.0 16.1
1.0 16.0 16.0 16.0 16.0 16.0 16.0 16.0 16.0 16.0 16.0
Figure 3.8: h-ratio for symmetric second order with h = k.
der boundary approximation (cf. Section 2.12) and show the results graphically
in Fig. 3.2 and Fig. 3.3.
The values of c(t, x) lie between 0 and 0.28 and those of g(t, x) between 0 and
0.022 and from this we could estimate the truncation error for given values of the
step sizes h and k. Or we could suggest step sizes in order to make the truncation
error smaller than a given tolerance.
More often we have little knowledge of the auxiliary functions beforehand and
we shall instead extract empirical knowledge from our calculations.
Using formula (3.2) and the similar one for k we check the order of the method
calculating the ratios on a 10 10 grid using step sizes that are 16 times smaller.
The results are shown in Fig. 3.4 and Fig. 3.5 for h and k respectively.
The method is clearly rst order in h with only few values deviating appreciably
from 2.0. The picture is more confusing for k where the second order is only
convincing for larger values of t or x. For small values of x or t k
2
g(t, x) is much
smaller than hc(t, x) and a greater sensitivity is to be expected here.
The values for c(t, x) as determined by v
1
v
2
(see 3.3) with h = k = 0.00625 agree
within 7 % with those obtained from solving the dierential equation for c and
a better agreement can be obtained using smaller step sizes. The corresponding
determination of g(t, x) is reasonably good when t and x are not too close to 0.
In the regions where we have diculty determining the order (cf. Fig. 3.5) we can
of course have little trust in an application of formula (3.4) but in regions where
the ratio (3.2) is between 3.0 and 5.0 the agreement is within 8 % with the step
53
t \ x 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
0.1 9.9 9.1 7.5 7.0 6.9 7.0 7.3 7.7 8.0 8.3
0.2 7.9 7.7 8.3 8.5 8.5 8.2 7.9 7.6 7.4 7.3
0.3 8.4 8.1 7.8 7.9 8.0 8.1 8.2 8.2 8.3 8.3
0.4 8.0 7.9 8.2 8.2 8.1 8.0 8.0 8.0 8.0 8.1
0.5 8.2 8.0 7.9 8.0 8.0 8.1 8.1 8.1 8.0 8.0
0.6 8.0 7.9 8.1 8.1 8.0 8.0 8.0 8.0 8.0 8.1
0.7 8.1 8.0 7.9 8.0 8.0 8.0 8.0 8.0 8.0 8.0
0.8 8.0 7.9 8.1 8.0 8.0 8.0 8.0 8.0 8.0 8.0
0.9 8.1 8.0 7.9 8.0 8.0 8.0 8.0 8.0 8.0 8.0
1.0 8.0 7.9 8.0 8.0 8.0 8.0 8.0 8.0 8.0 8.0
Figure 3.9: h-ratio for asymmetric second order with h = k.
sizes chosen.
With the second order boundary approximations from Section 2.13 c(t, x) is ex-
pected to be identically 0, and the derivative boundary condition for f reduces
to f
x
(t, 0) = 0 since c
xx
= 0 and u
xxx
(t, 0) = e
t
sin 0 = 0.
The inhomogeneous term in the dierential equation for f is now

1
12
au
xxxx
=
1
12
e
t
cos x =
1
12
u
ttt
and we therefore see that f(t, x) = g(t, x).
For the symmetric approximation (2.60) the ratio (3.2) assumes values between
3.99 and 4.01 for the h-dependence and between 3.8 and 4.9 for the k-dependence
when using h = k = 0.025. These good results are due to the fact that the next
terms in the error expansion (2.3) are h
4
and k
4
because of the symmetry and
therefore interfere little. The values for f and g agree within 2 % with each other
and with the values obtained from the independent solution of the dierential
equation.
For the asymmetric second order boundary approximation (2.68) the second order
in k is clearly detectable on the 10 10 grid using h = 0.00625 and k = 0.025
(see Fig. 3.7) and reasonably so for h (cf. Fig. 3.6). The values obtained here for
g(t, x) agree within 1 % with those previously obtained whereas the values for
f(t, x) are 10 - 20 % too small in agreement with the h-ratio being determined
consistently 10 - 20 % too small. The presence of an interfering h
3
-term in the
error expansion is clearly noticeable here.
54
Since f = g it is very tempting to try equal step sizes with the second order
boundary approximations. Using h = k = 0.025 we can conrm that the sym-
metric approximation now leads to a method which is fourth order (see Fig. 3.8)
and that the asymmetric approximation (2.68) leads to a method which is third
order in the common step size (see Fig. 3.9).
3.5 Literature
The idea to perform extrapolation and thus achieve a higher order goes back to
the british mathematician Lewis F. Richardson who introduced it for the special
case of p = 2 [11]. The formula for determining or checking the order is given
in lecture notes from the Technical University of Denmark [2] but the idea is
probably older. The history of extrapolation processes in numerical analysis is
given in the survey paper [7] which also contains an extensive bibliography.
55
56
Chapter 4
Two space dimensions
4.1 Introduction
A general, linear equation
u
t
= b
1
u
xx
+ 2b
12
u
xy
+ b
2
u
yy
a
1
u
x
a
2
u
y
+ u + (4.1)
is called parabolic if
b
1
> 0 and b
1
b
2
> b
2
12
. (4.2)
The solution u(t, x, y) is a function of t, x, and y, and the coecients may also
depend on t, x, and y.
To ensure a unique solution (4.1) must be supplemented by an initial condition
u(0, x, y) = u
0
(x, y), X
1
x X
2
, Y
1
y Y
2
(4.3)
and boundary conditions such as
u(t, X
1
, y) = u
11
(t, y), t > 0 (4.4)
and similar relations for x = X
2
, y = Y
1
, and y = Y
2
.
We shall begin with treating the case where b
12
= 0, i.e. that there is no mixed
derivative term in (4.1). We can then write the equation as
u
t
= P
1
u + P
2
u + (4.5)
where
P
1
u = b
1
u
xx
a
1
u
x
+ u (4.6)
P
2
u = b
2
u
yy
a
2
u
y
+ (1 )u (4.7)
57
and 0 1. While symmetry considerations might speak for an even distri-
bution of the u-term ( =
1
2
) it is computationally simpler to use = 0 or = 1
i.e. to include the u-term fully in one of the two operators.
For the numerical solution of (4.1) we choose a step size k in the t-direction and
step sizes h
1
and h
2
in the x- and the y-direction, respectively, and seek the
numerical solution v
n
lm
on the discrete set of points (nk, X
1
+ lh
1
, Y
1
+ mh
2
).
4.2 The explicit method
In the simplest case P
1
u = b
1
u
xx
and P
2
u = b
2
u
yy
such that
u
t
= b
1
u
xx
+ b
2
u
yy
. (4.8)
The explicit method for (4.8) looks like
v
n+1
lm
v
n
lm
k
= b
1

2
x
v
n
lm
+ b
2

2
y
v
n
lm
. (4.9)
To study the stability of (4.9) we take v
n
lm
on the form
v
n
lm
= g
n
e
i
1
lh
1
e
i
2
mh
2
= g
n
e
il
1
e
im
2
(4.10)
and insert in (4.9):
g = 1 + b
1

1
(e
i
1
2 + e
i
1
) + b
2

2
(e
i
2
2 + e
i
2
)
= 1 4b
1

1
sin
2

1
2
4b
2

2
sin
2

2
2
(4.11)
where
1
= k/h
2
1
and
2
= k/h
2
2
. The stability requirement is [g[ 1 and since
g is real and clearly less than 1 the critical condition is g 1 or
2b
1

1
sin
2

1
2
+ 2b
2

2
sin
2

2
2
1. (4.12)
Since this must hold for all
1
and
2
the requirement for stability is
b
1

1
+ b
2

2

1
2
. (4.13)
4.3 Implicit methods
Formula (4.13) puts severe restrictions on the step size k and it is very tempting
to study generalizations of the implicit or the Crank-Nicolson methods to two
58
space dimensions. A similar analysis will show that they are both unconditionally
stable, i.e. they are stable for any choice of k, h
1
, and h
2
.
Problem: Show this.
The use of an implicit method requires the solution of a set of linear equations
for each time step. In one space dimension the coecient matrix for these equa-
tions is tridiagonal, and the equations can be solved with a number of simple
arithmetic operations (SAO) proportional to the number of internal grid points.
The situation is less favourable in two or more space dimensions.
If we have L internal points in the x-direction and M internal points in the y-
direction then we have L M grid points and equations per time step. There are
at most 5 non-zero coecients in each equation. If the grid points are ordered
lexicographically then the coecient matrix will have a band structure such as
shown to the left in Fig. 4.1 where the non-zero coecients are marked with a
square. During a Gaussian elimination the region between the outer bands will
ll in as shown to the right in Fig. 4.1 resulting in a number of non-zeroes which
is approximately L
2
M and a number of SAO proportional to L
3
M.
Figure 4.1: A coecient matrix before and after elimination.
Example.
Consider u
t
= u
xx
+ u
yy
on the unit square with h
1
= h
2
= 0.01, L = M 100.
If we want to integrate to t = 1 using the explicit method then stability requires
k
1
4100
2
and a number of time steps at least K = 4 100
2
. The number of SAO
is proportional to the number of grid points which is K L M 4L
4
4 10
8
.
If we want to use an implicit method we may choose k = h
1
= h
2
, K = L = M
so the number of grid points is approximately L
3
but the number of SAO is
proportional to K L
3
M = L
5
10
10
.
We can conclude that there is no advantage in using implicit methods directly
since the time involved in solving the linear equations outweighs the advantage of
using larger step sizes in time. There are more elaborate ways to order the equa-
tions and to perform the solution process, but none that will make a signicant
dierence in favour of implicit methods.
59
4.4 ADI methods
When the dierential operator can be split as in (4.5) (4.7) there are ways to
avoid the L
3
-factor. Such methods are called time-splitting methods or Locally
One-Dimensional (LOD) or Alternating Direction Implicit (ADI) and the general
idea is to split a time step in two and to take one operator or one space coordinate
at a time.
Taking our inspiration from the Crank-Nicolson method we begin discretizing
(4.5) in the time-direction:
u
t
((n +
1
2
)k, x, y) =
u
n+1
u
n
k
+ O(k
2
), (4.14)
P
1
u + P
2
u + =
1
2
P
1
(u
n+1
+ u
n
) +
1
2
P
2
(u
n+1
+ u
n
) (4.15)
+
1
2
(
n+1
+
n
) + O(k
2
).
Insert in (4.5), multiply by k, and rearrange:
(I
1
2
kP
1

1
2
kP
2
)u
n+1
= (I +
1
2
kP
1
+
1
2
kP
2
)u
n
(4.16)
+
1
2
k(
n+1
+
n
) + O(k
3
).
If we add
1
4
k
2
P
1
P
2
u
n+1
on the left side and
1
4
k
2
P
1
P
2
u
n
on the right side then we
commit an error which is O(k
3
) and therefore can be included in that term:
(I
1
2
kP
1
)(I
1
2
kP
2
)u
n+1
= (I +
1
2
kP
1
)(I +
1
2
kP
2
)u
n
(4.17)
+
1
2
k(
n+1
+
n
) + O(k
3
).
We now discretize in the space coordinates replacing P
1
by P
1h
, P
2
by P
2h
, and
u by v:
(I
1
2
kP
n+1
1h
)(I
1
2
kP
n+1
2h
)v
n+1
= (I +
1
2
kP
n
1h
)(I +
1
2
kP
n
2h
)v
n
(4.18)
+
1
2
k(
n+1
+
n
)
and this gives rise to the Peaceman-Rachford method [10]:
(I
1
2
kP
n+1
1h
) v
n+
1
2
= (I +
1
2
kP
n
2h
)v
n
+ , (4.19)
(I
1
2
kP
n+1
2h
)v
n+1
= (I +
1
2
kP
n
1h
) v
n+
1
2
+ . (4.20)
60
The operators P
1h
and P
2h
involve the respective coecients of the dierential
equation and may therefore depend on time. This dependence is indicated by
superscripts n and n + 1. The superscript n +
1
2
on v does not mean that these
values have any special relation to the intermediate time value t = (n +
1
2
)k.
These numbers are just intermediate values, introduced for reason of calculation
and eciency with no reference to the solution at any particular time.
We have introduced the values and to take into account the inhomogeneous
term because it is not evident how this term should be split. We shall attend
to this matter shortly.
4.5 The Peaceman-Rachford Method
In order to check whether the solution v
n+1
to (4.19) and (4.20) is also the solution
to (4.18) we start with v
n+1
from (4.20) and apply the dierence operators from
the left side of (4.18):
(I
1
2
kP
n+1
1h
)(I
1
2
kP
n+1
2h
)v
n+1
= (I
1
2
kP
n+1
1h
)(I +
1
2
kP
n
1h
) v
n+
1
2
+
= (I +
1
2
kP
n
1h
)(I
1
2
kP
n+1
1h
) v
n+
1
2
+ (I
1
2
kP
n+1
1h
)
= (I +
1
2
kP
n
1h
)(I +
1
2
kP
n
2h
)v
n
+ (I +
1
2
kP
n
1h
) (4.21)
+ (I
1
2
kP
n+1
1h
)
The rst equal sign follows from (4.20), the third one from (4.19), and the second
one requires that the operators P
n+1
1h
and P
n
1h
commute.
This is not always the case when the coecients depend on t and x. A closer
analysis reveals that we have commutativity if the coecients b
1
, a
1
, and are
either constant or only depend on t and y or only depend on x and y. If depends
on all of t, x, and y we may incorporate it in the operator P
2
. If a
1
and are 0
then b
1
may depend on both t and x (and y) if it can be written as a product of
a function of t (and y) and a function of x (and y).
The operators P
1
and P
2
do not enter in a symmetric fashion. Therefore it may
happen that we do not have commutativity for one but we do for the other. In
this case we may switch freely between the x- and y-coordinates.
The main consequence of non-commutativity is that the ADI method (4.19)
(4.20) becomes rst order in time instead of the expected second order.
Once commutativity is established we take a closer look at the inhomogeneous
61
term. From (4.18) and (4.21) we have the requirement that
(I +
1
2
kP
n
1h
) + (I
1
2
kP
n+1
1h
) =
1
2
k(
n+1
+
n
) (4.22)
where a discrepancy of order O(k
3
) may be allowed with reference to a similar
term in (4.17). There are three possible choices for and that will satisfy this:
=
1
2
k
n
, =
1
2
k
n+1
, (4.23)
= =
1
4
k(
n+1
+
n
), (4.24)
= =
1
2
k
n+
1
2
. (4.25)
4.6 Practical considerations
The system of equations (4.19) for v
n+
1
2
contains one equation for each interior
grid point in the xy-region. The operator P
1h
refers to neighbouring points in the
x-direction and the resulting coecient matrix becomes tridiagonal and we can
therefore solve the system with a number of SAO proportional to the number of
interior grid points. Similarly the system of equations (4.20) for v
n+1
is eectively
tridiagonal and can be solved at a similar cost irrespective of whether we reorder
the grid points or not.
We shall now take a detailed look at how we set up and solve the two systems
(4.19) and (4.20).
1. To compute the r.h.s. of (4.19) we need the values of v
n
at all interior grid
points and at all interior grid points on the boundaries y = Y
1
and y = Y
2
. If we
have Dirichlet conditions on these boundaries we know these values directly. If
the boundary conditions involve the y-derivative on one or both of these boundary
line segments then we use a (preferably second order) dierence approximation
to the derivative.
Cost: 5LM SAO.
2. To complete the system (4.19) we need information on v
n+
1
2
for interior grid
points on the boundary line segments x = X
1
and x = X
2
. If P
1h
does not depend
on time then rearranging (4.20) and adding to (4.19) gives
2 v
n+
1
2
= (I +
1
2
kP
n
2h
)v
n
+ (I
1
2
kP
n+1
2h
)v
n+1
+ . (4.26)
If we have Dirichlet boundary conditions on x = X
1
and x = X
2
then we have
information on v
n
and v
n+1
here and we can apply P
2h
.
62
3. Solve system (4.19).
Cost: 8LM SAO.
4. To compute the r.h.s. of (4.20) we need the same values of v
n+
1
2
for x = X
1
and x = X
2
as we discussed in 2.
Cost: 5LM SAO.
5. To complete the system (4.20) we need information on v
n+1
for y = Y
1
and
y = Y
2
. As in 1. if we have Dirichlet boundary conditions we know these values
directly. Otherwise we include equations involving dierence approximations to
the derivatives.
6. Solve system (4.20).
Cost: 8LM SAO.
Total cost: 26LM SAO per time step.
In the gure below we have visualized the considerations above. The horisontal or
vertical lines indicate which operator we are concerned with (P
1h
or P
2h
) and the
bullets indicate which function values we are considering. In 1. we are computing
r.h.s. values based on v
n
. In 2. and 3. we are computing v-values and in 4.
r.h.s. values based on these. Finally in 5. and 6. we compute values for v
n+1
.
1. 2. 3. 4. 5. 6.
4.7 Stability of Peaceman-Rachford
We have derived the Peaceman-Rachford method on the basis of ideas from
Crank-Nicolson so we expect unconditional stability, but we have also made a
few minor alterations along the way so it is probably a good idea to do an in-
dependent check. We shall do this for the special case u
t
= b
1
u
xx
+ b
2
u
yy
with
constant coecients b
1
and b
2
. Inserting
v
n
lm
= g
n
e
il
1
e
im
2
and v
n+
1
2
lm
= gv
n
lm
(4.27)
in (4.19) and (4.20) gives
(1 + 2b
1

1
sin
2

1
2
) g = 1 2b
2

2
sin
2

2
2
, (4.28)
(1 + 2b
2

2
sin
2

2
2
)g = (1 2b
1

1
sin
2

1
2
) g, (4.29)
63
such that
g =
1 2b
1

1
sin
2
1
2
1 + 2b
1

1
sin
2
1
2

1 2b
2

2
sin
2
2
2
1 + 2b
2

2
sin
2
2
2
. (4.30)
It is easily seen that 1 g 1 such that we indeed have unconditional stability.
We also note that components with high frequency in both directions
1
,

2
will have g 1 (if b
1

1
and b
2

2
are large) so these components will not
be damped very much and they will not alternate from one time step to the next.
This lack of alternation is a distinct dierence from Crank-Nicolson.
4.8 DYakonov
There are other ways of splitting equation (4.18). DYakonov [15] has suggested
(I
1
2
kP
n+1
1h
) v
n+
1
2
= (I +
1
2
kP
n
1h
)(I +
1
2
kP
n
2h
)v
n
+ , (4.31)
(I
1
2
kP
n+1
2h
)v
n+1
= v
n+
1
2
+ . (4.32)
To check the equivalence we take the solution v
n+1
from (4.32) and apply the
dierence operators from the left side of (4.18):
(I
1
2
kP
n+1
1h
)(I
1
2
kP
n+1
2h
)v
n+1
= (I
1
2
kP
n+1
1h
)( v
n+
1
2
+ ) (4.33)
= (I +
1
2
kP
n
1h
)(I +
1
2
kP
n
2h
)v
n
+ + (I
1
2
kP
n+1
1h
)
In this case we have no problem with commutativity of the operators. As for the
inhomogeneous term an obvious choice is
= 0, =
1
2
k(
n+1
+
n
). (4.34)
When calculating the r.h.s. of (4.31) we must know v
n
on all grid points including
the boundaries and the corners. In addition we have in general a sum of 9 terms
for each equation possibly with dierent coecients so the cost for step 1. is
17LM SAO.
Setting up system (4.31) requires v
n+
1
2
on the interior points on the boundary
segments x = X
1
and x = X
2
. These values can be found by solving (4.32) from
right to left if we have Dirichlet conditions on these boundary segments.
Solving equations (4.31) now costs 8LM SAO.
The r.h.s. of (4.32) is easy and so are the necessary boundary values of v
n+1
on
y = Y
1
and y = Y
2
. The solution of (4.32) then costs another 8LM SAO, and
the total cost of a time step with DYakonov amounts to 33LM SAO making
DYakonov slightly more expensive than Peaceman-Rachford.
64
4.9 Douglas-Rachford
Other ADI methods can be derived from other basic schemes. If we for example
take our inspiration from the implicit method and discretize in time we get
u
n+1
u
n
k
= P
1
u
n+1
+ P
2
u
n+1
+
n+1
+ O(k) (4.35)
or
(I kP
1
kP
2
)u
n+1
= u
n
+ k
n+1
+ O(k
2
) (4.36)
or
(I kP
1
)(I kP
2
)u
n+1
= (I + k
2
P
1
P
2
)u
n
+ k
n+1
+ O(k
2
) (4.37)
where we in the last equation have incorporated k
2
P
1
P
2
(u
n+1
u
n
) in the O(k
2
)-
term. (4.37) is now discretized in the space directions to
(I kP
n+1
1h
)(I kP
n+1
2h
)v
n+1
= (I + k
2
P
n
1h
P
n
2h
)v
n
+ k
n+1
(4.38)
and this formula can be split into the following two which are known as the
Douglas-Rachford method [5]:
(I kP
n+1
1h
) v
n+
1
2
= (I + kP
n
2h
)v
n
+ (4.39)
(I kP
n+1
2h
)v
n+1
= v
n+
1
2
kP
n
2h
v
n
+ (4.40)
To check that v
n+1
in (4.40) is also the solution to (4.38) we take v
n+1
from (4.40)
and apply the dierence operators from (4.38):
(I kP
n+1
1h
)(I kP
n+1
2h
)v
n+1
= (I kP
n+1
1h
) v
n+
1
2
kP
n
2h
v
n
+
= (I + kP
n
2h
)v
n
+ (I kP
n+1
1h
)(kP
n
2h
v
n
)
= (I + k
2
P
n+1
1h
P
n
2h
)v
n
+ + (I kP
n+1
1h
). (4.41)
In order to match the inhomogeneous term in (4.38) a natural choice for and
would be = k
n+1
, = 0.
One could question the relevance of the rst order terms on the r.h.s. of (4.39)
and (4.40). Actually the k
2
-term in (4.37) could easily be incorporated in the
O(k
2
)-term and the result would be a simpler version of formula (4.38) which
could be split into
(I kP
n+1
1h
) v
n+
1
2
= v
n
+ (4.42)
(I kP
n+1
2h
)v
n+1
= v
n+
1
2
+ (4.43)
65
where we again would suggest = k
n+1
, = 0 in order to match a possible
inhomogeneous term.
The practical considerations are dealt with as for Peaceman-Rachford or DYako-
nov. We just summarize the results for the computational work which is similar
to Peaceman-Rachford for (4.39) (4.40) and slightly less ( 18LM SAO) for
the simpler scheme (4.42) (4.43).
4.10 Stability of Douglas-Rachford
Again we expect to inherit the unconditional stability of the implicit method
but we had better check it directly. We again look at the special case u
t
=
b
1
u
xx
+ b
2
u
yy
, we use (4.27) and for simplicity we introduce
x
1
= 4b
1

1
sin
2

1
2
, x
2
= 4b
2

2
sin
2

2
2
. (4.44)
From (4.39) (4.40) we get
(1 + x
1
) g = 1 x
2
, (1 + x
2
)g = g + x
2
, (4.45)
such that
g =
1 x
2
+ (1 + x
1
)x
2
(1 + x
1
)(1 + x
2
)
=
1 + x
1
x
2
1 + x
1
+ x
2
+ x
1
x
2
. (4.46)
Since x
1
> 0 and x
2
> 0 we have 0 < g < 1 just like we hoped. For the simpler
scheme (4.42) (4.43) the result is
g =
1
(1 + x
1
)(1 + x
2
)
(4.47)
which also ensures 0 < g < 1. We mention in passing that the original implicit
method would have given
g =
1
1 + x
1
+ x
2
(4.48)
For large values of x
1
and x
2
, i.e. large values of b
i

i
and
i
formula (4.46)
gives g 1 corresponding to weak damping whereas (4.47) gives g 0 corre-
sponding to strong damping and even stronger than with the implicit scheme
(4.48). This may speak in favour of the simpler scheme (4.42) (4.43).
66
4.11 The local error
In order to check the local error we use the symbols of the dierential and die-
rence operators (cf. [13, page 56]). We consider the simple equation
u
t
b
1
u
xx
b
2
u
yy
= (4.49)
with constant coecients and use the test functions
v
n
lm
= e
snk
e
ilh
1

1
e
imh
2

2
= e
st
e
ix
1
e
iy
2
, v = gv
n
lm
. (4.50)
The symbol for the dierential operator in (4.49) is
p(s,
1
,
2
) = s + b
1

2
1
+ b
2

2
2
. (4.51)
We rst look at the simple scheme (4.42) (4.43) where we get
(1 + 4b
1
k
h
2
1
sin
2
h
1

1
2
) g = 1 + k
n+1
, (4.52)
(1 + 4b
2
k
h
2
2
sin
2
h
2

2
2
)e
sk
= g (4.53)
or
(1 + 4b
1
k
h
2
1
sin
2
h
1

1
2
)(1 + 4b
2
k
h
2
2
sin
2
h
2

2
2
)e
sk
1 = k
n+1
. (4.54)
Taylor expanding the l.h.s. gives
(1 + kb
1

2
1
+ O(h
2
1
))(1 + kb
2

2
2
+ O(h
2
2
))(1 + sk +
1
2
s
2
k
2
+ O(k
3
)) 1. (4.55)
Before we begin checking orders we should remember that we have multiplied by
k in order to get formula (4.36) and the following formulae. Therefore we must
divide (4.55) by k in order to get back to the standard form and we now get
p
kh
(s,
1
,
2
) = s + b
1

2
1
+ b
2

2
2
+ k(b
1
s
2
1
+ b
2
s
2
2
+ b
1
b
2

2
1

2
2
+
1
2
s
2
)
+ O(k
2
+ h
2
1
+ h
2
2
). (4.56)
Since the inhomogeneous term is evaluated at t = (n + 1)k we have
r
kh
(s,
1
,
2
) = e
sk
= 1 + sk + O(k
2
). (4.57)
We now combine (4.51), (4.56) and (4.57) in
p
kh
r
kh
p = k(b
1
s
2
1
+ b
2
s
2
2
+ b
1
b
2

2
1

2
2
+
1
2
s
2
s
2
b
1
s
2
1
b
2
s
2
2
)
+ O(k
2
+ h
2
1
+ h
2
2
)
= k(b
1
b
2

2
1

2
2

1
2
s
2
) + O(k
2
+ h
2
1
+ h
2
2
). (4.58)
67
Formula (4.58) shows that the simple ADI scheme (4.42) (4.43) is indeed rst
order in time and second order in the space variables as we would expect for a
scheme derived from the implicit method. We note that w.r.t. the order of the
local error it is not important whether we compute the inhomogeneous term at
t = (n+1)k or at t = nk. It might have an eect on the size of the error, though.
For the traditional Douglas-Rachford scheme we have instead of (4.54)
(1 + 4b
1
k
h
2
1
sin
2
h
1

1
2
)(1 + 4b
2
k
h
2
2
sin
2
h
2

2
2
)e
sk
(1 4b
2
k
h
2
2
sin
2
h
2

2
2
)
(1 + 4b
1
k
h
2
1
sin
2
h
1

1
2
) 4b
2
k
h
2
2
sin
2
h
2

2
2
= k
n+1
. (4.59)
We Taylor expand the left hand side:
(1 + kb
1

2
1
+ O(h
2
1
))(1 + kb
2

2
2
+ O(h
2
2
))(1 + sk +
1
2
s
2
k
2
+ O(k
3
))
(1 kb
2

2
2
+ O(h
2
2
)) (1 + kb
1

2
1
+ O(h
2
1
))kb
2

2
2
+ O(h
2
2
) (4.60)
and nd for the dierence operator
p
kh
(s,
1
,
2
) = s + b
1

2
1
+ b
2

2
2
+ k(b
1
s
2
1
+ b
2
s
2
2
+ b
1
b
2

2
1

2
2
+
1
2
s
2
b
1
b
2

2
1

2
2
) + O(k
2
+ h
2
1
+ h
2
2
) (4.61)
and nally
p
kh
r
kh
p = k(b
1
s
2
1
+ b
2
s
2
2
+
1
2
s
2
s
2
b
1
s
2
1
b
2
s
2
2
) + O(k
2
+ h
2
1
+ h
2
2
)
=
1
2
ks
2
+ O(k
2
+ h
2
1
+ h
2
2
). (4.62)
This result looks more elegant than (4.58) but whether the error becomes smaller
is quite a dierent matter.
For Peaceman-Rachford and DYakonov the formula corresponding to (4.55) and
(4.60) is
(1 +
1
2
kb
1

2
1
+ O(h
2
1
))(1 +
1
2
kb
2

2
2
+ O(h
2
2
))(1 + sk +
1
2
s
2
k
2
+ O(k
3
))
(1
1
2
kb
1

2
1
+ O(h
2
1
))(1
1
2
kb
2

2
2
+ O(h
2
2
)) (4.63)
such that
p
kh
(s,
1
,
2
) = s +
1
2
b
1

2
1
+
1
2
b
2

2
2
+
1
2
b
1

2
1
+
1
2
b
2

2
2
(4.64)
+ k(
1
2
b
1
s
2
1
+
1
2
b
2
s
2
2
+
1
4
b
1
b
2

2
1

2
2
+
1
2
s
2

1
4
b
1
b
2

2
1

2
2
)
+ O(k
2
+ h
2
1
+ h
2
2
).
68
The exact expression for r
kh
(s,
1
,
2
) depends on which one of the choices (4.23)
(4.25) we select, but up to O(k
2
), and O(h
2
1
) in case of (4.23), we get
r
kh
(s,
1
,
2
) = 1 +
1
2
sk + O(k
2
+ h
2
1
). (4.65)
We now combine (4.51), (4.64) and (4.65):
p
kh
r
kh
p = s + b
1

2
1
+ b
2

2
2
+
1
2
k(b
1
s
2
1
+ b
2
s
2
2
+ s
2
)
(1 +
1
2
sk)(s + b
1

2
1
+ b
2

2
2
) + O(k
2
+ h
2
1
+ h
2
2
) (4.66)
= O(k
2
+ h
2
1
+ h
2
2
)
showing that Peaceman-Rachford is indeed second order accurate at least for the
simple equation (4.49) with constant coecients. Extending the result to lower
order terms presents no problem but if the coecients are allowed to vary with
space and time we have a more complicated picture as mentioned in the discussion
on page 61.
4.12 The global error
In order to study the global error we introduce a set of auxiliary functions which
may depend on (t, x, y) but not on (k, h
1
, h
2
) and we assume that the numerical
solution can be written as
v = u h
1
c
1
h
2
c
2
kd h
1
ke
1
h
2
ke
2
h
2
1
f
1
h
2
2
f
2
k
2
g (4.67)
We need a similar assumption for the intermediate values
v = u h
1
c
1
h
2
c
2
k

d h
1
k e
1
h
2
k e
2
h
2
1

f
1
h
2
2

f
2
k
2
g (4.68)
and we shall seek information on these auxiliary functions. We shall assume
Dirichlet boundary conditions and therefore have homogeneous side conditions
for all the auxiliary functions. Beginning with the simple version of Douglas-
Rachford (4.42) (4.43) we have
(I kP
n+1
1h
) v = v
n
+ k
n+1
(4.69)
(I kP
n+1
2h
)v
n+1
= v (4.70)
where P
1h
and P
2h
are discretized versions of P
1
and P
2
from (4.6) (4.7). Using
Taylor expansion we have
P
2h
u = (b
2

2
y
a
2

y
+ (1 ))u
= b
2
u
yy
a
2
u
y
+ (1 )u +
1
12
b
2
h
2
2
u
4y

1
6
a
2
h
2
2
u
yyy
+ O(h
4
2
)
= P
2
u +
1
12
b
2
h
2
2
u
4y

1
6
a
2
h
2
2
u
yyy
+ O(h
4
2
) (4.71)
69
and similarly for the auxiliary functions and for P
1
.
Inserting (4.67) on the l.h.s. of (4.70) and applying (4.71) we have with t =
(n + 1)k as expansion point
(I kP
2h
)v
n+1
= (I kP
2h
)(u h
1
c
1
k
2
g) + O( )
= (I kP
2
)(u h
1
c
1
k
2
g) (4.72)

1
12
b
2
kh
2
2
u
4y
+
1
6
a
2
kh
2
2
u
yyy
+ O( )
O( ) shall here and in the following indicate third order terms in k, h
1
and h
2
and will therefore include the two terms with u
4y
and u
yyy
.
Inserting (4.68) on the r.h.s. of (4.70) and equating terms we get
u = u, c
1
= c
1
, c
2
= c
2
,

f
1
= f
1
,

f
2
= f
2
(4.73)
together with

d = d + P
2
u, (4.74)
e
1
= e
1
P
2
c
1
, (4.75)
e
2
= e
2
P
2
c
2
, (4.76)
g = g P
2
d. (4.77)
Next we insert (4.68) on the l.h.s. of (4.69)
(I kP
1h
) v = (I kP
1h
)( u h
1
c
1
k
2
g) + O( )
= (I kP
1
)( u h
1
c
1
k
2
g) (4.78)

1
12
b
1
kh
2
1
u
4x
+
1
6
a
1
kh
2
1
u
xxx
+ O( )
For the r.h.s. of (4.69) we must remember that the expansion point is t = (n+1)k:
v
n
+ k
n+1
= (u h
1
c
1
k
2
g) + k (4.79)
ku
t
+ h
1
kc
1t
+ h
2
kc
2t
+ k
2
d
t
+
1
2
k
2
u
tt
+ O( )
Equating terms in (4.78) and (4.79) conrms (4.73) and adds

d + P
1
u = d + u
t
, (4.80)
e
1
P
1
c
1
= e
1
c
1t
, (4.81)
e
2
P
1
c
2
= e
2
c
2t
, (4.82)
g P
1

d = g d
t

1
2
u
tt
. (4.83)
Comparing (4.74) and (4.80) and remembering that u = u we have
d + P
2
u = d + u
t
P
1
u
70
or
u
t
P
1
u P
2
u = (4.84)
conrming the consistency of our assumptions.
(4.75) and (4.81) together with (4.73) gives
e
1
P
2
c
1
= e
1
c
1t
+ P
1
c
or
c
1t
P
1
c
1
P
2
c
1
= 0. (4.85)
From (4.76) and (4.82) we get a similar equation for c
2
and since the side condi-
tions are also homogeneous we may conclude that c
1
c
2
0.
From (4.77) and (4.83) we get
g P
2
d = g d
t
+ P
1

d
1
2
u
tt
= g d
t
+ P
1
d
1
2
u
tt
+ P
1
P
2
u
or
d
t
P
1
d P
2
d =
1
2
u
tt
+ P
1
P
2
u (4.86)
showing that d(t, x, y) is not identically 0 and that the error therefore is rst
order in k. If we continue in this way we shall see that f
1
and f
2
are also dierent
from 0 and that the error therefore is O(k + h
2
1
+ h
2
2
) as we would expect.
We might note here that we have multiplied by k in order to get to equations
(4.69) (4.70) and this is the reason why we only get information about the
auxiliary functions corresponding to the rst order terms in (4.67) even though
we compare terms up to and including second order.
For the original Douglas-Rachford scheme (4.39) (4.40) the equations are
(I kP
1h
) v = (I + kP
2h
)v
n
+ k
n+1
, (4.87)
(I kP
2h
)v
n+1
= v kP
2h
v
n
. (4.88)
The extra term which must be added in (4.87) and subtracted in (4.88) is
kP
2h
v
n
= kP
2h
(u h
1
c
1
h
2
c
2
kd) + O( )
= kP
2
(u h
1
c
1
h
2
c
2
kd) + O( )
= kP
2
(u h
1
c
1
h
2
c
2
kd) k
2
P
2
u
t
+ O( ) (4.89)
where the last term is due to the expansion point being at t = (n + 1)k.
Equating terms in (4.88) gives the equalities (4.73) together with

d = d, e
1
= e
1
, e
2
= e
2
, g = g + P
2
u
t
(4.90)
71
and from (4.87) we get

d + P
1
u = d + u
t
P
2
u , (4.91)
e
1
P
1
c
1
= e
1
c
1t
+ P
2
c
1
, (4.92)
e
2
P
1
c
2
= e
2
c
2t
+ P
2
c
2
, (4.93)
g P
1

d = g d
t
+ P
2
d
1
2
u
tt
+ P
2
u
t
. (4.94)
The rst three imply (4.84), (4.85) and its analogue such that we again can
conclude that c
1
c
2
0.
From (4.90) and (4.94) we nally deduce that
d
t
P
1
d P
2
d =
1
2
u
tt
(4.95)
in accordance with our expectations that Douglas-Rachford is rst order in time.
We note that v now is a rather good approximation to v
n+1
but whether v
n+1
now is a better or worse approximation to u is impossible to decide.
A similar analysis can be performed for Peaceman-Rachford and dYakonov to
show that these methods are indeed second order in all three step sizes.
In practice we can check the orders and estimate the various contributions to the
error using the methods from chapter 3 taking each step size separately.
72
Chapter 5
Equations with mixed derivative
terms
We now return to the general equation (4.1). As a dierence approximation to
u
xy
we shall use

2
xy
v
lm
=
x
(
y
v
lm
) =
x
(
v
l,m+1
v
l,m1
2h
2
)
=
1
4h
1
h
2
(v
l+1,m+1
v
l1,m+1
v
l+1,m1
+ v
l1,m1
) (5.1)
=
y
(
x
v
lm
) =
2
yx
v
lm
.
There is no obvious way of splitting the mixed derivative or dierence operator
among the two operators P
1
and P
2
in (4.5) so we shall instead treat the mixed
derivative term in a way analogous to what we did for the inhomogeneous term.
The rst scheme we consider is the simple Douglas-Rachford scheme (4.42)
(4.43) where and now should be chosen to take care of the mixed derivative
term (in addition to a possible inhomogeneous term which we shall disregard
here).
Following the analysis on page 66 we shall select and such that
+ (I kP
n+1
1h
) = 2kb
12

2
xy
v
n+1
+ O(k
2
). (5.2)
There are three straightforward choices for and which will satisfy (5.2):
= = kb
12

2
xy
v
n
, (5.3)
= kb
12

2
xy
v
n
, = kb
12

2
xy
v, (5.4)
= 2kb
12

2
xy
v
n
, = 0. (5.5)
73
For the traditional Douglas-Rachford scheme the condition is the same so the
same three possibilities for and apply.
For the Peaceman-Rachford scheme (4.19) (4.20) we would aim at
(I +
1
2
kP
n
1h
) + (I
1
2
kP
n+1
1h
) = kb
12

2
xy
(v
n+1
+ v
n
) + O(k
3
). (5.6)
This is a bit more dicult to achieve. Two obvious suggestions are (5.3) and
(5.4) but they are not quite accurate enough and the resulting method becomes
only rst order in time.
Formula (5.3) would be good enough if we could replace v
n
by an intermediate
value v
n+
1
2
. An approximation to a value at time t = (n +
1
2
)k can be obtained
by extrapolation from values at t = (n 1)k and t = nk:
v
n+
1
2
= v
n
+
1
2
(v
n
v
n1
) (5.7)
and a good suggestion for and is now
= = kb
12

2
xy
v
n+
1
2
. (5.8)
In the same manner (5.4) would be good if we could replace v by v
n+1
. An
extrapolated value would here be v
n
+ (v
n
v
n1
) such that we have
= kb
12

2
xy
v
n
, = kb
12

2
xy
(2v
n
v
n1
) (5.9)
and both (5.8) and (5.9) would lead to schemes which are second order in time.
5.1 Practical considerations
We have now about a dozen dierent combinations but they are not all equally
good. The rst point to consider is how to get boundary values at x = X
1
and
x = X
2
for v. For the Simple Douglas-Rachford scheme (4.42) (4.43) we solve
(4.43) to get
v = (I kP
n+1
2h
)v
n+1
(5.10)
If we choose formula (5.5) then = 0 and (5.10) can be used as it stands.
If we choose formula (5.3) then the -term involves
2
xy
v
n
which cannot be calcu-
lated at the boundary points. There are two ways around this.
1. Take the dierence at the nearest neighbour point, e.g.

2
xy
v
n
0m
:=
2
xy
v
n
1m
74
This will introduce an error of the rst order in h
1
and this is not ideal.
2. Use a linear extrapolation in the x-direction, e.g.

2
xy
v
n
0m
:= 2
2
xy
v
n
1m

2
xy
v
n
2m
Formula (5.4) presents even bigger problems since we cannot calculate
2
xy
v at
the neighbouring points to any of the boundaries, so this option cannot be rec-
ommended.
For the Traditional Douglas-Rachford scheme (4.39) (4.40) the formula for v at
the x-boundaries is
v = (I kP
n+1
2h
)v
n+1
+ kP
n
2h
v
n
(5.11)
and the same considerations apply.
For the Peaceman-Rachford scheme (4.19)-(4.20) the formula for v is (4.26) and
this clearly favours the case where = so that (5.3) and (5.8) are ideal choices.
(5.5) and (5.9) can be tackled using the suggestions above, whereas (5.4) still
presents grave problems.
5.2 Stability with mixed derivative
We shall study stability requirements in relation to the dierential equation
u
t
= b
1
u
xx
+ 2b
12
u
xy
+ b
2
u
yy
(5.12)
with the discretization
v
n+1
lm
v
n
lm
k
= (1 )(b
1

2
x
+ 2b
12

2
xy
+ b
2

2
y
)v
n
lm
(5.13)
+(b
1

2
x
+ 2b
12

2
xy
+ b
2

2
y
)v
n+1
lm
where = 0, 0.5, and 1 corresponds to the explicit, the Crank-Nicolson, and the
implicit method, respectively. We put the discretized solution on the form
v
n
lm
= e
snk
e
i
1
lh
1
e
i
2
mh
2
= g
n
e
il
1
e
im
2
(5.14)
and use the abbreviations
kb
1

2
x
v
n
lm
= 4b
1

1
sin
2

1
2
= 4x
1
, (5.15)
kb
2

2
x
v
n
lm
= 4b
2

2
sin
2

2
2
= 4x
2
, (5.16)
2kb
12

2
x
v
n
lm
= 2b
12

12
sin
1
sin
2
= 2x
12
(5.17)
75
introducing the quantities x
1
, x
2
, and x
12
.
Remember that the condition for (5.12) to be parabolic is
b
2
12
< b
1
b
2
(5.18)
together with b
1
> 0 and b
2
> 0. It follows that
b
2
12

2
12
= b
2
12
k
2
h
2
1
h
2
2
< b
1
b
2
k
2
h
2
1
h
2
2
= b
1

1
b
2

2
(5.19)
or
[b
12
[
12
<

b
1

1
b
2

2
. (5.20)
We also have
0 (

b
1

b
2

2
)
2
= b
1

1
+ b
2

2
2

b
1

1
b
2

2
. (5.21)
Combining (5.20) and (5.21) we get
2[b
12
[
12
< 2

b
1

1
b
2

2
b
1

1
+ b
2

2
. (5.22)
Similarly we have
0 (

b
1
h
1
sin

1
2

b
2
h
2
sin

2
2
)
2
(5.23)
=
b
1
h
2
1
sin
2

1
2
+
b
2
h
2
2
sin
2

2
2
2

b
1
b
2
h
1
h
2
sin

1
2
sin

2
2
.
Use (5.18), multiply by k and rearrange
2[b
12
[
12
[ sin

1
2
sin

2
2
[ < b
1

1
sin
2

1
2
+ b
2

2
sin
2

2
2
. (5.24)
Now multiply by 4[ cos

1
2
cos

2
2
[ 4 and get
2[b
12
[
12
[ sin
1
sin
2
[ < 4b
1

1
sin
2

1
2
+ 4b
2

2
sin
2

2
2
(5.25)
or
2[x
12
[ < 4x
1
+ 4x
2
. (5.26)
For the explicit scheme the growth factor becomes
g = 1 4x
1
4x
2
2x
12
76
and for stability we require 1 g 1. g 1 is equivalent to (5.26) and is
therefore satised for a parabolic equation. g 1 is equivalent to
2x
1
+ 2x
2
+ x
12
1.
A close inspection reveals that the maximum value of the l.h.s. is attained for

1
=
2
= such that we get the condition
2b
1

1
+ 2b
2

2
1 (5.27)
just as without the mixed term.
For the implicit scheme the growth factor is
g =
1
1 + 4x
1
+ 4x
2
+ 2x
12
(5.28)
and because of (5.26) we have always 0 g 1.
For Crank-Nicolson we have
g =
1 2x
1
2x
2
x
12
1 + 2x
1
+ 2x
2
+ x
12
(5.29)
and because of (5.26) we have always 1 g 1.
Altogether the mixed derivative term does not alter the basic stability properties
of the explicit, Crank-Nicolson, or the implicit method. But in practice we do
not wish to use any of these. We would rather prefer an ADI-method.
5.3 Stability of ADI-methods
We rst look at the Simple Douglas-Rachford scheme (4.42)-(4.43) together with
= as given by (5.3). The equations for the growth factor are
(1 + 4x
1
) g = 1 x
12
, (5.30)
(1 + 4x
2
)g = g x
12
, (5.31)
g =
1 x
12
(1 + 4x
1
)(1 + 4x
2
)

x
12
1 + 4x
2
=
1 2x
12
4x
1
x
12
(1 + 4x
1
)(1 + 4x
2
)
.(5.32)
A necessary condition for stability is g 1 or
x
12
(1 + 2x
1
) 2x
1
+ 2x
2
+ 8x
1
x
2
or
x
12
2x
2
+ 2x
1
1 + 2x
2
1 + 2x
1
.
77
Comparing with (5.26) we suspect that we may be in trouble when x
2
< x
1
and
is large. Actually the inequality is violated when b
1
= b
2
= 1, b
12
= 0.9, h
1
= h
2
,

1
= /2, > 10, and
2
is small and negative.
If we combine the Simple Douglas-Rachford scheme with (5.4) the equations for
the growth factor are
(1 + 4x
1
) g = 1 x
12
, (5.33)
(1 + 4x
2
)g = g(1 x
12
), (5.34)
g =
1 x
12
1 + 4x
2

1 x
12
1 + 4x
1
. (5.35)
We notice immediately that g 0 and the condition g 1 is equivalent to
2[x
12
[ + x
2
12
4x
1
+ 4x
2
+ 16x
1
x
2
which follows from (5.26). We conclude that this combination is unconditionally
stable which makes it interesting despite the problems with boundary values for
v.
If we combine the Simple Douglas-Rachford scheme with (5.5) the equations for
the growth factor are
(1 + 4x
1
) g = 1 2x
12
, (5.36)
(1 + 4x
2
)g = g, (5.37)
g =
1 2x
12
(1 + 4x
1
)(1 + 4x
2
)
. (5.38)
From (5.26) it follows readily that 1 g 1 and that we therefore have
unconditional stability which makes this scheme very interesting indeed.
If we combine the Traditional Douglas-Rachford scheme (4.39) (4.40) with (5.3)
the equations for the growth factor are
(1 + 4x
1
) g = 1 4x
2
x
12
, (5.39)
(1 + 4x
2
)g = g + 4x
2
x
12
, (5.40)
g =
1 4x
2
x
12
(1 + 4x
1
)(1 + 4x
2
)
+
4x
2
x
12
1 + 4x
2
(5.41)
=
1 2x
12
4x
1
x
12
+ 16x
1
x
2
(1 + 4x
1
)(1 + 4x
2
)
.
Comparing with Simple Douglas-Rachford it is apparent that we have even
greater problems with the stability condition g 1.
If we combine the Traditional Douglas-Rachford scheme with (5.4) the equations
for the growth factor are
(1 + 4x
1
) g = 1 4x
2
x
12
, (5.42)
78
(1 + 4x
2
)g = g(1 x
12
) + 4x
2
, (5.43)
g =
1 x
12
1 + 4x
2

1 x
12
4x
2
1 + 4x
1
+
4x
2
1 + 4x
2
(5.44)
=
(1 x
12
)
2
+ 4x
2
x
12
+ 16x
1
x
2
(1 + 4x
1
)(1 + 4x
2
)
.
If we supplement our earlier counterexample with
2
= /2, and > 10 then we
have g > 1 violating the stability requirement.
If we combine the Traditional Douglas-Rachford scheme with (5.5) the equations
for the growth factor are
(1 + 4x
1
) g = 1 4x
2
2x
12
, (5.45)
(1 + 4x
2
)g = g + 4x
2
, (5.46)
g =
1 4x
2
2x
12
(1 + 4x
1
)(1 + 4x
2
)
+
4x
2
1 + 4x
2
(5.47)
=
1 + 16x
1
x
2
2x
12
(1 + 4x
1
)(1 + 4x
2
)
.
From (5.26) it follows readily that 1 g 1 and once again we have a useful
combination.
If we combine the Peaceman-Rachford scheme (4.19)-(4.20) with (5.3) the equa-
tions for the growth factor are
(1 + 2x
1
) g = 1 2x
2
x
12
, (5.48)
(1 + 2x
2
)g = g(1 2x
1
) x
12
, (5.49)
g =
(1 2x
1
)(1 2x
2
x
12
)
(1 + 2x
1
)(1 + 2x
2
)

x
12
1 + 2x
2
(5.50)
=
(1 2x
1
)(1 2x
2
) 2x
12
(1 + 2x
1
)(1 + 2x
2
)
.
g 1 follows directly from (5.26), and g 1 is equivalent to
1 + 4x
1
x
2
2x
12
1 4x
1
x
2
or
1 + 4x
1
x
2
x
12
0.
If
1
and
2
have dierent signs or if b
12
< 0 then x
12
< 0 and we are done. We
can therefore assume 0 <
1
,
2
< and b
12
> 0.
0 (1 2

b
1
b
2
k
h
1
h
2
sin

1
2
sin

2
2
)
2
= 1 + 4b
1

1
b
2

2
sin
2

1
2
sin
2

1
2
4

b
1
b
2

12
sin

1
2
sin

2
2
1 + 4x
1
x
2
4b
12

12
sin

1
2
sin

2
2
cos

1
2
cos

2
2
= 1 + 4x
1
x
2
x
12
79
thus proving stability.
If we combine the Peaceman-Rachford scheme (4.19)-(4.20) with (5.8) the equa-
tions for the growth factor are
(1 + 2x
1
) g = 1 2x
2
x
12
(
3
2

1
2
g
1
), (5.51)
(1 + 2x
2
)g = g(1 2x
1
) x
12
(
3
2

1
2
g
1
), (5.52)
(1 + 2x
1
)(1 + 2x
2
)g = (1 2x
1
)(1 2x
2
x
12
(
3
2

1
2
g
1
))
x
12
(
3
2

1
2
g
1
)(1 + 2x
1
)
= (1 2x
1
)(1 2x
2
) x
12
(3 g
1
). (5.53)
We thus have a quadratic equation for g:
(1 + 2x
1
)(1 + 2x
2
)g
2
((1 2x
1
)(1 2x
2
) 3x
12
)g x
12
= 0.
A close examination of this equation will reveal that the roots are always less
than 1 in magnitude such that also this combination is unconditionally stable.
5.4 Summary
We conclude this section with a summary of results. Based on the practical
considerations and the stability properties we can recommend (5.5) together with
either the simple or the traditional Douglas-Rachford method. The Peaceman-
Rachford method plays well together with (5.3) although the result will only be
rst order in t. For a second order method we recommend (5.8).
80
Chapter 6
Two-factor models two
examples
6.1 The Brennan-Schwartz model
A model for the determination of prices for bonds suggested by Brennan og
Schwartz [3] can be formulated in the following way:
u
t
= (
r

r
)u
r
+ (

2
l
l
+ (l r)l)u
l
ru +
1
2

2
r
u
rr
+
r

l
u
rl
+
1
2

2
l
u
ll
where
r is the short interest,
l is the long interest,
u is the price of the bond,

r
= a
1
+ b
1
(l r),

r
= r
r
, og

l
= l
l
.
The coecients have been estimated to
a
1
= 0.00622,
b
1
= 0.2676,

r
= 0.10281,

l
= 0.02001,
= 0.0022,
= 0.9.
We transform the r-interval [0, ) to (0, 1] using
x =
1
1 +
r
r
, r =
1 x

r
x
,
81
and similarly for l:
y =
1
1 +
l
l
, l =
1 y

l
y
,
where the transformation coecients
r
and
l
are chosen properly, often between
10 and 13. An interest interval from 10% to 1% will with = 10 be transformed
into [0.5, 0.91] and with = 13 into [0.43, 0.88].
We now have
u
r
=
u
x
dx
dr
=

r
(1 +
r
r)
2
u
x
=
r
x
2
u
x
u
l
=
l
y
2
u
y

2
u
r
2
=

x
(
r
x
2
u
x
)
dx
dr
= 2
2
r
x
3
u
x
+
2
r
x
4
u
xx

2
u
l
2
= 2
2
l
y
3
u
y
+
2
l
y
4
u
yy

2
u
rl
=

x
(
l
y
2
u
y
)
dx
dr
=
r

l
x
2
y
2
u
xy
and the dierential equation becomes
u
t
= au
xx
+ 2bu
xy
+ cu
yy
+ du
x
+ eu
y
+ fu,
where
a = a(x) =
1
2

2
r

2
r
x
4
=
1
2

2
r
(1 x)
2
x
2
b = b(x, y) =
1
2

l
x
2
y
2
=
1
2

l
(1 x)(1 y)xy
c = c(y) =
1
2

2
l

2
l
y
4
=
1
2

2
l
(1 y)
2
y
2
d = d(x, y) =
2
r

2
r
x
3
(
r

r
)
r
x
2
= x((1 x)(
2
r
(1 x) + b
1
+
r
) a
1

r
x b
1

l
x
y
(1 y))
e = e(x, y) = (
2
l
+ l r)l
l
y
2
+
2
l
(1 y)
2
y
= y(1 y)(
2
l
y
1 y

l
y
+
1 x

r
x
)
f = f(x) = r =
1 x

r
x
We note in passing that the dierential equation is parabolic since
ac b
2
=
1
4

2
r

2
l
(1 x)
2
(1 y)
2
x
2
y
2
(1
2
) > 0.
82
The initial condition at t = 0 is the value of the bond at expiry, i.e.
u(x, y, 0) = 1
The boundary condition at x = 0 is found by multiplying the dierential equation
by x and then let x 0, which gives
0 = (1 y)
y

r
u
y

r
u
or
(1 y)y
du
dy
= u
or
1
u
du =
1
(1 y)y
dy = (
1
1 y
+
1
y
)dy.
Integration gives
ln u ln u
0
= ln y ln y
0
ln(1 y) + ln(1 y
0
)
or
u = u
0
y
1 y
1 y
0
y
0
.
We wish u to be bounded, also when y 1, and therefore we must have u
0
= 0,
and thus
u(0, y, t) = 0.
This is a so-called natural boundary condition, i.e. a condition which follows
naturally from the equation. It also ts well to our intuitive understanding that
if r then the back value of the bond will not be particularly great.
The boundary condition at y = 0 is found in a similar way by multiplying with
y and then let y 0. We then nd
0 = b
1

l
x
2
u
x
i.e. u must be constant, and since u(0, 0, t) = 0 according to the rst boundary
condition we must have
u(x, 0, t) = 0;
but this also conforms with our intuition about the case l .
83
Boundary conditions for y = 1 are found by inserting y = 1 in the dierential
equation. We then have
b(x, 1) = c(1) = e(x, 1) = 0
and
u
t
=
1
2

2
r
(1 x)
2
x
2
u
xx
+ ((1 x)(
2
r
(1 x) + b
1
+
r
) a
1

r
x)xu
x

1 x

r
x
u.
This is a parabolic equation in one space dimension which can be solved be-
forehand, or perhaps rather concurrently with the solution in the interior. This
equation has the initial condition u(x, 1, 0) = 1 from the general initial condi-
tion and the boundary conditions u(0, 1, t) = 0 from the boundary condition for
x = 0, and u(1, 1, t) = 1 from the argument that when both the interests are 0,
then the bond will retain its value.
In order to nd boundary conditions for x = 1 we likewise put x = 1 in the
dierential equation and nd
a(1) = b(1, y) = f(1) = 0
and
u
t
=
1
2

2
l
(1 y)
2
y
2
u
yy
(a
1

r
+ b
1

l
1 y
y
)u
x
(1 y)(
2
l
y
2
+
1 y

l
)u
y
This is in principle a parabolic dierential equation in t and y with an extra term
involving u
x
and therefore referring to u-values in the interior. This equation
cannot be solved beforehand but must be solved concurrently with the solution
in the interior.
The initial condition is as before u(1, y, 0) = 1, and the boundary conditions are
u(1, 0, t) = 0 and u(1, 1, t) = 1.
6.2 Practicalities
We should like to implement an ADI method for the solution of this problem.
One small detail in the practical considerations is that we need values for v on
two of the boundaries of the region. Because of the diculties mentioned above
getting boundary values at x = 1 it seems convenient to reverse the order of the
operators P
1
and P
2
from the usual order in chapter 4. Thus we wish to solve for
v in the y-direction and therefore we shall need information on v at y = 0 and
y = 1 where information is more readily available.
84
We select a step size, h
1
, in the x-direction, or rather we select an integer, L,
and set h
1
= 1/L. Similarly the step size in the y-direction is given through the
integer, M, by h
2
= 1/M, and the time step by k = 1/N. Including boundary
nodes we thus have (L + 1)(M + 1)N function values to compute. With small
step sizes we might not have storage space for all these numbers at the same
time, but then again we dont need to. At any particular time step we only need
information corresponding to two consecutive time levels (or three for Peaceman-
Rachford) and we can therefore make do with two (or three) (L + 1)(M + 1)
arrays. If solution values are needed at intermediate times these can be recorded
along the way. Such values are usually only required at coarser intervals,

h
1
> h
1
,

h
2
> h
2
, and

k > k, and therefore require smaller arrays.
Because of the discontinuity between the initial values and the boundary values
at x = 0 and y = 0 it may be convenient to use or at least begin with the
Douglas-Rachford method. Using the simple version (4.42 4.43) and (5.5) for
the mixed derivative term the equations become
(I kP
2h
) v = v
n
+ 2kb(x, y)
2
xy
v
n
(6.1)
(I kP
1h
)v
n+1
= v (6.2)
The time step from nk to (n+1)k is now divided into a number of subtasks num-
bered like in section 4.6 except that we have added one subtask at the beginning.
0. Advance the solution on y = 1 using
u
t
= a(x)u
xx
+ d(x, 1)u
x
+ f(x)u (6.3)
discretized using the implicit method
v
n+1
l,M
v
n
l,M
= a
1
(v
n+1
l+1,M
2v
n+1
l,M
+ v
n+1
l1,M
) + 2d
1
(v
n+1
l+1,M
v
n+1
l1,M
) + fkv
n+1
l,M
or
(a
1
+
1
2
d
1
)v
n+1
l1,M
+ (1 + 2a
1
fk)v
n+1
l,M
(a
1
+
1
2
d
1
)v
n+1
l+1,M
= v
n
l,M
. (6.4)
This tridiagonal system of equations supplemented with the boundary conditions
v
n+1
0,M
= 0 and v
n+1
L,M
= 1 can now be solved using Gaussian elimination.
1. The r.h.s. of (6.1) requires v
n
at all interior points which is no problem
and
2
xy
v
n
at the same points which means v
n
at all points including those on
the boundary. The only problem arises at the rst time step because of the
discontinuity between the initial condition and the boundary conditions at x = 0
and y = 0. We recommend to use the initial value throughout and thereby avoid
divided dierences of the order 1/h
1
h
2
. In the present case the b-coecient is
85
rather small because of the small numerical value of so the eect of a dierent
choice is minimal.
2. We next compute v for y = 0 and y = 1 using (6.2) and get v
l,0
= 0 and
v
l,M
= (I kP
1h
)v
n+1
l,M
. (6.5)
Comparing (6.5) with (6.3) and (6.4) we note that there might be an advantage
in including the fu-term in the P
1
-operator because then (6.5) takes the simpler
form of
v
l,M
= v
n
l,M
. (6.6)
In the general case with fu in P
1
and (1 )fu in P
2
the formula for v becomes
v
l,M
= v
n
l,M
+ (1 )fkv
n+1
l,M
(6.7)
3. The system of equations (6.1) can now be solved for v at all interior points.
The system consists of L 1 tridiagonal systems of M 1 unknowns each and
they can be solved independently of each other.
4. The r.h.s. of system (6.2) consists of v at all interior points which we have
just computed in 3.
5. On each horizontal line (6.2) gives rise to one equation for each internal node,
i.e. a total of L 1 equations in L + 1 unknowns, the extra unknowns being the
values of v
n+1
at x = 0 and x = 1. The former is equal to 0, and for the latter
we must resort to the boundary equation
u
t
= c(y)u
yy
+ e(1, y)u
y
+ d(1, y)u
x
(6.8)
An implicit discretization of (6.8) could be
v
n+1
L,m
v
n
L,m
= c
2
(v
n+1
L,m+1
2v
n+1
L,m
+ v
n+1
L,m1
) +
1
2
e
2
(v
n+1
L,m+1
v
n+1
L,m1
)
+
1
2
d
1
(v
n+1
L2,m
4v
n+1
L1,m
+ 3v
n+1
L,m
) (6.9)
where we have used the asymmetric second order dierence approximation for
u
x
on the boundary. A simpler formula would result from replacing the last
parenthesis in (6.9) by (2v
n+1
L1,m
+ 2v
n+1
L,m
) but since this is only a rst order
approximation of u
x
the resulting v
n+1
would be only rst order correct in x.
Equation (6.9) supplies the extra information we need about v
n+1
L,m
, but now the
various rows are no longer independent of each other.
6. The total system which is outlined in Fig. 6.1 in the case L = M = 5 is
tridiagonal with three exceptions all due to equation (6.9): In each block there
86
Figure 6.1: The coecient matrix corresponding to L = M = 5.
is an element two places left of the diagonal in the last row (the coecient of
v
n+1
L2,m
). In each but the rst block there is an element L places left of the diagonal
in the last row (the coecient of v
n+1
L,m1
). In each but the last block there is an
element L places right of the diagonal in the last row (the coecient of v
n+1
L,m+1
).
On the r.h.s. of the system we must remember the eect of the boundary value at
(1,1) in the last equation. The other boundary value at (1,0) is 0 so no correction
is needed here.
Although the system of linear equations is not tridiagonal it still can be solved
using Gaussian elimination without introducing new non-zero elements, and the
solution process requires a number of simple arithmetic operations which is linear
in the number of unknowns and only marginally larger than that of a tridiagonal
system.
6.3 A Traditional Douglas-Rachford step
If one prefers to use the Traditional Douglas-Rachford method then the equations
to be solved instead of (6.1) and (6.2) are
(I kP
2h
) v = (I + kP
1h
)v
n
+ 2kb(x, y)
2
xy
v
n
(6.10)
(I kP
1h
)v
n+1
= v kP
1h
v
n
(6.11)
Most of the considerations of the preceding section are still applicable so we shall
just focus our attention on the dierences which occur in 1. and 4.
1. The r.h.s. of (6.10) now also includes P
1h
v
n
which means that it requires
knowledge of v
n
not only at all interior points but also for x = 0 and x = 1. The
only diculty lies at the very rst time step where we still prefer to settle the
discontinuity issue by adopting the initial value throughout.
4. Similar considerations apply for the P
1h
-term on the r.h.s. of (6.11).
87
6.4 The Peaceman-Rachford method
The implicit/Douglas-Rachford method is only rst order in time and therefore
we might prefer to use Crank-Nicolson/Peaceman-Rachford, possibly after an
initial implicit step.
The Peaceman-Rachford equations, augmented with (5.8) are
(I
1
2
kP
2h
) v = (I +
1
2
kP
1h
)v
n
+ kb(x, y)
2
xy
v
n+
1
2
(6.12)
(I
1
2
kP
1h
)v
n+1
= (I +
1
2
kP
2h
) v + kb(x, y)
2
xy
v
n+
1
2
(6.13)
where
v
n+
1
2
= v
n
+
1
2
(v
n
v
n1
) n 1. (6.14)
Formula (6.14) can not be used in the rst step but here it is OK to replace it by
v
n+
1
2
= v
n
. (6.15)
Since b(x, y) in this example is so small there is actually little dierence between
the results obtained with (6.14) and with (6.15).
Once again we divide the time step from nk to (n + 1)k into subtasks with the
same numbering as before.
0. On the boundary y = 1 it is now appropriate to discretize (6.3) using Crank-
Nicolson:
(
1
2
a
1
+
1
4
d
1
)v
n+1
l1,M
+ (1 + a
1

1
2
fk)v
n+1
l,M
(
1
2
a
1
+
1
4
d
1
)v
n+1
l+1,M
=
(
1
2
a
1

1
4
d
1
)v
n
l1,M
+ (1 a
1
+
1
2
fk)v
n
l,M
+ (
1
2
a
1
+
1
4
d
1
)v
n
l+1,M
.
This is a system of the same structure as (6.4) although with a more complicated
r.h.s. where previous considerations concerning the jump between the initial and
boundary values at (t, x) = (0, 0) apply at the rst step.
1. The r.h.s. of (6.12) is very similar to that of (6.10) and previous comments
apply.
2. v for y = 0 and y = 1 are now given by (4.26) which gives v
l,0
= 0 and
v
l,M
=
1
2
(I +
1
2
kP
1h
)v
n
l,M
+
1
2
(I
1
2
kP
1h
)v
n+1
l,M
. (6.16)
88
Again there is a computational advantage in including the fu-term in the P
1
-
operator in which case (6.16) reduces to
v
l,M
= (I +
1
2
kP
1h
)v
n
l,M
. (6.17)
In the general case with fu in P
1
and (1 )fu in P
2
the formula for v becomes
v
l,M
= (I +
1
2
kP
1h
)v
n
l,M
+
1
4
(1 )fk(v
n+1
l,M
v
n
l,M
). (6.18)
3. The system of equations (6.12) can now be solved for v at all interior points.
The system consists of L 1 tridiagonal systems of M 1 unknowns each and
they can be solved independently of each other.
4. The r.h.s. of system (6.13) requires knowledge of v at all interior points in
addition to the boundary values from 2.. The values needed for v are the same
as in 1..
5. Equation (6.13) now gives rise to a set of tridiagonal equations which must
be supplemented by the Crank-Nicolson equivalent of (6.9) to form a system of
equations with the same pattern of nonzeroes as before.
6. This system is now solved for v
n+1
at all interior grid points as well as at all
interior points on the boundary line x = 1.
6.5 Fine points on eciency
Eciency often amounts to a trade-o between storage space and computation
time. Readability of the program can also tip the scale in favour of one partic-
ular strategy. Since the coecients do not depend on time many things can be
computed once and reused in each time step. Also the Gaussian elimination can
be performed once and the components of the LU factors stored for later use.
The coecient functions (a, . . . , f) may be supplied as subroutines or they may
be computed ahead of time at all grid points and stored in arrays. b(x, y), d(x, y),
and e(x, y) require two-dimensional (L+1) (M +1) arrays, a(x), f(x), and c(y)
need one-dimensional vectors with L + 1 resp. M + 1 elements.
89
6.6 Convertible bonds
In a model by Longsta and Schwartz [9] the two independent variables are the
interest, r, and the volatility, V . The dierential equation can be written as
u
t
=
1
2
V u
rr
+ (( + )V r)u
rV
+
1
2
((
2
+ +
2
)V ( + )r)u
V V
+( + +


r +


V )u
r
+(
2
+
2
+
( )

r +


V )u
V
ru,
where the parameters have been estimated to
= 0.001149
= 0.1325
= 3.0493
= 0.05658
= 0.1582
= 3.998.
The conditions for this equation to be parabolic are
V > 0
and
V ((
2
+ +
2
)V ( + )r) (( + )V r)
2
> 0.
The last condition can be rewritten to
V
2
+
2

2
r
2
( + )rV < 0
or
V
2
( + )rV + r
2
< 0
or
r < V < r
since < in our case.
90
The equation is therefore only parabolic in part of the region (r > 0, V > 0). If
an equation is not parabolic then the problem is ill-posed, meaning that the norm
of the solution at a particular time is not guaranteed to be bounded in terms of
the initial condition (cf. [13, formula (1.5.4)]). Or, small changes in the initial
condition may produce large changes in the solution at a later time. In principle
the solution can become very large in a very short time although in practice we
may be able to retain a limited accuracy for small time intervals.
The problem is similar to that of weather forecasting. We are fairly condent
what the weather will be like an hour from now. We have a certain amount of
trust in what the meteorologists predict about to-morrow. But ve days ahead:
No way.
The main advice for ill-posed problems is not to touch them. It is far better to
search for another model with reasonable mathematical properties. If an ill-posed
problem must be solved then approach it very carefully. And be prepared that
our numerical methods may deceive us when they are used outside their usual
area of application. A more detailed amalysis is given in the next chapter
91
92
Chapter 7
Ill-posed problems
7.1 Theory
For the simple parabolic problem
u
t
= bu
xx
(7.1)
it is essential that b > 0.
From Fourier analysis we know that
u
t
(t, ) = b
2
u(t, ) (7.2)
and therefore that the Fourier transform of u can be written
u(t, ) = e
b
2
t
u(0, ) (7.3)
and by Parseval that

[u(t, x)[
2
dx =

[ u
t
(t, )[
2
d =

[e
2b
2
t
[ [ u(0, )[
2
d. (7.4)
When b > 0 we know that the exponential factor is 1 for t > 0 and therefore
that the norm of the solution at any time t is bounded by the norm of the initial
value. Furthermore it follows that small changes in the initial value will produce
small changes in the solution at later times.
If b < 0, or if we try to solve the heat equation backwards in time, the situation
is quite dierent. Since 2b
2
t > 0 we shall now observe a magnication of the
various components of the solution increasing with .
If the initial condition is smooth, consisting only of low frequency components
then the eect of the magnication is limited for small values of t. But if the
93
initial function contains high frequency components, or eqivalently that u(0, )
is dierent from 0 for large values of then the corresponding components of
the solution will exhibit a large magnication. The solution will be extremely
sensitive to small variations in the initial value if these variations have high
frequency. In mathematical terms the solution will not depend continuously on
the initial data.
The main advice is: Stay away from such problems.
Even if the initial function is smooth, the unavoidable rounding errors connected
with numerical computations will introduce high frequency perturbations which
in turn must be magnied.
7.2 Practice
But it is dicult to restrain our curiosity. What will actually happen if we try
to solve such a problem numerically with a nite dierence method.
The components of the numerical solution are governed by the growth factor
which for the general -method is
g() =
1 (1 )4bsin
2
2
1 + 4bsin
2
2
. (7.5)
For a given step size, h = (X
2
X
1
)/M, not all frequencies, , are possible.
Because of the nite and discrete nature of the problem, only a nite number of
frequencies are possible, given by

s
=
s
M
, s = 1, 2, . . . M 1. (7.6)
For the explicit method we have
g() = 1 4bsin
2

2
. (7.7)
When b < 0 we notice immediately that g() > 1 for all . All components will
be magnied, and the high frequency components will be magnied most. This
is ne for it reects the behaviour of the true solution, at least qualitatively.
The largest magnication at time t = nk is
(1 + 4[b[)
n
= (1 + 4
[bk[
h
2
)
n
e
4
|bnk|
h
2
= e
4
|bt|
h
2
,
so with a given step size h there is a limit to the magnication at a given time, t,
independent of the time step k. This is also in accordance with the mathematical
94
properties of the solution since the value of h denes an upper limit on the possible
frequencies.
For the Crank-Nicolson method the growth factor is
g() =
1 2bsin
2
2
1 + 2bsin
2
2
. (7.8)
We may experience innite magnication at the nite frequency given by
sin
2

2
=
1
2b
a situation which is possible if b < 1/2. We may not observe innite magni-
cation in practice if the corresponding frequency is not among those given by
(7.6).
The largest magnication is given by the value of
s
which maximizes (7.8). On
the other hand it is easily seen from (7.8) that all components are magnied, just
as for the explicit method, but the magnication becomes rather small for high
frequency components when [b[ is large.
For the implicit method the growth factor is
g() =
1
1 + 4bsin
2
2
. (7.9)
Again we may experience innite magnication when b < 1/4 but we may not
observe it in practice because of the discrete set of applicable frequencies. If [b[
is large we observe from (7.9) that high frequency components of the numerical
solution will be damped. This may result in a pleasant looking solution, but it is
a deception. Since all components of the true solution are magnied, a damping
of some is really an unwanted eect.
A further complication associated with negative values of b is that we may
encounter zeroes in the diagonal in the course of the Gaussian elimination, even in
cases where the tridiagonal matrix is non-singular. It may therefore be necessary
to introduce pivoting.
95
1
-1

1
EX
1
-1

1
CN
1
-1

1
IM
Figure 7.1: The growth factors for EX, CN, and IM as functions of x = 2bsin
2
2
.
The behaviour of the explicit method (EX), Crank-Nicolson (CN), and the im-
plicit method (IM) can be visualized using the graphs of 12x (EX), (1x)/(1+x)
(CN), and 1/(1 + 2x) (IM), where x = 2bsin
2
2
as shown in Fig. 7.1. We have
stability (damping) when the respective functions lie in the strip between 1 and
1, so for positive b, (x), we have unconditional stability with CN and IM, and
we require 2b < 1, (x < 1) with EX. For negative b we always have instability
with EX and CN, but we note that the magnication is rather small for large
[b[ with CN. For IM we have stability for large [b[ (and ). These observations
should be compared with the fact that the true solution exhibits large growth for
large values of .
When b < 1/2 the numerical results from CN and IM will not be inuenced
most by the high frequency components. Larger amplication factors will appear
due to the singularity of g() for intermediate values of
s
= s/M. The dom-
inant factor will occur for the value of s which makes 2bsin
2
(
s
/2) closest to
1, respectively 1/2, and the behaviour will be somewhat erratic when we vary
the step sizes.
96
Table 7.1: Error growth for the negative heat equation.
k b n t g s
EX 0.1 10 4 0.4 40 19
0.01 1 11 0.11 5 19
0.005 0.5 16 0.08 3 19
0.0025 0.25 27 0.0675 2 19
0.00125 0.125 47 0.05875 1.5 19
CN 0.1 10 7 0.7 23 3
0.01 1 2 0.02 10
0.005 0.5 4 0.02 300 19
0.0025 0.25 19 0.0475 3 19
0.00125 0.125 39 0.04875 1.7 19
IM 0.1 10 10 1.0 45 2
0.01 1 3 0.03 10 7
0.005 0.5 2 0.01 10
0.0025 0.25 5 0.0125 150 19
0.00125 0.125 31 0.03875 2 19
Example. The equation u
t
= u
xx
has the solution u(t, x) = e
t
cos x and
this is used to dene initial and boundary values. We have solved the equation
numerically using EX, CN, and IM on x [1, 1] with h = 0.1 and a range of
time steps from k = 0.1 to k = 1/800 giving values of b from 10 to 1/8.
We have continued the numerical solution until the error exceeded 5 and have
recorded the number of time steps, the nal value of t, and the observed growth
which in most cases was in good agreement with the theoretical value from (7.5)
and (7.6).
The results are given in Table 7.1. For small values of b the growth factor
approaches 1 and the worst growth is always associated with the highest frequency
component. As the reduction in g is coupled with a reduction in the time step we
notice that the time interval of integration becomes smaller as the time step is
reduced. As b gets larger (in absolute value) we may observe signicant growth
with CN and IM for low frequency components because of the singularity in the
expression for g. 2
97
7.3 Variable coecients An example
What happens when an equation is ill-posed in part of its domain. Can we trust
the solution in the rest of the domain where the equation is well posed. This
question can be illustrated by the following
Example. Consider the equation
u
t
=
1
2
xu
xx
. (7.10)
The coecient of u
xx
is negative when x < 0 and the equation is therefore ill-
posed here. A solution to (7.10) is
u(t, x) = tx + x
2
and this is used to dene initial and boundary values. If we solve (7.10) on an
interval such that 0 becomes a grid point then the system of equations will de-
couple in two, because of zeroes in the side diagonal, and bad vibrations from the
negative part will have no inuence on the positive side. To avoid this decoupling
we therefore choose to solve in the interval x [0.983, 1.017]. We have solved
the equation numerically using CN and IM with h = 0.1 and a range of time
steps from k = 0.1 to k = 1/200. We have continued the numerical solution until
the error exceeded 5 and have recorded the number of time steps and the nal
value of t and give the results in Table 7.2.
In all cases we observed severe error growth, usually originating from the negative
part of the interval but eventually spreading to the whole interval. 2
Table 7.2: Range of integration until error exceeds 5.
k n t
CN 1/10 7 0.7
1/40 14 0.35
1/100 25 0.25
1/200 56 0.28
IM 1/10 8 0.8
1/40 22 0.55
1/100 10 0.1
1/200 34 0.17
98
Chapter 8
A free boundary problem
8.1 Introduction
When modeling the price of the American option we have the extra complication
that because of the early exercise property we do not know beforehand the extent
of the region where the dierential equation must be solved. The position of the
boundary must be calculated along with the solution. This type of problem is
called a free boundary problem or Stefan problem after the Austrian (or Slovenian)
mathematician Josef Stefan who, based on measurements of the thickness of the
ice on arctic waters, set up a mathematical model for the growth of the ice cover
during the winter [21]. The following is joint work with Asbjrn Trolle Hansen
as presented in [19] which is also part of Asbjrns Ph.D.-thesis [18].
8.2 The mathematical model
The dierential equation for the price function for an American put option is
u
t
=
1
2

2
x
2
u
xx
+ rxu
x
ru, t > 0, x > y(t) (8.1)
where t is the time to expiry and x is the price of the underlying risky asset.
The initial condition is
u(0, x) = 0, x K (8.2)
and the boundary conditions are
lim
x
u(t, x) = 0, t > 0, (8.3)
99
u(t, y(t)) = K y(t), t > 0, (8.4)
u
x
(t, y(t)) = 1, t > 0. (8.5)
The function y(t) is called the exercise boundary and this function is not known
beforehand except for the information that
lim
t0
y(t) = K (8.6)
where K is the exercise price. The region
C = (t, x) [ t > 0, x > y(t) (8.7)
is called the continuation region. It is in this region we seek the solution of (8.1),
and it is characterized by the condition that u(t, x) > K x. The region
S = (t, x) [ t > 0, x < y(t) (8.8)
is called the stopping region. We can assign the price
u(t, x) = K x, (t, x) S (8.9)
but we must emphasize that this is not a solution to (8.1).
0
K
t
y(t)
C
S
Figure 8.1: The continuation region and the stopping region.
The initial condition is only given for x K because the American option will
never expire in the money due to the early exercise feature. If one wishes, one
can extend (8.9) to t = 0.
100
The boundary condition (8.3) expresses that the value of the option approaches
0 as the price of the underlying asset approaches innity.
The boundary condition (8.4) expresses that we are at the exercise boundary,
and (8.5) is the smooth t condition expressing that the partial derivative of u
with respect to x is continuous across the boundary if we assume the price from
(8.9) in the stopping region.
In [20] the above problem has been studied extensively, and the existence, unique-
ness, and dierentiability of u(t, x) and y(t) has been shown. We shall now study
the behaviour near the boundary further.
Theorem 1.
a. y(t) < K for t > 0.
b. y

(t) < 0 for t > 0.


c. u
t
is continuous across the boundary. More specically
lim
st
u
t
(s, y(t)) = lim
st
u
t
(s, y(t)) = 0 (8.10)
d. u
xx
is discontinuous across the boundary. More specically
lim
st
u
xx
(s, y(t)) =
2rK
(y(t))
2
(8.11)
whereas
lim
st
u
xx
(s, y(t)) = 0
Proof. If y(t) > K for some t > 0 then u(t, y(t)) < 0 by (8.4) which is
counterintuitive.
If y(t) = K for some t > 0 then u(t, y(t)) = 0 and because of (8.5) we would have
u(t, y(t) + ) < 0 for small, positive which again is counterintuitive.
Now consider for h > 0
u(t + h, y(t)) u(t, y(t))
= u(t + h, y(t + h)) u(t + h, y(t + h)) + u(t + h, y(t)) u(t, y(t))
= K y(t + h) K + y(t) (y(t + h) y(t))u
x
(t + h, z)
for some z between y(t) and y(t + h). If we also use the mean value theorem on
the very rst expression then there is a (0, 1) such that
hu
t
(t + h, y(t)) = (y(t) y(t + h))(1 + u
x
(t + h, z)).
From the denition of the continuation region we know that u(t, x) > K x for
x > y(t) and it follows that u
x
(t, x) > 1 for x y(t) small and positive. It
also follows that u
t
(t, x) > 0 for x y(t) small and positive. We therefore must
101
have y(t) > y(t + h) and therefore y

(t) < 0 for t > 0. Applying the mean value


theorem to y, dividing by h, and letting h 0 gives
lim
st
u
t
(s, y(t)) = 0.
The limit from the other side is also 0 since u(t, x) is independent of t in the
stopping region and we have established (8.10).
We therefore have
lim
st

1
2

2
x
2
u
xx
(s, y(t)) + rxu
x
(s, y(t)) ru(s, y(t)) = 0
By (8.4) and (8.5) and the continuity of u and u
x
in C (8.11) follows. That the
limit from the other side is 0 follows from the fact that u is a linear function in
S. 2
We conclude from (8.1) and (8.2) that
lim
t0
u
t
(t, x) = 0, x > K
but we note that lim
t0
u
t
(t, K) is undened. For reasons of monotonicity we
expect to have
u
t
(t, x) > 0, u
x
(t, x) < 0, u
xx
(t, x) > 0, for (t, x) C.
8.3 The boundary condition at innity
A boundary condition at innity is impractical when implementing a nite dif-
ference method. A commonly used technique to avoid it is to pick some large
M > K and replace (8.3) by
u(t, M) = 0, t > 0, (8.12)
Since the solution u(t, x) is usually very small for large x the error in using (8.12)
instead of (8.3) should be small. But how small is it and what is the eect on
the boundary curve.
Let us rst look at the steady-state solution to the original problem, i.e. u(x) =
lim
t
u(t, x). Since lim
t
u
t
(t, x) = 0 we have the following ordinary dieren-
tial equation problem for u(x)
1
2

2
x
2
u

+ rxu

ru = 0, y x < , (8.13)
lim
x
u(x) = 0, (8.14)
u(y) = K y, (8.15)
u

(y) = 1. (8.16)
102
where y = lim
t
y(t).
To nd the general solution to (8.13) we try with the power function u(x) = x
z
and get the characteristic equation
1
2

2
z(z 1) + rz r = 0
or
1
2

2
z
2
+ (r
1
2

2
)z r = 0.
The discriminant of this quadratic is
disc = (r
1
2

2
)
2
+ 2r
2
= (r +
1
2

2
)
2
such that
z =
(r
1
2

2
) (r +
1
2

2
)

2
and the two roots are z = 1 and z = =
2r

2
. The general solution can
therefore be written
u(x) = A
x
K
+ B(
x
K
)

. (8.17)
The boundary condition at innity gives A = 0, and from (8.15) and (8.16) we
get
u(y) = B(
y
K
)

= K y B = (K y)(
y
K
)

,
u

(y) =
B
K
(
y
K
)
1
=
K y
K
K
y
= 1
K + y = y y =

+ 1
K
B = K
1
+ 1
(

+ 1
)

. (8.18)
If we replace the upper boundary condition (8.14) with u(M) = 0 the general
solution is still (8.17) but the determination of A and B becomes a bit more
complicated.
u(M) = 0 A
M
K
+ B(
M
K
)

= 0 B = A(
M
K
)
+1
The boundary conditions (8.15) and (8.16) now give
A
y
K
A(
M
K
)
+1
(
y
K
)

= K y,
A
K
+
A
K
(
M
K
)
+1
(
y
K
)
1
= 1.
103
The second equation gives
(
y
M
)
1
=
K + A
A
y = M(
A
K + A
)
1/(+1)
and the rst equation now gives
A
K
(1 +
A
K
)

=
1

(

+ 1
K
M
)
+1
= . (8.19)
We cannot give a closed form solution for A from (8.19) but when is small,
A/K will also be small, and (1 + A/K)

will be close to 1, and an approximate


value is A
(1)
= K. A better value can be obtained by Newton-iteration where
the next iterate will be
A
(2)
= ( + )K
with
=
(1 )((1 )

1)
1 ( + 1)
.
Table 8.1: Corresponding values of M, , , A, B, and y.
M A B y
120 0.022 431 0.003 044 2.5527 7.6224 85.516
140 0.008 896 0.000 426 0.9322 7.0191 84.117
200 0.001 047 0.000 006 0.1052 6.7333 83.421
6.6980 83.333
In Table 8.1 we supply values for , , A, B, and y for various values of M and
corresponding to K = 100, = 0.2, r = 0.1 and therefore = 5.
Denote the solution function and the boundary function corresponding to a nite
value of M by u
M
(t, x) and y
M
(t), respectively. We note that the limit value
lim
t
y
M
(t) moves upwards as the value for M decreases and we conclude that
this holds for the whole boundary curve since otherwise the boundary curves for
two dierent values of M would intersect.
We want to estimate the error in u and y when we use a nite value, M, so we
dene the error function
w(t, x) = u(t, x) u
M
(t, x).
104
0
K
M
t
y(t)
y (t)
M
Figure 8.2: y(t) and y
M
(t).
It is dened in the same region as u
M
since y
M
(t) > y(t) and here it satises the
same dierential equation:
w
t
=
1
2

2
x
2
w
xx
+ rxw
x
rw, t 0, y
M
(t) x M. (8.20)
The initial condition is
w(0, x) = 0, K x M (8.21)
and the boundary conditions are
w(t, M) = u(t, M), t 0, (8.22)
w(t, y
M
(t)) = u(t, y
M
(t)) K + y
M
(t), t 0, (8.23)
w
x
(t, y
M
(t)) = u
x
(t, y
M
(t)) + 1, t 0. (8.24)
Using Taylor expansion we get
u(t, y
M
(t)) = u(t, y(t)) + (y
M
(t) y(t))u
x
+
1
2
(y
M
(t) y(t))
2
u
xx
+
= K y(t) y
M
(t) + y(t) +
1
2
(y
M
(t) y(t))
2
2rK

2
y(t)
2
+
= K y
M
(t) + (
y
M
(t)
y(t)
1)
2
rK

2
+ ,
u
x
(t, y
M
(t)) = 1 + (
y
M
(t)
y(t)
1)
2rK

2
y(t)
+
105
so the conditions (8.23) and (8.24) read
w(t, y
M
(t)) = (
y
M
(t)
y(t)
1)
2
rK

2
+ , t 0,
w
x
(t, y
M
(t)) = (
y
M
(t)
y(t)
1)
2rK

2
y(t)
+ , t 0.
We note that w(t, M) > 0, w(t, y
M
(t)) > 0, w
x
(t, y
M
(t)) > 0, w
t
(t, y
M
(t)) > 0,
and w
t
(t, M) > 0. This suggests that w(t, x) > 0 and is increasing with t and x.
We therefore have
0 w(t, x) < lim
t
w(t, x) = u(x) u
M
(x) u(M) = B(
K
M
)

with B given by (8.18). With the above parameter values K = 100, = 0.2,
r = 0.1, and M = 200 the error for at the money options is bounded by u(K)
u
M
(K) = 0.0704 and the maximum error for any x is bounded by u(M) = 0.2104.
0 1 2 3 4 5
85
90
95
100
Figure 8.3: The boundary curve calculated with Brennan-Schwartz.
8.4 Finite Dierence Schemes
If we introduce a traditional grid with xed step sizes k and h then we face
the problem that the boundary curve, y(t), typically passes between grid points.
There are various ways to deal with this diculty.
106
Table 8.2: The price function calculated with Brennan-Schwartz.
x t 0.4 0.8 1.2 1.6 2.0 2.4 2.8 3.2 3.6 4.0
195 0.000 0.000 0.001 0.002 0.005 0.008 0.011 0.014 0.017 0.019
190 0.000 0.000 0.002 0.005 0.011 0.018 0.024 0.031 0.037 0.042
185 0.000 0.000 0.003 0.009 0.019 0.029 0.040 0.050 0.060 0.068
180 0.000 0.001 0.005 0.014 0.028 0.044 0.059 0.074 0.087 0.099
175 0.000 0.001 0.008 0.022 0.041 0.062 0.083 0.103 0.121 0.136
170 0.000 0.002 0.012 0.032 0.058 0.086 0.113 0.139 0.161 0.181
165 0.000 0.004 0.019 0.047 0.081 0.117 0.152 0.184 0.212 0.237
160 0.000 0.007 0.031 0.069 0.113 0.159 0.202 0.241 0.275 0.305
155 0.000 0.012 0.048 0.100 0.158 0.214 0.267 0.313 0.355 0.390
150 0.001 0.022 0.076 0.146 0.219 0.288 0.351 0.407 0.456 0.497
145 0.002 0.041 0.119 0.211 0.303 0.387 0.462 0.528 0.584 0.632
140 0.006 0.073 0.185 0.306 0.419 0.521 0.609 0.685 0.750 0.805
135 0.016 0.129 0.287 0.442 0.580 0.700 0.803 0.890 0.963 1.025
130 0.040 0.226 0.442 0.636 0.803 0.943 1.060 1.158 1.240 1.308
125 0.097 0.390 0.676 0.915 1.111 1.271 1.402 1.511 1.600 1.675
120 0.223 0.661 1.026 1.311 1.535 1.714 1.859 1.977 2.074 2.154
115 0.488 1.101 1.545 1.872 2.122 2.316 2.471 2.596 2.698 2.781
110 1.009 1.795 2.306 2.665 2.931 3.135 3.295 3.423 3.527 3.611
105 1.962 2.868 3.411 3.780 4.048 4.252 4.410 4.535 4.636 4.718
100 3.519 4.456 4.988 5.340 5.591 5.780 5.926 6.040 6.132 6.207
95 6.142 6.836 7.249 7.530 7.731 7.885 8.004 8.096 8.171 8.232
90 10.222 10.430 10.589 10.708 10.803 10.877 10.934 10.981 11.022
At a given time level one can articially move the boundary curve to the nearest
grid point. We hereby introduce an error in the x-direction of order h and this
is undesirable.
One can also introduce dierence approximations to u
x
and u
xx
based on uneven
steps. This is a viable approach when the boundary curve is known beforehand,
but things become complicated when y(t) has to be determined along with u(t, x).
A third method was proposed by Brennan & Schwartz in 1977 [16]. They start
with the initial values from (8.2) augmented with values from (8.9) for t = 0
and x < K. They choose a step size h such that K/h is an integer, and a value
M = Jh where J is another integer. They then perform a Crank-Nicolson step
to t = k computing a set of auxiliary values
1
m
, m = 0, 1, . . . , J = M/h with
boundary values
1
0
= K and
1
J
= 0. The solution values are now determined
as v
1
m
= max(K mh,
1
m
) and the position of the exercise boundary is given by
y
M
(k) = h maxm[v
1
m
K mh. Once v
1
has been determined we can move
107
Table 8.3: Order quotients corresponding to h.
x t 0.4 0.8 1.2 1.6 2.0 2.4 2.8 3.2 3.6 4.0
195 4.7 2.8 3.5 3.7 3.7 3.8 3.9 3.8
190 4.1 4.9 2.9 3.5 3.7 3.8 3.8 3.9 3.8
185 4.1 5.1 3.1 3.5 3.7 3.8 3.8 3.9 3.8
180 4.1 6.0 3.2 3.6 3.7 3.8 3.8 3.9 3.8
175 4.2 12.7 3.3 3.6 3.7 3.8 3.8 3.9 3.8
170 4.2 0.5 3.4 3.6 3.7 3.8 3.9 3.8 3.8
165 4.0 4.4 2.5 3.5 3.6 3.7 3.8 3.9 3.8 3.8
160 4.0 4.6 3.1 3.5 3.7 3.7 3.8 3.9 3.8 3.8
155 4.0 5.5 3.3 3.6 3.7 3.8 3.8 3.9 3.8 3.8
150 4.0 -2.4 3.4 3.6 3.7 3.8 3.9 3.9 3.8 3.8
145 4.0 2.8 3.5 3.7 3.7 3.8 3.9 3.9 3.8 3.8
140 4.1 3.3 3.6 3.7 3.8 3.8 3.9 3.8 3.8 3.8
135 4.3 3.5 3.6 3.7 3.8 3.8 3.9 3.8 3.8 3.9
130 -4.0 3.6 3.7 3.8 3.8 3.9 3.9 3.8 3.8 3.9
125 3.4 3.6 3.7 3.8 3.8 4.0 3.9 3.7 3.8 4.0
120 3.6 3.7 3.7 3.8 3.8 4.1 3.8 3.7 3.8 4.1
115 3.6 3.7 3.7 3.9 3.8 4.2 3.7 3.7 3.9 4.2
110 3.6 3.7 3.7 3.8 3.9 4.3 3.4 3.7 3.9 4.5
105 3.6 3.7 3.7 3.6 4.1 4.4 3.2 3.9 4.0 4.9
100 2.7 2.0 1.4 1.1 0.8 0.6 0.5 0.4 0.3 0.2
95 -2.5 -30.2 11.4 5.2 4.3 2.4 2.3 2.4 2.6 3.0
90 -1.1 -1.1 -0.8 -1.1 -0.5 -0.7 -1.3 -1.9 -1.4
on to v
2
, v
3
, etc. This may seem like a rather harsh treatment of the problem,
but the results seem reasonable at a rst glance.
In Fig. 8.3 we show a plot of the computed boundary curve for the parameter
values K = 100, = 0.2, r = 0.1, and M = 200, and calculated with h = 0.25
and k = 0.0625. In Table 8.2 we give values for the price function for a selection
of points in the continuation region.
In order to estimate the error we apply the techniques of Chapter 3. In Table
8.3 we supply the quotients which should be close to 4.0 if the method is second
order in h. This looks fairly reasonable for x > K. Occasional isolated deviations
correspond to small values of the associated error estimate which is shown in
Table 8.4. For x K the order determination is far from reliable and the error
estimate due to the x-discretisation is unfortunately also much larger here, close
to the exercise boundary.
108
Table 8.4: Error estimate *1000 corresponding to h.
x t 0.4 0.8 1.2 1.6 2.0 2.4 2.8 3.2 3.6 4.0
195 -0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
190 -0.00 -0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
185 -0.00 -0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
180 -0.00 -0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
175 -0.00 -0.00 0.00 0.00 0.00 0.00 0.00 0.01 0.01
170 -0.00 0.00 0.00 0.00 0.00 0.01 0.01 0.01 0.01
165 -0.00 -0.00 0.00 0.00 0.00 0.01 0.01 0.01 0.01 0.01
160 -0.00 -0.00 0.00 0.00 0.01 0.01 0.01 0.01 0.01 0.01
155 -0.00 -0.00 0.00 0.01 0.01 0.01 0.01 0.01 0.01 0.01
150 -0.00 0.00 0.00 0.01 0.01 0.01 0.02 0.02 0.02 0.02
145 -0.00 0.00 0.01 0.01 0.02 0.02 0.02 0.02 0.02 0.02
140 -0.00 0.00 0.01 0.02 0.02 0.02 0.02 0.03 0.03 0.03
135 -0.00 0.01 0.02 0.03 0.03 0.03 0.03 0.03 0.03 0.03
130 0.00 0.02 0.03 0.03 0.04 0.04 0.04 0.04 0.04 0.04
125 0.01 0.03 0.04 0.05 0.05 0.05 0.05 0.05 0.05 0.05
120 0.03 0.06 0.06 0.06 0.06 0.06 0.06 0.06 0.06 0.06
115 0.07 0.08 0.08 0.08 0.08 0.07 0.08 0.07 0.07 0.08
110 0.13 0.12 0.11 0.10 0.10 0.09 0.09 0.08 0.09 0.09
105 0.15 0.17 0.15 0.13 0.12 0.10 0.11 0.09 0.12 0.11
100 -6.74 -8.53 -9.31 -9.57 -9.52 -9.30 -8.98 -8.59 -8.12 -7.71
95 -0.50 -0.06 0.18 0.30 0.42 0.50 0.45 0.47 0.51 0.40
90 -1.90 -1.72 -1.57 -1.43 -1.07 -1.17 -0.91 -0.81 -0.91
In Table 8.5 we supply the similar quotients corresponding to the time discreti-
sation. Values close to 2.0 indicate that the method is of rst order in k contrary
to our expectations of a method based on Crank-Nicolson. Also here the order
determination leaves a lot to be desired for x K, and the error estimate due
to the time discretisation is also much larger here as seen in Table 8.6. We must
conclude that the Brennan-Schwartz approach is not ideal.
8.5 Varying the Time Steps
For the American option problem where the boundary curve is known to be mono-
tonic we can suggest an alternate approach. Since the nite dierence methods
which we are usually considering for parabolic problems are one-step methods
there is no need to keep the step size in time constant throughout the calcu-
109
Table 8.5: Order quotients corresponding to k.
x t 0.4 0.8 1.2 1.6 2.0 2.4 2.8 3.2 3.6 4.0
195 5.8 3.6 0.7 2.2 2.1 2.0 1.9 1.9 1.9 1.9
190 5.5 3.7 1.0 2.2 2.1 2.0 1.9 1.9 1.9 1.9
185 4.9 3.9 1.3 2.2 2.1 2.0 1.9 1.9 1.9 1.9
180 4.4 4.3 1.7 2.2 2.1 2.0 1.9 1.9 1.9 1.9
175 4.0 5.1 1.9 2.2 2.0 2.0 1.9 1.9 1.9 1.9
170 3.6 7.7 2.1 2.1 2.0 1.9 1.9 1.9 1.9 1.9
165 3.3 -25.1 2.2 2.1 2.0 1.9 1.9 1.9 1.9 1.9
160 3.0 -0.3 2.2 2.1 2.0 1.9 1.9 1.9 1.9 1.9
155 2.9 1.5 2.2 2.0 1.9 1.9 1.9 1.9 1.9 1.9
150 3.0 2.2 2.2 2.0 1.9 1.9 1.9 1.9 1.9 1.9
145 3.5 2.6 2.1 1.9 1.9 1.9 1.9 1.9 1.9 1.9
140 6.0 2.7 2.0 1.9 1.9 1.9 1.9 1.9 1.9 1.9
135 -2.7 2.5 2.0 1.9 1.8 1.8 1.9 1.9 1.9 1.9
130 1.0 2.2 1.9 1.8 1.8 1.8 1.8 1.9 1.9 1.9
125 2.4 1.8 1.8 1.8 1.8 1.8 1.9 1.9 1.9 1.9
120 4.3 1.6 1.7 1.8 1.8 1.9 1.9 1.9 1.9 1.9
115 5.5 2.2 1.7 1.7 1.8 1.8 1.9 1.9 1.9 1.9
110 -0.1 2.4 2.3 2.0 1.9 1.9 1.8 1.8 1.9 1.9
105 -11.0 -0.6 0.6 1.3 1.7 2.0 2.1 2.1 2.1 2.1
100 1.8 1.9 2.0 2.0 1.9 1.9 1.9 1.9 1.8 1.8
95 -5.4 -1.5 -0.5 -0.0 0.3 0.5 0.6 0.7 0.8 0.9
90 1.2 1.2 1.2 1.3 1.4 1.4 1.4 1.5
lation. Instead we propose to choose the step sizes k
n
such that the boundary
curve will pass exactly through grid points. This idea was proposed for the orig-
inal Stefan Problem by Douglas & Gallie [17]. In our case the boundary curve is
decreasing so we shall choose k
n
such that there is precisely one extra grid point
at the next time level, see Fig. 8.4. This will imply that the initial time steps
will be very small and then keep increasing, but as we shall see that is not such
a bad idea from other points of view as well.
8.6 The Implicit Method
Since there is exactly one extra grid point on the next time level the situation
is ideally suited for the implicit method which will never have to refer to points
outside the continuation region.
110
Table 8.6: Error estimate *100 corresponding to k.
x t 0.4 0.8 1.2 1.6 2.0 2.4 2.8 3.2 3.6 4.0
195 -0.00 -0.00 0.00 0.00 0.01 0.01 0.01 0.01 0.02 0.02
190 -0.00 -0.00 0.00 0.01 0.01 0.02 0.02 0.03 0.03 0.04
185 -0.00 -0.00 0.00 0.01 0.02 0.03 0.04 0.05 0.05 0.06
180 -0.00 -0.00 0.01 0.02 0.03 0.05 0.06 0.07 0.08 0.09
175 -0.00 -0.00 0.01 0.03 0.05 0.07 0.08 0.10 0.11 0.12
170 -0.00 -0.00 0.02 0.04 0.07 0.09 0.11 0.13 0.14 0.15
165 -0.00 0.00 0.03 0.06 0.09 0.12 0.15 0.17 0.19 0.20
160 -0.01 0.01 0.05 0.09 0.12 0.16 0.19 0.22 0.24 0.26
155 -0.01 0.02 0.07 0.12 0.17 0.21 0.25 0.28 0.30 0.33
150 -0.02 0.04 0.11 0.17 0.23 0.28 0.32 0.36 0.39 0.41
145 -0.02 0.08 0.16 0.24 0.30 0.36 0.42 0.46 0.49 0.52
140 -0.02 0.13 0.23 0.33 0.41 0.48 0.54 0.59 0.62 0.66
135 0.03 0.22 0.34 0.45 0.55 0.63 0.70 0.75 0.79 0.83
130 0.15 0.34 0.49 0.63 0.74 0.83 0.91 0.96 1.01 1.05
125 0.36 0.51 0.71 0.87 1.00 1.10 1.18 1.24 1.29 1.33
120 0.54 0.77 1.01 1.21 1.35 1.46 1.54 1.60 1.65 1.69
115 0.66 1.15 1.48 1.68 1.83 1.94 2.02 2.08 2.12 2.16
110 1.67 1.65 2.06 2.33 2.49 2.59 2.66 2.70 2.74 2.77
105 0.60 3.36 3.56 3.48 3.43 3.43 3.45 3.48 3.51 3.54
100 19.86 15.82 13.94 12.90 12.22 11.72 11.32 10.99 10.70 10.46
95 4.33 7.09 7.09 6.94 6.86 6.82 6.81 6.80 6.78 6.77
90 9.50 9.30 9.07 8.85 8.63 8.50 8.45 8.39 8.36
We choose a step size in the x-direction, h = K/J
0
, where J
0
is a positive integer,
and such that J = M/h is also an integer. The step size h is kept xed during
the computation. The number of grid points above the exercise boundary is thus
J J
0
at time t = 0 which corresponds to the expiration time. For each time
step we add one extra grid point in the x-direction such that at the end of the
n-th time step we have J J
n
grid points above the boundary where J
n
= J
0
n.
The time steps will be denoted k
1
, k
2
, . . . and we dene t
n
=

n
i=1
k
i
. We now
compute a grid function
v
n
m
= v(t
n
, mh) u(t
n
, mh)
satisfying
v
n
m
v
n1
m
k
n
=

2
2
m
2
(v
n
m+1
2v
n
m
+ v
n
m1
) (8.25)
111
K
t
Figure 8.4: The rst few grid lines.
+
r
2
m(v
n
m+1
v
n
m1
) rv
n
m
, m = J
n
+ 1, . . . , J 1,
v
0
m
= 0, m = J
0
, . . . , J, (8.26)
v
n
J
= 0, (8.27)
v
n
Jn
= K J
n
h = nh, (8.28)
1 =
v
n
m+2
+ 4v
n
m+1
3v
n
m
2h
, m = J
n
, (8.29)
n = 1, 2, . . .. The rst order approximation to the boundary derivative v
n
m+1

v
n
m
= h is abandoned in favour of (8.29) for reasons of accuracy and because of
problems with the initial steps.
In practice we guess a value for k
n
, solve the tridiagonal system determined by
(8.25), (8.27), and (8.28), and use formula (8.29) to correct the time step. Al-
ternatively we could solve the (almost) tridiagonal system determined by (8.25),
(8.27), and (8.29), and use formula (8.28) to correct the time step. Numerical
experiments, however, suggest that the former method is computationally more
ecient in that it requires fewer iterations when searching for the size of the next
time step.
Getting Started
For n = 1 equations (8.28) and (8.29) give
v
1
m+2
+ 4v
1
m+1
= h, (m = J
1
= J
0
1).
112
When h is small we expect v
1
m+2
to be very small. We put v
1
m+2
= h and get
v
1
m+1
=
h
4
(1 + ), m = J
1
.
Applying (8.25) with m = J
1
+ 1 = J
0
= K/h we then get
h(1 + )
4k
1
=

2
K
2
2h
2
(
1 +
2
+ 1)h +
rK
2h
( 1)h r
h
4
(1 + )

1
k
1
=

2
K
2
h
2

1
1 +
2rK
h
r
k
1
=
h
2

2
K
2
2rKh
1
1+
rh
2

h
2

2
K
2
2rKh rh
2
(8.30)
and we have a good initial guess for the size of the rst time step. Note further
that
k
1

h
2

2
K
2
so it appears that the boundary curve y(T) starts out from (0, K) with a vertical
tangent and with a shape much like a parabola with its apex at (0, K).
For n > 1 we could as a starting value for k
n
use k
n1
, but for n > 2 it turns out
to be more ecient to use
k
1
n
= 2k
n1
k
n2
(8.31)
based on the assumption that the second derivative of y
M
(t) is small. Given
the initial value k
1
n
, a good second value can be obtained as follows. First we
solve equations (8.25), (8.27), and (8.28) and obtain tentative values for v
n
m
, m =
J
n
, . . . , J. We then evaluate the accuracy of k
1
n
by calculating the error term
e = 1
v
n
Jn+2
+ 4v
n
Jn+1
3v
n
Jn
2h
. (8.32)
If e = 0 then k
1
n
is the right size of the time step. Otherwise we would like to
change k
n
such that e = 0. First notice that
e =
4v
n
Jn+1
v
n
Jn+2
2h

3v
n
Jn+1
2h

3
v
n
Jn+1
t
t
2h
.
An approximation to the time derivative at grid point (n, J
n
+1) can be obtained
by
v
n
Jn+1
t

v
n
Jn+1
v
n1
Jn+1
k
1
n
and since we want e = e our second value for k
n
becomes
k
2
n
= k
1
n
+
2
3
hk
1
n
e
v
n
Jn+1
v
n1
Jn+1
. (8.33)
113
Iterating k
n
With a proposed value for k
n
we can rewrite equation (8.25) as
v
n
m

m
k
n
(v
n
m+1
2v
n
m
+ v
n
m1
)
m
k
n
(v
n
m+1
v
n
m1
) + rk
n
v
n
m
= v
n1
m
with
m
=
1
2

2
m
2
and
m
=
1
2
rm. Collecting terms we get
(
m

m
)k
n
v
n
m1
+ (1 + (2
m
+ r)k
n
)v
n
m
(
m
+
m
)k
n
v
n
m+1
= v
n1
m
,(8.34)
m = J
n
+ 1, . . . , J 1. With the two boundary conditions (8.27) and (8.28) we
have as many equations as unknowns and we can solve the resulting system of
equations. The calculated values are then checked with equation (8.29). If they
do not t, and they seldom do the rst time around, the value for k
n
is adjusted,
and the equations are solved again until a satisfactory agreement with (8.29) is
achieved.
The rst and the second value for k
n
have been discussed above. The general
way of calculating k
i+1
n
from k
i1
n
and k
i
n
for i > 1 is by using the secant method
on formula (8.32):
k
i+1
n
= k
i
n
e
i
k
i
n
k
i1
n
e
i
e
i1
(8.35)
where e
i
is calculated from (8.32) with v-values calculated with k
i
n
. The iteration
is continued until two successive values of k
i
n
dier by less than a predetermined
tolerance.
Table 8.7 illustrates the time step iterations corresponding to the usual set of
parameter values K = 100, = 0.2, r = 0.1, and M = 200, and calculated
with h = 1. We note that the initial guess for k
1
from (8.30) is reasonably good.
The second guess from (8.33) goes in the right direction but overshoots. k
1
is a
reasonable initial guess for k
2
which turns out to be only slightly larger than k
1
,
thus not supporting the parabola hypothesis. For the subsequent steps (8.31) is
a reasonable rst guess although it always undershoots as does the second guess
from (8.33). In all cases the subsequent secant method displays rapid convergence.
We have chosen a tolerance of 10
10
because even when only a limited accuracy
is demanded in the nal results it is important that the time steps are correct.
And since the secant method has superlinear convergence the last decimals are
cheap.
Because the boundary curve has a horizontal asymptote the time steps must
eventually increase without bound as we approach the magical value of y
M
. We
stop the calculations when the proposed time step exceeds a predetermined large
value (or becomes negative).
114
Table 8.7: Time step iterations with the implicit method.
n 1 2 3 4 5 6
1 0.00263089 0.00249934 0.00261373 0.00261361
2 0.00261361 0.00274429 0.00306504 0.00306051 0.00306055
3 0.00350749 0.00368286 0.00406362 0.00404923 0.00404955
4 0.00503856 0.00529049 0.00633407 0.00623458 0.00624054 0.00624058
5 0.00843160 0.00885318 0.01059401 0.01039681 0.01041042 0.01041053
6 0.01458048 0.01530950 0.01695293 0.01683323 0.01683825 0.01683827
7 0.02326601 0.02442931 0.02580814 0.02574396 0.02574540
8 0.03465253 0.03638516 0.03803276 0.03796856 0.03796965
9 0.05019391 0.05270361 0.05518939 0.05509763 0.05509915
10 0.07222864 0.07584007 0.07995677 0.07980713 0.07980971
11 0.10452027 0.10974629 0.11709737 0.11682760 0.11683268
12 0.15385565 0.16154843 0.17582050 0.17528125 0.17529254
13 0.23375241 0.24544003 0.27619704 0.27499448 0.27502200
14 0.37475150 0.39348908 0.47043296 0.46747716 0.46754145 0.46754149
15 0.66006096 0.69306401 0.94060986 0.93475551 0.93476998 0.93476997
16 1.40199846 1.47209838 2.85982166 3.01747989 3.04737618 3.04806822
8.7 The Crank-Nicolson Method
When attempting to use the Crank-Nicolson method on this problem there is a
minor complication. Since the boundary curve is decreasing the rst interior grid
point at a particular time level corresponds to a boundary point at the previous
time level. If we want to use a Crank-Nicolson approximation here we shall refer
to a point outside the continuation region at the previous time level. In some
cases this kind of point is referred to as a ctitious point. There are (at least)
three ways to circumvent this complication.
1. Use the value Kx at the ctitious point. This is not correct but reasonably
close.
2. Extrapolate from the boundary point and its neighbour using the central
dierence approximation to u
x
which we know is 1.
3. Replace the dierence approximations to u
x
and u
xx
on the boundary by the
exact values of 1 and
2rK

2
y(t)
2
from (8.5) and (8.11).
In practice these three approaches perform equally good. We shall therefore con-
centrate on the method described in 3. and note that the details in implementing
115
1. and 2. can be lled in similarly. We do make one exception from the method-
ology described in 3. At the initial step neither u
x
(0, K) nor u
xx
(0, K) is dened
so we need to deal with the initial step dierently. Instead we look at suggestion
1. and approximate
u
x
(0, K) by
u(0, K + h) u(0, K h)
2h
=
1
2
and
u
xx
(0, K) by
u(0, K + h) 2u(0, K) + u(0, K h)
h
2
=
1
h
.
With notation as in the previous section we compute a grid function
v
n
m
= v(t
n
, mh) u(t
n
, mh)
satisfying
v
1
m
v
0
m
k
n
=

2
4
m
2
(v
1
m+1
2v
1
m
+ v
1
m1
+ h) (8.36)
+
r
4
m(v
1
m+1
v
1
m1
h)
r
2
v
1
m
, m = J
0
, n = 1,
v
n
m
v
n1
m
k
n
=

2
4
m
2
(v
n
m+1
2v
n
m
+ v
n
m1
) +
rK
2
(8.37)
+
r
4
m(v
n
m+1
v
n
m1
2h)
r
2
(v
n
m
+ v
n1
m
), m = J
n1
, n > 1,
v
n
m
v
n1
m
k
n
=

2
4
m
2
(v
n
m+1
2v
n
m
+ v
n
m1
+ v
n1
m+1
2v
n1
m
+ v
n1
m1
) (8.38)
+
r
4
m(v
n
m+1
v
n
m1
+ v
n1
m+1
v
n1
m1
)
r
2
(v
n
m
+ v
n1
m
),
m = J
n
+ 2, . . . , J 1,
v
0
m
= 0, m = J
0
, . . . , J, (8.39)
v
n
J
= 0, (8.40)
v
n
Jn
= K J
n
h = nh, (8.41)
1 =
v
n
m+2
+ 4v
n
m+1
3v
n
m
2h
, m = J
n
, (8.42)
n = 1, 2, . . .. Just as for the implicit method we guess a value for k
n
, solve
the tridiagonal system of equations given by (8.36) (8.41), and use (8.42) to
correct the time step. Alternatively (8.42) could be incorporated in the system
of equations and (8.41) be used for the correction. We prefer the former since it
appears to lead to fewer iterations.
Getting Started
For n = 1 equations (8.41) and (8.42) give
v
1
m+2
+ 4v
1
m+1
= h, (m = J
1
= J
0
1).
116
When h is small we expect v
1
m+2
to be very small. We put v
1
m+2
= h and get
v
1
m+1
=
h
4
(1 + ), m = J
1
.
Applying (8.36) we then get
h(1 + )
4k
1
=

2
m
2
4
(
1 +
2
+ 2)h +
rm
4
( 2)h
r
2
h
4
(1 + )

1
k
1
=
2
m
2
(
1
2
+
1
1 +
)
2
1 +
rm
1
2
r
k
1
=
h
2

2
K
2
(
1
2
+
1
1+
) rKh
2
1+

1
2
rh
2
and we have a good initial guess for the size of the rst time step. Note further
that
k
1

h
2
3
2

2
K
2
2rKh
1
2
rh
2

h
2
3
2

2
K
2
(8.43)
so just like for the implicit method it appears that the boundary curve y(T) starts
out from (0, K) with a vertical tangent and with a shape much like a parabola
with its apex at (0, K), although this time a slightly dierent parabola.
For n = 2 practical experience shows that it is ecient to exploit the fact that the
boundary curve at the beginning looks like a parabola with its apex at (0, K).
When f(x) = x
2
then f(2h)/f(h) = 4 so that f(2h) f(h) = 3f(h). This
indicates that it might be a good idea to put k
1
2
= 3k
1
. This is in contrast to
the implicit method where the second step turns out to be of the same order
of magnitude as the rst. Thus it looks like the parabola conjecture ts better
to Crank-Nicolson than to the implicit method, at least for the rs two steps.
The third step with Crank-Nicolson is, however, not large enough to t the same
pattern.
For n > 2 we proceed like with the implicit method and put
k
1
n
= 2k
n1
k
n2
. (8.44)
For the second guess we use (8.33) and for subsequent values the secant method
(8.35) is used just as with the implicit method producing better and better values
for the time step until the tolerance is met..
Table 8.8 illustrates the time step iterations corresponding to the usual set of
parameter values K = 100, σ = 0.2, r = 0.1, and M = 200, and calculated with
h = 1. Most comments from the previous table carry over verbatim to this
Table 8.8: Time step iterations with Crank-Nicolson.
n \ iteration 1 2 3 4 5 6
1 0.00172444 0.00181066 0.00176141 0.00176115 0.00176116
2 0.00528347 0.00534501 0.00575238 0.00575488 0.00575490
3 0.00974864 0.00803102 0.00654763 0.00659399 0.00659352 0.00659352
4 0.00743214 0.00779111 0.00804438 0.00804408 0.00804408
5 0.00949463 0.01017074 0.01067041 0.01066699 0.01066699
6 0.01328991 0.01456567 0.01543735 0.01542461 0.01542468
7 0.02018236 0.02222535 0.02337563 0.02335643 0.02335653
8 0.03128838 0.03377432 0.03489219 0.03488019 0.03488023
9 0.04640393 0.04936500 0.05056048 0.05055332 0.05055333
10 0.06622644 0.07076255 0.07257623 0.07256957 0.07256957
11 0.09458581 0.10195371 0.10499982 0.10499678
12 0.13742398 0.14955464 0.15495471 0.15496947 0.15496949
13 0.20494220 0.22635535 0.23713380 0.23722496 0.23722525
14 0.31948101 0.36112641 0.38639870 0.38687650 0.38688015
15 0.53653504 0.63033291 0.70622891 0.70946632 0.70953139 0.70953144
16 1.03218273 1.30246815 1.65738437 1.70025643 1.70343860 1.70346436
one. The main differences are that k_2 now is close to 3k_1 whereas k_3 is close
to k_2. The initial guess for k_3 therefore overshoots. So much for the parabola
hypothesis.
Table 8.9: Order quotients for the implicit method.
x \ t 0.4 0.8 1.2 1.6 2.0 2.4 2.8 3.2 3.6 4.0
195 6.9 1.7 2.0 1.9 1.5 - 5.7 2.0 1.7 1.6
190 6.4 1.7 2.0 1.9 1.5 - 5.6 2.0 1.7 1.6
185 5.8 1.7 2.0 1.9 1.5 6.6 5.3 2.0 1.7 1.7
180 5.2 1.7 2.0 1.9 1.5 4.5 5.1 2.0 1.7 1.7
175 4.5 1.6 2.0 1.9 1.5 3.4 4.8 2.0 1.7 1.7
170 4.0 1.6 2.0 1.9 1.3 2.7 4.5 2.0 1.7 1.7
165 3.5 1.6 1.9 1.8 0.9 2.3 4.3 2.1 1.8 1.7
160 3.1 1.5 1.9 1.7 7.1 2.1 4.1 2.1 1.8 1.7
155 2.8 1.5 1.9 1.3 2.5 1.9 3.9 2.1 1.8 1.8
150 2.5 1.5 1.9 -2.4 2.2 1.8 3.7 2.1 1.8 1.8
145 2.3 1.5 1.9 3.7 2.1 1.7 3.6 2.2 1.9 1.9
140 2.1 1.5 1.8 2.9 2.1 1.7 3.5 2.2 1.9 1.9
135 2.0 1.5 2.2 2.7 2.1 1.6 3.5 2.2 2.0 1.9
130 1.9 1.6 2.1 2.7 2.1 1.6 3.5 2.3 2.0 2.0
125 1.8 1.5 2.2 2.7 2.2 1.6 3.4 2.3 2.1 2.0
120 1.6 1.6 2.2 2.7 2.2 1.6 3.4 2.4 2.1 2.1
115 2.3 1.7 2.3 2.7 2.3 1.6 3.4 2.4 2.1 2.1
110 2.1 1.7 2.3 2.7 2.3 1.6 3.4 2.5 2.2 2.2
105 2.1 1.8 2.3 2.8 2.3 1.6 3.4 2.5 2.2 2.2
100 2.2 1.8 2.4 2.8 2.3 1.6 3.4 2.5 2.2 2.2
95 2.1 1.8 2.3 2.7 2.3 1.7 3.4 2.5 2.2 2.2
90 1.6 1.7 2.1 2.5 2.2 1.7 3.3 2.4 2.2 2.2
85 8.0 0.6 0.7 1.0 1.1
y 2.0 1.7 2.0 2.4 2.1 3.5 8.1 3.9 3.1 3.3
8.8 Determining the order

As usual we should like to determine the order of the methods and to estimate
the error. There is only one independent step size, h, and we can easily perform
calculations with h, 2h, and 4h, but now a problem arises. The step sizes in
the t-direction are determined in the course of the calculations, and there is no
way of ensuring that we have comparable grid function values at the same point
in time corresponding to two different values of h.
The solution is to interpolate between the time values that we actually compute.
But we must make sure that the accuracy does not deteriorate because of the
interpolation. Let t denote a time where a solution value is wanted and let t_n be
Table 8.10: Error estimate ×10 for the implicit method.
x \ t 0.4 0.8 1.2 1.6 2.0 2.4 2.8 3.2 3.6 4.0
195 -0.000 -0.001 -0.002 -0.003 -0.002 0.000 0.002 0.008 0.011 0.013
190 -0.000 -0.002 -0.005 -0.006 -0.004 0.001 0.005 0.017 0.024 0.027
185 -0.000 -0.003 -0.007 -0.009 -0.006 0.002 0.008 0.028 0.038 0.044
180 -0.000 -0.005 -0.011 -0.013 -0.008 0.004 0.013 0.041 0.056 0.064
175 -0.000 -0.007 -0.014 -0.016 -0.009 0.008 0.019 0.057 0.077 0.087
170 -0.001 -0.010 -0.018 -0.018 -0.008 0.015 0.028 0.077 0.102 0.115
165 -0.001 -0.014 -0.022 -0.019 -0.005 0.026 0.040 0.102 0.132 0.147
160 -0.002 -0.018 -0.025 -0.018 0.001 0.041 0.056 0.132 0.168 0.185
155 -0.004 -0.024 -0.027 -0.013 0.012 0.063 0.078 0.170 0.212 0.230
150 -0.007 -0.030 -0.027 -0.003 0.029 0.093 0.106 0.215 0.264 0.283
145 -0.012 -0.035 -0.021 0.014 0.055 0.134 0.141 0.269 0.324 0.343
140 -0.020 -0.037 -0.009 0.041 0.090 0.185 0.184 0.333 0.394 0.411
135 -0.031 -0.031 0.015 0.079 0.136 0.249 0.236 0.406 0.472 0.486
130 -0.042 -0.011 0.052 0.132 0.195 0.326 0.296 0.486 0.557 0.566
125 -0.044 0.028 0.107 0.199 0.264 0.412 0.361 0.571 0.646 0.648
120 -0.024 0.094 0.179 0.277 0.341 0.502 0.428 0.654 0.731 0.726
115 0.040 0.185 0.261 0.359 0.417 0.588 0.489 0.726 0.803 0.790
110 0.152 0.287 0.342 0.432 0.481 0.655 0.535 0.776 0.850 0.830
105 0.279 0.370 0.399 0.477 0.514 0.683 0.551 0.785 0.855 0.830
100 0.333 0.391 0.403 0.468 0.498 0.651 0.523 0.735 0.797 0.770
95 0.242 0.312 0.330 0.386 0.411 0.535 0.432 0.604 0.654 0.633
90 0.043 0.134 0.174 0.218 0.242 0.319 0.267 0.376 0.409 0.399
85 0.002 0.024 0.040 0.049 0.057
y -0.519 -0.623 -0.655 -0.718 -0.725 -0.852 -0.753 -1.780 -2.260 -2.074
the largest grid point smaller than t. We shall assume that solution values are
also calculated at time t_{n+1}. A linear interpolation between function values w_n
and w_{n+1} at times t_n and t_{n+1} can be written as

    w(t) = w_n + (t - t_n) \frac{w_{n+1} - w_n}{t_{n+1} - t_n}.        (8.45)
This is a special case of the Newton interpolation polynomial, which has the
advantage that extra interpolation points can be added as the need arises. The
implicit method is expected to be of first order and therefore a linear interpolation
is sufficiently accurate. Table 8.9 gives the order quotients and Table 8.10 gives
the error function multiplied by 10 for a computation with the parameter values
K = 100, σ = 0.2, r = 0.1, and M = 200, and calculated with h = 0.25, 0.5,
and 1.0. The last line in each table refers to the boundary curve. The general
Table 8.11: Order quotients for Crank-Nicolson.
x \ t 0.4 0.8 1.2 1.6 2.0 2.4 2.8 3.2 3.6 4.0
195 - 4.0 - 3.9 1.7 2.7 1.6 5.5 4.8 -
190 - 4.0 - 3.9 1.8 2.7 1.6 5.5 4.8 -
185 - 4.0 -1.6 4.0 1.8 2.7 1.6 5.7 4.8 -
180 - 4.0 1.1 4.0 1.8 2.7 1.7 5.9 4.9 -
175 4.0 4.1 1.7 4.0 1.9 2.8 1.8 6.2 5.0 -
170 4.0 4.2 2.0 4.0 1.9 2.8 1.9 6.7 5.2 -
165 4.0 4.3 2.2 4.0 2.0 2.8 2.0 7.7 5.9 -
160 4.0 -3.1 2.3 4.0 2.0 2.9 2.3 - - -
155 4.0 3.6 2.4 4.0 2.1 3.0 3.0 - 3.0 -
150 4.1 3.8 2.5 4.0 2.3 3.1 5.2 - 3.7 -
145 4.2 3.9 2.5 4.0 2.5 3.5 - -5.9 4.0 -
140 6.6 3.9 2.6 4.0 3.1 7.3 -1.4 -1.5 4.1 -7.7
135 3.5 3.9 2.7 3.9 - 1.4 -0.2 0.1 4.1 -5.5
130 3.7 3.8 3.0 3.6 0.1 2.1 0.2 0.9 4.2 -4.0
125 3.8 3.8 3.7 4.5 1.2 2.3 0.5 1.4 4.2 -2.9
120 3.7 3.6 -1.7 4.2 1.5 2.4 0.6 1.7 4.2 -2.1
115 3.7 2.9 1.6 4.2 1.6 2.5 0.7 2.0 4.2 -1.5
110 3.5 5.2 2.0 4.1 1.7 2.5 0.8 2.1 4.2 -1.0
105 2.9 4.5 2.2 4.1 1.8 2.6 0.9 2.2 4.2 -0.7
100 1.5 4.3 2.3 4.1 1.9 2.6 1.0 2.3 4.2 -0.4
95 1.9 4.3 2.5 4.1 2.0 2.7 1.1 2.4 4.2 -0.1
90 3.0 4.3 3.3 4.1 2.5 3.0 1.5 2.6 4.2 0.4
85 - 6.3 8.7 3.8 3.9 2.5
y 1.3 4.1 2.8 4.0 2.5 3.0 1.7 2.9 4.1 1.3
picture is that of a first order method, and occasional deviations can be explained
by small values of the error function.
In contrast to Brennan-Schwartz the error function does not have an excessive
maximum at x = K. The behaviour is much smoother, with a soft maximum
attained near x = K.
Crank-Nicolson is expected to be of second order and therefore linear interpolation
is not sufficient. (8.45) is therefore augmented with a second order term
    w(t) = w_n + (t - t_n) w[n, n+1] + (t - t_n)(t - t_{n+1}) w[n-1, n, n+1]        (8.46)

where the divided differences w[·] are defined by

    w[n, n+1] = \frac{w_{n+1} - w_n}{t_{n+1} - t_n},
Table 8.12: Error function ×1000 for Crank-Nicolson.
x \ t 0.4 0.8 1.2 1.6 2.0 2.4 2.8 3.2 3.6 4.0
195 - -0.000 - 0.002 0.005 0.005 0.007 0.004 0.002 0.001
190 - -0.001 - 0.004 0.010 0.011 0.015 0.008 0.005 0.003
185 - -0.001 0.000 0.007 0.016 0.018 0.023 0.012 0.007 0.004
180 - -0.002 0.001 0.010 0.022 0.024 0.030 0.016 0.009 0.005
175 -0.000 -0.002 0.003 0.014 0.029 0.030 0.037 0.019 0.009 0.004
170 -0.000 -0.002 0.006 0.019 0.036 0.036 0.041 0.021 0.009 0.002
165 -0.000 -0.001 0.010 0.024 0.042 0.040 0.043 0.020 0.006 -0.002
160 -0.001 0.000 0.016 0.030 0.046 0.042 0.041 0.017 - -0.009
155 -0.002 0.003 0.024 0.036 0.049 0.041 0.033 0.010 -0.009 -0.019
150 -0.002 0.009 0.032 0.040 0.048 0.035 0.018 -0.002 -0.023 -0.034
145 -0.002 0.016 0.041 0.042 0.042 0.023 -0.006 -0.020 -0.041 -0.053
140 -0.001 0.027 0.047 0.040 0.028 0.005 -0.039 -0.044 -0.065 -0.077
135 0.006 0.038 0.048 0.030 0.006 -0.021 -0.083 -0.074 -0.095 -0.106
130 0.019 0.047 0.041 0.012 -0.027 -0.056 -0.137 -0.110 -0.129 -0.139
125 0.039 0.049 0.023 -0.015 -0.069 -0.098 -0.200 -0.151 -0.167 -0.175
120 0.061 0.040 -0.009 -0.052 -0.119 -0.146 -0.267 -0.194 -0.206 -0.212
115 0.074 0.016 -0.052 -0.094 -0.172 -0.194 -0.331 -0.235 -0.242 -0.247
110 0.065 -0.018 -0.099 -0.136 -0.220 -0.235 -0.384 -0.268 -0.271 -0.273
105 0.039 -0.052 -0.137 -0.167 -0.252 -0.260 -0.412 -0.285 -0.285 -0.286
100 0.017 -0.070 -0.149 -0.174 -0.253 -0.259 -0.400 -0.279 -0.277 -0.277
95 0.015 -0.058 -0.124 -0.148 -0.214 -0.221 -0.336 -0.241 -0.240 -0.241
90 0.008 -0.026 -0.062 -0.088 -0.129 -0.142 -0.211 -0.165 -0.169 -0.172
85 -0.008 -0.028 -0.028 -0.055 -0.065 -0.071
y -0.020 0.282 0.475 0.523 0.669 0.669 0.908 0.687 0.676 0.673
    w[n-1, n, n+1] = \frac{w[n, n+1] - w[n-1, n]}{t_{n+1} - t_{n-1}}.
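As an illustration of (8.45) and (8.46), the following Python sketch evaluates a computed grid function at an arbitrary time by the Newton form, linearly or with the extra quadratic term; the array names and the clamping at the ends of the time grid are our own choices, not part of the text.

    import numpy as np

    def newton_interp(t, times, w, quadratic=True):
        # Evaluate the grid function w, given at the increasing times `times`,
        # at the time t by the Newton form: eq. (8.45), optionally with the
        # quadratic correction of eq. (8.46) built on t_{n-1}, t_n, t_{n+1}.
        n = np.searchsorted(times, t) - 1       # largest grid point below t
        n = min(max(n, 1), len(times) - 2)      # keep t_{n-1} and t_{n+1} inside
        w01 = (w[n + 1] - w[n]) / (times[n + 1] - times[n])
        value = w[n] + (t - times[n]) * w01                      # eq. (8.45)
        if quadratic:
            w10 = (w[n] - w[n - 1]) / (times[n] - times[n - 1])
            w012 = (w01 - w10) / (times[n + 1] - times[n - 1])   # divided difference
            value += (t - times[n]) * (t - times[n + 1]) * w012  # extra term of (8.46)
        return value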
Table 8.11 gives the order quotients and Table 8.12 gives the error function
multiplied by 1000 for a computation with the parameter values K = 100, σ = 0.2,
r = 0.1, and M = 200, and calculated with h = 0.0625, 0.125, and 0.25. The last
line in each table refers to the boundary curve. The general picture is that of a
second order method, although there are several deviations from the pattern. On
the positive side we can notice that the behaviour on the line x = K is pretty
much like in the rest of the region and that the error function is very small. That
the singularity at (0, K) has no significant effect on the numbers is explained by
the fact that the time steps in the beginning are very small, leading to small
values of b and therefore efficient damping of high frequency components by the
Crank-Nicolson method.
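With such an interpolation in hand, the order quotients of Tables 8.9 and 8.11 can be formed by bringing three computations (step sizes h, 2h and 4h) to a common time before differencing. The sketch below assumes the newton_interp helper above and uses the usual convention that the quotient approaches 2^p for a method of order p, i.e. about 2 for the implicit method and about 4 for Crank-Nicolson; the argument names are our own.

    def order_quotient(t, times_h, w_h, times_2h, w_2h, times_4h, w_4h, quadratic=True):
        # Bring the three computations to the common time t by interpolation and
        # form the quotient of successive differences; for a method of order p
        # the result should be close to 2**p.
        u_h  = newton_interp(t, times_h,  w_h,  quadratic)
        u_2h = newton_interp(t, times_2h, w_2h, quadratic)
        u_4h = newton_interp(t, times_4h, w_4h, quadratic)
        return (u_4h - u_2h) / (u_2h - u_h)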
Table 8.13: The price function computed with Crank-Nicolson.
x \ t 0.4 0.8 1.2 1.6 2.0 2.4 2.8 3.2 3.6 4.0
195 0.000 0.000 0.001 0.002 0.005 0.008 0.011 0.014 0.017 0.020
190 0.000 0.000 0.002 0.005 0.011 0.018 0.025 0.031 0.037 0.042
185 0.000 0.000 0.003 0.009 0.019 0.030 0.041 0.051 0.060 0.069
180 0.000 0.001 0.005 0.015 0.029 0.044 0.060 0.075 0.088 0.100
175 0.000 0.001 0.008 0.022 0.041 0.063 0.084 0.104 0.122 0.138
170 0.000 0.002 0.012 0.033 0.059 0.087 0.115 0.140 0.163 0.183
165 0.000 0.004 0.020 0.048 0.082 0.119 0.154 0.186 0.214 0.239
160 0.000 0.007 0.031 0.070 0.115 0.161 0.204 0.243 0.278 0.308
155 0.000 0.012 0.049 0.102 0.160 0.217 0.269 0.317 0.358 0.394
150 0.001 0.023 0.077 0.148 0.221 0.291 0.355 0.411 0.460 0.502
145 0.002 0.041 0.121 0.214 0.306 0.391 0.467 0.533 0.590 0.638
140 0.006 0.074 0.188 0.310 0.424 0.526 0.615 0.692 0.757 0.812
135 0.016 0.132 0.291 0.447 0.587 0.708 0.811 0.898 0.972 1.034
130 0.041 0.230 0.448 0.644 0.812 0.953 1.070 1.169 1.251 1.320
125 0.100 0.396 0.684 0.925 1.122 1.283 1.416 1.525 1.615 1.690
120 0.229 0.671 1.038 1.325 1.551 1.731 1.877 1.995 2.092 2.173
115 0.498 1.116 1.563 1.892 2.143 2.339 2.494 2.620 2.722 2.806
110 1.026 1.818 2.332 2.693 2.960 3.164 3.325 3.454 3.558 3.642
105 1.992 2.901 3.447 3.818 4.087 4.290 4.449 4.574 4.675 4.758
100 3.636 4.528 5.047 5.394 5.644 5.831 5.977 6.091 6.184 6.258
95 6.234 6.911 7.320 7.599 7.800 7.952 8.070 8.163 8.238 8.299
90 10.068 10.320 10.526 10.681 10.799 10.892 10.965 11.023 11.071 11.110
85 15.000 15.006 15.014 15.022 15.030 15.038
y 88.528 86.792 85.875 85.294 84.891 84.597 84.374 84.201 84.064 83.953
We conclude this section by giving in Table 8.13 the values of the price function
and the boundary curve as calculated by Crank-Nicolson with h = 0.0625.

8.9 Efficiency of the methods

Comparing Brennan-Schwartz to Crank-Nicolson with variable time steps we note
that the latter is a second order method with error estimates that can be trusted,
also in the interesting region where x ≈ K. The price we have to pay to achieve
this is very small time steps at the beginning and several iterations per time
step. To take the last point first we supply in Table 8.14 the average number
of iterations per time step as a function of h and the tolerance, as well as the
total number (N) of time steps for a given value of h. As expected the number
Table 8.14: Average number of iterations per time step, and total number (N)
of time steps.
                                     Tolerance
h     10^-3  10^-4  10^-5  10^-6  10^-7  10^-8  10^-9  10^-10  10^-11  10^-12    N
1/1 3.06 3.19 3.75 4.19 4.56 4.81 5.06 5.19 5.56 5.69 16
1/2 1.94 3.12 3.21 3.58 4.12 4.36 4.58 4.82 5.12 5.27 33
1/4 1.44 2.08 2.79 3.20 3.36 3.62 4.20 4.30 4.36 4.61 66
1/8 1.33 1.50 2.21 2.58 3.16 3.23 3.36 3.69 4.17 4.21 132
1/16 1.28 1.34 1.51 2.18 2.48 3.11 3.15 3.23 3.39 3.92 265
1/32 - 1.28 1.36 1.52 2.14 2.38 3.07 3.09 3.15 3.35 530
of iterations increases with decreasing values of the tolerance, but not very much.
On the other hand the number of iterations decreases with decreasing values of h.
Typically we can expect 3-4 iterations per time step for small h. For comparison
we should remember that Brennan-Schwartz computes all the way down to 0,
which amounts to 1.7-2 times the number of valid grid points.
When discussing the time step variations it is essential to point out that we use
very small time steps for t close to 0 (time close to expiry) where most of the
action is, as measured by large values of the time derivatives of u near x = K.
One property to strive for is a constant value of k_n u_t(t_n, K) as t_n increases. In
Table 8.15 we have given values of v^n_m − v^{n-1}_m, which can be taken as approximations
of k_n u_t(t_n, mh), for values of x = mh around K and for all the 16 time steps
we take when h = 1. It is seen that for constant x, k_n u_t(t_n, x) displays a slow
increase, possibly after some initial fluctuations. With this behaviour in mind
we can of course modify Brennan-Schwartz and incorporate varying time steps,
possibly making this method more competitive.
Table 8.15: Values of k_n u_t near x = K.
n \ x 93 94 95 96 97 98 99 100 101 102 103 104 105 106
1 0.26 0.03 0.00 0.00 0.00 0.00 0.00
2 0.16 0.36 0.22 0.08 0.03 0.01 0.00 0.00
3 0.10 0.24 0.21 0.21 0.14 0.07 0.03 0.01 0.00
4 0.07 0.17 0.18 0.22 0.19 0.16 0.11 0.07 0.04 0.02
5 0.05 0.13 0.16 0.22 0.22 0.21 0.18 0.14 0.10 0.07 0.04
6 0.04 0.11 0.15 0.22 0.24 0.26 0.25 0.23 0.19 0.15 0.12 0.08
7 0.03 0.10 0.15 0.22 0.25 0.29 0.30 0.30 0.28 0.25 0.22 0.18 0.14
8 0.03 0.10 0.15 0.22 0.26 0.31 0.33 0.35 0.35 0.34 0.31 0.28 0.25 0.21
9 0.09 0.15 0.21 0.26 0.31 0.34 0.37 0.38 0.39 0.38 0.36 0.34 0.31 0.28
10 0.15 0.21 0.26 0.31 0.34 0.38 0.40 0.42 0.42 0.42 0.41 0.39 0.37 0.35
11 0.21 0.26 0.31 0.35 0.39 0.41 0.43 0.44 0.45 0.45 0.45 0.44 0.42 0.41
12 0.26 0.31 0.35 0.39 0.42 0.45 0.46 0.48 0.49 0.49 0.49 0.49 0.48 0.46
13 0.31 0.35 0.40 0.43 0.46 0.48 0.50 0.51 0.53 0.53 0.53 0.53 0.53 0.52
14 0.36 0.40 0.43 0.47 0.49 0.52 0.54 0.56 0.57 0.58 0.58 0.58 0.58 0.58
15 0.41 0.44 0.48 0.51 0.54 0.56 0.58 0.60 0.61 0.62 0.63 0.64 0.64 0.64
16 0.46 0.49 0.52 0.56 0.58 0.61 0.63 0.65 0.66 0.68 0.69 0.70 0.70 0.71
Bibliography
[1] A. C. Aitken, On Bernoulli's numerical solution of algebraic equations,
Proc. Roy. Soc. Edinburgh, 46 (1926), pp. 289-305.
[2] V. A. Barker, Extrapolation,
Hæfte 31, Numerisk Institut, DtH, Lyngby, 1974.
[3] M. J. Brennan and E. S. Schwartz, A continuous time approach to the pricing
of bonds, Journal of Banking and Finance, 3 (1979), pp. 133-155.
[4] J. Crank and P. Nicolson, A practical method for numerical evaluation of
solutions of partial differential equations of the heat-conduction type, Proc.
Cambridge Philos. Soc., 43 (1947), pp. 50-67.
Reprinted in Adv. Comput. Math., 6 (1996), pp. 207-226.
[5] J. Douglas and H. H. Rachford, On the numerical solution of heat conduction
problems in two and three space variables,
Trans. Amer. Math. Soc., 82 (1956), pp. 421-439.
[6] IEEE, Standard for Binary Floating Point Arithmetic,
ANSI/IEEE Std. 754-1985.
[7] D. C. Joyce, Survey of extrapolation processes in numerical analysis,
SIAM Review, 13 (1971), pp. 435-490.
[8] P. Laasonen, Über eine Methode zur Lösung der Wärmeleitungsgleichung,
Acta Math., 81 (1949), pp. 309-317.
[9] Longstaff and E. S. Schwartz, Interest rate volatility and the term structure:
A two-factor general equilibrium model,
J. Finance, 47 (1992), pp. 1259-1282.
See also Journal of Fixed Income, 3, no. 2 (1993), pp. 7-14.
[10] D. W. Peaceman and H. H. Rachford, The numerical solution of parabolic
and elliptic differential equations, J. SIAM, 3 (1955), pp. 28-41.
[11] Lewis F. Richardson and J. Arthur Gaunt, The Deferred Approach to the
Limit I - II, Trans. Roy. Soc. London, 226A (1927), pp. 299-361.
[12] Werner Romberg, Vereinfachte Numerische Integration,
Norske Vid. Selsk. Forh., Trondheim, 28 (1955), pp. 30-36.
[13] J. C. Strikwerda, Finite Difference Schemes and Partial Differential Equations,
Wadsworth and Brooks/Cole, Pacific Grove, CA, 1989.
[14] J. H. Wilkinson, Rounding Errors in Algebraic Processes,
HMSO, London, 1963.
[15] Y. G. D'Yakonov, On the application of disintegrating difference operators,
USSR Comp. Math., 3 (1963), pp. 511-515.
See also vol. 2 (1962), pp. 55-77 and pp. 581-607.
[16] M. J. Brennan and E. S. Schwartz, The valuation of American put options,
Journal of Finance, 32 (1977), pp. 449-462.
[17] J. Douglas and T. M. Gallie, On the numerical integration of a parabolic
differential equation subject to a moving boundary condition, Duke Math. J.,
22 (1955), pp. 557-571.
[18] Asbjørn Trolle Hansen, Martingale Methods in Contingent Claim Pricing
and Asymmetric Financial Markets,
Ph.D. thesis, Dept. Oper. Research, Aarhus University, 1998.
[19] Asbjørn Trolle Hansen and Ole Østerby, Accelerating the Crank-Nicolson
method in American option pricing,
Dept. Oper. Research, Aarhus University, 1998.
[20] P. L. J. van Moerbeke, On optimal stopping and free boundary problems,
Arch. Rational Mech. Anal., 60 (1976), pp. 101-148.
[21] J. Stefan, Über die Theorie der Eisbildung, insbesondere über die Eisbildung
im Polarmeere, Akad. Wiss. Wien, Mat. Nat. Classe, Sitzungsberichte, 98
(1889), pp. 965-983.