This action might not be possible to undo. Are you sure you want to continue?
– 1 –
Numerical Methods
Natural Sciences Tripos 1B
Lecture Notes
Lent Term 1999
© Stuart Dalziel
Department of Applied Mathematics and Theoretical Physics
University of Cambridge
Phone: (01223) 337911
Email: s.dalziel@damtp.cam.ac.uk
WWW: http://www.damtp.cam.ac.uk/user/fdl/people/sd103/
Lecture Notes: http://www.damtp.cam.ac.uk/user/fdl/people/sd103/lectures/
Formatting and visibility in this version:
Subsections
Subsubsections
Fourth order subsections
All versions
Full text
Common equations – complete equation given in handouts
Hidden equations – these were left incomplete in handouts
Full figures
Tables
Numerical Methods for Natural Sciences IB Introduction
– 2 –
CONTENTS
• Page numbers are correct only for the handout with which they are given
Formatting and visibility in this version: ................................................................................... 1
Subsections ................................................................................................................................. 1
Subsubsections....................................................................................................................... 1
1 Introduction................................................................................................................................ 6
1.1 Objective............................................................................................................................... 6
1.2 Books .................................................................................................................................... 6
General: .................................................................................................................................. 6
More specialised:.................................................................................................................... 6
1.3 Programming......................................................................................................................... 7
1.4 Tools ..................................................................................................................................... 7
1.4.1 Software libraries .......................................................................................................... 7
1.4.2 Maths systems ................................................................................................................ 7
1.5 Course Credit ........................................................................................................................ 8
1.6 Versions ................................................................................................................................ 8
1.6.1 Notes disctributed during lectures................................................................................. 8
1.6.2 Acrobat........................................................................................................................... 8
1.6.3 HTML............................................................................................................................. 8
1.6.4 Copyright ....................................................................................................................... 9
2 Key Idea.................................................................................................................................... 10
3 Root finding in one dimension................................................................................................ 11
3.1 Why?................................................................................................................................... 11
3.2 Bisection ............................................................................................................................. 11
3.2.1 Convergence ................................................................................................................ 12
3.2.2 Criteria......................................................................................................................... 12
3.3 Linear interpolation (regula falsi) ....................................................................................... 13
3.4 NewtonRaphson................................................................................................................. 14
3.4.1 Convergence ................................................................................................................ 15
3.5 Secant (chord) ..................................................................................................................... 16
3.5.1 Convergence ................................................................................................................ 18
3.6 Direct iteration .................................................................................................................... 19
3.6.1 Convergence ................................................................................................................ 20
3.7 Examples............................................................................................................................. 21
3.7.1 Bisection method.......................................................................................................... 21
3.7.2 Linear interpolation..................................................................................................... 22
3.7.3 NewtonRaphson.......................................................................................................... 22
3.7.4 Secant method.............................................................................................................. 23
3.7.5 Direct iteration ............................................................................................................ 23
Numerical Methods for Natural Sciences IB Introduction
– 3 –
3.7.6 Comparison.................................................................................................................. 25
3.7.7 Fortran program
*
........................................................................................................ 26
4 Linear equations ...................................................................................................................... 29
4.1 Gauss elimination ............................................................................................................... 29
4.2 Pivoting............................................................................................................................... 32
4.2.1 Partial pivoting............................................................................................................ 33
4.2.2 Full pivoting................................................................................................................. 33
4.3 LU factorisation .................................................................................................................. 34
4.4 Banded matrices.................................................................................................................. 36
4.5 Tridiagonal matrices ........................................................................................................... 36
4.6 Other approaches to solving linear systems........................................................................ 37
4.7 Over determined systems
*
................................................................................................... 38
4.8 Under determined systems
*
................................................................................................. 40
5 Numerical integration.............................................................................................................. 41
5.1 Manual method ................................................................................................................... 41
5.2 Constant rule....................................................................................................................... 41
5.3 Trapezium rule.................................................................................................................... 42
5.4 Midpoint rule..................................................................................................................... 44
5.5 Simpson’s rule .................................................................................................................... 47
5.6 Quadratic triangulation
*
...................................................................................................... 48
5.7 Romberg integration ........................................................................................................... 49
5.8 Gauss quadrature................................................................................................................. 50
5.9 Example of numerical integration....................................................................................... 51
5.9.1 Program for numerical integration
*
............................................................................ 54
6 First order ordinary differential equations ........................................................................... 57
6.1 Taylor series........................................................................................................................ 57
6.2 Finite difference.................................................................................................................. 57
6.3 Truncation error .................................................................................................................. 58
6.4 Euler method....................................................................................................................... 59
6.5 Implicit methods ................................................................................................................. 60
6.5.1 Backward Euler ........................................................................................................... 60
6.5.2 Richardson extrapolation ............................................................................................ 61
6.5.3 CrankNicholson.......................................................................................................... 62
6.6 Multistep methods............................................................................................................... 63
6.7 Stability............................................................................................................................... 63
6.8 Predictorcorrector methods................................................................................................ 65
6.8.1 Improved Euler method................................................................................................ 66
6.8.2 RungeKutta methods................................................................................................... 66
7 Higher order ordinary differential equations ....................................................................... 68
7.1 Initial value problems ......................................................................................................... 68
7.2 Boundary value problems ................................................................................................... 68
7.2.1 Shooting method .......................................................................................................... 68
Numerical Methods for Natural Sciences IB Introduction
– 4 –
7.2.2 Linear equations .......................................................................................................... 69
7.3 Other considerations
*
.......................................................................................................... 70
7.3.1 Truncation error
*
......................................................................................................... 71
7.3.2 Error and step control
*
................................................................................................ 71
8 Partial differential equations .................................................................................................. 72
8.1 Laplace equation................................................................................................................. 72
8.1.1 Direct solution ............................................................................................................. 73
8.1.2 Relaxation.................................................................................................................... 75
8.1.3 Multigrid
*
..................................................................................................................... 79
8.1.4 The mathematics of relaxation
*
................................................................................... 80
8.1.5 FFT
*
............................................................................................................................. 84
8.1.6 Boundary elements
*
..................................................................................................... 84
8.1.7 Finite elements
*
............................................................................................................ 84
8.2 Poisson equation ................................................................................................................. 84
8.3 Diffusion equation .............................................................................................................. 84
8.3.1 Semidiscretisation....................................................................................................... 84
8.3.2 Euler method................................................................................................................ 85
8.3.3 Stability ........................................................................................................................ 85
8.3.4 Model for general initial conditions ............................................................................ 86
8.3.5 CrankNicholson.......................................................................................................... 86
8.3.6 ADI
*
............................................................................................................................. 87
8.4 Advection
*
........................................................................................................................... 87
8.4.1 Upwind differencing
*
................................................................................................... 87
8.4.2 Courant number
*
.......................................................................................................... 87
8.4.3 Numerical dispersion
*
.................................................................................................. 88
8.4.4 Shocks
*
......................................................................................................................... 88
8.4.5 LaxWendroff
*
.............................................................................................................. 88
8.4.6 Conservative schemes
*
................................................................................................. 88
9. Number representation
*
......................................................................................................... 89
9.1. Integers
*
............................................................................................................................. 89
9.2. Floating point
*
.................................................................................................................... 90
9.3. Rounding and truncation error
*
.......................................................................................... 91
9.4. Endians
*
............................................................................................................................. 91
10. Computer languages
*
............................................................................................................ 93
10.1. Procedural verses Object Oriented
*
................................................................................. 93
10.2. Fortran 90
*
....................................................................................................................... 93
10.2.1. Procedural oriented
*
................................................................................................. 94
10.2.2. Fortran enhancements
*
............................................................................................. 94
10.3. C++
*
................................................................................................................................. 95
10.3.1. C
*
.............................................................................................................................. 95
10.3.2. Object Oriented
*
....................................................................................................... 95
10.3.3. Weaknesses
*
.............................................................................................................. 96
10.4. Others
*
.............................................................................................................................. 97
10.4.1. Ada
*
........................................................................................................................... 97
10.4.2. Algol
*
........................................................................................................................ 97
Numerical Methods for Natural Sciences IB Introduction
– 5 –
10.4.3. Basic
*
........................................................................................................................ 97
10.4.4. Cobol
*
....................................................................................................................... 98
10.4.5. Delphi
*
...................................................................................................................... 98
10.4.6. Forth
*
........................................................................................................................ 98
10.4.7. Lisp
*
.......................................................................................................................... 98
10.4.8. Modula2
*
................................................................................................................. 98
10.4.9. Pascal
*
...................................................................................................................... 98
10.4.10. PL/1
*
....................................................................................................................... 99
10.4.11. PostScript
*
.............................................................................................................. 99
10.4.12. Prolog
*
.................................................................................................................... 99
10.4.13. Smalltalk
*
................................................................................................................ 99
10.4.14. Visual Basic
*
........................................................................................................... 99
Numerical Methods for Natural Sciences IB Introduction
– 6 –
1 Introduction
These lecture notes are written for the Numerical Methods course as part of the Natural Sciences
Tripos, Part IB. The notes are intended to compliment the material presented in the lectures rather
than replace them.
1.1 Objective
• To give an overview of what can be done
• To give insight into how it can be done
• To give the confidence to tackle numerical solutions
An understanding of how a method works aids in choosing a method. It can also provide an
indication of what can and will go wrong, and of the accuracy which may be obtained.
• To gain insight into the underlying physics
• “The aim of this course is to introduce numerical techniques that can be used on
computers, rather than to provide a detailed treatment of accuracy or stability” –
Lecture Schedule.
Unfortunately the course is now examinable and therefore the material must be presented in a
manner consistent with this.
1.2 Books
General:
• Numerical Recipes  The Art of Scientific Computing, by Press, Flannery, Teukolsky
& Vetterling (CUP)
• Numerical Methods that Work, by Acton (Harper & Row)
• Numerical Analysis, by Burden & Faires (PWSKent)
• Applied Numerical Analysis, by Gerald & Wheatley (AddisonWesley)
• A Simple Introduction to Numerical Analysis, by Harding & Quinney (Institute of
Physics Publishing)
• Elementary Numerical Analysis, 3rd Edition, by Conte & de Boor (McGrawHill)
More specialised:
• Numerical Methods for Ordinary Differential Systems, by Lambert (Wiley)
• Numerical Solution of Partial Differential Equations: Finite Difference Methods, by
Smith (Oxford University Press)
For many people, Numerical Recipes is the bible for simple numerical techniques. It contains not
only detailed discussion of the algorithms and their use, but also sample source code for each.
Numerical Methods for Natural Sciences IB Introduction
– 7 –
Numerical Recipes is available for three tastes: Fortran, C and Pascal, with the source code
examples being taylored for each.
1.3 Programming
While a number of programming examples are given during the course, the course and
examination do not require any knowledge of programming. Numerical results are given to
illustrate a point and the code used to compute them presented in these notes purely for
completeness.
1.4 Tools
Unfortunately this course is too short to be able to provide an introduction to the various tools
available to assist with the solution of a wide range of mathematical problems. These tools are
widely available on nearly all computer platforms and fall into two general classes:
1.4.1 Software libraries
These are intended to be linked into your own computer program and provide routines for
solving particular classes of problems.
• NAG
• IMFL
• Numerical Recipes
The first two are commercial packages providing object libraries, while the final of these libraries
mirrors the content of the Numerical Recipes book and is available as source code.
1.4.2 Maths systems
These provide a shrinkwrapped solution to a broad class of mathematical problems. Typically
they have easytouse interfaces and provide graphical as well as text or numeric output. Key
features include algebraic analytical solution. There is fierce competition between the various
products available and, as a result, development continues at a rapid rate.
• Derive
• Maple
• Mathcad
• Mathematica
• Matlab
• Reduce
Numerical Methods for Natural Sciences IB Introduction
– 8 –
1.5 Course Credit
Prior to the 19951996 academic year, this course was not examinable. Since then, however,
there have been two examination questions each year. Some indication of the type of exam
questions may be gained from earlier tripos papers and from the later examples sheets. Note that
there has, unfortunately, been a tendency to concentrate on the more analysis side of the course in
the examination questions.
Some of the topics covered in these notes are not examinable. This situation is indicated by an
asterisk at the end of the section heading.
1.6 Versions
These lecture notes are available in three forms: the lecture notes distributed during lectures, and
the set available in two formats on the web.
1.6.1 Notes disctributed during lectures
The version distributed during lectures includes blanks for you to fill in the missing details. These
details will be given during the lectures themselves.
1.6.2 Acrobat
The lecture notes are also available over the web. This year’s notes will be provided in Acrobat
format (pdf) and may be found at
http://www.damtp.cam.ac.uk/user/fdl/people/sd103/lectures/
These notes contain all the information, and any blanks have been filled in.
1.6.3 HTML
In previous years these have been provided through an html format, and these notes remain
available, although may not contain the latest revisions. The HTML version of the notes also has all
the blanks filled in.
The HTML is generated from a source Word document that contains graphics, display equations
and inline equations and symbols. All graphics and complex display equations (where the Microsoft
Equation Editor has been used) are converted to GIF files for the HTML version. However, many of
the simpler equations and most of the inline equations and symbols do not use the Equation Editor
as this is very inefficient. As a consequence, they appear as characters rather than GIF files in the
HTML document. This has major advantages in terms of document size, but can cause problems
with older World Wide Web browsers.
Due to limitations in HTML and many older World Wide Web browsers, Greek and Symbols
used within the text and single line equations may not be displayed correctly. Similarly, some
browsers do not handle superscript and subscript. To avoid confusion when using older browsers,
all Greek and Symbols are formatted in Green. Thus if you find a green Roman character, read it as
the Greek equivalent. Table 1of the correspondences is given below. Variables and normal symbols
Numerical Methods for Natural Sciences IB Introduction
– 9 –
are treated in a similar way but are coloured dark Blue to distinguish them from the Greek. The
context and colour should distinguish them from HTML hypertext links. Similarly, subscripts are
shown in dark Cyan and superscripts in dark Magenta. Greek subscripts and superscripts are the
same Green as the normal characters, the context providing the key to whether it is a subscript or
superscript. For a similar reason, the use of some mathematical symbols (such as less than or equal
to) has been avoided and their Basic computer equivalent used in stead.
Fortunately many newer browsers (Microsoft Internet Explorer 3.0 and Netscape 3.0 on the PC,
but on many Unix platforms the Greek and Symbol characters are unavailable) do not have the same
character set limitations. The colour is still displayed, but the characters appear as intended.
Greek/Symbol character Name
α alpha
β beta
δ delta
∆ Delta
ε epsilon
ϕ phi
Φ Phi
λ lambda
µ mu
π pi
θ theta
σ sigma
ψ psi
Ψ Psi
<= less than or equal to
>= greater than or equal to
<> not equal to
=~ approximately equal to
vector vectors are represented as bold
Table 1: Correspondence between colour and characters.
1.6.4 Copyright
These notes may be duplicated freely for the purposes of education or research. Any such
reproductions, in whole or in part, should contain details of the author and this copyright notice.
Numerical Methods for Natural Sciences IB Introduction
– 10 –
2 Key Idea
The central idea behind the majority of methods discussed in this course is the Taylor Series
expansion of a function about a point. For a function of a single variable, we may represent the
expansion as
( ) ( ) ( ) ( ) ( ) f x x f x xf x
x
f x
x
f x + · + ′ + ′′ + ′′′ + δ δ
δ δ
2 3
2 6
K. (1)
In two dimensions we have
( ) ( ) f x x y y f x y x
f
x
y
f
y
x f
x
y f
y
x y
f
x y
+ + · + + + + + + δ δ δ
∂
∂
δ
∂
∂
δ ∂
∂
δ ∂
∂
δ δ
∂
∂ ∂
, ,
2 2
2
2 2
2
2
2 2
K. (2)
Similar expansions may be constructed for functions with more independent variables.
Numerical Methods for Natural Sciences IB Introduction
– 11 –
3 Root finding in one dimension
3.1 Why?
Solutions x = x
0
to equations of the form f(x) = 0 are often required where it is impossible or
infeasible to find an analytical expression for the vector x. If the scalar function f depends on m
independent variables x
1
,x
2
,…,x
m
, then the solution x
0
will describe a surface in m–1 dimensional
space. Alternatively we may consider the vector function f(x)=0, the solutions of which typically
collapse to particular values of x. For this course we restrict our attention to a single independent
variable x and seek solutions to f(x)=0.
3.2 Bisection
This is the simplest method for finding a root to an equation and is also known as binary
chopping. As we shall see, it is also the most robust. One of the main drawbacks is that we need two
initial guesses x
a
and x
b
which bracket the root: let f
a
= f(x
a
) and f
b
= f(x
b
) such that f
a
f
b
<= 0. An
example of this is shown graphically in figure 1. Clearly, if f
a
f
b
= 0 then one or both of x
a
and x
b
must be a root of f(x) = 0.
x
a
x
b
f
a
f
b
x
c
f
c
Figure 1: Graphical representation of the bisection method showing two initial guesses (x
a
and x
b
bracketting
the root).
The basic algorithm for the bisection method relies on repeated application of
• Let x
c
= (x
a
+x
b
)/2,
Numerical Methods for Natural Sciences IB Introduction
– 12 –
• if f
c
= f(c) = 0 then x = x
c
is an exact solution,
• elseif f
a
f
c
< 0 then the root lies in the interval (x
a
,x
c
),
• else the root lies in the interval (x
c
,x
b
).
By replacing the interval (x
a
,x
b
) with either (x
a
,x
c
) or (x
c
,x
b
) (whichever brackets the root), the error
in our estimate of the solution to f(x) = 0 is, on average, halved. We repeat this interval halving
until either the exact root has been found or the interval is smaller than some specified tolerance.
3.2.1 Convergence
Since the interval (x
a
,x
b
) always bracets the root, we know that the error in using either x
a
or x
b
as
an estimate for root at the nth iteration, e
n
, must be
e
n
< x
a
 x
b
. (3)
Now since the interval (x
a
,x
b
) is halved for each iteration, then
e
n+1
~ e
n
/2. (4)
More generally, if x
n
is the estimate for the root x
*
at the nth iteration, then the error in this
estimate is
ε
n
= x
n
 x
*
. (5)
In many cases we may express the error at the n+1th time step in terms of the error at the nth time
step as
ε
n+1
 ~ Cε
n

p
. (6)
Indeed this criteria applies to all techniques discussed in this course, but in many cases it applies
only asymptotically as our estimate x
n
converges on the exact solution. The exponent p in equation
(6) gives the order of the convergence. The larger the value of p, the faster the scheme converges on
the solution, at least provided ε
n+1
< ε
n
. For first order schemes (i.e. p = 1), C < 1 for convergence.
For the bisection method we may estimate ε
n
as e
n
. The form of equation (4) then suggests p = 1
and C = 1/2, showing the scheme is first order and converges linearly. Indeed convergence is
guaranteed  a root to f(x) = 0 will always be found  provided f(x) is continuous over the initial
interval.
3.2.2 Criteria
In general, a numerical root finding procedure will not find the exact root being sought (ε = 0),
rather it will find some suitably accurate approximation to it. In order to prevent the algorithm
continuing to refine the solution for ever, it is necessary to place some conditions under which the
solution process is to be finished or aborted. Typically this will take the form of an error tolerance
on e
n
= a
n
–b
n
, the value of f
c
, or both.
For some methods it is also important to ensure the algorithm is converging on a solution (i.e.
ε
n+1
 < ε
n
 for suitably large n), and that this convergence is sufficiently rapid to attain the solution
in a reasonable span of time. The guaranteed convergence of the bisection method does not require
Numerical Methods for Natural Sciences IB Introduction
– 13 –
such safety checks which, combined with its extreme simplicity, is one of the reasons for its
widespread use despite being relatively slow to converge.
3.3 Linear interpolation (regula falsi)
This method is similar to the bisection method in that it requires two initial guesses to bracket the
root. However, instead of simply dividing the region in two, a linear interpolation is used to obtain a
new point which is (hopefully, but not necessarily) closer to the root than the equivalent estimate for
the bisection method. A graphical interpretation of this method is shown in figure 2.
x
a
x
b
f
a
f
b
x
1
x
2
Figure 2: Root finding by the linear interpolation (regula falsi) method. The two initial gueses x
a
and x
b
must
bracket the root.
The basic algorithm for the linear interpolation method is
• Let x x
x x
f f
f x
x x
f f
f
x f x f
f f
c a
b a
b a
a b
b a
b a
b
a b b a
b a
· −
−
−
· −
−
−
·
−
−
, then
• if f
c
= f(x
c
) = 0 then x = x
c
is an exact solution,
• elseif f
a
f
c
< 0 then the root lies in the interval (x
a
,x
c
),
Numerical Methods for Natural Sciences IB Introduction
– 14 –
• else the root lies in the interval (x
c
,x
b
).
Because the solution remains bracketed at each step, convergence is guaranteed as was the case for
the bisection method. The method is first order and is exact for linear f.
3.4 NewtonRaphson
Consider the Taylor Series expansion of f(x) about some point x = x
0
:
f(x) = f(x
0
) + (x–x
0
)f'(x
0
) + ½(x–x
0
)
2
f"(x
0
) + O(x–x
0

3
). (7)
Setting the quadratic and higher terms to zero and solving the linear approximation of f(x) = 0 for x
gives
( )
( )
x x
f x
f x
1 0
0
0
· −
’
. (8)
Subsequent iterations are defined in a similar manner as
( )
( )
x x
f x
f x
n n
n
n
+
· −
1
’
. (9)
Geometrically, x
n+1
can be interpreted as the value of x at which a line, passing through the point
(x
n
,f(x
n
)) and tangent to the curve f(x) at that point, crosses the y axis. Figure 3 provides a graphical
interpretation of this.
x
0
f
0
x
1
x
2
Figure 3: Graphical interpretation of the Newton Raphson algorithm.
Numerical Methods for Natural Sciences IB Introduction
– 15 –
When it works, NewtonRaphson converges much more rapidly than the bisection or linear
interpolation. However, if f’ vanishes at an iteration point, or indeed even between the current
estimate and the root, then the method will fail to converge. A graphical interpretation of this is
given in figure 4.
x
0
x
1
x
2
Figure 4: Divergence of the Newton Raphson algorithm due to the presence of a turning point close to the
root.
3.4.1 Convergence
To study how the NewtonRaphson scheme converges, expand f(x) around the root x = x*,
f(x) = f(x*) + (x– x*)f'(x*) +
1
/
2
(x– x*)
2
f"(x*) + O(x– x*
3
) (10)
and substitute into the iteration formula. This then shows
Numerical Methods for Natural Sciences IB Introduction
– 16 –
( )
( )
( ) ( )
( ) ( )
( ) ( )
[ ]
( )
( )
( )
( )
( )
( )
ε
ε
ε ε
ε
ε ε ε ε
ε ε ε ε
n n
n
n
n
n
n n
n
n n n n
n n n n
x x
x x
f x
f x
f x f x
f x f x
f x f x
f x
f x
f x
f x
f x
f x
f x
+ +
· −
· − −
′
· −
′ + ′′ +
′ + ′′ +
· − ′ + ′′ +
′
−
′′
′
+
]
]
]
· − +
′′
′
−
′′
′
1 1
1
2
2
1
2
2
2
1
2
2
1
1
*
*
* * ...
* * ...
* * ...
*
*
*
...
*
*
*
( )
( )
( )
( )
( )
*
*
*
+
·
′′
′
+
O
f x
f x
O
n
n n
ε
ε ε
3
1
2
2 3
(11)
since f(x*)=0. Thus, by comparison with (5), there is second order (quadratic) convergence. The
presence of the f’ term in the denominator shows that the scheme will not converge if f’ vanishes in
the neighbourhood of the root.
3.5 Secant (chord)
This method is essentially the same as NewtonRaphson except that the derivative f’(x) is
approximated by a finite difference based on the current and the preceding estimate for the root, i.e.
( )
( ) ( )
f x
f x f x
x x
n
n n
n n
’ ≈
−
−
−
−
1
1
, (12)
and this is substituted into the NewtonRaphson algorithm (9) to give
( )
( ) ( )
( ) x x
x x
f x f x
f x
n n
n n
n n
n +
−
−
· −
−
−
1
1
1
. (13)
This formula is identical to that for the Linear Interpolation method discussed in section 3.3. The
difference is that rather than replacing one of the two estimates so that the root is always bracketed,
the oldest point is always discarded in favour of the new. This means it is not necessary to have two
initial guesses bracketing the root, but on the other hand, convergence is not guaranteed. A graphical
representation of the method working is shown in figure 5 and failure to converge in figure 6. In
some cases, swapping the two initial guesses x
0
and x
1
will change the behaviour of the method
from convergent to divergent.
Numerical Methods for Natural Sciences IB Introduction
– 17 –
x
1
x
0
x
2
x
3
x
4
Figure 5: Convergence on the root using the secant method.
x
0
x
1
x
2
x
3
Figure 6: Divergence using the secant method.
Numerical Methods for Natural Sciences IB Introduction
– 18 –
3.5.1 Convergence
The order of convergence may be obtained in a similar way to the earlier methods. Expanding
around the root x = x* for x
n
and x
n+1
gives
f(x
n
) = f(x*) + ε
n
f'(x*) +
1
/
2
ε
n
2
f''(x*) + O(ε
n

3
), (14a)
f(x
n1
) = f(x*) + ε
n–1
f’(x*) +
1
/
2
ε
n–1
2
f’’(x*) + O(ε
n–1

3
), (14b)
and substituting into the iteration formula
( )
( ) ( )
( )
( ) ( )
( ) ( ) ( ) ( )
[ ]
( )
( ) ( )
( )
( )
ε
ε
ε ε
ε ε ε ε
ε ε
ε
ε ε
ε ε
n n
n
n
n n
n n
n
n n
n n n n
n n
n
n n
n n
x x
x x
f x
f x f x
x x
f x f x
f x f x f x f x
f x f x
f x
+ +
−
−
− −
−
−
· −
· − −
−
−
· −
′ + ′′ +
′ + ′′ + − ′ + ′′ +
−
· −
′ + ′′ +
− ′
1 1
1
1
1
2
2
1
2
2
1
1
2
1
2
1
1
2
2
1
*
*
* * ...
* * ... * * ...
* * ...
* ( )
( )
( )
( )
( )
( )
( )
( )
( )
( )
( )
( )
( )
( )
( )
( )
1
1
1
2
1
1
1
2
2
1
2
1
1
2
1
1
2
2 3
1
2
+ +
′′
+
]
]
]
−
· − +
′′
′
+
]
]
]
− +
′′
′
+
]
]
]
· − + +
′′
′
−
′′
′
+
·
′′
−
−
−
−
ε ε
ε ε
ε ε ε ε ε
ε ε ε ε ε ε ε
n n
n n
n n n n n
n n n n n n n
f x
f x
f x
f x
f x
f x
f x
f x
f x
f x
O
f x
*
’ *
...
*
*
...
*
*
...
*
*
*
*
*
( )
( )
′
+
− −
f x
O
n n n
*
ε ε ε
1 1
3
. (15)
Note that this expression for ε
n+1
includes both ε
n
and ε
n–1
. In general we would like it in terms of ε
n
only. The form of this expression suggests a power law relationship. By writing
( )
( )
ε ε
β
α
n n
f x
f x
+
·
′′
′

.
`
,
1
2
*
*
, (16)
and substituting into the error evolution equation (15) gives
( )
( )
( )
( )
( )
( )
( )
( )
ε ε ε
ε ε
ε
β α
α
β α
α
α
n n n
n n
n
f x
f x
f x
f x
f x
f x
f x
f x
+ −
−
−
+
·
′′
′

.
`
,
·
′′
′

.
`
,
′′
′

.
`
,
·
′′
′

.
`
,
1 1
1
1
1
2
2 2
2
*
*
*
*
*
*
*
*
(17)
which we equate with our assumed relationship to show
Numerical Methods for Natural Sciences IB Introduction
– 19 –
α
α
α
β
α
α α
·
+
·
+
·
+
· ·
+
1 1 5
2
1
1 2
1 5
,
.
(18)
Thus the method is of noninteger order 1.61803… (the golden ratio). As with NewtonRaphson,
the method may diverge if f’ vanishes in the neighbourhood of the root.
3.6 Direct iteration
A simple and often useful method involves rearranging and possibly transforming the function
f(x) by T(f(x),x) to obtain g(x) = T(f(x),x). The only restriction on T(f(x),x) is that solutions to f(x) = 0
have a one to one relationship with solutions to g(x) = x for the roots being sort. Indeed, one reason
for choosing such a transformation for an equation with multiple roots is to eliminate known roots
and thus simplify the location of the remaining roots. The efficiency and convergence of this
method depends on the final form of g(x).
The iteration formula for this method is then just
x
n+1
= g(x
n
). (19)
A graphical interpretion of this formula is given in figure 7.
x
0
x
1
Figure 7: Convergence on a root using the Direct Iteration method.
Numerical Methods for Natural Sciences IB Introduction
– 20 –
3.6.1 Convergence
The convergence of this method may be determined in a similar manner to the other methods by
expanding about x*. Here we need to expand g(x) rather than f(x). This gives
g(x
n
) = g(x*) + ε
n
g’(x*) +
1
/
2
ε
n
2
g"(x*) + O(ε
n

3
), (20)
so that the evolution of the error follows
( )
( ) ( )
( )
( ) ( )
( )
ε
ε ε ε
ε ε ε
n n
n
n n n
n n n
x x
g x x
g x g x g x O x
g x g x O
+ +
· −
· −
· + ′ + ′′ + −
· ′ + ′′ +
1 1
1
2
2 3
1
2
2 2
*
*
( *) * * *
* *
. (21)
The method is clearly first order and will converge only if g’ < 1. The sign of g’ determines whether
the convergence (or divergence) is monotonic (positive g') or oscillatory (negative g'). Figure 8
shows how the method will diverge if this restriction on g' is not satisfied. Here g’ < −1 so the
divergence is oscilatory.
Obviously our choice of T(f(x),x) should try to minimise g’(x) in the neighbourhood of the root to
maximise the rate of convergence. In addition, we should choose T(f(x),x) so that the curvature
g"(x) does not become too large.
If g’(x) < 0, then we get oscillatory convergence/divergence.
Numerical Methods for Natural Sciences IB Introduction
– 21 –
x
0
Figure 8: The divergence of a Direct Iteration when g’ < −1.
3.7 Examples
Consider the equation
f(x) = cos x − 1/2. (22)
3.7.1 Bisection method
• Initial guesses x = 0 and x = π/2.
• Expect linear convergence: ε
n+1
 ~ ε
n
/2.
Iteration Error e
n+1
/e
n
0 0.261799 0.500001909862
1 0.130900 0.4999984721161
2 0.0654498 0.5000015278886
Numerical Methods for Natural Sciences IB Introduction
– 22 –
3 0.0327250 0.4999969442322
4 0.0163624 0.5000036669437
5 0.00818126 0.4999951107776
6 0.00409059 0.5000110008581
7 0.00204534 0.4999755541866
8 0.00102262 0.5000449824959
9 0.000511356 0.4999139542706
10 0.000255634 0.5001721210794
11 0.000127861 0.4996574405018
12 0.0000638867 0.5006848060707
13 0.0000319871 0.4986322611303
14 0.0000159498 0.5027411002019
15 0.00000801862
3.7.2 Linear interpolation
• Initial guesses x = 0 and x = π/2.
• Expect linear convergence: ε
n+1
 ~ cε
n
.
Iteration Error e
n+1
/e
n
0
0.261799
0.1213205550823
1 0.0317616 0.0963178807113
2 0.00305921 0.09340810209172
3 0.000285755 0.09312907910623
4 0.0000266121 0.09310313729469
5 0.00000247767 0.09310037252741
6 0.000000230672 0.09310059304987
7 0.0000000214757 0.09310010849472
8 0.00000000199939 0.09310039562066
9 0.000000000186144 0.09310104005501
10 0.0000000000173302 0.09310567679542
11 0.00000000000161354 0.09316100003719
12 0.000000000000150319 0.09374663216227
13 0.0000000000000140919 0.10000070962752
14 0.0000000000000014092 0.1620777746239
15 0.0000000000000002284
3.7.3 NewtonRaphson
• Initial guess: x = π/2.
• Note that can not use x = 0 as derivative vanishes here.
• Expect quadratic convergence: ε
n+1
~ cε
n
2
.
Iteration Error e
n+1
/e
n
e
n+1
/e
n
2
0 0.0235988 0.00653855280777 0.2770714107399
1 0.000154302 0.0000445311143083 0.2885971297087
2 0.00000000687124 0.000000014553 
3 1.0E15
4
Machine accuracy
Numerical Methods for Natural Sciences IB Introduction
– 23 –
3.7.4 Secant method
• Initial guesses x = 0 and x = π/2.
• Expect convergence: ε
n+1
~ cε
n
1.618
.
Iteration Error e
n+1
/e
n
e
n+1
/e
n

1.618
0 0.261799 0.1213205550823 0.2777
1 0.0317616 0.09730712558561 0.8203
2 0.00309063 0.009399086917554 0.3344
3 0.0000290491 0.0008898244696049 0.5664
4 0.0000000258486 0.000008384051747483 0.4098
5 0.000000000000216716
6
Machine accuracy
• Convergence substantially faster than linear interpolation.
3.7.5 Direct iteration
There are a variety of ways in which equation (22) may be rearranged into the form required for
direct iteration.
3.7.5.1 Addition of x
Use
x
n+1
= g(x) = x
n
+ cos x – 1/2 (23)
• Initial guess: x = 0 (also works with x = π/2)
• Expect convergence: ε
n+1
~ g’(x*) ε
n
~ 0.13 ε
n
.
Iteration Error e
n+1
/e
n
0 0.547198 0.30997006568
1 0.169615 0.1804233116175
2 0.0306025 0.1417596601585
3 0.00433820 0.1350620072841
4 0.000585926 0.1341210323488
5 0.0000785850 0.1339937647134
6 0.0000105299 0.1339775306508
7 0.00000141077 0.1339750632633
8 0.000000189008 0.1339747523914
9 0.0000000253223 0.1339747969181
10 0.00000000339255 0.1339744440023
11 0.000000000454515 0.1339748963181
12 0.0000000000608936 0.1339759843399
13 0.00000000000815828 0.1339878013503
14 0.00000000000109311 0.1340617138257
15 0.0000000000001465442
Numerical Methods for Natural Sciences IB Introduction
– 24 –
3.7.5.2 Multiplcation by x
Use
x
n+1
= g(x) = 2x cos x (24)
• Initial guess: x = π/2 (fails with x = 0 as this is a new solution to g(x)=x)
• Expect convergence: ε
n+1
~ g’(x*) ε
n
~ 0.81 ε
n
.
Iteration Error e
n+1
/e
n
0 0.0635232 0.9577980958138
1 0.0608424 0.6773664418235
2 0.0412126 0.9070721090152
3 0.0373828 0.7297714456916
4 0.0272809 0.8754733164962
5 0.0238837 0.7600455540808
6 0.0181527 0.854809477378
7 0.0155171 0.778843985023
8 0.0120854 0.8410892481838
9 0.0101649 0.7908921878228
10 0.00803934 0.8319464035605
11 0.00668830 0.7987216482514
12 0.00534209 0.8258546748557
13 0.00441179 0.8038528579103
14 0.00354643 0.8218010788314
15 0.00291446
3.7.5.3 Approximating f’(x)
The Direct Iteration method is closely related to the Newton Raphson method when a particular
choice of transformation T(f(x)) is made. Consider
f(x) = f(x) + (x–x)h(x) = 0. (25)
Rearranging equation (25) for one of the x variables and labelling the different variables for
different steps in the interation gives
x
n+1
= g(x
n
) = x
n
– f(x
n
)/h(x
n
). (26)
Now if we choose h(x) such that g’(x)=0 everywhere (which requires h(x) = f'(x)), then we recover
the NewtonRaphson method with its quadratic convergence.
In some situations calculation of f'(x) may not be feasible. In such cases it may be necessary to
rely on the first order and secant methods which do not require a knowledge of f'(x). However, the
convergence of such methods is very slow. The Direct Iteration method, on the otherhand, provides
us with a framework for a faster method. To do this we select h(x) as an approximation to f’(x). For
the present f(x) = cos x  1/2 we may approximate f'(x) as
h(x) = 4x(x – π)/π
2
(27)
• Initial guess: x = 0 (fails with x = π/2 as h(x) vanishes).
• Expect convergence: ε
n+1
~ g’(x*) ε
n
~ 0.026 ε
n
.
Numerical Methods for Natural Sciences IB Introduction
– 25 –
Iteration Error e
n+1
/e
n
0 0.0235988 0.02985973863078
1 0.000704654 0.02585084310882
2 0.0000182159 0.02572477890195
3 0.000000468600 0.02572151088348
4 0.0000000120531 0.02572134969427
5 0.000000000310022 0.02572107785899
6 0.00000000000797410 0.02570835580191
7 0.000000000000205001 0.02521207213623
8 0.00000000000000516850
9
Machine accuracy
The convergence, while still formally linear, is significantly more rapid than with the other first
order methods. For a more complex example, the computational cost of having more iterations than
Newton Raphson may be significantly less than the cost of evaluating the derivative.
A further potential use of this approach is to avoid the divergence problems associated with f’(x)
vanishing in the Newton Raphson scheme. Since h(x) only approximates f’(x), and the accuracy of
this approximation is more important close to the root, it may be possible to choose h(x) in such a
way as to avoid a divergent scheme.
3.7.6 Comparison
Figure 9 shows graphicall a comparison between the different approaches to finding the roots of
equation (22). The clear winner is the NewtonRaphson scheme, with the approximated derivative
for the Direct Iteration proving a very good alternative.
0.0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0 11.0 12.0 13.0 14.0 15.0 16.0
1.0e14
1.0e13
1.0e12
1.0e11
1.0e10
1.0e9
1.0e8
1.0e7
1.0e6
1.0e5
1.0e4
0.001
0.01
0.1
1.0
Interation (n)
R
e
l
a
t
i
v
e
e
r
r
o
r
(
ε
n
/
ε
0
)
Convergence for cos(x) = 1/2
Scheme
Bisection
Linear interpolation
NewtonRaphson
Secant
General iteration: x
n+1
= x
n
+ cosx
n
 1/2
General iteration: x
n+1
= 2x
n
cosx
n
General iteration: Approximating f’(x) with quadratic
Figure 9: Comparison of the convergence of the error in the estimate of the root to cos x = 1/2 for a range of
different root finding algorithms.
Numerical Methods for Natural Sciences IB Introduction
– 26 –
3.7.7 Fortran program
*
The following program was used to generate the data presented for the above examples. Note
that this is included as an illustrative example. No knowledge of Fortran or any other programming
language is required in this course.
PROGRAM Roots
INTEGER*4 i,j
REAL*8 x,xa,xb,xc,fa,fb,fc,pi,xStar,f,df
REAL*8 Error(0:15,0:15)
f(x)=cos(x)0.5
df(x) = SIN(x)
pi = 3.141592653
xStar = ACOS(0.5)
WRITE(6,*)’# ’,xStar,f(xStar)
C=====Bisection
xa = 0
fa = f(xa)
xb = pi/2.0
fb = f(xb)
DO i=0,15
xc = (xa + xb)/2.0
fc = f(xc)
IF (fa*fc .LT. 0.0) THEN
xb = xc
fb = fc
ELSE
xa = xc
fa = fc
ENDIF
Error(0,i) = xc  xStar
ENDDO
C=====Linear interpolation
xa = 0
fa = f(xa)
xb = pi/2.0
fb = f(xb)
DO i=0,15
xc = xa  (xbxa)/(fbfa)*fa
fc = f(xc)
IF (fa*fc .LT. 0.0) THEN
xb = xc
fb = fc
ELSE
xa = xc
fa = fc
ENDIF
Error(1,i) = xc  xStar
ENDDO
C=====NewtonRaphson
*
Not examinable
Numerical Methods for Natural Sciences IB Introduction
– 27 –
xa = pi/2.0
DO i=0,15
xa = xa  f(xa)/df(xa)
Error(2,i) = xa  xStar
ENDDO
C=====Secant
xa = 0
fa = f(xa)
xb = pi/2.0
fb = f(xb)
DO i=0,15
IF (fa .NE. fb) THEN
C If fa = fb then either method has converged (xa=xb)
C or will diverge from this point
xc = xa  (xbxa)/(fbfa)*fa
xa = xb
fa = fb
xb = xc
fb = f(xb)
ENDIF
Error(3,i) = xc  xStar
ENDDO
C=====Direct iteration using x + f(x) = x
xa = 0.0
DO i=0,15
xa = xa + f(xa)
Error(4,i) = xa  xStar
ENDDO
C=====Direct iteration using xf(x)=0 rearranged for x
CStarting point prevents convergence
xa = pi/2.0
DO i=0,15
xa = 2.0*xa*(f(x)0.5)
Error(5,i) = xa  xStar
ENDDO
C=====Direct iteration using xf(x)=0 rearranged for x
xa = pi/4.0
DO i=0,15
xa = 2.0*xa*COS(xa)
Error(6,i) = xa  xStar
ENDDO
C=====Direct iteration using 4x(xpi)/pi/pi to approximate f’
xa = pi/2.0
DO i=0,15
xa = xa  f(xa)*pi*pi/(4.0*xa*(xapi))
Error(7,i) = xa  xStar
ENDDO
C=====Output results
DO i=0,15
WRITE(6,100)i,(Error(j,i),j=0,7)
ENDDO
100 FORMAT(1x,i4,8(1x,g12.6))
END
Numerical Methods for Natural Sciences IB Introduction
– 28 –
Numerical Methods for Natural Sciences IB Introduction
– 29 –
4 Linear equations
Solving equation of the form Ax = r is central to many numerical algorithms. There are a number
of methods which may be used, some algebraically correct, while others iterative in nature and
providing only approximate solutions. Which is best will depend on the structure of A, the context
in which it is to be solved and the size compared with the available computer resources.
4.1 Gauss elimination
This is what you would probably do if you were computing the solution of a nontrivial system
by hand. For example, if
x y z
x y z
x y z
+ + ·
+ + ·
+ + ·
2 3 6
2 2 3 7
4 4 9
, (28)
we might then subtract 2 times the first equation from the second equation, and subtract the first
equation from the third equation to get
x y z
x y z
x y z
+ + ·
+ − + − · −
+ + ·
2 3 6
0 2 3 5
0 2 3
. (29)
In the second step we might add the second equation to the third to obtain
x y z
x y z
x y z
+ + ·
+ − + − · −
+ + − · −
2 3 6
0 2 3 5
0 0 2 2
. (30)
The third equation now involves only z giving z = 1. Substituting this back into the second equation
gives an equation for y and soon. In particular we have
( ) ( )
( )
( )
z
y z
x y z
· − − ·
· − + − · − + − · − − ·
· − − · − − · ·
2 2 1
5 3 2 5 3 2 2 2 1
6 2 3 1 6 2 3 1 1 1 1
/
/ / /
/ / /
. (31)
We may write this system in terms of a matrix A, an unknown vector x and the known righthand
side b as
Ax = b, (32)
and do exactly the same manipulations on the rows of the matrix A and righthand side b. From the
system
Numerical Methods for Natural Sciences IB Introduction
– 30 –
1 2 3
2 2 3
1 4 4
6
7
9
]
]
]
]
]

.
`
,
·

.
`
,
x
y
z
, (33)
we subract 2 times the first row from the second row, and subtract the first row from the third row
to obtain
1 2 3
0 2 3
0 2 1
6
5
3
− −
]
]
]
]
]

.
`
,
· −

.
`
,
x
y
z
. (34)
Before adding the second and third rows, this time we will divide the second row through by −2, the
element on the diagonal, getting
1 2 3
0 1 3 2
0 2 1
6
5 2
3
/ /
]
]
]
]
]

.
`
,
·

.
`
,
x
y
z
. (35)
This may seem pointless in this example, but in general it simplifies the next step where we subtract
a
32
times the second row from the third row. Here a
32
= 2 and represents the value in the second
column of the third row of the matrix. Thus the next step is
( ) ( )
1 2 3
0 1 3 2
0 2 2 1 2 3 2
6
5 2
3 2 5 2
/
/
/
/ − −
]
]
]
]
]

.
`
,
·
−

.
`
,
x
y
z
, (36)
⇒
1 2 3
0 1 3 2
0 0 2
6
5 2
2
/ /
−
]
]
]
]
]

.
`
,
·
−

.
`
,
x
y
z
. (37)
We again divide the resulting row (now the third row) by the element on the diagonal (a
33
= −2) to
obtain
1 2 3
0 1 3 2
0 0 1
6
5 2
1
/ /
]
]
]
]
]

.
`
,
·

.
`
,
x
y
z
, (38)
and retrieve immediately the value z = 1 from the last row of the equation. Substituting back we get
( )
1 2 3
0 1 0
0 0 1
6
5 2 3 2
1
6
1
1
]
]
]
]
]

.
`
,
· −

.
`
,
·

.
`
,
x
y
z
z / / , (39)
and finally we recover the answer
Numerical Methods for Natural Sciences IB Introduction
– 31 –
1 0 0
0 1 0
0 0 1
6 2 3
1
1
1
1
1
]
]
]
]
]

.
`
,
·

.
`
,
·
− −

.
`
,
·

.
`
,
x
y
z
x
y
z
y z
. (40)
In general, for the system
a x a x a x a x r
a x a x a x a x r
a x a x a x a x r
a x a x a x a x r
n n
n n
n n
n n n nn n n
11 1 12 2 13 3 1 1
21 1 22 2 23 3 2 2
31 1 32 2 33 3 3 3
1 1 2 2 3 3
+ + + + ·
+ + + + ·
+ + + + ·
+ + + + ·
K
K
K
M
K
, (41)
we first divide the first row by a
11
and then subtract a
21
times the new first row from the second
row, a
31
times the new first row from the third row … and a
n1
times the new first row from the nth
row. This gives
( ) ( ) ( )
( ) ( ) ( )
( ) ( ) ( )
1
0
0
0
12 11 13 11 1 11
22 21 11 12 23 21 11 13 2 21 11 1
32 31 11 12 33 31 11 13 3 31 11 1
2 1 11 12 3 1 11 13 1 11 1
1
2
3
a a a a a a
a a a a a a a a a a a a
a a a a a a a a a a a a
a a a a a a a a a a a a
x
x
x
x
n
n n
n n
n n n n nn n n n
K
M M
− − −
− − −
− − −
]
]
]
]
]
]
]
]

.
`
,
( )
( )
( )
·
−
−
−

.
`
,
r a
r a a r
r a a r
r a a r
n n
1 11
2 21 11 1
3 31 11 1
1 11 1
M
. (42)
By repeating this process for rows 3 to n, this time using the new contents of element 2,2, we
gradually replace the region below the leading diagonal with zeros. Once we have
1
0 1
0 1
0 0 0 1
12 13 1
23 2
3
1
2
3
1
2
3
$ $ $
$ $
$
$
$
$
$
a a a
a a
a
x
x
x
x
r
r
r
r
n
n
n
n n
K
M
K
M M
]
]
]
]
]
]
]
]

.
`
,
·

.
`
,
(43)
the final solution may be obtained by back substitution.
x r
x r a x
x r a x a x
x r a x a x a x
n n
n n n n n
n n n n n n n n
n n
·
· −
· − −
· − − − −
− − −
− − − − − −
$ ,
$ $ ,
$ $ $ ,
$ $ $ $ .
,
, ,
, , ,
1 1 1
2 2 2 1 1 2
1 1 1 2 2 1 3 3 1
M
L
(44)
If the arithmetic is exact, and the matrix A is not singular, then the answer computed in this
manner will be exact (provided no zeros appear on the diagonal  see below). However, as computer
arithmetic is not exact, there will be some truncation and rounding error in the answer. The
cumulative effect of this error may be very significant if the loss of precision is at an early stage in
the computation. In particular, if a numerically small number appears on the diagonal of the row,
then its use in the elimination of subsequent rows may lead to differences being computed between
Numerical Methods for Natural Sciences IB Introduction
– 32 –
very large and very small values with a consequential loss of precision. For example, if a
22
–
(a
21
/a
11
)a
12
were very small, 10
–6
, say, and both a
23
–(a
21
/a
11
)a
13
and a
33
–(a
31
/a
11
)a
13
were 1, say,
then at the next stage of the computation the 3,3 element would involve calculating the difference
between 1/10
–6
=10
6
and 1. If single precision arithmetic (representing real values using
approximately six significant digits) were being used, the result would be simply 1.0 and subsequent
calculations would be unaware of the contribution of a
23
to the solution. A more extreme case which
may often occur is if, for example, a
22
–(a
21
/a
11
)a
12
is zero – unless something is done it will not be
possible to proceed with the computation!
A zero value occuring on the leading diagonal does not mean the matrix is singular. Consider, for
example, the system
0 3 0
2 0 0
0 0 1
3
2
1
1
2
3
]
]
]
]
]

.
`
,
·

.
`
,
x
x
x
, (45)
the solution of which is obviously x
1
= x
2
= x
3
= 1. However, if we were to apply the Gauss
Elimination outlined above, we would need to divide through by a
11
= 0. Clearly this leads to
difficulties!
4.2 Pivoting
One of the ways around this problem is to ensure that small values (especially zeros) do not
appear on the diagonal and, if they do, to remove them by rearranging the matrix and vectors. In the
example given in (45) we could simply interchange rows one and two to produce
2 0 0
0 3 0
0 0 1
2
3
1
1
2
3
]
]
]
]
]

.
`
,
·

.
`
,
x
x
x
, (46)
or columns one and two to give
3 0 0
0 2 0
0 0 1
3
2
1
2
1
3
]
]
]
]
]

.
`
,
·

.
`
,
x
x
x
, (47)
either of which may then be solved using standard Guass Elimination.
More generally, suppose at some stage during a calculation we have
Numerical Methods for Natural Sciences IB Introduction
– 33 –
1 4 1 8 3 2 5
0 10 1 10 201 13 4
0 9 4 6 8 2 18
0 3 2 3 4 6003 15
0 15 1 9 33 2 1
0 155 23 4 25 73 2
0 8 56 4 4 4 88
6
1
2
3
4
5
6
1
2
3
4
5
6
K
M
K
M M
−
−
−
−
−
−
]
]
]
]
]
]
]
]
]
]
]
]
]

.
`
,
·

.
x
x
x
x
x
x
x
r
r
r
r
r
r
r
n n
$
$
$
$
$
$
$
`
,
(48)
where the element 2,5 (201) is numerically the largest value in the second row and the element 6,2
(155) the numerically largest value in the second column. As discussed above, the very small 10
–6
value for element 2,2 is likely to cause problems. (In an extreme case we might even have the value
0 appearing on the diagonal – clearly something must be done to avoid a divide by zero error
occurring!) To remove this problem we may again rearrange the rows and/or columns to bring a
larger value into the 2,2 element.
4.2.1 Partial pivoting
In partial or column pivoting, we rearrange the rows of the matrix and the righthand side to
bring the numerically largest value in the column onto the diagonal. For our example matrix the
largest value is in element 6,2 and so we simply swap rows 2 and 6 to give
1 4 1 8 3 2 5
0 155 23 4 25 73 2
0 9 4 6 8 2 18
0 3 2 3 4 6003 15
0 15 1 9 33 2 1
0 10 1 10 201 13 4
0 8 56 4 4 4 88
6
1
2
3
4
5
6
1
6
3
4
5
2
K
M
K
M M
−
−
−
−
−
]
]
]
]
]
]
]
]
]
]
]
]
]

.
`
,
·

.
−
x
x
x
x
x
x
x
r
r
r
r
r
r
r
n n
$
$
$
$
$
$
$
`
,
. (49)
Note that our variables remain in the same order which simplifies the implementation of this
procedure. The righthand side vector, however, has been rearranged. Partial pivoting may be
implemented for every step of the solution process, or only when the diagonal values are sufficiently
small as to potentially cause a problem. Pivoting for every step will lead to smaller errors being
introduced through numerical inaccuracies, but the continual reordering will slow down the
calculation.
4.2.2 Full pivoting
The philosophy behind full pivoting is much the same as that behind partial pivoting. The main
difference is that the numerically largest value in the column or row containing the value to be
replaced. In our example above element the magnitude of element 2,5 (201) is the greatest in either
Numerical Methods for Natural Sciences IB Introduction
– 34 –
row 2 or column 2 so we shall rearrange the columns to bring this element onto the diagonal. This
will also entail a rearrangement of the solution vector x. The rearranged system becomes
1 3 1 8 3 2 5
0 201 1 10 10 13 4
0 8 4 6 9 2 18
0 4 2 3 3 6003 15
0 33 1 9 15 2 1
0 25 23 4 155 73 2
0 4 56 4 8 4 88
6
1
5
3
4
2
6
1
2
3
4
5
6
K
M
K
M
M
−
−
−
−
−
−
]
]
]
]
]
]
]
]
]
]
]
]
]

.
`
,
·

.
x
x
x
x
x
x
x
b
b
b
b
b
b
b
n
n
$
$
$
$
$
$
$
`
,
. (50)
The ultimate degree of accuracy can be provided by rearranging both rows and columns so that
the numerically largest value in the submatrix not yet processed is brought onto the diagonal. In our
example above, the largest value is 6003 occurring at position 4,6 in the matrix. We may bring this
onto the diagonal for the next step by interchanging columns one and six and rows two and four.
The order in which we do this is unimportant. The final result is
1 4 1 8 3 2 5
0 6003 2 3 4 3 4
0 2 4 6 8 9 18
0 13 1 10 201 10 15
0 2 1 9 33 15 1
0 73 23 4 25 155 2
0 4 56 4 4 8 88
6
1
6
3
4
5
2
1
4
3
2
5
6
K
M
K
M M
−
−
−
−
−
]
]
]
]
]
]
]
]
]
]
]
]
]

.
`
,
·

.
−
x
x
x
x
x
x
x
r
r
r
r
r
r
r
n n
$
$
$
$
$
$
$
`
,
. (51)
Again this process may be undertaken for every step, or only when the value on the diagonal is
considered too small relative to the other values in the matrix.
If it is not possible to rearrange the columns or rows to remove a zero from the diagonal, then the
matrix A is singular and no solution exists.
4.3 LU factorisation
A frequently used form of Gauss Elimination is LU Factorisation also known as LU
Decomposition or Crout Factorisation. The basic idea is to find two matrices L and U such that
LU = A, (52)
Numerical Methods for Natural Sciences IB Introduction
– 35 –
where L is a lower triangular matrix (zero above the leading diagonal) and U is an upper triangular
matrix (zero below the diagonal). Note that this decomposition is underspecified in that we may
choose the relative scale of the two matrices arbitrarily. By convention, the L matrix is scaled to
have a leading diagonal of unit values. Once we have computed L and U we need solve only
Ly = b, (53)
then
Ux = y, (54)
a procedure requiring O(n
2
) operations compared with O(n
3
) operations for the full Gauss
elimination. While the factorisation process requires O(n
3
) operations, this need be done only once
whereas we may wish to solve Ax=b for with whole range of b.
Since we have decided the diagonal elements l
ii
in the lower triangular matrix will always be
unity, it is not necessary for us to store these elements and so the matrices L and U can be stored
together in an array the same size as that used for A. Indeed, in most implementations the
factorisation will simply overwrite A.
The basic decomposition algorithm for overwriting A with L and U may be expressed as
# Factorisation
FOR i=1 TO n
FOR p=i TO n
a a a a
pi pi pk ki
k
i
· −
·
−
∑
1
1
NEXT p
FOR q=i+1 TO n
a
a a a
a
iq
iq ik kq
k
i
ii
·
−
·
−
∑
1
1
NEXT q
NEXT i
# Forward Substitution
FOR i=1 TO n
FOR q=n+1 TO n+m
a
a a a
a
iq
iq ik kq
k
i
ii
·
−
·
−
∑
1
1
NEXT q
NEXT i
# Back Substitution
FOR i=n–1 TO 1
FOR q=n+1 TO n+m
a a a a
iq iq ik kq
k i
n
· −
· +
∑
1
NEXT q
NEXT i
This algorithm assumes the righthand side(s) are initially stored in the same array structure as the
matrix and are positioned in the column(s) n+1 (to n+m for m righthand sides). To improve the
Numerical Methods for Natural Sciences IB Introduction
– 36 –
efficiency of the computation for righthand sides known in advance, the forward substitution loop
may be incorporated into the factorisation loop.
Figure 10 indicates how the LU Factorisation process works. We want to find vectors l
i
T
and u
j
such that a
ij
= l
i
T
u
j
. When we are at the stage of calculating the ith element of u
j
, we will already
have the i nonzero elements of l
i
T
and the first i−1 elements of u
j
. The ith element of u
j
may
therefore be chosen simply as u
j(i)
= a
ij
− l
i
T
u
j
where the dotproduct is calculated assuming u
j(i)
is
zero.
l
i
T
u
j
a
ij
Figure 10: Diagramatic representation of how LU factorisation works for calculating u
ij
to replace a
ij
where
i < j. The white areas represent zeros in the L and U matrices.
As with normal Gauss Elimination, the potential occurrence of small or zero values on the
diagonal can cause computational difficulties. The solution is again pivoting – partial pivoting is
normally all that is required. However, if the matrix is to be used in its factorised form, it will be
essential to record the pivoting which has taken place. This may be achieved by simply recording
the row interchanges for each i in the above algorithm and using the same row interchanges on the
righthand side when using L in subsequent forward substitutions.
4.4 Banded matrices
The LU Factorisation may readily be modified to account for banded structure such that the only
nonzero elements fall within some distance of the leading diagonal. For example, if elements
outside the range a
i,i–b
to a
i,i+b
are all zero, then the summations in the LU Factorisation algorithm
need be performed only from k=i or k=i+1 to k=i+b. Moreover, the factorisation loop FOR q=i+1
TO n can terminate at i+b instead of n.
One problem with such banded structures can occur if a (near) zero turns up on the diagonal
during the factorisation. Care must then be taken in any pivoting to try to maintain the banded
structure. This may require, for example, pivoting on both the rows and columns as described in
section 4.2.2.
Making use of the banded structure of a matrix can save substantially on the execution time and,
if the matrix is stored intelligently, on the storage requirements. Software libraries such as NAG and
IMSL provide a range of routines for solving such banded linear systems in a computationally and
storage efficient manner.
4.5 Tridiagonal matrices
A tridiagonal matrix is a special form of banded matrix where all the elements are zero except for
those on and immediately above and below the leading diagonal (b=1). It is sometimes possible to
Numerical Methods for Natural Sciences IB Introduction
– 37 –
rearrange the rows and columns of a matrix which does not initially have this structure in order to
gain this structure and hence greatly simplify the solution process. As we shall see later in sections 6
to 8, tridiagonal matrices frequently occur in numerical solution of differential equations.
A tridiagonal system may be written as
b c
a b c
a b c
a b c
a b
b c
a b c
a b c
a b
n n
n n n
n n n
n n
1 1
2 2 2
3 3 3
4 4 4
5 5
3 3
2 2 2
1 1 1
0 0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0
0 0 0 0 0 0 0
L
L
L
L
L
M M M M M O M M M M
L
L
L
L
− −
− − −
− − −
]
]
]
]
]
]
]
]
]
]
]
]
]
]
]
]

.
`
,
·

.
`
,
−
−
−
−
−
−
x
x
x
x
x
x
x
x
x
r
r
r
r
r
r
r
r
r
n
n
n
n
n
n
n
n
1
2
3
4
5
3
2
1
1
2
3
4
5
3
2
1
M M
(55)
or
a
i
x
i–1
+ b
i
x
i
+ c
i
x
i+1
= r
i
(56)
for i=1,…,n. Clearly x
–1
and x
n+1
are not required and we set a
1
=c
n
=0 to reflect this.
If solved by standard Gauss elimination, then we start by dividing the first equation by b
0
before
subtracting a
2
times this from the second equation to eliminate a
2
. We then divide the new second
equation by the new value in place of b
2
, before subtracting a
3
times this equation from the third,
and so on.
Solution, by analogy with the LU Factorisation, may be expressed as
# Factorisation
FOR i=1 TO n
b
i
= b
i
– a
i
c
i–1
c
i
= c
i
/b
i
NEXT i
# Forward Substitution
FOR i=1 TO n
r
i
= (r
i
– a
i
r
i–1
)/b
i
NEXT i
# Back Substitution
FOR i=n–1 TO 1
r
i
= r
i
– c
i
r
i+1
NEXT i
4.6 Other approaches to solving linear systems
There are a number of other methods for solving general linear systems of equations including
approximate iterative techniques. Many large matrices which need to be solved in practical
situations have very special structures which allow solution  either exact or approximate  much
faster than the general O(n
3
) solvers presented here. We shall return to this topic in section 8.1
Numerical Methods for Natural Sciences IB Introduction
– 38 –
where we shall discuss a system with a special structure resulting from the numerical solution of the
Laplace equation.
4.7 Over determined systems
*
If the matrix A contains m rows and n columns, with m > n, the system is probably over
determined (unless there are m–n redundant rows). Such a system may be the result from fitting a
model with unknown coefficients to experimental data or observations. For example, fitting data
points s
i
,r
i
(i = 0,n−1) with the model a + bs + cs
2
+ de
s
= r leads to the linear system
1
1
1
1
1
1
1
0 0
2
1 1
2
2 2
2
3 3
2
4 4
2
5 5
2
1 1
2
0
1
2
3
4
5
1
0
1
2
3
4
5
1
s s e
s s e
s s e
s s e
s s e
s s e
s s e
a
b
c
d
r
r
r
r
r
r
r
s
s
s
s
s
s
n n
s
n
n
M M M M M
− − −
−
]
]
]
]
]
]
]
]
]
]
]
]
]

.
`
,
·

.
`
,
. (57)
which is of the form Ax = r, where x
T
= (a,b,c,d) While the solution to Ax = r will not exist in an
algebraic sense, it can be valuable to determine the solution in an approximate sense. The error in
this approximate solution is then
e = Ax – r. (58)
The approximate solution is chosen by optimising this error in some manner. Most useful among
the classes of solution is the Least Squares solution. In this solution we minimise the residual sum
of squares, which is simply
rss = e
T
e. (59)
Substituting for e we obtain
rss = [x
T
A
T
− r
T
][Ax − r]
= x
T
A
T
Ax − 2x
T
A
T
r + r
T
r, (60)
and setting ∂ ∂ rss x to zero gives
∂
∂
rss
x
x r · − · 2 2 0 A A A
T T
. (61)
Thus, if we solve the n by n problem A
T
Ax = A
T
r, the solution vector x will give us the solution in a
least squares sense.
*
Not examinable
Numerical Methods for Natural Sciences IB Introduction
– 39 –
Warning: The matrix A
T
A is often poorly conditioned (nearly singular) and can lead to
significant errors in the resulting Least Squares solution due to rounding error. While these errors
may be reduced using pivoting in combination with Gauss Elimination, it is generally better to solve
the Least Squares problem using the Householder transformation, as this produces less rounding
error, or better still by Singular Value Decomposition which will highlight any redundant or nearly
redundant variables in x.
The Householder transformation avoids the poorly conditioned nature of A
T
A by solving the
problem directly without evaluating this matrix. Suppose Q is an orthogonal matrix such that
Q
T
Q = I, (62)
where I is the identity matrix and Q is chosen to transform A into
QA
R
0
·
]
]
]
, (63)
where R is a square matrix of a size n and 0 is a zero matrix of size mn by n. The righthand side of
the system QAx = Qr becomes
Qr =
b
c

.
`
,
, (64)
where b is a vector of size n and c is a vector of size mn.
Now the turning point (global minimum) in the residual sum of squares, (61), this occurs when
[ ]
[ ]
[ ] [ ]
[ ]
[ ] [ ]
[ ]
∂
∂
rss
x
x r
x r
x r
x r
x b
· −
· −
· −
· −
· −
2
2
2
2
2
A A A
A Q QA A Q Q
QA QA QA Q
QA QA Q
R R
T T
T T T T
T T
T
T
(65)
vanishes. For a nontrivial solution, that occurs when
Rx = b. (66)
This system may be solved to obtain the least squares solution x using any of the normal linear
solvers discussed above.
Further discussion of these methods is beyond the scope of this course.
Numerical Methods for Natural Sciences IB Introduction
– 40 –
4.8 Under determined systems
*
If the matrix A contains m rows and n columns, with m < n, the system is under determined. The
solution maps out a n–m dimensional subregion in n dimensional space. Solution of such systems
typically requires some form of optimisation in order to further constrain the solution vector.
Linear programming represents one method for solving such systems. In Linear Programming,
the solution is optimised such that the objective function z=c
T
x is minimised. The “Linear” indicates
that the underdetermined system of equations is linear and the objective function is linear in the
solution variable x. The “Programming” arose to enhance the chances of obtaining funding for
research into this area when it was developing in the 1960s.
*
Not examinable
Numerical Methods for Natural Sciences IB Introduction
– 41 –
5 Numerical integration
There are two main reasons for you to need to do numerical integration: analytical integration
may be impossible or infeasible, or you may wish to integrate tabulated data rather than known
functions. In this section we outline the main approaches to numerical integration. Which is
preferable depends in part on the results required, and in part on the function or data to be
integrated.
5.1 Manual method
If you were to perform the integration by hand, one approach is to superimpose a grid on a graph
of the function to be integrated, and simply count the squares, counting only those covered by 50%
or more of the function. Provided the grid is sufficiently fine, a reasonably accurate estimate may be
obtained. Figure 11 demonstrates how this may be achieved.
x
1
x
0
Figure 11: Manual method for determining integral by superimposing a grid on a graph of the integrand. The
boxes indicated in grey are counted.
5.2 Constant rule
Perhaps the simplest form of numerical integration is to assume the function f(x) is constant over the
interval being integrated. Such a scheme is illustrated in figure 12. Clearly this is not going to be a
very accurate method of integrating, and indeed leads to an ambiguous result, depending on whether
the constant is selected from the lower or the upper limit of the integral.
Numerical Methods for Natural Sciences IB Introduction
– 42 –
x
1
=x
0
+∆x
x
0
Figure 12: Integration by constant rule whereby the value of f(x) is assumed constant over the interval.
Integration of a Taylor Series expansion of f(x) shows the error in this approximation to be
( ) ( ) ( )( ) ( )( )
( ) ( ) ( )
( ) ( )
f x x f x f x x x f x x x x
f x x f x x f x x
f x x O x
x
x x
x
x x
d d
0 0
0 0 0
1
2 0 0
2
0
1
2 0
2 1
6 0
3
0
2
+ +
∫ ∫
· + ′ − + ′′ − +
· + ′ + ′′ +
· +
∆ ∆
∆ ∆ ∆
∆ ∆
K
K
,
(67)
if the constant is taken from the lower limit, or f(x
0
+∆x)∆x if taken from the upper limit. In both
cases the error is O(∆x
2
), with the coefficient being derived from f’(x).
Clearly we can do much better than this, and as a result this rule is not used in practice, although
a knowledge of it helps with understanding the solution of ordinary differential equations (see §6).
5.3 Trapezium rule
Consider the Taylor Series expansion integrated from x
0
to x
0
+∆x:
( )
( ) ( )( ) ( )( )
( ) ( ) ( )
( ) ( ) ( ) ( ) ( ) ( )
[ ]
( ) ( ) ( ) ( )
f x x f x f x x x f x x x x
f x x f x x f x x
f x f x f x x f x x f x x x
f x f x x x O x
x
x x
x
x x
d d
0 0
0 0 0
1
2 0 0
2
0
1
2 0
2 1
6 0
3
1
2 0
1
2 0 0
1
2 0
2 1
12 0
2
1
2 0 0
3
+ +
∫ ∫
· + ′ − + ′′ − +
· + ′ + ′′ +
· + + ′ + ′′ + − ′′ +
· + + +
∆ ∆
∆ ∆ ∆
∆ ∆ ∆ ∆
∆ ∆ ∆
K
K
K K
. (68)
Numerical Methods for Natural Sciences IB Introduction
– 43 –
The approximation represented by
1
/
2
[f(x
0
) + f(x
0
+∆x)]∆x is called the Trapezium Rule based on its
geometric interpretation as shown in figure 13.
x
1
=x
0
+∆x
x
0
Figure 13: Graphical interpretation of the trapezium rule.
As we can see from equation (68), the error in the Trapezium Rule is proportional to ∆x
3
. Thus, if
we were to halve ∆x, the error would be decreased by a factor of eight. However, the size of the
domain would be halved, thus requiring the Trapezium Rule to be evaluated twice and the
contributions summed. The net result is the error decreasing by a factor of four rather than eight.
The Trapezium Rule used in this manner is sometimes termed the Compound Trapezium Rule, but
more often simply the Trapezium Rule. In general it consists of the sum of integrations over a
smaller distance ∆x to obtain a smaller error.
Suppose we need to integrate from x
0
to x
1
. We shall subdivide this interval into n steps of size
∆x=(x
1
–x
0
)/n as shown in figure 14.
Numerical Methods for Natural Sciences IB Introduction
– 44 –
x
1
=x
0
+n∆x
x
0
Figure 14: Compound Trapezium Rule.
The Compound Trapezium Rule approximation to the integral is therefore
( ) ( )
( )
( )
( )
( )
( ) ( ) ( )
( )
( ) ( )
[ ]
f x x f x x
x
f x i x f x i x
x
f x f x x f x x f x n x f x
x
x
x i x
x i x
i
n
i
n
d d
0
1
0
0
1
0
1
0 0
0
1
0 0 0 0 1
2
1
2
2 2 2 2 1
∫ ∫
∑
∑
·
≈ + + + +
· + + + + + + + − +
+
+ +
·
−
·
−
∆
∆
∆
∆ ∆
∆
∆ ∆ ∆ K
. (69)
While the error for each step is O(∆x
3
), the cumulative error is n times this or O(∆x
2
) ~ O(n
2
).
The above analysis assumes ∆x is constant over the interval being integrated. This is not
necessary and an extension to this procedure to utilise a smaller step size ∆x
i
in regions of high
curvature would reduce the total error in the calculation, although it would remain O(∆x
2
). We
would choose to reduce ∆x in the regions of high curvature as we can see from equation (68) that
the leading order truncation error is scaled by f".
5.4 Midpoint rule
A variant on the Trapezium Rule is obtained by integrating the Taylor Series from x
0
−∆x/2 to
x
0
+∆x/2:
Numerical Methods for Natural Sciences IB Introduction
– 45 –
( )
( ) ( )( ) ( )( )
( ) ( )
f x x f x f x x x f x x x x
f x x f x x
x x
x x
x x
x x
d d
0
1
2
0
1
2
0
1
2
0
1
2
0 0 0
1
2 0 0
2
0
1
24 0
3
−
+
−
+
∫ ∫
· + ′ − + ′′ − +
· + ′′ +
∆
∆
∆
∆
∆ ∆
K
K
. (70)
By evaluating the function f(x) at the midpoint of each interval the error may be slightly reduced
relative to the Trapezium rule (the coefficient in front of the curvature term is 1/24 for the Mid
point Rule compared with 1/12 for the Trapezium Rule) but the method remains of the same order.
Figure 15 provides a graphical interpretation of this approach.
x
0
+ ½∆x
x
0
− ½∆x
Figure 15: Graphical interpretation of the midpoint rule. The grey region defines the midpoint rule as a
rectangular approximation with the dashed lines showing alternative trapeziodal aproximations containing
the same area.
Again we may reduce the error when integrating the interval x
0
to x
1
by subdividing it into n
smaller steps. This Compound Midpoint Rule is then
( ) ( ) ( ) f x x x f x i x
x
x
i
n
d
0
1
0
1
2
0
1
∫
∑
≈ + +
·
−
∆ ∆ , (71)
with the graphical interpretation shown in figure 16. The difference between the Trapezium Rule
and Midpoint Rule is greatly diminished in their compound forms. Comparison of equations (69)
and (71) show the only difference is in the phase relationship between the points used and the
domain, plus how the first and last intervals are calculated.
Numerical Methods for Natural Sciences IB Introduction
– 46 –
x
1
=x
0
+n∆x
x
0
Figure 16: Compound Midpoint Rule.
There are two further advantages of the Midpoint Rule over the Trapezium Rule. The first is that
is requires one fewer function evaluations for a given number of subintervals, and the second that it
can be used more effectively for determining the integral near an integrable singularity. The reasons
for this are clear from figure 17.
Numerical Methods for Natural Sciences IB Introduction
– 47 –
Figure 17: Applying the Midpoint Rule where the singular integrand would cause the Trapezium Rule to
fail.
5.5 Simpson’s rule
An alternative approach to decreasing the step size ∆x for the integration is to increase the
accuracy of the functions used to approximate the integrand. Figure 18 sketches one possibility,
using a quadratic approximation to f(x).
Figure 18: Quadratic approximation to integrand is the basis of Simpson’s Rule.
Integrating the Taylor series over an interval 2∆x shows
Numerical Methods for Natural Sciences IB Introduction
– 48 –
( )
( ) ( ) ( ) ( ) ( ) f x x f x x f x x f x x f x x f x x
x
x x
iv
d
0
2
0 0
2 4
3 0
3 2
3 0
4 4
15 0
5
2 2
+
∫
· + ′ + ′′ + ′′′ +
∆
∆ ∆ ∆ ∆ ∆ K (72)
( )
[
( ) ( ) ( ) ( ) ( ) ( )
( ) ( ) ( ) ( ) ( ) ( )
( )
]
( ) ( ) ( ) ( ) ( )
·
+ + ′ + ′′ + ′′′ + +
+ + ′ + ′′ + ′′′ + +
−
· + + + + +
∆
∆ ∆ ∆ ∆
∆ ∆ ∆ ∆
∆
∆
∆ ∆ ∆
x
f x
f x f x x f x x f x x f x x
f x f x x f x x f x x f x x
f x x
x
f x f x x f x x O x
iv
iv
iv
3
4
2 2
3
4 2
0
0 0
1
2 0
2 1
6 0
3 1
24 0
4
0 0 0
2 4
3 0
3 2
3 0
4
17
30 0
4
0 0 0
5
K
K
K
Whereas the error in the Trapezium rule was O(∆x
3
), Simpson’s rule is two orders more accurate at
O(∆x
5
), giving exact integration of cubics.
To improve the accuracy when integrating over larger intervals, the interval x
0
to x
1
may again be
subdivided into n steps. The threepoint evaluation for each subinterval requires that there are an
even number of subintervals. Hence we must be able to express the number of intervals as n=2m.
The Compound Simpson’s rule is then
( )
( )
( )
( )
( )
( )
( ) ( ) ( )
( )
( ) ( )
[ ]
f x x
x
f x i x f x i x f x i x
x
f x f x x f x x f x n x f x
x
x
i
m
d
0
1
3
2 4 2 1 2 2
3
4 2 2 4 1
0 0 0
0
1
0 0 0 0 1
∫
∑
≈ + + + + + + +
· + + + + + + + − +
·
−
∆
∆ ∆ ∆
∆
∆ ∆ ∆ K
, (73)
and the corresponding error O(n∆x
5
) or O(∆x
4
).
5.6 Quadratic triangulation
*
Simpson’s Rule may be employed in a manual way to determine the integral with nothing more
than a ruler. The approach is to cover the domain to be integrated with a triangle or trapezium
(whichever is geometrically more appropriate) as is shown in figure 19. The integrand may cross the
side of the trapezium (triangle) connecting the end points. For each arclike region so created (there
are two in figure 19) the maximum deviation (indicated by arrows in figure 19) from the line should
be measured, as should the length of the chord joining the points of crossing. From Simpson’s rule
we may approximate the area between each of these arcs and the chord as
area =
2
/
3
× chord× maxDeviation, (74)
remembering that some increase the area while others decrease it relative to the initial trapezoidal
(triangular) estimate. The overall estimate (ignoring linear measurement errors) will be O(l
5
), where
l is the length of the (longest) chord.
*
Not examinable
Numerical Methods for Natural Sciences IB Introduction
– 49 –
x
1
=x
0
+n∆x
x
0
Figure 19: Quadratic triangulation to determine the area using a manual combination of the Trapezium and
Simpson’s Rules.
5.7 Romberg integration
With the Compound Trapezium Rule we know from section 5.3 the error in some estimate T(∆x)
of the integral I using a step size ∆x goes like c∆x
2
as ∆x→0, for some constant c. Likewise the error
in T(∆x/2) will be c∆x
2
/4. From this we may construct a revised estimate T
(1)
(∆x/2) for I as a
weighted mean of T(∆x) and T(∆x/2):
T
(1)
(∆x/2) = αT(∆x/2) + (1–α)T(∆x)
= α[I + c∆x
2
/4 + O(∆x
4
)] + (1–α)[I + c∆x
2
+ O(∆x
4
)]. (75)
By choosing the weighting factor α = 4/3 we elimate the leading order (O(∆x
2
)) error terms,
relegating the error to O(∆x
4
). Thus we have
T
(1)
(∆x/2) = [4T(∆x/2) – T(∆x)]/3. (76)
Comparison with equation (73) shows that this formula is precisely that for Simpson’s Rule.
This same process may be carried out to higher orders using ∆x/4, ∆x/8, … to eliminate the
higher order error terms. For the Trapezium Rule the errors are all even powers of ∆x and as a result
it can be shown that
T
(m)
(∆x/2) = [2
2m
T
(m–1)
(∆x/2) – T
(m–1)
(∆x)]/(2
2m
–1). (77)
A similar process may also be applied to the Compound Simpson’s Rule.
Numerical Methods for Natural Sciences IB Introduction
– 50 –
5.8 Gauss quadrature
By careful selection of the points at which the function is evaluated it is possible to increase the
precision for a given number of function evaluations. The Midpoint rule is an example of this: with
just a single function evaluation it obtains the same order of accuracy as the Trapezium Rule (which
requires two points).
One widely used example of this is Gauss quadrature which enables exact integration of cubics
with only two function evaluations (in contrast Simpson’s Rule, which is also exact for cubics,
requires three function evaluations). Gauss quadrature has the formula
( )
( ) f x x
x
f x x f x x O x
x
x x x
d
0
1 0
2
1
2
3
6
1
2
3
6
0 0
4
· +
∫
≈ + −

.
`
,

.
`
,
+ + +

.
`
,

.
`
,
]
]
]
]
+
∆
∆
∆ ∆ ∆ . (78)
In general it is possible to choose M function evaluations per interval to obtain a formula exact for
all polynomials of degree 2M−1 and less.
The Gauss Quadrature accurate to order 2M−1 may be determined using the same approach
required for the twopoint scheme. This may be derived by comparing the Taylor Series expansion
for the integral with that for the points x
0
+α∆x and x
0
+β∆x:
( )
( ) ( ) ( ) ( ) ( )
( ) ( )
( )
( )
( )
( )
( ) ( )
( )
( )
( )
( )
( ) ( ) ( )
f x x xf x
x
f x
x
f x
x
f x O x
x
f x xf x
x
f x
x
f x
f x xf x
x
f x
x
f x
xf x
x
f x
x
x x x
d
0
1 0
0
2
0
3
0
4
0
5
0 0
2
0
3
0
0 0
2
0
3
0
0
2
0
2
2 6 24
2 2 6
2 6
2
· +
∫
· + ′ + ′′ + ′′′ +
· + ′ + ′′ + ′′′ +
+ + ′ + ′′ + ′′′ +
]
]
]
]
· + + ′ +
∆
∆
∆ ∆ ∆
∆
∆
∆
∆
α∆
α∆ α∆
β∆
β∆ β∆
α β α
K
K
( ) ( ) + + + + β α β
2
3
3 3
4
4 12
∆ ∆ x x
K
. (79)
Equating the various terms reveals
α + β = 1
(α
2
+ β
2
)/4 = 1/6, (80)
the solution of which gives the positions stated in equation (78).
Using three function evaluations: symmetry suggests for the interval x
0
−∆x to x
0
+∆x the points
should be evaluated at x
0
and x
0
tα with weightings 2∆xA and 2∆xB. The Taylor Series expansion of
the integral gives
( )
( ) ( ) ( ) ( ) ( ) f x x x f x
x
f x
x
f x
x
f x O x
x x
x x
iv vi
d
0
0
2
6 120 5040
0
2
0
4
0
6
0
8
−
+
∫
· + ′′ + + +
]
]
]
∆
∆
∆
∆ ∆ ∆
∆ , (81)
while the expansion of the function for the three points gives
Numerical Methods for Natural Sciences IB Introduction
– 51 –
( )
( ) ( ) ( ) [ ]
[
( ) ( )
( ) ( )
]
f x x x Af x Bf x Bf x
x Af
B f f f f f f f f O
B f f f f f f f f O
x x
x x
iv v vi vii
iv v vi vii
d
0
0
2
2
0 0 0
1
2
2 1
6
3 1
24
4 1
120
5 1
720
6 1
5040
7 8
1
2
2 1
6
3 1
24
4 1
120
5 1
720
6 1
5040
7 8
−
+
∫
· + − + +
·
+ − ′ + ′′ − ′′′ + − + − +
+ + ′ + ′′ + ′′′ + + + + +
∆
∆
∆
∆
α α
α α α α α α α α
α α α α α α α α
( ) ( ) [ ]
· + + ′′ + + + 2 2
2 1
12
4 1
360
6 8
∆x A B f B f B f f O
iv vi
α α α α . (82)
Comparing the terms between (81) and (82) gives
A + 2B = 1
Bα
2
= ∆x
2
/6
Bα
4
/12 = ∆x
4
/120 (83)
which may be solved to obtain the position of the evaluations and the weightings
(Bα
4
/12)/Bα
2
= (∆x
4
/120) / (∆x
2
/6)
⇒ α
2
= (6/10)∆x
2
⇒ α = (3/5)
1/2
∆x
B = 5/18
A = 4/9, (84)
thus
( )
( ) f x x x f x f x x f x x
x x
x x
d
0
0
1
9
8 5
3
5
5
3
5
0 0
1 2
0
1 2
−
+
∫
· + −

.
`
,

.
`
,
+ +

.
`
,

.
`
,
]
]
]
]
∆
∆
∆ ∆ ∆
/ /
. (85)
5.9 Example of numerical integration
Consider the integral
sin x x d
0
2
π
∫
· , (86)
which may be integrated numerically using any of the methods described in the previous sections.
Tables 2 to 6 summarise the error in the numerical estimates for the Trapezium Rule, Midpoint
Rule, Simpson’s Rule, Gauss Quadrature and a three point Gauss Quadrature (formula derived in
lectures). Table 7 compares these errors. The results are presented in terms of the number of
function evaluations required. The calculations were performed in double precision.
Numerical Methods for Natural Sciences IB Introduction
– 52 –
No.
intervals
No. f(x) Trapezium Rule Error Ratio:
e
2n
/e
n
1 2 2.00000000 0.2146
2 3 0.429203673 0.24203
4 5 0.103881102 0.24806
8 9 0.0257683980 0.24952
16 17 0.00642965622 0.24988
32 33 0.00160663902 0.24997
64 65 0.000401611359 0.24999
128 129 0.000100399815 0.25
256 257 0.0000250997649 0.25
512 513 0.00000627492942 0.25
1024 1025 0.00000156873161 0.25
2048 2049 0.000000392182860 0.25
4096 4097 0.0000000980457133 0.25
8192 8193 0.0000000245114248 0.25
16384 16385 0.00000000612785222 0.25
32768 32769 0.00000000153194190 0.24999
65536 65537 0.000000000382977427 0.24994
131072 131073 0.0000000000957223189 0.25014
262144 262145 0.0000000000239435138 0.24898
524288 524289 0.00000000000596145355
Table 2: Error in Trapezium Rule. Note error ratio → 2
−2
(∆x
2
)
No.
intervals
No. f(x) Midpoint Rule Error Ratio:
e
2n
/e
n
1 1 1.14189790 0.19392
2 2 0.221441469 0.23638
4 4 0.0523443059 0.24662
8 8 0.0129090855 0.24916
16 16 0.00321637816 0.24979
32 32 0.000803416309 0.24995
64 64 0.000200811728 0.24999
128 128 0.0000502002859 0.25
256 256 0.0000125499060 0.25
512 512 0.00000313746618 0.25
1024 1024 0.000000784365898 0.25
2048 2048 0.000000196091438 0.25
4096 4096 0.0000000490228564 0.25
8192 8192 0.0000000122557182 0.25
16384 16384 0.00000000306393221 0.25
32768 32768 0.000000000765979280 0.25
65536 65536 0.000000000191497040 0.24979
131072 131072 0.0000000000478341810 0.25081
262144 262144 0.0000000000119970700 0.25286
524288 524288 0.00000000000303357339
Table 3: Error in Midpoint Rule. Note error ratio → 2
−2
(∆x
2
)
No.
intervals
No. f(x) Simpson’s Rule Error Ratio:
e
2n
/e
n
1 3 0.0943951023 0.0483
2 5 0.00455975498 0.05903
4 9 0.000269169948 0.06164
8 17 0.0000165910479 0.06228
16 33 0.00000103336941 0.06245
32 65 0.0000000645300022 0.06249
64 129 0.00000000403225719 0.0625
Numerical Methods for Natural Sciences IB Introduction
– 53 –
128 257 0.000000000252001974 0.0625
256 513 0.0000000000157500679 0.0624
512 1025 0.000000000000982769421
Table 4: Error in Simpson’s Rule. Note error ratio → 2
−4
(∆x
4
)
No.
intervals
No. f(x) Gauss Quadrature Error Ratio:
e
2n
/e
n
1 2 0.0641804253 0.0476
2 4 0.00305477319 0.05881
4 8 0.000179666460 0.06158
8 16 0.0000110640837 0.06227
16 32 0.000000688965642 0.06244
32 64 0.0000000430208237 0.06249
64 128 0.00000000268818500 0.0625
128 256 0.000000000168002278 0.06249
256 512 0.0000000000104984909 0.06248
512 1024 0.000000000000655919762
Table 5: Error in Gauss Quadrature. Note error ratio → 2
−4
(∆x
4
)
No.
intervals
No.
f(x)
Gauss Quadrature  three point Error Ratio:
e
2n
/e
n
1 3 0.001388913607743625 0.01169
2 6 0.00001624311099668319 0.01464
4 12 0.0000002378219958742989 0.01538
8 24 0.000000003657474767493341 0.01556
16 48 0.00000000005692291082937118 0.0156
32 96 0.0000000000008881784197001252 0.016
64 192 0.00000000000001421085471520200 0.01563
128 384 0.0000000000000002220446049250313 1
256 768 0.0000000000000002220446049250313
Table 6: Error in Three point Gauss Quadrature. Note error ratio → 2
−6
(∆x
6
)
No.
Intervals
Trapezium
Rule
Midpoint
Rule
Simpson’s
Rule
Gauss
Quadrature
3 Pnt Gauss
Quadrature
1 2.0000E+00 1.1418E+00 9.4395E02 6.4180E02 1.3889E03
2 4.2920E01 2.2144E01 4.5597E03 3.0547E03 1.6243E05
4 1.0388E01 5.2344E02 2.6916E04 1.7966E04 2.3782E07
8 2.5768E02 1.2909E02 1.6591E05 1.1064E05 3.6574E09
16 6.4296E03 3.2163E03 1.0333E06 6.8896E07 5.6922E11
32 1.6066E03 8.0341E04 6.4530E08 4.3020E08 8.8817E13
64 4.0161E04 2.0081E04 4.0322E09 2.6881E09 1.4210E14
128 1.0039E04 5.0200E05 2.5200E10 1.6800E10 2.2204E16
256 2.5099E05 1.2549E05 1.5750E11 1.0498E11 2.2204E16
512 6.2749E06 3.1374E06 9.8276E13 6.5591E13
1024 1.5687E06 7.8436E07
2048 3.9218E07 1.9609E07
4096 9.8045E08 4.9022E08
8192 2.4511E08 1.2255E08
16384 6.1278E09 3.0639E09
32768 1.5319E09 7.6597E10
65536 3.8297E10 1.9149E10
131072 9.5722E11 4.7834E11
262144 2.3943E11 1.1997E11
524288 5.9614E12 3.0335E12
1048576
Table 7: Error in numerical integration of (86) as a function of the number of subintervals.
Numerical Methods for Natural Sciences IB Introduction
– 54 –
5.9.1 Program for numerical integration
*
Note that this program is written for clarity rather than speed. The number of function
evaluations actually computed may be approximately halved for the Trapezium rule and reduced by
one third for Simpson’s rule if the compound formulations are used. Note also that this example is
included for illustrative purposes only. No knowledge of Fortran or any other programming
language is required in this course.
PROGRAM Integrat
REAL*8 x0,x1,Value,Exact,pi
INTEGER*4 i,j,nx
C=====Functions
REAL*8 TrapeziumRule
REAL*8 MidpointRule
REAL*8 SimpsonsRule
REAL*8 GaussQuad
C=====Constants
pi = 2.0*ASIN(1.0D0)
Exact = 2.0
C=====Limits
x0 = 0.0
x1 = pi
C=======================================================================
C= Trapezium rule =
C=======================================================================
WRITE(6,*)
WRITE(6,*)’Trapezium rule’
nx = 1
DO i=1,20
Value = TrapeziumRule(x0,x1,nx)
WRITE(6,*)nx,Value,Value  Exact
nx = 2*nx
ENDDO
C=======================================================================
C= Midpoint rule =
C=======================================================================
WRITE(6,*)
WRITE(6,*)’Midpoint rule’
nx = 1
DO i=1,20
Value = MidpointRule(x0,x1,nx)
WRITE(6,*)nx,Value,Value  Exact
nx = 2*nx
ENDDO
C=======================================================================
C= Simpson’s rule =
C=======================================================================
WRITE(6,*)
WRITE(6,*)’Simpson’’s rule’
WRITE(6,*)
nx = 2
DO i=1,10
Value = SimpsonsRule(x0,x1,nx)
WRITE(6,*)nx,Value,Value  Exact
nx = 2*nx
ENDDO
C=======================================================================
C= Gauss Quadrature =
C=======================================================================
*
Not examinable
Numerical Methods for Natural Sciences IB Introduction
– 55 –
WRITE(6,*)
WRITE(6,*)’Gauss quadrature’
nx = 1
DO i=1,10
Value = GaussQuad(x0,x1,nx)
WRITE(6,*)nx,Value,Value  Exact
nx = 2*nx
ENDDO
END
FUNCTION f(x)
C=====parameters
REAL*8 x,f
f = SIN(x)
RETURN
END
REAL*8 FUNCTION TrapeziumRule(x0,x1,nx)
C=====parameters
INTEGER*4 nx
REAL*8 x0,x1
C=====functions
REAL*8 f
C=====local variables
INTEGER*4 i
REAL*8 dx,xa,xb,fa,fb,Sum
dx = (x1  x0)/DFLOAT(nx)
Sum = 0.0
DO i=0,nx1
xa = x0 + DFLOAT(i)*dx
xb = x0 + DFLOAT(i+1)*dx
fa = f(xa)
fb = f(xb)
Sum = Sum + fa + fb
ENDDO
Sum = Sum * dx / 2.0
TrapeziumRule = Sum
RETURN
END
REAL*8 FUNCTION MidpointRule(x0,x1,nx)
C=====parameters
INTEGER*4 nx
REAL*8 x0,x1
C=====functions
REAL*8 f
C=====local variables
INTEGER*4 i
REAL*8 dx,xa,fa,Sum
dx = (x1  x0)/Dfloat(nx)
Sum = 0.0
DO i=0,nx1
xa = x0 + (DFLOAT(i)+0.5)*dx
fa = f(xa)
Sum = Sum + fa
ENDDO
Sum = Sum * dx
MidpointRule = Sum
RETURN
END
REAL*8 FUNCTION SimpsonsRule(x0,x1,nx)
C=====parameters
INTEGER*4 nx
REAL*8 x0,x1
Numerical Methods for Natural Sciences IB Introduction
– 56 –
C=====functions
REAL*8 f
C=====local variables
INTEGER*4 i
REAL*8 dx,xa,xb,xc,fa,fb,fc,Sum
dx = (x1  x0)/DFLOAT(nx)
Sum = 0.0
DO i=0,nx1,2
xa = x0 + DFLOAT(i)*dx
xb = x0 + DFLOAT(i+1)*dx
xc = x0 + DFLOAT(i+2)*dx
fa = f(xa)
fb = f(xb)
fc = f(xc)
Sum = Sum + fa + 4.0*fb + fc
ENDDO
Sum = Sum * dx / 3.0
SimpsonsRule = Sum
RETURN
END
REAL*8 FUNCTION GaussQuad(x0,x1,nx)
C=====parameters
INTEGER*4 nx
REAL*8 x0,x1
C=====functions
REAL*8 f
C=====local variables
INTEGER*4 i
REAL*8 dx,xa,xb,fa,fb,Sum,dxl,dxr
dx = (x1  x0)/DFLOAT(nx)
dxl = dx*(0.5D0  SQRT(3.0D0)/6.0D0)
dxr = dx*(0.5D0 + SQRT(3.0D0)/6.0D0)
Sum = 0.0
DO i=0,nx1
xa = x0 + DFLOAT(i)*dx + dxl
xb = x0 + DFLOAT(i)*dx + dxr
fa = f(xa)
fb = f(xb)
Sum = Sum + fa + fb
ENDDO
Sum = Sum * dx / 2.0
GaussQuad = Sum
RETURN
END
Numerical Methods for Natural Sciences IB Introduction
– 57 –
6 First order ordinary differential equations
Ths section of the course introduces some commonly used methods for determining the
numerical solutions of ordinary differential equations. These methods will then be used in §8 as the
basis for solving some types of partial differential equations.
6.1 Taylor series
The key idea behind numerical solution of odes is the combination of function values at different
points or times to approximate the derivatives in the required equation. The manner in which the
function values are combined is determined by the Taylor Series expansion for the point at which
the derivative is required. This gives us a finite difference approximation to the derivative.
6.2 Finite difference
Consider a first order ode of the form
( )
d
d
y
t
f t y · , , (87)
subject to some boundary/initial condition y(t=t
0
) = c. The finite difference solution of this equation
proceeds by discretising the independent variable t to t
0
, t
0
+∆t, t
0
+2∆t, t
0
+3∆t, … We shall denote
the exact solution at some t = t
n
= t
0
+n∆t by y
n
= y(t=t
n
) and our approximate solution by Y
n
. We
then look to solve
Y’
n
= f(t
n
,Y
n
) (88)
at each of the points in the domain, where Y’
n
is an approximation to dy/dt based on the discrete set
of Y
n
.
If we take the Taylor Series expansion for the grid points in the neighbourhood of some point
t = t
n
,
( )
( )
( )
L
Y Y tY t Y t Y O t
Y Y tY t Y t Y O t
Y Y
Y Y tY t Y t Y O t
Y Y tY t Y t
n n n n n
n n n n n
n n
n n n n n
n n n n
−
−
+
+
· − ′ + ′′− ′′′ +
· − ′ + ′′− ′′′ +
·
· + ′ + ′′+ ′′′ +
· + ′ + ′′+ ′
2
2 3 4
1
2 3 4
1
2 3 4
2
2 3
2 2
4
3
1
2
1
6
1
2
1
6
2 2
4
3
∆ ∆ ∆ ∆
∆ ∆ ∆ ∆
∆ ∆ ∆ ∆
∆ ∆ ∆ ( ) ′′ + Y O t
n
∆
4
L
, (89)
Numerical Methods for Natural Sciences IB Introduction
– 58 –
we may then take linear combinations of these expansions to obtain an approximation for the
derivative Y’
n
at t = t
n
, viz.
′ ≈
+
·
∑
Y Y
n i n i
i a
b
α . (90)
The linear combination is chosen so as to eliminate the term in Y
n
, requiring
α
i
i a
b
·
∑
· 0 (91)
and, depending on the method, possibly some of the terms of higher order. We shall look at various
strategies for choosing α
i
in the following sections. As an example, we may select the Y
n
and Y
n1
values as the basis of an approximation to approximate Y’
n
. Adding the appropriate equations,
noting that the weightings sum to zero so may be expressed as α
1
= α and α
2
= − α, and setting
equal to Y’
n
, gives
Y’
n
≈ αY
n
− αY
n1
= αY
n
− α(Y
n
− ∆tY’
n
+ ½∆t
2
Y"
n
−
1
/
6
∆t
3
Y’"
n
+ ...). (92)
Equating left and righthand sides reveals α = 1/∆t, so
( )
Y Y
t
Y tY O t
n n
n n
−
· ′ + ′′+
−1 2
1
2 ∆
∆ ∆ . (93)
Before looking at this in any further detail, we need to consider the error associated with
approximating y
n
by Y
n
.
6.3 Truncation error
The global truncation error at the nth step is the cumulative total of the truncation error at the
previous steps and is
E
n
= Y
n
– y
n
. (94)
In contrast, the local truncation error for the nth step is
e
n
= Y
n
– y
n
*, (95)
where y
n
* the exact solution of our differential equation but with the initial condition y
n–1
*=Y
n–1
.
Note that E
n
is not simply
e
n
i
n
·
∑
1
, (96)
which would give E
n
= O(ne
n
). It also depends on the stability of the method (see section 6.7 for
details) and we aim for E
n
= O(e
n
).
Numerical Methods for Natural Sciences IB Introduction
– 59 –
6.4 Euler method
The Euler method is the simplest finite difference scheme to understand and implement.
Following on from (93) we approximate the derivative in (88) as
Y’
n
≈ (Y
n+1
– Y
n
)/∆t, (97)
in our differential equation for Y
n
to obtain
Y
n+1
= Y
n
+ ∆tf(t
n
,Y
n
). (98)
Given the initial/boundary condition Y
0
= c, we may obtain Y
1
from Y
0
+ ∆tf(t
0
,Y
0
), Y
2
from
Y
1
+ ∆tf(t
1
,Y
1
) and so on, marching forwards through time. This process is shown graphically in
figure 20.
t
0
Figure 20: Sketch of the function y(t) (dark line) and the Euler method solution (arrows). Each arrow is
tangental to to the solution of (87) passing through the point located at the start of the arrow. Note that this
point need not be on the desired y(t) curve.
The Euler method is termed an explicit method because we are able to write down an explicit
solution for Y
n+1
in terms of “known” values at t
n
.
Comparing the Euler method equation for Y
n
in (98) with the the Taylor Series expansion for Y
n1
in (89), shows that the O(∆t) error in the finite difference approximation (93) becomes an O(∆t
2
)
error in (98). Thus the Euler method is first order accurate and is often referred to as a first order
method. Moreover, it can be shown that if Y
n
=y
n
+O(∆t
2
), then Y
n+1
=y
n+1
+O(∆t
2
) provided the
scheme is stable (see section 6.7).
Numerical Methods for Natural Sciences IB Introduction
– 60 –
6.5 Implicit methods
The Euler method outlined in the previous section may be summarised by the update formula
Y
n+1
= g(Y
n
,t
n
,∆t). In contrast implicit methods have have Y
n+1
on both sides: Y
n+1
= h(Y
n
,Y
n+1
,t
n
,∆t),
for example. Such implicit methods are computationally more expensive for a single step than
explicit methods, but offer advantages in terms of stability and/or accuracy in many circumstances.
Often the computational expense per step is more than compensated for by it being possible to take
larger steps (see section 6.7).
6.5.1 Backward Euler
The backward Euler method is almost identical to its explicit relative, the only difference being
that the derivative Y’
n
is approximated by
Y'
n
≈ (Y
n
– Y
n–1
)/∆t, (99)
to give the evolution equation
Y
n+1
= Y
n
+ ∆tf(t
n+1
,Y
n+1
). (100)
This is shown graphically in figure 21.
t
0
Figure 21: Sketch of the function y(t) (dark line) and the Euler method solution (arrows). Each arrow is
tangental to to the solution of (87) passing through the point located at the end of the arrow. Note that this
point need not be on the desired y(t) curve.
Numerical Methods for Natural Sciences IB Introduction
– 61 –
The dependence of the righthand side on the variables at t
n+1
rather than t
n
means that it is not, in
general possible to give an explicit formula for Y
n+1
only in terms of Y
n
and t
n+1
. (It may, however,
be possible to recover an explicit formula for some functions f.)
As the derivative Y’
n
is approximated only to the first order, the Backward Euler method has
errors of O(∆t
2
), exactly as for the Euler method. The solution process will, however, tend to be
more stable and hence accurate, especially for stiff problems (problems where f’ is large). An
example of this is shown in figure 22.
t
0
Backward Euler
(Forward) Euler
CrankNicholson
Figure 22: Comparison of ordinary differential equation solvers for a stiff problem.
6.5.2 Richardson extrapolation
The Romberg integration approach (presented in section 5.7) of using two approximations of
different step size to construct a more accurate estimate may also be used for numerical solution of
ordinary differential equations. Again, if we have the estimate for some time t calculated using a
time step ∆t, then for both the Euler and Backward Euler methods the approximate solution is
related to the true solution by Y(t,∆t) = y(t) + c∆t
2
. Similarly an estimate using a step size ∆t/2 will
follow Y(t,∆t/2) = y(t) +
1
/
4
c∆t
2
as ∆t→0. Combining these two estimates to try and cancel the
O(∆t
2
) errors gives the improved estimate as
Y
(1)
(t,∆t/2) = [4Y(t,∆t/2) – Y(t,∆t) ]/3. (101)
The same approach may be applied to higher order methods such as those presented in the
following sections. It is generally preferable, however, to utilise a higher order method to start with,
the exception being that calculating both Y(t,∆t) and Y(t,∆t/2) allows the two solutions to be
compared and thus the trunctation error estimated.
Numerical Methods for Natural Sciences IB Introduction
– 62 –
6.5.3 CrankNicholson
If we use central differences rather than the forward difference of the Euler method or the
backward difference of the backward Euler, we may obtain a second order method due to
cancellation of the terms of O(∆t
2
). Using the same discretisation of t we obtain
Y’
n+1/2
≈ (Y
n+1
– Y
n
)/∆t. (102)
Substitution into our differential equation for Y
n
gives
(Y
n+1
– Y
n
)/∆t ≈ f(t
n+1/2
,Y
n+1/2
). (103)
The requirement for f(t
n+1/2
,Y
n+1/2
) is then satisfied by a linear interpolation for f between t
n–1/2
and
t
n+1/2
to obtain
Y
n+1
– Y
n
=
1
/
2
[f(t
n+1
,Y
n+1
) + f(t
n
,Y
n
)]∆t. (104)
As with the Backward Euler, the method is implicit and it is not, in general, possible to write an
explicit expression for Y
n+1
in terms of Y
n
.
Formal proof that the CrankNicholson method is second order accurate is slightly more
complicated than for the Euler and Backward Euler methods due to the linear interpolation to
approximate f(t
n+1/2
,Y
n+1/2
). The overall approach is much the same, however, with a requirement for
Taylor Series expansions about t
n
::
( )
( )
y y
dy
dt
t
d y
dt
t O t
y f t
f
t
f
f
y
t O t
n n
n n
+
· + + +
· + + +

.
`
,
+
1
2
2
2 3
2 3
1
2
1
2
∆ ∆ ∆
∆ ∆ ∆
∂
∂
∂
∂
(105a)
( ) ( ) ( ) ( )
( )
( )
f t y f t y
f
t
t
f
y
y O t O y
f
f
t
t
f
y
dy
dt
t O t
f
f
t
f
f
y
t O t
n n n n
n
n
+ +
· + + + +
· + + +
· + +

.
`
,
+
1 1
2 2
2
2
, ,
∂
∂
∂
∂
∂
∂
∂
∂
∂
∂
∂
∂
∆ ∆ ∆ ∆
∆ ∆ ∆
∆ ∆
(105b)
Substitution of these into the left and righthand sides of equation (104) reveals
( ) y y f t
f
t
f
f
y
t O t
n n n +
− · + +

.
`
,
+
1
2 3
1
2
∆ ∆ ∆
∂
∂
∂
∂
(106a)
and
( ) ( ) ( ) ( )
( )
1
2
1
2
1
2
1 1
2
2 3
f t y f t y t f f
f
t
f
f
y
t O t t
f t
f
t
f
f
y
t O t
n n n n n n
n
, , + · + + +

.
`
,
+

.
`
,
· + +

.
`
,
+
+ +
∆ ∆ ∆ ∆
∆ ∆ ∆
∂
∂
∂
∂
∂
∂
∂
∂
(106b)
Numerical Methods for Natural Sciences IB Introduction
– 63 –
which are equal up to O(∆t
3
).
6.6 Multistep methods
As an alternative, the accuracy of the approximation to the derivative may be improved by using
a linear combination of additional points. By utilising only Y
n–s+1
,Y
n–s+2
,…,Y
n
we may construct an
approximation to the derivatives of orders 1 to s at t
n
. For example, if s = 2 then
Y’
n
= f
n
,
Y"
n
=~ (f
n
– f
n–1
)/∆t (107)
and so we may construct a second order method from the Taylor series expansion as
Y
n+1
= Y
n
+ ∆tY’
n
+ ½∆t
2
Y"
n
= Y
n
+
1
/
2
∆t(3f
n
– f
n–1
). (108)
For s=3 we also have Y’"
n
and so can make use of a secondorder onesided finite difference to
approximate Y"
n
= f’
n
= (3f
n
4 f
n1
+ f
n2
)/2∆t and include the third order Y’"
n
= f"
n
= (f
n

2f
n1
+f
n2
)/∆t
2
to obtain
Y
n+1
= Y
n
+ ∆tY’
n
+
1
/
2
∆t
2
Y"
n
+
1
/
6
∆t
3
Y"
n
= Y
n
+
1
/
12
∆t(23f
n
– 16f
n–1
+ 5f
n–2
). (109)
These methods are called AdamsBashforth methods. Note that s = 1 recovers the Euler method.
Implicit AdamsBashforth methods are also possible if we use information about f
n+1
in addition
to earlier time steps. The corresponding s = 2 method then uses
Y’
n
= f
n
,
Y"
n
≈ (f
n+1
– f
n–1
)/2∆t
Y’"
n
≈ (f
n+1
– 2f
n
+ f
n–1
)/∆t
2
, (110)
to give
Y
n+1
= Y
n
+ ∆tY’
n
+
1
/
2
∆t
2
Y"
n
+
1
/
6
∆t
3
Y’"
n
= Y
n
+ (1/12)∆t(5f
n+1
+ 8f
n
– f
n–1
). (111)
This family of implicit methods is known as AdamsMoulton methods.
6.7 Stability
The stability of a method can be even more important than its accuracy as measured by the order
of the truncation error. Suppose we are solving
y’ = λy, (112)
for some complex λ. The exact solution is bounded (i.e. does not increase without limit) provided
Reλ ≤ 0. Substituting this into the Euler method shows
Numerical Methods for Natural Sciences IB Introduction
– 64 –
Y
n+1
= (1 + λ∆t)Y
n
= (1 + λ∆t)
2
Y
n–1
= … = (1 + λ∆t)
n+1
Y
0
. (113)
If Y
n
is to remain bounded for increasing n and given Reλ < 0 we require
1 + λ∆t ≤ 1. (114)
If we choose a time step ∆t which does not satisfy (114) then Y
n
will increase without limit. This
condition (114) on ∆t is very restrictive if λ<<0 as it demonstrates the Euler method must use very
small time steps ∆t < 2λ
–1
if the solution is to converge on y = 0.
The reason why we consider the behaviour of equation (112) is that it is a model for the
behaviour of small errors. Suppose that at some stage during the solution process our approximate
solution is
y
$
= y + ε, (115)
where ε is the (small) error. Substituting this into our differential equation of the form y’ = f(t,y) and
using a Taylor Series expansion gives
( )
( )
( ) ( )
dy
dt
dy
dt
d
dt
f t y
f t y
f t y
f
y
O
$
$
,
,
,
· +
·
· +
· + +
ε
ε
ε
∂
∂
ε
2
. (116)
Thus, to the leading order, the error obeys an equation of the form given by (112), with λ = ∂f/∂y.
As it is desirable for errors to decrease (and thus the solution remain stable) rather than increase
(and the solution be unstable), the limit on the time step suggested by (114) applies for the
application of the Euler method to any ordinary differential equation. A consequence of the decay of
errors present at one time step as the solution process proceeds is that memory of a particular time
step’s contribution to the global truncation error decays as the solution advances through time. Thus
the global truncation error is dominated by the local trunction error(s) of the most recent step(s) and
O(E
n
) = O(e
n
).
In comparison, solution of (112) by the Backward Euler method
Y
n+1
= Y
n
+ ∆tλY
n+1
, (117)
can be rearranged for Y
n+1
and
Y
n+1
= Y
n
/(1 – λ∆t) = Y
n–1
/(1 – λ∆t)
2
= … = Y
0
/(1 – λ∆t)
n+1
, (118)
which will be stable provided
1 – λ∆t > 1. (119)
For Reλ ≤ 0 this is always satisfied and so the Backward Euler method is unconditionally stable.
The CrankNicholson method may be analysed in a similar fashion with
(1λ∆t/2)Y
n+1
= (1+λ∆t/2)Y
n
, (120)
Numerical Methods for Natural Sciences IB Introduction
– 65 –
to arrive at
Y
n+1
= [(1+λ∆t/2)/(1 – λ∆t/2)]
n+1
Y
0
, (121)
with the magnitude of the term in square brackets always less than unity for Reλ < 0. Thus, like
Backward Euler, CrankNicholson is unconditionally stable.
In general, explicit methods require less computation per step, but are only conditionally stable
and so may require far smaller step sizes than an implicit method of nominally the same order.
6.8 Predictorcorrector methods
One approach to using an implicit method is to use a direct iteration to solve the implicit
equation. For example, the Backward Euler method has the form
Y
n+1
= Y
n
+ ∆t f(t
n+1
,Y
n+1
). (122)
Applying the theory for direct iteration outlined in §3.6, we may write
Y
n+1,i
= Y
n
+ ∆t f(t
n+1
,Y
n+1,i
), (123)
setting Y
n+1,0
= Y
n
, and iterate over i until the solution has converged. From (21) we know it will
converge if
( ) ( )
∂
∂
∂
∂ Y
Y t f t Y t
f
Y
n i
n n i
+
+
+ · <
1
1
1
,
,
, ∆ ∆ , (124)
and the solution will gradually approach the Backward Euler solution. Note that this convergence
criterion is more stringent than the stability criterion established in §6.7!
What we are effectively doing here is making a prediction with the Euler method (our equation
for Y
n+1,1
is just the Euler method), then multiple corrections to this to get the solution to converge
on the Backward Euler method. A similar strategy may be applied to the CrankNicholson method
to obtain
Y
n+1,i
= Y
n
+ ½ ∆t [f(t
n+1
,Y
n+1,i
) + f(t
n
,Y
n,i
)], (125)
requiring ½ ∆t ∂f/∂Y < 1 for convergence. Once convergence is achieved, we are left with a second
order implicit scheme. However, it may not be necessary or desirable to carry on iterating until
convergence is achieved. Moreover, the condition for convergence using (125) imposes the same
restrictions upon ∆t as stability of the Euler method. There are other reasons, however, for taking
this approach.
The above examples, if iterated only a finite number of times rather than until convergence is
achieved, belong to a class of methods known as predictorcorrector methods. Predictorcorrector
methods try to combine the advantages of the simplicity of explicit methods with the improved
accuracy of implicit methods. They achieve this by using an explicit method to predict the solution
Y
n+1
(p)
at t
n+1
and then utilise f(t
n+1
,Y
n+1
(p)
) as an approximation to f(t
n+1
,Y
n+1
) to correct this
prediction using something similar to an implicit step.
Numerical Methods for Natural Sciences IB Introduction
– 66 –
6.8.1 Improved Euler method
The simplest of these methods combines the Euler method as the predictor
Y
n+1
(1)
= Y
n
+ ∆tf(t
n
,Y
n
), (126)
and then the Backward Euler to give the corrector
Y
n+1
(2)
= Y
n
+ ∆tf(t
n
,Y
n+1
(1)
). (127)
The final solution is the mean of these:
Y
n+1
= (Y
n+1
(1)
+ Y
n+1
(2)
)/2. (128)
To understand the stability of this method we again use the y’ = λy so that the three steps
described by equations (126) to (128) become
Y
n+1
(1)
= Y
n
+ λ∆tY
n
, (129a)
Y
n+1
(2)
= Y
n
+ λ∆tY
n+1
(1)
= Y
n
+ λ∆t (Y
n
+ λ∆tY
n
)
= (1 + λ∆t + λ
2
∆t
2
)Y
n
, (129b)
Y
n+1
= (Y
n+1
(1)
+ Y
n+1
(2)
)/2
= [(1 + λ∆t)Y
n
+ (1 + λ∆t + λ
2
∆t
2
) Y
n
]/2
= (1 + λ∆t +
1
/
2
λ
2
∆t
2
) Y
n
= (1 + λ∆t +
1
/
2
λ
2
∆t
2
)
n+1
Y
0
. (129c)
Stability requires 1 + λ∆t +
1
/
2
λ
2
∆t
2
 < 1 (for Reλ < 0) which in turn restricts ∆t < 2λ
–1
. Thus the
stability of this method, commonly known as the Improved Euler method, is identical to the Euler
method. It is also the same as the criterion for the iterative CrankNicholson method given in (125)
to converge. This is not surprising as it is limited by the stability of the initial predictive step. The
accuracy of the method is, however, second order as may be seen by comparison of (129c) with the
Taylor Series expansion.
6.8.2 RungeKutta methods
The Improved Euler method is the simplest of a family of similar predictor corrector methods
following the form of a single predictor step and one or more corrector steps. The corrector step
may be repeated a fixed number of times, or until the estimate for Y
n+1
converges to some tolerance.
One subgroup of this family are the RungeKutta methods which use a fixed number of corrector
steps. The Improved Euler method is the simplest of this subgroup. Perhaps the most widely used of
these is the fourth order method:
k
(1)
= ∆tf(t
n
,Y
n
) , (130a)
k
(2)
= ∆tf(t
n
+
1
/
2
∆t ,Y
n
+½k
(1)
) , (130b)
Numerical Methods for Natural Sciences IB Introduction
– 67 –
k
(3)
= ∆tf(t
n
+
1
/
2
∆t ,Y
n
+½k
(2)
) , (130c)
k
(4)
= ∆tf(t
n
+∆t ,Y
n
+k
(3)
) , (130d)
Y
n+1
= Y
n
+ (k
(1)
+ 2k
(2)
+ 2k
(3)
+ k
(4)
)/6. (130e)
In order to analyse this we need to construct TaylorSeries expansions for
k
(2)
= ∆tf(t
n
+
1
/
2
∆t,Y
n
+
1
/
2
k
(1)
) = ∆t[f(t
n
,Y
n
)+(∆t/2)(∂f/∂t+f∂f/∂y)], and similarly for k
(3)
and k
(4)
. This is
then compared with a full TaylorSeries expansion for Y
n+1
up to fourth order. To achieve this we
require
Y" = df/dt = ∂f/∂t + ∂y/∂t ∂f/∂y = ∂f/∂t + f ∂f/∂y, (131)
Y’" = d
2
f/dt
2
= ∂
2
f/∂t
2
+ 2f ∂
2
f/∂t∂y + ∂f/∂t ∂f/∂y + f
2
∂
2
f/∂y
2
+ f (∂f/∂y)
2
, (132)
and similarly for Y"". All terms up to order ∆t
4
can be shown to match, with the error coming in at
∆t
5
.
Numerical Methods for Natural Sciences IB Introduction
– 68 –
7 Higher order ordinary differential equations
In this section we look at how to solve higher order ordinary differential equations. Some of the
techniques discussed here are based on the techniques introduced in §6, and some rely on a
combination of this material and the linear algebra of §4.
7.1 Initial value problems
The discussion so far has been for first order ordinary differential equations. All the methods
given may be applied to higher ordinary differential equations, provided it is possible to write an
explicit expression for the highest order derivative and the system has a complete set of initial
conditions. Consider some equation
d
d
d
d
d
d
d
d
2
2
n
n
n
n
y
t
f t y
y
t
y
t
y
t
·

.
`
,
−
−
, , , , K
1
1
, (133)
where at t = t
0
we know the values of y, dy/dt, d
2
y/dt
2
, …, d
n1
y/dt
n1
. By writing x
0
=y, x
1
=dy/dt,
x
2
=d
2
y/dt
2
, …, x
n–1
=d
n–1
y/dt
n–1
, we may express this as the system of equations
x
0
’ = x
1
x
1
’ = x
2
x
2
’ = x
3
. . . .
x
n–2
’ = x
n–1
x
n–1
’ = f(t,x
0
,x
1
,…,x
n–2
), (134)
and use the standard methods for updating each x
i
for some t
n+1
before proceeding to the next time
step. A decision needs to be made as to whether the values of x
i
for t
n
or t
n+1
are to be used on the
right hand side of the equation for x
n–1
’. This decision may affect the order and convergence of the
method. Detailed analysis may be undertaken in a manner similar to that for the first order ordinary
differential equations.
7.2 Boundary value problems
For second (and higher) order odes, two (or more) initial/boundary conditions are required. If
these two conditions do not correspond to the same point in time/space, then the simple extension of
the first order methods outlined in section 7.1 can not be applied without modification. There are
two relatively simple approaches to solve such equations.
7.2.1 Shooting method
Suppose we are solving a second order equation of the form y" = f(t,y,y’) subject to y(0) = c
0
and
y(1) = c
1
. With the shooting method we apply the y(0)=c
0
boundary condition and make some guess
Numerical Methods for Natural Sciences IB Introduction
– 69 –
that y’(0) = α
0
. This gives us two initial conditions so that we may apply the simple timestepping
methods already discussed in section 7.1. The calculation proceeds until we have a value for y(1). If
this does not satisfy y(1) = c
1
to some acceptable tolerance, we revise our guess for y’(0) to some
value α
1
, say, and repeat the time integration to obtain an new value for y(1). This process continues
until we hit y(1)=c
1
to the acceptable tolerance. The number of iterations which will need to be
made in order to achieve an acceptable tolerance will depend on how good the refinement algorithm
for α is. We may use the root finding methods discussed in section 3 to undertake this refinement.
The same approach can be applied to higher order ordinary differential equations. For a system of
order n with m boundary conditions at t = t
0
and n–m boundary conditions at t = t
1
, we will require
guesses for n–m initial conditions. The computational cost of refining these n–m guesses will
rapidly become large as the dimensions of the space increase.
7.2.2 Linear equations
The alternative is to rewrite the equations using a finite difference approximation with step size
∆t = (t
1
–t
0
)/N to produce a system of N+1 simultaneous equations. Consider the second order linear
system
y" + ay’ + by = c, (135)
with boundary conditions y(t
0
) = α and y(t
1
) + y’(t
1
) = β. If we use the central difference
approximations
y'
i
≈ (Y
i+1
– Y
i–1
)/2∆t, (136a)
y"
i
≈ (Y
i+1
– 2Y
i
+ Y
i–1
)/∆t
2
. (136b)
Substitution into (135) gives
Y Y Y
t
a
Y Y
t
bY c
i i i i i
i
+ − + −
− +
+
−
+ ·
1 1
2
1 1
2
2 ∆ ∆
, (137)
and we may write the boundary condition at t = t
0
as
Y
0
= α. (138)
For t = t
1
we must express the derivative in the boundary condition in finite difference form using
points that exist in our mesh. The obvious thing to do is to take the backward difference so that the
boundary condition becomes
Y
Y Y
t
n
n n
+
−
·
−1
∆
β , (139)
but as we shall see, this choice is not ideal.
The above strategy leads to the system of equations
Y
0
= α,
(1+
1
/
2
a∆t)Y
0
+ (b∆t
2
–2)Y
1
+ (1–
1
/
2
a∆t)Y
2
= c∆t
2
,
(1+
1
/
2
a∆t)Y
1
+ (b∆t
2
–2)Y
2
+ (1–
1
/
2
a∆t)Y
3
= c∆t
2
,
Numerical Methods for Natural Sciences IB Introduction
– 70 –
…
(1+
1
/
2
a∆t)Y
n–2
+ (b∆t
2
–2)Y
n–1
+ (1–
1
/
2
a∆t)Y
n
= c∆t
2
,
−Y
n1
+ (∆t + 1) Y
n
= β∆t (140)
This tridiagonal system may be readily solved using the method discussed in section 4.5.
There is one problem with the scheme outlined above: while the Y" and Y’ derivatives within the
domain are second order accurate, the boundary condition at t = t
1
is imposed only as a firstorder
approximation to the derivative. We have two ways in which we may improve the accuracy of this
boundary condition.
The first way is the most obvious: make use of the Y
n2
value to construct a secondorder
approximation to Y" so that we have
( ) ′ ·
− +
+
− −
Y
Y Y Y
t
O t
n
n n n
3 4
2
1 2 2
∆
∆ , (141)
so that the last equation in our system becomes
Y
n2
− 4Y
n1
+ (2∆t + 3) Y
n
= 2β∆t. (142)
The problem with this is that we then loose our simple tridiagonal system.
The second approach is to artificially increase the size of the domain by one mesh point and
impose the boundary condition inside the domain. The additional mesh point is often referred to as a
dummy mesh point or dummy cell. This leads to the last three equations being
(1+
1
/
2
a∆t)Y
n–2
+ (b∆t
2
–2)Y
n–1
+ (1–
1
/
2
a∆t)Y
n
= c∆t
2
,
(1+
1
/
2
a∆t)Y
n1
+ (b∆t
2
–2)Y
n
+ (1–
1
/
2
a∆t)Y
n+1
= c∆t
2
,
−Y
n1
+ 2∆tY
n
+ Y
n+1
= 2β∆t (143)
The attraction of this approach is that we maintain our simple tridiagonal system.
Clearly the same tricks can be applied if we have a boundary condition on the derivative at t
0
.
Higher order linear equations may be catered for in a similar manner and the matrix representing
the system of equations will remain banded, but not as sparse as tridiagonal. The solution may be
undertaken using the modified LU decomposition introduced in section 4.4.
Nonlinear equations may also be solved using this approach, but will require an iterative solution
of the resulting matrix system Ax = b as the matrix A will be a function of x. In most circumstances
this is most efficiently achieved through a NewtonRaphson algorithm, similar in principle to that
introduced in section 3.4 but where a system of linear equations requires solution for each iteration.
7.3 Other considerations
*
Not examinable
*
Not examinable
Numerical Methods for Natural Sciences IB Introduction
– 71 –
7.3.1 Truncation error
*
Not examinable
7.3.2 Error and step control
*
Not examinable
*
Not examinable
*
Not examinable
Numerical Methods for Natural Sciences IB Introduction
– 72 –
8 Partial differential equations
In this section we shall concentrate on the numerical solution of second order linear partial
differential equations.
8.1 Laplace equation
Consider the Laplace equation in two dimensions
∇ · + ·
2
2
2
2
2
0 ϕ
∂ ϕ
∂
∂ ϕ
∂ x y
, (144)
in some rectangular domain described by x in [x
0
,x
1
], y in [y
0
,y
1
]. Suppose we discretise the solution
ϕ onto a m+1 by n+1 rectangular grid (or mesh) given by
x
i
= x
0
+ i∆x (145a)
y
j
= y
0
+ j∆y (145b)
where i=0,m, j=0,n. The mesh spacing is ∆x = (x
1
–x
0
)/m and ∆y = (y
1
–y
0
)/n. Let
ϕ
ij
= ϕ(x
i
,y
j
) (146)
be the exact solution at the mesh point i,j, and Φ
ij
≈ ϕ
ij
be the approximate solution at that mesh
point.
By considering the Taylor Series expansion for ϕ about some mesh point i,j,
( ) ϕ ϕ
∂ϕ
∂
∂ ϕ
∂
∂ ϕ
∂
i j i j
i j i j i j
x
x
x
x
x
x
O x
+
· + + + +
1
2
2
2
3
3
3
4
2 6
, ,
, , ,
∆
∆ ∆
∆ , (147a)
( ) ϕ ϕ
∂ϕ
∂
∂ ϕ
∂
∂ ϕ
∂
i j i j
i j i j i j
x
x
x
x
x
x
O x
−
· − + − +
1
2
2
2
3
3
3
4
2 6
, ,
, , ,
∆
∆ ∆
∆ , (147b)
( ) ϕ ϕ
∂ϕ
∂
∂ ϕ
∂
∂ ϕ
∂
i j i j
i j i j i j
y
y
y
y
y
y
O y
, ,
, , ,
+
· + + + +
1
2
2
2
3
3
3
4
2 6
∆
∆ ∆
∆ , (147b)
( ) ϕ ϕ
∂ϕ
∂
∂ ϕ
∂
∂ ϕ
∂
i j i j
i j i j i j
y
y
y
y
y
y
O y
, ,
, , ,
−
· − + − +
1
2
2
2
3
3
3
4
2 6
∆
∆ ∆
∆ , (147b)
it is clear that we may approximate ∂
2
ϕ/∂x
2
and ∂
2
ϕ/∂y
2
to the second order using the four adjacent
mesh points to obtain the finite difference approximation
Φ Φ Φ
∆
Φ Φ Φ
∆
i j i j i j i j i j i j
x y
+ − + −
− +
+
− +
·
1 1
2
1 1
2
2 2
0
, , , , , ,
(148)
Numerical Methods for Natural Sciences IB Introduction
– 73 –
for the internal points 0<i<m, 0<j<n. In addition to this we will have either Dirichlet, von Neumann
or mixed boundary conditions to specify the boundary values of ϕ
ij
. The system of linear equations
described by (148) in combination with the boundary conditions may be solved in a variety of ways.
8.1.1 Direct solution
Provided the boundary conditions are linear in ϕ, our finite difference approximation is itself
linear and the resulting system of equations may be written as
Aϕ = b, (149)
with (m+1)(n+1) equations. This system may be solved directly using Gauss Elimination as
discussed in section 4.1. However, this approach may be feasible if the total number of mesh points
(m+1)(n+1) required is relatively small, but as the matrix A used to represent the complete system
will have [(m+1)(n+1)]
2
elements, the storage and computational cost of such a solution will
become prohibitive even for relatively modest m and n.
i=0, j=0
i=3, j=0
i=0, j=3 i=3, j=3
ϕ(x,0) = 0
∂ϕ/∂y = 1
ϕ(0,y) = y
ϕ(3,0) = y
Figure 23: Sketch of domain for Laplace equation. The linear system for this domain is given in (150).
The structure of the system ensures A is relatively sparse, consisting of a tridiagonal core with
one nonzero diagonal above and another below this. These nonzero diagonals are offset by either m
or n from the leading diagonal. For example, in the domain shown in figure 23, the linear system is
Numerical Methods for Natural Sciences IB Introduction
– 74 –
1
1
1
1
1
1 1 4 1 1
1 1 4 1 1
1
1
1 1 4 1 1
1 1 4 1 1
1
1 1
1 1
1 1
1 1
00
10
20
30
01
11
21
31
02
12
22
32
03
13
23
33
−
−
−
−
−
−
−
−
]
]
]
]
]
]
]
]
]
]
]
]
]
]
]
]
]
]
]
]
]
]
]
]
]

.
` ϕ
ϕ
ϕ
ϕ
ϕ
ϕ
ϕ
ϕ
ϕ
ϕ
ϕ
ϕ
ϕ
ϕ
ϕ
ϕ ,
·

.
`
,
0
0
0
0
1
0
0
1
2
0
0
2
1
1
1
1
(150)
Provided pivoting (if required) is conducted in such a way that it does not place any nonzero
elements outside this band then solution by Gauss Elimination or LU Decomposition will only
produce nonzero elements inside this band, substantially reducing the storage and computational
requirements (see section 4.4). Careful choice of the order of the matrix elements (i.e. by x or by y)
may help reduce the size of this matrix so that it need contain only O(m
3
) elements for a square
domain.
Because of the wide spread need to solve Laplace’s and related equations, specialised solvers
have been developed for this problem. One of the best of these is Hockney’s method for solving
Aϕ = b which may be used to reduce a block tridiagonal matrix (and the corresponding righthand
side) of the form
A
A I
I A I
I A I
I A I
I A I
I A
I
I A
·
]
]
]
]
]
]
]
]
]
]
]
]
]
$
$
$
$
$
$
$
O
O O
, (151)
into a block diagonal matrix of the form
Numerical Methods for Natural Sciences IB Introduction
– 75 –
$
$
$
$
$
$
$
B
B
B
B
B
B
B
O
O O
]
]
]
]
]
]
]
]
]
]
]
]
]
, (152)
where
$
A and
$
B are themselves block tridiagonal matrices and I is an identiy matrix.. This process
may be performed iteratively to reduce an n dimensional finite difference approximation to
Laplace’s equation to a tridiagonal system of equations with n–1 applications. The computational
cost is O(p log p), where p is the total number of mesh points. The main drawback of this method is
that the boundary conditions must be able to be cast into the block tridiagonal format.
8.1.2 Relaxation
An alternative to direct solution of the finite difference equations is an iterative numerical
solution. These iterative methods are often referred to as relaxation methods as an initial guess at
the solution is allowed to slowly relax towards the true solution, reducing the errors as it does so.
There are a variety of approaches with differing complexity and speed. We shall introduce these
methods before looking at the basic mathematics behind them.
8.1.2.1 Jacobi
The Jacobi Iteration is the simplest approach. For clarity we consider the special case when
∆x = ∆y. To find the solution for a twodimensional Laplace equation simply:
1. Initialise Φ
ij
to some initial guess.
2. Apply the boundary conditions.
3. For each internal mesh point set
Φ*
ij
= (Φ
i+1,j
+ Φ
i–1,j
+ Φ
i,j+1
+ Φ
i,j–1
)/4. (153)
4. Replace old solution Φ with new estimate Φ*.
5. If solution does not satisfy tolerance, repeat from step 2.
An example of this algorithm is given in the table below where ϕ = 0 on the boundaries and the
initial guess is ϕ = 1 in the interior (clearly the answer is ϕ = 0 everywhere). The first four iterations
are labelled a, b, c and d.
a) 0
b) 0
c) 0
a) 0
b) 0
c) 0
a) 0
b) 0
c) 0
a) 0
b) 0
c) 0
a) 0
b) 0
c) 0
Numerical Methods for Natural Sciences IB Introduction
– 76 –
a) 0
b) 0
c) 0
a) 1
b) 1/2
c) 3/8
a) 1
b) 3/4
c) 1/2
a) 1
b) 1/2
c) 3/8
a) 0
b) 0
c) 0
a) 0
b) 0
c) 0
a) 1
b) 3/4
c) 1/2
a) 1
b) 1
c) 3/4
a) 1
b) 3/4
c) 1/2
a) 0
b) 0
c) 0
a) 0
b) 0
c) 0
a) 1
b) 1/2
c) 3/8
a) 1
b) 3/4
c) 1/2
a) 1
b) 1/2
c) 3/8
a) 0
b) 0
c) 0
a) 0
b) 0
c) 0
a) 0
b) 0
c) 0
a) 0
b) 0
c) 0
a) 0
b) 0
c) 0
a) 0
b) 0
c) 0
The coefficients in (153) (here all 1/4) used to calculate the refined estimate is often referred to
as the stencil or template. Higher order approximations may be obtained by simply employing a
stencil which utilises more points. Other equations (e.g. the biharmonic equation, ∇
4
Ψ = 0) may be
solved by introducing a stencil appropriate to that equation.
While very simple and cheap per iteration, the Jacobi Iteration is very slow to converge,
especially for larger grids. Corrections to errors in the estimate Φ
ij
diffuse only slowly from the
boundaries taking O(max(m,n)) iterations to diffuse across the entire mesh.
8.1.2.2 GaussSeidel
The GaussSeidel Iteration is very similar to the Jacobi Iteration, the only difference being that
the new estimate Φ*
ij
is returned to the solution Φ
ij
as soon as it is completed, allowing it to be used
immediately rather than deferring its use to the next iteration. The advantages of this are:
• Less memory required (there is no need to store Φ*).
• Faster convergence (although still relatively slow).
On the other hand, the method is less amenable to vectorisation as, for a given iteration, the new
estimate of one mesh point is dependent on the new estimates for those already scanned.
An example of the first iteration of the GaussSeidel approach using the same initial guess as our
Jacobi iteration in §8.1.2.1 is given below, with the iteration starting at the bottom left and moving
across then up.
Numerical Methods for Natural Sciences IB Introduction
– 77 –
a) 0
b) 0
c) 0
d) 0
a) 0
b) 0
c) 0
d) 0
a) 0
b) 0
c) 0
d) 0
a) 0
b) 0
c) 0
d) 0
a) 0
b) 0
c) 0
d) 0
a) 0
b) 0
c) 0
d) 0
a) 1
b) 13/32
c)
d)
a) 1
b) 29/64
c)
d)
a) 1
b) 29/128
c)
d)
a) 0
b) 0
c) 0
d) 0
a) 0
b) 0
c) 0
d) 0
a) 1
b) 5/8
c)
d)
a) 1
b) 13/32
c)
d)
a) 1
b) 29/64
c)
d)
a) 0
b) 0
c) 0
d) 0
a) 0
b) 0
c) 0
d) 0
a) 1
b) 1/2
c)
d)
a) 1
b) 5/8
c)
d)
a) 1
b) 13/32
c)
d)
a) 0
b) 0
c) 0
d) 0
a) 0
b) 0
c) 0
d) 0
a) 0
b) 0
c) 0
d) 0
a) 0
b) 0
c) 0
d) 0
a) 0
b) 0
c) 0
d) 0
a) 0
b) 0
c) 0
d) 0
Clearly in this example the solution is converging on zero more rapidly, but the symmetry that
existed in the initial guess has been lost.
8.1.2.3 RedBlack ordering
A variant on the GaussSeidel Iteration is obtained by updating the solution Φ
ij
in two passes
rather than one. If we consider the mesh points as a chess board, then the white squares would be
updated on the first pass and the black squares on the second pass. The advantages
• No interdependence of the solution updates within a single pass aids vectorisation.
Numerical Methods for Natural Sciences IB Introduction
– 78 –
• Faster convergence at low wave numbers.
An example, based on those in the previous two sections, is given below. Note that the cells
labelled “Red” are updated on the first pass and those labelled “Black” on the second pass. Unlike
the GaussSeidel, the order in which the cells are updated within one pass does not matter.
Black
a) 0
b) 0
Red
a) 0
b) 0
Black
a) 0
b) 0
Red
a) 0
b) 0
Black
a) 0
b) 0
Red
a) 0
b) 0
Black
a) 1
b) 3/8
Red
a) 1
b) 3/4
Black
a) 1
b) 3/8
Red
a) 0
b) 0
Black
a) 0
b) 0
Red
a) 1
b) 3/4
Black
a) 1
b) 3/4
Red
a) 1
b) 3/4
Black
a) 0
b) 0
Red
a) 0
b) 0
Black
a) 1
b) 3/8
Red
a) 1
b) 3/4
Black
a) 1
b) 3/8
Red
a) 0
b) 0
Black
a) 0
b) 0
Red
a) 0
b) 0
Black
a) 0
b) 0
Red
a) 0
b) 0
Black
a) 0
b) 0
In this example the solution is converging faster than the Jacobi iteration and at a comparable
rate to GaussSeidel, yet is maintaining the symmetries which existed in the initial guess. For many
problems, particularly those looking at instabilities, the maintenance of symmetries is important.
8.1.2.4 Successive Over Relaxation (SOR)
It has been found that the errors in the solution obtained by any of the three preceding methods
decrease only slowly and often decrease in a monotonic manner. Hence, rather than setting
Φ*
ij
= (Φ
i+1,j
+ Φ
i–1,j
+ Φ
i,j+1
+ Φ
i,j–1
)/4,
for each internal mesh point, we use
Φ*
ij
= (1–σ)Φ
ij
+ σ(Φ
i+1,j
+ Φ
i–1,j
+ Φ
i,j+1
+ Φ
i,j–1
)/4, (154)
Numerical Methods for Natural Sciences IB Introduction
– 79 –
for some value σ. The optimal value of σ will depend on the problem being solved and may vary as
the iteration process converges. Typically, however, a value of around 1.2 to 1.4 produces good
results. In some special cases it is possible to determine an optimal value analytically.
8.1.3 Multigrid
*
The big problem with relaxation methods is their slow convergence. If σ = 1 then application of
the stencil removes all the error in the solution at the wave length of the mesh for that point, but has
little impact on larger wave lengths. This may be seen if we consider the onedimensional equation
d
2
ϕ/dx
2
= 0 subject to ϕ(x=0) = 0 and ϕ(x=1) = 1. Suppose our initial guess for the iterative
solution is that Φ
i
= 0 for all internal mesh points. With the Jacobi Iteration the correction to the
internal points diffuses only slowly along from x = 1.
,WHUDWLRQV
Multigrid methods try to improve the rate of convergence by considering the problem of a
hierarchy of grids. The larger wave length errors in the solution are dissipated on a coarser grid
while the shorter wave length errors are dissipated on a finer grid. for the example considered
above, the solution would converge in one complete Jacobi multigrid iteration, compared with the
slow asymptotic convergence above.
For linear problems, the basic multigrid algorithm for one complete iteration may be described as
1. Select the initial finest grid resolution p=P
0
and set b
(p)
= 0 and make some initial
guess at the solution Φ
(p)
2. If at coarsest resolution (p=0) then solve A
(p)
Φ
(p)
=b
(p)
exactly and jump to step 7
3. Relax the solution at the current grid resolution, applying boundary conditions
*
Not examinable
Numerical Methods for Natural Sciences IB Introduction
– 80 –
4. Calculate the error r = AΦ
(p)
–b
(p)
5. Coarsen the error b
(p–1)
←r to the next coarser grid and decrement p
6. Repeat from step 2
7. Refine the correction to the next finest grid Φ
(p+1)
= Φ
(p+1)
+αΦ
(p)
and increment p
8. Relax the solution at the current grid resolution, applying boundary conditions
9. If not at current finest grid (P
0
), repeat from step 7
10. If not at final desired grid, increment P
0
and repeat from step 7
11. If not converged, repeat from step 2.
Typically the relaxation steps will be performed using Successive Over Relaxtion with RedBlack
ordering and some relaxation coefficient σ. The hierarchy of grids is normally chosen to differ in
dimensions by a factor of 2 in each direction. The factor α is typically less than unity and effectively
damps possible instabilities in the convergence. The refining of the correction to a finer grid will be
achieved by (bi)linear or higher order interpolation, and the coarsening may simply be by sub
sampling or averaging the error vector r.
It has been found that the number of iterations required to reach a given level of convergence is
more or less independent of the number of mesh points. As the number of operations per complete
iteration for n mesh points is O(n)+O(n/2
d
)+ +O(n/2
2d
)+…, where d is the number of dimensions in
the problem, then it can be seen that the Multigrid method may often be faster than a direction
solution (which will require O(n
3
), O(n
2
) or O(n log n) operations, depending on the method used).
This is particularly true if n is large or there are a large number of dimensions in the problem. For
small problems, the coefficient in front of the n for the Multigrid solution may be relatively large so
that direct solution may be faster.
A further advantage of Multigrid and other iterative methods when compared with direct
solution, is that irregular shaped domains or complex boundary conditions are implemented more
easily. The difficulty with this for the Multigrid method is that care must be taken in order to ensure
consistent boundary conditions in the embedded problems.
8.1.4 The mathematics of relaxation
*
In principle, relaxation methods which are the basis of the Jacobi, GaussSeidel, Successive Over
Relaxation and Multigrid methods may be applied to any system of linear equations to interatively
improve an approximation to the exact solution. The basis for this is identical to the Direct Iteration
method described in section 3.6. We start by writing the vector function
f(x) = Ax − b, (155)
and search for the vector of roots to f(x) = 0 by writing
x
n+1
= g(x
n
), (156)
where
g(x) = D
−1
{[A+D]x − b}, (157)
*
Not examinable
Numerical Methods for Natural Sciences IB Introduction
– 81 –
with D a diagonal matrix (zero for all offdiagonal elements) which may be chosen arbitrarily. We
may analyse this system by following our earlier analysis for the Direct Iteration method (section
3.6). Let us assume the exact solution is x* = g(x*), then
ε
n+1
= x
n+1
− x*
= D
1
{[A+D]x
n
− b} − D
1
{[A+D]x* − b}
= D
−1
[A+D](x
n
− x*)
= D
−1
[A+D]ε
n
= {D
−1
[A+D]}
n+1
ε
0
. (158)
From this it is clear that convergence will be linear and requires
ε
n+1
 = Bε
n
 < ε
n
, (159)
where B = D
−1
[A+D] for some suitable norm. As any error vector ε
n
may be written as a linear
combination of the eigen vectors of our matrix B, it is sufficient for us to consider the eigen value
problem
Bε
n
= λε
n
, (160)
and require max(λ) to be less than unity. In the asymptotic limit, the smaller the magnitude of this
maximum eigen value the more rapid the convergence. The convergence remains, however, linear.
Since we have the ability to choose the diagonal matrix D, and since it is the eigen values of
B = D
−1
[A+D] rather than A itself which are important, careful choice of D can aid the speed at
which the method converges. Typically this means selecting D so that the diagonal of B is small.
8.1.4.1 Jacobi and GaussSeidel for Laplace equation
*
The structure of the finite difference approximation to Laplace’s equation lends itself to these
relaxation methods. In one dimension,
A ·
−
−
−
−
−
−
]
]
]
]
]
]
]
]
]
]
]
2 1
1 2 1
1 2 1
1 2 1
1 2 1
1 2
O O O
(161)
and both Jacobi and GaussSeidel iterations take D as 2I (I is the identity matrix) on the diagonal to
give B = D
−1
[A+D] as
*
Not examinable
Numerical Methods for Natural Sciences IB Introduction
– 82 –
B ·
]
]
]
]
]
]
]
]
]
]
]
0 1 2
1 2 0 1 2
1 2 0 1 2
1 2 0 1 2
1 2 0 1 2
1 2 0
O O O
(162)
The eigen values λ of this matrix are given by the roots of
det(BλI) = 0. (163)
In this case the determinant may be obtained using the recurrence relation
det(Bλ)
(n)
= λ det(Bλ)
(n1)

1
/
4
det(Bλ)
(n2)
, (164)
where the subscript gives the size of the matrix B. From this we may see
det(Bλ)
(1)
= λ ,
det(Bλ)
(2)
= λ
2
 ¼ ,
det(Bλ)
(3)
= λ
3
+ ½λ ,
det(Bλ)
(4)
= λ
4
 ¾ λ
2
+ (1/16) ,
det(Bλ)
(5)
= λ
5
+ λ
3
 (3/16)λ ,
det(Bλ)
(6)
= λ
6
 (5/4)λ
4
+ (3/8)λ
2
 (1/64) ,
… (165)
which may be solved to give the eigen values
λ
(1)
= 0 ,
λ
2
(2)
= 1/4 ,
λ
2
(3)
= 0, 1/2 ,
λ
2
(4)
= (3 t √5)/8 ,
λ
2
(5)
= 0, 1/4, 3/4 ,
… (166)
It can be shown that for a system of any size following this general form, all the eigen values satisfy
λ < 1, thus proving the relaxation method will always converge. As we increase the number of
mesh points, the number of eigen values increases and gradually fills up the range λ < 1, with the
Numerical Methods for Natural Sciences IB Introduction
– 83 –
numerically largest eigen values becoming closer to unity. As a result of λ→1, the convergence of
the relaxation method slows considerably for large problems. A similar analysis may be applied to
Laplace’s equation in two or more dimensions, although the expressions for the determinant and
eigen values is correspondingly more complex.
The large eigen values are responsible for decreasing the error over large distances (many mesh
points). The multigrid approach enables the solution to converge using a much smaller system of
equations and hence smaller eigen values for the larger distances, bypassing the slow convergence
of the basic relaxation method.
8.1.4.2 Successive Over Relaxation for Laplace equation
*
The analysis of the Jacobi and GaussSeidel iterations may be applied equally well to Successive
Over Relaxation. The main difference is that D = (2/σ)I so that
B
SOR
·
−
−
−
−
−
−
]
]
]
]
]
]
]
]
]
]
]
1 2
2 1 2
2 1 2
2 1 2
2 1 2
2 1
σ σ
σ σ σ
σ σ σ
σ σ σ
σ σ σ
σ σ
O O O
(167)
and thus
B I B I
SOR J
− · −
+ −
]
]
]
λ σ
λ σ
σ
1
(168)
and the corresponding eigen values λ
SOR
are related to the eigen values λ
J
for the Jacobi and Gauss
Seidel methods by
λ
SOR
= 1 + σ(λ
J
− 1) (169)
Thus if σ is chosen inappropriately, the eigen values of B will exceed unity and the relaxation
method will diverge. On the otherhand, careful choice of σ will allow the eigen values of B to be
less than those for Jacobi and GaussSeidel, thus increasing the rate of convergence.
8.1.4.3 Other equations
*
Relaxation methods may be applied to other differential equations or more general systems of linear
equations in a similar manner. As a rule of thumb, the solution will converge if the A matrix is
diagonally dominant, i.e. the numerically largest values occur on the diagonal. If this is not the case,
*
Not examinable
*
Not examinable
Numerical Methods for Natural Sciences IB Introduction
– 84 –
SOR can still be used, but it may be necessary to choose σ < 1 whereas for Laplace’s equation σ ≥ 1
produces a better rate of convergence.
8.1.5 FFT
*
One of the most common ways of solving Laplace’s equation is to take the Fourier transform of
the equation to convert it into wave number space and there solve the resulting algebraic equations.
This conversion process can be very efficient if the Fast Fourier Transform algorithm is used,
allowing a solution to be evaluated with O(n log n) operations.
In its simplest form the FFT algorithm requires there to be n = 2
p
mesh points in the direction(s)
to be transformed. The efficiency of the algorithm is achieved by first calculating the transform of
pairs of points, then of pairs of transforms, then of pairs of pairs and so on up to the full resolution.
The idea is to divide and conquer! Details of the FFT algorithm may be found in any standard text.
8.1.6 Boundary elements
*
8.1.7 Finite elements
*
8.2 Poisson equation
The Poisson equation ∇
2
ϕ = f(x) may be treated using the same techniques as Laplace’s equation.
It is simply necessary to set the righthand side to f, scaled suitably to reflect any scaling in A.
8.3 Diffusion equation
Consider the twodimensional diffusion equation,
∂
∂
∂
∂
∂
∂
u
t
D
u
x
u
y
· +

.
`
,
2
2
2
2
, (170)
subject to u(x,y,t) = 0 on the boundaries x=0,1 and y=0,1. Suppose the initial conditions are
u(x,y,t=0) = u
0
(x,y) and we wish to evaluate the solution for t > 0. We shall explore some of the
options for achieving this in the following sections.
8.3.1 Semidiscretisation
One of the simplest and most useful approaches is to discretise the equation in space and then
solve a system of (coupled) ordinary differential equations in time in order to calculate the solution.
Using a square mesh of step size ∆x = ∆y = 1/m, and taking the diffusivity D = 1, we may utilise our
earlier approximation for the Laplacian operator (equation (148)) to obtain
*
Not examinable
*
Not examinable
*
Not examinable
Numerical Methods for Natural Sciences IB Introduction
– 85 –
∂
∂
u
t
u u u u u
x
i j i j i j i j i j i j , , , , , ,
≈
+ + + −
+ − + − 1 1 1 1
2
4
∆
(171)
for the internal points i=1,m–1 and j=1,m–1. On the boundaries (i=0,j), (i=m,j), (i,j=0) and (i,j=m)
we simply have u
ij
=0. If U
ij
represents our approximation of u at the mesh points x
ij
, then we must
simply solve the (m–1)
2
coupled ordinary differential equations
( )
( )
′ · + + + −
+ − + −
U t U U U U U x
i j i j i j i j i j i j , , , , , ,
/
1 1 1 1
2
4 ∆ . (172)
In principle we may utilise any of the time stepping algorithms discussed in earlier lectures to
solve this system. As we shall see, however, care needs to be taken to ensure the method chosen
produces a stable solution.
8.3.2 Euler method
Applying the Euler method Y
n+1
= Y
n
+∆tf(Y
n
,t
n
) to our spatially discretised diffusion equation
gives
( ) ( ) ( ) ( ) ( ) ( ) ( )
( )
U U U U U U U
i j
n
i j
n
i j
n
i j
n
i j
n
i j
n
i j
n
, , , , , , ,
+
+ − + −
· + + + + + −
1
1 1 1 1
4 µ , (173)
where the Courant number
µ = ∆t/∆x
2
, (174)
describes the size of the time step relative to the spatial discretisation. As we shall see, stability of
the solution depends on µ in contrast to an ordinary differential equation where it is a function of the
time step ∆t only.
8.3.3 Stability
Stability of the Euler method solving the diffusion equation may be analysed in a similar way to
that for ordinary differential equations. We start by asking the question “does the Euler method
converge as t–>infinity?” The exact solution will have u –> 0 and the numerical solution must also
do this if it is to be stable.
We choose
U
(0)
i,j
=sin(αi) sin(βj), (175)
for some α and β chosen as multiples of π/m to satisfy u = 0 on the boundaries. Substituting this
into (173) gives
U
(1)
i,j
= sin(αi)sin(βj) + µ{sin[α(i+1)]sin(βj) + sin[α(i−1)]sin(βj)
+ sin(αi)sin[β(j+1)] + sin(αi)sin[β(j1)] − 4 sin(αi)sin(βj)}
= sin(αi)sin(βj) + µ{[sin(αi)cos(α) + cos(αi)sin(α)]sin(βj) + [sin(αi)cos(α) − cos(αi)sin(α)]sin(βj)
+ sin(αi)[sin(βj)cos(β) + cos(βj)sin(β)] + sin(αi)[sin(βj)cos(β) − cos(βj)sin(β)] − 4 sin(αi)sin(βj)}
Numerical Methods for Natural Sciences IB Introduction
– 86 –
= sin(αi)sin(βj) + 2µ{sin(αi)cos(α) sin(βj) + sin(αi)sin(βj)cos(β) − 2 sin(αi)sin(βj)}
= sin(αi)sin(βj){1 + 2µ[cos(α) + cos(β) − 2]}
= sin(αi)sin(βj){1 − 4µ[sin
2
(α/2) + sin
2
(β/2)]}. (176)
Applying this at consecutive times shows the solution at time t
n
is
U
(n)
i,j
= sin(αi)sin(βj) {1 − 4µ[sin
2
(α/2) + sin
2
(β/2)]}
n
, (177)
which then requires 1 − 4µ[sin
2
(α/2) + sin
2
(β/2)] < 1 for this to converge as n>infinity. For this to
be satisfied for arbitrary α and β we require µ < 1/4. Thus we must ensure
∆t < ∆x
2
/4. (178)
A doubling of the spatial resolution therefore requires a factor of four more time steps so overall the
expense of the computation increases sixteenfold.
The analysis for the diffusion equation in one or three dimensions may be computed in a similar
manner.
8.3.4 Model for general initial conditions
Our analysis of the Euler method for solving the diffusion equation in section 8.3.3 assumed
initial conditions of the form sin(kπx/L
x
) sin(lπy/L
y
) where k,l are integers and L
x
, L
y
are the
dimensions of the domain. In addition to satisfying the boundary conditions, these initial conditions
represent a set of orthogonal functions which may be used to construct any arbitrary initial
conditions as a Fourier series. Now, since the diffusion equation is linear, and as our stability
analysis of the previous section shows the conditions under which the solution for each Fourier
mode is stable, we can see that the equation (178) applies equally for arbitrary initial conditions.
8.3.5 CrankNicholson
The implicit CrankNicholson method is significantly better in terms of stability than the Euler
method for ordinary differential equations. For partial differential equations such as the diffusion
equation we may analyse this in the same manner as the Euler method of section 8.3.3.
For simplicity, consider the one dimensional diffusion equation
∂
∂
∂
∂
u
t
u
x
·
2
2
(179)
with u(x=0,t) = u(x=1,t) = 0 and apply the standard spatial discretisation for the curvature term to
obtain
( ) ( ) ( ) ( )
{ ¦
( ) ( ) ( ) ( )
{ ¦ U U U U U U U U
i
n
i
n
i
n
i
n
i
n
i
n
i
n
i
n +
+
+ +
−
+
+ −
− − + · + − +
1
1
1 1
1
1
1 1
2
2
2
2
µ µ
(180)
for the i=1,m−1 internal points. Solution of this expression will involve the solution of a tridiagonal
system for this onedimensional problem at each time step:
Numerical Methods for Natural Sciences IB Introduction
– 87 –
( )
( )
( )
( )
( )
( )
( )
( )
1
1
1
1
1
1
1
1
1
1
2
1
2
1
2
1
2
1
2
1
2
1
2
1
2
1
2
1
2
1
2
1
2
1
2
1
2
0
1
1
1
2
1
3
1
4
1
5
1
6
1
1
1
− + −
− + −
− + −
− + −
− + −
− + −
− + −
]
]
]
]
]
]
]
]
]
]
]
]
]
]
]
]
+
+
+
+
+
+
+
−
+
µ µ µ
µ µ µ
µ µ µ
µ µ µ
µ µ µ
µ µ µ
µ µ µ
U
U
U
U
U
U
U
U
n
n
n
n
n
n
n
m
n
( )
( ) ( ) ( )
( )
( ) ( ) ( )
( )
( ) ( ) ( )
( )
U
U U U U
U U U U
U U U U
m
n
n n n n
n n n n
m
n
m
n
m
n
m
n
+
− − −

.
`
,
·
+ − +
+ − +
+ − +

.
`
,
1
1
1
2 2 1 0
2
1
2 3 2 1
1
1
2 1 2
0
2
2
2
0
µ
µ
µ
(181)
To test the stability we again choose a Fourier mode. Here we have only one spatial dimension so
we use U
(0)
i
= sin(θi) which satisfies the boundary condition if θ is a multiple of π. Substituting this
into (182) we find
U U U
i
n
i
n
i
n
·
−
+

.
`
,
·
−
+

.
`
,
−1
2
2
2
2
0
2
2
2
2
1 2
1 2
1 2
1 2
µ
µ
µ
µ
θ
θ
θ
θ
sin
sin
sin
sin
. (182)
Since the term [1–2µsin
2
(θ/2)]/[1+2µsin
2
(θ/2)] < 1 for all µ > 0, the CrankNicholson method is
unconditionally stable. The step size ∆t may be chosen on the grounds of truncation error
independently of ∆x.
8.3.6 ADI
*
Not examinable
8.4 Advection
*
Not examinable
8.4.1 Upwind differencing
*
Not examinable
8.4.2 Courant number
*
Not examinable
*
Not examinable
*
Not examinable
*
Not examinable
*
Not examinable
Numerical Methods for Natural Sciences IB Introduction
– 88 –
8.4.3 Numerical dispersion
*
Not examinable
8.4.4 Shocks
*
Not examinable
8.4.5 LaxWendroff
*
Not examinable
8.4.6 Conservative schemes
*
Not examinable
*
Not examinable
*
Not examinable
*
Not examinable
*
Not examinable
Numerical Methods for Natural Sciences IB Introduction
– 89 –
9. Number representation
*
This section outlines some of the fundamentals as to how numbers are represented in computers.
None of the material in this section is examinable.
9.1. Integers
*
Normally represented as 2’s compliment. Positive integers are presented as their binary
equivalents:
Decimal Binary Two’s compliment (16 bit)
1 1
0000000000000001
5 101
0000000000000101
183 10110111
0000000010110111
255 11111111
0000000011111111
32767 111111111111111
0111111111111111
Negative numbers are obtained by taking the one’s compliment (i.e. inverting all bits) and adding
one: –A = (A xor 1111111111111
2
) + 1. Any carry bit generated is discarded. This operation is its
selfinverse
Decimal Two’s compliment (16 bit)
A 5
0000000000000101
xor 1111111111111 1111111111111010
+ 1
–5
1111111111111011
Decimal
Two’s compliment (16 bit)
–A –5
1111111111111011
xor 1111111111111 0000000000000100
+ 1
5
0000000000000101
Examples
Decimal Binary Two’s compliment (16 bit)
–1 –1
1111111111111111
–5 –101
1111111111111011
–183 –10110111
1111111101001001
–255 –11111111
1111111100000001
–32767 –111111111111111
1000000000000001
–32768 –1111111111111110
1000000000000000
• The number must be represented by a known number of bits, n (say)
• The highest order bit indicates the sign of the number
• If n bits are used to represent the integer, values in the range –2
n–1
to 2
n–1
–1 may be
represented. For n=16, values from –32768 to 32767 are represented
*
Not examinable
*
Not examinable
Numerical Methods for Natural Sciences IB Introduction
– 90 –
• Normal binary addition of two two’s compliment numbers produces a correct result
if any carry bit is discarded
Decimal
Two’s compliment (16 bit)
–5
1111111111111011
+ + 183
0000000010110111
binary sum ???
10000000010110010
discard carry bit 178
0000000010110010
9.2. Floating point
*
Represented as a binary mantissa and a binary exponent. There are a number of strategies for
encoding the two parts. The most common of these is the IEEE floating point format which we
describe here.
Figure 24 shows the IEE format for four byte floating point values, and figure 25 the same for
eight byte values. In both cases the number is stored as a sign bit followed by the expondent and the
mantissa. The number of bits used to represent the exponent and mantissa varies depending on the
total number of bits.
Sign bit. The sign bit gives the overall sign for the number. A value of 0 indicates positive
values, while 1 indicates negative values.
Exponent. This occupies eight bits for four byte values and eleven bits for eight byte values. The
value stored in the exponent field is offset such that for four byte reals exponents of n = 1 ,0 and +1
are stored as 126 (=01111110
2
), 127 (=01111111
2
) and 128 (=10000000
2
) respectively. The
corresponding stored values for eight byte numbers are 1022 (=01111111110
2
), 1023
(=01111111111
2
) and 1024 (=10000000000
2
). Each of these exponents represents a power of two
scaling on the mantissa.
Mantissa. The mantissa is stored as unsigned binary in the remaining 23 (four byte values) or 52
(eight byte values) bits. The use of a mantissa plus an exponent means that the binary representation
of all floating point values apart from zero will start with a one. It is thus unnecessary to store the
first binary digit, and so improve the accuracy of the number representation. There is an assumed
binary point (equivalent of a decimal point) following the unstored part of the number.
Zero. The value zero (0) does not follow the pattern of a unit value for the first binary digit in the
mantissa. Given that, plus it being, in general, a special case, zero does not follow the above
pattern. Instead, it is stored with all three components set to zero.
In the example below we show the four byte IEE representations of a selection of values.Binary
values are indicated by a subscript 2.
Decimal Sign Exponent Mantissa Value Stored
seeeeeee emmmmmmm mmmmmmmm mmmmmmmm
0.0
0
2
0
00000000
2
0.0
0.00000
2
0.0
00000000 00000000 00000000 00000000
2
1.0 +
0
2
0
01111111
2
1.0
1.00000
2
1.0
00111111 10000000 00000000 00000000
2
8.0 +
0
2
3
10000100
2
1.0
1.00000
2
8.0
01000001 00000000 00000000 00000000
2
*
Not examinable
Numerical Methods for Natural Sciences IB Introduction
– 91 –
3.0 +
0
2
1
10000000
2
1.5
1.10000
2
3.0
01000000 01000000 00000000 00000000
2
3.0 
1
2
1
10000000
2
1.5
1.10000
2
3.0
11000000 01000000 00000000 00000000
2
0.25 +
0
2
2
01111101
2
1.0
1.00000
2
0.25
00111110 10000000 00000000 00000000
2
0.2 +
0
2
3
01111100
2
1.6
1.10011
2
0.2 + 0.0000000149..
00111110 01001100 11001100 11001101
The first six values in this table yield exact representations under the IEE format. The final value
(0.2), however, while an exact decimal, is represented as a recurring binary and thus suffers from
truncation error in the IEEE format. Here the final bit has been rounded up, whereas the recurring
sequence would carry on the 11001100 pattern. The error for a single value is small, but when
carried through the a number of stages of computation, may lead to substantial errors being
generated.
9.3. Rounding and truncation error
*
The finite precision of floating point arithmetic in computers leads to rounding and truncation
error. Truncation error is the result of the computer not being able to represent most floating point
values exactly.
If two floating point values are combined, by addition, for example, then there may be a further
loss of precision through rounding error even if both starting values may be represented exactly.
Consider the sum of 0.00390625=0.00000001
2
and 16=10000
2
, both of which have exact
representations in our binary floating point format. The sum of these, 10000.00000001
2
would
require a 13 bit mantissa. If we were to have a format where we have only 12 bits stored for the
mantissa, it would be necessary to reduce the number by either rounding the binary number up to
10000.0000001
2
, or truncating it to give 10000.0000000
2
. In either case there is a loss of precision.
The effect of this will accumulate of successive computations an may lead to a meaningless final
result if care is not taken to ensure adequate precision and a suitable order for calculations.
9.4. Endians
*
While almost all modern computers use formats discussed in the previous sections to store
integer and floating point values, the same value may not be stored in the same way on two different
machines. The reason for this is in the architecture of the central processing unit (CPU). Some
machines store the least significant byte of a word at the lower memory address and the most
significant byte at the upper memory address occupied by the value. This is the case for machines
based on Intel and Digital processors (e.g. PCs and DEC alphas), amongst others. However, other
machines store the values the opposite way around in memory, storing the least significant byte at
the higher memory address. Many (but not all) Unix work stations use the latter strategy.
The result of this is that files containing binary data (e.g. in the raw IEEE format rather than as
ASCII text) will not be easily portable from a machine that uses one convention to one that uses the
*
Not examinable
*
Not examinable
Numerical Methods for Natural Sciences IB Introduction
– 92 –
other. A two byte integer value of 1 = 00000000 00000001
2
written by one machine would be read
as 256 = 00000001 00000000
2
when read by the other machine.
Numerical Methods for Natural Sciences IB Introduction
 93
10. Computer languages
*
A complete discussion of computer languages would require a whole course in
itself. These notes are simply intended to give the reader a feeling for the strengths
and uses of different languages. The notes are written from the perspective of
someone who was brought up on a wide range of languages with programs suitable for
both procedural and object oriented approaches. The discussion is necessarily brief
and may be unfairly critical on some languages.
Ideally you would use the language which is most appropriate for a given
application. For most projects, the language will be selected from the intersection of
the subset of languages you know and the set of languages supported on the specific
platform you require. Only for large projects is it worth investing the time and money
required to utilise the optimal language.
None of the material in this section is examinable.
10.1. Procedural verses Object Oriented
*
For many applications, the current trend is towards object oriented languages such
as C++. Procedural languages are considered behind the times by many computer
scientists, yet they remain the language of choice for many scientific programs. The
main procedural contender is, and has always been Fortran. Both C++ and Fortran
have their advantages, depending on the task at hand. This section is intended to act as
an aid for choosing the appropriate language for a given task.
10.2. Fortran 90
*
Fortran is one of the oldest languages still in wide spread use. It has long had a
standard, enabling programs written for one type of machine to be relatively easily
ported to run on a different type of machine, provided a Fortran compiler was
available. The standard has evolved to reflect changes in coding practices and
requirements. Unfortunately for many programmers, the standard lags many years
behind current thinking. For example, Fortran 77 is the standard to which the majority
of current compilers adhere. As the name suggests, the standard was released in 1977
and the demands of computing have changed significantly since then.
The latest standard, Fortran 90, was not finally released until about 1994. Like
earlier improvements to the standard, it maintains a high level of backward
compatibility. A Fortran 90 compiler should be able to compile all but a few of the
oldest of Fortran programs (assuming they adhere to the relevant standard). While
Fortran 90 answers the majority of criticisms about the earlier version, work has
*
Not examinable
*
Not examinable
*
Not examinable
Numerical Methods for Natural Sciences IB Introduction
 94
already started on the next generation, helping to ensure Fortran remains in wide
spread use.
10.2.1. Procedural oriented
*
The origins of Fortran as a scientific computer language are reflected in the name
which stands for Formula Translation. The language is procedural oriented with the
algorithm (or “formula”) playing the central rôle controling the way the program is
written and structured. In contrast, the object oriented C++ (see below) focuses on the
data. For the majority of scientific programs using the types of algorithms introduced
in this course, the procedural oriented approach is more appropriate. The program
generates data from input parameters. The underlying numerical algorithms process
one type of data in a consistent manner.
As the most computationally intensive programs have traditionally followed the
procedural model, and have been written in Fortran, there has been a tremendous
effort put into developing optimisation strategies for Fortran compilers. As a result the
code produced by Fortran compilers is normally more efficient for a numerical
program than that produced by other language compilers. In contrast, if you were to
develop a word processor, for example, in Fortran, the end result would be less
efficient (in terms of speed and size) and more cumbersome than something with the
same functionality developed in C++.
10.2.2. Fortran enhancements
*
For those familiar with Fortran 77, the Fortran 90 standard has introduced many of
the features sadly missing from the earlier version. Control structures such as DO
loops have done away with the need for a numeric label. WHILE, CASE and BREAK
statements have been added to simplify program logic, and the ability to call
subroutines recursively allows the program to operate more efficiently. From the data
side structures (records), unions an maps greatly simplify the passing of parameters
and the organisation of variables without resort to a multitude of COMMON blocks.
Perhaps the most significant changes accompany a move from all data structures
being static (i.e. the size of and space required by variables is determined at compile
time) to allowing dynamic allocation of memory. This greatly improves the flexibility
of a program and frequently allows you to achieve more with less memory as the same
memory can be reused for different purposes.
Many of these enhancements have been borrowed from languages such as C++,
and adapted to the Fortran environment. One of the most powerful (and, if abused,
most dangerous) is the ability to overload a function or operator. By overloading a
function or operator, the precise result will depend on the type (and number) of
parameters or operands used in executing the function/operator. For example, the
operator “+” could be defined such that, when it “adds” two strings, then it
concatenates them together, perhaps stripping any trailing spaces in the process.
*
Not examinable
*
Not examinable
Numerical Methods for Natural Sciences IB Introduction
 95
Similarly it would be possible to define “number=string” such that the string “string”
was passed to an interpreter function and the contents of it evaluated. Clearly doing
something like defining “WRITE” as “READ” would lead to a great deal of
confusion!
10.3. C++
*
Like Fortran, C++ has evolved from earlier languages. The original, BCPL, was
adopted and enhanced by Bell Laboratories to produce B and then C as part of their
project to develop Unix.
10.3.1. C
*
The main features of C are its access to lowlevel features of the system and its
extremely compact (and often confusing) notation. C has often been described as a
write only language as it is relatively easy to write code which almost no one will be
able to decipher. The popularity of C stems from its close association with Unix rather
than from any inherent qualities. While there are probably still more programs written
in Fortran and COBOL than there are in C, it is almost certain that more people use
programs written in C (or C++) than in any other language. The vast majority of
shrink wrapped applications are now written in C or C++.
While C compilers are still widely available and in common use, there is little or
no justification for using C for any new project. C++ (which is more than just an
improved version of C) provides all the functionality of C and allows the code to be
written in a more flexible and readable manner.
Many programs were written in C during the late 1980s because it was the “in”
language rather than the best for a given application. As a result of this investment and
the relative simplicity of gradually moving over to C++, most mainstream applications
in C
10.3.2. Object Oriented
*
As systems became bigger, it became desirable – often essential – to reuse
components as much as possible. Historically this was achieved through the use of
software libraries, but such libraries tend to be useful only for lowlevel routines. The
concept of Object Oriented Programming was introduced to C to provide a straight
forward way of handling many similar – but not identical – objects without having to
write specialised code for each object. An object is simply some data which describes
a coherent unit.
*
Not examinable
*
Not examinable
*
Not examinable
Numerical Methods for Natural Sciences IB Introduction
 96
Suppose we have an object my_cup which belongs to some class Ccup_of_drink
(say). The object my_cup will take the form of a description of the cup which might
include information about its colour, its texture, its use and its contents. Suppose we
also have an object my_plate and that this belongs to a class Cplate_of_food. Clearly
there is much in common between a cup and a plate. They are both crockery, although
my_cup will contain a drink while my_plate will contain food. To save us having to
define the crockery aspects separately for my_cup and my_plate, we shall allow them
to inherit these attributes from a class Ccrockery. Tableware will, in turn, inherit
attributes from other classes such as Ctableware, Ccolour and Ctexture. This
hierarchy may be expressed as
Ccup_of_drink: Ccrockery, Cdrink
Cplate_of_food: Ccrockery, Cfood
Ccrockery: Ctableware, Ccolour, Ctexture
Similarly we might have a teaspoon my_teaspoon belonging to Ccutlery which
follows
Ccutlery: Ctableware, Ccolour
The objectoriented approach comes in if we have an object my_dishwasher. The
dishwasher will accept all objects belonging to Ctableware. The classes Ccrockery
and Ccutlery are both derived from Ctableware and so any Ccrockery or Ccutlery
object is also a Ctableware object, and can thus be washed by the my_dishwasher.
Similarly as Ccup_of_drink and Cplate_of_food are derived (indirectly) from
Ctableware and so can be washed by my_dishwasher.
The object my_dishwasher would simply say “wash thyself” to any object it
contains. For a Ccrockery or Ccutlery object the process of washing would be
identical, and could thus be embodied at the Ctableware level. It is not quite so simple
for the Ccup_of_drink and Cplate_of_food objects as they contain food which would
be destroyed by the dishwasher. Thus when my_dishwasher tries to wash the
Ctableware derived object my_plate, it is necessary for the Cplate_of_food class to
override the Ctableware procedure for washing. In this example the Cplate_of_food
response to the dishwasher may be to declare the my_plate empty of food and then
pass the request for washing the plate down to its embedded Ccrockery class which
will in turn pass it down to Ctableware.
10.3.3. Weaknesses
*
On the surface, the class structure would appear to allow C++ to be adapted to any
type of application. In practice there are some things that are difficult to achieve in a
seamless way and others which lead to relatively inefficient programming. This
inefficiency may not matter for a wordprocessor (although users of Microsoft Word
may not agree), but can be critical for a large scale numerical simulation.
For someone brought up on most other languages, the most obvious omission is
support for multidimensional arrays. While C++ is able to have an array of arrays, and
*
Not examinable
Numerical Methods for Natural Sciences IB Introduction
 97
so have some of the functionality of a multidimensional array, they can not be used in
the same way and the usual notation can not be employed. Further, it is easy to get
confused as to what you have an array of, especially with the cryptic notation shared
by C and C++.
C++ itself is a very small language with only a few built in functions and operators.
It gains its functionality through standard and specialised libraries. While this adds to
the flexibility, it has implications on the time taken to compile a program and, more
importantly, on the execution speed. Even with the most carefully constructed
libraries and the functions being inserted inline (instead of being called), the fact the
compiler does not understand the functions within the library limits the degree of
optimisation which can be achieved.
10.4. Others
*
10.4.1. Ada
*
Touted as the language to replace all other languages, Ada was introduced in the
1980s with a substantial backing from the US military. Its acceptance was slow due to
its cumbersome nature and its unsuitability for many of the computers of the period.
10.4.2. Algol
*
In the late 1960s and early 1970s Algol (Algorithmic Language) looked as though
it would become the scientific computer language. Algol forced the programmer to
adopt a much more structured approach than other languages at the time and offered a
vast array of builtin functionality such as matrix manipulation. Ultimately it was the
shear power that lead to the demise of Algol as it was unable to fit on the smaller mini
and microcomputers as they were developed.
10.4.3. Basic
*
Basic has developed from a design exercise in Southampton to the language which
is known by more people than any other. The name Basic (Beginners Symbolic
Instruction Code) does not really represent a language, but rather a family of
languages which appear similar in concept but will differ in almost every important
detail. Most implementations of Basic are proprietary and some are much better than
others. Basic was designed as an interpreted language and most implementations
follow this heritage. The name Basic is often used for the interpreted macro language
in many modern applications, especially those from Microsoft.
*
Not examinable
*
Not examinable
*
Not examinable
*
Not examinable
Numerical Methods for Natural Sciences IB Introduction
 98
10.4.4. Cobol
*
Cobol (Common Business Oriented Language) and Fortran are the only two early
highlevel languages remaining in widespread use. However, whereas Fortran has
evolved with time, Cobol is essentially the same as it was 30 years ago. It is an
extremely verbose, Englishlike language aimed at business applications. It remains in
widespread use not because of any inherent advantages, but because of the huge
investment which would be required to upgrade the software to a more modern
language.
10.4.5. Delphi
*
Developed by Borland International as a competitor for Microsoft’s Visual Basic,
Delphi tries to combine the graphical development tools of Visual Basic with the
structure and consistency of Pascal and the Object Oriented approach of C++ and
Smalltalk. It is being predicted that Delphi will become increasingly popular on PCs.
10.4.6. Forth
*
A threadedinterpreted language. Developed to control telescopes. Uses reverse
polish notation. Extremely efficient for an interpreted language, but relatively hard to
read and program.
10.4.7. Lisp
*
Good for Artificial Intellegence, provided you like recursion and counting brackets.
10.4.8. Modula2
*
An improved form of Pascal. Very good, but never widely accepted.
10.4.9. Pascal
*
Developed as a teaching language. The standard lacks many of the features
essential for effective computing. Most implementations provide significant
enhancements to this standard, but are often not portable. Can be used for scientific
programming, but the compilers are often inferior to those of Fortran and C++.
*
Not examinable
*
Not examinable
*
Not examinable
*
Not examinable
*
Not examinable
*
Not examinable
Numerical Methods for Natural Sciences IB Introduction
 99
10.4.10. PL/1
*
Introduced by IBM in the 1970s as a replacement for both Cobol and Fortran, PL/1
has some of the advantages and some of the disadvantages of both languages.
However, it never gained much of a following due to a combination of its proprietary
nature and a requirement for very substantial mainframe resources.
10.4.11. PostScript
*
Often thought of as a printer protocol, PostScript is in fact a fullfledged language
with all the normal capabilities plus built in graphics! Like Forth, PostScript is a
threadedinterpreted language which results in efficient but hard to read code. While
PostScript can be used for generalpurpose computing, it is inadvisable as it would tie
up the printer for a long time! Unlike most languages in widespread use, PostScript is
a proprietary language belonging to Adobe. There are, however, many emulations of
PostScript with suitably disguised names: GhostScript, StarScript, …
10.4.12. Prolog
*
10.4.13. Smalltalk
*
One of the first objectoriented programming languages. It provides a more
consistent view than C++, but is less widely accepted.
10.4.14. Visual Basic
*
This flavour of Basic was introduced by Microsoft as an easy method of writing
programs for the Windows GUI. It has been an extremely successful product, but
remains relatively inefficient for computational work when compared with C++ or
Fortran.
*
Not examinable
*
Not examinable
*
Not examinable
*
Not examinable
*
Not examinable
Numerical Methods for Natural Sciences IB
Introduction
CONTENTS
• Page numbers are correct only for the handout with which they are given Formatting and visibility in this version: ................................................................................... 1 Subsections ................................................................................................................................. 1 Subsubsections....................................................................................................................... 1 1 Introduction................................................................................................................................ 6 1.1 Objective ............................................................................................................................... 6 1.2 Books .................................................................................................................................... 6 General: .................................................................................................................................. 6 More specialised:.................................................................................................................... 6 1.3 Programming......................................................................................................................... 7 1.4 Tools ..................................................................................................................................... 7 1.4.1 Software libraries .......................................................................................................... 7 1.4.2 Maths systems ................................................................................................................ 7 1.5 Course Credit ........................................................................................................................ 8 1.6 Versions ................................................................................................................................ 8 1.6.1 Notes disctributed during lectures................................................................................. 8 1.6.2 Acrobat........................................................................................................................... 8 1.6.3 HTML............................................................................................................................. 8 1.6.4 Copyright ....................................................................................................................... 9 2 Key Idea .................................................................................................................................... 10 3 Root finding in one dimension ................................................................................................ 11 3.1 Why? ................................................................................................................................... 11 3.2 Bisection ............................................................................................................................. 11 3.2.1 Convergence ................................................................................................................ 12 3.2.2 Criteria......................................................................................................................... 12 3.3 Linear interpolation (regula falsi) ....................................................................................... 13 3.4 NewtonRaphson................................................................................................................. 14 3.4.1 Convergence ................................................................................................................ 15 3.5 Secant (chord) ..................................................................................................................... 16 3.5.1 Convergence ................................................................................................................ 18 3.6 Direct iteration .................................................................................................................... 19 3.6.1 Convergence ................................................................................................................ 20 3.7 Examples............................................................................................................................. 21 3.7.1 Bisection method.......................................................................................................... 21 3.7.2 Linear interpolation..................................................................................................... 22 3.7.3 NewtonRaphson.......................................................................................................... 22 3.7.4 Secant method .............................................................................................................. 23 3.7.5 Direct iteration ............................................................................................................ 23
–2–
Numerical Methods for Natural Sciences IB
Introduction
3.7.6 Comparison.................................................................................................................. 25 3.7.7 Fortran program* ........................................................................................................ 26 4 Linear equations ...................................................................................................................... 29 4.1 Gauss elimination ............................................................................................................... 29 4.2 Pivoting............................................................................................................................... 32 4.2.1 Partial pivoting ............................................................................................................ 33 4.2.2 Full pivoting................................................................................................................. 33 4.3 LU factorisation .................................................................................................................. 34 4.4 Banded matrices.................................................................................................................. 36 4.5 Tridiagonal matrices ........................................................................................................... 36 4.6 Other approaches to solving linear systems........................................................................ 37 4.7 Over determined systems* ................................................................................................... 38 4.8 Under determined systems* ................................................................................................. 40 5 Numerical integration.............................................................................................................. 41 5.1 Manual method ................................................................................................................... 41 5.2 Constant rule ....................................................................................................................... 41 5.3 Trapezium rule .................................................................................................................... 42 5.4 Midpoint rule ..................................................................................................................... 44 5.5 Simpson’s rule .................................................................................................................... 47 5.6 Quadratic triangulation* ...................................................................................................... 48 5.7 Romberg integration ........................................................................................................... 49 5.8 Gauss quadrature................................................................................................................. 50 5.9 Example of numerical integration....................................................................................... 51 5.9.1 Program for numerical integration* ............................................................................ 54 6 First order ordinary differential equations ........................................................................... 57 6.1 Taylor series........................................................................................................................ 57 6.2 Finite difference .................................................................................................................. 57 6.3 Truncation error .................................................................................................................. 58 6.4 Euler method....................................................................................................................... 59 6.5 Implicit methods ................................................................................................................. 60 6.5.1 Backward Euler ........................................................................................................... 60 6.5.2 Richardson extrapolation ............................................................................................ 61 6.5.3 CrankNicholson.......................................................................................................... 62 6.6 Multistep methods............................................................................................................... 63 6.7 Stability............................................................................................................................... 63 6.8 Predictorcorrector methods................................................................................................ 65 6.8.1 Improved Euler method................................................................................................ 66 6.8.2 RungeKutta methods................................................................................................... 66 7 Higher order ordinary differential equations ....................................................................... 68 7.1 Initial value problems ......................................................................................................... 68 7.2 Boundary value problems ................................................................................................... 68 7.2.1 Shooting method .......................................................................................................... 68
–3–
........................................................................................................ 87 8.................................................3......3 Numerical dispersion*....1...........................................3 Diffusion equation .......... 69 7......................................................................................................................................................1.................................................................................................................................................................................................................................................................................... 80 8.................... 84 8.........................................................Numerical Methods for Natural Sciences IB Introduction 7. Endians* ......................................................1................................................................... 85 8....... Procedural oriented*.............. 93 10................................2........................................................................................................................... 87 8..............................................1.........................................2............5 CrankNicholson....................................... 88 8...................3................................................. Ada* .... Object Oriented* .................................................................. Others*............2 Euler method...................... 72 8............................................................ 90 9......................................3...................................4........................2.......... C* ..................5 LaxWendroff* ..........3............. Weaknesses* ........... 94 10...................6 Boundary elements* ...................................................3..............................1 Truncation error* ............................3................................................3 Stability .......... Procedural verses Object Oriented* ...3..1................ 72 8............3 Other considerations* ..3.....................................3.................................................................4...................3..............................................................................................................................4............................................................................4...................................................................... Fortran 90* ......... 84 8.............................................1 Upwind differencing* .................................................................................................................. Algol* ... 84 8.... 94 10................. Floating point*..........................4 Shocks* .................2.......3 Multigrid* .............................................................4 Model for general initial conditions .................................4....................................................................... Fortran enhancements* ..................... 79 8...............................................................................................................4 The mathematics of relaxation* ......7 Finite elements*.................................... 87 8.............................................1..................................................................................... 95 10................................. 93 10......................1 Direct solution ...3................... 88 8.....1...........................4 Advection*.............................................1................................. 86 8........................2........................... 88 8................................................................................ 95 10....6 ADI* ................1....................................2 Relaxation ......... 89 9................................................ 73 8................................................ 95 10.... 93 10..... 97 10.............................................................. 97 –4– .......................2..................... 97 10................... 70 7.............................1.............................................................................................................................................................................4.................................................2 Poisson equation .................1 Semidiscretisation.....................................4............................2.............................. 85 8.............4.................................................. 75 8.........................................1...................................... 84 8. 87 8.............................. 88 9....................................................................................................................................................................................................... C++* ................1................ 86 8.......... 84 8....3................. 96 10................... 91 9...........3.6 Conservative schemes* ........................... Integers* ...4..............................3........2 Linear equations ............................2........................................................................................................................... 71 7..........................4..................................................................................... Number representation* .............................. 89 9.............................................................2 Error and step control* ........................... 91 10..........................................1 Laplace equation ..... Rounding and truncation error* ........................................................ 71 8 Partial differential equations ...... 84 8........2 Courant number*.............................................. Computer languages* .....5 FFT* ..........................................................
Numerical Methods for Natural Sciences IB
Introduction
10.4.3. Basic* ........................................................................................................................ 97 10.4.4. Cobol* ....................................................................................................................... 98 10.4.5. Delphi* ...................................................................................................................... 98 10.4.6. Forth* ........................................................................................................................ 98 10.4.7. Lisp* .......................................................................................................................... 98 10.4.8. Modula2* ................................................................................................................. 98 10.4.9. Pascal* ...................................................................................................................... 98 10.4.10. PL/1* ....................................................................................................................... 99 10.4.11. PostScript* .............................................................................................................. 99 10.4.12. Prolog* .................................................................................................................... 99 10.4.13. Smalltalk* ................................................................................................................ 99 10.4.14. Visual Basic* ........................................................................................................... 99
–5–
Numerical Methods for Natural Sciences IB
Introduction
1 Introduction
These lecture notes are written for the Numerical Methods course as part of the Natural Sciences Tripos, Part IB. The notes are intended to compliment the material presented in the lectures rather than replace them.
1.1 Objective
• To give an overview of what can be done • To give insight into how it can be done • To give the confidence to tackle numerical solutions An understanding of how a method works aids in choosing a method. It can also provide an indication of what can and will go wrong, and of the accuracy which may be obtained. • To gain insight into the underlying physics • “The aim of this course is to introduce numerical techniques that can be used on computers, rather than to provide a detailed treatment of accuracy or stability” – Lecture Schedule. Unfortunately the course is now examinable and therefore the material must be presented in a manner consistent with this.
1.2 Books General:
• Numerical Recipes  The Art of Scientific Computing, by Press, Flannery, Teukolsky & Vetterling (CUP) • Numerical Methods that Work, by Acton (Harper & Row) • Numerical Analysis, by Burden & Faires (PWSKent) • Applied Numerical Analysis, by Gerald & Wheatley (AddisonWesley) • A Simple Introduction to Numerical Analysis, by Harding & Quinney (Institute of Physics Publishing) • Elementary Numerical Analysis, 3rd Edition, by Conte & de Boor (McGrawHill)
More specialised:
• Numerical Methods for Ordinary Differential Systems, by Lambert (Wiley) • Numerical Solution of Partial Differential Equations: Finite Difference Methods, by Smith (Oxford University Press) For many people, Numerical Recipes is the bible for simple numerical techniques. It contains not only detailed discussion of the algorithms and their use, but also sample source code for each.
–6–
Numerical Methods for Natural Sciences IB
Introduction
Numerical Recipes is available for three tastes: Fortran, C and Pascal, with the source code examples being taylored for each.
1.3 Programming
While a number of programming examples are given during the course, the course and examination do not require any knowledge of programming. Numerical results are given to illustrate a point and the code used to compute them presented in these notes purely for completeness.
1.4 Tools
Unfortunately this course is too short to be able to provide an introduction to the various tools available to assist with the solution of a wide range of mathematical problems. These tools are widely available on nearly all computer platforms and fall into two general classes:
1.4.1 Software libraries
These are intended to be linked into your own computer program and provide routines for solving particular classes of problems. • NAG • IMFL • Numerical Recipes The first two are commercial packages providing object libraries, while the final of these libraries mirrors the content of the Numerical Recipes book and is available as source code.
1.4.2 Maths systems
These provide a shrinkwrapped solution to a broad class of mathematical problems. Typically they have easytouse interfaces and provide graphical as well as text or numeric output. Key features include algebraic analytical solution. There is fierce competition between the various products available and, as a result, development continues at a rapid rate. • Derive • Maple • Mathcad • Mathematica • Matlab • Reduce
–7–
All graphics and complex display equations (where the Microsoft Equation Editor has been used) are converted to GIF files for the HTML version. they appear as characters rather than GIF files in the HTML document. Since then. To avoid confusion when using older browsers. This has major advantages in terms of document size. Note that there has.6 Versions These lecture notes are available in three forms: the lecture notes distributed during lectures.6.uk/user/fdl/people/sd103/lectures/ These notes contain all the information. The HTML version of the notes also has all the blanks filled in. all Greek and Symbols are formatted in Green. unfortunately. display equations and inline equations and symbols. The HTML is generated from a source Word document that contains graphics. been a tendency to concentrate on the more analysis side of the course in the examination questions. Greek and Symbols used within the text and single line equations may not be displayed correctly.6. Some indication of the type of exam questions may be gained from earlier tripos papers and from the later examples sheets. Some of the topics covered in these notes are not examinable.Numerical Methods for Natural Sciences IB Introduction 1. although may not contain the latest revisions. some browsers do not handle superscript and subscript. 1. but can cause problems with older World Wide Web browsers. Variables and normal symbols –8– .cam. many of the simpler equations and most of the inline equations and symbols do not use the Equation Editor as this is very inefficient. Thus if you find a green Roman character.6. and any blanks have been filled in. 1. and the set available in two formats on the web. This year’s notes will be provided in Acrobat format (pdf) and may be found at http://www. however. This situation is indicated by an asterisk at the end of the section heading.ac.2 Acrobat The lecture notes are also available over the web. 1. this course was not examinable. 1.damtp. However.5 Course Credit Prior to the 19951996 academic year. Similarly. there have been two examination questions each year.1 Notes disctributed during lectures The version distributed during lectures includes blanks for you to fill in the missing details. Due to limitations in HTML and many older World Wide Web browsers. and these notes remain available. read it as the Greek equivalent. As a consequence. These details will be given during the lectures themselves. Table 1of the correspondences is given below.3 HTML In previous years these have been provided through an html format.
Greek subscripts and superscripts are the same Green as the normal characters. The colour is still displayed. subscripts are shown in dark Cyan and superscripts in dark Magenta. the context providing the key to whether it is a subscript or superscript.0 and Netscape 3. 1. The context and colour should distinguish them from HTML hypertext links.6. For a similar reason. but on many Unix platforms the Greek and Symbol characters are unavailable) do not have the same character set limitations.Numerical Methods for Natural Sciences IB Introduction are treated in a similar way but are coloured dark Blue to distinguish them from the Greek. but the characters appear as intended. –9– . Fortunately many newer browsers (Microsoft Internet Explorer 3.0 on the PC. should contain details of the author and this copyright notice. Any such reproductions.4 Copyright These notes may be duplicated freely for the purposes of education or research. Greek/Symbol character α β δ ∆ ε ϕ Φ λ µ π θ σ ψ Ψ <= >= <> =~ vector Name alpha beta delta Delta epsilon phi Phi lambda mu pi theta sigma psi Psi less than or equal to greater than or equal to not equal to approximately equal to vectors are represented as bold Table 1: Correspondence between colour and characters. in whole or in part. Similarly. the use of some mathematical symbols (such as less than or equal to) has been avoided and their Basic computer equivalent used in stead.
y + δy ) = f ( x . – 10 – . we may represent the expansion as δx 2 δx 3 f ( x + δx ) = f ( x ) + δxf ′( x ) + f ′′( x ) + f ′′′( x ) +K. y ) + δx ∂f ∂f δx 2 ∂ 2 f δy 2 ∂ 2 f ∂2 f + δy + + + δxδy +K. For a function of a single variable. 2 6 In two dimensions we have f ( x + δx . 2 ∂x 2 2 ∂y 2 ∂x ∂y ∂x∂y (2) (1) Similar expansions may be constructed for functions with more independent variables.Numerical Methods for Natural Sciences IB Introduction 2 Key Idea The central idea behind the majority of methods discussed in this course is the Taylor Series expansion of a function about a point.
xm.Numerical Methods for Natural Sciences IB Introduction 3 Root finding in one dimension 3. 3.…. If the scalar function f depends on m independent variables x1.1 Why? Solutions x = x0 to equations of the form f(x) = 0 are often required where it is impossible or infeasible to find an analytical expression for the vector x. then the solution x0 will describe a surface in m–1 dimensional space. – 11 – . The basic algorithm for the bisection method relies on repeated application of • Let xc = (xa+xb)/2. Clearly. Alternatively we may consider the vector function f(x)=0. One of the main drawbacks is that we need two initial guesses xa and xb which bracket the root: let fa = f(xa) and fb = f(xb) such that fa fb <= 0.x2.2 Bisection This is the simplest method for finding a root to an equation and is also known as binary chopping. fb xa fc fa xc xb Figure 1: Graphical representation of the bisection method showing two initial guesses (xa and xb bracketting the root). For this course we restrict our attention to a single independent variable x and seek solutions to f(x)=0. As we shall see. the solutions of which typically collapse to particular values of x. An example of this is shown graphically in figure 1. it is also the most robust. if fa fb = 0 then one or both of xa and xb must be a root of f(x) = 0.
We repeat this interval halving until either the exact root has been found or the interval is smaller than some specified tolerance. (4) (3) More generally.xb) is halved for each iteration. on average. at least provided εn+1 < εn. must be en < xa .Numerical Methods for Natural Sciences IB Introduction • if fc = f(c) = 0 then x = xc is an exact solution.a root to f(x) = 0 will always be found . the error in our estimate of the solution to f(x) = 0 is. Now since the interval (xa.xb) always bracets the root.provided f(x) is continuous over the initial interval. The larger the value of p. halved. if xn is the estimate for the root x* at the nth iteration. • else the root lies in the interval (xc. (6) Indeed this criteria applies to all techniques discussed in this course. the value of fc. we know that the error in using either xa or xb as an estimate for root at the nth iteration. By replacing the interval (xa. then en+1 ~ en/2. εn+1 < εn for suitably large n). rather it will find some suitably accurate approximation to it. then the error in this estimate is εn = xn . or both.x*.e. showing the scheme is first order and converges linearly. • elseif fa fc < 0 then the root lies in the interval (xa.2.xb). and that this convergence is sufficiently rapid to attain the solution in a reasonable span of time. it is necessary to place some conditions under which the solution process is to be finished or aborted. 3. but in many cases it applies only asymptotically as our estimate xn converges on the exact solution. en. 3. For first order schemes (i. (5) In many cases we may express the error at the n+1th time step in terms of the error at the nth time step as εn+1 ~ Cεnp.2 Criteria In general. The guaranteed convergence of the bisection method does not require – 12 – .xc) or (xc. For the bisection method we may estimate εn as en.2.1 Convergence Since the interval (xa. Typically this will take the form of an error tolerance on en = an–bn. Indeed convergence is guaranteed . C < 1 for convergence.xb) (whichever brackets the root). a numerical root finding procedure will not find the exact root being sought (ε = 0). The form of equation (4) then suggests p = 1 and C = 1/2.xb) with either (xa.e.xb. The exponent p in equation (6) gives the order of the convergence. the faster the scheme converges on the solution.xc). In order to prevent the algorithm continuing to refine the solution for ever. For some methods it is also important to ensure the algorithm is converging on a solution (i. p = 1).
xc). A graphical interpretation of this method is shown in figure 2.3 Linear interpolation (regula falsi) This method is similar to the bisection method in that it requires two initial guesses to bracket the root. is one of the reasons for its widespread use despite being relatively slow to converge. – 13 – . • elseif fa fc < 0 then the root lies in the interval (xa. fb xa x1 x2 xb fa Figure 2: Root finding by the linear interpolation (regula falsi) method. 3. combined with its extreme simplicity.Numerical Methods for Natural Sciences IB Introduction such safety checks which. a linear interpolation is used to obtain a new point which is (hopefully. but not necessarily) closer to the root than the equivalent estimate for the bisection method. instead of simply dividing the region in two. then f a = xb − b fb = a b fb − f a fb − f a fb − f a • if fc = f(xc) = 0 then x = xc is an exact solution. The two initial gueses xa and xb must bracket the root. The basic algorithm for the linear interpolation method is x − xa x − xa x f − xb f a • Let xc = xa − b . However.
(9) Geometrically. f0 x1 x2 x0 Figure 3: Graphical interpretation of the Newton Raphson algorithm. – 14 – . 3. (8) Subsequent iterations are defined in a similar manner as xn +1 = xn − f ’( xn ) f ( xn ) . Figure 3 provides a graphical interpretation of this. xn+1 can be interpreted as the value of x at which a line. (7) Setting the quadratic and higher terms to zero and solving the linear approximation of f(x) = 0 for x gives x1 = x0 − f ’( x0 ) f ( x0 ) .xb).4 NewtonRaphson Consider the Taylor Series expansion of f(x) about some point x = x0: f(x) = f(x0) + (x–x0)f'(x0) + ½(x–x0)2f"(x0) + O(x–x03).f(xn)) and tangent to the curve f(x) at that point.Numerical Methods for Natural Sciences IB Introduction • else the root lies in the interval (xc. convergence is guaranteed as was the case for the bisection method. passing through the point (xn. The method is first order and is exact for linear f. Because the solution remains bracketed at each step. crosses the y axis.
x0 x1 x2 Figure 4: Divergence of the Newton Raphson algorithm due to the presence of a turning point close to the root. A graphical interpretation of this is given in figure 4. expand f(x) around the root x = x*. if f’ vanishes at an iteration point.4. f(x) = f(x*) + (x– x*)f'(x*) + 1/2(x– x*)2f"(x*) + O(x– x*3) and substitute into the iteration formula. then the method will fail to converge. 3.1 Convergence To study how the NewtonRaphson scheme converges. This then shows (10) – 15 – . or indeed even between the current estimate and the root. NewtonRaphson converges much more rapidly than the bisection or linear interpolation.Numerical Methods for Natural Sciences IB Introduction When it works. However.
The difference is that rather than replacing one of the two estimates so that the root is always bracketed. but on the other hand. by comparison with (5).. there is second order (quadratic) convergence...Numerical Methods for Natural Sciences IB Introduction ε n +1 = x n +1 − x * = xn − x * − f ′( x n ) f ( xn ) 2 ε n f ′( x *) + 12 ε n f ′′( x *) +. convergence is not guaranteed.3. x n − x n −1 (12) and this is substituted into the NewtonRaphson algorithm (9) to give x n +1 = x n − (x − x ) f x .. Thus. ( ) f (x ) − f (x ) n n −1 n n n −1 (13) This formula is identical to that for the Linear Interpolation method discussed in section 3.. – 16 – . [ ] f ′(1x *) 1 − ε n 2 = εn − εn + εn f ′′( x *) 1 2 f ′′( x *) − 2 εn + O(ε 3 ) n f ′( x *) f ′( x *) f ′′( x *) +. A graphical representation of the method working is shown in figure 5 and failure to converge in figure 6.e. f ’( xn ) ≈ f ( x n ) − f ( x n −1 ) .5 Secant (chord) This method is essentially the same as NewtonRaphson except that the derivative f’(x) is approximated by a finite difference based on the current and the preceding estimate for the root. i. the oldest point is always discarded in favour of the new. f ′( x *) (11) = 1 2 ε2 n f ′′( x *) + O(ε 3 ) n f ′( x *) since f(x*)=0. This means it is not necessary to have two initial guesses bracketing the root. = εn − f ′( x *) + ε n f ′′( x *) +.. The presence of the f’ term in the denominator shows that the scheme will not converge if f’ vanishes in the neighbourhood of the root.. 2 = ε n − ε n f ′( x *) + 1 2 ε n f ′′( x *) +. 3. swapping the two initial guesses x0 and x1 will change the behaviour of the method from convergent to divergent. In some cases..
x0 x1 x3 x2 Figure 6: Divergence using the secant method.Numerical Methods for Natural Sciences IB Introduction x1 x2 x4 x3 x0 Figure 5: Convergence on the root using the secant method. – 17 – .
( ε − ε n −1 ) 1 n f ′′( x *) (ε n − ε n−1 ) f ′( x *) 1 + 2 (ε n + ε n−1 ) f ’( x *) +. Expanding around the root x = x* for xn and xn+1 gives f(xn) = f(x*) + εnf'(x*) + 1/2εn2f''(x*) + O(εn3). 2 f ′( x *) n and substituting into the error evolution equation (15) gives β (16) f ′′( x *) ε n +1 = ε ε 2 f ′( x *) n n −1 f ′′( x *) f ′′( x *) = ε 2 f ′( x *) n 2 f ′( x *) f ′′( x *) = 2 f ′( x *) 1−β α −β α ε1 α n (17) ε nα α +1 which we equate with our assumed relationship to show – 18 – .. 1 − 1 2 (ε n + ε n −1 ) +.1 Convergence The order of convergence may be obtained in a similar way to the earlier methods.Numerical Methods for Natural Sciences IB Introduction 3.− ε n −1 f ′( x *) + 12 ε n −1 f ′′( x *) +... (15) Note that this expression for εn+1 includes both εn and εn–1.... f ′′( x *) f ′′( x *) = ε n − ε n + 1 2 ε 2 +. n f ′( x *) f ′( x *) [ ] (ε n − ε n −1 ) = ε n − ε n + 1 2 ε n ( ε n + ε n −1 ) = 1 2 f ′′( x *) 1 2 f ′′( x *) − 2 εn + O(ε 3 ) n f ′( x *) f ′( x *) f ′′( x *) 3 ε ε + O(ε n −1 ) f ′( x *) n n −1 .... f(xn1) = f(x*) + εn–1f’(x*) + 1/2εn–12f’’(x*) + O(εn–13). and substituting into the iteration formula ε n +1 = x n +1 − x * = xn − x * − = εn − = εn − f ( xn ) (14a) (14b) f ( xn ) − f ( xn −1 ) (x n − xn −1 ) 2 2 ε n f ′( x *) + 12 ε n f ′′( x *) +... 2 ε n f ′( x *) + 12 ε n f ′′( x *) +. The form of this expression suggests a power law relationship.5. By writing f ′′( x *) α ε n +1 = ε .. In general we would like it in terms of εn only.. 2 ε n f ′( x *) + 12 ε n f ′′( x *) +..
x). A graphical interpretion of this formula is given in figure 7. Indeed. As with NewtonRaphson. The efficiency and convergence of this method depends on the final form of g(x). – 19 – . The iteration formula for this method is then just xn+1 = g(xn).Numerical Methods for Natural Sciences IB Introduction 1+ α 1+ 5 = .61803… (the golden ratio). The only restriction on T(f(x).6 Direct iteration A simple and often useful method involves rearranging and possibly transforming the function f(x) by T(f(x). 3.x) is that solutions to f(x) = 0 have a one to one relationship with solutions to g(x) = x for the roots being sort.x) to obtain g(x) = T(f(x). (19) x1 x0 Figure 7: Convergence on a root using the Direct Iteration method. α 2 α 1 2 β= = = . the method may diverge if f’ vanishes in the neighbourhood of the root. one reason for choosing such a transformation for an equation with multiple roots is to eliminate known roots and thus simplify the location of the remaining roots. 1+ α α 1+ 5 α= (18) Thus the method is of noninteger order 1.
then we get oscillatory convergence/divergence. In addition.6. so that the evolution of the error follows ε n +1 = x n +1 − x * = g ( xn ) − x * (20) 2 3 = g ( x*) + ε n g ′( x *) + 1 2 ε n g ′′( x *) + O(ε n ) − x * 2 2 = ε n g ′( x *) + 1 2 ε n g ′′( x *) + O(ε n ) . we should choose T(f(x). (21) The method is clearly first order and will converge only if g’ < 1. Here g’ < −1 so the divergence is oscilatory. Here we need to expand g(x) rather than f(x). If g’(x) < 0.Numerical Methods for Natural Sciences IB Introduction 3. The sign of g’ determines whether the convergence (or divergence) is monotonic (positive g') or oscillatory (negative g'). Figure 8 shows how the method will diverge if this restriction on g' is not satisfied.x) so that the curvature g"(x) does not become too large. Obviously our choice of T(f(x). This gives g(xn) = g(x*) + εng’(x*) + 1/2εn2g"(x*) + O(εn3).1 Convergence The convergence of this method may be determined in a similar manner to the other methods by expanding about x*.x) should try to minimise g’(x) in the neighbourhood of the root to maximise the rate of convergence. – 20 – .
• Expect linear convergence: εn+1 ~ εn/2. (22) 3.7 Examples Consider the equation f(x) = cos x − 1/2.500001909862 0. 3. Iteration 0 1 2 Error 0.130900 0.1 Bisection method • Initial guesses x = 0 and x = π/2.0654498 en+1/en 0.7.Numerical Methods for Natural Sciences IB Introduction x0 Figure 8: The divergence of a Direct Iteration when g’ < −1.4999984721161 0.5000015278886 – 21 – .261799 0.
1620777746239 3.7.00204534 0.5000449824959 0.00000000199939 0.4999951107776 0.5027411002019 3.09310313729469 0.000000230672 0. Iteration Error 0 1 2 3 4 0.000127861 0. Iteration 0 Error 0.00000000687124 1.09310010849472 0.0000319871 0.000154302 0.0327250 0.0235988 0.09310104005501 0.1213205550823 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 0. • Expect quadratic convergence: εn+1 ~ cεn2.09310567679542 0.7.4996574405018 0.4986322611303 0.0000266121 0.0E15 en+1/en 0.2885971297087  Machine accuracy – 22 – .0163624 0.00653855280777 0.5000036669437 0.0000000000173302 0.000285755 0.3 NewtonRaphson • Initial guess: x = π/2.09374663216227 0.261799 0.4999755541866 0.000511356 0.09312907910623 0.0000000000000002284 en+1/en 0.0000445311143083 0.09310037252741 0.0000638867 0.00818126 0.4999139542706 0.000255634 0.00000000000161354 0. • Note that can not use x = 0 as derivative vanishes here.2770714107399 0.09316100003719 0.000000000186144 0.00102262 0.09310039562066 0.000000014553 en+1/en2 0.10000070962752 0.09310059304987 0.00000801862 0.2 Linear interpolation • Initial guesses x = 0 and x = π/2.0963178807113 0.000000000000150319 0.00000247767 0.5006848060707 0.0000000000000014092 0.Numerical Methods for Natural Sciences IB Introduction 3 4 5 6 7 8 9 10 11 12 13 14 15 0.0000000214757 0.5001721210794 0.4999969442322 0.0317616 0.00305921 0.09340810209172 0.00409059 0.0000159498 0.0000000000000140919 0. • Expect linear convergence: εn+1 ~ cεn.5000110008581 0.
1350620072841 0.09730712558561 0.261799 0.00000000000815828 0.000585926 0.618.1339750632633 0.000008384051747483 0.13 εn.1213205550823 0.00000000000109311 0.0000000000608936 0.0000000258486 0.0008898244696049 0.0306025 0.00309063 0.000000189008 0.0000105299 0.0000000000001465442 en+1/en 0.1341210323488 0.00000141077 0. • Expect convergence: εn+1 ~ cεn1. 3.00433820 0.1339878013503 0.1339748963181 0.00000000339255 0.1339937647134 0.169615 0.5.0000290491 0.7.0000000253223 0.0000785850 0.30997006568 0.1339747523914 0.009399086917554 0.1339775306508 0.1417596601585 0.1340617138257 – 23 – . Iteration 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 (23) Error 0.3344 0.0317616 0.1 Addition of x Use xn+1 = g(x) = xn + cos x – 1/2 • Initial guess: x = 0 (also works with x = π/2) • Expect convergence: εn+1 ~ g’(x*) εn ~ 0.000000000454515 0.1339747969181 0.8203 0.7. 3.4 Secant method • Initial guesses x = 0 and x = π/2.000000000000216716 Machine accuracy • Convergence substantially faster than linear interpolation. Iteration Error 0 1 2 3 4 5 6 en+1/en en+1/en1.1339759843399 0.1339744440023 0.5 Direct iteration There are a variety of ways in which equation (22) may be rearranged into the form required for direct iteration.4098 0.7.5664 0.547198 0.1804233116175 0.Numerical Methods for Natural Sciences IB Introduction 3.2777 0.618 0.
• Expect convergence: εn+1 ~ g’(x*) εn ~ 0. the convergence of such methods is very slow.0181527 0.8038528579103 0.0120854 0.0412126 0.8754733164962 0. Consider f(x) = f(x) + (x–x)h(x) = 0.778843985023 0.7987216482514 0.00534209 0.7.0155171 0.0101649 0. (25) Rearranging equation (25) for one of the x variables and labelling the different variables for different steps in the interation gives xn+1 = g(xn) = xn – f(xn)/h(xn).7.3 Approximating f’(x) The Direct Iteration method is closely related to the Newton Raphson method when a particular choice of transformation T(f(x)) is made.00291446 en+1/en 0.5.0238837 0.854809477378 0. However. then we recover the NewtonRaphson method with its quadratic convergence.8410892481838 0.00441179 0. In some situations calculation of f'(x) may not be feasible.8258546748557 0. (26) Now if we choose h(x) such that g’(x)=0 everywhere (which requires h(x) = f'(x)).026 εn. The Direct Iteration method.00354643 0.7908921878228 0. In such cases it may be necessary to rely on the first order and secant methods which do not require a knowledge of f'(x).7600455540808 0. To do this we select h(x) as an approximation to f’(x).Numerical Methods for Natural Sciences IB Introduction 3.8218010788314 3.00803934 0.9577980958138 0. For the present f(x) = cos x .00668830 0.6773664418235 0.0272809 0. (27) – 24 – .8319464035605 0.0635232 0. on the otherhand. provides us with a framework for a faster method.2 Multiplcation by x Use xn+1 = g(x) = 2x cos x • Initial guess: x = π/2 (fails with x = 0 as this is a new solution to g(x)=x) • Expect convergence: εn+1 ~ g’(x*) εn ~ 0. Iteration 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 (24) Error 0.9070721090152 0.5.0608424 0.1/2 we may approximate f'(x) as h(x) = 4x(x – π)/π2 • Initial guess: x = 0 (fails with x = π/2 as h(x) vanishes).0373828 0.7297714456916 0.81 εn.
02572151088348 0.000000000310022 0.0 9.Numerical Methods for Natural Sciences IB Introduction Iteration 0 1 2 3 4 5 6 7 8 9 Error 0.02572134969427 0.000704654 0.0 Scheme Bisection Linear interpolation NewtonRaphson Secant General iteration: xn+1 = xn + cosxn . Since h(x) only approximates f’(x). it may be possible to choose h(x) in such a way as to avoid a divergent scheme.0e5 1.7.1/2 General iteration: xn+1 = 2xncosxn General iteration: Approximating f’(x) with quadratic Interation (n) Figure 9: Comparison of the convergence of the error in the estimate of the root to cos x = 1/2 for a range of different root finding algorithms.02985973863078 0.02570835580191 0. – 25 – .0 8.02585084310882 0.00000000000000516850 en+1/en 0.0 7. 3. with the approximated derivative for the Direct Iteration proving a very good alternative.02521207213623 Machine accuracy The convergence.01 0.0e10 1.0 1.0 13.0 10.6 Comparison Figure 9 shows graphicall a comparison between the different approaches to finding the roots of equation (22).000000000000205001 0.0 11. Convergence for cos(x) = 1/2 1.0 0.0235988 0.0e6 1.0e7 1.0 16.0 3.0e9 1.0 6.0000182159 0.0e12 1.0e13 1.0e14 0. while still formally linear.02572477890195 0. For a more complex example.0 12.000000468600 0.1 0.00000000000797410 0.0 15.0 14.0 5. and the accuracy of this approximation is more important close to the root. the computational cost of having more iterations than Newton Raphson may be significantly less than the cost of evaluating the derivative.0e8 1. A further potential use of this approach is to avoid the divergence problems associated with f’(x) vanishing in the Newton Raphson scheme.0 4.0e4 Relative error (εn/ε0) 1.0e11 1. is significantly more rapid than with the other first order methods.02572107785899 0.001 1. The clear winner is the NewtonRaphson scheme.0 2.0000000120531 0.
0 fb = f(xb) DO i=0.i) = xc .pi.xStar ENDDO C=====NewtonRaphson * Not examinable – 26 – .i) = xc .5) WRITE(6. No knowledge of Fortran or any other programming language is required in this course.(xbxa)/(fbfa)*fa fc = f(xc) IF (fa*fc .fa.0 fb = f(xb) DO i=0.7.*)’# ’.f.xb.xStar.Numerical Methods for Natural Sciences IB Introduction 3.xa.15 xc = xa .5 df(x) = SIN(x) pi = 3.0) THEN xb = xc fb = fc ELSE xa = xc fa = fc ENDIF Error(0.xStar ENDDO C=====Linear interpolation xa = 0 fa = f(xa) xb = pi/2. PROGRAM Roots INTEGER*4 i.0) THEN xb = xc fb = fc ELSE xa = xc fa = fc ENDIF Error(1.xc.xStar.fb. 0.df REAL*8 Error(0:15.15 xc = (xa + xb)/2. Note that this is included as an illustrative example.LT.141592653 xStar = ACOS(0.fc.0 fc = f(xc) IF (fa*fc .7 Fortran program* The following program was used to generate the data presented for the above examples. 0.0:15) f(x)=cos(x)0.j REAL*8 x.f(xStar) C=====Bisection xa = 0 fa = f(xa) xb = pi/2.LT.
g12.xStar ENDDO C=====Direct iteration using x + f(x) = x xa = 0.15 xa = xa .15 xa = 2.i) = xa .i4.0 DO i=0.15 xa = xa .xStar ENDDO C=====Direct iteration using xf(x)=0 rearranged for x CStarting point prevents convergence xa = pi/2.Numerical Methods for Natural Sciences IB Introduction xa = pi/2.j=0.f(xa)*pi*pi/(4.15 IF (fa .0*xa*(xapi)) Error(7.0 DO i=0.(xbxa)/(fbfa)*fa xa = xb fa = fb xb = xc fb = f(xb) ENDIF Error(3.i) = xa .xStar ENDDO C=====Direct iteration using xf(x)=0 rearranged for x xa = pi/4.0 DO i=0.100)i.5) Error(5.0*xa*(f(x)0.i) = xa .NE.8(1x. fb) THEN C If fa = fb then either method has converged (xa=xb) C or will diverge from this point xc = xa .xStar ENDDO C=====Secant xa = 0 fa = f(xa) xb = pi/2.0*xa*COS(xa) Error(6.0 DO i=0.xStar ENDDO C=====Output results DO i=0.15 WRITE(6.f(xa)/df(xa) Error(2.15 xa = 2.i).0 DO i=0.i) = xa .i) = xa .i) = xc .15 xa = xa + f(xa) Error(4.6)) END – 27 – .(Error(j.0 fb = f(xb) DO i=0.7) ENDDO 100 FORMAT(1x.xStar ENDDO C=====Direct iteration using 4x(xpi)/pi/pi to approximate f’ xa = pi/2.
Numerical Methods for Natural Sciences IB Introduction – 28 – .
and subtract the first equation from the third equation to get x + 2y + 3z = 6 0 x + − 2 y + − 3z = − 5 . x + 4 y + 4z = 9 (28) we might then subtract 2 times the first equation from the second equation. In particular we have z = −2 / −2 = 1 x = (6 − 2 y − 3z ) / 1 = ( 6 − 2 − 3) / 1 = 1 / 1 = 1 y = ( − 5 + 3z ) / −2 = ( − 5 + 3) / −2 = −2 / −2 = 1 .Numerical Methods for Natural Sciences IB Introduction 4 Linear equations Solving equation of the form Ax = r is central to many numerical algorithms. the context in which it is to be solved and the size compared with the available computer resources. Substituting this back into the second equation gives an equation for y and soon. From the system – 29 – . if x + 2 y + 3z = 6 2 x + 2 y + 3z = 7 .1 Gauss elimination This is what you would probably do if you were computing the solution of a nontrivial system by hand. Which is best will depend on the structure of A. 4. (31) We may write this system in terms of a matrix A. an unknown vector x and the known righthand side b as Ax = b. 0x + 0 y + − 2z = − 2 (30) The third equation now involves only z giving z = 1. while others iterative in nature and providing only approximate solutions. For example. some algebraically correct. z = 3 0x + 2 y + (29) In the second step we might add the second equation to the third to obtain x + 2y + 3z = 6 0 x + − 2 y + − 3z = − 5 . (32) and do exactly the same manipulations on the rows of the matrix A and righthand side b. There are a number of methods which may be used.
0 2 1 z 3 (34) Before adding the second and third rows. 0 2 − 2 1 − 2( 3 / 2) z 3 − 2( 5 / 2) 1 2 3 x 6 0 1 3 / 2 y = 5 / 2 . Substituting back we get 1 2 3 x 6 6 0 1 0 y = 5 / 2 − ( 3 / 2) z = 1 . 0 0 1 z 1 (38) and retrieve immediately the value z = 1 from the last row of the equation. getting 1 2 3 x 6 0 1 3 / 2 y = 5 / 2 . and subtract the first row from the third row to obtain 1 2 3 x 6 0 − 2 − 3 y = − 5 .Numerical Methods for Natural Sciences IB Introduction 1 2 3 x 6 2 2 3 y = 7 . 1 4 4 z 9 (33) we subract 2 times the first row from the second row. but in general it simplifies the next step where we subtract a32 times the second row from the third row. the element on the diagonal. Thus the next step is 1 x 2 3 6 0 1 3 / 2 y = 5 / 2 . 0 2 1 z 3 (35) This may seem pointless in this example. Here a32 = 2 and represents the value in the second column of the third row of the matrix. 0 0 1 z 1 1 and finally we recover the answer (39) – 30 – . this time we will divide the second row through by −2. 0 0 − 2 z − 2 (36) ⇒ (37) We again divide the resulting row (now the third row) by the element on the diagonal (a33 = −2) to obtain 1 2 3 x 6 0 1 3 / 2 y = 5 / 2 .
as computer arithmetic is not exact. 2 x2 − a1. and the matrix A is not singular. there will be some truncation and rounding error in the answer. M an1 x1 + an 2 x2 + an 3 x3 +K+ ann xn = rn (41) we first divide the first row by a11 and then subtract a21 times the new first row from the second row. If the arithmetic is exact. M $ $ $ $ x1 = r1 − a1.n xn . then its use in the elimination of subsequent rows may lead to differences being computed between (44) – 31 – . then the answer computed in this manner will be exact (provided no zeros appear on the diagonal . However.n xn .3 x3 −L− a1.2. $ $ xn −1 = rn −1 − an −1.n −1 xn −1 − an − 2 . for the system (40) a11 x1 + a12 x2 + a13 x3 +K+ a1n xn = r1 a21 x1 + a22 x2 + a23 x3 +K+ a2 n xn = r2 a31 x1 + a32 x2 + a33 x3 +K+ a3n xn = r3 .n xn . $ xn = rn .see below). this time using the new contents of element 2. a31 times the new first row from the third row … and an1 times the new first row from the nth row. This gives a12 a11 1 0 a − (a a )a 22 21 11 12 0 a32 − (a31 a11 )a12 0 an 2 − (an1 a11 )a12 a13 a11 K a1n a11 r1 a11 x1 x r − (a a )r a23 − (a21 a11 )a13 a2 n − (a21 a11 )a1n 2 2 21 11 1 x3 = r3 − (a31 a11 )r1 . The cumulative effect of this error may be very significant if the loss of precision is at an early stage in the computation. (42) a33 − (a31 a11 )a13 a3n − (a31 a11 )a1n M M M xn rn − (an1 a11 )r1 an 3 − (an1 a11 )a13 ann − (an1 a11 )a1n By repeating this process for rows 3 to n.Numerical Methods for Natural Sciences IB Introduction 1 0 0 x x 6 − 2 y − 3z 1 0 1 0 y = y = 1 = 1 . 0 0 1 z z 1 1 In general. In particular. we gradually replace the region below the leading diagonal with zeros. if a numerically small number appears on the diagonal of the row. $ $ $ xn − 2 = rn − 2 − an − 2 . Once we have $ 1 a12 0 1 0 0 0 $ $ $ a13 K a1n x1 r1 $ $ $ a23 a2 n x2 r2 $ $ a3n x3 = r3 1 M M M $ 0 K 1 xn rn (43) the final solution may be obtained by back substitution.
if they do. Clearly this leads to difficulties! 4.2 Pivoting One of the ways around this problem is to ensure that small values (especially zeros) do not appear on the diagonal and. the result would be simply 1. if we were to apply the Gauss Elimination outlined above. if a22– (a21/a11)a12 were very small. for example. for example. then at the next stage of the computation the 3. 0 0 1 x3 1 or columns one and two to give 3 0 0 x2 3 0 2 0 x1 = 2 . 0 0 1 x3 1 (45) the solution of which is obviously x1 = x2 = x3 = 1. and both a23–(a21/a11)a13 and a33–(a31/a11)a13 were 1. we would need to divide through by a11 = 0. If single precision arithmetic (representing real values using approximately six significant digits) were being used. Consider.0 and subsequent calculations would be unaware of the contribution of a23 to the solution. to remove them by rearranging the matrix and vectors. say. say. In the example given in (45) we could simply interchange rows one and two to produce 2 0 0 x1 2 0 3 0 x2 = 3 .Numerical Methods for Natural Sciences IB Introduction very large and very small values with a consequential loss of precision. A more extreme case which may often occur is if.3 element would involve calculating the difference between 1/10–6=106 and 1. the system 0 3 0 x1 3 2 0 0 x2 = 2 . More generally. 10–6. suppose at some stage during a calculation we have (46) (47) – 32 – . 0 0 1 x3 1 either of which may then be solved using standard Guass Elimination. For example. a22–(a21/a11)a12 is zero – unless something is done it will not be possible to proceed with the computation! A zero value occuring on the leading diagonal does not mean the matrix is singular. However.
The righthand side vector.2 Full pivoting The philosophy behind full pivoting is much the same as that behind partial pivoting.2 (155) the numerically largest value in the second column. 4. (In an extreme case we might even have the value 0 appearing on the diagonal – clearly something must be done to avoid a divide by zero error occurring!) To remove this problem we may again rearrange the rows and/or columns to bring a larger value into the 2. or only when the diagonal values are sufficiently small as to potentially cause a problem.2.2 and so we simply swap rows 2 and 6 to give $ 4 1 8 3 2 K 5 x1 r1 1 0 − 155 23 4 x r 25 73 2 2 $6 $ 0 9 4 6 −8 2 18 x3 r3 $ 3 2 − 3 4 6003 15 x4 r4 0 = 0 15 $ 1 9 33 − 2 1 x5 r5 −6 $ 1 10 201 13 4 x6 r2 0 10 M M M $ K 88 xn rn 8 56 4 − 4 4 0 . Partial pivoting may be implemented for every step of the solution process. the very small 10–6 value for element 2. The main difference is that the numerically largest value in the column or row containing the value to be replaced.5 (201) is numerically the largest value in the second row and the element 6.1 Partial pivoting In partial or column pivoting.2 element.2.Numerical Methods for Natural Sciences IB Introduction $ 4 1 8 3 2 K 5 x1 r1 1 0 10− 6 1 10 201 13 $ 4 x2 r2 $ 0 9 4 6 −8 2 18 x3 r3 $ 3 2 − 3 4 6003 15 x4 r4 0 = 0 15 $ 1 9 33 − 2 1 x5 r5 $ 25 73 2 x6 r6 0 − 155 23 4 M M M $ K 88 xn rn 8 56 4 − 4 4 0 (48) where the element 2. In our example above element the magnitude of element 2. has been rearranged. Pivoting for every step will lead to smaller errors being introduced through numerical inaccuracies. (49) Note that our variables remain in the same order which simplifies the implementation of this procedure.2 is likely to cause problems.5 (201) is the greatest in either – 33 – . For our example matrix the largest value is in element 6. 4. but the continual reordering will slow down the calculation. however. As discussed above. we rearrange the rows of the matrix and the righthand side to bring the numerically largest value in the column onto the diagonal.
The rearranged system becomes 1 8 3 2 K 5 x1 1 3 0 201 1 10 10 − 6 13 4 x 5 0 − 8 4 6 9 2 18 x 3 0 4 2 −3 3 6003 15 x 4 0 33 1 −2 9 15 1 x 2 2 x 6 0 25 23 4 − 155 73 M M 8 4 K 88 x n 0 − 4 56 4 $ b1 $ b2 b $ 3 $ b = $4 b5 $ b6 M $ b n . This will also entail a rearrangement of the solution vector x. In our example above. (51) Again this process may be undertaken for every step. We may bring this onto the diagonal for the next step by interchanging columns one and six and rows two and four. then the matrix A is singular and no solution exists.6 in the matrix. 4. (52) – 34 – . or only when the value on the diagonal is considered too small relative to the other values in the matrix. The basic idea is to find two matrices L and U such that LU = A.3 LU factorisation A frequently used form of Gauss Elimination is LU Factorisation also known as LU Decomposition or Crout Factorisation. the largest value is 6003 occurring at position 4. The order in which we do this is unimportant. (50) The ultimate degree of accuracy can be provided by rearranging both rows and columns so that the numerically largest value in the submatrix not yet processed is brought onto the diagonal. If it is not possible to rearrange the columns or rows to remove a zero from the diagonal. The final result is 4 1 8 3 2 K 5 x1 1 0 6003 2 − 3 4 3 4 x 6 0 2 4 6 −8 9 18 x 3 −6 0 13 1 10 201 10 15 x4 0 − 2 1 9 33 15 1 x 5 25 − 155 2 x 2 0 73 23 4 M M 4 56 4 − 4 8 K 88 xn 0 $ r1 $ r4 r3 $ $ r2 = $ r 5 $ r6 M $ rn .Numerical Methods for Natural Sciences IB Introduction row 2 or column 2 so we shall rearrange the columns to bring this element onto the diagonal.
Indeed. The basic decomposition algorithm for overwriting A with L and U may be expressed as # Factorisation FOR i=1 TO n FOR p=i TO n a pi = a pi − ∑ a pk a ki k =1 i −1 NEXT p FOR q=i+1 TO n aiq = aiq − ∑ aik a kq k =1 i −1 aii NEXT q NEXT i # Forward Substitution FOR i=1 TO n FOR q=n+1 TO n+m aiq = aiq − ∑ aik a kq k =1 i −1 aii NEXT q NEXT i # Back Substitution FOR i=n–1 TO 1 FOR q=n+1 TO n+m aiq = aiq − NEXT q NEXT i k = i +1 ∑a n ik a kq This algorithm assumes the righthand side(s) are initially stored in the same array structure as the matrix and are positioned in the column(s) n+1 (to n+m for m righthand sides). By convention. Once we have computed L and U we need solve only Ly = b. the L matrix is scaled to have a leading diagonal of unit values. (54) (53) a procedure requiring O(n2) operations compared with O(n3) operations for the full Gauss elimination. then Ux = y.Numerical Methods for Natural Sciences IB Introduction where L is a lower triangular matrix (zero above the leading diagonal) and U is an upper triangular matrix (zero below the diagonal). While the factorisation process requires O(n3) operations. Since we have decided the diagonal elements lii in the lower triangular matrix will always be unity. this need be done only once whereas we may wish to solve Ax=b for with whole range of b. it is not necessary for us to store these elements and so the matrices L and U can be stored together in an array the same size as that used for A. in most implementations the factorisation will simply overwrite A. Note that this decomposition is underspecified in that we may choose the relative scale of the two matrices arbitrarily. To improve the – 35 – .
the forward substitution loop may be incorporated into the factorisation loop. 4. for example. we will already have the i nonzero elements of liT and the first i−1 elements of uj. We want to find vectors liT and uj such that aij = liTuj. However. if the matrix is stored intelligently. uj liT aij Figure 10: Diagramatic representation of how LU factorisation works for calculating uij to replace aij where i < j.Numerical Methods for Natural Sciences IB Introduction efficiency of the computation for righthand sides known in advance. It is sometimes possible to – 36 – . This may require. on the storage requirements.5 Tridiagonal matrices A tridiagonal matrix is a special form of banded matrix where all the elements are zero except for those on and immediately above and below the leading diagonal (b=1). The ith element of uj may therefore be chosen simply as uj(i) = aij− liTujwhere the dotproduct is calculated assuming uj(i) is zero.i+b are all zero. One problem with such banded structures can occur if a (near) zero turns up on the diagonal during the factorisation. For example. Figure 10 indicates how the LU Factorisation process works. Software libraries such as NAG and IMSL provide a range of routines for solving such banded linear systems in a computationally and storage efficient manner. the factorisation loop FOR q=i+1 TO n can terminate at i+b instead of n. 4. it will be essential to record the pivoting which has taken place. then the summations in the LU Factorisation algorithm need be performed only from k=i or k=i+1 to k=i+b. if the matrix is to be used in its factorised form. the potential occurrence of small or zero values on the diagonal can cause computational difficulties.4 Banded matrices The LU Factorisation may readily be modified to account for banded structure such that the only nonzero elements fall within some distance of the leading diagonal. Making use of the banded structure of a matrix can save substantially on the execution time and.i–b to ai. if elements outside the range ai. pivoting on both the rows and columns as described in section 4.2. The white areas represent zeros in the L and U matrices. Moreover. This may be achieved by simply recording the row interchanges for each i in the above algorithm and using the same row interchanges on the righthand side when using L in subsequent forward substitutions. The solution is again pivoting – partial pivoting is normally all that is required.2. As with normal Gauss Elimination. Care must then be taken in any pivoting to try to maintain the banded structure. When we are at the stage of calculating the ith element of uj.
….either exact or approximate . before subtracting a3 times this equation from the third. tridiagonal matrices frequently occur in numerical solution of differential equations. may be expressed as # Factorisation FOR i=1 TO n bi = bi – aici–1 ci = ci/bi NEXT i # Forward Substitution FOR i=1 TO n ri = (ri – airi–1)/bi NEXT i # Back Substitution FOR i=n–1 TO 1 ri = ri – ciri+1 NEXT i (56) 4.6 Other approaches to solving linear systems There are a number of other methods for solving general linear systems of equations including approximate iterative techniques. by analogy with the LU Factorisation. Clearly x–1 and xn+1 are not required and we set a1=cn=0 to reflect this.n. We then divide the new second equation by the new value in place of b2. If solved by standard Gauss elimination. We shall return to this topic in section 8.much faster than the general O(n3) solvers presented here. and so on.Numerical Methods for Natural Sciences IB Introduction rearrange the rows and columns of a matrix which does not initially have this structure in order to gain this structure and hence greatly simplify the solution process. A tridiagonal system may be written as b1 a 2 0 0 0 M 0 0 0 0 c1 b2 a3 0 0 M 0 0 0 0 0 c2 b3 a4 0 M 0 0 0 0 0 0 c3 b4 a5 M 0 0 0 0 0 0 0 L L L 0 0 0 0 0 M 0 0 0 0 0 M cn − 3 bn − 2 an −1 0 0 0 0 0 0 M 0 cn − 2 bn −1 an 0 x1 r1 0 x2 r2 0 x3 r3 0 x4 r4 0 x5 r5 = M M M 0 xn − 3 rn − 3 0 xn − 2 rn − 2 cn −1 xn −1 rn −1 bn xn rn c4 L b5 L M 0 0 0 0 O (55) L bn − 3 L an − 2 L L 0 0 or aixi–1 + bixi + cixi+1 = ri for i=1. Many large matrices which need to be solved in practical situations have very special structures which allow solution . As we shall see later in sections 6 to 8.1 – 37 – . Solution. then we start by dividing the first equation by b0 before subtracting a2 times this from the second equation to eliminate a2.
(58) The approximate solution is chosen by optimising this error in some manner. For example. In this solution we minimise the residual sum of squares. the solution vector x will give us the solution in a least squares sense. c r4 d r5 M rn −1 (57) which is of the form Ax = r.7 Over determined systems* If the matrix A contains m rows and n columns. The error in this approximate solution is then e = Ax – r. it can be valuable to determine the solution in an approximate sense. which is simply rss = eTe. the system is probably overdetermined (unless there are m–n redundant rows). fitting data points si.d) While the solution to Ax = r will not exist in an algebraic sense.Numerical Methods for Natural Sciences IB Introduction where we shall discuss a system with a special structure resulting from the numerical solution of the Laplace equation. where xT = (a.c. if we solve the n by n problem ATAx = ATr. and setting ∂rss ∂x to zero gives ∂rss = 2 A T Ax − 2 A T r = 0 . * Not examinable – 38 – .ri (i = 0. Such a system may be the result from fitting a model with unknown coefficients to experimental data or observations.b. Most useful among the classes of solution is the Least Squares solution. 4. ∂x (59) (60) (61) Thus.n−1) with the model a + bs + cs2 + des = r leads to the linear system 1 s0 1 s1 1 s2 1 s3 1 s 4 1 s5 M M 1 sn −1 2 s0 s12 2 s2 2 s3 2 s4 2 s5 M 2 sn −1 es es es es es es M es 0 1 2 3 4 5 n −1 r0 r1 a r2 b = r3 . with m > n. Substituting for e we obtain rss = [xTAT − rT][Ax − r] = xTATAx − 2xTATr + rTr.
Numerical Methods for Natural Sciences IB
Introduction
Warning: The matrix ATA is often poorly conditioned (nearly singular) and can lead to significant errors in the resulting Least Squares solution due to rounding error. While these errors may be reduced using pivoting in combination with Gauss Elimination, it is generally better to solve the Least Squares problem using the Householder transformation, as this produces less rounding error, or better still by Singular Value Decomposition which will highlight any redundant or nearly redundant variables in x. The Householder transformation avoids the poorly conditioned nature of ATA by solving the problem directly without evaluating this matrix. Suppose Q is an orthogonal matrix such that QTQ = I, where I is the identity matrix and Q is chosen to transform A into
R QA = , 0
(62)
(63)
where R is a square matrix of a size n and 0 is a zero matrix of size mn by n. The righthand side of the system QAx = Qr becomes
b Qr = , c
(64)
where b is a vector of size n and c is a vector of size mn. Now the turning point (global minimum) in the residual sum of squares, (61), this occurs when
∂rss = 2[ A T Ax − A T r ] ∂x = 2[ A T Q T QAx − A T Q T Qr ] = 2 [QA ] QAx − [QA ] Qr
T T
[
= 2[QA ] [QAx − Qr ]
T
]
(65)
= 2R T [ Rx − b]
vanishes. For a nontrivial solution, that occurs when Rx = b. (66)
This system may be solved to obtain the least squares solution x using any of the normal linear solvers discussed above. Further discussion of these methods is beyond the scope of this course.
– 39 –
Numerical Methods for Natural Sciences IB
Introduction
4.8 Under determined systems*
If the matrix A contains m rows and n columns, with m < n, the system is under determined. The solution maps out a n–m dimensional subregion in n dimensional space. Solution of such systems typically requires some form of optimisation in order to further constrain the solution vector. Linear programming represents one method for solving such systems. In Linear Programming, the solution is optimised such that the objective function z=cTx is minimised. The “Linear” indicates that the underdetermined system of equations is linear and the objective function is linear in the solution variable x. The “Programming” arose to enhance the chances of obtaining funding for research into this area when it was developing in the 1960s.
*
Not examinable
– 40 –
Numerical Methods for Natural Sciences IB
Introduction
5 Numerical integration
There are two main reasons for you to need to do numerical integration: analytical integration may be impossible or infeasible, or you may wish to integrate tabulated data rather than known functions. In this section we outline the main approaches to numerical integration. Which is preferable depends in part on the results required, and in part on the function or data to be integrated.
5.1 Manual method
If you were to perform the integration by hand, one approach is to superimpose a grid on a graph of the function to be integrated, and simply count the squares, counting only those covered by 50% or more of the function. Provided the grid is sufficiently fine, a reasonably accurate estimate may be obtained. Figure 11 demonstrates how this may be achieved.
x0
x1
Figure 11: Manual method for determining integral by superimposing a grid on a graph of the integrand. The boxes indicated in grey are counted.
5.2 Constant rule
Perhaps the simplest form of numerical integration is to assume the function f(x) is constant over the interval being integrated. Such a scheme is illustrated in figure 12. Clearly this is not going to be a very accurate method of integrating, and indeed leads to an ambiguous result, depending on whether the constant is selected from the lower or the upper limit of the integral.
– 41 –
with the coefficient being derived from f’(x). (68) – 42 – . or f(x0+∆x)∆x if taken from the upper limit. Clearly we can do much better than this. although a knowledge of it helps with understanding the solution of ordinary differential equations (see §6).Numerical Methods for Natural Sciences IB Introduction x0 x1=x0+∆x Figure 12: Integration by constant rule whereby the value of f(x) is assumed constant over the interval. 1 2 f ′( x 0 )∆x 2 + 1 6 f ′′( x 0 )∆x 3 +K (67) if the constant is taken from the lower limit. and as a result this rule is not used in practice. 5. Integration of a Taylor Series expansion of f(x) shows the error in this approximation to be x + ∆x x0 ∫ f ( x )dx = ∫ f ( x ) + f ′( x )( x − x ) + 0 0 0 x + ∆x 1 2 f ′′( x 0 )( x − x 0 ) +K dx 2 = f ( x 0 ) ∆x + x0 = f ( x 0 )∆x + O( ∆x 2 ) . In both cases the error is O(∆x2).3 Trapezium rule Consider the Taylor Series expansion integrated from x0 to x0+∆x: x + ∆x x0 ∫ f ( x )dx = ∫ f ( x ) + f ′( x )( x − x ) + 0 0 0 x + ∆x 1 2 f ′′( x 0 )( x − x 0 ) +K dx 2 = f ( x 0 )∆x + = = x0 [ 1 2 1 2 ( f ( x ) + f ′( x )∆x + f ′′( x )∆x ( f ( x ) + f ( x + ∆x))∆x + O(∆x ) f ( x0 ) + 0 1 2 0 0 1 2 3 0 0 1 2 f ′( x 0 )∆x 2 + 1 f ′′( x 0 )∆x 3 +K 6 2 +K − ) 1 12 f ′′( x 0 )∆x +K ∆x 2 ] .
the size of the domain would be halved. However. if we were to halve ∆x. the error would be decreased by a factor of eight. As we can see from equation (68). the error in the Trapezium Rule is proportional to ∆x3. The Trapezium Rule used in this manner is sometimes termed the Compound Trapezium Rule. Thus. thus requiring the Trapezium Rule to be evaluated twice and the contributions summed. Suppose we need to integrate from x0 to x1. In general it consists of the sum of integrations over a smaller distance ∆x to obtain a smaller error. We shall subdivide this interval into n steps of size ∆x=(x1–x0)/n as shown in figure 14. but more often simply the Trapezium Rule.Numerical Methods for Natural Sciences IB Introduction The approximation represented by 1/2[f(x0) + f(x0+∆x)]∆x is called the Trapezium Rule based on its geometric interpretation as shown in figure 13. x0 x1=x0+∆x Figure 13: Graphical interpretation of the trapezium rule. The net result is the error decreasing by a factor of four rather than eight. – 43 – .
We would choose to reduce ∆x in the regions of high curvature as we can see from equation (68) that the leading order truncation error is scaled by f". the cumulative error is n times this or O(∆x2) ~ O(n2). The above analysis assumes ∆x is constant over the interval being integrated. ∑ f ( x0 + i∆x) + f ( x0 + (i + 1) ∆x) 2 i =0 ∆x = f ( x0 ) + 2 f ( x0 + ∆x ) + 2 f ( x0 + 2 ∆x )+K+2 f ( x0 + ( n − 1) ∆x ) + f ( x1 ) 2 ≈ n −1 (69) [ ] While the error for each step is O(∆x3).Numerical Methods for Natural Sciences IB Introduction x0 x1=x0+n∆x Figure 14: Compound Trapezium Rule. although it would remain O(∆x2).4 Midpoint rule A variant on the Trapezium Rule is obtained by integrating the Taylor Series from x0−∆x/2 to x0+∆x/2: – 44 – . The Compound Trapezium Rule approximation to the integral is therefore x0 ∫ f ( x ) dx = ∑ ∫ f ( x ) d x i =0 x0 + i∆x x1 n −1 x0 + ( i +1) ∆x ∆x . This is not necessary and an extension to this procedure to utilise a smaller step size ∆xi in regions of high curvature would reduce the total error in the calculation. 5.
(70) By evaluating the function f(x) at the midpoint of each interval the error may be slightly reduced relative to the Trapezium rule (the coefficient in front of the curvature term is 1/24 for the Midpoint Rule compared with 1/12 for the Trapezium Rule) but the method remains of the same order.Numerical Methods for Natural Sciences IB Introduction x0 + 1 2 ∆x x0 − 1 2 ∆x ∫ f ( x ) dx = ∫ f ( x ) + f ′( x )( x − x ) + 0 0 0 x0 + 12 ∆x 1 2 = f ( x 0 )∆x + x0 − 12 ∆x f ′′( x 0 )( x − x 0 ) +K dx 2 1 24 f ′′( x 0 )∆x 3 +K . plus how the first and last intervals are calculated. Again we may reduce the error when integrating the interval x0 to x1 by subdividing it into n smaller steps. – 45 – . x0 − ½∆x x0 + ½∆x Figure 15: Graphical interpretation of the midpoint rule. The difference between the Trapezium Rule and Midpoint Rule is greatly diminished in their compound forms. This Compound Midpoint Rule is then x0 ∫ x1 f ( x ) dx ≈ ∆x ∑ f ( x0 + (i + 1 2 ) ∆x ) . i =0 n −1 (71) with the graphical interpretation shown in figure 16. Comparison of equations (69) and (71) show the only difference is in the phase relationship between the points used and the domain. The grey region defines the midpoint rule as a rectangular approximation with the dashed lines showing alternative trapeziodal aproximations containing the same area. Figure 15 provides a graphical interpretation of this approach.
The first is that is requires one fewer function evaluations for a given number of subintervals. and the second that it can be used more effectively for determining the integral near an integrable singularity.Numerical Methods for Natural Sciences IB Introduction x0 x1=x0+n∆x Figure 16: Compound Midpoint Rule. – 46 – . There are two further advantages of the Midpoint Rule over the Trapezium Rule. The reasons for this are clear from figure 17.
5.5 Simpson’s rule An alternative approach to decreasing the step size ∆x for the integration is to increase the accuracy of the functions used to approximate the integrand. Figure 18: Quadratic approximation to integrand is the basis of Simpson’s Rule.Numerical Methods for Natural Sciences IB Introduction Figure 17: Applying the Midpoint Rule where the singular integrand would cause the Trapezium Rule to fail. Figure 18 sketches one possibility. Integrating the Taylor series over an interval 2∆x shows – 47 – . using a quadratic approximation to f(x).
Simpson’s rule is two orders more accurate at O(∆x5). For each arclike region so created (there are two in figure 19) the maximum deviation (indicated by arrows in figure 19) from the line should be measured. To improve the accuracy when integrating over larger intervals. The Compound Simpson’s rule is then x0 ∫ x1 ∆x m−1 f ( x ) dx ≈ ∑ f ( x0 + 2i∆x) + 4 f ( x0 + ( 2i + 1) ∆x) + f ( x0 + ( 2i + 2) ∆x ) 3 i =0 ∆x = f ( x0 ) + 4 f ( x0 + ∆x ) + 2 f ( x0 + 2 ∆x )+K+4 f ( x0 + ( n − 1) ∆x ) + f ( x1 ) 3 [ ] . The approach is to cover the domain to be integrated with a triangle or trapezium (whichever is geometrically more appropriate) as is shown in figure 19. where l is the length of the (longest) chord. Hence we must be able to express the number of intervals as n=2m. 5. giving exact integration of cubics. the interval x0 to x1 may again be subdivided into n steps. * Not examinable – 48 – .Numerical Methods for Natural Sciences IB Introduction x + 2 ∆x x0 ∫ f ( x ) dx = 2 f ( x )∆x + 2 f ′( x )∆x 0 0 2 4 + 4 f ′′( x0 )∆x 3 + 2 f ′′′( x0 )∆x 4 + 15 f iv ( x0 )∆x 5K 3 3 (72) = 1 1 + 4 f ( x0 ) + f ′( x0 )∆x + 2 f ′′( x0 )∆x 2 + 1 f ′′′( x0 )∆x 3 + 24 f iv ( x0 )∆x 4 +K 6 2 0 0 0 ( + ( f ( x ) + 2 f ′( x )∆x + 2 f ′′( x )∆x = ∆x f ( x0 ) 3 [ − 17 f iv ( x0 )∆x 4 K 30 + 4 f ′′′( x0 )∆x 3 + 2 f iv 3 3 ∆x f ( x0 ) + 4 f ( x0 + ∆x ) + f ( x0 + 2 ∆x ) + O( ∆x 5 ) 3 ] ) ( x )∆x +K) 4 0 ( ) Whereas the error in the Trapezium rule was O(∆x3).6 Quadratic triangulation* Simpson’s Rule may be employed in a manual way to determine the integral with nothing more than a ruler. (74) remembering that some increase the area while others decrease it relative to the initial trapezoidal (triangular) estimate. The integrand may cross the side of the trapezium (triangle) connecting the end points. From Simpson’s rule we may approximate the area between each of these arcs and the chord as area = 2/3 × chord× maxDeviation. as should the length of the chord joining the points of crossing. The threepoint evaluation for each subinterval requires that there are an even number of subintervals. The overall estimate (ignoring linear measurement errors) will be O(l5). (73) and the corresponding error O(n∆x5) or O(∆x4).
This same process may be carried out to higher orders using ∆x/4.3 the error in some estimate T(∆x) of the integral I using a step size ∆x goes like c∆x2 as ∆x→0. For the Trapezium Rule the errors are all even powers of ∆x and as a result it can be shown that T(m)(∆x/2) = [22mT(m–1)(∆x/2) – T(m–1)(∆x)]/(22m–1). Likewise the error in T(∆x/2) will be c∆x2/4.Numerical Methods for Natural Sciences IB Introduction x0 x1=x0+n∆x Figure 19: Quadratic triangulation to determine the area using a manual combination of the Trapezium and Simpson’s Rules. A similar process may also be applied to the Compound Simpson’s Rule. ∆x/8.7 Romberg integration With the Compound Trapezium Rule we know from section 5. … to eliminate the higher order error terms. for some constant c. From this we may construct a revised estimate T(1)(∆x/2) for I as a weighted mean of T(∆x) and T(∆x/2): T(1)(∆x/2) = αT(∆x/2) + (1–α)T(∆x) = α[I + c∆x2/4 + O(∆x4)] + (1–α)[I + c∆x2 + O(∆x4)]. Thus we have T(1)(∆x/2) = [4T(∆x/2) – T(∆x)]/3. 5. (75) By choosing the weighting factor α = 4/3 we elimate the leading order (O(∆x2)) error terms. (77) (76) – 49 – . relegating the error to O(∆x4). Comparison with equation (73) shows that this formula is precisely that for Simpson’s Rule.
Numerical Methods for Natural Sciences IB Introduction 5. This may be derived by comparing the Taylor Series expansion for the integral with that for the points x0+α∆x and x0+β∆x: x1 = x0 + ∆x x0 ∫ f ( x ) dx = ∆xf ( x0 ) + = ∆x 2 ∆x 3 ∆x 4 f ′ ( x0 ) + f ′′( x0 ) + f ′′′( x0 ) + O( ∆x 5 ) 2 6 24 2 3 ( α∆x) ( α∆x ) ∆x f ′′( x0 ) + f ′′′( x0 ) +K f ( x0 ) + α∆xf ′( x0 ) + 2 2 6 . The Taylor Series expansion of the integral gives x0 + ∆x (80) ∆x 2 ∆x 4 iv ∆x 6 vi f ( x ) dx = 2 ∆x f ( x0 ) + f ′′( x0 ) + f ( x0 ) + f ( x0 ) + O( ∆x 8 ) . The Gauss Quadrature accurate to order 2M−1 may be determined using the same approach required for the twopoint scheme. (79) f ′′′( x0 )+K 2 6 3 2 ∆x ∆x ∆x 4 f ′( x0 ) + ( α 2 + β 2 ) + ( α 3 + β3 ) +K = ∆xf ( x0 ) + ( α + β) 4 12 2 + f ( x0 ) + β∆xf ′( x0 ) + 2 ( β∆x) f ′′( x0 ) + (β∆x ) 3 Equating the various terms reveals α+β=1 (α2 + β2)/4 = 1/6. the solution of which gives the positions stated in equation (78).8 Gauss quadrature By careful selection of the points at which the function is evaluated it is possible to increase the precision for a given number of function evaluations. Gauss quadrature has the formula x1 = x0 + ∆x x0 ∫ f ( x ) dx ≈ 1 3 ∆x ∆x + f x0 + − 2 2 6 1 3 ∆x + O( ∆x 4 ) . which is also exact for cubics. requires three function evaluations). ∫∆x 6 120 5040 x − 0 (81) while the expansion of the function for the three points gives – 50 – . Using three function evaluations: symmetry suggests for the interval x0−∆x to x0+∆x the points should be evaluated at x0 and x0±α with weightings 2∆xA and 2∆xB. One widely used example of this is Gauss quadrature which enables exact integration of cubics with only two function evaluations (in contrast Simpson’s Rule. The Midpoint rule is an example of this: with just a single function evaluation it obtains the same order of accuracy as the Trapezium Rule (which requires two points). f x0 + + 2 6 (78) In general it is possible to choose M function evaluations per interval to obtain a formula exact for all polynomials of degree 2M−1 and less.
Tables 2 to 6 summarise the error in the numerical estimates for the Trapezium Rule. Simpson’s Rule. The results are presented in terms of the number of function evaluations required. Gauss Quadrature and a three point Gauss Quadrature (formula derived in lectures).Numerical Methods for Natural Sciences IB Introduction x0 + ∆x x0 − ∆x ∫ f ( x)dx = 2∆x[ Af ( x ) + Bf ( x 0 0 − α ) + Bf ( x0 + α ) ] ) + O(α ))] 8 ( + B( f + αf ′ + 1 1 1 1 1 + B f − αf ′ + 2 α 2 f ′′ − 1 α 3 f ′′′ + 24 α 4 f iv − 120 α 5 f v + 720 α 6 f vi − 5040 α 7 f vii + O(α 8 ) 6 1 2 1 1 1 1 1 α 2 f ′′ + 6 α 3 f ′′′ + 24 α 4 f iv + 120 α 5 f v + 720 α 6 f vi + 5040 α 7 f vii = 2 ∆x[ Af 1 1 = 2 ∆x ( A + 2 B ) f + Bα 2 f ′′ + 12 Bα 4 f iv + 360 α 6 f vi + O(α 8 ) .9 Example of numerical integration Consider the integral ∫ sin x dx = 2 . [ ] (82) Comparing the terms between (81) and (82) gives A + 2B = 1 Bα2 = ∆x2/6 Bα4/12 = ∆x4/120 which may be solved to obtain the position of the evaluations and the weightings (Bα4/12)/Bα2 = (∆x4/120) / (∆x2/6) ⇒ α2 = (6/10)∆x2 (83) ⇒ α = (3/5)1/2∆x B = 5/18 A = 4/9. Midpoint Rule. thus 1/ 2 1/ 2 1 3 3 f ( x ) dx = ∆x 8 f ( x0 ) + 5 f x0 − ∆x + 5 f x0 + ∆x . The calculations were performed in double precision. ∫ 5 5 9 x − ∆x x0 + ∆x 0 (84) (85) 5. Table 7 compares these errors. – 51 – . 0 π (86) which may be integrated numerically using any of the methods described in the previous sections.
Note error ratio → 2−2 (∆x2) No.23638 0.25 0.00642965622 0.06228 0.24999 0.429203673 0.25 0.06164 0.000401611359 0. intervals 1 2 4 8 16 32 64 128 256 512 1024 2048 4096 8192 16384 32768 65536 131072 262144 524288 No.24979 0.221441469 0.000269169948 0.24995 0.25081 0.00455975498 0.25 0.00000000306393221 0.06245 0. intervals 1 2 4 8 16 32 64 Table 3: Error in Midpoint Rule.00000627492942 0.0000000000239435138 0. f(x) Simpson’s Rule Error Ratio: e2n/en 3 5 9 17 33 65 129 0. intervals 1 2 4 8 16 32 64 128 256 512 1024 2048 4096 8192 16384 32768 65536 131072 262144 524288 No.24988 0.0483 0.06249 0.000000000765979280 0.25 0.24979 0.000000000382977427 0.0000250997649 0.24999 0.14189790 0.00000000153194190 0.19392 0.0000000000119970700 0.25 0.0000000245114248 0. f(x) 1 2 4 8 16 32 64 128 256 512 1024 2048 4096 8192 16384 32768 65536 131072 262144 524288 Midpoint Rule 1.25 0.Numerical Methods for Natural Sciences IB Introduction No.0257683980 0.25 0.00000000403225719 0.000000392182860 0.24898 Table 2: Error in Trapezium Rule.25 0.24952 0.24997 0.00160663902 0.000200811728 0.24999 0.2146 0.00000103336941 0.0000000490228564 0.0000502002859 0.25 0.24994 0.24662 0.24203 0.0523443059 0.000000196091438 0.000100399815 0.24806 0.0000000980457133 0.0000165910479 0.25014 0.05903 0.0625 – 52 – .25 0.00000000000303357339 Error Ratio: e2n/en 0.25 0.00000000 0.25 0.00000000000596145355 Error Ratio: e2n/en 0.103881102 0.0943951023 0.000000000191497040 0.0000000122557182 0. Note error ratio → 2−2 (∆x2) No.00000000612785222 0.00321637816 0.25 0.000000784365898 0.0000125499060 0.0129090855 0.25 0.24916 0. f(x) 2 3 5 9 17 33 65 129 257 513 1025 2049 4097 8193 16385 32769 65537 131073 262145 524289 Trapezium Rule 2.25 0.0000000000478341810 0.25286 No.0000000645300022 0.0000000000957223189 0.00000156873161 0.25 0.000803416309 0.00000313746618 0.25 0.
2920E01 1.5319E09 3.2549E05 3.0388E01 2.2344E02 1.000179666460 0.2749E06 1.1064E05 6.0498E11 6.6800E10 1.06248 No.4511E08 6.4530E08 4.4395E02 4.5687E06 3.0641804253 0.0039E04 2.0335E12 Simpson’s Rule 9.9022E08 1.2144E01 5.8436E07 1.8896E07 4.0322E09 2.7966E04 1.3943E11 5.9614E12 Midpoint Rule 1.0000000000157500679 0.0000000000104984909 0.3020E08 2.6916E04 1.5200E10 1.0000000000000002220446049250313 0.06227 0.9218E07 9.0341E04 2.000000000252001974 0.0476 0.01464 0.01538 0.1997E11 3.0547E03 1.0639E09 7.5597E03 2.5591E13 3 Pnt Gauss Quadrature 1. intervals 1 2 4 8 16 32 64 128 256 512 No.6243E05 2.06249 0.0333E06 6.0000000430208237 0.00000000000001421085471520200 0.001388913607743625 0.4296E03 1.01556 0.06158 0.0625 0.0200E05 1.8045E08 2.01563 1 Table 6: Error in Three point Gauss Quadrature.3889E03 1.8276E13 Gauss Quadrature 6. Note error ratio → 2−6 (∆x6) No.8297E10 9. Note error ratio → 2−4 (∆x4) No.6066E03 4.0000000000000002220446049250313 0.6574E09 5.0000000000008881784197001252 0.9609E07 4.5768E02 6.01169 0.5750E11 9.9149E10 4.0161E04 1.0000002378219958742989 0.0625 0.06249 0.2204E16 Table 7: Error in numerical integration of (86) as a function of the number of subintervals.0000110640837 0.1418E+00 2.4180E02 3. Note error ratio → 2−4 (∆x4) No.00305477319 0.000000000000655919762 Error Ratio: e2n/en 0.2163E03 8.1374E06 7.0624 Table 4: Error in Simpson’s Rule.016 0. Trapezium Intervals Rule 1 2 4 8 16 32 64 128 256 512 1024 2048 4096 8192 16384 32768 65536 131072 262144 524288 1048576 2.2255E08 3.6881E09 1.0000E+00 4.6922E11 8. f(x) 2 4 8 16 32 64 128 256 512 1024 Gauss Quadrature 0.5099E05 6.0081E04 5. intervals 1 2 4 8 16 32 64 128 256 Table 5: Error in Gauss Quadrature.2909E02 3.8817E13 1.3782E07 3. – 53 – .05881 0.00000000005692291082937118 0.2204E16 2.00000000268818500 0.three point Error Ratio: e2n/en f(x) 3 6 12 24 48 96 192 384 768 0.5722E11 2.000000003657474767493341 0.Numerical Methods for Natural Sciences IB Introduction 128 256 512 257 513 1025 0. Gauss Quadrature .000000688965642 0.4210E14 2.0156 0.6597E10 1.000000000168002278 0.7834E11 1.1278E09 1.000000000000982769421 0.00001624311099668319 0.6591E05 1.06244 0.
x1.Value .nx) WRITE(6.nx) WRITE(6.x1.*)nx. PROGRAM Integrat REAL*8 x0.*) WRITE(6.nx C=====Functions REAL*8 TrapeziumRule REAL*8 MidpointRule REAL*8 SimpsonsRule REAL*8 GaussQuad C=====Constants pi = 2.*)’Trapezium rule’ nx = 1 DO i=1.x1.0D0) Exact = 2.*)’Midpoint rule’ nx = 1 DO i=1.j.*) nx = 2 DO i=1.Value .nx) WRITE(6.*)nx.Value. The number of function evaluations actually computed may be approximately halved for the Trapezium rule and reduced by one third for Simpson’s rule if the compound formulations are used.x1.*) WRITE(6.20 Value = MidpointRule(x0.9.Value.Exact nx = 2*nx ENDDO C======================================================================= C= Simpson’s rule = C======================================================================= WRITE(6.0 x1 = pi C======================================================================= C= Trapezium rule = C======================================================================= WRITE(6.*) WRITE(6.Numerical Methods for Natural Sciences IB Introduction 5.0*ASIN(1.Exact nx = 2*nx ENDDO C======================================================================= C= Gauss Quadrature = C======================================================================= * Not examinable – 54 – .20 Value = TrapeziumRule(x0. Note also that this example is included for illustrative purposes only.*)’Simpson’’s rule’ WRITE(6.Value.Exact nx = 2*nx ENDDO C======================================================================= C= Midpoint rule = C======================================================================= WRITE(6. No knowledge of Fortran or any other programming language is required in this course.pi INTEGER*4 i.Exact.1 Program for numerical integration* Note that this program is written for clarity rather than speed.10 Value = SimpsonsRule(x0.Value .0 C=====Limits x0 = 0.Value.*)nx.
f f = SIN(x) RETURN END REAL*8 FUNCTION TrapeziumRule(x0.xa.nx) C=====parameters INTEGER*4 nx REAL*8 x0.x1 – 55 – .*)nx.Exact nx = 2*nx ENDDO END FUNCTION f(x) C=====parameters REAL*8 x.Value .Sum dx = (x1 .x1 C=====functions REAL*8 f C=====local variables INTEGER*4 i REAL*8 dx.5)*dx fa = f(xa) Sum = Sum + fa ENDDO Sum = Sum * dx MidpointRule = Sum RETURN END REAL*8 FUNCTION SimpsonsRule(x0.nx) C=====parameters INTEGER*4 nx REAL*8 x0.Sum dx = (x1 .x1.x0)/Dfloat(nx) Sum = 0.fa.0 DO i=0.Value.*) WRITE(6.nx1 xa = x0 + DFLOAT(i)*dx xb = x0 + DFLOAT(i+1)*dx fa = f(xa) fb = f(xb) Sum = Sum + fa + fb ENDDO Sum = Sum * dx / 2.xb.xa.x1.0 TrapeziumRule = Sum RETURN END REAL*8 FUNCTION MidpointRule(x0.fb.x1.0 DO i=0.*)’Gauss quadrature’ nx = 1 DO i=1.x1.fa.Numerical Methods for Natural Sciences IB Introduction WRITE(6.nx) WRITE(6.x1 C=====functions REAL*8 f C=====local variables INTEGER*4 i REAL*8 dx.x0)/DFLOAT(nx) Sum = 0.nx1 xa = x0 + (DFLOAT(i)+0.nx) C=====parameters INTEGER*4 nx REAL*8 x0.10 Value = GaussQuad(x0.
0D0)/6.0*fb + fc ENDDO Sum = Sum * dx / 3.fa.0D0) dxr = dx*(0.0 DO i=0.fb.fa.0D0)/6.xc.xb.dxl.x1.Sum dx = (x1 .0D0) Sum = 0.0 SimpsonsRule = Sum RETURN END REAL*8 FUNCTION GaussQuad(x0.2 xa = x0 + DFLOAT(i)*dx xb = x0 + DFLOAT(i+1)*dx xc = x0 + DFLOAT(i+2)*dx fa = f(xa) fb = f(xb) fc = f(xc) Sum = Sum + fa + 4.Sum.fc.x0)/DFLOAT(nx) dxl = dx*(0.nx) C=====parameters INTEGER*4 nx REAL*8 x0.fb.nx1 xa = x0 + DFLOAT(i)*dx + dxl xb = x0 + DFLOAT(i)*dx + dxr fa = f(xa) fb = f(xb) Sum = Sum + fa + fb ENDDO Sum = Sum * dx / 2.xa.0 DO i=0.5D0 + SQRT(3.0 GaussQuad = Sum RETURN END – 56 – .SQRT(3.Numerical Methods for Natural Sciences IB Introduction C=====functions REAL*8 f C=====local variables INTEGER*4 i REAL*8 dx.5D0 .xb.xa.dxr dx = (x1 .x1 C=====functions REAL*8 f C=====local variables INTEGER*4 i REAL*8 dx.nx1.x0)/DFLOAT(nx) Sum = 0.
Yn = Yn 1 1 + Yn +1 = Yn + ∆tYn′ + ∆t 2Yn′′+ ∆t 3Yn′′′ O( ∆t 4 ) 2 6 4 + Yn + 2 = Yn + 2 ∆tYn′ + 2 ∆t 2Yn′′+ ∆t 3Yn′′′ O( ∆t 4 ) 3 L (89) – 57 – . y ) . t0+∆t. These methods will then be used in §8 as the basis for solving some types of partial differential equations. The finite difference solution of this equation proceeds by discretising the independent variable t to t0. If we take the Taylor Series expansion for the grid points in the neighbourhood of some point t = tn .Numerical Methods for Natural Sciences IB Introduction 6 First order ordinary differential equations Ths section of the course introduces some commonly used methods for determining the numerical solutions of ordinary differential equations. 6. L 4 + Yn − 2 = Yn − 2 ∆tYn′ + 2 ∆t 2Yn′′− ∆t 3Yn′′′ O( ∆t 4 ) 3 1 1 + Yn −1 = Yn − ∆tYn′ + ∆t 2Yn′′− ∆t 3Yn′′′ O( ∆t 4 ) 2 6 . The manner in which the function values are combined is determined by the Taylor Series expansion for the point at which the derivative is required. We then look to solve Y’n = f(tn.1 Taylor series The key idea behind numerical solution of odes is the combination of function values at different points or times to approximate the derivatives in the required equation. dt (87) subject to some boundary/initial condition y(t=t0) = c. This gives us a finite difference approximation to the derivative. t0+2∆t. 6. t0+3∆t. … We shall denote the exact solution at some t = tn = t0+n∆t by yn = y(t=tn) and our approximate solution by Yn.Yn) (88) at each of the points in the domain.2 Finite difference Consider a first order ode of the form dy = f (t . where Y’n is an approximation to dy/dt based on the discrete set of Yn.
). so Yn − Yn −1 1 = Yn′ + ∆tYn′′+ O( ∆t 2 ) ..Numerical Methods for Natural Sciences IB Introduction we may then take linear combinations of these expansions to obtain an approximation for the derivative Y’n at t = tn. requiring ∑α i=a b i =0 (91) and. It also depends on the stability of the method (see section 6.and righthand sides reveals α = 1/∆t. 6. (95) (94) where yn* the exact solution of our differential equation but with the initial condition yn–1*=Yn–1.. In contrast. As an example. possibly some of the terms of higher order. depending on the method. Equating left. Adding the appropriate equations.3 Truncation error The global truncation error at the nth step is the cumulative total of the truncation error at the previous steps and is En = Yn – yn. and setting equal to Y’n. we need to consider the error associated with approximating yn by Yn. noting that the weightings sum to zero so may be expressed as α1 = α and α2 = − α. We shall look at various strategies for choosing αi in the following sections. viz.7 for details) and we aim for En = O(en). gives Y’n ≈ αYn − αYn1 = αYn − α(Yn − ∆tY’n + ½∆t2Y"n − 1/6∆t3Y’"n + . (96) which would give En = O(nen). Yn′ ≈ ∑ α i Yn +i . ∆t 2 (92) (93) Before looking at this in any further detail. the local truncation error for the nth step is en = Yn – yn*. we may select the Yn and Yn1 values as the basis of an approximation to approximate Y’n. – 58 – . i=a b (90) The linear combination is chosen so as to eliminate the term in Yn. Note that En is not simply ∑e i =1 n n .
shows that the O(∆t) error in the finite difference approximation (93) becomes an O(∆t2) error in (98).4 Euler method The Euler method is the simplest finite difference scheme to understand and implement.Yn). Each arrow is tangental to to the solution of (87) passing through the point located at the start of the arrow. it can be shown that if Yn=yn+O(∆t2). Y2 from Y1 + ∆tf(t1. Following on from (93) we approximate the derivative in (88) as Y’n ≈ (Yn+1 – Yn)/∆t. – 59 – .Y0). Comparing the Euler method equation for Yn in (98) with the the Taylor Series expansion for Yn1 in (89). marching forwards through time. Thus the Euler method is first order accurate and is often referred to as a first order method. (98) (97) Given the initial/boundary condition Y0 = c. Moreover.7).Numerical Methods for Natural Sciences IB Introduction 6. we may obtain Y1 from Y0 + ∆tf(t0. This process is shown graphically in figure 20. then Yn+1=yn+1+O(∆t2) provided the scheme is stable (see section 6. in our differential equation for Yn to obtain Yn+1 = Yn + ∆tf(tn.Y1) and so on. The Euler method is termed an explicit method because we are able to write down an explicit solution for Yn+1 in terms of “known” values at tn. t0 Figure 20: Sketch of the function y(t) (dark line) and the Euler method solution (arrows). Note that this point need not be on the desired y(t) curve.
the only difference being that the derivative Y’n is approximated by Y'n ≈ (Yn – Yn–1)/∆t. – 60 – .Yn+1).tn. Often the computational expense per step is more than compensated for by it being possible to take larger steps (see section 6. Note that this point need not be on the desired y(t) curve.1 Backward Euler The backward Euler method is almost identical to its explicit relative.5.7).Yn+1. Each arrow is tangental to to the solution of (87) passing through the point located at the end of the arrow. Such implicit methods are computationally more expensive for a single step than explicit methods.∆t). (100) (99) t0 Figure 21: Sketch of the function y(t) (dark line) and the Euler method solution (arrows). but offer advantages in terms of stability and/or accuracy in many circumstances. 6. to give the evolution equation Yn+1 = Yn + ∆tf(tn+1.tn.5 Implicit methods The Euler method outlined in the previous section may be summarised by the update formula Yn+1 = g(Yn. This is shown graphically in figure 21.∆t).Numerical Methods for Natural Sciences IB Introduction 6. for example. In contrast implicit methods have have Yn+1 on both sides: Yn+1 = h(Yn.
Again. exactly as for the Euler method. An example of this is shown in figure 22. (101) The same approach may be applied to higher order methods such as those presented in the following sections. tend to be more stable and hence accurate. 6.∆t) and Y(t. the Backward Euler method has errors of O(∆t2). It is generally preferable. especially for stiff problems (problems where f’ is large). however. be possible to recover an explicit formula for some functions f. the exception being that calculating both Y(t.∆t/2) – Y(t.) As the derivative Y’n is approximated only to the first order.∆t/2) = [4Y(t.∆t) = y(t) + c∆t2. if we have the estimate for some time t calculated using a time step ∆t. – 61 – . Similarly an estimate using a step size ∆t/2 will follow Y(t. The solution process will.∆t) ]/3. however.Numerical Methods for Natural Sciences IB Introduction The dependence of the righthand side on the variables at tn+1 rather than tn means that it is not. in general possible to give an explicit formula for Yn+1 only in terms of Yn and tn+1. Combining these two estimates to try and cancel the O(∆t2) errors gives the improved estimate as Y(1)(t.7) of using two approximations of different step size to construct a more accurate estimate may also be used for numerical solution of ordinary differential equations. however.2 Richardson extrapolation The Romberg integration approach (presented in section 5.5. to utilise a higher order method to start with.∆t/2) = y(t) + 1/4c∆t2 as ∆t→0. Backward Euler CrankNicholson t0 (Forward) Euler Figure 22: Comparison of ordinary differential equation solvers for a stiff problem. (It may. then for both the Euler and Backward Euler methods the approximate solution is related to the true solution by Y(t.∆t/2) allows the two solutions to be compared and thus the trunctation error estimated.
in general.Yn+1/2) is then satisfied by a linear interpolation for f between tn–1/2 and tn+1/2 to obtain Yn+1 – Yn = 1/2[f(tn+1.Yn+1/2). y n +1 ) = f ( t n .Yn)]∆t.Numerical Methods for Natural Sciences IB Introduction 6. (104) As with the Backward Euler. Using the same discretisation of t we obtain Y’n+1/2 ≈ (Yn+1 – Yn)/∆t.Yn+1/2). however. we may obtain a second order method due to cancellation of the terms of O(∆t2).3 CrankNicholson If we use central differences rather than the forward difference of the Euler method or the backward difference of the backward Euler.5.and righthand sides of equation (104) reveals ∂f 2 1 ∂f yn +1 − yn = f n ∆t + + f ∆t + O( ∆t 3 ) ∂y 2 ∂t (106a) and ∂f ∂f 1 1 f (t n . y n ) + ∂f ∂f ∆t + ∆y + O( ∆t 2 ) + O( ∆y 2 ) ∂t ∂y ∂f ∂f dy = fn + ∆t + ∆t + O( ∆t 2 ) ∂t ∂y dt ∂f ∂f = fn + + f ∆t + O( ∆t 2 ) ∂y ∂t (105b) Substitution of these into the left. yn ) + f (t n +1 . (103) (102) The requirement for f(tn+1/2. possible to write an explicit expression for Yn+1 in terms of Yn. with a requirement for Taylor Series expansions about tn:: 1 d2y 2 dy ∆t + ∆t + O( ∆t 3 ) 2 dt 2 dt 1 ∂f ∂f 2 = yn + f n ∆t + + f ∆t + O( ∆t 3 ) 2 ∂t ∂y y n +1 = y n + (105a) f ( t n +1 .Yn+1) + f(tn. The overall approach is much the same. Substitution into our differential equation for Yn gives (Yn+1 – Yn)/∆t ≈ f(tn+1/2. Formal proof that the CrankNicholson method is second order accurate is slightly more complicated than for the Euler and Backward Euler methods due to the linear interpolation to approximate f(tn+1/2. yn +1 ) ∆t = f n + f n + + f ∆t + O( ∆t 2 ) ∆t ∂y 2 2 ∂t ( ) ∂f 2 1 ∂f = f n ∆t + + f ∆t + O( ∆t 3 ) ∂y 2 ∂t (106b) – 62 – . the method is implicit and it is not.
6. Suppose we are solving y’ = λy. (112) for some complex λ.Yn we may construct an approximation to the derivatives of orders 1 to s at tn. (109) These methods are called AdamsBashforth methods. The exact solution is bounded (i.…. Y"n ≈ (fn+1 – fn–1)/2∆t Y’"n ≈ (fn+1 – 2fn + fn–1)/∆t2. if s = 2 then Y’n = fn. This family of implicit methods is known as AdamsMoulton methods.6 Multistep methods As an alternative. to give Yn+1 = Yn + ∆tY’n + 1/2∆t2Y"n + 1/6∆t3Y’"n = Yn + (1/12)∆t(5fn+1 + 8fn – fn–1). (108) (107) For s=3 we also have Y’"n and so can make use of a secondorder onesided finite difference to approximate Y"n = f’n = (3fn 4 fn1 + fn2)/2∆t and include the third order Y’"n = f"n = (fn2fn1+fn2)/∆t2 to obtain Yn+1 = Yn + ∆tY’n + 1/2∆t2Y"n + 1/6∆t3Y"n = Yn + 1/12∆t(23fn – 16fn–1 + 5fn–2). Substituting this into the Euler method shows – 63 – . (111) (110) 6. Implicit AdamsBashforth methods are also possible if we use information about fn+1 in addition to earlier time steps.e. The corresponding s = 2 method then uses Y’n = fn. does not increase without limit) provided Reλ ≤ 0. By utilising only Yn–s+1. Y"n =~ (fn – fn–1)/∆t and so we may construct a second order method from the Taylor series expansion as Yn+1 = Yn + ∆tY’n + ½∆t2Y"n = Yn + 1/2∆t(3fn – fn–1). Note that s = 1 recovers the Euler method. For example.7 Stability The stability of a method can be even more important than its accuracy as measured by the order of the truncation error. the accuracy of the approximation to the derivative may be improved by using a linear combination of additional points.Yn–s+2.Numerical Methods for Natural Sciences IB Introduction which are equal up to O(∆t3).
This condition (114) on ∆t is very restrictive if λ<<0 as it demonstrates the Euler method must use very small time steps ∆t < 2λ–1 if the solution is to converge on y = 0. the error obeys an equation of the form given by (112).y) and using a Taylor Series expansion gives dy$ dy dε = + dt dt dt = f (t . with λ = ∂f/∂y. In comparison. As it is desirable for errors to decrease (and thus the solution remain stable) rather than increase (and the solution be unstable). If Yn is to remain bounded for increasing n and given Reλ < 0 we require 1 + λ∆t ≤ 1. to the leading order. A consequence of the decay of errors present at one time step as the solution process proceeds is that memory of a particular time step’s contribution to the global truncation error decays as the solution advances through time. the limit on the time step suggested by (114) applies for the application of the Euler method to any ordinary differential equation. Thus the global truncation error is dominated by the local trunction error(s) of the most recent step(s) and O(En) = O(en). (115) where ε is the (small) error. y$ ( ) . y + ε ) = f (t . (116) Thus. solution of (112) by the Backward Euler method Yn+1 = Yn + ∆tλYn+1. can be rearranged for Yn+1 and Yn+1 = Yn/(1 – λ∆t) = Yn–1/(1 – λ∆t)2 = … = Y0/(1 – λ∆t)n+1. The reason why we consider the behaviour of equation (112) is that it is a model for the behaviour of small errors. (119) (118) (117) For Reλ ≤ 0 this is always satisfied and so the Backward Euler method is unconditionally stable. (113) (114) If we choose a time step ∆t which does not satisfy (114) then Yn will increase without limit. y ) + ε ∂f + O(ε 2 ) ∂y = f t . (120) – 64 – . Suppose that at some stage during the solution process our approximate solution is y$ = y + ε. which will be stable provided 1 – λ∆t > 1. Substituting this into our differential equation of the form y’ = f(t. The CrankNicholson method may be analysed in a similar fashion with (1λ∆t/2)Yn+1 = (1+λ∆t/2)Yn.Numerical Methods for Natural Sciences IB Introduction Yn+1 = (1 + λ∆t)Yn = (1 + λ∆t)2Yn–1 = … = (1 + λ∆t)n+1Y0.
1 is just the Euler method). explicit methods require less computation per step. Note that this convergence criterion is more stringent than the stability criterion established in §6. There are other reasons.Yn+1) to correct this prediction using something similar to an implicit step.8 Predictorcorrector methods One approach to using an implicit method is to use a direct iteration to solve the implicit equation. They achieve this by using an explicit method to predict the solution Yn+1(p) at tn+1 and then utilise f(tn+1. we are left with a secondorder implicit scheme. (125) requiring ½ ∆t ∂f/∂Y < 1 for convergence.0 = Yn. From (21) we know it will converge if ∂ Y + ∆t f t . it may not be necessary or desirable to carry on iterating until convergence is achieved. The above examples.i)]. Applying the theory for direct iteration outlined in §3. Once convergence is achieved.i). (121) with the magnitude of the term in square brackets always less than unity for Reλ < 0. (124) and the solution will gradually approach the Backward Euler solution. In general. and iterate over i until the solution has converged. A similar strategy may be applied to the CrankNicholson method to obtain Yn+1.Yn+1.Yn+1. Thus. – 65 – . 6. if iterated only a finite number of times rather than until convergence is achieved. However. belong to a class of methods known as predictorcorrector methods. however. like Backward Euler. Yn +1.Yn+1(p)) as an approximation to f(tn+1. Predictorcorrector methods try to combine the advantages of the simplicity of explicit methods with the improved accuracy of implicit methods. Moreover.6. the condition for convergence using (125) imposes the same restrictions upon ∆t as stability of the Euler method.Yn. for taking this approach. then multiple corrections to this to get the solution to converge on the Backward Euler method. we may write Yn+1.i n ( ( ∂f )) = ∆t ∂Y < 1 . the Backward Euler method has the form Yn+1 = Yn + ∆t f(tn+1. For example. CrankNicholson is unconditionally stable.7! What we are effectively doing here is making a prediction with the Euler method (our equation for Yn+1.Numerical Methods for Natural Sciences IB Introduction to arrive at Yn+1 = [(1+λ∆t/2)/(1 – λ∆t/2)]n+1 Y0.Yn+1). (123) (122) setting Yn+1.i = Yn + ½ ∆t [f(tn+1. but are only conditionally stable and so may require far smaller step sizes than an implicit method of nominally the same order.i) + f(tn.i ∂Yn +1.i = Yn + ∆t f(tn+1.
Yn+1 = (Yn+1(1) + Yn+1(2))/2 = [(1 + λ∆t)Yn + (1 + λ∆t + λ2∆t2) Yn]/2 = (1 + λ∆t + 1/2λ2∆t2) Yn = (1 + λ∆t + 1/2λ2∆t2)n+1 Y0. (130a) (130b) – 66 – .Yn+1(1)). second order as may be seen by comparison of (129c) with the Taylor Series expansion. It is also the same as the criterion for the iterative CrankNicholson method given in (125) to converge.Yn).1 Improved Euler method The simplest of these methods combines the Euler method as the predictor Yn+1(1) = Yn + ∆tf(tn.Numerical Methods for Natural Sciences IB Introduction 6.Yn+½k(1)) . 6. k(2) = ∆tf(tn+1/2∆t . Yn+1(2) = Yn + λ∆tYn+1(1) = Yn + λ∆t (Yn + λ∆tYn) = (1 + λ∆t + λ2∆t2)Yn.8. The Improved Euler method is the simplest of this subgroup. commonly known as the Improved Euler method. and then the Backward Euler to give the corrector Yn+1(2) = Yn + ∆tf(tn.2 RungeKutta methods The Improved Euler method is the simplest of a family of similar predictor corrector methods following the form of a single predictor step and one or more corrector steps. One subgroup of this family are the RungeKutta methods which use a fixed number of corrector steps. (128) (127) (126) To understand the stability of this method we again use the y’ = λy so that the three steps described by equations (126) to (128) become Yn+1(1) = Yn + λ∆tYn. The accuracy of the method is. Perhaps the most widely used of these is the fourth order method: k(1) = ∆tf(tn. is identical to the Euler method. Thus the stability of this method. This is not surprising as it is limited by the stability of the initial predictive step. (129c) (129b) (129a) Stability requires 1 + λ∆t + 1/2λ2∆t2 < 1 (for Reλ < 0) which in turn restricts ∆t < 2λ–1. The final solution is the mean of these: Yn+1 = (Yn+1(1) + Yn+1(2))/2.Yn) . however.8. or until the estimate for Yn+1 converges to some tolerance. The corrector step may be repeated a fixed number of times.
All terms up to order ∆t4 can be shown to match. (132) (131) and similarly for Y"". – 67 – . This is then compared with a full TaylorSeries expansion for Yn+1 up to fourth order. with the error coming in at ∆t5. k(4) = ∆tf(tn+∆t .Yn)+(∆t/2)(∂f/∂t+f∂f/∂y)].Yn+1/2k(1)) = ∆t[f(tn. and similarly for k(3) and k(4). (130c) (130d) (130e) In order to analyse this we need to construct TaylorSeries expansions for k(2) = ∆tf(tn+1/2∆t.Yn+k(3)) . Yn+1 = Yn + (k(1) + 2k(2) + 2k(3) + k(4))/6. To achieve this we require Y" = df/dt = ∂f/∂t + ∂y/∂t ∂f/∂y = ∂f/∂t + f ∂f/∂y.Yn+½k(2)) . Y’" = d2f/dt2 = ∂2f/∂t2 + 2f ∂2f/∂t∂y + ∂f/∂t ∂f/∂y + f2 ∂2f/∂y2 + f (∂f/∂y)2.Numerical Methods for Natural Sciences IB Introduction k(3) = ∆tf(tn+1/2∆t .
we may express this as the system of equations x0’ = x1 x1’ = x2 x2’ = x3 . xn–2’ = xn–1 xn–1’ = f(t. and some rely on a combination of this material and the linear algebra of §4. n −1 .1 Shooting method Suppose we are solving a second order equation of the form y" = f(t.x0. provided it is possible to write an explicit expression for the highest order derivative and the system has a complete set of initial conditions. then the simple extension of the first order methods outlined in section 7. By writing x0=y.. two (or more) initial/boundary conditions are required. y .2. ….…. xn–1=dn–1y/dtn–1..2 Boundary value problems For second (and higher) order odes..xn–2). 7. dy/dt. dt n dt d t dt (133) where at t = t0 we know the values of y.Numerical Methods for Natural Sciences IB Introduction 7 Higher order ordinary differential equations In this section we look at how to solve higher order ordinary differential equations.y. 2 K . Consider some equation dn y dy d 2 y d n −1 y = f t . d2y/dt2. This decision may affect the order and convergence of the method. x2=d2y/dt2. (134) and use the standard methods for updating each xi for some tn+1 before proceeding to the next time step. A decision needs to be made as to whether the values of xi for tn or tn+1 are to be used on the right hand side of the equation for xn–1’. 7. All the methods given may be applied to higher ordinary differential equations. 7. . If these two conditions do not correspond to the same point in time/space.y’) subject to y(0) = c0 and y(1) = c1.x1. With the shooting method we apply the y(0)=c0 boundary condition and make some guess – 68 – .1 can not be applied without modification. There are two relatively simple approaches to solve such equations. dn1y/dtn1. …. Detailed analysis may be undertaken in a manner similar to that for the first order ordinary differential equations. x1=dy/dt.1 Initial value problems The discussion so far has been for first order ordinary differential equations. Some of the techniques discussed here are based on the techniques introduced in §6.
y"i ≈ (Yi+1 – 2Yi + Yi–1)/∆t2. If we use the central difference approximations y'i ≈ (Yi+1 – Yi–1)/2∆t. For a system of order n with m boundary conditions at t = t0 and n–m boundary conditions at t = t1. (1+1/2a∆t)Y1 + (b∆t2–2)Y2 + (1–1/2a∆t)Y3 = c∆t2. We may use the root finding methods discussed in section 3 to undertake this refinement. this choice is not ideal. 2 ∆t 2 ∆t (136a) (136b) (137) and we may write the boundary condition at t = t0 as Y0 = α. Consider the second order linear system y" + ay’ + by = c. The number of iterations which will need to be made in order to achieve an acceptable tolerance will depend on how good the refinement algorithm for α is. The computational cost of refining these n–m guesses will rapidly become large as the dimensions of the space increase.2. – 69 – . (135) with boundary conditions y(t0) = α and y(t1) + y’(t1) = β. If this does not satisfy y(1) = c1 to some acceptable tolerance. ∆t (139) but as we shall see. The same approach can be applied to higher order ordinary differential equations. we revise our guess for y’(0) to some value α1. (138) For t = t1 we must express the derivative in the boundary condition in finite difference form using points that exist in our mesh. The calculation proceeds until we have a value for y(1). Substitution into (135) gives Yi +1 − 2Yi + Yi −1 Y −Y + a i +1 i −1 + bYi = c . This gives us two initial conditions so that we may apply the simple timestepping methods already discussed in section 7. 7. The above strategy leads to the system of equations Y0 = α. we will require guesses for n–m initial conditions. and repeat the time integration to obtain an new value for y(1). This process continues until we hit y(1)=c1 to the acceptable tolerance. The obvious thing to do is to take the backward difference so that the boundary condition becomes Yn + Yn − Yn −1 =β.2 Linear equations The alternative is to rewrite the equations using a finite difference approximation with step size ∆t = (t1–t0)/N to produce a system of N+1 simultaneous equations. say. (1+1/2a∆t)Y0 + (b∆t2–2)Y1 + (1–1/2a∆t)Y2 = c∆t2.Numerical Methods for Natural Sciences IB Introduction that y’(0) = α0.1.
(1+1/2a∆t)Yn1 + (b∆t2–2)Yn + (1–1/2a∆t)Yn+1 = c∆t2. Clearly the same tricks can be applied if we have a boundary condition on the derivative at t0. −Yn1 + 2∆tYn + Yn+1 = 2β∆t The attraction of this approach is that we maintain our simple tridiagonal system.4. but not as sparse as tridiagonal. We have two ways in which we may improve the accuracy of this boundary condition. This leads to the last three equations being (1+1/2a∆t)Yn–2 + (b∆t2–2)Yn–1 + (1–1/2a∆t)Yn = c∆t2. The second approach is to artificially increase the size of the domain by one mesh point and impose the boundary condition inside the domain. The solution may be undertaken using the modified LU decomposition introduced in section 4.4 but where a system of linear equations requires solution for each iteration. but will require an iterative solution of the resulting matrix system Ax = b as the matrix A will be a function of x. 2 ∆t (140) (141) so that the last equation in our system becomes Yn2 − 4Yn1 + (2∆t + 3) Yn = 2β∆t. Nonlinear equations may also be solved using this approach. Higher order linear equations may be catered for in a similar manner and the matrix representing the system of equations will remain banded. There is one problem with the scheme outlined above: while the Y" and Y’ derivatives within the domain are second order accurate.5. The problem with this is that we then loose our simple tridiagonal system. similar in principle to that introduced in section 3. (143) (142) 7. the boundary condition at t = t1 is imposed only as a firstorder approximation to the derivative. −Yn1 + (∆t + 1) Yn = β∆t This tridiagonal system may be readily solved using the method discussed in section 4.Numerical Methods for Natural Sciences IB Introduction … (1+1/2a∆t)Yn–2 + (b∆t2–2)Yn–1 + (1–1/2a∆t)Yn = c∆t2.3 Other considerations* Not examinable * Not examinable – 70 – . The first way is the most obvious: make use of the Yn2 value to construct a secondorder approximation to Y" so that we have Yn′ = 3Yn − 4Yn −1 + Yn − 2 + O( ∆t 2 ) . In most circumstances this is most efficiently achieved through a NewtonRaphson algorithm. The additional mesh point is often referred to as a dummy mesh point or dummy cell.
Numerical Methods for Natural Sciences IB Introduction 7.3.2 Error and step control* Not examinable * * Not examinable Not examinable – 71 – .3.1 Truncation error* Not examinable 7.
j − ∆x + − + O( ∆x 4 ) . Suppose we discretise the solution ϕ onto a m+1 by n+1 rectangular grid (or mesh) given by xi = x0 + i∆x yj = y0 + j∆y where i=0. j 2 3 ∆x 2 ∂ ϕ i . j ∆x 2 + Φ i . j (147b) it is clear that we may approximate ∂2ϕ/∂x2 and ∂2ϕ/∂y2to the second order using the four adjacent mesh points to obtain the finite difference approximation Φi +1. The mesh spacing is ∆x = (x1–x0)/m and ∆y = (y1–y0)/n. j = ϕ i .x1]. j ∆y 3 ∂ ϕ i . j +1 − 2 Φ i .j. j = ϕ i . j=0. j 2 3 ∆y 2 ∂ ϕ i . j ∆y 3 ∂ ϕ i .Numerical Methods for Natural Sciences IB Introduction 8 Partial differential equations In this section we shall concentrate on the numerical solution of second order linear partial differential equations.n. ∂y 2 ∂y 6 ∂y 2 3 ∆y 2 ∂ ϕ i . j − ∆y + − + O( ∆y 4 ) . 2 ∂x 2 6 ∂x 3 ∂x 2 3 ∆x 2 ∂ ϕ i . j ∆x 3 ∂ ϕ i . j + Φ i . ∂y 2 ∂y 2 6 ∂y 3 ∂ϕ i . j − 2Φ i .1 Laplace equation Consider the Laplace equation in two dimensions ∂2ϕ ∂2ϕ ∇ ϕ = 2 + 2 = 0. j −1 ∆y 2 =0 (148) – 72 – . 2 ∂x 2 6 ∂x 3 ∂x ∂ϕ i . y in [y0. j (147b) ϕ i . j (147a) ∂ϕ i . j ∂ϕ i . j + Φ i −1. Let ϕij = ϕ(xi. j + ∆x + + + O( ∆x 4 ) . j ∆x 3 ∂ ϕ i .y1]. and Φij ≈ ϕij be the approximate solution at that mesh point.j. 8.m. j −1 = ϕ i . j (147b) ϕ i . j 4 + ∆y + 2 + 3 + O( ∆y ) .yj) (146) (145a) (145b) be the exact solution at the mesh point i. j ϕ i −1. j +1 = ϕ i . By considering the Taylor Series expansion for ϕ about some mesh point i. ∂x ∂y 2 (144) in some rectangular domain described by x in [x0. ϕ i +1.
1. 8. von Neumann or mixed boundary conditions to specify the boundary values of ϕij. This system may be solved directly using Gauss Elimination as discussed in section 4. the linear system is – 73 – .Numerical Methods for Natural Sciences IB Introduction for the internal points 0<i<m.0) = 0 i=3. j=3 i=3.1 Direct solution Provided the boundary conditions are linear in ϕ. The structure of the system ensures A is relatively sparse.0) = y i=0. ∂ϕ/∂y = 1 i=0. The linear system for this domain is given in (150). in the domain shown in figure 23.y) = y ϕ(3. the storage and computational cost of such a solution will become prohibitive even for relatively modest m and n. (149) with (m+1)(n+1) equations. this approach may be feasible if the total number of mesh points (m+1)(n+1) required is relatively small. In addition to this we will have either Dirichlet. but as the matrix A used to represent the complete system will have [(m+1)(n+1)]2 elements. The system of linear equations described by (148) in combination with the boundary conditions may be solved in a variety of ways. our finite difference approximation is itself linear and the resulting system of equations may be written as Aϕ = b.1. 0<j<n. j=0 Figure 23: Sketch of domain for Laplace equation. However. j=3 ϕ(0. For example. j=0 ϕ(x. consisting of a tridiagonal core with one nonzero diagonal above and another below this. These nonzero diagonals are offset by either m or n from the leading diagonal.
Careful choice of the order of the matrix elements (i.4). substantially reducing the storage and computational requirements (see section 4.e. specialised solvers have been developed for this problem. Because of the wide spread need to solve Laplace’s and related equations.Numerical Methods for Natural Sciences IB Introduction 1 1 1 1 1 1 1 1 −4 1 1 −4 1 1 1 1 1 −1 −1 −1 −1 1 −4 1 1 −4 1 1 1 1 1 1 1 1 1 ϕ 00 0 ϕ 0 10 ϕ 20 0 ϕ 30 0 ϕ 01 1 ϕ11 0 ϕ 0 21 ϕ 31 1 ϕ = 2 02 ϕ12 0 ϕ 0 22 ϕ 32 2 ϕ 03 1 ϕ 13 1 ϕ 23 1 (150) 1 ϕ 33 1 Provided pivoting (if required) is conducted in such a way that it does not place any nonzero elements outside this band then solution by Gauss Elimination or LU Decomposition will only produce nonzero elements inside this band. One of the best of these is Hockney’s method for solving Aϕ = b which may be used to reduce a block tridiagonal matrix (and the corresponding righthand side) of the form $ A I $ I A I $ I I A $ I A I A= . $ I I A $ I A O O O I $ I A into a block diagonal matrix of the form (151) – 74 – . by x or by y) may help reduce the size of this matrix so that it need contain only O(m3) elements for a square domain.
We shall introduce these methods before looking at the basic mathematics behind them. repeat from step 2.j–1)/4. $ B $ B O O O $ B (152) $ $ where A and B are themselves block tridiagonal matrices and I is an identiy matrix. 8.j+1 + Φi.2 Relaxation An alternative to direct solution of the finite difference equations is an iterative numerical solution. To find the solution for a twodimensional Laplace equation simply: 1. If solution does not satisfy tolerance.Numerical Methods for Natural Sciences IB Introduction $ B $ B $ B $ B . The computational cost is O(p log p).1 Jacobi The Jacobi Iteration is the simplest approach. 3..1. 2. An example of this algorithm is given in the table below where ϕ = 0 on the boundaries and the initial guess is ϕ = 1 in the interior (clearly the answer is ϕ = 0 everywhere). The first four iterations are labelled a. where p is the total number of mesh points. Initialise Φij to some initial guess.j + Φi–1. c and d. Replace old solution Φ with new estimate Φ*. reducing the errors as it does so. The main drawback of this method is that the boundary conditions must be able to be cast into the block tridiagonal format. For clarity we consider the special case when ∆x = ∆y. There are a variety of approaches with differing complexity and speed. 4.2. For each internal mesh point set Φ*ij = (Φi+1.1. 5. a) 0 b) 0 c) 0 a) 0 b) 0 c) 0 a) 0 b) 0 c) 0 a) 0 b) 0 c) 0 a) 0 b) 0 c) 0 (153) – 75 – . These iterative methods are often referred to as relaxation methods as an initial guess at the solution is allowed to slowly relax towards the true solution.j + Φi. Apply the boundary conditions. b. 8. This process may be performed iteratively to reduce an n dimensional finite difference approximation to Laplace’s equation to a tridiagonal system of equations with n–1 applications.
An example of the first iteration of the GaussSeidel approach using the same initial guess as our Jacobi iteration in §8.1 is given below. the method is less amenable to vectorisation as.1. The advantages of this are: • Less memory required (there is no need to store Φ*).Numerical Methods for Natural Sciences IB Introduction a) 0 b) 0 c) 0 a) 0 b) 0 c) 0 a) 0 b) 0 c) 0 a) 0 b) 0 c) 0 a) 1 b) 1/2 c) 3/8 a) 1 b) 3/4 c) 1/2 a) 1 b) 1/2 c) 3/8 a) 0 b) 0 c) 0 a) 1 b) 3/4 c) 1/2 a) 1 b) 1 c) 3/4 a) 1 b) 3/4 c) 1/2 a) 0 b) 0 c) 0 a) 1 b) 1/2 c) 3/8 a) 1 b) 3/4 c) 1/2 a) 1 b) 1/2 c) 3/8 a) 0 b) 0 c) 0 a) 0 b) 0 c) 0 a) 0 b) 0 c) 0 a) 0 b) 0 c) 0 a) 0 b) 0 c) 0 The coefficients in (153) (here all 1/4) used to calculate the refined estimate is often referred to as the stencil or template.2 GaussSeidel The GaussSeidel Iteration is very similar to the Jacobi Iteration. the new estimate of one mesh point is dependent on the new estimates for those already scanned.n)) iterations to diffuse across the entire mesh.2. the only difference being that the new estimate Φ*ij is returned to the solution Φij as soon as it is completed. • Faster convergence (although still relatively slow).2. especially for larger grids. Other equations (e. for a given iteration. – 76 – . with the iteration starting at the bottom left and moving across then up. the biharmonic equation. the Jacobi Iteration is very slow to converge. ∇4Ψ = 0) may be solved by introducing a stencil appropriate to that equation. allowing it to be used immediately rather than deferring its use to the next iteration.1. 8. While very simple and cheap per iteration.g. Higher order approximations may be obtained by simply employing a stencil which utilises more points. Corrections to errors in the estimate Φij diffuse only slowly from the boundaries taking O(max(m. On the other hand.
3 RedBlack ordering A variant on the GaussSeidel Iteration is obtained by updating the solution Φij in two passes rather than one.Numerical Methods for Natural Sciences IB Introduction a) 0 b) 0 c) 0 d) 0 a) 0 b) 0 c) 0 d) 0 a) 0 b) 0 c) 0 d) 0 a) 0 b) 0 c) 0 d) 0 a) 0 b) 0 c) 0 d) 0 a) 0 b) 0 c) 0 d) 0 a) 1 b) 13/32 c) d) a) 1 b) 5/8 c) d) a) 1 b) 1/2 c) d) a) 0 b) 0 c) 0 d) 0 a) 0 b) 0 c) 0 d) 0 a) 1 b) 29/64 c) d) a) 1 b) 13/32 c) d) a) 1 b) 5/8 c) d) a) 0 b) 0 c) 0 d) 0 a) 0 b) 0 c) 0 d) 0 a) 1 b) 29/128 c) d) a) 1 b) 29/64 c) d) a) 1 b) 13/32 c) d) a) 0 b) 0 c) 0 d) 0 a) 0 b) 0 c) 0 d) 0 a) 0 b) 0 c) 0 d) 0 a) 0 b) 0 c) 0 d) 0 a) 0 b) 0 c) 0 d) 0 a) 0 b) 0 c) 0 d) 0 Clearly in this example the solution is converging on zero more rapidly.2. If we consider the mesh points as a chess board. but the symmetry that existed in the initial guess has been lost.1. 8. – 77 – . then the white squares would be updated on the first pass and the black squares on the second pass. The advantages • No interdependence of the solution updates within a single pass aids vectorisation.
Note that the cells labelled “Red” are updated on the first pass and those labelled “Black” on the second pass. for each internal mesh point. 8. yet is maintaining the symmetries which existed in the initial guess. particularly those looking at instabilities.j–1)/4.j+1 + Φi. An example.j+1 + Φi. (154) – 78 – . Black a) 0 b) 0 Red a) 0 b) 0 Black a) 0 b) 0 Red a) 0 b) 0 Black a) 0 b) 0 Red a) 0 b) 0 Black a) 1 b) 3/8 Red a) 1 b) 3/4 Black a) 1 b) 3/8 Red a) 0 b) 0 Black a) 0 b) 0 Red a) 1 b) 3/4 Black a) 1 b) 3/4 Red a) 1 b) 3/4 Black a) 0 b) 0 Red a) 0 b) 0 Black a) 1 b) 3/8 Red a) 1 b) 3/4 Black a) 1 b) 3/8 Red a) 0 b) 0 Black a) 0 b) 0 Red a) 0 b) 0 Black a) 0 b) 0 Red a) 0 b) 0 Black a) 0 b) 0 In this example the solution is converging faster than the Jacobi iteration and at a comparable rate to GaussSeidel. the maintenance of symmetries is important. we use Φ*ij = (1–σ)Φij + σ(Φi+1.j–1)/4.Numerical Methods for Natural Sciences IB Introduction • Faster convergence at low wave numbers.j + Φi.4 Successive Over Relaxation (SOR) It has been found that the errors in the solution obtained by any of the three preceding methods decrease only slowly and often decrease in a monotonic manner.j + Φi–1.1. the order in which the cells are updated within one pass does not matter.j + Φi–1. Hence.j + Φi. For many problems. rather than setting Φ*ij = (Φi+1. based on those in the previous two sections. is given below. Unlike the GaussSeidel.2.
4 produces good results. For linear problems. If σ = 1 then application of the stencil removes all the error in the solution at the wave length of the mesh for that point. Select the initial finest grid resolution p=P0 and set b(p) = 0 and make some initial guess at the solution Φ (p) 2. compared with the slow asymptotic convergence above.WHUDWLRQV Multigrid methods try to improve the rate of convergence by considering the problem of a hierarchy of grids. but has little impact on larger wave lengths. for the example considered above. If at coarsest resolution (p=0) then solve A(p)Φ (p)=b(p) exactly and jump to step 7 3. applying boundary conditions * Not examinable – 79 – . The larger wave length errors in the solution are dissipated on a coarser grid while the shorter wave length errors are dissipated on a finer grid.3 Multigrid* The big problem with relaxation methods is their slow convergence. the solution would converge in one complete Jacobi multigrid iteration. Typically. This may be seen if we consider the onedimensional equation d2ϕ/dx2 = 0 subject to ϕ(x=0) = 0 and ϕ(x=1) = 1. The optimal value of σ will depend on the problem being solved and may vary as the iteration process converges. Suppose our initial guess for the iterative solution is that Φi = 0 for all internal mesh points. Relax the solution at the current grid resolution.Numerical Methods for Natural Sciences IB Introduction for some value σ. With the Jacobi Iteration the correction to the internal points diffuses only slowly along from x = 1. the basic multigrid algorithm for one complete iteration may be described as 1.2 to 1. a value of around 1. .1. 8. In some special cases it is possible to determine an optimal value analytically. however.
If not at current finest grid (P0). The hierarchy of grids is normally chosen to differ in dimensions by a factor of 2 in each direction. (157) (156) (155) * Not examinable – 80 – . then it can be seen that the Multigrid method may often be faster than a direction solution (which will require O(n3). Repeat from step 2 7. and search for the vector of roots to f(x) = 0 by writing xn+1 = g(xn). O(n2) or O(n log n) operations. increment P0 and repeat from step 7 11. is that irregular shaped domains or complex boundary conditions are implemented more easily. Successive Over Relaxation and Multigrid methods may be applied to any system of linear equations to interatively improve an approximation to the exact solution. A further advantage of Multigrid and other iterative methods when compared with direct solution. the coefficient in front of the n for the Multigrid solution may be relatively large so that direct solution may be faster. repeat from step 7 10. The refining of the correction to a finer grid will be achieved by (bi)linear or higher order interpolation. If not at final desired grid. For small problems.6. relaxation methods which are the basis of the Jacobi. where g(x) = D−1{[A+D]x − b}. repeat from step 2. Relax the solution at the current grid resolution.1. The basis for this is identical to the Direct Iteration method described in section 3. As the number of operations per complete iteration for n mesh points is O(n)+O(n/2d)+ +O(n/22d)+….Numerical Methods for Natural Sciences IB Introduction 4. 8. We start by writing the vector function f(x) = Ax − b. Coarsen the error b(p–1)←r to the next coarser grid and decrement p 6. and the coarsening may simply be by subsampling or averaging the error vector r. It has been found that the number of iterations required to reach a given level of convergence is more or less independent of the number of mesh points. Refine the correction to the next finest grid Φ (p+1) = Φ (p+1)+αΦ (p) and increment p 8. The difficulty with this for the Multigrid method is that care must be taken in order to ensure consistent boundary conditions in the embedded problems. depending on the method used). If not converged. where d is the number of dimensions in the problem. Typically the relaxation steps will be performed using Successive Over Relaxtion with RedBlack ordering and some relaxation coefficient σ. The factor α is typically less than unity and effectively damps possible instabilities in the convergence. Calculate the error r = AΦ (p)–b(p) 5.4 The mathematics of relaxation* In principle. GaussSeidel. applying boundary conditions 9. This is particularly true if n is large or there are a large number of dimensions in the problem.
then εn+1 = xn+1 − x* = D1{[A+D]xn − b} − D1{[A+D]x* − b} = D−1[A+D](xn − x*) = D−1[A+D]εn = {D−1[A+D]}n+1 ε0. The convergence remains. In the asymptotic limit. careful choice of D can aid the speed at which the method converges.6). linear. it is sufficient for us to consider the eigen value problem Bεn = λεn. 8. (159) (158) where B = D−1[A+D] for some suitable norm. and since it is the eigen values of B = D−1[A+D] rather than A itself which are important. the smaller the magnitude of this maximum eigen value the more rapid the convergence.1 Jacobi and GaussSeidel for Laplace equation * The structure of the finite difference approximation to Laplace’s equation lends itself to these relaxation methods. As any error vector εn may be written as a linear combination of the eigen vectors of our matrix B. however. In one dimension. From this it is clear that convergence will be linear and requires εn+1 = Bεn < εn.1.Numerical Methods for Natural Sciences IB Introduction with D a diagonal matrix (zero for all offdiagonal elements) which may be chosen arbitrarily. − 2 1 1 −2 1 1 −2 1 1 −2 1 A= 1 −2 1 O O O 1 − 2 (161) and both Jacobi and GaussSeidel iterations take D as 2I (I is the identity matrix) on the diagonal to give B = D−1[A+D] as * Not examinable – 81 – .4. Since we have the ability to choose the diagonal matrix D. (160) and require max(λ) to be less than unity. Let us assume the exact solution is x* = g(x*). Typically this means selecting D so that the diagonal of B is small. We may analyse this system by following our earlier analysis for the Direct Iteration method (section 3.
(5/4)λ4 + (3/8)λ2 . det(Bλ)(5) = λ5 + λ3 . λ2(5) = 0. … which may be solved to give the eigen values λ(1) = 0 . all the eigen values satisfy λ < 1. 1/2 .¼ . where the subscript gives the size of the matrix B. In this case the determinant may be obtained using the recurrence relation det(Bλ)(n) = λ det(Bλ)(n1) . … (166) (165) (164) (163) It can be shown that for a system of any size following this general form. 1/4. thus proving the relaxation method will always converge. As we increase the number of mesh points. the number of eigen values increases and gradually fills up the range λ < 1.(3/16)λ . λ2(4) = (3 ± √5)/8 . det(Bλ)(6) = λ6 . From this we may see det(Bλ)(1) = λ . det(Bλ)(4) = λ4 .(1/64) .¾ λ2 + (1/16) . λ2(3) = 0.Numerical Methods for Natural Sciences IB Introduction 0 12 1 2 0 1 2 12 0 12 12 0 12 B= 12 0 12 O O O 1 2 0 (162) The eigen values λ of this matrix are given by the roots of det(BλI) = 0. with the – 82 – . 3/4 . λ2(2) = 1/4 .1/4 det(Bλ)(n2) . det(Bλ)(3) = λ3 + ½λ . det(Bλ)(2) = λ2 .
the solution will converge if the A matrix is diagonally dominant. although the expressions for the determinant and eigen values is correspondingly more complex.4.2 Successive Over Relaxation for Laplace equation * The analysis of the Jacobi and GaussSeidel iterations may be applied equally well to Successive Over Relaxation. the convergence of the relaxation method slows considerably for large problems.3 Other equations* Relaxation methods may be applied to other differential equations or more general systems of linear equations in a similar manner. The multigrid approach enables the solution to converge using a much smaller system of equations and hence smaller eigen values for the larger distances. A similar analysis may be applied to Laplace’s equation in two or more dimensions. i. As a rule of thumb. On the otherhand. bypassing the slow convergence of the basic relaxation method. thus increasing the rate of convergence. the numerically largest values occur on the diagonal. * * Not examinable Not examinable – 83 – . the eigen values of B will exceed unity and the relaxation method will diverge. careful choice of σ will allow the eigen values of B to be less than those for Jacobi and GaussSeidel.1.4. The main difference is that D = (2/σ)I so that 1 − σ σ 2 σ 2 1− σ σ 2 σ 2 1− σ σ 2 σ 2 1− σ σ 2 = σ 2 1− σ σ 2 O O O σ 2 1 − σ B SOR (167) and thus λ + σ −1 B SOR − λI = σ B J − I σ (168) and the corresponding eigen values λSOR are related to the eigen values λJ for the Jacobi and GaussSeidel methods by λSOR = 1 + σ(λJ − 1) (169) Thus if σ is chosen inappropriately. As a result of λ→1.e. If this is not the case. 8. 8.1. The large eigen values are responsible for decreasing the error over large distances (many mesh points).Numerical Methods for Natural Sciences IB Introduction numerically largest eigen values becoming closer to unity.
allowing a solution to be evaluated with O(n log n) operations.1.t=0) = u0(x.2 Poisson equation The Poisson equation ∇2ϕ = f(x) may be treated using the same techniques as Laplace’s equation. 8. In its simplest form the FFT algorithm requires there to be n = 2p mesh points in the direction(s) to be transformed. This conversion process can be very efficient if the Fast Fourier Transform algorithm is used. We shall explore some of the options for achieving this in the following sections. 8. The idea is to divide and conquer! Details of the FFT algorithm may be found in any standard text. Using a square mesh of step size ∆x = ∆y = 1/m. 8. 8. scaled suitably to reflect any scaling in A.1. then of pairs of pairs and so on up to the full resolution.5 FFT* One of the most common ways of solving Laplace’s equation is to take the Fourier transform of the equation to convert it into wave number space and there solve the resulting algebraic equations.1 and y=0.Numerical Methods for Natural Sciences IB Introduction SOR can still be used.y.6 Boundary elements* 8. Suppose the initial conditions are u(x.y) and we wish to evaluate the solution for t > 0.y. ∂2u ∂2u ∂u = D 2 + 2 . ∂t ∂y ∂x (170) subject to u(x. It is simply necessary to set the righthand side to f.1. but it may be necessary to choose σ < 1 whereas for Laplace’s equation σ ≥ 1 produces a better rate of convergence. and taking the diffusivity D = 1.t) = 0 on the boundaries x=0.3.3 Diffusion equation Consider the twodimensional diffusion equation. The efficiency of the algorithm is achieved by first calculating the transform of pairs of points. we may utilise our earlier approximation for the Laplacian operator (equation (148)) to obtain * * * Not examinable Not examinable Not examinable – 84 – .7 Finite elements* 8.1 Semidiscretisation One of the simplest and most useful approaches is to discretise the equation in space and then solve a system of (coupled) ordinary differential equations in time in order to calculate the solution.1. then of pairs of transforms.
We choose U(0)i. j +1 + ui . ( ) (173) where the Courant number µ = ∆t/∆x2. j ∂t ≈ ui +1. j + U i(−1).3 Stability Stability of the Euler method solving the diffusion equation may be analysed in a similar way to that for ordinary differential equations. j + U i . j +1 + U i . (i=m. Substituting this into (173) gives U(1)i. j ( t ) = U i +1. As we shall see.nj) + µ U i(+1).2 Euler method Applying the Euler method Yn+1 = Yn+∆tf(Yn. On the boundaries (i=0. (i. (174) describes the size of the time step relative to the spatial discretisation. 8. stability of the solution depends on µ in contrast to an ordinary differential equation where it is a function of the time step ∆t only.j = sin(αi)sin(βj) + µ{sin[α(i+1)]sin(βj) + sin[α(i−1)]sin(βj) + sin(αi)sin[β(j+1)] + sin(αi)sin[β(j1)] − 4 sin(αi)sin(βj)} = sin(αi)sin(βj) + µ{[sin(αi)cos(α) + cos(αi)sin(α)]sin(βj) + [sin(αi)cos(α) − cos(αi)sin(α)]sin(βj) + sin(αi)[sin(βj)cos(β) + cos(βj)sin(β)] + sin(αi)[sin(βj)cos(β) − cos(βj)sin(β)] − 4 sin(αi)sin(βj)} – 85 – .m–1.m–1 and j=1.tn) to our spatially discretised diffusion equation gives n n U i(. j + ui . j + ui −1. then we must simply solve the (m–1)2 coupled ordinary differential equations U i′.j).j=sin(αi) sin(βj).Numerical Methods for Natural Sciences IB Introduction ∂ui .nj+1) = U i(. j + U i −1. j −1 − 4ui . We start by asking the question “does the Euler method converge as t–>infinity?” The exact solution will have u –> 0 and the numerical solution must also do this if it is to be stable. 8. j −1 − 4U i . j ∆x 2 (171) for the internal points i=1.j=m) we simply have uij=0. As we shall see. (175) for some α and β chosen as multiples of π/m to satisfy u = 0 on the boundaries.3.nj)−1 + −4U i(.j=0) and (i.nj)+1 + U i(. j / ∆x 2 .j). however.nj) .3. ( ) (172) In principle we may utilise any of the time stepping algorithms discussed in earlier lectures to solve this system. care needs to be taken to ensure the method chosen produces a stable solution. j + U i(. If Uij represents our approximation of u at the mesh points xij.
j = sin(αi)sin(βj) {1 − 4µ[sin2(α/2) + sin2(β/2)]}n.3.l are integers and Lx.3. and as our stability analysis of the previous section shows the conditions under which the solution for each Fourier mode is stable. In addition to satisfying the boundary conditions. these initial conditions represent a set of orthogonal functions which may be used to construct any arbitrary initial conditions as a Fourier series. since the diffusion equation is linear. For simplicity. consider the one dimensional diffusion equation ∂u ∂ 2 u = ∂t ∂x 2 (179) with u(x=0.3.Numerical Methods for Natural Sciences IB Introduction = sin(αi)sin(βj) + 2µ{sin(αi)cos(α) sin(βj) + sin(αi)sin(βj)cos(β) − 2 sin(αi)sin(βj)} = sin(αi)sin(βj){1 + 2µ[cos(α) + cos(β) − 2]} = sin(αi)sin(βj){1 − 4µ[sin2(α/2) + sin2(β/2)]}.3. 8.3 assumed initial conditions of the form sin(kπx/Lx) sin(lπy/Ly) where k.t) = 0 and apply the standard spatial discretisation for the curvature term to obtain U i( n +1) − µ ( n +1) {U i +1 − 2U i(n+1) + U i(−n1+1) } = U i(n) + µ {U i(+n1) − 2U i(n) + U i(−n1) } 2 2 (180) for the i=1. 8.3. Thus we must ensure ∆t < ∆x2/4. For partial differential equations such as the diffusion equation we may analyse this in the same manner as the Euler method of section 8.5 CrankNicholson The implicit CrankNicholson method is significantly better in terms of stability than the Euler method for ordinary differential equations. Applying this at consecutive times shows the solution at time tn is U(n)i. we can see that the equation (178) applies equally for arbitrary initial conditions. (178) A doubling of the spatial resolution therefore requires a factor of four more time steps so overall the expense of the computation increases sixteenfold. For this to be satisfied for arbitrary α and β we require µ < 1/4.t) = u(x=1. (177) (176) which then requires 1 − 4µ[sin2(α/2) + sin2(β/2)] < 1 for this to converge as n>infinity. Solution of this expression will involve the solution of a tridiagonal system for this onedimensional problem at each time step: – 86 – . The analysis for the diffusion equation in one or three dimensions may be computed in a similar manner.4 Model for general initial conditions Our analysis of the Euler method for solving the diffusion equation in section 8. Ly are the dimensions of the domain. Now.m−1 internal points.
Numerical Methods for Natural Sciences IB Introduction ( 0 1 U 0n +1) ( n +1) n 1 (n) (n) (n) − 1 µ 1 + µ − 1 µ U + µ U − 2U1 + U 0 ) 2 2 U 1 1 2 ( 2 ( ( n 1 1 ( U 2n +1) U 2 + 2 µ(U 3n ) − 2U 2n ) + U1( n ) ) − 1 µ 1+ µ − 2 µ 2 ( n +1) 1 1 − 2 µ 1+ µ − 2 µ (181) U 3 1 1 ( U 4n +1) − 2 µ 1+ µ − 2 µ ( n +1) = 1 1 − 2 µ 1+ µ − 2 µ U 5 1 1 U ( n +1) − 2 µ 1+ µ − 2 µ 6 n ( n) (n) (n) 1 1 1 ( n +1) − 2 µ 1 + µ − 2 µ U m−1 U m−1 + 2 µ(U m − 2U m−1 + U m− 2 ) 1 U mn +1) 0 ( To test the stability we again choose a Fourier mode.4. the CrankNicholson method is unconditionally stable. 8.2 Courant number* Not examinable * * * * Not examinable Not examinable Not examinable Not examinable – 87 – . Here we have only one spatial dimension so we use U(0)i = sin(θi) which satisfies the boundary condition if θ is a multiple of π. The step size ∆t may be chosen on the grounds of truncation error independently of ∆x. 1 + 2µ sin 2 1 + 2µ sin 2 n (182) Since the term [1–2µsin2(θ/2)]/[1+2µsin2(θ/2)] < 1 for all µ > 0.1 Upwind differencing* Not examinable 8.6 ADI* Not examinable 8.4. Substituting this into (182) we find U =U n i n −1 i 2 θ θ 1 − 2µ sin 2 2 0 1 − 2µ sin 2 2 θ = Ui 2 θ .3.4 Advection* Not examinable 8.
3 Numerical dispersion* Not examinable 8.5 LaxWendroff* Not examinable 8.4.Numerical Methods for Natural Sciences IB Introduction 8.4.4.4.4 Shocks* Not examinable 8.6 Conservative schemes* Not examinable * * * * Not examinable Not examinable Not examinable Not examinable – 88 – .
This operation is its selfinverse Decimal Two’s compliment (16 bit) A xor 1111111111111 + 1 5 –5 0000000000000101 1111111111111010 1111111111111011 Two’s compliment (16 bit) 1111111111111011 0000000000000100 0000000000000101 –A xor 1111111111111 + 1 Decimal –5 5 Examples Decimal –1 –5 –183 –255 –32767 –32768 Binary –1 –101 –10110111 –11111111 –111111111111111 –1111111111111110 Two’s compliment (16 bit) 1111111111111111 1111111111111011 1111111101001001 1111111100000001 1000000000000001 1000000000000000 • The number must be represented by a known number of bits. values from –32768 to 32767 are represented * * Not examinable Not examinable – 89 – . 9. values in the range –2n–1 to 2n–1–1 may be represented.Numerical Methods for Natural Sciences IB Introduction 9. inverting all bits) and adding one: –A = (A xor 11111111111112) + 1. For n=16. n (say) • The highest order bit indicates the sign of the number • If n bits are used to represent the integer. None of the material in this section is examinable.e. Positive integers are presented as their binary equivalents: Decimal Binary Two’s compliment (16 bit) 1 5 183 255 32767 1 101 10110111 11111111 111111111111111 0000000000000001 0000000000000101 0000000010110111 0000000011111111 0111111111111111 Negative numbers are obtained by taking the one’s compliment (i. Integers* Normally represented as 2’s compliment. Number representation* This section outlines some of the fundamentals as to how numbers are represented in computers. Any carry bit generated is discarded.1.
0 01000001 emmmmmmm mmmmmmmm mmmmmmmm 00000000 00000000 000000002 10000000 00000000 000000002 00000000 00000000 000000002 * Not examinable – 90 – . Mantissa. In both cases the number is stored as a sign bit followed by the expondent and the mantissa.0 8. 127 (=011111112) and 128 (=100000002) respectively. Zero. Exponent. This occupies eight bits for four byte values and eleven bits for eight byte values.0 and +1 are stored as 126 (=011111102).0 00111111 8.000002 1.0 1.2. and so improve the accuracy of the number representation.000002 Value Stored seeeeeee 0. The value zero (0) does not follow the pattern of a unit value for the first binary digit in the mantissa. Figure 24 shows the IEE format for four byte floating point values. The mantissa is stored as unsigned binary in the remaining 23 (four byte values) or 52 (eight byte values) bits. In the example below we show the four byte IEE representations of a selection of values. it is stored with all three components set to zero.0 00000000 1. The number of bits used to represent the exponent and mantissa varies depending on the total number of bits. while 1 indicates negative values.0 02 + 02 + 02 Sign Exponent 0 000000002 0 011111112 3 100001002 Mantissa 0.Numerical Methods for Natural Sciences IB Introduction • Normal binary addition of two two’s compliment numbers produces a correct result if any carry bit is discarded Two’s compliment (16 bit) Decimal + binary sum discard carry bit + –5 183 ??? 178 1111111111111011 0000000010110111 10000000010110010 0000000010110010 9. Given that. Instead. Floating point* Represented as a binary mantissa and a binary exponent.0 0. Sign bit. The use of a mantissa plus an exponent means that the binary representation of all floating point values apart from zero will start with a one. There is an assumed binary point (equivalent of a decimal point) following the unstored part of the number. The corresponding stored values for eight byte numbers are 1022 (=011111111102).0 1. Each of these exponents represents a power of two scaling on the mantissa. plus it being.000002 1. Decimal 0.0 1. The value stored in the exponent field is offset such that for four byte reals exponents of n = 1 . 1023 (=011111111112) and 1024 (=100000000002). a special case. A value of 0 indicates positive values. The sign bit gives the overall sign for the number. The most common of these is the IEEE floating point format which we describe here.Binary values are indicated by a subscript 2. zero does not follow the above pattern. and figure 25 the same for eight byte values. It is thus unnecessary to store the first binary digit. in general. There are a number of strategies for encoding the two parts.
both of which have exact representations in our binary floating point format. 9. or truncating it to give 10000. Consider the sum of 0. The result of this is that files containing binary data (e. for example.000002 1.. PCs and DEC alphas). but when carried through the a number of stages of computation.25 0. may lead to substantial errors being generated. whereas the recurring sequence would carry on the 11001100 pattern.2). Endians* While almost all modern computers use formats discussed in the previous sections to store integer and floating point values. The effect of this will accumulate of successive computations an may lead to a meaningless final result if care is not taken to ensure adequate precision and a suitable order for calculations. by addition.0 3. however.5 1.000000012 would require a 13 bit mantissa.000000012 and 16=100002. Many (but not all) Unix work stations use the latter strategy. amongst others.0 11000000 01000000 00000000 0.g.6 1.100002 1.4. The error for a single value is small. it would be necessary to reduce the number by either rounding the binary number up to 10000.g. is represented as a recurring binary and thus suffers from truncation error in the IEEE format. while an exact decimal.0000000149. the same value may not be stored in the same way on two different machines.3. If we were to have a format where we have only 12 bits stored for the mantissa.2 + 02 12 + 02 + 02 1 100000002 1 100000002 2 011111012 3 011111002 1.5 1.100002 1.00390625=0.Numerical Methods for Natural Sciences IB Introduction 3. This is the case for machines based on Intel and Digital processors (e. However.2 + 0. Here the final bit has been rounded up. 00111110 01001100 11001100 000000002 000000002 000000002 11001101 The first six values in this table yield exact representations under the IEE format.00000012. 9. storing the least significant byte at the higher memory address. The sum of these. In either case there is a loss of precision. other machines store the values the opposite way around in memory.0 01000000 01000000 00000000 3. then there may be a further loss of precision through rounding error even if both starting values may be represented exactly. The final value (0.100112 3. in the raw IEEE format rather than as ASCII text) will not be easily portable from a machine that uses one convention to one that uses the * * Not examinable Not examinable – 91 – . If two floating point values are combined.25 00111110 10000000 00000000 0.0 0. 10000.0 1. Rounding and truncation error* The finite precision of floating point arithmetic in computers leads to rounding and truncation error.00000002. The reason for this is in the architecture of the central processing unit (CPU). Truncation error is the result of the computer not being able to represent most floating point values exactly. Some machines store the least significant byte of a word at the lower memory address and the most significant byte at the upper memory address occupied by the value.
A two byte integer value of 1 = 00000000 000000012 written by one machine would be read as 256 = 00000001 000000002 when read by the other machine. – 92 – .Numerical Methods for Natural Sciences IB Introduction other.
These notes are simply intended to give the reader a feeling for the strengths and uses of different languages. As the name suggests. and has always been Fortran. The latest standard. was not finally released until about 1994. Fortran 90. The standard has evolved to reflect changes in coding practices and requirements. A Fortran 90 compiler should be able to compile all but a few of the oldest of Fortran programs (assuming they adhere to the relevant standard). the standard was released in 1977 and the demands of computing have changed significantly since then. None of the material in this section is examinable. Only for large projects is it worth investing the time and money required to utilise the optimal language.1. the current trend is towards object oriented languages such as C++. While Fortran 90 answers the majority of criticisms about the earlier version. work has * * * Not examinable Not examinable Not examinable . Computer languages* A complete discussion of computer languages would require a whole course in itself. This section is intended to act as an aid for choosing the appropriate language for a given task.2. Unfortunately for many programmers. provided a Fortran compiler was available. Like earlier improvements to the standard. Ideally you would use the language which is most appropriate for a given application. Procedural languages are considered behind the times by many computer scientists. Fortran 90* Fortran is one of the oldest languages still in wide spread use. Fortran 77 is the standard to which the majority of current compilers adhere. The notes are written from the perspective of someone who was brought up on a wide range of languages with programs suitable for both procedural and object oriented approaches. yet they remain the language of choice for many scientific programs. 10. it maintains a high level of backward compatibility.Numerical Methods for Natural Sciences IB Introduction 10. Both C++ and Fortran have their advantages. the standard lags many years behind current thinking. The main procedural contender is. For most projects. enabling programs written for one type of machine to be relatively easily ported to run on a different type of machine. The discussion is necessarily brief and may be unfairly critical on some languages.93 . 10. It has long had a standard. depending on the task at hand. the language will be selected from the intersection of the subset of languages you know and the set of languages supported on the specific platform you require. For example. Procedural verses Object Oriented * For many applications.
WHILE. and have been written in Fortran. the Fortran 90 standard has introduced many of the features sadly missing from the earlier version. 10. This greatly improves the flexibility of a program and frequently allows you to achieve more with less memory as the same memory can be reused for different purposes. In contrast. Many of these enhancements have been borrowed from languages such as C++. As the most computationally intensive programs have traditionally followed the procedural model.94 .2. in Fortran. there has been a tremendous effort put into developing optimisation strategies for Fortran compilers.e. helping to ensure Fortran remains in wide spread use. and the ability to call subroutines recursively allows the program to operate more efficiently. the end result would be less efficient (in terms of speed and size) and more cumbersome than something with the same functionality developed in C++. for example. Perhaps the most significant changes accompany a move from all data structures being static (i. then it concatenates them together. The underlying numerical algorithms process one type of data in a consistent manner.2.Numerical Methods for Natural Sciences IB Introduction already started on the next generation. the precise result will depend on the type (and number) of parameters or operands used in executing the function/operator. The language is procedural oriented with the algorithm (or “formula”) playing the central rôle controling the way the program is written and structured. unions an maps greatly simplify the passing of parameters and the organisation of variables without resort to a multitude of COMMON blocks. if you were to develop a word processor. Control structures such as DO loops have done away with the need for a numeric label. when it “adds” two strings. For example.1. 10. if abused. the procedural oriented approach is more appropriate. the object oriented C++ (see below) focuses on the data. perhaps stripping any trailing spaces in the process. the operator “+” could be defined such that. CASE and BREAK statements have been added to simplify program logic. the size of and space required by variables is determined at compile time) to allowing dynamic allocation of memory. most dangerous) is the ability to overload a function or operator.2. Fortran enhancements* For those familiar with Fortran 77. For the majority of scientific programs using the types of algorithms introduced in this course. and adapted to the Fortran environment. One of the most powerful (and. In contrast. * * Not examinable Not examinable . The program generates data from input parameters. From the data side structures (records). As a result the code produced by Fortran compilers is normally more efficient for a numerical program than that produced by other language compilers. By overloading a function or operator. Procedural oriented* The origins of Fortran as a scientific computer language are reflected in the name which stands for Formula Translation.
3. An object is simply some data which describes a coherent unit. The popularity of C stems from its close association with Unix rather than from any inherent qualities.Numerical Methods for Natural Sciences IB Introduction Similarly it would be possible to define “number=string” such that the string “string” was passed to an interpreter function and the contents of it evaluated. The original. most mainstream applications in C 10. was adopted and enhanced by Bell Laboratories to produce B and then C as part of their project to develop Unix. Clearly doing something like defining “WRITE” as “READ” would lead to a great deal of confusion! 10. BCPL. C++* Like Fortran.2. While there are probably still more programs written in Fortran and COBOL than there are in C. C++ has evolved from earlier languages. The vast majority of shrink wrapped applications are now written in C or C++. While C compilers are still widely available and in common use. * * * Not examinable Not examinable Not examinable . it is almost certain that more people use programs written in C (or C++) than in any other language. 10. but such libraries tend to be useful only for lowlevel routines. C++ (which is more than just an improved version of C) provides all the functionality of C and allows the code to be written in a more flexible and readable manner. Many programs were written in C during the late 1980s because it was the “in” language rather than the best for a given application. As a result of this investment and the relative simplicity of gradually moving over to C++.3. The concept of Object Oriented Programming was introduced to C to provide a straight forward way of handling many similar – but not identical – objects without having to write specialised code for each object. Historically this was achieved through the use of software libraries. there is little or no justification for using C for any new project. Object Oriented* As systems became bigger.95 . C has often been described as a write only language as it is relatively easy to write code which almost no one will be able to decipher.1.3. it became desirable – often essential – to reuse components as much as possible. C* The main features of C are its access to lowlevel features of the system and its extremely compact (and often confusing) notation.
its texture. Cfood Ccrockery: Ctableware. The classes Ccrockery and Ccutlery are both derived from Ctableware and so any Ccrockery or Ccutlery object is also a Ctableware object. Thus when my_dishwasher tries to wash the Ctableware derived object my_plate. we shall allow them to inherit these attributes from a class Ccrockery. in turn. The dishwasher will accept all objects belonging to Ctableware. although my_cup will contain a drink while my_plate will contain food. and * Not examinable . They are both crockery.Numerical Methods for Natural Sciences IB Introduction Suppose we have an object my_cup which belongs to some class Ccup_of_drink (say). its use and its contents. Similarly as Ccup_of_drink and Cplate_of_food are derived (indirectly) from Ctableware and so can be washed by my_dishwasher. and could thus be embodied at the Ctableware level. While C++ is able to have an array of arrays.3. Ccolour and Ctexture. but can be critical for a large scale numerical simulation. 10. it is necessary for the Cplate_of_food class to override the Ctableware procedure for washing. the class structure would appear to allow C++ to be adapted to any type of application.3.96 . In practice there are some things that are difficult to achieve in a seamless way and others which lead to relatively inefficient programming. Suppose we also have an object my_plate and that this belongs to a class Cplate_of_food. Tableware will. Ccolour The objectoriented approach comes in if we have an object my_dishwasher. In this example the Cplate_of_food response to the dishwasher may be to declare the my_plate empty of food and then pass the request for washing the plate down to its embedded Ccrockery class which will in turn pass it down to Ctableware. The object my_cup will take the form of a description of the cup which might include information about its colour. the most obvious omission is support for multidimensional arrays. To save us having to define the crockery aspects separately for my_cup and my_plate. Weaknesses* On the surface. This hierarchy may be expressed as Ccup_of_drink: Ccrockery. Ctexture Similarly we might have a teaspoon my_teaspoon belonging to Ccutlery which follows Ccutlery: Ctableware. For someone brought up on most other languages. and can thus be washed by the my_dishwasher. It is not quite so simple for the Ccup_of_drink and Cplate_of_food objects as they contain food which would be destroyed by the dishwasher. inherit attributes from other classes such as Ctableware. Cdrink Cplate_of_food: Ccrockery. Clearly there is much in common between a cup and a plate. This inefficiency may not matter for a wordprocessor (although users of Microsoft Word may not agree). Ccolour. For a Ccrockery or Ccutlery object the process of washing would be identical. The object my_dishwasher would simply say “wash thyself” to any object it contains.
3.4.4. The name Basic (Beginners Symbolic Instruction Code) does not really represent a language. it has implications on the time taken to compile a program and.97 . Basic* Basic has developed from a design exercise in Southampton to the language which is known by more people than any other. Ada was introduced in the 1980s with a substantial backing from the US military. 10.4. on the execution speed. * * * * Not examinable Not examinable Not examinable Not examinable . Algol forced the programmer to adopt a much more structured approach than other languages at the time and offered a vast array of builtin functionality such as matrix manipulation. While this adds to the flexibility.2. they can not be used in the same way and the usual notation can not be employed. Algol* In the late 1960s and early 1970s Algol (Algorithmic Language) looked as though it would become the scientific computer language.1. 10. Its acceptance was slow due to its cumbersome nature and its unsuitability for many of the computers of the period. especially with the cryptic notation shared by C and C++. Basic was designed as an interpreted language and most implementations follow this heritage. Most implementations of Basic are proprietary and some are much better than others. more importantly.Numerical Methods for Natural Sciences IB Introduction so have some of the functionality of a multidimensional array. Even with the most carefully constructed libraries and the functions being inserted inline (instead of being called). Others* 10. Ada* Touted as the language to replace all other languages. especially those from Microsoft. 10. It gains its functionality through standard and specialised libraries. but rather a family of languages which appear similar in concept but will differ in almost every important detail. Further. The name Basic is often used for the interpreted macro language in many modern applications. it is easy to get confused as to what you have an array of. C++ itself is a very small language with only a few built in functions and operators.4. the fact the compiler does not understand the functions within the library limits the degree of optimisation which can be achieved. Ultimately it was the shear power that lead to the demise of Algol as it was unable to fit on the smaller mini and microcomputers as they were developed.
Englishlike language aimed at business applications. Cobol is essentially the same as it was 30 years ago.7. provided you like recursion and counting brackets. 10.9. Uses reversepolish notation. but are often not portable.4.6.4.8. but because of the huge investment which would be required to upgrade the software to a more modern language. Forth* A threadedinterpreted language. Delphi tries to combine the graphical development tools of Visual Basic with the structure and consistency of Pascal and the Object Oriented approach of C++ and Smalltalk. 10.98 . but never widely accepted. Modula2* An improved form of Pascal. * * * * * * Not examinable Not examinable Not examinable Not examinable Not examinable Not examinable . Developed to control telescopes.4.4. Delphi* Developed by Borland International as a competitor for Microsoft’s Visual Basic. Lisp* Good for Artificial Intellegence.4. Most implementations provide significant enhancements to this standard.4. It remains in widespread use not because of any inherent advantages. Extremely efficient for an interpreted language. whereas Fortran has evolved with time. It is an extremely verbose.4. Cobol* Cobol (Common Business Oriented Language) and Fortran are the only two early highlevel languages remaining in widespread use. It is being predicted that Delphi will become increasingly popular on PCs.5. Pascal* Developed as a teaching language. However. Very good. The standard lacks many of the features essential for effective computing. Can be used for scientific programming. 10. but the compilers are often inferior to those of Fortran and C++. 10.Numerical Methods for Natural Sciences IB Introduction 10. 10. but relatively hard to read and program.
* * * * * Not examinable Not examinable Not examinable Not examinable Not examinable .11. Visual Basic* This flavour of Basic was introduced by Microsoft as an easy method of writing programs for the Windows GUI. however. It has been an extremely successful product. It provides a more consistent view than C++.10. However.4.Numerical Methods for Natural Sciences IB Introduction 10. While PostScript can be used for generalpurpose computing.4.4. Smalltalk* One of the first objectoriented programming languages.4. 10. StarScript. but is less widely accepted. Prolog* 10. PostScript is a proprietary language belonging to Adobe.14. PL/1 has some of the advantages and some of the disadvantages of both languages. PostScript* Often thought of as a printer protocol. PostScript is a threadedinterpreted language which results in efficient but hard to read code. There are. it never gained much of a following due to a combination of its proprietary nature and a requirement for very substantial mainframe resources.13. many emulations of PostScript with suitably disguised names: GhostScript.12. … 10. 10. but remains relatively inefficient for computational work when compared with C++ or Fortran.99 . PL/1* Introduced by IBM in the 1970s as a replacement for both Cobol and Fortran. it is inadvisable as it would tie up the printer for a long time! Unlike most languages in widespread use.4. PostScript is in fact a fullfledged language with all the normal capabilities plus built in graphics! Like Forth.
This action might not be possible to undo. Are you sure you want to continue?
We've moved you to where you read on your other device.
Get the full title to continue listening from where you left off, or restart the preview.