
Numerical Methods

for Science and Engineering

Volume 1 : Introduction and basic concepts

Dr SN Neossi Nguetchue

July-August 2015
Copyright (c) 2015 Neossi Nguetchue

Contents

1 Introduction and error analysis 4


1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2 Error analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2.1 Roundoff error . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2.2 Truncation error . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
1.3 Norms of vectors and matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.3.1 Norms of vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.3.2 Norms of matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

2 Basic numerical linear algebra algorithms 9


2.1 Direct methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.1.1 Gauss elimination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.1.2 Gauss elimination with partial pivoting . . . . . . . . . . . . . . . . . . . . . 13
2.1.3 LU decomposition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.1.4 Tridiagonal systems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.2 Iterative methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.2.1 Jacobi method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
2.2.2 Gauss-Seidel method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.2.3 Convergence of Jacobi and Gauss-Seidel methods . . . . . . . . . . . . . . . 19
2.2.4 Successive over relaxation (SOR) . . . . . . . . . . . . . . . . . . . . . . . . 20
2.2.5 Residual vector and condition number . . . . . . . . . . . . . . . . . . . . . 21

3 Roots of non-linear equations 24


3.0.1 Scalar non-linear equation in one unknown . . . . . . . . . . . . . . . . . . . 24
3.0.2 The false position method . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.0.3 Newton’s method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
3.1 System of non-linear equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.1.1 Newton’s method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

4 Interpolation (part I) 30
4.1 Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
4.1.1 Polynomial interpolation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
4.2 Practice . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33


5 Interpolation (part II) 34


5.1 Theory . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
5.1.1 Polynomial interpolation of uniformly distributed data . . . . . . . . . . . . 34
5.1.2 Cubic spline interpolation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
5.2 Practice . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

6 Numerical differentiation 41
6.1 Finite Difference Methods (FDMs) . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
6.1.1 FDMs by Taylor’s expansion . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
6.1.2 Extrapolation Techniques . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
6.1.3 Richardson’s Extrapolation . . . . . . . . . . . . . . . . . . . . . . . . . . . 45

7 Numerical integration(Quadrature) 46
7.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
7.2 Midpoint Rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
7.3 Trapezoidal Rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
7.4 Simpson’s Rule . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

8 Ordinary differential equations 49


8.1 Initial value problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
8.1.1 First-order ODEs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
8.1.2 Euler’s method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 49
8.1.3 Error in Euler’s method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
8.1.4 Improved Euler method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
8.1.5 Runge-Kutta(RK) methods . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
Chapter 1

Introduction and error analysis

1.1 Introduction
We live in a universe governed by the laws of Nature (gravity, electromagnetism, ...). Since we live
in such an environment, in order to solve the problems encountered in our daily life, we need to
understand these laws or principles that govern the world around us and that is the purpose of
Science. Mathematics is the language in which our universe or the laws that govern it have been
written. To make things simple, let’s assume we are interested in the study of the movement of
an object at the macroscopic scale, therefore we shall call upon or make use of classical mechanics
via Newton’s second law of motion which states that the sum of external forces applied to our
object is equal to the mass of the object times acceleration due to the movement of the object.
If we want to predict the position of the object under study at any time, we denote by x the
position of the object and t the time at which we want to locate the object. It is easy to realise
that x is a function or depends on t and we write x = x(t). Newton’s second law of motion as
described above relates t, x and its derivatives up to the second-order with respect to time ẋ
and ẍ which can be written as F (t, x, ẋ, ẍ) = 0. Such a relation is called an ordinary differential
equation (ODE). In a more general framework, if we are interested in the study of a quantity u
that depends on a space variable x and time t, i.e. u is a multi-variable function, the above relation
takes the form F(t, x, u_t, u_tt, u_tx, u_xx) = 0, which in this case is called a partial differential
equation (PDE). Ordinary differential equations (ODEs) and partial differential equations (PDEs)
are termed without distinction differential equations (DEs).

Most of the real world problems encountered in Science and Engineering are modelled by DEs.
These equations are not easy to solve even when they are linear, and since most of them
are non-linear, numerical methods are needed in order to approximate their solutions.
Let's briefly explain what we mean by approximating the solution of a differential equation
(DE). First of all, to solve a DE we need initial conditions and/or boundary conditions depending
on the type of equation at hand. In general, the solution we are looking for is a continuous
function on a closed interval, say [a, b], but the algorithm used to approximate the solution
produces discrete values approximating the solution at a finite number of points of this interval.
A problem occurs at this stage and raises a number of questions:


- How good or accurate is the algorithm ?

- Does the approximate solution produced by this algorithm converge to the actual solution
as the distance between the discrete points gets closer to zero?
To answer these questions or related ones, we generally study the error of the method, i.e. the dif-
ference between the solution and its approximation. Once we have designed a good, convergent
method for large-size problems, i.e. problems where the number of discrete points is large, we
usually convert our algorithm into a computer program written in a programming language such
as Matlab.

1.2 Error analysis


1.2.1 Roundoff error
Calculators and computers perform only finite digit arithmetic, that is calculations are only per-
formed with approximate representations of the actual numbers, i.e.

π = 3.14159265 . . .

with a calculator we may have only a five-decimal-place representation 3.14159.
If x∗ is an approximation to x, the error ε is defined by x∗ = x + ε, where ε can be positive or negative.
Definition 1.2.1 (Absolute error)

    |ε| = |x − x∗|.

Definition 1.2.2 (Relative error)

    |ε|/|x| = |x − x∗|/|x|,   provided x ≠ 0.

Example 1.2.3 If x = 0.02 and x∗ = 0.01


Absolute Error = 0.01
Relative Error = 0.5

Example 1.2.4 If x = 10.00 and x∗ = 10.01


Absolute Error = 0.01
Relative Error = 0.001

Thus, as a measure of accuracy, the absolute error may be misleading and the relative error more
meaningful, since it takes into account the size of the value. In practice we cannot determine the
absolute or relative error exactly, since the exact solution we are looking for is required for their
evaluation; we shall, however, use approximations.

Initial data for a problem can, for a variety of reasons, be subject to errors. In this course we are
concerned with numerical techniques in which we perform a number of steps or iterations to attain
a solution. A good numerical technique has the property that a small error in the initial data leads
to a small error in the final results (this is called stability).
Loss of accuracy due to roundoff error can often be avoided by a careful sequencing of operations
or by a reformulation of the problem.

Example 1.2.5 The roots of ax² + bx + c = 0 are

    α, β = ( −b ± √(b² − 4ac) ) / (2a).

If b² ≫ 4ac then √(b² − 4ac) ≈ |b|, and so

    α = ( −b + √(b² − 4ac) ) / (2a) ≈ 0.

In calculating α we are subtracting nearly equal numbers and a large error can arise. To avoid this,
we change the form of the quadratic formula by "rationalising the numerator":

    α = ( −b + √(b² − 4ac) ) / (2a) × ( −b − √(b² − 4ac) ) / ( −b − √(b² − 4ac) ) = −2c / ( b + √(b² − 4ac) ).

In the calculation of β we are adding two numbers of the same sign, so no cancellation occurs and there is no problem.

Example 1.2.6 Given a polynomial

P (x) = ax3 + bx2 + cx + d

its evaluation requires 6 multiplications and 3 additions. To minimise the roundoff errors, a more
efficient form, here the nested form, can be used

P (x) = ((ax + b)x + c)x + d

it now takes only 3 multiplications and 3 additions to evaluate this polynomial. Fewer operations
mean less opportunity for roundoff errors to be introduced, and the execution will be faster.
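As a small illustration (this helper is our own sketch and not part of the course code base), the nested form can be evaluated in Matlab as follows:

% horner_cubic.m  -- illustrative sketch of nested (Horner) evaluation
function p = horner_cubic(a,b,c,d,x)
% Evaluate P(x) = a*x^3 + b*x^2 + c*x + d using 3 multiplications and 3 additions
% per point; x may be a scalar or an array of evaluation points.
p = ((a.*x + b).*x + c).*x + d;

For instance, horner_cubic(1,-2,3,-4,2) returns 2, the same value as the expanded form 2^3 - 2*2^2 + 3*2 - 4.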

1.2.2 Truncation error


This type of error is different from roundoff error. Truncation error comes from approximations
made in the formulation of numerical methods and is not related to computational sources of error.
For example, in the approximation of the derivatives or integrals of a function we generally make
use of its Taylor expansion, which automatically introduces truncation errors since the remainder
term in the formula is dropped because it is unknown.

1.3 Norms of vectors and matrices


Norms are very important in numerical methods in the sense that they enable us to measure the
size of vectors and matrices. More importantly norms enable us to measure the convergence or
non convergence of numerical schemes.
A norm is defined as a real-valued function that satisfies the following properties:
(i) ||A|| ≥ 0, for all A ∈ R^{m×n};
(ii) ||A|| = 0 ⇔ A = 0;
(iii) ||λA|| = |λ| ||A||, for all λ ∈ R and A ∈ R^{m×n};
(iv) ||A + B|| ≤ ||A|| + ||B||, for all A, B ∈ R^{m×n}.

1.3.1 Norms of vectors


The most commonly used norms for a vector x ∈ R^n are
the 1-norm:

    ||x||_1 = Σ_{i=1}^{n} |x_i|,

the Euclidean or 2-norm:

    ||x||_2 = ( Σ_{i=1}^{n} |x_i|² )^{1/2},

and the maximum or ∞-norm:

    ||x||_∞ = max_{1≤i≤n} |x_i|.

Example 1.3.1 If x = [−3 1 0 2]^T then

    ||x||_1 = |−3| + |1| + |0| + |2| = 6,

    ||x||_2 = ( |−3|² + |1|² + |0|² + |2|² )^{1/2} = √14,

    ||x||_∞ = max{ |−3|, |1|, |0|, |2| } = 3.

1.3.2 Norms of matrices


If A ∈ R^{n×n}, the 1-norm and ∞-norm are defined respectively as

    ||A||_1 = max_{1≤j≤n} Σ_{i=1}^{n} |a_ij|,

    ||A||_∞ = max_{1≤i≤n} Σ_{j=1}^{n} |a_ij|;

they are the maximum absolute column sum and the maximum absolute row sum, respectively.



Example 1.3.2

    A = [  5  −2   2 ]
        [  3   1   2 ]
        [ −2  −2   3 ]

||A||_1 = 10, and ||A||_∞ = 9.

For the 2-norm there is no simple formula; one way to compute it is

    ||A||_2 = ( ρ(A^T A) )^{1/2},

where ρ(A^T A) is the largest eigenvalue of A^T A in absolute value (its spectral radius).

The 2-norm is of particular importance because it is the matrix norm induced by the Euclidean vector norm.

Example 1.3.3 For the last example

    A^T A = [  5   3  −2 ] [  5  −2   2 ]   [ 38  −3  10 ]
            [ −2   1  −2 ] [  3   1   2 ] = [ −3   9  −8 ]
            [  2   2   3 ] [ −2  −2   3 ]   [ 10  −8  17 ]

The eigenvalues of this latter matrix are {3.7998, 17.1864, 43.0138}, thus

    ρ(A^T A) = 43.0138

and the 2-norm of A is

    ||A||_2 = ( ρ(A^T A) )^{1/2} = 6.5585.
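These values can be checked with Matlab's built-in functions; the short session below is our own illustration and not part of the notes:

x = [-3 1 0 2]';
[norm(x,1), norm(x,2), norm(x,inf)]    % expected: 6, sqrt(14) ~ 3.7417, 3

A = [5 -2 2; 3 1 2; -2 -2 3];
[norm(A,1), norm(A,inf)]               % expected: 10, 9
sqrt(max(eig(A'*A)))                   % 2-norm via the largest eigenvalue of A'*A, ~ 6.5585
norm(A,2)                              % built-in 2-norm, for comparison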
Chapter 2

Basic numerical linear algebra


algorithms

Systems of linear equations arise in a large number of areas both directly in the modelling of
physical situations and indirectly in the numerical solution of other mathematical problems.

Consider the system of 3 equations in 3 unknowns x1 , x2 and x3

x1 + x2 + x 3 = 4
2x1 + 3x2 + x3 = 9
x1 − x2 − x3 = −2

Using matrix notation, this system can be written as


    
1 1 1 x1 4
 2 3 1   x2  =  9 
1 −1 −1 x3 −2
or
Ax = b
where A is called the coefficient matrix, b is the right hand side vector and x is the vector of
unknowns. If we pre-multiply both sides by A−1

A−1 Ax = A−1 b
Ix = A−1 b
x = A−1 b

So we would expect a unique solution to exist if and only if A−1 exists, i.e. if A is non singular.
Solving a system of linear equations is equivalent to inverting the coefficient matrix.

The system above can be easily solved by adding and subtracting multiples of the equations to
eliminate one variable at a time. In general however, there may be hundreds or even thousands of


equations and unknowns. For a general system of n equations in n unknowns we write


    
    [ a11  a12  ···  a1n ] [ x1 ]   [ b1 ]
    [ a21  a22  ···  a2n ] [ x2 ]   [ b2 ]
    [  :    :         :  ] [  : ] = [  : ]
    [ an1  an2  ···  ann ] [ xn ]   [ bn ]
We may perform certain elementary row operations on a system of linear equations which do
not alter the solution of the system; we may
• interchange equations,

• multiply an equation by a non zero constant,

• add a multiple of one equation to another.

2.1 Direct methods


2.1.1 Gauss elimination
To facilitate the solution of systems of linear equations we tend to write them in augmented matrix
form, where we avoid writing the vector of unknowns and bring together the coefficient matrix and
the right-hand side vector; so for the above system we write

    [ 1   1   1 |  4 ]
    [ 2   3   1 |  9 ]
    [ 1  −1  −1 | −2 ]
and for the general system

    [ a11  a12  ···  a1n | b1 ]
    [ a21  a22  ···  a2n | b2 ]
    [  :    :         :  |  : ]
    [ an1  an2  ···  ann | bn ]
To solve the system of equations Ax = b, we reduce it to an equivalent system which has upper
triangular form U x = g, written as

    [ u11  u12  ···  u1,n−1    u1n     | g1   ]
    [  0   u22  ···  u2,n−1    u2n     | g2   ]
    [  :         ⋱     :        :      |  :   ]
    [  0    0   ···  un−1,n−1  un−1,n  | gn−1 ]
    [  0    0   ···    0       unn     | gn   ]

The system can then be solved by a process of back substitution.

In the Gauss elimination method, reduction to upper triangular form is achieved through the
systematic application of elementary row operations. Each step involves working on successive
columns.

Example 2.1.1

    R1 : [ 1   1   1 |  4 ]
    R2 : [ 2   3   1 |  9 ]
    R3 : [ 1  −1  −1 | −2 ]

R2 ← R2 − 2R1 , R3 ← R3 − R1

    R1 : [ 1   1   1 |  4 ]
    R2 : [ 0   1  −1 |  1 ]
    R3 : [ 0  −2  −2 | −6 ]

R3 ← R3 + 2R2

    R1 : [ 1   1   1 |  4 ]
    R2 : [ 0   1  −1 |  1 ]
    R3 : [ 0   0  −4 | −4 ]

Writing the system in full,

    x1 + x2 + x3 = 4
         x2 − x3 = 1
            −4x3 = −4

we can now solve directly for x3, x2 and x1:

    x3 = −4/(−4) = 1
    x2 = 1 + x3 = 2
    x1 = 4 − x2 − x3 = 1

In general, to eliminate the elements below the element a11 we apply the sequence of row operations

    Ri ← Ri − (ai1/a11) R1 ,   i = 2, . . . , n.

We call a11 the pivot and mi1 = ai1/a11 the multiplier. Clearly we cannot carry out the operation
unless the pivot a11 is non-zero. If a11 ≠ 0 then the new augmented matrix obtained is

    [ a11  a12         ···  a1,n−1         a1n         | b1         ]
    [  0   a22^(1)     ···  a2,n−1^(1)     a2n^(1)     | b2^(1)     ]
    [  :    :                 :             :          |  :         ]
    [  0   an−1,2^(1)  ···  an−1,n−1^(1)   an−1,n^(1)  | bn−1^(1)   ]
    [  0   an2^(1)     ···  an,n−1^(1)     ann^(1)     | bn^(1)     ]

The superscript (1) refers to coefficients which may have changed as a result of the row operations.
We now repeat the process to eliminate the elements below the diagonal element a22^(1):

    Ri ← Ri − ( ai2^(1) / a22^(1) ) R2 ,   i = 3, . . . , n.

The element a22^(1) is now the pivot:

    [ a11  a12      a13      ···  a1n      | b1      ]
    [  0   a22^(1)  a23^(1)  ···  a2n^(1)  | b2^(1)  ]
    [  0    0       a33^(2)  ···  a3n^(2)  | b3^(2)  ]
    [  :    :         :             :      |  :      ]
    [  0    0       an3^(2)  ···  ann^(2)  | bn^(2)  ]
The procedure is repeated until we have introduced zeros below the main diagonal in the first n − 1
columns. We then have the desired upper triangular form

    [ a11  a12      a13      ···  a1n        | b1        ]
    [  0   a22^(1)  a23^(1)  ···  a2n^(1)    | b2^(1)    ]
    [  0    0       a33^(2)  ···  a3n^(2)    | b3^(2)    ]
    [  :    :                 ⋱    :         |  :        ]
    [  0    0        0       ···  ann^(n−1)  | bn^(n−1)  ]

We may then use back substitution to obtain xi, i = 1, . . . , n:

    xn = bn^(n−1) / ann^(n−1) ,

    xi = ( bi^(i−1) − Σ_{j=i+1}^{n} aij^(i−1) xj ) / aii^(i−1) ,   i = n − 1, . . . , 1.
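A minimal Matlab sketch of these back-substitution formulas is given below (this helper and its name are our own choice; it assumes the diagonal entries of U are non-zero):

% back_sub.m  -- illustrative sketch of back substitution for an upper triangular system U*x = g
function x = back_sub(U,g)
n = length(g);
x = zeros(n,1);
x(n) = g(n)/U(n,n);
for i = n-1:-1:1
    % subtract the already-computed unknowns x(i+1),...,x(n) and divide by the pivot
    x(i) = (g(i) - U(i,i+1:n)*x(i+1:n))/U(i,i);
end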

Remark: It was pointed out before that Gauss elimination will fail if there is a zero pivot. There
may also be problems if the pivot is small and hence the multiplier large: any rounding errors in
the row being multiplied by the multiplier will be greatly magnified. A small pivot also has a bad
effect on the errors during the process of backward substitution.

2.1.2 Gauss elimination with partial pivoting


This overcomes the problem of small pivot values and helps reduce the effect of round-off errors.

To perform partial pivoting, we ensure that for each step the diagonal element aii is of the
largest absolute value possible. The benefits of pivoting are shown in the following example

Example 2.1.2

    [ −0.002   4.000   4.000 |  7.998 ]
    [ −2.000   2.906  −5.387 | −4.481 ]
    [  3.000  −4.031  −3.112 | −4.143 ]

This system has exact solution x1 = x2 = x3 = 1. We first apply GE without pivoting and use 4
significant figures with rounding in the calculations.

    R2 ← R2 − (2.000/0.002) R1 ,   R3 ← R3 + (3.000/0.002) R1 ,
    i.e. R2 ← R2 − 1000 R1 ,   R3 ← R3 + 1500 R1

    [ −0.002   4.000   4.000  |  7.998 ]
    [    0    −3997   −4005   | −8002  ]
    [    0     5996    5997   | 12000  ]

    R3 ← R3 + (5996/3997) R2 ,   i.e. R3 ← R3 + 1.5 R2

    [ −0.002   4.000   4.000  |  7.998 ]
    [    0    −3997   −4005   | −8002  ]
    [    0       0    −11.00  |    0   ]

Back substitution gives

    x3 = 0,
    x2 = (1/(−3997)) (−8002) = 2.002,
    x1 = (1/(−0.002)) (7.998 − 4.000 × 2.002) = 5.

Round-off errors have been "blown up" by the small pivot.

Example 2.1.3 If we apply pivoting, on the first step we need to swap rows 1 and 3.

Swap R1 ↔ R3:

    [  3.000  −4.031  −3.112 | −4.143 ]
    [ −2.000   2.906  −5.387 | −4.481 ]
    [ −0.002   4.000   4.000 |  7.998 ]

    R2 ← R2 + (2.000/3.000) R1 ,   R3 ← R3 + (0.002/3.000) R1 ,
    i.e. R2 ← R2 + 0.6667 R1 ,   R3 ← R3 + 0.0006667 R1

and, interchanging rows 2 and 3 so that the larger pivot 3.997 is on the diagonal,

    [ 3.000  −4.031  −3.112 | −4.143 ]
    [   0     3.997   3.998 |  7.995 ]
    [   0     0.2190 −7.462 | −7.243 ]

    R3 ← R3 − (0.2190/3.997) R2 ,   i.e. R3 ← R3 − 0.05479 R2

    [ 3.000  −4.031  −3.112 | −4.143 ]
    [   0     3.997   3.998 |  7.995 ]
    [   0      0     −7.681 | −7.681 ]

Upon back substitution,

    x3 = 1,
    x2 = (1/3.997) (7.995 − 3.998) = 1,
    x1 = (1/3.000) (−4.143 + 4.031 + 3.112) = 1.

So to 4 significant figures, we have eliminated round-off error.
Problem 2.1.4 The algorithm for the Gaussian elimination is formally given using two loops and
the rows as
%gauss_elim.m
function gauss_elim(A,b) % 1
.......... % 2
for i=1:n-1 % 3
for j=i+1:n % 4
mji=aji/aii % 5
Rj <--- Rj - mji*Ri % 6
end % 7
end % 8
To express the assignment in line 6 above, we need to write the components of the row vectors Rj
and Ri in terms of another variable, say k,
Rj = (ajk )1≤k≤n+1 and Ri = (aik )1≤k≤n+1
Thus Rj ← Rj − mji Ri can be programmed as
ajk = ajk − mji ∗ aik , 1 ≤ k ≤ n + 1
assuming we are working with the augmented matrix.
1. Write a complete Matlab code by modifying the script gauss elim.m and using the appropriate
position of the loop over k. Test your code by applying it to the examples solved above.
2. Write a vectorised version of code in gauss elim.m by reducing the number of loops.

2.1.3 LU decomposition
A problem frequently encountered when solving a system of linear equations Ax = b is the
need to obtain solutions for a variety of right-hand side vectors b while A remains fixed. Gauss
Elimination (GE) would involve solving Ax = b for each b; in other words, for each vector b in the
augmented matrix, GE would be performed with the same coefficient matrix A unchanged. This
results in a waste of time and computer resources. LU decomposition addresses the problem
by focusing on the coefficient matrix A only. As we shall see, this approach is closely related to GE.

Assume we can factor A into a product of lower triangular matrix L and upper triangular matrix
U i.e. A = LU . For simplicity and without loss of generality, for a 3 × 3 case
    
a11 a12 a13 l11 0 0 u11 u12 u13
 a21 a22 a23  =  l21 l22 0   0 u22 u23 
a31 a32 a33 l31 l32 l33 0 0 u33
then Ax = b implies LU x = b
If we let U x = y, the above system is equivalent to the following two systems of linear equations
Ly = b (2.1)
Ux = y (2.2)
We can solve (2.1) for y by forward substitution and then solve (2.2) for x by backward
substitution.
To obtain the elements of L and U, one approach consists of multiplying the matrices together
and equating the result to the corresponding elements of A. As L and U stand, there are n² + n unknowns in
total, but the system obtained to find their elements has only n² equations. We therefore need to
specify n additional conditions. The standard choice is to set all the diagonal elements of L, lii,
to 1, i.e.
   
1 0 0 u11 u12 u13
L =  l21 1 0  , U =  0 u22 u23 
l31 l32 1 0 0 u33
We find at the end of the above procedure that U is exactly the same as that obtained from the
GE without pivoting.
The other approach to perform the LU decomposition of A is to write A = IA, where I is the
identity matrix, and perform GE without pivoting to the second factor A while replacing the
corresponding entries of I below the main diagonal by the associated multipliers used in GE.
The following example serves to illustrate this latter approach.
Example 2.1.5

    A = IA = [ 1  0  0 ] [ 1   1   1 ]  : R1
             [ 0  1  0 ] [ 2   3   1 ]  : R2
             [ 0  0  1 ] [ 1  −1  −1 ]  : R3

             [ 1  0  0 ] [ 1   1   1 ]  : R1
             [ 2  1  0 ] [ 0   1  −1 ]  : R2 ← R2 − 2R1
             [ 1  0  1 ] [ 0  −2  −2 ]  : R3 ← R3 − R1

             [ 1  0  0 ] [ 1   1   1 ]  : R1
             [ 2  1  0 ] [ 0   1  −1 ]  : R2
             [ 1 −2  1 ] [ 0   0  −4 ]  : R3 ← R3 + 2R2

Thus:
          [ 1  0  0 ]            [ 1   1   1 ]
    L =   [ 2  1  0 ]   and U =  [ 0   1  −1 ] .
          [ 1 −2  1 ]            [ 0   0  −4 ]
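Once L and U are available, Ax = b is solved by two triangular solves. The short session below is our own illustration using the matrices of Example 2.1.5 and Matlab's backslash operator (which performs the forward and backward substitutions):

L = [1 0 0; 2 1 0; 1 -2 1];
U = [1 1 1; 0 1 -1; 0 0 -4];
b = [4; 9; -2];
y = L\b;      % forward substitution: solve L*y = b
x = U\y;      % backward substitution: solve U*x = y; x should be [1; 2; 1]

The same L and U can then be reused for any other right-hand side b without repeating the elimination.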

2.1.4 Tridiagonal systems


When solving numerically second-order differential equations resulting from the application of
Newton’s second law of motion, we arrive at a system of linear equations whose associated coefficient
matrix has a tridiagonal structure. In real-life applications, many problems can be converted or
solved by means of a tridiagonal system of linear equations. As an example to illustrate our
purpose, assume we want to solve the following one-dimensional steady-state heat equation

    u″(x) = f(x),   a < x < b,                                            (2.3)
    u(a) = 20,   u(b) = 30.

For simplicity we choose f (x) = 1, a = 0 and b = 1. We discretise the interval [a, b] as the set of
finite points xi, where xi = i∆x, i = 0, 1, . . . , N + 1, with ∆x = 1/(N + 1), N being the number
of interior grid points.

Figure 2.1: Discretisation of the interval [a, b] (nodes a = x0, x1, . . . , xN+1 = b).

Using the second-order central difference approximation of u″ at xi (see numerical differentiation),
we have

    u″(xi) = ( u(xi+1) − 2u(xi) + u(xi−1) ) / (∆x)² + O((∆x)²),

which can be simply written as

    u″(xi) ≈ ( Ui+1 − 2Ui + Ui−1 ) / h² ,   with h = ∆x.                   (2.4)

Substituting the approximation (2.4) into Eq. (2.3), we obtain

    Ui+1 − 2Ui + Ui−1 = h² fi = h² ,   i = 1, . . . , N.                   (2.5)

Writing Eq. (2.5) row by row, starting from i = 1 up to i = N and applying the boundary

conditions U0 = 20 and UN +1 = 30, we obtain the following system of linear equations


−2U1 + U2 = h2 − U0 = h2 − 20
U1 − 2U2 + U3 = h2
U2 − 2U3 + U4 = h2
U3 − 2U4 + U5 = h2
... ...
UN −4 − 2UN −3 + UN −2 = h2
UN −3 − 2UN −2 + UN −1 = h2
UN −2 − 2UN −1 + UN = h2
UN −1 − 2UN = h2 − UN +1 = h2 − 30.
Writing the above system in matrix form we obtain

    [ −2   1   0                      0 ] [ U1   ]   [ h² − 20 ]
    [  1  −2   1                        ] [ U2   ]   [ h²      ]
    [  0   1  −2   1                    ] [ U3   ]   [ h²      ]
    [       ⋱   ⋱   ⋱                   ] [  :   ] = [  :      ]
    [            1  −2   1              ] [ UN−2 ]   [ h²      ]
    [                 1  −2   1         ] [ UN−1 ]   [ h²      ]
    [  0                   1  −2        ] [ UN   ]   [ h² − 30 ]
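A minimal Matlab sketch of assembling and solving this tridiagonal system is given below; the choice N = 49 and the use of Matlab's sparse backslash (rather than a dedicated tridiagonal solver) are our own illustrative choices:

% Assemble and solve the tridiagonal system for f(x) = 1, a = 0, b = 1
N = 49;                                   % number of interior grid points (illustrative)
h = 1/(N+1);
A = spdiags([ones(N,1), -2*ones(N,1), ones(N,1)], -1:1, N, N);
r = h^2*ones(N,1);
r(1) = r(1) - 20;  r(N) = r(N) - 30;      % boundary values moved to the right-hand side
U = A\r;                                  % approximate values U_1,...,U_N at the interior nodes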

2.2 Iterative methods


We start with an initial estimate x[0] of the solution of the system Ax = b; by a numerical procedure we generate
a sequence of vectors {x[1], x[2], x[3], . . .}, each of which is a better approximation to the true
solution than the previous one. This is called iterative refinement.

To generate the new approximation we rearrange the system of equations in the form

    x[m+1] = C x[m] + d,   m = 0, 1, 2, . . .

C is called the iteration matrix; both C and d are independent of x.

The iterative refinement is stopped when two successive approximations are found to differ, in
some sense, by less than a given tolerance. We shall use a stopping criterion
    max_{1≤j≤n}  | xj[m] − xj[m−1] | / | xj[m] |  <  ε ,   m > 0.

Consider Ax = b, where A is non singular and the diagonal elements of A are non zero. Define

• L to be the strictly lower triangular part of A.

• U to be the strictly upper triangular part of A.

• D to be the diagonal part of A.

e.g.

    A = L + D + U:

    [ a11 a12 a13 ]   [  0   0  0 ]   [ a11  0   0  ]   [ 0  a12 a13 ]
    [ a21 a22 a23 ] = [ a21  0  0 ] + [  0  a22  0  ] + [ 0   0  a23 ]
    [ a31 a32 a33 ]   [ a31 a32 0 ]   [  0   0  a33 ]   [ 0   0   0  ]

2.2.1 Jacobi method


If we consider the 3 × 3 system

    a11 x1 + a12 x2 + a13 x3 = b1
    a21 x1 + a22 x2 + a23 x3 = b2
    a31 x1 + a32 x2 + a33 x3 = b3

solving each equation for the unknown on the main diagonal and inserting the iteration superscript m
gives

    x1[m+1] = (1/a11) ( b1 − a12 x2[m] − a13 x3[m] )
    x2[m+1] = (1/a22) ( b2 − a21 x1[m] − a23 x3[m] )
    x3[m+1] = (1/a33) ( b3 − a31 x1[m] − a32 x2[m] )

    m = 0, 1, 2, . . .

In matrix form this is

    x[m+1] = D⁻¹ ( b − (L + U) x[m] )

so

    CJ = −D⁻¹ (L + U),   d = D⁻¹ b.
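A minimal Matlab sketch of the Jacobi iteration is given below (the routine, its name and the stopping rule based on the relative criterion above are our own illustrative choices; it assumes the iterates have no zero components):

% jacobi_solve.m  -- illustrative sketch of the Jacobi iteration
function x = jacobi_solve(A,b,x,tol,maxit)
D = diag(diag(A));                 % diagonal part of A
R = A - D;                         % L + U (strictly lower + strictly upper parts)
for m = 1:maxit
    xnew = D\(b - R*x);            % x[m+1] = D^{-1}( b - (L+U) x[m] )
    if max(abs(xnew - x)./abs(xnew)) < tol, x = xnew; return; end
    x = xnew;
end
x = xnew;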

2.2.2 Gauss-Seidel method


This method uses the most recent estimates at each step in the hope of achieving faster convergence,
i.e. in the 3 × 3 system

    x1[m+1] = (1/a11) ( b1 − a12 x2[m]   − a13 x3[m] )
    x2[m+1] = (1/a22) ( b2 − a21 x1[m+1] − a23 x3[m] )
    x3[m+1] = (1/a33) ( b3 − a31 x1[m+1] − a32 x2[m+1] )

    m = 0, 1, 2, . . .

If we rewrite this system,

    a11 x1[m+1]                               = b1 − a12 x2[m] − a13 x3[m]
    a21 x1[m+1] + a22 x2[m+1]                 = b2 − a23 x3[m]
    a31 x1[m+1] + a32 x2[m+1] + a33 x3[m+1]   = b3

    m = 0, 1, 2, . . .

In matrix form, this is

    (D + L) x[m+1] = b − U x[m],
    x[m+1] = (D + L)⁻¹ ( b − U x[m] ).

Therefore

    CGS = −(D + L)⁻¹ U,   d = (D + L)⁻¹ b.
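For comparison with the Jacobi sketch above, here is an equally minimal Gauss-Seidel sketch (again an illustrative routine of our own; the point is that each sweep reuses the components already updated in that sweep):

% gauss_seidel_solve.m  -- illustrative sketch of the Gauss-Seidel iteration
function x = gauss_seidel_solve(A,b,x,tol,maxit)
n = length(b);
for m = 1:maxit
    xold = x;
    for i = 1:n
        % x(1:i-1) already hold the new values from this sweep
        x(i) = (b(i) - A(i,1:i-1)*x(1:i-1) - A(i,i+1:n)*x(i+1:n))/A(i,i);
    end
    if max(abs(x - xold)./abs(x)) < tol, return; end
end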

2.2.3 Convergence of Jacobi and Gauss-Seidel methods


For both methods, the exact solution satisfies
    x = Cx + d.

If we subtract

    x[m+1] = C x[m] + d

from the above and set e[m] = x − x[m], then

    e[m+1] = C e[m],
    ||e[m+1]|| = ||C e[m]|| ≤ ||C|| ||e[m]||,

using ||AB|| ≤ ||A|| ||B||. This implies

    ||e[m]|| ≤ (||C||)^m ||e[0]||.

Therefore we have the following result.

Theorem 2.2.1 A necessary and sufficient condition for convergence is that

    ρ(C) = max_i |λi(C)| < 1.

ρ(C) is called the spectral radius of the matrix C.

Theorem 2.2.2 A sufficient condition for convergence is that the coefficient matrix is diagonally
dominant, i.e.

    |aii| > Σ_{j≠i} |aij| ,   ∀ i.

Example 2.2.3  
1 3 −5
A = 1 4 1
4 −1 2
We have

i=1: |1| ≯ |3| + | − 5| = 8


i=2: |4| > |1| + |1| = 2
i=3: |2| ≯ |4| + | − 1| = 5
It is clear that A is not diagonally dominant. But if we construct the following matrix by swapping
the first and third rows of A, we have
 
4 −1 2
B = 1 4 1
1 3 −5

This time

i=1: |4| > | − 1| + |2| = 3


i=2: |4| > |1| + |1| =2
i=3: | − 5| > |1| + |3| =4
Since all the inequalities are satisfied, the matrix B is diagonally dominant.

2.2.4 Successive over relaxation (SOR)


If we rewrite the Gauss-Seidel equations as

    x1[m+1] = x1[m] + (1/a11) ( b1 − a12 x2[m]   − a13 x3[m]   − a11 x1[m] )
    x2[m+1] = x2[m] + (1/a22) ( b2 − a21 x1[m+1] − a23 x3[m]   − a22 x2[m] )
    x3[m+1] = x3[m] + (1/a33) ( b3 − a31 x1[m+1] − a32 x2[m+1] − a33 x3[m] )

and multiply the "correction" term by ω, called the relaxation parameter, we obtain

    x1[m+1] = x1[m] + (ω/a11) ( b1 − a12 x2[m]   − a13 x3[m]   − a11 x1[m] )
    x2[m+1] = x2[m] + (ω/a22) ( b2 − a21 x1[m+1] − a23 x3[m]   − a22 x2[m] )
    x3[m+1] = x3[m] + (ω/a33) ( b3 − a31 x1[m+1] − a32 x2[m+1] − a33 x3[m] )

where ω ∈ (0, 2). It can be shown that the iteration diverges for ω ∉ (0, 2).

• If ω = 1, this gives the Gauss-Seidel method.

• If ω > 1, we take larger than normal correction. This is useful if the Gauss-Seidel iterates
converge monotonically at a slow rate.

• If ω < 1, we take a smaller than normal correction. This is useful if the Gauss-Seidel iterates
are oscillatory.
We may rearrange these equations as

    a11 x1[m+1]                                  = ω b1 + (1 − ω) a11 x1[m] − ω a12 x2[m] − ω a13 x3[m]
    ω a21 x1[m+1] + a22 x2[m+1]                  = ω b2 + (1 − ω) a22 x2[m] − ω a23 x3[m]
    ω a31 x1[m+1] + ω a32 x2[m+1] + a33 x3[m+1]  = ω b3 + (1 − ω) a33 x3[m]

so in matrix form we have

    (D + ωL) x[m+1] = ω b + [ (1 − ω) D − ω U ] x[m]
    x[m+1] = (D + ωL)⁻¹ ( ω b + [ (1 − ω) D − ω U ] x[m] ).

Therefore

    Cω = (D + ωL)⁻¹ [ (1 − ω) D − ω U ],   d = (D + ωL)⁻¹ ω b.

To obtain an optimum value of ω, it can be shown (for a suitable class of matrices) that if µ is the
spectral radius of the Jacobi iteration matrix CJ, then

    ωopt = 2 / ( 1 + √(1 − µ²) ).
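The following minimal sketch shows how the relaxed correction enters the sweep; the routine and its interface are our own illustration (with ω = 1 it reduces to Gauss-Seidel):

% sor_solve.m  -- illustrative sketch of SOR; omega = 1 reproduces Gauss-Seidel
function x = sor_solve(A,b,x,omega,tol,maxit)
n = length(b);
for m = 1:maxit
    xold = x;
    for i = 1:n
        gs = (b(i) - A(i,1:i-1)*x(1:i-1) - A(i,i+1:n)*x(i+1:n))/A(i,i);  % Gauss-Seidel value
        x(i) = x(i) + omega*(gs - x(i));                                 % relaxed correction
    end
    if max(abs(x - xold)./abs(x)) < tol, return; end
end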

2.2.5 Residual vector and condition number


When we compute the solution of the system Ax = b, where A is non-singular, we obtain, as
a result of round-off error or from an iterative scheme, only an approximate solution x∗ . The
residual vector is defined as
r = b − Ax∗ .

It would be nice if a small norm for the residual always implied that x∗ was close to the true
solution, i.e. that ||x − x∗|| was small. Unfortunately, this is not always the case, e.g.,

    [ 1      1 ] [ x1 ]   [ 2     ]
    [ 1.001  1 ] [ x2 ] = [ 2.001 ]

which has the solution x = [1 1]^T. Now x∗ = [2 0]^T is a very poor approximation to x. The
residual in this case is

    r = [ 2     ] − [ 1      1 ] [ 2 ] = [  0     ]
        [ 2.001 ]   [ 1.001  1 ] [ 0 ]   [ −0.001 ]

which has a very small norm. A system like this is said to be ill-conditioned.

Theorem 2.2.4 Suppose x∗ is an approximation to the solution of Ax = b, A non-singular, and
r = b − Ax∗ is the residual associated with x∗. Then for any natural norm

    ||x − x∗|| ≤ ||A⁻¹|| ||r||

and

    ||x − x∗|| / ||x|| ≤ ||A|| ||A⁻¹|| ||r|| / ||b||,   b, x ≠ 0.

Proof 2.2.5 Since r = b − Ax∗ = Ax − Ax∗ and since A is non-singular,

    x − x∗ = A⁻¹ r,

therefore

    ||x − x∗|| = ||A⁻¹ r|| ≤ ||A⁻¹|| ||r||.                               (2.6)

Moreover, since b = Ax we have ||b|| ≤ ||A|| ||x||, and since b, x ≠ 0,

    1 / ||x|| ≤ ||A|| / ||b||,                                            (2.7)

and so combining (2.6) and (2.7) we have

    ||x − x∗|| / ||x|| ≤ ||A|| ||A⁻¹|| ||r|| / ||b||.

Definition 2.2.6 The condition number of a non-singular matrix A relative to the norm || · ||
is

    K(A) = ||A|| ||A⁻¹||.

This implies

    ||x − x∗|| / ||x|| ≤ K(A) ||r|| / ||b||.

Note that

    K(A) = ||A|| ||A⁻¹|| ≥ ||A A⁻¹|| = ||In|| = 1.

When K ≫ 1, we say that A is ill-conditioned. When K ≈ 1, we say that A is well-conditioned.

Since it is costly from a computational point of view to find the inverse of a matrix, and since
inverting A is more prone to inaccuracy the larger K(A) is, methods for approximating K(A) have been
developed.
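The ill-conditioned example above can be reproduced with Matlab's built-in functions; the short session below is our own illustration:

A  = [1 1; 1.001 1];
b  = [2; 2.001];
xs = [2; 0];                 % the poor approximation x*
r  = b - A*xs;               % residual: [0; -0.001], small despite the large error in x*
K  = cond(A)                 % condition number ||A|| ||A^{-1}|| in the 2-norm, of order 10^3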
Chapter 3

Roots of non-linear equations

Non-linear equations occur in many real-world problems and are rarely solvable analytically.

All numerical methods for finding roots or zeros of these equations are iterative in nature. Given
initial information about a root, we refine the information producing a sequence of successively
more accurate estimates.

3.0.1 Scalar non-linear equation in one unknown


Consider the following equations:

x3 − x2 − 4x − 6 = 0 (Three roots)
x sin x = 1 (Infinitely many roots)
x = e−x (One root)
x = ex (No roots)

All non-linear equations can be written in the form

f (x) = 0.

We shall examine two types of iterative methods for these equations

• Bracketing Methods: These methods require as initial information an interval containing


a root of the equation. They then produce a smaller interval still containing the root. This
is repeated until the interval is sufficiently small.

• Fixed-Point Methods: These are analogous to the iterative methods met in the previous
section. We start with an approximation to the root and produce a sequence of approxima-
tions, each closer to the root than its predecessor.

To obtain these initial intervals or approximations, graphical methods are usually used.


The fixed-point method


The main goal of this method is to transform the equation f (x) = 0 to x = g(x) which is assumed
to be ”much” simpler to study and solve and where g has some interesting properties we shall
address below. It is also assumed that the root of the equation has been ”isolated” in a small
interval by means of the graphical method and interval division or bracketing method. As an
example, let’s consider the following equation
    f(x) = −(1/2)x² + 3x − 4 = 0.

This quadratic equation can be readily solved and we find that it has two roots α1 = 2 and α2 = 4.
(i) The simplest fixed-point form can be obtained by adding x to both sides of the equation, yielding

    x = g1(x) = −(1/2)x² + 4x − 4.
In this case, the sequence xk = g(xk−1 ) will not converge to 2 for any choice of the starting point
x0 ∈ (1.5, 2.5) (except 2).
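A minimal Matlab sketch of the fixed-point iteration xk = g(xk−1) used to produce the tables below is given here (the routine, its stopping rule and the iteration cap are our own illustrative choices):

% fixed_point.m  -- illustrative sketch of the iteration x_k = g(x_{k-1})
function [x,k] = fixed_point(g,x,tol,maxit)
for k = 1:maxit
    xnew = g(x);
    if abs(xnew - x) < tol*abs(xnew), x = xnew; return; end   % relative stopping test
    x = xnew;
end
x = xnew;

For instance, fixed_point(@(x) sqrt(6*x - 8), 3.51, 1e-4, 100) converges towards the root 4, whereas the same call with g1(x) = -0.5*x^2 + 4*x - 4 and a starting point near 2 diverges.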

    k      xk                 Relative error (in %)
    0      1.9999             0.0050
    1      1.99979            0.010000250
    4      1.9983994          0.0800030007
    8      1.974236107        1.288194654
    16     −48.977392359      2548.869617959
    ...    ...                ...
    20     −1.174 × 10²³      5.874 × 10²⁴
    24     −Inf               Inf

Table 3.1: The sequence xk diverges even though x0 is close to the root α1 = 2.

(ii) Yet another form can be obtained by isolating x² in the equation and applying the square
root on both sides of the resulting equation, giving

    x = √(6x − 8) = g2(x).

The sequence xk = g(xk−1 ) will converge to 4 for any choice of the starting point x0 ∈ (3.5, 4.5)
(except 4). See Table 3.2 for illustration.

    k      xk                 Relative error (in %)
    0      3.51               12.25
    1      3.613862200        9.653445002
    4      3.821691345        4.457716379
    8      3.940842578        1.478935554
    16     3.993956504        0.151087388
    ...    ...                ...
    20     3.99884832         0.047879202
    26     3.999658940        0.008526504

Table 3.2: The sequence xk converges after 26 iterations (tol = 10⁻⁴).

The bisection method

If we have an interval [a, b], of length b − a, in which f(a)f(b) < 0, then we know that a root of
the equation lies in the interval [a, b].

If we calculate the mid point of [a, b],


    c = (a + b)/2,
then

• If f(a)f(c) < 0 then f(a) and f(c) have opposite signs and so the root must lie in [a, c].

• If f (a)f (c) > 0 then f (a) and f (c) have the same sign and so f (b) and f (c) must have
opposite signs, so the root must lie in the interval [c, b].
So with one step we have halved the interval in which we know the root lies. If we use a
stopping criterion
    |bs − as| < ε,

we have

    |b1 − a1| = |b − a|
    |b2 − a2| = (1/2)|b1 − a1|
    ...
    |bs − as| = (1/2)|bs−1 − as−1|
              = (1/2²)|bs−2 − as−2|
              = (1/2^(s−1))|b1 − a1|.

We require |bs − as| < ε, or |bs − as| ≈ ε, which implies

    (1/2^(s−1)) |b1 − a1| ≈ ε.

Thus 2^s = 2|b1 − a1|/ε, and finally

    s = log( 2|b1 − a1|/ε ) / log 2.
Application
Find the root of f(x) = x − cos(x), x ∈ [0, 1], using the bisection method. (Solved in class)
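A minimal Matlab sketch of the bisection method applied to this function is given below (the routine, its name and the tolerance are our own illustrative choices):

% bisect.m  -- illustrative sketch of the bisection method
function c = bisect(f,a,b,eps)
while abs(b - a) >= eps
    c = (a + b)/2;
    if f(a)*f(c) < 0, b = c; else a = c; end   % keep the half-interval containing the root
end
c = (a + b)/2;

For example, bisect(@(x) x - cos(x), 0, 1, 1e-4) returns approximately 0.7391.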

3.0.2 The false position method


The bisection method is attractive because of its simplicity and guaranteed convergence. Its
disadvantage is that it is, in general, extremely slow.
In the bisection method we calculate f (a), f (b) and then a succession of f (c) values, but we only
use the signs of these terms. In the false position method, we make use of their values as well. We
choose c to be the point where the straight line joining (a, f (a)) and (b, f (b)) cuts the x-axis.
The equation of the straight line through (a, f(a)) and (b, f(b)) is

    y = f(a) + ( (x − a)/(b − a) ) ( f(b) − f(a) );

we require the point c where y = 0,

    0 = f(a) + ( (c − a)/(b − a) ) ( f(b) − f(a) ),

which gives

    c = ( a f(b) − b f(a) ) / ( f(b) − f(a) ).

Note that the false position method often approaches the root from one side only, so we require
a different stopping criterion to that of the bisection method. We actually choose

|c − c∗ | < ε

where c∗ is the value of c calculated on the previous step.



Application Find the root of f(x) = x − cos(x), x ∈ [0, 1], using the false position method with ε = 10⁻⁴.
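A minimal Matlab sketch of the false position method, using the stopping criterion |c − c∗| < ε described above, is given here (the routine and its interface are our own illustrative choices):

% false_pos.m  -- illustrative sketch of the false position method
function c = false_pos(f,a,b,eps)
c_old = a;
while true
    c = (a*f(b) - b*f(a))/(f(b) - f(a));     % intersection of the chord with the x-axis
    if abs(c - c_old) < eps, return; end
    if f(a)*f(c) < 0, b = c; else a = c; end
    c_old = c;
end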

3.0.3 Newton’s method


In this method, we start with an initial approximation (or guess) to the root and use this to generate
a new approximation. It works by taking as the new approximation the point of intersection of
the tangent to the curve y = f (x) at xi with the x-axis.
Newton’s method can be derived in several ways; we choose to do so via Taylor’s series.
Let xi+1 = xi + h; then, by Taylor’s series,

    f(xi+1) = f(xi) + h f′(xi) + (h²/2!) f″(xi) + . . .

If we truncate after 2 terms,

    f(xi+1) ≈ f(xi) + h f′(xi),

with truncation error O(h²). Ideally, f(xi+1) = 0, so that solving for h gives

    h = − f(xi)/f′(xi),   with f′(xi) ≠ 0.

Therefore

    xi+1 = xi + h = xi − f(xi)/f′(xi),   i = 0, 1, . . .

When it converges, Newton’s method is very fast, but it requires a reasonable initial guess and a
derivative f′(xi) that is not zero (or very small) at the iterates.

Application Find the root of f(x) = x − cos(x), x ∈ [0, 1], using Newton’s method with ε = 10⁻⁴.
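The following minimal Matlab sketch applies Newton's iteration to this function (the routine is our own illustration; the derivative is supplied as a second function handle):

% newton_root.m  -- illustrative sketch of Newton's method
function x = newton_root(f,df,x,eps,maxit)
for i = 1:maxit
    h = -f(x)/df(x);            % Newton correction
    x = x + h;
    if abs(h) < eps, return; end
end

For example, newton_root(@(x) x - cos(x), @(x) 1 + sin(x), 0.5, 1e-4, 20) converges to about 0.7391 in a handful of iterations.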

3.1 System of non-linear equations


3.1.1 Newton’s method
Two non-linear equations in two unknowns x1 and x2 can be written as
f1 (x1 , x2 ) = 0,
f2 (x1 , x2 ) = 0.
Newton’s method can be extended to this problem by the Taylor series approach. Let

    ∂f1/∂x1 = f1x1 ,   ∂f2/∂x1 = f2x1 ,   ∂f1/∂x2 = f1x2 ,   ∂f2/∂x2 = f2x2 .

We wish to determine

    x[i+1] = [ x1[i+1] , x2[i+1] ]^T = [ x1[i] + h1[i] , x2[i] + h2[i] ]^T ,

i.e.

    x1[i+1] = x1[i] + h1[i]
    x2[i+1] = x2[i] + h2[i] .
By Taylor’s theorem for functions of two variables,

    f1(x1 + h1, x2 + h2) = f1(x1, x2) + h1 f1x1 + h2 f1x2 + O(h1², h2², h1h2, . . .),
    f2(x1 + h1, x2 + h2) = f2(x1, x2) + h1 f2x1 + h2 f2x2 + O(h1², h2², h1h2, . . .).

Assuming that |h1| and |h2| are both small and that h1 and h2 make x1 + h1 and x2 + h2 exact, i.e.
f1(·) = f2(·) = 0, the values of h1 and h2 are approximated by the solutions of the following 2 × 2
linear system

    f1x1 h1 + f1x2 h2 = −f1(x1, x2)
    f2x1 h1 + f2x2 h2 = −f2(x1, x2)

i.e.

    [ f1x1  f1x2 ] [ h1 ] = − [ f1(x1, x2) ]
    [ f2x1  f2x2 ] [ h2 ]     [ f2(x1, x2) ]

or

    J(x) h = −f(x).                                                        (3.1)
The matrix of partial derivatives, J, is called the Jacobian. The elements of J and the right-hand
side are known and so Eq. (3.1) is a linear system and can be solved by the methods of the previous
chapter.
Thus we have
h[i] = −J −1 (x[i] )f (x[i] ),
hence
x[i+1] = x[i] + h[i] = x[i] − J −1 (x[i] )f (x[i] ).
This method is easily extended to a system of n equations in n unknowns with the Jacobian given
by the n × n matrix

    J = [ ∂f1/∂x1  ···  ∂f1/∂xn ]
        [    :      ⋱      :    ]
        [ ∂fn/∂x1  ···  ∂fn/∂xn ]

Application (in class): we consider a specific system of non-linear equations and solve it with the
method above.
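The following minimal Matlab sketch implements the vector iteration x[i+1] = x[i] − J⁻¹(x[i]) f(x[i]); the routine and the small test system are our own illustration and not the example treated in class:

% newton_sys.m  -- illustrative sketch of Newton's method for a system f(x) = 0
function x = newton_sys(f,J,x,tol,maxit)
for i = 1:maxit
    h = -J(x)\f(x);              % solve J(x) h = -f(x) by Gaussian elimination (backslash)
    x = x + h;
    if norm(h) < tol, return; end
end

For instance, with f = @(x) [x(1)^2 + x(2)^2 - 4; x(1)*x(2) - 1] and J = @(x) [2*x(1) 2*x(2); x(2) x(1)], the call newton_sys(f, J, [2; 0.5], 1e-10, 20) converges to a solution of x1² + x2² = 4, x1 x2 = 1.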


Chapter 4

Interpolation (part I)

4.1 Theory
4.1.1 Polynomial interpolation [?]
Suppose that the values f1, f2, . . . , fn of a scalar-argument function f : R −→ Rⁿ are known at
the points x1 < x2 < . . . < xn. We want to approximate this function on the interval [x1, xn]. One
of the standard ways to do that is to use the polynomial interpolation technique. Here, we explain
how to use the Newton interpolation formula to reach our goal. It is well known from the theory
that the Newton interpolating polynomial fitted to the above-mentioned information is written in the
following form:

    Pn−1(x) = Σ_{i=1}^{n} f[x1, . . . , xi] Π_{j=1}^{i−1} (x − xj),

where f[x1, . . . , xi], i = 1, 2, . . . , n, denote divided differences. The zeroth divided differences are
defined initially by the equalities f[xi] = fi, i = 1, 2, . . . , n. Then, the k-th differences are calculated
recursively by the formula

    f[xi, xi+1, . . . , xi+k−1, xi+k] = ( f[xi+1, xi+2, . . . , xi+k] − f[xi, xi+1, . . . , xi+k−1] ) / ( xi+k − xi ).

Thus, to construct any Newton interpolating polynomial we perform two steps. In the first
one, we compute the divided differences. In the second step, we evaluate the value of the
polynomial Pn−1(x) at any fixed point x. Let us consider the first step (i.e. calculation of the
divided differences). The above definition gives a way to calculate such differences. Usually, it is
done in the form of the divided difference table

    f[x1]       f[x2]        ···    f[xn]
        f[x1, x2]    ···    f[xn−1, xn]
            ⋱                  ⋮
               f[x1, . . . , xn]


On the other hand, the formula for the polynomial Pn−1 (x) shows that, when evaluating Pn−1 (x),
we use only the divided differences of the following form: f [x1 , x2 , . . . , xi ], i = 1, 2, . . . , n (i.e the
differences which lie on the main diagonal of the table). Taking into account this observation,
we organize our program as follows: In the first step, we store the zeroth divided differences in a
Matlab array. In the second step, we leave the first entry unchanged and replace the remaining
entries by the first divided differences f[xj−1, xj], calculated by the definition given above. Generally,
the i-th step of this algorithm keeps all the entries with numbers from 1 to i unchanged and
recalculates the array starting with the (i + 1)-th entry according to the definition of the i-th
divided differences. Finally, after n steps of the algorithm our array contains the main diagonal of
the divided difference table. Now we write a program. It takes an array of the points xi,
i = 1, 2, . . . , n (input parameter x), and an array of the function values (input parameter f), and
returns the array of divided differences, as output. Our program supports vector-valued functions f.
In order to handle this general situation correctly, we suppose that the input parameter f contains
the values of the vector-function f in column-wise form (i.e. the values of the vector-function f
evaluated at the point xi are stored in the i-th column of the matrix f). Here is the code:

% div_diff.m
function diff = div_diff(x,f) % 1
% 2
diff = f; % 3
% 4
for i = 2:length(x) % 5
% 6
tmp1 = diff(:,i-1); % 7
% 8
for j = i:length(x) % 9
tmp2 = diff(:,j); %10
diff(:,j) = (tmp2 - tmp1)/... %11
(x(j)-x(j-i+1)); %12
tmp1 = tmp2; %13
end; %14
end; %15

In the third line of the code, we put the values of the tabulated f data into the output array
diff. Lines 5-15 contain the main loop. In each iteration step of the loop, we calculate a subrow
of the divided difference table and store it to the corresponding entries of the output array diff, as
explained above (see lines 7-14).
Having computed all the divided differences, we easily evaluate the polynomial Pn−1 (x) at any
fixed point x. To do that more efficiently, we rewrite the formula for the polynomial Pn−1 (x) in
the nested form:

Pn−1 (x) = ((. . . (f [x1 , x2 , . . . , xn ](x − xn−1 ) + f [x1 , x2 , . . . , xn−1 ])(x − xn−2 ) + · · ·
+ f [x1 , x2 , x3 ])(x − x2 ) + f [x1 , x2 ])(x − x1 ) + f [x1 ].

This more advanced formula is used in practical computations.


Let us write a program. It takes an array of the points xi, i = 1, 2, . . . , n (input parameter x), an
array of the function values fi, i = 1, 2, . . . , n (input parameter f), and an array of the points to
evaluate the interpolating polynomial (input parameter xx). The function returns the array ff of
the polynomial values calculated at the points xx, as output. Here is the code:

% newt_interp.m
function ff = newt_interp(x,f,xx); % 1
% 2
ff = zeros(size(f,1),length(xx)); % 3
% 4
diff = div_diff(x,f); % 5
% 6
for i = 1:length(xx) % 7
% 8
ff(:,i) = diff(:,end); % 9
%10
for j = length(x):-1:2; %11
ff(:,i) = ff(:,i)*(xx(:,i) - x(j-1)) ... %12
+ diff(:,j-1); %13
end; %14
end; %15

The third line of the code puts zeros in the output array ff. Line 5 calculates the divided differences
by using the auxiliary function div diff.
Lines 7-15 execute the main loop. The i − th loop iteration evaluates the value of the polynomial
Pn−1 (x) at the i − th entry of the input array xx by means of the nested scheme (see lines 9-14).
You can test the function newt interp in the command Matlab window as follows:

>> x = 0:2:8;
>> f = sin(x);
>> xx = 0:0.1:8;
>> ff = sin(xx);
>> pp = newt_interp(x,f,xx);
>> plot(x,f,'o', xx,ff, xx,pp);

This is an example of approximation of the function sin(x) on the interval [0, 8] by means of the
Newton interpolation formula. The result of the approximation is plotted in the form of the graphs
of the functions sin(x) and Pn−1(x). Compare these graphs.

Exercise 1. In the code of the function div diff, we use two auxiliary variables tmp1 and tmp2 (see
lines 7-13). Explain the purpose and necessity of these variables.
Exercise 2. In the implementation of the nested scheme, we access entries of the array diff in
reverse order (see lines 9-14 in the code of the function newt interp). Is it possible to go through
this array in the forward direction?

4.2 Practice
PROBLEM ONE
Write a program that, for any given values f1 , f2 , . . . , fn of a function f calculated at the points
x1 < x2 < . . . < xn , evaluates the values of the Lagrange interpolating polynomial Pn−1 (x) at any
arbitrary point x.
The program takes an array of the points xi, i = 1, 2, . . . , n (input parameter x), an array of
function values fi , i = 1, 2, . . . , n, (input parameter f) and an array of the points to evaluate the
interpolating polynomial (input parameter xx). It returns the array ff of the Lagrange polynomial
values calculated at the points xx, as output.

PROBLEM TWO
Write a program that, for any given values f1 , f2 , . . . , fn of a function f calculated at the points
x1 < x2 < . . . < xn , evaluates the values of the interpolating polynomial Pn−1 (x), constructed by
the method of unknown coefficients, at any arbitrary point x.
The program takes an array of the points xi, i = 1, 2, . . . , n (input parameter x), an array of
function values fi , i = 1, 2, . . . , n, (input parameter f) and an array of the points to evaluate the
interpolating polynomial (input parameter xx). It returns the array ff of the polynomial values
calculated at the points xx, as output.
Note that you must use your own version of the functions lu decomp and lu solve in order to treat
the linear system arising in the method of unknown coefficients.

PROBLEM THREE
In the theory section, it was explained how the value of the Newton interpolating polynomial at any given
point x is calculated by means of forward divided differences. Modify the code of the functions
div diff and newt interp in such a way that they will implement Newton interpolating polynomials
with backward divided differences (instead of the forward ones).
Chapter 5

Interpolation (part II)

5.1 Theory
5.1.1 Polynomial interpolation of uniformly distributed data
In the previous chapter, we explained how to construct the Newton interpolating polynomial by
using tabulated data fi , i = 1, 2, . . . , n, evaluated at not equally spaced points xi , i = 1, 2, . . . , n.
Here, we consider the situation when the nodes xi are distributed uniformly with some step size
h (i.e., we know that xi = x1 + (i − 1)h, i = 1, 2, . . . , n, for the i-th point of the grid). In this case, the
interpolating polynomial Pn−1(x) can be written in the Newton-Gregory form:

    Pn−1(x1 + sh) = Σ_{i=1}^{n} C(s, i−1) ∆^{i−1} f1 .

Here, we have used the binomial coefficient notation

    C(s, i−1) = s(s − 1) · · · (s − i + 2) / (i − 1)!

and the forward differences ∆^{i−1} f1, i = 1, 2, . . . , n. They are defined as follows: for the zeroth
forward differences, we define ∆⁰fi = fi, i = 1, 2, . . . , n. Then, the k-th differences are calculated
recursively by means of the formula

    ∆^k fi = ∆^{k−1} fi+1 − ∆^{k−1} fi .
As before, we split the evaluation of any Newton-Gregory interpolating polynomial into two stages. In
the first one, we calculate the forward differences. In the second stage, we evaluate the value of
the polynomial Pn−1(x) at a point x by the mathematical formula above.
Let us consider the first stage (i.e. calculation of the forward differences). As you can see, the
formula for the forward differences ∆^k fi implies similar computations to those for the forward
divided differences, but without division by the quantity xi+k − xi. Thus, we can easily
adapt the code of the function div diff to our purposes by eliminating all the divisions.
Let us write a program. It takes an array of the values fi, i = 1, 2, . . . , n (input parameter f) and
returns the array of the forward differences of the form ∆^{i−1} f1, i = 1, 2, . . . , n, as output. Here
is the code:


% fw_diff.m
function diff = fw_diff(f) % 1
% 2
diff = f; % 3
% 4
for i = 2:size(f,2) % 5
tmp1 = diff(:,i-1); % 6
for j = i:size(f,2) % 7
tmp2 = diff(:,j); % 8
diff(:,j) = tmp2 - tmp1; % 9
tmp1 = tmp2; %10
end; %11
end; %12

The function fw diff uses the same idea to compute the forward differences as the code
of the function div diff implementing the forward divided differences calculation.
Having computed the forward differences, we easily evaluate the polynomial Pn−1(x) at an arbi-
trary point x. Our program takes an initial point (input parameter x 1), a grid step size (input
parameter h), an array of the function values fi, i = 1, 2, . . . , n (input parameter f) and an array
of the points to evaluate the interpolating polynomial (input parameter xx). The function returns
the array of the polynomial values ff calculated at the points xx, as output. Here is the code:

% ng_interp.m
function ff = ng_interp(x_1,h,f,xx) % 1
% 2
ff = zeros(size(f,1),length(xx)); % 3
% 4
diff = fw_diff(f); % 5
% 6
for i = 1:length(xx) % 7
% 8
s = (xx(i) - x_1)/h; % 9
bin_coeff = 1; %10
%11
ff(:,i) = diff(:,1); %12
%13
for j = 2:size(f,2); %14
bin_coeff = bin_coeff*(s-j+2)/(j-1); %15
ff(:,i) = ff(:,i)+diff(:,j)*bin_coeff; %16
end; %17
end; %18

The third line of the code sets the output parameter to zero. Line 5 calculates the forward
differences by means of the auxiliary function fw diff.
Lines 7-18 execute the main loop. Line 9 calculates the auxiliary variable s for the current point
xx(i). We stress that the value of the initial point x1 is stored in the variable x 1. Then, line 10
sets the initial value of the binomial coefficient bin coeff. Line 12 puts the starting value of the
polynomial Pn−1 (x) in the i-th column of the output parameter ff. Lines 14-17 calculate the value
of the polynomial Pn−1 (x) at the i-th entry of the input array xx.
You can test the function ng interp in the Matlab command window as follows:

>> x = 0:2:8;
>> f = sin(x);
>> xx = 0:0.1:8;
>> ff = sin(xx);
>> pp = ng_interp(0,2,f,xx);
>> plot(x,f,'o', xx,ff, xx,pp);

This is an example of approximation of the function sin(x) on the interval [0, 8] by means of the
Newton-Gregory interpolation formula. The result of the approximation is plotted in the form of
the graphs of the functions sin(x) and Pn−1 (x). Compare these graphs.

Exercise 1. Explain why we do not recalculate the auxiliary variable bin coeff in every iteration
step of the nested loop (see line 15 of the code of the function ng interp) completely, but only
update it.

5.1.2 Cubic spline interpolation


Suppose that the values f1 , f2 , . . . , fn of a scalar argument function f : R −→ Rn are known at
the points x1 < x2 < . . . < xn . We intend to approximate this function over the interval [x1 , xn ].
In the previous section, we explained how to solve the above-mentioned problem by means of the
polynomial interpolation technique. Here, we deal with the piecewise polynomial interpolation
based on cubic splines.
We do not use a single high-degree interpolating polynomial Pn−1 that approximates the function f on the
whole interval [x1 , xn ]. The principal idea is to fit an interpolating polynomial to the function f (x)
on every particular small subinterval [xi , xi+1 ] (of length hi ), i = 1, 2, . . . , n − 1, separately. To do
that, we apply the third degree polynomials written in the form
Si (x) = ai + bi (x − xi ) + ci (x − xi )2 + di (x − xi )3 , i = 1, 2, . . . , n − 1.
To approximate the tabulated function f (x) correctly, the polynomials Si (x) must satisfy the
following conditions:
    Si(xi) = fi ,  Si(xi+1) = fi+1 ,          i = 1, 2, . . . , n − 1,
    S′i(xi+1) = S′i+1(xi+1) ,                 i = 1, 2, . . . , n − 2,
    S″i(xi+1) = S″i+1(xi+1) ,                 i = 1, 2, . . . , n − 2.

In addition, the polynomials S1(x) and Sn−1(x) must satisfy one of the boundary conditions:

    S″1(x1) = S″n−1(xn) = 0                           (Free Boundary Conditions);
    S′1(x1) = f′(x1),  S′n−1(xn) = f′(xn)             (Clamped Boundary Conditions).

Below, we consider the Clamped Boundary Conditions.


It follows from the above formulas that the equalities ai = fi , i = 1, 2, . . . , n, hold for the coeffi-
cients ai . The coefficients ci , i = 1, 2, . . . , n, can be found as a solution of the linear system Ac = r
where the coefficient matrix A has the following tridiagonal form:
 
    A = [ 2h1    h1                                        ]
        [ h1   2(h1+h2)   h2                               ]
        [        h2     2(h2+h3)   h3                      ]
        [               ⋱         ⋱        ⋱               ]
        [                   hn−2  2(hn−2+hn−1)   hn−1      ]
        [                              hn−1     2hn−1      ]

and the right-hand side vector r is determined by the expression


    r = [ (3/h1)(a2 − a1) − 3f′(x1)                       ]
        [ (3/h2)(a3 − a2) − (3/h1)(a2 − a1)               ]
        [                  ⋮                              ]
        [ (3/hn−1)(an − an−1) − (3/hn−2)(an−1 − an−2)     ]
        [ 3f′(xn) − (3/hn−1)(an − an−1)                   ]
Having found ci , i = 1, 2, . . . , n, we calculate the remaining coefficients bi and di by means of the
following formulas:
    bi = (1/hi)(ai+1 − ai) − (hi/3)(2ci + ci+1) ,   di = (1/(3hi))(ci+1 − ci) ,   i = 1, 2, . . . , n − 1.
First of all, let us write a program that, for any given array of points xi, i = 1, 2, . . . , n (input parameter x), an array
of values fi, i = 1, 2, . . . , n (input parameter f), and an array f1 of the first derivatives of the
function f evaluated at the boundary points x1 and xn (input parameter f1), returns the arrays of
the cubic spline coefficients ai , bi , ci and di , i = 1, 2, . . . , n − 1, (output parameters a, b, c and d).
Here is the code:

% spline_coeff.m
function [a,b,c,d] = spline_coeff(x,f,f1) % 1
% 2
h = x(2:end) - x(1:end-1); % 3
% 4
a = f; % 5
% 6
A = diag([2*h(1), 2*(h(1:end-1)+h(2:end)), 2*h(end)]) + ... % 7
diag(h,-1) + diag(h,1); % 8
% 9
r = [ 3*(a(2)-a(1))/h(1) - 3*f1(1),... %10
3*(a(3:end) - a(2:end-1))./h(2:end) - ... %11
3*(a(2:end-1) - a(1:end-2))./h(1:end-1),... %12
3*f1(2) - 3*(a(end)-a(end-1))/h(end) ]'; %13
%14
VAL = trid_lu_decomp(A); %15
c = trid_lu_solve(VAL,r)'; %16
%17
b = (a(2:end) - a(1:end-1))./h - h.*(2*c(1:end-1) + c(2:end))/3; %18
d = (c(2:end) - c(1:end-1))/3./h; %19
%20
a = a(1:end-1); %21
c = c(1:end-1); %22
This code merely implements the mathematical formulas presented above.
In line 3, we calculate the lengths hi of all subintervals [xi, xi+1], i = 1, 2, . . . , n − 1, and store them
in the array h.
Line 5 defines the array a of the coefficients ai .
Lines 7-8 compute the tridiagonal matrix A according to the formula mentioned-above. So, line 7
calculates the main diagonal of the matrix A. Line 8 gives us sub- and super-diagonals respectively.
In lines 10-13, we set up the right-hand side vector r.
Lines 15-16 solve the linear system Ac = r with respect to the cubic spline coefficients ci , i =
1, 2, . . . , n, and save the solution vector to the output array c.
In lines 18-19, we calculate the remaining output coefficients bi and di , i = 1, 2, . . . , n − 1.
Lines 21-22 cut the arrays a and c by dropping out the extra entries an and cn . We do this because
we will never use them in the future.
Now we write a program for the spline interpolation. It takes an array of points xi , i = 1, 2, . . . , n,
an array of values fi , i = 1, 2, . . . , n (input parameter f) at those points, and an array f1 of the
first derivatives of the function f evaluated at the boundary points x1 and xn (input parameter f1)
and an array of points where to evaluate the cubic spline at (input parameter xx). The function
returns the array of the cubic spline values ff calculated at the points xx, as output. Here is the
code:
%spline_interp.m
function ff = spline_interp(x,f,f1,xx) % 1
% 2
ff = zeros(size(xx)); % 3
% 4
[a,b,c,d] = spline_coeff(x,f,f1); % 5
% 6
for i=1:length(xx) % 7
% 8
I = find(xx(i) <= x); % 9
j = max(1, I(1) - 1); %10
%11
ff(i) = a(j) + (xx(i) - x(j))*b(j) + ... %12
(xx(i) - x(j))^2*c(j) + ... %13
(xx(i) - x(j))^3*d(j); %14
end; %15

This code is quite simple. Line 3 initialises the output array ff with zeros. Line
5 calculates the cubic spline coefficients.
Lines 7-15 contain the main loop. The i-th iteration step of the main loop computes the value of
the cubic spline S(x) at the point xx(i). So, in lines 9-10, we look for the subinterval [xj , xj+1 ]
containing the current point xx(i). Lines 12-14 calculate the value of the spline with the use of the
coefficients aj , bj , cj and dj corresponding to this particular subinterval.
You can test the function spline interp in the Matlab window as follows:

>> x = 0:2:30;
>> f = cos(x);
>> f1 = -sin([0,30]);
>> xx = 0:0.1:30;
>> ff = cos(xx);
>> pp = spline_interp(x,f,f1,xx);
>> plot(x,f,’o’, xx,ff, xx,pp);

This example presents the cubic spline approximation of the function cos(x) on the interval [0, 30].
The plot shows the data points, the exact function cos(x) and the cubic spline approximation S(x).
Compare the two curves.

Exercise Explain why the vectorised formulas for the matrix A, the vector r and the arrays b and d,
which we use in the code of the function spline_coeff, work correctly. Convert these formulas into
scalar form by using the Matlab loop statement for.

5.2 Practice
PROBLEM ONE
By using the sample code of the functions fw_diff and ng_interp as templates, write a program
that implements the Newton-Gregory interpolation formula with backward differences (instead of
forward ones).
The program takes an initial point (input parameter x1), a grid step size (input parameter h),
an array of function values fi, i = 1, 2, . . . , n (input parameter f) and an array of points at which to
evaluate the interpolating polynomial (input parameter xx). The function returns the array ff
of the polynomial values calculated at the points xx, as output.

PROBLEM TWO
In the theory section, we explained the program that implements the cubic spline interpolation with
the Clamped Boundary Conditions. Modify the code of the functions spline_coeff and spline_interp

in such a way that they will implement the cubic spline interpolation with the Free Boundary
Conditions (instead of the clamped ones).
Note that you must use your own versions of the functions trid_lu_decomp and trid_lu_solve to solve
the linear system for the unknown coefficients ci, i = 1, 2, . . . , n.
Chapter 6

Numerical differentiation

6.1 Finite Difference Methods (FDMs)


6.1.1 FDMs by Taylor’s expansion
Consider the Taylor expansions of a function f about the point x:

f(x + h) = f(x) + h f'(x) + (h^2/2!) f''(x) + (h^3/3!) f'''(x) + (h^4/4!) f^(4)(x) + · · ·     (6.1)

f(x - h) = f(x) - h f'(x) + (h^2/2!) f''(x) - (h^3/3!) f'''(x) + (h^4/4!) f^(4)(x) - · · ·     (6.2)
Rearranging Eq (6.1) gives

f'(x) = [f(x + h) - f(x)]/h - (h/2!) f''(x) - (h^2/3!) f'''(x) - · · ·

f'(x) = [f(x + h) - f(x)]/h + O(h).     (6.3)
This latter equation is the Forward Difference approximation to f 0 (x).
Rearranging Eq (6.2) gives

f'(x) = [f(x) - f(x - h)]/h + (h/2!) f''(x) - (h^2/3!) f'''(x) + · · ·

f'(x) = [f(x) - f(x - h)]/h + O(h).
h
This equation is the Backward Difference approximation to f 0 (x).
Subtracting Eq. (6.2) from Eq. (6.1) and rearranging gives

f'(x) = [f(x + h) - f(x - h)]/(2h) - (h^2/3!) f'''(x) - (h^4/5!) f^(5)(x) - · · · ,

f'(x) = [f(x + h) - f(x - h)]/(2h) + O(h^2).     (6.4)


Eq. (6.4) is called the Central Difference approximation to f'(x). The parameter h is called the
step size.

Example
Let f(x) = ln x and x = x0 = 1.8. Approximate f'(1.8) by the Forward Difference formula and estimate
the error.

f'(1.8) ≈ [f(1.8 + h) - f(1.8)]/h,   h > 0,

with error |h f''(ξ)|/2 = |h|/(2ξ^2) ≤ |h|/(2(1.8)^2), where 1.8 < ξ < 1.8 + h. The results are summarised in the
following table for various values of h.

h        f(1.8 + h)     [f(1.8 + h) - f(1.8)]/h     |h|/(2(1.8)^2)
0.1      0.64185389     0.5406722                   0.0154321
0.01     0.59332685     0.5540180                   0.0015432
0.001    0.58834207     0.5554013                   0.0001543
The exact value is f 0 (1.8) = 0.555 · · · .
Using the central difference approximation, we have

f'(1.8) ≈ [f(1.8 + h) - f(1.8 - h)]/(2h),   h > 0,

with error h^2 |f'''(ξ)|/3! = h^2/(3ξ^3) ≈ h^2/(3(1.8)^3), where 1.8 - h < ξ < 1.8 + h.
h        f(1.8 + h)     f(1.8 - h)     [f(1.8 + h) - f(1.8 - h)]/(2h)     h^2/(3(1.8)^3)
0.1      0.64185389     0.53062825     0.5561282                          5.7155 × 10^-4
0.01     0.59332685     0.58221562     0.5555615                          5.7155 × 10^-6
0.001    0.58834207     0.58723096     0.555555                           5.7155 × 10^-8
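Both tables can be reproduced with a few lines of Matlab. The sketch below is only an illustration; the function handle, the point x0 and the list of step sizes simply restate the example above:

% fd_demo.m -- forward and central difference approximations of f'(1.8), f(x) = ln(x)
f  = @(x) log(x);                         % function to differentiate
x0 = 1.8;
exact = 1/x0;                             % f'(x) = 1/x

for h = [0.1, 0.01, 0.001]
    fwd = (f(x0+h) - f(x0))/h;            % forward difference, O(h)
    cen = (f(x0+h) - f(x0-h))/(2*h);      % central difference, O(h^2)
    fprintf('h = %6.4f  forward = %9.7f  central = %9.7f  exact = %9.7f\n', ...
            h, fwd, cen, exact);
end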
We can also obtain an approximation to the second derivative. For that we add Eqs. (6.1)
and (6.2):
f(x + h) + f(x - h) = 2f(x) + h^2 f''(x) + (h^4/12) f^(4)(x) + · · · ,

and solving for f''(x) we get

f''(x) = [f(x + h) - 2f(x) + f(x - h)]/h^2 - (h^2/12) f^(4)(x) - (h^4/360) f^(6)(x) - · · · ,

f''(x) = [f(x + h) - 2f(x) + f(x - h)]/h^2 + O(h^2).
This is the Central Difference approximation to f 00 (x), which is second-order accurate.

Example
Following the previous example, we approximate f''(1.8) with f(x) = ln x and obtain the results
given in the table below:

h        f(1.8 + h)     f(1.8)         f(1.8 - h)     [f(1.8 + h) - 2f(1.8) + f(1.8 - h)]/h^2     h^2/(2(1.8)^4)
0.1      0.64185389     0.58778666     0.53062825     -0.30911926                                 4.76299 × 10^-4
0.01     0.59332685     0.58778666     0.58221562     -0.30864674                                 4.76299 × 10^-6
0.001    0.58834207     0.58778666     0.58723096     -0.30864202                                 4.76299 × 10^-8

The exact value is f''(1.8) = -0.30864198, correct to 8 significant digits.


For h = 0.001, the relative error is 1.3 × 10^-5 % and the absolute error is 4 × 10^-8; this is less
than the bound 4.76299 × 10^-8 predicted by the theory.
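The same check can be done in Matlab. The following sketch (the variable names are ours) evaluates the central difference formula for f''(1.8) and its absolute error:

% fd2_demo.m -- central difference approximation of f''(1.8), f(x) = ln(x)
f  = @(x) log(x);
x0 = 1.8;
exact = -1/x0^2;                                 % f''(x) = -1/x^2

for h = [0.1, 0.01, 0.001]
    d2 = (f(x0+h) - 2*f(x0) + f(x0-h))/h^2;      % O(h^2) approximation
    fprintf('h = %6.4f  approx = %11.8f  error = %9.2e\n', h, d2, abs(d2 - exact));
end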

6.1.2 Extrapolation Techniques


This is a way to improve the accuracy of estimates of derivatives obtained from a table of uniformly
(evenly) spaced data. We illustrate the method with the following example.
Example
Estimate the value of f 0 (2.4) from the following data

xi fi
2.0 0.123060
2.1 0.105706
2.2 0.089584
2.3 0.074764
2.4 0.061277
2.5 0.049126
2.6 0.038288
2.7 0.028722
2.8 0.020371
Using central differences with h = 0.1, we have

f'(2.4) = [f(2.5) - f(2.3)]/(2(0.1)) + O(h^2)
        = -0.12819 + C_1 (0.1)^2 + Ĉ_1 (0.1)^4 + · · · .

Repeating the same procedure with h = 0.2, we have

f'(2.4) = [f(2.6) - f(2.2)]/(2(0.2)) + O(h^2)
        = -0.12824 + C_2 (0.2)^2 + Ĉ_2 (0.2)^4 + · · ·

The values of C_1 and C_2 are not identical, but we assume they are the same; it can be shown that
this assumption introduces an error of only O(h^4). Based on it, we can
eliminate the C's and get

f'(2.4) = -0.12819 + [(-0.12819) - (-0.12824)]/((0.2/0.1)^2 - 1)
        = -0.12817 + O(h^4).

The latter assumption is an instance of a general rule stated as follows:

Given two estimates of a value that have errors of O(h^n), where the h's are in the ratio of
2 to 1, we can extrapolate to a better estimate of the exact value as

Better estimate = more accurate + (1/(2^n - 1)) (more accurate - less accurate).

The more accurate value is the one computed with the smaller value of h.
From the table of data, one can calculate an approximation with h = 0.4 and then extrapolate it
together with the h = 0.2 approximation to obtain a second O(h^4) estimate. We then have two
first-order extrapolations with O(h^4) error; a second extrapolation can then be performed on these two O(h^4)
estimates to give an O(h^6) approximation. The extrapolations are laid out in table form as
follows:

h Initial estimate First order est. Second order est.


0.1 −0.12819 −0.12817 −0.12817
0.2 −0.12824 −0.12820
0.4 −0.12836
The second-order extrapolation in the above table comes from the following computation:

f'(2.4) = -0.12817 + (1/(2^4 - 1)) (-0.12817 - (-0.12820))
        = -0.12817 + O(h^6).
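The extrapolation step is easy to carry out in Matlab. The sketch below assumes the tabulated values have been stored in the vector fx exactly as listed above; the variable names D1, D2 and better are ours:

% extrap_demo.m -- extrapolate two central-difference estimates of f'(2.4)
x  = 2.0:0.1:2.8;                          % tabulated abscissae
fx = [0.123060 0.105706 0.089584 0.074764 0.061277 ...
      0.049126 0.038288 0.028722 0.020371];

D1 = (fx(6) - fx(4))/(2*0.1);              % h = 0.1 estimate of f'(2.4)
D2 = (fx(7) - fx(3))/(2*0.2);              % h = 0.2 estimate of f'(2.4)

better = D1 + (D1 - D2)/(2^2 - 1);         % extrapolated estimate, error O(h^4)
fprintf('h=0.1: %.5f   h=0.2: %.5f   extrapolated: %.5f\n', D1, D2, better);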

The rule stated above is general, so it may also be used for second-derivative approximations.
For example, let us approximate f''(2.4). First we take h = 0.1; the central difference
approximation gives:

f''(2.4) = [f(2.5) - 2f(2.4) + f(2.3)]/(0.1)^2 + O(h^2)
         = 0.13360 + O(h^2).     (6.5)

Then take h = 0.2:

f''(2.4) = [f(2.6) - 2f(2.4) + f(2.2)]/(0.2)^2 + O(h^2)
         = 0.13295 + O(h^2).     (6.6)

And finally take h = 0.4:

f''(2.4) = [f(2.8) - 2f(2.4) + f(2.0)]/(0.4)^2 + O(h^2)
         = 0.13048 + O(h^2).     (6.7)

The extrapolations are laid out in the following table

h Initial estimate First order est. Second order est.


0.1 0.13360 0.13382 0.13382
0.2 0.13295 0.13377
0.4 0.13048

Since the data was constructed using f(x) = e^(-x) sin x, we can compute the true values f'(2.4) =
-0.12817 and f''(2.4) = 0.13379. The difference in the results for f''(2.4) is therefore due to round-off
error in the tabulated data.

6.1.3 Richardson’s Extrapolation


We can apply the same technique when we want to differentiate a known function numerically. In
this case we can make the h-values smaller, rather than use larger values as is required when the
function is known only from tabulated values, as above.
We begin at some arbitrary value of h and compute an approximation to f'(x); we then compute a second
approximation using h/2 and apply the extrapolation formula to obtain an improved estimate. Generally, one
builds a table by continuing with higher-order extrapolations, halving the value of h at each stage.

Example
Build a Richardson table for f (x) = x2 cos x to evaluate f 0 (1). Start with h = 0.1.

h        Initial estimate    First order est.    Second order est.    Third order est.
0.1 0.226736
0.05 0.236031 0.239129
0.025 0.238358 0.239133 0.239134
0.0125 0.238938 0.239132 0.239132 0.239132

Richardson's technique indicates convergence when two successive values on the same line are
the same. Note that the table is oriented differently to that of the tabulated-data problem; this is
done to indicate the decreasing value of h.
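The whole table can be generated by a short script. The following is a sketch only; the function handle, the starting step size and the number of rows are our own choices:

% richardson_demo.m -- Richardson table for f'(1), f(x) = x^2*cos(x)
f  = @(x) x.^2.*cos(x);
x0 = 1;  h = 0.1;  nrows = 4;

R = zeros(nrows);                                % lower-triangular table of estimates
for i = 1:nrows
    R(i,1) = (f(x0+h) - f(x0-h))/(2*h);          % central difference with step h
    for j = 2:i
        R(i,j) = R(i,j-1) + (R(i,j-1) - R(i-1,j-1))/(4^(j-1) - 1);
    end
    h = h/2;                                     % halve the step for the next row
end
disp(R)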
Chapter 7

Numerical integration (Quadrature)

7.1 Introduction
A common problem is to evaluate the definite integral

I = ∫_a^b f(x) dx.

We think of this as calculating the area beneath the curve of the function f = f(x) between
the limits x = a and x = b. There are many situations where the function f = f(x) cannot be
integrated analytically, or is known only as a table of values; in these cases we must use numerical
techniques to approximate I.

7.2 Midpoint Rule


Consider the interval [x0, x2] with midpoint x1 = (x0 + x2)/2 and let h = x1 - x0. The height of
the rectangle on [x0, x2] that is used to approximate the area below the curve of the function
f = f(x) is f(x1) (see Figure xx1), thus

I = 2h f(x1) + (h^3/3) f''(ξ),   ξ ∈ (x0, x2),
  ≈ I_M = 2h f(x1).

7.3 Trapezoidal Rule


If [x0, x1] represents the interval of integration, with h = x1 - x0, then (see Figure xx2)

I = (1/2)(sum of the f_i) × (distance between the x_i) - (h^3/12) f''(ξ),   ξ ∈ (x0, x1),
  ≈ I_T = (h/2)(f0 + f1).


7.4 Simpson’s Rule


Assume the interval of integration is [x0, x2] and that the three points (x0, f0), (x0 + h, f1) and
(x0 + 2h, f2) are used. Since the three points are equally spaced, we can use the Newton-Gregory
forward difference interpolating polynomial P2 to fit a quadratic to these three points and then
integrate. We have

P2(x) = f0 + [(x - x0)/h] Δf0 + [(x - x0)(x - x1)/(2h^2)] Δ^2 f0,

where
Δf0 = f1 - f0   and   Δ^2 f0 = f2 - 2f1 + f0.

Thus

I = ∫_{x0}^{x2} { f0 + [(x - x0)/h] Δf0 + [(x - x0)(x - x1)/(2h^2)] Δ^2 f0 } dx.
In order to simplify the integration of the above polynomial, we transform it using x =
x0 + 2hr, where r ∈ [0, 1]. This gives dx = 2h dr; we also have

x − x0 = 2hr,

and

x − x1 = x0 + 2hr − (x0 + h)
= h(2r − 1).

The integral becomes


Z 1 Z 1 Z 1
2
I = 2hf0 dr + 4h∆f0 rdr + 2h∆ f0 r(2r − 1)dr
0 0 0
 1  1
1 1 2 2 2 3 1 2
= 2hf0 [r]0 + 4h∆f0 r + 2h∆ f0 r − r ,
2 0 3 2 0
h
= 2hf0 + 2h∆f0 + ∆2 f0 .
3
Substituting for Δf0 and Δ^2 f0, we have

I_S = 2h f0 + 2h(f1 - f0) + (h/3)(f2 - 2f1 + f0),

which upon simplification gives

I ≈ I_S = (h/3)(f0 + 4f1 + f2).

Examples
Use the Trapezoidal rule and Simpson’s rule to approximate the integral of the following functions
on the interval [0, 2]:

1. f(x) = x^2;   2. f(x) = 1/(x + 1);   3. f(x) = e^x.

Solution
For the Trapezoidal rule on [0, 2] we use h = b - a = 2, so

∫_0^2 f(x) dx ≈ f(0) + f(2).

Since for Simpson's rule we need an even number of subintervals, we use h = (b - a)/2 = 1, thus

∫_0^2 f(x) dx ≈ (1/3)[f(0) + 4f(1) + f(2)].

The results to three decimal places are summarised in the following table:

f (x) x2 1/(x + 1) ex
Exact value 2.667 1.099 6.389
Trapezoidal rule 4.000 1.333 8.389
Simpson’s rule 2.667 1.111 6.421
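The table entries can be checked with a few lines of Matlab; the sketch below simply evaluates the two formulas above for the three test functions (the cell array of function handles is our own construction):

% quad_demo.m -- trapezoidal and Simpson's rules on [0,2]
fs = {@(x) x.^2, @(x) 1./(x+1), @(x) exp(x)};
a = 0;  b = 2;  h = (b - a)/2;                   % two subintervals for Simpson's rule

for k = 1:3
    f = fs{k};
    T = (b - a)/2*(f(a) + f(b));                 % trapezoidal rule (single panel)
    S = h/3*(f(a) + 4*f(a+h) + f(b));            % Simpson's rule
    fprintf('Trapezoidal = %.3f   Simpson = %.3f\n', T, S);
end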
Chapter 8

Ordinary differential equations

8.1 Initial value problems


8.1.1 First-order ODEs
First-order ODEs are of the form
y' = dy/dx = f(x, y)     (8.1)
We seek a solution for x ∈ [a, b] with the initial condition y(a) = η. A numerical method proceeds
from the initial point and integrates Eq. (8.1) over the range x ∈ [a, b].
Defining the grid points xi by xi = a + ih, i = 0, 1, . . . , n, where h is the constant step size, successive values
y1, y2, . . . , yn are computed at a + h, a + 2h, . . . , a + nh.
The following notation is used:

xi = x0 + ih, yi = y(xi ), fi = f (xi , yi ), where x0 = a, and xn = b.

8.1.2 Euler’s method


If we use the forward difference operator

f'(x) = [f(x + h) - f(x)]/h + O(h)

to approximate dy/dx at the point xi, that is

y'_i ≈ (y_{i+1} - y_i)/h,

we obtain the algorithm for Euler's method

yi+1 = yi + hfi (8.2)

Thus given (x0 , y0 ) = (a, η), we can calculate (xi , yi ) for i = 1, . . . , n. Since the new value yi+1 can
be calculated from known values of xi and yi , this method is explicit.


8.1.3 Error in Euler’s method


Consider the Taylor expansion of y_{i+1}:

y_{i+1} = y_i + h y'_i + (h^2/2!) y''_i + (h^3/3!) y'''_i + . . . ,

and since y'_i = f_i,

y_{i+1} = y_i + h f_i + (h^2/2!) f'_i + (h^3/3!) f''_i + . . .
If we subtract Eq. (8.2) from this we obtain

ε_E = (h^2/2!) f'_i + (h^3/3!) f''_i + . . . = (h^2/2!) f'(ξ),
where x_i < ξ < x_{i+1}. Thus the error per step is O(h^2) and is called the Local Truncation Error
(LTE). If we integrate over n steps, the Global Truncation Error is O(h).

Example
Solve the IVP

x'(t) = 2√x,   x(0) = 4,
to compute x(1) using Euler’s method with 4 and 8 steps respectively. Display all the intermediate
steps in the form of a table including the exact solution and the error.

With 4 steps

k tk xk x(tk ) Error
0 0 4 4 0
1 0.25 5.000 5.063 0.063
2 0.50 6.118 6.250 0.132
3 0.75 7.355 7.563 0.208
4 1.00 8.711 9.000 0.289
With 8 steps

k tk xk x(tk ) Error
0 0 4 4 0
2 0.25 5.030 5.063 0.033
4 0.50 6.182 6.250 0.068
6 0.75 7.456 7.563 0.107
8 1.00 8.852 9.000 0.148
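The tables above can be reproduced with the following sketch of Euler's method (the variable names are ours; the exact solution x(t) = (t + 2)^2 is used for the error column):

% euler_demo.m -- Euler's method for x'(t) = 2*sqrt(x), x(0) = 4
f     = @(t,x) 2*sqrt(x);
exact = @(t) (t + 2).^2;                  % exact solution of the IVP

n = 4;  h = 1/n;  t = 0;  x = 4;          % set n = 8 for the second table
fprintf('%2d  %5.2f  %7.3f  %7.3f  %6.3f\n', 0, t, x, exact(t), 0);
for k = 1:n
    x = x + h*f(t,x);                     % Euler step
    t = t + h;
    fprintf('%2d  %5.2f  %7.3f  %7.3f  %6.3f\n', k, t, x, exact(t), abs(exact(t) - x));
end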

8.1.4 Improved Euler method


A much more efficient method than Euler's can be derived using a predictor-corrector strategy.
The resulting method is second-order accurate and is called the improved Euler method or Heun's
method.
The algorithm is derived as follows. Assume we want to solve a more general non-autonomous
IVP
y 0 (t) = f (t, y(t)), y(a) = η (8.3)
We choose evenly spaced times a = t0 < t1 < . . . < tn−1 < tn = T with step size h. Integrating
Eq. (8.3) over the interval [tk , tk+1 ] and using the fundamental theorem of calculus yields
y(t_{k+1}) = y(t_k) + ∫_{t_k}^{t_{k+1}} f(t, y(t)) dt.

Using the trapezoidal rule for numerical integration, we obtain

y(t_{k+1}) ≈ y(t_k) + (h/2) (f(t_k, y(t_k)) + f(t_{k+1}, y(t_{k+1}))).

Knowing that y_k ≈ y(t_k) and y_{k+1} ≈ y(t_{k+1}), we set

y_{k+1} = y_k + (h/2) (f(t_k, y_k) + f(t_{k+1}, y_{k+1})).
This is an implicit method: if y_k is known, we cannot in general determine y_{k+1} directly, because of
the term f(t_{k+1}, y_{k+1}); when f is non-linear in y, a non-linear equation must be solved at each step,
for instance by an iterative method such as Newton's method. A way out is to estimate y_{k+1} using
Euler's method; that estimate is called a predictor and is denoted p_{k+1}. We can now write the
complete algorithm for the improved Euler method:

t_0 = a,
Predictor:  p_{k+1} = y_k + h f(t_k, y_k),
            t_{k+1} = t_k + h,
Corrector:  y_{k+1} = y_k + (h/2) (f(t_k, y_k) + f(t_{k+1}, p_{k+1})),   k = 0, 1, . . . , n - 1.

Example
Solve the IVP

x'(t) = 2√x,   x(0) = 4,
to compute x(1) using the improved Euler method with 4 and 8 steps respectively, and compare
your results with those of Euler's method. (Partially solved in class.)
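A minimal sketch of the improved Euler (predictor-corrector) loop for this IVP is given below; the variable names and the printed summary are our own choices:

% heun_demo.m -- improved Euler (Heun) method for x'(t) = 2*sqrt(x), x(0) = 4
f = @(t,x) 2*sqrt(x);
n = 4;  h = 1/n;  t = 0;  x = 4;          % set n = 8 for the finer run

for k = 1:n
    p = x + h*f(t,x);                     % predictor (Euler step)
    x = x + h/2*(f(t,x) + f(t+h,p));      % corrector (trapezoidal average)
    t = t + h;
end
fprintf('Improved Euler with %d steps: x(1) = %.4f (exact value 9)\n', n, x);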

8.1.5 Runge-Kutta (RK) methods


The general one step method takes the form

yi+1 = yi + hφ(xi , yi ; h)

In Euler's method we have φ(x_i, y_i; h) = f_i = y'_i: we use the slope at the point (x_i, y_i)
to extrapolate from y_i and obtain y_{i+1}. The Runge-Kutta methods extend this idea: instead of just
calculating the slope at y_i, we take a 'weighted' average of slopes at y_i and at intermediate points.

A second order Runge-Kutta method is

q_1 = f(x_i, y_i)
q_2 = f(x_i + h/2, y_i + (h/2) q_1)
y_{i+1} = y_i + h q_2,
which has LTE of O(h3 ).

We can include more sample points in φ(x_i, y_i; h) to increase the accuracy. The most widely used
formula is the fourth-order RK method

q_1 = f(x_i, y_i)
q_2 = f(x_i + h/2, y_i + (h/2) q_1)
q_3 = f(x_i + h/2, y_i + (h/2) q_2)
q_4 = f(x_i + h, y_i + h q_3)
y_{i+1} = y_i + (h/6)(q_1 + 2q_2 + 2q_3 + q_4);

this method has an LTE of O(h^5), i.e. it is fourth-order accurate.

Example
Approximate y(0.4) using RK4 with 2 steps for the following non-autonomous IVP

y 0 (t) = ty(t) + y(t) + t2 , y(0) = 2

f (t, y) = ty + y + t2 , t0 = 0, y0 = 2, h = 0.2.
Step 1:

q1 = f (t0 , y0 ) = t0 y0 + y0 + t20 = 0 + 2 + 0 = 2

y_0 + (h/2) q_1 = 2 + (0.1)(2) = 2.2
q_2 = f(t_0 + h/2, y_0 + (h/2) q_1) = 0.1(2.2) + 2.2 + 0.1^2 = 2.43
y_0 + (h/2) q_2 = 2.243
q_3 = f(t_0 + h/2, y_0 + (h/2) q_2) = 2.4773
y_0 + h q_3 = 2.4955
q_4 = f(t_0 + h, y_0 + h q_3) = 3.0346
y_1 = y_0 + (h/6)(q_1 + 2q_2 + 2q_3 + q_4) = 2.4950
6
Step 2:
In a similar manner as above, we find

y(0.4) ≈ y2 = 3.2566 
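The two steps above can be carried out with the following RK4 sketch (the variable names are ours):

% rk4_demo.m -- classical RK4 for y'(t) = t*y + y + t^2, y(0) = 2
f = @(t,y) t.*y + y + t.^2;
h = 0.2;  t = 0;  y = 2;

for k = 1:2                               % two steps give y(0.4)
    q1 = f(t, y);
    q2 = f(t + h/2, y + h/2*q1);
    q3 = f(t + h/2, y + h/2*q2);
    q4 = f(t + h,   y + h*q3);
    y  = y + h/6*(q1 + 2*q2 + 2*q3 + q4);
    t  = t + h;
    fprintf('t = %.1f   y = %.4f\n', t, y);
end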

8.1.6 Linear multi-step methods (LMMs)

The methods considered above are called single-step methods since they calculate the solution at
x_{i+1} from known information at x_i only. This approach ignores other points which may already have
been calculated; LMMs attempt to take more than one previous point into consideration.
If we take the central difference approximation (6.4),

y'_i = (y_{i+1} - y_{i-1})/(2h),

then

y_{i+1} = y_{i-1} + 2h f(x_i, y_i).


Thus calculating y_{i+1} at x_{i+1} requires information at x_i and x_{i-1}. If we integrate y' = f(x, y)
using Simpson's rule over the range [x_{i-1}, x_{i+1}], we obtain

y_{i+1} - y_{i-1} = (h/3)(f_{i-1} + 4f_i + f_{i+1}).

This time an estimate of f_{i+1} is required on the right-hand side; this is called an implicit method.
Both of the formulas above are 2-step methods.

8.1.7 Adams methods

These methods are derived from

y_{i+1} = y_i + ∫_{x_i}^{x_{i+1}} P(x) dx,

where P(x) is an interpolating polynomial. If we fit P(x) to the points f_i, f_{i-1}, . . . , f_{i-m}, we
obtain explicit formulas, which are called Adams-Bashforth methods. If P(x) is fitted to
f_{i+1}, f_i, . . . , f_{i-m}, the methods are implicit and known as Adams-Moulton methods.

The most commonly used Adams methods are listed below along with their LTEs.

Adams-Bashforth (A-B) methods

Method                                                              Steps     LTE                       Order
y_{i+1} = y_i + h f_i   (Euler)                                     1 step    (h^2/2) f'(ξ)             1
y_{i+1} = y_i + (h/2)(3f_i - f_{i-1})                               2 step    (5h^3/12) f''(ξ)          2
y_{i+1} = y_i + (h/12)(23f_i - 16f_{i-1} + 5f_{i-2})                3 step    (3h^4/8) f'''(ξ)          3
y_{i+1} = y_i + (h/24)(55f_i - 59f_{i-1} + 37f_{i-2} - 9f_{i-3})    4 step    (251h^5/720) f^(4)(ξ)     4

Adams-Moulton (A-M) methods

Method                                                              Steps     LTE                       Order
y_{i+1} = y_i + (h/2)(f_{i+1} + f_i)   (Trapezoidal Rule)           1 step    -(h^3/12) f''(ξ)          2
y_{i+1} = y_i + (h/12)(5f_{i+1} + 8f_i - f_{i-1})                   2 step    -(h^4/24) f'''(ξ)         3
y_{i+1} = y_i + (h/24)(9f_{i+1} + 19f_i - 5f_{i-1} + f_{i-2})       3 step    -(19h^5/720) f^(4)(ξ)     4
It is seen that the order of an m-step A-M method is greater than that of an m-step A-B
method.

Remark Apart from the 1-step methods, these formulae are not self-starting: they require the
solution at points in addition to x_0. In general, single-step methods must be used to generate the
additional starting values for LMMs.

8.1.8 Predictor-corrector methods

The advantage of smaller errors in implicit methods is offset by the difficulty of solving for y_{i+1} on
the right-hand side. Consider the problem

dy/dx = y^2 + x^2;

this is simple to solve using A-B methods. However, if we use the 1-step A-M method we have

y_{i+1} = y_i + (h/2)(y_{i+1}^2 + x_{i+1}^2 + y_i^2 + x_i^2),

y_{i+1} - (h/2) y_{i+1}^2 = y_i + (h/2)(x_{i+1}^2 + y_i^2 + x_i^2);
the terms on the right-hand side are known and we have a non-linear equation in the one unknown
y_{i+1}. This 'inconvenience' is overcome by using A-B and A-M methods in tandem. We use an A-B
method to predict y_{i+1}, written y^[p]_{i+1}, then evaluate f^[p]_{i+1} = f(x_{i+1}, y^[p]_{i+1}), and then use an A-M
method with f^[p]_{i+1} to calculate a corrected value y^[c]_{i+1}. This is usually written PEC; if we were to
apply the corrector m times, we would write P(EC)^m. The A-B and A-M pair are usually chosen
to have the same order of error. So the O(h^3) P-C pair would be
y^[p]_{i+1} = y_i + (h/2)(3f_i - f_{i-1})
y^[c]_{i+1} = y_i + (h/2)(f^[p]_{i+1} + f_i)
and the O(h^4) P-C pair

y^[p]_{i+1} = y_i + (h/12)(23f_i - 16f_{i-1} + 5f_{i-2})
y^[c]_{i+1} = y_i + (h/12)(5f^[p]_{i+1} + 8f_i - f_{i-1}).
The corrector method would generally be applied several times until the solution has converged
in some way.
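As an illustration, here is a sketch of the O(h^3) predictor-corrector pair applied to y' = y^2 + x^2; the initial condition, the step size and the use of a single Euler step to generate the extra starting value are our own choices:

% pec_demo.m -- O(h^3) A-B/A-M predictor-corrector for y' = y^2 + x^2, y(0) = 0.5
f = @(x,y) y.^2 + x.^2;
h = 0.1;  x = 0:h:1;  n = numel(x);
y = zeros(1,n);  y(1) = 0.5;

y(2) = y(1) + h*f(x(1),y(1));             % one Euler step to start the 2-step method
for i = 2:n-1
    p      = y(i) + h/2*(3*f(x(i),y(i)) - f(x(i-1),y(i-1)));   % A-B predictor
    y(i+1) = y(i) + h/2*(f(x(i+1),p) + f(x(i),y(i)));          % A-M (trapezoidal) corrector
end
fprintf('y(1) approx %.5f\n', y(end));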

8.2 System of first-order ODEs


A system of m first-order IVPs is given by

dy_1/dx = f_1(x, y_1, y_2, . . . , y_m)
dy_2/dx = f_2(x, y_1, y_2, . . . , y_m)
   ...
dy_m/dx = f_m(x, y_1, y_2, . . . , y_m)
subject to
y1 (a) = η1 , y2 (a) = η2 , . . . , ym (a) = ηm .
Methods to solve systems are simply generalisations of methods for single equations.

Example
Given the system of two equations, where we write y = y1 , z = y2 and f = f1 , g = f2
dy/dx = f(x, y, z)
dz/dx = g(x, y, z)

with y(0) = η1 and z(0) = η2


The second-order RK method would be applied as follows

q1 = f (xi , yi , zi )
r1 = g(xi , yi , zi )

then
q_2 = f(x_i + h/2, y_i + (h/2) q_1, z_i + (h/2) r_1)
r_2 = g(x_i + h/2, y_i + (h/2) q_1, z_i + (h/2) r_1)
and finally

yi+1 = yi + hq2
zi+1 = zi + hr2

Note that we must calculate q1 , r1 , q2 , r2 in that order. Similarly, LMMs can be applied to systems.
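The loop below sketches the second-order RK method for a concrete system; the right-hand sides f and g (namely y' = z, z' = -y, so that y = cos x) and the initial data are our own choices for illustration:

% rk2_system_demo.m -- second-order RK for y' = z, z' = -y, y(0) = 1, z(0) = 0
f = @(x,y,z) z;                            % dy/dx
g = @(x,y,z) -y;                           % dz/dx
h = 0.1;  x = 0;  y = 1;  z = 0;

for i = 1:10                               % integrate up to x = 1
    q1 = f(x, y, z);            r1 = g(x, y, z);
    q2 = f(x + h/2, y + h/2*q1, z + h/2*r1);
    r2 = g(x + h/2, y + h/2*q1, z + h/2*r1);
    y  = y + h*q2;              z  = z + h*r2;
    x  = x + h;
end
fprintf('y(1) = %.5f   (exact cos(1) = %.5f)\n', y, cos(1));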

Example
If we apply the second-order Predictor-Corrector method to the above system, we obtain the
predictors

y^[p]_{i+1} = y_i + (h/2)(3f_i - f_{i-1})
z^[p]_{i+1} = z_i + (h/2)(3g_i - g_{i-1}),

then evaluate

f^[p]_{i+1} = f(x_{i+1}, y^[p]_{i+1}, z^[p]_{i+1})
g^[p]_{i+1} = g(x_{i+1}, y^[p]_{i+1}, z^[p]_{i+1}),

and then correct

y^[c]_{i+1} = y_i + (h/2)(f^[p]_{i+1} + f_i)
z^[c]_{i+1} = z_i + (h/2)(g^[p]_{i+1} + g_i).
8.3 Higher-order IVP

In general, if we have an mth order IVP it can be rewritten as a system of m first-order IVPs.

Example
For the general second-order equation

d^2y/dx^2 + a (dy/dx) + b y = 0
with initial data y(0) = η1 and y'(0) = η2.
We let

z = dy/dx,   so that   dz/dx = d^2y/dx^2.
The original ODE can now be written as
dz/dx + a z + b y = 0.
We then have the system of two equations
dy/dx = z
dz/dx = -a z - b y
subject to y(0) = η1 , z(0) = η2 .
Once transformed into a system of first-order ODEs, the methods of the previous section can be
applied.
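For instance, the equation y'' + a y' + b y = 0 can be integrated by rewriting it as the system above and applying Euler's method. The sketch below uses a = 0.5 and b = 4 with y(0) = 1, y'(0) = 0; the coefficients, the step size and the final time are our own choices:

% second_order_demo.m -- y'' + a*y' + b*y = 0 solved as a first-order system
a = 0.5;  b = 4;
f = @(x,y,z) z;                            % y' = z
g = @(x,y,z) -a*z - b*y;                   % z' = -a*z - b*y
h = 0.01;  x = 0;  y = 1;  z = 0;          % y(0) = 1, y'(0) = 0

for i = 1:round(2/h)                       % integrate up to x = 2 with Euler's method
    ynew = y + h*f(x,y,z);
    z    = z + h*g(x,y,z);
    y    = ynew;
    x    = x + h;
end
fprintf('y(2) approx %.4f\n', y);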
