
Numerical Computing

CSC 2702

Required textbook:
Numerical Analysis, Burden & Faires, 8th edition, Thomson Brooks/Cole

Dr Azeddine M
KICT, CS
IIUM
October 12, 2009

Contents

1 Mathematical Preliminaries
  1.1 Review of Calculus
    1.1.1 Exercises
  1.2 Round-off Errors
    1.2.1 Exercises

2 Solution of Equations in One Variable
  2.1 Bisection Method
  2.2 Fixed-Point Iteration
  2.3 Newton’s Method
  2.4 Secant Method
  2.5 False Position Method

3 Error Analysis for Iterative Methods
  3.1 Linearly and Quadratically Convergent Procedures
  3.2 Zero multiplicity
  3.3 Exercises

4 Accelerating Convergence
  4.1 Aitken’s ∆2 method
  4.2 Steffensen’s Method
  4.3 Zeros of Polynomials
  4.4 Horner’s Method
  4.5 Deflation
  4.6 Müller’s method
  4.7 Exercises

5 Interpolation and Polynomial Approximation
  5.1 Weierstrass Approximation Theorem
  5.2 Lagrange Polynomial
  5.3 Neville’s Method
  5.4 Newton Interpolating Polynomial
  5.5 Polynomial Forms
  5.6 Spline Interpolation
  5.7 Parametric Curves

6 Numerical Differentiation and Integration
  6.1 Numerical Differentiation
  6.2 Richardson’s Extrapolation
  6.3 Elements of Numerical Integration
    6.3.1 Trapezoidal rule
    6.3.2 Simpson’s rule
    6.3.3 Degree of precision
    6.3.4 Newton-Cotes Formula
  6.4 Composite Numerical Integration
  6.5 Adaptive Quadrature Methods
  6.6 Gaussian Quadrature
  6.7 Improper Integrals

7 Initial-Value Problems for ODE
  7.1 Introduction
  7.2 Elementary Theory of Initial-Value Problems
  7.3 Euler’s Method
  7.4 Higher-Order Taylor Methods
  7.5 Runge-Kutta Methods
  7.6 Predictor-Corrector Methods

8 Direct Methods for Solving Linear Systems
  8.1 Gaussian Elimination
  8.2 Pivoting Strategies
  8.3 Matrix Inverse
  8.4 Determinant of a Matrix
  8.5 Matrix Factorization

9 Iterative Methods for Solving Linear Systems
  9.1 Norms of Vectors and Matrices
  9.2 Eigenvalues and Eigenvectors
  9.3 Iterative Techniques for Solving Linear Systems

10 Some Useful Remarks
  10.1 Largest Possible Root
  10.2 Convergence of Bisection Method
  10.3 Convergence of False Position Method
  10.4 Convergence of Newton-Raphson Method
  10.5 Convergence of Secant Method
  10.6 Convergence of Fixed Point Method

11 Exams
  11.1 Exam 1
  11.2 Exam 2
  11.3 Exam 3

Chapter 1

Mathematical Preliminaries

1.1 Review of Calculus


ˆ Definition of the limit:
A function f defined on a set X of real numbers has a limit L at x0, written

    lim_{x→x0} f(x) = L    (1.1)

if the following statement is true:

    ∀ǫ > 0 ∃δ > 0 such that 0 < |x − x0| < δ ⇒ |f(x) − L| < ǫ    (1.2)

ˆ f is continuous at x0 if

    lim_{x→x0} f(x) = f(x0)    (1.3)

ˆ Convergence of sequences:
The sequence {xn} has a limit x (converges to x) if

    ∀ǫ > 0 ∃N > 0 such that n > N ⇒ |xn − x| < ǫ    (1.4)

ˆ Differentiable functions:
The function f is differentiable at x0 if

    f′(x0) = lim_{x→x0} [f(x) − f(x0)]/(x − x0)    (1.5)

exists. This limit is called the derivative of f at x0.
The set of all functions that have n continuous derivatives on X is denoted by C^n(X).

ˆ Theorem: If the function f is differentiable at x0 , then f is continuous at x0 .

ˆ Rolle’s Theorem:
Suppose f ∈ C[a, b] and f is differentiable on (a, b). If f (a) = f (b), then a number c ∈ (a, b) exists
with f ′ (c) = 0.

ˆ Mean value theorem:
Suppose f ∈ C[a, b] and f is differentiable on (a, b). Then a number c ∈ (a, b) exists with

    f′(c) = [f(b) − f(a)]/(b − a)    (1.6)

ˆ Extreme value theorem:


If f ∈ C[a, b], then c1 and c2 exist with f (c1 ) ≤ f (x) ≤ f (c2 ), for all x ∈ [a, b]. In addition, if f is
differentiable on (a, b), then c1 and c2 occur either at the endpoints of [a, b] or where f ′ is zero.

ˆ Riemann integral:
The Riemann integral of a function f on the interval [a, b] is the limit (provided it exists)

    ∫_a^b f(x) dx = lim_{max ∆xi → 0} Σ_{i=1}^{n} f(zi) ∆xi ,    (1.7)

where the numbers xi satisfy a = x0 ≤ x1 ≤ ... ≤ xn = b, where ∆xi = xi − xi−1, and zi is arbitrarily chosen in [xi−1, xi]. If the points are equally spaced and we choose zi = xi, the Riemann integral of f on [a, b] becomes

    ∫_a^b f(x) dx = lim_{n→∞} (b − a)/n Σ_{i=1}^{n} f(xi) ,    (1.8)

where xi = a + i(b − a)/n.

ˆ Weighted Mean Value Theorem for Integrals: Suppose f ∈ C[a, b], the Riemann integral of g exists on [a, b], and g(x) does not change sign on [a, b]. Then there exists a number c in (a, b) with

    ∫_a^b f(x) g(x) dx = f(c) ∫_a^b g(x) dx.    (1.9)

When g(x) = 1, this gives the average value of the function f over the interval [a, b]:

    f(c) = 1/(b − a) ∫_a^b f(x) dx.    (1.10)

ˆ Generalized Rolle’s Theorem: Suppose that f ∈ C[a, b] is n times differentiable on (a, b). If f (x)
is zero at n + 1 distinct numbers x0 , ..., xn in [a, b], then a number c in (a, b) exists with f (n) (c) = 0.

ˆ Intermediate Value Theorem: If f ∈ C[a, b] and K is any number between f (a) and f (b), then
there exists c in (a, b) for which f (c) = K.

ˆ Taylor’s Theorem: Suppose f ∈ C^n[a, b], f^(n+1) exists on [a, b], and x0 ∈ [a, b]. Then for every x ∈ [a, b] there exists a number ξ(x) between x0 and x with f(x) = Pn(x) + Rn(x), where

    Pn(x) = Σ_{k=0}^{n} f^(k)(x0)/k! (x − x0)^k    (1.11)

    Rn(x) = f^(n+1)(ξ(x))/(n + 1)! (x − x0)^(n+1)    (1.12)

Pn (x) is called the nth Taylor polynomial for f about x0 , and Rn (x) is called the remainder
term (or truncation error). In the case when x0 = 0, the Taylor polynomial is often called a
Maclaurin polynomial.
If we let n → ∞, the Taylor polynomial becomes the Taylor series for f about x0. For x0 = 0, the Taylor series is called the Maclaurin series.

ˆ Example:
We want to determine an approximate value of cos(0.01) using the second Maclaurin polynomial:

    cos x = 1 − (1/2)x² + (1/6)x³ sin(ξ)    (1.13)

where ξ is a number between 0 and x. Thus

    cos(0.01) = 0.99995 + 0.16 × 10⁻⁶ sin ξ    (1.14)

where we use the bar over 6 to indicate that this digit repeats indefinitely. Hence

    |cos(0.01) − 0.99995| = 0.16 × 10⁻⁶ |sin ξ| ≤ 0.16 × 10⁻⁶    (1.15)

which gives

    0.99994983 < 0.99995 − 0.16 × 10⁻⁶ ≤ cos(0.01) ≤ 0.99995 + 0.16 × 10⁻⁶ < 0.99995017    (1.16)

The error bound is much larger than the actual error. This is due in part to the poor bound we used for sin ξ. It can be shown that |sin x| ≤ |x|. Since 0 ≤ ξ ≤ 0.01, we find the bound 0.16 × 10⁻⁸.
Note that eq. (1.13) can also be written as cos x = 1 − (1/2)x² + (1/24)x⁴ cos(ξ′), and then the error is no more than (1/24) × 10⁻⁸ = 0.416 × 10⁻⁹.
This example illustrates two objectives of numerical analysis:
find an approximation to the solution and determine a bound for the error.
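A minimal Fortran sketch, in the spirit of the programs later in this chapter, that checks these numbers: it evaluates the second Maclaurin polynomial at x = 0.01, the actual error, and the bound obtained with |sin ξ| ≤ 1. The program is illustrative and not part of the textbook.

program maclaurin
  implicit none
  double precision :: x = 0.01d0, p2, err, bound

  p2    = 1.0d0 - x**2/2.0d0     ! second Maclaurin polynomial of cos(x)
  err   = abs(cos(x) - p2)       ! actual error
  bound = x**3/6.0d0             ! bound from (1/6) x^3 |sin(xi)| <= x^3/6
  print *, 'P2(0.01)     =', p2
  print *, 'actual error =', err, '   bound =', bound
end program maclaurin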

1.1.1 Exercises
The exercises are from the textbook sec 1.1 pages 14-16.
Tutor: Exercises 1,2,3,4,15,23
Students: All odd exercises except 17
Assignment 1: Exercises 15 and 26

ˆ Solution of exercise 13:
The fourth Taylor polynomial P4(x) for the function f(x) = x e^{x²} is obtained from

    f(x) = x + x³ + (1/2)x⁵ + O(x⁷)
    P4(x) = x + x³

The remainder is given by

    R4(x) = f⁽⁵⁾(ξ(x))/5! x⁵
          = (1/30) e^{ξ²} (15 + 90ξ² + 60ξ⁴ + 8ξ⁶) x⁵
          ≤ 1.211406197 x⁵
          ≤ 0.01240479946

where the last inequality follows by substituting ξ = x = 0.4.
The integral can be approximated by

    ∫₀^{0.4} f(x) dx ≈ ∫₀^{0.4} (x + x³) dx = 0.086400

The upper bound on the error is

    ∫₀^{0.4} 1.211406197 x⁵ dx = 0.0008269866307

To approximate f′(0.2) using P4′(0.2) we write

    f′(0.2) ≈ P4′(0.2) = (1 + 3x²)|_{x=0.2} = 1.12
    f′(0.2) = e^{x²}(1 + 2x²)|_{x=0.2} = 1.124075636

The actual error is

    1.124075636 − 1.12 = 0.004075636

ˆ Solution of exercise 15:
The nth derivative of cos(x) is

    cos(x)^(n) = cos(x + nπ/2)

According to Taylor’s theorem the error is

    Rn(x) = (x − x0)^{n+1}/(n + 1)! sin(ξ + nπ/2)

where ξ is between x and x0. For x = 42° and x0 = π/4 we find that the error is bounded by

    |Rn(x)| ≤ E = (π/60)^{n+1}/(n + 1)!

    n = 1 ,  E = 0.001370778390
    n = 2 ,  E = 0.00002392459621
    n = 3 ,  E = 3.131722321 × 10⁻⁷
    n = 4 ,  E = 3.279531946 × 10⁻⁹

So n = 3 is sufficient to get the value accurate to within 10⁻⁶. In this case P3(x) equals

    P3(x) = cos(x0) − sin(x0)(x − x0) − (1/2)cos(x0)(x − x0)² + (1/6)sin(x0)(x − x0)³
          = 0.7431446016

The actual value is cos 42° = 0.7431448255..., so the error is of the order of 0.2239 × 10⁻⁶.

ˆ Solution of exercise 26:
Let us assume that m = min[f(x1), f(x2)] and M = max[f(x1), f(x2)]. Since c1 and c2 are positive, we have

    c1 m ≤ c1 f(x1) ≤ c1 M
    c2 m ≤ c2 f(x2) ≤ c2 M

which leads to

    (c1 + c2) m ≤ c1 f(x1) + c2 f(x2) ≤ (c1 + c2) M

and therefore

    m ≤ [c1 f(x1) + c2 f(x2)]/(c1 + c2) ≤ M

Without loss of generality, assume that m = f(x1) and M = f(x2); then the last inequality gives

    f(x1) ≤ [c1 f(x1) + c2 f(x2)]/(c1 + c2) ≤ f(x2)

According to the intermediate value theorem there exists ξ between x1 and x2 such that

    f(ξ) = [c1 f(x1) + c2 f(x2)]/(c1 + c2)

1.2 Round-off Errors


ˆ An n-digit floating-point number in base β has the form

    x = ±(.d1 d2 ... dn)_β × β^e    (1.17)

where (.d1 d2 ... dn)_β is a β-fraction called the mantissa, and e is an integer called the exponent.
Such a floating-point number is said to be normalized if d1 ≠ 0, or else d1 = d2 = ... = dn = 0.

ˆ A 64-bit (binary digit) representation for real numbers is called a long real.
The first bit is a sign indicator, denoted s.
This is followed by an 11-bit exponent, c, called the characteristic,
and a 52-bit binary fraction, f, called the mantissa.
The base for the exponent is 2.

ˆ The 52 binary digits can hold up to about 16 decimal digits.
The 11-bit exponent gives a range of 0 to 2¹¹ − 1 = 2047.
To allow small numbers, the exponent is shifted (biased) by 1023, so the actual range is from
0 − 1023 = −1023 to 2047 − 1023 = 1024.

ˆ To save storage and provide a unique representation for each floating-point number we use the normalized form

    (−1)^s 2^{c−1023} (1 + f)    (1.18)

Example: consider the machine number

    0 10000000011 1011100100010000000000000000000000000000000000000000

The leftmost bit is zero, so the number is positive.
The next eleven bits give the characteristic, (10000000011)₂ = 1 + 2¹ + 2¹⁰ = 1027.
The exponent part is therefore 2^{1027−1023} = 2⁴. The final 52 bits specify the mantissa

    f = (.101110010001)₂ = 1/2 + 1/2³ + 1/2⁴ + 1/2⁵ + 1/2⁸ + 1/2¹²

This number represents

    (−1)^s 2^{c−1023} (1 + f) = 2⁴ × (1 + 0.722900390625) = 27.56640625

The next smallest machine number is represented by

    0 10000000011 1011100100001111111111111111111111111111111111111111

and the next largest by

    0 10000000011 1011100100010000000000000000000000000000000000000001

This means that our original number represents not only 27.56640625, but also half of the real numbers that lie between 27.56640625 and its two nearest machine-number neighbors (see Fig. 1.1).

[Figure 1.1: the machine number 27.56640625 on the real line, together with its next smallest neighbor (27.566406249...) and its next largest neighbor (27.566406250...).]

ˆ Underflow and overflow


..............

ˆ Round-off errors:
Round-off errors arise because it is impossible to represent all real numbers exactly on a finite-state
machine (which is what all practical digital computers are).
On a pocket calculator, if one enters 0.0000000000001 (or the maximum number of zeros possible),
then a ’+’, and then 100000000000000 (again, the maximum number of zeros possible), one will obtain
the number 100000000000000 again, and not 100000000000000.0000000000001. The calculator’s
answer is incorrect because of round-off in the calculation.

ˆ Round-off errors in a computer¹
The most basic source of errors in a computer is attributed to the error in representing a real number
with a limited number of bits.
The machine epsilon, ǫ, is the interval between 1 and the next number greater than 1 that is
distinguishable from 1. This means that no number between 1 and 1 + ǫ can be represented in
the computer.
Machine epsilon can be found by the following program:

10 E=1
20 IF E+1>1 THEN PRINT E ELSE STOP
30 E=E/2: GOTO 20
When numbers are added or subtracted, an accurate representation of the result may require many more digits than either operand needs. Serious round-off error occurs in two situations:
1. when adding (or subtracting) a very small number to (or from) a large number
2. when a number is subtracted from another that is very close to it
To test the first case on the computer, let us add 0.00001 to unity ten thousand times. The program to do this job would be:

10 sum=1
20 for i=1 to 10000
30 sum=sum+0.00001
40 next
50 print sum

The result of this program would be

    sum = 1.100136

Since the exact answer is 1.1, the relative error of this computation is

    (1.100136 − 1.1)/1.1 = 0.000124 = 0.0124%

The cause of this round-off error can be understood as follows. Consider the computation of 1 + 0.00001. The binary representations of 1 and 0.00001 in a 32-bit word are, respectively,

    (1)₁₀       = (0.1000 0000 0000 0000 0000 0000)₂ × 2¹
    (0.00001)₁₀ = (0.1010 0111 1100 0101 1010 1100)₂ × 2⁻¹⁶

Adjusting the exponents so that both numbers use 2¹, the addition becomes

      (0.10000 0000 0000 0000 0000 0000 0000 0000 0000 0000)₂ × 2¹
    + (0.00000 0000 0000 0000 1010 0111 1100 0101 1010 1100)₂ × 2¹
    = (0.10000 0000 0000 0000 1010 0111 1100 0101 1010 1100)₂ × 2¹

Since only 24 bits are available for the mantissa, the sum must be rounded, giving

    (0.1000 0000 0000 0000 0101 0100)₂ × 2¹

which is equivalent to (1.0000100136)₁₀.

Thus, whenever 0.00001 is added to 1, the result actually gains 0.0000100136 rather than 0.00001. When 0.00001 is added to 1 ten thousand times, the total gain is ten thousand times 0.0000100136, which explains the computed sum 1.100136. Although the calculated result gains in the present example, digits can also be lost when they are cut off. Both loss and gain are referred to as round-off error.

¹ Applied Numerical Methods with Software, Shoichiro Nakamura.

ˆ Strategies to minimize round off errors

1. Double precision
2. Grouping
3. Taylor expansion
4. Changing definition of variable
5. Rewriting the equation to avoid subtractions

Example:
We want to add 0.00001 ten thousand times to unity by using:
(a)-double precision
(b)-grouping method

Double precision method:

10 SUM=1.0D0
20 DO I=1,10000
30 SUM=SUM+0.00001D0
40 END DO
50 PRINT *, SUM

Grouping method:

SUM=1
DO 47 I=1,100
TOTAL=0
DO 40 K=1,100
TOTAL=TOTAL+0.00001
40 CONTINUE

SUM=SUM+TOTAL
47 CONTINUE
PRINT *, SUM

Example: As θ approaches 0, the accuracy of a numerical evaluation of

    d = [sin(1 + θ) − sin(1)]/θ

becomes very poor because of round-off errors. Using a Taylor expansion we can write

    sin(1 + θ) = sin(1) + θ cos(1) − 0.5 θ² sin(1) + ...

Therefore,

    d ≈ D = cos(1) − 0.5 θ sin(1)

The program below computes both d and D. As θ approaches 0, the Taylor approximation D remains accurate while the direct evaluation of d is increasingly contaminated by round-off.
The FORTRAN program is:

program testeps
implicit none
real :: d,da,t=1.0e0,h=10.0e0
integer :: i

do i=1,7
t=t/h
da=cos(1.0e0)-0.5e0*t*sin(1.0e0)
d=(sin(1.0e0+t)-sin(1.0e0))/t
print*,t,d,da
end do
end

The output is:

angle d D
-------------------------------------
0.10000000E-00 0.49736413 0.49822873
9.99999978E-03 0.53608829 0.53609490
9.99999931E-04 0.53993475 0.53988153
9.99999902E-05 0.54062998 0.54026020
9.99999884E-06 0.54383242 0.54029804
9.99999884E-07 0.54327762 0.54030186

ˆ Laws of arithmetic: Due to errors introduced in floating-point arithmetic, the associative and distributive laws of arithmetic are not always satisfied; that is,

    x + (y + z) ≠ (x + y) + z
    x × (y × z) ≠ (x × y) × z
    x × (y + z) ≠ (x × y) + (x × z)

Example: Let x = 0.456732 × 10⁻², y = 0.243451, and z = −0.248000.

    x + y       = 0.00456732 + 0.243451 = 0.248018
    (x + y) + z = 0.248018 − 0.248000 = 0.000018 = 0.180000 × 10⁻⁴
    y + z       = 0.243451 − 0.248000 = −0.00454900
    x + (y + z) = 0.00456732 − 0.00454900 = 0.00001832 = 0.183200 × 10⁻⁴

ˆ Chopping and rounding:
Let us use the normalized decimal floating-point form

    ±0.d1 d2 ... dk × 10ⁿ    (1.19)

where 1 ≤ d1 ≤ 9 and 0 ≤ di ≤ 9. Such a number is called a k-digit decimal machine number. Any positive real number within the numerical range of the machine can be normalized to the form

    y = 0.d1 d2 ... dk dk+1 dk+2 ... × 10ⁿ    (1.20)

The floating-point form of y, denoted by fl(y), is obtained by terminating the mantissa of y at k decimal digits. There are two ways of performing this termination. One, called chopping, simply chops off the digits dk+1 dk+2 .... The other method, called rounding, adds 5 × 10^{n−(k+1)} to y and then chops the result. So, when rounding, if dk+1 ≥ 5, we add 1 to dk to obtain fl(y); this is rounding up. When dk+1 < 5, we merely chop off all but the first k digits; this is rounding down. When rounding up, the digits and even the exponent might change.

Example: the number π = 0.314159... × 10¹. The floating-point form of π using five-digit chopping is fl(π) = 0.31415 × 10¹ = 3.1415. The floating-point form of π using five-digit rounding is 3.1416, because the sixth digit of the expansion of π is 9 ≥ 5.

ˆ Absolute error and relative error:
If p⋆ is an approximation to p, the absolute error is |p − p⋆|, and the relative error is |p − p⋆|/|p|.
ˆ Significant digits:
The number p∗ is said to approximate p to t significant digits if t is the largest nonnegative integer for which

    |p − p∗|/|p| ≤ 5 × 10⁻ᵗ    (1.21)

A more formal definition of significant digits is as follows. Let the true value have digits p = d1 d2 ... dk dk+1 ... dn and let the approximate value have digits p∗ = d1 d2 ... dk ek+1 ... en, where d1 ≠ 0 and where the first difference in the digits occurs at the (k + 1)st digit. We then say that p and p∗ agree to k significant digits if |dk+1 − ek+1| < 5. Otherwise, we say they agree to k − 1 significant digits.

Example: Let the true value be p = 10/3 and the approximate value p∗ = 3.333.
The absolute error is |10/3 − 3.333| = 1/3000.
The relative error is (1/3000)/(10/3) = 1/10000 = 10⁻⁴ < 5 × 10⁻⁴.
The number of significant digits is 4.

ˆ Assume that the floating-point representations f l(x) and f l(y) are given for the real number x and y
and the symbols ⊕, ⊖, ⊗, ⊘ represent addition, subtraction, multiplication, and division operations,
respectively. The finite-digit arithmetic is given by

x⊕y = f l(f l(x) + f l(y))


x⊖y = f l(f l(x) − f l(y))
x⊗y = f l(f l(x) × f l(y))
x⊘y = f l(f l(x) ÷ f l(y))

ˆ For k-digit chopping we have

    |(y − fl(y))/y| ≤ 10^{−k+1}    (1.22)

For k-digit rounding we have

    |(y − fl(y))/y| ≤ 0.5 × 10^{−k+1}    (1.23)

ˆ One of the most common errors involves the cancellation of significant digits due to the subtraction of two nearly equal numbers. Suppose we have two nearly equal numbers x and y, with x > y, and

    fl(x) = 0.d1 d2 ... dp αp+1 αp+2 ... αk × 10ⁿ
    fl(y) = 0.d1 d2 ... dp βp+1 βp+2 ... βk × 10ⁿ

The floating-point form of x − y takes the form

    fl(fl(x) − fl(y)) = 0.αp+1 αp+2 ... αk × 10^{n−p} − 0.βp+1 βp+2 ... βk × 10^{n−p}
                      = 0.σp+1 σp+2 ... σk × 10^{n−p}

The floating-point number used to represent x − y has at most k − p digits of significance. Any further calculation involving x − y retains the problem of having only k − p digits of significance.

ˆ Loss of significance: Consider, for example, x∗ = 0.76545421 × 10¹ and y∗ = 0.76544200 × 10¹ to be approximations to x and y, respectively, correct to seven significant digits. Then, in eight-digit floating-point arithmetic, the difference is z∗ = x∗ − y∗ = 0.12210000 × 10⁻³. But as an approximation to z = x − y, z∗ is good only to three digits, since the fourth significant digit of z∗ is derived from the eighth digits of x∗ and y∗, both possibly in error. Hence, while the error in z∗ is at most the sum of the errors in x∗ and y∗, the relative error in z∗ is possibly 10000 times the relative error in x∗ and y∗. Loss of significant digits is therefore dangerous only if we wish to keep the relative error small.

ˆ We can also introduce error when dividing by a small number or multiplying by a large number. Suppose, for example, that the number z has a finite-digit approximation z + δ, where the error δ is introduced by representation or by a previous calculation. If we divide by ǫ = 10⁻ⁿ, where n > 0, then

    z/ǫ ≈ fl( fl(z)/fl(ǫ) ) = (z + δ) × 10ⁿ

so the absolute error in this approximation, |δ| × 10ⁿ, is the original absolute error |δ| multiplied by the factor 10ⁿ.

ˆ Example:
Let p = 0.54617 and q = 0.54601. The exact value of r = p − q is 0.16 × 10⁻³. If we perform the subtraction using 4-digit rounding we find p∗ = 0.5462 and q∗ = 0.5460, so r∗ = p∗ − q∗ = 0.2 × 10⁻³. The relative error is

    |r − r∗|/|r| = 0.25

so r∗ has only one significant digit, whereas p∗ and q∗ were accurate to four and five significant digits, respectively.

ˆ Example:
The quadratic formula states that the roots of ax² + bx + c = 0, when a ≠ 0, are

    x± = [−b ± √(b² − 4ac)]/(2a)    (1.24)

Using four-digit rounding arithmetic, consider this formula applied to x² + 62.10x + 1 = 0, whose roots are approximately x+ = −0.01610723 and x− = −62.08390. We can see that b² ≫ 4ac, so the numerator of x+ involves the subtraction of two nearly equal numbers. With √(b² − 4ac) = 62.06, we get fl(x+) = −0.02, which is a poor approximation to x+ = −0.01611, with a relative error of about 2.4 × 10⁻¹. The other root, fl(x−) = −62.10, has a small relative error of around 3.2 × 10⁻⁴.
To obtain a more accurate approximation for x+ we rationalize the numerator:

    x+ = [−b + √(b² − 4ac)]/(2a)
       = [−b + √(b² − 4ac)]/(2a) × [b + √(b² − 4ac)]/[b + √(b² − 4ac)]
       = −2c/[b + √(b² − 4ac)]

so we get fl(x+) = −0.01610, which has the small relative error 6.2 × 10⁻⁴. We can also derive a similar formula for x−:

    x− = −2c/[b − √(b² − 4ac)]

In this case fl(x−) will be −50.00, which has the large relative error 1.9 × 10⁻¹.
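The following minimal Fortran sketch compares the two formulas for x+ on the same equation. It is an illustration only: it uses binary single precision rather than four-digit decimal rounding, so it shows the tendency of the cancellation (visible in the last digits) rather than reproducing the numbers above exactly.

program quadroot
  implicit none
  real :: a = 1.0, b = 62.10, c = 1.0, d, x1, x1r

  d   = sqrt(b*b - 4.0*a*c)
  x1  = (-b + d)/(2.0*a)      ! standard formula: subtraction of nearly equal numbers
  x1r = -2.0*c/(b + d)        ! rationalized form: no cancellation
  print *, 'standard formula :', x1
  print *, 'rationalized form:', x1r
end program quadroot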

ˆ Example:
This example shows how we can avoid loss of significance. We want to evaluate f(x) = 1 − cos(x) near zero in six-digit arithmetic. Since cos(x) ≈ 1 for x near zero, there will be loss of significant digits if we first compute cos(x) and then subtract it from 1. Without loss of generality, assume that x is close to zero with x > 0; we have

    cos(x) = 0.a1 a2 a3 a4 a5 a6 a7 ...

If x = x0 is close enough to zero we have

    cos(x0) = 0.999999a7 ...

and the difference is

    1 − cos(x0) = 0.100000 × 10⁻⁵ − 0.a7 a8 a9 ... × 10⁻⁶

If we use rounding and a7 ≥ 5, we cannot compute the value of 1 − cos(x) in six-digit arithmetic at all for x ≤ x0, because the rounded value of 1 − cos(x) is zero. For example, 1 − cos(0.001) evaluates to 0.000000, while the true value is 0.500000 × 10⁻⁶. To overcome this we can use another formula:

    1 − cos(x) = (1 − cos(x)) (1 + cos(x))/(1 + cos(x))
               = sin²(x)/(1 + cos(x))

Using this last equation for x = 0.001, we find

    1 − cos(0.001) = sin²(0.001)/(1 + cos(0.001))
                   = (0.1 × 10⁻⁵)/2
                   = 0.5 × 10⁻⁶

We can also use a Taylor polynomial,

    1 − cos x ≈ x²/2 − x⁴/24 + ...

which gives

    1 − cos(0.001) ≈ 0.001²/2 − 0.001⁴/24 + ...
                   ≈ 0.5 × 10⁻⁶ − (0.1 × 10⁻¹¹)/24 + ...
                   = 0.5 × 10⁻⁶ − 0.416667 × 10⁻¹³ + ...
                   ≈ 0.5 × 10⁻⁶
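A minimal Fortran sketch comparing the three evaluations of 1 − cos(x) discussed above, run in single precision (an illustrative assumption, not the six-digit decimal arithmetic of the example) so that the cancellation in the direct form becomes visible.

program onemcos
  implicit none
  real :: x = 0.001
  real :: direct, ratio, taylor

  direct = 1.0 - cos(x)                ! suffers from cancellation near zero
  ratio  = sin(x)**2/(1.0 + cos(x))    ! rewritten to avoid the subtraction
  taylor = x**2/2.0 - x**4/24.0        ! two-term Taylor polynomial
  print *, direct, ratio, taylor
end program onemcos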

ˆ Example:
The value of the polynomial p(x) = 2x³ − 3x² + 5x − 4 at x = 3 can be calculated as:

    ⋆ x² = 9, x³ = 27, then we put everything together: p(3) = 54 − 27 + 15 − 4 = 38.
      We have five multiplications: x², x³, 2x³, 3x², 5x, plus one addition and two subtractions.
      We need 8 operations in total.
    ⋆ The polynomial can be arranged in nested form as p(x) = [(2x − 3)x + 5]x − 4.
      We need three multiplications, one addition, and two subtractions: six operations in total.

– In general, for a polynomial of degree n the direct evaluation needs (n − 1) + n = 2n − 1 multiplications: n − 1 for xⁿ, x^{n−1}, ..., x², and n for the products with the coefficients, an × xⁿ, an−1 × x^{n−1}, ..., a1 × x. For the nested form we need only n multiplications.
– Both forms need n addition/subtraction operations.

1.2.1 Exercises
Assignment: odd exercises From section 1.2 pages 26-29
Tutorial: 1,

ˆ Exercise 1: p = π = 3.1415926... and p∗ = 22/7 = 3.142857. The absolute error satisfies

    |p − p∗| = |3.142857 − 3.1415926...|
    0.0012644 < |p − p∗| < 0.0012645

If we round it we find that the absolute error is 0.00126. For the relative error,

    0.0012644/3.1415927 < |p − p∗|/|p| < 0.0012645/3.1415926
    4.0247 × 10⁻⁴ < |p − p∗|/|p| < 4.0250 × 10⁻⁴

If we round it we find that the relative error is 4.025 × 10⁻⁴.
The relative error in p∗, as an approximation to p, is defined by α = |p − p∗|/|p|. Note that this number is close to |p − p∗|/|p∗| if α ≪ 1. One can show that

    |p − p∗|/|p| = α  ⇒  |p − p∗|/|p∗| = α/|1 ± α| ≈ α    (1.25)

ˆ Exercise 5: Three-digit rounding arithmetic for:

a) 133 + 0.921 = 133.921, rounded to 134.
b) 133 − 0.499 = 132.501, rounded to 133.
c) (121 − 0.327) − 119 = 120.673 − 119 = 121 − 119 = 2.
d) (121 − 119) − 0.327 = 2 − 0.327 = 1.673, rounded to 1.67.
e) (13/14 − 6/7)/(2e − 5.4) = (0.929 − 0.857)/(5.44 − 5.40) = 0.072/0.04 = 1.80.

The absolute errors and relative errors are:
a) 0.79 × 10⁻¹ and 0.59 × 10⁻³
b) 0.499 and 0.377 × 10⁻²
e) 0.154 and 0.0786

Assignment

Suppose two points (x0, y0) and (x1, y1) lie on a straight line with y0 ≠ y1. The x-intercept of the line is given by

    x = (x0 y1 − x1 y0)/(y1 − y0)

or

    x = x0 − (x1 − x0) y0/(y1 − y0)

Group 1: Use the data (x0, y0) = (1.31, 3.24) and (x1, y1) = (1.93, 4.76) and three-digit rounding arithmetic to compute the x-intercept both ways. Which formula is better, and why?

Group 2: Use the data (x0, y0) = (0.2, 0.2) and (x1, y1) = (1.2, 1.01) and three-digit rounding arithmetic to compute the x-intercept both ways. Which formula is better, and why?

Solution

    X1 := (x0*y1 - x1*y0)/(y1 - y0)
    X2 := x0 - y0*(x1 - x0)/(y1 - y0)

    Group 1                                 Group 2
    -------------------------------------   -------------------------------------
    x0 := 1.31                              x0 := 0.2
    y0 := 3.24                              y0 := 0.2
    x1 := 1.93                              x1 := 1.2
    y1 := 4.76                              y1 := 1.01
    Actual solution is -0.01157894737       Actual solution is -0.04691358025
    X1 := -0.00658                          X1 := -0.0469
    X2 := -0.01                             X2 := -0.047
    Relative error for X1 is 0.4317272728   Relative error for X1 is 0.000289473750
    Relative error for X2 is 0.1363636365   Relative error for X2 is 0.001842105197
    X2 is better than X1                    X1 is better than X2

Chapter 2

Solution of Equations in One Variable

Finding the roots of a function f is very important in science and engineering, and it is not always simple. Let us consider the function

    f(x) = (1 − x)⁸ = 1 − 8x + 28x² − 56x³ + 70x⁴ − 56x⁵ + 28x⁶ − 8x⁷ + x⁸

It is clear that x = 1 is the only real root of f. The graph of the expanded form of f is given in Fig. 2.1. It appears to show many roots for f, because round-off errors in evaluating the expanded form produce many small positive and negative values of f(x) near x = 1.

[Figure 2.1: plot of the expanded form of f(x) for 0.97 ≤ x ≤ 1.03, with values on the order of 10⁻¹⁴. The strange behavior of f(x) near x = 1 is due to round-off errors in the computation of the expanded form of f(x).]

2.1 Bisection Method

ˆ Definition: The first technique, based on the intermediate value theorem, is called the bisection method. To begin, set a1 = a and b1 = b, and let p1 be the midpoint of the interval [a, b]; that is,

    p1 = a1 + (b1 − a1)/2 = (a1 + b1)/2    (2.1)

If f(p1) = 0, then the root of f(x) = 0 is p = p1. If f(p1) ≠ 0, then f(p1) has the same sign as either f(a1) or f(b1). When f(p1) and f(a1) have the same sign, p ∈ (p1, b1), and we set a2 = p1 and b2 = b1. When f(p1) and f(b1) have the same sign, p ∈ (a1, p1), and we set a2 = a1 and b2 = p1. We then reapply the process to the interval [a2, b2].
ˆ Algorithm:
INPUT: endpoints a, b; Tolerance T OL; maximum number of iteration N0
OUTPUT: approximate solution p or message of failure.
Step 1: Set i=1; FA=f(a);

Step 2: while i ≤ N0 do steps 3-6.


Step 3: set p = a + (b − a)/2; (compute pi )
FP=f(p);
Step 4: if F P = 0 or (b − a)/2 < T OL then OUTPUT (p);
(procedure terminated successfully)
STOP.
Step 5: set i = i + 1
Step 6: If F A.F P > 0 then set a = p; (compute ai , bi ) FA=FP
else set b = p

Step 7: OUTPUT (“method failed after N0 iterations”)


STOP
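A minimal Fortran sketch of this algorithm, in the spirit of the programs of Chapter 1. As an illustrative assumption it uses f(x) = x³ − 25 on [2, 3] from Exercise 13 below; the tolerance and iteration limit are also illustrative.

program bisect
  implicit none
  real :: a = 2.0, b = 3.0, tol = 1.0e-4
  real :: p, fa, fp
  integer :: i, nmax = 50

  fa = f(a)
  do i = 1, nmax
     p  = a + (b - a)/2.0          ! midpoint, written to limit round-off
     fp = f(p)
     if (fp == 0.0 .or. (b - a)/2.0 < tol) then
        print *, 'approximate root =', p, ' after', i, 'iterations'
        stop
     end if
     if (fa*fp > 0.0) then         ! root lies in [p, b]
        a  = p
        fa = fp
     else                          ! root lies in [a, p]
        b = p
     end if
  end do
  print *, 'method failed after', nmax, 'iterations'
contains
  real function f(x)
    real, intent(in) :: x
    f = x**3 - 25.0
  end function f
end program bisect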
ˆ The stopping procedure in Step 4 can be refined by selecting a tolerance ǫ > 0 and one of the following conditions:

    |pN − pN−1| < ǫ    (2.2)
    |pN − pN−1|/|pN| < ǫ ,  pN ≠ 0    (2.3)
    |f(pN)| < ǫ    (2.4)

The first and the last are not good measures of the tolerance (see Ex. 16 and 17, page 52). The middle one, which is the relative error, is better than the other two.

ˆ If you apply the bisection method to the function f(x) = 1/(x − 1) on the interval [0, 2], you will find that the method catches the singularity at x = 1. The stopping condition |f(pN)| < ǫ will never be satisfied for any number of iterations N0, and the method fails.
ˆ To start the algorithm we need to check that the product f(a) · f(b) ≤ 0. We can use the sign function defined by

    sign(x) = −1 if x < 0,  sign(x) = 0 if x = 0,  sign(x) = 1 if x > 0    (2.5)

and simply test sign(f(a)) · sign(f(b)) instead of f(a) · f(b).

ˆ It is good practice to set an upper bound for the number of iterations. This eliminates the possibility of entering an infinite loop.

ˆ It is good to choose the interval [a, b] as small as possible, so that the number of iterations is reduced.

ˆ The bisection method is slow to converge; N may become quite large for a small tolerance.

ˆ Theorem: Suppose that f ∈ C[a, b] and f(a) · f(b) < 0. The bisection method generates a sequence {pn} approximating a zero p of f with

    |pn − p| ≤ (b − a)/2ⁿ ,  n ≥ 1.    (2.6)

Proof: For each n ≥ 1, we have

    b1 = b ,  a1 = a ,  and  bn − an = (b − a)/2^{n−1}    (2.7–2.9)

Since pn = (an + bn)/2 and p ∈ (an, bn), we also have

    |p − pn| ≤ (bn − an)/2 = (b − a)/2ⁿ    (2.10)

which shows that the sequence {pn} converges to p.

ˆ The bound on the number of iterations assumes calculations performed with infinite-digit arithmetic. When implementing the method on a computer, we have to consider round-off error. For example, the midpoint of [an, bn] should be computed from

    pn = an + (bn − an)/2    (2.11)

instead of the algebraically equivalent equation

    pn = (an + bn)/2    (2.12)

The first equation adds a small correction (bn − an)/2 to the known value an; if bn − an is near the maximum precision of the machine, this correction will not significantly affect pn. However, (an + bn)/2 may return a midpoint that is not even in the interval [an, bn].

ˆ Exercises:
Odd numbers of sec. 2.1 page 51-52.

– Ex 13:
We want an approximate value of ∛25 correct to within 10⁻⁴.
Consider the function f(x) = x³ − 25 and choose the interval [2, 3]. We have f(2) = −17 and f(3) = 2. The two values have different signs, so we can apply the bisection method.
    n    an          bn          pn          bn − an        f(pn)
    1    2           3           2.5         1              −9.3750
    2    2.5         3           2.75        0.5            −4.2031
    3    2.75        3           2.8750      0.25           −1.2363
    4    2.875       3           2.93750     0.125          +0.34741
    5    2.8750      2.93750     2.906250    0.0625         −0.45297
    6    2.90625     2.93750     2.921875    0.031250       −0.054920
    7    2.921875    2.937500    2.929688    0.015625       +0.145710
    8    2.9218750   2.9296875   2.9257812   0.0078125      +0.0452607
    9    2.9218750   2.9257812   2.9238281   0.0039062      −0.0048632
    10   2.9238      2.9258      2.9248      1.9531E−03     +2.0190E−02
    11   2.9238      2.9248      2.9243      9.7656E−04     +7.6615E−03
    12   2.9238      2.9243      2.9241      4.8828E−04     +1.3986E−03
    13   2.9238      2.9241      2.9240      2.4414E−04     −1.7324E−03
    14   2.9240      2.9241      2.9240      1.2207E−04     −1.6692E−04

So the approximate value of ∛25 is p14 = 2.9240, because (b14 − a14)/2 = 6.1035E−05 and (b13 − a13)/2 = 1.2207E−04. If we use the theorem,

    |pn − p| ≤ (b − a)/2ⁿ < 10⁻⁴    (2.13)

we find that

    1/2ⁿ < 10⁻⁴  ⇒  −n log 2 < −4  ⇒  n > 4/log 2 = 13.288

So n should be at least 14.
– Ex 18:
The function f(x) = sin(πx) has zeros at every integer. We want to show that when −1 < a < 0 and 2 < b < 3, the bisection method converges to
    0 if a + b < 2,
    2 if a + b > 2,
    1 if a + b = 2.
We have to check the signs of sin(aπ), sin((a + b)π/2), and sin(bπ) at each iteration.
For the starting interval: for a ∈ (−1, 0) we have sin(aπ) ∈ [−1, 0), and for b ∈ (2, 3) we have sin(bπ) ∈ (0, 1]. So the bisection method applies on the interval [a, b]. The only roots we can obtain are 0, 1, or 2, because these are the only integers in (−1, 3).
Next, we check the sign of sin((a + b)π/2). We know that a + b ∈ (1, 3).

    * If a + b < 2, the first midpoint p = (a + b)/2 satisfies 0.5 < p < 1 and sin(pπ) > 0.
      The signs of sin(aπ) < 0 and sin(pπ) > 0 differ, so the method continues on [a, p], and the
      only root between a and p is 0. Therefore the bisection method converges to 0.

    * If a + b > 2, then 1 < p = (a + b)/2 < 1.5 and sin(pπ) < 0.
      The signs of sin(pπ) < 0 and sin(bπ) > 0 differ, so the method continues on [p, b], and the
      only root between p and b is 2. Therefore the bisection method converges to 2.

    * If a + b = 2, then p = 1 and sin(pπ) = 0, which is of course the root 1.

ˆ We will solve exercises 1,3,11,15,16 from sec 2.1 page 51-52.

2.2 Fixed-Point Iteration


ˆ Definition: A number p is a fixed point for a given function g if

g(p) = p (2.14)

ˆ Theorem: If g ∈ C[a, b] and g(x) ∈ [a, b] for all x ∈ [a, b], then g has a fixed point in [a, b].
If, in addition, g′(x) exists on (a, b) and a positive constant k < 1 exists with

    |g′(x)| ≤ k, for all x ∈ (a, b)    (2.15)

then the fixed point in [a, b] is unique.


Proof: Consider the function f(x) = g(x) − x. We have the following relations:

    f(b) = g(b) − b ≤ 0  and  f(a) = g(a) − a ≥ 0    (2.16)

So, according to the intermediate value theorem, f(x) = 0 has a root; thus g(x) = x has a solution, and g has a fixed point.

Assume now that there is more than one fixed point, say p and q with p ≠ q. From the mean value theorem there exists ξ between p and q such that

    [g(p) − g(q)]/(p − q) = g′(ξ) = 1

which contradicts the fact that |g′(x)| < 1, so there is only one fixed point.
What about the case |g′(x)| > 1: does such a function exist?
Indeed, if |g′(x)| ≠ 1 for all x, the only possible case is |g′(x)| < 1.
Proof: Suppose that |g′(x)| ≠ 1 for all x ∈ (a, b). Because g(x) ∈ [a, b] and g′(x) exists, we have g′(x) < −1, or −1 ≤ g′(x) < 1, or g′(x) > 1.
The first and last cases are not possible because, by the mean value theorem,

    −1 ≤ [g(b) − g(a)]/(b − a) < 1    (2.17)

So we can say that no function with |g′(x)| > 1 (for all x) exists.

ˆ Algorithm:
INPUT: initial point p0; tolerance TOL; maximum number of iterations N0
OUTPUT: approximate solution p or message of failure.
Step 1: Set i=1;
Step 2: while i ≤ N0 do steps 3-6.

Step 3: set p = g(p0); (compute pi )


Step 4: if |p − p0 | < T OL then
OUTPUT (p);
(procedure terminated successfully)
STOP.
Step 5: set i = i + 1
Step 6: Set p0 = p; (update p0 )

Step 7: OUTPUT (“method failed after N0 iterations”)


STOP.
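A minimal Fortran sketch of this algorithm. As an illustrative assumption it uses g(x) = (3x² + 3)^(1/4) and p0 = 1 from Exercise 5 below; the tolerance and iteration limit are also illustrative.

program fixedpoint
  implicit none
  real :: p0 = 1.0, tol = 1.0e-2
  real :: p
  integer :: i, nmax = 30

  do i = 1, nmax
     p = (3.0*p0**2 + 3.0)**0.25          ! p_i = g(p_{i-1})
     if (abs(p - p0) < tol) then
        print *, 'fixed point approx =', p, ' after', i, 'iterations'
        stop
     end if
     p0 = p                               ! update p0
  end do
  print *, 'method failed after', nmax, 'iterations'
end program fixedpoint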

ˆ Fixed-point theorem If g ∈ C[a, b] and g(x) ∈ [a, b] for all x ∈ [a, b], suppose in addition, g ′ (x)
exists on (a, b) and a positive constant 0 < k < 1 exists with

|g ′ (x)| ≤ k, for all x ∈ (a, b) (2.18)

then for any p0 ∈ [a, b], the sequence pn = g(pn−1 ) converges to the unique fixed point p ∈ [a, b].
Proof:

ˆ If g satisfies the fixed-point theorem, then bounds for the error involved in using pn to approximate p are given by

    |p − pn| ≤ kⁿ max{p0 − a, b − p0}    (2.19)
    |p − pn| ≤ kⁿ/(1 − k) |p1 − p0|    (2.20)

proof:

ˆ Study examples 3 and 4 page 57

ˆ Exercises:

– Ex. 5: We use a fixed-point iteration method to determine a solution accurate to within 10⁻² for x⁴ − 3x² − 3 = 0 on [1, 2], with p0 = 1.
Solution:
To use fixed-point iteration we have to find a function g(x) which satisfies the fixed-point theorem. The equation x⁴ − 3x² − 3 = 0 leads to x⁴ = 3x² + 3, which in turn leads to x = (3x² + 3)^{1/4}. Now we check whether the function g(x) = (3x² + 3)^{1/4} satisfies the fixed-point theorem. Its derivative is

    g′(x) = (3/2) x/(3x² + 3)^{3/4}

The derivative is always positive on [1, 2], so g is increasing there; g(1) = 1.565084580 and g(2) = 1.967989671, and therefore g(x) ∈ [1, 2]. Moreover

    g′(x) = (3/2) x/(3x² + 3)^{3/4} ≤ (3/2) · 2/(3 × 1² + 3)^{3/4} = 0.7825422900 < 1

So the function g(x) satisfies the fixed-point theorem. More precisely, the second derivative is

    g″(x) = −(3/4)(x² − 2)/[(x² + 1)(3x² + 3)^{3/4}]

On [1, 2] we have g′(1) = 0.3912711450, g′(2) = 0.3935979342, and g′(√2) = 0.4082482906, so g′(x) ≤ 0.4082482906 < 1. Therefore k = 0.4082482906.
According to the theorem we have

    |p − pn| ≤ kⁿ max[p0 − a, b − p0]
             ≤ 0.4082482906ⁿ ≤ 10⁻²

which implies that n ≥ 6. The answer is p6 = 1.943316930, which is accurate to within 10⁻².
– Ex 7:
We want to show that the function g(x) = π + 0.5 sin(x/2) has a unique fixed point on [0, 2π]. We know that

    0 < π − 0.5 ≤ g(x) ≤ π + 0.5 < 2π    (2.21)

The derivative of the function g is bounded by

    |g′(x)| = |cos(x/2)|/4 ≤ 1/4    (2.22)

Therefore the function has a unique fixed point. To find an approximation to the fixed point that is accurate to within 10⁻², let us estimate the number of iterations. Using p0 = π,

    |p − pn| ≤ π (1/4)ⁿ ≤ 10⁻²    (2.23)

This leads to a number of iterations n ≥ 5.
We can do better using

    |p − pn| ≤ kⁿ/(1 − k) |p1 − p0|    (2.24)

We have p1 = π + 1/2, so the last equation becomes

    |p − pn| ≤ (1/4)ⁿ/(3/4) · (1/2) = (2/3)(1/4)ⁿ ≤ 10⁻²    (2.25)

This leads to a number of iterations n ≥ 4. Therefore

    p0 = 3.141592654
    p1 = 3.641592654
    p2 = 3.626048865
    p3 = 3.626995623
    p4 = 3.626938795
    p5 = 3.626942209

2.3 Newton’s Method


ˆ Derivation of Newton’s (or the Newton-Raphson) method:
Suppose that f ∈ C²[a, b], i.e., f has a continuous second derivative. Let p0 ∈ [a, b] be an approximation to the solution p of f(x) = 0 such that f′(p) ≠ 0 and |p − p0| is “small”. The first Taylor polynomial of f(x) about p0, evaluated at x = p, is

    f(p) = f(p0) + (p − p0) f′(p0) + (p − p0)²/2 f″(ξ(p)),    (2.26)

where ξ(p) lies between p and p0. Since f(p) = 0, neglecting the quadratic term gives

    p ≈ p1 = p0 − f(p0)/f′(p0)    (2.27)

Repeating the process, we can write the sequence

    pn = pn−1 − f(pn−1)/f′(pn−1) ,  n ≥ 1    (2.28)

The last equation is what we call Newton’s method. (A short program sketch is given at the end of this section.)
ˆ Algorithm (please see Page 64)
ˆ Newton’s method is a functional iteration technique of the form pn = g(pn−1), where

    g(pn−1) = pn−1 − f(pn−1)/f′(pn−1) ,  n ≥ 1    (2.29)

ˆ Newton’s method cannot be continued if f′(pn−1) = 0.


ˆ The derivation of Newton’s method depends on the assumption that p0 is close to p. So it is important that the initial approximation p0 be chosen close to the actual value p. In some cases Newton’s method converges even with a poor initial approximation.
ˆ The following theorem illustrates the theoretical importance of the initial approximation choice of
p0 .
Theorem: Let f ∈ C 2 [a, b]. If p ∈ [a, b] is such that f (p) = 0 and f ′ (p) 6= 0, then there exists a δ > 0
such that Newton’s method generates a sequence {pn } converging to p for any initial approximation
p0 ∈ [p − δ, p + δ].
Proof: (see page 66).

ˆ The theorem states, that under reasonable assumptions, Newton’s method converges provided a
sufficiently accurate initial approximation is chosen. In practice, the method doesn’t tell us how to
calculate δ. In general either the method converges quickly or it will be clear that convergence is
unlikely.

ˆ Newton’s method is a powerful technique, but it has a major weakness: the need for the first derivative. The calculation of the first derivative f′(x) needs more arithmetic operations than that of f(x).
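A minimal Fortran sketch of the Newton iteration (2.28), referred to in the derivation above. As an illustrative assumption it uses f(x) = x³ − 2x² − 5 with p0 = 2.5, the data of the worked exercise at the end of Section 2.5; the tolerance is illustrative.

program newton
  implicit none
  double precision :: p0 = 2.5d0, tol = 1.0d-4
  double precision :: p
  integer :: i, nmax = 20

  do i = 1, nmax
     ! p_n = p_{n-1} - f(p_{n-1})/f'(p_{n-1}) with f = x^3 - 2x^2 - 5
     p = p0 - (p0**3 - 2.0d0*p0**2 - 5.0d0)/(3.0d0*p0**2 - 4.0d0*p0)
     if (abs(p - p0) < tol) then
        print *, 'approximate root =', p, ' after', i, 'iterations'
        stop
     end if
     p0 = p
  end do
  print *, 'method failed after', nmax, 'iterations'
end program newton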

2.4 Secant Method


Newton’s method is a powerful technique, but it has a major weakness: the need for the first derivative. The calculation of the first derivative f′(x) needs more arithmetic operations than that of f(x). The secant method avoids this by approximating the derivative.

ˆ We know that the first derivative is defined by

    f′(pn−1) = lim_{x→pn−1} [f(x) − f(pn−1)]/(x − pn−1)    (2.30)

Letting x = pn−2, we have

    f′(pn−1) ≈ [f(pn−2) − f(pn−1)]/(pn−2 − pn−1)    (2.31)

Using this approximation in Newton’s method, we get

    pn = pn−1 − f(pn−1)(pn−1 − pn−2)/[f(pn−1) − f(pn−2)]    (2.32)

This is called the Secant Method. Starting with two initial approximations p0 and p1, the approximation p2 is the x-intercept of the line joining the two points (p0, f(p0)) and (p1, f(p1)).
The approximation p3 is the x-intercept of the line joining (p1, f(p1)) and (p2, f(p2)) (see Fig. 2.2), and so on.
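A minimal Fortran sketch of the secant iteration (2.32), applied to the same equation x³ − 2x² − 5 = 0 used for Newton’s method above; the two starting values and the tolerance are illustrative assumptions.

program secant
  implicit none
  double precision :: p0 = 2.0d0, p1 = 3.0d0, tol = 1.0d-4
  double precision :: p2
  integer :: i, nmax = 30

  do i = 1, nmax
     p2 = p1 - f(p1)*(p1 - p0)/(f(p1) - f(p0))   ! secant step
     if (abs(p2 - p1) < tol) then
        print *, 'approximate root =', p2, ' after', i, 'iterations'
        stop
     end if
     p0 = p1
     p1 = p2
  end do
  print *, 'method failed after', nmax, 'iterations'
contains
  double precision function f(x)
    double precision, intent(in) :: x
    f = x**3 - 2.0d0*x**2 - 5.0d0
  end function f
end program secant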

2.5 False Position Method

[Figure 2.2: Secant method and false position method for finding the root of f(x) = 0.]

ˆ False position method: this method generates approximations in the same way as the secant method, but includes a test to ensure that the root is always bracketed between successive iterations. The method is generally not recommended; it is presented only to illustrate how bracketing can be incorporated.
First we choose two approximations p0 and p1 such that f(p0) · f(p1) < 0. The approximation p2 is chosen in the same manner as in the secant method, as the x-intercept of the line joining (p0, f(p0)) and (p1, f(p1)). To decide which secant line to use to compute p3, we check the sign of f(p1) · f(p2). If it is negative, then p1 and p2 bracket a root, and we choose p3 as the x-intercept of the line joining (p1, f(p1)) and (p2, f(p2)). If not, we choose p3 as the x-intercept of the line joining (p0, f(p0)) and (p2, f(p2)). In a similar manner we find pn for n ≥ 4.

ˆ Exercises

Ex 5: We want to use Newton’s method to find an approximate solution, accurate to within 10⁻⁴, of the equation

    x³ − 2x² − 5 = 0,  in the interval [1, 4].

Solution:
Newton’s method is

    pn = pn−1 − f(pn−1)/f′(pn−1) ,  n ≥ 1    (2.33)

with f(x) = x³ − 2x² − 5 and f′(x) = 3x² − 4x.
The approximation is accurate to the decimal places in which pn−1 and pn agree. Starting with p0 = 2.5, Newton’s method gives:

    n    pn−1           pn             |pn − pn−1|
    1    2.500000000    2.714285714    0.214285714
    2    2.714285714    2.690951516    0.023334198
    3    2.690951516    2.690647499    0.000304017
    4    2.690647499    2.690647448    5.1 × 10⁻⁸
    5    2.690647448    2.690647448    0.00

so the answer is p4 = 2.690647448.

If we use p0 = 2 we get

    n    pn−1           pn             |pn − pn−1|
    1    2.000000000    3.250000000    1.250000000
    2    3.250000000    2.811036789    0.438963211
    3    2.811036789    2.697989503    0.113047286
    4    2.697989503    2.690677153    0.007312350
    5    2.690677153    2.690647448    0.000029705

so the answer is p5 = 2.690647448.

Chapter 3

Error Analysis for Iterative Methods

3.1 Linearly and Quadratically Convergent Procedures


We investigate the order of convergence of functional iteration schemes.

ˆ Definition: Suppose {pn} is a sequence that converges to p, with pn ≠ p for all n. If positive constants λ and α exist with

    lim_{n→∞} |pn+1 − p|/|pn − p|^α = λ    (3.1)

then the sequence converges to p of order α, with asymptotic constant λ.

An iterative technique of the form pn = g(pn−1) is said to be of order α if the sequence {pn} converges to the solution p = g(p) with order α.

ˆ In general, a sequence with high order of convergence converges more rapidly than a sequence with
a lower order. The asymptotic constant affects the speed of convergence but is not as important as
the order.

ˆ If α = 1, the sequence is linearly convergent.
If α = 2, the sequence is quadratically convergent.
Assume that we have two sequences, one converging linearly to zero and the other converging quadratically to zero. For simplicity, suppose that

    |pn+1|/|pn| ≈ 0.5   (linearly convergent)
    |p̃n+1|/|p̃n|² ≈ 0.5   (quadratically convergent)

For the linearly convergent sequence, we have

    |pn − 0| = |pn| ≈ 0.5|pn−1| ≈ 0.5²|pn−2| ≈ ... ≈ 0.5ⁿ|p0|

For the quadratically convergent sequence, we have

    |p̃n − 0| = |p̃n| ≈ 0.5|p̃n−1|² ≈ 0.5³|p̃n−2|⁴ ≈ ... ≈ (0.5)^{2ⁿ−1}|p̃0|^{2ⁿ}

If we use p0 = p̃0 = 1, we see that

    |pn| ≈ 0.5ⁿ ,  |p̃n| ≈ (0.5)^{2ⁿ−1}

After seven iterations we get

    |p7| ≈ 0.78125 × 10⁻² ,  |p̃7| ≈ 0.58775 × 10⁻³⁸

In order for the linearly convergent sequence to reach the same accuracy as the quadratically convergent one, we need

    (0.5)ⁿ = (0.5)^{2⁷−1}  ⇒  n = 2⁷ − 1 = 127
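A minimal Fortran sketch generating the two model sequences above with p0 = p̃0 = 1; after seven steps it reproduces the values 0.78125 × 10⁻² and 0.58775 × 10⁻³⁸. The iteration count is illustrative.

program convrate
  implicit none
  double precision :: plin = 1.0d0, pquad = 1.0d0
  integer :: n

  do n = 1, 7
     plin  = 0.5d0*plin          ! linear:    |p_n|  = 0.5 |p_{n-1}|
     pquad = 0.5d0*pquad**2      ! quadratic: |p~_n| = 0.5 |p~_{n-1}|^2
     print *, n, plin, pquad
  end do
end program convrate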

ˆ Theorem: Let g ∈ C[a, b] be such that g(x) ∈ [a, b] for all x ∈ [a, b]. Suppose, in addition, that g′ is continuous on (a, b) and that a positive constant k < 1 exists with |g′(x)| ≤ k for all x ∈ (a, b). If g′(p) ≠ 0, then for any p0 ∈ [a, b], the sequence

    pn = g(pn−1), n ≥ 1,    (3.2)

converges only linearly to the unique fixed point p in [a, b].
proof:
ˆ Theorem: Let p be a solution of the equation x = g(x). Suppose that g′(p) = 0 and g″ is continuous with |g″(x)| < M on an open interval I containing p. Then there exists a number δ > 0 such that, for p0 ∈ [p − δ, p + δ], the sequence defined by pn = g(pn−1), where n ≥ 1, converges at least quadratically to p. Moreover, for sufficiently large values of n,

    |pn+1 − p| < (M/2) |pn − p|².    (3.3)

Proof:
ˆ The easiest way to construct a fixed-point problem associated with a root-finding problem f(x) = 0 is to subtract a multiple of f(x) from x:

    pn = g(pn−1), with g(x) = x − φ(x) f(x),

where φ(x) is a differentiable function that will be chosen later.
If p satisfies f(p) = 0, then it is clear that g(p) = p.
For the iteration procedure derived from g to be quadratically convergent, we need g′(p) = 0 when f(p) = 0. Since

    g′(x) = 1 − φ′(x) f(x) − φ(x) f′(x)

we have

    g′(p) = 1 − φ′(p) f(p) − φ(p) f′(p) = 1 − φ(p) f′(p)

which implies

    φ(p) = 1/f′(p)

If we let φ(x) = 1/f′(x), we ensure that φ(p) = 1/f′(p) and produce the quadratically convergent procedure

    pn = g(pn−1) = pn−1 − f(pn−1)/f′(pn−1)    (3.4)

This is of course Newton’s method, which is quadratically convergent provided that f′(pn−1) ≠ 0.

3.2 Zero multiplicity

ˆ Definition: A solution p of f(x) = 0 is a zero of multiplicity m of f if for x ≠ p we can write f(x) = (x − p)^m q(x), where lim_{x→p} q(x) ≠ 0.

ˆ Theorem: f ∈ C¹[a, b] has a simple zero at p in (a, b) if and only if f(p) = 0 but f′(p) ≠ 0.
Proof:

ˆ If p is a simple root of f, then Newton’s method converges quadratically. If p is not a simple root, then Newton’s method may not converge quadratically (see Example 2, page 79).

ˆ Theorem: The function f ∈ C^m[a, b] has a zero of multiplicity m at p in (a, b) if and only if

    f(p) = f′(p) = ... = f^{(m−1)}(p) = 0  and  f^{(m)}(p) ≠ 0

ˆ One method to handle problems with multiple roots is to define the function

    µ(x) = f(x)/f′(x)    (3.5)

If p is a zero of f of multiplicity m and f(x) = (x − p)^m q(x), then

    µ(x) = (x − p) q(x)/[m q(x) + (x − p) q′(x)]

which has a simple zero at p. Newton’s method can be applied to µ to give

    g(x) = x − µ(x)/µ′(x)
         = x − f(x) f′(x)/{[f′(x)]² − f(x) f″(x)}    (3.6)

If g has the required continuity conditions, functional iteration applied to g will be quadratically convergent regardless of the multiplicity of the zero of f. Theoretically, the only drawback of this method is the additional calculation of the second derivative f″. In practice, multiple roots can cause serious round-off problems, since the denominator consists of the difference of two numbers that are both close to zero.
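A minimal Fortran sketch of the modified iteration (3.6). As an illustrative assumption it uses the double-root polynomial f(x) = x³ − 3x + 2 = (x − 1)²(x + 2), which reappears in the Aitken example of Chapter 4; the starting point and iteration count are illustrative, and the round-off sensitivity of the denominator noted above applies once p is very close to the root.

program modnewton
  implicit none
  double precision :: p = 2.0d0, fp, f1, f2
  integer :: n

  do n = 1, 8
     fp = p**3 - 3.0d0*p + 2.0d0       ! f(p)
     f1 = 3.0d0*p**2 - 3.0d0           ! f'(p)
     f2 = 6.0d0*p                      ! f''(p)
     p  = p - fp*f1/(f1**2 - fp*f2)    ! modified Newton step (3.6)
     print *, n, p
  end do
end program modnewton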

3.3 Exercises
Ex 6: Show that the following sequences converge linearly to p = 0. How large must n be before we have |p − pn| ≤ 5 × 10⁻²?
a) pn = 1/n.  b) qn = 1/n².

Sol 6: It is clear that the limit of both sequences is zero, and that

    lim_{n→∞} |pn+1 − p|/|pn − p| = 1

For (a), 1/n ≤ 5 × 10⁻² implies that n ≥ 20. For (b), 1/n² ≤ 5 × 10⁻² implies that n² ≥ 20, which in turn gives n ≥ 5.

Ex 8a: We want to show that the sequence pn = 10^{−2ⁿ} converges quadratically to 0.

Sol 8a: We know that

    lim_{n→∞} 10^{−2ⁿ} = 0

We now calculate the following limit:

    lim_{n→∞} 10^{−2^{n+1}}/(10^{−2ⁿ})² = lim_{n→∞} (10^{−2ⁿ})²/(10^{−2ⁿ})² = 1

Thus α = 2, and the sequence converges quadratically to zero.

Ex 8b: We want to show that the sequence pn = 10^{−nᵏ} does not converge to zero quadratically, regardless of the size of the exponent k > 1.

Sol 8b: We know that

    lim_{n→∞} 10^{−nᵏ} = 0

We now examine the limit

    lim_{n→∞} 10^{−(n+1)ᵏ}/(10^{−nᵏ})² = lim_{n→∞} 10^{2nᵏ − (n+1)ᵏ}

To find this limit, note that

    lim_{n→∞} [2nᵏ − (n + 1)ᵏ] = lim_{n→∞} nᵏ [2 − ((n + 1)/n)ᵏ] = ∞

So we cannot find a finite positive number λ. Therefore the sequence does not converge quadratically.

Chapter 4

Accelerating Convergence

4.1 Aitken’s ∆2 method


ˆ We consider a technique that can be used to accelerate the convergence of a sequence that is linearly convergent.
Suppose that {pn} is a linearly convergent sequence with limit p. Aitken’s ∆² method constructs the sequence

    p̂n = pn − (pn+1 − pn)²/(pn+2 − 2pn+1 + pn)
       = pn − (∆pn)²/∆²pn ,  for n ≥ 0    (4.1)

which, on the assumption that it converges, converges more rapidly to p than does the original sequence {pn}. The symbol ∆pn is the forward difference, defined by

    ∆pn = pn+1 − pn ,  for n ≥ 0    (4.2)

Higher powers of the operator ∆ are defined recursively by

    ∆ᵏpn = ∆(∆ᵏ⁻¹pn)  for k ≥ 2    (4.4)

For example,

    ∆²pn = ∆(∆pn) = ∆(pn+1 − pn) = (pn+2 − pn+1) − (pn+1 − pn) = pn+2 − 2pn+1 + pn    (4.6)

ˆ Theorem: Suppose that {pn} is a sequence that converges linearly to the limit p and that

    lim_{n→∞} (pn+1 − p)/(pn − p) < 1    (4.7)

Then the sequence

    p̂n = pn − (∆pn)²/∆²pn    (4.8)

converges to p faster than {pn} in the sense that

    lim_{n→∞} (p̂n − p)/(pn − p) = 0    (4.9)

ˆ Example:
Let us consider pn = cos(1/n). This sequence converges linearly to p = 1:

    lim_{n→∞} [cos(1/(n+1)) − 1]/[cos(1/n) − 1]
        = lim_{n→∞} [n² sin(1/(n+1))]/[(n+1)² sin(1/n)]
        = lim_{n→∞} sin(1/(n+1))/sin(1/n)
        = lim_{n→∞} [n² cos(1/(n+1))]/[(n+1)² cos(1/n)]
        = 1

Aitken’s ∆² sequence is defined by

    p̂n = pn − (∆pn)²/∆²pn
       = pn − (pn+1 − pn)²/(pn+2 − 2pn+1 + pn)    (4.10)

n pn p̂n
1 0.5403023059 0.9617750599
2 0.8775825619 0.9821293535
3 0.9449569463 0.9897855148
4 0.9689124217 0.9934156481
5 0.9800665778 0.9954099422
6 0.9861432316 0.9966199575
7 0.9898132604 0.9974083190
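A minimal Fortran sketch that reproduces the table above for pn = cos(1/n); the number of terms generated is illustrative.

program aitken
  implicit none
  double precision :: p(9), phat
  integer :: n

  do n = 1, 9
     p(n) = cos(1.0d0/dble(n))            ! original sequence p_n
  end do
  do n = 1, 7
     ! Aitken's Delta^2 formula (4.10)
     phat = p(n) - (p(n+1) - p(n))**2/(p(n+2) - 2.0d0*p(n+1) + p(n))
     print *, n, p(n), phat
  end do
end program aitken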

ˆ Example:
The function f(x) = x³ − 3x + 2 = (x − 1)²(x + 2) has a double root at p = 1. If Newton’s method converges to p = 1, it converges only linearly. We choose p0 = 2. Newton’s method produces the following sequence:

    p0 = 2 ,  pn+1 = pn − (pn³ − 3pn + 2)/(3pn² − 3)

    n    pn                     (pn − 1)/(pn−1 − 1)
1 1.555555555555556 0.5555555560
2 1.297906602254429 0.5362318832
3 1.155390199213768 0.5216071009
4 1.079562210414361 0.5120156259
5 1.040288435171017 0.5063765197
6 1.020276809786733 0.5032910809
7 1.010172323431420 0.5016727483
8 1.005094741093272 0.5008434160
9 1.002549528082823 0.5004234759
10 1.001275305026243 0.5002121961
11 1.000637787960288 0.5001062491
12 1.000318927867152 0.5000533092
13 1.000159472408516 0.5000250840
14 1.000079738323218 0.5000125414
15 1.000039869690520 0.5000125411
It is clear that Newton’s method is linearly convergent, i.e., it converges slowly to p = 1. Let us apply Aitken’s acceleration process to the sequence {pn} of iterates generated by Newton’s method:

    p̂n = pn − (pn+1 − pn)²/(pn+2 − 2pn+1 + pn)    (4.11)

which gives:
n pn p̂n pn − 1 p̂n − 1
0 2.0 0.9425287356 1.0 −0.0574712644
1 1.555555556 0.9789767949 0.555555556 −0.0210232051
2 1.297906602 0.9933420783 0.297906602 −0.0066579217
3 1.155390199 0.9980927682 0.155390199 −0.0019072318
4 1.079562210 0.9994865474 0.079562210 −0.0005134526
5 1.040288435 0.9998665586 0.040288435 −0.0001334414
6 1.020276810 0.9999659695 0.020276810 −0.0000340305
7 1.010172323 0.9999914062 0.010172323 −0.0000085938
8 1.005094741 0.9999978406 0.005094741 −0.0000021594
9 1.002549528 0.9999994588 0.002549528 −5.412 × 10⁻⁷
10 1.001275305 0.9999998645 0.001275305 −1.355 × 10⁻⁷
11 1.000637788 0.9999999661 0.000637788 −3.39 × 10⁻⁸
12 1.000318928 0.9999999915 0.000318928 −8.5 × 10⁻⁹
13 1.000159472 0.9999999979 0.000159472 −2.1 × 10⁻⁹
14 1.000079738 ∗∗ 0.000079738 ∗∗
15 1.000039870 ∗∗ 0.000039870 ∗∗

4.2 Steffensen’s Method

By applying a modification of Aitken’s ∆² method to a linearly convergent sequence obtained from fixed-point iteration, we can accelerate the convergence to quadratic. This procedure is known as Steffensen’s method.

ˆ Aitken’s method constructs the terms in the order

    p0, p1 = g(p0), p2 = g(p1), p̂0 = {∆²}(p0), p3 = g(p2), p̂1 = {∆²}(p1), ...    (4.12)

where {∆²} indicates that Aitken’s formula (4.10) is used. Steffensen’s method constructs the same first four terms p0, p1, p2, and p̂0. However, at this step it assumes that p̂0 is a better approximation to p than p2 and applies fixed-point iteration to p̂0 instead of p2. This leads to the sequence

    p0⁽⁰⁾, p1⁽⁰⁾ = g(p0⁽⁰⁾), p2⁽⁰⁾ = g(p1⁽⁰⁾), p0⁽¹⁾ = {∆²}(p0⁽⁰⁾), p1⁽¹⁾ = g(p0⁽¹⁾), ...    (4.13)

Note that the denominator of the Aitken step can be zero at some iteration. If this occurs, we terminate the sequence and select the last iterate computed before the zero appears¹.
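A minimal Fortran sketch of one way to implement Steffensen’s method, using g(x) = cos(x − 1) and p0 = 2 from Ex. 3 below; the number of outer iterations is illustrative.

program steffensen
  implicit none
  double precision :: p0 = 2.0d0, p1, p2, denom
  integer :: k

  do k = 1, 5
     p1 = cos(p0 - 1.0d0)                 ! p1 = g(p0)
     p2 = cos(p1 - 1.0d0)                 ! p2 = g(p1)
     denom = p2 - 2.0d0*p1 + p0
     if (denom == 0.0d0) stop 'zero denominator: keep current p0'
     p0 = p0 - (p1 - p0)**2/denom         ! Aitken step applied to p0, p1, p2
     print *, k, p0
  end do
end program steffensen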

ˆ Ex 3 page 86 :
(0) (1)
Let g(x) = cos(x − 1) and p0 = 2. We want to use Steffensen’s method to get p0 .

(0)
p0 = 2
(0)
p1 = cos(2 − 1) = 0.5403023059
(0)
p2 = cos(0.5403023059 − 1) = 0.8961866647
(0) (0)
(1) (0) (p1 − p0 )2
p0 = p0 − (0) (0) (0)
= 0.826427396
p2 − 2p1 + p0

ˆ Ex 4 page 86 :
Let g(x) = 1 + (sin x)^2 and p0^(0) = 1. We want to use Steffensen's method to get p0^(1) and p0^(2).

p0^(0) = 1
p1^(0) = 1 + (sin 1)^2 = 1.708073418
p2^(0) = 1 + (sin 1.708073418)^2 = 1.981273081
p0^(1) = p0^(0) − (p1^(0) − p0^(0))^2 / (p2^(0) − 2 p1^(0) + p0^(0)) = 2.152904629

To calculate p0^(2) we start with:
p0^(1) = 2.152904629
p1^(1) = 1 + (sin 2.152904629)^2 = 1.697735097
p2^(1) = 1 + (sin 1.697735097)^2 = 1.983972911
p0^(2) = p0^(1) − (p1^(1) − p0^(1))^2 / (p2^(1) − 2 p1^(1) + p0^(1)) = 1.873464043
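A minimal Python sketch (an addition for illustration; the notes' own computations use Maple) of Steffensen's method as described above, reproducing the values of Ex 4 with g(x) = 1 + (sin x)^2:

import math

def steffensen(g, p0, steps):
    # each step builds p1 = g(p0), p2 = g(p1) and replaces p0 by the Aitken value
    for _ in range(steps):
        p1 = g(p0)
        p2 = g(p1)
        denom = p2 - 2*p1 + p0
        if denom == 0:        # stop if the Delta^2 denominator vanishes
            break
        p0 = p0 - (p1 - p0)**2 / denom
    return p0

g = lambda x: 1 + math.sin(x)**2
print(steffensen(g, 1.0, 1))   # about 2.152904629 = p0^(1)
print(steffensen(g, 1.0, 2))   # about 1.873464043 = p0^(2)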

4.3 Zeros Polynomial
A polynomial of degree n has the form

P (x) = an xn + an−1 xn−1 + ... + a1 x + a0 (4.14)

where the ai are constant coefficients of P and an ≠ 0.

ˆ Fundamental Theorem of Algebra: If P (x) is a polynomial of degree n ≥ 1, then P (x) = 0 has at least
one root (possibly complex).

ˆ If P (x) is a polynomial of degree n ≥ 1, then there exist unique constants x1, x2, ..., xk, possibly
complex, and unique positive integers m1, m2, ..., mk, with m1 + m2 + ... + mk = n, such that
P (x) = an (x − x1)^m1 (x − x2)^m2 ... (x − xk)^mk

mi is the multiplicity of the zero xi .

ˆ Let P (x) and Q(x) be polynomials of degree at most n. If x1, x2, ..., xk, with k > n, are distinct
numbers with P (xi) = Q(xi) for i = 1, 2, ..., k, then P (x) = Q(x) for all values of x.

4.4 Horner’s Method


To use Newton's method to locate approximate zeros of a polynomial P(x), we need to evaluate P (x) and P ′ (x) at specified values. To compute them efficiently we evaluate them in nested form. Horner's method incorporates this nesting technique and requires only n multiplications and n additions to evaluate an arbitrary nth-degree polynomial.

ˆ Synthetic Division Algorithm and the Remainder Theorem:


A polynomial of degree n can be written as

Pn (x) = an xn + an−1 xn−1 + ... + a1 x + a0 (4.15)

Divide this polynomial Pn (x) by (x − x1 ), giving a reduced polynomial Qn−1 (x) of degree n − 1, and
a remainder R
Pn (x) = (x − x1 )Qn−1 (x) + R (4.16)
We can see that Pn (x1 ) = R. If we differentiate Pn (x) we get

Pn′ (x) = (x − x1 )Q′n−1 (x) + Qn−1 (x) (4.17)

thus,
Pn′ (x1 ) = Qn−1 (x1 ) (4.18)

We evaluate Qn−1 (x1) by a second division whose remainder equals Qn−1 (x1), and so on. Now we can write
Pn (x) = an x^n + an−1 x^(n−1) + ... + a1 x + a0
       = (x − x1) Qn−1 (x) + R
       = (x − x1)(bn−1 x^(n−1) + bn−2 x^(n−2) + ... + b1 x + b0) + R
Collecting the terms,
Pn (x) = bn−1 x^n + [bn−2 − x1 bn−1] x^(n−1) + [bn−3 − x1 bn−2] x^(n−2) + ...
         + [b0 − x1 b1] x + [R − x1 b0]
By comparing coefficients we get:
bn−1 = an
bn−2 = an−1 + x1 bn−1
   ...
bi = ai+1 + x1 bi+1
   ...
b0 = a1 + x1 b1
so the remainder can be evaluated from
R = a0 + x1 b0

ˆ Horner’s Method: Let


P (x) = an xn + an−1 xn−1 + ... + a1 x + a0 (4.19)

If bn = an and
bk = ak + bk+1 x0 , for k = n − 1, n − 2, ..., 1, 0 (4.20)
then b0 = P (x0 ), k is from n − 1 to 0, which means you need only n multiplications and n additions
to get p(x0 ). Moreover, if
Q(x) = bn xn−1 + bn−1 xn−2 + ... + b2 x + b1 (4.21)
Then
P (x) = (x − x0 )Q(x) + b0 (4.22)
Proof:

The derivative of P (x) is given by


P ′ (x) = Q(x) + (x − x0 )Q′ (x) (4.23)
Thus
P ′ (x0 ) = Q(x0 ) (4.24)

ˆ Example:
We want to evaluate P (x) = 2x4 − 3x2 + 3x − 4 at x0 = −2 using Horner’s method.
we start by:

Coeff x4 Coeff x3 Coeff x2 Coeff x1 Coeff x0


a4 = 2 a3 = 0 a2 = −3 a1 = 3 a0 = −4
b 4 = a4 b3 = a3 + b4 (−2) b2 = a2 + b3 (−2) b1 = a1 + b2 (−2) b0 = a0 + b1 (−2)
b4 = 2 b3 = −4 b2 = 5 b1 = −7 b0 = 10
Therefore, P (−2) = 10 and P (x) = (x + 2)(2x3 − 4x2 + 5x − 7) + 10.

ˆ Example:
Find an approximation to one of the zeros of P (x) = 2x4 − 3x2 + 3x − 4 using Newton’s Method and
synthetic division to evaluate P (xn ) and P ′ (xn ) for each iterate xn .
at x0 = −2 we use bn = an and bk = ak + bk+1 x0 for k = n − 1 to k = 0.

x0 = −2 |  2     0      −3      3      −4
        |       −4       8    −10      14
        |  2    −4       5     −7      10 = P (−2)
Using the theorem P ′ (x0 ) = Q(x0 ) we get

x0 = −2 |  2    −4       5     −7
        |       −4      16    −42
        |  2    −8      21    −49 = Q(−2) = P ′ (−2)
and
P (x0 ) 10
x1 = x0 − = −2 − ≈ −1.796
Q(x0 ) −49
Repeating the procedure with x1 = −1.796, synthetic division gives P (x1) ≈ 1.742 and P ′ (x1) ≈ −32.565, so x2 ≈ −1.74245. Continuing in the same manner we get x3 ≈ −1.73897. An actual zero to five decimal places is −1.73896.
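For illustration (a Python addition to the notes, not the textbook's algorithm), the nested evaluation of P and P ′ used in this example can be sketched as follows; the helper name horner is chosen only for this example.

def horner(coeffs, x0):
    # coeffs = [a_n, ..., a_1, a_0]; returns (P(x0), P'(x0)) using n mult/add each
    b = coeffs[0]   # value accumulator  -> P(x0)
    c = coeffs[0]   # Q(x0) accumulator  -> P'(x0)
    for a in coeffs[1:-1]:
        b = a + b * x0
        c = b + c * x0
    b = coeffs[-1] + b * x0
    return b, c

coeffs = [2, 0, -3, 3, -4]     # P(x) = 2x^4 - 3x^2 + 3x - 4
x = -2.0
for _ in range(4):             # Newton's method using the Horner evaluations
    p, dp = horner(coeffs, x)
    x = x - p / dp
    print(round(x, 5))
# first step: -2 - 10/(-49) = -1.79592; the iterates approach -1.73896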

4.5 Deflation
If the N th iterate, xN , in Newton’s method is an approximate zero for the polynomial P (x), then
P (x) = (x − xN )Q(x) + b0 = (x − xN )Q(x) + P (xN ) ≈ (x − xN )Q(x) (4.25)
so, x − xN is an approximate factor of P (x). Letting x̂1 = xN be the approximate zero of P and Q1 (x) =
Q(x) be the approximate factor given by
P (x) ≈ (x − x̂1 )Q1 (x) (4.26)
We can find a second approximate zero of P by applying Newton's method to Q1 (x). If P (x) is of degree n, we can apply the procedure repeatedly to find x̂2 and Q2 (x), ..., x̂n−2 and Qn−2 (x). After finding n − 2 roots in this way we are left with a quadratic factor, which we can solve to get the last two approximate roots. This procedure is called deflation.

ˆ The accuracy difficulty with deflation is due to the fact that, when obtaining the approximate zero
of P (x), Newton’s method is used on the reduced polynomial Qk (x), that is,

P (x) ≈ (x − x̂1 )(x − x̂2 )...(x − x̂k )Qk (x) (4.27)

An approximate zero x̂k+1 of Qk (x) will in general approximate a root of Qk (x), not of P (x). To reduce this inaccuracy we can use the reduced polynomials to find the approximations x̂i, and then refine each of them by applying Newton's method to the original polynomial P (x).

ˆ One problem with applying the Secant, False Position, or Newton's methods to polynomials is the possibility of complex roots. If the initial approximation is real, all subsequent approximations will also be real. To overcome this problem we can start with a complex initial approximation.

4.6 Müller’s method


The secant method uses two initial approximations p0 and p1 to get p2, which is the x-intercept of the line joining the two points (p0, f (p0)) and (p1, f (p1)).
Müller's method uses three initial approximations, p0, p1, p2, and determines the next approximation p3 by considering the intersection of the x-axis with the parabola through (p0, f (p0)), (p1, f (p1)), and (p2, f (p2)).

ˆ The parabola takes the form

P (x) = a(x − p2)^2 + b(x − p2) + c    (4.28)

The parameters a, b, and c are determined from the three interpolation conditions.

ˆ To determine p3, a zero of P (x), we apply the quadratic formula in the form

p3 − p2 = −2c / (b ± √(b^2 − 4ac))    (4.29)

This form avoids the subtraction of nearly equal numbers (see Example 5, Section 1.2).
The formula gives two roots. In Müller's method the sign is chosen to agree with the sign of b, so that

p3 = p2 − 2c / (b + sign(b) √(b^2 − 4ac))    (4.30)

Once p3 is determined, the procedure is applied to p1, p2, and p3 to get p4, and so on.

ˆ The method involves a square root, which means that complex numbers, and hence complex roots, can be produced by Müller's method.
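The following Python sketch (an illustrative addition; the test polynomial and helper name are chosen only for this example and are not from the notes) shows one Müller step built exactly from eqs. (4.28)–(4.30), iterated with complex arithmetic:

import cmath

def muller_step(f, p0, p1, p2):
    h1, h2 = p1 - p0, p2 - p1
    d1, d2 = (f(p1) - f(p0)) / h1, (f(p2) - f(p1)) / h2
    a = (d2 - d1) / (h2 + h1)              # coefficient of (x - p2)^2
    b = d2 + h2 * a                        # coefficient of (x - p2)
    c = f(p2)                              # P(p2)
    disc = cmath.sqrt(b * b - 4 * a * c)
    denom = b + disc if abs(b + disc) >= abs(b - disc) else b - disc
    return p2 - 2 * c / denom

f = lambda x: x**4 - 3*x**3 + x**2 + x + 1     # illustrative test polynomial
p0, p1, p2 = 0.5, -0.5, 0.0
for _ in range(6):
    p0, p1, p2 = p1, p2, muller_step(f, p0, p1, p2)
print(p2)   # converges to a complex zero near -0.34 + 0.45j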

4.7 Exercises
ˆ We want to find, to within 10^−4, all real zeros of the following polynomial using Newton's method.

P (x) = x3 − 2x2 − 5

sol:
Descartes' rule of signs: the number np of positive zeros of a polynomial P (x) is less than or equal to the number of variations v in sign of the coefficients of P (x); moreover, the difference v − np is a nonnegative even integer.
For our example, the number of variations in sign of the coefficients of P (x) is v = 1, so there is at most one positive root. Since v − np = 1 − np must be a nonnegative even integer, np = 1. Therefore there is exactly one positive root.
Now we change x → −x, we find

P (−x) = −x3 − 2x2 − 5

So, there are no variations in sign, v = 0. Thus, there is no negative root.


Thus, our conclusion is: there is only one real root, and it is positive.
We then apply Newton's method starting with p0 = 2.

f (x) = x3 − 2x2 − 5
f ′ (x) = 3x2 − 4x
p0 = 2
f (pn )
pn+1 = pn − ′ , n≥0
f (pn )

n pn |pn − pn−1 |
1 3.250000000 1.250000000
2 2.811036789 0.438963211
3 2.697989503 0.113047286
4 2.690677153 0.007312350
5 2.690647448 0.000029705
6 2.690647448 0.000000001

ˆ We want to find to within 10−5 all the zeros of the polynomial

f (x) = x4 + 5x3 − 9x2 − 85x − 136

by first finding the real zeros using Newton's method and then reducing to polynomials of lower degree to determine any complex zeros.
According to Descartes' rule we have:

1. For positive zeros: the number of variations of sign is 1. Thus, there is exactly one positive zero.
2. For negative zeros: the number of variations of sign is 3. Thus, there are one or three negative zeros.

We apply Newton's method with p0 = 0 and obtain

n    pn               |pn − pn−1|
1   −1.600000000      1.600000000
2   −2.681394805      1.081394805
3   −5.595348023      2.913953218
4   −4.842605061      0.752742962
5   −4.377210956      0.465394105
6   −4.167343093      0.209867863
7   −4.124721017      0.042622076
8   −4.123107873      0.001613144
9   −4.123105624      0.000002249

After rounding, the negative zero is −4.123106.


We now use Horner's method with x0 = −4.123105624 to get the reduced polynomial. We get

b4 = 1.
b3 = 0.876894375
b2 = −12.61552813
b1 = −32.98484500
b0 = 0.

The polynomial is now reduced to

Q1 (x) = x^3 + 0.876894375 x^2 − 12.61552813 x − 32.98484500

We use Newton's method to find a solution of Q1 (x) = 0; to within 10^−5 the root is 4.123106. We use Horner's method to get the reduced polynomial

b3 = 1.
b2 = 5.
b1 = 8.
b0 = 0.

Thus, the reduced polynomial is

Q2 (x) = x2 + 5x + 8

This gives two complex roots, −2.5 ± 1.32288i

Chapter 5

Interpolation and Polynomial Approximation

Consider data given in two columns, x and y. We plot this data and see if we can fit it with a function y = f (x). This is what we call "interpolation". We will study the case when we fit the data with a polynomial.

5.1 Weierstrass Approximation Theorem


ˆ Theorem: Suppose that f is defined and continuous on [a, b].
For each ǫ > 0, there exists a polynomial P (x), with the property that

|f (x) − P (x)| < ǫ, for all x ∈ [a, b].

The proof of this theorem can be found in most elementary textbooks on real analysis.
ˆ Taylor polynomials are used mainly to approximate a function near a specified point, whereas a good interpolating polynomial needs to provide a relatively accurate approximation over an entire interval. The Taylor polynomial is therefore not always appropriate for interpolation. For example, approximating f (x) = 1/x at x = 3 using Taylor polynomials expanded about x = 1 leads to very inaccurate results:
n 0 1 2 3 4 5 6 7
Pn (3) 1 −1 3 −5 11 −21 43 −85

5.2 Lagrange Polynomial


In this section we find approximating polynomials that are determined simply by specifying certain points
on the plane through which they must pass.
ˆ Let determine a polynomial of degree one that passes through the distinct points (x0 , f (x0 )) and
(x1 , f (x1 )). We define the functions:
x − x1
L0 (x) = (5.1)
x0 − x1
x − x0
L1 (x) = (5.2)
x1 − x0

and define

P (x) = L0 (x)f (x0 ) + L1 (x)f (x1 ) (5.3)

It is clear that the polynomial P (x) coincides with f (x) at x0 and x1 and it is the unique linear
function passing through (x0 , f (x0 )) and (x1 , f (x1 )).

ˆ To generalize this concept of linear interpolation, consider the construction of a polynomial of degree at most n that passes through n + 1 points, (x0 , f (x0 )), (x1 , f (x1 )), ..., (xn , f (xn )).

ˆ Theorem If x0 , x1 ,...,xn are n + 1 distinct numbers and f is a function whose values are given at
these numbers, then a unique polynomial P (x) of degree at most n exists with

f (xk ) = P (xk ), k = 0, 1, ..., n (5.4)

This polynomial is given by


P (x) = Σ_{i=0}^{n} f (xi ) Ln,i (x)    (5.5)

with

Ln,k (x) = Π_{i=0, i≠k}^{n} (x − xi ) / (xk − xi )    (5.6)

Proof:

ˆ Theorem: Suppose x0 , x1 ,...,xn are n + 1 distinct numbers in [a, b] and f ∈ C n+1 [a, b], then

f (x) = P (x) + f^(n+1)(ξ(x)) / (n + 1)! · (x − x0 )(x − x1 )...(x − xn )    (5.7)

where P (x) is the interpolating polynomial given by eq.(5.5) and ξ(x) ∈ (a, b).
Proof:

ˆ Definition: Let f be a function at x0 , x1 ,..., xn , and suppose that m1 , m2 ,...,mk are k distinct
integers, with 0 ≤ mi ≤ n for each i. The Lagrange polynomial that agrees with f (x) at all k points
xm1 , xm2 ,..., xmk , is denoted by Pm1 m2 ...mk (x).

ˆ Theorem Let f be defined at x0 , x1 ,...,xk , and xj 6= xi be two numbers from the set. Then,

P (x) = [(x − xj ) P0,1,...,j−1,j+1,...,k (x) − (x − xi ) P0,1,...,i−1,i+1,...,k (x)] / (xi − xj )    (5.8)

describes the kth Lagrange polynomial that interpolate f at k + 1 points, x0 , x1 ,...,xk .


Proof:

5.3 Neville’s Method
This theorem implies that the interpolating polynomials can be generated recursively.
To avoid multiple subscripts, let Qi,j , for 0 ≤ j ≤ i, denote the interpolating polynomial of degree j
on the (j + 1) numbers xi−j , xi−j+1 ,...,xi−1 , xi ; that is

Qi,j = Pi−j,i−j+1,...,i−1,i (5.9)

x0 P0 = Q0,0
x1 P1 = Q1,0 P0,1 = Q1,1
x2 P2 = Q2,0 P1,2 = Q2,1 P0,1,2 = Q2,2 (5.10)
x3 P3 = Q3,0 P2,3 = Q3,1 P1,2,3 = Q3,2 P0,1,2,3 = Q3,3
x4 P4 = Q4,0 P3,4 = Q4,1 P2,3,4 = Q4,2 P1,2,3,4 = Q4,3 P0,1,2,3,4 = Q4,4

Neville's iterated interpolation is given by

Qi,j = [(x − xi−j ) Qi,j−1 − (x − xi ) Qi−1,j−1 ] / (xi − xi−j )    (5.11)
Q0,0 = f (x0 ), Q1,0 = f (x1 ), ..., Qn,0 = f (xn )    (5.12)

Example:
Suppose function f is given for the following values:

x f (x)

x0 = 1.0 0.7651977
x1 = 1.3 0.6200860
x2 = 1.6 0.4554022
x3 = 1.9 0.2818186
x4 = 2.2 0.1103623

we want to approximate f (1.5) using various interpolating polynomials at x = 1.5. Using Neville's method, eq. (5.11), we can calculate the Qi,j :

Q0,0 = P0 = 0.7651977,
Q1,0 = P1 = 0.6200860,
(x − x0 )Q1,0 − (x − x1 )Q0,0
Q1,1 = P0,1 = = 0.5233449
(x1 − x0 )
Q2,0 = P2 = 0.4554022
(x − x1 )Q2,0 − (x − x2 )Q1,0
Q2,1 = P1,2 = = 0.5102968
(x2 − x1 )
(x − x0 )Q2,1 − (x − x2 )Q1,1
Q2,2 = P0,1,2 = = 0.5124715
(x2 − x0 )
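A short Python sketch of the Neville tableau (an illustrative addition; the function name neville is chosen only for this example), reproducing the values computed above:

def neville(xs, fs, x):
    # returns the tableau Q with Q[i][j] = Q_{i,j} of eq. (5.11)
    n = len(xs)
    Q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        Q[i][0] = fs[i]
    for j in range(1, n):
        for i in range(j, n):
            Q[i][j] = ((x - xs[i - j]) * Q[i][j - 1]
                       - (x - xs[i]) * Q[i - 1][j - 1]) / (xs[i] - xs[i - j])
    return Q

xs = [1.0, 1.3, 1.6, 1.9, 2.2]
fs = [0.7651977, 0.6200860, 0.4554022, 0.2818186, 0.1103623]
Q = neville(xs, fs, 1.5)
print(Q[1][1], Q[2][2])   # 0.5233449 and 0.5124715, as above
print(Q[4][4])            # the degree-4 approximation Q_{4,4}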

assignments:
study example 6 page 113.

5.4 Newton Interpolating Polynomial
Suppose there is a known polynomial Pn−1 (x) that interpolates the data set: (xi , yi ), i = 0, 1, .., n − 1.
When one more data point (xn , yn ), which is distinct from all the other data points, is added to the data
set, we can construct a new polynomial Pn (x) that interpolates the new data set. To do so, let consider
the polynomial
n−1
Y
Pn (x) = Pn−1 (x) + cn (x − xi ) (5.13)
i=0

where cn is an unknown constant.


For the case when n = 0, we write
P0 (x) = y0 (5.14)
It is clear that for all the old points

Pn (xi ) = Pn−1 (xi ), for i = 0, ..., n − 1 (5.15)

at the new point (xn , yn ) we have


n−1
Y
Pn (xn ) = Pn−1 (xn ) + cn (xn − xi ) (5.16)
i=0

which leads to

cn = [Pn (xn ) − Pn−1 (xn )] / Π_{i=0}^{n−1} (xn − xi ) = [yn − Pn−1 (xn )] / Π_{i=0}^{n−1} (xn − xi )    (5.17)

So, for any given data set (xi , yi ), i = 0, 1, ..., n, we can obtain the interpolating polynomial by a recursive process that starts from P0 (x) and uses the above construction to get P1 (x), P2 (x), ..., Pn (x). We will demonstrate this process through the following example.

i 0 1 2 3 4
xi 0 0.25 0.5 0.75 1
yi −1 0 1 0 1

ˆ First step: for i = 0 we have

P0 (x) = y0 = −1

ˆ Second step: adding the point (x1 , y1 ) = (0.25, 0), we get


0
Y
P1 (x) = P0 (x) + c1 (x − xi )
i=0
= −1 + c1 (x − x0 )
= −1 + c1 x

The constant c1 is given by
c1 = [y1 − P0 (x1 )] / (x1 − x0 )
   = [0 − (−1)] / (0.25 − 0) = 4
Thus,
P1 (x) = −1 + 4 x

ˆ The third step: adding the point (x2 , y2 ) = (0.5, 1),

P2 (x) = P1 (x) + c2 Π_{i=0}^{1} (x − xi )
       = (−1 + 4x) + c2 (x − x0 )(x − x1 )
       = −1 + 4x + c2 x (x − 0.25)
The constant c2 is given by
c2 = [y2 − P1 (x2 )] / [(x2 − x0 )(x2 − x1 )]
   = [1 − (−1 + 4 × 0.5)] / [(0.5 − 0)(0.5 − 0.25)] = 0
Thus,
P2 (x) = −1 + 4x
We continue the calculations and find
c3 = −64/3,   P3 (x) = −1 + 4x − (64/3) x (x − 1/4)(x − 1/2)
c4 = 64,      P4 (x) = −1 + 4x − (64/3) x (x − 1/4)(x − 1/2) + 64 x (x − 1/4)(x − 1/2)(x − 3/4)

ˆ Divided difference polynomial: The divided difference polynomial is a helpful method to generate
interpolation polynomials.
The first order divided difference of f at x = xi is give by
f (xi+1 ) − f (xi )
f [xi , xi+1 ] = (5.18)
xi+1 − xi
The second order divided difference of f at xi is given by
f [xi+1 , xi+2 ] − f [xi , xi+1 ]
f [xi , xi+1 , xi+2 ] = (5.19)
xi+2 − xi

We can generalize this to higher order:
f [x0 , x1 , . . . , xn ] = (f [x1 , . . . , xn ] − f [x0 , . . . , xn−1 ]) / (xn − x0 )    (5.20)
With these definitions we get the interpolation polynomial as
Pn (x) = f [x0 ] + f [x0 , x1 ](x − x0 ) + . . . + f [x0 , . . . , xn ] Π_{j=0}^{n−1} (x − xj )
       = f [x0 ] + Σ_{i=1}^{n} f [x0 , . . . , xi ] Π_{j=0}^{i−1} (x − xj )    (5.21)

Example:
i 0 1 2 3 4
xi 0 0.25 0.5 0.75 1
yi −1 0 1 0 1
Let us try to find the interpolation polynomial for the table above.

 i   xi     f [xi ]        1st DD               2nd DD                       3rd DD                             4th DD
 0  0.00   f [x0 ] = −1
                          f [x0 , x1 ] = 4
 1  0.25   f [x1 ] = 0                          f [x0 , x1 , x2 ] = 0
                          f [x1 , x2 ] = 4                                   f [x0 , x1 , x2 , x3 ] = −64/3
 2  0.50   f [x2 ] = 1                          f [x1 , x2 , x3 ] = −16                                         f [x0 , x1 , x2 , x3 , x4 ] = 64
                          f [x2 , x3 ] = −4                                  f [x1 , x2 , x3 , x4 ] = 128/3
 3  0.75   f [x3 ] = 0                          f [x2 , x3 , x4 ] = 16
                          f [x3 , x4 ] = 4
 4  1.00   f [x4 ] = 1

So, the polynomial is

P4 (x) = −1 + 4(x − 0) + 0 (x − 0)(x − 1/4) − (64/3)(x − 0)(x − 1/4)(x − 1/2)
         + 64 (x − 0)(x − 1/4)(x − 1/2)(x − 3/4)
       = −1 + 4x − (64/3) x (x − 1/4)(x − 1/2) + 64 x (x − 1/4)(x − 1/2)(x − 3/4)
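An illustrative Python sketch of the divided-difference construction (the helper names are chosen only for this example); it reproduces the top diagonal of the table and evaluates the Newton form by nested multiplication:

from fractions import Fraction as F

def divided_differences(xs, ys):
    # returns the top diagonal f[x0], f[x0,x1], ..., f[x0,...,xn]
    coef = list(ys)
    n = len(xs)
    for j in range(1, n):
        for i in range(n - 1, j - 1, -1):
            coef[i] = (coef[i] - coef[i - 1]) / (xs[i] - xs[i - j])
    return coef

def newton_eval(xs, coef, x):
    # nested evaluation of the Newton form with centres x0, ..., x_{n-1}
    p = coef[-1]
    for c, xi in zip(reversed(coef[:-1]), reversed(xs[:-1])):
        p = p * (x - xi) + c
    return p

xs = [F(0), F(1, 4), F(1, 2), F(3, 4), F(1)]
ys = [F(-1), F(0), F(1), F(0), F(1)]
dd = divided_differences(xs, ys)
print(dd)                          # -1, 4, 0, -64/3, 64 as Fractions
print(newton_eval(xs, dd, F(1, 2)))  # 1, the polynomial reproduces the data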

5.5 Polynomial Forms
We follow the book Introduction to Numerical Analysis, Alastair Wood, Addison-Wesley.

ˆ Power form: A polynomial of degree n can be written in power form as

Pn (x) = a0 + a1 x + . . . + an xn
Xn
= ak x k (5.22)
k=0

This form is convenient for analysis but may lead to loss of significance. For example, consider
P1 (x) = 18001/3 − x
This polynomial takes the value 1/3 at x = 6000 and −2/3 at x = 6001. On a finite-precision machine with 5 decimal digits the coefficients are stored as a∗0 = 6000.3 and a∗1 = −1, and hence

P1 (6000) = 6000.3 − 6000 = 0.3


P1 (6001) = 6000.3 − 6001 = −0.7

Only one digit of the exact value is recovered, yet the coefficients are accurate to 5 digits! Four significant digits have been lost due to the subtraction of two nearly equal large numbers.

ˆ Shifted power form: The drawback seen in the previous example can be alleviated by changing the origin of x to a non-zero value c and writing the polynomial (5.22) as
n
X
Pn (x) = ak x k
k=0
= b0 + b1 (x − c) + . . . + bn (x − c)n (5.23)
Xn
= bk (x − c)k
k=0

This form is called shifted power form. c is a centre an bk are constant coefficients. The previous
example can be written as
18001
P1 (x) = −x
3
1
= − (x − 6000)
3

So, we get

1 1
P1 (6000) = − (6000 − 6000) = = 0.33333
3 3
1 1
P1 (6001) = − (6001 − 6000) = − 1 = −0.66667
3 3

These values are accurate to 5 digits and there is no loss of significance.
We can find the coefficients bk by using the Taylor expansion of Pn about x = c. This gives
Pn (x) = Σ_{k=0}^{n} [Pn^(k)(c) / k!] (x − c)^k    (5.24)
where Pn^(k)(c) is the k-th derivative of Pn (x) at x = c. Thus,
bk = Pn^(k)(c) / k!

ˆ Newton form: We can generalized equation (5.24) by choosing n centres c1 , . . . , cn we get

Pn (x) = d0 + d1 (x − c1 ) + d2 (x − c1 )(x − c2 ) + . . .
+dn (x − c1 )(x − c2 ) . . . (x − cn ) (5.25)
Xn k
Y
= d0 + dk (x − cj )
k=1 j=1

This is the Newton form, which is particularly useful for polynomial interpolation. If c1 = c2 = . . . = cn = c we get the shifted power form, and for c1 = c2 = . . . = cn = 0 we recover the power form.

5.6 Spline Interpolation


For a large data set, a single polynomial satisfying all the data (xi , f (xi )) will have high degree. In general a polynomial of high degree oscillates, which may not be acceptable behavior. One solution to this problem is to use interpolation in a piecewise manner.
The simplest approach uses linear interpolates. Given n + 1 items of data in ascending order by x, the
data (xi , f (xi )) and (xi+1 , f (xi+1 )) are interpolated by a straight line. A piecewise linear interpolation is
called linear spline S1 . The linear spline suffers from the lack of smoothness. The continuity is assured
but there is a change in the first derivative. The solution is to use splines having greater smoothness.
We concentrate on cubic spline.
ˆ Definition (Alastair Wood, Introduction to Numerical Analysis):
For the data (xi , f (xi )), i = 0, . . . , n, S3 is a cubic spline in [x0 , xn ] if:
(1) S3 restricted to [xi−1 , xi ] is a polynomial of degree at most 3
(2) S3 ∈ C 2 [x0 , xn ]
(3) If s3,i and s3,i+1 are the cubic interpolants on adjacent sub-intervals, then the conditions:

s3,i (xi ) = s3,i+1 (xi ) = f (xi )


s′3,i (xi ) = s′3,i+1 (xi )
s′′3,i (xi ) = s′′3,i+1 (xi )

A consequence of this definition is that the individual interpolants can no longer be constructed in isolation. The piecewise interpolants s3,1 , . . . , s3,n are interdependent through the derivative continuity conditions.

On the interval [xi−1 , xi ], and for i = 1, 2, . . . , n, we have
s3,i (x) = f (xi−1 ) + ai (x − xi−1 ) + bi (x − xi−1 )2 + ci (x − xi−1 )3 (5.26)
there are 3n constants to be determined, ai , bi ,ci , i = 1, . . . , n. The continuity enforce that
s3,i (xi ) = f (xi−1 ) + ai (xi − xi−1 ) + bi (xi − xi−1 )2 + ci (xi − xi−1 )3
s3,i+1 (xi ) = f (xi ) + ai+1 (xi − xi ) + bi+1 (xi − xi )2 + ci+1 (xi − xi )3
which leads to
f (xi−1 ) + ai (xi − xi−1 ) + bi (xi − xi−1 )2 + ci (xi − xi−1 )3 = f (xi )
f (xi−1 ) + ai hi + bi h2i + ci h3i = f (xi )
where hi = xi − xi−1 for i = 1, . . . , n.
For the first derivative we have
s′3,i (xi ) = ai + 2bi (xi − xi−1 ) + 3ci (xi − xi−1 )2
s′3,i+1 (xi ) = ai+1 + 2bi+1 (xi − xi ) + 3ci+1 (xi − xi )2
which leads to
ai + 2 bi hi + 3 ci hi^2 = ai+1
for i = 1, . . . , n − 1.
For the second derivative we get
s′′3,i (xi ) = 2bi + 6ci (xi − xi−1 )
s′′3,i (xi ) = 2bi+1 + 6ci+1 (xi − xi )
which leads to
bi + 3ci hi = bi+1
for i = 1, . . . , n − 1.
The natural cubic spline is defined by
s′′3,1 (x0 ) = s′′3,n (xn ) = 0 (5.27)
in other words
b1 = 0, and bn + 3cn hn = 0 (5.28)
we have:
ˆ 3n constants to be determined

ˆ n equations from continuity

ˆ n − 1 equations from 1st derivative

ˆ n − 1 equations from the 2nd derivative

ˆ 2 equations from natural cubic spline


Therefore, we can find all the constants.

5.7 Parametric Curves
In some cases curves cannot be expressed as a function of one coordinate variable y in terms of the other
variable x. A straightforward method to represent such curves is to use parametric technique. We choose
a parameter t on the interval [t0 , tn ], with t0 < t1 < ... < tn and construct approximation functions with
xi = x(ti ) and yi = y(ti ) (5.29)
Consider a curve given by the figure.3.14 page 158 from the textbook. From the curve we can extract the
following table
i 0 1 2 3 4
ti 0 0.25 0.5 0.75 1
xi −1 0 1 0 1
yi 0 1 0.5 0 −1
Please refer to page 158
Example:
The first part of the graph
x = [10, 6, 2, 1, 2, 6, 10]
y = [3, 1, 1, 4, 6, 7, 6]
t = [0, 1/6, 1/3, 1/2, 2/3, 5/6, 1]
The second part of the graph
x = [2, 6, 10, 10, 13]
y = [10, 12, 10, 1, 1]
t = [0, 1/4, 2/4, 3/4, 1]

The cubic polynomials for the first graph are:

fx(u) =
  10 − (297/13) u − (540/13) u^3                                    u < 1/6
  115/13 − (27/13) u − (1620/13) u^2 + (2700/13) u^3                u < 1/3
  283/13 − (1539/13) u + (2916/13) u^2 − (1836/13) u^3              u < 1/2
  −176/13 + (1215/13) u − (2592/13) u^2 + (1836/13) u^3             u < 2/3
  1168/13 − (4833/13) u + (6480/13) u^2 − (2700/13) u^3             u < 5/6
  −707/13 + (1917/13) u − (1620/13) u^2 + (540/13) u^3              otherwise
(5.30)

fy(u) =
  3 − (1797/130) u + (4266/65) u^3                                  u < 1/6
  367/130 − (1383/130) u − (1242/65) u^2 + (1350/13) u^3            u < 1/3
  2143/130 − (17367/130) u + (22734/65) u^2 − (17226/65) u^3        u < 1/2
  −1831/65 + (17463/130) u − (12096/65) u^2 + (5994/65) u^3         u < 2/3
  389/13 − (16521/130) u + (13392/65) u^2 − (1350/13) u^3           u < 5/6
  −2397/26 + (40629/130) u − (20898/65) u^2 + (6966/65) u^3         otherwise
(5.32)

The cubic polynomials for the second graph are:

fx(u) =
  2 + (205/14) u + (152/7) u^3                                      u < 1/4
  113/28 − (137/14) u + (684/7) u^2 − (760/7) u^3                   u < 1/2
  −815/28 + (2647/14) u − 300 u^2 + (1096/7) u^3                    u < 3/4
  929/14 − (2699/14) u + (1464/7) u^2 − (488/7) u^3                 otherwise
(5.33)

fy(u) =
  10 + (135/14) u − (184/7) u^3                                     u < 1/4
  323/28 − (123/14) u + (516/7) u^2 − (872/7) u^3                   u < 1/2
  −1277/28 + (4677/14) u − 612 u^2 + (2328/7) u^3                   u < 3/4
  2399/14 − (7473/14) u + (3816/7) u^2 − (1272/7) u^3               otherwise
(5.35)
− 14 u + 7 u − 7 u otherwise

Merging the two graphs gives the complete curve (see the figure on page 158 of the textbook).

Example:
Given the data points

i 0 1 2
xi 4 9 16
fi 2 3 4

We want to estimate the function value f at x = 7 using a natural cubic spline.


The cubic equations are:

f1 (x) = a0 + a1 (x − x0 ) + a2 (x − x0 )2 + a3 (x − x0 )3
f2 (x) = b0 + b1 (x − x1 ) + b2 (x − x1 )2 + b3 (x − x1 )3

which become

f1 (x) = 2 + a1 (x − 4) + a2 (x − 4)2 + a3 (x − 4)3


f2 (x) = 3 + b1 (x − 9) + b2 (x − 9)2 + b3 (x − 9)3

The continuity leads to

3 = 2 + a1 (9 − 4) + a2 (9 − 4)2 + a3 (9 − 4)3
1 = 5a1 + 25a2 + 125a3
4 = 3 + b1 (16 − 9) + b2 (16 − 9)2 + b3 (16 − 9)3
1 = +7b1 + 49b2 + 343b3

The first derivatives:

f1′ (x) = a1 + 2a2 (x − 4) + 3a3 (x − 4)2


f2′ (x) = b1 + 2b2 (x − 9) + 3b3 (x − 9)2

The continuity of the derivatives lead to:

b1 = a1 + 10a2 + 75a3

The second derivatives:

f1′′ (x) = 2a2 + 6a3 (x − 4)


f2′′ (x) = 2b2 + 6b3 (x − 9)

The continuity of the second derivatives lead to

b2 = a2 + 15a3

The condition of natural spline

0 = a2
0 = b2 + 21b3

Now we collect our equations:

1 = 5a1 + 25a2 + 125a3


1 = 7b1 + 49b2 + 343b3
b1 = a1 + 10a2 + 75a3
b2 = a2 + 15a3
0 = a2
0 = b2 + 21b3

Because we are interested in f (7) we can use

1 = 5a1 + 125a3
1 = +7b1 + 49b2 + 343b3
b1 = a1 + 75a3
b2 = 15a3
0 = b2 + 21b3

then, we get:

1 = 5a1 + 125a3

and
 
15a3
1 = 7(a1 + 75a3 ) + 49(15a3 ) + 343 −
21
1 = 7a1 + 1015a3

The solutions of the following equations are:


89
a1 =
420
−1
a3 =
2100
Thus, the cubic polynomial is
89 1
f1 (x) = 2 + (x − 4) − (x − 4)3
420 2100
Therefore
f1 (7) = 459/175 ≈ 2.6229    (5.36)
We can find the other polynomial in the same way
37 1 1
f2 (x) = 3 + (x − 9) − (x − 9)2 + (x − 9)3
210 140 2940
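As a quick numerical cross-check (an illustrative Python addition; it assumes SciPy is available), a library natural cubic spline through the same data reproduces the hand-derived value of f (7):

from scipy.interpolate import CubicSpline

cs = CubicSpline([4, 9, 16], [2, 3, 4], bc_type='natural')
print(cs(7))                       # about 2.62286 (= 459/175, as found above)
print(2 + 89/420*3 - 1/2100*27)    # f1(7) from the hand-derived coefficients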

Chapter 6

Numerical Differentiation and Integral

6.1 Numerical Differentiation


Consider a small increment ∆x = h in x. According to Taylor’s theorem, we have

h2 ′′
f (x + h) = f (x) + hf ′ (x) + f (ξ) (6.1)
2
where, ξ is a real number between x and x + h. We can get

f (x + h) − f (x) h ′′
f ′ (x) = − f (ξ) (6.2)
h 2
ˆ We can get the same formula using Linear Lagrange polynomial. we use two points x0 and x1 = x0 +h,
we get

x − x1 x − x0 f ′′ (ξ)
f (x) = f (x0 ) + f (x1 ) + (x − x0 )(x − x1 )
x0 − x1 x1 − x0 2!
x − x1 x − x0 f ′′ (ξ)
= −f (x0 ) + f (x1 ) + (x − x0 )(x − x1 )
h h 2!
Now we calculate the derivative at x = x0
−f (x0 ) f (x1 ) f ′′ (ξ) 1 ′
f ′ (x) = + + (2x − x0 − x1 ) + [f ′′ (ξ)] (x − x0 )(x − x1 )
h h 2! 2!
f (x 0 + h) − f (x 0 ) h
f ′ (x0 ) = − f ′′ (ξ)
h 2

ˆ This formula is known as the forward-difference if h > 0 and the backward-difference if h < 0.

ˆ The previous formula is also called two-point formula.

ˆ The backward-difference formula can be written for h > 0 as

f ′ (x) = [f (x) − f (x − h)] / h + (h/2) f ′′ (ξ)    (6.3)

ˆ (n+1)-point formula: To obtain general derivative approximation formulas, suppose that {x0 , x1 , ..., xn } are (n+1) distinct numbers in some interval I and that f ∈ C^(n+1)(I). We can use the Lagrange polynomial

f (x) = Σ_{k=0}^{n} f (xk ) Lk (x) + f^(n+1)(ξ(x)) / (n + 1)! · Π_{k=0}^{n} (x − xk )    (6.4)

Differentiating and evaluating at x = xj gives

f ′ (xj ) = Σ_{k=0}^{n} f (xk ) L′k (xj ) + f^(n+1)(ξj ) / (n + 1)! · Π_{k=0, k≠j}^{n} (xj − xk )    (6.5)

In general, using more evaluation points produces greater accuracy, although the number of functional
evaluations and growth of round-off error discourages this somewhat.
ˆ Three-point formulas:
f ′ (x0 ) = 1/(2h) [−3f (x0 ) + 4f (x0 + h) − f (x0 + 2h)] + (h^2/3) f^(3)(ξ0 ),    (6.6)
f ′ (x0 ) = 1/(2h) [f (x0 + h) − f (x0 − h)] − (h^2/6) f^(3)(ξ1 ),    (6.7)
where ξ0 lies between x0 and x0 + 2h and ξ1 lies between x0 − h and x0 + h. Although the errors in both formulas are O(h^2), the error in the second formula is approximately half the error in the first. This is because it uses data from both sides of x0 .
ˆ Five-point formula
1
f ′ (x0 ) = [f (x0 − 2h) − 8f (x0 − h)
12h
h4
+8f (x0 + h) − f (x0 + 2h)] + f (5) (ξ), (6.8)
30
where ξ lies between x0 − 2h and x0 + 2h. The other five-point formula is useful for end-point
approximations. It is given by
1
f ′ (x0 ) = [−25f (x0 ) + 48f (x0 + h) − 36f (x0 + 2h)
12h
h4
+16f (x0 + 3h) − 3f (x0 + 4h)] + f (5) (ξ), (6.9)
5
where ξ lies between x0 and x0 + 4h. Left-endpoint approximation are found using h > 0 and
right-endpoint approximations with h < 0.
ˆ Example:
Let f (x) = x exp (x). The values of f at different x are given
x f (x)
1.8 10.889365
1.9 12.703199
2.0 14.778112
2.1 17.148957
2.2 19.855030

Since, f ′ (x) = (x+1) exp (x), we have f ′ (2) = 22.167168. Let us approximate f ′ (2) using three-point
formulas.
f ′ (x0 ) ≈ 1/(2h) [−3f (x0 ) + 4f (x0 + h) − f (x0 + 2h)]
f ′ (2) ≈ 1/(2 × 0.1) [−3f (2.0) + 4f (2.1) − f (2.2)] ≈ 22.032310,   h = 0.1
f ′ (2) ≈ 1/(−2 × 0.1) [−3f (2.0) + 4f (1.9) − f (1.8)] ≈ 22.054525,   h = −0.1
f ′ (x0 ) ≈ 1/(2h) [f (x0 + h) − f (x0 − h)]
f ′ (2) ≈ 1/(2 × 0.1) [f (2.1) − f (1.9)] ≈ 22.228790,   h = 0.1
The errors are 22.167168 − 22.03231 = 0.134858, 22.167168 − 22.054525 = 0.112643, and 22.167168 −
22.228790 = −0.61622 × 10−1 . respectively
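A small Python sketch (an addition for illustration, not part of the notes) reproducing the three approximations above for f (x) = x e^x at x0 = 2 with h = 0.1:

import math

f = lambda x: x * math.exp(x)
h = 0.1
endpoint_fwd = (-3*f(2.0) + 4*f(2.0 + h) - f(2.0 + 2*h)) / (2*h)     # 22.032310
endpoint_bwd = (-3*f(2.0) + 4*f(2.0 - h) - f(2.0 - 2*h)) / (-2*h)    # 22.054525 (h = -0.1)
midpoint     = (f(2.0 + h) - f(2.0 - h)) / (2*h)                     # 22.228790
exact        = 3 * math.exp(2)                                       # 22.167168
print(endpoint_fwd, endpoint_bwd, midpoint, exact)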
ˆ It is also possible to find approximation to higher derivatives function using only tabulated values
of function at various points.
Expand a function f in a third Taylor polynomial about x0 and evaluate at x0 ± h. Then
(x − x0 )2 ′′ (x − x0 )3 (3)
f (x) = f (x0 ) + (x − x0 )f ′ (x0 ) + f (x0 ) + f (x0 )
2 6
(x − x0 )4 (4)
+ f (ξ), (6.10)
24
h2 ′′ h3 h4
f (x0 + h) = f (x0 ) + hf ′ (x0 ) + f (x0 ) + f (3) (x0 ) + f (4) (ξ+ ) (6.11)
2 6 24
2 3
h h h4
f (x0 − h) = f (x0 ) − hf ′ (x0 ) + f ′′ (x0 ) − f (3) (x0 ) + f (4) (ξ− ) (6.12)
2 6 24
where x0 − h < ξ− < x0 < ξ+ < x0 + h. Adding the last two equations we get
f (x0 + h) + f (x0 − h) = 2 f (x0 ) + h^2 f ′′ (x0 ) + (h^4/24) [f^(4)(ξ+ ) + f^(4)(ξ− )],    (6.13)
solving this last equation we find that
f ′′ (x0 ) = 1/h^2 [f (x0 − h) − 2f (x0 ) + f (x0 + h)] − (h^2/24) [f^(4)(ξ+ ) + f^(4)(ξ− )]    (6.14)
Suppose that f (4) is continuous on [x0 − h, x0 + h]. Since 1/2(f (4) (ξ+ ) + f (4) (ξ− )) is between f (4) (ξ+ )
and f (4) (ξ− ), the Intermediate value theorem implies that a number ξ exists between ξ+ and ξ− , and
hence in (x0 − h, x0 + h), with
f^(4)(ξ) = [f^(4)(ξ+ ) + f^(4)(ξ− )] / 2
This leads to
f ′′ (x0 ) = 1/h^2 [f (x0 − h) − 2f (x0 ) + f (x0 + h)] − (h^2/12) f^(4)(ξ)    (6.15)
where ξ ∈ (x0 − h, x0 + h).

ˆ It is important to pay attention to round-off error when approximating derivatives. Let illustrate
this by an example:
ˆ two-point formula
f (x + h) − f (x) f1 − f0
f ′ (x) ≈ = (6.16)
h h
If we assume the round off errors in f0 and f1 as e0 and e1 , respectively, then
f1 + e1 − f0 − e0 f1 − f0 e1 − e0
f ′ (x) ≈ = + (6.17)
h h h
If the errors are of magnitude e, the worst case is
Round-off error = 2e/h    (6.18)
We know that the truncation error is −(h/2) f ′′ (ξ), which is bounded by M h/2, where M = max |f ′′ (ξ)| for x ≤ ξ ≤ x + h. Thus the bound for the total error is

Total error = Round off error + Truncation error (6.19)


E2 = 2e/h + M h/2    (6.20)
Note that when the step size h is increased, the truncation error increases while the round-off error decreases. The optimal value of h is found from
E ′ (h) = M/2 − 2e/h^2 = 0  =⇒  h = 2 √(e/M).    (6.21)

ˆ Three-point formula
1 h2
f ′ (x0 ) = [f (x0 + h) − f (x0 − h)] − f (3) (ξ) (6.22)
2h 6
Suppose that in evaluating f (x0 ± h) we encounter round-off error e(x0 ± h). Then our computer
values f˜(x0 ± h) are related to f (x0 ± h) by

f (x0 ± h) = f˜(x0 ± h) + e(x0 ± h) (6.23)

The total error is


 
′ f˜(x0 + h) − f˜(x0 − h) e(x0 + h) − e(x0 − h) h2 (3)
f (x0 ) − = − f (ξ) (6.24)
2h 2h 6
is due in part to the round-off error and in part to the truncation error. If we assume that the
round-off error is bounded by ǫ > 0 and that the third derivative is bounded by M > 0. then the
total error will be bounded by
 
′ f˜(x0 + h) − f˜(x0 − h) e(x0 + h) − e(x0 − h) h2 (3)
f (x0 ) − = − f (ξ) (6.25)
2h 2h 6
ǫ h2
≤ + M. (6.26)
h 6

To reduce the truncation error, h^2 M/6, we must reduce h. But if h is reduced, the round-off error ǫ/h grows. In practice, then, it is seldom advantageous to let h be too small, since the round-off error will then dominate the calculations. The minimum total error occurs for the optimal value of h given by
h = (3ǫ/M)^(1/3)    (6.27)
ˆ Numerical differentiation is unstable: small values of h needed to reduce truncation error also cause
the round-off error to grow.
ˆ Example:
Show that the differentiation rule
f ′ (x0 ) ≈ a0 f0 + a1 f1 + a2 f2
is exact for all polynomials of degree at most 2 if, and only if, it is exact for
f (x) = 1, f (x) = x, f (x) = x2 ,
and find the values of a0 , a1 , and a2 .
solution: According to the following formula
n n
X f (n+1) (ξ(x)) Y
f (x) = f (xk )Lk (x) + (x − xk ) (6.28)
k=0
(n + 1)! k=0
with n
Y (x − xi )
Lk (x) = (6.29)
i=0
(xk − xi )
i 6= k

we calculate the derivative of f (x)


n n

X f (n+1) (ξj ) Y
f (xj ) = f (xk )L′k (xj ) + (x − xk ) (6.30)
k=0
(n + 1)! k = 0
k 6= j

Therefore, we get
n
′ f (3) (ξ0 ) Y
f (x0 ) = f0 L′0 (x0 ) + f1 L′1 (x0 ) + f2 L′2 (x0 ) + (x0 − xk ) (6.31)
6 k=1

we have to show that

– For f (x) = 1, x, x2 (and hence, by linearity, for every f (x) = α + βx + γx2 ), we have f^(3)(ξ0 ) = 0 for every ξ0 , so the error term in (6.31) vanishes and the formula f ′ (x0 ) = a0 f0 + a1 f1 + a2 f2 is exact.
– Conversely, if the formula is exact for f (x) = 1, x, x2 , then, since both sides of the formula are linear in f , it is exact for every f (x) = α + βx + γx2 , i.e. for every polynomial of degree at most 2.
ˆ Exercises section 4.1: 1, 3, 5, 9, 13, 15, 19

6.2 Richardson’s Extrapolation
Richardson’s extrapolation is used to generate high-accuracy results while using low-order formulas. Ex-
trapolation can be applied whenever it is known that an approximation technique has an error term with
predictable form like
M − N (h) = K1 h + K2 h2 + ..., (6.32)
for some collection of unknown constants Kj , where N (h) approximates the unknown value M . In general M − N (h) ≈ K1 h, so the error is O(h), provided there is no large variation in magnitude among the constants Kj .
If M can be written in the form
m−1
X
M = N (h) + Kj hj + O(hm ), (6.33)
j=1

then for j = 2, 3, ..., m, we have an O(hj ) approximation of the form


 
Nj (h) = Nj−1 (h/2) + [Nj−1 (h/2) − Nj−1 (h)] / (2^(j−1) − 1)    (6.34)

These approximations are generated by rows in the order indicated by the numbered entries in the following
table
O(h) O(h2 ) O(h3 ) O(h4 )
1 : N1 (h/1) ≡ N (h/1)
2 : N1 (h/2) ≡ N (h/2) 3 : N2 (h) (6.35)
4 : N1 (h/4) ≡ N (h/4) 5 : N2 (h/2) 6 : N3 (h)
7 : N1 (h/8) ≡ N (h/8) 8 : N2 (h/4) 9 : N3 (h/2) 10 : N4 (h)

If M is in the form that contains only even power of h then


 
Nj (h) = Nj−1 (h/2) + [Nj−1 (h/2) − Nj−1 (h)] / (4^(j−1) − 1)    (6.36)

for j = 2, 3, ..., gives an O(h^(2j)) approximation.

ˆ Example:
We want to determine an approximate the value to f ′ (1.0) with h = 0.4 where f (x) = ln x. We use
Richardson’s Extrapolation N3 (h). We have

f ′ (x0 ) = 1/(2h) [f (x0 + h) − f (x0 − h)] − (h^2/6) f^(3)(x0 ) − (h^4/120) f^(5)(ξ) − ...    (6.37)
In the case h = 0.4 and x0 = 1, we can calculate N3 (h) using
1
N1 (h) = [ln(x0 + h) − ln(x0 − h)]
2h
N1 (0.4) = 1.059122326
N1 (0.2) = 1.013662770
N1 (0.1) = 1.003353478

We then use
Nj (h) = Nj−1 (h/2) + [Nj−1 (h/2) − Nj−1 (h)] / (4^(j−1) − 1)    (6.38)
to get
N2 (0.4) = N1 (0.2) + [N1 (0.2) − N1 (0.4)] / (4^1 − 1) = 0.9985095847
N2 (0.2) = N1 (0.1) + [N1 (0.1) − N1 (0.2)] / (4^1 − 1) = 0.9999170473
N3 (0.4) = N2 (0.2) + [N2 (0.2) − N2 (0.4)] / (4^2 − 1) = 1.000010878
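A Python sketch of the extrapolation table (an illustrative addition; the function names are chosen only for this example), reproducing the values above for f (x) = ln x at x0 = 1:

import math

def N1(h, x0=1.0):
    return (math.log(x0 + h) - math.log(x0 - h)) / (2 * h)

def richardson_table(h, levels):
    # table[k][j] uses step h/2^k; each column gains two orders of accuracy
    table = [[N1(h / 2**k)] for k in range(levels)]
    for k in range(1, levels):
        for j in range(1, k + 1):
            prev, above = table[k][j - 1], table[k - 1][j - 1]
            table[k].append(prev + (prev - above) / (4**j - 1))
    return table

t = richardson_table(0.4, 3)
print(t[1][1])   # N2(0.4) = 0.9985095847
print(t[2][2])   # N3(0.4) = 1.000010878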

ˆ Exercises section 4.2: 1, 3, 5, 11

ˆ Exercises 5 (a) and 7 (a)

> a1:=1.1;f1:=exp(2*a1);
> a2:=1.2;f2:=exp(2*a2);
> a3:=1.3;f3:=exp(2*a3);
> a4:=1.4;f4:=exp(2*a4);
a1 := 1.1
f1 := 9.025013499
a2 := 1.2
f2 := 11.02317638
a3 := 1.3
f3 := 13.46373804
a4 := 1.4
f4 := 16.44464677
> h:=0.1;
> f1p:=1/2/h*(-3*f1+4*f2-f3);
> f2p:=1/2/h*(f3-f1);
> f3p:=1/2/h*(f4-f2);
> f4p:=-1/2/h*(-3*f4+4*f3-f2);

h := 0.1
f1p := 17.76963490
f2p := 22.19362270
f3p := 27.10735195
f4p := 32.51082265
> evalf(subs(x=1.3,h^2/3*diff(exp(2*x),x$3)));
> evalf(subs(x=1.3,h^2/6*diff(exp(2*x),x$3)));
> evalf(subs(x=1.4,h^2/6*diff(exp(2*x),x$3)));
> evalf(subs(x=1.4,h^2/3*diff(exp(2*x),x$3)));

0.3590330144
0.1795165072
0.2192619569

0.4385239139

6.3 Elements of Numerical Integration


The basic method for approximating ∫_a^b f (x) dx is called numerical quadrature. It uses a sum Σ_{i=0}^{n} ai f (xi ) to approximate the integral. The methods of quadrature in this section are based on the Lagrange interpolating polynomial.

6.3.1 Trapezoidal rule


To derive the Trapezoidal rule, we use the linear Lagrange polynomial which agrees with the function f (x)
at x0 = a and x1 = b. The Trapezoidal Rule is:
∫_a^b f (x) dx = (h/2) [f (x0 ) + f (x1 )] − (h^3/12) f ′′ (ξ),    (6.39)

where h = x1 − x0 = b − a and ξ ∈ (a, b).


We want to evaluate the integral Z 2
I= (x3 + 1) dx (6.40)
1
using trapezoidal rule.
∫_1^2 f (x) dx = (1/2) [f (1) + f (2)] − (1/12) f ′′ (ξ)
             = (1/2) [2 + 9] − (1/12) 6ξ
             = 11/2 − ξ/2
Therefore the trapezoidal rule gives 5.5, with maximum error
|Emax | = max_{ξ∈(1,2)} |ξ/2| = 1
The exact value of the integral is
I = ∫_1^2 (x^3 + 1) dx = [x^4/4 + x]_1^2 = (2^4/4 + 2) − (1/4 + 1) = 19/4 = 4.75
Thus, the absolute error is |5.5 − 4.75| = 0.75. Note that the error is less than the maximum error
calculated from the formula.

6.3.2 Simpson’s rule
Simpson’s rule uses the second Lagrange polynomial with nodes at x0 = a, x1 = a + h, x2 = b, where
h = (b − a)/2. There are few tricks to get the formula (please see sec 4.3 and exercise 24 of sec 4.3)
Zx2
h h5
f (x)dx = [f (x0 ) + 4f (x1 ) + f (x2 )] − f (4) (ξ). (6.41)
3 90
x0

We want to evaluate the integral Z 1


I= ex dx (6.42)
−1

using Simpson’s rule

Z1
1 1
f (x)dx = [f (−1) + 4f (0) + f (1)] − f (4) (ξ) (6.43)
3 90
−1
(6.44)

we get

Z1
1 −1 1
ex dx = [e + 4 + e] − eξ
3 90
−1
1 ξ
= 2.36205 − e
90
The maximum error is
1 ξ
|Emax | = Max e
90
1
= e = 0.03020
90
The exact value of the integral is
Z 1
1
I= ex dx = e − = 2.35040 (6.45)
−1 e

the absolute error is |2.36205 − 2.35040| = 0.01165 which is less than the maximum error 0.03020.
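A tiny Python check of the two worked examples (an illustrative addition, not from the textbook): the trapezoidal rule on x^3 + 1 over [1, 2] and Simpson's rule on e^x over [−1, 1].

import math

def trapezoid(f, a, b):
    return (b - a) / 2 * (f(a) + f(b))

def simpson(f, a, b):
    return (b - a) / 6 * (f(a) + 4*f((a + b) / 2) + f(b))

print(trapezoid(lambda x: x**3 + 1, 1, 2))   # 5.5   (exact value 4.75)
print(simpson(math.exp, -1, 1))              # 2.36205 (exact value e - 1/e = 2.35040)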

6.3.3 Degree of precision


The degree of accuracy, or precision, of a quadrature formula is the largest positive integer n such that
the formula is exact for xk , for each k = 0, 1, ..., n.
What degree of precision does the following formula have?
Z 1    
1 1
f (x)dx = f − √ +f √
−1 3 3

The integral
∫_{−1}^{1} x^k dx = [1 + (−1)^k ] / (1 + k)
and
(−1/√3)^k + (1/√3)^k = (1/√3)^k [1 + (−1)^k ]
The formula has degree of precision n if it is exact for x^k , k = 0, . . . , n, i.e. if
[1 + (−1)^k ] / (1 + k) = (1/√3)^k [1 + (−1)^k ]
This holds when k is odd (both sides are 0). For k even it requires
1/(1 + k) = (1/√3)^k
which is true for k = 0, 2 but fails for k = 4. Therefore the formula is exact for k = 0, 1, 2, 3, and the degree of precision of the formula is 3. We can conclude from this example that the formula is exact for all polynomials of degree at most 3.

6.3.4 Newton-Cotes Formula


There are two types of Newton-Cotes formulas, open and closed. The (n+1)-point closed Newton-Cotes formula uses nodes xi = x0 + ih, for i = 0, 1, ..., n, where x0 = a, xn = b, and h = (b − a)/n. It is called closed because the endpoints of the closed interval [a, b] are included as nodes.
The open Newton-Cotes formulas use nodes xi = x0 + ih, for each i = 0, 1, ..., n, where h = (b − a)/(n + 2) and x0 = a + h. This implies that xn = b − h, so we label the endpoints by setting x−1 = a and xn+1 = b. In both cases
Zb n
X
f (x)dx ≈ ai f (xi ), (6.46)
a i=0

Zb
ai = Li (x)dx (6.47)
a

ˆ Theorem 4.2

ˆ Theorem 4.3

ˆ Exercises section 4.3:

6.4 Composite Numerical Integration


The Newton-Cotes formulas are generally unsuitable for use over large integration intervals. One way to
overcome this problem is to split the large interval into subintervals and sum Newton-Cotes formulas over
all these subintervals.

ˆ Composite Simpson’s rule:
Let f ∈ C 4 [a, b], n be even number, h = (b − a)/n, and xj = a + hj, for each j = 0, 1, . . . , n. There
exists a µ ∈ (a, b) for which the composite Simpson’s rule for n subintervals can be written as
 
Z b (n/2)−1 n/2
h X X b − a 4 (4)
f (x) dx = f (a) + 2 f (x2j ) + 4 f (x2j−1 ) + f (b) − h f (µ) (6.48)
a 3 j=1 j=1
180

ˆ Composite Trapezoidal rule:


Let f ∈ C 2 [a, b], h = (b − a)/n, and xj = a + hj, for each j = 0, 1, . . . , n. There exists a µ ∈ (a, b)
for which the composite trapezoidal rule for n subintervals can be written as
Z b " n−1
#
h X b − a 2 ′′
f (x) dx = f (a) + 2 f (xj ) + f (b) − h f (µ) (6.49)
a 2 j=1
12

ˆ Composite Midpoint rule:


Let f ∈ C 2 [a, b], n be even, h = (b − a)/(n + 2), and xj = a + h(j + 1), for each j = −1, 0, . . . , n + 1.
There exists a µ ∈ (a, b) for which the composite Midpoint rule for n + 2 subintervals can be
written as
Z b n/2
X b − a 2 ′′
f (x) dx = 2h f (x2j ) − h f (µ) (6.50)
a j=0
6

An important property shared by all these composite integration techniques is a stability with respect
to round off errors. Let demonstrate this property for Composite Simpson’s rule with n subintervals
to a function f on [a, b]. Assume that
f (xi ) = f˜(xi ) + ei (6.51)
where ei is the round off error and f˜(xi ) is an approximation to f (xi ). From the Composite Simpson’s
rule

 
Z b (n/2)−1 n/2
h X X b − a 4 (4)
f (x) dx = f (a) + 2 f (x2j ) + 4 f (x2j−1 ) + f (b) − h f (µ)
a 3 j=1 j=1
180

the accumulate round off error, e(h) is given by

 
(n/2)−1 n/2
h X X
e(h) = e0 + 2 e2j + 4 e2j−1 + en 
3 j=1 j=1

If the round off errors are uniformly bounded by ǫ, then


h
e(h) ≤ ǫ [1 + (n − 2) + 2n + 1]
3
≤ nhǫ = (b − a)ǫ
It is clear from the last equation that the bound is independent of h and n. This means that, even though we may need to divide an interval into more parts to ensure accuracy, the increased computation that is required does not increase the round-off error. In other words the procedure is stable as h approaches zero. Recall that this was not true for the numerical differentiation procedures.
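A minimal Python sketch of the composite Simpson's rule (6.48) (an illustrative addition; n must be even):

import math

def composite_simpson(f, a, b, n):
    h = (b - a) / n
    s = f(a) + f(b)
    for j in range(1, n):
        s += (4 if j % 2 == 1 else 2) * f(a + j * h)
    return h / 3 * s

print(composite_simpson(math.exp, -1, 1, 2))    # 2.36205, the basic Simpson rule above
print(composite_simpson(math.exp, -1, 1, 10))   # close to the exact value e - 1/e = 2.35040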

ˆ Exercises section 4.4:

6.5 Adaptive Quadrature Methods


The composite quadrature rules use equally spaced points. This is not efficient if the function to be integrated has large variations in some regions and small variations in others, so it is useful to introduce a method that adjusts the step size: the step size should be smaller over regions where large variations occur. This technique is called adaptive quadrature. The method below is based on Simpson's rule, but any Newton-Cotes formula could be used.
We know that Simpson's rule uses two subintervals over [ak , bk ]:

h
S(ak , bk ) = (f (ak ) + 4f (ck ) + f (bk )) , (6.52)
3
b k − ak
where ck is the center of [ak , bk ], and h = . Furthermore, if f ∈ C (4) [ak , bk ], then there exist
2
ξk ∈ [ak , bk ] so that
Z b
h5 (4)
f (x) dx = S(ak , bk ) − f (ξk ) (6.53)
a 90

A composite Simpson rule using four subintervals of [ak , bk ] can be performed by bisecting this interval
into two equal subinterval [ak , ck ] = [ak1 , bk1 ] and [ck , bk ] = [ak2 , bk2 ]. We then write

h
S(ak1 , bk1 ) + S(ak2 , bk2 ) = (f (ak1 ) + 4f (ck1 ) + f (bk1 )) (6.54)
3×2
h
+ (f (ak2 ) + 4f (ck2 ) + f (bk2 )) (6.55)
3×2
where only two additional evaluation of f (x) are needed at ck1 and ck2 , which are the midpoint of the
intervals [ak1 , bk1 ], and [ak2 , bk2 ], respectively.
Furthermore, if f ∈ C (4) [ak , bk ], then there exist ξk1 ∈ [ak , bk ] so that
Z bk
h5
= S(ak1 , bk1 ) + S(ak2 , bk2 ) − f (4) (ξk1 ) (6.56)
ak 16 × 90

If we assume that f (4) (ξk ) ≈ f (4) (ξk1 ), we get

h5 (4) h5
S(ak , bk ) − f (ξk ) ≈ S(ak1 , bk1 ) + S(ak2 , bk2 ) − f (4) (ξk1 )
90 16 × 90
which can be written as
h5 (4) 16
− f (ξk1 ) ≈ (S(ak1 , bk1 ) + S(ak2 , bk2 ) − S(ak , bk ))
90 15
Thus, we can find that
| ∫_{ak}^{bk} f (x) dx − S(ak1 , bk1 ) − S(ak2 , bk2 ) | ≈ (1/15) |S(ak1 , bk1 ) + S(ak2 , bk2 ) − S(ak , bk )|    (6.57)

Because of the assumption f^(4)(ξk ) ≈ f^(4)(ξk1 ), the fraction 1/15 is replaced with 1/10 when implementing the method in a program.
Assume that we want the tolerance to be ǫk > 0 for the interval [ak , bk ]. If
1
|S(ak1 , bk1 ) + S(ak2 , bk2 ) − S(ak , bk )| < ǫk (6.58)
10
we infer that Z b


f (x) dx − S(a k1 , b k1 ) − S(a k2 , b k2 ) < ǫk
(6.59)
a

Thus the composite Simpson rule is used to approximate the integral


Z b
f (x) dx ≈ S(ak1 , bk1 ) + S(ak2 , bk2 ) (6.60)
a

and the error bound for this approximation over the interval [ak , bk ] is ǫk .
The adaptive quadrature is implemented by applying Simpson’s rules in this way:

1. We start with the interval [a0 , b0 ] and tolerance ǫ0

2. we then apply Simpson rule:


h
S(ak , bk ) = (f (ak ) + 4f (ck ) + f (bk )) ,
3
h
S(ak1 , bk1 ) + S(ak2 , bk2 ) = (f (ak1 ) + 4f (ck1 ) + f (bk1 ))
3×2
h
+ (f (ak2 ) + 4f (ck2 ) + f (bk2 ))
3×2

3. The interval is refined into subintervals labeled [a01 , b01 ] and [a02 , b02 ].

4. If the accuracy test


1
|S(a01 , b01 ) + S(a02 , b02 ) − S(a0 , b0 )| < ǫ0 (6.61)
10
is passed, the quadrature formula
h
S(a01 , b01 ) + S(a02 , b02 ) = (f (a01 ) + 4f (c01 ) + f (b01 ))
6
h
+ (f (a02 ) + 4f (c02 ) + f (b02 ))
6
is applied to [a0 , b0 ] and we are done.

5. If the accuracy test fails,

ˆ The two subintervals are labeled [a1 , b1 ] and [a2 , b2 ], over which the tolerances are halved,
ǫ1 = ǫ/2, ǫ2 = ǫ/2.
ˆ We repeat the steps 4-5 for the two intervals with the new tolerances.

6. we add all the quadrature formulas where the accuracy test are passed

example We apply the adaptive quadrature algorithm to approximate
Z 1
3√
x dx = 1
0 2

We first define the function


   
S(µ, v) = (v − µ)/6 [f (µ) + 4 f ((µ + v)/2) + f (v)]

ˆ Accuracy test for [0,1], we start with ǫ0 = 0.001

S(0, 1) = 0.9571067813
S(0, 0.5) = 0.3383883477
S(0.5, 1) = 0.6464010497
|S(0, 0.5) + S(0.5, 1) − S(0, 1)| − 10ǫ0 = 0.0176826161 > 0

We have to refine the interval [0, 1] into [0, 0.5] and [0.5, 1]

ˆ Accuracy test for [0,0.5], our ǫ1 = ǫ0 /2

S(0, 0.5) = 0.3383883477


S(0, 0.25) = 0.1196383477
S(0.25, 0.5) = 0.2285372827
|S(0, 0.25) + S(0.25, 0.5) − S(0, 0.5)| − 10ǫ1 = 0.004787282700 > 0

We have to refine the interval [0, 0.5] into [0, 0.25] and [0.25, 0.5]

ˆ Accuracy test for [0,0.25], our ǫ2 = ǫ1 /2

S(0, 0.25) = 0.1196383477


S(0, 0.125) = 0.04229854347
S(0.125, 0.25) = 0.08080013118
|S(0, 0.125) + S(0.125, 0.25) − S(0, 0.25)| − 10ǫ2 = 0.000960326900 > 0

We have to refine the interval [0, 0.25] into [0, 0.125] and [0.125, 0.25]

ˆ Accuracy test for [0,0.125], our ǫ3 = ǫ2 /2

S(0, 0.125) = 0.04229854347


S(0, 0.0625) = 0.01495479346
S(0.0625, 0.125) = 0.02856716035
|S(0, 0.0625) + S(0.0625, 0.125) − S(0, 0.125)| − 10ǫ3 = −0.000026589660 < 0

The test has passed. So, we can keep the interval [0, 0.125] with
S(0, 0.0625) + S(0.0625, 0.125) = 0.04352195381.
We go now back and keep ǫ3

ˆ Accuracy test for [0.125,0.25]
S(0.125, 0.25) = 0.08080013118
S(0.125, 0.1875) = 0.03699538942
S(0.1875, 0.25) = 0.04381002180
|S(0.125, 0.1875) + S(0.1875, 0.25) − S(0.125, 0.25)| − 10ǫ3 = −0.001244719960 < 0
The test has passed. So, we can keep the interval [0.125, 0.25] with
S(0.125, 0.1875) + S(0.1875, 0.25) = 0.08080541122.
We go now back with ǫ2
ˆ Accuracy test for [0.25 0.5]
S(0.25, 0.5) = 0.2285372827
S(0.25, 0.375) = 0.1046387629
S(0.375, 0.5) = 0.1239134540
|S(0.25, 0.375) + S(0.375, 0.5) − S(0.25, 0.5)| − 10ǫ2 = −0.002485065800 < 0
The test has passed. So, we can keep the interval [0.25, 0.5] with
S(0.25, 0.375) + S(0.375, 0.5) = 0.2285522169.
We go now back with ǫ1
ˆ Accuracy test for [0.5 1]
S(0.5, 1) = 0.6464010497
S(0.5, 0.75) = 0.2959631153
S(0.75, 1) = 0.3504801743
|S(0.5, 0.75) + S(0.75, 1) − S(0.5, 1)| − 10ǫ1 = −0.004957760100 < 0
The test has passed. So, we can keep the interval [0.5, 1] with
S(0.5, 0.75) + S(0.75, 1) = 0.6464432896.
ˆ Now we can add

Z 1
3√
x dx ≈ S(0, 0.0625) + S(0.0625, 0.125) + S(0.125, 0.1875) + S(0.1875, 0.25) +
0 2
S(0.25, 0.375) + S(0.375, 0.5) + S(0.5, 0.75) + S(0.75, 1)
= 0.9993228715

ˆ the actual error is 0.0006771285


We can summarize our results:
ak bk AQ
0.000 0.125 0.04352195381
0.125 0.250 0.08080541122
0.250 0.500 0.2285522169
0.500 1.000 0.6464432896
0.9993228715
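The whole procedure above can be sketched in a few lines of Python (an illustrative addition, not the textbook's Algorithm); the recursion mirrors the accuracy test and the halving of the tolerance:

import math

def S(f, a, b):
    return (b - a) / 6 * (f(a) + 4 * f((a + b) / 2) + f(b))

def adaptive(f, a, b, eps):
    c = (a + b) / 2
    whole, left, right = S(f, a, b), S(f, a, c), S(f, c, b)
    if abs(left + right - whole) < 10 * eps:       # accuracy test (6.61)
        return left + right
    return adaptive(f, a, c, eps / 2) + adaptive(f, c, b, eps / 2)

f = lambda x: 1.5 * math.sqrt(x)
print(adaptive(f, 0.0, 1.0, 0.001))   # 0.9993228715, as in the table above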

[Figure: f (x) = (3/2)√x on [0, 1], showing the adaptive subinterval endpoints 0, 0.125, 0.25, 0.5, 1.]

ˆ Exercises section 4.6:

6.6 Gaussian Quadrature


Gaussian integration is based on the idea that the accuracy of a quadrature formula can be improved by choosing the nodes wisely, rather than on the basis of equally spaced nodes. Gaussian integration assumes an approximation of the form

∫_{−1}^{1} f (x) dx ≈ Σ_{i=1}^{n} ai f (xi )    (6.62)

This formula contains 2n unknowns (the nodes xi and the weights ai ), which Gaussian quadrature determines by requiring the formula to have the highest possible degree of precision.

Let us find the Gaussian quadrature formula for n = 2. In this case the functions 1, x, x^2 , and x^3 should give exact results.
Z 1
f (x) = 1 =⇒ a1 + a2 = dx = 2
−1
Z 1
f (x) = x =⇒ a1 x1 + a2 x2 = xdx = 0
−1
Z 1
2 2 2 2
f (x) = x =⇒ a1 x1 + a2 x2 = x2 dx =
−1 3
Z 1
f (x) = x3 =⇒ a1 x31 + a2 x32 = x3 dx = 0
−1

Solving these equations we get

a1 = a2 = 1,   x1 = −x2 = 1/√3    (6.63)

Thus, we have the Gaussian integration rule for two nodes

∫_{−1}^{1} f (x) dx ≈ f (−1/√3) + f (1/√3)    (6.64)

This method can be generalized to more than two nodes but there is another alternative way to get
more easily. This alternative way use what we call Legendre Polynomials. They are defined by
ˆ For each n, Pn is a monic polynomial of degree n, a polynomial xn + a(n−1) x(n−1) + ... + a1 x + a0 in
which the coefficient of the highest order term is 1.
ˆ whenever P (x) is polynomial of degree less then n we have
Z 1
P (x) Pn (x) dx = 0 (6.65)
−1

ˆ These are few Legendre polynomials:


P0 (x) = 1,   P1 (x) = x,   P2 (x) = x^2 − 1/3,
P3 (x) = x^3 − (3/5) x,   P4 (x) = x^4 − (6/7) x^2 + 3/35    (6.66)

The nodes xi , i = 1, . . . , n are the roots of Pn (x), and the coefficients ai are defined by
ai = ∫_{−1}^{1} Π_{j=1, j≠i}^{n} (x − xj )/(xi − xj ) dx    (6.67)

If P (x) is any polynomial of degree less than 2n, then the rule is exact:

∫_{−1}^{1} P (x) dx = Σ_{i=1}^{n} ai P (xi )    (6.68)

Note that Gaussian formula imposes a restriction on the limits of integration to be from -1 to 1. It is
possible to overcome this restriction by using the technique of changing variable.
Z b Z 1
f (x) dx = g(z)dz (6.69)
a −1

We define

z = Ax + B (6.70)
z−B
x = (6.71)
A
So, we can get

−1 = A a + B
1 = Ab + B

which gives the solution


2
A =
b−a
a+b
B =
a−b

Therefore

Z b Z 1
1
f (x) dx = g(z) dz (6.72)
a A −1

where
 
z−B
g(z) = f (6.73)
A

Example:
Convert the integral
Z 2
I= e−x/2 dx
−2

to the Gaussian limits of integration.


Solution:
Z 2
I = e−x/2 dx
−2
Z  
1 1 z−B
= f dz
A −1 A

We have
2 1
A= =
b−a 2
a+b
B= =0
a−b
Thus
Z 2
I = e−x/2 dx
−2
Z 1
= 2 f (2z) dz
−1
Z 1
= 2 e−z dz
−1
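A Python sketch of the two-point Gauss-Legendre rule combined with the change of variable above (an illustrative addition; the helper names are chosen only for this example):

import math

def gauss2(g):
    # two-point rule on [-1, 1]: g(-1/sqrt(3)) + g(1/sqrt(3))
    z = 1 / math.sqrt(3)
    return g(-z) + g(z)

a, b = -2.0, 2.0
A, B = 2 / (b - a), (a + b) / (a - b)
f = lambda x: math.exp(-x / 2)
approx = (1 / A) * gauss2(lambda z: f((z - B) / A))
print(approx)                        # about 4.685
print(2 * (math.e - 1 / math.e))     # exact value 2(e - 1/e) ≈ 4.701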

6.7 Improper Integrals


Improper integrals arise in two cases:

ˆ the notion of integration is extended to an interval of integration on which the function is unbounded

ˆ the endpoint of the interval of integration is infinity, ∞.

Let consider the first case, when the integrand f (x) is unbounded. The second case can be reduced to the
first case. It is well known that the improper integral
∫_a^b dx / (x − a)^p    (6.74)

converges if and only if the power satisfies 0 < p < 1, in which case its value is
∫_a^b dx / (x − a)^p = (b − a)^(1−p) / (1 − p)    (6.75)
If the function f (x) can be written as
g(x)
f (x) = (6.76)
(x − a)p
where g is continuous on [a, b], the improper integral
Z b
f (x)dx (6.77)
a

exists for 0 < p < 1.


Let now approximate this integral given by eq. (6.77) using composite Simpson’s rule. Assume that
g ∈ C 5 [a, b]. In this case, the fourth Taylor polynomial, P4 (x), for g can be written as

g ′′ (a) g ′′′ (a) g (4) (a)


P4 (x) = g(a) + (x − a)g ′ (a) + (x − a)2 + (x − a)3 + (x − a)4 (6.78)
2 3! 4!
Therefore
Z b Z b Z b
g(x) − P4 (x) P4 (x)
f (x)dx = dx + dx (6.79)
a a (x − a)p a (x − a)p
The last term can be integrated like
Z b 4 Z
P4 (x) X g (k) (a)
p
dx = (x − a)k−p dx
a (x − a) k=0
k!
4
X g (k) (a)
= (b − a)k−p+1 (6.80)
k=0
(k − p + 1)k!

Now, we can define a function G(x) such that



 g(x) − P4 (x)
, if a < x ≤ b
G(x) = (x − a)p (6.81)

0, if x = a

Since 0 < p < 1 and P4^(k)(a) agrees with g^(k)(a) for each k = 0, . . . , 4, we have G ∈ C 4 [a, b]. This implies that the composite Simpson's rule can be applied to approximate the integral of G on [a, b].
Example:
We want to approximate Z 1
sin (x)
dx (6.82)
0 x1/4
Using Simpson’s composite rule with n = 4.

1. We find the fourth order Taylor polynomial for sin(x) at x = 0
1
sin(x) ≈ P4 (x) = x − x3 (6.83)
6

2. We evaluate the integral


Z 1 Z 1  
P4 (x) 3/4 1 11/4
dx = x − x dx
0 x1/4 0 6
166
= = 0.5269841270
315

3. We define the function G(x)



1


 sin(x) − x + x3
 6 , if 0 < x ≤ 1
G(x) = x 1/4




0, if x = 0

4. We apply Composite Simpson’s rule with n = 4

Z 1
1
G(x) dx ≈ [G(0) + 2G(0.5) + 4G(0.25) + 4G(0.75) + G(1)] = 0.001432198742
0 12

5. Now we get
∫_0^1 sin(x)/x^(1/4) dx ≈ 0.001432198742 + 0.5269841270 = 0.5284163257
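A short Python check of the computation above (an illustrative addition): composite Simpson with n = 4 applied to G(x), plus the analytically integrated Taylor part.

import math

def G(x):
    return 0.0 if x == 0 else (math.sin(x) - x + x**3 / 6) / x**0.25

h = 0.25
simpson_G = h / 3 * (G(0) + 4*G(0.25) + 2*G(0.5) + 4*G(0.75) + G(1))
taylor_part = 166 / 315                 # integral of P4(x)/x^(1/4) on [0, 1]
print(simpson_G + taylor_part)          # about 0.5284163, as above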

To approximate the improper integral with a singularity at the right endpoint, we could apply the
technique used above with the following transformation
Z b Z −a
f (x)dx = f (−z)dz (6.84)
a −b

which has a singularity at the left endpoint. An improper with a singularity at a < c < b can be treated
as the sum
Z b Z c Z b
f (x)dx = f (x)dx + f (x)dx (6.85)
a a c

The other type of improper integral involves infinite limits of integration. For a > 0 it can be treated using the substitution t = 1/x:
∫_a^∞ f (x) dx = ∫_0^(1/a) t^(−2) f (1/t) dt    (6.86)
For the case a ≤ 0 we can split the integral into two parts, one from a to some c > 0 and the other from c to ∞, and apply the substitution to the second part.

Chapter 7

Initial-Value Problems for ODE

7.1 Introduction
ˆ An ordinary differential equation (ODE) is an equation containing one or more derivatives of y.

ˆ Differential equations are classified according to their order. The order of a differential equation is the highest derivative that appears in the equation. When the equation contains only a first derivative, it is called a first-order differential equation. A first-order differential equation can be expressed as
dy/dt = f (t, y)    (7.1)
Degree of a differential equation is the power of the highest-order derivative. For example

ty ′′ + 3t2 + 2 = 0

is a first-degree, second-order differential equation.

ˆ A differential equation is a linear equation when it does not contain terms involving the product of
the dependent variable y or its derivatives. For example y ′′ + 2y ′ + t2 is linear but y ′′ + 2y ′ y + t2 is
not.

ˆ If the order of the equation is n, we need n conditions in order to obtain a unique solution. When all the conditions are specified at a particular value of the independent variable t, the problem is called an initial-value problem.

ˆ It is also possible to specify the conditions at different values of t. Such problems are called boundary-value problems.

ˆ All numerical techniques for solving differential equations involve a series of estimates of y(t) starting
from the given conditions. There are two basic approaches, one-step and multistep methods.

ˆ In one-step methods, we use information from only one preceding point: to estimate yi we only need yi−1 .

ˆ Multistep methods use information at two or more previous steps to estimate a value.

7.2 Elementary Theory of Initial-Value Problems
ˆ Lipschitz condition: A function f (t, y) is said to satisfy a Lipschitz condition in the variable y on
a set D ⊂ R2 if a constant L > 0 exists with

|f (t, y1 ) − f (t, y2 )| ≤ L|y1 − y2 |, (7.2)

whenever (t, y1 ), (t, y2 ) ∈ D.

ˆ Convex Set: A set D ∈ R2 is said to be convex if whenever (t1 , y1 ) and (t2 , y2 ) belong to D and λ
is in [0, 1], the point
((1 − λ)t1 + λt2 , (1 − λ)y1 + λy2 ) belongs to D. This means that the entire straight line segment
between the two points also belongs to the set D.

[Figure: a convex set and a non-convex set]

ˆ The set D = {(t, y)|a ≤ t ≤ b, y ∈ R} is a convex set.

ˆ Theorem:
Suppose that f (t, y) is defined on a convex set D ∈ R2 . If a constant L > 0 exists with

|∂f /∂y (t, y)| ≤ L,  for all (t, y) ∈ D,

then f satisfies a Lipschitz condition on D in the variable y.


proof:

ˆ Theorem:
Suppose that D = {(t, y)|a ≤ t ≤ b, y ∈ R} and that f (t, y) is continuous on D. If f satisfies a
Lipschitz condition on D in the variable y, then the initial-value problem

y ′ (t) = f (t, y), a ≤ t ≤ b, y(a) = α,

has a unique solution y(t) for a ≤ t ≤ b.


Example:
We want to show that the following initial-value problem has a unique solution

y ′ = y cos(t), 0 ≤ t ≤ 1, y(0) = 1

We apply the previous theorem which states:

Suppose that D = {(t, y)|a ≤ t ≤ b, y ∈ R} and that f (t, y) is continuous on D. If f satisfies a
Lipschitz condition on D in the variable y, then the initial-value problem
y ′ (t) = f (t, y), a ≤ t ≤ b, y(a) = α,
has a unique solution y(t) for a ≤ t ≤ b.
Let check that
f (t, y) = y ′ = y cos(t), 0 ≤ t ≤ 1
satisfies Lipschitz condition.
|y1 cos(t) − y2 cos(t)| = | cos(t)| |y1 − y2 | ≤ |y1 − y2 |
Thus, L = 1, f (t, y) satisfies the Lipschitz condition. Therefore, there is a unique solution.
example:

We want to show that


y ′ = − (y^3 + y) / ((3y^2 + 1) t)
has the solution defined implicitly by y^3 t + y t = 2 for 1 ≤ t ≤ 2, with y(1) = 1.
– Differentiating y^3 t + y t = 2 implicitly with respect to t, we find
3y^2 y ′ t + y^3 + y ′ t + y = 0
y ′ (3y^2 t + t) + y^3 + y = 0
which gives
y ′ = − (y^3 + y) / ((3y^2 + 1) t)
We also have, from y^3 t + y t = 2, that at t = 1, y^3 + y = 2, whose only real root is y = 1. To get y(2) we set t = 2, which gives y^3 + y = 1, and we apply Newton's method to
f (y) = y^3 + y − 1 = 0
pn+1 = pn − (pn^3 + pn − 1) / (3 pn^2 + 1)
It is clear that f (0) = −1 and f (1) = 1. We can start Newton iteration from p0 = 0.5. We get
i pi
0 0.5
1 0.7142857143
2 0.6831797236
3 0.6823284233
4 0.6823278037
5 0.6823278038
6 0.6823278038
So, the approximate value of y(2) = 0.6823278038.
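The Newton iteration used above is easy to reproduce with a short script. The following Python sketch is only illustrative (the function names are ours, not from the notes); it iterates pn+1 = pn − f(pn)/f′(pn) for f(y) = y³ + y − 1 starting from p0 = 0.5.

```python
def newton(f, df, p0, tol=1e-10, max_iter=50):
    """Newton's method: iterate p_{n+1} = p_n - f(p_n)/f'(p_n)."""
    p = p0
    for _ in range(max_iter):
        p_new = p - f(p) / df(p)
        if abs(p_new - p) < tol:
            return p_new
        p = p_new
    return p

# f(y) = y^3 + y - 1, whose root gives y(2) for the implicit solution above
f  = lambda y: y**3 + y - 1
df = lambda y: 3 * y**2 + 1

print(newton(f, df, 0.5))   # ~0.6823278038, matching the table
```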
ˆ Exercises section 5.1: 1, 3, 5, 7

7.3 Euler’s Method
ˆ The objective of Euler’s method is to obtain an approximate solution to the well-posed initial-value
problem
dy/dt = f(t, y), a ≤ t ≤ b, y(a) = α     (7.3)
We can obtain approximate solutions at fixed points, called mesh points.
Let us assume that the mesh points are equally distributed throughout the interval [a, b]. So, we choose

ti = a + ih, for each i = 0, 1, ..., N. (7.4)

The common distance between the points h = (b − a)/N , is called the step size. To derive Euler’s
method we use Taylor’s Theorem

y(ti+1) = y(ti) + (ti+1 − ti) y′(ti) + ((ti+1 − ti)²/2) y′′(ξi)
        = y(ti) + h y′(ti) + (h²/2) y′′(ξi)
        = y(ti) + h f(ti, y(ti)) + (h²/2) y′′(ξi)     (7.5)
Euler’s method constructs wi ≈ y(ti ), for each i = 1, 2, ..., N , by dropping the remainder term. Thus,
it is given by

w0 = α,
wi+1 = wi + h f(ti, wi), for each i = 0, 1, ..., N − 1.     (7.6)

This last equation is called difference equation associated with Euler’s method.

ˆ Example: Let us approximate the solution of y′ = y − t² + 1 using Euler's method with 0 ≤ t ≤ 2,
y(0) = 0.5, and N = 10 (see Example 1, page 259 of the textbook).
We have h = (b − a)/N = 0.2 and t0 = 0. Euler's method gives:

w0 = 0.5,
wi+1 = wi + 0.2 × (wi − ti² + 1), for each i = 0, 1, ..., N − 1

ti wi

0.0 0.50000
0.2 0.80000
0.4 1.15200
0.6 1.55040
.. ..
. .
2.0 4.86580
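A direct implementation of the difference equation (7.6) reproduces this table. The Python sketch below is illustrative only (the function and variable names are ours, not part of the notes).

```python
def euler(f, a, b, alpha, N):
    """Euler's method: w_{i+1} = w_i + h*f(t_i, w_i) on [a, b] with N steps."""
    h = (b - a) / N
    t, w = a, alpha
    values = [(t, w)]
    for _ in range(N):
        w = w + h * f(t, w)
        t = t + h
        values.append((t, w))
    return values

# The example above: y' = y - t^2 + 1, y(0) = 0.5, N = 10
for t, w in euler(lambda t, y: y - t**2 + 1, 0.0, 2.0, 0.5, 10):
    print(f"{t:.1f}  {w:.5f}")   # 0.50000, 0.80000, 1.15200, 1.55040, ..., 4.86580
```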

ˆ Theorem
Suppose that f is continuous and satisfies a Lipschitz condition with constant L on

D = {(t, y) | a ≤ t ≤ b, −∞ < y < ∞} (7.7)

and that a constant M exists with

|y ′′ (t)| ≤ M, for all t ∈ [a, b] (7.8)

Let y(t) denote the unique solution to the initial-value problem

y ′ (t) = f (t, y), a ≤ t ≤ b, y(a) = α (7.9)

and wi , (i = 0, . . . , N ) be the approximations generated by Euler’s method for some positive integer
N . Then, for each i
|y(ti) − wi| ≤ (hM / (2L)) [e^{L(ti − a)} − 1]     (7.10)

ˆ Theorem
Assume that the hypotheses of the previous theorem hold and ui (i = 0, . . . , N ) be the approximations
obtained from

u0 = α + δ0 ,
ui+1 = ui + hf (ti , ui ) + δi+1 (7.11)

and if |δi | < δ, then


 
|y(ti) − ui| ≤ (1/L) (hM/2 + δ/h) [e^{L(ti − a)} − 1] + |δ0| e^{L(ti − a)}     (7.12)

The error bound is no longer linear in h. In fact, it goes to infinity as h goes to zero:

lim_{h→0+} (hM/2 + δ/h) = ∞     (7.13)

It can be shown that the minimum value of the error bound occurs when

h = sqrt(2δ/M)     (7.14)

Example (Exercise 9):
Given the initial-value problem

y′ = (2/t) y + t² e^t, 1 ≤ t ≤ 2, y(1) = 0,

with the exact solution y(t) = t²(e^t − e). Euler's method with h = 0.1 gives

t w(t) y(t) y(t) − w(t)
1. 0 0. 0.
1.1 0.2718281828 0.345919877 0.0740916942
1.2 0.6847555777 0.866642537 0.1818869593
1.3 1.276978344 1.607215080 0.330236736
1.4 2.093547688 2.620359552 0.526811864
1.5 3.187445123 3.967666297 0.780221174
1.6 4.620817847 5.720961530 1.100143683
1.7 6.466396379 7.963873477 1.497477098
1.8 8.809119690 10.79362466 1.984504970
1.9 11.74799654 14.32308154 2.57508500
2.0 15.39823565 18.68309709 3.28486144
A linear interpolation to approximate y(1.04) can be found as follows. Using x0 = 1 and x1 = 1.1
together with the values w(x0) and w(x1), we get the Lagrange polynomial
P(x) = 2.718281828 x − 2.718281828.
Thus, P(1.04) = 0.108731273. The exact value is y(1.04) = 0.119987497, so the error is 0.011256224.
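For completeness, the Euler column, the exact values, and the interpolated estimate of y(1.04) can be regenerated with the short script below. It is a hedged sketch (names are ours), not part of the original notes.

```python
from math import exp, e

f = lambda t, y: (2 / t) * y + t**2 * exp(t)     # right-hand side of the ODE
y_exact = lambda t: t**2 * (exp(t) - e)          # exact solution

h, t, w = 0.1, 1.0, 0.0
table = [(t, w)]
for _ in range(10):                              # Euler steps from t = 1 to t = 2
    w = w + h * f(t, w)
    t = round(t + h, 10)
    table.append((t, w))

for t, w in table:
    print(f"{t:.1f}  {w:.10f}  {y_exact(t):.10f}  {y_exact(t) - w:.10f}")

# linear interpolation between the mesh points 1.0 and 1.1 to estimate y(1.04)
(t0, w0), (t1, w1) = table[0], table[1]
p_104 = w0 + (w1 - w0) / (t1 - t0) * (1.04 - t0)
print(p_104, abs(y_exact(1.04) - p_104))         # ~0.108731, error ~0.011256
```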

7.4 Higher-Order Taylor Methods


ˆ Euler’s method is Taylor’s method of order one.
ˆ For order n we write the Taylor polynomial of y about ti:

y(ti+1) = Σ_{k=0}^{n} (h^k / k!) y^(k)(ti) + (h^{n+1} / (n+1)!) y^(n+1)(ξi)     (7.15)

for some ξi ∈ (ti, ti+1). The difference-equation method corresponding to the previous equation is found
by deleting the remainder term. We get Taylor's method of order n:

ω0 = α,
ωi+1 = ωi + h T^(n)(ti, ωi), i = 0, 1, ..., N − 1,     (7.16)

where

T^(n)(ti, ωi) = f(ti, ωi) + Σ_{j=1}^{n−1} (h^j / (j+1)!) f^(j)(ti, ωi)

Example:
We want to approximate the solution of the initial-value problem
y′ = t² + y², 0 ≤ t ≤ 0.4, y(0) = 0
with h = 0.2 using Taylor's method of order 4.
We calculate the derivatives
y′ = t² + y²
y′′ = 2t + 2yy′
y′′′ = 2 + 2(y′)² + 2yy′′
y⁽⁴⁾ = 6y′y′′ + 2yy′′′

we find

y(0) = 0.0
y(0.2) = 0.00266666667
y(0.4) = 0.02135325469

If we use the step size h = 0.4 we get y(0.4) = 0.02133333333. The correct answer is y(0.4) = 0.021359.
This shows that the accuracy has been improved by using subintervals, i.e., by decreasing the step size.
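A short script makes this computation reproducible. It is a hedged sketch of Taylor's method of order 4 for this particular equation, with the derivatives hard-coded from the expressions above; the names are ours, not from the notes.

```python
def taylor4_step(t, y, h):
    """One step of Taylor's method of order 4 for y' = t^2 + y^2."""
    f0 = t**2 + y**2                    # y'
    f1 = 2*t + 2*y*f0                   # y''
    f2 = 2 + 2*f0**2 + 2*y*f1           # y'''
    f3 = 6*f0*f1 + 2*y*f2               # y''''
    T = f0 + (h/2)*f1 + (h**2/6)*f2 + (h**3/24)*f3
    return y + h*T

y, t, h = 0.0, 0.0, 0.2
for _ in range(2):
    y = taylor4_step(t, y, h)
    t += h
    print(t, y)   # 0.2 -> 0.00266666..., 0.4 -> 0.02135325...
```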

7.5 Runge-Kutta Methods


ˆ The Taylor’s method provides the formal definition of a step-by-step numerical method for solving
initial-value problems. The difficulty of applying Taylor’s method is connected with evaluating higher
derivatives which is extremely complicated. We can explore a class of methods that agree with the
first n + 1 terms of the Taylor series using function value only (no need to construct f (r) ). These are
Runge-Kutta Methods.

ˆ Runge-Kutta methods refer to a family of one-step methods. They all based on the general form of
the extrapolation equation,

yi+1 = yi + slope × interval size (7.17)


= yi + m h (7.18)

where m is a slope obtained as a weighted average of the slopes at various points in the interval. If
we estimate m using slopes at r points in the interval (ti, ti+1), then m can be written as m =
w1 m1 + w2 m2 + . . . + wr mr, where the wi are the weights given to the slopes at those points.

ˆ Runge-Kutta methods are classified by their order. A Runge-Kutta method is called an
rth-order Runge-Kutta method when slopes at r points are used to construct the weighted-average
slope m. In Euler's method we use only the slope at the single point (ti, yi) to estimate yi+1; therefore,
Euler's method is a first-order Runge-Kutta method.

ˆ The second Taylor polynomial in two variables for the function f(t, y) near the point (t0, y0) can be
written as

P2(t, y) = f(t0, y0) + [(t − t0) ft + (y − y0) fy]
         + [((t − t0)²/2) ftt + ((y − y0)²/2) fyy + (t − t0)(y − y0) fty]     (7.19)

ˆ We consider here the second-order method which has the form

yi+1 = yi + (w1 m1 + w2 m2 )h (7.20)


where

m1 = f (ti , yi ) (7.21)
m2 = f (ti + a1 h, yi + b1 m1 h) (7.22)

The weights w1 and w2 and the constants a1 and b1 are to be determined. The principle of the Runge-
Kutta method is that these parameters are chosen such that the power series expansion of the right
side of eq. (7.20) agrees with the Taylor series expansion of yi+1 in terms of yi and f(ti, yi).
The second-order Taylor series expansion of yi+1 about yi is given by

yi+1 = yi + y′ h + (y′′/2) h²     (7.23)

We know that

yi′ = f(ti, yi)
yi′′ = dy′/dt = ∂f/∂t + (∂f/∂y) f(ti, yi)

We get

yi+1 = yi + f h + (h²/2)(ft + fy f)
Now consider the right side of eq. (7.20). To get its Taylor expansion we need the Taylor series in
two variables. We can write

yi+1 = yi + (w1 m1 + w2 m2) h
     = yi + [w1 f + w2 f(ti + a1 h, yi + b1 m1 h)] h
     = yi + [w1 f + w2 (f + a1 h ft + b1 m1 h fy + O(h²))] h
     = yi + w1 h f + w2 h f + w2 a1 h² ft + w2 b1 m1 h² fy + O(h³)
     = yi + (w1 + w2) h f + w2 (a1 ft + b1 m1 fy) h² + O(h³)

Then we can compare

yi+1 = yi + f h + (h²/2)(ft + fy f)
yi+1 = yi + (w1 + w2) h f + w2 (a1 ft + b1 m1 fy) h²

and, since m1 = f, we find

w1 + w2 = 1,  w2 a1 = w2 b1 = 1/2.

Note that we have only three equations but four unknowns, so this set of equations has no unique
solution. In all the formulas below, the index i runs over i = 0, 1, . . . , N − 1.

ˆ If we choose w1 = 0, w2 = 1 and a1 = b1 = 1/2 we get what is called the Midpoint method:

m1 = f(ti, yi)
m2 = f(ti + h/2, yi + (m1/2) h)
yi+1 = yi + m2 h

ˆ If we choose w1 = w2 = 1/2 and a1 = b1 = 1 we get what is called the Modified Euler method:

m1 = f(ti, yi)
m2 = f(ti + h, yi + m1 h)
yi+1 = yi + (h/2)(m1 + m2)

ˆ If we choose w1 = 1/4, w2 = 3/4 and a1 = b1 = 2/3 we get what is called Heun's method:

m1 = f(ti, yi)
m2 = f(ti + (2/3)h, yi + (2/3) m1 h)
yi+1 = yi + (h/4)(m1 + 3 m2)

ˆ The derivation of the Runge-Kutta method of order four is too long, so we just give it here without details.

m1 = f(ti, yi)
m2 = f(ti + h/2, yi + (1/2) m1 h)
m3 = f(ti + h/2, yi + (1/2) m2 h)     (7.24)
m4 = f(ti + h, yi + m3 h)
yi+1 = yi + (h/6)(m1 + 2 m2 + 2 m3 + m4)

Examples:
We want to approximate the solution of

y′ = t e^{3t} − 2y, 0 ≤ t ≤ 1, y(0) = 0

with h = 0.5. Let us use the modified Euler method:

N = (1 − 0)/h = 2
y0 = 0
ti = 0 + ih, i = 0, 1, . . . , N − 1
m1 = f(ti, yi)
m2 = f(ti + h, yi + m1 h)
yi+1 = yi + (h/2)(m1 + m2)
we get
ti wi
0.0 0.0
0.5 0.560211134
1.0 5.301489796
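The two table values can be reproduced with a few lines of Python. The sketch below is illustrative only (the function names are ours), implementing the modified Euler formulas exactly as written above.

```python
from math import exp

def modified_euler(f, a, b, y0, N):
    """Modified Euler method: y_{i+1} = y_i + (h/2)*(m1 + m2)."""
    h = (b - a) / N
    t, y = a, y0
    out = [(t, y)]
    for _ in range(N):
        m1 = f(t, y)
        m2 = f(t + h, y + m1 * h)
        y = y + (h / 2) * (m1 + m2)
        t = t + h
        out.append((t, y))
    return out

f = lambda t, y: t * exp(3 * t) - 2 * y
for t, y in modified_euler(f, 0.0, 1.0, 0.0, 2):
    print(t, y)   # 0.5 -> 0.560211..., 1.0 -> 5.301489...
```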

Example:
We want to show that the midpoint method, the modified Euler method, and Heun's method give the same
approximations to the initial-value problem

y′ = −y + t + 1, 0 ≤ t ≤ 1, y(0) = 1.

For this equation f = −y + t + 1, so ft = 1, fy = −1, and m1 = f. The general second-order formula gives

yi+1 = yi + (w1 + w2) f h + w2 (a1 ft + b1 m1 fy) h²
     = yi + f h + w2 (a1 − b1 m1) h²

with w1 + w2 = 1.

– Midpoint method:
  w2 = 1, a1 = b1 = 1/2, so
  yi+1 = yi + f h + (h²/2)(1 − m1)

– Modified Euler method:
  w2 = 1/2, a1 = b1 = 1, so
  yi+1 = yi + f h + (h²/2)(1 − m1)

– Heun's method:
  w2 = 3/4, a1 = b1 = 2/3, so
  yi+1 = yi + f h + (h²/2)(1 − m1)

Therefore, all three methods give the same approximations.

Example:
We want to find y(0.2) using the fourth-order Runge-Kutta method for

y′ = 1 + y², y(0) = 0.

We take t0 = 0, y0 = 0, and h = 0.2, we get

m1 = f (t0 , y0 ) = 1
m2 = f (t0 + h/2, y0 + hm1 /2) = 1.01000000
m3 = f (t0 + h/2, y0 + hm2 /2) = 1.01020100
m4 = f (t0 + h, y0 + m3 h) = 1.040820242
y(0.2) = y0 + h(m1 + 2m2 + 2m3 + m4 )/6 = 0.2027074080
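The same computation in code, as a hedged sketch (names are ours), using the RK4 formulas (7.24):

```python
def rk4_step(f, t, y, h):
    """One step of the fourth-order Runge-Kutta method (7.24)."""
    m1 = f(t, y)
    m2 = f(t + h/2, y + m1*h/2)
    m3 = f(t + h/2, y + m2*h/2)
    m4 = f(t + h,   y + m3*h)
    return y + (h/6)*(m1 + 2*m2 + 2*m3 + m4)

f = lambda t, y: 1 + y**2
print(rk4_step(f, 0.0, 0.0, 0.2))   # ~0.2027074, close to the exact value tan(0.2)
```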

7.6 Predictor Corrector Methods
The previous methods — Euler, Heun, Taylor, and Runge-Kutta — are called one-step methods because
they use information from only one previous point to compute the successive point; that is, yi is needed
to compute yi+1. Multistep methods make use of information about the solution at more than one previous
point. A desirable feature of a multistep method is that the local truncation error can be estimated and a
correction term can be included, which improves the accuracy of the answer at each step.
Definition: An m-step multistep method for solving the initial-value problem

y′ = f(t, y), a ≤ t ≤ b, y(a) = α     (7.25)

has a difference equation for finding the approximation wi+1 at the mesh point ti+1 represented by
the following equation, where the integer m > 1:

wi+1 = am−1 wi + am−2 wi−1 + . . . + a0 wi+1−m
       + h [bm f(ti+1, wi+1) + bm−1 f(ti, wi) + . . . + b0 f(ti+1−m, wi+1−m)]     (7.26)

for i = m − 1, . . . , N − 1, where h = (b − a)/N, the a0, a1, . . . , am−1 and b0, . . . , bm are constants, and the
starting values are w0 = α, w1 = α1, . . . , wm−1 = αm−1. When bm = 0 the method is called explicit, or open,
since in eq. (7.26) wi+1 is given explicitly in terms of previously determined values. When bm ≠ 0 the
method is called implicit, or closed, since wi+1 occurs on both sides. Euler's method gives

yi+1 = yi + h f(ti, yi), i = 0, 1, . . .     (7.27)

The modified Euler's method can be written in implicit form as

yi+1 = yi + (h/2) [f(ti, yi) + f(ti+1, yi+1)]     (7.28)

The value of yi+1 is first estimated by Euler's method, eq. (7.27), and then used in the right-hand side of
the modified Euler's method, eq. (7.28), giving a better approximation of yi+1. The new value of yi+1 is again
substituted into eq. (7.28) to find a still better approximation of yi+1. This procedure is repeated until two
consecutive iterated values of yi+1 agree. Equation (7.27) is therefore called the predictor while eq. (7.28)
is called the corrector.
ˆ We will describe only the multistep method called the Adams-Bashforth-Moulton method. It can
be derived from the fundamental theorem of calculus:

yi+1 = yi + ∫_{ti}^{ti+1} f(t, y(t)) dt     (7.29)

The predictor uses the Lagrange polynomial approximation of f(t, y) based on its values at the four
points ti−3, ti−2, ti−1, and ti. Writing fk = f(tk, yk), the polynomial is

P4(t) = fi−3 (t − ti−2)(t − ti−1)(t − ti) / [(ti−3 − ti−2)(ti−3 − ti−1)(ti−3 − ti)]
      + fi−2 (t − ti−3)(t − ti−1)(t − ti) / [(ti−2 − ti−3)(ti−2 − ti−1)(ti−2 − ti)]
      + fi−1 (t − ti−3)(t − ti−2)(t − ti) / [(ti−1 − ti−3)(ti−1 − ti−2)(ti−1 − ti)]
      + fi   (t − ti−3)(t − ti−2)(t − ti−1) / [(ti − ti−3)(ti − ti−2)(ti − ti−1)]

It is integrated over the interval [ti, ti+1]:

∫_{ti}^{ti+1} P4(t) dt = (h/24) (55 fi − 59 fi−1 + 37 fi−2 − 9 fi−3)     (7.30)

Then we can write

yi+1 = yi + (h/24) (55 fi − 59 fi−1 + 37 fi−2 − 9 fi−3)     (7.31)

This last equation is called the Adams-Bashforth predictor. Note that extrapolation is used here, since the
interpolating polynomial is used outside the interval [ti−3, ti].

ˆ The corrector is developed in a similar way. A second Lagrange polynomial for f(t, y) is constructed
based on the four points (ti−2, yi−2), (ti−1, yi−1), (ti, yi), and the new point (ti+1, yi+1) just calculated
by the predictor, eq. (7.31). With fi+1 = f(ti+1, yi+1),

P4(t) = fi−2 (t − ti−1)(t − ti)(t − ti+1) / [(ti−2 − ti−1)(ti−2 − ti)(ti−2 − ti+1)]
      + fi−1 (t − ti−2)(t − ti)(t − ti+1) / [(ti−1 − ti−2)(ti−1 − ti)(ti−1 − ti+1)]
      + fi   (t − ti−2)(t − ti−1)(t − ti+1) / [(ti − ti−2)(ti − ti−1)(ti − ti+1)]
      + fi+1 (t − ti−2)(t − ti−1)(t − ti)   / [(ti+1 − ti−2)(ti+1 − ti−1)(ti+1 − ti)]

It is integrated over the interval [ti, ti+1]:

∫_{ti}^{ti+1} P4(t) dt = (h/24) (9 fi+1 + 19 fi − 5 fi−1 + fi−2)     (7.32)

which gives the Adams-Moulton corrector:

yi+1 = yi + (h/24) (9 fi+1 + 19 fi − 5 fi−1 + fi−2)     (7.33)

We apply the corrector repeatedly, recomputing fi+1 with the latest value of yi+1, until we obtain the needed accuracy.

ˆ Algorithm

– COMPUTE THE FIRST FOUR INITIAL VALUES WITH THE RUNGE-KUTTA METHOD

– COMPUTE y_{i+1}^{(0)} USING THE PREDICTOR FORMULA

  y_{i+1}^{(0)} = yi + (h/24) [55 fi − 59 fi−1 + 37 fi−2 − 9 fi−3]     (7.34)

– COMPUTE f_{i+1}^{(0)} = f(t_{i+1}, y_{i+1}^{(0)})

– COMPUTE y_{i+1}^{(k)} FROM THE CORRECTOR EQUATION

  y_{i+1}^{(k)} = yi + (h/24) [9 f(t_{i+1}, y_{i+1}^{(k−1)}) + 19 fi − 5 fi−1 + fi−2]     (7.35)

– ITERATE ON k UNTIL

  |y_{i+1}^{(k)} − y_{i+1}^{(k−1)}| / |y_{i+1}^{(k)}| < ε     (7.36)

ˆ Example:
Consider the initial-value problem
y′ = 1 + y2, 0 ≤ t ≤ 0.8, y(0) = 0
The first step is to calculate the four initial values w0, w1, w2, and w3. To do this we can use, for
example, the fourth-order Runge-Kutta method. With t0 = 0, w0 = 0, and h = 0.2 we get
k1 = hf (t0 , y0 )
k2 = hf (t0 + h/2, y0 + k1 /2)
k3 = hf (t0 + h/2, y0 + k2 /2)
k4 = hf (t0 + h, y0 + k3 )
w1 = w0 + (k1 + 2k2 + 2k3 + k4 )/6 = 0.2027074081
In the same way we can get
w2 = 0.4227889928
w3 = 0.6841334020
so, we get the predictor
w4 = w3 + h/24 [55f (t3 , w3 ) − 59f (t2 , w2 ) + 37f (t1 , w1 ) − 9f (t0 , w0 )]
= 1.023434882
Now we can correct the predicted value using the corrector formula, with
t1 = 0.2, t2 = 0.4, t3 = 0.6, t4 = 0.8:

w4^{(0)} = 1.023434882
w4^{(1)} = w3 + (h/24) [9 f(t4, w4^{(0)}) + 19 f(t3, w3) − 5 f(t2, w2) + f(t1, w1)] = 1.029690402
w4^{(2)} = w3 + (h/24) [9 f(t4, w4^{(1)}) + 19 f(t3, w3) − 5 f(t2, w2) + f(t1, w1)] = 1.030653654
w4^{(3)} = w3 + (h/24) [9 f(t4, w4^{(2)}) + 19 f(t3, w3) − 5 f(t2, w2) + f(t1, w1)] = 1.030653654

So, the predictor-corrector method gives the approximate solution 1.030653654. The actual solution
of the ODE is
y(t) = tan(t)
y(0.8) = 1.029638557
The errors are
|w4^{(0)} − tan(0.8)| = 0.006203675
|w4^{(3)} − tan(0.8)| = 0.001015097
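The whole computation can be scripted. The following sketch (Python, with our own function names, not from the notes) generates the first three steps with RK4 and then applies the Adams-Bashforth predictor (7.34) followed by two passes of the Adams-Moulton corrector (7.35), reproducing the values above.

```python
from math import tan

def rk4_step(f, t, y, h):
    m1 = f(t, y)
    m2 = f(t + h/2, y + m1*h/2)
    m3 = f(t + h/2, y + m2*h/2)
    m4 = f(t + h,   y + m3*h)
    return y + (h/6)*(m1 + 2*m2 + 2*m3 + m4)

f = lambda t, y: 1 + y**2
h = 0.2
t = [0.0, 0.2, 0.4, 0.6, 0.8]
w = [0.0]
for i in range(3):                       # starting values w1, w2, w3 via RK4
    w.append(rk4_step(f, t[i], w[i], h))

F = [f(ti, wi) for ti, wi in zip(t, w)]
# Adams-Bashforth predictor (7.34)
w4 = w[3] + h/24*(55*F[3] - 59*F[2] + 37*F[1] - 9*F[0])
print("predictor:", w4)                  # ~1.023434882
# Adams-Moulton corrector (7.35), applied twice
for _ in range(2):
    w4 = w[3] + h/24*(9*f(t[4], w4) + 19*F[3] - 5*F[2] + F[1])
    print("corrector:", w4)              # ~1.029690402, then ~1.030653654
print("exact:", tan(0.8))                # ~1.029638557
```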

Chapter 8

Direct Methods for Solving Linear Systems

A linear system has one of the following properties:


ˆ Unique solution

ˆ No solution

ˆ No unique solution (infinitely many solutions)

ˆ Ill conditioned

Let us consider the following linear system:

x − 2y = −2
0.45x − 0.91y = −1

This system is ill conditioned because the two equations represent two nearly parallel lines.
Definition: An n × m matrix can be represented by
 
a11 a12 . . . a1m
 a21 a22 . . . a2m 
 
A = (aij ) =  .. .. .. 
 . . . 
an1 an2 . . . anm

A linear system can be represented by a matrix. Solving a linear system can be done using three elementary
row operations:
ˆ Row Ei can be multiplied by any nonzero constant λ.

ˆ Row Ej can be multiplied by λ and added to row Ei.

ˆ Rows Ei and Ej can be interchanged.

Example

 
1 1 0 3 4
 2 1 −1 1 1 
 
 3 −1 −1 2 −3 
−1 2 3 −1 4

it becomes

1  1  0  3   4          1  1  0   3    4
2  1 −1  1   1    →     0 −1 −1  −5   −7
3 −1 −1  2  −3          0 −4 −1  −7  −15
−1 2  3 −1   4          0  3  3   2    8

and then

1  1  0   3    4
0 −1 −1  −5   −7
0  0  3  13   13
0  0  0 −13  −13

The matrix becomes upper triangular. It is then possible to solve the linear system by a backward-
substitution process.

8.1 Gaussian Elimination


In general, a matrix A of dimension n × n can be reduced to triangular form by elementary row operations.
Example:
Consider the following system
 
1 1 1 1 3
 2 −1 −1 2 12 
 
 1 3 −2 −1 −9 
−1 −1 1 4 17
After elementary operations we get

1  1  1    1      3
0 −3 −3    0      6
0  0 −5   −2     −8
0  0  0  21/5   84/5

Now the solution can be obtained with backward substitution:

(21/5) x4 = 84/5 ⇒ x4 = 4
−5x3 − 2 × 4 = −8 ⇒ x3 = 0
−3x2 − 3 × 0 = 6 ⇒ x2 = −2
x1 − 2 + 0 + 4 = 3 ⇒ x1 = 1
ˆ If the forward elimination gives the final row

0 0 . . . 0 ann | bn

which corresponds to ann xn = bn with ann ≠ 0, the original system has a unique solution, obtained by
backward substitution.

ˆ If the final row has

0 0 . . . 0 0 | bn

where bn ≠ 0, the system has no solution.

ˆ If the final row has

0 0 . . . 0 0 | 0

the system has infinitely many solutions.
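The elimination and back-substitution steps used in the example above are easy to automate. Below is a hedged sketch (plain Gaussian elimination without pivoting, our own naming, not part of the notes) that solves the 4 × 4 example and returns x = (1, −2, 0, 4).

```python
def gauss_solve(A, b):
    """Solve Ax = b by Gaussian elimination (no pivoting) and back substitution."""
    n = len(b)
    A = [row[:] for row in A]   # work on copies
    b = b[:]
    # forward elimination: zero out the entries below each pivot
    for k in range(n - 1):
        for i in range(k + 1, n):
            m = A[i][k] / A[k][k]
            for j in range(k, n):
                A[i][j] -= m * A[k][j]
            b[i] -= m * b[k]
    # backward substitution
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(A[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (b[i] - s) / A[i][i]
    return x

A = [[1, 1, 1, 1], [2, -1, -1, 2], [1, 3, -2, -1], [-1, -1, 1, 4]]
b = [3, 12, -9, 17]
print(gauss_solve(A, b))   # [1.0, -2.0, 0.0, 4.0]
```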

There are a few strategies for implementing Gaussian elimination:

8.2 Pivoting Strategies


8.3 Matrix Inverse
8.4 Determinant of Matrix
8.5 Matrix Factorization

Chapter 9

Iterative Methods for Solving Linear


Systems

9.1 Norms of vectors and Matrices


9.2 Eigenvalues and Eigenvectors
9.3 Iterative Techniques for Solving Linear Systems

Chapter 10

Some Useful Remarks

10.1 Largest Possible Root


ˆ For a polynomial represented by
f (x) = an xn + an−1 xn−1 + . . . + a1 x + a0 (10.1)
the largest possible root is estimated by

xl = −an−1 / an     (10.2)

This value is taken as the initial approximation when no other value is given.
ˆ Search bracket: all real roots lie within the interval

[ −sqrt( (an−1/an)² − 2(an−2/an) ),  +sqrt( (an−1/an)² − 2(an−2/an) ) ]     (10.3)

ˆ Another relationship that suggests an interval for the roots: all real roots lie within the interval

[ −1 − (1/|an|) Max{|a0|, . . . , |an−1|},  1 + (1/|an|) Max{|a0|, . . . , |an−1|} ]     (10.4)
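As a quick illustration (not from the notes), the three bounds above can be evaluated for a sample polynomial. For x³ − 6x² + 11x − 6, whose roots are 1, 2 and 3, the sketch below (assuming the quantity under the square root in (10.3) is nonnegative) gives xl = 6, the bracket [−√14, √14] ≈ [−3.74, 3.74], and the interval [−12, 12].

```python
from math import sqrt

def root_bounds(coeffs):
    """coeffs = [a_n, a_{n-1}, ..., a_0]. Returns (x_l, bracket, interval)
    using the estimates (10.2)-(10.4) above."""
    an, an1, an2 = coeffs[0], coeffs[1], coeffs[2]
    x_l = -an1 / an                                   # eq. (10.2)
    r = sqrt((an1 / an) ** 2 - 2 * (an2 / an))        # eq. (10.3)
    m = max(abs(c) for c in coeffs[1:]) / abs(an)     # eq. (10.4)
    return x_l, (-r, r), (-1 - m, 1 + m)

# p(x) = x^3 - 6x^2 + 11x - 6 has roots 1, 2, 3
print(root_bounds([1, -6, 11, -6]))
```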

10.2 Convergence of Bisection Method


In the bisection method, after n iterations the root must lie within an interval of length (b − a)/2^n.
This means that the error bound at the nth iteration is

En = (b − a) / 2^n     (10.5)

Similarly,

En+1 = (b − a) / 2^{n+1} = En / 2     (10.6)

That is, the error bound decreases by a factor of 1/2 at each step. Therefore, the bisection method is
linearly convergent.
(Reference for this chapter: Numerical Methods, E. Balagurusamy.)

10.3 Convergence of False Position Method
The false position formula is based on the linear interpolation model. One of the starting points is fixed
while the other moves towards the solution. Assume that the initial points bracketing the solution are a
and b and that a moves towards the solution and b is fixed.
Let p0 = a and let p denote the solution. Then,

E0 = p − p0, E1 = p − p1     (10.7)

that is,

Ei = p − pi     (10.8)

It can be shown that...

10.4 Convergence of Newton-Raphson Method


Let pn be an estimate of a root of f(x) = 0. If pn and pn+1 are close to each other, then we can
use the Taylor series expansion

f(pn+1) = f(pn) + (pn+1 − pn) f′(pn) + (pn+1 − pn)² f′′(ξ)/2     (10.9)

where ξ lies between pn and pn+1. Let us assume that the exact root is p. Then

0 = f(pn) + (p − pn) f′(pn) + (p − pn)² f′′(ξ)/2     (10.10)

Assume that f′(pn) ≠ 0. From the Newton-Raphson formula we have

pn+1 = pn − f(pn)/f′(pn) ⇒ f(pn) = (pn − pn+1) f′(pn)     (10.11)

Substituting this for f(pn) in eq. (10.10) yields

0 = (p − pn+1) f′(pn) + (p − pn)² f′′(ξ)/2     (10.12)

The errors in the estimates pn+1 and pn are

En+1 = p − pn+1
En = p − pn

Now, eq. (10.12) can be written as

0 = En+1 f′(pn) + En² f′′(ξ)/2     (10.13)

which leads to

En+1 = − (f′′(ξ) / (2 f′(pn))) En²     (10.14)

The last equation shows that the error is roughly proportional to the square of the error in the previous
iteration. Thus, the Newton-Raphson method is quadratically convergent.
The Newton-Raphson method has certain limitations. It may fail in the following situations:

1. If f ′ (pn ) = 0.
2. If the initial guess is too far away from the required root, the process may converge to some other
root.
3. A particular value in the iteration sequence may repeat, resulting in an infinite loop. This occurs
when the tangent to the curve of f(x) at pn+1 cuts the x-axis again at pn.

10.5 Convergence of Secant Method


The secant method uses two initial estimates but does not require that they bracket the root. The secant
iteration formula is

pn+1 = pn − f(pn) (pn − pn−1) / (f(pn) − f(pn−1))     (10.15)

Let p be the actual root of f(x) = 0 and En the error in the estimate pn. Then

pi = p − Ei, for i = n − 1, n, n + 1     (10.16)

Substituting in eq. (10.15) and simplifying, we get

En+1 = (En−1 f(pn) − En f(pn−1)) / (f(pn) − f(pn−1))     (10.17)

According to the mean value theorem, there exists at least one point ξn between pn and p such that

f′(ξn) = (f(pn) − f(p)) / (pn − p) = f(pn) / (pn − p) ⇒ f(pn) = −En f′(ξn)     (10.18)

since f(p) = 0 and pn − p = −En. Similarly,

f(pn−1) = −En−1 f′(ξn−1)     (10.19)

Equation (10.17) then becomes

En+1 = −En En−1 (f′(ξn) − f′(ξn−1)) / (f(pn) − f(pn−1))     (10.20)

That is,

En+1 ∝ En En−1     (10.21)
Let us now find the order of convergence of this iteration process. If the order of convergence is α, then

En+1 ∝ En^α     (10.22)

and likewise En ∝ En−1^α. So, from equation (10.21), we can write

En^α ∝ En En−1
⇒ (En−1^α)^α ∝ En−1^α · En−1
⇒ En−1^{α²} ∝ En−1^{α+1}

This means that

α² = α + 1, i.e. (α + 1)/α = α ⇒ α = (1 ± √5)/2.     (10.23)

Since α must be positive, the order of convergence of the secant method is α = (1 + √5)/2 ≈ 1.618, and the
convergence is referred to as superlinear convergence.

10.6 Convergence of Fixed Point Method

Chapter 11

Exams

11.1 Exam 1
Answer all questions.
1. Let x = 0.456 × 10^{−2}, y = 0.134, and z = 0.920.
Use three-digit rounding arithmetic to evaluate:

(a) (x + y) + z.
(b) x + (y + z).

Is the associative law for addition satisfied? Explain your answer.

(10 Marks)

2. Evaluate the polynomial

P (x) = x3 − 6x2 + 11x − 5

at p = 2 using Horner’s Theorem.

(6 Marks)

3. We want to evaluate the square root of 5, using the equation x² − 5 = 0, by applying the fixed-point
iteration algorithm.

(a) Use algebraic manipulation to show that g(x) = x/2 + 5/(2x) has a fixed point exactly where x² − 5 = 0.
(b) Use the fixed-point theorem to show that the iteration with g(x) converges to the unique fixed point for
any initial p0 ∈ [2, 5].
(c) Use p0 = 3 to evaluate p2.

(8 Marks)
4. (a) Evaluate exactly the integral ∫_0^4 e^x dx.
   (b) Find an approximation to ∫_0^4 e^x dx using Simpson's rule with h = 2.
   (c) Find an approximation to ∫_0^4 e^x dx using the composite Simpson's rule with h = 1.
   (d) Does the composite Simpson's rule improve the approximation?

(8 Marks)

5. Given the equation y′ = 3x² + 1, with y(1) = 2. Estimate y(2) by Euler's method with h = 0.25.

(8 Marks)

END

Useful Formulas and Theorem


ˆ Horner’s Theorem Let
P (x) = an xn + an−1 xn−1 + ... + a1 x + a0 (11.1)
If bn = an and
bk = ak + bk+1 x0 , for k = n − 1, n − 2, ..., 1, 0 (11.2)
then b0 = P (x0 ),
ˆ Fixed-Point Theorem:
Let g ∈ C[a, b] be such that g(x) ∈ [a, b], for all x in [a, b]. Suppose, in addition, that g ′ exists on
(a, b) and that a constant 0 < k < 1 exists with
|g ′ (x)| ≤ k, for all x ∈ (a, b).
Then, for any number p0 in [a, b], the sequence defined by
pn = g(pn−1 ), n ≥ 1,
converges to the unique fixed point p in [a, b].
ˆ Euler’s method:
dy
To approximate the solution of the initial-value problem = f (t, y), a ≤ t ≤ b, y(a) = α at (N+1)
dt
equally spaced numbers in the interval [a, b], we construct the solution y(ti ) = wi for i = 0, 1, ..., N −1
and
w0 = α,
t0 = a,
wi+1 = wi + hf (ti , wi ),
ti = a + ih,
where h = (b − a)/N

11.2 Exam 2
Answer all questions.

1. (a) Evaluate f (x) = x3 − 6.1x2 + 3.2x + 1.5 at x = 4.71 using three-digit rounding arithmetic.
(b) Find the relative error in (a).
(c) Use Horner’s Theorem to evaluate f (x) at x = 4.71 using three-digit rounding arityhmetic.
(d) Find the relative error in (c).
(e) Why the relative error in (c) is less than the relative error in (a).

(10 Marks)

2. Let f (x) = −x3 − cos x and p0 = −1. Use Newton’s method to find p2 .

(6 Marks)

3. (a) Let A be a given positive constant and g(x) = 2x − Ax2 . Show that if fixed-point iteration
converges to a nonzero limit, then the limit is p = 1/A, so the reciprocal of a number can be
found using only multiplications and subtractions.
(b) Use fixed-point iteration with p0 = 0.1 to find p2 that approximates 1/11.

(8 Marks)

4. Use the forward-difference and backward-difference formulas to determine each of the missing entry
in the following table:

x f (x) f ′ (x)
1.0 1.0000 ....
1.2 1.2625 ....
1.4 1.6595 ....

(8 Marks)

5. Use Euler’s method to approximate the solution for the following initial-value problem.
y ′ = et−y , 0 ≤ t ≤ 1, y(0) = 1 with h = 0.5.

(8 Marks)

END

Useful Formulas and Theorem

ˆ Horner’s Theorem Let


P (x) = an xn + an−1 xn−1 + ... + a1 x + a0 (11.3)

If bn = an and

bk = ak + bk+1 x0 , for k = n − 1, n − 2, ..., 1, 0 (11.4)

then b0 = P (x0 ),

ˆ Fixed-Point Theorem:
Let g ∈ C[a, b] be such that g(x) ∈ [a, b], for all x in [a, b]. Suppose, in addition, that g ′ exists on
(a, b) and that a constant 0 < k < 1 exists with

|g ′ (x)| ≤ k, for all x ∈ (a, b).

Then, for any number p0 in [a, b], the sequence defined by

pn = g(pn−1 ), n ≥ 1,

converges to the unique fixed point p in [a, b].

ˆ Euler’s method:
dy
To approximate the solution of the initial-value problem = f (t, y), a ≤ t ≤ b, y(a) = α at (N+1)
dt
equally spaced numbers in the interval [a, b], we construct the solution y(ti ) = wi for i = 0, 1, ..., N −1
and

w0 = α,
t0 = a,
wi+1 = wi + hf (ti , wi ),
ti = a + ih,

where h = (b − a)/N

11.3 Exam 3
Answer all questions.

1. Let

f(x) = (e^x − e^{−x}) / x
(a) Find lim f (x).
x→0

(b) Use three-digit rounding arithmetic to evaluate f (0.100).


(c) The actual value is f (0.100) = 2.003335000 find the relative error for the value obtained in (b).

(8 Marks)

2. (a) Find the actual value of

∫_{−1}^{1} e^x dx

(b) Use Simpson’s rule to approximate the integral in (a).


(c) Find the maximum error from Simpson’s formula.

(8 Marks)

3. Use the forward-difference and backward-difference formulas to determine each of the missing entry
in the following table

x f (x) f ′ (x)
1.0 1.0000 ....
1.2 1.2625 ....
1.4 1.6595 ....

(8 Marks)

4. (a) Find the actual value of ∫_0^2 1/(x + 4) dx.
   (b) Use the Trapezoidal rule to approximate ∫_0^2 1/(x + 4) dx and find the actual error.
   (c) Determine the values of n and h required for the composite Trapezoidal rule to approximate
       ∫_0^2 1/(x + 4) dx to within 10^{−6}.

(8 Marks)

5. Use Euler’s method to approximate the solution for the following initial-value problem.
y ′ = et−y , 0 ≤ t ≤ 1, y(0) = 1 with h = 0.5.

(8 Marks)

END

Useful Formulas and Theorem

ˆ Composite Trapezoidal rule:

Let f ∈ C²[a, b], h = (b − a)/n, and xj = a + jh, for each j = 0, 1, . . . , n. There exists a µ ∈ (a, b)
for which the composite Trapezoidal rule for n subintervals can be written as

∫_a^b f(x) dx = (h/2) [ f(a) + 2 Σ_{j=1}^{n−1} f(xj) + f(b) ] − ((b − a)/12) h² f′′(µ)     (11.5)

ˆ Simpson's rule:
With nodes at x0 = a, x1 = a + h, x2 = b, where h = (b − a)/2, Simpson's rule is

∫_{x0}^{x2} f(x) dx = (h/3) [ f(x0) + 4 f(x1) + f(x2) ] − (h⁵/90) f⁽⁴⁾(ξ).     (11.6)

ˆ Euler’s method
dy
To approximate the solution of the initial-value problem = f (t, y), a ≤ t ≤ b, y(a) = α at (N+1)
dt
equally spaced numbers in the interval [a, b], we construct the solution y(ti ) = wi for i = 0, 1, ..., N −1
and

w0 = α,
t0 = a,
wi+1 = wi + hf (ti , wi ),
ti = a + ih,

where h = (b − a)/N
