# Numerical Computing CSC 2702

Required textbook: Numerical Analysis: Burden & Faires: 8th edition Thomson Brooks/Cole

Dr Azeddine M KICT, CS IIUM October 12, 2009


Contents

1 Mathematical Preliminaries
1.1 Review of Calculus
1.1.1 Exercises
1.2 Round-off Errors
1.2.1 Exercises

2 Solution of Equations in One Variable
2.1 Bisection Method
2.2 Fixed-Point Iteration
2.3 Newton's Method
2.4 Secant Method
2.5 False Position Method

3 Error Analysis for Iterative Methods
3.1 Linearly and Quadratically Convergent Procedures
3.2 Zero multiplicity
3.3 Exercises

4 Accelerating Convergence
4.1 Aitken's Δ² method
4.2 Steffensen's Method
4.3 Zeros Polynomial
4.4 Horner's Method
4.5 Deflation
4.6 Müller's method
4.7 Exercises

5 Interpolation and Polynomial Approximation
5.1 Weierstrass Approximation Theorem
5.2 Lagrange Polynomial
5.3 Neville's Method
5.4 Newton Interpolating Polynomial
5.5 Polynomial Forms
5.6 Spline Interpolation
5.7 Parametric Curves

6 Numerical Differentiation and Integral
6.1 Numerical Differentiation
6.2 Richardson's Extrapolation
6.3 Elements of Numerical Integration
6.3.1 Trapezoidal rule
6.3.2 Simpson's rule
6.3.3 Degree of precision
6.3.4 Newton-Cotes Formula
6.4 Composite Numerical Integration
6.5 Adaptive Quadrature Methods
6.6 Gaussian Quadrature
6.7 Improper Integrals

7 Initial-Value Problems for ODE
7.1 Introduction
7.2 Elementary Theory of Initial-Value Problems
7.3 Euler's Method
7.4 Higher-Order Taylor Methods
7.5 Runge-Kutta Methods
7.6 Predictor Corrector Methods

8 Direct Methods for Solving Linear Systems
8.1 Gaussian Elimination
8.2 Pivoting Strategies
8.3 Matrix Inverse
8.4 Determinant of Matrix
8.5 Matrix Factorization

9 Iterative Methods for Solving Linear Systems
9.1 Norms of Vectors and Matrices
9.2 Eigenvalues and Eigenvectors
9.3 Iterative Techniques for Solving Linear Systems

10 Some Useful Remarks
10.1 Largest Possible Root
10.2 Convergence of Bisection Method
10.3 Convergence of False Position Method
10.4 Convergence of Newton-Raphson Method
10.5 Convergence of Secant Method
10.6 Convergence of Fixed Point Method

11 Exams
11.1 Exam 1
11.2 Exam 2
11.3 Exam 3

**Chapter 1 Mathematical Preliminaries**

1.1 Review of Calculus

Definition of the limit: A function $f$ defined on a set $X$ of real numbers has a limit $L$ at $x_0$, written

$$\lim_{x \to x_0} f(x) = L \tag{1.1}$$

if the following statement is true:

$$\forall \epsilon > 0 \;\; \exists \delta > 0 \;\mid\; 0 < |x - x_0| < \delta \Rightarrow |f(x) - L| < \epsilon \tag{1.2}$$

$f$ is continuous at $x_0$ if

$$\lim_{x \to x_0} f(x) = f(x_0) \tag{1.3}$$

Convergence of sequences: The sequence $\{x_n\}$ has a limit $x$ (converges to $x$) if

$$\forall \epsilon > 0 \;\; \exists N > 0 \;\mid\; n > N \Rightarrow |x_n - x| < \epsilon \tag{1.4}$$

Differentiable functions: The function $f$ is differentiable at $x_0$ if

$$f'(x_0) = \lim_{x \to x_0} \frac{f(x) - f(x_0)}{x - x_0} \tag{1.5}$$

exists. This limit is called the derivative of $f$ at $x_0$. The set of all functions that have $n$ continuous derivatives on $X$ is denoted by $C^n(X)$.

Theorem: If the function $f$ is differentiable at $x_0$, then $f$ is continuous at $x_0$.

Rolle's Theorem: Suppose $f \in C[a, b]$ and $f$ is differentiable on $(a, b)$. If $f(a) = f(b)$, then a number $c \in (a, b)$ exists with $f'(c) = 0$.


Mean value theorem: Suppose $f \in C[a, b]$ and $f$ is differentiable on $(a, b)$. Then a number $c \in (a, b)$ exists with

$$f'(c) = \frac{f(b) - f(a)}{b - a} \tag{1.6}$$

Extreme value theorem: If $f \in C[a, b]$, then $c_1, c_2 \in [a, b]$ exist with $f(c_1) \le f(x) \le f(c_2)$ for all $x \in [a, b]$. In addition, if $f$ is differentiable on $(a, b)$, then $c_1$ and $c_2$ occur either at the endpoints of $[a, b]$ or where $f'$ is zero.

Riemann integral: The Riemann integral of a function $f$ on the interval $[a, b]$ is the limit (provided it exists)

$$\int_a^b f(x)\,dx = \lim_{\max \Delta x_i \to 0} \sum_{i=1}^{n} f(z_i)\,\Delta x_i, \tag{1.7}$$

where the numbers $x_i$ satisfy $a = x_0 \le x_1 \le \dots \le x_n = b$, $\Delta x_i = x_i - x_{i-1}$, and $z_i$ is arbitrarily chosen in $[x_{i-1}, x_i]$. If the points are equally spaced and we choose $z_i = x_i$, the Riemann integral becomes

$$\int_a^b f(x)\,dx = \lim_{n \to \infty} \frac{b - a}{n} \sum_{i=1}^{n} f(x_i), \tag{1.8}$$

where $x_i = a + i(b - a)/n$.

Weighted Mean Value Theorem for Integrals: Suppose $f \in C[a, b]$, the Riemann integral of $g$ exists on $[a, b]$, and $g(x)$ does not change sign on $[a, b]$. Then there exists a number $c$ in $(a, b)$ with

$$\int_a^b f(x)\,g(x)\,dx = f(c) \int_a^b g(x)\,dx. \tag{1.9}$$

When $g(x) = 1$, it gives the average value of the function $f$ over the interval $[a, b]$:

$$f(c) = \frac{1}{b - a} \int_a^b f(x)\,dx. \tag{1.10}$$

Generalized Rolle's Theorem: Suppose that $f \in C[a, b]$ is $n$ times differentiable on $(a, b)$. If $f(x)$ is zero at $n + 1$ distinct numbers $x_0, \dots, x_n$ in $[a, b]$, then a number $c$ in $(a, b)$ exists with $f^{(n)}(c) = 0$.

Intermediate Value Theorem: If $f \in C[a, b]$ and $K$ is any number between $f(a)$ and $f(b)$, then there exists $c$ in $(a, b)$ for which $f(c) = K$.

Taylor's Theorem: Suppose $f \in C^n[a, b]$, $f^{(n+1)}$ exists on $[a, b]$, and $x_0 \in [a, b]$. For every $x \in [a, b]$, there exists a number $\xi(x)$ between $x_0$ and $x$ with $f(x) = P_n(x) + R_n(x)$, where

$$P_n(x) = \sum_{k=0}^{n} \frac{f^{(k)}(x_0)}{k!} (x - x_0)^k \tag{1.11}$$

$$R_n(x) = \frac{f^{(n+1)}(\xi(x))}{(n + 1)!} (x - x_0)^{n+1} \tag{1.12}$$

$P_n(x)$ is called the $n$th Taylor polynomial for $f$ about $x_0$, and $R_n(x)$ is called the remainder term (or truncation error). In the case when $x_0 = 0$, the Taylor polynomial is often called a Maclaurin polynomial. If we take $n \to \infty$, the Taylor polynomial is called the Taylor series for $f$ about $x_0$. For $x_0 = 0$, the Taylor series is called the Maclaurin series.

Example: We want to determine an approximate value of $\cos(0.01)$ using the second Maclaurin polynomial

$$\cos x = 1 - \frac{1}{2} x^2 + \frac{1}{6} x^3 \sin(\xi) \tag{1.13}$$

where $\xi$ is a number between 0 and $x$. This gives

$$\cos(0.01) = 0.99995 + 0.1\bar{6} \times 10^{-6} \sin \xi \tag{1.14}$$

where we use the bar over 6 to indicate that this digit repeats indefinitely. Hence

$$|\cos(0.01) - 0.99995| = 0.1\bar{6} \times 10^{-6}\, |\sin \xi| \le 0.1\bar{6} \times 10^{-6} \tag{1.15}$$

which gives

$$0.99994983 < 0.99995 - 0.1\bar{6} \times 10^{-6} \le \cos(0.01) \le 0.99995 + 0.1\bar{6} \times 10^{-6} < 0.99995017 \tag{1.16}$$

The error bound is much larger than the actual error. This is due in part to the poor bound we used for $\sin \xi$. It can be shown that $|\sin x| \le |x|$. Since $0 \le \xi \le 0.01$, we find the bound $0.1\bar{6} \times 10^{-8}$.

Note that eq. (1.13) can be written as $\cos x = 1 - \frac{1}{2} x^2 + \frac{1}{24} x^4 \cos(\xi')$, because the third-order term of the Maclaurin series of $\cos x$ vanishes, and the error will be no more than $\frac{1}{24} \times 10^{-8} = 0.41\bar{6} \times 10^{-9}$.

This example illustrates two objectives of numerical analysis: find an approximation to the solution and determine a bound for the error.
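The three error bounds of this example can be checked numerically. The following is a minimal sketch in Python; the function and variable names are illustrative and not part of the notes, and the "actual error" is measured against the library cosine in double precision.

```python
import math


def maclaurin_cos_p2(x: float) -> float:
    """Second Maclaurin polynomial of cos: P2(x) = 1 - x^2 / 2."""
    return 1.0 - x * x / 2.0


x = 0.01
approx = maclaurin_cos_p2(x)
actual_error = abs(math.cos(x) - approx)

# Bounds from the example, from crudest to sharpest.
bound_sin_le_1 = x**3 / 6.0        # uses |sin(xi)| <= 1
bound_sin_le_x = x**4 / 6.0        # uses |sin(xi)| <= |xi| <= x
bound_fourth_order = x**4 / 24.0   # uses the x^4 cos(xi') / 24 remainder

print("P2(0.01)            =", approx)
print("actual error        =", actual_error)
print("bound |sin| <= 1    =", bound_sin_le_1)
print("bound |sin| <= |x|  =", bound_sin_le_x)
print("bound fourth order  =", bound_fourth_order)
```

The printed values make the point of the example concrete: each refinement of the remainder estimate brings the bound closer to the actual error.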

1.1.1 Exercises

The exercises are from the textbook, sec. 1.1, pages 14-16.

Tutor: Exercises 1, 2, 3, 4, 15, 23
Students: All odd exercises except 17
Assignment 1: Exercises 15 and 26

Solution of exercise 13: The Taylor polynomial $P_4(x)$ for the function $f(x) = x e^{x^2}$ follows from the expansion

$$f(x) = x + x^3 + \tfrac{1}{2} x^5 + O(x^7), \qquad P_4(x) = x + x^3.$$

The remainder is given by

$$R_4(x) = \frac{f^{(5)}(\xi(x))}{5!} x^5 = \frac{1}{30} e^{\xi^2} \left(15 + 90\xi^2 + 60\xi^4 + 8\xi^6\right) x^5 \le 1.211406197\, x^5 \le 0.01240479946.$$

We find the last inequality by substituting $\xi = x = 0.4$. The integral can be approximated by

$$\int_0^{0.4} f(x)\,dx \approx \int_0^{0.4} (x + x^3)\,dx = 0.086400.$$

The upper bound on the error is

$$\int_0^{0.4} 1.211406197\, x^5\,dx = 0.0008269866307.$$

To approximate $f'(0.2)$ using $P_4'(0.2)$ we can write

$$f'(0.2) \approx \left. 1 + 3x^2 \right|_{x=0.2} = 1.12.$$

The exact value is

$$f'(0.2) = \left. e^{x^2}(1 + 2x^2) \right|_{x=0.2} = 1.124075636.$$

The actual error is $1.124075636 - 1.12 = 0.004075636$.

Solution of exercise 15: The $n$th derivative of $\cos(x)$ is

$$\cos^{(n)}(x) = \cos\left(x + \frac{n\pi}{2}\right).$$

According to Taylor's theorem the error is

$$R_n(x) = -\frac{(x - x_0)^{n+1}}{(n + 1)!} \sin\left(\xi + \frac{n\pi}{2}\right)$$

where $\xi$ is between $x$ and $x_0$. For $x = 42^\circ$ and $x_0 = \pi/4$ we have $|x - x_0| = \pi/60$, so the error satisfies

$$|R_n(x)| \le E = \frac{(\pi/60)^{n+1}}{(n + 1)!}$$

n = 1, E = 0.001370778390
n = 2, E = 0.00002392459621
n = 3, E = 3.131722321 × 10^-7
n = 4, E = 3.279531946 × 10^-9

So $n = 3$ is sufficient to get the value accurate to $10^{-6}$. In this case $P_3(x)$ is equal to

$$P_3(x) = \cos(x_0) - \sin(x_0)(x - x_0) - \tfrac{1}{2} \cos(x_0)(x - x_0)^2 + \tfrac{1}{6} \sin(x_0)(x - x_0)^3 = 0.7431446016.$$

The actual value is $\cos 42^\circ = 0.7431448255\ldots$, so the error is of the order of $0.2239 \times 10^{-6}$.

Solution of exercise 26: Let $m = \min[f(x_1), f(x_2)]$ and $M = \max[f(x_1), f(x_2)]$. Since $c_1$ and $c_2$ are positive constants, we have

$$c_1 m \le c_1 f(x_1) \le c_1 M$$
$$c_2 m \le c_2 f(x_2) \le c_2 M$$

which leads to

$$(c_1 + c_2)\, m \le c_1 f(x_1) + c_2 f(x_2) \le (c_1 + c_2)\, M$$

and therefore

$$m \le \frac{c_1 f(x_1) + c_2 f(x_2)}{c_1 + c_2} \le M.$$

Without loss of generality, assume that $m = f(x_1)$ and $M = f(x_2)$. Then the last inequality gives

$$f(x_1) \le \frac{c_1 f(x_1) + c_2 f(x_2)}{c_1 + c_2} \le f(x_2).$$

According to the intermediate value theorem, there exists $\xi$ between $x_1$ and $x_2$ such that

$$f(\xi) = \frac{c_1 f(x_1) + c_2 f(x_2)}{c_1 + c_2}.$$

1.2 Round-off Errors

An $n$-digit floating-point number in base $\beta$ has the form

$$x = \pm (.d_1 d_2 \dots d_n)_\beta\, \beta^e \tag{1.17}$$

where $(.d_1 d_2 \dots d_n)_\beta$ is a $\beta$-fraction called the mantissa, and $e$ is an integer called the exponent. Such a floating-point number is said to be normalized in case $d_1 \ne 0$, or else $d_1 = d_2 = \dots = d_n = 0$.

A 64-bit (binary digit) representation is used for a number, called a long real. The first bit is a sign indicator, denoted by $s$. This is followed by an 11-bit exponent, $c$, called the characteristic, and a 52-bit binary fraction, $f$, called the mantissa. The base for the exponent is 2. The 52 binary digits can hold up to 16 decimal digits. The exponent of 11 binary digits gives a range of 0 to $2^{11} - 1 = 2047$. To represent small numbers we shift the exponent by 1023, so the range is actually between $0 - 1023 = -1023$ and $2047 - 1023 = 1024$.

To save storage and provide a unique representation for each floating-point number we use the normalized form

$$(-1)^s\, 2^{c - 1023}\, (1 + f) \tag{1.18}$$

Example: Consider the machine number

0 10000000011 1011100100010000000000000000000000000000000000000000

The leftmost bit is zero, so the number is positive. The next eleven bits give $(10000000011)_2 = 1 + 2^1 + 2^{10} = 1027$, so the exponent part is $2^{1027 - 1023} = 2^4$. The final 52 bits specify the mantissa

$$f = (.101110010001)_2 = \frac{1}{2} + \frac{1}{2^3} + \frac{1}{2^4} + \frac{1}{2^5} + \frac{1}{2^8} + \frac{1}{2^{12}}.$$

This number represents

$$(-1)^s\, 2^{c - 1023}\, (1 + f) = 2^4 \times (1 + 0.722900390625) = 27.56640625.$$

The next smallest number to this is represented by

0 10000000011 1011100100001111111111111111111111111111111111111111

and the next largest number is

0 10000000011 1011100100010000000000000000000000000000000000000001

This means that our original number represents not only 27.56640625, but also half of the real numbers that are between 27.56640625 and its two nearest machine-number neighbors (see Fig. 1.1).

[Figure 1.1: number line showing 27.56640625 between the next smallest machine number (27.566406249...) and the next largest machine number (27.566406250...).]

Underflow and overflow ..............

Round-off errors: Round-off errors arise because it is impossible to represent all real numbers exactly on a finite-state machine (which is what all practical digital computers are). On a pocket calculator, if one enters 0.0000000000001 (or the maximum number of zeros possible), then a '+', and then 100000000000000 (again, the maximum number of zeros possible), one will obtain the number 100000000000000 again, and not 100000000000000.0000000000001. The calculator's answer is incorrect because of round-off in the calculation.
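The decoding of the 64-bit machine number in the example above can be reproduced with a short Python sketch. The helper names `decode_long_real` and `reinterpret` are illustrative assumptions; the first applies formula (1.18) by hand and is valid only for normalized numbers (0 < c < 2047), the second asks the hardware to interpret the same 64 bits as an IEEE double via the standard `struct` module.

```python
import struct

# Sign bit, 11-bit characteristic, 52-bit mantissa from the example.
BITS = "0" + "10000000011" + "101110010001" + "0" * 40


def decode_long_real(bits: str) -> float:
    """Apply (-1)^s * 2^(c-1023) * (1+f); normalized numbers only."""
    s = int(bits[0], 2)
    c = int(bits[1:12], 2)
    f = int(bits[12:], 2) / 2**52
    return (-1) ** s * 2.0 ** (c - 1023) * (1 + f)


def reinterpret(bits: str) -> float:
    """Let the machine interpret the same 64 bits as a double."""
    return struct.unpack(">d", int(bits, 2).to_bytes(8, "big"))[0]


value = decode_long_real(BITS)

# Neighbouring machine numbers: add or subtract 1 in the last mantissa bit.
lower = reinterpret(format(int(BITS, 2) - 1, "064b"))
upper = reinterpret(format(int(BITS, 2) + 1, "064b"))

print(value, reinterpret(BITS))
print("gap below:", value - lower, " gap above:", upper - value)
```

Both gaps equal 2^-48, the spacing of machine numbers with exponent part 2^4, which is why the original number stands in for every real number in half of each neighbouring gap.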

Round-off errors in a computer [1]: The most basic source of errors in a computer is attributed to the error in representing a real number with a limited number of bits. The machine epsilon, ǫ, is the interval between 1 and the next number greater than 1 that is distinguishable from 1. This means that no number between 1 and 1 + ǫ can be represented in the computer. Machine epsilon can be found by the following program, in which the last value printed is the machine epsilon:

    10 E=1
    20 IF E+1>1 THEN PRINT E ELSE STOP
    30 E=E/2: GOTO 20
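The same halving loop can be sketched in Python, where floats are IEEE double precision. The function name `machine_epsilon` is an illustrative choice; the result is compared with the value the standard library reports in `sys.float_info.epsilon`.

```python
import sys


def machine_epsilon() -> float:
    """Halve eps until 1 + eps/2 is no longer distinguishable from 1."""
    eps = 1.0
    while 1.0 + eps / 2.0 > 1.0:
        eps /= 2.0
    return eps


eps = machine_epsilon()
print(eps, eps == sys.float_info.epsilon)
```

In double precision the loop stops at 2^-52, consistent with the 52-bit mantissa of the long real format described above.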

When numbers are added or subtracted, an accurate representation of the result may require a much larger number of digits than the operands themselves need. Serious round-off error occurs in two situations:

1. when adding (or subtracting) a very small number to (or from) a large number
2. when a number is subtracted from another that is very close to it

To test the first case on the computer, let us add 0.00001 to unity ten thousand times. The program to do this job would be:

    sum=1
    for i=1 to 10000
    sum=sum+0.00001
    next i
    print sum

The result of this program would be sum = 1.100136. Since the exact answer is 1.1, the relative error of this computation is

$$\frac{1.100136 - 1.1}{1.1} = 0.000124 = 0.0124\%.$$

The cause of this round-off error can be understood as follows. Consider the computation of 1 + 0.00001. The 32-bit (binary digit) representations of 1 and 0.00001, with 24-bit mantissas, are, respectively,

(1)_10 = (0.1000 0000 0000 0000 0000 0000)_2 × 2^1
(0.00001)_10 = (0.1010 0111 1100 0101 1010 1100)_2 × 2^-16

Adjusting the exponents so that they agree, we get

  (0.10000 0000 0000 0000 0000 0000 0000 0000 0000 0000)_2 × 2^1
+ (0.00000 0000 0000 0000 1010 0111 1100 0101 1010 1100)_2 × 2^1
= (0.10000 0000 0000 0000 1010 0111 1100 0101 1010 1100)_2 × 2^1

[1] Applied Numerical Methods with Software, Shoichiro Nakamura

Now only 24 bits are available for the mantissa, so

(0.10000 0000 0000 0000 1010 0111 1100 0101 1010 1100)_2 × 2^1

is rounded to

(0.1000 0000 0000 0000 0101 0100)_2 × 2^1

which is equivalent to (1.0000100136)_10. Thus, whenever 0.00001 is added to 1, the increment actually applied is 0.0000100136, so the result gains 0.0000000136 as an error. When 0.00001 is added to 1 ten thousand times, an error of about ten thousand times 0.0000000136, that is 0.000136, is generated, which matches the computed sum 1.100136. Although the calculated result gains in the present example, it can lose if some digits are cut off. Loss and gain are both referred to as round-off error.

Strategies to minimize round-off errors:

1. Double precision
2. Grouping
3. Taylor expansion
4. Changing definition of variable
5. Rewriting the equation to avoid subtractions

Example: We want to add 0.00001 ten thousand times to unity by using: (a) double precision, (b) the grouping method.

Double precision method:

      DOUBLE PRECISION SUM
      SUM=1.0D0
      DO I=1,10000
      SUM=SUM+0.00001D0
      END DO
      PRINT *, SUM

Grouping method:

      SUM=1
      DO 47 I=1,100
      TOTAL=0
      DO 40 K=1,100
      TOTAL=TOTAL+0.00001
   40 CONTINUE
      SUM=SUM+TOTAL
   47 CONTINUE
      PRINT *, SUM
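The three summation strategies can be compared in one Python sketch. Python floats are doubles, so single precision is emulated here by rounding every intermediate result through the 4-byte `"f"` format of the standard `struct` module; this emulation, and the helper names `to_single`, `add_single`, `naive_single`, `grouped_single` and `naive_double`, are assumptions of the sketch rather than part of the notes.

```python
import struct


def to_single(x: float) -> float:
    """Round a double to the nearest IEEE single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def add_single(x: float, y: float) -> float:
    """Single-precision addition: add in double, then round once."""
    return to_single(x + y)


STEP = to_single(0.00001)


def naive_single() -> float:
    total = to_single(1.0)
    for _ in range(10000):
        total = add_single(total, STEP)
    return total


def grouped_single() -> float:
    total = to_single(1.0)
    for _ in range(100):
        partial = 0.0
        for _ in range(100):            # small numbers added among themselves
            partial = add_single(partial, STEP)
        total = add_single(total, partial)
    return total


def naive_double() -> float:
    total = 1.0
    for _ in range(10000):
        total += 0.00001
    return total


single, grouped, double = naive_single(), grouped_single(), naive_double()
print("single  :", single, " error", abs(single - 1.1))
print("grouped :", grouped, " error", abs(grouped - 1.1))
print("double  :", double, " error", abs(double - 1.1))
```

Grouping helps because the hundred small terms are first summed at a magnitude where few of their digits are lost, and only then added to the large running sum.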

Example: As θ approaches 0, the accuracy of a numerical evaluation of

$$d = \frac{\sin(1 + \theta) - \sin(1)}{\theta}$$

becomes very poor because of round-off errors. By using the Taylor expansion we can write

$$\sin(1 + \theta) = \sin(1) + \theta \cos(1) - 0.5\,\theta^2 \sin(1) + \dots$$

Therefore,

$$d \approx D = \cos(1) - 0.5\,\theta \sin(1).$$

The accuracy of the approximation D increases as θ approaches 0, whereas the directly computed d deteriorates. The FORTRAN program that computes both is:

    program testeps
    implicit none
    real :: d,da,t=1.0e0,h=10.0e0
    integer :: i
    do i=1,7
      t=t/h                               ! next angle: divide by 10
      da=cos(1.0e0)-0.5e0*t*sin(1.0e0)    ! Taylor approximation D
      d=(sin(1.0e0+t)-sin(1.0e0))/t       ! direct difference quotient d
      print*,t,d,da
    end do
    end program testeps

The output is:

    angle            d           D
    ----------------------------------------
    0.10000000E-00   0.49736413  0.49822873
    9.99999978E-03   0.53608829  0.53609490
    9.99999931E-04   0.53993475  0.53988153
    9.99999902E-05   0.54062998  0.54026020
    9.99999884E-06   0.54383242  0.54029804
    9.99999884E-07   0.54327762  0.54030186

Law of Arithmetic: Due to errors introduced in floating-point arithmetic, the associative and distributive laws of arithmetic are not always satisfied. That is, in general,

$$x + (y + z) \ne (x + y) + z$$
$$x \times (y \times z) \ne (x \times y) \times z$$
$$x \times (y + z) \ne (x \times y) + (x \times z)$$

Example: Let $x = 0.456732 \times 10^{-2}$, $y = 0.243451$, and $z = -0.248000$. In six-digit arithmetic,

$$x + y = 0.00456732 + 0.243451 = 0.248018$$
$$(x + y) + z = 0.248018 - 0.248000 = 0.000018 = 0.180000 \times 10^{-4}$$
$$y + z = 0.243451 - 0.248000 = -0.00454900$$
$$x + (y + z) = 0.00456732 - 0.00454900 = 0.00001832 = 0.183200 \times 10^{-4}$$
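The failure of associativity in this example can be reproduced with Python's standard `decimal` module, which rounds every operation to a chosen number of significant digits. This is a sketch under the assumption that six-digit rounding arithmetic is what the example intends; the variable names `left` and `right` are illustrative.

```python
from decimal import Decimal, localcontext

x = Decimal("0.00456732")
y = Decimal("0.243451")
z = Decimal("-0.248000")

with localcontext() as ctx:
    ctx.prec = 6                 # six significant digits per operation
    left = (x + y) + z           # x + y is rounded to 0.248018 first
    right = x + (y + z)          # y + z = -0.004549 is exact

print("(x + y) + z =", left)
print("x + (y + z) =", right)
```

The two groupings differ because rounding x + y to six digits discards the trailing digits 32 of x, while y + z happens to be exactly representable.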

Chopping and rounding: Let us use the normalized decimal floating-point form

$$\pm 0.d_1 d_2 \dots d_k \times 10^n \tag{1.19}$$

where $1 \le d_1 \le 9$ and $0 \le d_i \le 9$ for $i = 2, \dots, k$. Numbers of this form are called $k$-digit decimal machine numbers. Any positive real number within the numerical range of the machine can be normalized to the form

$$y = 0.d_1 d_2 \dots d_k d_{k+1} d_{k+2} \dots \times 10^n \tag{1.20}$$

The floating-point form of $y$, denoted by $fl(y)$, is obtained by terminating the mantissa of $y$ at $k$ decimal digits. There are two ways of performing this termination. One, called chopping, is to simply chop off the digits $d_{k+1} d_{k+2} \dots$. The other, called rounding, adds $5 \times 10^{n-(k+1)}$ to $y$ and then chops the result. So, when rounding, if $d_{k+1} \ge 5$, we add 1 to $d_k$ to obtain $fl(y)$; that is, we round up. When $d_{k+1} < 5$, we merely chop off all but the first $k$ digits; so we round down. When rounding up, the digits and even the exponent might change.

Example: The number $\pi = 0.314159\ldots \times 10^1$. The floating-point form of $\pi$ using five-digit chopping is $fl(\pi) = 0.31415 \times 10^1 = 3.1415$. The floating-point form of $\pi$ using five-digit rounding is 3.1416, because the sixth digit of the expansion of $\pi$ is $9 \ge 5$.

Absolute error and relative error: If $p^*$ is an approximation to $p$, the absolute error is $|p - p^*|$, and the relative error is $\dfrac{|p - p^*|}{|p|}$, provided $p \ne 0$.
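The two termination rules can be sketched with the standard `decimal` module, whose `Context.create_decimal` rounds a value to the context precision. The function names `fl_chop` and `fl_round` are illustrative; `ROUND_DOWN` plays the role of chopping and `ROUND_HALF_UP` the role of rounding as defined above (for positive y).

```python
from decimal import Context, Decimal, ROUND_DOWN, ROUND_HALF_UP


def fl_chop(y: str, k: int) -> Decimal:
    """k-digit chopping: drop the digits d_{k+1} d_{k+2} ..."""
    return Context(prec=k, rounding=ROUND_DOWN).create_decimal(y)


def fl_round(y: str, k: int) -> Decimal:
    """k-digit rounding: round up when d_{k+1} >= 5, else chop."""
    return Context(prec=k, rounding=ROUND_HALF_UP).create_decimal(y)


PI = "3.14159265"
print(fl_chop(PI, 5), fl_round(PI, 5))

# Rounding up can change every digit and the exponent:
print(fl_round("0.99996", 4))
```

The last call illustrates the remark that rounding up may change the exponent: 0.99996 × 10^0 becomes 0.1000 × 10^1.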

Significant digits: The number $p^*$ is said to approximate $p$ to $t$ significant digits if $t$ is the largest nonnegative integer for which

$$\frac{|p - p^*|}{|p|} \le 5 \times 10^{-t} \tag{1.21}$$

A more formal definition of significant digits is as follows. Let the true value have digits $p = d_1 d_2 \dots d_k d_{k+1} \dots d_n$ and let the approximate value have digits $p^* = d_1 d_2 \dots d_k e_{k+1} \dots e_n$, where $d_1 \ne 0$ and the first difference in the digits occurs at the $(k + 1)$st digit. We then say that $p$ and $p^*$ agree to $k$ significant digits if $|d_{k+1} - e_{k+1}| < 5$. Otherwise, we say they agree to $k - 1$ significant digits.

Example: Let the true value be $p = 10/3$ and the approximate value $p^* = 3.333$. The absolute error is $|10/3 - 3.333| = 1/3000$. The relative error is $1/10000 = 10^{-4} < 5 \times 10^{-4}$. The number of significant digits is 4.

Assume that the floating-point representations $fl(x)$ and $fl(y)$ are given for the real numbers $x$ and $y$, and that the symbols $\oplus, \ominus, \otimes, \oslash$ represent machine addition, subtraction, multiplication, and division, respectively. The finite-digit arithmetic is given by

$$x \oplus y = fl(fl(x) + fl(y))$$
$$x \ominus y = fl(fl(x) - fl(y))$$
$$x \otimes y = fl(fl(x) \times fl(y))$$
$$x \oslash y = fl(fl(x) \div fl(y))$$

For $k$-digit chopping we have

$$\left| \frac{y - fl(y)}{y} \right| \le 10^{-k+1} \tag{1.22}$$

For $k$-digit rounding we have

$$\left| \frac{y - fl(y)}{y} \right| \le 0.5 \times 10^{-k+1} \tag{1.23}$$

One of the most common errors involves the cancellation of significant digits due to the subtraction of two nearly equal numbers. Suppose we have two nearly equal numbers $x$ and $y$, with $x > y$, and

$$fl(x) = 0.d_1 d_2 \dots d_p \alpha_{p+1} \alpha_{p+2} \dots \alpha_k \times 10^n$$
$$fl(y) = 0.d_1 d_2 \dots d_p \beta_{p+1} \beta_{p+2} \dots \beta_k \times 10^n$$

The floating-point form of $x - y$ takes the form

$$fl(fl(x) - fl(y)) = 0.\alpha_{p+1} \alpha_{p+2} \dots \alpha_k \times 10^{n-p} - 0.\beta_{p+1} \beta_{p+2} \dots \beta_k \times 10^{n-p} = 0.\sigma_{p+1} \sigma_{p+2} \dots \sigma_k \times 10^{n-p}$$

The floating-point number used to represent $x - y$ has at most $k - p$ digits of significance. Any further calculations involving $x - y$ retain the problem of having only $k - p$ digits of significance.

Loss of significance: Consider, for example, $x^* = 0.76545421 \times 10^1$ and $y^* = 0.76544200 \times 10^1$ to be approximations to $x$ and $y$, respectively, correct to seven significant digits. Then, in eight-digit floating-point arithmetic, the difference is $z^* = x^* - y^* = 0.12210000 \times 10^{-3}$. But as an approximation to $z = x - y$, $z^*$ is good only to three digits, since the fourth significant digit of $z^*$ is derived from the eighth digits of $x^*$ and $y^*$, both possibly in error. Hence, while the error in $z^*$ is at most the sum of the errors in $x^*$ and $y^*$, the relative error in $z^*$ is possibly 10000 times the relative error in $x^*$ and $y^*$. Loss of significant digits is therefore dangerous only if we wish to keep the relative error small.

We can also have error when dividing by a small number or multiplying by a large number. Suppose, for example, that the number $z$ has a finite-digit approximation $z + \delta$, where the error $\delta$ is introduced by representation or previous calculation. If we divide it by $\epsilon = 10^{-n}$, where $n > 0$, then

$$\frac{z}{\epsilon} \approx fl\left( \frac{fl(z)}{fl(\epsilon)} \right) = (z + \delta) \times 10^n$$

so the absolute error in this approximation, $|\delta| \times 10^n$, is the original absolute error, $|\delta|$, multiplied by a factor $10^n$.

Example: Let $p = 0.54617$ and $q = 0.54601$. The exact value is $r = p - q = 0.00016 = 0.16 \times 10^{-3}$. If we perform the subtraction using 4-digit rounding we find $p^* = 0.5462$ and $q^* = 0.5460$, and $r^* = p^* - q^* = 0.0002 = 0.2 \times 10^{-3}$. The relative error is

$$\frac{|r - r^*|}{|r|} = 0.25$$

so $r^*$ has only one significant digit, whereas $p^*$ and $q^*$ were accurate to four and five significant digits, respectively.

Example: The quadratic formula states that the roots of $ax^2 + bx + c = 0$, when $a \ne 0$, are

$$x_\pm = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a} \tag{1.24}$$

Using four-digit rounding arithmetic, consider this formula applied to $x^2 + 62.1x + 1 = 0$, whose roots are approximately $x_+ = -0.01610723$ and $x_- = -62.08390$. We can see that $b^2 \gg 4ac$, so the numerator of $x_+$ involves the subtraction of two nearly equal numbers. Since $\sqrt{b^2 - 4ac} = 62.06$, we get $fl(x_+) = -0.02$, which is a poor approximation to $x_+ = -0.01611$, with a relative error of about $2.4 \times 10^{-1}$. The other root, $fl(x_-) = -62.10$, has a small relative error of around $3.2 \times 10^{-4}$. To obtain a more accurate value of $x_+$ we can rationalize the numerator:

$$x_+ = \frac{-b + \sqrt{b^2 - 4ac}}{2a} = \frac{-b + \sqrt{b^2 - 4ac}}{2a} \cdot \frac{b + \sqrt{b^2 - 4ac}}{b + \sqrt{b^2 - 4ac}} = \frac{-2c}{b + \sqrt{b^2 - 4ac}}$$

so we get $fl(x_+) = -0.01610$, which has the small relative error $6.2 \times 10^{-4}$. We can also derive the analogous formula for $x_-$:

$$x_- = \frac{-2c}{b - \sqrt{b^2 - 4ac}}$$

In this case $fl(x_-)$ will be $-50.00$, which has the large relative error $1.9 \times 10^{-1}$.
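The four-digit computations in this example can be replayed with Python's standard `decimal` module, setting the context precision to 4 so that every operation is rounded to four significant digits. The function name `roots_four_digit` and its return layout are assumptions of this sketch.

```python
from decimal import Decimal, localcontext


def roots_four_digit(a: str, b: str, c: str):
    """Return (standard x+, standard x-, rationalized x+, rationalized x-)."""
    with localcontext() as ctx:
        ctx.prec = 4                                  # four-digit rounding
        a, b, c = Decimal(a), Decimal(b), Decimal(c)
        root_disc = (b * b - 4 * a * c).sqrt()        # 62.06 in the example
        standard_plus = (-b + root_disc) / (2 * a)    # cancellation here
        standard_minus = (-b - root_disc) / (2 * a)
        rationalized_plus = (-2 * c) / (b + root_disc)
        rationalized_minus = (-2 * c) / (b - root_disc)  # cancellation here
    return standard_plus, standard_minus, rationalized_plus, rationalized_minus


sp, sm, rp, rm = roots_four_digit("1", "62.1", "1")
print("standard:     x+ =", sp, " x- =", sm)
print("rationalized: x+ =", rp, " x- =", rm)
```

For b > 0 the accurate pair is the rationalized x+ together with the standard x-; each of the other two formulas subtracts nearly equal numbers.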

Example: This example shows how we can avoid loss of significance. We want to evaluate $f(x) = 1 - \cos(x)$ near zero in six-digit arithmetic. Since $\cos(x) \approx 1$ for $x$ near zero, there will be loss of significant digits if we first find $\cos(x)$ and then subtract it from 1. Without loss of generality, assume that $x$ is close to zero with $x > 0$. We have

$$\cos(x) = 0.a_1 a_2 a_3 a_4 a_5 a_6 a_7 \dots$$

If $x = x_0$ is close enough to zero we can have

$$\cos(x_0) = 0.999999 a_7 \dots$$

and the difference is

$$1 - \cos(x_0) = 0.100000 \times 10^{-5} - 0.a_7 a_8 a_9 \dots \times 10^{-6}.$$

If we use rounding and $a_7 \ge 5$, we cannot calculate the value of $1 - \cos x$ using six-digit arithmetic for any $x \le x_0$, because the rounded value of $1 - \cos(x)$ is zero. For example, $1 - \cos(0.001) = 0.000000$ in six-digit arithmetic, but the correct value is $0.500000 \times 10^{-6}$. To overcome this we can use another formula:

$$1 - \cos(x) = \frac{(1 + \cos(x))(1 - \cos(x))}{1 + \cos x} = \frac{\sin^2(x)}{1 + \cos x}$$

If we use this last equation we find that for $x = 0.001$

$$1 - \cos(0.001) = \frac{\sin^2(0.001)}{1 + \cos 0.001} = \frac{0.1 \times 10^{-5}}{2} = 0.5 \times 10^{-6}.$$

We can also use the Taylor polynomial

$$1 - \cos x \approx \frac{x^2}{2} - \frac{x^4}{24} + \dots$$

which gives

$$1 - \cos 0.001 \approx \frac{0.001^2}{2} - \frac{0.001^4}{24} + \dots = 0.5 \times 10^{-6} - \frac{0.1 \times 10^{-11}}{24} + \dots = 0.5 \times 10^{-6} - 0.416667 \times 10^{-13} + \dots \approx 0.5 \times 10^{-6}.$$
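The same cancellation occurs in double precision, only at a smaller x. The following Python sketch compares the three ways of evaluating 1 − cos(x) discussed above at x = 10^-9, a value chosen here (not in the notes) so that cos(x) rounds to exactly 1 in double precision; the function names are illustrative.

```python
import math


def one_minus_cos_naive(x: float) -> float:
    """Direct evaluation: subtracts two nearly equal numbers."""
    return 1.0 - math.cos(x)


def one_minus_cos_stable(x: float) -> float:
    """Rewritten form sin^2(x) / (1 + cos(x)): no cancellation."""
    return math.sin(x) ** 2 / (1.0 + math.cos(x))


def one_minus_cos_taylor(x: float) -> float:
    """Two-term Taylor polynomial x^2/2 - x^4/24."""
    return x * x / 2.0 - x**4 / 24.0


x = 1e-9
naive = one_minus_cos_naive(x)
stable = one_minus_cos_stable(x)
taylor = one_minus_cos_taylor(x)
print(naive, stable, taylor)
```

The naive form returns zero, losing every significant digit, while the rewritten form and the Taylor polynomial both return about 5 × 10^-19.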

Example: The value of the polynomial $p(x) = 2x^3 - 3x^2 + 5x - 4$ at $x = 3$ can be calculated as:

⋆ $x^2 = 9$, $x^3 = 27$, then we put everything together: $p(3) = 54 - 27 + 15 - 4 = 38$. We have five multiplications ($x^2$, $x^3$, $2x^3$, $3x^2$, $5x$), one addition and two subtractions. We need 8 operations in total.

⋆ The polynomial can be arranged in a nested manner as $p(x) = [(2x - 3)x + 5]x - 4$. We need three multiplications, one addition and two subtractions. In total we need six operations.

– In general, for a polynomial of degree $n$ we need $(n - 1) + n = 2n - 1$ multiplications: $n - 1$ for $x^n, x^{n-1}, \dots, x^2$, and $n$ for the multiplications by the coefficients, $a_n \times x^n, a_{n-1} \times x^{n-1}, \dots, a_1 \times x$. However, for the nested form we need only $n$ multiplications.

– Both need $n$ addition/subtraction operations.
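The nested evaluation can be written as a short loop. This is a minimal Python sketch; the function name `nested_eval` and the convention of listing coefficients from the highest power down are choices made here for illustration.

```python
def nested_eval(coeffs, x):
    """Evaluate a polynomial in nested form.

    coeffs lists the coefficients from the highest power down,
    e.g. [2, -3, 5, -4] for 2x^3 - 3x^2 + 5x - 4.
    Uses n multiplications and n additions for degree n.
    """
    result = coeffs[0]
    for c in coeffs[1:]:
        result = result * x + c
    return result


value = nested_eval([2, -3, 5, -4], 3)
print(value)
```

For the polynomial of the example the loop computes 2·3 − 3 = 3, then 3·3 + 5 = 14, then 14·3 − 4 = 38, which is exactly the bracketed form [(2x − 3)x + 5]x − 4.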

1.2.1 Exercises

Assignment: odd exercises from section 1.2, pages 26-29
Tutorial: 1,

Exercise 1: $p = \pi = 3.1415926\ldots$ and $p^* = \frac{22}{7} = 3.142857$. The absolute error is

$$|p - p^*| = 3.142857 - 3.1415926\ldots$$
$$0.0012644 < |p - p^*| < 0.0012645$$

If we round it we find that the absolute error is 0.00126. The relative error satisfies

$$\frac{0.0012644}{3.1415927} < \frac{|p - p^*|}{|p|} < \frac{0.0012645}{3.1415926}$$
$$4.0247 \times 10^{-4} < \frac{|p - p^*|}{|p|} < 4.0250 \times 10^{-4}$$

If we round it we find that the relative error is $4.025 \times 10^{-4}$.

The relative error in $p^*$, as an approximation to $p$, is defined by $\alpha = \dfrac{|p - p^*|}{|p|}$. Note that this number is close to $\dfrac{|p - p^*|}{|p^*|}$ if $\alpha \ll 1$. One can show that

$$\frac{|p - p^*|}{|p|} = \alpha \implies \frac{|p - p^*|}{|p^*|} = \frac{\alpha}{|1 \pm \alpha|} \approx \alpha \tag{1.25}$$

Exercise 5: Three-digit rounding arithmetic gives:

a) $133 + 0.921 = 133.921$, which is 134.
b) $133 - 0.499 = 132.501$, which is 133.
c) $(121 - 0.327) - 119 = (120.673) - 119 = 121 - 119$, which is 2.
d) $(121 - 119) - 0.327 = (2) - 0.327 = 1.673$, which is 1.67.
e) $\dfrac{\frac{13}{14} - \frac{6}{7}}{2e - 5.4} = \dfrac{0.929 - 0.857}{5.44 - 5.4} = \dfrac{0.072}{0.04} = 1.80$

The absolute error and relative error are:

a) $0.79 \times 10^{-1}$, $0.59 \times 10^{-3}$
b) $0.499$, $0.377 \times 10^{-2}$
e) $0.154$, $0.0786$

Assignment: Suppose two points $(x_0, y_0)$ and $(x_1, y_1)$ are on a straight line with $y_0 \ne y_1$. The x-intercept of the line is given by

$$x = \frac{x_0 y_1 - x_1 y_0}{y_1 - y_0}$$

or

$$x = x_0 - \frac{(x_1 - x_0)\, y_0}{y_1 - y_0}$$

Group 1: Use the data $(x_0, y_0) = (1.31, 3.24)$ and $(x_1, y_1) = (1.93, 4.76)$ and three-digit rounding arithmetic to compute the x-intercept both ways. Which method is better and why?

Group 2: Use the data $(x_0, y_0) = (0.2, 0.2)$ and $(x_1, y_1) = (1.2, 1.01)$ and three-digit rounding arithmetic to compute the x-intercept both ways. Which method is better and why?

Solution

    X1 := (-y1 x0 + x1 y0) / (-y1 + y0)
    X2 := x0 + y0 (x1 - x0) / (-y1 + y0)

    Group 1
    -------------------------------------
    x0 := 1.31   y0 := 3.24   x1 := 1.93   y1 := 4.76
    Actual solution is -0.01157894737
    X1 := -0.00658
    X2 := -0.01
    Relative error for X1 is 0.4317272728
    Relative error for X2 is 0.1363636365
    X2 is better than X1

    Group 2
    -------------------------------------
    x0 := 0.2   y0 := 0.2   x1 := 1.2   y1 := 1.01
    Actual solution is -0.04691358025
    X1 := -0.0469
    X2 := -0.047
    Relative error for X1 is 0.000289473750
    Relative error for X2 is 0.001842105197
    X1 is better than X2

**Chapter 2 Solution of Equations in One Variable**

Finding the roots of a function $f$ is very important in science and engineering, and it is not always simple. Consider the function

$$f(x) = (1 - x)^8 = 1 - 8x + 28x^2 - 56x^3 + 70x^4 - 56x^5 + 28x^6 - 8x^7 + x^8.$$

It is clear that $x = 1$ is the only real root of $f$. The graph of $f$, computed from the expanded form, is given in Fig. 2.1. It wrongly suggests that $f$ has many roots, because the computed values of $f(x)$ are both positive and negative.

[Plot: y (in units of $10^{-14}$) against x for 0.97 ≤ x ≤ 1.03.]

Figure 2.1: The strange behavior of f(x) near x = 1 is due to round-off errors in the computation of the expanded form of f(x).

2.1 Bisection Method

Deﬁnition: The ﬁrst technique, based on intermediate value theorem, is called bisection method. To begin, set a1 = a and b1 = b, and let p1 the midpoint of the interval [a, b]; that is b 1 − a1 a1 + b 1 = (2.1) 2 2 if f (p1 ) = 0, then the root of f (x) = 0 is p = p1 . If f (p1 ) = 0, then f (p1 ) has the same sign of as either f (a1 ) or f (b1 ). When f (p1 ) and f (a1 ) have the same sign, p ∈ (p1 , b1 ), and we set a2 = p1 and b2 = b1 . When f (p1 ) and f (b1 ) have the same sign, p ∈ (a1 , p1 ), and we set a2 = a1 and b2 = p1 . We then reapply the process to the interval [a2 , b2 ]. p 1 = a1 + Algorithm: INPUT: endpoints a, b; Tolerance T OL; maximum number of iteration N0 OUTPUT: approximate solution p or message of failure. Step 1: Set i=1; FA=f(a); Step 2: while i ≤ N0 do steps 3-6. Step 3: set p = a + (b − a)/2; (compute pi ) FP=f(p); Step 4: if F P = 0 or (b − a)/2 < T OL then OUTPUT (p); (procedure terminated successfully) STOP. Step 5: set i = i + 1 Step 6: If F A.F P > 0 then set a = p; (compute ai , bi ) FA=FP else set b = p Step 7: OUTPUT (“method failed after N0 iterations”) STOP The stopping procedure in step 4 by selecting a tolerance ǫ > 0 with the following conditions: |pN − pN −1 | < ǫ pN − pN −1 <ǫ pN |f (pN )| < ǫ (2.2) , pN = 0 (2.3) (2.4)

The first and the last conditions are not good measures of the tolerance (see Ex. 16 and 17, page 52). The middle one, which is the relative error, is better than the two others.

If you apply the bisection method to the function $f(x) = \frac{1}{x - 1}$ on the interval $[0, 2]$, you will find that the method catches the singularity at $x = 1$ instead of a root. The sign change of $f$ on $[0, 2]$ comes from the singularity, so the iterates cannot approach a zero of $f$: a test such as $|f(p_N)| < \epsilon$ is never satisfied for any number of iterations $N_0$, and the method fails.

To start the algorithm we need to check that $f(a) \cdot f(b) \le 0$. We can use the sign function defined by

$$\operatorname{sign}(x) = \begin{cases} -1, & \text{if } x < 0 \\ 0, & \text{if } x = 0 \\ 1, & \text{if } x > 0 \end{cases} \qquad (2.5)$$

and we just test $\operatorname{sign}(f(a)) \cdot \operatorname{sign}(f(b))$ instead of $f(a) \cdot f(b)$, which avoids overflow or underflow in the product.

It is good practice to set an upper bound for the number of iterations. This eliminates the possibility of entering an infinite loop. It is also good to choose the interval $[a, b]$ as small as possible, so that the number of iterations is reduced. The bisection method is slow to converge: $N$ may become quite large for a small tolerance.

Theorem: Suppose that $f \in C[a, b]$ and $f(a) \cdot f(b) < 0$. The bisection method generates a sequence $\{p_n\}$ approximating a zero $p$ of $f$ with

$$|p_n - p| \le \frac{b - a}{2^n}, \quad n \ge 1. \qquad (2.6)$$

Proof: For each $n \ge 1$, we have

$$b_1 = b, \quad a_1 = a \quad \text{and} \quad b_n - a_n = \frac{b - a}{2^{n-1}} \qquad (2.7)\text{-}(2.9)$$

Since $p \in (a_n, b_n)$ and $p_n$ is the midpoint of $[a_n, b_n]$, we also have that

$$|p - p_n| \le \frac{b_n - a_n}{2} = \frac{b - a}{2^n} \qquad (2.10)$$

which shows that the sequence $\{p_n\}$ converges to $p$.

The bound for the number of iterations assumes that the calculations are performed using infinite-digit arithmetic. When implementing the method on a computer, we have to consider round-off error. For example, the midpoint of $[a_n, b_n]$ should be computed from the equation

$$p_n = a_n + \frac{b_n - a_n}{2} \qquad (2.11)$$

instead of the algebraically equivalent equation

$$p_n = \frac{a_n + b_n}{2} \qquad (2.12)$$

The first equation adds a small correction $(b_n - a_n)/2$ to the known value $a_n$. If $b_n - a_n$ is near the maximum precision of the machine, this correction will not affect $p_n$ significantly. However, $(a_n + b_n)/2$ may return a midpoint that is not even in the interval $[a_n, b_n]$.

Exercises: Odd-numbered exercises of Sec. 2.1, pages 51-52.

– Ex 13: Find an approximate value of $\sqrt[3]{25}$ correct to within $10^{-4}$. Consider the function $f(x) = x^3 - 25$. We can choose the interval $[2, 3]$: we have $f(2) = -17$ and $f(3) = 2$. The two values have different signs, so we can apply the bisection method.

| n | a_n | b_n | p_n | b_n − a_n | f(p_n) |
|---|---|---|---|---|---|
| 1 | 2 | 3 | 2.5 | 1 | −9.3750 |
| 2 | 2.5 | 3 | 2.75 | 0.5 | −4.2031 |
| 3 | 2.75 | 3 | 2.8750 | 0.25 | −1.2363 |
| 4 | 2.875 | 3 | 2.93750 | 0.125 | +0.34741 |
| 5 | 2.8750 | 2.93750 | 2.906250 | 0.0625 | −0.45297 |
| 6 | 2.90625 | 2.93750 | 2.921875 | 0.031250 | −0.054920 |
| 7 | 2.921875 | 2.937500 | 2.929688 | 0.015625 | +0.145710 |
| 8 | 2.9218750 | 2.9296875 | 2.9257812 | 0.0078125 | +0.0452607 |
| 9 | 2.9218750 | 2.9257812 | 2.9238281 | 0.0039062 | −0.0048632 |
| 10 | 2.9238 | 2.9258 | 2.9248 | 1.9531E−03 | +2.0190E−02 |
| 11 | 2.9238 | 2.9248 | 2.9243 | 9.7656E−04 | +7.6615E−03 |
| 12 | 2.9238 | 2.9243 | 2.9241 | 4.8828E−04 | +1.3986E−03 |
| 13 | 2.9238 | 2.9241 | 2.9240 | 2.4414E−04 | −1.7324E−03 |
| 14 | 2.9240 | 2.9241 | 2.9240 | 1.2207E−04 | −1.6692E−04 |

So the approximate value of $\sqrt[3]{25}$ is $p_{14} = 2.9240$, because $(b_{14} - a_{14})/2 = 6.1035\text{E}{-05} < 10^{-4}$, whereas $(b_{13} - a_{13})/2 = 1.2207\text{E}{-04}$. If we use the theorem,

$$|p_n - p| \le \frac{b - a}{2^n} < 10^{-4} \qquad (2.13)$$

with $b - a = 1$, we find that

$$\frac{1}{2^n} < 10^{-4} \;\Rightarrow\; -n \log_{10} 2 < -4 \;\Rightarrow\; n > \frac{4}{\log_{10} 2} = 13.288$$

So $n$ should be at least 14.

– Ex 18: The function $f(x) = \sin(\pi x)$ has zeros at every integer. We want to show that when $-1 < a < 0$ and $2 < b < 3$, the bisection method converges to

$$\begin{cases} 0 & \text{for } a + b < 2 \\ 2 & \text{for } a + b > 2 \\ 1 & \text{for } a + b = 2 \end{cases}$$

We have to check the signs of $\sin(a\pi)$, $\sin\left(\frac{a+b}{2}\pi\right)$, and $\sin(b\pi)$ for each iteration.

For the starting point we have: for $a \in (-1, 0)$, $\sin(a\pi) \in [-1, 0)$, and for $b \in (2, 3)$, $\sin(b\pi) \in (0, 1]$. So the bisection method can be applied on the interval $[a, b]$. The only roots that we can get are 0, 1, or 2, because these are the only integers that belong to $(-1, 3)$. Next, we have to check the sign of $\sin\left(\frac{a+b}{2}\pi\right)$. We know that $a + b \in (1, 3)$.

(i) If $a + b < 2$, we have $0.5 < p = \frac{a+b}{2} < 1$ and $\sin(p\pi) > 0$. So the signs of $\sin(a\pi) < 0$ and $\sin(p\pi) > 0$ are different, and the only root between $a$ and $p$ is 0. Therefore the bisection method converges to 0.

(ii) If $a + b > 2$, we have $1 < p = \frac{a+b}{2} < 1.5$ and $\sin(p\pi) < 0$. So the signs of $\sin(p\pi) < 0$ and $\sin(b\pi) > 0$ are different, and the only root between $p$ and $b$ is 2. Therefore the bisection method converges to 2.

(iii) If $a + b = 2$, we have $p = 1$ and $\sin(p\pi) = 0$, which of course is the root 1.

We will solve exercises 1,3,11,15,16 from sec 2.1 page 51-52.
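The bisection algorithm of this section can be sketched in Python as follows. This is a minimal sketch, not a reference implementation: the function name `bisect`, its default arguments, and the choice to raise exceptions on failure are illustrative assumptions. It uses the stopping test of Step 4 and the midpoint formula (2.11), and is applied to Ex 13.

```python
def bisect(f, a, b, tol=1e-4, max_iter=100):
    """Return (p, i): the approximate root and the iteration at which it was accepted."""
    fa = f(a)
    if fa * f(b) > 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    for i in range(1, max_iter + 1):
        half = (b - a) / 2           # half-width of the current bracket
        p = a + half                 # midpoint, computed as in equation (2.11)
        fp = f(p)
        if fp == 0 or half < tol:    # Step 4: stopping test
            return p, i
        if fa * fp > 0:              # Step 6: the root lies in (p, b)
            a, fa = p, fp
        else:                        # otherwise the root lies in (a, p)
            b = p
    raise RuntimeError(f"method failed after {max_iter} iterations")

# Ex 13: approximate the cube root of 25 on [2, 3] to within 1e-4.
root, iterations = bisect(lambda x: x**3 - 25, 2.0, 3.0, tol=1e-4)
print(root, iterations)
```

With this stopping test the run ends at the 14th iteration, in agreement with the bound $n > 13.288$ obtained from the theorem.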

2.2 Fixed-Point Iteration

Definition: A number $p$ is a fixed point for a given function $g$ if

$$g(p) = p \qquad (2.14)$$

Theorem: If $g \in C[a, b]$ and $g(x) \in [a, b]$ for all $x \in [a, b]$, then $g$ has a fixed point in $[a, b]$. If, in addition, $g'(x)$ exists on $(a, b)$ and a positive constant $k < 1$ exists with

$$|g'(x)| \le k, \quad \text{for all } x \in (a, b) \qquad (2.15)$$

then the fixed point in $[a, b]$ is unique.

Proof: Consider the function $f(x) = g(x) - x$. We have the following relations:

$$f(a) = g(a) - a \ge 0, \qquad f(b) = g(b) - b \le 0 \qquad (2.16)$$

So, according to the Intermediate Value Theorem, $f(x) = 0$ has a root in $[a, b]$. Thus $g(x) = x$ has a solution, and therefore $g$ has a fixed point.

Now assume that there is more than one fixed point, say $p$ and $q$ with $p \ne q$. We know from the Mean Value Theorem that there exists $\xi$ between $p$ and $q$ such that

$$\frac{g(p) - g(q)}{p - q} = g'(\xi) = 1$$

where the value 1 follows from $g(p) = p$ and $g(q) = q$. This contradicts the fact that $|g'(x)| \le k < 1$, so there is only one fixed point.

What about the case $|g'(x)| > 1$: do we have such a function? Indeed, if $|g'(x)| \ne 1$ for all $x$, the only possible case is $|g'(x)| < 1$.

Proof: Suppose that $|g'(x)| \ne 1$ for all $x \in (a, b)$. Because $g'(x)$ exists on $(a, b)$, and a derivative cannot skip over intermediate values, we have either $g'(x) < -1$, or $-1 < g'(x) < 1$, or $g'(x) > 1$ throughout $(a, b)$. The first and the last cases are not possible: because $g(a), g(b) \in [a, b]$,

$$-1 \le \frac{g(b) - g(a)}{b - a} \le 1 \qquad (2.17)$$

and by the Mean Value Theorem this quotient equals $g'(\xi)$ for some $\xi \in (a, b)$. So we can say that no function with $g(x) \in [a, b]$ satisfies $|g'(x)| > 1$ on all of $(a, b)$.

Algorithm:

INPUT: initial approximation p0; tolerance TOL; maximum number of iterations N0.

OUTPUT: approximate solution p or message of failure.

Step 1: Set i = 1.

Step 2: While i ≤ N0 do Steps 3-6.

Step 3: Set p = g(p0); (compute p_i)

Step 4: If |p − p0| < TOL then OUTPUT (p); (procedure terminated successfully) STOP.

Step 5: Set i = i + 1.

Step 6: Set p0 = p; (update p0)

Step 7: OUTPUT ("method failed after N0 iterations"); STOP.

Fixed-point theorem: If $g \in C[a, b]$ and $g(x) \in [a, b]$ for all $x \in [a, b]$, and if, in addition, $g'(x)$ exists on $(a, b)$ and a constant $0 < k < 1$ exists with

$$|g'(x)| \le k, \quad \text{for all } x \in (a, b) \qquad (2.18)$$

then for any $p_0 \in [a, b]$, the sequence $p_n = g(p_{n-1})$, $n \ge 1$, converges to the unique fixed point $p \in [a, b]$.

Corollary: If $g$ satisfies the hypotheses of the fixed-point theorem, then bounds for the error involved in using $p_n$ to approximate $p$ are given by

$$|p - p_n| \le k^n \max\{p_0 - a,\, b - p_0\}$$

$$|p - p_n| \le \frac{k^n}{1 - k}\, |p_1 - p_0|$$

Proof: Study Examples 3 and 4, page 57.

Exercises:

– Ex. 5: We use a fixed-point iteration method to determine a solution accurate to within $10^{-2}$ for $x^4 - 3x^2 - 3 = 0$ on $[1, 2]$ with $p_0 = 1$.

Solution: To use fixed-point iteration we have to find a function $g(x)$ which satisfies the fixed-point theorem. The equation $x^4 - 3x^2 - 3 = 0$ leads to $x^4 = 3x^2 + 3$, which in turn leads to $x = (3x^2 + 3)^{1/4}$. Now we check whether the function

$$g(x) = (3x^2 + 3)^{1/4} \qquad (2.19)$$

$$g'(x) = \frac{3}{2}\, \frac{x}{(3x^2 + 3)^{3/4}} \qquad (2.20)$$

satisfies the fixed-point theorem.

The derivative is always positive on $[1, 2]$, so $g$ is increasing, with $g(1) = 1.565084580$ and $g(2) = 1.967989671$. Therefore $g(x) \in [1, 2]$ for all $x \in [1, 2]$. Moreover,

$$g'(x) = \frac{3}{2}\, \frac{x}{(3x^2 + 3)^{3/4}} \le \frac{3}{2}\, \frac{2}{(3 \times 1^2 + 3)^{3/4}} = 0.7825422900 < 1$$

So the function $g(x)$ satisfies the fixed-point theorem. To be more precise, the second derivative is given by

$$g''(x) = -\frac{3}{4}\, \frac{x^2 - 2}{(x^2 + 1)(3x^2 + 3)^{3/4}}$$

which vanishes at $x = \sqrt{2}$, where $g'$ attains its maximum on $[1, 2]$. In the region $[1, 2]$ we have $g'(1) = 0.3912711450$, $g'(2) = 0.3935979342$ and $g'(\sqrt{2}) = 0.4082482906$, so $g'(x) \le 0.4082482906 < 1$. Therefore our $k = 0.4082482906$. According to the theorem we have

$$|p - p_n| \le k^n \max\{p_0 - a,\, b - p_0\}$$

which, with $p_0 = a = 1$ and $b = 2$, becomes

$$|p - p_n| \le 0.4082482906^n \le 10^{-2}$$

which implies that $n \ge 6$. The answer $p_6 = 1.943316930$ is accurate to within $10^{-2}$.

– Ex 7: We want to show that the function $g(x) = \pi + 0.5 \sin(x/2)$ has a unique fixed point on $[0, 2\pi]$. We know that

$$0 < \pi - 0.5 \le g(x) \le \pi + 0.5 < 2\pi \qquad (2.21)$$

The derivative of the function $g$ satisfies

$$|g'(x)| = \left| \frac{\cos(x/2)}{4} \right| \le \frac{1}{4} \qquad (2.22)$$

Therefore the function has a unique fixed point. To find an approximation to the fixed point that is accurate to within $10^{-2}$, let us estimate the number of iterations. Using $p_0 = \pi$ and $k = 1/4$,

$$|p - p_n| \le \left(\frac{1}{4}\right)^n \pi \le 10^{-2} \qquad (2.23)$$

This leads to the number of iterations $n \ge 5$. We can do better using

$$|p - p_n| \le \frac{k^n}{1 - k}\, |p_1 - p_0| \qquad (2.24)$$

We have $p_1 = \pi + 1/2$, so the last equation becomes

$$|p - p_n| \le \frac{(1/4)^n}{3/4} \cdot \frac{1}{2} = \frac{2}{3} \left(\frac{1}{4}\right)^n \le 10^{-2} \qquad (2.25)$$

This leads to the number of iterations $n \ge 4$. Therefore

p0 = 3.141592654

p1 = 3.641592654

p2 = 3.626048865

p3 = 3.626995623

p4 = 3.626938795

p5 = 3.626942209
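The fixed-point algorithm of this section can be sketched in Python as follows. The function name `fixed_point` and its default arguments are illustrative assumptions; the loop follows Steps 1-7 and is applied to the function of Ex 7 with $p_0 = \pi$.

```python
import math

def fixed_point(g, p0, tol=1e-2, max_iter=100):
    """Iterate p = g(p0) until successive iterates differ by less than tol.

    Returns (p, i), where i is the number of iterations used.
    """
    for i in range(1, max_iter + 1):
        p = g(p0)                  # Step 3: compute p_i
        if abs(p - p0) < tol:      # Step 4: stopping test
            return p, i
        p0 = p                     # Step 6: update p0
    raise RuntimeError(f"method failed after {max_iter} iterations")

# Ex 7: g(x) = pi + 0.5 sin(x/2), starting from p0 = pi.
g = lambda x: math.pi + 0.5 * math.sin(x / 2)
p, n = fixed_point(g, math.pi, tol=1e-8)
print(p, n)
```

A tighter tolerance than the $10^{-2}$ of the exercise is used here only so that the printed value can be compared with the iterates listed above.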

2.3 Newton's Method

Derivation of Newton's (or the Newton-Raphson) method: Suppose that $f \in C^2[a, b]$, that is, $f$ has a continuous second derivative on $[a, b]$. Let $p_0 \in [a, b]$ be an approximation to the solution $p$ of $f(x) = 0$ such that $f'(p_0) \ne 0$ and $|p - p_0|$ is "small". The first Taylor polynomial of $f(x)$ about $p_0$, evaluated at $x = p$ together with its remainder term, gives

$$f(p) = f(p_0) + (p - p_0) f'(p_0) + \frac{(p - p_0)^2}{2} f''(\xi(p)) \qquad (2.26)$$

where $\xi(p)$ lies between $p$ and $p_0$. Since $f(p) = 0$, if we neglect the quadratic term we get

$$p \approx p_1 = p_0 - \frac{f(p_0)}{f'(p_0)} \qquad (2.27)$$

Repeating the argument, we can write the sequence

$$p_n = p_{n-1} - \frac{f(p_{n-1})}{f'(p_{n-1})}, \quad n \ge 1 \qquad (2.28)$$

The last equation is what we call Newton's method.

Algorithm (please see page 64).

Newton's method is a functional iteration technique of the form $p_n = g(p_{n-1})$, where

$$g(p_{n-1}) = p_{n-1} - \frac{f(p_{n-1})}{f'(p_{n-1})}, \quad n \ge 1 \qquad (2.29)$$

Newton's method cannot be continued if $f'(p_{n-1}) = 0$ for some $n$. The derivation of Newton's method depends on the assumption that $p_0$ is close to $p$, so it is important that the initial approximation $p_0$ is chosen close to the actual value $p$. In some cases Newton's method converges even with a poor initial approximation. The following theorem illustrates the theoretical importance of the choice of $p_0$.

Theorem: Let $f \in C^2[a, b]$. If $p \in [a, b]$ is such that $f(p) = 0$ and $f'(p) \ne 0$, then there exists a $\delta > 0$ such that Newton's method generates a sequence $\{p_n\}$ converging to $p$ for any initial approximation $p_0 \in [p - \delta, p + \delta]$.

Proof: (see page 66).

The theorem states that, under reasonable assumptions, Newton's method converges provided a sufficiently accurate initial approximation is chosen. In practice, the theorem does not tell us how to calculate $\delta$. In general, either the method converges quickly or it will be clear that convergence is unlikely.
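Iteration (2.28) can be sketched in Python as follows. The function name `newton`, its arguments, and the stopping test $|p_n - p_{n-1}| < TOL$ are illustrative assumptions (the textbook algorithm on page 64 is not reproduced here). The test equation $x^3 - 2x^2 - 5 = 0$ is the one used in the exercises of this chapter.

```python
def newton(f, df, p0, tol=1e-4, max_iter=50):
    """Newton's method; returns (p, i) with i the number of iterations used."""
    for i in range(1, max_iter + 1):
        slope = df(p0)
        if slope == 0:
            # The method cannot be continued when f'(p_{n-1}) = 0.
            raise ZeroDivisionError("f'(p) = 0: Newton's method cannot continue")
        p = p0 - f(p0) / slope     # equation (2.28)
        if abs(p - p0) < tol:
            return p, i
        p0 = p
    raise RuntimeError(f"method failed after {max_iter} iterations")

f = lambda x: x**3 - 2 * x**2 - 5
df = lambda x: 3 * x**2 - 4 * x
root, n = newton(f, df, 2.5)
print(root, n)
```

Passing the derivative as a separate function makes the main cost of the method explicit: every step needs one evaluation of $f$ and one of $f'$.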

2.4 Secant Method

Newton's method is a powerful technique, but it has a major weakness: the need to know the first derivative. The calculation of the first derivative $f'(x)$ frequently needs more arithmetic operations than the calculation of $f(x)$. We know that the first derivative is defined by

$$f'(p_{n-1}) = \lim_{x \to p_{n-1}} \frac{f(x) - f(p_{n-1})}{x - p_{n-1}} \qquad (2.30)$$

Letting $x = p_{n-2}$, we have

$$f'(p_{n-1}) \approx \frac{f(p_{n-2}) - f(p_{n-1})}{p_{n-2} - p_{n-1}} \qquad (2.31)$$

Using this last equation in Newton's method we get

$$p_n = p_{n-1} - \frac{f(p_{n-1})\,(p_{n-1} - p_{n-2})}{f(p_{n-1}) - f(p_{n-2})} \qquad (2.32)$$

This is called the Secant method. Starting with two initial approximations $p_0$ and $p_1$, the approximation $p_2$ is the x-intercept of the line joining the two points $(p_0, f(p_0))$ and $(p_1, f(p_1))$. The approximation $p_3$ is the x-intercept of the line joining $(p_1, f(p_1))$ and $(p_2, f(p_2))$ (see Fig. 2.2).
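Equation (2.32) can be sketched in Python as follows. The function name `secant`, the stopping test, and the starting values $p_0 = 2$, $p_1 = 3$ are illustrative assumptions; the test equation $x^3 - 2x^2 - 5 = 0$ is the one used in the exercises of this chapter.

```python
def secant(f, p0, p1, tol=1e-6, max_iter=50):
    """Secant method; returns (p, i) with i the index of the accepted iterate."""
    q0, q1 = f(p0), f(p1)
    for i in range(2, max_iter + 1):
        if q1 == q0:
            raise ZeroDivisionError("secant line is horizontal")
        p = p1 - q1 * (p1 - p0) / (q1 - q0)   # equation (2.32)
        if abs(p - p1) < tol:
            return p, i
        p0, q0 = p1, q1                        # shift the two most recent points
        p1, q1 = p, f(p)
    raise RuntimeError(f"method failed after {max_iter} iterations")

f = lambda x: x**3 - 2 * x**2 - 5
root, n = secant(f, 2.0, 3.0)
print(root, n)
```

Only one new function evaluation is needed per step, and no derivative is required.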

2.5 False Position Method

Figure 2.2: Secant Method and False Position Method for finding the root of $f(x) = 0$. (Two panels, titled "Secant Method" and "False Position Method", each marking the iterates $p_0, p_1, p_2, p_3, p_4$ on the x-axis.)


False position method: generates approximations in the same way as the Secant method, but includes a test to ensure that the root is always bracketed between successive iterations. This method is not generally recommended; it is presented only to illustrate how bracketing can be incorporated. First we choose two approximations $p_0$ and $p_1$ such that $f(p_0) \cdot f(p_1) < 0$. The approximation $p_2$ is chosen in the same manner as in the Secant method, as the x-intercept of the line joining $(p_0, f(p_0))$ and $(p_1, f(p_1))$. To decide which secant line to use to compute $p_3$, we check the sign of $f(p_1) \cdot f(p_2)$. If it is negative, then $p_1$ and $p_2$ bracket a root, and we choose $p_3$ as the x-intercept of the line joining $(p_1, f(p_1))$ and $(p_2, f(p_2))$. If not, we choose $p_3$ as the x-intercept of the line joining $(p_0, f(p_0))$ and $(p_2, f(p_2))$. In a similar manner, always keeping a pair of points that brackets the root, we can find $p_n$ for $n \ge 4$.

Exercises

Ex 5: We want to use Newton's method to find an approximate solution accurate to within $10^{-4}$ of the following equation:

$$x^3 - 2x^2 - 5 = 0, \quad \text{in the interval } [1, 4]$$

Solution: Newton's method is

$$p_n = p_{n-1} - \frac{f(p_{n-1})}{f'(p_{n-1})}, \quad n \ge 1 \qquad (2.33)$$

with $f(x) = x^3 - 2x^2 - 5$ and $f'(x) = 3x^2 - 4x$. The approximation is accurate to the places for which $p_{n-1}$ and $p_n$ agree. Starting with $p_0 = 2.5$, Newton's method gives:

| n | p_{n−1} | p_n | \|p_n − p_{n−1}\| |
|---|---|---|---|
| 1 | 2.500000000 | 2.714285714 | 0.214285714 |
| 2 | 2.714285714 | 2.690951516 | 0.023334198 |
| 3 | 2.690951516 | 2.690647499 | 0.000304017 |
| 4 | 2.690647499 | 2.690647448 | 5.1 × 10⁻⁸ |
| 5 | 2.690647448 | 2.690647448 | 0.00 |

which is $p_4 = 2.690647448$. If we use $p_0 = 2$ we get

| n | p_{n−1} | p_n | \|p_n − p_{n−1}\| |
|---|---|---|---|
| 1 | 2.000000000 | 3.250000000 | 1.250000000 |
| 2 | 3.250000000 | 2.811036789 | 0.438963211 |
| 3 | 2.811036789 | 2.697989503 | 0.113047286 |
| 4 | 2.697989503 | 2.690677153 | 0.007312350 |
| 5 | 2.690677153 | 2.690647448 | 0.000029705 |

which is $p_5 = 2.690647448$.
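The False Position method described at the start of this section can be sketched in Python as follows, applied to the same equation as Ex 5. The function name `false_position`, the stopping test, and the starting bracket $[2, 3]$ are illustrative assumptions.

```python
def false_position(f, p0, p1, tol=1e-8, max_iter=100):
    """False Position: secant steps that always keep the root bracketed."""
    q0, q1 = f(p0), f(p1)
    if q0 * q1 >= 0:
        raise ValueError("f(p0) and f(p1) must have opposite signs")
    for i in range(2, max_iter + 1):
        p = p1 - q1 * (p1 - p0) / (q1 - q0)   # x-intercept of the secant line
        if abs(p - p1) < tol:
            return p, i
        q = f(p)
        if q * q1 < 0:        # p1 and p bracket the root: discard p0
            p0, q0 = p1, q1
        p1, q1 = p, q         # otherwise p0 is kept and p replaces p1
    raise RuntimeError(f"method failed after {max_iter} iterations")

f = lambda x: x**3 - 2 * x**2 - 5
root, n = false_position(f, 2.0, 3.0)
print(root, n)
```

Compared with the Secant method, the only change is the sign test that decides which of the two older points is retained.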


**Chapter 3 Error Analysis for Iterative Methods**

3.1 Linearly and Quadratically Convergent Procedures

We investigate the order of convergence of functional iteration.

Definition: Suppose $\{p_n\}$ is a sequence that converges to $p$, with $p_n \ne p$ for all $n$. If positive constants $\lambda$ and $\alpha$ exist with

$$\lim_{n \to \infty} \frac{|p_{n+1} - p|}{|p_n - p|^{\alpha}} = \lambda \qquad (3.1)$$

then the sequence converges to $p$ of order $\alpha$, with asymptotic error constant $\lambda$. An iterative technique of the form $p_n = g(p_{n-1})$ is said to be of order $\alpha$ if the sequence $\{p_n\}$ converges to the solution $p = g(p)$ of order $\alpha$.

In general, a sequence with a high order of convergence converges more rapidly than a sequence with a lower order. The asymptotic constant affects the speed of convergence but is not as important as the order. If $\alpha = 1$, the sequence is linearly convergent. If $\alpha = 2$, the sequence is quadratically convergent.

Assume that we have two sequences, one converging linearly to zero and the other converging quadratically to zero. For simplicity, suppose that

$$\frac{|p_{n+1}|}{|p_n|} \approx 0.5 \quad \text{(linearly convergent)}, \qquad \frac{|\tilde{p}_{n+1}|}{|\tilde{p}_n|^2} \approx 0.5 \quad \text{(quadratically convergent)}$$

For the linearly convergent sequence, we have

$$|p_n - 0| = |p_n| \approx 0.5\,|p_{n-1}| \approx 0.5^2\,|p_{n-2}| \approx \dots \approx 0.5^n\,|p_0|$$

For the quadratically convergent sequence, we have

$$|\tilde{p}_n - 0| = |\tilde{p}_n| \approx 0.5\,|\tilde{p}_{n-1}|^2 \approx 0.5^3\,|\tilde{p}_{n-2}|^4 \approx \dots \approx (0.5)^{2^n - 1}\,|\tilde{p}_0|^{2^n}$$

If we use $|p_0| = |\tilde{p}_0| = 1$, we can see that

$$|p_n| \approx 0.5^n, \qquad |\tilde{p}_n| \approx (0.5)^{2^n - 1}$$

After seven iterations we get

$$|p_7| \approx 0.78125 \times 10^{-2}, \qquad |\tilde{p}_7| \approx 0.58775 \times 10^{-38}$$

In order for the linearly convergent sequence to have the same accuracy as the quadratically convergent one, we need

$$(0.5)^n = (0.5)^{2^7 - 1} \;\Rightarrow\; n = 2^7 - 1 = 127$$

Theorem: Let $g \in C[a, b]$ be such that $g(x) \in [a, b]$ for all $x \in [a, b]$. Suppose, in addition, that $g'$ is continuous on $(a, b)$ and that a positive constant $k < 1$ exists with $|g'(x)| \le k$ for all $x \in (a, b)$. If $g'(p) \ne 0$, then for any $p_0 \in [a, b]$, the sequence $p_n = g(p_{n-1})$, $n \ge 1$, converges only linearly to the unique fixed point $p$ in $[a, b]$.

Proof: (not given here.)

Theorem: Let $p$ be a solution of the equation $x = g(x)$. Suppose that $g'(p) = 0$ and $g''$ is continuous with $|g''(x)| < M$ on an open interval $I$ containing $p$. Then there exists a number $\delta > 0$ such that, for $p_0 \in [p - \delta, p + \delta]$, the sequence defined by $p_n = g(p_{n-1})$, where $n \ge 1$, converges at least quadratically to $p$. Moreover, for sufficiently large values of $n$,

$$|p_{n+1} - p| < \frac{M}{2}\, |p_n - p|^2 \qquad (3.2)$$

Proof: (not given here.)

The easiest way to construct a fixed-point problem associated with a root-finding problem $f(x) = 0$ is to subtract a multiple of $f(x)$ from $x$:

$$p_n = g(p_{n-1}), \quad \text{with} \quad g(x) = x - \phi(x) f(x) \qquad (3.3)$$

where $\phi(x)$ is a differentiable function that will be chosen later. If $p$ satisfies $f(p) = 0$, then it is clear that $g(p) = p$. For the iteration procedure derived from $g$ to be quadratically convergent, we need $g'(p) = 0$ when $f(p) = 0$. Since

$$g'(x) = 1 - \phi'(x) f(x) - \phi(x) f'(x)$$

we have

$$g'(p) = 1 - \phi'(p) f(p) - \phi(p) f'(p) = 1 - \phi(p) f'(p)$$

and $g'(p) = 0$ implies

$$\phi(p) = \frac{1}{f'(p)}$$

If we let $\phi(x) = 1/f'(x)$, we will ensure that $\phi(p) = 1/f'(p)$ and produce the quadratically convergent procedure

$$p_n = g(p_{n-1}) = p_{n-1} - \frac{f(p_{n-1})}{f'(p_{n-1})} \qquad (3.4)$$

This is of course Newton's method, which is quadratically convergent provided that $f'(p_{n-1}) \ne 0$.

3.2 Zero multiplicity

Definition: A solution $p$ of $f(x) = 0$ is a zero of multiplicity $m$ of $f$ if for $x \ne p$ we can write $f(x) = (x - p)^m q(x)$, where $\lim_{x \to p} q(x) \ne 0$.

Theorem: $f \in C^1[a, b]$ has a simple zero at $p$ in $(a, b)$ iff $f(p) = 0$ but $f'(p) \ne 0$.

Proof: (not given here.)

If $p$ is a simple root of $f$, then Newton's method converges quadratically. If $p$ is not a simple root, then Newton's method may not converge quadratically (see Example 2, page 79).

Theorem: The function $f \in C^m[a, b]$ has a zero of multiplicity $m$ at $p$ in $(a, b)$ iff

$$f(p) = f'(p) = \dots = f^{(m-1)}(p) = 0, \qquad f^{(m)}(p) \ne 0$$

One method to handle problems of multiple roots is to define the function

$$\mu(x) = \frac{f(x)}{f'(x)} \qquad (3.5)$$

If $p$ is a zero of $f$ of multiplicity $m$ and $f(x) = (x - p)^m q(x)$, then

$$\mu(x) = (x - p)\, \frac{q(x)}{m\, q(x) + (x - p)\, q'(x)}$$

which has a simple zero at $p$. Newton's method can be applied to $\mu$ to give

$$g(x) = x - \frac{\mu(x)}{\mu'(x)} = x - \frac{f(x)\, f'(x)}{[f'(x)]^2 - f(x)\, f''(x)} \qquad (3.6)$$

If $g$ has the required continuity conditions, functional iteration applied to $g$ will be quadratically convergent regardless of the multiplicity of the zero of $f$. Theoretically, the only drawback to this method is the additional calculation of the second derivative $f''$. In practice, multiple roots can cause serious round-off problems, since the denominator consists of the difference of two numbers that are both close to zero.
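The iteration (3.6) can be sketched in Python as follows. The function name `modified_newton`, the stopping test, and the guard against a zero denominator are illustrative assumptions. The test function $f(x) = x^3 - 3x + 2 = (x - 1)^2 (x + 2)$, which has a double root at $p = 1$, is chosen here for illustration.

```python
def modified_newton(f, df, d2f, p0, tol=1e-8, max_iter=50):
    """Newton's method applied to mu = f / f', as in equation (3.6)."""
    for i in range(1, max_iter + 1):
        fx, dfx = f(p0), df(p0)
        denom = dfx * dfx - fx * d2f(p0)
        if denom == 0:
            # Both terms have cancelled: round-off has used up the available accuracy.
            return p0, i
        p = p0 - fx * dfx / denom
        if abs(p - p0) < tol:
            return p, i
        p0 = p
    raise RuntimeError(f"method failed after {max_iter} iterations")

f = lambda x: x**3 - 3 * x + 2
df = lambda x: 3 * x**2 - 3
d2f = lambda x: 6 * x
root, n = modified_newton(f, df, d2f, 2.0)
print(root, n)
```

The zero-denominator guard reflects the round-off warning above: close to a multiple root the denominator is a difference of two tiny numbers, so the attainable accuracy is limited.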

3.3 Exercises

Ex 6: Show that the following sequences converge linearly to $p = 0$. How large must $n$ be before we have $|p - p_n| \le 5 \times 10^{-2}$? a) $p_n = 1/n$. b) $q_n = 1/n^2$.

Sol 6: It is clear that the limit of both sequences is zero. It is also clear that

$$\lim_{n \to \infty} \frac{|p_{n+1} - p|}{|p_n - p|} = 1$$

since the ratio is $n/(n+1)$ for (a) and $n^2/(n+1)^2$ for (b). For (a), $1/n \le 5 \times 10^{-2}$ implies that $n \ge 20$. For (b), $1/n^2 \le 5 \times 10^{-2}$ implies that $n^2 \ge 20$, which in turn gives $n \ge 5$.

Ex 8a: We want to show that the sequence $p_n = 10^{-2^n}$ converges quadratically to 0.

Sol 8a: We know that

$$\lim_{n \to \infty} 10^{-2^n} = 0$$

We now calculate the following limit:

$$\lim_{n \to \infty} \frac{10^{-2^{n+1}}}{\left(10^{-2^n}\right)^2} = \lim_{n \to \infty} \frac{\left(10^{-2^n}\right)^2}{\left(10^{-2^n}\right)^2} = 1$$

Thus $\alpha = 2$, and the sequence converges quadratically to zero.

Ex 8b: We want to show that the sequence $p_n = 10^{-n^k}$ does not converge to zero quadratically, regardless of the size of the exponent $k > 1$.

Sol 8b: We know that

$$\lim_{n \to \infty} 10^{-n^k} = 0$$

We now calculate the following limit:

$$\lim_{n \to \infty} \frac{10^{-(n+1)^k}}{\left(10^{-n^k}\right)^2} = \lim_{n \to \infty} 10^{\,2n^k - (n+1)^k}$$

To find this limit we use

$$\lim_{n \to \infty} \left[ 2n^k - (n+1)^k \right] = \lim_{n \to \infty} n^k \left[ 2 - \left( \frac{n+1}{n} \right)^k \right] = \infty$$

So the ratio diverges and we cannot find a positive number $\lambda$. Therefore, the sequence does not converge quadratically.


**Chapter 4 Accelerating Convergence**

4.1 Aitken's $\Delta^2$ method

We consider a technique that can be used to accelerate the convergence of a sequence that is linearly convergent. Suppose that $\{p_n\}$ is a linearly convergent sequence with limit $p$. Aitken's $\Delta^2$ method is based on the assumption that the sequence $\{\hat{p}_n\}$ defined by

$$\hat{p}_n = p_n - \frac{(p_{n+1} - p_n)^2}{p_{n+2} - 2p_{n+1} + p_n} = p_n - \frac{(\Delta p_n)^2}{\Delta^2 p_n}, \quad \text{for } n \ge 0 \qquad (4.1)$$

converges more rapidly to $p$ than does the original sequence $\{p_n\}$. The symbol $\Delta p_n$ is the forward difference, which is defined by

$$\Delta p_n = p_{n+1} - p_n, \quad \text{for } n \ge 0 \qquad (4.2)$$

Higher powers of the operator $\Delta$ are defined recursively by

$$\Delta^k p_n = \Delta(\Delta^{k-1} p_n), \quad \text{for } k \ge 2 \qquad (4.4)$$

For example,

$$\Delta^2 p_n = \Delta(\Delta p_n) = \Delta(p_{n+1} - p_n) = (p_{n+2} - p_{n+1}) - (p_{n+1} - p_n) = p_{n+2} - 2p_{n+1} + p_n \qquad (4.6)$$

Theorem: Suppose that $\{p_n\}$ is a sequence that converges linearly to the limit $p$ and that

$$\lim_{n \to \infty} \frac{p_{n+1} - p}{p_n - p} < 1 \qquad (4.7)$$

Then the sequence

$$\hat{p}_n = p_n - \frac{(\Delta p_n)^2}{\Delta^2 p_n} \qquad (4.8)$$

converges to $p$ faster than $\{p_n\}$ in the sense that

$$\lim_{n \to \infty} \frac{\hat{p}_n - p}{p_n - p} = 0 \qquad (4.9)$$

Example: Consider $p_n = \cos(1/n)$. This sequence converges linearly to $p = 1$. Applying L'Hôpital's rule twice,

$$\lim_{n \to \infty} \frac{\cos\frac{1}{n+1} - 1}{\cos\frac{1}{n} - 1} = \lim_{n \to \infty} \frac{n^2 \sin\frac{1}{n+1}}{(n+1)^2 \sin\frac{1}{n}} = \lim_{n \to \infty} \frac{\sin\frac{1}{n+1}}{\sin\frac{1}{n}} = \lim_{n \to \infty} \frac{n^2 \cos\frac{1}{n+1}}{(n+1)^2 \cos\frac{1}{n}} = 1$$

Aitken's $\Delta^2$ sequence is defined by

$$\hat{p}_n = p_n - \frac{(\Delta p_n)^2}{\Delta^2 p_n} = p_n - \frac{(p_{n+1} - p_n)^2}{p_{n+2} - 2p_{n+1} + p_n} \qquad (4.10)$$

| n | p_n | p̂_n |
|---|---|---|
| 1 | 0.5403023059 | 0.9617750599 |
| 2 | 0.8775825619 | 0.9821293535 |
| 3 | 0.9449569463 | 0.9897855148 |
| 4 | 0.9689124217 | 0.9934156481 |
| 5 | 0.9800665778 | 0.9954099422 |
| 6 | 0.9861432316 | 0.9966199575 |
| 7 | 0.9898132604 | 0.9974083190 |

Example: The function $f(x) = x^3 - 3x + 2 = (x - 1)^2 (x + 2)$ has a double root $p = 1$. If Newton's method converges to $p = 1$, it converges only linearly. We choose $p_0 = 2$. Newton's method produces the following sequence:

$$p_0 = 2, \qquad p_{n+1} = p_n - \frac{p_n^3 - 3p_n + 2}{3p_n^2 - 3}$$

| n | p_n | (p_n − 1)/(p_{n−1} − 1) |
|---|---|---|
| 1 | 1.555555555555556 | 0.5555555560 |
| 2 | 1.297906602254429 | 0.5362318832 |
| 3 | 1.155390199213768 | 0.5216071009 |
| 4 | 1.079562210414361 | 0.5120156259 |
| 5 | 1.040288435171017 | 0.5063765197 |
| 6 | 1.020276809786733 | 0.5032910809 |
| 7 | 1.010172323431420 | 0.5016727483 |
| 8 | 1.005094741093272 | 0.5008434160 |
| 9 | 1.002549528082823 | 0.5004234759 |
| 10 | 1.001275305026243 | 0.5002121961 |
| 11 | 1.000637787960288 | 0.5001062491 |
| 12 | 1.000318927867152 | 0.5000533092 |
| 13 | 1.000159472408516 | 0.5000250840 |
| 14 | 1.000079738323218 | 0.5000125414 |
| 15 | 1.000039869690520 | 0.5000125411 |

It is clear that Newton's method is only linearly convergent here, that is, it converges slowly to $p = 1$. Let us apply Aitken's acceleration process to the sequence $\{p_n\}$ of iterates generated by Newton's method,

$$\hat{p}_n = p_n - \frac{(p_{n+1} - p_n)^2}{p_{n+2} - 2p_{n+1} + p_n} \qquad (4.11)$$

which gives:

| n | p_n | p̂_n | p_n − 1 | p̂_n − 1 |
|---|---|---|---|---|
| 0 | 2.0 | 0.9425287356 | 1.0 | −0.0574712644 |
| 1 | 1.555555556 | 0.9789767949 | 0.555555556 | −0.0210232051 |
| 2 | 1.297906602 | 0.9933420783 | 0.297906602 | −0.0066579217 |
| 3 | 1.155390199 | 0.9980927682 | 0.155390199 | −0.0019072318 |
| 4 | 1.079562210 | 0.9994865474 | 0.079562210 | −0.0005134526 |
| 5 | 1.040288435 | 0.9998665586 | 0.040288435 | −0.0001334414 |
| 6 | 1.020276810 | 0.9999659695 | 0.020276810 | −0.0000340305 |
| 7 | 1.010172323 | 0.9999914062 | 0.010172323 | −0.0000085938 |
| 8 | 1.005094741 | 0.9999978406 | 0.005094741 | −0.0000021594 |
| 9 | 1.002549528 | 0.9999994588 | 0.002549528 | −5.412 × 10⁻⁷ |
| 10 | 1.001275305 | 0.9999998645 | 0.001275305 | −1.355 × 10⁻⁷ |
| 11 | 1.000637788 | 0.9999999661 | 0.000637788 | −3.39 × 10⁻⁸ |
| 12 | 1.000318928 | 0.9999999915 | 0.000318928 | −8.5 × 10⁻⁹ |
| 13 | 1.000159472 | 0.9999999979 | 0.000159472 | −2.1 × 10⁻⁹ |
| 14 | 1.000079738 | ∗∗ | 0.000079738 | ∗∗ |
| 15 | 1.000039870 | ∗∗ | 0.000039870 | ∗∗ |
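Aitken's $\Delta^2$ formula can be sketched in Python as follows. The function name `aitken` and the list-based interface are illustrative assumptions; the example reproduces the accelerated values for $p_n = \cos(1/n)$.

```python
import math

def aitken(p):
    """Return the Aitken Delta^2 sequence for the list of iterates p."""
    accelerated = []
    for n in range(len(p) - 2):
        d1 = p[n + 1] - p[n]                     # forward difference
        d2 = p[n + 2] - 2 * p[n + 1] + p[n]      # second forward difference
        accelerated.append(p[n] - d1 * d1 / d2)  # accelerated term for index n
    return accelerated

# p_n = cos(1/n) for n = 1, ..., 9 gives the accelerated terms for n = 1, ..., 7.
p = [math.cos(1 / n) for n in range(1, 10)]
p_hat = aitken(p)
for n, value in enumerate(p_hat, start=1):
    print(n, value)
```

Each accelerated term needs two further terms of the original sequence, which is why the last two rows of the Newton table above have no $\hat{p}_n$ entry.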

4.2 Steffensen's Method

By applying a modification of Aitken's $\Delta^2$ method to a linearly convergent sequence obtained from a fixed-point iteration, we can accelerate the convergence to quadratic. This procedure is known as Steffensen's method. Aitken's method constructs the terms in order,

$$p_0, \quad p_1 = g(p_0), \quad p_2 = g(p_1), \quad \hat{p}_0 = \{\Delta^2\}(p_0), \quad p_3 = g(p_2), \quad \hat{p}_1 = \{\Delta^2\}(p_1), \; \dots \qquad (4.12)$$

where $\{\Delta^2\}$ indicates that Aitken's method given by eq. (4.10) is used. Steffensen's method constructs the same first four terms, $p_0$, $p_1$, $p_2$, and $\hat{p}_0$. However, at this step it assumes that $\hat{p}_0$ is a better approximation to $p$ than $p_2$ and applies fixed-point iteration to $\hat{p}_0$ instead of $p_2$. This leads to the following sequence:

$$p_0^{(0)}, \quad p_1^{(0)} = g(p_0^{(0)}), \quad p_2^{(0)} = g(p_1^{(0)}), \quad p_0^{(1)} = \{\Delta^2\}(p_0^{(0)}), \quad p_1^{(1)} = g(p_0^{(1)}), \; \dots \qquad (4.13)$$

Note that the denominator can be zero in the next iteration. If this occurs, we terminate the sequence and select the last approximation computed before the zero denominator occurred (see page 85 of the textbook).

Ex 3 page 86: Let $g(x) = \cos(x - 1)$ and $p_0^{(0)} = 2$. We want to use Steffensen's method to get $p_0^{(1)}$.

$$p_0^{(0)} = 2$$

$$p_1^{(0)} = \cos(2 - 1) = 0.5403023059$$

$$p_2^{(0)} = \cos(0.5403023059 - 1) = 0.8961866647$$

$$p_0^{(1)} = p_0^{(0)} - \frac{\left(p_1^{(0)} - p_0^{(0)}\right)^2}{p_2^{(0)} - 2p_1^{(0)} + p_0^{(0)}} = 0.826427396$$

Ex 4 page 86: Let $g(x) = 1 + (\sin x)^2$ and $p_0^{(0)} = 1$. We want to use Steffensen's method to get $p_0^{(1)}$ and $p_0^{(2)}$.

$$p_0^{(0)} = 1$$

$$p_1^{(0)} = 1 + (\sin 1)^2 = 1.708073418$$

$$p_2^{(0)} = 1 + (\sin 1.708073418)^2 = 1.981273081$$

$$p_0^{(1)} = p_0^{(0)} - \frac{\left(p_1^{(0)} - p_0^{(0)}\right)^2}{p_2^{(0)} - 2p_1^{(0)} + p_0^{(0)}} = 2.152904629$$

To calculate $p_0^{(2)}$ we start with:

$$p_0^{(1)} = 2.152904629$$

$$p_1^{(1)} = 1 + (\sin 2.152904629)^2 = 1.697735097$$

$$p_2^{(1)} = 1 + (\sin 1.697735097)^2 = 1.983972911$$

$$p_0^{(2)} = p_0^{(1)} - \frac{\left(p_1^{(1)} - p_0^{(1)}\right)^2}{p_2^{(1)} - 2p_1^{(1)} + p_0^{(1)}} = 1.873464043$$
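Steffensen's method as described in this section can be sketched in Python as follows. The function name `steffensen`, the returned list of iterates $p_0^{(k)}$, and the stopping rule are illustrative assumptions; a zero denominator ends the iteration, as noted above. The example uses $g(x) = 1 + (\sin x)^2$ with starting value $1$.

```python
import math

def steffensen(g, p0, tol=1e-10, max_iter=50):
    """Return the list [p0^(0), p0^(1), p0^(2), ...] of Steffensen iterates."""
    iterates = [p0]
    for _ in range(max_iter):
        p1 = g(p0)
        p2 = g(p1)
        denom = p2 - 2 * p1 + p0
        if denom == 0:                 # Delta^2 vanished: stop with what we have
            break
        p = p0 - (p1 - p0) ** 2 / denom
        iterates.append(p)
        if abs(p - p0) < tol:
            break
        p0 = p                         # restart the fixed-point iteration from p
    return iterates

g = lambda x: 1 + math.sin(x) ** 2
iterates = steffensen(g, 1.0)
for k, value in enumerate(iterates):
    print(k, value)
```

Each Steffensen step costs two evaluations of $g$, compared with one per step for plain fixed-point iteration.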


4.3 Zeros of Polynomials

A polynomial of degree $n$ has the form

$$P(x) = a_n x^n + a_{n-1} x^{n-1} + \dots + a_1 x + a_0 \qquad (4.14)$$

where the $a_i$'s, the coefficients of $P$, are constants and $a_n \ne 0$.

Fundamental Theorem of Algebra: If $P(x)$ is a polynomial of degree $n \ge 1$, then $P(x) = 0$ has at least one root (possibly complex).

If $P(x)$ is a polynomial of degree $n \ge 1$, then there exist unique constants $x_1, x_2, \dots, x_k$, possibly complex, and unique positive integers $m_1, m_2, \dots, m_k$, such that

$$\sum_{i=1}^{k} m_i = n \quad \text{and} \quad P(x) = a_n (x - x_1)^{m_1} (x - x_2)^{m_2} \dots (x - x_k)^{m_k}$$

where $m_i$ is the multiplicity of the zero $x_i$.

Let $P(x)$ and $Q(x)$ be polynomials of degree at most $n$. If $x_1, x_2, \dots, x_k$, with $k > n$, are distinct numbers with $P(x_i) = Q(x_i)$ for $i = 1, 2, \dots, k$, then $P(x) = Q(x)$ for all values of $x$.

4.4 Horner's Method

To use Newton's method to locate approximate zeros of a polynomial $P(x)$, we need to evaluate $P(x)$ and $P'(x)$ at specified values. To compute them efficiently, we evaluate them in a nested manner. Horner's method incorporates this nesting technique and requires only $n$ multiplications and $n$ additions to evaluate an arbitrary $n$th-degree polynomial.

Synthetic Division Algorithm and the Remainder Theorem: A polynomial of degree $n$ can be written as

$$P_n(x) = a_n x^n + a_{n-1} x^{n-1} + \dots + a_1 x + a_0 \qquad (4.15)$$

Divide this polynomial $P_n(x)$ by $(x - x_1)$, giving a reduced polynomial $Q_{n-1}(x)$ of degree $n - 1$ and a remainder $R$:

$$P_n(x) = (x - x_1)\, Q_{n-1}(x) + R \qquad (4.16)$$

We can see that $P_n(x_1) = R$. If we differentiate $P_n(x)$ we get

$$P_n'(x) = (x - x_1)\, Q_{n-1}'(x) + Q_{n-1}(x) \qquad (4.17)$$

thus,

$$P_n'(x_1) = Q_{n-1}(x_1) \qquad (4.18)$$

We evaluate $Q_{n-1}(x_1)$ by a second division whose remainder equals $Q_{n-1}(x_1)$, and so on. Now we can write

$$\begin{aligned} P_n(x) &= a_n x^n + a_{n-1} x^{n-1} + \dots + a_1 x + a_0 \\ &= (x - x_1)\, Q_{n-1}(x) + R \\ &= (x - x_1)(b_{n-1} x^{n-1} + b_{n-2} x^{n-2} + \dots + b_1 x + b_0) + R \\ &= b_{n-1} x^n + b_{n-2} x^{n-1} + \dots + b_1 x^2 + b_0 x - \left( b_{n-1} x_1 x^{n-1} + b_{n-2} x_1 x^{n-2} + \dots + b_1 x_1 x + b_0 x_1 \right) + R \end{aligned}$$

We collect the terms:

$$P_n(x) = b_{n-1} x^n + [b_{n-2} - x_1 b_{n-1}]\, x^{n-1} + [b_{n-3} - x_1 b_{n-2}]\, x^{n-2} + \dots + [b_0 - x_1 b_1]\, x + [R - x_1 b_0]$$

By comparison we get:

$$\begin{aligned} b_{n-1} &= a_n \\ b_{n-2} &= a_{n-1} + x_1 b_{n-1} \\ &\;\;\vdots \\ b_i &= a_{i+1} + x_1 b_{i+1} \\ &\;\;\vdots \\ b_0 &= a_1 + x_1 b_1 \end{aligned}$$

So the remainder can be evaluated from

$$R = a_0 + x_1 b_0 \qquad (4.19)$$

Horner's Method: Let $P(x) = a_n x^n + a_{n-1} x^{n-1} + \dots + a_1 x + a_0$. If $b_n = a_n$ and

$$b_k = a_k + b_{k+1} x_0, \quad \text{for } k = n-1, n-2, \dots, 1, 0 \qquad (4.20)$$

then $b_0 = P(x_0)$. Since $k$ runs from $n - 1$ to $0$, only $n$ multiplications and $n$ additions are needed to get $P(x_0)$. Moreover, if

$$Q(x) = b_n x^{n-1} + b_{n-1} x^{n-2} + \dots + b_2 x + b_1 \qquad (4.21)$$

then

$$P(x) = (x - x_0)\, Q(x) + b_0 \qquad (4.22)$$

The derivative of $P(x)$ is given by

$$P'(x) = Q(x) + (x - x_0)\, Q'(x) \qquad (4.23)$$

Thus

$$P'(x_0) = Q(x_0) \qquad (4.24)$$

Example: We want to evaluate $P(x) = 2x^4 - 3x^2 + 3x - 4$ at $x_0 = -2$ using Horner's method. We start by:

| Coeff. of x⁴ | Coeff. of x³ | Coeff. of x² | Coeff. of x¹ | Coeff. of x⁰ |
|---|---|---|---|---|
| a4 = 2 | a3 = 0 | a2 = −3 | a1 = 3 | a0 = −4 |
| b4 = a4 | b3 = a3 + b4(−2) | b2 = a2 + b3(−2) | b1 = a1 + b2(−2) | b0 = a0 + b1(−2) |
| b4 = 2 | b3 = −4 | b2 = 5 | b1 = −7 | b0 = 10 |

Therefore, $P(-2) = 10$ and $P(x) = (x + 2)(2x^3 - 4x^2 + 5x - 7) + 10$.

Example: Find an approximation to one of the zeros of $P(x) = 2x^4 - 3x^2 + 3x - 4$ using Newton's method and synthetic division to evaluate $P(x_n)$ and $P'(x_n)$ for each iterate $x_n$. At $x_0 = -2$ we use $b_n = a_n$ and $b_k = a_k + b_{k+1} x_0$ for $k = n - 1$ down to $k = 0$:

| x0 = −2 | 2 | 0 | −3 | 3 | −4 |
|---|---|---|---|---|---|
| b_{k+1} x0 | | (2)(−2) = −4 | (−4)(−2) = 8 | (5)(−2) = −10 | (−7)(−2) = 14 |
| b_k | 2 | 0 − 4 = −4 | −3 + 8 = 5 | 3 − 10 = −7 | −4 + 14 = 10 |

Using the theorem $P'(x_0) = Q(x_0)$ we get

| x0 = −2 | 2 | −4 | 5 | −7 |
|---|---|---|---|---|
| product with x0 | | (2)(−2) = −4 | (−8)(−2) = 16 | (21)(−2) = −42 |
| result | 2 | −4 − 4 = −8 | 5 + 16 = 21 | −7 − 42 = −49 |

and

$$x_1 = x_0 - \frac{P(x_0)}{Q(x_0)} = -2 - \frac{10}{-49} \approx -1.796$$

Repeating the procedure for $x_1 = -1.796$, we get $P(x_1) = 1.742$ and $P'(x_1) = -32.565$, so $x_2 \approx -1.7425$. In a similar manner we get $x_3 = -1.73897$. An actual zero to five decimal places is $-1.73896$.
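The two synthetic divisions used above can be carried out in a single loop. The following Python sketch (the function name `horner` and the coefficient ordering, highest degree first, are illustrative assumptions) returns both $P(x_0)$ and $P'(x_0) = Q(x_0)$ and reproduces the first Newton step of the example.

```python
def horner(coeffs, x0):
    """coeffs = [a_n, ..., a_1, a_0]; returns (P(x0), P'(x0))."""
    p = coeffs[0]          # b_n = a_n
    dp = 0.0               # running value of Q(x0), which equals P'(x0) at the end
    for a in coeffs[1:]:
        dp = dp * x0 + p   # second synthetic division, using the previous b_{k+1}
        p = p * x0 + a     # b_k = a_k + b_{k+1} x0, equation (4.20)
    return p, dp

# P(x) = 2x^4 - 3x^2 + 3x - 4 at x0 = -2
coeffs = [2, 0, -3, 3, -4]
value, slope = horner(coeffs, -2.0)
x1 = -2.0 - value / slope   # one Newton step
print(value, slope, x1)
```

Updating `dp` before `p` in each pass is what lets both divisions share one loop: `dp` always consumes the coefficient $b_{k+1}$ produced in the previous pass.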

4.5 Deflation

If the $N$th iterate, $x_N$, in Newton's method is an approximate zero of the polynomial $P(x)$, then

$$P(x) = (x - x_N)\, Q(x) + b_0 = (x - x_N)\, Q(x) + P(x_N) \approx (x - x_N)\, Q(x) \qquad (4.25)$$

so $x - x_N$ is an approximate factor of $P(x)$. Letting $\hat{x}_1 = x_N$ be the approximate zero of $P$ and $Q_1(x) = Q(x)$ be the approximate factor gives

$$P(x) \approx (x - \hat{x}_1)\, Q_1(x) \qquad (4.26)$$

We can find a second approximate zero of $P$ by applying Newton's method to $Q_1(x)$. If $P(x)$ is of degree $n$, we can apply the procedure repeatedly to find $\hat{x}_2$ and $Q_2(x)$, ..., $\hat{x}_{n-2}$ and $Q_{n-2}(x)$. After finding $(n - 2)$ roots we get a quadratic factor, which we can solve to get the last two approximate roots. This procedure is called deflation.

The accuracy difficulty with deflation is due to the fact that, when obtaining the approximate zeros of $P(x)$, Newton's method is used on the reduced polynomial $Q_k(x)$, that is, the polynomial with

$$P(x) \approx (x - \hat{x}_1)(x - \hat{x}_2) \dots (x - \hat{x}_k)\, Q_k(x) \qquad (4.27)$$

An approximate zero $\hat{x}_{k+1}$ of $Q_k(x)$ will generally not approximate a root of $P(x) = 0$ as well as it does a root of $Q_k(x) = 0$. To eliminate this difficulty, we can use the reduced equations to find approximations $\hat{x}_i$, and then improve them by applying Newton's method to the original polynomial $P(x)$.

One problem with applying the Secant, False Position, or Newton's methods to polynomials is the possibility of complex roots. If the initial approximation is real, all subsequent approximations will also be real. To overcome this problem we start with a complex initial approximation.

4.6 Müller's method

The Secant method uses two initial approximations $p_0$ and $p_1$ to get $p_2$, which is the x-intercept of the line joining the two points $(p_0, f(p_0))$ and $(p_1, f(p_1))$. Müller's method uses three initial approximations, $p_0$, $p_1$, $p_2$, and determines the next approximation $p_3$ by considering the intersection of the x-axis with the parabola through $(p_0, f(p_0))$, $(p_1, f(p_1))$, and $(p_2, f(p_2))$. The parabola takes the form

$$P(x) = a (x - p_2)^2 + b (x - p_2) + c \qquad (4.28)$$

The parameters $a$, $b$, and $c$ are determined by the three conditions $P(p_i) = f(p_i)$, $i = 0, 1, 2$; in particular $c = f(p_2)$. To determine $p_3$, a zero of $P(x)$, we apply the quadratic formula in the form

$$p_3 - p_2 = \frac{-2c}{b \pm \sqrt{b^2 - 4ac}} \qquad (4.29)$$

This form has no problem with subtracting nearly equal numbers (see Example 5, Section 1.2). The formula gives two roots. In Müller's method, the sign is chosen to agree with the sign of $b$, so

$$p_3 = p_2 - \frac{2c}{b + \operatorname{sign}(b) \sqrt{b^2 - 4ac}} \qquad (4.30)$$

Once $p_3$ is determined, we apply the procedure to $p_1$, $p_2$, and $p_3$ to get $p_4$, and so on. The method involves a square root, which means that complex roots can be found using Müller's method.
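Müller's method can be sketched in Python as follows. The function name `muller` and the stopping test are illustrative assumptions, and the divided-difference formulas used for $a$ and $b$ are one standard way of fitting the parabola (4.28); they are not taken from these notes. Complex arithmetic is used throughout, and the sign in (4.30) is chosen by taking the denominator of larger magnitude, which agrees with the sign of $b$ when $b$ is real. The test polynomial $x^2 + 5x + 8$ has no real roots.

```python
import cmath

def muller(f, p0, p1, p2, tol=1e-10, max_iter=50):
    """Mueller's method; may return a complex root even from real starting values."""
    for _ in range(max_iter):
        h1, h2 = p1 - p0, p2 - p1
        d1 = (f(p1) - f(p0)) / h1          # first divided differences
        d2 = (f(p2) - f(p1)) / h2
        a = (d2 - d1) / (h2 + h1)          # coefficients of the parabola (4.28)
        b = d2 + h2 * a
        c = f(p2)
        disc = cmath.sqrt(b * b - 4 * a * c)
        # Take the denominator of larger magnitude to avoid cancellation.
        denom = b + disc if abs(b + disc) >= abs(b - disc) else b - disc
        h = -2 * c / denom                 # p3 - p2, as in equation (4.29)
        p = p2 + h
        if abs(h) < tol:
            return p
        p0, p1, p2 = p1, p2, p
    raise RuntimeError(f"method failed after {max_iter} iterations")

root = muller(lambda x: x * x + 5 * x + 8, 0, 1, 2)
print(root)
```

Because the test function is itself a parabola, the first interpolating parabola already coincides with it, so the iteration settles on one of the roots $-2.5 \pm 1.32288i$ almost immediately.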

4.7

Exercises

We want to ﬁnd the approximation to 10−4 of all real zeros of the following polynomial using Newton’s method. P (x) = x3 − 2x2 − 5 40

Solution: We first apply Descartes's rule of signs. The rule states that the number $n_p$ of positive zeros of a polynomial $P(x)$ is less than or equal to the number $v$ of variations in sign of the coefficients of $P(x)$. Moreover, the difference $v - n_p$ is a nonnegative even integer. For our example, $v = 1$, so there is at most one positive root. Moreover, $1 - n_p \ge 0$ must be even, which implies that $n_p = 1$. Therefore there is exactly one positive root. Changing $x \to -x$, we find

$$P(-x) = -x^3 - 2x^2 - 5$$

There are no variations in sign, $v = 0$, so there is no negative root. Our conclusion is that there is only one real root, and it is positive. We then apply Newton's method starting with $p_0 = 2$:

$$f(x) = x^3 - 2x^2 - 5, \qquad f'(x) = 3x^2 - 4x, \qquad p_0 = 2,$$

$$p_{n+1} = p_n - \frac{f(p_n)}{f'(p_n)}, \qquad n \ge 0$$

| $n$ | $p_n$ | $\lvert p_n - p_{n-1} \rvert$ |
|---|---|---|
| 1 | 3.250000000 | 1.250000000 |
| 2 | 2.811036789 | 0.438963211 |
| 3 | 2.697989503 | 0.113047286 |
| 4 | 2.690677153 | 0.007312350 |
| 5 | 2.690647448 | 0.000029705 |
| 6 | 2.690647448 | 0.000000001 |

We want to find, to within $10^{-5}$, all the zeros of the polynomial

$$f(x) = x^4 + 5x^3 - 9x^2 - 85x - 136$$

by first finding the real zeros using Newton's method and then reducing to polynomials of lower degree to determine any complex zeros. According to Descartes's rule of signs:

1. For positive zeros, the number of variations in sign is 1. Thus, there is exactly one positive zero.
2. For negative zeros, the number of variations in sign of $f(-x)$ is 3. Thus, there are one or three negative zeros.

Applying Newton's method with $p_0 = 0$, we get

| $n$ | $p_n$ | $\lvert p_n - p_{n-1} \rvert$ |
|---|---|---|
| 1 | -1.600000000 | 1.600000000 |
| 2 | -2.681394805 | 1.081394805 |
| 3 | -5.595348023 | 2.913953218 |
| 4 | -4.842605061 | 0.752742962 |
| 5 | -4.377210956 | 0.465394105 |
| 6 | -4.167343093 | 0.209867863 |
| 7 | -4.124721017 | 0.042622076 |
| 8 | -4.123107873 | 0.001613144 |
| 9 | -4.123105624 | 0.000002249 |

After rounding, the negative zero is $-4.123106$.

We now use Horner's method (synthetic division by $x + 4.123105624$) to get the reduced polynomial. We get

$$b_4 = 1, \quad b_3 = 0.876894376, \quad b_2 = -12.61552813, \quad b_1 = -32.98484500, \quad b_0 = 0.$$

The polynomial is now reduced to

$$Q_1(x) = x^3 + 0.876894376x^2 - 12.61552813x - 32.98484500$$

We use Newton's method to solve $Q_1(x) = 0$ and find, to within $10^{-5}$, the root $4.123106$. We use Horner's method again to get the reduced polynomial:

$$b_3 = 1, \quad b_2 = 5, \quad b_1 = 8, \quad b_0 = 0.$$

Thus, the reduced polynomial is

$$Q_2(x) = x^2 + 5x + 8$$

This gives two complex roots, $-2.5 \pm 1.32288i$.
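The Newton and deflation steps of this exercise can be sketched in Python. The helper names `horner` and `newton_poly` are hypothetical, and the starting value $5.0$ for the deflated cubic is an assumption (the notes do not state which starting value was used for $Q_1$).

```python
import cmath

def horner(coeffs, x0):
    """Synthetic division of a polynomial by (x - x0).
    coeffs lists a_n, ..., a_0; returns (P(x0), quotient coefficients)."""
    b = [coeffs[0]]
    for a in coeffs[1:]:
        b.append(a + b[-1] * x0)
    return b[-1], b[:-1]

def newton_poly(coeffs, p0, tol=1e-10, max_iter=100):
    """Newton's method for a polynomial, using Horner's method for P and P'."""
    p = p0
    for _ in range(max_iter):
        value, quotient = horner(coeffs, p)
        slope, _ = horner(quotient, p)      # P'(p) equals Q(p)
        p_new = p - value / slope
        if abs(p_new - p) < tol:
            return p_new
        p = p_new
    raise RuntimeError("Newton's method did not converge")

coeffs = [1, 5, -9, -85, -136]
r1 = newton_poly(coeffs, 0.0)               # negative real zero
_, q1 = horner(coeffs, r1)                  # deflated cubic Q1
r2 = newton_poly(q1, 5.0)                   # positive real zero (assumed start)
_, q2 = horner(q1, r2)                      # deflated quadratic Q2
disc = cmath.sqrt(q2[1] ** 2 - 4 * q2[0] * q2[2])
z1, z2 = (-q2[1] + disc) / (2 * q2[0]), (-q2[1] - disc) / (2 * q2[0])
```

Because each deflation uses an approximate zero, the coefficients of `q2` are only approximately 1, 5, 8; in practice the zeros found from deflated polynomials are often refined by Newton's method on the original polynomial.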


**Chapter 5 Interpolation and Polynomial Approximation**

Consider data with two columns, x and y. We plot the data and try to fit it with a function y = f(x) that passes through the data points. This is what we call "interpolation". We will study the case in which the data are fitted by a polynomial.

5.1

**Weierstrass Approximation Theorem**

Theorem: Suppose that $f$ is defined and continuous on $[a, b]$. For each $\epsilon > 0$, there exists a polynomial $P(x)$ with the property that

$$|f(x) - P(x)| < \epsilon \quad \text{for all } x \in [a, b].$$

The proof of this theorem can be found in most elementary textbooks on real analysis.

Taylor polynomials are used mainly to approximate a function near a specified point. A good interpolating polynomial needs to provide a relatively accurate approximation over an entire interval, so a Taylor polynomial is not always appropriate for interpolation. For example, approximating $f(x) = 1/x$ at $x = 3$ using Taylor polynomials expanded about $x = 1$ leads to inaccurate results: the exact value is $1/3$, but the values $P_n(3)$ oscillate with growing magnitude.

| $n$ | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|---|
| $P_n(3)$ | 1 | -1 | 3 | -5 | 11 | -21 | 43 | -85 |
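The table can be reproduced with a few lines of Python. This is a small sketch: the function name `taylor_inverse` is hypothetical, and it uses the fact that the $k$-th Taylor coefficient of $1/x$ about $c$ is $(-1)^k / c^{k+1}$.

```python
def taylor_inverse(x, n, center=1.0):
    """n-th Taylor polynomial of f(x) = 1/x about `center`, evaluated at x."""
    return sum((-1) ** k * (x - center) ** k / center ** (k + 1)
               for k in range(n + 1))

values = [taylor_inverse(3.0, n) for n in range(8)]
print(values)
```

The partial sums are $\sum_{k=0}^{n} (-2)^k$, which diverge because $x = 3$ lies outside the interval of convergence $(0, 2)$ of the series about $x = 1$.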

5.2

Lagrange Polynomial

In this section we find approximating polynomials that are determined simply by specifying certain points in the plane through which they must pass. Let us determine a polynomial of degree one that passes through the distinct points $(x_0, f(x_0))$ and $(x_1, f(x_1))$. We define the functions

$$L_0(x) = \frac{x - x_1}{x_0 - x_1} \tag{5.1}$$

$$L_1(x) = \frac{x - x_0}{x_1 - x_0} \tag{5.2}$$

and define

$$P(x) = L_0(x) f(x_0) + L_1(x) f(x_1) \tag{5.3}$$

It is clear that the polynomial $P(x)$ coincides with $f(x)$ at $x_0$ and $x_1$, and it is the unique linear function passing through $(x_0, f(x_0))$ and $(x_1, f(x_1))$. To generalize this concept of linear interpolation, consider the construction of a polynomial of degree at most $n$ that passes through the $n + 1$ points $(x_0, f(x_0)), (x_1, f(x_1)), \ldots, (x_n, f(x_n))$.

Theorem: If $x_0, x_1, \ldots, x_n$ are $n + 1$ distinct numbers and $f$ is a function whose values are given at these numbers, then a unique polynomial $P(x)$ of degree at most $n$ exists with

$$f(x_k) = P(x_k), \qquad k = 0, 1, \ldots, n \tag{5.4}$$

This polynomial is given by

$$P(x) = \sum_{i=0}^{n} f(x_i) L_{n,i}(x) \tag{5.5}$$

with

$$L_{n,k}(x) = \prod_{\substack{i=0 \\ i \ne k}}^{n} \frac{x - x_i}{x_k - x_i} \tag{5.6}$$
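Equations (5.5) and (5.6) translate directly into code. The following is a minimal sketch; the function names and the sample data (taken from $f(x) = x^2 + x + 1$) are illustrative assumptions.

```python
def lagrange_basis(xs, k, x):
    """Evaluate L_{n,k}(x) for the nodes xs, as in eq. (5.6)."""
    result = 1.0
    for i, xi in enumerate(xs):
        if i != k:
            result *= (x - xi) / (xs[k] - xi)
    return result

def lagrange_interpolate(xs, ys, x):
    """Evaluate the interpolating polynomial of eq. (5.5) at x."""
    return sum(y * lagrange_basis(xs, k, x) for k, y in enumerate(ys))

# Three points on f(x) = x^2 + x + 1; the quadratic interpolant reproduces f.
xs = [0.0, 1.0, 2.0]
ys = [1.0, 3.0, 7.0]
value = lagrange_interpolate(xs, ys, 1.5)
```

Each evaluation costs $O(n^2)$ operations, which is acceptable for small $n$ but is one reason the recursive schemes of the next sections are preferred when many nodes are added one at a time.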

Proof:

Theorem: Suppose $x_0, x_1, \ldots, x_n$ are $n + 1$ distinct numbers in $[a, b]$ and $f \in C^{n+1}[a, b]$. Then

$$f(x) = P(x) + \frac{f^{(n+1)}(\xi(x))}{(n + 1)!}(x - x_0)(x - x_1) \cdots (x - x_n) \tag{5.7}$$

where $P(x)$ is the interpolating polynomial given by eq. (5.5) and $\xi(x) \in (a, b)$.

Proof:

Definition: Let $f$ be a function defined at $x_0, x_1, \ldots, x_n$, and suppose that $m_1, m_2, \ldots, m_k$ are $k$ distinct integers, with $0 \le m_i \le n$ for each $i$. The Lagrange polynomial that agrees with $f(x)$ at the $k$ points $x_{m_1}, x_{m_2}, \ldots, x_{m_k}$ is denoted by $P_{m_1, m_2, \ldots, m_k}(x)$.

Theorem: Let $f$ be defined at $x_0, x_1, \ldots, x_k$, and let $x_j \ne x_i$ be two numbers from this set. Then

$$P(x) = \frac{(x - x_j) P_{0,1,\ldots,j-1,j+1,\ldots,k}(x) - (x - x_i) P_{0,1,\ldots,i-1,i+1,\ldots,k}(x)}{x_i - x_j} \tag{5.8}$$

describes the $k$th Lagrange polynomial that interpolates $f$ at the $k + 1$ points $x_0, x_1, \ldots, x_k$.

Proof:

5.3

Neville’s Method

This theorem implies that the interpolating polynomials can be generated recursively. To avoid multiple subscripts, let $Q_{i,j}$, for $0 \le j \le i$, denote the interpolating polynomial of degree $j$ on the $(j + 1)$ numbers $x_{i-j}, x_{i-j+1}, \ldots, x_{i-1}, x_i$; that is,

$$Q_{i,j} = P_{i-j,\,i-j+1,\,\ldots,\,i-1,\,i} \tag{5.9}$$

$$\begin{array}{llllll}
x_0 & P_0 = Q_{0,0} & & & & \\
x_1 & P_1 = Q_{1,0} & P_{0,1} = Q_{1,1} & & & \\
x_2 & P_2 = Q_{2,0} & P_{1,2} = Q_{2,1} & P_{0,1,2} = Q_{2,2} & & \\
x_3 & P_3 = Q_{3,0} & P_{2,3} = Q_{3,1} & P_{1,2,3} = Q_{3,2} & P_{0,1,2,3} = Q_{3,3} & \\
x_4 & P_4 = Q_{4,0} & P_{3,4} = Q_{4,1} & P_{2,3,4} = Q_{4,2} & P_{1,2,3,4} = Q_{4,3} & P_{0,1,2,3,4} = Q_{4,4}
\end{array} \tag{5.10}$$

Neville's iterated interpolation is given by

$$Q_{i,j} = \frac{(x - x_{i-j}) Q_{i,j-1} - (x - x_i) Q_{i-1,j-1}}{x_i - x_{i-j}} \tag{5.11}$$

$$Q_{0,0} = f(x_0), \quad Q_{1,0} = f(x_1), \quad \ldots, \quad Q_{n,0} = f(x_n) \tag{5.12}$$

Example: Suppose the function $f$ is given at the following values:

| | $x$ | $f(x)$ |
|---|---|---|
| $x_0$ | 1.0 | 0.7651977 |
| $x_1$ | 1.3 | 0.6200860 |
| $x_2$ | 1.6 | 0.4554022 |
| $x_3$ | 1.9 | 0.2818186 |
| $x_4$ | 2.2 | 0.1103623 |

We want to approximate $f(1.5)$ using various interpolating polynomials at $x = 1.5$. Using Neville's method, eqs. (5.11) and (5.12), we can calculate the $Q_{i,j}$:

$$Q_{0,0} = P_0 = 0.7651977, \qquad Q_{1,0} = P_1 = 0.6200860,$$

$$Q_{1,1} = P_{0,1} = \frac{(x - x_0) Q_{1,0} - (x - x_1) Q_{0,0}}{x_1 - x_0} = 0.5233449$$

$$Q_{2,0} = P_2 = 0.4554022$$

$$Q_{2,1} = P_{1,2} = \frac{(x - x_1) Q_{2,0} - (x - x_2) Q_{1,0}}{x_2 - x_1} = 0.5102968$$

$$Q_{2,2} = P_{0,1,2} = \frac{(x - x_0) Q_{2,1} - (x - x_2) Q_{1,1}}{x_2 - x_0} = 0.5124715$$

Assignment: study Example 6 on page 113.
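The recursion (5.11) with the starting values (5.12) can be sketched as a short Python function. The name `neville` and the table layout `Q[i][j]` are choices made here for illustration.

```python
def neville(xs, ys, x):
    """Return the Neville table Q, with Q[i][j] as in eq. (5.11), evaluated at x."""
    n = len(xs)
    Q = [[0.0] * n for _ in range(n)]
    for i in range(n):
        Q[i][0] = ys[i]                      # eq. (5.12)
    for i in range(1, n):
        for j in range(1, i + 1):
            Q[i][j] = ((x - xs[i - j]) * Q[i][j - 1]
                       - (x - xs[i]) * Q[i - 1][j - 1]) / (xs[i] - xs[i - j])
    return Q

xs = [1.0, 1.3, 1.6, 1.9, 2.2]
ys = [0.7651977, 0.6200860, 0.4554022, 0.2818186, 0.1103623]
Q = neville(xs, ys, 1.5)
for row in Q:
    print(["%.7f" % v for v in row])
```

The entries `Q[1][1]`, `Q[2][1]`, and `Q[2][2]` correspond to the three values computed by hand above; the remaining rows complete the table (5.10) for this data.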

5.4

Newton Interpolating Polynomial

Suppose there is a known polynomial $P_{n-1}(x)$ that interpolates the data set $(x_i, y_i)$, $i = 0, 1, \ldots, n - 1$. When one more data point $(x_n, y_n)$, distinct from all the other data points, is added to the data set, we can construct a new polynomial $P_n(x)$ that interpolates the new data set. To do so, consider the polynomial

$$P_n(x) = P_{n-1}(x) + c_n \prod_{i=0}^{n-1} (x - x_i) \tag{5.13}$$

where $c_n$ is an unknown constant. For the case $n = 0$, we write

$$P_0(x) = y_0 \tag{5.14}$$

It is clear that at all the old points

$$P_n(x_i) = P_{n-1}(x_i), \qquad i = 0, \ldots, n - 1 \tag{5.15}$$

At the new point $(x_n, y_n)$ we have

$$P_n(x_n) = P_{n-1}(x_n) + c_n \prod_{i=0}^{n-1} (x_n - x_i) \tag{5.16}$$

which, since $P_n(x_n) = y_n$, leads to

$$c_n = \frac{P_n(x_n) - P_{n-1}(x_n)}{\prod_{i=0}^{n-1} (x_n - x_i)} = \frac{y_n - P_{n-1}(x_n)}{\prod_{i=0}^{n-1} (x_n - x_i)} \tag{5.17}$$

So, for any given data set $(x_i, y_i)$, $i = 0, 1, \ldots, n$, we can obtain the interpolating polynomial by a recursive process that starts from $P_0(x)$ and uses the above construction to get $P_1(x), P_2(x), \ldots, P_n(x)$. We demonstrate this process through the following example:

| $i$ | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| $x_i$ | 0 | 0.25 | 0.5 | 0.75 | 1 |
| $y_i$ | -1 | 0 | 1 | 0 | 1 |

First step: for $i = 0$ we have

$$P_0(x) = y_0 = -1$$

Second step: adding the point $(x_1, y_1) = (0.25, 0)$, we get

$$P_1(x) = P_0(x) + c_1 \prod_{i=0}^{0} (x - x_i) = -1 + c_1 (x - x_0) = -1 + c_1 x$$

The constant $c_1$ is given by

$$c_1 = \frac{y_1 - P_0(x_1)}{\prod_{i=0}^{0} (x_1 - x_i)} = \frac{0 - (-1)}{0.25 - 0} = 4$$

Thus,

$$P_1(x) = -1 + 4x$$

Third step: adding the point $(x_2, y_2) = (0.5, 1)$,

$$P_2(x) = P_1(x) + c_2 \prod_{i=0}^{1} (x - x_i) = (-1 + 4x) + c_2 (x - x_0)(x - x_1) = -1 + 4x + c_2\, x (x - 0.25)$$

The constant $c_2$ is given by

$$c_2 = \frac{y_2 - P_1(x_2)}{\prod_{i=0}^{1} (x_2 - x_i)} = \frac{1 - (-1 + 4 \times 0.5)}{(0.5 - 0)(0.5 - 0.25)} = 0$$

Thus,

$$P_2(x) = -1 + 4x$$

Continuing the calculations, we find

$$c_3 = -\frac{64}{3}, \qquad c_4 = 64,$$

$$P_3(x) = -1 + 4x - \frac{64}{3}\, x \left(x - \frac{1}{4}\right)\left(x - \frac{1}{2}\right)$$

$$P_4(x) = -1 + 4x - \frac{64}{3}\, x \left(x - \frac{1}{4}\right)\left(x - \frac{1}{2}\right) + 64\, x \left(x - \frac{1}{4}\right)\left(x - \frac{1}{2}\right)\left(x - \frac{3}{4}\right)$$

Divided difference polynomial: Divided differences are a helpful tool for generating interpolation polynomials. The first-order divided difference of $f$ at $x = x_i$ is given by

$$f[x_i, x_{i+1}] = \frac{f(x_{i+1}) - f(x_i)}{x_{i+1} - x_i} \tag{5.18}$$

The second-order divided difference of $f$ at $x_i$ is given by

$$f[x_i, x_{i+1}, x_{i+2}] = \frac{f[x_{i+1}, x_{i+2}] - f[x_i, x_{i+1}]}{x_{i+2} - x_i} \tag{5.19}$$

We can generalize this to higher order:

$$f[x_0, x_1, \ldots, x_n] = \frac{f[x_1, \ldots, x_n] - f[x_0, \ldots, x_{n-1}]}{x_n - x_0} \tag{5.20}$$

With these definitions we get the interpolation polynomial as

$$P_n(x) = f[x_0] + f[x_0, x_1](x - x_0) + \ldots + f[x_0, \ldots, x_n] \prod_{i=0}^{n-1} (x - x_i) = f[x_0] + \sum_{i=1}^{n} f[x_0, \ldots, x_i] \prod_{j=0}^{i-1} (x - x_j) \tag{5.21}$$

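Example: Let us find the interpolation polynomial for the data

| $i$ | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| $x_i$ | 0 | 0.25 | 0.5 | 0.75 | 1 |
| $y_i$ | -1 | 0 | 1 | 0 | 1 |

Each first divided difference is computed from (5.18), for example $f[x_0, x_1] = \dfrac{f[x_1] - f[x_0]}{x_1 - x_0} = 4$. The full table is (DD stands for divided difference):

| $i$ | $x_i$ | $f[x_i]$ | 1st DD | 2nd DD | 3rd DD | 4th DD |
|---|---|---|---|---|---|---|
| 0 | 0.00 | -1 | | | | |
| 1 | 0.25 | 0 | $f[x_0, x_1] = 4$ | | | |
| 2 | 0.50 | 1 | $f[x_1, x_2] = 4$ | $f[x_0, x_1, x_2] = 0$ | | |
| 3 | 0.75 | 0 | $f[x_2, x_3] = -4$ | $f[x_1, x_2, x_3] = -16$ | $f[x_0, x_1, x_2, x_3] = -\frac{64}{3}$ | |
| 4 | 1.00 | 1 | $f[x_3, x_4] = 4$ | $f[x_2, x_3, x_4] = 16$ | $f[x_1, x_2, x_3, x_4] = \frac{128}{3}$ | $f[x_0, x_1, x_2, x_3, x_4] = 64$ |

So, the polynomial is

$$\begin{aligned}
P_4(x) &= -1 + 4(x - 0) + 0\,(x - 0)\left(x - \tfrac{1}{4}\right) - \frac{64}{3}(x - 0)\left(x - \tfrac{1}{4}\right)\left(x - \tfrac{1}{2}\right) + 64\,(x - 0)\left(x - \tfrac{1}{4}\right)\left(x - \tfrac{1}{2}\right)\left(x - \tfrac{3}{4}\right) \\
&= -1 + 4x - \frac{64}{3}\, x \left(x - \tfrac{1}{4}\right)\left(x - \tfrac{1}{2}\right) + 64\, x \left(x - \tfrac{1}{4}\right)\left(x - \tfrac{1}{2}\right)\left(x - \tfrac{3}{4}\right)
\end{aligned}$$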
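The divided-difference table and the Newton form (5.21) can be checked with exact rational arithmetic. This is a sketch with hypothetical function names; it overwrites a single list in place so that entry $i$ ends up holding $f[x_0, \ldots, x_i]$.

```python
from fractions import Fraction

def divided_differences(xs, ys):
    """Return [f[x0], f[x0,x1], ..., f[x0,...,xn]] using eq. (5.20)."""
    coef = list(ys)
    n = len(xs)
    for order in range(1, n):
        # go downwards so that coef[i - 1] still holds the previous order
        for i in range(n - 1, order - 1, -1):
            coef[i] = (coef[i] - coef[i - 1]) / (xs[i] - xs[i - order])
    return coef

def newton_eval(xs, coef, x):
    """Evaluate the Newton form (5.21) by nested multiplication."""
    result = coef[-1]
    for k in range(len(coef) - 2, -1, -1):
        result = result * (x - xs[k]) + coef[k]
    return result

xs = [Fraction(k, 4) for k in range(5)]
ys = [Fraction(v) for v in (-1, 0, 1, 0, 1)]
coef = divided_differences(xs, ys)
print(coef)
```

The resulting coefficients are the constants $y_0, c_1, c_2, c_3, c_4$ of the incremental construction, which is why both approaches give the same polynomial.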

5.5

Polynomial Forms

Power form: A polynomial of degree $n$ can be written in power form as

$$P_n(x) = a_0 + a_1 x + \ldots + a_n x^n = \sum_{k=0}^{n} a_k x^k \tag{5.22}$$

We follow the book Introduction to Numerical Analysis, Alastair Wood, Addison-Wesley.

This form is convenient for analysis but may lead to loss of significance. For example, consider

$$P_1(x) = \frac{18001}{3} - x$$

This polynomial takes the value $1/3$ at $x = 6000$ and $-2/3$ at $x = 6001$. On a finite-precision machine with 5 decimal digits the coefficients are stored as $a_0^* = 6000.3$ and $a_1^* = -1$, and hence

$$P_1(6000) = 6000.3 - 6000 = 0.3$$

$$P_1(6001) = 6000.3 - 6001 = -0.7$$

Only one digit of the exact value is recovered, yet the coefficients are accurate to 5 digits! Four significant digits have been lost due to the subtraction of two nearly equal large numbers.

Shifted power form: The drawback seen in the previous example can be alleviated by changing the origin of $x$ to a non-zero value $c$ and writing the polynomial (5.22) as

$$P_n(x) = \sum_{k=0}^{n} a_k x^k = b_0 + b_1 (x - c) + \ldots + b_n (x - c)^n = \sum_{k=0}^{n} b_k (x - c)^k \tag{5.23}$$

This form is called the shifted power form; $c$ is the centre and the $b_k$ are constant coefficients. The previous example can be written as

$$P_1(x) = \frac{18001}{3} - x = \frac{1}{3} - (x - 6000)$$

So, we get

$$P_1(6000) = \frac{1}{3} - (6000 - 6000) = \frac{1}{3} = 0.33333$$

$$P_1(6001) = \frac{1}{3} - (6001 - 6000) = \frac{1}{3} - 1 = -0.66667$$
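The 5-digit machine of this example can be imitated with Python's `decimal` module. This is only an illustration: setting the context precision to 5 significant digits is an assumption about how the hypothetical machine rounds.

```python
from decimal import Decimal, localcontext

with localcontext() as ctx:
    ctx.prec = 5                                   # five significant decimal digits
    a0 = Decimal(18001) / Decimal(3)               # stored as 6000.3
    power = [a0 - Decimal(x) for x in (6000, 6001)]
    b0 = Decimal(1) / Decimal(3)                   # stored as 0.33333
    shifted = [b0 - (Decimal(x) - Decimal(6000)) for x in (6000, 6001)]

print(power)     # power form: one correct digit
print(shifted)   # shifted power form: five correct digits
```

The arithmetic is the same in both cases; only the choice of centre changes which digits of the coefficients survive the rounding.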

These values are accurate to 5 digits and there is no loss of significance. We can find the coefficients $b_k$ by using the Taylor polynomial at $x = c$. This gives

$$P_n(x) = \sum_{k=0}^{n} \frac{P_n^{(k)}(c)}{k!} (x - c)^k \tag{5.24}$$

where $P_n^{(k)}(c)$ is the $k$-th derivative of $P_n(x)$ at $x = c$. Thus,

$$b_k = \frac{P_n^{(k)}(c)}{k!}$$

Newton form: We can generalize equation (5.24) by choosing $n$ centres $c_1, \ldots, c_n$. We get

$$P_n(x) = d_0 + d_1 (x - c_1) + d_2 (x - c_1)(x - c_2) + \ldots + d_n (x - c_1)(x - c_2) \cdots (x - c_n) = d_0 + \sum_{k=1}^{n} d_k \prod_{j=1}^{k} (x - c_j) \tag{5.25}$$

This is the Newton form, which is particularly useful for polynomial interpolation. If $c_1 = c_2 = \ldots = c_n = c$ we get the shifted power form, and for $c_1 = c_2 = \ldots = c_n = 0$ we recover the power form.

5.6

Spline Interpolation

For a large data set, a single polynomial satisfying all the data $(x_i, f(x_i))$ will be of high degree. In general a polynomial of high degree oscillates, which may not be acceptable behavior. One solution to this problem is to use interpolation in a piecewise manner. The simplest approach uses linear interpolants. Given $n + 1$ items of data in ascending order by $x$, the data $(x_i, f(x_i))$ and $(x_{i+1}, f(x_{i+1}))$ are interpolated by a straight line. A piecewise linear interpolant is called a linear spline, $S_1$. The linear spline suffers from a lack of smoothness: continuity is assured, but there is a jump in the first derivative at each data point. The solution is to use splines having greater smoothness. We concentrate on the cubic spline.

Definition¹: For the data $(x_i, f(x_i))$, $i = 0, \ldots, n$, $S_3$ is a cubic spline on $[x_0, x_n]$ if:

(1) $S_3$ restricted to $[x_{i-1}, x_i]$ is a polynomial of degree at most 3;

(2) $S_3 \in C^2[x_0, x_n]$;

(3) if $s_{3,i}$ and $s_{3,i+1}$ are the cubic interpolants on adjacent sub-intervals, then the following conditions hold:

$$s_{3,i}(x_i) = s_{3,i+1}(x_i) = f(x_i)$$

$$s'_{3,i}(x_i) = s'_{3,i+1}(x_i)$$

$$s''_{3,i}(x_i) = s''_{3,i+1}(x_i)$$

A consequence of this definition is that the individual interpolants can no longer be constructed in isolation. The piecewise interpolants $s_{3,1}, \ldots, s_{3,n}$ are interdependent through the derivative continuity conditions.

¹ Alastair Wood, Introduction to Numerical Analysis

On the interval $[x_{i-1}, x_i]$, for $i = 1, 2, \ldots, n$, we have

$$s_{3,i}(x) = f(x_{i-1}) + a_i (x - x_{i-1}) + b_i (x - x_{i-1})^2 + c_i (x - x_{i-1})^3$$

There are $3n$ constants to be determined: $a_i$, $b_i$, $c_i$, $i = 1, \ldots, n$. Continuity requires that

$$s_{3,i}(x_i) = f(x_{i-1}) + a_i (x_i - x_{i-1}) + b_i (x_i - x_{i-1})^2 + c_i (x_i - x_{i-1})^3$$

$$s_{3,i+1}(x_i) = f(x_i) + a_{i+1} (x_i - x_i) + b_{i+1} (x_i - x_i)^2 + c_{i+1} (x_i - x_i)^3 = f(x_i)$$

agree, which leads to

$$f(x_{i-1}) + a_i h_i + b_i h_i^2 + c_i h_i^3 = f(x_i) \tag{5.26}$$

where $h_i = x_i - x_{i-1}$, for $i = 1, \ldots, n$. For the first derivative we have

$$s'_{3,i}(x_i) = a_i + 2 b_i (x_i - x_{i-1}) + 3 c_i (x_i - x_{i-1})^2$$

$$s'_{3,i+1}(x_i) = a_{i+1} + 2 b_{i+1} (x_i - x_i) + 3 c_{i+1} (x_i - x_i)^2$$

which leads to

$$a_i + 2 b_i h_i + 3 c_i h_i^2 = a_{i+1} \tag{5.27}$$

for $i = 1, \ldots, n - 1$. For the second derivative we get

$$s''_{3,i}(x_i) = 2 b_i + 6 c_i (x_i - x_{i-1})$$

$$s''_{3,i+1}(x_i) = 2 b_{i+1} + 6 c_{i+1} (x_i - x_i)$$

which leads to

$$b_i + 3 c_i h_i = b_{i+1} \tag{5.28}$$

for $i = 1, \ldots, n - 1$. The natural cubic spline is defined by

$$s''_{3,1}(x_0) = s''_{3,n}(x_n) = 0$$

in other words,

$$b_1 = 0 \quad \text{and} \quad b_n + 3 c_n h_n = 0$$

We have:

- $3n$ constants to be determined
- $n$ equations from continuity
- $n - 1$ equations from the 1st derivative
- $n - 1$ equations from the 2nd derivative
- 2 equations from the natural cubic spline conditions

This gives $3n$ equations for $3n$ unknowns. Therefore, we can find all the constants.

5.7

Parametric Curves

In some cases a curve cannot be expressed as a function of one coordinate variable $y$ in terms of the other variable $x$. A straightforward way to represent such curves is to use a parametric technique. We choose a parameter $t$ on the interval $[t_0, t_n]$, with $t_0 < t_1 < \ldots < t_n$, and construct approximation functions with

$$x_i = x(t_i) \quad \text{and} \quad y_i = y(t_i) \tag{5.29}$$

Consider the curve given by Figure 3.14 on page 158 of the textbook. From the curve we can extract the following table (please refer to page 158):

| $i$ | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| $t_i$ | 0 | 0.25 | 0.5 | 0.75 | 1 |
| $x_i$ | -1 | 0 | 1 | 0 | 1 |
| $y_i$ | 0 | 1 | 0.5 | 0 | -1 |

Example: The first part of the graph is given by

x = [10, 6, 2, 1, 2, 6, 10]

y = [3, 1, 1, 4, 6, 7, 6]

t = [0, 1/6, 1/3, 1/2, 2/3, 5/6, 1]

The second part of the graph is given by

x = [2, 6, 10, 10, 13]

y = [10, 12, 10, 1, 1]

t = [0, 1/4, 2/4, 3/4, 1]

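The cubic polynomials for the first graph are:

$$f_x(u) = \begin{cases}
10 - \frac{297}{13} u - \frac{540}{13} u^3, & u < 1/6 \\
\frac{115}{13} - \frac{27}{13} u - \frac{1620}{13} u^2 + \frac{2700}{13} u^3, & u < 1/3 \\
\frac{283}{13} - \frac{1539}{13} u + \frac{2916}{13} u^2 - \frac{1836}{13} u^3, & u < 1/2 \\
-\frac{176}{13} + \frac{1215}{13} u - \frac{2592}{13} u^2 + \frac{1836}{13} u^3, & u < 2/3 \\
\frac{1168}{13} - \frac{4833}{13} u + \frac{6480}{13} u^2 - \frac{2700}{13} u^3, & u < 5/6 \\
-\frac{707}{13} + \frac{1917}{13} u - \frac{1620}{13} u^2 + \frac{540}{13} u^3, & \text{otherwise}
\end{cases}$$

$$f_y(u) = \begin{cases}
3 - \frac{1797}{130} u + \frac{4266}{65} u^3, & u < 1/6 \\
\frac{367}{130} - \frac{1383}{130} u - \frac{1242}{65} u^2 + \frac{1350}{13} u^3, & u < 1/3 \\
\frac{2143}{130} - \frac{17367}{130} u + \frac{22734}{65} u^2 - \frac{17226}{65} u^3, & u < 1/2 \\
-\frac{1831}{65} + \frac{17463}{130} u - \frac{12096}{65} u^2 + \frac{5994}{65} u^3, & u < 2/3 \\
\frac{389}{13} - \frac{16521}{130} u + \frac{13392}{65} u^2 - \frac{1350}{13} u^3, & u < 5/6 \\
-\frac{2397}{26} + \frac{40629}{130} u - \frac{20898}{65} u^2 + \frac{6966}{65} u^3, & \text{otherwise}
\end{cases}$$

The cubic polynomials for the second graph are:

$$f_x(u) = \begin{cases}
2 + \frac{205}{14} u + \frac{152}{7} u^3, & u < 1/4 \\
\frac{113}{28} - \frac{137}{14} u + \frac{684}{7} u^2 - \frac{760}{7} u^3, & u < 1/2 \\
-\frac{815}{28} + \frac{2647}{14} u - 300 u^2 + \frac{1096}{7} u^3, & u < 3/4 \\
\frac{929}{14} - \frac{2699}{14} u + \frac{1464}{7} u^2 - \frac{488}{7} u^3, & \text{otherwise}
\end{cases}$$

$$f_y(u) = \begin{cases}
10 + \frac{135}{14} u - \frac{184}{7} u^3, & u < 1/4 \\
\frac{323}{28} - \frac{123}{14} u + \frac{516}{7} u^2 - \frac{872}{7} u^3, & u < 1/2 \\
-\frac{1277}{28} + \frac{4677}{14} u - 612 u^2 + \frac{2328}{7} u^3, & u < 3/4 \\
\frac{2399}{14} - \frac{7473}{14} u + \frac{3816}{7} u^2 - \frac{1272}{7} u^3, & \text{otherwise}
\end{cases}$$

Merging the two graphs gives the complete curve.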

Example: Given the data points

| $i$ | 0 | 1 | 2 |
|---|---|---|---|
| $x_i$ | 4 | 9 | 16 |
| $f_i$ | 2 | 3 | 4 |

we want to estimate the function value $f$ at $x = 7$ using a natural cubic spline. The cubic equations are:

$$f_1(x) = a_0 + a_1 (x - x_0) + a_2 (x - x_0)^2 + a_3 (x - x_0)^3$$

$$f_2(x) = b_0 + b_1 (x - x_1) + b_2 (x - x_1)^2 + b_3 (x - x_1)^3$$

which become

$$f_1(x) = 2 + a_1 (x - 4) + a_2 (x - 4)^2 + a_3 (x - 4)^3$$

$$f_2(x) = 3 + b_1 (x - 9) + b_2 (x - 9)^2 + b_3 (x - 9)^3$$

Continuity leads to

$$3 = 2 + a_1 (9 - 4) + a_2 (9 - 4)^2 + a_3 (9 - 4)^3 \;\Longrightarrow\; 1 = 5 a_1 + 25 a_2 + 125 a_3$$

$$4 = 3 + b_1 (16 - 9) + b_2 (16 - 9)^2 + b_3 (16 - 9)^3 \;\Longrightarrow\; 1 = 7 b_1 + 49 b_2 + 343 b_3$$

The first derivatives are

$$f_1'(x) = a_1 + 2 a_2 (x - 4) + 3 a_3 (x - 4)^2$$

$$f_2'(x) = b_1 + 2 b_2 (x - 9) + 3 b_3 (x - 9)^2$$

The continuity of the first derivative at $x = 9$ leads to

$$b_1 = a_1 + 10 a_2 + 75 a_3$$

The second derivatives are

$$f_1''(x) = 2 a_2 + 6 a_3 (x - 4)$$

$$f_2''(x) = 2 b_2 + 6 b_3 (x - 9)$$

The continuity of the second derivative at $x = 9$ leads to

$$b_2 = a_2 + 15 a_3$$

The natural spline conditions give

$$0 = a_2, \qquad 0 = b_2 + 21 b_3$$

Now we collect our equations:

$$\begin{aligned}
1 &= 5 a_1 + 25 a_2 + 125 a_3 \\
1 &= 7 b_1 + 49 b_2 + 343 b_3 \\
b_1 &= a_1 + 10 a_2 + 75 a_3 \\
b_2 &= a_2 + 15 a_3 \\
0 &= a_2 \\
0 &= b_2 + 21 b_3
\end{aligned}$$

Substituting $a_2 = 0$, the system reduces to

$$\begin{aligned}
1 &= 5 a_1 + 125 a_3 \\
1 &= 7 b_1 + 49 b_2 + 343 b_3 \\
b_1 &= a_1 + 75 a_3 \\
b_2 &= 15 a_3 \\
0 &= b_2 + 21 b_3
\end{aligned}$$

Eliminating $b_1$, $b_2$, and $b_3 = -\dfrac{15 a_3}{21}$, we get

$$1 = 5 a_1 + 125 a_3$$

and

$$1 = 7 (a_1 + 75 a_3) + 49 (15 a_3) + 343 \left(-\frac{15 a_3}{21}\right) = 7 a_1 + 1015 a_3$$

The solutions of these two equations are

$$a_1 = \frac{89}{420}, \qquad a_3 = \frac{-1}{2100}$$

Because we are interested in $f(7)$ and $7 \in [4, 9]$, we can use $f_1$. The cubic polynomial is

$$f_1(x) = 2 + \frac{89}{420} (x - 4) - \frac{1}{2100} (x - 4)^3 \tag{5.36}$$

Therefore

$$f_1(7) = \frac{459}{175}$$

We can find the other polynomial in the same way:

$$f_2(x) = 3 + \frac{37}{210} (x - 9) - \frac{1}{140} (x - 9)^2 + \frac{1}{2940} (x - 9)^3$$
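The worked example can be cross-checked with a small natural cubic spline routine in exact arithmetic. This sketch does not solve the $3n$ equations (5.26) to (5.28) directly; instead it uses the equivalent, standard formulation in terms of the second derivatives $M_i$ at the nodes (a tridiagonal system), then converts to the coefficients $a_i$, $b_i$, $c_i$ used in these notes. The function names are hypothetical.

```python
from fractions import Fraction

def natural_cubic_spline(xs, fs):
    """Return one tuple (x_left, f_left, a, b, c) per subinterval, so that
    s(x) = f_left + a*(x - x_left) + b*(x - x_left)**2 + c*(x - x_left)**3."""
    xs = [Fraction(x) for x in xs]
    fs = [Fraction(f) for f in fs]
    n = len(xs) - 1                          # number of subintervals
    h = [xs[i + 1] - xs[i] for i in range(n)]
    M = [Fraction(0)] * (n + 1)              # natural spline: M[0] = M[n] = 0
    if n > 1:
        diag = [2 * (h[k] + h[k + 1]) for k in range(n - 1)]
        rhs = [6 * ((fs[k + 2] - fs[k + 1]) / h[k + 1] - (fs[k + 1] - fs[k]) / h[k])
               for k in range(n - 1)]
        for k in range(1, n - 1):            # forward elimination
            w = h[k] / diag[k - 1]
            diag[k] -= w * h[k]
            rhs[k] -= w * rhs[k - 1]
        M[n - 1] = rhs[n - 2] / diag[n - 2]
        for k in range(n - 3, -1, -1):       # back substitution
            M[k + 1] = (rhs[k] - h[k + 1] * M[k + 2]) / diag[k]
    pieces = []
    for i in range(n):
        a = (fs[i + 1] - fs[i]) / h[i] - h[i] * (2 * M[i] + M[i + 1]) / 6
        b = M[i] / 2
        c = (M[i + 1] - M[i]) / (6 * h[i])
        pieces.append((xs[i], fs[i], a, b, c))
    return pieces

def spline_eval(pieces, x):
    """Evaluate the spline at x using the last piece whose left end is <= x."""
    piece = pieces[0]
    for p in pieces:
        if p[0] <= x:
            piece = p
    x_left, f_left, a, b, c = piece
    t = x - x_left
    return f_left + a * t + b * t ** 2 + c * t ** 3

pieces = natural_cubic_spline([4, 9, 16], [2, 3, 4])
estimate = spline_eval(pieces, Fraction(7))
print(pieces, estimate)
```

The data are samples of $\sqrt{x}$, so the estimate can be compared with $\sqrt{7} \approx 2.6458$.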

**Chapter 6 Numerical Differentiation and Integral**

6.1 Numerical Diﬀerentiation

Consider a small increment $\Delta x = h$ in $x$. According to Taylor's theorem, we have

$$f(x + h) = f(x) + h f'(x) + \frac{h^2}{2} f''(\xi) \tag{6.1}$$

where $\xi$ is a real number between $x$ and $x + h$. We can get

$$f'(x) = \frac{f(x + h) - f(x)}{h} - \frac{h}{2} f''(\xi) \tag{6.2}$$

We can get the same formula using the linear Lagrange polynomial. Using the two points $x_0$ and $x_1 = x_0 + h$, we get

$$\begin{aligned}
f(x) &= f(x_0) \frac{x - x_1}{x_0 - x_1} + f(x_1) \frac{x - x_0}{x_1 - x_0} + \frac{f''(\xi)}{2!} (x - x_0)(x - x_1) \\
&= -f(x_0) \frac{x - x_1}{h} + f(x_1) \frac{x - x_0}{h} + \frac{f''(\xi)}{2!} (x - x_0)(x - x_1)
\end{aligned}$$

Now we calculate the derivative and evaluate it at $x = x_0$:

$$f'(x) = \frac{-f(x_0)}{h} + \frac{f(x_1)}{h} + \frac{f''(\xi)}{2!} (2x - x_0 - x_1) + \frac{1}{2!} \left[f''(\xi)\right]' (x - x_0)(x - x_1)$$

$$f'(x_0) = \frac{f(x_0 + h) - f(x_0)}{h} - \frac{h}{2} f''(\xi)$$

This formula is known as the forward-difference formula if $h > 0$ and the backward-difference formula if $h < 0$. It is also called the two-point formula. For $h > 0$ the backward-difference formula can be written as

$$f'(x) = \frac{f(x) - f(x - h)}{h} + \frac{h}{2} f''(\xi) \tag{6.3}$$

(n+1)-point formula: To obtain general derivative approximation formulas, suppose that $\{x_0, x_1, \ldots, x_n\}$ are $(n + 1)$ distinct numbers in some interval $I$ and that $f \in C^{n+1}(I)$. We can use Lagrange polynomials:

$$f(x) = \sum_{k=0}^{n} f(x_k) L_k(x) + \frac{f^{(n+1)}(\xi(x))}{(n + 1)!} \prod_{k=0}^{n} (x - x_k) \tag{6.4}$$

Then, differentiating $f(x)$ and evaluating at a node $x_j$, we get

$$f'(x_j) = \sum_{k=0}^{n} f(x_k) L_k'(x_j) + \frac{f^{(n+1)}(\xi_j)}{(n + 1)!} \prod_{\substack{k=0 \\ k \ne j}}^{n} (x_j - x_k) \tag{6.5}$$

In general, using more evaluation points produces greater accuracy, although the number of function evaluations and the growth of round-off error discourage this somewhat.

Three-point formulas:

$$f'(x_0) = \frac{1}{2h} \left[-3 f(x_0) + 4 f(x_0 + h) - f(x_0 + 2h)\right] + \frac{h^2}{3} f^{(3)}(\xi_0) \tag{6.6}$$

$$f'(x_0) = \frac{1}{2h} \left[f(x_0 + h) - f(x_0 - h)\right] - \frac{h^2}{6} f^{(3)}(\xi_1) \tag{6.7}$$

where $\xi_0$ lies between $x_0$ and $x_0 + 2h$ and $\xi_1$ lies between $x_0 - h$ and $x_0 + h$. Although the errors in both formulas are $O(h^2)$, the error in the last equation is approximately half the error in the first equation. This is because it uses data from both sides of $x_0$.

Five-point formula:

$$f'(x_0) = \frac{1}{12h} \left[f(x_0 - 2h) - 8 f(x_0 - h) + 8 f(x_0 + h) - f(x_0 + 2h)\right] + \frac{h^4}{30} f^{(5)}(\xi) \tag{6.8}$$

where $\xi$ lies between $x_0 - 2h$ and $x_0 + 2h$. The other five-point formula is useful for end-point approximations. It is given by

$$f'(x_0) = \frac{1}{12h} \left[-25 f(x_0) + 48 f(x_0 + h) - 36 f(x_0 + 2h) + 16 f(x_0 + 3h) - 3 f(x_0 + 4h)\right] + \frac{h^4}{5} f^{(5)}(\xi) \tag{6.9}$$

where $\xi$ lies between $x_0$ and $x_0 + 4h$. Left-endpoint approximations are found using $h > 0$ and right-endpoint approximations using $h < 0$.

Example: Let $f(x) = x \exp(x)$. The values of $f$ at different $x$ are given by

| $x$ | $f(x)$ |
|---|---|
| 1.8 | 10.889365 |
| 1.9 | 12.703199 |
| 2.0 | 14.778112 |
| 2.1 | 17.148957 |
| 2.2 | 19.855030 |

Since $f'(x) = (x + 1) \exp(x)$, we have $f'(2) = 22.167168$. Let us approximate $f'(2)$ using the three-point formulas. With the end-point formula

$$f'(x_0) \approx \frac{1}{2h} \left[-3 f(x_0) + 4 f(x_0 + h) - f(x_0 + 2h)\right]$$

we get

$$f'(2) \approx \frac{1}{2 \times 0.1} \left[-3 f(2.0) + 4 f(2.1) - f(2.2)\right] \approx 22.032310, \qquad h = 0.1$$

$$f'(2) \approx \frac{1}{-2 \times 0.1} \left[-3 f(2.0) + 4 f(1.9) - f(1.8)\right] \approx 22.054525, \qquad h = -0.1$$

With the midpoint formula

$$f'(x_0) \approx \frac{1}{2h} \left[f(x_0 + h) - f(x_0 - h)\right]$$

we get

$$f'(2) \approx \frac{1}{2 \times 0.1} \left[f(2.1) - f(1.9)\right] \approx 22.228790, \qquad h = 0.1$$

The errors are $22.167168 - 22.032310 = 0.134858$, $22.167168 - 22.054525 = 0.112643$, and $22.167168 - 22.228790 = -0.61622 \times 10^{-1}$, respectively.

It is also possible to find approximations to higher derivatives of a function using only tabulated values of the function at various points. Expand a function $f$ in a third Taylor polynomial about $x_0$ and evaluate at $x_0 \pm h$. Then

$$f(x) = f(x_0) + (x - x_0) f'(x_0) + \frac{(x - x_0)^2}{2} f''(x_0) + \frac{(x - x_0)^3}{6} f^{(3)}(x_0) + \frac{(x - x_0)^4}{24} f^{(4)}(\xi) \tag{6.10}$$

$$f(x_0 + h) = f(x_0) + h f'(x_0) + \frac{h^2}{2} f''(x_0) + \frac{h^3}{6} f^{(3)}(x_0) + \frac{h^4}{24} f^{(4)}(\xi_+) \tag{6.11}$$

$$f(x_0 - h) = f(x_0) - h f'(x_0) + \frac{h^2}{2} f''(x_0) - \frac{h^3}{6} f^{(3)}(x_0) + \frac{h^4}{24} f^{(4)}(\xi_-) \tag{6.12}$$

where $x_0 - h < \xi_- < x_0 < \xi_+ < x_0 + h$. Adding the last two equations, we get

$$f(x_0 + h) + f(x_0 - h) = 2 f(x_0) + h^2 f''(x_0) + \frac{h^4}{24} \left[f^{(4)}(\xi_+) + f^{(4)}(\xi_-)\right] \tag{6.13}$$

Solving this last equation, we find that

$$f''(x_0) = \frac{1}{h^2} \left[f(x_0 - h) - 2 f(x_0) + f(x_0 + h)\right] - \frac{h^2}{24} \left[f^{(4)}(\xi_+) + f^{(4)}(\xi_-)\right] \tag{6.14}$$

Suppose that $f^{(4)}$ is continuous on $[x_0 - h, x_0 + h]$. Since $\frac{1}{2} \left(f^{(4)}(\xi_+) + f^{(4)}(\xi_-)\right)$ is between $f^{(4)}(\xi_+)$ and $f^{(4)}(\xi_-)$, the Intermediate Value Theorem implies that a number $\xi$ exists between $\xi_+$ and $\xi_-$, and hence in $(x_0 - h, x_0 + h)$, with

$$f^{(4)}(\xi) = \frac{f^{(4)}(\xi_+) + f^{(4)}(\xi_-)}{2}$$

This leads to

$$f''(x_0) = \frac{1}{h^2} \left[f(x_0 - h) - 2 f(x_0) + f(x_0 + h)\right] - \frac{h^2}{12} f^{(4)}(\xi) \tag{6.15}$$

where $\xi \in (x_0 - h, x_0 + h)$.
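The three-point formulas and the second-derivative formula (6.15), without their error terms, can be tried on $f(x) = x e^x$. The function names below are hypothetical. Because the code evaluates $f$ in double precision instead of using the six-decimal table, the results agree with the hand computation only to about four decimal places.

```python
import math

def endpoint3(f, x0, h):
    """Three-point end-point formula (6.6) without the error term."""
    return (-3 * f(x0) + 4 * f(x0 + h) - f(x0 + 2 * h)) / (2 * h)

def midpoint3(f, x0, h):
    """Three-point midpoint formula (6.7) without the error term."""
    return (f(x0 + h) - f(x0 - h)) / (2 * h)

def second_derivative(f, x0, h):
    """Central formula (6.15) for f'' without the error term."""
    return (f(x0 - h) - 2 * f(x0) + f(x0 + h)) / h ** 2

f = lambda x: x * math.exp(x)
d_forward = endpoint3(f, 2.0, 0.1)
d_backward = endpoint3(f, 2.0, -0.1)
d_central = midpoint3(f, 2.0, 0.1)
d2 = second_derivative(f, 2.0, 0.1)      # exact value is 4 e^2
print(d_forward, d_backward, d_central, d2)
```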

It is important to pay attention to round-off error when approximating derivatives. Let us illustrate this with an example, the two-point formula

$$f'(x) \approx \frac{f_1 - f_0}{h} = \frac{f(x + h) - f(x)}{h} \tag{6.16}$$

If we denote the round-off errors in $f_0$ and $f_1$ by $e_0$ and $e_1$, respectively, then

$$f'(x) \approx \frac{f_1 + e_1 - f_0 - e_0}{h} = \frac{f_1 - f_0}{h} + \frac{e_1 - e_0}{h} \tag{6.17}$$

If the errors are of magnitude $e$, we can at worst get

$$\text{Round-off error} = \frac{2e}{h} \tag{6.18}$$

We know that the truncation error is $-\dfrac{h}{2} f''(\xi)$, which is bounded in magnitude by $\dfrac{M h}{2}$, where $M = \max |f''(\xi)|$ for $x \le \xi \le x + h$. Thus the bound for the total error is

$$\text{Total error} = \text{Round-off error} + \text{Truncation error} \tag{6.19}$$

$$E(h) = \frac{2e}{h} + \frac{M h}{2} \tag{6.20}$$

Note that when the step size $h$ is increased, the truncation error increases while the round-off error decreases. The optimal value of $h$ is found from

$$E'(h) = -\frac{2e}{h^2} + \frac{M}{2} = 0 \;\Longrightarrow\; h = 2 \sqrt{\frac{e}{M}} \tag{6.21}$$

Three-point formula:

$$f'(x_0) = \frac{1}{2h} \left[f(x_0 + h) - f(x_0 - h)\right] - \frac{h^2}{6} f^{(3)}(\xi) \tag{6.22}$$

Suppose that in evaluating $f(x_0 \pm h)$ we encounter round-off errors $e(x_0 \pm h)$. Then our computed values $\tilde{f}(x_0 \pm h)$ are related to the true values $f(x_0 \pm h)$ by

$$f(x_0 \pm h) = \tilde{f}(x_0 \pm h) + e(x_0 \pm h) \tag{6.23}$$

The total error,

$$f'(x_0) - \frac{\tilde{f}(x_0 + h) - \tilde{f}(x_0 - h)}{2h} = \frac{e(x_0 + h) - e(x_0 - h)}{2h} - \frac{h^2}{6} f^{(3)}(\xi) \tag{6.24}$$

is due in part to the round-off error and in part to the truncation error. If we assume that the round-off errors are bounded by $\epsilon > 0$ and that the third derivative is bounded by $M > 0$, then the total error is bounded by

$$\left| f'(x_0) - \frac{\tilde{f}(x_0 + h) - \tilde{f}(x_0 - h)}{2h} \right| = \left| \frac{e(x_0 + h) - e(x_0 - h)}{2h} - \frac{h^2}{6} f^{(3)}(\xi) \right| \tag{6.25}$$

$$\le \frac{\epsilon}{h} + \frac{h^2}{6} M \tag{6.26}$$

To reduce the truncation error, $h^2 M / 6$, we must reduce $h$. But if $h$ is reduced, the round-off error $\epsilon / h$ grows. In practice, then, it is seldom advantageous to let $h$ be too small, since the round-off error will dominate the calculations. The minimum of the error bound is attained at the optimal value of $h$ given by

$$h = \sqrt[3]{\frac{3 \epsilon}{M}} \tag{6.27}$$
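The trade-off behind (6.27) can be observed numerically. The sketch below applies the three-point midpoint formula to $f(x) = \sin x$ at $x_0 = 0.9$; the choice of function and point, and the value $\epsilon = 2^{-53}$ used as a rough bound on the rounding error of each function value in double precision, are assumptions made for illustration.

```python
import math

def midpoint_error(h, x0=0.9):
    """Absolute error of the three-point midpoint formula for f = sin at x0."""
    approx = (math.sin(x0 + h) - math.sin(x0 - h)) / (2 * h)
    return abs(approx - math.cos(x0))

eps = 2.0 ** -53                  # assumed bound on the error of each sin value
M = 1.0                           # |f'''(x)| = |cos x| <= 1
h_opt = (3 * eps / M) ** (1 / 3)  # eq. (6.27)

errors = {h: midpoint_error(h) for h in (1e-1, 1e-3, h_opt, 1e-9, 1e-12)}
for h, err in errors.items():
    print(f"h = {h:.3e}   error = {err:.3e}")
```

For large $h$ the truncation term $h^2 M / 6$ dominates, for very small $h$ the round-off term $\epsilon / h$ dominates, and the error is smallest for step sizes near `h_opt`.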

Numerical differentiation is unstable: the small values of $h$ needed to reduce the truncation error also cause the round-off error to grow.

Example: Show that the differentiation rule

$$f'(x_0) \approx a_0 f_0 + a_1 f_1 + a_2 f_2$$

is exact for all polynomials of degree at most 2 if, and only if, it is exact for $f(x) = 1$, $f(x) = x$, and $f(x) = x^2$, and find the values of $a_0$, $a_1$, and $a_2$.

Solution: According to the formula

$$f(x) = \sum_{k=0}^{n} f(x_k) L_k(x) + \frac{f^{(n+1)}(\xi(x))}{(n + 1)!} \prod_{k=0}^{n} (x - x_k) \tag{6.28}$$

with

$$L_k(x) = \prod_{\substack{i=0 \\ i \ne k}}^{n} \frac{x - x_i}{x_k - x_i} \tag{6.29}$$

the derivative of $f(x)$ at a node $x_j$ is

$$f'(x_j) = \sum_{k=0}^{n} f(x_k) L_k'(x_j) + \frac{f^{(n+1)}(\xi_j)}{(n + 1)!} \prod_{\substack{k=0 \\ k \ne j}}^{n} (x_j - x_k) \tag{6.30}$$

Therefore, with $n = 2$ and $j = 0$, we get

$$f'(x_0) = f_0 L_0'(x_0) + f_1 L_1'(x_0) + f_2 L_2'(x_0) + \frac{f^{(3)}(\xi_0)}{6} \prod_{k=1}^{2} (x_0 - x_k) \tag{6.31}$$

so that $a_k = L_k'(x_0)$ for $k = 0, 1, 2$. We have to show two things:

- For $f(x) = 1, x, x^2$, the formula $f'(x_0) = a_0 f_0 + a_1 f_1 + a_2 f_2$ is exact, since $f^{(3)}(\xi_0) = 0$ for all $\xi_0$ and the error term in (6.31) vanishes.
- Conversely, if the formula $f'(x_0) = a_0 f_0 + a_1 f_1 + a_2 f_2$ is exact for $f$, then the error term in (6.31) must vanish, that is, $f^{(3)}(\xi_0) = 0$ for all $\xi_0$. This implies that $f(x) = \alpha + \beta x + \gamma x^2$ for some constants $\alpha$, $\beta$, $\gamma$, and in particular the formula is exact for $f(x) = 1, x, x^2$.

Exercises section 4.1: 1, 3, 5, 9, 13, 15, 19

6.2

Richardson’s Extrapolation

Richardson's extrapolation is used to generate high-accuracy results while using low-order formulas. Extrapolation can be applied whenever it is known that an approximation technique has an error term with a predictable form, like

$$M - N(h) = K_1 h + K_2 h^2 + \ldots, \tag{6.32}$$

for some collection of unknown constants $K_i$, where $N(h)$ approximates an unknown value $M$. In general $M - N(h) \approx K_1 h$, unless there is a large variation in magnitude among the constants $K_i$, so the approximation is $O(h)$. If $M$ can be written in the form

$$M = N(h) + \sum_{j=1}^{m-1} K_j h^j + O(h^m), \tag{6.33}$$

then for $j = 2, 3, \ldots, m$, we have an $O(h^j)$ approximation of the form

$$N_j(h) = N_{j-1}\!\left(\frac{h}{2}\right) + \frac{N_{j-1}(h/2) - N_{j-1}(h)}{2^{j-1} - 1} \tag{6.34}$$

These approximations are generated by rows, in the order indicated by the numbered entries in the following table:

$$\begin{array}{llll}
O(h) & O(h^2) & O(h^3) & O(h^4) \\
1:\; N_1(h) \equiv N(h) & & & \\
2:\; N_1(h/2) \equiv N(h/2) & 3:\; N_2(h) & & \\
4:\; N_1(h/4) \equiv N(h/4) & 5:\; N_2(h/2) & 6:\; N_3(h) & \\
7:\; N_1(h/8) \equiv N(h/8) & 8:\; N_2(h/4) & 9:\; N_3(h/2) & 10:\; N_4(h)
\end{array} \tag{6.35}$$

If the expansion of $M$ contains only even powers of $h$, then

$$N_j(h) = N_{j-1}\!\left(\frac{h}{2}\right) + \frac{N_{j-1}(h/2) - N_{j-1}(h)}{4^{j-1} - 1} \tag{6.36}$$

for $j = 2, 3, \ldots$, gives an $O(h^{2j})$ approximation.

Example: We want to approximate $f'(1.0)$ with $h = 0.4$, where $f(x) = \ln x$, using Richardson's extrapolation $N_3(h)$. We have

$$f'(x_0) = \frac{1}{2h} \left[f(x_0 + h) - f(x_0 - h)\right] - \frac{h^2}{6} f^{(3)}(x_0) - \frac{h^4}{120} f^{(5)}(\xi) - \ldots \tag{6.37}$$

In the case $h = 0.4$ and $x_0 = 1$, we can calculate $N_3(h)$ using

$$N_1(h) = \frac{1}{2h} \left[\ln(x_0 + h) - \ln(x_0 - h)\right]$$

$$N_1(0.4) = 1.059122326, \qquad N_1(0.2) = 1.013662770, \qquad N_1(0.1) = 1.003353478$$

We then use

$$N_j(h) = N_{j-1}\!\left(\frac{h}{2}\right) + \frac{N_{j-1}(h/2) - N_{j-1}(h)}{4^{j-1} - 1} \tag{6.38}$$

to get

$$N_2(0.4) = N_1(0.2) + \frac{N_1(0.2) - N_1(0.4)}{4^1 - 1} = 0.9985095847$$

$$N_2(0.2) = N_1(0.1) + \frac{N_1(0.1) - N_1(0.2)}{4^1 - 1} = 0.9999170473$$

$$N_3(0.4) = N_2(0.2) + \frac{N_2(0.2) - N_2(0.4)}{4^2 - 1} = 1.000010878$$
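The extrapolation table for this example can be generated with a short routine. This is a sketch for the even-power case (6.36) only; the function name `richardson` and the indexing convention `table[i][j]` $= N_{j+1}(h / 2^i)$ are choices made here.

```python
import math

def richardson(N1, h, levels):
    """Extrapolation table for a formula N1(h) whose error expansion contains
    only even powers of h; table[i][j] holds N_{j+1}(h / 2**i)."""
    table = [[N1(h / 2 ** i)] for i in range(levels)]
    for j in range(1, levels):
        for i in range(levels - j):
            coarse, fine = table[i][j - 1], table[i + 1][j - 1]
            table[i].append(fine + (fine - coarse) / (4 ** j - 1))
    return table

N1 = lambda h: (math.log(1 + h) - math.log(1 - h)) / (2 * h)
table = richardson(N1, 0.4, 3)
for row in table:
    print(["%.10f" % v for v in row])
```

The last entry of the first row is $N_3(0.4)$; the exact value is $f'(1) = 1$.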

Exercises section 4.2: 1, 3, 5, 11

Exercises 5 (a) and 7 (a), worked in Maple:

    > a1:=1.1;f1:=exp(2*a1);
    > a2:=1.2;f2:=exp(2*a2);
    > a3:=1.3;f3:=exp(2*a3);
    > a4:=1.4;f4:=exp(2*a4);
    a1 := 1.1
    f1 := 9.025013499
    a2 := 1.2
    f2 := 11.02317638
    a3 := 1.3
    f3 := 13.46373804
    a4 := 1.4
    f4 := 16.44464677

    > h:=0.1;
    > f1p:=1/2/h*(-3*f1+4*f2-f3);
    > f2p:=1/2/h*(f3-f1);
    > f3p:=1/2/h*(f4-f2);
    > f4p=-1/2/h*(-3*f4+4*f3-f2);
    h := 0.1
    f1p := 17.76963490
    f2p := 22.19362270
    f3p := 27.10735195
    f4p = 32.51082265

    > evalf(subs(x=1.3,h^2/3*diff(exp(2*x),x$3)));
    > evalf(subs(x=1.3,h^2/6*diff(exp(2*x),x$3)));
    > evalf(subs(x=1.4,h^2/6*diff(exp(2*x),x$3)));
    > evalf(subs(x=1.4,h^2/3*diff(exp(2*x),x$3)));
    0.3590330144
    0.1795165072
    0.2192619569
    0.4385239139

6.3

**Elements of Numerical Integration**

The basic method involved in approximating $\int_a^b f(x)\,dx$ is called numerical quadrature. It uses a sum $\sum_{i=0}^{n} a_i f(x_i)$ to approximate the integral. The methods of quadrature in this section are based on Lagrange interpolating polynomials.

6.3.1

Trapezoidal rule

To derive the Trapezoidal rule, we use the linear Lagrange polynomial which agrees with the function $f(x)$ at $x_0 = a$ and $x_1 = b$. The Trapezoidal rule is

$$\int_a^b f(x)\,dx = \frac{h}{2} \left[f(x_0) + f(x_1)\right] - \frac{h^3}{12} f''(\xi), \tag{6.39}$$

where $h = x_1 - x_0 = b - a$ and $\xi \in (a, b)$. We want to evaluate the integral

$$I = \int_1^2 (x^3 + 1)\,dx \tag{6.40}$$

using the trapezoidal rule. With $h = 1$ and $f''(x) = 6x$:

$$\begin{aligned}
\int_1^2 f(x)\,dx &= \frac{1}{2} \left[f(1) + f(2)\right] - \frac{1}{12} f''(\xi) \\
&= \frac{1}{2} \left[2 + 9\right] - \frac{1}{12}\, 6 \xi \\
&= \frac{11}{2} - \frac{1}{2} \xi
\end{aligned}$$

Therefore the trapezoidal rule gives 5.5, with a maximum error $E_{max}$ equal to

$$|E_{max}| = \max_{1 \le \xi \le 2} \left| \frac{1}{2} \xi \right| = 1$$

The exact value of the integral is

$$I = \int_1^2 (x^3 + 1)\,dx = \left[\frac{1}{4} x^4 + x\right]_1^2 = \frac{1}{4} 2^4 + 2 - \left(\frac{1}{4} 1^4 + 1\right) = \frac{19}{4} = 4.75$$

Thus, the absolute error is $|5.5 - 4.75| = 0.75$. Note that the error is less than the maximum error calculated from the formula.
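The example can be reproduced in a few lines. This is a sketch of the single-interval rule (6.39) without its error term; the variable names are illustrative.

```python
def trapezoid(f, a, b):
    """Trapezoidal rule (6.39) on [a, b], without the error term."""
    return (b - a) / 2 * (f(a) + f(b))

f = lambda x: x ** 3 + 1
approx = trapezoid(f, 1.0, 2.0)
exact = (2 ** 4 / 4 + 2) - (1 / 4 + 1)
# error bound h^3/12 * max|f''| on [1, 2], with f''(x) = 6x
bound = (2 - 1) ** 3 / 12 * 6 * 2
print(approx, exact, abs(approx - exact), bound)
```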

6.3.2

Simpson’s rule

Simpson's rule uses the second Lagrange polynomial with nodes at $x_0 = a$, $x_1 = a + h$, $x_2 = b$, where $h = (b - a)/2$. There are a few tricks needed to get the formula (please see Section 4.3 and Exercise 24 of Section 4.3):

$$\int_{x_0}^{x_2} f(x)\,dx = \frac{h}{3} \left[f(x_0) + 4 f(x_1) + f(x_2)\right] - \frac{h^5}{90} f^{(4)}(\xi). \tag{6.41}$$

We want to evaluate the integral

$$I = \int_{-1}^{1} e^x\,dx \tag{6.42}$$

using Simpson's rule. With $h = 1$:

$$\int_{-1}^{1} f(x)\,dx = \frac{1}{3} \left[f(-1) + 4 f(0) + f(1)\right] - \frac{1}{90} f^{(4)}(\xi) \tag{6.43}$$

we get

$$\int_{-1}^{1} e^x\,dx = \frac{1}{3} \left[e^{-1} + 4 + e\right] - \frac{1}{90} e^{\xi} = 2.36205 - \frac{1}{90} e^{\xi} \tag{6.44}$$

The maximum error is

$$|E_{max}| = \max_{-1 \le \xi \le 1} \left| \frac{1}{90} e^{\xi} \right| = \frac{1}{90} e = 0.03020$$

The exact value of the integral is

$$I = \int_{-1}^{1} e^x\,dx = e - \frac{1}{e} = 2.35040 \tag{6.45}$$

The absolute error is $|2.36205 - 2.35040| = 0.01165$, which is less than the maximum error 0.03020.
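The same check for Simpson's rule, again as a sketch of (6.41) without the error term:

```python
import math

def simpson(f, a, b):
    """Simpson's rule (6.41) on [a, b], without the error term."""
    h = (b - a) / 2
    return h / 3 * (f(a) + 4 * f(a + h) + f(b))

approx = simpson(math.exp, -1.0, 1.0)
exact = math.e - 1 / math.e
bound = math.e / 90               # h^5/90 * max|f''''| on [-1, 1], with h = 1
print(approx, exact, abs(approx - exact), bound)
```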

6.3.3

Degree of precision

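The degree of accuracy, or precision, of a quadrature formula is the largest positive integer $n$ such that the formula is exact for $x^k$, for each $k = 0, 1, \ldots, n$. What degree of precision does the following formula have?

$$\int_{-1}^{1} f(x)\,dx = f\!\left(-\frac{1}{\sqrt{3}}\right) + f\!\left(\frac{1}{\sqrt{3}}\right)$$

The integral is

$$\int_{-1}^{1} x^k\,dx = \frac{1 + (-1)^k}{1 + k}$$

and

$$\left(-\frac{1}{\sqrt{3}}\right)^k + \left(\frac{1}{\sqrt{3}}\right)^k = \left(\frac{1}{\sqrt{3}}\right)^k \left[1 + (-1)^k\right]$$

The formula has degree of precision $n$ if it is exact for $x^k$, $k = 0, \ldots, n$, that is, if

$$\frac{1 + (-1)^k}{1 + k} = \left(\frac{1}{\sqrt{3}}\right)^k \left[1 + (-1)^k\right]$$

This is true when $k$ is an odd number, since both sides vanish. For even $k$ it requires

$$\frac{1}{1 + k} = \left(\frac{1}{\sqrt{3}}\right)^k$$

which is true for $k = 0, 2$ but not for $k = 4$. Therefore, the formula is exact for $k = 0, 1, 2, 3$. Thus, the degree of precision of the formula is 3. We can conclude from this example that the formula is exact for all polynomials of degree at most 3.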
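The degree of precision can also be found by direct testing on the monomials $x^k$. The helper names below are hypothetical, and the tolerance is an assumption needed because $1/\sqrt{3}$ is not exactly representable in floating point.

```python
import math

def two_point_rule(f):
    """The quadrature formula f(-1/sqrt(3)) + f(1/sqrt(3)) on [-1, 1]."""
    r = 1 / math.sqrt(3)
    return f(-r) + f(r)

def exact_monomial_integral(k):
    """Integral of x**k over [-1, 1]."""
    return (1 + (-1) ** k) / (1 + k)

def degree_of_precision(rule, max_degree=10, tol=1e-12):
    """Largest n such that the rule integrates 1, x, ..., x**n exactly (up to tol)."""
    degree = -1
    for k in range(max_degree + 1):
        if abs(rule(lambda x, k=k: x ** k) - exact_monomial_integral(k)) > tol:
            break
        degree = k
    return degree

print(degree_of_precision(two_point_rule))
```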

6.3.4

Newton-Cotes Formula

There are two types of Newton-Cotes formulas, open and closed. The $(n+1)$-point closed Newton-Cotes formula uses nodes $x_i = x_0 + ih$, for $i = 0, 1, \ldots, n$, where $x_0 = a$, $x_n = b$, and $h = (b - a)/n$. It is called closed because the endpoints of the closed interval $[a, b]$ are included as nodes. The open Newton-Cotes formulas use nodes $x_i = x_0 + ih$, for each $i = 0, 1, \ldots, n$, where $h = (b - a)/(n + 2)$ and $x_0 = a + h$. This implies that $x_n = b - h$, so we label the endpoints by setting $x_{-1} = a$ and $x_{n+1} = b$. In both cases

$$\int_a^b f(x)\,dx \approx \sum_{i=0}^{n} a_i f(x_i), \tag{6.46}$$

where

$$a_i = \int_a^b L_i(x)\,dx \tag{6.47}$$

Theorem 4.2

Theorem 4.3

Exercises section 4.3:

6.4

Composite Numerical Integration

The Newton-Cotes formulas are generally unsuitable for use over large integration intervals. One way to overcome this problem is to split the large interval into subintervals and sum Newton-Cotes formulas over all these subintervals.

Composite Simpson's rule: Let $f \in C^4[a, b]$, $n$ be an even number, $h = (b - a)/n$, and $x_j = a + jh$, for each $j = 0, 1, \ldots, n$. There exists a $\mu \in (a, b)$ for which the composite Simpson's rule for $n$ subintervals can be written as

$$\int_a^b f(x)\,dx = \frac{h}{3}\left[f(a) + 2\sum_{j=1}^{(n/2)-1} f(x_{2j}) + 4\sum_{j=1}^{n/2} f(x_{2j-1}) + f(b)\right] - \frac{b-a}{180}\,h^4 f^{(4)}(\mu) \tag{6.48}$$

Composite Trapezoidal rule: Let $f \in C^2[a, b]$, $h = (b - a)/n$, and $x_j = a + jh$, for each $j = 0, 1, \ldots, n$. There exists a $\mu \in (a, b)$ for which the composite trapezoidal rule for $n$ subintervals can be written as

$$\int_a^b f(x)\,dx = \frac{h}{2}\left[f(a) + 2\sum_{j=1}^{n-1} f(x_j) + f(b)\right] - \frac{b-a}{12}\,h^2 f''(\mu) \tag{6.49}$$

Composite Midpoint rule: Let $f \in C^2[a, b]$, $n$ be even, $h = (b - a)/(n + 2)$, and $x_j = a + (j + 1)h$, for each $j = -1, 0, \ldots, n + 1$. There exists a $\mu \in (a, b)$ for which the composite Midpoint rule for $n + 2$ subintervals can be written as

$$\int_a^b f(x)\,dx = 2h\sum_{j=0}^{n/2} f(x_{2j}) + \frac{b-a}{6}\,h^2 f''(\mu) \tag{6.50}$$

An important property shared by all these composite integration techniques is stability with respect to round-off errors. Let us demonstrate this property for the Composite Simpson's rule with $n$ subintervals applied to a function $f$ on $[a, b]$. Assume that

$$f(x_i) = \tilde{f}(x_i) + e_i \tag{6.51}$$

where $e_i$ is the round-off error and $\tilde{f}(x_i)$ is an approximation to $f(x_i)$. From the Composite Simpson's rule

$$\int_a^b f(x)\,dx = \frac{h}{3}\left[f(a) + 2\sum_{j=1}^{(n/2)-1} f(x_{2j}) + 4\sum_{j=1}^{n/2} f(x_{2j-1}) + f(b)\right] - \frac{b-a}{180}\,h^4 f^{(4)}(\mu)$$

the accumulated round-off error, $e(h)$, is given by

$$e(h) = \left|\frac{h}{3}\left[e_0 + 2\sum_{j=1}^{(n/2)-1} e_{2j} + 4\sum_{j=1}^{n/2} e_{2j-1} + e_n\right]\right|$$

If the round-off errors are uniformly bounded by $\epsilon$, then

$$e(h) \le \frac{h\,\epsilon}{3}\left[1 + (n - 2) + 2n + 1\right] = nh\epsilon = (b - a)\epsilon$$

It is clear from the last equation that the bound is independent of $h$ and $n$. This means that, even though we may need to divide an interval into more parts to ensure accuracy, the increased computation that is required does not increase the round-off error. The procedure is therefore stable as $h$ approaches zero. Recall that this was not true for the numerical differentiation procedures.
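The composite Simpson's and trapezoidal rules can be sketched in a few lines. This is an illustrative implementation only; the function names and the test integrand $\sin x$ on $[0, \pi]$ (exact value 2) are assumptions chosen for the demonstration, not taken from the notes.

```python
import math


def composite_trapezoid(f, a, b, n):
    """Composite trapezoidal rule with n subintervals of width h = (b - a)/n."""
    h = (b - a) / n
    interior = sum(f(a + j * h) for j in range(1, n))
    return h / 2 * (f(a) + 2 * interior + f(b))


def composite_simpson(f, a, b, n):
    """Composite Simpson's rule with n (even) subintervals."""
    if n % 2 != 0:
        raise ValueError("n must be even")
    h = (b - a) / n
    even = sum(f(a + 2 * j * h) for j in range(1, n // 2))          # x_2, x_4, ...
    odd = sum(f(a + (2 * j - 1) * h) for j in range(1, n // 2 + 1))  # x_1, x_3, ...
    return h / 3 * (f(a) + 2 * even + 4 * odd + f(b))


exact = 2.0  # integral of sin(x) over [0, pi]
for n in (4, 8, 16):
    err_trap = abs(composite_trapezoid(math.sin, 0.0, math.pi, n) - exact)
    err_simp = abs(composite_simpson(math.sin, 0.0, math.pi, n) - exact)
    print(n, err_trap, err_simp)
```

Doubling $n$ should reduce the trapezoidal error by roughly a factor of 4 and the Simpson error by roughly a factor of 16, in line with the $h^2$ and $h^4$ error terms.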

Exercises section 4.4:

6.5

Adaptive Quadrature Methods

The composite quadrature rules use equally spaced points. This is inefficient if the function to be integrated has large variations in some regions and small variations in others. So, it is useful to introduce a method that adjusts the step size: the step size has to be smaller over regions where a large variation occurs. This technique is called adaptive quadrature. The method described here is based on Simpson's rule, but any Newton-Cotes formula can be used. We know that Simpson's rule uses two subintervals over $[a_k, b_k]$:

$$S(a_k, b_k) = \frac{h}{3}\left(f(a_k) + 4f(c_k) + f(b_k)\right), \tag{6.52}$$

where $c_k$ is the center of $[a_k, b_k]$ and $h = \dfrac{b_k - a_k}{2}$. Furthermore, if $f \in C^{4}[a_k, b_k]$, then there exists $\xi_k \in [a_k, b_k]$ so that

$$\int_{a_k}^{b_k} f(x)\,dx = S(a_k, b_k) - \frac{h^5}{90} f^{(4)}(\xi_k) \tag{6.53}$$

A composite Simpson rule using four subintervals of $[a_k, b_k]$ can be performed by bisecting this interval into two equal subintervals $[a_k, c_k] = [a_{k1}, b_{k1}]$ and $[c_k, b_k] = [a_{k2}, b_{k2}]$. We then write

$$S(a_{k1}, b_{k1}) + S(a_{k2}, b_{k2}) = \frac{h}{3 \times 2}\left(f(a_{k1}) + 4f(c_{k1}) + f(b_{k1})\right) \tag{6.54}$$

$$\qquad\qquad + \frac{h}{3 \times 2}\left(f(a_{k2}) + 4f(c_{k2}) + f(b_{k2})\right) \tag{6.55}$$

where only two additional evaluations of $f(x)$ are needed, at $c_{k1}$ and $c_{k2}$, which are the midpoints of the intervals $[a_{k1}, b_{k1}]$ and $[a_{k2}, b_{k2}]$, respectively. Furthermore, if $f \in C^{4}[a_k, b_k]$, then there exists $\xi_{k1} \in [a_k, b_k]$ so that

$$\int_{a_k}^{b_k} f(x)\,dx = S(a_{k1}, b_{k1}) + S(a_{k2}, b_{k2}) - \frac{h^5}{16 \times 90} f^{(4)}(\xi_{k1}) \tag{6.56}$$

If we assume that $f^{(4)}(\xi_k) \approx f^{(4)}(\xi_{k1})$, we get

$$S(a_k, b_k) - \frac{h^5}{90} f^{(4)}(\xi_k) \approx S(a_{k1}, b_{k1}) + S(a_{k2}, b_{k2}) - \frac{h^5}{16 \times 90} f^{(4)}(\xi_{k1})$$

which can be written as

$$-\frac{h^5}{90} f^{(4)}(\xi_{k1}) \approx \frac{16}{15}\left(S(a_{k1}, b_{k1}) + S(a_{k2}, b_{k2}) - S(a_k, b_k)\right)$$

Thus, we find that

$$\left|\int_{a_k}^{b_k} f(x)\,dx - S(a_{k1}, b_{k1}) - S(a_{k2}, b_{k2})\right| \approx \frac{1}{15}\left|S(a_{k1}, b_{k1}) + S(a_{k2}, b_{k2}) - S(a_k, b_k)\right| \tag{6.57}$$

Because of the assumption $f^{(4)}(\xi_k) \approx f^{(4)}(\xi_{k1})$, the fraction $\frac{1}{15}$ is replaced with the more conservative $\frac{1}{10}$ when implementing the method in a program. Assume that we want the tolerance to be $\epsilon_k > 0$ for the interval $[a_k, b_k]$. If

$$\frac{1}{10}\left|S(a_{k1}, b_{k1}) + S(a_{k2}, b_{k2}) - S(a_k, b_k)\right| < \epsilon_k \tag{6.58}$$

we infer that

$$\left|\int_{a_k}^{b_k} f(x)\,dx - S(a_{k1}, b_{k1}) - S(a_{k2}, b_{k2})\right| < \epsilon_k \tag{6.59}$$

Thus the composite Simpson rule is used to approximate the integral

$$\int_{a_k}^{b_k} f(x)\,dx \approx S(a_{k1}, b_{k1}) + S(a_{k2}, b_{k2}) \tag{6.60}$$

and the error bound for this approximation over the interval $[a_k, b_k]$ is $\epsilon_k$. The adaptive quadrature is implemented by applying Simpson's rule in this way:

1. We start with the interval $[a_0, b_0]$ and tolerance $\epsilon_0$.

2. We then apply Simpson's rule on the interval and on its two halves:

$$S(a_k, b_k) = \frac{h}{3}\left(f(a_k) + 4f(c_k) + f(b_k)\right),$$

$$S(a_{k1}, b_{k1}) + S(a_{k2}, b_{k2}) = \frac{h}{3 \times 2}\left(f(a_{k1}) + 4f(c_{k1}) + f(b_{k1})\right) + \frac{h}{3 \times 2}\left(f(a_{k2}) + 4f(c_{k2}) + f(b_{k2})\right)$$

3. The interval is refined into subintervals labeled $[a_{01}, b_{01}]$ and $[a_{02}, b_{02}]$.

4. If the accuracy test

$$\frac{1}{10}\left|S(a_{01}, b_{01}) + S(a_{02}, b_{02}) - S(a_0, b_0)\right| < \epsilon_0$$

is passed, the quadrature formula

$$S(a_{01}, b_{01}) + S(a_{02}, b_{02}) = \frac{h}{6}\left(f(a_{01}) + 4f(c_{01}) + f(b_{01})\right) + \frac{h}{6}\left(f(a_{02}) + 4f(c_{02}) + f(b_{02})\right) \tag{6.61}$$

is applied to $[a_0, b_0]$ and we are done.

5. If the accuracy test fails, the two subintervals are relabeled $[a_1, b_1]$ and $[a_2, b_2]$, over which the tolerances are halved, $\epsilon_1 = \epsilon_0/2$, $\epsilon_2 = \epsilon_0/2$. We repeat steps 2 to 5 for the two intervals with the new tolerances.

6. We add all the quadrature formulas for which the accuracy test is passed.
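The six steps above map naturally onto a recursive function. The sketch below is one possible rendering under stated assumptions: the names `simpson` and `adaptive_simpson` are hypothetical, and the `max_depth` guard is an added safety limit that is not part of the notes. The accuracy test uses the factor $1/10$ from (6.58), written as a comparison against $10\epsilon$.

```python
import math


def simpson(f, a, b):
    """Simpson's rule S(a, b) on a single interval [a, b]."""
    return (b - a) / 6 * (f(a) + 4 * f((a + b) / 2) + f(b))


def adaptive_simpson(f, a, b, eps, max_depth=50):
    """Adaptive quadrature: accept S(a,c) + S(c,b) when the test (6.58) passes."""
    c = (a + b) / 2
    whole = simpson(f, a, b)
    halves = simpson(f, a, c) + simpson(f, c, b)
    # Accuracy test: (1/10) * |S1 + S2 - S| < eps
    if max_depth == 0 or abs(halves - whole) < 10 * eps:
        return halves
    # Test failed: bisect and halve the tolerance on each half
    return (adaptive_simpson(f, a, c, eps / 2, max_depth - 1)
            + adaptive_simpson(f, c, b, eps / 2, max_depth - 1))


# Integrand (3/2) * sqrt(x) on [0, 1], whose exact integral is 1
approx = adaptive_simpson(lambda x: 1.5 * math.sqrt(x), 0.0, 1.0, 0.001)
print(approx, abs(1.0 - approx))
```

A production version would cache function values shared between `whole` and `halves`; this sketch re-evaluates them to stay close to the formulas.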

Example: We apply the adaptive quadrature algorithm to approximate

$$\int_0^1 \frac{3}{2}\sqrt{x}\,dx = 1$$

We first define the function

$$S(\mu, v) = \frac{v - \mu}{6}\left[f(\mu) + 4f\left(\frac{\mu + v}{2}\right) + f(v)\right]$$

Accuracy test for [0, 1]; we start with $\epsilon_0 = 0.001$:

S(0, 1) = 0.9571067813
S(0, 0.5) = 0.3383883477
S(0.5, 1) = 0.6464010497
|S(0, 0.5) + S(0.5, 1) − S(0, 1)| − 10ε0 = 0.0176826161 > 0

The test fails. We have to refine the interval [0, 1] into [0, 0.5] and [0.5, 1].

Accuracy test for [0, 0.5], with $\epsilon_1 = \epsilon_0/2$:

S(0, 0.5) = 0.3383883477
S(0, 0.25) = 0.1196383477
S(0.25, 0.5) = 0.2285372827
|S(0, 0.25) + S(0.25, 0.5) − S(0, 0.5)| − 10ε1 = 0.004787282700 > 0

The test fails. We have to refine the interval [0, 0.5] into [0, 0.25] and [0.25, 0.5].

Accuracy test for [0, 0.25], with $\epsilon_2 = \epsilon_1/2$:

S(0, 0.25) = 0.1196383477
S(0, 0.125) = 0.04229854347
S(0.125, 0.25) = 0.08080013118
|S(0, 0.125) + S(0.125, 0.25) − S(0, 0.25)| − 10ε2 = 0.000960326900 > 0

The test fails. We have to refine the interval [0, 0.25] into [0, 0.125] and [0.125, 0.25].

Accuracy test for [0, 0.125], with $\epsilon_3 = \epsilon_2/2$:

S(0, 0.125) = 0.04229854347
S(0, 0.0625) = 0.01495479346
S(0.0625, 0.125) = 0.02856716035
|S(0, 0.0625) + S(0.0625, 0.125) − S(0, 0.125)| − 10ε3 = −0.000026589660 < 0

The test has passed. So, we can keep the interval [0, 0.125] with S(0, 0.0625) + S(0.0625, 0.125) = 0.04352195381. We now go back and keep $\epsilon_3$.

Accuracy test for [0.125, 0.25]:

S(0.125, 0.25) = 0.08080013118
S(0.125, 0.1875) = 0.03699538942
S(0.1875, 0.25) = 0.04381002180
|S(0.125, 0.1875) + S(0.1875, 0.25) − S(0.125, 0.25)| − 10ε3 = −0.001244719960 < 0

The test has passed. So, we can keep the interval [0.125, 0.25] with S(0.125, 0.1875) + S(0.1875, 0.25) = 0.08080541122. We now go back with $\epsilon_2$.

Accuracy test for [0.25, 0.5]:

S(0.25, 0.5) = 0.2285372827
S(0.25, 0.375) = 0.1046387629
S(0.375, 0.5) = 0.1239134540
|S(0.25, 0.375) + S(0.375, 0.5) − S(0.25, 0.5)| − 10ε2 = −0.002485065800 < 0

The test has passed. So, we can keep the interval [0.25, 0.5] with S(0.25, 0.375) + S(0.375, 0.5) = 0.2285522169. We now go back with $\epsilon_1$.

Accuracy test for [0.5, 1]:

S(0.5, 1) = 0.6464010497
S(0.5, 0.75) = 0.2959631153
S(0.75, 1) = 0.3504801743
|S(0.5, 0.75) + S(0.75, 1) − S(0.5, 1)| − 10ε1 = −0.004957760100 < 0

The test has passed. So, we can keep the interval [0.5, 1] with S(0.5, 0.75) + S(0.75, 1) = 0.6464432896. Now we can add

$$\int_0^1 \frac{3}{2}\sqrt{x}\,dx \approx S(0, 0.0625) + S(0.0625, 0.125) + S(0.125, 0.1875) + S(0.1875, 0.25) + S(0.25, 0.375) + S(0.375, 0.5) + S(0.5, 0.75) + S(0.75, 1) = 0.9993228715$$

The actual error is 0.0006771285. We can summarize our results:

a_k      b_k      AQ
0.000    0.125    0.04352195381
0.125    0.250    0.08080541122
0.250    0.500    0.2285522169
0.500    1.000    0.6464432896
Total             0.9993228715

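[Figure: plot over [0, 1] with the horizontal axis marked at the subinterval endpoints 0, 0.125, 0.25, 0.5 and 1, and the vertical axis running from 0 to 1.5.]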

Exercises section 4.6:

6.6

Gaussian Quadrature

Gaussian integration is based on the idea that the accuracy of a quadrature formula can be improved by choosing the nodes wisely, rather than spacing them equally. Gauss integration assumes an approximation of the form

$$\int_{-1}^{1} f(x)\,dx \approx \sum_{i=1}^{n} a_i f(x_i) \tag{6.62}$$

This equation contains $2n$ unknowns to be determined: the $n$ nodes $x_i$ and the $n$ coefficients $a_i$. Gaussian quadrature chooses them so that the formula has the highest possible degree of precision.

Let us find the Gaussian quadrature formula for $n = 2$. In this case the functions $1$, $x$, $x^2$, and $x^3$ should give exact results.

$$f(x) = 1 \implies a_1 + a_2 = \int_{-1}^{1} dx = 2$$

$$f(x) = x \implies a_1 x_1 + a_2 x_2 = \int_{-1}^{1} x\,dx = 0$$

$$f(x) = x^2 \implies a_1 x_1^2 + a_2 x_2^2 = \int_{-1}^{1} x^2\,dx = \frac{2}{3}$$

$$f(x) = x^3 \implies a_1 x_1^3 + a_2 x_2^3 = \int_{-1}^{1} x^3\,dx = 0$$

Solving these equations we get

$$a_1 = a_2 = 1, \qquad x_1 = -x_2 = \frac{1}{\sqrt{3}} \tag{6.63}$$

Thus, we have the Gaussian integration formula for two nodes

$$\int_{-1}^{1} f(x)\,dx \approx f\left(-\frac{1}{\sqrt{3}}\right) + f\left(\frac{1}{\sqrt{3}}\right) \tag{6.64}$$

This method can be generalized to more than two nodes, but there is an easier alternative that uses the Legendre polynomials. They are defined by the following two properties. For each $n$, $P_n$ is a monic polynomial of degree $n$, that is, a polynomial $x^n + a_{n-1}x^{n-1} + \ldots + a_1 x + a_0$ in which the coefficient of the highest-order term is 1. Whenever $P(x)$ is a polynomial of degree less than $n$, we have

$$\int_{-1}^{1} P(x)\,P_n(x)\,dx = 0 \tag{6.65}$$

These are the first few Legendre polynomials:

$$P_0(x) = 1, \quad P_1(x) = x, \quad P_2(x) = x^2 - \frac{1}{3}, \quad P_3(x) = x^3 - \frac{3}{5}x, \quad P_4(x) = x^4 - \frac{6}{7}x^2 + \frac{3}{35} \tag{6.66}$$

The nodes $x_i$, $i = 1, \ldots, n$, are the roots of $P_n(x)$, and the coefficients $a_i$ are defined by

$$a_i = \int_{-1}^{1} \prod_{\substack{j=1 \\ j \ne i}}^{n} \frac{x - x_j}{x_i - x_j}\,dx \tag{6.67}$$

If $P(x)$ is any polynomial of degree less than $2n$, then

$$\int_{-1}^{1} P(x)\,dx = \sum_{i=1}^{n} a_i P(x_i) \tag{6.68}$$
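As an illustration of (6.67), the coefficients for $n = 3$ can be computed by integrating the Lagrange basis polynomials built on the roots of $P_3(x) = x^3 - \frac{3}{5}x$, namely $0$ and $\pm\sqrt{3/5}$. The sketch below is an assumption-laden example: polynomials are stored as coefficient lists (lowest degree first), and the helper names `poly_mul`, `integrate_minus1_to_1` and `gauss_weights` are hypothetical.

```python
import math


def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists, lowest degree first."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


def integrate_minus1_to_1(p):
    """Integral of the polynomial p over [-1, 1]; odd powers contribute zero."""
    return sum(c * 2.0 / (k + 1) for k, c in enumerate(p) if k % 2 == 0)


def gauss_weights(nodes):
    """Coefficients a_i from (6.67): integrals of the Lagrange basis polynomials."""
    weights = []
    for i, xi in enumerate(nodes):
        basis = [1.0]
        for j, xj in enumerate(nodes):
            if j != i:
                # Multiply by the factor (x - xj) / (xi - xj)
                basis = poly_mul(basis, [-xj / (xi - xj), 1.0 / (xi - xj)])
        weights.append(integrate_minus1_to_1(basis))
    return weights


nodes = [-math.sqrt(3 / 5), 0.0, math.sqrt(3 / 5)]  # roots of P3
weights = gauss_weights(nodes)
print(weights)
```

With these three nodes the resulting formula should integrate every polynomial of degree less than 6 exactly, as stated in (6.68).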

Note that the Gaussian formula restricts the limits of integration to be from −1 to 1. It is possible to overcome this restriction by a change of variable:

$$\int_a^b f(x)\,dx = \int_{-1}^{1} g(z)\,dz \tag{6.69}$$

We define

$$z = Ax + B, \qquad x = \frac{z - B}{A}$$

so that the endpoints satisfy

$$-1 = Aa + B, \qquad 1 = Ab + B$$

which gives the solution

$$A = \frac{2}{b - a} \tag{6.70}$$

$$B = \frac{a + b}{a - b} \tag{6.71}$$

Therefore, since $dx = dz/A$,

$$\int_a^b f(x)\,dx = \frac{1}{A}\int_{-1}^{1} g(z)\,dz \tag{6.72}$$

where

$$g(z) = f\left(\frac{z - B}{A}\right) \tag{6.73}$$

Example: Convert the integral

$$I = \int_{-2}^{2} e^{-x/2}\,dx$$

to the Gaussian limits of integration.

Solution:

$$I = \int_{-2}^{2} e^{-x/2}\,dx = \frac{1}{A}\int_{-1}^{1} f\left(\frac{z - B}{A}\right)dz$$

We have

$$A = \frac{2}{b - a} = \frac{1}{2}, \qquad B = \frac{a + b}{a - b} = 0$$

Thus

$$I = \int_{-2}^{2} e^{-x/2}\,dx = 2\int_{-1}^{1} f(2z)\,dz = 2\int_{-1}^{1} e^{-z}\,dz$$
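To see the change of variable at work, the two-point Gaussian formula can be applied on a general interval $[a, b]$. This is a minimal sketch; the function name `gauss2` is hypothetical, and the comparison value $2(e - e^{-1})$ is the exact integral of $e^{-x/2}$ over $[-2, 2]$.

```python
import math


def gauss2(f, a, b):
    """Two-point Gaussian quadrature on [a, b] via z = A*x + B."""
    A = 2.0 / (b - a)
    B = (a + b) / (a - b)

    def g(z):
        # Map z in [-1, 1] back to x in [a, b]
        return f((z - B) / A)

    r = 1.0 / math.sqrt(3.0)
    return (g(-r) + g(r)) / A


approx = gauss2(lambda x: math.exp(-x / 2), -2.0, 2.0)
exact = 2 * (math.e - 1 / math.e)
print(approx, exact, abs(approx - exact))
```

Only two function evaluations are used, so the result is a rough approximation; more nodes would be needed for higher accuracy.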

6.7

Improper Integrals

Improper integrals result in two cases:

– the notion of integration is extended to an interval of integration on which the function is unbounded;

– an endpoint of the interval of integration is infinite, $\infty$.

Let us consider the first case, when the integrand $f(x)$ is unbounded. The second case can be reduced to the first case. It is well known that the improper integral

$$\int_a^b \frac{dx}{(x - a)^p} \tag{6.74}$$

converges iff $0 < p < 1$. Thus, we define the improper integral

$$\int_a^b \frac{dx}{(x - a)^p} = \frac{(b - a)^{1-p}}{1 - p} \tag{6.75}$$

If the function $f(x)$ can be written as

$$f(x) = \frac{g(x)}{(x - a)^p} \tag{6.76}$$

where $g$ is continuous on $[a, b]$, the improper integral

$$\int_a^b f(x)\,dx \tag{6.77}$$

exists for $0 < p < 1$. Let us now approximate the integral given by eq. (6.77) using the composite Simpson's rule. Assume that $g \in C^5[a, b]$. In this case, the fourth Taylor polynomial, $P_4(x)$, for $g$ about $a$ can be written as

$$P_4(x) = g(a) + (x - a)g'(a) + \frac{g''(a)}{2}(x - a)^2 + \frac{g'''(a)}{3!}(x - a)^3 + \frac{g^{(4)}(a)}{4!}(x - a)^4 \tag{6.78}$$

Therefore

$$\int_a^b f(x)\,dx = \int_a^b \frac{g(x) - P_4(x)}{(x - a)^p}\,dx + \int_a^b \frac{P_4(x)}{(x - a)^p}\,dx \tag{6.79}$$

The last term can be integrated exactly:

$$\int_a^b \frac{P_4(x)}{(x - a)^p}\,dx = \sum_{k=0}^{4}\int_a^b \frac{g^{(k)}(a)}{k!}(x - a)^{k-p}\,dx = \sum_{k=0}^{4} \frac{g^{(k)}(a)}{k!\,(k - p + 1)}(b - a)^{k-p+1} \tag{6.80}$$

Now, we can define a function $G(x)$ such that

$$G(x) = \begin{cases} \dfrac{g(x) - P_4(x)}{(x - a)^p}, & \text{if } a < x \le b \\ 0, & \text{if } x = a \end{cases} \tag{6.81}$$

Since $0 < p < 1$ and $P_4^{(k)}(a)$ agrees with $g^{(k)}(a)$ for each $k = 0, \ldots, 4$, we have $G \in C^4[a, b]$. This implies that the Composite Simpson's rule can be applied to approximate the integral of $G$ on $[a, b]$.

Example: We want to approximate

$$\int_0^1 \frac{\sin(x)}{x^{1/4}}\,dx \tag{6.82}$$

using the composite Simpson's rule with $n = 4$.

1. We find the fourth-order Taylor polynomial for $\sin(x)$ at $x = 0$:

$$\sin(x) \approx P_4(x) = x - \frac{1}{6}x^3 \tag{6.83}$$

2. We evaluate the integral

$$\int_0^1 \frac{P_4(x)}{x^{1/4}}\,dx = \int_0^1 \left(x^{3/4} - \frac{1}{6}x^{11/4}\right)dx = \frac{166}{315} = 0.5269841270$$

3. We define the function $G(x)$:

$$G(x) = \begin{cases} \dfrac{\sin(x) - x + \frac{1}{6}x^3}{x^{1/4}}, & \text{if } 0 < x \le 1 \\ 0, & \text{if } x = 0 \end{cases}$$

4. We apply the Composite Simpson's rule with $n = 4$:

$$\int_0^1 G(x)\,dx \approx \frac{1}{12}\left[G(0) + 4G(0.25) + 2G(0.5) + 4G(0.75) + G(1)\right] = 0.001432198742$$

5. Now we get

$$\int_0^1 \frac{\sin(x)}{x^{1/4}}\,dx \approx 0.001432198742 + 0.5269841270 = 0.5284163257$$
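The five steps can be reproduced numerically. The sketch below is illustrative only: the function names are hypothetical, and the singular part is entered in closed form as $4/7 - 2/45 = 166/315$, the integral of $x^{3/4} - \frac{1}{6}x^{11/4}$ over $[0, 1]$.

```python
import math


def G(x):
    """Regularized integrand (sin x - P4(x)) / x**(1/4), with G(0) = 0."""
    if x == 0.0:
        return 0.0
    return (math.sin(x) - x + x ** 3 / 6) / x ** 0.25


def composite_simpson(f, a, b, n):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (b - a) / n
    even = sum(f(a + 2 * j * h) for j in range(1, n // 2))
    odd = sum(f(a + (2 * j - 1) * h) for j in range(1, n // 2 + 1))
    return h / 3 * (f(a) + 2 * even + 4 * odd + f(b))


regular_part = composite_simpson(G, 0.0, 1.0, 4)  # step 4
singular_part = 4 / 7 - 2 / 45                    # step 2, equals 166/315
total = regular_part + singular_part              # step 5
print(regular_part, singular_part, total)
```

Because $G$ is smooth on $[0, 1]$, even $n = 4$ subintervals give a small Simpson error for the regular part.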

To approximate an improper integral with a singularity at the right endpoint, we can apply the technique used above after the transformation $z = -x$:

$$\int_a^b f(x)\,dx = \int_{-b}^{-a} f(-z)\,dz \tag{6.84}$$

which has its singularity at the left endpoint. An improper integral with a singularity at $c$, where $a < c < b$, can be treated as the sum

$$\int_a^b f(x)\,dx = \int_a^c f(x)\,dx + \int_c^b f(x)\,dx \tag{6.85}$$

The other type of improper integral, which involves infinite limits of integration, can be treated with the substitution $t = 1/x$:

$$\int_a^{\infty} f(x)\,dx = \int_0^{1/a} t^{-2} f\left(\frac{1}{t}\right)dt \tag{6.86}$$

for $a > 0$. When $a \le 0$, we can split the integral into two parts, one from $a$ to $c$ and the other from $c$ to $\infty$, with $c > 0$.

Chapter 7 Initial-Value Problems for ODE

7.1

Introduction

An ordinary differential equation (ODE) is an equation containing one or more derivatives of $y$. Differential equations are classified according to their order. The order of a differential equation is the order of the highest derivative that appears in the equation. When the equation contains only a first derivative, it is called a first-order differential equation. A first-order differential equation can be expressed as

$$\frac{dy}{dt} = f(t, y) \tag{7.1}$$

The degree of a differential equation is the power of the highest-order derivative. For example, $t y'' + 3t^2 + 2 = 0$ is a first-degree, second-order differential equation. A differential equation is linear when it does not contain terms involving products of the dependent variable $y$ or its derivatives. For example, $y'' + 2y' + t^2 = 0$ is linear but $y'' + 2y'y + t^2 = 0$ is not.

If the order of the equation is $n$, we need $n$ conditions in order to obtain a unique solution. When all the conditions are specified at a particular value of the independent variable $t$, the problem is called an initial-value problem. It is also possible to specify the conditions at different values of $t$; such problems are called boundary-value problems.

All numerical techniques for solving differential equations involve a series of estimates of $y(t)$ starting from the given conditions. There are two basic approaches, one-step and multistep methods. In one-step methods, we use information from only one preceding point: to estimate $y_i$ we only need $y_{i-1}$. Multistep methods use information at two or more previous steps to estimate a value.

7.2

Elementary Theory of Initial-Value Problems

Lipschitz condition: A function $f(t, y)$ is said to satisfy a Lipschitz condition in the variable $y$ on a set $D \subset \mathbb{R}^2$ if a constant $L > 0$ exists with

$$|f(t, y_1) - f(t, y_2)| \le L\,|y_1 - y_2| \tag{7.2}$$

whenever $(t, y_1), (t, y_2) \in D$.

Convex set: A set $D \subset \mathbb{R}^2$ is said to be convex if, whenever $(t_1, y_1)$ and $(t_2, y_2)$ belong to $D$ and $\lambda$ is in $[0, 1]$, the point $((1 - \lambda)t_1 + \lambda t_2, (1 - \lambda)y_1 + \lambda y_2)$ belongs to $D$. This means that the entire straight line segment between the two points also belongs to the set $D$.

[Figure: a non-convex set and a convex set.]

The set $D = \{(t, y) \mid a \le t \le b,\ y \in \mathbb{R}\}$ is a convex set.

Theorem: Suppose that $f(t, y)$ is defined on a convex set $D \subset \mathbb{R}^2$. If a constant $L > 0$ exists with

$$\left|\frac{\partial f}{\partial y}(t, y)\right| \le L, \quad \text{for all } (t, y) \in D,$$

then $f$ satisfies a Lipschitz condition on $D$ in the variable $y$.

Theorem: Suppose that $D = \{(t, y) \mid a \le t \le b,\ y \in \mathbb{R}\}$ and that $f(t, y)$ is continuous on $D$. If $f$ satisfies a Lipschitz condition on $D$ in the variable $y$, then the initial-value problem

$$y'(t) = f(t, y), \quad a \le t \le b, \quad y(a) = \alpha,$$

has a unique solution $y(t)$ for $a \le t \le b$.

Example: We want to show that the following initial-value problem has a unique solution:

$$y' = y\cos(t), \quad 0 \le t \le 1, \quad y(0) = 1$$

We apply the previous theorem. Let us check that $f(t, y) = y\cos(t)$, $0 \le t \le 1$, satisfies a Lipschitz condition:

$$|y_1\cos(t) - y_2\cos(t)| = |\cos(t)|\,|y_1 - y_2| \le |y_1 - y_2|$$

Thus, with $L = 1$, $f(t, y)$ satisfies the Lipschitz condition. Therefore, there is a unique solution.

Example: We want to show that

$$y' = -\frac{y^3 + y}{(3y^2 + 1)t}$$

has the implicit solution $y^3 t + yt = 2$ for $1 \le t \le 2$, with $y(1) = 1$.

– We differentiate $y^3 t + yt = 2$ with respect to $t$ and find

$$3y^2 y' t + y^3 + y + y' t = 0$$

$$y'(3y^2 t + t) + y^3 + y = 0$$

which gives

$$y' = -\frac{y^3 + y}{(3y^2 + 1)t}$$

– From $y^3 t + yt = 2$ at $t = 1$ we get $y^3 + y = 2$. The only real root of $y^3 + y = 2$ is $y = 1$, so the initial condition is satisfied.

To get $y(2)$, we set $t = 2$ in $y^3 t + yt = 2$, which gives $y^3 + y - 1 = 0$, and use Newton's method:

$$f(y) = y^3 + y - 1 = 0, \qquad p_{n+1} = p_n - \frac{p_n^3 + p_n - 1}{3p_n^2 + 1}$$

It is clear that $f(0) = -1$ and $f(1) = 1$, so a root lies in $(0, 1)$. We can start the Newton iteration from $p_0 = 0.5$. We get

i    p_i
0    0.5
1    0.7142857143
2    0.6831797236
3    0.6823284233
4    0.6823278037
5    0.6823278038
6    0.6823278038

So, the approximate value is $y(2) = 0.6823278038$.

Exercises section 5.1: 1, 3, 5, 7
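The Newton iteration for $y(2)$ can be sketched as follows. The function name `newton`, the stopping tolerance and the iteration cap are assumptions made for this illustration.

```python
def newton(f, df, p0, tol=1e-10, max_iter=50):
    """Newton's method: p_{n+1} = p_n - f(p_n) / f'(p_n)."""
    p = p0
    for _ in range(max_iter):
        p_next = p - f(p) / df(p)
        if abs(p_next - p) < tol:
            return p_next
        p = p_next
    raise RuntimeError("Newton iteration did not converge")


# y(2) solves y**3 + y - 1 = 0, starting from p0 = 0.5
root = newton(lambda y: y ** 3 + y - 1, lambda y: 3 * y ** 2 + 1, 0.5)
print(root)
```

Since $f'(y) = 3y^2 + 1$ is never zero, the division in each step is always defined for this problem.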

7.3

Euler’s Method

The objective of Euler's method is to obtain an approximate solution to the well-posed initial-value problem

$$\frac{dy}{dt} = f(t, y), \quad a \le t \le b, \quad y(a) = \alpha \tag{7.3}$$

We can obtain approximate solutions at fixed points, called mesh points. Let us assume that the mesh points are equally distributed throughout the interval $[a, b]$. So, we choose

$$t_i = a + ih, \quad \text{for each } i = 0, 1, \ldots, N. \tag{7.4}$$

The common distance between the points, $h = (b - a)/N$, is called the step size. To derive Euler's method we use Taylor's Theorem:

$$y(t_{i+1}) = y(t_i) + (t_{i+1} - t_i)\,y'(t_i) + \frac{(t_{i+1} - t_i)^2}{2}\,y''(\xi_i)$$

$$= y(t_i) + h\,y'(t_i) + \frac{h^2}{2}\,y''(\xi_i)$$

$$= y(t_i) + h\,f(t_i, y(t_i)) + \frac{h^2}{2}\,y''(\xi_i) \tag{7.5}$$

Euler's method constructs $w_i \approx y(t_i)$, for each $i = 1, 2, \ldots, N$, by dropping the remainder term. Thus, it is given by

$$w_0 = \alpha, \qquad w_{i+1} = w_i + h f(t_i, w_i), \quad \text{for each } i = 0, 1, \ldots, N - 1. \tag{7.6}$$

This last equation is called the difference equation associated with Euler's method.

Example: Let us approximate the solution of $y' = y - t^2 + 1$, $0 \le t \le 2$, $y(0) = 0.5$, using Euler's method with $N = 10$ (please see Example 1 page 259). We have $h = (b - a)/N = 0.2$ and $t_0 = 0$. Euler's method is:

$$w_0 = 0.5, \qquad w_{i+1} = w_i + 0.2 \times (w_i - t_i^2 + 1), \quad \text{for each } i = 0, 1, \ldots, N - 1$$

t_i    w_i
0.0    0.50000
0.2    0.80000
0.4    1.15200
0.6    1.55040
...    ...
2.0    4.86578
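The difference equation of Euler's method translates directly into a loop. This is a minimal sketch with a hypothetical function name `euler`; it is applied to $y' = y - t^2 + 1$, $y(0) = 0.5$, with $N = 10$ steps on $[0, 2]$.

```python
def euler(f, a, b, alpha, n):
    """Euler's method: w_{i+1} = w_i + h * f(t_i, w_i) on n equal steps."""
    h = (b - a) / n
    ts, ws = [a], [alpha]
    for i in range(n):
        ws.append(ws[-1] + h * f(ts[-1], ws[-1]))
        ts.append(a + (i + 1) * h)  # recompute from a to avoid accumulating drift
    return ts, ws


ts, ws = euler(lambda t, y: y - t * t + 1, 0.0, 2.0, 0.5, 10)
for t, w in zip(ts, ws):
    print(f"{t:.1f}  {w:.7f}")
```

Calling the function with a larger `n` shows the first-order behaviour: halving $h$ roughly halves the error at $t = 2$.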


Theorem: Suppose that $f$ is continuous and satisfies a Lipschitz condition with constant $L$ on

$$D = \{(t, y) \mid a \le t \le b,\ -\infty < y < \infty\} \tag{7.7}$$

and that a constant $M$ exists with

$$|y''(t)| \le M, \quad \text{for all } t \in [a, b] \tag{7.8}$$

Let $y(t)$ denote the unique solution to the initial-value problem

$$y'(t) = f(t, y), \quad a \le t \le b, \quad y(a) = \alpha \tag{7.9}$$

and let $w_i$ $(i = 0, \ldots, N)$ be the approximations generated by Euler's method for some positive integer $N$. Then, for each $i$,

$$|y(t_i) - w_i| \le \frac{hM}{2L}\left[e^{L(t_i - a)} - 1\right] \tag{7.10}$$

Theorem: Assume that the hypotheses of the previous theorem hold and let $u_i$ $(i = 0, \ldots, N)$ be the approximations obtained from

$$u_0 = \alpha + \delta_0, \qquad u_{i+1} = u_i + h f(t_i, u_i) + \delta_{i+1} \tag{7.11}$$

If $|\delta_i| < \delta$, then

$$|y(t_i) - u_i| \le \frac{1}{L}\left(\frac{hM}{2} + \frac{\delta}{h}\right)\left[e^{L(t_i - a)} - 1\right] + |\delta_0|\,e^{L(t_i - a)} \tag{7.12}$$

The error bound is no longer linear in $h$. In fact, it goes to infinity as $h$ goes to zero:

$$\lim_{h \to 0}\left(\frac{hM}{2} + \frac{\delta}{h}\right) = \infty \tag{7.13}$$

It can be shown that the minimum value of the error bound occurs when

$$h = \sqrt{\frac{2\delta}{M}} \tag{7.14}$$

Example (ex 9): Given the initial-value problem

$$y' = \frac{2}{t}\,y + t^2 e^t, \quad 1 \le t \le 2, \quad y(1) = 0,$$

with the exact solution $y(t) = t^2(e^t - e)$, Euler's method with $h = 0.1$ gives

t      w(t)            y(t)            y(t) − w(t)
1.0    0               0               0
1.1    0.2718281828    0.345919877     0.0740916942
1.2    0.6847555777    0.866642537     0.1818869593
1.3    1.276978344     1.607215080     0.330236736
1.4    2.093547688     2.620359552     0.526811864
1.5    3.187445123     3.967666297     0.780221174
1.6    4.620817847     5.720961530     1.100143683
1.7    6.466396379     7.963873477     1.497477098
1.8    8.809119690     10.79362466     1.984504970
1.9    11.74799654     14.32308154     2.57508500
2.0    15.39823565     18.68309709     3.28486144

A linear interpolation to approximate $y(1.04)$ can be found as follows. We use $x_0 = 1$ and $x_1 = 1.1$ and the values $w(x_0)$ and $w(x_1)$ to get the Lagrange polynomial

$$P(x) = 2.718281828\,x - 2.718281828$$

Thus, $P(1.04) = 0.108731273$. The exact value is $y(1.04) = 0.119987497$ and the error is $0.011256224$.

7.4

Higher-Order Taylor Methods

Euler's method is Taylor's method of order one. For order $n$ we write the Taylor expansion

$$y(t_{i+1}) = \sum_{k=0}^{n} \frac{h^k}{k!}\,y^{(k)}(t_i) + \frac{h^{n+1}}{(n+1)!}\,y^{(n+1)}(\xi_i) \tag{7.15}$$

for some $\xi_i \in (t_i, t_{i+1})$. The difference-equation method corresponding to the previous equation is found by deleting the remainder term. We get the Taylor method of order $n$:

$$w_0 = \alpha, \qquad w_{i+1} = w_i + h\,T^{(n)}(t_i, w_i), \quad i = 0, 1, \ldots, N - 1,$$

$$T^{(n)}(t_i, w_i) = f(t_i, w_i) + \sum_{j=1}^{n-1} \frac{h^j}{(j+1)!}\,f^{(j)}(t_i, w_i) \tag{7.16}$$

Example: We want to approximate the solution of the initial-value problem

$$y' = t^2 + y^2, \quad 0 \le t \le 0.4, \quad y(0) = 0$$

with $h = 0.2$ using Taylor's method of order 4. We calculate the derivatives

$$y' = t^2 + y^2$$

$$y'' = 2t + 2yy'$$

$$y''' = 2 + 2y'^2 + 2yy''$$

$$y^{(4)} = 6y'y'' + 2yy'''$$

We find

y(0) = 0.0
y(0.2) = 0.00266666667
y(0.4) = 0.02135325469

If we use step size $h = 0.4$ we get $y(0.4) = 0.02133333333$. The correct answer is $y(0.4) = 0.021359$. This shows that the accuracy is improved by using subintervals, i.e., by decreasing the step size.
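The order-4 Taylor step for $y' = t^2 + y^2$ can be sketched directly from the four derivative formulas. The function name `taylor4` is hypothetical, and the derivative expressions are hard-coded for this particular problem.

```python
def taylor4(t0, y0, h, steps):
    """Taylor method of order 4 for y' = t**2 + y**2."""
    t, y = t0, y0
    for _ in range(steps):
        d1 = t * t + y * y               # y'
        d2 = 2 * t + 2 * y * d1          # y''
        d3 = 2 + 2 * d1 * d1 + 2 * y * d2  # y'''
        d4 = 6 * d1 * d2 + 2 * y * d3    # y''''
        y += h * d1 + h ** 2 / 2 * d2 + h ** 3 / 6 * d3 + h ** 4 / 24 * d4
        t += h
    return y


print(taylor4(0.0, 0.0, 0.2, 1))  # y(0.2)
print(taylor4(0.0, 0.0, 0.2, 2))  # y(0.4) with h = 0.2
print(taylor4(0.0, 0.0, 0.4, 1))  # y(0.4) with h = 0.4
```

The need to derive and code each derivative by hand is the practical drawback that motivates methods using only evaluations of $f$.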

7.5

Runge-Kutta Methods

Taylor's method provides the formal definition of a step-by-step numerical method for solving initial-value problems. The difficulty in applying Taylor's method is the evaluation of higher derivatives, which can be extremely complicated. We can instead explore a class of methods that agree with the first $n + 1$ terms of the Taylor series using function values only (no need to construct $f^{(r)}$). These are the Runge-Kutta methods. Runge-Kutta methods are a family of one-step methods. They are all based on the general form of the extrapolation equation,

$$y_{i+1} = y_i + \text{slope} \times \text{interval size} \tag{7.17}$$

$$= y_i + m\,h \tag{7.18}$$

where $m$ is a slope obtained as a weighted average of the slopes at various points in the interval. If we estimate $m$ using slopes at $r$ points in the interval $(t_i, t_{i+1})$, then $m$ can be written as

$$m = w_1 m_1 + w_2 m_2 + \ldots + w_r m_r,$$

where the $w_i$ are the weights of the slopes at the various points. Runge-Kutta methods are known by their order. For instance, a Runge-Kutta method is called an $r$-th order Runge-Kutta method when slopes at $r$ points are used to construct the weighted average slope $m$. In Euler's method we use only one slope, at $(t_i, y_i)$, to estimate $y_{i+1}$, and therefore Euler's method is a first-order Runge-Kutta method.

The second Taylor polynomial in two variables for the function $f(t, y)$ near the point $(t_0, y_0)$ can be written as

$$P_2(t, y) = f(t_0, y_0) + \left[(t - t_0)f_t + (y - y_0)f_y\right] + \left[\frac{(t - t_0)^2}{2}f_{tt} + \frac{(y - y_0)^2}{2}f_{yy} + (t - t_0)(y - y_0)f_{ty}\right] \tag{7.19}$$

where all partial derivatives are evaluated at $(t_0, y_0)$. We consider here the second-order method, which has the form

$$y_{i+1} = y_i + (w_1 m_1 + w_2 m_2)h \tag{7.20}$$

where

$$m_1 = f(t_i, y_i) \tag{7.21}$$

$$m_2 = f(t_i + a_1 h,\ y_i + b_1 m_1 h) \tag{7.22}$$

The weights $w_1$ and $w_2$ and the constants $a_1$ and $b_1$ are to be determined. The principle of the Runge-Kutta method is that these parameters are chosen such that the power series expansion of the right side of eq. (7.20) agrees with the Taylor series expansion of $y_{i+1}$ in terms of $y_i$ and $f(t_i, y_i)$. The second-order Taylor series expansion of $y_{i+1}$ about $y_i$ is given by

$$y_{i+1} = y_i + y_i' h + \frac{y_i''}{2} h^2 \tag{7.23}$$

We know that

$$y_i' = f(t_i, y_i), \qquad y_i'' = \frac{dy'}{dt} = \frac{\partial f}{\partial t} + \frac{\partial f}{\partial y} f(t_i, y_i)$$

We get

$$y_{i+1} = y_i + f h + (f_t + f_y f)\frac{h^2}{2}$$

Now consider the right side of eq. (7.20). To get its Taylor expansion we need the Taylor series in two variables. We can write

$$y_{i+1} = y_i + (w_1 m_1 + w_2 m_2)h$$

$$= y_i + \left[w_1 f + w_2 f(t_i + a_1 h,\ y_i + b_1 m_1 h)\right]h$$

$$= y_i + \left[w_1 f + w_2\left(f + a_1 h f_t + b_1 m_1 h f_y + O(h^2)\right)\right]h$$

$$= y_i + w_1 h f + w_2 h f + w_2 a_1 h^2 f_t + w_2 b_1 m_1 h^2 f_y + O(h^3)$$

$$= y_i + (w_1 + w_2) h f + w_2 (a_1 f_t + b_1 m_1 f_y) h^2 + O(h^3)$$

Then we can compare

$$y_{i+1} = y_i + f h + (f_t + f_y f)\frac{h^2}{2}$$

with

$$y_{i+1} = y_i + (w_1 + w_2) h f + w_2 (a_1 f_t + b_1 m_1 f_y) h^2$$

Since $m_1 = f$, we find

$$w_1 + w_2 = 1, \qquad w_2 a_1 = w_2 b_1 = \frac{1}{2}$$

Note that we have only three equations but four unknowns, so this set of equations has no unique solution. In what follows the index is $i = 0, 1, \ldots, N - 1$. If we choose $w_1 = 0$, $w_2 = 1$ and $a_1 = b_1 = 1/2$, we get what we call the Midpoint method:

$$m_1 = f(t_i, y_i)$$

$$m_2 = f\left(t_i + \frac{h}{2},\ y_i + \frac{m_1 h}{2}\right)$$

$$y_{i+1} = y_i + m_2 h$$

If we choose $w_1 = w_2 = 1/2$ and $a_1 = b_1 = 1$, we get what we call the Modified Euler method:

$$m_1 = f(t_i, y_i)$$

$$m_2 = f(t_i + h,\ y_i + m_1 h)$$

$$y_{i+1} = y_i + \frac{1}{2}(m_1 + m_2)h$$

If we choose $w_1 = 1/4$, $w_2 = 3/4$ and $a_1 = b_1 = 2/3$, we get what we call Heun's method:

$$m_1 = f(t_i, y_i)$$

$$m_2 = f\left(t_i + \frac{2}{3}h,\ y_i + \frac{2}{3}m_1 h\right)$$

$$y_{i+1} = y_i + \frac{1}{4}(m_1 + 3m_2)h$$

The derivation of the Runge-Kutta method of order four is too long, so we give it here without details:

$$m_1 = f(t_i, y_i)$$

$$m_2 = f\left(t_i + \frac{h}{2},\ y_i + \frac{1}{2}m_1 h\right)$$

$$m_3 = f\left(t_i + \frac{h}{2},\ y_i + \frac{1}{2}m_2 h\right)$$

$$m_4 = f(t_i + h,\ y_i + m_3 h)$$

$$y_{i+1} = y_i + \frac{1}{6}(m_1 + 2m_2 + 2m_3 + m_4)h \tag{7.24}$$

Example: We want to approximate the solution of

$$y' = t e^{3t} - 2y, \quad 0 \le t \le 1, \quad y(0) = 0$$

with $h = 0.5$. Let us use the Modified Euler method:

$$N = (1 - 0)/h = 2$$

$$y_0 = 0$$

$$t_i = 0 + ih, \quad i = 0, 1, \ldots, N - 1$$

$$m_1 = f(t_i, y_i)$$

$$m_2 = f(t_i + h,\ y_i + m_1 h)$$

$$y_{i+1} = y_i + \frac{h}{2}(m_1 + m_2)$$

We get

t_i    w_i
0.0    0.0
0.5    0.560211134
1.0    5.301489796
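The second-order family can be coded once and specialised through its parameters. The sketch below is illustrative: the names `rk2_step` and `rk2_solve` are hypothetical, and it assumes $b_1 = a_1$ and $w_1 = 1 - w_2$, so that a valid method only needs $w_2 a_1 = 1/2$. The Modified Euler method corresponds to `w2=0.5, a1=1.0`.

```python
import math


def rk2_step(f, t, y, h, w2, a1):
    """One step of the second-order family with w1 = 1 - w2 and b1 = a1."""
    w1 = 1.0 - w2
    m1 = f(t, y)
    m2 = f(t + a1 * h, y + a1 * m1 * h)
    return y + (w1 * m1 + w2 * m2) * h


def rk2_solve(f, a, b, alpha, n, w2, a1):
    """Apply n steps of size h = (b - a)/n and return all approximations."""
    h = (b - a) / n
    ys = [alpha]
    for i in range(n):
        ys.append(rk2_step(f, a + i * h, ys[-1], h, w2, a1))
    return ys


f = lambda t, y: t * math.exp(3 * t) - 2 * y
modified_euler = rk2_solve(f, 0.0, 1.0, 0.0, 2, w2=0.5, a1=1.0)
print(modified_euler)
```

Passing `w2=1.0, a1=0.5` gives the Midpoint method and `w2=0.75, a1=2/3` gives Heun's method with the same code.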


Example: We want to show that the Midpoint method, the Modified Euler method, and Heun's method give the same approximations to the initial-value problem

$$y' = -y + t + 1, \quad 0 \le t \le 1, \quad y(0) = 1$$

Here $f_t = 1$ and $f_y = -1$, and because $f$ is linear in $t$ and $y$ the expansion of $m_2$ has no higher-order terms. Hence

$$y_{i+1} = y_i + (w_1 + w_2) f h + w_2 (a_1 f_t + b_1 m_1 f_y)h^2 = y_i + f h + w_2 (a_1 - b_1 m_1)h^2$$

with $w_1 + w_2 = 1$.

– Midpoint method: $w_2 = 1$, $a_1 = b_1 = \frac{1}{2}$, so

$$y_{i+1} = y_i + f h + \frac{h^2}{2}(1 - m_1)$$

– Modified Euler method: $w_2 = \frac{1}{2}$, $a_1 = b_1 = 1$, so

$$y_{i+1} = y_i + f h + \frac{h^2}{2}(1 - m_1)$$

– Heun's method: $w_2 = \frac{3}{4}$, $a_1 = b_1 = \frac{2}{3}$, so

$$y_{i+1} = y_i + f h + \frac{h^2}{2}(1 - m_1)$$

Therefore, all three methods give the same approximations.

Example: We want to find $y(0.2)$ by using the fourth-order Runge-Kutta method for

$$y' = 1 + y^2, \quad y(0) = 0$$

We take $t_0 = 0$, $y_0 = 0$, and $h = 0.2$, and we get

$$m_1 = f(t_0, y_0) = 1$$

$$m_2 = f(t_0 + h/2,\ y_0 + h m_1/2) = 1.01000000$$

$$m_3 = f(t_0 + h/2,\ y_0 + h m_2/2) = 1.01020100$$

$$m_4 = f(t_0 + h,\ y_0 + m_3 h) = 1.040820242$$

$$y(0.2) = y_0 + h(m_1 + 2m_2 + 2m_3 + m_4)/6 = 0.2027074080$$
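A single fourth-order Runge-Kutta step can be sketched as below. The function name `rk4_step` is hypothetical; the problem $y' = 1 + y^2$, $y(0) = 0$ has exact solution $\tan t$, which is used here only for comparison.

```python
import math


def rk4_step(f, t, y, h):
    """One step of the classical fourth-order Runge-Kutta method (7.24)."""
    m1 = f(t, y)
    m2 = f(t + h / 2, y + m1 * h / 2)
    m3 = f(t + h / 2, y + m2 * h / 2)
    m4 = f(t + h, y + m3 * h)
    return y + (m1 + 2 * m2 + 2 * m3 + m4) * h / 6


y1 = rk4_step(lambda t, y: 1 + y * y, 0.0, 0.0, 0.2)
print(y1, math.tan(0.2), abs(y1 - math.tan(0.2)))
```

Four evaluations of $f$ per step are needed, but no derivatives of $f$ have to be worked out by hand.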


7.6

Predictor Corrector Methods

The previous methods (Euler, Heun, Taylor, and Runge-Kutta) are called one-step methods because only the value $y_i$ at the beginning of the interval is required. They use information from one previous point to compute the next point; that is, $y_i$ is needed to compute $y_{i+1}$. Multistep methods make use of information about the solution at more than one point. A desirable feature of a multistep method is that the local truncation error can be estimated and a correction term can be included, which improves the accuracy of the answer at each step.

Definition: An $m$-step multistep method for solving the initial-value problem

$$y' = f(t, y), \quad a \le t \le b, \quad y(a) = \alpha \tag{7.25}$$

has a difference equation for finding the approximation $w_{i+1}$ at the mesh point $t_{i+1}$ represented by the following equation, where the integer $m > 1$:

$$w_{i+1} = a_{m-1}w_i + a_{m-2}w_{i-1} + \ldots + a_0 w_{i+1-m} + h\left[b_m f(t_{i+1}, w_{i+1}) + b_{m-1} f(t_i, w_i) + \ldots + b_0 f(t_{i+1-m}, w_{i+1-m})\right] \tag{7.26}$$

for $i = m - 1, \ldots, N - 1$, where $h = (b - a)/N$, the $a_0, a_1, \ldots, a_{m-1}$ and $b_0, \ldots, b_m$ are constants, and the starting values $w_0 = \alpha, w_1 = \alpha_1, \ldots, w_{m-1} = \alpha_{m-1}$ are specified. When $b_m = 0$ the method is called explicit, or open, since in eq. (7.26) $w_{i+1}$ is given explicitly in terms of previously determined values. When $b_m \ne 0$ the method is called implicit, or closed, since $w_{i+1}$ occurs on both sides.

Euler's method gives

$$y_{i+1} = y_i + h f(t_0 + ih, y_i), \quad i = 0, 1, \ldots \tag{7.27}$$

The modified Euler method can be written as

$$y_{i+1} = y_i + \frac{h}{2}\left[f(t_i, y_i) + f(t_{i+1}, y_{i+1})\right] \tag{7.28}$$

The value of $y_{i+1}$ is first estimated by Euler's method, eq. (7.27), and then used in the right-hand side of the modified Euler method, eq. (7.28), giving a better approximation of $y_{i+1}$. This value of $y_{i+1}$ is again substituted into eq. (7.28) to find a still better approximation of $y_{i+1}$. The procedure is repeated until two consecutive iterated values of $y_{i+1}$ agree. Equation (7.27) is therefore called the predictor, while eq. (7.28) is called the corrector.

We will describe only the multistep method called the Adams-Bashforth-Moulton method. It can be derived from the fundamental theorem of calculus:

$$y_{i+1} = y_i + \int_{t_i}^{t_{i+1}} f(t, y)\,dt \tag{7.29}$$

The predictor uses the Lagrange polynomial approximation for $f(t, y)$ based on the four points $t_{i-3}$, $t_{i-2}$, $t_{i-1}$, and $t_i$. Writing $f_j = f(t_j, y_j)$,

$$P_4(t) = f_{i-3}\frac{(t - t_{i-2})(t - t_{i-1})(t - t_i)}{(t_{i-3} - t_{i-2})(t_{i-3} - t_{i-1})(t_{i-3} - t_i)} + f_{i-2}\frac{(t - t_{i-3})(t - t_{i-1})(t - t_i)}{(t_{i-2} - t_{i-3})(t_{i-2} - t_{i-1})(t_{i-2} - t_i)}$$

$$\qquad + f_{i-1}\frac{(t - t_{i-3})(t - t_{i-2})(t - t_i)}{(t_{i-1} - t_{i-3})(t_{i-1} - t_{i-2})(t_{i-1} - t_i)} + f_i\frac{(t - t_{i-3})(t - t_{i-2})(t - t_{i-1})}{(t_i - t_{i-3})(t_i - t_{i-2})(t_i - t_{i-1})}$$

It is integrated over the interval $[t_i, t_{i+1}]$:

$$\int_{t_i}^{t_{i+1}} P_4(t)\,dt = \frac{h}{24}\left(55f_i - 59f_{i-1} + 37f_{i-2} - 9f_{i-3}\right) \tag{7.30}$$

Then, we can write

$$y_{i+1} = y_i + \frac{h}{24}\left(55f_i - 59f_{i-1} + 37f_{i-2} - 9f_{i-3}\right) \tag{7.31}$$

This last equation is called the Adams-Bashforth predictor. Note that extrapolation is used here. The corrector is developed in a similar way. A second Lagrange polynomial for $f(t, y)$ is constructed based on the four points $(t_{i-2}, y_{i-2})$, $(t_{i-1}, y_{i-1})$, $(t_i, y_i)$, and the new point $(t_{i+1}, y_{i+1})$ just calculated by eq. (7.31):

$$P_4(t) = f_{i-2}\frac{(t - t_{i-1})(t - t_i)(t - t_{i+1})}{(t_{i-2} - t_{i-1})(t_{i-2} - t_i)(t_{i-2} - t_{i+1})} + f_{i-1}\frac{(t - t_{i-2})(t - t_i)(t - t_{i+1})}{(t_{i-1} - t_{i-2})(t_{i-1} - t_i)(t_{i-1} - t_{i+1})}$$

$$\qquad + f_i\frac{(t - t_{i-2})(t - t_{i-1})(t - t_{i+1})}{(t_i - t_{i-2})(t_i - t_{i-1})(t_i - t_{i+1})} + f_{i+1}\frac{(t - t_{i-2})(t - t_{i-1})(t - t_i)}{(t_{i+1} - t_{i-2})(t_{i+1} - t_{i-1})(t_{i+1} - t_i)}$$

It is integrated over the interval $[t_i, t_{i+1}]$:

$$\int_{t_i}^{t_{i+1}} P_4(t)\,dt = \frac{h}{24}\left(9f_{i+1} + 19f_i - 5f_{i-1} + f_{i-2}\right) \tag{7.32}$$

which gives the Adams-Moulton corrector:

$$y_{i+1} = y_i + \frac{h}{24}\left(9f_{i+1} + 19f_i - 5f_{i-1} + f_{i-2}\right) \tag{7.33}$$

We have to repeat the last equation, the corrector, until we obtain the needed accuracy.

Algorithm

– COMPUTE THE FIRST FOUR STARTING VALUES WITH THE RUNGE-KUTTA METHOD

– COMPUTE $y_{i+1}^{(0)}$ USING THE FORMULA

$$y_{i+1}^{(0)} = y_i + \frac{h}{24}\left[55f_i - 59f_{i-1} + 37f_{i-2} - 9f_{i-3}\right] \tag{7.34}$$

– COMPUTE $f_{i+1}^{(0)} = f(x_{i+1}, y_{i+1}^{(0)})$

– COMPUTE $y_{i+1}^{(k)}$ FROM THE EQUATION

$$y_{i+1}^{(k)} = y_i + \frac{h}{24}\left[9f(x_{i+1}, y_{i+1}^{(k-1)}) + 19f_i - 5f_{i-1} + f_{i-2}\right] \tag{7.35}$$

– ITERATE ON $k$ UNTIL

$$\left|\frac{y_{i+1}^{(k)} - y_{i+1}^{(k-1)}}{y_{i+1}^{(k)}}\right| < \epsilon \tag{7.36}$$

Example: Consider the initial-value problem

$$y' = 1 + y^2, \quad 0 \le t \le 0.8, \quad y(0) = 0$$

The first step is to calculate the four starting values $w_0$, $w_1$, $w_2$, and $w_3$. To do this we can use, for example, the fourth-order Runge-Kutta method with $t_0 = 0$, $w_0 = 0$, and $h = 0.2$. We get

$$k_1 = h f(t_0, w_0)$$

$$k_2 = h f(t_0 + h/2,\ w_0 + k_1/2)$$

$$k_3 = h f(t_0 + h/2,\ w_0 + k_2/2)$$

$$k_4 = h f(t_0 + h,\ w_0 + k_3)$$

$$w_1 = w_0 + (k_1 + 2k_2 + 2k_3 + k_4)/6 = 0.2027074081$$

In the same way we can get

$$w_2 = 0.4227889928, \qquad w_3 = 0.6841334020$$

with $t_1 = 0.2$, $t_2 = 0.4$, $t_3 = 0.6$, $t_4 = 0.8$. So, we get the predictor

$$w_4^{(0)} = w_3 + \frac{h}{24}\left[55f(t_3, w_3) - 59f(t_2, w_2) + 37f(t_1, w_1) - 9f(t_0, w_0)\right] = 1.023434882$$

Now, we can correct the predicted value using the formula

$$w_4^{(1)} = w_3 + \frac{h}{24}\left[9f(t_4, w_4^{(0)}) + 19f(t_3, w_3) - 5f(t_2, w_2) + f(t_1, w_1)\right] = 1.029690402$$

$$w_4^{(2)} = w_3 + \frac{h}{24}\left[9f(t_4, w_4^{(1)}) + 19f(t_3, w_3) - 5f(t_2, w_2) + f(t_1, w_1)\right] = 1.030653654$$

$$w_4^{(3)} = w_3 + \frac{h}{24}\left[9f(t_4, w_4^{(2)}) + 19f(t_3, w_3) - 5f(t_2, w_2) + f(t_1, w_1)\right] = 1.0308025$$

So, after three corrector iterations the predictor-corrector method gives the approximate solution 1.0308025. The actual solution of the ODE is

$$y(t) = \tan(t), \qquad y(0.8) = 1.029638557$$

The errors are

$$|w_4^{(3)} - \tan(0.8)| = 0.0011639$$

$$|w_4^{(0)} - \tan(0.8)| = 0.006203675$$

**Chapter 8 Direct Methods for Solving Linear Systems**

A linear system has one of the following properties:

– Unique solution
– No solution
– No unique solution
– Ill conditioned

Consider the following linear system:

$$
\begin{aligned}
x - 2y &= -2 \\
0.45x - 0.91y &= -1
\end{aligned}
$$

This system is ill conditioned because the two equations represent two nearly parallel lines.

Definition: An $n \times m$ matrix can be represented by

$$
A = (a_{ij}) =
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1m} \\
a_{21} & a_{22} & \cdots & a_{2m} \\
\vdots & \vdots & & \vdots \\
a_{n1} & a_{n2} & \cdots & a_{nm}
\end{bmatrix}
$$
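To make the ill-conditioning of the system $x - 2y = -2$, $0.45x - 0.91y = -1$ concrete, the following sketch solves it twice: once as given and once with the coefficient $-0.91$ changed to $-0.92$. The helper `solve_2x2` (Cramer's rule) and the choice of perturbation are assumptions made for illustration only.

```python
def solve_2x2(a11, a12, a21, a22, b1, b2):
    """Solve a 2x2 linear system by Cramer's rule; returns (x, y)."""
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ZeroDivisionError("singular system")
    x = (b1 * a22 - a12 * b2) / det
    y = (a11 * b2 - a21 * b1) / det
    return x, y

# System as given in the notes
x, y = solve_2x2(1, -2, 0.45, -0.91, -2, -1)

# Same system with one coefficient changed by 0.01 (illustrative perturbation)
xp, yp = solve_2x2(1, -2, 0.45, -0.92, -2, -1)

print("original :", x, y)
print("perturbed:", xp, yp)
```

A change of about one percent in a single coefficient moves the solution from roughly $(18, 10)$ to roughly $(8, 5)$, which is the behaviour the term "ill conditioned" describes.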

Linear equations can be represented by a matrix. Solving linear equations can be done with three main row operations:

– Row $E_i$ can be multiplied by any nonzero constant $\lambda$.
– Row $E_j$ can be multiplied by $\lambda$ and added to row $E_i$.
– Rows $E_i$ and $E_j$ can be interchanged.

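Example: the augmented matrix

$$
\left[\begin{array}{rrrr|r}
1 & 1 & 0 & 3 & 4 \\
2 & 1 & -1 & 1 & 1 \\
3 & -1 & -1 & 2 & -3 \\
-1 & 2 & 3 & -1 & 4
\end{array}\right]
$$

becomes

$$
\left[\begin{array}{rrrr|r}
1 & 1 & 0 & 3 & 4 \\
2 & 1 & -1 & 1 & 1 \\
3 & -1 & -1 & 2 & -3 \\
-1 & 2 & 3 & -1 & 4
\end{array}\right]
\rightarrow
\left[\begin{array}{rrrr|r}
1 & 1 & 0 & 3 & 4 \\
0 & -1 & -1 & -5 & -7 \\
0 & -4 & -1 & -7 & -15 \\
0 & 3 & 3 & 2 & 8
\end{array}\right]
\rightarrow
\left[\begin{array}{rrrr|r}
1 & 1 & 0 & 3 & 4 \\
0 & -1 & -1 & -5 & -7 \\
0 & 0 & 3 & 13 & 13 \\
0 & 0 & 0 & -13 & -13
\end{array}\right]
$$

The matrix becomes a triangular matrix. It is then possible to solve the linear equations by a backward-substitution process.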

8.1 Gaussian Elimination

In general, a matrix $A$ of dimension $n \times n$ can be reduced to triangular form by elementary operations.

Example: Consider the following system

$$
\left[\begin{array}{rrrr|r}
1 & 1 & 1 & 1 & 3 \\
2 & -1 & -1 & 2 & 12 \\
1 & 3 & -2 & -1 & -9 \\
-1 & -1 & 1 & 4 & 17
\end{array}\right]
$$

After elementary operations we get

$$
\left[\begin{array}{rrrr|r}
1 & 1 & 1 & 1 & 3 \\
0 & -3 & -3 & 0 & 6 \\
0 & 0 & -5 & -2 & -8 \\
0 & 0 & 0 & \frac{21}{5} & \frac{84}{5}
\end{array}\right]
$$

Now, the solution can be obtained with backward substitution:

$$
\begin{aligned}
\tfrac{21}{5}x_4 = \tfrac{84}{5} &\Rightarrow x_4 = 4 \\
-5x_3 - 2 \times 4 = -8 &\Rightarrow x_3 = 0 \\
-3x_2 - 3 \times 0 = 6 &\Rightarrow x_2 = -2 \\
x_1 - 2 + 0 + 4 = 3 &\Rightarrow x_1 = 1
\end{aligned}
$$

If the forward elimination gives the final row $0\ 0\ \ldots\ 0\ a_{nn} \mid b_n$ with $a_{nn} \neq 0$, which gives $a_{nn}x_n = b_n$, the original system has a unique solution, which is obtained by backward substitution.

If the final row is $0\ 0\ \ldots\ 0\ 0 \mid b_n$ where $b_n \neq 0$, the system has no solution.

If the final row is $0\ 0\ \ldots\ 0\ 0 \mid 0$, the system has infinitely many solutions.

There are a few strategies for implementing Gaussian elimination.
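The forward elimination, the backward substitution, and the three possible outcomes described above can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `gauss_solve` and its return convention are invented for illustration, exact rational arithmetic (`fractions.Fraction`) is used so that tests against zero are reliable, and rows are interchanged only when a pivot is exactly zero.

```python
from fractions import Fraction

def gauss_solve(aug):
    """Solve a square system given as an augmented matrix [A | b].

    Returns ("unique", x), ("none", None) or ("infinite", None).
    """
    n = len(aug)
    m = [[Fraction(v) for v in row] for row in aug]
    rank = 0
    for col in range(n):
        # Find a row at or below `rank` with a nonzero entry in this column.
        pivot = next((r for r in range(rank, n) if m[r][col] != 0), None)
        if pivot is None:
            continue                      # no pivot in this column
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, n):
            factor = m[r][col] / m[rank][col]
            m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    if rank < n:
        # Rows rank..n-1 now have all-zero coefficients.
        if any(m[r][n] != 0 for r in range(rank, n)):
            return "none", None           # a row 0 ... 0 | b with b != 0
        return "infinite", None           # only rows 0 ... 0 | 0 remain
    # Backward substitution on the triangular system
    x = [Fraction(0)] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return "unique", x

status, x = gauss_solve([[1, 1, 1, 1, 3],
                         [2, -1, -1, 2, 12],
                         [1, 3, -2, -1, -9],
                         [-1, -1, 1, 4, 17]])
print(status, [int(v) for v in x])
```

Using fractions avoids the round-off questions that pivoting strategies address; with floating-point input a tolerance-based zero test and a pivoting rule would normally be preferred.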

8.2 Pivoting Strategies

8.3 Matrix Inverse

8.4 Determinant of Matrix

8.5 Matrix Factorization

**Chapter 9 Iterative Methods for Solving Linear Systems**

9.1 Norms of Vectors and Matrices

9.2 Eigenvalues and Eigenvectors

9.3 Iterative Techniques for Solving Linear Systems

**Chapter 10 Some Useful Remarks**

10.1 Largest Possible Root

For a polynomial represented by

$$f(x) = a_n x^n + a_{n-1}x^{n-1} + \ldots + a_1 x + a_0 \tag{10.1}$$

the largest possible root is given by

$$x_l = -\frac{a_{n-1}}{a_n} \tag{10.2}$$

This value is taken as the initial approximation when no other value is given.

Search bracket: All real roots are within the interval

$$\left[-\sqrt{\left(\frac{a_{n-1}}{a_n}\right)^2 - 2\,\frac{a_{n-2}}{a_n}},\ \ \sqrt{\left(\frac{a_{n-1}}{a_n}\right)^2 - 2\,\frac{a_{n-2}}{a_n}}\,\right] \tag{10.3}$$

Another relationship suggests an interval for the roots. All real roots are within the interval

$$\left[-1 - \frac{1}{|a_n|}\max\{|a_0|, \ldots, |a_{n-1}|\},\ \ 1 + \frac{1}{|a_n|}\max\{|a_0|, \ldots, |a_{n-1}|\}\right] \tag{10.4}$$
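The estimates of this section can be evaluated with a few lines of code. The sketch below is illustrative only: the function name `root_bounds` and the coefficient ordering are assumptions, and the square-root bracket is reported as `None` when the quantity under the root is negative (that quantity equals the sum of the squares of the roots, so a negative value means that not all roots are real).

```python
import math

def root_bounds(coeffs):
    """coeffs = [a_n, a_(n-1), ..., a_0] with a_n != 0 and degree >= 2.

    Returns (x_l, r_sqrt, r_max):
      x_l    initial estimate -a_(n-1)/a_n, eq. (10.2)
      r_sqrt half-width of the bracket in eq. (10.3), or None
      r_max  half-width of the bracket in eq. (10.4)
    """
    an, an1, an2 = coeffs[0], coeffs[1], coeffs[2]
    x_l = -an1 / an
    radicand = (an1 / an) ** 2 - 2 * (an2 / an)
    r_sqrt = math.sqrt(radicand) if radicand >= 0 else None
    r_max = 1 + max(abs(c) for c in coeffs[1:]) / abs(an)
    return x_l, r_sqrt, r_max

# x^3 - 6x^2 + 11x - 6 = (x - 1)(x - 2)(x - 3)
x_l, r_sqrt, r_max = root_bounds([1, -6, 11, -6])
print(x_l, r_sqrt, r_max)
```

For this cubic the three roots 1, 2, 3 all lie inside both brackets, and the bracket from eq. (10.3) is the tighter of the two.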

10.2 Convergence of Bisection Method¹

In the bisection method, after $n$ iterations, the root must lie within $\frac{b-a}{2^n}$ of the estimate. This means that the error bound at the $n$th iteration is

$$E_n = \frac{b-a}{2^n} \tag{10.5}$$

Similarly,

$$E_{n+1} = \frac{b-a}{2^{n+1}} = \frac{E_n}{2} \tag{10.6}$$

That is, the error decreases linearly with each step by a factor of 1/2. Therefore, the bisection method is linearly convergent.
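The error bound can be checked numerically. The following sketch records each midpoint together with the bound $(b-a)/2^n$; the function name `bisect_with_bounds` and the test equation $x^2 - 5 = 0$ on $[2, 3]$ are choices made here for illustration.

```python
def bisect_with_bounds(f, a, b, n):
    """Run up to n bisection steps on [a, b].

    Returns a list of (midpoint, error bound) pairs, where the bound
    after k steps is (b - a) / 2**k for the original interval [a, b].
    """
    if f(a) * f(b) > 0:
        raise ValueError("f(a) and f(b) must have opposite signs")
    width = b - a
    history = []
    for k in range(1, n + 1):
        p = (a + b) / 2
        history.append((p, width / 2 ** k))
        if f(p) == 0:
            break                      # landed exactly on the root
        if f(a) * f(p) < 0:
            b = p                      # root is in the left half
        else:
            a = p                      # root is in the right half
    return history

root = 5 ** 0.5
for p, bound in bisect_with_bounds(lambda x: x * x - 5, 2.0, 3.0, 10):
    print(f"p = {p:.7f}   |p - root| = {abs(p - root):.2e}   bound = {bound:.2e}")
```

The actual error is not monotone from step to step, but it never exceeds the bound, and the bound itself is halved each time.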

¹ Numerical Methods, E. Balagurusamy

10.3 Convergence of False Position Method

The false position formula is based on the linear interpolation model. One of the starting points is fixed while the other moves towards the solution. Assume that the initial points bracketing the solution are $a$ and $b$, that $a$ moves towards the solution, and that $b$ is fixed. Let $p_0 = a$ and let $p$ be the solution. Then

$$E_0 = p - p_0, \qquad E_1 = p - p_1 \tag{10.7}$$

that is,

$$E_i = p - p_i \tag{10.8}$$

It can be shown that..

10.4 Convergence of Newton-Raphson Method

Let $p_n$ be an estimate of a root of the function $f(x) = 0$. If $p_n$ and $p_{n+1}$ are close to each other, then we can use the Taylor series expansion

$$f(p_{n+1}) = f(p_n) + (p_{n+1} - p_n)f'(p_n) + \frac{f''(\xi)}{2}(p_{n+1} - p_n)^2 \tag{10.9}$$

where $\xi$ lies between $p_n$ and $p_{n+1}$. Let us assume that the exact root is $p$. Then, with $\xi$ now between $p_n$ and $p$,

$$0 = f(p_n) + (p - p_n)f'(p_n) + \frac{f''(\xi)}{2}(p - p_n)^2 \tag{10.10}$$

Assume that $f'(p_n) \neq 0$. From the Newton-Raphson formula we have

$$p_{n+1} = p_n - \frac{f(p_n)}{f'(p_n)} \;\Rightarrow\; f(p_n) = (p_n - p_{n+1})f'(p_n) \tag{10.11}$$

Substituting this for $f(p_n)$ in eq. (10.10) yields

$$0 = (p - p_{n+1})f'(p_n) + \frac{f''(\xi)}{2}(p - p_n)^2 \tag{10.12}$$

The error in the estimate $p_{n+1}$ is given by

$$E_{n+1} = p - p_{n+1}, \qquad E_n = p - p_n$$

Now, eq. (10.12) can be written as

$$0 = E_{n+1}f'(p_n) + \frac{f''(\xi)}{2}E_n^2 \tag{10.13}$$

which leads to

$$E_{n+1} = -\frac{f''(\xi)}{2f'(p_n)}E_n^2 \tag{10.14}$$

The last equation shows that the error is roughly proportional to the square of the error in the previous iteration. Thus, the Newton-Raphson method is quadratically convergent.

The Newton-Raphson method has certain limitations. The method may fail in the following situations:

1. If $f'(p_n) = 0$.
2. If the initial guess is too far away from the required root, the process may converge to some other root.
3. A particular value in the iteration sequence may repeat, resulting in an infinite loop. This occurs when the tangent to the curve $f(x)$ at $p_{n+1}$ cuts the x-axis again at $p_n$.
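The quadratic behaviour of the error can be observed numerically. The sketch below is an illustration under assumptions chosen here: the function name `newton`, the test equation $x^2 - 5 = 0$, and the starting value $p_0 = 3$. For this $f$ we have $f'' = 2$ and $f'(p_n) = 2p_n$, so the ratio $E_{n+1}/E_n^2$ should equal $-1/(2p_n)$.

```python
def newton(f, df, p0, steps):
    """Return the Newton-Raphson iterates [p0, p1, ..., p_steps]."""
    ps = [p0]
    for _ in range(steps):
        p = ps[-1]
        slope = df(p)
        if slope == 0:
            # Limitation 1 in the list above: the tangent is horizontal.
            raise ZeroDivisionError("f'(p_n) = 0, Newton step undefined")
        ps.append(p - f(p) / slope)
    return ps

root = 5 ** 0.5
ps = newton(lambda x: x * x - 5, lambda x: 2 * x, 3.0, 3)
errors = [root - p for p in ps]          # E_n = p - p_n
for n in range(3):
    ratio = errors[n + 1] / errors[n] ** 2
    print(f"n = {n}   E_n = {errors[n]: .3e}   E_(n+1)/E_n^2 = {ratio: .6f}")
```

Only the first few iterations are used because the error soon reaches the level of floating-point round-off, after which the ratio is no longer meaningful.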

10.5 Convergence of Secant Method

The secant method uses two initial estimates but does not require that they bracket the root. The secant iteration formula is

$$p_{n+1} = p_n - f(p_n)\,\frac{p_n - p_{n-1}}{f(p_n) - f(p_{n-1})} \tag{10.15}$$

Let $p$ be the actual root of $f(x)$ and $E_n$ the error in the estimate $p_n$. Then

$$p_i = E_i + p, \qquad \text{for } i = n-1,\ n,\ n+1 \tag{10.16}$$

Substituting in eq. (10.15) and simplifying, we get

$$E_{n+1} = \frac{E_{n-1}f(p_n) - E_n f(p_{n-1})}{f(p_n) - f(p_{n-1})} \tag{10.17}$$

According to the mean value theorem, there exists at least one point $\xi_n$ between $p_n$ and $p$ such that

$$f'(\xi_n) = \frac{f(p_n) - f(p)}{p_n - p} = \frac{f(p_n)}{p_n - p} = \frac{f(p_n)}{E_n} \;\Rightarrow\; f(p_n) = E_n f'(\xi_n) \tag{10.18}$$

Similarly,

$$f(p_{n-1}) = E_{n-1}f'(\xi_{n-1}) \tag{10.19}$$

Equation (10.17) becomes

$$E_{n+1} = E_n E_{n-1}\,\frac{f'(\xi_n) - f'(\xi_{n-1})}{f(p_n) - f(p_{n-1})} \tag{10.20}$$

That is,

$$E_{n+1} \propto E_n E_{n-1} \tag{10.21}$$

Let us now find the order of convergence of this iteration process. If the order of convergence is $\alpha$, then

$$E_{n+1} \propto E_n^{\alpha} \tag{10.22}$$

So, from equation (10.21), we can write

$$
\begin{aligned}
E_{n+1} \propto E_n E_{n-1} &\Rightarrow E_n^{\alpha} \propto E_n E_{n-1} \\
&\Rightarrow E_n^{\alpha} \propto E_{n-1}^{\alpha}E_{n-1} \\
&\Rightarrow E_n^{\alpha} \propto E_{n-1}^{\alpha+1} \\
&\Rightarrow E_n \propto E_{n-1}^{(\alpha+1)/\alpha} \\
&\Rightarrow E_{n-1}^{\alpha} \propto E_{n-1}^{(\alpha+1)/\alpha}
\end{aligned}
$$

This means that

$$\frac{\alpha+1}{\alpha} = \alpha \;\Rightarrow\; \alpha = \frac{1 \pm \sqrt{5}}{2} \tag{10.23}$$

Since $\alpha$ is always positive, the order of convergence of the secant method is $\alpha = 1.618$, and the convergence is referred to as superlinear convergence.
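The relation $E_{n+1} \propto E_n E_{n-1}$ that drives this derivation can be observed numerically. The sketch below uses assumptions chosen for illustration: the function name `secant`, the test equation $x^2 - 5 = 0$, and the starting values $p_0 = 2$, $p_1 = 3$. For this particular $f$, the ratio $E_{n+1}/(E_n E_{n-1})$ equals $1/(p_n + p_{n-1})$, which tends to $1/(2\sqrt{5})$.

```python
import math

def secant(f, p0, p1, steps):
    """Return the secant iterates [p0, p1, p2, ...], eq. (10.15)."""
    ps = [p0, p1]
    for _ in range(steps):
        a, b = ps[-2], ps[-1]
        fa, fb = f(a), f(b)
        if fb == fa:
            break                      # secant line is horizontal
        ps.append(b - fb * (b - a) / (fb - fa))
    return ps

# Positive root of alpha**2 = alpha + 1, eq. (10.23)
ALPHA = (1 + math.sqrt(5)) / 2

root = math.sqrt(5)
ps = secant(lambda x: x * x - 5, 2.0, 3.0, 4)
errors = [p - root for p in ps]          # E_n = p_n - p
for n in range(1, len(ps) - 1):
    ratio = errors[n + 1] / (errors[n] * errors[n - 1])
    print(f"n = {n}   E_(n+1)/(E_n E_(n-1)) = {ratio:.6f}")
print("limit 1/(2*sqrt(5)) =", 1 / (2 * root), "  alpha =", ALPHA)
```

The ratio settles near a constant after a few steps, which is the content of eq. (10.21); the order $\alpha$ then follows from the algebra above rather than from the iterates directly.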

10.6 Convergence of Fixed Point Method

Chapter 11 Exams

11.1 exam 1

Answer all questions.

1. Let $x = 0.456 \times 10^{-2}$, $y = 0.134$, and $z = 0.920$. Use three-digit rounding arithmetic to evaluate:
(a) $(x + y) + z$.
(b) $x + (y + z)$.
Is the associative law for addition satisfied? Explain your answer. (10 Marks)

2. Evaluate the polynomial $P(x) = x^3 - 6x^2 + 11x - 5$ at $p = 2$ using Horner's Theorem. (6 Marks)

3. We want to evaluate the square root of 5 using the equation $x^2 - 5 = 0$ by applying the fixed-point iteration algorithm.
(a) Use algebraic manipulation to show that $g(x) = \frac{1}{2}x + \frac{5}{2x}$ has a fixed point exactly when $x^2 - 5 = 0$.
(b) Use the fixed-point theorem to show that the function $g(x)$ converges to the unique fixed point for any initial $p_0 \in [2, 5]$.
(c) Use $p_0 = 3$ to evaluate $p_2$.
(8 Marks)

4. (a) Evaluate exactly the integral $\int_0^4 e^x\,dx$.
(b) Find an approximation to $\int_0^4 e^x\,dx$ using Simpson's rule with $h = 2$.
(c) Find an approximation to $\int_0^4 e^x\,dx$ using the Composite Simpson's rule with $h = 1$.
(d) Does the Composite Simpson's rule improve the approximation?
(8 Marks)

5. Given the equation $y' = 3x^2 + 1$, with $y(1) = 2$, estimate $y(2)$ by Euler's method with $h = 0.25$. (8 Marks)

END

Useful Formulas and Theorems

Horner's Theorem: Let

$$P(x) = a_n x^n + a_{n-1}x^{n-1} + \ldots + a_1 x + a_0 \tag{11.1}$$

If $b_n = a_n$ and

$$b_k = a_k + b_{k+1}x_0, \qquad \text{for } k = n-1,\ n-2,\ \ldots,\ 1,\ 0 \tag{11.2}$$

then $b_0 = P(x_0)$.

Fixed-Point Theorem: Let $g \in C[a, b]$ be such that $g(x) \in [a, b]$ for all $x$ in $[a, b]$. Suppose, in addition, that $g'$ exists on $(a, b)$ and that a constant $0 < k < 1$ exists with

$$|g'(x)| \le k, \qquad \text{for all } x \in (a, b).$$

Then, for any number $p_0$ in $[a, b]$, the sequence defined by

$$p_n = g(p_{n-1}), \qquad n \ge 1,$$

converges to the unique fixed point $p$ in $[a, b]$.

Euler's method: To approximate the solution of the initial-value problem $\frac{dy}{dt} = f(t, y)$, $a \le t \le b$, $y(a) = \alpha$ at $(N+1)$ equally spaced numbers in the interval $[a, b]$, we construct the approximations $w_i \approx y(t_i)$ with

$$w_0 = \alpha, \quad t_0 = a, \quad w_{i+1} = w_i + hf(t_i, w_i), \quad t_i = a + ih, \qquad \text{for } i = 0, 1, \ldots, N-1,$$

where $h = (b - a)/N$.

11.2 exam 2

Answer all questions.

1. (a) Evaluate $f(x) = x^3 - 6.1x^2 + 3.2x + 1.5$ at $x = 4.71$ using three-digit rounding arithmetic.
(b) Find the relative error in (a).
(c) Use Horner's Theorem to evaluate $f(x)$ at $x = 4.71$ using three-digit rounding arithmetic.
(d) Find the relative error in (c).
(e) Why is the relative error in (c) less than the relative error in (a)?
(10 Marks)

2. Let $f(x) = -x^3 - \cos x$ and $p_0 = -1$. Use Newton's method to find $p_2$. (6 Marks)

3. (a) Let $A$ be a given positive constant and $g(x) = 2x - Ax^2$. Show that if fixed-point iteration converges to a nonzero limit, then the limit is $p = 1/A$, so the reciprocal of a number can be found using only multiplications and subtractions.
(b) Use fixed-point iteration with $p_0 = 0.1$ to find $p_2$ that approximates $\frac{1}{11}$.
(8 Marks)

4. Use the forward-difference and backward-difference formulas to determine each of the missing entries in the following table:

| $x$ | $f(x)$ | $f'(x)$ |
|-----|--------|---------|
| 1.0 | 1.0000 | ....    |
| 1.2 | 1.2625 | ....    |
| 1.4 | 1.6595 | ....    |

(8 Marks)

5. Use Euler's method to approximate the solution for the following initial-value problem: $y' = e^{t-y}$, $0 \le t \le 1$, $y(0) = 1$, with $h = 0.5$. (8 Marks)

END

Useful Formulas and Theorems

Horner's Theorem: Let

$$P(x) = a_n x^n + a_{n-1}x^{n-1} + \ldots + a_1 x + a_0 \tag{11.3}$$

If $b_n = a_n$ and

$$b_k = a_k + b_{k+1}x_0, \qquad \text{for } k = n-1,\ n-2,\ \ldots,\ 1,\ 0 \tag{11.4}$$

then $b_0 = P(x_0)$.

Fixed-Point Theorem: Let $g \in C[a, b]$ be such that $g(x) \in [a, b]$ for all $x$ in $[a, b]$. Suppose, in addition, that $g'$ exists on $(a, b)$ and that a constant $0 < k < 1$ exists with

$$|g'(x)| \le k, \qquad \text{for all } x \in (a, b).$$

Then, for any number $p_0$ in $[a, b]$, the sequence defined by

$$p_n = g(p_{n-1}), \qquad n \ge 1,$$

converges to the unique fixed point $p$ in $[a, b]$.

Euler's method: To approximate the solution of the initial-value problem $\frac{dy}{dt} = f(t, y)$, $a \le t \le b$, $y(a) = \alpha$ at $(N+1)$ equally spaced numbers in the interval $[a, b]$, we construct the approximations $w_i \approx y(t_i)$ with

$$w_0 = \alpha, \quad t_0 = a, \quad w_{i+1} = w_i + hf(t_i, w_i), \quad t_i = a + ih, \qquad \text{for } i = 0, 1, \ldots, N-1,$$

where $h = (b - a)/N$.

11.3 exam 3

Answer all questions.

1. Let

$$f(x) = \frac{e^x - e^{-x}}{x}$$

(a) Find $\lim_{x \to 0} f(x)$.
(b) Use three-digit rounding arithmetic to evaluate $f(0.100)$.
(c) The actual value is $f(0.100) = 2.003335000$. Find the relative error for the value obtained in (b).
(8 Marks)

2. (a) Find the actual value of $\int_{-1}^{1} e^x\,dx$.
(b) Use Simpson's rule to approximate the integral in (a).
(c) Find the maximum error from Simpson's formula.
(8 Marks)

3. Use the forward-difference and backward-difference formulas to determine each of the missing entries in the following table:

| $x$ | $f(x)$ | $f'(x)$ |
|-----|--------|---------|
| 1.0 | 1.0000 | ....    |
| 1.2 | 1.2625 | ....    |
| 1.4 | 1.6595 | ....    |

(8 Marks)

4. (a) Find the actual value of $\int_0^2 \frac{1}{x+4}\,dx$.
(b) Use the Trapezoidal rule to approximate $\int_0^2 \frac{1}{x+4}\,dx$ and find the actual error.
(c) Determine the values of $n$ and $h$ required for the Composite Trapezoidal rule to approximate $\int_0^2 \frac{1}{x+4}\,dx$ to within $10^{-6}$.
(8 Marks)

5. Use Euler's method to approximate the solution for the following initial-value problem: $y' = e^{t-y}$, $0 \le t \le 1$, $y(0) = 1$, with $h = 0.5$. (8 Marks)

END

Useful Formulas and Theorems

Composite Trapezoidal rule: Let $f \in C^2[a, b]$, $h = (b - a)/n$, and $x_j = a + hj$, for each $j = 0, 1, \ldots, n$. There exists a $\mu \in (a, b)$ for which the composite trapezoidal rule for $n$ subintervals can be written as

$$\int_a^b f(x)\,dx = \frac{h}{2}\left[f(a) + 2\sum_{j=1}^{n-1} f(x_j) + f(b)\right] - \frac{b-a}{12}h^2 f''(\mu) \tag{11.5}$$

Simpson's rule: With nodes at $x_0 = a$, $x_1 = a + h$, $x_2 = b$, where $h = (b - a)/2$, Simpson's rule is

$$\int_{x_0}^{x_2} f(x)\,dx = \frac{h}{3}\left[f(x_0) + 4f(x_1) + f(x_2)\right] - \frac{h^5}{90}f^{(4)}(\xi) \tag{11.6}$$

Euler's method: To approximate the solution of the initial-value problem $\frac{dy}{dt} = f(t, y)$, $a \le t \le b$, $y(a) = \alpha$ at $(N+1)$ equally spaced numbers in the interval $[a, b]$, we construct the approximations $w_i \approx y(t_i)$ with

$$w_0 = \alpha, \quad t_0 = a, \quad w_{i+1} = w_i + hf(t_i, w_i), \quad t_i = a + ih, \qquad \text{for } i = 0, 1, \ldots, N-1,$$

where $h = (b - a)/N$.