You are on page 1of 7

Chapter 1 Basic Concepts and Introduction

1 Basic Concepts and Introduction


three contents:

• What is numerical methods? Classification of Error;

• Measures of error;

• Avoid the loss of accuracy.

1.1 What is numerical methods?Classification of Error;


Numerical method is the research of solving problems with computers which is a critical
part of scientific computing.
1. The process of solving a practical problem scientifically is:

Practical problem
Omiting secondary factors, modelling error

Mathematical model whose exact solution is difficult to be solved
↓Simplifying, truncation error
Numerical problem that is solved with computer easily
↓round-off error
Making procedure to get the nearby solution

Taking the nearby solution into the practical experiment to check

For the whole computational process, unfortunately, very few mathematical model can
be solved to get the exact solutions. Thus, numerical solution, approximation to the exact
solutions should be considered to replace the exact solutions. They can be found using
numerical algorithms and procedure with computer.
Computational methods involve the study development of algorithms and making pro-
cedure for obtaining numerical solutions to various mathematical problems. Frequently
computational methods are called the mathematics of scientific computing.
A major advantage for computational methods are that a numerical solver can be
obtained when a problem has no analytic solution.
It is important to realize that the result from numerical methods is an approximation,
which means there is a difference between the exact solution and the nearby solution,

1
called Error. The classification of errors depends on the source of the error. Assume that
an exact solution is x, a nearby solution x̃, approximating x. The error is the difference
between them, ∆x = x − x̃. In this course, we focus on the truncation error and round-off
error. For example, Ex1.1

1 3 1
sin(x) + M → sin(x) = x − x + R → sin(x) ≈ x − x3 → procedure,
3! 3!
where M is the modelling error, R is the truncation error. Then let’s introduce to the
round-off error. In our traditional mathematical world we permit numbers with an infinite
number of digits. However, in a computer, only a relatively small subset of the real number
system is represented and used. For those real numbers which can not be represented
and used by the computer, the calculator or computer will replace it by one represented
number, called round-off. For example-the storage of number in computer. For a 4 digits
computer,
(1) Storage: the real number x = 123.45678 is represented by x̃ = 0.1234 × 103 . We
have missed some precisions.

(2) Arithmetics: Given two numbers a = 0.1234 × 103 and b = 0.4567 × 10−1 , calculate
the sum of them,

a + b = 0.1234 × 103 + 0.4567 × 10−1


= 0.1234 × 103 + 0.00004567 × 103
= 0.12344567 × 103
= 0.1234 × 103 = a.

This phenomena is considered as ”Large number swallows small number”. We should


avoid this phenomena.
For any electronic computer, since a machine itself only can represent a finite set of
numbers, but of course, the set of real numbers which we use in is infinite. It is therefore
necessary to decide on an efficient method of representing numbers in computer so that
we reach an acceptable compromise between the range of numbers available to us and the
accuracy or precision of their representation. The way this is usually achieved is to use
the floating-point number system.
In addition, when you look for reference about numerical methods, there are three
keywords:

Numerical mathematics > Numerical analysis > Numerical methods.

Our course includes of the origin, algorithm and analysis of many numerical methods.

1.2 Measures of error


There are three ways to measure the error or accuracy for the difference between an
exact number x and a nearby number x̃, including of Absolute error, Relative error and
Significant digits.
Absolute erroris defined as

∆x = x − x̃ or |∆x| = |x − x̃|

2
which is called absolute error of nearby number x̃.
Since we do not know the exact number x, the absolute error is unknown. Introducing
an absolute error bound. Given a positive number ε such that

|∆x| ≤ , that is |x − x̃| ≤ ε,

here  is the absolute error bound. Based on the bound, we can estimate the range of the
exact number as follows

x = x̃ ± , or x̃ −  ≤ x ≤ x̃ + .

Ex1.2

˜ and
To get the diameter d of one machine, we have measured the nearby diameter d,
the absolute error bound  ≤ 0.01. Then we can estimate the range of accurate
1 1
d˜ − × 10−2 < d < d˜ + × 10−2 .
2 2
Several questions

Is an absolute error enough to measure the difference between x and x̃?


Is ∆x large means that the approximate number x̃ is bad?
Is ∆x small means that the approximate number x̃ is good?
Ex1.3
Measure the speed of the light. The accurate and approximate values are

c = 299792.458 km/s, c̃ = 299791.5 km/s,

respectively. Calculate the absolute error


|∆c|
|∆c| ≈ 0.9 km/s, ≈ 0.0003%.
c
The absolute error is 0.0003% of the accurate value c. The other example:
Measure the running speed of an athlete. The accurate and approximate values are

b = 11 m/s, b̃ = 1 m/s,

respectively. Therefore the absolute error is got

|∆b| = |b − b̃| = 10 m/s = 0.01 km/s.

It is 90% of the accurate value b. The absolute error |∆c| is larger than |∆b|, but c̃ is more
accurate than b̃.
Thus, we need a new measure for error.
Relative error is defined as

x − x̃ x − x̃
δx = or |δx| = .
x x
In most circumstances, we do not have the exact value x available. it is common to replace
the x in the denominator of the definition by the approximation x̃ in which case this error
is given by

x − x̃ x − x̃
δx = or |δx| =
.
x̃ x̃

3
with some confinements. The relative error corresponds to the agreement to a number of
significant figures.
For the same reason, we introduce a relative error bound as |δx| ≤ ε,ε is called the bound.
Also we can estimate the range of the true value x

x − x̃
|δx| = ≤

which implies
x̃(1 − ε) ≤ x ≤ x̃(1 + ε), or x̃ = x(1 ± ε).
As a general principle, we expect absolute error to be more appropriate for numbers
close to unity, while relative error seems more natural for large number or those close to
zero.
The third way to measure the accuracy is Significant Digits.
Significant Digits.
Another term that is commonly used to express accuracy is significant digits, that
is, how many digits in the number have meaning. The more significant digits, the more
accurate the approximate number is.
Definition: Given the approximate number x̃ of the true number x is Positive and
Negative
x̃ = ±x1 x2 · · · xm .xm+1 · · · xm+n
If |x − x̃| ≤ 0.00 · · · 05 = 0.5 × 10−n = 12 × 10−n , the approximate number x̃ is said to
approximate x to n decimal places. For the number of the significant digits, we count it
from the leftmost nonzero digit to the n decimal place. All these digits are significant.
If all the digits of the approximate value x̃ are significant digits, then the approximate
x̃ is said an effective number.

Ex.1.4 Given an exact value x = 2 = 1.414213562 · · · and the approximate value
x̃ = 1.4142, find how many significant digits does x̃ has?
Solution Calculate the absolute error |∆x| = |x − x̃| = 0.000013 ≤ 0.00005 =
0.5 × 10−4 . Find n = 4; Thus, x̃ approximate x to 4 decimal places, and has 1 + 4 = 5
significant digits.
Ex. 1.5 An exact value π = 3.141592653589 · · · is approximated by

π1 = 3.1416,
π2 = 3.1428 · · · ,
π3 = 3.1415929 · · · .

Please find the significant digits of π1 , π2 , π3 .


Solution Calculate |π − π1 | = 0.0000073 · · · < 0.00005 = 0.5 × 10−4 ,
|π − π2 | = 0.0012 < 0.0005 = 0.5 × 10−2 ,
|π − π3 | = 0.00000026 · · · < 0.0000005 = 0.5 × 10−6 . π1 has 5 significant digits. π2 has 3
significant digits. π3 has 7 significant digits.
Relation Denote an approximate value by x̃ = ±0.x1 x2 · · · × 10m , x1 > 0. The
following is the relation among the relative error, absolute error and accurate numbers:
1. If |δx| ≤ 21 ×10−t , then |∆x| = |x̃|×|δx| = 0.x1 x2 · · ·×10m × 12 ×10−t ≤ 21 ×10−(t−m) .
Furthermore, x̃ = ±0.x1 x2 · · · × 10m , m ∈ Z, x1 > 0, is accurate to t − m decimal and at
least has t significant digits.

4
1
|∆x| ×10−n
2. If |∆x| ≤ 1
2 × 10−n , then has n + m significant digits. |δx| ≤ x̃ = 2
0.x1 ×10m ≤
1
×10−n
2
x1 ×10m−1
≤ 2x11 × 10−n−m+1 ≤ 12 × 10−n−m+1 .
3. If x̃ has l significant digits, n + m = l ⇒ n = l − m, |∆x| ≤ 1
2 × 10−(l−m) ,
1
|∆x| ×10−(l−m)
|δx| ≤ x̃ ≤ 2
x1 ×10m−1
≤ 1
2 × 10−(l−1) .

Ex. 1.6
(1) x = 23.692, x̃ = 23.696.How many significant digits has x̃? 4
(2) x̃ = 10000.245 and |∆x| < 21 × 10−2 , it has 7 significant digits.
(3) x̃ = 0.0000123 and |∆x| < 12 × 10−6 , it has 2 significant digits.
Ex. 1.7 Determine the absolute and relative errors when approximating p by p∗ if
(a) p = 0.3000 × 101 and p∗ = 0.3100 × 101 , ∆p = 0.1, δp = 0.1/3;
(b) p = 0.3000 × 10−3 and p∗ = 0.3100 × 10−3 , ∆p = 0.1 × 10−4 , δp = 0.1/3;
for students
(c) p = 0.3000 × 104 and p∗ = 0.3100 × 104 , ∆p = 0.1 × 103 and δp = 0.3333 × 10−1 ;
Based on the definition, n = −3 and m = 4 in this case. Therefore, the significant digits
are n + m = 1, δp = 0.1/3.
This example shows that the same relative error occurs for widely varying absolute
errors.
As a measure of accuracy the relative error is more meaningful taking into consideration
the size of the value.
Solution for(c): ∆p = 0.1 × 103 = 100 ≤ 500 = 0.5 × 103 = 12 × 103 = 12 × 10−(−3) .

1.3 Avoid the loss of accuracy


1. Large number swallows small number.

2. Avoid the subtraction of nearly equal numbers.

Ex. 1.8 Use 6-digit decimal floating-point operations to calculate the exact value
x = lg 1.99999 − lg 1.99998 = 2.1714886 · · · × 10−6 (the true value).
But in a six-digit computer, the formula is calculated as

x̃ = lg 1.99999 − lg 1.99998 ≈ 0.301028 − 0.301026 = 0.000002 = 2 × 10−6 .

The approximate value x̃ has only one significant digit. However, if we change the formula
into
1.99999
x̃ = lg 1.99999 − lg 1.99998 = lg ≈ 2.17149 × 10−6
1.99998
which has more accuracy with six significant digits.

3. Reduce computations.

Ex. 1.9 To calculate the formula 3x3 + 2x2 + 4x + 6, it takes one compute 3 times
addition, 6 times multiplication. If we change the form into ((3x + 2)x + 4)x + 6, then it
needs 3 times addition, 3 times multiplication.
Therefore, one general polynomial of degree n,

a0 + a1 x + a2 x2 + · · · an xn

5
is written into the form

(((an x + an−1 )x + an−2 )x + · · · + a1 )x + a0


requiring n times addition, n times multiplication. This technique can reduce the number
of the floating point operations.

4. Use stable algorithm.


R1
In a 4-digit decimal computer, calculate 8 integrals, Ii = e−1 0 xi ex dx, i = 0, 1, · · · 7.
Theoretical analysis:

Property 1 using integration by parts, we have recurrence relation

Ii = 1 − iIi−1 ;

Property 2 I0 = 1 − e−1 ;
R1 R1 R1
Property 3 Ii = 0 xi ex−1 dx = 0 xi e−(1−x) dx < 0 xi dx = 1
i+1 .It means that Ii →
0(when i → ∞).
Computation: using two kinds of algorithms.

Algorithm 1.

Given I0 = 1 − e−1 ,compute by

Ii = 1 − iIi−1 , i = 0, 1, · · · , 7

The results:
0.6321 0.3679 0.2642 0.2074
0.1704 0.1480 0.1120 0.2160

Why? What is wrong with Algorithm 1?


Analysis:
For Algorithm 1, the initial value I0 , the approximate value I˜0
(
Ii = 1 − iIi−1
I˜i = 1 − iI˜i−1 , i = 1, 2, · · · 7

Ii − I˜i = (−i)(Ii−1 − I˜i−1 ), i = 1, 2, · · · 7


I7 − I˜7 = −7!(I0 − I˜0 ) = −5040(I0 − I˜0 ) large

Algorithm 2.

1−Ii
Setting I11 = 0, compute Ii−1 = i , i = 11, 10, · · · 1

I0 , I1 , I2 , · · · I7 are

0.6321 0.3679 0.2642 0.2073

6
0.1709 0.1455 0.1268 0.1124


1 − Ii
 Ii−1 =


i
1 − I˜i
 I˜i−1 =

 ,
i
˜
7 −I7
get I0 − I˜0 = − I5040 is very small.

You might also like