1. Numerical analysis
Numerical analysis is the branch of mathematics that studies and develops
algorithms which use numerical approximation to solve the problems of mathematical analysis (continuous mathematics). Numerical techniques are widely
used by scientists and engineers to solve their problems. A major advantage
of a numerical technique is that a numerical answer can be obtained even
when a problem has no analytical solution. However, the result of a numerical
computation is, in general, an approximation, which can be made
as accurate as desired.
$$\frac{8}{3} = 2.66666\ldots = \left(\frac{2}{10^1} + \frac{6}{10^2} + \frac{6}{10^3} + \cdots\right) \times 10^1.$$
This is an infinite series, but a computer uses a finite amount of memory to
represent numbers. Thus only a finite number of digits may be used to
represent any number, whatever the representation method.
For example, we can chop the infinite decimal representation of $8/3$ after 4
digits,
$$\frac{8}{3} \approx \left(\frac{2}{10^1} + \frac{6}{10^2} + \frac{6}{10^3} + \frac{6}{10^4}\right) \times 10^1 = 0.2666 \times 10^1.$$
Generalizing this, we say that a number has $n$ decimal digits and call $n$
the precision.
For each real number $x$, we associate a floating point representation denoted
by $fl(x)$, given by
$$fl(x) = \pm(0.a_1 a_2 \ldots a_n)_\beta \times \beta^e,$$
where the $\beta$-based fraction is called the mantissa, with all $a_i$ integers satisfying $0 \le a_i \le \beta - 1$, and $e$ is known
as the exponent. This representation is called the $\beta$-based floating point representation of $x$.
For example,
$$42.965 = 4 \times 10^1 + 2 \times 10^0 + 9 \times 10^{-1} + 6 \times 10^{-2} + 5 \times 10^{-3} = 0.42965 \times 10^2,$$
$$0.00234 = 0.234 \times 10^{-2}.$$
The number 0 is written as $0.00\ldots0 \times \beta^e$. Likewise, we can use the binary number system,
and any nonzero real $x$ can be written
$$x = \pm q \times 2^m$$
with $\frac{1}{2} \le q < 1$ and some integer $m$. Both $q$ and $m$ will be expressed in
terms of binary numbers.
For example,
$$(1001.1101)_2 = 1 \times 2^3 + 1 \times 2^0 + 1 \times 2^{-1} + 1 \times 2^{-2} + 1 \times 2^{-4} = (9.8125)_{10}.$$
Remark: The above representation is not unique.
For example, $0.2666 \times 10^1 = 0.02666 \times 10^2$.
Definition 3.1 (Normal form). A non-zero floating-point number is in normal form if the value of its mantissa lies in $(-1, -\frac{1}{\beta}]$ or $[\frac{1}{\beta}, 1)$.
Therefore, we normalize the representation by requiring $a_1 \ne 0$. Not only is the precision limited to a finite number of digits, but the range of the exponent is
also restricted. Thus there are integers $m$ and $M$ such that $m \le e \le M$.
3.1. Rounding and chopping. Let $x$ be any real number and $fl(x)$ be its
machine approximation. There are two ways to cut off the digits when storing a
real number
$$x = \pm(0.a_1 a_2 \ldots a_n a_{n+1} \ldots)_\beta \times \beta^e, \quad a_1 \ne 0.$$
(1) Chopping: We ignore the digits after $a_n$ and write the number as
$$fl(x) = \pm(0.a_1 a_2 \ldots a_n)_\beta \times \beta^e.$$
(2) Rounding: Rounding is defined as follows:
$$fl(x) = \begin{cases} \pm(0.a_1 a_2 \ldots a_n)_\beta \times \beta^e, & 0 \le a_{n+1} < \beta/2 \quad \text{(rounding down)} \\ \pm\left[(0.a_1 a_2 \ldots a_n)_\beta + (0.00\ldots01)_\beta\right] \times \beta^e, & \beta/2 \le a_{n+1} < \beta \quad \text{(rounding up).} \end{cases}$$
Example 1.
$$fl\left(\frac{6}{7}\right) = \begin{cases} 0.86 \times 10^0 & \text{(rounding)} \\ 0.85 \times 10^0 & \text{(chopping).} \end{cases}$$
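The two storage rules can be tried out with Python's standard `decimal` module. This is a minimal sketch for base 10 only; the helper name `fl` is introduced here for illustration, and `ROUND_DOWN` / `ROUND_HALF_UP` are used as stand-ins for chopping and for the rounding rule defined above.

```python
from decimal import Context, Decimal, ROUND_DOWN, ROUND_HALF_UP

def fl(x, n, mode):
    """Store x with an n-digit decimal mantissa (hypothetical helper).

    mode=ROUND_DOWN chops the extra digits; mode=ROUND_HALF_UP rounds up
    when the first dropped digit is 5 or more.
    """
    return Context(prec=n, rounding=mode).create_decimal(x)

x = Decimal(6) / Decimal(7)        # 0.857142... (28 digits by default)
print(fl(x, 2, ROUND_DOWN))        # chopping to 2 digits
print(fl(x, 2, ROUND_HALF_UP))     # rounding to 2 digits
```

With `n = 2` the two calls reproduce the chopped and rounded values of Example 1.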
Rules for rounding off numbers:
(1) If the digit to be dropped is greater than 5, the last retained digit is
increased by one. For example,
12.6 is rounded to 13.
(2) If the digit to be dropped is less than 5, the last remaining digit is left
as it is. For example,
12.4 is rounded to 12.
(3) If the digit to be dropped is 5, and if any digit following it is not zero,
the last remaining digit is increased by one. For example,
12.51 is rounded to 13.
(4) If the digit to be dropped is 5 and is followed only by zeros, the last
remaining digit is increased by one if it is odd, but left as it is if even. For
example,
11.5 is rounded to 12, and 12.5 is rounded to 12.
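The four rules above amount to what is often called round-half-to-even. A small sketch using the standard `decimal` module (the function name `round_to_integer` is an illustrative choice, not part of the notes) applies them to the examples given:

```python
from decimal import Decimal, ROUND_HALF_EVEN

def round_to_integer(text):
    """Round a decimal string to an integer, sending exact ties to the even neighbour."""
    return Decimal(text).quantize(Decimal("1"), rounding=ROUND_HALF_EVEN)

for s in ["12.6", "12.4", "12.51", "11.5", "12.5"]:
    print(s, "is rounded to", round_to_integer(s))
```

The inputs are passed as strings so that values such as 12.5 are represented exactly before rounding.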
Definition 3.2 (Absolute and relative error). If $fl(x)$ is the approximation
to the exact value $x$, then the absolute error is $|x - fl(x)|$, and the relative error
is
$$\frac{|x - fl(x)|}{|x|}.$$
Remark: As a measure of accuracy, the absolute error may be misleading
and the relative error is more meaningful.
Definition 3.3 (Overflow and underflow). An overflow is obtained when
a number is too large to fit into the floating point system in use, i.e. $e > M$.
An underflow is obtained when a number is too small, i.e. $e < m$.
When overflow occurs in the course of a calculation, this is generally fatal.
But underflow is non-fatal: the system usually sets the number to 0 and
continues. (Matlab does this, quietly.)
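As an illustration in a different environment, the sketch below shows what Python's double-precision floats do at the two ends of the exponent range. The variable names are chosen here for illustration. Note that IEEE 754 multiplication returns the special value `inf` on overflow instead of stopping, so whether overflow is fatal depends on the system in use.

```python
import sys

largest = sys.float_info.max     # largest finite double, about 1.8e308
smallest = sys.float_info.min    # smallest positive normalized double, about 2.2e-308

overflowed = largest * 10.0      # exponent too large: result is inf
underflowed = smallest / 1e20    # exponent too small: result is quietly set to 0.0

print(overflowed, underflowed)
```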
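Chopping errors: Let
$$x = (0.a_1 a_2 \ldots a_n a_{n+1} \ldots)_\beta\, \beta^e = \left(\sum_{i=1}^{\infty} \frac{a_i}{\beta^i}\right) \beta^e, \quad a_1 \ne 0,$$
$$fl(x) = (0.a_1 a_2 \ldots a_n)_\beta\, \beta^e = \left(\sum_{i=1}^{n} \frac{a_i}{\beta^i}\right) \beta^e.$$
Therefore
$$|x - fl(x)| = \left(\sum_{i=n+1}^{\infty} \frac{a_i}{\beta^i}\right) \beta^e.$$
Since $a_i \le \beta - 1$,
$$\beta^{-e} |x - fl(x)| = \sum_{i=n+1}^{\infty} \frac{a_i}{\beta^i} \le \sum_{i=n+1}^{\infty} \frac{\beta - 1}{\beta^i} = (\beta - 1)\left[\frac{1}{\beta^{n+1}} + \frac{1}{\beta^{n+2}} + \ldots\right] = (\beta - 1)\, \frac{1}{\beta^{n+1}}\, \frac{1}{1 - \frac{1}{\beta}} = \beta^{-n}.$$
Now
$$|x| = (0.a_1 a_2 \ldots a_n \ldots)_\beta\, \beta^e \ge \frac{1}{\beta}\, \beta^e.$$
Therefore
$$\frac{|x - fl(x)|}{|x|} \le \frac{\beta^{-n}\, \beta^e}{\beta^{-1}\, \beta^e} = \beta^{1-n}.$$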
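Rounding errors: For rounding
$$fl(x) = \begin{cases} (0.a_1 a_2 \ldots a_n)_\beta\, \beta^e = \left(\sum\limits_{i=1}^{n} \dfrac{a_i}{\beta^i}\right) \beta^e, & a_{n+1} < \beta/2 \\ (0.a_1 a_2 \ldots a_{n-1}[a_n + 1])_\beta\, \beta^e = \left(\dfrac{1}{\beta^n} + \sum\limits_{i=1}^{n} \dfrac{a_i}{\beta^i}\right) \beta^e, & a_{n+1} \ge \beta/2. \end{cases}$$
For $a_{n+1} < \beta/2$, i.e. $a_{n+1} \le \beta/2 - 1$,
$$\beta^{-e} |x - fl(x)| = \sum_{i=n+1}^{\infty} \frac{a_i}{\beta^i} = \frac{a_{n+1}}{\beta^{n+1}} + \sum_{i=n+2}^{\infty} \frac{a_i}{\beta^i} \le \frac{\beta/2 - 1}{\beta^{n+1}} + \sum_{i=n+2}^{\infty} \frac{\beta - 1}{\beta^i} = \frac{\beta/2 - 1}{\beta^{n+1}} + \frac{1}{\beta^{n+1}} = \frac{1}{2}\, \beta^{-n}.$$
For $a_{n+1} \ge \beta/2$,
$$\beta^{-e} |x - fl(x)| = \left|\sum_{i=n+1}^{\infty} \frac{a_i}{\beta^i} - \frac{1}{\beta^n}\right| = \frac{1}{\beta^n} - \sum_{i=n+1}^{\infty} \frac{a_i}{\beta^i} = \frac{1}{\beta^n} - \frac{a_{n+1}}{\beta^{n+1}} - \sum_{i=n+2}^{\infty} \frac{a_i}{\beta^i} \le \frac{1}{\beta^n} - \frac{a_{n+1}}{\beta^{n+1}}.$$
Since $a_{n+1} \ge \beta/2$, therefore
$$\beta^{-e} |x - fl(x)| \le \frac{1}{\beta^n} - \frac{\beta/2}{\beta^{n+1}} = \frac{1}{2}\, \beta^{-n}.$$
In both cases
$$|x - fl(x)| \le \frac{1}{2}\, \beta^{e-n},$$
and therefore
$$\frac{|x - fl(x)|}{|x|} \le \frac{1}{2}\, \frac{\beta^{-n}\, \beta^e}{\beta^{-1}\, \beta^e} = \frac{1}{2}\, \beta^{1-n}.$$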
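The two relative error bounds, $\beta^{1-n}$ for chopping and $\frac{1}{2}\beta^{1-n}$ for rounding, can be checked empirically for $\beta = 10$. The sketch below is only a spot check on random samples, not a proof; the helper `relative_error`, the sample range and the seed are arbitrary choices made for illustration, and exact rational arithmetic is used so the check itself adds no rounding error.

```python
import random
from decimal import Context, Decimal, ROUND_DOWN, ROUND_HALF_UP
from fractions import Fraction

def relative_error(x, n, mode):
    """Exact relative error of storing x with an n-digit decimal mantissa."""
    fx = Context(prec=n, rounding=mode).create_decimal(x)
    return abs(Fraction(x) - Fraction(fx)) / abs(Fraction(x))

n = 4
rng = random.Random(0)
samples = [Decimal(rng.uniform(0.1, 1000.0)) for _ in range(10_000)]

worst_chop = max(relative_error(x, n, ROUND_DOWN) for x in samples)
worst_round = max(relative_error(x, n, ROUND_HALF_UP) for x in samples)
chop_bound = Fraction(10) ** (1 - n)     # beta^(1-n)
round_bound = chop_bound / 2             # (1/2) beta^(1-n)

print(float(worst_chop), "<=", float(chop_bound))
print(float(worst_round), "<=", float(round_bound))
```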
5. Significant Figures
The number 25.4 is said to have 3 significant figures. The number 25.40
is said to have 4 significant figures.
Rules for deciding the number of significant figures in a measured quantity:
(1) All nonzero digits are significant:
1.234 has 4 significant figures, 1.2 has 2 significant figures.
(2) Zeros between nonzero digits are significant: 1002 has 4 significant figures.
(3) Leading zeros to the left of the first nonzero digits are not significant;
such zeros merely indicate the position of the decimal point: 0.001 has only
1 significant figure.
(4) Trailing zeros that are also to the right of a decimal point in a number
are significant: 0.0230 has 3 significant figures.
(5) When a number ends in zeros that are not to the right of a decimal
point, the zeros are not necessarily significant: 190 may be 2 or 3 significant
figures, 50600 may be 3, 4, or 5 significant figures.
The potential ambiguity in the last rule can be avoided by the use of standard
exponential, or scientific, notation. For example, depending on whether
the number of significant figures is 3, 4, or 5, we would write 50600 calories
as:
$0.506 \times 10^5$ (3 significant figures)
$0.5060 \times 10^5$ (4 significant figures), or
$0.50600 \times 10^5$ (5 significant figures).
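The same idea can be expressed with Python's format specifiers, shown here as a small sketch. The helper name `scientific` is an illustrative choice. Python normalizes the mantissa to one digit before the decimal point (5.06 rather than 0.506), so the exponent differs by one from the form used above, but the number of digits printed still states the number of significant figures.

```python
def scientific(value, significant_figures):
    """Format value in scientific notation with the given number of significant figures."""
    return f"{value:.{significant_figures - 1}e}"

for figures in (3, 4, 5):
    print(scientific(50600, figures), f"({figures} significant figures)")
```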
What is an exact number? Some numbers are exact because they are
known with complete certainty. Most exact numbers are integers: exactly
12 inches are in a foot, there might be exactly 23 students in a class. Exact
numbers are often found as conversion factors or as counts of objects. Exact
numbers can be considered to have an infinite number of significant figures.
Thus, the number of apparent significant figures in any exact number can be
ignored as a limiting factor in determining the number of significant figures
in the result of a calculation.
6. Rules for mathematical operations
In carrying out calculations, the general rule is that the accuracy of a
calculated result is limited by the least accurate measurement involved in
the calculation. In addition and subtraction, the result is rounded off so
that it has the same number of decimal places as the measurement having the fewest
decimal places. For example,
100 (assume 3 significant figures) + 23.643 (5 significant figures) = 123.643,
which should be rounded to 124 (3 significant figures). Note, however, that it is
possible for two numbers to have no common digits (significant figures in the same
digit column).
In multiplication and division, the result should be rounded off so as to have
the same number of significant figures as the component with the least
number of significant figures.
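For a product $X = x_1 x_2 \ldots x_n$ we have
$$\frac{1}{X} \frac{\partial X}{\partial x_n} = \frac{x_1 x_2 \ldots x_{n-1}}{x_1 x_2 x_3 \ldots x_n} = \frac{1}{x_n},$$
and similarly for the other factors.
Therefore
$$\frac{\Delta X}{X} = \frac{\Delta x_1}{x_1} + \frac{\Delta x_2}{x_2} + \cdots + \frac{\Delta x_n}{x_n}.$$
Therefore maximum relative and absolute errors are given by
$$E_r = \left|\frac{\Delta X}{X}\right| \le \left|\frac{\Delta x_1}{x_1}\right| + \left|\frac{\Delta x_2}{x_2}\right| + \cdots + \left|\frac{\Delta x_n}{x_n}\right|,$$
$$E_a = \left|\frac{\Delta X}{X}\right| X = \left|\frac{\Delta X}{X}\right| (x_1 x_2 \ldots x_n).$$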
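If
$$X = \frac{x_1}{x_2},$$
then
$$\Delta X = \Delta x_1 \frac{\partial X}{\partial x_1} + \Delta x_2 \frac{\partial X}{\partial x_2}.$$
We have
$$\frac{\Delta X}{X} = \frac{\Delta x_1}{X} \frac{\partial X}{\partial x_1} + \frac{\Delta x_2}{X} \frac{\partial X}{\partial x_2} = \frac{\Delta x_1}{x_1} - \frac{\Delta x_2}{x_2}.$$
Therefore relative error
$$E_r = \left|\frac{\Delta X}{X}\right| \le \left|\frac{\Delta x_1}{x_1}\right| + \left|\frac{\Delta x_2}{x_2}\right|,$$
and absolute error
$$E_a = \left|\frac{\Delta X}{X}\right| |X|.$$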
Absolute error
$$E_a \le 0.0021 \times \frac{x_1}{x_2} = 0.0639.$$
Hence the true value of $\frac{7.342}{0.241}$ lies between $30.4647 - 0.0639 = 30.4008$ and
$30.4647 + 0.0639 = 30.5286$.
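The error bound for a quotient can be sketched as follows. The input errors are an assumption made here for illustration: both 7.342 and 0.241 are taken to be correct to three decimal places, so that $\Delta x_1 = \Delta x_2 = 0.0005$. The function name `quotient_error` is likewise hypothetical.

```python
def quotient_error(x1, dx1, x2, dx2):
    """Return the quotient x1/x2 with its maximum relative and absolute errors."""
    value = x1 / x2
    relative = abs(dx1 / x1) + abs(dx2 / x2)   # E_r <= |dx1/x1| + |dx2/x2|
    absolute = relative * abs(value)           # E_a = E_r * |X|
    return value, relative, absolute

# Assumed input errors: half a unit in the third decimal place.
value, relative, absolute = quotient_error(7.342, 0.0005, 0.241, 0.0005)
print(value, relative, absolute)
```

Under this assumption the relative bound is about 0.00214. Rounding it to 0.0021 before multiplying by the quotient gives the slightly smaller absolute error quoted in the text, while the unrounded product is a little larger.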
7. Loss of significance, stability and conditioning
Roundoff errors are inevitable and difficult to control. Other types of
errors which occur in computation may be under our control. The subject of
numerical analysis is largely preoccupied with understanding and controlling
errors of various kinds. Here we examine some of them.
7.1. Loss of significance. One of the most common error-producing calculations involves the cancellation of significant digits due to the subtraction of
nearly equal numbers (or the addition of one very large number and one
very small number). The phenomenon can be illustrated with the following
example.
Example 8. Let $x = 0.3721478693$ and $y = 0.3720230572$. What is the relative error in the computation of $x - y$ using five decimal digits of accuracy?
Sol. We can compute with ten decimal digits of accuracy and take the result as
exact:
$$x - y = 0.0001248121.$$
Both $x$ and $y$ will be rounded to five digits before subtraction. Thus
$$fl(x) = 0.37215,$$
$$fl(y) = 0.37202,$$
$$fl(x) - fl(y) = 0.13000 \times 10^{-3}.$$
The relative error is therefore
$$E_r = \left|\frac{(x - y) - (fl(x) - fl(y))}{x - y}\right| \approx 0.04 = 4\%.$$
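Example 8 can be reproduced with the standard `decimal` module. This is a sketch in which a 5-digit `Context` plays the role of the five-digit machine; the variable names are chosen for illustration.

```python
from decimal import Context, Decimal

x = Decimal("0.3721478693")
y = Decimal("0.3720230572")

five_digits = Context(prec=5)
fx = five_digits.create_decimal(x)   # x rounded to 5 digits
fy = five_digits.create_decimal(y)   # y rounded to 5 digits

exact = x - y                        # exact at the default 28-digit precision
computed = fx - fy
relative_error = abs(exact - computed) / abs(exact)

print(fx, fy, computed, relative_error)
```

The inputs each carry a relative error of order $10^{-5}$, yet the difference is wrong by about 4 percent, because the leading digits cancel.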
$$\sqrt{x+1} - 1 = (\sqrt{x+1} - 1)\, \frac{\sqrt{x+1} + 1}{\sqrt{x+1} + 1} = \frac{x}{\sqrt{x+1} + 1}.$$
This expression has no subtractions, and so is not subject to subtractive
cancellation. When $x = 1.2345678 \times 10^{-5}$, this expression evaluates approximately as
$$\frac{1.2345678 \times 10^{-5}}{2.0000062} = 6.17281995 \times 10^{-6}$$
on a machine with 8 digits; there is no loss of precision.
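The effect of the rewriting can be seen by simulating an 8-digit machine with the `decimal` module. In this sketch the function names `naive` and `rationalized` are illustrative, and the 30-digit evaluation is used only as a reference value.

```python
from decimal import Decimal, localcontext

def naive(x, digits):
    """Evaluate sqrt(x + 1) - 1 directly with the given number of digits."""
    with localcontext() as ctx:
        ctx.prec = digits
        return (x + 1).sqrt() - 1

def rationalized(x, digits):
    """Evaluate x / (sqrt(x + 1) + 1), which avoids the subtraction."""
    with localcontext() as ctx:
        ctx.prec = digits
        return x / ((x + 1).sqrt() + 1)

x = Decimal("1.2345678E-5")
reference = rationalized(x, 30)
print(naive(x, 8), rationalized(x, 8), reference)
```

With 8 digits the direct form keeps only about two correct digits, while the rationalized form agrees with the reference to roughly the full working precision.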
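Example 10. Find the solution of the following equation using floating-point arithmetic with 4-digit mantissa
$$x^2 - 1000x + 25 = 0.$$
Sol. Given that
$$x^2 - 1000x + 25 = 0,$$
in 4-digit arithmetic $\sqrt{b^2 - 4ac}$ evaluates to $0.1000e4$, so the smaller root is computed as
$$x_2 = \frac{0.1000e4 - 0.1000e4}{2} = 0.0000e4.$$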
One of the roots becomes zero due to the limited precision allowed in the computation.
In this equation $b^2$ is much larger than $4ac$. Hence $|b|$ and
$\sqrt{b^2 - 4ac}$ become two nearly equal numbers. The calculation of $x_2$ involves the subtraction of two nearly equal numbers, which causes serious loss of significant figures.
To obtain a more accurate 4-digit rounding approximation for $x_2$, we change
the formulation by rationalizing the numerator, or we use the fact that in the quadratic
equation $ax^2 + bx + c = 0$ the product of the roots is $c/a$; the smaller root may therefore be obtained by dividing $c/a$ by the larger root.
Therefore the first root is $0.1000e4$ and the second root is
$$\frac{25}{0.1000e4} = \frac{0.2500e2}{0.1000e4} = 0.2500e{-1}.$$
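A sketch of Example 10 with the `decimal` module set to 4 significant digits is given below. One difference from the hand computation should be kept in mind: `decimal` rounds the exact result of every operation, so the small term $4ac$ is not lost entirely and the direct formula returns 0.05 instead of 0. The conclusion is unchanged, since the correct smaller root is close to 0.025. The function name `roots_4digit` is illustrative.

```python
from decimal import Decimal, localcontext

def roots_4digit(a, b, c):
    """Roots of a x^2 + b x + c = 0 in 4-digit arithmetic, assuming b < 0."""
    with localcontext() as ctx:
        ctx.prec = 4
        root_disc = (b * b - 4 * a * c).sqrt()
        x1 = (-b + root_disc) / (2 * a)        # larger root: no cancellation
        x2_naive = (-b - root_disc) / (2 * a)  # subtracts nearly equal numbers
        x2_stable = (c / a) / x1               # product of roots is c / a
    return x1, x2_naive, x2_stable

x1, x2_naive, x2_stable = roots_4digit(Decimal(1), Decimal(-1000), Decimal(25))
print(x1, x2_naive, x2_stable)
```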
Example 11. The quadratic formula is used for computing the roots of the
equation $ax^2 + bx + c = 0$, $a \ne 0$, and the roots are given by
$$x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}.$$
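Consider the equation $x^2 + 62.10x + 1 = 0$ and discuss the numerical results.
Sol. Using the quadratic formula and 8-digit rounding arithmetic, we obtain
the two roots
$$x_1 = -0.01610723,$$
$$x_2 = -62.08390.$$
We use these values as exact values. Now we perform the calculations with
4-digit rounding arithmetic, in which $\sqrt{b^2 - 4ac}$ evaluates to $62.06$.
In calculating $x_2$,
$$fl(x_2) = \frac{-62.10 - 62.06}{2.000} = -62.10.$$
To avoid subtracting nearly equal numbers when calculating $x_1$, we rationalize the numerator:
$$x_1 = \frac{-2c}{b + \sqrt{b^2 - 4ac}}.$$
Then
$$fl(x_1) = \frac{-2.000}{62.10 + 62.06} = \frac{-2.000}{124.2} = -0.01610.$$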
The rationalized formula for the other root is
$$x_2 = \frac{-2c}{b - \sqrt{b^2 - 4ac}}.$$
Using this formula involves not only the subtraction of two nearly
equal numbers but also division by the resulting small number. This degrades
the accuracy:
$$fl(x_2) = \frac{-2.000}{62.10 - 62.06} = \frac{-2.000}{0.04000} = -50.00.$$
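The four ways of computing the roots in Example 11 can be compared side by side. The sketch below uses the `decimal` module with 4 significant digits as the model of 4-digit rounding arithmetic; the function name and dictionary keys are illustrative.

```python
from decimal import Decimal, localcontext

def four_digit_roots(a, b, c):
    """Evaluate the standard and rationalized root formulas in 4-digit arithmetic."""
    with localcontext() as ctx:
        ctx.prec = 4
        s = (b * b - 4 * a * c).sqrt()
        return {
            "x1 standard": (-b + s) / (2 * a),      # cancellation in -b + s
            "x1 rationalized": (-2 * c) / (b + s),  # no cancellation
            "x2 standard": (-b - s) / (2 * a),      # no cancellation
            "x2 rationalized": (-2 * c) / (b - s),  # cancellation, then division
        }

results = four_digit_roots(Decimal(1), Decimal("62.10"), Decimal(1))
for name, value in results.items():
    print(name, value)
```

For $b > 0$ the accurate choices are the rationalized formula for $x_1$ and the standard formula for $x_2$; the other two subtract the nearly equal numbers 62.10 and 62.06.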
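$$x - \sin x = x - \left(x - \frac{x^3}{3!} + \frac{x^5}{5!} - \frac{x^7}{7!} + \ldots\right)$$
$$= \frac{x^3}{6} - \frac{x^5}{6 \times 20} + \frac{x^7}{6 \times 20 \times 42} - \ldots$$
$$= \frac{x^3}{6}\left(1 - \frac{x^2}{20}\left(1 - \frac{x^2}{42}\left(1 - \frac{x^2}{72}(\ldots)\right)\right)\right).$$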
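Assuming the expansion above is that of $x - \sin x$ (the factors $1/20$, $1/42$ and $1/72$ are the ratios of successive terms of that series), the nested form can be sketched as below. The function names are illustrative, and the series is truncated after the $1/72$ factor, so it is only suitable for small $|x|$.

```python
import math

def x_minus_sin_naive(x):
    """Direct evaluation: subtracts two nearly equal numbers when x is small."""
    return x - math.sin(x)

def x_minus_sin_series(x):
    """Nested series, truncated after the 1/72 factor (assumes small |x|)."""
    x2 = x * x
    return x**3 / 6 * (1 - x2 / 20 * (1 - x2 / 42 * (1 - x2 / 72)))

for x in (0.5, 1e-3, 1e-5):
    print(x, x_minus_sin_naive(x), x_minus_sin_series(x))
```

For moderate $x$ the two agree closely; as $x$ shrinks, the direct form loses digits to cancellation while the series form does not.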
7.2. Conditioning. The words condition and conditioning are used to indicate how sensitive the solution of a problem may be to small changes in
the input data. A problem is ill-conditioned if small changes in the data
can produce large changes in the results. For certain types of problems,
a condition number can be defined. If that number is large, it indicates an
ill-conditioned problem. In contrast, if the number is modest, the problem
is recognized as a well-conditioned problem.
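The condition number can be calculated in the following manner:
$$K = \frac{\text{relative change in } f(x)}{\text{relative change in } x} = \left|\frac{(f(x) - f(x^*))/f(x)}{(x - x^*)/x}\right| \approx \left|\frac{x f'(x)}{f(x)}\right|,$$
where $x^*$ is a perturbed value close to $x$.
For example, if $f(x) = \frac{10}{1 - x^2}$, then the condition number can be calculated
as
$$K = \left|\frac{x f'(x)}{f(x)}\right| = \frac{2x^2}{|1 - x^2|}.$$
The condition number can be quite large for $|x| \approx 1$. Therefore, the function is
ill-conditioned there.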
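The meaning of the condition number can be checked numerically: a small relative change in $x$ should change $f(x)$ by about $K$ times as much. The sketch below does this for $f(x) = 10/(1 - x^2)$; the sample point 0.999 and the perturbation size are arbitrary choices made for illustration.

```python
def f(x):
    return 10.0 / (1.0 - x * x)

def condition_number(x):
    """K = |x f'(x) / f(x)| = 2 x^2 / |1 - x^2| for f(x) = 10 / (1 - x^2)."""
    return 2.0 * x * x / abs(1.0 - x * x)

x = 0.999
relative_dx = 1e-6
relative_df = abs(f(x * (1.0 + relative_dx)) - f(x)) / abs(f(x))

print("K =", condition_number(x))
print("observed amplification =", relative_df / relative_dx)
print("K far from 1:", condition_number(0.1))
```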
7.3. Stability of an algorithm. Another theme that occurs repeatedly
in numerical analysis is the distinction between numerical algorithms that are
stable and those that are not. Informally speaking, a numerical process is
unstable if small errors made at one stage of the process are magnified and
propagated in subsequent stages and seriously degrade the accuracy of the
overall calculation.
An algorithm can be thought of as a sequence of problems, i.e. a sequence of
function evaluations. In this case we consider the algorithm for evaluating
$f(x)$ to consist of the evaluation of the sequence $x_1, x_2, \ldots, x_n$. We are concerned with the condition of each of the functions $f_1(x_1), f_2(x_2), \ldots, f_{n-1}(x_{n-1})$,
where $f(x) = f_i(x_i)$ for all $i$. An algorithm is unstable if any $f_i$ is ill-conditioned, i.e. if any $f_i(x_i)$ has condition much worse than $f(x)$. Consider
the example
$$f(x) = \sqrt{x+1} - \sqrt{x}$$
$x_0 := x = 12345$
$x_1 := x_0 + 1$
$x_2 := \sqrt{x_1}$
$x_3 := \sqrt{x_0}$
$f(x) := x_4 := x_2 - x_3.$
The loss of significance occurs with the final subtraction. We can rewrite
the last step in the form $f_3(x_3) = x_2 - x_3$ to show how the final answer
depends on $x_3$. As $f_3'(x_3) = -1$, we have the condition
$$K(x_3) = \left|\frac{x_3 f_3'(x_3)}{f_3(x_3)}\right| = \left|\frac{x_3}{x_2 - x_3}\right|,$$
from which we find $K(x_3) \approx 2.5 \times 10^4$ when $x = 12345$. Note that this is
the condition of a subproblem arrived at during the algorithm. To find an
alternative algorithm we write
$$f(x) = (\sqrt{x+1} - \sqrt{x})\, \frac{\sqrt{x+1} + \sqrt{x}}{\sqrt{x+1} + \sqrt{x}} = \frac{1}{\sqrt{x+1} + \sqrt{x}}.$$
This suggests the algorithm
$x_0 := x = 12345$
$x_1 := x_0 + 1$
$x_2 := \sqrt{x_1}$
$x_3 := \sqrt{x_0}$
$x_4 := x_2 + x_3$
$f(x) := x_5 := 1/x_4.$
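The two algorithms for $f(x) = \sqrt{x+1} - \sqrt{x}$ can be compared on a simulated low-precision machine. In this sketch the 6-digit working precision, the function names `unstable` and `stable`, and the 30-digit reference value are all choices made for illustration; the condition of the final subtraction is also evaluated.

```python
from decimal import Decimal, localcontext

def unstable(x, digits):
    """First algorithm: x2 = sqrt(x + 1), x3 = sqrt(x), result x2 - x3."""
    with localcontext() as ctx:
        ctx.prec = digits
        x2 = (x + 1).sqrt()
        x3 = x.sqrt()
        return x2 - x3

def stable(x, digits):
    """Second algorithm: x4 = sqrt(x + 1) + sqrt(x), result 1 / x4."""
    with localcontext() as ctx:
        ctx.prec = digits
        x4 = (x + 1).sqrt() + x.sqrt()
        return 1 / x4

x = Decimal(12345)
reference = stable(x, 30)
condition = x.sqrt() / reference     # K(x3) = x3 / (x2 - x3)

print("6-digit subtraction form:", unstable(x, 6))
print("6-digit reciprocal form: ", stable(x, 6))
print("30-digit reference:      ", reference)
print("K(x3) =", condition)
```

With 6 digits the subtraction form retains a single significant digit, while the reciprocal form stays close to the reference value.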