
Chapter 1.

Number System and Numerical Error Analysis

Introduction: The objective of this course is to introduce the engineer and scientist to
numerical methods which can be used to solve mathematical problems arising in engineering
and science that cannot be solved by exact methods. With the general accessibility of high-
speed digital computers, it is now possible to obtain rapid and accurate solutions to many
complex problems that face the engineer and scientist.

The problems in this course are of two types: exercise problems and applied problems.
Exercise problems are straightforward problems designed to give practice in the application
of the numerical algorithms presented in each chapter. In these problems we do not worry about
realistic constraints on the coefficients and numbers, nor do we try to interpret what the result
means; exercise problems emphasize the mechanics of the method. Applied problems, by contrast,
involve engineering and scientific applications that require numerical solutions.

1.1. Significant Digits, Precision, Accuracy, Errors, and Number Representation

When dealing with numerical values and numerical calculations, there are several concepts that
must be considered:

1. Significant digits
2. Precision and accuracy
3. Errors
4. Number representation

These concepts are discussed briefly in this section.

1.1. Significant digits (figures)

The concept of a significant figure, or digit, has been developed to formally designate the reliability
of a numerical value. The significant digits of a number are those that can be used with confidence.
Therefore, the significant digits, or figures, in a number are the digits of the number which are
known to be correct. They correspond to the number of certain digits plus one estimated digit.

Figure 1.1: Speedometer

Fig. 1.1 depicts a speedometer from an automobile. Visual inspection of the speedometer indicates
that the car is traveling between 48 and 49 km/h. Because the indicator is higher than the midpoint
between the markers on the gauge, we can say with assurance that the car is traveling at
approximately 49 km/h. We have confidence in this result because two or more reasonable
individuals reading this gauge would arrive at the same conclusion. However, let us say that we
insist that the speed be estimated to one decimal place. For this case, one person might say 48.8,
whereas another might say 48.9 km/h. Therefore, because of the limits of this instrument, only the
first two digits can be used with confidence. Estimates of the third digit (or higher) must be viewed
as approximations.

In this example, the speedometer in Fig. 1.1 yields readings of three significant figures: two certain digits plus one estimated digit.

Although it is usually a straightforward procedure to ascertain the significant figures of a number,
some cases can lead to confusion. For example, zeros are not always significant figures because
they may be necessary just to locate a decimal point or to specify the measurement unit.

For example, 0.00001845, 0.0001845, and 0.001845 all have four significant figures. Similarly,
when trailing zeros are used in large numbers, it is not clear how many, if any, of the zeros are
significant. For example, at face value the number 45,300 may have three, four, or five significant
digits, depending on whether the zeros are known with confidence. Such uncertainty can be
resolved by using scientific notation, where $4.53 \times 10^4$, $4.530 \times 10^4$, and $4.5300 \times 10^4$ designate that the
number is known to three, four, and five significant figures, respectively.
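
The scientific-notation convention above can be reproduced with ordinary string formatting. The following is a minimal sketch; the helper name `to_sig_figs` is an illustrative assumption, not part of the course material.

```python
def to_sig_figs(value: float, n: int) -> str:
    """Format value in scientific notation with n significant figures."""
    # n significant figures = one digit before the point + (n - 1) after it
    return f"{value:.{n - 1}e}"


for n in (3, 4, 5):
    # Prints 4.53e+04, 4.530e+04 and 4.5300e+04 in turn
    print(n, to_sig_figs(45300, n))
```

Writing the number this way makes the intended number of significant figures explicit, which the plain form 45,300 does not.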

Engineering and scientific calculations generally begin with a set of data having a known number
of significant digits. When these numbers are processed through a numerical algorithm, it is
important to be able to estimate how many significant digits are present in the final computed
result. For this purpose, computed results are rounded off to the number of significant digits that can be justified.

1.2. Precision and Accuracy

The errors associated with both calculations and measurements can be characterized with regard
to their accuracy and precision. Accuracy refers to how closely a computed or measured value
agrees with the true value. Precision refers to how closely individual computed or measured values
agree with each other.

These concepts can be illustrated graphically, as in Fig. 1.2.

Figure 1.2: (a) inaccurate and imprecise; (b) accurate and imprecise; (c) inaccurate and precise; (d) accurate and precise

In a numerical calculation, precision is governed by the number of digits being carried in the
numerical calculations. Accuracy is governed by the errors in the numerical approximation.
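
As a rough numerical illustration of the two ideas, accuracy can be summarized by how far the mean of repeated values lies from the true value, and precision by how widely the values scatter. The readings and the function name below are hypothetical and chosen only for illustration.

```python
import statistics


def accuracy_and_precision(readings, true_value):
    """Return (bias, spread): small bias = accurate, small spread = precise."""
    bias = abs(statistics.fmean(readings) - true_value)
    spread = statistics.pstdev(readings)
    return bias, spread


TRUE_VALUE = 10.0                          # hypothetical true value
precise_only = [12.1, 12.0, 11.9, 12.0]    # tightly grouped, far from 10
accurate_only = [8.0, 12.0, 9.0, 11.0]     # centred on 10, widely scattered

print(accuracy_and_precision(precise_only, TRUE_VALUE))
print(accuracy_and_precision(accurate_only, TRUE_VALUE))
```

The first data set corresponds to the "inaccurate and precise" case and the second to the "accurate and imprecise" case.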

1.3. Errors

The accuracy of a numerical calculation is quantified by the error of the calculation. Several types
of errors can occur in numerical calculations.

1. Errors in the parameters of the problem (a parameter that does not exist is included, or an existing one is missed)
2. Algebraic errors in the calculation
3. Iteration errors
4. Approximation errors
5. Round-off errors. Why is round-off important? Round-off is the process of rounding a number
at the last significant place that can be kept, because of the limits of display or memory. It also
provides a common convention for handling the uncertain digit.

Mathematically, error can be defined as,

$$E_t = \text{true value} - \text{estimate} \tag{1.1}$$

A shortcoming of this definition is that it takes no account of the order of magnitude of the value
under examination. For example, an error of a centimeter is insignificant when dividing land into
residential plots, whereas millimeter-range accuracy is needed in brain surgery. One way to account for the
magnitudes of the quantities being evaluated is to normalize the error to the true value, as follows:

$$\text{True fractional relative error} = \frac{\text{true error}}{\text{true value}} \tag{1.2}$$

Or, expressed as the true percent relative error,

$$\varepsilon_t = \frac{\text{true error}}{\text{true value}} \times 100\% \tag{1.3}$$

Note that, in Eq. 1.3, ε is subscripted with ‘t’ to signify that the error is normalized to the true value.
However, in actual situations such information is rarely available. For numerical methods, the true
value will be known only when we deal with functions that can be solved analytically.

Such will typically be the case when we investigate the theoretical behavior of a particular
technique for simple systems. However, in real-world applications, we will obviously not know
the true answer a priori. For these situations, an alternative is to normalize the error using the best
available estimate of the true value, that is, to the approximation itself.

$$\varepsilon_a = \frac{\text{approximate error}}{\text{approximate value}} \times 100\% \tag{1.4}$$

where εa is the approximate percent relative error.

In iterative numerical methods, however, we do not have a best-known approximate value at the start of
the process. Rather, each new approximation is computed from the previous approximation.
Therefore, εa is defined as

$$\varepsilon_a = \frac{\text{current approximation} - \text{previous approximation}}{\text{current approximation}} \times 100\% \tag{1.5}$$

In the above definition (Eq. 1.5), the result may be negative if the current approximation is less than
the previous one. Usually only the magnitude of the error matters, so we take the absolute value.
In particular, when a tolerance is defined for a numerical computation in a program, the absolute
value |εa| is used.
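
The error definitions in Eqs. 1.1, 1.3 and 1.5 translate directly into code. The sketch below uses illustrative function names (they are assumptions, not names from the course) and checks them with the values 1 and 1.7, the first two partial sums that appear in Example 1.1.

```python
def true_error(true_value, estimate):
    """Eq. 1.1: true value minus estimate."""
    return true_value - estimate


def true_percent_relative_error(true_value, estimate):
    """Eq. 1.3: true error normalized to the true value, in percent."""
    return true_error(true_value, estimate) / true_value * 100.0


def approx_percent_relative_error(current, previous):
    """Eq. 1.5: change between successive approximations, in percent."""
    return (current - previous) / current * 100.0


TRUE_VALUE = 2.013752707  # e^0.7, as quoted in Example 1.1

print(true_percent_relative_error(TRUE_VALUE, 1.7))   # about 15.58
print(approx_percent_relative_error(1.7, 1.0))        # about 41.18
# The sign is negative when the approximation decreases,
# which is why a tolerance test compares abs(eps_a).
print(approx_percent_relative_error(0.9, 1.0))
```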

Example 1.1: Error calculation

In mathematics, functions can often be represented by infinite series. For example, the exponential
function can be computed using

$$e^x = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots + \frac{x^n}{n!}$$

Thus, as more terms are added in sequence, the approximation becomes a better and better estimate
of the true value of $e^x$. This equation is called a Maclaurin series expansion.

If the true value is $e^{0.7} = 2.013752707$, use this expansion to fill in the table below by calculating
the true percent relative error (εt) and the approximate percent relative error (εa) for each term.

Term       Approximate result   εt (%)    |εa| (%)
1 (n=0)    1                    50.34     -
2 (n=1)    1.7                  15.58     41.18
3 (n=2)    1.945                3.41      12.596
4 (n=3)    2.002166667          0.575     2.855
5 (n=4)    2.012170833          0.0786    0.497
6 (n=5)    2.013571417          0.0090    0.0696

*** The first iteration or term is when n=0.
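
The table can be regenerated with a short loop over the partial sums of the Maclaurin series. This is a sketch only; the function name `maclaurin_exp_table` and the output formatting are assumptions made for illustration.

```python
import math


def maclaurin_exp_table(x, n_terms):
    """Return rows (term, partial sum, eps_t %, |eps_a| %) for e**x."""
    true_value = math.exp(x)
    rows = []
    partial_sum = 0.0
    previous = None
    for n in range(n_terms):
        partial_sum += x**n / math.factorial(n)   # add the n-th series term
        eps_t = (true_value - partial_sum) / true_value * 100.0
        if previous is None:
            eps_a = None                          # no previous approximation yet
        else:
            eps_a = abs((partial_sum - previous) / partial_sum) * 100.0
        rows.append((n + 1, partial_sum, eps_t, eps_a))
        previous = partial_sum
    return rows


for term, approx, eps_t, eps_a in maclaurin_exp_table(0.7, 6):
    eps_a_text = "-" if eps_a is None else f"{eps_a:.4g}"
    print(f"{term}  {approx:.9f}  {eps_t:.4g}  {eps_a_text}")
```

In practice the loop would usually stop as soon as |εa| falls below a chosen tolerance rather than after a fixed number of terms.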

1.4. Computer Representation of Numbers

Numerical round-off errors are directly related to the manner in which numbers are stored in a
computer. The fundamental unit whereby information is represented is called a word. This is an
entity that consists of a string of binary digits, or bits. Numbers are typically stored in one or more
words. To understand how this is accomplished, we must first review some material related to
number systems.

1.4.1. Number system

A number system is merely a convention for representing quantities. Because we have 10 fingers
and 10 toes, the number system that we are most familiar with is the decimal, or base-10, number
system. A base is the number used as the reference for constructing the system. The base-10 system
uses the 10 digits—0, 1, 2, 3, 4, 5, 6, 7, 8, 9—to represent numbers. By themselves, these digits
are satisfactory for counting from 0 to 9.

For larger quantities, combinations of these basic digits are used, with the position or place value
specifying the magnitude. The right-most digit in a whole number represents a number from 0 to
9. The second digit from the right represents a multiple of 10. The third digit from the right
represents a multiple of 100, and so on. For example, if we have the number 64,801 then we have
six groups of 10,000, four groups of 1,000, eight groups of 100, zero groups of 10, and one more
unit. That is, $(6 \times 10^4) + (4 \times 10^3) + (8 \times 10^2) + (0 \times 10^1) + (1 \times 10^0)$ = 64,801.

The primary logic units of digital computers are on/off electronic components. By analogy,
a computer is like an animal with only two fingers, which can count only 0 or 1. Hence, numbers on
the computer are represented with a binary, or base-2, system. Just as with the decimal system,
quantities can be represented using positional notation. For example, the binary number 11 is
equivalent to $(1 \times 2^1) + (1 \times 2^0) = 3$ in the decimal system.
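
Positional notation works the same way in any base: each digit is multiplied by the base raised to the power of its position. A minimal sketch follows (the function name `positional_value` is an illustrative assumption); Python's built-in `int(text, base)` performs the same conversion and is used here only as a cross-check.

```python
def positional_value(digits: str, base: int) -> int:
    """Sum digit * base**position, with position 0 at the right-most digit."""
    total = 0
    for position, char in enumerate(reversed(digits)):
        total += int(char, base) * base**position
    return total


print(positional_value("64801", 10))   # 64801
print(positional_value("11", 2))       # 3
print(positional_value("1011", 2) == int("1011", 2))   # True
```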

Exercise: Review on conversion of base of number system

1. Convert the following numbers into the base-10 number system
a. (10010)_2 b. (2072)_8 c. (32A5C)_16
where A = 10 and C = 12
2. Convert the following numbers into the binary number system
a. 3541 b. 5561 c. 6594
3. Convert the following fractional numbers into base ten
a. (0.21200021)_3 b. (0.3025403012)_6 c. (0.341027)_8
4. Do the following conversions
a. 0.2125875 → ( )_4 b. 0.18789651 → ( )_7 c. 0.789451 → ( )_5
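
For the fractional conversions, digits to the right of the point carry negative powers of the base, and the reverse conversion repeatedly multiplies by the target base and peels off the integer part. The sketch below uses exact fractions and deliberately different numbers from the exercise; both function names are illustrative assumptions.

```python
from fractions import Fraction

DIGITS = "0123456789ABCDEF"


def base_fraction_to_decimal(digits: str, base: int) -> Fraction:
    """Value of 0.d1d2d3... written in `base`: sum of d_k * base**(-k)."""
    return sum(
        (Fraction(int(d, base), base**k) for k, d in enumerate(digits, start=1)),
        Fraction(0),
    )


def decimal_fraction_to_base(value: Fraction, base: int, n_digits: int) -> str:
    """First n_digits of value (0 <= value < 1) in `base`, chopped."""
    out = []
    for _ in range(n_digits):
        value *= base
        digit = int(value)          # the integer part is the next digit
        out.append(DIGITS[digit])
        value -= digit
    return "0." + "".join(out)


print(base_fraction_to_decimal("101", 2))                 # 5/8
print(decimal_fraction_to_base(Fraction(5, 8), 2, 3))     # 0.101
print(decimal_fraction_to_base(Fraction(1, 10), 2, 8))    # 0.00011001
```

The last line shows that the decimal fraction 0.1 has no finite binary expansion, so only its leading digits can be kept.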

1.4.2. Integer representation

The most straightforward approach, called the signed magnitude method, employs the first bit of
a word to indicate the sign, with a 0 for positive and a 1 for negative. The remaining bits are used
to store the number. For example, the integer value of -149 would be stored on a 16-bit computer,
as in Fig. 1.3.

Figure 1.3: Integer representation
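
The signed magnitude layout described above can be sketched in a few lines. The function name `signed_magnitude` is an assumption made for illustration; the first bit holds the sign and the remaining 15 bits hold the magnitude in binary.

```python
def signed_magnitude(value: int, bits: int = 16) -> str:
    """Return the signed-magnitude bit string of value in a word of `bits` bits."""
    magnitude = abs(value)
    if magnitude >= 2 ** (bits - 1):
        raise OverflowError(f"{value} does not fit in {bits} bits")
    sign_bit = "1" if value < 0 else "0"
    # Pad the magnitude with leading zeros to fill the remaining bits
    return sign_bit + format(magnitude, f"0{bits - 1}b")


print(signed_magnitude(-149))   # 1000000010010101
print(signed_magnitude(149))    # 0000000010010101
```

With 15 magnitude bits, the largest storable magnitude in this sketch is 2^15 - 1 = 32,767.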

1.4.3. Floating-point representation

Fractional quantities are typically represented in computers using floating-point form. In this
approach, the number is expressed as a fractional part, called a mantissa or significand, and an
integer part, called an exponent or characteristic.

$$m \times b^{e} \tag{1.6}$$

Where ‘m’ is the mantissa, ‘b’ is the base of the number system being used, and ‘e’ is the exponent.
For instance, the number 358.54 could be represented as $0.35854 \times 10^3$ in a floating-point base-10
system.

Figure 1.4: Floating number memory allocation

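The mantissa ‘m’ should be normalized to the range 1/b ≤ m < 1, so that its first digit after the point
is nonzero. The exponent ‘e’ therefore carries its own sign digit. Normalization increases the number
of significant digits that can be stored in the same memory space, because no digits are spent on leading zeros.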

Floating-point representation allows both fractions and very large numbers to be expressed on the
computer. However, it has some disadvantages. For example, floating-point numbers take up more
room and take longer to process than integer numbers. More significantly, however, their use
introduces a source of error because the mantissa holds only a finite number of significant figures.
Thus, a round-off error is introduced.
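
Normalization to 1/b ≤ m < 1 can be sketched for base 10 with exact decimal arithmetic, and for base 2 with the standard library function `math.frexp`, which returns a mantissa in [0.5, 1) and an integer exponent. The helper name `normalize_base10` is an illustrative assumption.

```python
import math
from decimal import Decimal


def normalize_base10(text: str):
    """Return (m, e) with value = m * 10**e and 0.1 <= |m| < 1."""
    value = Decimal(text)
    if value == 0:
        return Decimal(0), 0
    e = value.adjusted() + 1      # one more than the exponent of the leading digit
    m = value.scaleb(-e)          # shift the decimal point exactly
    return m, e


print(normalize_base10("358.54"))   # (Decimal('0.35854'), 3)

# Base-2 counterpart: 358.54 = m2 * 2**e2 with 0.5 <= m2 < 1
m2, e2 = math.frexp(358.54)
print(m2, e2)
```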

Example 1.3: Floating number representation

1. A nine-bit machine word is shown in Fig. 1.5. If the mantissa, the signed exponent, and the
sign of the number are allocated as indicated, find the word that represents
a. the smallest possible positive number. Ans: 011111000 = 0.00390625
b. the largest possible positive number. Ans: 001111111 = 120

Figure 1.5: Normalized floating-point number

2. Write the number 1.2 in floating-point notation in

a. base three
b. base four

*** As the above example shows, because only a finite number of bits is available in a computer
system, quantities can be represented only at discrete values with limited resolution. A nonzero
quantity whose magnitude is smaller than the smallest representable number cannot be stored; this
phenomenon is known as underflow. Similarly, overflow occurs when the magnitude of a quantity is
larger than the capacity of the machine.
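
The two answers in part 1 of the example can be checked by decoding the words in code. Since Fig. 1.5 is not reproduced here, the bit layout below is an assumption inferred from those answers: bit 1 is the sign of the number, bit 2 is the sign of the exponent, bits 3 to 5 are the exponent magnitude, and bits 6 to 9 are the base-2 mantissa. The function name `decode_word` is likewise illustrative.

```python
def decode_word(word: str) -> float:
    """Decode a 9-bit word under the ASSUMED layout described above."""
    if len(word) != 9 or set(word) - {"0", "1"}:
        raise ValueError("expected a string of nine binary digits")
    number_sign = -1 if word[0] == "1" else 1
    exponent_sign = -1 if word[1] == "1" else 1
    exponent = exponent_sign * int(word[2:5], 2)
    mantissa = int(word[5:], 2) / 2**4        # four fractional bits: 0.b1b2b3b4
    return number_sign * mantissa * 2.0**exponent


print(decode_word("011111000"))   # 0.00390625  (0.5 * 2**-7)
print(decode_word("001111111"))   # 120.0       (0.9375 * 2**7)
```

Under this layout, any positive quantity below 0.00390625 underflows and any quantity above 120 overflows.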

1.5. Errors in Arithmetic Manipulations of Computer Numbers

• Chopping error in addition

When two floating-point numbers are added, the mantissa of the number with the smaller exponent
is modified so that the exponents are the same. This has the effect of aligning the decimal points
and in effect causes chopping error.

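For example, consider adding $0.451 \times 10^1 + 0.254 \times 10^3$ on a machine that keeps a three-digit mantissa.

First, the mantissa of the smaller number is shifted: $0.451 \times 10^1 \rightarrow 0.00451 \times 10^3$

Then,
  0.254   × 10^3
+ 0.00451 × 10^3
= 0.25851 × 10^3, which is chopped to 0.258 × 10^3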
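
The align-then-chop procedure can be simulated with exact decimal arithmetic. This is a sketch under stated assumptions: a three-digit base-10 mantissa, chopping (not rounding) of the sum, and a result mantissa that stays below 1 so that no renormalization is needed. The function name `chopped_add` is illustrative.

```python
from decimal import Decimal, ROUND_DOWN


def chopped_add(m1: str, e1: int, m2: str, e2: int, digits: int = 3):
    """Add m1*10**e1 + m2*10**e2 and chop the mantissa to `digits` digits."""
    e = max(e1, e2)
    # Align both mantissas to the larger exponent (exact in Decimal)
    a = Decimal(m1).scaleb(e1 - e)
    b = Decimal(m2).scaleb(e2 - e)
    total = a + b
    quantum = Decimal(1).scaleb(-digits)          # 0.001 for three digits
    return total.quantize(quantum, rounding=ROUND_DOWN), e


mantissa, exponent = chopped_add("0.451", 1, "0.254", 3)
print(mantissa, exponent)   # 0.258 3
```

The exact sum is 258.51, while the stored result is 258, so the chopping error in this case is 0.51.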

• Cancellation of significant digits
