# Chapter 3. Approximation and Round-Off Errors

(Figure: a speedometer read as 48.? and an odometer read as 87,324.4? — the certain digits plus one estimated digit.)

Significant Figures

• The number of significant figures indicates precision. The significant digits of a number are those that can be used with confidence: the certain digits plus one estimated digit.


53,800 — how many significant figures?
• 5.38 × 10^4 → 3
• 5.380 × 10^4 → 4
• 5.3800 × 10^4 → 5

Zeros are sometimes used only to locate the decimal point and are not significant figures:
• 0.00001753 → 4
• 0.0001753 → 4
• 0.001753 → 4

Approximations and Round-Off Errors

• For many engineering problems there are no analytical solutions. Numerical methods yield approximate results, and we cannot exactly compute the errors associated with them.
  – Only rarely are the given data exact, since they originate from measurements; therefore there is probably error in the input information.
  – The algorithm itself usually introduces errors as well, e.g., unavoidable round-offs.
  – The output information will then contain error from both of these sources.
• How confident are we in our approximate result?
• The question is "how much error is present in our calculation and is it tolerable?"

• Accuracy — how close a computed or measured value is to the true value.
  – Inaccuracy (or bias): a systematic deviation from the true value.
• Precision (or reproducibility) — how close a computed or measured value is to previously computed or measured values.
  – Imprecision (or uncertainty): the magnitude of the scatter.

Fig 3.2

Error Definitions

True Value = Approximation + Error

E_t = True value − Approximation   (the true error; can be + or −)

True fractional relative error = true error / true value

True percent relative error:  ε_t = (true error / true value) × 100%

• For numerical methods, the true value is known only when we deal with functions that can be solved analytically (simple systems). In real-world applications, we usually do not know the answer a priori. Then

  ε_a = (approximate error / approximation) × 100%

• Iterative approach (for example, Newton's method):

  ε_a = ((current approximation − previous approximation) / current approximation) × 100%   (+ / −)
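These definitions translate directly into code. A minimal sketch in Python; the numeric values are hypothetical, chosen only to exercise the formulas:

```python
# True and approximate percent relative errors, per the definitions above.

def true_percent_error(true_value, approximation):
    """epsilon_t = (true value - approximation) / true value * 100%"""
    return (true_value - approximation) / true_value * 100.0

def approx_percent_error(current, previous):
    """epsilon_a = (current - previous) / current * 100%"""
    return (current - previous) / current * 100.0

# Hypothetical iterates: previous estimate 9.0, current 9.9, true value 10.0
print(true_percent_error(10.0, 9.9))    # about 1.0 (%)
print(approx_percent_error(9.9, 9.0))   # about 9.09 (%)
```

Note that ε_a uses the current approximation in the denominator, since the true value is unavailable.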

• Computations are repeated until a stopping criterion is satisfied:

  |ε_a| < ε_s   (use absolute values)

  where ε_s is a pre-specified % tolerance based on the knowledge of your solution.

• If the criterion is met with

  ε_s = (0.5 × 10^(2−n)) %

  you can be sure that the result is correct to at least n significant figures.
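The criterion can be sketched with a Maclaurin-series evaluation of e^x, adding terms until |ε_a| drops below ε_s (the choice of x = 0.5 and n = 3 significant figures here is illustrative):

```python
import math

x, n = 0.5, 3
eps_s = 0.5 * 10 ** (2 - n)      # = 0.05 % for n = 3 significant figures

result, term, k = 1.0, 1.0, 0    # e^x = 1 + x + x**2/2! + ...
eps_a = 100.0
while eps_a >= eps_s:
    k += 1
    term *= x / k                # next Maclaurin term x**k / k!
    prev = result
    result += term
    eps_a = abs((result - prev) / result) * 100.0

print(result)                    # close to math.exp(0.5) = 1.6487...
```

When the loop stops, the result is guaranteed correct to at least n significant figures.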

Fig 3.3: decimal and binary number systems.

Signed binary (Fig 3.4)

• 1000 0000 0000 0001 = −1 (signed magnitude)
• 2's complement:
  – 0000 0000 0000 0001 = 1
  – 0000 0000 0000 0000 = 0
  – 1111 1111 1111 1111 = −1
  – 1111 1111 1111 1110 = −2
• Number range? How do we compute −a from a?
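A sketch of the 16-bit two's-complement patterns above. Negation is the standard flip-the-bits-and-add-one rule; the helper names are my own:

```python
BITS = 16
MASK = (1 << BITS) - 1                # 0xFFFF

def neg(a):
    """Two's-complement negation: -a = (~a + 1), kept to 16 bits."""
    return (~a + 1) & MASK

def to_signed(pattern):
    """Interpret a 16-bit pattern as a signed integer."""
    return pattern - (1 << BITS) if pattern & (1 << (BITS - 1)) else pattern

print(to_signed(0b0000000000000001))  # 1
print(to_signed(0b1111111111111111))  # -1
print(to_signed(0b1111111111111110))  # -2
print(to_signed(neg(1)))              # -1
# Range: -2**15 .. 2**15 - 1, i.e. -32768 .. 32767
```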

Fractional numbers — decimal, binary

123.456 = 1×10^2 + 2×10^1 + 3×10^0 + 4×10^−1 + 5×10^−2 + 6×10^−3

101.011 = 1×2^2 + 0×2^1 + 1×2^0 + 0×2^−1 + 1×2^−2 + 1×2^−3 = 4 + 1 + 0.25 + 0.125 = 5.375

Fixed-point number: integer part and fraction separated by the radix point.
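The digit-by-digit expansion above can be sketched as a small evaluator (the function name is my own):

```python
# Evaluate a fixed-point digit string, e.g.
# 101.011 (base 2) = 1*2**2 + 0*2**1 + 1*2**0 + 0*2**-1 + 1*2**-2 + 1*2**-3.

def fixed_point_value(digits, base):
    whole, _, frac = digits.partition('.')
    value = sum(int(d) * base ** p
                for d, p in zip(whole, range(len(whole) - 1, -1, -1)))
    value += sum(int(d) * base ** -p for p, d in enumerate(frac, start=1))
    return value

print(fixed_point_value('101.011', 2))    # 5.375
print(fixed_point_value('123.456', 10))   # ~123.456
```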

Floating-point number (base-10)

• 156.78 → 0.15678 × 10^3 in a floating-point base-10 system.
• 1/34 = 0.029411765… — suppose only 4 decimal places can be stored: 0.0294 × 10^0.
• Normalized to remove the leading zero: multiply the mantissa by 10 and lower the exponent by 1, giving 0.2941 × 10^−1. An additional significant figure is retained.
• Normalization enforces 1/10 ≤ m < 1.

Floating-point number (binary, base-2)

• 101.011 = 0.101011 × 2^3
• 0.00101101 = 0.101101 × 2^−2

Fig 3.5

Floating-point number

• Numbers such as π, e, or √7 cannot be expressed by a fixed number of significant figures.
• Fractional quantities are typically represented in a computer in "floating point" form:

  m · b^e   (m: mantissa, b: base of the number system used, e: exponent)

• Computers use a base-2 representation; they cannot precisely represent certain exact base-10 numbers.

Floating-point number

Normalization requires 1/b ≤ m < 1; therefore

• for a base-10 system: 0.1 ≤ m < 1
• for a base-2 system: 0.5 ≤ m < 1

• Floating-point representation allows both fractions and very large numbers. However:
  – Floating-point numbers take up more room.
  – They take longer to process than integer numbers, because rounding adds to the computational overhead (much less of an issue on modern hardware).
  – Round-off errors are introduced because the mantissa holds only a finite number of significant figures.
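The finite base-2 mantissa is easy to observe: a value as simple as 0.1 has no finite binary expansion, so round-off error is present before any arithmetic happens.

```python
# 0.1 and 0.2 are stored as the nearest representable base-2 fractions,
# so their sum does not equal the stored value of 0.3.
print(0.1 + 0.2 == 0.3)    # False
print(f"{0.1:.20f}")       # 0.10000000000000000555
```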

Chopping, Rounding

Example: π = 3.14159265358… is to be stored on a base-10 system carrying 7 significant digits.

• Chopping: π = 3.141592, ε_t = 0.00000065
• Rounding: π = 3.141593, ε_t = 0.00000035

Some machines simply use chopping. Since the number of significant figures carried is large enough, the resulting chopping error is negligible.
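A sketch of the two storage rules applied to π; the helper functions are my own, written for base-10 significant digits:

```python
import math

def chop(x, sig):
    """Keep sig significant base-10 digits by truncation (chopping)."""
    shift = 10 ** (sig - 1 - math.floor(math.log10(abs(x))))
    return math.trunc(x * shift) / shift

def round_sig(x, sig):
    """Keep sig significant base-10 digits by rounding."""
    shift = 10 ** (sig - 1 - math.floor(math.log10(abs(x))))
    return round(x * shift) / shift

print(chop(math.pi, 7))       # 3.141592
print(round_sig(math.pi, 7))  # 3.141593
```

Rounding halves the worst-case storage error relative to chopping, at the cost of a little extra work.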

Fig 3.6 (example)

Example 3.4 (p. 61)

• 2^−3 × (1×2^−1 + 0×2^−2 + 0×2^−3) = 0.062500 (the smallest)
• 2^−3 × (1×2^−1 + 0×2^−2 + 1×2^−3) = 0.078125
• 2^−3 × (1×2^−1 + 1×2^−2 + 0×2^−3) = 0.093750
• 2^−3 × (1×2^−1 + 1×2^−2 + 1×2^−3) = 0.109375
• Evenly spaced by 2^−3 × (0×2^−1 + 0×2^−2 + 1×2^−3) = 0.015625
• 2^−2 × (1×2^−1 + 0×2^−2 + 0×2^−3) = 0.125000
• 2^−2 × (1×2^−1 + 0×2^−2 + 1×2^−3) = 0.156250
• 2^−2 × (1×2^−1 + 1×2^−2 + 0×2^−3) = 0.187500
• 2^−2 × (1×2^−1 + 1×2^−2 + 1×2^−3) = 0.218750
• Evenly spaced by 2^−2 × (0×2^−1 + 0×2^−2 + 1×2^−3) = 0.03125

Example 3.4 (p. 61) — continued

• 2^2 × (1×2^−1 + 0×2^−2 + 0×2^−3) = 2
• 2^2 × (1×2^−1 + 0×2^−2 + 1×2^−3) = 2.5
• 2^2 × (1×2^−1 + 1×2^−2 + 0×2^−3) = 3
• 2^2 × (1×2^−1 + 1×2^−2 + 1×2^−3) = 3.5
• Evenly spaced by 2^2 × (0×2^−1 + 0×2^−2 + 1×2^−3) = 0.5
• 2^3 × (1×2^−1 + 0×2^−2 + 0×2^−3) = 4
• 2^3 × (1×2^−1 + 0×2^−2 + 1×2^−3) = 5
• 2^3 × (1×2^−1 + 1×2^−2 + 0×2^−3) = 6
• 2^3 × (1×2^−1 + 1×2^−2 + 1×2^−3) = 7 (the largest)
• Evenly spaced by 2^3 × (0×2^−1 + 0×2^−2 + 1×2^−3) = 1
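The whole toy number set can be enumerated in a few lines. This sketch assumes a normalized 3-bit mantissa (leading bit fixed at 1) and exponents from −3 to 3, matching the values listed above:

```python
# Enumerate every representable value of the toy floating-point system.
values = sorted(
    (b1 * 2**-1 + b2 * 2**-2 + b3 * 2**-3) * 2**e
    for e in range(-3, 4)
    for b1 in (1,)                  # normalized: leading mantissa bit is 1
    for b2 in (0, 1)
    for b3 in (0, 1)
)
print(min(values), max(values))     # 0.0625 7.0
# The spacing doubles with each exponent, so the set is not evenly
# spaced overall: gaps grow as the magnitude grows.
```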

Fig 3.7

IEEE Standard 754 Floating-Point Numbers

• Single precision (32-bit) — sign: 1 bit, exponent: 8 bits, mantissa: 23 bits; about 7 significant base-10 digits, with range 10^−38 to 10^39.
• Double precision (64-bit) — sign: 1 bit, exponent: 11 bits, mantissa: 52 bits; 15–16 significant base-10 digits, with range 10^−308 to 10^308.
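These parameters can be inspected from Python, which uses IEEE 754 double precision for its `float` type:

```python
import struct
import sys

print(sys.float_info.dig)    # 15 significant base-10 digits
print(sys.float_info.max)    # about 1.8e308
print(sys.float_info.min)    # about 2.2e-308 (smallest normalized)

# The 64-bit layout: 1 sign bit, 11 exponent bits, 52 mantissa bits.
bits = struct.unpack('>Q', struct.pack('>d', -1.0))[0]
print(f"{bits:064b}")        # sign=1, exponent=01111111111, mantissa all zero
```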

Arithmetic Manipulations

• Common arithmetic operations — addition: the mantissa of the number with the smaller exponent is modified so that the exponents are the same.

  0.1557 × 10^1 + 0.4381 × 10^−1
  0.4381 × 10^−1 → 0.004381 × 10^1
  0.1557 × 10^1 + 0.004381 × 10^1 = 0.160081 × 10^1

  With a 4-digit mantissa the result is chopped to 0.1600 × 10^1.
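A sketch of a 4-digit base-10 mantissa with chopping, reproducing the addition above (the helper name is my own):

```python
import math

def chop4(x):
    """Keep a 4-digit base-10 mantissa with 0.1 <= m < 1, by chopping."""
    if x == 0.0:
        return 0.0
    e = math.floor(math.log10(abs(x))) + 1      # exponent so 0.1 <= m < 1
    m = math.trunc(x / 10**e * 10**4) / 10**4   # chop mantissa to 4 digits
    return m * 10**e

result = chop4(0.1557e1 + 0.4381e-1)
print(result)    # ~1.6, i.e. 0.1600e1 -- the trailing digits 81 are chopped
```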

Arithmetic Manipulations

• Subtraction — 0.3641 × 10^2 − 0.2686 × 10^2:

  0.3641 × 10^2 − 0.2686 × 10^2 = 0.0955 × 10^2

  The result is normalized: 0.0955 × 10^2 → 0.9550 × 10^1.

Arithmetic Manipulations

• Subtraction of nearly equal numbers — 0.7642 × 10^3 − 0.7641 × 10^3:

  0.7642 × 10^3 − 0.7641 × 10^3 = 0.0001 × 10^3

  Normalized: 0.0001 × 10^3 → 0.1000 × 10^0. Three nonsignificant zeros are appended — a loss of significant digits.
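The loss of significance can be sketched with Python's `decimal` module set to a 4-digit mantissa; the "more precise" operand values at the end are hypothetical, chosen to show how little of the result is trustworthy:

```python
from decimal import Decimal, getcontext

getcontext().prec = 4                 # simulate a 4-digit base-10 mantissa
a4 = Decimal("0.7642E3") - Decimal("0.7641E3")
print(a4)                             # 0.1 -> stored as 0.1000e0; the zeros
                                      # carry no information from the operands

# If the operands were really 764.24 and 764.06, the true difference is 0.18:
true_diff = 764.24 - 764.06
print(round(true_diff, 2))            # 0.18 -- the 4-digit result is far off
```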

Arithmetic Manipulations

• Multiplication — exponents are added, mantissas are multiplied:

  0.1363 × 10^3 × 0.6423 × 10^−1 = 0.08754549 × 10^2

  Normalized: 0.08754549 × 10^2 → 0.8754549 × 10^1, then chopped to a 4-digit mantissa: 0.8754 × 10^1.

Errors

• Adding a large and a small number — 0.4000 × 10^4 + 0.1000 × 10^−2:

  0.1000 × 10^−2 → 0.0000001 × 10^4
  0.4000 × 10^4 + 0.0000001 × 10^4 = 0.4000001 × 10^4

  Chopped to a 4-digit mantissa: 0.4000 × 10^4 — the small number is lost entirely.
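The same loss occurs in double precision (a mantissa of roughly 16 decimal digits) once the exponents differ by enough:

```python
# Adding a large and a small number in IEEE 754 double precision.
big, small = 1.0e16, 1.0
print(big + small == big)      # True: the small number is lost entirely
print((big + small) - big)     # 0.0, not 1.0
```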

Errors

• Subtractive cancellation — if b^2 >> 4ac in

  x = (−b ± √(b^2 − 4ac)) / (2a)

  then √(b^2 − 4ac) ≅ b, and for the "+" root −b + b = "small": two nearly equal numbers are subtracted.

• Example: 0.12345678 − 0.12345666 = 0.00000012. If the last digits were already uncertain (0.12345??? − 0.12345???), the result 0.00000??? contains no reliable significant digits.
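A sketch of the cancellation in the quadratic formula, with hypothetical coefficients chosen so that b^2 >> 4ac. The rationalized form 2c / (−b − √(b^2 − 4ac)) is algebraically equal to the "+" root but avoids the subtraction of nearly equal numbers:

```python
import math

a, b, c = 1.0, 1.0e8, 1.0          # hypothetical coefficients, b*b >> 4ac

disc = math.sqrt(b * b - 4 * a * c)
x_naive = (-b + disc) / (2 * a)    # -b + disc subtracts nearly equal numbers
x_safe = (2 * c) / (-b - disc)     # rationalized form: no cancellation

print(x_naive)                     # suffers cancellation
print(x_safe)                      # close to the true small root, about -1e-8
```

Here the naive form keeps only the few digits that survive the subtraction, while the rationalized form retains full precision.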