Round-Off Error - Wikipedia

12/30/21, 3:45 PM Round-off error - Wikipedia
Round-off error
A roundoff error,[1] also called rounding error,[2] is the difference between the result produced
by a given algorithm using exact arithmetic and the result produced by the same algorithm using
finite-precision, rounded arithmetic.[3] Rounding errors are due to inexactness in the representation
of real numbers and the arithmetic operations done with them. This is a form of quantization error.[4]
When using approximation equations or algorithms, especially when using finitely many digits to
represent real numbers (which in theory have infinitely many digits), one of the goals of numerical
analysis is to estimate computation errors.[5] Computation errors, also called numerical errors,
include both truncation errors and roundoff errors.
When a sequence of calculations is made with an input involving any roundoff error, errors may
accumulate, sometimes dominating the calculation. In ill-conditioned problems, significant error may
accumulate.[6]
In short, there are two major facets of roundoff errors involved in numerical calculations:[7]
1. Digital computers have magnitude and precision limits on their ability to represent numbers.
2. Certain numerical manipulations are highly sensitive to roundoff errors. This can result both from
mathematical considerations and from the way in which computers perform arithmetic
operations.
Contents
Representation error
Floating-point number system
Notation of floating-point number system
Normalized floating-number system
IEEE standard
Machine epsilon
Roundoff error under different rounding rules
Calculating roundoff error in IEEE standard
Measuring roundoff error by using machine epsilon
Theorem
Proof
Roundoff error caused by floating-point arithmetic
Addition
Multiplication
Division
Subtractive cancellation
Accumulation of roundoff error
Unstable algorithms
Ill-conditioned problems
Real world example: Patriot missile failure due to magnification of roundoff error
See also
References
Further reading
External links
Representation error
The error introduced by attempting to represent a number using a finite string of digits is a form of
roundoff error called representation error.[8] Here are some examples of representation error in
decimal representations:
Notation   Representation                     Approximation           Error
1/7        0.142 857 142 857...               0.142 857               0.000 000 142 857 142 857...
ln 2       0.693 147 180 559 945 309 41...    0.693 147               0.000 000 180 559 945 309 41...
log10 2    0.301 029 995 663 981 195 21...    0.3010                  0.000 029 995 663 981 195 21...
∛2         1.259 921 049 894 873 164 76...    1.25992                 0.000 001 049 894 873 164 76...
√2         1.414 213 562 373 095 048 80...    1.41421                 0.000 003 562 373 095 048 80...
e          2.718 281 828 459 045 235 36...    2.718 281 828 459 045   0.000 000 000 000 000 235 36...
π          3.141 592 653 589 793 238 46...    3.141 592 653 589 793   0.000 000 000 000 000 238 46...
Increasing the number of digits allowed in a representation reduces the magnitude of possible
roundoff errors, but any representation limited to finitely many digits will still cause some degree of
roundoff error for uncountably many real numbers. Additional digits used for intermediary steps of a
calculation are known as guard digits.[9]
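The 1/7 entry can be checked with exact rational arithmetic. The following is a minimal sketch using Python's standard `fractions` module; the variable names are illustrative, and the final lines apply the same idea to the binary double nearest to 0.1, an example that is not taken from the table.

```python
from fractions import Fraction

# Exact value and its six-digit decimal approximation.
exact = Fraction(1, 7)
approx = Fraction(142857, 10**6)

# Representation error, computed without any rounding.
error = exact - approx
print(error)  # 1/7000000

# Same idea in binary: the double closest to 0.1 is not exactly 1/10.
binary_error = Fraction(0.1) - Fraction(1, 10)
print(binary_error > 0)  # True
```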
Rounding multiple times can cause error to accumulate.[10] For example, if 9.945309 is rounded to
two decimal places (9.95), then rounded again to one decimal place (10.0), the total error is 0.054691.
Rounding 9.945309 to one decimal place (9.9) in a single step introduces less error (0.045309).
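The double-rounding example can be reproduced with Python's `decimal` module. This is a sketch only: the helper `round_to` is a hypothetical name, and round-half-even is an assumed tie rule (the text does not state which rule it uses; for these particular values the common tie rules agree).

```python
from decimal import Decimal, ROUND_HALF_EVEN

x = Decimal("9.945309")

def round_to(value, places):
    """Round value to the given number of decimal places (ties to even)."""
    return value.quantize(Decimal(10) ** -places, rounding=ROUND_HALF_EVEN)

two_step = round_to(round_to(x, 2), 1)   # 9.945309 -> 9.95 -> 10.0
one_step = round_to(x, 1)                # 9.945309 -> 9.9

print(two_step, abs(two_step - x))  # 10.0 0.054691
print(one_step, abs(one_step - x))  # 9.9 0.045309
```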
Floating-point number system

Compared with the fixed-point number system, the floating-point number system is more efficient in
representing real numbers so it is widely used in modern computers. While the real numbers are
infinite and continuous, a floating-point number system is finite and discrete. Thus, representation
error, which leads to roundoff error, occurs under the floating-point number system.
Notation of floating-point number system
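A floating-point number system $F$ is characterized by four integers:
$\beta$: base or radix
$p$: precision
$[L, U]$: exponent range, where $L$ is the lower bound and $U$ is the upper bound
Any $x \in F$ has the following form:
$$x = \pm(d_0.d_1 d_2 \ldots d_{p-1})_\beta \times \beta^E = \pm\left(d_0 + \frac{d_1}{\beta} + \frac{d_2}{\beta^2} + \cdots + \frac{d_{p-1}}{\beta^{p-1}}\right) \times \beta^E$$
where $d_i$ is an integer such that $0 \le d_i \le \beta - 1$ for $i = 0, 1, \ldots, p-1$, and $E$ is an integer such that $L \le E \le U$.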
Normalized floating-number system

A floating-point number system is normalized if the leading digit $d_0$ is always nonzero unless the number is zero.[3] Since the mantissa is $d_0.d_1 d_2 \ldots d_{p-1}$, the mantissa of a nonzero number in a normalized system satisfies $1 \le \text{mantissa} < \beta$. Thus, the normalized form of a nonzero IEEE floating-point number is $\pm 1.bb \ldots b \times 2^E$ where $b \in \{0, 1\}$. In binary, the leading digit is always $1$, so it is not written out and is called the implicit bit. This gives an extra bit of precision so that the roundoff error caused by representation error is reduced.
Since a floating-point number system $F$ is finite and discrete, it cannot represent all real numbers, which means that infinitely many real numbers can only be approximated by finitely many floating-point numbers through rounding rules. The floating-point approximation of a given real number $x$ is denoted $fl(x)$.
The total number of normalized floating-point numbers is
$$2(\beta - 1)\beta^{p-1}(U - L + 1) + 1,$$
where
$2$ counts choice of sign, being positive or negative
$(\beta - 1)$ counts choice of the leading digit
$\beta^{p-1}$ counts remaining mantissa
$U - L + 1$ counts choice of exponents
$1$ counts the case when the number is $0$.
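The count of normalized numbers can be checked by brute force on a toy system. The sketch below assumes the usual closed form 2(β − 1)β^(p−1)(U − L + 1) + 1 for a system with base β, precision p and exponent range [L, U]; the function names and the small parameter choices are illustrative only.

```python
from fractions import Fraction
from itertools import product

def normalized_floats(beta, p, L, U):
    """Enumerate every value of a toy normalized system, including zero."""
    values = {Fraction(0)}
    for E in range(L, U + 1):
        for d0 in range(1, beta):                       # leading digit is nonzero
            for rest in product(range(beta), repeat=p - 1):
                mantissa = Fraction(d0)
                for i, d in enumerate(rest, start=1):
                    mantissa += Fraction(d, beta**i)
                value = mantissa * Fraction(beta) ** E
                values.add(value)
                values.add(-value)
    return values

def predicted_count(beta, p, L, U):
    """Closed-form count: sign * leading digit * remaining digits * exponents + zero."""
    return 2 * (beta - 1) * beta ** (p - 1) * (U - L + 1) + 1

print(len(normalized_floats(2, 3, -1, 1)), predicted_count(2, 3, -1, 1))  # 25 25
```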
IEEE standard
In the IEEE standard the base is binary, i.e. $\beta = 2$, and normalization is used. The IEEE standard
stores the sign, exponent, and mantissa in separate fields of a floating-point word, each of which has a
fixed width (number of bits). The two most commonly used levels of precision for floating-point
numbers are single precision and double precision.
Precision   Sign (bits)   Exponent (bits)   Mantissa (bits)
Single      1             8                 23
Double      1             11                52
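The three fields of a double can be inspected directly. The sketch below assumes that Python's `float` is an IEEE double, which holds on common platforms, and uses a hypothetical helper name `decode_double`. The exponent field is returned as stored, that is, including the standard bias of 1023, a detail the table above does not show.

```python
import struct

def decode_double(x):
    """Split a Python float into its stored sign, exponent and mantissa fields."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63                    # 1 bit
    exponent = (bits >> 52) & 0x7FF      # 11 bits, stored with a bias of 1023
    mantissa = bits & ((1 << 52) - 1)    # 52 bits, implicit leading 1 not stored
    return sign, exponent, mantissa

print(decode_double(1.0))    # (0, 1023, 0)
print(decode_double(-2.0))   # (1, 1024, 0)
```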
Machine epsilon
Machine epsilon can be used to measure the level of roundoff error in the floating-point number
system. Here are two different definitions.[3]
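The machine epsilon, denoted $\epsilon_{\text{mach}}$, is the maximum possible absolute relative error in
representing a nonzero real number $x$ in a floating-point number system:
$$\epsilon_{\text{mach}} = \max_x \frac{|x - fl(x)|}{|x|}$$
The machine epsilon, denoted $\epsilon_{\text{mach}}$, is the smallest number $\epsilon$ such that $fl(1 + \epsilon) > 1$. Thus,
$fl(1 + \delta) = fl(1) = 1$ whenever $|\delta| < \epsilon_{\text{mach}}$.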
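For a quick experiment, Python exposes a related constant, `sys.float_info.epsilon`. Note that it follows a third convention: it is the gap between 1.0 and the next larger double (2^-52), so it should be read as an illustration of the idea rather than as either definition above. The sketch assumes IEEE double precision with round-to-nearest.

```python
import sys

eps = sys.float_info.epsilon   # gap between 1.0 and the next larger double

print(eps == 2.0**-52)         # True
print(1.0 + eps > 1.0)         # True: the sum is exactly the next double
print(1.0 + eps / 2 == 1.0)    # True: the tie is rounded to the even neighbour, 1.0
```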
Roundoff error under different rounding rules

There are two common rounding rules, round-by-chop and round-to-nearest. The IEEE standard uses
round-to-nearest.
Round-by-chop: The base-$\beta$ expansion of $x$ is truncated after the $(p-1)$-th digit.
This rounding rule is biased because it always moves the result toward zero.
Round-to-nearest: $fl(x)$ is set to the nearest floating-point number to $x$. When there is a tie, the
floating-point number whose last stored digit is even is used.
For IEEE standard where the base $\beta$ is $2$, this means when there is a tie it is rounded so that
the last digit is equal to $0$.
This rounding rule is more accurate but more computationally expensive.
Rounding so that the last stored digit is even when there is a tie ensures that it is not rounded
up or down systematically. This is to try to avoid the possibility of an unwanted slow drift in
long calculations due simply to a biased rounding.
The following example illustrates the level of roundoff error under the two rounding rules.[3] The
rounding rule, round-to-nearest, leads to less roundoff error in general.
x       Round-by-chop   Roundoff error   Round-to-nearest   Roundoff error
1.649   1.6             0.049            1.6                0.049
1.650   1.6             0.050            1.6                0.050
1.651   1.6             0.051            1.7                -0.049
1.699   1.6             0.099            1.7                -0.001
1.749   1.7             0.049            1.7                0.049
1.750   1.7             0.050            1.8                -0.050
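The two rules can be compared with Python's `decimal` module, keeping one digit after the decimal point. This is a sketch: `chop` and `nearest` are hypothetical helper names, `ROUND_DOWN` plays the role of round-by-chop, and `ROUND_HALF_EVEN` plays the role of round-to-nearest with ties to even. Under that tie rule 1.650 goes to 1.6 and 1.750 goes to 1.8, since 6 and 8 are the even last digits. The error is taken as x minus the rounded value.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_EVEN

ONE_DIGIT = Decimal("0.1")

def chop(x):
    """Round-by-chop: truncate toward zero."""
    return x.quantize(ONE_DIGIT, rounding=ROUND_DOWN)

def nearest(x):
    """Round-to-nearest, ties to the even last digit."""
    return x.quantize(ONE_DIGIT, rounding=ROUND_HALF_EVEN)

rows = []
for text in ["1.649", "1.650", "1.651", "1.699", "1.749", "1.750"]:
    x = Decimal(text)
    rows.append((x, chop(x), x - chop(x), nearest(x), x - nearest(x)))

for row in rows:
    print(*row)
```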
Calculating roundoff error in IEEE standard
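Suppose that round-to-nearest and IEEE double precision are used.
Example: the decimal number $(9.4)_{10} = (1001.\overline{0110})_2$ can be rearranged into
$$+1.\underbrace{0010\,1100\,1100\,1100\,1100\,1100\,1100\,1100\,1100\,1100\,1100\,1100\,1100}_{52\text{ bits}}\;110\ldots \times 2^3$$
Since the 53rd bit to the right of the binary point is a $1$ and is followed by other nonzero bits, the
round-to-nearest rule requires rounding up, that is, adding $1$ to the 52nd bit. Thus, the normalized
floating-point representation in IEEE standard of $9.4$ is
$$fl(9.4) = 1.0010\,1100\,1100\,1100\,1100\,1100\,1100\,1100\,1100\,1100\,1100\,1100\,1101 \times 2^3$$
Now the roundoff error can be calculated when representing $9.4$ with $fl(9.4)$.
This representation is derived by discarding the infinite tail
$$(0.\overline{1100})_2 \times 2^{-52} \times 2^3 = (0.\overline{0110})_2 \times 2^{-51} \times 2^3 = (0.4)_{10} \times 2^{-48}$$
from the right end and then adding $1 \times 2^{-52} \times 2^3 = 2^{-49}$ in the rounding step.
Then $fl(9.4) = 9.4 - (0.4)_{10} \times 2^{-48} + 2^{-49} = 9.4 + (0.2)_{10} \times 2^{-49}$.
Thus, the roundoff error is $(0.2 \times 2^{-49})_{10}$.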
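The roundoff error of storing 9.4 as a double can be confirmed exactly with rational arithmetic. The sketch below assumes that Python's `float` is an IEEE double; `Fraction(9.4)` recovers the exact value of the stored number, and the variable names are illustrative.

```python
from fractions import Fraction

stored = Fraction(9.4)       # exact value of the double nearest to 9.4
exact = Fraction(94, 10)     # the real number 9.4
error = stored - exact

print(error == Fraction(2, 10) / 2**49)  # True: fl(9.4) - 9.4 = 0.2 * 2**-49
print((9.4).hex())                       # 0x1.2cccccccccccdp+3
```

In the hexadecimal form, the final digit `d` (binary 1101) is the result of rounding up the last group 1100, and the exponent `p+3` is the factor 2^3.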
Measuring roundoff error by using machine epsilon
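The machine epsilon $\epsilon_{\text{mach}}$ can be used to measure the level of roundoff error when using the two
rounding rules above. Below are the formulas and corresponding proof.[3] The first definition of
machine epsilon is used here.
Theorem
1. Round-by-chop: $\epsilon_{\text{mach}} = \beta^{1-p}$
2. Round-to-nearest: $\epsilon_{\text{mach}} = \frac{1}{2}\beta^{1-p}$
Proof
Let $x = d_0.d_1 d_2 \ldots d_{p-1} d_p \ldots \times \beta^n \in \mathbb{R}$ where $n \in [L, U]$, and let $fl(x)$ be the floating-point
representation of $x$. Since round-by-chop is being used, it is
$$\begin{aligned}
\frac{|x - fl(x)|}{|x|} &= \frac{|d_0.d_1 d_2 \ldots d_{p-1} d_p \ldots \times \beta^n - d_0.d_1 d_2 \ldots d_{p-1} \times \beta^n|}{|d_0.d_1 d_2 \ldots \times \beta^n|} \\
&= \frac{|d_p.d_{p+1} \ldots \times \beta^{n-p}|}{|d_0.d_1 d_2 \ldots \times \beta^n|} \\
&= \frac{|d_p.d_{p+1} d_{p+2} \ldots|}{|d_0.d_1 d_2 \ldots|} \times \beta^{-p}
\end{aligned}$$
In order to determine the maximum of this quantity, there is a need to find the maximum of the numerator and the minimum
of the denominator. Since $d_0 \neq 0$ (normalized system), the minimum value of the denominator is $1$.
The numerator is bounded above by $(\beta - 1).(\beta - 1)(\beta - 1)\ldots = \beta$. Thus,
$\frac{|x - fl(x)|}{|x|} \le \frac{\beta}{1} \times \beta^{-p} = \beta^{1-p}$. Therefore, $\epsilon_{\text{mach}} = \beta^{1-p}$ for round-by-chop.
The proof for round-to-nearest is similar.
Note that the first definition of machine epsilon is not quite equivalent to the second definition
when using the round-to-nearest rule but it is equivalent for round-by-chop.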
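The round-to-nearest bound can be spot-checked numerically for IEEE double precision, where β = 2 and p = 53, so that (1/2)β^(1−p) = 2^-53. The sketch below is an empirical check on random rationals, not a proof; the sample range and the helper name `relative_error` are arbitrary choices, and it relies on CPython converting a `Fraction` to the nearest double.

```python
import random
from fractions import Fraction

U_NEAREST = Fraction(1, 2**53)   # (1/2) * beta**(1 - p) with beta = 2, p = 53

def relative_error(q):
    """Exact relative error of rounding the rational q to a double."""
    return abs(Fraction(float(q)) - q) / abs(q)

rng = random.Random(0)           # fixed seed so the run is repeatable
samples = [Fraction(rng.randint(1, 10**6), rng.randint(1, 10**6))
           for _ in range(1000)]
worst = max(relative_error(q) for q in samples)

print(worst <= U_NEAREST)        # True
```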
Roundoff error caused by floating-point arithmetic

Even if some numbers can be represented exactly as floating-point numbers (such numbers are
called machine numbers), performing floating-point arithmetic on them may lead to roundoff error in the
final result.
Addition
Machine addition consists of lining up the decimal points of the two numbers to be added, adding
them, and then storing the result again as a floating-point number. The addition itself can be done in
higher precision but the result must be rounded back to the specified precision, which may lead to
roundoff error.[3]
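For example, adding $1$ to $2^{-53}$ in IEEE double precision proceeds as follows:
$$1.00\ldots0 \times 2^0 + 1.00\ldots0 \times 2^{-53} = 1.\underbrace{00\ldots0}_{52\text{ bits}} \times 2^0 + 0.\underbrace{00\ldots0}_{52\text{ bits}}1 \times 2^0 = 1.\underbrace{00\ldots0}_{52\text{ bits}}1 \times 2^0$$
This is saved as $1.\underbrace{00\ldots0}_{52\text{ bits}} \times 2^0$ since round-to-nearest is used in IEEE standard. Therefore,
$1 + 2^{-53}$ is equal to $1$ in IEEE double precision and the roundoff error is $2^{-53}$.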
From this example, it can be seen that roundoff error can be introduced when doing the addition of a
large number and a small number because the shifting of decimal points in the mantissas to make the
exponents match may cause the loss of some digits.
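The loss of a small addend can be observed directly in Python, assuming its `float` is an IEEE double with round-to-nearest. The last part of this sketch goes one step beyond the text and shows that the order of additions changes the result; the variable names are illustrative.

```python
from fractions import Fraction

tiny = 2.0**-53
total = 1.0 + tiny
print(total == 1.0)                      # True: the small addend is lost

# Exact roundoff error of the addition, using rational arithmetic.
roundoff = (Fraction(1) + Fraction(tiny)) - Fraction(total)
print(roundoff == Fraction(1, 2**53))    # True

# Adding the two small numbers first preserves their contribution.
grouped = 1.0 + (tiny + tiny)
sequential = (1.0 + tiny) + tiny
print(grouped > sequential)              # True
```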
Multiplication
In general, the product of two $p$-digit mantissas contains up to $2p$ digits, so the result might not fit in the
mantissa.[3] Thus roundoff error will be involved in the result.
For example, consider a normalized floating-point number system with the base $\beta = 10$ and
at most $2$ mantissa digits. Then $fl(77) = 7.7 \times 10^1$ and $fl(88) = 8.8 \times 10^1$. Note that
$77 \times 88 = 6776$ but $fl(6776) = 6.7 \times 10^3$ since there are at most $2$ mantissa digits. The roundoff
error would be $6776 - fl(6776) = 6776 - 6.7 \times 10^3 = 76$.
Division
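In general, the quotient of two $p$-digit mantissas may contain more than $p$ digits.[3] Thus roundoff error
will be involved in the result.
For example, if the normalized floating-point number system above is still being used, then
$1/3 = 0.333\ldots$ but $fl(1/3) = fl(0.333\ldots) = 3.3 \times 10^{-1}$. So, the tail
$0.333\ldots - 3.3 \times 10^{-1} = 0.00333\ldots$ is cut off.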
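Both the multiplication and the division example can be imitated with a two-digit decimal context from Python's `decimal` module. This is a sketch under one assumption: the context uses `ROUND_DOWN` (chopping), which is the rule that yields 6.7 × 10^3 for 6776 and cuts off the tail of 0.333...

```python
from decimal import Context, Decimal, ROUND_DOWN

# Toy system: base 10, two significant digits, round-by-chop.
ctx = Context(prec=2, rounding=ROUND_DOWN)

product = ctx.multiply(Decimal(77), Decimal(88))
product_error = Decimal(77) * Decimal(88) - product   # exact product minus stored one

quotient = ctx.divide(Decimal(1), Decimal(3))

print(product, product_error)   # 6.7E+3 76
print(quotient)                 # 0.33
```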
Subtractive cancellation
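The subtraction of two nearly equal numbers is called subtractive cancellation.[3]
When the leading digits are cancelled, the result may be too small to be represented exactly and it
will just be represented as $0$.
For example, let $|\epsilon| < \epsilon_{\text{mach}}$, where the second definition of machine epsilon is used. What
is the solution to $(1 + \epsilon) - (1 - \epsilon)$?
It is known that $1 + \epsilon$ and $1 - \epsilon$ are nearly equal numbers, and
$(1 + \epsilon) - (1 - \epsilon) = 1 + \epsilon - 1 + \epsilon = 2\epsilon$. However, in the floating-point number system,
$fl((1 + \epsilon) - (1 - \epsilon)) = fl(1 + \epsilon) - fl(1 - \epsilon) = 1 - 1 = 0$. Although $2\epsilon$ is easily big
enough to be represented, both instances of $\epsilon$ have been rounded away, giving $0$.
Even with a somewhat larger $\epsilon$, the result is still significantly unreliable in typical cases. There is
not much faith in the accuracy of the value because the greatest uncertainty in any floating-point
number lies in the digits on the far right.
For example,
$1.99999 \times 10^2 - 1.99998 \times 10^2 = 0.00001 \times 10^2 = 1 \times 10^{-5} \times 10^2 = 1 \times 10^{-3}$. The
result $1 \times 10^{-3}$ is clearly representable, but there is not much faith in it.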
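Both effects can be observed in double precision. In the sketch below, the values `2.0**-55` and `1e-15` are illustrative choices (the first lies below the rounding threshold at 1, the second slightly above it), and the code assumes that Python's `float` is an IEEE double.

```python
# Case 1: eps is rounded away on both sides, so the difference collapses to zero.
eps = 2.0**-55
difference = (1.0 + eps) - (1.0 - eps)
print(difference)            # 0.0, although the exact answer is 2 * eps

# Case 2: a somewhat larger value survives, but only a few digits are trustworthy.
larger = 1e-15
estimate = (1.0 + larger) - (1.0 - larger)
relative_error = abs(estimate - 2 * larger) / (2 * larger)
print(relative_error > 0.01) # True: the result is off by more than one percent
```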
Accumulation of roundoff error

Errors can be magnified or accumulated when a sequence of calculations is applied on an initial input
with roundoff error due to inexact representation.
Unstable algorithms
An algorithm or numerical process is called stable if small changes in the input produce only small
changes in the output, and it is called unstable if they produce large changes in the output.[11]
A sequence of calculations normally occurs when running an algorithm. The amount of error in the
result depends on the stability of the algorithm. Roundoff error will be magnified by unstable
algorithms.
For example, $y_n = \int_0^1 \frac{x^n}{x + 5}\,dx$ for $n = 1, 2, \ldots, 8$ with $y_0$ given. It is easy to show that
$y_n = \frac{1}{n} - 5y_{n-1}$. Suppose $y_0$ is our initial value and has a small representation error $\epsilon$, which means
the initial input to this algorithm is $y_0 + \epsilon$ instead of $y_0$. Then the algorithm does the following
sequence of calculations.
$$y_1 = 1 - 5(y_0 + \epsilon) = 1 - 5y_0 - 5\epsilon$$
$$y_2 = \frac{1}{2} - 5(1 - 5y_0 - 5\epsilon) = \frac{1}{2} - 5 + 25y_0 + 5^2\epsilon$$
$$\vdots$$
$$y_n = \ldots + (-5)^n\epsilon$$
The roundoff error is amplified in succeeding calculations so this algorithm is unstable.
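The amplification can be made concrete with exact arithmetic. This sketch assumes the recurrence y_n = 1/n − 5·y_(n−1), which follows from y_n = ∫₀¹ xⁿ/(x + 5) dx with y_0 = ln(6/5). The function name `forward` and the input error of 10^-6 are hypothetical choices made for illustration.

```python
import math
from fractions import Fraction

def forward(y0, steps=8):
    """Run y_n = 1/n - 5*y_(n-1) for n = 1..steps in exact rational arithmetic."""
    y = y0
    for n in range(1, steps + 1):
        y = Fraction(1, n) - 5 * y
    return y

y0 = Fraction(math.log(6 / 5))   # the double nearest ln(6/5), treated as the input
eps = Fraction(1, 10**6)         # hypothetical error in the input

growth = forward(y0 + eps) - forward(y0)
print(growth == (-5) ** 8 * eps) # True: the error is multiplied by -5 at each step
print(float(growth))             # 0.390625
```

Because the arithmetic here is exact, the growth comes entirely from the recurrence itself, not from further rounding.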
Ill-conditioned problems
Even if a stable algorithm is used, the solution to a problem may still be inaccurate due to the
accumulation of roundoff error when the problem itself is ill-conditioned.
The condition number of a problem is the ratio of the relative change in the solution to the relative
change in the input.[3] A problem is well-conditioned if small relative changes in input result in
small relative changes in the solution. Otherwise, the problem is ill-conditioned.[3] In other words,
a problem is ill-conditioned if its condition number is "much larger" than $1$.
The condition number is introduced as a measure of the roundoff errors that can result when solving
ill-conditioned problems.[7]
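The ratio of relative changes can be estimated for a concrete function. The sketch below is illustrative only: `relative_change_ratio` is a hypothetical helper, and the two test functions (subtracting 1 near x = 1, and doubling) are examples chosen here rather than taken from the text. Exact rational arithmetic is used so that the ratio reflects the problem itself and not rounding in the estimate.

```python
from fractions import Fraction

def relative_change_ratio(f, x, delta):
    """Relative change in f(x) divided by the relative change delta in x."""
    x_perturbed = x * (1 + delta)
    rel_in = abs(x_perturbed - x) / abs(x)
    rel_out = abs(f(x_perturbed) - f(x)) / abs(f(x))
    return rel_out / rel_in

x = Fraction(10001, 10000)       # input very close to 1
delta = Fraction(1, 10**9)       # small relative perturbation

ill = relative_change_ratio(lambda t: t - 1, x, delta)   # subtraction near 1
well = relative_change_ratio(lambda t: 2 * t, x, delta)  # simple scaling

print(ill, well)   # 10001 1
```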
Real world example: Patriot missile failure due to magnification of roundoff error
On 25 February 1991, during the Gulf War, an American Patriot missile battery in Dhahran, Saudi
Arabia, failed to intercept an incoming Iraqi Scud missile. The Scud struck an American Army
barracks and killed 28 soldiers. A report of the then-General Accounting Office entitled "Patriot
Missile Defense: Software Problem Led to System Failure at Dhahran, Saudi Arabia" reported on the
cause of the failure: an inaccurate calculation of the time since boot due to computer arithmetic
errors. Specifically, the time in tenths of a second, as measured by the system's internal clock, was
multiplied by 1/10 to produce the time in seconds. This calculation was performed using a 24-bit fixed
point register. In particular, the value 1/10, which has a non-terminating binary expansion, was
chopped at 24 bits after the radix point. The small chopping error, when multiplied by the large
number giving the time in tenths of a second, led to a significant error. Indeed, the Patriot battery had
been up around 100 hours, and an easy calculation shows that the resulting time error due to the
magnified chopping error was about 0.34 seconds. (The number 1/10 equals
$\frac{1}{2^4} + \frac{1}{2^5} + \frac{1}{2^8} + \frac{1}{2^9} + \frac{1}{2^{12}} + \frac{1}{2^{13}} + \cdots$. In other words, the binary expansion of 1/10 is
$0.0001100110011001100110011001100\ldots$. Now the 24 bit register in the Patriot stored instead
$0.00011001100110011001100$, introducing an error of
$0.0000000000000000000000011001100\ldots$ binary, or about $0.000000095$ decimal. Multiplying by the number of tenths of a second in $100$ hours
gives $0.000000095 \times 100 \times 60 \times 60 \times 10 = 0.34$.) A Scud travels at about 1676 meters per second,
and so travels more than half a kilometer in this time. This was far enough that the incoming Scud was
outside the "range gate" that the Patriot tracked. Ironically, the fact that the bad time calculation had
been improved in some parts of the code, but not all, contributed to the problem, since it meant that the
inaccuracies did not cancel.[12]
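The arithmetic in the parenthetical can be reproduced with exact fractions. The sketch below is only an illustration of the quoted numbers, not a model of the Patriot software. It assumes that 23 fractional bits are kept, because that is the width that reproduces the stored value and the error of about 0.000000095 quoted above; the constant and variable names are hypothetical.

```python
from fractions import Fraction

FRACTION_BITS = 23                       # assumption: bits kept after the radix point
tenth = Fraction(1, 10)

# Chop 1/10 to FRACTION_BITS binary digits (truncate toward zero).
stored = Fraction(int(tenth * 2**FRACTION_BITS), 2**FRACTION_BITS)
chop_error = tenth - stored

ticks = 100 * 60 * 60 * 10               # tenths of a second in 100 hours
drift_seconds = chop_error * ticks
distance_m = 1676 * drift_seconds        # Scud speed quoted in the text, in m/s

print(float(chop_error))                 # about 9.5e-08
print(float(drift_seconds))              # about 0.34
print(float(distance_m))                 # more than 500
```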
See also
Precision (arithmetic)
Truncation
Rounding
Loss of significance
Floating point
Kahan summation algorithm
Machine epsilon
American Patriot missile
Wilkinson's polynomial
References
1. Butt, Rizwan (2009), Introduction to Numerical Analysis Using MATLAB (https://books.google.com/books?id=QWub-UVGxqkC&pg=PA11), Jones & Bartlett Learning, pp. 11–18, ISBN 978-0-76377376-2
2. Ueberhuber, Christoph W. (1997), Numerical Computation 1: Methods, Software, and Analysis (https://books.google.com/books?id=JH9I7EJh3JUC&pg=PA139), Springer, pp. 139–146, ISBN 978-3-54062058-7
3. Forrester, Dick (2018). Math/Comp241 Numerical Methods (lecture notes). Dickinson College.
4. Aksoy, Pelin; DeNardis, Laura (2007), Information Technology in Theory (https://books.google.com/books?id=KGS5IcixljwC&pg=PA134), Cengage Learning, p. 134, ISBN 978-1-42390140-2
5. Ralston, Anthony; Rabinowitz, Philip (2012), A First Course in Numerical Analysis (https://books.google.com/books?id=TVq8AQAAQBAJ&pg=PA2), Dover Books on Mathematics (2nd ed.), Courier Dover Publications, pp. 2–4, ISBN 978-0-48614029-2
6. Chapman, Stephen (2012), MATLAB Programming with Applications for Engineers (https://books.google.com/books?id=of8KAAAAQBAJ&pg=PA454), Cengage Learning, p. 454, ISBN 978-1-28540279-6
7. Chapra, Steven (2012). Applied Numerical Methods with MATLAB for Engineers and Scientists (3rd ed.). The McGraw-Hill Companies, Inc. ISBN 9780073401102.
8. Laplante, Philip A. (2000). Dictionary of Computer Science, Engineering and Technology (https://books.google.com/books?id=U1M3clUwCfEC&pg=PA420). CRC Press. p. 420. ISBN 978-0-84932691-2.
9. Higham, Nicholas John (2002). Accuracy and Stability of Numerical Algorithms (https://books.google.com/books?id=epilvM5MMxwC&pg=PA43) (2 ed.). Society for Industrial and Applied Mathematics (SIAM). pp. 43–44. ISBN 978-0-89871521-7.
10. Volkov, E. A. (1990). Numerical Methods (https://books.google.com/books?id=ubfrNN8GGOIC&pg=PA24). Taylor & Francis. p. 24. ISBN 978-1-56032011-1.
11. Collins, Charles (2005). "Condition and Stability" (https://www.math.utk.edu/~ccollins/M577/Handouts/cond_stab.pdf) (PDF). Department of Mathematics in University of Tennessee. Retrieved 2018-10-28.
12. Arnold, Douglas. "The Patriot Missile Failure" (http://ta.twi.tudelft.nl/users/vuik/wi211/disasters.html). Retrieved 2018-10-29.
Further reading
Matt Parker (2021). Humble Pi: When Math Goes Wrong in the Real World. Riverhead Books.
ISBN 978-0593084694.
External links
Roundoff Error (http://mathworld.wolfram.com/RoundoffError.html) at MathWorld.
Goldberg, David (March 1991). "What Every Computer Scientist Should Know About Floating-Point Arithmetic" (http://perso.ens-lyon.fr/jean-michel.muller/goldberg.pdf) (PDF). ACM Computing Surveys. 23 (1): 5–48. doi:10.1145/103162.103163 (https://doi.org/10.1145%2F103162.103163). Retrieved 2016-01-20. ([1] (http://www.validlab.com/goldberg/paper.pdf), [2] (http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html))
20 Famous Software Disasters (http://www.devtopics.com/20-famous-software-disasters/)
Retrieved from "https://en.wikipedia.org/w/index.php?title=Round-off_error&oldid=1056121736"
This page was last edited on 19 November 2021, at 21:33 (UTC).
Text is available under the Creative Commons Attribution-ShareAlike License; additional terms may apply. By using this
site, you agree to the Terms of Use and Privacy Policy. Wikipedia® is a registered trademark of the Wikimedia
Foundation, Inc., a non-profit organization.
