
Numerical Methods

Representing Numbers
Accuracy and Precision
• Accuracy – how closely a measured value agrees with the truth
• Precision – how closely measured values agree with each other
[Figure: target diagrams arranged along axes of increasing accuracy and increasing precision]
• Inaccurate measurements are due to some bias
• Imprecise values are caused by some uncertainty
• Inaccuracy and imprecision are measured using an error term
Errors – Measurement
• Absolute error: E_A = |true − calculated|
• Relative error: E_R = |(true − calculated) / true|

• true = 1.5 cm, calculated = 1 cm
• absolute = 0.5 cm
• relative = 0.333
• true = 1,000,000.5 cm, calculated = 1,000,000 cm
• absolute = 0.5 cm
• relative ≈ 0.5 × 10^−6
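The two definitions above can be sketched directly (a minimal C++ illustration; the helper names are ours, not standard library functions):

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helpers implementing the definitions above.
double absolute_error(double t, double c) {
    return std::fabs(t - c);           // E_A = |true - calculated|
}
double relative_error(double t, double c) {
    return std::fabs((t - c) / t);     // E_R = |(true - calculated) / true|
}
```

For the examples above, both measurements have the same absolute error of 0.5 cm, while the relative errors differ by about six orders of magnitude.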
Absolute vs. Relative Error
• Generally, scientific and engineering applications are less
sensitive to small errors in large values
• Relative error can be used to compare values at widely
varying sizes
• Most computing systems are designed to minimize relative
error
Relative Error: E_R = |(true − calculated) / true|

1) Undefined when the true value is zero
[Figure: plots of f(x) and f_c(x) over 1 ≤ x ≤ 5, with f(2.3) = 0]
• one workaround: E_R = |f(x) − f_c(x)| / max{|f_c(x)|, |f(x)|}

2) Relative measurements

              Celsius    Kelvin
true          10 °C      283 K
calculated    11 °C      284 K

E_R = |(10 − 11) / 10| = 0.1 = 10%
E_R = |(283 − 284) / 283| = 0.0035 = 0.35%
Sources of error
• Inaccuracy often results from a bias in the algorithm
• Euler’s method for explicit integration [discussed later] returns biased results based on the curvature of the solution to the differential equation
• Creating more accurate algorithms is a cornerstone of numerical methods

• Imprecision is more fundamental – it depends on the number of bits supported by the data type being used
• Imagine that we only have 3 digits to represent a value
• How do we represent , , and ?
• Our choices for improving precision are limited:
• increase the number of digits available
Number Bases
• Decimal – base 10
• measure precision in “digits”
• Binary – base 2
• measure precision in “bits”
• Hexadecimal – base 16
• useful for memory dumps
• matches neatly with data produced by 8-, 32-, 64-bit systems
• will explore later
• Octal – base 8
• used to view memory dumps from machines with smaller word sizes
• similar to how hexadecimal is used now
Binary
• When discussing numerical algorithms, it is convenient to discuss numbers using both binary and decimal notation
• For example, discussing significant bits and significant digits can be
useful for different applications
• Understand what both of these terms mean
• Numerical base is represented by a subscript when the value is ambiguous
Interpreting binary numbers
• Double-dabble algorithm
• initialize a register X = 0
• for each digit in the binary number
• if 0 – double X
• if 1 – dabble X (double it and add one)

• Do this with a few numbers:

0000 1101 → 13    0110 1010 → 106    1111 1111 → 255

• How would you handle signed integers?
• Introduce a “sign” bit as the most significant digit
• What are the following values (assuming a preceding sign bit)?

1101 1010 → −90    1011 1001 → −57    1111 1111 → −127
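The double-dabble scan above can be sketched over a string of bits (spaces used for grouping are skipped; sign handling would be a check on the leading bit before the scan):

```cpp
#include <cassert>
#include <string>

// Double-dabble conversion as described above.
int double_dabble(const std::string& bits) {
    int x = 0;                        // initialize a register X = 0
    for (char b : bits) {
        if (b == '0') x = 2 * x;          // double
        else if (b == '1') x = 2 * x + 1; // dabble: double and add one
    }
    return x;
}
```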
Overflow
• What is 1111 1111 + 1?
• or 0000 0000 – 1?
• Value “wraps around” to the opposite value
• One of the most common errors in programming
• What happens when we convert an 8-bit integer into a 4-bit integer by truncating the most significant bits?

0001 0011 -> 0011        1100 1010 -> 1010

• Keeping the least significant k bits of a number performs the operation n mod 2^k
• This is the defined behavior for C/C++ casting:

unsigned long m = ULONG_MAX; // all bits set
unsigned int n = m;          // n = m mod 2^32 on typical platforms – only the low bits survive
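The cast behaves the same as masking with bitwise AND; a small sketch of both forms:

```cpp
#include <cassert>
#include <climits>

// The narrowing cast above keeps the least significant bits,
// i.e. computes m mod 2^k where k is the destination type's width
// (defined behavior for unsigned types in C/C++).
unsigned int low_bits(unsigned long m) {
    return (unsigned int)m;
}

// The 8-bit -> 4-bit truncation from the slide is the same operation,
// written as an explicit mask:
unsigned char low_nibble(unsigned char b) {
    return b & 0x0F;                 // keep the low 4 bits
}
```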
Fixed Point and Arithmetic
• Digits to the right of the binary point represent fractional powers of two
• Evaluate:

• Demonstrate addition and multiplication:
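The slide's own operand values are not shown, so here is a minimal fixed-point sketch; the 16.16 format is an assumption chosen for illustration (the low 16 bits hold the fractional powers of two):

```cpp
#include <cassert>
#include <cstdint>

// Assumed 16.16 fixed-point format: value = raw / 2^16.
typedef int32_t fix16;

fix16  to_fix(double v)   { return (fix16)(v * 65536.0); }
double from_fix(fix16 v)  { return v / 65536.0; }

fix16 fix_add(fix16 a, fix16 b) { return a + b; }  // radix points already align
fix16 fix_mul(fix16 a, fix16 b) {
    return (fix16)(((int64_t)a * b) >> 16);        // rescale after the multiply
}
```

Addition works directly on the raw integers because both operands share the same radix point; multiplication doubles the number of fractional bits, so the product must be shifted back down.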
Floating Point, base 10

0.602 × 10^23
(mantissa × base^exponent)

• spend a few bits on the mantissa and a few on the exponent
• effectively represent very large and very small numbers
• Any base can be used
• normalization constraint: 1/base ≤ mantissa < 1
ensures that each number has a single representation
(basically, the first digit of the mantissa should be non-zero)
Floating Point, base 2
• Standard format: x = ±q × 2^e

• 2 is the base
• ± is the sign bit
• q is the mantissa, where 1/2 ≤ q < 1
• e is the exponent

• What is the decimal spacing between values when the exponent is fixed and the mantissa is 3 bits?
smallest value:
next value:
• How about when the exponent increases by one?
smallest value:
next value:
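The same spacing question can be asked of IEEE doubles: `std::nextafter` returns the adjacent representable value, so the gap between neighbors can be measured directly (the 3-bit example above is the same idea with a much coarser mantissa):

```cpp
#include <cassert>
#include <cmath>

// Gap between x and the next representable double above it.
double spacing_at(double x) {
    return std::nextafter(x, 2.0 * x) - x;
}
```

For a 52-bit mantissa the gap just above 1.0 is 2^−52, and just above 1024 = 2^10 it is 2^−42: the spacing doubles every time the exponent increases by one.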
Floating Point Multiplication

1) Add the sign bits (modulo 2):
2) Add the exponents:
3) Multiply the mantissas:
4) Adjust the exponent to normalize the mantissa:
5) Chop excess digits:
• Find the result of the binary operation given a 3-bit mantissa:
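The five steps above can be sketched in code. The slide works the exercise in binary with a 3-bit mantissa; this base-10, 3-digit version is an assumption made only so the digits are easy to read, and signs are stored as ±1 (multiplying them is equivalent to adding sign bits mod 2):

```cpp
#include <cassert>
#include <cmath>

// Assumed toy format: value = sign * mant * 10^exp, with 0.1 <= mant < 1.
// Nonzero operands are assumed (the normalize loop would not terminate on 0).
struct Fp { int sign; double mant; int exp; };

Fp fp_mul(Fp a, Fp b) {
    Fp r;
    r.sign = a.sign * b.sign;          // 1) combine the signs
    r.exp  = a.exp + b.exp;            // 2) add the exponents
    r.mant = a.mant * b.mant;          // 3) multiply the mantissas
    while (r.mant < 0.1) {             // 4) normalize so 0.1 <= mant < 1
        r.mant *= 10.0;
        r.exp  -= 1;
    }
    r.mant = std::floor(r.mant * 1000.0) / 1000.0;  // 5) chop to 3 digits
    return r;
}
```

For example, (0.123 × 10^2) × (0.456 × 10^3) gives mantissa product 0.056088, which normalizes to 0.56088 × 10^4 and chops to 0.560 × 10^4.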
Floating Point Addition

1) Convert both operands to the same exponent (and chop):
2) Add the mantissas:
3) Normalize:
• What is the actual solution and the corresponding relative error?
• Note that the actual solution could be represented
Loss of Precision
• This subtraction effectively reduced our numerical precision by one digit:

• Assuming a 3-digit mantissa, perform the following operations:
1)
2)
where is the correct result, the relative error is
Loss of Precision
• Each operation maintains the appropriate precision
• However, we end up with far less precision than we expect
• The result has one digit of precision
• The correct value of requires two digits of precision
• We expect , we only need , but we get
• This is known colloquially as catastrophic cancellation
• How could we have avoided loss of precision in this case?
• We will discuss ways to avoid loss of precision in more complex cases in the next lecture
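The slide's own operands are not shown, so a classic stand-in demonstrates the effect: (1 − cos x)/x² should approach 1/2 as x → 0, but the subtraction 1 − cos(x) cancels nearly all significant digits for small x; rewriting it with the half-angle identity avoids the subtraction entirely:

```cpp
#include <cassert>
#include <cmath>

// Catastrophic cancellation: 1 - cos(x) subtracts two nearly equal values.
double cancelled(double x) {
    return (1.0 - std::cos(x)) / (x * x);
}

// Algebraically identical form with no subtraction of near-equals:
// 1 - cos(x) = 2 sin^2(x/2).
double rewritten(double x) {
    double s = std::sin(x / 2.0);
    return 2.0 * s * s / (x * x);
}
```

At x = 10^−8 the naive form is not even close to 0.5, while the rewritten form is accurate to full precision.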
Loss of Precision Theorem
• Let x and y be normalized floating-point machine numbers such that x > y > 0, and

2^(−q) ≤ 1 − y/x ≤ 2^(−p)

for some positive integers p and q; then at most q and at least p significant digits are lost in the subtraction x − y.

• How many bits are lost when calculating x − y for: x = and y = ?
Lost Bits
• Reformulate the Loss of Precision Theorem:

• Lose between p and q bits
Lost Bits
•  How many bits are lost in , where
and ?

Between and bits of precision have been lost.


Lost Bits – Example 2
•  How many bits are lost in when ?

Between and bits of precision have been lost.

We will discuss what to do about this in the next lecture.


Representation Quirks
• Precision scales between decades – spacing between values increases with increasing exponent
• Machine epsilon (unit epsilon)
• smallest number ε such that 1 + ε > 1
• spacing between a value and an adjacent number is given by:

• Functions with steep slope skip values
• Imagine f has a large slope at all points. Several representable values of f(x) will never be produced by any representable x
• How is this a problem for finding roots or optimization?
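Machine epsilon can be found by halving until 1 + ε is indistinguishable from 1 (a standard sketch; the `volatile` keeps extended-precision registers from hiding the rounding):

```cpp
#include <cassert>
#include <cfloat>

// Smallest eps such that 1 + eps > 1 in double precision.
double machine_epsilon() {
    double eps = 1.0;
    for (;;) {
        volatile double probe = 1.0 + eps / 2.0;
        if (probe == 1.0) return eps;   // eps/2 no longer changes 1.0
        eps /= 2.0;
    }
}
```

For IEEE doubles this returns 2^−52, matching `DBL_EPSILON`.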
Chopping, Rounding, and Bias
• Chopping values: 0.1557 × 10^1
• Rounding values: 0.1557 × 10^1

• Both chopping and rounding introduce a bias
• rounding – 4 digits (1-4) round down while 5 digits (5-9) round up
• eliminating the bias can be done by randomly rounding up or down in the case of a ‘5’ – not worth it if running the same program twice returns two different results
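The two schemes can be sketched for d decimal places (d and the sample value below are illustrative, not from the slide):

```cpp
#include <cassert>
#include <cmath>

// Chop: discard everything past decimal place d.
double chop_digits(double v, int d) {
    double s = std::pow(10.0, d);
    return std::trunc(v * s) / s;
}

// Round: take the nearest value at decimal place d (ties away from zero).
double round_digits(double v, int d) {
    double s = std::pow(10.0, d);
    return std::round(v * s) / s;
}
```

Chopping 1.5578 to three places gives 1.557 while rounding gives 1.558; chopping always moves positive values toward zero, which is the bias the slide describes.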
IEEE 32-bit floating point
• Slightly more complicated
• no wasted 1 to force normalization – the leading 1 is implied
• exponent bias removes the need for a sign bit in the exponent
• 32-bit: 1 sign bit, 8-bit exponent, 23-bit mantissa
• exponent bias: 127
• machine epsilon: 2^−23 ≈ 1.19 × 10^−7

• 64-bit: 1 sign bit, 11-bit exponent, 52-bit mantissa
• exponent bias: 1023
• machine epsilon: 2^−52 ≈ 2.22 × 10^−16
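The 1/8/23 layout can be pulled apart with bit operations (`memcpy` sidesteps strict-aliasing issues when reinterpreting the float's bytes):

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Split a 32-bit float into its sign, exponent, and mantissa fields.
void fields(float f, uint32_t& sign, uint32_t& exp, uint32_t& mant) {
    uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);
    sign = bits >> 31;
    exp  = (bits >> 23) & 0xFFu;     // biased by 127
    mant = bits & 0x7FFFFFu;         // stored fraction; leading 1 is implied
}
```

For example, 1.0f has sign 0, biased exponent 127, and mantissa 0; −2.0f has sign 1 and biased exponent 128.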
Hexadecimal
• Useful when looking at binary data
• memory dumps, binary files, etc.

0000 1010  0001 1011  0010 1100  0011 1101
   0    A     1    B     2    C     3    D

• Easier to view the data in 8-bit chunks
• 0A 1B 2C 3D
• Commonly used for
• memory dumps
• doing your own math, bitwise operations
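Formatting bytes in 8-bit hex chunks, as shown above, is a few lines of C++ (the function name is ours):

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <string>

// Format a buffer as space-separated 8-bit hex chunks, e.g. "0A 1B 2C 3D".
std::string hex_dump(const uint8_t* p, size_t n) {
    std::string out;
    char byte[4];
    for (size_t i = 0; i < n; ++i) {
        std::snprintf(byte, sizeof byte, "%02X ", p[i]);
        out += byte;
    }
    if (!out.empty()) out.pop_back();  // drop the trailing space
    return out;
}
```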
Endianness
• Hardware-dependent byte ordering of the system
• Conversion is called the “NUXI problem”
• Intel (and by extension most home computers) use little-endian
• Important if you want to do your own bitwise operations
• Create binary masks
• Implement your own floating point
• Integer arithmetic
Testing Endianness
#include <iostream>

int main() {
    int a = 0x0A1B2C3D;
    unsigned char *c = (unsigned char *)(&a);
    if (*c == 0x3D)
        std::cout << "little-endian" << std::endl;
    else
        std::cout << "big-endian" << std::endl;
    return 0;
}
Numerical Methods – a PSA
• In 1991, failure of a MIM-104 Patriot Missile defense system was
caused by a software error. Time stamps from multiple radar
pulses were converted to floating point differently.
• http://en.wikipedia.org/wiki/MIM-104_Patriot#Failure_at_Dhahran
• In 1994, Professor Thomas R. Nicely discovered that the Intel
Pentium floating‐point processor returned erroneous results for
certain division operations.
• http://engineeringfailures.org/?p=466
• In 1996, the Ariane 5 rocket launched by the European Space
Agency exploded 40 seconds after lift-off from Kourou, French
Guiana, because of an overflow error.
• http://www.around.com/ariane.html
• IBM recently announced that they’re going little-endian
• https://www.business-cloud.com/articles/news/ibm-drops-linux-bombshell

h/t Prof. Lennart Johnsson (COSC)


Numerical Methods – a PSA
• Boeing 787 operators have been ordered to periodically reset their electrical systems to avoid an overflow error that happens every 2^31 centiseconds (around every 8 months)
• http://www.nytimes.com/2015/05/01/business/faa-orders-fix-for-possible-power-loss-in-boeing-787.html?_r=0
• Software patch is being released Fall 2015

• Donkey Kong breaks at level 22. You’re given 260 seconds to complete the level – however they’re only using an 8-bit unsigned integer to store the time...
Memory dump demo
• Try using https://hexed.it/ to view memory dumps and binary files
