You are on page 1of 55

Floating Point Operations

Numerical Methods

2019-2020
Discrete World

• We can not represent


all the real numbers
with a computer.
• Continuous processes
Numerical Methods

in space or in time
must be represented
in a discrete world

2019-2020
Discrete World

• The hardware of any computer only


implements the elemental algebraic operations:
+,-,*,/
• Any other mathematical operation must be
Numerical Methods

properly implemented through a specific


algorithm. This includes:
– Any transcendental function: sin(x), ln(x)
– Derivation operations

2019-2020
Floating Point Numbers

• In order to describe a real number we use a


real part, a decimal point and a fractional part:

37.21829, 0.002271828
Numerical Methods

Decimal Point Fractional Part


Real Part

2019-2020
Floating Point Numbers

• But we normally use the normalized scientific


notation:

37.21829  0.3721829  10 2
Numerical Methods

2
0.002271828  0.2271828  10

2019-2020
Floating Point Numbers

• Using normalized notation each number is


represented by certain fraction multiplied by
10n.

x  0.d1 d 2 d 3   10 n
Numerical Methods

• With the additional constraint that the first digit


in the fractional part must be not zero

d1  0
2019-2020
Floating Point Numbers

• Thus, in normalized scientific form, a


decimal point will be represented as:

1
x   r  10 n  r 1
10
Numerical Methods

where r will be called the normalized


mantissa and n the exponent

2019-2020
Floating Point Numbers

• Computers will use the normalized scientific


notation in a similar way, buy using a
different basis. The binary basis-2 system
1
x  q  2 m
 q 1
Numerical Methods

2
• q will be then a bit sequence bi (0 or 1) with
the additional condition:
b1  1
2019-2020
Floating Point Numbers

• The important point is that any computer has a


finite word length to complete the
representation of a real number.
• We can only represent numbers with a finite
Numerical Methods

number of digits. No way to treat irrational


numbers
• Real numbers can be too big or too small in
order to be represented within a computer.

2019-2020
Floating Point Numbers
• The finite set of numbers that can be
represented in a computer will be called the
set of machine numbers

x  0.1b2 b3  bn 2  2 m
Numerical Methods

• The smallest number will be of the form:

0.12  2 m

• And thus, there will be always a “hole around


zero”
2019-2020
Floating Point Numbers
• The detailed
Sign of q 1 bit
representation will
depend on the
machine Sign of m 1 bit
architecture.
Numerical Methods

• In order to fix the Integer |m| 14 bites


ideas, suppose that
we deal with a 64
bites and: Integer q 48 bits

2019-2020
Floating Point Numbers

Radix Point
16 bites 48 bites
Numerical Methods

exponent Normalized Mantissa

Mantissa sign Exponent Sign

2019-2020
Floating Point Numbers

• With these constraints, the maximum exponent


that can be represented is

ଵସ
Numerical Methods

• On the other hand, the smallest number that


can be represented by our machine is:
48 14
q2  0.3553  10

2019-2020
Computer Errors

• Some real numbers can be not represented with


this word structure:

x  (0.1d1d 2 K d 48d 49d 50 K ) 2  2 m


Numerical Methods

• We will be forced to discard some of the digits


in the mantissa, starting at the 48th position:

d 49d 50 K

2019-2020
Computer Errors

• We can do this in two ways. By Cutting off:

x'  0.1d 2 d 3  d 48 2  2 m

• Or rounding off. In this case we will


Numerical Methods

proceed as follows:

x ''   0.1d 2d 3 K d 48  2  2 48


  2 m

• And then we will take the nearest value


between x’ and x’’
2019-2020
Absolute Error

• The absolute error will be defined as:

 a  x  x'
• If we are using the rounding of numbers,
Numerical Methods

then:
1
 a  x  x'  x' ' x'  2  49  m

2
• The absolute error will depend on the
magnitude of the number (through m)
2019-2020
Relative Error

• On the other hand, the relative error is


defined as: x  x'
r 
x
• An in the case of rounding we have:
Numerical Methods

x  x '' 249 m 249 48


   2
x  2 3 
0.1d d K  2 m
1/ 2

• And the error does not depend anymore on


the magnitude of this particular number
2019-2020
Relative Error

• In our machine the representation error of a


real number will not be greater that this value:

 r  2 48  0.36  10 14


Numerical Methods

• Thus, all the floating point operations are


performed normally with 14 significant digits.
• This particular value is the machine rounding
error.

2019-2020
Computer Errors
• If during the computing operations we get a
number of the form:
 q2 m

where m is bigger than the greater number of


Numerical Methods

your computer you will get an overflow error


• And if you try to use number smaller that the
smallest number you get an underflow error.
(NaN: Not a Number)

2019-2020
Arithmetic Operations

• Any elemental arithmetic operation will not


improve the precision of your numbers:

x  y  x' y ' a ( x)   a ( y )
Numerical Methods

• Your errors will always pile up. Never


disappearing

 a ( x  y )   a ( x)   a ( y )

2019-2020
Matlab/Octave

• We will make an intensive use of the


MatLab/octave tools for Numerical Analysis
Numerical Methods

2019-2020
Matlab/Octave
• Octave+GNUPlot (QtOctave) is a free
distribution software, with a similar functionality
to MatLab
Numerical Methods

2019-2020
Matlab/Octave
• In this soft we will find a great number of
integrated functionality. We get a programming
and many graphics facilities.
Numerical Methods

2019-2020
Matlab/Octave

• All the useful


standard functions
are implemented
Numerical Methods

2019-2020
Matlab/Octave

• We can also get the standard logical operators


Numerical Methods

2019-2020
Matlab/Octave

• And the standard hierarchy of precedence with


the operators.
Numerical Methods

2019-2020
Floating Point Operations

• The lower an
bigger number
that your
computer can
Numerical Methods

use are stored in


the constants
realmax and
realmin

2019-2020
Floating Point Operations
• By Default, Matlab/Octave will work with a
precision of 16 decimal digits, but also by
default, will show you in the screen the values
in single precision (7 digits)
Numerical Methods

2019-2020
Floating Point Operations

• Consider the following simple arithmetic


operation:
Numerical Methods

2019-2020
Floating Point Operations

• The explanation of the former result is in the


fact that the computer works with basis-2 and
the floating point number t = 0.1 is not
represented as 1/10 but as this infinite series:
Numerical Methods

1 1 1 0 0 1 1 0
 4  5  6  7  8  9  10 L
10 2 2 2 2 2 2 2

• And the computer will never be able to store all


the mantissa digits
2019-2020
Floating Point Operations
• When representing floating point numbers the
computer will make a representation error
which will have as upper limit the value:

x  fl ( x) 1
 
Numerical Methods

x 2
• Where fl(x) is the floating point representation.
The constant ε is called the machine epsilon.

2019-2020
Floating Point Operations
• This constant ε is the minimum distance
between two floating point numbers. It is also
the smallest relative error. This values is
predefined in eps and takes the value ε=2^(-52)
Numerical Methods

2019-2020
Floating Point Operations

• This constant value can be calculated in any


computer using the following algorithm:
Numerical Methods

• The constant b ends with the relative error


b=eps/2. We will not be able to do floating
point operations with an smaller relative error.

2019-2020
Floating Point Operations

• There are many


ways to compute the
machine epsilon.
Here we have
Numerical Methods

another example:

2019-2020
Loss of Significance
• The limited precision implies that we have to
pay special attention if we want to make good
programs.
• A common error appears when subtracting very
similar numbers. The relative error can be really
Numerical Methods

huge. This particular error is called terms


cancellation

2019-2020
Loss of Significance

• In the case of subtraction the relative error can


be computed as:

 a ( x)   a ( y )
r 
x y
Numerical Methods

• And if x and y are very similar we can obtain


huge relative

2019-2020
Loss of Significance

• Consider for instance the expression


y  x  sin(x)
where:
x  0.6666666667  10 1
Numerical Methods

sin( x)  0.6661729492  10 1
x  sin( x)  0.0004937175  10 1
 0.4937175000  10  4
• We have lost three significant digits, i.e., a
relative error of 103

2019-2020
Loss of Significance

• If we want to avoid this problem we will be


forced to make some changes in mathematical
expression. For instance, consider:
f ( x)  x 2  1  1
Numerical Methods

• If x is a near zero value is much better to use


instead:

     
2
x 1 1 x2
f ( x)  x 1 1  
2 
 x2  1 1 x2 1 1
 

2019-2020
Loss of Significance

• With this program we can explore the


difference (nm122.m)
Numerical Methods

2019-2020
Loss of Significance

• This program will compute the two equivalent


functions:

f1 ( x)  x  x 1  x  i f 2 ( x) 
x
x 1  x
Numerical Methods

• For different values of x. When x=1015 we have


that:
x  1  3.162277660168381 107  31622776.60168381
x  3.162277660168379 107  31622776.60168379

2019-2020
Loss of Significance

• And we obtain the following results:


Numerical Methods

2019-2020
Loss of Significance

• Another example can be found when evaluating


polynomials like
p( x )  ( x  1)7
• Taking the x values within [0.988, 1.012] and
Numerical Methods

developing the polynomial expression to


obtain:
p ( x)  x 7  7 x 6  21x 5  35 x 4  35 x 3  21x 2  7 x  1

2019-2020
Loss of Significance
• Evaluating these two expressions alternatively
we obtain
Numerical Methods

2019-2020
Loss of Significance

• In order to evaluate the expression

y ( x)  x  sin( x)
Numerical Methods

• Is a clever solution to use the Taylor series for


sin(x). Thus
 x3 x5 x7  x3 x5 x7
f ( x)  x   x         
 3! 5! 7!  3! 5! 7!

2019-2020
Transcendental Functions

• The last example introduces a new problem. A


computer will not be able to evaluate directly
expressions like:

ln(2)
Numerical Methods

• Transcendental functions are those functions


that can not be evaluated using elementary
arithmetic operations.

2019-2020
Transcendental Functions
• We can use a polynomial to approximate this
kind of functions, thanks to:
• Weierstrass Approximation Theorem: If f is
defined and it si continuous within the interval
[a,b] and given any ε >0, There is some
Numerical Methods

polynomial P, defined within [a,b], for which

f ( x)  P( x)  

for any x   a, b
2019-2020
Taylor’s Polynomial

• Taylor’s Theorem: If f  C a, b and if f(n+1)


n

exists within the interval (a,b), then, for any


point c and x in [a,b] there is some point ξ
between c and x for which
Numerical Methods

n
1
f ( x)  
k 0 k!
f (k )
( c )( x  c ) k  E n ( x ),
where
1
E n ( x)  f ( n 1) ( )( x  c) n 1 .
(n  1)!

2019-2020
Taylor’s Polynomial

• In this way we can write

1 1
ln( x)  ( x  1)  ( x  1)  ( x  1) 3  
2

2 3
Numerical Methods

n 1 1
 (1) ( x  1) n  E n ( x)
n

• This expression can be used in any computer


as we are using only arithmetic operations.

2019-2020
Taylor’s Polynomial

• Suppose that we want to calculate ln(2) with a


precision of 8 digits. Then, it must be true that:

1 ( n 1) 1
E n ( 2)   n 1
(2  1)   10 8

n 1 n 1
Numerical Methods

• Thus n+1 ≥ 108, and we will need more than a


hundred million terms

2019-2020
Taylor’s Polynomial
• The Taylor Polynomial is just a local
approximation to a function
Numerical Methods

2019-2020
Taylor’s Polynomial

• The Taylor polynomial will not be useful


outside the interpolating range
Numerical Methods

2019-2020
Error Propagation
• We can compute recursively the value of the
Bessel functions in order to see how the
propagation of errors can be catastrophic.
• The value of the Bessel functions Jn(x) when
x=1. Can be obtained with the recursive
Numerical Methods

relation:
1
J n 1 ( x)  2nx J n ( x)  J n 1 ( x)
where
J 0 (1)  0.7651976865
J1 (1)  0.4400505857
2019-2020
Error Propagation

• The Bessel functions are defined by the relation



1
J n ( x) 
  cos( x sin   n )d
0
Numerical Methods

• And it is true that



1
J n ( x) 
  1d
0
1

2019-2020
Error Propagation

• But if we use the simple following


program:
Numerical Methods

2019-2020
Error Propagation

• But you will


get!?!
Numerical Methods

2019-2020

You might also like