EEE212 Page 1
Binary Representation
How a Decimal Number is
Represented
$257.76 = 2 \times 10^{2} + 5 \times 10^{1} + 7 \times 10^{0} + 7 \times 10^{-1} + 6 \times 10^{-2}$
Base 2
$(1011.0011)_2 = (1 \times 2^{3} + 0 \times 2^{2} + 1 \times 2^{1} + 1 \times 2^{0}$
$+ 0 \times 2^{-1} + 0 \times 2^{-2} + 1 \times 2^{-3} + 1 \times 2^{-4})_{10}$
$= 11.1875$
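The positional expansion above can be checked with a short script. This is a minimal sketch; the function name `binary_to_decimal` is made up for illustration and is not part of the slides.

```python
def binary_to_decimal(bits: str) -> float:
    """Evaluate a binary string such as '1011.0011' by positional expansion."""
    integer_part, _, fraction_part = bits.partition(".")
    value = 0.0
    # Integer digits carry weights 2^0, 2^1, ... counted from the right.
    for position, digit in enumerate(reversed(integer_part)):
        value += int(digit) * 2 ** position
    # Fraction digits carry weights 2^-1, 2^-2, ... counted from the left.
    for position, digit in enumerate(fraction_part, start=1):
        value += int(digit) * 2 ** -position
    return value


print(binary_to_decimal("1011.0011"))  # 11.1875
```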
Convert Base 10 Integer to
binary representation
Table 1 Converting a base-10 integer to binary representation.
Operation   Quotient   Remainder
11/2        5          1 = a_0
5/2         2          1 = a_1
2/2         1          0 = a_2
1/2         0          1 = a_3
Hence
$(11)_{10} = (a_3 a_2 a_1 a_0)_2$
$= (1011)_2$
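Flowchart: converting an integer N to binary format
1) Start. Input $(N)_{10}$, the integer to be converted to binary format.
2) Set $i = 0$.
3) Divide $N$ by 2 to get quotient $Q$ and remainder $R$.
4) Set $a_i = R$.
5) Is $Q = 0$? If no, set $i = i + 1$, $N = Q$ and go back to step 3.
6) If yes, set $n = i$. Then $(N)_{10} = (a_n \ldots a_0)_2$. STOP.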
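The repeated-division procedure can be sketched in a few lines. The function name `int_to_binary` is an illustrative assumption; Python's built-in `bin` is used only as a cross-check.

```python
def int_to_binary(n: int) -> str:
    """Convert a non-negative base-10 integer to a binary string by repeated division."""
    if n < 0:
        raise ValueError("n must be non-negative")
    digits = []  # a_0, a_1, ... in the order they are produced
    while True:
        quotient, remainder = divmod(n, 2)
        digits.append(str(remainder))  # a_i = R
        if quotient == 0:
            break
        n = quotient  # N = Q, then repeat
    # The last remainder is the most significant digit, so reverse.
    return "".join(reversed(digits))


print(int_to_binary(11))             # 1011
print(int_to_binary(11) == bin(11)[2:])  # True
```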
Fractional Decimal Number
to Binary
Table 2. Converting a base-10 fraction to binary representation.
Hence
$(0.1875)_{10} = (a_{-1} a_{-2} a_{-3} a_{-4})_2$
$= (0.0011)_2$
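Flowchart: converting a fraction F to binary format
1) Start. Input $(F)_{10}$, the fraction to be converted to binary format.
2) Set $i = -1$.
3) Multiply $F$ by 2 to get the number before the decimal, $S$, and the number after the decimal, $T$.
4) Set $a_i = S$.
5) Is $T = 0$? If no, set $i = i - 1$, $F = T$ and go back to step 3.
6) If yes, set $n = -i$. Then $(F)_{10} = (a_{-1} \ldots a_{-n})_2$. STOP.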
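The repeated-multiplication procedure can be sketched as follows. The name `fraction_to_binary` and the `max_bits` cap are assumptions added here; the cap is needed because the loop does not terminate for fractions without a finite binary expansion.

```python
def fraction_to_binary(f: float, max_bits: int = 16) -> str:
    """Convert a fraction 0 <= f < 1 to binary by repeated multiplication by 2."""
    if not 0 <= f < 1:
        raise ValueError("f must satisfy 0 <= f < 1")
    bits = []
    while len(bits) < max_bits:
        product = f * 2
        s = int(product)   # number before the decimal: the next binary digit
        t = product - s    # number after the decimal: carried to the next step
        bits.append(str(s))
        if t == 0:
            break
        f = t
    return "0." + "".join(bits)


print(fraction_to_binary(0.1875))           # 0.0011
print(fraction_to_binary(0.3, max_bits=5))  # 0.01001
```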
Decimal Number to Binary
$(11.1875)_{10} = (\,?.?\,)_2$
Since
$(11)_{10} = (1011)_2$
and
$(0.1875)_{10} = (0.0011)_2$
we have
$(11.1875)_{10} = (1011.0011)_2$
Not All Fractional Decimal Numbers
Can be Represented Exactly
Table 3. Converting a base-10 fraction to approximate binary representation.
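Operation   Number   Number after decimal   Number before decimal
0.3 × 2     0.6      0.6                    0 = a_{-1}
0.6 × 2     1.2      0.2                    1 = a_{-2}
0.2 × 2     0.4      0.4                    0 = a_{-3}
0.4 × 2     0.8      0.8                    0 = a_{-4}
0.8 × 2     1.6      0.6                    1 = a_{-5}
Hence, keeping five binary digits,
$(0.3)_{10} \approx (a_{-1} a_{-2} a_{-3} a_{-4} a_{-5})_2 = (0.01001)_2 = 0.28125$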
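To see that 0.3 never terminates in binary, the same multiplication steps can be run in exact rational arithmetic. This is an illustrative sketch; `truncated_binary_value` is a hypothetical helper, and `fractions.Fraction` is used so that no floating point rounding hides the effect.

```python
from fractions import Fraction


def truncated_binary_value(f: Fraction, n_bits: int) -> Fraction:
    """Value of f (0 <= f < 1) chopped to n_bits binary digits after the point."""
    value = Fraction(0)
    for i in range(1, n_bits + 1):
        f *= 2
        if f >= 1:               # the digit a_{-i} is 1
            value += Fraction(1, 2 ** i)
            f -= 1
    return value


x = Fraction(3, 10)
for n in (5, 10, 20, 40):
    approx = truncated_binary_value(x, n)
    # The remaining error shrinks as bits are added but never reaches zero.
    print(n, float(approx), float(x - approx))
```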
$(11)_{10} = 1 \times 2^{3} + 0 \times 2^{2} + 1 \times 2^{1} + 1 \times 2^{0}$
$= (1011)_2$
$(0.1875)_{10} = 2^{-3} + 0.0625$
$= 2^{-3} + 2^{-4}$
$= 0 \times 2^{-1} + 0 \times 2^{-2} + 1 \times 2^{-3} + 1 \times 2^{-4}$
$= (.0011)_2$
$(11.1875)_{10} = (1011.0011)_2$
Floating Point Representation
Floating Decimal Point : Scientific Form
$0.003678$ is written as $+3.678 \times 10^{-3}$
$-256.78$ is written as $-2.5678 \times 10^{2}$
Example
The form is
$\text{sign} \times \text{mantissa} \times 10^{\text{exponent}}$
or
$\sigma \times m \times 10^{e}$
Example: For
$-2.5678 \times 10^{2}$
$\sigma = -1$
$m = 2.5678$
$e = 2$
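The decomposition into sign, mantissa and exponent can be sketched as below. The function name `scientific_form` is an assumption, and because the work is done in binary floating point, the returned mantissa is only approximately the decimal value shown on the slide.

```python
import math


def scientific_form(x: float) -> tuple:
    """Split a nonzero x into (sigma, m, e) with x = sigma * m * 10**e and 1 <= m < 10."""
    if x == 0:
        raise ValueError("zero has no normalized scientific form")
    sigma = -1 if x < 0 else 1
    e = math.floor(math.log10(abs(x)))  # decimal exponent
    m = abs(x) / 10 ** e                # mantissa, subject to rounding
    return sigma, m, e


print(scientific_form(-256.78))
print(scientific_form(0.003678))
```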
Floating Point Format for Binary
Numbers
$y = \sigma \times m \times 2^{e}$
0 0 1 0 1 1 1 0 1
Fields, left to right: sign of the number, sign of the exponent, mantissa, exponent.
Machine Epsilon
Machine epsilon is a measure of accuracy. It is the
difference between 1 and the next number
that can be represented.
Example (4 mantissa bits, so the next number after 1 is $(1.0001)_2 = 1.0625$)
$\epsilon_{mach} = 1.0625 - 1 = 2^{-4}$
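The definition can be tested directly by halving a candidate until adding it to 1 no longer changes the result. This sketch runs in Python's double precision, so it reports $2^{-52}$ rather than the $2^{-4}$ of the small hypothetical word; `epsilon_for_mantissa_bits` is an illustrative helper for the latter.

```python
import sys


def machine_epsilon() -> float:
    """Smallest power of two eps such that 1.0 + eps is still distinguishable from 1.0."""
    eps = 1.0
    while 1.0 + eps / 2 > 1.0:
        eps /= 2
    return eps


def epsilon_for_mantissa_bits(m_bits: int) -> float:
    """Gap between 1 and the next number when the mantissa stores m_bits bits."""
    return 2.0 ** -m_bits


print(machine_epsilon() == sys.float_info.epsilon)  # True
print(epsilon_for_mantissa_bits(4))                 # 0.0625
```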
Relative Error and Machine
Epsilon
The absolute relative true error in representing
a number will be less than the machine epsilon.
Example
$(0.02832)_{10} \cong (1.1100)_2 \times 2^{-6}$
$= (1.1100)_2 \times 2^{-(0110)_2}$
0 1 0110 1100
Fields, left to right: sign of the number (0), sign of the exponent (1), exponent (0110), mantissa (1100).
$(1.1100)_2 \times 2^{-(0110)_2} = 0.02734375$
$\epsilon_a = \left| \frac{0.02832 - 0.02734375}{0.02832} \right|$
$= 0.034472 < 2^{-4} = 0.0625$
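The example can be reproduced numerically. This is a sketch under the assumption that the mantissa is chopped (not rounded) to 4 bits, which is what yields $(1.1100)_2$; the helper `chop_to_word` is hypothetical.

```python
import math


def chop_to_word(x: float, mantissa_bits: int) -> tuple:
    """Chop x to the form (1.m)_2 * 2**e, keeping mantissa_bits bits after the point.

    Assumption: extra bits are chopped, not rounded.
    """
    e = math.floor(math.log2(abs(x)))
    m = abs(x) / 2.0 ** e                      # normalized to 1 <= m < 2
    scale = 2 ** mantissa_bits
    m_chopped = math.floor(m * scale) / scale  # drop the bits that do not fit
    stored = math.copysign(m_chopped * 2.0 ** e, x)
    return stored, m_chopped, e


true_value = 0.02832
stored, m, e = chop_to_word(true_value, 4)
rel_error = abs(true_value - stored) / true_value
print(stored, m, e)                 # 0.02734375 1.75 -6
print(round(rel_error, 6), rel_error < 2 ** -4)
```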
IEEE-754 Floating Point
Standard
• Standardizes representation of
floating point numbers on
different computers in single and
double precision.
• Standardizes representation of
floating point operations on
different computers.
One Great Reference
http://www.validlab.com/goldberg/paper.pdf
IEEE-754 Format Single
Precision
$\text{Value} = (-1)^{s} \times (1.m)_2 \times 2^{e' - 127}$
Example#1
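1 10100010 10100000000000000000000
(sign $s$, biased exponent $e'$, mantissa $m$)
$\text{Value} = (-1)^{1} \times (1.101)_2 \times 2^{(10100010)_2 - 127}$
$= (-1) \times (1.625) \times 2^{162 - 127}$
$= (-1) \times (1.625) \times 2^{35} = -5.5834 \times 10^{10}$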
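The decoding in Example#1 can be sketched in code. The function `decode_single` is an illustrative name and handles only normal numbers ($1 \le e' \le 254$); the standard library's `struct` module is used as an independent cross-check.

```python
import struct


def decode_single(bits: str) -> float:
    """Decode a 32-bit pattern (sign, 8-bit biased exponent, 23-bit mantissa)."""
    bits = bits.replace(" ", "")
    if len(bits) != 32:
        raise ValueError("expected 32 bits")
    s = int(bits[0], 2)
    e_biased = int(bits[1:9], 2)
    m = int(bits[9:], 2) / 2 ** 23  # fractional part of (1.m)_2
    return (-1) ** s * (1 + m) * 2.0 ** (e_biased - 127)


pattern = "1" + "10100010" + "101" + "0" * 20
value = decode_single(pattern)
reference = struct.unpack(">f", int(pattern, 2).to_bytes(4, "big"))[0]
print(value)               # -55834574848.0
print(value == reference)  # True
```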
Example#2
Represent $-5.5834 \times 10^{10}$ as a single
precision floating point number.
? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ?
$-5.5834 \times 10^{10} = (-1)^{1} \times (1.\,?)_2 \times 2^{\pm ?}$
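One way to check an answer to Example#2 is to let the machine pack the number and then read the three fields back. This is a sketch using the standard `struct` module; `single_precision_fields` is a hypothetical helper name. Since $-5.5834 \times 10^{10}$ is itself a rounded decimal, the mantissa bits need not match the pattern in Example#1 exactly.

```python
import struct


def single_precision_fields(x: float) -> tuple:
    """Return (sign bit, 8 exponent bits, 23 mantissa bits) of x stored as a single."""
    (word,) = struct.unpack(">I", struct.pack(">f", x))  # reinterpret the 4 bytes as an integer
    bits = f"{word:032b}"
    return bits[0], bits[1:9], bits[9:]


s, e_bits, m_bits = single_precision_fields(-5.5834e10)
print(s, e_bits, m_bits)
print("unbiased exponent:", int(e_bits, 2) - 127)
```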
Exponent for 32 Bit IEEE-754
8 bits would represent
0 ≤ e′ ≤ 255
Bias is 127; so subtract 127 from
representation
− 127 ≤ e ≤ 128
Exponent for Special Cases
Actual range of e′
1 ≤ e′ ≤ 254
e′ = 0 and e′ = 255 are reserved for special numbers
Actual range of e
− 126 ≤ e ≤ 127
Special Exponents and Numbers
$e' = 0$: all zeros
$e' = 255$: all ones
s        e'          m           Represents
0        all zeros   all zeros   0
1        all zeros   all zeros   -0
0        all ones    all zeros   ∞
1        all ones    all zeros   −∞
0 or 1   all ones    non-zero    NaN
IEEE-754 Format
Largest number by magnitude:
$(1.1 \ldots 1)_2 \times 2^{127} = 3.40 \times 10^{38}$
Machine epsilon:
$\epsilon_{mach} = 2^{-23}$
$= 1.19 \times 10^{-7}$
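Both figures follow from the format and can be recomputed. This is a minimal sketch; it uses the fact that a mantissa of 23 ones equals $2 - 2^{-23}$, and `struct` only to confirm that the value fits in single precision unchanged.

```python
import struct

# (1.11...1)_2 with 23 ones after the point equals 2 - 2^-23.
largest = (2 - 2.0 ** -23) * 2.0 ** 127
eps_single = 2.0 ** -23

print(f"{largest:.2e}")     # 3.40e+38
print(f"{eps_single:.2e}")  # 1.19e-07

# Round trip through a 4-byte float: the value survives unchanged.
round_trip = struct.unpack(">f", struct.pack(">f", largest))[0]
print(round_trip == largest)  # True
```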
Why measure errors?
1) To determine the accuracy of
numerical results.
2) To develop stopping criteria for
iterative algorithms.
True Error
Defined as the difference between the true
value in a calculation and the approximate
value found using a numerical method etc.
If $f(x) = 7e^{0.5x}$ and $h = 0.3$
h     f'(2)    |ε_a|   m
0.3   10.263   N/A     0
Round off Error
• Caused by representing a number
approximately
$\frac{1}{3} \cong 0.333333$
$\sqrt{2} \cong 1.4142\ldots$
Problems created by round off error
Problem with Patriot missile
• The clock cycle of 1/10 second,
represented in a 24-bit fixed point
register, created an error of
$9.5 \times 10^{-8}$ seconds.
• The battery was on for 100
consecutive hours, thus causing
an inaccuracy of
$= \frac{9.5 \times 10^{-8}\,\text{s}}{0.1\,\text{s}} \times 100\,\text{hr} \times \frac{3600\,\text{s}}{1\,\text{hr}}$
$= 0.342\,\text{s}$
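The accumulated clock error is a one-line calculation, shown here as a small sketch. The variable names are illustrative; the numbers are the ones quoted above.

```python
error_per_tick = 9.5e-8  # seconds of error for each 0.1 s clock tick
tick = 0.1               # seconds per tick
hours_on = 100           # hours the battery had been running

ticks = hours_on * 3600 / tick  # number of clock ticks in 100 hours
drift = error_per_tick * ticks  # accumulated timing error in seconds

print(round(drift, 3))  # 0.342
```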
Problem (cont.)
• The shift calculated in the ranging system
of the missile was 687 meters.
• The target was considered to be out of
range at a distance greater than 137
meters.
Effect of Carrying Significant
Digits in Calculations
Find the contraction in the
diameter
$\Delta D = D \int_{T_a}^{T_c} \alpha(T)\, dT$
Regressing Data in Excel
(general format)
Example of Truncation Error
Another Example of Truncation
Error
[Figure: secant line and tangent line at point P on the curve $y = x^2$]
Example 1 —Maclaurin series
Calculate the value of $e^{1.2}$ with an absolute
relative approximate error of less than 1%.
$e^{1.2} = 1 + 1.2 + \frac{1.2^{2}}{2!} + \frac{1.2^{3}}{3!} + \ldots$
n   e^1.2    E_a        |ε_a| %
1   1        __         ___
2   2.2      1.2        54.545
3   2.92     0.72       24.658
4   3.208    0.288      8.9776
5   3.2944   0.0864     2.6226
6   3.3151   0.020736   0.62550
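The table can be regenerated by adding Maclaurin terms until the absolute relative approximate error drops below the tolerance. This is a sketch; the function name `exp_maclaurin` and its return shape are assumptions made for illustration.

```python
def exp_maclaurin(x: float, tol_percent: float) -> tuple:
    """Sum the Maclaurin series of e**x until |eps_a| (in percent) is below tol_percent.

    Returns (approximation, number of terms used, last |eps_a| in percent).
    """
    total = 1.0   # first term of the series
    term = 1.0
    n_terms = 1
    while True:
        term *= x / n_terms        # next term x**k / k! built from the previous one
        new_total = total + term
        n_terms += 1
        rel_error = abs((new_total - total) / new_total) * 100
        total = new_total
        if rel_error < tol_percent:
            return total, n_terms, rel_error


value, terms, err = exp_maclaurin(1.2, 1.0)
print(terms, round(value, 4), round(err, 4))
```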
Find $f'(3)$ for $f(x) = x^2$ using
$f'(x) \approx \frac{f(x + \Delta x) - f(x)}{\Delta x}$
and $\Delta x = 0.2$
$f'(3) = \frac{f(3 + 0.2) - f(3)}{0.2}$
$= \frac{f(3.2) - f(3)}{0.2} = \frac{3.2^2 - 3^2}{0.2} = \frac{10.24 - 9}{0.2} = \frac{1.24}{0.2} = 6.2$
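The same finite-difference estimate can be computed in code, which makes it easy to try smaller step sizes. The name `forward_difference` is illustrative, and the printed value is rounded because the arithmetic is done in floating point.

```python
def forward_difference(f, x: float, dx: float) -> float:
    """Approximate f'(x) by (f(x + dx) - f(x)) / dx."""
    return (f(x + dx) - f(x)) / dx


approx = forward_difference(lambda x: x ** 2, 3.0, 0.2)
print(round(approx, 6))  # 6.2
```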
[Figure: area under the curve $y = x^2$ from $x = 3$ to $x = 9$, representing $\int_{3}^{9} x^2\,dx$]
Integration example (cont.)
$\int_{3}^{9} x^2\,dx \approx (x^2)\big|_{x=3}\,(6 - 3) + (x^2)\big|_{x=6}\,(9 - 6)$
$= (3^2)3 + (6^2)3$
$= 27 + 108 = 135$
Actual value is given by
$\int_{3}^{9} x^2\,dx = \left[\frac{x^3}{3}\right]_{3}^{9} = \frac{9^3 - 3^3}{3} = 234$
Truncation error is then
$234 - 135 = 99$
Can you find the truncation error with 4 rectangles?
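The rectangle sum generalizes to any number of equal-width rectangles, each using the function value at its left edge as in the two-rectangle case. This is a sketch; `left_rectangle_rule` is a hypothetical name, and changing `n` to 4 gives a way to check the exercise above.

```python
def left_rectangle_rule(f, a: float, b: float, n: int) -> float:
    """Approximate the integral of f over [a, b] with n left-endpoint rectangles."""
    width = (b - a) / n
    return sum(f(a + i * width) for i in range(n)) * width


exact = (9 ** 3 - 3 ** 3) / 3  # 234.0
two = left_rectangle_rule(lambda x: x ** 2, 3.0, 9.0, 2)
print(two, exact - two)  # 135.0 99.0

# More rectangles reduce the truncation error.
four = left_rectangle_rule(lambda x: x ** 2, 3.0, 9.0, 4)
print(exact - four)
```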