1/8/2007 - L24 IEEE Floating Point Basics - Copyright 2006 - Joanne DeGroat, ECE, OSU
Lecture overview
The standard
Floating Point Basics
A floating point adder design
The floating point standard
Single Precision
s (1 bit) | e (8 bits) | f (23 bits)
Conversion Examples
Converting from base 10 to the representation
Single precision example
Convert 100 (base 10)
Step 1 – convert to binary: 0110 0100
128 64 32 16 8 4 2 1
  0  1  1  0 0 1 0 0
Conversion Example Continued
1.1001 x 2^6 is binary for 100
Thus the exponent is 6
Biased exponent will be 6+127=133 = 1000 0101
Sign will be a 0 for positive
Stored fractional part f will be 1001
Thus we have
s e         f
0 1000 0101 1001 0000 ….
Regrouped into 4-bit groups: 0100 0010 1100 1000 0000 ….
4 2 C 8 0 0 0 0 in hexadecimal
$42C8 0000 is the representation for 100
Another example
Representation for -175
175 = 128 + 32 + 8 + 4 + 2 +1 = 1010 1111
Or 1.0101111 x 2^7
S=1
Exponent is 7 +127 = 134 = 1000 0110
Fractional part f = 0101111
Representation 1100 0011 0010 1111 0000 ….
Or in Hex $C32F 0000
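The two conversions above can be reproduced with a short Python sketch. The helper name `encode_single` is hypothetical, and the sketch assumes a nonzero integer smaller in magnitude than 2^24 so that no rounding is needed; it is a check on the hand calculation, not a general encoder.

```python
import struct

BIAS = 127  # single-precision exponent bias

def encode_single(value: int) -> int:
    """Encode a nonzero integer as an IEEE 754 single-precision bit pattern.

    Hypothetical helper: assumes 0 < |value| < 2**24, so the value fits
    in the 24-bit significand exactly and no rounding is required.
    """
    if value == 0 or abs(value) >= 1 << 24:
        raise ValueError("sketch handles only nonzero |value| < 2**24")
    sign = 1 if value < 0 else 0
    magnitude = abs(value)
    exponent = magnitude.bit_length() - 1        # position of the leading 1
    # Drop the hidden leading 1 and left-align the rest in the 23-bit field.
    fraction = (magnitude - (1 << exponent)) << (23 - exponent)
    return (sign << 31) | ((exponent + BIAS) << 23) | fraction

for n in (100, -175):
    bits = encode_single(n)
    # Cross-check against the platform's own single-precision encoding.
    assert bits == struct.unpack(">I", struct.pack(">f", float(n)))[0]
    print(n, f"${bits:08X}")
```

The cross-check with `struct` confirms that the hand-built pattern matches the encoding Python produces for the same value.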
Converting back
Convert $C32F 0000 into decimal
Extract components from
1100 0011 0010 1111
S=1
Exponent = 1000 0110 = 128+4+2 = 134
Unbiased exponent: 134 – 127 = 7
f = 0101111 so mantissa is 1.0101111
Adjust by exponent: 1010 1111 (move binary point 7 places)
Or 128+32+15 = 175
Sign is negative so -175
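The extraction steps above can be sketched in Python as follows. The function name `decode_single` is hypothetical, and the sketch assumes a normalized number (stored exponent neither all zeros nor all ones).

```python
def decode_single(bits: int):
    """Split a 32-bit pattern into fields and rebuild its value.

    Hypothetical helper: handles normalized numbers only.
    Returns (sign, unbiased exponent, fraction field, value).
    """
    sign = bits >> 31                  # 1 bit
    biased = (bits >> 23) & 0xFF       # 8 bits
    fraction = bits & 0x7FFFFF         # 23 bits
    if biased == 0 or biased == 0xFF:
        raise ValueError("sketch handles normalized numbers only")
    exponent = biased - 127            # remove the bias
    significand = 1 + fraction / 2**23 # restore the hidden leading 1
    value = (-1) ** sign * significand * 2.0 ** exponent
    return sign, exponent, fraction, value

sign, exponent, fraction, value = decode_single(0xC32F0000)
print(sign, exponent, f"{fraction:023b}", value)
```

For $C32F 0000 this yields sign 1, exponent 7, a fraction field beginning 0101111, and the value -175.0, matching the hand conversion.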
Another example
Convert $41C8 0000 to decimal
0100 0001 1100 1000 0000 ….
S is 0 so positive number
Exponent 1000 0011 = 128+3 = 131; unbiased 131 – 127 = 4
f = 1001 so mantissa is 1.1001
Moving the binary point 4 places gives 11001 as the final
number, or decimal 25
Arithmetic with floating point numbers
Add op1 $42C8 0000 and op2 $41C8 0000
First divide into component parts
Op1 $42C8 0000 =0100 0010 1100 1000 0000 ….
S=0
E = 1000 0101 = 133 – 127 = 6
Mop1 = 1.10010000…
Op2 $41C8 0000 =0100 0001 1100 1000 0000 ….
S=0
E = 1000 0011 = 131 – 127 = 4
Mop2 = 1.10010000…
Now add the mantissas
But first align the mantissas
Op1 1.1001000….
Op2 1.1001000…. (the smaller number, which needs to be aligned)
Exponent difference between op1 and op2 is 2
So shift op2 right by 2 binary places:
Op2 becomes 0.0110010000…
Add
Add op1 mantissa with the aligned op2 mantissa
  1.1001000000…
+ 0.0110010000…
= 1.1111010000…
Result exponent is 6
Value is 1.111101 x 2^6 = 1111101 or 64+32+16+8+4+1=125
Values added were 100 and 25
Constructing Result Value
Sign 0
Exponent 6, stored as E = 6 + 127 = 133 = 1000 0101
Mantissa of Result 1.1111010000
Fractional Part 1111010000….
Constructed Value
0 100 0010 1 111 1010 0000 0000 0000 0000
$4 2 F A 0 0 0 0 (125)
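The align, add, and construct steps above can be sketched in Python. The names `split` and `add_positive` are hypothetical, and the sketch assumes two positive normalized operands; it truncates any bits shifted out during alignment instead of rounding and ignores NaN, infinity, and denormalized inputs.

```python
def split(bits: int):
    """Return (unbiased exponent, 24-bit significand including the hidden 1)."""
    return ((bits >> 23) & 0xFF) - 127, (bits & 0x7FFFFF) | (1 << 23)

def add_positive(a_bits: int, b_bits: int) -> int:
    """Add two positive normalized single-precision patterns (sketch only)."""
    ea, ma = split(a_bits)
    eb, mb = split(b_bits)
    if ea < eb:                    # make operand a the one with the larger exponent
        ea, ma, eb, mb = eb, mb, ea, ma
    mb >>= ea - eb                 # align the smaller operand (truncates, no rounding)
    total = ma + mb
    exponent = ea
    if total >> 24:                # sum reached 2.0 or more: renormalize
        total >>= 1
        exponent += 1
    # Sign bit is 0; drop the hidden 1 when storing the fraction.
    return ((exponent + 127) << 23) | (total & 0x7FFFFF)

print(f"${add_positive(0x42C80000, 0x41C80000):08X}")
```

With the operands from the slides no renormalization is needed, since 1.1001 + 0.011001 stays below 2.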
Floating point representation of
125
Positive so s is 0
Exponent is 6 + 127 = 133 = 1000 0101
Fractional part from mantissa 1.111101 is 111101
Constructed value
0 1000 0101 111101 00000000000000000
$42FA 0000
Multiplication example
Multiply op1 $42C8 0000 & op2 $41C8 0000
First divide into component parts
Op1 $42C8 0000 =0100 0010 1100 1000 0000 ….
S=0
E = 1000 0101 = 133 – 127 = 6
Mop1 = 1.10010000…
Op2 $41C8 0000 =0100 0001 1100 1000 0000 ….
S=0
E = 1000 0011 = 131 – 127 = 4
Mop2 = 1.10010000…
Multiplication basics
Base 10 example
3 x 10^2 * 1.1 x 10^2 = 3.3 x 10^4
Have 2 numbers A x 2^ea and B x 2^eb
Multiply and get
result = A*B x 2^(ea+eb)
So here
Both signs are + so the result is +
Exponent addition
Both exponents are biased as stored
If you add stored binary exponents you need to
subtract the extra bias or 127
Or using pencil and paper (or powerpoint) can
just add the unbiased exponent of one operand to
the other biased exponent
Here we have 133 + 4 = 137 = 1000 1001
The mantissas
Do a binary multiplication
      1.1001
    x 1.1001
    --------
       11001
    11001
   11001       and add
  ----------
  1001110001
Adjusting for binary point have 10.01110001
Final result
Exponent is 137, or 10 unbiased
Mantissa is 10.01110001
Adjusted for exponent (move binary point 10 places): 1001 1100 0100
Value is 2048+256+128+64+4
Or 2304+128+68 = 2432 + 68 = 2500
And we were multiplying 100 * 25
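The multiplication walk-through can be sketched in Python. The name `multiply_positive` is hypothetical, and the sketch assumes two positive normalized operands, truncates instead of rounding, and ignores overflow and special values. Unlike the hand calculation, which leaves the mantissa as 10.01110001 with exponent 10, the sketch normalizes to 1.001110001 x 2^11 so that the result can be stored.

```python
def multiply_positive(a_bits: int, b_bits: int) -> int:
    """Multiply two positive normalized single-precision patterns (sketch only)."""
    # Adding two stored exponents counts the bias twice, so subtract it once.
    biased = ((a_bits >> 23) & 0xFF) + ((b_bits >> 23) & 0xFF) - 127
    ma = (a_bits & 0x7FFFFF) | (1 << 23)   # restore hidden 1: 24-bit significand
    mb = (b_bits & 0x7FFFFF) | (1 << 23)
    product = ma * mb                      # up to 48 bits, 46 of them fractional
    if product >> 47:                      # product is 2.0 or more: renormalize
        product >>= 1
        biased += 1
    fraction = (product >> 23) & 0x7FFFFF  # keep 23 fraction bits (truncates)
    return (biased << 23) | fraction

print(f"${multiply_positive(0x42C80000, 0x41C80000):08X}")
```

The printed pattern decodes to 2500, the same value the slides reach by shifting 10.01110001 ten places.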
Specification of a FPA
Floating Point Add/Subtract Unit
Specification
Inputs in IEEE 754 Double Precision
Must perform both addition and subtraction
Must handle the full floating point standard
Normalized numbers
Not a Number values – NaNs
+/- Infinity
Denormalized numbers
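The classes of values listed above can be told apart from the exponent and fraction fields alone. The Python sketch below assumes the IEEE 754 double-precision layout (1 sign bit, 11 exponent bits, 52 fraction bits); the names `classify_double` and `double_bits` are hypothetical, and zero is reported as its own class for clarity.

```python
import struct

def double_bits(x: float) -> int:
    """Return the 64-bit pattern of a Python float (a double)."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]

def classify_double(bits: int) -> str:
    """Classify a 64-bit IEEE 754 double-precision pattern (sketch)."""
    exponent = (bits >> 52) & 0x7FF        # 11-bit stored exponent
    fraction = bits & ((1 << 52) - 1)      # 52-bit fraction field
    if exponent == 0x7FF:                  # all ones: infinity or NaN
        return "NaN" if fraction else "infinity"
    if exponent == 0:                      # all zeros: zero or denormalized
        return "denormalized" if fraction else "zero"
    return "normalized"

for x in (1.0, float("inf"), float("nan"), 5e-324, 0.0):
    print(x, classify_double(double_bits(x)))
```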
Specifications continued
Result will be an IEEE 754 Double Precision
representation
Unit will correctly handle the invalid operation of
adding +infinity and -infinity, which gives NaN per the standard
Unit latches its inputs into registers from parallel
64-bit data busses.
There is a separate signal line that indicates the
operation, add or subtract
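A behavioral sketch of the add/subtract select and the invalid-operation case follows, in Python. It assumes subtraction is carried out as addition with the sign bit of the second operand flipped, which is a common approach but is not stated in the specification above. The names `add_sub`, `to_bits`, and `from_bits` are hypothetical, and the arithmetic is delegated to Python's own double-precision addition, not modeled at the bit level.

```python
import math
import struct

SIGN_BIT = 1 << 63  # sign bit of a 64-bit double-precision word

def to_bits(x: float) -> int:
    return struct.unpack(">Q", struct.pack(">d", x))[0]

def from_bits(bits: int) -> float:
    return struct.unpack(">d", struct.pack(">Q", bits))[0]

def add_sub(a_bits: int, b_bits: int, subtract: bool):
    """Return (result bits, invalid flag) for A + B or A - B (sketch only)."""
    if subtract:
        b_bits ^= SIGN_BIT            # assumption: A - B computed as A + (-B)
    a, b = from_bits(a_bits), from_bits(b_bits)
    result = a + b
    # Invalid operation: the effective addition of opposite-signed infinities.
    invalid = math.isinf(a) and math.isinf(b) and a != b
    return to_bits(result), invalid

bits, invalid = add_sub(to_bits(math.inf), to_bits(-math.inf), subtract=False)
print(from_bits(bits), invalid)
```

Subtracting +infinity from +infinity takes the same path once the sign bit is flipped, so it is flagged as invalid as well.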
Specifications continued
Outputs
The correctly represented result
Flags that are output are
Zero result
Overflow to infinity from normalized numbers as inputs
NaN result
Overshift (result is the larger of the two operands)
Denormalized result
Inexact (result was rounded)
Invalid operation for addition
High level block diagram
Basic architecture interface
Data – 64 bit A,B,& C Busses
Control signals – Latch, Add/Sub, Asel, Drive
Condition Flags Output – 7 Flag signals
Clocks – Phi1 and Phi2 (a 2 phase clocked architecture)
[Figure: block diagram of the Floating Point Adder Unit, labeled with Abus, Bbus, Add/Sub, Latch, Phi1, Phi2, Asel, Drive, Cbus, and Flags]
Start the VHDL
The entity interface