Decimal Floating-Point Arithmetic

Dongdong Chen

EE800, U of S


Objectives
• IEEE 754-2008 standard for Decimal Floating-Point (DFP) arithmetic (Lecture 1)
– DFP number formats
– DFP number encoding
– DFP arithmetic operations
– DFP rounding modes
– DFP exception handling

Objectives (Con.)
• Algorithm, architecture and VLSI circuit design for DFP arithmetic (Lecture 2)
– DFP adder/subtracter
– DFP multiplier
– DFP divider
– DFP transcendental function computation

Background

“Decimal computer arithmetic went out of style 25 to 30 years ago; no one uses it now.” Is that true?

Introduction
• Decimal is still essential for specific applications
– Numbers in commercial databases are decimal
– Decimal is used extensively in commercial applications
– Survey of commercial databases: decimal fixed-point or floating-point numbers
• How to process decimal computation
– Software computation
– Convert back to decimal representation
– Problems

Introduction (Con.)
• Errors from decimal and binary conversion
– Example 1: represent 0.1 in DFP or BFP
  Decimal representation (BCD code): 0.0001
  Binary representation: 0.00011…
– Example 2: telephone billing
  Cost: 0.70, Tax: 5%
  BFP arithmetic: 0.6999… × 1.0500… = 0.734999… → 0.73
  DFP arithmetic: 0.70 × 1.05 = 0.735 → 0.74
• Decimal integer, fixed-point or floating-point?
• Decimal hardware or software solutions?
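Both examples can be reproduced in software. The sketch below uses Python's `float` (IEEE 754 binary64) as the BFP stand-in and the standard `decimal` module as the DFP stand-in; rounding the bill to two decimal places is an assumption about how the amount is printed.

```python
from decimal import Decimal, ROUND_HALF_EVEN

# Example 1: the binary64 value nearest to 0.1 is not 0.1
print(Decimal(0.1))   # 0.1000000000000000055511151231257827021181583404541015625

# Example 2 with BFP: 0.70 and 1.05 are not exactly representable in base 2,
# so the product lands just below 0.735 and rounds down
binary_result = round(0.70 * 1.05, 2)

# Example 2 with DFP: both operands are exact, the product is exactly 0.7350
cost, tax_factor = Decimal("0.70"), Decimal("1.05")
decimal_result = (cost * tax_factor).quantize(Decimal("0.01"), rounding=ROUND_HALF_EVEN)

print(binary_result)    # 0.73
print(decimal_result)   # 0.74
```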

Current Research
• DFP arithmetic defined in IEEE 754-2008
• IBM computing systems include DFP hardware
– IBM Power6, z9, z10
• Intel includes a DFP software solution in its systems
– Intel DFP software computation library
• DFP arithmetic IP blocks:
– Basic DFP arithmetic IPs: DFP adder/subtracter, multiplier, divider, square root etc.
– Transcendental DFP arithmetic IPs: DFP CORDIC, logarithm, antilogarithm, reciprocal etc.

DFP Arithmetic in IEEE 754-2008
• Review BFP arithmetic in IEEE 754-2008
• How the new DFP is defined in IEEE 754-2008

BFP Floating-point Representation
• Representation:
– sign, exponent, significand (or mantissa): $(-1)^{sign} \times significand \times 2^{exponent}$
– more bits for significand gives more accuracy
– more bits for exponent increases range
• IEEE 754 floating point standard:
– single precision: 8-bit exponent, 23-bit significand
– double precision: 11-bit exponent, 52-bit significand

BFP floating-point Number
• Leading “1” bit of significand is implicit
– Example: if the significand is 011010110…0, the actual significand is 1.011010110…0
• This is called a normalized number: there is exactly one non-zero digit to the left of the point.
– Unique representation of a number
– We get a little more precision: there are 24 bits in the significand, but only 23 of them are stored.

Exponent
• Exponent is “biased” to make sorting easier
– all 0s is the smallest exponent, all 1s is the largest
– The actual exponent is e − 127 for single precision, and e − 1023 for double precision
– Bias of 127 for single precision and 1023 for double precision
– By biasing the exponent and storing it before the significand, we can compare magnitudes as if they were unsigned integers.
• If e = 1000 0011 (decimal 131), the actual exponent is 131 − 127 = 4
• If e = 0101 1101 (decimal 93), the actual exponent is 93 − 127 = −34

BFP Floating-Point Formats
• Short (32-bit) format: sign 1 bit; exponent 8 bits, bias = 127, range –126 to 127; significand 23 bits for the fractional part (plus hidden 1 in the integer part)
• Long (64-bit) format: sign 1 bit; exponent 11 bits, bias = 1023, range –1022 to 1023; significand 52 bits for the fractional part (plus hidden 1 in the integer part)

BFP Floating-Point Formats (Con.)
• Positive and negative zero: biased exponent and fraction all 0s, e.g. 1 00000000 00000000000000000000000 (−0)
• Positive and negative infinity: biased exponent all 1s and fraction 0, e.g. 0 11111111 00000000000000000000000 (+∞)
• Biased exponent all 1s (exponent = 128) and fraction ≠ 0: called “not a number” or NaN
[Figure: single-precision number line: negative overflow below $-(2 - 2^{-23}) \times 2^{127}$; expressible negative numbers; negative underflow between $-2^{-126}$ and 0; positive underflow between 0 and $2^{-126}$; expressible positive numbers up to $(2 - 2^{-23}) \times 2^{127}$; positive overflow above]

Example
• Summary: FP representation $(-1)^{sign} \times (1 + significand) \times 2^{exponent - bias}$
• Example:
– decimal: $-0.75 = -3/4 = -3/2^2$
– binary: $-0.11_2 = -1.1_2 \times 2^{-1}$
– floating point: exponent = 126 = 01111110
– IEEE single precision: 1 01111110 10000000000000000000000
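The encoding above can be checked by reinterpreting the 32 bits of a single-precision value. A minimal sketch using Python's standard `struct` module; `decode_binary32` is a hypothetical helper name, and the reconstruction formula applies to normalized numbers only.

```python
import struct

def decode_binary32(x: float):
    """Split an IEEE 754 single-precision value into sign, biased exponent, fraction."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31
    biased_exp = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF          # 23 stored bits; the leading 1 is implicit
    return sign, biased_exp, fraction

sign, e, frac = decode_binary32(-0.75)
print(sign, f"{e:08b}", f"{frac:023b}")    # 1 01111110 10000000000000000000000

# Normalized value: (-1)^sign * (1 + fraction / 2^23) * 2^(e - 127)
value = (-1) ** sign * (1 + frac / 2**23) * 2.0 ** (e - 127)
print(value)                               # -0.75
```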

DFP Number Representation
• Representation:
– sign, exponent, significand (or mantissa): $(-1)^{sign} \times significand \times 10^{exponent}$
– more digits for significand gives more accuracy
– more bits for exponent increases range
• DFP formats:
– decimal32: DFP storage format encoded in 32 bits
– decimal64: DFP computational format encoded in 64 bits
– decimal128: DFP computational format encoded in 128 bits

DFP Number format
• 1-bit sign (S) is defined the same as in the BFP format
• (w+5)-bit combination field (G), divided into two subfields:
– 5 bits (G0…G4) encode: the 2 MSBs of the exponent, the MSD of the significand, Not-a-Number (NaN) and infinity (Inf)
– w bits (G5…Gw+4) form a suffix to the 2 exponent MSBs derived from G0…G4; together they make up a (w+2)-bit nonnegative biased exponent
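The five leading bits G0…G4 can be decoded as follows. This is a sketch of the DPD-encoding layout of IEEE 754-2008 as understood here (G0 is the most significant of the five bits); `decode_combination_msbs` is a hypothetical helper name.

```python
def decode_combination_msbs(g: int):
    """Decode G0..G4 of a DPD combination field (g is a 5-bit integer, G0 = MSB).

    Returns ("NaN",), ("Inf",) or ("finite", exponent_msbs, msd)."""
    if g == 0b11111:
        return ("NaN",)
    if g == 0b11110:
        return ("Inf",)
    if (g >> 3) != 0b11:
        # G0G1 = exponent MSBs, G2G3G4 = MSD in 0..7
        return ("finite", g >> 3, g & 0b111)
    # G0G1 = 11: G2G3 = exponent MSBs, MSD = 8 + G4 (8 or 9)
    return ("finite", (g >> 1) & 0b11, 8 + (g & 1))

print(decode_combination_msbs(0b01000))   # ('finite', 1, 0)
print(decode_combination_msbs(0b11011))   # ('finite', 1, 9)
```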

DFP Exponent
• Exponent is “biased” to make sorting easier
– Binary format (not decimal)
– The actual exponent is e − 101 for decimal32, e − 398 for decimal64, and e − 6176 for decimal128
– Range of exponent is $(e_{min} - q + 1) \le e \le (e_{max} - q + 1)$
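The biases follow from each format's precision q and maximum exponent emax (96, 384 and 6144 in IEEE 754-2008, with emin = 1 − emax): bias = emax + q − 2, which makes the smallest integer-significand exponent emin − q + 1 equal to −bias. A small sketch; the function name is hypothetical.

```python
def decimal_format_params(q: int, emax: int):
    """Return (bias, lowest, highest) exponent for an integer significand of q digits."""
    emin = 1 - emax
    bias = emax + q - 2                     # stored exponent = actual exponent + bias
    return bias, emin - q + 1, emax - q + 1

for name, (q, emax) in {"decimal32": (7, 96), "decimal64": (16, 384),
                        "decimal128": (34, 6144)}.items():
    print(name, decimal_format_params(q, emax))
# decimal32 (101, -101, 90)
# decimal64 (398, -398, 369)
# decimal128 (6176, -6176, 6111)
```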

DFP Number format (Con.)
• J×10-bit trailing significand (T) field:
– Densely packed decimal (DPD) encoding: each 3-digit decimal number is encoded into a 10-bit binary number; DPD is converted to binary coded decimal (BCD)
– Binary integer decimal (BID) encoding: the decimal significand is encoded as a binary integer
• Non-normalized decimal significand: $(-1)^0 \times 0.09000 \times 10^1$ and $(-1)^0 \times 0.00900 \times 10^2$ represent the same value
– DFP number’s cohort

Parameters in DFP Format

Example
• Summary: DFP representation $(-1)^{sign} \times significand \times 10^{exponent - bias}$
• Convert $-8.35 = -835 \times 10^{-2}$ to decimal64
– Sign bit: “1” negative, “0” positive (here sign = 1)
– Exponent: −2 + 398 = 396 (10-bit “0110001100”)
– Significand: 835 (50-bit DPD coding “0…00 02 3D”)
– Encoding of the 5 MSBs (G0…G4) of the combination field: “01000”
– Decimal64: “10100010001100…00…1000111101” = “A2 30 00 00 00 00 02 3D” (binary/hex)
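The example can be verified with a small encoder. This is a sketch under stated assumptions: finite values only, no rounding (the coefficient must already fit in 16 digits), and the DPD declet table as given in IEEE 754-2008; both function names are hypothetical.

```python
def dpd_encode_declet(n: int) -> int:
    """Encode 0..999 as a 10-bit densely packed decimal (DPD) declet."""
    d2, d1, d0 = n // 100, (n // 10) % 10, n % 10
    a, b, c, d = (d2 >> 3) & 1, (d2 >> 2) & 1, (d2 >> 1) & 1, d2 & 1
    e, f, g, h = (d1 >> 3) & 1, (d1 >> 2) & 1, (d1 >> 1) & 1, d1 & 1
    i, j, k, m = (d0 >> 3) & 1, (d0 >> 2) & 1, (d0 >> 1) & 1, d0 & 1
    # Output bits p..y, selected by which digits are "large" (8 or 9): a, e, i
    rows = {
        (0, 0, 0): (b, c, d, f, g, h, 0, j, k, m),
        (0, 0, 1): (b, c, d, f, g, h, 1, 0, 0, m),
        (0, 1, 0): (b, c, d, j, k, h, 1, 0, 1, m),
        (0, 1, 1): (b, c, d, 1, 0, h, 1, 1, 1, m),
        (1, 0, 0): (j, k, d, f, g, h, 1, 1, 0, m),
        (1, 0, 1): (f, g, d, 0, 1, h, 1, 1, 1, m),
        (1, 1, 0): (j, k, d, 0, 0, h, 1, 1, 1, m),
        (1, 1, 1): (0, 0, d, 1, 1, h, 1, 1, 1, m),
    }
    declet = 0
    for bit in rows[(a, e, i)]:
        declet = (declet << 1) | bit
    return declet


def encode_decimal64(sign: int, coefficient: int, exponent: int) -> int:
    """Pack the finite value (-1)**sign * coefficient * 10**exponent as decimal64 (DPD)."""
    biased = exponent + 398                          # decimal64 bias
    assert 0 <= coefficient < 10**16 and 0 <= biased <= 767
    digits = f"{coefficient:016d}"
    msd = int(digits[0])
    exp_msbs, exp_cont = biased >> 8, biased & 0xFF  # 2 MSBs + 8-bit continuation
    if msd < 8:
        comb = (exp_msbs << 3) | msd                 # G0G1 = exponent MSBs, G2G3G4 = MSD
    else:
        comb = 0b11000 | (exp_msbs << 1) | (msd & 1) # G0G1 = 11, G2G3 = exp MSBs, G4
    trailing = 0
    for pos in range(1, 16, 3):                      # remaining 15 digits -> 5 declets
        trailing = (trailing << 10) | dpd_encode_declet(int(digits[pos:pos + 3]))
    return (sign << 63) | (comb << 58) | (exp_cont << 50) | trailing


print(f"{encode_decimal64(1, 835, -2):016X}")   # A23000000000023D
```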

DFP special values
• Not-a-Number: G0…G4 = “11111”.
• Infinite number: G0…G4 = “11110”; the sign of Inf follows the sign bit.
• Overflow: if the absolute value of a DFP number is larger than the largest DFP number ($|v_{max}| = (10^q - 1) \times 10^{e_{max} - q + 1}$), overflow occurs.
• Underflow: if a DFP number is smaller in magnitude than the smallest DFP number ($|v_{min}| = 10^{e_{min} - q + 1}$), underflow occurs.
• Subnormal number: if the absolute value of a DFP number is less than $10^{e_{min}}$ and at least $10^{e_{min} - q + 1}$, it is a subnormal number.
• Normal number: the remaining exponent values and significands represent normal numbers.
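These thresholds can be explored with Python's standard `decimal` module, which implements the General Decimal Arithmetic specification that IEEE 754-2008 decimal arithmetic is aligned with. The decimal64-like context (q = 16, emax = 384, emin = −383) is an assumption of this sketch, and disabling all traps makes exceptional cases return default results instead of raising.

```python
from decimal import Context, Decimal, Overflow

# decimal64-like context; traps=[] disables all traps, so only flags are set
ctx = Context(prec=16, Emax=384, Emin=-383, clamp=1, traps=[])

v_max = Decimal("9.999999999999999E+384")      # (10^16 - 1) x 10^(384 - 16 + 1)

print(ctx.Etiny())                             # -398, i.e. emin - q + 1
print(Decimal("1E-383").number_class(ctx))     # +Normal (= 10^emin)
print(Decimal("5E-390").number_class(ctx))     # +Subnormal (below 10^emin)
print(ctx.multiply(v_max, Decimal(10)))        # Infinity (overflow)
print(bool(ctx.flags[Overflow]))               # True
```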

DFP Arithmetic Operations
• Basic DFP arithmetic operations
• Two decimal-specific DFP operations
– SameQuantum(DFP1, DFP2)
– Quantize(DFP1, DFP2)
• DFP comparison operations
– do not distinguish between redundant representations of the same number
• DFP conversion operations
– DFP to BFP conversion (correctly rounded)
– DFP to integer conversion
• Recommended DFP operations
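Python's `decimal` module offers counterparts of the two decimal-specific operations, `same_quantum` and `quantize`; treating them as stand-ins for SameQuantum and Quantize is an assumption of this sketch. Ordinary comparison ignores which member of a cohort is used.

```python
from decimal import Decimal

a, b = Decimal("1.20"), Decimal("1.2")

print(a == b)                                        # True: comparison ignores the representation
print(a.same_quantum(b))                             # False: exponents -2 and -1 differ
print(a.same_quantum(Decimal("9.99")))               # True
print(Decimal("8.3456").quantize(Decimal("0.01")))   # 8.35: takes the quantum of the 2nd operand
print(Decimal("8").quantize(Decimal("0.001")))       # 8.000
```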


DFP Number’s Cohort
• Non-normalized decimal significand
• DFP number’s cohort: the set of representations of the same value
• The standard defines the preferred exponent (quantum) of each operation
– Exact operation results: the cohort member is selected based on the preferred exponent (quantum) for a DFP result of that operation
– Inexact operation results: the cohort member with the least possible exponent is used to get the maximum number of significant digits
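Cohorts and preferred exponents can be observed with Python's `decimal` module, used here as a software stand-in; the 16-digit context for the inexact case is an assumption chosen to mimic decimal64.

```python
from decimal import Context, Decimal

x, y = Decimal("0.09000E1"), Decimal("0.00900E2")
print(x, y, x == y)                                   # 0.9000 0.900 True
print(x.as_tuple().exponent, y.as_tuple().exponent)   # -4 -3

# Exact results use the preferred exponent
print(Decimal("1.0") + Decimal("2.00"))   # 3.00 (addition: smaller operand exponent)
print(Decimal("1.5") * Decimal("2.0"))    # 3.00 (multiplication: sum of exponents)
print(Decimal("1.00") / Decimal("4"))     # 0.25 (division: exponent difference)

# Inexact result: least exponent, so all 16 digits are significant
print(Context(prec=16).divide(Decimal(1), Decimal(3)))   # 0.3333333333333333
```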

DFP Rounding Modes
• Five types of rounding modes
– roundTiesToEven
– roundTiesToAway
– roundTowardPositive
– roundTowardNegative
– roundTowardZero
• Correct rounding and faithful rounding
• IEEE 754-2008 requires correctly rounded results for all basic DFP arithmetic operations
• DFP operations should support all rounding modes
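The five modes can be compared on a tie case with Python's `decimal` module. The mapping from IEEE names to Python rounding constants is this sketch's own; note that Python's `ROUND_HALF_UP` sends ties away from zero.

```python
from decimal import (Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP,
                     ROUND_CEILING, ROUND_FLOOR, ROUND_DOWN)

MODES = {
    "roundTiesToEven": ROUND_HALF_EVEN,
    "roundTiesToAway": ROUND_HALF_UP,       # ties go away from zero
    "roundTowardPositive": ROUND_CEILING,
    "roundTowardNegative": ROUND_FLOOR,
    "roundTowardZero": ROUND_DOWN,
}

def round_all(value: str, quantum: str = "0.01"):
    """Round one decimal value to the given quantum under every mode."""
    v, q = Decimal(value), Decimal(quantum)
    return {name: str(v.quantize(q, rounding=mode)) for name, mode in MODES.items()}

print(round_all("8.345"))
print(round_all("-8.345"))
```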

DFP Exception Handling
• Invalid operation: an operand is a signaling NaN, 0 × ∞, square root of a negative operand, etc.; the default result is NaN
• Division by zero: the dividend is a finite non-zero number and the divisor is zero; the default result is +∞ or −∞
• Overflow: the magnitude of a result exceeds the largest finite number representable in the format of the operation
• Underflow: the magnitude of a result is below $10^{e_{min}}$
• Inexact: the correctly rounded result of an operation differs from the infinite-precision result
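The five exceptions map onto status flags of Python's `decimal` module. A sketch assuming a decimal64-like context with all traps disabled, so each operation returns its default result and the raised flags are reported; `run` is a hypothetical helper.

```python
from decimal import (Context, Decimal, InvalidOperation, DivisionByZero,
                     Overflow, Underflow, Inexact)

SIGNALS = [InvalidOperation, DivisionByZero, Overflow, Underflow, Inexact]
ctx = Context(prec=16, Emax=384, Emin=-383, clamp=1, traps=[])

def run(op, *args):
    """Apply a context operation and list which of SIGNALS it raised."""
    ctx.clear_flags()
    result = op(*args)
    return result, [s.__name__ for s in SIGNALS if ctx.flags[s]]

print(run(ctx.multiply, Decimal(0), Decimal("Infinity")))  # NaN, invalid operation
print(run(ctx.sqrt, Decimal(-1)))                          # NaN, invalid operation
print(run(ctx.divide, Decimal(1), Decimal(0)))             # Infinity, division by zero
print(run(ctx.multiply, Decimal("9E384"), Decimal(10)))    # Infinity, overflow + inexact
print(run(ctx.divide, Decimal("1E-383"), Decimal(3)))      # subnormal, underflow + inexact
print(run(ctx.divide, Decimal(1), Decimal(3)))             # 0.333..., inexact
```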

DFP Addition/Subtraction

DFP Add/Sub Data flow

DFP Addition
• Step 1: equalize the exponents
– the significands can be added only when the exponents are the same
– the number with the smaller exponent shifts its point to the left, and the number with the larger exponent shifts its point to the right
– rewriting the operand with the smaller exponent can lose its least significant digits
– keep a guard digit, a round digit and a sticky digit for the operand with the smaller exponent

DFP addition
• Step 2: add the significands (7-digit significands)
    0099999 × 10^1    →    0999990 × 10^0
  + 0016234 × 10^−3   →  + 0000016(234) × 10^0
                           1000006(234) × 10^0
• Step 3: Normalize the result if necessary

DFP addition
• Step 4: Round the number if needed: 1000006(234) × 10^0 → 1000006 × 10^0
• Step 5: Repeat Step 3 if the result is no longer normalized
• The final result is 1000006
• The exact answer is 1000006.234
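Steps 1 to 5 can be sketched in Python on integer significands. Assumptions of this sketch: non-negative operands only (no effective subtraction), roundTiesToEven, and an alignment that first shifts the larger-exponent operand left into its leading zeros before shifting the other operand right; the slide aligns both operands to 10^0 instead, which gives the same result here. `dfp_add` is a hypothetical name.

```python
def dfp_add(c1: int, e1: int, c2: int, e2: int, p: int = 7):
    """Add c1*10**e1 + c2*10**e2 (non-negative) to p digits; returns (coeff, exp).

    Keeps a guard digit (G), a round digit (R) and a sticky flag (S)."""
    # Step 1: make operand 1 the one with the larger exponent
    if e1 < e2:
        c1, e1, c2, e2 = c2, e2, c1, e1
    # Step 1a: shift operand 1 left into its leading zeros (exact)
    while e1 > e2 and c1 * 10 < 10**p:
        c1, e1 = c1 * 10, e1 - 1
    # Step 1b: shift operand 2 right by the remaining gap, keeping G and R
    shift = e1 - e2
    scaled = c2 * 100                     # two extra digit positions: G, R
    sticky = scaled % 10**shift != 0      # any non-zero digit shifted past R
    scaled //= 10**shift
    # Step 2: add the aligned significands (units of 10**(e1 - 2))
    total = c1 * 100 + scaled
    # Step 3: a carry-out gives p + 1 digits; shift right once more
    if total >= 10**(p + 2):
        sticky = sticky or total % 10 != 0
        total //= 10
        e1 += 1
    # Step 4: round to nearest, ties to even, from G, R and S
    coeff, guard, rnd = total // 100, (total // 10) % 10, total % 10
    if guard > 5 or (guard == 5 and (rnd > 0 or sticky or coeff % 2 == 1)):
        coeff += 1
    # Step 5: rounding up may produce 10**p; renormalize
    if coeff == 10**p:
        coeff, e1 = coeff // 10, e1 + 1
    return coeff, e1


print(dfp_add(99999, 1, 16234, -3))   # (1000006, 0)
```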

Guard bits
• To help minimize rounding problems, intermediate steps of an operation keep guard digits: additional internal digits that increase the precision of the operation.
• Keeping one guard digit, one round digit and one sticky digit is enough to produce the correctly rounded results that IEEE 754-2008 requires.
• Previous example: add one extra digit.

DFP add/sub

General Description: Addition

Example: Addition

Example: Addition (Con.)

DFU: IBM POWER6 and Z10

High performance Implementation

High performance Implementation

High performance Implementation
[12] A. Vázquez and E. Antelo, “A High-performance Significand BCD Adder with IEEE 754-2008 Decimal Rounding,” ARITH19, Portland, June 08-10, 2009

Evaluation Results and Comparison
[Proposed]: A. Vázquez and E. Antelo, “A High-performance Significand BCD Adder with IEEE 754-2008 Decimal Rounding,” ARITH19, Portland, June 08-10, 2009

DFP Multiplication

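Scheme of decimal multiplier
• Example: x × y = 1963 × 8145 (digits of y, least significant first: 5, 4, 1, 8)
– x·y0 = 5x + 0 = 9815 + 0000
– x·y1 = 5x − x = 9815 − 1963
– x·y2 = x + 0 = 1963 + 0000
– x·y3 = 10x − 2x = 19630 − 3926
– Adding the partial products, each weighted by its digit position, gives 15988635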

Partial product generation
• Generate $XY_i$ for $Y_i \in \{1, 2, 3, \ldots, 7, 8, 9\}$
• $XY_i$ is in carry-save format

Partial product generation
• Solid circles: BCD sum (digit); hollow circles: carry (bit)
• n-digit radix-10 CSA, m-digit radix-10 counter

Carry Save Adder Tree
• CSA tree to generate the multiplication result

Flowchart of DFP Multiplier

Architecture of DFP Multiplier

Exception Detection & Handling
• Invalid operation
– sNaN (pass significand of sNaN)
– 0 × ∞ (produce qNaN with significand 0)
• Overflow (and Inexact)
– IEIP – SLA > Emax
– Increase SLA until all LZs removed
• Underflow (and possibly Inexact)
– IEIP – SLA < Emin
– Decrease SLA until 0, then shift right
• Inexact

Implementation Highlights
• Leverage operands' LZCs
– SC, SLA, and IESIP
• Handle NaNs with minimal overhead
– No dataflow modification
– Coerce multiplicand or multiplier to 1
• Support gradual underflow
– No dataflow modification
– Simply extend number of iterations
• Simple, control-based rounding scheme

Synthesis Results
• 64-bit (16-digit) operands, DPD encoded
• LSI Logic's gflxp 0.11 µm CMOS, 55 ps FO4
• Synopsys Design Compiler
• Results
– Fixed-point: 119.653 µm², 14.72 FO4s
– Floating-point: 237.607 µm², 15.45 FO4s
• Critical path
– Fixed-point: 4:2 compressor (accumulator)
– Floating-point: 128-bit barrel shifter

Applicability to Parallel Designs
• IE and IP shift generation
• Rounding scheme
• NaN handling
• Exception detection and handling
• On-the-fly sticky bit generation... NO

Sequential vs. Parallel
• Sequential
– Less area – Potentially better cycle time

• Parallel
– Less latency – Higher throughput


DFP Division


DFP Division Data Flow
[Figure: data flow of the decimal64 DFP divider. Each 64-bit operand is unpacked into sign (1 bit), combinational field (5 bits), exponent field (8 bits) and significand field (50 bits); DPD_to_BCD converts the significands to BCD; sign logic, exponent subtraction with bias addition, mantissa division, normalization with exponent adjustment, rounding control and rounding, BCD_to_DPD and packing produce the 64-bit result.]
Flowchart: Unpacking decimal floating-point number → Check for zeros and infinity → Subtract exponents → Divide mantissa → Normalize and detect overflow and underflow → Perform rounding → Replace sign → Packing

Unpacking and Sign Logic
• Step 1: Unpack each 64-bit floating-point operand into sign (1 bit), combinational field (5 bits), exponent field (8 bits) and significand field (50 bits); check for zeros and infinity (if F = 0, stop)
• Step 2: Sign process (sign logic): $S_q = S_1 \oplus S_2$

Exponent Subtraction
• Step 3: Exponent subtraction with bias addition: $E_b = E_1 - E_2 + bias$ (subtracting two biased exponents cancels the bias, so it is added back)

Mantissa Division
• Step 4: Mantissa division
– Which algorithm to choose?
  1. Restoring division
  2. Non-restoring division
  3. High-radix division
  4. Convergence division
– Operand ranges: $0.1 \le M_1 < 1$ and $0.1 \le M_2 < 1$, with $M_{min} = 0.1$ and $M_{max} = 1 - 10^{-p}$
– Quotient range: $0.1 < M_{min}/M_{max} \le M_1/M_2 \le M_{max}/M_{min} < 10$; the quotient $M_{12}$ is 68 bits wide

Normalization
• Step 5: If the quotient is below 1, a left shift by one digit is needed to normalize the mantissa result; overflow and underflow must also be detected
– Example: “0934…2140819564” shifted left one digit → “934…21408195640”
– The exponent is adjusted accordingly: $E_a = E_b - 1$
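Steps 4 and 5 can be sketched with digit-by-digit restoring-style division (compare and subtract, so the partial remainder never goes negative) on integer significands read as fractions 0.d1…dp. Keeping p + 1 quotient digits (p digits plus one guard digit) is this sketch's reading of the 68-bit (17-digit) quotient for p = 16; the function name and the sticky flag are assumptions.

```python
def divide_significands(m1: int, m2: int, p: int = 16):
    """Quotient digits of M1/M2 followed by one-digit normalization.

    m1, m2 are p-digit integer significands read as 0.d1...dp, so
    0.1 < M1/M2 < 10. Returns (digits, shift, sticky): p + 1 quotient digits,
    shift = 1 when the exponent must be decremented (Ea = Eb - 1), and a
    sticky flag for anything beyond the kept digits."""
    assert 10**(p - 1) <= m1 < 10**p and 10**(p - 1) <= m2 < 10**p
    rem, digits = m1, []
    for _ in range(p + 2):        # one spare digit in case the first one is 0
        q = 0
        while rem >= m2:          # compare and subtract: never goes negative
            rem -= m2
            q += 1
        digits.append(q)          # 0..9 because M1/M2 < 10
        rem *= 10
    shift = 0
    if digits[0] == 0:            # Step 5: quotient < 1, shift left one digit
        digits, shift = digits[1:], 1
    extra, digits = digits[p + 1:], digits[:p + 1]
    sticky = rem != 0 or any(extra)
    return digits, shift, sticky


print(divide_significands(835, 250, p=3))   # ([3, 3, 4, 0], 0, False)
print(divide_significands(100, 300, p=3))   # ([3, 3, 3, 3], 1, True)
```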

Rounding and Packing
• Step 6: Rounding
– Truncate, round-up or round-to-nearest; the rounding policies above are not fair
– According to the IEEE rounding standard, “round to nearest even” is better
– Sometimes NaN and infinity must also be detected
• Step 7: Pack the sign bit (1 bit), combinational field (5 bits), exponent field (8 bits) and significand field (50 bits) together into the 64-bit result

High performance Implementation
[1] L.-K. Wang and M. J. Schulte, “Decimal Floating-Point Division Using Newton-Raphson Iteration,” Proceedings of the IEEE International Conference on Application-Specific Systems, Architectures and Processors, pp. 84-95, Sep. 2004.

High performance Implementation
[2] Tomás Lang and Alberto Nannarelli, “A Radix-10 Digit-Recurrence Division Unit: Algorithm and Architecture,” IEEE Transactions on Computers, pp. 727–739, June 2007.

High performance Implementation

Evaluation Results and Comparison
                   | DFP Divider [1] | DFP Divider [2]
Precision (digits) | 16 (decimal64)  | 16 (decimal64)
Cycle time (ns)    | 0.57            | 1
# of cycles        | 150             | 20
Latency (ns)       | 85.5            | 20
1: Synthesized with a STM 90-nm standard cell library

DFP Transcendental Arithmetic

Contents
• Introduction
• Decimal Logarithmic Converter
• Decimal Antilogarithmic Converter
• Conclusions
• Future Work

32-bit DFP Logarithm
• $X = (-1)^s \times 10^e \times coefficient$
• $R = \log_{10}(X) = \log_{10}(10^e) + \log_{10}(coefficient)$, where the coefficient is a non-normalized decimal integer
• Example: $R = \log_{10}((-1)^0 \times 10^8 \times 0024589) = 8 + 5 + \log_{10}(0.2458900)$
• To guarantee a 32-bit DFP calculation, a 14-digit FXP logarithmic calculation is needed.
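The decomposition in the example can be sketched with Python's `decimal` module: the integer part is the exponent plus the number of significant coefficient digits, and only the logarithm of the normalized fraction needs a real computation. The helper name is hypothetical, and `Decimal.log10` stands in for the fixed-point converter.

```python
from decimal import Decimal, localcontext

def dfp_log10_split(coefficient: int, exponent: int):
    """Split log10(coefficient * 10**exponent) into (exponent + k, frac).

    k is the number of significant digits of the positive integer coefficient,
    so frac = coefficient / 10**k lies in [0.1, 1)."""
    k = len(str(coefficient))
    frac = Decimal(coefficient).scaleb(-k)
    return exponent + k, frac


int_part, frac = dfp_log10_split(24589, 8)    # X = 10^8 x 0024589
print(int_part, frac)                         # 13 0.24589
with localcontext() as ctx:
    ctx.prec = 14                             # 14-digit working precision
    r = int_part + frac.log10()               # log10(0.24589) is negative
print(r)
```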

32-bit DFP Antilogarithm
• $P = \mathrm{Antilog}_{10}(X) = 10^X$, with $\log_{10}(X_{min}) \le X \le \log_{10}(X_{max})$
• For 32-bit DFP: $10^X \in [10^{-101}, 9.99999\ldots \times 10^{96}]$
• $X = X_{Int} + X_{Frac}$, so $10^X = 10^{X_{Int}} \times 10^{X_{Frac}}$
• Example: $\mathrm{Antilog}_{10}((-1)^0 \times 1940467 \times 10^{-5}) = \mathrm{Antilog}_{10}(19.40467) = 10^{19} \times 10^{0.4046700}$
• To guarantee a 32-bit DFP calculation, an 8-digit FXP antilog calculation is needed.
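The split into integer and fractional parts can be sketched with Python's `decimal` module. Flooring keeps $X_{Frac}$ in [0, 1) for negative inputs as well; the helper name is hypothetical, and the `**` operator on `Decimal` stands in for the fixed-point antilog converter.

```python
from decimal import Decimal, ROUND_FLOOR, localcontext

def dfp_antilog10_split(x: Decimal):
    """Split x into (x_int, x_frac) with x_frac in [0, 1), so 10**x = 10**x_int * 10**x_frac."""
    x_int = x.to_integral_value(rounding=ROUND_FLOOR)
    return int(x_int), x - x_int


x = Decimal(1940467).scaleb(-5)          # 19.40467
x_int, x_frac = dfp_antilog10_split(x)
print(x_int, x_frac)                     # 19 0.40467
with localcontext() as ctx:
    ctx.prec = 8                          # 8-digit working precision
    y = Decimal(10) ** x_frac             # 10**x_frac lies in [1, 10)
print(y.scaleb(x_int))                    # 10**x = y * 10**19
```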

Digit-Recurrence Algorithm (Log)
• The corresponding recurrences:
$E[j+1] = E[j]\,(1 + e_j 10^{-j})$
$L[j+1] = L[j] - \log_{10}(1 + e_j 10^{-j})$
• Here: $E[1] = m$, $L[1] = 0$, $e_j \in \{-9, -8, -7, \ldots, 0, 1, \ldots, 7, 8, 9\}$
• $e_j$ is selected so that $E[j+1]$ converges to 1; since $E[j+1] = m \prod_k (1 + e_k 10^{-k})$, $L[j+1]$ then converges to $\log_{10}(m)$
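The log recurrence with selection by rounding can be sketched in software. Assumptions of this sketch: inputs in a hypothetical range 0.55 ≤ m ≤ 1.05 (the hardware covers its full range with the $e_1$ look-up table), $e_1$ picked by brute-force search as a stand-in for that table, `Decimal.log10` in place of the table of $\log_{10}(1 + e_j 10^{-j})$, and 40-digit software arithmetic rather than the 14-digit datapath.

```python
from decimal import Decimal, localcontext

def log10_by_digit_recurrence(m: Decimal, n: int = 14) -> Decimal:
    """E[j+1] = E[j](1 + e_j 10^-j), L[j+1] = L[j] - log10(1 + e_j 10^-j).

    E[1] = m, L[1] = 0; e_j drives E toward 1, so L approaches log10(m).
    Assumes 0.55 <= m <= 1.05 (hypothetical range for this sketch)."""
    with localcontext() as ctx:
        ctx.prec = 40                            # generous working precision
        E, L = +m, Decimal(0)
        for j in range(1, n + 1):
            step = Decimal(10) ** -j
            if j == 1:
                # stand-in for the e_1 look-up table: nearest choice by search
                e = min(range(-9, 10), key=lambda d: abs(1 - E * (1 + d * step)))
            else:
                # selection by rounding: e_j = round(W[j]), W[j] = 10^j (1 - E[j])
                e = int(((1 - E) * 10**j).to_integral_value())
            f = 1 + e * step
            E *= f
            L -= f.log10()
        return L
```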

Digit-Recurrence Algorithm (Antilog)
• Any 7-digit fixed-point decimal input N: $10^{m} = e^{m \ln(10)} = e^{m'}$
• The corresponding recurrences:
$L[j+1] = L[j] - \ln(1 + e_j 10^{-j})$
$E[j+1] = E[j]\,(1 + e_j 10^{-j})$, with $f_j = 1 + e_j 10^{-j}$
• Here: $E[1] = 1$, $L[1] = m'$, $e_j \in \{-9, -8, -7, \ldots, 0, 1, \ldots, 7, 8, 9\}$
• $e_j$ is selected so that $L[j+1]$ converges to 0
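The antilog recurrence can be sketched the same way: when $L$ reaches 0, $E = \prod_j f_j = e^{m'} = 10^m$. Assumptions of this sketch: inputs in a hypothetical range −0.09 ≤ m ≤ 0.27, $e_1$ chosen by brute-force search as a stand-in for the look-up table, `Decimal.ln` in place of the table of $\ln(1 + e_j 10^{-j})$, and 40-digit software arithmetic.

```python
from decimal import Decimal, localcontext

def antilog10_by_digit_recurrence(m: Decimal, n: int = 14) -> Decimal:
    """L[j+1] = L[j] - ln(1 + e_j 10^-j), E[j+1] = E[j](1 + e_j 10^-j).

    E[1] = 1, L[1] = m' = m ln(10); e_j drives L toward 0, so E approaches 10^m.
    Assumes -0.09 <= m <= 0.27 (hypothetical range for this sketch)."""
    with localcontext() as ctx:
        ctx.prec = 40
        E, L = Decimal(1), m * Decimal(10).ln()
        for j in range(1, n + 1):
            step = Decimal(10) ** -j
            if j == 1:
                # stand-in for the e_1 look-up table: nearest ln(1 + e_1/10)
                e = min(range(-9, 10), key=lambda d: abs(L - (1 + d * step).ln()))
            else:
                # selection by rounding: e_j = round(W[j]), W[j] = 10^j L[j]
                e = int((L * 10**j).to_integral_value())
            f = 1 + e * step
            L -= f.ln()
            E *= f
        return E
```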

Selection By Rounding (cont.)
• A scaled remainder is defined as:
– Log: $W[j] = 10^j (1 - E[j])$
– Antilog: $W[j] = 10^j L[j]$
• $e_j$ is obtained by rounding $W[j]$: $e_j = \mathrm{round}(W[j])$
• $e_1$ is obtained from a look-up table; $e_2, \ldots, e_j$ are obtained with selection by rounding

Architecture: Decimal Log Converter
[Figure: two-stage decimal log converter. Stage 1: detector, multiplier Mult1 with multiples m, 2m, 3m, 5m and look-up table Tab I selecting e1. Stage 2: iteration with multipliers Mult2 and Mult3, shifters (×10^−j, ×10, ×100), 9's complementers, Tab II, 14- and 16-digit decimal CLA adders, rounding logic and registers Reg 1 to Reg 6; the critical path is marked.]

Implementation Results
Logic Utilization     | Used   | Available* | Utilization
# of Occupied Slices  | 2842   | 13696      | 21%
Maximum Frequency     | 47.7 MHz
# of Clock Cycles     | 17 clock cycles
*: Xilinx Virtex2p XC2VP30 with package ff1157 and speed -7
Critical Path Detail (ns): Reg2 1.188, Mux2 1.564, Mult 2 9.438, Mux5 1.347, Shifter 1.350, CLA 5.519, Round 0.566, Total 20.97

Architecture: Dec. Antilog Converter
[Figure: two-stage decimal antilog converter: constant multiplier by ln(10), look-up tables TAB I and TABLE II with address generators, 9's complementers, shifters (×10^−j, ×10^(j+1)), multiplier, 10-digit decimal CLAs for W[j] and L(j), rounding logic, final rounding and registers Reg 1 to Reg 6; the critical path is marked.]

Implementation Results
Logic Utilization     | Used   | Available* | Utilization
# of Occupied Slices  | 2315   | 13696      | 17%
Maximum Frequency     | 51.5 MHz
# of Clock Cycles     | 11 clock cycles
*: Xilinx Virtex2p XC2VP30 with package ff1157 and speed -7
Critical Path Detail (ns): Reg6 1.839, Mux4 1.599, Mult 7.539, Shifter 1.100, CLA 6.794, Round 0.545, Total 19.42

Comparison (with Binary FXP Log and Exponential Converters)
• Similar dynamic range for the normalized coefficients: $2^{23} < 10^{7} < 2^{24}$ and $2^{53} < 10^{16} < 2^{54}$
• A binary reference is available that uses the same digit-recurrence algorithm with selection by rounding.
• Radix 10 is close to radix 8.

Comparison (cont.) (with Binary FXP Log and Exponential Converters)
Converter         | Dec. Log. | Dec. Log. | Dec. Exp. | Dec. Exp. | Bin. Log. | Bin. Log. | Bin. Exp. | Bin. Exp.
Precision         | 7         | 16        | 7         | 16        | 24        | 53        | 24        | 53
Area (fa²)        | 1630      | 2640      | 1370      | 2260      | 647       | 1829      | 627       | 1777
Cycle time (T³)   | 17        | 19        | 16        | 18        | 7         | 8         | 7         | 8
# of cycles       | 8         | 17        | 8         | 17        | 8         | 18        | 11        | 21
Latency (T³)      | 136       | 323       | 128       | 306       | 56        | 144       | 77        | 168
Dec. = radix-10 decimal¹ (precision in digits); Bin. = radix-8 binary [1] (precision in bits)
1: Synthesized with a TSMC 0.18-µm standard cell library
2: the area of a 1-bit full adder
3: the delay of a 1-bit full adder

Conclusions
• Achieved 32-bit DFP accuracy for decimal log and antilog results.
• Implemented them on FPGA and ASIC.
• Compared them with binary converters.

Future Work
• The 64-bit and 128-bit DFP logarithm and antilog converters.
• The presented architecture can be optimized to achieve a faster speed or occupy a smaller area.
EE990, April 2009: Decimal Log and Antilog Converters

Summary
• IEEE 754-2008 defines a DFP standard that specifies
– number representation in several precisions
– correct DFP arithmetic operations
– rounding modes
• Implementation of DFP adder, multiplier, divider, logarithmic and antilogarithmic converters
• Implementing and programming DFP are both really hard.
