
IEEE International Conference on Power, Control, Signals and Instrumentation Engineering (ICPCSI-2017)

Implementation of IEEE 754 Compliant Single Precision Floating-Point Adder Unit Supporting Denormal Inputs on Xilinx FPGA

Milind Shirke*, Sajish Chandrababu, Yogindra Abhyankar


High Performance Computing Technology Group
Centre for Development of Advanced Computing
Pune, India
Email: milindshirke@gmail.com, sajishc@cdac.in, yogindra@cdac.in

Abstract—FPGA based reconfigurable computing accelerators are increasingly being used in high performance and scientific computing applications to achieve higher performance. These applications demand high numerical stability and accuracy and hence usually use floating-point arithmetic. The floating-point arithmetic cores available from major FPGA vendors are not fully IEEE 754 compliant; these cores do not support denormal numbers. In this paper we describe our implementation of an IEEE 754 compliant single precision floating-point adder that supports denormal inputs. Further, we compare its performance and resource utilization against the Xilinx floating-point adder IP core. Our implementation has an eight-stage pipeline, utilizes minimal FPGA resources, and can operate at frequencies greater than 300 MHz.

Keywords—FPGA, IEEE 754, Floating-Point, single precision, Reconfigurable Computing

I. INTRODUCTION

Floating-point addition is one of the most frequently used floating-point operations. An analysis of real-time applications indicates that signal processing algorithms require, on average, 40% multiplication and 60% addition operations. Floating-point addition is therefore an important operation in math co-processors, DSP processors, embedded arithmetic processors, and data processing units. Almost every modern computer and compiler uses the IEEE 754 floating-point standard. This standard has greatly improved both the ease of porting floating-point programs and the quality of computer arithmetic. When a program is moved from one system to another, the results of the basic operations remain the same if both systems are compliant with the IEEE 754 standard. The standard completely describes the behavior of operations on floating-point data, including advanced options such as rounding modes and other computational techniques and controls [1]. Many floating-point adder algorithms and design approaches have been developed and discussed by the VLSI community [2-10]. Leading One Predictor (LOP) and Far and Close Path algorithms achieve better overall latency, while the standard floating-point adder algorithm is considered an area efficient algorithm. The Far and Close algorithm consumes 88% more slices than the standard adder algorithm implementation [7]. In our implementation we use the standard floating-point adder algorithm, with area efficiency as the prime objective.

Section 2 of this paper briefly describes the IEEE 754 single precision floating-point format, exceptions, special numbers and rounding modes. Section 3 describes the standard floating-point adder algorithm, and Section 4 describes the architecture and hardware modules developed to implement the algorithm. Section 5 compares our implementation of the IEEE 754 compliant single precision floating-point adder unit supporting denormal inputs with the Xilinx floating-point adder core. Section 6 concludes the paper and highlights future work.

II. IEEE 754 STANDARD FOR FLOATING POINT

IEEE 754 is an industry standard for representing floating-point numbers.

A. Single Precision Floating-Point Format

The IEEE 754 single precision format is a 32-bit format. It uses 1 bit for the sign, 8 bits for the exponent and 23 bits for the fraction, as in fig. 1. The value of a single precision floating-point number is (-1)^S × 1.F × 2^(E-127). The sign bit is 0 for non-negative numbers and 1 for negative numbers. The exponent field can represent both positive and negative exponents; to do this, a bias is added to the actual exponent. For the IEEE single-precision format, this bias is 127. As an example, a stored value of 147 indicates an exponent of (147-127), or 20. The mantissa (significand) is composed of an implicit leading bit and the fraction bits, and represents the precision bits of the number. Exponent values of 0xFF and 0x00 are reserved for special numbers such as zero, denormalized numbers, infinity, and Not a Number (NaN).

Sign (S) | 8-bit Exponent (E) | 23-bit Fraction (F)

Fig. 1. IEEE 754 Single Precision Floating-Point Format
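As a quick illustration of the encoding rules above (ours, not part of the paper's VHDL design), the fields and the reserved exponent values can be extracted in a few lines of Python using the standard `struct` module:

```python
import struct

def decode_f32(x):
    """Split the IEEE 754 single-precision encoding of x into S, E, F fields."""
    bits = struct.unpack('>I', struct.pack('>f', x))[0]
    sign = bits >> 31                 # 1 bit
    exponent = (bits >> 23) & 0xFF    # 8 bits, biased by 127
    fraction = bits & 0x7FFFFF        # 23 bits; the leading 1 is implicit
    return sign, exponent, fraction

def classify(bits):
    """Apply the reserved-exponent rules: 0xFF and 0x00 mark special numbers."""
    e = (bits >> 23) & 0xFF
    f = bits & 0x7FFFFF
    if e == 0xFF:
        return 'infinity' if f == 0 else 'NaN'
    if e == 0x00:
        return 'zero' if f == 0 else 'denormal'
    return 'normal'

# A stored exponent of 147 encodes an actual exponent of 147 - 127 = 20:
print(decode_f32(2.0 ** 20))    # (0, 147, 0)
print(classify(0x00000001))     # denormal (exponent field 0x00, fraction != 0)
```

The same field rules drive the implicit-bit selection in the adder: normal operands get a hidden bit of 1, while denormal (and zero) operands get 0.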
*Currently not working with C-DAC. The work was performed while the
author was with C-DAC.

978-1-5386-0814-2/17/$31.00 ©2017 IEEE


B. Normalized and Denormalized Numbers

A floating-point number is said to be normalized if the exponent field contains the real exponent plus the bias, other than 0xFF and 0x00. For all normalized numbers, the first bit just left of the binary point is taken to be 1 and is not encoded in the floating-point representation; this is also called the implicit or hidden bit.

To support very small numbers, IEEE introduced denormalized numbers. Denormalized numbers represent numbers between zero and the lowest normalized number. A floating-point number is considered denormalized if the exponent field is 0x00 and the fraction field is not all 0's. For denormalized numbers, the implicit (hidden) bit is taken to be 0.

C. Special Quantities

In the IEEE 754 standard, infinity and NaN are treated as special quantities. Without special quantities, there would be no good way to handle exceptional situations, like taking the square root of a negative number, other than aborting the computation; in IEEE 754 arithmetic, a NaN is returned in such situations [11].

In single-precision representation, infinity is represented by an exponent field of 0xFF and a fraction field of all 0's, while NaN is represented by an exponent field of 0xFF and a fraction field that is not all 0's.

D. Rounding Modes and Exceptions

Three user-selectable rounding modes and a default rounding mode are defined in the IEEE standard. The default rounding mode (roundTiesToEven) of IEEE 754 is implemented in our unit. In this mode the floating-point number nearest to the infinitely precise result is delivered; if the two nearest floating-point numbers bracketing an unrepresentable infinitely precise result are equally near, the one with an even least significant digit is delivered [1].

IEEE 754 defines five types of exceptions: overflow, underflow, invalid operation, inexact result, and division-by-zero. Exceptions are signalled by setting a flag. We have implemented the overflow, underflow and invalid operation flags in our design.

The overflow exception is signalled if and only if the destination format's largest finite number is exceeded in magnitude by what would have been the rounded floating-point result were the exponent range unbounded.

The underflow exception is signalled when a tiny non-zero result is detected. For binary formats, this is detected either:

a) After rounding: when a non-zero result computed as though the exponent range were unbounded would lie strictly between ±b^emin, the smallest positive/negative normal floating-point numbers.

b) Before rounding: when a non-zero result computed as though both the exponent range and the precision were unbounded would lie strictly between ±b^emin.

The implementer shall choose how tininess is detected, but shall detect tininess in the same way for all operations in radix two, including conversion operations under a binary rounding attribute. In our implementation we detect underflow after rounding.

The invalid operation exception is signalled if and only if there is no useful definable result. In these cases the operands are invalid for the operation to be performed. These operations are any general-computational or signalling-computational operation on a signalling NaN, except for some conversions. For example, magnitude subtraction of infinities is considered an invalid operation.

III. STANDARD FLOATING-POINT ADDER ALGORITHM

The standard floating-point adder algorithm shown in fig. 2 is implemented in our design. This area efficient algorithm is described below.

[Flowchart of fig. 2: read operands N1 and N2 → denormalized operands? (yes: set implicit bit to 0) → E1 < E2? (yes: swap operands) → shift smaller operand right by |E1 - E2| → effective subtraction? (yes: two's complement of smaller operand) → add fraction parts f1 + f2 → negative result? (yes: two's complement of result) → Leading One Detector → normalization → rounding → final exponent and sign calculation → output and exceptions]

Fig. 2. Standard Floating-point Adder Algorithm

1) Compare the two operands N1 and N2 and check for denormalization and infinity. If the numbers are denormalized, set the implicit bit to 0; otherwise it is


set to 1. At this point, the fraction part is extended to 24 bits.

2) The two exponents E1 of N1 and E2 of N2 are compared. If E1 is less than E2, N1 and N2 are swapped; otherwise there is no change.

3) The smaller operand's fraction part is shifted right by the absolute difference of the two exponents. After shifting, both numbers have the same exponent.

4) The sign bits of the two operands and the input operation (add/sub) are used to determine whether the effective operation is a subtraction or an addition.

5) If the operation is a subtraction, the two's complement of F2, the fraction part of N2, is used.

6) The two fractions are then added using a 27-bit adder.

7) If the resulting sum is a negative number, it has to be inverted and a 1 added to the result.

8) The result is then passed to a Leading One Detector, which gives the position of the first one in the result.

9) Based on the output of the leading one detector, the result is shifted left to get normalized. In some cases, a 1-bit right shift is needed.

10) The result is then rounded towards the nearest even number.

11) Using the results from the leading one detector, the exponent is adjusted. The sign is computed, the overflow and underflow checks are performed, and the result is sent out.

IV. FLOATING-POINT ADDER ARCHITECTURE AND IMPLEMENTATIONS

The standard floating-point adder, shown in fig. 3, consists of a two's complement 27-bit adder, a right-shifter and an exponent difference module. For post-normalization, we need a Leading One Detector and a left-shifter. After normalization, rounding is performed taking the guard (g), round (r) and sticky (s) bits into consideration. In the following subsections we describe the modules developed to implement the floating-point adder.

Fig. 3. Floating-point Adder Architecture

A. Exponent Difference Module

The exponent difference module computes the absolute difference of two 8-bit numbers and identifies whether E1 is smaller than E2. An 8-bit adder subtracts the exponents, and the carry out identifies whether E1 is smaller than E2. If the result is negative, the two's complement of the result is computed to obtain the absolute difference.

B. Right Shifter

The right shifter shifts right the significand of the smaller operand by the absolute exponent difference. This is done so that the two numbers have the same exponent and normal integer addition can be performed. To maintain the precision of the number, three extra bits, the guard (g), round (r), and sticky (s), are appended to the significand. The g and r bits are the first and second bits that might be shifted out, and s is computed by ORing all the other bits that are shifted out. A barrel shifter implementation is used for the right shifter because of its lower combinational delay.

C. Two's Complement Adder

This is a simple adder that adds or subtracts the pre-normalized significands. Two 27-bit significands are the inputs to this module. The signs of the individual significands are processed before they enter the adder module and determine whether the operation is an addition or a subtraction. This signal is used to select the two's complement of the operands before adding in the case of subtraction. The generated carry out signal determines the sign of the result and is also used to determine the sign of the output.

D. Leading One Detector (LOD)

After the addition, the next step is to normalize the result. The first step is to identify the position of the first one (leading one) in the result; the adder result is then shifted left by the number of zeros in front of the leading one. To perform this operation, special hardware, called a Leading One Detector (LOD), has to be implemented. This design has low fan-in and fan-out, which leads to an area and delay efficient design; it was first presented by Oklobdzija in 1994 [12], and a low power design followed in [13].

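The round-to-nearest-even decision on the guard, round and sticky bits (step 10 of the algorithm, detailed further in the Rounding subsection below) reduces to a few lines. This is our illustrative Python sketch of the rule, not the paper's VHDL; note that on an exact tie (g = 1, r = s = 0) the even 24-bit value must be kept, which the hardware also has to handle:

```python
def round_grs(sig27):
    """Round a 27-bit value (24 significand bits + guard, round, sticky)
    down to 24 bits using roundTiesToEven."""
    sig = sig27 >> 3                 # the 24 bits to keep
    g = (sig27 >> 2) & 1             # guard: first bit shifted out
    r = (sig27 >> 1) & 1             # round: second bit shifted out
    s = sig27 & 1                    # sticky: OR of everything shifted out below
    # Round up when strictly above the halfway point (g and (r or s)),
    # or on a tie (g = 1, r = s = 0) when the kept value is odd.
    if g and (r or s or (sig & 1)):
        sig += 1                     # caller must renormalize if this carries out
    return sig
```

For example, a tie next to an odd significand rounds up while a tie next to an even one does not, which is exactly the "round towards the nearest even number" behavior of step 10.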

E. Normalization (Left Shifter)

Using the LOD output, the result from the adder is shifted left for normalization, so that its first bit is a 1. A barrel shifter implementation is used for the left shifter.

F. Rounding

The IEEE 754 default rounding scheme is implemented in our design. Rounding is done using the guard, round and sticky bits of the result. A 1 is added to the result if the guard bit is set and either the r or the s bit is 1. The exponent part of the result is determined by subtracting the leading zero count from the leading one detector from the larger of the two exponents. The sign is selected according to the result of the sign bit calculation unit. Overflow and underflow exceptions are flagged by comparing the output exponent to the corresponding conditions.

V. COMPARISON

Our FPGA design is written in VHDL and was thoroughly simulated using ModelSim 6.5. The simulation results were compared with the results of a C program and found to be correct for both floating-point addition and subtraction. Apart from normal cases, special conditions, like addition of two denormal numbers resulting in a denormal number, subtraction of two normal numbers resulting in a denormal number, and subtraction of a normal and a denormal number resulting in a denormal number, were also verified and found to be correct.

As discussed earlier, the Xilinx floating-point core does not support denormal numbers; the core treats denormal numbers as zero. Fig. 4(a) shows the waveform of a simulation run involving denormal numbers for the Xilinx core. One can see that the result of the addition of two denormal numbers is zero. Fig. 4(b) shows the simulation waveform of our design with the same inputs and the corresponding expected result.

Fig. 4. Comparison of how denormal numbers are handled. (a) Simulation waveform for the Xilinx core. (b) Simulation waveform for our implementation of the standard floating-point adder algorithm supporting denormal numbers.

We have implemented the above algorithm using Xilinx tools. The frequency and resource utilization of our implementation are shown in Table I. The performance and resource utilization of the Xilinx core implementation of the 32-bit single precision floating-point adder without denormal support [14] are also shown in Table I for comparison.

TABLE I. A COMPARISON OF RESOURCE UTILIZATION & FREQUENCIES BETWEEN OUR DESIGN IMPLEMENTATION AND THE XILINX DESIGN ON AN XC5VLX30-1 DEVICE.

Method | LUT-Register Pair | LUT | Registers | Latency (Clock Cycles) | Frequency (MHz)
Our    | 577               | 441 | 433       | 8                      | 263
Xilinx | 629               | 432 | 558       | 8                      | 420

From Table I we can observe that the resource utilization is almost the same for both implementations; however, there is a difference in the maximum operating frequency. This is due to the fact that the logic required for handling denormal numbers increases the critical path, which in turn reduces the operating frequency.

VI. CONCLUSION

Many high performance scientific computing applications require fully compliant IEEE 754 arithmetic units. The Xilinx implementation treats denormal numbers as zero and hence may not be directly useful in such applications; to make it useful, one may require additional logic. Our implementation of an IEEE 754 compliant single precision floating-point adder unit supporting denormal inputs will work with scientific applications that require high accuracy and precision. In future we plan to implement multi-path and far-and-close data-path algorithms, as they reduce critical paths, but at the cost of a significant increase in hardware resources [15].

References
[1] IEEE Standard for Binary Floating-Point Arithmetic, IEEE Std 754-1985, 1985.
[2] F. Pappalardo, G. Visalli, and M. Scarana, "An application-oriented analysis of power/precision tradeoff in fixed and floating-point arithmetic units for VLSI processors," in Proc. IASTED Conf. Circuits, Signals, and Systems, Dec. 2004, pp. 416-421.
[3] M. Farmwald, "On the Design of High Performance Digital Arithmetic Units," Ph.D. dissertation, Department of Electrical Engineering, Stanford University, Aug. 1981.
[4] P. M. Seidel and G. Even, "Delay-optimized implementation of IEEE floating-point addition," IEEE Trans. Comput., vol. 53, no. 2, pp. 97-113, Feb. 2004.
[5] J. D. Bruguera and T. Lang, "Leading-one prediction with concurrent position correction," IEEE Trans. Comput., vol. 48, no. 10, pp. 1083-1097, 1999.
[6] S. F. Oberman, H. Al-Twaijry, and M. J. Flynn, "The SNAP Project: Design of floating-point arithmetic units," in Proc. 13th IEEE Symp. Computer Arithmetic, 1997.
[7] A. Malik and S.-B. Ko, "A study on the floating-point adder in FPGAs," in Proc. Canadian Conference on Electrical and Computer Engineering, 2006, pp. 86-89.


[8] G. Govindu, L. Zhuo, S. Choi, and V. Prasanna, "Analysis of high-performance floating-point arithmetic on FPGAs," in Proc. Parallel Distrib. Process. Symp., 2004, pp. 149-156.
[9] J. Liang, R. Tessier, and O. Mencer, "Floating Point Unit Generation and Evaluation for FPGAs," in Proc. IEEE Symposium on Field-Programmable Custom Computing Machines (FCCM'03), California, USA, April 2003.
[10] A. Malik, D. Chen, Y. Choi, M. H. Lee, and S.-B. Ko, "Design tradeoff analysis of floating-point adders in FPGAs," Canadian Journal of Electrical and Computer Engineering, vol. 33, pp. 169-175, 2008.
[11] D. Goldberg, "What Every Computer Scientist Should Know About Floating-Point Arithmetic," ACM Computing Surveys, March 1991.
[12] V. G. Oklobdzija, "An Algorithmic and Novel Design of a Leading Zero Detector Circuit: Comparison with Logic Synthesis," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 2, no. 1, pp. 124-128, 1994.
[13] K. H. Abed and R. E. Siferd, "VLSI Implementations of Low-Power Leading-One Detector Circuits," in Proc. IEEE SoutheastCon, 2006, pp. 279-284.
[14] Xilinx LogiCORE, Floating-Point Operator v7.0, http://www.xilinx.com/
[15] Shao Jie, Ye Ning, and Zhang Xiao-Yan, "An IEEE Compliant Floating-Point Adder with the Deeply Pipelining Paradigm on FPGAs," in Proc. 2008 International Conference on Computer Science and Software Engineering.
