
# Concordia University

Floating Point

Lecture #4
In this lecture we will go over the following concepts:

- Floating point number representation
- Accuracy and dynamic range; IEEE standard
- Rounding techniques
- Floating point multiplication
- Architectures for FP multiplication
- Comparison of two FP architectures
- Barrel shifters

Single and double precision data formats of the IEEE 754 standard

| Format | Sign S | Biased exponent E | Unsigned fraction P |
|---|---|---|---|
| (a) IEEE single precision data format | 1 bit | 8 bits | 23 bits |
| (b) IEEE double precision data format | 1 bit | 11 bits | 52 bits |

Format parameters of the IEEE 754 Floating Point Standard

| Parameter | Single Precision | Double Precision |
|---|---|---|
| Format width in bits | 32 | 64 |
| Precision (p) = fraction + hidden bit | 23 + 1 | 52 + 1 |
| Exponent width in bits | 8 | 11 |
| Maximum value of exponent | +127 | +1023 |
| Minimum value of exponent | −126 | −1022 |
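The single precision field layout can be checked directly on a bit pattern. The following is a minimal sketch, assuming Python's standard `struct` module is used to obtain the 32-bit encoding; the helper name `decode_single` is illustrative and not part of the lecture.

```python
import struct

BIAS = 127  # single precision exponent bias


def decode_single(x: float) -> tuple[int, int, int]:
    """Return (sign, biased exponent, fraction) of x rounded to single precision."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                 # 1 bit
    exponent = (bits >> 23) & 0xFF    # 8-bit biased exponent E
    fraction = bits & 0x7FFFFF        # 23-bit unsigned fraction
    return sign, exponent, fraction


# -6.5 = -1.101b * 2**2, so the unbiased exponent is 2 and the fraction starts with 101.
sign, exponent, fraction = decode_single(-6.5)
print(sign, exponent - BIAS, f"{fraction:023b}")
```

The hidden bit is not stored, which is why the precision is 23 + 1 bits while only 23 fraction bits appear in the encoding.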

Range of floating point numbers

[Figure: number line marking, for negative and positive numbers, the regions labelled Overflow, Within Range, Underflow and Denormalized.]

Exceptions in IEEE 754

| Exception | Remarks |
|---|---|
| Overflow | Result can be ±∞ or the default maximum value |
| Underflow | Result can be 0 or denormal |
| Divide by Zero | Result can be ±∞ |
| Invalid | Result is NaN |
| Inexact | System-specified rounding may be required |

Operations that can generate Invalid Results

| Operation | Remarks |
|---|---|
| Subtraction | An operation of the type ∞ − ∞ |
| Multiplication | An operation of the type 0 × ∞ |
| Division | Operations of the type 0/0 and ∞/∞ |
| Remainder | Operations of the type x REM 0 and ∞ REM y |
| Square Root | Square root of a negative number |
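Some of these invalid operations can be observed from software. The sketch below assumes Python floats, which are IEEE 754 double precision on common platforms; it only covers the cases for which Python returns NaN instead of raising an exception.

```python
import math

inf = float("inf")

# Each of these is an Invalid operation and yields NaN.
invalid_results = {
    "inf - inf": inf - inf,
    "0 * inf": 0.0 * inf,
    "inf / inf": inf / inf,
}

for name, value in invalid_results.items():
    print(name, "->", value, "is NaN:", math.isnan(value))
```

Python does not return NaN in every case: `0.0 / 0.0` raises `ZeroDivisionError` and `math.sqrt(-1.0)` raises `ValueError`, so the language traps those operations rather than delivering the default IEEE result.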

## IEEE compatible floating point multipliers

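Algorithm

Step 1
Calculate the tentative exponent of the product by adding the biased exponents of the two numbers and subtracting the bias, ($E_1 + E_2 - \text{bias}$). The bias is 127 and 1023 for the single precision and double precision IEEE data formats respectively.

Step 2
If the signs of the two floating point numbers are the same, set the sign of the product to +, else set it to −.

Step 3
Multiply the two significands. For p-bit significands the product is 2p bits wide (p, the width of the significand, includes the hidden bit).

Step 4
Normalize the product if the MSB of the product is 1 (i.e. the product of the significands is $\ge 2$), by shifting the product right by 1 bit position and incrementing the tentative exponent.
Evaluate exception conditions, if any.

Step 5
Round the product if $R(M_0 + S)$ is true, where $M_0$ and $R$ represent the pth and (p+1)st bits from the left end of the normalized product and the sticky bit ($S$) is the logical OR of all the bits to the right of the R bit. If all p MSBs of the normalized product are 1s, rounding can generate a carry out. In that case normalization (step 4) has to be done again.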
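The five steps can be traced on single precision bit patterns. This is a minimal sketch under stated assumptions: both operands and the result are normal numbers, exception handling is omitted, and the names `fp_mul`, `to_bits` and `from_bits` are illustrative helpers, not part of the lecture.

```python
import struct

P = 24      # significand width including the hidden bit
BIAS = 127  # single precision bias


def to_bits(x: float) -> int:
    """Single precision bit pattern of x."""
    return struct.unpack(">I", struct.pack(">f", x))[0]


def from_bits(bits: int) -> float:
    """Value of a single precision bit pattern."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


def fp_mul(a: int, b: int) -> int:
    """Multiply two normal single precision numbers given as bit patterns."""
    sa, ea, fa = a >> 31, (a >> 23) & 0xFF, a & 0x7FFFFF
    sb, eb, fb = b >> 31, (b >> 23) & 0xFF, b & 0x7FFFFF
    exp = ea + eb - BIAS                              # step 1: tentative exponent
    sign = sa ^ sb                                    # step 2: sign of the product
    prod = ((1 << 23) | fa) * ((1 << 23) | fb)        # step 3: 2p-bit product
    drop = P if prod >> (2 * P - 1) else P - 1        # step 4: MSB set means product >= 2
    if drop == P:
        exp += 1
    sig = prod >> drop                                # p-bit significand, LSB is M0
    r = (prod >> (drop - 1)) & 1                      # R bit
    s = int(prod & ((1 << (drop - 1)) - 1) != 0)      # sticky bit
    if r and ((sig & 1) or s):                        # step 5: R(M0 + S)
        sig += 1
        if sig >> P:                                  # carry out: normalize again
            sig >>= 1
            exp += 1
    return (sign << 31) | (exp << 23) | (sig & 0x7FFFFF)


a = to_bits(1.5)
b = to_bits(-2.25)
print(from_bits(fp_mul(a, b)))
```

The normalization shift is modelled by choosing how many low-order bits to drop instead of physically shifting a register, which gives the same significand, R and sticky bits.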

## Operands Multiplication and Rounding

[Figure 2.4: Significand multiplication, normalization and rounding. Panels: significands before multiplication (p-bit significand field, p−1 lower order bits); result of significand multiplication before normalization shift (2p bits, Cout); normalized product before rounding (p-bit significand field with p−1 higher order bits and M0).]

Architecture Consideration

What's the best architecture?

## A Simple FP Multiplier

[Figure: block diagram. Inputs: Sign1, Sign2, Exp1, Exp2, Significand1, Significand2. Blocks: Exponent & Sign Logic, Significand Multiplier, Normalization Logic, Rounding Logic, Correction Shift, Result Selector, Result Flags Logic. Outputs: Flags, IEEE Product.]

## A Dual Path FP Multiplier

[Figure: block diagram of the dual path FP multiplier, drawn as three pipeline stages (1st, 2nd, 3rd) with the critical path and Path 2 marked. Input: floating point numbers (exponents and significands). Blocks: Exponent Logic, Control/Sign Logic, Significand Multiplier (Partial Product Processing), Bypass Logic, Exponent Incrementer, CPA/Rounding Logic, Sticky Logic, Result Selector/Normalization Logic, Result Integration/Flag Logic. Outputs: Flag bits, IEEE product.]

| Case | Value | S | Exponent | Significand |
|---|---|---|---|---|
| Case-1 (Normal Number) | Operand1 | 0 | 10000001 | 00000000101000111101011 |
| | Operand2 | 0 | 10000000 | 10101100110011001100110 |
| | Result | 0 | 10000010 | 10101101110111110011100 |
| Case-2 (Normal Number) | Operand1 | 0 | 10000000 | 00001100110011001100110 |
| | Operand2 | 0 | 10000000 | 00001100110011001100110 |
| | Result | 0 | 10000001 | 00011010001111010110111 |

## Comparison of 3 types of FP Multipliers using 0.22 micron CMOS technology

| Multiplier | Area (cell) | Power (mW) | Delay (ns) |
|---|---|---|---|
| | 2288.5 | 204.5 | 69.2 |
| | 2997 | 94.5 | 68.81 |
| Path FPM | 3173 | 105 | 42.26 |

## IEEE compatible floating point adders

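Algorithm

Step 1
Compare the exponents of the two numbers for $e_1 > e_2$ (or $e_1 < e_2$) and calculate the absolute value of the difference between the two exponents ($|e_1 - e_2|$). Take the larger exponent as the tentative exponent of the result.

Step 2
Shift the significand of the number with the smaller exponent right through a number of bit positions that is equal to the exponent difference. Two of the shifted-out bits of the aligned significand are retained as guard (G) and round (R) bits. So for p-bit significands, the effective width of the aligned significand must be p+2 bits. Append a third bit, namely the sticky bit (S), at the right end of the aligned significand. The sticky bit is the logical OR of all shifted-out bits.

Step 3
Add or subtract the two aligned significands, according to the signs of the operands.

Step 4
Normalize the result of step 3 and adjust the tentative exponent accordingly.
Evaluate exception conditions, if any.

Step 5
Round the result if the logical condition $R(M_0 + S)$ is true, where $M_0$ and $R$ represent the pth and (p+1)st bits from the left end of the normalized significand. The new sticky bit ($S$) is the logical OR of all bits to the right of the R bit. If the rounding condition is true and all p MSBs of the normalized significand are 1s, rounding can generate a carry out. In that case normalization (step 4) has to be done again.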
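The alignment in step 2 can be sketched in a few lines. This is an illustrative model, assuming the significand is held as a non-negative integer of p bits; the function name `align` and its return convention are assumptions made for this example.

```python
def align(significand: int, shift: int, p: int = 24) -> tuple[int, int, int, int]:
    """Right-shift a p-bit significand by `shift` positions.

    Returns (aligned significand, G, R, S): G and R are the first two
    shifted-out bits, S is the logical OR of the bits shifted out beyond R.
    """
    assert 0 <= significand < (1 << p) and shift >= 0
    extended = significand << 2                 # make room for the G and R positions
    kept = extended >> shift                    # p + 2 bit aligned significand
    lost = extended & ((1 << shift) - 1)        # everything shifted out past R
    aligned = kept >> 2
    g = (kept >> 1) & 1
    r = kept & 1
    s = int(lost != 0)
    return aligned, g, r, s


# 4-bit example: 1011 shifted right by 3 leaves 0001 with G=0, R=1, S=1.
print(align(0b1011, 3, p=4))
```

Because the sticky bit only records whether any discarded bit was 1, a shift larger than the significand width still produces a well-defined result: all kept bits are 0 and S is 1 for a non-zero significand.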

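## Floating Point Addition of Operands with Rounding

[Figure: Fig. 1, aligned significands. Each p-bit significand field holds p−1 higher order bits followed by a0 or b0; the first operand is extended with 0 0 0 and the aligned operand with G R S. The sum has carry out Cout and trailing G R S bits. The normalized significand before rounding holds p−1 higher order bits followed by M0, R and S.]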

IEEE Rounding

IEEE default rounding mode: round to nearest even

| Significand | Rounded Result | Error | Significand | Rounded Result | Error |
|---|---|---|---|---|---|
| X0.00 | X0. | 0 | X1.00 | X1. | 0 |
| X0.01 | X0. | −1/4 | X1.01 | X1. | −1/4 |
| X0.10 | X0. | −1/2 | X1.10 | X1. + 1 | +1/2 |
| X0.11 | X1. | +1/4 | X1.11 | X1. + 1 | +1/4 |
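The round to nearest even table follows from the rounding condition R(M0 + S) used in the algorithms above. A minimal sketch, assuming the value is an unsigned integer whose low-order `drop` bits are to be removed; the function name `round_nearest_even` is illustrative.

```python
def round_nearest_even(value: int, drop: int) -> int:
    """Drop the `drop` low-order bits of `value`, rounding to nearest even."""
    if drop == 0:
        return value
    kept = value >> drop
    m0 = kept & 1                                   # LSB of the kept part
    r = (value >> (drop - 1)) & 1                   # first dropped bit
    s = int(value & ((1 << (drop - 1)) - 1) != 0)   # OR of the remaining dropped bits
    return kept + (r & (m0 | s))                    # add 1 when R(M0 + S) is true


# The eight table rows X0.00 ... X1.11 with X omitted (two fraction bits dropped).
rows = [0b000, 0b001, 0b010, 0b011, 0b100, 0b101, 0b110, 0b111]
print([round_nearest_even(v, 2) for v in rows])  # [0, 0, 0, 1, 1, 1, 2, 2]
```

The two tie cases, X0.10 and X1.10, go in opposite directions so that the rounded result always ends in an even LSB, which keeps the average rounding error at zero.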

Architecture Consideration

What's the best architecture?

## Triple Path Floating Point Adder

[Figure: block diagram of the triple path floating point adder, shown twice; the second copy marks five pipeline stages (1st to 5th) and the critical path. Input: floating point numbers (exponents and significands). Blocks: Exponent Logic, Control Logic, Data Selector, Data Selector/Pre-align (0/1 Bit Right Shifter), Pre-alignment (Right Barrel Shifter)/Complementer, Adder/Rounding Logic (two instances), Bypass Logic, Exponent Incr/Decr, Exponent Subtractor, Result Selector, Leading Zero Counter (counting logic), Normalization (1 bit Right/Left Shifter), Normalization (Left Barrel Shifter), Result Integration/Flag Logic. Outputs: Flags, IEEE Sum.]

[Figure: diagram relating the three data paths, labelled BP (bypass), LZA and LZB.]

[Figure: floating point adder data path. Inputs: exponents e1 and e2, signs sign1 and sign2, significands s1 and s2. Blocks: control, exponent difference, compare, right shifter, bit inverters, sign control, LZA logic, LZA counter, rounding control, exponent subtract, incrementer, left shift, selector, exponent incrementer, compensation shifter. Output includes the sign.]

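Comparison of synthesis results for IEEE 754 single precision

| Parameters | SIMPLE | | PIPE/ |
|---|---|---|---|
| Maximum delay, D (ns) | 327.6 | 213.8 | 101.11 |
| Average power, P (mW) @ 2.38 MHz | 1836 | 1024 | 382.4 |
| Area A, total number of CLBs (#) | 664 | 1035 | 1324 |
| Power delay product (ns · 10 mW) | 7.7 × 10^4 | 4.31 × 10^4 | 3.82 × 10^4 |
| Area delay product (10 # · ns) | 2.18 × 10^4 | 2.21 × 10^4 | 1.34 × 10^4 |
| Area delay² product (10 # · ns²) | 7.13 × 10^6 | 4.73 × 10^6 | 1.35 × 10^6 |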


Barrel Shifters

How to shift several bits at once?

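Right Shift Barrel Shifter

[Figure: four MUXes with data inputs D3, D2, D1, D0, select inputs S1 and S0, and outputs Y3, Y2, Y1, Y0.]

| Select S1 | Select S0 | Output Y3 | Output Y2 | Output Y1 | Output Y0 | Operation |
|---|---|---|---|---|---|---|
| 0 | 0 | D3 | D2 | D1 | D0 | No Shift |
| 0 | 1 | D2 | D1 | D0 | D3 | Rotate Once |
| 1 | 0 | D1 | D0 | D3 | D2 | Rotate Twice |
| 1 | 1 | D0 | D3 | D2 | D1 | Rotate 3 times |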
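The select table can be modelled as one index computation per output. This is a small behavioural sketch, not a gate-level description; the function name `barrel_rotate4` and the list ordering [D3, D2, D1, D0] are assumptions made for the example.

```python
def barrel_rotate4(d: list, s1: int, s0: int) -> list:
    """4-bit barrel rotator.

    d is [D3, D2, D1, D0]; the result is [Y3, Y2, Y1, Y0].
    Each output acts as a MUX that picks one data input based on (S1, S0).
    """
    n = (s1 << 1) | s0                      # rotate amount 0..3
    return [d[(i + n) % 4] for i in range(4)]


data = ["D3", "D2", "D1", "D0"]
for s1 in (0, 1):
    for s0 in (0, 1):
        print(s1, s0, barrel_rotate4(data, s1, s0))
```

All four outputs are available after a single MUX delay regardless of the rotate amount, which is the point of a barrel shifter compared with shifting one position per clock cycle.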

## Distributed Barrel Shifter

[Figure: three cascaded rows of eight 2-to-1 MUXes. Row 1 (select S0) chooses between x(i) and x(i+1) to produce k(i); row 2 (select S1) chooses between k(i) and k(i+2) to produce w(i); row 3 (select S2) chooses between w(i) and w(i+4) to produce y(i), for i = 0 to 7.]

Please note that in this case, if we have 8 bits of data, then inputs to MUXes with an index greater than 7 should be set to a desired value (for example, 0 for a logical right shift).
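The three MUX rows of the distributed barrel shifter can be modelled stage by stage. A minimal behavioural sketch, assuming an 8-bit word stored LSB-first in a list and a `fill` parameter standing in for the desired value on inputs with an index above 7; the names `log_right_shift` and `shift_word` are illustrative.

```python
def log_right_shift(x: list, s2: int, s1: int, s0: int, fill: int = 0) -> list:
    """Right shift of an 8-bit word by 4*s2 + 2*s1 + s0 positions.

    x[i] is bit i (x[0] is the LSB). Each stage is a row of 2-to-1 MUXes
    choosing between input i and input i + distance.
    """
    def stage(bits, select, distance):
        padded = bits + [fill] * distance      # inputs beyond bit 7 take the fill value
        return [padded[i + distance] if select else padded[i] for i in range(len(bits))]

    k = stage(x, s0, 1)    # row 1: shift by 0 or 1
    w = stage(k, s1, 2)    # row 2: shift by 0 or 2
    return stage(w, s2, 4) # row 3: shift by 0 or 4


def shift_word(value: int, amount: int, fill: int = 0) -> int:
    """Convenience wrapper working on integers instead of bit lists."""
    x = [(value >> i) & 1 for i in range(8)]
    y = log_right_shift(x, (amount >> 2) & 1, (amount >> 1) & 1, amount & 1, fill)
    return sum(bit << i for i, bit in enumerate(y))


print(f"{shift_word(0b10110100, 3):08b}")          # logical shift, zeros shifted in
print(f"{shift_word(0b10110100, 3, fill=1):08b}")  # ones shifted in
```

With three rows the shifter covers every amount from 0 to 7, so the number of MUX levels grows with the logarithm of the word width rather than with the width itself.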

## Block Diagram of the Right Shifter & GRS-bit Generation Component

The end

Appendix 2
For Information

[Figure: floating point adder data path with inputs e1, e2, sign1, sign2, s1 and s2 and the blocks control, exponent difference, compare, right shifter, bit inverters, sign control, LZA logic, LZA counter, rounding control, exponent subtract, incrementer, left shift, selector, exponent incrementer and compensation shifter. Several blocks are annotated "S&M" and two are marked "X".]

## Improvements in FADD from Previous Designs

[Figure: floating point adder data path with inputs e1, e2, sign1, sign2, s1 and s2 and the blocks control, exponent difference, compare, right shifter, bit inverters, sign control, LZA logic, LZA counter, rounding control, exponent subtract, incrementer, left shift, selector, exponent incrementer and compensation shifter. Two blocks are marked ABSENT.]

Architecture Consideration

Straightforward IEEE

- Exponent subtraction
- Alignment
- Conversion
- Normalization
- Rounding

1. Positive result, eliminate complement
2. Comparison // Alignment
3. Full normalization // Rounding

## (Compare to single path)

Reduce latency

FAR data-path:
--No Conversion
--No Full normalization
--No LOP

CLOSE data-path:
--No Full Alignment

Reduce total path delay
--Eliminate Comparator

Increase area

The latency of the floating-point addition can be improved if the rounding is combined

Main Blocks

What blocks are considered?

- LOP with Concurrent Position Correction (New)
- Alignment Shifter
- Normalization Shifter

## How can a compound

fastest?


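The compound adder simultaneously computes the sum and the sum plus one; the correctly rounded result is then obtained by selecting between them according to the requirements of the rounding mode.

$$A + B$$
$$A + B + 1$$

Effective Subtraction

$$A + \overline{B} + 1 = A - B$$
$$A + \overline{B} = A - B - 1$$
$$\overline{A + \overline{B} + 1} = B - A - 1$$
$$\overline{A + \overline{B}} = B - A$$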
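The effective subtraction identities rest on the two's complement relation that inverting all bits of x gives −x − 1. The sketch below checks them on N-bit unsigned words; the word width `N`, the function names and the direct comparison `a >= b` (hardware would use the carry out instead) are assumptions made for this example.

```python
N = 8
MASK = (1 << N) - 1


def compound_add(a: int, b: int) -> tuple[int, int]:
    """Return (sum, sum + 1) modulo 2**N, the two outputs of a compound adder."""
    s = (a + b) & MASK
    return s, (s + 1) & MASK


def magnitude_difference(a: int, b: int) -> int:
    """|a - b| obtained from one compound addition of a and the complement of b."""
    s, s_plus_1 = compound_add(a, ~b & MASK)   # s = a - b - 1, s_plus_1 = a - b (mod 2**N)
    if a >= b:
        return s_plus_1                        # A + not(B) + 1 = A - B
    return ~s & MASK                           # not(A + not(B)) = B - A


print(magnitude_difference(200, 55), magnitude_difference(55, 200))
```

Either sign of the difference is therefore covered by one addition followed by a selection and an optional bit inversion, without a second carry-propagate step.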

Rounding Block

Round to nearest: Sum, Sum+1

- if g=1
  - if (LSB=1) OR (r+s=1): add 1 to the result
- else: Truncate at LSB

Round Toward zero: Sum

- Truncate

Round Toward +Infinity: Sum, Sum+1 and Sum+2

- if sign=positive
  - if any bits to the right of the result LSB=1: add 1 to the result
  - else: Truncate at LSB
- if sign=negative
  - Truncate at LSB

Round Toward -Infinity: Sum, Sum+1 and Sum+2

- if sign=negative
  - if any bits to the right of the result LSB=1: add 1 to the result
  - else: Truncate at LSB
- if sign=positive
  - Truncate at LSB
