Professional Documents
Culture Documents
Abstract
In the methodology offinding a better VLSI circuit, we sugg est
In this paper, we study area-time tradeoffs in VLSI for prefix the following steps.
computation using graph representations of this problem. Since
a Find a lower bound on the chosen metric.
�
the pro lem is inti mately related to binary addition, the results
we obtam lead to the design of area-time efficient VLSI adders. b Devise algori thms that are close to the lower bound or
This is a major goal of our work: to des ign fiery low latency addi analyze known algorithms with respect t o the chosen
metric.
tion circuitry that is also area efficient. To this end, we present
a new graph representation for pre fix computation tha t leads to c By investigating t he trade-off behavior between area and
the design of a fast, area-efficient binary adder. Th e new graph is time complexity, find the optimal range of area-time com
a combination of previously known graph representations for pre plexity.
fix compu t ation , and its area is close to known lower bounds on d Discover corresponding layouts that are optimal in area
the VLSI area of parallel prefix graphs . Using it, we are able to time complexity for given values of comput ation time
design VLSI adders having area A = O(nlogn) whose delay time (including minimum computation time ) .
is t he lowest possible value, i. e. the fastest possible area-efficient
VLSI adder. Brent and Kung [BK82] found a regular layout for a graph rep
resentation of prefix computation and used it to perform binary
1 Introduction
addition. We apply the above steps on a combination of their al
gorithm and an algorithm for solving linear recurrences developed
Addi tion circuitry is an impor tant part of any computer archi tec
by K ogge and Stone [KS73]. We then subs ti tute actual logic for
ture, since it is the major component of an ALU, fioating point
the no des and wires for the edges of our graphs in order to im
pro ces sor , or a special purp o se chip ([Gr8S], [FWB5], [ST8S], etc.).
plement VLSI circuitry for binary additio n that is both fast and
Since the performance of the adder can domina t e the p erforman ce
· area-efficient.
of the architecture, the design of low latency circu it ry has been the
This paper is organized as follows. In t he next section, we
subject of a large amount of research. In this paper, we co ntinu e .
denve upper bounds on area for a graph representation of our
these studies with the desig n of a fast area-efficient VLSI binary
new prefix algorithm. In s ect ion 3, we design VLSI circuitry for
adder. We start by deriving a new "hybrid" algorithm for prefix
bin ary addition based on our hybrid prefix algorithm. In se ction
co mputation (a combination of previously known algorithms for
4, we compare our model and circuitry with o th ers in terms of
prefix computation which are e ither the best case in area or time
area and delay time performance. We conclude this paper with
complexity ) , and investigate the trade-off behavior between area
suggestions for future research in sec t ion 5.
and time of this algorithm. Since graph representations of pre
fix algorithms map dir ectly into circuitry for binary addition, our
hy brid prefi x algo r i th m leads to the design of fast and area effi 2 Graph Representations of Algorithms for
cient VLSI adders. One of the results of this research is that we
are able to design the fastest possible circuitry occu pying a given
Prefix Computation
amount of area for b ot h prefix comp uta t ion and binary addition.
A par allel prefix circuit is a combinational circuit that takes n in
In VLSI design, there are three impor t ant factors that affect
puts :l:J, :1:2," ':1:n and produces the n outputs :1:1,:1:10 :1)2," ',:1:10
the performance of a design . The first is area complexity, the
z2 o· • 'OZn where 0 represents an arbitrary associati v e binary oper
second is time/delay performance, and the third is regularity of
ation. Parallel prefix computation has been used for many appli
interconnection. Usually, a pattern rich in conne c t ion s occupies a
cations, for instance, in regular layouts of parallel adders [BK82J,
lot of area , while a pattern with limited connections takes a long
generati�� consecutiv� terms of a linear recurrence [GLPG82], and
time to compute. Because the cost of area is very expensive, every
fast addi t Ion of two bmary numbers [Of63]. Therefore, it is of in
effort should be made to find the optimal area complexity for a
terest to find area-time efficient VLSI implementations of prefix
given value of time performance.
computation. We are interested here in the design of the fastest
Area in VLSI is expressed as the total amou n t of silicon used.
possible prefix circuitry occup y ing a given amount of VLSI area.
Interconnection of wires can consume most of the area of a VLSI
This section is organized as follows. In section 2.1, we review
circuit. Therefore, in this paper , we take the complexity of inter
past work on the subject of prefix computation. Section 2.2 de
connection into consideration rathe r than just a count of "active
s crib es a new algorithm for prefix computation that forms a basis
elements', "gates', or "registers".
for the rest of the paper. In sections 2.2 and 2.3, we derive two
different VLSI layouts for the graph representation of our new
lpresent address:
Supercomputing Re.earch Center 4S80 Forbes Blvd.
algorithm and discuss their properties. We obtain upper b01ll\ds
Lanham, MD 20706
49
CH2419-0/87/000010049$Ol.OO © 1987 IEEE
Authorized licensed use limited to: Uskudar Universitesi. Downloaded on February 01,2023 at 08:09:20 UTC from IEEE Xplore. Restrictions apply.
area, low time; B-K: low area, high time), and we can see that
on area forgiven value of delay time for the layouts, and from
a
these u pp er bounds, we find the range of dela y time for which area only a constant factor reduction in delay time from 2 log n - 1 to
is 0 (n log n). In section 2.4, we discuss properties of the graph's logn r esults in a significant in crease in area from O(nlogn) to
layout when its nodes are interpreted to implement the circuitry O(n2). It becomes of interest to discover upper and lower bounds
of a bin ar y adder. on are a- t im e tradeoffs for prefix computation when delay time is
restricted to lie in the range between log n and 2 log n - 1.
Carlson and Sugla. [e585] obtained lower bounds on the area
2.1 Review of Past Work
of a prefix graph for delay time between log n and 2 log n - 1.
Past work on p arallel algorithms for pr efix computation is as fol Among other things, the low er bounds are matched by the B-K
lows. Ladner and F is cher [LF80] gave a general co nstruct i on for and the "[{-S graphs at the extremes ofthe range. Thus, it is likely
parallel prefix circuits when gates have unbounded fan-out and that a proper combination of the B-K and K-S graphs will be both
Fich [Fi83] p rovided new lower and upp er bounds for circuits with area and time efficient.
gates having bounded fan-out. Brent and Kung [BK82] suggested By combining the B-K and K-S graphs as shown in Figure 2.3,
a regular layout for co mputing prefixes. Their algorithm is suit we obtain a new hybrid prefix grap h that achiev es intermediate
ab le for implementation in VLSI, and when it is combined with values of area and time. The graph is easily observed to compute
pip elining , time O(1ogn) and area O(n) can be achieved. In the prefixes correctly. We divide it into three separate blocks, the
early 70's, Kogge and Stone [KS73) discovered the technique of outer of which ar e multiple B-K graphs and t he inner of which
recUIsive doubling and gave an algorithm for solving a larg e class is the K-S gra ph . A straightforward VLSI layout of the graph
of recurrence problems on parallel computers su ch as the llliac follows from FigUIe 2.3, which is drawn with the correct number
IV. T heir algorithm can be applied to compute prefixes in paral of vertical tracks between each stage.
lel and a graph representation can be developed which leads to a If we define k as the extra depth of the hybrid prefix graph
p os sible VLSI layo)1t. In papers by Carlson and Sugla [CS85] and and n a8 the number of inputs, then the area of the embedded
Sugla [Su85], a lower bound on the VLSI area of a prefix graph K-S graph will be n2/2". To see this, consider a hybrid pre
was obtained using the minimum bi s ect ion width (MBW) of the fix graph of extra depth k comp osed of an embedded K-S graph
graph. Th omp s on [Th79,80] pi oneered this appr oa ch to deriving on n/2" inputs surrounded by B-K graphs each with 2" inputs.
VLSI lower bounds. In [CS85] and [SuS5], t he lower bound on the The embedded K-S graph has n/2" inputs and thus requires n/2h
area of a parallel prefix graph is A. =: !l(n222{loS. n-T) + n) where vertical tracks to lay out in VLSI. However, since its inputs are
l o g n :$ T :$ 2 log n. They indicated the possibility of de signing spread out with 2" tracks between each pair, it requires n hori
circuitry with minimum valu es of both area and time. zontal tracks. The area of the embedded B-K gra phs in this model
The work of Ladner and Fi s cher , and Fi ch , while interesting, will be 2kn (n horizontal tr ac ks and 2k vertical tracks). There
uses gate count rat h er than area as a complexity measur e. This fore the total VLSI area of this layout of the hybrid prefix graph
ignores the wiring area complexity needed to interconnect circuit will be A =: 2kn + n2 /2k. In the above expression for area, it
elements. Brent and Kung [BK82] use area as a complexity mea can be shown that A = O(nlogn) when k is the following range:
sure, however, their techniques do not produce minimum depth log n - log log n � Ie :5 log n - 1. Thus , a small a d ditive factor
parallel prefix circuits. In the follo wing section, we use meth o ds in del a y time performance is gained over the B-K graph with no
similar to those emplo yed by Brent and Kung, and Carlson and d egradation in area consumption.
Sugla to find the minimum dep th of an area efficient prefix circui t . When the above upper bounds are compared with Carlson
The result immediately su g g est s area-efficient VLSI imp lement a and Sugla's (CS85] lower bound of A = !l(nl/22"), it can be
tions of high-speed binary addition. seen that the lay,out doe s not achieve o p timalit y. This is mainly
due to that area wasted in the middle K-S grap h , who s e input s
2.2 The New Hybrid Prefix Algorithm, a Graph and s ubs e quent nodes are spread by 21. tracks. H ow ever, this
Representation, and a VLSI Layout. layout for the entire graph does have some desirable prop erties ,
including a regular connection pattern, and all inputs and outputs
In this subsection, we discuss results on area-t i me tradeoffs for on the boundaries. As we will see later in this paper, efficient area
prefix computation. Our di s cussions lead to a graph representa· utilization can be made for small values of n by fo lding nodes of
tion of a new algorit hm for c omput ing prefixes, which will form the K-S graph into unused tracks.
the basis of the fast area-efficient VLSI adder des ign s presented
late r in this paper . We obtain a straightforward VLSI layout of
2.3 An Area Optimal Layout for the Hybrid Prefix
the graph, and upper bound the area of the layout when it is
Graph
cons truct e d to have a given value of delay time. From this upper
bound, we find the range of delay time for which area is efficient, In subsection 2.2, we derived a VLSI layout for prefix graphs
i.e. A O(nlogn).
= th at achieves intermediate values of area and time p erformance .
The Brent-Kung ( B - K ) prefix graph [BKB2] can be laid out The layout has some desirable properties, es p ecially for small to
in area O ( n log n) with T 2log n - 1, as shown in FigUIe 2.1.
=: intermediate values of n, and we will use it as a basis for designing
A generalization of the Kogge-Stone (K-S) linear recurrence algo fast area- effici ent VLSI adders later in this paper. For l arge values
rithm [KS73] computes prefixes and has a graph representation of n h ow e v er , area is wasted in the middle of the gra ph that cannot
that can be laid out in area !l( n2) with T log n (see Figure
=: be used for the placement of other components.
2.2). The lar ge area is mainly due to the lar g e number of vertical In this subsection, we p re sent an alternative layou t for the
t r acks required t o embed wires in the upp er stages of the graph. hybrid prefix graph that is more suitable for l arge valu es of n. The
The last stage requires n/2 vertical tracks, the stage before n/4, layout achieves ar e a consumption that co m e s within a constant
etc., which adds to a total of n - 1 vertical trac k s. FigUIe 2.2 factor of the lo wer bound of Carlson and Sugla(CS85], however it
illustrates this, eventhough some lines do not follow gri d tra ck s . loses the desirable pr op ert y of having all inputs and o utputs on
C omparing these grap h representations for prefix computa the boundary.
tion, we find extremes in area and time performance (K-S: high This asymptotically optimal layout is shown in Figure 2.4. It is
50
Authorized licensed use limited to: Uskudar Universitesi. Downloaded on February 01,2023 at 08:09:20 UTC from IEEE Xplore. Restrictions apply.
When reducing the depth of th e hybrid prefix graph , the total
a result mainly of l a ying out the emb edded K-S graph in minimwn
area. Inputs to the K-S graph are placed so that consecut ive area of the processing elements increases (more are required ) and
bound of Carlson and Sugla [CS8S] to within a constant factor Area == 1/ f( Area of the processing elements )
for almost all values of extra depth k. If an H-t ree like la yout +(Area of interconnections )
is used for each B-K graph, optimalit y is achieved for all values
of Ie. We also find that T == 3/2(logn) - 1/2(loglogn - 0(1»
where f is the number of folds. T herefore with either or both of
the above two me thods , we can decrease the depth by at least
is the lowest value of delay time for which area O(nlogn) can
simultan eousl y be achieved. In Table 2.1, we compare the area one unit and still achieve the same area cons1lIIlption. We note
that reducing the depth oUhe hybrid prefix graph to its absolute
and time performance of the layouts presented in this and the
minimum (Ie 0, K-S case) results in the des i gn of area inefficient
previ ous subsection for different values of area, time , and extra
==
51
Authorized licensed use limited to: Uskudar Universitesi. Downloaded on February 01,2023 at 08:09:20 UTC from IEEE Xplore. Restrictions apply.
v Interconnection wires have reasonable lengths (linear in the then take posi ti ve input; produc e negative output
number of bits b eing added) . .else take negative input; pr oduce p osit ive output
following scheme.
Pi.2k ,« -'Pi,2k-l) + ("'Pi.21o-1))
(1)
where gi = a, .bi hi and Pi = ai + bp.ca ( positive input )
Here, we do not us e exclusive·ors in the e quat ions [BK82], so th at
it is not neces s ary to invert the a's and b's to produce the p's and
-,g;,21o+1 == -'(Pi,21o' gi,21e + 9i,21o)
g's. ,pj,21o+1 '(Pi.2k . Pi,2")
The recurrence relat io n in equat ion (1) can be applie d repeat iii) Whit e cell
edly to obtain the following set of carry equations in terms of gi, This cell is a simple inverter; notation is as follows.
Pi and C_l'
Pi,lo == 'Pi.k-l
i-1 i
9k,I "'g;,k-l
«II Pr.)· e-l
=
where 0 is implemented as given above. U sing this equation allows us to avoid the use of exclusive ors,
which would require the input variables and their complements
S.2 VLSI Implementation (ltj, ,ai, b i , ,bi) to be provided to the sum bi t circuitry. Taking
int o account that carries produced by the c arr y generation cir
The hybrid prefix algorithm is straightforward to convert into cir
cuitry alternate b etween being positive and negative, we design
cuitry for binary addition using techniques similar to those em
two types of sum cells. One takes its inputs "naturally" and the
plo ye d by Brent and Kung [BK82]. We use the same terminology
other must invert its two carry inputs.
as [BK82]. Figure 3.1 diagrams a carry look - ahe ad adder derived
from the hybrid prefix algorithm. With this graph, we can reduce sumce1l1 F(Ci, ""Ci-I, "'Pi,l, ,gi,l)
the d epth of the prefix algorithm from 7 (B-K case) to 5 (our way)
with n=16. We use only the standard layers of interconnection
sumee1l2 F( lei, Ci-l, "'pi,l, ,gi,l )
a vailab le in NMOS VLSI (metal, silicides, polysilicon). If the de
We note the following advantages of using leaf cells designed as
signer is allow e d to use more than 2 metal interconnection layers,
given above:
then a further reduc t ion in depth can be made while maintaining
the given area. We follow 4J.Lm NMOS topological de s i gn rules. Black cells are simple.
There are 4 cUfferent types of leaf cells in our circuitry: pg gen ,
ii P ggen cells are s i mple .
black, white, and sum. All are illustrated in Figure 3.1b. As
iii Sum cells are half of a full a dder cell.
shown in [NI8S], we can employ the following strategy to eliminate
the use of inverter gates in each leaf cell. iv All cells combine speed (minimum number of gate delays)
with area efficient layouts.
If (level is even number)
52
Authorized licensed use limited to: Uskudar Universitesi. Downloaded on February 01,2023 at 08:09:20 UTC from IEEE Xplore. Restrictions apply.
In Figure 3.2, we show the leaf cell layouts. The le af cells for the other designs, and even is competitive with the area of a regular
sum circuitry contain super buffers in order to more easily drive ripple-carry adder for small values of n. When n � 64, the area
a large capacitance output pad. Since all leaf cells have good values of our circuitry and of a ripple carry adder are compared in
layouts, the overall design is competitive in area with a standard Table 4.1. The area of our circuitry is only a factor of 2 or 3 higher
ripp le carry adder . Area increases in our design only because of than that of the ripple carry adder. For wire capacitance effects,
the extra proc es sing elements and interconnection area required the s p eed of the circui t ry is dependent on the length of the wires
by basing the design on a hybrid prefix graph. To reduce the (their capacitance) and the resistance oflong wires. In Table 4.2,
charge time of the capacitance being driven by the leaf cells, we we comp ar e for each design the total interconnection wire lengt h
used minimum sized pull up structures (2)' x 2>.). from inputs to outputs. From this table, we can conclude that it
takes almost same amount of time for each design to charge and
3.3 A Densely Packed Layout Using a Folding Method. pas s signals along the interconnection wires. Therefore we can
ignore the delay time due to wire capacitance in comparing these
Our c arr y look-ahead adder b ased on th e hybrid prefix algorithm desig ns ( b ut we do have to consider this wire capacitance when
can be packe d densely by using a f oldi ng method. With this measuring delay time in an actual implementation).
metho d , we can reduce in half the area of the overall de si gn for We now compare our circuitry with the carry skip adder of
reasonable sized adders (n � 64). Equivalently we could decrease [OB85], and the carry look-ahead adder of [Cav84] usin g the
the depth by one unit and still achieve the same ar ea consumption method of (OE85]. They associate a time unit t with one gate
by employing our folding method. For large sized adders, the propagation delay time (fan-in and fan-out bounded by at least
folding method is not as effective in reducing the depth, because 4). When we comp are our leaf cells with their single gate, OUl'S
the interconnection area. becomes dominant. is quite simple (
half the size; fan-in and fan-out bounded hy 2).
Our folding scheme is shown in Figure 3.3. It works by placing Therefore, the propagation delay time of any of our leaf cells is
two levels of the h ybrid prefix graph into one level of the layout, at most !t. As discussed in section 2.3, in our designs we have
since space is available to emb ed leaf cells. It should be ob vious fixed the value of ext ra depth to k = 1 when n � 64. The delay
from the figure that very efficient area consumption is realized, time comparison of our designs with (OB85], (Cav84] is shown in
especially for circuitry designed from a hybrid prefix grap h with Table 4.3.
extra depth k = 1. Next we compare our circuitry with the designs of [BK82]
In Fi gure 3.4, we int roduce two ways of arranging the direction [NIB5] and [NIRB6] using the method outlined in [NIBS]. Here,
of input and output to our carry look-ahead adder circuitry. In we simply count the worst-case number of gates along any path
Figure 3.4a, the inputs and outputs go through in the same direc from inputs to outputs to measure the delay time. The fan-in
tion (inputs and outputs bo th at the bottom). In Figure 3.4b, the and fan-out restrictions are almost the same in all these designs
outputs come out the t op , the opposite side of the inputs, which and thus are ignored. As we discussed previously, extra depth
come in the bottom. This gives the designer some fiexibility in k = 1 is used in our designs with n :$ 64. For large values of
attaching the circuitry to di fferent styles of bus structures, or n (n > 64), we use the value k = logn -loglogn. The delay
different arrangements of I/O p ads . Other desi gns of carry look time comparison between our design and [BK82], [NI8S], [NIR86]
ahead adders in VLSI [NIBS] [NIRB6] have ignored the placement is shown in Table 4.4. Using our hybrid prefix algorithm, delay
of input s and outputs, instead assuming that external inputs and time is the lowe s t possible value with optimal area 0 (n log n).
out p ut s are available wherever they are required. Such assump Furthermore, by adding various strategies (folding, simple leaf
tions are impractical in a typical ALU where I/O and bussing cells and multiple metal layers ), our designs are better than those
conventions are generally adapted in advance of the design of the of [BK82], [NI85], [NIR86] with respect to delay time, and almost
circuitry. We conclude this section by giving a diagram of the as area-efficient as a ripple carry adder.
entire chip including I/O pa.ds ( Figure 3.5).
5 Conclusion
4 Comparison with Other VLSI Adder De
We have presented a new prefix algorithm and have applied it to
signs obtain a fast area-efficient VLSI implementation of a binary adder.
\Vith the other techniques presented here (folding, multiple metal
In this section, we compare our VLSI circuitry for carry look
layers and the design of simple leaf cells), the area of this hybrid
ahead addition with other designs. It is not accurate to compare
adder becomes competitive with the area of a ripple carry adder.
each design by counting only the number of g ates because of the
Also, the time performance of this adder is better than any other
difrerences in fan-in and fan-out, types of gates, and wiring capac
reported in the open literature.
itan(c effeds. AI:counting for these differences using the method
The design strategy used here can be applied t o various tech
of [OBSS], we compare our circuitry with the carry skip adder of
nologies including ECL, TTL, CMOS, NMOS, etc. Since speed
[OB85] and the commercial carry look-ahead adder presented in
and power dissipation depend on the technology being used, and
[Cav84] for which fan-in and fan-out are bounded by at least 4
the lamda scale factor b e c omes smaller in a new technology, it may
logical wires. Then, with the same fan-in and fan-out restriction,
he desirable to extend our fan-out 2 algorithm to fan-out f (f � 2)
we compare out drcuitry with [BK82] [NI85] [NIR86].
due to the reduced fan-out capacitances being driven [OA83J. Our
First, some gerwral comparisonH arf' i.n order. Although the
current investigations are concentrating on this topic.
tim€' p�.,rc>rma.nce of our circuitry is proportional to O(logn), ii
still compares favorably to a standard carry look-ahead adder im
plemented in VLSI. This is because of the large fan-out of a carry
look-ahead adder, and the fact that fan-out proportional to n
require. delay time O(log n) to drive as a capacitive load under
realistic models of a VLSI cirCl�it.
The area of our circuitry is very efficient when compared with
Authorized licensed use limited to: Uskudar Universitesi. Downloaded on February 01,2023 at 08:09:20 UTC from IEEE Xplore. Restrictions apply.
·4 ·8 - 2 • 6
Outputs 1 0 2 0 " · 16
B-K part
Inputs 2 • . •
16
Fig 2.1 Kogge-Stone Prelix graph
Outputs 102 0 • • · 16
1
Fig 2.4 Area Optimal Layout (K=2, n=16)
8 l(J 8 1 r Levels
Inputs 1 2 16
Fig 2.2 Brent·Kung Prelix Graph
Levels Tracks
T T
k B·K Graph (inverted tree) k
3
+ +
K-S Graph n/ 2 k
log n 2
+
B-K Graph (binary tree) k
....L
r-- n inputs -
B·K
Inputs 16 p g. ca sum.ca
b) Hybrid Prelix Graph of k=1 and n=16
Fig 2.3 Hybrid Prefix G raph and Block Diagram b) nolation5 of leaf cell
Fig 3 . 1 Hybrid Prefix Adder
54
Authorized licensed use limited to: Uskudar Universitesi. Downloaded on February 01,2023 at 08:09:20 UTC from IEEE Xplore. Restrictions apply.
pi g2 p2 gl pi god
inl in2 g od
c) pgge n.ca
dJ white.c..
cO
Fig 3.4 Two different styles of HPA (16 bit)
cO cl cl
(Different arrangements of I/O pads)
55
Authorized licensed use limited to: Uskudar Universitesi. Downloaded on February 01,2023 at 08:09:20 UTC from IEEE Xplore. Restrictions apply.
k -[extra depthT T Ttimel Are..
References
K-S k 0 lo g n O(n' )
cas.
[ B K 82J R.P. Brent and H.T. Kung, "A Regular Layout for Parallel
HPC � (Iog n - log log n) :S k :S log n O(n log n) Adders,' IEEE Tran.actions on Compute r., Vol. 0-3 1 , No.
(.ec 2.2) lo g n - 1 f log log n S, March 1982.
HPC log n log log n :S k :S 2 10g n log log n O(n log n)
(.ee 2.8) log n [ C ay 84 ] J.J.F. CavlIJlagh, Digital Co mp uter Arithmetic DeBign and
B-K k - Iog n 1 2 1c g n 1 O(n log n) Implem entation, McGraw-Hill, New York, NY, 1984.
c... e [ CS 85] D.A. C arlson and B. Sughl., "On the Area Requirements of
Very Fa.st VLSI Adders,' Technical Report, Dept. of Elec
T..ble 2.1 Comparison �e..-Tim. Complexity
(or each c�' of graph. trical and Computer Engineering, Univ. of Ma.ssachusetts,
Amherst, MA, 1985.
number of g.ates delay time k
1 [Fi 8S] F . Fich, "New Bo u nds for Parallel Prefix C ircuit s, ' Proceed
B-K 2.5 .. 2 log .. log n 1
K-S n log n log n 0 ing. 15th A CM Sympo Biu m on Th e ory 01 Computing, pp .
Fich 2.5 n + Sv'riTog n log n ! Iog n � Iog log n !-(Io g n - log log nl 100-109, April 1983.
lTIog It - log log nl
Fan drian to IIJld B.Y. Woo, "VLSI Flo at ing Point Proces
HPC 2.5 n + vnTog n log n -llog n --nog log lt
[FW 851 J.
sors,' Proceeding. 7th Sympo.ium on Comp ute r A rithmetic,
pp. 9S-100, June 1985.
Table 2.2 Camporuon of Upper bound. by counting tb. number of gat••
(fan-out hounded by 2) IG LPG 8 2 ] A . G . Greenberg, R.E. Ladner, M. S. Paterson and Z . Galil,
"Efficient Parallel Al gorithms for Linear Recurrence Com
Area of Area of
put .. tiont Information Proctllin g Lettero, Vol. 1 5 , No. 1 ,
Ripple Carry Adder Hybrid Pre fix Adder
pp . SI-85, August 1982.
n - 1 6 200 X 700,\2 350 X 700XT
n - 32 200 X 1400'\ " 500 X 14005;2 IGr 85J T. Gross, "Floating-Point Arithmetic on a Reduced Instruction
n - 64 200 X 2 100,\� 650 X 2 1 00,\2 Set Processor, "Proceeding. 7th Sympo.ium on Computer
Arithmetic, pp. 86-92 , June 198 5 .
IKS 13] P.M. Kogse a n d H.S. Stone, "A Par.JJ.el Al gorithm for the
Table 4.1 Are a of Hybrid Prefix Adder and RCA
Efficient Solution of a General Cla.ss of Recurrence Equ....
tions," IEEE Tran.action. on Comp ute .. , Vol. C-22, No. 8 ,
Tot al intercon nectio n
pp. 783-791, August 1973.
wire length
RCA 0 ILF aO] R.E. L adner and M.J. Fischer, "Par.JJ.e l Prefix Comput....
K-S n tion.· Journal 01 the A CM, Vol. 27, No. 4, pp. 831-8:18,
[ Nffi 86J T.F. N g ai, M.J. Irwin and S. Rawat, " Regul ar Are .... Time
Efficient C arry-Lookahead Adders ," Journal 01 Paraliel and
Table 4.2 Total wire length from input to output for each c as e
Distributed Computing, pp. 92-105, 1986.
n CLA CSA HPA lOA a 8 ] S . Ong and D.E. Atkins, "A Comparison of ALU Structures
16 6t St 3.St For VLSI Technology,' Proceeding. 6th SlImpo,ium on Co m
32 8t 7t 4t puter A rith m e tic , pp. 10-16, 1983.
48 lOt 9t V.G. Oklobdzija and E.R. Barnes, "Some Optim ..l Schemes
lOB 8 5 J
64 lOt lOt 4.5t forALU Implementation in VLSI Technolo gy," Proceeding.
7th SI/mpo . iu m o n Co mpute r A rithmetic, pp. 2- 8 , June
wh ere CLA : Carry Look-Ahead Adder
1985.
(Fan-in , Fan-out bounded by 4)
CSA : Carry Skip Adder 10f 8 S ] Y. Orman, "On the Algorithmetic Complexity of Discrete
(Fan-in, Fan-out bou n ded by 4) Functions ," CI/bernetie . and Control Theorll, So viet Physic.
HPA : Hybrid Prefix Adder Dokladll, Vol. 7 , No . 7 , pp. 589- 5 9 1 , 196 3 .
(Fan-in , Fan-o u t bounded by 2) [ST 8 5 ] S .P. Smith a n d H . C . Torng , " D es i gn o f a
Fast Inner Product
Processor," Proceeding, 7th Sympo6ium o n Comp uter A rith
Table 4 . 3 Comp arison in T using the method of lOB 85J
metic, pp. 38-48 , June 1985 .
56
Authorized licensed use limited to: Uskudar Universitesi. Downloaded on February 01,2023 at 08:09:20 UTC from IEEE Xplore. Restrictions apply.