An Ultra-Fast Parallel Prefix Adder: 2019 IEEE 26th Symposium On Computer Arithmetic (ARITH)

Kumar Sambhav Pandey
School of Comp & Elect Engg, Indian Institute of Technology, Mandi, India
kumar@nith.ac.in, pandey@iitmandi.ac.in

Dinesh Kumar B
School of Comp & Elect Engg, Indian Institute of Technology, Mandi, India
dinesh@projects.iitmandi.ac.in

Neeraj Goel
Dept of Comp Sc & Engg, Indian Institute of Technology, Ropar, India
neeraj@iitrpr.ac.in

Hitesh Shrimali
School of Comp & Elect Engg, Indian Institute of Technology, Mandi, India
hitesh@iitmandi.ac.in

Abstract—Parallel prefix adders are arguably the most commonly used arithmetic units. They have been extensively investigated at architecture level, register transfer level (RTL), gate level, circuit level as well as layout level, giving rise to a plethora of mathematical formulations, topologies and implementations. This paper contributes significantly to the understanding of these parallel prefix adders in a couple of ways. Firstly, it attempts to describe various such parallel prefix adders in elegant and consistent formulations. Secondly, a new family of parallel prefix adders is proposed at architecture level. The estimates of the area-throughput characteristics for an instance of this family are also presented. While the speeds achieved by this instance match those achieved by the state of the art adders, their area characteristics exhibit up to 26% improvement.

Keywords-Parallel prefix adders; adder recurrence relations; digital arithmetic

I. INTRODUCTION

Binary addition of multi-bit operands is one of the fundamental operations in the domain of contemporary general purpose as well as signal processing applications. Many other operations like multiplication, floating point computations, multiply-accumulate units or increment-decrement operations are all composed of variants of adders. An efficient implementation of such adders has, therefore, a profound impact on the design of any data processing unit.

Parallel prefix adders are particularly interesting from the CMOS implementation point of view because of the simplicity of their cell designs and the regularity of interconnects connecting these cells. Moreover, the implementations of these adders are easy to pipeline. Various architectures and design alternatives have been investigated in literature [1]–[3]. All such investigations are based on computation of some local signals in parallel for all the bits of operands. These signals are then reduced over larger groups in tree-like fashion. Then the sum bits and carry out bit from the adder are computed in parallel [4]–[11]. Reductions of these signals over larger groups are based on simple recursive algorithms which can be elegantly expressed as logical recurrence relations [12]–[16].

These local signals are mathematically defined and their useful properties are discussed in Section II, where an elegant and consistent terminology for further discussions is introduced. Using this terminology, Weinberger-Smith [12], Ling [13], Doran [14] and Jackson-Talwar [15] recursions for multi-bit integer addition are presented, compared and contrasted with each other in the same section. The section also touches upon higher valency adders [17]–[19].

The existing literature on architectural innovations in parallel prefix adders is scant as well as sparsely populated in time, whereas the implementation community has been fairly active. It is our belief that the absence of a unified theory and a consistent terminology in the domain might be the raison d'être. As the architectures of various parallel prefix adders are described in a consistent formulation, more efficient higher order recurrence relations and corresponding novel architectures of these adders emerge.

Section III presents our novel recursion for multi-bit integer addition and the corresponding realisation for one instance of the proposed family of parallel prefix adders. This section further demonstrates the scalability and the generic nature of the proposal. The discussions are restricted to the architecture level and are, therefore, independent of the technology in which they are implemented.

Estimates of the number of logic levels in the critical path and the total gate count for an instance of the family (having recurrence of the order 4) are presented and discussed in Section IV. Comparisons with corresponding figures of merit for Weinberger-Smith [12], Ling [13] and Jackson-Talwar [15] adders are also made in this section. Section V concludes the paper.

II. PARALLEL PREFIX ADDERS: MATHEMATICAL FORMALISM AND TERMINOLOGY

We use capital letters to represent binary operands. For
978-1-7281-3366-9/19/$31.00 ©2019 IEEE 125

DOI 10.1109/ARITH.2019.00034

example $A$ and $B$ are used for addend and augend respectively. $C_{in}$ and $C_{out}$ denote the carry-in to and carry-out of the adder respectively whereas $S$ denotes the sum. Small letters are used to represent bits (addend, augend, sum as well as some special local signals) and subscripts are used to indicate their arithmetic weight, increasing from 0 at the least significant bit. Thus $a_i$ signifies the addend, $b_i$ signifies the augend, $s_i$ signifies the sum, while $g_i$ signifies a special signal (generate) at bit position $i$ and so on. An exception is made here for representing local carries, real or pseudo, where capital letters are used, as in $C_i$, $H_i$, $K_i$ etc. to differentiate them from other local signals like $h_i$, $q_i$, $k_i$ etc.

The first stage of all the conventional parallel prefix adders computes two signals, carry generate ($g_i$) and carry propagate ($p_i$), which being local to each bit position can all be computed in parallel [8]. They are defined as:

$$g_i = a_i \cdot b_i \tag{1}$$
$$p_i = a_i + b_i \tag{2}$$

Let us also define another local signal ($t_i$) as follows:

$$t_i = a_i \oplus b_i \tag{3}$$

We also use $C_i$ to denote input carry at bit position $i$. It is trivial to note that:

$$C_{i+1} = g_i + p_i \cdot C_i \tag{4}$$

The final stage computes the sum bits as:

$$s_i = t_i \oplus C_i \tag{5}$$

Note that the first and the last stages are purely local in nature as they operate on signals only at their respective bit positions. Hence all of the bits can be operated upon concurrently. However, there is a data dependence between $C_{i+1}$ and $C_i$; and thus they cannot be computed concurrently. In conventional parallel prefix adders, carries are computed by using a special binary prefix operator ($\circ$) defined on pairs of operands as:

$$\begin{pmatrix} g_i \\ p_i \end{pmatrix} \circ \begin{pmatrix} g_j \\ p_j \end{pmatrix} = \begin{pmatrix} g_i + p_i \cdot g_j \\ p_i \cdot p_j \end{pmatrix} \tag{6}$$

Thus,

$$\begin{pmatrix} C_{i+1} \\ p_i \end{pmatrix} = \begin{pmatrix} g_i \\ p_i \end{pmatrix} \circ \begin{pmatrix} C_i \\ 1 \end{pmatrix} \tag{7}$$

Input carry at any bit position as a function of $C_{in}$ can thus be trivially computed using a sequence of the prefix operations ($\circ$) introduced above as:

$$\begin{pmatrix} C_{i+1} \\ p_i \cdot p_{i-1} \cdots p_0 \end{pmatrix} = \begin{pmatrix} g_i \\ p_i \end{pmatrix} \circ \begin{pmatrix} g_{i-1} \\ p_{i-1} \end{pmatrix} \circ \cdots \circ \begin{pmatrix} g_0 \\ p_0 \end{pmatrix} \circ \begin{pmatrix} C_{in} \\ 1 \end{pmatrix} \tag{8}$$

We also define a group generate signal $g_{i \cdots j}$ and a group propagate signal $p_{i \cdots j}$ as:

$$\begin{pmatrix} g_{i \cdots j} \\ p_{i \cdots j} \end{pmatrix} = \begin{pmatrix} g_i \\ p_i \end{pmatrix} \circ \begin{pmatrix} g_{i-1} \\ p_{i-1} \end{pmatrix} \circ \cdots \circ \begin{pmatrix} g_{j+1} \\ p_{j+1} \end{pmatrix} \circ \begin{pmatrix} g_j \\ p_j \end{pmatrix} \tag{9}$$

Thus equation (8) can also be written as:

$$\begin{pmatrix} C_{i+1} \\ p_{i \cdots 0} \end{pmatrix} = \begin{pmatrix} g_{i \cdots 0} \\ p_{i \cdots 0} \end{pmatrix} \circ \begin{pmatrix} C_{in} \\ 1 \end{pmatrix} \tag{10}$$

Using equation (4), $C_{i+1}$ can be computed sequentially, which results in carry ripple adders. However, a couple of properties of the prefix operator ($\circ$) are particularly insightful in reducing sequences of these operations in parallel [8]. Design of fast parallel prefix adders is based on these properties. Firstly, it is trivial to prove that the operator is associative, i.e.,

$$\left( \begin{pmatrix} g_i \\ p_i \end{pmatrix} \circ \begin{pmatrix} g_j \\ p_j \end{pmatrix} \right) \circ \begin{pmatrix} g_k \\ p_k \end{pmatrix} = \begin{pmatrix} g_i \\ p_i \end{pmatrix} \circ \left( \begin{pmatrix} g_j \\ p_j \end{pmatrix} \circ \begin{pmatrix} g_k \\ p_k \end{pmatrix} \right) \tag{11}$$

Secondly, the operator is idempotent, i.e.,

$$\begin{pmatrix} g_{h \cdots i} \\ p_{h \cdots i} \end{pmatrix} \circ \begin{pmatrix} g_{j \cdots k} \\ p_{j \cdots k} \end{pmatrix} = \begin{pmatrix} g_{h \cdots k} \\ p_{h \cdots k} \end{pmatrix} \tag{12}$$

where $h > i$, $i \le j + 1$ and $j > k$, i.e. the prefix operator sequences should either overlap or should be adjacent.

Having introduced the terminology and mathematical formalism, we describe Weinberger-Smith [12], Ling [13], Doran [14], and Jackson-Talwar [15] recursions for multi-bit integer addition in the following subsections.

A. Weinberger-Smith Recurrence and Conventional Carry Look Ahead Adder

The Weinberger-Smith [12] recurrence relation for multi-bit integer addition is given by equations (7) and (10), where $C_0 = C_{in}$ and $g_i = 0$, $p_i = 1$ $\forall i < 0$. The algorithm based on Ladner-Fischer topology [5] for a radix-8 adder is given below.

1) $g_i$, $p_i$ and $t_i$ ($\forall i : i \ge 0$) are computed in parallel.

2) Following are all computed in parallel:

$$\begin{pmatrix} g_{1 \cdots 0} \\ p_{1 \cdots 0} \end{pmatrix} = \begin{pmatrix} g_1 \\ p_1 \end{pmatrix} \circ \begin{pmatrix} g_0 \\ p_0 \end{pmatrix},$$
$$\begin{pmatrix} g_{3 \cdots 2} \\ p_{3 \cdots 2} \end{pmatrix} = \begin{pmatrix} g_3 \\ p_3 \end{pmatrix} \circ \begin{pmatrix} g_2 \\ p_2 \end{pmatrix},$$
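$$\begin{pmatrix} g_{5 \cdots 4} \\ p_{5 \cdots 4} \end{pmatrix} = \begin{pmatrix} g_5 \\ p_5 \end{pmatrix} \circ \begin{pmatrix} g_4 \\ p_4 \end{pmatrix},$$
$$\begin{pmatrix} g_{7 \cdots 6} \\ p_{7 \cdots 6} \end{pmatrix} = \begin{pmatrix} g_7 \\ p_7 \end{pmatrix} \circ \begin{pmatrix} g_6 \\ p_6 \end{pmatrix},$$
$$\begin{pmatrix} C_1 \\ p_0 \end{pmatrix} = \begin{pmatrix} g_0 \\ p_0 \end{pmatrix} \circ \begin{pmatrix} C_{in} \\ 1 \end{pmatrix}$$

3) Following are all computed in parallel:

$$\begin{pmatrix} g_{2 \cdots 0} \\ p_{2 \cdots 0} \end{pmatrix} = \begin{pmatrix} g_2 \\ p_2 \end{pmatrix} \circ \begin{pmatrix} g_{1 \cdots 0} \\ p_{1 \cdots 0} \end{pmatrix},$$
$$\begin{pmatrix} g_{3 \cdots 0} \\ p_{3 \cdots 0} \end{pmatrix} = \begin{pmatrix} g_{3 \cdots 2} \\ p_{3 \cdots 2} \end{pmatrix} \circ \begin{pmatrix} g_{1 \cdots 0} \\ p_{1 \cdots 0} \end{pmatrix},$$
$$\begin{pmatrix} g_{6 \cdots 4} \\ p_{6 \cdots 4} \end{pmatrix} = \begin{pmatrix} g_6 \\ p_6 \end{pmatrix} \circ \begin{pmatrix} g_{5 \cdots 4} \\ p_{5 \cdots 4} \end{pmatrix},$$
$$\begin{pmatrix} g_{7 \cdots 4} \\ p_{7 \cdots 4} \end{pmatrix} = \begin{pmatrix} g_{7 \cdots 6} \\ p_{7 \cdots 6} \end{pmatrix} \circ \begin{pmatrix} g_{5 \cdots 4} \\ p_{5 \cdots 4} \end{pmatrix},$$
$$\begin{pmatrix} C_2 \\ p_{1 \cdots 0} \end{pmatrix} = \begin{pmatrix} g_{1 \cdots 0} \\ p_{1 \cdots 0} \end{pmatrix} \circ \begin{pmatrix} C_{in} \\ 1 \end{pmatrix}$$

4) Following are all computed in parallel:

$$\begin{pmatrix} g_{4 \cdots 0} \\ p_{4 \cdots 0} \end{pmatrix} = \begin{pmatrix} g_4 \\ p_4 \end{pmatrix} \circ \begin{pmatrix} g_{3 \cdots 0} \\ p_{3 \cdots 0} \end{pmatrix},$$
$$\begin{pmatrix} g_{5 \cdots 0} \\ p_{5 \cdots 0} \end{pmatrix} = \begin{pmatrix} g_{5 \cdots 4} \\ p_{5 \cdots 4} \end{pmatrix} \circ \begin{pmatrix} g_{3 \cdots 0} \\ p_{3 \cdots 0} \end{pmatrix},$$
$$\begin{pmatrix} g_{6 \cdots 0} \\ p_{6 \cdots 0} \end{pmatrix} = \begin{pmatrix} g_{6 \cdots 4} \\ p_{6 \cdots 4} \end{pmatrix} \circ \begin{pmatrix} g_{3 \cdots 0} \\ p_{3 \cdots 0} \end{pmatrix},$$
$$\begin{pmatrix} g_{7 \cdots 0} \\ p_{7 \cdots 0} \end{pmatrix} = \begin{pmatrix} g_{7 \cdots 4} \\ p_{7 \cdots 4} \end{pmatrix} \circ \begin{pmatrix} g_{3 \cdots 0} \\ p_{3 \cdots 0} \end{pmatrix},$$
$$\begin{pmatrix} C_3 \\ p_{2 \cdots 0} \end{pmatrix} = \begin{pmatrix} g_{2 \cdots 0} \\ p_{2 \cdots 0} \end{pmatrix} \circ \begin{pmatrix} C_{in} \\ 1 \end{pmatrix},$$
$$\begin{pmatrix} C_4 \\ p_{3 \cdots 0} \end{pmatrix} = \begin{pmatrix} g_{3 \cdots 0} \\ p_{3 \cdots 0} \end{pmatrix} \circ \begin{pmatrix} C_{in} \\ 1 \end{pmatrix}$$

5) Following are all computed in parallel:

$$\begin{pmatrix} C_5 \\ p_{4 \cdots 0} \end{pmatrix} = \begin{pmatrix} g_{4 \cdots 0} \\ p_{4 \cdots 0} \end{pmatrix} \circ \begin{pmatrix} C_{in} \\ 1 \end{pmatrix},$$
$$\begin{pmatrix} C_6 \\ p_{5 \cdots 0} \end{pmatrix} = \begin{pmatrix} g_{5 \cdots 0} \\ p_{5 \cdots 0} \end{pmatrix} \circ \begin{pmatrix} C_{in} \\ 1 \end{pmatrix},$$
$$\begin{pmatrix} C_7 \\ p_{6 \cdots 0} \end{pmatrix} = \begin{pmatrix} g_{6 \cdots 0} \\ p_{6 \cdots 0} \end{pmatrix} \circ \begin{pmatrix} C_{in} \\ 1 \end{pmatrix},$$
$$\begin{pmatrix} C_8 \\ p_{7 \cdots 0} \end{pmatrix} = \begin{pmatrix} g_{7 \cdots 0} \\ p_{7 \cdots 0} \end{pmatrix} \circ \begin{pmatrix} C_{in} \\ 1 \end{pmatrix}$$

6) $s_i$ ($\forall i : 8 > i \ge 0$) are computed in parallel using equation (5).

The realisation for a 4-bit adder is shown in Fig. 1. The blocks labelled as gpt are the initial processing blocks that compute the signals $g_i$, $p_i$ and $t_i$ as given by step (1) of the algorithm and defined in equations (1), (2) and (3) respectively. The blocks labelled as reduce compute the group signals defined in steps (2) to (5) of the algorithm in treelike fashion. The blocks labelled carry compute the carry out signals as defined by equation (4) while the blocks labelled sum compute the sum bits as defined by equation (5) respectively at each bit position.

Figure 1. A 4-bit adder based on Weinberger-Smith recurrence in Ladner-Fischer topology.

B. Ling Adder

It is realised in [13] that the recurrence relation for parallel prefix carry computation can be simplified if a pseudo carry $H_i = C_i + C_{i-1}$ is propagated in lieu of the conventional carry $C_i$. Once these pseudo carries are known for all the bit positions, the conventional carries can be extracted from them by $C_i = p_{i-1} \cdot H_i$ as proved below:

$$\begin{aligned}
p_{i-1} \cdot H_i &= p_{i-1} \cdot (C_i + C_{i-1}) \\
&= p_{i-1} \cdot (g_{i-1} + p_{i-1} \cdot C_{i-1} + C_{i-1}) \\
&= p_{i-1} \cdot (g_{i-1} + C_{i-1}) \\
&= p_{i-1} \cdot g_{i-1} + p_{i-1} \cdot C_{i-1} \\
&= g_{i-1} + p_{i-1} \cdot C_{i-1} \\
&= C_i
\end{aligned}$$

In order to develop Ling recurrence relation, we define two more local signals ($h_i$) and ($q_i$) as: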
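$$h_i = g_i + g_{i-1} \tag{13}$$
$$q_i = p_i \cdot p_{i-1} \tag{14}$$

Starting with the definition of $H_{i+1}$ and noting the fact that $g_i \cdot p_i = g_i$, we can define it in terms of $H_{i-1}$ as given below:

$$\begin{aligned}
H_{i+1} &= C_{i+1} + C_i \\
&= g_i + p_i \cdot C_i + C_i \\
&= g_i + C_i \\
&= g_i + g_{i-1} + p_{i-1} \cdot (g_{i-2} + p_{i-2} \cdot C_{i-2}) \\
&= g_i + g_{i-1} + p_{i-1} \cdot p_{i-2} \cdot (g_{i-2} + C_{i-2}) \\
&= h_i + q_{i-1} \cdot H_{i-1}
\end{aligned} \tag{15}$$

Thus the above equation and the definition of $q_{i-1}$ can be collected together using the binary prefix operator ($\circ$) as:

$$\begin{pmatrix} H_{i+1} \\ q_{i-1} \end{pmatrix} = \begin{pmatrix} h_i \\ q_{i-1} \end{pmatrix} \circ \begin{pmatrix} H_{i-1} \\ 1 \end{pmatrix} \tag{16}$$

where $H_{in} = H_0 = C_0 = C_{in}$, $h_0 = g_0$, $q_0 = p_0$ and $h_i = 0$, $q_i = 1$ $\forall i < 0$.

Ling carry (pseudo carry) at any bit position as a function of $H_{in} = C_{in}$ can thus be trivially computed by a sequence of prefix operations using equation (16) as:

$$\begin{pmatrix} H_{i+1} \\ q_{i-1} \cdot q_{i-3} \cdots q_0 \end{pmatrix} = \begin{pmatrix} h_i \\ q_{i-1} \end{pmatrix} \circ \begin{pmatrix} h_{i-2} \\ q_{i-3} \end{pmatrix} \circ \cdots \circ \begin{pmatrix} H_{in} \\ 1 \end{pmatrix} \tag{17}$$

Ling [13] observed that, using equation (15), pseudo Ling carry $H_4$ can be written as:

$$\begin{aligned}
H_4 &= h_3 + q_2 \cdot h_1 + q_2 \cdot q_0 \cdot H_{in} \\
&= g_3 + g_2 + p_2 \cdot g_1 + p_2 \cdot p_1 \cdot g_0 + p_2 \cdot p_1 \cdot p_0 \cdot C_{in}
\end{aligned} \tag{18}$$

which is logically much simpler than the corresponding expression for the conventional carry $C_4$, which is given below for ready reference:

$$C_4 = g_3 + p_3 \cdot g_2 + p_3 \cdot p_2 \cdot g_1 + p_3 \cdot p_2 \cdot p_1 \cdot g_0 + p_3 \cdot p_2 \cdot p_1 \cdot p_0 \cdot C_{in} \tag{19}$$

Obviously, because of smaller fan-outs of the gates and simpler logic expressions for the pseudo carries, Ling adders achieve better area-time characteristics than the conventional parallel prefix adders, albeit at the expense of having a more complex final stage that recovers the conventional carries from their respective pseudo Ling carries.

It may be noted that Ling recurrence relation has a striking similarity with Weinberger-Smith recurrence relation and therefore, all the topologies for parallel prefix reduction are equally applicable in the case of Ling adders too. Thus one may implement a Ling adder in any topology as proposed by Ladner-Fischer [5], Brent-Kung [6] or Knowles [8].

C. Dimitrakopoulos-Nikolos Insight

Following is an expansion of Ling recurrence relation for Ling pseudo carries for a radix-8 adder.

$$\begin{pmatrix} H_8 \\ q_6 \cdot q_4 \cdot q_2 \cdot q_0 \end{pmatrix} = \begin{pmatrix} h_7 \\ q_6 \end{pmatrix} \circ \begin{pmatrix} h_5 \\ q_4 \end{pmatrix} \circ \begin{pmatrix} h_3 \\ q_2 \end{pmatrix} \circ \begin{pmatrix} h_1 \\ q_0 \end{pmatrix} \circ \begin{pmatrix} C_{in} \\ 1 \end{pmatrix},$$
$$\begin{pmatrix} H_7 \\ q_5 \cdot q_3 \cdot q_1 \end{pmatrix} = \begin{pmatrix} h_6 \\ q_5 \end{pmatrix} \circ \begin{pmatrix} h_4 \\ q_3 \end{pmatrix} \circ \begin{pmatrix} h_2 \\ q_1 \end{pmatrix} \circ \begin{pmatrix} h_0 \\ 1 \end{pmatrix},$$
$$\begin{pmatrix} H_6 \\ q_4 \cdot q_2 \cdot q_0 \end{pmatrix} = \begin{pmatrix} h_5 \\ q_4 \end{pmatrix} \circ \begin{pmatrix} h_3 \\ q_2 \end{pmatrix} \circ \begin{pmatrix} h_1 \\ q_0 \end{pmatrix} \circ \begin{pmatrix} C_{in} \\ 1 \end{pmatrix},$$
$$\begin{pmatrix} H_5 \\ q_3 \cdot q_1 \end{pmatrix} = \begin{pmatrix} h_4 \\ q_3 \end{pmatrix} \circ \begin{pmatrix} h_2 \\ q_1 \end{pmatrix} \circ \begin{pmatrix} h_0 \\ 1 \end{pmatrix},$$
$$\begin{pmatrix} H_4 \\ q_2 \cdot q_0 \end{pmatrix} = \begin{pmatrix} h_3 \\ q_2 \end{pmatrix} \circ \begin{pmatrix} h_1 \\ q_0 \end{pmatrix} \circ \begin{pmatrix} C_{in} \\ 1 \end{pmatrix},$$
$$\begin{pmatrix} H_3 \\ q_1 \end{pmatrix} = \begin{pmatrix} h_2 \\ q_1 \end{pmatrix} \circ \begin{pmatrix} h_0 \\ 1 \end{pmatrix},$$
$$\begin{pmatrix} H_2 \\ q_0 \end{pmatrix} = \begin{pmatrix} h_1 \\ q_0 \end{pmatrix} \circ \begin{pmatrix} C_{in} \\ 1 \end{pmatrix},$$
$$\begin{pmatrix} H_1 \\ 1 \end{pmatrix} = \begin{pmatrix} h_0 \\ 1 \end{pmatrix},$$
$$\begin{pmatrix} H_0 \\ 1 \end{pmatrix} = \begin{pmatrix} C_{in} \\ 1 \end{pmatrix}$$

Dimitrakopoulos and Nikolos [16] observed that in the above expansion the even and odd subscripted pseudo Ling carries are independent of each other. This essentially means that they can be reduced in mutually exclusive even and odd sub-trees. Therefore, the 8-input first stage of an 8-bit Ling adder is actually two separate 4-input stages, which eliminates one logic stage in the critical path!

D. Doran Recursion and Variants of Ling Adders

Doran [14] observed that there are a few local signals other than $g_i$ and $p_i$ or even $h_i$ and $q_i$ described by Weinberger-Smith or Ling recurrence respectively which may be used in lieu of them. There are a number of such signals but all of them have only theoretical significance. None of them actually provides any further gains in either area or latency vis-a-vis their conventional realisations. One such case arises if we take another pseudo carry signal $X_i$ in place of signal $H_i$ and take $p_i$ in place of $g_i$ as defined below.

$$X_{i+1} = p_{i+1} + G_i \tag{20}$$
$$G_i = p_i \cdot X_i \tag{21}$$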
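Looking back at subsections II-A to II-C, the relations introduced so far can be checked numerically. The Python sketch below is an illustration only and is not taken from any of the cited works: it computes the carries by rippling equation (4), by folding the prefix operator of equation (6) as in equation (8), and through the Ling pseudo carries of equation (15) followed by the extraction $C_i = p_{i-1} \cdot H_i$. All function names are hypothetical. The seed $H_1 = g_0 + C_{in}$ is an assumption of this sketch, taken directly from the definition $H_i = C_i + C_{i-1}$.

```python
from itertools import product


def prefix_op(left, right):
    """Prefix operator of equation (6) on (generate, propagate) pairs."""
    g_i, p_i = left
    g_j, p_j = right
    return (g_i | (p_i & g_j), p_i & p_j)


def local_signals(a, b, n):
    """Per-bit g, p and t signals of equations (1)-(3), LSB first."""
    abits = [(a >> i) & 1 for i in range(n)]
    bbits = [(b >> i) & 1 for i in range(n)]
    g = [x & y for x, y in zip(abits, bbits)]
    p = [x | y for x, y in zip(abits, bbits)]
    t = [x ^ y for x, y in zip(abits, bbits)]
    return g, p, t


def carries_ripple(g, p, cin):
    """C_0..C_n obtained sequentially from equation (4)."""
    c = [cin]
    for gi, pi in zip(g, p):
        c.append(gi | (pi & c[-1]))
    return c


def carries_prefix(g, p, cin):
    """C_0..C_n obtained by folding the operator as in equation (8)."""
    c = [cin]
    for i in range(len(g)):
        acc = (g[i], p[i])
        for j in range(i - 1, -1, -1):
            acc = prefix_op(acc, (g[j], p[j]))
        c.append(prefix_op(acc, (cin, 1))[0])
    return c


def carries_ling(g, p, cin):
    """C_0..C_n recovered from Ling pseudo carries (needs n >= 1)."""
    n = len(g)
    # Equations (13) and (14), with g_{-1} = 0 and p_{-1} = 1.
    h = [g[i] | (g[i - 1] if i > 0 else 0) for i in range(n)]
    q = [p[i] & (p[i - 1] if i > 0 else 1) for i in range(n)]
    # Assumed seeds: H_0 = Cin and H_1 = C_1 + C_0 = g_0 + Cin.
    H = [cin, g[0] | cin]
    for i in range(1, n):
        H.append(h[i] | (q[i - 1] & H[i - 1]))  # equation (15)
    return [cin] + [p[i - 1] & H[i] for i in range(1, n + 1)]


def add(a, b, cin, n, carry_fn):
    """n-bit addition: sum bits from equation (5) plus the carry out."""
    g, p, t = local_signals(a, b, n)
    c = carry_fn(g, p, cin)
    s = sum((t[i] ^ c[i]) << i for i in range(n))
    return s | (c[n] << n)


def check_all(n=4):
    """Exhaustively compare all three carry computations with a + b + cin."""
    for a, b, cin in product(range(1 << n), range(1 << n), (0, 1)):
        for fn in (carries_ripple, carries_prefix, carries_ling):
            if add(a, b, cin, n, fn) != a + b + cin:
                return False
    return True
```

Calling `check_all(4)` compares the three carry computations against ordinary integer addition for every 4-bit operand pair and both values of the carry-in. The fold in `carries_prefix` is deliberately sequential; a real prefix adder would arrange the same operator in a tree, which associativity (equation (11)) permits.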
Thus,

$$X_{i+1} = p_{i+1} + p_i \cdot X_i \tag{22}$$

This recurrence relation is also similar in structure to the other ones defined in earlier subsections and therefore, the parallel prefix reduction tree structure remains similar to that of the parallel prefix adder based on Ling recurrence.

The realisation for a 4-bit Ling adder or any of its variants as suggested by Doran [14] and modified by Dimitrakopoulos and Nikolos [16] is shown in Fig. 2. The blocks labelled as hqt are the initial processing blocks that compute the signals $h_i$, $q_i$ and $t_i$ as defined in equations (13), (14) and (3) respectively. The blocks labelled as reduce compute the group signals in treelike fashion. The blocks labelled carry compute the carry out signals as defined by equation (4) while the blocks labelled sum compute the sum bits as defined by equation (5) respectively at each bit position.

Figure 2. A 4-bit adder based on Ling recurrence in Ladner-Fischer topology.

E. Jackson-Talwar Factorization and Generalization of Ling Adders

Jackson and Talwar [15] have generalized the Ling factorization and introduced Reduced Generate (R) and Hyper Propagate (Q) signals in lieu of conventional Generate (G) and Propagate (P) to further speed up multi-bit addition and proved that the relations for carries can be factorized even beyond what was established by Ling [13]. For example, the expression for the conventional carry $C_4$ given in equation (19) can be further factorized as given below:

$$\begin{aligned}
C_4 &= (g_3 + p_3 g_2 + p_3 p_2 g_1 + p_3 p_2 p_1 g_0 + p_3 p_2 p_1 p_0 C_{in}) \\
&= (p_3) \cdot (g_3 + g_2 + p_2 g_1 + p_2 p_1 g_0 + p_2 p_1 p_0 C_{in}) \\
&= (g_3 + p_3 p_2) \cdot (g_3 + g_2 + g_1 + p_1 g_0 + p_1 p_0 C_{in}) \\
&= (g_3 + p_3 g_2 + p_3 p_2 p_1) \cdot (g_3 + g_2 + g_1 + g_0 + p_0 C_{in})
\end{aligned} \tag{23}$$

It is readily observed that the second equality above is the Ling factorization and the quantity in the second factor is actually the Ling pseudo group carry $H_4$ defined in (18). The third and the fourth equalities are the higher order factorizations. The quantities in the second factors in these equalities may be called the Jackson-Talwar pseudo carries of order 3 and 4 respectively. For example, consider the case for a fourth order factorization. In this case, in order to understand the theory, we define two more local signals ($b_i$) and ($d_i$) as:

$$b_i = g_i + g_{i-1} + g_{i-2} + g_{i-3} \tag{24}$$
$$d_i = g_i + p_i g_{i-1} + p_i p_{i-1} p_{i-2} \tag{25}$$

Let us also define a Jackson-Talwar fourth order group carry $J_i$ as given in equation (26) below:

$$J_i = C_i + C_{i-1} + C_{i-2} + C_{i-3} \tag{26}$$

Starting with this definition for $J_{i+1}$, we can define it in terms of $J_{i-3}$ as given below:

$$\begin{aligned}
J_{i+1} &= C_{i+1} + C_i + C_{i-1} + C_{i-2} \\
&= g_i + p_i \cdot C_i + C_i + C_{i-1} + C_{i-2} \\
&= g_i + C_i + C_{i-1} + C_{i-2} \\
&= g_i + g_{i-1} + p_{i-1} \cdot C_{i-1} + C_{i-1} + C_{i-2} \\
&= g_i + g_{i-1} + C_{i-1} + C_{i-2} \\
&= g_i + g_{i-1} + g_{i-2} + p_{i-2} \cdot C_{i-2} + C_{i-2} \\
&= g_i + g_{i-1} + g_{i-2} + C_{i-2} \\
&= g_i + g_{i-1} + g_{i-2} + g_{i-3} + p_{i-3} \cdot C_{i-3} \\
&= b_i + d_{i-3} \cdot J_{i-3}
\end{aligned} \tag{27}$$

Thus the above equation and the definition of $b_{i-3}$ can be collected together using the binary prefix operator ($\circ$) as:

$$\begin{pmatrix} J_{i+1} \\ d_{i-3} \end{pmatrix} = \begin{pmatrix} b_i \\ d_{i-3} \end{pmatrix} \circ \begin{pmatrix} J_{i-3} \\ 1 \end{pmatrix} \tag{28}$$

where $J_{in} = J_0 = C_0 = C_{in}$, $j_0 = g_0$, and $b_0 = p_0$ and $j_i = 0$ and $b_i = 1$ $\forall i < 0$.

The realisation for a 4-bit Jackson-Talwar adder based on the propagation of pseudo carries defined above is the same as that for a 4-bit novel adder proposed by us in Section III and shown in Fig. 3. The blocks labelled as krt are the initial processing blocks for computing the signals $d_i$, $b_i$ and $t_i$ as defined above. The blocks labelled as reduce compute the group signals in treelike fashion. In this figure there
are no reduce blocks as it is a realisation of a 4-bit adder only and there are 4 subtrees in case of Jackson-Talwar adders and also in case of our proposed adders. However, in case of adders with operand widths of more than 4 bits, corresponding subtrees with reduce blocks are present. The blocks labelled carry compute the carry out signals as defined by equation (4) while the blocks labelled sum compute the sum bits as defined by equation (5) respectively at each bit position.

F. Higher Valency Adders

The discussions in the above subsections are based on the assumption that the generate signals $g_i$ and the propagate signals $p_i$ are computed for each bit. There is no reason why group generate signal $g_{i \cdots j}$ and group propagate signal $p_{i \cdots j}$ as defined in equation (9) cannot be computed for groups of adjacent bits in place of individual bits. These signals defined over such groups can then be reduced in similar treelike structures as discussed earlier. Such parallel prefix adders [17], [18] are known as higher valency adders. It is worth noting that a parallel prefix adder with valency 2 is not the same as a Ling adder, nor is one with valency 4 the same as a Jackson-Talwar adder; the latter are different and architecturally more efficient.

III. PROPOSED NOVEL HIGHER ORDER RECURRENCES AND CORRESPONDING ADDERS

In case of Ling adders [13], the generate signals $g_i$ are combined as disjunctions and the propagate signals $p_i$ are combined as conjunctions, as given in equations (13) and (14) respectively. These combinations are, however, limited to only 2 adjacent signals. Arguably Jackson-Talwar adders [15] are motivated by the fact that more than 2 adjacent generate signals can be combined as disjunctions (Reduced Generate) and corresponding propagate signals (Hyper Propagate) can be calculated in such a way that the overall addition of multi-bit integers remains correct. In case more than 2 propagate signals are combined as conjunctions and the corresponding generate signals are calculated in a similar way to preserve the multi-bit addition semantics, one can create a new family of adders which, when looked at from the Dimitrakopoulos-Nikolos [16] perspective, can be decoupled into more than 2 subtrees and are consequently faster and more efficient. The following subsection describes such a decoupling into 4 subtrees to introduce the concept and subsection III-B generalizes the concept to higher orders.

A. Order 4 Recurrence and Corresponding Adders

In line with the thought above, it is straightforward to show that the recurrence relation for parallel prefix carry computation can be simplified further if another pseudo carry $K_i = H_i + q_{i-2} \cdot H_{i-2} = C_i + C_{i-1} + p_{i-2} \cdot p_{i-3} \cdot (C_{i-2} + C_{i-3})$ is propagated in place of either the conventional carry $C_i$ or the Ling pseudo carry $H_i$. Once these pseudo carries are known for all the bit positions, the conventional carries can be extracted from them by $C_i = p_{i-1} \cdot K_i$ as proved below:

$$\begin{aligned}
K_i &= C_i + C_{i-1} + q_{i-2} \cdot (C_{i-2} + C_{i-3}) \\
&= g_{i-1} + p_{i-1} \cdot C_{i-1} + C_{i-1} + q_{i-2} \cdot (C_{i-2} + C_{i-3}) \\
&= g_{i-1} + C_{i-1} + q_{i-2} \cdot (C_{i-2} + C_{i-3}) \\
&= g_{i-1} + g_{i-2} + p_{i-2} \cdot C_{i-2} + q_{i-2} \cdot C_{i-2} + q_{i-2} \cdot C_{i-3} \\
&= g_{i-1} + g_{i-2} + p_{i-2} \cdot g_{i-3} + p_{i-2} \cdot p_{i-3} \cdot C_{i-3} + q_{i-2} \cdot C_{i-3} \\
&= g_{i-1} + g_{i-2} + p_{i-2} \cdot g_{i-3} + q_{i-2} \cdot C_{i-3} \\
&= g_{i-1} + g_{i-2} + q_{i-2} \cdot g_{i-3} + q_{i-2} \cdot C_{i-3} \\
&= g_{i-1} + g_{i-2} + q_{i-2} \cdot g_{i-3} + q_{i-2} \cdot g_{i-4} + q_{i-2} \cdot p_{i-4} \cdot C_{i-4}
\end{aligned} \tag{29}$$

Therefore,

$$\begin{aligned}
p_{i-1} \cdot K_i &= p_{i-1} \cdot g_{i-1} + p_{i-1} \cdot g_{i-2} + p_{i-1} \cdot q_{i-2} \cdot g_{i-3} \\
&\quad + p_{i-1} \cdot q_{i-2} \cdot g_{i-4} + p_{i-1} \cdot q_{i-2} \cdot p_{i-4} \cdot C_{i-4} \\
&= g_{i-1} + p_{i-1} \cdot g_{i-2} + p_{i-1} \cdot p_{i-2} \cdot g_{i-3} \\
&\quad + p_{i-1} \cdot p_{i-2} \cdot p_{i-3} \cdot g_{i-4} + p_{i-1} \cdot p_{i-2} \cdot p_{i-3} \cdot p_{i-4} \cdot C_{i-4} \\
&= C_i
\end{aligned} \tag{30}$$

In order to develop recurrence relation for the above defined pseudo carry, we define two more local signals ($k_i$) and ($r_i$) as:

$$k_i = g_i + g_{i-1} + q_{i-1} \cdot (g_{i-2} + g_{i-3}) = h_i + q_{i-1} \cdot h_{i-2} \tag{31}$$
$$r_i = p_i \cdot p_{i-1} \cdot p_{i-2} \cdot p_{i-3} = q_i \cdot q_{i-2} \tag{32}$$

Using equation (29) for the definition of $K_{i+1}$ we can define it in terms of $K_{i-3}$ as given below:

$$\begin{aligned}
K_{i+1} &= g_i + g_{i-1} + q_{i-1} \cdot g_{i-2} + q_{i-1} \cdot g_{i-3} + q_{i-1} \cdot p_{i-3} \cdot C_{i-3} \\
&= k_i + q_{i-1} \cdot p_{i-3} \cdot C_{i-3} \\
&= k_i + q_{i-1} \cdot p_{i-3} \cdot p_{i-4} \cdot K_{i-3} \\
&= k_i + r_{i-1} \cdot K_{i-3}
\end{aligned} \tag{33}$$

Thus the above equation and the definition of $r_i$ can be collected together using the binary prefix operator ($\circ$) as:

$$\begin{pmatrix} K_{i+1} \\ r_{i-1} \end{pmatrix} = \begin{pmatrix} k_i \\ r_{i-1} \end{pmatrix} \circ \begin{pmatrix} K_{i-3} \\ 1 \end{pmatrix} \tag{34}$$

where $K_{in} = K_0 = H_0 = C_0 = C_{in}$, $k_0 = h_0 = g_0$ and $r_0 = q_0 = p_0$, $K_1 = H_1 = C_1 + C_0 = p_0 \cdot (g_0 + C_0) =$
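$g_0 + p_0 \cdot C_0 = g_0 + p_0 \cdot C_{in}$, $k_1 = h_1 = g_1 + g_0$ and $r_1 = q_1 = p_1 \cdot p_0$, $K_2 = H_2 + q_0 \cdot H_0 = C_2 + C_1 + q_0 \cdot C_0 = g_1 + g_0 + p_0 \cdot C_0 = g_1 + g_0 + p_0 \cdot C_{in}$, $k_2 = h_2 + q_1 \cdot h_0 = g_2 + g_1 + p_1 \cdot g_0$ and $r_2 = q_2 \cdot q_0 = p_2 \cdot p_1 \cdot p_0$ and $k_i = 0$ and $r_i = 1$ $\forall i < 0$.

Input new pseudo carry $K_{i+1}$ at any bit position as a function of $K_{in} = H_{in} = C_{in}$ can thus be easily computed by a sequence of prefix operations as:

$$\begin{pmatrix} K_{i+1} \\ r_{i-1} \cdot r_{i-5} \cdots r_4 \cdot r_0 \end{pmatrix} = \begin{pmatrix} k_i \\ r_{i-1} \end{pmatrix} \circ \begin{pmatrix} k_{i-4} \\ r_{i-5} \end{pmatrix} \circ \cdots \circ \begin{pmatrix} k_1 \\ r_0 \end{pmatrix} \circ \begin{pmatrix} C_{in} \\ 1 \end{pmatrix} \tag{35}$$

It is readily observed that, using equation (33), the new pseudo carry $K_4$ can be written as:

$$\begin{aligned}
K_4 &= k_3 + r_2 \cdot K_{in} \\
&= h_3 + q_2 \cdot h_1 + q_2 \cdot q_0 \cdot H_{in} \\
&= g_3 + g_2 + p_2 \cdot g_1 + p_2 \cdot p_1 \cdot g_0 + p_2 \cdot p_1 \cdot p_0 \cdot C_{in}
\end{aligned} \tag{36}$$

which is logically the same as that of the pseudo Ling carry $H_4$ and obviously simpler than the corresponding expression for the conventional carry $C_4$.

The striking similarity of the above relations with Ling recurrence relation and Weinberger-Smith recurrence relation is worth noting. It is, therefore, trivial to implement our new adders modelled as Ladner-Fischer [5], Brent-Kung [6] or any other topology as proposed by Knowles [8]. Following is an expansion of our new higher order recurrence relation for our new pseudo carries for a radix-8 adder.

$$\begin{pmatrix} K_8 \\ r_6 \cdot r_2 \end{pmatrix} = \begin{pmatrix} k_7 \\ r_6 \end{pmatrix} \circ \begin{pmatrix} k_3 \\ r_2 \end{pmatrix} \circ \begin{pmatrix} C_{in} \\ 1 \end{pmatrix}$$
$$\begin{pmatrix} K_7 \\ r_5 \cdot r_1 \end{pmatrix} = \begin{pmatrix} k_6 \\ r_5 \end{pmatrix} \circ \begin{pmatrix} k_2 \\ r_1 \end{pmatrix} \circ \begin{pmatrix} C_{in} \\ 1 \end{pmatrix}$$
$$\begin{pmatrix} K_6 \\ r_4 \cdot r_0 \end{pmatrix} = \begin{pmatrix} k_5 \\ r_4 \end{pmatrix} \circ \begin{pmatrix} k_1 \\ r_0 \end{pmatrix} \circ \begin{pmatrix} C_{in} \\ 1 \end{pmatrix}$$
$$\begin{pmatrix} K_5 \\ r_3 \end{pmatrix} = \begin{pmatrix} k_4 \\ r_3 \end{pmatrix} \circ \begin{pmatrix} k_0 \\ 1 \end{pmatrix} \circ \begin{pmatrix} C_{in} \\ 1 \end{pmatrix}$$
$$\begin{pmatrix} K_4 \\ r_2 \end{pmatrix} = \begin{pmatrix} k_3 \\ r_2 \end{pmatrix} \circ \begin{pmatrix} C_{in} \\ 1 \end{pmatrix}$$
$$\begin{pmatrix} K_3 \\ r_1 \end{pmatrix} = \begin{pmatrix} k_2 \\ r_1 \end{pmatrix} \circ \begin{pmatrix} C_{in} \\ 1 \end{pmatrix}$$
$$\begin{pmatrix} K_2 \\ r_0 \end{pmatrix} = \begin{pmatrix} k_1 \\ r_0 \end{pmatrix} \circ \begin{pmatrix} C_{in} \\ 1 \end{pmatrix}$$
$$\begin{pmatrix} K_1 \\ 1 \end{pmatrix} = \begin{pmatrix} k_0 \\ 1 \end{pmatrix} \circ \begin{pmatrix} C_{in} \\ 1 \end{pmatrix}$$
$$\begin{pmatrix} K_0 \\ 1 \end{pmatrix} = \begin{pmatrix} C_{in} \\ 1 \end{pmatrix}$$

It is readily observed in the above expansion that our new pseudo carries are dependent on their previous pseudo carries only with a stride of 4. This essentially means that they can be reduced in 4 mutually exclusive sub-trees. Therefore, the 8-input first stage of the 8-bit proposed adder is actually 4 separate 2-input stages, which eliminates 2 logic levels in the critical path!

The realisation for a 4-bit adder based on the propagation of pseudo carries defined by us above from an augmented Dimitrakopoulos and Nikolos [16] perspective is shown in Fig. 3. The blocks labelled as krt are the initial processing blocks that compute the signals $k_i$, $r_i$ and $t_i$ as defined in equations (31), (32) and (3) respectively. The blocks labelled as reduce compute the group signals in treelike fashion. Obviously, as it is a realisation of a 4-bit adder, these reduce blocks are not explicitly present. However, for adders with higher operand widths they will appear in treelike fashion. The blocks labelled carry compute the carry out signals as defined by equation (4) while the blocks labelled sum compute the sum bits as defined by equation (5) respectively at each bit position.

B. Higher Order Recurrences

The generalization of the concept to higher levels is also straightforward. For example, if yet other pseudo carries $L_i = K_i + r_{i-2} \cdot K_{i-4} = H_i + q_{i-2} \cdot H_{i-2} + q_{i-2} \cdot q_{i-4} \cdot H_{i-4} + q_{i-2} \cdot q_{i-4} \cdot q_{i-6} \cdot H_{i-6} = C_i + C_{i-1} + p_{i-2} \cdot p_{i-3} \cdot (C_{i-2} + C_{i-3}) + p_{i-2} \cdot p_{i-3} \cdot p_{i-4} \cdot p_{i-5} \cdot (C_{i-4} + C_{i-5}) + p_{i-2} \cdot p_{i-3} \cdot p_{i-4} \cdot p_{i-5} \cdot p_{i-6} \cdot p_{i-7} \cdot (C_{i-6} + C_{i-7})$ are propagated in place of the pseudo carries $K_i$ introduced above, the conventional carries can be extracted from them by $C_i = p_{i-1} \cdot L_i$ exactly as proved in equations (29) and (30) above for $K_i$.

In order to develop recurrence relation for the above defined pseudo carry $L_i$, we define two more local signals ($l_i$) and ($s_i$) as:

$$\begin{aligned}
l_i &= g_i + g_{i-1} + q_{i-2} \cdot (g_{i-2} + g_{i-3}) \\
&\quad + q_{i-2} \cdot q_{i-4} \cdot (g_{i-4} + g_{i-5}) \\
&\quad + q_{i-2} \cdot q_{i-4} \cdot q_{i-6} \cdot (g_{i-6} + g_{i-7})
\end{aligned} \tag{37}$$

$$s_i = p_i \cdot p_{i-1} \cdot p_{i-2} \cdot p_{i-3} \cdot p_{i-4} \cdot p_{i-5} \cdot p_{i-6} \cdot p_{i-7} \tag{38}$$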
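The order 4 relations of subsection III-A can be exercised in the same spirit. The following Python sketch is illustrative only: the function names are hypothetical, and the boundary conventions ($g_i = 0$, $p_i = 1$ for $i < 0$ and $K_j = C_{in}$ for $j \le 0$) are assumptions chosen to agree with the radix-8 expansion given above. It evaluates $k_i$ and $r_i$ from equations (31) and (32), steps the recurrence of equation (33), recovers the carries by $C_i = p_{i-1} \cdot K_i$, and compares both the carries and the defining expression for $K_i$ against a ripple reference.

```python
from itertools import product


def bits(x, n):
    """Bits of x, least significant first."""
    return [(x >> i) & 1 for i in range(n)]


def ripple_carries(a, b, cin, n):
    """Reference carries C_0..C_n from equation (4)."""
    c = [cin]
    for x, y in zip(bits(a, n), bits(b, n)):
        c.append((x & y) | ((x | y) & c[-1]))
    return c


def order4_carries(a, b, cin, n):
    """Carries C_0..C_n and pseudo carries K_1..K_n via equation (33)."""
    g = [x & y for x, y in zip(bits(a, n), bits(b, n))]  # equation (1)
    p = [x | y for x, y in zip(bits(a, n), bits(b, n))]  # equation (2)

    def G(i):  # assumed boundary: g_i = 0 for i < 0
        return g[i] if i >= 0 else 0

    def P(i):  # assumed boundary: p_i = 1 for i < 0
        return p[i] if i >= 0 else 1

    def k(i):  # equation (31), with q_{i-1} = p_{i-1} . p_{i-2}
        return G(i) | G(i - 1) | (P(i - 1) & P(i - 2) & (G(i - 2) | G(i - 3)))

    def r(i):  # equation (32)
        return P(i) & P(i - 1) & P(i - 2) & P(i - 3)

    K = {}

    def K_at(j):  # assumed boundary: K_j = Cin for j <= 0
        return K[j] if j > 0 else cin

    for i in range(n):
        # Each K_{i+1} looks back exactly 4 positions (stride 4).
        K[i + 1] = k(i) | (r(i - 1) & K_at(i - 3))
    c = [cin] + [p[i - 1] & K[i] for i in range(1, n + 1)]
    return c, K


def check_order4(n=6):
    """Exhaustively compare against ripple carries and the definition of K_i."""
    for a, b, cin in product(range(1 << n), range(1 << n), (0, 1)):
        ref = ripple_carries(a, b, cin, n)
        c, K = order4_carries(a, b, cin, n)
        if c != ref:
            return False
        p = [x | y for x, y in zip(bits(a, n), bits(b, n))]
        C = lambda j: ref[j] if j > 0 else cin
        P = lambda j: p[j] if j >= 0 else 1
        for i in range(1, n + 1):
            # K_i = C_i + C_{i-1} + q_{i-2} . (C_{i-2} + C_{i-3})
            direct = C(i) | C(i - 1) | (P(i - 2) & P(i - 3) & (C(i - 2) | C(i - 3)))
            if K[i] != direct:
                return False
    return True
```

The loop in `order4_carries` makes the stride visible: bit positions congruent modulo 4 form four independent chains, which is the property the four sub-trees rely on. The sketch steps these chains sequentially and does not model the tree reduction or any gate-level cost.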
Figure 3. A 4-bit adder based on our recurrence in Ladner-Fischer topology.

Using the definition of $L_{i+1}$ we can define it in terms of $L_{i-7}$ as:

$$L_{i+1} = l_i + s_{i-1} \cdot L_{i-7} \tag{39}$$

Thus the above equation and the definition of $s_i$ can be collected together using the binary prefix operator ($\circ$) as:

$$\begin{pmatrix} L_{i+1} \\ s_{i-1} \end{pmatrix} = \begin{pmatrix} l_i \\ s_{i-1} \end{pmatrix} \circ \begin{pmatrix} L_{i-7} \\ 1 \end{pmatrix} \tag{40}$$

where $L_{in} = L_0 = K_0 = H_0 = C_0 = C_{in}$ and $l_i = 0$ and $s_i = 1$ $\forall i < 0$ as before.

The definitions of $K_{i+1}$ and $L_{i+1}$ as given in equations (34) and (40) have a striking similarity except that the former has a stride of 4 while the latter has a stride of 8. Thus, if the pseudo carry $L_i$ is propagated instead of the pseudo carry $K_i$, then the reduction tree for the adder based on it can be partitioned into 8 subtrees instead of 4, which takes our concept to a higher level. In fact the concept proved in this subsection can be recursively applied again and again to get 16 subtrees and even beyond, subject only to the limitations of implementation technology.

IV. EXPERIMENTAL RESULTS AND DISCUSSIONS

Individual structural blocks of the parallel prefix adders based on Weinberger-Smith [12], Ling [13], Jackson-Talwar [15] and our proposed recurrence relations (order 4) were synthesized using EDA tools at gate level with the available technology library which has synthesizable gates with a maximum fan-in of 4. As the intent of this paper is to propose an ultra fast adder at architecture level, we restrict our experiments to the gate level alone and leave the layout details and post-layout simulations for extracting the exact speed and area for future work. The number of logic levels in the critical path and the total gate count are good estimates for operating speed and layout area respectively. These logic levels and gate counts were counted manually and are reported for 8-bit, 16-bit, 32-bit, 64-bit and 128-bit adders in Table I.

Table I
ESTIMATES FOR SPEED AND AREA OF VARIOUS PARALLEL PREFIX ADDERS.

Number of Bits | Recurrence Relation | Logic Levels in Critical Path | Number of Gates
8              | Weinberger-Smith    | 11                            | 107
8              | Ling                | 10                            | 118
8              | Jackson-Talwar      | 9                             | 212
8              | Proposed            | 9                             | 164
16             | Weinberger-Smith    | 13                            | 259
16             | Ling                | 12                            | 274
16             | Jackson-Talwar      | 11                            | 460
16             | Proposed            | 11                            | 364
32             | Weinberger-Smith    | 15                            | 611
32             | Ling                | 14                            | 646
32             | Jackson-Talwar      | 13                            | 1,004
32             | Proposed            | 13                            | 812
64             | Weinberger-Smith    | 17                            | 1,411
64             | Ling                | 16                            | 1,478
64             | Jackson-Talwar      | 15                            | 2,188
64             | Proposed            | 15                            | 1,804
128            | Weinberger-Smith    | 19                            | 3,203
128            | Ling                | 18                            | 3,334
128            | Jackson-Talwar      | 17                            | 5,260
128            | Proposed            | 17                            | 3,980

The number of logic levels in the critical path for all the
adders based on Ling recurrence [13] is 1 less than the values for the corresponding adders based on Weinberger-Smith recurrence [12]. These levels in case of adders based on Jackson-Talwar recurrence [15] as well as those based on our proposed novel recurrence are still lower by 1, as expected. The total gate count increases from Weinberger-Smith adder [12] to Ling adder [13] to Jackson-Talwar adder [15]. This trend is in line with the expectation.

The comparison between Jackson-Talwar adder [15] and our proposed adder is particularly interesting. Though the speeds achieved by both the adders are the same, the total gate count in case of our proposed adder is much lower as compared to the former. This is in line with the complexities expressed in equations (24), (31) and (25), (32) respectively. The order of recurrence presented in this paper is only 4. In case of recurrences at higher order the layout area for Jackson-Talwar adder [15] is expected to deteriorate at a higher rate than that for our proposal. In fact, the scalability achieved by the proposed adder family is much better than that of the Jackson-Talwar adders [15] both in terms of the operand widths as well as in terms of the order of factorization. The proposed adder family differs from other parallel prefix adders [12]–[15] only in the first stage, while the reduction tree topologies remain exactly the same. This fact highlights its generic nature as any tree reduction topology [4]–[11] is equally applicable. In the interest of consistency, however, we have chosen only Ladner-Fischer [5] topology.

V. CONCLUSIONS

Systematic and consistent development of the theory behind parallel prefix adders has revealed newer factorisations of logical expressions for computation of bitwise carries in parallel as well as more efficient recurrence relations. This study has demonstrated existence of better energy-area efficient and faster multi-bit integer adder architectures. The methodology proposed in this paper for higher order recurrences is shown to be recursive in nature. Though the present study has proposed novel factorisation of the order 4 in some detail, it is not difficult to extend the same theory to higher order factorisations as well.

It would also be worthwhile to study hybrid designs involving some of the architectures presented in this proposal in novel ways to achieve better results. The present study is limited in its scope to the architecture level, which needs to be probed further at implementation levels. Both of these could be topics of future investigations.

ACKNOWLEDGMENT

The authors would like to thank MeitY, Govt. of India and Digital India Corporation (formerly Media Lab Asia) for the Visvesvaraya PhD scheme and supporting this R&D work.

REFERENCES

[1] I. Koren, Computer Arithmetic Algorithms, 2nd ed. Natick, MA, USA: A. K. Peters, Ltd., 2001.

[2] B. Parhami, Computer Arithmetic: Algorithms and Hardware Designs. New York, NY, USA: Oxford University Press, Inc., 2000.

[3] M. D. Ercegovac and T. Lang, Digital Arithmetic. New York, NY, USA: Morgan Kaufmann, 2004.

[4] P. M. Kogge and H. S. Stone, “A parallel algorithm for the efficient solution of a general class of recurrence equations,” IEEE Transactions on Computers, vol. C-22, no. 8, pp. 786–793, Aug 1973.

[5] R. E. Ladner and M. J. Fischer, “Parallel prefix computation,” J. ACM, vol. 27, no. 4, pp. 831–838, Oct. 1980.

[6] R. P. Brent and H. T. Kung, “A regular layout for parallel adders,” IEEE Transactions on Computers, vol. C-31, no. 3, pp. 260–264, March 1982.

[7] T. Han and D. A. Carlson, “Fast area-efficient VLSI adders,” in 1987 IEEE 8th Symposium on Computer Arithmetic (ARITH), May 1987, pp. 49–56.

[8] S. Knowles, “A family of adders,” in Proceedings IEEE Symposium on Computer Arithmetic. ARITH-15 2001, 2001, pp. 277–281.

[9] A. Beaumont-Smith and C. C. Lim, “Parallel prefix adder design,” in Proceedings IEEE Symposium on Computer Arithmetic. ARITH-15 2001, 2001, pp. 218–225.

[10] V. G. Oklobdzija, B. R. Zeydel, H. Dao, S. Mathew, and R. Krishnamurthy, “Energy-delay estimation technique for high-performance microprocessor VLSI adders,” in Proceedings 2003 IEEE Symposium on Computer Arithmetic, June 2003, pp. 272–279.

[11] J. Sklansky, “Conditional-sum addition logic,” IRE Transactions on Electronic Computers, vol. EC-9, no. 2, pp. 226–231, June 1960.

[12] A. Weinberger and J. Smith, “A logic for high-speed addition,” Nat. Bur. Stand. Circ., vol. 591, pp. 3–12, 1958.

[13] H. Ling, “High-speed binary adder,” IBM Journal of Research and Development, vol. 25, no. 3, pp. 156–166, March 1981.

[14] R. W. Doran, “Variants of an improved carry look-ahead adder,” IEEE Transactions on Computers, vol. 37, no. 9, pp. 1110–1113, September 1988.

[15] R. Jackson and S. Talwar, “High speed binary addition,” in Conference Record of the Thirty-Eighth Asilomar Conference on Signals, Systems and Computers, 2004, vol. 2, Nov 2004, pp. 1350–1353.

[16] G. Dimitrakopoulos and D. Nikolos, “High-speed parallel-prefix VLSI Ling adders,” IEEE Transactions on Computers, vol. 54, no. 2, pp. 225–231, Feb 2005.

[17] T. Kocak and P. Patil, “Design and implementation of high-performance high-valency Ling adders,” in 2012 IEEE 15th International Symposium on Design and Diagnostics of Electronic Circuits Systems (DDECS), April 2012, pp. 224–229.

[18] T. McAuley, W. Koven, A. Carter, P. Ning, and D. M. Harris, “Implementation of a 64-bit Jackson adder,” in Signals, Systems and Computers, 2013 Asilomar Conference on. IEEE, 2013, pp. 1149–1154.

[19] N. Poornima and V. K. Bhaaskaran, “Design and implementation of 32-bit high valency jackson adders,” Journal of Circuits, Systems and Computers, vol. 26, no. 07, p. 1750123, 2017.