Lecture3 - Adder Design

EESM5020 VLSI System Design and
Design Automation Spring 2019

Lecture 3 Design of Datapath
Operator - Adder Design
Reading Assignment:
Weste: Chapter 8
Rabaey: Chapter 11
Note: some of the figures in this slide set are adapted from the slide set
of Digital Integrated Circuits by Rabaey et al. Copyright UCB 2002
1 EESM5020/19 Lecture 3
A Generic Digital Processor
q Datapath is the core of the processor and is the
place where all computations are performed
[Figure: block diagram of a generic digital processor with MEMORY, INPUT-OUTPUT, CONTROL and DATAPATH blocks]
Full-Adder
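[Figure: full-adder block with inputs A, B and carry-in C, outputs Sum and Carry]
$\mathrm{Sum} = A\bar{B}\bar{C} + \bar{A}B\bar{C} + \bar{A}\bar{B}C + ABC = A \oplus B \oplus C$
$\mathrm{Carry} = \mathrm{MAJ}(A, B, C) = AB + AC + BC = AB + C(A + B)$
[Figure: gate-level schematics of the Sum and Carry circuits with inputs A, B, C]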
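The full-adder equations can be checked exhaustively. The following is a minimal Python sketch (the function name `full_adder` is illustrative, not from the slides) that compares the XOR sum and the majority carry against ordinary integer addition for all eight input combinations.

```python
from itertools import product


def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """Return (sum, carry) for three 1-bit inputs."""
    s = a ^ b ^ c                      # Sum = A xor B xor C
    carry = (a & b) | (c & (a | b))    # Carry = AB + C(A + B)
    return s, carry


# Exhaustive check over the 8 rows of the truth table.
for a, b, c in product((0, 1), repeat=3):
    s, carry = full_adder(a, b, c)
    assert 2 * carry + s == a + b + c
    # The carry is also the majority function MAJ(A, B, C).
    assert carry == int(a + b + c >= 2)
```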
The Ripple-Carry Adder
[Figure: 4-bit ripple-carry adder: four FA cells in series with inputs A0,B0 ... A3,B3, carry chain Ci,0, Co,0 (= Ci,1), Co,1, Co,2, Co,3, and outputs S0 ... S3]
q Worst-case delay is linear in the number of bits: td = O(N)
q Worst-case delay: tadder = (N-1)Tcarry + Tsum for the last sum bit, or N Tcarry for the final carry-out
q The propagation delay of the ripple-carry adder is linearly
proportional to N.
q When designing the full adder cell for a fast ripple carry
adder, it is far more important to optimize Tcarry than Tsum.
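A behavioural sketch of the ripple-carry adder, assuming unsigned N-bit operands held in Python integers. The names `ripple_carry_add` and `ripple_delay` are illustrative; the delay function simply evaluates the worst-case expression (N-1)Tcarry + Tsum.

```python
def full_adder(a, b, c):
    """1-bit full adder: returns (sum, carry)."""
    return a ^ b ^ c, (a & b) | (c & (a | b))


def ripple_carry_add(a: int, b: int, n: int, cin: int = 0):
    """Add two n-bit numbers one bit at a time; return (sum, carry_out)."""
    carry, total = cin, 0
    for i in range(n):
        # The carry of bit i is needed before bit i+1 can finish.
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
    return total, carry


def ripple_delay(n: int, t_carry: float, t_sum: float) -> float:
    """Worst-case delay to the last sum bit: (N - 1) * Tcarry + Tsum."""
    return (n - 1) * t_carry + t_sum
```

Because the carry term is multiplied by N-1 while the sum term appears once, reducing `t_carry` pays off far more than reducing `t_sum`.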
Adder
q Fundamental problem is rapidly calculating the carry bit.
q All carry bits are dependent on all previous inputs
– LSB has fanout of N
q Simplest adder: ripple carry – linear adder
q Faster adders: carry look-ahead, carry bypass, ...
– All of these work out carry several bits at a time
q Even faster adder – logarithmic, tree adder
– Use prefix computation
[Figure: N-bit ripple-carry adder built from FA cells with inputs A0,B0 ... An-1,Bn-1, carry-in C0, carry-out Cn and outputs S0 ... Sn-1]
Express Sum and Carry as a function of P, G, D
Define three new variables that depend ONLY on A and B:
Generate (G) = $AB$
Propagate (P) = $A \oplus B$
Delete (D) = $\bar{A}\bar{B}$
Expressions for S and Co can also be derived based on D and P.
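A short sketch of how S and Co follow from these signals. The expressions Co = G + P Ci and S = P xor Ci are the standard ones; the D-based form Co = not(D) and (not(P) or Ci) is one possible rewriting, included here as an assumption rather than as the form used in the lecture. The helper name `pgd` is illustrative.

```python
from itertools import product


def pgd(a: int, b: int) -> tuple[int, int, int]:
    """Return (generate, propagate, delete) for one bit position."""
    g = a & b
    p = a ^ b
    d = (1 - a) & (1 - b)
    return g, p, d


for a, b, ci in product((0, 1), repeat=3):
    g, p, d = pgd(a, b)
    assert g + p + d == 1              # exactly one of G, P, D is true
    co = g | (p & ci)                  # Co(G, P) = G + P Ci
    s = p ^ ci                         # S(G, P)  = P xor Ci
    assert 2 * co + s == a + b + ci
    co_dp = (1 - d) & ((1 - p) | ci)   # assumed D/P form of the carry
    assert co_dp == co
```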
Manchester Carry Chain Adder
The idea: first generate the carry-in for each adder bit as fast
as possible and then evaluate the sum. Delay is still
proportional to the number of bits, but the constant of proportionality is small.
Two possible implementations: static and dynamic
Manchester Carry Chain Adder
q The propagate logic in the Manchester carry chain puts a
lot of NFETs in series.
q If Cin is high, and P signals are true, the path is long.
q The max length of the carry chain is limited to around 4.
Manchester Carry Chain
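[Figure: dynamic Manchester carry chain precharged to VDD by the clock φ, with propagate transistors P0 ... P4 in series, generate pull-downs G0 ... G4 and carry input Ci,0]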
Sizing Manchester Carry Chain
Discharge transistor chain analyzed using the Penfield-Rubinstein model (Elmore delay)
[Figure: RC ladder model of the chain: transistors MC, M0 ... M4 modeled as resistors R1 ... R6 between nodes 1 ... 6, with capacitors C1 ... C6 at the nodes and the output Out at node 6]
The delay time tp is
$t_p = 0.69 \left( C_1 R_1 + C_2 (R_1 + R_2) + C_3 (R_1 + R_2 + R_3) + C_4 (R_1 + R_2 + R_3 + R_4) + C_5 (R_1 + R_2 + R_3 + R_4 + R_5) + C_6 (R_1 + R_2 + R_3 + R_4 + R_5 + R_6) \right)$
$t_p = 0.69 \sum_{i=1}^{N} C_i \left( \sum_{j=1}^{i} R_j \right)$
Assuming a 1.2 µm technology and minimum-size transistors, with C = 20 fF and R = 20 kΩ at every node, we have
$t_p = 0.69 \times 21 RC$
With the size of the pass transistors scaled by k from stage to stage, we have
$C_i = k C_{i+1}, \quad R_i = \frac{R_{i+1}}{k}$
$t_p = 0.69 RC \, (1 + 2k + 3k^2 + 4k^3 + 5k^4 + 6k^5) / k^5$
[Figure: two plots versus k (1 to 3.0): Speed (normalized by 0.69RC) and Area (in minimum size devices)]
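The Elmore sum and the closed form for the tapered chain can be compared numerically. This sketch assumes that the last stage (node 6) has the unit values R and C and that earlier stages are scaled by k as stated; the function names are illustrative. Delays are normalized by 0.69RC.

```python
def elmore_delay(resistances, capacitances):
    """Sum of C_i * (R_1 + ... + R_i), without the 0.69 factor."""
    total, r_path = 0.0, 0.0
    for r, c in zip(resistances, capacitances):
        r_path += r            # resistance from the source up to node i
        total += c * r_path
    return total


def scaled_chain(n, k, r=1.0, c=1.0):
    """R_i = R_{i+1}/k and C_i = k*C_{i+1}; stage n has the unit values (assumed)."""
    rs = [r / k ** (n - i) for i in range(1, n + 1)]
    cs = [c * k ** (n - i) for i in range(1, n + 1)]
    return rs, cs


def closed_form(k):
    """(1 + 2k + 3k^2 + 4k^3 + 5k^4 + 6k^5) / k^5 for a six-stage chain."""
    return (1 + 2*k + 3*k**2 + 4*k**3 + 5*k**4 + 6*k**5) / k**5


for k in (1.0, 1.5, 2.0, 2.5, 3.0):
    rs, cs = scaled_chain(6, k)
    print(k, round(elmore_delay(rs, cs), 3), round(closed_form(k), 3))
```

For k = 1 the sum reduces to 1 + 2 + ... + 6 = 21, which matches the 0.69 x 21RC figure for the unscaled chain.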
Improving the speed of MCA
q If all propagate signals are true and Ci is high, six series
n-transistors pull the output node low in the dynamic gate,
while five transistors are in series in the
static gate.
q The worst-case propagation time can be improved by
bypassing the four stages if all carry-propagate signals
are true.
[Figure: four-stage Manchester carry chain (P0 ... P3, G0 ... G3) from Ci,0 to Co,3 with a bypass transistor BP in parallel]
BP = P0P1P2P3
Carry-Bypass Adder (cont.)
[Figure: 16-bit carry-bypass adder in four M-bit blocks (bits 0-3, 4-7, 8-11, 12-15), each with setup, carry propagation and sum stages; the critical path starting at Ci,0 is annotated with tsetup, tbypass and tsum]
$t_p = t_{setup} + M t_{carry} + \left( \frac{N}{M} - 1 \right) t_{bypass} + (M - 1) t_{carry} + t_{sum}$
Carry Ripple versus Carry Bypass
[Figure: tp versus N for the ripple adder and the bypass adder; the curves cross at N around 4..8]
• The bypass adder is very interesting for large adders.
• The ripple adder is actually faster for small values of N, for
which the overhead of the extra bypass multiplexers makes the
bypass structure not very interesting.
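The two delay expressions can be tabulated side by side. This is a sketch with hypothetical unit delays (every t equal to 1) and M = 4; the real crossover point depends on the actual values of tcarry, tbypass, tsetup and tsum, so the numbers printed here are illustrative only.

```python
def ripple_delay(n, t_carry, t_sum):
    """Ripple-carry worst case: (N - 1) * tcarry + tsum."""
    return (n - 1) * t_carry + t_sum


def bypass_delay(n, m, t_setup, t_carry, t_bypass, t_sum):
    """tp = tsetup + M*tcarry + (N/M - 1)*tbypass + (M - 1)*tcarry + tsum."""
    stages = n // m            # assumes N is a multiple of M
    return (t_setup + m * t_carry + (stages - 1) * t_bypass
            + (m - 1) * t_carry + t_sum)


# Hypothetical unit delays, block size M = 4.
for n in (4, 8, 16, 32, 64):
    print(n, ripple_delay(n, 1, 1), bypass_delay(n, 4, 1, 1, 1, 1))
```

Both expressions are linear in N, but the bypass adder grows with slope tbypass/M instead of tcarry, at the cost of a larger constant term.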
Carry Select Adder
Carry Select Adder: Critical Path
[Figure: 16-bit carry-select adder in four 4-bit blocks (bits 0-3, 4-7, 8-11, 12-15). Each block has a setup stage, a "0" carry chain, a "1" carry chain, a multiplexer and a sum generation stage; the multiplexers pass Ci,0 on as Co,3, Co,7, Co,11 and Co,15, and the blocks produce S0-3, S4-7, S8-11 and S12-15]
Linear Carry Select
[Figure: linear carry-select adder with four 4-bit blocks (bits 0-3, 4-7, 8-11, 12-15), annotated with arrival times in unit delays: setup (1), "0" and "1" carry chains (5) in every block, multiplexer outputs (6), (7), (8), (9), and sum generation (10) for S0-3, S4-7, S8-11, S12-15]
Delay of the linear carry-select Adder
q (N/M) equal-length stages, each of which has
M bits
$t_{add} = t_{setup} + M t_{carry} + \left( \frac{N}{M} \right) t_{mux} + t_{sum}$
where
tsetup - the fixed overhead to create P and G
tcarry - propagation delay through a single bit
tmux - propagation delay through the multiplexer
of a single stage.
tsum - the time to generate the sum of the final stage
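A small sketch that evaluates this delay expression. The unit delays are hypothetical, and the function name `linear_select_delay` is illustrative. With N = 16, M = 4 and every delay equal to one unit, the result is 10 units, the same as the final arrival time annotated in the 16-bit linear carry-select figure.

```python
def linear_select_delay(n, m, t_setup, t_carry, t_mux, t_sum):
    """t_add = tsetup + M*tcarry + (N/M)*tmux + tsum for N/M equal stages."""
    stages = n // m            # assumes N is a multiple of M
    return t_setup + m * t_carry + stages * t_mux + t_sum


# 16-bit adder, 4-bit stages, hypothetical unit delays.
print(linear_select_delay(16, 4, 1, 1, 1, 1))
```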
32-bit carry-select adder
Approximate delay = (4+1+1+1+1+1) = 9 gate delays
Square Root Carry Select
[Figure: 20-bit square-root carry-select adder with blocks of 2, 3, 4, 5 and 6 bits (bits 0-1, 2-4, 5-8, 9-13, 14-19), annotated with arrival times in unit delays: setup (1), "0" and "1" carry chains (3) in the first block and (4), (5), (6), (7) in the following blocks, multiplexer outputs (4), (5), (6), (7), (8), and sum generation (9) for S0-1, S2-4, S5-8, S9-13, S14-19]
Delay of the Square-root carry-select Adder
q The idea of the square-root adder is to equalize the delay of the
carry chain and the select signal generated by the
multiplexer of the previous stage. One method is to
put more bits in the later stages.
q This simple trick of making the adder stages progressively
longer results in an adder structure with sub-linear delay
characteristics.
q Assume that an N-bit adder contains P stages and the first
stage adds M bits. An additional bit is added to each
subsequent stage. The following relation then holds:
$N = M + (M + 1) + (M + 2) + \cdots + (M + P - 1) = MP + \frac{P(P - 1)}{2} = \frac{P^2}{2} + P \left( M - \frac{1}{2} \right)$
q If $M \ll N$ then $N \approx \frac{P^2}{2}$ and $P \approx \sqrt{2N}$
q The total delay is given by
$t_{add} = t_{setup} + M t_{carry} + \left( \sqrt{2N} \right) t_{mux} + t_{sum}$
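The stage-size relation and the delay can be checked for the 20-bit example with M = 2 and P = 5. This is a sketch with hypothetical unit delays; `exact_delay` counts the P multiplexers directly, while `approx_delay` uses the P ≈ sqrt(2N) approximation, which is only accurate when M is much smaller than N. All function names are illustrative.

```python
import math


def stage_sizes(m, p):
    """Stage widths M, M+1, ..., M+P-1."""
    return [m + i for i in range(p)]


def total_bits(m, p):
    """N = M*P + P*(P - 1)/2."""
    return m * p + p * (p - 1) // 2


def exact_delay(m, p, t_setup, t_carry, t_mux, t_sum):
    """First-stage carry chain, then one multiplexer per stage."""
    return t_setup + m * t_carry + p * t_mux + t_sum


def approx_delay(n, m, t_setup, t_carry, t_mux, t_sum):
    """Same expression with P replaced by sqrt(2N)."""
    return t_setup + m * t_carry + math.sqrt(2 * n) * t_mux + t_sum


sizes = stage_sizes(2, 5)
print(sizes, sum(sizes), total_bits(2, 5))
print(exact_delay(2, 5, 1, 1, 1, 1), round(approx_delay(20, 2, 1, 1, 1, 1), 2))
```

With unit delays the exact count gives 9 units for the 20-bit adder, in line with the last arrival time in the square-root carry-select figure.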
Adder Delays - Comparison
[Figure: propagation delay tp (0 to 50) versus number of bits N (0 to 60) for the ripple adder, the linear select adder and the square root select adder]
LookAhead - Basic Idea
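[Figure: look-ahead adder with inputs A0,B0, A1,B1 ... AN-1,BN-1; each bit stage receives its carry-in Ci,0, Ci,1 ... Ci,N-1 and its propagate signal P0, P1 ... PN-1]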
Carry-Lookahead Adders
q The carry delay can be improved by calculating the carries to each
stage in parallel.
q The carry out can be expressed as
$C_{out} = G + P C_{in}$
where $G = AB$ is the Generate and $P = A \oplus B$ is the
Propagate.
q G = 1 ensures that a carry bit is generated at Cout
independent of Cin, while P = 1 guarantees that an incoming
carry will propagate to Cout.
q The carry of the ith stage, Ci, is
$C_i = G_i + P_i C_{i-1}$, where $G_i = A_i B_i$ and $P_i = A_i + B_i$ or $A_i \oplus B_i$
q Expanding this yields:
$C_i = G_i + P_i C_{i-1}$ and $S_i = C_{i-1} \oplus A_i \oplus B_i$ or $C_{i-1} \oplus P_i$ (with $P_i = A_i \oplus B_i$)
Carry-Lookahead Adders
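q The size and fan-in of the gates needed to implement
this carry-lookahead scheme can clearly get out of
hand. As a result, the number of stages of look-ahead
is usually limited to about four.
$C_1 = G_1 + P_1 C_0$
$C_2 = G_2 + P_2 G_1 + P_2 P_1 C_0$
$C_3 = G_3 + P_3 G_2 + P_3 P_2 G_1 + P_3 P_2 P_1 C_0$
$C_4 = G_4 + P_4 G_3 + P_4 P_3 G_2 + P_4 P_3 P_2 G_1 + P_4 P_3 P_2 P_1 C_0$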
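Unrolling the recurrence Ci = Gi + Pi Ci-1 for four stages gives each carry directly in terms of the G and P signals and C0. The sketch below writes out those four expressions and checks them against integer addition; bits are numbered 1..4 to match C1..C4, with C0 as the carry-in, and the function name `cla4` is illustrative.

```python
def cla4(a: int, b: int, c0: int = 0):
    """4-bit carry-lookahead addition; returns (4-bit sum, C4)."""
    g1, g2, g3, g4 = (((a >> i) & (b >> i)) & 1 for i in range(4))
    p1, p2, p3, p4 = (((a >> i) ^ (b >> i)) & 1 for i in range(4))
    # Every carry depends only on G, P and C0, not on another carry.
    c1 = g1 | (p1 & c0)
    c2 = g2 | (p2 & g1) | (p2 & p1 & c0)
    c3 = g3 | (p3 & g2) | (p3 & p2 & g1) | (p3 & p2 & p1 & c0)
    c4 = (g4 | (p4 & g3) | (p4 & p3 & g2) | (p4 & p3 & p2 & g1)
          | (p4 & p3 & p2 & p1 & c0))
    sums = (p1 ^ c0, p2 ^ c1, p3 ^ c2, p4 ^ c3)   # Si = Pi xor Ci-1
    total = sum(s << i for i, s in enumerate(sums))
    return total, c4


# Exhaustive check for all 4-bit operands and both carry-in values.
for a in range(16):
    for b in range(16):
        for c0 in (0, 1):
            assert cla4(a, b, c0) == ((a + b + c0) & 15, (a + b + c0) >> 4)
```

The widest term in C4 is a 5-input AND feeding a 5-input OR, which illustrates why the look-ahead width is usually kept to about four bits.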
Look-Ahead: Topology
Expanding Lookahead equations:
$C_{o,k} = G_k + P_k (G_{k-1} + P_{k-1} C_{o,k-2})$
All the way:
$C_{o,k} = G_k + P_k (G_{k-1} + P_{k-1} (\cdots + P_1 (G_0 + P_0 C_{i,0})))$
Hierarchical Carry Look-ahead
q Problem of CLA:
– The carry recurrence can be unrolled further; unrolling
to level k results in a two-level AND-OR
structure
• AND fan-in = k+1, OR fan-in = k+1
• k+1 transistors in the MOS stack
– Therefore the size of the carry lookahead is usually limited
• Example: 4 bits
• Still too many stages
q Solution:
– Hierarchy, Block carry look-ahead
– Group carry look-ahead
Block Carry Lookahead
q Fourth bit carry:
$c_{i+3} = g_{i+3} + p_{i+3} g_{i+2} + p_{i+3} p_{i+2} g_{i+1} + p_{i+3} p_{i+2} p_{i+1} g_i + p_{i+3} p_{i+2} p_{i+1} p_i c_{i-1}$
q Block generate and propagate:
$G_{i,i+3} = g_{i+3} + p_{i+3} g_{i+2} + p_{i+3} p_{i+2} g_{i+1} + p_{i+3} p_{i+2} p_{i+1} g_i$
$P_{i,i+3} = p_{i+3} p_{i+2} p_{i+1} p_i$
$c_{i+3} = G_{i,i+3} + P_{i,i+3} c_{i-1}$
q Can create groups of groups, or super-groups:
$G_j^* = G_{j+3} + P_{j+3} G_{j+2} + P_{j+3} P_{j+2} G_{j+1} + P_{j+3} P_{j+2} P_{j+1} G_j$
$P_j^* = P_{j+3} P_{j+2} P_{j+1} P_j$
Delay is $t_d = c_1 \lceil \log N \rceil$
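A behavioural sketch of block carry lookahead on a 16-bit adder, assuming 4-bit blocks. The block generate and propagate signals are formed exactly as in the equations above. For brevity, the block carries and the carries inside each block are computed here with simple loops; in hardware a second-level lookahead unit would produce the block carries in parallel. All names are illustrative.

```python
def block_pg(g, p):
    """Combine four bit-level (g, p) signals, index 0 = least significant."""
    G = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
    P = p[3] & p[2] & p[1] & p[0]
    return G, P


def add16_block_cla(a: int, b: int, cin: int = 0):
    """16-bit addition using 4-bit block generate/propagate; returns (sum, cout)."""
    g = [((a >> i) & (b >> i)) & 1 for i in range(16)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(16)]
    blocks = [block_pg(g[i:i + 4], p[i:i + 4]) for i in range(0, 16, 4)]
    # Carry into each block: c_out = G_block + P_block * c_in.
    block_cin = [cin]
    for G, P in blocks:
        block_cin.append(G | (P & block_cin[-1]))
    total = 0
    for blk in range(4):
        carry = block_cin[blk]
        for i in range(4 * blk, 4 * blk + 4):
            total |= (p[i] ^ carry) << i
            carry = g[i] | (p[i] & carry)
    return total, block_cin[4]
```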
16-bit adder with group CLA
[Figure: 16-bit adder with group CLA. Four 4-bit slices (bits 0-3, 4-7, 8-11, 12-15) take A and B, produce S, and each contain a Sum block and a PG block feeding a first-level CLA1 unit. The CLA1 units exchange group generate (GG1 ... GG4), group propagate (GP1 ... GP4) and group carry (GC1 ... GC3) signals with a second-level GCLA unit, which receives Cin and produces GGG1 and GGP1]
Logarithmic Look-Ahead Adder(1)
– Tree adder
q Also called Parallel-Prefix Adders (PPA)
q Recursive look-ahead – look-ahead across look-ahead
q An adder with O(log2 N) delay, based on a basic
property of associative operators
q Dot operation (*), a generic associative operator:
(a*b)*c = a*(b*c). Under this condition, combining N
arguments using the * operator can be executed
with a critical path equal to (log2N)t instead of (N-1)t
[Figure: computing F from A0 ... A7 with a linear chain of operators, tp ~ N, versus a binary tree of operators, tp ~ log2(N)]
q This property can be applied to the case of an N-bit
adder. Let the * operator (written • in the equations) establish the following
relationship between two tuples (g,p) and (g1,p1). (A tuple is an
ordered set of values.)
$(g, p) \bullet (g_1, p_1) = (g_2, p_2) = (g + p g_1, \; p p_1)$
q The * operator is a function that takes two tuples as
input and produces one tuple of two outputs. This
operator is associative but not commutative.
q Two extra functions a and b are defined to access
the elements of the tuple:
$g = a(g, p)$
$p = b(g, p)$
$C_{o,k} = G_k + P_k (G_{k-1} + P_{k-1} (\cdots + P_1 (G_0 + P_0 C_{i,0})))$
$C_{o,0} = G_0 + P_0 C_{i,0} = g_0 = a(g_0, P_0)$
$C_{o,1} = G_1 + g_0 P_1 = a((G_1, P_1) \bullet (g_0, P_0))$
$\vdots$
$C_{o,k} = a((G_k, P_k) \bullet (G_{k-1}, P_{k-1}) \bullet \cdots \bullet (g_0, P_0))$
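The dot operator is small enough to test exhaustively. This sketch checks associativity over all 1-bit tuples, shows one pair for which the operator does not commute, and computes a carry-out as the g component of the chained product. The names `dot` and `carry_out` are illustrative; the carry-in is folded into g0 as in the equations above.

```python
from functools import reduce
from itertools import product


def dot(left, right):
    """(g, p) . (g1, p1) = (g + p*g1, p*p1); left is the more significant tuple."""
    g, p = left
    g1, p1 = right
    return g | (p & g1), p & p1


# Associativity holds for every combination of 1-bit tuples.
tuples = list(product((0, 1), repeat=2))
for x, y, z in product(tuples, repeat=3):
    assert dot(dot(x, y), z) == dot(x, dot(y, z))

# The operator is not commutative: swapping operands changes the result.
assert dot((0, 0), (1, 1)) != dot((1, 1), (0, 0))


def carry_out(a: int, b: int, cin: int, k: int) -> int:
    """Co,k as the g component of (Gk,Pk) . ... . (G1,P1) . (g0,P0)."""
    pairs = [(((a >> i) & (b >> i)) & 1, ((a >> i) ^ (b >> i)) & 1)
             for i in range(k + 1)]
    g0, p0 = pairs[0]
    pairs[0] = (g0 | (p0 & cin), p0)      # g0 = G0 + P0*Ci,0
    return reduce(dot, reversed(pairs))[0]
```

Because the operator is associative, the chained product in `carry_out` may be evaluated in any grouping, which is what allows a tree of depth log2(N) to replace the linear chain.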
Log (PPA) Adder structure
q Given P and G calculated for each bit, calculating
the carries at each bit is equivalent to computing all the
prefixes of (g,p) in parallel
$(g, p) \bullet (g_1, p_1) \bullet (g_2, p_2) \bullet \cdots \bullet (g_{N-1}, p_{N-1}) = (g_N, p_N)$
q Since the dot operator is associative, we can group
them in a different order
q The general structure of the log adder thus looks
like the following:
– p,g generation for each bit (1 unit delay)
– Calculate Carry_i for each bit using the parallel
prefix tree (log(N) unit delay)
– Compute the sum for each bit
in parallel (1 unit delay)
Brent-Kung Adder
$C_{o,0} = a(g_0, P_0)$
$C_{o,1} = a((G_1, P_1) \bullet (g_0, P_0))$
$C_{o,3} = a((G_3, P_3) \bullet (G_2, P_2) \bullet (G_1, P_1) \bullet (g_0, P_0))$
$C_{o,7} = a((C_{4-7}, P_{4-7}) \bullet (C_{0-3}, P_{0-3}))$
[Figure: 8-bit Brent-Kung prefix network with inputs (g0,P0), (G1,P1), ... (G7,P7); a forward tree builds the group terms up to (C4-7,P4-7) and Co,7, and a reverse tree fills in the remaining carries, giving Co,0 ... Co,7]
16-bit Brent-Kung Log Adder
[Figure: 16-bit Brent-Kung prefix network with inputs (g15,p15) ... (g0,p0) and Cin, producing the carries C16 ... C1]
Properties of Brent-Kung Adder
q The number of logic levels is proportional to log2(N)
q Gate fan-in is limited, but the fanout can be large;
careful buffering is thus required
q Layout is compact
q Once the carry bits are available, the sum bits are
easily derived in constant time.
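A behavioural sketch of a Brent-Kung style prefix computation, assuming N is a power of two. The forward tree combines pairs at doubling distances, and the reverse tree fills in the prefixes that the forward tree did not produce. The indexing scheme and the names `brent_kung_carries` and `brent_kung_add` are one common formulation and are illustrative, not taken from the slides.

```python
def dot(left, right):
    """(g, p) . (g1, p1) = (g + p*g1, p*p1)."""
    g, p = left
    g1, p1 = right
    return g | (p & g1), p & p1


def brent_kung_carries(g, p, cin=0):
    """Return [Co,0, ..., Co,N-1]; len(g) must be a power of two."""
    n = len(g)
    x = list(zip(g, p))
    x[0] = (x[0][0] | (x[0][1] & cin), x[0][1])   # fold carry-in into g0
    levels = n.bit_length() - 1
    # Forward tree: positions 1, 3, 7, ... receive complete prefixes.
    for d in range(levels):
        step = 1 << d
        for i in range(2 * step - 1, n, 2 * step):
            x[i] = dot(x[i], x[i - step])
    # Reverse tree: complete the remaining positions.
    for d in range(levels - 2, -1, -1):
        step = 1 << d
        for i in range(3 * step - 1, n, 2 * step):
            x[i] = dot(x[i], x[i - step])
    return [gi for gi, _ in x]


def brent_kung_add(a: int, b: int, n: int = 16, cin: int = 0):
    """n-bit addition; returns (sum, carry_out)."""
    g = [((a >> i) & (b >> i)) & 1 for i in range(n)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]
    c = [cin] + brent_kung_carries(g, p, cin)     # c[i] is the carry into bit i
    total = sum((p[i] ^ c[i]) << i for i in range(n))
    return total, c[n]
```

The forward tree uses log2(N) levels and the reverse tree log2(N) - 1, so the number of logic levels stays proportional to log2(N).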
Kogge-Stone Tree (PPF) Adder
[Figure: 16-bit Kogge-Stone prefix network with inputs (g15,p15) ... (g0,p0) and Cin, producing the carries C16 ... C1]
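A behavioural sketch of a Kogge-Stone style prefix computation. At each level every bit position combines its (g, p) pair with the pair located a doubling distance below it, so all prefixes are complete after log2(N) levels. The function name `kogge_stone_add` is illustrative and the carry-in is folded into bit 0.

```python
def kogge_stone_add(a: int, b: int, n: int = 16, cin: int = 0):
    """n-bit addition with a Kogge-Stone prefix network; returns (sum, carry_out)."""
    g = [((a >> i) & (b >> i)) & 1 for i in range(n)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]
    G, P = g[:], p[:]
    G[0] |= P[0] & cin                 # fold the carry-in into bit 0
    dist = 1
    while dist < n:
        new_G, new_P = G[:], P[:]
        for i in range(dist, n):
            # (G, P)[i] . (G, P)[i - dist]
            new_G[i] = G[i] | (P[i] & G[i - dist])
            new_P[i] = P[i] & P[i - dist]
        G, P = new_G, new_P
        dist *= 2
    carries = [cin] + G                # carries[i] is the carry into bit i
    total = sum((p[i] ^ carries[i]) << i for i in range(n))
    return total, carries[n]
```

Compared with the Brent-Kung arrangement, this network uses more dot operators per level but needs no reverse tree.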
Industrial examples
q Specifications for high speed microprocessors are very
demanding and cutting edge.
q May not achieve the specified results using only a single
technique: Combine different strategies
[Figure: 64-bit adder built from a 40-bit carry select adder on the MSB side (RA(63:24), RB(63:24)) and a 24-bit differential carry lookahead adder on the LSB side (RA(23:0), RB(23:0)), connected by cout23; the outputs EA(63:24) and EA(23:0) feed the TLB and data cache compare blocks, with real_add(40:0). © Dan Stasiak, IBM Rochester, 2001]
Industrial Example
q 64-bit adder, on 200MHz DEC Alpha 21064 RISC
microprocessor
q 5ns cycle using 0.75µm technology
q 4 different techniques for this 64-bit adder
– Manchester carry chain is used on the 8-bit level
• Carry chain optimized by tapering down each chain
stage
– Carry-lookahead addition (CLA) was used on the least
significant 32 bits of the adder
– Conditional-sum-addition for the 32 MSB of the adder.
• 6 8-bit select switches used
– Conditional-select adder for the most significant 32 bits
of the 64-bit words
Subtractor
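q 2's complement numbers are used for subtraction
q $A - B = A + (-B) = A + \bar{B} + 1$
[Figure: left, a subtractor that inverts BN…1 and adds it to AN…1 with a carry-in of 1, giving SN…1 = A - B; right, an adder/subtractor with a Sub/add control input, giving SN…1 = A ± B]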
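A minimal sketch of the adder/subtractor idea, assuming n-bit two's complement words held in Python integers. When the `sub` control is set, B is inverted bit by bit and the carry-in is forced to 1, so the same adder computes A + not(B) + 1. The function name `add_sub` is illustrative.

```python
def add_sub(a: int, b: int, n: int, sub: bool) -> int:
    """Return (A + B) or (A - B) modulo 2**n using a single addition."""
    mask = (1 << n) - 1
    # XOR with all ones inverts B when subtracting, passes it through otherwise.
    b_eff = (b ^ (mask if sub else 0)) & mask
    carry_in = 1 if sub else 0
    return ((a & mask) + b_eff + carry_in) & mask
```

For example, `add_sub(3, 9, 4, True)` gives 10, which is the 4-bit two's complement pattern for -6.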
Adding multiple inputs
q Adding k N-bit words
– Most obvious method is using k-1 cascaded CPAs (carry-propagate adders)
q A better way is to use carry-save adders
– Carry-save adder – 3:2 counter or 3:2 compressor
• 3-bit input and 2-bit output
• All bit positions work in parallel, so the delay is
constant, independent of N
• Use a fast adder to add the two compressed
numbers
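A behavioural sketch of multi-operand addition with 3:2 compressors, assuming non-negative Python integers so that word growth is not an issue. Each `carry_save` step reduces three words to two without propagating any carry; a single ordinary addition at the end stands in for the fast carry-propagate adder. The compressors are applied one after another here for simplicity, whereas hardware would usually arrange them as a tree. Function names are illustrative.

```python
def carry_save(x: int, y: int, z: int) -> tuple[int, int]:
    """3:2 compressor applied to every bit position in parallel."""
    s = x ^ y ^ z                               # per-bit sum, no carry propagation
    c = ((x & y) | (x & z) | (y & z)) << 1      # per-bit carry, shifted to its weight
    return s, c


def add_many(words) -> int:
    """Add any number of words using carry-save steps and one final addition."""
    words = list(words)
    while len(words) > 2:
        x, y, z = words.pop(), words.pop(), words.pop()
        words.extend(carry_save(x, y, z))       # three words in, two words out
    return sum(words)                           # final carry-propagate addition
```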
Other types of Adder
q Carry-save adder
q Conditional sum adder
q Very Wide Adder using block generate and block
propagate
q (Reference : Weste: Chapter 8)
