
EESM5020 VLSI System Design and Design Automation, Spring 2019

Lecture 3: Design of Datapath Operator - Adder Design

Reading Assignment:
Weste: Chapter 8
Rabaey: Chapter 11

Note: some of the figures in this slide set are adapted from the slide set
of Digital Integrated Circuits by Rabaey et al. Copyright UCB 2002
1 EESM5020/19 Lecture 3
A Generic Digital Processor

q Datapath is the core of the processor and is the place where all computations are performed

[Figure: generic digital processor with MEMORY, CONTROL, DATAPATH, and Input-output blocks]

Full-Adder
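[Figure: full-adder block with inputs A, B, C and outputs Sum and Carry]

$\mathrm{Sum} = A\bar{B}\bar{C} + \bar{A}B\bar{C} + \bar{A}\bar{B}C + ABC = A \oplus B \oplus C$
$\mathrm{Carry} = \mathrm{MAJ}(A,B,C) = AB + AC + BC = AB + C(A+B)$

[Figure: gate-level implementation producing Sum and Carry from A, B, C]
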
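The full-adder equations can be checked exhaustively with a few lines of Python. This is a minimal sketch; the function name `full_adder` is illustrative and not taken from the slides.

```python
from itertools import product

def full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """Return (sum, carry) for three 1-bit inputs."""
    s = a ^ b ^ c                      # Sum = A xor B xor C
    carry = (a & b) | (c & (a | b))    # Carry = AB + C(A + B)
    return s, carry

# Exhaustive check against integer addition: 2*carry + sum == a + b + c
for a, b, c in product((0, 1), repeat=3):
    s, carry = full_adder(a, b, c)
    assert 2 * carry + s == a + b + c
```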
The Ripple-Carry Adder
[Figure: four-bit ripple-carry adder built from four full adders (FA) with inputs A0..A3 and B0..B3, carry-in Ci,0, carries Co,0 (= Ci,1), Co,1, Co,2, Co,3, and sums S0..S3]

q Worst-case delay is linear in the number of bits: td = O(N)
q Worst-case delay: tadder ≈ N tcarry, or more precisely (N-1) tcarry + tsum
q When designing the full-adder cell for a fast ripple-carry
adder, it is far more important to optimize tcarry than tsum.
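
A behavioural sketch of the ripple-carry structure and its worst-case delay formula, in Python. Bit lists are assumed to be little-endian (index 0 is the LSB), and the unit delays are placeholder values, not figures from the slides.

```python
def ripple_carry_add(a_bits, b_bits, c_in=0):
    """Add two little-endian bit lists; return (sum_bits, carry_out)."""
    carry = c_in
    sum_bits = []
    for a, b in zip(a_bits, b_bits):
        sum_bits.append(a ^ b ^ carry)            # sum of this bit
        carry = (a & b) | (carry & (a | b))       # carry ripples to the next bit
    return sum_bits, carry

def ripple_delay(n, t_carry=1.0, t_sum=1.0):
    """Worst-case delay (N-1)*t_carry + t_sum of an N-bit ripple adder."""
    return (n - 1) * t_carry + t_sum
```

Because the carry term is multiplied by N-1 while the sum term appears only once, shaving tcarry pays off far more than shaving tsum.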

Adder
q The fundamental problem is calculating the carry bit rapidly.
q All carry bits depend on all previous inputs
– LSB has fanout of N
q Simplest adder: ripple carry (linear adder)
q Faster adders: carry look-ahead, carry bypass, ...
– All of these work out the carry several bits at a time
q Even faster adders: logarithmic (tree) adders
– Use prefix computation

[Figure: N-bit chain of full adders (FA) with inputs An-1..A0 and Bn-1..B0, carry-in C0, carry-out Cn, and sums Sn-1..S0]

Express Sum and Carry as a function of P, G, D

Define three new variables which depend ONLY on A and B:

Generate (G): $G = AB$
Propagate (P): $P = A \oplus B$
Delete (D): $D = \bar{A}\,\bar{B}$

Can also derive expressions for S and Co based on D and P

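The sketch below checks, for all input combinations, that exactly one of G, P, D is true and that the carry and sum follow from them. The expressions Co = G + P·Ci and S = P ⊕ Ci are the standard ones; the D-and-P form of Co used in the last assertion is one possible derivation and is an assumption of this example, not a formula quoted from the slides.

```python
from itertools import product

def pgd(a, b):
    """Return (generate, propagate, delete) for one bit position."""
    g = a & b
    p = a ^ b
    d = (1 - a) & (1 - b)
    return g, p, d

for a, b, c_in in product((0, 1), repeat=3):
    g, p, d = pgd(a, b)
    assert g + p + d == 1                      # exactly one of G, P, D holds
    c_out = g | (p & c_in)                     # Co = G + P*Ci
    s = p ^ c_in                               # S = P xor Ci
    assert 2 * c_out + s == a + b + c_in
    # Same carry written with D and P only: Co = not(D) * (not(P) + Ci)
    assert c_out == (1 - d) & ((1 - p) | c_in)
```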
Manchester Carry Chain Adder
The idea: first generate the carry-in for each adder bit as fast
as possible and then evaluate the sum. Delay is still
proportional to the number of bits, but the “constant” is small.
Two possible implementations: Static & Dynamic

Manchester Carry Chain Adder
q The propagate logic in the Manchester carry chain puts a
lot of NFETs in series.
q If Cin is high, and P signals are true, the path is long.
q The max length of the carry chain is limited to around 4.

Manchester Carry Chain

[Figure: Manchester carry chain with supply VDD, clock φ, carry input Ci,0, propagate signals P0..P4, and generate signals G0..G4]

Sizing Manchester Carry Chain
Discharge Transistor Using Penfield-Rubenstein Model (Elmore delay)

[Figure: six-stage RC model of the discharge path through transistors MC, M0..M4, with resistances R1..R6 between nodes 1..6, node capacitances C1..C6, and output Out at node 6]

The delay time tp is
$$t_p = 0.69\,\bigl(C_1R_1 + C_2(R_1+R_2) + C_3(R_1+R_2+R_3) + C_4(R_1+R_2+R_3+R_4) + C_5(R_1+R_2+R_3+R_4+R_5) + C_6(R_1+R_2+R_3+R_4+R_5+R_6)\bigr)$$
$$t_p = 0.69 \sum_{i=1}^{N} C_i \Bigl(\sum_{j=1}^{i} R_j\Bigr)$$

Assume 1.2 µm technology and minimum-size transistors. With C assumed to be 20 fF and R = 20 kOhm, we have
$$t_p = 0.69 \times 21\,RC$$

When the size of the pass transistors is scaled by k, we have
$$C_i = k\,C_{i+1}, \qquad R_i = \frac{R_{i+1}}{k}$$
$$t_p = 0.69\,RC\,\frac{1 + 2k + 3k^2 + 4k^3 + 5k^4 + 6k^5}{k^5}$$

[Figure: two plots versus k (1 to 3.0): Speed (normalized by 0.69RC), axis 5 to 25, and Area (in minimum size devices), axis 0 to 400]

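The Elmore sum and the tapering rule can be evaluated numerically as follows. This is a sketch under the assumption that the last stage (node N) has the reference values R and C and earlier stages are scaled by powers of k; the helper names `elmore_delay` and `tapered_chain` are illustrative.

```python
def elmore_delay(resistances, capacitances):
    """0.69 * sum_i C_i * (R_1 + ... + R_i) for an RC chain."""
    total = 0.0
    r_sum = 0.0
    for r, c in zip(resistances, capacitances):
        r_sum += r            # resistance from the source up to node i
        total += c * r_sum
    return 0.69 * total

def tapered_chain(n, k, r=1.0, c=1.0):
    """R_i = R_{i+1}/k and C_i = k*C_{i+1}, with R_n = r and C_n = c."""
    rs = [r / k ** (n - i) for i in range(1, n + 1)]
    cs = [c * k ** (n - i) for i in range(1, n + 1)]
    return rs, cs

# Uniform six-stage chain: 0.69 * 21 * RC
uniform = elmore_delay([1.0] * 6, [1.0] * 6)

# Tapered chain with k = 2 compared with the closed-form expression
rs, cs = tapered_chain(6, 2.0)
closed_form = 0.69 * sum((j + 1) * 2.0 ** j for j in range(6)) / 2.0 ** 5
assert abs(elmore_delay(rs, cs) - closed_form) < 1e-9
```

With R = 20 kOhm and C = 20 fF, RC is 0.4 ns, so the uniform chain gives roughly 0.69 × 21 × 0.4 ns ≈ 5.8 ns.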
Improving the speed of MCA
q If all propagate signals are true and Ci is high, six series
n-transistors pull the output node low in the dynamic gate,
while five transistors are in series in the static gate.
q The worst-case propagation time can be improved by
bypassing the four stages if all carry-propagate signals
are true.

[Figure: four-stage carry chain (P0..P3, G0..G3) from Ci,0 to Co,3 with a bypass path controlled by BP = P0P1P2P3]

Carry-Bypass Adder (cont.)

[Figure: 16-bit carry-bypass adder built from four M-bit stages (Bit 0-3, Bit 4-7, Bit 8-11, Bit 12-15), each with Setup, Carry Propagation, and Sum blocks; the critical path starting at Ci,0 is annotated with tsetup, tbypass, and tsum]

$$t_p = t_{setup} + M\,t_{carry} + \left(\frac{N}{M} - 1\right) t_{bypass} + (M-1)\,t_{carry} + t_{sum}$$

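The bypass delay formula can be compared with a ripple adder numerically. This sketch assumes unit values for every delay and adds the same setup time to the ripple adder so the two are comparable; both of those choices are assumptions of the example, not values from the slides.

```python
def ripple_delay_with_setup(n, t_setup=1.0, t_carry=1.0, t_sum=1.0):
    """Ripple adder: setup, then N-1 carry stages, then the last sum."""
    return t_setup + (n - 1) * t_carry + t_sum

def bypass_delay(n, m, t_setup=1.0, t_carry=1.0, t_bypass=1.0, t_sum=1.0):
    """Carry-bypass adder with N/M stages of M bits each."""
    return (t_setup
            + m * t_carry                 # ripple through the first stage
            + (n / m - 1) * t_bypass      # skip over the middle stages
            + (m - 1) * t_carry           # ripple inside the last stage
            + t_sum)

# With unit delays the bypass adder loses at N = 4 and wins at N = 16.
for n in (4, 8, 16, 32, 64):
    print(n, ripple_delay_with_setup(n), bypass_delay(n, m=4))
```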
Carry Ripple versus Carry Bypass

[Figure: propagation delay tp versus number of bits N for the ripple adder and the bypass adder; the curves cross at N of about 4..8]

• The bypass adder is very interesting for large adders.
• The ripple adder is actually faster for small values of N, for
which the overhead of the extra bypass multiplexers makes the
bypass structure not very interesting.

Carry Select Adder

Carry Select Adder: Critical Path
[Figure: 16-bit carry-select adder with four 4-bit stages (Bit 0-3, Bit 4-7, Bit 8-11, Bit 12-15). Each stage has a Setup block, a "0" carry chain and a "1" carry chain, a Multiplexer, and a Sum Generation block. The multiplexers are chained from Ci,0 through Co,3, Co,7, Co,11 to Co,15; the outputs are S0-3, S4-7, S8-11, S12-15]

Linear Carry Select
[Figure: linear carry-select adder with four 4-bit stages (Bit 0-3, Bit 4-7, Bit 8-11, Bit 12-15), annotated with arrival times in unit delays: (1) after Setup, (5) at the outputs of the "0" and "1" carry chains, (6), (7), (8), (9) along the multiplexer chain starting at Ci,0, and (10) at the sum outputs S0-3, S4-7, S8-11, S12-15]

Delay of the linear carry-select Adder

q (N/M) equal-length select stages, each of which has M bits
$$t_{add} = t_{setup} + M\,t_{carry} + \left(\frac{N}{M}\right) t_{mux} + t_{sum}$$
where
tsetup - the fixed overhead to create P and G
tcarry - propagation delay through a single bit
tmux - propagation delay through the multiplexer of a single stage
tsum - the time to generate the sum of the final stage

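A small numeric sketch of the linear carry-select delay. Unit delays for every term are an assumption chosen to match the unit-delay annotations used in the slide figures.

```python
def linear_select_delay(n, m, t_setup=1.0, t_carry=1.0, t_mux=1.0, t_sum=1.0):
    """Linear carry-select adder with N/M stages of M bits each."""
    return t_setup + m * t_carry + (n / m) * t_mux + t_sum

# 16 bits in four 4-bit stages: 1 + 4 + 4 + 1 unit delays
delay_16 = linear_select_delay(16, 4)
```

With these unit delays the 16-bit, 4-bit-per-stage case evaluates to 10, the same value as the final "(10)" annotation on the linear carry-select figure.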
32-bit carry-select adder

Approximate delay = (4+1+1+1+1+1) = 9 gate delays

Square Root Carry Select
[Figure: square-root carry-select adder with stages of increasing width (Bit 0-1, Bit 2-4, Bit 5-8, Bit 9-13, Bit 14-19), each with Setup, "0" and "1" carry chains, a Multiplexer, and Sum Generation. Annotated arrival times in unit delays: (1) after Setup; (3), (4), (5), (6), (7) at the carry-chain outputs of successive stages; (4), (5), (6), (7), (8) along the multiplexer chain starting at Ci,0; (9) at the sum outputs S0-1, S2-4, S5-8, S9-13, S14-19]

Delay of the Square-root carry-select Adder
q The idea of the square-root adder is to equalize the delay of the
carry chain in each stage and the select signal arriving from the
multiplexer of the previous stage. One method is to
put more bits in the later stages.
q This simple trick of making the adder stages progressively
longer results in an adder structure with sub-linear delay
characteristics.
q Assume that an N-bit adder contains P stages and the first
stage adds M bits. An additional bit is added to each
subsequent stage. The following relation then holds:
$$N = M + (M+1) + (M+2) + \dots + (M+P-1) = MP + \frac{P(P-1)}{2} = \frac{P^2}{2} + P\left(M - \frac{1}{2}\right)$$
q If M << N, then $N \approx \frac{P^2}{2}$ and $P \approx \sqrt{2N}$
q The total delay is given by
$$t_{add} = t_{setup} + M\,t_{carry} + \sqrt{2N}\,t_{mux} + t_{sum}$$

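The stage sizing can be sketched in Python. The function names are illustrative, the delays are unit placeholders, and the exact stage count P is used in place of the approximation sqrt(2N). When N is not exactly M + (M+1) + ..., the last stage in this sketch simply overshoots N.

```python
import math

def sqrt_select_stages(n, m):
    """Stage widths M, M+1, M+2, ... until at least N bits are covered."""
    stages, size, total = [], m, 0
    while total < n:
        stages.append(size)
        total += size
        size += 1
    return stages

def sqrt_select_delay(n, m, t_setup=1.0, t_carry=1.0, t_mux=1.0, t_sum=1.0):
    """t_setup + M*t_carry + P*t_mux + t_sum with the exact stage count P."""
    p = len(sqrt_select_stages(n, m))
    return t_setup + m * t_carry + p * t_mux + t_sum

stages = sqrt_select_stages(20, 2)      # widths of a 20-bit adder with M = 2
approx_p = math.sqrt(2 * 20)            # the sqrt(2N) estimate of P
```

For N = 20 and M = 2 this gives the 2, 3, 4, 5, 6 bit stages (Bit 0-1 through Bit 14-19) and a delay of 9 unit delays, while sqrt(2N) is about 6.3 because M is not negligible compared with N here.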
Adder Delays - Comparison

[Figure: propagation delay tp (0 to 50) versus number of bits N (0 to 60) for the ripple adder, the linear select adder, and the square root select adder]

LookAhead - Basic Idea

[Figure: lookahead chain in which bit position k takes inputs Ak, Bk and produces Pk, with carries Ci,0, Ci,1, ..., Ci,N-1 passing along stages P0, P1, ..., PN-1]

Carry-Lookahead Adders
q The carry delay can be improved by calculating the carries to each
stage in parallel.
q The carry out can be expressed as
Cout = G + P Cin
where G = AB is the Generate and P = A ⊕ B is the
Propagate.
q G = 1 ensures that a carry bit is generated at Cout
independent of Cin, while P = 1 guarantees that an incoming
carry propagates to Cout
q The carry of the ith stage, Ci, is
Ci = Gi + Pi Ci-1, where Gi = Ai · Bi and Pi = Ai + Bi or Ai ⊕ Bi
q Expanding this yields:
Ci = Gi + Pi Ci-1 and Si = Ci-1 ⊕ Ai ⊕ Bi, or Ci-1 ⊕ Pi (with Pi = Ai ⊕ Bi)

Carry-Lookahead Adders

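q The size and fan-in of the gates needed to implement
this carry-lookahead scheme can clearly get out of
hand. As a result, the number of stages of look-ahead
is usually limited to about four.
$$C_1 = G_1 + P_1C_0$$
$$C_2 = G_2 + P_2G_1 + P_2P_1C_0$$
$$C_3 = G_3 + P_3G_2 + P_3P_2G_1 + P_3P_2P_1C_0$$
$$C_4 = G_4 + P_4G_3 + P_4P_3G_2 + P_4P_3P_2G_1 + P_4P_3P_2P_1C_0$$
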
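Unrolling the recurrence Ci = Gi + Pi Ci-1 for four stages gives flat AND-OR expressions whose fan-in grows with the stage index. The sketch below writes them out and checks them against the rippled recurrence; the tuple layout (bit 1 first) and the function names are assumptions of this example.

```python
from itertools import product

def lookahead_carries(g, p, c0):
    """Flattened carries C1..C4 from (G1..G4), (P1..P4) and carry-in C0."""
    g1, g2, g3, g4 = g
    p1, p2, p3, p4 = p
    c1 = g1 | (p1 & c0)
    c2 = g2 | (p2 & g1) | (p2 & p1 & c0)
    c3 = g3 | (p3 & g2) | (p3 & p2 & g1) | (p3 & p2 & p1 & c0)
    c4 = (g4 | (p4 & g3) | (p4 & p3 & g2) | (p4 & p3 & p2 & g1)
          | (p4 & p3 & p2 & p1 & c0))    # widest gate: fan-in of five
    return c1, c2, c3, c4

def ripple_carries(g, p, c0):
    """Same carries computed one stage at a time: Ci = Gi + Pi*Ci-1."""
    carries, c = [], c0
    for gi, pi in zip(g, p):
        c = gi | (pi & c)
        carries.append(c)
    return tuple(carries)

bits = (0, 1)
assert all(lookahead_carries(g, p, c0) == ripple_carries(g, p, c0)
           for g in product(bits, repeat=4)
           for p in product(bits, repeat=4)
           for c0 in bits)
```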
Look-Ahead: Topology

Expanding Lookahead equations:
$$C_{o,k} = G_k + P_k\,(G_{k-1} + P_{k-1}\,C_{o,k-2})$$

All the way:
$$C_{o,k} = G_k + P_k\,(G_{k-1} + P_{k-1}(\dots + P_1(G_0 + P_0\,C_{i,0})))$$

Hierarchical Carry Look-ahead

q Problem of CLA:
– The carry recurrence can be unrolled further; unrolling to
level k results in a two-level AND-OR structure
• AND fan-in = k+1, OR fan-in = k+1
• k+1 transistors in the MOS stack
– Therefore the size of the carry lookahead is usually limited
• Example: 4 bits
• Still too many stages
q Solution:
– Hierarchy, Block carry look-ahead
– Group carry look-ahead

Block Carry Lookahead

q Fourth bit carry:
$$c_{i+3} = g_{i+3} + p_{i+3}g_{i+2} + p_{i+3}p_{i+2}g_{i+1} + p_{i+3}p_{i+2}p_{i+1}g_i + p_{i+3}p_{i+2}p_{i+1}p_i\,c_{i-1}$$
q Block generate and propagate:
$$G_{i,i+3} = g_{i+3} + p_{i+3}g_{i+2} + p_{i+3}p_{i+2}g_{i+1} + p_{i+3}p_{i+2}p_{i+1}g_i$$
$$P_{i,i+3} = p_{i+3}p_{i+2}p_{i+1}p_i$$
$$c_{i+3} = G_{i,i+3} + P_{i,i+3}\,c_{i-1}$$
q Can create groups of groups, or super-groups:
$$G_j^* = G_{j+3} + P_{j+3}G_{j+2} + P_{j+3}P_{j+2}G_{j+1} + P_{j+3}P_{j+2}P_{j+1}G_j$$
$$P_j^* = P_{j+3}P_{j+2}P_{j+1}P_j$$
Delay is $t_d = c_1 \log \lceil N \rceil$

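The same four-input combination rule serves both for bits within a block and for blocks within a super-group. A minimal sketch for a 16-bit carry-out, assuming little-endian bit order and hypothetical helper names:

```python
def block_gp(g, p):
    """Combine four (generate, propagate) signals, index 0 least significant."""
    G = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
    P = p[3] & p[2] & p[1] & p[0]
    return G, P

def carry_out_16(a, b, c_in=0):
    """Carry out of a 16-bit addition using two levels of block lookahead."""
    g = [(a >> i) & (b >> i) & 1 for i in range(16)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(16)]
    # First level: one (G, P) pair per 4-bit block
    group_g, group_p = zip(*(block_gp(g[i:i + 4], p[i:i + 4])
                             for i in range(0, 16, 4)))
    # Second level: the same rule applied to the four block signals
    super_g, super_p = block_gp(group_g, group_p)
    return super_g | (super_p & c_in)
```

Each level adds a fixed amount of logic depth while covering four times as many bits, which is where the logarithmic delay growth comes from.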
16-bit adder with group CLA

[Figure: 16-bit adder with group carry lookahead. Four 4-bit slices (bits 0-3, 4-7, 8-11, 12-15), each with Sum and PG logic and a first-level lookahead block (CLA1). Signals GP1..GP4, GG1..GG4, and GC1..GC3 connect the CLA1 blocks to a second-level block (GCLA), which also carries Cin, GGG1, and GGP1]

Logarithmic Look-Ahead Adder(1)
– Tree adder
q Also called Parallel-Prefix Adders (PPA)
q Recursive look-ahead: look-ahead across look-ahead
q An adder with O(log2 N) delay, based on a basic
property of associative operators
q The dot operator (*) is associative:
(a*b)*c = a*(b*c). Under this condition, combining N
arguments using the * operator can be executed
with a critical path equal to (log2 N)t instead of (N-1)t,
where t is the delay of one * operation

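The effect of associativity on the critical path can be illustrated with a generic reduction. In this sketch string concatenation stands in for the * operator because it is associative but not commutative; the function names and the returned depth counters are illustrative.

```python
def chain_reduce(op, items):
    """Combine left to right; depth is the number of operators in series."""
    result = items[0]
    for x in items[1:]:
        result = op(result, x)
    return result, len(items) - 1

def tree_reduce(op, items):
    """Combine neighbouring pairs level by level, preserving operand order."""
    level, depth = list(items), 0
    while len(level) > 1:
        nxt = [op(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:          # odd element passes through unchanged
            nxt.append(level[-1])
        level, depth = nxt, depth + 1
    return level[0], depth

concat = lambda x, y: x + y
chain_result, chain_depth = chain_reduce(concat, list("abcdefgh"))
tree_result, tree_depth = tree_reduce(concat, list("abcdefgh"))
assert chain_result == tree_result      # same value, different grouping
```

For eight operands the chain has seven operators in series and the tree has three levels.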
Logarithmic Look-Ahead Adder(2)

[Figure: combining inputs A0..A7 into F with a linear chain of operators (tp ~ N) versus a balanced tree of operators (tp ~ log2(N))]

Logarithmic Look-Ahead Adder(3)
q This property can be applied to the case of an N-bit
adder. Let the dot operator • establish the following
relationship between two tuples (g, p) and (g1, p1). (A tuple is an
ordered set of values.)
$$(g, p) \bullet (g_1, p_1) = (g_2, p_2) = (g + p\,g_1,\; p\,p_1)$$
q The • operator is a function that takes two tuples as
input and produces one tuple, that is, two output values. The •
operator is associative but not commutative.
q Two extra functions α and β are defined to access
the elements of a tuple:
$$g = \alpha(g, p), \qquad p = \beta(g, p)$$

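The dot operator and its two stated properties can be checked by brute force over all 1-bit tuples. This is a sketch; the names `dot`, `alpha`, and `beta` mirror the symbols on the slide but the Python functions themselves are illustrative.

```python
from itertools import product

def dot(left, right):
    """(g, p) . (g1, p1) = (g + p*g1, p*p1); left is the more significant tuple."""
    g, p = left
    g1, p1 = right
    return (g | (p & g1), p & p1)

def alpha(t):
    """Access the generate element of a tuple."""
    return t[0]

def beta(t):
    """Access the propagate element of a tuple."""
    return t[1]

pairs = list(product((0, 1), repeat=2))

# Associative: the grouping of the operands does not matter
assert all(dot(dot(x, y), z) == dot(x, dot(y, z))
           for x in pairs for y in pairs for z in pairs)

# Not commutative: swapping the operands can change the result
assert dot((0, 0), (1, 1)) != dot((1, 1), (0, 0))
```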
Logarithmic Look-Ahead Adder(4)

$$C_{o,k} = G_k + P_k(G_{k-1} + P_{k-1}(\dots + P_1(G_0 + P_0C_{i,0})))$$

$$C_{o,0} = G_0 + P_0C_{i,0} = g_0 = \alpha(g_0, P_0)$$
$$C_{o,1} = G_1 + g_0P_1 = \alpha((G_1, P_1) \bullet (g_0, P_0))$$
$$\vdots$$
$$C_{o,k} = \alpha((G_k, P_k) \bullet (G_{k-1}, P_{k-1}) \bullet \dots \bullet (g_0, P_0))$$

Log (PPA) Adder structure

q Given P and G calculated for each bit, calculating
the carries at each bit is equivalent to computing all the
prefixes of (g, p) in parallel
$$(g, p) \bullet (g_1, p_1) \bullet (g_2, p_2) \bullet \dots \bullet (g_{N-1}, p_{N-1}) = (g_N, p_N)$$
q Since the dot operator is associative, we can group
them in a different order
q The general structure of the log adder thus looks
like the following:
– p, g generation for each bit (1 unit delay)
– Calculate Carry_i for each bit using the parallel prefix tree (log(N) unit delay)
– Compute the sum for each bit in parallel (1 unit delay)

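The three phases can be sketched as follows. For clarity this version computes the prefixes serially with `itertools.accumulate`, so it shows the data flow only and not the log-depth tree. Folding the carry-in into the bit-0 generate term follows the g0 = G0 + P0·Ci,0 definition; the function names are illustrative.

```python
from itertools import accumulate

def dot(hi, lo):
    """(g, p) . (g1, p1) = (g + p*g1, p*p1); hi is the more significant tuple."""
    return (hi[0] | (hi[1] & lo[0]), hi[1] & lo[1])

def prefix_add(a, b, n, c_in=0):
    """Add two n-bit integers; return (sum, carry_out)."""
    # Phase 1: bitwise generate and propagate
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]
    # Fold the carry-in into bit 0: g0 = G0 + P0*Ci,0
    pairs = [(g[0] | (p[0] & c_in), p[0])] + list(zip(g[1:], p[1:]))
    # Phase 2: all prefixes; the generate element of prefix i is Co,i
    prefixes = list(accumulate(pairs, lambda acc, cur: dot(cur, acc)))
    carries = [c_in] + [gp[0] for gp in prefixes]   # carries[i] = carry into bit i
    # Phase 3: sum bits, all independent of each other
    s = sum((p[i] ^ carries[i]) << i for i in range(n))
    return s, carries[n]
```

Replacing the `accumulate` call with a tree-structured prefix network changes only phase 2; phases 1 and 3 stay the same.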
Brent-Kung Adder
$$C_{o,0} = \alpha(g_0, P_0)$$
$$C_{o,1} = \alpha((G_1, P_1) \bullet (g_0, P_0))$$
$$C_{o,3} = \alpha((G_3, P_3) \bullet (G_2, P_2) \bullet (G_1, P_1) \bullet (g_0, P_0))$$
$$C_{o,7} = \alpha((C_{4-7}, P_{4-7}) \bullet (C_{0-3}, P_{0-3}))$$

[Figure: 8-bit Brent-Kung tree with inputs (g0,P0), (G1,P1), ..., (G7,P7). A Forward Tree produces Co,0, Co,1, Co,3, and Co,7 (the last using the group pair (C4-7,P4-7)); a Reverse Tree produces the remaining carries Co,2, Co,4, Co,5, Co,6]

16-bit Brent-Kung Log Adder
[Figure: 16-bit Brent-Kung prefix tree with inputs g0..g15, p0..p15 and Cin, producing carries C1..C16]

Properties of Brent-Kung Adder

q The number of logic levels is proportional to log2(N)
q Gate fan-in is limited, but the fanout can be large;
careful buffering is thus required
q The layout is compact
q Once the carry bits are available, the sum bits are
easily derived in constant time.

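A behavioural sketch of a Brent-Kung style prefix network, written as a forward (up-sweep) tree followed by a reverse (down-sweep) tree. It assumes N is a power of two, and the index arithmetic is one common way to express the network rather than a transcription of the slide figure.

```python
def dot(hi, lo):
    """(g, p) . (g1, p1) = (g + p*g1, p*p1); hi is the more significant tuple."""
    return (hi[0] | (hi[1] & lo[0]), hi[1] & lo[1])

def brent_kung_prefix(pairs):
    """Return (prefixes, levels); len(pairs) is assumed to be a power of two."""
    x, n, levels = list(pairs), len(pairs), 0
    d = 1
    while d < n:                                  # forward tree
        for i in range(2 * d - 1, n, 2 * d):
            x[i] = dot(x[i], x[i - d])
        d, levels = d * 2, levels + 1
    d = n // 4
    while d >= 1:                                 # reverse tree
        for i in range(3 * d - 1, n, 2 * d):
            x[i] = dot(x[i], x[i - d])
        d, levels = d // 2, levels + 1
    return x, levels

def brent_kung_add(a, b, n, c_in=0):
    """Add two n-bit integers; return (sum, carry_out, prefix_levels)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]
    pairs = [(g[0] | (p[0] & c_in), p[0])] + list(zip(g[1:], p[1:]))
    prefixes, levels = brent_kung_prefix(pairs)
    carries = [c_in] + [gp[0] for gp in prefixes]
    s = sum((p[i] ^ carries[i]) << i for i in range(n))
    return s, carries[n], levels
```

In this formulation a 16-bit network uses four forward levels and three reverse levels, so the level count grows as 2·log2(N) - 1.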
Kogge-Stone Tree (PPF) Adder
[Figure: 16-bit Kogge-Stone prefix tree with inputs g0..g15, p0..p15 and Cin, producing carries C1..C16]

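A behavioural sketch of a Kogge-Stone style prefix network: at each level every position combines with the position a power-of-two distance below it, so all prefixes are ready after log2(N) levels. The function names are illustrative and the carry-in is folded into bit 0 as an assumption of this example.

```python
def dot(hi, lo):
    """(g, p) . (g1, p1) = (g + p*g1, p*p1); hi is the more significant tuple."""
    return (hi[0] | (hi[1] & lo[0]), hi[1] & lo[1])

def kogge_stone_prefix(pairs):
    """Return (prefixes, levels) for a list of (g, p) tuples, LSB first."""
    x, n, d, levels = list(pairs), len(pairs), 1, 0
    while d < n:
        # Every position i >= d combines with position i - d in the same level
        x = [dot(x[i], x[i - d]) if i >= d else x[i] for i in range(n)]
        d, levels = d * 2, levels + 1
    return x, levels

def kogge_stone_add(a, b, n, c_in=0):
    """Add two n-bit integers; return (sum, carry_out, prefix_levels)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(n)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n)]
    pairs = [(g[0] | (p[0] & c_in), p[0])] + list(zip(g[1:], p[1:]))
    prefixes, levels = kogge_stone_prefix(pairs)
    carries = [c_in] + [gp[0] for gp in prefixes]
    s = sum((p[i] ^ carries[i]) << i for i in range(n))
    return s, carries[n], levels
```

The price of the shallow depth is that almost every position has an operator at every level, so the network uses many more cells and wires than a sparser tree.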
Industrial examples
q Specifications for high speed microprocessors are very
demanding and cutting edge.
q May not achieve the specified results using only a single
technique: Combine different strategies
[Figure: 64-bit adder example. On the MSB side, RA(63:24) and RB(63:24) feed a 40-bit Carry Select Adder producing EA(63:24); on the LSB side, RA(23:0) and RB(23:0) feed a 24-bit Differential Carry Lookahead Adder producing EA(23:0) and cout23. The figure also shows TLB, Data Cache, and two Compare blocks, with output real_add(40:0)]
© Dan Stasiak, IBM Rochester, 2001

Industrial Example
q 64-bit adder, on 200MHz DEC Alpha 21064 RISC
microprocessor
q 5ns cycle using 0.75µm technology
q 4 different techniques for this 64-bit adder
– Manchester carry chain is used on the 8-bit level
• Carry chain optimized by tapering down each chain
stage
– Carry-lookahead addition (CLA) was used on the least
significant 32 bits of the adder
– Conditional-sum-addition for the 32 MSB of the adder.
• 6 8-bit select switches used
– Conditional-select adder for the most significant 32 bits
of the 64-bit words

Subtractor

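q 2's complement number for subtraction
q $A - B = A + (-B) = A + \bar{B} + 1$

[Figure: two adder configurations with inputs AN…1 and BN…1: a subtractor with a carry-in of 1 giving SN…1 = A - B, and a combined adder/subtractor controlled by a Sub/add signal giving SN…1 = A ± B]
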
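The identity A - B = A + (inverted B) + 1 can be sketched as a combined adder/subtractor. The `sub` control, the default 8-bit width, and the function name are assumptions made for this illustration.

```python
def add_sub(a, b, sub, n=8):
    """n-bit add (sub=0) or subtract (sub=1); return (result, carry_out)."""
    mask = (1 << n) - 1
    b_eff = (b ^ mask) if sub else b        # invert every bit of B when subtracting
    total = (a & mask) + (b_eff & mask) + sub   # sub doubles as the carry-in of 1
    return total & mask, (total >> n) & 1

difference, carry = add_sub(9, 5, sub=1)    # 9 - 5 in 8 bits
```

A negative result appears in 2's complement form, for example 5 - 9 in 8 bits comes out as 252, which is -4 modulo 256.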
Adding multiple inputs

q Adding k N-bit words
– The most obvious method is to use k-1 cascaded CPAs (carry-propagate adders)
q A better way is to use carry-save adders
– carry-save adder: 3:2 counter or 3:2 compressor
• 3-bit input and 2-bit output
• All bit positions operate in parallel, hence a constant
delay
• Use a fast adder to add the two compressed
numbers

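A behavioural sketch of 3:2 compression on whole words. Python integers are unbounded, so word-width truncation is ignored here, and the order in which words are compressed is a simple queue rather than an optimized tree; both are assumptions of this example.

```python
def carry_save(x, y, z):
    """3:2 compressor: three words in, a sum word and a carry word out."""
    s = x ^ y ^ z                                  # per-bit sum, no carry propagation
    c = ((x & y) | (x & z) | (y & z)) << 1         # per-bit majority, weighted by two
    return s, c

def add_many(words):
    """Reduce k words to two with carry-save steps, then one final addition."""
    words = list(words)
    while len(words) > 2:
        s, c = carry_save(words[0], words[1], words[2])
        words = words[3:] + [s, c]
    return sum(words)                              # stands in for the fast final adder

total = add_many([1, 2, 3, 4, 5])
```

Each `carry_save` step has the delay of a single full adder regardless of word width, so only the final addition needs a fast carry-propagate adder.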
Other types of Adder

q Carry-save adder
q Conditional sum adder
q Very Wide Adder using block generate and block
propagate
q (Reference : Weste: Chapter 8)
