You are on page 1of 29

Computer Architecture 10

Fast Adders

Ma d e w i t h Op e n Of f i c e . o r g

Carry Problem
Addition is primary mechanism in implementing arithmetic operations Slow addition directly affects the total performance of the computer Complex (and fast) addition schemes increase the cost of the final implementation The choice for adder structure must be tailormade to the potential applications

Ma d e w i t h Op e n Of f i c e . o r g

RCA
RCA Ripple Carry Adder

(Computer Architecture I)

Simplest, Smallest and Slowest (SSS), but ... Worst-case carry-propagation is (k)

k - digits

Ma d e w i t h Op e n Of f i c e . o r g

Bit-Serial RCA
VLSI implementation advantages:
small pin count reduced wire length high clock rate small space low power consumption
shift X Y

Alternative to pipeline units for parallel processing

FA

Bit-serial RCA implementation

X+Y shift

Ma d e w i t h Op e n Of f i c e . o r g

Coping with Carry


Detect the end of carry propagation
asynchronous adders

Speed up the propagation (CPA Carry Propagate Adders)


CLA / CSLA / CSKA Carry Lookahead / Select / Skip Adders

Limit the carry propagation


CSA Carry Save Adder Estimate & Parallel Carry Adders

Eliminate the carry propagation


Carry-free RNS operations
Ma d e w i t h Op e n Of f i c e . o r g

Adders Overview
1-bit adders
Half Adder HA Full Adder FA

Not complete

Bit Counter (m,k)

CPA

RCA

CSKA CSLA CLA

3-operand

multi-operand adders

CSA Adder Array


Ma d e w i t h Op e n Of f i c e . o r g

Adder Tree 6

Asynchronous Adder
Based on RCA - detection of carry completion Average longest chain of propagation ~ log2k Not suitable for synchronous processing
ai+bi di+1 ai*bi ai+bi di carry (c, d) from previous stage ci ai*bi Complete from other bit positions (c,d) extended carry ----------------------------------(0,0) Carry not yet known (0,1) Carry 0 (1,0) Carry 1 7

ci+1

Ma d e w i t h Op e n Of f i c e . o r g

CLA
CLA Carry Lookahead Adder
Fast but Complex

(Computer Architecture I)

Worst-case carry propagation is (log k)

LA

LA

LA LA Logic - 2nd level

LA Logic - 1st level

k - digits
Ma d e w i t h Op e n Of f i c e . o r g

Carry Select Adder (CSLA)


The oldest logarithmic-time adder The additions are performed in parallel according to alternative scenarios: carry= 0 or 1 The final selection of results is made with the computed value of carry CLSAs have (log k) addition time, but with high complexity of hardware
1 0 1 0 1 c0 0

Ma d e w i t h Op e n Of f i c e . o r g

One-Level CSLA
k1 k/2 RCA k1 k/2 RCA k/2 k/2 Mux 21 (k/2 buses) ck/2 k 1 k/2 k 0 k1 k/2 RCA 0 c0

k/2

cout + High k/2 bits of result


Ma d e w i t h Op e n Of f i c e . o r g

Low k/2 bits of result 10

Block Propagation in CSLA


Not optimal due to block-carry propagation delay
k-1 k/4 RCA k/4 RCA 0 k 1 k-1 k k/4 RCA 1 k/4 RCA 0 ck/2 k-1 k/4 RCA k/4 RCA 0 k 1 k-1 k/4 RCA 0 c0

c3k/4 ck

ck/4

Res. k-1...k

k-1...k

k-1...k

k-1...0

Ma d e w i t h Op e n Of f i c e . o r g

11

Two-Level CSLA
Block-carry propagation is fast, but design complex
k-1 k k/4 RCA 1 k/4 RCA 0 c3k/4 c3k/4 k-1 k k/4 RCA 1 k/4 RCA 0 k-1 k k/4 RCA 1 k/4 RCA 0 k-1 k/4 RCA 0 c0

ck/4

ck/2 ck Result k-1...k k-1...k k-1...0

Ma d e w i t h Op e n Of f i c e . o r g

12

Propagation Chains
Carries can be generated, propagated or absorbed Propagation chains are evaluated in parallel How to speed up the worst-case propagation?

Ma d e w i t h Op e n Of f i c e . o r g

carry propagation 13

worst-case

Carry Skip Adder (CSKA)


Carry-in is c propagated through n-stages if p=1 Propagate condition p is easily computable
4-bit RCA p Carry skip 14
i +1

FA

FA

FA

FA

ci

p = pi*pi+1*pi+2*pi+3

propagation condition

pi=ai+bi

Ma d e w i t h Op e n Of f i c e . o r g

CSKA
16-bit CSKA Adder
4-bit RCA p 4-bit RCA p 4-bit RCA p 4-bit RCA p

c16

Carry skip

c12

Carry skip

c8

Carry skip

c4

Carry skip

c0

worst-case carry-propagate carry-skip carry-propagate

The longest delay due to carry propagation:


propagation through bits 1-3 and OR skip bits 4-11 propagation through bits 12-14
(block 0 and last do not contribute to carry-propagation delay)
Ma d e w i t h Op e n Of f i c e . o r g

15

CSKA Delay Analysis


Assume: 1 block skip-delay = 1 bit carry-propagation
k bits, b bits in skip-block (fixed-size) TfixCSKA carry-propagation delay in fixed-block size CSKA (number of stages the carry must be propagated through)
4-bit RCA p 4-bit RCA p 4-bit RCA p 4-bit RCA p

c16

Carry skip

c12

Carry skip

c8

Carry skip

c4

Carry skip

c0

TfixCSKA= (b 1)

+ 0.5

+ (k/b 2) + (b 1) = 2b + k/b -3.5


+ in last block

in block 0 + OR gate + all skips

e.g. 32-bit: TfixCSKA = 12.5

(b=4, k=32)
16

Ma d e w i t h Op e n Of f i c e . o r g

Fixed-Size Skip Blocks


Optimal size of fixed-size skip blocks:
dTfixCSKA /db = 0 d(2b + k/b -3.5)/db = 2 k/b2 = 0

bopt = (k/2)1/2 TfixCSKA-opt= 2(2k)1/2 3.5

e.g. 16-bit adder bopt 3, TfixCSKA 8 e.g. 32-bit adder bopt = 4, TfixCSKA= 12.5 e.g. 64-bit adder bopt 6, TfixCSKA 19
Ma d e w i t h Op e n Of f i c e . o r g

17

Variable-Size Skip Blocks


Variable-size skip blocks shorten the propagation Optimal configuration is to have the longest block in the middle and the shortest at both ends
t number of blocks (even number) b bits in smallest skip block
Adder Skip Adder Skip Adder Skip Adder Skip Adder Skip Adder Skip Adder Skip Adder Skip

b+1

...

b+t/2-1

b+t/2-1

...

b+1

k = t*(b + t/4 1/2) b = k/t t/4 + 1/2


Ma d e w i t h Op e n Of f i c e . o r g

18

Variable-Size Skip Blocks


TvarCSKA carry-propagation delay in fixed-block size CSKA (number of stages the carry must be propagated through)
Adder Skip b1 Adder Skip Adder Skip Adder Skip t2 Adder Skip Adder Skip Adder Skip Adder Skip 0.5 + b1

TvarCSKA = 2*(b1) + 0.5 + (t2) = 2k/t + t/2 2.5

Optimal size of fixed-size skip blocks: dTvarCSKA /db = 0 2k/t2 + 1/2 = 0

topt = 2 k1/2 TvarCSKA-opt= 2 k1/2 2.5


Ma d e w i t h Op e n Of f i c e . o r g

19

Variable-Size Skip Blocks


topt = 2 k1/2 TvarCSKA-opt= 2 k
1/2

2.5

bopt 1 e.g. 16-bit adder topt = 8, TfixCSKA 5.5 e.g. 32-bit adder topt 12, TfixCSKA 9 e.g. 64-bit adder topt = 16, TfixCSKA 13.5
Ma d e w i t h Op e n Of f i c e . o r g

20

Multilevel CSKA
First-level skip blocks get propagate-condition signal from block adders Second-level skip blocks get propagate-condition signal from first-level skip blocks Carry can be propagated over group of skip blocks
Adder Skip Skip Skip Adder Skip Adder Skip Skip Adder Skip Adder Skip Skip Skip 21 Adder Skip Adder Skip Skip Adder Skip

Ma d e w i t h Op e n Of f i c e . o r g

Multilevel CSKA
Multilevel structures are results of complex optimizations and usually are not regular
adders first level skip blocks second level skip blocks

Ma d e w i t h Op e n Of f i c e . o r g

22

Hybrid Fast Adders


Combination of RCA, CLA, CSKA and others allows to satisfy various criteria:
high performance cost-effectiveness low power consumption

Ma d e w i t h Op e n Of f i c e . o r g

23

Example CSLA+CSKA+CLA

CL Logic

CSKA CSKA 0

CSKA CSKA 0

CSKA CSKA 0

CSKA

Ma d e w i t h Op e n Of f i c e . o r g

24

Multioperand Adders
Applications in multiplication, vector and matrix arithmetics and others
x y ------ -------------- x*y x y ------ -------------- x*y

Adding n-numbers of k-bits, total sum has k+log2n bits

Ma d e w i t h Op e n Of f i c e . o r g

25

Serial Implementation
Latency (n log k)
faster than linear dependence on number of operands logarithmic dependence for scaling the operand size

Partial sum

k+log n bits

n-operands k-bits

Fast adder (log k)

Shift register

Ma d e w i t h Op e n Of f i c e . o r g

26

Tree Implementation with CPA


Tree of 2-operand adders (RCA are best here!) For n-operands, n-1 adders are needed (costly) Latency (k+log n) scales well with n
n k-bit operands RCA k+1 RCA k+2 RCA k+3 bit result
Ma d e w i t h Op e n Of f i c e . o r g

RCA k+1

RCA k+1 RCA k+2

27

Look into Tree of RCAs


Adders at higher levels need not wait for full carry propagation from lower-level adders All adders start with just one FA-delay after previous level

FA

FA

HA

single RCA at level i

FA

HA

single RCA at level i+1

Ma d e w i t h Op e n Of f i c e . o r g

28

Tree Implementation with CSA


Tree of 3-operand carry-save adders CSA reduce n-operands to 2-operands, (log n) Final fast CPA is needed, (log k) Latency (log n + log k) scales well with n & k
n k-bit operands CSA CSA CSA CSA CPA
Ma d e w i t h Op e n Of f i c e . o r g

CSA

29

You might also like