Ca2 FastAdders

Computer Architecture 10
Fast Adders
Ma d e w i t h Op e n Of f i c e . o r g
Carry Problem
Addition is primary mechanism in implementing arithmetic operations Slow addition directly affects the total performance of the computer Complex (and fast) addition schemes increase the cost of the final implementation The choice for adder structure must be tailormade to the potential applications
RCA
RCA Ripple Carry Adder
(Computer Architecture I)
Simplest, Smallest and Slowest (SSS), but ... Worst-case carry-propagation is (k)
k - digits
Bit-Serial RCA
VLSI implementation advantages:
small pin count reduced wire length high clock rate small space low power consumption
shift X Y
Alternative to pipeline units for parallel processing
FA
Bit-serial RCA implementation
X+Y shift
Coping with Carry

Detect the end of carry propagation
asynchronous adders
Speed up the propagation (CPA Carry Propagate Adders)

CLA / CSLA / CSKA Carry Lookahead / Select / Skip Adders
Limit the carry propagation

CSA Carry Save Adder Estimate & Parallel Carry Adders
Eliminate the carry propagation

Carry-free RNS operations
Adders Overview
1-bit adders
Half Adder HA Full Adder FA
Not complete
Bit Counter (m,k)
CPA
RCA
CSKA CSLA CLA
3-operand
multi-operand adders
CSA Adder Array

Adder Tree 6
Asynchronous Adder
Based on RCA - detection of carry completion Average longest chain of propagation ~ log2k Not suitable for synchronous processing
ai+bi di+1 ai*bi ai+bi di carry (c, d) from previous stage ci ai*bi Complete from other bit positions (c,d) extended carry ----------------------------------(0,0) Carry not yet known (0,1) Carry 0 (1,0) Carry 1 7
ci+1
CLA
CLA Carry Lookahead Adder
Fast but Complex
(Computer Architecture I)
Worst-case carry propagation is (log k)
LA
LA
LA LA Logic - 2nd level
LA Logic - 1st level
k - digits
Carry Select Adder (CSLA)

The oldest logarithmic-time adder The additions are performed in parallel according to alternative scenarios: carry= 0 or 1 The final selection of results is made with the computed value of carry CLSAs have (log k) addition time, but with high complexity of hardware
1 0 1 0 1 c0 0
One-Level CSLA
k1 k/2 RCA k1 k/2 RCA k/2 k/2 Mux 21 (k/2 buses) ck/2 k 1 k/2 k 0 k1 k/2 RCA 0 c0
k/2
cout + High k/2 bits of result

Low k/2 bits of result 10
Block Propagation in CSLA

Not optimal due to block-carry propagation delay
k-1 k/4 RCA k/4 RCA 0 k 1 k-1 k k/4 RCA 1 k/4 RCA 0 ck/2 k-1 k/4 RCA k/4 RCA 0 k 1 k-1 k/4 RCA 0 c0
c3k/4 ck
ck/4
Res. k-1...k
k-1...k
k-1...k
k-1...0
11
Two-Level CSLA
Block-carry propagation is fast, but design complex
k-1 k k/4 RCA 1 k/4 RCA 0 c3k/4 c3k/4 k-1 k k/4 RCA 1 k/4 RCA 0 k-1 k k/4 RCA 1 k/4 RCA 0 k-1 k/4 RCA 0 c0
ck/4
ck/2 ck Result k-1...k k-1...k k-1...0
12
Propagation Chains
Carries can be generated, propagated or absorbed Propagation chains are evaluated in parallel How to speed up the worst-case propagation?
carry propagation 13
worst-case
Carry Skip Adder (CSKA)

Carry-in is c propagated through n-stages if p=1 Propagate condition p is easily computable
4-bit RCA p Carry skip 14
i +1
FA
FA
FA
FA
ci
p = pi*pi+1*pi+2*pi+3
propagation condition
pi=ai+bi
CSKA
16-bit CSKA Adder
4-bit RCA p 4-bit RCA p 4-bit RCA p 4-bit RCA p
c16
Carry skip
c12
Carry skip
c8
Carry skip
c4
Carry skip
c0
worst-case carry-propagate carry-skip carry-propagate
The longest delay due to carry propagation:

propagation through bits 1-3 and OR skip bits 4-11 propagation through bits 12-14
(block 0 and last do not contribute to carry-propagation delay)
15
CSKA Delay Analysis

Assume: 1 block skip-delay = 1 bit carry-propagation
k bits, b bits in skip-block (fixed-size) TfixCSKA carry-propagation delay in fixed-block size CSKA (number of stages the carry must be propagated through)
4-bit RCA p 4-bit RCA p 4-bit RCA p 4-bit RCA p
c16
Carry skip
c12
Carry skip
c8
Carry skip
c4
Carry skip
c0
TfixCSKA= (b 1)
+ 0.5
+ (k/b 2) + (b 1) = 2b + k/b -3.5

+ in last block
in block 0 + OR gate + all skips
e.g. 32-bit: TfixCSKA = 12.5
(b=4, k=32)
16
Fixed-Size Skip Blocks

Optimal size of fixed-size skip blocks:
dTfixCSKA /db = 0 d(2b + k/b -3.5)/db = 2 k/b2 = 0
bopt = (k/2)1/2 TfixCSKA-opt= 2(2k)1/2 3.5
e.g. 16-bit adder bopt 3, TfixCSKA 8 e.g. 32-bit adder bopt = 4, TfixCSKA= 12.5 e.g. 64-bit adder bopt 6, TfixCSKA 19
17
Variable-Size Skip Blocks

Variable-size skip blocks shorten the propagation Optimal configuration is to have the longest block in the middle and the shortest at both ends
t number of blocks (even number) b bits in smallest skip block
Adder Skip Adder Skip Adder Skip Adder Skip Adder Skip Adder Skip Adder Skip Adder Skip
b+1
...
b+t/2-1
b+t/2-1
...
b+1
k = t*(b + t/4 1/2) b = k/t t/4 + 1/2

18

TvarCSKA carry-propagation delay in fixed-block size CSKA (number of stages the carry must be propagated through)
Adder Skip b1 Adder Skip Adder Skip Adder Skip t2 Adder Skip Adder Skip Adder Skip Adder Skip 0.5 + b1
TvarCSKA = 2*(b1) + 0.5 + (t2) = 2k/t + t/2 2.5
Optimal size of fixed-size skip blocks: dTvarCSKA /db = 0 2k/t2 + 1/2 = 0
topt = 2 k1/2 TvarCSKA-opt= 2 k1/2 2.5

19

topt = 2 k1/2 TvarCSKA-opt= 2 k
1/2
2.5
bopt 1 e.g. 16-bit adder topt = 8, TfixCSKA 5.5 e.g. 32-bit adder topt 12, TfixCSKA 9 e.g. 64-bit adder topt = 16, TfixCSKA 13.5
20
Multilevel CSKA
First-level skip blocks get propagate-condition signal from block adders Second-level skip blocks get propagate-condition signal from first-level skip blocks Carry can be propagated over group of skip blocks
Adder Skip Skip Skip Adder Skip Adder Skip Skip Adder Skip Adder Skip Skip Skip 21 Adder Skip Adder Skip Skip Adder Skip
Multilevel CSKA
Multilevel structures are results of complex optimizations and usually are not regular
adders first level skip blocks second level skip blocks
22
Hybrid Fast Adders

Combination of RCA, CLA, CSKA and others allows to satisfy various criteria:
high performance cost-effectiveness low power consumption
23
Example CSLA+CSKA+CLA
CL Logic
CSKA CSKA 0
CSKA CSKA 0
CSKA CSKA 0
CSKA
24
Multioperand Adders
Applications in multiplication, vector and matrix arithmetics and others
x y ------ -------------- x*y x y ------ -------------- x*y
Adding n-numbers of k-bits, total sum has k+log2n bits
25
Serial Implementation
Latency (n log k)
faster than linear dependence on number of operands logarithmic dependence for scaling the operand size
Partial sum
k+log n bits
n-operands k-bits
Fast adder (log k)
Shift register
26
Tree Implementation with CPA

Tree of 2-operand adders (RCA are best here!) For n-operands, n-1 adders are needed (costly) Latency (k+log n) scales well with n
n k-bit operands RCA k+1 RCA k+2 RCA k+3 bit result
RCA k+1
RCA k+1 RCA k+2
27
Look into Tree of RCAs

Adders at higher levels need not wait for full carry propagation from lower-level adders All adders start with just one FA-delay after previous level
FA
FA
HA
single RCA at level i
FA
HA
single RCA at level i+1
28
Tree Implementation with CSA

Tree of 3-operand carry-save adders CSA reduce n-operands to 2-operands, (log n) Final fast CPA is needed, (log k) Latency (log n + log k) scales well with n & k
n k-bit operands CSA CSA CSA CSA CPA
CSA
29

Ca2 FastAdders

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Ca2 FastAdders

Uploaded by

Copyright:

Available Formats

Computer Architecture 10

Alternative to pipeline units for parallel processing

Bit-serial RCA implementation

Coping with Carry

Speed up the propagation (CPA Carry Propagate Adders)

Limit the carry propagation

Eliminate the carry propagation

Bit Counter (m,k)

CSKA CSLA CLA

CSA Adder Array

Worst-case carry propagation is (log k)

LA LA Logic - 2nd level

LA Logic - 1st level

Carry Select Adder (CSLA)

cout + High k/2 bits of result

Low k/2 bits of result 10

Block Propagation in CSLA

ck/2 ck Result k-1...k k-1...k k-1...0

Carry Skip Adder (CSKA)

worst-case carry-propagate carry-skip carry-propagate

The longest delay due to carry propagation:

CSKA Delay Analysis

+ (k/b 2) + (b 1) = 2b + k/b -3.5

in block 0 + OR gate + all skips

e.g. 32-bit: TfixCSKA = 12.5

Fixed-Size Skip Blocks

bopt = (k/2)1/2 TfixCSKA-opt= 2(2k)1/2 3.5

Variable-Size Skip Blocks

k = t*(b + t/4 1/2) b = k/t t/4 + 1/2

Variable-Size Skip Blocks

TvarCSKA = 2*(b1) + 0.5 + (t2) = 2k/t + t/2 2.5

Optimal size of fixed-size skip blocks: dTvarCSKA /db = 0 2k/t2 + 1/2 = 0

topt = 2 k1/2 TvarCSKA-opt= 2 k1/2 2.5

Variable-Size Skip Blocks

Hybrid Fast Adders

Adding n-numbers of k-bits, total sum has k+log2n bits

Fast adder (log k)

Tree Implementation with CPA

RCA k+1 RCA k+2

Look into Tree of RCAs

single RCA at level i

single RCA at level i+1

Tree Implementation with CSA

You might also like