
How does CLA (carry look-ahead adder) work?


Wei-jen Hsu
TA for EE457 at USC, fall 2004
Modified fall 2005
A simple one-bit full adder

[Figure: one-bit full adder block (+) with inputs A, B, Cin and outputs S, Cout.]

It takes A, B, and Cin as input and generates S and Cout in 2 gate delays (SOP)
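
As a minimal sketch of the block above (not part of the original slides), the one-bit full adder can be written in Python as two sum-of-products expressions, which is what the "2 gate delays (SOP)" count refers to. The function name `full_adder` and the use of 0/1 integers are assumptions made for illustration.

```python
def full_adder(a, b, cin):
    """One-bit full adder on 0/1 integers, written as two-level SOP."""
    na, nb, nc = 1 - a, 1 - b, 1 - cin  # complemented inputs
    # S is 1 when an odd number of inputs are 1 (four product terms, one OR).
    s = (na & nb & cin) | (na & b & nc) | (a & nb & nc) | (a & b & cin)
    # Cout is 1 when at least two inputs are 1 (three product terms, one OR).
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout

print(full_adder(1, 1, 0))  # → (0, 1)
```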
4-bit RCA
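[Figure: four one-bit full adders (+) chained from the lowest bit to the highest. Inputs: A3 B3, A2 B2, A1 B1, A0 B0. Carries: C0 into the lowest adder, C1, C2, C3 between adders, C4 out of the highest. Outputs: S3 S2 S1 S0.]
Carry propagation forms a long sequential wait chain, hence RCA is slow!!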

• Work from lowest bit to highest bit sequentially.


• With A0, B0, and C0, the lowest bit adder generates S0 and C1 in 2 gate delays.
• With A1, B1, and C1 ready, the second bit adder generates S1 and C2 in 2 gate delays.
• Each bit adder has to wait for the lower bit adder to propagate the carry.
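
The sequential wait chain can be sketched in Python as a loop in which each stage needs the carry from the stage before it. The names `ripple_carry_add` and the list-of-bits representation (index 0 = lowest bit) are assumptions for illustration. The delay counter simply adds the 2 gate delays per stage stated above.

```python
def full_adder(a, b, cin):
    # Behavioral model; the 2-gate-delay figure assumes the SOP form.
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout

def ripple_carry_add(a_bits, b_bits, c0=0):
    """a_bits, b_bits: lists of 0/1, index 0 = lowest bit.
    Returns (sum bits, final carry, gate delays until the final carry)."""
    carry, s_bits, delay = c0, [], 0
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)  # must wait for the lower carry
        s_bits.append(s)
        delay += 2                          # 2 gate delays per stage
    return s_bits, carry, delay

# 15 + 1: the carry ripples through all four stages.
print(ripple_carry_add([1, 1, 1, 1], [1, 0, 0, 0]))  # → ([0, 0, 0, 0], 1, 8)
```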
Observations
• The critical component each bit adder
waits for is the carry input.
• Instead of generating and propagating
carry bit-by-bit, can we generate all of
them in parallel and break the sequential
chain?
• This is exactly the idea of CLA (carry look-
ahead adder).
Carry Look Ahead Logic
• Now even before the carry in (Cin) is
available, based on the inputs (A,B) only,
can we say anything about the carry out?
• Under what condition will the bit propagate
an outgoing carry (Cout), if there is an
incoming carry (Cin)?
• Under what condition will the bit generate
an outgoing carry (Cout), regardless of
whether there is an incoming carry (Cin)?
1-bit CLA adder
[Figure: 1-bit CLA adder block (+) with inputs A, B, Cin and outputs S, p, g.]

• Instead of Cout, a 1-bit CLA adder block takes inputs A, B and generates p, g.
• p = propagator => I will propagate the Cin to the next bit. p = A+B
(If either A or B is 1, Cin=1 causes Cout=1)
• g = generator => I will generate a Cout independent of what Cin is. g = AB
(If both A and B are 1, Cout=1 for sure)
• p, g are generated in 1 gate delay after we have A, B. Note that Cin is not needed to generate p, g.
• S is generated in 2 gate delays after we get Cin (SOP).
4-bit CLA
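[Figure: four 1-bit CLA adder blocks (+), each with inputs A and B, each sending its p and g down to a shared CLL (carry look-ahead logic) block. The CLL takes C0 and supplies C1, C2, C3 to the adder blocks, and outputs C4.]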

• The CLL takes p,g from all 4 bits and C0 as input to generate all C’s in 2 gate delays.
• C1 = g0 + p0C0
• C2 = g1 + p1g0 + p1p0C0
• C3 = g2 + p2g1 + p2p1g0 + p2p1p0C0
• C4 = g3 + p3g2 + p3p2g1 + p3p2p1g0 + p3p2p1p0C0 (Note: this C4 is too complicated to generate in 2-level SOP representation)
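
The equations above come from repeatedly substituting the one-step relation C(i+1) = gi + pi·Ci into itself. The sketch below (function names `cll4` and `ripple_carries` are assumptions for illustration) writes out the flattened equations and checks them against the step-by-step recurrence for every possible p, g, C0 pattern.

```python
from itertools import product

def cll4(p, g, c0):
    """Flattened carry equations. p, g: four 0/1 values, index 0 = lowest bit.
    Returns [C1, C2, C3, C4]."""
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = (g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
          | (p[2] & p[1] & p[0] & c0))
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0])
          | (p[3] & p[2] & p[1] & p[0] & c0))
    return [c1, c2, c3, c4]

def ripple_carries(p, g, c0):
    """Same carries, computed one after another: C(i+1) = gi + pi*Ci."""
    carries, c = [], c0
    for pi, gi in zip(p, g):
        c = gi | (pi & c)
        carries.append(c)
    return carries

match = all(cll4(bits[:4], bits[4:8], bits[8])
            == ripple_carries(bits[:4], bits[4:8], bits[8])
            for bits in product((0, 1), repeat=9))
print(match)                                  # → True
print(cll4([1, 1, 1, 1], [0, 0, 0, 0], 1))    # → [1, 1, 1, 1]
```

In the second call every bit propagates and none generates, so the incoming C0=1 shows up on all four carries at once.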
4-bit CLA
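[Figure: four 1-bit CLA adder blocks (+) with inputs A3 B3, A2 B2, A1 B1, A0 B0 and outputs S3, S2, S1, S0. Each block sends its p and g (p3 g3 ... p0 g0) to the CLL (carry look-ahead logic) block, which takes C0 and supplies C1, C2, C3 to the adder blocks.]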

• Given A,B’s, all p,g’s are generated in 1 gate delay in parallel.
• Given all p,g’s, all C’s are generated in 2 gate delays in parallel.
• Given all C’s, all S’s are generated in 2 gate delays in parallel.
• Key virtue of CLA: sequential operation in RCA is broken into parallel operation!!
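
Putting the three stages together, a behavioral sketch of the 4-bit CLA might look like the following. The names `to_bits` and `cla4` are hypothetical, and the carries are computed with a loop only for brevity; in hardware each carry would be its own flattened two-level expression. The delay constants just add up the per-stage figures from the bullets above and compare them with 2 gate delays per bit for the RCA.

```python
from itertools import product

def to_bits(n, width=4):
    return [(n >> i) & 1 for i in range(width)]   # index 0 = lowest bit

def cla4(a, b, c0=0):
    a_bits, b_bits = to_bits(a), to_bits(b)
    # Stage 1 (1 gate delay): all p, g in parallel.
    p = [x | y for x, y in zip(a_bits, b_bits)]
    g = [x & y for x, y in zip(a_bits, b_bits)]
    # Stage 2 (2 gate delays in hardware): all carries from p, g, C0.
    c = [c0]
    for i in range(4):
        c.append(g[i] | (p[i] & c[i]))
    # Stage 3 (2 gate delays): all sum bits in parallel.
    s_bits = [x ^ y ^ ci for x, y, ci in zip(a_bits, b_bits, c)]
    return sum(bit << i for i, bit in enumerate(s_bits)), c[4]

CLA_DELAY = 1 + 2 + 2   # p,g + C's + S's
RCA_DELAY = 2 * 4       # 2 gate delays per bit, one bit after another

ok = all(cla4(a, b, c0) == ((a + b + c0) & 0xF, (a + b + c0) >> 4)
         for a, b, c0 in product(range(16), range(16), (0, 1)))
print(ok, CLA_DELAY, RCA_DELAY)  # → True 5 8
```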
Observation
• The CLL block cannot be made too big (at most 4 bits) because if the equations for the C’s get too long, they cannot be evaluated in 2 gate delays.
• So how about long operands, say 16 bits?
• We add another layer of CLL and make a
multi-level CLA.
16-bit CLA

• Same as before, p,g’s are generated in parallel in 1 gate delay.
• Now, without an input carry, the first-tier CLLs cannot generate C’s yet. Instead they generate P,G’s (group propagator and group generator) in 2 gate delays.
P => This group will propagate the input carry through the group. P = p0p1p2p3
G => This group will generate an output carry. G = g3 + p3g2 + p3p2g1 + p3p2p1g0
• The second-tier CLL takes the P,G’s from the first-tier CLLs and C0 to generate “seed C’s” for the first-tier CLLs in 2 gate delays. (Note that the logic for generating “seed C’s” from P,G’s is exactly the same as the logic for generating C’s from p,g’s!)
• With the seed C’s as input, the first-tier CLLs use Cin and p,g’s to generate C’s in 2 gate delays.
• With all C’s in place, S’s are calculated in 2 gate delays.
Therefore, in total, 1+2+2+2+2=9 gate delays to finish the whole thing!!
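
The two-tier scheme can be sketched behaviorally as below. The function names (`lookahead`, `group_pg`, `cla16`) are assumptions for illustration, and the look-ahead is again written as a loop rather than as flattened gate equations. Note that the same `lookahead` function serves both tiers, which mirrors the remark that seed C’s come from P,G’s by the same logic that produces C’s from p,g’s.

```python
def lookahead(p, g, cin):
    """Carries out of each position, given propagate/generate and a carry in."""
    carries, c = [], cin
    for pi, gi in zip(p, g):
        c = gi | (pi & c)
        carries.append(c)
    return carries

def group_pg(p, g):
    """Group propagator and generator for one 4-bit group."""
    P = p[0] & p[1] & p[2] & p[3]
    G = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
         | (p[3] & p[2] & p[1] & g[0]))
    return P, G

def cla16(a, b, c0=0):
    a_bits = [(a >> i) & 1 for i in range(16)]
    b_bits = [(b >> i) & 1 for i in range(16)]
    p = [x | y for x, y in zip(a_bits, b_bits)]
    g = [x & y for x, y in zip(a_bits, b_bits)]
    groups = [(p[i:i + 4], g[i:i + 4]) for i in range(0, 16, 4)]
    # First tier, going up: one (P, G) pair per group.
    PG = [group_pg(gp, gg) for gp, gg in groups]
    P, G = [x[0] for x in PG], [x[1] for x in PG]
    # Second tier: seed carries [C0, C4, C8, C12, C16].
    seeds = [c0] + lookahead(P, G, c0)
    # First tier, coming down: remaining carries inside each group.
    carries = []
    for (gp, gg), seed in zip(groups, seeds):
        carries += [seed] + lookahead(gp, gg, seed)[:3]
    s_bits = [x ^ y ^ c for x, y, c in zip(a_bits, b_bits, carries)]
    return sum(bit << i for i, bit in enumerate(s_bits)), seeds[4]

print(cla16(0xFFFF, 1))  # → (0, 1)
```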
Now, how about 64-bit CLA?
• You can visualize that in your mind by yourself now, I guess.
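
As a rough sketch of how the delay count extends, assume the 64-bit design simply adds a third tier of 4-bit CLLs in the same pattern (this extension and the function names are assumptions, not something stated on the slides). Counting 1 gate delay for p,g, 2 per tier on the way up for P,G, 2 per tier on the way down for the carries, and 2 for the sum bits reproduces the 5 and 9 gate delays found earlier. The RCA figure uses 2 gate delays per bit.

```python
def cla_delay(levels):
    """Worst-case gate delays for a CLA built from `levels` tiers of 4-bit CLLs."""
    up = 2 * (levels - 1)   # P,G generation in every tier except the top one
    down = 2 * levels       # carry generation in every tier, top to bottom
    return 1 + up + down + 2   # p,g first, S last

def rca_delay(bits):
    return 2 * bits

for levels in (1, 2, 3):
    bits = 4 ** levels
    print(bits, cla_delay(levels), rca_delay(bits))
# → 4 5 8
# → 16 9 32
# → 64 13 128
```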
A bit more details
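[Figure: four 1-bit CLA adder blocks (+) with inputs A3 B3, A2 B2, A1 B1, A0 B0 and outputs S3, S2, S1, S0. Each block sends its p and g (p3 g3 ... p0 g0) to the CLL (carry look-ahead logic) block, which takes C0, supplies C1, C2, C3 to the adder blocks, and outputs C4.]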

• Do all these 4 S’s (S3, S2, S1, S0) come together?
Actually no! Since C0 is available from the beginning, S0 can be calculated in 2 gate delays (using the original SOP expression for the S bit in a single-bit adder), before S3, S2, S1.
A bit more details (Cont’d)

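[Figure: 16-bit two-level CLA with signals color-coded by the time they become available; C0 enters at the lowest bit.]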

• Again, actually not all the S’s come together!
• C0 is readily available, so S0 can be calculated in 2 gate delays.
• Since C0 is readily available, the lowest first-tier CLL can generate C1, C2, C3 independent of the second-tier CLL. Since C1, C2, C3 are done earlier, so are S1, S2, S3 (in 5 gate delays: 1 for (p,g), 2 for (C1,C2,C3), 2 for S).
• When we get C4, C8, C12, we can start to calculate S4, S8, S12 and get them in 2 more gate delays. That is the same time when we get the other C’s (purple guys), at 7 gate delays.
• Timing for items generated (in terms of gate delay):
black=already available (and special case for S0=4), orange=1, green=3, blue=5,
purple=7, brown=9
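
The arrival times in the bullets above can be reproduced with a small dependency calculation. This is only a sketch: the function name `arrival_times` and the dictionary layout are assumptions, and S0 is left out because it is the special case handled separately above.

```python
def arrival_times(groups=4, width=4):
    """Gate delay at which each carry and sum bit of the 16-bit CLA is ready."""
    t = {"C0": 0}             # C0 is available from the beginning
    t_pg = 1                  # all p,g
    t_PG = t_pg + 2           # group P,G from the first-tier CLLs
    for k in range(groups):
        base = k * width
        if k > 0:
            t[f"C{base}"] = t_PG + 2          # seed C from the second tier
        seed = t[f"C{base}"]
        for i in range(base + 1, base + width):
            # Carries inside a group need both the p,g's and the seed carry.
            t[f"C{i}"] = max(t_pg, seed) + 2
    for i in range(1, groups * width):
        t[f"S{i}"] = t[f"C{i}"] + 2           # each S is 2 after its carry
    return t

t = arrival_times()
print(t["C3"], t["S3"], t["C4"], t["S4"], t["C5"], t["S15"])  # → 3 5 5 7 7 9
```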
Thanks!!

(You can distribute these slides as one whole file to anywhere you feel it may be useful.)
