Carry-Save Addition
Prof. Loh
CS3220 - Processor Design - Spring 2005
February 2, 2005

1 Adding Multiple Numbers

There are many cases where it is desired to add more than two numbers together. The straightforward way of adding together $m$ numbers (all $n$ bits wide) is to add the first two, then add that sum to the next, and so on. This requires a total of $m - 1$ additions, for a total gate delay of $O(m \lg n)$ (assuming lookahead carry adders). Instead, a tree of adders can be formed, taking only $O(\lg m \cdot \lg n)$ gate delays. Using carry save addition, the delay can be reduced further still. The idea is to take 3 numbers that we want to add together, $x + y + z$, and convert them into 2 numbers $c$ and $s$ such that $x + y + z = c + s$, and to do this in $O(1)$ time. Ordinary addition cannot be performed in $O(1)$ time because the carry information must be propagated. In carry save addition, we refrain from directly passing on the carry information until the very last step. We will first illustrate the general concept with a base 10 example.

To add three numbers by hand, we typically align the three operands, and then proceed column by column in the same fashion that we perform addition with two numbers. The three digits in a column are added, and any overflow goes into the next column. Observe that when there is some non-zero carry, we are really adding four digits (the digits of $x$, $y$ and $z$, plus the carry).

    carry  1121
    x      12345
    y      38172
    z    + 20587
    sum    71104

The carry save approach breaks this process down into two steps. The first is to compute the sum ignoring any carries:

    x      12345
    y      38172
    z    + 20587
    s      60994

Each $s_i$ is equal to the sum $x_i + y_i + z_i$ modulo 10. Now, separately, we can compute the carry on a column by column basis:

    x      12345
    y      38172
    z    + 20587
    c      10110

In this case, each $c_i$ is the sum of the digits from the previous column divided by 10 (ignoring any remainder), i.e. $c_i = \lfloor (x_{i-1} + y_{i-1} + z_{i-1}) / 10 \rfloor$. Another way to look at it is that any carry out of one column gets put into the next column.
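The two column-by-column rules above can be simulated in a few lines. The following is a minimal Python sketch, assuming non-negative integer operands; the function name `decimal_carry_save` is hypothetical. Each column is handled using only its own three digits, and its carry is written into the next column of $c$ instead of being added into the next column of the sum.

```python
def decimal_carry_save(x: int, y: int, z: int) -> tuple[int, int]:
    """Base-10 carry-save step: return (s, c) with x + y + z == s + c.

    Assumes non-negative integers. No carry is ever propagated between columns.
    """
    s = 0
    c = 0
    place = 1  # weight of the current column (1, 10, 100, ...)
    while x or y or z:
        column = x % 10 + y % 10 + z % 10   # uses only this column's digits (at most 27)
        s += (column % 10) * place          # s_i = (x_i + y_i + z_i) mod 10
        c += (column // 10) * place * 10    # the carry lands in the next column, c_{i+1}
        x, y, z = x // 10, y // 10, z // 10
        place *= 10
    return s, c
```

For the operands above, `decimal_carry_save(12345, 38172, 20587)` gives `(60994, 10110)`. The loop runs serially only because this is software; no column's result depends on another column's result, which is what allows every column to be computed at the same time in hardware.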
Now, we can add together $c$ and $s$, and we'll verify that the result is indeed equal to $x + y + z$:

    s      60994
    c    + 10110
    sum    71104

The important point is that $c$ and $s$ can be computed independently, and furthermore, each $c_i$ (and $s_i$) can be computed independently from all of the other $c$'s (and $s$'s). This achieves our original goal of converting three numbers that we wish to add into two numbers that add up to the same sum, and in $O(1)$ time.

The same concept can be applied to binary numbers. As a quick example:

    x       10011
    y       11001
    z     + 01011
    s       00001
    c    + 110110
    sum    110111

What does the circuit to compute $s$ and $c$ look like? It is actually identical to the full adder, but with some of the signals renamed. Figure 1 shows a full adder and a carry save adder. A carry save adder is simply a full adder with the $c_{in}$ input renamed to $z$, the $z$ output (the original "answer" output) renamed to $s$, and the $c_{out}$ output renamed to $c$.

[Figure 1: The carry save adder block is the same circuit as the full adder.]

Figure 2 shows how $n$ carry save adders are arranged to add three $n$-bit numbers $x$, $y$ and $z$ into two numbers $c$ and $s$. Note that the CSA block in bit position zero generates $c_1$, not $c_0$. Similar to the least significant column when adding numbers by hand (the "blank"), $c_0$ is equal to zero. Note that all of the CSA blocks are independent, thus the entire circuit takes only $O(1)$ time. To get the final sum, we still need a lookahead carry adder (LCA), which will cost us $O(\lg n)$ delay. The asymptotic gate delay to add three $n$-bit numbers is thus the same as adding only two $n$-bit numbers.

[Figure 2: One CSA block is used for each bit. This circuit adds three $n = 8$ bit numbers together into two new numbers.]

So how long does it take us to add $m$ different $n$-bit numbers together? The simple approach is just to repeat this trick approximately $m$ times over. This is illustrated in Figure 3.
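A bit-level sketch of the same idea, assuming Python integers are used as bit vectors: `full_adder` models the one-bit circuit, and `carry_save_add` (a hypothetical name) models a row of $n$ CSA blocks as in Figure 2, with the $c_{in}$ input of each full adder driven by $z$. The block in bit position $i$ writes $s_i$ and $c_{i+1}$, so $c_0$ stays zero.

```python
def full_adder(x: int, y: int, cin: int) -> tuple[int, int]:
    """One-bit full adder on 0/1 inputs; returns (sum bit, carry-out bit)."""
    s = x ^ y ^ cin
    cout = (x & y) | (x & cin) | (y & cin)  # majority of the three inputs
    return s, cout


def carry_save_add(x: int, y: int, z: int, n: int) -> tuple[int, int]:
    """Row of n independent CSA blocks: x + y + z == s + c for n-bit inputs.

    Each CSA block is a full adder whose c_in input is driven by z.
    Block i produces s_i and c_{i+1}; c_0 is always 0.
    """
    s = 0
    c = 0
    for i in range(n):  # in hardware all n blocks operate in parallel
        s_i, c_next = full_adder((x >> i) & 1, (y >> i) & 1, (z >> i) & 1)
        s |= s_i << i
        c |= c_next << (i + 1)
    return s, c
```

The binary example checks out: `carry_save_add(0b10011, 0b11001, 0b01011, 5)` returns `(0b00001, 0b110110)`, and $1 + 54 = 55$, which is `0b110111`. Because the blocks never interact, the whole row is equivalent to two word-level operations: $s = x \oplus y \oplus z$, and $c$ is the bitwise majority of $x$, $y$ and $z$ shifted left by one position.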
[Figure 3: Adding $m$ $n$-bit numbers with a chain of CSAs.]

There are $m - 2$ CSA blocks (each block in the figure actually represents many one-bit CSA blocks in parallel) that we have to go through, and then the final LCA. Note that every time we pass through a CSA block, our numbers increase in size by one bit. Therefore, the numbers that go to the LCA will be at most $n + m - 2$ bits long. So the final LCA will have a gate delay of $O(\lg(n + m))$. Therefore the total gate delay is $O(m + \lg(n + m))$.

Instead of arranging the CSA blocks in a chain, a tree formation can be used. This is slightly awkward because of the odd ratio of 3 to 2. Figure 4 shows how to build a tree of CSAs. This circuit is called a Wallace tree. The depth of the tree is $O(\log m)$, since each level of CSAs reduces the number of operands by a factor of roughly 3/2. Like before, the width of the numbers will increase as we get deeper in the tree. At the end of the tree, the numbers will be $O(n + \log m)$ bits wide, and therefore the LCA will have an $O(\lg(n + \log m))$ gate delay. The total gate delay of the Wallace tree is thus $O(\log m + \lg(n + \log m))$.

[Figure 4: A Wallace tree for adding $m$ $n$-bit numbers.]
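To compare the chain and the tree, here is a rough word-level simulation in Python. It is only a sketch under stated assumptions: `csa` uses the bitwise XOR/majority form of a row of CSA blocks, the helper names `csa`, `chain_reduce` and `wallace_reduce` are hypothetical, and the tree groups operands three at a time per level and passes leftovers straight through. That models how many CSA levels are needed, not the exact per-bit wiring of a real Wallace tree. The two numbers each function returns are what would be handed to the final LCA.

```python
def csa(x: int, y: int, z: int) -> tuple[int, int]:
    """Word-level carry-save adder: every bit position is handled independently."""
    s = x ^ y ^ z
    c = ((x & y) | (x & z) | (y & z)) << 1  # majority bits move up one position
    return s, c


def chain_reduce(numbers: list[int]) -> tuple[int, int, int]:
    """Chain of CSAs as in Figure 3: each block folds in one new operand.

    Returns (a, b, levels) with a + b == sum(numbers); levels == m - 2.
    """
    a, b = numbers[0], numbers[1]
    for z in numbers[2:]:
        a, b = csa(a, b, z)
    return a, b, len(numbers) - 2


def wallace_reduce(numbers: list[int]) -> tuple[int, int, int]:
    """Tree of CSAs: each level turns every group of three operands into two.

    Returns (a, b, levels) with a + b == sum(numbers); levels is the tree depth.
    """
    if len(numbers) < 2:
        raise ValueError("need at least two operands")
    layer = list(numbers)
    levels = 0
    while len(layer) > 2:
        next_layer = []
        full = len(layer) - len(layer) % 3
        for i in range(0, full, 3):  # all CSAs in one level work in parallel
            next_layer.extend(csa(layer[i], layer[i + 1], layer[i + 2]))
        next_layer.extend(layer[full:])  # 0-2 leftover operands pass through
        layer = next_layer
        levels += 1
    return layer[0], layer[1], levels
```

With eight operands, for example, `chain_reduce` uses $m - 2 = 6$ CSA levels in series, while `wallace_reduce` needs only 4 levels (the operand count goes 8, 6, 4, 3, 2). In both cases the two outputs add up to the original sum, so only one carry-propagating addition remains.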
