11.2 Datapaths in Digital Processor Architectures

We introduced the concept of a digital processor in Chapter 8. Its components consist of the datapath, memory, control, and input/output blocks. The datapath is the core of the processor; this is where all computations are performed. The other blocks in the processor are support units that either store the results produced by the datapath or help to determine what will happen in the next cycle. A typical datapath consists of an interconnection of basic combinational functions, such as arithmetic operators (addition, multiplication, comparison, and shift) or logic (AND, OR, and XOR). The design of the arithmetic operators is the topic of this chapter.

Datapaths often are arranged in a bit-sliced organization, as shown in Figure 11-1. Instead of operating on single-bit digital signals, the data in a processor are arranged in a word-based fashion. Typical microprocessor datapaths are 32 or 64 bits wide, while the dedicated signal-processing datapaths, such as those in DSL modems, magnetic disk drives, or compact-disc players, are of arbitrary width, typically 5 to 24 bits. For instance, a 32-bit processor operates on data words that are 32 bits wide. This is reflected in the organization of the datapath. Since the same operation frequently has to be performed on each bit of the data word, the datapath consists of 32 bit slices, each operating on a single bit; hence the term bit-sliced. Bit slices are either identical or resemble a similar structure for all bits. The datapath designer can concentrate on the design of a single slice that is repeated 32 times.

11.3 The Adder

Addition is the most commonly used arithmetic operation. It often is the speed-limiting element as well. Therefore, careful optimization of the adder is of the utmost importance. This optimization can proceed either at the logic or circuit level. Typical logic-level optimizations try to rearrange the Boolean equations so that a faster or smaller circuit is obtained. An example of such a logic optimization is the carry-lookahead adder discussed later in the chapter.
Circuit optimizations, on the other hand, manipulate transistor sizes and circuit topology to optimize the speed. Before considering both optimization processes, we provide a short summary of the basic definitions of an adder circuit (as defined in any book on logic design [e.g., Katz94]).

11.3.1 The Binary Adder: Definitions

Table 11-1 shows the truth table of a binary full adder. $A$ and $B$ are the adder inputs, $C_i$ is the carry input, $S$ is the sum output, and $C_o$ is the carry output. The Boolean expressions for $S$ and $C_o$ are given in Eq. (11.1).

Chapter 11 • Designing Arithmetic Building Blocks

Table 11-1 Truth table for full adder.

A   B   Ci  |  S   Co  |  Carry status
0   0   0   |  0   0   |  delete
0   0   1   |  1   0   |  delete
0   1   0   |  1   0   |  propagate
0   1   1   |  0   1   |  propagate
1   0   0   |  1   0   |  propagate
1   0   1   |  0   1   |  propagate
1   1   0   |  0   1   |  generate
1   1   1   |  1   1   |  generate

$$S = A \oplus B \oplus C_i = A\bar{B}\bar{C}_i + \bar{A}B\bar{C}_i + \bar{A}\bar{B}C_i + ABC_i$$
$$C_o = AB + BC_i + AC_i \tag{11.1}$$

It is often useful from an implementation perspective to define $S$ and $C_o$ as functions of some intermediate signals $G$ (generate), $D$ (delete), and $P$ (propagate). $G = 1$ ($D = 1$) ensures that a carry bit will be generated (deleted) at $C_o$ independent of $C_i$, while $P = 1$ guarantees that an incoming carry will propagate to $C_o$. Expressions for these signals can be derived from inspection of the truth table:

$$G = AB$$
$$D = \bar{A}\bar{B} \tag{11.2}$$
$$P = A \oplus B$$

We can rewrite $S$ and $C_o$ as functions of $P$ and $G$ (or $D$):

$$C_o(G, P) = G + PC_i$$
$$S(G, P) = P \oplus C_i \tag{11.3}$$

Notice that $G$ and $P$ are only functions of $A$ and $B$ and are not dependent upon $C_i$. In a similar way, we can also derive expressions for $S(D, P)$ and $C_o(D, P)$.

Figure 11-2 Four-bit ripple-carry adder: topology.

An $N$-bit adder can be constructed by cascading $N$ full-adder (FA) circuits in series, connecting $C_{o,k-1}$ to $C_{i,k}$ for $k = 1$ to $N - 1$, and the first carry-in $C_{i,0}$ to 0 (Figure 11-2).
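The definitions above can be sketched behaviorally in Python. This is only an illustrative model, not a circuit description: the function names `full_adder` and `ripple_carry_add` and the lsb-first bit-list convention are assumptions made for this sketch. It builds the full adder from the generate and propagate signals of Eq. (11.2) and Eq. (11.3), checks it against Eq. (11.1), and cascades it as in Figure 11-2.

```python
from itertools import product

def full_adder(a, b, ci):
    """One-bit full adder written with generate/propagate signals."""
    g = a & b            # generate: carry-out is 1 regardless of ci
    p = a ^ b            # propagate: carry-out follows ci
    s = p ^ ci           # S = P xor Ci         (Eq. 11.3)
    co = g | (p & ci)    # Co = G + P*Ci        (Eq. 11.3)
    return s, co

def ripple_carry_add(a_bits, b_bits, c_in=0):
    """Cascade one full adder per bit; bit lists are lsb first (sketch convention)."""
    carry = c_in
    sums = []
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        sums.append(s)
    return sums, carry

# Exhaustive check of the P/G form against the direct expressions of Eq. (11.1).
for a, b, ci in product((0, 1), repeat=3):
    s, co = full_adder(a, b, ci)
    assert s == (a + b + ci) % 2
    assert co == (a & b) | (b & ci) | (a & ci)
```

Writing the cell in terms of $P$ and $G$ makes it explicit that only the `p & ci` term depends on the incoming carry, which is the property the faster adder structures later in the section exploit.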
The cascaded configuration of Figure 11-2 is called a ripple-carry adder, since the carry bit "ripples" from one stage to the other. The delay through the circuit depends upon the number of logic stages that must be traversed and is a function of the applied input signals. For some input signals, no rippling effect occurs at all, while for others, the carry has to ripple all the way from the least significant bit (lsb) to the most significant bit (msb). The propagation delay of such a structure (also called the critical path) is defined as the worst case delay over all possible input patterns.

In the case of the ripple-carry adder, the worst case delay happens when a carry generated at the least significant bit position propagates all the way to the most significant bit position. This carry is finally consumed in the last stage to produce the sum. The delay is then proportional to the number of bits in the input words $N$ and is approximated by

$$t_{adder} \approx (N - 1)\,t_{carry} + t_{sum} \tag{11.4}$$

where $t_{carry}$ and $t_{sum}$ equal the propagation delays from $C_i$ to $C_o$ and $S$, respectively.

Example 11.1 Propagation Delay of Ripple-Carry Adder

Derive the values of $A_k$ and $B_k$ ($k = 0 \ldots N - 1$) so that the worst case delay is obtained for the ripple-carry adder.

The worst case condition requires that a carry be generated at the lsb position. Since the input carry of the first full adder $C_{i,0}$ is always 0, this means that both $A_0$ and $B_0$ must equal 1. All the other stages must be in propagate mode. Hence, exactly one of $A_k$ and $B_k$ must be high. Finally, we would like to physically measure the delay of a transition on the msb sum bit. Assuming an initial value of 0 for $S_{N-1}$, we must arrange a 0 → 1 transition. This is achieved by setting both $A_{N-1}$ and $B_{N-1}$ to 0 (or 1), which yields a high sum bit given the incoming carry of 1.

For example, the following values for $A$ and $B$ trigger the worst case delay for an 8-bit addition:

A: 00000001; B: 01111111

To set up the worst case delay transition, all the inputs can be kept constant with $A_0$ undergoing a 0 → 1 transition. The left-most bit represents the msb in this binary representation. Observe that this is only one of the many worst case patterns.
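As a rough check of Example 11.1, the following Python sketch estimates the settling time of a ripple-carry adder for a given input pattern. It is a simplified arrival-time model, not a transistor-level simulation. The function names, the lsb-first bit lists, and the rule that a stage with $P = 0$ fixes its carry-out after one $t_{carry}$ are assumptions of this sketch.

```python
def ripple_delay(a_bits, b_bits, t_carry=1.0, t_sum=1.0):
    """Estimated settling time of a ripple-carry adder for one input pattern.

    Bit lists are lsb first. A stage with P = 0 (generate or delete) fixes its
    carry-out locally after t_carry; a stage with P = 1 must wait for its carry-in.
    """
    ci_time = 0.0    # C_i,0 is tied to 0 and is stable from the start
    latest = 0.0
    for a, b in zip(a_bits, b_bits):
        p = a ^ b
        latest = max(latest, ci_time + t_sum)            # S = P xor Ci needs the carry-in
        ci_time = ci_time + t_carry if p else t_carry    # arrival time of this stage's carry-out
    return max(latest, ci_time)

def to_bits(value, n):
    """Integer to an lsb-first list of n bits."""
    return [(value >> k) & 1 for k in range(n)]

n = 8
worst = ripple_delay(to_bits(0b00000001, n), to_bits(0b01111111, n))
eq_11_4 = (n - 1) * 1.0 + 1.0   # (N - 1) * t_carry + t_sum with unit delays
print(worst, eq_11_4)
```

With unit delays, the pattern of Example 11.1 and Eq. (11.4) both evaluate to 8 in this model, whereas an all-zero input pattern settles after 2 unit delays.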
The pattern of Example 11.1 exercises the 0 → 1 delay of the final sum. Derive several other cases that exercise the 0 → 1 and 1 → 0 transitions.

Two important conclusions can be drawn from Eq. (11.4):

• The propagation delay of the ripple-carry adder is linearly proportional to $N$. This property becomes increasingly important when designing adders for the wide data paths ($N = 16 \ldots 128$) that are desirable in current and future computers.
• When designing the full-adder cell for a fast ripple-carry adder, it is far more important to optimize $t_{carry}$ than $t_{sum}$, since the latter has only a minor influence on the total value of $t_{adder}$.

Before starting an in-depth discussion on the circuit design of full-adder cells, the following additional logic property of the full adder is worth mentioning:

Inverting all inputs to a full adder results in inverted values for all outputs.

This property, also called the inverting property, is expressed in the pair of equations

$$\bar{S}(A, B, C_i) = S(\bar{A}, \bar{B}, \bar{C}_i)$$
$$\bar{C}_o(A, B, C_i) = C_o(\bar{A}, \bar{B}, \bar{C}_i) \tag{11.5}$$

and will be extremely useful when optimizing the speed of the ripple-carry adder. It states that the circuits of Figure 11-3 are identical.

Figure 11-3 Inverting property of the full adder. The circles in the schematics represent inverters.

11.3.2 The Full Adder: Circuit Design Considerations

Static Adder Circuit

One way to implement the full-adder circuit is to take the logic equations of Eq. (11.1) and translate them directly into complementary CMOS circuitry. Some logic manipulations can help to reduce the transistor count. For instance, it is advantageous to share some logic between the sum- and carry-generation subcircuits, as long as this does not slow down the carry generation, which is the most critical part, as stated previously. The following is an example of such a reorganized equation set:

$$C_o = AB + BC_i + AC_i$$
$$S = ABC_i + \bar{C}_o(A + B + C_i) \tag{11.6}$$

Figure 11-4 Complementary static CMOS implementation of full adder.

The equivalence with the original equations is easily verified.
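One way to carry out that verification is an exhaustive check over the eight input combinations. The short Python sketch below does so for the reorganized sum of Eq. (11.6) and, at the same time, for the inverting property of Eq. (11.5). The function names are chosen for this sketch only.

```python
from itertools import product

def sum_direct(a, b, ci):
    """S = A xor B xor Ci, as in Eq. (11.1)."""
    return a ^ b ^ ci

def carry(a, b, ci):
    """Co = AB + BCi + ACi, as in Eq. (11.1)."""
    return (a & b) | (b & ci) | (a & ci)

def sum_shared(a, b, ci):
    """Eq. (11.6): S = A*B*Ci + not(Co)*(A + B + Ci), reusing the carry logic."""
    co = carry(a, b, ci)
    return (a & b & ci) | ((1 - co) & (a | b | ci))

for a, b, ci in product((0, 1), repeat=3):
    na, nb, nci = 1 - a, 1 - b, 1 - ci
    # Reorganized sum equals the original sum.
    assert sum_shared(a, b, ci) == sum_direct(a, b, ci)
    # Inverting property: inverting all inputs inverts both outputs.
    assert sum_direct(na, nb, nci) == 1 - sum_direct(a, b, ci)
    assert carry(na, nb, nci) == 1 - carry(a, b, ci)
```

The check passes because the sum is 1 exactly when one or three inputs are high: the $ABC_i$ term covers the three-input case, and $\bar{C}_o(A + B + C_i)$ covers the case of exactly one high input.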
The corresponding adder design, using complementary static CMOS, is shown in Figure 11-4 and requires 28 transistors. In addition to consuming a large area, this circuit is slow:

• Tall PMOS transistor stacks are present in both carry- and sum-generation circuits.
• The intrinsic load capacitance of the $C_o$ signal is large and consists of two diffusion capacitances in addition to gate and wiring capacitances.
• The signal propagates through two inverting stages in the carry-generation circuit. As mentioned earlier, minimizing the carry-path delay is the prime goal of the designer of high-speed adder circuits. Given the small load (fan-out) at the output of the carry chain, having two logic stages is too high a number, and leads to extra delay.
• Sum generation requires one extra logic stage, but that is not that important, since this delay appears only once in the propagation delay of the ripple-carry adder of Eq. (11.4).

Figure 11-5 Inverter elimination in carry path (even and odd cells). FA' stands for a full adder without the inverter in the carry path.

Although slow, the circuit includes some smart design tricks. Notice that the first gate of the carry-generation circuit is designed with the $C_i$ signal on the smaller PMOS stack, lowering its logical effort to 2. Also, the NMOS and PMOS transistors connected to $C_i$ are placed as close as possible to the output of the gate. This is a direct application of a circuit-optimization technique discussed in Section 4.2: transistors on the critical path should be placed as close as possible to the output of the gate. For instance, in stage $k$ of the adder, signals $A_k$ and $B_k$ are available and stable long before $C_{i,k}$ ($= C_{o,k-1}$) arrives after rippling through the previous stages. In this way, the capacitances of the internal nodes in the transistor chain are precharged or discharged in advance. On arrival of $C_{i,k}$, only the capacitance of node X has to be (dis)charged.
Putting the $C_i$ transistors closer to $V_{DD}$ and GND would require not only the (dis)charging of the capacitance of node X, but also of the internal capacitances.

The speed of this circuit can now be improved gradually by using some of the adder properties discussed in the previous section. First, the number of inverting stages in the carry path can be reduced by exploiting the inverting property: inverting all the inputs of a full-adder cell also inverts all the outputs. This rule allows us to eliminate an inverter in a carry chain, as illustrated in Figure 11-5.

Mirror Adder Design

An improved adder circuit, also called the mirror adder, is shown in Figure 11-6 [Weste93]. Its operation is based on Eq. (11.3). The carry-generation circuitry is worth analyzing. First, the carry-inverting gate is eliminated, as suggested in the previous section. Secondly, the PDN and PUN networks of the gate are not dual. Instead, they form a clever implementation of the propagate/generate/delete function: when either $D$ or $G$ is high, $\bar{C}_o$ is set to $V_{DD}$ or GND, respectively. When the conditions for a propagate are valid (or $P$ is 1), the incoming carry is propagated (in inverted format) to $\bar{C}_o$. This results in a considerable reduction in both area and delay. The analysis of the sum circuitry is left to the reader. The following observations are worth considering:

• This full-adder cell requires only 24 transistors.
• The NMOS and PMOS chains are completely symmetrical, which still yields correct operation due to the self-duality of both the sum and carry functions. As a result, a maximum of two series transistors can be found in the carry-generation circuitry.
• The transistors connected to $C_i$ are placed closest to the output of the gate.
• Only the transistors in the carry stage have to be optimized for speed. All transistors in the sum stage can be of minimum size. When laying out the cell, the most critical issue is the minimization of the capacitance at node $\bar{C}_o$. Shared diffusions reduce the stack node capacitances.

Figure 11-6 Mirror adder: circuit schematics.

In the adder cell of Figure 11-4, the inverter can be sized independently to drive the $C_i$ input of the adder stage that follows. If the carry circuit in Figure 11-6 is symmetrically sized, each of its inputs has a logical effort of 2.
This means that the optimal fan-out, sized for minimum delay, should be (4/2) = 2. However, the output of this stage drives two internal gate capacitances and six gate capacitances in the connecting adder cell. A clever solution to keep the transistor sizes the same in each stage is to increase the size of the carry stage to about three to four times the size of the sum stage. This maintains the optimal fan-out of 2. The resulting transistor sizes are annotated on Figure 11-6, where a PMOS/NMOS ratio of 2 is assumed.

Transmission-Gate-Based Adder

A full adder can be designed to use multiplexers and XORs. While this is impractical in a complementary CMOS implementation, it becomes attractive when the multiplexers and XORs are implemented as transmission gates. A full-adder implementation based on this approach is shown in Figure 11-7 and uses 24 transistors. It is based on the propagate-generate model introduced in Eq. (11.3). The propagate signal, which is the XOR of inputs $A$ and $B$, is used to select the true or complementary value of the input carry as the new sum output. Based on the propagate signal, the output carry is either set to the input carry, or to one of the inputs $A$ or $B$. One of the interesting features of such an adder is that it has similar delays for both sum and carry outputs.

Figure 11-7 Transmission-gate-based full-adder cell with sum and carry delays of similar value (after [Weste93]).

Manchester Carry-Chain Adder

The carry-propagation circuitry in Figure 11-7 can be simplified by adding generate and delete signals, as shown in Figure 11-8. The propagation path is unchanged, and it passes $C_i$ to the $C_o$ output if the propagate signal ($A_i \oplus B_i$) is true. If the propagate condition is not satisfied, the output is either pulled low by the $D_i$ signal or pulled up by $\bar{G}_i$. The dynamic implementation (Figure 11-8b) makes even further simplification possible. Since the transitions in a dynamic circuit are monotonic, the transmission gates can be replaced by NMOS-only pass transistors. Precharging the output eliminates the need for the kill signal (for the case in which the carry chain propagates the complementary values of the carry signals).

A Manchester carry-chain adder uses a cascade of pass transistors to implement the carry chain [Kilburn60]. An example, based on the dynamic circuit version introduced in Figure 11-8b, is shown in Figure 11-9. During the precharge phase, all intermediate nodes of the pass-transistor carry chain are precharged to $V_{DD}$. During evaluation, the $A_k$ node is discharged when there is an incoming carry and the propagate signal $P_k$ is high, or when the generate signal for stage $k$ ($G_k$) is high.
Figure 11-10 shows an example layout of the Manchester carry chain in stick-diagram format. The datapath layout consists of three rows of cells organized in bit-sliced style: The top row of cells computes the propagate and generate signals, the middle row propagates the carry from left to right, and the bottom row generates the final sums.

Figure 11-8 Manchester carry gates. (a) Static, using propagate, generate, and kill; (b) dynamic implementation, using only propagate and generate signals.

Figure 11-9 Manchester carry-chain adder in dynamic logic (four-bit section).

The worst case delay of the carry chain of the adder in Figure 11-9 is modeled by the linearized RC network of Figure 11-11. As derived in Chapter 4, the propagation delay of such a network equals

$$t_p = 0.69 \sum_{i=1}^{N} C_i \left( \sum_{j=1}^{i} R_j \right) = 0.69\,\frac{N(N+1)}{2}\,RC \tag{11.7}$$

when all $C_i = C$ and $R_j = R$.

Figure 11-10 Stick diagram of two bits of a Manchester carry chain (propagate/generate row, carry row, inverter/sum row).

Figure 11-11 Equivalent network to determine propagation delay of a carry chain.

Example 11.2 Sizing of Manchester Carry Chain

The capacitance per node on the carry chain equals four diffusion capacitances, one inverter input capacitance, and the wiring capacitance proportional to the size of the cell. The inverter and the PMOS precharging transistor can be kept at unit size. Together with the wire capacitance, the fixed capacitance can be estimated as 15 fF (for our technology). If a unit-sized transistor with width $W_0$ has a resistance of 10 kΩ and a diffusion capacitance of 2 fF, then the RC time constant for a chain of transistors of width $W$ is

$$RC = \left( 6\,\text{fF} \cdot \frac{W}{W_0} + 15\,\text{fF} \right) \cdot \frac{10\,\text{k}\Omega}{W / W_0}$$
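The delay model of Eq. (11.7) and the numbers of Example 11.2 can be explored with a small Python sketch. The function names and parameter split are hypothetical. In particular, the 6 fF width-dependent term assumes that three of the four 2 fF diffusion capacitances per node scale with the pass-transistor width, while the unit-sized precharge device is counted in the fixed 15 fF.

```python
def elmore_delay(resistances, capacitances):
    """0.69 * sum_i C_i * (R_1 + ... + R_i) for an RC chain, as in Eq. (11.7)."""
    total, r_upstream = 0.0, 0.0
    for r, c in zip(resistances, capacitances):
        r_upstream += r            # resistance between the source and node i
        total += c * r_upstream
    return 0.69 * total

def node_rc(width_ratio, c_fixed=15e-15, c_diff_scaled=6e-15, r_unit=10e3):
    """R and C of one carry-chain node for pass transistors of width W = width_ratio * W0.

    Default values follow Example 11.2; the 6 fF scaled term is an assumption
    of this sketch (see the lead-in).
    """
    r = r_unit / width_ratio
    c = c_diff_scaled * width_ratio + c_fixed
    return r, c

for w in (1, 2, 4):
    r, c = node_rc(w)
    for n in (4, 8):
        tp = elmore_delay([r] * n, [c] * n)
        closed_form = 0.69 * n * (n + 1) / 2 * r * c
        print(f"W/W0={w}  N={n}  tp={tp * 1e12:.0f} ps  closed form={closed_form * 1e12:.0f} ps")
```

The loop illustrates two trends under these assumptions: the per-node RC product shrinks as the width grows, and the chain delay grows roughly with the square of the number of stages.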
Increasing the transistor width reduces the RC time constant of Example 11.2, but it also loads the gate in the previous stage. Therefore, the transistor size is limited by the input loading capacitance.

Unfortunately, the distributed RC-nature of the carry chain results in a propagation delay that is quadratic in the number of bits $N$. To avoid this, it is necessary to insert signal-buffering inverters. The optimum number of stages per buffer depends on the equivalent resistance of the inverter and the resistance and capacitance of the pass transistors, as was discussed in Chapter 4. In our technology and in most other practical cases, this number is between 3 and 4. Adding the inverter makes the overall propagation delay a linear function of $N$, as is the case with ripple-carry adders.

11.3.3 The Binary Adder: Logic Design Considerations

The ripple-carry adder is only practical for the implementation of additions with a relatively small word length. Most desktop computers use word lengths of 32 bits, while servers require 64; very fast computers, such as mainframes, supercomputers, or multimedia processors (e.g., the PlayStation 2) [Suzuoki99], require word lengths of up to 128 bits. The linear dependence of the adder speed on the number of bits makes the usage of ripple adders rather impractical. Logic optimizations are therefore necessary; we discuss a number of them in the sections that follow. We concentrate on the circuit design implications, since most of the presented structures are well known from the traditional logic design literature.

The Carry-Bypass Adder

Consider the four-bit adder block of Figure 11-12a. Suppose that the values of $A_k$ and $B_k$ ($k = 0 \ldots 3$) are such that all propagate signals $P_k$ ($k = 0 \ldots 3$) are high. An incoming carry $C_{i,0} = 1$ propagates under those conditions through the complete adder chain and causes an outgoing carry $C_{o,3} = 1$. In other words,

if ($P_0 P_1 P_2 P_3 = 1$) then $C_{o,3} = C_{i,0}$
else either DELETE or GENERATE occurred.     (11.8)

This information can be used to speed up the operation of the adder, as shown in Figure 11-12b.
When $BP = P_0 P_1 P_2 P_3 = 1$, the incoming carry is forwarded immediately to the next block through the bypass transistor $M_b$; hence the name carry-bypass adder or carry-skip adder [Lehman62]. If this is not the case, the carry is obtained by way of the normal route.

Figure 11-12 Carry-bypass structure: basic concept. (a) Carry propagation; (b) adding a bypass.

Example 11.3 Carry Bypass in Manchester Carry-Chain Adder

Figure 11-13 shows the possible carry-propagation paths when the full-adder circuit is implemented in Manchester-carry style. This picture demonstrates how the bypass speeds up the addition: The carry propagates either through the bypass path, or a carry is generated somewhere in the chain. In both cases, the delay is smaller than in the normal ripple configuration. The area overhead incurred by adding the bypass path is small and typically ranges between 10 and 20%. However, adding the bypass path breaks the regular bit-slice structure (as was present in Figure 11-10).

Figure 11-13 Manchester carry-chain implementation of bypass adder.

Let us now compute the delay of an $N$-bit adder. At first, we assume that the total adder is divided in ($N/M$) equal-length bypass stages, each of which contains $M$ bits. An approximating expression for the total propagation time can be derived from Figure 11-14 and is given in Eq. (11.9). Namely,

$$t_{adder} = t_{setup} + M\,t_{carry} + \left( \frac{N}{M} - 1 \right) t_{bypass} + (M - 1)\,t_{carry} + t_{sum} \tag{11.9}$$

with the composing parameters defined as follows:

• $t_{setup}$: the fixed overhead time to create the generate and propagate signals.
• $t_{carry}$: the propagation delay through a single bit. The worst case carry-propagation delay through a single stage of $M$ bits is approximately $M$ times larger.
• $t_{bypass}$: the propagation delay through the bypass multiplexer of a single stage.
• $t_{sum}$: the time to generate the sum of the final stage.

Figure 11-14 ($N$ = 16) carry-bypass adder: composition. The worst case delay path is shaded in gray.

The critical path is shaded in gray on the block diagram of Figure 11-14. From Eq.
(11.9), it follows that $t_{adder}$ is still linear in the number of bits $N$, since in the worst case, the carry is generated at the first bit position, ripples through the first block, skips around ($N/M - 2$) bypass stages, and is consumed at the last bit position without generating an output carry. The optimal number of bits per skip block is determined by technological parameters such as the extra delay of the bypass-selecting multiplexer, the buffering requirements in the carry chain, and the ratio of the delay through the ripple and the bypass paths. Although still linear, the slope of the delay function is smaller than for the ripple-carry adder.

Problem 11.1 Delay of Carry-Skip Adder

Determine an input pattern that triggers the worst case delay in a 16-bit (4 × 4) carry-bypass adder. Assuming that $t_{carry} = t_{setup} = t_{bypass} = t_{sum} = 1$, determine the delay and compare it with that of a normal ripple adder.

The Linear Carry-Select Adder

In a ripple-carry adder, every full-adder cell has to wait for the incoming carry before an outgoing carry can be generated. One way to get around this linear dependency is to anticipate both possible values of the carry input and evaluate the result for both possibilities in advance. Once the real value of the incoming carry is known, the correct result is easily selected with a simple multiplexer stage. An implementation of this idea, appropriately called the carry-select adder [Bedrij62], is demonstrated in Figure 11-16. Consider the block of adders, which is adding bits $k$ to $k + 3$. Instead of waiting on the arrival of the output carry of bit $k - 1$, both the 0 and 1 possibilities are analyzed. From a circuit point of view, this means that two carry paths are implemented. When $C_{o,k-1}$ finally settles, either the result of the 0 or the 1 path is selected by the multiplexer, which can be performed with a minimal delay. As is evident from Figure 11-16, the hardware overhead of the carry-select adder is restricted to an additional carry path and a multiplexer, and equals about 30% with respect to a ripple-carry structure.

A full carry-select adder is now constructed by chaining a number of equal-length adder stages, as in the carry-bypass approach (see Figure 11-17).
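The carry-select principle can be sketched as a behavioral Python model. It is only a functional illustration, with hypothetical function names and lsb-first bit lists: each block computes its result for both possible carry-in values, and the actual carry from the previous block merely selects one of the two.

```python
def ripple_block(a_bits, b_bits, c_in):
    """Ripple-carry addition of one block; bit lists are lsb first."""
    sums, carry = [], c_in
    for a, b in zip(a_bits, b_bits):
        sums.append(a ^ b ^ carry)
        carry = (a & b) | ((a ^ b) & carry)
    return sums, carry

def carry_select_add(a_bits, b_bits, block_size=4, c_in=0):
    """Each block precomputes results for carry-in 0 and 1; a multiplexer picks one."""
    sums, carry = [], c_in
    for start in range(0, len(a_bits), block_size):
        a_blk = a_bits[start:start + block_size]
        b_blk = b_bits[start:start + block_size]
        result_if_0 = ripple_block(a_blk, b_blk, 0)   # "0" carry path
        result_if_1 = ripple_block(a_blk, b_blk, 1)   # "1" carry path
        # Multiplexer: the incoming block carry selects the precomputed result.
        blk_sums, carry = result_if_1 if carry else result_if_0
        sums.extend(blk_sums)
    return sums, carry

def to_bits(value, n):
    return [(value >> k) & 1 for k in range(n)]

def from_bits(bits):
    return sum(bit << k for k, bit in enumerate(bits))

# Spot check against integer addition for a 16-bit adder built from 4-bit blocks.
for x, y in [(0, 0), (1, 0xFFFF), (0x1234, 0xABCD), (0xFFFF, 0xFFFF)]:
    s, c = carry_select_add(to_bits(x, 16), to_bits(y, 16))
    assert from_bits(s) + (c << 16) == x + y
```

In hardware the two `ripple_block` evaluations run in parallel, so only the selection step lies on the path of the incoming carry; the sequential loop here models the function, not the timing.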
In Figure 11-17, the critical path is shaded in gray. From inspection of the circuit, we can derive a first-order model of the worst case propagation delay of the module, written as

$$t_{add} = t_{setup} + M\,t_{carry} + \left( \frac{N}{M} \right) t_{mux} + t_{sum} \tag{11.10}$$

where $t_{setup}$, $t_{sum}$, and $t_{mux}$ are fixed delays and $N$ and $M$ represent the total number of bits, and the number of bits per stage, respectively. $t_{carry}$ is the delay of the carry through a single full-adder cell. The carry delay through a single block is proportional to the length of that stage or equals $M\,t_{carry}$.

The propagation delay of the adder is, again, linearly proportional to $N$ (Eq. (11.10)). The reason for this linear behavior is that the block-select signal that selects between the 0 and 1 solutions still has to ripple through all stages in the worst case.

Figure 11-16 Four-bit carry-select module: topology.

Figure 11-17 Sixteen-bit linear carry-select adder. The critical path is shaded in gray.

Problem 11.2 Linear Carry-Select Delay

Determine the delay of a 16-bit linear carry-select adder by using unit delays for all cells.

The Square-Root Carry-Select Adder

The next structure illustrates how an alert designer can make a major impact. To optimize a design, it is essential to locate the critical timing path first. Consider the case of a 16-bit linear carry-select adder. To simplify the discussion, assume that the full-adder and multiplexer cells have identical propagation delays equal to a normalized value of 1. The worst case arrival times of the signals at the different network nodes with respect to the time the input is applied are marked and annotated on Figure 11-18a. This analysis demonstrates that the critical path of the adder ripples through the multiplexer networks of the subsequent stages.

One striking opportunity is readily apparent. Consider the multiplexer gate in the last adder stage. The inputs to this multiplexer are the two carry chains of the block and the block-multiplexer signal from the previous stage. A major mismatch between the arrival times of the signals can be observed. The results of the carry chains are stable long before the multiplexer signal arrives.
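The arrival-time argument can be reproduced with a few lines of Python. The sketch below is a unit-delay bookkeeping model under the assumptions stated in the text (all cell delays equal to 1). The function name, the tuple layout, and the assumption that the first stage's carry-in is known at time 0 are choices made for this illustration.

```python
def carry_select_timing(stage_sizes, t_setup=1, t_carry=1, t_mux=1, t_sum=1):
    """Arrival times (in unit delays) in a carry-select adder.

    Returns per-stage tuples (carry_ready, select_ready, mux_out) and the total delay.
    """
    stages = []
    select_ready = 0                 # carry-in of the first stage is known at t = 0
    for m in stage_sizes:
        carry_ready = t_setup + m * t_carry               # both carry chains of the block
        mux_out = max(carry_ready, select_ready) + t_mux  # mux waits for the later input
        stages.append((carry_ready, select_ready, mux_out))
        select_ready = mux_out       # block-select signal for the next stage
    total = select_ready + t_sum     # final sum generation after the last multiplexer
    return stages, total

stages, total = carry_select_timing([4, 4, 4, 4])
for k, (carry_ready, select_ready, mux_out) in enumerate(stages):
    print(f"stage {k}: carry chains ready at {carry_ready}, "
          f"select arrives at {select_ready}, mux output at {mux_out}")
print("total delay:", total)
```

For four 4-bit stages the total is 10 unit delays, in agreement with Eq. (11.10), and in the last stage the carry chains are ready at time 5 while the select signal arrives only at time 8. Passing a different list of stage sizes lets one experiment with unequal blocks.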
It makes sense to equalize the delay through both paths. This can be achieved by progressively adding more bits to the subsequent stages in the adder, requiring more time for the generation of the carry signals. For example, the first stage can add 2 bits, the second contains 3, the third has 4, and so forth, as demonstrated in Figure 11-18b. This adder topology is faster than the linear organization, even though an extra stage is needed. In fact, the same propagation delay is also valid for a 20-bit adder. Observe that the discrepancy in arrival times at the multiplexer nodes has been eliminated.

Figure 11-18 Sixteen-bit carry-select adder with annotated worst case arrival times. (a) Linear configuration; (b) square-root configuration.

In effect, the simple trick of making the adder stages progressively longer results in an adder structure with sublinear delay characteristics. This is illustrated by the following analysis: Assume that an $N$-bit adder contains $P$ stages, and the first stage adds $M$ bits. An additional bit is added to each subsequent stage. The following relation then holds:

$$N = M + (M + 1) + (M + 2) + \cdots + (M + P - 1) = MP + \frac{P(P - 1)}{2} = \frac{P^2}{2} + P\left( M - \frac{1}{2} \right) \tag{11.11}$$

If $M \ll N$ (e.g., $M = 2$ and $N = 64$), the first term dominates, and Eq. (11.11) can be simplified to

$$N \approx \frac{P^2}{2} \tag{11.12}$$

or

$$P = \sqrt{2N} \tag{11.13}$$

Equation (11.13) can be used to express $t_{add}$ as a function of $N$ by rewriting Eq. (11.10):

$$t_{add} = t_{setup} + M\,t_{carry} + \left( \sqrt{2N} \right) t_{mux} + t_{sum} \tag{11.14}$$

The delay is proportional to $\sqrt{N}$ for large adders ($N \gg M$), or $t_{add} = O(\sqrt{N})$. This square-root relation has a major impact, which is illustrated in Figure 11-19, where the delays of both the linear and square-root select adders are plotted as a function of $N$. It can be observed that for large values of $N$, $t_{add}$ becomes almost a constant.

Figure 11-19 Propagation delay of square-root carry-select adder versus linear ripple and select adders. The unit delay model is used to model the cell delays.

Problem 11.3 Unequal Bypass Groups in Carry-Bypass Adder

A designer might be interested in applying the previous technique to carry-bypass adders. We saw earlier that their delay is a linear function of the number of bits. Can they be modified to achieve better than linear delay by using variable group sizes?

It would make sense to make the consecutive groups gradually larger. However, the technique used in the carry-select adder does not directly apply to this case, and a progressive increase in stage sizes eventually
However, the technique used in Pek alder doesnot directly apply to this case, and a progresive increase in stage sizes eventually SE te dey Consider a carry-bypass adder in which the last stage i the largest: The carry signal ‘Sete trough that stage and gets consumed a the msb postion (with 0 chance of bypassing it is ‘kal path for the sum generation. Increasing the size ofthe last group does not help the problem. this dixctssion and assuming constant delays for cary and bypass gates, skeich the profile te, ‘Xe brs network ta achieves delay thet net than near gee ae ace sme Chapter 11 + Designing Arithmetic Building Doe, The Carry-Lookahead Adder* “The Monolithic Lookahead Adder When designing even faster adders, i is essential tp see tat al resent in one Fm OF another in by, aarti ears lookahead principle offers a posible way 4g etre the towing relation ls fr ex around the rippling effect of ccarry-bypass and carry:select so [WeinhergerS6, MacSorley6t]. As stated position in an N-bit adder: Can = SA Bi Cant) = Get PACot1 di “The dependency Detween Ca and C,.. canbe efiminated by expanding Coa Con = G+ PGr-n Pe 1Coae-2) a Ina fally expanded form, Con = Gt Pea + Pr alors Pu(Go* PoCi0))) aun ‘with Co typically equal t 0. “Phi expanded relationship can be used to implement an N-bit adder. For every bi the ‘cary and sum outputs are independent ofthe previous bits. The ripple effect has thus been ef. tively eliminated, and the addition time should be independent ofthe numberof bits. A black