n×n Carry-Save Multipliers without Final Addition

Paolo Montuschi, Luigi Ciminiera
Dipartimento di Automatica e Informatica, Politecnico di Torino, corso Duca degli Abruzzi 24, 10129 Torino (Italy)

Abstract: Carry-save multipliers require an adder at the last step to convert the carry-sum representation of the most significant half of the result into an irredundant form. This paper presents a multiplication scheme where this conversion is performed with a circuit operating in parallel with the carry-save array. The resulting implementation, when a radix-2 adder array is used, produces a result on $2n$ bits with a delay comparable to the multiplier proposed by Ercegovac and Lang in [13]. When a radix-4 array is employed, the proposed unit is almost twice as fast as the units proposed by Nakamura in [18] and Jullien et al. in [27].

Keywords: carry-save addition, multiplication, on-the-fly conversion, redundant number representations

1 Introduction

The design of high-speed multipliers has [...]

Many of the proposed multipliers use, internally, a carry-sum representation of the partial accumulated products. The carry-sum is one of the early types of redundant representation introduced in order to circumvent, in addition, the problem of carry propagation. The signed-digit redundant representation is ideal for representing results produced both by iterative algorithms for SRT [20] division and square root [11] and by on-line algorithms [12].
On the other hand, the carry-sum form is more popular than signed digit and well suited for the representation of results coming from [...] calculations (such as the partial results during a multiplication) and, in general, is easy to implement when the operands are in irredundant form. The main reason for this is that a carry-save adder is simpler than a signed-digit adder [2], [3], [16]. However, the former accepts operands in irredundant form, while the latter uses redundant operands. Therefore, once the product of negative numbers has been transformed into a sequence of additions, as in the Baugh and Wooley algorithm [4], or by encoding of the partial products [24], or by means of other techniques [19], [23], the carry-sum representation of the partial accumulated products is cheaper to handle.

1063-6889/93 $3.00 © 1993 IEEE

Multipliers based on the signed-digit representation of the products have been presented in [13]. By carrying out an on-the-fly conversion procedure based on the scheme presented in [11], Ercegovac and Lang present in [13] an $n \times n$ multiplier not requiring the carry-propagate addition in the last step of the "classical" multipliers, but providing a result only on the most significant $n$ bits.

On the other hand, our proposal consists of two schemes of carry-sum-adder-based $n \times n$ multipliers, for either binary or two's complement numbers, one with a radix-2 and the other with a radix-4 adder array. The proposed multipliers can be considered as variations of the "classical" carry-save solution [15], [22], and they feature all the $2n$ bits of the result. Actually, they produce the least significant $n'$ bits of the result in irredundant form, and the most significant $w = 2n - n'$ digits in carry-sum form, but the latter are produced starting with the most significant one. The digits in redundant form are converted in parallel with the computation of the least significant ones, so that, when the last redundant digit is produced, the whole product value in irredundant form can be obtained after a constant (i.e. independent of $n$) delay.
The design of $n \times n$ two's complement and binary multipliers is presented in section 2, where we also discuss the problem of on-the-fly converting the result into irredundant form. The proposed multipliers are then evaluated in section [...], considering both their hardware requirements and their speed of operation.

2 Design of binary and two's complement multipliers

We propose a multiplier which produces a result not requiring the carry-propagate addition in the last step of some conventional schemes [15], [22], and which, with respect to the multiplier proposed by Ercegovac and Lang in [13] (which is based on a signed-digit representation), provides all the $2n$ bits of the result. It performs the multiplication producing the least significant $n'$ bits in carry-assimilated representation and the most significant $w = 2n - n'$ digits in carry-sum form, which are converted on-the-fly (and in parallel) into non-redundant representation.

Our unit generates all the elementary product terms by operating in radix 2 and, in the case of the two's complement multiplier, implements the algorithm by Baugh and Wooley [4] with the extension by Blankenship [5].

Figure 1: Proposed [...]

As with many other parallel multiplying arrays, the multiplication can be considered as being performed in 2 phases:
1. all the elementary products are generated
2. the bits obtained in phase 1 are added by a suitable array

The first phase requires a constant time, and it is not of interest in our case, because, once the radix has been selected, its complexity and delay are common to different algorithms.
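The two phases described above can be illustrated in software. The following is only a minimal sketch for unsigned (binary) operands, not the array of Fig. 1: the function names are hypothetical, Python integers stand in for rows of bits, and the Baugh and Wooley inputs needed for two's complement operands are not modelled. It shows that a plain carry-save accumulation leaves the product as a carry and sum pair, which is the form that normally calls for a final carry-propagate addition.

```python
def carry_save_add(x, y, z):
    """One row of full adders (illustrative helper).

    Returns (s, c) with s + c == x + y + z; no carry travels along the row.
    """
    s = x ^ y ^ z                             # sum bit of every full adder
    c = ((x & y) | (x & z) | (y & z)) << 1    # carry bit, moved to the next weight
    return s, c


def partial_products(a, b, n):
    """Phase 1: row i collects the elementary products a_j * b_i, weighted by 2**i."""
    return [(a << i) if (b >> i) & 1 else 0 for i in range(n)]


def carry_save_multiply(a, b, n):
    """Phase 2: accumulate the rows, keeping the running total in carry-sum form."""
    s, c = 0, 0
    for row in partial_products(a, b, n):
        s, c = carry_save_add(s, c, row)
    return s, c  # the product is s + c


s, c = carry_save_multiply(13, 11, 5)
print(s + c)  # the addition performed here is the step a conventional scheme needs last
```

In this sketch the addition `s + c` plays the role of the final adder that the proposed multipliers aim to avoid.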
Extension of the multiplier to higher radices, for the phase when all the elementary products are generated, can be performed by using algorithms based on the encoding of partial products [24] or on other techniques [19], [23], in order to eliminate the need for sign extension for the negative terms.

Several solutions have been proposed in the literature for the array summing the bits produced in the first phase. Of particular interest are the "classical" radix-2 summation array [15] and the radix-4 array proposed by Nakamura in [18] and also by Jullien et al. [27]. The classical array for radix-2 summation considers the full adder as the basic summing element. On the other hand, Nakamura uses for his array a (6,3) counter as the basic summing element, while Jullien et al. employ a [...] full adder. We start by considering the radix-2 case, and then we present the radix-4 [...] array of adders for $n$ [...]

2.1 Radix-2 adder array

Our attention is focused only on the second phase; the array used for two's complement numbers is shown in Fig. 1 for $n = 5$. Globally, the whole unit is basically a carry-save multiplier, where some additions are anticipated in order to obtain the digits in redundant form as soon as possible. This also implies the introduction of a diagonal line of full adders (the one below the dashed line in Fig. 1). In Fig. 1 we have denoted with [...] the two operands and the result, respectively.

In Fig. 1 the $(n+1)$ least significant bits of the product have been denoted with the symbols $m_j$ (with $0 \le j \le n$), since they are produced in irredundant form and they are equal to the $(n+1)$ least significant bits of the result. Therefore, in such a case the subscript $j$ is related to the weight of the bit of the final result. On the other hand, the $(n-1)$ most significant digits of the result are produced as carry and sum pairs, i.e. in non-assimilated carry-sum form. They have been denoted with the symbols $p_i$ (with $1 \le i \le n-1$), where the subscript $i$ of $p_i$ is related to the order of availability of the pair $p_i$.

Figure 2: On-the-fly conversion hardware for [...]
Therefore, $p_1$ is the first available carry and sum pair, $p_2$ is the second, and so on; each digit $p_i$ in effect comes from the assimilation of one carry and one sum bit, $c_i$ and $s_i$, respectively. To summarize, in Fig. 1 the $(n-1)$ most significant digits are labeled in the order they are produced, while the $(n+1)$ least significant bits are labeled with an index corresponding to the bit weight inside the final result.

The $(n+1)$ least significant bits are produced after a delay of $n$ full adders. The $(n-1)$ most significant digits are also produced after a delay of $n$ full adders. In particular, it should be noted that the most significant carry and sum pair of the final product, $p_1$, is produced after a delay of 2 full adders. The successive carry and sum pairs follow thereafter, with delays of one full adder between one and the next. In this computation, the half adder used to sum the additional inputs required by the Baugh and Wooley algorithm does not contribute to the total delay, as it operates in parallel with the rest of the array.

The multiplier for positive numbers can be immediately obtained from the one of Fig. 1, by eliminating all the additional inputs required by the Baugh and Wooley algorithm, as well as the extra half adder, while at the same time generating the appropriate elementary products.

2.2 On-the-fly conversion and output of the result for the radix-2 adder array

By using the hardware of Fig. 2, the redundant digits are produced by the array starting with the most significant one. The hardware of Fig. 2 is used to implement a sort of on-the-fly conversion algorithm which is similar to the method described in [11]. However, in [11] the most significant digits are in signed-digit form, while in the proposed algorithm they are in carry-sum representation.

Table 1: Conversion control signals

$D_j[i]$ | Decision at level $i$ about value of product digit $j$
u | undecided
n | decided: no change ($m_j = A_j$)
i | decided: increment modulo the operating base ($m_j = (A_j + 1) \bmod R$)
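The decisions listed in Table 1 can be sketched in software for the radix-2 case. This is a hedged illustration rather than the circuit of Fig. 2: the function name, the constants `U`, `N`, `I` and the `carry_in` parameter (the carry arriving from the less significant part of the product) are assumptions introduced here. Carry and sum pairs are consumed most significant first; a digit equal to 0 or 2 settles every digit still undecided, while a digit equal to 1 leaves them undecided.

```python
U, N, I = "u", "n", "i"  # undecided / decided: no change / decided: increment


def on_the_fly_convert(pairs, carry_in=0):
    """Convert carry-sum digits, most significant first, to conventional bits.

    pairs    -- list of (c, s) bits, one pair per digit
    carry_in -- carry entering the least significant digit (assumed parameter)
    Returns the converted bits, most significant first, modulo 2**len(pairs).
    """
    assimilated = []  # A = c XOR s for every digit seen so far
    decision = []     # control signal currently attached to every digit
    for c, s in pairs:
        p = c + s  # digit value: 0, 1 or 2
        if p != 1:
            # 0 absorbs any incoming carry, 2 always produces one:
            # either way all undecided digits become decided.
            settled = I if p == 2 else N
            decision = [settled if d == U else d for d in decision]
        assimilated.append(c ^ s)
        decision.append(U)  # the new digit still waits for the lower digits
    settled = I if carry_in else N
    decision = [settled if d == U else d for d in decision]
    return [a ^ 1 if d == I else a for a, d in zip(assimilated, decision)]


# Digits 2, 1, 2 (most significant first) encode 8 + 2 + 2 = 12, i.e. 4 modulo 8.
print(on_the_fly_convert([(1, 1), (0, 1), (1, 1)]))  # [1, 0, 0]
```

The list comprehension that rewrites `decision` visits the digits one after another; in hardware the same update would presumably be applied to all digits at once, so each level would cost a constant delay.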
Observe that the hardware for the on-the-fly conversion of Fig. 2 operates in parallel with the computation of the following digits.

The carry-sum representation of the product is converted into the corresponding conventional representation $m$. During the on-the-fly conversion, we produce both the assimilations $A_j$ of the bits of $p_j$, using the blocks A, and the control signals $D_j[i]$ associated with each [...] output from blocks A. This is done so as to determine whether the final bit $m_j$ is $A_j$ or $\bar{A}_j$ (i.e. $(A_j + 1) \bmod 2$). The meaning of the control signals is given in Table 1. In Fig. 1 we observe that, in correspondence with each stage of the array, one pair of carry and sum bits is output from the array. By considering that $p_i$ has the value of the assimilation of the bits $c_i$ and $s_i$, a high-level description of the process is

[...] (1)

From (1) it follows that $A_i = c_i \oplus s_i$, where the symbol $\oplus$ stands for the EXOR operator. At the beginning we have $D_j[j] = \text{u}$, and for $i > j$ we have

$$
D_j[i] = \begin{cases}
\text{u} & \text{if } D_j[i-1] = \text{u} \text{ and } p_i = 1 \\
\text{n} & \text{if } (D_j[i-1] = \text{u} \text{ and } p_i = 0) \text{ or } D_j[i-1] = \text{n} \\
\text{i} & \text{if } (D_j[i-1] = \text{u} \text{ and } p_i = 2) \text{ or } D_j[i-1] = \text{i}
\end{cases} \qquad (2)
$$

It is interesting to note that the rules in Table 1 implement the same addition algorithm as carry-select [15]; however, this implementation is [...], because it takes into account the different availability in time of the digits to be added.

We denote with $r[i]$ and $\delta[i]$ the most and least significant [...]. We have coded the condition $D_j[i] = $ [...] with the pair [...], the condition $D_j[i] = $ [...] with the pair [...], and the condition $D_j[i] = $ [...] with [...]. The boolean expressions used to obtain the bits of [...]$[i+1]$ are derived from (2) and are

[...] (3)

From Table 1 we observe that at the final level (i.e. when $i = n-1$), only when $D_j[n-1] = $ [...] it is necessary
