
Low-power parallel multiplier with column bypassing

M.-C. Wen, S.-J. Wang and Y.-N. Lin
A low-power parallel multiplier design is proposed in which some columns of the multiplier array can be turned off whenever their outputs are known. This design maintains the original array structure without introducing extra boundary cells, as was required in previous designs. Experimental results show that it saves 10% of power for random input. Higher power reduction can be achieved if the operands contain more 0’s than 1’s.

In the row-bypassing multiplier of Fig. 2 with $b_2 = 0$, the CSA in the second row (enclosed in the circle) can be bypassed, and the outputs from the first row are fed directly to the third-row CSA. However, since the rightmost FA in the second row is disabled, it does not execute the addition and thus the output is not correct. To remedy this problem, extra circuitry must be added; these elements are located in the triangular area in Fig. 2.

Introduction: Multiplication is an essential arithmetic operation for common DSP applications, such as filtering and the fast Fourier transform (FFT). To achieve high execution speed, parallel array multipliers are widely used. These multipliers tend to consume most of the power in DSP computations, and thus power-efficient multipliers are very important for the design of low-power DSP systems.

CMOS is currently the dominant technology in digital VLSI. Two components contribute to the power dissipation in CMOS circuits. The static dissipation is due to leakage current, while the dynamic dissipation is due to switching transient current as well as the charging and discharging of load capacitances. Since the amount of leakage current is usually small, the major source of power dissipation in CMOS circuits is the dynamic power dissipation. Dynamic power dissipation appears only when a CMOS gate switches from one stable state to another. Thus, the power consumption can be reduced if one can reduce the switching activity of a given logic circuit without changing its function.

Many low-power multiplier designs can be found in the literature. A straightforward approach is to design a full adder (FA) that consumes less power [1]. Power reduction can also be achieved through structural modification. For example, rows of partial products can be ignored [2].

Parallel multiplier: Consider the multiplication of two unsigned n-bit numbers, where $A = a_{n-1} a_{n-2} \ldots a_0$ is the multiplicand and $B = b_{n-1} b_{n-2} \ldots b_0$ is the multiplier. The product $P = p_{2n-1} p_{2n-2} \ldots p_0$ can be written as follows:

$$P = \sum_{i=0}^{n-1} \sum_{j=0}^{n-1} (a_i \cdot b_j)\, 2^{i+j}$$
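As a quick check of the double-sum expression for P, the following minimal Python sketch accumulates the weighted partial products bit by bit. The function name `product_from_partial_products` is hypothetical and is used here only for illustration; it is not part of the Letter.

```python
def product_from_partial_products(a: int, b: int, n: int) -> int:
    """Sum the partial products (a_i AND b_j) weighted by 2^(i+j).

    a is the n-bit multiplicand, b the n-bit multiplier (both unsigned).
    """
    total = 0
    for i in range(n):
        a_i = (a >> i) & 1          # bit i of the multiplicand
        for j in range(n):
            b_j = (b >> j) & 1      # bit j of the multiplier
            total += (a_i & b_j) << (i + j)
    return total


if __name__ == "__main__":
    # 1010 x 1111 (10 x 15), the operand pair used later in the text
    print(product_from_partial_products(0b1010, 0b1111, 4))
```

For any pair of n-bit operands the result should equal the ordinary integer product, which is what the array multiplier described next has to reproduce in hardware.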

Fig. 2 4 × 4 Braun multiplier with row-bypassing

An array implementation, known as the Braun multiplier [3], is shown in Fig. 1. The Baugh-Wooley multiplier uses the same array structure to handle 2’s complement multiplication, with some of the partial products replaced by their complements. The multiplier array consists of $(n-1)$ rows of carry-save adders (CSA), in which each row contains $(n-1)$ FA cells. Each FA in the CSA array has two outputs: the sum bit goes down while the carry bit goes to the lower-left FA. For an FA in the first row, there are only two valid inputs, and the third input bit is set to 0. Therefore, it can be replaced by a two-input half-adder. The last row is a ripple adder for carry propagation. In this Letter, we propose a low-power design for this multiplier.
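The array structure described above can be sketched as a bit-level behavioural model. This is an illustrative Python sketch, not the authors' circuit: the names `full_adder` and `braun_multiply` are hypothetical, cells are indexed by row i and column j with the cell in row i, column j adding the partial product a_j·b_(i+1), and first-row cells are modelled as full adders with a carry input of 0 rather than as half-adders.

```python
def full_adder(x: int, y: int, z: int) -> tuple[int, int]:
    """Return (sum, carry) of three input bits."""
    return x ^ y ^ z, (x & y) | (y & z) | (x & z)


def braun_multiply(a: int, b: int, n: int) -> int:
    """Unsigned n x n multiplication through an (n-1) x (n-1) CSA array
    followed by a ripple adder, mirroring the Braun structure."""
    abit = [(a >> k) & 1 for k in range(n)]
    bbit = [(b >> k) & 1 for k in range(n)]
    p = [0] * (2 * n)
    p[0] = abit[0] & bbit[0]

    # Inputs to row 0: partial products a_(j+1) b_0 and zero carries.
    sum_in = [abit[j + 1] & bbit[0] for j in range(n - 1)]
    carry_in = [0] * (n - 1)

    for i in range(n - 1):                      # CSA rows
        s = [0] * (n - 1)
        c = [0] * (n - 1)
        for j in range(n - 1):                  # cells in the row
            s[j], c[j] = full_adder(abit[j] & bbit[i + 1], sum_in[j], carry_in[j])
        p[i + 1] = s[0]                         # rightmost sum is a product bit
        # Sum of column j+1 feeds column j of the next row; the leftmost
        # cell receives the partial product a_(n-1) b_(i+1) instead.
        sum_in = s[1:] + [abit[n - 1] & bbit[i + 1]]
        carry_in = c                            # carry stays in its column

    ripple = 0
    for k in range(n - 1):                      # final ripple adder
        p[n + k], ripple = full_adder(sum_in[k], carry_in[k], ripple)
    p[2 * n - 1] = ripple
    return sum(bit << k for k, bit in enumerate(p))


if __name__ == "__main__":
    print(braun_multiply(0b1010, 0b1111, 4))
```

Keeping the carry in its own column index is a modelling choice that matches the column (diagonal) numbering used later in the Letter.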

Proposed method: Instead of bypassing rows of full adders, we propose a multiplier design in which columns of adders are bypassed. In this approach, the operations in a column can be disabled if the corresponding bit in the multiplicand is 0. There are two advantages to this approach. First, it eliminates the extra correcting circuit shown in Fig. 2. Secondly, the modified FA is simpler than that used in the row-bypassing multiplier.

Assume that we execute 1010 × 1111 in Fig. 1. It can be verified that, for FAs in the first and third diagonals, two out of the three input bits are 0: the ‘carry’ bit from its upper-right FA, and the partial product $a_i b_j$ (note that $a_0 = a_2 = 0$). As a result, the output carry bit of such an FA is 0, and the output sum bit is simply equal to the third bit, which is the ‘sum’ output of its upper FA. The following theorem shows that this is true in general. Therefore, when $a_i$ is 0, the operations in the corresponding diagonal can be disabled since all the outputs are known.

We refer to the FAs in a diagonal in Fig. 1 as a column. Let $FA_{i,j}$ be the full adder located in row i and column j, $0 \le i, j \le n-2$, in the $(n-1) \times (n-1)$ array, as shown in Fig. 1. $FA_{0,0}$ is the adder at the upper-right corner. The following theorem establishes the reason for column bypassing.

Theorem 1: When $a_j = 0$, the output of a column j adder cell $FA_{i,j}$ can be specified as follows.
1. The output carry bit is 0.
2. The output sum bit is equal to the output sum bit of $FA_{i-1,j+1}$.

Proof: We prove this theorem by induction.
1. Consider row 0. Note that, in row 0, there are only two bits to be added. Adder $FA_{0,j}$ carries out $a_j b_1 + a_{j+1} b_0$. If $a_j = 0$, then the output carry bit must be zero, and the output sum bit is equal to $a_{j+1} b_0$.
2. Assume that the theorem holds for row i.
3. In row $i+1$, the inputs of $FA_{i+1,j}$ are the carry bit from $FA_{i,j}$, the sum bit from $FA_{i,j+1}$, and the partial product $a_j b_{i+2}$. The carry bit from $FA_{i,j}$ is 0 by the induction hypothesis, and the partial product is 0 since $a_j = 0$. Thus two out of the three inputs are 0, the output carry bit is 0, and the output sum bit is equal to the sum bit sent by $FA_{i,j+1}$.

According to Theorem 1, when $a_j = 0$, the operations in column j can be ignored and thus the full adders can be disabled since the outputs are known.
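Theorem 1 can also be checked exhaustively for small n. The sketch below is an illustrative Python model under the same indexing as the theorem (row i, column j, with the carry of a cell feeding the cell below it in the same column); the names `trace_csa_array` and `theorem_holds` are hypothetical. For the leftmost column, the "sum from the neighbouring cell" is taken to be the boundary partial product that enters the array at that position.

```python
def full_adder(x: int, y: int, z: int) -> tuple[int, int]:
    """Return (sum, carry) of three input bits."""
    return x ^ y ^ z, (x & y) | (y & z) | (x & z)


def trace_csa_array(a: int, b: int, n: int) -> list[list[tuple[int, int, int]]]:
    """Return trace[i][j] = (incoming sum bit, sum out, carry out) for each cell."""
    abit = [(a >> k) & 1 for k in range(n)]
    bbit = [(b >> k) & 1 for k in range(n)]
    sum_in = [abit[j + 1] & bbit[0] for j in range(n - 1)]
    carry_in = [0] * (n - 1)
    trace = []
    for i in range(n - 1):
        row = []
        for j in range(n - 1):
            s, c = full_adder(abit[j] & bbit[i + 1], sum_in[j], carry_in[j])
            row.append((sum_in[j], s, c))
        trace.append(row)
        # Next row: sums shift one column to the right, carries stay in place.
        sum_in = [cell[1] for cell in row[1:]] + [abit[n - 1] & bbit[i + 1]]
        carry_in = [cell[2] for cell in row]
    return trace


def theorem_holds(n: int) -> bool:
    """Check both claims of Theorem 1 for every pair of n-bit operands."""
    for a in range(1 << n):
        for b in range(1 << n):
            trace = trace_csa_array(a, b, n)
            for j in range(n - 1):
                if (a >> j) & 1:
                    continue                    # theorem only covers a_j = 0
                for i in range(n - 1):
                    incoming, s, c = trace[i][j]
                    if c != 0 or s != incoming:
                        return False
    return True


if __name__ == "__main__":
    print(theorem_holds(4))
```

An exhaustive check of this kind does not replace the induction proof, but it is a cheap way to catch indexing mistakes when modelling the array.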
Fig. 1 4 × 4 Braun multiplier

Low-power multipliers with row-bypassing: A low-power multiplier design may disable the operations in some rows to save power [2]. If bit $b_j$ is 0, all partial products $a_i b_j$, $0 \le i \le n-1$, are zero. Therefore, the additions in the corresponding row in Fig. 1 can be bypassed. The row-bypassing multiplier is shown in Fig. 2. Each cell in the CSA array is augmented with three tri-state gates and two multiplexers. For example, let $b_2$ be 0 in Fig. 2. In this case, the CSA in the second row can be bypassed.

Fig. 3 4 × 4 column-bypassing multiplier
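The behaviour of the column-bypassing array can be approximated with a small functional model. This is a hedged Python sketch, not the authors' circuit: a cell in a column with a_j = 0 is simply skipped, its sum output is taken from the incoming sum bit (the multiplexer path) and its carry output is treated as 0 everywhere, which is a simplification of the gating described in the Letter. The names `column_bypass_multiply` and `active` are hypothetical.

```python
def full_adder(x: int, y: int, z: int) -> tuple[int, int]:
    """Return (sum, carry) of three input bits."""
    return x ^ y ^ z, (x & y) | (y & z) | (x & z)


def column_bypass_multiply(a: int, b: int, n: int) -> tuple[int, int]:
    """Return (product, number of CSA cells that were actually evaluated)."""
    abit = [(a >> k) & 1 for k in range(n)]
    bbit = [(b >> k) & 1 for k in range(n)]
    p = [0] * (2 * n)
    p[0] = abit[0] & bbit[0]
    sum_in = [abit[j + 1] & bbit[0] for j in range(n - 1)]
    carry_in = [0] * (n - 1)
    active = 0

    for i in range(n - 1):
        s = list(sum_in)            # bypass default: pass the incoming sum through
        c = [0] * (n - 1)           # bypass default: carry output is 0
        for j in range(n - 1):
            if abit[j]:             # column enabled only when a_j = 1
                # a_j = 1, so the partial product a_j b_(i+1) is just b_(i+1)
                s[j], c[j] = full_adder(bbit[i + 1], sum_in[j], carry_in[j])
                active += 1
        p[i + 1] = s[0]
        sum_in = s[1:] + [abit[n - 1] & bbit[i + 1]]
        carry_in = c

    ripple = 0
    for k in range(n - 1):          # final ripple adder, as in the Braun array
        p[n + k], ripple = full_adder(sum_in[k], carry_in[k], ripple)
    p[2 * n - 1] = ripple
    return sum(bit << k for k, bit in enumerate(p)), active


if __name__ == "__main__":
    print(column_bypass_multiply(0b1010, 0b1111, 4))
```

In this model the count of evaluated cells depends only on the multiplicand bits a_0 to a_(n-2), so for uniformly random multiplicands half of the CSA cells are skipped on average; the count is only a rough proxy for switching activity, not a power estimate.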

ELECTRONICS LETTERS 12th May 2005 Vol. 41 No. 10

Multiplier design: The column-bypassing multiplier is shown in Fig. 3. If $a_j = 0$, the FA will be disabled. Note that we only need two tri-state gates and one multiplexer in a modified adder cell. We do not need a tri-state gate for the carry input ($C_{i-1,j}$), and the reason is given as follows. For a Braun multiplier, there are only two inputs for each FA in the first row (i.e. row 0). Therefore, when $a_j = 0$, the two inputs of $FA_{0,j}$ are disabled, and thus its output carry bit will not be changed. Therefore, all three inputs of $FA_{1,j}$ are fixed, which prohibits its output changing. In the bottom of the CSA array, we need to set the carry outputs to be 0. Otherwise, the corresponding FAs may not produce the correct outputs since their inputs are disabled. This is done by adding an AND gate at the outputs of the last-row CSA adders.

Results: To evaluate the performance of this low-power multiplier, we implement the design with TSMC 0.35 μm technology. The power is estimated by running HSPICE. We compare the performance of this design with a normal Braun multiplier and the row-bypassing multiplier [2]; the results are given as follows. In this experiment, the input patterns are assumed to be random, i.e. the probabilities of 0 and 1 are both 0.5. Table 1 gives the power consumption of the three designs. Our design consumes less power in all cases, and the reduction increases as the size becomes larger. Note that this is a relatively pessimistic estimation. If the distribution of 0’s and 1’s is not uniform, we shall be able to achieve higher power saving. If the operands are sparse (i.e. the number of 0’s is more than 1’s), there will be greater power saving. Our results show that the row-bypassing multipliers actually consume more power, possibly due to the extra logic. The areas of the three designs are listed in Table 2. In our design, the area overhead is roughly 20%, while the area overheads of row-bypassing multipliers are more than 40%.

Table 1: Power (mW)

Multiplier type   4 × 4    (%)    8 × 8   (%)    16 × 16   (%)
Braun             0.4325   100    2.31    100    8.01      100
[2]               0.5537   128    2.76    119    8.26      103
Proposed          0.4298   99.4   2.25    97.4   7.15      89.3

Table 2: Area (μm²)

Multiplier type   4 × 4    (%)    8 × 8   (%)    16 × 16   (%)
Braun             8672     100    33286   100    131040    100
[2]               13692    158    48991   147    185367    141
Proposed          10063    116    40236   121    162131    124

Conclusion: We have presented a new low-power parallel multiplier design, which disables the operations in columns of full adders. Compared with row-bypassing, this technique achieves higher power reduction with lower hardware overhead.

© IEE 2005
2 February 2005
Electronics Letters online no: 20050464
doi: 10.1049/el:20050464

M.-C. Wen, S.-J. Wang and Y.-N. Lin (Department of Computer Science, National Chung-Hsing University, 250 Kuo-Kuan Road, Taichung 40227, Taiwan)
E-mail: sjwang@cs.nchu.edu.tw

References
1 Wu, A.: ‘High performance adder cell for low power pipelined multiplier’. Proc. IEEE Int. Symp. on Circuits and Systems, May 1996, Vol. 4, pp. 57–60
2 Ohban, J., Moshnyaga, V.G., and Inoue, K.: ‘Multiplier energy reduction through bypassing of partial products’. Proc. Asia-Pacific Conf. on Circuits and Systems, 2002, Vol. 2, pp. 13–17
3 Abu-Khater, I.S., Bellaouar, A., and Elmasry, M.I.: ‘Circuit techniques for CMOS low-power high-performance multipliers’, IEEE J. Solid-State Circuits, 1996, 31, (10), pp. 1535–1546