
2004 IEEE Asia-Pacific Conference on Advanced System Integrated Circuits (AP-ASIC2004) / Aug. 4-5, 2004

4-3
A Low-Power Booth Multiplier Using Novel Data Partition Method
Jongsu Park, San Kim and Yong-Surk Lee
Processor Laboratory, Department of Electrical and Electronic Engineering, Yonsei University 134 Shinchon-dong, Seodaemun-gu, Seoul, Korea E-mail: jspark@dubiki.yonsei.ac.kr

Abstract
The Booth algorithm produces Booth encoded products with a value of zero when the input data have consecutive bits of equal value. Therefore, partial products have a greater chance of being zero when the input with the smaller dynamic range of the two is used as the multiplier. To reduce the switching activity of partial products, we propose a novel multiplication algorithm and its associated architecture. The proposed algorithm divides a multiplication expression into four multiplication expressions, each of which is computed independently, and the results of the four multiplications are then added. As a result, the two inputs can be exchanged more often during multiplication. Implementation results show that the proposed multiplier can save up to about 20% of the power dissipation of the previous Booth multiplier.

I. INTRODUCTION
Digital signal processing (DSP) is one of the core technologies for the next generation of multimedia and mobile communication systems [1]. Most DSP applications involve addition and multiplication. For example, the DCT, FFT, wavelet transform, and OFDM are essential DSP algorithms used in image and video processing, audio signal processing, and mobile communications [2][3][4]. Many portable information devices are currently battery-powered. Multiplication is complex and dissipates a large amount of power because of the summation of partial products. Therefore, low-power multiplication is a key concern for battery-powered multimedia devices. In a CMOS circuit, power consumption can be reduced by reducing the switching activity in the circuit, as expressed in Equation (1):

$$P_{switching} = \alpha C V_{dd}^{2} f_{clk} \tag{1}$$

where $\alpha$ is the switching activity parameter, $C$ is the load capacitance, $V_{dd}$ is the supply voltage, and $f_{clk}$ is the operating frequency. The product $\alpha C$ can also be viewed as the effective switching activity measured at the capacitor node during charging and discharging. The only parameter that can be reduced at the algorithmic level is the switching activity. Therefore, minimizing switching activity at the algorithmic level should be considered first, before the complex and expensive process of implementing a multiplier is attempted [5]. Many researchers have proposed methods to reduce power consumption by modifying conventional multiplication algorithms [1][6][7][8][9][10][11][12]. To reduce power consumption further, we propose a novel data partition method and a multiplication algorithm obtained by modifying the low-power Booth multiplier of [12]. The remainder of the paper is organized as follows: Section 2 describes the basics of the radix-4 Booth algorithm. Sections 3 and 4 describe an existing multiplication scheme and the proposed one, respectively. Section 5 presents experimental results, and Section 6 concludes the paper.

II. RADIX-4 BOOTH ALGORITHM

The radix-4 Booth algorithm speeds up the radix-2 Booth algorithm because more bits are inspected at each step, which reduces the total number of cycles needed to obtain the product. A multiplication takes two inputs: a multiplicand and a multiplier. To realize low-complexity 2's complement multiplication, the radix-4 Booth algorithm can be applied to encode one of the two inputs, $X_i$ or $Y_i$. When a data series is used for Booth encoding, each datum $X_i$ is partitioned into overlapping 3-bit groups. A 2's complement number $X_i$ with word length $W$ can be represented by

$$X_i = \sum_{j=0}^{W/2-1} \left(-2x_{i,2j+1} + x_{i,2j} + x_{i,2j-1}\right) 2^{2j}, \qquad x_{i,-1} = 0 \tag{2}$$

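Here, $W$ is assumed to be an even number. When the other input datum $Y_i$ is multiplied by $X_i$, Equation (2) can be modified into

$$X_i \times Y_i = \sum_{j=0}^{W/2-1} B(x_{i,j}, Y_i) \times 2^{2j} \tag{3}$$

where $B(x_{i,j}, Y_i) = \left(-2x_{i,2j+1} + x_{i,2j} + x_{i,2j-1}\right) Y_i$ is the Booth encoded product of the $j$-th 3-bit group of $X_i$.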
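As a sanity check on Equations (2) and (3), the Python sketch below recodes a $W$-bit two's-complement $X_i$ into its radix-4 Booth digits and rebuilds $X_i \times Y_i$ as the sum of Booth encoded products. It models the arithmetic only, not the hardware, and the function names are hypothetical.

```python
from typing import List


def booth_radix4_digits(x: int, width: int) -> List[int]:
    """Radix-4 Booth digits of a width-bit two's-complement x, least significant group first.

    Digit j is -2*x[2j+1] + x[2j] + x[2j-1] with x[-1] = 0, as in Equation (2).
    x is assumed to fit in `width` bits.
    """
    if width % 2:
        raise ValueError("W is assumed to be even")
    bits = x & ((1 << width) - 1)  # two's-complement bit pattern of x

    def bit(k: int) -> int:
        return 0 if k < 0 else (bits >> k) & 1

    return [-2 * bit(2 * j + 1) + bit(2 * j) + bit(2 * j - 1) for j in range(width // 2)]


def booth_multiply(x: int, y: int, width: int) -> int:
    """X * Y as the sum over j of B(x_j, Y) * 2^(2j), as in Equation (3)."""
    return sum(digit * y * 4**j for j, digit in enumerate(booth_radix4_digits(x, width)))


print(booth_radix4_digits(-74, 8))  # [-2, 2, -1, -1]
print(booth_multiply(-74, 6, 8))    # -444
```

Every digit lies in {-2, -1, 0, 1, 2}, so each Booth encoded product is one of $-2Y_i$, $-Y_i$, 0, $Y_i$, or $2Y_i$.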

As shown in Equation (3), the Booth encoded product $B(x_{i,j}, Y_i)$ takes one of the values $-2Y_i$, $-Y_i$, 0, $Y_i$, and $2Y_i$. We can observe that the multiplication of the two inputs is replaced with a sum of Booth encoded products. As shown in Table 1, the Booth encoded product is zero when three consecutive bits have the same value (0 or 1). A Booth encoded product with a value of zero does not produce a partial product. Therefore, to reduce power dissipation, we should produce as many zero-valued Booth encoded products as possible.

Table 1 Radix-4 Booth Encoded Product
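The entries of Table 1 follow from the digit formula in Equation (2). The illustrative Python sketch below prints the Booth encoded product for each 3-bit group $(x_{i,2j+1}, x_{i,2j}, x_{i,2j-1})$; the names `LABELS` and `booth_table` are hypothetical.

```python
from typing import Dict

# Multiple of Y_i selected by each radix-4 Booth digit value.
LABELS = {-2: "-2Y", -1: "-Y", 0: "0", 1: "+Y", 2: "+2Y"}


def booth_table() -> Dict[str, str]:
    """Map each 3-bit group 'x_{2j+1} x_{2j} x_{2j-1}' to its Booth encoded product."""
    table = {}
    for code in range(8):
        hi, mid, lo = (code >> 2) & 1, (code >> 1) & 1, code & 1
        digit = -2 * hi + mid + lo
        table[f"{hi}{mid}{lo}"] = LABELS[digit]
    return table


for group, product in booth_table().items():
    print(group, "->", product)
```

Only the groups 000 and 111, three consecutive equal bits, yield a zero product, which is the property the rest of the paper relies on.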

Fig. 1 Multiplication proposed in [12] (block diagram: a switcher passes the multiplicand and the multiplier to a conventional multiplier)
III. PREVIOUS MULTIPLIER
A. Dynamic Range and the Booth Encoded Product
First of all, we must understand the concept of dynamic range. Here, dynamic range refers to the changes between consecutive bits of binary data. For example, 0000 and 1111 have the smallest dynamic range, whereas 0101 and 1010 have large dynamic ranges. When the input with the smaller dynamic range of the two is Booth encoded, the partial products have a greater chance of being zero. Therefore, if we find a scheme that produces more zero-valued Booth encoded products, we can reduce power dissipation without any additional computation, since the number of partial products in the multiplication decreases.

B. Shen's Multiplication Algorithm
Shen et al. proposed a multiplication algorithm for low power dissipation [12]. The algorithm concentrates on reducing the switching activity of partial products. As Table 1 shows, Booth encoding an input with a smaller dynamic range produces more partial products with a value of zero. Therefore, the input with the smaller dynamic range should be used as the multiplier instead of the multiplicand. The comparator shown in Fig. 1 compares the effective dynamic ranges of the two inputs, and the switcher exchanges the two inputs if the dynamic range of the multiplicand is smaller than that of the multiplier. The multiplicand and the multiplier at the switcher outputs are then used as the inputs of a conventional multiplier. The limitation of this algorithm is that input exchanges may occur infrequently, because the dynamic range comparison covers all bits of the two inputs at once.
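The exchange rule of Fig. 1 can be sketched in Python as below. The text does not define how the effective dynamic range is measured, so this sketch assumes a hypothetical proxy: the number of nonzero radix-4 Booth digits (that is, nonzero partial products) an operand would produce if used as the multiplier. The function names are illustrative and are not taken from [12].

```python
from typing import List, Tuple


def booth_digits(x: int, width: int) -> List[int]:
    """Radix-4 Booth digits (LSB group first) of a width-bit two's-complement x."""
    bits = x & ((1 << width) - 1)

    def bit(k: int) -> int:
        return 0 if k < 0 else (bits >> k) & 1

    return [-2 * bit(2 * j + 1) + bit(2 * j) + bit(2 * j - 1) for j in range(width // 2)]


def nonzero_partial_products(x: int, width: int) -> int:
    """Assumed dynamic-range proxy: partial products left after Booth encoding x."""
    return sum(1 for d in booth_digits(x, width) if d != 0)


def whole_word_exchange(multiplicand: int, multiplier: int, width: int) -> Tuple[int, int]:
    """Comparator + switcher: make the operand with the smaller proxy the multiplier."""
    if nonzero_partial_products(multiplicand, width) < nonzero_partial_products(multiplier, width):
        return multiplier, multiplicand  # exchange the inputs
    return multiplicand, multiplier


print(whole_word_exchange(3, 85, 8))  # (85, 3): 3 yields 2 partial products, 85 yields 4
```

Only one exchange decision is made per multiplication, which is the coarse granularity the limitation above refers to.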

11110101 × 10110110 = (1111 × 1011) × 100000000
                    + (1111 × 0110) × 10000
                    + (0101 × 1011) × 10000
                    + (0101 × 0110) × 1

Fig. 2 Example of multiplication with a smaller number of bits

IV. PROPOSED MULTIPLIER


A. Multiplication Input Data Partitioning
We propose a low-power multiplication process with better power efficiency than the previous method [12]. The previous method compares all bits of the two inputs at once. Therefore, it has a lower chance of exchanging the two inputs for Booth encoding than the proposed method, in which the two inputs are divided into several terms with fewer bits. For example, to increase the chance that data exchanges occur during multiplication, the multiplication can be rewritten as shown in Fig. 2, where each of the two inputs is divided into an upper part and a lower part. As shown in Fig. 2, (11110101 × 10110110) is not exchanged in the previous multiplication scheme, whereas it is exchanged in the proposed one. With the proposed scheme, the chance of exchanges increases because four multiplication terms, each with fewer bits than the original inputs, are compared for Booth encoding. Therefore, the proposed multiplication increases the chance of partial products becoming zero and reduces overall power dissipation with little additional hardware. The proposed multiplier also uses a higher-speed parallel multiplication architecture built from narrower multiplications than existing Booth multipliers.
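The identity behind Fig. 2 can be checked with the Python sketch below, which splits each 8-bit input into 4-bit upper and lower parts and sums the four shifted sub-products. It treats the parts as unsigned bit patterns, which is exact for unsigned inputs; signed inputs need the negative-input handling of Section IV.C. The helper names are hypothetical.

```python
from typing import List, Tuple


def split(value: int, half: int) -> Tuple[int, int]:
    """Split an unsigned (2*half)-bit value into (upper, lower) parts."""
    return value >> half, value & ((1 << half) - 1)


def partition_terms(a: int, b: int, half: int = 4) -> List[Tuple[int, int, int]]:
    """The four sub-multiplications of Fig. 2 as (left operand, right operand, shift)."""
    a_hi, a_lo = split(a, half)
    b_hi, b_lo = split(b, half)
    return [
        (a_hi, b_hi, 2 * half),  # (upper x upper) x 2^(2h)
        (a_hi, b_lo, half),      # (upper x lower) x 2^h
        (a_lo, b_hi, half),      # (lower x upper) x 2^h
        (a_lo, b_lo, 0),         # (lower x lower) x 1
    ]


def partitioned_product(a: int, b: int, half: int = 4) -> int:
    """Add the independently computed sub-products, as the adder tree does."""
    return sum((x * y) << shift for x, y, shift in partition_terms(a, b, half))


for x, y, shift in partition_terms(0b11110101, 0b10110110):
    print(f"({x:04b} x {y:04b}) << {shift}")
print(partitioned_product(0b11110101, 0b10110110))  # 44590 == 245 * 182
```

In the proposed multiplier each of these four terms gets its own exchange decision, which is what makes exchanges more frequent.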

B. Architecture of the Proposed Multiplier
Fig. 3 shows one example of the proposed multiplication architecture, in which the two input data streams are divided into upper and lower parts.

Fig. 3 Block diagram of the proposed multiplication

The multiplication scheme is composed of four modules: an input dividing unit, a dynamic range determination (DRD) unit, a radix-4 Booth multiplier, and an adder tree that sums the partial products. The input dividing unit divides each input data stream into parts with fewer bits; in this example, upper and lower parts are used. These smaller parts are multiplied independently using Booth encoding. The DRD module detects the effective dynamic range of the two inputs and exchanges them so that the following condition is met: the input with the larger effective dynamic range is the multiplicand, and the input with the smaller effective dynamic range is the multiplier. The microarchitecture of the DRD is shown in Fig. 4, where two 16-bit multiplication inputs are divided into two 8-bit parts. The first and second groups of comparator inputs in the DRD module are the 4 MSBs and the 4 LSBs of the 8-bit DRD input. The third bit of the DRD input is shared by the first and second groups because the radix-4 Booth algorithm needs three bits (i+1, i, and i-1) at once. Fig. 5 shows the comparator used in the DRD block. The comparator consists of two AND gates and one OR gate. The comparator output is zero when all input bits have the same value (0 or 1).

C. Multiplication of Negative Input

10110110 × 0110 = 1001000100
Step 1) 1011 & 0110 × 0110
Step 2) 1011+1 & 0110 × 0110
Step 3) 1100 & 0110 × 0110
        1100 × 0110       → 11101000(0000)
        (11)0110 × 0110   → 1111000100    (sign bits (11) added to the lower part)
        output: 11101000(0000) + 1111000100 = 1001000100

Fig. 6 Multiplication of negative inputs

We need to take special care when partitioning negative data. For example, 100000000000001 is negative; however, the lower part of this data is positive when used with the Booth algorithm. We propose a novel data partition method to solve this problem. Fig. 6 shows an example with negative input data. Three steps are required: 1) partition the two input data, 2) add one to the upper part of the data, and 3) perform the same multiplication as in the normal case.
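The three steps of Fig. 6 can be reproduced with the Python sketch below. It reflects one reading of the figure, stated here as an assumption: for a negative input, one is added to the upper part and the lower part is sign-extended with ones (the "(11)" prefix in Fig. 6), so that $X = (X_H + 1)\cdot 2^h + (X_L - 2^h)$ still holds and the lower part carries the sign of $X$. The function names are hypothetical.

```python
from typing import Tuple


def partition_signed(x: int, width: int) -> Tuple[int, int]:
    """Split a width-bit two's-complement x into signed (upper, lower), x == upper * 2**h + lower.

    Assumed reading of Fig. 6: for negative x, add one to the upper part and
    give the lower part sign bits of ones, i.e. subtract 2**h from it.
    """
    h = width // 2
    bits = x & ((1 << width) - 1)
    upper, lower = bits >> h, bits & ((1 << h) - 1)
    if upper >= 1 << (h - 1):  # reinterpret the upper half as two's complement
        upper -= 1 << h
    if x < 0:
        upper += 1             # Step 2: add one to the upper part
        lower -= 1 << h        # lower part becomes negative, e.g. 0110 -> (11)0110
    return upper, lower


def multiply_negative_aware(x: int, y: int, width: int) -> int:
    """Step 3: multiply each part by y as in the normal case and add the shifted results."""
    h = width // 2
    upper, lower = partition_signed(x, width)
    return upper * y * 2**h + lower * y


upper, lower = partition_signed(0b10110110 - 256, 8)  # X = 10110110 = -74
print(upper, lower)                                   # -4 -10, i.e. 1100 and (11)0110
product = multiply_negative_aware(-74, 0b0110, 8)
print(f"{product & 0x3FF:010b}")                      # 1001000100, i.e. -444 in 10 bits
```

Because the identity holds for every input, the correction changes only how the two parts are Booth encoded, not the final product.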

V. EXPERIMENTAL RESULTS
A. Analysis of Exchanging Ratio
We analyzed the input exchange ratios of the proposed multiplication scheme for DSP applications, using the discrete cosine transform (DCT) as a test case. The QCIF images Lena, Flower Garden, Miss America, and Table Tennis are used for these transforms. As shown in Table 2, the 8 × 8 DCTs of one image require 262,144 multiplications in these experiments. Shen's multiplication achieves 10,276 data exchanges (3.91% of all multiplications), whereas the proposed multiplication achieves 27,708 data exchanges (10.56% of all multiplications) on average. The proposed multiplication thus increases the number of exchanges by about 2.7 times (10.56/3.91) compared with the previous multiplication, because it uses more multiplications with fewer bits than the previous method. Similar exchange rates are achieved for all four images. This is because the DCT coefficients are fixed, so the exchanges occur at similar data positions.
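The exchange figures above can be rechecked with a few lines of Python. The counts are the ones reported in the text; the variable names are illustrative.

```python
TOTAL_MULTIPLICATIONS = 262_144  # 8 x 8 DCTs of one QCIF image (Table 2)
SHEN_EXCHANGES = 10_276          # Shen's multiplication
PROPOSED_EXCHANGES = 27_708      # proposed multiplication, averaged over the images

shen_rate = SHEN_EXCHANGES / TOTAL_MULTIPLICATIONS
proposed_rate = PROPOSED_EXCHANGES / TOTAL_MULTIPLICATIONS
improvement = PROPOSED_EXCHANGES / SHEN_EXCHANGES

print(f"Shen: {shen_rate:.2%}  proposed: {proposed_rate:.2%}  ratio: {improvement:.2f}")
# Shen: 3.92%  proposed: 10.57%  ratio: 2.70
```

Rounded to two decimals the rates print as 3.92% and 10.57%; the 3.91% and 10.56% quoted in the text are the same values truncated. Either way, the ratio of exchange counts is about 2.7.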

Fig. 4 DRD of the proposed multiplication
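The DRD decision can be modeled behaviorally in Python as below. This is a sketch under stated assumptions, not the gate-level design of Figs. 4 and 5: the comparator is modeled only by its stated behavior (output 0 when all its input bits are equal), it is applied to each overlapping 3-bit Booth window rather than to the exact bit groups of Fig. 4, and the part with more active comparator outputs becomes the multiplicand. Each partitioned sub-multiplication gets its own decision. All names are hypothetical.

```python
from typing import List, Sequence, Tuple


def comparator(bits: Sequence[int]) -> int:
    """Behavior stated for Fig. 5: output 0 when all input bits are equal (0 or 1), else 1."""
    return 0 if all(b == bits[0] for b in bits) else 1


def booth_windows(part: int, width: int) -> List[Tuple[int, int, int]]:
    """Overlapping windows (bit i+1, bit i, bit i-1) for i = 0, 2, 4, ..., with bit -1 = 0."""
    def bit(k: int) -> int:
        return 0 if k < 0 else (part >> k) & 1

    return [(bit(i + 1), bit(i), bit(i - 1)) for i in range(0, width, 2)]


def active_groups(part: int, width: int) -> int:
    """Windows whose comparator output is 1, i.e. nonzero Booth encoded products."""
    return sum(comparator(w) for w in booth_windows(part, width))


def drd(multiplicand: int, multiplier: int, width: int) -> Tuple[int, int, bool]:
    """Exchange so that the part with the smaller effective dynamic range is the multiplier."""
    if active_groups(multiplicand, width) < active_groups(multiplier, width):
        return multiplier, multiplicand, True
    return multiplicand, multiplier, False


a_hi, a_lo, b_hi, b_lo = 0b1111, 0b0101, 0b1011, 0b0110  # the 4-bit parts of Fig. 2
for mcand, mplier in [(a_hi, b_hi), (a_hi, b_lo), (a_lo, b_hi), (a_lo, b_lo)]:
    print(drd(mcand, mplier, 4))
```

Under this proxy the first two of the four terms in Fig. 2 are exchanged, because 1111 leaves only one nonzero Booth encoded product.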


Fig. 5 Comparator

B. Analysis of Power Dissipation
We compared the power dissipation of the proposed multiplier with that of existing Booth multipliers (Shen's [12], Yu's [10], and Ahn's [11]). We evaluated the power dissipation of the proposed multiplier's layout using the Synopsys PrimePower tool. Tables 3 and 4 show the overall power analysis results of the four multipliers when applied to the FFT and the wavelet transform of images. On average, the power dissipation of the proposed multiplier for these DSP algorithms was reduced by up to about 7% compared with Shen's, 15% compared with Ahn's, and 20% compared with Yu's multiplier. The exchange ratio differs from the power reduction ratio because the input data exchange ratio is based only on the dynamic range of the partitioned input data. Therefore, when two partitioned input data streams are exchanged, not all partial products produced by Booth encoding are necessarily zero.

REFERENCES
[1] Chang-Young Han, Hyoung-Joon Park, and Lee-Sup Kim, "A low-power array multiplier using separated multiplication technique," IEEE Trans. on Circuits and Systems II: Analog and Digital Signal Processing, vol. 48, no. 9, pp. 866-871, Sep. 2001.
[2] C. Lemonds, "A high throughput 16 by 16 bit multiplier for DSP cores," IEEE International Symposium on Circuits and Systems (ISCAS), vol. 2, pp. 477-480, 1996.
[3] R. H. Turner, T. Courtney, and R. Woods, "Implementation of fixed DSP functions using the reduced coefficient multiplier," Proc. 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '01), vol. 2, pp. 881-884, 2001.
[4] Yiquan Wu and Zhaoda Zhu, "The new real-multiplier FFT-j algorithms," Proc. IEEE 1993 National Aerospace and Electronics Conference (NAECON 1993), vol. 1, pp. 90-93, 24-28 May 1993.
[5] Yi-Wen Wu, O. T.-C. Chen, and Ruey-Liang Ma, "A low-power digital signal processor core by minimizing interdata switching activities," Proc. 44th IEEE 2001 Midwest Symposium on Circuits and Systems (MWSCAS 2001), vol. 1, pp. 172-175, 2001.
[6] V. Paliouras, K. Karagianni, and T. Stouraitis, "A low-complexity combinatorial RNS multiplier," IEEE Trans. on Circuits and Systems II: Analog and Digital Signal Processing, vol. 48, no. 7, pp. 675-683, Jul. 2001.
[7] A. A. Fayed and M. A. Bayoumi, "A merged multiplier-accumulator for high speed signal processing applications," Proc. 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 3, pp. III-3212-III-3215, 2002.
[8] S. Kim and M. C. Papaefthymiou, "Reconfigurable low energy multiplier for multimedia system design," Proc. IEEE Computer Society Workshop on VLSI, pp. 129-134, 2000.
[9] D. Bakalis, E. Kalligeros, D. Nikolos, H. T. Vergos, and G. Alexiou, "Low power BIST for Wallace tree-based multipliers," Proc. IEEE 2000 First International Symposium on Quality Electronic Design (ISQED 2000), pp. 433-438, 2000.
[10] Zhan Yu, L. Wasserman, and A. N. Willson, Jr., "A painless way to reduce power dissipation by over 18% in Booth-encoded carry-save array multipliers for DSP," Proc. 2000 IEEE Workshop on Signal Processing Systems (SiPS 2000), pp. 571-580, 11-13 Oct. 2000.
[11] Taekyoon Ahn and Kiyoung Choi, "Dynamic operand interchange for low power," Electronics Letters, vol. 33, no. 25, pp. 2118-2120, 4 Dec. 1997.
[12] Nan-Ying Shen and O. T.-C. Chen, "Low-power multipliers by minimizing switching activities of partial products," Proc. 2002 IEEE International Symposium on Circuits and Systems (ISCAS 2002), vol. 4, pp. IV-93-IV-96, 2002.

Table 4 Power analysis for wavelet transform application
(test images: Lena, Flower Garden, Miss America, Table Tennis)

Proposed multiplier   16.73   16.08   16.82   16.54
Shen's multiplier     17.81   17.46   18.07   17.39
Yu's multiplier       20.72   19.56   20.06   19.25
Ahn's multiplier      19.16   18.56   19.46   18.11

VI. CONCLUSION
We proposed a low-power multiplier using a modified Booth algorithm. To reduce power dissipation, we partitioned the two multiplication inputs into parts with fewer bits, so that partial products have a higher probability of becoming zero and the switching rate is lower. Although the area of the proposed multiplier increases by up to 9%, its power dissipation is reduced by up to about 20% compared with existing Booth multipliers. Therefore, the proposed multiplication process can be applied to low-power designs for portable multimedia information devices and SoCs, especially when low power consumption and high speed are the primary design constraints.
