

IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 11, NO. 3, JUNE 2003

Minimization of Switching Activities of Partial Products for Designing Low-Power Multipliers
Oscal T.-C. Chen, Sandy Wang, and Yi-Wen Wu
Abstract—This work presents low-power 2’s complement multipliers by minimizing the switching activities of partial products using the radix-4 Booth algorithm. Before computation for two input data, the one with a smaller effective dynamic range is processed to generate Booth codes, thereby increasing the probability that the partial products become zero. By employing the dynamic-range determination unit to control input data paths, the multiplier with a column-based adder tree of compressors or counters is designed. To further reduce power consumption, two multipliers based on row-based and hybrid-based adder trees are realized with operations on effective dynamic ranges of input data. Functional blocks of these two multipliers can preserve their previous input states for noneffective dynamic data ranges and thus reduce the number of their switching operations. To illustrate that the proposed multipliers exhibit low power dissipation, theoretical analyses of the switching activities of partial products are derived. The proposed 16 × 16-bit multiplier with the column-based adder tree conserves more than 31.2%, 19.1%, and 33.0% of the power consumed by the conventional multiplier in applications of the ADPCM audio, G.723.1 speech, and wavelet-based image coders, respectively. Furthermore, the proposed multipliers with row-based and hybrid-based adder trees reduce power consumption by over 35.3%, 25.3%, and 39.6%, and 33.4%, 24.9%, and 36.9%, respectively. When considering product factors of hardware areas, critical delays and power consumption, the proposed multipliers can outperform the conventional multipliers. Consequently, the multipliers proposed herein can be broadly used in various media processing to yield low power consumption at limited hardware cost or little slowing of speed.

Index Terms—Adder-tree, arithmetic, digital, low-power design, switching activity.

Manuscript received July 4, 2000; revised April 2, 2002. This work was supported in part by the Computer and Communication Research Laboratories, ITRI, Taiwan, under Contract TI-89024, and in part by the National Science Council, Taiwan, under Contract 88-2736-L-194-003. The authors are with the Department of Electrical Engineering, Signal and Media Laboratories, National Chung Cheng University, Chia-Yi, 621, Taiwan, R.O.C. (e-mail: oscal@ee.ccu.edu.tw). Digital Object Identifier 10.1109/TVLSI.2003.810788

1063-8210/03$17.00 © 2003 IEEE

I. INTRODUCTION

Advances in microelectronic technology have led to more effective encoding of data, more reliable transmission of information, and more embedded intelligence in systems. In particular, to meet the increasing market demand for portable applications, these microelectronic devices consume very low power. Consequently, various digital signal processing chips are now designed with low-power dissipation [1], [2]. In such systems, a multiplier is a fundamental arithmetic unit. The computation of a multiplier manipulates two input data to generate many partial products for subsequent addition operations, which in the CMOS circuit design, require many switching activities [3]. Thus, switching activities within the functional units of a multiplier account for the majority of the power dissipation of a multiplier, as given in the following:

$$P = \alpha \, C_L \, V_{DD}^{2} \, f \qquad (1)$$

where $\alpha$ is the switching activity parameter, $C_L$ is the loading capacitance, $V_{DD}$ is the operating voltage, and $f$ is the operating frequency. $\alpha C_L$ can also be viewed as the effective switching capacitance of the transistors’ nodes on charging and discharging. Therefore, minimizing switching activities can effectively reduce power dissipation without impacting the circuit’s operational performance.

Many researchers have elucidated various approaches that use modified algorithms, architectures, and circuits to reduce power consumption [4]–[9]. Abu-Khater et al. developed circuit techniques for low-power, high-performance multiplier designs [4]. Moshnyaga et al. analyzed the algorithmic, structural, and circuit levels, and used sign generation and 4–2 compressors to minimize switching activities [5]. Angel and Swartzlander suggested using an efficient sign extension scheme to process the sign bits [6], allowing the multiplier to bypass processing sign extensions, thus reducing power dissipation. Yu et al. reorganized a Booth-encoded carry-save adder array in a multiplier design to reduce power consumption [7]. Goldovsky et al. developed modified radix-4 Booth encoders to generate partial products that are summed by (3,2), (5,3), and (7,4) counters in an array with reducing sum and carry vectors [8]. Mahant-Shetti et al. employed a bottom-up temporal tiling approach to design a leapfrog array multiplier that minimized spurious transition activity [9].

In this work, low-power multipliers are investigated by minimizing switching activities of partial products according to effective dynamic ranges of input data. In designing the proposed low-power multipliers, the radix-4 Booth algorithm is utilized to reduce the complexity of implementation. For every two input data, the one with a smaller effective dynamic range is processed to yield several Booth codes. According to the Booth codes, the other datum is multiplied with −2, −1, 0, 1, or 2 to generate partial products that are then shifted and summed in parallel to yield the final result. Hence, these partial products have a greater chance of equaling zero because of the Booth encoding the datum with a smaller effective dynamic range. Furthermore, the switching activities of partial products decrease, implying a decline in power dissipation. To realize the proposed multipliers, the dynamic-range determination units can be easily designed in front of the Booth decoders and adder trees, to switch or pass input data flows where the adder trees can be implemented by the column-based, row-based and hybrid-based structures.
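As a rough numerical illustration of the switching-power relation in (1), the Python sketch below evaluates $P = \alpha C_L V_{DD}^2 f$ for two activity factors. The function name and every numeric value (the activity factors, the 1 pF load, the 2.5 V supply, and the 50 MHz clock) are illustrative assumptions, not figures from this work.

```python
def switching_power(alpha, c_load, v_dd, freq):
    """Dynamic switching power in watts: P = alpha * C_L * V_DD^2 * f."""
    return alpha * c_load * v_dd ** 2 * freq


# Hypothetical operating point: 1 pF switched load, 2.5 V supply, 50 MHz clock.
base = switching_power(alpha=0.50, c_load=1e-12, v_dd=2.5, freq=50e6)
# Same circuit with the switching activity factor lowered from 0.50 to 0.35.
reduced = switching_power(alpha=0.35, c_load=1e-12, v_dd=2.5, freq=50e6)

print(f"baseline power:   {base * 1e6:.2f} uW")
print(f"reduced activity: {reduced * 1e6:.2f} uW")
print(f"relative saving:  {100 * (1 - reduced / base):.1f}%")
```

Because power is linear in the activity factor, the relative saving equals the relative reduction in switching activity when load, voltage, and frequency are unchanged.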

With this multiplier, only some functional units can be activated to conduct operations according to the one of two input data with a smaller effective dynamic range [10], [11]. The conventional Booth-algorithm multiplier adds partial products in the column direction. When only the dynamic-range determination unit is used in front of the conventional multiplier that uses counters and compressors, as shown in Fig. 1(a), such a multiplier is denoted as the proposed column-based multiplier. Although partial products are more likely to be zero in the proposed column-based multiplier than in the conventional one, some compressors or counters which sum these zero products may consume power because they add the switched sum or carry-out bit of neighboring compressors or counters. To improve upon this, additions of partial products in the row direction are proposed to reduce the number of partial products connected to each adder unit, and the number of intermediate accumulation results connected to each adder unit, to have capability of preserving the previous states for unused functional blocks, as depicted in Fig. 1(b). Switching activities of the unused functional blocks are minimized where input bits of unused functional blocks remain unaltered. These two multipliers include master-stage flip-flops, slave-stage flip-flops, a dynamic-range determination unit, Booth decoders, a row-based adder tree or a hybrid-based adder tree, and a sign-extension unit. The states of input data stored in the flip-flops can be changed by a group of bits such as 4, 6, and 8 bits to reduce the number of flip-flops. However, the proposed row-based multiplier requires more flip-flops than the proposed column-based one. On the other hand, the critical delay of the proposed row-based multiplier is also longer than that of the proposed column-based multiplier because of adding in the row direction. This situation is improved by developing the hybrid-based adder tree which integrates column-based and row-based structures in the proposed hybrid-based multiplier.

In this study, the low-power 2’s complement Booth-algorithm multipliers based on column-based, row-based, and hybrid-based adder trees are implemented using TSMC 0.25 μm CMOS technology. The proposed multipliers, using column-based, row-based and hybrid-based adder trees are named as the proposed column-based, row-based and hybrid-based multipliers, respectively. The proposed column-based multiplier increases the probability that partial products become zero for power reduction. The proposed row-based and hybrid-based multipliers not only reduce the bit switching of partial products, but also minimize the power consumption of functional units for noneffective bits.

Fig. 1. The proposed multipliers. (a) The column-based multiplier. (b) The row-based or hybrid-based multiplier.
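The effect of Booth encoding the operand with the smaller effective dynamic range can be sketched in a few lines of Python. This is a behavioral illustration only, not the authors' circuit: `booth_digits`, `bits_needed`, and `choose_operands` are hypothetical helper names, and the two sample operands are arbitrary.

```python
def booth_digits(x, width=16):
    """Radix-4 Booth digits (each in -2..2) of a two's-complement value, LSB digit first."""
    bits = x & ((1 << width) - 1)
    digits, prev = [], 0          # prev plays the role of x_{-1} = 0
    for i in range(0, width, 2):
        lo, hi = (bits >> i) & 1, (bits >> (i + 1)) & 1
        digits.append(prev + lo - 2 * hi)
        prev = hi
    return digits


def bits_needed(x):
    """Smallest two's-complement width that can hold x (its effective dynamic range)."""
    return (x.bit_length() if x >= 0 else (~x).bit_length()) + 1


def choose_operands(a, b):
    """Return (encoded, multiplicand): the narrower operand is the one that is Booth encoded."""
    return (a, b) if bits_needed(a) <= bits_needed(b) else (b, a)


a, b = 12345, -3
zeros_conventional = booth_digits(a).count(0)   # always encode the first operand
encoded, multiplicand = choose_operands(a, b)
zeros_proposed = booth_digits(encoded).count(0)  # encode the narrower operand

print("zero partial products, conventional:", zeros_conventional, "of 8")
print("zero partial products, operand swap:", zeros_proposed, "of 8")
```

Each zero digit corresponds to a zero partial product, so encoding the narrow operand leaves more rows of the adder tree with nothing to add.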

The multiplication operations of the practical data are analyzed in the proposed multipliers that consume less power than the conventional multipliers. Additionally, to demonstrate the fact that the proposed multipliers have low-power consumption, equations are derived in the Appendix to demonstrate that the proposed multipliers exhibit partial products with low switching activities. Consequently, the multipliers proposed herein are very well suited to low-power multimedia processing at reasonable hardware cost or little reduction of speed.

II. PROPOSED LOW-POWER MULTIPLIERS

Figs. 2–4 show the proposed column-based, row-based, and hybrid-based 16 × 16-bit multipliers, respectively. In these three kinds of multipliers, Booth encoding is performed through the radix-4 algorithm, resulting in eight partial products for summation in the column-based, row-based, and hybrid-based adder trees. The functional units of the proposed low-power multipliers are described as follows:

Master-Stage and Slave-Stage Flip-Flops

The master-stage and slave-stage flip-flops are realized using the true-single phase edge-triggered circuit, as shown in Fig. 3(a) [12]. This type of circuit design has both high-speed and low-power dissipation characteristics. The master-stage flip-flops are to latch input data for the dynamic-range determination unit to decide the input data flow and generate control signals. The slave-stage flip-flops store the updated input data or retain previous data, and thus ensure that the functional units addressed by these data do not consume switching power.

Dynamic-Range Determination Unit

The dynamic-range determination unit detects effective dynamic ranges of input data, and then generates control signals. In the proposed column-based multiplier, these control signals determine the data flows between the master-stage and slave-stage flip-flops. In the proposed row-based and hybrid-based multipliers, the control signals not only select the data flows but also manipulate slave-stage flip-flops to maintain noneffective bits in their previous states. Moreover, these control signals are used to control the data path of an adder tree and the sign extension operation. Fig. 5 shows the functional blocks of the dynamic-range determination unit that includes comparators, multiplexors, logic gates, and latches. The effective dynamic range detection can be realized using groups of bits to simplify the implementation. In this study, since a partial product is determined by an average of two bits of an input datum in the radix-4 Booth encoding, a basic group is based upon two bits for detection. Data detection begins from the most significant bits, but not the four least significant bits, and the comparators examine each 3-bit group. If these three bits are all either zero or one, then a control signal output is 1, otherwise it is 0. An overlapped bit in the neighboring two groups is used to support a continual comparison.

Fig. 2. The proposed column-based 16 × 16-bit multipliers.
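The detection rule described above can be modeled behaviorally as follows. This is a minimal sketch under stated assumptions, not the comparator circuit of Fig. 5: the function name `effective_dynamic_range` is hypothetical, the scan is written as a software loop, and the datum is assumed to be a 16-bit two's-complement word.

```python
def effective_dynamic_range(x, width=16):
    """Detected effective dynamic range, in bits, of a two's-complement word.

    Overlapping 3-bit groups are scanned from the most significant bits in
    steps of two bits, leaving the four least significant bits unexamined.
    A group whose bits are all 0 or all 1 marks two more sign-extension bits.
    """
    bits = x & ((1 << width) - 1)
    detected = width
    top = width - 1                      # index of the highest bit in the group
    while top - 2 >= 3:                  # lowest examined group is bits 5..3
        group = (bits >> (top - 2)) & 0b111
        if group not in (0b000, 0b111):  # comparator output 0: stop scanning
            break
        detected -= 2                    # comparator output 1: drop two bits
        top -= 2                         # next group overlaps by one bit
    return detected


for value in (0, -8, 8, 32, 100, -32768):
    print(f"{value:7d} -> detected range {effective_dynamic_range(value)} bits")
```

With a 2-bit resolution the detected range is always even and never below 4 bits, so it can exceed the smallest width that would actually hold the value; for instance, 32 needs 7 bits but is reported as 8.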

Fig. 3. The proposed row-based 16 × 16-bit multipliers. (a) Mode I. (b) Mode II. (c) Mode III. (d) Mode IV.

Fig. 4. The proposed hybrid-based 16 × 16-bit multipliers. (a) Mode III. (b) Mode IV.

For a 16-bit datum, six groups are compared to determine the effective dynamic ranges of 4, 6, 8, 10, 12, 14, and 16 bits. Here, a 16 × 16-bit multiplier has two input data, the effective dynamic ranges of which are determined by 12 3-bit comparators. The one of two input data having a smaller effective dynamic range can be determined by logic operations on the signals from the comparators. The signal, indicating a datum with a smaller effective dynamic range, is generated to control multiplexors in the switcher of the dynamic-range determination unit to manipulate the input data flow. Furthermore, this and other signals, indicating the effective dynamic ranges of input data, address the control signal generator of the dynamic-range determination unit to yield the control signals that manipulate the slave-stage flip-flops, multiplexors of a row-based or hybrid-based adder tree, and a sign-extension unit. By using the control signals of the dynamic-range determination unit, only input bits in the effective dynamic range are allowed to move to the slave-stage flip-flops. Input bits in the noneffective dynamic range remain in their previous states such that no switching activities consume power.

Herein, to reduce the number of the slave-stage flip-flops in the proposed row-based multipliers, the effective dynamic ranges of input data are determined by a group of bits as a basis, such that the detected effective dynamic-range values may exceed the actual ones. Four bits or more can constitute a basic group of data that are either changed or unchanged in slave-stage flip-flops together, because each of the functional units after the slave-stage flip-flops requires at least two partial products to be computed. Herein, when effective dynamic ranges of input data randomly occur between 1 and 16 bits, four operational modes are considered for analysis simplification: 1) mode I that operates on 4, 8, 12, and 16 bits, 2) mode II that operates on 8, 12, and 16 bits, 3) mode III that operates on 8 and 16 bits, and 4) mode IV that operates on 12 and 16 bits.

A Booth Decoder

The radix-4 Booth decoder can generate five possible values of −2, −1, 0, 1, and 2 times the input datum. The proposed radix-4 Booth decoder, shown in Fig. 3(a), includes a 3-to-1 multiplexor and simple logic gates to select the decoded value of 0, 1 or 2 times the input datum, or to invert the output value.

An Adder Tree

The carry-save adders, (3,2), (5,3), and (7,4) counters, and a leapfrog adder array applied in the Yu, Goldovsky, and Mahant-Shetti’s multipliers [7]–[9], respectively, are applied in the adder trees of the proposed column-based multipliers for comparison. The proposed row-based multipliers require seven ripple adders and multiplexors that are arranged in four operational structures, as shown in Fig. 3. The hybrid-based adders, shown in Fig. 4, include the row-based adders and the column-based adders using Yu’s approach. The eight partial products are grouped into two parts which are individually summed in the column-based adders, and the results from these two parts are added by using the row-based adder. Only modes III and IV of the proposed hybrid-based multipliers are explored by considering the reduction of processing speed.

Sign-Extension Unit

The sign-extension unit is used only in the proposed row-based and hybrid-based multipliers. After an adder tree performs addition, the results in the effective and noneffective dynamic ranges have correct and incorrect values, respectively. Sign extension must be assigned to the output result in the noneffective dynamic range to restore the correct value in the final step. Herein, multiplexors were used to decide which bits were signs and which were values. Fig. 6 shows the functional blocks of the sign-extension unit in four different operational modes.

III. POWER ANALYSES

The proposed 16 × 16-bit 2’s complement Booth-algorithm multipliers using the column-based, row-based and hybrid-based adder trees are implemented by the Cadence tool, using TSMC 0.25 μm CMOS technology to generate their layouts. These layouts are extracted, and post-simulated by the Power-mill and Time-mill tools. Here, the widths/lengths of the pMOS and nMOS transistors are 2.0 μm/0.25 μm and 1.5 μm/0.25 μm, respectively, for most circuit cells in the conventional and proposed multipliers, except in the slave-stage flip-flops and the carry propagation adder in the last stage of the adder tree. Considering the driving capabilities of slave-stage flip-flops and the processing speed of the carry propagation adder, their transistors are sufficiently enlarged for use in both the proposed and conventional multipliers.

Fig. 5. The dynamic-range determination unit.
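The grouping of detected ranges into operational modes, and the final sign-extension step, can be illustrated with a short behavioral sketch. It assumes that each mode rounds a detected range up to the nearest width it supports and that the adder tree produces correct bits only within the effective product width; `MODES`, `grouped_range`, and `sign_extend` are hypothetical names, and the code does not model the multiplexor structure of Fig. 6.

```python
# Bit widths on which each operational mode works, as listed in the text.
MODES = {"I": (4, 8, 12, 16), "II": (8, 12, 16), "III": (8, 16), "IV": (12, 16)}


def grouped_range(detected_bits, mode):
    """Round a detected effective dynamic range up to the next width the mode supports."""
    for width in MODES[mode]:
        if detected_bits <= width:
            return width
    raise ValueError("detected range exceeds 16 bits")


def sign_extend(value, from_bits, to_bits=32):
    """Copy the sign bit of a from_bits-wide word into the upper bits of a to_bits-wide word."""
    value &= (1 << from_bits) - 1
    if value >> (from_bits - 1):          # sign bit set: interpret as negative
        value -= 1 << from_bits
    return value & ((1 << to_bits) - 1)


# Example: x was detected as a 4-bit datum, y uses the full 16 bits, mode IV is active.
x, y = -5, 1000
x_width = grouped_range(4, "IV")          # mode IV keeps 12 bits of x active
product_width = x_width + 16              # bits of the product that were actually computed
low_bits = (x * y) & ((1 << product_width) - 1)
restored = sign_extend(low_bits, product_width)

print("grouped width of x:", x_width)
print("restored product matches:", restored == (x * y) & 0xFFFFFFFF)
```

The coarser the mode, the fewer slave-stage flip-flop groups are needed, at the cost of keeping more bits active than the detected range strictly requires.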

Adaptive differential pulse code modulation (ADPCM) audio, G.723.1 speech, and wavelet-based image coders are employed in practical power analyses. Their multiplication operations are performed using a multiplier that is either the proposed or conventional multiplier. In the ADPCM audio coder, a 0.125-second segment of audio is analyzed, in which the multiplication operations of low-pass and high-pass band-splitting, and signal prediction involve 17 367 input vectors. In the G.723.1 speech coder, the multiplication operations involved in autocorrelation of linear prediction coding for 0.05-second speech signals sampled at 8 kHz have 26 697 input vectors. In the wavelet-based image coder, one fortieth of the multiplication operations of the 512 × 512-pixel Lenna image through the 5-tap low-pass and 3-tap high-pass filtering of the wavelet filters are performed and involve 19 117 input vectors. Fig. 7 shows the histograms of effective dynamic ranges of input data for multiplication in these three applications.

Table I lists the power consumption, areas and critical delays of the conventional and proposed column-based multipliers in these three applications. The proposed column-based multipliers that use the approaches of Yu, Goldovsky, and Mahant-Shetti, consume less power than the conventional Yu’s, Goldovsky’s and Mahant-Shetti’s multipliers. Goldovsky’s multiplier requires a larger hardware area than the other two conventional multipliers since it uses the conditional-sum adder in the last stage of its adder tree. In Mahant-Shetti’s multiplier, the sum output of a full adder is linked to the sum input of the subsequent adder using a leapfrog connection, such that this multiplier requires more full adders to realize its adder tree than Yu’s multiplier.

Fig. 6. The sign-extension unit. (a) Mode I. (b) Mode II. (c) Mode III. (d) Mode IV.

Here, a further modification connects the sum and carry outputs of a carry save adder to the carry and sum inputs of the subsequent carry save adder. Additionally, Yu’s multiplier includes the adder array for adding from the most to the least significant bits. The proposed column-based multiplier, using Yu’s approach, consumes less power than the other two proposed column-based multipliers. Here, it uses 31.2%, 19.1%, and 33.0% less power than Yu’s multiplier to realize the ADPCM audio, G.723.1 speech and wavelet-based image coders, respectively. Additionally, the dynamic-range determination unit accounts for 7.1% of the power consumption in the proposed column-based multiplier using Yu’s approach. Fig. 7 shows that the effective dynamic ranges of input data from the wavelet-based image coder vary less and are smaller than 9 bits. Hence, the proposed column-based multipliers computing the wavelet-based image coder can effectively switch or pass the input data flow to encode input data of which effective dynamic ranges are smaller than 9 bits, and thus they consume less power than those computing the ADPCM audio and G.723.1 speech coders.

Table II lists the power consumption, areas and critical delays of the proposed row-based and hybrid-based multipliers for these three applications. The proposed row-based and hybrid-based multipliers in modes III and IV consume less power than the proposed column-based multipliers. The proposed row-based and hybrid-based multipliers in mode IV save more than 35.3%, 25.3%, and 39.6%, and 33.4%, 24.9%, and 36.9% of the power in Yu’s multiplier to realize the ADPCM audio, G.723.1 speech and wavelet-based image coders, respectively. Tables I and II illustrate that the row-based multiplier in mode IV consumes the least power. Nevertheless, the proposed column-based, row-based and hybrid-based multipliers exhibit more than 0%, 14%, and 2.5% of additional critical delay, and more than 12%, 21%, and 12.6% of additional hardware area, relative to Yu’s multiplier, respectively, when operational mode IV is utilized.

Fig. 7. The histograms of effective dynamic ranges of input data for multiplication in the three practical applications. (a) ADPCM audio coder. (b) G.723.1 speech coder. (c) Wavelet-based image coder.

When considering the factor of multiplying power consumption, areas and critical delays, the proposed hybrid-based multiplier in mode IV performs best in these three applications and the second best performer is the proposed column-based multiplier. The power dissipation of the dynamic-range determination unit and sign-extension unit is less than 8.8% of those of the row-based and hybrid-based multipliers in mode IV.

The proposed column-based multiplier that follows Yu’s approach, the proposed row-based and hybrid-based multipliers in modes III and IV, and Yu’s conventional multiplier are chosen from Tables I and II for a comparison that involves the effective dynamic ranges of input data with uniform and Gaussian distributions, where the signs of the input data are randomly generated. Here, each distribution case involves 15 000 input vectors. Table III lists the power consumption of the proposed and conventional multipliers for uniformly distributed effective dynamic ranges of input data.

TABLE I
POWER CONSUMPTION, AREAS AND CRITICAL DELAYS OF THE CONVENTIONAL AND PROPOSED COLUMN-BASED MULTIPLIERS

TABLE II
POWER CONSUMPTION, AREAS AND CRITICAL DELAYS OF THE PROPOSED ROW-BASED AND HYBRID-BASED MULTIPLIERS

TABLE III
POWER CONSUMPTION OF THE PROPOSED COLUMN-BASED, ROW-BASED AND HYBRID-BASED, AND YU’S 16 × 16-BIT MULTIPLIERS FOR EFFECTIVE DYNAMIC RANGES OF INPUT DATA WITH UNIFORM DISTRIBUTIONS
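The ranking by the product of power, area, and critical delay can be checked with a few lines of arithmetic. The sketch below normalizes everything to Yu's conventional multiplier. The power savings are the lower bounds quoted for mode IV, and the area ratios (1.13, 1.21, 1.13) and delay ratios (1.00, 1.15, 1.03) are an assumed reading of the figures quoted in this section, so the resulting factors are approximate; the dictionary and function names are hypothetical.

```python
# Fraction of Yu's power that each proposed multiplier saves (lower bounds),
# ordered as (ADPCM audio, G.723.1 speech, wavelet image).
POWER_SAVING = {
    "column": (0.312, 0.191, 0.330),
    "row": (0.353, 0.253, 0.396),
    "hybrid": (0.334, 0.249, 0.369),
}
# Hardware area and critical delay relative to Yu's multiplier (assumed reading).
AREA_RATIO = {"column": 1.13, "row": 1.21, "hybrid": 1.13}
DELAY_RATIO = {"column": 1.00, "row": 1.15, "hybrid": 1.03}
APPS = ("ADPCM audio", "G.723.1 speech", "wavelet image")


def product_factor(kind, app_index):
    """Power x area x delay, normalized so that Yu's multiplier scores 1.0."""
    power_ratio = 1.0 - POWER_SAVING[kind][app_index]
    return power_ratio * AREA_RATIO[kind] * DELAY_RATIO[kind]


rankings = {}
for i, app in enumerate(APPS):
    factors = {kind: product_factor(kind, i) for kind in POWER_SAVING}
    rankings[app] = sorted(factors, key=factors.get)   # smallest factor first
    summary = ", ".join(f"{k}={v:.3f}" for k, v in factors.items())
    print(f"{app}: {summary} -> best: {rankings[app][0]}")
```

Under these assumed ratios the hybrid-based design has the smallest factor in all three applications and the column-based design comes second, which agrees with the ordering stated above.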

The saving ratios of power consumption of the proposed column-based multiplier against that of the conventional multiplier increase with the effective dynamic ranges of the input data. The proposed row-based and hybrid-based multipliers in mode III have the largest power saving ratios for effective dynamic ranges of input data between 1 and 8 bits, whereas these two multipliers in mode IV have the largest power saving ratios for effective dynamic ranges of input data between 1 and 12 bits. This effect reveals that operational modes III and IV can match the effective dynamic ranges of input data from 1 to 8 bits and from 1 to 12 bits, respectively.

Table IV specifies the power consumption of the proposed and conventional multipliers when effective dynamic ranges of input data follow the Gaussian distributions with different means and standard deviations. The effective dynamic ranges of input data increase with the mean, increasing power consumption. However, for a given mean, larger standard deviations facilitate increased power savings because of an increased probability of encoding the data with smaller effective dynamic ranges. Tables III and IV reveal that the proposed row-based or hybrid-based multipliers in modes III or IV consume the least power for various effective dynamic-range distributions. The proposed column-based, row-based, and hybrid-based multipliers have 1.13, 1.21, and 1.13 times the hardware area of Yu’s conventional multiplier, and 1.00, 1.15, and 1.03 times the critical delay, respectively, when operational mode IV is utilized.

The results of the previous 16 × 16-bit proposed and conventional multipliers are analyzed to effectively utilize the proposed column-based, row-based, and hybrid-based multipliers. When two neighboring input data have a large dynamic-range difference, the proposed column-based multiplier can be cost-effective. When neighboring input data have similar effective dynamic ranges and the same sign, the proposed row-based and hybrid-based multipliers can conserve more power than the proposed column-based multiplier. Furthermore, the proposed row-based multiplier may consume less power but has a longer delay than the proposed hybrid-based multiplier. In addition, the proposed row-based and hybrid-based multipliers can effectively save power when their operational modes are selected to match the effective dynamic-range distribution of input data. Users can thus determine a proposed multiplier that is suited to their applications by considering the chip area, speed, power consumption, and data type. Therefore, the multipliers proposed herein consume less power by reducing the switching activities of partial products to realize various low-power multimedia applications.

IV. CONCLUSION

The three proposed Booth-algorithm multipliers are demonstrated to dissipate less power than conventional ones. These three multipliers are equipped with dynamic-range determination units to add partial products in the column-based, row-based and hybrid-based adder trees, respectively. The dynamic-range determination unit detects the one of two input data with the smaller effective dynamic range for Booth encoding, minimizing the switching activities of partial products.

TABLE IV
POWER CONSUMPTION OF THE PROPOSED COLUMN-BASED, ROW-BASED AND HYBRID-BASED, AND YU’S 16 × 16-BIT MULTIPLIERS FOR EFFECTIVE DYNAMIC RANGES OF INPUT DATA WITH GAUSSIAN DISTRIBUTIONS

TABLE V
PROBABILITIES OF THE BOOTH DECODED VALUES BEING −2Y, −Y, 0, Y, AND 2Y

Additionally, the dynamic-range determination unit of the proposed row-based and hybrid-based multipliers controls the slave-stage flip-flops to store effective dynamic-range bits of an input datum, manipulates the data flow of an adder tree, and determines the operation of the sign-extension unit for further power reduction. The power analyses of multiplication operations of the practical input data confirmed that the proposed 16 × 16-bit column-based, row-based, and hybrid-based multipliers dissipate less power than Yu’s conventional multiplier. The proposed hybrid-based multiplier is the best and the proposed column-based multiplier is the second best in terms of the product factors of hardware areas, critical delays and power consumption. Finally, processing speed, hardware complexity, power consumption, and data types are the most important considerations of the cost-effective selection of the proposed column-based, row-based, or hybrid-based multiplier. Consequently, the proposed low-power multipliers can be used in various practical applications with a small increase in hardware complexity or critical delay.

APPENDIX
THEORETICAL ANALYSES OF SWITCHING ACTIVITIES

The theoretical foundation is derived to illustrate the reduction of switching activities for the partial products of the proposed 2’s complement multiplication. The radix-4 Booth algorithm is usually applied to encode one of two input data. Here, input data are assumed to be uncorrelated and switch simultaneously. In addition, neighboring partial products are independent and simultaneously change their states without glitching. Using the radix-4 Booth algorithm, a datum $X$, the 2’s complement of word length of $n$, is modified to

$$X = \sum_{i=0}^{n/2-1} \left( x_{2i-1} + x_{2i} - 2x_{2i+1} \right) 4^{i} \qquad (2)$$

where $x_i$ is the $i$th digit of $X$, $x_{-1}$ equals zero, and $n$ is assumed to be an even number. In Booth encoding, $X$ is partitioned into several 3-bit groups, each of which has one bit that is overlapped with the previous group. Hence, each group, $x_{2i-1} + x_{2i} - 2x_{2i+1}$, can be represented by five different values of −2, −1, 0, 1, and 2 [13]. Table V lists occurrence probabilities of these five values when three bits of $X$ are uniformly distributed as 0 or 1. Multiplying the other input datum, $Y$, by the value of each group yields an intermediate product, that is, a partial product, which is the output value from the radix-4 Booth decoder.

For the case in which $X$ has a given effective dynamic range, the probability that each partial product is zero can be derived as in (3). Table VI presents the occurrence probabilities of the Booth decoded values, obtained by considering the effective dynamic ranges of $X$.

TABLE VI
PROBABILITIES OF BOOTH DECODED VALUES AT DIFFERENT EFFECTIVE DYNAMIC RANGES

The relationship between two successive partial products can be classified simply as four cases of changes of partial products: 1) from zero to zero, 2) from zero to nonzero values, 3) from nonzero values to zero, and 4) from nonzero to nonzero values. Switching activities occur in cases 2), 3), and 4), and thus have an average of one half of the bits with switching. Accordingly, the average switching activity of a partial product can be represented by (4), whose terms designate the probabilities associated with the partial product according to (3). Hence, the average switching activity of all partial products is given by (5).

If a series of data, $X$, is Booth encoded and the effective dynamic ranges of $X$ occur with given probabilities, each indicating the probability that the effective dynamic range is a particular number of bits, then the average switching activity of all partial products can be approximately given by (6). Equation (6) represents the average switching activity of partial products of the conventional multiplication. According to (6), the switching activity can be reduced when increasing the probabilities that the partial products are zero. From Table VI, the probability that a partial product is zero is a fixed value for a given effective dynamic range, and (4) and (6) reveal that the minimum average switching activity occurs when the effective dynamic range of $X$ is only 1 bit. Furthermore, altering the distribution of the effective dynamic ranges of the Booth-encoded datum can effectively increase the probability that partial products are zero, minimizing switching activities.

The partial products from Booth decoders that operate on the most significant bits of input data are more likely to become zero when the proposed column-based multiplication, as shown in Fig. 1(a), is employed by Booth encoding the one of two input data with a smaller effective dynamic range. In this case, the dynamic-range determination unit has a detection resolution of 2 bits and determines effective dynamic ranges larger than 4 bits. If the datum with a smaller effective dynamic range is used for Booth encoding, then the probability that the effective dynamic range of input data for Booth encoding is a particular number of bits can be formulated as in (7).
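The occurrence probabilities of the five Booth decoded values for uniformly distributed bits can be reproduced by enumerating the eight equally likely 3-bit groups. The sketch below does this with exact fractions; the function name `booth_value` is a hypothetical helper, and the enumeration is an illustration of the uniform-bit assumption rather than a reproduction of the paper's table layout.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def booth_value(x_prev, x_low, x_high):
    """Radix-4 Booth value of one overlapping 3-bit group: x_{2i-1} + x_{2i} - 2*x_{2i+1}."""
    return x_prev + x_low - 2 * x_high


# Every 3-bit pattern is equally likely when the bits are independent and uniform.
counts = Counter(booth_value(*bits) for bits in product((0, 1), repeat=3))
probabilities = {value: Fraction(count, 8) for value, count in sorted(counts.items())}

for value, prob in probabilities.items():
    print(f"decoded value {value:+d}Y: probability {prob}")
```

A zero partial product therefore occurs with probability 1/4 for a group of random bits, whereas a group that lies entirely in the sign-extension region (all three bits equal) always decodes to zero, which is why a smaller effective dynamic range raises the share of zero partial products.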

The probability that the effective dynamic range of input data for Booth encoding takes a given number of bits can be formulated as Eq. (7). Replacing the corresponding term in Eq. (4) yields the probability that the partial product from the Booth decoder is zero, Eq. (8). Consequently, the average switching activity of all partial products within the proposed column-based multiplication is then represented by Eq. (9).

As well as Booth encoding the smaller dynamic-range numbers, the proposed row-based and hybrid-based multipliers, shown in Fig. 1(b), perform additions of partial products at effective dynamic data ranges to save power. Only partial products from the effective dynamic range of an input datum for Booth encoding are switched and the others remain in their previous states. Hence, their switching activities occur when partial products, within the grouped effective dynamic ranges, change from zero to nonzero values, from nonzero to nonzero values, and from nonzero values to zero. Several grouped data ranges are allowed for preserving the previous states to reduce the number of the slave-stage flip-flops. For example, the proposed hybrid-based 16 × 16-bit multiplier in mode III has two predetermined data ranges, from 1 to 8 bits and from 9 to 16 bits. Thus, the switching activity of each partial product can be formulated as Eq. (10), which is given separately for an odd number and for an even number and which uses the least number in the predetermined data range. For instance, a value of 11 belongs to the range from 9 to 16 bits, so 9 is used in Eq. (10). The average switching activity of all partial products for the proposed row-based or hybrid-based multiplication is then given by Eq. (11).

According to (6), (9) and (11), the average switching activities of partial products for the conventional and proposed multiplication can be analyzed for various effective dynamic ranges of input data. Here, 16 × 16-bit multiplication is used as an example in which two input data are assumed to have the same dynamic-range distribution. Table VII illustrates average switching activities of partial products for effective dynamic ranges of input data with uniform distributions.

TABLE VII AVERAGE SWITCHING ACTIVITIES OF PARTIAL PRODUCTS OF THE PROPOSED AND CONVENTIONAL BOOTH-ALGORITHM MULTIPLIERS FOR EFFECTIVE DYNAMIC RANGES OF INPUT DATA WITH UNIFORM DISTRIBUTIONS

According to Table VII, a larger effective dynamic range of input data implies greater switching activities. With the proposed multipliers, saving ratios are likely to increase with effective dynamic ranges, because larger differences between the effective dynamic ranges of two input data enable the proposed multipliers to encode the input data with smaller effective dynamic ranges. These additional reductions in switching activities come primarily from the changes of effective dynamic ranges of two neighboring input data for Booth encoding, from large to small.
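The transition counting behind this analysis can be approximated numerically. The following Python sketch is a simplified Monte Carlo model under stated assumptions, not a reproduction of Eqs. (6), (9), or (11) or of Table VII: all function names are hypothetical, a partial product is treated as zero exactly when its radix-4 Booth digit is zero, every transition other than zero to zero counts as one switch, and effective dynamic ranges are drawn uniformly from 1 to 16 bits. It compares always encoding the first operand (a stand-in for the conventional multiplier) with encoding the operand of smaller effective dynamic range.

```python
import random

WIDTH = 16  # operand word length in bits


def booth_digits(x):
    """Radix-4 Booth digits of a 2's complement integer, least significant first."""
    u = x & ((1 << WIDTH) - 1)
    bits = [0] + [(u >> i) & 1 for i in range(WIDTH)]  # bits[0] is the implicit 0 below the LSB
    return [bits[2 * i] + bits[2 * i + 1] - 2 * bits[2 * i + 2] for i in range(WIDTH // 2)]


def effective_range(x):
    """Assumed definition: fewest bits, sign bit included, that hold x."""
    return (x if x >= 0 else ~x).bit_length() + 1


def random_operand(rng):
    """Draw a range uniformly from 1..WIDTH, then a value with exactly that range."""
    k = rng.randint(1, WIDTH)
    magnitude = 0 if k == 1 else rng.randrange(1 << (k - 2), 1 << (k - 1))
    return magnitude if rng.random() < 0.5 else ~magnitude


def average_switching(encoded_operands):
    """Fraction of partial-product slots that are not zero in both consecutive cycles."""
    rows = [booth_digits(x) for x in encoded_operands]
    switches = 0
    for prev, cur in zip(rows, rows[1:]):
        # zero -> zero is the only transition assumed to cause no switching
        switches += sum(1 for p, c in zip(prev, cur) if p != 0 or c != 0)
    return switches / ((len(rows) - 1) * (WIDTH // 2))


def simulate(samples=5000, seed=1):
    """Return (conventional, proposed) average switching activities."""
    rng = random.Random(seed)
    pairs = [(random_operand(rng), random_operand(rng)) for _ in range(samples)]
    conventional = average_switching([x for x, _ in pairs])  # always encode the first operand
    proposed = average_switching([min(p, key=effective_range) for p in pairs])
    return conventional, proposed
```

Calling `simulate()` returns the two averages; under this model the second value should be the smaller one, because the operand with the smaller range produces more zero Booth digits in the upper positions. The model ignores glitching, adder-tree activity, and the state-preserving flip-flops of the row-based and hybrid-based designs.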

Table VIII presents the average switching activities of the conventional and proposed multipliers for effective dynamic ranges of input data with Gaussian distributions.

TABLE VIII AVERAGE SWITCHING ACTIVITIES OF PARTIAL PRODUCTS OF THE PROPOSED AND CONVENTIONAL BOOTH-ALGORITHM MULTIPLIERS FOR EFFECTIVE DYNAMIC RANGES OF INPUT DATA WITH THE GAUSSIAN DISTRIBUTIONS

The variation characteristics of the results in Tables VII and VIII are quite consistent with those in Tables III and IV, respectively. That is, when the mean increases, an increase in the effective dynamic range decreases the probability that the partial products become zero, making a reduction in switching activities more difficult. When the standard deviation increases, variations in effective dynamic ranges of two input data increase, thereby increasing the saving ratios of switching activities. According to Tables VII and VIII, the proposed row-based or hybrid-based multipliers in modes III or IV can exhibit the least switching activity, since they use smaller effective dynamic-range numbers for Booth encoding and control values of partial products in part of the noneffective dynamic range to remain unchanged. However, as effective dynamic ranges of input data span a small range or have a low standard deviation, minimizing switching activities becomes increasingly difficult. Thereby, the power conserved from reduction of switching activities cannot compensate for the power consumed by the overhead hardware components in the proposed multipliers. In contrast, the proposed multipliers consume little more power than the conventional multiplier in these cases.

ACKNOWLEDGMENT

Valuable comments and suggestions from reviewers are highly appreciated. Nan-Ying Shen, Dept. of Electrical Engineering, National Chung Cheng University, Chia-Yi, Taiwan helped on the circuit layouts and simulations. Dr. Bing J. Sheu, Nassda Corp., Santa Clara, is also commended for his valuable suggestions on low-power circuit design.

Oscal T.-C. Chen (S'89–M'94) was born in Taiwan, in 1965. He received the B.S. degree in electrical engineering from National Taiwan University in 1987, and the M.S. and Ph.D. degrees in electrical engineering from University of Southern California at Los Angeles, USA, in 1990 and 1994, respectively. From 1994 to 1995, he was with the Computer Processor Architecture Department of Computer Communication and Research Labs. (CCL), Industrial Technology Research Institute (ITRI), Hsinchu, Taiwan, as System Design Engineer, Project Leader, and Section Chief. Since September 1995, he has been an Associate Professor in the Department of Electrical Engineering, National Chung Cheng University (NCCU), Chiayi, Taiwan. Currently, he is also Director of the Academic Development Division, Office of Research and Development, NCCU. He contributed significantly to many industrial applications including the fuzzy chip, speech recognition system, and digital signal processor. His research interests include analog/digital circuit design, DSP processors, VLSI systems, RF IC, microsensors, neural networks, video/audio processing, and communication systems. He has also served as a Technical Consultant with the Institute for Information Industry, Center for Aviation and Space Technology and CCL, ITRI. He was the corecipient of the Best Paper Award of IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS in 1995. Dr. Chen was an Associate Editor of IEEE Circuits and Devices Magazine from July 1995 to March 1999. He participated in the Technical Program Committee of the IEEE International Conference on Multimedia and Expo, 2000–2002. He is a Life Member of Chinese Fuzzy Systems Association, and a Founding Member of the multimedia systems and applications technical committee of IEEE Circuits and Systems Society.

she joined Winbond Corporation.C. Taiwan. in 1976. Inc.. She received the B.. degree in electrical engineering from National Taiwan Ocean University at Keelung. R.C. degree in electrical engineering from National Chung Cheng University at Chiayi.S.S.C. Her research interests include operational amplifiers. and lowpower CMOS integrated circuits for consumer electronics. respectively. she is an Integrated Circuit Design Engineer in the Etrend Electronics. In 2000. . Her research interests include digital circuit design.: MINIMIZATION OF SWITCHING ACTIVITIES OF PARTIAL PRODUCTS 433 Sandy (Li Yueh) Wang was born in Taiwan. degrees in electrical engineering from National Chung Cheng University. R. Taiwan. R. in 1997 and 1999..C. Taiwan. Hsinchu. Yi-Wen Wu was born in Yunlin. RF circuit modules. Taiwan. board-level development and system integration.O. She received the B. and M.C. Tainan. Taiwan.S.O.O.O.O. Taiwan R. Currently. in 1999. in 2001. R.S. in 1974.CHEN et al. and the M. where she works in the field of very large scale integration (VLSI) circuit design and system analysis.