


Design of Power-Efficient Configurable
Booth Multiplier
Shiann-Rong Kuang, Member, IEEE, and Jiun-Ping Wang

Abstract—In this paper, a power-efficient 16 × 16
configurable Booth multiplier (CBM) that supports single 16-b, single
8-b, or twin parallel 8-b multiplication operations is proposed.
To efficiently reduce power consumption, a novel dynamic-range
detector is developed to dynamically detect the effective dynamic
ranges of two input operands. The detection result is used to not
only pick the operand with smaller dynamic range for Booth
encoding to increase the probability of partial products becoming
zero but also deactivate the redundant switching activities in ineffective ranges as much as possible. Moreover, the output product
of the proposed multiplier can be truncated to further decrease
power consumption by sacrificing a bit of output precision. To efficiently and correctly combine these techniques, some additional
components, including a correcting-vector generator, an adjustor,
a sign-bit generator, a modified error compensation circuit, etc.,
are also developed. Finally, three real-life applications are adopted
to evaluate the power efficiency and error performance of the proposed multiplier. The results show that the proposed multiplier is
more complex than non-CBMs, but significant power and energy
savings can be achieved. Furthermore, the proposed multiplier
maintains an acceptable output quality for these applications
when truncation is performed.
Index Terms—Booth multiplier (BM), configurable multiplication, low-power design, truncation, partially guarded computation.



I. INTRODUCTION

PORTABLE multimedia and digital signal processing
(DSP) systems, which typically require flexible processing ability, low power consumption, and short design cycle,
have become increasingly popular over the past few years.
Many multimedia and DSP applications are highly multiplication intensive so that the performance and power consumption
of these systems are dominated by multipliers. Unfortunately,
portable devices mostly operate with stand-alone batteries,
but multipliers are very power consuming. Consequently, it
is imperative to develop power-efficient multipliers to
compose a high-performance and low-power portable multimedia and DSP system.

Manuscript received August 13, 2008; revised December 26, 2008. First published June 02, 2009; current version published March 05, 2010. This work was
supported in part by the National Science Council, Taiwan, under Grant NSC
96-2220-E-110-007. This paper was recommended by Associate Editor L. He.
The authors are with the Department of Computer Science and Engineering, National Sun Yat-Sen University, Kaohsiung 804, Taiwan (e-mail:
Digital Object Identifier 10.1109/TCSI.2009.2023763

If the multiplier coefficients in these systems are constant,
a general multiplier can be simplified to a network of shifts,
adders, and subtracters [1] to reduce power consumption.
However, this kind of simplified multiplier is so inflexible
that it may not be suitable for multiplication operations with
different or nonconstant coefficients. To achieve flexible processing ability, various techniques for reconfigurable
multipliers that are capable of supporting multiple-precision
multiplications have been developed [2]–[7]. These techniques
partitioned operands into multiple lower precision operands
and performed several multiplications in parallel. For example,
the reconfigurable-multiplier architecture with one level of
recursion was developed in [2] to provide variable-precision
arithmetic. The work in [3] proposed a methodology to design the reconfigurable unsigned array multiplier. A larger
multiplier was partitioned into a number of smaller multipliers
to perform different precision modes, depending upon extra
control signals. In [4], the architecture for a reconfigurable
unsigned multiplier with one level of recursion was proposed.
By employing three … and one …, scalable multiplication operations can be accomplished. Unfortunately, these unsigned multipliers cannot be applied to most
of the multimedia and DSP applications due to their signed
multiplication operations. On the other hand, Krithivasan
and Schulte [5] proposed a novel technique for designing a
subword-parallel multiplier that supports both unsigned and
two’s-complement multiplication. In [6], the subword-parallel
multiplier was partitioned into four independent multiply modules that generate partial products using Booth encoding, and
appropriate constant values were added to the partial-product
array to ensure the correctness of the final product in various
operation modes. However, these multipliers do not take the
power efficiency into consideration.
Several techniques are available to improve the power efficiency of portable systems based on the fact that the effective dynamic range of the input operands for multiplication operations
in many multimedia and DSP applications is generally limited to
a small range and that the case with a maximal range rarely occurs. Approaches termed guarded evaluation [8]–[12] reduce the
power consumption of multipliers by eliminating spurious computations according to the dynamic range of the input operands.
For example, Choi et al. [9] proposed the partially guarded computation method that disables a fraction of the multiplier based
on the dynamic range of the input operands. The unnecessary
computations in the sign-extension part are removed, thereby
reducing power consumption. In [10], an array multiplier was
broken into four small clusters. By utilizing a clock-gating technique and performing input pattern analysis in advance with a simple
detection unit, some of the clusters that produce the zero result
are disabled. In [11], the signal-gating scheme based on the effective range of each input operand was proposed to deactivate
unnecessary computations of the multiplier dynamically. The work in [12] separated the arithmetic units into the most and least significant parts and turned off the most significant part when it did not affect the computation results to save power. Techniques in [13] that can dynamically adjust two voltage supplies based on the range of the incoming operands and disable ineffective ranges with a zero-detection circuitry were presented to decrease the power consumption of multipliers. Shen and Chen [14] developed a dynamic-range detector to detect the effective range of two operands. The one with the smaller dynamic range is processed to generate Booth encoding so that partial products have a greater opportunity to be zero. The work in [15] partitioned the two input operands into fewer bits for Booth encoding to increase the probability of partial products becoming zero, thereby reducing power consumption maximally.

Furthermore, the 2n-bit output product of the n × n multiplier in many multimedia and DSP systems is frequently truncated to n bits due to the fixed register size and bus width inside the hardware. With this characteristic, significant power saving can be achieved by directly omitting the adder cells for computing the n least significant bits of the 2n-bit output product, but large truncation errors are introduced. Various error compensation approaches and circuits [16]–[22] have been developed to reduce the truncation error of array and Booth multipliers (BMs). These compensation approaches can be classified into constant and adaptive schemes. In the constant scheme [16], constant error compensation values were precomputed and added along with the carry inputs of the retained adder cells to reduce the truncation error. On the contrary, data-dependent error compensation approaches [18]–[22], which add the estimated compensation carries to the carry inputs of the retained adder cells, were developed to achieve better accuracy than that of the constant scheme by adaptively adjusting the compensation values according to the input data values at the expense of slight area and delay overheads.

Because the multiplications in the kernels of the most common multimedia and DSP applications are based on 8–16-b operands, the proposed multiplier is designed not only to perform single 16-b, single 8-b, or twin parallel 8-b multiplication operations but also to offer a flexible tradeoff between output accuracy and power consumption to achieve more power savings. Our main concerns are power efficiency and structural flexibility. In this paper, we attempt to combine configuration, partially guarded computation, and the truncation technique to design a power-efficient configurable BM (CBM). However, applying these techniques simultaneously to a BM causes some problems, such as when to exchange the input operands, how to detect and shut down redundant computation, an incorrect sign bit (SB), and large product errors. To overcome these problems, a novel dynamic-range detector and some additional components, including a correcting-vector generator, an SB generator, an adjustor, and a sign-extension unit (SEU), are designed. Additionally, the error compensation circuit (ECC) proposed in [21] is modified and incorporated into the configurable multiplier to reduce the product error when truncation is performed. The experimental results demonstrate that the proposed multiplier can provide various configurable characteristics for multimedia and DSP systems and achieve more power savings with slight area overhead when compared to the previous low-power BM [15].

The remainder of this paper is organized as follows. In Section II, a new configurable multiplication with one level of recursion is proposed. An efficient and flexible sign-extension method for multiterm parallel addition is also introduced in this section. Section III describes the proposed power-efficient CBM and related key components. In Section IV, three real-life applications are used to demonstrate the power efficiency and error performance of the proposed multiplier. Finally, a concluding remark is given in Section V.

1549-8328/$26.00 © 2010 IEEE

II. CONFIGURABLE MULTIPLICATION WITH ONE-LEVEL RECURSION

It is assumed that a signed n-bit multiplicand is multiplied by a signed n-bit multiplier to generate a signed 2n-bit product. For convenience, … and … are sometimes represented as … and …, respectively. The two’s-complement representations of the two operands are given in (1). To easily realize the twin-precision multiplication, another set of signed n/2-bit binary numbers for … and … is defined in (2) and (3). Based on (2) and (3), the product can be calculated as (4). The … and … values can be expressed as (5). Substituting (5) into (4), we can obtain (6).
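The symbols in (1)–(7) were lost in extraction, so the following Python sketch shows only one consistent way to realize one-level recursion. It is not necessarily the paper's exact (2)–(7). The sketch assumes that both n/2-bit halves of each operand are read as signed numbers, and that s is the sign bit of the lower half, so that x = (x_hi + s)·2^(n/2) + x_lo. Under that assumption, the four signed sub-products plus the correction term 2^(n/2)·(s_x·y + s_y·x) − s_x·s_y·2^n reproduce x·y. The correction is either zero, a shifted copy of one operand, or (when both lower-half sign bits are one) an expression that needs a subtraction. The names `to_signed`, `split_signed`, and `twin_precision_product` are hypothetical.

```python
def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def split_signed(x: int, n: int) -> tuple[int, int, int]:
    """Split a signed n-bit operand into two signed n/2-bit halves.

    Returns (x_hi, x_lo, s): both halves are read as signed numbers, and s is
    the sign bit of the lower half, so x == (x_hi + s) * 2**(n//2) + x_lo.
    (Assumed decomposition, not taken from the paper.)
    """
    h = n // 2
    x_hi = to_signed(x >> h, h)      # arithmetic shift keeps the upper half's sign
    x_lo = to_signed(x, h)           # lower half reinterpreted as signed
    s = (x >> (h - 1)) & 1           # sign bit of the lower half
    return x_hi, x_lo, s


def twin_precision_product(x: int, y: int, n: int) -> int:
    """Rebuild x*y from four signed n/2 x n/2 products plus a correction term."""
    h = n // 2
    x_hi, x_lo, sx = split_signed(x, n)
    y_hi, y_lo, sy = split_signed(y, n)
    # The four independent sub-multiplications a twin-precision array would compute.
    core = (x_hi * y_hi) * 2 ** (2 * h) + (x_hi * y_lo + x_lo * y_hi) * 2 ** h + x_lo * y_lo
    # Correction: zero, a shifted operand, or both operands minus a constant.
    correction = (sx * y + sy * x) * 2 ** h - (sx * sy) * 2 ** (2 * h)
    return core + correction
```

In this model, checking all 65 536 signed 8-b pairs confirms the identity. It does not describe how the hardware lays out the four partial-product arrays.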

IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS, VOL. 57, NO. 3, MARCH 2010

where … is defined in (7). According to (7), … equals zero or one of … and …. Note that when …, … can be obtained by performing … and forcing … and … to zero, which avoids the subtraction operation because …. Otherwise, if both … and … are one, …. Therefore, the output product can be generated in the following two steps. First, four independent partial-product arrays (i.e., …) and two vectors (i.e., … and …), generated from … and …, are produced by the modified Booth encoding (MBE) approach and an … generator, respectively. For each n/2-bit MBE multiplication, the MBE partial-product generator proposed in [23] can be used to produce the partial-product array. Second, two’s-complement additions for … and all partial-product rows are carried out through a compression tree (Wallace [24] or Dadda [25]) and a 2n-bit fast carry-propagation adder (FCPA), as shown in Fig. 1(a). However, the SBs of … and all partial-product rows in the four MBE partial-product arrays must be properly extended up to the most significant bit position to ensure the correctness of the final 2n-bit product. This sign-extension problem in the second step greatly impacts the performance and power consumption of the multiplier. Therefore, a simple and flexible sign-extension method for multiterm two’s-complement addition is developed to solve the problem.

Fig. 1. Multiplication matrix of partial-product bits for 16-b multiplication with one-level recursion.

Assume that … is an m-bit two’s-complement number, that … indicate the sign-extension bits (successive zeros or ones) of …, and let … be the length of …. For example, if … are 111, then … = 3. Then, … can be represented as a two’s-complement number as in (8). Instead of extending the SBs of … and all partial-product rows in the four MBE partial-product arrays, we use (8) to consider all SBs within … and these partial-product rows simultaneously, as follows. First, … and … of each partial-product row in the four MBE partial-product arrays can be simplified by replacing SB with … and a negative constant. Here, … denotes the SB of each partial-product row, and … is the hot one for negative encoding, which indicates whether the partial-product row is negative or positive. Second, we eliminate all negative constants by directly inserting … into the compression tree structure.
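The Python sketch below illustrates the general idea. Each row's sign bit is replaced by its complement plus a negative constant, and all constants are merged into one precomputed word. This is a generic sign-extension elimination under our own assumptions: each row is described as a (value, bits, shift) triple, and the result has a fixed width. The function names are hypothetical, and the exact constants of (8) and (9) are not reproduced.

```python
def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def sum_without_sign_extension(rows, width):
    """Add signed partial-product rows without sign-extending any of them.

    rows:  iterable of (value, bits, shift); `value` is a signed `bits`-bit row
           whose least significant bit sits at column `shift`.
    width: width of the final result (e.g., 32 for a 16 x 16 product).
    Returns (result, constant): the width-bit sum and the merged constant word.
    """
    mask = (1 << width) - 1
    total = 0
    constant = 0
    for value, bits, shift in rows:
        raw = value & ((1 << bits) - 1)       # the row's own bits, no extension
        flipped = raw ^ (1 << (bits - 1))     # complement only the sign bit
        total += flipped << shift             # now a non-negative number
        constant -= 1 << (bits - 1 + shift)   # negative constant left behind by the flip
    constant &= mask                          # every constant merged into one word
    return (total + constant) & mask, constant
```

The merged constant depends only on the widths and positions of the rows, not on their values. It can therefore be computed once per operating mode and added to the compression tree as a single vector.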

The … for 16-b multiplication with one-level recursion can be calculated as (9), where … and … denote the negative constant values within …, respectively. This value is represented as a 32-b two’s-complement number and is directly added into the compression tree. … and … can be summed to obtain EV. Obviously, compressing EV and all partial-product rows of the four MBE partial-product arrays simultaneously also decreases the area, delay, and power of the compression tree. Moreover, preventing extra sign extension reduces not only the partial-product bits of each row but also the number of rows input into the compression tree, leading to smaller area and less power consumption. The single 8-b multiplication can be accomplished by the same method. The … for twin parallel 8-b multiplication operations must be revised into … to avoid extra sign extension and to guarantee the correctness of the output products … and …, as shown in Figs. 1(b) and 2. Note that when …, two-input AND gates must be inserted at the 16th position of the multiplication matrix in Fig. 1(b). These gates prevent the carries of the least significant part from propagating to the most significant part, thereby ensuring the correctness of the output products.

TABLE I
MODES OF THE PROPOSED CONFIGURABLE MULTIPLIERS

III. POWER-AWARE CBM

In this section, partially guarded computation and the truncation technique are integrated into the configurable multiplication with one-level recursion to construct a 16-b low-power CBM. Fig. 2 shows the block diagram of the proposed 16-b CBM. The configuration signals … are used to configure the operation of the proposed multiplier into six modes, as shown in Table I. In addition to single 16-b multiplication, the CBM in Fig. 2 can perform single 8-b and twin parallel 8-b multiplication operations by setting …. For 16-b multiplication, …. The single 8-b multiplication is performed by setting … and forcing the inputs of … and … multiplications to zero. On the other hand, two parallel 8-b multiplication operations that satisfy the high-throughput requirement are carried out if … or 10. When 16-b or single 8-b multiplication operation is performed, …. More power saving is obtained for n-bit multiplication (n = 8 or 16) by disabling the computation of the n least significant product bits if …. In this case, the partial products that produce the least significant product bits are set to zero, and error compensation values are added to the most significant product bits to reduce the product error.

Fig. 2. Block diagram of the proposed CBM.
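A small Python model shows why the twin parallel 8-b mode needs its carries cut at the 16th column. If two signed 16-b products share one 32-b addition, the sign extension of a negative low product borrows from the high lane. The helper names are hypothetical. The model ignores the compression tree and only contrasts a shared addition with isolated lanes.

```python
def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def pack_naive(p_hi: int, p_lo: int) -> int:
    """One shared 32-b addition: a negative low product borrows from the high lane."""
    return ((p_hi << 16) + p_lo) & 0xFFFFFFFF


def pack_isolated(p_hi: int, p_lo: int) -> int:
    """Each 16-b lane is formed on its own, as if carries were cut at column 16."""
    return ((p_hi & 0xFFFF) << 16) | (p_lo & 0xFFFF)


def unpack(word: int) -> tuple[int, int]:
    """Read the upper and lower 16-b lanes back as signed products."""
    return to_signed(word >> 16, 16), to_signed(word, 16)
```

For example, with a high product of 35 and a low product of −6, the shared addition reads back a high lane of 34. The isolated lanes read back 35 and −6 as expected.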

A. Dynamic-Range Detector

In Fig. 2, the proposed dynamic-range detector … generates switching signals … for each 8-b Booth multiplication. These signals pick the operand that leads more partial products to zero for Booth encoding, which minimizes the switching activities of partial products. In addition to the switching signals, … produces several extra shutdown signals, including …. These signals dynamically disable the redundant computation of the multiplier by forcing unnecessary partial-product bits and carry propagations to zero, based on the multiplication mode and the effective range of the input operands. For example, … is used to shut down … multiplication by manipulating AND-based isolation gates at the inputs of … multiplication and forcing … in … multiplication to zero when …. Signals … and … are used to disable the redundant sign-extension computation on the left side of …. Signals … and … produced by … are fed to … to generate signals HZ and LZ and an SB. Signals HZ and LZ produced by … are directly connected to the clock-gating circuits [26], [27] of the input registers and to the reset of the output registers. If at least one of the input operands is equal to zero, the clock signals of the input registers are gated to hold the original values, and the output registers are directly reset to zero. Additionally, to reduce power consumption and produce the correct product, … must cooperate with other components, including an … generator, an … generator, an SEU, an ECC, and an adjustor. These key components are described in detail in the following sections.

Park et al. [15] divided the multiplication expression into four subexpressions. They adopted the dynamic-range detector circuit proposed in [14] (denoted as …) to increase the exchange probability of the input operands. For two 8-b input operands, only four comparators and some simple logic gates are required to judge whether they must be exchanged [15]. Despite this simplicity, the coarse-grained comparison performed by … may cause the incorrect judgments listed in the following.

Case 1) Both input operands must be interchanged, but … does not perform the exchange. For example, … (the multiplicand) and … (the multiplier) produce two and one Booth encoded products with a zero value, respectively. They should be interchanged if the multiplier is used for Booth encoding, but … does not interchange them.

Case 2) Both input operands are unnecessary to interchange, but … performs the exchange. For example, … and … produce two and three Booth encoded products with a zero value, respectively. They are unnecessary to interchange, but … performs the exchange, which decreases the probability of partial products becoming zero.

Moreover, … does not produce the shutdown signals to deactivate the redundant switching activities in the ineffective ranges of the input operands. Consequently, we propose a novel dynamic-range detector made up of a switching logic and a shutdown logic. It significantly decreases the incorrect judgments and the power consumption of the configurable multiplier by properly exchanging the input operands and shutting down the unused functional blocks, based on the multiplication mode and the effective range of the input operands.

1) Switching logic: Fig. 3 shows the proposed switching logic for four 8-b Booth multiplications whose input operands are …, respectively. Given input operands … and …, the input operands are first partitioned into 3-b groups and then fed into three-input comparators to obtain more fine-grained comparison results. If the output of a comparator is 1, the input 3-b group consists of successive zeros or ones, so its Booth encoded product will be zero. Signal … indicates whether … has successively equal bits. For example, … denotes that all the bits from … to … are successive zeros or ones, and signal … has the same meaning as …. To simplify the logic circuit, … uses a five-input comparator to detect whether the consecutive 5 b from the input operands are all zeros or ones.

Fig. 3. Switching logic of the proposed dynamic-range detector.
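The following Python sketch is an idealized reference for the switching logic, not the proposed circuit. It computes radix-4 MBE digits exactly and counts the zero digits of each operand (a 3-b group of 000 or 111 encodes to zero). It then exchanges the operands when the multiplicand would give more zero partial products. The proposed detector approximates this count with 3-b and 5-b comparators, so its decisions may differ in corner cases. The function names are hypothetical.

```python
def booth_digits(value: int, bits: int) -> list[int]:
    """Radix-4 MBE digits (each in -2..2) of a signed `bits`-bit value, LSB group first."""
    digits = []
    prev = 0  # implicit bit to the right of the LSB
    for i in range(0, bits, 2):
        b0 = (value >> i) & 1
        b1 = (value >> (i + 1)) & 1
        digits.append(-2 * b1 + b0 + prev)  # overlapping triplet (b1, b0, prev)
        prev = b1
    return digits


def zero_digits(value: int, bits: int) -> int:
    """Number of Booth encoded partial products that are zero (triplets 000 or 111)."""
    return sum(d == 0 for d in booth_digits(value, bits))


def choose_multiplier(multiplicand: int, multiplier: int, bits: int = 8):
    """Exchange the operands when the multiplicand would yield more zero partial products."""
    swap = zero_digits(multiplicand, bits) > zero_digits(multiplier, bits)
    if swap:
        multiplicand, multiplier = multiplier, multiplicand
    return multiplicand, multiplier, swap
```

For example, 3 encodes to the digits [−1, 1, 0, 0], which has two zero partial products, while 85 (01010101) has none. `choose_multiplier(3, 85)` therefore exchanges the operands so that 3 is Booth encoded.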

Finally, the comparator outputs of each operand are compared to generate the switching signal, which determines which operand is the multiplier. The input operands are exchanged if the switching signal is one. … is calculated for each input operand, and … indicates that there will be at least … Booth encoded products that are zero. These counts of zero-valued Booth encoded products are denoted as … and …, respectively. For example, if the outputs of the comparators are 101, there will be … Booth encoded products that are zero. … and … can be generated as (10) and (11), and … can be obtained by the same method.

Fig. 4. Shutdown logic of the proposed dynamic-range detector.

To demonstrate the efficiency of the proposed switching logic and Park’s circuit, the probabilities of Cases 1) and 2) for these two circuits are computed by feeding all possible input patterns. The results in Table II show that the total probability of Cases 1) and 2) for the proposed circuit is much smaller than that of Park’s circuit.

TABLE II
PROBABILITY OF CASES 1) AND 2) FOR DIFFERENT DYNAMIC-RANGE DETECTORS

2) Shutdown logic: Given the multiplication mode and the effective range of the input operands, the shutdown logic shown in Fig. 4 produces signals … to individually shut down … multiplications by setting these signals to zero. In fact, … is produced from the switching logic. For instance, if …, then only the … and … multiplications need to be performed, and the others can be shut down by forcing their input operands to zero. In this case, the … and … multiplications are shut down by forcing their input operands … and … to zero through AND-based isolation gates at the … and … multiplications. Similarly, if …, then the redundant computations of the … and … multiplications can be disabled by the same approach. In addition, … if … multiplication should be disabled to achieve truncation, as explained in the following section.

Besides increasing the probability of Booth encoded products becoming zero, the proposed switching logic helps detect the length of the sign-extension bits of the input operands and shut down unnecessary computation. In many applications, the input operands of multiplication operations are frequently limited to a small range. To exploit this fact further, … is calculated for each input operand to avoid unnecessary sign-extension computation in single 16-b multiplication. We can predict the length of the sign-extension bits of output product … as (15). Based on (15), the sign-extension computation on the left side of … is disabled, and the output-product bits on the left side of … are replaced with an SB. However, there are some exceptions to (15) when both … and … are … and negative numbers. If the adder cells on the left side of … are disabled because …, then the output product … will be set to zero, which introduces an intolerable product error. To avoid this problem, we also set … to zero when … and …, because, in this situation, …. Based on these observations, three predetermined guarded boundary positions … are selected in our design. We disable the sign-extension computation only when … and … independently exceed these boundary positions, which satisfies the aforementioned condition and simplifies the shutdown logic. In other words, … is set to zero to disable the unnecessary sign-extension computation when … or …. Therefore, the shutdown signals … can be generated as (12) and (13), where … is given by (14), with & and … denoting the AND and OR operations, respectively. Moreover, the adder cells on the left side of … can be disabled if …, and the output-product bits on the left side of … must be replaced with an SB.
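The sign-extension length that the shutdown logic relies on can be modeled in Python as below. Because the form of (15) was lost, this sketch uses only a conservative fact: the product of an a-bit and a b-bit signed number always fits in a + b signed bits. The `boundary` argument plays the role of a guarded boundary position, but the value used in the example is hypothetical, not one of the paper's positions. The function names are also hypothetical.

```python
def sign_extension_length(value: int, bits: int) -> int:
    """Count the successive bits, starting at the MSB, that equal the sign bit."""
    sign = (value >> (bits - 1)) & 1
    length = 0
    for pos in range(bits - 1, -1, -1):
        if (value >> pos) & 1 != sign:
            break
        length += 1
    return length


def effective_width(value: int, bits: int) -> int:
    """Bits actually needed: one copy of the sign bit plus the magnitude bits."""
    return bits - sign_extension_length(value, bits) + 1


def can_guard(x: int, y: int, bits: int, boundary: int) -> bool:
    """True if every product bit at column boundary - 1 and above is a copy of the SB.

    The product fits in effective_width(x) + effective_width(y) signed bits, so
    when that sum is at most `boundary`, the adder cells for columns >= boundary
    can be disabled and their outputs taken from the SB.
    """
    return effective_width(x, bits) + effective_width(y, bits) <= boundary
```

For example, with 8-b operands and a boundary of 10, every pair accepted by `can_guard` yields a product whose bits 9 through 15 all equal the sign bit. This bound is looser than a dedicated predictor, and it needs no special handling for two negative operands.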

Therefore, signals … are determined by (16)–(18), where … is given by (19). The circuit that generates … is also shown in Fig. 4. When …, the partial-product bits from bit positions 23 to 20 and the carry inputs of the adder cells at bit position 20 are forced to zero by additional two-input AND gates controlled by …. The output-product bits … must then be replaced directly by an SB. Similarly, when …, the partial-product bits from bit positions 19 to 16 and the carry inputs of the adder cells at bit position 16 are set to zero by two-input AND gates controlled by …, and the output-product bits are replaced directly by an SB.

B. ECC

If fixed-width multiplication is desired (i.e., …), the computation of the n least significant bits of the 2n-bit output product can be disabled to further reduce power consumption. To decrease the resulting product error, Huang’s ECC proposed in [21] is slightly modified. As shown in Fig. 5(a), the partial products of each 8-b Booth multiplication are divided into HP, MP, and LP. When truncation is performed, the partial products in LP are forced to zero through the extra two-input AND gates. Huang’s ECC, shown in Fig. 5(b), takes the partial products in MP as inputs to generate approximate carries …. These carries are added along with the carry inputs of the adder cells in HP to reduce the truncation error. However, some of the four 8-b Booth multiplications in the proposed multiplier are frequently shut down based on the effective range of the input operands. Therefore, Huang’s ECC must be slightly modified to adapt to these characteristics. To incorporate Huang’s ECC into the proposed multiplier, the approximate carries … of the ECC in … multiplication are generated according to not only the values of … and … but also the multiplication mode and the effective range of the input operands. For example, … of the ECC is equal to one only when the shutdown signals … and … are one and …, and … must be zero when ….

Fig. 5. (a) HP, MP, and LP for an 8-b BM. (b) ECC for n = 8.

The truncation operation performed in the proposed multiplier is summarized as follows.
- To accomplish single 16-b multiplication with truncation (i.e., …), signal … forces the LPs of the … multiplications to zero, and the approximate carries … are added into the HP of … multiplication.
- When single 8-b multiplication with truncation is performed (i.e., …), signal … disables the computation of LP in … multiplication.
- When two parallel 8-b multiplications with truncation are performed, the LPs of the … and … multiplications are forced to zero, and the estimated carries … and … are added to the HPs of the … and … multiplications.

In our design, the proposed multiplier performs the truncation operation when …. In other multiplication modes, …. Note that … and … must be larger than 16.

C. EV and CV Generators

As mentioned in Section II, proper EV and CV must be generated and fed into the compression tree to ensure that the final product is correct. In fact, the required EV of the proposed multiplier also depends on the multiplication mode and the effective range of the input operands. For example, when …, the … and … generated by … are zero, which disables the computation of the … and … multiplications by forcing their partial-product bits to zero. Thus, only the SBs in the … and … multiplications need to be considered, and EV can be obtained by summing … and …. Table III lists the corresponding … and … for different operation situations of the proposed multiplier, where X denotes “don’t care.” Based on Table III, we can realize the EV generator, as shown in Fig. 6.
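The Python model below explores the effect of dropping LP and adding compensation. It is a simplification under stated assumptions:
- Each radix-4 Booth row is a full 2n-bit two's-complement word; real MBE arrays use a negation bit instead.
- Every bit below column n is discarded.
- Compensation is a single constant, in the spirit of the constant scheme [16], not the data-dependent Huang ECC [21] that the proposed multiplier actually uses.
- Errors are expressed in units of 2^n, which is our assumption about the normalization in (20).

All function names are hypothetical.

```python
def to_signed(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of `value` as a two's-complement integer."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value


def booth_rows(x: int, y: int, n: int) -> list[int]:
    """Radix-4 Booth partial-product rows of x*y as full 2n-bit two's-complement words."""
    mask = (1 << 2 * n) - 1
    rows, prev = [], 0
    for i in range(0, n, 2):
        b0, b1 = (y >> i) & 1, (y >> (i + 1)) & 1
        digit = -2 * b1 + b0 + prev
        prev = b1
        rows.append(((digit * x) << i) & mask)
    return rows


def truncated_product(x: int, y: int, n: int, comp: int = 0) -> int:
    """Keep only columns >= n of every row (LP dropped), add `comp` ulps, return the upper n bits."""
    mask = (1 << 2 * n) - 1
    kept = sum((row >> n) << n for row in booth_rows(x, y, n))
    kept = (kept + (comp << n)) & mask
    return to_signed(kept, 2 * n) >> n   # value in units of 2**n


def error_stats(n: int, comp: int = 0):
    """Normalized max, mean, and mean-square error over all signed n-bit input pairs."""
    lo, hi = -(1 << (n - 1)), 1 << (n - 1)
    errors = [x * y / 2 ** n - truncated_product(x, y, n, comp)
              for x in range(lo, hi) for y in range(lo, hi)]
    count = len(errors)
    return (max(abs(e) for e in errors),
            sum(errors) / count,
            sum(e * e for e in errors) / count)
```

Without compensation, the error is always positive, because only non-negative low-order bits are discarded. Choosing the constant as the rounded mean error, e.g., `comp = round(error_stats(8)[1])`, lowers both the mean and the mean-square error in this model. A data-dependent ECC aims to do better by estimating the lost carries from MP.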

TABLE III
ALL EV VALUES OF THE PROPOSED MULTIPLIER

Fig. 6. Proposed EV generator.

Table IV lists all possible correcting vectors … for different operation situations of the proposed multiplier. Note that … is equal to … because two independent 8-b multiplication operations are performed in parallel. Because the sign-extension bits … can be obtained directly through an SEU, only … needs to be added into the compression tree to obtain the final product. To simplify the generator, all possible correcting vectors in Table IV are represented as two’s-complement binary numbers, as shown in Table V. The bit positions that have the same values in each situation are identified and classified into eight groups, denoted …. All bit positions in the same group can be generated by the same circuit, and the final … generator is shown in Fig. 7.

TABLE IV
CORRECTING VECTORS OF THE PROPOSED MULTIPLIER

Fig. 7. Proposed CV generator.

TABLE V
TWO’S-COMPLEMENT BINARY NUMBER OF ALL CORRECTING VECTORS

D. SBG and SEU

In partially guarded computation, the sign-extension bits of the product are replaced by an SB to avoid unnecessary sign-extension computations. However, computing an SB as …, where … denotes the EXCLUSIVE–OR operation, is not always correct when one of … and … is zero and the other is negative. If one of the input operands is zero, then the output product must be equal to zero. In this case, the entire configurable multiplier can be shut down to save more power: the input registers are prevented from loading new data, and the output registers are directly reset to zero. Therefore, we develop an SBG that generates an SB and shuts down the entire multiplier when one of the input operands is zero. Fig. 9 shows the proposed SBG. In Fig. 9, … indicates that at least one input operand of … multiplication is equal to zero, and … indicates that at least one of the input operands … and … is equal to zero. Signal LPZ is generated by the same method. HZ and LZ are generated based on the following principles. If …, signal HZ should be set to zero to reset the output register … and to disable the clock signals of input registers … and …. Similarly, LZ must be set to zero to reset the output register … and to disable the clock signals of input registers … and … when … and …. Finally, when partially guarded computation is also performed, the SB is selected through multiplexers by using the shutdown signals …. The SEU, which consists of 16 multiplexers, directly assigns sign-extension bits to the output product to avoid the redundant sign-extension computation on the left side of …, as shown in Fig. 10.

E. Adjustor

If truncation is performed, the estimated carries generated by the ECC may convert a small negative product to a positive product or to zero. A large product error then occurs if the sign-extension part of the product is replaced by the SB generated from …. This situation occurs only when one of … and … is positive and the other is negative. Consider the example of …. The original truncated product is equal to …, and the truncated product becomes …. From this example, we find that a large product error occurs when the output bits from … to … are zero. Therefore, the adjustor shown in Fig. 8 is developed to overcome the problem. The large product error is avoided by forcing the output bits from … to … to be one if …, where … denotes the output bits of the FCPA. Similarly, the output bits from … to … are forced to be one when …. The same problem also appears if … and all product bits from … to … are zero.

Fig. 8. Proposed adjustor.

F. Error Analysis

To analyze the accuracy of the proposed multiplier, the error metrics, namely the normalized maximum error …, the normalized mean error …, and the normalized mean square error …, are defined in (20).
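The zero-operand exception handled by the SBG and the sign repair performed by the adjustor, both described above, can be sketched in Python as follows. The exact gate-level conditions did not survive extraction, so this is our reading of the text:
- The SB is the XOR of the operand sign bits unless an operand is zero, in which case the product and its SB are zero.
- When the true product is negative but the compensated truncated result is zero or positive, all retained bits are forced to one.

The function names and the representation of the retained bits as a signed integer are hypothetical.

```python
def sign_bit(x: int, y: int, n: int) -> tuple[bool, int]:
    """Return (zero_detected, sb) for signed n-bit operands x and y."""
    if x == 0 or y == 0:
        # The product is zero: its SB must be 0, and the multiplier can be idled.
        return True, 0
    sx = (x >> (n - 1)) & 1
    sy = (y >> (n - 1)) & 1
    return False, sx ^ sy


def adjust(kept: int, x: int, y: int, n: int) -> int:
    """Repair a truncated, compensated result whose sign disagrees with the true product.

    kept: signed value of the retained product bits (in units of 2**n).
    """
    zero, sb = sign_bit(x, y, n)
    if zero:
        return 0
    if sb == 1 and kept >= 0:
        return -1  # all retained bits forced to one: the negative value closest to zero
    return kept
```

For instance, a plain XOR of the sign bits of 0 and −5 gives 1, whereas `sign_bit(0, -5, 8)` reports a zero operand and an SB of 0.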

KUANG AND WANG: DESIGN OF POWER-EFFICIENT CONFIGURABLE BOOTH MULTIPLIER 577

Fig. 9. Proposed SBG.
Fig. 10. Proposed SEU.

TABLE VI
ERROR ANALYSIS OF DIFFERENT MULTIPLIERS

In (20), the metrics are built from four quantities: the maximum operator, the average operator, the output of the standard multiplier, and the output of the truncated multiplier. Table VI lists the exhaustive simulation results in terms of the three normalized error metrics for different multipliers. In the table, TBM denotes the conventional truncated BM without an ECC, HTM denotes Huang's truncated multiplier [21], and CBM[2:0] denotes the proposed CBM. For the tested configuration signals, all three error metrics of CBM are significantly smaller than those of TBM and very close to those of HTM. To further demonstrate the error performance of the proposed CBM, three real-life applications have been picked to evaluate accuracy. The results are shown and discussed in the next section.

TABLE VII
COMPARISON OF AREA AND DELAY FOR DIFFERENT MULTIPLIERS

IV. EXPERIMENTAL RESULTS
For comparison, we have designed the conventional BM, Park's low-power BM (PBM) [15], and the proposed CBM in Verilog HDL. These multipliers were synthesized using the Synopsys Design Compiler with the TSMC 0.13-μm CMOS standard cell technology library, and placement and routing were performed with the Cadence SOC Encounter. For power-consumption estimation, Synopsys NanoSim is adopted to measure the average power consumption of transistor-level circuits. The power simulations are performed at a clock frequency of 50 MHz at 1.8 V.

Table VII gives the implementation results, including hardware area and critical path delay, for these multipliers; the area and delay results are obtained from the Cadence SOC Encounter. As can be seen, the area and delay of the proposed CBM are very close to those of the nonconfigurable low-power PBM. They are larger than those of BM because additional circuits are added to achieve configuration and reduce power consumption.

Three real-life applications are then used to evaluate the power efficiency of these multipliers: DCT for an MPEG4 encoder [28], YUV-to-RGB conversion [29], and a 35-point low-pass FIR filter (FIR35) [30] for speech signal processing. Fig. 11 shows the dynamic ranges of coefficients and input data for the multiplication operations in these applications.

The power simulation vectors are chosen as follows. For DCT, six images (Imag 1–Imag 6) with different resolutions are adopted. For 2-D 8×8 DCT operations and YUV-to-RGB conversion, coefficients and input data are set to integer values represented as 16-b two's-complement numbers. For YUV-to-RGB conversion, six input data sets (Data 1–Data 6), composed of 51 200 multiplication operations, are extracted from the “stefan” benchmark. For the FIR35 filter, the coefficients and the input speech signals, sampled at 8 kHz, are represented in 8-b two's-complement form. Six 0.162-s segments of voice (Segm 1–Segm 6) with different features are captured and analyzed, and each input voice segment involves 1300 multiplication operations. Furthermore, several input data sets with different effective dynamic ranges are selected for DCT, YUV-to-RGB, and FIR35. These sets demonstrate that the proposed CBM remains power efficient when the characteristics of the input data change.

To evaluate the error performance, the final outputs of the multiplication operations generated by the proposed CBM are compared with the standard outputs produced by BM. Tables VIII–X list the power estimation and peak signal-to-noise ratio (PSNR) results of the different multipliers.

For DCT, the results in Table VIII show that the proposed CBM achieves more than 30% power saving on average. Performing truncation, at the cost of a bit of output quality, yields further power savings. For YUV-to-RGB conversion, the results in Table IX indicate that the proposed CBM provides power reductions of more than 26%. Table IX also reports the power savings of the remaining CBM configurations.
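The PSNR values reported in Tables VIII–X compare an application's output computed with a given multiplier against a reference output. Below is a minimal sketch of the standard PSNR formula. It assumes 8-b pixel data with a peak value of 255; the reader must choose the peak appropriate for other data, such as the FIR35 speech samples. The function `psnr` and the sample values are hypothetical.

```python
import math


def psnr(reference, test, peak=255):
    """PSNR in dB between a reference output and a test output.

    peak=255 assumes 8-b pixel data; set it to the full-scale value of
    whatever data is being compared.
    """
    if len(reference) != len(test) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical outputs
    return 10 * math.log10(peak ** 2 / mse)


# Hypothetical pixels: reference (BM) output vs. output of a truncated multiplier.
ref = [0, 100, 200, 255]
out = [1, 99, 201, 254]
print(f"{psnr(ref, out):.2f} dB")  # MSE = 1, so PSNR = 10*log10(255**2)
```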
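Because CBM can execute twin parallel 8-b multiplications, its energy per workload can fall even when its average power does not. The back-of-the-envelope sketch below makes four assumptions:

- a non-configurable multiplier performs one 8-b multiplication per cycle;
- CBM's twin 8-b mode performs two per cycle;
- average power is equal in both cases;
- the clock is the 50-MHz clock used in the power simulations.

The 1300 multiplications match one FIR35 voice segment. The 2.0-mW figure and `energy_nj` are hypothetical.

```python
def energy_nj(avg_power_mw, cycles, f_clk_mhz=50.0):
    """Energy in nJ: power (mW) x time (us), with time = cycles / f (MHz)."""
    time_us = cycles / f_clk_mhz
    return avg_power_mw * time_us


# Hypothetical workload: 1300 8-b multiplications (one FIR35 voice segment).
n_mult = 1300
p_mw = 2.0                                 # hypothetical, equal for both cases
e_single = energy_nj(p_mw, n_mult)         # one multiplication per cycle
e_twin = energy_nj(p_mw, n_mult // 2)      # twin 8-b mode: two per cycle
print(f"{e_single:.1f} nJ vs {e_twin:.1f} nJ")
```

Halving the number of execution cycles halves the energy at equal power. This is why a configuration with little or no power saving can still save energy.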

TABLE VIII
COMPARISONS OF POWER CONSUMPTION AND PSNR FOR DCT

Fig. 11. Dynamic ranges of coefficients and input data for multiplications in the three applications. (a) DCT. (b) YUV-to-RGB. (c) FIR filter.

For FIR35, the results in Table X show that PBM consumes more power than BM on average. However, the proposed CBM can still offer power savings of more than 22%. Some configurations of CBM yield little or no power saving but still achieve energy savings when compared to BM, exceeding 40% in one case. This is because CBM performs two 8-b multiplication operations simultaneously, which halves the execution cycles for these operations. In summary, the proposed CBM can significantly reduce power and energy consumption while maintaining acceptable output quality.

V. CONCLUSION
In this paper, a power-efficient configurable multiplier has been proposed. It provides a flexible arithmetic capacity and a tradeoff between output precision and power consumption. Moreover, its ineffective circuitry can be efficiently deactivated, thereby reducing power consumption. The experimental results have shown that the proposed multiplier outperforms the conventional multiplier in terms of power and energy efficiency, at the expense of extra area and delay overheads.

(VLSI) Syst. Circuits Syst. K. Int. pp.. 14–17. Pfleiderer. Nov. pp. 5. Circuits. [19] K. Conf. E.” IEEE Trans.” in Proc. [18] S. 1. [11] N. K.” in Proc. Circuits Syst. “Guarded evaluation: Pushing power management to logic synthesis/design. May 2004.. G. 2006. 1051–1060. “Some schemes for parallel multipliers. Asia-Pacific Circuits Syst. ASIC. Taiwan. 2004.-Aided Design Integr.” in Proc. J. Analog Digit. Wey and J. 15th IEEE Int. S. Integr. 3538–3541. pp. Pfander. and A.-S.. pp. Petra. Dec. no. Jul. no. Circuits. Liao. pp. II. Workshop VLSI Signal Process. 43. pp. “A suggestion for a fast multiplier. Syst. Y. Des. pp.” in Proc.. Swartzlander Jr. Electron. 54. J. Feb. and H. Adv. and Y. II. IEEE Asia-Pacific Conf. 6. Choi. 90–94. Syst.-C. Chang. 3327–3330. May 2006. Comput. pp. “Low-error reduced-width Booth multipliers for DSP applications.” in Proc. pp. Circuits Syst. 149–154. ACKNOWLEDGMENT The authors would like to thank the reviewers for their many constructive comments and suggestions in improving this paper.” in Proc. pp. Q. [14] N. Int.. Circuits Syst. pp. Elect. S. M. Dec. no. Conf. Jun. Circuits Syst. Antoniou. C. Comput. [8] V. “Reducing energy of digital multiplier by adjusting voltage supply to multiplicand variation. the proposed multiplier is very suitable for multimedia and DSP applications that require flexible processing ability and low power consumption. 10. Garofalo. Parhi. H. [3] C. the results have also shown that it can significantly reduce power consumption with enough accuracy when truncation is performed. M. 37th Asilomar Conf. S. T. Symp. “Low-power multipliers by minimizing switching activities of partial products. Ahmadi. G. “Area-efficient multipliers for digital signal processing applications.. “Power minimization of function units by partially guarded computation.” in Proc.-S. In addition. vol. “A novel architecture for low-power design of parallel multipliers. vol. vol. Dong. Chen and Y. and K. Hsiao. 12. IEEE Int. Dec. 
“Design of low-error fixed-width modified booth multiplier. no. Jullien. [17] S. 974–978. J. A. Analog Digit. vol. [13] T. [9] J. G. Chung. Krithivasan and M. Low Power Electron. Tsao. [21] H. Dec. Huang. Signal Process. Li. 1. 2007. S. Oct. Mokrian.. 17. no. pp. Symp. 1996. L.. H. .” in Proc.. Circuits Syst. “A low-power Booth multiplier using novel data partition method. Strollo. 50. L. Qiang. 132–143. IEEE Comput. Circuits Syst. pp.” IEEE Trans. O. 2000. Feb. Circuits Syst. Ashar. 76–79. IEEE Int. “Truncated multiplication with correction constant. “A VLSI architecture for a run-time multi-precision reconfigurable Booth multiplier. Papers. Oct. “Low-power carry-free fixed-width multipliers with low-cost compensation circuit. Circuits Syst. [16] M. 29–32. Workshop VLSI. . Sun. 1430–1433. L.. pp.. “Low power multiplication for FIR filters. J. Jou. F.” Alta Freq. The authors would also like to thank the National Chip Implementation Center. J. 2. Aug. Nov. A. Yue. Very Large Scale Integr. A. 349–356. Electron. Low Power Electron. pp. Comput. 14th IEEE Int. and M. Reg. 299–303. 1423–1426. “A spurious-power suppression technique for multimedia/DSP applications. Signal Process. Shen and O. 46th IEEE Midwest Symp. 1997. “Multiplier architectures for media processing. J. Park. Wey. Fundam. vol. I. Nov.” IEEE Trans. [6] Y. Symp. Kidambi. pp.. 2004. no. Signal Process. Wallace.KUANG AND WANG: DESIGN OF POWER-EFFICIENT CONFIGURABLE BOOTH MULTIPLIER 579 TABLE IX COMPARISONS OF POWER CONSUMPTION AND PSNR FOR YUV-TO-RGB TABLE X COMPARISONS OF POWER CONSUMPTION AND PSNR FOR FIR35 the conventional multiplier in terms of power and energy efficiencies at the expense of extra area and delay overheads. for their contributions and support in technology data. vol. 2007. Conf. Larsson. L. [23] C. Shun. Chen.-C. Cho. Malik. 2001. A. “A novel reconfigurable architecture of low-power unsigned multiplier for digital signal processing. 
“Low power minimization combinational multipliers using data-driven signal gating. pp. Syst. vol. Symp. [5] S. Kim.-H. 37–40. Apr. “Design of reconfigurable array multipliers and multiplier-accumulators. 4.. Bayoumi. and Y. Schulte and E. Aug.. 1998. IEEE Int. “A self-compensation fixed-width booth multiplier and its 128-point FFT applications. Circuits. De Caro. Lee. May 1965. [7] Z.-A. Moshnyaga. Conf.. Juang and S.” IEEE Trans. REFERENCES [1] O. Analog Digit. Conf. Syst.-B. pp. C. vol. Gustafsson. Napoli.” in Proc. IEEE Can.” in Proc.” in Proc. 131–136. [10] A. Nicol and P. Circuits Syst. Chu.-J. Dadda. Annu. and E. Quan.. D. pp. and K. pp. 2003. [25] L. 48–51. EC-13. pp. Yamanaka and V.” in Proc.. and W. 2008. 1993. May 2003. Electron.. Asia-Pacific Circuits Syst. A.” in Proc. Feb. Theory Appl. 7th Int. pp. “Multiple-precision subword-parallel multiplier using correct-value merging technique. [22] V. M. [2] P. F. 125–128. Soc.. [4] S. Jan. 11. Lee. 2005.. [15] J. 1470–1474. 54–57. vol. 2009. Comput. 2193–2197. Symp. Miller. 2003.. G. 388–396. [12] K. IEEE Int.” IEEE Trans. II. Schulte.” IEEE Trans. 34. and P.” in Proc. 522–531.” IEEE Trans. I.-Y. As a result.. 1964. May 2005. 11.. no. 52. Signals.” in Proc. Conf. “Lower bounds for constant multiplication problems.-F. 56. “Low error truncated multipliers for DSP applications.” in Proc. Bermak. Choi. Tiwari. “A reconfigurable digit multiplier architecture. Fayed and M.. pp. 975–978. and A. El-Guibaly. pp. vol. 2003. 93–96. N.” IEEE Trans. D. and C. Kusha. [20] T. Jeon. Des.-C. Honarmand and A. IEEE Int. Tsai. Zhang. Li. pp. J. [24] C. no. Eng. 2007. May 2002.

pp. Shiann-Rong Kuang (M’09) received the B. 1991. and Ph. Yip. “Adaptive clock gating technique for low power IP core in SOC design. and T. Ueda. Des. 1990. Feb. pp. Jhongli City. Kaohsiung. Bruce.S. R. G.S. N. Nishio. Taiwan. Symp. computer arithmetic. Meng. [28] K. degree from National Central University. Chang. Circuits Syst. VOL. Applications.” in Proc. Low Power Electron. Mitsuhashi. Rao and P. M. Minami. Tainan City. Symp. Oct. Taiwan. He is currently working toward the Ph. New York: Academic. Paul and K. E.” in Proc. Jiun-Ping Wang received the B. Int. . all in electrical engineering. in 2003. and low-power design. Kitahara. Zhang. National Sun Yat-Sen University. Taiwan. MARCH 2010 [26] T.D. S. and J. Zhang.” in Proc. Wang. IEEE Int. Advantages. K.-Y.. in 1990 and the M. Discrete Cosine Transform: Algorithms. Murakata. and T. Kaohsiung. Jhongli City.. 57. H. He is currently an Assistant Professor with the Department of Computer Science and Engineering. Taiwan.580 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—I: REGULAR PAPERS. C Language Algorithms for Digital Signal Processing. Englewood Cliffs. Z. 408–417. 1994. M. National Sun Yat-Sen University. F.. Zhang. degree in information and computer engineering from Chung-Yuan Christian University. pp. His research interests include VLSI design. “A clock-gating method for low-power LSI design. NO. 307–312. 2120–2123. Chaddha. NJ: Prentice-Hall. [30] M. respectively.S. 3. Gordon. His research interests include VLSI/CAD and lowpower system-on-chip design. Usami. Taiwan.D. [27] X. in 1992 and 1998. Workshop VLSI Signal Process. T. May 2007. degree at the Department of Computer Science and Engineering. [29] B. 1998. degrees from National Cheng Kung University. “A low-power multiplierless YUV to RGB converter based on human vision perception.