
International Journal of Advance Foundation and Research in Computer (IJAFRC)

Volume 2, Issue 8, August - 2015. ISSN 2348 4853, Impact Factor 1.317

Energy Efficient Reconfigurable Fixed Width Baugh-Wooley

Mr. A. D. Nanure*, Prof. D. R. Dandekar, Prof. Mrs. Y. A. Sadawarte
In this thesis, we propose a reconfigurable fixed-width Baugh-Wooley multiplier design framework that
provides six configuration modes (CMs). The six configuration modes of the parallel reconfigurable
fixed-width multiplier provide high-resolution, parallel, and full-precision multiplications for different
computation demands. The six configuration modes are: 1. single n × n fixed-width multiplier, 2. dual
n/2 × n/2 fixed-width multiplier, 3. single n/2 × n/2 fixed-width multiplier, 4. single n/2 × n/2
full-precision multiplier, 5. dual n/4 × n/4 full-precision multiplier, and 6. single n/4 × n/4
full-precision multiplier. From the simulation results, the proposed 8 × 8 reconfigurable fixed-width
Baugh-Wooley multiplier attains a power saving to a certain extent with respect to that of the 8 × 8
reconfigurable fixed-width Baugh-Wooley multiplier with six configuration modes. The Baugh-Wooley
multiplier is suitable for both signed and unsigned number multiplication.
Index Terms: Baugh-Wooley algorithm, clock gating, and zero input technique.



Multiplication is a very important operation in many DSP applications. A power-efficient multiplier is
essential in such applications because of the growing demand for computing and communication
operations. Most multiplication algorithms are based on the Baugh-Wooley or Booth algorithms
[1][2][3]. DSP applications require a highly flexible multiplier with low power and high performance. In
most cases fixed-width multipliers are used, because a fixed width reduces area and power to a large
extent. A multiplier operates in three stages: generation of the partial products, reduction of the partial
products, and the final carry-propagation addition. A fixed-width multiplier truncates the least
significant bits (LSBs) and keeps only the higher-order bits of the product [6]. Ignoring the LSB part
leads to two main errors in the multiplication process, namely reduction and rounding errors. A
full-precision n × n multiplier gives a 2n-bit output as the sum of the partial products. If the final
product is truncated to n bits, the least significant columns of the product matrix contribute little to the
final result. As more of the columns that contribute partial products are eliminated, the area, power
consumption, and delay of the arithmetic unit are reduced to a larger extent. Different DSP functions
require different configuration parameters. Providing a separate multiplier structure for each
configuration pattern raises the hardware complexity, whereas reconfiguring an existing structure leads
to greater flexibility without compromising performance. Former reconfigurable structures have four
modes for various DSP functions [5]. In this paper the number of modes is increased to six by
reconfiguring the low-power fixed-width multiplier structure with power reduction techniques; an error
compensation technique is also included in the design to reduce the error [7]. The six configuration
modes are: 1. single n × n fixed-width multiplier, 2. dual n/2 × n/2 fixed-width multiplier, 3. single
n/2 × n/2 fixed-width multiplier, 4. single n/2 × n/2 full-precision multiplier, 5. dual n/4 × n/4
full-precision multiplier, and 6. single n/4 × n/4 full-precision multiplier.
With these six configuration modes, a pipelined and reconfigurable concept is introduced into the
Baugh-Wooley multiplier so that it can operate at various bit lengths. The paper is organized as
follows: Section 2 gives details of the 2's-complement parallel array multiplication algorithm, and
Section 3 describes the design of the pipelined reconfigurable fixed-width Baugh-Wooley multiplier.
Section 4 discusses the power reduction techniques used in the proposed architecture, and simulation
results are presented in Section 5. Last, brief statements conclude the presentation of the paper.
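Considering two 2's-complement integer operands, we can represent an n-bit multiplicand X and an
n-bit multiplier Y, respectively, as follows:

$$X = -x_{n-1}2^{n-1} + \sum_{i=0}^{n-2} x_i 2^i, \qquad Y = -y_{n-1}2^{n-1} + \sum_{j=0}^{n-2} y_j 2^j$$

where $x_i, y_j \in \{0, 1\}$. The 2n-bit full-precision product P can be written as

$$P = XY = x_{n-1}y_{n-1}2^{2n-2} + \sum_{i=0}^{n-2}\sum_{j=0}^{n-2} x_i y_j 2^{i+j} - 2^{n-1}\sum_{i=0}^{n-2} x_i y_{n-1} 2^i - 2^{n-1}\sum_{j=0}^{n-2} x_{n-1} y_j 2^j$$

This equation represents the Baugh-Wooley algorithm, in which the array multiplier sums the
partial-product bits corresponding to each weighting.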
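The following Python sketch illustrates one common way of writing the Baugh-Wooley sum. It is not taken from the paper. It assumes that the partial products involving exactly one sign bit are complemented, and that constant 1s are added at columns n and 2n-1 so that every term can be added rather than subtracted. The function names `bits` and `baugh_wooley_product` are hypothetical.

```python
def bits(value, n):
    """Return the n two's-complement bits of value, least significant first."""
    return [(value >> i) & 1 for i in range(n)]


def baugh_wooley_product(x, y, n):
    """Multiply two n-bit two's-complement integers by summing partial-product bits.

    Assumed form: sign-bit partial products are complemented and two
    constant 1s (columns n and 2n-1) are added; the sum is taken modulo 2^(2n).
    """
    xb, yb = bits(x, n), bits(y, n)

    # Sign bit times sign bit has a positive weight of 2^(2n-2).
    total = (xb[n - 1] & yb[n - 1]) << (2 * n - 2)

    # Magnitude bits times magnitude bits: ordinary AND terms.
    for i in range(n - 1):
        for j in range(n - 1):
            total += (xb[i] & yb[j]) << (i + j)

    # Terms with exactly one sign bit are complemented instead of subtracted.
    for i in range(n - 1):
        total += (1 - (xb[i] & yb[n - 1])) << (n - 1 + i)
        total += (1 - (xb[n - 1] & yb[i])) << (n - 1 + i)

    # Correction constants that make the complemented form exact modulo 2^(2n).
    total += (1 << n) + (1 << (2 * n - 1))
    total &= (1 << (2 * n)) - 1

    # Reinterpret the 2n-bit result as a signed integer.
    if total >= 1 << (2 * n - 1):
        total -= 1 << (2 * n)
    return total
```

Because every term is now non-negative, the array can be built from plain adders, which is the usual motivation for the complemented form.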

Fig.1. Partial product array diagram for an n × n Baugh-Wooley multiplier

Many DSP and computer applications operate at lower resolution, where the data can be
expressed in half-word length [21-26]. Applying the subword multiplication scheme, we can
partition an n-bit operand into two independent n/2-bit operands or four independent n/4-bit
operands; hence, the subword multiplier can perform not only n × n full-precision multiplication but
also two n/2 × n/2 or four n/4 × n/4 full-precision multiplications in parallel.
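As a minimal sketch of the subword scheme, the Python functions below split an n-bit word into equal subwords and multiply matching subwords independently. The helper names are hypothetical. Pairing subword k of X with subword k of Y is an assumption made for illustration, and the subwords are treated as 2's-complement values.

```python
def split_subwords(word, n, parts):
    """Split an n-bit word into `parts` equal subwords, least significant first."""
    width = n // parts
    mask = (1 << width) - 1
    return [(word >> (k * width)) & mask for k in range(parts)]


def to_signed(value, width):
    """Interpret a width-bit pattern as a two's-complement integer."""
    return value - (1 << width) if value & (1 << (width - 1)) else value


def subword_multiply(x, y, n, parts):
    """Full-precision products of matching subwords (assumed pairing: k with k)."""
    width = n // parts
    xs = split_subwords(x, n, parts)
    ys = split_subwords(y, n, parts)
    return [to_signed(a, width) * to_signed(b, width) for a, b in zip(xs, ys)]
```

With `parts=1` the function reduces to a single n × n multiplication; `parts=2` and `parts=4` correspond to the two n/2 × n/2 and four n/4 × n/4 cases described above.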

In the partial product array for n × n 2's-complement multiplication, the notation w means that the
n + w most significant columns of the partial products are kept for fixed-width multiplication. If w = n,
the fixed-width multiplier becomes a full-precision multiplier.
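The effect of w can be illustrated with a small Python model. This is a sketch under stated assumptions, not the paper's circuit: it keeps only the partial-product bits in the n + w most significant columns (column index i + j of at least n - w), sums them with their signed weights, and returns the upper n bits. It models plain truncation only and includes no error-compensation vector. The name `truncated_product` is hypothetical.

```python
def truncated_product(x, y, n, w):
    """Upper n bits of an n x n two's-complement product, keeping n + w columns.

    Assumption: partial-product bits in columns below n - w are simply
    dropped, with no compensation added.
    """
    xb = [(x >> i) & 1 for i in range(n)]
    yb = [(y >> j) & 1 for j in range(n)]
    kept = 0
    for i in range(n):
        for j in range(n):
            if i + j < n - w:
                continue  # column is outside the n + w most significant ones
            # A term with exactly one sign bit carries a negative weight.
            sign = -1 if (i == n - 1) != (j == n - 1) else 1
            kept += (sign * (xb[i] & yb[j])) << (i + j)
    return kept >> n  # discard the n least significant bits


# 7 * 7 = 49, and 49 >> 4 = 3 for the full-precision case (w = n = 4).
full = truncated_product(7, 7, 4, 4)   # 3
# With w = 0 only column 4 and above survive, so the result drops to 1.
narrow = truncated_product(7, 7, 4, 0)  # 1
```

The gap between the two results is the kind of truncation error that an error-compensation technique is meant to reduce.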

Fig.2. Subword multiplication (a) two n/2 × n/2 multiplications, (b) two n/2 × n/2 partial-product
distribution, (c) four n/4 × n/4 multiplications, and (d) four n/4 × n/4 partial-product array distribution.
This section describes the implementation of the six configuration modes under limited
hardware resources. Most applications require only a single-precision product, where the
double-word-length result is rounded to single precision. It is only necessary to estimate the carries
that ripple into the most significant part of the product [8]. The present work reduces the accuracy
degradation of fixed-width multipliers by truncation with rounding, which has accuracy almost equal
to that of the rounding technique with little added circuit complexity. Three modules, denoted MUL1,
MUL2, and MUL3, are used to achieve the six modes of operation. Various configuration parameters
are set to attain the various configuration modes. The detailed structures of MUL1, MUL2, and MUL3
are given in the previous paper [5]. The prototype of the reconfigurable architecture is given below.
This pipelined reconfigurable fixed-width multiplier architecture can be applied to both unsigned and
signed number multiplication.
CM1: n × n fixed-width multiplier
In CM1, the multiplier receives two n-bit inputs and produces an n-bit product. All three multiplier
blocks are used for the calculation. Each partial product is generated independently and summed
to get the final result. In this mode, a compensation vector is used to add the carry to the final stage.
To avoid adding the compensation vector twice, a control unit is used in multiplier block 1. The
partial product array diagram and the configuration parameters are given below.

Fig.3. Pipelined reconfigurable multiplier.

Fig.4. (a) Partial product array diagram for n × n fixed-width multiplication

Fig.4. (b) Partial-product array diagram using MUL1, MUL2, MUL3 for CM1.

Fig.4. (c) Configuration parameter setting.

CM2: dual n/2 × n/2 fixed-width multipliers
The input is given as two n/2-bit numbers and the output is taken as two n/2-bit numbers. The MUL1
and MUL2 blocks are suitable for the two n/2 × n/2 multiplications. In this mode the configuration
parameters CP0, CP1, and CP2 are set to 1.

Fig.5. (a) Partial products for CM2

(b) Input and output relations for CM2
CM3: one n/2 × n/2 fixed-width multiplier
In CM2, two multipliers are used to obtain the final result. Two simultaneous multiplications are not
necessary for smaller bit-length applications, so only one multiplier is required to obtain the result.
The power consumption is reduced by using only one multiplier block, MUL1.

Fig.6.(a) Proposed partial product array diagram for CM3, (b) Configuration parameter settings.
CM4: one n/2 × n/2 full-precision multiplier
In this mode, multiplier block 3 alone is used for the operation. Two n/2-bit numbers are multiplied and
an n-bit product is given as the output. The partial product diagram and mode settings are given in
Fig. 7(a) and 7(b).


Fig.7. (a) Proposed partial-product array diagram for CM4, and (b) Configuration parameter settings
CM5: two n/4 × n/4 full-precision multipliers
This configuration mode is widely used in low-resolution operation and performs two n/4 × n/4
full-precision multiplications. It uses the minimum number of modules and partial products, with the
parameter settings given in Fig. 8(b).
CM6: one n/4 × n/4 full-precision multiplier
This mode is an extension of CM5 that uses fewer resources to perform the multiplication. It is an
added advantage for low-power applications, since only a small part of the architecture is used.
Only the higher-order bits of MUL3 are used for the calculation, and only the higher-order bits of
both inputs are invoked.
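The six modes can be summarised in a small lookup table. The Python sketch below is illustrative only: the `Mode` class and its field names are hypothetical, and the result widths follow from the assumption that a fixed-width mode returns as many bits as one operand while a full-precision mode returns twice as many.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mode:
    name: str
    lanes: int            # number of multiplications performed in parallel
    operand_div: int      # operand width is n // operand_div
    full_precision: bool  # True: result is 2x operand width; False: same width


MODES = [
    Mode("CM1", 1, 1, False),  # single n x n fixed width
    Mode("CM2", 2, 2, False),  # dual n/2 x n/2 fixed width
    Mode("CM3", 1, 2, False),  # single n/2 x n/2 fixed width
    Mode("CM4", 1, 2, True),   # single n/2 x n/2 full precision
    Mode("CM5", 2, 4, True),   # dual n/4 x n/4 full precision
    Mode("CM6", 1, 4, True),   # single n/4 x n/4 full precision
]


def widths(mode, n):
    """Return (operand_bits, result_bits) per lane for an n-bit datapath."""
    operand = n // mode.operand_div
    result = 2 * operand if mode.full_precision else operand
    return operand, result


# Operand and result widths of every mode for the 8-bit case.
table_8bit = {m.name: widths(m, 8) for m in MODES}
```

For n = 8 this gives 8-bit results in CM1 and CM4 and 4-bit results per lane in the remaining modes, so no mode produces more than n output bits in total.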
Using the above operating modes and the reconfigurable architecture, a new architecture is
proposed to arrive at the functionality. The figure gives an overview of the architecture, which is
divided into three stages. Stage 1 decodes the operating condition for the different modes of
operation. The mode-select bits choose which multiplier function is performed at a particular time,
and they are determined according to the reconfigurable regions or modules designed. The
operation code (op) determines the type of multiplication performed: n × n fixed width, n/2 × n/2
fixed width, n/2 × n/2 full precision, or n/4 × n/4 full precision. In the second stage each MUL
module performs an independent multiplication according to the multiplicand inputs and the
decoded control signals from stage 1. The product from each MUL is then sent to stage 3 for the final
addition. A MUX in the final stage selects the output of the multipliers based on the input
control signals.

Fig.8. (a) Partial products for CM5,

(b) Configuration parameters for CM5.

Clock gating is applied to the registers in the second and third stages of the multiplier. Its prime aim is
to avoid unnecessary transitions in the multiplication process. Registers are disabled as required,
based on the mode of operation:
1. In mode 1 (CM1), MUL1, MUL2, and MUL3 are conditionally disabled based on the zero inputs to
the multiplier.
2. In mode 2, MUL3 is disabled.
3. In mode 3, MUL2 and MUL3 are disabled.
4. In mode 4, MUL1 and MUL2 are disabled.
5. In mode 5, MUL1 and MUL2 are disabled.
6. In mode 6, MUL1 and MUL2 are disabled and MUL3 is partially disabled by disabling the gated
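The mode-based gating rules listed above can be captured as a lookup table. The Python sketch below is a behavioural illustration only. The dictionary and function names are hypothetical, the conditional zero-input gating of mode 1 is not modelled, and the partial disabling of MUL3 in mode 6 is noted only in a comment.

```python
MODULES = ("MUL1", "MUL2", "MUL3")

# Modules whose gated registers are unconditionally disabled in each mode.
DISABLED_BY_MODE = {
    "CM1": set(),             # gated only conditionally, on zero inputs
    "CM2": {"MUL3"},
    "CM3": {"MUL2", "MUL3"},
    "CM4": {"MUL1", "MUL2"},
    "CM5": {"MUL1", "MUL2"},
    "CM6": {"MUL1", "MUL2"},  # MUL3 is additionally partially disabled
}


def clock_enables(mode):
    """Return a clock-enable flag per module for the given configuration mode."""
    disabled = DISABLED_BY_MODE[mode]
    return {module: module not in disabled for module in MODULES}
```

For example, `clock_enables("CM3")` leaves only MUL1 enabled, which matches the single-multiplier description of that mode.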


The functional blocks MUL1, MUL2, and MUL3 can be disabled based on the zero inputs they
receive. The conditions for zero values are as follows:
1. If x[7:4] is zero, the input registers of MUL1 and MUL3 can be disabled.
2. If x[3:0] is zero, the input register of MUL2 can be disabled.
3. If y[7:4] is zero, the input registers of MUL2 and MUL3 can be disabled.
4. If y[3:0] is zero, the input register of MUL1 can be disabled.
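The four zero-input conditions can be expressed directly in code for the 8-bit case. The following Python function is a minimal sketch with a hypothetical name. It only reports which input registers may be disabled and does not model the fixed outputs that a disabled module must still present.

```python
def zero_input_gating(x, y):
    """Return the set of modules whose input registers can be disabled.

    x and y are 8-bit operands; the four rules test their upper and
    lower nibbles for zero.
    """
    x_hi, x_lo = (x >> 4) & 0xF, x & 0xF
    y_hi, y_lo = (y >> 4) & 0xF, y & 0xF

    disabled = set()
    if x_hi == 0:          # rule 1: x[7:4] is zero
        disabled |= {"MUL1", "MUL3"}
    if x_lo == 0:          # rule 2: x[3:0] is zero
        disabled |= {"MUL2"}
    if y_hi == 0:          # rule 3: y[7:4] is zero
        disabled |= {"MUL2", "MUL3"}
    if y_lo == 0:          # rule 4: y[3:0] is zero
        disabled |= {"MUL1"}
    return disabled
```

For instance, `zero_input_gating(0x0F, 0xF0)` triggers rules 1 and 4, so the input registers of MUL1 and MUL3 can be disabled while MUL2 stays active.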
Even if the input operands are zero, the product of the multiplication process may not be zero,
because some of the partial products in the multiplication process are complemented. The
actual outputs of MUL3 and MUL2 should be (11110000)2 and (001111)2 [5]. The output of MUL1 is
not the same in all cases; it depends on the partial-product vector. In that case the actual
product of MUL1 in the disabled condition is {0100, x3y3 & Km2, (x3y3 & Km2)}. A control unit (CU)
is used to set Km2 = 1 when MUL2 is disabled. Latch L is used to keep the present value when MUL1
is disabled. For operations other than CM1, the input registers of ADD1 can be disabled. Based on
the above conditions, the input signal is decoded and g_m1, g_m2, and g_m3 are generated, which
control the gated registers of MUL1, MUL2, and MUL3, respectively. The gated register at stage 3 is
controlled by t[3], which takes the value 1 only in operation mode CM1.
Parameter                            Spartan 3E (Xilinx 9.2i)
Gate Delay                           15.847 ns
Net Delay                            10.331 ns
Clock Fan-out
Total Estimated Power Consumption    56 mW
Gate Count
Additional JTAG Gate Count
Bounded IO
No. of LUT
Higher Clock Frequency               100 MHz

1. From the simulation results, the pipelined reconfigurable fixed-width Baugh-Wooley multiplier with
six configuration modes saves power to a certain extent compared with the former reconfigurable
fixed-width multiplier, and it is capable of providing low power for both high-resolution and smaller
bit-length operation.
2. This power-efficient multiplier can be used for DSP applications.
3. The same methodology can be used for n = 16, 32, and 64 bits.
As an attempt to develop a pipelined reconfigurable fixed-width multiplier using the Baugh-Wooley
algorithm and an architecture for low-power multiplier design, the research presented in this thesis has
achieved good results. However, there are limitations in this work, and several directions for future
work are possible. One possible direction is to apply the developed multipliers to power-aware systems.

[1] C. R. Baugh and B. A. Wooley, "A two's complement parallel array multiplication algorithm," IEEE
Trans. Comput., vol. C-22, no. 12, pp. 1045-1047, Dec. 1973.

[2] A. D. Booth, "Signed Binary Multiplication Techniques," Quarterly J. Mechanics and Applied Maths,
vol. 4, pp. 236-240, 1951.

[3] O. L. MacSorley, "High-speed arithmetic in binary computer," Proc. IRE, vol. 49, pp. 67-91, 1961.

[4] Computer Arithmetic: Principles, Architecture, and Design. New York: John Wiley.

[5] Jin-Hao Tu and Lan-Da Van, "Power-Efficient Pipelined Reconfigurable Fixed-Width Baugh-Wooley
Multipliers," IEEE Transactions on Computers, vol. 58, no. 10, Oct. 2009.

[6] J. M. Jou, S. R. Kuang, and R. D. Chen, "Design of low-error fixed-width multiplier for DSP
applications," IEEE Trans. Circuits Syst. II, vol. CAS-46, no. 6, pp. 836-842, Jun. 1999.

[7] S. Krithivasan and M. J. Schulte, "Multiplier architectures for media processing," in Proc. IEEE
Asilomar Conference on Signals, Systems, and Computers, Nov. 2003, vol. 2, pp. 2193-2197.

[8] Taso Y.-L., W.-H. Chan, M.-H. Tan, M.-C. Lin, and S.-J. Jou, "Low-e r DSP core for communication
systems," EURASIP Journal on Applied Signal Processing, pp. 1355-1370, Jan. 2003.
