
Prasann D. Kulkarni, et al., International Journal of Computer and Electronics Research [Volume 2, Issue 2, April 2013]


Prof. Prasann D. Kulkarni1, Prof. S. P. Deshpande2, Dr. G. R. Udupi3
1 Lecturer, Dept. of E&CE, KLS's VDRIT, Haliyal, India
2 Asst. Prof., Dept. of E&CE, KLS's G.I.T, Belgaum, India
3 Principal, KLS's VDRIT, Haliyal, India
Abstract - A multiplier is one of the key hardware blocks in most digital and high-performance systems such as FIR filters, digital signal processors and microprocessors. With advances in technology, many researchers have tried, and are still trying, to design multipliers that offer one or more of the following: high speed, low power consumption, and regularity of layout (and hence less area), making them suitable for various high-speed, low-power and compact VLSI implementations. However, area and speed are two conflicting constraints, so improving speed usually results in larger area. Here we try to find the best trade-off between them. Multiplication proceeds in two basic steps: generation of partial products and their addition. Hence we first consider the design of the Wallace tree multiplier, followed by the Booth-Wallace multiplier, comparing the speed and power consumption in

As the scale of integration keeps growing, more and more sophisticated signal processing systems are being implemented on a VLSI chip. These signal processing applications not only demand great computation capacity but also consume a considerable amount of energy. While performance and area remain the two major design goals, power consumption has become a critical concern in today's VLSI system design. The need for low-power VLSI systems arises from two main forces. First, with the steady growth of operating frequency and processing capacity per chip, large currents have to be delivered and the heat due to large power consumption must be removed by proper cooling techniques. Second, battery life in portable electronic devices is limited. Low-power design directly leads to prolonged operation time in these portable devices.
Multiplication is a fundamental operation in most signal processing algorithms. Multipliers have large area, long latency and consume considerable power. Therefore low-power multiplier design has been an important part of low-power VLSI system design.
Motivation -
A system's performance is generally determined by the performance of the multiplier, because the multiplier is generally the slowest element in the system. Furthermore, it is generally the most area-consuming. Hence, optimizing the speed and area of the multiplier is a major design issue. However, area and speed are usually conflicting constraints, so that improving speed mostly results in larger area.
We study different adders and compare them, so that we can judge which adder is best suited to a given situation.
Ripple Carry Adder has a smaller area but lower speed.
Carry Select Adders are high-speed but possess a larger area.
Carry Look Ahead Adder lies in between, offering a proper trade-off between time and area complexities.

Coming to multipliers, we consider different multipliers, from the Array Multiplier to the Wallace Tree and Booth multipliers, both Radix-2 and Radix-4.
The Array Multiplier is the worst case, consuming the highest amount of power. Next comes the Radix-2 Booth multiplier, which consumes less power than the array multiplier. The Wallace Tree multiplier and the Radix-4 Booth multiplier have nearly the same delay, while the Radix-4 Booth multiplier consumes less power than the Wallace Tree. Hence we conclude that the Radix-4 Booth multiplier is best for low-power applications. However, the benefit achieved comes at the expense of increased
ISSN: 2278-5795


hardware complexity. Indeed, this implementation requires hardware for the encoding and for the selection of the partial products. Among other multipliers, shift-and-add multipliers have been used in many applications for their simplicity and relatively small area requirement. The BZ-FAD architecture optimizes both power and area.
Table 1: Comparison of adders

The power consumption in a digital CMOS circuit can be described by
$P_{total} = P_{dynamic} + P_{short\text{-}circuit} + P_{leakage}$
The dynamic power dissipation is caused by charging and discharging of capacitances in the circuit. The short-circuit power consumption is caused by the current flow through the direct path existing between the power supply and the ground during the transition phase. The n-MOS and p-MOS transistors used in a CMOS logic circuit commonly have non-zero reverse leakage and sub-threshold currents, which account for the leakage power. The computation of a multiplier manipulates two input data to generate many partial products for subsequent addition operations, which in a CMOS circuit design require many switching activities. The switching activity within the functional unit of a multiplier accounts for the majority of its power dissipation, as given in the following equation
$P_{switching} = \alpha \cdot C \cdot V_{dd}^{2} \cdot f_{clk}$

Table 2: Comparison of multipliers


Power dissipation of VLSI chips is traditionally
a neglected subject. In the past, the device density
and frequency were low enough that it was not a
constraining factor in chips. As the scale of
integration improves, more transistors, faster and
smaller than their predecessors, are being packed
into a chip. This leads to the steady growth of the
operating frequency and processing capacity per
chip, resulting in increased power dissipation.


In the switching power equation, $\alpha$ is the switching activity parameter, $C$ is the loading capacitance, $V_{dd}$ is the operating voltage and $f_{clk}$ is the operating frequency.
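As a small numerical illustration of the switching power relation, the sketch below evaluates it in Python. The function name and all operating-point values (capacitance, voltage, frequency, activity factors) are hypothetical and chosen only to show that power scales linearly with the switching activity.

```python
def switching_power(alpha, c_load, vdd, f_clk):
    """Switching power in watts: alpha * C * Vdd^2 * f_clk."""
    return alpha * c_load * vdd ** 2 * f_clk

# Hypothetical operating point: 10 pF load, 2.5 V supply, 100 MHz clock.
p_base = switching_power(alpha=0.5, c_load=10e-12, vdd=2.5, f_clk=100e6)

# Same circuit with the switching activity reduced to one quarter.
p_reduced = switching_power(alpha=0.125, c_load=10e-12, vdd=2.5, f_clk=100e6)

print(p_base, p_reduced)
```

With everything else fixed, reducing the activity factor is the only lever used here, which is the lever a low-switching multiplier architecture relies on.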
Shift-and-add multiplication is similar to multiplication performed with paper and pencil. This method adds the multiplicand X to itself Y times, where Y denotes the multiplier. To multiply two numbers with paper and pencil, the algorithm takes the digits of the multiplier one at a time from right to left, multiplies the multiplicand by that single digit, and places the intermediate product in the appropriate position to the left of the earlier results. To perform all the operations needed for the final product, the conventional shift-and-add architecture requires many switching activities, so its dynamic power dissipation is high. By eliminating or reducing the sources of switching activity in the conventional multiplier, a low-power multiplier architecture can be derived. Since the multiplier is one of the functional components of many digital systems, its power dissipation should be reduced as much as possible.
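The register-level behaviour described above can be sketched in Python as follows. This is a behavioural model under simple assumptions (unsigned operands, one multiplier bit per cycle); the function name `shift_and_add_multiply` and the variable names are illustrative and are not taken from the paper's VHDL.

```python
def shift_and_add_multiply(a, b, n_bits):
    """Multiply two unsigned n_bits-wide integers one multiplier bit per cycle."""
    pp = 0  # partial product register, 2 * n_bits wide
    for _ in range(n_bits):
        addend = a if (b & 1) else 0   # B(0) selects between A and 0
        pp += addend << n_bits         # add into the upper half of the register
        pp >>= 1                       # shift the partial product right
        b >>= 1                        # shift the multiplier register right
    return pp

print(shift_and_add_multiply(13, 11, 4))
```

Every cycle shifts both registers and exercises the adder, whether or not the current multiplier bit is 1.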


A low-power structure called BZ-FAD (Bypass Zero, Feed A Directly) for shift-and-add multipliers is proposed. The architecture considerably lowers the switching activity of conventional multipliers. The modifications to the multiplier, which multiplies A by B, include the removal of the shifting of the B register, direct feeding of A to the adder, bypassing the adder whenever possible, using a ring counter instead of a binary counter, and removal of the partial product shift. The architecture makes use of a low-power ring counter proposed in this work. Simulation results for 32-bit radix-2 multipliers show that the BZ-FAD architecture lowers the total switching activity by up to 76% and power consumption by up to 30% when compared to the conventional architecture. The proposed multiplier can be used for low-power applications where speed is not a primary design parameter.
The rest of the paper is organized as follows.
Section II briefly reviews the background
information about conventional shift and add
multiplier. Section III describes the architecture
description of the low power multiplier. Section
IV describes the low power ring counter
architecture. Results are discussed in section V
and conclusion is in the last section.

Addition is the most common and often used arithmetic operation in microprocessors, digital signal processors and, especially, digital computers. It also serves as a building block for the synthesis of all other arithmetic operations. Therefore, for the efficient implementation of an arithmetic unit, the binary adder structure becomes a very critical hardware unit. Although much research on binary adder structures has been done, studies based on their comparative performance analysis are few.
With respect to asymptotic delay time and area complexity, the binary adder architectures can be categorized into four primary classes.
2.1 Ripple Carry Adder(RCA)
The well-known ripple carry adder is composed of cascaded full adders for an n-bit adder, as shown in Figure 1. It is constructed by cascading full adder blocks in series. The carry-out of one stage is fed directly to the carry-in of the next stage. An n-bit parallel adder requires n full adders.

Figure 1: A 4-bit Ripple Carry Adder

Logic equations
$g_i = a_i b_i$
$p_i = a_i \oplus b_i$
$c_{i+1} = g_i + p_i c_i$
$s_i = p_i \oplus c_i$
Complexity and Delay for n-bit RCA structure
$A_{RCA} = O(n) = 7n$
$T_{RCA} = O(n) = 2n$
Not very efficient when numbers with a large number of bits are used.
Delay increases linearly with bit length.
2.2 Carry Select Adder(CSLA)
In the carry select adder scheme, blocks of bits are added in two ways: one assuming a carry-in of 0 and the other a carry-in of 1. This results in two precomputed sum and carry-out signal pairs ($s^0_{i-1:k}, c^0_i$; $s^1_{i-1:k}, c^1_i$). Later, as the block's true carry-in ($c_k$) becomes known, the correct signal pair is selected. Generally multiplexers are used to propagate carries.

Figure 2: A Carry Select Adder with 1 level using n/2-bit RCA
Logic equations
$s_{i-1:k} = \bar{c}_k \, s^0_{i-1:k} + c_k \, s^1_{i-1:k}$
$c_i = \bar{c}_k \, c^0_i + c_k \, c^1_i$
Complexity and Delay for n-bit CSLA structure
$A_{CSLA} = O(n) = 14n$
$T_{CSLA} = O(n^{1/(l+1)}) = 2.8\,n^{1/2}$
Because of the multiplexers, a larger area is required.
Has a lesser delay than Ripple Carry Adders (half the delay of RCA).


Hence we always go for the Carry Select Adder while working with a smaller number of bits.
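The two adder schemes discussed so far can be compared with a short behavioural sketch. It assumes a one-level carry select adder split into two equal halves, each built from ripple carry blocks; the helper names (`full_adder`, `to_bits`, `from_bits`) are illustrative, and bit lists are ordered least significant bit first.

```python
def full_adder(a, b, c):
    """One full adder stage using generate g and propagate p."""
    g = a & b
    p = a ^ b
    return p ^ c, g | (p & c)  # (sum bit, carry-out)

def ripple_carry_add(a_bits, b_bits, c_in=0):
    """Carry ripples from one stage to the next."""
    sum_bits, carry = [], c_in
    for a, b in zip(a_bits, b_bits):
        s, carry = full_adder(a, b, carry)
        sum_bits.append(s)
    return sum_bits, carry

def carry_select_add(a_bits, b_bits, c_in=0):
    """Upper half is precomputed for carry-in 0 and 1, then selected."""
    k = len(a_bits) // 2
    low_sum, c_k = ripple_carry_add(a_bits[:k], b_bits[:k], c_in)
    s0, c0 = ripple_carry_add(a_bits[k:], b_bits[k:], 0)
    s1, c1 = ripple_carry_add(a_bits[k:], b_bits[k:], 1)
    high_sum, c_out = (s1, c1) if c_k else (s0, c0)  # multiplexer on c_k
    return low_sum + high_sum, c_out

def to_bits(x, n):
    return [(x >> i) & 1 for i in range(n)]

def from_bits(bits):
    return sum(bit << i for i, bit in enumerate(bits))

s, c = carry_select_add(to_bits(200, 8), to_bits(100, 8))
print(from_bits(s), c)
```

The carry select version instantiates the upper ripple block twice, which is where the extra area comes from, while the upper and lower halves are evaluated independently of each other.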
2.3 Carry Look Ahead Adder(CLA)
Carry Look Ahead Adder can produce carries
faster due to carry bits generated in parallel by an
additional circuitry whenever inputs change. This
technique uses carry bypass logic to speed up the
carry propagation.

Figure 4: 8-bit Carry Look Ahead Generator (using 2-bit CLA)
Complexity and Delay for n-bit CLA structure
$A_{CLA} = O(n) = 14n$
$T_{CLA} = O(\log n) = 4 \log_2 n$
Figure 3: 4-bit CLA
Logic equations
Let $a_i$ and $b_i$ be the augend and addend inputs, $c_i$ the carry input, and $s_i$ and $c_{i+1}$ the sum and carry-out of the ith bit position. The auxiliary functions $p_i$ and $g_i$, called the propagate and generate signals, and the sum and carry outputs are defined as follows.
$p_i = a_i + b_i$
$g_i = a_i b_i$
$s_i = a_i \oplus b_i \oplus c_i$
$c_{i+1} = g_i + p_i c_i$
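To make the look-ahead idea concrete, the following sketch computes every carry directly from the propagate and generate signals and the input carry, without waiting for the previous carry. It is a behavioural illustration only; the function name `cla_add` and the integer interface are assumptions for this example.

```python
def cla_add(a, b, n_bits, c0=0):
    """Add two unsigned n_bits-wide integers with fully expanded carries."""
    a_bits = [(a >> i) & 1 for i in range(n_bits)]
    b_bits = [(b >> i) & 1 for i in range(n_bits)]
    p = [x | y for x, y in zip(a_bits, b_bits)]  # propagate: p_i = a_i + b_i
    g = [x & y for x, y in zip(a_bits, b_bits)]  # generate:  g_i = a_i b_i

    carries = [c0]
    for i in range(n_bits):
        # c_{i+1} = g_i + p_i g_{i-1} + ... + p_i ... p_1 g_0 + p_i ... p_0 c_0
        c = 0
        for j in range(i, -1, -1):
            term = g[j]
            for k in range(j + 1, i + 1):
                term &= p[k]
            c |= term
        term = c0
        for k in range(i + 1):
            term &= p[k]
        carries.append(c | term)

    s_bits = [x ^ y ^ c for x, y, c in zip(a_bits, b_bits, carries)]
    total = sum(bit << i for i, bit in enumerate(s_bits))
    return total, carries[n_bits]

print(cla_add(9, 7, 4))
```

The number of product terms in each carry expression grows with the bit position, which illustrates why a single flat look-ahead stage becomes costly for wide adders.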
As we increase the number of bits in the Carry Look Ahead adder, the complexity increases because the number of gates in the expression for $c_{i+1}$ increases. So in practice it is not desirable to use the traditional CLA shown above, because it increases both the space required and the power.
Instead, we use Carry Look Ahead adders of fewer bits in levels to create a larger CLA. Commonly the smaller CLA is taken to be a 4-bit CLA, so we can define carry look-ahead over a group of 4 bits. We therefore redefine carry generate as the Group Generated Carry $g_{[i,i+3]}$ and carry propagate as the Group Propagated Carry $p_{[i,i+3]}$, which are defined below.
Redefined Equations
$g_{[i,i+3]} = g_{i+3} + g_{i+2} p_{i+3} + g_{i+1} p_{i+2} p_{i+3} + g_i p_{i+1} p_{i+2} p_{i+3}$
$p_{[i,i+3]} = p_i p_{i+1} p_{i+2} p_{i+3}$
The modified block diagram for the 8-bit Carry Look Ahead Adder using levels of 4-bit CLA is as in the block diagram below.

3.1. Wallace Tree Multiplier

The Wallace tree multiplier is considerably faster than a simple array multiplier because its height is logarithmic in word size, not linear. However, in addition to the large number of adders required, the Wallace tree's wiring is much less regular and more complicated. As a result, Wallace trees are often avoided by designers when design complexity is a concern. Wallace tree styles use a log-depth tree network for reduction. Faster, but irregular, they trade ease of layout for speed. Wallace tree styles are generally avoided for low-power applications, since the excess wiring is likely to consume extra power.
While substantially faster than the carry-save structure for large multipliers, the Wallace tree multiplier has the disadvantage of being very irregular, which complicates the task of coming up with an efficient layout.

Figure 5: Wallace Tree Block Diagram


A three-step process is used to multiply two numbers:
Formation of bit products.
Reduction of the bit product matrix into a two-row matrix by means of a carry save adder.
Summation of the remaining two rows using a faster Carry Look Ahead Adder (CLA).
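The three steps above can be sketched as a behavioural model. This is a simplified reduction that applies full adders (3:2 compressors) to every group of three bits in a column and passes leftover bits through; a hardware Wallace tree would also fix the wiring and may use half adders. The final two-row addition is done here with ordinary integer addition, standing in for the CLA. The function name is illustrative.

```python
def wallace_multiply(a, b, n_bits):
    """Multiply two unsigned n_bits-wide integers by column-wise reduction."""
    # Step 1: form the bit products, grouped by column weight 2**(i + j).
    cols = [[] for _ in range(2 * n_bits)]
    for i in range(n_bits):
        for j in range(n_bits):
            cols[i + j].append(((a >> i) & 1) & ((b >> j) & 1))

    # Step 2: reduce with carry-save (full adder) stages until two rows remain.
    while max(len(col) for col in cols) > 2:
        nxt = [[] for _ in range(len(cols) + 1)]
        for k, col in enumerate(cols):
            full = len(col) - len(col) % 3
            for t in range(0, full, 3):
                x, y, z = col[t:t + 3]
                nxt[k].append(x ^ y ^ z)                        # sum stays in column k
                nxt[k + 1].append((x & y) | (y & z) | (x & z))  # carry moves to column k + 1
            nxt[k].extend(col[full:])                           # leftovers pass through
        cols = nxt

    # Step 3: add the two remaining rows.
    row0 = sum(col[0] << k for k, col in enumerate(cols) if len(col) > 0)
    row1 = sum(col[1] << k for k, col in enumerate(cols) if len(col) > 1)
    return row0 + row1

print(wallace_multiply(13, 11, 4))
```

Each pass shrinks the tallest column by roughly one third, which is the source of the logarithmic depth mentioned earlier.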
3.2 Booth's Multiplier
Though Wallace Tree multipliers are faster than the traditional carry-save method, they are also very irregular and hence complicate the layout. When the multiplier grows beyond 32 bits, a large number of logic gates is required, and hence also more interconnecting wires, which makes the chip design large and slows down the operating speed.
The Booth multiplier can be used in different modes such as radix-2, radix-4, radix-8, etc. We decided to use the Radix-4 Booth's Algorithm because the number of partial products is reduced to n/2.
3.2.1. Booth Multiplication Algorithm(Radix 4)
One of the solutions for realizing high-speed multipliers is to enhance parallelism, which helps in decreasing the number of subsequent calculation stages. The original version of Booth's multiplier (Radix 2) had two drawbacks:
The number of add/subtract operations is variable, which is inconvenient when designing parallel multipliers.
The algorithm becomes inefficient when there are isolated 1s.
These problems are overcome by using the Radix-4 Booth's Algorithm, which scans strings of three bits. The design of the Booth's multiplier in this project consists of four Modified Booth Encoders (MBE), four sign extension correctors, four partial product generators (each comprising a 5:1 multiplexer) and finally a Wallace Tree Adder. This Booth multiplier technique increases speed by reducing the number of partial products by half. Since an 8-bit Booth multiplier is used in this project, only four partial products need to be added, instead of the eight partial products generated by a conventional multiplier. The architecture of the modified Booth's Algorithm used in this project is shown below.

Figure 6: Architecture of designed Booth
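As an illustration of how Radix-4 Booth recoding halves the number of partial products, the sketch below scans overlapping groups of three multiplier bits and maps each group to a digit in {-2, -1, 0, 1, 2}. It assumes a two's-complement multiplier with an even bit width; the function names are hypothetical and the model does not reproduce the encoder, sign-extension or multiplexer hardware of the design described above.

```python
def booth_radix4_digits(b, n_bits):
    """Recode an n_bits-wide two's-complement multiplier into n_bits/2 digits."""
    b_u = b & ((1 << n_bits) - 1)   # raw bit pattern of the multiplier
    digits, prev = [], 0            # prev is the bit to the right of the group
    for k in range(0, n_bits, 2):
        b0 = (b_u >> k) & 1
        b1 = (b_u >> (k + 1)) & 1
        digits.append(-2 * b1 + b0 + prev)
        prev = b1
    return digits

def booth_radix4_multiply(a, b, n_bits):
    """Sum the n_bits/2 partial products digit * a * 4**k."""
    return sum(d * a * 4 ** k
               for k, d in enumerate(booth_radix4_digits(b, n_bits)))

print(booth_radix4_digits(7, 4), booth_radix4_multiply(-5, 7, 4))
```

For an 8-bit multiplier the recoding yields four digits, so four partial products (each 0, ±A or ±2A) are summed instead of eight.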

Figure 7 shows the architecture of a conventional shift-and-add multiplier. The dashed ovals show the major sources of switching activity. The multiplier is shifted in each cycle, and the bit shifted out of register B is connected to the select pin of the multiplexer, mux_A. As the select signal changes, the output of mux_A also changes, which causes activity in the adder. The partial product must be shifted in every cycle. The counter checks whether the required number of operations has been performed. The major sources of switching activity are summarized below:
Shifting of the B register
Activity in the counter
Activity in the adder
Switching between 0 and A in the multiplexer
Activity in the multiplexer select
Shifting of the partial product register
By eliminating or reducing the switching activity described above, a low-power architecture can be derived.

Figure 7: Architecture of conventional shift and add multiplier with major sources of switching activity.


4.1 State Diagram

Figure 8: Conventional add shift multiplier state diagram

Figure 10: Multiplier with ring counter

For a 3-bit multiplier, a 3-bit ring counter is used. Table 3 gives the required bit and the counter output.
Table 3: Counter output with required bit.
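The relationship between a ring counter output and the multiplier bit it selects can be sketched as follows. This is an illustrative model of an ordinary one-hot ring counter, not the low-power clock-gated counter; the names `ring_counter` and `hot_bits` are assumptions for this example.

```python
def ring_counter(n_bits):
    """Yield the one-hot states 001, 010, 100, ... of an n_bits ring counter."""
    state = 1
    mask = (1 << n_bits) - 1
    for _ in range(n_bits):
        yield state
        state = ((state << 1) | (state >> (n_bits - 1))) & mask  # rotate left

def hot_bits(b, n_bits):
    """Bit of B selected in each cycle by the one-hot counter output."""
    return [1 if b & state else 0 for state in ring_counter(n_bits)]

for cycle, state in enumerate(ring_counter(3)):
    print(cycle, format(state, "03b"), hot_bits(0b110, 3)[cycle])
```

Because the counter output is already one-hot, it can drive the bit selection directly, so register B does not need to be shifted to expose each bit.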
5.1 Architecture
To derive a low-power architecture, we concentrate our effort on eliminating or reducing the sources of switching activity discussed in the previous section. The proposed architecture, which is shown in Figure 11, is called BZ-FAD.
5.1.1 Shift of the B Register
In the traditional architecture (see Figure 9), to generate the partial product, B(0) is used to decide between A and 0. If the bit is 1, A should be added to the previous partial product, whereas if it is 0, no addition is needed to generate the partial product. Hence, in each cycle, register B should be shifted to the right so that its rightmost bit appears at B(0); this operation gives rise to some switching activity. An example of shifting of the register is shown here.

Figure 9: Shift and add multiplication example

To avoid this, in the proposed architecture (Figure 11) a multiplexer (M1) with a one-hot encoded bus selector chooses the hot bit of B in each cycle. A ring counter is used to select B(n) in the nth cycle. As will be seen later, the same counter can be used for block M2 as well. The ring counter used in the proposed multiplier is noticeably wider (32 bits vs. 5 bits for a 32-bit multiplier) than the binary counter used in the conventional architecture; therefore an ordinary ring counter, if used in BZ-FAD, would cause more transitions than its binary counterpart in the conventional architecture. To minimize the switching activity of the counter, we utilize the low-power ring counter, which is described in the next section.
5.1.2 Reducing Switching Activity of the Adder

In the conventional multiplier architecture (Figure 7), in each cycle, the current partial
product is added to A (when B(0) is one) or to 0
(when B(0) is zero). This leads to unnecessary
transitions in the adder when B(0) is zero. In these
cases, the adder can be bypassed and the partial
product should be shifted to the right by one bit.
This is what is performed in the proposed
architecture which eliminates unnecessary
switching activities in the adder. As shown in
Figure 11, the Feeder and Bypass registers are
used to bypass the adder in the cycles where B(n)
is zero. In each cycle, the hot bit of the next cycle
(i.e., B(n + 1)) is checked. If it is 0, i.e., the adder
is not needed in the next cycle, the Bypass register
is clocked to store the current partial product. If


B(n + 1) is 1, i.e., the adder is really needed in the next cycle, the Feeder register is clocked to store the current partial product, which must be fed to the adder in the next cycle. Note that to select between the Feeder and Bypass registers we have used NAND and NOR gates, which are inverting logic; therefore, the inverted clock (~Clock in Figure 11) is fed to them. Finally, in each cycle, B(n) determines whether the partial product should come from the Bypass register or from the adder output. In each cycle, when the hot bit B(n) is zero, there is no transition in the adder since its inputs do not change. The reason is that in the previous cycle the partial product was stored in the Bypass register, so the value of the Feeder register, which is the input of the adder, remains unchanged. The other input of the adder is A, which is constant during the multiplication. This enables us to remove the multiplexer and feed input A directly to the adder, resulting in a noticeable power saving. Finally, note that the BZ-FAD architecture does not put any constraint on the adder type. In this work, we have used the ripple carry adder, which has the least average transitions per addition among the look-ahead, carry-skip, carry-select, and conditional sum adders.
5.1.3 Shift of the PP Register
In the conventional architecture, the partial product is shifted in each cycle, giving rise to transitions. Inspecting the multiplication algorithm reveals that the multiplication may be completed by processing the most significant bits of the partial product, and hence it is not necessary for the least significant bits of the partial product to be shifted. We take advantage of this observation in the BZ-FAD architecture. Notice that in Figure 11, for PLow, the lower half of the partial product, we use k latches (for a k-bit multiplier). These latches are indicated by the dotted rectangle M2 in Figure 11.

Figure 11: The proposed low power multiplier architecture (BZ-FAD)
In the first cycle, the least significant bit, PP(0),
of the product becomes finalized and is stored in
the rightmost latch of PLow. The ring counter
output is used to open (unlatch) the proper latch.
This is achieved by connecting the S/~H line of
the nth latch to the nth bit of the ring counter
which is '1' in the nth cycle. In this way, the nth
latch samples the value of the nth bit of the final
product (Figure 11). In the subsequent cycles, the
next least significant bits are finalized and stored
in the proper latches. When the last bit is stored in
the leftmost latch, the higher and lower halves of
the partial product form the final product result.
Using this method, no shifting of the lower half of
the partial product is required. The higher part of
the partial product, however, is still shifted.
Comparing the two architectures, BZ-FAD saves
power for two reasons: first, the lower half of the
partial product is not shifted, and second, this half
is implemented with latches instead of flip-flops.
Note that in the conventional architecture (Figure 7) the data transparency problem of latches prohibits us from using latches instead of flip-flops for forming the lower half of the partial product. This problem does not exist in BZ-FAD since the lower half is not formed by shifting the bits in a shift register.


Figure 12: Manual approach for BZFAD

5.2 State Diagram


Figure 13: BZFAD state diagram

Following the architecture of the conventional add-and-shift multiplier, simulation results are obtained. The total operation takes four states. The first state loads the registers and the second state calculates the first partial product. In the third state, the counter value is incremented and tested for the kth bit value. With every increment of the counter, until the required value is reached, the remaining shift and addition operations are performed. The output is visible at the transition from the third state to the fourth state, as the done signal goes high. The counter is then reset for further operations.
6.1 BZFAD Multiplier Code Description
We made a number of adjustments to the conventional multiplier architecture to reduce power. Following this BZFAD architecture, simulation results are obtained. In the first state, the multiplier and multiplicand registers are loaded with their respective values and all the signals are initialized to zero. In the next state, in each cycle, the hot bit of the next cycle, that is, B(n+1), is checked. If it is 0, that is, the adder is not needed in the next cycle, the Bypass register is clocked to store the current partial product. If B(n+1) is 1, that is, the adder is really needed in the next cycle, the Feeder register is clocked to store the current partial product, which must be fed to the adder in the next cycle. In each cycle the ring counter is advanced and its MSB is checked; when it becomes 1, the state is incremented. In the next state, the lower half of the partial product is stored in the PLow latches and the upper half is stored in the Feeder register, and these two are concatenated to form the final product.
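The data flow described above can be approximated with a short behavioural model. This is a sketch, not the VHDL used in this work: the Feeder and Bypass registers are merged into a single variable `high`, the ring counter is represented by the loop index, and the function name and the `adder_ops` counter are introduced only for illustration.

```python
def bzfad_multiply(a, b, n_bits):
    """Behavioural sketch of BZ-FAD for unsigned n_bits-wide operands.

    Returns (product, adder_ops), where adder_ops counts the cycles in
    which the adder output is actually used.
    """
    high = 0       # upper part of the partial product (Feeder/Bypass value)
    p_low = 0      # PLow latches: one finalized product bit per cycle
    adder_ops = 0
    for n in range(n_bits):             # bit n of the ring counter is hot in cycle n
        if (b >> n) & 1:                # hot bit B(n) = 1: take the adder output
            high = high + a             # A is fed directly to the adder
            adder_ops += 1
        # hot bit B(n) = 0: adder bypassed, partial product passes unchanged
        p_low |= (high & 1) << n        # nth latch samples the finalized bit
        high >>= 1                      # only the upper part is shifted
    return (high << n_bits) | p_low, adder_ops

print(bzfad_multiply(13, 11, 4))
```

In this model the adder is exercised only once per 1 bit in the multiplier, and neither B nor the lower half of the product is ever shifted, which mirrors the sources of saving listed for the architecture.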
7.1 Basics About Spartan-II Trainer Kit
The Spartan-II trainer MXSFK-LC-208 is useful for realizing and verifying various digital designs. The user can write VHDL/Verilog code and verify the results by implementing it physically in the target device (FPGA - Field Programmable Gate Array). With the help of this trainer, the user can simulate and observe various input and output conditions to verify the implemented design. Also
you can select various i/o std. Interface to the
7.2. Programmable Logic Devices [PLDS]
A Programmable Logic Device is a device
whose logic characteristics can be changed and
manipulated or stored through programming.
7.2.1 Different Types of PLDs.
Programmable Array Logic [PALs]
The most common and simple device in this category is the PAL, which simply consists of an array of AND gates and an array of OR gates. The AND array is programmable while the OR array is relatively fixed.
Field Programmable Gate Arrays
FPGAs are arrays of logic blocks, which can be linked together to form complex logic implementations. They are separated into two categories: Fine Grained and Coarse Grained. Fine Grained devices are made up of a sea of gates, transistors or small macro cells, while Coarse Grained devices are made up of bigger macro cells, which often consist of flip-flops and Look-up Tables that implement the combinational logic functions. These are RAM-based devices, i.e., they lose their configuration when power is switched off. Hence they have to be configured every time power is applied.
Complex Programmable Logic Devices
CPLDs are made up of smaller common macro cells, which are programmable. CPLDs consist of multiple PAL-like function blocks that can be interconnected through a switch matrix. These are [Flash] EPROM-based devices, i.e., they retain their configuration even when power is switched off. Hence they need not be configured every time power is applied.
Application Specific Integrated Circuits
ASICs are prefabricated, pre-doped silicon chips. These are application-specific designs. They cannot be reconfigured once manufactured. Once the design is completely finalized, it can be made as an ASIC. Design changes are not possible, but the design is better in size and speed.
The Spartan-II family is a second-generation high-volume production FPGA solution. Devices in this family are available with up to 200,000 gates and up to 200 MHz system performance at 2.5 V.
Features of the Spartan-II family are:
1. On-chip RAM (block and distributed).
2. Fully PCI compliant.
3. Dedicated carry logic for high-speed arithmetic.
4. Dedicated multiplier support.
5. Low


6. 16 high performance interface standards.

7. 4 dedicated delay locked loop (DLLs) for
advanced clock control.
8. Power down mode (ICCO =100 mA).
9. Unlimited re-programmability.
8.3.1 Trainer Description
Technical Data

On board FPGA Spartan-II XC2S50 PQ

208 and compatible with XC2S100,
XC2S150, PQ 208 Package.
2 Keys for Keyboard Interface.
8 Digital I/Ps and O/Ps with LED
Two seven segment Displays.
On board 4 MHZ clock and Power On
reset circuit.
User selectable Interface hardware.
Support required for VCCO is on board, no external supply required.
Probing facility: All I/Os available to the
Power Supply
9-Volt Adapter supplied with Spartan-II
Required VCCO (3.3V) and Vccint (2.5V)
voltages are generated on board.
Seven Segment Led Display
Two 7-Segment LED displays are
provided. User can use them as an aid to
verify his design. [They come handy in
counter related application to monitor the

Test Points [TPs]

User can use these points to verify ground,

supply voltage, and clock.

DIP Switch
Single 8-way DIP switch [SW 1] is provided to be
used as input to the FPGA. Logic Level applied to
FPGA through SW1 is seen on LEDs LD0 to
Various jumpers are provided for
Selection of clock.
Selection of configuration mode.
Two Keys are provided for Keyboard
Downloading Cable
For downloading the design from PC, a 9 pin
D-Type male (J7) connector is provided on board.
The trainer can be connected to PC's parallel port
with a cable having 25 pins D-Type (male) to 9
pins D- type (female) connector. This cable is
provided with the trainer.


There are a total of 18 LEDs on the Trainer, which are grouped as follows.
1. POWER-ON LED is used for power supply indication.
2. DONE LED indicates successful configuration of SPARTAN-II.
3. Eight LEDs [IL0 to IL7] indicate the inputs applied by the user.
4. Eight LEDs [LD0 to LD7] indicate output conditions.

Figure 14: SPARTAN 2 Trainer


When software is to be implemented on hardware, interfacing is done. Any code/program can be downloaded onto a hardware kit (in this case the Spartan-II FPGA) with the help of a software interfacing tool (Xilinx).
When we burned our programs for the conventional architecture and the BZFAD architecture onto the Spartan-II kit, the results were obtained successfully. The images of Spartan-II executing the program are shown

After understanding the architecture of both the conventional and BZFAD multipliers, the next step was to implement them. To accomplish this, we wrote code in Very High Speed Integrated Circuit Hardware Description Language [VHDL]. This code was synthesized using Xilinx, simulated using the ISE simulator [isim], and implemented by burning it onto the Spartan-II FPGA kit. The simulation results, timing summary, area utilization and power analysis reports are shown below.

8.1 Simulation Results

The simulation results for both the conventional and BZFAD architectures follow in the order given below:
4 Bit Conventional Multiplier
8 Bit Conventional Multiplier
4 Bit BZFAD Multiplier
8 Bit BZFAD Multiplier

8.2 Timing Summary

                              Conventional 4 bit   BZFAD 4 bit
Minimum period                5.943 ns             4.918 ns
Maximum frequency             168.264 MHz
Minimum input arrival time    6.682 ns             5.160 ns

                              Conventional 8       BZFAD 8
Minimum period                8.258 ns             6.975 ns
Maximum frequency             121.094 MHz          143.362 MHz
Minimum input arrival time    8.426 ns             7.167 ns

                              Conventional 16      BZFAD 16
Minimum period                9.946 ns             6.564 ns
Maximum frequency             100.540 MHz
Minimum input arrival time    10.281 ns            7.502 ns

8.3 Area Utilization


8.5 Result Summary

Figure 15: Area, Power and Delay comparison for conventional and proposed BZFAD multiplier for various bits.

8.4 Power Analysis

Figure 16: Relationship between power reduction and bit size of multiplier.



Figure 17: Simulation for 8 bit BZFAD

In this paper, a low-power architecture for shift-and-add multipliers was proposed. The modifications to the conventional architecture included the removal of the shift of the B register (in A × B), direct feeding of A to the adder, bypassing the adder whenever possible, use of a ring counter instead of the binary counter, and removal of the partial product shift. The results showed an average power reduction of 30% by the proposed architecture. We also compared our multiplier with SPST [6], a low-power tree-based array multiplier. The comparison showed that the power saving of BZ-FAD was only 6% lower than that of SPST, whereas the SPST area was five times higher than that of BZ-FAD. Thus, for applications where small area and high speed are important concerns, BZ-FAD is an excellent choice. Additionally, we proposed a low-power architecture for ring counters based on partitioning the counter into blocks of flip-flops clock-gated with a special clock gating structure, the complexity of which is independent of the block sizes. The simulation results showed that, in comparison with the conventional architecture, the proposed architecture reduced the power consumption by more than 75% for the 64-bit counter.

Figure 18: Simulation for 8 bit conventional

[1] M. Mottaghi-Dastjerdi, A. Afzali-Kusha, and M. Pedram, "BZ-FAD: A Low-Power Low-Area Multiplier Based on Shift-and-Add Architecture," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 17, no. 2, pp. 302-306, Feb. 2009.
[2] O. Chen, S. Wang, and Y. W. Wu, "Minimization of switching activities of partial products for designing low-power multipliers," IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 11, no. 3, pp. 418-433, Jun. 2003.
[3] B. Parhami, "Computer Arithmetic: Algorithms and Hardware Designs," 1st ed. Oxford, U.K.: Oxford Univ. Press, 2000.
[4] M. D. Ercegovac and Z. Huang, "High performance low power left to right array multiplier design," IEEE Trans. Comput., vol. 54, no. 2, pp. 272-283, March 2006.
[5] Anantha P. Chandrakasan, Samuel Sheng, and Robert W. Brodersen, "Low-Power CMOS Digital Design," Journal of Solid-State Circuits, vol. 27, no. 4, April 1992.
[6] Nazieh M. Botros, "HDL programming," Press (available through John Wiley India and Thomson Learning), 2006 edition.
[7] Charles H. Roth, Jr., "Digital Systems Design Using VHDL," Thomson Learning, Inc., 9th reprint, 2006.

Mr. Prasann D. Kulkarni completed his B.E. in Electronics and Communication Engg. from KLS's Vishwanathrao Deshpande Rural Institute of Technology, Haliyal, Uttar Kannada, Karnataka, India. Presently he is pursuing an M.Tech in Digital Electronics at KLS's G.I.T, Belgaum, Karnataka, India, and since 2008 he has been working as a lecturer at KLS's Vishwanathrao Deshpande Rural Institute of Technology, Haliyal, Uttar Kannada, Karnataka, India. His research interests are in low-power embedded system design and fuzzy logic in neural applications.
