
LOW AREA HIGH SPEED CONFINED MULTIPLIER USING
MULTIPLEXER BASED FULL ADDER
CHAPTER-1
1.1 Introduction
In recent years, low power, high speed and small area have become the key parameters in the design of modern VLSI circuits. Multipliers play a major role in the ALU, and the multiplier is an important functional unit in the design of DSP structures. In a parallel multiplier, the partial products are generated by ANDing the multiplier and multiplicand bits.
In the second stage, full adders and half adders are used to reduce the generated partial products, and in the third stage the remaining two rows are added using a fast carry-propagate adder. In recent years, much research has been carried out to reduce the complexity of the multiplier. One novel method reduces the complexity of the multiplier in terms of the number of half adders, and a further improvement, which incorporates one more half adder in the rightmost columns, results in a drastic area reduction. In addition, Booth encoding together with compressors has been used to reduce both area and latency. Furthermore, the conventional half adders and full adders of the second stage have been replaced with XOR-XNOR based 3:2, 4:2 and 5:2 compressors, which increase the speed of operation.
An efficient approach estimates the power of each stage of the reduction tree using a probabilistic gate-level power estimator; the switching power is then reduced by optimizing the transition activity in the partial product tree. The partial products are reordered so as to reduce switching activity, which leads to a reduction in power. A modified full adder using 4:1 multiplexers has been used in the reduction phase to reduce power, and a full adder has also been designed using six 2:1 multiplexers. This work mainly deals with replacing the full adders with different designs and comparing parameters such as area and delay (speed). The basic building blocks of arithmetic circuits in digital signal processing systems are registers, multipliers and adders. Among these, multipliers are the most area-, time- and power-consuming components. Since irregularities in the construction of traditional multipliers result in a large amount of wasted area, there has been much research on multipliers in the past.
NMREC, Dept Of ECE 1
There has been much research, both past and present, on high-speed optimized multipliers. In simple terms, a Wallace tree is an efficient hardware implementation of a digital circuit that multiplies two integers.
Power consumption, delay and area have always been important design considerations for any chip designer. Many DSP structures incorporate multipliers, and the delay of the whole circuit inevitably depends on the delay of the multiplier. Research therefore continues to reduce the delay of the multiplier so that the delay of the entire circuit can be reduced.
An early description of the Wallace tree multiplier has been very influential. The Wallace tree multiplier was developed as a fast and area-efficient multiplier. It consists of ANDing the multiplier and multiplicand bits to generate the partial products. In the second stage, full adders and half adders are used to reduce the generated partial products to two rows, followed by the addition of these two rows using a fast carry-propagate adder in the third stage. Recently, much research has been carried out to reduce the complexity of the multiplier; in one novel method, the complexity of the Wallace tree multiplier is reduced in terms of the number of half adders.
A further improvement to that method incorporates one more half adder in the rightmost columns, resulting in an extreme area reduction. In addition, Booth encoding combined with compressors has been used to reduce area and latency. Moreover, the regular half and full adders in the second stage have been replaced with XOR-XNOR based 3:2, 4:2 and 5:2 compressors, which increase the speed of operation. An efficient approach estimates the power of each stage of the reduction tree using a probabilistic gate-level power estimator, so that the switching power is reduced by optimizing the transition activity in the partial product tree. The partial products are reordered in such a way as to reduce switching activity, which leads to a reduction in power.

Dividing the partial product tree into four groups, and applying the Wallace reduction to one group after another, also achieves power reduction. In another work, a modified full adder using 4:1 multiplexers is used in the reduction stage so that both the short-circuit current and the switching activity are reduced, and therefore the power is also reduced; however, the area increases considerably. This work mainly deals with replacing the full adders in the reduction phase of the Wallace multiplier with modified designs.
In the proposed strategy, modified full adders using multiplexers are used to achieve power reduction compared with existing techniques, with small area and delay improvements. In this work, several types of Wallace multipliers are studied that have reduced complexity, power consumption and latency compared with the traditional Wallace multiplier.
Research efforts in the field of low-power VLSI (very large-scale integration) systems have increased manyfold due to the exponential growth of portable electronic devices such as laptops, audio/video multimedia devices and cellular communication devices. As the number of transistors on a chip rises, the power consumption of VLSI systems also rises, which further adds to run-time failures and reliability problems.
Packaging and cooling mechanisms become more complex and costly with excessive power consumption. Low power consumption is one of the important design criteria for IC designers at all levels of design, along with delay and area. Reliable functionality of the device at the lowest supply voltage is also a vital consideration.

1.2 Objectives
The objectives of this research are the exploration and analysis of techniques for power reduction at the circuit, sub-system and architecture levels.
1. The main aim of this project is to implement a confined Wallace tree multiplier using a multiplexer-based full adder.
2. This type of multiplier is developed to reduce area and power and to increase the speed of the system.
3. Multipliers play a vital role in digital signal processing (DSP) applications such as finite impulse response (FIR) filters and discrete cosine transforms (DCT).
4. In this design the number of single-bit adders is reduced, and the full adders are implemented using multiplexers.
5. The multiplier designs are verified using RTL simulation, and synthesis is done in Xilinx ISE.

CHAPTER-2
2.1 Block diagram
Fig 2.1 Block Diagram
We propose a modification to the multiplier that further reduces its area by reducing the size of the final adder. The multiplier has the same number of stages, and the same rule for the maximum number of rows in a stage, as the other multipliers discussed in this report. The 8-bit multiplier reduction process uses an additional half adder in each stage in order to reduce the size of the final adder. The algorithm scans from the right side and starts the reduction with a half adder when it finds the first column where the number of elements is greater than one. The additional half adders are shown in solid boxes.

Full adder
A full adder adds binary numbers and accounts for values carried in as well as out. A one-bit full adder adds three one-bit numbers, often written as A, B and Cin: A and B are the operands, and Cin is a bit carried in from the previous, less-significant stage. The full adder is usually a component in a cascade of adders that add 8-, 16- or 32-bit (and wider) binary numbers. The circuit produces a two-bit output, the output carry and sum, typically represented by the signals Cout and S, where A + B + Cin = 2Cout + S.
A full adder can be implemented in many different ways, such as with a custom transistor-level circuit or composed of other gates. One example implementation is S = A ⊕ B ⊕ Cin and Cout = (A ⋅ B) + (Cin ⋅ (A ⊕ B)).
In this implementation, the final OR gate before the carry-out output may be replaced by an XOR gate without altering the resulting logic, because A ⋅ B and Cin ⋅ (A ⊕ B) are never 1 at the same time. Using only two types of gates is convenient if the circuit is being implemented using simple integrated circuit chips that contain only one gate type per chip.
A full adder can also be constructed from two half adders by connecting A and B to the inputs of the first half adder, then taking its sum output as one input of the second half adder, with Cin as its other input; finally, the carry outputs of the two half adders are connected to an OR gate.
The sum output of the second half adder is the final sum output of the full adder, and the output of the OR gate is the final carry output. The critical path of a full adder runs through both XOR gates and ends at the sum bit S. Assuming that an XOR gate takes one unit delay D to complete, the delay imposed by the critical path of a full adder is two XOR delays, i.e. 2D.
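The two-half-adder construction can be sketched in the same way. This is a hypothetical behavioural model in Python (half_adder and full_adder_from_half_adders are illustrative names, not the report's modules); it checks that 2·Cout + S = A + B + Cin for all eight input combinations.

```python
def half_adder(a: int, b: int) -> tuple[int, int]:
    """Half adder: returns (sum, carry) = (a xor b, a and b)."""
    return a ^ b, a & b


def full_adder_from_half_adders(a: int, b: int, cin: int) -> tuple[int, int]:
    """Full adder built from two half adders and one OR gate."""
    s1, c1 = half_adder(a, b)      # first half adder: A + B
    s, c2 = half_adder(s1, cin)    # second half adder: (A xor B) + Cin
    return s, c1 | c2              # OR the two half-adder carries


# The sum path passes through one XOR in each half adder: two XOR delays.
for a in (0, 1):
    for b in (0, 1):
        for cin in (0, 1):
            s, cout = full_adder_from_half_adders(a, b, cin)
            assert 2 * cout + s == a + b + cin
```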

Fig 2.2 full adder
Truth table of full adder
Proposed Full Adder

The proposed full adder circuit is designed using the multiplexing method and Boolean identities. After simplification of the Boolean identities, the sum and carry equations are as shown below. A⊕B is implemented in the simplest way using the multiplexer method. The C input and its complement are fed directly to a multiplexer controlled by A⊕B, which generates the sum; its schematic form is shown in Fig 2.3.
According to the carry equation, the A⊕B signal and the C input are combined in a logical AND. This output, together with the AB term, generates the carry output according to the carry equation, and its schematic form is shown in Fig 2.3.

This circuit uses the multiplexing method efficiently to reduce the number of nodes to 12. The proposed full adder circuit eliminates the power guard problem because of the regular arrangement of the transistor input nodes. Thus, the proposed circuit gives lower power dissipation, lower propagation delay and a lower power-delay product (PDP) than widely used existing full adder circuits.
Sum = A ⊕ B ⊕ C
Carry = AB + (A ⊕ B)C
(b) Carry Circuit
Fig 2.3 Proposed Multiplexer based Full Adder

Fig 2.4 Multiplexer based full adder
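As a behavioural illustration of the multiplexer method, the sketch below builds a full adder from 2:1 multiplexers. It assumes one common formulation (A ⊕ B selects between C and its complement for the sum, and between A and C for the carry); the report's own circuit in Fig 2.3 and the six-multiplexer design mentioned in Chapter 1 may be organized differently. The names mux2 and mux_full_adder are hypothetical.

```python
def mux2(sel: int, d0: int, d1: int) -> int:
    """2:1 multiplexer: returns d0 when sel == 0 and d1 when sel == 1."""
    return d1 if sel else d0


def mux_full_adder(a: int, b: int, c: int) -> tuple[int, int]:
    """Full adder built from 2:1 multiplexers plus inverters (1 - x)."""
    x = mux2(a, b, 1 - b)   # A xor B: pass B when A = 0, NOT B when A = 1
    s = mux2(x, c, 1 - c)   # Sum = (A xor B) xor C
    cout = mux2(x, a, c)    # A == B -> carry = A (= AB); A != B -> carry = C
    return s, cout
```

When A = B, the term AB equals A and A ⊕ B = 0, so the carry multiplexer passes A; when A ≠ B, AB = 0 and the multiplexer passes C, which is exactly Carry = AB + (A ⊕ B)C.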
2.2 Schematic diagram
Fig 2.5 Top module

The RTL Viewer allows you to select the portions of the design to display as a schematic. When the schematic is displayed, double-click a symbol to push into the schematic and view the various design elements and connectivity. Right-click the schematic to view the operations that can be performed in the schematic viewer.
Fig 2.6 RTL view using multiplexer

2.3 Flow chart
Start
1. Multiply the two n-bit binary numbers.
2. Generate the partial products.
3. Arrange the partial products in pyramidal form.
4. Divide the partial products into groups of three lines.
5. Check whether a group contains three bits (n = 3). If YES, apply a full adder; the carry generated is passed diagonally to the next column. If NO, apply a half adder.
6. Obtain the final binary output for the given input data.
Stop
Fig 2.7 Flow chart

2.4 Working principle

In the confined multiplier, only full adders are applied during reduction, and the bit pairs that would otherwise go to half adders are passed to the next step until the Wallace tree is reduced, whereas the conventional multiplier uses both half adders and full adders. As a result, the speed is increased and the area is reduced.
The step-by-step process for designing a confined Wallace tree multiplier is as follows:
 First, multiply the given binary inputs bit by bit.
 The partial products are generated.
 The partial products are arranged in a pyramidal structure.
 Divide the structure into groups of rows depending on the number of bits to be multiplied, and use a full adder for each group of 3 bits in a column.
 A group of 2 bits in a column is passed on to the next stage, thereby reducing the number of half adders required.
 Single bits are passed on to the next stage, as in conventional Wallace reduction.
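The difference between the conventional and confined reduction rules can be illustrated with a small bit-level model. This Python sketch is a simplification under stated assumptions: it works column by column with a greedy schedule rather than grouping rows, and all function names are hypothetical, so its adder and stage counts are not expected to match the figures quoted in this report. With use_half_adders=False it applies only full adders and passes two-bit groups on, as in the steps above.

```python
def partial_product_columns(a: int, b: int, n: int) -> list[list[int]]:
    """Column k holds the AND bits a_i & b_j with i + j == k (the pyramid)."""
    cols = [[] for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            cols[i + j].append((a >> i) & (b >> j) & 1)
    return cols


def reduce_columns(cols, use_half_adders):
    """Reduce until every column holds at most two bits.

    use_half_adders=True : conventional rule (FA on 3 bits, HA on 2 leftover bits)
    use_half_adders=False: confined rule (FA on 3 bits, 2 leftover bits passed on)
    Returns (columns, full_adders, half_adders, stages).
    """
    fa = ha = stages = 0
    while max(len(col) for col in cols) > 2:
        nxt = [[] for _ in range(len(cols) + 1)]
        for k, bits in enumerate(cols):
            j = 0
            while len(bits) - j >= 3:                          # one FA per group of three
                x, y, z = bits[j:j + 3]
                nxt[k].append(x ^ y ^ z)                       # sum stays in column k
                nxt[k + 1].append((x & y) | (z & (x ^ y)))     # carry moves one column left
                fa += 1
                j += 3
            rest = bits[j:]
            if use_half_adders and len(rest) == 2:
                nxt[k].append(rest[0] ^ rest[1])
                nxt[k + 1].append(rest[0] & rest[1])
                ha += 1
            else:
                nxt[k].extend(rest)                            # pass 0, 1 or 2 bits unchanged
        cols = nxt
        stages += 1
    return cols, fa, ha, stages


def tree_multiply(a, b, n, use_half_adders):
    cols, fa, ha, stages = reduce_columns(partial_product_columns(a, b, n), use_half_adders)
    # Final carry-propagate addition of the (at most) two remaining rows.
    value = sum(bit << k for k, col in enumerate(cols) for bit in col)
    return value, fa, ha, stages


if __name__ == "__main__":
    for mode, label in ((True, "conventional"), (False, "confined")):
        value, fa, ha, stages = tree_multiply(0xB7, 0x5D, 8, mode)
        print(f"{label}: product={value} FA={fa} HA={ha} stages={stages}")
```

Running the module prints the adder and stage counts of this simplified model for one 8-bit example in both modes; the final integer sum stands in for the carry-propagate adder.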

2.5 Existing system
Fig 2.8 Conventional Wallace tree multiplier
In the conventional strategy, three single bits are passed to a one-bit full adder, which acts as a three-input Wallace tree circuit. Its sum output is supplied to the next-stage full adder of the same bit position, and its carry output is passed to the next-stage full adder of the next higher bit position.
An eight-bit Wallace multiplier basically consists of two components: half adders and full adders. To build the eight-bit multiplier, we require eight half adders and forty-eight full adders, so fifty-six adders in total. The adder modules are therefore instantiated for each computation as required by passing the appropriate parameters. The final result is obtained from the sum and carry bits.

Each dot represents a single-bit partial product. Starting from the rightmost column, full adders are used where three bits are present and half adders where two bits are present.
The sum and carry outputs of each adder at one stage are passed on to the following stage and are considered as inputs to the adders of that stage. Each column has a specific order of magnitude (weight) of the partial products. A sum output at one stage represents a bit in the same column at the following stage.
A carry output at one stage represents a bit in the column to the left, i.e. one order of magnitude higher. The last step of the Wallace multiplier is to add the remaining two rows using a fast adder.
Some of the widely used parallel-prefix adders for high-speed operation are Kogge-Stone, Sklansky and Brent-Kung. These adders use a similar tree topology but differ in terms of logic levels, fan-out and interconnect.
The Wallace multiplier is an efficient parallel multiplier. In the conventional design, the first step is to form the partial product array (of N² bits). In the second step, the rows are collected into groups of three adjacent rows each, and each group of three rows is reduced using full and half adders.
Full adders are used in every column where there are three bits, whereas half adders are used in every column where there are two bits. Any single bit in a column is passed to the next stage in the same column without processing. This reduction method is repeated in each successive stage until only two rows remain. In the last step, the remaining two rows are added using a carry-propagate adder.

An example of the traditional 8-bit by 8-bit Wallace multiplier is shown in Fig 2.8. The three-row groupings are indicated by light lines. Four reduction stages are required to perform the reduction, each with a delay of one full adder.
In the conventional Wallace tree multiplier, the partial products are generated first and are then accumulated in different stages. The partial products are formed by N² AND gates. The following steps are involved in the conventional Wallace reduction method:
 Apply a full adder to each column that contains 3 bits.
 Apply a half adder to each column that contains 2 bits.
 Pass any single-bit column to the next stage without processing.
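The N² AND gates and the pyramid shape of the partial-product array can be seen directly in a few lines of Python. This is an illustrative sketch (partial_products is a hypothetical name) that places each AND output a_i·b_j in the column of weight 2^(i+j).

```python
def partial_products(a: int, b: int, n: int) -> list[list[int]]:
    """Return the n*n AND-gate outputs grouped by column weight 2**(i + j)."""
    columns = [[] for _ in range(2 * n - 1)]
    for i in range(n):          # bit i of the multiplicand a
        for j in range(n):      # bit j of the multiplier b
            columns[i + j].append(((a >> i) & 1) & ((b >> j) & 1))
    return columns


cols = partial_products(0b10110111, 0b01011101, 8)
print("AND gates:", sum(len(c) for c in cols))      # 64
print("column heights:", [len(c) for c in cols])    # [1, 2, 3, 4, 5, 6, 7, 8, 7, 6, 5, 4, 3, 2, 1]
```

The column heights rise to N in the middle column, which is why the tallest column sets the number of reduction stages.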
In the conventional Wallace tree multiplier, both the power consumption and the area utilized are high; to overcome this, the confined Wallace tree multiplier is designed.

2.6 Proposed system

Confined Wallace tree multiplier:
Fig 2.9 Confined Wallace tree multiplier
Waters et al. presented a reduced-complexity Wallace multiplier reduction approach. It modifies the second-stage reduction technique used in conventional Wallace multipliers so that the number of half adders is greatly reduced. In the first stage, the partial product array is formed and rearranged as an inverted pyramid. The inverted pyramid is formed by shifting the bits in the left half of the partial product array upward.
In the second stage, this array is divided into groups of three rows each, and full adders are used in every column. Half adders are used only when the number of reduction stages of the modified Wallace multiplier would otherwise exceed that of the conventional one.

As indicated by condition (1), in the modified multiplier a half adder is required in a reduction stage if (r_i mod 3) equals zero; otherwise, no half adder is needed. The number of half adders was found to be (N - S - 1).
In the modified 9-by-9 Wallace reduction, only a single half adder is used in each of the first and second stages, and two half adders are used in the last stage, as shown in the figure.
In the third stage, a (2N - 2)-bit carry-propagate adder is used. Consequently, the number of reduction stages remains the same as in the conventional Wallace reduction, while two more full adders and seventeen fewer half adders are used in the modified multiplier. The modified and conventional multipliers are compared for sizes from 8 to 64 bits.
Both multipliers yield the same performance in terms of delay and have the same number of reduction stages, but the modified multiplier has the advantage of reduced complexity, as the number of half adders in the second stage is about 80 percent lower than in the conventional multiplier. Because of this reduction in the number of half adders, the total gate count of the modified reduction is always less than that of the conventional reduction. The number of full adders is somewhat increased, by one to five, for 8- to 64-bit modified multipliers.
The confined Wallace multiplier is similar to the conventional Wallace multiplier in that it uses as many full adders as possible, but it differs in that it uses half adders only when necessary. The following steps are involved in the confined Wallace reduction method:
 First, the partial products are formed into a pyramidal structure.
 Divide the structure into groups of rows depending on the number of bits to be multiplied, and use a full adder for each group of 3 bits in a column.
 A group of 2 bits in a column is passed on to the next stage, thereby reducing the number of half adders required.
 Single bits are passed on to the next stage, as in conventional Wallace reduction.
In this project, we have studied several types of Wallace multipliers and compared them with the conventional type. In the reduced-complexity Wallace multiplier, the number of half adders is reduced by about 80 percent, with an increase in the number of full adders. Consequently, the complexity is reduced compared with the conventional multiplier.
The drawbacks of the conventional Wallace tree multiplier are addressed in this confined Wallace tree multiplier system.

CHAPTER-3
3.1 Adders
Nowadays, circuits with low power and high speed are of great importance. An extensive, almost endless, assortment of adder architectures serves different speed/area requirements. This section studies adders whose transistor sizes are optimized to favor the critical path using a number of techniques. Fast and accurate operation of a digital system is greatly influenced by the performance of its resident adders, because of their extensive use in other basic digital operations such as subtraction, multiplication and division. IC engineers are required to improve the performance of existing operation modules in terms of power consumption and size. IC designers face more constraints: high speed, high throughput, small silicon area and, at the same time, low power dissipation. Hence, research into low-power, high-performance adder cells has become intense. One efficient way to accomplish this task is at the structural level. The migration towards deep-submicron technologies has drastically changed the face of low-power design. This approach to designing and analyzing an adder cell decomposes it into smaller modules for further analysis and improvement; an optimized full adder cell can then be constructed by connecting these improved smaller modules.
Hence, improving the performance of the digital adder greatly advances the execution of binary operations inside a circuit composed of such blocks. In this section, several types of adders are reviewed and their performance is studied. Basically, adders are of two types: single-bit and multi-bit.
3.1.1 Types of adders
Adders are classified into two types:
 Single bit adders
 Multi bit adders

Multi bit adders

Multi bit adders are further classified into two types
 Two operand adders
 Multi operand adders
Two operand adders
Two operand adders are classified into following types
 Ripple Carry Adder
 Carry look ahead Adder
 Conditional Sum Adder
 Carry Skip Adder
 Carry Select Adder
 Kogge-Stone Adder
Multi operand adders
Multi operand adders are classified into following types
 Carry Save Adder
 Array Adder
 Wallace Tree Adder
 Balanced Delay Adder
For FPGA implementation, it is desirable to have fewer blocks. Semi-custom design requires an architecture with fewer interconnects. Hence, the selection of the adder architecture is an important phase for low-power applications. The review carried out leads to the following observations:
 Adders can be implemented in different ways according to different requirements.
 Each kind of adder has different properties in area, propagation delay and power consumption.
 No adder has an absolute advantage or disadvantage; usually, one advantage is traded off against another disadvantage.
 A ripple carry adder is easy to implement, and for short bit lengths its performance is good.
 For long bit lengths, a flat carry look-ahead adder is not practical, but a hierarchical structure can improve it considerably.
 A carry select adder has good propagation delay, especially the nonlinear one; however, this comes at the cost of a large area.
 The Kogge-Stone prefix adder has the shortest propagation delay, but it also has the largest area and power consumption.
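To make the ripple carry observation concrete, here is a minimal Python model of an n-bit ripple carry adder built from the full-adder equations of Chapter 2 (to_bits, from_bits and ripple_carry_add are hypothetical helper names). Because each carry depends on the one before it, the delay grows linearly with the bit length, which is why this structure suits short operands.

```python
def to_bits(value: int, width: int) -> list[int]:
    """LSB-first list of the low `width` bits of value."""
    return [(value >> i) & 1 for i in range(width)]


def from_bits(bits: list[int]) -> int:
    """Inverse of to_bits."""
    return sum(bit << i for i, bit in enumerate(bits))


def ripple_carry_add(a_bits: list[int], b_bits: list[int], cin: int = 0) -> tuple[list[int], int]:
    """Chain of full adders: each stage waits for the carry of the stage below it."""
    sum_bits = []
    carry = cin
    for a, b in zip(a_bits, b_bits):
        sum_bits.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))
    return sum_bits, carry


s, cout = ripple_carry_add(to_bits(11, 4), to_bits(6, 4))
print(from_bits(s) + (cout << 4))   # 17
```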
3.2 Multipliers
Multiplication is a less frequent operation than addition but is essential for microprocessors, digital signal processors and graphics engines. The most basic form of multiplication consists of forming the product of two unsigned (positive) numbers. Many different kinds of multipliers have been proposed with very different hardware requirements, throughput and power dissipation. These include serial multipliers, sequential multipliers, array multipliers and tree multipliers.
Serial and sequential multipliers are rarely used in today's high-performance CMOS circuits because of their poor throughput, although they are quite power-efficient. Array multipliers and tree multipliers are two of the most popular kinds of multiplier. The basic principles of array and tree multipliers are:
 Generate the partial products;
 Add all the partial products together through several rows of carry-save adders (CSAs) using, for example, 3-2 (full) adders or 4-2 adders [45], finally obtaining one partial sum and one partial carry;
 Send the partial sum and partial carry to a multi-bit carry-propagate adder to get the final result.
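The three-step principle (partial products, carry-save reduction, final carry-propagate addition) can be sketched at word level. This Python model is an illustration, not a hardware description: carry_save_add treats a whole row of full adders as one bitwise operation, Python's integer addition stands in for the final carry-propagate adder, and both function names are hypothetical.

```python
def carry_save_add(x: int, y: int, z: int) -> tuple[int, int]:
    """Word-level 3:2 compressor: one row of full adders, no carry propagation.

    Returns (partial_sum, partial_carry) with x + y + z == partial_sum + partial_carry.
    """
    partial_sum = x ^ y ^ z
    partial_carry = ((x & y) | (x & z) | (y & z)) << 1   # carries move one column left
    return partial_sum, partial_carry


def csa_multiply(a: int, b: int, n: int) -> int:
    # Step 1: generate the partial products (shifted copies of a).
    rows = [(a << i) if (b >> i) & 1 else 0 for i in range(n)]
    # Step 2: reduce every group of three rows to two until two rows remain.
    while len(rows) > 2:
        reduced = []
        full_groups = len(rows) - len(rows) % 3
        for k in range(0, full_groups, 3):
            reduced.extend(carry_save_add(*rows[k:k + 3]))
        reduced.extend(rows[full_groups:])     # leftover rows (at most two) pass through
        rows = reduced
    # Step 3: one carry-propagate addition of the partial sum and partial carry.
    return sum(rows)
```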

Basically, multipliers are of the following types:
1. Array multiplier
2. Booth multiplier
3. Wallace tree multiplier
4. Combinational multiplier
5. Sequential multiplier
3.2.1 Array multiplier
Array multipliers and tree multipliers are fast but expensive in terms of
hardware and power consumption. Iterative structures allow a trade-off between
performance and hardware requirement. Pipelining is usually used in iterative
systems to improve their performance.
There are a number of techniques that can be used to perform multiplication. In general, the choice is based upon factors such as latency, throughput, area and design complexity.
An obvious approach is to use an (M+1)-bit carry-propagate adder (CPA) to add the first two partial products, then another CPA to add the third partial product to the running sum, and so forth. Such an approach requires N - 1 CPAs and is slow, even if a fast CPA is employed. More efficient parallel approaches use some sort of array or tree of full adders to sum the partial products.
In the early 1950s, multiplier performance was significantly improved with the introduction of the Booth multiplier [40, 120] and the development of faster adders [78, 100] and memory components. Booth's method and the modified Booth's method do not require a correction of the product when either (or both) of the operands is negative for two's complement numbers.
During the 1950s, adder designs moved away from the slow sequential formation of carries executed by ripple carry adders; carry look-ahead, carry select and conditional sum adders yielded speedy sums through the faster simultaneous, or parallel, generation of carries. In the 1960s, two classes of parallel multipliers were defined.

The first class of parallel multipliers uses a rectangular array of identical combinational cells to generate and sum the partial product bits. Multipliers of this type are called array multipliers. Their delay is generally proportional to the word length of the multiplier input.
Due to the regularity of their structures, array multipliers are easy to lay out and have been implemented frequently. The second class of parallel multipliers reduces a matrix of partial product bits to two words through the strategic application of counters or compressors. These two words are then summed using a fast carry-propagate adder to generate the product. This class of parallel multiplier is known as the column compression multiplier. Since its delay is proportional to the logarithm of the multiplier word length, these are also the fastest multipliers.
In the array multiplier, the two basic functions of partial product generation and summation are combined. For unsigned N-by-N multiplication, N² + N - 1 cells are connected to produce a multiplier: N² cells each contain an AND gate for partial product generation and a full adder for summing, and N - 1 cells contain only a full adder. The array generates the N lower product bits directly and uses a carry-propagate adder, in this case a ripple carry adder, to form the upper N bits of the product.
In order to design an array multiplier for two's complement operands, the Booth algorithm can be employed. A Booth's-algorithm array multiplier computes the partial products by examining two multiplier bits at a time. Except for enabling the use of two's complement operands, this array multiplier offers no performance or area advantage compared with the basic array multiplier.
Better delays, though, can be achieved by implementing a higher-radix modified Booth algorithm. Another method for building an array multiplier that handles two's complement operands was presented by Baugh and Wooley.

Fig 3.1 Array Multiplier
3.2.2 Booth multiplier

Booth's multiplication algorithm multiplies two signed binary numbers in two's complement notation. The algorithm was invented by Andrew Donald Booth in 1950 while doing research on crystallography at Birkbeck College in Bloomsbury, London. Booth's algorithm is of interest in the study of computer architecture.
Booth's algorithm examines adjacent pairs of bits of the N-bit multiplier Y in signed two's complement representation, including an implicit bit below the least significant bit, y_(-1) = 0. For each bit y_i, for i running from 0 to N - 1, the bits y_i and y_(i-1) are considered. Where these two bits are equal, the product accumulator P is left unchanged. Where y_i = 0 and y_(i-1) = 1, the multiplicand times 2^i is added to P; and where y_i = 1 and y_(i-1) = 0, the multiplicand times 2^i is subtracted from P. The final value of P is the signed product.
The representations of the multiplicand and product are not specified; typically, these are both also in two's complement representation, like the multiplier, but any number system that supports addition and subtraction will work as well. As stated here, the order of the steps is not determined. Typically, it proceeds from LSB to MSB, starting at i = 0; the multiplication by 2^i is then typically replaced by incremental shifting of the P accumulator to the right between steps; low bits can be shifted out, and subsequent additions and subtractions can then be done just on the highest N bits of P.
There are many variations and optimizations on these details. The algorithm is
often described as converting strings of 1s in the multiplier to a high-order +1 and
a low-order −1 at the ends of the string. When a string runs through the MSB, there
is no high-order +1, and the net effect is interpretation as a negative of the
appropriate value.
Booth's algorithm can be implemented by repeatedly adding (with ordinary unsigned binary addition) one of two predetermined values, A and S, to a product P, then performing a rightward arithmetic shift on P. Let m and r be the multiplicand and multiplier respectively, and let x and y represent the number of bits in m and r.
1. Determine the values of A and S, and the initial value of P. All of these numbers should have a length equal to (x + y + 1).
 A: Fill the most significant (leftmost) bits with the value of m. Fill the remaining (y + 1) bits with zeros.
 S: Fill the most significant bits with the value of (−m) in two's complement notation. Fill the remaining (y + 1) bits with zeros.
 P: Fill the most significant x bits with zeros. To the right of this, append the value of r. Fill the least significant (rightmost) bit with a zero.
2. Determine the two least significant (rightmost) bits of P.
 If they are 01, find the value of P + A. Ignore any overflow.
 If they are 10, find the value of P + S. Ignore any overflow.
 If they are 00, do nothing. Use P directly in the next step.
 If they are 11, do nothing. Use P directly in the next step.
3. Arithmetically shift the value obtained in step 2 by a single place to the right. Let P now equal this new value.
4. Repeat steps 2 and 3 until they have been done y times.
5. Drop the least significant (rightmost) bit from P. This is the product of m and r.
Fig 3.2 Booth multiplier
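The five steps above translate almost line for line into Python. This sketch follows the A/S/P description, with the bit widths x and y chosen by the caller; booth_multiply is a hypothetical name. Like the description, it assumes the multiplicand is not the most negative x-bit value, since its negation would not fit in x bits.

```python
def booth_multiply(m: int, r: int, x: int, y: int) -> int:
    """Radix-2 Booth multiplication using A, S and P registers of x + y + 1 bits.

    m: multiplicand (x-bit two's complement), r: multiplier (y-bit two's complement).
    Assumes m != -2**(x - 1), because -m must fit in x bits.
    """
    width = x + y + 1
    mask = (1 << width) - 1
    a_reg = (m & ((1 << x) - 1)) << (y + 1)      # A: m, then y + 1 zeros
    s_reg = (-m & ((1 << x) - 1)) << (y + 1)     # S: -m, then y + 1 zeros
    p_reg = (r & ((1 << y) - 1)) << 1            # P: x zeros, r, one zero
    for _ in range(y):
        last_two = p_reg & 0b11
        if last_two == 0b01:
            p_reg = (p_reg + a_reg) & mask       # ignore overflow
        elif last_two == 0b10:
            p_reg = (p_reg + s_reg) & mask       # ignore overflow
        sign = p_reg >> (width - 1)              # arithmetic shift right by one place
        p_reg = (p_reg >> 1) | (sign << (width - 1))
    product = p_reg >> 1                         # drop the least significant bit
    if (product >> (x + y - 1)) & 1:             # read as (x + y)-bit two's complement
        product -= 1 << (x + y)
    return product


print(booth_multiply(3, -4, 4, 4))   # -12
```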
3.2.3 Wallace tree multiplier

A Wallace tree is an efficient hardware implementation of a digital circuit
that multiplies two integers, devised by the Australian computer scientist Chris
Wallace in 1964.
The Wallace tree has three steps:

• Multiply (that is, AND) each bit of one of the arguments by each bit
of the other, yielding $n^2$ results for two n-bit operands. Depending on the
position of the multiplied bits, the wires carry different weights; for example,
a wire carrying a product bit $a_i b_j$ with $i + j = 7$ has weight $2^7 = 128$.
• Reduce the number of partial products to two by layers of full
and half adders.
• Group the wires into two numbers, and add them with a conventional adder.
The second step works as follows. As long as there are three or more wires
with the same weight, add a further layer:
• Take any three wires with the same weight and input them into a full
adder. The result will be an output wire of the same weight and an output
wire with a higher weight for each three input wires.
• If there are two wires of the same weight left, input them into a half adder.
• If there is just one wire left, connect it to the next layer.
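To see how these layers shrink the partial-product matrix, the rule above can be modelled on column heights alone. This Python sketch is a simplification under stated assumptions: it tracks only how many wires of each weight exist (not their values), applies the greedy column-by-column rule exactly as listed (full adders on triples, a half adder on a leftover pair, single wires passed through), and the function name wallace_stages is hypothetical.

```python
def wallace_stages(n: int) -> list[list[int]]:
    """Return the column heights of an n x n Wallace reduction, stage by stage.

    Index k of each list is the number of wires of weight 2**k. The first
    entry is the partial-product matrix; reduction stops once every column
    holds at most two wires, ready for the final conventional adder.
    """
    # Column k holds one AND-gate output a_i*b_j for each i + j == k.
    heights = [min(k + 1, 2 * n - 1 - k) for k in range(2 * n - 1)]
    stages = [heights]
    while max(heights) > 2:
        nxt = [0] * (len(heights) + 1)     # one spare column for carries
        for k, h in enumerate(heights):
            full, rest = divmod(h, 3)
            nxt[k] += full                 # full-adder sum outputs stay in column k
            nxt[k + 1] += full             # full-adder carry outputs move to column k + 1
            if rest == 2:                  # two leftover wires: one half adder
                nxt[k] += 1
                nxt[k + 1] += 1
            else:                          # zero or one wire: pass straight through
                nxt[k] += rest
        while nxt[-1] == 0:                # drop the spare column if it stayed unused
            nxt.pop()
        heights = nxt
        stages.append(heights)
    return stages
```

For a 4 x 4 multiplier the heights go from [1, 2, 3, 4, 3, 2, 1] to [1, 1, 2, 3, 2, 2, 2] and then to [1, 1, 1, 2, 2, 2, 2, 1], so two reduction layers leave at most two wires per column for the final adder.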

The benefit of the Wallace tree is that there are only $O(\log n)$ reduction layers,
and each layer has $O(1)$ propagation delay. As making the partial products is $O(1)$
and the final addition is $O(\log n)$, the multiplication is only $O(\log n)$, not much
slower than addition (however, much more expensive in the gate count). Naively
adding partial products with regular adders would require $O(\log^2 n)$ time. From a
complexity-theoretic perspective, the Wallace tree algorithm puts multiplication in
the class NC.
These computations only consider gate delays and do not deal with wire delays,
which can also be very substantial. The Wallace tree can also be represented by a
tree of 3/2 or 4/2 adders. It is sometimes combined with Booth encoding.
3.2.4 Combinational multiplier

Combinational multipliers multiply two unsigned binary numbers. Each bit of
the multiplier is multiplied against the multiplicand, and each product is aligned
according to the position of the multiplier bit that produced it.
A combinational circuit implementing a 4-bit multiplier is shown in Fig 3.3.
Each of the ANDed terms is called a partial product. The resulting product is
formed by accumulating down the columns of partial products, propagating the
carries from the rightmost columns to the left. The multiplier multiplies two 4-bit
numbers. The first level of 16 AND gates computes the individual partial
products. The second- and third-level logic blocks form the accumulation of the
products on a column-by-column basis. The column sums are formed by a mixture
of cascaded half adders and full adders. In the figure, inputs from the top are the
bits to be added and the input from the right is the carry-in. The output from the
bottom is the sum and the output to the left is the carry-out.
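The column-by-column accumulation just described can be checked with a short Python behavioral model of the 4-bit case. It models arithmetic only: the AND-gate level is explicit, but the half-adder and full-adder cascade is abstracted as a column sum with a multi-bit carry, which gives the same result without being a gate-level netlist. The function name array_multiply is hypothetical.

```python
def array_multiply(a: int, b: int, n: int = 4) -> int:
    """Behavioral model of an n x n combinational multiplier for unsigned inputs."""
    # First level: n*n AND gates; pp[i][j] = b_i AND a_j has weight 2**(i + j).
    pp = [[((a >> j) & 1) & ((b >> i) & 1) for j in range(n)] for i in range(n)]
    # Remaining levels: add down each column, passing carries to the left.
    product, carry = 0, 0
    for col in range(2 * n):
        column_sum = carry + sum(pp[i][col - i] for i in range(n) if 0 <= col - i < n)
        product |= (column_sum & 1) << col   # sum bit for this column
        carry = column_sum >> 1              # carries into the next column
    return product
```

Comparing array_multiply(a, b) with a * b for all 256 pairs of 4-bit inputs is a quick sanity check; for example, array_multiply(13, 11) returns 143.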

The combinational multiplier consumes more power and, although it uses an
optimum number of components, its delay is larger. It also requires a larger
number of gates, which increases area; for this reason the combinational
multiplier is less economical. Thus, it is fast compared with a sequential
multiplier, but its hardware complexity is high.
Fig 3.3 Combinational Multiplier Circuit
3.2.5 Sequential multiplier
A sequential multiplier works in a manner similar to manual multiplication of
two decimal numbers, although two binary numbers are multiplied in this case. A
multiplicand $X = [x_{n-1} x_{n-2} \dots x_0]$ is multiplied by each bit $y_j$ of a multiplier
$Y = [y_{n-1} y_{n-2} \dots y_0]$, forming the multiplicand-multiple
$Z = [z_{n-1} z_{n-2} \dots z_0]$, where $z_i = x_i y_j$ for each $i = 0, \dots, n-1$. Then, $Z$ is
shifted left by $j$ bit positions and is added, in all digit positions in parallel, to the
partial product $P_{j-1}$, which has been formed by the previous steps, to generate the
partial product $P_j$.
Repeating this step for $j = 0$ to $n-1$, the product $P = [p_{2n-1} p_{2n-2} \dots p_0]$ of
$2n$ bits is derived. The only difference between this sequential multiplier and manual
multiplication is the repeated addition of each multiplicand-multiple, instead of
one-time addition of all multiplicand-multiples at the end.
This sequential multiplier is realized with a Multiplicand Register of n bits for
storing multiplicand $X$, a Shift Register of 2n bits for storing multiplier $Y$ and
partial product $P_{j-1}$, a Multiplicand-Multiple Generator (denoted as MM
Generator) for generating a multiplicand-multiple $y_j \cdot X$, and a Parallel Adder
of n bits. Initially, $X$ is stored in the Multiplicand Register and $Y$ is stored in the
lower half (i.e., the least significant bit positions) of the Shift Register, while the
upper half of the Shift Register stores 0.
This sequential multiplier performs one iteration step described above in each
clock cycle. In other words, in each clock cycle, a multiplier bit $y_j$ is read from the
right-most position of the Shift Register. A multiplicand-multiple $y_j \cdot X$ is produced
by the Multiplicand-Multiple Generator, which is $X$ or 0 depending on whether $y_j$ is
1 or 0, and is fed to the Parallel Adder. The upper n bits of the partial product are read
from the upper half of the Shift Register and also fed to the Parallel Adder.
Register.
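The description above stops partway through the clock cycle, so this Python sketch fills in one assumption, labelled in the code: as in the usual shift-and-add arrangement, the adder output (including its carry-out) is written back to the upper half of the Shift Register, and the whole register is then shifted right by one place. Under that assumption, after n cycles the 2n-bit Shift Register holds the product P. The function name sequential_multiply is hypothetical, and operands are unsigned.

```python
def sequential_multiply(x: int, y: int, n: int) -> int:
    """Cycle-by-cycle model of the shift-add sequential multiplier (unsigned)."""
    mask = (1 << n) - 1
    multiplicand_reg = x & mask           # Multiplicand Register holds X
    shift_reg = y & mask                  # lower half = Y, upper half = 0
    for _ in range(n):                    # one iteration per clock cycle
        y_j = shift_reg & 1               # right-most bit of the Shift Register
        mm = multiplicand_reg if y_j else 0   # MM Generator output: X or 0
        upper = shift_reg >> n            # upper n bits: partial product P_(j-1)
        total = upper + mm                # n-bit Parallel Adder, carry-out kept
        # ASSUMED write-back: the (n + 1)-bit sum replaces the upper half,
        # then the whole register shifts right by one place.
        shift_reg = ((total << n) | (shift_reg & mask)) >> 1
    return shift_reg                      # 2n-bit product P
```

With n = 4 and X = Y = 15, the register holds 225 after the fourth cycle.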

CHAPTER - 4
4.1 Software
Xilinx ISE (Integrated Synthesis Environment) is a software tool produced by
Xilinx for synthesis and analysis of HDL designs, enabling the developer to
synthesize ("compile") their designs, perform timing analysis, examine RTL
diagrams, simulate a design's reaction to different stimuli, and configure the target
device with the programmer.
Xilinx ISE is a design environment for FPGA products from Xilinx, and is
tightly-coupled to the architecture of such chips, and cannot be used with FPGA
products from other vendors. The Xilinx ISE is primarily used for circuit synthesis
and design, while ISIM or the ModelSim logic simulator is used for system-level
testing. Other components shipped with the Xilinx ISE include the Embedded
Development Kit (EDK), a Software Development Kit (SDK) and ChipScope Pro.
Since 2012, Xilinx ISE has been discontinued in favor of the Vivado Design Suite,
which serves the same roles as ISE with additional features for system-on-a-chip
development. Xilinx released the last version of ISE in October 2013 (version
14.7), and states that "ISE has moved into the sustaining phase of its product life
cycle, and there are no more planned ISE releases."
The ISE® software controls all aspects of the design flow. Through the Project
Navigator interface, you can access all of the design entry and design
implementation tools. You can also access the files and documents associated with
your project.
Project Navigator Interface
By default, the Project Navigator interface is divided into four panel
sub-windows, as seen in Fig 4.1. On the top left are the Start, Design, Files, and
Libraries panels, which include display and access to the source files in the project
as well as access to running processes for the currently selected source.

The Start panel provides quick access to opening projects as well as frequently
accessed reference material, documentation and tutorials. At the bottom of the Project
Navigator are the Console, Errors, and Warnings panels, which display status
messages, errors, and warnings.
To the right is a multi-document interface (MDI) window referred to as the
Workspace. The Workspace enables you to view design reports, text files,
schematics, and simulation waveforms. Each window can be resized, undocked
from Project Navigator, moved to a new location within the main Project
Navigator window, tiled, layered, or closed. You can use the View > Panels menu
commands to open or close panels. You can use the Layout > Load Default
Layout to restore the default window layout. These windows are discussed in more
detail in the following sections.
Fig 4.1 Project Navigator
Design Panel
The Design panel provides access to the View, Hierarchy, and Processes panes.

View Pane
The View pane radio buttons enable you to view the source modules associated
with the Implementation or Simulation Design View in the Hierarchy pane. If you
select Simulation, you must select a simulation phase from the drop-down list.
Hierarchy Pane
The Hierarchy pane displays the project name, the target device, user
documents, and design source files associated with the selected Design View. The
View pane at the top of the Design panel allows you to view only those source files
associated with the selected Design View, such as Implementation or Simulation.
Each file in the Hierarchy pane has an associated icon. The icon indicates the file
type (HDL file, schematic, core, or text file, for example).
Processes Pane
The Processes pane is context sensitive, and it changes based upon the source
type selected in the Sources pane and the top-level source in your project. From the
Processes pane, you can run the functions necessary to define, run, and analyze
your design.
The Processes pane provides access to the following functions:
• Design Summary/Reports: Provides access to design reports, messages,
and summary of results data. Message filtering can also be performed.
• Design Utilities: Provides access to symbol generation, instantiation
templates, viewing command line history, and simulation library
compilation.
• User Constraints: Provides access to editing location and timing constraints.
• Synthesis: Provides access to Check Syntax, Synthesis, View RTL or
Technology Schematic, and synthesis reports. Available processes vary
depending on the synthesis tools you use.
• Implement Design: Provides access to implementation tools and
post-implementation analysis tools.
• Generate Programming File: Provides access to bitstream generation.
• Configure Target Device: Provides access to configuration tools for creating
programming files and programming the device.

Files Panel
The Files panel provides a flat, sortable list of all the source files in the project.
Files can be sorted by any of the columns in the view. Properties for each file can
be viewed and modified by right-clicking on the file and selecting Source
Properties.
Libraries Panel
The Libraries panel enables you to manage HDL libraries and their
associated HDL source files. You can create, view, and edit libraries and their
associated sources.
Console Panel
The Console provides all standard output from processes run from Project
Navigator. It displays errors, warnings, and information messages. Errors are
signified by a red X next to the message, while warnings have a yellow
exclamation mark (!).
Errors Panel
The Errors panel displays only error messages. Other console messages
are filtered out.
Warnings Panel
The Warnings panel displays only warning messages. Other console
messages are filtered out.
Design Summary/Report Viewer
The Design Summary provides a summary of key design data as well as access
to all of the messages and detailed reports from the synthesis and implementation
tools. The summary lists high-level information about your project, including
overview information, a device utilization summary, performance data gathered
from the Place and Route (PAR) report, constraints information, and summary
information from all reports with links to the individual reports. A link to the
System Settings report provides information on environment variables and tool
settings used during the design implementation.

Understanding the ISE Project File

The ISE project file (.xise extension) is an XML file that contains all
source-relevant data for the project as follows:
• ISE software version information
• List of source files contained in the project
• Source settings, including design and process properties
The ISE project file does not contain the following:
• Process status information
• Command history
• Constraints data
The ISE project file:
• Contains all of the necessary source settings and input data for
the project.
• Can be opened in Project Navigator in a read-only state.
• Is only updated or modified if a source-level change is made to the project.
• Can be kept in a directory separate from the generated output directory
(working directory).
4.2 HDL-based design
This section describes the HDL-based design procedure using the design of a
runner’s stopwatch. The design example used in this tutorial demonstrates many
device features, software features, and design flow practices you can apply to
your own design. This design targets a Spartan™-3A device; however, all of the
principles and flows taught are applicable to any Xilinx® device family, unless
otherwise noted. The design is composed of HDL elements and two cores. You can
synthesize the design using Xilinx Synthesis Technology (XST), Synplify Pro, or
Precision software.
This is the first stage of the HDL design flow. After the design is successfully
defined, you will perform behavioral simulation, run implementation with the
Xilinx implementation tools, perform timing simulation, and configure and
download to the Spartan-3A device (XC3S700A) demo board.

VHDL or Verilog
Xilinx ISE supports both VHDL and Verilog designs, and this flow applies to both,
noting differences where applicable. You will need to decide which HDL language
you would like to work through for the tutorial and download the appropriate
files for that language. XST can synthesize a mixed-language design. However,
this tutorial does not cover the mixed-language feature.
Starting the ISE Software
To start the ISE software double-click the ISE Project Navigator icon on your
desktop or select Start > All Programs > Xilinx ISE Design Suite > ISE Design
Tools > Project Navigator.
Creating a New Project
To create a new project using the New Project Wizard, do the following:
1. From Project Navigator, select File > New Project. The New Project Wizard appears.
2. In the Location field, browse to c:\xilinx_tutorial or to the directory in
which you installed the project.
3. In the Name field, enter wtut_vhd or wtut_ver.
Fig 4.2 New Project Wizard-Create new project page

Fig 4.3 New project wizard-Device properties page
4. Verify that HDL is selected as the Top-Level Source Type, and click Next.
5. Select the following values in the New Project Wizard Device Properties
page:
• Product Category: All
• Family: Spartan3A and Spartan3AN
• Device: XC3S700A
• Package: FG484
• Speed: -4
• Synthesis Tool: XST (VHDL/Verilog)
• Simulator: ISim (VHDL/Verilog)
• Preferred Language: VHDL or Verilog depending on preference.
This will determine the default language for all processes that generate HDL
files. Other properties can be left at their default values. Click Next, then Finish to
complete the project creation.

Adding Source Files
HDL files must be added to the project before they can be synthesized. You
will add five source files to the project as follows:
1. Select Project > Add Source.
2. Select the following files (.vhd files for VHDL design entry or .v files
for Verilog design entry) from the project directory, and click Open.
• clk_div_262k
• lcd_control
• statmach
• stopwatch
• time_cnt
3. In the Adding Source Files dialog box, verify that the files are
associated with All, that the associated library is work, and click OK.
Using the New Source Wizard and ISE Text Editor
In this section, you create a file using the New Source wizard, specifying the
name and ports of the component. The resulting HDL file is then modified in the
ISE Text Editor. To create the source file, do the following:
1. Select Project > New Source. The New Source Wizard opens in
which you specify the type of source you want to create.
2. In the Select Source Type page, select VHDL Module or Verilog
Module.
3. In the File Name field, enter debounce.
4. Click Next.
5. In the Define Module page, enter two input ports named sig_in and clk
and an output port named sig_out for the debounce component.
6. Click Next to view a description of the module.
Fig 4.4 Select Source type page

Synthesizing the Design
So far you have been using Xilinx Synthesis Technology (XST) for syntax
checking. Next, you will synthesize the design using either XST, Synplify/Synplify
Pro, or Precision software. The synthesis tool uses the design’s HDL code and
generates a supported netlist type (EDIF or NGC) for the Xilinx implementation
tools. The synthesis tool performs the following general steps (although all
synthesis tools further break down these general steps) to create the netlist:
• Analyze/Check Syntax: Checks the syntax of the source code.
• Compile: Translates and optimizes the HDL code into a set of
components that the synthesis tool can recognize.
• Map: Translates the components from the compile stage into the target
technology’s primitive components.
The synthesis tool can be changed at any time during the design flow. To change
the synthesis tool, do the following:
1. In the Hierarchy pane of the Project Navigator Design panel, select the
targeted part.
2. Right-click and select Design Properties.
In the Design Properties dialog box, click the Synthesis Tool value and use the
pull-down arrow to select the desired synthesis tool from the list.
Fig 4.5 Specifying Synthesis tool
Using the RTL/Technology Viewer

XST can generate a schematic representation of the HDL code that you have
entered. A schematic view of the code helps you analyze your design by displaying
a graphical connection between the various components that XST has inferred.
Following are the two forms of schematic representation:
• RTL View: Pre-optimization view of the HDL code.
• Technology View: Post-synthesis view of the HDL design mapped to the
target technology.
To view a schematic representation of your HDL code, do the following:
1. In the Processes pane, expand Synthesize, and double-click View RTL
Schematic or View Technology Schematic.
If the Set RTL/Technology Viewer Startup Mode dialog appears, select Start
with the Explorer Wizard.
2. In the Create RTL Schematic start page, select the clk_divider and
lap_load_debounce components from the Available Elements list, and then click
the Add button to move the selected items to the Selected Elements list.
3. Click Create Schematic.
Fig 4.6 Create RTL Schematic Start Page

The RTL Viewer allows you to select the portions of the design to display as
schematic. When the schematic is displayed, double-click on the symbol to push
into the schematic and view the various design elements and connectivity. Right-
click the schematic to view the various operations that can be performed in the
schematic viewer.
Fig 4.7 Sample of RTL schematic
Examining Synthesis Results

To view overall synthesis results, double-click View Synthesis Report
under the Synthesize process. The report consists of the following sections:
• Compiler Report
• Mapper Report
• Timing Report
• Resource Utilization
Compiler Report
The compiler report lists each HDL file that was compiled, names which file is
the top level, and displays the syntax checking result for each file that was
compiled. The report also lists FSM extractions, inferred memory, and warnings.

Mapper Report
The mapper report lists the constraint files used, the target technology, and
attributes set in the design. The report lists the mapping results of the flattened
instances that were created, and how FSMs were coded.
Timing Report
The timing report section provides detailed information on the constraints that
you entered and on delays on parts of the design that had no constraints. The delay
values are based on wireload models and are considered preliminary. Consult the
post-Place and Route timing reports produced during design implementation for the
most accurate delay information.
Fig 4.8 Timing Report

CHAPTER – 5
5.1 Result and analysis
The design was verified by RTL simulation in ModelSim 6.3c, and synthesis was
done in Xilinx ISE. Fig 5.1 shows the simulation results for various multiplier and
multiplicand inputs given to the confined Wallace multiplier.
Simulation report
Fig 5.1 simulation report

Comparison results between Conventional Wallace multiplier
and Confined Wallace multiplier:

Method                                  Area (sq mm)   LUT   Delay (ns)
Conventional Wallace tree multiplier    87             165   18.531
Confined Wallace tree multiplier        81             119   7.835
The above table shows the comparison results between the conventional Wallace
multiplier and the confined Wallace multiplier using multiplexer-based full adders.
It shows that the number of slices and LUTs is reduced in the confined Wallace
multiplier compared with the conventional Wallace multiplier. It also shows that the
speed of the proposed system is increased compared with the existing system.
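The relative improvements implied by the table can be worked out directly. This short Python snippet only reuses the table's figures (no new measurements); the variable and function names are illustrative.

```python
# Figures taken from the comparison table (area column in the table's own unit).
conventional = {"area": 87, "lut": 165, "delay_ns": 18.531}
confined = {"area": 81, "lut": 119, "delay_ns": 7.835}


def reduction_pct(old: float, new: float) -> float:
    """Percentage reduction from old to new, rounded to one decimal place."""
    return round(100 * (old - new) / old, 1)


summary = {key: reduction_pct(conventional[key], confined[key]) for key in conventional}
speedup = round(conventional["delay_ns"] / confined["delay_ns"], 2)
print(summary, f"speed-up x{speedup}")
```

This gives reductions of about 6.9 % in the area column, 27.9 % in LUTs and 57.7 % in delay, which corresponds to a speed-up of roughly 2.37 times using the 7.835 ns table value.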
Summary Report

Fig 5.2 Summary Report
This summary report section provides detailed information on the constraints
that we entered and on delays on parts of the design that had no constraints. The
delay values are based on wire load models and LUTs. The above summary report
provides the number of LUTs used in designing the confined Wallace tree multiplier
using multiplexer-based full adders, i.e., 119, which is fewer than for the
conventional Wallace tree multiplier. It also provides the total delay, i.e., 7.815 ns,
which is also less than that of the conventional Wallace tree multiplier, i.e., 18.531 ns.
5.2 Applications
• The confined multiplier plays a vital role in digital signal processing
(DSP) applications such as finite impulse response (FIR) filters and discrete
cosine transforms (DCT).
• In this method the number of single-bit adders is reduced, and the adders are
replaced by multiplexers, so that the LUTs (lookup tables) of the FPGA
(field-programmable gate array) are utilized fully while occupying a low number
of slices.
• It occupies less area.
• High-speed performance can be achieved by using the confined multiplier.
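As an illustration of how a full adder can be built from multiplexers alone, the Python sketch below checks one possible arrangement against the standard full-adder equations S = A ⊕ B ⊕ Cin and Cout = (A · B) + (Cin · (A ⊕ B)). It uses three 2:1 multiplexers plus inverters; it is not claimed to be the multiplexer-based full adder proposed in this report, and the names mux2 and mux_full_adder are hypothetical.

```python
def mux2(sel: int, d0: int, d1: int) -> int:
    """2:1 multiplexer: returns d1 when sel is 1, otherwise d0."""
    return d1 if sel else d0


def mux_full_adder(a: int, b: int, cin: int) -> tuple[int, int]:
    """One illustrative multiplexer-based full adder (not the report's circuit)."""
    p = mux2(a, b, 1 - b)        # p = A XOR B: select B or NOT B using A
    s = mux2(p, cin, 1 - cin)    # S = p XOR Cin
    cout = mux2(p, a, cin)       # if A == B the carry is A (= A AND B); else Cin
    return s, cout
```

The carry multiplexer works because when A ⊕ B = 0 the two inputs are equal, so A alone equals A · B; when A ⊕ B = 1 the carry simply follows Cin.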

5.3 Disadvantages
Difficulties are faced for higher-order multiplication.
5.4 Advantages
• Area required for the system is reduced.
• Power consumed by the system is less.
• Speed of the system is improved.
• Complexity is less.

CHAPTER-6
6.1 Conclusion
The confined Wallace tree multiplier using multiplexer-based full adders has been
designed. The tool used for simulation is ModelSim 6.3c, and the tool used for
synthesis is Xilinx ISE. The code is written in Verilog HDL. Reducing the number of
half adders in a Wallace tree multiplier increases speed. Designing multiplexer-based
full adders decreases the area consumed by the hardware, and as the hardware is
reduced, power consumption is also minimized. The implementation results obtained
for the proposed design are better than those of the existing design. This architecture
is therefore suitable for high-performance, area-efficient Digital Signal Processing
(DSP) applications such as Finite Impulse Response (FIR) filters and Discrete Cosine
Transforms (DCT).

CHAPTER-7
7.1 Future scope
The proposed confined Wallace tree multiplier performs well in total power
dissipation but shows poor static power dissipation, which needs further attention.
The proposed multipliers show better area reduction and power reduction. The
proposed parallel counter demonstrates significant delay reduction. The low-area,
high-speed confined Wallace tree multiplier using multiplexer-based full adders
could be implemented in a processor to verify its suitability for high-speed
portable applications.

REFERENCES
1. Bickerstaff, K. A. C., M. Schulte, and E. E. Swartzlander, Jr., “Reduced Area
Multiplier,” Intl. Conf. on Application-Specific Array Processors, pp. 478-489, 1993.
2. Bickerstaff, K. C., and E. E. Swartzlander, Jr., “Analysis of Column …,”
… Computer Arithmetic, pp. 33-39, 2001.
3. Rajaram, S., and K. Vanithamani, “Improvement of Wallace multiplier using
parallel prefix adders,” Proceedings of the 2011 International Conference on
Signal Processing, Communication, Computing and Networking Technologies
(ICSCCN 2011).
4. Meier, P. C. H., R. A. Rutenbar, and L. R. Carley, “Exploring Multiplier
Architecture and Layout for Low Power,” in IEEE Custom Integrated Circuits
Conf., pp. 513-516, 1996.
5. Mahalakshimi, R., and T. Sasilatha, “A power efficient carry save adder and
modified carry save adder using CMOS technology,” IEEE International
Conference on Computational Intelligence and Computing Research, 2013.
6. Meier, P. C. H., R. A. Rutenbar, and L. R. Carley, “Exploring Multiplier
Architecture and Layout for Low Power,” in IEEE Custom Integrated Circuits
Conf., pp. 513-516, 1996.
7. Dadda, L., “Some Schemes for Parallel Multipliers,” Alta Frequenza, vol. 34,
pp. 349-356, 1965.

NMREC, Dept Of ECE 2
