AN EFFICIENT HIGH SPEED WALLACE TREE
MULTIPLIER
N. Sureka1, Ms. R. Porselvi2, Ms. K. Kumuthapriya3
PG Student1, Assistant Professor2, Sr. Assistant Professor3
Tagore Engineering College, Chennai.
sureka.ns05@gmail.com
Abstract—Power dissipation of integrated circuits is a major concern for VLSI circuit designers. A Wallace tree multiplier is an improved version of the tree-based multiplier architecture. It uses the carry save addition algorithm to reduce the latency. This paper aims at further reduction of the latency and power consumption of the Wallace tree multiplier. This is accomplished by the use of 4:2 and 5:2 compressors and a proposed carry select adder. The results show that the proposed Wallace tree multiplier reduces latency by 44.4% compared with the conventional Wallace tree multiplier, along with an 11% reduction in power consumption. The simulations have been carried out using the Modelsim and Xilinx tools.

Keywords- Wallace tree, Carry select adder, compressors, adder, multiplier

I. INTRODUCTION
A multiplier is one of the key hardware blocks in most digital and high performance systems such as FIR filters, digital signal processors and microprocessors. With advances in technology, many researchers have tried to design multipliers that offer one of the following: high speed, low power consumption, small area, or a combination of them, thus making them suitable for various high speed, low power, and compact VLSI implementations. However, area and speed are two conflicting constraints, so improving speed generally results in larger area. The most efficient multiplier structure will vary depending on the throughput requirement of the application. The first step of the design process is the selection of the optimum circuit structure. There are various structures to perform the multiplication operation, ranging from simple serial multipliers to complex parallel multipliers. Any speed improvement in the multiplier will improve the operating frequency of the digital signal processor, or can be traded for energy by optimizing circuit sizes and the supply voltage. The new architecture enhances the speed performance of the widely acknowledged Wallace tree multiplier. The structural optimization is performed on the conventional Wallace multiplier in such a way that the latency of the total circuit reduces considerably. The Wallace tree basically multiplies two unsigned integers. The conventional Wallace tree multiplier architecture [1] [2] comprises an AND array for computing the partial products, a carry save adder for adding the partial products so obtained, and a carry propagate adder in the final stage of addition. In the proposed architecture, partial product reduction is accomplished by the use of 4:2 and 5:2 compressor structures, and the final stage of addition is performed by a proposed carry select adder.

II. COMPRESSOR FOR PARTIAL PRODUCT REDUCTION
The multiplier architecture comprises a partial product generation stage, a partial product reduction stage and the final addition stage. The latency of the Wallace tree multiplier can be reduced by decreasing the number of adders in the partial product reduction stage. In the proposed architecture, multi-bit compressors are used to reduce the number of partial product addition stages. The combination of low power, low transistor count and minimum delay makes the 5:2 and 4:2 compressors the appropriate choice. In these compressors, the outputs generated at each stage are efficiently used by replacing the XOR blocks with multiplexer blocks [3]. The select bits to the multiplexers are available much ahead of the inputs, so that the critical path delay is minimized. The various adder structures in the conventional architecture are replaced by compressors.

Figure 1. A 4:2 Compressor

The use of two full adders would introduce a delay of 4, whereas the use of a 4:2 compressor reduces the latency to 3. Two full adders are replaced by a single 4:2 compressor. The equations governing the outputs of the 4:2 compressor architecture are shown below.
$$\mathrm{SUM} = \left[(X_1 \oplus X_2)\cdot\overline{(X_3 \oplus X_4)} + \overline{(X_1 \oplus X_2)}\cdot(X_3 \oplus X_4)\right]\cdot\overline{\mathrm{CIN}} + \left[\overline{(X_1 \oplus X_2)}\cdot\overline{(X_3 \oplus X_4)} + (X_1 \oplus X_2)\cdot(X_3 \oplus X_4)\right]\cdot\mathrm{CIN}$$
$$\mathrm{COUT} = (X_1 \oplus X_2)\cdot X_3 + \overline{(X_1 \oplus X_2)}\cdot X_1$$
$$\mathrm{CARRY} = (X_1 \oplus X_2 \oplus X_3 \oplus X_4)\cdot\mathrm{CIN} + \overline{(X_1 \oplus X_2 \oplus X_3 \oplus X_4)}\cdot X_4$$
The SUM expression is equivalent to $X_1 \oplus X_2 \oplus X_3 \oplus X_4 \oplus \mathrm{CIN}$.

In the conventional structure, three full adders are used for the computation of sum and carry, with a latency of 6. The use of a 5:2 compressor, on the other hand, reduces the latency to 4. In the modified structure, a 5:2 compressor replaces three full adders.

Figure 2. A 5:2 Compressor

The logic equations for the 5:2 compressor can be written as
$$\mathrm{SUM} = X_1 \oplus X_2 \oplus X_3 \oplus X_4 \oplus X_5 \oplus \mathrm{CIN} \oplus \mathrm{CIN2}$$
$$\mathrm{COUT1} = (X_1 + X_2)\cdot X_3 + X_1\cdot X_2$$
$$\mathrm{COUT2} = (X_4 \oplus X_5)\cdot\mathrm{CIN} + \overline{(X_4 \oplus X_5)}\cdot X_4$$
$$\mathrm{CARRY} = \left((X_1 \oplus X_2 \oplus X_3) \oplus (X_4 \oplus X_5 \oplus \mathrm{CIN})\right)\cdot\mathrm{CIN2} + \overline{\left((X_1 \oplus X_2 \oplus X_3) \oplus (X_4 \oplus X_5 \oplus \mathrm{CIN})\right)}\cdot(X_1 \oplus X_2 \oplus X_3)$$

III. PROPOSED CARRY SELECT ADDER
The Boolean expressions depicting the 4-bit basic block are listed below:
B1 = ~A0;
B2 = A1^A0;
C1 = A1*A0;
C2 = A2*A3;
C0 = C1*C2;
C3 = C1*A2;
B3 = A2^C1;
B4 = A3^C3;
It can be seen that the carry out (C0) of the block is calculated in parallel with B3 by using a parallel chain of AND gates, whereas a ripple carry adder (RCA) structure propagates the carry serially. This reduces the incrementing delay in the carry select adder (CSA) compared with the conventional RCA. Figure 3 shows the proposed 16-bit carry select adder, which divides the word size of the adder equally into blocks of 4 bits each.
The least significant 4 bits are added using a conventional RCA, while the other blocks are added in parallel along with the given incrementer. Once all the interim sums and carries are calculated, the final sums are computed using multiplexers having minimal delay. The multiplexer block receives the two sets of 5-bit inputs (four sum bits and one carry bit each) and selects the final sum based on the select input from the previous stage. Use of the basic unit with the 10-to-5 multiplexer thus achieves fast incrementing action with reduced device count. Thus, the proposed CSA outperforms the conventional CSA circuit in terms of speed by reducing the carry propagation latency.

Figure 3. Proposed 16-bit Carry Select Adder structure

IV. CONVENTIONAL AND PROPOSED WALLACE TREE MULTIPLIERS
A. Conventional Wallace tree multiplier
In the conventional 8-bit Wallace tree multiplier design, a larger number of addition operations is required. Using the carry save adder, three partial product terms can be added at a time to form the carry and sum. The sum signal is used by the full adder of the next level. The carry signal is used by the adder involved in the generation of the next output bit, with a resulting overall delay proportional to $\log_{3/2} n$ for n rows.
A multiplier consists of various stages of full adders, and each higher stage adds to the total delay of the system. In the first and second stages of the Wallace structure, the partial products do not depend on any values other than the inputs obtained from the AND array. However, for the immediately higher stages, the final value (PP3) depends on the Cout value of the previous stage. This operation is repeated for further stages. Hence, the major cause of delay is the propagation of the carry out from the previous stages to the next stage. In the conventional Wallace tree structure, the total number of stages in the critical path sums up to 13. Each full adder accounts for a latency of 2. Therefore the total latency of the given structure is 26. The latency count increases by one when the AND array is considered, thus giving a total latency of 27.

B. Proposed novel architecture
Our proposed architecture aims to reduce the overall latency, which leads to increased speed and reduced power consumption. In our architecture, we make use of compressors in place of full adders, and the final carry propagate stage is replaced by a proposed carry select adder.
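Since the final addition stage relies on the carry select adder of Section III, its 4-bit basic block can be checked with a short sketch. The code below transcribes the listed Boolean expressions into Python and uses the block inside a 16-bit adder. The function names (`increment_block`, `carry_select_add16`) are hypothetical, and the ripple carry adders are modelled with ordinary integer addition, which is an assumption made for brevity rather than the authors' gate-level design.

```python
import random


def increment_block(a0, a1, a2, a3):
    """4-bit basic block: returns (b1, b2, b3, b4, c0) for input bits a0..a3."""
    b1 = a0 ^ 1      # B1 = ~A0
    b2 = a1 ^ a0     # B2 = A1 ^ A0
    c1 = a1 & a0     # C1 = A1 * A0
    c2 = a2 & a3     # C2 = A2 * A3
    c0 = c1 & c2     # C0 = C1 * C2, carry out from a parallel AND chain
    c3 = c1 & a2     # C3 = C1 * A2
    b3 = a2 ^ c1     # B3 = A2 ^ C1
    b4 = a3 ^ c3     # B4 = A3 ^ C3
    return b1, b2, b3, b4, c0


def block_adds_one():
    """Check that the block outputs A + 1 for every 4-bit input A."""
    for a in range(16):
        bits = [(a >> i) & 1 for i in range(4)]
        b1, b2, b3, b4, c0 = increment_block(*bits)
        value = b1 | (b2 << 1) | (b3 << 2) | (b4 << 3) | (c0 << 4)
        if value != a + 1:
            return False
    return True


def carry_select_add16(x, y):
    """16-bit addition with four 4-bit blocks; returns a 17-bit result."""
    mask = 0xF
    # Least significant block: plain ripple carry adder (modelled with +).
    low = (x & mask) + (y & mask)
    result = low & mask
    carry = low >> 4
    for blk in range(1, 4):
        shift = 4 * blk
        # Interim sum for carry-in 0: four sum bits plus one carry bit.
        s0 = ((x >> shift) & mask) + ((y >> shift) & mask)
        bits = [(s0 >> i) & 1 for i in range(4)]
        b1, b2, b3, b4, c0 = increment_block(*bits)
        # Interim sum for carry-in 1, produced by the incrementer.
        s1 = b1 | (b2 << 1) | (b3 << 2) | (b4 << 3) | ((c0 | (s0 >> 4)) << 4)
        # 10-to-5 multiplexer: the previous block's carry selects one set.
        chosen = s1 if carry else s0
        result |= (chosen & mask) << shift
        carry = chosen >> 4
    return result | (carry << 16)


def adder_matches_reference(trials=1000, seed=0):
    """Compare the sketch against plain integer addition on random inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        x, y = rng.randrange(1 << 16), rng.randrange(1 << 16)
        if carry_select_add16(x, y) != x + y:
            return False
    return True
```

Calling `block_adds_one()` confirms that the block behaves as a 4-bit incrementer, which is what allows the carry-in-1 result to be derived from the carry-in-0 result without a second adder.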
The first stage consists of a full adder. In the second stage,
two full adders have been grouped and implemented using a
4:2 compressor. Similarly, the third stage consists of a 5:2
compressor, which is a combination of 3 full adders and so on.
In this manner, the individual full adder blocks in the original
structure are grouped and implemented using compressors.
The number of interconnections is taken care of, since they
play a vital role in the flow of carry from one stage to the next
in the tree.
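The grouping of full adders into compressors can be checked with a short sketch. It models the 4:2 and 5:2 compressors of Section II with multiplexers and compares them against cascades of two and three full adders over every input combination. The helper names (`mux`, `full_adder`, `compressor_4_2`, `compressor_5_2`) and the particular cascade wiring are illustrative assumptions, not the authors' netlist.

```python
from itertools import product


def mux(select, if_one, if_zero):
    """2-to-1 multiplexer on single bits."""
    return if_one if select else if_zero


def full_adder(a, b, c):
    """Returns (sum, carry) of three input bits."""
    return a ^ b ^ c, (a & b) | (b & c) | (a & c)


def compressor_4_2(x1, x2, x3, x4, cin):
    """Returns (SUM, COUT, CARRY) following the 4:2 compressor equations."""
    p = x1 ^ x2
    q = x3 ^ x4
    total = mux(cin, (p ^ q) ^ 1, p ^ q)   # equals x1^x2^x3^x4^cin
    cout = mux(p, x3, x1)                  # majority of x1, x2, x3
    carry = mux(p ^ q, cin, x4)
    return total, cout, carry


def compressor_5_2(x1, x2, x3, x4, x5, cin, cin2):
    """Returns (SUM, COUT1, COUT2, CARRY) following the 5:2 equations."""
    s1 = x1 ^ x2 ^ x3
    s2 = x4 ^ x5 ^ cin
    total = s1 ^ s2 ^ cin2
    cout1 = ((x1 | x2) & x3) | (x1 & x2)
    cout2 = mux(x4 ^ x5, cin, x4)
    carry = mux(s1 ^ s2, cin2, s1)
    return total, cout1, cout2, carry


def matches_full_adder_cascades():
    """Compare both compressors with full adder cascades on all inputs."""
    for x1, x2, x3, x4, cin in product((0, 1), repeat=5):
        s_a, c_a = full_adder(x1, x2, x3)
        s_b, c_b = full_adder(s_a, x4, cin)
        if compressor_4_2(x1, x2, x3, x4, cin) != (s_b, c_a, c_b):
            return False
    for x1, x2, x3, x4, x5, cin, cin2 in product((0, 1), repeat=7):
        s_a, c_a = full_adder(x1, x2, x3)
        s_b, c_b = full_adder(x4, x5, cin)
        s_c, c_c = full_adder(s_a, s_b, cin2)
        expected = (s_c, c_a, c_b, c_c)
        if compressor_5_2(x1, x2, x3, x4, x5, cin, cin2) != expected:
            return False
    return True
```

The check only establishes functional equivalence; the latency advantage claimed for the compressors comes from the multiplexer-based circuit structure, which this bit-level model does not capture.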
The longest delay path is the one consisting of two 5:2 compressors, which gives a latency of 8 (four per compressor). The proposed carry select adder contributes a further latency of 6, and the AND array has a latency of 1. This novel structure therefore brings the overall latency count down to 15, a significant reduction of 44.4% compared to the conventional structure.
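The latency figures quoted in this section can be reproduced with a few lines of arithmetic. The sketch below only restates the counts given in the text (13 full adder stages at 2 units each, two 5:2 compressors at 4 units each, 6 units for the carry select adder, and 1 unit for the AND array); the variable names are illustrative.

```python
AND_ARRAY = 1  # latency of the AND array, common to both designs

# Conventional: 13 full adder stages in the critical path, 2 units each.
conventional = 13 * 2 + AND_ARRAY

# Proposed: two 5:2 compressors (4 units each) plus the carry select adder (6).
proposed = 2 * 4 + 6 + AND_ARRAY

reduction_percent = 100 * (conventional - proposed) / conventional

print(conventional, proposed, round(reduction_percent, 1))  # 27 15 44.4
```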
V. RESULTS & DISCUSSIONS

Figure 4 shows the simulation result for the conventional Wallace tree multiplier.

Figure 4. Simulation result for Conventional Wallace tree multiplier

Figure 5 shows the simulation result for the proposed Wallace tree multiplier.

Figure 5. Simulation result for Proposed Wallace tree multiplier

In this section the proposed and the conventional architectures have been compared. The transistor count comparison and the latency comparison are shown in Table I. The latency defines the total number of phases required to compute the output and is found to be 44.4% less than the latency of the conventional Wallace tree multiplier.

TABLE I. NUMBER OF TRANSISTORS (N) AND LATENCY (L) COMPARISON

Circuit Structure     Wallace Tree Multiplier
                      N        L
Conventional          2998     27
Proposed              2748     15

The performance comparison of the conventional and proposed Wallace tree multipliers is depicted in the graphs below. Figure 6 shows the area comparison of the conventional and proposed Wallace tree multipliers, and Figure 7 shows their latency comparison.
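For reference, the transistor counts in Table I can be expressed as a relative saving, just as the text does for latency. The snippet below is a minimal sketch that uses only the two values from the table; the names are illustrative.

```python
# Transistor counts (N) taken from Table I.
transistors = {"conventional": 2998, "proposed": 2748}

saved = transistors["conventional"] - transistors["proposed"]
saving_percent = 100 * saved / transistors["conventional"]

print(saved, round(saving_percent, 1))  # 250 8.3
```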
Figure 6. Area comparison of Conventional and Proposed Wallace tree multiplier

Figure 7. Latency comparison of Conventional and Proposed Wallace tree multiplier

VI. CONCLUSIONS
In this paper, the implementation and analysis of a novel Wallace tree architecture is proposed. The latency of the existing Wallace tree multiplier, which is found to be 27, has been reduced to 15. The comparison also shows that a significant reduction of latency and area is achieved. The results obtained indicate that the proposed architecture is more efficient than the conventional one in terms of area consumption and latency.

REFERENCES
[1] Abdellatif I,E. Mohamed, ‘Low-Power Digital VLSI Design, Circuits and Systems’, Kluwer Academic Publishers, Pp: 428-450.
[2] Abdellatif I,E. Mohamed, ‘Low-Power Digital VLSI Design, Circuits and Systems’, Kluwer Academic Publishers, Pp: 428-450.
[3] Neil H. Weste and Kamran Eshraghian, ‘Principles of CMOS VLSI design-A Systems Perspective’, Pearson Edition Pvt Ltd. 3rd edition, Pp: 345-356.
[4] Blodtti. A. and Saletti. R (2004), ‘Ultralow-power adiabatic circuit semi-custom design’, Very Large Scale Integration (VLSI) Systems, Vol. 12, No. 11, Pp: 1248-1253.
[5] Sreehari Veeramachaneni, Kirthi M. Krishna, Lingamneni Avinash, Sreekanth Reddy Puppala and M. B. Srinivas (2007), ‘Novel Architectures for High-Speed and Low-Power 3-2, 4:2 and 5:2 Compressors’, 20th International Conference on VLSI Design, Pp: 324-329.
[6] S. F. Hsiao, M. R. Jiang, and J. S. Yeh, ‘Design of high speed low-power 3:2 counter and 4:2 compressor for fast multipliers’, Electron. Lett., vol. 34, no. 4, Pp. 341–343, 1998.
[7] K. Prasad and K. K. Parhi, ‘Low-power 4:2 and 5:2 compressors’, in Proc. of the 35th Asilomar Conf. on Signals, Systems and Computers, vol. 1, 2001, Pp. 129–133.
[8] Massimo Alioto and Gaetano (2002), ‘Analysis and Comparison on Full Adder Block in Submicron Technology’, IEEE Transaction Very Large Scale Integration (VLSI) Systems, Vol 10, No. 6, Pp: 806–823.
[10] Anantha P. Chandrakasan, Samuel Sheng and Robert W. Brodersen (1992), ‘Low-Power CMOS Digital Design’, IEEE Journal of Solid State Circuits, Vol. 27, No. 4.
