2016 International Conference on Advances in Electrical, Electronic and System Engineering, 14-16 Nov 2016, Putrajaya, Malaysia

FPGA Based performance analysis of multiplier policies for FIR filter
Aneela Pathan
PhD Scholar
1. Institute of Information and Communication Technology, Mehran University of Engineering and Technology, Jamshoro, Pakistan
2. Department of Electronic Engineering, Quaid-e-Awam University College of Engineering, Science and Technology (QUCEST), Larkana, Pakistan
pathan_aneela@quest.edu.pk

Tayab D Memon
Department of Electronic Engineering, Mehran University of Engineering and Technology, Jamshoro, Pakistan
tayabuddin.memon@faculty.muet.edu.pk

Sharmeen Keerio
Department of Electronic Engineering, Mehran University of Engineering and Technology, Jamshoro, Pakistan
keerios@yahoo.com

Imtiaz Hussain Kalwar
Department of Electronic Engineering, Mehran University of Engineering and Technology, Jamshoro, Pakistan
imtiaz.hussain@faculty.muet.edu.pk

Abstract: In this work, a comparative analysis of Booth and Wallace Tree multiplier architectures is presented using small commercial Altera FPGA devices. The comparison is made with respect to resources consumed and maximum frequency achieved for different multiplier bit-widths. The synthesis results show a tradeoff: the Wallace Tree multiplier offers better performance at the cost of more chip area. This is useful as a guide for choosing a suitable VLSI architecture for a given application.

Keywords: Booth Algorithm; Wallace Tree; Multiplier Policies; FPGA.

I. INTRODUCTION
Multiplication is inherent in the hardware implementation of any algorithm and plays a central role in most DSP algorithms, including digital filtering, convolution, correlation, transformations (FFT, DCT), Euclidean distance, signal and image processing, and many others [1, 2]. Therefore, the computational performance of a DSP system is measured by its multiplication performance [3]. Speed is considered the key performance parameter of any circuit, and it depends on two things: the clock speed of the circuit and the number of cycles required to perform a task.

The traditional multiplier works sequentially. Multiplication involves three main steps: partial product generation, partial product reduction, and addition. These steps take considerable hardware to implement. For example, in a traditional array multiplier, consider two binary numbers A and B of M and N bits. There are MN summands, produced in parallel by a set of MN AND gates. An NxN multiplier requires N(N-2) full adders, N half-adders and N^2 AND gates.

In any type of digital system design, the objective is to design compact circuits in order to reduce area and, consequently, path delay, resulting in less power consumption. Hence compactness is a big challenge in the traditional multiplier, as it requires a large number of resources. Much work has been reported on optimizing the available DSP algorithms, especially the multiply-add operation (MAC) [1, 3-5].

In [6], the area-performance impact of alternative encoding techniques on a single-bit ternary FIR-like filter (SBTFF) is evaluated in FPGA [7]. In this work, we focus on two other important algorithms that are considered to be more efficient in resource utilization with respect to performance (i.e., maximum clock frequency).

Specifically, we focus on the tradeoff between Booth and Wallace Tree multiplier architectures in a multi-bit FIR filter on small commercial FPGA devices provided by Altera. The two algorithms are compared with respect to resources consumed and maximum frequency offered at different multiplier bit-widths, i.e., 6, 8, and 10.

The remainder of this paper proceeds as follows. Section II discusses the architectures of both algorithms in detail, followed by the direct form FIR filter in Section III and its FPGA implementation using both multipliers in Section IV. Section V analyzes the synthesis results, and Section VI concludes.

II. BOOTH AND WALLACE TREE ALGORITHMS
In this section, the two well-known architectures of the Booth and Wallace Tree multipliers are described. Their details can be seen in [6-8].

A. Booth Multiplier
Booth's algorithm is a multiplication algorithm that uses the two's complement notation of signed binary numbers [1]. It is an efficient way of multiplying signed numbers and relies on the ability to both add and subtract. The steps involved in Booth's algorithm are given below:

Step 1

978-1-5090-2889-4/16/$31.00 ©2016 IEEE


a. Decide the multiplier and the multiplicand.
b. Initialize the remaining registers to '0'.
c. Initialize the count register with the number of multiplicand bits.
d. To determine the specific arithmetic action, use the current LSB (least significant bit) and the previous LSB. For example:
• Multiplicand = 7 → 0111 → M
• Multiplier = 3 → 0011 → Q
• Register A = 0 → 0000 → A
• Register Q-1 = 0 → 0 → Q-1
• Register Count = 4 → 0100 → Count

Step 2
a. Possible arithmetic actions (current LSB, previous LSB):
• 00 → no arithmetic operation
• 11 → no arithmetic operation
• 01 → add multiplicand to left half of product
• 10 → subtract multiplicand from left half of product

Step 3
a. Perform an arithmetic shift right (ASR) on the entire product.

Step 4
a. Decrement the count register. If the count register is not '0', continue the multiplication from Step 2.
If the count register is '0', END the algorithm.

B. WALLACE TREE MULTIPLIER
The Wallace tree multiplier is an enhanced adaptation of the tree-based multiplier architecture [1]. It is an efficient hardware implementation of a digital circuit that multiplies two integers, and it is known for its short computation time. The Wallace tree multiplier is a parallel multiplier that uses the carry save addition (CSA) algorithm, through which propagation delay can be reduced [8].

The carry save addition used to reduce latency is shown in Fig. 2. A carry save adder is a digital adder used in computer architecture to calculate the sum of three or more inputs; it is fast and cheap. When adding multiple operands, a carry save adder produces two outputs: a sum vector and a carry vector.

Fig. 2. Wallace tree architecture using carry save [1]
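The two schemes of this section can be sketched in software as follows. This is a minimal illustration, not the synthesized design: the function names are hypothetical, the A register is held as a signed Python integer (which behaves like a register with a guard bit), and the Wallace sketch assumes unsigned operands and models each carry save adder as a bitwise 3:2 compressor on whole rows.

```python
def booth_multiply(multiplicand: int, multiplier: int, n: int) -> int:
    """Radix-2 Booth multiplication of two n-bit two's complement integers."""
    mask = (1 << n) - 1
    m = multiplicand            # M register (signed value)
    a = 0                       # A register, left half of the product
    q = multiplier & mask       # Q register, right half of the product
    q_1 = 0                     # Q-1, the previous LSB
    count = n                   # count register
    while count != 0:
        pair = (q & 1, q_1)     # (current LSB, previous LSB)
        if pair == (0, 1):
            a += m              # 01: add multiplicand
        elif pair == (1, 0):
            a -= m              # 10: subtract multiplicand
        # Arithmetic shift right of the combined A, Q, Q-1 register.
        q_1 = q & 1
        q = (q >> 1) | ((a & 1) << (n - 1))
        a >>= 1                 # Python's >> keeps the sign of a
        count -= 1
    return (a << n) + q


def carry_save_add(x: int, y: int, z: int) -> tuple[int, int]:
    """3:2 compressor: returns (sum, carry) with x + y + z == sum + carry."""
    s = x ^ y ^ z
    carry = ((x & y) | (x & z) | (y & z)) << 1
    return s, carry


def wallace_multiply(x: int, y: int, n: int) -> int:
    """Unsigned n-bit multiply: partial products, CSA tree, one final add."""
    # One shifted partial product per multiplier bit (the AND-gate array).
    rows = [(x << i) if (y >> i) & 1 else 0 for i in range(n)]
    while len(rows) > 2:
        reduced = []
        for i in range(0, len(rows) - 2, 3):
            reduced.extend(carry_save_add(rows[i], rows[i + 1], rows[i + 2]))
        reduced.extend(rows[len(rows) - len(rows) % 3:])  # leftover rows
        rows = reduced
    return sum(rows)            # final carry-propagate addition
```

With the example values above, `booth_multiply(7, 3, 4)` returns 21, and `wallace_multiply(7, 3, 4)` gives the same product after two levels of carry save reduction.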
[Flowchart: Start; A ← 0, Q-1 ← 0, M ← Multiplicand, Q ← Multiplier, Count ← n; test Q0, Q-1: if 10 then A ← A-M, if 01 then A ← A+M, if 00 or 11 no operation; arithmetic shift right of A, Q, Q-1; Count ← Count-1; if Count = 0 then END, otherwise repeat.]

Fig. 1. Flow chart of Booth's algorithm

III. FIR FILTER
The basic operation of a digital filter is to take a sequence of input numbers and compute a different sequence of output numbers. As the name implies, an FIR filter consists of a finite number of sample values, reducing the convolution sum to a finite sum per output sample instant. The output of an FIR filter of order or length L, to an input time-series x[n], is given by a finite version of the convolution sum given in (1), namely:

\[ y[n] = x[n] * f[n] = \sum_{k=0}^{L-1} f[k]\, x[n-k] \qquad (1) \]

where f[0] ≠ 0 through f[L-1] ≠ 0 are the filter's L coefficients [9].

Fig. 3. Direct form for FIR filter structure
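As an illustration of the convolution sum in (1), the following minimal sketch evaluates a direct form FIR filter with a tapped delay line. The function name `fir_direct_form` and the use of a Python deque for the delay line are assumptions made for this example; samples before the start of the input are taken as zero.

```python
from collections import deque


def fir_direct_form(coeffs, samples):
    """Compute y[n] = sum_{k=0}^{L-1} f[k] * x[n-k] for each input sample."""
    length = len(coeffs)
    # Tapped delay line: position k holds x[n-k]; initially all zeros.
    delay_line = deque([0] * length, maxlen=length)
    outputs = []
    for x in samples:
        delay_line.appendleft(x)   # newest sample enters, oldest drops out
        # One multiplier per tap, then the adder chain.
        outputs.append(sum(f * d for f, d in zip(coeffs, delay_line)))
    return outputs
```

Feeding a unit impulse, for example `fir_direct_form([1, 2, 3], [1, 0, 0, 0])`, returns the coefficients followed by zero, `[1, 2, 3, 0]`, which is the impulse response expected from (1). Each tap's product `f * d` is the operation that the Booth or Wallace Tree multiplier performs in hardware.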

In this work, we have used the direct form FIR filter structure for comparison of the two algorithms, as shown in Fig. 3. It consists of a tapped delay line, adders and multipliers.

IV. FPGA BASED IMPLEMENTATION
In this work, the two algorithms are implemented on small commercial FPGA devices provided by Altera. We have chosen two device families, Cyclone and Stratix. Specifically, the area-performance of Cyclone-III and Stratix-III is evaluated for the two algorithms.

A. Booth and Wallace Tree Algorithm Synthesis in FPGA
First, synthesis is done for the two algorithms on their own, followed by incorporating them in an FIR filter as a typical DSP application.

B. FIR Filter Synthesis in FPGA
The direct form structure of the FIR filter shown in Fig. 3 is synthesized on Cyclone-III and Stratix-III for area-performance analysis using both the Booth and Wallace multiplier techniques.

V. RESULTS AND DISCUSSION
Table 1 and Table 2 give the area-performance results of the Booth and Wallace tree algorithms in terms of look-up tables (LUTs) for varying bit-widths (i.e., size of the multiplier and multiplicand) of six, eight and ten.

In these results, total combinational functions indicate the ratio of the number of combinational functions used in the design to the total available number of combinational functions. Dedicated logic registers (Reg) indicate the ratio of dedicated logic registers used in the design to the available dedicated logic registers. Maximum clock frequency (FMAX) indicates the maximum clock speed the design can attain without violating the clock setup time or the clock hold time. Clock setup time is the minimum time for which data must be stable before the active clock edge. Clock hold time is the minimum time for which data must remain stable after the active clock edge.

TABLE 1: Area-performance results for Booth Multiplier

DEVICE        Bit Width   LUTs   Reg   IO Pins   FMAX (MHz)
Cyclone-III   6           119    23    28        75.11
Cyclone-III   8           151    27    36        84.16
Cyclone-III   10          181    31    44        85.22
Stratix-III   6           88     23    28        194.55
Stratix-III   8           109    27    36        189.9
Stratix-III   10          126    31    44        149.21

In Table 1, the area-performance analysis of the Booth multiplier on Cyclone-III shows that when the number of bits is increased from 6 to 8 or 10, the LUTs, registers and IO pins increase, but the maximum frequency achieved also increases. Hence this device is suitable for a large number of bits. The Booth multiplier on Stratix-III gives the best performance in terms of maximum frequency, in comparison to the Cyclone-III FPGA device, for similar resource utilization. So Stratix-III is the better suited device when performance is the core consideration.

TABLE 2: Area-performance results for Wallace Tree

DEVICE        Bit Width   LUTs   Reg   IO Pins   FMAX (MHz)
Cyclone-III   6           93     125   25        552.79
Cyclone-III   8           161    166   33        436.49
Cyclone-III   10          205    208   41        373.27
Stratix-III   6           113    125   25        865.05
Stratix-III   8           145    166   33        661.81
Stratix-III   10          185    208   41        600.24

Table 2 shows that Stratix-III can be a good choice for implementing a high-frequency Wallace Tree multiplier, in comparison to the Cyclone-III device. As with the Booth multiplier, the maximum frequency achieved here is much greater for a similar amount of resources utilized.

In short, we can conclude that, from a performance perspective, the Wallace tree offers much better results than Booth, at the cost of higher register usage, even in pipelined form.

[Chart: "Performance Evaluation of FIR filter with Booth and Wallace Tree Multiplier in Direct Form"; y-axis FMAX (MHz), 0 to 150; x-axis 6-bit, 8-bit, 10-bit; series Wallace Cyclone, Booth Cyclone, Wallace Stratix, Booth Stratix.]

Fig. 4. Performance analysis of FIR filter with Booth and Wallace Tree

Fig. 4 presents the performance analysis of the FIR filter with Booth and Wallace tree multipliers. It can be observed that the maximum performance (FMAX) on Cyclone and Stratix is consistent for the Wallace tree multiplier, whereas for the Booth multiplier FMAX decreases as the number of bits (i.e., multiplier bit-width) increases in both families (Cyclone and
Stratix). It is evident that the Wallace tree offers much higher performance than Booth in both families.

[Chart: "Chip Area analysis in Direct Form"; y-axis LUTs, 0 to 30000; x-axis 6-bit, 8-bit, 10-bit; series Booth Multiplier, Wallace Tree.]

Fig. 5. Chip area analysis of multi-bit FIR filter using Booth and Wallace Tree

Fig. 5 presents the chip area consumed by the FIR filter using Booth and Wallace tree multipliers. Here, we can see that at lower bit-widths the Booth multiplier occupies less chip area (i.e., number of LUTs) than the Wallace tree multiplier, but its area grows with the number of bits and crosses that of the Wallace tree at the maximum bit-width used in the synthesis.

VI. CONCLUSION
In this paper, two multiplier algorithms, Booth and Wallace Tree, and an FIR filter using these algorithms are synthesized in Cyclone III and Stratix III, and the two approaches are compared on the basis of area-performance.

Synthesis results show that the Booth multiplier offers poorer performance than the Wallace tree but occupies comparatively less area. Similarly, in the FIR filter design the Booth multiplier occupies less chip area than the Wallace tree multiplier, but the performance of the Wallace tree is much better than that of Booth.

This tradeoff indicates that in applications where area is an important criterion the Booth multiplier should be preferred over the Wallace tree multiplier, and where performance is needed the Wallace tree should be preferred.

Future work is to compare the single-bit ternary FIR filter and the multi-bit filter (presented here) with Booth and Wallace tree multiplier policies in terms of area-performance-power analysis.

REFERENCES
[1] G. K. Ma and F. J. Taylor, "Multiplier policies for digital signal processing," IEEE ASSP Magazine, vol. 7, pp. 6-20, 1990.
[2] S. Ghanekar, et al., "Design and architecture of multiplier-free FIR filters using periodically time-varying ternary coefficients," IEEE Transactions on Circuits and Systems I: Fundamental Theory and Applications, vol. 40, pp. 364-370, 1993.
[3] A. C. Thompson, "Techniques in Single-Bit Digital Filtering," RMIT University, 2004.
[4] T. D. Memon, et al., "Power-Area-Performance Characteristics of FPGA based sigma-delta modulated FIR Filters," Journal of Signal Processing Systems (JSPS), vol. 70, pp. 275-288, 2013.
[5] A. Z. Sadik and Z. M. Hussain, "Short Word-Length LMS Filtering," in ISSPA 2007, Sharjah, UAE, 2007, pp. 1-4.
[6] T. D. Memon and P. Beckett, "The impact of alternative encoding techniques on field programmable gate array implementation of sigma-delta modulated ternary finite impulse response filters," Australian Journal of Electrical and Electronics Engineering, vol. 10, pp. 107-116, 2013.
[7] T. Memon, et al., "Sigma-Delta Modulation Based Digital Filter Design Techniques in FPGA," ISRN Electronics, vol. 2012, 2012.
[8] K. N. Macpherson and R. W. Stewart, "Area Efficient FIR filters for high speed FPGA implementation," IEE Proc.-Vis. Image Signal Process., vol. 153, pp. 711-720, Dec. 2006.
[9] R. Mehboob, et al., "FIR Filter Design Methodology for Hardware Optimized Implementation," IEEE Transactions on Consumer Electronics, vol. 55, pp. 1669-1673, Jul. 2009.
