CHAPTER 2
2 LITERATURE SURVEY
This chapter reviews work on the algorithms and methods used to design fast
parallel FIR filters. Multiple constant multiplication methods that reduce the
logical depth and the number of logical elements of constant multipliers are
summarized. Low-power, area-efficient carry select adders (CSLA) are discussed
briefly. Reconfigurable FIR filters realized on Application Specific Integrated
Circuits (ASIC) and Field Programmable Gate Arrays (FPGA) are surveyed, and the
unified operators used in FIR filter design are described. The chapter closes
with a summary of the literature review.
2.1 SURVEY ON PARALLEL FIR FILTERS
The area and power consumption of a parallel filter depend mainly on the
multipliers and adders used in the design. Multipliers are the most
power-hungry and area-consuming datapath elements in FIR filters. Early filter
designs used two-operand multipliers such as the Booth, Wallace tree and Dadda
multipliers, which consume considerable area and power in higher-order filters.
Because the coefficients of an FIR filter are constant in most applications,
researchers began working on constant multipliers with less area and lower
power. In a constant multiplier one operand is fixed and the other varies, so
the multiplier can be implemented with shift and add operations, known as
multiplierless multiplication. Power-efficient multipliers are designed by
representing the coefficients in Canonical Signed Digit (CSD) form, a Redundant
Binary Representation (RBR) over the digits -1, 0 and +1 in which no two
adjacent digits are non-zero. Because the CSD form of a number generally has
fewer non-zero digits than its binary form, it reduces the number of addition
operations in the product generator.

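The effect of CSD recoding on the number of non-zero digits can be illustrated
with a minimal sketch. The function names and the example coefficient are
hypothetical and chosen only for illustration; the recoding shown is the
standard non-adjacent form conversion, not a procedure taken from the cited
works.

```python
def to_csd(n):
    """Return the CSD digits of a non-negative integer, least significant first.

    Each digit is -1, 0 or +1 and no two adjacent digits are non-zero.
    """
    digits = []
    while n != 0:
        if n & 1:
            # n mod 4 == 1 gives digit +1, n mod 4 == 3 gives digit -1
            d = 2 - (n & 3)
            n -= d
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits


def nonzero_count(digits):
    """Number of non-zero digits, i.e. shifted terms to add or subtract."""
    return sum(1 for d in digits if d != 0)


coeff = 119                          # hypothetical coefficient, binary 1110111
csd = to_csd(coeff)                  # digits of 128 - 8 - 1
binary_ones = bin(coeff).count("1")  # non-zero bits in plain binary
print(csd[::-1], nonzero_count(csd), binary_ones)
```

For this coefficient the binary form has six non-zero bits while the CSD form
has three, so a shift-add multiplier needs two adders/subtractors instead of
five.
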
Block processing is used in digital filters to increase parallel computation
and to reduce power consumption. In block processing, the serial input sequence
is converted into parallel form by a serial-to-parallel converter of block size
L. Each block of L samples is processed in parallel by L block subfilters to
increase the sampling rate. Parallel FIR filters offer a high sampling rate and
low power consumption at the cost of an area overhead caused by the replication
of processing blocks. Over the last three decades, many research articles have
addressed the design of fast parallel FIR filters with less area, low power
consumption and reduced complexity for hardware implementation on ASICs and
FPGAs. Jose (1989) proposed a fast computational structure that trades off
hardware complexity against throughput for L-path and L-block filter
implementations on DSP processors; it uses fast algorithms for short linear
convolutions, applied iteratively, to increase speed. Ing-Song & Mitra (1996)
suggested block digital filtering for FIR filters to increase parallel
computation and to reduce computational complexity. Fast short-length linear
convolution and DFT-based fast FIR filtering algorithms were derived to reduce
the computational complexity and to generate parallel filter structures.

An area-efficient parallel FIR filter was proposed by David & Keshab (1997).
The area of the block filter is reduced by a sub-structure sharing technique
and the Maximum Absolute Difference (MAD) quantization process. In most
applications, parallel FIR filters are needed either for high sampling rates or
for low power consumption at moderate sampling rates. The statistical
properties of the input signal are used to reduce the number of arithmetic
units such as adders and multipliers, and the frequency spectrum
characteristics are used to reduce the hardware cost by choosing suitable Fast
FIR Algorithm (FFA) structures. Jin-Gyun & Keshab (2002) proposed the
Look-ahead Maximum Absolute Difference (LMAD) quantization algorithm and
FFA-based structures to reduce the complexity of parallel digital filters.

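To make the idea behind FFA structures concrete, the following is a minimal
sketch of the standard 2-parallel fast FIR algorithm, checked against direct
convolution. It is a software model only, with hypothetical function names, and
it assumes even-length coefficient and input sequences; it is not the structure
of any specific cited design.

```python
def conv(a, b):
    """Direct linear convolution of two sequences."""
    out = [0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def ffa2(h, x):
    """2-parallel FFA: three half-length subfilters instead of four.

    Assumes len(h) and len(x) are even.
    """
    h0, h1 = h[0::2], h[1::2]          # polyphase components of the filter
    x0, x1 = x[0::2], x[1::2]          # polyphase components of the input
    p0 = conv(h0, x0)                  # H0 * X0
    p1 = conv(h1, x1)                  # H1 * X1
    ps = conv([a + b for a, b in zip(h0, h1)],
              [a + b for a, b in zip(x0, x1)])   # (H0 + H1) * (X0 + X1)
    # Odd output phase: H0*X1 + H1*X0 recovered by post-subtraction
    y1 = [s - a - b for s, a, b in zip(ps, p0, p1)]
    # Even output phase: H0*X0 plus H1*X1 delayed by one block
    y0 = [0] * (len(p0) + 1)
    for i, v in enumerate(p0):
        y0[i] += v
    for i, v in enumerate(p1):
        y0[i + 1] += v
    # Interleave the two phases back into one output stream
    y = []
    for k in range(len(y0)):
        y.append(y0[k])
        y.append(y1[k] if k < len(y1) else 0)
    return y[:len(h) + len(x) - 1]


h = [1, 2, 3, 4]                       # hypothetical 4-tap filter
x = [1, 0, -1, 2, 5, 3]                # hypothetical input block
print(ffa2(h, x) == conv(h, x))
```

For an N-tap filter, the three half-length subfilters need 3N/2 multiplications
per pair of outputs, compared with 2N for the direct 2-parallel structure, at
the cost of extra pre- and post-processing additions.
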
The iterated short convolution (ISC) algorithm, based on a mixed-radix
algorithm, was derived by Chao & Keshab (2004) to design fast parallel FIR
filters. Parallel filters with long block sizes are designed by cascading
parallel filters of shorter length. The Cook–Toom and Winograd fast convolution
algorithms can be used to derive convolutions of any length, but they are not
suitable for iterated convolution in hardware implementations. The hardware
cost of a parallel FIR filter depends on the number of multiplications,
additions/subtractions and delay elements used in the design. ISC-based fast
parallel FIR filters use fewer adders, multipliers and delay elements than
FFA-based parallel filters. The subfilters in this design use the transposed
direct-form FIR structure. For a 576-tap 6-level parallel FIR filter, the
ISC-based structure reduces multiplications by 17%, delay elements by 17% and
additions by 3%.

Chao & Keshab (2007) proposed a computation-sharing subfilter method to reduce
the number of multiplications. Linear convolution can be used for subfilter
sharing, but the resulting structure may be irregular in some cases. The method
uses the subfilters of identical length and structure in the parallel filter to
build a second-stage parallel FIR filter through process sharing. The
second-stage parallel FIR filters reduce the number of additions and
multiplications at the additional cost of delay elements, and their regular
structure suits hardware implementation.

Mou & Duhamel (1988) proposed software and hardware implementations of a
parallel fast FIR filtering algorithm using a divide-and-conquer approach. The
method retains the multiply-accumulate structure for implementing FIR filters
on DSP and general-purpose processors, and the fast FIR algorithm is derived
from a polynomial product algorithm. Samueli (1989) proposed a multiplierless
FIR filter that uses CSD code to represent the coefficients as sums or
differences of powers of two. Ing-Song & Mitra (1996) analysed various parallel
FIR filter structures for the implementation trade-off between throughput and
hardware complexity. The Polyphase, Direct Form, Transposed Direct Form, Linear
Phase Direct Form, Bit Plane and Block Processing structures were analysed, and
the transposed direct form was found to offer the best trade-off between speed
and area. Various recoding techniques and the Ripple Carry Adder, Carry Save
Adder (CSA), Carry-Lookahead Adder, Carry-Select and Carry-Skip techniques were
also analysed, with the conclusion that CSD recoding and the CSA are best
suited to FIR filter design with fewer multipliers and less propagation delay.
It was also noted that the CSA consumes more area than the other adder
techniques. Parhi (1999) gave a brief description of Very Large Scale
Integration (VLSI) digital signal processing algorithms, number systems, block
processing techniques and constant multipliers at various levels.

Yu-Chi & Ken (2012a) proposed a new parallel FIR filter based on fast FIR
algorithms that halves the number of multipliers by exploiting the symmetry of
the coefficients in the subfilter blocks. In this symmetric parallel fast FIR
filter, the number of adders in the pre- and post-processing blocks increases
and is fixed according to the length of the parallel FIR filter. With these
symmetric convolution techniques, the area and power consumption of parallel
FIR filters are reduced considerably. Both even-length and odd-length symmetric
parallel fast FIR filters are designed with low power and area consumption. All
of this literature concentrates on algorithmic-level strength reduction to
reduce the number of multipliers in the parallel FIR filter structure.
Arithmetic strength reduction is used to reduce the area, power and delay of
parallel FIR filters at the component level.
2.2 RESOURCE MINIMIZATION OF MULTIPLE CONSTANT MULTIPLIERS
The design of low-cost, power-efficient digital filters has received
considerable attention in portable communication systems. Multipliers are the
most power- and area-consuming datapath elements in parallel FIR filters, and
extensive research has been carried out to replace the expensive multiplier
with low-power, area-efficient adders and shifters. Andrew & Malcolm (1995)
presented the CSD-based n-dimensional Reduced Adder Graph (RAG-n) algorithm,
which reduces the number of adders and multipliers compared with the existing
Bull and Horrocks Algorithm (BHA). The RAG-n algorithm uses a look-up table
method. Miodrag et al. (1996) proposed an iterative pairwise matching algorithm
that applies Common Subexpression Elimination (CSE) to Multiple Constant
Multiplication (MCM) to reduce the number of adders, subtractors and shifters.

Sankarayya et al. (1997) proposed the Differential Coefficients Method (DCM),
in which stored precomputed results are used instead of computing the full
convolution. Energy dissipation is reduced and speed is increased at the cost
of more storage. The complexity of an FIR filter is measured by the number of
basic operations, namely additions and/or subtractions and shifts. A shifter
costs less hardware than an adder, so the complexity of a multiplier can be
reduced by reducing the number of adders.

The Nonrecursive Signed Common Subexpression Elimination (NR-SCSE) algorithm,
proposed by Marcos et al. (2002), reduces the occupied area and logic depth of
multipliers. NR-SCSE searches for the nonrecursive signed common subexpressions
present in the CSD array. Dongning et al. (2002) proposed a Signed Power-of-Two
(SPT) representation of the coefficients for the multiplierless implementation
of FIR filters. Fei et al. (2007) proposed an algorithm that maximizes adder
sharing in the filter coefficient representation using Common
Signed-Power-of-Two (CSPT) terms.

Pasko (1999) proposed CSE to reduce the area of multipliers. In this method,
the coefficients are represented in CSD form to reduce the number of nonzero
bits in the shift-add expansion, and CSE is then applied to identify and
eliminate common patterns, which reduces the adder cost. The MCM method using
Minimum Spanning Trees (MSTs), proposed by Oscar et al. (2004), works in two
steps: it first calculates the MST for a graph and then calculates the
differences to pass to the next level.

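The adder saving obtained by eliminating a common pattern can be sketched with
two hypothetical coefficients, 21 (10101) and 85 (1010101), which share the
pattern 101. The function names are assumptions made for illustration, and the
example shows the general CSE idea rather than the search procedure of any
cited algorithm.

```python
def multiply_separately(x):
    """Shift-add products without sharing: five adders in total."""
    y21 = (x << 4) + (x << 2) + x                # 10101   -> 2 adders
    y85 = (x << 6) + (x << 4) + (x << 2) + x     # 1010101 -> 3 adders
    return y21, y85


def multiply_with_cse(x):
    """Same products with the common pattern 101 computed once: three adders."""
    s = (x << 2) + x          # shared subexpression 5x (adder 1)
    y21 = (s << 2) + x        # 21x = 4 * 5x + x        (adder 2)
    y85 = (s << 4) + s        # 85x = 16 * 5x + 5x      (adder 3)
    return y21, y85


print(multiply_with_cse(3), multiply_separately(3))
```

Both functions return the same products; the shared version uses three adders
where the unshared version uses five.
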
Yongtao & Kaushik (2005) describe the Computation Sharing Differential
Coefficient (CSDC) method as a combination of the augmented differential
coefficient approach and Subexpression Sharing (SS). The filter coefficients
used in the augmented differential coefficient approach expand the design
space, which is represented by a complete undirected graph. The SS present in
the design space is identified by a genetic-based heuristic search algorithm.
This method achieves a 70% reduction in adders in multiplierless FIR filter
implementations. In DSP algorithms, energy efficiency and high processing speed
are achieved by direct-mapped, fully parallel architectures.

The Contention Resolution Algorithm (CRA), proposed by Fei et al. (2005),
addresses the local minima problem in multiplier design. CRA uses CSD
coefficients to reduce the number of logical operators (LO) and the logical
depth (LD). The logical depth reduction minimizes the propagation delay along
the adder chain of shift-add multipliers. The dynamic power consumption of the
shift-add multipliers can be reduced by using 2's complement representation in
CRA.

Channelizers are the most computationally intensive modules in a wideband
receiver. The FIR filter used in the channelizer determines the adder
complexity of the design. Youngbeom & Sejung (2002) proposed a Linear Phase
Finite Impulse Response (LPFIR) filter design using Vertical Common
Subexpressions (VCS), which reduces the adders to 48%. The method proposed by
Richard (1996) was the Horizontal Common Subexpression (HCS) method. The adders
used in the channel filter are reduced by Nonzero-bit Super Subexpressions
(NSS). Horizontal Super Subexpression Elimination (HSSE) is used to reduce the
number of adders, and Vertical Super Subexpression Elimination (VSSE) is used
to reduce the number of full adders within the adders. The 3- and
4-nonzero-bit HSSE algorithms and the VSSE algorithm were derived from the HCSE
and VCSE methods by Vinod et al. (2005).

Kenny et al. (2006) proposed a hybrid of the RSAG-n and RAG-n algorithms, known
as the n-dimensional Reduced Add and Shift Graph algorithm, to reduce both the
shifters and the adders used in multiplierless multiplication. Yevgen & Markus
(2007) proposed a heuristic algorithm that uses 20% fewer additions and
subtractions than the BHA, BHM and RAG-n algorithms. MCM can be implemented
using different approaches. In general, these approaches identify the common
subexpressions present in the multiple constants and eliminate them to reduce
the logical depth and the number of logical operators in the multiplierless
design.

In most cases, CSD is used to represent a single constant and the Minimal
Signed Digit (MSD) representation is used for multiple constants (In-Cheol &
Hyeong-Ju 2002). MSD offers multiple representations of a number, each with the
same number of nonzero digits as CSD. MSD results in fewer hardware units than
CSD and gives the best result when a suitable MSD representation is chosen for
each coefficient in the FIR filter design. MSD uses 7% less hardware than CSD
in FIR filters. Malcolm & Andrew (2005) presented a kth-coefficient MSD
algorithm known as KMSD. The adder costs of RAG-n, BHM, Hartley's algorithm
(FH) and the KMSD algorithm were compared for FIR filters of various lengths.

Graph-based algorithms include the Bull-Horrocks Modified algorithm (BHM), the
n-dimensional Reduced Adder Graph (RAG-n), the Heuristic of cumulative benefit
(Hcub) and the Pattern Modification Technique (PMT). Bull & Horrocks (1987)
proposed primitive operations to reduce the complexity of filter design. Bull &
Horrocks (1988) proposed an algorithm based on the primitive operator graph,
known as the Bull-Horrocks algorithm (BHA). Dempster & Macleod (1994) proposed
a modified version, the Bull-Horrocks Modified algorithm (BHM), which reduces
the number of adders and subtractors in the filter design.

Dempster & Macleod (1995) proposed the RAG-n algorithm to reduce the complexity
of multiplier design. Yevgen & Markus (2007) proposed the Heuristic of
cumulative benefit (Hcub) algorithm to reduce the number of additions and
subtractions used in the filter design. RAG-n and Hcub are adder graph
algorithms. Until the arrival of Hcub, RAG-n was considered the best algorithm
for MCM design. The algorithm has two parts, optimal and heuristic: the optimal
part optimizes the adder cost and the heuristic part adds the extra
coefficients. Oscar (2007) used difference method algorithms in the heuristic
part of the adder graph algorithms to exploit redundancy among the
coefficients.

Mahesh & Vinod (2008) proposed Binary-based Common Subexpression Elimination
(BCSE) to reduce the number of adders in higher-order filters. BCSE eliminates
more adders in higher-order FIR filters than the CSD and CSE methods. A Graph
Dependence (GD) based BCSE algorithm is used for hardware reduction in FIR
filter design. The BCSE method achieves a logical depth 20% lower than NR-SCSE,
16% lower than CRA and 13% lower than Subexpression Sharing (SS).

Chip-Hong & Mathias (2010) concluded from their analysis that the CSD/MSD
representation can be used for low logical depth and that BCSE has little
advantage over CSD in adder cost saving. Rui et al. (2010) used PMT to design a
truncated MCM for the transposed-form FIR filter, which consumes 6% less area
than Uniform Truncation (UT) and 35% less area than the non-truncated MCM when
analysed on an FPGA.

Jeong-Ho & In-Cheol (2008) proposed the Multiple Adder Graph (MAG) algorithm,
which considers multiple adder graphs for a coefficient, to reduce hardware.
VCSE and HCSE exploit the two classes of common subexpressions (CS) present in
the CSD representation. The reduction of logic depth and logic operators using
VCSE and HCSE was analysed by Vinod et al. (2010). The algorithm based on
Identical Shift-Horizontal Common Subexpressions (IS-HCSs) uses 5% fewer
logical operators and 25% less logical depth in multipliers than the MAG
algorithm. This analysis shows that MCM techniques have been used to reduce the
logical depth and logical operators of FIR filters in transposed form, but
these techniques have not been analysed for parallel-form filter structures.
2.3 RECONFIGURABLE FIR FILTERS
In digital signal processing, FIR filters have a wide range of applications,
including wireless communication and audio, video and image processing systems.
Systems such as multi-channel filters and Software Defined Radio (SDR) need
Reconfigurable FIR (RFIR) filters, in which the coefficients can be changed
dynamically in real time. The need for RFIR filters in wireless communication
systems and multimedia devices motivates researchers to design reconfigurable
digital filters with a high sampling rate, low power and small area. Parallel
FIR filter structures can increase the sampling rate and reduce the power
consumption of an RFIR filter, but the replication of hardware increases the
area. Because multipliers are the most power- and area-consuming logic elements
in any DSP system, the area and power consumption of a parallel RFIR structure
are high compared with the normal structure. RFIR filters are used on the
transmitter and receiver sides of SDR in multi-standard communication systems.
Over the last three decades, research on low-complexity reconfigurable FIR
filters for various applications has developed.

A 100 MHz 40-tap Programmable FIR (PFIR) filter was proposed by Hatamian & Rao
(1990). Evans et al. (1990) proposed a high-speed digital PFIR filter. An
area-efficient PFIR digital filter using CSD coefficients was proposed by
Kei-Yong et al. (1989). Kuan-Hung & Tzi-Dar (2006) presented an 8-digit FIR
filter architecture that is tap-reconfigurable and consumes less power; this
PFIR consumes 16.5 mW and can operate at 86 MHz. A 10-tap programmable FIR
filter based on the Computation Sharing Multiplier (CSHM) was designed by
Jongsun et al. (2004). The carry-select adder and the Conditional Capture
Flip-Flop (CCFF) proposed by Bai-Sun et al. (2001) were used to reduce power at
the circuit level. The resulting PFIR achieves high performance with low power
consumption. The CSHM architecture can also be used for adaptive filters and
matrix multiplication.

Süleyman et al. (2003) proposed Reconfigurable Multiplier Blocks (ReMB) for an
8-point Goertzel recursive DCT structure to save area in ASIC and FPGA
implementations. ReMB was designed using Reduced Coefficient Multiplier (RCM)
techniques. In a PFIR filter, a fixed input value is multiplied with one of
several preset fixed-point coefficients. Peter et al. (2007) proposed a
Directed Acyclic Graph (DAG) fusion algorithm to generate time-multiplexed MCM
based on fused addition chains, which consumes less area and delay than a full
multiplier. In this algorithm, N single-constant logic circuits are integrated
into a “fused” logic circuit consisting of multiplexers and adders. The speed
of operation is reduced by the long logical depth that results from the DAG
method.

In mobile systems, filters must consume little power and operate at high speed
to preserve battery life while delivering high throughput. The RFIR filters
used in the wideband receivers of SDR must therefore be realized with low power
and high throughput. Sang & Pramod (2014) and Mahesh & Vinod (2010) discussed
various methods to increase the speed and reduce the complexity of
reconfigurable FIR filter implementations. Mehendale et al. (1996) discussed
the design of low-complexity FIR filters using Distributed Arithmetic (DA). The
complexity reduction was achieved through multiple memory banks and multi-rate
architecture techniques. These techniques reduce the large memory requirement,
which grows with the filter order, but the larger number of decompositions
reduces the throughput of the system.

The CSD-based PFIR proposed by Chen & Chiueh (2006) reduces the complexity of
the filter by reducing the precision of the constant coefficient values,
independently of the number of taps. This method consumes a large amount of
hardware and power, which makes it infeasible for the wideband receiver of SDR.
Muhammad & Roy (2002) used vector scaling for filtering and matrix
multiplication. Mahesh & Vinod (2010) proposed two new architectures to reduce
the complexity of RFIR filters, the Constant Shift Method (CSM) and the
Programmable Shift Method (PSM). CSM divides the coefficients into preset
groups, which increases the speed of computation but also the area and power
consumption of the RFIR filter. The BCSE algorithm eliminates redundancy in the
PSM architecture, which reduces power and area with a slight increase in delay.
The word length of the coefficients can be programmed in the PSM architecture,
and the design was implemented on a Virtex-II FPGA. The results were compared
with the vector scaling method and the computation sharing method for
programmable filter design.

Seok-Jae et al. (2011) proposed a low-power RFIR filter architecture that
includes coefficient monitoring circuits. The DF structure was used in the
design. An Amplitude Detector (AD) monitors the amplitude of the input samples
and stops the multiplication when a sample is below the threshold value.
Switching activity is one of the important factors that determine the dynamic
power consumption of CMOS circuits, so switching the multiplier on and off
according to the input samples reduces the switching power considerably. The
switching activity of the RFIR filter was reduced using a Multiplier Control
Signal Decision (MCSD) window. This approach saves up to 41.9% power with an
area overhead of 5.34% and negligible degradation in filter performance. The
filter was described in Verilog HDL and analysed using Taiwan Semiconductor
Manufacturing Company (TSMC) 0.25 µm CMOS technology.

DA techniques are used to achieve high-throughput, area-efficient
implementations of FIR filters for various applications. Look-Up Table (LUT)
reads and shift-accumulation are the operations used in DA computation. White
(1989) reviewed the use of DA in DSP applications. ROM-based LUTs were used in
DA designs where the coefficients are fixed. The memory requirement of a
DA-based filter becomes very large when the filter order is high. Meher (2006)
and Pramod et al. (2008) proposed systolic decomposition techniques to reduce
the memory requirement of long-length convolutions and higher-order filters.
One- and two-dimensional pipelined FIR filter structures were designed using a
systolic DA-based inner-product calculation technique. This method reduces the
memory size by using a smaller address length, but the adder complexity and
latency increase.

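The LUT and shift-accumulate operations of DA, and the reason the memory grows
with the filter order, can be sketched as follows. This is a minimal software
model under stated assumptions: fixed integer coefficients, inputs that fit in
a `bits`-wide two's complement word, and hypothetical function names. It does
not reproduce the systolic decomposition of the cited works.

```python
def build_lut(h):
    """LUT with 2**K entries: entry addr holds the sum of h[k] for set bits k."""
    K = len(h)
    return [sum(h[k] for k in range(K) if (addr >> k) & 1)
            for addr in range(1 << K)]


def da_inner_product(h, x, bits=8):
    """Inner product sum(h[k] * x[k]) computed bit-serially with one LUT.

    Assumes every x[k] lies in the two's complement range of `bits` bits.
    """
    lut = build_lut(h)
    mask = (1 << bits) - 1
    acc = 0
    for b in range(bits):
        # Form the LUT address from bit b of every input sample
        addr = 0
        for k, xk in enumerate(x):
            addr |= (((xk & mask) >> b) & 1) << k
        term = lut[addr] << b
        # The most significant (sign) bit carries a negative weight
        acc += -term if b == bits - 1 else term
    return acc


h = [3, -1, 4, 2]            # hypothetical fixed coefficients
x = [5, -7, 100, -128]       # hypothetical 8-bit input samples
print(da_inner_product(h, x), sum(a * b for a, b in zip(h, x)))
```

The LUT has 2^K entries for K taps, which is why a single table becomes
impractical for high filter orders and why decomposition into smaller tables
trades memory for extra adders.
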
Kumm et al. (2013) suggested RAM-based LUTs instead of ROM-based LUTs so that
the filter coefficients can be changed dynamically. Sang & Pramod (2013)
proposed a DA-based adaptive FIR filter that operates at high throughput with
low power and area. Parallel LUT update and concurrent weight update methods
are used to increase the throughput of the filter. Conditional signed
carry-save accumulation was used to reduce the area and power consumption of
the design. Pramod & Sang (2014) designed a shared-LUT-based DA method for ASIC
and FPGA implementations of RFIR filters. Compared with systolic and CSA-based
DA, the shared-LUT method consumed less energy and area in the ASIC
implementation, and the DRAM-based implementation occupies 54% and 29% fewer
slices at 91 MHz on a Virtex-5 FPGA.

Mazher & Varadarajan (2013) presented an RFIR filter design based on CSHM, CSM
and modified BCSE. The Computation Sharing Multiplier (CSHM) method identifies
common computation steps and reuses the common shift and addition operations.
CSM was used to increase the speed of the shifting operation, and the modified
BCSE method was used to reduce the number of adders. The combination of CSHM,
CSM and modified BCSE thus reduces the area and power consumption and increases
the speed of the RFIR filter. In this structure, concurrency is attained
through pipelining and parallel processing. The design was implemented and
analysed on a Virtex-II FPGA.

Padmapriya & Lakshmi (2015) presented a dual-mode RFIR filter for speech signal
processing using the DF structure. Two modes of operation are used to reduce
power and area consumption: the testable reversible mode reduces power
consumption and the multiplierless mode reduces area consumption. An Amplitude
Detector (AD) compares the amplitude of each input value with an input
threshold. If the input value is greater than the threshold, the reversible
mode is used; otherwise the filter operates in multiplierless mode.

The CSD representation was used in the design because it has few nonzero digits
and no adjacent nonzero digits. In the multiplierless mode, a pre-estimator
unit, selector units and adder units are used. The selector unit outputs are
added by an adder unit consisting of a D-latch-based CSLA to reduce the area,
power consumption and delay. In the reversible mode, reversible gates are used
to reduce the power consumption; Fredkin and Peres reversible gates are used to
design the delay elements and the adders. This method reduces power consumption
by 37.97% and area by 44.61% compared with conventional methods.

Indranil et al. (2015a) proposed a Reconfigurable Pulse-Shaping (RPS) FIR
interpolation filter with low power and area consumption for the Digital Up
Converter (DUC). Techniques that reduce the Multiplications Per Input Sample
(MPIS) and Additions Per Input Sample (APIS) are used to reduce the number of
multipliers and adders. The area and power consumption of this RFIR filter are
reduced further by using a 2-bit BCS based constant multiplier. Compared with
3-bit BCS, 2-bit BCS achieves a 62% lower area-delay product in the FPGA
implementation.

The 2-bit BCSE-based RFIR filter consumes 9.5% less power than the 3-bit
BCSE-based filter and 91.3% less power than the design of Seok-Jae et al.
(2011) in the ASIC implementation. This analysis indicates that the 2-bit BCSE
algorithm reduces the area and power consumption of the constant multiplier
compared with 3-bit BCSE in both FPGA and ASIC implementations. The Xilinx
XPower tool was used to analyse the dynamic power consumption of the FPGA
implementation.

Basant & Pramod (2016) proposed a transpose-form block-processing FIR filter of
large length for both fixed and reconfigurable structures. The reconfigurable
architecture was designed using Inner Product Units (IPUs), and
horizontal/vertical subexpression elimination based MCM was used in the
fixed-coefficient architecture. The ASIC and FPGA implementations consume 42%
less ADP and 40% less EPS than the direct-form block FIR structure.

Indranil et al. (2015b) proposed a constant multiplier design for RFIR filters
using the VHBCSE algorithm, which supports changing the coefficients at run
time. The RFIR filter can be implemented on both ASIC and FPGA. The 2-bit BCSE
algorithm is applied vertically across neighbouring coefficients and
variable-bit BCSE is applied horizontally within each coefficient. The VHBCSE
algorithm reduces the switching activity, which in turn reduces the power
consumption.

Compared with the 2-bit and 3-bit BCSE algorithms, this method reduces the
average power consumption by 32% and 52% in the ASIC implementation. In the
FPGA implementation, the Area Delay Product (ADP) is reduced by 13% and 28% and
the Power Delay Product (PDP) by 76.1% and 77.8% compared with the rounded
truncated Multiple Constant Multiplication/Accumulation (MCMAT) and Multi-root
Binary Partition Graph (MBPG) methods. This survey suggests that none of the
researchers has focused on the design of symmetric parallel reconfigurable FIR
filters.
2.4 CARRY SELECT ADDERS
Behrooz (2000) describes various number systems, algorithms and techniques used
in computer arithmetic. Multiplication and division are performed by repeated
addition operations. Half adders and full adders are the fundamental units of
adder design. A simple bit-serial adder can be built from a full adder and a
flip-flop. The bit-serial adder has very low area and hardware complexity, but
it takes n cycles to generate the sum and carry. The critical path delay of the
Ripple Carry Adder (RCA) is high, and this latency makes the RCA unsuitable for
most applications. Adders may have two operands or multiple operands, and
latency must be taken into account to increase their speed.

The Carry Lookahead Adder (CLA), Carry Skip Adder, Carry Select Adder and
Conditional Sum Adder are examples of fast two-operand adders. Examples of
multi-operand adders are the CSA and parallel counters. Multi-operand adders
are mostly used in Wallace and Dadda tree multipliers, where they are a key
component in reducing the cost, power and delay. Bedrij (1962) proposed the
CSLA, which has less delay than the ripple carry adder and occupies less area
than the CLA. The CSLA consists of two sets of RCAs, one with carry-in '0' and
the other with carry-in '1'. Multiplexers select the output according to the
actual carry. Chang & Hsiao (1998) proposed a carry select adder that keeps the
RCA with carry-in '0' and replaces the RCA with carry-in '1' by an Add-One
circuit, which reduces the area consumption by 6.3%.

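The dual-RCA organization of the conventional CSLA described above can be
modelled at the bit level as follows. This is a behavioural sketch with
hypothetical function names, assuming uniform group sizes and unsigned
operands; it models the selection logic only and says nothing about gate-level
area or delay.

```python
def ripple_carry_add(a_bits, b_bits, cin):
    """Ripple carry addition of two bit lists (LSB first)."""
    s, c = [], cin
    for a, b in zip(a_bits, b_bits):
        s.append(a ^ b ^ c)                 # full adder sum
        c = (a & b) | (c & (a ^ b))         # full adder carry out
    return s, c


def carry_select_add(a, b, width=16, group=4):
    """Conventional CSLA model: each group is added twice, then selected."""
    a_bits = [(a >> i) & 1 for i in range(width)]
    b_bits = [(b >> i) & 1 for i in range(width)]
    carry, out = 0, []
    for lo in range(0, width, group):
        ga, gb = a_bits[lo:lo + group], b_bits[lo:lo + group]
        s0, c0 = ripple_carry_add(ga, gb, 0)    # RCA assuming carry-in '0'
        s1, c1 = ripple_carry_add(ga, gb, 1)    # RCA assuming carry-in '1'
        # Multiplexer: the actual incoming carry picks one precomputed result
        s, carry = (s1, c1) if carry else (s0, c0)
        out.extend(s)
    total = sum(bit << i for i, bit in enumerate(out))
    return total, carry


print(carry_select_add(1234, 4321))
```

In hardware both RCAs of a group operate in parallel, so the carry only has to
ripple through one multiplexer per group, at the cost of duplicating the adder.
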
Neve et al. (2004) proposed a high-speed, low-power 64-bit CSLA. Design style,
cell arrangement and adder structure are the three levels of abstraction used
in the adder design. The Branch-Based Logic (BBL) design style uses fewer
transistors in series per branch, which reduces the power consumption. The cell
arrangement reduces the stack height in the critical path, which increases the
speed of operation. A pre-sum technique is used at the adder structure level to
reduce the power consumption. This method improves the power-delay product by
60% compared with Complementary Passgate Logic (CPL).

Yajuan et al. (2005) proposed a square-root (SQRT) CSLA based on first-zero
detection logic with remarkable power-delay and area-delay products. The SQRT
method reduces the worst-case delay by using RCA modules of different sizes.
The first-zero detection logic is a modified Add-One scheme. The CMOS mirror
topology was used in the RCA to obtain a trade-off between power and delay. The
first-zero detection circuit was designed using transmission gates to avoid the
threshold drop of pass-transistor logic. The modified Add-One logic requires
fewer transistors than the RCA with carry-in '1'. The proposed CSLA consumes
46% less power and uses 70% fewer transistors than the conventional CSLA.

Yajuan & Chip-Hong (2008) proposed a hybrid adder that combines the CSLA and
the CLA to increase the speed of addition. The accumulation of partial products
in a parallel multiplier uses Redundant Binary (RB) numbers for area efficiency
and high-speed operation. Because systems normally use the 2's complement
format for computation, a converter is used to convert numbers between the 2's
complement and RB formats. The combination of RB and hybrid adders is used to
design high-speed multipliers. The modified Add-One circuit was used in the
CSLA for area reduction.

Massimo et al. (2011) discussed the design of a parallel carry select adder
with reduced delay. CSLAs are widely used in mobile applications due to their
low power consumption. The speed of a CSLA depends on the number of FAs used in
each group, so the delay was reduced by resizing the groups of full adders used
in the design of 16-bit and 32-bit adders. Ramkumar & Harish (2012) proposed a
Binary to Excess-1 Converter (BEC) based SQRT CSLA that consumes less area and
power. The RCA with carry-in '1' was replaced with a BEC, which requires fewer
gates than the RCA it replaces. For a 64-bit adder, the area is reduced by
17.4% and the power by up to 15.4% compared with the dual-RCA design. A
0.18 µm CMOS technology was used for the analysis.
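The BEC substitution can be sketched behaviourally. The following is a minimal
sketch, assuming the common arrangement in which an n-bit group uses one RCA
with carry-in 0 and an (n+1)-bit BEC that adds 1 to the RCA result. The
function names and the LSB-first bit lists are illustrative assumptions, not
the gate-level design of Ramkumar & Harish.

```python
def ripple_carry_add(a_bits, b_bits, carry_in=0):
    """LSB-first ripple-carry addition; returns (sum_bits, carry_out)."""
    sums, carry = [], carry_in
    for a, b in zip(a_bits, b_bits):
        sums.append(a ^ b ^ carry)
        carry = (a & b) | (carry & (a ^ b))
    return sums, carry


def bec(bits):
    """Binary to Excess-1 Converter: adds 1 to an LSB-first word (wraps around)."""
    out, all_ones_below = [], 1
    for b in bits:
        out.append(b ^ all_ones_below)   # X_i = B_i XOR (B_0 AND ... AND B_{i-1})
        all_ones_below &= b
    return out


def csla_group(a_bits, b_bits, carry_in):
    """One carry-select group: a single RCA (carry-in 0) plus an (n+1)-bit BEC."""
    s0, c0 = ripple_carry_add(a_bits, b_bits, 0)
    with_zero = s0 + [c0]                # result if the incoming carry is 0
    with_one = bec(with_zero)            # result if the incoming carry is 1
    chosen = with_one if carry_in else with_zero   # multiplexer driven by carry_in
    return chosen[:-1], chosen[-1]
```

The BEC needs one XOR per bit and an AND chain, which is the source of the
gate-count saving over a second full RCA in this arrangement.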
2.5 ARITHMETIC OPTIMIZATION
The overall performance of a DSP system can be improved through arithmetic
optimization, in which operation sharing is used to reduce the resource
utilization of the design. In Application Specific Integrated Circuit (ASIC)
based DSP system design, constant multiplication is implemented using
addition, subtraction and shifting to reduce the hardware complexity, power
and resource utilization. The CSE method is a key technique for increasing
resource sharing in constant multiplier design. Resource sharing can be
further increased by operator unification.
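As an illustration of operation sharing, the following is a minimal sketch of
multiplierless constant multiplication with one shared subexpression. The
constants 45 and 21 and the function name multiply_by_constants are chosen
here purely for illustration and do not come from the surveyed works.

```python
def multiply_by_constants(x):
    """Compute 45*x and 21*x with shifts and adds only (no multiplier).

    45 = 0b101101 and 21 = 0b10101 both contain the bit pattern 101 (= 5),
    so the partial product 5*x is computed once and shared.
    """
    x5 = (x << 2) + x          # shared subexpression: 5*x
    y45 = (x5 << 3) + x5       # 45*x = 40*x + 5*x
    y21 = (x5 << 2) + x        # 21*x = 20*x + x
    return y45, y21
```

This version uses three additions. A direct shift-and-add of each constant,
one addition per extra non-zero bit, would need three additions for 45 and two
for 21, five in total.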
Jean-Luc et al. (2008) proposed a unified arithmetic operator for finite
field arithmetic. This approach reduces the hardware utilization by a factor of
5.6 compared with the existing method. The design was implemented on a
Cyclone II FPGA, where it occupies 3,217 logic elements and operates at 152
MHz. Anshul et al. (2009) proposed the design of a BCD adder/subtractor that
supports both unsigned and signed representations. Unification reduced the
power-delay product by up to 32%.
Jiatao et al. (2016) designed a UAS based CSE algorithm for MCM design that
consumes less area and power. The UAS was designed at the gate level and
computes both the sum and the difference simultaneously. A Cartesian
coordinate system was used to represent the constant values. Non-overlapping
non-zero pairs are combined to increase the reusability of the arithmetic
resources. The proposed UAS MCM was used to design the multipliers in FIR
filter, FFT and DCT systems, and the comparison shows a 27.5% reduction in
area-time and a 12% reduction in power. From the literature review it is
identified that the unification operation can be used for arithmetic-level
strength reduction of parallel FIR filters.
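The idea of producing a sum and a difference from shared logic can be sketched
at the bit level. The following is a minimal behavioural sketch, assuming only
that a - b is formed as a + NOT b + 1, so that the per-bit term a_i XOR b_i can
be reused by both results. The function name unified_add_sub and the width
parameter are illustrative assumptions. This is not the gate-level UAS of
Jiatao et al.

```python
def unified_add_sub(a, b, width=8):
    """Return (a + b, a - b) modulo 2**width from one pass over the bits.

    The term p = a_i XOR b_i is computed once per bit and reused by both the
    sum path and the difference path (a - b = a + NOT b + 1).
    """
    total = diff = 0
    carry_add, carry_sub = 0, 1              # subtraction starts with carry-in 1
    for i in range(width):
        ai, bi = (a >> i) & 1, (b >> i) & 1
        p = ai ^ bi                          # shared between both operations
        total |= (p ^ carry_add) << i
        diff |= (p ^ 1 ^ carry_sub) << i     # a_i XOR NOT b_i equals NOT p
        carry_add = (ai & bi) | (p & carry_add)
        carry_sub = (ai & (bi ^ 1)) | ((p ^ 1) & carry_sub)
    return total, diff
```

For example, unified_add_sub(9, 5) returns (14, 4). Wherever a structure needs
both x + y and x - y of the same operands, one such operator can stand in for
a separate adder and subtractor.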
2.6 SUMMARY
The survey infers that algorithm-level and architecture-level optimization
of parallel FIR filter design offers low power and a high sampling rate at the
cost of additional hardware. The algorithm/arithmetic-level and bit-level
strength reduction of MCM, used for designing fixed and reconfigurable FIR
filters with less area and low power, is described. The arithmetic-level
strength reduction of CSLAs and the unification operation are described for
the design of FIR filters. The survey explores the possibility of
arithmetic-level strength reduction of symmetric parallel fast FIR filters
using MCM and fast adders to reduce the area consumption. It also explores the
feasibility of designing reconfigurable symmetric parallel fast FIR filters
for ASIC and FPGA implementation using configurable multipliers. Analysis of
the pre/post-processing and subfilter blocks of symmetric parallel fast FIR
filter structures shows that they contain a number of unification pairs, which
can be replaced by a unified operator to reduce the area and power consumption
of the parallel filters.
