Professional Documents
Culture Documents
CHAPTER 2
LITERATURE SURVEY
The literature survey focuses its attention towards FIR filter, particularly to
utilize under low power consumption, better performance and improved efficiency. The
implementation feasibility in VLSI environment is also studied and analyzed in depth.
were found to be 49% and 46% respectively with an average RSG increase in speed of
operation of 35% compared to other reconfigurable FIR filter architectures in literature.
Ababneh JI, Bataineh MH (2007) Linear Phase FIR filter design using particle
swarm optimization and genetic algorithms. Digital Signal Processing.
doi:10.1016/j.dsp.2007.05.011 [1]. Presented a high performance and low power FIR
filter design, which was based on computation sharing multiplier (CSHM). CSHM
specifically targeted computation re-use in vector-scalar products and was effectively
used in the suggested FIR filter design. Efficient circuit level techniques namely a new
carry select adder and Conditional Capture Flip-Flop (CCFF), were also used to further
improve power and performance. The suggested FIR filter architecture was
implemented in 0.25 pm technology. Experimental results on a 10 tap low pass CSHM
FIR filter showed speed and power improvement of 19% and 17%, respectively
because of the number of constraints that influence the power consumption of a device.
In addition to issues of performance and functionality, there was a need to satisfy strict
test coverage RSGe constraints. Furthermore, the effect of DFT circuits on the overall
performance was studied. A hearing aid device was considered as an example of a
system with strict power/area constraints. It was shown that the choice of multiplier
architecture and number representation should be carefully considered when specific
DSP architectural choices were made. The results were demonstrated with a number of
specially designed DSP architectures for the implementation of FIR filtering algorithms
on hearing aid devices.
Regalia, P.A., Lu, W.-S., Pei, S.-C. and Tseng, C.-C. “A weighted least-squares
method for the design of stable 1-D and 2-D IIR digital filters” [73]. IEEE Transactions
on (Volume: 47, Issue: 7), July 1999 suggested a method for the design of FIR digital
filters with low power consumption. In this method, the digital filter was implemented
as a cascade arrangement of low-order sections. The first section was designed through
optimization so as to satisfy as far as possible, the overall required specifications. The
first section was then fixed and a second section was added, which was designed so that
the first two sections an cascade satisfy again as far as possible the overall required
specifications.
This process was repeated until a multi-section filter was obtained that would
satisfy the required specifications under the most critical circumstances imposed by the
application at hand. In multi-section filters of this type, the minimum number of
sections required to process the current input signal can be switched in through the use
of a simple adaptation mechanism and, in this way, the power consumption can be
minimized. This design strategy was achieved by formulating the design of the kth
section as a weighted least-squares minimization problem, assuming that an optimum
(k -1)-section design is available.
26
design methodologies for low supply voltages. The flow was demonstrated on a 300-k
transistor test-chip, a time-division multiple-access baseband receiver, and a soft-output
Viterbi decoder. An example of architectural comparison of energy efficiency was also
presented.
Kyung-Saeng Kim and Kwyro Lee 2003 described a 32-tap finite impulse
response (FIR) filter with two 16-tap macros suitable for multiple taps [54]. The
derived condition for a coded coefficient and data block showed 35% savings in power
consumption and 44% improvement in occupied area compared to a typicalradix-4
modified Booth algorithm. According to the condition and separated shifting-accessing
clock scheme, they implemented a 32-tap FIR filter in 0.6- m CMOS technology with
three levels of metal.
Gemmeke, T., Gansen, M., Stockmanns, H.J. and Noll, T.G., 2004 reported
that power dissipation along with silicon area has become the key figure in chip design
[28]. They presented a design methodology reducing any combination of cost drivers
subject to a specified throughput. As a basic principle, the underlying optimization
regards the existing interactions within the design space of a building block. Crucial in
such optimization was the proper dimensioning of device sizes in contrast to the
common use of minimal dimensions in low-power implementations. Taking the design
space of an FIR filter as an example, the different steps of the design process were
highlighted resulting in a low-power high-throughput filter implementation. This filter
reported to have less silicon area than other state-of-the-art filter implementations, and
it disrupts the average RSGe trend of power dissipation by a factor of 6.
The balanced normalized peak ripple magnitude due to the quantization error
was fulfilled in the second stage by a local search method. The algorithm used a
common sub-expression based hamming weight pyramid to seek for low-cost candidate
coefficients with preferential consideration of shared common sub-expressions. They
reported that their algorithm was capable of synthesizing FIR filters with the least
CSPT terms compared with existing filter synthesis algorithms.
Ron Ho, T. Ono, Hopkins, R.D., Chow, A.., Schauer, J., Liu, F.Y. and Drost,
R., 2007 presented circuits for driving long on-chip wires through a series capacitor
[74]. The capacitor improved delay through signal pre-emphasis, offered a reduced
voltage swing on the wire for low energy without a second power supply, and reduced
the driven load, allowing for smaller drivers. Sidewall wire parasitic used as the series
capacitor improves process tracking, and twisted and interleaved differential wires
reduced both coupled noise as well as Miller-doubled cross-capacitance. Multiple
29
drivers sharing a target wire allow simple FIR filters for driver-side pre-equalization.
Receivers require DC bias circuits or DC-balanced data. A test chip in a 180 nm, 1.8 V
process compared capacitive-coupled long wires with optimally-repeated full-swing
wires.
Zhengtao Yu and Xun Liu, 2009 reported that Rotary clock is a resonant
clocking technique delivers on-chip clock signal distribution with very low power
dissipation [97]. They presented the first rotary-clock-based nontrivial digital circuit.
Their design was fully digital and generated using CMOS standard cells in 0.18 m
technology. They showed that the suggested FIR filter was seamlessly integrated with
the rotary clock technique. It used the spatially distributed multiple clock phases of
rotary clock and achieves high power savings. Simulation results demonstrated that
rotary-clock-based FIR filter can operate successfully at 610 MHz, providing a
throughput of 39 Gb/s. In comparison with the conventional clock-tree-based design,
their design achieved a 34.6% clocking power saving and a 12.8% overall circuit power
saving. In addition, the peak current consumed by the rotary-clock-based filter is
substantially lower by 40% on the ave RSGe. Their study makes the crucial step toward
the application of rotary clock technique to a broad range of VLSI designs.
Arslan T. and Erdogan A.T. (1998) “Data block processing for low power
implementation of direct form FIR filters on single multiplier CMOS DSPs” [3]. A
novel feature of the filter was that the degree of pipelining was dynamically variable,
30
depending upon the input data rate. This feature was critical in obtaining very low filter
latency throughout the range of operating frequencies. The filter was a ten-tap six-bit
FIR filter, fabricated in a 0.18- m CMOS process. Resulting chips were fully functional
over a wide range of supply voltages, and exhibited throughputs of over 1.3giga-
items/s, and latencies of 2–5 clock cycles. Interestingly, the filter throughput was
limited by the synchronous portion of the chip; the internal asynchronous pipeline was
estimated to be capable of significantly higher throughputs, around 1.8 giga-items/s.
Chong-Fatt Law, Bah-Hwee Gwee, Chang and Joseph S., 2011 suggested a set
of modeling rules and a synthesis method for the design of asynchronous pipelines
[11]. To keep the circuit area and power dissipation of the asynchronous control
network small, the suggested approach avoided the conventional syntax directed
translation approach. Instead, it employed a data-driven design style and a coarse-grain
approach to the synthesis of asynchronous control, restricting asynchronous control to
the implementation of communication channels commonly found in asynchronous
pipelines and operations involving these channels. The suggested approach integrates
well into conventional synchronous design flows because they are based on Verilog
and System Verilog specifications, and generate register-transfer level models suitable
for functional simulation and logic synthesis using existing computer-aided design
tools. Using a 32-bit microprocessor, an interpolated finite-impulse-response filter
bank, and a Reed–Solomon error detector as design examples, they showed that the
suggested approach was competitive with other comparable reported methods.
2.3 Design a Novel Slice Reduction Algorithm for Low Power and Area Efficient
FIR Filter
Dempster, A.G and M.D. Macleod. 1995 [18]. „Use of minimum adder
multiplier blocks in FIR digital filters‟, IEEE Trans. Circuits Syst. II, Analog Digit.
Signal Process. 42 (9): 569–577. RSG-n algorithm attempts to synthesize all required
multiplications by initially placing all coefficients of adder cost 1 (determined using
31
MAG) into the multiplier block and then building higher cost values using
combinations of shifts, adds and subtracts of other adder Outputs within the block.
This approach leads to high logic depth blocks and hence pushes up FPGA area
requirement. Note that RSG-n synthesizes multiplier blocks in two stages. (1) Optimal
stage, (2) Suboptimal heuristic stage. If the multiplier block is fully synthesized after
stage1, Dempster and Macleod show that their algorithm ensures the absolute minimum
number of adders to implement a given block. If stage2 is required, a suboptimal
multiplier.
Jang, Y and S. Yang. 2002. "Low-power CSD linear phase FIR filter structure
using vertical common sub expression''. Electronics Letters, 38 (15):777-779 [40]. In
the transposed direct form, the coefficient multipliers share the same input and hence
commonly known as multiplier block (MB). The MB reduces the complexity of the
FIR filter implementations, by exploiting the redundancy in MCM. Thus, redundant
computations (partial product additions in the multiplier) are eliminated using BCSE.
The BCSE method in was formulated as a low complexity solution to realize
application specific filters where the coefficients are fixed. In the case of channel filters
for SDR receivers, the coefficients need to be changed as the filter specification
changes with the communication standard. Therefore, reconfigurability is a necessary
requirement for SDR channel filters. In the next section, we propose two architectures
that incorporate reconfigurability into the BCSE-based low complexity filter
architecture. Although BCSE to illustrate proposed reconfigurable filter architectures is
used.
Dempster, A.G and M.D. Macleod. 1994. "Constant integer multiplication using
minimum adders", Circuits. Devices and Svstenis, IEEE Proceedings, 141 (5): 407-413
[17]. In this section, the architecture of the proposed FIR filter is presented. PE
32
performs the coefficient multiplication operation with the help of a shift and add unit
which will be explained in the latter part of this section. The architecture of PE is
different for proposed MAG and PSM. In the MAG, the filter coefficients are
partitioned into fixed groups and hence the PE architecture involves constant shifters.
But in the PSM, the PE consists of programmable shifters (PS).
still make up a significant part of today‟s purchased home recording devices. Sync
processing was done mainly digital, however based on the traditional techniques of
sync slicing and smoothing using a PLL (phase-locked loop). Due to its recursive
nature, the PLL was limited to a second order loop, which limits its filter performance.
Limited filter performance results in the well-known picture stability compromise,
where noise suppression has to be compromised with VCR-playback picture stability.
Samueli, H., “On the design of optimal equiripple FIR digital filters for data
transmission applications”. Published in Circuits and Systems, IEEE Transactions on
(Volume: 35, Issue: 12), Dec 1988 [76]. They introduced a concept which replaces the
PLL by non-recursive processing. Removing the stability issues of recursive processing
opens a large parameter range for filter design and optimization. They also gave a
discussion of the parameter optimization effects and results, which includes subjective
quality tests. The research was greatly supported by a real-time implementation of the
novel algorithm using an industry standard FPGA.
Yeung, K.S. and Chan, S.C. “The design and multiplier-less realization of
software radio receivers with reduced system delay”. Published in Circuits and Systems
I: Regular Papers, IEEE Transactions on (Volume: 51, Issue: 12). Dec. 2004 [94].
Studied the design and multiplier-less realization of a new software radio receiver
(SRR) with reduced system delay. It employs low-delay finite-impulse response (FIR)
and digital all pass filters to effectively reduce the system delay of the multistage
decimators in SRRs. The optimal least-square and min-max designs of these low-delay
FIR and all pass-based filters are formulated as a semi definite programming (SDP)
problem, which allows zero magnitude constraint to be incorporated readily as
additional linear matrix inequalities (LMIs). By implementing the sampling rate
converter (SRC) using a variable digital filter (VDF) immediately after the integer
decimators, the needs for an expensive programmable FIR filter in the traditional SRR
was avoided. A new method for the optimal min-max design of this VDF-based SRC
34
using SDP was also reported and compared with traditional weight least squares
method. Other implementation issues including the multiplier-less and digital signal
processor (DSP) realizations of the SRR and the generation of the clock signal in the
SRC are also studied. Design results show that the system delay and implementation
complexities (especially in terms of high-speed variable multipliers) of the reported
architecture are considerably reduced as compared with conventional approaches.
Da-Zheng Feng, Xian-Da Zhang, Dong-Xia Chang and Wei Xing Zheng, “A
fast recursive total least squares algorithm for adaptive FIR filtering”. Published in
Signal Processing, IEEE Transactions on (Volume: 52, Issue: 10). Oct. 2004 [16].
35
Reported a new fast recursive total least squares (N-RTLS) algorithm to recursively
compute the TLS solution for adaptive finite impulse response (FIR) filtering. The N-
RTLS algorithm was based on the minimization of the constrained Rayleigh quotient
(c-RQ) in which the last entry of the parameter vector was constrained to the negative
one. As analysis results on the convergence of the reported algorithm, they study the
properties of the stationary points of the c-RQ. The high computational efficiency of
the new algorithm depends on the efficient computation of the fast gain vector (FGV)
and the adaptation of the c-RQ. Since the last entry of the parameter vector in the c-RQ
has been fixed as the negative one, a minimum point of the c-RQ was searched only
along the input data vector, and a more efficient N-RTLS algorithm was obtained by
using the FGV. As compared with Davila‟s RTLS algorithms, the N-RTLS algorithm
saves the 6 number of multiplies divides, and square roots (MADs). The global
convergence of the new algorithm was studied by LaSalle‟s invariance principle. The
performances of the relevant algorithms are compared via simulations, and the long-
term numerical stability of the N-RTLS algorithm was verified.
obtain significant gains in execution time and power consumption with respect to the
initial implementation choices. Their approach was a first step to systematically
applying high-level data structure transformations in the context of memory-efficient
and low-power multimedia systems.
IF-sampler with a built-in 22-tap complex band-pass since FIR function in 0.35- m
CMOS are shown, demonstrating the feasibility of the presented method.
Da-Zheng Feng and Wei Xing Zheng, “Fast Approximate Inverse Power
Iteration Algorithm for Adaptive Total Least-Squares FIR Filtering”. Signal
Processing, IEEE Transactions on (Volume: 54, Issue: 10). Oct 2006 [15]. Reported
that the presence of contaminating noises at both the input and the output of a finite-
impulse-response (FIR) system constitutes a major impediment to unbiased parameter
estimation. The total least-squares (TLS) method was known to be effective in
achieving unbiased estimation. In this correspondence, they develop a fast recursive
algorithm with a view to finding the TLS solution for adaptive FIR filtering. Given the
fact that the TLS solution was obtainable via inverse power iteration, they introduce a
novel but approximate inverse power iteration in combination with Galerkin method so
that the TLS solution can be updated adaptively at a lower computational cost. They
also take advantage of the regular form of the TLS solution to constrain the last
element of the filter parameter vector to the negative one. They further reduce the
computational complexity of the developed algorithm by making efficient computation
of the fast gain vector defined in and using rank-one update of the augmented
autocorrelation matrix.
Grossmann, L.D. and Eldar, Y.C., “An L1-Method for the Design of Linear-
Phase FIR Digital Filters”. Signal Processing, IEEE Transactions on (Volume: 55,
Issue: 11). Nov. 2007 [33]. This work considers the design of linear-phase finite
impulse response digital filters using an1 optimality criterion. The motivation for using
such filters as well as a mathematical framework for their design was introduced. It was
shown that 1 filter possesses flat pass-bands and stop-bands while keeping the
transition band comparable to that of least-squares filters. The uniqueness of 1-based
filters was explored, and an alternation type theorem for the optimal frequency
response was derived. An efficient algorithm for calculating the optimal filter
coefficients was reported, which may be viewed as the analogue of the celebrated
Remez exchange method. A comparison with other design techniques was made,
demonstrating that the 1 approach may be a good alternative in several applications.
Neumann, N., Duthel, T., Haas, M. and Schaffer, C.G., “General Design Rules
for the Synthesis of Dispersion and Dispersion Slope Compensation FIR and IIR Filters
With Reduced Complexity” [70]. Light wave Technology, Journal of (Volume: 25,
Issue: 11). Nov. 2007, reported that chromatic dispersion was one of the main
39
transmission impairments in optical systems with high bit rates, because the dispersion
limit scales with the square of the data rate. Optical delay-line filters can be used to
compensate dispersion and dispersion slope. They can be designed as feed-forward
finite-impulse response filters or as all-pass infinite-impulse response filters. Due to the
time-variant property of the dispersion, those filters have to be adaptive, which requires
fast and reliable calculation of the filter coefficients. In this work, a new approach to
calculate filter coefficients by applying analytical methods was presented. Design
examples are given, and the filter performance was discussed.
Xueyi Yu, Yuanfeng Sun, Woogeun Rhee and Zhihua Wang, “An FIR-
Embedded Noise Filtering Method for △ Fractional-N PLL Clock Generators”. Solid-
State Circuits, IEEE Journal of (Volume: 44, Issue: 9). Sep 2009 [93]. Described a
noise filtering method for delta ∑fractional- N PLL clock generators to reduce out-of-
band phase noise and improve short-term jitter performance. Use of a low-cost ring
VCO mandates a wideband PLL design and complicates filtering out high-frequency
quantization noise from the delta ∑modulator. A hybrid finite impulse response (FIR)
filtering technique based on a semi digital approach enables low-OSR delta
∑modulation with robust quantization noise reduction despite circuit mismatch and
nonlinearity. A prototype 1-GHz delta ∑fractional- PLL is implemented in 0.18 m
CMOS. Experimental results show that the reported semi digital method effectively
40
Design Compiler using TSMC 90 nm library and find that the reported LUT-multiplier-
based design involves nearly 15% less area than the DA-based design for the same
throughput and lower latency of implementation.
Seok-Jae Lee, Ji-Woong Choi, Seon Wook Kim and Jongsun Park, “A
Reconfigurable FIR Filter Architecture to Trade off Filter Performance for Dynamic
Power Consumption”. Very Large Scale Integration (VLSI) Systems, IEEE
Transactions on (Volume: 19, Issue: 12). Dec 2011 [78]. Presented an architectural
approach to the design of low power reconfigurable finite impulse response (FIR) filter.
The approach was well suited when the filter order was fixed and not changed for
particular applications, and efficient trade-off between power savings and filter
performance can be made using the reported architecture. Generally, FIR filter has
large amplitude variations in input data and coefficients. Considering the amplitude of
the filter coefficients and inputs, the reported FIR filter dynamically changes the filter
42
order. Mathematical analysis on power savings and filter performance degradation and
its experimental results show that the reported approach achieves significant power
savings without seriously compromising the filter performance. The power savings was
up to 41.9% with minor performance degradation and the area overhead of the reported
scheme was less than 5.3% compared to the conventional approach.
Shmaliy, Y.S. and Ibarra-Manzano, O., “Noise Power Gain for Discrete-Time
FIR Estimators”. Signal Processing Letters, IEEE (Volume: 18, Issue: 4). April 2011
[79]. Presented that the noise power gain (NPG) matrix was specialized in state space
for transversal finite impulse response (FIR) estimators intended for filtering,
prediction, and smoothing of discrete time-variant -state models with states measured.
A computationally efficient iterative algorithm for NPG associated with unbiased
estimation was provided along. Based on a numerical example, they show that the
estimates are well bounded with the error bound (EB) specified in the three-sigma
sense by the main components of the NPG matrix and measurement noise variance. In
turn, the cross-components in the NPG matrix represent interactions in the estimator
channels. It was concluded that EB can serve as an efficient measure of errors in
optimal and suboptimal FIR and Kalman structures.
Lu, W.Q., Zhao, S., Zhou, X.F., Ren, J.Y. and Sobelman, G.E., “Reconfigurable
baseband processing architecture for communication” [62]. Computers & Digital
Techniques, IET (Volume: 5, Issue: 1). January 2011.The development of multiple
communication standards and services has created the need for a flexible and efficient
computational platform for baseband signal processing. Using a set of heterogeneous
reconfigurable execution units (RCEUS) and a homogeneous control mechanism, the
reported reconfigurable architecture achieves a large computational capability while
still providing a high degree of flexibility. Software tools and a library of commonly
used algorithms are also reported in this work to provide a convenient framework for
hardware generation and algorithm mapping. In this way, the architecture can be
43