Linköping Studies in Science and Technology

Dissertations, No 1420
On the Implementation of Integer and
Non-Integer Sampling Rate Conversion
Muhammad Abbas
Division of Electronics Systems
Department of Electrical Engineering
Linköping University
SE–581 83 Linköping, Sweden
Linköping 2012
Linköping Studies in Science and Technology
Dissertations, No 1420
Muhammad Abbas
mabbas@isy.liu.se
www.es.isy.liu.se
Division of Electronics Systems
Department of Electrical Engineering
Linköping University
SE–581 83 Linköping, Sweden
Copyright c _ 2012 Muhammad Abbas, unless otherwise noted.
All rights reserved.
Papers A, D, and E reprinted with permission from IEEE.
Papers B and C, partial work, reprinted with permission from IEEE.
Abbas, Muhammad
On the Implementation of Integer and Non-Integer Sampling Rate Conver-
sion
ISBN 978-91-7519-980-1
ISSN 0345-7524
Typeset with L
A
T
E
X2
ε
Printed by LiU-Tryck, Linköping, Sweden 2012
To my loving parents
Abstract
The main focus in this thesis is on the aspects related to the implementation of
integer and non-integer sampling rate conversion (SRC). SRC is used in many
communication and signal processing applications where two signals or systems
having different sampling rates need to be interconnected. There are two basic
approaches to deal with this problem. The first is to convert the signal to analog
and then re-sample it at the desired rate. In the second approach, digital signal
processing techniques are utilized to compute values of the new samples from
the existing ones. The former approach is hardly used since the latter one
introduces less noise and distortion. However, the implementation complexity
for the second approach varies for different types of conversion factors. In
this work, the second approach for SRC is considered and its implementation
details are explored. The conversion factor in general can be an integer, a ratio
of two integers, or an irrational number. The SRC by an irrational numbers
is impractical and is generally stated for the completeness. They are usually
approximated by some rational factor.
The performance of decimators and interpolators is mainly determined by
the filters, which are there to suppress aliasing effects or removing unwanted
images. There are many approaches for the implementation of decimation and
interpolation filters, and cascaded integrator comb (CIC) filters are one of them.
CIC filters are most commonly used in the case of integer sampling rate conver-
sions and often preferred due to their simplicity, hardware efficiency, and rela-
tively good anti-aliasing (anti-imaging) characteristics for the first (last) stage of
a decimation (interpolation). The multiplierless nature, which generally yields
to low power consumption, makes CIC filters well suited for performing con-
version at higher rate. Since these filters operate at the maximum sampling
frequency, therefore, are critical with respect to power consumption. It is there-
fore necessary to have an accurate and efficient ways and approaches that could
be utilized to estimate the power consumption and the important factors that
are contributing to it. Switching activity is one such factor. To have a high-level
estimate of dynamic power consumption, switching activity equations in CIC
filters are derived, which may then be used to have an estimate of the dynamic
power consumption. The modeling of leakage power is also included, which is
an important parameter to consider since the input sampling rate may differ
several orders of magnitude. These power estimates at higher level can then be
used as a feed-back while exploring multiple alternatives.
Sampling rate conversion is a typical example where it is required to deter-
mine the values between existing samples. The computation of a value between
existing samples can alternatively be regarded as delaying the underlying signal
by a fractional sampling period. The fractional-delay filters are used in this
context to provide a fractional-delay adjustable to any desired value and are
therefore suitable for both integer and non-integer factors. The structure that
is used in the efficient implementation of a fractional-delay filter is know as
Farrow structure or its modifications. The main advantage of the Farrow struc-
v
vi Abstract
ture lies in the fact that it consists of fixed finite-impulse response (FIR) filters
and there is only one adjustable fractional-delay parameter, used to evaluate
a polynomial with the filter outputs as coefficients. This characteristic of the
Farrow structure makes it a very attractive structure for the implementation. In
the considered fixed-point implementation of the Farrow structure, closed-form
expressions for suitable word lengths are derived based on scaling and round-off
noise. Since multipliers share major portion of the total power consumption, a
matrix-vector multiple constant multiplication approach is proposed to improve
the multiplierless implementation of FIR sub-filters.
The implementation of the polynomial part of the Farrow structure is in-
vestigated by considering the computational complexity of different polynomial
evaluation schemes. By considering the number of operations of different types,
critical path, pipelining complexity, and latency after pipelining, high-level com-
parisons are obtained and used to short list the suitable candidates. Most of
these evaluation schemes require the explicit computation of higher order power
terms. In the parallel evaluation of powers, redundancy in computations is re-
moved by exploiting any possible sharing at word level and also at bit level.
As a part of this, since exponents are additive under multiplication, an ILP
formulation for the minimum addition sequence problem is proposed.
Populärvetenskaplig sammanfattning
I system där digitala signaler behandlas så kan man ibland behöva ändra da-
tahastigheten (samplingshastighet) på en redan existerande digital signal. Ett
exempel kan vara system där flera olika standarder stöds och varje standard
behöver behandlas med sin egen datahastighet. Ett annat är dataomvandlare
som ut vissa aspekter blir enklare att bygga om de arbetar vid en högre hastig-
het än vad som teoretiskt behövs för att representera all information i signalen.
För att kunna ändra hastigheten krävs i princip alltid ett digitalt filter som kan
räkna ut de värden som saknas eller se till att man säkert kan slänga bort vissa
data utan att informationen förstörs. I denna avhandling presenteras ett antal
resultat relaterat till implementeringen av sådana filter.
Den första klassen av filter är så kallade CIC-filter. Dessa används flitigt
då de kan implementeras med enbart ett fåtal adderare, helt utan mer kost-
samma multiplikatorer som behövs i många andra filterklasser, samt enkelt kan
användas för olika ändringar av datahastighet så länge ändringen av datatak-
ten är ett heltal. En modell för hur mycket effekt olika typer av implemente-
ringar förbrukar presenteras, där den största skillnaden jämfört med tidigare
liknande arbeten är att effekt som förbrukas genom läckningsströmmar är med-
tagen. Läckningsströmmar blir ett relativt sett större och större problem ju mer
kretsteknologin utvecklas, så det är viktigt att modellerna följer med. Utöver
detta presenteras mycket noggranna ekvationer för hur ofta de digitala värdena
som representerar signalerna i dessa filter statistiskt sett ändras, något som har
en direkt inverkan på effektförbrukningen.
Den andra klassen av filter är så kallade Farrow-filter. Dessa används för
att fördröja en signal mindre än en samplingsperiod, något som kan användas
för att räkna ut mellanliggande datavärden och därmed ändra datahastighet
gotdtyckligt, utan att behöva ta hänsyn till om ändringen av datatakten är
ett heltal eller inte. Mycket av tidigare arbete har handlat om hur man väljer
värden för multiplikatorerna, medan själva implementeringen har rönt mindre
intresse. Här presenteras slutna uttryck för hur många bitar som behövs i imple-
menteringen för att representera allt data tillräckligt noggrant. Detta är viktigt
eftersom antalet bitar direkt påverkar mängden kretsar som i sin tur påverkar
mängden effekt som krävs. Utöver detta presenteras en ny metod för att ersätta
multiplikatorerna med adderare och multiplikationer med två. Detta är intres-
sant eftersom multiplikationer med två kan ersättas med att koppla ledningarna
lite annorlunda och man därmed inte behöver några speciella kretsar för detta.
I Farrow-filter så behöver det även implementeras en uträkning av ett po-
lynom. Som en sista del i avhandlingen presenteras dels en undersökning av
komplexiteten för olika metoder att räkna ut polynom, dels föreslås två olika
metoder att effektivt räkna ut kvadrater, kuber och högre ordningens heltalsex-
ponenter av tal.
vii
Preface
This thesis contains research work done at the Division of Electronics Systems,
Department of Electrical Engineering, Linköping University, Sweden. The work
has been done between December 2007 and December 2011, and has resulted
in the following publications.
Paper A
The power modeling of different realizations of cascaded integrator-comb (CIC)
decimation filters, recursive and non-recursive, is extended with the modeling
of leakage power. The inclusion of this factor becomes more important when
the input sampling rate varies by several orders of magnitude. Also the im-
portance of the input word length while comparing recursive and non-recursive
implementations is highlighted.
M. Abbas, O. Gustafsson, and L. Wanhammar, “Power estimation of re-
cursive and non-recursive CIC filters implemented in deep-submicron tech-
nology,” in Proc. IEEE Int. Conf. Green Circuits Syst., Shanghai, China,
June 21–23, 2010.
Paper B
A method for the estimation of switching activity in cascaded integrator comb
(CIC) filters is presented. The switching activities may then be used to estimate
the dynamic power consumption. The switching activity estimation model is
first developed for the general-purpose integrators and CIC filter integrators.
The model was then extended to gather the effects of pipelining in the carry
chain paths of CIC filter integrators. The correlation in sign extension bits
is also considered in the switching estimation model. The switching activity
estimation model is also derived for the comb sections of the CIC filters, which
normally operate at the lower sampling rate. Different values of differential
delay in the comb part are considered for the estimation. The comparison of
theoretical estimated switching activity results, based on the proposed model,
and those obtained by simulation, demonstrates the close correspondence of
the estimation model to the simulated one. Model results for the case of phase
accumulators of direct digital frequency synthesizers (DDFS) are also presented.
M. Abbas, O. Gustafsson, and K. Johansson, “Switching activity estima-
tion for cascaded integrator comb filters,” IEEE Trans. Circuits Syst. I,
under review.
A preliminary version of the above work can be found in:
M. Abbas and O. Gustafsson, “Switching activity estimation of CIC filter
integrators,” in Proc. IEEE Asia Pacific Conf. Postgraduate Research in
Microelectronics and Electronics, Shanghai, China, Sept. 22–24, 2010.
ix
x Preface
Paper C
In this work, there are three major contributions. First, signal scaling in the
Farrow structure is studied which is crucial for a fixed-point implementation.
Closed-form expressions for the scaling levels for the outputs of each sub-filter
as well as for the nodes before the delay multipliers are derived. Second, a
round-off noise analysis is performed and closed-form expressions are derived.
By using these closed-form expressions for the round-off noise and scaling in
terms of integer bits, different approaches to find the suitable word lengths to
meet the round-off noise specification at the output of the filter are proposed.
Third, direct form sub-filters leading to a matrix MCM block is proposed, which
stems from the approach for implementing parallel FIR filters. The use of a
matrix MCM blocks leads to fewer structural adders, fewer delay elements, and
in most cases fewer total adders.
M. Abbas, O. Gustafsson, and H. Johansson, “On the implementation
of fractional delay filters based on the Farrow structure,” IEEE Trans.
Circuits Syst. I, under review.
Preliminary versions of the above work can be found in:
M. Abbas, O. Gustafsson, and H. Johansson, “Scaling of fractional delay
filters using the Farrow structure,” in Proc. IEEE Int. Symp. Circuits
Syst., Taipei, Taiwan, May 24–27, 2009.
M. Abbas, O. Gustafsson, and H. Johansson, “Round-off analysis and
word length optimization of the fractional delay filters based on the Farrow
structure,” in Proc. Swedish System-on-Chip Conf., Rusthållargården,
Arlid, Sweden, May 4–5, 2009.
Paper D
The computational complexity of different polynomial evaluation schemes is
studied. High-level comparisons of these schemes are obtained based on the
number of operations of different types, critical path, pipelining complexity,
and latency after pipelining. These parameters are suggested to consider to
short list suitable candidates for an implementation given the specifications. In
comparisons, not only multiplications are considered, but they are divided into
data-data multiplications, squarers, and data-coefficient multiplications. Their
impact on different parameters suggested for the selection is stated.
M. Abbas and O. Gustafsson, “Computational and implementation com-
plexity of polynomial evaluation schemes,” in Proc. IEEE Norchip Conf.,
Lund, Sweden, Nov. 14–15, 2011.
Paper E
The problem of computing any requested set of power terms in parallel using
summations trees is investigated. A technique is proposed, which first generates
Preface xi
the partial product matrix of each power term independently and then checks
the computational redundancy in each and among all partial product matrices
at bit level. The redundancy here relates to the fact that same three partial
products may be present in more than one columns, and, hence, all can be
mapped to the one full adder. The testing of the proposed algorithm for different
sets of powers, variable word lengths, and signed/unsigned numbers is done to
exploit the sharing potential. This approach has achieved considerable hardware
savings for almost all of the cases.
M. Abbas, O. Gustafsson, and A. Blad, “Low-complexity parallel evalua-
tion of powers exploiting bit-level redundancy,” in Proc. Asilomar Conf.
Signals Syst. Comp., Pacific Grove, CA, Nov. 7–10, 2010.
Paper F
An integer linear programming (ILP) based model is proposed for the compu-
tation of a minimal cost addition sequence for a given set of integers. Since
exponents are additive for a multiplication, the minimal length addition se-
quence will provide an efficient solution for the evaluation of a requested set of
power terms. Not only is an optimal model proposed, but the model is extended
to consider different costs for multipliers and squarers as well as controlling the
depth of the resulting addition sequence. Additional cuts are also proposed
which, although not required for the solution, help to reduce the solution time.
M. Abbas and O. Gustafsson, “Integer linear programming modeling of
addition sequences with additional constraints for evaluation of power
terms,” manuscript.
Paper G
Based on the switching activity estimation model derived in Paper B, the model
equations are derived for the case of phase accumulators of direct digital fre-
quency synthesizers (DDFS).
M. Abbas and O. Gustafsson, “Switching activity estimation of DDFS
phase accumulators,” manuscript.
The contributions are also made in the following publication but the contents
are not directly relevant or less relevant to the topic of thesis.
M. Abbas, F. Qureshi, Z. Sheikh, O. Gustafsson, H. Johansson, and K. Jo-
hansson, “Comparison of multiplierless implementation of nonlinear-phase
versus linear-phase FIR filters,” in Proc. Asilomar Conf. Signals Syst.
Comp., Pacific Grove, CA, Oct. 26–29, 2008.
Acknowledgments
I humbly thank Allah Almighty, the Compassionate, the Merciful, who gave
health, thoughts, affectionate parents, talented teachers, helping friends and an
opportunity to contribute to the vast body of knowledge. Peace and blessing of
Allah be upon the Holy Prophet MUHAMMAD (peace be upon him), the last
prophet of Allah, who exhort his followers to seek for knowledge from cradle to
grave and whose incomparable life is the glorious model for the humanity.
I would like to express my sincere gratitude towards:
My advisors, Dr. Oscar Gustafsson and Prof. Håkan Johansson, for
their inspiring and valuable guidance, enlightening discussions, kind and
dynamic supervision through out and in all the phases of this thesis. I
have learnt a lot from them and working with them has been a pleasure.
Higher Education Commission (HEC) of Pakistan is gratefully acknowl-
edged for the financial support and Swedish Institute (SI) for coordinating
the scholarship program. Linköping University is also gratefully acknowl-
edged for partial support.
The former and present colleagues at the Division of Electronics Systems,
Department of Electrical Engineering, Linköping University have created
a very friendly environment. They always kindly do their best to help
you.
Dr. Kenny Johansson for introducing me to the area and power estimation
tools and sharing many useful scripts.
Dr. Amir Eghbali and Dr. Anton Blad for their kind and constant support
throughout may stay here at the department.
Dr. Kent Palmkvist for help with FPGA and VHDL-related issues.
Peter Johansson for all the help regarding technical as well as administra-
tive issues.
Dr. Erik Höckerdal for proving the LaTeX template, which has made life
very easy.
Dr. Rashad Ramzan and Dr. Rizwan Asghar for their generous help and
guidance at the start of my PhD study.
Syed Ahmed Aamir, Muhammad Touqir Pasha, Muhammad Irfan Kazim,
and Syed Asad Alam for being caring friends and providing help in proof-
reading this thesis.
My friends here in Sweden, Zafar Iqbal, Fahad Qureshi, Ali Saeed, Saima
Athar, Nadeem Afzal, Fahad Qazi, Zaka Ullah, Muhammad Saifullah
Khan, Dr. Jawad ul Hassan, Mohammad Junaid, Tafzeel ur Rehman,
and many more for all kind of help and keeping my social life alive.
xiii
xiv Acknowledgments
My friends and colleagues in Pakistan especially Jamaluddin Ahmed, Khalid
Bin Sagheer, Muhammad Zahid, and Ghulam Hussain for all the help and
care they have provided during the last four years.
My parent-in-laws, brothers, and sisters for their encouragement and
prays.
My elder brother Dr. Qaisar Abbas and sister-in-law Dr. Uzma for their
help at the very start when I came to Sweden and throughout the years
later on. I have never felt that I am away from home. Thanks for hosting
many wonderful days spent there at Uppsala.
My younger brother Muhammad Waqas and sister for their prays and
taking care of the home in my absence.
My mother and my father for their non-stop prays, being a great asset
with me during all my stay here in Sweden. Thanks to both of you for
having confidence and faith in me. Truly you hold the credit for all my
achievements.
My wife Dr. Shazia for her devotion, patience, unconditional cooperation,
care of Abiha single handedly, and being away from her family for few
years. To my little princess, Abiha, who has made my life so beautiful.
To those not listed here, I say profound thanks for bringing pleasant mo-
ments in my life.
Muhammad Abbas
January, 2012,
Linköping Sweden
Contents
1 Introduction 1
1.1 Digital Filters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1.1 FIR Filters . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1.2 IIR Filters . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2 Sampling Rate Conversion . . . . . . . . . . . . . . . . . . . . . . 3
1.2.1 Decimation . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2.2 Interpolation . . . . . . . . . . . . . . . . . . . . . . . . . 5
1.2.3 Noble Identities . . . . . . . . . . . . . . . . . . . . . . . . 7
1.3 Polyphase Representation . . . . . . . . . . . . . . . . . . . . . . 8
1.4 Fractional-Delay Filters . . . . . . . . . . . . . . . . . . . . . . . 10
1.5 Power and Energy Consumption . . . . . . . . . . . . . . . . . . 13
2 Finite Word Length Effects 17
2.1 Numbers Representation . . . . . . . . . . . . . . . . . . . . . . . 17
2.1.1 Two’s Complement Numbers . . . . . . . . . . . . . . . . 18
2.1.2 Canonic Signed-Digit Representation . . . . . . . . . . . . 19
2.2 Fixed-Point Quantization . . . . . . . . . . . . . . . . . . . . . . 19
2.3 Overflow Characteristics . . . . . . . . . . . . . . . . . . . . . . . 21
2.3.1 Two’s Complement Addition . . . . . . . . . . . . . . . . 21
2.3.2 Two’s Complement Multiplication . . . . . . . . . . . . . 21
2.4 Scaling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
2.5 Round-Off Noise . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
2.6 Word Length Optimization . . . . . . . . . . . . . . . . . . . . . 23
2.7 Constant Multiplication . . . . . . . . . . . . . . . . . . . . . . . 24
3 Integer Sampling Rate Conversion 27
3.1 Basic Implementation . . . . . . . . . . . . . . . . . . . . . . . . 27
3.1.1 FIR Decimators . . . . . . . . . . . . . . . . . . . . . . . 28
3.1.2 FIR Interpolators . . . . . . . . . . . . . . . . . . . . . . . 29
3.2 Polyphase FIR Filters . . . . . . . . . . . . . . . . . . . . . . . . 29
3.2.1 Polyphase FIR Decimators . . . . . . . . . . . . . . . . . 30
3.2.2 Polyphase FIR Interpolators . . . . . . . . . . . . . . . . 31
xv
xvi Contents
3.3 Multistage Implementation . . . . . . . . . . . . . . . . . . . . . 31
3.4 Cascaded Integrator Comb Filters . . . . . . . . . . . . . . . . . 32
3.4.1 CIC Filters in Interpolators and Decimators . . . . . . . . 33
3.4.2 Polyphase Implementation of Non-Recursive CIC . . . . . 35
3.4.3 Compensation Filters . . . . . . . . . . . . . . . . . . . . 36
4 Non-Integer Sampling Rate Conversion 37
4.1 Sampling Rate Conversion: Rational Factor . . . . . . . . . . . . 37
4.1.1 Small Rational Factors . . . . . . . . . . . . . . . . . . . . 37
4.1.2 Polyphase Implementation of Rational Sampling Rate Con-
version . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.1.3 Large Rational Factors . . . . . . . . . . . . . . . . . . . . 40
4.2 Sampling Rate Conversion: Arbitrary Factor . . . . . . . . . . . 41
4.2.1 Farrow Structures . . . . . . . . . . . . . . . . . . . . . . 41
5 Polynomial Evaluation 45
5.1 Polynomial Evaluation Algorithms . . . . . . . . . . . . . . . . . 45
5.2 Horner’s Scheme . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
5.3 Parallel Schemes . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
5.4 Powers Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . 47
5.4.1 Powers Evaluation with Sharing at Word-Level . . . . . . 48
5.4.2 Powers Evaluation with Sharing at Bit-Level . . . . . . . 49
6 Conclusions and Future Work 53
6.1 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
6.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53
7 References 55
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
Publications 67
A Power Estimation of Recursive and Non-Recursive CIC Filters
Implemented in Deep-Submicron Technology 69
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
2 Recursive CIC Filters . . . . . . . . . . . . . . . . . . . . . . . . 72
2.1 Complexity and Power Model . . . . . . . . . . . . . . . . 73
3 Non-Recursive CIC Filters . . . . . . . . . . . . . . . . . . . . . . 76
3.1 Complexity and Power Model . . . . . . . . . . . . . . . . 76
4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
Contents xvii
B Switching Activity Estimation of Cascaded Integrator Comb
Filters 85
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
2 CIC Filters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89
3 CIC Filter Integrators/Accumulators . . . . . . . . . . . . . . . 90
3.1 One’s Probability . . . . . . . . . . . . . . . . . . . . . . . 91
3.2 Switching Activity . . . . . . . . . . . . . . . . . . . . . . 93
3.3 Pipelining in CIC Filter Accumulators/Integrators . . . . 97
3.4 Switching Activity for Correlated Sign Extension Bits . . 101
3.5 Switching Activity for Later Integrator Stages . . . . . . . 102
4 CIC Filter Combs . . . . . . . . . . . . . . . . . . . . . . . . . . 103
4.1 One’s Probability . . . . . . . . . . . . . . . . . . . . . . . 103
4.2 Switching Activity . . . . . . . . . . . . . . . . . . . . . . 104
4.3 Arbitrary Differential Delay Comb . . . . . . . . . . . . . 106
4.4 Switching Activity for Later Comb Stages . . . . . . . . . 108
5 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109
5.1 CIC Integrators . . . . . . . . . . . . . . . . . . . . . . . . 109
5.2 CIC Combs . . . . . . . . . . . . . . . . . . . . . . . . . . 110
6 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
7 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116
A Double Differential Delay Comb . . . . . . . . . . . . . . . . . . . 118
A.1 Switching Activity . . . . . . . . . . . . . . . . . . . . . . 118
C On the Implementation of Fractional-Delay Filters Based on
the Farrow Structure 123
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
1.1 Contribution of the Paper . . . . . . . . . . . . . . . . . . 126
1.2 Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
2 Adjustable FD FIR Filters and the Farrow Structure . . . . . . 127
3 Scaling of the Farrow Structure for FD FIR Filters . . . . . . . 128
3.1 Scaling the Output Values of the Sub-Filters . . . . . . . 129
3.2 Scaling in the Polynomial Evaluation . . . . . . . . . . . . 130
4 Round-Off Noise . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
4.1 Rounding and Truncation . . . . . . . . . . . . . . . . . . 134
4.2 Round-Off Noise in the Farrow Structure . . . . . . . . . 136
4.3 Approaches to Select Word Lengths . . . . . . . . . . . . 137
5 Implementation of Sub-Filters in the Farrow Structure . . . . . 143
6 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
6.1 Scaling . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145
6.2 Word Length Selection Approaches . . . . . . . . . . . . . 146
6.3 Sub-Filter Implementation . . . . . . . . . . . . . . . . . . 148
7 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
xviii Contents
D Computational and Implementation Complexity of Polynomial
Evaluation Schemes 155
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
2 Polynomial Evaluation Algorithms . . . . . . . . . . . . . . . . . 159
2.1 Horner’s Scheme . . . . . . . . . . . . . . . . . . . . . . . 159
2.2 Dorn’s Generalized Horner Scheme . . . . . . . . . . . . . 159
2.3 Estrin’s Scheme . . . . . . . . . . . . . . . . . . . . . . . . 160
2.4 Munro and Paterson’s Scheme . . . . . . . . . . . . . . . 161
2.5 Maruyama’s Scheme . . . . . . . . . . . . . . . . . . . . . 162
2.6 Even Odd (EO) Scheme . . . . . . . . . . . . . . . . . . . 163
2.7 Li et al. Scheme . . . . . . . . . . . . . . . . . . . . . . . 164
2.8 Pipelining, Critical Path, and Latency . . . . . . . . . . . 164
3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165
4 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172
E Low-Complexity Parallel Evaluation of Powers Exploiting Bit-
Level Redundancy 175
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178
2 Powers Evaluation of Binary Numbers . . . . . . . . . . . . . . . 178
2.1 Unsigned Binary Numbers . . . . . . . . . . . . . . . . . . 178
2.2 Powers Evaluation for Two’s Complement Numbers . . . 180
2.3 Partial Product Matrices with CSD Representation of
Partial Product Weights . . . . . . . . . . . . . . . . . . . 180
2.4 Pruning of the Partial Product Matrices . . . . . . . . . . 180
3 Proposed Algorithm for the Exploitation of Redundancy . . . . . 181
3.1 Unsigned Binary Numbers Case . . . . . . . . . . . . . . . 181
3.2 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
3.3 Two’s Complement Input and CSD Encoding Case . . . . 182
4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188
F Integer Linear Programming Modeling of Addition Sequences
With Additional Constraints for Evaluation of Power Terms 189
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192
2 Addition Chains and Addition Sequences . . . . . . . . . . . . . 193
3 Proposed ILP Model . . . . . . . . . . . . . . . . . . . . . . . . . 194
3.1 Basic ILP Model . . . . . . . . . . . . . . . . . . . . . . . 194
3.2 Minimizing Weighted Cost . . . . . . . . . . . . . . . . . 195
3.3 Minimizing Depth . . . . . . . . . . . . . . . . . . . . . . 196
4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196
5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201
Contents xix
G Switching Activity Estimation of DDFS Phase Accumulators 203
1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206
2 One’s Probability . . . . . . . . . . . . . . . . . . . . . . . . . . . 207
3 Switching Activity . . . . . . . . . . . . . . . . . . . . . . . . . . 207
4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211
Chapter 1
Introduction
Linear time-invariant (LTI) systems have the same sampling rate at the input,
output, and inside of the systems. In applications involving systems operating
at different sampling rates, there is a need to convert the given sampling rate to
the desired sampling rate, without destroying the signal information of interest.
The sampling rate conversion (SRC) factor can be an integer or a non-integer.
This chapter gives a brief overview of SRC and the role of filtering in SRC.
Digital filters are first introduced. The SRC, when changing the sampling
rate by an integer factor, is then explained. The time and frequency-domain
representations of the downsampling and upsampling operations are then given.
The concept of decimation and interpolation that include filtering is explained.
The description of six identities that enable the reductions in computational
complexity of multirate systems is given. A part of the chapter is devoted to
the efficient polyphase implementation of decimators and interpolators. The
fractional-delay filters are then briefly reviewed. Finally, the power and energy
consumption in CMOS circuits is described.
1.1 Digital Filters
Digital filters are usually used to separate signals from noise or signals in dif-
ferent frequency bands by performing mathematical operations on a sampled,
discrete-time signal. A digital filter is characterized by its transfer function, or
equivalently, by its difference equation.
1.1.1 FIR Filters
If the impulse response is of finite duration and becomes zero after a finite
number of samples, it is a finite-length impulse response (FIR) filter. A causal
1
2 Chapter 1. Introduction
x(n)
z
−1
z
−1
z
−1
z
−1
h(3) h(2) h(1) h(0) h(N −1) h(N)
y(n)
Figure 1.1: Direct-form realization of an N-th order FIR filter.
z
−1
h(2)
z
−1
z
−1
h(1) h(0)
x(n)
h(N −1) h(N)
y(n)
Figure 1.2: Transposed direct-form realization of an N-th order FIR filter.
FIR filter of order N is characterized by a transfer function H(z), defined as [1, 2]
H(z) =
N

k=0
h(k)z
−k
, (1.1)
which is a polynomial in z
−1
of degree N. The time-domain input-output
relation of the above FIR filter is given by
y(n) =
N

k=0
h(k)x(n −k), (1.2)
where y(n) and x(n) are the output and input sequences, respectively, and
h(0), h(1), . . . , h(N) are the impulse response values, also called filter coefficients.
The parameter N is the filter order and total number of coefficients, N + 1,
is the filter length. The FIR filters can be designed to provide exact linear-
phase over the whole frequency range and are always bounded-input bounded-
output (BIBO) stable, independent of the filter coefficients [1–3]. The direct
form structure in Fig. 1.1 is the block diagram description of the difference
equation (1.2). The transpose structure is shown in Fig. 1.2. The number of
coefficient multiplications in the direct and transpose forms can be halved when
exploiting the coefficient symmetry of linear-phase FIR filter. The total number
of coefficient multiplications will be N/2 +1 for even N and (N + 1)/2 for odd
N.
1.1.2 IIR Filters
If the impulse response has an infinite duration, i.e., theoretically never ap-
proaches zero, it is an infinite-length impulse response (IIR) filter. This type
1.2. Sampling Rate Conversion 3
z
−1
z
−1
b
N
b
N−1
b
2
b
1
b
0
z
−1
z
−1
z
−1
−a
N
−a
N−1
−a
2
−a
1
x(n) y(n)
z
−1
Figure 1.3: Direct-form I IIR realization.
of filter is recursive and represented by a linear constant-coefficient difference
equation as [1, 2]
y(n) =
N

k=0
b
k
x(n −k) −
N

k=1
a
k
y(n −k). (1.3)
The first sum in (1.3) is non-recursive and the second sum is recursive. These
two parts can be implemented separately and connected together. The cascade
connection of the non-recursive and recursive sections results in a structure
called direct-form I as shown in Fig. 1.3.
Compared with an FIR filter, an IIR filter can attain the same magnitude
specification requirements with a transfer function of significantly lower order [1].
The drawbacks are nonlinear phase characteristics, possible stability issues, and
sensitivity to quantization errors [1, 4].
1.2 Sampling Rate Conversion
Sampling rate conversion is the process of converting a signal from one sampling
rate to another, while changing the information carried by the signal as little as
possible [5–8]. SRC is utilized in many DSP applications where two signals or
systems having different sampling rates need to be interconnected to exchange
digital signal data. The SRC factor can in general be an integer, a ratio of
two integers, or an irrational number. Mathematically, the SRC factor can be
defined as
R =
F
out
F
in
, (1.4)
4 Chapter 1. Introduction
where F
in
and F
out
are the original input sampling rate and the new sampling
rate after the conversion, respectively. The sampling frequencies are chosen in
such a way that each of them exceeds at least two times the highest frequency
in the spectrum of original continuous-time signal. When a continuous-time
signal x
a
(t) is sampled at a rate F
in
, and the discrete-time samples are x(n) =
x
a
(n/F
in
), SRC is required when there is need of x(n) = x
a
(n/F
out
) and the
continuous-time signal x
a
(t) is not available anymore. For example, an analog-
to-digital (A/D) conversion system is supplying a signal data at some sampling
rate, and the processor used to process that data can only accept data at a
different sampling rate. One alternative is to first reconstruct the corresponding
analog signal and, then, re-sample it with the desired sampling rate. However,
it is more efficient to perform SRC directly in the digital domain due to the
availability of accurate all-digital sampling rate conversion schemes.
SRC is available in two flavors. For R < 1, the sampling rate is reduced and
this process is known as decimation. For R > 1, the sampling rate is increased
and this process is known as interpolation.
1.2.1 Decimation
Decimation by a factor of M, where M is a positive integer, can be performed
as a two-step process, consisting of an anti-aliasing filtering followed by an
operation known as downsampling [9]. A sequence can be downsampled with a
factor of M by retaining every M-th sample and discarding all of the remaining
samples. Applying the downsampling operation to a discrete-time signal, x(n),
produces a downsampled signal y(m) as
y(m) = x(mM). (1.5)
The time index m in the above equation is related to the old time index n
by a factor of M. The block diagram showing the downsampling operation is
shown in Fig. 1.4a The sampling rate of new discrete-time signal is M times
smaller than the sampling rate of original signal. The downsampling operation
is linear but time-varying operation. A delay in the original input signal by
some samples does not result in the same delay of the downsampled signal.
A signal downsampled by two different factors may have two different shape
output signals but both carry the same information if the downsampling factor
satisfies the sampling theorem criteria.
The frequency domain representation of downsampling can be found by tak-
ing the z-transform to both sides of (1.5) as
Y (e
jωT
) =
+∞

−∞
x(mM)e
−jωTm
=
1
M
M−1

k=0
X(e
j(ωT−2πk)/M
). (1.6)
The above equation shows the implication of the downsampling operation on
the spectrum of the original signal. The output spectrum is a sum of M uni-
formly shifted and stretched versions of X(e
jωT
) and also scaled by a factor of
1.2. Sampling Rate Conversion 5
M
y(m) x(n)
(a) M-fold downsampler.
H(z) M
y(m) x(n) x
1
(m)
(b) M-fold decimator.
Figure 1.4: M-fold downsampler and decimation.
¸
¸
¸Y (e
jωT
)
¸
¸
¸

¸
¸
¸X
1
(e
jωT
)
¸
¸
¸
π/M π
π 2π 2Mπ
¸
¸
¸H
ideal
(e
jωT
1
)
¸
¸
¸
ωT
ωT
1
Figure 1.5: Spectra of the intermediate and decimated sequence.
1/M. The signals which are bandlimited to π/M can be downsampled without
distortion.
Decimation requires that aliasing should be avoided. Therefore, the first
step is to bandlimit the signal to π/M and then downsampling by a factor M.
The block diagram of a decimator is shown in Fig. 1.4b. The performance of
a decimator is determined by the filter H(z) which is there to suppress the
aliasing effect to an acceptable level. The spectra of the intermediate sequence
and output sequence obtained after downsampling are shown in Fig. 1.5. The
ideal filter, as shown by dotted line in Fig. 1.5, should be a lowpass filter with
the stopband edge at ω
s
T
1
= π/M.
1.2.2 Interpolation
Interpolation by a factor of L, where L is a positive integer, can be realized
as a two-step process of upsampling followed by an anti-imaging filtering. The
upsampling by a factor of L is implemented by inserting L − 1 zeros between
two consecutive samples [9]. An upsampling operation to a discrete-time signal
x(n) produces an upsampled signal y(m) according to
y(m) =
_
x(m/L), m = 0, ±L, ±2L, . . . ,
0, otherwise.
(1.7)
A block diagram shown in Fig. 1.6a is used to represent the upsampling op-
6 Chapter 1. Introduction
y(m) x(n)
L
(a) L-fold upsampler.
L
x
1
(m) x(n) y(m)
H(z)
(b) L-fold interpolator.
Figure 1.6: L-fold upsampler and interpolator.
¸
¸
¸X(e
jωT
)
¸
¸
¸
¸
¸
¸H
ideal
(e
jωT
1
)
¸
¸
¸
π/L 2π/L 2π
2Lπ π 2π
2π π π/L
¸
¸
¸X
1
(e
jωT
)
¸
¸
¸
¸
¸
¸Y (e
jωT
1
)
¸
¸
¸
ωT
ωT
1
ωT
1
Figure 1.7: Spectra of the original, intermediate, and output sequences.
eration. The upsampling operation increases the sampling rate of the original
signal by L times. The upsampling operation is a linear but time-varying oper-
ation. A delay in the original input signal by some samples does not result in
the same delay of the upsampled signal. The frequency domain representation
of upsampling can be found by taking the z-transform of both sides of (1.7) as
Y (e
jωT
) =
+∞

−∞
y(m)e
−jωTm
= X(e
jωTL
). (1.8)
The above equation shows that the upsampling operation leads to L−1 images
of the spectrum of the original signal in the baseband.
Interpolation requires the removal of the images. Therefore in first step,
upsampling by a factor of L is performed, and in the second step, unwanted
images are removed using anti-imaging filter. The block diagram of an interpo-
lator is shown in Fig. 1.6b. The performance of an interpolator is determined
1.2. Sampling Rate Conversion 7
c
2
x
1
(m)
M
M
y(n) y(n)
M
c
1
c
1
c
2
x
2
(m)
x
1
(m)
x
2
(m)
(a) Noble identity 1.
z
−M
M
x(m) y(n) x(m) y(n)
M z
−1
(b) Noble identity 3.
x(m) x(m)
M M
y(n) y(n)
H(z) H(z
M
)
(c) Noble identity 5.
Figure 1.8: Noble identities for decimation.
by the filter H(z), which is there to remove the unwanted images. As shown in
Fig. 1.7, the spectrum of the sequence, x
1
(m), not only contains the baseband of
the original signal, but also the repeated images of the baseband. Apparently,
the desired sequence, y(m), can be obtained from x
1
(m) by removing these
unwanted images. This is performed by the interpolation (anti-imaging) filter.
The ideal filter should be a lowpass filter with the stopband edge at ω
s
T
1
= π/L
as shown by dotted line in Fig. 1.7.
1.2.3 Noble Identities
The six identities, called noble identities, help to move the downsampler and
upsampler operations to a more desirable position to enable an efficient imple-
mentation structure. As a result, the arithmetic operations of additions and
multiplications are to be evaluated at the lowest possible sampling rate. In
SRC, since filtering has to be performed at the higher sampling rate, the com-
putational efficiency may be improved if downsampling (upsampling) operations
are introduced into the filter structures. In the first and second identities, seen
in Figs. 1.9a and 1.8a, moving converters leads to evaluation of additions and
multiplications at lower sampling rate. The third and fourth identities, seen
in Figs. 1.9b and 1.8b, show that a delay of M (L) sampling periods at the
higher sampling rate corresponds to a delay of one sampling period at the lower
rate. The fifth and sixth identities, seen in Figs. 1.9c and 1.8c, are generalized
versions of the third and fourth identities.
8 Chapter 1. Introduction
c
2
c
1
L
x(n)
y
1
(m)
y
2
(m)
x(n)
c
1
L
c
2
L
y
1
(m)
y
2
(m)
(a) Noble identity 2.
L L z
−1
x(n) y(m) x(n) y(m)
z
−L
(b) Noble identity 4.
L
y(m)
x(n)
L
y(m) x(n)
H(z) H(z
L
)
(c) Noble identity 6.
Figure 1.9: Noble identities for interpolation.
1.3 Polyphase Representation
A very useful tool in multirate signal processing is the so-called polyphase repre-
sentation of signals and systems [5, 10]. It facilitates considerable simplifications
of theoretical results as well as efficient implementation of multirate systems. To
formally define it, an LTI system is considered with a transfer function
H(z) =
+∞

n=−∞
h(n)z
−n
. (1.9)
For an integer M, H(z) can be decomposed as
H(z) =
M−1

m=0
z
−m
+∞

n=−∞
h(nM +m)z
−nM
=
M−1

m=0
z
−m
H
m
(z
M
). (1.10)
The above representation is equivalent to dividing the impulse response h(n)
into M non-overlapping groups of samples h
m
(n), obtained from h(n) by M-
fold decimation starting from sample m. The subsequences h
m
(n) and the
corresponding z-transforms defined in (1.10) are called the Type-1 polyphase
components of H(z) with respect to M [10].
The polyphase decomposition is widely used and its combination with the no-
ble identities leads to efficient multirate implementation structures. Since each
polyphase component contains M − 1 zeros between two consecutive samples
and only nonzero samples are needed for further processing, zeros can be dis-
charged resulting in downsampled-by-M polyphase components. The polyphase
components, as a result, operate at M times lower sampling rate.
1.3. Polyphase Representation 9
z
−1
z
−1
z
−1
x(n)
M
y(m)
H
0
(z
M
)
H
1
(z
M
)
H
2
(z
M
)
H
M−1
(z
M
)
(a) Polyphase decomposition of a decimation filter.
H
1
(z)
z
−1
z
−1
y(m)
M
M
M
x(m)
H
0
(z)
H
1
(z)
y(m)
H
0
(z)
x(n)
H
M−1
(z)
H
M−1
(z)
(b) Moving downsampler to before sub-filters and replacing input structure by a commutator.
Figure 1.10: Polyphase implementation of a decimator.
An efficient implementation of decimators and interpolators results if the
filter transfer function is represented in polyphase decomposed form [10]. The
filter H(z) in Fig. 1.4b is represented by its polyphase representation form as
shown in Fig. 1.10a. A more efficient polyphase implementation and its equiva-
lent commutative structure is shown in Fig. 1.10b. Similarly, the interpolation
filter H(z) in Fig. 1.6b is represented by its polyphase representation form as
shown in Fig. 1.11a. Its equivalent structure, but more efficient in hardware
implementation, is shown in Fig. 1.11b.
For FIR filters, the polyphase decomposition into low-order sub-filters is very
easy. However, for IIR filters, the polyphase decomposition is not so simple, but
it is possible to do so [10]. An IIR filter has a transfer function that is a ratio
of two polynomials. The representation of the transfer function into the form,
(1.10), needs some modifications in the original transfer function in such a way
that the denominator is only function of powers of z
M
or z
L
, where M and L
10 Chapter 1. Introduction
y(m)
z
−1
z
−1
z
−1
H
0
(z
L
)
H
1
(z
L
)
H
2
(z
L
)
H
L−1
(z
L
)
L
x(n)
(a) Polyphase decomposition of an interpolation filter.
x(n)
z
−1
z
−1
y(m)
x(n) y(m)
H
0
(z)
H
1
(z)
H
0
(z)
H
1
(z)
H
L−1
(z) H
L−1
(z)
L
L
L
(b) Moving upsampler to after sub-filters and replacing output structure by a commutator.
Figure 1.11: Polyphase implementation of an interpolator.
are the polyphase decomposition factors. Several approaches are available in
the literature for the polyphase decomposition of the IIR filters. In the first
approach [11], the original IIR transfer function is re-arranged and transformed
into (1.10). The polyphase sub-filters in the second approach has distinct all-
pass sub-filters [12–15].
In this thesis, only FIR filters and their polyphase implementation forms are
considered for the sampling rate conversion.
1.4 Fractional-Delay Filters
Fractional-delay (FD) filters find applications in, for example, mitigation of
symbol synchronization errors in digital communications [16–19], time-delay
estimation [20–22], echo cancellation [23], and arbitrary sampling rate conver-
1.4. Fractional-Delay Filters 11
0 0.2 0.4 0.6 0.8 1
0
0.5
1
1.5
2
2.5
3
3.5
ωT/π
P
h
a
s
e

d
e
l
a
y
d=0.9
d=0.8
d=0.7
d=0.6
d=0.5
d=0.4
d=0.3
d=0.2
d=0.1
Figure 1.12: Phase-delay characteristics of FD filters designed for delay param-
eter d = ¦0.1, 0.2, . . . , 0.9¦.
sion [24–26]. FD filters are used to provide a fractional delay adjustable to any
desired value. Ideally, the output y(n) of an FD filter for an input x(n) is given
by
y(n) = x(n −D), (1.11)
where D is a delay. Equation (1.11) is valid for integer values of D only. For
non-integer values of D, (1.11) need to be approximated. The delay parameter
D can be expressed as
D = D
int
+d, (1.12)
where D
int
is the integer part of D and d is the FD. The integer part of the delay
can then be implemented as a chain of D
int
unit delays. The FD d however needs
approximation. In the frequency domain, an ideal FD filter can be expressed as
H
des
(e

) = e
−j(D
int
+d)ωT
. (1.13)
The ideal FD filter in (1.13) can be considered as all-pass and having a linear-
phase characteristics. The magnitude and phase responses are
[H
des
(e
jωT
)[ = 1 (1.14)
and
φ(ωT) = −(D
int
+d)ωT. (1.15)
12 Chapter 1. Introduction
0 0.2 0.4 0.6 0.8 1
0
0.2
0.4
0.6
0.8
1
ωT/π
M
a
g
n
i
t
u
d
e
d=0
d=0.9, 0.1
d=0.8, 0.2
d=0.7, 0.3
d=0.6, 0.4
d=0.5
Figure 1.13: Magnitude of FD filters designed for delay parameter d =
¦0.1, 0.2, . . . , 0.9¦.
For the approximation of the ideal filter response in (1.11), a wide range
of FD filters have been proposed based on FIR and IIR filters [27–34]. The
phase delay and magnitude characteristics of the FD filter based on first order
Lagrange interpolation [32, 35] are shown in Figs. 1.12 and 1.13. The underlying
continuous-time signal x
a
(nT), delayed by a fractional-delay d, can be expressed
as
y(nT) = x
a
(nT −D
int
T −dT), (1.16)
and it is demonstrated for d = 0.2 and d = 0.4 in Fig. 1.14 .
In the application of FD filters for SRC, the fractional-delay d is changed at
every instant an output sample occurs. The input and output rates will now be
different. If the sampling rate is required to be increased by a factor of two, for
each input sample there are now two output samples. The fractional-delay d
assumes the value 0 and 0.5 for each input sample, and the two corresponding
output samples are computed. For a sampling rate increase by a factor of L,
d will take on all values in a sequential manner between ¦(L − 1)/L, 0¦, with
a step size of 1/L. For each input sample, L output samples are generated.
Equivalently, it can be interpreted as delaying the underlying continuous-time
signal by L different values of fractional-delays.
1.5. Power and Energy Consumption 13
−2 0 2 4 6 8 10
−1
−0.5
0
0.5
1
1.5
Sample index n
A
m
p
l
i
t
u
d
e


d=0
d=0.2
d=0.4
Figure 1.14: The underlying continuous-time signal delayed by d = 0.2 and
d = 0.4 with FD filtering.
1.5 Power and Energy Consumption
Power is dissipated in the form of heat in digital CMOS circuits. The power
dissipation is commonly divided into three different sources: [36, 37]
Dynamic or switching power consumption
Short circuit power consumption
Leakage power consumption
The above sources are summarized in an equation as
P
avg
= P
dynamic
+P
short-circuit
+P
leakage
= α
0→1
C
L
V
2
DD
f
clk
+I
sc
V
DD
+I
leakage
V
DD
. (1.17)
The switching or dynamic power consumption is related to charging and
discharging of a load capacitance C
L
through the PMOS and NMOS transistors
during low-to-high and high-to low transitions at the output, respectively. The
total energy drawn from the power supply for low-to-high transition, seen in
Fig. 1.15a, is C
L
V
2
DD
, half of which is dissipated in the form of heat through the
PMOS transistors while the other half is stored in the load capacitor. During
the pull-down, high to low transition, seen in Fig. 1.15b, the energy stored on
C
L
which is C
L
V
2
DD
/2 is dissipated as heat by the NMOS transistors. If all these
transitions occur at the clock rate of f
clk
then the switching power consumption
14 Chapter 1. Introduction
V
in
V
DD
V
out
C
L
(a) A rising output transition on the CMOS in-
verter.
V
DD
V
in
V
out
C
L
(b) A falling output transition on the inverter.
Figure 1.15: A rising and falling output transition on the CMOS inverter. The
solid arrows represent the charging and discharging of the load capacitance C
L
.
The dashed arrow is for the leakage current.
is given by C
L
V
2
DD
f
clk
. However the switching of the data is not always at the
clock rate but rather at some reduced rate which is best defined by another
parameter α
0→1
, defined as the average number of times in each cycle that a
node makes a transition from low to high. All the parameters in the dynamic
power equation, except α
0→1
, are defined by the layout and specification of the
circuit.
The second power term is due to the direct-path short circuit current, I
sc
,
which flows when both of the PMOS and NMOS transistors are active simulta-
neously, resulting in a direct path from supply to ground.
Leakage power, on the other hand, is dissipated in the circuits when they are
idle, as shown in Figs. 1.15a and 1.15b by a dashed line. The leakage current,
I
leakage
, consists of two major contributions: I
sub
and I
gate
. The term I
sub
is
the sub-threshold current caused by low threshold voltage, when both NMOS
1.5. Power and Energy Consumption 15
and PMOS transistors are off. The other quantity, I
gate
, is the gate current
caused by reduced thickness of the gate oxide in deep sub-micron process . The
contribution of the reverse leakage currents, due to of reverse bias between
diffusion regions and wells, is small compared to sub-threshold and gate leakage
currents.
In modern/concurrent CMOS technologies, the two foremost forms of power
consumptions are dynamic and leakage. The relative contribution of these two
forms of power consumptions has greatly evolved over the period of time. To-
day, when technology scaling motivates the reduced power supply and threshold
voltage, the leakage component of power consumption has started to become
dominant [38–40]. In today’s processes, sub-threshold leakage is the main con-
tributor to the leakage current.
Chapter 2
Finite Word Length Effects
Digital filters are implemented in hardware with finite-precision numbers and
arithmetic. As a result, the digital filter coefficients and internal signals are
represented in discrete form. This generally leads to two different types of finite
word length effects.
First, there are the errors in the representing of coefficients. The coefficients
representation in finite precision (quantization) has the effect of a slight change
in the location of the filter poles and zeros. As a result, the filter frequency
response differs from the response with infinite-precision coefficients. However,
this error type is deterministic and is called coefficient quantization error.
Second, there are the errors due to multiplication round-off, that results
from the rounding or truncation of multiplication products within the filter.
The error at the filter output that results from these roundings or truncations
is called round-off noise.
This chapter outlines the finite word length effects in digital filters. It first
discusses binary number representation forms. Different types of fixed-point
quantizations are then introduced along with their characteristics. The overflow
characteristics in digital filters are briefly reviewed with respect to addition and
multiplication operations. Scaling operation is then discussed which is used
to prevent overflows in digital filter structures. The computation of round-
off noise at the digital filter output is then outlined. The description of the
constant coefficient multiplication is then given. Finally, different approaches
for the optimization of word length are reviewed.
2.1 Numbers Representation
In digital circuits, a number representation with a radix of two, i.e., binary
representation, is most commonly used. Therefore, a number is represented by
17
18 Chapter 2. Finite Word Length Effects
a sequence of binary digits, bits, which are either 0 or 1. A w-bit unsigned
binary number can be represented as
X = x
0
x
1
x
2
...x
w−2
x
w−1
, (2.1)
with a value of
X =
w−1

i=0
x
i
2
w−i−1
, (2.2)
where x
0
is the most significant bit (MSB) and x
w−1
is the least significant bit
(LSB) of the binary number.
A fixed-point number consists of an integral part and a fractional part, with
the two parts separated by a binary point in radix of two. The position of the
binary point is almost always implied and thus the point is not explicitly shown.
If a fixed-point number has w
I
integer bits and w
F
fractional bits, it can be
expressed as
X = x
w
I
−1
. . . x
1
x
0
.x
−1
x
−2
. . . x
−w
F
. (2.3)
The value can be obtained as
X =
w
I
−1

i=0
x
i
2
i
+
−1

i=−w
F
x
i
2
i
. (2.4)
2.1.1 Two’s Complement Numbers
For a suitable representation of numbers and an efficient implementation of
arithmetic operation, fixed-point arithmetics with a word length of w bits is
considered. Because of its special properties, the two’s complement representa-
tion is considered, which is the most common type of arithmetic used in digital
signal processing. The numbers are usually normalized to [−1, 1), however, to
accommodate the integer bits, the range [−2
w
I
, −2
w
I
), where w
I
∈ N, is as-
sumed. The quantity w
I
denotes the number of integer bits. The MSB, the
left-most bit in w, is used as the sign bit. The sign bit is treated in the same
manner as the other bits. The fraction part is represented with w
F
= w−1−w
I
bits. The quantization step is as a result ∆ = 2
−w
F
.
If X
2C
is a w-bit number in two’s complement form, then by using all defi-
nitions considered above, X can be represented as
X
2C
= 2
w
I
_
−x
0
2
0
+
w−1

i=1
x
i
2
−i
_
, x
i
∈ ¦0, 1¦, i = 0, 1, 2, . . . , w −1,
= −x
0
2
w
I
. ¸¸ .
sign bit
+
w
I

i=1
x
i
2
w
I
−i
. ¸¸ .
integer
+
w−1

i=w
I
+1
x
i
2
w
I
−i
. ¸¸ .
fraction
,
2.2. Fixed-Point Quantization 19
or in compact form as
X
2C
= [ x
0
.¸¸.
sign bit
[ x
1
x
2
. . . x
w
I
. ¸¸ .
integer
[ x
w
I
+1
. . . x
w−1
. ¸¸ .
fraction
]
2
. (2.5)
In two’s complement, the range of representable numbers is asymmetric. The
largest number is
X
max
= 2
w
I
−2
−w
F
= [0[1 . . . 1[1 . . . 1[]
2
, (2.6)
and the smallest number is
X
min
= −2
w
I
= [1[0 . . . 0[0 . . . 0[]
2
. (2.7)
2.1.2 Canonic Signed-Digit Representation
Signed-digit (SD) numbers differ from the binary representation, since the digits
are allowed to take negative values, i.e., x
i
∈ ¦−1, 0, 1¦. The symbol 1 is
also used to represent −1. It is a redundant number system, as different SD
representations are possible of the same integer value. The canonic signed-digit
(CSD) representation is a special case of signed-digit representation in that each
number has a unique representation. The other feature of CSD representation
is that a CSD binary number has the fewest number of non-zero digits with no
consecutive bits being non-zero [41].
A number can be represented in CSD form as
X =
w−1

i=0
x
i
2
i
,
where, x
i
∈ ¦−1, 0, +1¦ and x
i
x
i+1
= 0, i = 0, 1, . . . , w −2.
2.2 Fixed-Point Quantization
Three types of fixed-point quantization are normally considered, rounding, trun-
cation, and magnitude truncation [1, 42, 43]. The quantization operator is
denoted by Q(.). For a number X, the rounded value is denoted by Q
r
(X),
the truncated value by Q
t
(X), and the magnitude truncated value Q
mt
(X). If
the quantized value has w
F
fractional bits, the quantization step size, i.e., the
difference between the adjacent quantized levels, is
∆ = 2
−w
F
(2.8)
The rounding operation selects the quantized level that is nearest to the un-
quantized value. As a result, the rounding error is at most ∆/2 in magnitude
as shown in Fig. 2.1a. If the rounding error,
r
, is defined as

r
= Q
r
(X) −X, (2.9)
20 Chapter 2. Finite Word Length Effects
∆/2
∆/2
(a) Rounding error.


(b) Truncation error.


(c) Magnitude truncation er-
ror.
Figure 2.1: Quantization error characteristics.
then


2

r


2
. (2.10)
Truncation simply discards the LSB bits, giving a quantized value that is
always less than or equal to the exact value. The error characteristics in the
case of truncation are shown in Fig. 2.1b. The truncation error is
−∆ <
t
≤ 0. (2.11)
Magnitude truncation chooses the nearest quantized value that has a magni-
tude less than or equal to the exact value, as shown in Fig. 2.1c, which implies
−∆ <
mt
< ∆. (2.12)
The quantization error can often be modeled as a random variable that has
a uniform distribution over the appropriate error range. Therefore, the filter
calculations involving round-off errors can be assumed error-free calculations
that have been corrupted by additive white noise [43]. The mean and variance
of the rounding error is
m
r
=
1

∆/2
_
−∆/2

r
d
r
= 0 (2.13)
and
σ
2
r
=
1

∆/2
_
−∆/2
(
r
−m
r
)
2
d
r
=

2
12
. (2.14)
Similarly, for truncation, the mean and variance of the error are
m
t
= −

2
and σ
2
t
=

2
12
, (2.15)
2.3. Overflow Characteristics 21
and for magnitude truncation,
m
mt
= 0 and σ
2
mt
=

2
3
. (2.16)
2.3 Overflow Characteristics
With finite word length, it is possible for the arithmetic operations to overflow.
This happens for fixed-point arithmetic ,e.g., when two numbers of the same sign
are added to give a value having a magnitude not in the interval [−2
w
I
, 2
w
I
).
Since numbers outside this range are not representable, the result overflows.
The overflow characteristics of two’s complement arithmetic can be expressed
as
X
2C
(X) =
_
_
_
X −2
w
I
+1
, X ≥ 2
w
I
,
X, −2
w
I
≤ X < 2
w
I
,
X + 2
w
I
+1
, X < −2
w
I
,
(2.17)
and graphically it is shown in Fig. 2.2.
−2
w
I
2
w
I
−∆
X
2C
(X)
X
Figure 2.2: Overflow characteristics for two’s complement arithmetic.
2.3.1 Two’s Complement Addition
In two’s complement arithmetic, when two numbers each having w-bits are
added together, the result will be w + 1 bits. To accommodate this extra bit,
the integer bits need to be extended. In two’s complement, such overflows can
be seen as discarding the extra bit, which corresponds to a repeated addition
or subtraction of 2
(w
I
+1)
to make the w + 1-bit result to be representable by
w-bits. This model for overflow is illustrated in Fig. 2.3.
2.3.2 Two’s Complement Multiplication
In the case of multiplication of two fixed-point numbers each having w-bits, the
result is 2w-bits. Overflow is similarly treated here as in the case of addition,
a repeated addition or subtraction of 2
(w
I
+1)
. Having two numbers, each with
a precision w
F
, the product is of precision 2w
F
. The number is reduced to w
F
22 Chapter 2. Finite Word Length Effects
y y y
d 2
(w
I
+1)
X
1
X
2
X
2
X
1
X
2
X
1
Figure 2.3: Addition in two’s complement. The integer d ∈ Z has to assign a
value such that y ∈ [−2
w
I
, 2
w
I
).
y
Q
y y
d 2
(w
I
+1)

X
1
X
2
X
2
X
1
X
2
X
1
Figure 2.4: Multiplication in two’s complement. The integer d ∈ Z has to assign
a value such that y ∈ [−2
w
I
, 2
w
I
).
precision again by using rounding or truncation. The model to handle overflow
in multiplication is shown in Fig. 2.4.
2.4 Scaling
To prevent overflow in fixed-point filter realizations, the signal levels inside
the filter can be reduced by inserting scaling multipliers. However, the scaling
multipliers should not distort the transfer function of the filter. Also the signal
levels should not be too low, otherwise, the signal-to-noise (SNR) ratio will
suffer as the noise level is fixed for fixed-point arithmetic.
The use of two’s complement arithmetic eases the scaling, as repeated addi-
tions with an overflow can be acceptable if the final sum lies within the proper
signal range [4]. However, the inputs to non-integer multipliers must not over-
flow. In the literature, there exist several scaling norms that compromise be-
tween the probability of overflows and the round-off noise level at the output.
In this thesis, only the commonly employed L
2
-norm is considered which for a
Fourier transform H(e
jωT
) is defined as
_
_
H(e
jωT
)
_
_
2
=
¸
¸
¸
¸
_
1

π
_
−π
[H(e
jωT
)[
2
d(ωT).
In particular, if the input to a filter is Gaussian white noise with a certain
probability of overflow, using L
2
-norm scaling of a node inside the filter, or at
the output, implies the same probability of overflow at that node.
2.5. Round-Off Noise 23
2.5 Round-Off Noise
A few assumptions need to be made before computing the round-off noise at the
digital filter output. Quantization noise is assumed to be stationary, white, and
uncorrelated with the filter input, output, and internal variables. This assump-
tion is valid if the filter input changes from sample to sample in a sufficiently
random-like manner [43].
For a linear system with impulse response g(n), excited by white noise with
mean m
x
and variance σ
2
x
, the mean and variance of the output noise is
m
y
= m
x

n=−∞
g(n) (2.18)
and
σ
2
y
= σ
2
x

n=−∞
g
2
(n), (2.19)
where g(n) is the impulse response from the point where a round-off takes place
to the filter output. In case there is more than one source of roundoff error
in the filter, the assumption is made that these errors are uncorrelated. The
round-off noise variance at the output is the sum of contributions from each
quantization error source.
2.6 Word Length Optimization
As stated earlier in the chapter, the quantization process introduces round-off
errors, which in turn measures the accuracy of an implementation. The cost
of an implementation is generally required to be minimized, while still satis-
fying the system specification in terms of implementation accuracy. Excessive
bit-width allocation will result in wasting valuable hardware resources, while
in-sufficient bit-width allocation will result in overflows and violate precision re-
quirements. The word length optimization approach trades precisions for VLSI
measures such as area, power, and speed. These are the measures or costs by
which the performance of a design is evaluated. After word length optimization,
the hardware implementation of an algorithm will be efficient typically involv-
ing a variety of finite precision representation of different sizes for the internal
variables.
The first difficulty in the word length optimization problem is defining of the
relationship of word length to considered VLSI measures. The possible ways
could be closed-form expressions or the availability of precomputed values in
the form of a table of these measures as a function of word lengths. These
closed-form expressions and precomputed values are then used by the word
length optimization algorithm at the word length assignment phase to have an
estimate, before doing an actual VLSI implementation.
24 Chapter 2. Finite Word Length Effects
The round-off noise at the output is considered as the measure of perfor-
mance function because it is the primary concern of many algorithm designers.
The round-off error is a decreasing function of word length, while VLSI mea-
sures such as area, speed, and power consumption are increasing functions of
word length. To derive a round-off noise model, an LTI system with n quanti-
zation error sources is assumed. This assumption allows to use superposition of
independent noise sources to compute the overall round-off noise at the output.
The noise variance at the output is then written as
σ
2
o
=

k=0
σ
2
e
i
h
2
i
(k), 1 ≤ i ≤ n, (2.20)
where e
i
is the quantization error source at node i and h
i
(k) is the impulse
response from node i to the output. If the quantization word length w
i
is
assumed for error source at node i then
σ
2
e
i
=
2
−2w
i
12
, 1 ≤ i ≤ n. (2.21)
The formulation of an optimization problem is done by constraining the cost or
accuracy of an implementation while optimizing other metric(s). For simplicity,
the cost function is assumed to be the area of the design. Its value is measured
appropriately to the considered technology, and it is assumed to be the function
of quantization word lengths, f(w
i
), i = 1, 2, . . . , n . The performance function,
on other hand, is taken to be the round-off noise value at the output, given in
(2.20), due to the limiting of internal word lengths. As a result, one possible
formulation of word length optimization problem is
minimize area : f(w
i
), 1 ≤ i ≤ n (2.22)
s.t. σ
2
o
≤ σ
spec
, (2.23)
where σ
spec
is the required noise specification at the output.
The problem of word length optimization has received considerable research
attention. In [44–48], different search-based strategies are used to find suitable
word length combinations. In [49], the word length allocation problem is solved
using a mixed-integer linear programming formulation. Some other approaches,
e.g., [50–52], have constrained the cost, while optimizing the other metric(s).
2.7 Constant Multiplication
A multiplication with a constant coefficient, commonly used in DSP algorithms
such as digital filters [53], can be made multiplierless by using additions, sub-
tractions, and shifts only [54]. The complexity for adders and subtracters is
roughly the same so no differentiation between the two is normally considered.
A shift operation in this context is used to implement a multiplication by a
2.7. Constant Multiplication 25
− −

x
x
x
y
¸3
¸4
¸2 ¸2 ¸2
y
¸2
y

¸2
Figure 2.5: Different realizations of multiplication with the coefficient 45. The
symbol ¸i are used to represent i left shifts.
factor of two. Most of the work in the literature has focused on minimizing
the adder cost [55–57]. For bit parallel arithmetic, the shifts can be realized
without any hardware using hardwiring.
In constant coefficient multiplication, the hardware requirements depend on
the coefficient value, e.g., the number of ones in the binary representation of the
coefficient value. The constant coefficient multiplication can be implemented by
the method that is based on the CSD representation of the constant coefficient
[58], or more efficiently by using other structures as well that require fewer
number of operations [59]. Consider, for example, the coefficient 45, having
the CSD representation 1010101. The multiplication with this constant can be
realized by three different structures as shown in Fig. 2.5, varying with respect
to number of additions and shifts requirement [41].
In some applications, one signal is required to be multiplied by several con-
stant coefficients, as in the case of transposed direct-form FIR filters shown
in 1.2. Realizing the set of products of a single multiplicand is known as the
multiplier block problem [60] or the multiple constant multiplications (MCM)
problem [61]. A simple way to implement multiplier blocks is to realize each
multiplier separately. However, they can be implemented more efficiently by us-
ing structures that remove any redundant partial results among the coefficients
and thereby reduce the overall number of operations. The MCM algorithms
can be divided into three groups based on the approach used in the algorithms;
sub-expression sharing [61–65], difference methods [66–70], and graph based
methods [60, 71–73].
The MCM concepts can be further generalized to computations involving
multiple inputs and multiple outputs. This corresponds to a matrix-vector
26 Chapter 2. Finite Word Length Effects
multiplication with a matrix with constant coefficients. This is the case for
linear transforms such as the discrete cosine transform (DCT) or the discrete
Fourier transform (DFT), but also FIR filter banks [74], polyphase decomposed
FIR filters [75], and state space digital filters [4, 41]. Matrix-vector MCM
algorithms include [76–80].
Chapter 3
Integer Sampling Rate Conversion
The role of filters in sampling rate conversion has been described in Chapter 1.
This chapter mainly focuses on the implementation aspects of decimators and
interpolators due to the fact that filtering initially has to be performed on the
side of higher sampling rate. The goal is then to achieve conversion structures
allowing the arithmetic operations to be performed at the lower sampling rate.
The overall computational workload will as a result be reduced.
The chapter begins with the basic decimation and interpolation structures
that are based on FIR filters. A part of the chapter is then devoted to the
efficient polyphase implementation of FIR decimators and interpolators. The
concept of the basic cascade integrator-comb (CIC) filter is introduced and
its properties are discussed. The structures of the CIC-based decimators and
interpolators are then shown. The overall two-stage implementation of a CIC
and an FIR filter that is used for the compensation is described. Finally, the
non-recursive implementation of the CIC filter and its multistage and polyphase
implementation structures are presented.
3.1 Basic Implementation
Filters in SRC have an important role and are used as anti-aliasing filters in
decimators or anti-imaging filters in interpolators. The characteristics of these
filters then correlate to the overall performance of a decimator or of an interpola-
tor. As discussed in Chapter 1, filtering operations need to be performed at the
higher sampling rate. In decimation, filtering operation precedes downsampling
and in interpolation, filtering operation proceeds upsampling. However, down-
sampling or upsampling operations can be moved into the filtering structures,
providing arithmetic operations to be performed at the lower sampling rate. As
a result, the overall computational workload in the sampling rate conversion
system can potentially be reduced by the conversion factor M (L).
27
28 Chapter 3. Integer Sampling Rate Conversion
x(n)
z
−1
z
−1
z
−1
z
−1
M
h(3) h(2) h(1) h(0)
y(m)
h(N −1) h(N)
Figure 3.1: Cascade of direct-form FIR filter and downsampler.
x(n)
z
−1
z
−1
z
−1
M M M M
h(0) h(1) h(3)
z
−1
M
M
h(2)
y(m)
h(N −1) h(N)
Figure 3.2: Computational efficient direct-form FIR decimation structure.
In the basic approach [81], FIR filters are considered as anti-aliasing or anti-
imaging filters. The non-recursive nature of FIR filters provides the opportunity
to improve the efficiency of FIR decimators and interpolators through polyphase
decomposition.
3.1.1 FIR Decimators
In the basic implementation of an FIR decimator, a direct-form FIR filter is
cascaded with the downsampling operation as shown in Fig. 3.1. The FIR fil-
ter acts as the anti-aliasing filter. The implementation can be modified to a
form that is computationally more efficient, as seen in Fig. 3.2. The downsam-
pling operation precedes the multiplications and additions which results in the
arithmetic operations to be performed at the lower sampling rate.
In Fig. 3.2, the input data is now read simultaneously from the delays at
every M:th instant of time. The input sample values are then multiplied by the
filter coefficients h(n) and combined together to give y(m).
In the basic implementation of the FIR decimator in Fig. 3.1, the number
of the multiplications per input sample in the decimator is equal to the FIR
filter length N + 1. The first modification made by the use of noble identity,
seen in Fig. 3.2, reduces the multiplications per input sample to (N + 1)/M.
The symmetry of coefficients may be exploited in the case of linear phase FIR
filter to reduce the number of multiplications to (N + 1)/2 for odd N and
N/2 + 1 for even N. In the second modification, for a downsampling factor
M, multiplications per input sample is (N + 2)/2M for even N. Similarly, the
3.2. Polyphase FIR Filters 29
z
−1
L
x(n)
h(2)
z
−1
z
−1
h(1) h(0)
y(m)
h(N −1) h(N)
Figure 3.3: Cascade of upsampler and transposed direct-form FIR filter.
h(2) h(1) h(0)
y(m)
z
−1
z
−1
z
−1
L L L L L
x(n)
h(N −1) h(N)
Figure 3.4: Computational efficient transposed direct-form FIR interpolation
structure.
multiplications per input sample are (N + 1)/(2M) for odd N.
3.1.2 FIR Interpolators
The interpolator is a cascade of an upsampler followed by an FIR filter that acts
as an anti-imaging filter, seen in Fig. 3.3. Similar to the case of decimator, the
upsampling operation is moved into the filter structure at the desired position
such that the multiplication operations are performed at the lower sampling-rate
side, as shown in Fig. 3.4.
In the basic multiplication of the FIR interpolator in Fig. 3.3, every L:th
input sample to the filter is non-zero. Since there is no need to multiply the
filter coefficients by the zero-valued samples, multiplications need to be per-
formed at the sampling rate of the input signal. The result is then upsampled
by the desired factor L. This modified implementation approach reduces the
multiplications count per output sample from N + 1 to (N + 1)/L. Similar to
the decimator case, the coefficient symmetry of the linear phase FIR filter can
be exploited to reduce the number of multiplication further by two.
3.2 Polyphase FIR Filters
As described earlier in Sec. 3.1, the transfer function of an FIR filter can be
decomposed into parallel structures based on the principle of polyphase decom-
position. For a factor of M polyphase decomposition, the FIR filter transfer
30 Chapter 3. Integer Sampling Rate Conversion
T T T
y(m)
x(n)
Figure 3.5: Polyphase factor of M FIR decimator structure.
function is decomposed into M low-order transfer functions called the polyphase
components [5, 10]. The individual contributions from these polyphase com-
ponents when added together has the same effect as of the original transfer
function.
If the impulse response h(n) is assumed zero outside 0 ≤ n ≤ N, then the
factors H
m
(z) in (1.10) becomes
H
m
(z) =
(N+1)/M

n=0
h(nM +m)z
−n
, 0 ≤ m ≤ M −1. (3.1)
3.2.1 Polyphase FIR Decimators
The basic structure of the polyphase decomposed FIR filter is the same as
that in Fig. 1.10a, however the sub-filters H
m
(z) are now specifically defined
in (3.1). The arithmetic operations in the implementation of all FIR sub-filters
still operate at the input sampling rate. The downsampling operation at the
output can be moved into the polyphase branches by using the more efficient
polyphase decimation implementation structure in Fig. 1.10. As a result, the
arithmetic operations inside the sub-filters now operate at the lower sampling
rate. The overall computational complexity of the decimator is reduced by a
factor of M.
The input sequences to the polyphase sub-filters are combination of delayed
and downsampled values of the input which can be directly selected from the
input with the use of a commutator. The resulting polyphase architecture for a
factor-of-M decimation is shown in Fig. 3.5. The commutator operates at the
3.3. Multistage Implementation 31
T T T
x(n)
y(m)
Figure 3.6: Polyphase factor of L FIR interpolator structure.
input sampling rate but the sub-filters operate at the output sampling rate.
3.2.2 Polyphase FIR Interpolators
The polyphase implementation of an FIR filter, by a factor L, is done by decom-
posing the original transfer function into L polyphase components, also called
polyphase sub-filters. The basic structure of the polyphase decomposed FIR
filter is same as that in the Fig. 1.11a. The polyphase sub-filters H
m
(z) are
defined in (3.1) for the FIR filter case. In the basic structure, the arithmetic
operations in the implementation of all FIR sub-filters operate at the upsam-
pled rate. The upsampling operation at the input is moved into the polyphase
branches by using the more efficient polyphase interpolation implementation
structure in Fig. 1.11. More efficient interpolation structure will result because
the arithmetic operations inside the sub-filters now operate at the lower sam-
pling rate. The overall computational complexity of the interpolator is reduced
by a factor of L.
On the output side, the combination of upsamplers, delays, and adders are
used to feed the correct interpolated output sample. The same function can be
replaced directly by using a commutative switch. The resulting architecture for
factor of L interpolator is shown in Fig. 3.6.
3.3 Multistage Implementation
A multistage decimator with K stages, seen in Fig. 3.7, can be used when
the decimation factor M can be factored into the product of integers, M =
M
1
M
2
, . . . , M
k
, instead of using a single filter and factor of M downsampler.
Similarly, a multistage interpolator with K stages, seen in Fig. 3.8, can be used if
32 Chapter 3. Integer Sampling Rate Conversion
H
1
(z) M
1
H
2
(z)
M
2
H
K
(z)
M
K
Figure 3.7: Multistage implementation of a decimator.
H
1
(z) H
2
(z) H
J
(z) L
1
L
2
L
J
Figure 3.8: Multistage implementation of an interpolator.
the interpolation factor L can be factored as L = L
1
L
2
L
K
. The noble
identities can be used to transform the multistage decimator implementation
into the equivalent single stage structure as shown in Fig. 3.9. Similarly, the
single stage equivalent of the multistage interpolator is shown in Fig. 3.10. An
optimum realization depends on the proper selection of K and J and best
ordering of the multistage factors [82].
The multistage implementations are used when there is need of implementing
large sampling rate conversion factors [1, 82]. Compared to the single stage
implementation, it relaxes the specifications of individual filters. A single stage
implementation with a large decimation (interpolation) factor requires a very
narrow passband filter, which is hard from the complexity point of view [3, 81,
82].
3.4 Cascaded Integrator Comb Filters
An efficient architecture for a high decimation-rate filter is the cascade integrator
comb (CIC) filter introduced by Hogenauer [83]. The CIC filter has proven to be
an effective element in high-decimation or interpolation systems [84–87]. The
simplicity of implementation makes the CIC filters suitable for operating at
higher frequencies. A single stage CIC filter is shown in Fig. 3.11. It is a
cascade of integrator and comb sections. The feed forward section of the CIC
filter, with differential delay R, is comb section and the feedback section is called
an integrator. The transfer function of a single stage CIC filter is given as
H(z) =
1 −z
−R
1 −z
−1
=
R−1

n=0
z
−n
. (3.2)
H
1
(z)H
2
(z
M
1
)H
3
(z
M
1
M
2
) . . . H
J
(z
M
1
M
2
...M
J−1
) M
1
M
2
. . . M
J
Figure 3.9: Single stage equivalent of a multistage decimator.
3.4. Cascaded Integrator Comb Filters 33
L
1
L
2
. . . L
J
H
1
(z)H
2
(z
L
1
)H
3
(z
L
1
L
2
) . . . H
J
(z
L
1
L
2
...L
J−1
)
Figure 3.10: Single stage equivalent of a multistage interpolator.
z
−1

x(n) y(m)
z
−R
Figure 3.11: Single-stage CIC filter.
The comb filter has R zeros that are equally spaced around the unit circle. The
zeros are the R-th roots of unity and are located at z(k) = e
j2πk/R
, where
k = 0, 1, 2, . . . , R − 1. The integrator section, on the other hand, has a single
pole at z = 1. CIC filters are based on the fact that perfect pole/zero cancel-
ing can be achieved, which is only possible with exact integer arithmetic [88].
The existence of integrator stages will lead to overflows. However, this is of
no harm if two’s complement arithmetic is used and the range of the number
system is equal to or exceeds the maximum magnitude expected at the output
of the composite CIC filter. The use of two’s complement and non-saturating
arithmetic accommodates the issues with overflow. With the two’s complement
wraparound property, the comb section following the integrator will compute
the correct result at the output.
The frequency response of CIC filter, by evaluating H(z) on the unit circle,
z = e
jωT
= e
j2πf
, is given by
H(e
jωT
) =
_
sin(
ωTR
2
)
sin(
ωT
2
)
_
e
−jωT(R−1)/2
(3.3)
The gain of the single stage CIC filter at ωT = 0 is equal to the differential
delay, R, of the comb section.
3.4.1 CIC Filters in Interpolators and Decimators
In the hardware implementations of decimation and interpolation, cascaded
integrator-comb (CIC) filters are frequently used as a computationally efficient
narrowband lowpass filters. These lowpass filters are well suited to improve
the efficiency of anti-aliasing filters prior to decimation or the anti-imaging
filters after the interpolation. In both of these applications, CIC filters have
to operate at very high sampling rate. The CIC filters are generally more
convenient for large conversion factors due to small lowpass bandwidth. In
multistage decimators with a large conversion factor, the CIC filter is the best
suited for the first decimation stage, whereas in interpolators, the comb filter is
adopted for the last interpolation stage.
34 Chapter 3. Integer Sampling Rate Conversion
z
−1
x(n)
M

y(m)
z
−D
Figure 3.12: Single stage CIC decimation filter with D = R/M.
The cascade connection of integrator and comb in Fig. 3.11 can be inter-
changed as both of these are linear time-invariant operations. In a decimator
structure, the integrator and comb are used as the first and the second section
of the CIC filter, respectively. The structure also includes the decimation factor
M. The sampling rate at the output as a result of this operation is F
in
/M,
where F
in
is the input sampling rate. However, an important aspect with the
CIC filters is the spectral aliasing resulting from the downsampling operation.
The spectral bands centered at the integer multiples of 2/D will alias directly
into the desired band after the downsampling operation. Similarly, the CIC
filter with the upsampling factor L is used for the interpolation purpose. The
sampling rate as a result of this operation is F
in
L. However, imperfect filtering
gives rise to unwanted spectral images. The filtering characteristics for the anti-
aliasing and anti-image attenuation can be improved by increasing the number
of stages of CIC filter.
The comb and integrator sections are both linear time-invariant and can
follow or precede each other. However, it is generally preferred to place comb
part on the side which has the lower sampling rate. It reduces the length of the
delay line which is basically the differential delay of the comb section. When
the decimation factor M is moved into the comb section using the noble identity
and considering R = DM, the resulting architecture is the most common im-
plementation of CIC decimation filters shown in Fig. 3.12. The delay length is
reduced to D = R/M, which is now considered as the differential delay of comb
part. One advantage is that the storage requirement is reduced and also the
comb section now operates at a reduced clock rate. Both of these factors result
in hardware saving and also low power consumption. Typically, the differential
delay D is restricted to 1 or 2. The value of D also decides the number of nulls
in the frequency response of the decimation filter. Similarly, the interpolation
factor L is moved into the comb section using the noble identity and considering
R = DL. The resulting interpolation structure is shown in Fig. 3.13. The delay
length is reduced to D = R/L. Similar advantages can be achieved as in the
case of CIC decimation filter. The important thing to note is that the integrator
section operate at the higher sampling rate in both cases.
The transfer function of the K-stage CIC filter is
H(z) = H
K
I
(z)H
K
C
(z) =
_
1 −z
−D
1 −z
−1
_
K
. (3.4)
In order to modify the magnitude response of the CIC filter, there are only two
3.4. Cascaded Integrator Comb Filters 35
z
−1
y(m)
L
x(n)

z
−D
Figure 3.13: Single stage CIC interpolation filter with D = R/L.
parameters; the number of CIC filter stages and the differential delay D. The
natural nulls of the CIC filter provide maximum alias rejection but the aliasing
bandwidths around the nulls are narrow, and are usually not enough to provide
sufficient aliasing rejection in the entire baseband of the signal. The advantage
of increased number of stages is improvement in attenuation characteristics but
the passband droop normally needs to be compensated.
3.4.2 Polyphase Implementation of Non-Recursive CIC
The polyphase decomposition of the non-recursive CIC filter transfer function
may achieve lower power consumption than the corresponding recursive imple-
mentation [89, 90]. The basic non-recursive form of the transfer function is
H(z) =
1
D
D−1

n=0
z
−n
. (3.5)
The challenge to achieve low power consumption are those stages of the filter
which normally operate at higher input sampling rate. If the overall conversion
factor D is power of two, D = 2
J
, then the transfer function can be expressed
as
H(z) =
J−1

i=0
(1 +z
−2
i
) (3.6)
As a result, the original converter can be transformed into a cascade of J factor-
of-two converters. Each has a non-recursive sub-filter (1 +z
−1
)
K
and a factor-
of-two conversion.
A single stage non-recursive CIC decimation filter is shown in Fig. 3.14. In
the case of multistage implementation, one advantage is that only first deci-
mation stage will operate at high input sampling rate. The sampling rate is
successively reduced by two after each decimation stage as shown in Fig. 3.15.
Further, the polyphase decomposition by a factor of two at each stage can be
used to reduce the sampling rate by an additional factor of two. The second
advantage of the non-recursive structures is that there are no more register over-
flow problems, if properly scaled. The register word length at any stage j for
an input word length w
in
, is limited to (w
in
+K j).
In an approach proposed in [89], the overall decimator is decomposed into
a first-stage non-recursive filter with the decimation factor J
1
, followed by a
cascade of non-recursive (1 + z
−1
)
K
filters with a factor-of-2 decimation. The
36 Chapter 3. Integer Sampling Rate Conversion
(1 + z
−1
+ z
−2
+ ... + z
−D+1
)
K
D
Figure 3.14: Non-recursive implementation of a CIC decimation filter.
(1 + z
−1
)
K
(1 + z
−1
)
K
(1 + z
−1
)
K
2 2 2
Figure 3.15: Cascade of J factor-of-two decimation filters.
polyphase decomposition along with noble identities is used to implement all
these sub-filters.
3.4.3 Compensation Filters
The frequency magnitude response envelopes of the CIC filters are like sin(x)/x,
therefore, some compensation filter are normally used after the CIC filters. The
purpose of the compensation filter is to compensate for the non-flat passband
of the CIC filter.
Several approaches in the literature [91–95] are available for the CIC filter
sharpening to improve the passband and stopband characteristics. They are
based on the method proposed earlier in [96]. A two stage solution of sharpened
comb decimation structure is proposed in [93–95, 97–100] for the case where
sampling rate conversion is expressible as the product of two integers. The
transfer function can then be written as the product of two filter sections. The
role of these filter sections is then defined. One filter section may be used
to provide sharpening while the other can provide stopband attenuation by
selecting appropriate number of stages in each filter section.
Chapter 4
Non-Integer Sampling Rate Conversion
The need for a non-integer sampling rate conversion may appear when the two
systems operating at different sampling rates have to be connected. This chapter
gives a concise overview of SRC by a rational factor and implementation details.
The chapter first introduces the SRC by a rational factor. The technique
for constructing efficient SRC by a rational factor based on FIR filters and
polyphase decomposition is then presented. In addition, the sampling rate al-
teration with an arbitrary conversion factor is described. The polynomial-based
approximation of the impulse response of a resampler model is then presented.
Finally, the implementation of fractional-delay filters based on the Farrow struc-
ture is considered.
4.1 Sampling Rate Conversion: Rational Factor
The two basic operators, downsampler and upsampler, are used to change the
sampling rate of a discrete-time signal by an integer factor only. Therefore, the
sampling rate change by a rational factor, requires the cascade of upsampler and
downsampler operators; Upsampling by a factor of L, followed by downsampling
by a factor of M [9, 82]. For some cases, it may be beneficial to change the order
of these operators, which is only possible if the L and M factors are relative
prime, i.e., M and L do not share a common divisor greater than one. A cascade
of these sampling rate alteration devices, results in sampling rate change by a
rational factor L/M. The realizations shown in Fig. 4.1 are equivalent if the
factors L and M are relative prime.
4.1.1 Small Rational Factors
The classical design of implementing the ratio L/M is upsampling by a factor
of L followed by appropriate filtering, followed by a downsampling by a factor
of M.
37
38 Chapter 4. Non-Integer Sampling Rate Conversion
x(n) y(n)
M L
x(n)
L M
y(n)
Figure 4.1: Two different realizations of rational factor L/M.
M
y(n) x(n)
L H
I
(z) H
d
(z)
Figure 4.2: Cascade of an interpolator and decimator.
The implementation of a rational SRC scheme is shown in Fig. 4.2. The
original input signal x(n) is first upsampled by a factor of L followed by an anti-
imaging filter also called interpolation filter. The interpolated signal is first
filtered by an anti-aliasing filter before downsampling by a factor of M. The
sampling rate of the output signal is
F
out
=
L
M
F
in
. (4.1)
Since both filters, interpolation and the decimation, are next to each other, and
operate at the same sampling rate. Both of these lowpass filters can be combined
into a single lowpass filter H(z) as shown in Fig. 4.3. The specification of the
filter H(z) needs to be selected such that it could act as both anti-imaging and
anti-aliasing filters at the same time.
Since the role of the filter H(z) is to act as an anti-imaging as well as an
anti-aliasing filter, the stopband edge frequency of the ideal filter H(z) in the
rational SRC of Fig. 4.3 should be
ω
s
T = min
_
π
L
,
π
M
_
. (4.2)
From the above equation it is clear that the passband will become more narrow
and filter requirements will be hard for larger values of L and M. The ideal
specification requirement for the magnitude response is
[H(e
jωT
)[ =
_
L, [ωT[ ≤ min
_
π
L
,
π
M
_
0, otherwise,
(4.3)
L
x(n)
M H(z)
y(n)
Figure 4.3: An implementation of SRC by a rational factor L/M.
4.1. Sampling Rate Conversion: Rational Factor 39
L
L
L
L
M
M
M
y(m) x(n)
H
0
(z)
H
1
(z)
H
L−1
(z)
M H
2
(z) z
−2
z
−1
z
−L+1
Figure 4.4: Polyphase realization of SRC by a rational factor L/M.
The frequency spectrum of the re-sampled signal, Y (e
jωT
), as a function of
spectrum of the original signal X(e
jωT
), the filter transfer function H(e
jωT
),
and the interpolation and decimation factors L and M, can be expressed as
Y (e
jωT
) =
1
M
M−1

k=0
X(e
j(LωT−2πk)/M
)H(e
j(ωT−2πk)/M
). (4.4)
It is apparent that the overall performance of the resampler depends on the
filter characteristics.
4.1.2 Polyphase Implementation of Rational Sampling Rate Con-
version
The polyphase representation is used for the efficient implementation of rational
sampling rate converters, which then enables the arithmetic operations to be
performed at the lowest possible sampling rate. The transfer function H(z) in
Fig. 4.3 is decomposed into L polyphase components using the approach shown
in Fig. 1.11a. In the next step, the interpolation factor L is moved into the sub-
filters using the computationally efficient polyphase representation form shown
in Fig. 1.11. An equivalent delay block for each sub-filter branch is placed in
cascade with the sub-filters. Since the downsampling operation is linear, and
also by the use of noble identity, it can be moved into each branch of the sub-
filters. The equivalent polyphase representation of the rational factor of L/M
as a result of these modifications is shown in Fig. 4.4 [5, 10].
By using polyphase representation and noble identities, Fig. 4.4 has already
attained the computational saving by a factor of L. The next motivation is to
move the downsampling factor to the input side so that all filtering operations
could be evaluated at 1/M-th of the input sampling rate. The reorganization of
the individual polyphase branches is targeted next. If the sampling conversion
40 Chapter 4. Non-Integer Sampling Rate Conversion
z
k(l
0
L−m
0
M)
L H
k
(z) M
H
k
(z)
L
M z
−k
H
k
(z) L M z
−km
0
H
k
(z) M L z
−km
0
z
kl
0
z
kl
0
Figure 4.5: Reordering the upsampling and downsampling in the polyphase
realization of k-th branch.
factors, L and M, are relative prime, then
l
0
L −m
0
M = −1 (4.5)
where l
0
and m
0
are some integers. Using (4.5), the k-th delay can be repre-
sented as
z
−k
= z
k(l
0
L−m
0
M)
(4.6)
The step-by-step reorganization of the k-th branch is shown in Fig. 4.5. The
delay chain z
−k
is first replaced by its equivalent form in (4.6), which then
enables the interchange of downsampler and upsampler. Since both upsampler
and downsampler are assumed to be relative prime, they can be interchanged.
The fixed independent delay chains are also moved to the input and the output
sides. As can be seen in Fig. 4.5, the sub-filter H
k
(z) is in cascade with the
downsampling operation, as highlighted by the dashed block. The polyphase
representation form in Fig. 1.10a and its equivalent computationally efficient
form in Fig. 1.10 can be used for the polyphase representation of the sub-filter
H
k
(z). The same procedure is followed for all other branches. As a result of
these adjustments, all filtering operations now operate at 1/M:th of the input
sampling rate. The intermediate sampling rate for the filtering operation will be
lower, which makes it a more efficient structure for a rational SRC by a factor
L/M. The negative delay z
kl
0
on the input side is adjusted by appropriately
delaying the input to make the solution causal. The rational sampling rate
conversion structures for the case L/M < 1 considered can be used to perform
L/M > 1 by considering its dual.
4.1.3 Large Rational Factors
The polyphase decomposition approach is convenient in cases where the factors
L and M are small. Otherwise, filters of a very high order are needed. For ex-
ample, in the application of sampling rate conversion from 48 kHz to 44.1 kHz,
4.2. Sampling Rate Conversion: Arbitrary Factor 41
the rational factors requirement is L/M = 147/160, which imposes severe re-
quirements for the filters. The stopband edge frequency from (4.2) is π/160.
This will result in a high-order filter with a high complexity. In this case, a
multi-stage realization will be beneficial [41].
4.2 Sampling Rate Conversion: Arbitrary Factor
In many applications, there is a need to compute the value of underlying
continuous-time signal x
a
(t) at an arbitrary time instant between two exist-
ing samples. The computation of new sample values at some arbitrary points
can be viewed as interpolation.
Assuming an input sequence, . . . , x((n
b
−2)T
x
), x((n
b
−1)T
x
), x(n
b
T
x
), x((n
b
+
1)T
x
), x((n
b
+2)T
x
), . . . , uniformly sampled at the interval T
x
. The requirement
is to compute new sample value, y(T
y
m) at some time instant, T
y
m, which
occurs between the two existing samples x(n
b
T
x
) and x((n
b
+ 1)T
x
), where
T
y
m = n
b
T
x
+ T
x
d. The parameter n
b
is some reference or base-point index,
and d is called the fractional interval. The fractional interval can take on any
value in the range [T
x
d[ ≤ 0.5, to cover the whole sampling interval range. The
new sample value y(T
y
m) is computed, using the existing samples, by utilizing
some interpolation algorithm. The interpolation algorithm may in general be
defined as a time-varying digital filter with the impulse response h(n
b
, d).
The interpolation problem can be considered as resampling, where the continuous-
time function y
a
(t) is first reconstructed from a finite set of existing sam-
ples x(n). The continuous-time signal can then be sampled at the desired
instant, y(T
y
m) = y
a
(T
y
m). The commonly used interpolation techniques
are based on polynomial approximations. For a given set of the input sam-
ples x(−N
1
+ n
b
), . . . , x(n
b
), . . . , x(n
b
+ N
2
), the window or interval chosen is
N = N
1
+N
2
+ 1, A polynomial approximation y
a
(t) is then defined as
y
a
(t) =
N
2

k=−N
1
P
k
(t)x(n
b
+k) (4.7)
where P
k
(t) are polynomials. If the interpolation process is based on the La-
grange polynomials, then P
k
(t) is defined as
P
k
(t) =
N
2

i=−N1,i =k
t −t
k
t
k
−t
i
, k = −N
1
, −N
1
+ 1, . . . , N
2
−1, N
2
. (4.8)
The Lagrange approximation also does the exact reconstruction of the original
input samples, x(n).
4.2.1 Farrow Structures
In conventional SRC implementation, if the SRC ratio changes, new filters are
needed. This limits the flexibility in covering different SRC ratios. By utiliz-
42 Chapter 4. Non-Integer Sampling Rate Conversion
y(n)
H
L
(z) H
2
(z) H
1
(z) H
0
(z)
x(n)
d d d
Figure 4.6: Farrow structure.
ing the Farrow structure, shown in Fig. 4.6, this can be solved in an elegant
way. The Farrow structure is composed of linear-phase FIR sub-filters, H
k
(z),
k = 0, 1, . . . , L, with either a symmetric (for k even) or antisymmetric (for k
odd) impulse response. The overall impulse response values are expressed as
polynomials in the delay parameter. The implementation complexity of the Far-
row structure is lower compared to alternatives such as online design or storage
of a large number of different impulse responses. The corresponding filter struc-
ture makes use of a number of sub-filters and one adjustable fractional-delay
as seen in Fig. 4.6. Let the desired frequency response, H
des
(e
jωT
, d), of an
adjustable FD filter be
H
des
(e
jωT
, d) = e
−jωT(D+d)
, [ωT[ ≤ ω
c
T < π, (4.9)
where D and d are fixed and adjustable real-valued constants, respectively. It
is assumed that D is either an integer, or an integer plus a half, whereas d takes
on values in the interval [−1/2, 1/2]. In this way, a whole sampling interval is
covered by d, and the fractional-delay equals d (d + 0.5) when D is an integer
(an integer plus a half). The transfer function of the Farrow structure is
H(z, d) =
L

k=0
d
k
H
k
(z), (4.10)
where H
k
(z) are fixed FIR sub-filters approximating kth-order differentiators
with frequency responses
H
k
(e
jωT
) ≈ e
−jωTD
(−jωT)
k
k!
, (4.11)
which is obtained by truncating the Taylor series expansion of (4.9) to L + 1
terms. A filter with a transfer function in the form of (4.11) can approximate
the ideal response in (4.10) as close as desired by choosing L and designing
the sub-filters appropriately [30, 101]. When H
k
(z) are linear-phase FIR filters,
the Farrow structure is often referred to as the modified Farrow structure. The
Farrow structure is efficient for interpolation whereas, for decimation, it is better
to use the transposed Farrow structure so as to avoid aliasing. The sub-filters
can also have even or odd orders N
k
. With odd N
k
, all H
k
(z) are general filters
whereas for even N
k
, the filter H
0
(z) reduces to a pure delay. An alternative
4.2. Sampling Rate Conversion: Arbitrary Factor 43
representation of the transfer function of Farrow structure is
H(z, d) =
L

k=0
N

n=0
h
k
(n)z
−n
d
k
,
=
N

n=0
L

k=0
h
k
(n)d
k
z
−n
,
=
N

n=0
h(n, d)z
−n
. (4.12)
The quantity N is the order of the overall impulse response and
h(n, d) =
L

k=0
h
k
(n)d
k
. (4.13)
Assuming T
in
and T
out
to be the sampling period of x(n) and y(n), respectively,
the output sample index at the output of the Farrow structure is
n
out
T
out
=
_
(n
in
+d(n
in
))T
in
, Even N
k
(n
in
+ 0.5 +d(n
in
))T
in
, Odd N
k
,
(4.14)
where n
in
(n
out
) is the input (output) sample index. If d is constant for all input
samples, the Farrow structure delays a bandlimited signal by a fixed d. If a
signal needs to be delayed by two different values of d, in both cases, one set of
H
k
(z) is used and only d is required to be modified.
In general, SRC can be seen as delaying every input sample with a different
d. This delay depends on the SRC ratio. For interpolation, one can obtain new
samples between any two consecutive samples of x(n). With decimation, one
can shift the original samples (or delay them in the time domain) to the positions
which would belong to the decimated signal. Hence, some signal samples will
be removed but some new samples will be produced. Thus, by controlling d
for every input sample, the Farrow structure performs SRC. For decimation,
T
out
> T
in
, whereas interpolation results in T
out
< T
in
.
Chapter 5
Polynomial Evaluation
The Farrow structure, shown in Fig. 4.6, requires evaluation of a polynomial of
degree L with d as an independent variable. Motivated by that, this chapter
reviews polynomial evaluation schemes and efficient evaluation of a required set
of powers terms. Polynomial evaluation also has other applications [102], such
as approximation of elementary functions in software or hardware [103, 104].
A uni-variate polynomial is a polynomial that has only one independent
variable, whereas multi-variate polynomials involves multiple independent vari-
ables. In this chapter, only uni-variate polynomial, p(x), is considered with an
independent variable x. At the start of chapter, the classical Horner scheme
is described, which is considered as an optimal way to evaluate a polynomial
with minimum number of operations. A brief overview of parallel polynomial
evaluation schemes is then presented in the central part of the chapter. Finally,
the computation of the required set of powers is discussed. Two approaches are
outlined to exploit potential sharing in the computations.
5.1 Polynomial Evaluation Algorithms
The most common form of a polynomial, p(x), also called power form, is repre-
sented as
p(x) = a
0
+a
1
x +a
2
x
2
+ +a
n
x
n
, a
n
, = 0.
Here n is the order of the polynomial, and a
0
, a
1
, . . . , a
n
are the polynomial
coefficients.
In order to evaluate p(x), the first solution that comes into mind is the
transformation of all powers to multiplications. The p(x) takes the form
p(x) = a
0
+a
1
x +a
2
x x + +a
n
x x x, a
n
, = 0.
45
46 Chapter 5. Polynomial Evaluation
x x a
1
x a
0
a
n
a
n−1
p(x)
Figure 5.1: Horner’s scheme for an n-th order polynomial.
However, there are more efficient ways to evaluate this polynomial. One possible
way to start with is the most commonly used scheme in software and hardware
named as Horner’s scheme.
5.2 Horner’s Scheme
Horner’s scheme is simply a nested re-write of the polynomial. In this scheme,
the polynomial p(x) is represented in the form
p(x) = a
0
+x
_
a
1
+x
_
a
2
+ +x(a
n−1
+a
n
x) . . .
_
_
. (5.1)
This leads to the structure in Fig. 5.1.
Horner’s scheme has the minimum arithmetic complexity of any polynomial
evaluation algorithm. For a polynomial order of n, n multiplications and n
additions are used. However, it is purely sequential, and therefore becomes
unsuitable for high-speed applications.
5.3 Parallel Schemes
Several algorithms with some degree of parallelism in the evaluation of polyno-
mials have been proposed [105–110]. However, there is an increase in the number
of operations for these parallel algorithms. Earlier works have primarily focused
on either finding parallel schemes suitable for software realization [105–111] or
how to find suitable polynomials for hardware implementation of function ap-
proximation [47, 112–116].
To compare different polynomial evaluation schemes, varying in degree of
parallelism, different parameters such as computational complexity, number of
operations of different types, critical path, pipelining complexity, and latency
after pipelining may be considered. On the basis of these parameters, the suit-
able schemes can be shortlisted for an implementation given the specifications.
Since all schemes have the same number of additions n, the difference is in
the number of multiplication operations, as given in Table 5.1. Multiplications
can be divided into three categories, data-data multiplications, squarers, and
data-coefficient multiplications.
In squarers, both of the inputs of the multipliers are same. This leads to
that the number of partial products can be roughly halved, and, hence, it is
5.4. Powers Evaluation 47
reasonable to assume that a squarer has roughly half the area complexity as of
a general multiplier [117, 118]. In data-coefficient multiplications, as explained
in Chapter 2, the area complexity is reduced but hard to find as it depends
on the coefficient. For simplicity, it is assumed that a constant multiplier has
an area which is 1/4 of a general multiplier. As a result, two schemes having
the same total number of multiplication but vary in types of multiplications
are different with respect to implementation cost and other parameters. The
scheme which has the lower number of data-data multiplications is cheaper to
implement.
The schemes which are frequently used in literature are Horner and Estrin
[105]. However, there are other polynomial evaluation schemes as well like
Dorn’s Generalized Horner Scheme [106], Munro and Paterson’s Scheme [107,
108], Maruyama’s Scheme [109], Even Odd (EO) Scheme, and Li et al. Scheme
[110]. All the listed schemes have some good potential and useful properties.
However, some schemes do not work for all polynomial orders.
The difference in all schemes is that how the schemes efficiently split the
polynomial into sub-polynomials so that they can be evaluated in parallel. The
other factor that is important is the type of powers required after splitting of
the polynomial. Some schemes split the polynomial in such a way that only
powers of the form x
2
i
or x
3
i
are required. The evaluation of such square and
cube powers are normally more efficient than conventional multiplications. The
data-coefficient multiplication count also gives the idea of degree of parallelism
of that scheme. For example, Horner’s scheme has only one data-coefficient
multiplication, and hence, has one sequential flow; no degree of parallelism.
The EO scheme, on other hand, has two data-coefficient multiplications, which
result in two branches running in parallel. The disadvantage on the other hand
is that the number of operations are increased and additional need of hardware
to run operations in parallel. The applications where polynomials are required
to be evaluated at run-time or have extensive use of it, any degree of parallelism
in it or efficient evaluation relaxes the computational burden to satisfy required
throughput.
Other factors that may be helpful in shortlisting any evaluation scheme are
uniformity and simplicity of the resulting evaluation architecture and linearity
of the scheme as the order grows; some schemes may be good for one polynomial
order but not for the others.
5.4 Powers Evaluation
Evaluation of a set of powers, also known as exponentiation, not only finds
application in polynomial evaluation [102, 119] but also in window-based expo-
nentiation for cryptography [120, 121]. The required set of power terms to be
evaluated may contain all powers up to some integer n or it has some sparse set
of power terms that need to be evaluated. In the process of evaluating all pow-
ers at the same time, redundancy in computations at different levels is expected
48 Chapter 5. Polynomial Evaluation
Table 5.1: Evaluation schemes for fifth-order polynomial, additions (A), data-
data multiplications (M), data-coefficient multiplications (C), squarers (S).
Scheme Evaluation A M C S
Direct a
5
x
5
+a
4
x
4
+a
3
x
3
+a
2
x
2
+a
1
x +a
0
5 2 5 2
Horner ((((a
5
x +a
4
)x +a
3
)x +a
2
)x +a
1
)x +a
0
5 1 4 0
Estrin (a
5
x
2
+a
4
x +a
3
)x
3
+a
2
x
2
+a
1
x +a
0
5 4 2 1
Li (a
5
x +a
4
)x
4
+ (a
3
x +a
2
)x
2
+ (a
1
x +a
0
) 5 3 2 2
Dorn (a
5
x
3
+a
2
)x
2
+ (a
4
x
3
+a
1
)x + (a
3
x
3
+a
0
) 5 3 3 1
EO ((a
5
x
2
+a
3
)x
2
+a
1
)x + ((a
4
x
2
+a
2
)x
2
+a
0
) 5 2 3 1
Maruyama (a
5
x +a
4
)x
4
+a
3
x
3
+ (a
2
x
2
+a
1
x +a
0
) 5 4 2 2
which need to be exploited. Two independent approaches are considered. In the
first, redundancy is removed at word level, while in the second, it is exploited
at bit level.
5.4.1 Powers Evaluation with Sharing at Word-Level
A direct approach to compute x
i
is the repeated multiplication of x with itself
i − 1 times, however, this approach is inefficient and becomes infeasible for
large values of i. For example, to compute x
23
, 22 repeated multiplications of
x are required. With the binary approach in [102], the same can be achieved
in eight multiplications, ¦x
2
, x
4
, x
5
, x
10
, x
11
, x
12
, x
23
¦, while the factor method
in [102] gives a solution with one less multiplication, ¦x
2
, x
3
, x
4
, x
7
, x
8
, x
16
, x
23
¦.
Another efficient way to evaluate a single power is to consider the relation to
the addition chain problem. An addition chain for an integer n is a list of
integers [102]
1 = a
0
, a
1
, a
2
, . . . , a
l
= n, (5.2)
such that
a
i
= a
j
+a
k
, k ≤ j < i, i = 1, 2, . . . , l. (5.3)
As multiplication is additive in the logarithmic domain, a minimum length
addition chain will give an optimal solution to compute a single power term. A
minimum length addition chain for n = 23 gives the optimal solution with only
six multiplications, ¦x
2
, x
3
, x
5
, x
10
, x
20
, x
23
¦.
However, when these techniques, used for the computation of a single power
[102, 122, 123], are applied in the evaluation of multiple power terms, it does not
combine well in eliminating the overlapping terms that appear in the evaluation
of individual powers as explained below with an example.
Suppose a sparse set a power terms, T = ¦22, 39, 50¦, need to be evaluated.
5.4. Powers Evaluation 49
A minimum length addition chain solution for each individual power is
22 →¦1, 2, 3, 5, 10, 11, 22¦
39 →¦1, 2, 3, 6, 12, 15, 27, 39¦
50 →¦1, 2, 3, 6, 12, 24, 25, 50¦.
As can be seen, there are some overlapping terms, ¦2, 3, 6, 12¦, which can be
shared. If these overlapping terms are removed and the solutions are combined
for the evaluation of the required set of powers, ¦22, 39, 50¦, the solution is
¦22, 39, 50¦ →¦1, 2, 3, 5, 6, 10, 11, 12, 15, 22, 24, 25, 27, 39, 50¦. (5.4)
However, the optimal solution for the evaluation of the requested set of powers,
¦22, 39, 50¦, is
¦22, 39, 50¦ →¦1, 2, 3, 6, 8, 14, 22, 36, 39, 50¦, (5.5)
which clearly shows a significant difference. This solution is obtained by address-
ing addition sequence problem, which can be considered as the generalization of
addition chains. An addition sequence for the set of integers T = n
1
, n
2
, . . . , nr
is an addition chain that contains each element of T. For example, an addition
sequence computing ¦3, 7, 11¦ is ¦1, 2, 3, 4, 7, 9, 11¦.
The removal of redundancy at word-level does not strictly mean for the
removal of overlapping terms only but to compute only those powers which
could be used for evaluation of other powers as well in the set.
Note that the case where all powers of x, from one up to some integer n,
are required to be computed is easy compared to the sparse case. Since every
single multiplication will compute some power in the set, and it is assumed that
each power is computed only once, any solution obtained will be optimal with
respect to the minimum number of multiplications.
5.4.2 Powers Evaluation with Sharing at Bit-Level
In this approach, the evaluation of power terms is done in parallel using sum-
mations trees, similar to parallel multipliers. The PP matrices for all requested
powers in the set are generated independently and sharing at bit level is ex-
plored in the PP matrices in order to remove redundant computations. The
redundancy here relates to the fact that same three partial products may be
present in more than one columns, and, hence, can be mapped to the same full
adder.
A w-bit unsigned binary number can be expressed as following
X = x
w−1
x
w−2
x
w−3
...x
2
x
1
x
0
, (5.6)
with a value of
X =
w−1

i=0
x
i
2
i
, (5.7)
50 Chapter 5. Polynomial Evaluation
where x
w−1
is the MSB and x
0
is the LSB of the binary number. The n
th
power
of an unsigned binary number as in (5.7) is given by
X
n
=
_
w−1

i=0
x
i
2
i
_
n
. (5.8)
For the powers with n ≥ 2, the expression in (5.8) can be evaluated using the
multinomial theorem. For any integer power n and a word length of w bits, the
above equation can be simplified to a sum of weighted binary variables, which
can further be simplified by using the identities x
i
x
j
= x
j
x
i
and x
i
x
i
= x
i
.
The PP matrix of the corresponding power term can then be deduced from
this function. In the process of generation of a PP matrix from the weighted
function, the terms, e.g., x
0
2
0
will be placed in the first and x
2
2
6
in the 7-th
column from the left in the PP matrix.
For the terms having weights other than a power of two, binary or CSD
representation may be used. The advantage of using CSD representation over
binary is that the size of PP matrix is reduced and therefore the number of
full adders for accumulating PP are reduced. To make the analysis of the PP
matrix simpler, the binary variables in the PP matrix can then be represented
by their corresponding representation weights as given in Table 5.2.
Table 5.2: Equivalent representation of binary variables in PP matrix for w = 3.
Binary variable
Word (X)
Representation
x
2
x
1
x
0
x
0
0 0 1 1
x
1
0 1 0 2
x
1
x
0
0 1 1 3
x
2
1 0 0 4
x
2
x
0
1 0 1 5
x
2
x
1
1 1 0 6
x
2
x
1
x
0
1 1 1 7
In Fig. 5.2, an example case for (n, w) = (5, 4), i.e., computing the square,
cube, and fourth power of a five-bit input, is considered to demonstrate the
sharing potential among multiple partial products. All partial products of the
powers from x
2
up to x
5
are shown. As can be seen in Fig. 5.2, there are savings,
since the partial product sets ¦9, 10, 13¦ and ¦5, 7, 9¦ are present in more than
one column. Therefore, compared to adding the partial products in an arbitrary
order, three full adders (two for the first set and one for the second) can be saved
by making sure that exactly these sets of partial products are added.
5.4. Powers Evaluation 51
4
6
9
n = 2
10
13
12
14
10
13
11
10
11
3
6
5
7
9
n = 3
10
13
14 2
5
9 11
10
15
10
14
13
15
17
10
13
13
11
5
7
9
6
9
4
9
5
7
8
9 6
4
5
7
9
3
5
n = 4
9
Figure 5.2: Partial product matrices with potential sharing of full adders, n = 5,
w = 4.
Chapter 6
Conclusions and Future Work
6.1 Conclusions
In this work, contributions beneficial for the implementation of integer and non-
integer SRC were presented. By optimizing both the number of arithmetic op-
erations and their word lengths, improved implementations are obtained. This
not only reduces the area and power consumption but also allows a better com-
parisons between different SRC alternatives during a high-level system design.
The comparisons are further improved for some cases by accurately estimating
the switching activities and by introducing leakage power consumption.
6.2 Future Work
The following ideas are identified as possibilities for future work:
It would be beneficial to derive corresponding accurate switching activ-
ity models for the non-recursive CIC architectures. In this way an im-
proved comparison can be made, where both architectures have more ac-
curate switching activity estimates. Deriving a switching activity model
for polyphase FIR filters would naturally have benefits for all polyphase
FIR filters, not only the ones with CIC filter coefficients.
For interpolating recursive CIC filters, the output of the final comb stage
will be embedded with zeros during the upsampling. Based on this, one
could derive closed form expressions for the integrator taking this addi-
tional knowledge into account.
The length of the Farrow sub-filters usually differ between different sub-
filters. This would motivate a study of pipelined polynomial evaluation
53
54 Chapter 6. Conclusions and Future Work
schemes where the coefficients arrive at different times. For the proposed
matrix-vector multiplication scheme, this could be utilized to shift the
rows corresponding to the sub-filters in such a way that the total implemen-
tation complexity for the matrix-vector multiplication and the pipelined
polynomial evaluation is minimized.
The addition sequence model does not include pipelining. It would be
possible to reformulate it to consider the amount of pipelining registers
required as well. This could also include obtaining different power terms
at different time steps.
Chapter 7
References
55
56 Chapter 7. References
References
[1] S. K. Mitra, Digital Signal Processing: A Computer Based Approach,
2nd ed. New York: McGraw-Hill, 2001.
[2] A. V. Oppenheim, R. W. Schafer, and J. R. Buck, Discrete-Time Signal
Processing. Prentice Hall, 1999.
[3] S. K. Mitra and J. F. Kaiser, Handbook for Digital Signal Processing. New
York: Wiley, 1993.
[4] L. Wanhammar and H. Johansson, Digital Filters Using MATLAB.
Linköping University, 2011.
[5] N. J. Fliege, Multirate Digital Signal Processing: Multirate Systems, Filter
Banks, Wavelets, 1st ed. New York, NY, USA: John Wiley & Sons, Inc.,
1994.
[6] R. Crochiere and L. Rabiner, “Optimum FIR digital filter implemen-
tations for decimation, interpolation, and narrow-band filtering,” IEEE
Trans. Acoust., Speech, Signal Process., vol. 23, no. 5, pp. 444–456, Oct.
1975.
[7] R. E. Crochiere and L. R. Rabiner, “Interpolation and decimation of dig-
ital signals–a tutorial review,” Proc. IEEE, vol. 69, no. 3, pp. 300–331,
Mar. 1981.
[8] T. Ramstad, “Digital methods for conversion between arbitrary sampling
frequencies,” IEEE Trans. Acoust., Speech, Signal Process., vol. 32, no. 3,
pp. 577–591, Jun. 1984.
[9] M. Bellanger, Digital Processing of Signals. John Wiley & Sons, Inc.,
1984.
[10] P. P. Vaidyanathan, Multirate Systems and Filter Banks. Prentice Hall,
1993.
[11] M. Bellanger, G. Bonnerot, and M. Coudreuse, “Digital filtering by
polyphase network:application to sample-rate alteration and filter banks,”
IEEE Trans. Acoust., Speech, Signal Process., vol. 24, no. 2, pp. 109–114,
Apr. 1976.
[12] R. A. Valenzuela and A. G. Constantinides, “Digital signal processing
schemes for efficient interpolation and decimation,” IEE Proc. Electr. Cir-
cuits Syst., vol. 130, no. 6, pp. 225–235, Dec. 1983.
[13] W. Drews and L. Gazsi, “A new design method for polyphase filters using
all-pass sections,” IEEE Trans. Circuits Syst., vol. 33, no. 3, pp. 346–348,
Mar. 1986.
References 57
[14] M. Renfors and T. Saramäki, “Recursive Nth–band digital filters – part
I: Design and properties,” IEEE Trans. Circuits Syst., vol. 34, no. 1, pp.
24–39, Jan. 1987.
[15] ——, “Recursive Nth-band digital filters – part II: Design of multistage
decimators and interpolators,” IEEE Trans. Circuits Syst., vol. 34, no. 1,
pp. 40–51, Jan. 1987.
[16] F. M. Gardner, “Interpolation in digital modems. I. Fundamentals,” IEEE
Trans. Commun., vol. 41, no. 3, pp. 501–507, Mar. 1993.
[17] L. Erup, F. M. Gardner, and R. A. Harris, “Interpolation in digital
modems. II. Implementation and performance,” IEEE Trans. Commun.,
vol. 41, no. 6, pp. 998–1008, Jun. 1993.
[18] V. Tuukkanen, J. Vesma, and M. Renfors, “Combined interpolation and
maximum likelihood symbol timing recovery in digital receivers,” in Proc.
IEEE Int. Universal Personal Commun. Record Conf., vol. 2, 1997, pp.
698–702.
[19] M.-T. Shiue and C.-L. Wey, “Efficient implementation of interpolation
technique for symbol timing recovery in DVB-T transceiver design,” in
Proc. IEEE Int. Electro/Information Technol. Conf., 2006, pp. 427–431.
[20] S. R. Dooley and A. K. Nandi, “On explicit time delay estimation using
the Farrow structure,” Signal Process., vol. 72, no. 1, pp. 53–57, Jan.
1999.
[21] M. Olsson, H. Johansson, and P. Löwenborg, “Time-delay estimation us-
ing Farrow-based fractional-delay FIR filters: filter approximation vs. es-
timation errors,” in Proc. Europ. Signal Process. Conf., 2006.
[22] ——, “Simultaneous estimation of gain, delay, and offset utilizing the
Farrow structure,” in Proc. Europ. Conf. Circuit Theory Design, 2007,
pp. 244–247.
[23] C. W. Farrow, “A continuously variable digital delay element,” in Proc.
IEEE Int. Symp. Circuits Syst., 1988, pp. 2641–2645.
[24] T. Saramäki and T. Ritoniemi, “An efficient approach for conversion be-
tween arbitrary sampling frequencies,” in Proc. IEEE Int. Symp. Circuits
Syst., vol. 2, 1996, pp. 285–288.
[25] D. Babic and M. Renfors, “Power efficient structure for conversion be-
tween arbitrary sampling rates,” IEEE Signal Process. Lett., vol. 12, no. 1,
pp. 1–4, Jan. 2005.
[26] J. T. Kim, “Efficient implementation of polynomial interpolation filters
or full digital receivers,” IEEE Trans. Consum. Electron., vol. 51, no. 1,
pp. 175–178, Feb. 2005.
58 Chapter 7. References
[27] T. I. Laakso, V. Valimaki, M. Karjalainen, and U. K. Laine, “Splitting
the unit delay,” IEEE Signal Process. Mag., vol. 13, no. 1, pp. 30–60, Jan.
1996.
[28] J. Vesma, “A frequency-domain approach to polynomial-based interpola-
tion and the Farrow structure,” IEEE Trans. Circuits Syst. II, vol. 47,
no. 3, pp. 206–209, Mar. 2000.
[29] R. Hamila, J. Vesma, and M. Renfors, “Polynomial-based maximum-
likelihood technique for synchronization in digital receivers,” IEEE Trans.
Circuits Syst. II, vol. 49, no. 8, pp. 567–576, Aug. 2002.
[30] H. Johansson and P. Löwenborg, “On the design of adjustable fractional
delay FIR filters,” IEEE Trans. Circuits Syst. II, vol. 50, no. 4, pp. 164–
169, Apr. 2003.
[31] J. Vesma and T. Saramäki, “Polynomial-based interpolation filters – part
I: Filter synthesis,” Circuits Syst. Signal Process., vol. 26, no. 2, pp. 115–
146, Apr. 2007.
[32] V. Välimäki and A. Haghparast, “Fractional delay filter design based on
truncated Lagrange interpolation,” IEEE Signal Process. Lett., vol. 14,
no. 11, pp. 816–819, Nov. 2007.
[33] H. H. Dam, A. Cantoni, S. Nordholm, and K. L. Teo, “Variable digital
filter with group delay flatness specification or phase constraints,” IEEE
Trans. Circuits Syst. II, vol. 55, no. 5, pp. 442–446, May 2008.
[34] E. Hermanowicz, H. Johansson, and M. Rojewski, “A fractionally delaying
complex Hilbert transform filter,” IEEE Trans. Circuits Syst. II, vol. 55,
no. 5, pp. 452–456, May 2008.
[35] C. Candan, “An efficient filtering structure for Lagrange interpolation,”
IEEE Signal Process. Lett., vol. 14, no. 1, pp. 17–19, Jan. 2007.
[36] A. P. Chandrakasan and R. W. Brodersen, Low-Power Digital CMOS
Design. Kluwer Academic Publishers, Boston-Dordrecht-London, 1995.
[37] A. P. Chandrakasan, S. Sheng, and R. W. Brodersen, “Low-power CMOS
digital design,” IEEE J. Solid-State Circuits, vol. 27, no. 4, pp. 473–484,
Apr. 1992.
[38] N. S. Kim, T. Austin, D. Baauw, T. Mudge, K. Flautner, J. S. Hu, M. J.
Irwin, M. Kandemir, and V. Narayanan, “Leakage current: Moore’s law
meets static power,” Computer, vol. 36, no. 12, pp. 68–75, Dec. 2003.
[39] K. Roy, S. Mukhopadhyay, and H. Mahmoodi-Meimand, “Leakage cur-
rent mechanisms and leakage reduction techniques in deep-submicrometer
CMOS circuits,” Proc. IEEE, vol. 91, no. 2, pp. 305–327, Feb. 2003.
References 59
[40] International technology roadmap for semiconductors,
http://public.itrs.net.
[41] L. Wanhammar, DSP Integrated Circuits. Academic Press, 1998.
[42] K. K. Parhi, VLSI Digital Signal Processing Systems: Design and Imple-
mentation. John Wiley and Sons, 1999.
[43] L. B. Jackson, “On the interaction of round-off noise and dynamic range
in digital filters,” Bell Syst. Tech. J., vol. 49, pp. 159–184, Feb. 1970.
[44] W. Sung and K.-I. Kum, “Simulation-based word-length optimization
method for fixed-point digital signal processing systems,” IEEE Trans.
Signal Process., vol. 43, no. 12, pp. 3087–3090, Dec. 1995.
[45] K.-I. Kum and W. Sung, “Combined word-length optimization and high-
level synthesis of digital signal processing systems,” IEEE Trans. Comput.-
Aided Design Integr. Circuits Syst., vol. 20, no. 8, pp. 921–930, Aug. 2001.
[46] K. Han and B. L. Evans, “Optimum wordlength search using sensitivity
information,” EURASIP J. Applied Signal Process., vol. 2006, pp. 1–14,
Jan. 2006.
[47] D.-U. Lee and J. D. Villasenor, “A bit-width optimization methodology
for polynomial-based function evaluation,” IEEE Trans. Comput., vol. 56,
no. 4, pp. 567–571, Apr. 2007.
[48] P. D. Fiore, “Efficient approximate wordlength optimization,” IEEE
Trans. Comput., vol. 57, no. 11, pp. 1561–1570, Nov. 2008.
[49] G. A. Constantinides, P. Y. K. Cheung, and W. Luk, Synthesis And Op-
timization Of DSP Algorithms. Norwell, MA, USA: Kluwer Academic
Publishers, 2004.
[50] P. D. Fiore and L. Lee, “Closed-form and real-time wordlength adapta-
tion,” in Proc. IEEE Int. Conf. Acoustics Speech Signal Process., vol. 4,
1999, pp. 1897–1900.
[51] N. Herve, D. Menard, and O. Sentieys, “Data wordlength optimization for
FPGA synthesis,” in Proc. IEEE Workshop Signal Process. Syst. Design
Implementation, 2005, pp. 623–628.
[52] S. C. Chan and K. M. Tsui, “Wordlength determination algorithms for
hardware implementation of linear time invariant systems with prescribed
output accuracy,” in Proc. IEEE Int. Symp. Circuits Syst., 2005, pp. 2607–
2610.
[53] M. Vesterbacka, “On implementation of maximally fast wave digital fil-
ters,” Ph.D. dissertation, Diss., no. 487, Linköping university, Sweden,
June 1997.
60 Chapter 7. References
[54] G.-K. Ma and F. J. Taylor, “Multiplier policies for digital signal process-
ing,” IEEE ASSP Mag., vol. 7, no. 1, pp. 6–20, Jan. 1990.
[55] A. G. Dempster and M. D. Macleod, “Constant integer multiplication
using minimum adders,” IEE Proc. Circuits Devices Syst., vol. 141, no. 5,
pp. 407–413, Oct. 1994.
[56] O. Gustafsson, A. G. Dempster, and L. Wanhammar, “Extended re-
sults for minimum-adder constant integer multipliers,” in Proc. IEEE Int.
Symp. Circuits Syst., vol. 1, 2002.
[57] O. Gustafsson, A. Dempster, K. Johansson, M. Macleod, and L. Wanham-
mar, “Simplified design of constant coefficient multipliers,” Circuits Syst.
Signal Process., vol. 25, no. 2, pp. 225–251, Apr. 2006.
[58] S. He and M. Torkelson, “FPGA implementation of FIR filters using
pipelined bit-serial canonical signed digit multipliers,” in Proc. IEEE Cus-
tom Integrated Circuits Conf., 1994, pp. 81–84.
[59] K. Johansson, O. Gustafsson, and L. Wanhammar, “Low-complexity bit-
serial constant-coefficient multipliers,” in Proc. IEEE Int. Symp. Circuits
Syst., vol. 3, 2004.
[60] D. R. Bull and D. H. Horrocks, “Primitive operator digital filters,” IEE
Proc. Circuits, Devices and Syst., vol. 138, no. 3, pp. 401–412, Jun. 1991.
[61] M. Potkonjak, M. B. Srivastava, and A. P. Chandrakasan, “Multiple con-
stant multiplications: efficient and versatile framework and algorithms
for exploring common subexpression elimination,” IEEE Trans. Comput.-
Aided Design Integr. Circuits Syst., vol. 15, no. 2, pp. 151–165, Feb. 1996.
[62] A. G. Dempster and M. D. Macleod, “Digital filter design using subex-
pression elimination and all signed-digit representations,” in Proc. IEEE
Int. Symp. Circuits Syst., vol. 3, 2004.
[63] R. I. Hartley, “Subexpression sharing in filters using canonic signed digit
multipliers,” IEEE Trans. Circuits Syst. II, vol. 43, no. 10, pp. 677–688,
Oct. 1996.
[64] M. Martinez-Peiro, E. I. Boemo, and L. Wanhammar, “Design of high-
speed multiplierless filters using a non-recursive signed common subex-
pression algorithm,” IEEE Trans. Circuits Syst. II, vol. 49, no. 3, pp.
196–203, Mar. 2002.
[65] R. Pasko, P. Schaumont, V. Derudder, S. Vernalde, and D. Durackova, “A
new algorithm for elimination of common subexpressions,” IEEE Trans.
Comput.-Aided Design Integr. Circuits Syst., vol. 18, no. 1, pp. 58–68, Jan.
1999.
References 61
[66] O. Gustafsson and L. Wanhammar, “A novel approach to multiple con-
stant multiplication using minimum spanning trees,” in Proc. IEEE Int.
Midwest Symp. Circuits Syst., vol. 3, 2002.
[67] O. Gustafsson, H. Ohlsson, and L. Wanhammar, “Improved multiple con-
stant multiplication using a minimum spanning tree,” in Proc. Asilomar
Conf. Signals Syst. Comput., vol. 1, 2004, pp. 63–66.
[68] O. Gustafsson, “A difference based adder graph heuristic for multiple con-
stant multiplication problems,” in Proc. IEEE Int. Symp. Circuits Syst.,
2007, pp. 1097–1100.
[69] K. Muhammad and K. Roy, “A graph theoretic approach for synthesiz-
ing very low-complexity high-speed digital filters,” IEEE Trans. Comput.-
Aided Design Integr. Circuits Syst., vol. 21, no. 2, pp. 204–216, Feb. 2002.
[70] H. Ohlsson, O. Gustafsson, and L. Wanhammar, “Implementation of low
complexity FIR filters using a minimum spanning tree,” in Proc. IEEE
Mediterranean Electrotechnical Conf., vol. 1, 2004, pp. 261–264.
[71] A. G. Dempster and M. D. Macleod, “Use of minimum-adder multiplier
blocks in FIR digital filters,” IEEE Trans. Circuits Syst. II, vol. 42, no. 9,
pp. 569–577, Sep. 1995.
[72] A. G. Dempster, S. S. Dimirsoy, and I. Kale, “Designing multiplier blocks
with low logic depth,” in Proc. IEEE Int. Symp. Circuits Syst., vol. 5,
2002.
[73] Y. Voronenko and M. Püschel, “Multiplierless multiple constant multipli-
cation,” ACM Trans. Algorithms, vol. 3, no. 2, p. article 11, Apr. May
2007.
[74] A. G. Dempster and N. P. Murphy, “Efficient interpolators and filter banks
using multiplier blocks,” IEEE Trans. Signal Process., vol. 48, no. 1, pp.
257–261, Jan. 2000.
[75] O. Gustafsson and A. G. Dempster, “On the use of multiple constant
multiplication in polyphase FIR filters and filter banks,” in Proc. IEEE
Nordic Signal Process. Symp., 2004, pp. 53–56.
[76] A. G. Dempster, O. Gustafsson, and J. O. Coleman, “Towards an algo-
rithm for matrix multiplier blocks,” in Proc. Europ. Conf. Circuit Theory
Design, Sep. 2003.
[77] O. Gustafsson, H. Ohlsson, and L. Wanhammar, “Low-complexity con-
stant coefficient matrix multiplication using a minimum spanning tree
approach,” in Proc. IEEE Nordic Signal Process. Symp., 2004, pp. 141–
144.
62 Chapter 7. References
[78] M. D. Macleod and A. G. Dempster, “Common subexpression elimination
algorithm for low-cost multiplierless implementation of matrix multipli-
ers,” Electron. Lett., vol. 40, no. 11, pp. 651–652, May 2004.
[79] N. Boullis and A. Tisserand, “Some optimizations of hardware multipli-
cation by constant matrices,” IEEE Trans. Comput., vol. 54, no. 10, pp.
1271–1282, Oct. 2005.
[80] L. Aksoy, E. Costa, P. Flores, and J. Monteiro, “Optimization algorithms
for the multiplierless realization of linear transforms,” ACM Trans. Des.
Autom. Electron. Syst., accepted.
[81] L. Milic, Multirate Filtering for Digital Signal Processing: MATLAB Ap-
plications. Hershey, PA: Information Science Reference - Imprint of: IGI
Publishing, 2008.
[82] R. E. Crochiere and L. Rabiner, Multirate Digital Signal Processing. Pren-
tice Hall, Englewood Cliffs, N.J., 1983.
[83] E. Hogenauer, “An economical class of digital filters for decimation and in-
terpolation,” IEEE Trans. Acoust., Speech, Signal Process., vol. 29, no. 2,
pp. 155 –162, Apr. 1981.
[84] D. Babic, J. Vesma, and M. Renfors, “Decimation by irrational factor us-
ing CIC filter and linear interpolation,” in Proc. IEEE Int. Conf. Acous-
tics, Speech Signal Process., vol. 6, 2001, pp. 3677–3680.
[85] S. Chu and C. Burrus, “Multirate filter designs using comb filters,” IEEE
Trans. Circuits Syst., vol. 31, no. 11, pp. 913–924, Nov. 1984.
[86] J. Candy, “Decimation for Sigma Delta modulation,” IEEE Trans. Com-
mun., vol. 34, no. 1, pp. 72–76, Jan. 1986.
[87] Y. Gao, L. Jia, and H. Tenhunen, “A fifth-order comb decimation filter
for multi-standard transceiver applications,” in Proc. IEEE Int. Symp.
Circuits Syst., vol. 3, 2000, pp. 89–92.
[88] U. Meyer-Baese, Digital Signal Processing with Field Programmable Gate
Arrays (Signals and Communication Technology). Secaucus, NJ, USA:
Springer-Verlag New York, Inc., 2004.
[89] H. Aboushady, Y. Dumonteix, M.-M. Louerat, and H. Mehrez, “Efficient
polyphase decomposition of comb decimation filters in Σ∆ analog-to-
digital converters,” IEEE Trans. Circuits Syst. II, vol. 48, no. 10, pp.
898–903, Oct. 2001.
[90] A. Blad and O. Gustafsson, “Integer linear programming-based bit-level
optimization for high-speed FIR decimation filter architectures,” Circuits
Syst. Signal Process., vol. 29, no. 1, pp. 81–101, Feb. 2010.
References 63
[91] A. Y. Kwentus, Z. Jiang, and J. Willson, A. N., “Application of filter
sharpening to cascaded integrator-comb decimation filters,” IEEE Trans.
Signal Process., vol. 45, no. 2, pp. 457–467, Feb. 1997.
[92] M. Laddomada and M. Mondin, “Decimation schemes for Sigma Delta
A/D converters based on kaiser and hamming sharpened filters,” IEE
Proc. –Vision, Image Signal Process., vol. 151, no. 4, pp. 287–296, Aug.
2004.
[93] G. Jovanovic-Dolecek and S. K. Mitra, “A new two-stage sharpened comb
decimator,” IEEE Trans. Circuits Syst. I, vol. 52, no. 7, pp. 1414–1420,
Jul. 2005.
[94] M. Laddomada, “Generalized comb decimation filters for Sigma Delta
A/D converters: Analysis and design,” IEEE Trans. Circuits Syst. I,
vol. 54, no. 5, pp. 994–1005, May 2007.
[95] ——, “Comb-based decimation filters for Sigma Delta A/D converters:
Novel schemes and comparisons,” IEEE Trans. Signal Process., vol. 55,
no. 5, pp. 1769–1779, May 2007.
[96] J. Kaiser and R. Hamming, “Sharpening the response of a symmetric
nonrecursive filter by multiple use of the same filter,” IEEE Trans. Acoust.,
Speech, Signal Process., vol. 25, no. 5, pp. 415–422, Oct. 1977.
[97] T. Saramäki and T. Ritoniemi, “A modified comb filter structure for dec-
imation,” in Proc. IEEE Int. Symp. Circuits Syst., vol. 4, 1997, pp. 2353–
2356.
[98] L. Lo Presti, “Efficient modified-sinc filters for sigma-delta A/D convert-
ers,” IEEE Trans. Circuits Syst. II, vol. 47, no. 11, pp. 1204–1213, Nov.
2000.
[99] G. Stephen and R. W. Stewart, “Sharpening of partially non-recursive
CIC decimation filters,” in Proc. Asilomar Conf. Signals Syst. Comput.,
vol. 2, 2004, pp. 2170–2173.
[100] G. J. Dolecek, “Modified CIC filter for rational sample rate conversion,”
in Proc. Int. Symp. Commun. and Information Tech., 2007, pp. 252–255.
[101] J. Vesma, “Optimization and applications of polynomial-based interpo-
lation filters,” Ph.D. dissertation, Diss. no. 254, Tampere University of
Technology, 1999.
[102] D. E. Knuth, The Art of Computer Programming, vol. 2: Seminumerical
Algorithms, 1st ed. Addison-Wesley, Reading, Massachusetts, 1998.
[103] S. Lakshmivarahan and S. K. Dhall, Analysis and Design of Parallel Al-
gorithms: Arithmetic and Matrix Problems. McGraw-Hill, 1990.
64 Chapter 7. References
[104] J.-M. Muller, Elementary Functions: Algorithms and Implementation.
Birkhauser, 2005.
[105] G. Estrin, “Organization of computer systems: the fixed plus variable
structure computer,” in Proc. Western Joint IRE-AIEE-ACM Computer
Conf., ser. IRE-AIEE-ACM ’60 (Western). New York, NY, USA: ACM,
1960, pp. 33–40.
[106] W. S. Dorn, “Generalizations of Horner’s rule for polynomial evaluation,”
IBM J. Res. Dev., vol. 6, no. 2, pp. 239–245, Apr. 1962.
[107] I. Munro and A. Borodin, “Efficient evaluation of polynomial forms,” J.
Comput. Syst. Sci., vol. 6, no. 6, pp. 625–638, Dec. 1972.
[108] I. Munro and M. Paterson, “Optimal algorithms for parallel polynomial
evaluation,” J. Comput. Syst. Sci., vol. 7, no. 2, pp. 189–198, Apr. 1973.
[109] K. Maruyama, “On the parallel evaluation of polynomials,” IEEE Trans.
Comput., vol. 22, no. 1, pp. 2–5, Jan. 1973.
[110] L. Li, J. Hu, and T. Nakamura, “A simple parallel algorithm for polyno-
mial evaluation,” SIAM J. Sci. Comput., vol. 17, no. 1, pp. 260–262, Jan.
1996.
[111] J. Villalba, G. Bandera, M. A. Gonzalez, J. Hormigo, and E. L. Zapata,
“Polynomial evaluation on multimedia processors,” in Proc. IEEE Int.
Applicat.-Specific Syst. Arch. Processors Conf., 2002, pp. 265–274.
[112] D.-U. Lee, A. A. Gaffar, O. Mencer, and W. Luk, “Optimizing hardware
function evaluation,” IEEE Trans. Comput., vol. 54, no. 12, pp. 1520–
1531, Dec. 2005.
[113] D.-U. Lee, R. C. C. Cheung, W. Luk, and J. D. Villasenor, “Hardware im-
plementation trade-offs of polynomial approximations and interpolations,”
IEEE Trans. Comput., vol. 57, no. 5, pp. 686–701, May 2008.
[114] N. Brisebarre, J.-M. Muller, A. Tisserand, and S. Torres, “Hardware op-
erators for function evaluation using sparse-coefficient polynomials,” Elec-
tron. Lett., vol. 42, no. 25, pp. 1441–1442, Dec. 2006.
[115] F. de Dinechin, M. Joldes, and B. Pasca, “Automatic generation of
polynomial-based hardware architectures for function evaluation,” in Proc.
IEEE Int. Applicat.-Specific Syst. Arch. Processors Conf., 2010, pp. 216–
222.
[116] S. Nagayama, T. Sasao, and J. T. Butler, “Design method for numerical
function generators based on polynomial approximation for FPGA imple-
mentation,” in Proc. Euromicro Conf. Digital Syst. Design Arch., Methods
and Tools, 2007, pp. 280–287.
References 65
[117] J. Pihl and E. J. Aas, “A multiplier and squarer generator for high perfor-
mance DSP applications,” in Proc. IEEE Midwest Symp. Circuits Syst.,
vol. 1, 1996, pp. 109–112.
[118] J.-T. Yoo, K. F. Smith, and G. Gopalakrishnan, “A fast parallel squarer
based on divide-and-conquer,” IEEE J. Solid-State Circuits, vol. 32, no. 6,
pp. 909–912, Jun. 1997.
[119] M. Wojko and H. ElGindy, “On determining polynomial evaluation struc-
tures for FPGA based custom computing machines,” in Proc. Australasian
Comput. Arch. Conf., 1999, pp. 11–22.
[120] H. C. A. van Tilborg, Encyclopedia of Cryptography and Security. New
York, USA,: Springer, 2005.
[121] R. L. Rivest, A. Shamir, and L. Adleman, “A method for obtaining digital
signatures and public-key cryptosystems,” Commun. ACM, vol. 21, no. 2,
pp. 120–126, Feb. 1978.
[122] A. C.-C. Yao, “On the evaluation of powers,” SIAM J. Comput., vol. 5,
no. 1, pp. 100–103, Aug. 1976.
[123] D. Bleichenbacher and A. Flammenkamp, “An efficient algorithm for com-
puting shortest addition chains,” SIAM J. Comput., 1998.

Linköping Studies in Science and Technology Dissertations, No 1420

Muhammad Abbas mabbas@isy.liu.se www.es.isy.liu.se Division of Electronics Systems Department of Electrical Engineering Linköping University SE–581 83 Linköping, Sweden

Copyright c 2012 Muhammad Abbas, unless otherwise noted. All rights reserved. Papers A, D, and E reprinted with permission from IEEE. Papers B and C, partial work, reprinted with permission from IEEE.

Abbas, Muhammad On the Implementation of Integer and Non-Integer Sampling Rate Conversion ISBN 978-91-7519-980-1 ISSN 0345-7524

A Typeset with L TEX 2ε Printed by LiU-Tryck, Linköping, Sweden 2012

To my loving parents

The fractional-delay filters are used in this context to provide a fractional-delay adjustable to any desired value and are therefore suitable for both integer and non-integer factors. They are usually approximated by some rational factor. CIC filters are most commonly used in the case of integer sampling rate conversions and often preferred due to their simplicity. SRC is used in many communication and signal processing applications where two signals or systems having different sampling rates need to be interconnected. and cascaded integrator comb (CIC) filters are one of them. which generally yields to low power consumption. or an irrational number. The computation of a value between existing samples can alternatively be regarded as delaying the underlying signal by a fractional sampling period. The performance of decimators and interpolators is mainly determined by the filters. The multiplierless nature. are critical with respect to power consumption. a ratio of two integers. In this work. Sampling rate conversion is a typical example where it is required to determine the values between existing samples. In the second approach. Since these filters operate at the maximum sampling frequency. therefore. The SRC by an irrational numbers is impractical and is generally stated for the completeness. It is therefore necessary to have an accurate and efficient ways and approaches that could be utilized to estimate the power consumption and the important factors that are contributing to it. The conversion factor in general can be an integer. which are there to suppress aliasing effects or removing unwanted images.Abstract The main focus in this thesis is on the aspects related to the implementation of integer and non-integer sampling rate conversion (SRC). There are two basic approaches to deal with this problem. There are many approaches for the implementation of decimation and interpolation filters. which is an important parameter to consider since the input sampling rate may differ several orders of magnitude. the implementation complexity for the second approach varies for different types of conversion factors. hardware efficiency. These power estimates at higher level can then be used as a feed-back while exploring multiple alternatives. The main advantage of the Farrow strucv . However. To have a high-level estimate of dynamic power consumption. and relatively good anti-aliasing (anti-imaging) characteristics for the first (last) stage of a decimation (interpolation). The structure that is used in the efficient implementation of a fractional-delay filter is know as Farrow structure or its modifications. digital signal processing techniques are utilized to compute values of the new samples from the existing ones. The former approach is hardly used since the latter one introduces less noise and distortion. The first is to convert the signal to analog and then re-sample it at the desired rate. The modeling of leakage power is also included. makes CIC filters well suited for performing conversion at higher rate. switching activity equations in CIC filters are derived. which may then be used to have an estimate of the dynamic power consumption. the second approach for SRC is considered and its implementation details are explored. Switching activity is one such factor.

In the considered fixed-point implementation of the Farrow structure. closed-form expressions for suitable word lengths are derived based on scaling and round-off noise.vi Abstract ture lies in the fact that it consists of fixed finite-impulse response (FIR) filters and there is only one adjustable fractional-delay parameter. By considering the number of operations of different types. As a part of this. since exponents are additive under multiplication. and latency after pipelining. In the parallel evaluation of powers. The implementation of the polynomial part of the Farrow structure is investigated by considering the computational complexity of different polynomial evaluation schemes. critical path. used to evaluate a polynomial with the filter outputs as coefficients. Since multipliers share major portion of the total power consumption. This characteristic of the Farrow structure makes it a very attractive structure for the implementation. high-level comparisons are obtained and used to short list the suitable candidates. pipelining complexity. . Most of these evaluation schemes require the explicit computation of higher order power terms. a matrix-vector multiple constant multiplication approach is proposed to improve the multiplierless implementation of FIR sub-filters. redundancy in computations is removed by exploiting any possible sharing at word level and also at bit level. an ILP formulation for the minimum addition sequence problem is proposed.

Som en sista del i avhandlingen presenteras dels en undersökning av komplexiteten för olika metoder att räkna ut polynom. Mycket av tidigare arbete har handlat om hur man väljer värden för multiplikatorerna. Utöver detta presenteras en ny metod för att ersätta multiplikatorerna med adderare och multiplikationer med två. medan själva implementeringen har rönt mindre intresse. Ett exempel kan vara system där flera olika standarder stöds och varje standard behöver behandlas med sin egen datahastighet. Läckningsströmmar blir ett relativt sett större och större problem ju mer kretsteknologin utvecklas. utan att behöva ta hänsyn till om ändringen av datatakten är ett heltal eller inte. I denna avhandling presenteras ett antal resultat relaterat till implementeringen av sådana filter. helt utan mer kostsamma multiplikatorer som behövs i många andra filterklasser. Detta är viktigt eftersom antalet bitar direkt påverkar mängden kretsar som i sin tur påverkar mängden effekt som krävs. dels föreslås två olika metoder att effektivt räkna ut kvadrater. För att kunna ändra hastigheten krävs i princip alltid ett digitalt filter som kan räkna ut de värden som saknas eller se till att man säkert kan slänga bort vissa data utan att informationen förstörs. Utöver detta presenteras mycket noggranna ekvationer för hur ofta de digitala värdena som representerar signalerna i dessa filter statistiskt sett ändras. där den största skillnaden jämfört med tidigare liknande arbeten är att effekt som förbrukas genom läckningsströmmar är medtagen. I Farrow-filter så behöver det även implementeras en uträkning av ett polynom. Dessa används för att fördröja en signal mindre än en samplingsperiod. Dessa används flitigt då de kan implementeras med enbart ett fåtal adderare. Ett annat är dataomvandlare som ut vissa aspekter blir enklare att bygga om de arbetar vid en högre hastighet än vad som teoretiskt behövs för att representera all information i signalen.Populärvetenskaplig sammanfattning I system där digitala signaler behandlas så kan man ibland behöva ändra datahastigheten (samplingshastighet) på en redan existerande digital signal. Detta är intressant eftersom multiplikationer med två kan ersättas med att koppla ledningarna lite annorlunda och man därmed inte behöver några speciella kretsar för detta. något som har en direkt inverkan på effektförbrukningen. Här presenteras slutna uttryck för hur många bitar som behövs i implementeringen för att representera allt data tillräckligt noggrant. samt enkelt kan användas för olika ändringar av datahastighet så länge ändringen av datatakten är ett heltal. vii . så det är viktigt att modellerna följer med. kuber och högre ordningens heltalsexponenter av tal. En modell för hur mycket effekt olika typer av implementeringar förbrukar presenteras. Den andra klassen av filter är så kallade Farrow-filter. något som kan användas för att räkna ut mellanliggande datavärden och därmed ändra datahastighet gotdtyckligt. Den första klassen av filter är så kallade CIC-filter.

.

A preliminary version of the above work can be found in: M. Circuits Syst. O. is extended with the modeling of leakage power. Paper A The power modeling of different realizations of cascaded integrator-comb (CIC) decimation filters. 2010. Sweden. “Switching activity estimation for cascaded integrator comb filters. Gustafsson. ix . Abbas. Abbas. under review. Johansson. Wanhammar. Shanghai. based on the proposed model.” in Proc. The work has been done between December 2007 and December 2011. 2010. demonstrates the close correspondence of the estimation model to the simulated one.Preface This thesis contains research work done at the Division of Electronics Systems. Gustafsson. Model results for the case of phase accumulators of direct digital frequency synthesizers (DDFS) are also presented. The switching activities may then be used to estimate the dynamic power consumption. The switching activity estimation model is first developed for the general-purpose integrators and CIC filter integrators.. “Power estimation of recursive and non-recursive CIC filters implemented in deep-submicron technology. and those obtained by simulation. 22–24. Conf. and has resulted in the following publications. The inclusion of this factor becomes more important when the input sampling rate varies by several orders of magnitude. The correlation in sign extension bits is also considered in the switching estimation model. recursive and non-recursive. “Switching activity estimation of CIC filter integrators. Shanghai. China. Paper B A method for the estimation of switching activity in cascaded integrator comb (CIC) filters is presented. Different values of differential delay in the comb part are considered for the estimation. Green Circuits Syst. Also the importance of the input word length while comparing recursive and non-recursive implementations is highlighted. IEEE Asia Pacific Conf. The comparison of theoretical estimated switching activity results. M. I. Department of Electrical Engineering.” IEEE Trans. Sept. Linköping University. and K. and L. Gustafsson. Postgraduate Research in Microelectronics and Electronics. which normally operate at the lower sampling rate. China. The switching activity estimation model is also derived for the comb sections of the CIC filters. The model was then extended to gather the effects of pipelining in the carry chain paths of CIC filter integrators.” in Proc. Abbas and O. M. O. June 21–23. IEEE Int.

High-level comparisons of these schemes are obtained based on the number of operations of different types. 2011. “On the implementation of fractional delay filters based on the Farrow structure. Gustafsson. First. and in most cases fewer total adders. Paper E The problem of computing any requested set of power terms in parallel using summations trees is investigated. “Computational and implementation complexity of polynomial evaluation schemes. Circuits Syst.. a round-off noise analysis is performed and closed-form expressions are derived. “Round-off analysis and word length optimization of the fractional delay filters based on the Farrow structure..” in Proc. but they are divided into data-data multiplications.” in Proc. fewer delay elements. signal scaling in the Farrow structure is studied which is crucial for a fixed-point implementation. These parameters are suggested to consider to short list suitable candidates for an implementation given the specifications. Lund. Second. Sweden. A technique is proposed. 2009. O. Symp. Abbas. IEEE Int. Taiwan. Circuits Syst. Gustafsson. May 24–27.. critical path. Taipei. M. squarers. and latency after pipelining. Closed-form expressions for the scaling levels for the outputs of each sub-filter as well as for the nodes before the delay multipliers are derived. O. Third. and data-coefficient multiplications. Their impact on different parameters suggested for the selection is stated. Johansson. “Scaling of fractional delay filters using the Farrow structure. May 4–5. In comparisons.x Paper C Preface In this work. Sweden. Johansson. 14–15. O. there are three major contributions. and H. 2009. Swedish System-on-Chip Conf. Abbas and O. By using these closed-form expressions for the round-off noise and scaling in terms of integer bits. I. Preliminary versions of the above work can be found in: M. Paper D The computational complexity of different polynomial evaluation schemes is studied. pipelining complexity. Nov. and H. under review. Arlid. Abbas.” in Proc. direct form sub-filters leading to a matrix MCM block is proposed. Rusthållargården. and H. Gustafsson. different approaches to find the suitable word lengths to meet the round-off noise specification at the output of the filter are proposed. Gustafsson. M. not only multiplications are considered. IEEE Norchip Conf. which first generates . Johansson. which stems from the approach for implementing parallel FIR filters. Abbas.” IEEE Trans. The use of a matrix MCM blocks leads to fewer structural adders. M.

Since exponents are additive for a multiplication. The redundancy here relates to the fact that same three partial products may be present in more than one columns. Gustafsson. Comp. Z. Sheikh.” in Proc. Gustafsson. CA. hence. The testing of the proposed algorithm for different sets of powers. M. 2010. Comp.. Abbas and O. H. The contributions are also made in the following publication but the contents are not directly relevant or less relevant to the topic of thesis. Paper F An integer linear programming (ILP) based model is proposed for the computation of a minimal cost addition sequence for a given set of integers. M. Paper G Based on the switching activity estimation model derived in Paper B. and K. “Integer linear programming modeling of addition sequences with additional constraints for evaluation of power terms.” manuscript. Not only is an optimal model proposed. This approach has achieved considerable hardware savings for almost all of the cases. Qureshi. the minimal length addition sequence will provide an efficient solution for the evaluation of a requested set of power terms.Preface xi the partial product matrix of each power term independently and then checks the computational redundancy in each and among all partial product matrices at bit level. “Comparison of multiplierless implementation of nonlinear-phase versus linear-phase FIR filters. Pacific Grove. “Switching activity estimation of DDFS phase accumulators. Oct. and signed/unsigned numbers is done to exploit the sharing potential. help to reduce the solution time. but the model is extended to consider different costs for multipliers and squarers as well as controlling the depth of the resulting addition sequence. Signals Syst.. Additional cuts are also proposed which. O. Abbas. the model equations are derived for the case of phase accumulators of direct digital frequency synthesizers (DDFS). Gustafsson. M.” manuscript. 2008. CA. 26–29. O.” in Proc. variable word lengths. and A. M. Asilomar Conf. 7–10. Johansson. although not required for the solution. Gustafsson. Signals Syst. Abbas and O. Blad. Abbas. and. Johansson. “Low-complexity parallel evaluation of powers exploiting bit-level redundancy. all can be mapped to the one full adder. . Asilomar Conf. Pacific Grove. F. Nov.

.

Anton Blad for their kind and constant support throughout may stay here at the department. Muhammad Saifullah Khan. Oscar Gustafsson and Prof. My friends here in Sweden. Dr. the Merciful. Higher Education Commission (HEC) of Pakistan is gratefully acknowledged for the financial support and Swedish Institute (SI) for coordinating the scholarship program. Erik Höckerdal for proving the LaTeX template. Kent Palmkvist for help with FPGA and VHDL-related issues. Dr. thoughts. which has made life very easy. Muhammad Irfan Kazim. Dr. Fahad Qazi. affectionate parents. the last prophet of Allah. Rizwan Asghar for their generous help and guidance at the start of my PhD study. talented teachers. Ali Saeed. helping friends and an opportunity to contribute to the vast body of knowledge. Rashad Ramzan and Dr. Peter Johansson for all the help regarding technical as well as administrative issues. and Syed Asad Alam for being caring friends and providing help in proofreading this thesis. for their inspiring and valuable guidance. Dr. Linköping University is also gratefully acknowledged for partial support. They always kindly do their best to help you. Kenny Johansson for introducing me to the area and power estimation tools and sharing many useful scripts. enlightening discussions. Tafzeel ur Rehman. the Compassionate. Linköping University have created a very friendly environment. Syed Ahmed Aamir. Mohammad Junaid. Håkan Johansson. who gave health. xiii . Dr. Peace and blessing of Allah be upon the Holy Prophet MUHAMMAD (peace be upon him). Department of Electrical Engineering. Jawad ul Hassan. Fahad Qureshi. and many more for all kind of help and keeping my social life alive. Nadeem Afzal. The former and present colleagues at the Division of Electronics Systems. Saima Athar. Dr. who exhort his followers to seek for knowledge from cradle to grave and whose incomparable life is the glorious model for the humanity. Zafar Iqbal. Dr. I have learnt a lot from them and working with them has been a pleasure. Zaka Ullah.Acknowledgments I humbly thank Allah Almighty. I would like to express my sincere gratitude towards: My advisors. Amir Eghbali and Dr. Muhammad Touqir Pasha. kind and dynamic supervision through out and in all the phases of this thesis.

Qaisar Abbas and sister-in-law Dr. Thanks to both of you for having confidence and faith in me. care of Abiha single handedly. My elder brother Dr. who has made my life so beautiful. Truly you hold the credit for all my achievements. I say profound thanks for bringing pleasant moments in my life. patience. unconditional cooperation. 2012. To my little princess. I have never felt that I am away from home. My mother and my father for their non-stop prays. My younger brother Muhammad Waqas and sister for their prays and taking care of the home in my absence. brothers. My wife Dr. My parent-in-laws. Thanks for hosting many wonderful days spent there at Uppsala. Abiha. To those not listed here. and being away from her family for few years. Khalid Bin Sagheer. and Ghulam Hussain for all the help and care they have provided during the last four years.xiv Acknowledgments My friends and colleagues in Pakistan especially Jamaluddin Ahmed. Linköping Sweden . Uzma for their help at the very start when I came to Sweden and throughout the years later on. Shazia for her devotion. being a great asset with me during all my stay here in Sweden. Muhammad Zahid. and sisters for their encouragement and prays. Muhammad Abbas January.

. . . . . 2. . .1 Two’s Complement Numbers . .1 Digital Filters . .1. . . . . 2. . . . . . . . . . . . . . . . . . . . . . . . 1. . . . . .1 FIR Decimators . . . 1. . . . .1. . 1. .2 Canonic Signed-Digit Representation 2. . . .1 FIR Filters . . . 2. . . . . . . . . . .1. . . . . .1 Basic Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1. . . . . . .1. . . . . . . . . . . . . . . . . . . . . . . . . . .2. . . . .1. . . . . . 1. . 3. . 3. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .1 Two’s Complement Addition . 2. 1. . . . 2. . .2. . . . . . . . . . . . . . . . . . .2 Sampling Rate Conversion . . . . . . . . . 3. . . . . . . . . 1 1 1 2 3 4 5 7 8 10 13 17 17 18 19 19 21 21 21 22 23 23 24 27 27 28 29 29 30 31 2 Finite Word Length Effects 2. . . . . . . . . . . . . 1. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2.3 Overflow Characteristics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3. . . 3. . . . . . . . . . . . . . . . . . . . . . .6 Word Length Optimization . . . . . . . . . . . . . . . . . . . . . . . . . . . 2. .2 Two’s Complement Multiplication . . . .4 Fractional-Delay Filters . . . .7 Constant Multiplication . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2 FIR Interpolators . . . . . . . .3 Polyphase Representation . . . . . . . .3. . . . . .2 IIR Filters . . . . 2. . .5 Round-Off Noise . . . . . . . . . . . . . . . . .2 Interpolation . . . . .2 Polyphase FIR Filters . . . . . . .5 Power and Energy Consumption . . . . . . . . . . .1 Numbers Representation . 1. . . . . . . . . . . . . . . . . . . . . 2. . . . . . . . .2. . .1 Polyphase FIR Decimators . . . 3. . . . . . . . . . . . . . . . . . . .1. . . . . . . . . 2. . . . . . . . . . . . .2. . . . . . . . . . . .4 Scaling . . 3 Integer Sampling Rate Conversion 3. .2 Fixed-Point Quantization . . .Contents 1 Introduction 1.2 Polyphase FIR Interpolators xv . . . . . .3 Noble Identities . . . . . . . . . . . . . . . . . . . . . . . . . . . .1 Decimation . . 1. . .

. 5. . . . . . . . .2 Horner’s Scheme .4. . . . . . . . . . . References . . 41 4. . . .4. . . . . . . . . . . . 5 Conclusion . . . . . . . . . . 56 Publications A Power Estimation of Recursive and Non-Recursive CIC Implemented in Deep-Submicron Technology 1 Introduction . Filters . . . . . . . . . . . . . .1 Powers Evaluation with Sharing at Word-Level 5. . . . . . . . .1 CIC Filters in Interpolators and Decimators . . . . . . 2. . . . . 3. 31 32 33 35 36 4 Non-Integer Sampling Rate Conversion 37 4. . 37 4. . . . . . . . . . . . . . .xvi 3. . . . . . . . . . 39 4. Cascaded Integrator Comb Filters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4 Powers Evaluation . .4 Multistage Implementation . . 5. . . . . . . .3 Parallel Schemes . . . . . . . . . . . . . . . . . . . . . . 3. . . . . . 53 7 References 55 References . . . 4 Results . . . . . . .2 Polyphase Implementation of Rational Sampling Rate Conversion . . . . . . . . . . . . . . . . . . . . . 67 69 72 72 73 76 76 79 80 83 . .2 Powers Evaluation with Sharing at Bit-Level . . . . . . . . . . . . . . . . . .1. . .3 Large Rational Factors . . . . . .1 Polynomial Evaluation Algorithms . . . . . . . . .1 Sampling Rate Conversion: Rational Factor . . . . . . . . . . . . . . . . . . . . . . . . . .1 Complexity and Power Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .4. . . . . . . . . . . . . . . .1 Small Rational Factors . . .1 Complexity and Power Model . . . . . . 3 Non-Recursive CIC Filters . . . . . . . . . . . . . . . . . . . . . . . . 5. 40 4. . . . . . . . . . 6 Conclusions and Future Work 53 6. .2 Polyphase Implementation of Non-Recursive CIC 3. . . . . . . 5. . . . . . . . . . . Contents . . . . . . . . . . . .2. 53 6. . 3.1 Farrow Structures . . . . . . . . . . . . . . . . .2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .1.4. . . . . . . . . . . . . . . . . . . .3 3. . . . . . . . . . . . . . . . . . . . 37 4. . . . . . . 45 45 46 46 47 48 49 . .1. . . . . . . . . . . . . . . . . . . . . . 2 Recursive CIC Filters . . . . .3 Compensation Filters . . . . . . . . . . . . . . . . . . . .4. . . .1 Conclusions . . . . . . . . . . . . . . . . . 41 5 Polynomial Evaluation 5. . . . . . . . . . . .2 Sampling Rate Conversion: Arbitrary Factor . . . .

. . . . . . . . . . . .2 Word Length Selection Approaches . . . . . .4 Switching Activity for Correlated Sign Extension Bits . . . . . . . . . .2 Scaling in the Polynomial Evaluation .4 Switching Activity for Later Comb Stages .1 Scaling . . 118 C On the Implementation of Fractional-Delay Filters the Farrow Structure 1 Introduction . . . . . 4. .2 CIC Combs . 116 A Double Differential Delay Comb . . . . . . . . . . . 6. . . . . . . . . . . . . . . . . . . . . . 109 5. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108 5 Results . . . . . . . . . . . . . .1 One’s Probability . . . .3 Approaches to Select Word Lengths . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6. . . . 151 . . . . 97 3. . . . . . . . . . 146 . . . . . . . 130 . . . . . .1 Rounding and Truncation .1 Switching Activity . . . . 4 Round-Off Noise . . . . . . . . . . . . . . . . . .1 One’s Probability . 104 4. . 1. . 126 . . 111 7 Conclusions . . . 109 5. . . . 133 . . . . . 118 A. . . . . . . . . . . . . . .2 Round-Off Noise in the Farrow Structure . . 129 . . . . . 2 Adjustable FD FIR Filters and the Farrow Structure 3 Scaling of the Farrow Structure for FD FIR Filters . . . . . . . . . . . . . . . . 137 . . . 112 References . . 144 . . . . . . . . 126 . . 6. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101 3. . . . . . . . . . . . . . . . . . . . . . . . . .3 Sub-Filter Implementation . . . . 103 4. . . . . . . . . . . . 143 . . . . . .3 Pipelining in CIC Filter Accumulators/Integrators . . . . . . . . . .2 Switching Activity . . 4. . . . . . . . . . . . . . . . . . . . . . . . . . 127 . . . .2 Outline . . 3. . . . . . . 7 Conclusions . . . . . . . . . . . . . . . . . . References . . . 128 . . . . . . . . . . . 145 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Based on 123 . . . . . . . . . . . . . 90 3. . . . . . . . . . 89 3 CIC Filter Integrators/Accumulators . . . . . . . . . . . . . . . . . . . . . . . . . . 134 . . . . .1 CIC Integrators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4. . . . . . . . . 110 6 Discussion . . . . . . . . . . . . . . . . . . . . . . . 91 3. . . . 93 3. . . . . . . . . . . . . . . . . . . . 127 .1 Contribution of the Paper . . . . . . . . . . . . . . . . . . . . . 5 Implementation of Sub-Filters in the Farrow Structure 6 Results . . . . . . . . . .1 Scaling the Output Values of the Sub-Filters . . . . . . . . . . . 103 4. . . . . . .2 Switching Activity . . . . . . . . . . . . . . . . . . . . . . 106 4. . . . . . . . . . . . . . . . . . . . . . . . . . . . .Contents xvii B Switching Activity Estimation of Cascaded Integrator Comb Filters 85 1 Introduction . . . . . . 150 . . . . . . . . . . . 88 2 CIC Filters . 136 . .3 Arbitrary Differential Delay Comb . . . . . . 1. . . 102 4 CIC Filter Combs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148 . . . . . . . . 3. . . . . . .5 Switching Activity for Later Integrator Stages . . . . . . .

. . and Latency . 161 2. . . . . . . . . . . . .8 Pipelining. . . . . . . . . . . . . . . . . . . . . . . . 184 References . . . . . . . . . 181 3. . 181 3. . . . . . . . . . . . . . . . . . . . . . .1 Unsigned Binary Numbers . . . . . . . . . . . . . . . 167 References . . . . . . . 178 2. . . . . . 158 2 Polynomial Evaluation Algorithms . . . . . 164 2. .3 Partial Product Matrices with CSD Representation of Partial Product Weights . . . . . . . . . . . . . . . . 180 2. . . . . . . . . 180 3 Proposed Algorithm for the Exploitation of Redundancy . . . . . . . . . 194 3. .5 Maruyama’s Scheme . . 195 3. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178 2 Powers Evaluation of Binary Numbers . . . . . . . . . . . . . .2 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159 2. . . . . . . . . . . . . . . 201 . . . . . . . . .2 Minimizing Weighted Cost . . . . . . . . . . . . . . . . 162 2. . . . . . . . .4 Pruning of the Partial Product Matrices . . 182 3. . . Critical Path. . . . . . . . . . . . 159 2. . 182 4 Results . . . . . . . . . . . . . . . . . 196 5 Conclusions . . . . . . . . . . . . 196 4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 193 3 Proposed ILP Model . .1 Unsigned Binary Numbers Case . . . . . . . . . . . . . . . . . . . . .1 Horner’s Scheme . . 163 2. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165 4 Conclusions .4 Munro and Paterson’s Scheme . . . . . . . . . . . . . . . . . . .xviii Contents D Computational and Implementation Complexity of Polynomial Evaluation Schemes 155 1 Introduction .3 Minimizing Depth . . . . . . . . . . . . .2 Dorn’s Generalized Horner Scheme . . . . 178 2. . . . . . . . . . . . 172 E Low-Complexity Parallel Evaluation of Powers Exploiting BitLevel Redundancy 175 1 Introduction . . . Scheme . . . . . . . . . . . 188 F Integer Linear Programming Modeling of Addition Sequences With Additional Constraints for Evaluation of Power Terms 189 1 Introduction . . 192 2 Addition Chains and Addition Sequences . . . . . 194 3. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198 References . . . . . . . . . . . .7 Li et al. . . . . . . . . . . . . . . . . . . . . . . . 160 2. . . 164 3 Results . . . . . . . . 183 5 Conclusion . . . . . . . . . . . . . . . . . .2 Powers Evaluation for Two’s Complement Numbers . . . . . . . . . . . . . . . . . . .3 Two’s Complement Input and CSD Encoding Case . . . . . . . . . . .3 Estrin’s Scheme . . . . . .6 Even Odd (EO) Scheme . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180 2. . . . . . . . . . . . . 159 2. . . . . . . . . . . . . . . . . . . . . . . . .1 Basic ILP Model . . . . . .

. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xix 203 206 207 207 209 211 . . . . . . . . . 4 Conclusion . . . . . . References . . . . . . . . . . . . . . . Phase Accumulators . 3 Switching Activity . . . . . . . . . . . . . . 2 One’s Probability . . . . . . . .Contents G Switching Activity Estimation of DDFS 1 Introduction . . . . . . . . . . . . . . . . . . .

.

1 FIR Filters If the impulse response is of finite duration and becomes zero after a finite number of samples. The sampling rate conversion (SRC) factor can be an integer or a non-integer. Finally. there is a need to convert the given sampling rate to the desired sampling rate. A part of the chapter is devoted to the efficient polyphase implementation of decimators and interpolators. A digital filter is characterized by its transfer function. output. it is a finite-length impulse response (FIR) filter. The fractional-delay filters are then briefly reviewed. and inside of the systems. or equivalently. The concept of decimation and interpolation that include filtering is explained. when changing the sampling rate by an integer factor. the power and energy consumption in CMOS circuits is described. The description of six identities that enable the reductions in computational complexity of multirate systems is given. This chapter gives a brief overview of SRC and the role of filtering in SRC. In applications involving systems operating at different sampling rates. 1.Chapter 1 Introduction Linear time-invariant (LTI) systems have the same sampling rate at the input.1. A causal 1 . without destroying the signal information of interest.1 Digital Filters Digital filters are usually used to separate signals from noise or signals in different frequency bands by performing mathematical operations on a sampled. Digital filters are first introduced. The SRC. by its difference equation. is then explained. The time and frequency-domain representations of the downsampling and upsampling operations are then given. 1. discrete-time signal.

h(N ) are the impulse response values.2. 2] N H(z) = k=0 h(k)z −k .1) which is a polynomial in z −1 of degree N . The number of coefficient multiplications in the direct and transpose forms can be halved when exploiting the coefficient symmetry of linear-phase FIR filter. The transpose structure is shown in Fig. . theoretically never approaches zero. (1. N + 1. The parameter N is the filter order and total number of coefficients.2 x(n) z −1 h(1) z −1 h(2) z −1 h(3) Chapter 1. The time-domain input-output relation of the above FIR filter is given by N y(n) = k=0 h(k)x(n − k). . The FIR filters can be designed to provide exact linearphase over the whole frequency range and are always bounded-input boundedoutput (BIBO) stable.2: Transposed direct-form realization of an N -th order FIR filter. 1. (1. it is an infinite-length impulse response (IIR) filter. The total number of coefficient multiplications will be N/2 + 1 for even N and (N + 1)/2 for odd N. respectively.1: Direct-form realization of an N -th order FIR filter.2) where y(n) and x(n) are the output and input sequences. defined as [1. 1. .1.e. i. independent of the filter coefficients [1–3]. x(n) h(N ) h(N − 1) z −1 h(2) z h(1) −1 h(0) z −1 y(n) Figure 1. and h(0). 1. h(1).1 is the block diagram description of the difference equation (1. This type . Introduction z −1 h(N − 1) h(N ) h(0) y(n) Figure 1. .2)..2 IIR Filters If the impulse response has an infinite duration. is the filter length. FIR filter of order N is characterized by a transfer function H(z). The direct form structure in Fig. also called filter coefficients.

The cascade connection of the non-recursive and recursive sections results in a structure called direct-form I as shown in Fig. possible stability issues. an IIR filter can attain the same magnitude specification requirements with a transfer function of significantly lower order [1].2. a ratio of two integers. Sampling Rate Conversion x(n) z −1 z −1 b0 y(n) −a1 −a2 −aN −1 −aN z −1 z −1 z −1 3 b1 b2 bN −1 z −1 bN Figure 1. 1.3. and sensitivity to quantization errors [1. Mathematically. Compared with an FIR filter. the SRC factor can be defined as Fout R= .2 Sampling Rate Conversion Sampling rate conversion is the process of converting a signal from one sampling rate to another.3: Direct-form I IIR realization.3) The first sum in (1.4) Fin . SRC is utilized in many DSP applications where two signals or systems having different sampling rates need to be interconnected to exchange digital signal data. 4]. while changing the information carried by the signal as little as possible [5–8]. 2] N N y(n) = k=0 bk x(n − k) − k=1 ak y(n − k). These two parts can be implemented separately and connected together. The drawbacks are nonlinear phase characteristics.1.3) is non-recursive and the second sum is recursive. (1. or an irrational number. 1. of filter is recursive and represented by a linear constant-coefficient difference equation as [1. The SRC factor can in general be an integer. (1.

5) as +∞ Y (ejωT ) = −∞ x(mM )e−jωT m = 1 M M−1 X(ej(ωT −2πk)/M ). respectively. consisting of an anti-aliasing filtering followed by an operation known as downsampling [9].6) The above equation shows the implication of the downsampling operation on the spectrum of the original signal. x(n). For R > 1. SRC is required when there is need of x(n) = xa (n/Fout ) and the continuous-time signal xa (t) is not available anymore. A sequence can be downsampled with a factor of M by retaining every M -th sample and discarding all of the remaining samples.1 Decimation Decimation by a factor of M . The downsampling operation is linear but time-varying operation. k=0 (1. 1. Applying the downsampling operation to a discrete-time signal. The frequency domain representation of downsampling can be found by taking the z-transform to both sides of (1. the sampling rate is increased and this process is known as interpolation.2. For R < 1. the sampling rate is reduced and this process is known as decimation. can be performed as a two-step process. The sampling frequencies are chosen in such a way that each of them exceeds at least two times the highest frequency in the spectrum of original continuous-time signal. A signal downsampled by two different factors may have two different shape output signals but both carry the same information if the downsampling factor satisfies the sampling theorem criteria. (1. A delay in the original input signal by some samples does not result in the same delay of the downsampled signal. One alternative is to first reconstruct the corresponding analog signal and. However.4a The sampling rate of new discrete-time signal is M times smaller than the sampling rate of original signal. then. produces a downsampled signal y(m) as y(m) = x(mM ). The output spectrum is a sum of M uniformly shifted and stretched versions of X(ejωT ) and also scaled by a factor of .5) The time index m in the above equation is related to the old time index n by a factor of M . and the processor used to process that data can only accept data at a different sampling rate. For example. an analogto-digital (A/D) conversion system is supplying a signal data at some sampling rate. and the discrete-time samples are x(n) = xa (n/Fin ). it is more efficient to perform SRC directly in the digital domain due to the availability of accurate all-digital sampling rate conversion schemes.4 Chapter 1. When a continuous-time signal xa (t) is sampled at a rate Fin . Introduction where Fin and Fout are the original input sampling rate and the new sampling rate after the conversion. The block diagram showing the downsampling operation is shown in Fig. SRC is available in two flavors. re-sample it with the desired sampling rate. 1. where M is a positive integer.

(b) M -fold decimator. otherwise.4b.5: Spectra of the intermediate and decimated sequence. . 1. The upsampling by a factor of L is implemented by inserting L − 1 zeros between two consecutive samples [9]. 1. Sampling Rate Conversion 5 x(n) M y(m) x(n) H(z) x1(m) M y(m) (a) M -fold downsampler.2 Interpolation Interpolation by a factor of L. where L is a positive integer.7) A block diagram shown in Fig. 1. The spectra of the intermediate sequence and output sequence obtained after downsampling are shown in Fig. An upsampling operation to a discrete-time signal x(n) produces an upsampled signal y(m) according to y(m) = x(m/L). m = 0. . Figure 1. The block diagram of a decimator is shown in Fig. X1 (ejωT ) Hideal (ejωT1 ) π/M Y (ejωT ) π 2π ωT1 π 2π 2Mπ ωT Figure 1. 1. (1. 1/M .2. Decimation requires that aliasing should be avoided. can be realized as a two-step process of upsampling followed by an anti-imaging filtering. .1. The ideal filter. The signals which are bandlimited to π/M can be downsampled without distortion. Therefore. 1.2.4: M -fold downsampler and decimation. ±L.6a is used to represent the upsampling op- . . as shown by dotted line in Fig. The performance of a decimator is determined by the filter H(z) which is there to suppress the aliasing effect to an acceptable level.5. ±2L. the first step is to bandlimit the signal to π/M and then downsampling by a factor M . should be a lowpass filter with the stopband edge at ωs T1 = π/M . 0.5.

upsampling by a factor of L is performed. X(ejωT ) π 2π Hideal (ejωT1 ) 2Lπ ωT X1 (ejωT ) π/L 2π/L Y (ejωT1 ) 2π ωT1 π/L π 2π ωT1 Figure 1.7: Spectra of the original. and in the second step. The frequency domain representation of upsampling can be found by taking the z-transform of both sides of (1. 1. The upsampling operation is a linear but time-varying operation. (1. intermediate.6: L-fold upsampler and interpolator. Figure 1.6 Chapter 1. A delay in the original input signal by some samples does not result in the same delay of the upsampled signal. The performance of an interpolator is determined .6b. The upsampling operation increases the sampling rate of the original signal by L times. unwanted images are removed using anti-imaging filter. Interpolation requires the removal of the images. eration. (b) L-fold interpolator. and output sequences.7) as +∞ Y (ejωT ) = −∞ y(m)e−jωT m = X(ejωT L ). Therefore in first step. The block diagram of an interpolator is shown in Fig. Introduction x(n) L y(m) x(n) L x1 (m) H(z) y(m) (a) L-fold upsampler.8) The above equation shows that the upsampling operation leads to L − 1 images of the spectrum of the original signal in the baseband.

. Figure 1.2. y(m). seen in Figs. help to move the downsampler and upsampler operations to a more desirable position to enable an efficient implementation structure.8: Noble identities for decimation. called noble identities. since filtering has to be performed at the higher sampling rate. In SRC. but also the repeated images of the baseband. the desired sequence. 1. moving converters leads to evaluation of additions and multiplications at lower sampling rate. the arithmetic operations of additions and multiplications are to be evaluated at the lowest possible sampling rate.9c and 1. the spectrum of the sequence. 1. by the filter H(z). x1 (m). the computational efficiency may be improved if downsampling (upsampling) operations are introduced into the filter structures. show that a delay of M (L) sampling periods at the higher sampling rate corresponds to a delay of one sampling period at the lower rate.9b and 1. As a result.7. 1. The ideal filter should be a lowpass filter with the stopband edge at ωs T1 = π/L as shown by dotted line in Fig.3 Noble Identities The six identities. 1. 7 x1(m) M c1 y(n) M c2 x(m) z −M M y(n) x(m) M z −1 y(n) (b) Noble identity 3. The fifth and sixth identities. seen in Figs. which is there to remove the unwanted images. can be obtained from x1 (m) by removing these unwanted images.1. This is performed by the interpolation (anti-imaging) filter.2. not only contains the baseband of the original signal.9a and 1. x(m) H(z M ) M y(n) x(m) M H(z) y(n) (c) Noble identity 5. Sampling Rate Conversion x1 (m) c1 M x2 (m) c2 y(n) x2(m) (a) Noble identity 1. 1. As shown in Fig.7. seen in Figs. 1. are generalized versions of the third and fourth identities.8b.8a.8c. The third and fourth identities. Apparently. In the first and second identities.

operate at M times lower sampling rate.9: Noble identities for interpolation. 1. . zeros can be discharged resulting in downsampled-by-M polyphase components. The subsequences hm (n) and the corresponding z-transforms defined in (1.10) The above representation is equivalent to dividing the impulse response h(n) into M non-overlapping groups of samples hm (n). x(n) L H(z L) y(m) x(n) H(z) L y(m) (c) Noble identity 6. H(z) can be decomposed as M−1 +∞ M−1 H(z) = m=0 z −m n=−∞ h(nM + m)z −nM = m=0 z −m Hm (z M ). The polyphase components. 10]. as a result. obtained from h(n) by M fold decimation starting from sample m. Since each polyphase component contains M − 1 zeros between two consecutive samples and only nonzero samples are needed for further processing.9) For an integer M .10) are called the Type-1 polyphase components of H(z) with respect to M [10]. (1.3 Polyphase Representation A very useful tool in multirate signal processing is the so-called polyphase representation of signals and systems [5. an LTI system is considered with a transfer function +∞ H(z) = n=−∞ h(n)z −n . To formally define it. Introduction y1(m) x(n) c2 L y2(m) c1 L y1(m) x(n) L z −L y(m) x(n) z −1 L y(m) (b) Noble identity 4. It facilitates considerable simplifications of theoretical results as well as efficient implementation of multirate systems. Chapter 1. The polyphase decomposition is widely used and its combination with the noble identities leads to efficient multirate implementation structures.8 c1 x(n) L c2 y2(m) (a) Noble identity 2. (1. Figure 1.

Figure 1. but it is possible to do so [10]. 1.10). 1. The filter H(z) in Fig. However. Polyphase Representation x(n) z −1 H1(z M ) z −1 H2(z M ) z −1 HM −1(z M ) (a) Polyphase decomposition of a decimation filter. For FIR filters.10a. is shown in Fig.11a. 1. the polyphase decomposition is not so simple.10b. (1. 1.4b is represented by its polyphase representation form as shown in Fig. 1. Similarly. An IIR filter has a transfer function that is a ratio of two polynomials. The representation of the transfer function into the form. An efficient implementation of decimators and interpolators results if the filter transfer function is represented in polyphase decomposed form [10]. for IIR filters. where M and L .10: Polyphase implementation of a decimator.3. the interpolation filter H(z) in Fig. Its equivalent structure. but more efficient in hardware implementation.11b. the polyphase decomposition into low-order sub-filters is very easy. 1.6b is represented by its polyphase representation form as shown in Fig. A more efficient polyphase implementation and its equivalent commutative structure is shown in Fig.1. 9 M y(m) H0(z M ) x(n) z −1 M H0(z) y(m) H0(z) M H1(z) x(m) H1(z) y(m) z −1 M HM −1(z) HM −1 (z) (b) Moving downsampler to before sub-filters and replacing input structure by a commutator. needs some modifications in the original transfer function in such a way that the denominator is only function of powers of z M or z L .

The polyphase sub-filters in the second approach has distinct allpass sub-filters [12–15]. for example. Introduction y(m) z −1 H1(z L) z −1 H2(z L) z −1 HL−1(z L) (a) Polyphase decomposition of an interpolation filter. and arbitrary sampling rate conver- . mitigation of symbol synchronization errors in digital communications [16–19].10 x(n) L H0(z L) Chapter 1.11: Polyphase implementation of an interpolator. the original IIR transfer function is re-arranged and transformed into (1.10). x(n) H0(z) L z −1 y(m) H0(z) H1(z) L x(n) H1(z) y(m) z −1 HL−1 (z) L HL−1 (z) (b) Moving upsampler to after sub-filters and replacing output structure by a commutator. Several approaches are available in the literature for the polyphase decomposition of the IIR filters. In this thesis. only FIR filters and their polyphase implementation forms are considered for the sampling rate conversion. echo cancellation [23]. In the first approach [11]. 1. time-delay estimation [20–22]. Figure 1.4 Fractional-Delay Filters Fractional-delay (FD) filters find applications in. are the polyphase decomposition factors.

5 d=0.13) can be considered as all-pass and having a linearphase characteristics. Equation (1.3 d=0.13) The ideal FD filter in (1.9}. In the frequency domain. 0. (1. (1. The delay parameter D can be expressed as D = Dint + d.4. Ideally.5 0 0 d=0.14) . (1. The integer part of the delay can then be implemented as a chain of Dint unit delays. The FD d however needs approximation.1.11) need to be approximated. .4 d=0. sion [24–26]. For non-integer values of D. FD filters are used to provide a fractional delay adjustable to any desired value.2.1 0. Fractional-Delay Filters 11 3.8 1 ωT/π Figure 1.4 0.1. an ideal FD filter can be expressed as Hdes (ejω ) = e−j(Dint +d)ωT .5 Phase delay 2 1. .12: Phase-delay characteristics of FD filters designed for delay parameter d = {0.5 1 0.12) where Dint is the integer part of D and d is the FD.2 0. the output y(n) of an FD filter for an input x(n) is given by y(n) = x(n − D).6 0.2 d=0.9 d=0. .7 d=0. The magnitude and phase responses are |Hdes (ejωT )| = 1 and φ(ωT ) = −(Dint + d)ωT. (1.11) is valid for integer values of D only. (1.5 3 2.6 d=0.8 d=0. .11) where D is a delay. 0.15) (1.

4 0.13: Magnitude of FD filters designed for delay parameter d = {0. a wide range of FD filters have been proposed based on FIR and IIR filters [27–34].2 and d = 0.4 d=0. In the application of FD filters for SRC. For the approximation of the ideal filter response in (1. and it is demonstrated for d = 0. d will take on all values in a sequential manner between {(L − 1)/L. can be expressed as y(nT ) = xa (nT − Dint T − dT ). the fractional-delay d is changed at every instant an output sample occurs. Equivalently.14 . The input and output rates will now be different. For each input sample. 0}. with a step size of 1/L. 0. for each input sample there are now two output samples.6 0.8 Magnitude d=0 d=0. . 1. (1.5 0.8.8 1 Figure 1.2 0. 0. The fractional-delay d assumes the value 0 and 0. 0.9}.6.2. 0. For a sampling rate increase by a factor of L. it can be interpreted as delaying the underlying continuous-time signal by L different values of fractional-delays.16) .6 d=0.9.11). 1. delayed by a fractional-delay d. .1 d=0. The underlying continuous-time signal xa (nT ).4 ωT/π 0. Introduction 1 0.13. and the two corresponding output samples are computed. . 0. .4 in Fig. If the sampling rate is required to be increased by a factor of two. 0.7. 35] are shown in Figs. L output samples are generated.2 0.3 0. The phase delay and magnitude characteristics of the FD filter based on first order Lagrange interpolation [32.2 0 0 d=0.12 Chapter 1.12 and 1.5 for each input sample.1.

15a. seen in Fig. 37] Dynamic or switching power consumption Short circuit power consumption Leakage power consumption The above sources are summarized in an equation as Pavg = Pdynamic + Pshort-circuit + Pleakage 2 = α0→1 CL VDD fclk + Isc VDD + Ileakage VDD .17) The switching or dynamic power consumption is related to charging and discharging of a load capacitance CL through the PMOS and NMOS transistors during low-to-high and high-to low transitions at the output. half of which is dissipated in the form of heat through the PMOS transistors while the other half is stored in the load capacitor.14: The underlying continuous-time signal delayed by d = 0. The power dissipation is commonly divided into three different sources: [36. During the pull-down.5 1 d=0 d=0. 1.5 −1 −2 0 2 4 6 Sample index n 8 10 Figure 1.15b. respectively.1.4 Amplitude 0.5 Power and Energy Consumption Power is dissipated in the form of heat in digital CMOS circuits. If all these transitions occur at the clock rate of fclk then the switching power consumption .2 d=0. 1. seen in 2 Fig. 1. The total energy drawn from the power supply for low-to-high transition.4 with FD filtering.5 0 −0.5. the energy stored on 2 CL which is CL VDD /2 is dissipated as heat by the NMOS transistors. high to low transition. is CL VDD . Power and Energy Consumption 13 1. (1.2 and d = 0.

However the switching of the data is not always at the clock rate but rather at some reduced rate which is best defined by another parameter α0→1 . resulting in a direct path from supply to ground. defined as the average number of times in each cycle that a node makes a transition from low to high. The leakage current. are defined by the layout and specification of the circuit. The dashed arrow is for the leakage current.15a and 1. consists of two major contributions: Isub and Igate . is dissipated in the circuits when they are idle. which flows when both of the PMOS and NMOS transistors are active simultaneously. Figure 1. 2 is given by CL VDD fclk . All the parameters in the dynamic power equation. except α0→1 . on the other hand.15b by a dashed line. as shown in Figs. Introduction VDD Vin CL Vout (a) A rising output transition on the CMOS inverter.15: A rising and falling output transition on the CMOS inverter. when both NMOS . The second power term is due to the direct-path short circuit current. Ileakage . Isc . 1. Leakage power.14 Chapter 1. The solid arrows represent the charging and discharging of the load capacitance CL . VDD Vin CL Vout (b) A falling output transition on the inverter. The term Isub is the sub-threshold current caused by low threshold voltage.

Today. Igate . the leakage component of power consumption has started to become dominant [38–40]. is the gate current caused by reduced thickness of the gate oxide in deep sub-micron process . The relative contribution of these two forms of power consumptions has greatly evolved over the period of time. when technology scaling motivates the reduced power supply and threshold voltage. the two foremost forms of power consumptions are dynamic and leakage. In modern/concurrent CMOS technologies. The contribution of the reverse leakage currents.1. due to of reverse bias between diffusion regions and wells. In today’s processes. Power and Energy Consumption 15 and PMOS transistors are off. sub-threshold leakage is the main contributor to the leakage current. is small compared to sub-threshold and gate leakage currents. The other quantity. .5.

.

Chapter 2 Finite Word Length Effects

Digital filters are implemented in hardware with finite-precision numbers and arithmetic. As a result, the digital filter coefficients and internal signals are represented in discrete form. This generally leads to two different types of finite word length effects. First, there are the errors in the representing of coefficients. The coefficients representation in finite precision (quantization) has the effect of a slight change in the location of the filter poles and zeros. As a result, the filter frequency response differs from the response with infinite-precision coefficients. However, this error type is deterministic and is called coefficient quantization error. Second, there are the errors due to multiplication round-off, that results from the rounding or truncation of multiplication products within the filter. The error at the filter output that results from these roundings or truncations is called round-off noise. This chapter outlines the finite word length effects in digital filters. It first discusses binary number representation forms. Different types of fixed-point quantizations are then introduced along with their characteristics. The overflow characteristics in digital filters are briefly reviewed with respect to addition and multiplication operations. Scaling operation is then discussed which is used to prevent overflows in digital filter structures. The computation of roundoff noise at the digital filter output is then outlined. The description of the constant coefficient multiplication is then given. Finally, different approaches for the optimization of word length are reviewed.

2.1 Numbers Representation
In digital circuits, a number representation with a radix of two, i.e., binary representation, is most commonly used. Therefore, a number is represented by 17

18

Chapter 2. Finite Word Length Effects

a sequence of binary digits, bits, which are either 0 or 1. A w-bit unsigned binary number can be represented as X = x0 x1 x2 ...xw−2 xw−1 , with a value of
w−1

(2.1)

X=
i=0

xi 2w−i−1 ,

(2.2)

where x0 is the most significant bit (MSB) and xw−1 is the least significant bit (LSB) of the binary number. A fixed-point number consists of an integral part and a fractional part, with the two parts separated by a binary point in radix of two. The position of the binary point is almost always implied and thus the point is not explicitly shown. If a fixed-point number has wI integer bits and wF fractional bits, it can be expressed as X = xwI −1 . . . x1 x0 .x−1 x−2 . . . x−wF . (2.3) The value can be obtained as
wI −1 −1

X=
i=0

xi 2i +
i=−wF

xi 2i .

(2.4)

2.1.1 Two’s Complement Numbers
For a suitable representation of numbers and an efficient implementation of arithmetic operation, fixed-point arithmetics with a word length of w bits is considered. Because of its special properties, the two’s complement representation is considered, which is the most common type of arithmetic used in digital signal processing. The numbers are usually normalized to [−1, 1), however, to accommodate the integer bits, the range [−2wI , −2wI ), where wI ∈ N, is assumed. The quantity wI denotes the number of integer bits. The MSB, the left-most bit in w, is used as the sign bit. The sign bit is treated in the same manner as the other bits. The fraction part is represented with wF = w − 1 − wI bits. The quantization step is as a result ∆ = 2−wF . If X2C is a w-bit number in two’s complement form, then by using all definitions considered above, X can be represented as
w−1

X2C = 2wI

−x0 20 +
wI

xi 2−i
i=1

, xi ∈ {0, 1}, i = 0, 1, 2, . . . , w − 1,
w−1

= −x0 2wI +
sign bit

xi 2wI −i +
i=1 integer i=wI +1

xi 2wI −i ,
fraction

2.2. Fixed-Point Quantization or in compact form as X2C = [ x0
sign bit

19

| x1 x2 . . . xwI | xwI +1 . . . xw−1 ]2 .
integer fraction

(2.5)

In two’s complement, the range of representable numbers is asymmetric. The largest number is Xmax = 2wI − 2−wF = [0|1 . . . 1|1 . . . 1|]2 , and the smallest number is Xmin = −2wI = [1|0 . . . 0|0 . . . 0|]2 . (2.7) (2.6)

2.1.2 Canonic Signed-Digit Representation
Signed-digit (SD) numbers differ from the binary representation, since the digits are allowed to take negative values, i.e., xi ∈ {−1, 0, 1}. The symbol 1 is also used to represent −1. It is a redundant number system, as different SD representations are possible of the same integer value. The canonic signed-digit (CSD) representation is a special case of signed-digit representation in that each number has a unique representation. The other feature of CSD representation is that a CSD binary number has the fewest number of non-zero digits with no consecutive bits being non-zero [41]. A number can be represented in CSD form as
w−1

X=
i=0

xi 2i ,

where, xi ∈ {−1, 0, +1} and xi xi+1 = 0, i = 0, 1, . . . , w − 2.

2.2 Fixed-Point Quantization
Three types of fixed-point quantization are normally considered, rounding, truncation, and magnitude truncation [1, 42, 43]. The quantization operator is denoted by Q(.). For a number X, the rounded value is denoted by Qr (X), the truncated value by Qt (X), and the magnitude truncated value Qmt (X). If the quantized value has wF fractional bits, the quantization step size, i.e., the difference between the adjacent quantized levels, is ∆ = 2−wF (2.8)

The rounding operation selects the quantized level that is nearest to the unquantized value. As a result, the rounding error is at most ∆/2 in magnitude as shown in Fig. 2.1a. If the rounding error, r , is defined as
r

= Qr (X) − X,

(2.9)

20

Chapter 2. Finite Word Length Effects

∆/2 ∆/2

∆ ∆

∆ ∆

(a) Rounding error.

(b) Truncation error.

(c) Magnitude truncation error.

Figure 2.1: Quantization error characteristics. then

∆ ∆ ≤ r≤ . (2.10) 2 2 Truncation simply discards the LSB bits, giving a quantized value that is always less than or equal to the exact value. The error characteristics in the case of truncation are shown in Fig. 2.1b. The truncation error is − −∆<
t

≤ 0.

(2.11)

Magnitude truncation chooses the nearest quantized value that has a magnitude less than or equal to the exact value, as shown in Fig. 2.1c, which implies −∆<
mt

< ∆.

(2.12)

The quantization error can often be modeled as a random variable that has a uniform distribution over the appropriate error range. Therefore, the filter calculations involving round-off errors can be assumed error-free calculations that have been corrupted by additive white noise [43]. The mean and variance of the rounding error is
∆/2

1 mr = ∆
−∆/2

rd r

=0

(2.13)

and
2 σr

∆/2

1 = ∆
−∆/2

(

r

− mr )2 d

r

=

∆2 . 12

(2.14)

Similarly, for truncation, the mean and variance of the error are mt = − ∆ 2
2 and σt =

∆2 , 12

(2.15)

2 mmt = 0 and σmt = 21 ∆2 .2. Having two numbers.1 Two’s Complement Addition In two’s complement arithmetic. and graphically it is shown in Fig.3. Overflow is similarly treated here as in the case of addition. which corresponds to a repeated addition or subtraction of 2(wI +1) to make the w + 1-bit result to be representable by w-bits. a repeated addition or subtraction of 2(wI +1) . 3 (2. each with a precision wF . the result will be w + 1 bits. the integer bits need to be extended. X < −2wI . (2.2 Two’s Complement Multiplication In the case of multiplication of two fixed-point numbers each having w-bits.e. In two’s complement. This happens for fixed-point arithmetic . To accommodate this extra bit.2: Overflow characteristics for two’s complement arithmetic. Overflow Characteristics and for magnitude truncation.2.3. such overflows can be seen as discarding the extra bit. X. X ≥ 2wI .3. 2. Since numbers outside this range are not representable.g. the product is of precision 2wF .17) X2C (X) =  wI +1 X +2 . the result is 2w-bits. X2C (X) 2w I − ∆ X −2wI Figure 2.3 Overflow Characteristics With finite word length. 2. the result overflows.. The number is reduced to wF .3. This model for overflow is illustrated in Fig. 2. 2wI ). −2wI ≤ X < 2wI .16) 2. 2. when two numbers of the same sign are added to give a value having a magnitude not in the interval [−2wI . it is possible for the arithmetic operations to overflow. when two numbers each having w-bits are added together. The overflow characteristics of two’s complement arithmetic can be expressed as   X − 2wI +1 .

using L2 -norm scaling of a node inside the filter. However. 2. Finite Word Length Effects X1 y X2 d · 2(wI +1) y Figure 2. otherwise. 2wI ). Also the signal levels should not be too low. The integer d ∈ Z has to assign a value such that y ∈ [−2wI . precision again by using rounding or truncation. 2.22 X1 y X2 X2 X1 Chapter 2. as repeated additions with an overflow can be acceptable if the final sum lies within the proper signal range [4]. X1 y X2 X2 X1 Q y X2 X1 d · 2(wI +1) y Figure 2.4. . there exist several scaling norms that compromise between the probability of overflows and the round-off noise level at the output. In this thesis. The integer d ∈ Z has to assign a value such that y ∈ [−2wI . In the literature.4 Scaling To prevent overflow in fixed-point filter realizations. only the commonly employed L2 -norm is considered which for a Fourier transform H(ejωT ) is defined as 1 2π π H(e jωT ) 2 = −π |H(ejωT )| d(ωT ). However. the signal-to-noise (SNR) ratio will suffer as the noise level is fixed for fixed-point arithmetic. or at the output. the inputs to non-integer multipliers must not overflow. 2wI ). the signal levels inside the filter can be reduced by inserting scaling multipliers.4: Multiplication in two’s complement. The model to handle overflow in multiplication is shown in Fig. implies the same probability of overflow at that node. if the input to a filter is Gaussian white noise with a certain probability of overflow. The use of two’s complement arithmetic eases the scaling. 2 In particular. the scaling multipliers should not distort the transfer function of the filter.3: Addition in two’s complement.

white. The word length optimization approach trades precisions for VLSI measures such as area. and speed. In case there is more than one source of roundoff error in the filter. Quantization noise is assumed to be stationary.2. The possible ways could be closed-form expressions or the availability of precomputed values in the form of a table of these measures as a function of word lengths. the quantization process introduces round-off errors. These closed-form expressions and precomputed values are then used by the word length optimization algorithm at the word length assignment phase to have an estimate. output.18) and 2 2 σy = σx ∞ g 2 (n).5. The cost of an implementation is generally required to be minimized. while still satisfying the system specification in terms of implementation accuracy. The round-off noise variance at the output is the sum of contributions from each quantization error source. before doing an actual VLSI implementation.6 Word Length Optimization As stated earlier in the chapter. the mean and variance of the output noise is ∞ my = mx n=−∞ g(n) (2. while in-sufficient bit-width allocation will result in overflows and violate precision requirements. These are the measures or costs by which the performance of a design is evaluated. n=−∞ (2. the assumption is made that these errors are uncorrelated. For a linear system with impulse response g(n). This assumption is valid if the filter input changes from sample to sample in a sufficiently random-like manner [43]. the hardware implementation of an algorithm will be efficient typically involving a variety of finite precision representation of different sizes for the internal variables. 2. Round-Off Noise 23 2. The first difficulty in the word length optimization problem is defining of the relationship of word length to considered VLSI measures. Excessive bit-width allocation will result in wasting valuable hardware resources. excited by white noise with 2 mean mx and variance σx . After word length optimization.5 Round-Off Noise A few assumptions need to be made before computing the round-off noise at the digital filter output.19) where g(n) is the impulse response from the point where a round-off takes place to the filter output. which in turn measures the accuracy of an implementation. and internal variables. . power. and uncorrelated with the filter input.

is taken to be the round-off noise value at the output.20).g. and power consumption are increasing functions of word length. i 1 ≤ i ≤ n. For simplicity. In [49]. the word length allocation problem is solved using a mixed-integer linear programming formulation. (2. A shift operation in this context is used to implement a multiplication by a . n . subtractions. The performance function.22) (2. (2. This assumption allows to use superposition of independent noise sources to compute the overall round-off noise at the output. have constrained the cost. commonly used in DSP algorithms such as digital filters [53]. If the quantization word length wi is assumed for error source at node i then 2 σei = 2−2wi . speed. 2. 12 1 ≤ i ≤ n.24 Chapter 2. e.20) where ei is the quantization error source at node i and hi (k) is the impulse response from node i to the output.7 Constant Multiplication A multiplication with a constant coefficient. while optimizing the other metric(s). . 2 s. i = 1. while VLSI measures such as area.t. The noise variance at the output is then written as ∞ 2 σo = k=0 2 σei h2 (k).21) The formulation of an optimization problem is done by constraining the cost or accuracy of an implementation while optimizing other metric(s). an LTI system with n quantization error sources is assumed. and shifts only [54]. f (wi ). The problem of word length optimization has received considerable research attention. To derive a round-off noise model.. can be made multiplierless by using additions.23) where σspec is the required noise specification at the output. one possible formulation of word length optimization problem is minimize area : f (wi ). Finite Word Length Effects The round-off noise at the output is considered as the measure of performance function because it is the primary concern of many algorithm designers. 2. due to the limiting of internal word lengths. different search-based strategies are used to find suitable word length combinations. . the cost function is assumed to be the area of the design. In [44–48]. on other hand. . given in (2. Some other approaches. 1≤i≤n (2. Its value is measured appropriately to the considered technology. [50–52]. The round-off error is a decreasing function of word length. The complexity for adders and subtracters is roughly the same so no differentiation between the two is normally considered. As a result. and it is assumed to be the function of quantization word lengths. . σo ≤ σspec .

difference methods [66–70]. the coefficient 45.5: Different realizations of multiplication with the coefficient 45. the hardware requirements depend on the coefficient value. 71–73]. Most of the work in the literature has focused on minimizing the adder cost [55–57]. The symbol i are used to represent i left shifts. For bit parallel arithmetic. they can be implemented more efficiently by using structures that remove any redundant partial results among the coefficients and thereby reduce the overall number of operations. e. and graph based methods [60. sub-expression sharing [61–65].2. A simple way to implement multiplier blocks is to realize each multiplier separately. as in the case of transposed direct-form FIR filters shown in 1. the shifts can be realized without any hardware using hardwiring. the number of ones in the binary representation of the coefficient value. for example. The constant coefficient multiplication can be implemented by the method that is based on the CSD representation of the constant coefficient [58].5. The MCM algorithms can be divided into three groups based on the approach used in the algorithms. varying with respect to number of additions and shifts requirement [41]. In constant coefficient multiplication. or more efficiently by using other structures as well that require fewer number of operations [59]. having the CSD representation 1010101. This corresponds to a matrix-vector . one signal is required to be multiplied by several constant coefficients.2. The MCM concepts can be further generalized to computations involving multiple inputs and multiple outputs. factor of two. 2. Realizing the set of products of a single multiplicand is known as the multiplier block problem [60] or the multiple constant multiplications (MCM) problem [61]. However. Consider.. In some applications.g. The multiplication with this constant can be realized by three different structures as shown in Fig. Constant Multiplication 25 x − − y 2 x 2 2 2 x − 3 y 2 4 − y Figure 2.7.

Matrix-vector MCM algorithms include [76–80]. and state space digital filters [4. 41]. . but also FIR filter banks [74].26 Chapter 2. Finite Word Length Effects multiplication with a matrix with constant coefficients. This is the case for linear transforms such as the discrete cosine transform (DCT) or the discrete Fourier transform (DFT). polyphase decomposed FIR filters [75].

Chapter 3 Integer Sampling Rate Conversion The role of filters in sampling rate conversion has been described in Chapter 1. the non-recursive implementation of the CIC filter and its multistage and polyphase implementation structures are presented. the overall computational workload in the sampling rate conversion system can potentially be reduced by the conversion factor M (L). The structures of the CIC-based decimators and interpolators are then shown. 3. Finally. The overall two-stage implementation of a CIC and an FIR filter that is used for the compensation is described. The chapter begins with the basic decimation and interpolation structures that are based on FIR filters. downsampling or upsampling operations can be moved into the filtering structures. This chapter mainly focuses on the implementation aspects of decimators and interpolators due to the fact that filtering initially has to be performed on the side of higher sampling rate. A part of the chapter is then devoted to the efficient polyphase implementation of FIR decimators and interpolators. The characteristics of these filters then correlate to the overall performance of a decimator or of an interpolator. In decimation. The goal is then to achieve conversion structures allowing the arithmetic operations to be performed at the lower sampling rate. As a result. providing arithmetic operations to be performed at the lower sampling rate. 27 . filtering operations need to be performed at the higher sampling rate. The overall computational workload will as a result be reduced.1 Basic Implementation Filters in SRC have an important role and are used as anti-aliasing filters in decimators or anti-imaging filters in interpolators. The concept of the basic cascade integrator-comb (CIC) filter is introduced and its properties are discussed. As discussed in Chapter 1. However. filtering operation proceeds upsampling. filtering operation precedes downsampling and in interpolation.

The input sample values are then multiplied by the filter coefficients h(n) and combined together to give y(m). the . Similarly. The symmetry of coefficients may be exploited in the case of linear phase FIR filter to reduce the number of multiplications to (N + 1)/2 for odd N and N/2 + 1 for even N .1. The implementation can be modified to a form that is computationally more efficient. the number of the multiplications per input sample in the decimator is equal to the FIR filter length N + 1. 3.2. 3. for a downsampling factor M . Integer Sampling Rate Conversion z −1 h(3) h(N − 1) z −1 h(N ) M h(0) y(m) Figure 3. reduces the multiplications per input sample to (N + 1)/M .1.2: Computational efficient direct-form FIR decimation structure. 3. a direct-form FIR filter is cascaded with the downsampling operation as shown in Fig. The non-recursive nature of FIR filters provides the opportunity to improve the efficiency of FIR decimators and interpolators through polyphase decomposition. In the second modification.2.1 FIR Decimators In the basic implementation of an FIR decimator. as seen in Fig. In Fig.1. FIR filters are considered as anti-aliasing or antiimaging filters. 3. The first modification made by the use of noble identity. the input data is now read simultaneously from the delays at every M :th instant of time.1: Cascade of direct-form FIR filter and downsampler. seen in Fig.28 x(n) z −1 h(1) z −1 h(2) Chapter 3. 3. In the basic approach [81].2. multiplications per input sample is (N + 2)/2M for even N . The FIR filter acts as the anti-aliasing filter. 3. x(n) z −1 M M z −1 M z −1 M M z −1 M h(0) h(1) h(2) h(3) h(N − 1) h(N ) y(m) Figure 3. In the basic implementation of the FIR decimator in Fig. The downsampling operation precedes the multiplications and additions which results in the arithmetic operations to be performed at the lower sampling rate.

multiplications per input sample are (N + 1)/(2M ) for odd N . 3. as shown in Fig. 3. every L:th input sample to the filter is non-zero.4: Computational efficient transposed direct-form FIR interpolation structure. 3. For a factor of M polyphase decomposition. multiplications need to be performed at the sampling rate of the input signal. 3. the FIR filter transfer . In the basic multiplication of the FIR interpolator in Fig. Similar to the case of decimator. Polyphase FIR Filters x(n) L h(N ) h(N − 1) z −1 29 h(2) z h(1) −1 h(0) z −1 y(m) Figure 3. seen in Fig.2 FIR Interpolators The interpolator is a cascade of an upsampler followed by an FIR filter that acts as an anti-imaging filter. the upsampling operation is moved into the filter structure at the desired position such that the multiplication operations are performed at the lower sampling-rate side.3. x(n) h(N ) L h(N − 1) L z −1 h(2) L h(1) L z −1 h(0) L z −1 y(m) Figure 3. 3.3: Cascade of upsampler and transposed direct-form FIR filter. The result is then upsampled by the desired factor L.1. Similar to the decimator case.3.3.2.4. This modified implementation approach reduces the multiplications count per output sample from N + 1 to (N + 1)/L. the transfer function of an FIR filter can be decomposed into parallel structures based on the principle of polyphase decomposition. Since there is no need to multiply the filter coefficients by the zero-valued samples.2 Polyphase FIR Filters As described earlier in Sec.1. the coefficient symmetry of the linear phase FIR filter can be exploited to reduce the number of multiplication further by two. 3.

5. The input sequences to the polyphase sub-filters are combination of delayed and downsampled values of the input which can be directly selected from the input with the use of a commutator.10a. function is decomposed into M low-order transfer functions called the polyphase components [5.1). Integer Sampling Rate Conversion x(n) y(m) T T T Figure 3. the arithmetic operations inside the sub-filters now operate at the lower sampling rate. If the impulse response h(n) is assumed zero outside 0 ≤ n ≤ N .10) becomes (N +1)/M Hm (z) = n=0 h(nM + m)z −n . 1. 10]. The overall computational complexity of the decimator is reduced by a factor of M . The resulting polyphase architecture for a factor-of-M decimation is shown in Fig.1) 3.1 Polyphase FIR Decimators The basic structure of the polyphase decomposed FIR filter is the same as that in Fig. however the sub-filters Hm (z) are now specifically defined in (3.10. The individual contributions from these polyphase components when added together has the same effect as of the original transfer function. (3. The arithmetic operations in the implementation of all FIR sub-filters still operate at the input sampling rate. As a result. then the factors Hm (z) in (1. 1. 3. 0 ≤ m ≤ M − 1.30 Chapter 3.2. The downsampling operation at the output can be moved into the polyphase branches by using the more efficient polyphase decimation implementation structure in Fig. The commutator operates at the .5: Polyphase factor of M FIR decimator structure.

The resulting architecture for factor of L interpolator is shown in Fig. More efficient interpolation structure will result because the arithmetic operations inside the sub-filters now operate at the lower sampling rate. input sampling rate but the sub-filters operate at the output sampling rate. can be used if . . . can be used when the decimation factor M can be factored into the product of integers.1) for the FIR filter case. delays. 3. The overall computational complexity of the interpolator is reduced by a factor of L. seen in Fig.2.3. The basic structure of the polyphase decomposed FIR filter is same as that in the Fig. seen in Fig.11a. Similarly. The upsampling operation at the input is moved into the polyphase branches by using the more efficient polyphase interpolation implementation structure in Fig. .11. Multistage Implementation 31 x(n) y(m) T T T Figure 3. . 1. a multistage interpolator with K stages. by a factor L.7. 1.8. The polyphase sub-filters Hm (z) are defined in (3. 3. The same function can be replaced directly by using a commutative switch. the arithmetic operations in the implementation of all FIR sub-filters operate at the upsampled rate. instead of using a single filter and factor of M downsampler. Mk . M = M1 ×M2 ×. 3.6.3. also called polyphase sub-filters. and adders are used to feed the correct interpolated output sample. 3.2 Polyphase FIR Interpolators The polyphase implementation of an FIR filter.3 Multistage Implementation A multistage decimator with K stages. 3. the combination of upsamplers. is done by decomposing the original transfer function into L polyphase components. On the output side. In the basic structure.6: Polyphase factor of L FIR interpolator structure.

with differential delay R. A single stage implementation with a large decimation (interpolation) factor requires a very narrow passband filter. L1 H1(z) L2 H2(z) LJ HJ (z) Figure 3. HJ (z M1M2.9. Compared to the single stage implementation. It is a cascade of integrator and comb sections.32 H1(z) M1 Chapter 3. . which is hard from the complexity point of view [3. The CIC filter has proven to be an effective element in high-decimation or interpolation systems [84–87]. is comb section and the feedback section is called an integrator. Integer Sampling Rate Conversion H2(z) HK (z) M2 MK Figure 3..9: Single stage equivalent of a multistage decimator. 81. n=0 (3. The multistage implementations are used when there is need of implementing large sampling rate conversion factors [1. Similarly. the interpolation factor L can be factored as L = L1 × L2 × · · · × LK . The transfer function of a single stage CIC filter is given as H(z) = 1 − z −R = 1 − z −1 R−1 z −n . 3. 3. it relaxes the specifications of individual filters.8: Multistage implementation of an interpolator. 82]. the single stage equivalent of the multistage interpolator is shown in Fig. . 82].MJ−1 ) M1M2 . 3. A single stage CIC filter is shown in Fig. An optimum realization depends on the proper selection of K and J and best ordering of the multistage factors [82]. 3.11.4 Cascaded Integrator Comb Filters An efficient architecture for a high decimation-rate filter is the cascade integrator comb (CIC) filter introduced by Hogenauer [83]. .7: Multistage implementation of a decimator. The simplicity of implementation makes the CIC filters suitable for operating at higher frequencies. The noble identities can be used to transform the multistage decimator implementation into the equivalent single stage structure as shown in Fig.2) H1(z)H2(z M1 )H3(z M1M2 ) . The feed forward section of the CIC filter.. MJ Figure 3.10. . .

HJ (z L1L2. These lowpass filters are well suited to improve the efficiency of anti-aliasing filters prior to decimation or the anti-imaging filters after the interpolation. the comb section following the integrator will compute the correct result at the output. 1.1 CIC Filters in Interpolators and Decimators In the hardware implementations of decimation and interpolation. R. The zeros are the R-th roots of unity and are located at z(k) = ej2πk/R . of the comb section. The integrator section. . the CIC filter is the best suited for the first decimation stage. The comb filter has R zeros that are equally spaced around the unit circle.. cascaded integrator-comb (CIC) filters are frequently used as a computationally efficient narrowband lowpass filters. . The use of two’s complement and non-saturating arithmetic accommodates the issues with overflow. z = ejωT = ej2πf . However. this is of no harm if two’s complement arithmetic is used and the range of the number system is equal to or exceeds the maximum magnitude expected at the output of the composite CIC filter. Cascaded Integrator Comb Filters L 1 L 2 .3) The gain of the single stage CIC filter at ωT = 0 is equal to the differential delay. is given by H(ejωT ) = sin( ωT R ) 2 sin( ωT ) 2 e−jωT (R−1)/2 (3. LJ H1(z)H2(z L1 )H3(z L1L2 ) . The existence of integrator stages will lead to overflows. which is only possible with exact integer arithmetic [88]. whereas in interpolators. The CIC filters are generally more convenient for large conversion factors due to small lowpass bandwidth. CIC filters have to operate at very high sampling rate.LJ−1 ) 33 Figure 3. 2. on the other hand. the comb filter is adopted for the last interpolation stage. In multistage decimators with a large conversion factor. In both of these applications. . 3. has a single pole at z = 1. x(n) z −R − y(m) z −1 Figure 3. With the two’s complement wraparound property.11: Single-stage CIC filter..4. . . R − 1. where k = 0.3. . . CIC filters are based on the fact that perfect pole/zero canceling can be achieved. . The frequency response of CIC filter.10: Single stage equivalent of a multistage interpolator.4. by evaluating H(z) on the unit circle. .

However. The structure also includes the decimation factor M . Similarly. imperfect filtering gives rise to unwanted spectral images. there are only two . Similarly. the integrator and comb are used as the first and the second section of the CIC filter. It reduces the length of the delay line which is basically the differential delay of the comb section. 3. Typically. it is generally preferred to place comb part on the side which has the lower sampling rate. The sampling rate as a result of this operation is Fin L. When the decimation factor M is moved into the comb section using the noble identity and considering R = DM . Both of these factors result in hardware saving and also low power consumption. The cascade connection of integrator and comb in Fig.4) In order to modify the magnitude response of the CIC filter. The important thing to note is that the integrator section operate at the higher sampling rate in both cases. which is now considered as the differential delay of comb part. the CIC filter with the upsampling factor L is used for the interpolation purpose. However. 3. The filtering characteristics for the antialiasing and anti-image attenuation can be improved by increasing the number of stages of CIC filter. the differential delay D is restricted to 1 or 2. Similar advantages can be achieved as in the case of CIC decimation filter. an important aspect with the CIC filters is the spectral aliasing resulting from the downsampling operation. (3. respectively. where Fin is the input sampling rate.11 can be interchanged as both of these are linear time-invariant operations.34 x(n) Chapter 3. In a decimator structure. The spectral bands centered at the integer multiples of 2/D will alias directly into the desired band after the downsampling operation.12. The comb and integrator sections are both linear time-invariant and can follow or precede each other. The delay length is reduced to D = R/L.13. The value of D also decides the number of nulls in the frequency response of the decimation filter. 3. Integer Sampling Rate Conversion M z −1 z −D y(m) − Figure 3. The transfer function of the K-stage CIC filter is K K H(z) = HI (z)HC (z) = 1 − z −D 1 − z −1 K . the interpolation factor L is moved into the comb section using the noble identity and considering R = DL. One advantage is that the storage requirement is reduced and also the comb section now operates at a reduced clock rate. However. The sampling rate at the output as a result of this operation is Fin /M .12: Single stage CIC decimation filter with D = R/M . The resulting interpolation structure is shown in Fig. The delay length is reduced to D = R/M . the resulting architecture is the most common implementation of CIC decimation filters shown in Fig.

4. n=0 (3. A single stage non-recursive CIC decimation filter is shown in Fig. then the transfer function can be expressed as J−1 i=0 H(z) = (1 + z −2 ) i (3.14. is limited to (win + K · j). The . 3. The basic non-recursive form of the transfer function is H(z) = 1 D D−1 z −n . the polyphase decomposition by a factor of two at each stage can be used to reduce the sampling rate by an additional factor of two.2 Polyphase Implementation of Non-Recursive CIC The polyphase decomposition of the non-recursive CIC filter transfer function may achieve lower power consumption than the corresponding recursive implementation [89. In the case of multistage implementation. 3.4. and are usually not enough to provide sufficient aliasing rejection in the entire baseband of the signal. the original converter can be transformed into a cascade of J factorof-two converters. If the overall conversion factor D is power of two. D = 2J .6) As a result. Further. Each has a non-recursive sub-filter (1 + z −1 )K and a factorof-two conversion.13: Single stage CIC interpolation filter with D = R/L. the overall decimator is decomposed into a first-stage non-recursive filter with the decimation factor J1 . The advantage of increased number of stages is improvement in attenuation characteristics but the passband droop normally needs to be compensated. the number of CIC filter stages and the differential delay D.5) The challenge to achieve low power consumption are those stages of the filter which normally operate at higher input sampling rate. In an approach proposed in [89]. Cascaded Integrator Comb Filters x(n) z −D − L y(m) z −1 35 Figure 3. parameters. followed by a cascade of non-recursive (1 + z −1 )K filters with a factor-of-2 decimation. if properly scaled. The sampling rate is successively reduced by two after each decimation stage as shown in Fig. one advantage is that only first decimation stage will operate at high input sampling rate. 3. The register word length at any stage j for an input word length win . The natural nulls of the CIC filter provide maximum alias rejection but the aliasing bandwidths around the nulls are narrow. The second advantage of the non-recursive structures is that there are no more register overflow problems.3. 90].15.

3.4. The purpose of the compensation filter is to compensate for the non-flat passband of the CIC filter. . therefore. some compensation filter are normally used after the CIC filters.3 Compensation Filters The frequency magnitude response envelopes of the CIC filters are like sin(x)/x. Several approaches in the literature [91–95] are available for the CIC filter sharpening to improve the passband and stopband characteristics. A two stage solution of sharpened comb decimation structure is proposed in [93–95.. + z −D+1)K D Figure 3. The role of these filter sections is then defined. 97–100] for the case where sampling rate conversion is expressible as the product of two integers..36 Chapter 3. The transfer function can then be written as the product of two filter sections. (1 + z −1)K 2 (1 + z −1)K 2 (1 + z −1 )K 2 Figure 3. They are based on the method proposed earlier in [96].15: Cascade of J factor-of-two decimation filters. One filter section may be used to provide sharpening while the other can provide stopband attenuation by selecting appropriate number of stages in each filter section. polyphase decomposition along with noble identities is used to implement all these sub-filters.14: Non-recursive implementation of a CIC decimation filter. Integer Sampling Rate Conversion (1 + z −1 + z −2 + .

This chapter gives a concise overview of SRC by a rational factor and implementation details. In addition. 4. followed by downsampling by a factor of M [9. it may be beneficial to change the order of these operators.1 are equivalent if the factors L and M are relative prime. M and L do not share a common divisor greater than one. 4. 4. For some cases. Therefore. i.Chapter 4 Non-Integer Sampling Rate Conversion The need for a non-integer sampling rate conversion may appear when the two systems operating at different sampling rates have to be connected. the sampling rate alteration with an arbitrary conversion factor is described. The chapter first introduces the SRC by a rational factor. are used to change the sampling rate of a discrete-time signal by an integer factor only.. A cascade of these sampling rate alteration devices. Upsampling by a factor of L. The polynomial-based approximation of the impulse response of a resampler model is then presented.e. downsampler and upsampler. 82]. requires the cascade of upsampler and downsampler operators. which is only possible if the L and M factors are relative prime. The technique for constructing efficient SRC by a rational factor based on FIR filters and polyphase decomposition is then presented. the sampling rate change by a rational factor. 37 . results in sampling rate change by a rational factor L/M . The realizations shown in Fig. the implementation of fractional-delay filters based on the Farrow structure is considered. followed by a downsampling by a factor of M .1 Sampling Rate Conversion: Rational Factor The two basic operators.1 Small Rational Factors The classical design of implementing the ratio L/M is upsampling by a factor of L followed by appropriate filtering. Finally.1.

The specification of the filter H(z) needs to be selected such that it could act as both anti-imaging and anti-aliasing filters at the same time.2: Cascade of an interpolator and decimator.2) From the above equation it is clear that the passband will become more narrow and filter requirements will be hard for larger values of L and M . Non-Integer Sampling Rate Conversion x(n) L M y(n) x(n) M L y(n) Figure 4.3 should be ωs T = min π π . . π π L. The sampling rate of the output signal is Fout = L Fin .3) L H(z) M y(n) Figure 4. The interpolated signal is first filtered by an anti-aliasing filter before downsampling by a factor of M . . The original input signal x(n) is first upsampled by a factor of L followed by an antiimaging filter also called interpolation filter. Since the role of the filter H(z) is to act as an anti-imaging as well as an anti-aliasing filter. The ideal specification requirement for the magnitude response is |H(ejωT )| = x(n) L. are next to each other.3. otherwise.2. 4. 4. The implementation of a rational SRC scheme is shown in Fig. interpolation and the decimation. the stopband edge frequency of the ideal filter H(z) in the rational SRC of Fig.38 Chapter 4. Both of these lowpass filters can be combined into a single lowpass filter H(z) as shown in Fig. L M (4. M (4. 4.3: An implementation of SRC by a rational factor L/M .1: Two different realizations of rational factor L/M . M (4.1) Since both filters. x(n) L HI (z) Hd(z) M y(n) Figure 4. |ωT | ≤ min 0. and operate at the same sampling rate.

2 Polyphase Implementation of Rational Sampling Rate Conversion The polyphase representation is used for the efficient implementation of rational sampling rate converters. Sampling Rate Conversion: Rational Factor x(n) H0(z) H1(z) H2(z) L L L z −1 z −2 M M M y(m) 39 HL−1(z) L z −L+1 M Figure 4. the filter transfer function H(ejωT ). In the next step. The transfer function H(z) in Fig. 4.3 is decomposed into L polyphase components using the approach shown in Fig. 4. the interpolation factor L is moved into the subfilters using the computationally efficient polyphase representation form shown in Fig. 1.4) It is apparent that the overall performance of the resampler depends on the filter characteristics. If the sampling conversion .1. 1. Since the downsampling operation is linear. can be expressed as Y (e jωT 1 )= M M−1 X(ej(LωT −2πk)/M )H(ej(ωT −2πk)/M ). it can be moved into each branch of the subfilters. which then enables the arithmetic operations to be performed at the lowest possible sampling rate. as a function of spectrum of the original signal X(ejωT ). 10]. Fig. The reorganization of the individual polyphase branches is targeted next.4 [5. The frequency spectrum of the re-sampled signal.11.4 has already attained the computational saving by a factor of L. k=0 (4. An equivalent delay block for each sub-filter branch is placed in cascade with the sub-filters.4.4: Polyphase realization of SRC by a rational factor L/M .1. and the interpolation and decimation factors L and M .11a. and also by the use of noble identity. Y (ejωT ). 4. The next motivation is to move the downsampling factor to the input side so that all filtering operations could be evaluated at 1/M -th of the input sampling rate. 4. The equivalent polyphase representation of the rational factor of L/M as a result of these modifications is shown in Fig. By using polyphase representation and noble identities.

As a result of these adjustments. Non-Integer Sampling Rate Conversion L L z kl0 Hk (z) z −k z k(l0L−m0M ) L M M L M M z −km0 z −km0 Figure 4.6) The step-by-step reorganization of the k-th branch is shown in Fig.1 kHz. Using (4. Since both upsampler and downsampler are assumed to be relative prime. L and M . The polyphase representation form in Fig.5) where l0 and m0 are some integers. 4.5. factors. The intermediate sampling rate for the filtering operation will be lower.1. filters of a very high order are needed.3 Large Rational Factors The polyphase decomposition approach is convenient in cases where the factors L and M are small. . in the application of sampling rate conversion from 48 kHz to 44. the sub-filter Hk (z) is in cascade with the downsampling operation. 1.5: Reordering the upsampling and downsampling in the polyphase realization of k-th branch. As can be seen in Fig. Otherwise. The delay chain z −k is first replaced by its equivalent form in (4. which makes it a more efficient structure for a rational SRC by a factor L/M .40 Hk (z) Hk (z) Hk (z) z kl0 Chapter 4.5. they can be interchanged. then l0 L − m0 M = −1 (4. 1.6).10a and its equivalent computationally efficient form in Fig.10 can be used for the polyphase representation of the sub-filter Hk (z). The rational sampling rate conversion structures for the case L/M < 1 considered can be used to perform L/M > 1 by considering its dual. are relative prime. all filtering operations now operate at 1/M :th of the input sampling rate. The same procedure is followed for all other branches. the k-th delay can be represented as z −k = z k(l0 L−m0 M) (4. The fixed independent delay chains are also moved to the input and the output sides. The negative delay z kl0 on the input side is adjusted by appropriately delaying the input to make the solution causal. For example. 4. which then enables the interchange of downsampler and upsampler. as highlighted by the dashed block. 4.5).

new filters are needed. 4. −N1 + 1.1 Farrow Structures In conventional SRC implementation. In this case. . x((nb −2)Tx ). to cover the whole sampling interval range. N2 − 1. x((nb + 1)Tx ). 4. if the SRC ratio changes. The interpolation algorithm may in general be defined as a time-varying digital filter with the impulse response h(nb .2.i=k t − tk . uniformly sampled at the interval Tx .2) is π/160. . y(Ty m) = ya (Ty m). . The new sample value y(Ty m) is computed. . k = −N1 . . . Ty m.8) The Lagrange approximation also does the exact reconstruction of the original input samples. Sampling Rate Conversion: Arbitrary Factor 41 the rational factors requirement is L/M = 147/160. . The stopband edge frequency from (4. x((nb −1)Tx ). x(nb Tx ). . x(n). d). then Pk (t) is defined as N2 Pk (t) = i=−N 1. If the interpolation process is based on the Lagrange polynomials. .4. . which imposes severe requirements for the filters. Assuming an input sequence. . A polynomial approximation ya (t) is then defined as N2 ya (t) = k=−N1 Pk (t)x(nb + k) (4. This limits the flexibility in covering different SRC ratios. x((nb +2)Tx ). . . The requirement is to compute new sample value. The parameter nb is some reference or base-point index. . The fractional interval can take on any value in the range |Tx d| ≤ 0. . The continuous-time signal can then be sampled at the desired instant. This will result in a high-order filter with a high complexity. which occurs between the two existing samples x(nb Tx ) and x((nb + 1)Tx ). by utilizing some interpolation algorithm. .2. By utiliz- .7) where Pk (t) are polynomials. a multi-stage realization will be beneficial [41]. and d is called the fractional interval. For a given set of the input samples x(−N1 + nb ). x(nb ). there is a need to compute the value of underlying continuous-time signal xa (t) at an arbitrary time instant between two existing samples. . The computation of new sample values at some arbitrary points can be viewed as interpolation.2 Sampling Rate Conversion: Arbitrary Factor In many applications. The commonly used interpolation techniques are based on polynomial approximations. . using the existing samples. tk − ti (4. N2 .5. y(Ty m) at some time instant. . where Ty m = nb Tx + Tx d. . x(nb + N2 ). The interpolation problem can be considered as resampling. the window or interval chosen is N = N1 + N2 + 1. where the continuoustime function ya (t) is first reconstructed from a finite set of existing samples x(n).

The corresponding filter structure makes use of a number of sub-filters and one adjustable fractional-delay as seen in Fig. The sub-filters can also have even or odd orders Nk .6. |ωT | ≤ ωc T < π. 1. A filter with a transfer function in the form of (4. the filter H0 (z) reduces to a pure delay. The implementation complexity of the Farrow structure is lower compared to alternatives such as online design or storage of a large number of different impulse responses. . . all Hk (z) are general filters whereas for even Nk .42 x(n) HL(z) d Chapter 4. or an integer plus a half. . this can be solved in an elegant way. a whole sampling interval is covered by d. An alternative . Hk (z). with either a symmetric (for k even) or antisymmetric (for k odd) impulse response. 1/2]. With odd Nk .5) when D is an integer (an integer plus a half). ing the Farrow structure. Non-Integer Sampling Rate Conversion H2(z) d H1(z) d H0(z) y(n) Figure 4. d) = k=0 dk Hk (z). for decimation. (4. The transfer function of the Farrow structure is L H(z. of an adjustable FD filter be Hdes (ejωT . Hdes (ejωT . (4. k! (4.11) can approximate the ideal response in (4. The overall impulse response values are expressed as polynomials in the delay parameter.6: Farrow structure. It is assumed that D is either an integer. L. it is better to use the transposed Farrow structure so as to avoid aliasing. Let the desired frequency response.10) as close as desired by choosing L and designing the sub-filters appropriately [30.9) to L + 1 terms.6. The Farrow structure is composed of linear-phase FIR sub-filters. and the fractional-delay equals d (d + 0. . 101]. When Hk (z) are linear-phase FIR filters.9) where D and d are fixed and adjustable real-valued constants. respectively. the Farrow structure is often referred to as the modified Farrow structure. d). 4. k = 0. whereas d takes on values in the interval [−1/2. shown in Fig. d) = e−jωT (D+d) . The Farrow structure is efficient for interpolation whereas. 4. In this way.10) where Hk (z) are fixed FIR sub-filters approximating kth-order differentiators with frequency responses Hk (ejωT ) ≈ e−jωT D (−jωT )k .11) which is obtained by truncating the Taylor series expansion of (4.

d) = k=0 hk (n)dk . Hence. one can obtain new samples between any two consecutive samples of x(n). (4. Tout > Tin . For interpolation. one set of Hk (z) is used and only d is required to be modified. by controlling d for every input sample. . In general. respectively. d) = k=0 n=0 N L hk (n)z −n dk .13) Assuming Tin and Tout to be the sampling period of x(n) and y(n). (4. For decimation. Thus. some signal samples will be removed but some new samples will be produced. (4. d)z −n .2. If a signal needs to be delayed by two different values of d. hk (n)dk z −n . whereas interpolation results in Tout < Tin . Odd Nk . the Farrow structure delays a bandlimited signal by a fixed d. With decimation. the output sample index at the output of the Farrow structure is nout Tout = (nin + d(nin ))Tin . If d is constant for all input samples.14) where nin (nout ) is the input (output) sample index. SRC can be seen as delaying every input sample with a different d. n=0 k=0 N = = n=0 h(n. one can shift the original samples (or delay them in the time domain) to the positions which would belong to the decimated signal.4. Even Nk (nin + 0.12) The quantity N is the order of the overall impulse response and L h(n.5 + d(nin ))Tin . This delay depends on the SRC ratio. in both cases. the Farrow structure performs SRC. Sampling Rate Conversion: Arbitrary Factor representation of the transfer function of Farrow structure is L N 43 H(z.

.

only uni-variate polynomial. Here n is the order of the polynomial. The p(x) takes the form p(x) = a0 + a1 × x + a2 × x × x + · · · + an × x × x · · · × x. Polynomial evaluation also has other applications [102]. is considered with an independent variable x. requires evaluation of a polynomial of degree L with d as an independent variable. In this chapter. At the start of chapter. p(x). p(x). .Chapter 5 Polynomial Evaluation The Farrow structure. the classical Horner scheme is described. 45 an = 0. 4. A uni-variate polynomial is a polynomial that has only one independent variable. which is considered as an optimal way to evaluate a polynomial with minimum number of operations. shown in Fig. Motivated by that. A brief overview of parallel polynomial evaluation schemes is then presented in the central part of the chapter. the computation of the required set of powers is discussed. . Finally. an are the polynomial coefficients.6. 104]. an = 0. 5.1 Polynomial Evaluation Algorithms The most common form of a polynomial. . this chapter reviews polynomial evaluation schemes and efficient evaluation of a required set of powers terms. Two approaches are outlined to exploit potential sharing in the computations. the first solution that comes into mind is the transformation of all powers to multiplications. is represented as p(x) = a0 + a1 x + a2 x2 + · · · + an xn . . In order to evaluate p(x). such as approximation of elementary functions in software or hardware [103. a1 . also called power form. whereas multi-variate polynomials involves multiple independent variables. . and a0 .

the suitable schemes can be shortlisted for an implementation given the specifications.1. Polynomial Evaluation an x an−1 x a1 x a0 p(x) Figure 5. number of operations of different types. and data-coefficient multiplications.1: Horner’s scheme for an n-th order polynomial. (5. 5. varying in degree of parallelism. . pipelining complexity. there is an increase in the number of operations for these parallel algorithms. 5. as given in Table 5. . 5. To compare different polynomial evaluation schemes. it is . hence. and. In this scheme. the polynomial p(x) is represented in the form p(x) = a0 + x a1 + x a2 + · · · + x(an−1 + an x) . This leads to that the number of partial products can be roughly halved. However.2 Horner’s Scheme Horner’s scheme is simply a nested re-write of the polynomial. However. there are more efficient ways to evaluate this polynomial. Horner’s scheme has the minimum arithmetic complexity of any polynomial evaluation algorithm. it is purely sequential. One possible way to start with is the most commonly used scheme in software and hardware named as Horner’s scheme. the difference is in the number of multiplication operations. both of the inputs of the multipliers are same. Earlier works have primarily focused on either finding parallel schemes suitable for software realization [105–111] or how to find suitable polynomials for hardware implementation of function approximation [47. 112–116]. However. On the basis of these parameters. and latency after pipelining may be considered. In squarers. data-data multiplications. For a polynomial order of n. Multiplications can be divided into three categories. Since all schemes have the same number of additions n.46 Chapter 5.3 Parallel Schemes Several algorithms with some degree of parallelism in the evaluation of polynomials have been proposed [105–110].1. . critical path.1) This leads to the structure in Fig. squarers. and therefore becomes unsuitable for high-speed applications. different parameters such as computational complexity. n multiplications and n additions are used.

Other factors that may be helpful in shortlisting any evaluation scheme are uniformity and simplicity of the resulting evaluation architecture and linearity of the scheme as the order grows. on other hand. as explained in Chapter 2. The data-coefficient multiplication count also gives the idea of degree of parallelism of that scheme. Scheme [110]. there are other polynomial evaluation schemes as well like Dorn’s Generalized Horner Scheme [106]. some schemes may be good for one polynomial order but not for the others. has two data-coefficient multiplications. Even Odd (EO) Scheme. Powers Evaluation 47 reasonable to assume that a squarer has roughly half the area complexity as of a general multiplier [117. The schemes which are frequently used in literature are Horner and Estrin [105]. Maruyama’s Scheme [109]. and Li et al. The required set of power terms to be evaluated may contain all powers up to some integer n or it has some sparse set of power terms that need to be evaluated. The scheme which has the lower number of data-data multiplications is cheaper to implement. The difference in all schemes is that how the schemes efficiently split the polynomial into sub-polynomials so that they can be evaluated in parallel. some schemes do not work for all polynomial orders. Horner’s scheme has only one data-coefficient multiplication. All the listed schemes have some good potential and useful properties. no degree of parallelism. The evaluation of such square and cube powers are normally more efficient than conventional multiplications. In data-coefficient multiplications. any degree of parallelism in it or efficient evaluation relaxes the computational burden to satisfy required throughput. The applications where polynomials are required to be evaluated at run-time or have extensive use of it. The EO scheme. not only finds application in polynomial evaluation [102. Munro and Paterson’s Scheme [107. which result in two branches running in parallel. Some schemes split the polynomial in such a way that only i i powers of the form x2 or x3 are required. it is assumed that a constant multiplier has an area which is 1/4 of a general multiplier. the area complexity is reduced but hard to find as it depends on the coefficient. The other factor that is important is the type of powers required after splitting of the polynomial.5. also known as exponentiation. 108]. For simplicity. 121]. and hence. However. has one sequential flow. The disadvantage on the other hand is that the number of operations are increased and additional need of hardware to run operations in parallel. For example. 118]. redundancy in computations at different levels is expected . In the process of evaluating all powers at the same time. However.4 Powers Evaluation Evaluation of a set of powers. 5.4. 119] but also in window-based exponentiation for cryptography [120. two schemes having the same total number of multiplication but vary in types of multiplications are different with respect to implementation cost and other parameters. As a result.

x5 . x20 . while in the second. to compute x23 .48 Chapter 5. x11 .4. a1 . it is exploited at bit level. additions (A). (5. . . {x2 . 50}. {x2 . it does not combine well in eliminating the overlapping terms that appear in the evaluation of individual powers as explained below with an example. when these techniques.3) (5. used for the computation of a single power [102. . x4 . Scheme Direct Horner Estrin Li Dorn EO Maruyama Evaluation a5 x5 + a4 x4 + a3 x3 + a2 x2 + a1 x + a0 ((((a5 x + a4 )x + a3 )x + a2 )x + a1 )x + a0 (a5 x2 + a4 x + a3 )x3 + a2 x2 + a1 x + a0 (a5 x + a4 )x4 + (a3 x + a2 )x2 + (a1 x + a0 ) (a5 x3 + a2 )x2 + (a4 x3 + a1 )x + (a3 x3 + a0 ) ((a5 x2 + a3 )x2 + a1 )x + ((a4 x2 + a2 )x2 + a0 ) (a5 x + a4 )x4 + a3 x3 + (a2 x2 + a1 x + a0 ) A 5 5 5 5 5 5 5 M 2 1 4 3 3 2 4 C 5 4 2 2 3 3 2 S 2 0 1 2 1 1 2 which need to be exploited. . 5. . 2. In the first. x10 . However. x23 }. x4 . . x7 . redundancy is removed at word level. l. x10 . need to be evaluated. x23 }.2) As multiplication is additive in the logarithmic domain. 39. x8 . data-coefficient multiplications (C). . squarers (S). x12 . x5 . a minimum length addition chain will give an optimal solution to compute a single power term. Another efficient way to evaluate a single power is to consider the relation to the addition chain problem. x23 }. Two independent approaches are considered. such that ai = aj + ak . With the binary approach in [102]. x3 . 22 repeated multiplications of x are required. For example. i = 1. An addition chain for an integer n is a list of integers [102] 1 = a0 . however. 122. . T = {22. while the factor method in [102] gives a solution with one less multiplication. {x2 . al = n. x16 . this approach is inefficient and becomes infeasible for large values of i.1: Evaluation schemes for fifth-order polynomial.1 Powers Evaluation with Sharing at Word-Level A direct approach to compute xi is the repeated multiplication of x with itself i − 1 times. k ≤ j < i. 123]. . x3 . Polynomial Evaluation Table 5. the same can be achieved in eight multiplications. A minimum length addition chain for n = 23 gives the optimal solution with only six multiplications. Suppose a sparse set a power terms. a2 . are applied in the evaluation of multiple power terms. datadata multiplications (M).

39. hence. from one up to some integer n.. 3. 14. the solution is {22. an addition sequence computing {3. 24. For example. and. 50}. 27. 9. 6. 2. (5. 10. . there are some overlapping terms. 6. 2. 11. 50} → {1.7) . Powers Evaluation A minimum length addition chain solution for each individual power is 22 → {1. 2. 2. 25. the optimal solution for the evaluation of the requested set of powers. 15.2 Powers Evaluation with Sharing at Bit-Level In this approach..5.x2 x1 x0 . 3. 3. 11}. As can be seen.4) However. 2. The PP matrices for all requested powers in the set are generated independently and sharing at bit level is explored in the PP matrices in order to remove redundant computations. 6. 5. 12}. and it is assumed that each power is computed only once. which can be considered as the generalization of addition chains. . 27. Since every single multiplication will compute some power in the set. 39. 22. If these overlapping terms are removed and the solutions are combined for the evaluation of the required set of powers. 50}. 22. This solution is obtained by addressing addition sequence problem. are required to be computed is easy compared to the sparse case. 12. can be mapped to the same full adder. 3. 24. . any solution obtained will be optimal with respect to the minimum number of multiplications. 12. 7.6) xi 2i . which can be shared. 39. 6. 50}. 50}. 39} 49 50 → {1. The removal of redundancy at word-level does not strictly mean for the removal of overlapping terms only but to compute only those powers which could be used for evaluation of other powers as well in the set. 7. 10.4. An addition sequence for the set of integers T = n1 . 39. 36. the evaluation of power terms is done in parallel using summations trees.5) which clearly shows a significant difference. 8.4. 15. 5. {22. {22. is {22. 3. (5. 11} is {1. 3. 25. 12. 5. 6. A w-bit unsigned binary number can be expressed as following X = xw−1 xw−2 xw−3 . 3. The redundancy here relates to the fact that same three partial products may be present in more than one columns. similar to parallel multipliers. 22} 39 → {1. 2. 50}. 39. Note that the case where all powers of x. (5. 39. . 50} → {1. {2. with a value of X= i=0 w−1 (5. 11. nr is an addition chain that contains each element of T . n2 . 4.

4).2: Equivalent representation of binary variables in PP matrix for w = 3.g. the terms. (5. the expression in (5. is considered to demonstrate the sharing potential among multiple partial products. .. 7. since the partial product sets {9. w) = (5. The PP matrix of the corresponding power term can then be deduced from this function. Binary variable x0 x1 x1 x0 x2 x2 x0 x2 x1 x2 x1 x0 Word (X) x2 x1 x0 0 0 1 0 1 0 0 1 1 1 0 0 1 0 1 1 1 0 1 1 1 Representation 1 2 3 4 5 6 7 In Fig.2.7) is given by w−1 n Xn = i=0 xi 2i . To make the analysis of the PP matrix simpler. i. cube. the above equation can be simplified to a sum of weighted binary variables. Therefore. 5.2. computing the square. e. Table 5. which can further be simplified by using the identities xi xj = xj xi and xi xi = xi . 9} are present in more than one column. In the process of generation of a PP matrix from the weighted function.2.8) For the powers with n ≥ 2. 13} and {5.8) can be evaluated using the multinomial theorem. For any integer power n and a word length of w bits. three full adders (two for the first set and one for the second) can be saved by making sure that exactly these sets of partial products are added. there are savings. For the terms having weights other than a power of two.e. The nth power of an unsigned binary number as in (5. and fourth power of a five-bit input.50 Chapter 5. 5. As can be seen in Fig. The advantage of using CSD representation over binary is that the size of PP matrix is reduced and therefore the number of full adders for accumulating PP are reduced.. the binary variables in the PP matrix can then be represented by their corresponding representation weights as given in Table 5. an example case for (n. x0 20 will be placed in the first and x2 26 in the 7-th column from the left in the PP matrix. compared to adding the partial products in an arbitrary order. 10. Polynomial Evaluation where xw−1 is the MSB and x0 is the LSB of the binary number. binary or CSD representation may be used. All partial products of the powers from x2 up to x5 are shown.

4.5. n = 5. w = 4.2: Partial product matrices with potential sharing of full adders. . Powers Evaluation 51 n=2 4 6 9 6 9 10 13 12 14 4 9 10 13 11 n=3 5 7 10 11 3 6 5 7 9 2 5 9 n=4 8 14 10 10 10 9 6 14 13 11 15 13 15 17 4 7 3 5 5 9 5 7 9 13 11 9 10 13 Figure 5.

.

By optimizing both the number of arithmetic operations and their word lengths. the output of the final comb stage will be embedded with zeros during the upsampling. In this way an improved comparison can be made. improved implementations are obtained. Deriving a switching activity model for polyphase FIR filters would naturally have benefits for all polyphase FIR filters. For interpolating recursive CIC filters. This would motivate a study of pipelined polynomial evaluation 53 .1 Conclusions In this work.Chapter 6 Conclusions and Future Work 6. Based on this. The comparisons are further improved for some cases by accurately estimating the switching activities and by introducing leakage power consumption. contributions beneficial for the implementation of integer and noninteger SRC were presented.2 Future Work The following ideas are identified as possibilities for future work: It would be beneficial to derive corresponding accurate switching activity models for the non-recursive CIC architectures. one could derive closed form expressions for the integrator taking this additional knowledge into account. where both architectures have more accurate switching activity estimates. This not only reduces the area and power consumption but also allows a better comparisons between different SRC alternatives during a high-level system design. 6. The length of the Farrow sub-filters usually differ between different subfilters. not only the ones with CIC filter coefficients.

This could also include obtaining different power terms at different time steps.54 Chapter 6. this could be utilized to shift the rows corresponding to the sub-filters in such a way that the total implementation complexity for the matrix-vector multiplication and the pipelined polynomial evaluation is minimized. . The addition sequence model does not include pipelining. For the proposed matrix-vector multiplication scheme. Conclusions and Future Work schemes where the coefficients arrive at different times. It would be possible to reformulate it to consider the amount of pipelining registers required as well.

Chapter 7 References 55 .

Inc. IEEE. 1984. no. 1976. Inc. [2] A. John Wiley & Sons. Mar. interpolation.” IEEE Trans. 2011. New York. [11] M. [8] T. [3] S. 300–331. Apr. R. Kaiser. 33. . 3. J. vol. vol. 1993. 1993. vol. “Interpolation and decimation of digital signals–a tutorial review. R. USA: John Wiley & Sons. vol. V. pp. and M. [4] L. Oct. [7] R.. no. Bonnerot. 1994. 225–235. 5. Gazsi. A.. R. Rabiner. 130. Mitra. Prentice Hall. pp. Dec.. “Digital signal processing schemes for efficient interpolation and decimation. Fliege.” IEEE Trans. 1986. Crochiere and L. vol. 577–591. K. Signal Process.” IEEE Trans. 6. 1999. Digital Processing of Signals. “Digital filtering by polyphase network:application to sample-rate alteration and filter banks. 32.. Valenzuela and A. and narrow-band filtering.. 2001. 3. Oppenheim. Ramstad. New York: Wiley. pp. 1st ed. Speech. [13] W. F. Constantinides. Prentice Hall. Mitra and J. 23.. NY. 1984. [12] R. E.. Multirate Digital Signal Processing: Multirate Systems. 346–348. 1975. Speech. 1981. Bellanger. Coudreuse.. Signal Process. Vaidyanathan. no. Circuits Syst. Discrete-Time Signal Processing. Buck. 109–114. Speech. 444–456. “Optimum FIR digital filter implementations for decimation. no. no. P..” Proc. [10] P. 2nd ed. References References [1] S. Electr. New York: McGraw-Hill. Multirate Systems and Filter Banks. Jun. Schafer. pp.56 Chapter 7. G. Drews and L. no. K. 69. Linköping University. Digital Filters Using MATLAB. Wavelets. [9] M. pp. Crochiere and L.” IEEE Trans. Acoust. 3. Rabiner. 24. Filter Banks. 2. G. Wanhammar and H. [5] N. W. Handbook for Digital Signal Processing.. Acoust.” IEE Proc. “Digital methods for conversion between arbitrary sampling frequencies. 1983. and J. [6] R. vol. “A new design method for polyphase filters using all-pass sections. Digital Signal Processing: A Computer Based Approach. Bellanger. Mar. Johansson. Circuits Syst. pp. Acoust. Signal Process.

12. “Simultaneous estimation of gain. Löwenborg. “A continuously variable digital delay element. R. Europ. pp. Jan.” IEEE Trans..” in Proc. pp. [23] C. pp. Record Conf.References 57 [14] M. [17] L. Circuits Syst. “Power efficient structure for conversion between arbitrary sampling rates. “On explicit time delay estimation using the Farrow structure. 1. Electron. Saramäki and T. Commun. 1.. Commun. . 501–507. 2007. 1. M. 2006. Renfors. [15] ——.” in Proc..” Signal Process. [20] S. Nandi. 72. and P. Gardner. vol. vol. Conf. 698–702. “Interpolation in digital modems. 998–1008. M. [19] M.-T. IEEE Int. Jan. Conf. 1987. Renfors. Consum. 1997. 2. Dooley and A. vol. 2005. Erup...” in Proc. “Efficient implementation of polynomial interpolation filters or full digital receivers. pp.. “Time-delay estimation using Farrow-based fractional-delay FIR filters: filter approximation vs. 1988. Johansson. 24–39. vol. and offset utilizing the Farrow structure. pp. F. “Interpolation in digital modems. Circuit Theory Design. Circuits Syst.” IEEE Trans. Wey. 34. [18] V. 51. Ritoniemi. estimation errors. [21] M. Farrow. Kim. 1996.” IEEE Trans. 1987. Circuits Syst. 6. Implementation and performance. Signal Process. Saramäki. 2. Conf. 175–178. 34. Jan. Circuits Syst. Harris. 2006. [16] F. IEEE Int. no.. no. J. 2641–2645. 3.-L. vol. 1–4. 285–288.” in Proc. Lett. 1993. Electro/Information Technol. Renfors and T.” IEEE Trans. no. “An efficient approach for conversion between arbitrary sampling frequencies. Jun. Olsson. pp. H. “Recursive Nth–band digital filters – part I: Design and properties. pp. no. vol.. Universal Personal Commun. Mar. Shiue and C. Vesma. vol. [25] D. 1999. W. pp. pp. 41. and M... vol. 244–247. 53–57.” IEEE Trans. II. 1. IEEE Int. Europ. [24] T. Symp.” in Proc.. no. Symp. Babic and M. 1993. Jan. Fundamentals. “Efficient implementation of interpolation technique for symbol timing recovery in DVB-T transceiver design. Gardner. [26] J. 1.” in Proc. A. no. pp. Tuukkanen. delay. 41. Feb. IEEE Int.” IEEE Signal Process. pp.. K. vol. no. 2005. 427–431. I. pp. T. “Combined interpolation and maximum likelihood symbol timing recovery in digital receivers. [22] ——. “Recursive Nth-band digital filters – part II: Design of multistage decimators and interpolators. 40–51. and R.

S. Solid-State Circuits. 442–446. H. [35] C. P. “Leakage current mechanisms and leakage reduction techniques in deep-submicrometer CMOS circuits. no. no. pp.” IEEE Trans. W. 115– 146. Austin. “Low-power CMOS digital design. May 2008. P. W. S. 11.” IEEE Trans. 1996. 27. S. . Mahmoodi-Meimand. no. and M. Mudge.” Computer.” IEEE Signal Process. and M. 14. Hu. no. “Leakage current: Moore’s law meets static power. [34] E. Narayanan. References [27] T. no. no. no. [36] A. [29] R. Hermanowicz. 12. V. Cantoni. Irwin. 2. [37] A. “Polynomial-based interpolation filters – part I: Filter synthesis. [33] H. Dec. “Fractional delay filter design based on truncated Lagrange interpolation. Hamila. 36. “Polynomial-based maximumlikelihood technique for synchronization in digital receivers. Lett. [30] H.” IEEE Signal Process. Jan. Löwenborg. vol. Vesma. “A frequency-domain approach to polynomial-based interpolation and the Farrow structure. 5... 2007. pp. Candan. 4. 2003. and R. “An efficient filtering structure for Lagrange interpolation. [31] J. Valimaki. Chandrakasan. 14. Jan. no. T.” IEEE Trans. 473–484. Teo. S. II. 2002. Roy. II. J. J. pp. Mar. D. S. no. vol. K. Lett. Chandrakasan and R. pp. pp. 1. and U. M. H. vol. pp. 68–75. Circuits Syst. M. 2000. [32] V. Välimäki and A. vol. 30–60. “A fractionally delaying complex Hilbert transform filter. 47. vol.. 4. IEEE. Circuits Syst.58 Chapter 7. 1992. Johansson and P.” IEEE Trans. 1. 305–327. and H. 2007. Laakso. Feb. II. Baauw. vol. [38] N. 5. “Splitting the unit delay. Boston-Dordrecht-London. Renfors. 50. and V. 1995. 49. vol. 3.” IEEE Signal Process. 2007. May 2008. A. 164– 169. 55. Laine.” IEEE Trans. 206–209. pp.” IEEE J. Circuits Syst. Apr. Sheng. pp. Rojewski. Nov. pp. Kluwer Academic Publishers. Haghparast. 13. 91. Signal Process. Brodersen. J. Circuits Syst. vol. no. no. Saramäki. M. vol. pp. pp.” Circuits Syst. 2003. I. II. vol. L. 452–456. 26. “Variable digital filter with group delay flatness specification or phase constraints. Nordholm. Brodersen. Apr. pp. no. Johansson. Apr. Flautner. Circuits Syst. Low-Power Digital CMOS Design. Vesma. 17–19. K. vol. 2003. T. II. Kim.” Proc. [39] K. Aug. and K. 567–576. Vesma and T. vol. “On the design of adjustable fractional delay FIR filters.. Kandemir. 2. 816–819. [28] J. Karjalainen. 8. Mag. 55. Dam. Mukhopadhyay.

Circuits Syst. IEEE Int. [43] L. Kum and W..” in Proc.” in Proc. vol. 2005. [48] P. Chan and K.References [40] International technology http://public. Han and B.. Aug. vol. 4. Conf. 12. “Closed-form and real-time wordlength adaptation. “Optimum wordlength search using sensitivity information. [49] G. Menard.” IEEE Trans. “Simulation-based word-length optimization method for fixed-point digital signal processing systems.” IEEE Trans. Sentieys. pp. and O. no. [45] K. 1998. [53] M. Vesterbacka. [47] D. [44] W. June 1997. pp. J. MA. [52] S. Applied Signal Process. Fiore and L. Diss. IEEE Int. 2001. [41] L. Lee. Academic Press. Acoustics Speech Signal Process. Parhi. P. “Data wordlength optimization for FPGA synthesis. IEEE Workshop Signal Process. 2005.” Ph. Y.D.net. “Efficient approximate wordlength optimization. no. 2007. Evans.” EURASIP J. D. Villasenor.. K. Norwell. Comput. pp.. 20. pp.” IEEE Trans. Design Implementation. no. vol. 159–184. pp. Sweden. Herve. Comput. Nov. Dec. Feb. 8. 1897–1900. 487. 2607– 2610. [51] N. 567–571. “Combined word-length optimization and highlevel synthesis of digital signal processing systems. “On implementation of maximally fast wave digital filters. 1–14. Sung and K. Apr. vol. B. [50] P. C. Lee and J. dissertation. no. pp.” Bell Syst. D. 2006. Signal Process. “Wordlength determination algorithms for hardware implementation of linear time invariant systems with prescribed output accuracy. 921–930. A.itrs. Tsui.-I.” IEEE Trans. L. pp. no. Comput. 4. 1561–1570. 623–628. 11. and W.-U. 49. [42] K. roadmap for 59 semiconductors. 1995. 2008. D. DSP Integrated Circuits. . Syst.. 1999. Kum. [46] K. vol. Synthesis And Optimization Of DSP Algorithms. 56. Cheung. 1999.. D. Luk. M. vol. 43. Tech. Circuits Syst. 2006. K.-I.. pp. Fiore. Sung. VLSI Digital Signal Processing Systems: Design and Implementation. pp. “A bit-width optimization methodology for polynomial-based function evaluation. John Wiley and Sons. vol.” in Proc. 3087–3090. Jan. 57. 2004. “On the interaction of round-off noise and dynamic range in digital filters.. Constantinides. Symp.Aided Design Integr.. Wanhammar. Jackson. Linköping university. USA: Kluwer Academic Publishers. 1970.

Schaumont. [60] D. vol. Martinez-Peiro. Symp. A. Dempster and M. Wanhammar. 7... 1991. vol. no. P. Circuits Syst. Circuits Syst. “Extended results for minimum-adder constant integer multipliers.” IEEE Trans.” in Proc. 138. 225–251. Ma and F. [56] O. pp. [62] A. Comput. M. 2004..-K. “Digital filter design using subexpression elimination and all signed-digit representations. 2. Circuits Devices Syst. pp. vol.” in Proc. Circuits Syst. 1. Circuits. “Low-complexity bitserial constant-coefficient multipliers. 15. Macleod.” IEEE Trans. V. Macleod.Aided Design Integr. Dempster. 1. [64] M. and L. Wanhammar. 2002. Srivastava. K. IEEE Int. pp.” IEEE ASSP Mag. References [54] G. 5. 58–68. “Multiple constant multiplications: efficient and versatile framework and algorithms for exploring common subexpression elimination. B. 407–413. D. Potkonjak. Chandrakasan. vol. II. Dempster and M. “Subexpression sharing in filters using canonic signed digit multipliers.. vol. Bull and D. 10. Wanhammar. Johansson. 151–165. Jun. pp. Oct. 18. Johansson. 1999. Circuits Syst. 1996. P. . Taylor. and L. O. pp. Vernalde. Devices and Syst. vol. “Design of highspeed multiplierless filters using a non-recursive signed common subexpression algorithm. 196–203. 2006. Symp. vol. no. no. and D. Jan. Gustafsson. A. Circuits Syst. I. Dempster. and L. Durackova. 677–688. He and M. 2002. Mar. vol. E. pp. 1996. 3. no. Macleod. 1.60 Chapter 7. H. “Simplified design of constant coefficient multipliers. Circuits Syst. Feb.. [59] K.” in Proc. 3. 1994. “Primitive operator digital filters. [65] R.” IEE Proc. pp.” Circuits Syst. [63] R. Pasko.. 141. Gustafsson. J. no. no. “A new algorithm for elimination of common subexpressions.. Derudder..” in Proc.” IEE Proc. “Constant integer multiplication using minimum adders. 6–20. Apr. [55] A. no. II. Jan. Torkelson. 25. and A. Comput. [61] M. Signal Process. G.-Aided Design Integr.. IEEE Custom Integrated Circuits Conf. Oct.” IEEE Trans. 3. 2004. vol. 2. “FPGA implementation of FIR filters using pipelined bit-serial canonical signed digit multipliers. 43. 49. Hartley. G. G. S. I. [58] S. pp. no. 1990. vol. [57] O. Horrocks. Boemo. 3.” IEEE Trans. and L. 81–84. IEEE Int. Circuits Syst. vol. 401–412. Wanhammar. D. Gustafsson. M. IEEE Int. 1994. R. Symp. “Multiplier policies for digital signal processing. pp..

II. 1. Dempster. pp. Macleod.” IEEE Trans. IEEE Mediterranean Electrotechnical Conf.” in Proc. “A novel approach to multiple constant multiplication using minimum spanning trees. 204–216. .” in Proc. Apr. 1. Dempster. vol. Circuits Syst. O. pp. H. 1097–1100. pp. Dimirsoy. vol. and I. Muhammad and K. Circuits Syst. Gustafsson.. 1. Symp. Circuits Syst.” in Proc..” in Proc. Algorithms. Gustafsson and L. 569–577. 1995. Gustafsson. IEEE Nordic Signal Process. [75] O. Dempster and M. Conf. vol. [72] A. no.” in Proc. G. 257–261. O. S. Signals Syst. no. Püschel.. 5. 2004. pp. D. Wanhammar. “Implementation of low complexity FIR filters using a minimum spanning tree. IEEE Nordic Signal Process. Circuits Syst. no. Circuit Theory Design. “Towards an algorithm for matrix multiplier blocks. Wanhammar. Wanhammar. Comput. IEEE Int. 141– 144. Circuits Syst. 2004. 2004. Symp. 2002. Roy. Midwest Symp. Ohlsson. Ohlsson. Jan. “Multiplierless multiple constant multiplication. Dempster and N. [73] Y. “A graph theoretic approach for synthesizing very low-complexity high-speed digital filters. 2002. and L. Gustafsson and A. and L. Europ. 2003.” IEEE Trans. “Designing multiplier blocks with low logic depth. 9. no. pp. 63–66.. 2004. G. “Use of minimum-adder multiplier blocks in FIR digital filters. Signal Process. Voronenko and M. [68] O.References 61 [66] O.. vol. Feb. Ohlsson. G. [69] K.Aided Design Integr. IEEE Int.” in Proc. [74] A. P. IEEE Int. Gustafsson. Symp. p. S. 2. “Low-complexity constant coefficient matrix multiplication using a minimum spanning tree approach. vol. [70] H. vol. vol. pp.. H. [76] A. 2007. 261–264. Wanhammar. Dempster.” in Proc.” IEEE Trans. Asilomar Conf. 3. “On the use of multiple constant multiplication in polyphase FIR filters and filter banks. “Improved multiple constant multiplication using a minimum spanning tree. Murphy. O. and J. pp. [77] O. Comput. [67] O. [71] A. pp.. Sep. 53–56. “A difference based adder graph heuristic for multiple constant multiplication problems. May 2007.. “Efficient interpolators and filter banks using multiplier blocks. Kale. Gustafsson. 48..” in Proc. vol. 2002. 42. Symp.” ACM Trans. article 11. Sep. and L. 2. 21. 3. Coleman. G. G. Gustafsson. 2000.

Vesma.. Hogenauer.. Jan. vol. Dempster. Syst. no. Y. 155 –162.Imprint of: IGI Publishing.” IEEE Trans. 2000. Costa. vol.” Electron. Feb. [79] N.62 Chapter 7. “Efficient polyphase decomposition of comb decimation filters in Σ∆ analog-todigital converters. [83] E. Des. 2005. accepted.. Jia. [85] S. [88] U. 1981. . vol. [81] L. “Decimation by irrational factor using CIC filter and linear interpolation. pp. Acoustics.. pp. no. Chu and C. “Decimation for Sigma Delta modulation. Crochiere and L. Aboushady. pp. Conf.. vol. Autom. 2001. [90] A. Mehrez. vol. Englewood Cliffs. [82] R. 2004. 1984. 89–92. 2. 898–903. 3. and J. D.. [86] J. Prentice Hall. Symp.” in Proc.” IEEE Trans. Oct. 81–101. Digital Signal Processing with Field Programmable Gate Arrays (Signals and Communication Technology).. 1986. 10. vol. 3677–3680. Circuits Syst. pp. Inc. pp. Renfors. Hershey. 31. 2008. 34. 72–76. no. and H. Secaucus. 29. pp. Louerat. Macleod and A. 1271–1282. PA: Information Science Reference . “Integer linear programming-based bit-level optimization for high-speed FIR decimation filter architectures. “An economical class of digital filters for decimation and interpolation. 11. May 2004. 651–652. 1983. Dumonteix. vol. Tenhunen. Nov. References [78] M. Acoust. Electron. pp. Candy.. “A fifth-order comb decimation filter for multi-standard transceiver applications. Flores.-M. Lett. J. Babic. “Some optimizations of hardware multiplication by constant matrices. Meyer-Baese. Milic. Commun. “Multirate filter designs using comb filters. and H. 1. Comput. Rabiner. pp. 2001.” IEEE Trans.. Blad and O. USA: Springer-Verlag New York. Tisserand. [84] D. “Optimization algorithms for the multiplierless realization of linear transforms. no. 1. Speech Signal Process. Multirate Digital Signal Processing. vol..” in Proc. [80] L. and M. no. 29. [89] H. 48. 913–924. 40. Speech. IEEE Int. “Common subexpression elimination algorithm for low-cost multiplierless implementation of matrix multipliers. G.. no. 10. IEEE Int. N. Aksoy. Multirate Filtering for Digital Signal Processing: MATLAB Applications. Monteiro. [87] Y. Boullis and A. Gao.” IEEE Trans. Circuits Syst. II. Circuits Syst. E. 54. 2010. Burrus. L.” Circuits Syst. 6. no. Signal Process. Apr. P. M. 11. E.” IEEE Trans. NJ.. Signal Process. vol. Gustafsson.J. pp. Oct.” ACM Trans.

Signals Syst. Signal Process. IEEE Int. pp. –Vision. vol. Speech. pp. Diss. Signal Process. Lakshmivarahan and S.. 151. Feb. no. pp. Stephen and R. 2353– 2356.” IEEE Trans. [103] S. Hamming.. 5. Y. “A modified comb filter structure for decimation. Mondin. N. “Optimization and applications of polynomial-based interpolation filters. May 2007. Tampere University of Technology. pp. Reading. 2007. II.” IEE Proc. 2004. Vesma.” in Proc. A. pp. 1st ed. 2. 5. Signal Process.” IEEE Trans. Massachusetts. “Comb-based decimation filters for Sigma Delta A/D converters: Novel schemes and comparisons.. Laddomada and M. no. vol. 1997. vol. . Image Signal Process. 287–296. Ritoniemi. 1999. Knuth. Mitra. 1204–1213. 457–467.” IEEE Trans. 994–1005. “Application of filter sharpening to cascaded integrator-comb decimation filters. vol.. 1997. Dhall.” IEEE Trans. Willson. 2005. pp. Z. Kwentus. [102] D. W. no. [92] M. Kaiser and R.. 1990. 52. Commun. no. and Information Tech. E. 25. 1414–1420. Lo Presti. Oct.. I. 2000. [97] T. 54. 2004. [98] L. 47. no. Asilomar Conf. Nov. [100] G.” IEEE Trans. 4. Saramäki and T. pp. no. 7. pp. 4. Jiang. Jul.References 63 [91] A. Circuits Syst. 2: Seminumerical Algorithms. no.” IEEE Trans. K. Analysis and Design of Parallel Algorithms: Arithmetic and Matrix Problems. Symp. 2. “Efficient modified-sinc filters for sigma-delta A/D converters. “Decimation schemes for Sigma Delta A/D converters based on kaiser and hamming sharpened filters. Circuits Syst. Stewart. 55. 252–255.” in Proc.. 5. vol.. vol.” in Proc. “Sharpening the response of a symmetric nonrecursive filter by multiple use of the same filter. Acoust. J. I. 45. 2170–2173. [101] J. Int. vol. Laddomada. Comput. vol. McGraw-Hill. and J. 11. Dolecek. Circuits Syst.. pp. “A new two-stage sharpened comb decimator. [95] ——. no. 415–422. [93] G. 1769–1779. The Art of Computer Programming. 1998. [94] M. [96] J. Aug. May 2007. vol. Symp. Addison-Wesley.” Ph. dissertation. [99] G. 254.D. pp. “Generalized comb decimation filters for Sigma Delta A/D converters: Analysis and design. vol. Jovanovic-Dolecek and S. “Modified CIC filter for rational sample rate conversion. Circuits Syst. K. 1977. “Sharpening of partially non-recursive CIC decimation filters.

ser. Mencer. 189–198. [116] S. vol.-Specific Syst.” in Proc. Euromicro Conf. IEEE Int. J. Dev. and J. Jan.. vol. Syst. Torres. “Generalizations of Horner’s rule for polynomial evaluation. Brisebarre.. [108] I. Lett. 17. 7. vol. Comput. 12. 2. Butler.” J. and T. IEEE Int. 216– 222. de Dinechin.” IBM J. Digital Syst. A. 2002.-U. Arch. T. C. Design Arch. 6. Dec. [105] G. Elementary Functions: Algorithms and Implementation. pp. Apr. 2007. Muller. Comput. Dec. 1972. 2010. vol. “On the parallel evaluation of polynomials. 625–638. Apr. 686–701. and E.. Comput. 1. no. W. no. Hu. Methods and Tools. Birkhauser. Applicat. pp. no. Joldes. Villalba. no. no. Comput.-M.. Sasao. vol. L.. 25. Pasca. “Hardware operators for function evaluation using sparse-coefficient polynomials. Dec. pp. and S. 265–274. “Optimizing hardware function evaluation. 22. Bandera. pp. 1. “Organization of computer systems: the fixed plus variable structure computer. NY.. Munro and A. and B. “Automatic generation of polynomial-based hardware architectures for function evaluation. “Design method for numerical function generators based on polynomial approximation for FPGA implementation. Arch. USA: ACM. 5. Paterson. [111] J. [110] L. no. Processors Conf. 1973. [112] D. no..-U. Cheung. Estrin. D. [109] K. pp.-M.. Syst. J. vol. 6. pp. A. pp. May 2008. 33–40. 260–262. Li. Western Joint IRE-AIEE-ACM Computer Conf. Gaffar. Sci. Borodin.. Jan. Sci. Munro and M.” IEEE Trans. 42. 1962. A. “A simple parallel algorithm for polynomial evaluation.” IEEE Trans.” in Proc.” Electron. 54. and W. 1520– 1531. Dorn. M. Luk. .” in Proc. “Efficient evaluation of polynomial forms. Nagayama. Gonzalez. pp. Comput.. [114] N. A. 2–5.” in Proc. and J. pp.” J. 1441–1442. C. Res. J.” SIAM J. 1973.. Processors Conf.. S. Zapata. G. 6. “Polynomial evaluation on multimedia processors. 1996.” IEEE Trans. [106] W. 239–245. 2005. 1960. Villasenor. “Optimal algorithms for parallel polynomial evaluation. Muller. R. Hormigo. pp. Luk. Applicat.-Specific Syst. “Hardware implementation trade-offs of polynomial approximations and interpolations. [115] F. vol. M. 2005. pp. Tisserand. Lee. pp. Maruyama. Nakamura. [113] D. T. Comput. 280–287. IRE-AIEE-ACM ’60 (Western).64 Chapter 7. O. 2. vol. New York. no. References [104] J. 57. Sci. 2006. [107] I. Lee.

pp. . Flammenkamp.References 65 [117] J.” SIAM J. Smith. Conf. vol. Arch. 1978.” Commun. “A method for obtaining digital signatures and public-key cryptosystems.-C. 6. 1. 1997. vol. 100–103. “A fast parallel squarer based on divide-and-conquer. 1. Gopalakrishnan.. 1999. no. 120–126. L. 11–22. 109–112. ElGindy. pp. Yao. 32. USA. Jun. New [121] R. 21. Feb. pp. “An efficient algorithm for computing shortest addition chains. Aug. K. 909–912. Yoo. C. vol. “On the evaluation of powers. van Tilborg. Comput. pp.-T. Australasian Comput. Pihl and E. [118] J. Bleichenbacher and A.: Springer. York. Circuits Syst. Comput. [119] M.” in Proc. “A multiplier and squarer generator for high performance DSP applications. no. 1998. J. pp. Adleman..” IEEE J. Solid-State Circuits.” SIAM J. 1996. IEEE Midwest Symp. 5. no. A. Encyclopedia of Cryptography and Security.. F. and L. “On determining polynomial evaluation structures for FPGA based custom computing machines. Rivest. [120] H. 1976. ACM. Shamir. [123] D. [122] A. and G. A.” in Proc. 2. C. vol. Wojko and H. Aas.. 2005.

Sign up to vote on this title
UsefulNot useful