
# Linköping Studies in Science and Technology

Dissertations, No 1420

On the Implementation of Integer and

Non-Integer Sampling Rate Conversion

Muhammad Abbas

Division of Electronics Systems

Department of Electrical Engineering

Linköping University

SE–581 83 Linköping, Sweden

Linköping 2012

Linköping Studies in Science and Technology

Dissertations, No 1420

Muhammad Abbas

mabbas@isy.liu.se

www.es.isy.liu.se

Division of Electronics Systems

Department of Electrical Engineering

Linköping University

SE–581 83 Linköping, Sweden

Copyright © 2012 Muhammad Abbas, unless otherwise noted.

All rights reserved.

Papers A, D, and E reprinted with permission from IEEE.

Parts of Papers B and C reprinted with permission from IEEE.

Abbas, Muhammad

On the Implementation of Integer and Non-Integer Sampling Rate Conversion

ISBN 978-91-7519-980-1

ISSN 0345-7524

Typeset with LaTeX 2ε

Printed by LiU-Tryck, Linköping, Sweden 2012

To my loving parents

Abstract

The main focus in this thesis is on the aspects related to the implementation of

integer and non-integer sampling rate conversion (SRC). SRC is used in many

communication and signal processing applications where two signals or systems

having diﬀerent sampling rates need to be interconnected. There are two basic

approaches to deal with this problem. The ﬁrst is to convert the signal to analog

and then re-sample it at the desired rate. In the second approach, digital signal

processing techniques are utilized to compute values of the new samples from

the existing ones. The former approach is hardly used since the latter one

introduces less noise and distortion. However, the implementation complexity

for the second approach varies for diﬀerent types of conversion factors. In

this work, the second approach for SRC is considered and its implementation

details are explored. In general, the conversion factor can be an integer, a ratio of two integers, or an irrational number. SRC by an irrational factor is impractical and is generally stated only for completeness; such factors are usually approximated by a rational one.

The performance of decimators and interpolators is mainly determined by the filters, which suppress aliasing effects or remove unwanted images. There are many approaches for the implementation of decimation and

interpolation ﬁlters, and cascaded integrator comb (CIC) ﬁlters are one of them.

CIC ﬁlters are most commonly used in the case of integer sampling rate conver-

sions and often preferred due to their simplicity, hardware eﬃciency, and rela-

tively good anti-aliasing (anti-imaging) characteristics for the ﬁrst (last) stage of

a decimation (interpolation). Their multiplierless nature, which generally yields low power consumption, makes CIC filters well suited for performing conversion at high rates. Since these filters operate at the maximum sampling frequency, they are critical with respect to power consumption. It is therefore necessary to have accurate and efficient approaches for estimating the power consumption and the important factors contributing to it. Switching activity is one such factor. To have a high-level

estimate of the dynamic power consumption, switching activity equations for CIC filters are derived. The modeling of leakage power is also included, which is

an important parameter to consider since the input sampling rate may differ by several orders of magnitude. These high-level power estimates can then be used as feedback while exploring multiple alternatives.

Sampling rate conversion is a typical example where it is required to deter-

mine the values between existing samples. The computation of a value between

existing samples can alternatively be regarded as delaying the underlying signal

by a fractional sampling period. Fractional-delay filters are used in this context to provide a fractional delay adjustable to any desired value and are

therefore suitable for both integer and non-integer factors. The structure that

is used in the efficient implementation of a fractional-delay filter is known as the Farrow structure or one of its modifications. The main advantage of the Farrow structure lies in the fact that it consists of fixed finite-impulse response (FIR) filters

and there is only one adjustable fractional-delay parameter, used to evaluate

a polynomial with the ﬁlter outputs as coeﬃcients. This characteristic of the

Farrow structure makes it very attractive for implementation. In

the considered ﬁxed-point implementation of the Farrow structure, closed-form

expressions for suitable word lengths are derived based on scaling and round-oﬀ

noise. Since multipliers account for a major portion of the total power consumption, a

matrix-vector multiple constant multiplication approach is proposed to improve

the multiplierless implementation of FIR sub-ﬁlters.

The implementation of the polynomial part of the Farrow structure is in-

vestigated by considering the computational complexity of diﬀerent polynomial

evaluation schemes. By considering the number of operations of diﬀerent types,

critical path, pipelining complexity, and latency after pipelining, high-level comparisons are obtained and used to shortlist suitable candidates. Most of

these evaluation schemes require the explicit computation of higher-order power

terms. In the parallel evaluation of powers, redundancy in computations is re-

moved by exploiting any possible sharing at word level and also at bit level.

As a part of this, since exponents are additive under multiplication, an ILP

formulation for the minimum addition sequence problem is proposed.

Populärvetenskaplig sammanfattning (Popular Science Summary)

In systems where digital signals are processed, it is sometimes necessary to change the data rate (sampling rate) of an already existing digital signal. One example is systems where several different standards are supported and each standard needs to be processed at its own data rate. Another is data converters, which in some respects become simpler to build if they operate at a higher rate than is theoretically needed to represent all the information in the signal. To change the rate, a digital filter is in principle always required, one that can compute the missing values or ensure that certain data can safely be discarded without destroying the information. This thesis presents a number of results related to the implementation of such filters.

The first class of filters is the so-called CIC filters. These are widely used since they can be implemented with only a few adders, entirely without the more costly multipliers needed in many other filter classes, and can easily be used for different changes of the data rate as long as the change is by an integer factor. A model of how much power different types of implementations consume is presented, where the main difference compared to earlier similar work is that the power consumed through leakage currents is included. Leakage currents become a relatively larger and larger problem as circuit technology advances, so it is important that the models keep up. In addition, very accurate equations are presented for how often the digital values that represent the signals in these filters change, statistically speaking, something that has a direct impact on the power consumption.

The second class of filters is the so-called Farrow filters. These are used to delay a signal by less than a sampling period, which can be used to compute intermediate data values and thereby change the data rate arbitrarily, without having to consider whether the change of the data rate is an integer or not. Much of the earlier work has concerned how to choose the values of the multipliers, while the implementation itself has received less attention. Here, closed-form expressions are presented for how many bits are needed in the implementation to represent all data sufficiently accurately. This is important since the number of bits directly affects the amount of circuitry, which in turn affects the amount of power required. In addition, a new method is presented for replacing the multipliers with adders and multiplications by two. This is interesting since a multiplication by two can be realized by wiring the connections slightly differently, and thus no special circuits are needed for it.

In Farrow filters, the evaluation of a polynomial must also be implemented. As a final part of the thesis, an investigation of the complexity of different methods for evaluating polynomials is presented, and two different methods are proposed for efficiently computing squares, cubes, and higher-order integer exponents of numbers.


Preface

This thesis contains research work done at the Division of Electronics Systems,

Department of Electrical Engineering, Linköping University, Sweden. The work

was carried out between December 2007 and December 2011 and has resulted

in the following publications.

Paper A

The power modeling of diﬀerent realizations of cascaded integrator-comb (CIC)

decimation ﬁlters, recursive and non-recursive, is extended with the modeling

of leakage power. The inclusion of this factor becomes more important when

the input sampling rate varies by several orders of magnitude. The importance of the input word length when comparing recursive and non-recursive implementations is also highlighted.

M. Abbas, O. Gustafsson, and L. Wanhammar, “Power estimation of re-

cursive and non-recursive CIC ﬁlters implemented in deep-submicron tech-

nology,” in Proc. IEEE Int. Conf. Green Circuits Syst., Shanghai, China,

June 21–23, 2010.

Paper B

A method for the estimation of switching activity in cascaded integrator comb

(CIC) ﬁlters is presented. The switching activities may then be used to estimate

the dynamic power consumption. The switching activity estimation model is

ﬁrst developed for the general-purpose integrators and CIC ﬁlter integrators.

The model is then extended to capture the effects of pipelining in the carry

chain paths of CIC ﬁlter integrators. The correlation in sign extension bits

is also considered in the switching estimation model. The switching activity

estimation model is also derived for the comb sections of the CIC ﬁlters, which

normally operate at the lower sampling rate. Diﬀerent values of diﬀerential

delay in the comb part are considered for the estimation. A comparison of the theoretically estimated switching activities, based on the proposed model, with those obtained by simulation demonstrates a close correspondence between the two. Model results for the case of phase

accumulators of direct digital frequency synthesizers (DDFS) are also presented.

M. Abbas, O. Gustafsson, and K. Johansson, “Switching activity estima-

tion for cascaded integrator comb ﬁlters,” IEEE Trans. Circuits Syst. I,

under review.

A preliminary version of the above work can be found in:

M. Abbas and O. Gustafsson, “Switching activity estimation of CIC ﬁlter

integrators,” in Proc. IEEE Asia Paciﬁc Conf. Postgraduate Research in

Microelectronics and Electronics, Shanghai, China, Sept. 22–24, 2010.
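The switching activity that the model estimates can also be measured directly from a bit-true simulation. The following Python sketch (with hypothetical helper names; the paper's contribution is closed-form estimates that make such simulation unnecessary) counts per-bit toggle rates for a phase accumulator of the kind treated in Papers B and G:

```python
def switching_activity(values, bits):
    """Per-bit switching activity of a sequence of 'bits'-wide words.

    Returns alpha[i]: the fraction of clock cycles in which bit i toggles
    (bit 0 = LSB). This is the quantity the closed-form model estimates.
    """
    toggles = [0] * bits
    mask = (1 << bits) - 1
    for prev, cur in zip(values, values[1:]):
        diff = (prev ^ cur) & mask  # bits that changed in this transition
        for i in range(bits):
            toggles[i] += (diff >> i) & 1
    transitions = len(values) - 1
    return [t / transitions for t in toggles]

# Phase accumulator (as in a DDFS): acc(n) = (acc(n-1) + step) mod 2^bits
bits, step = 8, 37
acc, seq = 0, []
for _ in range(1 << bits):
    seq.append(acc)
    acc = (acc + step) % (1 << bits)

act = switching_activity(seq, bits)
print(act[0])  # for an odd step, the LSB toggles every cycle: 1.0
```

For an odd increment the LSB toggles on every cycle, which the simulation confirms; the derived equations predict the activities of all bits without running the sequence.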


Paper C

In this work, there are three major contributions. First, signal scaling in the

Farrow structure is studied which is crucial for a ﬁxed-point implementation.

Closed-form expressions for the scaling levels for the outputs of each sub-ﬁlter

as well as for the nodes before the delay multipliers are derived. Second, a

round-oﬀ noise analysis is performed and closed-form expressions are derived.

By using these closed-form expressions for the round-oﬀ noise and scaling in

terms of integer bits, diﬀerent approaches to ﬁnd the suitable word lengths to

meet the round-oﬀ noise speciﬁcation at the output of the ﬁlter are proposed.

Third, direct-form sub-filters leading to a matrix MCM block are proposed, which stems from an approach for implementing parallel FIR filters. The use of matrix MCM blocks leads to fewer structural adders, fewer delay elements, and

in most cases fewer total adders.

M. Abbas, O. Gustafsson, and H. Johansson, “On the implementation

of fractional delay ﬁlters based on the Farrow structure,” IEEE Trans.

Circuits Syst. I, under review.

Preliminary versions of the above work can be found in:

M. Abbas, O. Gustafsson, and H. Johansson, “Scaling of fractional delay

ﬁlters using the Farrow structure,” in Proc. IEEE Int. Symp. Circuits

Syst., Taipei, Taiwan, May 24–27, 2009.

M. Abbas, O. Gustafsson, and H. Johansson, “Round-oﬀ analysis and

word length optimization of the fractional delay ﬁlters based on the Farrow

structure,” in Proc. Swedish System-on-Chip Conf., Rusthållargården,

Arlid, Sweden, May 4–5, 2009.

Paper D

The computational complexity of diﬀerent polynomial evaluation schemes is

studied. High-level comparisons of these schemes are obtained based on the

number of operations of diﬀerent types, critical path, pipelining complexity,

and latency after pipelining. It is suggested that these parameters be considered to shortlist suitable candidates for an implementation, given the specifications. In

the comparisons, multiplications are not simply counted but are divided into data-data multiplications, squarers, and data-coefficient multiplications, and their impact on the suggested selection parameters is stated.

M. Abbas and O. Gustafsson, “Computational and implementation com-

plexity of polynomial evaluation schemes,” in Proc. IEEE Norchip Conf.,

Lund, Sweden, Nov. 14–15, 2011.

Paper E

The problem of computing any requested set of power terms in parallel using

summation trees is investigated. A technique is proposed that first generates

the partial product matrix of each power term independently and then checks

the computational redundancy in each and among all partial product matrices

at bit level. The redundancy here relates to the fact that the same three partial products may be present in more than one column and can hence all be mapped to a single full adder. The proposed algorithm is tested for different sets of powers, variable word lengths, and signed/unsigned numbers to exploit the sharing potential. It achieves considerable hardware savings in almost all cases.

M. Abbas, O. Gustafsson, and A. Blad, “Low-complexity parallel evalua-

tion of powers exploiting bit-level redundancy,” in Proc. Asilomar Conf.

Signals Syst. Comp., Paciﬁc Grove, CA, Nov. 7–10, 2010.

Paper F

An integer linear programming (ILP) based model is proposed for the compu-

tation of a minimal cost addition sequence for a given set of integers. Since

exponents are additive for a multiplication, the minimal length addition se-

quence will provide an eﬃcient solution for the evaluation of a requested set of

power terms. Not only is an optimal model proposed, but the model is extended

to consider diﬀerent costs for multipliers and squarers as well as controlling the

depth of the resulting addition sequence. Additional cuts are also proposed

which, although not required for the solution, help to reduce the solution time.

M. Abbas and O. Gustafsson, “Integer linear programming modeling of

addition sequences with additional constraints for evaluation of power

terms,” manuscript.
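As a small illustration of why addition sequences matter here (a toy sketch, not the ILP model of the paper): since x^a · x^b = x^(a+b), an addition sequence whose elements cover a requested exponent set yields one multiplication per new sequence element. For the set {2, 3, 5}, the sequence 1, 2, 3, 5 gives x^2 = x·x, x^3 = x^2·x, x^5 = x^2·x^3, i.e., three multiplications in total:

```python
def powers_from_addition_sequence(x, sequence):
    """Evaluate x**e for every exponent e in an addition sequence.

    Each element after the leading 1 must be the sum of two earlier
    elements, so each new power costs exactly one multiplication
    (a squaring when the two earlier exponents coincide).
    """
    powers = {1: x}
    for e in sequence:
        if e == 1:
            continue
        # find two earlier exponents a and e - a with a + (e - a) = e
        a = next(a for a in powers if e - a in powers)
        powers[e] = powers[a] * powers[e - a]
    return powers

# The addition sequence 1, 2, 3, 5 covers the exponent set {2, 3, 5}
p = powers_from_addition_sequence(2, [1, 2, 3, 5])
print(p[2], p[3], p[5])  # 4 8 32
```

Finding a minimal such sequence, with different costs for squarers and multipliers and a bounded depth, is exactly what the proposed ILP model does; the sketch above only replays a sequence that is already given.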

Paper G

Based on the switching activity estimation model derived in Paper B, the model

equations are derived for the case of phase accumulators of direct digital fre-

quency synthesizers (DDFS).

M. Abbas and O. Gustafsson, “Switching activity estimation of DDFS

phase accumulators,” manuscript.

Contributions were also made to the following publication, but its contents are less relevant to the topic of this thesis.

M. Abbas, F. Qureshi, Z. Sheikh, O. Gustafsson, H. Johansson, and K. Jo-

hansson, “Comparison of multiplierless implementation of nonlinear-phase

versus linear-phase FIR ﬁlters,” in Proc. Asilomar Conf. Signals Syst.

Comp., Paciﬁc Grove, CA, Oct. 26–29, 2008.

Acknowledgments

I humbly thank Allah Almighty, the Compassionate, the Merciful, who gave

health, thoughts, aﬀectionate parents, talented teachers, helping friends and an

opportunity to contribute to the vast body of knowledge. Peace and blessing of

Allah be upon the Holy Prophet MUHAMMAD (peace be upon him), the last

prophet of Allah, who exhorted his followers to seek knowledge from the cradle to the grave and whose incomparable life is a glorious model for humanity.

I would like to express my sincere gratitude towards:

My advisors, Dr. Oscar Gustafsson and Prof. Håkan Johansson, for

their inspiring and valuable guidance, enlightening discussions, kind and

dynamic supervision throughout all the phases of this thesis. I

have learnt a lot from them and working with them has been a pleasure.

Higher Education Commission (HEC) of Pakistan is gratefully acknowl-

edged for the financial support, and the Swedish Institute (SI) for coordinating

the scholarship program. Linköping University is also gratefully acknowl-

edged for partial support.

The former and present colleagues at the Division of Electronics Systems,

Department of Electrical Engineering, Linköping University have created

a very friendly environment. They always kindly do their best to help

you.

Dr. Kenny Johansson for introducing me to the area and power estimation

tools and sharing many useful scripts.

Dr. Amir Eghbali and Dr. Anton Blad for their kind and constant support

throughout my stay here at the department.

Dr. Kent Palmkvist for help with FPGA and VHDL-related issues.

Peter Johansson for all the help regarding technical as well as administra-

tive issues.

Dr. Erik Höckerdal for providing the LaTeX template, which has made life

very easy.

Dr. Rashad Ramzan and Dr. Rizwan Asghar for their generous help and

guidance at the start of my PhD study.

Syed Ahmed Aamir, Muhammad Touqir Pasha, Muhammad Irfan Kazim,

and Syed Asad Alam for being caring friends and providing help in proof-

reading this thesis.

My friends here in Sweden, Zafar Iqbal, Fahad Qureshi, Ali Saeed, Saima

Athar, Nadeem Afzal, Fahad Qazi, Zaka Ullah, Muhammad Saifullah

Khan, Dr. Jawad ul Hassan, Mohammad Junaid, Tafzeel ur Rehman,

and many more for all kind of help and keeping my social life alive.


My friends and colleagues in Pakistan especially Jamaluddin Ahmed, Khalid

Bin Sagheer, Muhammad Zahid, and Ghulam Hussain for all the help and

care they have provided during the last four years.

My parents-in-law, brothers, and sisters for their encouragement and prayers.

My elder brother Dr. Qaisar Abbas and sister-in-law Dr. Uzma for their

help at the very start when I came to Sweden and throughout the years

later on. I have never felt that I am away from home. Thanks for hosting

many wonderful days spent there at Uppsala.

My younger brother Muhammad Waqas and my sister for their prayers and for taking care of the home in my absence.

My mother and my father for their non-stop prayers and for being a great support during all my stay here in Sweden. Thanks to both of you for

having conﬁdence and faith in me. Truly you hold the credit for all my

achievements.

My wife Dr. Shazia for her devotion, patience, unconditional cooperation, taking care of Abiha single-handedly, and being away from her family for a few years. To my little princess, Abiha, who has made my life so beautiful.

To those not listed here, I say profound thanks for bringing pleasant moments into my life.

Muhammad Abbas

January 2012

Linköping, Sweden

Contents

1 Introduction 1

1.1 Digital Filters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1

1.1.1 FIR Filters . . . . . . . . . . . . . . . . . . . . . . . . . . 1

1.1.2 IIR Filters . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

1.2 Sampling Rate Conversion . . . . . . . . . . . . . . . . . . . . . . 3

1.2.1 Decimation . . . . . . . . . . . . . . . . . . . . . . . . . . 4

1.2.2 Interpolation . . . . . . . . . . . . . . . . . . . . . . . . . 5

1.2.3 Noble Identities . . . . . . . . . . . . . . . . . . . . . . . . 7

1.3 Polyphase Representation . . . . . . . . . . . . . . . . . . . . . . 8

1.4 Fractional-Delay Filters . . . . . . . . . . . . . . . . . . . . . . . 10

1.5 Power and Energy Consumption . . . . . . . . . . . . . . . . . . 13

2 Finite Word Length Eﬀects 17

2.1 Numbers Representation . . . . . . . . . . . . . . . . . . . . . . . 17

2.1.1 Two’s Complement Numbers . . . . . . . . . . . . . . . . 18

2.1.2 Canonic Signed-Digit Representation . . . . . . . . . . . . 19

2.2 Fixed-Point Quantization . . . . . . . . . . . . . . . . . . . . . . 19

2.3 Overﬂow Characteristics . . . . . . . . . . . . . . . . . . . . . . . 21

2.3.1 Two’s Complement Addition . . . . . . . . . . . . . . . . 21

2.3.2 Two’s Complement Multiplication . . . . . . . . . . . . . 21

2.4 Scaling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

2.5 Round-Oﬀ Noise . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

2.6 Word Length Optimization . . . . . . . . . . . . . . . . . . . . . 23

2.7 Constant Multiplication . . . . . . . . . . . . . . . . . . . . . . . 24

3 Integer Sampling Rate Conversion 27

3.1 Basic Implementation . . . . . . . . . . . . . . . . . . . . . . . . 27

3.1.1 FIR Decimators . . . . . . . . . . . . . . . . . . . . . . . 28

3.1.2 FIR Interpolators . . . . . . . . . . . . . . . . . . . . . . . 29

3.2 Polyphase FIR Filters . . . . . . . . . . . . . . . . . . . . . . . . 29

3.2.1 Polyphase FIR Decimators . . . . . . . . . . . . . . . . . 30

3.2.2 Polyphase FIR Interpolators . . . . . . . . . . . . . . . . 31


3.3 Multistage Implementation . . . . . . . . . . . . . . . . . . . . . 31

3.4 Cascaded Integrator Comb Filters . . . . . . . . . . . . . . . . . 32

3.4.1 CIC Filters in Interpolators and Decimators . . . . . . . . 33

3.4.2 Polyphase Implementation of Non-Recursive CIC . . . . . 35

3.4.3 Compensation Filters . . . . . . . . . . . . . . . . . . . . 36

4 Non-Integer Sampling Rate Conversion 37

4.1 Sampling Rate Conversion: Rational Factor . . . . . . . . . . . . 37

4.1.1 Small Rational Factors . . . . . . . . . . . . . . . . . . . . 37

4.1.2 Polyphase Implementation of Rational Sampling Rate Con-

version . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

4.1.3 Large Rational Factors . . . . . . . . . . . . . . . . . . . . 40

4.2 Sampling Rate Conversion: Arbitrary Factor . . . . . . . . . . . 41

4.2.1 Farrow Structures . . . . . . . . . . . . . . . . . . . . . . 41

5 Polynomial Evaluation 45

5.1 Polynomial Evaluation Algorithms . . . . . . . . . . . . . . . . . 45

5.2 Horner’s Scheme . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

5.3 Parallel Schemes . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

5.4 Powers Evaluation . . . . . . . . . . . . . . . . . . . . . . . . . . 47

5.4.1 Powers Evaluation with Sharing at Word-Level . . . . . . 48

5.4.2 Powers Evaluation with Sharing at Bit-Level . . . . . . . 49

6 Conclusions and Future Work 53

6.1 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

6.2 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 53

7 References 55

References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

Publications 67

A Power Estimation of Recursive and Non-Recursive CIC Filters

Implemented in Deep-Submicron Technology 69

1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72

2 Recursive CIC Filters . . . . . . . . . . . . . . . . . . . . . . . . 72

2.1 Complexity and Power Model . . . . . . . . . . . . . . . . 73

3 Non-Recursive CIC Filters . . . . . . . . . . . . . . . . . . . . . . 76

3.1 Complexity and Power Model . . . . . . . . . . . . . . . . 76

4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79

5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80

References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83


B Switching Activity Estimation of Cascaded Integrator Comb

Filters 85

1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88

2 CIC Filters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 89

3 CIC Filter Integrators/Accumulators . . . . . . . . . . . . . . . 90

3.1 One’s Probability . . . . . . . . . . . . . . . . . . . . . . . 91

3.2 Switching Activity . . . . . . . . . . . . . . . . . . . . . . 93

3.3 Pipelining in CIC Filter Accumulators/Integrators . . . . 97

3.4 Switching Activity for Correlated Sign Extension Bits . . 101

3.5 Switching Activity for Later Integrator Stages . . . . . . . 102

4 CIC Filter Combs . . . . . . . . . . . . . . . . . . . . . . . . . . 103

4.1 One’s Probability . . . . . . . . . . . . . . . . . . . . . . . 103

4.2 Switching Activity . . . . . . . . . . . . . . . . . . . . . . 104

4.3 Arbitrary Diﬀerential Delay Comb . . . . . . . . . . . . . 106

4.4 Switching Activity for Later Comb Stages . . . . . . . . . 108

5 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109

5.1 CIC Integrators . . . . . . . . . . . . . . . . . . . . . . . . 109

5.2 CIC Combs . . . . . . . . . . . . . . . . . . . . . . . . . . 110

6 Discussion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111

7 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112

References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 116

A Double Diﬀerential Delay Comb . . . . . . . . . . . . . . . . . . . 118

A.1 Switching Activity . . . . . . . . . . . . . . . . . . . . . . 118

C On the Implementation of Fractional-Delay Filters Based on

the Farrow Structure 123

1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126

1.1 Contribution of the Paper . . . . . . . . . . . . . . . . . . 126

1.2 Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127

2 Adjustable FD FIR Filters and the Farrow Structure . . . . . . 127

3 Scaling of the Farrow Structure for FD FIR Filters . . . . . . . 128

3.1 Scaling the Output Values of the Sub-Filters . . . . . . . 129

3.2 Scaling in the Polynomial Evaluation . . . . . . . . . . . . 130

4 Round-Oﬀ Noise . . . . . . . . . . . . . . . . . . . . . . . . . . . 133

4.1 Rounding and Truncation . . . . . . . . . . . . . . . . . . 134

4.2 Round-Oﬀ Noise in the Farrow Structure . . . . . . . . . 136

4.3 Approaches to Select Word Lengths . . . . . . . . . . . . 137

5 Implementation of Sub-Filters in the Farrow Structure . . . . . 143

6 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144

6.1 Scaling . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145

6.2 Word Length Selection Approaches . . . . . . . . . . . . . 146

6.3 Sub-Filter Implementation . . . . . . . . . . . . . . . . . . 148

7 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150

References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151


D Computational and Implementation Complexity of Polynomial

Evaluation Schemes 155

1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158

2 Polynomial Evaluation Algorithms . . . . . . . . . . . . . . . . . 159

2.1 Horner’s Scheme . . . . . . . . . . . . . . . . . . . . . . . 159

2.2 Dorn’s Generalized Horner Scheme . . . . . . . . . . . . . 159

2.3 Estrin’s Scheme . . . . . . . . . . . . . . . . . . . . . . . . 160

2.4 Munro and Paterson’s Scheme . . . . . . . . . . . . . . . 161

2.5 Maruyama’s Scheme . . . . . . . . . . . . . . . . . . . . . 162

2.6 Even Odd (EO) Scheme . . . . . . . . . . . . . . . . . . . 163

2.7 Li et al. Scheme . . . . . . . . . . . . . . . . . . . . . . . 164

2.8 Pipelining, Critical Path, and Latency . . . . . . . . . . . 164

3 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 165

4 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167

References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 172

E Low-Complexity Parallel Evaluation of Powers Exploiting Bit-

Level Redundancy 175

1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 178

2 Powers Evaluation of Binary Numbers . . . . . . . . . . . . . . . 178

2.1 Unsigned Binary Numbers . . . . . . . . . . . . . . . . . . 178

2.2 Powers Evaluation for Two’s Complement Numbers . . . 180

2.3 Partial Product Matrices with CSD Representation of

Partial Product Weights . . . . . . . . . . . . . . . . . . . 180

2.4 Pruning of the Partial Product Matrices . . . . . . . . . . 180

3 Proposed Algorithm for the Exploitation of Redundancy . . . . . 181

3.1 Unsigned Binary Numbers Case . . . . . . . . . . . . . . . 181

3.2 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . 182

3.3 Two’s Complement Input and CSD Encoding Case . . . . 182

4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183

5 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184

References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188

F Integer Linear Programming Modeling of Addition Sequences

With Additional Constraints for Evaluation of Power Terms 189

1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 192

2 Addition Chains and Addition Sequences . . . . . . . . . . . . . 193

3 Proposed ILP Model . . . . . . . . . . . . . . . . . . . . . . . . . 194

3.1 Basic ILP Model . . . . . . . . . . . . . . . . . . . . . . . 194

3.2 Minimizing Weighted Cost . . . . . . . . . . . . . . . . . 195

3.3 Minimizing Depth . . . . . . . . . . . . . . . . . . . . . . 196

4 Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 196

5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198

References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 201


G Switching Activity Estimation of DDFS Phase Accumulators 203

1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 206

2 One’s Probability . . . . . . . . . . . . . . . . . . . . . . . . . . . 207

3 Switching Activity . . . . . . . . . . . . . . . . . . . . . . . . . . 207

4 Conclusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209

References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211

Chapter 1

Introduction

Linear time-invariant (LTI) systems have the same sampling rate at the input, output, and inside the system. In applications involving systems operating

at diﬀerent sampling rates, there is a need to convert the given sampling rate to

the desired sampling rate, without destroying the signal information of interest.

The sampling rate conversion (SRC) factor can be an integer or a non-integer.

This chapter gives a brief overview of SRC and the role of ﬁltering in SRC.

Digital ﬁlters are ﬁrst introduced. The SRC, when changing the sampling

rate by an integer factor, is then explained. The time and frequency-domain

representations of the downsampling and upsampling operations are then given.

The concepts of decimation and interpolation, which include filtering, are explained. Six identities that enable reductions in the computational complexity of multirate systems are then described. A part of the chapter is devoted to

the efficient polyphase implementation of decimators and interpolators. Fractional-delay filters are then briefly reviewed. Finally, the power and energy

consumption in CMOS circuits is described.

1.1 Digital Filters

Digital filters are usually used to separate signals from noise, or signals in different frequency bands, by performing mathematical operations on a sampled,

discrete-time signal. A digital ﬁlter is characterized by its transfer function, or

equivalently, by its diﬀerence equation.

1.1.1 FIR Filters

If the impulse response is of ﬁnite duration and becomes zero after a ﬁnite

number of samples, it is a ﬁnite-length impulse response (FIR) ﬁlter. A causal


Figure 1.1: Direct-form realization of an N-th order FIR filter.

Figure 1.2: Transposed direct-form realization of an N-th order FIR filter.

FIR filter of order N is characterized by a transfer function H(z), defined as [1, 2]

H(z) = \sum_{k=0}^{N} h(k) z^{-k},   (1.1)

which is a polynomial in z^{-1} of degree N. The time-domain input-output relation of the above FIR filter is given by

y(n) = \sum_{k=0}^{N} h(k) x(n-k),   (1.2)

where y(n) and x(n) are the output and input sequences, respectively, and

h(0), h(1), . . . , h(N) are the impulse response values, also called ﬁlter coeﬃcients.

The parameter N is the filter order, and the total number of coefficients, N + 1, is the filter length. FIR filters can be designed to provide exact linear phase over the whole frequency range and are always bounded-input bounded-output (BIBO) stable, independent of the filter coefficients [1–3]. The direct

form structure in Fig. 1.1 is the block diagram description of the diﬀerence

equation (1.2). The transposed structure is shown in Fig. 1.2. The number of coefficient multiplications in the direct and transposed forms can be halved by exploiting the coefficient symmetry of linear-phase FIR filters. The total number of coefficient multiplications is then N/2 + 1 for even N and (N + 1)/2 for odd N.
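The difference equation (1.2) maps directly to code; a minimal sketch (not from the thesis), with assumed example coefficients:

```python
def fir_filter(h, x):
    """Direct-form FIR: y(n) = sum_{k=0}^{N} h(k) x(n-k), zero initial state."""
    N = len(h) - 1  # filter order
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k in range(N + 1):
            if n - k >= 0:
                acc += h[k] * x[n - k]
        y.append(acc)
    return y

# An impulse input reproduces the coefficients, i.e., the impulse response.
h = [0.25, 0.5, 0.25]          # assumed symmetric (linear-phase) coefficients
impulse = [1.0, 0.0, 0.0, 0.0]
print(fir_filter(h, impulse))  # [0.25, 0.5, 0.25, 0.0]
```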

1.1.2 IIR Filters

If the impulse response has an infinite duration, i.e., theoretically never approaches zero, it is an infinite-length impulse response (IIR) filter. This type

of filter is recursive and represented by a linear constant-coefficient difference equation as [1, 2]

y(n) = \sum_{k=0}^{N} b_k x(n-k) - \sum_{k=1}^{N} a_k y(n-k).   (1.3)

Figure 1.3: Direct-form I IIR realization.

The first sum in (1.3) is non-recursive and the second sum is recursive. These two parts can be implemented separately and connected together. The cascade connection of the non-recursive and recursive sections results in a structure called direct-form I, as shown in Fig. 1.3.

Compared with an FIR ﬁlter, an IIR ﬁlter can attain the same magnitude

speciﬁcation requirements with a transfer function of signiﬁcantly lower order [1].

The drawbacks are nonlinear phase characteristics, possible stability issues, and

sensitivity to quantization errors [1, 4].

1.2 Sampling Rate Conversion

Sampling rate conversion is the process of converting a signal from one sampling

rate to another, while changing the information carried by the signal as little as

possible [5–8]. SRC is utilized in many DSP applications where two signals or

systems having diﬀerent sampling rates need to be interconnected to exchange

digital signal data. The SRC factor can in general be an integer, a ratio of

two integers, or an irrational number. Mathematically, the SRC factor can be

deﬁned as

R =

F

out

F

in

, (1.4)

4 Chapter 1. Introduction

where F

in

and F

out

are the original input sampling rate and the new sampling

rate after the conversion, respectively. The sampling frequencies are chosen in

such a way that each of them exceeds at least two times the highest frequency

in the spectrum of original continuous-time signal. When a continuous-time

signal x

a

(t) is sampled at a rate F

in

, and the discrete-time samples are x(n) =

x

a

(n/F

in

), SRC is required when there is need of x(n) = x

a

(n/F

out

) and the

continuous-time signal x

a

(t) is not available anymore. For example, an analog-

to-digital (A/D) conversion system is supplying a signal data at some sampling

rate, and the processor used to process that data can only accept data at a

diﬀerent sampling rate. One alternative is to ﬁrst reconstruct the corresponding

analog signal and, then, re-sample it with the desired sampling rate. However,

it is more eﬃcient to perform SRC directly in the digital domain due to the

availability of accurate all-digital sampling rate conversion schemes.

SRC is available in two ﬂavors. For R < 1, the sampling rate is reduced and

this process is known as decimation. For R > 1, the sampling rate is increased

and this process is known as interpolation.

1.2.1 Decimation

Decimation by a factor of M, where M is a positive integer, can be performed

as a two-step process, consisting of an anti-aliasing ﬁltering followed by an

operation known as downsampling [9]. A sequence can be downsampled by a factor of M by retaining every M-th sample and discarding all of the remaining

samples. Applying the downsampling operation to a discrete-time signal, x(n),

produces a downsampled signal y(m) as

y(m) = x(mM). (1.5)

The time index m in the above equation is related to the old time index n

by a factor of M. The block diagram of the downsampling operation is shown in Fig. 1.4a. The sampling rate of the new discrete-time signal is M times smaller than the sampling rate of the original signal. The downsampling operation is a linear but time-varying operation: a delay of the original input signal by some number of samples does not result in the same delay of the downsampled signal. A signal downsampled by two different factors may yield output signals of two different shapes, but both carry the same information if the downsampling factors satisfy the sampling theorem criteria.
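The downsampling operation (1.5) itself is a simple stride; a one-line illustration (not from the thesis):

```python
def downsample(x, M):
    """y(m) = x(mM): keep every M-th sample, discard the rest."""
    return x[::M]

print(downsample(list(range(12)), 3))  # [0, 3, 6, 9]
```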

The frequency-domain representation of downsampling can be found by taking the z-transform of both sides of (1.5) as

Y(e^{jωT}) = \sum_{m=-∞}^{+∞} x(mM) e^{-jωTm} = \frac{1}{M} \sum_{k=0}^{M-1} X(e^{j(ωT - 2πk)/M}).   (1.6)

The above equation shows the implication of the downsampling operation on the spectrum of the original signal. The output spectrum is a sum of M uniformly shifted and stretched versions of X(e^{jωT}), scaled by a factor of 1/M. Signals that are bandlimited to π/M can thus be downsampled without distortion.

Figure 1.4: M-fold downsampler (a) and M-fold decimator (b).

Figure 1.5: Spectra of the intermediate and decimated sequence.

Decimation requires that aliasing be avoided. Therefore, the first step is to bandlimit the signal to π/M, and the second step is downsampling by a factor of M. The block diagram of a decimator is shown in Fig. 1.4b. The performance of a decimator is determined by the filter H(z), which is there to suppress the aliasing effect to an acceptable level. The spectra of the intermediate sequence and of the output sequence obtained after downsampling are shown in Fig. 1.5. The ideal filter, shown by the dotted line in Fig. 1.5, should be a lowpass filter with the stopband edge at ω_s T_1 = π/M.

1.2.2 Interpolation

Interpolation by a factor of L, where L is a positive integer, can be realized

as a two-step process of upsampling followed by an anti-imaging ﬁltering. The

upsampling by a factor of L is implemented by inserting L − 1 zeros between

two consecutive samples [9]. Applying the upsampling operation to a discrete-time signal x(n) produces an upsampled signal y(m) according to

y(m) = \begin{cases} x(m/L), & m = 0, ±L, ±2L, \ldots, \\ 0, & \text{otherwise}. \end{cases}   (1.7)

Figure 1.6: L-fold upsampler (a) and L-fold interpolator (b).

Figure 1.7: Spectra of the original, intermediate, and output sequences.

A block diagram shown in Fig. 1.6a is used to represent the upsampling operation. The upsampling operation increases the sampling rate of the original

signal by L times. The upsampling operation is a linear but time-varying operation. A delay in the original input signal by some samples does not result in the same delay of the upsampled signal. The frequency-domain representation of upsampling can be found by taking the z-transform of both sides of (1.7) as

Y(e^{jωT}) = \sum_{m=-∞}^{+∞} y(m) e^{-jωTm} = X(e^{jωTL}).   (1.8)

The above equation shows that the upsampling operation leads to L−1 images

of the spectrum of the original signal in the baseband.

Interpolation requires the removal of the images. Therefore, in the first step, upsampling by a factor of L is performed, and in the second step, the unwanted images are removed using an anti-imaging filter. The block diagram of an interpolator is shown in Fig. 1.6b. The performance of an interpolator is determined


Figure 1.8: Noble identities for decimation. (a) Noble identity 1. (b) Noble identity 3. (c) Noble identity 5.

by the filter H(z), which is there to remove the unwanted images. As shown in Fig. 1.7, the spectrum of the sequence x_1(m) not only contains the baseband of the original signal, but also the repeated images of the baseband. The desired sequence y(m) can be obtained from x_1(m) by removing these unwanted images, which is performed by the interpolation (anti-imaging) filter. The ideal filter should be a lowpass filter with the stopband edge at ω_s T_1 = π/L, as shown by the dotted line in Fig. 1.7.
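The two-step interpolation can likewise be sketched; illustrative code (not from the thesis), with an assumed triangular (linear-interpolation) anti-imaging filter for L = 2:

```python
def upsample(x, L):
    """Insert L - 1 zeros between consecutive samples, per (1.7)."""
    y = []
    for v in x:
        y.append(v)
        y.extend([0.0] * (L - 1))
    return y

def interpolate(x, L):
    # Assumed triangular filter for L = 2; an ideal design would be a
    # lowpass filter with stopband edge pi/L.
    h = [0.5, 1.0, 0.5]
    u = upsample(x, L)
    return [sum(h[k] * u[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(u))]

print(interpolate([1.0, 2.0, 3.0], 2))  # [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
```

Note that the output is the linearly interpolated input, delayed by half a sample by the filter.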

1.2.3 Noble Identities

The six identities, called the noble identities, help to move the downsampling and upsampling operations to a more desirable position to enable an efficient implementation structure. As a result, the arithmetic operations, additions and multiplications, are evaluated at the lowest possible sampling rate. In SRC, since filtering has to be performed at the higher sampling rate, the computational efficiency may be improved if downsampling (upsampling) operations are introduced into the filter structures. In the first and second identities, seen in Figs. 1.9a and 1.8a, moving the converters leads to the evaluation of additions and multiplications at the lower sampling rate. The third and fourth identities, seen in Figs. 1.9b and 1.8b, show that a delay of M (L) sampling periods at the higher sampling rate corresponds to a delay of one sampling period at the lower rate. The fifth and sixth identities, seen in Figs. 1.9c and 1.8c, are generalized versions of the third and fourth identities.

Figure 1.9: Noble identities for interpolation. (a) Noble identity 2. (b) Noble identity 4. (c) Noble identity 6.
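Noble identity 5 in Fig. 1.8c, for example, can be verified numerically: filtering with H(z^M) followed by M-fold downsampling gives the same output as downsampling followed by filtering with H(z). A small check (illustrative code with arbitrary example coefficients, not from the thesis):

```python
def fir(h, x):
    """Causal FIR convolution, truncated to len(x) output samples."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]

def expand(h, M):
    """Map H(z) to H(z^M) by inserting M - 1 zeros between coefficients."""
    out = []
    for c in h[:-1]:
        out.extend([c] + [0.0] * (M - 1))
    out.append(h[-1])
    return out

h, M = [1.0, 2.0, 3.0], 2
x = [float(n) for n in range(10)]
left = fir(expand(h, M), x)[::M]   # filter with H(z^M), then downsample by M
right = fir(h, x[::M])             # downsample by M, then filter with H(z)
assert left == right == [0.0, 2.0, 8.0, 20.0, 32.0]
```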

1.3 Polyphase Representation

A very useful tool in multirate signal processing is the so-called polyphase representation of signals and systems [5, 10]. It facilitates considerable simplifications of theoretical results as well as efficient implementations of multirate systems. To define it formally, consider an LTI system with a transfer function

H(z) = \sum_{n=-∞}^{+∞} h(n) z^{-n}.   (1.9)

For an integer M, H(z) can be decomposed as

H(z) = \sum_{m=0}^{M-1} z^{-m} \sum_{n=-∞}^{+∞} h(nM+m) z^{-nM} = \sum_{m=0}^{M-1} z^{-m} H_m(z^M).   (1.10)

The above representation is equivalent to dividing the impulse response h(n) into M non-overlapping groups of samples h_m(n), obtained from h(n) by M-fold decimation starting from sample m. The subsequences h_m(n) and the corresponding z-transforms defined in (1.10) are called the Type-1 polyphase components of H(z) with respect to M [10].
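The Type-1 split is simply a stride-M partition of the impulse response; a sketch (not from the thesis) checking that the M branches reassemble the original h(n):

```python
def polyphase(h, M):
    """Type-1 polyphase components of (1.10): h_m(n) = h(nM + m)."""
    return [h[m::M] for m in range(M)]

h, M = [1, 2, 3, 4, 5, 6, 7], 3
branches = polyphase(h, M)
print(branches)  # [[1, 4, 7], [2, 5], [3, 6]]

# Reassembly check: coefficient of z^{-(nM + m)} comes from branch m, sample n.
rebuilt = [0] * len(h)
for m, hm in enumerate(branches):
    for n, c in enumerate(hm):
        rebuilt[n * M + m] = c
assert rebuilt == h
```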

The polyphase decomposition is widely used, and its combination with the noble identities leads to efficient multirate implementation structures. Since each polyphase component contains M − 1 zeros between two consecutive samples and only the nonzero samples are needed for further processing, the zeros can be discarded, resulting in downsampled-by-M polyphase components. The polyphase components, as a result, operate at an M times lower sampling rate.

Figure 1.10: Polyphase implementation of a decimator. (a) Polyphase decomposition of a decimation filter. (b) Moving the downsamplers before the sub-filters and replacing the input structure by a commutator.

An efficient implementation of decimators and interpolators results if the filter transfer function is represented in polyphase decomposed form [10]. The filter H(z) in Fig. 1.4b is represented in its polyphase form as shown in Fig. 1.10a. A more efficient polyphase implementation and its equivalent commutator structure are shown in Fig. 1.10b. Similarly, the interpolation filter H(z) in Fig. 1.6b is represented in its polyphase form as shown in Fig. 1.11a. Its equivalent structure, more efficient in hardware implementation, is shown in Fig. 1.11b.

For FIR filters, the polyphase decomposition into low-order sub-filters is straightforward. For IIR filters, the polyphase decomposition is not as simple, but it is possible [10]. An IIR filter has a transfer function that is a ratio of two polynomials. Representing the transfer function in the form of (1.10) requires some modifications of the original transfer function such that the denominator is a function only of powers of z^M or z^L, where M and L

Figure 1.11: Polyphase implementation of an interpolator. (a) Polyphase decomposition of an interpolation filter. (b) Moving the upsamplers after the sub-filters and replacing the output structure by a commutator.

are the polyphase decomposition factors. Several approaches are available in the literature for the polyphase decomposition of IIR filters. In the first approach [11], the original IIR transfer function is re-arranged and transformed into (1.10). In the second approach, the polyphase sub-filters are distinct all-pass sub-filters [12–15].

In this thesis, only FIR filters and their polyphase implementation forms are considered for the sampling rate conversion.

1.4 Fractional-Delay Filters

Fractional-delay (FD) ﬁlters ﬁnd applications in, for example, mitigation of

symbol synchronization errors in digital communications [16–19], time-delay

estimation [20–22], echo cancellation [23], and arbitrary sampling rate conversion [24–26].

Figure 1.12: Phase-delay characteristics of FD filters designed for delay parameter d = {0.1, 0.2, . . . , 0.9}.

FD filters are used to provide a fractional delay adjustable to any

desired value. Ideally, the output y(n) of an FD ﬁlter for an input x(n) is given

by

y(n) = x(n −D), (1.11)

where D is a delay. Equation (1.11) is valid for integer values of D only. For non-integer values of D, (1.11) needs to be approximated. The delay parameter D can be expressed as

D = D_int + d,   (1.12)

where D_int is the integer part of D and d is the FD. The integer part of the delay can then be implemented as a chain of D_int unit delays. The FD d, however, needs

approximation. In the frequency domain, an ideal FD ﬁlter can be expressed as

H_des(e^{jωT}) = e^{-j(D_int + d)ωT}.   (1.13)

The ideal FD filter in (1.13) can be considered as all-pass with a linear-phase characteristic. The magnitude and phase responses are

|H_des(e^{jωT})| = 1   (1.14)

and

φ(ωT) = -(D_int + d)ωT.   (1.15)

Figure 1.13: Magnitude of FD filters designed for delay parameter d = {0.1, 0.2, . . . , 0.9}.

For the approximation of the ideal filter response in (1.11), a wide range of FD filters have been proposed based on FIR and IIR filters [27–34]. The phase-delay and magnitude characteristics of the FD filter based on first-order Lagrange interpolation [32, 35] are shown in Figs. 1.12 and 1.13. The underlying continuous-time signal x_a(nT), delayed by a fractional delay d, can be expressed as

y(nT) = x_a(nT - D_int T - dT),   (1.16)

and it is demonstrated for d = 0.2 and d = 0.4 in Fig. 1.14.
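For first-order Lagrange interpolation, the FD filter reduces to linear interpolation with taps 1 − d and d; a minimal sketch (not the thesis implementation), with the integer delay D_int taken as zero:

```python
def fd_linear(x, d):
    """First-order Lagrange FD filter: y(n) = (1 - d) x(n) + d x(n - 1),
    i.e. the input delayed by the fraction d of a sample (D_int = 0)."""
    return [(1 - d) * x[n] + d * (x[n - 1] if n > 0 else 0.0)
            for n in range(len(x))]

ramp = [float(n) for n in range(6)]
print(fd_linear(ramp, 0.25))  # [0.0, 0.75, 1.75, 2.75, 3.75, 4.75]
```

For a ramp input, the output samples (after the first) equal n − 0.25, i.e., the ramp delayed by a quarter of a sample.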

In the application of FD filters for SRC, the fractional delay d is changed at every instant an output sample occurs. The input and output rates will now be different. If the sampling rate is required to be increased by a factor of two, there are two output samples for each input sample. The fractional delay d then assumes the values 0 and 0.5 for each input sample, and the two corresponding output samples are computed. For a sampling rate increase by a factor of L, d takes on all values in {0, 1/L, . . . , (L − 1)/L} in a sequential manner, with a step size of 1/L. For each input sample, L output samples are generated. Equivalently, this can be interpreted as delaying the underlying continuous-time signal by L different values of the fractional delay.
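The d-cycling scheme for an integer rate increase can be sketched with a linear FD interpolator (illustrative code, not from the thesis; a practical design would use a higher-order FD filter):

```python
def fd_src(x, L):
    """Increase the sampling rate by an integer factor L with a linear FD
    interpolator: for each input interval, outputs for d = 0, 1/L, ..., (L-1)/L."""
    y = []
    for n in range(1, len(x)):
        for k in range(L):
            d = k / L
            y.append((1 - d) * x[n - 1] + d * x[n])
    return y

print(fd_src([0.0, 2.0, 4.0], 2))  # [0.0, 1.0, 2.0, 3.0]
```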

Figure 1.14: The underlying continuous-time signal delayed by d = 0.2 and d = 0.4 with FD filtering.

1.5 Power and Energy Consumption

Power is dissipated in the form of heat in digital CMOS circuits. The power

dissipation is commonly divided into three different sources [36, 37]:

- Dynamic or switching power consumption
- Short-circuit power consumption
- Leakage power consumption

The above sources are summarized in an equation as

P_avg = P_dynamic + P_short-circuit + P_leakage = α_{0→1} C_L V_DD² f_clk + I_sc V_DD + I_leakage V_DD.   (1.17)

The switching or dynamic power consumption is related to the charging and discharging of a load capacitance C_L through the PMOS and NMOS transistors during low-to-high and high-to-low transitions at the output, respectively. The total energy drawn from the power supply for a low-to-high transition, seen in Fig. 1.15a, is C_L V_DD², half of which is dissipated in the form of heat in the PMOS transistors while the other half is stored in the load capacitor. During the pull-down, high-to-low transition, seen in Fig. 1.15b, the energy stored on C_L, which is C_L V_DD²/2, is dissipated as heat by the NMOS transistors. If all these transitions occur at the clock rate f_clk, then the switching power consumption

Figure 1.15: A rising (a) and a falling (b) output transition on the CMOS inverter. The solid arrows represent the charging and discharging of the load capacitance C_L. The dashed arrow is for the leakage current.

is given by C_L V_DD² f_clk. However, the switching of the data is not always at the clock rate but rather at some reduced rate, which is best defined by another parameter, α_{0→1}, the average number of times in each cycle that a node makes a transition from low to high. All the parameters in the dynamic power equation, except α_{0→1}, are defined by the layout and specification of the circuit.
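The dynamic term of (1.17) is a direct product of the four parameters; a toy evaluation with assumed example values (not from the thesis):

```python
def dynamic_power(alpha, c_load, v_dd, f_clk):
    """Dynamic term of (1.17): alpha_{0->1} * C_L * V_DD^2 * f_clk."""
    return alpha * c_load * v_dd ** 2 * f_clk

# Assumed example numbers: 10% switching activity, 10 fF load,
# 1.2 V supply, 100 MHz clock.
p = dynamic_power(0.1, 10e-15, 1.2, 100e6)
print(p)  # about 1.44e-07 W, i.e. 0.144 microwatts
```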

The second power term is due to the direct-path short-circuit current, I_sc, which flows when both the PMOS and NMOS transistors are active simultaneously, resulting in a direct path from supply to ground.

Leakage power, on the other hand, is dissipated in the circuits when they are idle, as shown in Figs. 1.15a and 1.15b by a dashed line. The leakage current, I_leakage, consists of two major contributions: I_sub and I_gate. The term I_sub is the sub-threshold current caused by a low threshold voltage, when both the NMOS and PMOS transistors are off. The other quantity, I_gate, is the gate current caused by the reduced thickness of the gate oxide in deep sub-micron processes. The contribution of the reverse leakage currents, due to the reverse bias between diffusion regions and wells, is small compared to the sub-threshold and gate leakage currents.

In modern CMOS technologies, the two foremost forms of power consumption are dynamic and leakage. The relative contribution of these two forms has evolved greatly over time. Today, when technology scaling motivates reduced supply and threshold voltages, the leakage component of the power consumption has started to become dominant [38–40]. In today's processes, sub-threshold leakage is the main contributor to the leakage current.

Chapter 2

Finite Word Length Eﬀects

Digital ﬁlters are implemented in hardware with ﬁnite-precision numbers and

arithmetic. As a result, the digital ﬁlter coeﬃcients and internal signals are

represented in discrete form. This generally leads to two diﬀerent types of ﬁnite

word length eﬀects.

First, there are the errors in the representation of the coefficients. Representing the coefficients in finite precision (quantization) has the effect of slightly changing the locations of the filter poles and zeros. As a result, the filter frequency response differs from the response with infinite-precision coefficients. This error type is deterministic and is called coefficient quantization error.

Second, there are the errors due to multiplication round-off, which result from the rounding or truncation of multiplication products within the filter. The error at the filter output that results from these roundings or truncations is called round-off noise.

This chapter outlines the ﬁnite word length eﬀects in digital ﬁlters. It ﬁrst

discusses binary number representation forms. Diﬀerent types of ﬁxed-point

quantizations are then introduced along with their characteristics. The overﬂow

characteristics in digital ﬁlters are brieﬂy reviewed with respect to addition and

multiplication operations. Scaling operation is then discussed which is used

to prevent overﬂows in digital ﬁlter structures. The computation of round-

oﬀ noise at the digital ﬁlter output is then outlined. The description of the

constant coeﬃcient multiplication is then given. Finally, diﬀerent approaches

for the optimization of word length are reviewed.

2.1 Number Representation

In digital circuits, a number representation with a radix of two, i.e., binary

representation, is most commonly used. Therefore, a number is represented by


a sequence of binary digits, bits, which are either 0 or 1. A w-bit unsigned

binary number can be represented as

X = x_0 x_1 x_2 \ldots x_{w-2} x_{w-1},   (2.1)

with a value of

X = \sum_{i=0}^{w-1} x_i 2^{w-i-1},   (2.2)

where x_0 is the most significant bit (MSB) and x_{w-1} is the least significant bit (LSB) of the binary number.

A fixed-point number consists of an integer part and a fractional part, with the two parts separated by a binary point. The position of the binary point is almost always implied and thus the point is not explicitly shown.

If a fixed-point number has w_I integer bits and w_F fractional bits, it can be expressed as

X = x_{w_I-1} \ldots x_1 x_0 . x_{-1} x_{-2} \ldots x_{-w_F}.   (2.3)

The value can be obtained as

X = \sum_{i=0}^{w_I-1} x_i 2^i + \sum_{i=-w_F}^{-1} x_i 2^i.   (2.4)

2.1.1 Two’s Complement Numbers

For a suitable representation of numbers and an efficient implementation of arithmetic operations, fixed-point arithmetic with a word length of w bits is considered. Because of its special properties, the two's complement representation is considered, which is the most common type of arithmetic used in digital signal processing. The numbers are usually normalized to [−1, 1); however, to accommodate the integer bits, the range [−2^{w_I}, 2^{w_I}), where w_I ∈ N, is assumed. The quantity w_I denotes the number of integer bits. The MSB, the left-most bit of the word, is used as the sign bit. The sign bit is treated in the same manner as the other bits. The fractional part is represented with w_F = w − 1 − w_I bits. The quantization step is as a result ∆ = 2^{−w_F}.

If X_2C is a w-bit number in two's complement form, then by using all the definitions considered above, X can be represented as

X_2C = 2^{w_I} \left( -x_0 2^0 + \sum_{i=1}^{w-1} x_i 2^{-i} \right),   x_i ∈ {0, 1},   i = 0, 1, 2, \ldots, w-1,
     = -x_0 2^{w_I} + \sum_{i=1}^{w_I} x_i 2^{w_I-i} + \sum_{i=w_I+1}^{w-1} x_i 2^{w_I-i},

where the three terms are the sign bit, the integer part, and the fraction part, respectively, or in compact form as

X_2C = [ x_0 | x_1 x_2 \ldots x_{w_I} | x_{w_I+1} \ldots x_{w-1} ]_2,   (2.5)

with the groups being the sign bit, the integer bits, and the fraction bits.

In two's complement, the range of representable numbers is asymmetric. The largest number is

X_max = 2^{w_I} - 2^{-w_F} = [0|1 \ldots 1|1 \ldots 1]_2,   (2.6)

and the smallest number is

X_min = -2^{w_I} = [1|0 \ldots 0|0 \ldots 0]_2.   (2.7)
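The value formula behind (2.5)–(2.7) can be sketched at the bit level; illustrative code (not from the thesis) with assumed word-length parameters w_I = 2 and w_F = 3:

```python
def twos_comp_value(bits, w_int):
    """Value of a two's complement bit list [x0, x1, ..., x_{w-1}] with w_int
    integer bits: the sign bit x0 carries weight -2^{w_int}, per (2.5)."""
    val = -bits[0] * 2.0 ** w_int
    for i in range(1, len(bits)):
        val += bits[i] * 2.0 ** (w_int - i)
    return val

w_int = 2                                          # assumed: w = 6, w_F = 3
print(twos_comp_value([0, 1, 1, 1, 1, 1], w_int))  # largest:  2^2 - 2^-3 = 3.875
print(twos_comp_value([1, 0, 0, 0, 0, 0], w_int))  # smallest: -2^2      = -4.0
```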

2.1.2 Canonic Signed-Digit Representation

Signed-digit (SD) numbers differ from the binary representation, since the digits are allowed to take negative values, i.e., x_i ∈ {−1, 0, 1}. The symbol 1̄ is also used to represent −1. It is a redundant number system, as different SD representations of the same integer value are possible. The canonic signed-digit (CSD) representation is a special case of the signed-digit representation in that each number has a unique representation. The other feature of the CSD representation is that a CSD binary number has the fewest number of non-zero digits, with no consecutive digits being non-zero [41].

A number can be represented in CSD form as

X = \sum_{i=0}^{w-1} x_i 2^i,

where x_i ∈ {−1, 0, +1} and x_i x_{i+1} = 0 for i = 0, 1, \ldots, w − 2.
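A CSD representation can be obtained with the standard recoding that replaces each run of ones by a 1 followed by a −1; this particular algorithm is an assumption for illustration, not taken from the thesis:

```python
def to_csd(n, width):
    """Canonic signed-digit recoding of a non-negative integer n.
    Returns digits in {-1, 0, 1}, LSB first, so that sum d[i] * 2^i == n."""
    digits = []
    while n != 0:
        if n % 2 == 0:
            d = 0
        else:
            d = 2 - (n % 4)   # odd: +1 if n mod 4 == 1, -1 if n mod 4 == 3
        digits.append(d)
        n = (n - d) // 2
    return digits + [0] * (width - len(digits))

d = to_csd(7, 5)
print(d)  # [-1, 0, 0, 1, 0], i.e. 7 = 8 - 1
assert sum(di * 2 ** i for i, di in enumerate(d)) == 7
assert all(d[i] * d[i + 1] == 0 for i in range(len(d) - 1))  # no adjacent non-zeros
```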

2.2 Fixed-Point Quantization

Three types of fixed-point quantization are normally considered: rounding, truncation, and magnitude truncation [1, 42, 43]. The quantization operator is denoted by Q(·). For a number X, the rounded value is denoted by Q_r(X), the truncated value by Q_t(X), and the magnitude-truncated value by Q_mt(X). If the quantized value has w_F fractional bits, the quantization step size, i.e., the difference between adjacent quantization levels, is

∆ = 2^{-w_F}.   (2.8)

The rounding operation selects the quantization level that is nearest to the unquantized value. As a result, the rounding error is at most ∆/2 in magnitude, as shown in Fig. 2.1a. If the rounding error, ε_r, is defined as

ε_r = Q_r(X) - X,   (2.9)

Figure 2.1: Quantization error characteristics. (a) Rounding error. (b) Truncation error. (c) Magnitude truncation error.

then

-∆/2 ≤ ε_r ≤ ∆/2.   (2.10)

Truncation simply discards the LSBs, giving a quantized value that is always less than or equal to the exact value. The error characteristics in the case of truncation are shown in Fig. 2.1b. The truncation error is

-∆ < ε_t ≤ 0.   (2.11)

Magnitude truncation chooses the nearest quantized value that has a magnitude less than or equal to the exact value, as shown in Fig. 2.1c, which implies

-∆ < ε_mt < ∆.   (2.12)

The quantization error can often be modeled as a random variable that has a uniform distribution over the appropriate error range. Therefore, the filter calculations involving round-off errors can be treated as error-free calculations that have been corrupted by additive white noise [43]. The mean and variance of the rounding error are

m_r = \frac{1}{∆} \int_{-∆/2}^{∆/2} ε_r \, dε_r = 0   (2.13)

and

σ²_r = \frac{1}{∆} \int_{-∆/2}^{∆/2} (ε_r - m_r)² \, dε_r = \frac{∆²}{12}.   (2.14)

Similarly, for truncation, the mean and variance of the error are

m_t = -∆/2   and   σ²_t = ∆²/12,   (2.15)

and for magnitude truncation,

m_mt = 0   and   σ²_mt = ∆²/3.   (2.16)
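The statistics in (2.13) and (2.14) can be checked numerically by rounding a dense grid of values; an illustrative sketch (not from the thesis), where the printed mean and variance should be close to 0 and ∆²/12:

```python
delta = 2.0 ** -6          # quantization step for w_F = 6 fractional bits

def q_round(x):
    """Round x to the nearest multiple of delta."""
    return round(x / delta) * delta

# Sweep a dense grid of inputs covering 100 quantization intervals.
xs = [i * delta / 1000.0 for i in range(100000)]
errs = [q_round(x) - x for x in xs]

mean = sum(errs) / len(errs)
var = sum((e - mean) ** 2 for e in errs) / len(errs)
print(mean, var, delta ** 2 / 12)
```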

2.3 Overﬂow Characteristics

With a finite word length, it is possible for the arithmetic operations to overflow. This happens in fixed-point arithmetic, e.g., when two numbers of the same sign are added to give a value having a magnitude outside the interval [-2^{w_I}, 2^{w_I}). Since numbers outside this range are not representable, the result overflows.

The overflow characteristic of two's complement arithmetic can be expressed as

X_2C(X) = \begin{cases} X - 2^{w_I+1}, & X ≥ 2^{w_I}, \\ X, & -2^{w_I} ≤ X < 2^{w_I}, \\ X + 2^{w_I+1}, & X < -2^{w_I}, \end{cases}   (2.17)

and graphically it is shown in Fig. 2.2.

Figure 2.2: Overflow characteristics for two's complement arithmetic.
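The wrap-around in (2.17), applied repeatedly, is a modulo mapping into [−2^{w_I}, 2^{w_I}); a sketch for integer-valued inputs with an assumed w_I (illustrative, not from the thesis):

```python
def wrap_2c(x, w_int):
    """Two's complement overflow of (2.17): map x into [-2^w_int, 2^w_int)
    by (repeated) addition or subtraction of 2^(w_int + 1)."""
    span = 2 ** (w_int + 1)
    return (x + span // 2) % span - span // 2

w_int = 2                    # representable range [-4, 4)
print(wrap_2c(5, w_int))     # wraps to -3
print(wrap_2c(-5, w_int))    # wraps to 3
print(wrap_2c(3, w_int))     # in range: stays 3
```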

2.3.1 Two’s Complement Addition

In two's complement arithmetic, when two numbers of w bits each are added together, the result has w + 1 bits. To accommodate this extra bit, the number of integer bits needs to be extended. In two's complement, such overflows can be seen as discarding the extra bit, which corresponds to a repeated addition or subtraction of 2^{w_I+1} to make the (w + 1)-bit result representable in w bits. This model for overflow is illustrated in Fig. 2.3.

2.3.2 Two’s Complement Multiplication

In the case of multiplication of two fixed-point numbers of w bits each, the result has 2w bits. Overflow is treated here in the same way as for addition, by a repeated addition or subtraction of 2^{w_I+1}. Given two numbers, each with a precision of w_F bits, the product has a precision of 2w_F bits. The number is reduced to w_F bits of precision again by using rounding or truncation. The model to handle overflow in multiplication is shown in Fig. 2.4.

Figure 2.3: Addition in two's complement. The integer d ∈ Z has to be assigned a value such that y ∈ [-2^{w_I}, 2^{w_I}).

Figure 2.4: Multiplication in two's complement. The integer d ∈ Z has to be assigned a value such that y ∈ [-2^{w_I}, 2^{w_I}).

2.4 Scaling

To prevent overflow in fixed-point filter realizations, the signal levels inside the filter can be reduced by inserting scaling multipliers. However, the scaling multipliers should not distort the transfer function of the filter. Also, the signal levels should not be too low; otherwise, the signal-to-noise ratio (SNR) will suffer, as the noise level is fixed for fixed-point arithmetic.

The use of two's complement arithmetic eases the scaling, as repeated additions with an overflow are acceptable if the final sum lies within the proper signal range [4]. However, the inputs to non-integer multipliers must not overflow. In the literature, there exist several scaling norms that trade off the probability of overflow against the round-off noise level at the output.

In this thesis, only the commonly employed L_2-norm is considered, which for a Fourier transform H(e^{jωT}) is defined as

||H(e^{jωT})||_2 = sqrt( (1/(2π)) ∫_{−π}^{π} |H(e^{jωT})|^2 d(ωT) ).

In particular, if the input to a filter is Gaussian white noise with a certain probability of overflow, using L_2-norm scaling of a node inside the filter, or at the output, implies the same probability of overflow at that node.
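As an illustrative sketch (not part of the thesis text), the L_2-norm above can be computed in the time domain for an FIR transfer function, since by Parseval’s relation the frequency-domain integral equals the sum of the squared impulse-response coefficients:

```python
import math

def l2_norm(h):
    # Parseval: (1/2pi) * integral of |H(e^{jwT})|^2 over one period
    # equals sum(h[n]^2), so the L2-norm can be taken on the impulse response.
    return math.sqrt(sum(c * c for c in h))

# Scale a (hypothetical) node transfer function to unit L2-norm.
h = [0.25, 0.5, 0.25]
h_scaled = [c / l2_norm(h) for c in h]
```

The scaled coefficients give the node a unit L_2-norm, so the overflow probability at that node matches that of the input under the white-noise assumption.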


2.5 Round-Oﬀ Noise

A few assumptions need to be made before computing the round-oﬀ noise at the

digital ﬁlter output. Quantization noise is assumed to be stationary, white, and

uncorrelated with the ﬁlter input, output, and internal variables. This assump-

tion is valid if the ﬁlter input changes from sample to sample in a suﬃciently

random-like manner [43].

For a linear system with impulse response g(n), excited by white noise with mean m_x and variance σ_x^2, the mean and variance of the output noise are

m_y = m_x Σ_{n=−∞}^{∞} g(n)   (2.18)

and

σ_y^2 = σ_x^2 Σ_{n=−∞}^{∞} g^2(n),   (2.19)

where g(n) is the impulse response from the point where a round-oﬀ takes place

to the ﬁlter output. In case there is more than one source of roundoﬀ error

in the ﬁlter, the assumption is made that these errors are uncorrelated. The

round-oﬀ noise variance at the output is the sum of contributions from each

quantization error source.

2.6 Word Length Optimization

As stated earlier in the chapter, the quantization process introduces round-off errors, which in turn determine the accuracy of an implementation. The cost of an implementation generally has to be minimized, while still satisfying the system specification in terms of implementation accuracy. Excessive bit-width allocation will waste valuable hardware resources, while insufficient bit-width allocation will result in overflows and violate precision requirements. The word length optimization approach trades precision for VLSI measures such as area, power, and speed. These are the measures, or costs, by which the performance of a design is evaluated. After word length optimization, the hardware implementation of an algorithm will typically be efficient, involving a variety of finite precision representations of different sizes for the internal variables.

The first difficulty in the word length optimization problem is defining the relationship of word length to the considered VLSI measures. Possible ways are closed-form expressions or precomputed values in the form of a table of these measures as a function of word lengths. These closed-form expressions and precomputed values are then used by the word length optimization algorithm in the word length assignment phase to obtain an estimate before doing an actual VLSI implementation.


The round-off noise at the output is considered as the performance measure because it is the primary concern of many algorithm designers. The round-off error is a decreasing function of word length, while VLSI measures such as area, speed, and power consumption are increasing functions of word length. To derive a round-off noise model, an LTI system with n quantization error sources is assumed. This assumption makes it possible to use superposition of independent noise sources to compute the overall round-off noise at the output.

The noise variance at the output is then written as

σ_o^2 = Σ_{i=1}^{n} σ_{e_i}^2 Σ_{k=0}^{∞} h_i^2(k),   (2.20)

where e_i is the quantization error source at node i and h_i(k) is the impulse response from node i to the output. If the quantization word length w_i is assumed for the error source at node i, then

σ_{e_i}^2 = 2^{−2w_i}/12, 1 ≤ i ≤ n.   (2.21)
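A small numerical sketch of (2.20) and (2.21); the word lengths and impulse responses below are made-up illustrations, not values from the thesis:

```python
def error_variance(w):
    # quantization noise variance of one source, (2.21): 2^(-2w)/12
    return 2.0 ** (-2 * w) / 12.0

def output_noise_variance(word_lengths, impulse_responses):
    # (2.20): each source i contributes sigma_ei^2 * sum_k h_i(k)^2,
    # and the uncorrelated sources add up at the output
    return sum(error_variance(w) * sum(x * x for x in h)
               for w, h in zip(word_lengths, impulse_responses))

# two hypothetical noise sources with different word lengths and output paths
sigma2_o = output_noise_variance([8, 10], [[1.0, 0.5, 0.25], [1.0]])
```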

The formulation of an optimization problem is done by constraining the cost or accuracy of an implementation while optimizing the other metric(s). For simplicity, the cost function is assumed to be the area of the design. Its value is measured appropriately for the considered technology, and it is assumed to be a function of the quantization word lengths, f(w_i), i = 1, 2, ..., n. The performance function, on the other hand, is taken to be the round-off noise at the output, given in (2.20), due to the limiting of internal word lengths. As a result, one possible formulation of the word length optimization problem is

minimize area: f(w_i), 1 ≤ i ≤ n,   (2.22)
s.t. σ_o^2 ≤ σ_spec,   (2.23)

where σ_spec is the required noise specification at the output.
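One possible search strategy for (2.22)–(2.23) is a greedy descent: start from a generous, safe allocation and repeatedly shorten word lengths while the noise specification still holds. This is a hypothetical sketch with a toy area model f(w) = Σ w_i, not any cost model used in the thesis:

```python
def noise(ws, gains):
    # output round-off noise as in (2.20)-(2.21): sum of 2^(-2w)/12 times path gain
    return sum(2.0 ** (-2 * w) / 12.0 * g for w, g in zip(ws, gains))

def area(ws):
    # toy cost model f(w_i): area proportional to the total number of bits
    return sum(ws)

def optimize(gains, sigma_spec, w_max=24):
    ws = [w_max] * len(gains)              # start from a generous, safe allocation
    improved = True
    while improved:
        improved = False
        for i in range(len(ws)):           # try removing one bit at a time
            ws[i] -= 1
            if ws[i] > 0 and noise(ws, gains) <= sigma_spec:
                improved = True            # reduction kept: spec still met
            else:
                ws[i] += 1                 # revert: spec violated (or width exhausted)
    return ws

ws = optimize([1.3125, 1.0], sigma_spec=1e-6)
```

Greedy search is only one of the strategies surveyed in [44–48]; it finds a feasible allocation, not necessarily the global optimum.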

The problem of word length optimization has received considerable research

attention. In [44–48], diﬀerent search-based strategies are used to ﬁnd suitable

word length combinations. In [49], the word length allocation problem is solved

using a mixed-integer linear programming formulation. Some other approaches,

e.g., [50–52], have constrained the cost, while optimizing the other metric(s).

2.7 Constant Multiplication

A multiplication with a constant coefficient, commonly used in DSP algorithms such as digital filters [53], can be made multiplierless by using only additions, subtractions, and shifts [54]. The complexity of adders and subtracters is roughly the same, so no differentiation between the two is normally made. A shift operation in this context is used to implement a multiplication by a


Figure 2.5: Different realizations of multiplication with the coefficient 45. The symbol ≪i is used to represent i left shifts.

factor of two. Most of the work in the literature has focused on minimizing

the adder cost [55–57]. For bit parallel arithmetic, the shifts can be realized

without any hardware using hardwiring.

In constant coefficient multiplication, the hardware requirements depend on the coefficient value, e.g., the number of ones in the binary representation of the coefficient. Constant coefficient multiplication can be implemented by a method based on the CSD representation of the constant coefficient [58], or more efficiently by using other structures that require fewer operations [59]. Consider, for example, the coefficient 45, with the CSD representation 101̄01̄01 (where 1̄ denotes −1). The multiplication with this constant can be realized by three different structures as shown in Fig. 2.5, varying with respect to the number of additions and shifts required [41].
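As a concrete sketch of such multiplierless realizations (illustrative, not the exact structures of Fig. 2.5), multiplication by 45 can be written with shifts and additions only; the plain binary form 101101 needs three additions, while a shared sub-expression form 45 = (1 + 4)(1 + 8) needs only two:

```python
def mul45_binary(x):
    # 45 = 32 + 8 + 4 + 1: three additions
    return (x << 5) + (x << 3) + (x << 2) + x

def mul45_shared(x):
    # reuse the partial result t = 5x: 45x = 5x + 8*(5x), two additions
    t = x + (x << 2)      # t = 5x
    return t + (t << 3)   # 5x + 40x = 45x
```

For bit-parallel arithmetic the shifts are free (hardwired), so the adder count is the dominant cost.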

In some applications, one signal is required to be multiplied by several constant coefficients, as in the case of the transposed direct-form FIR filter shown in Fig. 1.2. Realizing the set of products of a single multiplicand is known as the multiplier block problem [60] or the multiple constant multiplications (MCM) problem [61]. A simple way to implement multiplier blocks is to realize each multiplier separately. However, they can be implemented more efficiently by using structures that remove redundant partial results among the coefficients and thereby reduce the overall number of operations. The MCM algorithms can be divided into three groups based on the approach used: sub-expression sharing [61–65], difference methods [66–70], and graph-based methods [60, 71–73].

The MCM concepts can be further generalized to computations involving

multiple inputs and multiple outputs. This corresponds to a matrix-vector


multiplication with a matrix with constant coeﬃcients. This is the case for

linear transforms such as the discrete cosine transform (DCT) or the discrete

Fourier transform (DFT), but also FIR ﬁlter banks [74], polyphase decomposed

FIR ﬁlters [75], and state space digital ﬁlters [4, 41]. Matrix-vector MCM

algorithms include [76–80].

Chapter 3

Integer Sampling Rate Conversion

The role of ﬁlters in sampling rate conversion has been described in Chapter 1.

This chapter mainly focuses on the implementation aspects of decimators and interpolators, due to the fact that filtering initially has to be performed on the higher sampling rate side. The goal is then to achieve conversion structures allowing the arithmetic operations to be performed at the lower sampling rate. The overall computational workload is thereby reduced.

The chapter begins with the basic decimation and interpolation structures

that are based on FIR ﬁlters. A part of the chapter is then devoted to the

eﬃcient polyphase implementation of FIR decimators and interpolators. The

concept of the basic cascade integrator-comb (CIC) ﬁlter is introduced and

its properties are discussed. The structures of the CIC-based decimators and

interpolators are then shown. The overall two-stage implementation of a CIC

and an FIR ﬁlter that is used for the compensation is described. Finally, the

non-recursive implementation of the CIC ﬁlter and its multistage and polyphase

implementation structures are presented.

3.1 Basic Implementation

Filters in SRC have an important role and are used as anti-aliasing ﬁlters in

decimators or anti-imaging ﬁlters in interpolators. The characteristics of these

filters then determine the overall performance of a decimator or an interpolator. As discussed in Chapter 1, filtering operations need to be performed at the higher sampling rate. In decimation, the filtering operation precedes downsampling, and in interpolation, it follows upsampling. However, the downsampling or upsampling operations can be moved into the filtering structures, allowing the arithmetic operations to be performed at the lower sampling rate. As a result, the overall computational workload in the sampling rate conversion system can potentially be reduced by the conversion factor M (L).




Figure 3.1: Cascade of direct-form FIR ﬁlter and downsampler.


Figure 3.2: Computationally efficient direct-form FIR decimation structure.

In the basic approach [81], FIR ﬁlters are considered as anti-aliasing or anti-

imaging ﬁlters. The non-recursive nature of FIR ﬁlters provides the opportunity

to improve the eﬃciency of FIR decimators and interpolators through polyphase

decomposition.

3.1.1 FIR Decimators

In the basic implementation of an FIR decimator, a direct-form FIR filter is cascaded with the downsampling operation as shown in Fig. 3.1. The FIR filter acts as the anti-aliasing filter. The implementation can be modified to a form that is computationally more efficient, as seen in Fig. 3.2. The downsampling operation precedes the multiplications and additions, which results in the arithmetic operations being performed at the lower sampling rate.

In Fig. 3.2, the input data is now read simultaneously from the delays at

every M:th instant of time. The input sample values are then multiplied by the

ﬁlter coeﬃcients h(n) and combined together to give y(m).

In the basic implementation of the FIR decimator in Fig. 3.1, the number of multiplications per input sample in the decimator is equal to the FIR filter length N + 1. The first modification, made by use of the noble identity as seen in Fig. 3.2, reduces the number of multiplications per input sample to (N + 1)/M. The symmetry of the coefficients may be exploited in the case of a linear-phase FIR filter to reduce the number of multiplications to (N + 1)/2 for odd N and N/2 + 1 for even N. In the second modification, for a downsampling factor M, the number of multiplications per input sample is (N + 2)/(2M) for even N. Similarly, the



Figure 3.3: Cascade of upsampler and transposed direct-form FIR ﬁlter.


Figure 3.4: Computationally efficient transposed direct-form FIR interpolation

structure.

multiplications per input sample are (N + 1)/(2M) for odd N.

3.1.2 FIR Interpolators

The interpolator is a cascade of an upsampler followed by an FIR filter that acts as an anti-imaging filter, as seen in Fig. 3.3. Similar to the case of the decimator, the upsampling operation is moved into the filter structure at the desired position such that the multiplication operations are performed on the lower sampling-rate side, as shown in Fig. 3.4.

In the basic implementation of the FIR interpolator in Fig. 3.3, every L:th input sample to the filter is non-zero. Since there is no need to multiply the filter coefficients by the zero-valued samples, multiplications only need to be performed at the sampling rate of the input signal. The result is then upsampled by the desired factor L. This modified implementation approach reduces the multiplication count per output sample from N + 1 to (N + 1)/L. Similar to the decimator case, the coefficient symmetry of a linear-phase FIR filter can be exploited to reduce the number of multiplications further by a factor of two.

3.2 Polyphase FIR Filters

As described earlier in Sec. 3.1, the transfer function of an FIR ﬁlter can be

decomposed into parallel structures based on the principle of polyphase decom-

position. For a factor of M polyphase decomposition, the FIR ﬁlter transfer



Figure 3.5: Polyphase factor of M FIR decimator structure.

function is decomposed into M low-order transfer functions called the polyphase components [5, 10]. The individual contributions from these polyphase components, when added together, have the same effect as the original transfer function.

If the impulse response h(n) is assumed to be zero outside 0 ≤ n ≤ N, then the factors H_m(z) in (1.10) become

H_m(z) = Σ_{n=0}^{(N+1)/M} h(nM + m) z^{−n}, 0 ≤ m ≤ M − 1.   (3.1)

3.2.1 Polyphase FIR Decimators

The basic structure of the polyphase decomposed FIR filter is the same as that in Fig. 1.10a; however, the sub-filters H_m(z) are now specifically defined in (3.1). The arithmetic operations in the implementation of all FIR sub-filters still operate at the input sampling rate. The downsampling operation at the output can be moved into the polyphase branches by using the more efficient polyphase decimation implementation structure in Fig. 1.10. As a result, the arithmetic operations inside the sub-filters now operate at the lower sampling rate. The overall computational complexity of the decimator is reduced by a factor of M.
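The polyphase decimation identity can be checked numerically: filtering at the high rate and keeping every M-th output sample equals summing the branch outputs, where branch m holds h_m(n) = h(nM + m) from (3.1) and is fed x(nM − m). A plain-Python sketch (illustrative only):

```python
def conv(h, x):
    # plain linear convolution
    y = [0.0] * (len(h) + len(x) - 1)
    for i, hi in enumerate(h):
        for j, xj in enumerate(x):
            y[i + j] += hi * xj
    return y

def polyphase_decimate(h, x, M):
    # branch m holds h_m(n) = h(nM + m) and is fed x_m(n) = x(nM - m),
    # i.e. the input delayed by m and then downsampled by M
    n_out = (len(h) + len(x) - 1 + M - 1) // M
    y = [0.0] * n_out
    for m in range(M):
        hm = h[m::M]
        xm = ([0.0] * m + x)[::M]
        for i, v in enumerate(conv(hm, xm)):
            if i < n_out:
                y[i] += v
    return y

M = 3
h = [1.0, 2.0, 3.0, 4.0, 5.0]
x = [float(v) for v in range(10)]
ref = conv(h, x)[::M]                      # filter at high rate, keep every M-th
out = polyphase_decimate(h, x, M)[:len(ref)]
```

Each branch convolution here runs on a sequence that is M times shorter, which is exactly where the factor-of-M workload reduction comes from.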

The input sequences to the polyphase sub-filters are combinations of delayed and downsampled values of the input, which can be selected directly from the input with the use of a commutator. The resulting polyphase architecture for a factor-of-M decimation is shown in Fig. 3.5. The commutator operates at the



Figure 3.6: Polyphase factor of L FIR interpolator structure.

input sampling rate but the sub-ﬁlters operate at the output sampling rate.

3.2.2 Polyphase FIR Interpolators

The polyphase implementation of an FIR filter, by a factor L, is done by decomposing the original transfer function into L polyphase components, also called polyphase sub-filters. The basic structure of the polyphase decomposed FIR filter is the same as that in Fig. 1.11a. The polyphase sub-filters H_m(z) are defined in (3.1) for the FIR filter case. In the basic structure, the arithmetic operations in the implementation of all FIR sub-filters operate at the upsampled rate. The upsampling operation at the input is moved into the polyphase branches by using the more efficient polyphase interpolation implementation structure in Fig. 1.11. A more efficient interpolation structure results because the arithmetic operations inside the sub-filters now operate at the lower sampling rate. The overall computational complexity of the interpolator is reduced by a factor of L.

On the output side, a combination of upsamplers, delays, and adders is used to feed the correct interpolated output sample. The same function can be realized directly by using a commutator switch. The resulting architecture for a factor of L interpolator is shown in Fig. 3.6.

3.3 Multistage Implementation

A multistage decimator with K stages, seen in Fig. 3.7, can be used when the decimation factor M can be factored into a product of integers, M = M_1 M_2 ⋯ M_K, instead of using a single filter and a factor-of-M downsampler.

Similarly, a multistage interpolator with K stages, seen in Fig. 3.8, can be used if



Figure 3.7: Multistage implementation of a decimator.


Figure 3.8: Multistage implementation of an interpolator.

the interpolation factor L can be factored as L = L_1 L_2 ⋯ L_K. The noble identities can be used to transform the multistage decimator implementation into the equivalent single-stage structure shown in Fig. 3.9. Similarly, the single-stage equivalent of the multistage interpolator is shown in Fig. 3.10. An optimum realization depends on the proper selection of K and J and the best ordering of the multistage factors [82].

Multistage implementations are used when large sampling rate conversion factors need to be implemented [1, 82]. Compared to a single-stage implementation, the multistage approach relaxes the specifications of the individual filters. A single-stage implementation with a large decimation (interpolation) factor requires a filter with a very narrow passband, which is demanding from a complexity point of view [3, 81, 82].

3.4 Cascaded Integrator Comb Filters

An efficient architecture for a high decimation-rate filter is the cascaded integrator-comb (CIC) filter introduced by Hogenauer [83]. The CIC filter has proven to be an effective element in high-decimation or interpolation systems [84–87]. The simplicity of implementation makes CIC filters suitable for operating at high frequencies. A single-stage CIC filter is shown in Fig. 3.11. It is a cascade of integrator and comb sections. The feed-forward section of the CIC filter, with differential delay R, is the comb section, and the feedback section is called the integrator. The transfer function of a single-stage CIC filter is given as

H(z) = (1 − z^{−R})/(1 − z^{−1}) = Σ_{n=0}^{R−1} z^{−n}.   (3.2)
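Equation (3.2) says the recursive integrator–comb cascade equals a length-R moving sum. A quick numerical sketch of that identity (illustrative, using exact integer arithmetic as the text requires):

```python
def cic_recursive(x, R):
    # comb (1 - z^-R) followed by the integrator 1/(1 - z^-1)
    comb = [v - (x[n - R] if n >= R else 0) for n, v in enumerate(x)]
    y, acc = [], 0
    for c in comb:
        acc += c              # integrator: running sum of the comb output
        y.append(acc)
    return y

def moving_sum(x, R):
    # direct form of (3.2): y(n) = x(n) + x(n-1) + ... + x(n-R+1)
    return [sum(x[max(0, n - R + 1):n + 1]) for n in range(len(x))]

x = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
assert cic_recursive(x, 4) == moving_sum(x, 4)
```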

Figure 3.9: Single-stage equivalent of a multistage decimator: H_1(z) H_2(z^{M_1}) H_3(z^{M_1 M_2}) ⋯ H_J(z^{M_1 M_2 ⋯ M_{J−1}}), followed by downsampling by M_1 M_2 ⋯ M_J.


Figure 3.10: Single-stage equivalent of a multistage interpolator: upsampling by L_1 L_2 ⋯ L_J, followed by H_1(z) H_2(z^{L_1}) H_3(z^{L_1 L_2}) ⋯ H_J(z^{L_1 L_2 ⋯ L_{J−1}}).


Figure 3.11: Single-stage CIC ﬁlter.

The comb filter has R zeros that are equally spaced around the unit circle. The zeros are the R-th roots of unity, located at z(k) = e^{j2πk/R}, where k = 0, 1, 2, ..., R − 1. The integrator section, on the other hand, has a single pole at z = 1. CIC filters are based on the fact that perfect pole/zero canceling can be achieved, which is only possible with exact integer arithmetic [88]. The existence of integrator stages will lead to overflows. However, this is of no harm if two’s complement arithmetic is used and the range of the number system equals or exceeds the maximum magnitude expected at the output of the composite CIC filter. The use of two’s complement and non-saturating arithmetic accommodates the issues with overflow. With the two’s complement wraparound property, the comb section following the integrator will compute the correct result at the output.

The frequency response of the CIC filter, obtained by evaluating H(z) on the unit circle, z = e^{jωT} = e^{j2πf}, is given by

H(e^{jωT}) = [sin(ωTR/2)/sin(ωT/2)] e^{−jωT(R−1)/2}.   (3.3)

The gain of the single-stage CIC filter at ωT = 0 is equal to the differential delay, R, of the comb section.
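The magnitude of (3.3) can be evaluated directly; note the limit sin(ωTR/2)/sin(ωT/2) → R as ωT → 0, i.e., the DC gain equals R, and the nulls sit at multiples of 2π/R. A small sketch:

```python
import math

def cic_gain(wT, R):
    # magnitude of (3.3); the limit at wT = 0 (and multiples of 2*pi) is R
    if abs(math.sin(wT / 2)) < 1e-12:
        return float(R)
    return abs(math.sin(wT * R / 2) / math.sin(wT / 2))

R = 8
dc = cic_gain(0.0, R)                 # DC gain: the differential delay R
null = cic_gain(2 * math.pi / R, R)   # a comb null (zero of the comb section)
```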

3.4.1 CIC Filters in Interpolators and Decimators

In hardware implementations of decimation and interpolation, cascaded integrator-comb (CIC) filters are frequently used as computationally efficient narrowband lowpass filters. These lowpass filters are well suited to improve the efficiency of anti-aliasing filtering prior to decimation or anti-imaging filtering after interpolation. In both of these applications, CIC filters have to operate at a very high sampling rate. CIC filters are generally more convenient for large conversion factors due to their small lowpass bandwidth. In multistage decimators with a large conversion factor, the CIC filter is best suited for the first decimation stage, whereas in interpolators, the comb filter is adopted for the last interpolation stage.



Figure 3.12: Single stage CIC decimation ﬁlter with D = R/M.

The cascade connection of the integrator and comb in Fig. 3.11 can be interchanged, as both are linear time-invariant operations. In a decimator structure, the integrator and comb are used as the first and the second section of the CIC filter, respectively. The structure also includes the decimation factor M. The sampling rate at the output as a result of this operation is F_in/M, where F_in is the input sampling rate. However, an important aspect of CIC filters is the spectral aliasing resulting from the downsampling operation. The spectral bands centered at integer multiples of 2/D will alias directly into the desired band after the downsampling operation. Similarly, a CIC filter with an upsampling factor L is used for interpolation purposes. The sampling rate as a result of this operation is F_in L. However, imperfect filtering gives rise to unwanted spectral images. The filtering characteristics for anti-aliasing and anti-image attenuation can be improved by increasing the number of stages of the CIC filter.

The comb and integrator sections are both linear time-invariant and can follow or precede each other. However, it is generally preferred to place the comb part on the side which has the lower sampling rate, as this reduces the length of the delay line, which is basically the differential delay of the comb section. When the decimation factor M is moved into the comb section using the noble identity, and considering R = DM, the resulting architecture is the most common implementation of CIC decimation filters, shown in Fig. 3.12. The delay length is reduced to D = R/M, which is now considered as the differential delay of the comb part. One advantage is that the storage requirement is reduced; the comb section also now operates at a reduced clock rate. Both of these factors result in hardware savings and lower power consumption. Typically, the differential delay D is restricted to 1 or 2. The value of D also decides the number of nulls in the frequency response of the decimation filter. Similarly, the interpolation factor L is moved into the comb section using the noble identity and considering R = DL. The resulting interpolation structure is shown in Fig. 3.13. The delay length is reduced to D = R/L. Similar advantages can be achieved as in the case of the CIC decimation filter. The important thing to note is that the integrator section operates at the higher sampling rate in both cases.

The transfer function of the K-stage CIC filter is

H(z) = H_I^K(z) H_C^K(z) = [(1 − z^{−D})/(1 − z^{−1})]^K.   (3.4)

In order to modify the magnitude response of the CIC ﬁlter, there are only two



Figure 3.13: Single stage CIC interpolation ﬁlter with D = R/L.

parameters: the number of CIC filter stages and the differential delay D. The natural nulls of the CIC filter provide maximum alias rejection, but the aliasing bandwidths around the nulls are narrow and usually not enough to provide sufficient alias rejection in the entire baseband of the signal. The advantage of an increased number of stages is improved attenuation characteristics, but the passband droop then normally needs to be compensated.

3.4.2 Polyphase Implementation of Non-Recursive CIC

The polyphase decomposition of the non-recursive CIC ﬁlter transfer function

may achieve lower power consumption than the corresponding recursive imple-

mentation [89, 90]. The basic non-recursive form of the transfer function is

H(z) = (1/D) Σ_{n=0}^{D−1} z^{−n}.   (3.5)

The challenge in achieving low power consumption lies in those stages of the filter which normally operate at the high input sampling rate. If the overall conversion factor D is a power of two, D = 2^J, then the transfer function can be expressed as

H(z) = Π_{i=0}^{J−1} (1 + z^{−2^i}).   (3.6)

As a result, the original converter can be transformed into a cascade of J factor-of-two converters, each having a non-recursive sub-filter (1 + z^{−1})^K and a factor-of-two conversion.
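The factorization (3.6) can be verified with polynomial coefficients: the product of the J factors (1 + z^{−2^i}) equals the length-2^J moving-sum filter. A sketch:

```python
def poly_mul(a, b):
    # multiply two polynomials in z^-1 given as coefficient lists
    c = [0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            c[i + j] += ai * bj
    return c

def cascade_factors(J):
    # coefficients of prod_{i=0}^{J-1} (1 + z^{-2^i}), as in (3.6)
    h = [1]
    for i in range(J):
        h = poly_mul(h, [1] + [0] * (2 ** i - 1) + [1])   # 1 + z^{-2^i}
    return h

assert cascade_factors(3) == [1] * 8   # length-8 moving sum for D = 2^3
```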

A single-stage non-recursive CIC decimation filter is shown in Fig. 3.14. In the case of a multistage implementation, one advantage is that only the first decimation stage operates at the high input sampling rate. The sampling rate is successively reduced by two after each decimation stage, as shown in Fig. 3.15. Further, polyphase decomposition by a factor of two at each stage can be used to reduce the sampling rate by an additional factor of two. The second advantage of the non-recursive structures is that there are no register overflow problems, if properly scaled. The register word length at any stage j, for an input word length w_in, is limited to w_in + Kj.

In an approach proposed in [89], the overall decimator is decomposed into a first-stage non-recursive filter with the decimation factor J_1, followed by a cascade of non-recursive (1 + z^{−1})^K filters, each with a factor-of-2 decimation. The


Figure 3.14: Non-recursive implementation of a CIC decimation filter: (1 + z^{−1} + z^{−2} + ⋯ + z^{−(D−1)})^K followed by downsampling by D.

Figure 3.15: Cascade of J factor-of-two decimation filters, each a (1 + z^{−1})^K filter followed by downsampling by two.

polyphase decomposition along with noble identities is used to implement all

these sub-ﬁlters.

3.4.3 Compensation Filters

The frequency magnitude response envelope of a CIC filter is of sin(x)/x type; therefore, a compensation filter is normally used after the CIC filter. The purpose of the compensation filter is to compensate for the non-flat passband of the CIC filter.

Several approaches are available in the literature [91–95] for CIC filter sharpening to improve the passband and stopband characteristics. They are based on the method proposed earlier in [96]. A two-stage sharpened comb decimation structure is proposed in [93–95, 97–100] for the case where the sampling rate conversion factor is expressible as the product of two integers. The transfer function can then be written as the product of two filter sections, and the role of each filter section is defined: one section may be used to provide sharpening, while the other provides stopband attenuation, by selecting an appropriate number of stages in each section.

Chapter 4

Non-Integer Sampling Rate Conversion

The need for non-integer sampling rate conversion may arise when two systems operating at different sampling rates have to be connected. This chapter gives a concise overview of SRC by a rational factor and its implementation details.

The chapter ﬁrst introduces the SRC by a rational factor. The technique

for constructing eﬃcient SRC by a rational factor based on FIR ﬁlters and

polyphase decomposition is then presented. In addition, the sampling rate al-

teration with an arbitrary conversion factor is described. The polynomial-based

approximation of the impulse response of a resampler model is then presented.

Finally, the implementation of fractional-delay ﬁlters based on the Farrow struc-

ture is considered.

4.1 Sampling Rate Conversion: Rational Factor

The two basic operators, the downsampler and the upsampler, can change the sampling rate of a discrete-time signal by an integer factor only. Therefore, a sampling rate change by a rational factor requires a cascade of upsampler and downsampler operators: upsampling by a factor of L, followed by downsampling by a factor of M [9, 82]. In some cases, it may be beneficial to change the order of these operators, which is only possible if L and M are relatively prime, i.e., M and L do not share a common divisor greater than one. A cascade of these sampling rate alteration devices results in a sampling rate change by the rational factor L/M. The realizations shown in Fig. 4.1 are equivalent if the factors L and M are relatively prime.

4.1.1 Small Rational Factors

The classical way of implementing the ratio L/M is upsampling by a factor of L, followed by appropriate filtering, followed by downsampling by a factor of M.




Figure 4.1: Two diﬀerent realizations of rational factor L/M.


Figure 4.2: Cascade of an interpolator and decimator.

The implementation of a rational SRC scheme is shown in Fig. 4.2. The original input signal x(n) is first upsampled by a factor of L and passed through an anti-imaging filter, also called the interpolation filter. The interpolated signal is then filtered by an anti-aliasing filter before downsampling by a factor of M. The sampling rate of the output signal is

F_out = (L/M) F_in.   (4.1)

Since the interpolation and decimation filters are next to each other and operate at the same sampling rate, the two lowpass filters can be combined into a single lowpass filter H(z), as shown in Fig. 4.3. The specification of the filter H(z) needs to be selected such that it acts as both the anti-imaging and the anti-aliasing filter at the same time.

Since the role of the filter H(z) is to act as an anti-imaging as well as an anti-aliasing filter, the stopband edge frequency of the ideal filter H(z) in the rational SRC of Fig. 4.3 should be

ω_s T = min(π/L, π/M).   (4.2)

From the above equation it is clear that the passband becomes narrower, and the filter requirements harder, for larger values of L and M. The ideal specification requirement for the magnitude response is

|H(e^{jωT})| = L for |ωT| ≤ min(π/L, π/M), and 0 otherwise.   (4.3)


Figure 4.3: An implementation of SRC by a rational factor L/M.
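The scheme of Fig. 4.3 — zero-stuff by L, filter with H(z), keep every M-th sample — can be sketched in plain Python. The filter below is a crude illustrative lowpass with DC gain L, not a properly designed H(z):

```python
def resample(x, h, L, M):
    # Fig. 4.3 scheme: upsample by L (zero-stuffing), filter by H(z),
    # then keep every M-th output sample
    up = []
    for s in x:
        up.append(s)
        up.extend([0.0] * (L - 1))          # insert L-1 zeros after each sample
    y = [0.0] * (len(h) + len(up) - 1)      # linear convolution with H(z)
    for i, hi in enumerate(h):
        for j, uj in enumerate(up):
            y[i + j] += hi * uj
    return y[::M]

L, M = 3, 2
h = [1.0] * L          # crude hold-type filter with DC gain L (illustration only)
x = [1.0, 1.0, 1.0, 1.0]
y = resample(x, h, L, M)
```

With a well-designed H(z) satisfying (4.2)–(4.3), the same structure performs proper rational SRC; here the short filter only demonstrates the data flow.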



Figure 4.4: Polyphase realization of SRC by a rational factor L/M.

The frequency spectrum of the resampled signal, Y(e^{jωT}), as a function of the spectrum of the original signal X(e^{jωT}), the filter transfer function H(e^{jωT}), and the interpolation and decimation factors L and M, can be expressed as

Y(e^{jωT}) = (1/M) Σ_{k=0}^{M−1} X(e^{j(LωT−2πk)/M}) H(e^{j(ωT−2πk)/M}).   (4.4)

It is apparent that the overall performance of the resampler depends on the

ﬁlter characteristics.

4.1.2 Polyphase Implementation of Rational Sampling Rate Conversion

The polyphase representation is used for the efficient implementation of rational sampling rate converters, as it enables the arithmetic operations to be performed at the lowest possible sampling rate. The transfer function H(z) in Fig. 4.3 is decomposed into L polyphase components using the approach shown in Fig. 1.11a. In the next step, the interpolation factor L is moved into the sub-filters using the computationally efficient polyphase representation form shown in Fig. 1.11. An equivalent delay block for each sub-filter branch is placed in cascade with the sub-filters. Since the downsampling operation is linear, and by use of the noble identities, it can be moved into each branch of the sub-filters. The equivalent polyphase representation of the rational factor L/M resulting from these modifications is shown in Fig. 4.4 [5, 10].
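The equivalence behind this decomposition can be checked numerically. The sketch below (an illustrative assumption, not code from the thesis) splits h into the L sub-filters h_l(n) = h(nL + l) and verifies that running the branches at the input rate and interleaving their outputs equals upsampling first and then filtering with H(z).

```python
def fir(h, x):
    # direct-form FIR; same number of outputs as inputs
    return [sum(h[k] * x[n - k] for k in range(len(h)) if 0 <= n - k < len(x))
            for n in range(len(x))]

def interpolate_direct(h, x, L):
    # upsample by L, then filter with the full H(z)
    up = [0.0] * (len(x) * L)
    for n, v in enumerate(x):
        up[n * L] = v
    return fir(h, up)

def interpolate_polyphase(h, x, L):
    sub = [h[l::L] for l in range(L)]        # polyphase components h_l(n) = h(nL + l)
    branches = [fir(s, x) for s in sub]      # all filtering at the low (input) rate
    y = [0.0] * (len(x) * L)
    for n in range(len(x)):
        for l in range(L):
            y[n * L + l] = branches[l][n]    # interleave branch outputs
    return y
```

The two functions produce identical outputs, while the polyphase version performs all multiplications at 1/L-th of the output rate.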

By using the polyphase representation and the noble identities, Fig. 4.4 already attains a computational saving by a factor of L. The next step is to move the downsampling factor to the input side so that all filtering operations can be evaluated at 1/M-th of the input sampling rate. The reorganization of the individual polyphase branches is targeted next. If the sampling conversion


Figure 4.5: Reordering the upsampling and downsampling in the polyphase realization of the k-th branch.

factors, L and M, are relatively prime, then

l_0 L − m_0 M = −1, (4.5)

where l_0 and m_0 are some integers. Using (4.5), the k-th delay can be represented as

z^{−k} = z^{k(l_0 L − m_0 M)}. (4.6)

The step-by-step reorganization of the k-th branch is shown in Fig. 4.5. The delay chain z^{−k} is first replaced by its equivalent form in (4.6), which then enables the interchange of the downsampler and the upsampler. Since the upsampling and downsampling factors are assumed to be relatively prime, they can be interchanged. The fixed independent delay chains are also moved to the input and the output sides. As can be seen in Fig. 4.5, the sub-filter H_k(z) is in cascade with the downsampling operation, as highlighted by the dashed block. The polyphase representation form in Fig. 1.10a and its equivalent computationally efficient form in Fig. 1.10 can be used for the polyphase representation of the sub-filter H_k(z). The same procedure is followed for all other branches. As a result of these adjustments, all filtering operations now run at 1/M-th of the input sampling rate. The intermediate sampling rate for the filtering operations is lower, which makes this a more efficient structure for rational SRC by a factor L/M. The negative delay z^{kl_0} on the input side is adjusted by appropriately delaying the input to make the solution causal. The rational sampling rate conversion structures considered for the case L/M < 1 can be used to perform L/M > 1 by considering their duals.
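The integers l_0 and m_0 in (4.5) can be found with the extended Euclidean algorithm. The helper below is an illustrative sketch (the function names are my own, not from the thesis).

```python
def bezout_l0_m0(L, M):
    """Find non-negative integers l0, m0 with l0*L - m0*M = -1 (Eq. 4.5),
    assuming L and M are relatively prime."""
    def ext_gcd(a, b):
        # returns (g, s, t) with s*a + t*b = g = gcd(a, b)
        if b == 0:
            return a, 1, 0
        g, s, t = ext_gcd(b, a % b)
        return g, t, s - (a // b) * t

    g, s, t = ext_gcd(L, M)
    assert g == 1, "L and M must be relatively prime"
    # s*L + t*M = 1  =>  (-s)*L - t*M = -1, so l0 = -s, m0 = t
    l0, m0 = -s, t
    # adding M to l0 and L to m0 leaves l0*L - m0*M unchanged
    while l0 < 0:
        l0 += M
        m0 += L
    return l0, m0
```

For example, L = 3 and M = 2 give l_0 = 1, m_0 = 2, since 1·3 − 2·2 = −1.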

4.1.3 Large Rational Factors

The polyphase decomposition approach is convenient in cases where the factors L and M are small. Otherwise, filters of a very high order are needed. For example, in the application of sampling rate conversion from 48 kHz to 44.1 kHz, the required rational factor is L/M = 147/160, which imposes severe requirements on the filters. The stopband edge frequency from (4.2) is π/160. This results in a high-order filter with a high complexity. In this case, a multi-stage realization will be beneficial [41].
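The factor L/M = 147/160 follows directly from reducing 44100/48000 by the greatest common divisor:

```python
from math import gcd

f_in, f_out = 48000, 44100
g = gcd(f_out, f_in)                  # 300
L, M = f_out // g, f_in // g          # F_out = (L/M) F_in, Eq. (4.1)
print(L, M)                           # 147 160
# Stopband edge of the ideal filter, Eq. (4.2): pi/max(L, M) = pi/160.
```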

4.2 Sampling Rate Conversion: Arbitrary Factor

In many applications, there is a need to compute the value of the underlying continuous-time signal x_a(t) at an arbitrary time instant between two existing samples. The computation of new sample values at such arbitrary points can be viewed as interpolation.

Assume an input sequence, . . . , x((n_b − 2)T_x), x((n_b − 1)T_x), x(n_b T_x), x((n_b + 1)T_x), x((n_b + 2)T_x), . . . , uniformly sampled with the interval T_x. The requirement is to compute a new sample value, y(T_y m), at some time instant T_y m which occurs between the two existing samples x(n_b T_x) and x((n_b + 1)T_x), where T_y m = n_b T_x + T_x d. The parameter n_b is some reference or base-point index, and d is called the fractional interval. The fractional interval can take on any value in the range |d| ≤ 0.5, to cover the whole sampling interval range. The new sample value y(T_y m) is computed from the existing samples by utilizing some interpolation algorithm. The interpolation algorithm may in general be defined as a time-varying digital filter with the impulse response h(n_b, d).

The interpolation problem can be considered as resampling, where the continuous-time function y_a(t) is first reconstructed from a finite set of existing samples x(n). The continuous-time signal can then be sampled at the desired instant, y(T_y m) = y_a(T_y m). The commonly used interpolation techniques are based on polynomial approximations. For a given set of input samples x(−N_1 + n_b), . . . , x(n_b), . . . , x(n_b + N_2), the window or interval chosen is N = N_1 + N_2 + 1. A polynomial approximation y_a(t) is then defined as

y_a(t) = Σ_{k=−N_1}^{N_2} P_k(t) x(n_b + k), (4.7)

where P_k(t) are polynomials. If the interpolation process is based on the Lagrange polynomials, then P_k(t) is defined as

P_k(t) = Π_{i=−N_1, i≠k}^{N_2} (t − t_i)/(t_k − t_i), k = −N_1, −N_1 + 1, . . . , N_2 − 1, N_2. (4.8)

The Lagrange approximation also performs exact reconstruction of the original input samples, x(n).
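A small sketch of (4.7)–(4.8), with t measured in units of T_x so that the existing samples sit at the integer points k = −N_1, . . . , N_2 (the helper name is an assumption for illustration):

```python
def lagrange_interpolate(samples, points, t):
    # Eq. (4.7): y_a(t) = sum_k P_k(t) x(n_b + k), with P_k(t) from Eq. (4.8).
    y = 0.0
    for k, xk in zip(points, samples):
        Pk = 1.0
        for i in points:
            if i != k:
                Pk *= (t - i) / (k - i)   # product over all i != k
        y += Pk * xk
    return y
```

A degree-3 Lagrange polynomial through four samples of a cubic reproduces the cubic exactly, including at the original sample points, which illustrates the exact-reconstruction property mentioned above.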

4.2.1 Farrow Structures

In conventional SRC implementations, if the SRC ratio changes, new filters are needed. This limits the flexibility in covering different SRC ratios. By utilizing the Farrow structure, shown in Fig. 4.6, this can be solved in an elegant way.

Figure 4.6: Farrow structure.

The Farrow structure is composed of linear-phase FIR sub-filters, H_k(z), k = 0, 1, . . . , L, with either a symmetric (for k even) or antisymmetric (for k odd) impulse response. The overall impulse response values are expressed as polynomials in the delay parameter. The implementation complexity of the Farrow structure is lower compared to alternatives such as online design or storage of a large number of different impulse responses. The corresponding filter structure makes use of a number of sub-filters and one adjustable fractional delay, as seen in Fig. 4.6. Let the desired frequency response, H_des(e^{jωT}, d), of an adjustable FD filter be

H_des(e^{jωT}, d) = e^{−jωT(D+d)}, |ωT| ≤ ω_c T < π, (4.9)

where D and d are fixed and adjustable real-valued constants, respectively. It is assumed that D is either an integer, or an integer plus a half, whereas d takes on values in the interval [−1/2, 1/2]. In this way, a whole sampling interval is covered by d, and the fractional delay equals d (d + 0.5) when D is an integer (an integer plus a half). The transfer function of the Farrow structure is

H(z, d) = Σ_{k=0}^{L} d^k H_k(z), (4.10)

where H_k(z) are fixed FIR sub-filters approximating k-th-order differentiators with frequency responses

H_k(e^{jωT}) ≈ e^{−jωTD} (−jωT)^k / k!, (4.11)

which is obtained by truncating the Taylor series expansion of (4.9) to L + 1 terms. A filter with a transfer function in the form of (4.10) can approximate the ideal response in (4.9) as closely as desired by choosing L and designing the sub-filters appropriately [30, 101]. When H_k(z) are linear-phase FIR filters, the Farrow structure is often referred to as the modified Farrow structure. The Farrow structure is efficient for interpolation, whereas for decimation it is better to use the transposed Farrow structure so as to avoid aliasing. The sub-filters can also have even or odd orders N_k. With odd N_k, all H_k(z) are general filters, whereas for even N_k, the filter H_0(z) reduces to a pure delay. An alternative


representation of the transfer function of the Farrow structure is

H(z, d) = Σ_{k=0}^{L} Σ_{n=0}^{N} h_k(n) z^{−n} d^k
        = Σ_{n=0}^{N} Σ_{k=0}^{L} h_k(n) d^k z^{−n}
        = Σ_{n=0}^{N} h(n, d) z^{−n}. (4.12)

The quantity N is the order of the overall impulse response and

h(n, d) = Σ_{k=0}^{L} h_k(n) d^k. (4.13)

Assuming T_in and T_out to be the sampling periods of x(n) and y(n), respectively, the output sample instant at the output of the Farrow structure is

n_out T_out = (n_in + d(n_in)) T_in,       for even N_k,
n_out T_out = (n_in + 0.5 + d(n_in)) T_in, for odd N_k, (4.14)

where n_in (n_out) is the input (output) sample index. If d is constant for all input samples, the Farrow structure delays a bandlimited signal by a fixed d. If a signal needs to be delayed by two different values of d, the same set of H_k(z) is used in both cases and only d needs to be modified.

In general, SRC can be seen as delaying every input sample by a different d. This delay depends on the SRC ratio. For interpolation, one can obtain new samples between any two consecutive samples of x(n). With decimation, one can shift the original samples (or delay them in the time domain) to the positions which would belong to the decimated signal. Hence, some signal samples will be removed but some new samples will be produced. Thus, by controlling d for every input sample, the Farrow structure performs SRC. For decimation, T_out > T_in, whereas interpolation results in T_out < T_in.
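A minimal sketch of one output computation of the Farrow structure in Fig. 4.6: each sub-filter output v_k = Σ_m h_k(m) x(n − m) is formed at the input rate, and the polynomial in d from (4.10) is evaluated with Horner's rule. The first-order linear-interpolation sub-filters used in the example are a standard illustration, assumed here rather than taken from the thesis.

```python
def farrow_output(subfilters, x, n, d):
    # v_k = sum_m h_k(m) x(n - m); then y = sum_k v_k d^k via Horner (Eq. 4.10)
    v = [sum(h[m] * x[n - m] for m in range(len(h)) if 0 <= n - m < len(x))
         for h in subfilters]
    y = 0.0
    for vk in reversed(v):
        y = y * d + vk
    return y

# Linear interpolation as a first-order Farrow structure (illustrative):
# H_0(z) = z^-1, H_1(z) = 1 - z^-1, so y = x(n-1) + d*(x(n) - x(n-1)).
h0, h1 = [0.0, 1.0], [1.0, -1.0]
```

With d = 0.5 and a ramp input, the output lands exactly halfway between two consecutive input samples; a different SRC ratio only changes the sequence of d values, not the sub-filters.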

Chapter 5

Polynomial Evaluation

The Farrow structure, shown in Fig. 4.6, requires evaluation of a polynomial of degree L with d as the independent variable. Motivated by that, this chapter reviews polynomial evaluation schemes and the efficient evaluation of a required set of power terms. Polynomial evaluation also has other applications [102], such as the approximation of elementary functions in software or hardware [103, 104].

A uni-variate polynomial is a polynomial that has only one independent variable, whereas multi-variate polynomials involve multiple independent variables. In this chapter, only a uni-variate polynomial, p(x), with an independent variable x is considered. At the start of the chapter, the classical Horner scheme is described, which is considered an optimal way to evaluate a polynomial with a minimum number of operations. A brief overview of parallel polynomial evaluation schemes is then presented in the central part of the chapter. Finally, the computation of the required set of powers is discussed. Two approaches are outlined to exploit potential sharing in the computations.

5.1 Polynomial Evaluation Algorithms

The most common form of a polynomial, p(x), also called the power form, is represented as

p(x) = a_0 + a_1 x + a_2 x^2 + · · · + a_n x^n, a_n ≠ 0.

Here n is the order of the polynomial, and a_0, a_1, . . . , a_n are the polynomial coefficients.

In order to evaluate p(x), the first solution that comes to mind is the transformation of all powers into repeated multiplications. Then p(x) takes the form

p(x) = a_0 + a_1 x + a_2 x·x + · · · + a_n x·x · · · x, a_n ≠ 0.


Figure 5.1: Horner’s scheme for an n-th order polynomial.

However, there are more efficient ways to evaluate this polynomial. One possible starting point is the scheme most commonly used in software and hardware, known as Horner’s scheme.

5.2 Horner’s Scheme

Horner’s scheme is simply a nested re-write of the polynomial. In this scheme, the polynomial p(x) is represented in the form

p(x) = a_0 + x(a_1 + x(a_2 + · · · + x(a_{n−1} + a_n x) . . . )). (5.1)

This leads to the structure in Fig. 5.1.

Horner’s scheme has the minimum arithmetic complexity of any polynomial evaluation algorithm. For a polynomial of order n, n multiplications and n additions are used. However, it is purely sequential, and therefore becomes unsuitable for high-speed applications.
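A direct rendering of (5.1): with the coefficients stored as [a_0, . . . , a_n], each loop iteration performs exactly one multiplication and one addition, and the data dependence between iterations is what makes the scheme sequential.

```python
def horner(coeffs, x):
    # Evaluate a_0 + a_1*x + ... + a_n*x^n with n mults and n adds (Eq. 5.1).
    result = 0.0
    for a in reversed(coeffs):
        result = result * x + a
    return result
```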

5.3 Parallel Schemes

Several algorithms with some degree of parallelism in the evaluation of polynomials have been proposed [105–110]. However, these parallel algorithms require an increased number of operations. Earlier works have primarily focused either on finding parallel schemes suitable for software realization [105–111] or on finding suitable polynomials for hardware implementation of function approximation [47, 112–116].

To compare different polynomial evaluation schemes, varying in degree of parallelism, parameters such as computational complexity, number of operations of different types, critical path, pipelining complexity, and latency after pipelining may be considered. On the basis of these parameters, suitable schemes can be shortlisted for an implementation given the specifications. Since all schemes have the same number of additions, n, the difference is in the number of multiplication operations, as given in Table 5.1. Multiplications can be divided into three categories: data-data multiplications, squarers, and data-coefficient multiplications.

In squarers, both inputs of the multiplier are the same. This means that the number of partial products can be roughly halved, and, hence, it is reasonable to assume that a squarer has roughly half the area complexity of a general multiplier [117, 118]. For data-coefficient multiplications, as explained in Chapter 2, the area complexity is reduced, but hard to quantify as it depends on the coefficient. For simplicity, it is assumed that a constant multiplier has an area which is 1/4 of that of a general multiplier. As a result, two schemes having the same total number of multiplications but different types of multiplications differ with respect to implementation cost and other parameters. The scheme which has the lower number of data-data multiplications is cheaper to implement.
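Under these assumed relative areas (general multiplier = 1 unit, squarer ≈ 1/2, constant multiplier ≈ 1/4), the multiplier-area cost of the schemes in Table 5.1 can be compared in a few lines. The (M, C, S) counts below are copied from the table as given; the dictionary and function names are my own.

```python
# (M, C, S): data-data mults, data-coefficient mults, squarers (Table 5.1).
schemes = {
    "Direct":   (2, 5, 2),
    "Horner":   (1, 4, 0),
    "Estrin":   (4, 2, 1),
    "Li":       (3, 2, 2),
    "Dorn":     (3, 3, 1),
    "EO":       (2, 3, 1),
    "Maruyama": (4, 2, 2),
}

def mult_area(M, C, S):
    # assumed relative areas: general = 1, constant = 1/4, squarer = 1/2
    return M + 0.25 * C + 0.5 * S

costs = {name: mult_area(*mcs) for name, mcs in schemes.items()}
```

Horner comes out cheapest in multiplier area, consistent with it being fully sequential; the parallel schemes trade extra multiplier area for a shorter critical path.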

The schemes most frequently used in the literature are Horner and Estrin [105]. However, there are other polynomial evaluation schemes as well, such as Dorn’s generalized Horner scheme [106], Munro and Paterson’s scheme [107, 108], Maruyama’s scheme [109], the even-odd (EO) scheme, and the scheme of Li et al. [110]. All the listed schemes have some useful properties. However, some schemes do not work for all polynomial orders.

The difference between the schemes lies in how efficiently they split the polynomial into sub-polynomials so that these can be evaluated in parallel. Another important factor is the type of powers required after splitting the polynomial. Some schemes split the polynomial in such a way that only powers of the form x^{2^i} or x^{3^i} are required. The evaluation of such square and cube powers is normally more efficient than conventional multiplications. The data-coefficient multiplication count also gives an idea of the degree of parallelism of a scheme. For example, Horner’s scheme has only one data-coefficient multiplication and, hence, one sequential flow; no degree of parallelism. The EO scheme, on the other hand, has two data-coefficient multiplications, which results in two branches running in parallel. The disadvantage, on the other hand, is that the number of operations is increased and additional hardware is needed to run the operations in parallel. In applications where polynomials must be evaluated at run-time, or are used extensively, any degree of parallelism or efficient evaluation relaxes the computational burden to satisfy the required throughput.

Other factors that may be helpful in shortlisting an evaluation scheme are the uniformity and simplicity of the resulting evaluation architecture and the linearity of the scheme as the order grows; some schemes may be good for one polynomial order but not for others.
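All rewritings in Table 5.1 evaluate the same polynomial; for a fifth-order example this is easy to confirm numerically. A sanity-check sketch, using integer data so the comparisons are exact (function names are my own):

```python
def direct(a, x):
    return a[5]*x**5 + a[4]*x**4 + a[3]*x**3 + a[2]*x**2 + a[1]*x + a[0]

def horner5(a, x):
    return ((((a[5]*x + a[4])*x + a[3])*x + a[2])*x + a[1])*x + a[0]

def estrin(a, x):
    x2 = x * x; x3 = x2 * x
    return (a[5]*x2 + a[4]*x + a[3])*x3 + a[2]*x2 + a[1]*x + a[0]

def even_odd(a, x):
    x2 = x * x
    return ((a[5]*x2 + a[3])*x2 + a[1])*x + ((a[4]*x2 + a[2])*x2 + a[0])
```

The schemes differ only in the order of operations, and hence in critical path and operation types, not in the result (up to rounding when fixed-point or floating-point data is used).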

5.4 Powers Evaluation

Evaluation of a set of powers, also known as exponentiation, finds application not only in polynomial evaluation [102, 119] but also in window-based exponentiation for cryptography [120, 121]. The required set of power terms to be evaluated may contain all powers up to some integer n, or it may be a sparse set of power terms. In the process of evaluating all powers at the same time, redundancy in the computations at different levels is expected


Table 5.1: Evaluation schemes for a fifth-order polynomial; additions (A), data-data multiplications (M), data-coefficient multiplications (C), squarers (S).

Scheme     Evaluation                                                A  M  C  S
Direct     a_5 x^5 + a_4 x^4 + a_3 x^3 + a_2 x^2 + a_1 x + a_0       5  2  5  2
Horner     ((((a_5 x + a_4)x + a_3)x + a_2)x + a_1)x + a_0           5  1  4  0
Estrin     (a_5 x^2 + a_4 x + a_3)x^3 + a_2 x^2 + a_1 x + a_0        5  4  2  1
Li         (a_5 x + a_4)x^4 + (a_3 x + a_2)x^2 + (a_1 x + a_0)       5  3  2  2
Dorn       (a_5 x^3 + a_2)x^2 + (a_4 x^3 + a_1)x + (a_3 x^3 + a_0)   5  3  3  1
EO         ((a_5 x^2 + a_3)x^2 + a_1)x + ((a_4 x^2 + a_2)x^2 + a_0)  5  2  3  1
Maruyama   (a_5 x + a_4)x^4 + a_3 x^3 + (a_2 x^2 + a_1 x + a_0)      5  4  2  2

which needs to be exploited. Two independent approaches are considered. In the first, redundancy is removed at the word level, while in the second, it is exploited at the bit level.

5.4.1 Powers Evaluation with Sharing at Word-Level

A direct approach to compute x^i is the repeated multiplication of x by itself i − 1 times; however, this approach is inefficient and becomes infeasible for large values of i. For example, to compute x^23, 22 repeated multiplications by x are required. With the binary approach in [102], the same can be achieved in eight multiplications, {x^2, x^4, x^5, x^10, x^11, x^12, x^23}, while the factor method in [102] gives a solution with one less multiplication, {x^2, x^3, x^4, x^7, x^8, x^16, x^23}.

Another efficient way to evaluate a single power is to consider the relation to the addition chain problem. An addition chain for an integer n is a list of integers [102]

1 = a_0, a_1, a_2, . . . , a_l = n, (5.2)

such that

a_i = a_j + a_k, k ≤ j < i, i = 1, 2, . . . , l. (5.3)

As multiplication is additive in the logarithmic domain, a minimum-length addition chain will give an optimal solution to compute a single power term. A minimum-length addition chain for n = 23 gives the optimal solution with only six multiplications, {x^2, x^3, x^5, x^10, x^20, x^23}.
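Because exponents add when powers are multiplied, walking an addition chain yields every listed power with one multiplication per chain element after the first. The sketch below (illustrative names) confirms the six-multiplication chain for x^23.

```python
def powers_from_chain(x, chain):
    # chain obeys Eq. (5.2)-(5.3): every element after the first is a sum
    # of two earlier elements, so each step costs one multiplication.
    have = {1: x}
    mults = 0
    for e in chain[1:]:
        j, k = next((j, e - j) for j in have if (e - j) in have)
        have[e] = have[j] * have[k]
        mults += 1
    return have, mults
```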

However, when these techniques for the computation of a single power [102, 122, 123] are applied to the evaluation of multiple power terms, they do not combine well in eliminating the overlapping terms that appear in the evaluation of the individual powers, as explained below with an example.

Suppose a sparse set of power terms, T = {22, 39, 50}, needs to be evaluated.


A minimum-length addition chain solution for each individual power is

22 → {1, 2, 3, 5, 10, 11, 22}
39 → {1, 2, 3, 6, 12, 15, 27, 39}
50 → {1, 2, 3, 6, 12, 24, 25, 50}.

As can be seen, there are some overlapping terms, {2, 3, 6, 12}, which can be shared. If these overlapping terms are removed and the solutions are combined for the evaluation of the required set of powers, {22, 39, 50}, the solution is

{22, 39, 50} → {1, 2, 3, 5, 6, 10, 11, 12, 15, 22, 24, 25, 27, 39, 50}. (5.4)

However, the optimal solution for the evaluation of the requested set of powers, {22, 39, 50}, is

{22, 39, 50} → {1, 2, 3, 6, 8, 14, 22, 36, 39, 50}, (5.5)

which clearly shows a significant difference. This solution is obtained by addressing the addition sequence problem, which can be considered as a generalization of addition chains. An addition sequence for a set of integers T = {n_1, n_2, . . . , n_r} is an addition chain that contains each element of T. For example, an addition sequence computing {3, 7, 11} is {1, 2, 3, 4, 7, 9, 11}.
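Validity of an addition sequence is easy to check mechanically. The helper below (an illustrative sketch) verifies both the example sequence for {3, 7, 11} and the solution in (5.5).

```python
def is_addition_sequence(chain, targets):
    # chain must start at 1, each later element must be a sum of two earlier
    # elements (Eq. 5.3), and every target must appear in the chain.
    if not chain or chain[0] != 1:
        return False
    seen = {1}
    for a in chain[1:]:
        if not any((a - b) in seen for b in seen):
            return False
        seen.add(a)
    return set(targets) <= seen
```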

The removal of redundancy at the word level does not strictly mean the removal of overlapping terms only, but rather computing only those powers which can also be used for the evaluation of other powers in the set.

Note that the case where all powers of x, from one up to some integer n, are required is easy compared to the sparse case. Since every single multiplication computes some power in the set, and it is assumed that each power is computed only once, any solution obtained will be optimal with respect to the minimum number of multiplications.

5.4.2 Powers Evaluation with Sharing at Bit-Level

In this approach, the evaluation of power terms is done in parallel using summation trees, similar to parallel multipliers. The partial product (PP) matrices for all requested powers in the set are generated independently, and sharing at the bit level is explored in the PP matrices in order to remove redundant computations. The redundancy here relates to the fact that the same three partial products may be present in more than one column and, hence, can be mapped to the same full adder.

A w-bit unsigned binary number can be expressed as

X = x_{w−1} x_{w−2} x_{w−3} . . . x_2 x_1 x_0, (5.6)

with a value of

X = Σ_{i=0}^{w−1} x_i 2^i, (5.7)


where x_{w−1} is the MSB and x_0 is the LSB of the binary number. The n-th power of an unsigned binary number as in (5.7) is given by

X^n = (Σ_{i=0}^{w−1} x_i 2^i)^n. (5.8)

For powers with n ≥ 2, the expression in (5.8) can be evaluated using the multinomial theorem. For any integer power n and a word length of w bits, the above equation can be simplified to a sum of weighted binary variables, which can be simplified further by using the identities x_i x_j = x_j x_i and x_i x_i = x_i. The PP matrix of the corresponding power term can then be deduced from this function. In the process of generating a PP matrix from the weighted function, a term such as x_0 2^0 will be placed in the first column and x_2 2^6 in the 7-th column from the left in the PP matrix.

For terms having weights other than a power of two, a binary or CSD representation may be used. The advantage of using the CSD representation over binary is that the size of the PP matrix is reduced, and therefore the number of full adders for accumulating the PPs is reduced. To make the analysis of the PP matrix simpler, the binary variables in the PP matrix can be represented by their corresponding representation weights, as given in Table 5.2.

Table 5.2: Equivalent representation of binary variables in the PP matrix for w = 3.

Binary variable   Word X (x_2 x_1 x_0)   Representation
x_0               0 0 1                  1
x_1               0 1 0                  2
x_1 x_0           0 1 1                  3
x_2               1 0 0                  4
x_2 x_0           1 0 1                  5
x_2 x_1           1 1 0                  6
x_2 x_1 x_0       1 1 1                  7
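For n = 2, applying the identities x_i x_i = x_i and x_i x_j = x_j x_i to (5.8) gives X^2 = Σ_i x_i 2^{2i} + Σ_{i<j} x_i x_j 2^{i+j+1}, where each term is one weighted entry of the PP matrix. A small sketch (illustrative names) verifying this expansion for w = 3:

```python
from itertools import combinations

def square_pp_terms(w):
    # weighted partial-product terms of X**2 after identity simplification
    terms = [((i,), 2 ** (2 * i)) for i in range(w)]          # x_i * 2^(2i)
    terms += [((i, j), 2 ** (i + j + 1))                      # x_i x_j * 2^(i+j+1)
              for i, j in combinations(range(w), 2)]
    return terms

def eval_pp(terms, X, w):
    # a term contributes its weight exactly when all of its bits are set
    bits = [(X >> i) & 1 for i in range(w)]
    return sum(weight for idx, weight in terms if all(bits[i] for i in idx))
```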

In Fig. 5.2, an example case for (n, w) = (5, 4), i.e., computing the powers from the square up to the fifth power of a four-bit input, is considered to demonstrate the sharing potential among multiple partial products. All partial products of the powers from x^2 up to x^5 are shown. As can be seen in Fig. 5.2, there are savings, since the partial product sets {9, 10, 13} and {5, 7, 9} are present in more than one column. Therefore, compared to adding the partial products in an arbitrary order, three full adders (two for the first set and one for the second) can be saved by making sure that exactly these sets of partial products are added.


Figure 5.2: Partial product matrices with potential sharing of full adders, n = 5, w = 4.

Chapter 6

Conclusions and Future Work

6.1 Conclusions

In this work, contributions beneficial for the implementation of integer and non-integer SRC were presented. By optimizing both the number of arithmetic operations and their word lengths, improved implementations are obtained. This not only reduces the area and power consumption but also allows better comparisons between different SRC alternatives during high-level system design. The comparisons are further improved for some cases by accurately estimating the switching activities and by including the leakage power consumption.

6.2 Future Work

The following ideas are identified as possibilities for future work:

- It would be beneficial to derive corresponding accurate switching activity models for the non-recursive CIC architectures. In this way an improved comparison can be made, where both architectures have more accurate switching activity estimates. Deriving a switching activity model for polyphase FIR filters would naturally have benefits for all polyphase FIR filters, not only the ones with CIC filter coefficients.

- For interpolating recursive CIC filters, the output of the final comb stage will be embedded with zeros during the upsampling. Based on this, one could derive closed-form expressions for the integrator taking this additional knowledge into account.

- The lengths of the Farrow sub-filters usually differ between different sub-filters. This would motivate a study of pipelined polynomial evaluation schemes where the coefficients arrive at different times. For the proposed matrix-vector multiplication scheme, this could be utilized to shift the rows corresponding to the sub-filters in such a way that the total implementation complexity for the matrix-vector multiplication and the pipelined polynomial evaluation is minimized.

- The addition sequence model does not include pipelining. It would be possible to reformulate it to consider the number of pipelining registers required as well. This could also include obtaining different power terms at different time steps.

Chapter 7

References


[1] S. K. Mitra, Digital Signal Processing: A Computer Based Approach,

2nd ed. New York: McGraw-Hill, 2001.

[2] A. V. Oppenheim, R. W. Schafer, and J. R. Buck, Discrete-Time Signal

Processing. Prentice Hall, 1999.

[3] S. K. Mitra and J. F. Kaiser, Handbook for Digital Signal Processing. New

York: Wiley, 1993.

[4] L. Wanhammar and H. Johansson, Digital Filters Using MATLAB.

Linköping University, 2011.

[5] N. J. Fliege, Multirate Digital Signal Processing: Multirate Systems, Filter

Banks, Wavelets, 1st ed. New York, NY, USA: John Wiley & Sons, Inc.,

1994.

[6] R. Crochiere and L. Rabiner, “Optimum FIR digital ﬁlter implemen-

tations for decimation, interpolation, and narrow-band ﬁltering,” IEEE

Trans. Acoust., Speech, Signal Process., vol. 23, no. 5, pp. 444–456, Oct.

1975.

[7] R. E. Crochiere and L. R. Rabiner, “Interpolation and decimation of dig-

ital signals–a tutorial review,” Proc. IEEE, vol. 69, no. 3, pp. 300–331,

Mar. 1981.

[8] T. Ramstad, “Digital methods for conversion between arbitrary sampling

frequencies,” IEEE Trans. Acoust., Speech, Signal Process., vol. 32, no. 3,

pp. 577–591, Jun. 1984.

[9] M. Bellanger, Digital Processing of Signals. John Wiley & Sons, Inc.,

1984.

[10] P. P. Vaidyanathan, Multirate Systems and Filter Banks. Prentice Hall,

1993.

[11] M. Bellanger, G. Bonnerot, and M. Coudreuse, “Digital ﬁltering by

polyphase network: application to sample-rate alteration and ﬁlter banks,”

IEEE Trans. Acoust., Speech, Signal Process., vol. 24, no. 2, pp. 109–114,

Apr. 1976.

[12] R. A. Valenzuela and A. G. Constantinides, “Digital signal processing

schemes for eﬃcient interpolation and decimation,” IEE Proc. Electr. Cir-

cuits Syst., vol. 130, no. 6, pp. 225–235, Dec. 1983.

[13] W. Drews and L. Gazsi, “A new design method for polyphase ﬁlters using

all-pass sections,” IEEE Trans. Circuits Syst., vol. 33, no. 3, pp. 346–348,

Mar. 1986.


[14] M. Renfors and T. Saramäki, “Recursive Nth–band digital ﬁlters – part

I: Design and properties,” IEEE Trans. Circuits Syst., vol. 34, no. 1, pp.

24–39, Jan. 1987.

[15] ——, “Recursive Nth-band digital ﬁlters – part II: Design of multistage

decimators and interpolators,” IEEE Trans. Circuits Syst., vol. 34, no. 1,

pp. 40–51, Jan. 1987.

[16] F. M. Gardner, “Interpolation in digital modems. I. Fundamentals,” IEEE

Trans. Commun., vol. 41, no. 3, pp. 501–507, Mar. 1993.

[17] L. Erup, F. M. Gardner, and R. A. Harris, “Interpolation in digital

modems. II. Implementation and performance,” IEEE Trans. Commun.,

vol. 41, no. 6, pp. 998–1008, Jun. 1993.

[18] V. Tuukkanen, J. Vesma, and M. Renfors, “Combined interpolation and

maximum likelihood symbol timing recovery in digital receivers,” in Proc.

IEEE Int. Universal Personal Commun. Record Conf., vol. 2, 1997, pp.

698–702.

[19] M.-T. Shiue and C.-L. Wey, “Eﬃcient implementation of interpolation

technique for symbol timing recovery in DVB-T transceiver design,” in

Proc. IEEE Int. Electro/Information Technol. Conf., 2006, pp. 427–431.

[20] S. R. Dooley and A. K. Nandi, “On explicit time delay estimation using

the Farrow structure,” Signal Process., vol. 72, no. 1, pp. 53–57, Jan.

1999.

[21] M. Olsson, H. Johansson, and P. Löwenborg, “Time-delay estimation us-

ing Farrow-based fractional-delay FIR ﬁlters: ﬁlter approximation vs. es-

timation errors,” in Proc. Europ. Signal Process. Conf., 2006.

[22] ——, “Simultaneous estimation of gain, delay, and oﬀset utilizing the

Farrow structure,” in Proc. Europ. Conf. Circuit Theory Design, 2007,

pp. 244–247.

[23] C. W. Farrow, “A continuously variable digital delay element,” in Proc.

IEEE Int. Symp. Circuits Syst., 1988, pp. 2641–2645.

[24] T. Saramäki and T. Ritoniemi, “An eﬃcient approach for conversion be-

tween arbitrary sampling frequencies,” in Proc. IEEE Int. Symp. Circuits

Syst., vol. 2, 1996, pp. 285–288.

[25] D. Babic and M. Renfors, “Power eﬃcient structure for conversion be-

tween arbitrary sampling rates,” IEEE Signal Process. Lett., vol. 12, no. 1,

pp. 1–4, Jan. 2005.

[26] J. T. Kim, “Eﬃcient implementation of polynomial interpolation ﬁlters

for full digital receivers,” IEEE Trans. Consum. Electron., vol. 51, no. 1,

pp. 175–178, Feb. 2005.


[27] T. I. Laakso, V. Valimaki, M. Karjalainen, and U. K. Laine, “Splitting

the unit delay,” IEEE Signal Process. Mag., vol. 13, no. 1, pp. 30–60, Jan.

1996.

[28] J. Vesma, “A frequency-domain approach to polynomial-based interpola-

tion and the Farrow structure,” IEEE Trans. Circuits Syst. II, vol. 47,

no. 3, pp. 206–209, Mar. 2000.

[29] R. Hamila, J. Vesma, and M. Renfors, “Polynomial-based maximum-

likelihood technique for synchronization in digital receivers,” IEEE Trans.

Circuits Syst. II, vol. 49, no. 8, pp. 567–576, Aug. 2002.

[30] H. Johansson and P. Löwenborg, “On the design of adjustable fractional

delay FIR ﬁlters,” IEEE Trans. Circuits Syst. II, vol. 50, no. 4, pp. 164–

169, Apr. 2003.

[31] J. Vesma and T. Saramäki, “Polynomial-based interpolation ﬁlters – part

I: Filter synthesis,” Circuits Syst. Signal Process., vol. 26, no. 2, pp. 115–

146, Apr. 2007.

[32] V. Välimäki and A. Haghparast, “Fractional delay ﬁlter design based on

truncated Lagrange interpolation,” IEEE Signal Process. Lett., vol. 14,

no. 11, pp. 816–819, Nov. 2007.

[33] H. H. Dam, A. Cantoni, S. Nordholm, and K. L. Teo, “Variable digital

ﬁlter with group delay ﬂatness speciﬁcation or phase constraints,” IEEE

Trans. Circuits Syst. II, vol. 55, no. 5, pp. 442–446, May 2008.

[34] E. Hermanowicz, H. Johansson, and M. Rojewski, “A fractionally delaying

complex Hilbert transform ﬁlter,” IEEE Trans. Circuits Syst. II, vol. 55,

no. 5, pp. 452–456, May 2008.

[35] C. Candan, “An eﬃcient ﬁltering structure for Lagrange interpolation,”

IEEE Signal Process. Lett., vol. 14, no. 1, pp. 17–19, Jan. 2007.

[36] A. P. Chandrakasan and R. W. Brodersen, Low-Power Digital CMOS

Design. Kluwer Academic Publishers, Boston-Dordrecht-London, 1995.

[37] A. P. Chandrakasan, S. Sheng, and R. W. Brodersen, “Low-power CMOS

digital design,” IEEE J. Solid-State Circuits, vol. 27, no. 4, pp. 473–484,

Apr. 1992.

[38] N. S. Kim, T. Austin, D. Baauw, T. Mudge, K. Flautner, J. S. Hu, M. J.

Irwin, M. Kandemir, and V. Narayanan, “Leakage current: Moore’s law

meets static power,” Computer, vol. 36, no. 12, pp. 68–75, Dec. 2003.

[39] K. Roy, S. Mukhopadhyay, and H. Mahmoodi-Meimand, “Leakage cur-

rent mechanisms and leakage reduction techniques in deep-submicrometer

CMOS circuits,” Proc. IEEE, vol. 91, no. 2, pp. 305–327, Feb. 2003.

References 59

[40] International technology roadmap for semiconductors,

http://public.itrs.net.

[41] L. Wanhammar, DSP Integrated Circuits. Academic Press, 1998.

[42] K. K. Parhi, VLSI Digital Signal Processing Systems: Design and Imple-

mentation. John Wiley and Sons, 1999.

[43] L. B. Jackson, “On the interaction of round-oﬀ noise and dynamic range

in digital ﬁlters,” Bell Syst. Tech. J., vol. 49, pp. 159–184, Feb. 1970.

[44] W. Sung and K.-I. Kum, “Simulation-based word-length optimization

method for ﬁxed-point digital signal processing systems,” IEEE Trans.

Signal Process., vol. 43, no. 12, pp. 3087–3090, Dec. 1995.

[45] K.-I. Kum and W. Sung, “Combined word-length optimization and high-

level synthesis of digital signal processing systems,” IEEE Trans. Comput.-

Aided Design Integr. Circuits Syst., vol. 20, no. 8, pp. 921–930, Aug. 2001.

[46] K. Han and B. L. Evans, “Optimum wordlength search using sensitivity

information,” EURASIP J. Applied Signal Process., vol. 2006, pp. 1–14,

Jan. 2006.

[47] D.-U. Lee and J. D. Villasenor, “A bit-width optimization methodology

for polynomial-based function evaluation,” IEEE Trans. Comput., vol. 56,

no. 4, pp. 567–571, Apr. 2007.

[48] P. D. Fiore, “Eﬃcient approximate wordlength optimization,” IEEE

Trans. Comput., vol. 57, no. 11, pp. 1561–1570, Nov. 2008.

[49] G. A. Constantinides, P. Y. K. Cheung, and W. Luk, Synthesis And Op-

timization Of DSP Algorithms. Norwell, MA, USA: Kluwer Academic

Publishers, 2004.

[50] P. D. Fiore and L. Lee, “Closed-form and real-time wordlength adapta-

tion,” in Proc. IEEE Int. Conf. Acoustics Speech Signal Process., vol. 4,

1999, pp. 1897–1900.

[51] N. Herve, D. Menard, and O. Sentieys, “Data wordlength optimization for

FPGA synthesis,” in Proc. IEEE Workshop Signal Process. Syst. Design

Implementation, 2005, pp. 623–628.

[52] S. C. Chan and K. M. Tsui, “Wordlength determination algorithms for

hardware implementation of linear time invariant systems with prescribed

output accuracy,” in Proc. IEEE Int. Symp. Circuits Syst., 2005, pp. 2607–

2610.

[53] M. Vesterbacka, “On implementation of maximally fast wave digital ﬁl-

ters,” Ph.D. dissertation, Diss., no. 487, Linköping university, Sweden,

June 1997.

60 Chapter 7. References

[54] G.-K. Ma and F. J. Taylor, “Multiplier policies for digital signal process-

ing,” IEEE ASSP Mag., vol. 7, no. 1, pp. 6–20, Jan. 1990.

[55] A. G. Dempster and M. D. Macleod, “Constant integer multiplication

using minimum adders,” IEE Proc. Circuits Devices Syst., vol. 141, no. 5,

pp. 407–413, Oct. 1994.

[56] O. Gustafsson, A. G. Dempster, and L. Wanhammar, “Extended re-

sults for minimum-adder constant integer multipliers,” in Proc. IEEE Int.

Symp. Circuits Syst., vol. 1, 2002.

[57] O. Gustafsson, A. Dempster, K. Johansson, M. Macleod, and L. Wanham-

mar, “Simpliﬁed design of constant coeﬃcient multipliers,” Circuits Syst.

Signal Process., vol. 25, no. 2, pp. 225–251, Apr. 2006.

[58] S. He and M. Torkelson, “FPGA implementation of FIR ﬁlters using

pipelined bit-serial canonical signed digit multipliers,” in Proc. IEEE Cus-

tom Integrated Circuits Conf., 1994, pp. 81–84.

[59] K. Johansson, O. Gustafsson, and L. Wanhammar, “Low-complexity bit-

serial constant-coeﬃcient multipliers,” in Proc. IEEE Int. Symp. Circuits

Syst., vol. 3, 2004.

[60] D. R. Bull and D. H. Horrocks, “Primitive operator digital ﬁlters,” IEE

Proc. Circuits, Devices and Syst., vol. 138, no. 3, pp. 401–412, Jun. 1991.

[61] M. Potkonjak, M. B. Srivastava, and A. P. Chandrakasan, “Multiple con-

stant multiplications: eﬃcient and versatile framework and algorithms

for exploring common subexpression elimination,” IEEE Trans. Comput.-

Aided Design Integr. Circuits Syst., vol. 15, no. 2, pp. 151–165, Feb. 1996.

[62] A. G. Dempster and M. D. Macleod, “Digital ﬁlter design using subex-

pression elimination and all signed-digit representations,” in Proc. IEEE

Int. Symp. Circuits Syst., vol. 3, 2004.

[63] R. I. Hartley, “Subexpression sharing in ﬁlters using canonic signed digit

multipliers,” IEEE Trans. Circuits Syst. II, vol. 43, no. 10, pp. 677–688,

Oct. 1996.

[64] M. Martinez-Peiro, E. I. Boemo, and L. Wanhammar, “Design of high-

speed multiplierless ﬁlters using a non-recursive signed common subex-

pression algorithm,” IEEE Trans. Circuits Syst. II, vol. 49, no. 3, pp.

196–203, Mar. 2002.

[65] R. Pasko, P. Schaumont, V. Derudder, S. Vernalde, and D. Durackova, “A

new algorithm for elimination of common subexpressions,” IEEE Trans.

Comput.-Aided Design Integr. Circuits Syst., vol. 18, no. 1, pp. 58–68, Jan.

1999.

References 61

[66] O. Gustafsson and L. Wanhammar, “A novel approach to multiple con-

stant multiplication using minimum spanning trees,” in Proc. IEEE Int.

Midwest Symp. Circuits Syst., vol. 3, 2002.

[67] O. Gustafsson, H. Ohlsson, and L. Wanhammar, “Improved multiple con-

stant multiplication using a minimum spanning tree,” in Proc. Asilomar

Conf. Signals Syst. Comput., vol. 1, 2004, pp. 63–66.

[68] O. Gustafsson, “A diﬀerence based adder graph heuristic for multiple con-

stant multiplication problems,” in Proc. IEEE Int. Symp. Circuits Syst.,

2007, pp. 1097–1100.

[69] K. Muhammad and K. Roy, “A graph theoretic approach for synthesiz-

ing very low-complexity high-speed digital ﬁlters,” IEEE Trans. Comput.-

Aided Design Integr. Circuits Syst., vol. 21, no. 2, pp. 204–216, Feb. 2002.

[70] H. Ohlsson, O. Gustafsson, and L. Wanhammar, “Implementation of low

complexity FIR ﬁlters using a minimum spanning tree,” in Proc. IEEE

Mediterranean Electrotechnical Conf., vol. 1, 2004, pp. 261–264.

[71] A. G. Dempster and M. D. Macleod, “Use of minimum-adder multiplier

blocks in FIR digital ﬁlters,” IEEE Trans. Circuits Syst. II, vol. 42, no. 9,

pp. 569–577, Sep. 1995.

[72] A. G. Dempster, S. S. Dimirsoy, and I. Kale, “Designing multiplier blocks

with low logic depth,” in Proc. IEEE Int. Symp. Circuits Syst., vol. 5,

2002.

[73] Y. Voronenko and M. Püschel, “Multiplierless multiple constant multipli-

cation,” ACM Trans. Algorithms, vol. 3, no. 2, p. article 11, Apr. May

2007.

[74] A. G. Dempster and N. P. Murphy, “Eﬃcient interpolators and ﬁlter banks

using multiplier blocks,” IEEE Trans. Signal Process., vol. 48, no. 1, pp.

257–261, Jan. 2000.

[75] O. Gustafsson and A. G. Dempster, “On the use of multiple constant

multiplication in polyphase FIR ﬁlters and ﬁlter banks,” in Proc. IEEE

Nordic Signal Process. Symp., 2004, pp. 53–56.

[76] A. G. Dempster, O. Gustafsson, and J. O. Coleman, “Towards an algo-

rithm for matrix multiplier blocks,” in Proc. Europ. Conf. Circuit Theory

Design, Sep. 2003.

[77] O. Gustafsson, H. Ohlsson, and L. Wanhammar, “Low-complexity con-

stant coeﬃcient matrix multiplication using a minimum spanning tree

approach,” in Proc. IEEE Nordic Signal Process. Symp., 2004, pp. 141–

144.

62 Chapter 7. References

[78] M. D. Macleod and A. G. Dempster, “Common subexpression elimination

algorithm for low-cost multiplierless implementation of matrix multipli-

ers,” Electron. Lett., vol. 40, no. 11, pp. 651–652, May 2004.

[79] N. Boullis and A. Tisserand, “Some optimizations of hardware multipli-

cation by constant matrices,” IEEE Trans. Comput., vol. 54, no. 10, pp.

1271–1282, Oct. 2005.

[80] L. Aksoy, E. Costa, P. Flores, and J. Monteiro, “Optimization algorithms

for the multiplierless realization of linear transforms,” ACM Trans. Des.

Autom. Electron. Syst., accepted.

[81] L. Milic, Multirate Filtering for Digital Signal Processing: MATLAB Ap-

plications. Hershey, PA: Information Science Reference - Imprint of: IGI

Publishing, 2008.

[82] R. E. Crochiere and L. Rabiner, Multirate Digital Signal Processing. Pren-

tice Hall, Englewood Cliﬀs, N.J., 1983.

[83] E. Hogenauer, “An economical class of digital ﬁlters for decimation and in-

terpolation,” IEEE Trans. Acoust., Speech, Signal Process., vol. 29, no. 2,

pp. 155 –162, Apr. 1981.

[84] D. Babic, J. Vesma, and M. Renfors, “Decimation by irrational factor us-

ing CIC ﬁlter and linear interpolation,” in Proc. IEEE Int. Conf. Acous-

tics, Speech Signal Process., vol. 6, 2001, pp. 3677–3680.

[85] S. Chu and C. Burrus, “Multirate ﬁlter designs using comb ﬁlters,” IEEE

Trans. Circuits Syst., vol. 31, no. 11, pp. 913–924, Nov. 1984.

[86] J. Candy, “Decimation for Sigma Delta modulation,” IEEE Trans. Com-

mun., vol. 34, no. 1, pp. 72–76, Jan. 1986.

[87] Y. Gao, L. Jia, and H. Tenhunen, “A ﬁfth-order comb decimation ﬁlter

for multi-standard transceiver applications,” in Proc. IEEE Int. Symp.

Circuits Syst., vol. 3, 2000, pp. 89–92.

[88] U. Meyer-Baese, Digital Signal Processing with Field Programmable Gate

Arrays (Signals and Communication Technology). Secaucus, NJ, USA:

Springer-Verlag New York, Inc., 2004.

[89] H. Aboushady, Y. Dumonteix, M.-M. Louerat, and H. Mehrez, “Eﬃcient

polyphase decomposition of comb decimation ﬁlters in Σ∆ analog-to-

digital converters,” IEEE Trans. Circuits Syst. II, vol. 48, no. 10, pp.

898–903, Oct. 2001.

[90] A. Blad and O. Gustafsson, “Integer linear programming-based bit-level

optimization for high-speed FIR decimation ﬁlter architectures,” Circuits

Syst. Signal Process., vol. 29, no. 1, pp. 81–101, Feb. 2010.

References 63

[91] A. Y. Kwentus, Z. Jiang, and J. Willson, A. N., “Application of ﬁlter

sharpening to cascaded integrator-comb decimation ﬁlters,” IEEE Trans.

Signal Process., vol. 45, no. 2, pp. 457–467, Feb. 1997.

[92] M. Laddomada and M. Mondin, “Decimation schemes for Sigma Delta

A/D converters based on kaiser and hamming sharpened ﬁlters,” IEE

Proc. –Vision, Image Signal Process., vol. 151, no. 4, pp. 287–296, Aug.

2004.

[93] G. Jovanovic-Dolecek and S. K. Mitra, “A new two-stage sharpened comb

decimator,” IEEE Trans. Circuits Syst. I, vol. 52, no. 7, pp. 1414–1420,

Jul. 2005.

[94] M. Laddomada, “Generalized comb decimation ﬁlters for Sigma Delta

A/D converters: Analysis and design,” IEEE Trans. Circuits Syst. I,

vol. 54, no. 5, pp. 994–1005, May 2007.

[95] ——, “Comb-based decimation ﬁlters for Sigma Delta A/D converters:

Novel schemes and comparisons,” IEEE Trans. Signal Process., vol. 55,

no. 5, pp. 1769–1779, May 2007.

[96] J. Kaiser and R. Hamming, “Sharpening the response of a symmetric

nonrecursive ﬁlter by multiple use of the same ﬁlter,” IEEE Trans. Acoust.,

Speech, Signal Process., vol. 25, no. 5, pp. 415–422, Oct. 1977.

[97] T. Saramäki and T. Ritoniemi, “A modiﬁed comb ﬁlter structure for dec-

imation,” in Proc. IEEE Int. Symp. Circuits Syst., vol. 4, 1997, pp. 2353–

2356.

[98] L. Lo Presti, “Eﬃcient modiﬁed-sinc ﬁlters for sigma-delta A/D convert-

ers,” IEEE Trans. Circuits Syst. II, vol. 47, no. 11, pp. 1204–1213, Nov.

2000.

[99] G. Stephen and R. W. Stewart, “Sharpening of partially non-recursive

CIC decimation ﬁlters,” in Proc. Asilomar Conf. Signals Syst. Comput.,

vol. 2, 2004, pp. 2170–2173.

[100] G. J. Dolecek, “Modiﬁed CIC ﬁlter for rational sample rate conversion,”

in Proc. Int. Symp. Commun. and Information Tech., 2007, pp. 252–255.

[101] J. Vesma, “Optimization and applications of polynomial-based interpo-

lation ﬁlters,” Ph.D. dissertation, Diss. no. 254, Tampere University of

Technology, 1999.

[102] D. E. Knuth, The Art of Computer Programming, vol. 2: Seminumerical

Algorithms, 1st ed. Addison-Wesley, Reading, Massachusetts, 1998.

[103] S. Lakshmivarahan and S. K. Dhall, Analysis and Design of Parallel Al-

gorithms: Arithmetic and Matrix Problems. McGraw-Hill, 1990.

64 Chapter 7. References

[104] J.-M. Muller, Elementary Functions: Algorithms and Implementation.

Birkhauser, 2005.

[105] G. Estrin, “Organization of computer systems: the ﬁxed plus variable

structure computer,” in Proc. Western Joint IRE-AIEE-ACM Computer

Conf., ser. IRE-AIEE-ACM ’60 (Western). New York, NY, USA: ACM,

1960, pp. 33–40.

[106] W. S. Dorn, “Generalizations of Horner’s rule for polynomial evaluation,”

IBM J. Res. Dev., vol. 6, no. 2, pp. 239–245, Apr. 1962.

[107] I. Munro and A. Borodin, “Eﬃcient evaluation of polynomial forms,” J.

Comput. Syst. Sci., vol. 6, no. 6, pp. 625–638, Dec. 1972.

[108] I. Munro and M. Paterson, “Optimal algorithms for parallel polynomial

evaluation,” J. Comput. Syst. Sci., vol. 7, no. 2, pp. 189–198, Apr. 1973.

[109] K. Maruyama, “On the parallel evaluation of polynomials,” IEEE Trans.

Comput., vol. 22, no. 1, pp. 2–5, Jan. 1973.

[110] L. Li, J. Hu, and T. Nakamura, “A simple parallel algorithm for polyno-

mial evaluation,” SIAM J. Sci. Comput., vol. 17, no. 1, pp. 260–262, Jan.

1996.

[111] J. Villalba, G. Bandera, M. A. Gonzalez, J. Hormigo, and E. L. Zapata,

“Polynomial evaluation on multimedia processors,” in Proc. IEEE Int.

Applicat.-Speciﬁc Syst. Arch. Processors Conf., 2002, pp. 265–274.

[112] D.-U. Lee, A. A. Gaﬀar, O. Mencer, and W. Luk, “Optimizing hardware

function evaluation,” IEEE Trans. Comput., vol. 54, no. 12, pp. 1520–

1531, Dec. 2005.

[113] D.-U. Lee, R. C. C. Cheung, W. Luk, and J. D. Villasenor, “Hardware im-

plementation trade-oﬀs of polynomial approximations and interpolations,”

IEEE Trans. Comput., vol. 57, no. 5, pp. 686–701, May 2008.

[114] N. Brisebarre, J.-M. Muller, A. Tisserand, and S. Torres, “Hardware op-

erators for function evaluation using sparse-coeﬃcient polynomials,” Elec-

tron. Lett., vol. 42, no. 25, pp. 1441–1442, Dec. 2006.

[115] F. de Dinechin, M. Joldes, and B. Pasca, “Automatic generation of

polynomial-based hardware architectures for function evaluation,” in Proc.

IEEE Int. Applicat.-Speciﬁc Syst. Arch. Processors Conf., 2010, pp. 216–

222.

[116] S. Nagayama, T. Sasao, and J. T. Butler, “Design method for numerical

function generators based on polynomial approximation for FPGA imple-

mentation,” in Proc. Euromicro Conf. Digital Syst. Design Arch., Methods

and Tools, 2007, pp. 280–287.

References 65

[117] J. Pihl and E. J. Aas, “A multiplier and squarer generator for high perfor-

mance DSP applications,” in Proc. IEEE Midwest Symp. Circuits Syst.,

vol. 1, 1996, pp. 109–112.

[118] J.-T. Yoo, K. F. Smith, and G. Gopalakrishnan, “A fast parallel squarer

based on divide-and-conquer,” IEEE J. Solid-State Circuits, vol. 32, no. 6,

pp. 909–912, Jun. 1997.

[119] M. Wojko and H. ElGindy, “On determining polynomial evaluation struc-

tures for FPGA based custom computing machines,” in Proc. Australasian

Comput. Arch. Conf., 1999, pp. 11–22.

[120] H. C. A. van Tilborg, Encyclopedia of Cryptography and Security. New

York, USA,: Springer, 2005.

[121] R. L. Rivest, A. Shamir, and L. Adleman, “A method for obtaining digital

signatures and public-key cryptosystems,” Commun. ACM, vol. 21, no. 2,

pp. 120–126, Feb. 1978.

[122] A. C.-C. Yao, “On the evaluation of powers,” SIAM J. Comput., vol. 5,

no. 1, pp. 100–103, Aug. 1976.

[123] D. Bleichenbacher and A. Flammenkamp, “An eﬃcient algorithm for com-

puting shortest addition chains,” SIAM J. Comput., 1998.


of two integers, or an irrational number. SRC by an irrational factor is impractical and is generally stated for completeness; in practice, such factors are approximated by rational ones.

The performance of decimators and interpolators is mainly determined by the filters, which are there to suppress aliasing effects or to remove unwanted images. There are many approaches to the implementation of decimation and interpolation filters, and cascaded integrator-comb (CIC) filters are one of them. CIC filters are most commonly used in the case of integer sampling rate conversion and are often preferred due to their simplicity. Their multiplierless nature makes CIC filters well suited for performing conversion at high rates, and generally yields low power consumption. Since these filters operate at the maximum sampling frequency, they are critical with respect to power consumption. It is therefore necessary to have accurate and efficient approaches for estimating the power consumption and the important factors that contribute to it. Switching activity is one such factor. To obtain a high-level estimate of the dynamic power consumption, switching activity equations for CIC filters are derived. The modeling of leakage power is also included, which is an important parameter to consider since the input sampling rate may differ by several orders of magnitude. These high-level power estimates can then be used as feedback while exploring multiple design alternatives.

The computation of a value between existing samples can alternatively be regarded as delaying the underlying signal by a fraction of the sampling period, and sampling rate conversion is a typical example where such intermediate values must be determined. Fractional-delay filters are used in this context to provide a fractional delay adjustable to any desired value, and they are therefore suitable for both integer and non-integer conversion factors. The structure used for the efficient implementation of a fractional-delay filter is known as the Farrow structure, or one of its modifications. The main advantage of the Farrow structure lies in the fact that it consists of fixed finite-impulse response (FIR) filters and that there is only one adjustable fractional-delay parameter. This characteristic makes it a very attractive structure for implementation.

In the considered fixed-point implementation of the Farrow structure, closed-form expressions for suitable word lengths are derived based on scaling and round-off noise. Since the multipliers account for a major portion of the total power consumption, a matrix-vector multiple constant multiplication approach is proposed to improve the multiplierless implementation of the FIR sub-filters. The implementation of the polynomial part of the Farrow structure, used to evaluate a polynomial with the filter outputs as coefficients, is investigated by considering the computational complexity of different polynomial evaluation schemes. By considering the number of operations of different types, the pipelining complexity, the critical path, and the latency after pipelining, high-level comparisons are obtained and used to shortlist suitable candidates. Most of these evaluation schemes require the explicit computation of higher-order power terms. In the parallel evaluation of powers, redundancy in the computations is removed by exploiting any possible sharing at word level and also at bit level. Furthermore, since exponents are additive under multiplication, an ILP formulation for the minimum addition sequence problem is proposed.
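The Farrow structure discussed above runs the input through a bank of fixed FIR sub-filters and then evaluates a polynomial in the fractional delay with the sub-filter outputs as coefficients. A minimal floating-point sketch of this idea (the function name and the order-1 Lagrange example are illustrative, not taken from the thesis):

```python
import numpy as np

def farrow_fractional_delay(x, subfilters, mu):
    """Fractional-delay filtering with a Farrow structure.

    x          : input samples (1-D array)
    subfilters : fixed FIR sub-filter coefficient arrays C_0 .. C_K
    mu         : fractional delay, the only adjustable parameter
    """
    # Run every fixed FIR sub-filter on the same input signal.
    v = [np.convolve(x, c)[: len(x)] for c in subfilters]
    # Evaluate the polynomial in mu with the sub-filter outputs as
    # coefficients, using Horner's rule.
    y = v[-1]
    for vk in reversed(v[:-1]):
        y = y * mu + vk
    return y

# Order-1 (linear) Lagrange interpolation expressed as a Farrow
# structure: y(n) = x(n) + mu * (x(n-1) - x(n)).
x = np.array([0.0, 1.0, 2.0, 3.0])
y = farrow_fractional_delay(x, [np.array([1.0, 0.0]), np.array([-1.0, 1.0])], 0.25)
# y -> [0.0, 0.75, 1.75, 2.75]
```

Note that only `mu` changes between delay values; the sub-filter coefficients stay fixed, which is the property that makes the structure attractive for hardware.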

Popular Science Summary (Populärvetenskaplig sammanfattning)

In systems where digital signals are processed, it is sometimes necessary to change the data rate (sampling rate) of an already existing digital signal. One example is systems that support several different standards, where each standard needs to be processed at its own data rate. Another is data converters, which in some respects become simpler to build if they operate at a higher rate than is theoretically needed to represent all the information in the signal. To change the rate, a digital filter is in practice always required, either to compute the values that are missing or to ensure that certain data can safely be discarded without destroying the information. This thesis presents a number of results related to the implementation of such filters.

The first class of filters is the so-called CIC filters. These are widely used since they can be implemented with only a few adders, entirely without the more costly multipliers needed in many other filter classes, and since they can easily be used for different changes of the data rate as long as the rate change is an integer. A model of how much power different types of implementations consume is presented; the main difference compared to earlier similar work is that the power consumed through leakage currents is included. Leakage currents become a relatively larger and larger problem as circuit technology evolves, so it is important that the models keep up. In addition, very accurate equations are presented for how often the digital values representing the signals in these filters change in a statistical sense, something that has a direct impact on the power consumption.

The second class of filters is the so-called Farrow filters. These are used to delay a signal by less than a sampling period, which can be used to compute intermediate data values and thereby change the data rate arbitrarily, without having to consider whether the rate change is an integer or not. Here, closed-form expressions are presented for how many bits are needed in the implementation to represent all data sufficiently accurately. This is important since the number of bits directly affects the amount of circuitry, which in turn affects the amount of power required. Much of the earlier work has dealt with how to choose the values of the multipliers, while the implementation itself has received less attention. In addition, a new method is presented for replacing the multipliers with adders and multiplications by two. This is interesting because multiplications by two can be realized by wiring the connections slightly differently, so no dedicated circuitry is needed for them.

In Farrow filters, the evaluation of a polynomial also needs to be implemented. As a final part of the thesis, an investigation of the complexity of different methods for evaluating polynomials is presented, and two different methods are proposed for efficiently computing squares, cubes, and higher-order integer exponents of numbers.


Preface

This thesis contains research work done at the Division of Electronics Systems, Department of Electrical Engineering, Linköping University, Sweden. The work has been done between December 2007 and December 2011, and has resulted in the following publications.

Paper A

The power modeling of different realizations of cascaded integrator-comb (CIC) decimation filters, recursive and non-recursive, is extended with the modeling of leakage power. The inclusion of this factor becomes more important when the input sampling rate varies by several orders of magnitude. Also, the importance of the input word length when comparing recursive and non-recursive implementations is highlighted.

M. Abbas, O. Gustafsson, and L. Wanhammar, "Power estimation of recursive and non-recursive CIC filters implemented in deep-submicron technology," in Proc. IEEE Int. Conf. Green Circuits Syst., Shanghai, China, June 21–23, 2010.

Paper B

A method for the estimation of switching activity in cascaded integrator-comb (CIC) filters is presented. The switching activity estimation model is first developed for general-purpose integrators and CIC filter integrators, and is then extended to capture the effects of pipelining in the carry chain paths of CIC filter integrators. The model is also derived for the comb sections of the CIC filters, which normally operate at the lower sampling rate, and different values of the differential delay in the comb part are considered in the estimation. The correlation in the sign extension bits is also taken into account. The switching activities may then be used to estimate the dynamic power consumption. The comparison of the theoretically estimated switching activities with those obtained by simulation demonstrates the close correspondence of the estimation model to the simulated one. Model results for the case of phase accumulators of direct digital frequency synthesizers (DDFS) are also presented.

M. Abbas and O. Gustafsson, "Switching activity estimation for cascaded integrator comb filters," IEEE Trans. Circuits Syst., under review.

A preliminary version of the above work can be found in:

M. Abbas, O. Gustafsson, and K. Johansson, "Switching activity estimation of CIC filter integrators," in Proc. IEEE Asia Pacific Conf. Postgraduate Research in Microelectronics and Electronics, Shanghai, China, Sept. 22–24, 2010.

Paper C

In this work, there are three major contributions. First, signal scaling in the Farrow structure is studied, which is crucial for a fixed-point implementation. Closed-form expressions for the scaling levels for the outputs of each sub-filter, as well as for the nodes before the delay multipliers, are derived. Second, a round-off noise analysis is performed and closed-form expressions are derived. Third, by using these closed-form expressions for the round-off noise and the scaling in terms of integer bits, different approaches are proposed to find suitable word lengths that meet the round-off noise specification at the output of the filter. In addition, a technique stemming from the approach for implementing parallel FIR filters is proposed, where direct-form sub-filters lead to a matrix multiple constant multiplication (MCM) block. The use of matrix MCM blocks leads to fewer structural adders, fewer delay elements, and in most cases fewer total adders.

M. Abbas, O. Gustafsson, and H. Johansson, "Round-off analysis and word length optimization of the fractional delay filters based on the Farrow structure," IEEE Trans. Circuits Syst., under review.

Preliminary versions of the above work can be found in:

M. Abbas, O. Gustafsson, and H. Johansson, "Scaling of fractional delay filters using the Farrow structure," in Proc. IEEE Int. Symp. Circuits Syst., Taipei, Taiwan, May 24–27, 2009.

M. Abbas, O. Gustafsson, and H. Johansson, "On the implementation of fractional delay filters based on the Farrow structure," in Proc. Swedish System-on-Chip Conf., Rusthållargården, Arild, Sweden, May 4–5, 2009.

Paper D

The computational complexity of different polynomial evaluation schemes is studied. In the comparisons, not only multiplications are considered, but they are divided into data-data multiplications, squarers, and data-coefficient multiplications. High-level comparisons of these schemes are obtained based on the number of operations of different types, the pipelining complexity, the critical path, and the latency after pipelining, and their impact on the different parameters suggested for the selection is stated. These parameters are suggested for shortlisting suitable candidates for an implementation given the specifications.

M. Abbas and O. Gustafsson, "Computational and implementation complexity of polynomial evaluation schemes," in Proc. IEEE Norchip Conf., Lund, Sweden, Nov. 14–15, 2011.

Paper E

The problem of computing any requested set of power terms in parallel using summation trees is investigated. A technique is proposed which first generates the partial product matrix of each power term independently and then checks the computational redundancy in each, and among all, partial product matrices at bit level. The redundancy here relates to the fact that the same three partial products may be present in more than one column and, hence, all can be mapped to one full adder. The proposed algorithm is tested for different sets of powers, variable word lengths, and signed/unsigned numbers to exploit the sharing potential. This approach achieves considerable hardware savings in almost all of the cases.

M. Abbas and O. Gustafsson, "Low-complexity parallel evaluation of powers exploiting bit-level redundancy," in Proc. Asilomar Conf. Signals Syst. Comput., Pacific Grove, CA, Nov. 7–10, 2010.

Paper F

An integer linear programming (ILP) based model is proposed for the computation of a minimal-cost addition sequence for a given set of integers. Since exponents are additive under multiplication, the minimal-length addition sequence provides an efficient solution for the evaluation of a requested set of power terms. Not only is an optimal model proposed, but the model is also extended to consider different costs for multipliers and squarers, as well as to control the depth of the resulting addition sequence. Additional cuts are also proposed which, although not required for the solution, help to reduce the solution time.

M. Abbas and O. Gustafsson, "Integer linear programming modeling of addition sequences with additional constraints for evaluation of power terms," manuscript.

Paper G

Based on the switching activity estimation model derived in Paper B, the model equations are derived for the case of phase accumulators of direct digital frequency synthesizers (DDFS).

M. Abbas and O. Gustafsson, "Switching activity estimation of DDFS phase accumulators," manuscript.

Contributions have also been made in the following publication, but the contents are less relevant to the topic of the thesis:

M. Abbas, F. Qureshi, Z. Sheikh, A. Blad, O. Gustafsson, H. Johansson, and K. Johansson, "Comparison of multiplierless implementation of nonlinear-phase versus linear-phase FIR filters," in Proc. Asilomar Conf. Signals Syst. Comput., Pacific Grove, CA, Oct. 26–29, 2008.
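The link between addition sequences and power evaluation exploited in Papers E and F is that exponents are additive under multiplication (x^a · x^b = x^(a+b)), so a sequence of exponent additions translates directly into a schedule of multiplications. A small illustrative sketch (the ILP formulation itself is not reproduced here; the function name and example sequence are hypothetical):

```python
def eval_powers(x, steps):
    """Evaluate powers of x from an addition sequence.

    steps: pairs (a, b) of already-available exponents; each step
    produces the new exponent a + b at the cost of a single
    multiplication, since x**a * x**b == x**(a + b).
    """
    powers = {1: x}
    for a, b in steps:
        powers[a + b] = powers[a] * powers[b]
    return powers

# Requested set {2, 3, 5} via the addition sequence 1, 2, 3, 5:
# the whole set is produced with only three multiplications.
p = eval_powers(2, [(1, 1), (2, 1), (3, 2)])
# p[2] == 4, p[3] == 8, p[5] == 32
```

A minimal-length sequence minimizes the number of multiplications; the ILP model in Paper F additionally distinguishes squarer steps (a == b) from general multiplications and bounds the sequence depth.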


Acknowledgments

I humbly thank Allah Almighty, the Compassionate, the Merciful, who gave health, thoughts, talented teachers, affectionate parents, helping friends, and an opportunity to contribute to the vast body of knowledge. Peace and blessing of Allah be upon the Holy Prophet MUHAMMAD (peace be upon him), the last prophet of Allah, who exhorts his followers to seek knowledge from cradle to grave and whose incomparable life is the glorious model for humanity.

I would like to express my sincere gratitude towards:

My advisors, Dr. Oscar Gustafsson and Prof. Håkan Johansson, for their inspiring and valuable guidance, enlightening discussions, and kind and dynamic supervision throughout all the phases of this thesis. I have learnt a lot from them and working with them has been a pleasure. They always kindly do their best to help you.

Dr. Kenny Johansson, for introducing me to the area and power estimation tools and sharing many useful scripts, which has made life very easy.

Dr. Kent Palmkvist, for help with FPGA- and VHDL-related issues.

Peter Johansson, for all the help regarding technical as well as administrative issues.

Dr. Amir Eghbali and Dr. Anton Blad, for their kind and constant support throughout my stay here at the department.

Dr. Erik Höckerdal, for providing the LaTeX template.

Dr. Rashad Ramzan and Dr. Rizwan Asghar, for their generous help and guidance at the start of my PhD study.

Muhammad Irfan Kazim, Muhammad Touqir Pasha, and Syed Asad Alam, for being caring friends and providing help in proofreading this thesis.

The former and present colleagues at the Division of Electronics Systems, Department of Electrical Engineering, Linköping University, who have created a very friendly environment.

My friends here in Sweden, Dr. Ali Saeed, Dr. Tafzeel ur Rehman, Dr. Fahad Qazi, Fahad Qureshi, Syed Ahmed Aamir, Muhammad Saifullah Khan, Mohammad Junaid, Jawad ul Hassan, Nadeem Afzal, Saima Athar, Zaka Ullah, Zafar Iqbal, and many more, for all kinds of help and for keeping my social life alive.

The Higher Education Commission (HEC) of Pakistan is gratefully acknowledged for the financial support and the Swedish Institute (SI) for coordinating the scholarship program. Linköping University is also gratefully acknowledged for partial support.

My friends and colleagues in Pakistan, especially Jamaluddin Ahmed, Muhammad Zahid, Khalid Bin Sagheer, and Ghulam Hussain, for all the help and care they have provided during the last four years.

My elder brother Dr. Qaisar Abbas and sister-in-law Dr. Uzma, for their help at the very start when I came to Sweden and throughout the years later on. Thanks for hosting many wonderful days spent there at Uppsala; I have never felt that I am away from home. My younger brother Muhammad Waqas and my sister, for their prays and for taking care of the home in my absence. My parent-in-laws, brothers, and sisters, for their encouragement and prays.

My mother and my father, for their non-stop prays, patience, and unconditional cooperation. Thanks to both of you for having confidence and faith in me. Truly you hold the credit for all my achievements.

My wife Dr. Shazia, who has made my life so beautiful, for her devotion, taking care of Abiha single-handedly, being away from her family for a few years, and being a great asset with me during all my stay here in Sweden. To my little princess, Abiha, I say profound thanks for bringing pleasant moments into my life.

To those not listed here, thank you.

Muhammad Abbas
Linköping, Sweden
January, 2012

Contents

1 Introduction
  1.1 Digital Filters
    1.1.1 FIR Filters
    1.1.2 IIR Filters
  1.2 Sampling Rate Conversion
    1.2.1 Decimation
    1.2.2 Interpolation
    1.2.3 Noble Identities
  1.3 Polyphase Representation
  1.4 Fractional-Delay Filters
  1.5 Power and Energy Consumption

2 Finite Word Length Effects
  2.1 Numbers Representation
    2.1.1 Two's Complement Numbers
    2.1.2 Canonic Signed-Digit Representation
  2.2 Fixed-Point Quantization
    2.2.1 Two's Complement Addition
    2.2.2 Two's Complement Multiplication
  2.3 Overflow Characteristics
  2.4 Scaling
  2.5 Round-Off Noise
  2.6 Word Length Optimization
  2.7 Constant Multiplication

3 Integer Sampling Rate Conversion
  3.1 Basic Implementation
    3.1.1 FIR Decimators
    3.1.2 FIR Interpolators
  3.2 Polyphase FIR Filters
    3.2.1 Polyphase FIR Decimators
    3.2.2 Polyphase FIR Interpolators
  3.3 Cascaded Integrator Comb Filters
    3.3.1 CIC Filters in Interpolators and Decimators
    3.3.2 Polyphase Implementation of Non-Recursive CIC Filters
    3.3.3 Compensation Filters
  3.4 Multistage Implementation

4 Non-Integer Sampling Rate Conversion
  4.1 Sampling Rate Conversion: Rational Factor
    4.1.1 Small Rational Factors
    4.1.2 Polyphase Implementation of Rational Sampling Rate Conversion
    4.1.3 Large Rational Factors
  4.2 Sampling Rate Conversion: Arbitrary Factor
    4.2.1 Farrow Structures

5 Polynomial Evaluation
  5.1 Polynomial Evaluation Algorithms
  5.2 Horner's Scheme
  5.3 Parallel Schemes
  5.4 Powers Evaluation
    5.4.1 Powers Evaluation with Sharing at Word-Level
    5.4.2 Powers Evaluation with Sharing at Bit-Level

6 Conclusions and Future Work
  6.1 Conclusions
  6.2 Future Work

References

Publications

A Power Estimation of Recursive and Non-Recursive CIC Filters Implemented in Deep-Submicron Technology
  1 Introduction
  2 Recursive CIC Filters
    2.1 Complexity and Power Model
  3 Non-Recursive CIC Filters
    3.1 Complexity and Power Model
    3.2 Polyphase Implementation of Non-Recursive CIC Filters
  4 Results
  5 Conclusion
  References

B Switching Activity Estimation of Cascaded Integrator Comb Filters
  1 Introduction
    1.1 Contribution of the Paper
    1.2 Outline
  2 CIC Filters
  3 CIC Filter Integrators/Accumulators
    3.1 One's Probability
    3.2 Switching Activity
    3.3 Pipelining in CIC Filter Accumulators/Integrators
    3.4 Switching Activity for Correlated Sign Extension Bits
    3.5 Switching Activity for Later Integrator Stages
  4 CIC Filter Combs
    4.1 One's Probability
    4.2 Switching Activity
    4.3 Arbitrary Differential Delay Comb
    4.4 Switching Activity for Later Comb Stages
  5 Results
    5.1 CIC Integrators
    5.2 CIC Combs
  6 Discussion
  7 Conclusions
  References
  A Double Differential Delay Comb

C On the Implementation of Fractional-Delay Filters Based on the Farrow Structure
  1 Introduction
  2 Adjustable FD FIR Filters and the Farrow Structure
  3 Scaling of the Farrow Structure for FD FIR Filters
    3.1 Scaling the Output Values of the Sub-Filters
    3.2 Scaling in the Polynomial Evaluation
  4 Round-Off Noise
    4.1 Rounding and Truncation
    4.2 Round-Off Noise in the Farrow Structure
    4.3 Approaches to Select Word Lengths
  5 Implementation of Sub-Filters in the Farrow Structure
  6 Results
    6.1 Scaling
    6.2 Word Length Selection Approaches
    6.3 Sub-Filter Implementation
  7 Conclusions
  References

D Computational and Implementation Complexity of Polynomial Evaluation Schemes
  1 Introduction
  2 Polynomial Evaluation Algorithms
    2.1 Horner's Scheme
    2.2 Dorn's Generalized Horner Scheme
    2.3 Estrin's Scheme
    2.4 Munro and Paterson's Scheme
    2.5 Maruyama's Scheme
    2.6 Even Odd (EO) Scheme
    2.7 Li et al. Scheme
    2.8 Pipelining, Critical Path, and Latency
  3 Results
  4 Conclusions
  References

E Low-Complexity Parallel Evaluation of Powers Exploiting Bit-Level Redundancy
  1 Introduction
  2 Powers Evaluation of Binary Numbers
    2.1 Unsigned Binary Numbers
    2.2 Powers Evaluation for Two's Complement Numbers
    2.3 Partial Product Matrices with CSD Representation of Partial Product Weights
    2.4 Pruning of the Partial Product Matrices
  3 Proposed Algorithm for the Exploitation of Redundancy
    3.1 Unsigned Binary Numbers Case
    3.2 Examples
    3.3 Two's Complement Input and CSD Encoding Case
  4 Results
  5 Conclusion
  References

F Integer Linear Programming Modeling of Addition Sequences With Additional Constraints for Evaluation of Power Terms
  1 Introduction
  2 Addition Chains and Addition Sequences
  3 Proposed ILP Model
    3.1 Basic ILP Model
    3.2 Minimizing Weighted Cost
    3.3 Minimizing Depth
  4 Results
  5 Conclusions
  References

G Switching Activity Estimation of DDFS Phase Accumulators
  1 Introduction
  2 One's Probability
  3 Switching Activity
  4 Conclusion
  References

Chapter 1

Introduction

Linear time-invariant (LTI) systems have the same sampling rate at the input, at the output, and inside of the systems. In applications involving systems operating at different sampling rates, there is a need to convert the given sampling rate to the desired sampling rate, without destroying the signal information of interest. The sampling rate conversion (SRC) factor can be an integer or a non-integer.

This chapter gives a brief overview of SRC and the role of filtering in SRC. Digital filters are first introduced. The concept of decimation and interpolation, which include filtering when changing the sampling rate by an integer factor, is then explained. The time- and frequency-domain representations of the downsampling and upsampling operations are then given. The description of six identities that enable reductions in the computational complexity of multirate systems is given. A part of the chapter is devoted to the efficient polyphase implementation of decimators and interpolators. The fractional-delay filters are then briefly reviewed. Finally, the power and energy consumption in CMOS circuits is described.

1.1 Digital Filters

Digital filters are usually used to separate signals from noise or signals in different frequency bands by performing mathematical operations on a sampled, discrete-time signal. A digital filter is characterized by its transfer function, or equivalently, by its difference equation.

1.1.1 FIR Filters

If the impulse response is of finite duration and becomes zero after a finite number of samples, it is a finite-length impulse response (FIR) filter. A causal FIR filter of order N is characterized by a transfer function H(z), defined as [1, 2]

    H(z) = Σ_{k=0}^{N} h(k) z^{−k},    (1.1)

which is a polynomial in z^{−1} of degree N. The parameter N is the filter order and the total number of coefficients, N + 1, is the filter length. Here, h(0), h(1), ..., h(N) are the impulse response values, also called filter coefficients. The time-domain input-output relation of the above FIR filter is given by

    y(n) = Σ_{k=0}^{N} h(k) x(n − k),    (1.2)

where y(n) and x(n) are the output and input sequences, respectively. FIR filters can be designed to provide exact linear phase over the whole frequency range and are always bounded-input bounded-output (BIBO) stable, independent of the filter coefficients [1–3].

The direct-form structure in Fig. 1.1 is the block diagram description of the difference equation (1.2). The transposed structure is shown in Fig. 1.2.

[Figure 1.1: Direct-form realization of an N-th order FIR filter.]
[Figure 1.2: Transposed direct-form realization of an N-th order FIR filter.]

The number of coefficient multiplications in the direct and transposed forms can be halved by exploiting the coefficient symmetry of a linear-phase FIR filter. The total number of coefficient multiplications will be N/2 + 1 for even N and (N + 1)/2 for odd N.
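The direct-form relation (1.2) can be sketched as a minimal, unoptimized reference model (plain Python, illustrative only, not part of the thesis implementations):

```python
def fir_filter(h, x):
    """Direct-form FIR: y(n) = sum_{k=0}^{N} h(k) * x(n - k), zero initial state."""
    N = len(h) - 1
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k in range(N + 1):
            if n - k >= 0:  # samples before n = 0 are taken as zero
                acc += h[k] * x[n - k]
        y.append(acc)
    return y
```

Feeding a unit impulse reproduces the impulse response h(n), as expected for an FIR filter.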

1.1.2 IIR Filters

If the impulse response has an infinite duration, i.e., it theoretically never approaches zero, it is an infinite-length impulse response (IIR) filter. This type of filter is recursive and is represented by a linear constant-coefficient difference equation as [1, 2]

    y(n) = Σ_{k=0}^{N} b_k x(n − k) − Σ_{k=1}^{N} a_k y(n − k).    (1.3)

The first sum in (1.3) is non-recursive and the second sum is recursive. These two parts can be implemented separately and connected together. The cascade connection of the non-recursive and recursive sections results in a structure called direct form I, as shown in Fig. 1.3.

[Figure 1.3: Direct-form I IIR realization.]

Compared with an FIR filter, an IIR filter can attain the same magnitude specification requirements with a transfer function of significantly lower order [1]. The drawbacks are nonlinear phase characteristics, possible stability issues, and sensitivity to quantization errors [1, 4].

1.2 Sampling Rate Conversion

Sampling rate conversion is the process of converting a signal from one sampling rate to another, while changing the information carried by the signal as little as possible [5–8]. SRC is utilized in many DSP applications where two signals or systems having different sampling rates need to be interconnected to exchange digital signal data. The SRC factor can in general be an integer, a ratio of two integers, or an irrational number. Mathematically, the SRC factor can be defined as

    R = F_out / F_in,    (1.4)

where F_in is the original input sampling rate and F_out is the new sampling rate after the conversion.

For R > 1, the sampling rate is increased and this process is known as interpolation. For R < 1, the sampling rate is reduced and this process is known as decimation.

When a continuous-time signal x_a(t) is sampled at a rate F_in, the discrete-time samples are x(n) = x_a(n/F_in). SRC is required when there is a need for x(n) = x_a(n/F_out) and the continuous-time signal x_a(t) is not available anymore. For example, an analog-to-digital (A/D) conversion system is supplying signal data at some sampling rate, and the processor used to process that data can only accept data at a different sampling rate. One alternative is to first reconstruct the corresponding analog signal and, then, re-sample it at the desired rate. However, it is more efficient to perform SRC directly in the digital domain due to the availability of accurate all-digital sampling rate conversion schemes. SRC is available in two flavors: decimation and interpolation.

1.2.1 Decimation

Decimation by a factor of M, where M is a positive integer, can be performed as a two-step process, consisting of an anti-aliasing filtering followed by an operation known as downsampling [9]. A sequence can be downsampled by a factor of M by retaining every M-th sample and discarding all of the remaining samples. Applying the downsampling operation to a discrete-time signal, x(n), produces a downsampled signal y(m) as

    y(m) = x(mM).    (1.5)

The time index m in the above equation is related to the old time index n by a factor of M. The sampling rate of the new discrete-time signal is M times smaller than the sampling rate of the original signal. The block diagram showing the downsampling operation is shown in Fig. 1.4a. The downsampling operation is linear but time-varying: a delay of the original input signal by some samples does not result in the same delay of the downsampled signal. A signal downsampled by two different factors may have two different-shape output signals, but both carry the same information if the downsampling factors satisfy the sampling theorem criteria, i.e., the sampling frequencies are chosen in such a way that each of them exceeds at least two times the highest frequency in the spectrum of the original continuous-time signal.

The frequency-domain representation of downsampling can be found by taking the z-transform of both sides of (1.5) as

    Y(e^{jωT}) = Σ_{m=−∞}^{+∞} x(mM) e^{−jωTm} = (1/M) Σ_{k=0}^{M−1} X(e^{j(ωT−2πk)/M}).    (1.6)

The above equation shows the implication of the downsampling operation on the spectrum of the original signal: the output spectrum is a sum of M uniformly shifted and stretched versions of X(e^{jωT}), scaled by a factor of 1/M.
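The downsampling step (1.5) alone is a one-liner; a decimator additionally places the anti-aliasing filter H(z) in front of it. A minimal sketch (illustrative only):

```python
def downsample(x, M):
    """Keep every M-th sample: y(m) = x(mM)."""
    return x[::M]

# Example: downsampling by M = 3 keeps samples at indices 0, 3, 6, 9, ...
```

The new sequence is M times shorter, matching the M times lower sampling rate.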

(b) M -fold decimator. otherwise.4b.5: Spectra of the intermediate and decimated sequence. . 1. The upsampling by a factor of L is implemented by inserting L − 1 zeros between two consecutive samples [9]. 1. Sampling Rate Conversion 5 x(n) M y(m) x(n) H(z) x1(m) M y(m) (a) M -fold downsampler.2 Interpolation Interpolation by a factor of L. where L is a positive integer.7) A block diagram shown in Fig. 1. The spectra of the intermediate sequence and output sequence obtained after downsampling are shown in Fig. An upsampling operation to a discrete-time signal x(n) produces an upsampled signal y(m) according to y(m) = x(m/L). m = 0. . Figure 1. The block diagram of a decimator is shown in Fig. X1 (ejωT ) Hideal (ejωT1 ) π/M Y (ejωT ) π 2π ωT1 π 2π 2Mπ ωT Figure 1. 1. (1. 1/M .2. Decimation requires that aliasing should be avoided. can be realized as a two-step process of upsampling followed by an anti-imaging ﬁltering. .1. The ideal ﬁlter. The signals which are bandlimited to π/M can be downsampled without distortion. Therefore. 1.2.4: M -fold downsampler and decimation. ±L.6a is used to represent the upsampling op- . . as shown by dotted line in Fig. The performance of a decimator is determined by the ﬁlter H(z) which is there to suppress the aliasing eﬀect to an acceptable level.5. ±2L. the ﬁrst step is to bandlimit the signal to π/M and then downsampling by a factor M . should be a lowpass ﬁlter with the stopband edge at ωs T1 = π/M . 0.5.

upsampling by a factor of L is performed. X(ejωT ) π 2π Hideal (ejωT1 ) 2Lπ ωT X1 (ejωT ) π/L 2π/L Y (ejωT1 ) 2π ωT1 π/L π 2π ωT1 Figure 1.7: Spectra of the original. and in the second step. The frequency domain representation of upsampling can be found by taking the z-transform of both sides of (1. 1. The upsampling operation is a linear but time-varying operation. (1. intermediate.6: L-fold upsampler and interpolator. Figure 1.6 Chapter 1. A delay in the original input signal by some samples does not result in the same delay of the upsampled signal. The performance of an interpolator is determined .6b. The upsampling operation increases the sampling rate of the original signal by L times. unwanted images are removed using anti-imaging ﬁlter. Interpolation requires the removal of the images. eration. (b) L-fold interpolator. and output sequences.7) as +∞ Y (ejωT ) = −∞ y(m)e−jωT m = X(ejωT L ). Therefore in ﬁrst step. The block diagram of an interpolator is shown in Fig. Introduction x(n) L y(m) x(n) L x1 (m) H(z) y(m) (a) L-fold upsampler.8) The above equation shows that the upsampling operation leads to L − 1 images of the spectrum of the original signal in the baseband.

. Figure 1.2. y(m). seen in Figs. help to move the downsampler and upsampler operations to a more desirable position to enable an eﬃcient implementation structure.8: Noble identities for decimation. called noble identities. since ﬁltering has to be performed at the higher sampling rate. In SRC. but also the repeated images of the baseband. the desired sequence. 1. moving converters leads to evaluation of additions and multiplications at lower sampling rate. the arithmetic operations of additions and multiplications are to be evaluated at the lowest possible sampling rate.9c and 1. the spectrum of the sequence. 1. by the ﬁlter H(z). x1 (m). the computational eﬃciency may be improved if downsampling (upsampling) operations are introduced into the ﬁlter structures. show that a delay of M (L) sampling periods at the higher sampling rate corresponds to a delay of one sampling period at the lower rate.9b and 1. As a result.7. 1. The ideal ﬁlter should be a lowpass ﬁlter with the stopband edge at ωs T1 = π/L as shown by dotted line in Fig.3 Noble Identities The six identities. 1. 7 x1(m) M c1 y(n) M c2 x(m) z −M M y(n) x(m) M z −1 y(n) (b) Noble identity 3. The ﬁfth and sixth identities. seen in Figs. which is there to remove the unwanted images. can be obtained from x1 (m) by removing these unwanted images.1. This is performed by the interpolation (anti-imaging) ﬁlter.2. not only contains the baseband of the original signal.9a and 1. x(m) H(z M ) M y(n) x(m) M H(z) y(n) (c) Noble identity 5. Sampling Rate Conversion x1 (m) c1 M x2 (m) c2 y(n) x2(m) (a) Noble identity 1. 1. As shown in Fig.7. seen in Figs. 1. are generalized versions of the third and fourth identities.8b.8a.8c. The third and fourth identities. Apparently. In the ﬁrst and second identities.

[Figure 1.9: Noble identities for interpolation.]

1.3 Polyphase Representation

A very useful tool in multirate signal processing is the so-called polyphase representation of signals and systems [5, 10]. It facilitates considerable simplifications of theoretical results as well as efficient implementation of multirate systems. To formally define it, an LTI system is considered with a transfer function

    H(z) = Σ_{n=−∞}^{+∞} h(n) z^{−n}.    (1.9)

For an integer M, H(z) can be decomposed as

    H(z) = Σ_{m=0}^{M−1} z^{−m} Σ_{n=−∞}^{+∞} h(nM + m) z^{−nM} = Σ_{m=0}^{M−1} z^{−m} H_m(z^M).    (1.10)

The above representation is equivalent to dividing the impulse response h(n) into M non-overlapping groups of samples h_m(n), obtained from h(n) by M-fold decimation starting from sample m. The subsequences h_m(n) and the corresponding z-transforms defined in (1.10) are called the Type-1 polyphase components of H(z) with respect to M [10]. Since each polyphase component contains M − 1 zeros between two consecutive samples and only the nonzero samples are needed for further processing, the zeros can be discarded, resulting in downsampled-by-M polyphase components. The polyphase decomposition is widely used and its combination with the noble identities leads to efficient multirate implementation structures.
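For a causal FIR filter, the Type-1 decomposition (1.10) can be verified numerically: the subsequences h_m(n) = h(nM + m) reassemble H(z) as Σ_m z^{−m} H_m(z^M). Function names are illustrative only:

```python
def polyphase(h, M):
    # Type-1 components: h_m(n) = h(nM + m), m = 0, ..., M-1
    return [h[m::M] for m in range(M)]

def evaluate(h, z):
    # H(z) = sum_n h(n) z^{-n} for a finite, causal impulse response
    return sum(c * z ** (-n) for n, c in enumerate(h))

h = [1.0, 2.0, 3.0, 4.0, 5.0]
M, z = 2, 1.5
direct = evaluate(h, z)
via_components = sum(z ** (-m) * evaluate(hm, z ** M)
                     for m, hm in enumerate(polyphase(h, M)))
assert abs(direct - via_components) < 1e-9
```

Here `h[m::M]` is exactly the M-fold decimation of h(n) starting from sample m, so the zeros of H_m(z^M) never need to be stored.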

Figure 1. but it is possible to do so [10]. 1.10). 1. The ﬁlter H(z) in Fig. However. Polyphase Representation x(n) z −1 H1(z M ) z −1 H2(z M ) z −1 HM −1(z M ) (a) Polyphase decomposition of a decimation ﬁlter. For FIR ﬁlters.10a. is shown in Fig.11a. 1. the polyphase decomposition is not so simple.10b. (1. 1.4b is represented by its polyphase representation form as shown in Fig. 1. Similarly. An IIR ﬁlter has a transfer function that is a ratio of two polynomials. The representation of the transfer function into the form. An eﬃcient implementation of decimators and interpolators results if the ﬁlter transfer function is represented in polyphase decomposed form [10]. for IIR ﬁlters. where M and L .10: Polyphase implementation of a decimator.3. the interpolation ﬁlter H(z) in Fig. Its equivalent structure. but more eﬃcient in hardware implementation.11b. the polyphase decomposition into low-order sub-ﬁlters is very easy. 1.6b is represented by its polyphase representation form as shown in Fig. A more eﬃcient polyphase implementation and its equivalent commutative structure is shown in Fig.1. 9 M y(m) H0(z M ) x(n) z −1 M H0(z) y(m) H0(z) M H1(z) x(m) H1(z) y(m) z −1 M HM −1(z) HM −1 (z) (b) Moving downsampler to before sub-ﬁlters and replacing input structure by a commutator. needs some modiﬁcations in the original transfer function in such a way that the denominator is only function of powers of z M or z L .

Several approaches are available in the literature for the polyphase decomposition of IIR filters. In the first approach [11], the original IIR transfer function is re-arranged and transformed into the form (1.10). The polyphase sub-filters in the second approach are distinct allpass sub-filters [12–15]. In this thesis, only FIR filters and their polyphase implementation forms are considered for the sampling rate conversion.

1.4 Fractional-Delay Filters

Fractional-delay (FD) filters find applications in, for example, mitigation of symbol synchronization errors in digital communications [16–19], time-delay estimation [20–22], echo cancellation [23], and arbitrary sampling rate conversion [24–26]. FD filters are used to provide a fractional delay adjustable to any desired value. Ideally, the output y(n) of an FD filter for an input x(n) is given by

    y(n) = x(n − D),    (1.11)

where D is a delay. Equation (1.11) is valid for integer values of D only. For non-integer values of D, (1.11) needs to be approximated. The delay parameter D can be expressed as

    D = D_int + d,    (1.12)

where D_int is the integer part of D and d is the FD. The integer part of the delay can then be implemented as a chain of D_int unit delays. The FD d, however, needs approximation. In the frequency domain, an ideal FD filter can be expressed as

    H_des(e^{jωT}) = e^{−j(D_int + d)ωT}.    (1.13)

The ideal FD filter in (1.13) can be considered as allpass and having a linear-phase characteristic. The magnitude and phase responses are

    |H_des(e^{jωT})| = 1    (1.14)

and

    φ(ωT) = −(D_int + d)ωT.    (1.15)

For the approximation of the ideal filter response in (1.11), a wide range of FD filters have been proposed based on FIR and IIR filters [27–34]. The phase-delay and magnitude characteristics of FD filters based on first-order Lagrange interpolation [32, 35] are shown in Figs. 1.12 and 1.13.

[Figure 1.12: Phase-delay characteristics of FD filters designed for delay parameter d = {0.1, 0.2, ..., 0.9}.]

[Figure 1.13: Magnitude responses of FD filters designed for delay parameter d = {0.1, 0.2, ..., 0.9}.]

In the application of FD filters for SRC, the fractional delay d is changed at every instant an output sample occurs. For a sampling rate increase by a factor of L, d will take on all values in a sequential manner between {(L − 1)/L, ..., 0}, with a step size of 1/L. For each input sample, L output samples are generated. Equivalently, it can be interpreted as delaying the underlying continuous-time signal by L different values of the fractional delay. If the sampling rate is required to be increased by a factor of two, the fractional delay d assumes the values 0 and 0.5 for each input sample, and the two corresponding output samples are computed. The input and output rates will now be different: for each input sample there are now two output samples. This is demonstrated for d = 0.2 and d = 0.4 in Fig. 1.14. The underlying continuous-time signal x_a(nT), delayed by a fractional delay d, can be expressed as

    y(nT) = x_a(nT − D_int T − dT).    (1.16)
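As a toy sketch, first-order Lagrange interpolation realizes the fractional delay d with the two-tap filter h = [1 − d, d]; on a ramp input the output is exactly the input delayed by d (function names are illustrative only, not the structures used later in the thesis):

```python
def lagrange1_fd(x, d):
    """First-order Lagrange FD filter: y(n) = (1 - d) * x(n) + d * x(n - 1)."""
    y, prev = [], 0.0
    for s in x:
        y.append((1 - d) * s + d * prev)
        prev = s
    return y

ramp = [0.0, 1.0, 2.0, 3.0, 4.0]
out = lagrange1_fd(ramp, 0.4)
# For n >= 1, the ramp x(n) = n comes out as n - 0.4, i.e., delayed by d = 0.4.
```

For general inputs the two-tap filter is only an approximation of (1.13), with the deviations visible in Figs. 1.12 and 1.13, but it illustrates how d selects the intermediate sample values.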

15a. seen in Fig. 37] Dynamic or switching power consumption Short circuit power consumption Leakage power consumption The above sources are summarized in an equation as Pavg = Pdynamic + Pshort-circuit + Pleakage 2 = α0→1 CL VDD fclk + Isc VDD + Ileakage VDD .17) The switching or dynamic power consumption is related to charging and discharging of a load capacitance CL through the PMOS and NMOS transistors during low-to-high and high-to low transitions at the output. half of which is dissipated in the form of heat through the PMOS transistors while the other half is stored in the load capacitor.14: The underlying continuous-time signal delayed by d = 0. The power dissipation is commonly divided into three diﬀerent sources: [36. During the pull-down.5 1 d=0 d=0. 1.5 −1 −2 0 2 4 6 Sample index n 8 10 Figure 1.15b. respectively.1.4 Amplitude 0.5 Power and Energy Consumption Power is dissipated in the form of heat in digital CMOS circuits. If all these transitions occur at the clock rate of fclk then the switching power consumption .2 d=0. 1. seen in 2 Fig. 1. The total energy drawn from the power supply for low-to-high transition.4 with FD ﬁltering.5 0 −0.5. the energy stored on 2 CL which is CL VDD /2 is dissipated as heat by the NMOS transistors. high to low transition. is CL VDD . Power and Energy Consumption 13 1. (1.2 and d = 0.

is given by CL VDD² fclk. However, the switching of the data is not always at the clock rate but rather at some reduced rate, which is best defined by another parameter, α0→1, defined as the average number of times in each cycle that a node makes a transition from low to high. All the parameters in the dynamic power equation, except α0→1, are defined by the layout and specification of the circuit.

Figure 1.15: A rising and a falling output transition on the CMOS inverter: (a) a rising output transition, (b) a falling output transition. The solid arrows represent the charging and discharging of the load capacitance CL, as shown in Figs. 1.15a and 1.15b. The dashed arrow is for the leakage current.

The second power term is due to the direct-path short-circuit current, Isc, which flows when both the PMOS and NMOS transistors are active simultaneously, resulting in a direct path from supply to ground. Leakage power, on the other hand, is dissipated in the circuits when they are idle. The leakage current, Ileakage, consists of two major contributions: Isub and Igate. The term Isub is the sub-threshold current caused by low threshold voltage, which flows when both the NMOS

and PMOS transistors are off. The other quantity, Igate, is the gate current caused by the reduced thickness of the gate oxide in deep sub-micron processes. The contribution of the reverse leakage currents, due to the reverse bias between diffusion regions and wells, is small compared to the sub-threshold and gate leakage currents. In today's processes, sub-threshold leakage is the main contributor to the leakage current. The relative contribution of these forms of power consumption has evolved greatly over time. In modern CMOS technologies, where technology scaling motivates reduced power supply and threshold voltages, the two foremost forms of power consumption are dynamic and leakage, and today the leakage component of the power consumption has started to become dominant [38–40].
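The magnitude of the switching term in (1.17) is easy to get a feel for numerically (a sketch with hypothetical, illustrative device values, not figures from the thesis):

```python
def dynamic_power(alpha_01, c_load, vdd, f_clk):
    # Switching term of (1.17): P = alpha_{0->1} * C_L * VDD^2 * f_clk
    return alpha_01 * c_load * vdd ** 2 * f_clk

# Hypothetical node: 10 fF load, 1.2 V supply, 100 MHz clock,
# and a 0->1 transition in 10 % of the cycles.
p = dynamic_power(0.1, 10e-15, 1.2, 100e6)
print(p)  # about 1.44e-7 W for this one node, i.e. roughly 0.14 uW
```

Summed over the hundreds of thousands of switching nodes in a filter implementation, this term quickly dominates unless the activity α0→1 is kept low.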


Chapter 2 Finite Word Length Eﬀects

Digital filters are implemented in hardware with finite-precision numbers and arithmetic. As a result, the digital filter coefficients and internal signals are represented in discrete form. This generally leads to two different types of finite word length effects. First, there are errors in the representation of the coefficients. Representing the coefficients with finite precision (quantization) has the effect of slightly changing the locations of the filter poles and zeros. As a result, the filter frequency response differs from the response obtained with infinite-precision coefficients. This error type is, however, deterministic and is called coefficient quantization error. Second, there are errors due to multiplication round-off, which result from the rounding or truncation of multiplication products within the filter. The error at the filter output that results from these roundings or truncations is called round-off noise. This chapter outlines the finite word length effects in digital filters. It first discusses binary number representation forms. Different types of fixed-point quantization are then introduced along with their characteristics. The overflow characteristics in digital filters are briefly reviewed with respect to addition and multiplication operations. Scaling, which is used to prevent overflows in digital filter structures, is then discussed. The computation of round-off noise at the digital filter output is then outlined, followed by a description of constant coefficient multiplication. Finally, different approaches for the optimization of word lengths are reviewed.

2.1 Number Representation

In digital circuits, a number representation with a radix of two, i.e., binary representation, is most commonly used. A number is therefore represented by a sequence of binary digits, bits, which are either 0 or 1. A w-bit unsigned binary number can be represented as

X = x0 x1 x2 ... x_{w−2} x_{w−1},  (2.1)

with a value of

X = Σ_{i=0}^{w−1} x_i 2^{w−i−1},  (2.2)

where x0 is the most significant bit (MSB) and x_{w−1} is the least significant bit (LSB) of the binary number.

A fixed-point number consists of an integer part and a fractional part, with the two parts separated by a binary point. The position of the binary point is almost always implied and the point is thus not explicitly shown. If a fixed-point number has wI integer bits and wF fractional bits, it can be expressed as

X = x_{wI−1} ... x1 x0 . x_{−1} x_{−2} ... x_{−wF},  (2.3)

and its value can be obtained as

X = Σ_{i=0}^{wI−1} x_i 2^i + Σ_{i=−wF}^{−1} x_i 2^i.  (2.4)
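A small numerical check of (2.2) and (2.4) (a Python sketch; the helper names are ours, not from the thesis):

```python
def unsigned_value(bits):
    # (2.2): X = sum x_i * 2^(w-i-1), with x_0 the MSB and x_{w-1} the LSB
    w = len(bits)
    return sum(x * 2 ** (w - i - 1) for i, x in enumerate(bits))

def fixed_point_value(int_bits, frac_bits):
    # (2.4): integer-part weights 2^(wI-1) ... 2^0, fraction weights 2^-1 ... 2^-wF
    wI = len(int_bits)
    value = sum(b * 2 ** (wI - 1 - i) for i, b in enumerate(int_bits))
    value += sum(b * 2.0 ** (-(i + 1)) for i, b in enumerate(frac_bits))
    return value

print(unsigned_value([1, 0, 1, 1]))       # 1011 in binary -> 11
print(fixed_point_value([1, 0], [1, 1]))  # 10.11 in binary -> 2.75
```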

2.1.1 Two's Complement Numbers

For a suitable representation of numbers and an efficient implementation of arithmetic operations, fixed-point arithmetic with a word length of w bits is considered. Because of its special properties, the two's complement representation is considered, which is the most common type of arithmetic used in digital signal processing. The numbers are usually normalized to [−1, 1); however, to accommodate the integer bits, the range [−2^wI, 2^wI), where wI ∈ N, is assumed. The quantity wI denotes the number of integer bits. The MSB, the left-most of the w bits, is used as the sign bit and is treated in the same manner as the other bits. The fractional part is represented with wF = w − 1 − wI bits. The quantization step is consequently ∆ = 2^−wF. If X2C is a w-bit number in two's complement form, then, using the definitions above, X can be represented as

X2C = 2^wI ( −x0 2^0 + Σ_{i=1}^{w−1} x_i 2^{−i} ),  x_i ∈ {0, 1},  i = 0, 1, 2, ..., w − 1,
    = −x0 2^wI + Σ_{i=1}^{wI} x_i 2^{wI−i} + Σ_{i=wI+1}^{w−1} x_i 2^{wI−i},

where the first term corresponds to the sign bit, the first sum to the integer part, and the second sum to the fractional part, or, in compact form, as

X2C = [ x0 | x1 x2 ... x_wI | x_{wI+1} ... x_{w−1} ]2.  (2.5)

In two’s complement, the range of representable numbers is asymmetric. The largest number is Xmax = 2wI − 2−wF = [0|1 . . . 1|1 . . . 1|]2 , and the smallest number is Xmin = −2wI = [1|0 . . . 0|0 . . . 0|]2 . (2.7) (2.6)

2.1.2 Canonic Signed-Digit Representation

Signed-digit (SD) numbers differ from the binary representation in that the digits are allowed to take negative values, i.e., xi ∈ {−1, 0, 1}, where the symbol 1̄ is also used to represent −1. It is a redundant number system, as different SD representations of the same value are possible. The canonic signed-digit (CSD) representation is a special case of the signed-digit representation in which each number has a unique representation. The other feature of the CSD representation is that a CSD binary number has the fewest number of non-zero digits, with no two consecutive digits being non-zero [41]. A number can be represented in CSD form as

X = Σ_{i=0}^{w−1} x_i 2^i,

where x_i ∈ {−1, 0, +1} and x_i x_{i+1} = 0 for i = 0, 1, ..., w − 2.
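A standard way to compute the CSD digits is to scan the number from the LSB and choose +1 or −1 so that the remaining value becomes divisible by four (a Python sketch for non-negative integers; this particular recoding routine is ours, not an algorithm given in the thesis):

```python
def to_csd(x, w):
    # CSD digits of a non-negative integer x, LSB first, in {-1, 0, 1},
    # with no two adjacent non-zero digits.
    digits = []
    while x != 0:
        if x % 2 == 0:
            digits.append(0)
            x //= 2
        else:
            d = 2 - (x % 4)       # +1 if x = 1 (mod 4), -1 if x = 3 (mod 4)
            digits.append(d)
            x = (x - d) // 2
    return digits + [0] * (w - len(digits))

csd = to_csd(45, 8)
print(csd)  # digits of 45 = +1 - 4 - 16 + 64, LSB first
assert sum(d * 2 ** i for i, d in enumerate(csd)) == 45
assert all(csd[i] * csd[i + 1] == 0 for i in range(len(csd) - 1))
```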

2.2 Fixed-Point Quantization

Three types of fixed-point quantization are normally considered: rounding, truncation, and magnitude truncation [1, 42, 43]. The quantization operator is denoted by Q(·). For a number X, the rounded value is denoted by Qr(X), the truncated value by Qt(X), and the magnitude truncated value by Qmt(X). If the quantized value has wF fractional bits, the quantization step size, i.e., the difference between adjacent quantization levels, is

∆ = 2^−wF.  (2.8)

The rounding operation selects the quantization level that is nearest to the unquantized value. As a result, the rounding error is at most ∆/2 in magnitude, as shown in Fig. 2.1a. If the rounding error, εr, is defined as

εr = Qr(X) − X,  (2.9)

then

−∆/2 ≤ εr ≤ ∆/2.  (2.10)

Figure 2.1: Quantization error characteristics: (a) rounding error, (b) truncation error, (c) magnitude truncation error.

Truncation simply discards the LSBs, giving a quantized value that is always less than or equal to the exact value. The error characteristics in the case of truncation are shown in Fig. 2.1b. The truncation error is

−∆ < εt ≤ 0.  (2.11)

Magnitude truncation chooses the nearest quantized value that has a magnitude less than or equal to the exact value, as shown in Fig. 2.1c, which implies

−∆ < εmt < ∆.  (2.12)
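The three quantizers are easy to express with round/floor/trunc on the scaled value (a Python sketch; note how they differ for a negative input):

```python
import math

def q_round(x, wF):
    # nearest level; error within [-Delta/2, Delta/2], cf. (2.10)
    delta = 2.0 ** -wF
    return round(x / delta) * delta

def q_trunc(x, wF):
    # discard LSBs (two's complement truncation); error within (-Delta, 0], cf. (2.11)
    delta = 2.0 ** -wF
    return math.floor(x / delta) * delta

def q_mag_trunc(x, wF):
    # truncate toward zero; error magnitude below Delta, cf. (2.12)
    delta = 2.0 ** -wF
    return math.trunc(x / delta) * delta

x, wF = -0.30, 3   # Delta = 0.125
print(q_round(x, wF), q_trunc(x, wF), q_mag_trunc(x, wF))  # -> -0.25 -0.375 -0.25
```

For positive inputs, truncation and magnitude truncation coincide; for negative inputs, truncation rounds downward while magnitude truncation rounds toward zero.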

The quantization error can often be modeled as a random variable that has a uniform distribution over the appropriate error range. Therefore, the filter calculations involving round-off errors can be assumed to be error-free calculations that have been corrupted by additive white noise [43]. The mean and variance of the rounding error are

mr = (1/∆) ∫_{−∆/2}^{∆/2} εr dεr = 0  (2.13)

and

σr² = (1/∆) ∫_{−∆/2}^{∆/2} (εr − mr)² dεr = ∆²/12.  (2.14)

Similarly, for truncation, the mean and variance of the error are

mt = −∆/2 and σt² = ∆²/12.  (2.15)
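The statistics (2.13)–(2.14) can be checked empirically by rounding random inputs (a Monte-Carlo sketch; the sample count and seed are arbitrary choices of ours):

```python
import random

def rounding_error_stats(n, wF, seed=1):
    # Empirical mean and variance of the rounding error, to be compared
    # with (2.13) and (2.14): m_r = 0 and sigma_r^2 = Delta^2 / 12.
    rng = random.Random(seed)
    delta = 2.0 ** -wF
    errs = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        errs.append(round(x / delta) * delta - x)
    mean = sum(errs) / n
    var = sum((e - mean) ** 2 for e in errs) / n
    return mean, var

mean, var = rounding_error_stats(100_000, wF=8)
delta = 2.0 ** -8
print(mean, var, delta ** 2 / 12)  # mean ~ 0, var ~ Delta^2 / 12
```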

For magnitude truncation, the mean and variance of the error are

mmt = 0 and σmt² = ∆²/3.  (2.16)

2.3 Overflow Characteristics

With finite word length, it is possible for the arithmetic operations to overflow. For fixed-point arithmetic, this happens when two numbers of the same sign are added to give a value having a magnitude not in the interval [−2^wI, 2^wI). Since numbers outside this range are not representable, the overflow characteristics of two's complement arithmetic can be expressed as

X2C(X) = X − 2^(wI+1),  X ≥ 2^wI,
         X,             −2^wI ≤ X < 2^wI,  (2.17)
         X + 2^(wI+1),  X < −2^wI,

which corresponds to a repeated addition or subtraction of 2^(wI+1). Graphically, this is shown in Fig. 2.2.

Figure 2.2: Overflow characteristics for two's complement arithmetic.

2.3.1 Two's Complement Addition

In two's complement arithmetic, when two numbers each having w bits are added together, the result has w + 1 bits. To accommodate this extra bit, the number of integer bits needs to be extended. In two's complement, such overflows can be seen as discarding the extra bit, which corresponds to a repeated addition or subtraction of 2^(wI+1) to make the (w + 1)-bit result representable with w bits. This model for overflow is illustrated in Fig. 2.3.

Figure 2.3: Addition in two's complement. The integer d ∈ Z is assigned a value such that y ∈ [−2^wI, 2^wI).

2.3.2 Two's Complement Multiplication

In the case of multiplication of two fixed-point numbers each having w bits, the result has 2w bits; with each operand having a precision of wF, the product has a precision of 2wF. The product is reduced to precision wF again by using rounding or truncation, and overflow is treated here in the same way as in the case of addition. The model for multiplication is shown in Fig. 2.4.

Figure 2.4: Multiplication in two's complement. The integer d ∈ Z is assigned a value such that y ∈ [−2^wI, 2^wI).

2.4 Scaling

To prevent overflow in fixed-point filter realizations, the signal levels inside the filter can be reduced by inserting scaling multipliers. However, the scaling multipliers should not distort the transfer function of the filter. The signal levels should also not be too low, since the signal-to-noise ratio (SNR) will otherwise suffer, as the noise level is fixed in fixed-point arithmetic. The use of two's complement arithmetic eases the scaling, as repeated additions with overflows are acceptable as long as the final sum lies within the proper signal range [4]. However, the inputs to non-integer multipliers must not overflow. In the literature, there exist several scaling norms that compromise between the probability of overflow and the round-off noise level at the output. In this thesis, only the commonly employed L2-norm is considered, which for a Fourier transform H(e^jωT) is defined as

||H(e^jωT)||2 = ( (1/(2π)) ∫_{−π}^{π} |H(e^jωT)|² d(ωT) )^(1/2).

In particular, if the input to a filter is Gaussian white noise with a certain probability of overflow, using L2-norm scaling of a node inside the filter, or at the output, implies the same probability of overflow at that node.
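By Parseval's relation, the L2-norm above equals the square root of the sum of the squared impulse-response values, which makes L2 scaling straightforward to compute for an FIR response (a Python sketch; the example coefficients are ours):

```python
import math

def l2_norm(h):
    # ||H||_2 = sqrt( (1/2pi) * integral |H(e^jwT)|^2 d(wT) ) = sqrt( sum h(n)^2 )
    return math.sqrt(sum(c * c for c in h))

h = [0.25, 0.5, 0.25]
c = 1.0 / l2_norm(h)            # scaling multiplier making the L2-norm unity
scaled = [c * coeff for coeff in h]
print(l2_norm(h), l2_norm(scaled))  # the scaled response has L2-norm 1.0
```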

2.5 Round-Off Noise

A few assumptions need to be made before computing the round-off noise at the digital filter output. Quantization noise is assumed to be stationary, white, and uncorrelated with the filter input. In case there is more than one source of round-off error in the filter, these errors are also assumed to be mutually uncorrelated. This assumption is valid if the filter input changes from sample to sample in a sufficiently random-like manner [43]. For a linear system with impulse response g(n), excited by white noise with mean mx and variance σx², the mean and variance of the output noise are

my = mx Σ_{n=−∞}^{∞} g(n)  (2.18)

and

σy² = σx² Σ_{n=−∞}^{∞} g²(n),  (2.19)

where g(n) is the impulse response from the point where a round-off takes place to the filter output.

2.6 Word Length Optimization

As stated earlier in the chapter, the quantization process introduces round-off errors. Excessive bit-width allocation will result in wasting valuable hardware resources, while insufficient bit-width allocation will result in overflows and violate the precision requirements. The word length optimization approach trades precision for VLSI measures such as area, power, and speed. These are the measures or costs by which the performance of a design is evaluated, and the cost of an implementation is generally required to be minimized while still satisfying the system specification in terms of implementation accuracy. The first difficulty in the word length optimization problem is defining the relationship between the word lengths and the considered VLSI measures. Possible ways are closed-form expressions or the availability of precomputed values of these measures, in the form of a table, as functions of the word lengths. These closed-form expressions and precomputed values are then used by the word length optimization algorithm in the word length assignment phase to obtain an estimate before doing an actual VLSI implementation.
After word length optimization, the hardware implementation of an algorithm will typically be efficient, involving a variety of finite-precision representations of different sizes for the input, output, and internal variables.
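Relation (2.19) is easy to confirm by simulation (a sketch; the filter coefficients, noise distribution, and sample count are arbitrary choices of ours):

```python
import random

def filtered_noise_variance(g, n=200_000, seed=7):
    # Drive an FIR impulse response g(n) with zero-mean white noise and
    # measure the output variance; by (2.19) it should approach
    # sigma_x^2 * sum g(n)^2.
    rng = random.Random(seed)
    sigma_x2 = 1.0 / 12.0  # variance of uniform(-0.5, 0.5) noise
    x = [rng.uniform(-0.5, 0.5) for _ in range(n + len(g))]
    y = [sum(c * x[k - i] for i, c in enumerate(g))
         for k in range(len(g), len(x))]
    measured = sum(v * v for v in y) / len(y)   # the output is zero mean
    predicted = sigma_x2 * sum(c * c for c in g)
    return measured, predicted

measured, predicted = filtered_noise_variance([1.0, -0.5, 0.25])
print(measured, predicted)  # the two values agree to within a percent or so
```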

The round-off error is a decreasing function of the word length, while VLSI measures such as area, speed, and power consumption are increasing functions of the word length. The formulation of an optimization problem is therefore done by constraining the cost or the accuracy of an implementation while optimizing the other metric(s). The round-off noise at the output is taken as the performance measure, because it is the primary concern of many algorithm designers and because it in turn measures the accuracy of an implementation. To derive a round-off noise model, an LTI system with n quantization error sources is assumed. For simplicity, these error sources are assumed to be uncorrelated, which allows superposition of the independent noise sources to be used when computing the overall round-off noise at the output. The noise variance at the output is then written as

σo² = Σ_{i=1}^{n} σei² Σ_{k=0}^{∞} hi²(k),  (2.20)

where ei is the quantization error source at node i and hi(k) is the impulse response from node i to the output. If the quantization word length wi is assumed for the error source at node i, then

σei² = 2^−2wi / 12,  1 ≤ i ≤ n.  (2.21)

One possible formulation of the word length optimization problem is then

minimize  f(wi),  1 ≤ i ≤ n,  (2.22)
subject to  σo² ≤ σspec²,  (2.23)

where σspec² is the required noise specification at the output, measured appropriately for the considered technology, and where the cost function f(wi) is assumed to be a function of the quantization word lengths. The problem of word length optimization has received considerable research attention. In [44–48], different search-based strategies are used to find suitable word length combinations, while in [49] the word length allocation problem is solved using a mixed-integer linear programming formulation. Some other approaches, e.g., [50–52], have on the other hand constrained the cost, due to the limiting of the internal word lengths, while optimizing the other metric(s).

2.7 Constant Multiplication

A multiplication with a constant coefficient, commonly used in DSP algorithms such as digital filters [53], can be made multiplierless by using additions, subtractions, and shifts only [54]. The complexity of adders and subtracters is roughly the same, so no differentiation between the two is normally considered. A shift operation in this context is used to implement a multiplication by a factor of two.
In the word length optimization formulation above, the cost function is assumed to be the area of the design.

In constant coefficient multiplication, the hardware requirements depend on the coefficient value, e.g., on the number of ones in the binary representation of the coefficient. Consider, for example, the coefficient 45, having the CSD representation 101̄01̄01. The multiplication with this constant can be realized by three different structures, as shown in Fig. 2.5, varying with respect to the number of additions and shifts required [41]. For bit-parallel arithmetic, the shifts can be realized without any hardware by using hardwiring.

Figure 2.5: Different realizations of multiplication with the coefficient 45. The symbol ≪i is used to represent i left shifts.

The constant coefficient multiplication can be implemented by the method that is based on the CSD representation of the constant coefficient [58], or more efficiently by using other structures that require a fewer number of operations [59]. In some applications, one signal is required to be multiplied by several constant coefficients, as in the case of the transposed direct-form FIR filters shown in Fig. 1.2. A simple way to implement such multiplier blocks is to realize each multiplier separately. However, they can be implemented more efficiently by using structures that remove redundant partial results among the coefficients and thereby reduce the overall number of operations. Realizing the set of products of a single multiplicand is known as the multiplier block problem [60] or the multiple constant multiplications (MCM) problem [61]. Most of the work in the literature has focused on minimizing the adder cost [55–57]. The MCM algorithms can be divided into three groups based on the approach used in the algorithms: sub-expression sharing [61–65], difference methods [66–70], and graph based methods [60, 71–73]. The MCM concept can be further generalized to computations involving multiple inputs and multiple outputs.
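The coefficient 45 makes a concrete example: its CSD form 101̄01̄01 gives a three-adder realization, while factoring 45 = 5 · 9 gives a two-adder one (a Python sketch using left shifts for the multiplications by powers of two; the two variants correspond to different structures of the kind shown in Fig. 2.5):

```python
def mul45_csd(x):
    # 45x = 64x - 16x - 4x + x, from the CSD digits of 45: three add/subtract ops
    return (x << 6) - (x << 4) - (x << 2) + x

def mul45_factored(x):
    # 45x = ((9x) << 2) + 9x with 9x = (x << 3) + x: only two adders
    x9 = (x << 3) + x
    return (x9 << 2) + x9

for v in range(-8, 9):
    assert mul45_csd(v) == 45 * v == mul45_factored(v)
print(mul45_csd(7))  # -> 315
```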

Such computations correspond to a matrix-vector multiplication with a matrix of constant coefficients. This is the case for linear transforms such as the discrete cosine transform (DCT) or the discrete Fourier transform (DFT), but also for FIR filter banks [74], polyphase decomposed FIR filters [75], and state-space digital filters [4, 41]. Matrix-vector MCM algorithms include [76–80].
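The saving from sharing partial results can be illustrated for a small two-coefficient multiplier block (a sketch; the coefficient set {5, 45} is our own example):

```python
def multiplier_block(x):
    # Shared shift-and-add network for the coefficient set {5, 45}:
    # the partial result 5x is reused, since 45 = 9 * 5.
    x5 = (x << 2) + x        # 5x: one adder
    x45 = (x5 << 3) + x5     # 45x = 8*(5x) + 5x: one more adder
    return x5, x45

print(multiplier_block(3))   # -> (15, 135)
```

Two adders suffice here, whereas realizing 5x (one adder) and 45x (three adders in CSD form) separately would require four.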

Chapter 3

Integer Sampling Rate Conversion

The role of filters in sampling rate conversion has been described in Chapter 1. This chapter mainly focuses on the implementation aspects of decimators and interpolators. In decimation, the filtering operation precedes the downsampling and, in interpolation, the filtering operation follows the upsampling, due to the fact that the filtering initially has to be performed on the side with the higher sampling rate. However, as discussed in Chapter 1, the downsampling or upsampling operations can be moved into the filtering structures. The goal is then to achieve conversion structures allowing the arithmetic operations to be performed at the lower sampling rate. As a result, the overall computational workload in the sampling rate conversion system can potentially be reduced by the conversion factor M (L).

The chapter begins with the basic decimation and interpolation structures that are based on FIR filters. A part of the chapter is then devoted to the efficient polyphase implementation of FIR decimators and interpolators. The concept of the basic cascaded integrator-comb (CIC) filter is then introduced and its properties are discussed. The structures of the CIC-based decimators and interpolators are shown, and the overall two-stage implementation of a CIC filter together with an FIR filter used for compensation is described. Finally, the non-recursive implementation of the CIC filter and its multistage and polyphase implementation structures are presented.

3.1 Basic Implementation

Filters in SRC have an important role and are used as anti-aliasing filters in decimators or as anti-imaging filters in interpolators. The characteristics of these filters therefore correlate with the overall performance of a decimator or an interpolator. Without modification, the filtering operations need to be performed at the higher sampling rate. The conversion structures discussed in the following instead provide for the arithmetic operations to be performed at the lower sampling rate, and the overall computational workload is as a result reduced.

3.1.1 FIR Decimators

In the basic approach [81], a direct-form FIR filter is cascaded with the downsampling operation, as shown in Fig. 3.1. The FIR filter acts as the anti-aliasing filter. The input sample values are multiplied by the filter coefficients h(n) and added together to give the output y(m).

Figure 3.1: Cascade of a direct-form FIR filter and a downsampler.

In the basic implementation of the FIR decimator in Fig. 3.1, the number of multiplications per input sample is equal to the FIR filter length N + 1. The symmetry of the coefficients may be exploited in the case of a linear-phase FIR filter to reduce the number of multiplications to (N + 1)/2 for odd N and to N/2 + 1 for even N. The implementation can also be modified to a form that is computationally more efficient, as seen in Fig. 3.2. The first modification, made by the use of the noble identity, moves the downsampling operation before the multiplications and additions, so that these arithmetic operations are performed at the lower sampling rate. For a downsampling factor M, this reduces the number of multiplications per input sample to (N + 1)/M. In the second modification, the input data is read simultaneously from the delays at every M:th instant of time.

Figure 3.2: Computationally efficient direct-form FIR decimation structure.

If, in addition, the coefficient symmetry of the linear-phase FIR filter is exploited, the number of multiplications per input sample becomes (N + 1)/(2M) for odd N and (N + 2)/(2M) for even N.

3.1.2 FIR Interpolators

The interpolator is a cascade of an upsampler followed by an FIR filter that acts as an anti-imaging filter, as shown in Fig. 3.3.

Figure 3.3: Cascade of an upsampler and a transposed direct-form FIR filter.

In the basic implementation of the FIR interpolator in Fig. 3.3, the multiplications need to be performed at the sampling rate of the upsampled signal, although only every L:th input sample to the filter is non-zero. Since there is no need to multiply the filter coefficients by the zero-valued samples, the upsampling operation can, similar to the decimator case, be moved into the filter structure at the desired position such that the multiplication operations are performed on the lower sampling-rate side, as shown in Fig. 3.4; the result is then upsampled by the desired factor L. This modified implementation approach reduces the multiplication count per output sample from N + 1 to (N + 1)/L. Similar to the decimator, the coefficient symmetry of a linear-phase FIR filter can be exploited to reduce the number of multiplications further by a factor of two.

Figure 3.4: Computationally efficient transposed direct-form FIR interpolation structure.

3.2 Polyphase FIR Filters

The non-recursive nature of FIR filters provides the opportunity to improve the efficiency of FIR decimators and interpolators through polyphase decomposition. As described earlier, the transfer function of an FIR filter can be decomposed into parallel structures based on the principle of polyphase decomposition. For a factor-of-M polyphase decomposition, the FIR filter transfer

function is decomposed into M low-order transfer functions called the polyphase components [5, 10]. If the impulse response h(n) is assumed to be zero outside 0 ≤ n ≤ N, the factors Hm(z) in (1.10) become

Hm(z) = Σ_{n=0}^{(N+1)/M} h(nM + m) z^−n,  0 ≤ m ≤ M − 1.  (3.1)

The individual contributions from these polyphase components, when added together, have the same effect as the original transfer function.

3.2.1 Polyphase FIR Decimators

The basic structure of the polyphase decomposed FIR filter is the same as that in Fig. 1.10, but with the sub-filters Hm(z) now specifically defined by (3.1). The arithmetic operations in the implementation of all FIR sub-filters still operate at the input sampling rate. The downsampling operation at the output can, however, be moved into the polyphase branches by using the more efficient polyphase decimation implementation structure in Fig. 1.10a. As a result, the arithmetic operations inside the sub-filters operate at the lower sampling rate, and the overall computational complexity of the decimator is reduced by a factor of M. The resulting polyphase architecture for a factor-of-M decimation is shown in Fig. 3.5.

Figure 3.5: Polyphase factor-of-M FIR decimator structure.

The input sequences to the polyphase sub-filters are combinations of delayed and downsampled values of the input, which can be selected directly from the input with the use of a commutator.
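The equivalence between "filter, then downsample" and the polyphase structure can be demonstrated directly (a plain-Python sketch, unoptimized but faithful to the branch arithmetic; the test signal and coefficients are ours):

```python
def fir(x, h):
    # direct convolution, y[n] = sum_i h[i] * x[n-i]
    y = [0.0] * len(x)
    for n in range(len(x)):
        for i, c in enumerate(h):
            if n - i >= 0:
                y[n] += c * x[n - i]
    return y

def decimate_reference(x, h, M):
    # textbook cascade: full-rate filtering followed by downsampling
    return fir(x, h)[::M]

def decimate_polyphase(x, h, M):
    # sub-filters h_m(n) = h(nM + m); one output per M input samples,
    # so every multiply-accumulate runs at the low rate
    sub = [h[m::M] for m in range(M)]
    y = []
    for k in range(0, len(x), M):
        acc = 0.0
        for m in range(M):
            for n, c in enumerate(sub[m]):
                j = k - m - n * M
                if j >= 0:
                    acc += c * x[j]
        y.append(acc)
    return y

x = [float(v) for v in range(1, 13)]
h = [1.0, 0.5, 0.25, 0.125, 0.0625]
print(decimate_reference(x, h, 3))
print(decimate_polyphase(x, h, 3))  # identical samples, computed at 1/3 the rate
```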

The commutator operates at the input sampling rate, but the sub-filters operate at the output sampling rate.

3.2.2 Polyphase FIR Interpolators

The polyphase implementation of an FIR filter for interpolation by a factor L is done by decomposing the original transfer function into L polyphase components, also called polyphase sub-filters. The basic structure of the polyphase decomposed FIR filter is the same as that in Fig. 1.11. In this basic structure, the arithmetic operations in the implementation of all FIR sub-filters operate at the upsampled rate. The upsampling operation at the input can be moved into the polyphase branches by using the more efficient polyphase interpolation implementation structure in Fig. 1.11a. A more efficient interpolation structure results, because the arithmetic operations inside the sub-filters now operate at the lower sampling rate, and the overall computational complexity of the interpolator is reduced by a factor of L. The resulting architecture for a factor-of-L interpolator is shown in Fig. 3.6. On the output side, a combination of upsamplers, delays, and adders is used to feed out the correct interpolated output sample; the same function can be realized directly by using a commutative switch.

Figure 3.6: Polyphase factor-of-L FIR interpolator structure.

3.3 Multistage Implementation

A multistage decimator with K stages, seen in Fig. 3.7, can be used when the decimation factor M can be factored into a product of integers, M = M1 × M2 × ... × MK. Similarly, a multistage interpolator with K stages, seen in Fig. 3.8, can be used when the interpolation factor L can be factored in the same way.
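The interpolation counterpart can be sketched in the same way (again plain Python, our own illustration: branch m of the polyphase network produces output phase m while all multiplications run at the low input rate):

```python
def fir(x, h):
    # direct convolution, y[n] = sum_i h[i] * x[n-i]
    y = [0.0] * len(x)
    for n in range(len(x)):
        for i, c in enumerate(h):
            if n - i >= 0:
                y[n] += c * x[n - i]
    return y

def interpolate_reference(x, h, L):
    # textbook cascade: upsample by L (insert L-1 zeros), then filter
    up = []
    for s in x:
        up += [s] + [0.0] * (L - 1)
    return fir(up, h)

def interpolate_polyphase(x, h, L):
    # sub-filters h_m(n) = h(nL + m); branch m feeds output sample k*L + m
    sub = [h[m::L] for m in range(L)]
    y = [0.0] * (len(x) * L)
    for k in range(len(x)):
        for m in range(L):
            for n, c in enumerate(sub[m]):
                if k - n >= 0:
                    y[k * L + m] += c * x[k - n]
    return y

x = [1.0, 2.0, 3.0]
h = [1.0, 0.5, 0.25, 0.125]
print(interpolate_reference(x, h, 2))
print(interpolate_polyphase(x, h, 2))  # the same sequence from both structures
```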

The interpolation factor is then factored as L = L1 × L2 × ... × LK.

Figure 3.7: Multistage implementation of a decimator.

Figure 3.8: Multistage implementation of an interpolator.

The multistage implementations are used when there is a need to implement large sampling rate conversion factors [1, 82]. A single-stage implementation with a large decimation (interpolation) factor requires a filter with a very narrow passband, which is hard to realize from a complexity point of view [3, 81, 82]. Compared to the single-stage implementation, the multistage approach relaxes the specifications of the individual filters. The noble identities can be used to transform the multistage decimator implementation into the equivalent single-stage structure H1(z)H2(z^M1)H3(z^M1M2) ... HJ(z^M1M2...MJ−1), followed by downsampling with the total factor M1M2...MJ, as shown in Fig. 3.9. An optimum realization depends on the proper selection of K and J and on the best ordering of the multistage factors [82].

Figure 3.9: Single-stage equivalent of a multistage decimator.

3.4 Cascaded Integrator-Comb Filters

An efficient architecture for a high-decimation-rate filter is the cascaded integrator-comb (CIC) filter introduced by Hogenauer [83]. The CIC filter has proven to be an effective element in high-decimation or interpolation systems [84–87]. It is a cascade of integrator and comb sections, and the simplicity of implementation makes CIC filters suitable for operating at higher frequencies. A single-stage CIC filter is shown in Fig. 3.11. The feed-forward section of the CIC filter, with differential delay R, is the comb section, and the feedback section is called an integrator. The transfer function of a single-stage CIC filter is given as

H(z) = (1 − z^−R)/(1 − z^−1) = Σ_{n=0}^{R−1} z^−n.  (3.2)
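The identity in (3.2) — an integrator-comb cascade equals a length-R moving sum — can be confirmed numerically (a Python sketch with exact integer arithmetic; the input sequence is arbitrary):

```python
def cic_single_stage(x, R):
    # integrator v[n] = v[n-1] + x[n], then comb y[n] = v[n] - v[n-R]
    acc, v = 0, []
    for s in x:
        acc += s
        v.append(acc)
    return [v[n] - (v[n - R] if n >= R else 0) for n in range(len(v))]

def moving_sum(x, R):
    # direct realization of H(z) = 1 + z^-1 + ... + z^-(R-1)
    return [sum(x[max(0, n - R + 1):n + 1]) for n in range(len(x))]

x = [3, 1, 4, 1, 5, 9, 2, 6]
print(cic_single_stage(x, 4))
print(moving_sum(x, 4))  # identical: the pole at z = 1 is exactly cancelled
```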

Figure 3.10: Single-stage equivalent of a multistage interpolator, with the transfer function H1(z)H2(z^L1)H3(z^L1L2) ... HJ(z^L1L2...LJ−1) preceded by upsampling with the total factor L1L2...LJ.

Figure 3.11: Single-stage CIC filter.

The frequency response of the CIC filter, obtained by evaluating H(z) on the unit circle, z = e^jωT = e^j2πf, is given by

H(e^jωT) = ( sin(ωT R/2) / sin(ωT/2) ) e^−jωT(R−1)/2.  (3.3)

The gain of the single-stage CIC filter at ωT = 0 is equal to the differential delay R of the comb section. The comb filter has R zeros that are equally spaced around the unit circle: the zeros are the R:th roots of unity, located at z(k) = e^j2πk/R, where k = 0, 1, 2, ..., R − 1. The integrator section, on the other hand, has a single pole at z = 1.

3.4.1 CIC Filters in Interpolators and Decimators

In the hardware implementation of decimation and interpolation, cascaded integrator-comb (CIC) filters are frequently used as computationally efficient narrowband lowpass filters. These lowpass filters are well suited to improve the efficiency of the anti-aliasing filtering prior to decimation and of the anti-imaging filtering after interpolation. CIC filters are generally most convenient for large conversion factors, due to their small lowpass bandwidth. In multistage decimators with a large conversion factor, the CIC filter is best suited for the first decimation stage, whereas in interpolators the comb filter is adopted for the last interpolation stage. In both of these applications, the CIC filters have to operate at very high sampling rates. The integrator stages will lead to overflows; however, this is of no harm if two's complement arithmetic is used and the range of the number system is equal to or exceeds the maximum magnitude expected at the output of the composite CIC filter. With the two's complement wraparound property, the comb section following the integrator will still compute the correct result at the output. The use of two's complement and non-saturating arithmetic thus accommodates the issues with overflow.
CIC filters are based on the fact that perfect pole/zero cancellation can be achieved, which is only possible with exact integer arithmetic [88].

The cascade connection of the integrator and comb in Fig. 3.11 can be interchanged, as both are linear time-invariant operations; the comb and integrator sections can thus follow or precede each other. In a decimator structure, the integrator and comb are used as the first and the second section of the CIC filter, respectively, and the structure also includes the decimation factor M. Since it is generally preferred to place the comb part on the side that has the lower sampling rate, the decimation factor M is moved into the comb section using the noble identity and considering R = DM. The resulting architecture is the most common implementation of CIC decimation filters, shown in Fig. 3.12. This reduces the length of the delay line, which is basically the differential delay of the comb section, to D = R/M, and the comb section now operates at a reduced clock rate. The sampling rate at the output as a result of this operation is Fin/M, where Fin is the input sampling rate. One advantage is that the storage requirement is reduced; together with the reduced clock rate of the comb section, both of these factors result in hardware savings and also in low power consumption.

Figure 3.12: Single-stage CIC decimation filter with D = R/M.

Similarly, the CIC filter with the upsampling factor L is used for interpolation purposes. The interpolation factor L is moved into the comb section using the noble identity and considering R = DL, so that the delay length is reduced to D = R/L. The resulting interpolation structure is shown in Fig. 3.13. The sampling rate as a result of this operation is Fin L. Similar advantages can be achieved as in the case of the CIC decimation filter. The important thing to note is that the integrator section operates at the higher sampling rate in both cases.

Figure 3.13: Single-stage CIC interpolation filter with D = R/L.

However, an important aspect of the CIC filters is the spectral aliasing resulting from the downsampling operation: imperfect filtering gives rise to unwanted spectral images, and the spectral bands centered at the integer multiples of 2/D will alias directly into the desired band after the downsampling operation. The filtering characteristics for the anti-aliasing and anti-image attenuation can be improved by increasing the number of stages of the CIC filter. The transfer function of the K-stage CIC filter is

H(z) = HI^K(z) HC^K(z) = ( (1 − z^−D)/(1 − z^−1) )^K.  (3.4)

In order to modify the magnitude response of the CIC filter, the differential delay D is typically restricted to 1 or 2.
The value of D also decides the number of nulls in the frequency response of the decimation ﬁlter. 3. Integer Sampling Rate Conversion M z −1 z −D y(m) − Figure 3. The transfer function of the K-stage CIC ﬁlter is K K H(z) = HI (z)HC (z) = 1 − z −D 1 − z −1 K . the interpolation factor L is moved into the comb section using the noble identity and considering R = DL. One advantage is that the storage requirement is reduced and also the comb section now operates at a reduced clock rate. However. The sampling rate at the output as a result of this operation is Fin /M .12: Single stage CIC decimation ﬁlter with D = R/M . The resulting interpolation structure is shown in Fig. The delay length is reduced to D = R/M . the resulting architecture is the most common implementation of CIC decimation ﬁlters shown in Fig.
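The behavior of (3.4) can be illustrated with a short sketch. This is a hedged illustration, not the thesis's implementation: the function name and the use of NumPy cumulative sums are mine. K integrator stages run at the input rate, the signal is downsampled by M, and K comb stages with differential delay D run at the low rate, as in Fig. 3.12. For K = 1 this reproduces a moving sum of R = DM input samples, taken at every M-th instant.

```python
import numpy as np

def cic_decimate(x, M, D=1, K=1):
    """Recursive CIC decimator of (3.4): K integrators at the input rate,
    downsampling by M, then K combs with differential delay D (Fig. 3.12).
    Integer arithmetic is used so no rounding occurs."""
    y = np.asarray(x, dtype=np.int64)
    for _ in range(K):                      # integrator section, rate F_in
        y = np.cumsum(y)
    y = y[::M]                              # downsampling by M
    for _ in range(K):                      # comb section, rate F_in / M
        delayed = np.concatenate((np.zeros(D, dtype=np.int64), y[:-D]))
        y = y - delayed
    return y
```

With M = 2 and D = 2, each steady-state output sample is the sum of the R = 4 most recent inputs, i.e., the boxcar response implied by (3.4) for K = 1.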

Figure 3.13: Single stage CIC interpolation filter with D = R/L.

The basic non-recursive form of the transfer function is

    H(z) = (1/D) Σ_{n=0}^{D-1} z^(-n).    (3.5)

A single-stage non-recursive CIC decimation filter is shown in Fig. 3.14. The natural nulls of the CIC filter provide maximum alias rejection, but the aliasing bandwidths around the nulls are narrow and are usually not enough to provide sufficient aliasing rejection in the entire baseband of the signal. The advantage of an increased number of stages is an improvement in the attenuation characteristics, but the passband droop then normally needs to be compensated.

3.4.2 Polyphase Implementation of Non-Recursive CIC

The polyphase decomposition of the non-recursive CIC filter transfer function may achieve lower power consumption than the corresponding recursive implementation [89, 90]. The challenge in achieving low power consumption lies in those stages of the filter which normally operate at the high input sampling rate. If the overall conversion factor D is a power of two, D = 2^J, the original converter can be transformed into a cascade of J factor-of-two converters, each having a non-recursive sub-filter (1 + z^(-1))^K and a factor-of-two conversion. The transfer function can then be expressed as

    H(z) = Π_{i=0}^{J-1} (1 + z^(-2^i))^K.    (3.6)

As a result, the sampling rate is successively reduced by two after each decimation stage, as shown in Fig. 3.15. One advantage is that only the first decimation stage operates at the high input sampling rate. The second advantage of the non-recursive structures is that, if properly scaled, there are no register overflow problems: the register word length at any stage j, for an input word length win, is limited to (win + K·j). Further, the polyphase decomposition by a factor of two at each stage, along with the noble identities, can be used to implement all these sub-filters at the reduced rates. In an approach proposed in [89], the overall decimator is decomposed into a first-stage non-recursive filter with the decimation factor J1, followed by a cascade of non-recursive (1 + z^(-1))^K filters, each with a factor-of-2 decimation.
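The factorization behind (3.6) rests on a polynomial identity: for D = 2^J, the length-D boxcar 1 + z^(-1) + ... + z^(-(D-1)) equals the product of the J factor-of-two sub-filters 1 + z^(-2^i). A small sketch (function names are illustrative, not from the thesis) that checks this by convolving coefficient vectors:

```python
import numpy as np

def boxcar_coeffs(D):
    """Coefficients of 1 + z^-1 + ... + z^-(D-1), the K = 1 case of (3.5)
    without the 1/D scaling."""
    return np.ones(D, dtype=int)

def factored_coeffs(J):
    """Coefficients of prod_{i=0}^{J-1} (1 + z^-(2^i)), the K = 1 case of
    the cascade in (3.6)."""
    h = np.array([1], dtype=int)
    for i in range(J):
        stage = np.zeros(2**i + 1, dtype=int)
        stage[0] = stage[-1] = 1            # 1 + z^-(2^i)
        h = np.convolve(h, stage)
    return h
```

Raising each stage to the power K then gives the K-stage cascade, since convolution (polynomial multiplication) commutes across the factors.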

Figure 3.14: Non-recursive implementation of a CIC decimation filter.
Figure 3.15: Cascade of J factor-of-two decimation filters.

3.4.3 Compensation Filters

The frequency magnitude response envelopes of the CIC filters are like sin(x)/x; therefore, a compensation filter is normally used after the CIC filter. The purpose of the compensation filter is to compensate for the non-flat passband of the CIC filter. Several approaches are available in the literature [91–95] for CIC filter sharpening to improve the passband and stopband characteristics; they are based on the method proposed earlier in [96]. A two-stage solution of a sharpened comb decimation structure is proposed in [93–95, 97–100] for the case where the sampling rate conversion factor is expressible as the product of two integers. The transfer function can then be written as the product of two filter sections, and the role of these filter sections is then defined: one filter section may be used to provide sharpening, while the other can provide stopband attenuation, by selecting an appropriate number of stages in each filter section.

Chapter 4

Non-Integer Sampling Rate Conversion

The need for a non-integer sampling rate conversion may appear when two systems operating at different sampling rates have to be connected. This chapter gives a concise overview of SRC by a rational factor and its implementation details. The chapter first introduces SRC by a rational factor. The technique for constructing efficient SRC by a rational factor, based on FIR filters and polyphase decomposition, is then presented. In addition, sampling rate alteration with an arbitrary conversion factor is described, and the polynomial-based approximation of the impulse response of a resampler model is presented. Finally, the implementation of fractional-delay filters based on the Farrow structure is considered.

4.1 Sampling Rate Conversion: Rational Factor

The two basic operators, the downsampler and the upsampler, can change the sampling rate of a discrete-time signal by an integer factor only. A sampling rate change by a rational factor therefore requires a cascade of these sampling rate alteration devices: upsampling by a factor of L, followed by downsampling by a factor of M [9, 82], results in a sampling rate change by the rational factor L/M. For some cases it may be beneficial to change the order of these operators, which is only possible if the factors L and M are relatively prime, i.e., if M and L do not share a common divisor greater than one. Under that condition, the two realizations shown in Fig. 4.1 are equivalent.

4.1.1 Small Rational Factors

The classical design implementing the ratio L/M is upsampling by a factor of L, followed by appropriate filtering, followed by downsampling by a factor of M.

The implementation of a rational SRC scheme is shown in Fig. 4.2. The original input signal x(n) is first upsampled by a factor of L and then filtered by an anti-imaging filter, also called interpolation filter. The interpolated signal is then filtered by an anti-aliasing filter before the downsampling by a factor of M. The sampling rate of the output signal is

    Fout = (L/M) Fin.    (4.1)

Since both lowpass filters, for the interpolation and the decimation, are next to each other and operate at the same sampling rate, they can be combined into a single lowpass filter H(z), as shown in Fig. 4.3. The specification of the filter H(z) needs to be selected such that it acts as both the anti-imaging and the anti-aliasing filter at the same time. The ideal specification requirement for the magnitude response is

    |H(e^(jωT))| = L for |ωT| <= min(π/L, π/M), and 0 otherwise.    (4.2)

Accordingly, the stopband edge frequency of the ideal filter H(z) in the rational SRC of Fig. 4.3 should be

    ωs T = min(π/L, π/M).    (4.3)

From the above it is clear that the passband becomes narrower, and the filter requirements harder, for larger values of L and M.

Figure 4.1: Two different realizations of rational factor L/M.
Figure 4.2: Cascade of an interpolator and decimator.
Figure 4.3: An implementation of SRC by a rational factor L/M.
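A direct (non-polyphase) sketch of Fig. 4.3 under the specification (4.2)/(4.3): zero-stuff by L, filter with a single lowpass of cutoff min(π/L, π/M) and passband gain L, then keep every M-th sample. The windowed-sinc design, tap count, and function name are illustrative choices of mine, not the thesis's design method.

```python
import numpy as np

def resample_rational(x, L, M, ntaps=101):
    """Sketch of Fig. 4.3: upsample by L, apply one lowpass H(z) acting as
    both anti-imaging and anti-aliasing filter, then downsample by M."""
    up = np.zeros(len(x) * L)
    up[::L] = x                                # zero-stuffing by L
    fc = 0.5 * min(1.0 / L, 1.0 / M)           # (4.3) in cycles/sample
    n = np.arange(ntaps) - (ntaps - 1) / 2
    h = 2 * fc * np.sinc(2 * fc * n) * np.hamming(ntaps)
    h *= L / h.sum()                           # passband gain L, as in (4.2)
    y = np.convolve(up, h)
    d = (ntaps - 1) // 2                       # compensate the group delay
    return y[d:d + len(up)][::M]
```

For a constant input, the steady-state output stays near one, since the images of DC at multiples of 2π/L fall in the stopband of the windowed-sinc filter.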

4.1.2 Polyphase Implementation of Rational Sampling Rate Conversion

The polyphase representation is used for the efficient implementation of rational sampling rate converters, since it enables the arithmetic operations to be performed at the lowest possible sampling rate. The frequency spectrum of the re-sampled signal, Y(e^(jωT)), as a function of the spectrum of the original signal X(e^(jωT)), the filter transfer function H(e^(jωT)), and the interpolation and decimation factors L and M, can be expressed as

    Y(e^(jωT)) = (1/M) Σ_{k=0}^{M-1} X(e^(j(LωT - 2πk)/M)) H(e^(j(ωT - 2πk)/M)).    (4.4)

It is apparent that the overall performance of the resampler depends on the filter characteristics. The transfer function H(z) in Fig. 4.3 is first decomposed into L polyphase components using the approach shown in Fig. 1.11a, and the interpolation factor L is moved into the sub-filters using the computationally efficient polyphase representation form shown in Fig. 1.11. By the use of the polyphase representation and the noble identity, the structure of Fig. 4.4 has already attained a computational saving by a factor of L. The next motivation is to move the downsampling factor to the input side, so that all filtering operations can be evaluated at 1/M-th of the input sampling rate. The reorganization of the individual polyphase branches is targeted next: an equivalent delay block for each sub-filter branch is placed in cascade with the sub-filters, and, since the downsampling operation is linear, it can be moved into each branch of the sub-filters.

Figure 4.4: Polyphase realization of SRC by a rational factor L/M.
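The first step described above, decomposing H(z) into L polyphase components and moving the upsampler through them, can be sketched as follows (a toy illustration; the function names are mine). Each branch filters x(n) at the low rate and the branch outputs are interleaved, so no multiplications are spent on the stuffed zeros:

```python
import numpy as np

def polyphase_split(h, L):
    """Type-1 polyphase components: H(z) = sum_k z^-k H_k(z^L)."""
    return [np.asarray(h[k::L]) for k in range(L)]

def interpolate_polyphase(x, h, L):
    """Interpolation by L: each sub-filter H_k runs at the LOW rate on
    x(n); interleaving the branch outputs gives the high-rate signal."""
    branches = polyphase_split(h, L)
    outs = [np.convolve(x, hk)[:len(x)] for hk in branches]  # zero state
    y = np.empty(len(x) * L)
    for k in range(L):
        y[k::L] = outs[k]           # branch k produces phases n*L + k
    return y
```

The result matches zero-stuffing followed by filtering with the full H(z), which is exactly the saving by a factor of L mentioned for Fig. 4.4.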

Since the upsampling and downsampling factors L and M are assumed to be relatively prime, there exist integers l0 and m0 such that

    l0 L - m0 M = -1.    (4.5)

Using (4.5), the k-th delay can be represented as

    z^(-k) = z^(k(l0 L - m0 M)),    (4.6)

which then enables the interchange of the downsampler and the upsampler. The step-by-step reorganization of the k-th branch is shown in Fig. 4.5: the delay chain z^(-k) is first replaced by its equivalent form in (4.6), after which the upsampler and the downsampler can be interchanged. As can be seen in Fig. 4.5, the sub-filter Hk(z) is then in cascade with the downsampling operation, as highlighted by the dashed block, and the polyphase representation form in Fig. 1.10, with Fig. 1.10a and its equivalent computationally efficient form, can be used for the polyphase representation of the sub-filter Hk(z). The fixed independent delay chains are also moved to the input and the output sides. The negative delay z^(k l0) on the input side is adjusted by appropriately delaying the input to make the solution causal. The same procedure is followed for all other branches. As a result of these adjustments, all filtering operations now operate at 1/M-th of the input sampling rate, which makes this a more efficient structure for a rational SRC by a factor L/M. The rational sampling rate conversion structures considered for the case L/M < 1 can be used to perform L/M > 1 by considering their duals.

Figure 4.5: Reordering the upsampling and downsampling in the polyphase realization of the k-th branch.

4.1.3 Large Rational Factors

The polyphase decomposition approach is convenient in cases where the factors L and M are small. Otherwise, filters of a very high order are needed. Consider, for example, the application of sampling rate conversion from 48 kHz to 44.1 kHz.
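The integers l0 and m0 of (4.5) follow from the extended Euclidean algorithm, since L and M are relatively prime. A small sketch (the function name is mine):

```python
def bezout_l0_m0(L, M):
    """Find integers (l0, m0) with l0*L - m0*M = -1 for coprime L and M,
    as required by (4.5)/(4.6)."""
    # extended Euclid: maintain u*L + v*M = r until r = gcd(L, M) = 1
    old_r, r = L, M
    old_u, u = 1, 0
    old_v, v = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r
        old_u, u = u, old_u - q * u
        old_v, v = v, old_v - q * v
    assert old_r == 1, "L and M must be relatively prime"
    # old_u*L + old_v*M = 1, so (-old_u)*L - old_v*M = -1
    return -old_u, old_v
```

For example, for the 48 kHz to 44.1 kHz case with L/M = 147/160, the returned pair satisfies l0·147 - m0·160 = -1.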

In this application, the rational factor requirement is L/M = 147/160. The stopband edge frequency from (4.2) is then π/160, which imposes severe requirements on the filters and results in a high-order filter with a high complexity. In this case, a multi-stage realization will be beneficial [41], since the intermediate sampling rates for the filtering operations will be lower.

4.2 Sampling Rate Conversion: Arbitrary Factor

In many applications, there is a need to compute the value of the underlying continuous-time signal xa(t) at an arbitrary time instant between two existing samples. The computation of new sample values at such arbitrary points can be viewed as interpolation. The interpolation problem can be considered as resampling, where a continuous-time function ya(t) is first reconstructed from a finite set of existing samples x(n), and the continuous-time signal is then sampled at the desired instant. The commonly used interpolation techniques are based on polynomial approximations.

Assume an input sequence x(n), uniformly sampled at the interval Tx. The requirement is to compute a new sample value, y(Ty m) = ya(Ty m), at some time instant Ty m = nb Tx + Tx d, which occurs between the two existing samples x(nb Tx) and x((nb + 1)Tx), by utilizing some interpolation algorithm on the existing samples ..., x((nb - 2)Tx), x((nb - 1)Tx), x(nb Tx), x((nb + 1)Tx), x((nb + 2)Tx), .... The parameter nb is some reference or base-point index, and d is called the fractional interval. The fractional interval can take on any value in the range |d| <= 0.5, so as to cover the whole sampling interval range. The interpolation algorithm may in general be defined as a time-varying digital filter with the impulse response h(nb, d).

For a given set of input samples x(nb - N1), ..., x(nb), ..., x(nb + N2), the window, or interval, chosen is N = N1 + N2 + 1. A polynomial approximation ya(t) is then defined as

    ya(t) = Σ_{k=-N1}^{N2} Pk(t) x(nb + k),    (4.7)

where the Pk(t) are polynomials. If the interpolation process is based on the Lagrange polynomials, then Pk(t) is defined as

    Pk(t) = Π_{i=-N1, i≠k}^{N2} (t - ti) / (tk - ti),    k = -N1, ..., N2.    (4.8)

The Lagrange approximation also provides exact reconstruction of the original input samples.
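The Lagrange approximation of (4.7)/(4.8) can be sketched directly, with the base-point-relative sample times tk = k (in units of Tx) and evaluation at t = d; the function name is illustrative:

```python
def lagrange_interp(samples, ks, d):
    """Evaluate the Lagrange polynomial of (4.7)/(4.8) at t = d.
    ks holds the base-point-relative indices -N1..N2, and samples[j] is
    the input sample x(nb + ks[j])."""
    total = 0.0
    for k, xk in zip(ks, samples):
        Pk = 1.0
        for i in ks:
            if i != k:                 # the i = k factor is excluded in (4.8)
                Pk *= (d - i) / (k - i)
        total += Pk * xk
    return total
```

With four points of the quadratic x(t) = t^2 at k = -1, 0, 1, 2, the cubic Lagrange polynomial reproduces the quadratic exactly, so the value at d = 0.5 is 0.25; at d = 1 the existing sample is reconstructed exactly, as stated above.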

4.2.1 Farrow Structures

In a conventional SRC implementation, new filters are needed if the SRC ratio changes, which limits the flexibility in covering different SRC ratios. By utilizing the Farrow structure, this can be solved in an elegant way. The corresponding filter structure makes use of a number of fixed sub-filters and one adjustable fractional-delay parameter, as seen in Fig. 4.6; the overall impulse response values are expressed as polynomials in the delay parameter. The implementation complexity of the Farrow structure is lower compared to alternatives such as online design or storage of a large number of different impulse responses. The Farrow structure is efficient for interpolation, whereas, for decimation, it is better to use the transposed Farrow structure so as to avoid aliasing.

Let the desired frequency response of an adjustable fractional-delay (FD) filter be

    Hdes(e^(jωT), d) = e^(-jωT(D + d)),    |ωT| <= ωc T < π,    (4.9)

where D and d are fixed and adjustable real-valued constants, respectively. It is assumed that D is either an integer, or an integer plus a half, whereas d takes on values in the interval [-1/2, 1/2]. In this way, a whole sampling interval is covered by d, and the fractional delay equals d (d + 0.5) when D is an integer (an integer plus a half). The transfer function of the Farrow structure is

    H(z, d) = Σ_{k=0}^{L} d^k Hk(z),    (4.10)

where the Hk(z) are fixed FIR sub-filters approximating kth-order differentiators, with frequency responses

    Hk(e^(jωT)) ≈ e^(-jωT D) (-jωT)^k / k!,    k = 0, 1, ..., L,    (4.11)

obtained by truncating the Taylor series expansion of (4.9) to L + 1 terms. A filter with a transfer function in the form of (4.10) can approximate the ideal response in (4.9) as closely as desired, by choosing L and designing the sub-filters appropriately [30, 101]. When the Hk(z) are linear-phase FIR filters, with either a symmetric (for k even) or an antisymmetric (for k odd) impulse response, the Farrow structure is often referred to as the modified Farrow structure. The sub-filters can have even or odd orders Nk. With odd Nk, all Hk(z) are general filters, whereas for even Nk, the filter H0(z) reduces to a pure delay.

Figure 4.6: Farrow structure.

An alternative representation of the transfer function of the Farrow structure is

    H(z, d) = Σ_{k=0}^{L} Σ_{n=0}^{N} hk(n) z^(-n) d^k = Σ_{n=0}^{N} h(n, d) z^(-n),    (4.12)

where N is the order of the overall impulse response and

    h(n, d) = Σ_{k=0}^{L} hk(n) d^k.    (4.13)

In general, the Farrow structure delays a bandlimited signal by a fixed d. Hence, by controlling d for every input sample, one can obtain new samples between any two consecutive samples of x(n). If a signal needs to be delayed by two different values of d, one and the same set of Hk(z) is used and only d needs to be modified. Assuming Tin and Tout to be the sampling periods of x(n) and y(n), respectively, decimation corresponds to Tout > Tin, whereas interpolation results in Tout < Tin. With decimation, some signal samples will be removed, but some new samples will also be produced: one can shift the original samples (delay them in the time domain) to the positions which would belong to the decimated signal. Thus, in both cases, SRC can be seen as delaying every input sample with a different d, where the delay depends on the SRC ratio, and the Farrow structure thereby performs SRC. The output sample index at the output of the Farrow structure is

    nout Tout = (nin + d(nin)) Tin          for odd Nk,
    nout Tout = (nin + 0.5 + d(nin)) Tin    for even Nk,    (4.14)

where nin (nout) is the input (output) sample index.
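The control implied by (4.14), here for the odd-Nk case, amounts to computing a base-point index nin and a fractional interval d for each output index; a hedged sketch (names mine), with ratio = Tout/Tin = Fin/Fout:

```python
def farrow_control(num_out, ratio):
    """Per-output-sample base index and fractional interval for the
    odd-Nk case of (4.14): n_out*T_out = (n_in + d)*T_in."""
    pairs = []
    for n_out in range(num_out):
        t = n_out * ratio          # output time measured in input samples
        n_in = int(t)              # base-point index n_b
        pairs.append((n_in, t - n_in))
    return pairs
```

For the 48 kHz to 44.1 kHz example, ratio = 160/147, so the second output sample uses base index 1 with d = 13/147; here d is kept in [0, 1) for simplicity, whereas the text centers it in [-1/2, 1/2] by adjusting the base point.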


Chapter 5

Polynomial Evaluation

The Farrow structure, shown in Fig. 4.6, requires the evaluation of a polynomial of degree L with d as the independent variable. Motivated by this, the present chapter reviews polynomial evaluation schemes and the efficient evaluation of a required set of power terms. Polynomial evaluation also has other applications [102], such as the approximation of elementary functions in software or hardware [103, 104]. At the start of the chapter, the classical Horner scheme is described, which is considered an optimal way to evaluate a polynomial with a minimum number of operations. A brief overview of parallel polynomial evaluation schemes is then presented in the central part of the chapter. Finally, the computation of the required set of powers is discussed, and two approaches are outlined to exploit potential sharing in the computations.

5.1 Polynomial Evaluation Algorithms

In this chapter, only uni-variate polynomials, with an independent variable x, are considered. A uni-variate polynomial is a polynomial that has only one independent variable, whereas multi-variate polynomials involve multiple independent variables. The most common form of a polynomial p(x), also called the power form, is

    p(x) = a0 + a1 x + a2 x^2 + ... + an x^n,    an ≠ 0,

where n is the order of the polynomial and a0, a1, ..., an are the polynomial coefficients. In order to evaluate p(x), the first solution that comes to mind is the transformation of all powers into multiplications, so that p(x) takes the form

    p(x) = a0 + a1 × x + a2 × x × x + ... + an × x × x × ... × x,    an ≠ 0.

However, there are more efficient ways to evaluate this polynomial. One possible way to start with is the most commonly used scheme in software and hardware, named Horner's scheme.

5.2 Horner's Scheme

Horner's scheme is simply a nested re-write of the polynomial. In this scheme, the polynomial p(x) is represented in the form

    p(x) = a0 + x(a1 + x(a2 + ... + x(an-1 + an x))).    (5.1)

This leads to the structure in Fig. 5.1. For a polynomial of order n, n multiplications and n additions are used. Horner's scheme has the minimum arithmetic complexity of any polynomial evaluation algorithm. However, it is purely sequential, and it therefore becomes unsuitable for high-speed applications.

Figure 5.1: Horner's scheme for an n-th order polynomial.

5.3 Parallel Schemes

Several algorithms with some degree of parallelism in the evaluation of polynomials, varying in degree of parallelism, have been proposed [105–110]. However, these parallel algorithms exhibit an increase in the number of operations, as given in Table 5.1. Earlier works have primarily focused either on finding parallel schemes suitable for software realization [105–111] or on how to find suitable polynomials for hardware implementation of function approximation [47, 112–116]. To compare different polynomial evaluation schemes, different parameters may be considered, such as computational complexity (the number of operations of different types), critical path, pipelining complexity, and latency after pipelining. On the basis of these parameters, suitable schemes can be shortlisted for an implementation given the specifications.

Since all schemes have the same number of additions, n, the difference is in the number of multiplication operations. Multiplications can be divided into three categories: data-data multiplications, squarers, and data-coefficient multiplications. In squarers, both inputs of the multiplier are the same, which means that the number of partial products can be roughly halved.
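Horner's rule (5.1) uses exactly n multiplications and n additions; a minimal sketch (function name mine):

```python
def horner(coeffs, x):
    """Evaluate a0 + a1*x + ... + an*x^n by the nesting of (5.1).
    coeffs = [a0, a1, ..., an]; n multiplications, n additions."""
    result = coeffs[-1]
    for a in reversed(coeffs[:-1]):
        result = a + x * result      # one multiply-add per coefficient
    return result
```

The loop body is the recurrence read off directly from the innermost parenthesis of (5.1) outward, which is also why the scheme is purely sequential: each step needs the previous result.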

The difference between the schemes lies in how they split the polynomial into sub-polynomials so that these can be evaluated in parallel. The schemes most frequently used in the literature are Horner and Estrin [105], but there are other polynomial evaluation schemes as well, such as Dorn's generalized Horner scheme [106], the even-odd (EO) scheme, Maruyama's scheme [109], Munro and Paterson's scheme [107, 108], and the scheme of Li et al. [110]. All the listed schemes have some good potential and useful properties, but some schemes may be good for one polynomial order and not for others, and some schemes do not work for all polynomial orders.

For comparing implementation cost, it is reasonable to assume that a squarer has roughly half the area complexity of a general multiplier [117, 118], since, as explained in Chapter 2, its number of partial products can roughly be halved. Hence, two schemes having the same total number of multiplications, but varying in the types of multiplications, are different with respect to implementation cost and other parameters; the scheme which has the lower number of data-data multiplications is cheaper to implement. The data-coefficient multiplication count also gives an idea of the degree of parallelism of a scheme. Horner's scheme has only one data-coefficient multiplication and, hence, one purely sequential flow with no degree of parallelism, whereas the EO scheme has two data-coefficient multiplications, which result in two branches running in parallel. The required set of power terms to be evaluated may contain all powers up to some integer n, or it may be a sparse set of power terms. Other factors that may be helpful in shortlisting an evaluation scheme are the uniformity and simplicity of the resulting evaluation architecture and the linearity of the scheme as the order grows. In applications where polynomials are evaluated at run time, or which make extensive use of polynomial evaluation, any degree of parallelism or efficient evaluation relaxes the computational burden of satisfying the required throughput.
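As a concrete example of such splitting, Estrin's scheme [105] pairs the coefficients and squares x at every level, so the pair evaluations within a level are independent and can run in parallel. A small sketch (the iterative formulation is mine):

```python
def estrin(coeffs, x):
    """Estrin's parallel scheme: write p(x) = sum_i (a_{2i} + a_{2i+1} x)
    * (x^2)^i and apply the same split recursively to the new, half-length
    coefficient list, with x replaced by its square at each level."""
    c = list(coeffs)
    while len(c) > 1:
        if len(c) % 2:
            c.append(0)                                   # pad to even length
        c = [c[i] + c[i + 1] * x for i in range(0, len(c), 2)]  # parallel pairs
        x = x * x                                          # one squarer per level
    return c[0]
```

Each `while` iteration is one hardware level: all pair combinations in the list comprehension are mutually independent, and only the running square of x is shared between them, which is the parallelism the comparison above refers to.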
The type of powers required after splitting the polynomial is also important. Some schemes split the polynomial in such a way that only powers of the form x^(2^i) or x^(3^i) are required, and the evaluation of such square and cube powers is normally more efficient than conventional multiplications. For data-coefficient multiplications, the area complexity is likewise reduced, but it is hard to determine, as it depends on the coefficient; for simplicity, it is assumed that a constant multiplier has an area which is 1/4 of that of a general multiplier. The disadvantage of the parallel schemes, on the other hand, is that the number of operations is increased, together with the additional hardware needed to run operations in parallel.

5.4 Powers Evaluation

Evaluation of a set of powers, also known as exponentiation, not only finds application in polynomial evaluation [102, 119] but also in window-based exponentiation for cryptography [120, 121]. In the process of evaluating all powers at the same time, redundancy in the computations at different levels is expected and needs to be exploited.

Table 5.1: Evaluation schemes for a fifth-order polynomial, with the number of additions (A), data-data multiplications (M), data-coefficient multiplications (C), and squarers (S).

    Scheme   | Evaluation                                          | A | M | C | S
    Direct   | a5 x^5 + a4 x^4 + a3 x^3 + a2 x^2 + a1 x + a0       | 5 | 2 | 5 | 2
    Horner   | ((((a5 x + a4)x + a3)x + a2)x + a1)x + a0           | 5 | 1 | 4 | 0
    Estrin   | (a5 x^2 + a4 x + a3)x^3 + a2 x^2 + a1 x + a0        | 5 | 4 | 2 | 1
    Li       | (a5 x + a4)x^4 + (a3 x + a2)x^2 + (a1 x + a0)       | 5 | 3 | 2 | 2
    Dorn     | (a5 x^3 + a2)x^2 + (a4 x^3 + a1)x + (a3 x^3 + a0)   | 5 | 3 | 3 | 1
    EO       | ((a5 x^2 + a3)x^2 + a1)x + ((a4 x^2 + a2)x^2 + a0)  | 5 | 2 | 3 | 1
    Maruyama | (a5 x + a4)x^4 + a3 x^3 + (a2 x^2 + a1 x + a0)      | 5 | 4 | 2 | 2

Two independent approaches are considered for exploiting this redundancy. In the first, redundancy is removed at word level, while in the second, it is exploited at bit level.

5.4.1 Powers Evaluation with Sharing at Word-Level

A direct approach to compute x^i is the repeated multiplication of x with itself i - 1 times; however, this approach is inefficient and becomes infeasible for large values of i. For example, to compute x^23, 22 repeated multiplications of x are required. With the binary approach in [102], the same can be achieved in eight multiplications, while the factor method in [102] gives a solution with one less multiplication. Another efficient way to evaluate a single power is to consider the relation to the addition chain problem. An addition chain for an integer n is a list of integers [102]

    1 = a0, a1, a2, ..., al = n,    (5.2)

such that ai = aj + ak, k <= j < i, for i = 1, 2, ..., l. As multiplication is additive in the logarithmic domain, a minimum-length addition chain gives an optimal solution for computing a single power term. A minimum-length addition chain for n = 23 gives the optimal solution with only six multiplications, e.g., via the intermediate powers

    {x^2, x^3, x^5, x^10, x^20, x^23}.    (5.3)

However, when these techniques, used for the computation of a single power [102, 122, 123], are applied in the evaluation of multiple power terms, they do not combine well in eliminating the overlapping terms that appear in the evaluation of the individual powers, as explained below with an example. Suppose a sparse set of power terms, T = {22, 39, 50}, needs to be evaluated.
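An addition chain can be checked mechanically: each element after the leading 1 must be the sum of two (not necessarily distinct) earlier elements, and the chain length minus one is the multiplication count. A sketch (name mine), exercised on a length-six chain for n = 23:

```python
def is_addition_chain(chain, n):
    """True if chain is an addition chain for n in the sense of (5.2):
    it starts at 1, ends at n, and every later element is a sum of two
    earlier ones. len(chain) - 1 multiplications then compute x^n."""
    if chain[0] != 1 or chain[-1] != n:
        return False
    return all(
        any(a + b == chain[j] for a in chain[:j] for b in chain[:j])
        for j in range(1, len(chain))
    )
```

The chain 1, 2, 3, 5, 10, 20, 23 corresponds to the six multiplications behind (5.3): x^2 = x*x, x^3 = x^2*x, x^5 = x^2*x^3, x^10 = (x^5)^2, x^20 = (x^10)^2, x^23 = x^20*x^3.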

39. hence. from one up to some integer n.. 3. 14. the solution is {22. an addition sequence computing {3. 24. For example. and. 50}. 27. 9. 6. 2. (5. 10. . there are some overlapping terms. 6. 2. 11. 50} → {1.7) . Powers Evaluation A minimum length addition chain solution for each individual power is 22 → {1. 2. 2. 25. the optimal solution for the evaluation of the requested set of powers. 15.2 Powers Evaluation with Sharing at Bit-Level In this approach..5.x2 x1 x0 . 3. 3. 11}. As can be seen.4) However. 2. The PP matrices for all requested powers in the set are generated independently and sharing at bit level is explored in the PP matrices in order to remove redundant computations. 6. 5. 12}. and it is assumed that each power is computed only once. which can be considered as the generalization of addition chains. . 27. Since every single multiplication will compute some power in the set. 39. 22. If these overlapping terms are removed and the solutions are combined for the evaluation of the required set of powers. 50}. 22. This solution is obtained by addressing addition sequence problem. are required to be computed is easy compared to the sparse case. 12. can be mapped to the same full adder. 3. 24. . any solution obtained will be optimal with respect to the minimum number of multiplications. 12. 7.6) xi 2i . which can be shared. 39. 6. 50}. 50}. 39} 49 50 → {1. The removal of redundancy at word-level does not strictly mean for the removal of overlapping terms only but to compute only those powers which could be used for evaluation of other powers as well in the set. 7. 10.4. An addition sequence for the set of integers T = n1 . 39. 36. the evaluation of power terms is done in parallel using summations trees.5) which clearly shows a signiﬁcant diﬀerence. 8.4. 15. 5. {22. {22. is {22. 3. (5. 11} is {1. 3. 25. 12. 5. 6. A w-bit unsigned binary number can be expressed as following X = xw−1 xw−2 xw−3 . 3. 
The redundancy here relates to the fact that same three partial products may be present in more than one columns. similar to parallel multipliers. 22} 39 → {1. 2. 50}. 39. Note that the case where all powers of x. (5. 39. . 50} → {1. {2. with a value of X= i=0 w−1 (5. 11. nr is an addition chain that contains each element of T . n2 . 4.

For any integer power n and a word length of w bits, the nth power of an unsigned binary number as in (5.7) is given by

    X^n = ( Σ_{i=0}^{w-1} xi 2^i )^n.    (5.8)

For the powers with n >= 2, the expression in (5.8) can be evaluated using the multinomial theorem. The resulting expression can be simplified to a sum of weighted binary variables, which can further be simplified by using the identities xi xj = xj xi and xi xi = xi. The PP matrix of the corresponding power term can then be deduced from this weighted function. In the process of generating a PP matrix from the weighted function, each term is placed in the column corresponding to its weight; e.g., x0 2^0 will be placed in the first and x2 2^6 in the 7-th column from the left in the PP matrix. For the terms having weights other than a power of two, binary or CSD representation may be used; the advantage of using the CSD representation over binary is that the size of the PP matrix is reduced, and therefore the number of full adders for accumulating the partial products is reduced. To make the analysis of the PP matrix simpler, the binary variables in the PP matrix can be represented by their corresponding representation weights, as given in Table 5.2.

Table 5.2: Equivalent representation of binary variables in the PP matrix for w = 3.

    Binary variable | Word (x2 x1 x0) | Representation
    x0              |      0 0 1      | 1
    x1              |      0 1 0      | 2
    x1 x0           |      0 1 1      | 3
    x2              |      1 0 0      | 4
    x2 x0           |      1 0 1      | 5
    x2 x1           |      1 1 0      | 6
    x2 x1 x0        |      1 1 1      | 7

In Fig. 5.2, an example case for (n, w) = (5, 4) is considered to demonstrate the sharing potential among multiple partial products; all partial products of the powers from x^2 up to x^5 are shown. As can be seen in Fig. 5.2, since the partial product sets {9, 13} and {5, 9} are present in more than one column, three full adders (two for the first set and one for the second) can be saved by making sure that exactly these sets of partial products are added, compared to adding the partial products in an arbitrary order. There are thus savings whenever such repeated sets are scheduled onto the same adders.
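The multinomial expansion of (5.8), reduced with the identities xi xi = xi and xi xj = xj xi, can be enumerated directly to obtain the PP matrix columns. A sketch (naming mine) that lists, per column weight, the partial products as sorted tuples of bit indices, so repeated tuples expose the sharing potential described above:

```python
from itertools import product
from collections import defaultdict

def power_pp_columns(n, w):
    """Partial products of X**n for a w-bit unsigned X = sum x_i 2^i,
    per (5.8). Each n-tuple of bit indices (i1, ..., in) contributes the
    product x_i1*...*x_in in column i1 + ... + in; x_i*x_i = x_i collapses
    repeated indices. Returns {column: [partial-product tuples]}."""
    cols = defaultdict(list)
    for combo in product(range(w), repeat=n):
        col = sum(combo)                       # 2^i1 * ... * 2^in = 2^(sum)
        cols[col].append(tuple(sorted(set(combo))))
    return cols
```

Duplicate tuples within a column correspond to terms with a weight that is not a single power of two; as noted above, these are recoded in binary or CSD form, while tuples recurring across the matrices of different powers are the candidates for full-adder sharing shown in Fig. 5.2.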

Figure 5.2: Partial product matrices with potential sharing of full adders, for the powers n = 2, ..., 5 with w = 4.


Chapter 6

Conclusions and Future Work

6.1 Conclusions

In this work, contributions beneficial for the implementation of integer and non-integer SRC were presented. By optimizing both the number of arithmetic operations and their word lengths, improved implementations are obtained. This not only reduces the area and power consumption but also allows better comparisons between different SRC alternatives during a high-level system design. The comparisons are further improved for some cases by accurately estimating the switching activities and by introducing leakage power consumption.

6.2 Future Work

The following ideas are identified as possibilities for future work:

- It would be beneficial to derive corresponding accurate switching activity models for the non-recursive CIC architectures. In this way an improved comparison can be made, where both architectures have more accurate switching activity estimates.

- For interpolating recursive CIC filters, the output of the final comb stage will be embedded with zeros during the upsampling. Based on this, one could derive closed-form expressions for the integrator taking this additional knowledge into account.

- Deriving a switching activity model for polyphase FIR filters would naturally have benefits for all polyphase FIR filters, not only the ones with CIC filter coefficients.

- The lengths of the Farrow sub-filters usually differ between different sub-filters. For the proposed matrix-vector multiplication scheme, this could be utilized to shift the rows corresponding to the sub-filters in such a way that the total implementation complexity for the matrix-vector multiplication and the pipelined polynomial evaluation is minimized. This would motivate a study of pipelined polynomial evaluation schemes where the coefficients arrive at different times, which could also include obtaining different power terms at different time steps.

- The addition sequence model does not include pipelining. It would be possible to reformulate it to consider the amount of pipelining registers required as well.
